A community for technology geeks in Lebanon.

You are not logged in.

- Topics: Active • Unanswered

**arithma****Member**

Given a set of unique strings, all of the same size, determine the minimum number of compares needed to identify an input of the same size to one of the strings in the given set, and the string is always one of the given strings.

**Cleaner Restatement**

For a set of unique strings, all of the same size, there exists a machine that takes a string input, and only by doing single character equality comparisons will determine an index that identifies one of those strings.

Given a set of such unique strings, determine the minimum number of comparisons needed by such a machine to identify any of those strings.**Further Clarification**

Please do note that the comparison constraint applies only to the mentioned machine and not to the algorithm itself. You may compare as you please within the algorithm.

**Example 1**

Set: ["a", "b"] (two strings as the given set)

Output: 1. (Only one comparison is needed to decided which of the two strings an input is equal to)

**Example 2**

Set: ["ab", "bc"]

Output: 1.

**Example 3**

Set: ["abc", "abd", "dbc"]

Output: 2.

Note: Doing the sample exercises is fun, but I am not sure whether this problem is actually hard or not. Worth the thought though.

P.S.: I just really, reaally hope that geek won't find it pretty to solve using Mathematica.

*Last edited by arithma (February 13 2012)*

**jsaade****Member**

I am not sure i understand it? It looks to me it will always be length-1

**arithma****Member**

For 2^10 values having only bits, you would need 10 comparisons. For a subset of these 2^10 strings, I would imagine you need less comparisons.

**arithma****Member**

I think I under defined the problem.

Comparisons are only equal operations to a single character at a single location that returns either true or false.

Has anyone found this exercise interesting?

**Joe****Member**

I'm going to have to go with ZeRaW on this one. It seems that it's always going to be len(set) -1, unless I misunderstood the problem ... :-/

**jsaade****Member**

@arithma, the character by character comparison made sense now.

I ll try to get involved in it tomorrow.

**arithma****Member**

If it still doesn't make sense, I think this will help (I will merge all of this back to the original problem statement when I know what to say):

A machine is going to make a single equality comparison at a time on the input to determine the input's index in the original set. How many comparisons does an optimized machine need to do to guarantee its ability to output the index. Note that there is for every different set of strings a different machine altogether.

**Example 4**

Set: ["aa", "ab", "ba", "bb"]

Output: 2.

If the machine sees an "a" at index 0, then the input is either "aa", or "ab".

**Example 5**

Set: ["aa", "ab", "ba"]

Output: 2.

*Last edited by arithma (February 6 2012)*

**arithma****Member**

I updated the original post to include a cleaner problem statement.

**raja****Member**

This is my solution. It runs in exponential time versus the length of the strings(though it does have some pruning, it is still exponential in the worst case).

If anyone needs explanation on how and why it works(or if you have counter examples) let me know :)

```
def prune(strings):
"""
Removes from the given strings all the parts common to all strings
"""
result = [""] * len(strings)
for i in range(len(strings[0])):
if len(set([a[i] for a in strings])) > 1:
for j, s in enumerate(strings):
result[j] += s[i]
return result
def classify(strings, n):
"""
given an array of strings gives a map of nth letter => array of strings
"""
result = {}
for s in strings:
if s[n] not in result:
result[s[n]] = []
result[s[n]].append(s)
return result
def max_comparisons(strings):
if len(strings) == 1:
return 0
if len(strings[0]) == 1:
return len(strings) - 1
strings = prune(strings)
m = None
for i in range(len(strings[0])):
groups = map(max_comparisons, classify(strings, i).values())
groups.sort(reverse=True)
groups[0] += 1
for i in range(1, len(groups)):
groups[i] += i
if m is None:
m = max(groups)
m = min(m, max(groups))
return m
```

**xterm****Moderator**

this exercise seems *somewhat* similar to the super self insertion exercise.

Correct me if i'm wrong.

**xterm****Moderator**

Simple quick solution. Test it here.

It's somewhat quadratic (On2).

Can we please define a given set of parameters so that we could all test against.

Perhaps:

```
input = ["bced","abed","abce","aceb","abcd"]
str = "abcd"
```

(Using quote to preserve indentation when copying to the console)

input = ["bced","abed","abce","aceb","abcd"]

str = "abcd"

checks = 0kick = {s -> s.length()>1?s[1..-1]:null}

check = {s,c -> s[0] == c}str.each { c ->

drop = []

input.eachWithIndex { s,i ->

print "$c : $s "

checks += 1

if(check(s,c)){

print "(caught)"

input[i] = kick(s)

} else{

print "(not caught)"

drop << input[i]

}

println()

}

drop.each { input.remove(it) }

}println "Total single char checks : $checks"

**raja****Member**

Hey there xterm, I ran your code(I'm assuming that's groovy) and it outputs "14". However, that is not correct. Given the input strings ["bced","abed","abce","aceb","abcd"] you can construct a machine which will uniquely identify one of them with 3 comparisons at most.

Here are the steps to uniquely identify a string from among those 5 strings.

compare the input string to the second letter of "bced"

If it matches:

We can eliminate "abed" "abce" and "abcd". Leaving us with 2 choices: "bced" and "aceb"

One further comparison is needed to distinguish those two, compare first letter of input with

first letter of "bced".

If it matches:

respond "bced"

Otherwise:

respond "aceb"

Total comparisons: 2

If it doesn't match:

We can eliminate "bced" and "aceb" leaving us with "abed" "abce" and "abcd"

We compare the input with the third letter of "abed".

If it matches:

respond "abed". Done, 2 comparisons

If it doesn't match:

We are left with "abce" and "abcd". Compare the last letter of the input with the last letter of "abce".

If it matches:

respond "abce". Done, 3 comparisons

Otherwise:

respond "abcd". Done, 3 comparisons

The answer is thus 3, since that is the number of comparisons an algorithm will need to make if it knows the set of all possible strings ahead of time and can devise a perfect search algorithm(which is how I understood this problem, though I'm open to correction)

PS: If my understanding of the problem is correct, it is quite obvious that the upper bound on the solution is the length of the input strings. Meaning if our input strings are of length 4, as in your example, that is the upper bound. Since you can, with each 1 comparison, eliminate all strings that are not equal at that position.

PPS: Given that observation, if one could devise an efficient solution to the following subproblem: "Is it possible to uniquely identify one of these strings in `n` comparisons?", it would be possible to find the solution via a binary search on the entire solution space which is 0=>length(strings[0])

*Last edited by raja (February 8 2012)*

**xterm****Moderator**

My understanding is the total number of single character compares. My initial implementation mimicked a NxM grid that does a simple character by character "check and kick". I ammended the code with some debugging info for your consideration

(Yes this is groovy)

Seems like we still need more information arithma.

P.S.: Keep in mind raja that i simply opted for a quick solution, not the optimal one, but i presume you know that since i already mentioned that its quadratic.

*Last edited by xterm (February 8 2012)*

**raja****Member**

That is true, total number of single compares, but you're allowed to use your prior knowledge of the set of all possible strings(the input in your code). Given that knowledge, my explanation shows why your input should have an output of 3, no?

**xterm****Moderator**

raja wrote:

That is true, total number of single compares, but you're allowed to use your prior knowledge of the set of all possible strings(the input in your code). Given that knowledge, my explanation shows why your input should have an output of 3, no?

OH! You were giving a better algorithm, for some reason i thought you meant my code vs the output is incorrect. Yes, I'm pretty sure your pseudo is reasonable.

How many single character compares does your code gives on my input?

**raja****Member**

My code outputs 3 compares. My code essentially analyzes the input set and determines the best algorithm(the one that minimizes compares) to uniquely identify any one of these strings. It then outputs the depth of the deepest comparison tree of this algorithm.

This is how I understood this problem. It would really be helpful if Arithma were to clear things up for us.

**xterm****Moderator**

I presume there is limitation as well on working on the set prior to starting the character checks.

On the following input:

input = ["fVq9jcBEMNrVl0M2A8VmfNPLgd7kK5JD",

"I9FlZT90gu2n9ywNFBljC7jce2rR0SFJ",

"lGOmu7loH80DbOyaYqzQBGwTPDwHlr8u",

"rFqF5g9u5TXAAXM2sdxHpfZ4VzFBOgNN",

"lGOmu7loH80DbOyfb6NuuWcEP81D7J5r",

"tEANjAjjNwvGfG3uhjjAtFfg3OmYXVK2",

"lGOmu7loH80DOyncyYpajmdnNIGH6HSu",

"zQUwjMzloRmzlR5jYuvaaa9sqo9AH09B",

"JsNn1Hu7J2FjMxoi8t49AogNogfPKlk6",

"lGOmu7loH80DbOyFbtVa1K1m6UxQFvX5",]

str = "lGOmu7loH80DbOyFbtVa1K1m6UxQFvX5"

Your code gave an output of 4 compares, which would only mean that you might be acting too much on the given set prior to doing the compares. Whether this is allowed, is really beyond my understanding, we really need to get more requirements info from arithma.

My code gave 83 character compares with a straight up brute force algorithm.

**raja****Member**

I don't think there is a limitation actually. From Arithma's statement:

For a set of unique strings, all of the same size, there exists a machine that takes a string input, and only by doing single character equality comparisons will determine an index that identifies one of those strings

That seems to very strongly imply that said machine is tailor-made to that input.

Now, as for your example, my answer may not be correct, but yours definitely is too high. Given that set of inputs, all you would need to do is compare the last letter of every string, in turn, to your input string. Given that the strings in that set all have unique endings, this will take at most 9(len(set)-1) comparisons.

*Last edited by raja (February 8 2012)*

**arithma****Member**

raja's interpretation and solution seem to be correct. That's how I intended the problem to be.

However this solution is correct, but I am not sure it is optimized. xterm, yours is definitely wrong. We can either use raja's solution as a reference or prove it wrong (which I doubt).

BTW, raja, check your Private Message Inbox.

**raja****Member**

Yes, I'm not sure it's optimized either. In my PPS above I outlined a possibly better approach though I haven't given it much further thought(I might later).

As for the PM, check your email inbox.

**xterm****Moderator**

arithma wrote:

raja's interpretation and solution seem to be correct. That's how I intended the problem to be.

However this solution is correct, but I am not sure it is optimized. xterm, yours is definitely wrong. We can either use raja's solution as a reference or prove it wrong (which I doubt).

Then I take myself out of this exercise because i clearly cannot seem to understand the requirements. How is it possible that given an array of 10 strings of 33 characters where 4 of the strings diverge into different character sequence on the 15th character would result in merely ** 4** single character compares is beyond me.

**xterm****Moderator**

rahmu wrote:

Anyone can brute force. I don't mean this negatively. I'm trying to make sense of the exercise. It is my understanding that arithma wanted to answer the following question:

Is there anything clever you can do to avoid bruteforcing the solution?.

I made it quite clear that this my code is a mere bruteforcing solution, not an optimized one. As i understood the exercise is "Calculate the least amount of single character compares needed to find the string".

The reason i wrote the brute force is to have a comparison measure when i write a better solution.

**raja****Member**

xterm, this is my attempt at an explanation:

First consider the **String Finder(tm)** machine. This machine has a list of strings in its internal memory, and when you input into it a string, it will tell you where, in its memory, this input string resides.

Now, you are a maker of **String Finder(tm)** machines. When customers contact you, they give you the strings they want to have in your machine. Your job is to provide them with the machine that will be as fast as possible, given their spec.

Arithma's question is: A customer has contacted you and given you a set of strings. You build for them a **String Finder(tm)** machine. Given that a single character comparison takes 1s, how much time would the customer have to wait, in the worst case, after they input a string? You are allowed to take as much time as you want when building the **String Finder(tm)**, however, it is *not* allowed to be slow when the customer uses it.

*Last edited by raja (February 8 2012)*

**arithma****Member**

xterm wrote:

I presume there is limitation as well on working on the set prior to starting the character checks.

On the following input:

input = ["fVq9jcBEMNrVl0M2A8VmfNPLgd7kK5JD",

"I9FlZT90gu2n9ywNFBljC7jce2rR0SFJ",

"lGOmu7loH80DbOyaYqzQBGwTPDwHlr8u",

"rFqF5g9u5TXAAXM2sdxHpfZ4VzFBOgNN",

"lGOmu7loH80DbOyfb6NuuWcEP81D7J5r",

"tEANjAjjNwvGfG3uhjjAtFfg3OmYXVK2",

"lGOmu7loH80DOyncyYpajmdnNIGH6HSu",

"zQUwjMzloRmzlR5jYuvaaa9sqo9AH09B",

"JsNn1Hu7J2FjMxoi8t49AogNogfPKlk6",

"lGOmu7loH80DbOyFbtVa1K1m6UxQFvX5",]

str = "lGOmu7loH80DbOyFbtVa1K1m6UxQFvX5"Your code gave an output of 4 compares, which would only mean that you might be acting too much on the given set prior to doing the compares. Whether this is allowed, is really beyond my understanding, we really need to get more requirements info from arithma.

My code gave 83 character compares with a straight up brute force algorithm.

I am a little skeptical about the ability to find an index using only 4 compares, so this might be worth checking.

What I meant earlier is that raja's interpretation is spot on, and after looking at the code, it does seem to capture the right way. I would guess it should fail to halt anytime soon on the above dataset, but again, this is all hunches. It may as well halt with 4.

**xterm****Moderator**

Give me a moment while i explain the differences between raja's implementation and mine (which is insanely subtle)