A community for technology geeks in Lebanon.

You are not logged in.

- Topics: Active • Unanswered

Pages: **1**

**arithma****Member**

*This is exercise is an original, so please work with me to see if there is any mishaps.*

Given any string, choose a random position in this string, and place a copy of the original string there.

Example: "abcd" -> "ab**abcd**cd". It could have equally likely been "abc**abcd**d" or "a**abcd**bcd" or even "**abcd**abcd".

Now given a string resulting from this operation, return the possible original strings:

Example: "xyxyzz". The solution will contain this item: xyz (from "xy**xyz**z). There will not be any other alternative in this case.

The algorithm data specification is to take a string as input and to return a list of strings as output. Each item in the output should be mappable through the self copy operation to the input string.

**scatman****Member**

i assume that there could be duplicate letters right?

ie aaabcabc has a single copy operation of: aabc?

if its not the case (ie no duplicates) the problem is trivial!

**arithma****Member**

Yes of course. It's trivial in all cases really, as I've just discovered with a friend at work. But let's probe our thinking styles and esoteric solutions.

**ali.koubeissi****Member**

Ummmm. Sleep Sort to the rescue!.

**xterm****Moderator**

The hatred grows.

input = "aabaabcc"; input.toList().subsequences().find { input - it.join() == it.join() }.join()

Test here

**xterm****Moderator**

ali.koubeissi wrote:

Ummmm. Sleep Sort to the rescue!.

Oh lord, this is incredibly funny!

**jsaade****Member**

xterm wrote:

The hatred grows.

input = "aabaabcc"; input.toList().subsequences().find { input - it.join() == it.join() }.join()

Test here

damn.

**xterm****Moderator**

Here's one that's exponentially more performant:

input = "aabaabcc"; (0..input.size()/2).collect { input[it..it+input.size()/2-1] }.find { input - it == it }

**MSD****Member**

C#

```
private List<string> FindOriginals(string input)
{
List<string> result = new List<string>();
if(input.Length % 2 != 0) return result;
int halfSize = input.Length / 2;
for(int i = 0; i < halfSize; i++)
{
string selection = input.Substring(i, halfSize);
if(input.Replace(selection, string.Empty) == selection)
result.Add(selection);
}
return result;
}
```

**arithma****Member**

When I reread the title, I figured that I didn't notice the comedy that it is.

Here's little bit of C# that looks a bit much like Groovy

```
var input = "I inserted it iI inserted it in my selfn my self";
var hLength = input.Length / 2;
var results =
Enumerable
.Range(0, input.Length / 2)
.Select(i => new { i = i,
insert = input.Substring(i, hLength),
rest = input.Substring(0, i) + input.Substring(i + hLength) })
.Where(x => x.insert == x.rest)
.Select(x => x.insert)
.ToList();
MessageBox.Show(String.Join(",\n", results));
```

Ok, it's a bit inflated with ceremony.

[EDIT] *Will I ever be able to post without having to edit it a second later!*

*Last edited by arithma (June 16 2011)*

**xterm****Moderator**

You know, i refrained from commenting on the title, but i chuckled when i read it.

**arithma****Member**

While exercise may be a little easy, try this for a brain scratch: Try to prove (or disprove, I haven't reached a solution) that there can at most one unique solution for any input string.

**Joe****Member**

Although xterm's solution is very elegant, it actually fails to address the performance issue. It's easy to brute force (I'm not saying Groovy does, but it might), but coming up with the best algorithm for the current issue can be very tricky.

The exercise is far from easy. There are just two ways to think about it. The "dynamic" Groovy/Perl/Python way is not easy or cheap. It just addresses the issue differently.

If you think of the **algorithm** to solve that issue, it becomes different. Enumerable/Range/Where, doesn't tell doesn't tell you anything about the complexity. How many loops? how many checks? If you care about this stuff, this exercise is actually pretty interesting.

**MSD****Member**

Now this is an informal proof, not tidy you have been warned.

Assume abcde has 2 solutions (a b c d and e are not letters, they represent substrings of input)

sol 1:

abe = cd

sol 2:

ade = bc

a, c, and e have a common start

d, e, and c have a common end

so

xAxBEy = xCyDy

xADyEy = xBxCy

AxBE = CyD

ADyE = BxC

now

A, C, and B have a common start

D, E, and C have a common end

XA'xXB'E'Y = XC'YD'Y

XA'D'YyE'Y = XB'xXC'Y

A'xXB'E' = C'YD'

A'D'YyE' = B'xXC'

simplifying

AxBE = CyD

ADzE = BxC

repeating like this leads to equality of A, B, C, D, E

I think this means that there is one solution only.

**arithma****Member**

@rahmu: My code is line for line equivalent to MSD's. Additionally its parallelizable. So not only is the complexity deterministic, it also encodes the parallesim of the problem.

I think I cooked up a little proof, but it needs some peer review. I'll share the scratch with you.

The idea is to start with an already factored string, assume there is another factoring out, and then proving that the other factoring must equal the first one.

The following works based on the assumption that XXYY is a valid factorization. Then it assumes UUVV is another valid factorization where |U| (length) is greater or equal to that of |X|. It then splits X into two parts such that U = X1+X2+X1. It proves the property that U = X by determining that the length of X1 is 0.

The sketch of the proof follows:

*Last edited by arithma (June 17 2011)*

**arithma****Member**

The above proof is a little bit off.

The correct way is:

X1X1X2X1Y1Y2Y1Y1 = X1X2Y1Y2

in terms of length: 3*X1+X2+3*Y1+Y2=X1+X2+Y1+Y2

2*X1+2*Y1=0. Since X1 and Y1 are not negative, they both have to be 0 to satisfy the equation.

**xterm****Moderator**

*Edit: Pointless blabber about pointless things*

*Last edited by xterm (June 17 2011)*

**arithma****Member**

@xterm: Does groovy evaluate the lambda's inside lazily or eagerly?

This is for the sake of comparison with LINQ's lazy technique.

Since we just proved that there should only be one solution, my algorithm should return a single string or null, and should use SingleOrDefault(...) instead of Where(...). SingleOrDefault will halt the preceding enumeration of integers. Interesting stuff would happen in the Parallel case, if they have implemented things exhaustively performace oriented (which you would expect from a **parallel** implementation).

**xterm****Moderator**

In groovy anything involving an iterator is lazily evaluated. I'll find the source code for you soon.

**arithma****Member**

I dreamt that I was xterm. In the dream I said to my machine:

"this is how the forward function works, and this is what the output looks like. figure out the input string and fetch me a capuccino"

**arithma****Member**

This is a C++ version. There are a few pain points that annoyed me. I am becoming a C# programmer. Farewell C++. (Not really).

```
string input = "I inserted it I inserted it in myselfin myself";
int length = input.size();
int hLength = input.size()/2;
vector<string> subsequences(hLength);
int i = 0;
generate(subsequences.begin(), subsequences.end(),
[=, &i]() { return input.substr(i++, hLength); } );
i = 0;
auto found = find_if(subsequences.begin(), subsequences.end(),
[=, &i](string sub) -> bool {
string rhs = input.substr(0, i).append(input.substr(i + hLength, length - (i+hLength)));
i++;
return sub == rhs;
});
if(found != subsequences.end()){
cout << *found << endl;
}
```

**rolf****Member**

PHP. It's quite verbose, could be reduced by about half.

I'm not bringing anything new, the same algorithm in a different language, so sorry if resurrecting that thread poses a problem. There is probably a more efficient way to do it, CPU-cycles-wise.

```
<pre>
<?php
function findSolutions($input) {
$solutions = array();
$inputLen = strlen($input);
$partLen = $inputLen / 2;
for ($i=0; $i<=$partLen; $i++) {
$preChunk = substr($input, 0, $i);
$chunk = substr($input, $i, $partLen);
$postChunk = substr($input, $i+$partLen);
// print_r("Testing $chunk against $preChunk.$postChunk \n");
if($chunk == $preChunk.$postChunk) {
$solutions[$i] = $chunk;
}
}
return($solutions);
}
$input = "ababcc";
$solutions = findSolutions($input);
echo "Input: $input \n";
echo "Solutions, [position] => string \n";
print_r($solutions);
?>
</pre>
```

Pages: **1**