Exercise - Self Insertion

arithma · Jun 16, 2011

While exercise may be a little easy, try this for a brain scratch: Try to prove (or disprove, I haven't reached a solution) that there can at most one unique solution for any input string.

Joe · Jun 16, 2011

Although xterm's solution is very elegant, it actually fails to address the performance issue. It's easy to brute force (I'm not saying Groovy does, but it might), but coming up with the best algorithm for the current issue can be very tricky.

The exercise is far from easy. There are just two ways to think about it. The "dynamic" Groovy/Perl/Python way is not easy or cheap. It just addresses the issue differently.

If you think of the algorithm to solve that issue, it becomes different. Enumerable/Range/Where, doesn't tell doesn't tell you anything about the complexity. How many loops? how many checks? If you care about this stuff, this exercise is actually pretty interesting.

MSD · Jun 16, 2011

Now this is an informal proof, not tidy you have been warned.

Assume abcde has 2 solutions (a b c d and e are not letters, they represent substrings of input)

sol 1:
abe = cd

sol 2:
ade = bc

a, c, and e have a common start
d, e, and c have a common end

so

xAxBEy = xCyDy

xADyEy = xBxCy

AxBE = CyD
ADyE = BxC

now
A, C, and B have a common start
D, E, and C have a common end

XA'xXB'E'Y = XC'YD'Y
XA'D'YyE'Y = XB'xXC'Y

A'xXB'E' = C'YD'
A'D'YyE' = B'xXC'

simplifying
AxBE = CyD
ADzE = BxC

repeating like this leads to equality of A, B, C, D, E

I think this means that there is one solution only.

arithma · Jun 17, 2011

@rahmu: My code is line for line equivalent to MSD's. Additionally its parallelizable. So not only is the complexity deterministic, it also encodes the parallesim of the problem.
I think I cooked up a little proof, but it needs some peer review. I'll share the scratch with you.
The idea is to start with an already factored string, assume there is another factoring out, and then proving that the other factoring must equal the first one.
The following works based on the assumption that XXYY is a valid factorization. Then it assumes UUVV is another valid factorization where |U| (length) is greater or equal to that of |X|. It then splits X into two parts such that U = X1+X2+X1. It proves the property that U = X by determining that the length of X1 is 0.
The sketch of the proof follows:

arithma · Jun 17, 2011

The above proof is a little bit off.
The correct way is:
X1X1X2X1Y1Y2Y1Y1 = X1X2Y1Y2
in terms of length: 3*X1+X2+3*Y1+Y2=X1+X2+Y1+Y2
2*X1+2*Y1=0. Since X1 and Y1 are not negative, they both have to be 0 to satisfy the equation.

xterm · Jun 17, 2011

Edit: Pointless blabber about pointless things

arithma · Jun 17, 2011

@xterm: Does groovy evaluate the lambda's inside lazily or eagerly?
This is for the sake of comparison with LINQ's lazy technique.
Since we just proved that there should only be one solution, my algorithm should return a single string or null, and should use SingleOrDefault(...) instead of Where(...). SingleOrDefault will halt the preceding enumeration of integers. Interesting stuff would happen in the Parallel case, if they have implemented things exhaustively performace oriented (which you would expect from a parallel implementation).

xterm · Jun 17, 2011

In groovy anything involving an iterator is lazily evaluated. I'll find the source code for you soon.

arithma · Jun 17, 2011

I dreamt that I was xterm. In the dream I said to my machine:

"this is how the forward function works, and this is what the output looks like. figure out the input string and fetch me a capuccino"

arithma · Jun 18, 2011

This is a C++ version. There are a few pain points that annoyed me. I am becoming a C# programmer. Farewell C++. (Not really).

string input = "I inserted it I inserted it in myselfin myself";
int length = input.size();
int hLength = input.size()/2;
vector<string> subsequences(hLength);

int i = 0;
generate(subsequences.begin(), subsequences.end(),
	[=, &i]() { return input.substr(i++, hLength); } );

i = 0;
auto found = find_if(subsequences.begin(), subsequences.end(),
	[=, &i](string sub) -> bool {
		string rhs = input.substr(0, i).append(input.substr(i + hLength, length - (i+hLength)));
		i++;
		return sub == rhs;
});
if(found != subsequences.end()){
	cout << *found << endl;
}

rolf · May 2, 2012

PHP. It's quite verbose, could be reduced by about half.
I'm not bringing anything new, the same algorithm in a different language, so sorry if resurrecting that thread poses a problem. There is probably a more efficient way to do it, CPU-cycles-wise.

<pre>
<?php

function findSolutions($input) {
	$solutions = array();
	$inputLen = strlen($input);
	$partLen = $inputLen / 2;
	for ($i=0; $i<=$partLen; $i++) {
		$preChunk = substr($input, 0, $i);
		$chunk = substr($input, $i, $partLen);
		$postChunk = substr($input, $i+$partLen);
		// print_r("Testing $chunk against $preChunk.$postChunk \n");
		if($chunk == $preChunk.$postChunk) {
			$solutions[$i] = $chunk;
		}
	}
	return($solutions);
}


$input = "ababcc";
$solutions = findSolutions($input);
echo "Input: $input \n";
echo "Solutions, [position] => string \n";
print_r($solutions);

?>
</pre>