This post will translate eventually into my blog. I will put back a reference back to this thread to give credit for those joining in the conversation.
Informally, given a method, can we break it up into smaller parts that are more comprehensible to the programmer. I want to define this problem formally so that I can come up with a solid algorithmic solution that can be worked out by hand or even possibly becoming automated. In either way, it has to be completely mechanical.
And since we don't have a model of human comprehension, or can't afford evolutionary techniques against people (which as I am writing this, makes even more sense, and could actually be possible). My best bet is to define some constraints that will guarantee a more simple method is output from a more complex one.
The most naive thing the refactorer will do is to return the method as is. The other naive thing it can do is to encapsulate each statement into a new method. Both of these are utterly useless. None the less, we have defined the 'range' of the algorithm, and actually have already bounded the solution (even without having totally defined the problem yet).
The algorithm stability and sanity pivots around another point. If a method is refactored into atomic methods, the algorithm applied to these methods will be an identity operation (will leave them unchanged).
To simplify the whole thing, so that I can start with some lead, we will assume that all the methods we define are actually context-less functions. The form of these functions is: function (input1, input2, input3, ...) => (output1, output2, output3, ...).
This all should be taken into a functional context (there are no state variables, only symbols that are bound to expressions or values).
Using pseudo code, here goes
naive_example: (x1, x2) => (y1, y2)
y1 <= x2;
y2 <= x1 * x2;
The output from the algorithm must come out like the following:
naive_example: (x1, x2) => (y1, y2)
y1 <= atomic1(x2);
y2 <= atomic2(x1, x2);
atomic1: (x) => (y)
y <= x;
atomic2: (x1, x2) => (y)
y <= x1 * x2;
This is the most naive example I could think of at the moment, and I already spotted a lot of problems with the rigidity of the definition of a function that I am using. Additionally there is a problem with singly bound variables that I do want to avoid having. I need variables to be more than once assignable, which can be solved by using multiple instances of a variable (x becomes x1, x2, x3...)
Share your thoughts, I will think more about all of this and expand on this later.