• Coding
  • [Exercise] Condition symmetry

This is based on a real-life scenario I came across at my job recently. Our scheduler is the program responsible for batch management. It basically manages the dependencies between jobs.

Let's take the following example:
A---B
    \---.---.---H
            /
    /---E---F
C---D
    \---G
You can understand this wonderful drawing like this:

A and C will start immediately in parallel.
B will execute only after A is finished. D waits for C
E and G run in parallel after D
F runs once E is done.
H waits for F and B to finish.

Internally (I'm super-simplifying), the program handles dependency by assigning COND to each job:
- IN COND if the job should wait for another one to finish.
- OUT COND if the job should notify another waiting one that it's done.

To make things clearer, let's take a look at this CSV extract of the database.
# Jobname; Incond; Outcuond;

A; ; B;
B; A; H;
C; ; D;
D; C; E G;
E; K D; F;
F; E; H;
G; D; ;
H; B F G;
- Incond and outcond are supposed to be a list of jobnames separated by a whitespace (look at job D and H).

- In order to create the dependency A---B, we need to define
Job A: OUTCOND B
Job B: INCOND A (look at the first two lines of the CSV).

- Job H has 3 INCONDS, one of them is G. But G doesn't have H as OUTCOND so the dependency is not created.

- Job E has K as INCOND. But job K does not exist. The condition is ignored.
Side note
This situation happens a lot. Usually, job K existed before but got deleted in a terribly unelegant way.

Unfortunately, we often have to modify that file by hand, and many of us make mistakes. It's easy to forget an existing condition when deleting a job. Our work CSV file has over 6500 entries, uses the worst naming convention of all time (ALL_CAPS_WITH_NUMBERS_004); it can get messy.


Exercise:
The goal of this exercise is to write a script that would parse the given CSV file and detect all the non valid conditions. A condition is considered valid if it is symmetrically matched.

Bonus points if your script can make the difference between a missing condition and a deleted job.
Nice exercise, the concept reminds me of operation scheduling in Operations Research. I'll try to think about it when I have time.
Job A: INCOND B
Job B: INCOND A (look at the first two lines of the CSV).
Just a little question, first line shouldn't it be Job A: OUTCOND B in this case?
Yes thank you, I corrected it.
btw, here's my answer in my new pet language AWK, abusing the power of associative arrays.
#!/usr/bin/awk -f

BEGIN {
    FS=" *; *";
    delim = "-";
    conds[""]=0;
}

{

    icnd_size = split($2, incond_list, " ");

    for (i=1; i<=icnd_size; ++i) {
    conds[incond_list[i] delim $1]++;
    }

    ocnd_size = split($3, outcond_list, " ");

    for (i=1; i<=ocnd_size; ++i) {
    conds[$1 delim outcond_list[i]]--;
    }

}

END {

    for (i in conds) {
    sz = split(i, answer, delim);

    if (conds[i] == 1) {
        j = answer[2];
        c = answer[1];
        inorout = "INCOND";
    }

    if (conds[i] == -1) {
        j = answer[1];
        c = answer[2];
        inorout = "OUTCOND";
    }

    if (conds[i] != 0)
        print "Invalid", inorout, c, "on job", j;
    }
}
Fear my drawing powers:
A   C
|   |
B   D
|  /\
| E  G
| |
| F
|/
H
Bonus question: Write a script that reads the csv file and output a ascii drawing (like mine or xterm's).