I chose CSV files because they're particularly easy to manipulate, and almost every (dynamic) language includes some sort of package to parse them easily ^^.
Consider the following file:
before.csv
A; ; B;
B; A; H;
C; ; D;
D; C; E G;
E; K D; F;
F; E; H;
G; D; ;
H; B F G;
And a modified version of the file:
after.csv
A; ; B;
B; A; H;
C; ; D;
D; ; E G;
E; K D; F;
F; E; H;
G; D; ;
K; ; E;
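One way to read files in this layout is Python's standard `csv` module with `;` as the delimiter. This is only a minimal sketch: `load_rows` is a hypothetical helper name, and stripping the whitespace around each field and ignoring the empty field after the trailing `;` are assumptions about how the format should be interpreted.

```python
import csv
import io

def load_rows(stream):
    """Map each row's identifier (first field) to a tuple of its remaining fields."""
    rows = {}
    for record in csv.reader(stream, delimiter=";"):
        fields = [field.strip() for field in record]
        # Assumption: the trailing ';' yields one empty last field that carries no data.
        if fields and fields[-1] == "":
            fields.pop()
        if not fields:
            continue  # skip blank lines
        rows[fields[0]] = tuple(fields[1:])
    return rows

# In-memory sample; a real script would pass an open file object instead.
sample = io.StringIO("A; ; B;\nB; A; H;\nH; B F G;\n")
rows = load_rows(sample)
```

Keying the rows by identifier is what later makes each lookup cheap, instead of rescanning the file for every line.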
The first field of the CSV is a unique identifier of each line. The exercise consists of detecting the changes applied to the file by comparing before.csv and after.csv.
There are 3 types of changes you should detect:
- ADDED (line is present in after.csv but not in before.csv)
- REMOVED (line is present in before.csv but not in after.csv)
- MODIFIED (line is present in both, but the second and/or third field differs)
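These three categories map naturally onto set operations over the identifiers. The sketch below assumes each file has already been loaded into a dict from identifier to the remaining fields; that dict shape, the name `classify`, and the sample data are illustrative and not part of the exercise.

```python
def classify(before, after):
    """Return (added, removed, modified) identifier lists, each sorted."""
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    # An identifier present in both files is MODIFIED if any other field differs.
    modified = sorted(
        key for key in before.keys() & after.keys() if before[key] != after[key]
    )
    return added, removed, modified

# Hypothetical sample data, unrelated to the files shown in this post.
before = {"X": ("", "Y"), "Y": ("X", ""), "Z": ("Y", "")}
after = {"X": ("", "Y"), "Y": ("", ""), "W": ("", "X")}
added, removed, modified = classify(before, after)
```

The sorting is only there to make the output stable; the set operations themselves take time roughly proportional to the number of identifiers.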
In my example, there are three modifications:
- ADDED line (K)
- REMOVED line (H)
- MODIFIED line (D)
Added complexity: the script is supposed to run on very large files, so try to avoid a quadratic solution like this one:
BEWARE!! Bad code, do not do this!
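# before_csv and after_csv are assumed to be plain lists of lines,
# so every "in" test rescans the whole list: O(n * m) overall.
for line in before_csv:
    if line not in after_csv:
        print("REMOVED " + line)
for line in after_csv:
    if line not in before_csv:
        print("ADDED " + line)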
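A linear-time alternative is to index one file by identifier and stream the other one past it. What follows is one possible approach, not the expected answer: `parse` and `diff_streams` are hypothetical names, lines are split on `;` by hand, the "before" file is assumed to fit in memory as a dict, and the sample data is made up.

```python
import io

def parse(line):
    """Split 'id; f2; f3;' into (id, (f2, f3)), dropping whitespace around fields."""
    fields = [field.strip() for field in line.rstrip("\n").split(";")]
    if fields and fields[-1] == "":
        fields.pop()  # assumption: ignore the empty field after the trailing ';'
    return fields[0], tuple(fields[1:])

def diff_streams(before_stream, after_stream):
    """Yield (change, identifier) pairs using one pass over each input."""
    # Index the "before" file once; each lookup below is O(1) on average.
    remaining = dict(parse(line) for line in before_stream if line.strip())
    for line in after_stream:
        if not line.strip():
            continue  # skip blank lines
        key, fields = parse(line)
        if key not in remaining:
            yield "ADDED", key
        elif remaining.pop(key) != fields:  # pop so matched ids are not reported as removed
            yield "MODIFIED", key
    # Whatever was never matched by the "after" file has been removed.
    for key in remaining:
        yield "REMOVED", key

before = io.StringIO("X; ; Y;\nY; X; ;\nZ; Y; ;\n")
after = io.StringIO("X; ; Y;\nY; ; ;\nW; ; X;\n")
changes = list(diff_streams(before, after))
```

Only the "before" file is held in memory, and the "after" file is read line by line, so the running time grows with the combined size of the two files rather than with their product.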