Recommend a tool/approach--diff summaries of directory contents
-
I need a way of summarizing the differences between two archived directories, preferably in a scriptable way.
My Web Design students turn in checkpoints (an archive snapshot of their current progress). I'd like to be able to analyze each successive checkpoint to determine roughly how much progress they made each week without having to do it manually. These are HTML/CSS/jpg files, and all I care about with the images is a binary added/removed flag.
My napkin design goes something like this--
For each file in the current archive, check if there's a file with the same name in the previous archive. If not, the file is new. If so, check the last changed timestamps and sizes--if they're both the same, the file is the same. If it's changed, take the absolute value of the % change in file size as the % of the file that's changed. Check for deleted files by walking through the previous archive and comparing to the current archive. Summarize the total something like
+ N files
- M files
File 1 changed X%
File 2 changed Y%
...
Is there a better way/tool to do this?
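For reference, the napkin design above could be sketched roughly as follows, assuming both checkpoints have already been extracted into directories. The function name summarize_checkpoints is hypothetical. One swap from the description: it uses cmp to decide whether a file changed, rather than timestamps and sizes, since timestamps may not survive archiving the way you expect. The percentage is still the absolute size change, so an edit that leaves the size untouched shows up as 0%.

```bash
#!/usr/bin/env bash
# Sketch only: compare two extracted checkpoint directories.
# summarize_checkpoints is a hypothetical name, not an existing tool.
summarize_checkpoints() {
    local old=${1%/} new=${2%/}
    local added=0 removed=0 changed="" f rel old_size new_size delta pct

    # Walk the new checkpoint: each file is either added, changed, or the same.
    while IFS= read -r -d '' f; do
        rel=${f#"$new"/}
        if [[ ! -f "$old/$rel" ]]; then
            added=$((added + 1))
        elif ! cmp -s "$old/$rel" "$f"; then
            old_size=$(wc -c < "$old/$rel")
            new_size=$(wc -c < "$f")
            delta=$((new_size - old_size))
            delta=${delta#-}                 # absolute value
            if (( old_size > 0 )); then
                pct=$((100 * delta / old_size))
            else
                pct=100                      # assumption: empty -> non-empty counts as 100%
            fi
            changed+="$rel changed ${pct}%"$'\n'
        fi
    done < <(find "$new" -type f -print0 | sort -z)

    # Walk the old checkpoint to find files that no longer exist.
    while IFS= read -r -d '' f; do
        rel=${f#"$old"/}
        [[ -f "$new/$rel" ]] || removed=$((removed + 1))
    done < <(find "$old" -type f -print0)

    printf '+ %d files\n- %d files\n%s' "$added" "$removed" "$changed"
}
```

Called as summarize_checkpoints week1/ week2/, it prints the two counts followed by one line per changed file.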
-
How about something like:
diff <(tar -tvf old.tgz | sort) <(tar -tvf new.tgz | sort)
If the archives are zips, adapt the listing commands as appropriate.
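If only the added/removed flag matters (as for the images), a variation on the same idea is to compare names only, so that timestamps and sizes in the verbose listing do not show up as differences. A minimal sketch, assuming tar archives; list_added and list_removed are hypothetical helper names:

```bash
#!/usr/bin/env bash
# Sketch only: names-only comparison of two tar archives.
# comm expects sorted input; -13 keeps lines unique to the second listing,
# -23 keeps lines unique to the first.
list_added()   { comm -13 <(tar -tf "$1" | sort) <(tar -tf "$2" | sort); }
list_removed() { comm -23 <(tar -tf "$1" | sort) <(tar -tf "$2" | sort); }
```

For example, list_added old.tgz new.tgz prints one path per line for files present only in the newer archive.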
-
@pleegwat said in Recommend a tool/approach--diff summaries of directory contents:
How about something like:
diff <(tar -tvf old.tgz | sort) <(tar -tvf new.tgz | sort)
If the archives are zips, adapt the listing commands as appropriate.
I just found that diff option. I hadn't known that diff would handle whole directories. That basically solves the problem.
Thanks!
-
@benjamin-hall Diffing the directories works as well (with -r), but that's not what this does and may give more output. My example extracts the file listings from the archives, and diffs that.
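For comparison, a minimal sketch of the directory approach, assuming the archives have been extracted first. Adding -q keeps the output down to one line per file; dir_changes is a hypothetical wrapper name.

```bash
#!/usr/bin/env bash
# Sketch only: recursive directory comparison.
dir_changes() {
    # -r recurses into subdirectories; -q reports only which files differ
    # or exist on one side, not the line-by-line differences.
    # diff exits with status 1 when it finds differences.
    diff -rq "$1" "$2"
}
```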
-
@pleegwat could you please explain what the syntax means? It looks like redirecting outputs of both commands to diff's stdin - how does diff recognize where the second file starts?
-
@gąska said in Recommend a tool/approach--diff summaries of directory contents:
@pleegwat could you please explain what the syntax means? It looks like redirecting outputs of both commands to diff's stdin - how does diff recognize where the second file starts?
According to SO:
The arguments to diff will look like /dev/fd/3 and /dev/fd/4: they are file descriptors corresponding to two pipes created by bash. When diff opens these files, it'll be connected to the read side of each pipe. The write side of each pipe is connected to the tar command.
So, shell magic basically. Documentation here.
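One way to see this for yourself is to hand the construct to a command that just prints its arguments. A small sketch (show_substitution_args is a hypothetical name); the exact descriptor numbers vary by system:

```bash
#!/usr/bin/env bash
# Sketch only: printf receives two file names, not the commands' output.
show_substitution_args() {
    printf '%s\n' <(true) <(true)
}
```

On Linux this typically prints two paths under /dev/fd, one per substitution.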
-
@gąska As @heterodox already mentioned, the output does not go to stdin. The part between the round brackets is executed in a subshell, its standard output is assigned to a file descriptor (typically I've seen it start at 63 and work downward), and the construct is replaced by a file name of the form /dev/fd/63 for the main command. This allows you to send the output of multiple commands into (in this case) diff without having to create temporary files. As far as diff is concerned, it is comparing /dev/fd/62 and /dev/fd/63, both of which are pipes (bash falls back to named pipes on systems without /dev/fd).
There is a similar construct >(command) which pipes in the other direction.
This construct can be particularly useful in situations where you want some part of a pipeline to run in the main shell:
while read -r line; do
    # some code which sets variables, for example:
    count=$((count + 1))
done < <(zcat data.gz)
In the (more readable) zcat data.gz | while read line notation, the loop runs in a subshell and it cannot set variables in the main shell.
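The difference between the two notations is easy to demonstrate. A minimal sketch, with hypothetical function names, assuming bash's default settings (the lastpipe option not enabled):

```bash
#!/usr/bin/env bash
# Sketch only: the same loop fed by a pipeline and by process substitution.
count_piped() {
    local n=0
    # The loop is the last stage of a pipeline, so it runs in a subshell
    # and its updates to n are lost when the pipeline ends.
    printf 'a\nb\nc\n' | while read -r _; do n=$((n + 1)); done
    echo "$n"
}

count_substituted() {
    local n=0
    # The loop runs in the current shell; only printf runs in a subshell.
    while read -r _; do n=$((n + 1)); done < <(printf 'a\nb\nc\n')
    echo "$n"
}
```

With those defaults, count_piped prints 0 and count_substituted prints 3.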
-
@heterodox why does every piece of knowledge about shell scripting make me hate it more?
-
@gąska said in Recommend a tool/approach--diff summaries of directory contents:
@heterodox why does every piece of knowledge about shell scripting make me hate it more?
Because you're still (somewhat) sane?
-
@benjamin-hall except I can read G++'s template errors just fine.
-
@gąska said in Recommend a tool/approach--diff summaries of directory contents:
@benjamin-hall except I can read G++'s template errors just fine.
That... makes you sane?
-
@heterodox said in Recommend a tool/approach--diff summaries of directory contents:
@gąska said in Recommend a tool/approach--diff summaries of directory contents:
@benjamin-hall except I can read G++'s template errors just fine.
That... makes you sane?
Nah - @Benjamin-Hall said he was sane. @Gąska countered with that example.
-
@dcon Ah, yes. It’d help if I myself were literate.