Home
Head's Up: I'm in the middle of upgrading my site. Most things are in place, but there are something missing and/or broken including image alt text. Please bear with me while I'm getting things fixed.

Command Line One - Liner to Compare Files with MD5

While working on a tool to eliminate duplicate photos from old hard drives, I wrote this little snippet of code that's worth saving :

if [ $(md5 -q 1.txt) == $(md5 -q 2.txt) ]; then echo "same"; else echo "different"; fi

It's a command line one - liner that generates MD5 hashes for two files, compares them and states if they are the same or different.

For those unfamiliar with MD5, it's a "cryptographic hash function that produces a 128 - bit hash value." The useful part for this snippet is that MD5 can be fed a file of any size and the results is a 32 character string. Most importantly, the same input will always produce the same output and any difference (no matter how minor) creates a large difference in the result. For example, any computer can run MD5 on a file with the contents "asdfasdf - 1" and it will produce the hash signature :

f3748c05e25ca8cce7795d1ec97749b0

If you change the one to a two (i.e. "asdfasdf - 2") the signature changes to :

e9ca151e1882f63c5d05e7958a7527a9

More about this will come in another post, but what this means for a duplicate photo finder is that MD5 hashes can be generated for every photo and then compared. Any two files with the same hash signature are the same* and can be pared down. That is done with a larger program. The little snippet of code is used for verification. It's also useful enough to be broken out to its own.

**Note : For the tech/cryptography minded folks out there, I know that MD5 can have collisions. For what I'm doing, the chances are so small that I'm not worried about it.*