Command Line One-Liner to Compare Files with MD5

While working on a tool to eliminate duplicate photos from old hard drives, I wrote this little snippet of code that's worth saving:

if [ "$(md5 -q 1.txt)" = "$(md5 -q 2.txt)" ]; then echo "same"; else echo "different"; fi

It's a command line one-liner that generates MD5 hashes for two files, compares them, and prints whether they are the same or different.
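The `md5 -q` command in the snippet is the macOS/BSD tool; most Linux systems ship `md5sum` instead. Here's a rough sketch of a longer version that works with either one. The function names `md5_of` and `compare_md5` are made up for this example and aren't part of the original snippet.

```shell
# Print only the MD5 digest of one file, using whichever tool is installed.
md5_of() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum < "$1" | cut -d ' ' -f 1   # GNU coreutils prints "digest  -" for stdin
  else
    md5 -q "$1"                       # macOS/BSD: -q prints just the digest
  fi
}

# Hypothetical helper: prints "same" or "different" for two file paths.
compare_md5() {
  if [ ! -r "$1" ] || [ ! -r "$2" ]; then
    echo "usage: compare_md5 FILE1 FILE2 (both must be readable)" >&2
    return 2
  fi
  if [ "$(md5_of "$1")" = "$(md5_of "$2")" ]; then
    echo "same"
  else
    echo "different"
  fi
}
```

Once those functions are loaded into a shell, `compare_md5 1.txt 2.txt` does the same job as the one-liner. The readability check is there because two missing files would otherwise both hash to an empty string and be reported as "same".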

For those unfamiliar with MD5, it's a "cryptographic hash function that produces a 128-bit hash value." The useful part for this snippet is that MD5 can be fed a file of any size and the result is always a 32-character hexadecimal string. Most importantly, the same input will always produce the same output, and any difference (no matter how minor) creates a large difference in the result. For example, any computer can run MD5 on a file with the contents "asdfasdf-1" and it will produce the hash signature:

f3748c05e25ca8cce7795d1ec97749b0

If you change the one to a two (i.e., "asdfasdf-2"), the signature changes to:

e9ca151e1882f63c5d05e7958a7527a9
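If you want to try this yourself, here's a small sketch that hashes the two strings directly instead of reading them from files. The `md5_text` function is a made-up helper for this example. It feeds the text in without a trailing newline, so whether the digests match the ones above depends on whether the original files ended with a newline.

```shell
# Hash a string (with no trailing newline) using whichever MD5 tool is available.
md5_text() {
  if command -v md5sum >/dev/null 2>&1; then
    printf '%s' "$1" | md5sum | cut -d ' ' -f 1   # GNU coreutils
  else
    printf '%s' "$1" | md5 -q                     # macOS/BSD
  fi
}

first=$(md5_text "asdfasdf-1")
second=$(md5_text "asdfasdf-2")

echo "$first"
echo "$second"
echo "digest length: ${#first} characters"
```

The two digests printed should look nothing alike even though the inputs differ by a single character, and running it again should give the same two digests every time.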

More about this will come in another post, but what this means for a duplicate photo finder is that MD5 hashes can be generated for every photo and then compared. Any two files with the same hash signature are the same* and can be pared down to a single copy. That is done with a larger program. The little snippet of code is used for verification. It's also useful enough to be broken out on its own.

*Note: For the tech/cryptography-minded folks out there, I know that MD5 can have collisions. For what I'm doing, the chances are so small that I'm not worried about it.
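The larger program isn't part of this post, but the hash-everything-then-compare idea can be sketched in a few lines of shell. This is only an illustration under some assumptions: `find_duplicates` is a made-up name, it expects either `md5sum` or the macOS/BSD `md5` to be installed, and it assumes file names don't contain newlines or backslashes.

```shell
# Hypothetical sketch: print "digest path" lines for every file under a
# directory whose MD5 digest is shared with at least one other file.
find_duplicates() {
  if command -v md5sum >/dev/null 2>&1; then
    find "$1" -type f -exec md5sum {} +    # GNU coreutils: "digest  path"
  else
    find "$1" -type f -exec md5 -r {} +    # macOS/BSD: -r prints "digest path"
  fi |
    sort |                                 # identical digests end up on adjacent lines
    awk '
      {
        digest = substr($0, 1, 32)         # an MD5 digest is 32 hex characters
        if (digest == prev_digest) {
          if (!in_group) { print prev_line; in_group = 1 }
          print
        } else {
          in_group = 0
        }
        prev_digest = digest
        prev_line = $0
      }'
}
```

Running `find_duplicates ~/Pictures` (the path is just an example) lists only the files that have at least one twin, grouped by digest. It only reports; deciding which copy to keep is left to a person or to a bigger tool.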


-- categories
-- Miscellaneous

-- metadata
-- date: 2013-07-08 00:00:00
-- id: 20emw1ir
-- status: published
-- type: post
-- SCRUBBED_NEO: false
-- site: aws