Finding photos of a known size

Recently I did a friend a favour and installed Linux Mint on her laptop as she was a bit frustrated with Windows. Unfortunately I assumed she’d backed up everything before handing it over to me, so I re-partitioned the whole drive to ext4. She hadn’t.

On the bright side, the computer was quite new and the only thing she wanted from the disk was some photos she’d taken. Well, that just made it my lucky day, because there happens to be a tool specifically for recovering photos (and a myriad of other filetypes) from disks that have been written over: TestDisk (strictly speaking its companion program PhotoRec, which ships in the same package and does the file recovery).

We knew the photos were JPGs, so I let TestDisk do its magic and soon enough it had dumped every picture ever stored on that machine into a new directory. Sounds great, except that when I say every picture I mean every picture, including a few weeks’ worth of cached images from Facebook, advertisements for “singles in your area” and all sorts of other stuff we really didn’t want.

I opened up two or three pictures she identified as photos she had taken and noticed that they were all either 640×480 or 480×640, depending on whether they were landscape or portrait. Now, if only there were a way to check the image dimensions of the pictures dumped in this directory and move the ones with that size to a Photos directory. That’s where Exiv2 comes in: a handy little tool which displays the metadata of various image types, including JPGs! I decided it was unlikely that a picture would have a height or width of 640 pixels and not be a photo, so I simplified my requirements to finding images with a height or width of 640 pixels, and wrote a few lines of bash to find the images matching this requirement and copy them elsewhere.

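# For each subdirectory of the dump, list the images that have a 640-pixel side.
for i in */; do
    exiv2 "$i"* 2> /dev/null | grep "Image size.*640" | sed -e 's/ .*//'
done > photos

mkdir -p here    # make sure the destination directory exists

# Copy every file named in the list into the destination directory.
while read -r j; do
    cp "$j" here/
done < photos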

When given several files, Exiv2 produces, amongst other things, a line for each image containing its file name followed by its dimensions. The first loop goes through every entry in the current directory, runs Exiv2 on the files inside it, and keeps only the lines containing “Image size”, then some characters, then “640”, so that either height or width will match. (It will also match longer numbers that contain 640, such as 1640, which was close enough for my purposes.) Each matching line is then cut at the first space, so that only the file name is left, and the result is written to a file called photos. Now that we have a list of the desired file names, the second loop copies each file in that list to a directory named “here”, where I want the images stored. Problem solved!
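To see what the grep and sed stages do without running a recovery first, here is a minimal sketch that feeds them a few hand-written lines. The line layout is my assumption of what Exiv2 prints for several files, and the file names (dump1/a.jpg and so on) and the function names are made up for illustration.

```shell
# Hand-written lines imitating Exiv2 output for several files.
# Assumed layout: "<file name>  <field>  : <value>".
sample_output() {
    printf '%s\n' \
        'dump1/a.jpg  File size       : 51234 Bytes' \
        'dump1/a.jpg  Image size      : 640 x 480' \
        'dump1/b.jpg  Image size      : 120 x 90' \
        'dump1/c.jpg  Image size      : 480 x 640'
}

# Keep the "Image size" lines that mention 640,
# then cut each line at the first space so only the file name remains.
extract_names() {
    grep "Image size.*640" | sed -e 's/ .*//'
}

sample_output | extract_names
```

With these sample lines, only dump1/a.jpg and dump1/c.jpg come out, one per line. Because the lines are cut at the first space, this only works for file names without spaces in them.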

Afterwards I moved the “here” directory elsewhere and removed the image dump directory. It’s probably possible to do this using only one loop, but this way I was able to read the intermediate file first and check that it listed what looked like the right files. It’s also possible to check for both 640 and 480, but I didn’t feel like it. I felt it would be easier just to glance through the pictures afterwards and delete any bad matches.
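For the record, here is a rough sketch of what the stricter, single-pass version might look like. It is not what I ran: the function names are made up, and the pattern assumes the size is printed as “640 x 480” after a colon, so check it against your own Exiv2 output first.

```shell
# Print the file name from every line whose size is exactly 640 x 480 or 480 x 640.
# The trailing ([^0-9]|$) stops "640 x 4800" from slipping through.
match_exact() {
    grep -E 'Image size[[:space:]]*:[[:space:]]*(640 x 480|480 x 640)([^0-9]|$)' |
        sed -e 's/ .*//'
}

# Read Exiv2-style lines on standard input and copy the matching files
# into the directory given as the first argument.
copy_matches() {
    dest=$1
    mkdir -p "$dest"
    match_exact | while read -r f; do
        cp -- "$f" "$dest"/
    done
}

# Possible usage from inside the dump directory (untested on a real dump):
#   for d in */; do exiv2 "$d"* 2> /dev/null; done | copy_matches here
```

The price of doing it in one go is that there is no intermediate file to look over before anything gets copied.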

This solution worked perfectly and I learnt something new that day, so I’m writing it down so that I don’t forget how it works. Hopefully somebody else will benefit from reading this too. If you do, send me a message on Twitter, or write me an email if you don’t tweet! 😀