Photo deduplication software



  • Some of you may recall that about a month ago I started a major project scanning photos:
    @hardwaregeek said in The Official Status Thread:

    Status: My ex is getting ready to move out of state and wants to get together and sort out photos before she goes. ... In another attempt to minimize conflict over who gets to keep each of these pictures, I have embarked on a major project to scan everything we don't have two of.

    Since I started out scanning only some of the photos, keeping track of which ones I've scanned and which I haven't is a bit of a problem. I currently have almost 1300 images scanned, and I'll probably be over 2000 before I'm done. Scrolling through multiple pages of little thumbnails is not a good way to find whether I've already scanned a particular picture.

    A bit of googling reveals that most data deduplication software just looks for files that are byte-for-byte identical, which is not useful for scanned images: insignificant differences in cropping (by a couple of pixels) or exposure are enough to make the files non-identical. So far, I've found a couple of programs. One (Duplicate Cleaner, digitalvolcano.co.uk) handles photos only in its paid version, which is OK since the price is quite reasonable, but the article that mentions it doesn't really say anything about how well it works. The other (Free Duplicate Photo Finder, freepicturesolutions.com) is free and seems to work well, but it was abandoned by its author several years ago.

    Anybody have other suggestions?
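
To illustrate why byte-for-byte comparison falls short here, a minimal Python sketch using only the standard library. The function names `file_digest` and `exact_duplicates` are made up for this example and are not taken from any of the programs mentioned. It groups files by SHA-256 digest, so two scans of the same photo that differ by even a single byte end up in different groups.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def exact_duplicates(folder):
    """Return lists of file names whose contents are byte-for-byte identical."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            groups[file_digest(path)].append(path.name)
    # Only groups with more than one member are duplicates.
    return [names for names in groups.values() if len(names) > 1]
```

A rescan with slightly different cropping or exposure produces a different digest, which is why this approach misses it.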


  • BINNED

    If I remember correctly, Picasa used to be able to find duplicates back when I used it years ago. It also had nice features like face recognition, so it probably used a reasonable difference metric instead of byte comparison.
    It's discontinued as far as I know, but since they're big in the ML stuff, I would guess there's at least one free google app doing what you need.

    Sorry for not being more precise.


  • 🚽 Regular

    @hardwaregeek

    Disclaimer: I haven't actually tried this.

    The demo script find_similar_images illustrates how to find similar images in a directory.

    You can get imagehash through pip install imagehash.
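
A rough sketch of the same idea, untested against real photos and not the demo script itself. It assumes `imagehash` and Pillow are installed; the helper names `hash_directory` and `similar_pairs`, the extension list, and the `max_distance=6` cutoff are all assumptions chosen for illustration. Subtracting two `ImageHash` objects gives the number of differing bits.

```python
from itertools import combinations
from pathlib import Path


def hash_directory(folder, extensions=(".tif", ".tiff", ".jpg", ".jpeg", ".png")):
    """Return {file name: perceptual hash} for the images in `folder`."""
    # Third-party imports live here so the pairing logic below has no dependencies.
    from PIL import Image
    import imagehash

    hashes = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in extensions:
            with Image.open(path) as img:
                hashes[path.name] = imagehash.phash(img)
    return hashes


def similar_pairs(hashes, max_distance=6):
    """Yield (name_a, name_b, distance) for hashes within `max_distance` bits."""
    for (name_a, hash_a), (name_b, hash_b) in combinations(hashes.items(), 2):
        distance = hash_a - hash_b  # Hamming distance between the two hashes
        if distance <= max_distance:
            yield name_a, name_b, distance
```

Something like `for a, b, d in similar_pairs(hash_directory("scans")): print(a, b, d)` would then list the candidate pairs to check by eye; the cutoff would need tuning on real scans.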



  • @hardwaregeek said in Photo deduplication software:

    I currently have almost 1300 images scanned, and I'll probably be over 2000 before I'm done.

    That could be a single weekend of shots :)



  • @zecc said in Photo deduplication software:

    imagehash - A Python Perceptual Image Hashing Module

    Thanks; I'll take a look at that. Ideally, I'd like something with a GUI that can show the similar pictures, but even if the demo script just says img1234.tif is very similar to img4321.tif, that's better than nothing; at least I know which of the hundreds of images to check.

    @topspin said in Photo deduplication software:

    I would guess there's at least one free google app doing what you need.

    If there is, you'd think Google would rank their own product high in their search results, but I'm not seeing one.


  • BINNED

    @HardwareGeek

    There is a trial version which works well. The full version is ~100$



  • @thecpuwizard said in Photo deduplication software:

    @hardwaregeek said in Photo deduplication software:

    I currently have almost 1300 images scanned, and I'll probably be over 2000 before I'm done.

    That could be a single weekend of shots :)

    It could, but back in the days I had to pay for film and processing, I didn't shoot quite that prolifically. I'd guess my record was probably about 300–400 for 2.5 weeks in Europe. Even with digital, I think the most I ever shot at one time was about 500 during a week in Europe.


  • 🚽 Regular

    @hardwaregeek said in Photo deduplication software:

    Ideally, I'd like something with a GUI that can show the similar pictures

    Naturally.

    If you do wind up trying to use imagehash, this blog post might also be useful: https://fullstackml.com/wavelet-image-hash-in-python-3504fdd282b5

    I did a quick test with two images, where one was cropped and color corrected, and the metric they suggest for comparing hashes seems to work well enough:

    float(h - h2) / len(h.hash)**2 <-- Should be smaller than 1.00
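
Spelled out a little more, as a sketch only: `normalized_distance` and `compare_files` are names invented here, and `compare_files` assumes `imagehash`, Pillow, and PyWavelets (needed by `whash`) are installed. The ratio is the number of differing bits divided by the total number of bits in the hash, so 0.0 means identical hashes and larger values mean less similar images.

```python
def normalized_distance(hash_a, hash_b):
    """Fraction of hash bits that differ between two ImageHash objects."""
    # .hash is a square boolean grid, so len(...) ** 2 is the total bit count.
    total_bits = len(hash_a.hash) ** 2
    return float(hash_a - hash_b) / total_bits


def compare_files(path_a, path_b):
    """Compute the normalized wavelet-hash distance between two image files."""
    from PIL import Image
    import imagehash

    with Image.open(path_a) as img_a, Image.open(path_b) as img_b:
        return normalized_distance(imagehash.whash(img_a), imagehash.whash(img_b))
```

What counts as "close enough" depends on the images, so the cutoff is best picked by trying a few known duplicates and known non-duplicates.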



  • @hardwaregeek said in Photo deduplication software:

    @thecpuwizard said in Photo deduplication software:

    @hardwaregeek said in Photo deduplication software:

    I currently have almost 1300 images scanned, and I'll probably be over 2000 before I'm done.

    That could be a single weekend of shots :)

    It could, but back in the days I had to pay for film and processing, I didn't shoot quite that prolifically. I'd guess my record was probably about 300–400 for 2.5 weeks in Europe. Even with digital, I think the most I ever shot at one time was about 500 during a week in Europe.

    Agreed, with film the numbers were lower. I tended to average 6-8 rolls [varying size, depending on how rapidly I anticipated wanting a different type of film, since it was not effective to switch mid-roll with 35mm]. Last year I was in Central Europe and shot fairly heavily, but that was nothing compared to when I was in Southeast Asia.

    Considering that both sets of locations could easily be "once in a lifetime" (though I do hope to return), I wanted to capture as much as possible.

    I know this is tangential to the topic, but I really wish there was a higher level of available photo matching. For example, show me the many shots I took of a specific species of orchid at 10 locations over a period of days. "Cognitive Services" should be able to do it in theory.



  • @topspin said in Photo deduplication software:

    If I remember correctly, Picasa used to be able to find duplicates back when I used it years ago. It also had nice features like face recognition, so it probably used a reasonable difference metric instead of byte comparison.

    I'm also 90% sure Picasa had this feature. Of course Google has "retired" it because it was too goddamned useful. (The Google page says the desktop app still works "for those who have already downloaded it", so I guess hit up Bittorrent?)

    @thecpuwizard said in Photo deduplication software:

    Considering that both sets of locations could easily be "once in a lifetime" (though I do hope to return), I wanted to capture as much as possible.

    I capture my trips using my eyeballs and my brain and my pen and my notebook. Everybody does their own thing. Don't be a snob.



  • @blakeyrat said in Photo deduplication software:

    @topspin said in Photo deduplication software:
    I capture my trips using my eyeballs and my brain and my pen and my notebook. Everybody does their own thing. Don't be a snob.

    Agreed, everyone does their own thing :) and (providing "their thing" does not hurt others) more power to them.

    I was actually attempting to be the opposite of a "snob" [a person with an exaggerated respect for high social position or wealth who seeks to associate with social superiors and dislikes people or activities regarded as lower-class] by being explicit about these having been extraordinary situations that I was lucky and grateful to have had the opportunity to experience. Now if I had said "oh, yes, one of the many times I have been touring internationally...." that would easily have been considered snobbish...



  • @hardwaregeek I actually wrote something similar to this in a C# program circa 2007 for my first ever real IT job. It didn't work well.

    EDIT: People have suggested similar things already and I didn't read the last part. Ignore me.



  • For something like this it might be worth paying someone to do it for you. Not a tech solution, but you could get a good kid up the road who is reliable to do it for some cash.

    EDIT: When I was about 14-15 I made a little money doing some stats in Excel for people who didn't know Excel, etc. It didn't exactly pay well, but it was better than doing a paper round in terms of cash.



  • @lucas1 said in Photo deduplication software:

    you could get a good kid up the road who is reliable to do it for some cash.

    Meh, I already got a kid, and I don't even have to pay him. Reliable, eh...

    What I'm looking for, though, is something where I can grab a stack of pictures, scan one, and check whether I've already scanned it. If so, don't scan the rest of the stack again.

    I could just scan them anyway and sort out the duplicates later, but scanning each batch takes a half-hour or so, what with getting dust off the scanner, putting pictures on the scanner more or less straight (the scanner software will automatically straighten skewed images, but only within limits, and I can't use the edge of the scanner glass to align them because the scanned area is slightly smaller than the glass), rotating the images to the correct orientation, adjusting the exposure and options for each picture (because the scanner has stupid defaults), and actually scanning at high resolution (about a minute per photo). So I'd like to avoid rescanning if I can.

    A human is better than a computer for paying attention to a few pixels difference in facial expression while ignoring other differences like exposure, but a human — whether that's me, my kid, or some other kid that I'm paying — can't give me the quick feedback I'm looking for to avoid redoing work I've already done. One of the programs I looked at — I don't remember which one; I think it may be the abandonware — does just that; as soon as you're done scanning, it checks for duplicates. Maybe only if you're using that software to do the scanning, I'm not sure; I haven't looked that closely, yet. But that's basically what I'm looking for.
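
That scan-one-and-check workflow could be sketched in Python roughly like this, assuming the `imagehash` and Pillow packages. Everything named here (`closest_match`, `check_new_scan`, the `scanned_hashes.json` index file, the `max_distance=6` cutoff) is hypothetical and chosen for illustration; none of it comes from the programs discussed above. The idea is to keep a small index of perceptual hashes for everything already scanned, so one new scan can be compared against the whole collection quickly.

```python
import json
from pathlib import Path


def load_index(index_path):
    """Load {file name: ImageHash} from a JSON file of hex strings, if present."""
    import imagehash  # third-party: pip install imagehash

    path = Path(index_path)
    if not path.exists():
        return {}
    stored = json.loads(path.read_text())
    return {name: imagehash.hex_to_hash(hex_str) for name, hex_str in stored.items()}


def save_index(index_path, known_hashes):
    """Write the index back; str() of an ImageHash is its hex form."""
    stored = {name: str(h) for name, h in known_hashes.items()}
    Path(index_path).write_text(json.dumps(stored, indent=2))


def closest_match(new_hash, known_hashes, max_distance=6):
    """Return (name, distance) of the nearest known hash, or None if none is close."""
    best = None
    for name, known in known_hashes.items():
        distance = new_hash - known  # Hamming distance
        if distance <= max_distance and (best is None or distance < best[1]):
            best = (name, distance)
    return best


def check_new_scan(image_path, index_path="scanned_hashes.json"):
    """Report a likely earlier scan of `image_path`, or add it to the index."""
    from PIL import Image
    import imagehash

    known = load_index(index_path)
    with Image.open(image_path) as img:
        new_hash = imagehash.phash(img)
    match = closest_match(new_hash, known)
    if match is None:
        known[Path(image_path).name] = new_hash
        save_index(index_path, known)
    return match
```

A non-`None` result from `check_new_scan` would be the cue to look at the named file before scanning the rest of the stack; a human still makes the final call.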



  • Digikam is a photo organizer with duplicate detection built in.



  • @hardwaregeek said in Photo deduplication software:

    @lucas1 said in Photo deduplication software:

    you could get a good kid up the road who is reliable to do it for some cash.

    ... and I can't use the edge of the scanner glass to align them because the scanned area is slightly smaller than the glass),

    A thin strip of polystyrene (available at most hobby shops) is how I solved that :)



  • @thecpuwizard Yeah, I just haven't gotten around to doing it yet.



  • In the past I've used a program called d'peg. It usually managed to identify duplicates even when differently-sized, differently-cropped, differently-captioned, one converted to black and white vs one in color, slightly rotated, etc. Sometimes it gave false positives when shown two pictures of the same person at the same location but with slightly different gestures, but it presents such possible matches to you so you can make the final call. My only real gripe is that it seems to run ridiculously slowly when given a large folder of images.

    That, and I wish I had something similar for audio files that wouldn't be completely fooled by slight changes in background noise, playback speed, dead-air space at beginning and end, or resolution.



  • Huh, my exes used to just snatch the phone/camera from me and delete any photos/films of them....



  • @hardwaregeek said in Photo deduplication software:

    What I'm looking for, though, is something where I can grab a stack of pictures, scan one, and check whether I've already scanned it. If so, don't scan the rest of the stack again.

    I seem to recall a version of some GNOME photo application of circa 15 years ago having this feature. I never really used it, but do remember testing the functionality when I found out it existed, and as I recall it managed to match pictures that were very similar but not quite identical, like the kind you might have if you’ve scanned the same photo twice but cropped it differently.

    Only problem is that I have no clue what the program was called, though I suspect it was part of the default GNOME install for SuSE 7.x or 8.x.



  • @gurth said in Photo deduplication software:

    I suspect it was part of the default GNOME install for SuSE 7.x or 8.x.

    As it happens, I have a computer with an old version of SuSE on it. However, it's in storage, and it's unlikely that I'll be getting it out until after this project is done. Also, I don't think it's quite that old, 9.x, maybe; all I remember is that the last time I updated it, there was a kernel update that almost destroyed my data because it renamed all the /dev/disk* devices and none of my partitions were recognized.

    Anyway, thanks for the info. If I find out what it's called and it's available in Cygwin, it might be useful.



  • @hardwaregeek said in Photo deduplication software:

    and I can't use the edge of the scanner glass to align them because the scanned area is slightly smaller than the glass

    That's really unusual.



  • @blakeyrat My Epson V200 scanner has the same problem.


  • Java Dev

    @gurth I could imagine it makes them slightly cheaper to build. I could also imagine the blind area matches the area on the edge of the paper the included printer can't print on, so the copying function matches up.



  • @gurth said in Photo deduplication software:

    @blakeyrat My Epson V200 scanner has the same problem.

    V550. The unscannable area is maybe a mm, maybe less. But in a photo, it's noticeable. An object, maybe somebody's hand, is right at the edge of the picture; in the scan, it's cut off.



  • @gurth Oh. Epson. That explains it.



  • @blakeyrat Does somebody make a better one? When I was at Fry's, this was the only scanner they had that wasn't a scanner/copier/printer/fax/kitchen sink, and those were cheaper than this one, implying that they probably don't do any of those things particularly well.



  • @hardwaregeek My CanoScan rocks ass. It's old as shit though. (To give you an idea-- it shipped with Windows 2000 drivers.)

    I dunno if Canon still makes them.

    Also the Fry's in Washington State is always out of stock of everything all the time. So God knows what brands they nominally carry.


  • Considered Harmful

    @hardwaregeek Kitchen sinks work pretty well, you know. We've got an HP printer/copier/scanner, and it was on the cheap side but has never broken and gets good ink mileage.



  • @pie_flavor said in Photo deduplication software:

    Kitchen sinks work pretty well, you know.

    Maybe some fancy new smart sink does, but the one in my house just soaks and ruins the photos.


  • BINNED

    @pie_flavor said in Photo deduplication software:

    HP printer
    never broken

    They must've shipped you the single good one.



  • @topspin said in Photo deduplication software:

    @pie_flavor said in Photo deduplication software:

    HP printer
    never broken

    They must've shipped you the single good one.

    Just disposed of (via Goodwill) a bunch of equipment, including 3 HP printers, all over 7 years old (one nearly 20 years old), all of which still worked.



  • @blakeyrat For reasons I posted elsewhere (status thread, I think), I was under a time crunch when I started this project, so I needed something I could walk into the store and walk out with; this was it. I'm happy with the scan quality, which is the most important thing, but it's a bit less convenient to get that quality than I'd like it to be.

    I never went to the Fry's in Washington; I never needed (or could afford) anything worth fighting the traffic to Renton and back. I was lucky; the local Fry's had exactly one in stock, and it happened to be open-box, so I saved a few bucks from the regular price.



  • @hardwaregeek Pfft. Match and a circle of bricks is quicker for getting rid of photos.



  • @hardwaregeek - Only been to the Fry's in Renton a few times; I hit the ones in CA darn near every time I am there [and also buy from them on the Web]. Granted, more of my purchases (both in number and in $) are in the electronics (components, et al.) rather than the consumer computer sections....


  • 🚽 Regular

    @blakeyrat said in Photo deduplication software:

    My CanoScan rocks ass.

    I know someone whose CanoScan rocks ass.

    It rocked ass also under Windows 10, until an update made it drop TWAIN support. 🤷
    Then it continued to rock ass under a Windows XP machine.



  • @zecc Huh. Mine worked fine for paperwork while buying my house about 5 months back, maybe that was pre-breakage-update though I dunno.

    Worked with the default Scanner app in Windows; not even sure if that app uses TWAIN or something else, honestly.

