I have tens of thousands of photos, mostly mine, spanning decades.
During that time there has been a lot of opportunity for images to get
into my collection in different ways. Like, I take a photo, resize it
to post on a website, then download an archive of my activity from that
website a few years later and now I maybe have three copies of the
image, each with a different MD5/SHA1/whatever hash:
1. My original
2. My resized version
3. The re-encoded version from the website archive
Or I have an image from a backup of my phone, which I then later
changed the tags on, so the exif data differs. (These I can _usually_
identify by filename matches. But some have filenames too generic
for that to work.)
Or I have a physical photo that I have scanned from both a print and
from the negative at different times.
Or I have a photo I shared with family and then they sent it back
a few years later as a reminder, with it getting re-encoded each time.
I would like a tool that can scan my collection and easily help me find
images that are visually similar but may not be identical pixel for pixel,
and are definitely not identical byte for byte on disk.
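For illustration, here is a minimal sketch of one common perceptual-hash variant, the difference hash ("dHash"). Note that this is a swapped-in technique, not the DCT-based algorithm pHash uses. It assumes each image has already been decoded into a grayscale grid of 0-255 values (for example with an imaging library, not shown), and the function names are hypothetical. Images that look alike should produce hashes only a few bits apart, measured by Hamming distance, even when their MD5/SHA1 file hashes differ completely.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixels, rows of 0-255 values (assumed pre-decoded)


def downsample(pixels: Grid, width: int, height: int) -> Grid:
    """Box-average a grayscale grid down to width x height."""
    src_h, src_w = len(pixels), len(pixels[0])
    out = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max(y0 + 1, (y + 1) * src_h // height)
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max(x0 + 1, (x + 1) * src_w // width)
            block = [pixels[yy][xx] for yy in range(y0, y1) for xx in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(pixels: Grid, size: int = 8) -> int:
    """64-bit hash: one bit per 'is this cell brighter than its right neighbour?'."""
    small = downsample(pixels, size + 1, size)  # one extra column to compare against
    bits = 0
    for row in small:
        for x in range(size):
            bits = (bits << 1) | (1 if row[x] > row[x + 1] else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bits that differ between two hashes."""
    return bin(a ^ b).count("1")


# Usage: a synthetic 64x64 gradient, a brightened copy, and a mirrored copy.
img = [[x * 3 + y for x in range(64)] for y in range(64)]
brighter = [[min(255, p + 10) for p in row] for row in img]
mirrored = [list(reversed(row)) for row in img]
print(hamming(dhash(img), dhash(brighter)))  # 0: same relative brightness pattern
print(hamming(dhash(img), dhash(mirrored)))  # 64: every comparison flips
```

A uniform brightness change leaves the hash unchanged, while a mirrored copy flips every bit. A real tool would need a distance threshold, chosen by experiment, below which two photos are treated as likely duplicates.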
It's been about ten years since I last looked for such a tool and I
wasn't really happy with the ones for Linux back then. The best I remember
was "Perceptual Hash" (https://www.phash.org/, last release 2013). The
output was a number, but it could only compare images pairwise, which
doesn't scale well.
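On the scaling problem: comparing every pair of N hashes takes about N²/2 comparisons, which for tens of thousands of photos is tens of millions of comparisons or more. One standard workaround, sketched here under the assumption that each photo has already been reduced to a 64-bit perceptual hash, is a BK-tree. It indexes hashes by Hamming distance and uses the triangle inequality to skip most of the collection on each lookup. This is not something pHash itself is claimed to do, and the class and file names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def hamming(a: int, b: int) -> int:
    """Number of bits that differ between two hashes."""
    return bin(a ^ b).count("1")


class _Node:
    __slots__ = ("hash", "names", "children")

    def __init__(self, h: int, name: str) -> None:
        self.hash = h
        self.names = [name]                     # files sharing exactly this hash
        self.children: Dict[int, "_Node"] = {}  # keyed by distance to this node


class BKTree:
    """Metric tree over integer hashes, searched with Hamming distance."""

    def __init__(self) -> None:
        self.root: Optional[_Node] = None

    def add(self, h: int, name: str) -> None:
        if self.root is None:
            self.root = _Node(h, name)
            return
        node = self.root
        while True:
            d = hamming(h, node.hash)
            if d == 0:
                node.names.append(name)  # identical hash: group with existing node
                return
            child = node.children.get(d)
            if child is None:
                node.children[d] = _Node(h, name)
                return
            node = child

    def query(self, h: int, max_dist: int) -> List[Tuple[int, str]]:
        """Return sorted (distance, name) pairs for stored hashes within max_dist of h."""
        results: List[Tuple[int, str]] = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node = stack.pop()
            d = hamming(h, node.hash)
            if d <= max_dist:
                results.extend((d, n) for n in node.names)
            # Triangle inequality: only subtrees whose edge distance lies in
            # [d - max_dist, d + max_dist] can contain a match.
            for edge, child in node.children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)


tree = BKTree()
tree.add(0b1111_0000, "original.jpg")
tree.add(0b1111_0001, "resized.jpg")
tree.add(0b0000_1111, "other.jpg")
print(tree.query(0b1111_0000, max_dist=2))
# [(0, 'original.jpg'), (1, 'resized.jpg')]
```

Querying each photo's hash against the tree with a small radius groups likely duplicates without an all-pairs pass. With a large radius the pruning helps much less, so the threshold matters for speed as well as accuracy.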
Anything people like these days?
Elijah
------
has not tried using phash in a long time
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)