• the "same image" problem

    From Eli the Bearded@21:1/5 to All on Sun Jul 20 19:58:30 2025
    I have tens of thousands of photos, mostly mine, spanning decades.
    During that time there has been a lot of opportunity for images to get
    into my collection in different ways. Like, I take a photo, resize it
    to post on a website, then download an archive of my activity from that
    website a few years later and now I maybe have three copies of the
    image, each with different MD5/SHA1/whatever hash:

    1. My original
    2. My resized version
    3. The re-encoded version from the website archive

    Or I have an image from a backup of my phone, which I then later
    changed the tags on, so the exif data differs. (These I can _usually_
    identify by filename matches. But some have filenames too generic
    for that to work.)

    Or I have a physical photo that I have scanned from both a print and
    from the negative at different times.

    Or I have a photo I shared with family, and then they sent it back
    a few years later as a reminder, getting re-encoded each time.

    I would like a tool that can scan my collection and easily help me find
    visually similar images that may not be exactly pixel for pixel
    identical, and that are definitely not byte for byte identical on disk.
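    A common way to get this kind of "visually similar, not byte-identical" matching is a perceptual hash. The sketch below uses a difference hash (dHash), which is one specific perceptual-hash technique and not necessarily what pHash itself computes. It assumes each image has already been decoded, converted to grayscale and shrunk to 9x8 pixels by some image library (that step is not shown). The names `dhash` and `hamming` are illustrative only.

```python
from typing import Sequence

# Assumption: the image was already decoded, converted to grayscale and
# shrunk to 9 columns x 8 rows (values 0-255) by some image library.
Grid = Sequence[Sequence[int]]


def dhash(grid: Grid) -> int:
    """Difference hash: one bit per pair of horizontally adjacent pixels.

    A bit is 1 when the left pixel is brighter than its right neighbour,
    so 8 rows x 8 comparisons give a 64-bit fingerprint. A uniform
    brightness shift (without clipping) flips none of these comparisons;
    resizing or re-encoding tends to flip only a few, whereas the file's
    MD5/SHA1 changes completely.
    """
    if len(grid) != 8 or any(len(row) != 9 for row in grid):
        raise ValueError("expected 8 rows of 9 grayscale pixels")
    bits = 0
    for row in grid:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bits that differ between two fingerprints."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    # Synthetic stand-ins: an "original", a uniformly brightened copy
    # (playing the role of a re-encoded version) and a mirrored image.
    original = [[(c * 37 + r * 11) % 200 for c in range(9)] for r in range(8)]
    brighter = [[v + 20 for v in row] for row in original]
    mirrored = [list(reversed(row)) for row in original]

    print(hamming(dhash(original), dhash(brighter)))  # → 0
    print(hamming(dhash(original), dhash(mirrored)))  # nonzero: different picture
```

    Two images would then be treated as "probably the same photo" when the Hamming distance between their fingerprints is below some small threshold; the right threshold is a tuning choice, not something fixed by the method.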

    It's been about ten years since I last looked for such a tool, and I
    wasn't really happy with the ones for Linux back then. The best I
    remember was "Perceptual Hash" ( https://www.phash.org/ -- last release
    2013 ). The output was a number, but it could only compare images
    pairwise, which doesn't scale well.
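    Pairwise comparison is the part that doesn't scale: n images means n(n-1)/2 comparisons. One way around it, sketched here under the assumption that every image already has a 64-bit perceptual fingerprint (for example from a difference hash), is to bucket fingerprints by bands so that only likely matches get compared. This is a generic pigeonhole / multi-index trick, not something pHash or the tools mentioned in this thread are known to do, and `near_duplicate_pairs` is a hypothetical helper.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterator, Tuple


def near_duplicate_pairs(
    hashes: Dict[str, int], max_dist: int, bits: int = 64
) -> Iterator[Tuple[str, str, int]]:
    """Yield (name_a, name_b, distance) for fingerprints <= max_dist bits apart.

    Pigeonhole trick: cut every fingerprint into max_dist + 1 bands. Two
    fingerprints that differ in at most max_dist bits must agree exactly on
    at least one band, so only images sharing a band value are compared.
    """
    bands = max_dist + 1
    if not 1 <= bands <= bits:
        raise ValueError("max_dist must be between 0 and bits - 1")
    # Band boundaries, as even as possible (e.g. 64 bits into 5 bands).
    edges = [bits * i // bands for i in range(bands + 1)]

    buckets = defaultdict(list)  # (band index, band value) -> image names
    for name, h in hashes.items():
        for i in range(bands):
            width = edges[i + 1] - edges[i]
            value = (h >> edges[i]) & ((1 << width) - 1)
            buckets[(i, value)].append(name)

    seen = set()  # a pair can share several bands; check it only once
    for names in buckets.values():
        for a, b in combinations(sorted(names), 2):
            if (a, b) in seen:
                continue
            seen.add((a, b))
            dist = bin(hashes[a] ^ hashes[b]).count("1")
            if dist <= max_dist:
                yield a, b, dist


if __name__ == "__main__":
    # Hypothetical fingerprints, e.g. produced by a 64-bit perceptual hash.
    fingerprints = {
        "orig": 0,
        "resized": 0b111,
        "archive": 1 << 63,
        "other": (1 << 64) - 1,
    }
    print(sorted(near_duplicate_pairs(fingerprints, max_dist=4)))
    # → [('archive', 'orig', 1), ('archive', 'resized', 4), ('orig', 'resized', 3)]
```

    The trade-off: a larger `max_dist` means narrower bands and more bucket collisions, so the saving over comparing every pair shrinks. A BK-tree over Hamming distance is another common way to index fingerprints for this kind of lookup.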

    Anything people like these days?

    Elijah
    ------
    has not tried using phash in a long time

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Eli the Bearded on Sun Jul 20 23:38:41 2025
    On Sun, 20 Jul 2025 19:58:30 -0000 (UTC), Eli the Bearded wrote:

    > I would like a tool that can scan my collection and easily help me
    > find visually similar images but which may be not exactly pixel for
    > pixel identical, and for 100% sure are not byte for byte identical
    > on disk.

    A quick search through the Debian package repo turns up

    <https://packages.debian.org/bookworm/findimagedupes>

    I also found this, but not in current stable: <https://packages.debian.org/search?keywords=perceptualdiff&searchon=names&suite=all&section=all>

    Disclaimer: I haven’t tried either for myself.
