pHash correctness #12

@leofidus

I'm trying to use pHash across different programming languages. For that purpose, I treat the hashes produced by the Python libraries https://github.com/thorn-oss/perception and https://github.com/JohannesBuchner/imagehash as canonically correct.

The documentation of this project suggests that HasherConfig::new().hash_alg(HashAlg::Mean).preproc_dct().to_hasher() would produce a compatible hash, but in practice it does not. After extensive experimentation, I've identified three changes that produce nearly the same results (a Hamming distance of ~4 on a 1024-bit hash once all three are applied):

  • HashAlg::Median, lifted straight from the old img_hash_median crate, in place of HashAlg::Mean.
  • Different bit order: reverse the bits within each byte of the resulting ImageHash, so that the bit order 76543210 becomes 01234567.
  • Different conversion to grayscale: Pillow and the Rust image crate use different conversion factors to go from RGB to grayscale. This has by far the lowest impact of the three (and from a quick search it seems the Python versions also differ from the original C version of pHash here).
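To make the three differences concrete, here is a rough standalone sketch using only the Rust standard library. It is not code from this crate or from the Python libraries, and all function names are made up for illustration. The median rule (bit set when a value is strictly above the median) is my understanding of what imagehash does. The grayscale weights are assumptions too: 0.299/0.587/0.114 is the ITU-R 601 formula that Pillow documents for its "L" mode, while 0.2126/0.7152/0.0722 (Rec. 709) is what I believe the image crate uses. Neither function tries to reproduce the exact integer rounding of either library.

```rust
/// Threshold values against their median instead of their mean.
/// Assumption: a bit is set when the value is strictly greater than the median,
/// and an even-length input uses the average of the two middle values.
fn median_bits(values: &[f32]) -> Vec<bool> {
    if values.is_empty() {
        return Vec::new();
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    let median = if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    };
    values.iter().map(|&v| v > median).collect()
}

/// Reverse the bit order inside every byte: 76543210 becomes 01234567.
/// The order of the bytes themselves is left unchanged.
fn reverse_bits_per_byte(hash: &[u8]) -> Vec<u8> {
    hash.iter().map(|b| b.reverse_bits()).collect()
}

/// Count the differing bits between two hashes of equal length.
fn hamming_distance(a: &[u8], b: &[u8]) -> u32 {
    assert_eq!(a.len(), b.len(), "hashes must have the same length");
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// Grayscale using ITU-R 601 weights (the formula Pillow documents for mode "L").
fn luma_bt601(r: u8, g: u8, b: u8) -> f32 {
    0.299 * r as f32 + 0.587 * g as f32 + 0.114 * b as f32
}

/// Grayscale using Rec. 709 weights (assumed to match the image crate).
fn luma_bt709(r: u8, g: u8, b: u8) -> f32 {
    0.2126 * r as f32 + 0.7152 * g as f32 + 0.0722 * b as f32
}
```

With these weights a pure green pixel (0, 255, 0) maps to roughly 149.7 under the first formula and roughly 182.4 under the second, so the two conversions can push values near the median to opposite sides and flip a few bits.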

I'll try to make the necessary PRs to make each of these options possible without changing the existing defaults.
