Immich-Deduper
(Previously: Immich MediaKit)
A duplicate photo finder and remover for Immich.
Find and remove similar/duplicate images using deep learning.
- Visual Similarity Detection: Find duplicates by how photos look
- Search from Photo: Find similar photos starting from any specific image
- Multi Mode: Process up to 50 duplicate groups at once
- Cross-User Detection: Find duplicates across multiple Immich users
- Flexible Threshold: Adjust from exact duplicates (0.97+) to similar shots (0.60+)
- Auto-Selection: Pick best photo by date, size, EXIF, and more
- Related Tree: Discover connected similarities - find photos related to related photos
- Exclude Filters: Skip specific extensions (.dng, .png) or filename patterns
- Safe Deletion: Removed photos go to Immich trash, fully recoverable
- Metadata Merge (BETA): Transfer metadata from deleted photos to kept ones (albums, favorites, tags, rating, description, location)
- Fetches Users & Assets data from the Immich PostgreSQL database
- Processes images through ResNet152 to extract feature vectors
- Stores vectors in the Qdrant vector database
- Uses vector similarity to identify similar/duplicate photos
- Multi-user scope: Duplicate detection operates across all imported users
  - Import single user: duplicates detected within that user only
  - Import multiple users: duplicates detected across all imported users
- Displays similar photo groups based on the configured threshold
- Manages asset deletion by updating Immich database directly:
  - Follows Immich's deletion logic for compatibility
  - Important: Enable trash feature in Immich settings first
  - Deleted assets appear in Immich's trash where you can permanently delete or restore them
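To make the "vector similarity" step more concrete, here is a minimal pure-Python sketch. It assumes cosine similarity as the metric (the text above does not say which metric is used) and uses tiny made-up vectors; in Deduper the search itself is handled by Qdrant, and `cosine_similarity` / `find_similar` are illustrative names only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(query_id, vectors, threshold):
    """Return (asset_id, score) pairs scoring at or above threshold, best first."""
    query = vectors[query_id]
    hits = [
        (asset_id, cosine_similarity(query, vec))
        for asset_id, vec in vectors.items()
        if asset_id != query_id
    ]
    kept = [hit for hit in hits if hit[1] >= threshold]
    return sorted(kept, key=lambda hit: hit[1], reverse=True)


# Toy 3-dimensional vectors; real ResNet152 features are much longer.
vectors = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
}
print(find_similar("a", vectors, threshold=0.97))
```

Raising or lowering `threshold` here plays the same role as the threshold setting in the UI: fewer, closer matches at the high end, and broader groups at the low end.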
⚠️ This feature is experimental and under active testing. If you are not willing to participate in testing and accept potential risks, please do not enable this feature.
Safe Testing:
- Copy a photo to create fake duplicates
- Upload copies to Immich and add different metadata to each (albums, tags, ratings)
- Run Deduper and use Metadata Merge
- Verify the kept photo has merged metadata from all copies
Do not test on photos you care about.
When deleting duplicates, Metadata Merge can transfer metadata from deleted photos to kept ones:
- Albums: Add kept photos to all albums that contained any photo in the group
- Favorites: Mark kept photos as favorite if any photo was favorited
- Tags: Merge all tags from the group to kept photos
- Rating: Apply highest rating from the group
- Description: Merge descriptions (deduplicated line by line)
- Location: Apply most common GPS coordinates
- Visibility: Apply strictest visibility setting (locked > hidden > archive > timeline)
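The merge rules above can be sketched in a few lines of Python. This is only an illustration of the rules as listed, not Deduper's actual code: the dictionary keys (`albums`, `favorite`, `tags`, `rating`, `description`, `gps`, `visibility`) are hypothetical, and sorting the merged albums and tags is a choice made here for readable output.

```python
from collections import Counter

# Strictest first, following the order given in the list above.
VISIBILITY_ORDER = ["locked", "hidden", "archive", "timeline"]


def merge_metadata(photos):
    """Merge the metadata dicts of one (non-empty) duplicate group."""
    albums = sorted({a for p in photos for a in p.get("albums", [])})
    tags = sorted({t for p in photos for t in p.get("tags", [])})
    ratings = [p["rating"] for p in photos if p.get("rating") is not None]

    # Description: keep each distinct line once, in first-seen order.
    seen, lines = set(), []
    for p in photos:
        for line in (p.get("description") or "").splitlines():
            if line.strip() and line not in seen:
                seen.add(line)
                lines.append(line)

    # Location: the coordinate pair that occurs most often in the group.
    coords = Counter(p["gps"] for p in photos if p.get("gps"))

    return {
        "albums": albums,
        "favorite": any(p.get("favorite") for p in photos),
        "tags": tags,
        "rating": max(ratings) if ratings else None,
        "description": "\n".join(lines),
        "gps": coords.most_common(1)[0][0] if coords else None,
        "visibility": min(
            (p.get("visibility", "timeline") for p in photos),
            key=VISIBILITY_ORDER.index,
        ),
    }


group = [
    {"albums": ["Trip"], "favorite": False, "tags": ["beach"], "rating": 3,
     "description": "Sunset\nDay 1", "gps": (25.03, 121.56), "visibility": "timeline"},
    {"albums": ["Best"], "favorite": True, "tags": ["beach", "sea"], "rating": 5,
     "description": "Sunset", "gps": (25.03, 121.56), "visibility": "archive"},
]
merged = merge_metadata(group)
```

In this example the kept photo ends up in both albums, is marked favorite, carries both tags, gets rating 5, and takes the stricter `archive` visibility.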
How It Works:
- Database Update: Writes merged metadata directly to Immich PostgreSQL database
- XMP Sidecar: Creates/updates `.xmp` sidecar files alongside kept photos with:
  - Description, Rating, GPS coordinates, Tags
  - Prevents Immich's "Refresh Metadata" from overwriting merged values
- Atomic Operation: All changes (DB + XMP) are committed together; if any step fails, everything rolls back
Requirements:
- exiftool CLI: Required for XMP sidecar writing
  - Docker: Pre-installed
  - Source install: `brew install exiftool` (macOS) or `apt install libimage-exiftool-perl` (Linux)
- File Access: Deduper needs write permission to photo directories for XMP files
Important Notes:
- External Libraries: Configure library paths in Fetch page before using Metadata Merge
- Backup XMP: Existing `.xmp` files are backed up to `.xmp.bak` during merge; restored on failure, removed on success
- New XMP Files: For photos without existing sidecar, Deduper creates new `.xmp` and updates Immich's `sidecarPath`
- Immich Sync: After merge, Immich will read the XMP sidecar on next "Refresh Metadata" without losing merged values
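The backup behaviour described in the notes above (back up to `.xmp.bak`, restore on failure, remove on success) follows a common pattern that can be sketched as below. This covers only the file side and is an assumption about the general shape, not Deduper's implementation; `write_sidecar_with_backup` and `write_func` are hypothetical names, and the real tool writes through exiftool and pairs this with a database transaction.

```python
import os
import shutil


def write_sidecar_with_backup(xmp_path, write_func):
    """Run write_func(xmp_path); restore the previous sidecar if it raises."""
    bak_path = xmp_path + ".bak"
    had_original = os.path.exists(xmp_path)
    if had_original:
        shutil.copy2(xmp_path, bak_path)  # keep a copy of the existing sidecar
    try:
        write_func(xmp_path)
    except Exception:
        if had_original:
            os.replace(bak_path, xmp_path)  # failure: put the old sidecar back
        elif os.path.exists(xmp_path):
            os.remove(xmp_path)  # failure: drop a half-written new sidecar
        raise
    if had_original:
        os.remove(bak_path)  # success: the backup is no longer needed
```

The exception is re-raised after the restore so that the caller can still roll back anything else that belongs to the same operation.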
Troubleshooting:
- Check logs at `DEDUP_DATA/logs/` for detailed error messages
- Common errors:
  - `exiftool CLI not found`: Install exiftool (see Requirements above)
  - `File not found`: Configure external library paths in Fetch page
  - `No write permission`: Ensure Deduper has write access to photo directories
- Find Similar
  - Starts searching for the next photo that matches your `Threshold Min` settings and shows it in the `current` tab
  - When photo groups appear in the `current` tab, you can click on a photo's header to select it. This lights up the four action buttons on the top right. After using one of these actions, the kept photos in that group will be marked as resolved
  - If you don't do anything with a searched group, it'll show up in the `pending` tab waiting for you to handle it later
  - Auto Find Next: When enabled, resolving or deleting the current group automatically triggers a search for the next unprocessed photo. When disabled, the system switches to the `pending` tab after each action, letting you work through all found groups before searching again
  - You can always manually switch to the `pending` tab to review and process previously found groups
- Workflow
  - Press `Find Similar` to search a batch of groups
  - Process groups in both `current` and `pending` tabs
  - Press `Find Similar` again to search the next batch
  - Repeat until all photos are processed
- Clear records & Keep resolved
  - Clears out search records that haven't been resolved yet
  - This keeps all the records you've already marked as resolved
- Reset records
  - Resets all search records, including the ones you've marked as resolved
- Path Filter
  - Only show groups where at least one asset's path contains the filter pattern
  - Useful for focusing on duplicates within a specific folder or external library
  - Groups not matching the filter are auto-resolved. Use `Reset Records` to search them again
- Exclude Settings
  - Similar Less: Auto-resolve groups with fewer than N similar photos and continue search
    - Example: Setting "< 2" means skip groups with 1 or 0 similar photos (requires at least 3 total photos)
    - Useful for focusing only on groups with enough duplicates to warrant attention
  - NameFilter: Exclude specific files from similarity search by filename patterns or extensions
    - Extension format: `.png,.gif,.dng` - Files with these extensions won't be selected as main image or appear in similar results
    - Filename pattern: `IMG_,DSC,screenshot` - Files containing these patterns will be excluded
    - Mixed format: `.png,IMG_,screenshot` - Combine extensions and patterns
    - Use case: Perfect for drone photography where you shoot both RAW (.dng) and JPEG simultaneously but want to keep both formats without them being flagged as duplicates
- Make the most of `Auto Selection`
  - When you enable auto selection, it'll automatically choose which photos to keep or delete after you run `Find Similar`. Just scroll through to review, then hit one of the four action buttons at the top
- `Multi Mode` search feature
  - By default (when `Multi Mode` is off), it only searches for one group of photos at a time
  - Turn this on and set the `Max Group` number when you've got tons of photos to filter through - super handy for big cleanups
  - Note: Multi Mode and Related Tree are mutually exclusive
- Related Tree
  - Only available in single group mode (when Multi Mode is off)
  - When off, `Find Similar` only shows photos directly similar to the main photo
  - When on, it also searches each similar photo for its own similar photos, and continues expanding outward. This builds a connected chain: A→B→C→D, where A and D may not be directly similar but are linked through B and C
  - Best for: burst shots, gradually changing scenes, or edited versions where consecutive photos look alike but the first and last don't
  - Indirect matches (photos not directly similar to the main one) are visually marked in the grid
  - `MaxItems` caps the total number of photos in the tree. With low thresholds like 0.5, the search could snowball across thousands of photos without this limit
  - Note: Photos directly similar to the main photo are always included regardless of `MaxItems`
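The Related Tree expansion described above behaves like a breadth-first search with a size cap. The sketch below is one way to picture it, under stated assumptions: `neighbors` is a hypothetical precomputed map of "directly similar" photos (already filtered by threshold), and `max_items` is assumed to count similar photos, not the main photo itself. How Deduper counts and orders things internally is not documented here.

```python
from collections import deque


def related_tree(main_id, neighbors, max_items):
    """Breadth-first expansion from main_id.

    Direct matches of the main photo are always kept; indirect matches are
    added only while the total stays within max_items.
    Returns {photo_id: is_direct}.
    """
    found = {pid: True for pid in neighbors.get(main_id, [])}
    queue = deque(found)
    visited = {main_id, *found}
    while queue:
        current = queue.popleft()
        for pid in neighbors.get(current, []):
            if pid in visited:
                continue
            if len(found) >= max_items:
                return found  # cap reached: stop expanding
            visited.add(pid)
            found[pid] = False  # reached through another photo, not the main one
            queue.append(pid)
    return found


# The A→B→C→D chain from the description above.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(related_tree("A", chain, max_items=10))
```

The `False` entries correspond to the indirect matches that the grid marks visually.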
Mode Selection: Choose Single Mode + Related Tree for comprehensive similarity trees, or Multi Mode for quick processing of multiple separate groups.
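Going back to the NameFilter setting under Exclude Settings, the matching rules can be sketched as follows. The case handling is an assumption made for this example (extensions compared case-insensitively, patterns as case-sensitive substrings); the text above does not specify it, and `parse_name_filter` / `is_excluded` are illustrative names.

```python
def parse_name_filter(spec):
    """Split a comma-separated filter into (extensions, patterns)."""
    extensions, patterns = [], []
    for item in (part.strip() for part in spec.split(",")):
        if not item:
            continue
        if item.startswith("."):
            extensions.append(item.lower())
        else:
            patterns.append(item)
    return extensions, patterns


def is_excluded(filename, spec):
    """True if the filename matches any extension or contains any pattern."""
    extensions, patterns = parse_name_filter(spec)
    if any(filename.lower().endswith(ext) for ext in extensions):
        return True
    return any(pattern in filename for pattern in patterns)


print(is_excluded("DJI_0001.DNG", ".png,.gif,.dng"))
print(is_excluded("holiday.jpg", ".png,IMG_,screenshot"))
```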
Automatically select the best photo in each duplicate group based on configurable criteria.
- Each criterion has a weight (0-5). Score = weight × 10
- The photo with the highest total score in each group gets selected
- Selected photos show an "Auto-Selected ?" badge; hover to see the scoring breakdown
- Each group header has an "auto select log" button; click to view detailed scoring for all photos
- Settings changes trigger automatic recalculation
Selection Criteria:
| Category | Criterion | Description |
|---|---|---|
| Options | Skip Low Similarity | Skip groups containing photos with similarity < 0.96 |
| Options | All LivePhotos | Select all LivePhoto assets in group (ignores other criteria) |
| DateTime | Earlier / Later | Prefer photos taken earlier or later |
| EXIF | Rich / Poor | Prefer photos with more or fewer EXIF fields |
| Filename | Longer / Shorter | Prefer longer or shorter filenames |
| FileSize | Bigger / Smaller | Prefer larger or smaller file sizes |
| Dimension | Bigger / Smaller | Prefer higher or lower resolution |
| FileType | JPG / PNG / HEIC | Prefer specific file formats |
| Immich | Favorite / In Album | Prefer photos marked as favorite or added to albums |
| User | Priority User | Prefer photos owned by a specific Immich user |
| Path | Contains | Prefer photos whose path contains a specific string |
Tips:
- Use "Earlier" + high weight when you want to keep original captures over edited versions
- Use "Path Contains" to prioritize photos from a specific folder (e.g., `/library/keep/`)
- Combine multiple criteria: FileSize+3, Earlier+2 gives preference to larger files, with date as tiebreaker
- Toggle Enable off/on to clear and re-run auto-selection
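To see how weights combine, here is a rough sketch of the scoring. It assumes that, for each enabled criterion, the photo (or photos) that best satisfies it within the group earns `weight × 10` points; that per-criterion rule is an assumption, since only the `weight × 10` formula and the "highest total wins" rule are stated above. All names and the sample data are hypothetical.

```python
def auto_select(photos, criteria):
    """Return (winner_id, scores) for one duplicate group.

    criteria is a list of (name, weight, key) tuples, where key(photo) returns
    a value to maximize. The best photo(s) for each criterion earn weight * 10.
    On a tie in total score, the first photo in the list wins.
    """
    scores = {p["id"]: 0 for p in photos}
    for _name, weight, key in criteria:
        best = max(key(p) for p in photos)
        for p in photos:
            if key(p) == best:
                scores[p["id"]] += weight * 10
    winner = max(photos, key=lambda p: scores[p["id"]])
    return winner["id"], scores


photos = [
    {"id": "a", "size": 4_000_000, "taken": 1_700_000_000},
    {"id": "b", "size": 2_500_000, "taken": 1_690_000_000},
]
criteria = [
    ("FileSize Bigger", 3, lambda p: p["size"]),
    ("DateTime Earlier", 2, lambda p: -p["taken"]),  # earlier = larger key
]
winner, scores = auto_select(photos, criteria)
print(winner, scores)
```

With FileSize+3 and Earlier+2, the larger file wins with 30 points against 20. If both files had the same size, both would get the 30 points and the earlier date would decide, which matches the tiebreaker tip above.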
- Progressive cleaning approach
  - Start with the highest similarity threshold and work your way down:
    - First, get rid of exact duplicates (0.97-1.00)
    - Then find near-duplicates (0.90-0.97)
    - Finally, catch similar but different shots (0.80-0.90)
  - This way you tackle the obvious duplicates first, then deal with the photos that need more careful judgment
- Clear and rescan strategy
  - Before changing your threshold settings, use `Reset Records` to wipe all similarity data
  - This lets you rescan all photos with new thresholds and avoid missing anything or getting false matches
- Auto Selection workflow
  - See Auto Selection for detailed criteria configuration
  - Recommended: Set criteria first, then run Find Similar to see pre-selected results
  - Always review auto-selected photos before using batch actions
- Large collection tips
  - For 8000+ photos: Enable Multi Mode with appropriate Max Group settings
  - Use batch operations for efficiency
- Finding similar content (reverse approach)
  - For finding similar themes (memes, similar scenes, burst shots):
    - Start with low threshold (0.60-0.70) + `Multi Mode` with limited `Max Group`
    - Progressively raise threshold to filter out less similar matches
  - This is the reverse of duplicate cleaning - work your way up instead of down
  - Always limit `Max Group` when using low thresholds to avoid overwhelming results
- External library considerations
  - Ensure external library paths are not set to read-only if using Docker Compose
  - Enable Immich's trash feature before processing external libraries
  - Remember that Deduper reads from Immich thumbnails, so original file locations don't affect similarity detection
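The threshold bands from the progressive cleaning tip can be written down as a small helper, which is handy if you want to reason about where a given score falls. The band edges come from the tip above; the function name and the "below range" label are made up for this sketch.

```python
def classify(score):
    """Map a similarity score to the bands used in the progressive approach."""
    if score >= 0.97:
        return "exact duplicate"
    if score >= 0.90:
        return "near-duplicate"
    if score >= 0.80:
        return "similar but different shot"
    return "below range"


for score in (0.99, 0.93, 0.85, 0.70):
    print(score, classify(score))
```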
When Deduper starts up, it performs several system checks as shown below:
Important startup notes:
- Pay attention to the startup messages displayed during initialization
- The system will show proper status indicators and perform version checks
- If any components are outdated or incompatible, you'll receive update prompts
- Ensure all checks pass before proceeding with operations
If you encounter any startup errors or version mismatches, follow the update instructions above or check the logs for detailed error information.
- Version Check: Compares local version with GitHub to notify you of updates
- Immich Logic Verification: Validates that Deduper's delete/restore operations match Immich's current implementation, preventing potential data corruption from API changes
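As a rough illustration of the version check, the comparison itself can be done on parsed number tuples, not on raw strings (a plain string comparison would rank `0.1.9` above `0.1.11`). This sketch assumes simple dotted numeric versions with an optional leading `v`; it does not show how Deduper fetches the remote version, and the function names are hypothetical.

```python
def parse_version(text):
    """Turn 'v0.1.11' or '0.1.11' into a comparable tuple such as (0, 1, 11)."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def update_available(local, remote):
    """True if the remote version is newer than the local one."""
    return parse_version(remote) > parse_version(local)


print(update_available("v0.1.9", "v0.1.11"))
```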
For air-gapped environments, see Offline Mode.
Choose the installation method that suits your needs:
| Installation Method | Use Case | Advantages | Disadvantages |
|---|---|---|---|
| Docker Compose (CPU) | Non-GPU hosts, smaller image | One-click install, auto Qdrant setup, smaller image size | CPU processing only |
| Docker Compose (GPU) | Linux users with NVIDIA GPU | One-click install, auto Qdrant setup, GPU acceleration | Linux + NVIDIA GPU only |
| Source Installation | Custom environment, development | Multi-platform GPU support (CUDA/MPS), customizable | Manual Qdrant and dependency setup |
Recommended Choice:
- Linux users with NVIDIA GPU: Use Docker Compose (GPU version, `latest-cuda` image tag)
- Non-GPU hosts or smaller image needs: Use Docker Compose (CPU version)
- macOS users needing GPU: Use source installation (MPS support)
- Custom development or specific requirements: Use source installation
- Access to an Immich installation with trash feature enabled
- A configured `.env` file (see below)
Before you can use Deduper, you need to set up your database so that Deduper can connect to it. This explanation covers only Immich installations via Docker Compose.
If your Immich installation is on the same machine as the one you want to install Deduper on, a Docker network can be used to connect to the database. To create the network, run the following command (on the host, not in the Docker container):

    docker network create immich-deduper

Then add the network to your Immich database container and to the Docker Compose file:

    services:
      database:
        container_name: immich_postgres
        image: ghcr.io/immich-app/postgres:14-vectorchord0.3.0-pgvectors0.2.0
        networks: # Add the immich-deduper network to the db to allow immich-deduper to access the db
          - immich-deduper

    networks: # Add the immich-deduper network to the immich docker compose without any indentation
      immich-deduper:
        external: true

After updating, restart Immich to apply the changes. The `PSQL_HOST` in your `.env` file should match the container name of the database.
If your Immich installation is on a different machine from the one you want to install Deduper on, you need to expose the PostgreSQL port. Note that this exposes your database to anyone on the host's network, so use a secure password! Add the following port mapping to your Immich Docker Compose file:

    services:
      database:
        container_name: immich_postgres
        image: ghcr.io/immich-app/postgres:14-vectorchord0.3.0-pgvectors0.2.0
        ports:
          - "5432:5432" # Add this line to expose PostgreSQL

After updating, restart Immich to apply the changes. The exposed port (5432 in this example) should match the `PSQL_PORT` setting in your Deduper `.env` file.
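Before starting Deduper, it can save some head-scratching to confirm that the database port is actually reachable from the machine (or container) where Deduper will run. The snippet below is a generic TCP check using only the Python standard library; it is not part of Deduper, `can_reach` is a made-up helper name, and the host and port in the commented example are just the values used in this section.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Values would normally come from PSQL_HOST and PSQL_PORT in your .env, e.g.:
# print(can_reach("immich_postgres", 5432))
```

A `True` result only shows that the port is open; wrong credentials or a wrong database name would still need to be checked through Deduper's logs.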
Using Docker Compose is the easiest installation method, automatically including the Qdrant vector database.
Installation Steps:
- Copy Docker Configuration Files

  The Compose file differs slightly depending on whether you install Deduper on the same host as Immich or on a different one. Choose the same variant you used when setting up the database connection.

  Same host configuration:
  Different host configuration:

- Configure Environment Variables

  Choose the appropriate `.env` file based on your setup and modify:
  - `PSQL_HOST`: Database connection (service name for same-host, IP address for different-host)
  - `IMMICH_PATH`: Path to your Immich upload directory
  - `IMMICH_THUMB`: (Optional) Path for separate thumbnail directory (requires additional volume mount)
  - `DEDUP_DATA`: Directory for Deduper data storage
  - `DEDUP_IMAGE`: Deduper image tag to run (`latest` default CPU, `latest-cuda`, or `latest-cpu`)
  - `QDRANT_URL`: (Optional) Custom Qdrant database URL for non-Docker environments or custom container setups
  - `OFFLINE`: (Optional) Set to `true` for air-gapped environments (see Offline Mode)
- Create Docker Network (Same-host only)

  If using same-host setup, create the shared network:

      docker network create immich-deduper

- Update Immich Configuration (Required)

  Modify your existing Immich `docker-compose.yml` file according to the example provided:
  - Same-host: Add networks configuration to enable communication
  - Different-host: Expose PostgreSQL port for external access
Choose Image Tag
Set
DEDUP_IMAGEin your.env:# Default (CPU) DEDUP_IMAGE=razgrizhsu/immich-deduper:latest
# Much smaller image size (CPU only) DEDUP_IMAGE=razgrizhsu/immich-deduper:latest-cpu
If using a GPU image (
latest-cuda), add (or uncomment) GPU device reservation indocker-compose.yml:deploy: resources: reservations: devices: - driver: nvidia count: 1 capabilities: [gpu]
Prerequisites for CUDA image:
- NVIDIA GPU with CUDA support
- NVIDIA Container Toolkit / Docker GPU runtime installed
- Linux host system
Note: Docker GPU support is Linux + NVIDIA only. For macOS MPS or Windows GPU acceleration, use source installation.
- Start Services

      docker compose up -d

- Access Application
  - Open browser to `http://localhost:8086`
- Updating Deduper

  To update Deduper when using Docker Compose, run:

      docker compose down && docker compose pull && docker compose up -d

  To switch between tags (for example CPU -> CUDA), update `DEDUP_IMAGE` in `.env` and then run:

      docker compose down && docker compose pull && docker compose up -d

  - `latest`: default CPU image tag (compose default)
  - `latest-cuda`: explicit CUDA-enabled image for Linux + NVIDIA GPU
  - `latest-cpu`: CPU-only image, much smaller than CUDA
If you build locally, you can create equivalent tags:
# CPU
docker build -t immich-deduper:latest-cpu --build-arg DEVICE=cpu .
# CUDA
docker build -t immich-deduper:latest-cuda --build-arg DEVICE=cuda .
For custom environments and development needs.
Use Cases:
- Want to customize Python environment
- Need to modify source code
- Prefer manual control over dependencies
Installation Steps:
- Install Qdrant Server

      # Install Qdrant using Docker
      docker run -p 6333:6333 qdrant/qdrant:v1.16.3

- Clone Source Code

      git clone https://github.com/RazgrizHsu/immich-deduper.git
      cd immich-deduper

- Configure Environment Variables

  Create `.env` file and set connection information (refer to the example above)

- Install Python Dependencies

  CPU/macOS-compatible version (default):

      pip install -r requirements.txt

  GPU acceleration:

      # Linux with NVIDIA GPU (CUDA)
      pip install -r requirements-cuda.txt
      # macOS with Apple Silicon (MPS)
      pip install -r requirements.txt
      # Windows with NVIDIA GPU
      pip install -r requirements-cuda.txt

  Platform-specific notes:
  - Linux: Install CUDA drivers and corresponding PyTorch version first
  - macOS: Apple Silicon automatically supports MPS acceleration
  - Windows: Requires NVIDIA GPU and CUDA drivers
  - May need additional system packages: `sudo apt-get install python3-dev libffi-dev` (Linux)

- Start Application

      python -m src.app
Path Variables:
- `DEDUP_DATA`: Deduper data directory (database, logs, cache).
  - Docker: Compose mounts to `/app/data`.
  - Source: Used directly.
- Docker:
  - `IMMICH_PATH`: Your Immich `UPLOAD_LOCATION` path (folder containing `thumbs`, `library`). Compose mounts it to `/immich`.
  - `IMMICH_THUMB`: (Optional) Separate thumbnail directory. Compose mounts it to `/thumbs`.
- Source:
  - `IMMICH_PATH`, `IMMICH_THUMB` refer directly to your filesystem paths.

Path Mapping (Docker only):
If your Immich database stores original host paths (e.g., `/mnt/photos/upload/...`), Deduper automatically detects and translates them to container paths (`/immich/...`).
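The path translation can be pictured as a simple prefix rewrite. The sketch below takes the host root as an explicit argument; how Deduper detects that root automatically is not described here, so treat `to_container_path` and its parameters as illustrative only.

```python
from pathlib import PurePosixPath


def to_container_path(db_path, host_root, container_root="/immich"):
    """Rewrite a host path stored in the database to its container location."""
    path = PurePosixPath(db_path)
    root = PurePosixPath(host_root)
    try:
        relative = path.relative_to(root)
    except ValueError:
        return db_path  # not under the host root; leave unchanged
    return str(PurePosixPath(container_root) / relative)


print(to_container_path("/mnt/photos/upload/library/a.jpg", "/mnt/photos/upload"))
```

Paths that are already container paths, or that live outside the given root, pass through untouched.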
For air-gapped environments without internet access, Deduper supports offline operation.
Setup Steps:
- Download Model Weights (on a machine with internet)

      # Using the provided script (downloads to project_root/data/models/checkpoints/)
      python scripts/download-model.py
      # Or specify custom path via environment variable
      DEDUP_DATA=/your/path python scripts/download-model.py
      # Or manually download ResNet152 weights
      # Check the download URL from:
      python -c "from torchvision.models import ResNet152_Weights; print(ResNet152_Weights.DEFAULT.url)"

- Transfer Files to Offline Environment
  - Copy `data/models/` directory (or your custom `DEDUP_DATA/models/`) to the offline machine's `DEDUP_DATA/models/`
- Enable Offline Mode

      # In your .env file
      OFFLINE=true
What Offline Mode Does:
- Skips GitHub version checks
- Skips Immich logic verification
- Uses locally cached model weights
- All core functionality remains available
Note: In offline mode, you are responsible for ensuring version compatibility between Deduper and Immich.
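In code terms, offline mode amounts to a flag that gates the network-dependent startup checks. A minimal sketch, assuming the flag is read from the `OFFLINE` environment variable and that only the value `true` (in any letter case) enables it; the check names returned here are placeholders, not Deduper's internal identifiers.

```python
import os


def is_offline(env=None):
    """Interpret the OFFLINE variable; only 'true' (any case) enables it here."""
    env = os.environ if env is None else env
    return env.get("OFFLINE", "").strip().lower() == "true"


def startup_checks(env=None):
    """List the startup checks that would run for the given environment."""
    if is_offline(env):
        return []  # version check and Immich logic verification are skipped
    return ["version_check", "immich_logic_verification"]


print(startup_checks({"OFFLINE": "true"}))
print(startup_checks({}))
```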
Deduper automatically logs system operations and errors to help with troubleshooting.
Log Location:
- Logs are stored in the `DEDUP_DATA/logs/` directory
- Log files are rotated daily for better organization
Troubleshooting:
- If you encounter any issues or unexpected behavior, check the log files in the logs directory
- The logs contain detailed information about system operations, errors, and warnings
- Log files can help identify configuration issues, database connection problems, or processing errors
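If you want to skim the newest log quickly, a small helper like the one below can pull out the lines of interest. It makes two assumptions that are not confirmed above: that error lines contain the text `ERROR`, and that the most recently modified file in `DEDUP_DATA/logs/` is the current log. Adjust `needle` if your logs use a different marker.

```python
import glob
import os


def latest_log_errors(dedup_data, needle="ERROR", limit=20):
    """Return up to `limit` lines containing `needle` from the newest log file."""
    files = glob.glob(os.path.join(dedup_data, "logs", "*"))
    files = [f for f in files if os.path.isfile(f)]
    if not files:
        return []
    newest = max(files, key=os.path.getmtime)
    with open(newest, encoding="utf-8", errors="replace") as handle:
        matches = [line.rstrip("\n") for line in handle if needle in line]
    return matches[-limit:]


# Example: print(latest_log_errors("/path/to/DEDUP_DATA"))
```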
Deduper v0.1.11+ supports all Immich versions through automatic schema detection.
Automatic Schema Detection: Deduper automatically detects and adapts to your Immich database schema:
- Table names (plural vs singular: assets/asset, albums/album, tags/tag, users/user)
- Junction table column names (plural vs singular: albumsId/albumId, assetsId/assetId, tagsId/tagId)
No manual configuration needed - Deduper works seamlessly across all Immich versions.
Immich Schema Evolution:
- Immich v1.136.0: Changed main table names from plural to singular (assets → asset, albums → album, tags → tag, users → user)
- Immich v2.3.0: Changed junction table column names from plural to singular (albumsId → albumId, assetsId → assetId, tagsId → tagId)
- Deduper automatically handles all these variations
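The idea behind schema detection can be sketched as "look at which spelling exists and use that one". The helper below works on a plain set of names, which in practice might come from a query against PostgreSQL's `information_schema`; that source, and the name `pick_name`, are assumptions for illustration and not a description of Deduper's code.

```python
def pick_name(existing, singular, plural):
    """Return whichever of the two spellings is present in `existing`."""
    if singular in existing:
        return singular
    if plural in existing:
        return plural
    raise LookupError(f"neither {singular!r} nor {plural!r} found")


# Table names before and after the plural-to-singular rename.
old_schema = {"assets", "albums", "tags", "users"}
new_schema = {"asset", "album", "tag", "user"}

print(pick_name(old_schema, "asset", "assets"))
print(pick_name(new_schema, "asset", "assets"))
```

The same lookup applies to junction table columns, for example choosing between `albumId` and `albumsId`.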
Initially, I was planning to build this with Electron + React frontend + Node.js backend, but given how much easier it is to integrate machine learning stuff with Python, I ended up going the Python route.
I usually use Gradio for quick AI demos, but it gets pretty limiting when you want more customization. Same story with Streamlit - they're great for prototypes but not so flexible for complex UIs. After trying a bunch of different options, I settled on Dash by Plotly. Sure, it still needs a lot of custom work to get exactly what I want, but it gets the job done pretty well.
What started as a simple little tool to help me clean up duplicate photos somehow turned into this whole complex system... funny how these things grow, right?
Hope this tool helps anyone who's dealing with the same photo organization headaches! :)
by raz
Contributions are welcome! Please feel free to submit a Pull Request.
If you find this project helpful, consider buying me a coffee:
This project is licensed under the GNU General Public License v3.0 (GPLv3).
Commercial use is permitted, but any derivative works must also be open-sourced under the same license. If you modify and distribute this software, you must make your source code publicly available.
This tool interacts with your Immich photo library and database. While designed to be safe, it is still under active development and may contain unexpected behaviors. Please consider the following:
- Always backup your Immich database before performing operations that modify data
- Use the similarity threshold carefully when identifying duplicates to avoid false positives
- The developers are not responsible for any data loss that may occur from using this tool
Immich-Deduper is provided "as is" without warranty of any kind. By using this software, you acknowledge the potential risks involved in managing and potentially modifying your photo collection.
Happy organizing! I hope this tool enhances your Immich experience by helping you maintain a clean, duplicate-free library.






