HPC Technologies. All Rights Reserved.


Got lots of duplicate images?

Most image collections tend to accumulate lots of duplicate images over time. The same image may be reposted with a slightly different filename or collections from multiple computers may be merged into one. While GAPS' normal sorting procedure can help you find dupes before they get into your collection, sometimes a more direct approach to dupe cleaning is in order. Enter DupeFinder!

DupeFinder / Cache Manager

DupeFinder's main selection window also doubles as GAPS' CacheManager. It displays information about the folders GAPS has cached. The following columns are available:

The buttons at the bottom of DupeFinder's window operate on all checked folders (those selected by checking the left-most checkbox).

DupeFinder searches all the images in your collection for duplicates, regardless of filename or location. Using the MD5 hash algorithm, GAPS can find images with identical contents anywhere in your collection. Once found, duplicate images are listed, allowing you to semi-automatically delete all of the duplicates and leave only one copy of each image.
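To illustrate the idea of content-based matching, here is a minimal Python sketch of grouping files by MD5 hash. It is not GAPS' actual code: the function names `md5_of_file` and `find_duplicates` are hypothetical, and the sketch assumes that every regular file under a folder (including subfolders) should be scanned.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(folders):
    """Group files under the given folders by content hash.

    Returns a dict mapping each MD5 digest to a sorted list of paths,
    keeping only digests shared by two or more files.
    Assumption: subfolders are scanned too; GAPS may behave differently.
    """
    by_hash = defaultdict(set)
    for folder in folders:
        for path in Path(folder).rglob("*"):
            if path.is_file():
                # resolve() so overlapping folders do not count a file twice
                by_hash[md5_of_file(path)].add(path.resolve())
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}
```

Because only file contents are hashed, two copies of an image are matched even when their filenames and folders differ.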

When you run DupeFinder, you must first select which folders of images you wish to scan. DupeFinder can take a long time to run for large collections of images, so you might wish to only run it occasionally on your entire collection while running it more frequently on any "incoming" folders. To select folders for DupeFinder, check the box on the left of DupeFinder's window next to each folder you want to scan.

Once you have selected which folders to scan, you may also select which folders are "protected." When duplicate images are found, the copy in any protected folder will be kept by default while other copies will be deleted by default. Of course, you can always manually control which copy or copies remain. GAPS never deletes images without your explicit confirmation.
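The default selection described above could be sketched roughly as follows. This is only an illustration: `default_deletions` is a hypothetical name, and two behaviors are assumptions not stated in this document, namely that every copy in a protected folder is kept when several are protected, and that the first listed copy is kept when none is.

```python
from pathlib import Path


def default_deletions(duplicate_group, protected_folders):
    """Return the copies that would be pre-checked for deletion.

    duplicate_group: paths of files with identical contents.
    protected_folders: folders whose copies are kept by default.
    """
    protected = [Path(folder) for folder in protected_folders]

    def is_protected(path):
        # A file is protected if it lives in a protected folder or below it.
        return any(folder in path.parents for folder in protected)

    copies = [Path(p) for p in duplicate_group]
    kept = [p for p in copies if is_protected(p)]
    if not kept:
        # Assumption: with no protected copy, keep the first one listed.
        kept = copies[:1]
    return [p for p in copies if p not in kept]
```

The result is only a suggestion of which boxes to check; the actual deletion still requires explicit confirmation.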

When you click the Find Dupes button, all of the selected folders will be cached into memory. Full details will be loaded for all images, which may take a VERY long time when you first run DupeFinder. Once the caches are loaded, the dupe search begins. Dupes will be displayed under the Duplicates tab when the search is completed.

Image Details

Image details include the file size (in bytes), picture dimensions (in pixels), last modified date, and MD5 hash. While some of this information (like file size and date) can be read quickly when folders are initially cached, other information (dimensions and MD5) requires that the entire file be read from disk, which can take a long time.
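The difference between the two kinds of details can be seen in a short Python sketch. The names `quick_details` and `full_details` are hypothetical and this is not GAPS' implementation; picture dimensions are left out because decoding them generally needs an image library outside the standard library.

```python
import hashlib
import os
from datetime import datetime


def quick_details(path):
    """Details available from file metadata alone; file contents are not read."""
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime),
    }


def full_details(path):
    """Quick details plus the MD5 hash, which requires reading every byte."""
    details = quick_details(path)
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    details["md5"] = digest.hexdigest()
    return details
```

`quick_details` costs one metadata lookup per file, while `full_details` scales with the total size of the collection, which is why the first full scan is slow.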

Full details must be loaded for all images before DupeFinder can scan them. GAPS loads details in the background while it is running, but if you have a lot of images, it may still need to index them all the first time you run DupeFinder.

TIP: If you store images on a networked drive using a wireless connection (AirPort), you should try to use a wired connection when running DupeFinder, at least the first time. Wired connections are generally MUCH faster than wireless connections. Since DupeFinder must load each image to compare it, a wired network will likely make a huge difference.


DupeFinder Results

Once DupeFinder finishes scanning your image collection, results are displayed on the "Confirm Deletions" tab.

Confirm Deletions

This screen lets you choose which duplicate images GAPS should delete. The following columns are shown:

Below the list of duplicate images is a summary showing the count and total file size of all dupe images found (on the left) and of only the checked (to be deleted) images (on the right). When you click the "Delete Checked Files" button, all checked images will be deleted from your disk, saving you "Checked Bytes" of disk space.
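As a rough illustration of how such a summary is computed, the following Python sketch totals counts and sizes for all listed dupes and for the checked ones. The function name `summarize` and the `(size_bytes, checked)` tuple layout are assumptions made for this example only.

```python
def summarize(dupes):
    """Total the dupe list.

    dupes: list of (size_bytes, checked) tuples, one per listed duplicate.
    """
    return {
        "dupe_count": len(dupes),
        "dupe_bytes": sum(size for size, _ in dupes),
        "checked_count": sum(1 for _, checked in dupes if checked),
        "checked_bytes": sum(size for size, checked in dupes if checked),
    }


# Three listed dupes, two of them checked for deletion.
totals = summarize([(1000, True), (1000, False), (2500, True)])
print(totals["checked_bytes"])  # prints 3500
```

Here `checked_bytes` corresponds to the disk space that deleting the checked files would free.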


Cache Management

The initial DupeFinder screen allows you to perform some basic cache management functions. GAPS stores a cache of all of the images in each folder it opens. These caches enable GAPS' lightning-fast performance even on folders containing tens or hundreds of thousands of images. From the main DupeFinder window, you can delete any of GAPS' cache files (they'll be regenerated the next time you open the folder), manually save any cache to disk, or force hash details to be loaded for all images. Hash details are what allow DupeFinder to search for duplicate images.

The list of folders on the DupeFinder window is pre-filled from a number of sources, including:

  1. Any folder which is configured as a sorting destination
  2. Any folder you've opened in GAPS
  3. Manually added folders -- You can manually add a folder with the '+' button.

CVS Id: $Id: dupefinder.php 542 2007-04-15 04:55:32Z pendor $