HPC Technologies. All Rights Reserved.
Most image collections tend to accumulate lots of duplicate images over time. The same image may be reposted with a slightly different filename or collections from multiple computers may be merged into one. While GAPS' normal sorting procedure can help you find dupes before they get into your collection, sometimes a more direct approach to dupe cleaning is in order. Enter DupeFinder!
DupeFinder / Cache Manager
DupeFinder's main selection window also doubles as GAPS' Cache Manager. It displays information about the folders GAPS has cached. The following columns are available:
- Checkmark - Check the left-most column to include a folder in a DupeFinder scan.
- Path - The folder's full path on disk
- 'L' - Checked if the folder is currently loaded in memory (the folder is "opened")
- 'P' - By checking this column, you mark a folder as "protected" (see below).
- # Imgs - The total number of images in the folder
- # Detailed - The number of images for which all details have been cached. Image details include file size (in bytes), image dimensions (in pixels), last modified date, and the image's MD5 hash, which DupeFinder uses to detect duplicates. See below for more on image details.
- Remove - Click to remove a cache from the DupeFinder list (see below for why folders "come back" or for why some folders don't have a Remove button).
- Load - (Re-)loads a folder's cache, scanning for any newly created or deleted images. Does not attempt to load image details.
- Delete - Deletes the folder's cache file from disk but doesn't remove it from memory.
- Details - For folders which have undetailed images (# Imgs is greater than # Detailed), this button will load ALL images' details. This can take a long time, especially for images on a network disk.
- Save - For folders which have unsaved changes, this button saves all changes to disk. Not available for caches which aren't loaded or which are already fully saved to disk.
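The columns above imply a small per-folder record. The Python sketch below models it; the class and field names (`ImageDetails`, `FolderCache`, `dirty`, and so on) are hypothetical and are not GAPS' actual cache format, which this page does not document. It shows how # Imgs and # Detailed relate and when the Details and Save buttons would be offered.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ImageDetails:
    # Cheap fields: readable from directory metadata when a folder is cached.
    size_bytes: int
    mtime: float = 0.0
    # Expensive fields: known only after the whole file has been read.
    width: Optional[int] = None
    height: Optional[int] = None
    md5: Optional[str] = None

    @property
    def is_detailed(self) -> bool:
        return all(v is not None for v in (self.width, self.height, self.md5))


@dataclass
class FolderCache:
    path: str
    images: Dict[str, ImageDetails] = field(default_factory=dict)  # filename -> details
    loaded: bool = False     # the 'L' column
    protected: bool = False  # the 'P' column
    dirty: bool = False      # hypothetical flag: True when changes are unsaved

    @property
    def num_images(self) -> int:  # the '# Imgs' column
        return len(self.images)

    @property
    def num_detailed(self) -> int:  # the '# Detailed' column
        return sum(1 for d in self.images.values() if d.is_detailed)

    def shows_details_button(self) -> bool:
        # Details is offered only while some images are still undetailed.
        return self.num_detailed < self.num_images

    def shows_save_button(self) -> bool:
        # Save is unavailable for unloaded caches and for caches already saved.
        return self.loaded and self.dirty
```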
The buttons at the bottom of DupeFinder's window operate on all checked folders (those selected by checking the left-most checkbox).
- (Re)load - Reloads and updates the caches of all selected folders. Newly created images are added to the cache and any deleted images are removed. This does the same thing as the per-folder Load button, but for all selected caches.
- Save - Saves any unsaved caches to disk. This is the same as the per-folder Save button above.
- Load Full Details - Loads all image details for the selected caches. This is the same as the Details button above. Note that this might take a very long time if you have a large number of caches selected.
- Dupe & Junk Finder - This triggers DupeFinder for all selected folders.
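Reloading a cache, as described for Load and (Re)load, amounts to comparing the cached file list with what is currently on disk. The Python sketch below shows that comparison; `IMAGE_EXTS`, `list_images` and `reload_cache` are hypothetical names, and the extension list is an assumption, not the set of formats GAPS actually recognizes. Consistent with the Load button, new entries start without dimensions or an MD5 hash.

```python
import os
from typing import Dict, List, Tuple

# Assumed set of image extensions; GAPS' real list is not documented here.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def list_images(folder: str) -> Dict[str, Tuple[int, float]]:
    """Cheap scan: filename -> (size in bytes, mtime), from metadata only."""
    result = {}
    with os.scandir(folder) as entries:
        for entry in entries:
            ext = os.path.splitext(entry.name)[1].lower()
            if entry.is_file() and ext in IMAGE_EXTS:
                st = entry.stat()
                result[entry.name] = (st.st_size, st.st_mtime)
    return result


def reload_cache(cache: Dict[str, dict],
                 on_disk: Dict[str, Tuple[int, float]]) -> Tuple[List[str], List[str]]:
    """Add newly created images and drop deleted ones; returns (added, removed)."""
    added = sorted(set(on_disk) - set(cache))
    removed = sorted(set(cache) - set(on_disk))
    for name in removed:
        del cache[name]
    for name in added:
        size, mtime = on_disk[name]
        # Details such as the MD5 hash are not loaded here, matching Load.
        cache[name] = {"size": size, "mtime": mtime, "md5": None}
    return added, removed
```

Typical use would be `reload_cache(cache, list_images("/path/to/folder"))`. A fuller version would probably also discard cached details for files whose size or modification date has changed, since a stale MD5 hash would make DupeFinder's results wrong.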
DupeFinder searches all the images in your collection for duplicates, regardless of filename or location. Because GAPS compares MD5 hashes of file contents, it finds images with the same contents anywhere in your collection. Note that this matches only byte-for-byte identical files, so a resized or re-saved copy of an image will not be reported as a dupe. Once found, duplicate images are listed, allowing you to semi-automatically delete all of the duplicates and keep only one copy of each image.
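Once every image has a cached MD5 hash, finding dupes reduces to grouping paths by hash. The Python sketch below shows that grouping; `find_duplicates` and the sample paths are hypothetical, and the sketch assumes the hashes are already available as a path-to-digest mapping.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def find_duplicates(md5_by_path: Dict[str, str]) -> List[List[str]]:
    """Group paths whose MD5 digests are identical.

    Only groups with two or more members are returned. Paths inside each
    group, and the groups themselves, are sorted for deterministic output.
    """
    groups = defaultdict(list)
    for path, digest in md5_by_path.items():
        groups[digest].append(path)
    return sorted(sorted(paths) for paths in groups.values() if len(paths) > 1)


if __name__ == "__main__":
    # Hypothetical paths; the byte strings stand in for real image contents.
    cache = {
        "/Pictures/sunset.jpg": hashlib.md5(b"sunset pixels").hexdigest(),
        "/Incoming/IMG_0042.jpg": hashlib.md5(b"sunset pixels").hexdigest(),
        "/Pictures/beach.jpg": hashlib.md5(b"beach pixels").hexdigest(),
    }
    for group in find_duplicates(cache):
        print(group)
```

The two sunset files are grouped even though their names and folders differ, which is exactly the "regardless of filename or location" behavior described above.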
When you run DupeFinder, you must first select which folders of images you wish to scan. DupeFinder can take a long time to run for large collections of images, so you might wish to only run it occasionally on your entire collection while running it more frequently on any "incoming" folders. To select folders for DupeFinder, check the box on the left of DupeFinder's window next to each folder you want to scan.
Once you have selected which folders to scan, you may also select which folders are "protected." When duplicate images are found, the copy in any protected folder will be kept by default while other copies will be deleted by default. Of course you can always manually control which copy(-ies) remain. GAPS never deletes images without your explicit confirmation.
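The default selection rule for protected folders can be sketched as follows. This is an illustration, not GAPS' code: `default_deletions` is a hypothetical name, and when no copy is in a protected folder, the choice to keep the first path in sorted order is an assumption, because this page says only that all but one copy is checked by default.

```python
import os
from typing import Iterable, List, Set


def default_deletions(group: Iterable[str], protected_folders: Set[str]) -> List[str]:
    """Return the copies in one duplicate group that start out checked for deletion."""
    copies = sorted(group)
    protected = {p for p in copies if os.path.dirname(p) in protected_folders}
    if protected:
        # Copies in protected folders are kept; every other copy is checked.
        return [p for p in copies if p not in protected]
    # No protected copy: keep exactly one. Keeping the first sorted path is
    # an assumption made for this sketch.
    keep = copies[0]
    return [p for p in copies if p != keep]
```

If every copy sits in a protected folder, this sketch checks nothing. Either way, the result is only a default; you still confirm or change each checkmark before anything is deleted.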
When you click the Find Dupes button, all of the selected folders will be cached into memory. Full details will then be loaded for all images, which may take a VERY long time the first time you run DupeFinder. Once caches are loaded, the dupe search begins. Dupes will be displayed under the Duplicates tab when the search is completed.
Image Details
Image details include the file size (in bytes), picture dimensions (in pixels), last modified date, and MD5 hash. While some of this information (like file size and date) can be read quickly when folders are initially cached, other information (dimensions and MD5 hash) requires that the entire file be read from disk, which can take a long time.
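The difference in cost between the two kinds of details is easy to see in code. In the Python sketch below, `quick_details` needs only a single `os.stat` call, while `full_details` must read every byte of the file to compute its MD5 hash. Both function names are hypothetical, and for brevity the sketch omits image dimensions, which would require an image decoder.

```python
import hashlib
import os
import tempfile


def quick_details(path: str) -> dict:
    """Cheap details: one stat() call, no file contents read."""
    st = os.stat(path)
    return {"size": st.st_size, "mtime": st.st_mtime}


def full_details(path: str, chunk_size: int = 1 << 20) -> dict:
    """Expensive details: reads the entire file to compute its MD5 hash.

    The file is hashed in fixed-size chunks so that large images never have
    to be held in memory all at once.
    """
    details = quick_details(path)
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    details["md5"] = md5.hexdigest()
    return details


if __name__ == "__main__":
    # Demo on a throwaway file standing in for an image.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".jpg") as tmp:
        tmp.write(b"stand-in image bytes")
    try:
        print(quick_details(tmp.name))
        print(full_details(tmp.name))
    finally:
        os.remove(tmp.name)
```

On a network drive, the read loop inside `full_details` is where nearly all the time goes, because every byte has to cross the network.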
Full details must be loaded for all images before DupeFinder can scan them. GAPS loads details in the background while it runs, but if you have a lot of images, it may still need to index all of them the first time you run DupeFinder.
TIP: If you store images on a networked drive that you access over a wireless connection (AirPort), try to use a wired connection when running DupeFinder, at least the first time. Wired connections are generally MUCH faster than wireless ones, and since DupeFinder must read every image in full to compute its hash, the faster link will likely make a huge difference.
Once DupeFinder finishes scanning your image collection, results are displayed on the "Confirm Deletions" tab.
Confirm Deletions
This screen lets you choose which duplicate images GAPS should delete. The following columns are shown:
- Found - Shows a disclosure triangle that expands to list each individual duplicate found. The top row of each group (the one with the triangle) serves as an overview of the group of duplicates. Below the top row, one row is shown for each copy of the image.
- Checkmark - GAPS deletes only the individual duplicates that are checked. By default, all but one of the copies are checked for you. You may choose to keep any one or more of the dupes. Unchecked images will NOT be deleted by GAPS.
- View - When a group of duplicates is disclosed, a thumbnail of each dupe is shown in this column. Double-clicking a thumbnail opens a window with a larger version of the image.
- Filename - Shows the full path and filename of each duplicate found.
- Size - For the top row in each group, shows the total size of all of an image's duplicates. For the per-copy rows, shows the size of each individual copy.
Below the list of duplicates, a summary shows the count and total file size of all dupe images found (on the left) and of only the checked (to be deleted) images (on the right). When you click the "Delete Checked Files" button, all checked images will be deleted from your disk, saving you "Checked Bytes" of disk space.
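The Size column and the summary figures are simple sums. The Python sketch below computes them from hypothetical inputs (`group_total`, `summarize` and the `Group` alias are illustrative names). It assumes that "all dupe images found" counts every copy in every group, including the copy that will be kept; this page does not state that explicitly.

```python
from typing import Dict, List, Set, Tuple

Group = List[Tuple[str, int]]  # one duplicate group: (path, size in bytes) per copy


def group_total(group: Group) -> int:
    """Value for a group's top-row Size column: total bytes across all copies."""
    return sum(size for _, size in group)


def summarize(groups: List[Group], checked: Set[str]) -> Dict[str, int]:
    """Counts and byte totals for all copies found and for checked copies only."""
    copies = [copy for group in groups for copy in group]
    marked = [(path, size) for path, size in copies if path in checked]
    return {
        "found_count": len(copies),
        "found_bytes": sum(size for _, size in copies),
        "checked_count": len(marked),
        # Disk space freed by "Delete Checked Files", i.e. "Checked Bytes".
        "checked_bytes": sum(size for _, size in marked),
    }
```

For example, two copies of a 100-byte image plus three copies of a 50-byte image give 5 copies and 350 bytes found; checking all but one copy of each gives 3 checked copies and 200 checked bytes.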
The initial DupeFinder screen allows you to perform some basic cache management functions. GAPS stores a cache of all of the images in each folder it opens. These caches enable GAPS' lightning-fast performance, even on folders containing tens or hundreds of thousands of images. From the main DupeFinder window, you can delete any of GAPS' cache files (they'll be regenerated the next time you open the folder), manually save any cache to disk, or force hash details to be loaded for all images. Hash details are what allow DupeFinder to search for duplicate images.
The list of folders on the DupeFinder window is pre-filled from a number of sources, including:
CVS Id: $Id: dupefinder.php 542 2007-04-15 04:55:32Z pendor $