As described in another post, I was in the process of developing a ransomware-resistant backup system that relied, in part, on a Linux system. As part of that process, I was interested in finding Linux software that could detect duplicate files. I had recently tried to run DoubleKiller in Linux using Wine. As described in the post reporting on that effort, I had unfortunately found that DoubleKiller performed poorly in Wine. It seemed that I would have to find duplicate detection software written for Linux.

Perhaps I should begin by explaining what I was looking for, by describing a bit of my usage of DoubleKiller in Windows. In previous searches, I had repeatedly seen that dupeGuru was considered among the best Linux tools for this purpose. I had also concluded that, for my purposes, dupeGuru was not a serious competitor against DoubleKiller. This side-by-side comparison image may provide a sense of the difference. In this image, we have a screenshot of DoubleKiller on the left and dupeGuru on the right, both running in Windows 10:

As the image illustrates, DoubleKiller offered a denser listing than dupeGuru. In DoubleKiller, the general idea was that the user would click on the Scan Options tab (at top left) to name the folders to be compared; to specify files (e.g., file.txt) or file types (e.g., f*.txt), with the option to either exclude those from the search or (just the opposite) to limit the search to only those files or types; to ignore, exclude, or limit the search to files of a certain size and/or with certain attributes (e.g., read-only); and so forth. The user would also click on the Comparison Options tab to specify the sort of comparison to be run. There were four general categories of comparison (i.e., name, date, size, and content), with endlessly customizable options in each (e.g., consider files to be duplicates only if (a) the last 23 characters of their names are the same and (b) their sizes do not differ by more than 198 bytes). I was usually interested in an exact comparison, and thus focused on the content option (i.e., byte-by-byte comparison, as distinct from other possibilities, e.g., look for files whose first 4KB are identical).

After running that search, the user would then see a DoubleKiller screen like the one shown above. As indicated at top left, the user could check files - that is, add checkmarks to indicate files to be deleted - in several ways. (The Uncheck tab reciprocally offered the same ways of unchecking checked files.) Specifically, the user could check all listed files, or all files that s/he might have manually selected (i.e., highlighted), or the first or last files in a group (where the grouping could be any of those offered by the radio buttons arrayed across the top: files grouped into duplicate sets, or by duplicate filenames, or in the same folders, or files of the same size, or having the same date or CRC32 hash). In the image above, I selected duplicates whose names contained “86.” That could be useful if, for instance, I had become aware of duplication among files whose names used the year 1986 in different ways (e.g., 1986, ’86). As the image shows, I could also have narrowed the focus to files of a specified path, date range, or size.

To illustrate the value of these selection criteria, suppose a user had a set of several thousand MP3 files. Suppose that the user and/or friends had variously refined the names of those files, such that “Romeo’s Tune.mp3” was now joined by “Forbert – Romeo’s Tune” and “Steve Forbert – Romeo’s Tune (1979).” In some cases, maybe the contents of the files were the same, but in other cases, maybe somebody had edited the MP3 or its metadata. (Similar things have been known to happen with office files.) A first step toward straightening it out might be to run a search for exact duplicates - and then, within the resulting list, delete those whose names did not contain parentheses (which would make sense if it was determined that the most accurate and desirable filenames were those containing dates, such as the 1979 example). Further targeted searches and deletions could then shrink the mess still more, with relatively low risk of losing anything that the user wanted to keep.

It could take a while to become familiar with those search criteria. I had been using DoubleKiller for maybe 15 years. It seemed advisable to be familiar with your duplicate detection software, if experience indicated that sometimes you would be relying on it to delete thousands of files in a single stroke. You would not want to delete the wrong 5,000 files. But even if I had been familiar with the quirks of dupeGuru, I would still not have been in a rush to switch to software that seemed to lack most of the capabilities just described. I had written to the developer of DoubleKiller, asking if he would please port it to Linux or release it as open source code.
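For readers who want to try the two-step approach from the MP3 example on Linux without a GUI, here is a rough Python sketch. It is not how DoubleKiller or dupeGuru works internally; it simply finds exact duplicates (same size, then a byte-by-byte comparison) and then, within each duplicate set, marks the files whose names lack parentheses. The function names `find_exact_duplicates` and `check_names_without_parentheses` are my own placeholders, and the sketch assumes the whole folder tree is readable.

```python
import filecmp
import os
from collections import defaultdict


def find_exact_duplicates(root):
    """Return lists of paths under root whose contents are byte-for-byte identical."""
    # Files of different sizes cannot be identical, so group by size first.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        # Split the same-size files into sets with identical content.
        sets = []
        for path in sorted(paths):
            for dup_set in sets:
                # shallow=False forces a comparison of the actual contents.
                if filecmp.cmp(dup_set[0], path, shallow=False):
                    dup_set.append(path)
                    break
            else:
                sets.append([path])
        groups.extend(s for s in sets if len(s) > 1)
    return groups


def check_names_without_parentheses(group):
    """Within one duplicate set, return the files to delete: names lacking parentheses.

    If no name in the set contains parentheses, nothing is returned,
    so at least one copy of the content always survives.
    """
    def has_parens(path):
        name = os.path.basename(path)
        return "(" in name and ")" in name

    if not any(has_parens(p) for p in group):
        return []
    return [p for p in group if not has_parens(p)]


if __name__ == "__main__":
    import sys

    # Dry run: only print the candidates; nothing is deleted.
    for group in find_exact_duplicates(sys.argv[1] if len(sys.argv) > 1 else "."):
        for path in check_names_without_parentheses(group):
            print("would delete:", path)
```

Run as a script with a folder as its argument, this only prints the candidates. Given the risk of deleting the wrong 5,000 files, I would review that list before turning any of it into actual deletions.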