Over the years, the amount of data on my server has kept growing and unfortunately, for several reasons, I have ended up with a lot of duplicates (old copies of some documents, files downloaded again because I had “lost” them, and even some accidental copy/pastes). There is no way I am going to check them one by one myself!
And this is where FDupes comes in.
FDupes is a small command-line tool that helps you find duplicate files and, if you want, delete the extra copies. It uses MD5 signatures to compare the files in the folders you select and has a wide set of options (recursive scan, scanning only files above a minimum size, deleting for you, etc.).
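To give an idea of what "comparing files by checksum" means, here is a minimal sketch in shell. It is only an illustration of the idea, not how FDupes itself is implemented. The function name list_md5_duplicates is made up for this example, and I am assuming find, md5sum and awk are available on the system.

```shell
# Hypothetical helper (not part of fdupes): print groups of files under a
# directory that share the same MD5 checksum, one path per line, with a
# blank line after each group.
# Limitation: file names containing backslashes or newlines are not handled,
# because md5sum escapes them in its output.
list_md5_duplicates() {
    dir="$1"
    # md5sum prints "<32 hex characters><two separator characters><path>"
    find "$dir" -type f -exec md5sum {} + | awk '
        {
            sum = substr($0, 1, 32)    # the checksum
            path = substr($0, 35)      # everything after the separator
            count[sum]++
            files[sum] = files[sum] path "\n"
        }
        END {
            # Only checksums seen more than once are duplicates
            for (sum in count)
                if (count[sum] > 1)
                    printf "%s\n", files[sum]
        }'
}
```

Calling list_md5_duplicates /media/Stockage would hash every file, which is slow on a big disk, so for real use I would stick with FDupes.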
It is very easy to install and use, as you can see:
1) Install FDupes (from your distribution's repository)
apt-get install fdupes
2) Start a scan (recursive, but without deleting anything at this point)
fdupes -r /media/Stockage
It will scan and list all the duplicate files in the targeted device/folder:
fdupes -r /media/Stockage/
Progress [24404/82787] 29%
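Since nothing is deleted at this stage, it can be handy to keep the result in a file and read it later. Here is a small sketch of a wrapper for that. The name scan_duplicates and the report file name are my own choices for the example; the fdupes options used are -r (recursive), -n (ignore empty files) and -q (hide the progress indicator).

```shell
# Hypothetical wrapper: scan a folder recursively and save the list of
# duplicates to a report file, without deleting anything.
scan_duplicates() {
    dir="$1"
    report="$2"
    if ! command -v fdupes >/dev/null 2>&1; then
        echo "fdupes is not installed" >&2
        return 127
    fi
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    # -r: recurse into subfolders, -n: skip empty files, -q: no progress line
    fdupes -r -n -q "$dir" > "$report"
}
```

For example, scan_duplicates /media/Stockage dupes.txt would write the duplicate list to dupes.txt in the current folder.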
It can take quite a long time on a large disk/folder (10 minutes for me on a full 1 TB drive), but here is an example of the result:
/media/Stockage/Backup/CE_2010_33676.pdf
/media/Stockage/Documents/Bills/2010.33676 - EEFC.pdf
Easy, right? Perfect for my server.
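FDupes prints each set of duplicates as one path per line, with a blank line between sets. If the output was saved to a file (for example with fdupes -r /media/Stockage > dupes.txt, where dupes.txt is just a name I picked), a rough sketch like the one below can count what was found. The function name summarize_report is hypothetical.

```shell
# Hypothetical helper: count duplicate sets and redundant copies in a
# saved fdupes report (one path per line, blank line between sets).
summarize_report() {
    awk '
        # Paragraph mode: each blank-line-separated set is one record,
        # and each line (path) in the set is one field.
        BEGIN { RS = ""; FS = "\n" }
        { sets++; extra += NF - 1 }   # every file after the first is redundant
        END { printf "%d duplicate sets, %d redundant copies\n", sets, extra }
    ' "$1"
}
```

Running summarize_report dupes.txt prints a single summary line. FDupes also has a built-in -m option that prints a summary directly.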
However, you still have to be a bit careful: some files are exactly the same, but you want to keep them as they are.
I have a lot of Creative Commons music on my server, and every folder has a licence.txt file. Sometimes it is the same licence file for different albums or authors, but I still want to keep all of these files right where they are.
So it is still good to double-check before deleting anything. If you don’t care, you can simply add the -d option to fdupes: it will ask which file to keep in each set of duplicates and delete the others.
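For the double check, one possible approach is to work on a saved list (for example fdupes -r /media/Stockage > dupes.txt, with dupes.txt being an arbitrary name) and hide the sets I already know I want to keep. The sketch below drops every set that contains a file named licence.txt. The function name drop_licence_sets is made up, and the file name is hard-coded for my own case.

```shell
# Hypothetical helper: print only the duplicate sets from a saved fdupes
# report that do NOT contain a file named licence.txt.
drop_licence_sets() {
    awk '
        # One record per set, one field per path; sets are printed back
        # separated by a blank line.
        BEGIN { RS = ""; FS = "\n"; ORS = "\n\n" }
        {
            keep = 1
            for (i = 1; i <= NF; i++)
                if ($i ~ /(^|\/)licence\.txt$/) keep = 0
            if (keep) print
        }
    ' "$1"
}
```

Running drop_licence_sets dupes.txt only prints what is left to review; it does not delete anything. The deletion itself stays a manual decision.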