Occasionally your mail delivery scheme might hiccup, leaving you with duplicate copies of email messages sitting in your mailboxes. I find this happens occasionally if something goes wrong with fetchmail: if you kill the fetchmail process before it has expunged the deleted email from the remote POP3 server, the next time you run fetchmail it downloads a second copy of each email. This is a simple process that I came up with to remove duplicate email messages from a maildir format mailbox.
As a bit of background, a maildir mailbox is a small directory tree:
    $ du .boxes.xml-dev
    4       .boxes.xml-dev/tmp
    124     .boxes.xml-dev/new
    52340   .boxes.xml-dev/cur
    52920   .boxes.xml-dev
    $
Hierarchy is represented by components of the mailbox name separated by dots, so the mailbox above is called xml-dev and it is in the boxes mailbox.
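As a small illustration of the naming scheme, here is a sketch that splits a folder name into its hierarchy components. The function name mailbox_components is made up for this example, and it assumes the dot-separated layout shown above.

```shell
# Hypothetical helper: print the hierarchy components of a maildir
# folder name, one per line, outermost first.
mailbox_components() {
    # Strip the leading dot, then turn the remaining dots into newlines.
    printf '%s\n' "${1#.}" | tr '.' '\n'
}

# Usage:
#   mailbox_components .boxes.xml-dev
```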
Messages are files in either the new or the cur directory. Transport agents place messages into the new directory. When a user agent opens a mailbox it moves all the messages from new to cur. If you're accessing your mail through an IMAP server like Courier-IMAP, the IMAP server will deal with this for you.
The commands below are run from inside the mailbox directory. Make sure there's nothing sitting in the new subdirectory:
    $ ls new
    $
If there are messages in the new subdirectory, open the mailbox in a user agent to get it to move them into cur.
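If you'd rather not fire up a user agent, something like the following sketch should do the same job by hand. It assumes the usual maildir convention that files in cur carry a ":2," info suffix (with any flags after the comma), and that no user agent has the mailbox open while it runs. The name flush_new is just for illustration.

```shell
# Sketch: move every message from new/ to cur/ in the given maildir,
# appending the ":2," info suffix with no flags set (so the messages
# still show up as unread).
flush_new() {
    maildir=$1
    for f in "$maildir"/new/*; do
        [ -e "$f" ] || continue          # the glob matched nothing
        mv "$f" "$maildir/cur/$(basename "$f"):2,"
    done
}

# Usage, from inside the mailbox directory:
#   flush_new .
```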
See how many messages you have:
    $ ls cur | wc -l
    842
    $
Check they all have Message-IDs:
    $ for i in cur/*; do reformail -x Message-ID: <"$i"; done | wc -l
    842
    $
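If the two counts differ, some messages have no Message-ID. A rough way to find them is sketched below. It relies only on the reformail -x call already used above; the function name missing_ids is hypothetical.

```shell
# Sketch: print the path of every message in cur/ for which
# reformail -x finds no Message-ID header.
missing_ids() {
    for i in "$1"/cur/*; do
        [ -f "$i" ] || continue          # the glob matched nothing
        if [ -z "$(reformail -x Message-ID: <"$i")" ]; then
            printf '%s\n' "$i"
        fi
    done
}

# Usage, from inside the mailbox directory:
#   missing_ids .
```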
See how many you have if you filter out duplicate Message-IDs:
    $ for i in cur/*; do reformail -x Message-ID: <"$i"; done | sort -u | wc -l
    698
    $
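To see which Message-IDs are actually repeated, rather than just how many distinct ones there are, a variation on the same pipeline is sketched here. The helper name dup_ids is an assumption of this example; it only uses sort and uniq.

```shell
# Sketch: read one Message-ID per line on stdin and print each ID
# that occurs more than once (printed once per repeated ID).
dup_ids() {
    sort | uniq -d
}

# Usage, from inside the mailbox directory:
#   for i in cur/*; do reformail -x Message-ID: <"$i"; done | dup_ids
```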
See how many we're going to delete. reformail -D succeeds when it has already seen a message's Message-ID, so only the duplicates are echoed:
    $ rm -f /tmp/dups
    $ for i in cur/*; do reformail -D 20000 /tmp/dups <"$i" && echo "$i"; done | wc -l
    144
    $ expr 698 + 144
    842
    $
If this total doesn't match you should increase the 20000: reformail isn't remembering enough Message-IDs to spot all the duplicates.
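The sanity check is just arithmetic: unique messages plus duplicates should equal the total. A minimal sketch that scripts it follows; check_counts is a made-up name and the three numbers are passed in by hand.

```shell
# Sketch: succeed only if unique + duplicates equals the total.
# Arguments: total unique duplicates
check_counts() {
    total=$1
    unique=$2
    dups=$3
    if [ $((unique + dups)) -eq "$total" ]; then
        echo "ok: $unique unique + $dups duplicates = $total"
    else
        echo "mismatch: $unique + $dups != $total" >&2
        return 1
    fi
}

# Usage with the figures from this mailbox:
#   check_counts 842 698 144
```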
Delete the messages and check things look right afterwards:
    $ rm -f /tmp/dups
    $ for i in cur/*; do reformail -D 20000 /tmp/dups <"$i" && rm "$i"; done
    $ ls cur | wc -l
    698
    $
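If deleting outright feels risky, a more cautious variant is to move the duplicates aside and remove them later once you're happy. This is only a sketch: quarantine_dups and its arguments are hypothetical names, it uses mktemp for the ID cache instead of /tmp/dups, and it assumes reformail -D behaves as in the commands above.

```shell
# Sketch: move duplicate messages out of cur/ into a holding directory
# instead of removing them.
# Arguments: maildir holding-directory
quarantine_dups() {
    maildir=$1
    dupdir=$2
    cache=$(mktemp) || return 1          # fresh Message-ID cache
    mkdir -p "$dupdir" || return 1
    for i in "$maildir"/cur/*; do
        [ -f "$i" ] || continue          # the glob matched nothing
        # reformail -D succeeds when the Message-ID is already cached,
        # i.e. when this message is a duplicate of an earlier one.
        if reformail -D 20000 "$cache" <"$i"; then
            mv "$i" "$dupdir"/
        fi
    done
    rm -f "$cache"
}

# Usage, from inside the mailbox directory:
#   quarantine_dups . /tmp/xml-dev-dups
```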
$Id: mail-duplicates.html,v 1.1 2003/04/22 10:32:59 mhw Exp $