How to fix emails for Cyrus LMTP and IMAP

Like many others, I have been bitten by Cyrus' strictness when it comes to RFC-compliant email headers. Although it cost me about a full day, I still appreciate that Cyrus interprets the RFC strictly and thus forces email to be syntactically correct. It does not follow the "be liberal in what you accept" approach, but this way is less likely to cause problems later (with IMAP clients, indexing, searching, etc.).

My pain started when I tried to "quickly" transfer my old, legacy mail tree (there must be some emails from 1998 or so in there, converted at least 5 times by now between different mail spool formats) from a standard Maildir/ structure (previously served by Courier imapd) to Cyrus with imapsync. This nice tool synchronizes one IMAP box to another, and thus avoids the complexity of converting file formats and structures. I used something simple like

imapsync --host1= --ssl1 --user1=rene --host2= --ssl2 --user2=rene --syncinternaldates

and thought that, after about an hour or so (for the few GB worth of emails in a deep tree structure), I should be ready to switch the DNS entry to the new server. Wrong.

There are a few things that can happen when trying to import old emails into a Cyrus mail store:

  • "Message contains invalid header": This happens a lot. In my experience, the cause is a line in the header part of the email that either has no colon after the first word (the infamous "From ..." and ">From ..." first lines that stem from converting from mbox to Maildir format) or is a header with an empty value ("X-something-unimportant: ").
  • "Message contains invalid header": The same error message can be caused by a "Message-ID: " entry without a value, i.e. a special case of the second cause above. However, this problem affects not only IMAP but also LMTP delivery into the store.
  • "Message contains NUL characters": This message is pretty descriptive: the mail contains a NUL character (\0) somewhere.
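
To find out which files are affected before changing anything, the three checks above can be sketched as a small shell function. This is only a rough heuristic under my own assumptions, not what Cyrus itself does: the name check_mail and the wording of its output are made up for illustration, and the empty-header test will also flag a header whose value starts on a folded continuation line.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original script): report which of the
# three problems described above a single message file shows.
check_mail() {
    f=$1
    # First line is an mbox-style "From " or ">From " separator.
    if head -n 1 "$f" | grep -Eq '^>?From '; then
        echo "$f: mbox From line"
    fi
    # A header line (before the first blank line) with nothing after the colon.
    if awk '/^\r?$/ { exit }
            /^[^ \t:]+:[ \t\r]*$/ { found = 1; exit }
            END { exit !found }' "$f"; then
        echo "$f: empty header"
    fi
    # The file changes when NUL bytes are deleted, so it contains some.
    if ! tr -d '\0' < "$f" | cmp -s - "$f"; then
        echo "$f: NUL characters"
    fi
}
```

A possible way to run it over the whole tree: find Maildir -type f | while IFS= read -r f; do check_mail "$f"; done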

As there were far too many emails in my tree to check manually why each one failed to import with imapsync, I wrote a small shell script to take care of the issues that I found. It's a quick and dirty hack that will most probably not catch all possible errors, but after fixing my emails with it I could finally import all of them. It copies files before touching them, so that, if anything goes wrong, you can recover by simply copying back the backup files. The script assumes a directory called Maildir/ in the current directory:


# Run this from the directory that contains Maildir/. Each step copies a file
# into its own backup tree next to Maildir/ before rewriting it.

# back up file $1 (a path relative to Maildir/) into the backup tree $2
backup() { mkdir -p "../$2/$(dirname "$1")" && cp "$1" "../$2/$1"; }

# this removes invalid "From ..." first lines left over from mbox file imports
(cd Maildir && find . -type f | while IFS= read -r f; do
  if head -n 1 "$f" | grep -q '^From '; then
    backup "$f" Maildir-backup-files && awk 'NR>1' "../Maildir-backup-files/$f" > "$f"
  fi
done)

# the same for ">From ..."
(cd Maildir && find . -type f | while IFS= read -r f; do
  if head -n 1 "$f" | grep -q '^>From '; then
    backup "$f" Maildir-backup-files4 && awk 'NR>1' "../Maildir-backup-files4/$f" > "$f"
  fi
done)

# this removes empty headers (with nothing set); only lines before the first
# blank line, i.e. the header block, are dropped, the body is copied as is
(cd Maildir && find . -type f | while IFS= read -r f; do
  if grep -Eq '^(X-Keywords|X-MS-Has-Attach|X-MS-TNEF-Correlator):[[:space:]]*$' "$f"; then
    backup "$f" Maildir-backup-files2 && awk '
      body || !/^(X-Keywords|X-MS-Has-Attach|X-MS-TNEF-Correlator):[ \t\r]*$/ { print }
      /^\r?$/ { body = 1 }' "../Maildir-backup-files2/$f" > "$f"
  fi
done)

# and this removes NUL characters (a file has some if deleting them changes it)
(cd Maildir && find . -type f | while IFS= read -r f; do
  if ! tr -d '\0' < "$f" | cmp -s - "$f"; then
    backup "$f" Maildir-backup-files3 && tr -d '\0' < "../Maildir-backup-files3/$f" > "$f"
  fi
done)
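
If something does go wrong, copying the backups back could look roughly like the following. It is a minimal sketch, assuming the backup trees sit next to Maildir/ and mirror its layout as in the script above; restore_backup is a name I made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: copy every file found under a backup tree back over
# its counterpart in the mail directory. Files without a backup are untouched.
restore_backup() {
    backup_dir=$1   # e.g. Maildir-backup-files
    mail_dir=$2     # e.g. Maildir
    (cd "$backup_dir" && find . -type f) | while IFS= read -r f; do
        cp -p "$backup_dir/$f" "$mail_dir/$f"
    done
}
```

For example: restore_backup Maildir-backup-files3 Maildir. Since every step backs up a file in the state it had just before that step, restoring the trees in reverse order of the steps (Maildir-backup-files3, then Maildir-backup-files2, then Maildir-backup-files4, then Maildir-backup-files) should end with the oldest copy of each file in place.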

This fixes the import problem, but the same issues can also occur when Postfix (my choice of MTA) tries to deliver to the Cyrus mail spool via LMTP. Fortunately, Postfix >= 2.3 comes with options to fix that. First, the empty "Message-ID: " headers can just be discarded with header checks:

/^Message-ID:[[:space:]]*$/ IGNORE

e.g. in a file /etc/postfix/header_checks, which has to be listed in Postfix's main.cf as

header_checks = regexp:/etc/postfix/header_checks

Getting rid of the NUL characters is even easier (starting with Postfix 2.3) with an option in main.cf:

message_strip_characters = \0

This is the "be liberal in what you accept, but strict in what you send" approach. Postfix will accept emails with NUL characters, but remove them before sending them out again or delivering to the local mail spool. Another option, which rejects such emails outright (and is said to catch a lot of spam and produce no false positives, but I haven't tried this yet), is

message_reject_characters = \0

This page was last modified on 2010-05-03