There are a few things about conditions in procmail recipes that are not obvious from the procmailrc manpage, but that you should know if you want to understand how things work.
By default, the regular expressions in recipe condition lines are matched against the header of the email. If you look at the raw source of an email, you'll see something like this:
From bob@example.com Tue Sep 04 10:58:53 2007
Return-path: <bob@example.com>
Envelope-to: alice@example2.com
Delivery-date: Tue, 04 Sep 2007 10:58:53 +0200
Received: from mail.example3.com ([192.168.0.1])
	by mail.example2.com with esmtp (Exim 3.36 #1 (Debian))
	id 1ISoS0-0001Ex-00
	for <alice@example2.com>; Wed, 05 Sep 2007 10:58:53 +0200
Received: from mail.example.com ([192.168.0.2]) by mail.example3.com with Microsoft SMTPSVC(6.0.3790.3959);
	Tue, 4 Sep 2007 10:58:52 +0200
Received: from [127.0.0.1] ([192.168.0.3]) by mail.example.com over TLS secured channel with Microsoft SMTPSVC(6.0.3790.3959);
	Tue, 4 Sep 2007 10:58:52 +0200
Message-ID: <46DD1E79.90004@example.com>
Date: Tue, 04 Sep 2007 10:58:49 +0200
From: joe@example.com
User-Agent: Thunderbird 1.5.0.13 (X11/20070824)
MIME-Version: 1.0
To: john@example2.com
Subject: teszt
Content-Type: text/plain; charset=ISO-8859-2; format=flowed
Content-Transfer-Encoding: quoted-printable

This is the body of the email.
You might have noticed that some headers span multiple lines (e.g. the Received: headers in the example above). Procmail merges these split lines before it runs any recipe on them, so the first Received: header looks like this as the input of a recipe (the following is a single line, even if it appears wrapped on the page):
Received: from mail.example3.com ([192.168.0.1]) by mail.example2.com with esmtp (Exim 3.36 #1 (Debian)) id 1ISoS0-0001Ex-00 for <alice@example2.com>; Wed, 05 Sep 2007 10:58:53 +0200
There's another catch you should be aware of: during the merge, the newline at the end of each line is not simply removed but replaced by a space character. In addition, each continuation line of the Received: header in the raw email starts with a tab character, and the merge leaves that tab in place! So wherever the original header was split, the merged line contains a space followed by a tab.
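To make the merge concrete, here is a small Python sketch that mimics the behaviour described above. It is only an illustration, not procmail's own code; the function name unfold_headers and the rule "a line starting with a space or tab continues the previous header" are assumptions of this sketch.

```python
from __future__ import annotations


def unfold_headers(raw_header: str) -> list[str]:
    """Merge folded header lines as described above (illustrative sketch):
    the newline before a continuation line becomes a space, and the
    continuation line's own leading whitespace (e.g. a tab) is kept."""
    fields: list[str] = []
    for line in raw_header.split("\n"):
        if line[:1] in (" ", "\t") and fields:
            # Continuation line: newline -> space, leading tab left untouched.
            fields[-1] += " " + line
        elif line:
            # A new header field starts here.
            fields.append(line)
    return fields


raw = (
    "Received: from mail.example3.com ([192.168.0.1])\n"
    "\tby mail.example2.com with esmtp (Exim 3.36 #1 (Debian))\n"
    "\tid 1ISoS0-0001Ex-00\n"
    "Subject: teszt\n"
)

for field in unfold_headers(raw):
    # repr() makes the invisible " \t" sequences visible.
    print(repr(field))
```

Printing with repr() shows a " \t" pair at each former line break of the Received: header, which is exactly what a precise regexp has to cope with.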
So if you're going to do precise matching in regexps (I mean not just putting .* between the fixed strings of the regexp, but matching whitespace more exactly), don't forget about the tab character. Unfortunately, procmail regular expressions support neither named character classes (e.g. [:space:]) nor common escape sequences (e.g. \t for a tab). You have to use the literal characters, so a tab must be written as the actual character with ASCII code 0x09.
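Since many editors and web pages silently turn tabs into spaces (or drop them), one way to be sure the tab really ends up in your rc file is to generate the line. The following is a minimal Python sketch, not part of procmail; the function name whitespace_class_line is hypothetical and the variable name WSPC is only an example.

```python
import sys

TAB = "\x09"  # the literal tab character (ASCII 0x09) that procmail needs


def whitespace_class_line(var_name: str = "WSPC") -> str:
    """Return a procmailrc assignment whose value is a space followed by a
    real tab character, suitable for a bracket expression such as [$WSPC]."""
    return f'{var_name} = " {TAB}"'


if __name__ == "__main__":
    line = whitespace_class_line()
    # Sanity check: the value must contain byte 0x09, not just spaces.
    assert line.encode("ascii")[-2] == 0x09
    sys.stdout.write(line + "\n")
```

You could redirect the output into a separate file and pull it in with INCLUDERC near the top of your .procmailrc, or paste it with an editor that preserves tabs; that file layout is just one possible setup.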
The procmail-lib Debian package has a large number of advanced recipes that you can use as learning material. It contains a pm-javar.rc file with many predefined variables that you can use in place of character classes.
For example, whitespace (both space and tab characters) can be matched in recipes like this:
WSPC = " 	" # space and tab (the 2nd character must be a literal TAB, 0x09)
SPC = "[$WSPC]"
s = "$SPC"
:0
*$ ^Received:$s*from$s+mail\.example3\.com$s*\(\[192\.168\.0\.1\]\)$s*by$s+mail\.example2\.com
some_mailbox
The example recipe above matches, because we used the proper whitespace regexp everywhere. If we used " *" instead of "$s*" after the IP address part, the regexp would not match, because of the tab in the merged header line.
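The difference can be reproduced outside procmail with a short Python sketch. Python's re module is not procmail's regex engine, but for this pattern (literal characters, a bracket expression, *, + and escaped metacharacters) both should behave the same. The merged header string is built by hand following the merge rules described above, and the variable names are illustrative only.

```python
import re

# The merged first Received: header: each former newline became a space,
# and the tab that started each continuation line is still there.
merged = (
    "Received: from mail.example3.com ([192.168.0.1]) "
    "\tby mail.example2.com with esmtp (Exim 3.36 #1 (Debian)) "
    "\tid 1ISoS0-0001Ex-00 for <alice@example2.com>; Wed, 05 Sep 2007 10:58:53 +0200"
)

s = "[ \t]"  # same role as $s in the recipe: one space or one tab

# The recipe's condition with $s expanded by hand.
with_s = (
    rf"^Received:{s}*from{s}+mail\.example3\.com{s}*"
    rf"\(\[192\.168\.0\.1\]\){s}*by{s}+mail\.example2\.com"
)
# Same pattern, but " *" (spaces only) after the IP address part.
with_spaces = (
    rf"^Received:{s}*from{s}+mail\.example3\.com{s}*"
    rf"\(\[192\.168\.0\.1\]\) *by{s}+mail\.example2\.com"
)

print(bool(re.search(with_s, merged, re.MULTILINE)))       # True
print(bool(re.search(with_spaces, merged, re.MULTILINE)))  # False
```

The second pattern fails because " *" consumes the space but then meets the tab where it expects "by".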
Of course, conditions on Received: headers are not very common, but Subject:, To: and Cc: headers can easily grow long enough to be split across multiple lines.
Comments
SpamAssassin whitelisting by "Received" headers
Actually, SpamAssassin already has a config option that does exactly what I did with the procmail recipe. It's called whitelist_from_rcvd. You have to put it into your user config ($HOME/.spamassassin/user_prefs) and specify the mail address plus a pattern that has to match the reverse DNS name of the relay server. So if your own mail always arrives through localhost, you could whitelist it with this config:
whitelist_from_rcvd your_address@example.com localhost
In some cases localhost may be identified as localhost.localdomain, so you might want to add that too. This works pretty well if you use e.g. a webmail on the same server your email is delivered to.