15 December, 2003: Cross words


So, Saddam Hussein has emerged from his Undisclosed Location and Iraqis everywhere have been celebrating with the now-traditional car-bombings. However, unlike scores of other web-loggers, I'm going to let my ignorance of Iraq stand in the way of my commenting in any more detail.


Instead I'm going to talk about email. (This is where my half-dozen readers split into the non-technical, who will stop reading now; and the technical, who will disagree and then stop reading. Freedom of speech can be such a burden sometimes.)

Junk email is usually described as a `rising tide', a `flood', a `deluge', or in some other aqueous metaphor. And it's true that the stuff is mostly bilge. There's also a lot of it; in the past week, I've received, on average, 16 unwanted emails per hour; back at the end of August, it was more like 40 per hour, but that was a bit of a special case. In technicolour:

[Plot: Rate of mail delivery]

A couple of words of explanation of the plot. `Real' mail is mail which I want to receive, whether it's from actual human beings or garrulous computer programs like cron or Mailman. `Spam' is, basically, spam; advertisements, chain letters and the other internet detritus from people without the social skills to be telemarketers and who hear about a 0.0002% response rate and see an opportunity.

More generally, `spam' is stuff which can be automatically discarded based on content, so viruses like Sobig.F count as well. To filter spam, I use my own `Bayesian' filtering program, bfilter, and SpamAssassin, which is bigger, slower, and tries to be more general. Between them these two filters kill almost all spam; a few messages get through each week, and very occasionally a real mail is marked as spam.
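For readers who haven't met this kind of filter, here is a minimal sketch of the general idea. It is not bfilter's actual algorithm, which the text above doesn't describe: the tokenizer, the add-one smoothing, and the names `TinyBayes`, `train` and `spam_probability` are all assumptions made for illustration. It also only looks at words which appear in a message, not at words which are absent.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Split a message into a set of lowercase word-like tokens."""
    return set(re.findall(r"[a-z0-9$'-]+", text.lower()))


class TinyBayes:
    """Toy naive-Bayes classifier; an illustration, not bfilter itself."""

    def __init__(self):
        # Number of training messages of each kind containing each token.
        self.counts = {"spam": Counter(), "real": Counter()}
        # Number of training messages of each kind.
        self.messages = {"spam": 0, "real": 0}

    def train(self, text, label):
        """Record one message as either 'spam' or 'real'."""
        self.messages[label] += 1
        self.counts[label].update(tokens(text))

    def spam_probability(self, text):
        """Estimate the probability that a message is spam."""
        n_spam, n_real = self.messages["spam"], self.messages["real"]
        # Start from the (smoothed) prior odds, then add one
        # log-likelihood ratio per token seen in the message.
        log_odds = math.log((n_spam + 1) / (n_real + 1))
        for t in tokens(text):
            p_spam = (self.counts["spam"][t] + 1) / (n_spam + 2)
            p_real = (self.counts["real"][t] + 1) / (n_real + 2)
            log_odds += math.log(p_spam / p_real)
        # Convert log-odds to a probability without overflowing exp().
        if log_odds >= 0:
            return 1 / (1 + math.exp(-log_odds))
        e = math.exp(log_odds)
        return e / (1 + e)


f = TinyBayes()
f.train("cheap pills buy now", "spam")
f.train("buy cheap watches now", "spam")
f.train("cron job finished on rattus", "real")
f.train("lunch on friday?", "real")
print(f.spam_probability("buy cheap pills") > 0.5)
print(f.spam_probability("cron job on friday") > 0.5)
```

A real filter would be trained on thousands of messages and would need a threshold chosen to keep the number of real mails marked as spam very small.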

`Forged' mail is error messages which are sent by remote sites in response to spams sent with fake headers which give one of my addresses as a return-path. Filtering these based on content is a really bad idea, because I don't want to lose error messages sent in response to my own mail. Instead I ensure that all mail I send carries a message-ID in a particular form; since message-IDs are usually quoted in error messages, it's then possible to tell whether a bounce message was caused by real mail, or forged spam.
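The message-ID trick can be sketched as follows. The `particular form' actually used isn't given above, so this example assumes one plausible choice: a random part followed by a keyed hash of that part. The names `SECRET`, `DOMAIN`, `make_message_id` and `bounce_is_for_my_mail` are hypothetical.

```python
import hashlib
import hmac
import re
import secrets

SECRET = b"change-me"     # hypothetical private key, known only to the sender
DOMAIN = "example.org"    # placeholder domain for the right-hand side


def _signature(unique):
    """Keyed hash of the random part; only the key holder can compute it."""
    return hmac.new(SECRET, unique.encode(), hashlib.sha256).hexdigest()[:16]


def make_message_id():
    """Build a message-ID of the assumed form <random.signature@domain>."""
    unique = secrets.token_hex(8)  # 16 hex characters
    return f"<{unique}.{_signature(unique)}@{DOMAIN}>"


# Matches any message-ID of the assumed form quoted anywhere in a bounce.
MESSAGE_ID = re.compile(
    r"<([0-9a-f]{16})\.([0-9a-f]{16})@" + re.escape(DOMAIN) + ">"
)


def bounce_is_for_my_mail(bounce_text):
    """True if the bounce quotes a message-ID which we really generated."""
    for unique, sig in MESSAGE_ID.findall(bounce_text):
        if hmac.compare_digest(sig, _signature(unique)):
            return True
    return False


mid = make_message_id()
print(bounce_is_for_my_mail("Delivery failed.\nMessage-ID: " + mid))
print(bounce_is_for_my_mail("Delivery failed.\nMessage-ID: <spam@elsewhere>"))
```

A bounce which quotes no recognisable message-ID at all is treated here as forged; that is a design choice of the sketch, and it will misfile the occasional genuine bounce from a site which doesn't quote headers.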

The actual means by which email is delivered to me is quite complicated. A long time ago I was going to write a description of this. But the fact that my email system bears so much description probably means that I shouldn't expose others to its details. Anyway, I got as far as drawing a diagram of mail delivery before giving up:

[Figure: Distribution diagram]

This can safely be filed under `bad tube map art' and ignored, I think. Anyway, by explanation, the circles represent computers where I read mail, though the names have been changed; `rattus' is my home machine, on the grounds that one of its fans has started to squeak, and anyway when I'm at home I'm never more than 20 feet away from it; the other names are even less helpful. The lines of various colours represent transmission of email by various means.

The purpose of the whole contrivance is to ensure that email is available promptly and at any of the computers where I might want to read it. This obviously isn't important to most people, since they use Microsoft `Outlook' and set it to check email once every minute or something, and therefore add (on average) thirty seconds' needless delay to the delivery of every email. (I think that this is responsible for the rise of `instant messaging' and various other comedy internet protocols which were popular during the dot.com era, but it's possible that there was some other explanation.)

This all works fairly well, with only one wrinkle. Many companies have decided that it's easier just to spread email viruses and then spend large sums of money on `anti-virus' software than simply to not use software which propagates viruses. (This may be a rational decision, though that would be slightly surprising.) The sellers of `anti-virus' software have also branched out into filtering spam. One thing they haven't branched out into is having a clue about email, with the result that `anti-virus' programs spray useless error messages around the internet with gay abandon. (It's important to remember that most computer viruses never affect any computers outside the offices of `anti-virus' software vendors. It's no great surprise that `anti-virus' software is, therefore, quite effective at stopping viruses. Sadly it is not true that the majority of spam is seen only inside the offices of `anti-virus' software vendors....)

And the people who write this software -- who are pretty dopey anyway -- really screw the pooch when it comes to sending those error messages. In internet email, an error message must be sent with a `blank return-path'; this bit of jargon means simply that the field which usually gives the address of the sender of the message should be blank. It is important that error messages be automatically distinguishable from normal messages, in part because an error message must never itself cause another error message to be sent -- doing so could cause broken mail servers to spend all their time bouncing error messages back and forth, consuming bandwidth, disk space, and money in an ever-increasing bonfire of cluelessness -- and also because it's useful for other types of software to be able to distinguish errors from other mails.

`Anti-virus' software vendors and users apparently don't understand this. Failing to maintain this distinction is incredibly irresponsible, and a much worse problem than the viruses themselves. After all, whether you run a computer which is susceptible to Microsoft Windows viruses is your own choice; but if you want to participate in the Internet, it's vital to handle email error messages correctly.
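The rule about blank return-paths is simple enough to sketch in a few lines. This is an illustration of the rule only, not of any particular mail server: the function names `is_error_message` and `plan_bounce` are hypothetical, and the sketch says nothing about the separate question of whether the sender address was forged in the first place.

```python
def is_error_message(envelope_sender):
    """An error message is one whose return-path is blank, written <>."""
    return envelope_sender.strip() in ("", "<>")


def plan_bounce(envelope_sender, reason):
    """Decide what bounce, if any, may be sent for an undeliverable mail.

    Returns (return_path, recipient, body), or None when no bounce
    may be sent at all.
    """
    if is_error_message(envelope_sender):
        # Never answer an error message with another error message;
        # this is what prevents mail loops.
        return None
    # The bounce itself goes out with a blank return-path, so that the
    # receiving end can recognise it as an error message in turn.
    return ("<>", envelope_sender, f"Delivery failed: {reason}")


first = plan_bounce("<alice@example.org>", "no such user")
print(first)
# A failure to deliver that bounce produces nothing further.
print(plan_bounce(first[0], "mailbox full"))
```

Because every bounce carries `<>' as its own return-path, a bounce which itself can't be delivered generates no second bounce, and the exchange stops after one message.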

And so to my latest Internet guerilla campaign. Every time I receive one of these error messages, I'm going to reply with this form letter:

Important; please read

The message reproduced below is some kind of bounce or error message produced by anti-spam or anti-virus scanning software at your site. Since I am not a virus or a spammer, it was obviously sent me in error. This error occurred because your software made the completely wrong assumption that the addresses given in the from-address or return-path of an email it received were valid and could be used to communicate with the sender of that mail.

This assumption is wrong. Spammers and viruses routinely forge the addresses in email headers, and have done so now for many years. These addresses are wrong, and there is nothing you can do about it. By sending error messages to the wrong addresses quoted in the headers of spam and virus emails YOU ARE MAKING THE PROBLEM WORSE, by generating even more unwanted email.

You have compounded the problem by failing to send your error message with a blank return-path (written `<>'). All email error messages MUST be sent with a blank return-path, in order to show that they are error messages. This prevents mail loops, and also allows wrongly-sent error messages, like yours, effectively to be filtered out.

It is no excuse to say that your email scanning software comes from a respected vendor or that everyone else has made the same mistake. Your software is doing the wrong thing; it has already caused me -- and probably countless others -- wasted time and money; and one day it will cause a mail loop and cost YOU large amounts of time and money to fix the problems it created. It is your responsibility to fix these problems, just as surely as if your organisation was selling a dangerous product or dumping dangerous pollution in a watercourse.

You can fix your software by preventing it from sending error messages like that below; or, if you must send the error messages -- which are useless to 99% of the people who receive them, people whose email addresses appear in message headers by pure accident and who have nothing to do with your organisation and no desire to -- then you must make sure that the error messages originate with a blank return-path. Otherwise you will continue to cause the problems I describe above in even greater quantity.

If you did not understand any part of the above explanation, then please pass it on to someone in your organisation who is responsible for email service and who does understand it. If there is no such person in your organisation, then please for god's sake hire one without delay, before you create any more trouble for yourself and others.

I don't expect this to help -- and, indeed, I've already got one response from a fairly heavyweight academic institution in Denmark which completely missed the point. But it's important to try.

Copyright (c) 2003 Chris Lightfoot; available under a Creative Commons License.