Spam Storm

I seem to have spent most of today—Tax Day, of all days—helping my mail servers recover from some Internet weather. Namely, a gargantuan load of spam delivered over a period of about 14 hours.

Anyone who uses email knows that spam is a fact of life, so much so that folks who've "only" been using the Internet for a decade or so probably think of it much the same way commuters think about weather or traffic. Some days it's heavy, some days it's not so heavy, sometimes it's particularly icky. But it's ever-present and can't easily be avoided, so you just do the best you can, ignore what doesn't affect you, and move on with your life.

For folks like me who run their own email servers (yeah, I'm that old-school), it's a different story. I don't keep on top of the statistics as comprehensively as I did a few years ago, but I can say with reasonable confidence that 90 to 95 percent of all email messages delivered to domains I control are spam. Bear in mind that I'm not the only user here: all told, I host about 200 active mailboxes plus about 50 mailing lists, and on a typical weekday we process roughly 1,000 to 1,500 legitimate email messages, ranging from mailing lists to personal messages to brief notes with giant attachments, the usual gamut. It's small potatoes in the grand scheme of things, but hey. We get by.

Do the math and you'll see we also swallow a lot of spam.
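Here is one way to do that math, using only the figures above (1,000 to 1,500 legitimate messages a day, with spam at 90 to 95 percent of everything delivered). If spam is a fraction p of all mail, the total is legit / (1 - p), so the spam alone is legit × p / (1 - p). A quick sketch:

```python
# Back-of-the-envelope spam volume from the figures quoted in the post:
# 1,000-1,500 legitimate messages a day, spam at 90-95 percent of all mail.

def spam_per_day(legit_per_day: float, spam_fraction: float) -> float:
    """If spam is a fraction p of all mail, total = legit / (1 - p),
    so spam = total - legit = legit * p / (1 - p)."""
    if not 0.0 <= spam_fraction < 1.0:
        raise ValueError("spam_fraction must be in [0, 1)")
    return legit_per_day * spam_fraction / (1.0 - spam_fraction)


if __name__ == "__main__":
    for legit in (1_000, 1_500):
        for p in (0.90, 0.95):
            print(f"{legit:,} legit at {p:.0%} spam: ~{spam_per_day(legit, p):,.0f} spam/day")
```

The low end works out to about 9,000 spam messages a day and the high end to about 28,500, or 9 to 19 pieces of spam for every legitimate message.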

I've run some spam filtering on my mail servers, but it's always been very conservative: I've taken the approach that I'd rather let spam through than block legitimate mail. The filters are also all home-brew, based on spam messages that make it through the existing filters as well as those collected by "honeypot" addresses hosted elsewhere. Periodically, these messages all get run through a set of text analysis and heuristic tools (written by yours truly) looking for identifiable patterns. When the tools identify a pattern, it gets tested against both spam and legitimate mail and, if effective, implemented as a real live filtering rule. Sure, I could use commercial or open source antispam packages, but I built this system before there were commercial or open source antispam packages…and I already know how it works, so keeping it going is easier than swapping it out.
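The "test it against both spam and legitimate mail" step can be sketched roughly like this. It is not the actual home-brew tooling described above: it assumes candidate patterns are regular expressions, and the promotion threshold (`min_spam_rate`) and all function names are hypothetical. The one policy it does try to capture is the conservative one: a single hit on legitimate mail disqualifies a rule.

```python
import re
from dataclasses import dataclass


@dataclass
class RuleReport:
    pattern: str
    spam_hits: int
    spam_total: int
    ham_hits: int  # matches against known-legitimate mail

    @property
    def spam_rate(self) -> float:
        return self.spam_hits / self.spam_total if self.spam_total else 0.0


def evaluate_rule(pattern: str, spam: list[str], ham: list[str]) -> RuleReport:
    """Count how often a candidate pattern matches each corpus."""
    rx = re.compile(pattern, re.IGNORECASE)
    spam_hits = sum(1 for msg in spam if rx.search(msg))
    ham_hits = sum(1 for msg in ham if rx.search(msg))
    return RuleReport(pattern, spam_hits, len(spam), ham_hits)


def should_promote(report: RuleReport, min_spam_rate: float = 0.05) -> bool:
    """Conservative policy (hypothetical threshold): any hit on legitimate
    mail disqualifies the rule; otherwise it must catch at least
    min_spam_rate of the spam sample to be worth keeping."""
    return report.ham_hits == 0 and report.spam_rate >= min_spam_rate


if __name__ == "__main__":
    spam = ["Cheap meds online now", "You have WON a prize", "cheap  meds, no prescription"]
    ham = ["Lunch on Tuesday?", "Draft attached for review"]
    for candidate in (r"cheap\s+meds", r"attached"):
        report = evaluate_rule(candidate, spam, ham)
        print(candidate, report.spam_hits, report.ham_hits, should_promote(report))
```

With the toy corpora, `cheap\s+meds` matches two spam messages and no legitimate ones, so it would be promoted; `attached` matches a legitimate message and is thrown out, no matter how much spam it might have caught.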

Anyway, when I say we "swallow" a lot of spam, I mean exactly that: for the most part, if you send a message to an address at quibble.com that doesn't exist, the systems here will accept it as if the address were valid. But behind the scenes, the server just drops it on the floor immediately and forgets about it: no error, no bounces, nothing.
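In rough terms, "accept everything, then drop the unknowns" splits into two decisions: one at SMTP time that only looks at the domain, and one after acceptance that looks at the actual mailbox. The sketch below is a hypothetical model of that split, not real mail-server configuration; the mailbox list and the `store` callback are made up for illustration.

```python
from typing import Callable

# Hypothetical configuration; the real account list isn't public.
LOCAL_DOMAINS = {"quibble.com"}
MAILBOXES = {"alice@quibble.com", "announce@quibble.com"}


def accept_rcpt(rcpt: str) -> bool:
    """SMTP-time check: say yes to any address in a local domain,
    without consulting the mailbox list at all."""
    domain = rcpt.rsplit("@", 1)[-1].lower()
    return domain in LOCAL_DOMAINS


def deliver(rcpt: str, message: bytes, store: Callable[[str, bytes], None]) -> str:
    """Post-acceptance: store mail for real mailboxes and silently discard
    the rest. Unknown recipients get no bounce, and nothing is stored."""
    rcpt = rcpt.lower()
    if rcpt not in MAILBOXES:
        return "dropped"
    store(rcpt, message)
    return "delivered"
```

The sending server sees the same "accepted" answer either way; only the second step knows whether a mailbox actually exists.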

I do this for two interrelated reasons. First, I don't have to constantly synchronize ever-changing user account databases across multiple servers. That's just a pain in my arse, and I can't be bothered. Second, it means that messages delivered to bogus addresses via a backup mail server don't "blow back" to the sender (and, in many cases, back to me). That would also be a pain in my arse, and I don't want to be bothered.

Here's a quick summary of how mail delivery works: when you send a message to someone here at quibble.com, your mail program (or Web-based mail service, or whatever) first sends it to a mail server run by your ISP, hosting provider, mail service, or whoever. That server looks at your message, notices it's for quibble.com, and then essentially asks the Internet: "Tell me the address of the server handling email for quibble.com so I can deliver this thing." In most cases, the Internet (actually, the Domain Name System, or DNS) returns several possible answers, each with a number indicating its relative priority. The smaller the number, the higher a server's priority. So, in theory, your mail server will first attempt to deliver your message to a server with a priority of "10" rather than a priority of "20." If the priority "10" server isn't available, your server will eventually try delivering to the priority "20" server, and so on down the line.
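The ordering rule can be sketched in a few lines. The MX records here are hypothetical (preference, hostname) pairs; a real mail server would get them from a DNS lookup. One detail worth knowing: RFC 5321 calls for senders to randomize among servers with the *same* preference, which is how a setup like Yahoo's seven priority "1" servers spreads its load.

```python
import random
from typing import Callable, Optional

MxRecord = tuple[int, str]  # (preference, hostname)


def delivery_order(records: list[MxRecord], rng: Optional[random.Random] = None) -> list[str]:
    """Lowest preference number first; ties broken randomly."""
    if rng is None:
        rng = random.Random()
    shuffled = list(records)
    rng.shuffle(shuffled)              # randomize first...
    shuffled.sort(key=lambda r: r[0])  # ...then a stable sort keeps ties in shuffled order
    return [host for _, host in shuffled]


def send_via_mx(records: list[MxRecord], try_host: Callable[[str], bool]) -> Optional[str]:
    """Walk the list until one host accepts; None means queue and retry later."""
    for host in delivery_order(records):
        if try_host(host):
            return host
    return None
```

With hypothetical records `[(20, "backup.example.net"), (10, "mail.example.net")]`, the order is always the priority "10" server first; if it's unreachable, `send_via_mx` falls through to the priority "20" backup.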

Those lower-priority servers (with the counter-intuitively higher numbers) are generally called "backups": they usually don't hold mailboxes for that domain, but have an agreement to store and forward mail in case the primary servers are offline or unavailable.

Most domains that do any volume of email (or place any value on that traffic) specify multiple mail servers. I typically specify two for my domains (one priority "10," one priority "20"); conversely, the mega-domain yahoo.com currently specifies seven mail servers for its domain, all priority "1." In reality, those seven machines at Yahoo are likely gateways to still more mail servers…and they probably get rotated around quite frequently.

So, here's the thing with those backup servers: if you want them to behave just like your main mail server, they have to know about every valid email account, and they have to employ the exact same antispam and mail processing policies as the main server. Depending on the mail server technologies you use, that can be time consuming, expensive, and labor intensive…or any combination of those three. In my case, "time consuming" alone is enough to shoot down the idea of keeping the backup servers completely synced up with the master server. I've already spent too much of my life babysitting these things, so any time I don't spend managing servers is time well spent.

So I employ a less rigorous approach. That (conservative) spam filtering is identical across my main and backup servers, but the backup server is not aware of all my user accounts and mailing lists: it just accepts mail for any address in the entire domain and forwards it along to the main mail server as soon as it can.

This is actually a common setup among backup mail servers—particularly in cases where organizations have set up peering arrangements amongst themselves to provide some backup email support. For instance, I provide backup mail service for some friends and clients who only operate a single email server. I have no idea what accounts they've created and deleted: my backup server will happily accept mail for their domain, and forward it along to their email server. If their Internet connection or mail server goes down, the theory is that mine may still be available to accept messages, and no communications will be lost.
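A backup of this kind only needs to know which *domains* it covers, not which users exist. A minimal store-and-forward sketch follows; the class name, domain list, and `forward` callback are hypothetical stand-ins for whatever the real mail software does.

```python
from collections import deque
from typing import Callable


class BackupRelay:
    """Store-and-forward backup MX that knows domains, not users (hypothetical sketch)."""

    def __init__(self, relay_domains: set[str]):
        self.relay_domains = {d.lower() for d in relay_domains}
        self.queue: deque = deque()  # (recipient, message) pairs awaiting forwarding

    def accept(self, rcpt: str, message: bytes) -> bool:
        domain = rcpt.rsplit("@", 1)[-1].lower()
        if domain not in self.relay_domains:
            return False  # not a domain we back up; refuse rather than relay
        self.queue.append((rcpt, message))  # any local part is accepted
        return True

    def flush(self, forward: Callable[[str, bytes], bool]) -> int:
        """Try to hand every queued message to the primary; keep failures queued."""
        sent = 0
        for _ in range(len(self.queue)):
            rcpt, message = self.queue.popleft()
            if forward(rcpt, message):
                sent += 1
            else:
                self.queue.append((rcpt, message))
        return sent
```

Note that `accept` happily takes mail for an account the friend deleted last week: the backup has no way to know, which is exactly the property the next paragraph is about.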

Thing is, spammers know all about this sort of backup mail server scenario. As a result, spam programs often invert the mail server priority system, preferentially delivering to the lowest-priority servers (the ones with the highest numbers!). It isn't until the messages are forwarded along to the master mail server that they can get rejected as "user unknown" or "user's account is over quota" or what-have-you…and at that point, the message is out of the spammer's system and is the mail administrator's problem. Around here, that'd make it my problem.
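To see why inverting the order pays off for spammers, consider a generic, hypothetical domain whose primary rejects unknown users outright (reply code 550) while its backup accepts anything (250). The hostnames and user list below are made up; the only difference between the two senders is the direction of the sort.

```python
# Hypothetical domain: the primary knows its users and rejects unknown ones;
# the backup accepts anything for the domain and forwards it later.
MX = [(10, "primary.example.net"), (20, "backup.example.net")]
KNOWN_USERS = {"alice"}


def rcpt_reply(host: str, local_part: str) -> int:
    """SMTP reply code for RCPT TO: 250 = accepted, 550 = no such user."""
    if host == "backup.example.net":
        return 250  # backup can't tell valid users from bogus ones
    return 250 if local_part in KNOWN_USERS else 550


def first_attempt(local_part: str, invert: bool = False) -> tuple[str, int]:
    """Normal senders try the lowest preference number first;
    invert=True mimics spamware that goes straight for the backup."""
    host = sorted(MX, key=lambda r: r[0], reverse=invert)[0][1]
    return host, rcpt_reply(host, local_part)
```

A well-behaved sender addressing `nobody` hits the primary and gets a 550, so the bogus message stays the sender's problem. With `invert=True`, the backup answers 250, and the "user unknown" only surfaces later, between the backup and the primary, long after the spammer has moved on.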

What to do with those messages? Well, I could bounce them back to the sender as undeliverable—and in a perfect world, that would be the right thing to do. In the real world, however, essentially all the sender addresses on spam email are forged. There are two varieties of forgeries: outright bogus addresses that are just fictional, and legitimate email addresses that don't belong to the spammer. In either case, bouncing a message doesn't send it back to the spammer: it'll either be undeliverable, or get delivered to some innocent third party. The second case is particularly troubling: it means the spammer is effectively sending spam through your system! The undeliverable bogus addresses are also a problem: eventually, your mail server will stop trying to send them, and they'll either be deleted or (more likely) get returned to the postmaster or administrator account for the domain. Around here, that's me again. Yuck.

Another problem with bouncing undeliverable messages from backup servers is that those bounces consume resources, just like regular email. Sending the messages back out again takes not only more processing power, but also storage to hold the message while it's being processed and bandwidth to transmit the message. Not much, sure, but multiply it across thousands of messages a day, 365 days a year, and it adds up. This overhead effectively increases the real cost of spam: now I'm expending more CPU cycles (making fewer available for legitimate email), using more storage for spam (meaning, less available for legitimate email), and letting spam consume bandwidth not only when it's delivered to my server, but again when my server tries to send it back out! That's not what I'm paying for, and I'm sure it's not why my broadband provider built their network!
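To put a rough number on "it adds up": every bounced spam crosses the network link twice, once inbound as spam and once outbound as the bounce. The inputs below are purely hypothetical (5,000 bounces a day, with each bounce assumed to be about the same 8 KB as the original message), chosen only to show the arithmetic.

```python
def bounce_traffic_bytes_per_year(bounces_per_day: int, avg_bytes: int, days: int = 365) -> int:
    """Each bounced spam crosses the link twice: inbound as spam, outbound as a bounce.
    Assumes the bounce is roughly the same size as the original message."""
    return 2 * bounces_per_day * avg_bytes * days


if __name__ == "__main__":
    # Hypothetical inputs: 5,000 bounces a day, ~8 KB each way.
    total = bounce_traffic_bytes_per_year(5_000, 8_000)
    print(f"~{total / 1e9:.1f} GB a year")
```

Under those made-up assumptions, that's about 29 GB a year spent carrying mail nobody wanted in either direction, before counting the CPU time and disk space.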

So, the result is that my primary mail servers will accept messages for mailboxes that don't exist. And those messages just vanish. There's no error, they're never written to disk, and they're never processed. Just dropped. I want them disposed of as quickly and as efficiently as possible.

But today, even that wasn't enough. For some reason, spam streams converged and my servers were swamped with an exceptional amount of spam. At its peak, over 350 megabytes of incoming mail was simultaneously queued for processing; that probably represented over 100,000 messages, all of which arrived in the space of about two hours. I tried various tricks to speed the processing, weed out the queues, and hurry things along, but as soon as I thought I had a handle on things, another wave of messages would come in. All told, the old, creaky servers here processed over 1.2 GB of email today, the overwhelming majority of which was spam.
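The peak figures can be sanity-checked with simple division, taking "over 350 megabytes" as 350 decimal megabytes, "over 100,000 messages" as 100,000, and "about two hours" as 7,200 seconds. Those readings of the rounded numbers are assumptions.

```python
def storm_stats(queued_bytes: float, messages: int, window_seconds: float) -> tuple[float, float]:
    """Return (average message size in KB, arrival rate in messages/second)."""
    return queued_bytes / messages / 1_000, messages / window_seconds


if __name__ == "__main__":
    # Figures from the text; decimal megabytes assumed.
    avg_kb, rate = storm_stats(350 * 1_000**2, 100_000, 2 * 3_600)
    print(f"average ~{avg_kb:.1f} KB/message, ~{rate:.1f} messages/second")
```

That works out to roughly 3.5 KB per message arriving at close to 14 messages a second, sustained for two hours.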

I'm pleased that the systems held up, but dismayed that the flood meant legitimate mail was delayed, sometimes by as much as three hours. I don't think it was a deliberate denial-of-service attack: if someone were really trying to overwhelm my email servers, there are more effective ways to do it. Rather, the message storm seems to have been just a strange confluence of the Internet's unpredictable winds of spam.

But, as with any bout of bad weather, I can't say I'm happy about it.
