SPAM is a four-letter word

Did you know that AOL is considering changing its catch phrase “You’ve Got Mail!” to “You’ve got SPAM!”? To be fair, most of the large e-mail providers, AOL included, do a pretty good job of blocking spam e-mail these days, but it continues to be an ever-growing problem.

First of all, let's define SPAM. As a verb, to SPAM means to indiscriminately send unsolicited, unwanted, irrelevant, or inappropriate messages, especially commercial advertising, in mass quantities. The noun SPAM, of course, refers to the messages sent.

The name SPAM itself has humorous origins. We all know it originally as the much-maligned food product, which I actually like. Its association with unwanted e-mail comes from a popular Monty Python sketch, first broadcast in 1970. In the sketch, two customers are trying to order a breakfast without Spam from a menu that includes the processed meat product in every entrée. The phrase "spam, spam, spam, spam, spam" is heard throughout, and the final cry of "I don't like SPAM!" sums up the feelings most people have for the bulk e-mailing practice.

The first e-mail spam is widely believed to have been sent in 1978. It was sent to several hundred addresses on the early Internet, and was intended to inform the recipients of a new computer product. At that time, the Internet was reserved for non-commercial traffic between government agencies and educational institutions, and this caused quite a ruckus.

Today's spam has a much wider reach and, often, a much more sinister purpose. Some estimate that spam accounts for 90% of all e-mail traffic on the Internet. At DWR alone, 500,000 spam messages are received in an average month. That's more than 16,000 per day, or about 60% of all received e-mail. While we are meeting here today, over 500 spam messages will be blocked by our e-mail servers. Looking at it another way, on average, a piece of spam is received at DWR every 5 seconds.
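For anyone who wants to check my arithmetic, here is a small sketch of the calculation. The 500,000 figure is the one quoted above; the 30-day month is my own simplifying assumption, so the results are approximate.

```python
# Back-of-the-envelope check of the spam figures quoted above.
SPAM_PER_MONTH = 500_000         # figure from the talk
DAYS_PER_MONTH = 30              # assumption: an average 30-day month
SECONDS_PER_DAY = 24 * 60 * 60

spam_per_day = SPAM_PER_MONTH / DAYS_PER_MONTH
seconds_between_spam = SECONDS_PER_DAY / spam_per_day
minutes_for_500 = 500 * seconds_between_spam / 60

print(f"{spam_per_day:,.0f} spam messages per day")
print(f"one spam message every {seconds_between_spam:.1f} seconds")
print(f"500 spam messages in about {minutes_for_500:.0f} minutes")
```

Under that assumption, 500 messages arrive in well under an hour, which is why even a short meeting is long enough for the "over 500" figure.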

Home computer users are often protected from spam by their Internet Service Providers. However, private businesses, schools, colleges, and government offices that operate their own mail servers, as we do at DWR, are left to their own devices to combat SPAM. By mid-2003, spam was becoming a major problem at DWR, and, as the Department's e-mail postmaster, I undertook the task of doing something about it. After studying the problem, and building on software that I had previously developed for personal use, I introduced Department-wide spam filtering in September 2003.

In order to understand spam filtering, it's necessary to understand the structure of an e-mail message itself.

An e-mail message is composed of three basic parts, much like a standard letter.

First, there is the "envelope." This is information about the origin of the message, that is, the computer that is connecting to our system for delivery. We might compare this to the stamp and postmark on a regular letter. It also contains the e-mail address of the message sender (the return address) and the addresses of each recipient.

The next part is the "headers," which you would normally see as the "From," "To," "Subject" and "Date" fields. In a typical business letter, these can be compared to the letterhead, the complete address and the salutation. Often hidden from view, however, are additional headers that contain information about how the message is constructed and which path through the Internet it took to get to its final destination.

Finally, there is the body of the message, which is the actual information being sent. The body may be simple text, or could be a combination of text, images and other attachments.
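To make the three parts concrete, here is a minimal sketch using Python's standard `email` package. The addresses, the sample message and the `envelope` dictionary are all made up for illustration. The envelope is not part of the message text at all; a mail server records it during delivery, so it is shown here as a separate, hypothetical structure.

```python
from email import message_from_string
from email.policy import default

# Hypothetical envelope, as a receiving mail server might record it.
envelope = {
    "client_ip": "192.0.2.10",             # the computer connecting to us
    "mail_from": "sender@example.com",     # the return address
    "rcpt_to": ["reader@example.org"],     # each recipient
}

# Made-up message: headers first, then a blank line, then the body.
raw_message = (
    "From: Sender <sender@example.com>\n"
    "To: reader@example.org\n"
    "Subject: Meeting notes\n"
    "Date: Mon, 01 Sep 2003 09:00:00 -0700\n"
    "Received: from mail.example.com (192.0.2.10) by mx.example.org\n"
    "\n"
    "Here are the notes from today's meeting.\n"
)

message = message_from_string(raw_message, policy=default)

# The headers most mail programs display.
subject = message["Subject"]

# Headers that are usually hidden, such as the delivery path.
visible = ("From", "To", "Subject", "Date")
hidden_headers = [name for name in message.keys() if name not in visible]

# The body: the actual information being sent.
body = message.get_content().strip()

print(subject, hidden_headers, body, sep="\n")
```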

Each of these parts of an e-mail message can be analyzed to determine the validity of a message, and whether or not it should be delivered.

For example, the computer address from which the message arrives can be examined to see if it is from a known source. That source may be considered safe or suspect. If suspect, the message may be dropped before the rest of it is even read. Then we can look at who is sending the message. If the message is from a person or organization that is known to be a spammer, we can also drop the message before it goes any further.

Once the envelope is "opened," we can take a look at the headers. Clues in this information may allow us to discard the message. There may also be sufficient information to determine that the message itself is safe to send on.

Finally, we can examine the contents of the message itself. Certain attachments, such as viruses, can be isolated and removed to make the message safe for delivery, or the message can be dropped altogether.
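The stages just described can be sketched as a single function that checks the cheapest evidence first and stops as soon as it has a verdict. This is only an illustration, not the filter we run at DWR: the blocklists, the rule that a message must carry "From" and "Date" headers, and the list of unsafe file extensions are all assumptions made for the example.

```python
# Hypothetical blocklists; a real system would maintain much larger ones.
BLOCKED_IPS = {"203.0.113.7"}
BLOCKED_SENDERS = {"offers@spam.example"}
UNSAFE_EXTENSIONS = (".exe", ".scr", ".pif")


def filter_message(client_ip, sender, headers, attachments):
    """Return (verdict, stage), checking envelope, then headers, then body."""
    # Stage 1: the connecting computer, before anything else is read.
    if client_ip in BLOCKED_IPS:
        return ("drop", "connection")

    # Stage 2: the envelope sender.
    if sender.lower() in BLOCKED_SENDERS:
        return ("drop", "sender")

    # Stage 3: the headers (assumed rule: From and Date must be present).
    if "From" not in headers or "Date" not in headers:
        return ("drop", "headers")

    # Stage 4: the contents; strip attachments with unsafe extensions.
    safe = [a for a in attachments if not a.lower().endswith(UNSAFE_EXTENSIONS)]
    if len(safe) < len(attachments):
        return ("deliver without unsafe attachments", "body")
    return ("deliver", "body")


good_headers = {"From": "a@example.com", "Date": "Mon, 01 Sep 2003 09:00:00 -0700"}
print(filter_message("203.0.113.7", "a@example.com", good_headers, []))
print(filter_message("192.0.2.10", "a@example.com", good_headers, ["notes.txt", "free.exe"]))
```

Ordering the checks this way means a message from a suspect source costs almost nothing to reject, while only the messages that survive the early stages pay for a full content scan.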

It's usually easy for a person to look at a message and see that it is spam, but much harder for a computer to do so. A table of "banned" words in message subjects used to be enough to determine that a message should be dropped. These days, however, spammers often purposely misspell those words, or use alternate characters to disguise them from spam-scanning programs.
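Here is a small sketch of why a plain word table stops working, and one possible countermeasure. The banned words and the table of look-alike characters are invented for the example. The first function matches whole words only, so a disguised spelling slips through; the second undoes common character substitutions before looking.

```python
import re

# Hypothetical table of banned subject words.
BANNED_WORDS = {"lottery", "refinance"}

# Assumed look-alike substitutions that spammers might use.
LOOKALIKES = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s", "!": "i"}
)


def naive_match(subject):
    """Match banned words only when they appear spelled normally."""
    words = re.findall(r"[a-z]+", subject.lower())
    return any(word in BANNED_WORDS for word in words)


def normalized_match(subject):
    """Undo look-alike characters and strip separators before matching."""
    text = subject.lower().translate(LOOKALIKES)
    squeezed = re.sub(r"[^a-z]", "", text)
    return any(word in squeezed for word in BANNED_WORDS)


for subject in ("Win the LOTTERY today", "Win the L0TT3RY today", "R.3.F.!.N.4.N.C.E now"):
    print(subject, naive_match(subject), normalized_match(subject))
```

Stripping every separator makes the second check more aggressive: it can also match a banned word that happens to span two innocent words, so a real filter would have to weigh that risk of false alarms.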

On the other hand, given a large enough set of messages, certain patterns in those messages can aid in the detection of spam. A message that looks innocent to a recipient may in fact carry a dangerous virus, or may induce the recipient to fall for a bank fraud or cash transfer scheme. But spam-scanning software builds on a history of messages received, and can reduce the number of such occurrences.
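One common way to build on a history of messages is to count how often each word has appeared in known spam versus known good mail, and score new messages accordingly. The sketch below is a simplified naive-Bayes-style scorer. I am not claiming it is how any particular product works; the class name `HistoryFilter` and the four training messages are invented for the example.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Return the set of distinct lowercase words in a message."""
    return set(re.findall(r"[a-z']+", text.lower()))


class HistoryFilter:
    """Hypothetical scorer that learns word counts from labeled mail."""

    def __init__(self):
        self.counts = {"spam": Counter(), "good": Counter()}
        self.totals = {"spam": 0, "good": 0}

    def learn(self, text, label):
        self.counts[label].update(tokens(text))
        self.totals[label] += 1

    def spam_score(self, text):
        """Log-odds that a message is spam; positive leans spam."""
        score = math.log((self.totals["spam"] + 1) / (self.totals["good"] + 1))
        for word in tokens(text):
            # Add-one smoothing so unseen words do not force a verdict.
            p_spam = (self.counts["spam"][word] + 1) / (self.totals["spam"] + 2)
            p_good = (self.counts["good"][word] + 1) / (self.totals["good"] + 2)
            score += math.log(p_spam / p_good)
        return score


history = HistoryFilter()
history.learn("verify your bank account now", "spam")
history.learn("urgent wire transfer needed", "spam")
history.learn("staff meeting moved to friday", "good")
history.learn("budget report attached for review", "good")

print(history.spam_score("please verify your account") > 0)
print(history.spam_score("friday staff meeting") > 0)
```

With only four training messages the scores mean very little; the point is that the filter's judgment comes from the mail it has already seen, which is why it improves as the history grows.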

With continued improvements in spam-filtering techniques, we can hope that spam will someday be a distant memory.