Cleaning Your Address List
An easy and accurate way to use bounced messages to clean your address list...
Lately at Quiksoft, we have been talking a lot about cleaning up our e-mail address list. Many of our customers have been asking how to reliably track the status of outbound e-mail messages, and how to update their address database when a message is returned undeliverable, otherwise known as a bounce.
In this article you will learn:
Three very important reasons why you must clean your e-mail address list now
What you need to know about how SMTP servers route bounced messages
The secret to automatically matching bounced messages to addresses in your database
The difference between hard and soft bounces and why you should track both
Bonus secret to tracking failures on a mailing by mailing basis
This edition also contains downloadable sample code that will:
Encode your outbound messages with the proper information so that they can be matched to your address database if they are returned undeliverable
Scan your bounced messages and flag the addresses in your database
Provide you with tons of phrases found in typical bounced messages, which can be used to programmatically discover their meaning
Three reasons why you must clean your list now...
I used to think that the quality of my list didn't matter. I thought it would be better to send to the entire list and let failures take care of themselves. But that was then and this is now, and over the years experience has taught me three important reasons to keep a clean list:
1. Some popular mail servers may block all mail from you if you repeatedly send mail to a bad address on their domain.
2. Repeatedly sending e-mail to bad addresses wastes bandwidth. Even if bandwidth is not an issue now, this problem will grow in scale with time.
3. If you are going to do any type of response tracking, you must subtract out the failures for an accurate report.
So with these reasons in mind, I set out to clean our address list. But how to do it reliably was the question...
A simple answer to a complex problem...
To clean our address list I would have to identify bad addresses and flag them in our address database so that I did not send e-mail to them anymore. I decided that I did not want to delete the bad addresses; I just wanted to flag them as bad. But how do you determine that an address is bad?
Most SMTP servers will accept mail addressed to just about anyone in their domain, and only later figure out that the user does not exist. That means that whatever app you use to send mail will almost never know that there is a problem. As far as your app is concerned, the SMTP server accepted the message, period.
I tried looking at so-called "address verifier" components. These components check the e-mail address for syntactical errors and for non-existent domains, but they cannot actually tell if the user part of the address is valid. I used several of these to validate buggs.bunny@microsoft.com and was excited to find that Buggs does work at Microsoft these days, but when I sent him an e-mail, it bounced back with the following message: "Delivery to the following recipients failed: buggs.bunny@microsoft.com". The truth is that these "address verifier" components were no better at verifying addresses than my app was, so they were of no use to me.
So how do you reliably determine if an address is good? The answer is: you can't. But you can determine if an address is bad when a message sent to it is returned undeliverable (bounced), and that is the key to solving this problem.
The best part of this solution is that it is not dependent on extended SMTP features. It will work all the time provided that the recipient's mail server correctly adheres to RFC-821, the minimum requirements for any SMTP server. The SMTP protocol as outlined in RFC-821 provides for a notification mechanism when a message cannot be delivered. This notification mechanism works by creating a new e-mail message which is sent to the original sender to inform them that their message was not delivered. This e-mail message is commonly referred to as a bounce. The first step to cleaning our address list is to funnel the bounced messages into a central location where they can be programmatically analyzed.
The following three-step process will enable you to capture bounced messages, figure out which address in your database they belong to, and flag the record.
Three Easy Steps
Step 1. Use a bounce box...
The first step in cleaning your list is to trap bounced messages in a central location. We suggest that you create a "bounce box". A bounce box is a dedicated e-mail account that is set up to trap returned messages, e.g. bounce@yourdomain.com. To be sure that returned messages find their way to your bounce box you must understand how these messages are routed by SMTP servers.
When a message is submitted to an SMTP server it is tagged with a reverse-path. The reverse-path is specified by the sending application with the MAIL FROM: command as outlined in the SMTP RFC-821. The reverse-path is the path that the server should use to communicate with the original sender of the message, and therefore the reverse-path is typically the e-mail address of the sender (the from address).
The SMTP server stores the reverse-path internally, not in the actual message, and forwards it with the message through any relay servers as necessary until the message encounters an error or reaches its destination. Since the reverse-path is not recorded in the actual message it is typical to add a From: header to the e-mail message which contains the address of the sender and an optional friendly name, e.g. "Joe Sender" <joe.sender@domain.com>. Mail readers use the From: header to display who a message is from.
It is very important to understand that the reverse-path and the address in the From: header need not be the same. Therefore it is possible to send a message which will be displayed by mail readers as coming from joe.sender@domain.com, but has a reverse-path of some_other_address@domain.com.
Once you understand the difference between the reverse-path and the From: header, and the roles they play, you are on your way to building messages that will be displayed in a friendly manner if delivered, or will be returned to your centralized bounce box if there is a failure.
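To make the split between the reverse-path and the From: header concrete, here is a minimal sketch in Python using the standard smtplib and email modules rather than the EasyMail objects used in the samples later in this article. The function names build_message and send_with_bounce_box are made up for this illustration, and the server name you pass in is whatever your own mail server is called.

```python
import smtplib
from email.headerregistry import Address
from email.message import EmailMessage


def build_message(friendly_name, friendly_addr, recipient, subject, body):
    """Build a message whose From: header shows the friendly sender."""
    msg = EmailMessage()
    # This is what the recipient's mail reader displays.
    msg["From"] = Address(friendly_name, addr_spec=friendly_addr)
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_with_bounce_box(server, bounce_addr, recipient, msg):
    """Send msg with bounce_addr as the reverse-path (MAIL FROM)."""
    with smtplib.SMTP(server) as smtp:
        # from_addr sets only the envelope sender passed in MAIL FROM;
        # the From: header inside the message is left untouched.
        smtp.send_message(msg, from_addr=bounce_addr, to_addrs=[recipient])
```

A call such as send_with_bounce_box("mail.yourdomain.com", "bounce@yourdomain.com", "jane.recipient@domain.com", msg) would then show Joe Sender in the mail reader while any failure notice is addressed to the bounce box.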
Step 2. Add custom data to bounced messages...
This step requires that your mail server is capable of being configured to use a wildcard address. In other words, it needs to be able to route all mail to bounce*@yourdomain.com to one specific account such as bounce@yourdomain.com. If your mail server does not support wildcard addresses, you can accomplish the same thing by using a "catch-all" box and a dedicated domain.
You can then append custom data to the end of the account name portion of the reverse-path and it will still be delivered to the bounce@yourdomain.com account. For example, suppose each e-mail address in your database is identified by a unique numerical id. You can then encode this id into your bounce address. If the recipient address is jane.recipient@domain.com, and the id of this address in your database is 1063, you could build an address such as bounce_1063@yourdomain.com.
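Building the bounce address is a one-line string operation. The sketch below is in Python, and the helper name make_bounce_address is hypothetical; the "bounce_" prefix and the example values come from the paragraph above.

```python
def make_bounce_address(record_id, bounce_domain, prefix="bounce_"):
    """Encode a database record id into the account part of a bounce address."""
    record_id = int(record_id)  # reject anything that is not a whole number
    if record_id < 0:
        raise ValueError("record id must not be negative")
    return f"{prefix}{record_id}@{bounce_domain}"


print(make_bounce_address(1063, "yourdomain.com"))  # bounce_1063@yourdomain.com
```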
You can then send a message to jane.recipient@domain.com and specify bounce_1063@yourdomain.com as the reverse-path by passing that address to the SMTP server with the MAIL FROM command, i.e. MAIL FROM:<bounce_1063@yourdomain.com>. To provide a friendly "from" name or address for Jane's mail reader to display, you can add a From: header to the message, i.e. From: "Joe Sender" <joe.sender@domain.com>.
The sample at the end of this article shows how easily this can be done.
If the message is delivered successfully, Jane's mail reader will display it as coming from Joe Sender. If for some reason the message is undeliverable, an "undeliverable mail" notification message will be sent to bounce_1063@yourdomain.com. Since your mail server has been instructed to deliver all messages for bounce*@yourdomain.com to bounce@yourdomain.com, this returned message should now land in your bounce box.
Additionally, since a returned message is sent to the address specified by the reverse-path of the original, each of these messages should have your custom bounce address in the To: header. In other words, each of the messages in the bounce box will be addressed to bounce_<id>@yourdomain.com, where <id> represents the id of the e-mail address in your database which is related to the bounce. Our testing has indicated, however, that some mail servers use the From: address of the original message as the To: address of the resulting bounce. This is not what should be going on according to the RFC, but we have a fix for that too. If the To: header address does not begin with bounce_, you can scan the message's "Received" headers and find your bounce address there. The sample code shows you how this is done.
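The lookup order described above (To: header first, then the "Received" headers) can be sketched in Python with the standard email module. This is only an illustration under assumptions: the function name find_bounce_id is made up, and it assumes the relaying server recorded the recipient in a "for <address>" clause of a Received header, which not every server does.

```python
import re
from email import message_from_string
from email.utils import getaddresses

# Matches bounce_<digits>@ and captures the digits (the database id).
BOUNCE_RE = re.compile(r"\bbounce_(\d+)@", re.IGNORECASE)


def find_bounce_id(raw_message):
    """Return the id encoded in the bounce address, or None if not found."""
    msg = message_from_string(raw_message)

    # First choice: the To: header of the returned message.
    for _name, addr in getaddresses(msg.get_all("To", [])):
        match = BOUNCE_RE.match(addr)
        if match:
            return int(match.group(1))

    # Fallback: look for the bounce address in the Received headers.
    for received in msg.get_all("Received", []):
        match = BOUNCE_RE.search(received)
        if match:
            return int(match.group(1))

    return None
```

Returning None rather than raising lets the caller leave an unmatched message in the bounce box for a human to look at.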
Following these rules, you can now easily match bounced messages up to your database, as you will see...
Step 3. Retrieve the bounced messages and update your database...
At this point, assuming you have sent mail as prescribed above, and some of those messages were returned, you will have one or more messages in your bounce box. Each of these messages will be addressed to bounce_<id>@yourdomain.com, where <id> represents the id of the e-mail address in your database which is related to the bounce.
Now it is important to understand that there are two types of bounces: hard and soft. Permanent failures, such as a nonexistent account or domain, are considered hard bounces. Other failures, such as a full mailbox or blocked domain, are considered soft bounces. Instead of flagging your addresses as good or bad, your database can keep a running count of hard and soft bounces for each address. That way, your mailing application can be more intelligent about determining which addresses to exclude from future mailings. For example, you might only want to send mail to addresses with fewer than eight soft bounces and fewer than two hard bounces. I usually do not like to exclude someone from future mailings unless they have more than one hard bounce. Just to be sure that the address is really invalid, I look for at least two hard bounces.
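The exclusion rule from the example above fits in a few lines of Python. The thresholds are the ones mentioned in the text (eight soft, two hard); the function name should_send is made up for this sketch, and you would tune the limits to suit your own list.

```python
MAX_SOFT_BOUNCES = 8  # addresses must have fewer soft bounces than this
MAX_HARD_BOUNCES = 2  # addresses must have fewer hard bounces than this


def should_send(hard_bounces, soft_bounces):
    """Return True if the address is still below both bounce limits."""
    return hard_bounces < MAX_HARD_BOUNCES and soft_bounces < MAX_SOFT_BOUNCES
```

With these limits, an address with one hard bounce still receives mail, while a second hard bounce excludes it.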
Your application will have to scan the text of the bounced messages looking for phrases that indicate the reason for the bounce. It will look for such phrases as "delivery failure", "box full", etc. (The downloadable sample code includes a database of the phrases we have discovered in typical bounced messages.) Your app will determine if each bounce is hard or soft based on the phrase it finds in the message.
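As a rough Python sketch of phrase scanning: the phrase lists below are illustrative guesses, not the contents of the downloadable phrase database, and checking the hard phrases first is a design choice of this sketch only. The function name classify_bounce is hypothetical.

```python
# Illustrative phrases only; a real list would be much longer.
HARD_PHRASES = ("user unknown", "no such user")
SOFT_PHRASES = ("box full", "quota exceeded")


def classify_bounce(body_text):
    """Return "hard", "soft", or None when no known phrase is found."""
    text = body_text.lower()  # compare case-insensitively
    if any(phrase in text for phrase in HARD_PHRASES):
        return "hard"
    if any(phrase in text for phrase in SOFT_PHRASES):
        return "soft"
    return None  # unidentified: leave the message for human review
```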
Once your app determines if the bounce is hard or soft, it can increment the hard_bounces or soft_bounces field in the database accordingly. It can then delete the message from the bounce box. If your app cannot determine if the message is a hard or soft bounce, the message can be left in the bounce box. Periodically the messages remaining in the bounce box can be analyzed by a human who can visually determine why they were not identified by the phrase scanner algorithm. The algorithm can then be updated to catch this type of message. Once your app is run again, it should handle this message properly and clear it from the bounce box. As time goes on, your phrase scanning algorithm should improve more and more. If you start with the phrases included with the downloadable sample code, your app should immediately identify just about every bounced message.
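The database update can be sketched in Python with the standard sqlite3 module standing in for the Access database used in the samples. The table and column names (email_table, hard_bounces, soft_bounces) follow the query in Sample 1; the function name record_bounce and the "hard"/"soft" labels are assumptions of this sketch.

```python
import sqlite3

# Map the bounce type to the counter column it should increment.
COUNTER_COLUMNS = {"hard": "hard_bounces", "soft": "soft_bounces"}


def record_bounce(conn, record_id, bounce_type):
    """Increment the matching counter; return True if a row was updated."""
    column = COUNTER_COLUMNS.get(bounce_type)
    if column is None:
        return False  # unidentified bounce: do not touch the database
    # The column name comes from the fixed mapping above, never from input.
    cur = conn.execute(
        f"UPDATE email_table SET {column} = {column} + 1 WHERE id = ?",
        (record_id,),
    )
    conn.commit()
    return cur.rowcount == 1
```

A False return tells the caller to leave the message in the bounce box instead of deleting it.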
The Samples
The following VB Script samples interface with an Access database that contains the e-mail addresses. The second sample also interfaces with an XML file that contains the phrases typically found in bounced messages. The downloadable code includes the source code shown below along with the Access and XML files. The samples listed on this page vary slightly from the downloadable code, as the code below has been edited to fit the newsletter format.
SAMPLE 1: Constructing and sending the message...
In this sample, we will send a message with a friendly address in the From: header, and our bounce address specified as the reverse-path. This example uses VB Script and the EasyMail SMTP object. The SMTP object contains a FromAddr property, and by default the SMTP object will use the value specified by this property for both the reverse-path and automatic creation of the From: header. We will override this behavior by setting the OptionFlags property to 1, which turns off the automatic creation of the From: header. We will then create the From: header ourselves with the AddCustomHeader() method.
'To do: Set the following variables:
strLicenseKey = "Newsletter Sample/02V4BFDSFFDFSD62"
strMailServer = "mail.yourdomain.com"
strBounceBoxDomain = "yourdomain.com"
strFriendlyFromName = "Joe Sender"
strFriendlyFromAddress = "joe.sender@domain.com"
'End To Do

'objSMTP and cnnData are assumed to be created and configured
'earlier; that setup is not shown in this edited listing.

'select only the addresses that are still below the bounce limits
Set RS = CreateObject("ADODB.RecordSet")
RS.Open "SELECT hard_bounces, id, name, address" & _
        " FROM email_table" & _
        " WHERE hard_bounces < 2" & _
        " AND soft_bounces < 4", cnnData, 1, 3

'send to each address selected
Do While RS.EOF = False

    'encode record id in from address
    objSMTP.FromAddr = "bounce_" & RS("id") & _
                       "@" & strBounceBoxDomain
    objSMTP.AddRecipient RS("name"), RS("address"), 1
    nRetVal = objSMTP.Send

    'if the recipient's address fails right
    'away then we mark it as a hard bounce now.
    If nRetVal = 8 Then
        RS("hard_bounces") = RS("hard_bounces") + 1
        'save the change explicitly before moving on
        RS.Update
    End If

    'remove the recipients
    objSMTP.Clear 1

    RS.MoveNext
Loop

'free remaining resources
RS.Close
cnnData.Close
SAMPLE 2: Scanning the bounced messages and updating your database...
This sample uses the EasyMail POP3 object to download each message in our bounce box. Each message is parsed and the body text is scanned for specific phrases to determine if the message is a hard or a soft bounce. Once the code determines the type of bounce, it parses the id off of the To: address, which identifies the address in our database. If the To: address does not begin with "bounce", it scans the Received headers for the bounce address by using the Timestamps collection. The sample then updates the soft_bounces and hard_bounces fields in the database accordingly before deleting the message from the bounce box. If the type of bounce cannot be determined, the message is left in the bounce box for human analysis, which will be used to improve the phrase scanning code in the future. The phrases used to identify bounced messages are read from an XML file.
'To do: Set the following variables:
strLicenseKey = "Newsletter Sample/02E00220B529204B62"
strMailServer = "mail.yourdomain.com"
strAccount = "bounce_account"
strPassword = "bounce_password"
'End To Do

'The enclosing Sub declaration and the creation of objPOP3 and
'cnnData are not shown in this edited listing.

Set rs = CreateObject("ADODB.RecordSet")
rs.Open "SELECT * FROM email_table", cnnData, 1, 3

'get the count of messages waiting in the
'bounce box and download and process each one
nCnt = objPOP3.GetDownloadableCount()
For x = 1 To nCnt
    nOrdinal = objPOP3.DownloadSingleMessage(x)
    If nOrdinal < 0 Then
        MsgBox "There was an error downloading " & _
               "the message. " & nOrdinal
        Exit Sub
    End If
    strBodyText = objPOP3.Messages(nOrdinal).BodyText

    'get id from To: address
    Set objMsgs = objPOP3.Messages
    For Each Recip In objMsgs(nOrdinal).Recipients
        strToAddr = Recip.Address
        If LCase(Left(strToAddr, 6)) = "bounce" Then
            Exit For
        End If
    Next

    'if address is not found then try searching
    'timestamps (AKA received headers)
    If Not LCase(Left(strToAddr, 6)) = "bounce" Then
        For Each TimeS In objMsgs(nOrdinal).Timestamps
            strToAddr = TimeS.For
            If LCase(Left(strToAddr, 6)) = "bounce" Then
                Exit For
            End If
        Next
    End If

    'if it is a bounce message we will process it
    '(compare in lower case, as in the checks above)
    If LCase(Left(strToAddr, 6)) = "bounce" And _
       InStr(strToAddr, "_") > 0 Then
        nPos1 = InStr(strToAddr, "_") + 1
        nPos2 = InStr(strToAddr, "@")

        If nPos2 > nPos1 Then
            nId = Mid(strToAddr, nPos1, nPos2 - nPos1)
        End If

        'call the IdentifyBounce routine which scans
        'the body text for the phrases found in our
        'xml file
        nBounceType = IdentifyBounce(strBodyText)

        If nBounceType > 0 Then

            'the message has been identified as a hard
            'or soft bounce so update the database.
            'Find searches forward from the current row,
            'so start again from the first record.
            rs.MoveFirst
            rs.Find ("id=" & nId)
            If rs.EOF = False And rs.BOF = False Then
                If nBounceType = 1 Then
                    rs("soft_bounces") = rs("soft_bounces") + 1
                Else
                    rs("hard_bounces") = rs("hard_bounces") + 1
                End If
                'update changes
                rs.Update
            End If
            'delete the message from the bounce box
            objPOP3.DeleteSingleMessage x

        ElseIf nBounceType = 0 Then

            'If nBounceType is 0 then it is a warning
            'message or auto-response so we will
            'delete the message from the bounce box.
            objPOP3.DeleteSingleMessage x
        End If
    End If

    'free resources used by the parsed message. This
    'call does not delete messages from the server.
    objPOP3.Messages.DeleteAll

Next

'disconnect from mail server
'and free remaining resources
objPOP3.Disconnect
rs.Close
MsgBox "Operation Complete."

End Sub

Function IdentifyBounce(strBodyText)

    'keep these local so they cannot clash with
    'variables of the same name in the caller
    Dim st, rs
    Set st = CreateObject("ADODB.Stream")
    Set rs = CreateObject("ADODB.RecordSet")

    st.Open
    st.LoadFromFile ("bounce_signatures.xml")

    rs.Open st
    rs.Sort = "weight DESC"

    '-1 means no known phrase was found
    IdentifyBounce = -1

    'every signature is checked and the loop does not stop at
    'the first match, so with the descending sort the matching
    'signature with the lowest weight determines the result
    Do While Not rs.EOF
        If InStr(1, strBodyText, rs("signature"), _
                 vbTextCompare) Then
            IdentifyBounce = rs("weight")
        End If
        rs.MoveNext
    Loop
    rs.Close

End Function
Conclusion
I hope you found this article useful in your efforts to clean your address list. If you have any suggestions for future topics, please let me know. You can find my contact information at the bottom of this page.
Bonus. Measuring failures from a specific mailing...
Some of our customers want to measure the count of delivery failures for each mailing they do. We showed you how to embed an id into the "reverse-path" so that it is easy to match the bounced message up with the address in your database, but you can even go a step further by inserting a mailing identifier as well.
Let's say you want to keep track of the number of bounced messages for a specific mailing, and let's assume that each mailing is represented by a row in a table. The row has a unique id field which is the mailing identifier. You can encode the mailing identifier onto the account portion of the reverse-path like this: bounce_1063_34@yourdomain.com, where 1063 is the id of the address and 34 is the id of the mailing. You can then modify your database update routine to record the number of hard and soft bounces for each mailing as well as each address.
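Pulling both ids back out of such an address might look like the following Python sketch. The function name parse_mailing_bounce is hypothetical; the address layout (address id first, then mailing id) is the one described above.

```python
import re

# bounce_<address id>_<mailing id>@domain
MAILING_BOUNCE_RE = re.compile(r"^bounce_(\d+)_(\d+)@", re.IGNORECASE)


def parse_mailing_bounce(address):
    """Return (address_id, mailing_id), or None if the format does not match."""
    match = MAILING_BOUNCE_RE.match(address.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_mailing_bounce("bounce_1063_34@yourdomain.com"))  # (1063, 34)
```

An address without a mailing id, such as bounce_1063@yourdomain.com, returns None here, so a caller can fall back to the address-only handling from Step 3.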
John Alessi has specialized in e-mail development for the past 5 years and has helped many large companies like Microsoft, Boeing and EarthLink with their e-mail needs. He can be reached at john@quiksoftcorp.com.