Cleaning Your Address List
An easy and accurate way to use bounced messages to clean your address list...
Lately at Quiksoft, we have been talking a lot about cleaning up our e-mail address list. Many of our customers have been asking how to reliably track the status of outbound e-mail messages, and how to update their address database when a message is returned undeliverable, otherwise known as a bounce.
In this article you will learn:
- Three very important reasons why you must clean your e-mail address list now
- What you need to know about how SMTP servers route bounced messages
- The secret to automatically matching bounced messages to addresses in your database
- The difference between hard and soft bounces and why you should track both
- Bonus secret to tracking failures on a mailing-by-mailing basis
This edition also contains downloadable sample code that will:
- Encode your outbound messages with the proper information so that they can be matched to your address database if they are returned undeliverable
- Scan your bounced messages and flag the addresses in your database
- Provide you with tons of phrases found in typical bounced messages, which can be used to programmatically discover their meaning
Three reasons why you must clean your list now...
I used to think that the quality of my list didn't matter. I thought it would be better to send to the entire list and let failures take care of themselves. But that was then, and this is now. Over the years, experience has taught me three important reasons to keep a clean list:
1. Some popular mail servers may block all mail from you if you repeatedly send mail to a bad address on their domain.
2. Repeatedly sending e-mail to bad addresses wastes bandwidth. Even if bandwidth is not an issue now, this problem will grow in scale with time.
3. If you are going to do any type of response tracking, you must subtract out the failures for an accurate report.
So with these reasons in mind, I set out to clean our address list. But how to do it reliably was the question...
A simple answer to a complex problem...
To clean our address list I would have to identify bad addresses and flag them in our address database so that I did not send e-mail to them anymore. I decided that I did not want to delete the bad addresses; I just wanted to flag them as being bad. But how do you determine that an address is bad?
Most SMTP servers will accept mail addressed to just about anyone in their domain, and only later figure out that the user does not exist. That means that whatever app you use to send mail will almost never know that there is a problem. As far as your app is concerned, the SMTP server accepted the message, period.
I tried looking at so-called "address verifier" components. These components check the e-mail address for syntactical errors and for non-existent domains, but they cannot actually tell if the user part of the address is valid. I used several of these to validate buggs.bunny@microsoft.com and was excited to find that Buggs does work at Microsoft these days, but when I sent him an e-mail, it bounced back with the following message: "Delivery to the following recipients failed: buggs.bunny@microsoft.com". The truth is that these "address verifier" components were no better at verifying addresses than my app was, so they were of no use to me.
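To make the limitation concrete, here is a minimal Python sketch of the kind of syntax check such a component performs. The pattern and the function name looks_valid are illustrative assumptions, not taken from any particular verifier product, and the pattern is deliberately simpler than the full address grammar.

```python
import re

# Deliberately simple pattern (an assumption for illustration);
# the real address grammar is far more permissive.
_SYNTAX = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")


def looks_valid(address):
    """Return True if the address is syntactically plausible."""
    return _SYNTAX.fullmatch(address) is not None


# A made-up mailbox passes the check, because syntax alone says
# nothing about whether the user part exists on the server.
print(looks_valid("buggs.bunny@microsoft.com"))
print(looks_valid("not-an-address"))
```

A check like this is still useful for catching typing mistakes at sign-up time, but it cannot replace bounce handling.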
So how do you reliably determine if an address is good? The answer is: you can't. But you can determine if an address is bad when a message sent to it is returned undeliverable (bounced), and that is the key to solving this problem.
The best part of this solution is that it is not dependent on extended SMTP features. It will work whenever the recipient's mail server correctly adheres to RFC-821, the minimum requirements for any SMTP server. The SMTP protocol as outlined in RFC-821 provides for a notification mechanism when a message cannot be delivered. This notification mechanism works by creating a new e-mail message which is sent to the original sender to inform them that their message was not delivered. This e-mail message is commonly referred to as a bounce. The first step to cleaning our address list is to funnel the bounced messages into a central location where they can be programmatically analyzed.
The following three-step process will enable you to capture bounced messages, figure out which address in your database they belong to, and flag the record.
Three Easy Steps
Step 1. Use a bounce box...
The first step in cleaning your list is to trap bounced messages in a central location. We suggest that you create a "bounce box". A bounce box is a dedicated e-mail account that is set up to trap returned messages, e.g. bounce@yourdomain.com. To be sure that returned messages find their way to your bounce box you must understand how these messages are routed by SMTP servers.
When a message is submitted to an SMTP server it is tagged with a reverse-path. The reverse-path is specified by the sending application with the MAIL FROM: command as outlined in the SMTP RFC-821. The reverse-path is the path that the server should use to communicate with the original sender of the message, and therefore the reverse-path is typically the e-mail address of the sender (the from address).
The SMTP server stores the reverse-path internally, not in the actual message, and forwards it with the message through any relay servers as necessary until the message encounters an error or reaches its destination. Since the reverse-path is not recorded in the actual message it is typical to add a From: header to the e-mail message which contains the address of the sender and an optional friendly name, e.g. "Joe Sender" <joe.sender@domain.com>. Mail readers use the From: header to display who a message is from.
It is very important to understand that the reverse-path and the address in the From: header need not be the same. Therefore it is possible to send a message which will be displayed by mail readers as coming from joe.sender@domain.com, but has a reverse-path of some_other_address@domain.com.
Once you understand the difference between the reverse-path and the From: header, and the roles they play, you are on your way to building messages that will be displayed in a friendly manner if delivered, or will be returned to your centralized bounce box if there is a failure.
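For readers who are not using the EasyMail objects, the same separation of reverse-path and From: header can be sketched with Python's standard smtplib and email modules. The function names build_message and send_with_reverse_path, and the host name in the usage note, are assumptions for illustration only.

```python
import smtplib
from email.message import EmailMessage
from email.utils import formataddr


def build_message(friendly_name, friendly_addr, to_addr, subject, body):
    """Build a message whose From: header shows the friendly sender."""
    msg = EmailMessage()
    msg["From"] = formataddr((friendly_name, friendly_addr))
    msg["To"] = to_addr
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_with_reverse_path(host, msg, reverse_path, to_addr):
    """Send msg, passing reverse_path in the SMTP MAIL FROM command."""
    with smtplib.SMTP(host) as server:
        # from_addr sets only the envelope sender (the reverse-path);
        # the From: header inside the message is left untouched.
        server.send_message(msg, from_addr=reverse_path, to_addrs=[to_addr])


msg = build_message("Joe Sender", "joe.sender@domain.com",
                    "jane.recipient@domain.com", "Subject...", "Message text")
```

Calling send_with_reverse_path("mail.yourdomain.com", msg, "bounce@yourdomain.com", "jane.recipient@domain.com") would then deliver a message that displays as coming from Joe Sender while any bounce is routed to the bounce box. The call is left out of the sketch because it needs a reachable SMTP server.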
Step 2. Add custom data to bounced messages...
This step requires that your mail server is capable of being configured to use a wildcard address. In other words, it needs to be able to route all mail to bounce*@yourdomain.com to one specific account such as bounce@yourdomain.com. If your mail server does not support wildcard addresses, you can accomplish the same thing by using a "catch-all" box and a dedicated domain.
You can then append custom data to the end of the account name portion of the reverse-path and it will still be delivered to the bounce@yourdomain.com account. For example, suppose each e-mail address in your database is identified by a unique numerical id. You can then encode this id into your bounce address. For example, suppose that the recipient address is jane.recipient@domain.com, and the id of this address in your database is 1063. You could then build an address such as bounce_1063@yourdomain.com.
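The encoding itself is a one-line string operation. A minimal Python sketch, where the helper name make_bounce_address is an assumption for illustration:

```python
def make_bounce_address(record_id, domain, prefix="bounce_"):
    """Encode a database record id into the account part of a bounce address."""
    return f"{prefix}{record_id}@{domain}"


print(make_bounce_address(1063, "yourdomain.com"))
```

Keeping the prefix and separator fixed is what makes the id easy to recover later, when the returned message arrives in the bounce box.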
You can then send a message to jane.recipient@domain.com and specify bounce_1063@yourdomain.com as the reverse-path by passing that address to the SMTP server with the MAIL FROM command, e.g. MAIL FROM:<bounce_1063@yourdomain.com>. To provide a friendly "from" name or address for Jane's mail reader to display, you can add a From: header to the message, e.g. From: "Joe Sender" <joe.sender@domain.com>.
The sample at the end of this article shows how easily this can be done.
If the message is delivered successfully, Jane's mail reader will display it as coming from Joe Sender. If for some reason the message is undeliverable, an "undeliverable mail" notification message will be sent to bounce_1063@yourdomain.com. Since your mail server has been instructed to deliver all messages for bounce*@yourdomain.com to bounce@yourdomain.com, this returned message should now land in your bounce box.
Additionally, since returned messages are returned to the address specified by their reverse-path, each of these messages should have your custom bounce address in the To: header. In other words, each of the messages in the bounce box will be addressed to bounce_<id>@yourdomain.com, where <id> represents the id of the e-mail address in your database which is related to the bounce. Our testing has indicated, however, that some mail servers use the From: address of the original message as the To: address of the resulting bounce. This is not what should be going on according to the RFC, but we have a fix for that too. If the To: header address does not begin with bounce_, you can scan the message's "Received" headers and find your bounce address there. The sample code shows you how this is done.
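The lookup order described above (To: header first, then the Received headers) can be sketched in Python with the standard email package. The function name find_bounce_id and the sample message are assumptions for illustration; real bounce messages vary widely in their headers.

```python
import re
from email import message_from_string
from email.utils import getaddresses

# Matches the bounce_<id>@ convention used in this article.
BOUNCE_RE = re.compile(r"bounce_(\d+)@", re.IGNORECASE)


def find_bounce_id(raw_message):
    """Return the record id encoded in the bounce address, or None."""
    msg = message_from_string(raw_message)

    # First choice: an address in the To: header that starts with bounce_<id>@.
    for _name, addr in getaddresses(msg.get_all("To", [])):
        match = BOUNCE_RE.match(addr)
        if match:
            return int(match.group(1))

    # Fallback: look for the bounce address anywhere in a Received header,
    # typically in its "for <address>" clause.
    for received in msg.get_all("Received", []):
        match = BOUNCE_RE.search(received)
        if match:
            return int(match.group(1))

    return None


sample = (
    "Received: from mx.example.net by mail.yourdomain.com\n"
    "\tfor <bounce_1063@yourdomain.com>; Tue, 1 Jan 2002 00:00:00 -0500\n"
    "From: postmaster@example.net\n"
    "To: joe.sender@domain.com\n"
    "Subject: Undeliverable mail\n"
    "\n"
    "Delivery to the following recipients failed.\n"
)
print(find_bounce_id(sample))
```

The sample message models the misbehaving case: its To: header carries the friendly From: address, so the id is recovered from the Received header instead.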
Following these rules, you can now easily match bounced messages up to your database, as you will see...
Step 3. Retrieve the bounced messages and update your database...
At this point, assuming you have sent mail as prescribed above, and some of those messages were returned, you will have one or more messages in your bounce box. Each of these messages will be addressed to bounce_<id>@yourdomain.com, where <id> represents the id of the e-mail address in your database which is related to the bounce.
Now it is important to understand that there are two types of bounces: hard and soft. Permanent failures, such as a nonexistent account or domain, are considered hard bounces. Other failures, such as a full mailbox or blocked domain, are considered soft bounces. Instead of flagging your addresses as good or bad, your database can keep a running count of hard and soft bounces for each address. That way, your mailing application can be more intelligent about determining which addresses to exclude from future mailings. For example, you might only want to send mail to addresses with fewer than eight soft bounces and fewer than two hard bounces. I usually do not like to exclude someone from future mailings unless they have more than one hard bounce. Just to be sure that the address is really invalid, I look for at least two hard bounces.
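The exclusion rule from the example above can be written as a small predicate. This is a Python sketch; the function name should_mail and the default thresholds simply restate the example figures and should be tuned to your own list.

```python
def should_mail(hard_bounces, soft_bounces, max_hard=2, max_soft=8):
    """Return True if the address is still eligible for future mailings.

    An address stays eligible while it has fewer than max_hard hard
    bounces and fewer than max_soft soft bounces.
    """
    return hard_bounces < max_hard and soft_bounces < max_soft


print(should_mail(1, 7))   # one hard bounce is tolerated
print(should_mail(2, 0))   # a second hard bounce excludes the address
```

Keeping the thresholds as parameters lets a mailing application apply a stricter rule for some mailings without changing the stored counts.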
Your application will have to scan the text of the bounced messages looking for phrases that indicate the reason for the bounce. It will look for such phrases as "delivery failure", "box full", etc. (The downloadable sample code includes a database of the phrases we have discovered in typical bounced messages.) Your app will determine if each bounce is hard or soft based on the phrase it finds in the message.
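A phrase scanner of this kind can be sketched in a few lines of Python. The phrase list below is a tiny, hypothetical stand-in for the downloadable phrase database: "box full" comes from the text above, while "no such user" and "unknown domain" are assumed examples of the nonexistent account and domain failures.

```python
HARD = "hard"
SOFT = "soft"

# Hypothetical phrase list for illustration; a real list is much longer.
SIGNATURES = [
    ("no such user", HARD),
    ("unknown domain", HARD),
    ("box full", SOFT),
]


def classify_bounce(body_text, signatures=SIGNATURES):
    """Return HARD, SOFT, or None when no known phrase is found."""
    text = body_text.lower()
    for phrase, kind in signatures:
        if phrase in text:
            return kind
    return None


print(classify_bounce("Sorry, the recipient's mail Box Full."))
```

A result of None is the signal to leave the message alone so that a person can look at it later and extend the phrase list.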
Once your app determines if the bounce is hard or soft, it can increment the hard_bounces or soft_bounces field in the database accordingly. It can then delete the message from the bounce box. If your app cannot determine if the message is a hard or soft bounce, the message can be left in the bounce box. Periodically the messages remaining in the bounce box can be analyzed by a human who can visually determine why they were not identified by the phrase scanner algorithm. The algorithm can then be updated to catch this type of message. Once your app is run again, it should handle this message properly and clear it from the bounce box. As time goes on, your phrase scanning algorithm should improve more and more. If you start with the phrases included with the downloadable sample code, your app should immediately identify just about every bounced message.
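The counter update can be sketched with Python's built-in sqlite3 module standing in for the Access database. The table layout mirrors the email_table used by the samples in this article; the function name record_bounce and its return convention are assumptions for illustration.

```python
import sqlite3

# Maps a bounce type to the counter column it increments.
_COLUMNS = {"hard": "hard_bounces", "soft": "soft_bounces"}


def record_bounce(conn, address_id, kind):
    """Increment the matching counter; return True if a row was updated.

    A False result means the bounce type was not recognized or the id
    was not found, so the caller should leave the message in the bounce
    box for human review instead of deleting it.
    """
    column = _COLUMNS.get(kind)
    if column is None:
        return False
    cur = conn.execute(
        f"UPDATE email_table SET {column} = {column} + 1 WHERE id = ?",
        (address_id,),
    )
    conn.commit()
    return cur.rowcount == 1


# In-memory stand-in for the address database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE email_table (id INTEGER PRIMARY KEY, address TEXT, "
    "hard_bounces INTEGER DEFAULT 0, soft_bounces INTEGER DEFAULT 0)"
)
conn.execute(
    "INSERT INTO email_table (id, address) "
    "VALUES (1063, 'jane.recipient@domain.com')"
)
record_bounce(conn, 1063, "soft")
```

Tying the delete decision to the return value keeps unidentified messages in the bounce box, which is what feeds the improvement loop described above.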
The Samples
The following VBScript samples interface with an Access database that contains the e-mail addresses. The second sample also interfaces with an XML file that contains the phrases typically found in bounced messages. The downloadable code includes the source code shown below along with the Access and XML files. The samples listed on this page vary slightly from the downloadable code, as the code below has been edited to fit the newsletter format.
Sample 1: Constructing and sending the message...
In this sample, we will send a message with a friendly address in the From: header, and our bounce address specified as the reverse-path. This example uses VBScript and the EasyMail SMTP object. The SMTP object contains a FromAddr property, and by default the SMTP object will use the value specified by this property for both the reverse-path and automatic creation of the From: header. We will override this behavior by setting the OptionFlags property to 1, which turns off the automatic creation of the From: header. We will then create the From: header ourselves with the AddCustomHeader() method.
'To do: Set the following variables:
strLicenseKey = "Newsletter Sample/02V4BFDSFFDFSD62"
strMailServer="mail.yourdomain.com"
strBounceBoxDomain="yourdomain.com"
strFriendlyFromName="Joe Sender"
strFriendlyFromAddress="joe.sender@domain.com"
'End To Do
Dim objSMTP, cnnData, RS, nRetVal, strConnection

'create EasyMail SMTP object and set basic properties
Set objSMTP = CreateObject("EasyMail.SMTP")
objSMTP.LicenseKey = strLicenseKey
objSMTP.MailServer = strMailServer
objSMTP.OptionFlags = 1
objSMTP.AddCustomHeader "From", _
    """" & strFriendlyFromName & """" & _
    " <" & strFriendlyFromAddress & ">"
objSMTP.Subject = "Subject..."
objSMTP.BodyText = "Message text"

'setup database and select addresses.
'This sample uses an Access database.
Set cnnData = CreateObject("ADODB.Connection")
strConnection = "DBQ=email_database.mdb"
cnnData.Open "DRIVER=" & _
    "{Microsoft Access Driver (*.mdb)};" & _
    strConnection
Set RS = CreateObject("ADODB.RecordSet")
RS.Open "SELECT hard_bounces, id, name, address" & _
    " FROM email_table" & _
    " WHERE hard_bounces < 2" & _
    " AND soft_bounces < 4", cnnData, 1, 3

'send to each address selected
Do While RS.EOF = False

    'encode record id in from address
    objSMTP.FromAddr = "bounce_" & RS("id") & _
        "@" & strBounceBoxDomain
    objSMTP.AddRecipient RS("name"), RS("address"), 1
    nRetVal = objSMTP.Send

    'if the recipient's address fails right
    'away then we mark it as a hard bounce now.
    If nRetVal = 8 Then
        RS("hard_bounces") = RS("hard_bounces") + 1
        'save the change explicitly
        RS.Update
    End If

    'remove the recipients
    objSMTP.Clear 1

    RS.MoveNext
Loop

'free remaining resources
RS.Close
cnnData.Close
Sample 2: Scanning the bounced messages and updating your database...
This sample uses the EasyMail POP3 object to download each message in our bounce box. Each message is parsed and the body text is scanned for specific phrases to determine if the message is a hard or a soft bounce. Once the code determines the type of bounce, it parses the id off of the To: address which identifies the address in our database. If the To: address does not begin with "bounce" it scans the Received headers for the bounce address by using the Timestamps collection. The sample then updates the soft_bounces and hard_bounces fields in the database accordingly before deleting the message from the bounce box. If the type of bounce cannot be determined, the message is left in the bounce box for human analysis, which will be used to improve the phrase scanning code in the future. The phrases used to identify bounced messages are read from an XML file.
'To do: Set the following variables:
strLicenseKey = "Newsletter Sample/02E00220B529204B62"
strMailServer= "mail.yourdomain.com"
strAccount= "bounce_account"
strPassword= "bounce_password"
'End To Do
Main
Sub Main()
    Dim objPOP3, nCnt, x
    Dim nBounceType, nId, nPos1, nPos2
    Dim strBodyText, strToAddr, nOrdinal
    Dim strConnection, nRetVal
    Dim cnnData, rs, objMsgs, Recip, TimeS

    'create the EasyMail POP3 object and assign
    'the basic properties
    Set objPOP3 = CreateObject("EasyMail.POP3")
    objPOP3.LicenseKey = strLicenseKey
    objPOP3.MailServer = strMailServer
    objPOP3.Account = strAccount
    objPOP3.Password = strPassword

    'connect to the mail server
    nRetVal = objPOP3.Connect()
    If Not nRetVal = 0 Then
        MsgBox "Error connecting to mail server."
        Exit Sub
    End If

    'prepare the database and select our e-mail table
    Set cnnData = CreateObject("ADODB.Connection")
    strConnection = "DBQ=email_database.mdb"
    cnnData.Open "DRIVER=" & _
        "{Microsoft Access Driver (*.mdb)};" & _
        strConnection
    Set rs = CreateObject("ADODB.RecordSet")
    rs.Open "SELECT * FROM email_table", cnnData, 1, 3

    'get the count of messages waiting in the
    'bounce box and download and process each one
    nCnt = objPOP3.GetDownloadableCount()
    For x = 1 To nCnt
        nOrdinal = objPOP3.DownloadSingleMessage(x)
        If nOrdinal < 0 Then
            MsgBox "There was an error downloading " & _
                "the message. " & nOrdinal
            Exit Sub
        End If
        strBodyText = objPOP3.Messages(nOrdinal).BodyText

        'get id from To: address
        strToAddr = ""
        Set objMsgs = objPOP3.Messages
        For Each Recip In objMsgs(nOrdinal).Recipients
            strToAddr = Recip.Address
            If LCase(Left(strToAddr, 6)) = "bounce" Then
                Exit For
            End If
        Next

        'if address is not found then try searching
        'timestamps (AKA received headers)
        If Not LCase(Left(strToAddr, 6)) = "bounce" Then
            For Each TimeS In objMsgs(nOrdinal).Timestamps
                strToAddr = TimeS.For
                If LCase(Left(strToAddr, 6)) = "bounce" Then
                    Exit For
                End If
            Next
        End If

        'if it is a bounce message we will process it
        If LCase(Left(strToAddr, 6)) = "bounce" And _
           InStr(strToAddr, "_") > 0 Then
            nPos1 = InStr(strToAddr, "_") + 1
            nPos2 = InStr(strToAddr, "@")

            'reset the id so a value from a previous
            'message is never reused
            nId = ""
            If nPos2 > nPos1 Then
                nId = Mid(strToAddr, nPos1, nPos2 - nPos1)
            End If

            'call the IdentifyBounce routine which scans
            'the body text for the phrases found in our
            'xml file
            nBounceType = IdentifyBounce(strBodyText)

            If nBounceType > 0 Then

                'the message has been identified as a hard
                'or soft bounce so update the database
                If IsNumeric(nId) And Not (rs.BOF And rs.EOF) Then
                    'Find searches forward from the current row,
                    'so start each search from the first record
                    rs.MoveFirst
                    rs.Find "id=" & nId
                    If rs.EOF = False And rs.BOF = False Then
                        If nBounceType = 1 Then
                            rs("soft_bounces") = rs("soft_bounces") + 1
                        Else
                            rs("hard_bounces") = rs("hard_bounces") + 1
                        End If
                        'update changes
                        rs.Update
                    End If
                End If

                'delete the message from the bounce box
                objPOP3.DeleteSingleMessage x

            ElseIf nBounceType = 0 Then

                'If nBounceType is 0 then it is a warning
                'message or auto-response so we will
                'delete the message from the bounce box.
                objPOP3.DeleteSingleMessage x

            End If
        End If

        'free resources used by the parsed message. This
        'call does not delete messages from the server.
        objPOP3.Messages.DeleteAll

    Next

    'disconnect from mail server
    'and free remaining resources
    objPOP3.Disconnect
    rs.Close
    cnnData.Close
    MsgBox "Operation Complete."

End Sub
Function IdentifyBounce(strBodyText)
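    Dim st, rs

    'load the phrase list from the XML file
    Set st = CreateObject("ADODB.Stream")
    Set rs = CreateObject("ADODB.RecordSet")
    st.Open
    st.LoadFromFile "bounce_signatures.xml"
    rs.Open st
    rs.Sort = "weight DESC"

    '-1 means no known phrase was found
    IdentifyBounce = -1

    'Signatures are visited from highest weight to lowest
    'and each match overwrites the previous one, so the
    'lowest-weight matching signature decides the result.
    Do While Not rs.EOF
        If InStr(1, strBodyText, rs("signature"), _
                 vbTextCompare) > 0 Then
            IdentifyBounce = rs("weight")
        End If
        rs.MoveNext
    Loop

    rs.Close
    st.Close
End Function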
Conclusion
I hope you found this article useful in your efforts to clean your address list. If you have any suggestions for future topics, please let me know. You can find my contact information at the bottom of this page.
Bonus. Measuring failures from a specific mailing...
Some of our customers want to measure the count of delivery failures for each mailing they do. We showed you how to embed an id into the "reverse-path" so that it is easy to match the bounced message up with the address in your database, but you can even go a step further by inserting a mailing identifier as well.
Let's say you want to keep track of the number of bounced messages for a specific mailing, and let's assume that each mailing is represented by a row in a table. The row has a unique id field which is the mailing identifier. You can encode the mailing identifier onto the account portion of the reverse-path like this: bounce_1063_34@yourdomain.com, where 1063 is the id of the address and 34 is the id of the mailing. You can then modify your database update routine to record the number of hard and soft bounces for each mailing as well as each address.
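Splitting the two identifiers back out of a returned address is straightforward. A minimal Python sketch, where the function name parse_bounce_address is an assumption for illustration:

```python
import re

# bounce_<address id>_<mailing id>@domain, as in the example above.
_BOUNCE_PATTERN = re.compile(r"^bounce_(\d+)_(\d+)@", re.IGNORECASE)


def parse_bounce_address(address):
    """Return (address_id, mailing_id), or None if the format does not match."""
    match = _BOUNCE_PATTERN.match(address)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_bounce_address("bounce_1063_34@yourdomain.com"))
```

Addresses that carry only an address id, such as bounce_1063@yourdomain.com, do not match this pattern, so a scanner that handles both formats needs to try the single-id form as well.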
John Alessi has specialized in e-mail development for the past 5 years and has helped many large companies like Microsoft, Boeing and EarthLink with their e-mail needs. He can be reached at john@quiksoftcorp.com.
©2002 Quiksoft Corporation. All rights reserved. Unauthorized duplication or distribution prohibited. Quiksoft, EasyMail, EasyMail Objects, EasyMail .Net Edition, EasyMail Advanced API, EasyMail SMTP Express, and MailStore are trademarks of Quiksoft Corporation. Other trademarks mentioned are the property of their legal owner.