Cleaning Your Address List
An easy and accurate way to use bounced messages to clean your address list...
Lately at Quiksoft, we have been talking a lot about cleaning up our e-mail address list. Many of our customers have been asking how to reliably track the status of outbound e-mail messages, and how to update their address database when a message is returned undeliverable, otherwise known as a bounce.
In this article you will learn:
- Three very important reasons why you must clean your e-mail address list now
- What you need to know about how SMTP servers route bounced messages
- The secret to automatically matching bounced messages to addresses in your database
- The difference between hard and soft bounces and why you should track both
- Bonus secret to tracking failures on a mailing-by-mailing basis
This edition also contains downloadable sample code that will:
- Encode your outbound messages with the proper information so that they can be matched to your address database if they are returned undeliverable
- Scan your bounced messages and flag the addresses in your database
- Provide you with tons of phrases found in typical bounced messages, which can be used to programmatically discover their meaning
Three reasons why you must clean your list now...
I used to think that the quality of my list didn't matter. I thought it would be better to send to the entire list and let failures take care of themselves. But that was then and this is now, and over the years experience has taught me three important reasons to keep a clean list:
1. Some popular mail servers may block all mail from you if you repeatedly send mail to a bad address on their domain.
2. Repeatedly sending e-mail to bad addresses wastes bandwidth. Even if bandwidth is not an issue now, this problem will grow in scale with time.
3. If you are going to do any type of response tracking, you must subtract out the failures for an accurate report.
So with these reasons in mind, I set out to clean our address list. But how to do it reliably was the question...
A simple answer to a complex problem...
To clean our address list I would have to identify bad addresses and flag them in our address database so that I did not send e-mail to them anymore. I decided that I did not want to delete the bad addresses; I just wanted to flag them as being bad. But how do you determine that an address is bad?
Most SMTP servers will accept mail addressed to just about anyone in their domain, and only later figure out that the user does not exist. That means that whatever app you use to send mail will almost never know that there is a problem. As far as your app is concerned, the SMTP server accepted the message -- period.
I tried looking at so-called "address verifier" components. These components check the e-mail address for syntactical errors and for non-existent domains, but they cannot actually tell if the user part of the address is valid. I used several of these to validate buggs.bunny@microsoft.com and was excited to find that Buggs does work at Microsoft these days, but when I sent him an e-mail, it bounced back with the following message: "Delivery to the following recipients failed: buggs.bunny@microsoft.com". The truth is that these "address verifier" components were no better at verifying addresses than my app was, so they were of no use to me.
So how do you reliably determine if an address is good? The answer is -- you can't. But you can determine if an address is bad when a message sent to it is returned undeliverable (bounced), and that is the key to solving this problem.
The best part of this solution is that it is not dependent on extended SMTP features. It will work all the time provided that the recipient's mail server correctly adheres to RFC-821, the minimum requirements for any SMTP server. The SMTP protocol as outlined in RFC-821 provides for a notification mechanism when a message cannot be delivered. This notification mechanism works by creating a new e-mail message which is sent to the original sender to inform them that their message was not delivered. This e-mail message is commonly referred to as a bounce. The first step to cleaning our address list is to funnel the bounced messages into a central location where they can be programmatically analyzed.
The following three-step process will enable you to capture bounced messages, figure out which address in your database they belong to, and flag the record.
Three Easy Steps
Step 1. Use a bounce box...
The first step in cleaning your list is to trap bounced messages in a central location. We suggest that you create a "bounce box". A bounce box is a dedicated e-mail account that is set up to trap returned messages, for example bounce@yourdomain.com. To be sure that returned messages find their way to your bounce box you must understand how these messages are routed by SMTP servers.
When a message is submitted to an SMTP server it is tagged with a reverse-path. The reverse-path is specified by the sending application with the MAIL FROM: command as outlined in the SMTP RFC-821. The reverse-path is the path that the server should use to communicate with the original sender of the message, and therefore the reverse-path is typically the e-mail address of the sender (the from address).
The SMTP server stores the reverse-path internally, not in the actual message, and forwards it with the message through any relay servers as necessary until the message encounters an error or reaches its destination. Since the reverse-path is not recorded in the actual message it is typical to add a From: header to the e-mail message which contains the address of the sender and an optional friendly name, for example "Joe Sender" <joe.sender@domain.com>. Mail readers use the From: header to display who a message is from.
It is very important to understand that the reverse-path and the address in the From: header need not be the same. Therefore it is possible to send a message which will be displayed by mail readers as coming from joe.sender@domain.com, but has a reverse-path of some_other_address@domain.com.
Once you understand the difference between the reverse-path and the From: header, and the roles they play, you are on your way to building messages that will be displayed in a friendly manner if delivered, or will be returned to your centralized bounce box if there is a failure.
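To make the split between the reverse-path and the From: header concrete, here is a minimal sketch in Python using only the standard library. It is not the EasyMail code used in the samples later in this article. The function names build_message and send_with_reverse_path are our own, chosen for illustration, and the smtp argument is assumed to be an already-connected smtplib.SMTP instance.

```python
from email.message import EmailMessage
from email.utils import formataddr


def build_message(friendly_name, friendly_addr, recipient, subject, body):
    """Build a message whose From: header carries the friendly sender."""
    msg = EmailMessage()
    msg["From"] = formataddr((friendly_name, friendly_addr))
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_with_reverse_path(smtp, msg, reverse_path, recipient):
    """Send msg, passing reverse_path to the server in MAIL FROM.

    smtp is assumed to be a connected smtplib.SMTP instance.
    """
    # from_addr sets the envelope sender (the reverse-path). The From:
    # header inside msg is left untouched; mail readers display that one.
    return smtp.send_message(msg, from_addr=reverse_path, to_addrs=[recipient])


msg = build_message("Joe Sender", "joe.sender@domain.com",
                    "jane.recipient@domain.com", "Subject...", "Message text")
```

With a real server you would open the connection with something like smtplib.SMTP("mail.yourdomain.com") and pass it in along with bounce@yourdomain.com as the reverse-path. The recipient sees Joe Sender, while any failure notice should go to the bounce box.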
Step 2. Add custom data to bounced messages...
This step requires that your mail server is capable of being configured to use a wildcard address. In other words, it needs to be able to route all mail to bounce*@yourdomain.com to one specific account such as bounce@yourdomain.com. If your mail server does not support wildcard addresses, you can accomplish the same thing by using a "catch-all" box and a dedicated domain.
You can then append custom data to the end of the account name portion of the reverse-path and it will still be delivered to the bounce@yourdomain.com account. For example, suppose each e-mail address in your database is identified by a unique numerical id. You can then encode this id into your bounce address. Suppose that the recipient address is jane.recipient@domain.com, and the id of this address in your database is 1063. You could then build an address such as bounce_1063@yourdomain.com.
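The encoding itself is a one-liner. The following Python sketch shows the idea; build_bounce_address is a hypothetical helper name of ours, and the "bounce_" prefix simply follows the example above.

```python
def build_bounce_address(address_id, bounce_domain, prefix="bounce_"):
    """Return a reverse-path such as bounce_1063@yourdomain.com."""
    # int() rejects anything that is not a plain numeric id, so stray
    # characters can never end up inside the account name.
    return f"{prefix}{int(address_id)}@{bounce_domain}"


reverse_path = build_bounce_address(1063, "yourdomain.com")
```

Keeping the encoding to digits only makes the address easy to parse again when the bounce comes back.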
You can then send a message to jane.recipient@domain.com and specify bounce_1063@yourdomain.com as the reverse-path by passing that address to the SMTP server with the MAIL FROM command, i.e. MAIL FROM:<bounce_1063@yourdomain.com>. To provide a friendly "from" name or address for Jane's mail reader to display, you can add a From: header to the message, i.e. From: "Joe Sender" <joe.sender@domain.com>.
The sample at the end of this article shows how easily this can be done.
If the message is delivered successfully, Jane's mail reader will display it as coming from Joe Sender. If for some reason the message is undeliverable, an "undeliverable mail" notification message will be sent to bounce_1063@yourdomain.com. Since your mail server has been instructed to deliver all messages for bounce*@yourdomain.com to bounce@yourdomain.com, this returned message should now land in your bounce box.
Additionally, since returned messages are returned to the address specified by their reverse-path, each of these messages should have your custom bounce address in the To: header. In other words, each of the messages in the bounce box will be addressed to bounce_<id>@yourdomain.com, where <id> represents the id of the e-mail address in your database which is related to the bounce. Our testing has indicated, however, that some mail servers use the From: address of the original message as the To: address of the resulting bounce. This is not what should be going on according to the RFC, but we have a fix for that too. If the To: header address does not begin with bounce_, you can scan the message's "Received" headers and find your bounce address there. The sample code shows you how this is done.
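As a rough illustration of that lookup order (To: header first, then the Received headers), here is a small Python sketch built on the standard email package. The function name find_address_id and the regular expression are our own assumptions, and real Received headers vary a good deal between servers, so treat the fallback as a best effort.

```python
import re
from email import message_from_string
from email.utils import getaddresses

# Matches addresses of the form bounce_<digits>@<domain>.
BOUNCE_RE = re.compile(r"bounce_(\d+)@[\w.-]+", re.IGNORECASE)


def find_address_id(raw_message):
    """Return the database id encoded in the bounce address, or None."""
    msg = message_from_string(raw_message)

    # First choice: the To: header of the returned message.
    for _name, addr in getaddresses(msg.get_all("To", [])):
        match = BOUNCE_RE.fullmatch(addr)
        if match:
            return int(match.group(1))

    # Fallback: some servers address the bounce to the original From:
    # address, so look for the bounce address in the Received: headers.
    for received in msg.get_all("Received", []):
        match = BOUNCE_RE.search(received)
        if match:
            return int(match.group(1))

    return None
```

Returning None rather than raising lets the caller leave unrecognized messages in the bounce box for a person to look at.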
Following these rules, you can now easily match bounced messages up to your database, as you will see...
Step 3. Retrieve the bounced messages and update your database...
At this point, assuming you have sent mail as prescribed above, and some of those messages were returned, you will have one or more messages in your bounce box. Each of these messages will be addressed to bounce_<id>@yourdomain.com, where <id> represents the id of the e-mail address in your database which is related to the bounce.
Now it is important to understand that there are two types of bounces: hard and soft. Permanent failures, such as a nonexistent account or domain, are considered hard bounces. Other failures, such as a full mailbox or blocked domain, are considered soft bounces. Instead of flagging your addresses as good or bad, your database can keep a running count of hard and soft bounces for each address. That way, your mailing application can be more intelligent about determining which addresses to exclude from future mailings. For example, you might only want to send mail to addresses with fewer than 8 soft bounces and fewer than two hard bounces. I usually do not like to exclude someone from future mailings unless they have more than one hard bounce. Just to be sure that the address is really invalid, I look for at least two hard bounces.
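The counting rule can be sketched in a few lines of Python. The names record_bounce and should_mail are ours, the per-address counts are held in a plain dict purely for illustration (in practice they would live in your database), and the thresholds are the example figures from the paragraph above.

```python
MAX_HARD_BOUNCES = 2  # exclude once an address reaches two hard bounces
MAX_SOFT_BOUNCES = 8  # exclude once an address reaches eight soft bounces


def record_bounce(counts, kind):
    """Increment the 'hard' or 'soft' counter for one address."""
    if kind not in ("hard", "soft"):
        raise ValueError("kind must be 'hard' or 'soft'")
    counts[kind] = counts.get(kind, 0) + 1
    return counts


def should_mail(counts):
    """Return True while the address is below both thresholds."""
    return (counts.get("hard", 0) < MAX_HARD_BOUNCES
            and counts.get("soft", 0) < MAX_SOFT_BOUNCES)
```

Because the thresholds are separate constants, you can tune how forgiving the list is without touching the counting code.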
Your application will have to scan the text of the bounced messages looking for phrases that indicate the reason for the bounce. It will look for such phrases as "delivery failure", "box full", etc. (The downloadable sample code includes a database of the phrases we have discovered in typical bounced messages.) Your app will determine if each bounce is hard or soft based on the phrase it finds in the message.
Once your app determines if the bounce is hard or soft, it can increment the hard_bounces and soft_bounces fields in the database accordingly. It can then delete the message from the bounce box. If your app cannot determine if the message is a hard or soft bounce, the message can be left in the bounce box. Periodically the messages remaining in the bounce box can be analyzed by a human who can visually determine why they were not identified by the phrase scanner algorithm. The algorithm can then be updated to catch this type of message. Once your app is run again, it should handle this message properly and clear it from the bounce box. As time goes on, your phrase scanning algorithm should improve more and more. If you start with the phrases included with the downloadable sample code, your app should immediately identify just about every bounced message.
The Samples
The following VBScript samples interface with an Access database that contains the e-mail addresses. The second sample also interfaces with an XML file that contains the phrases typically found in bounced messages. The downloadable code includes the source code shown below along with the Access and XML files. The samples listed on this page vary slightly from the downloadable code, as the code below has been edited to fit the newsletter format.
SAMPLE 1: Constructing and sending the message...
In this sample, we will send a message with a friendly address in the From: header, and our bounce address specified as the reverse-path. This example uses VBScript and the EasyMail SMTP object. The SMTP object contains a FromAddr property, and by default the SMTP object will use the value specified by this property for both the reverse-path and automatic creation of the From: header. We will override this behavior by setting the OptionFlags property to 1, which turns off the automatic creation of the From: header. We will then create the From: header ourselves with the AddCustomHeader() method.
'To do: Set the following variables:
strLicenseKey = "Newsletter Sample/02V4BFDSFFDFSD62"
strMailServer="mail.yourdomain.com"
strBounceBoxDomain="yourdomain.com"
strFriendlyFromName="Joe Sender"
strFriendlyFromAddress="joe.sender@domain.com"
'End To Do
Dim objSMTP, cnnData, RS, strConnection, nRetVal

'create EasyMail SMTP object and set basic properties
Set objSMTP = CreateObject("EasyMail.SMTP")
objSMTP.LicenseKey = strLicenseKey
objSMTP.MailServer = strMailServer
'OptionFlags = 1 turns off automatic creation of the From: header
objSMTP.OptionFlags = 1
objSMTP.AddCustomHeader "From", _
    """" & strFriendlyFromName & """" & _
    " <" & strFriendlyFromAddress & ">"
objSMTP.Subject = "Subject..."
objSMTP.BodyText = "Message text"

'setup database and select addresses.
'This sample uses an Access database.
Set cnnData = CreateObject("ADODB.Connection")
strConnection = "DBQ=email_database.mdb"
cnnData.Open "DRIVER=" & _
    "{Microsoft Access Driver (*.mdb)};" & _
    strConnection
Set RS = CreateObject("ADODB.RecordSet")
RS.Open "SELECT hard_bounces, id, name, address" & _
    " FROM email_table" & _
    " WHERE hard_bounces < 2" & _
    " AND soft_bounces < 4", cnnData, 1, 3

'send to each address selected
Do While RS.EOF = False

    'encode record id in from address
    objSMTP.FromAddr = "bounce_" & RS("id") & _
        "@" & strBounceBoxDomain
    objSMTP.AddRecipient RS("name"), RS("address"), 1
    nRetVal = objSMTP.Send

    'if the recipient's address fails right
    'away then we mark it as a hard bounce now.
    If nRetVal = 8 Then
        RS("hard_bounces") = RS("hard_bounces") + 1
        'save the change explicitly before moving on
        RS.Update
    End If

    'remove the recipients
    objSMTP.Clear 1

    RS.MoveNext
Loop

'free remaining resources
RS.Close
cnnData.Close
SAMPLE 2: Scanning the bounced messages and updating your database...
This sample uses the EasyMail POP3 object to download each message in our bounce box. Each message is parsed and the body text is scanned for specific phrases to determine if the message is a hard or a soft bounce. Once the code determines the type of bounce, it parses the id off of the To: address which identifies the address in our database. If the To: address does not begin with "bounce" it scans the received headers for the bounce address by using the Timestamps collection. The sample then updates the soft_bounces and hard_bounces fields in the database accordingly before deleting the message from the bounce box. If the type of bounce cannot be determined it is left in the bounce box for human analysis which will be used to improve the phrase scanning code in the future. The phrases used to identify bounced messages are read from an XML file.
'To do: Set the following variables:
strLicenseKey = "Newsletter Sample/02E00220B529204B62"
strMailServer= "mail.yourdomain.com"
strAccount= "bounce_account"
strPassword= "bounce_password"
'End To Do
Main
Sub Main()
    Dim objPOP3, nCnt, x
    Dim nBounceType, nId, nPos1, nPos2
    Dim strBodyText, strToAddr, nOrdinal
    Dim strConnection, nRetVal
    Dim cnnData, rs, objMsgs, Recip, TimeS

    'create the EasyMail POP3 object and assign
    'the basic properties
    Set objPOP3 = CreateObject("EasyMail.POP3")
    objPOP3.LicenseKey = strLicenseKey
    objPOP3.MailServer = strMailServer
    objPOP3.Account = strAccount
    objPOP3.Password = strPassword

    'connect to the mail server
    nRetVal = objPOP3.Connect()
    If Not nRetVal = 0 Then
        MsgBox "Error connecting to mail server."
        Exit Sub
    End If

    'prepare the database and select our e-mail table
    Set cnnData = CreateObject("ADODB.Connection")
    strConnection = "DBQ=email_database.mdb"
    cnnData.Open "DRIVER=" & _
        "{Microsoft Access Driver (*.mdb)};" & _
        strConnection
    Set rs = CreateObject("ADODB.RecordSet")
    rs.Open "SELECT * FROM email_table", cnnData, 1, 3

    'get the count of messages waiting in the
    'bounce box and download and process each one
    nCnt = objPOP3.GetDownloadableCount()
    For x = 1 To nCnt
        nOrdinal = objPOP3.DownloadSingleMessage(x)
        If nOrdinal < 0 Then
            MsgBox "There was an error downloading " & _
                "the message. " & nOrdinal
            Exit Sub
        End If
        strBodyText = objPOP3.Messages(nOrdinal).BodyText

        'get id from To: address
        strToAddr = ""
        Set objMsgs = objPOP3.Messages
        For Each Recip In objMsgs(nOrdinal).Recipients
            strToAddr = Recip.Address
            If LCase(Left(strToAddr, 6)) = "bounce" Then
                Exit For
            End If
        Next

        'if address is not found then try searching
        'timestamps (AKA received headers)
        If Not LCase(Left(strToAddr, 6)) = "bounce" Then
            For Each TimeS In objMsgs(nOrdinal).Timestamps
                strToAddr = TimeS.For
                If LCase(Left(strToAddr, 6)) = "bounce" Then
                    Exit For
                End If
            Next
        End If

        'if it is a bounce message we will process it
        If LCase(Left(strToAddr, 6)) = "bounce" And _
           InStr(strToAddr, "_") > 0 Then
            nPos1 = InStr(strToAddr, "_") + 1
            nPos2 = InStr(strToAddr, "@")

            'reset the id so a value left over from the
            'previous message is never reused
            nId = -1
            If nPos2 > nPos1 Then
                nId = Mid(strToAddr, nPos1, nPos2 - nPos1)
            End If

            'call the IdentifyBounce routine which scans
            'the bodytext for the phrases found in our
            'xml file
            nBounceType = IdentifyBounce(strBodyText)

            If nBounceType > 0 Then

                'the message has been identified as a hard
                'or soft bounce so update the database.
                'Find searches forward from the current row,
                'so start again from the first row each time.
                rs.MoveFirst
                rs.Find ("id=" & nId)
                If rs.EOF = False And rs.BOF = False Then
                    If nBounceType = 1 Then
                        rs("soft_bounces") = rs("soft_bounces") + 1
                    Else
                        rs("hard_bounces") = rs("hard_bounces") + 1
                    End If
                    'update changes
                    rs.Update
                End If
                'delete the message from the bounce box
                objPOP3.DeleteSingleMessage x

            ElseIf nBounceType = 0 Then

                'If nBounceType is 0 then it is a warning
                'message or auto-response so we will
                'delete the message from the bounce box.
                objPOP3.DeleteSingleMessage x
            End If
        End If

        'free resources used by the parsed message. This
        'call does not delete messages from the server.
        objPOP3.Messages.DeleteAll

    Next

    'disconnect from mail server
    'and free remaining resources
    objPOP3.Disconnect
    rs.Close
    cnnData.Close
    MsgBox "Operation Complete."

End Sub
Function IdentifyBounce(strBodyText)
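    Dim st, rs

    'load the phrase list from the XML file into a recordset
    Set st = CreateObject("ADODB.Stream")
    Set rs = CreateObject("ADODB.RecordSet")
    st.Open
    st.LoadFromFile ("bounce_signatures.xml")
    rs.Open st
    rs.Sort = "weight DESC"

    '-1 means that no known phrase was found
    IdentifyBounce = -1

    'rows are visited from highest to lowest weight and each
    'match overwrites the last, so the lowest matching weight wins
    Do While Not rs.EOF
        If InStr(1, strBodyText, rs("signature"), _
                 vbTextCompare) Then
            IdentifyBounce = rs("weight")
        End If
        rs.MoveNext
    Loop
    rs.Close
    st.Close
End Function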
Conclusion
I hope you found this article useful in your efforts to clean your address list. If you have any suggestions for future topics, please let me know. You can find my contact information at the bottom of this page.
Bonus. Measuring failures from a specific mailing...
Some of our customers want to measure the count of delivery failures for each mailing they do. We showed you how to embed an id into the "reverse-path" so that it is easy to match the bounced message up with the address in your database, but you can even go a step further by inserting a mailing identifier as well.
Let's say you want to keep track of the number of bounced messages for a specific mailing, and let's assume that each mailing is represented by a row in a table. The row has a unique id field which is the mailing identifier. You can encode the mailing identifier onto the account portion of the reverse-path like this: bounce_1063_34@yourdomain.com, where 1063 is the id of the address and 34 is the id of the mailing. You can then modify your database update routine to record the number of hard and soft bounces for each mailing as well as each address.
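Pulling both ids back out of such an address is straightforward. Here is a minimal Python sketch; parse_bounce_address is a hypothetical helper name of ours, and it accepts the plain bounce_<id> form as well so that older mailings still parse.

```python
import re

# bounce_<address id>@... or bounce_<address id>_<mailing id>@...
BOUNCE_RE = re.compile(r"^bounce_(\d+)(?:_(\d+))?@", re.IGNORECASE)


def parse_bounce_address(address):
    """Return (address_id, mailing_id) or None if it is not a bounce address.

    mailing_id is None when the address carries no mailing identifier.
    """
    match = BOUNCE_RE.match(address)
    if not match:
        return None
    address_id = int(match.group(1))
    mailing_id = int(match.group(2)) if match.group(2) else None
    return address_id, mailing_id
```

With both values in hand, the update routine can increment the counters on the address row and on the mailing row in the same pass.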
John Alessi has specialized in e-mail development for the past 5 years and has helped many large companies like Microsoft, Boeing and EarthLink with their e-mail needs. He can be reached at john@quiksoftcorp.com.
©2002 Quiksoft Corporation. All rights reserved. Unauthorized duplication or distribution prohibited. Quiksoft, EasyMail, EasyMail Objects, EasyMail .Net Edition, EasyMail Advanced API, EasyMail SMTP Express, and MailStore are trademarks of Quiksoft Corporation. Other trademarks mentioned are the property of their legal owner.