Advanced String Manipulation Using MS Excel - The FIND Function in MS Excel
The FIND function in MS Excel makes your work even more efficient, especially when combined with the MID function.
Here is the definition of the FIND function in MS Excel:
=FIND(find_text,within_text,[start_num])
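The "find_text" is the text you are looking for. It can be a single character or a block of text. If you type the search text directly into the formula, you have to enclose it in double quotes ("). Note that FIND is case-sensitive, so "A" and "a" are treated as different characters.
The "within_text" part is the text to be analyzed, usually given as the address of the cell that holds it, while the optional "start_num" tells Excel at which character to start searching (1 means the first character; if omitted, Excel uses 1). FIND returns the position of the first match, counted from the first character of within_text.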
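For readers who want to check the position arithmetic outside Excel, the behavior described above can be imitated in a few lines of Python. This is only a rough sketch: `excel_find` is a hypothetical helper, not an Excel or library function, and it covers just the basics (1-based positions, case-sensitive matching, an optional start position).

```python
def excel_find(find_text: str, within_text: str, start_num: int = 1) -> int:
    """Rough stand-in for Excel's FIND: 1-based and case-sensitive."""
    if start_num < 1:
        raise ValueError("start_num must be 1 or greater")
    # Python's str.find is 0-based, so shift by one in both directions.
    index = within_text.find(find_text, start_num - 1)
    if index == -1:
        # Excel shows a #VALUE! error when the text is not found.
        raise ValueError("find_text not found")
    return index + 1


print(excel_find("b", "abcabc"))      # 2
print(excel_find("b", "abcabc", 3))   # 5
print(excel_find("bc", "abcabc"))     # 2
```

The second call starts searching at the third character, so it skips the first "b" and reports the second one, still counted from the beginning of the text.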
Here is an illustrative example. One of the most difficult tasks for any webmaster is extracting the official URL from session ID-based URLs. Imagine that you are about to make a sitemap with the following URLs:
http://www.yoursessionidbasedwebsite.com/file-x-y-1.html?osCsid=g25145xf
http://www.yoursessionidbasedwebsite.com/file-x-y-2.html?osCsid=g25145xf
http://www.yoursessionidbasedwebsite.com/file-x-y-3.html?osCsid=g25145xf
http://www.yoursessionidbasedwebsite.com/file-x-y-4.html?osCsid=g25145xf
http://www.yoursessionidbasedwebsite.com/file-x-y-5.html?osCsid=g25145xf
http://www.yoursessionidbasedwebsite.com/file-x-y-6.html?osCsid=g25145xf
The above example is simple because it is only used for illustration purposes; in real-world filtering of session IDs, things can get pretty complicated, especially if the URLs contain other long and ugly parameters aside from the session ID, as is common on today's dynamic websites.
To take out the session ID above, you can look for similarities, just as we did in the MID function example previously. Note that all URLs have session IDs beginning with ?osCsid, but we cannot filter by "?" alone, since a real-world session ID-based website could have URLs with two "?" characters. Therefore we will find the location of osCsid, and then use that location to decide where the URL should be cut. For example, in this URL: http://www.yoursessionidbasedwebsite.com/file-x-y-4.html?osCsid=g25145xf , "osCsid" starts at the fifty-eighth character, and the "?" that belongs to the session ID sits just before it, at character 57. The clean URL is therefore the first (58-2) = 56 characters, which end just before the "?". The result no longer contains the session ID.
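The counting in this step is easy to get wrong by one, so here is a small Python sketch that redoes it for the fourth URL from the list above (the .html form). It is only an illustration of the arithmetic, not part of the Excel workflow; the variable names are made up for this example.

```python
url = "http://www.yoursessionidbasedwebsite.com/file-x-y-4.html?osCsid=g25145xf"

# str.find is 0-based, so add 1 to get the position Excel would report.
position = url.find("osCsid") + 1

# Subtract 2: one step back lands on the "?", one more on the last
# character of the clean URL.
clean_length = position - 2
clean_url = url[:clean_length]

print(position)      # 58
print(clean_length)  # 56
print(clean_url)     # http://www.yoursessionidbasedwebsite.com/file-x-y-4.html
```

A URL with a shorter or longer path gives a different position, but the offset of 2 stays the same as long as the search text is "osCsid" without the "?".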
To implement the above example in Excel, copy and paste the six URLs above into an Excel worksheet starting in cell A1; the last one will occupy cell A6. Remove any blank rows in between.
In cells B1 through B6, copy and paste the formulas below:
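=FIND("osCsid",A1,1)
=FIND("osCsid",A2,1)
=FIND("osCsid",A3,1)
=FIND("osCsid",A4,1)
=FIND("osCsid",A5,1)
=FIND("osCsid",A6,1)
The above formulas mean we will start searching at the first character, looking for "osCsid". For the six URLs above, each cell in column B returns 58, the position of the "o" in "osCsid"; the "?" sits one character earlier.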
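Finally, in cells C1 through C6, copy and paste the MID formulas below. Each one starts at the first character of the URL in column A and returns the number of characters given by column B minus 2, which extracts the clean URL (without the session ID) from cells A1 through A6.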
=MID(A1,1,B1-2)
=MID(A2,1,B2-2)
=MID(A3,1,B3-2)
=MID(A4,1,B4-2)
=MID(A5,1,B5-2)
=MID(A6,1,B6-2)
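As a cross-check on columns B and C, the two formula steps can be mirrored outside Excel. The following is a minimal Python sketch, not part of the original worksheet; `excel_find` and `excel_mid` are hypothetical helpers that imitate FIND and MID with 1-based positions, and the search text is assumed to be "osCsid" without the leading "?".

```python
def excel_find(find_text: str, within_text: str, start_num: int = 1) -> int:
    """Rough stand-in for Excel's FIND (1-based, case-sensitive)."""
    index = within_text.find(find_text, start_num - 1)
    if index == -1:
        raise ValueError("find_text not found")  # Excel would show #VALUE!
    return index + 1


def excel_mid(text: str, start_num: int, num_chars: int) -> str:
    """Rough stand-in for Excel's MID (1-based start position)."""
    return text[start_num - 1 : start_num - 1 + num_chars]


# Column A: the six sample URLs from the article.
column_a = [
    f"http://www.yoursessionidbasedwebsite.com/file-x-y-{n}.html?osCsid=g25145xf"
    for n in range(1, 7)
]

# Column B: position of "osCsid"; column C: everything before the "?".
column_b = [excel_find("osCsid", url, 1) for url in column_a]
column_c = [excel_mid(url, 1, b - 2) for url, b in zip(column_a, column_b)]

for b, c in zip(column_b, column_c):
    print(b, c)
```

Every row prints 58 followed by the URL ending in .html. If the search text were "?osCsid" instead, FIND would return the position of the "?" itself (57 here), and the length to keep would be that position minus 1 rather than minus 2.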
After this text manipulation, you should have the same results as shown in the screen shot below: