Extracting Google-Indexed Web Site Pages Using MS Excel
By: Codex-M
    2009-06-11

    Table of Contents:
  • Extracting Google-Indexed Web Site Pages Using MS Excel
  • Understanding the Google Search Result
  • The Process
  • Explaining the Results



    Extracting Google-Indexed Web Site Pages Using MS Excel - Explaining the Results


    (Page 4 of 4 )

    The steps above should filter the worksheet so that only rows containing both the domain name www.aspfree.com and a "Cached" link remain.

    After filtering, select the remaining rows, then copy and paste them into another Excel worksheet.

    The filtered rows should look something like this:

    www.aspfree.com/ - 75k - Cached - Similar pages -

    www.aspfree.com/c/b/XML/ - 47k - Cached - Similar pages -

    www.aspfree.com/c/b/IIS/ - 47k - Cached - Similar pages -

    www.aspfree.com/asp/freeasphost.asp - 77k - Cached - Similar pages -

    www.aspfree.com/c/b/Silverlight/ - 46k - Cached - Similar pages -

    www.aspfree.com/c/b/BrainDump/ - 47k - Cached - Similar pages -

    www.aspfree.com/c/b/ASP/ - 47k - Cached - Similar pages -
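    For readers who would rather script the filtering step, here is a minimal Python sketch of the same idea. It is not part of the original Excel procedure. The function name filter_result_rows, the "Cached" marker default, and the non-URL sample rows are assumptions made for illustration.

```python
def filter_result_rows(rows, domain="www.aspfree.com", marker="Cached"):
    """Keep only rows that mention both the domain and the cached-link text."""
    return [row.strip() for row in rows if domain in row and marker in row]


# Sample rows as they might look after pasting a results page into a sheet.
# The title and snippet rows are hypothetical; the URL rows come from the article.
pasted_rows = [
    "ASP Free - tutorials and code",  # hypothetical title row
    "www.aspfree.com/ - 75k - Cached - Similar pages -",
    "Some snippet text from the search result",  # hypothetical snippet row
    "www.aspfree.com/c/b/XML/ - 47k - Cached - Similar pages -",
]

filtered = filter_result_rows(pasted_rows)
for row in filtered:
    print(row)
```

    This does the same job as the Excel filter: a row is kept only if it contains both the domain name and the cached-link text.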

    At this point, the rows are not yet clean indexed URLs, because each one still contains unrelated data, such as the size of the web page (75k, for example) and the "Cached" and "Similar pages" link text.

    The next step is to extract only the URL from each row. For example, from this row:

    www.aspfree.com/c/b/BrainDump/ - 47k - Cached - Similar pages -

    only the URL is kept, giving:

    www.aspfree.com/c/b/BrainDump/

    To do that in MS Excel, use the following formula, which takes the text in cell A1 up to (but not including) the first space and prefixes it with "http://":

    =CONCATENATE("http://",MID(A1,1,(FIND(" ",A1,1))-1))

    After filtering the rows and copying them to another sheet as instructed in the previous steps, paste the formula into cell B1, with the first row of your filtered data in cell A1. The formula must sit in the same row as the data it reads, so if your data starts in A1, the formula goes in B1.

    Click and drag the formula down column B until every row of data has a formula beside it. Column B will then contain the clean indexed URLs.
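    The same text manipulation can be sketched outside Excel. The Python below mirrors the logic of the formula: find the first space, keep everything before it, and prepend "http://". The function name extract_url and the handling of rows without a space are assumptions made for illustration. In Excel, FIND would return a #VALUE! error for such a row.

```python
def extract_url(row, scheme="http://"):
    """Return the scheme plus everything before the first space in the row."""
    first_space = row.find(" ")
    if first_space == -1:
        # No space found: Excel's FIND would give #VALUE! here.
        # This sketch assumes the whole row is the URL instead.
        return scheme + row
    return scheme + row[:first_space]


rows = [
    "www.aspfree.com/c/b/BrainDump/ - 47k - Cached - Similar pages -",
    "www.aspfree.com/asp/freeasphost.asp - 77k - Cached - Similar pages -",
]

for row in rows:
    print(extract_url(row))
```

    Slicing up to the position of the first space corresponds to MID(A1,1,FIND(" ",A1,1)-1) in the worksheet formula.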

    The main disadvantage of this method is that it is inaccurate for long URLs. Google shortens long URLs in its result listing by replacing part of the path, and the end of the URL, with dots, so the text extracted in Excel is not the real URL. For example:

    www.somewebsite.com/this.../should-be-a-very-very-long-url-which-google-will-display-properly-and-will-makefiltering-hard...

    In this case, you have to open the result manually in a browser and copy the exact URL shown in the address bar. Long URLs are not recommended anyway, as they tend to look unfriendly and spammy to search engines, so consider shortening the URLs on your website.
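    One way to find the rows that need this manual check is to flag every extracted value that contains Google's truncation dots. The Python below is a rough sketch under that assumption. The names is_truncated and split_by_truncation are hypothetical, and the check treats any "..." (or a single ellipsis character) as a sign of truncation, which could misfire on a URL that genuinely contains three dots.

```python
def is_truncated(url):
    """Guess whether a displayed URL was shortened with dots."""
    return "..." in url or "\u2026" in url


def split_by_truncation(urls):
    """Separate URLs that look complete from those that need a manual check."""
    clean = [u for u in urls if not is_truncated(u)]
    needs_check = [u for u in urls if is_truncated(u)]
    return clean, needs_check


urls = [
    "http://www.aspfree.com/c/b/IIS/",
    "http://www.somewebsite.com/this.../should-be-a-very-very-long-url-which-google-will-display-properly-and-will-makefiltering-hard...",
]

clean, needs_check = split_by_truncation(urls)
print("Usable as-is:", clean)
print("Open in a browser to get the real URL:", needs_check)
```

    Only the flagged URLs then need to be opened by hand. The rest can be used as extracted.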







    © 2003-2009 by Developer Shed. All rights reserved. DS Cluster 6 Hosted by Hostway