XML Tricks for C#
By: Michael Youssef
    2004-03-24

    Table of Contents:
  • XML Tricks for C#
  • Attributes and Document Complexity
  • A First Look at Encoding
  • Unicode
  • Encoding with XML



    XML Tricks for C# - Encoding with XML


    (Page 5 of 5 )

    When you write an XML document in an editor such as Notepad, it is saved by default in the ANSI character set, which on Windows means the system code page (Windows-1252 on Western European and US systems).

    However, an XML parser needs a way to determine how the file is encoded before it can interpret the characters. One such mechanism is the Byte Order Mark (BOM): a short byte sequence that an editor may insert at the beginning of a file when saving it, to indicate the encoding. On Windows, the default encoding is Windows-1252 (which covers the Western European Latin characters), and a file saved with this default has no BOM. If you save the file as Unicode, a BOM is inserted at the start of the file.
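To make the BOM concrete, here is a minimal C# sketch that prints the bytes the .NET encoding classes write at the start of a file. The class and method names (`BomDemo`, `PreambleHex`) are made up for this illustration; `Encoding.GetPreamble` and `BitConverter.ToString` are standard .NET calls.

```csharp
using System;
using System.Text;

public static class BomDemo
{
    // Hypothetical helper: returns the BOM ("preamble") bytes that the given
    // encoding writes at the start of a file, formatted as hex pairs.
    // Encodings without a BOM (such as ASCII) yield an empty string.
    public static string PreambleHex(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetPreamble());
    }
}
```

Calling `BomDemo.PreambleHex(Encoding.UTF8)` gives `EF-BB-BF`, `Encoding.Unicode` (UTF-16 little endian, which is what Notepad calls "Unicode") gives `FF-FE`, and `Encoding.ASCII` gives an empty string because it has no BOM at all.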

    You will not see the BOM in most editors, because editors that understand Unicode consume it rather than display it. How, then, does an XML parser read these documents and make sure that it interprets the characters correctly? In simplified form, the W3C XML recommendation gives the parser the following three rules for deciding how the document should be read:

    1. If there is a BOM, the BOM defines the file encoding.
    2. If there is no BOM, the encoding attribute in the XML declaration is definitive.
    3. If neither is present, assume the XML document is UTF-8 encoded.
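The three rules can be sketched in C# roughly as follows. This is a simplified illustration, not how any real parser is implemented: `EncodingRules`, `Resolve` and `StartsWith` are hypothetical names, only the UTF-8 and UTF-16 BOMs are checked, and the declaration is found with a regular expression rather than a proper parse.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class EncodingRules
{
    // Hypothetical helper: returns the name of the encoding that the three
    // rules select for the raw bytes of an XML file.
    public static string Resolve(byte[] data)
    {
        // Rule 1: a BOM, if present, decides the encoding.
        // (Simplification: UTF-32 BOMs are not handled here.)
        if (StartsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (StartsWith(data, 0xFF, 0xFE)) return "UTF-16LE";
        if (StartsWith(data, 0xFE, 0xFF)) return "UTF-16BE";

        // Rule 2: no BOM, so look for encoding="..." in the XML declaration.
        // The declaration uses only ASCII characters, so the first bytes
        // can safely be decoded as ASCII just to read it.
        string head = Encoding.ASCII.GetString(data, 0, Math.Min(data.Length, 200));
        Match match = Regex.Match(
            head,
            @"^<\?xml[^>]*\bencoding\s*=\s*[""']([A-Za-z][A-Za-z0-9._\-]*)[""']");
        if (match.Success) return match.Groups[1].Value;

        // Rule 3: neither a BOM nor a declared encoding, so assume UTF-8.
        return "UTF-8";
    }

    // True when data begins with exactly the given bytes.
    private static bool StartsWith(byte[] data, params byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

For example, the bytes of `<?xml version="1.0" encoding="ISO-8859-1"?><a/>` with no BOM resolve to `ISO-8859-1` under Rule 2, while the bytes of a bare `<a/>` fall through to Rule 3 and resolve to `UTF-8`.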

    Of course, if the BOM is incorrect, the XML file is unlikely to be parsed correctly and the parser will report an error. Likewise, if there is no BOM and no encoding declaration, the parser assumes UTF-8, and an error occurs if the document is not actually UTF-8 encoded. Neither case should come as a surprise: the parser cannot decode characters when it has been told the wrong encoding. As I said before, the first 128 characters of Unicode are the same as those of ASCII, so a file that consists only of these characters is fine. However, if you include characters beyond the ASCII range, such as ñ and ç, you will run into difficulties, because an ANSI editor stores each of them as a single byte above 127, and such a byte is not valid UTF-8 on its own.
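The difficulty with a character such as ñ can be checked directly. The sketch below is only an illustration (the names `StrictUtf8Demo`, `IsValidUtf8`, `AnsiEnye` and `Utf8Enye` are invented for it): it decodes bytes with a strict `UTF8Encoding` that throws on invalid input instead of silently substituting a replacement character.

```csharp
using System;
using System.Text;

public static class StrictUtf8Demo
{
    // "ñ" as an ANSI (Windows-1252) editor stores it: the single byte 0xF1.
    public static readonly byte[] AnsiEnye = { 0xF1 };

    // "ñ" as UTF-8 stores it: the two bytes 0xC3 0xB1.
    public static readonly byte[] Utf8Enye = { 0xC3, 0xB1 };

    // Hypothetical helper: true when the bytes are well-formed UTF-8.
    public static bool IsValidUtf8(byte[] data)
    {
        // throwOnInvalidBytes: true makes decoding fail on bad input
        // rather than replacing it with U+FFFD.
        var strictUtf8 = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);
        try
        {
            strictUtf8.GetString(data);
            return true;
        }
        catch (ArgumentException)
        {
            // DecoderFallbackException derives from ArgumentException.
            return false;
        }
    }
}
```

Here `IsValidUtf8(StrictUtf8Demo.AnsiEnye)` is false and `IsValidUtf8(StrictUtf8Demo.Utf8Enye)` is true, which is the mismatch a parser runs into when an ANSI file is read as UTF-8.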

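    I'd now like to address a big problem with the XML documents that we create. When writing an XML document, you can use the encoding attribute of the XML declaration to state the character encoding used by the document. This is very confusing for beginners, because the attribute only declares the encoding; it does not change how the editor actually saves the file.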

    Suppose you open Notepad and write the following simple XML document:


    <?xml version="1.0" encoding="UTF-8"?>
     
    <name>Michael Youssef</name>

    Then you save it using the File -> Save dialog box.

    Note the Encoding drop-down list in that dialog, from which you can choose one of a few character sets.

    Save the file with the default (the ANSI character set). Now, how does the XML parser determine the encoding of the saved file? Look above at Rule #2: because there is no BOM, the XML parser uses the encoding named in the encoding attribute. That is UTF-8, and the parser reads the characters without any problem because, as we have learned, the first 128 characters are encoded identically in ASCII, Windows-1252, and UTF-8, and this document uses no others.
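This scenario can be tried from C# as well. The following is a small sketch under stated assumptions: `AnsiAsUtf8Demo` and `ParseRootText` are names invented for the example, and `Encoding.ASCII` stands in for Notepad's ANSI save, which produces the same bytes for a document that contains only characters below 128.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class AnsiAsUtf8Demo
{
    // The sample document from the article, declared as UTF-8.
    public const string Xml =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r\n<name>Michael Youssef</name>";

    // Hypothetical helper: parses raw file bytes with XmlDocument and
    // returns the text of the root element.
    public static string ParseRootText(byte[] fileBytes)
    {
        var document = new XmlDocument();
        using (var stream = new MemoryStream(fileBytes))
        {
            // Loading from a stream lets the parser work out the encoding
            // itself from the BOM or the XML declaration.
            document.Load(stream);
        }
        return document.DocumentElement.InnerText;
    }
}
```

`AnsiAsUtf8Demo.ParseRootText(Encoding.ASCII.GetBytes(AnsiAsUtf8Demo.Xml))` returns `Michael Youssef`: the bytes carry no BOM, the declaration says UTF-8, and every byte is below 128, so the parser reads them correctly.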



     






    © 2003-2008 by Developer Shed. All rights reserved.