Strings and Characters, Part 2

This half of the chapter on strings and characters starts with converting a byte[] of characters back into a string, and covers other topics such as converting strings to their equivalent value types and creating a delimited string. (From C# Cookbook by Stephen Teilhet and Jay Hilyard, O'Reilly Media, ISBN: 0596003390, 2004.)

August 03, 2004
Part 1 of this article is available at this link.

2.13 Converting a String Returned as a Byte[ ] Back into a String

Problem

Many methods in the FCL return a byte[] consisting of characters instead of a string. Some of these methods include:




System.Net.Sockets.Socket.Receive
System.Net.Sockets.Socket.ReceiveFrom
System.Net.Sockets.Socket.BeginReceive
System.Net.Sockets.Socket.BeginReceiveFrom
System.Net.Sockets.NetworkStream.Read
System.Net.Sockets.NetworkStream.BeginRead
System.IO.BinaryReader.Read
System.IO.BinaryReader.ReadBytes
System.IO.FileStream.Read
System.IO.FileStream.BeginRead
System.IO.MemoryStream // Constructor
System.IO.MemoryStream.Read
System.IO.MemoryStream.BeginRead
System.Security.Cryptography.CryptoStream.Read
System.Security.Cryptography.CryptoStream.BeginRead
System.Diagnostics.EventLogEntry.Data

In many cases, this byte[] might contain ASCII or Unicode encoded characters. You need a way to recombine this byte[] to obtain the original string.

Solution

To convert a byte array of ASCII values to a complete string, use the following method:

using System;
using System.Text;

public static string FromASCIIByteArray(byte[] characters)
{
    ASCIIEncoding encoding = new ASCIIEncoding();
    string constructedString = encoding.GetString(characters);
    return (constructedString);
}

To convert a byte array of Unicode values (UTF-16 encoded) to a complete string, use the following method:

public static string FromUnicodeByteArray(byte[] characters)
{
    UnicodeEncoding encoding = new UnicodeEncoding();
    string constructedString = encoding.GetString(characters);
    return (constructedString);
}

Discussion

The GetString method of the ASCIIEncoding class converts 7-bit ASCII characters contained in a byte array to a string. Any byte value larger than 127 is converted to the ? character. The ASCIIEncoding class can be found in the System.Text namespace. The GetString method is overloaded to accept additional arguments as well. The overloaded version converts only a specified range of the byte array (a starting index and a count of bytes) to a string.

The GetString method returns a string containing the converted byte array of ASCII characters.

The GetString method of the UnicodeEncoding class converts a byte array of UTF-16 encoded values, two bytes per character, into a string. The UnicodeEncoding class can be found in the System.Text namespace. The GetString method returns a string containing the converted byte array of Unicode characters.
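As a minimal sketch of the range-based GetString overload and of UTF-16 decoding, consider the helper class below. The class and method names are hypothetical and are not part of the FCL or of the recipe's code. The sketch assumes the static Encoding.ASCII and Encoding.Unicode properties, which return shared encoding instances comparable to the ones constructed in the Solution:

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only.
public static class ByteDecodingSketch
{
    // Decodes only the first bytesRead bytes of a buffer, as you might after
    // a Socket.Receive call that filled just part of the buffer.
    public static string DecodeAsciiPrefix(byte[] buffer, int bytesRead)
    {
        return Encoding.ASCII.GetString(buffer, 0, bytesRead);
    }

    // Decodes a buffer of little-endian UTF-16 values (two bytes per character).
    public static string DecodeUtf16(byte[] buffer)
    {
        return Encoding.Unicode.GetString(buffer);
    }
}
```

For example, DecodeAsciiPrefix(new byte[] {72, 105, 0, 0}, 2) returns "Hi", and DecodeUtf16(new byte[] {83, 0}) returns "S". Decoding only the bytes actually read avoids turning unused buffer space into trailing null characters.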

See Also

See the “ASCIIEncoding Class” and “UnicodeEncoding Class” topics in the MSDN documentation.

Buy the book! If you've enjoyed what you've seen here, or to get more information, click on the "Buy the book!" graphic. Pick up a copy today!

Visit the O'Reilly Network http://www.oreillynet.com for more online content.

2.14 Passing a String to a Method that Accepts Only a Byte[ ]

Problem

Many methods in the FCL accept a byte[] consisting of characters instead of a string. Some of these methods include:

System.Net.Sockets.Socket.Send
System.Net.Sockets.Socket.SendTo
System.Net.Sockets.Socket.BeginSend
System.Net.Sockets.Socket.BeginSendTo
System.Net.Sockets.NetworkStream.Write
System.Net.Sockets.NetworkStream.BeginWrite
System.IO.BinaryWriter.Write
System.IO.FileStream.Write
System.IO.FileStream.BeginWrite
System.IO.MemoryStream.Write
System.IO.MemoryStream.BeginWrite
System.Security.Cryptography.CryptoStream.Write
System.Security.Cryptography.CryptoStream.BeginWrite
System.Diagnostics.EventLog.WriteEntry

In many cases, you might have a string that you need to pass into one of these methods or some other method that only accepts a byte[]. You need a way to break up this string into a byte[].

Solution

To convert a string to a byte array of ASCII values, use the GetBytes method on an instance of the ASCIIEncoding class:

using System;
using System.Text;

public static byte[] ToASCIIByteArray(string characters)
{
    ASCIIEncoding encoding = new ASCIIEncoding();
    // GetBytes allocates and returns a correctly sized array,
    // so no separate allocation is needed
    byte[] retArray = encoding.GetBytes(characters);
    return (retArray);
}

To convert a string to a byte array of Unicode values, use the UnicodeEncoding class:

public static byte[] ToUnicodeByteArray(string characters)
{
    UnicodeEncoding encoding = new UnicodeEncoding();
    // GetBytes allocates and returns a correctly sized array,
    // so no separate allocation is needed
    byte[] retArray = encoding.GetBytes(characters);
    return (retArray);
}

Discussion

The GetBytes method of the ASCIIEncoding class converts characters, contained in either a char array or a string, into a byte array of 7-bit ASCII values. Any character with a value larger than 127 is converted to the ? character. The ASCIIEncoding class can be found in the System.Text namespace. The GetBytes method is overloaded to accept additional arguments as well. The overloaded versions of the method convert all or part of a string or char array to ASCII and then store the result in a specified range inside a byte array supplied by the caller, returning the number of bytes written.

The GetBytes method of the UnicodeEncoding class converts Unicode characters into 16-bit (UTF-16) values. The UnicodeEncoding class can be found in the System.Text namespace. The GetBytes method returns a byte array in which each character of the string occupies two consecutive elements.

A single Unicode character in the source string or in the source char array corresponds to two elements of the byte array. For example, the following byte array contains the ASCII value of the letter 'S':

byte[] sourceArray = {83};

However, for a byte array to contain a Unicode representation (UTF-16 encoded) of the letter 'S', it must contain two elements. For example:

byte[] sourceArray = {83, 0};

The Intel architecture uses a little-endian encoding, which means that the first element is the least-significant byte and the second element is the most-significant byte. Other architectures may use big-endian encoding, which is the opposite of little-endian encoding. The UnicodeEncoding class supports both big-endian and little-endian encodings. Using the UnicodeEncoding instance constructor, you can construct an instance that uses either big-endian or little-endian ordering. In addition, you have the option to indicate whether a byte order mark preamble should be generated so that readers of the file will know which endianness is in use.
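The byte-order choice can be sketched with the two-argument UnicodeEncoding constructor, which takes a bigEndian flag and a byteOrderMark flag. The class and method names below are hypothetical and exist only for illustration:

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only.
public static class EndianEncodingSketch
{
    // Encodes text as UTF-16 in the requested byte order.
    // GetBytes itself does not prepend a byte order mark.
    public static byte[] EncodeUtf16(string text, bool bigEndian)
    {
        UnicodeEncoding encoding = new UnicodeEncoding(bigEndian, true);
        return encoding.GetBytes(text);
    }

    // Returns the byte order mark that identifies the requested byte order.
    public static byte[] GetByteOrderMark(bool bigEndian)
    {
        UnicodeEncoding encoding = new UnicodeEncoding(bigEndian, true);
        return encoding.GetPreamble();
    }
}
```

With these helpers, EncodeUtf16("S", false) yields {83, 0} and EncodeUtf16("S", true) yields {0, 83}. GetByteOrderMark(false) yields {0xFF, 0xFE}, and GetByteOrderMark(true) yields {0xFE, 0xFF}.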

See Also

See the “ASCIIEncoding Class” and “UnicodeEncoding Class” topics in the MSDN documentation.


2.15 Converting Strings to Their Equivalent Value Type

Problem

You have a string that represents the equivalent value of a number ("12"), char ("a"), bool ("true"), or a color enumeration ("Red"). You need to convert this string to its equivalent value type. Therefore, the number "12" would be converted to a numeric value such as int, short, float, etc. The string "a" would be converted to a char value 'a', the string "true" would be converted to a bool value, and the color "Red" could be converted to an enumeration value (if an enumeration were defined that contained the element Red).

Solution

Use the Parse static method of the type that the string is to be converted to. To convert a string containing a number to its numeric type, use the following code:

// This code requires the use of the System and System.Globalization namespaces

string longString = "7654321";
int actualInt = Int32.Parse(longString); // actualInt = 7654321

string dblString = "-7654.321";
double actualDbl = Double.Parse(dblString, NumberStyles.AllowDecimalPoint |
    NumberStyles.AllowLeadingSign); // actualDbl = -7654.321

To convert a string containing a Boolean value to a Boolean type, use the following code:

// This code requires the use of the System namespace

string boolString = "true";
bool actualBool = Boolean.Parse(boolString); // actualBool = true

To convert a string containing a char value to a char type, use the following code:

// This code requires the use of the System namespace

string charString = "t";
char actualChar = char.Parse(charString); // actualChar = 't'

To convert a string containing an enumeration value to an enumeration type, use the following code:

// This code requires the use of the System namespace

enum Colors
{
red, green, blue
}

string colorString = "blue";
// Note that the Parse method below is a method defined by System.Enum, not by Colors
Colors actualEnum = (Colors)Colors.Parse(typeof(Colors), colorString);

// actualEnum = blue

Discussion

The static Parse method, available on many of the built-in value types, allows easy conversion from a string to a value of that specific type. In addition to char, which appears in the Solution, the Parse method is supported by the following types:

Boolean     Int64
Byte         SByte
Decimal     Single
Double      UInt16
Int16        UInt32
Int32        UInt64

In addition to the Parse methods that take a single string parameter and convert it to the target data type, each numeric type has a second overloaded version of the Parse method that includes a second parameter of type System.Globalization.NumberStyles. This allows the Parse method to correctly handle specific properties of numbers, such as leading or trailing signs, decimal points, currency symbols, and thousands separators. NumberStyles is marked as a flag-style enumeration, so you can bitwise OR more than one enumerated value together to allow a group of styles to be used on the string.

The NumberStyles enumeration is defined as follows:

AllowCurrencySymbol

If the string contains a number with a currency symbol, it is parsed as currency; otherwise, it is parsed as a number.

AllowDecimalPoint

Allows a decimal point in the number.

AllowExponent

Allows the number to be in exponential notation format.

AllowHexSpecifier

Allows characters that specify a hexadecimal number.

AllowLeadingSign

Allows a leading sign symbol.

AllowLeadingWhite

Ignores any leading whitespace.

AllowParentheses

Allows parentheses.

AllowThousands

Allows group separators.

AllowTrailingSign

Allows a trailing sign symbol.

AllowTrailingWhite

Ignores any trailing whitespace.

Any

Applies all of the preceding styles except AllowHexSpecifier.

Currency

Applies all of the styles except AllowExponent and AllowHexSpecifier.

Float

Equivalent to AllowLeadingWhite | AllowTrailingWhite | AllowLeadingSign | AllowDecimalPoint | AllowExponent

HexNumber

Equivalent to AllowLeadingWhite | AllowTrailingWhite | AllowHexSpecifier

Integer

Equivalent to AllowLeadingWhite | AllowTrailingWhite | AllowLeadingSign

None

Applies none of the styles.

Number

Equivalent to AllowLeadingWhite | AllowTrailingWhite | AllowLeadingSign | AllowTrailingSign | AllowDecimalPoint | AllowThousands

If the NumberStyles parameter is not supplied when it is required (as when, for example, a numeric string includes a thousands separator), or if the NumberStyles value supplied does not match the format of the number in the string, a FormatException exception will be thrown. If the number in the string is too large or too small for the data type, an OverflowException exception will be thrown. Passing in a null for the string parameter will throw an ArgumentNullException exception.
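The interaction between NumberStyles and the three exceptions can be sketched as follows. The ParseOrDefault helper is hypothetical and is not part of the FCL. It also assumes the three-argument Int32.Parse overload with CultureInfo.InvariantCulture, so that the comma and the period mean the same thing on every machine:

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only.
public static class NumberParsingSketch
{
    // Parses text with the given styles; returns fallback instead of throwing.
    public static int ParseOrDefault(string text, NumberStyles styles, int fallback)
    {
        try
        {
            return Int32.Parse(text, styles, CultureInfo.InvariantCulture);
        }
        catch (FormatException)
        {
            return fallback;   // text does not match the permitted styles
        }
        catch (OverflowException)
        {
            return fallback;   // value does not fit in an Int32
        }
        catch (ArgumentNullException)
        {
            return fallback;   // text was null
        }
    }
}
```

Under these assumptions, ParseOrDefault("1,234", NumberStyles.AllowThousands, -1) returns 1234, while the same string with NumberStyles.Integer returns the fallback because the thousands separator is not permitted. ParseOrDefault("FF", NumberStyles.HexNumber, -1) returns 255, and ParseOrDefault("(42)", NumberStyles.AllowParentheses, 0) returns -42, because parentheses denote a negative number.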

The Parse methods of the two non-numeric data types, bool and char, also deserve some additional explanation. When calling Boolean.Parse, if the string contains anything other than a value equal to the static fields Boolean.FalseString or Boolean.TrueString (that is, "False" or "True", compared without regard to case), a FormatException exception is thrown. Passing in a null for the string parameter throws an ArgumentNullException exception.

When invoking char.Parse, if a string value containing more than one character is passed as its single argument, a FormatException exception is thrown. Passing in a null for the string parameter throws an ArgumentNullException exception.

The static Enum.Parse method returns an Object of the same type as specified in the first parameter of this method (EnumType). This value is viewed as an Object type and must be cast to its correct enumeration type.

This method throws an ArgumentException exception if the Value parameter cannot be matched to a string in the enumeration. An ArgumentNullException exception is thrown if a null is passed in to the Value parameter.
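The behavior of char.Parse and Enum.Parse described above can be sketched as follows. The class and method names are hypothetical, and the Colors enumeration simply mirrors the one in the Solution. The sketch uses the three-argument Enum.Parse overload, whose final argument requests a case-insensitive match:

```csharp
using System;

public enum Colors { red, green, blue }

// Hypothetical helper for illustration only.
public static class NonNumericParsingSketch
{
    // Enum.Parse with ignoreCase = true accepts "Blue", "BLUE", and "blue".
    public static Colors ParseColor(string name)
    {
        return (Colors)Enum.Parse(typeof(Colors), name, true);
    }

    // Returns true when Enum.Parse rejects the name with an ArgumentException.
    public static bool IsUnknownColor(string name)
    {
        try
        {
            Enum.Parse(typeof(Colors), name, true);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    // Returns true when char.Parse rejects the text with a FormatException.
    public static bool IsNotSingleChar(string text)
    {
        try
        {
            char.Parse(text);
            return false;
        }
        catch (FormatException)
        {
            return true;
        }
    }
}
```

Here ParseColor("BLUE") returns Colors.blue, IsUnknownColor("purple") returns true, and IsNotSingleChar("ab") returns true. Because ArgumentNullException derives from ArgumentException, IsUnknownColor also returns true for a null name.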


2.16 Formatting Data in Strings

Problem

You need to format one or more embedded pieces of information inside of a string, such as a number, character, or substring.

Solution

The static string.Format method allows you to format strings in a variety of ways. For example:

int ID = 12345;
double weight = 12.3558;
char row = 'Z';
string section = "1A2C";

string output = string.Format("The item ID = {0:G} having weight = {1:G} " +
    "is found in row {2:G} and section {3:G}", ID, weight, row, section);
Console.WriteLine(output);

output = string.Format("The item ID = {0:N} having weight = {1:E} " +
    "is found in row {2:E} and section {3:E}", ID, weight, row, section);
Console.WriteLine(output);

output = string.Format("The item ID = {0:N} having weight = {1:N} " +
    "is found in row {2:E} and section {3:D}", ID, weight, row, section);
Console.WriteLine(output);

output = string.Format("The item ID = {0:(#####)} having weight = {1:0000.00 lbs} " +
    "is found in row {2} and section {3}", ID, weight, row, section);
Console.WriteLine(output);

The output is as follows:

The item ID = 12345 having weight = 12.3558 is found in row Z and section 1A2C
The item ID = 12,345.00 having weight = 1.235580E+001 is found in row Z and section 1A2C
The item ID = 12,345.00 having weight = 12.36 is found in row Z and section 1A2C
The item ID = (12345) having weight = 0012.36 lbs is found in row Z and section 1A2C

To simplify things, you can drop the call to string.Format and do all of the work in the System.Console.WriteLine method, which calls string.Format internally, as shown here:

Console.WriteLine(@"The item ID = {0,5:G} having weight = {1,10:G} " +
"is found in row {2,-5:G} and section {3,-10:G}",
ID, weight, row, section);

The output of this WriteLine method is:

The item ID = 12345 having weight =    12.3558 is found in row Z     and section 1A2C      

Discussion

The string.Format method allows a wide range of formatting options for string data. The first parameter of this method can be passed a string that may look similar to the following:

"The item ID = {0,5:G}"

The text "The item ID = " will be displayed as is, with no changes. The interesting part of this string is the section enclosed in braces. This section has the following form:

{index, alignment:formatString}

The section can contain the following three parts:

index

A number identifying the zero-based position of the section’s data in the args parameter array. The data is to be formatted accordingly and substituted for this section. This number is required.

alignment

The minimum width, in characters, of the field that holds this data. If the formatted data is shorter than this width, it is padded with spaces. A negative number indicates left justification (spaces are added to the right of the data), and a positive number indicates right justification (spaces are added to the left of the data). This number is optional.

formatString

A string indicating the type of formatting to perform on this data. This section is where most of the formatting information usually resides. Tables 2-2 and 2-3 contain valid formatting codes that can be used here. This part is optional.
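The three parts of a format item can be sketched together as follows. The class and method names are hypothetical, and the sketch assumes the string.Format overload that takes an IFormatProvider, with CultureInfo.InvariantCulture, so that the decimal separator is always a period:

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only.
public static class FormatItemSketch
{
    // Index 0, alignment 8 (right-justified), format string F2.
    public static string RightAligned(double value)
    {
        return string.Format(CultureInfo.InvariantCulture, "[{0,8:F2}]", value);
    }

    // Index 0, alignment -8 (left-justified), format string F2.
    public static string LeftAligned(double value)
    {
        return string.Format(CultureInfo.InvariantCulture, "[{0,-8:F2}]", value);
    }
}
```

For the value 12.3558, RightAligned returns "[   12.36]" and LeftAligned returns "[12.36   ]". The brackets are only there to make the padding visible.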

Table 2-2. The standard formatting strings

Formatting character(s)    Meaning
C or c    Use the currency format. A precision specifier can optionally follow, indicating the number of decimal places to use.
D or d    Use the decimal format for integral types. A precision specifier can optionally follow, which represents the minimum number of digits in the formatted number.
E or e    Use scientific notation. A precision specifier can optionally follow, indicating the number of digits to use after the decimal point.
F or f    Use fixed-point format. A precision specifier can optionally follow, which represents the number of digits to display to the right of the decimal point.
G or g    Use the general format. The number is displayed in its shortest form. A precision specifier can optionally follow, which represents the number of significant digits to display.
N or n    Use the number format. A minus sign is added to the beginning of a negative number, and thousands separators are placed accordingly in the number. A precision specifier can optionally follow, which represents the number of digits to display to the right of the decimal point.
P or p    Use the percent format. The number is converted to a percent representation of itself. A precision specifier can optionally follow, indicating the number of decimal places to use.
R or r    Use the round-trip format. This format allows the number to be formatted to a representation that can be parsed back to its original form by using the Parse method. Any precision specifier is ignored.
X or x    Use the hexadecimal format. The number is converted to its hexadecimal representation. The uppercase X produces a hexadecimal number with all capital letters A through F. The lowercase x produces a hexadecimal number with all lowercase letters a through f. A precision specifier can optionally follow, which represents the minimum number of digits in the formatted number.

 

Table 2-3. Custom formatting strings

Formatting character(s)    Meaning
0    Use the zero placeholder format. If a digit in the original number exists in this position, display that digit. If there is no digit in the original string, display a zero.
#    Use the digit placeholder format. If a digit in the original number exists in this position, display that digit. If there is no digit in the original string, display nothing.
.    Use the decimal point format. The decimal point is matched up with the decimal point in the number that is to be formatted. Formatting to the right of the decimal point operates on the digits to the right of the decimal point in the original number. Formatting to the left of the decimal point operates in the same way.
,    Use the thousands separator format. A thousands separator will be placed after every three digits starting at the decimal point and moving to the left.
%    Use the percentage placeholder format. The original number is multiplied by 100 before being displayed.
E or e    Use the scientific notation format. A precision specifier can optionally follow, indicating the number of digits to use after the decimal point.
\    Use the escape character format. The \ character and the next character after it are grouped into an escape sequence.
Any text within single or double quotes, such as "aa" or 'aa'    Use no formatting; display as is and in the same position in which the text resides in the format string.
;    Used as a section separator between positive, negative, and zero formatting strings.
Any other character    Use no formatting; display as is and in the same position in which it resides in the format string.

In addition to the string.Format and the Console.WriteLine methods, the overloaded ToString instance method of a value type may also use the previous formatting characters in Table 2-3. Using ToString, the code would look like this:

float valueAsFloat = 122.35f;

string valueAsString = valueAsFloat.ToString("[000000.####]");

The valueAsString variable would contain the formatted number contained in valueAsFloat. The formatted number would look like this:

[000122.35]

The ToString overload used here accepts a single string parameter. The string passed to the valueAsFloat.ToString method contains the formatting for the value type plus any extra text that needs to be supplied. Other overloads of ToString also accept an IFormatProvider, which supplies culture-specific formatting information.
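A few of the custom formatting characters from Table 2-3 can be sketched with the ToString overload that takes both a format string and an IFormatProvider. The class and method names are hypothetical, and CultureInfo.InvariantCulture is an assumption made so that the separators and the percent sign do not vary by machine:

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only.
public static class CustomFormatSketch
{
    // Three sections separated by semicolons: positive; negative; zero.
    // The quoted text in the third section is displayed as is.
    public static string Accounting(int value)
    {
        return value.ToString("#,##0;(#,##0);'zero'", CultureInfo.InvariantCulture);
    }

    // The % placeholder multiplies the value by 100 before display.
    public static string Percent(double fraction)
    {
        return fraction.ToString("0.0%", CultureInfo.InvariantCulture);
    }
}
```

With these helpers, Accounting(1234567) returns "1,234,567", Accounting(-1234) returns "(1,234)", Accounting(0) returns "zero", and Percent(0.256) returns "25.6%".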

See Also

See the “String.Format Method,” “Standard Numeric Format Strings,” and “Custom Numeric Format Strings” topics in the MSDN documentation.


2.17 Creating a Delimited String

Problem

You have an array of strings to format as delimited text and possibly to store in a text file.

Solution

Using the static Join method of the string class, the array of strings can be easily joined in as little as one line of code. For example:

string[] infoArray = new string[5] {"11", "12", "Checking", "111", "Savings"};
string delimitedInfo = string.Join(",", infoArray);

This code sets the value of delimitedInfo to the following:

11,12,Checking,111,Savings

Discussion

The Join method concatenates all the strings contained in a string array. Additionally, a specified delimiting character(s) is inserted between each string in the array. This method returns a single string object with the fully joined and delimited text.

Unlike the Split method of the string class, which accepts several delimiting characters at once, the Join method accepts only one delimiter string per call. In order to use different delimiters within a single string of values, subsequent Join operations must be performed on the information until all of the data has been joined together into a single string. For example:

string[] infoArray = new string[4] {"11", "12", "Checking", "Savings"};
string delimitedInfoBegin = string.Join(",", infoArray, 0, 2);
string delimitedInfoEnd = string.Join(",", infoArray, 2, 2);
string[] delimitedInfoTotal = new string[2] {delimitedInfoBegin,
                                             delimitedInfoEnd};
string delimitedInfoFinal = string.Join(":", delimitedInfoTotal);
Console.WriteLine(delimitedInfoFinal);

This code produces the following delimited text:

11,12:Checking,Savings
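The two-level joining shown above can be generalized with a small helper. The JoinRecords method below is hypothetical and is only a sketch of one way to apply string.Join repeatedly; it assumes that the data is already grouped into an array of string arrays:

```csharp
using System;

// Hypothetical helper for illustration only.
public static class DelimitedTextSketch
{
    // Joins the fields of each record with fieldDelimiter, then joins
    // the resulting records with recordDelimiter.
    public static string JoinRecords(string[][] records,
        string fieldDelimiter, string recordDelimiter)
    {
        string[] joinedRecords = new string[records.Length];
        for (int i = 0; i < records.Length; i++)
        {
            joinedRecords[i] = string.Join(fieldDelimiter, records[i]);
        }
        return string.Join(recordDelimiter, joinedRecords);
    }
}
```

Passing the records {"11", "12"} and {"Checking", "Savings"} with "," and ":" as the delimiters returns 11,12:Checking,Savings, the same text that the step-by-step code produces.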

See Also

See the “String.Join Method” topic in the MSDN documentation.


2.18 Extracting Items from a Delimited String

Problem

You have a string, possibly from a text file, which is delimited by one or more characters. You need to retrieve each piece of delimited information as easily as possible.

Solution

Using the Split instance method on the string class, we can place the delimited information into an array in as little as a single line of code. For example:

string delimitedInfo = "100,200,400,3,67";
string[] discreteInfo = delimitedInfo.Split(new char[1] {','});

foreach (string Data in discreteInfo)
    Console.WriteLine(Data);

The string array discreteInfo holds the following values:

100
200
400
3
67

Discussion

The Split method, like most methods in the string class, is simple to use. This method returns a string array with each element containing one discrete piece of the delimited text split on the delimiting character(s).

In the Solution, the string delimitedInfo was comma-delimited. However, it could have been delimited by any type of character or even by more than one character. When there is more than one type of delimiter, use code like the following:

string[] discreteInfo = delimitedInfo.Split(new char[3] {',', ':', ' '});

This line splits the delimitedInfo string whenever one of the three delimiting characters (comma, colon, or space character) is found.
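One consequence of splitting on several delimiters is that two adjacent delimiters, such as a comma followed by a space, produce an empty string between them. The sketch below shows both behaviors. The class and method names are hypothetical, and the StringSplitOptions.RemoveEmptyEntries overload is an assumption about your framework version, since it was added in a .NET release later than the one this recipe was written for:

```csharp
using System;

// Hypothetical helper for illustration only.
public static class SplitSketch
{
    private static readonly char[] Delimiters = { ',', ':', ' ' };

    // Adjacent delimiters yield empty strings in the result.
    public static string[] SplitKeepingEmpty(string text)
    {
        return text.Split(Delimiters);
    }

    // Empty strings between adjacent delimiters are discarded.
    public static string[] SplitDroppingEmpty(string text)
    {
        return text.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

For the input "100, 200:300", SplitKeepingEmpty returns four elements ("100", an empty string, "200", and "300"), while SplitDroppingEmpty returns only the three values.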

The Split method is case-sensitive. To split a string on the letter "a" in a case-insensitive manner, use code like the following:

string[] discreteInfo = delimitedInfo.Split(new char[2] {'a', 'A'});

Now, anytime the letter "a" is encountered, no matter what its case, the Split method views that character as a delimiter.

See Also

See the “String.Split Method” topic in the MSDN documentation.


2.19 Setting the Maximum Number of Characters a String Can Contain

Problem

You want to ensure that the data entered by a user and assigned to a string does not exceed a certain number of characters.

Solution

Use the overloaded constructor of the StringBuilder class, which accepts a maximum capacity. The following code creates a StringBuilder object that has a maximum size of 10 characters:

System.Text.StringBuilder sbMax = new System.Text.StringBuilder(10, 10);
sbMax.Append("123456789");
sbMax.Append("0");

This code creates a StringBuilder object, sbMax, which has a maximum length of 10 characters. Nine characters are appended to this string and then a tenth character is appended without a problem. However, if the next line of code is executed:

sbMax.Append("#");

the length of sbMax goes beyond 10 characters and an ArgumentOutOfRangeException is thrown.

Discussion

The string object is immutable and, as such, does not have a built-in method to prevent its length from going beyond a certain point. Fortunately, the StringBuilder object contains an overloaded constructor that allows the maximum size of its string to be set. The StringBuilder constructor that we are concerned with is defined as follows:

public StringBuilder(int initialCapacity, int maxCapacity)

For most applications, the initialCapacity and maxCapacity can be identical. Setting them to the same value generally gives the best performance, because the internal buffer never has to grow. If these two parameters are not identical, it is critical that they are compatible with each other. Take, for example, the following code:

System.Text.StringBuilder sbMax = new System.Text.StringBuilder(3, 12);
sbMax.Append("1234567890");
sbMax.Append("0");
sbMax.Append("#");

which will throw an ArgumentOutOfRangeException as the final # character is appended. This configuration incorrectly allows a maximum of only 11 characters instead of the 12 indicated.

The following line of code:

System.Text.StringBuilder sbMax = new System.Text.StringBuilder(30, 12);

also throws an ArgumentOutOfRangeException. This time, the initialCapacity parameter is larger than maxCapacity, causing the exception. While you may not be explicitly writing these values for your application, if you are calculating them using some type of expression, you may run into these problems.

To handle an attempt to append characters to the StringBuilder string, forcing it beyond the maximum size, wrap any code to append text to the StringBuilder object in a try-catch block:

try
{
    sbMax.Append("New String");
}
catch (ArgumentOutOfRangeException rangeE)
{
    // Handle overrun here
}

In addition to the Append method, you should also wrap any AppendFormat, Insert, and Replace methods of the StringBuilder object in a try-catch block. Any of these methods can allow characters to be added to the StringBuilder string, potentially causing its length to exceed its maximum specified length.
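As an alternative to catching the exception, the length can be checked before appending. The TryAppend helper below is hypothetical and is only a sketch; it relies on the Length and MaxCapacity properties of StringBuilder and assumes that text is not null:

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only.
public static class BoundedAppendSketch
{
    // Appends text only if the result would still fit within MaxCapacity.
    // Returns false, leaving the builder unchanged, when it would not fit.
    public static bool TryAppend(StringBuilder builder, string text)
    {
        if (builder.Length + text.Length > builder.MaxCapacity)
        {
            return false;
        }
        builder.Append(text);
        return true;
    }
}
```

With a builder created as new StringBuilder(10, 10), calling TryAppend with "123456789" and then "0" succeeds both times, and a further call with "#" returns false without throwing. The same kind of check could be placed in front of AppendFormat, Insert, and Replace calls, although the final length is harder to predict for those methods.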

See Also

See the “StringBuilder.Append Method” topic in the MSDN documentation.


2.20 Iterating Over Each Character in a String

Problem

You need to iterate over each character in a string efficiently in order to examine or process each character.

Solution

C# provides two methods for iterating strings. The first is by using a foreach loop, as follows:

string testStr = "abc123";
foreach (char c in testStr)
{
    Console.WriteLine(c.ToString());
}

This method is quick and easy. Unfortunately, it is somewhat less flexible than the second method, which uses the for loop instead of a foreach loop to iterate over the string. For example:

string testStr = "abc123";
for (int counter = 0; counter < testStr.Length; counter++)
{
    Console.WriteLine(testStr[counter].ToString());
}

Discussion

The foreach loop is simpler and thus less error-prone, but it lacks flexibility. In contrast, the for loop is slightly more complex, but it makes up for that in flexibility.

The for loop method uses the indexer of the string variable testStr to get the character located at the position indicated by the counter loop index. Care must be taken not to run over the bounds of the string when using this type of looping mechanism.

A for loop is flexible enough to change how looping over characters in a string is performed. For example, the loop could be quickly modified to start and end at a specific point in the string by simply changing the initializer and conditional expressions of the for loop. Characters can be skipped by changing the iterator expression to increment the counter variable by more than one. The string can also be iterated in reverse order by changing the for loop expressions, as shown:

for (int counter = testStr.Length - 1; counter >= 0; counter--)
{
    Console.WriteLine(testStr[counter].ToString());
}

The following example uses the same loop to create a string containing the characters of the original string in reverse order:

string revTestStr = "";
for (int counter = testStr.Length - 1; counter >= 0; counter--)
{
    revTestStr += testStr[counter];
}
Console.WriteLine(revTestStr);
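Because strings are immutable, building the reversed string with += creates a new string object on every pass through the loop. As a sketch of one alternative, the hypothetical helper below copies the characters into an array and reverses the array in place with Array.Reverse. Like the loop above, it reverses individual char values, so it is only suitable for text that contains no surrogate pairs:

```csharp
using System;

// Hypothetical helper for illustration only.
public static class StringReverseSketch
{
    // Returns a new string with the characters of text in reverse order.
    public static string Reverse(string text)
    {
        char[] characters = text.ToCharArray();
        Array.Reverse(characters);
        return new string(characters);
    }
}
```

For example, Reverse("abc123") returns "321cba", the same result as the loop.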

It should be noted that each of these methods was compiled using the /optimize compiler option. However, adding or removing this option has very little impact on the resulting IL code.

The compiler optimizes the use of a foreach loop iterating through a vector array—one that starts at zero and has only one dimension. Converting a foreach loop to another type of loop, such as a for loop, may not produce any noticeable increases in performance.


2.21 Improving String Comparison Performance

Problem

Your application consists of many strings that are compared frequently. You have been tasked with improving performance and making more efficient use of resources.

Solution

Use the intern pool to improve resource usage and, in turn, improve performance. The static Intern and IsInterned methods of the string class allow you to use the intern pool. The following static methods make use of the string intern pool:

using System;
using System.Text;

public class InternedStrCls
{
    public static void CreateInternedStr(char[] characters)
    {
        string NonInternedStr = new string(characters);
        String.Intern(NonInternedStr);
    }

    public static void CreateInternedStr(StringBuilder strBldr)
    {
        String.Intern(strBldr.ToString());
    }

    public static void CreateInternedStr(string str)
    {
        String.Intern(str);
    }

    public static void CreateInternedStr(string[] strArray)
    {
        foreach (string s in strArray)
        {
            String.Intern(s);
        }
    }
}

Discussion

The CLR automatically stores all string literals declared in an application in an area of memory called the intern pool. The intern pool contains a unique instance of each string literal found in your code, which allows for more efficient use of resources by not storing multiple copies of strings that contain the same string literal.

Another benefit is speed. When two strings are compared using either the == operator or the Equals instance method of the string class, a test is done to determine whether the two references are the same; if they are not, each string's length is checked; if both strings' lengths are equal, each character is compared individually. If the references could be compared instead of the string contents, much faster string comparisons could be made. String interning makes that possible: it guarantees that the references to equivalent interned string values are the same, eliminating the need for the length and character-by-character checks. This yields better performance in situations where the references to two equal strings would otherwise be different and the length and character-by-character comparisons would have to be made.

Note that the only strings automatically placed in the intern pool are string literals (strings surrounded by double quotes) that appear in your code. Each of the following lines contains the literal "foo" and therefore places the string "foo" into the intern pool:

string s = "foo";

StringBuilder sb = new StringBuilder("foo");

StringBuilder sb = new StringBuilder().Append("foo");

The following lines of code will not place the string "foo" into the intern pool:

char[] ca = new char[3] {'f','o','o'};

StringBuilder sb = new StringBuilder().Append("f").Append("oo");

string s1 = "f"; string s2 = "oo"; string s3 = s1 + s2;

You can programmatically store a new string created by your application in the intern pool using the static string.Intern method. This method returns a string referencing the string literal contained in the intern pool, or, if the string is not found, the string is entered into the intern pool and a reference to this newly pooled string is returned.
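As a minimal sketch of the reference guarantee described above, the following program builds two equal strings at run time, confirms that they are distinct objects, and then shows that string.Intern hands back one shared reference for both. The class name InternDemo and the helper Build are hypothetical names chosen for this illustration; they are not part of the recipe.

```csharp
using System;

public static class InternDemo
{
    // Hypothetical helper: copies the characters into a brand-new string object,
    // so the result is never the pooled instance itself (value must be non-empty).
    public static string Build(string value)
    {
        return new string(value.ToCharArray());
    }

    public static void Main()
    {
        string first = Build("foo");
        string second = Build("foo");

        // Equal contents, but two separate objects on the heap.
        Console.WriteLine(first == second);                        // True
        Console.WriteLine(object.ReferenceEquals(first, second));  // False

        // Intern returns the single pooled reference for this value.
        string pooledFirst = string.Intern(first);
        string pooledSecond = string.Intern(second);
        Console.WriteLine(object.ReferenceEquals(pooledFirst, pooledSecond)); // True
    }
}
```

Note that the value returned by Intern is what carries the guarantee; the variables first and second still refer to their original, non-pooled objects.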

A second method used in string interning is IsInterned. It operates similarly to the Intern method, except that it never adds to the pool: it returns a reference to the string contained in the intern pool, or, if the string is not found, it returns null.

An example of using this method is shown here:

string s1 = "f";
string s2 = "oo";
string s3 = s1 + s2;

if (String.IsInterned(s3) == null)
{
    Console.WriteLine("NULL");
}

However, if we add a call to InternedStrCls.CreateInternedStr before the test, IsInterned returns a non-null string object and nothing is printed:

string s1 = "f";
string s2 = "oo";
string s3 = s1 + s2;

InternedStrCls.CreateInternedStr(s3);   // the added line

if (String.IsInterned(s3) == null)
{
    Console.WriteLine("NULL");
}

The Intern method is useful when you need a reference to the pooled copy of a string, whether or not the string already exists in the intern pool.

The IsInterned method can optimize the comparison of a single string to any string literal or manually interned string. Consider that you need to determine whether a string variable contains any string literal that has been defined in the application. Call the string.IsInterned method with the string variable as the parameter. If null is returned, there is no match in the intern pool, and thus there is no match between the string variable’s value and any string literals:

string s1 = "f";
string s2 = "oo";
string s3 = s1 + s2;

if (String.IsInterned(s3) != null)
{
    // If the string "foo" has been defined in the app and placed
    // into the intern pool, this block of code executes.
}
else
{
    // If the string "foo" has NOT been defined in the app NOR been placed
    // into the intern pool, this block of code executes.
}

Exercise caution when using the string interning methods. Calling the Intern method for every possible string that could be created by your application would actually cause the application’s performance to slow considerably, since this method must search the intern pool for the string on every call and, if the string is not in the pool, add it before returning the pooled reference.

Another potential problem with the IsInterned method in particular stems from the fact that every string literal in the application is stored in this intern pool at the start of the application. If you are using IsInterned to determine whether a string exists, you are comparing that string against all string literals that exist in the application, as well as any you might have explicitly interned, not just the ones in the scope in which IsInterned is used.

See Also

See the “String.Intern Method” and “String.IsInterned Method” topics in the MSDN documentation.

Buy the book!If you've enjoyed what you've seen here, or to get more information, click on the "Buy the book!" graphic. Pick up a copy today!

Visit the O'Reilly Network http://www.oreillynet.com for more online content.

2.22 Improving StringBuilder Performance

Problem

In an attempt to improve string-handling performance, you have converted your code to use the StringBuilder class. However, this change has not improved performance as much as you had hoped.

Solution

The chief advantage of a StringBuilder object over a string object is that it preallocates a default initial amount of memory in an internal buffer in which a string value can expand and contract. When that memory is used up, however, .NET must allocate new memory for this internal buffer. You can reduce the frequency with which this occurs by explicitly setting the capacity of the buffer using either of two techniques. The first approach is to set this value when the StringBuilder class constructor is called. For example, the code:

StringBuilder sb = new StringBuilder(200);

specifies that a StringBuilder object can hold 200 characters before new memory must be allocated.

The second approach is to change the value after the StringBuilder object has been created, using one of the following properties or methods of the StringBuilder object:

sb.Capacity = 200;

sb.EnsureCapacity(200);

Discussion

As noted in previous recipes in this chapter, the string class is immutable; once a string object has been created, its contents cannot be changed in any way. So changing the contents of a string variable entails the creation of a new string containing the modified value. The reference variable of type string must then be changed to reference this newly created string object. The old string object eventually becomes eligible for collection by the garbage collector, and, subsequently, its memory is freed. Because of this behind-the-scenes work, code that performs intensive string manipulations using the string class suffers greatly from having to create new string objects for each string modification, and greater pressure is placed on the garbage collector to remove unused objects from memory more frequently.

The StringBuilder class solves this problem by preallocating an internal buffer to hold a string. The contents of this string buffer are manipulated directly. Operations performed on a StringBuilder object do not carry with them the performance penalty of creating a whole new string or StringBuilder object and, consequently, filling up the managed heap with many unused objects.
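The difference can be observed directly. The following is a small sketch, not a benchmark: it checks object identity to show that appending to a string variable produces a new object, while StringBuilder.Append returns the same instance whose buffer was modified. The class name MutationDemo and both method names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Text;

public static class MutationDemo
{
    // Returns true when appending to a string variable produced a different object.
    // The suffix is assumed to be non-empty.
    public static bool StringAppendCreatesNewObject(string original, string suffix)
    {
        string modified = original;
        modified += suffix;   // allocates a new string; the original is untouched
        return !object.ReferenceEquals(original, modified);
    }

    // Returns true when Append handed back the very same StringBuilder instance.
    public static bool BuilderAppendReusesObject(string original, string suffix)
    {
        StringBuilder sb = new StringBuilder(original);
        StringBuilder afterAppend = sb.Append(suffix);   // modifies the buffer in place
        return object.ReferenceEquals(sb, afterAppend);
    }

    public static void Main()
    {
        Console.WriteLine(StringAppendCreatesNewObject("abc", "d"));  // True
        Console.WriteLine(BuilderAppendReusesObject("abc", "d"));     // True
    }
}
```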

There is one caveat with using the StringBuilder class, which, if not heeded, can impede performance. The StringBuilder class uses a default initial capacity to contain the characters of a string, unless you change this default initial capacity through one of the StringBuilder constructors. Once this space is exceeded, by appending characters, for instance, a new string buffer is allocated double the size of the original buffer. For example, a StringBuilder object with an initial size of 20 characters would be increased to 40 characters, then to 80 characters, and so on. The string contained in the original internal string buffer is then copied to this newly allocated internal string buffer along with any appended or inserted characters.
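To see how the capacity grows on a given runtime, one option is to append a character at a time and record each change in the Capacity property, as in the sketch below. The exact sequence of values is an implementation detail and may differ between .NET versions, so the code makes no assumption about it beyond the starting value. CapacityGrowthDemo and RecordCapacities are hypothetical names used only here.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CapacityGrowthDemo
{
    // Appends one character at a time and records every distinct Capacity value seen.
    public static List<int> RecordCapacities(int initialCapacity, int charsToAppend)
    {
        StringBuilder sb = new StringBuilder(initialCapacity);
        List<int> capacities = new List<int>();
        capacities.Add(sb.Capacity);

        for (int i = 0; i < charsToAppend; i++)
        {
            sb.Append('x');
            if (sb.Capacity != capacities[capacities.Count - 1])
            {
                capacities.Add(sb.Capacity);   // a reallocation (or growth step) happened
            }
        }
        return capacities;
    }

    public static void Main()
    {
        // Start at 20 characters, as in the example above, and append 100.
        foreach (int capacity in RecordCapacities(20, 100))
        {
            Console.WriteLine(capacity);
        }
    }
}
```

Each line printed after the first corresponds to one growth of the internal buffer, so a shorter list for the same workload means fewer reallocations.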

The default capacity for a StringBuilder object is 16 characters; in many cases, this is much too small. To increase this size upon object creation, the StringBuilder class has an overloaded constructor that accepts an integer value to use as the starting size of the preallocated string. Determining an initial size value that is not too large (thereby allocating too much unused space) or too small (thereby incurring a performance penalty for repeatedly reallocating and copying the internal buffer) may seem like more of an art than a science. However, determining the optimal size may prove invaluable when your application is tested for performance.

In cases where good values for the initial size of a StringBuilder object cannot be obtained mathematically, try running the application under a constant load while varying the initial StringBuilder size. When a good initial size is found, try varying the load while keeping this size value constant. You may discover that this value needs to be tweaked to get better performance. Keeping good records of each run, and committing them to a graph, will be invaluable in determining the appropriate number to choose. As an added note, using PerfMon (Administrative Tools ➝ Performance Monitor) to detect and graph the number of garbage collections that occur might also provide useful information in determining whether your StringBuilder initial size is causing too many reallocations of your StringBuilder objects.

The most efficient method of setting the capacity of the StringBuilder object is to set it in the call to its constructor. The overloaded constructors of a StringBuilder object that accept a capacity value are defined as follows:

public StringBuilder(int capacity)
public StringBuilder(string str, int capacity)
public StringBuilder(int capacity, int maxCapacity)
public StringBuilder(string str, int startPos, int length, int capacity)

In addition to the constructor parameters, one property of the StringBuilder object allows its capacity to be increased (or decreased). The Capacity property gets or sets an integer value that determines the capacity of this instance of a StringBuilder object. Note that the Capacity property cannot be set to a value less than the Length property; attempting to do so throws an ArgumentOutOfRangeException.

A second way to change the capacity is through the EnsureCapacity method, which is defined as follows:

public int EnsureCapacity(int capacity)

This method returns the new capacity for this object. If the capacity of the existing object already equals or exceeds the value in the capacity parameter, the existing capacity is retained, and this value is also returned by this method.
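A short sketch of both outcomes follows: one call that requests less than the current capacity and one that requests more. EnsureCapacityDemo and CapacityAfterEnsure are hypothetical names introduced for this illustration.

```csharp
using System;
using System.Text;

public static class EnsureCapacityDemo
{
    // Creates a builder with startCapacity and returns what EnsureCapacity reports
    // when asked for at least 'requested' characters.
    public static int CapacityAfterEnsure(int startCapacity, int requested)
    {
        StringBuilder sb = new StringBuilder(startCapacity);
        return sb.EnsureCapacity(requested);
    }

    public static void Main()
    {
        // Requesting less than the current capacity leaves it unchanged.
        Console.WriteLine(CapacityAfterEnsure(50, 20));    // 50

        // Requesting more grows the buffer to at least the requested size.
        Console.WriteLine(CapacityAfterEnsure(50, 200));   // a value of 200 or more
    }
}
```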

There is one problem with using these last two members. If either of them increases the capacity of the StringBuilder object by even a single character, the internal buffer used to store the string has to be reallocated. However, reducing the capacity of the object does not force the allocation of a new, larger internal string buffer. These members are useful in exceptional cases when the StringBuilder capacity may need an extra boost, so that fewer reallocations are performed in the long run.

The StringBuilder object also contains a Length property, which, if increased, appends spaces to the end of the existing StringBuilder object’s string (note that current .NET documentation specifies padding with the Unicode NULL character, U+0000, instead). If the Length is decreased, characters are truncated from the StringBuilder object’s string. Increasing the Length property can increase the Capacity property, but only as a side effect. If the Length property is increased beyond the size of the Capacity property, the Capacity property value is set to the new value of the Length property. The Length property is set in the same way as the Capacity property:

sb.Length = 200;
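The following sketch shows both directions of a Length change: truncating the contents and extending them past the current capacity. It deliberately does not check which padding character is used, since that depends on the runtime. LengthDemo and Resize are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text;

public static class LengthDemo
{
    // Creates a builder holding 'text', sets its Length, and returns the builder.
    public static StringBuilder Resize(string text, int newLength)
    {
        StringBuilder sb = new StringBuilder(text);
        sb.Length = newLength;   // truncates when smaller, pads when larger
        return sb;
    }

    public static void Main()
    {
        // Decreasing Length truncates the contents.
        Console.WriteLine(Resize("Hello, world", 5).ToString());   // Hello

        // Increasing Length pads the contents and grows Capacity if necessary.
        StringBuilder padded = Resize("Hello", 40);
        Console.WriteLine(padded.Length);                     // 40
        Console.WriteLine(padded.Capacity >= padded.Length);  // True
    }
}
```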

The string and StringBuilder objects are considered nonblittable, which means that they must be marshaled across any managed/unmanaged boundaries in your code. The reason is that strings have multiple ways of being represented in unmanaged code, and there is no one-to-one correlation between these representations in unmanaged and managed code. In contrast, types such as byte, sbyte, short, ushort, int, uint, long, ulong, IntPtr, and UIntPtr are blittable types and do not require conversion between managed and unmanaged code. One-dimensional arrays of these blittable types, as well as structures or classes containing only blittable types, are also considered blittable and do not need extra conversion when passed between managed and unmanaged code.

The string and StringBuilder objects take more time to marshal, due to conversion between managed and unmanaged types. Performance will be improved when calling unmanaged code through P/Invoke methods if only blittable types are used. Consider using a byte array instead of a string or StringBuilder object, if at all possible.

See Also

See the “StringBuilder Class” topic in the MSDN documentation.


2.23 Pruning Characters from the Head and/or Tail of a String

Problem

You have a string with a specific set of characters, such as spaces, tabs, escaped single/double quotes, any type of punctuation character(s), or some other character(s), at the beginning and/or end of a string. You want a simple way to remove these characters.

Solution

Use the Trim, TrimEnd, or TrimStart instance methods of the String class:

string foo = "--TEST--";
Console.WriteLine(foo.Trim(new char[1] {'-'})); // Displays "TEST"

foo = ",-TEST-,-";
Console.WriteLine(foo.Trim(new char[2] {'-',','})); // Displays "TEST"

foo = "--TEST--";
Console.WriteLine(foo.TrimStart(new char[1] {'-'})); // Displays "TEST--"

foo = ",-TEST-,-";
Console.WriteLine(foo.TrimStart(new char[2] {'-',','})); // Displays "TEST-,-"

foo = "--TEST--";
Console.WriteLine(foo.TrimEnd(new char[1] {'-'})); // Displays "--TEST"

foo = ",-TEST-,-";
Console.WriteLine(foo.TrimEnd(new char[2] {'-',','})); // Displays ",-TEST"

Discussion

The Trim method is most often used to eliminate whitespace at the beginning and end of a string. In fact, if you call Trim without any parameters on a string variable, this is exactly what would happen. The Trim method is overloaded to allow you to remove other types of characters from the beginning and end of a string. You can pass in a char[] containing all the characters that you want removed from the beginning and end of a string. Note that if the characters contained in this char[] are located somewhere in the middle of the string, they are not removed.

The TrimStart and TrimEnd methods remove characters at the beginning and end of a string, respectively. Unlike the Trim method, these two methods are not overloaded; each accepts only a char[]. If you pass null into either one of these methods, only whitespace is removed from the beginning or the end of the string.
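Two of the behaviors described above, that characters in the middle of a string are left alone and that passing null trims whitespace, can be checked with a short sketch. TrimDemo and its method names are hypothetical; the explicit (char[]) cast on null is there only to select the char[] overload unambiguously on newer framework versions.

```csharp
using System;

public static class TrimDemo
{
    // Removes dashes from both ends; dashes inside the string are not touched.
    public static string TrimDashes(string s)
    {
        return s.Trim(new char[] { '-' });
    }

    // Passing null as the char[] removes leading whitespace only.
    public static string TrimLeadingWhitespace(string s)
    {
        return s.TrimStart((char[])null);
    }

    public static void Main()
    {
        Console.WriteLine(TrimDashes("--TE-ST--"));                        // TE-ST
        Console.WriteLine("[" + TrimLeadingWhitespace("  \tTEST  ") + "]"); // [TEST  ]
    }
}
```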

See Also

See the “String.Trim Method,” “String.TrimStart Method,” and “String.TrimEnd Method” topics in the MSDN documentation.
