In this second part of the series, you will learn about reference types, especially arrays and strings. You will also learn how reference types work internally. You will be introduced to delegates and classes, too, as reference types, but they will be discussed in detail later.
Each time we declare a variable of a value type (such as an Int32 variable), we know that the variable requires four bytes to store its value. Because the value stored in this variable is a single piece of data, it is stored in-line with the variable (in the variable itself). This is fine, because copying small values is cheap. For example, look at the following program:
using System;

namespace ValueTypes
{
    class Class1
    {
        static void Main(string[] args)
        {
            int x = 45999;
            int y = x; // the copying operation
            Console.WriteLine(x);
            Console.WriteLine(y);
            Console.ReadLine();
        }
    }
}
As you can see, the value of the variable x is copied to the variable y. This happens because we know that any variable of type Int32 is four bytes. This is very different with reference types; when you define a class called Customer you don't know how many bytes you need to allocate for this class, so it's not a fixed size type. In fact, when you declare a variable, the CLR needs to know how many bytes of the memory this variable needs, in order to allocate the space.
So why is it that, though you don't know the size of the Customer class instance, the CLR will allocate space for it? The answer is, because it's a reference type. When you declare a variable of a reference type, the variable itself does not contain the value (unlike the value type); it contains a piece of data called a "reference" to where the value is actually stored.
The value is stored in a large region of memory called the CLR Managed Heap. Working with the Managed Heap is generally more expensive than working with the Stack, because heap objects vary in size and must later be tracked and reclaimed by the runtime. We allocate objects on the Managed Heap and (unlike in C++) we never free them ourselves; the CLR's garbage collector reclaims the memory of objects that are no longer referenced. Let's take a look at the Customer example so you can better understand this concept:
class Customer
{
    public string FirstName;
    public string LastName;
}
For now this is a class that contains two member fields. Copy the following code and compile it.
class Customer
{
    public string FirstName;
    public string LastName;
}
The output of this code is
Michael Youssef
Michael Youssef
We simply created an instance of the Customer class using the new operator. The new operator created an instance of the reference type on the Managed Heap and returned a memory reference to it, so it may return something like 0x456454 and store it in the variable. The reference to the instance is stored in the variable aCustomer; we initialized the member fields with the FirstName Michael and the LastName Youssef. We used the Console.WriteLine() method to print these values to the console.
After that we declared another variable of type Customer, but for this variable we did not use the new operator to create another instance of the Customer class. We assigned the variable aCustomer (which contains the reference to the actual Customer instance) to the second variable AnotherCustomer, and again, we printed the FirstName and LastName of the AnotherCustomer object. As you can see, it's the same as the aCustomer object.
Why? I think it's pretty clear now that we assigned the reference to the memory address of aCustomer object to the variable AnotherCustomer, so both of them refer to the same object. So the assignment is AnotherCustomer = aCustomer; it just copies the aCustomer value (which is a reference to the memory where the actual object is allocated) to the AnotherCustomer variable, it doesn't do anything else. Now both variables refer to the same object in memory. So the instance in this example exists in one place (the Managed Heap) and is referenced by two variables (or as many as you need).
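The program described above can be sketched as follows. This is a minimal reconstruction based on the description, not necessarily the exact listing: the namespace and the class name Class1 are assumptions, and the final ReferenceEquals check is an addition that prints one extra line to confirm that both variables refer to the same instance.

```csharp
using System;

namespace ReferenceTypes
{
    public class Customer
    {
        public string FirstName;
        public string LastName;
    }

    class Class1
    {
        static void Main(string[] args)
        {
            // create one Customer instance on the Managed Heap
            Customer aCustomer = new Customer();
            aCustomer.FirstName = "Michael";
            aCustomer.LastName = "Youssef";
            Console.WriteLine("{0} {1}", aCustomer.FirstName, aCustomer.LastName);

            // copy the reference, not the object
            Customer AnotherCustomer = aCustomer;
            Console.WriteLine("{0} {1}", AnotherCustomer.FirstName, AnotherCustomer.LastName);

            // added check: both variables refer to the same instance
            Console.WriteLine(object.ReferenceEquals(aCustomer, AnotherCustomer)); // True
        }
    }
}
```

Because only the reference is copied, a change made through AnotherCustomer would also be visible through aCustomer.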
A class in C# is much like any other class in other Object Oriented programming languages, but it's encapsulated in one file (unlike C++) with the extension .cs. Although you can write as many classes as you want in that one file, it's not a good programming practice.
A class contains members like constructors, destructors, nested types, events, delegates, constants, indexers, properties, methods and fields. These are the members of a class, so you can define one or more of them. The expression class member refers to a piece of data or an operation that exists in the class, so methods, constructors, destructors and properties represent operations of a given class which we can call functions of the class, too.
In C# all classes are reference types, so they need the new operator in order to be allocated on the Managed Heap. You will learn more about classes and class members in the following article.
C# Delegates
If you are a C++ programmer then you can think of C# delegates as function pointers (but in .NET it's type safe). A delegate is a pointer to a method in a class. We define a delegate class to declare the signature of the methods for this delegate. Windows Forms components use delegates extensively to provide the functionality of events. In order to grasp the concepts of delegates and events, you will need to look at the Generated MSIL code for these types, and this is exactly what we will do later. There is a complete, detailed article about delegates and another one about events coming later in this series, so I won't go into details here.
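As a small preview, here is a minimal sketch of declaring and using a delegate. The names BinaryOperation, Calculator, Add and Multiply are hypothetical and chosen only for illustration; the delegate declaration fixes the signature, and any method with a matching signature can be assigned to it.

```csharp
using System;

namespace Delegates
{
    // hypothetical delegate type: matches any method that takes two ints and returns an int
    public delegate int BinaryOperation(int a, int b);

    public class Calculator
    {
        public static int Add(int a, int b) { return a + b; }
        public static int Multiply(int a, int b) { return a * b; }
    }

    class DelegateTest
    {
        static void Main(string[] args)
        {
            // the delegate instance refers to the Add method
            BinaryOperation op = new BinaryOperation(Calculator.Add);
            Console.WriteLine(op(3, 4)); // 7

            // the same variable can later refer to another matching method
            op = new BinaryOperation(Calculator.Multiply);
            Console.WriteLine(op(3, 4)); // 12
        }
    }
}
```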
C# Arrays
C# arrays have syntax similar to Java arrays and C++ arrays, but they work differently, because every C# array is an object with a richer set of built-in methods and properties than Java or C++ arrays offer. Arrays have a base class, which is System.Array, and we will look at some of the provided methods later in the article. You can create an array of value types and reference types. We said before that value types are allocated on the Stack; it's true, but it needs a little explanation, since we are covering reference types in this section.
Arrays are reference types, so when you declare a new array that contains value types, the CLR will allocate memory space for the array elements on the Managed Heap (despite their being value types). When you read an element into a local variable, its value is copied to the Stack. So, to put it simply, a value type is stored wherever it is declared: on the Stack for a local variable, or inside the containing object on the Managed Heap for an array element or a field. We will see an example of this later on.
You declare an array in C# by placing empty square brackets after the type of the array, followed by the variable name that will hold the reference to the array:
int[] Numbers;
It's different from the syntax used in C++, where you put the square brackets after the variable name. Note that you need to create the array using the new operator in order to use it:
Numbers = new int[6];
This statement simply creates an array of six integers on the Managed Heap, returns the reference, and assigns it to the variable Numbers. You can declare and instantiate the array in one statement:
int[] Numbers = new int[6];
You can initialize the elements of the array within the same declaration statement:
int[] Numbers = {10,29,33,47,51,64};
You can also initialize the array after you instantiate it:
We initialized the array by initializing each element separately. As you can see, you access each element in the array through its index, so to access the first element of the array you write the code Numbers[0], and so on. Also note that .NET arrays are zero-based; this is a feature inherited from the base class System.Array, which you derive from implicitly each time you create an array in C#.
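A minimal sketch of element-by-element initialization, assuming the same six values used in the initializer shown earlier (the class name ElementInit is hypothetical):

```csharp
using System;

namespace Arrays
{
    class ElementInit
    {
        static void Main(string[] args)
        {
            // instantiate first, then initialize each element through its index
            int[] Numbers = new int[6];
            Numbers[0] = 10;
            Numbers[1] = 29;
            Numbers[2] = 33;
            Numbers[3] = 47;
            Numbers[4] = 51;
            Numbers[5] = 64;

            Console.WriteLine(Numbers[0]);                  // first element: 10
            Console.WriteLine(Numbers[Numbers.Length - 1]); // last element: 64
        }
    }
}
```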
When you declare an array of value types, the elements will contain the actual data, but when you declare an array of reference types, the array will contain references to memory addresses where the data is stored. C# provides you with single-dimension arrays, multi-dimension arrays and jagged arrays.
The single-dimension array is the basic array type. It has only one dimension, and it's the most common type that you will be using in your applications. Here's an example:
using System;

namespace Arrays
{
    class SingleDimension
    {
        static void Main(string[] args)
        {
            // creating the array
            int[] Numbers = new int[5];

            // initializing the array
            for(int x = 0; x < Numbers.Length; x++)
            {
                Numbers[x] = (x + x);
            }

            // printing the array elements to the Console
            for(int x = 0; x < Numbers.Length; x++)
            {
                Console.WriteLine("Element number {0} = {1}", x, Numbers[x]);
            }

            Console.ReadLine();
        }
    }
}
The output of the program will be:
Element number 0 = 0
Element number 1 = 2
Element number 2 = 4
Element number 3 = 6
Element number 4 = 8
This is a very simple example of a single-dimension array. We first create an array of five integer elements, then we loop on the elements to initialize them, and then we do another loop to write them to the Console. Note that we use the Length property of the array (inherited from the base class System.Array) to return the number of elements in the array.
Let's extend this example. Instead of printing the elements of the original array, the next example copies the values into a new array using the Clone method, an instance method that every array inherits from the base class System.Array, and prints the copy. Note that this method performs a shallow copy, not a deep copy. In other words, it copies value types exactly as you expect (copying the value of each element to the target array's element), but with reference types it copies the reference itself, not the object that the reference refers to. So, for example, it will copy the reference aCustomer to the new array's element, not the Customer object with its member fields FirstName and LastName.
using System;

namespace Arrays
{
    class SingleDimension
    {
        static void Main(string[] args)
        {
            // creating the array
            int[] Numbers = new int[5];

            // initializing the array
            for(int x = 0; x < Numbers.Length; x++)
            {
                Numbers[x] = (x + x);
            }

            // copying the array into a new array
            int[] NewNumbers = (int[])Numbers.Clone();

            // printing the values of the new array
            for(int x = 0; x < NewNumbers.Length; x++)
            {
                Console.WriteLine("NewNumbers's element number {0} = {1}", x, NewNumbers[x]);
            }

            Console.ReadLine();
        }
    }
}
We used the Clone() method to copy the values into the newly created array NewNumbers. Note that we must cast the result to the target array type, because the return type of the Clone method is object, not int[]. We will discuss type conversion in later articles. The result of this code is:
NewNumbers's element number 0 = 0
NewNumbers's element number 1 = 2
NewNumbers's element number 2 = 4
NewNumbers's element number 3 = 6
NewNumbers's element number 4 = 8
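To make the shallow-copy behaviour concrete for reference types, here is a minimal sketch that clones an array of Customer objects. The class name ShallowCopy and the array names are assumptions made for illustration; the Customer class mirrors the one defined earlier.

```csharp
using System;

namespace Arrays
{
    public class Customer
    {
        public string FirstName;
        public string LastName;
    }

    class ShallowCopy
    {
        static void Main(string[] args)
        {
            Customer aCustomer = new Customer();
            aCustomer.FirstName = "Michael";
            aCustomer.LastName = "Youssef";

            Customer[] Customers = new Customer[] { aCustomer };
            Customer[] NewCustomers = (Customer[])Customers.Clone();

            // the two arrays are different objects...
            Console.WriteLine(object.ReferenceEquals(Customers, NewCustomers));       // False
            // ...but their elements refer to the same Customer instance
            Console.WriteLine(object.ReferenceEquals(Customers[0], NewCustomers[0])); // True

            // a change made through one array is visible through the other
            NewCustomers[0].FirstName = "Mike";
            Console.WriteLine(Customers[0].FirstName); // Mike
        }
    }
}
```

If independent copies of the objects are needed, each element has to be copied explicitly; Clone will not do it.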
Let's create another example that sorts an array. Copy the following code into a file with the .cs extension, then compile it:
            foreach(int i in Numbers)
            {
                Console.WriteLine(i);
            }
            Console.ReadLine();
        }
    }
}
We have used the Array.Sort() method that accepts an array and sorts it; after that we used the foreach structure, which we will discuss later in the series, to iterate through the array and print the value of the elements.
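A minimal, complete sketch of the sorting technique described above. The class name SortExample and the sample values are assumptions chosen for illustration.

```csharp
using System;

namespace Arrays
{
    class SortExample
    {
        static void Main(string[] args)
        {
            // sample values (assumed), deliberately out of order
            int[] Numbers = { 47, 10, 64, 29, 51, 33 };

            // Array.Sort sorts the array in place, in ascending order
            Array.Sort(Numbers);

            // prints 10, 29, 33, 47, 51, 64 on separate lines
            foreach (int i in Numbers)
            {
                Console.WriteLine(i);
            }
        }
    }
}
```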
A single-dimension array is just a simple sequence of elements, but multi-dimension arrays extend to complex sequences. This complexity involves walking in more than one dimension. For example, you can have two dimensions in your array (such as x and y) to represent screen coordinates, so you can go to point "5, 6" or "100, 400" and so on. You specify a multi-dimension array by using a comma inside the square brackets like this: int[,]. If you want a three dimension array, it looks like this: int[,,]. Sometimes multi-dimension arrays are called rectangular arrays, because all the dimensions are fixed, unlike jagged arrays, which are arrays of arrays, as we will discuss soon. The following example illustrates the use of Multi-Dimension arrays:
Note that we have used the method GetLength() to return the length of each dimension. Dimensions are also zero-based. The Length property returns the total number of elements in all dimensions, so we didn't use it in this example.
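A minimal sketch of a two-dimension array that uses GetLength() for each dimension. The class name MultiDimension, the array name Grid and the stored values are assumptions made for illustration.

```csharp
using System;

namespace Arrays
{
    class MultiDimension
    {
        static void Main(string[] args)
        {
            // a rectangular array with 2 rows and 3 columns
            int[,] Grid = new int[2, 3];

            // GetLength(0) is the size of the first dimension, GetLength(1) of the second
            for (int row = 0; row < Grid.GetLength(0); row++)
            {
                for (int col = 0; col < Grid.GetLength(1); col++)
                {
                    Grid[row, col] = row * 10 + col;
                    Console.WriteLine("Grid[{0},{1}] = {2}", row, col, Grid[row, col]);
                }
            }

            Console.WriteLine(Grid.Length); // total number of elements: 6
            Console.WriteLine(Grid.Rank);   // number of dimensions: 2
        }
    }
}
```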
Jagged Arrays
This is the most complex type of array you will define; fortunately, it's very uncommon to define jagged arrays, but we need to discuss it for the sake of completeness. A jagged array is simply an array of arrays. In other words, it resembles a multi-dimension array, but each inner array may have a different number of elements from the others. You declare a jagged array like this:
int[][] Players = new int[3][];
This simply means that we have an array called Players that contains three elements. Each element represents an array of its own that can vary in the number of elements. To initialize the elements (the arrays) of the jagged array, you can write the following code:
int[][] Players = new int[3][];
Players[0] = new int[4] {1,2,9,10};
Players[1] = new int[6] {23,12,34,35,65,14};
Players[2] = new int[11] {1,4,5,6,7,9,10,25,12,15,3};
As you can see, it's an array of arrays. In our Players jagged array we can store the players' numbers of different games, and because each game contains a different number of players, the jagged array is a perfect solution with which to store them all.
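A minimal sketch of walking the Players jagged array. The class name JaggedExample and the output wording are assumptions; the point is that the outer Length gives the number of inner arrays, and each inner array reports its own Length.

```csharp
using System;

namespace Arrays
{
    class JaggedExample
    {
        static void Main(string[] args)
        {
            int[][] Players = new int[3][];
            Players[0] = new int[] { 1, 2, 9, 10 };
            Players[1] = new int[] { 23, 12, 34, 35, 65, 14 };
            Players[2] = new int[] { 1, 4, 5, 6, 7, 9, 10, 25, 12, 15, 3 };

            // Players.Length is 3; the inner lengths are 4, 6 and 11
            for (int game = 0; game < Players.Length; game++)
            {
                Console.WriteLine("Game {0} has {1} players", game, Players[game].Length);
            }
        }
    }
}
```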
C# lists the string data type as primitive, although it's a reference type, not a value type as you might think. In this section we will look at C# strings and some of their related operations.
You initialize a string variable in the same way as you initialize a variable of any other primitive data type, like the following:
int x = 34;
string name = "Michael Youssef";
Note that int and string are both C# keywords, and they alias .NET Framework types. The int keyword is an alias for the structure System.Int32, and the string keyword is an alias for the class System.String, so the keyword and the class name are interchangeable. The System.String class provides us with methods for replacing, adding and inserting characters, and we will see an example soon. You can't use the new operator with a string literal to create a string object, as in the following:
string name = new string("Michael Youssef");
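On the .NET Framework this generates a compile-time error, because System.String has no constructor that takes a string argument.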
String literals are enclosed in double quotation marks, and C# has the same escape sequence characters as C and C++, so a backslash must be preceded by another backslash, as in the following example:
string path = "c:\\My Documents";
You can prefix the literal with the @ character to create a verbatim string literal, which tells the C# compiler that you don't want escape sequence processing in this string:
string path = @"c:\My Documents";
Strings in C# are immutable: they can't be changed once created. The following code therefore creates a new string instance and stores a reference to it in the variable; it's not the same object:
string test = "Michael";
test += " Youssef";
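A minimal sketch that makes the new instance visible. The extra variable original and the class name Immutability are assumptions added for illustration; original keeps a reference to the first string so the two objects can be compared.

```csharp
using System;

namespace Strings
{
    class Immutability
    {
        static void Main(string[] args)
        {
            string test = "Michael";
            string original = test;   // second reference to the same string object

            test += " Youssef";       // creates a new string; the old one is untouched

            Console.WriteLine(original);                               // Michael
            Console.WriteLine(test);                                   // Michael Youssef
            Console.WriteLine(object.ReferenceEquals(original, test)); // False
        }
    }
}
```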
You can concatenate strings using the C# + operator:
string name = "Michael" + " " + "Youssef";
or the following:
string firstName = "Michael ";
string lastName = "Youssef";
string name = firstName + lastName;
There's an interesting issue you should know about at this point, which is the difference between concatenating string literals and concatenating string variables. When you concatenate string literals, the C# compiler evaluates the expression at compile time and stores the full concatenated string in the assembly's metadata. When you concatenate string variables, the concatenation is performed at run time. If you concatenate many strings, for example inside a loop, don't use the + operator, because each concatenation creates another string object on the Managed Heap, which slows the process. Instead, use the class System.Text.StringBuilder. Let's look at some examples of using strings in our applications.
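A minimal sketch of building a string with System.Text.StringBuilder instead of repeated + concatenation. The class name BuilderExample and the sample parts are assumptions made for illustration.

```csharp
using System;
using System.Text;

namespace Strings
{
    class BuilderExample
    {
        static void Main(string[] args)
        {
            // sample pieces (assumed) that would otherwise be joined with +
            string[] parts = { "Michael", " ", "Youssef" };

            // the builder appends into an internal buffer instead of
            // creating a new string object for every concatenation
            StringBuilder builder = new StringBuilder();
            foreach (string part in parts)
            {
                builder.Append(part);
            }

            // a single string is created at the end
            string name = builder.ToString();
            Console.WriteLine(name); // Michael Youssef
        }
    }
}
```

For a handful of concatenations the + operator is fine; the builder pays off when the number of pieces is large or unknown in advance.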
Suppose that we have a delimited text file, or a string to make it simpler, and we need to remove the commas and return the text. The System.String class provides an instance method called Split, which takes an array of type Char (Unicode character) listing the delimiters and returns an array of strings holding the pieces of text between them. In our case this array contains only the ',' Char, and Split returns the names in a string array, so let's get to the code:
            foreach(string x in names)
            {
                Console.WriteLine(x);
            }
            Console.ReadLine();
        }
    }
}
The result:
Michael Youssef John Gary Gerry
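A minimal, complete sketch of the Split technique. The class name SplitExample and the input text are assumptions chosen for illustration, so the output differs from the result shown above.

```csharp
using System;

namespace Strings
{
    class SplitExample
    {
        static void Main(string[] args)
        {
            // sample comma-delimited input (assumed)
            string text = "Michael,John,Gary,Gerry";

            // Split is called on the string instance and receives the delimiter characters
            string[] names = text.Split(new char[] { ',' });

            // prints Michael, John, Gary and Gerry on separate lines
            foreach (string x in names)
            {
                Console.WriteLine(x);
            }
        }
    }
}
```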
Sometimes you will need to replace strings; for example, you might need to replace all occurrences of "Mick" with "Michael". Let's see how to do that:
using System;

namespace Arrays
{
    class SingleDimension
    {
        static void Main(string[] args)
        {
            // base is a reserved C# keyword, so the variable is named baseText
            string baseText = "Hey Mick, how are you Mick?";
            string replacement = "Michael";
            baseText = baseText.Replace("Mick", replacement);
            Console.WriteLine(baseText);
            Console.ReadLine();
        }
    }
}

As you can see, the code creates two strings. One is the replacement string and the other is the base string. The Replace method doesn't modify the original string; it returns a new object, so we need to assign the new object to the variable baseText again. All this is possible because every string is an instance of System.String, which provides many methods, as you saw in the previous examples.
The System.String class contains a method called Substring, which actually returns a substring of a given string. Let's write some code using it:
using System;

namespace Arrays
{
    class SingleDimension
    {
        static void Main(string[] args)
        {
            string x = "the car is BMW";
            string car = x.Substring(11,3);
            Console.WriteLine(car);
            Console.ReadLine();
        }
    }
}
The result is:
BMW
The Substring method takes two integer values: the first indicates the starting position, which is zero based, and the second tells how many characters to return to the new string.
Sometimes you need to use a value type as a reference type, perhaps to pass it as an argument to a method parameter of type System.Object, and in many other scenarios. When you convert a value type instance to a reference type instance, it's a boxing operation; when you convert a reference type instance to a value type instance it's called an unboxing operation. Actually C# implicitly performs the boxing operation on your behalf, but you will need to explicitly perform the unboxing operation.
To explain the boxing process I will write a simple program that declares a variable of type int and assigns the value 91 to this variable. It will also declare another variable of type System.Object (but I will use the C# keyword object, which is an alias for System.Object in the same way that string is an alias for System.String). Then I will assign the int variable to the object reference, and this is exactly where the boxing operation happens. After that we will look at the MSIL code and see the IL instruction box that does the magic for us. Compile the following code, and then use the ILDASM tool to load the generated MSIL code as we have done in previous articles:
using System;

namespace Boxing
{
    class BoxTest
    {
        static void Main(string[] args)
        {
            int x = 91;
            object xObject = x; // boxing happens here
        }
    }
}
Load the MSIL code and the namespace Boxing will be shown, as in the following figure (expand the hierarchy until you get to the Main method):
Double click on the Main method and you will get the following MSIL code:
Okay that's good; the next step is to explain how the boxing process works, and then we will talk about the unboxing operation.
The boxing operation begins with allocating memory on the Managed Heap (for the value type being boxed). The memory size for this boxed object is the size of the value type plus the memory the CLR needs for its internal bookkeeping (such as the object header and the pointer to the type's method table), which every reference type instance carries. Then the value of the value type is copied to this memory space on the Heap, and a reference (memory address) is returned and stored on the Stack. Now you have a boxed value type. This happens automatically. Let's take a look at the MSIL code; I'm sure that you will like it.
The first thing to notice is the .locals directive. It tells you that we have two local variables for the Main method, one of type int32 and the other of type object. The first instruction, IL_0000: ldc.i4.s 91, pushes the constant 91 onto the evaluation stack; i4 means a four-byte integer, and the .s suffix is the short form that encodes the constant in a single byte. The value 91 is then popped from the stack and stored in our local variable (x) by the instruction stloc.0. Note that the zero here means the first local variable, which is x in our example.
Now that the value is stored in x, the instruction ldloc.0 pushes it back onto the evaluation stack so that we can use the box instruction. The box instruction converts the value type, which in our example is System.Int32, to a reference type. The stloc.1 is the most interesting instruction in this example. As you know, the box instruction creates the object on the Managed Heap and returns a reference to that object. The instruction stloc.1 pops the value returned by box (which in this case is a reference to the newly created object on the Managed Heap) from the stack and stores it in the second local variable, which is xObject; now we have a reference to the object.
Unboxing Operation
I will extend the boxing example to explain the unboxing process. Copy the following code and compile it:
using System;

namespace Boxing
{
    class BoxTest
    {
        static void Main(string[] args)
        {
            int x = 91;
            object xObject = x;    // boxing happens here
            int y = (int) xObject; // unboxing the xObject into a value type
        }
    }
}
Load the file with the ILDASM tool and expand the namespace, then double click on the Main method to get the MSIL code:
The first thing to notice about the C# code is the cast operation. We didn't perform casting with boxing because the CLR knows what to do, and it allocates the memory needed to contain the value of the value type. With unboxing, we must explicitly perform the cast operation back to the right type (in our example back to int) because object is a general type and it can be anything. The C# compiler needs to know exactly what type is needed for the unboxing operation, so the cast must be valid.
Look at the .locals and you will see that there is a third variable (y) of type int that we used in the example for the unboxing. We will begin from the Instruction IL_000a: ldloc.1. This Instruction pushes or loads the value on the stack, and again 1 means the second variable which is the xObject. The unbox Instruction is used to convert the value on the stack from a reference type to a value type. Note that the value here is a reference (memory address) and the runtime will take care of converting the actual data (what the reference refers to) to a value type.
The last instruction, stloc.2, pops the value from the stack and stores it in the variable y, and the Main method returns. The unbox instruction checks that the object reference (in our example xObject) refers to a valid object on the Managed Heap, and that this object is a boxed instance of the value type named in the cast. If the type check fails, an InvalidCastException is thrown; if the reference is null, a NullReferenceException is thrown instead. Note also that the boxed value is an independent copy, and changes to the original value type don't affect it; you can simply copy the boxed value back into a value type using the cast operator.
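The two points above, the independent copy and the type check, can be sketched as follows. The class name UnboxTest and the variable wrong are assumptions added for illustration.

```csharp
using System;

namespace Boxing
{
    class UnboxTest
    {
        static void Main(string[] args)
        {
            int x = 91;
            object xObject = x;     // boxing: the value 91 is copied to the Managed Heap

            x = 100;                // changing x does not affect the boxed copy
            int y = (int)xObject;   // unboxing back to the exact original type
            Console.WriteLine(y);   // 91

            try
            {
                // the box holds an int, so unboxing it as a long fails at run time
                long wrong = (long)xObject;
                Console.WriteLine(wrong);
            }
            catch (InvalidCastException)
            {
                Console.WriteLine("InvalidCastException");
            }
        }
    }
}
```

To obtain a long from the boxed int, unbox to int first and then convert: (long)(int)xObject.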