In this second part to a two-part article on CIL, we'll take the C# program with which we started and convert it into CIL. This will involve several steps, which we'll describe and discuss in full.
In the previous article, we looked at a C# program that finds prime numbers between two and one million. We started to convert it to CIL, recreating the class and method involved in the program, and then we took a look at method calls and the concept of a stack. Finally, we diverged a bit from the main task (converting the C# program) in order to create a simple “hello world” program in CIL, which we then assembled and executed.
In this final part, we'll complete the conversion of our C# prime number program into CIL. This will involve learning local variables, mathematical operations and loops in CIL, which we'll all take step-by-step.
For convenience, here is the C# program from the last article, in its entirety:
using System;
publicclassFindPrimes
{
publicstaticvoid Main(string[] args)
{
for (int i = 2; i <= 1000000; i++)
{
bool prime = true;
int limit = (int)Math.Sqrt(i);
for (int f = 2; f <= limit; f++)
{
if (i % f == 0)
{
prime = false;
break;
}
}
if (prime)
{
Console.WriteLine(i);
}
}
Console.WriteLine("Done!");
}
}
In addition, here is the converted CIL program so far:
Before moving on to local variables, remember that we should state the maximum number of spaces on the stack that we'll require. We'll require two spaces for our program:
.maxstack 2
While the program will function if the number is set to more than is necessary, setting it to less than is necessary causes the program to crash. Alternatively, if we don't specify a value at all, the assembler will choose a default value.
Recall that we also need to manually call for a return at the end of the method, unlike in C#:
ret
Otherwise, the program will crash.
Moving on to local variables, from looking at the source of the C# version of the prime number program, we can see that four local variables are defined: i (the number being tested for primality), prime (a boolean value indicating whether or not the number is prime), limit (the square root of the number, up to which we check for factors – think of factors in pairs, one of the pair being less than the square root and one being greater than the square root, or the square root itself being a factor) and f (a possible factor, between two and the square root of the number).
In CIL, local variables aren't defined as they are in C#. Instead of defining them as we need them at various parts of the method, we need to define them all at once near the top of the method. Let's do that now:
.locals init (int32 i, int32 limit, int32 f)
This method of defining variables is indeed different, but it shouldn't seem too scary. We simply give each variable a name and a type. Also, notice how we have not included our prime variable. In C#, we need it in order to mark a number as not prime when breaking out of the loop (otherwise, how could we distinguish between an early break and a complete run through the loop?). However, such a device is unnecessary with CIL, and soon we'll see why.
Now comes the difficult part of our program. In the C# version of our program, we quickly put together two for loops nested within each other. In CIL, however, this becomes difficult due to one reason: loops don't exist. Instead, we must manually guide the flow of the program though labels and branches. For one final time, let's break from our prime number program example in order to study flow control in CIL. To do this, let's first look at a simple C# program that prints out the numbers one through ten:
using System;
classCount
{
publicstaticvoid Main(string[] args)
{
for (int i = 1; i <= 10; i++)
{
Console.WriteLine("Number: {0}", i);
}
Console.WriteLine("Done!");
}
}
Let's break down the loop and examine what exactly the program is doing. Yes, it essentially displays the numbers one through ten, but how exactly does it do that? First, an integer variable is defined, which will act as a counter. Then, the value of this counter is checked to see if it has passed ten. If it hasn't, the number is printed. After it's printed, it's incremented, and after that, we check the value again. Then the value is printed, and this same process of checking, printing and incrementing is repeated. So, we can now break it down into a series of steps:
Assign the initial value to i.
Check to see if the value is less than 10. If it is, go to step 3. If it isn't, skip past the loop.
Print out the number.
Increment i and go to step 2.
Our task is to replicate this behavior in CIL using labels. Let's divide the above steps into sections for our program:
Initialization
Condition
Contents
Incrementation
We have now broken down the behavior of the loop into four words. Now we can begin representing this behavior in CIL. Let's first set up the skeleton of our program:
.assembly extern mscorlib {}
.assembly LoopTest {}
.class public LoopTest
{
.method public static void Main(string[] args)
{
.entrypoint
.maxstack 2
.locals init (int32 I)
ret
}
}
Notice how we define the local variable that we will be using as a counter. Now we need to move on to the first step in our list, which is initialization. This is simple. We need to assign the value one to our variable. This is done by pushing the number one onto the stack and then storing the value at the top of the stack (which will be the number, of course):
ldc.i4.1
stloc i
Don't let the short, seemingly meaningless words scare you. The first line simply puts the number on the stack, and the second line takes the value off of the stack and stores it. Yes, it really is necessary here to go through the stack.
The second step is to check the value of our variable. Notice how we need to repeat this step more than once. Because we need to return to this step, we need to give it a label that we can jump to when appropriate:
loop_condition:
That takes care of the label. Now we need to compare the value of our counter to the number ten. To do this, we have to push both values onto the stack. Again, with CIL, we must work through the stack:
ldloc i
ldc.i4 10
The only thing left is to compare them. If the counter is greater than ten, then we have to jump to the end of the loop. Because we're changing location within the method here, we have to use another label. We'll define this label at the end of everything, but here is how to jump to it:
bgt loop_done
Notice the term “bgt.” Read this as “branch if greater than.” The whole line can be read as “if the one number is greater than the other number, then branch to the label 'loop_done'.” So, here's what we have for the first step:
Now it's time to move on to the second step. If the flow of the program isn't interrupted by branching in the previous step, then the flow of the program will continue to this step, which is, in the program, located immediately after.
The third step in our list is the performance of whatever is in the body of the loop. So, we simply need to print out the value of the counter. This part isn't hard, and we did something like it in the last part:
Returning to our main task, we now need to recreate the loops of our program in CIL. Here's the body of our method:
for (int i = 2; i <= 1000000; i++)
{
bool prime = true;
int limit = (int)Math.Sqrt(i);
for (int f = 2; f <= limit; f++)
{
if (i % f == 0)
{
prime = false;
break;
}
}
if (prime)
{
Console.WriteLine(i);
}
}
Console.WriteLine("Done!");
As you can see here, we have two loops, one nested in the other. Let's first tackle the outer loop, which is really quite simple. This is done in exactly the same way as our last example, with one exception. Remember how we eliminated the boolean variable? This is because rather than being forced to continue the rest of the outer loop after breaking from the inner one, we can instead jump directly to the next iteration of the outer loop. This is done by jumping straight to the incrementation stage. So, this stage will not require a label (if this isn't clear now, it will become clear later):
If you understood the last example, then this part shouldn't be strange at all. We're doing the exact same thing here, with the exception of an added label.
Now we move on to the inner loop. The first thing we need to do is initialize everything. We'll need to assign a value of two to f, and we'll also need to assign the square root of i to limit. Let's tackle the former part first:
ldc.i4.2
stloc f
Now comes the harder part.
int limit = (int)Math.Sqrt(i);
There are two things that are important here. First, Sqrt takes a double. Second, Sqrt returns a double. Because of this, we need to do a little typecasting. We first need to push i onto the stack:
ldloc i
Now we need to convert it to a double and call Sqrt:
conv.r8
call float64 [mscorlib]System.Math::Sqrt(float64)
Finally, we need to convert it back to an integer and then store the result back in i:
conv.i4
stloc limit
Initialization is complete, and now it's time to move on to the condition part:
inner_condition:
ldloc f
ldloc limit
bgt inner_done
This should look very familiar, so we'll move on without further explanation. Inside the inner loop, we divide i by f and check for a remainder. As with addition, this is not very hard in CIL. We simply have to push the two numbers onto the stack and then perform the operation:
ldloc i
ldloc f
rem
The result (the remainder) is stored on the stack. If the number is not prime, then we'll eventually have a remainder of zero. So, we need to check for this, and if the remainder is indeed ever zero, then we need to jump out of the inner loop. Where do we jump to? Recall how we gave the outer loop's counter a label. If we jump here, then we totally bypass the rest of the inner loop. The value of i is then incremented, and we move into an iteration. This is what we want, and so we need to push zero onto the stack for comparison (remember that the remainder is already on the stack):
ldc.i4.0
Now we simply branch to the outer loop's counter if they are equal (“branch if equal” or “beq”):
beq outer_counter
If we haven't branched, we need to increment f and go back to the condition section:
ldc.i4.1
ldloc f
add
stloc f
br.s inner_condition
If the condition is false, then the loop is done, and the number is prime. We now need to print it out:
Yes, CIL is quite complicated, but is it worth knowing a little bit about? Absolutely. You should now know a little bit more about what happens to your C# (or Visual Basic, or whatever .NET language you prefer) program after you hit the compile button in Visual Studio. The source, no matter what language it may be in, is translated to CIL and then assembled.
This common language ties every .NET language together and enables .NET to support a diversity of languages. In any case, after recreating a seemingly simple program in CIL, you should have an appreciation of what languages such as C# and Visual Basic do for you, and for the .NET platform as a whole.