When you sit down to implement any theory regardless of its simplicity, soon you will find that it is harder than it seems. All sorts of details will need to be dealt with that you may not have thought about while coming up with the solution to the problem in theory. In the following article I will implement a CRC-generating code. So stick with me if you are interested.
Contributed by Gabor Bernat Rating: / 6 November 09, 2009
Last time I presented the theory behind generating a checksum from a sequence of ones and zeros and from a generator polynomial. If you missed it and you do not understand the basic concepts of this technique, please stop reading this article and seek out that one, entitled Cyclic Redundancy Check.
I will present the software and hardware solution. Let's start with the software. I will try to implement all this as generally as possible, using only the C language. This will give you the option to use the code within some microcontrollers, with nice C++ GUIs as well. Nevertheless, before I get down to some concrete C code, we need to look into a few implementation issues and define some rules that will help us make fast, general code.
We will use a registry that has the length of the check sum width. After this we will push in at one end the input data binary sequence and make the operation with the generator polynomial in the registry. The generator polynomial, in fact, is also a binary number that has the length of the checksum generated.
Because the first bit is one for every generator polynomial, we will store only the further bits. This will lead to the fact that by using a 2 byte register, for instance, we can use a 17-width generator polynomial, and as a result, detect burst error with a length of 17.
If we append to our input data zeros of the count of the registry width, this will not change our end result (as I explained in my earlier article), and it will make it certain that, once we shift the last bit into our registry, we are done, and we no longer need to make a partial shift operation in the registry.
As for the algorithm, first we could implement the method I presented in my theory article: division, the way we learned it from primary school. Then again, that would be awfully slow for a large amount of data, because it implies a shift operation for every bit of the input, and afterward a subtraction (which is an XOR bitwise operation).
Using this approach, we will need to shift our data bit by bit into our register from the right side, and once one bit makes it into the left side, just subtract the generator polynomial. Because we've already thrown away the first bit of the generator polynomial, we can also throw away the first bit from the register, make an additional shift and only afterward make an xor operation with our generator polynomial's stored bits.
Let's write the code. In an effort to make it flexible enough to use multiple generator polynomials, we will make some definitions that will tell our algorithm what traits our current polynomial possesses.
typedef unsigned char crc;
/* (1,0) -> most hardware/ parity bit*/
#define CRC_NAME "CRC-1"
#define POLYNOMIAL 0x01
#define CRC_SIZE 1
#define INITIAL_REMAINDER 0xFF
#define FINAL_XOR_VALUE 0x00
#define REFLECT_DATA FALSE
#define REFLECT_REMAINDER FALSE
#define CHECK_VALUE 0x01
// For the data stream of "12345789"
This is the definition for the parity checksum that counts the number of ones in our input package and tells us if it is odd or even. The crc is a type within which our checksum must fit; a single byte will suffice for us. The CRC_NAME is there to help you print out what polynomial you are using.
Inside the POLYNOMIAL is coded our generator in the binary way, with the highest bit already thrown out. The CRC_SIZE tells you the length of the generator polynomial minus one and the width of our registry used. The following two variables further bulletproof our algorithm. For instance, imagine that you initialize your register with just zeros. Now if your input package contains a sequence of only zeros, the register will not observe it until the first one arrives.
This could be a problem, because in real life applications you could be mistaken about not having input or even just receiving the wrong data with a lot of zeros at the start. By setting a starting value of INITIAL_REMAINDER inside our registry (usually only bits of ones) we can deflect this issue. The FINAL_REMAINDER is there for the same reason; in that case, the problem happens at the other side.
Moreover, if we do the same at the sender side and the receiver side due to the modulo two arithmetic operation's traits, we will have the same result. Therefore we still only need to check at the end that the checksum to our received package is zero. The reflected data and remainder has its origins in the UARTs.
Most communication protocols transfer the data using the Universal Asynchronous ReceiverTransmitter. This performs a serial IO, transmitting each byte with the least significant bit first and the most significant bit last. Hardware CRCs work with the flipped/reflected byte. If we make a checksum or use codes like this to counter it, we have two options.
First, we can reflect the data package. However, this can take a long time, and a flip operation would be long and troublesome. We can achieve the same result by flipping everything else: the initial remainder, final remainder, and the CRC generator polynomial. The checksum will also be reflected. Curiously if the initial generator polynomial is good, the flipped one will also be good.
Finally the check value is what our algorithm should generate to the input data of "12345789." Using this you can observe whether or not everything is okay and working. Because we do not want the reflection part to mess up our algorithm, we will use the following macros that will take care of the flip if required.
reflect(unsigned long data, const unsigned char dataSize)
{
unsigned long reflection = 0x00000000; // Initial value
unsigned char curBitPos;
// Reflect the data about the center curBitPos.
for (curBitPos = 0; curBitPos < dataSize; ++curBitPos)
{
//If the LSB bit is set, set the reflection of it inside //the reflection. We just or a pushed in 1 value. In case //of zero just push further the input data
Now with this much knowledge you could already write the code snippet yourself. I will not present this here on the site; however, at the end of the article I will attach my source code, and you are free to examine it.
Instead of this slow operation we will use to our advantage the fact that computers have memory, and we can store the result of multiple operations done beforehand, prior to our program. This approach is the table drive solution, and tries to speed up shift operations by performing multiple operations at once.
During our previous approach, a one-bit-at-a-time shift was required. What if we could shift a byte at a time? Happily for us, modulo two arithmetic operations will help us again. Most of the time we use CRCs for 8, 16 and 32 bit registers. Also, if we XOR a piece of data with another one, shift it, and do this multiple times, it is the same as if we shifted at once with the same length and just XOR it with custom data. For example:
Shift( Shift( (Shift Data XOR alfa) ) )+XOR alfa = Shift by three and XOR beta
Of course, this beta is different for each input data, given a custom "alfa" value. In our case the "alfa" is the generator polynomial and the data is the content of our registry. If we make our registry a byte long, that will mean that we can put 256 different values into it. Now we only need to generate a table that contains that for the register with content x (a number from 0 t0 255); the old school method will generate that.
Afterward we can shift our data by a byte, and then XOR it with this custom value. You could say that this way we will discard the bits that we will shift in from our input data, because while we generate our table, we only shift in zeros. Nevertheless, this is not an issue, because if we subtract too little with the XOR of the new value, this will come back at the next XOR, and if we subtract too many that will also come back, because in the modulo two world there is no such thing as negative numbers.
While the content of the registry will be different from the basic approach during the run time of the algorithm, in the end it will generate the same result. The code snippet generating the table is:
printf("Success to open and start write to: %sn", filePosPlace);
#endif
/*
* Compute the remainder of each possible dividend. Therefore first generate all the possible curBitPos combinations.*/
for (dividend = 0; dividend < 256; ++dividend)
{
/* Start with the dividend followed by zeros. Push the dividend to the start of the crc byte.*/
remainder =(crc) dividend << (WIDTH - 8);
/*Perform modulo-2 division, a curBitPos at a time.*/
for (curBitPos = 8; curBitPos > 0; --curBitPos)
{
/* Try to divide the current byte (check if at the correct position there is a 1 bit). If so discard first bit and subtract, otherwise shift and try again.*/
I have also implemented a printing mechanism to a custom file of the table. This is useful because there is a different table for each type of generator polynomial. Due to this, you need to rerun this function whenever you change the generator polynomial. If you are implementing this in a microprocessor to speed up your device, you would want to save this into ROM. Using this, you can copy the binary table from the file and compile it as a constant.
And here it is the concrete implementation, heavily commented to make it as clear as possible:
crc crcBytes
(const unsigned char message[], const int sizeInBytes)
{
// The initial reminder has the purpose to assure that we // detect zeros at the start of the input message
crc remainder = INITIAL_REMAINDER;
unsigned char lookup_table_pos;
int curPos = 0;
/* Check for correct input */
assert(sizeInBytes>0 && message);
/* Divide the message by the polynomial, a byte at a time. Use the table to avoid the multiple shift and XOR operations and replace it with a single one of each of these.*/
for (; curPos < sizeInBytes; ++curPos)
{
/* First generate the new first byte item (XOR first byte of message with first byte of the current reminder).We already generated the single value with what we need to XOR the current reminder in order to get the same result as if we used bit by bit division. Shift out the first byte and do the operation.*/
Nevertheless, this algorithm will only work for CRCs where we use a generator polynomial divisible by 8. For the others, the bit-by-bit shift approach remains the only option.
Some testing results:
printf("The crcBits() of "%s" is 0x %X n", test, crcBits(test, strlen(test)));
crcGenerateTable();
printf("The crcBytes() of "%s" is 0x %X n", test, crcBytes(test, strlen(test)));
The check value for the CRC-32 standard is 0x CBF43926
The crcBits() of "123456789" is 0x CBF43926
Success to open and start write to: C:CRC-32_table.txt
The crcBytes() of "123456789" is 0x CBF43926
Press any key to continue . . .
With the text file looking as:
0X0,
0X4C11DB7,
0X9823B6E,
0XD4326D9,
0X130476DC,
0X17C56B6B,
...
As for the hardware implementation, I have four words for you: linear shift feedback registers. Mix them with some XOR gates, and there you have it. Here the presence of an XOR at a given position is equal to the presence of a one at that position of the generator polynomial. Whenever we shift out a one from the register, a subtraction will be made from the corresponding position. You can find more detailed information about this here.
Here is the source code that I wrote for this article:
This is all. I am glad that you read my article. I would like to thank you for your effort and I hope you learned something today. You can still post your questions here on the blog or over at the DevHardware forum for our experts. I would like to ask you to rate my article as you think it's worth. Live With Passion!