Graphs help solve many computer-related problems. They can show us the simplest way to present something, so that it takes up as little memory as possible. In this article, the second part of a multi-part series that covers graphs in detail, you will learn a very clever way to code and decode a tree.
Contributed by Gabor Bernat Rating: / 3 May 07, 2009
"The whole of science is nothing more than a refinement of everyday thinking," wrote Albert Einstein in Physics and Reality in the year 1936. Much truth lies in these words; you will see this when you read on and find out the ingenious ways to solve a few issues that at first may look too complicated.
Welcome back for anyone who read the first part of this article series about graphs; if you didn't, I invite you to read it. You will find it here on the site under the name of An Insight into Graphs. In essence, it tries to present what graphs are all about and how we can store them in memory.
Today I am going to present another ingenious way to code and decode a tree. This will also allow us to store that tree inside memory with the smallest amount of space used. Before we venture on, let me remind you of a few theoretical statements from my previous article that you will need to understand, so we can speak the same language.
"Graphs inside which there are no circles are called forests. The ones that are also connected are referred to simply as trees. Inside a tree, vertexes that have only one edge attached to them are leafs." Now that we know what a tree is, you should already realize that a tree always has n-1 edges, where n represents the number of vertexes present in the tree.
The Prüfer code is a proof to Cayley's formula, which states that if n is the number of vertexes inside a tree, there exist n^(n-2) different trees. The inventor of the algorithm was Heinz Prüfer and it was invented in 1918. The Prüfer code for each tree allocates a sequence of n-2 numbers, where the numbers are from the range {1, 2, ..., n}.
In this approach, every point of the tree is labeled with a number from one to n. The Prüfer code generated will be unique, and from this we can already deduce that Cayley's Formula is correct. As given, n-2 numbers from the {1, 2, ..., n} can be made n*n*...*n (n-2 times) according to a simple combinatorial point of view. Yet, how Prüfer code guarantees all this and how it is accomplished remain the subjects of the next page.
This algorithm works in both directions. We can code a tree into a sequence of n-2 numbers and we can decode it back. The representation we are going to use is of the edge list. Therefore the input file will contain the number of vertexes used, followed by n-1 lines with a pair of numbers, each representing an edge.
The algorithm is simple; it contains n-1 steps. We get the smallest indexed leaf from the tree and add it to the sequence. Delete the vertex from the tree (and with it the edge also) and repeat the step n-2 times.
In the end, we should add the n to the sequence. The n-2 sequence will represent the Prüfer code, as the last element (the n) is always at the end, and if we know the number of vertexes in the tree, we do not need it.
This n-1 th element will take always take up the n; for proof, think that we always cut off of the tree the vertex that has the smallest index. So think about what would happen if the nth element remains there until the end of our algorithm (n-1th step), if we will still have to choose the smaller one, and with it, the n remains there as a vertex with no additional edges to the others.
When we code the algorithm inside a language, we need to determine which vertexes are leaves, so we can choose the smallest one from them. I have opted to count the degree of all points and search within this list for the smallest vertex that has a degree equal to one (as this is, by definition, a leaf).
Deleting the vertex and the edge related to it is a job to set to an invalid vertex pair in the edge list, and to decrease the degree of the item at the other end. The deletion of the vertex itself that we just cut off will be achieved by setting the degree of that point to zero. Here it is the code snippet:
Count the degree at read time.
++degre[edgeList[i].from];
++degre[edgeList[i].to];
for( j= 0; j < n-2; ++j)
{
// find the vertex with the smallest grad
for( k=0; k < n;k++)
{
if(degre[k] == 1)
{
l = k;
break;
}
}
// ok find the pair/edge of this
for( k =0; k < n-1; ++k)
{
if( edgeList[k].from == l || edgeList[k].to == l)
{
// cut it off
// delete the edge
// decrease the degree of the vertex
if(edgeList[k].from ==l)
{
--degre[prufer[j] = edgeList[k].to];
}
else
{
--degre[prufer[j] = edgeList[k].from];
}
edgeList[k].from = -1;
edgeList[k].to = -1;
break;
}
}
degre[l] = 0; // no more additional edge on this vertex
}
prufer[n-2] = n-1; //set the last item
Okay, the first thing I need to clarify is that I counted the vertexes from zero to n-1, so decrease by one from everything that I said in the upper description of the algorithm. It is just another way of representing it; everything else remains the same, and I think it is straightforward enough for you to understand.
If we code something and fail to decode it, that probably would be meaningless, so here we will go in the other direction. First, we have to add the n to the last position. As I said before, the last item is there by default, so it is not added to the sequence (even if we do not know the number of vertexes, this will be the length of the sequence plus two).
The algorithm will try, and it will succeed, to determine at every step another edge of the tree. Therefore, this also will have n-1 steps. When we add the n to the sequence, the length of the sequence will turn into n-1.
The algorithm is simple. When we wrote down at code time the "i"th item that was not cut before, it will not be cut in the future either; it was the smallest indexed vertex at that time that still was related to the tree.
In conclusion, we need to find the smallest index in the sequence containing the points that were cut down, and the one present in the Prüfer code. Look at the image below in order to comprehend it better.
The upper row is the Prüfer code, and for the next item we will always choose the smallest not-existing index from the grayed numbers (those that will be added later and those that were already added to the code). In the end, the two- dimensional array will hold one edge in each column. Here is everything coded in C:
for ( j =0; j < n-1; ++j)
{
memset(present,0, n*sizeof(int) ) ; // reset the count // array - here present[i] will be zero if it's grayed
for( k = j ; k < n-2; ++k)
present[prufer[k]] = 1; // the items in the code
for( k = 0; k < j; k++) // points already added
present[temp[k]] = 1;
for( k =0 ; k < n; ++k)
if(!present[k]) // first not present
{
// add to our edge list
edgeList[j].from = prufer[j] ;
edgeList[j].to = k;
temp[j] = k; // add it as already deduced vertex
break; // end this loop...
}
}
With one input of:
8
1 2
1 6
1 7
0 1
0 3
0 4
4 5
The result:
Edges:
1 - 2
1 - 6
1 - 7
0 - 1
0 - 3
0 - 4
4 - 5
$$$$$$$$$$ 0 1 2 3 4 5 6 7
A Degrees: 3 4 1 1 2 1 1 1
The prufer code:
1 0 4 0 1 1 7
The prufer code transformed back: edges
1 - 2
0 - 3
4 - 5
0 - 4
1 - 0
1 - 6
7 - 1
Press any key to continue . . .
Here is the code if you want to have a look at it; it's a simple C file, so just compile it. Assure that you have a valid input within the In.txt file and it should work perfectly:
Before I leave you for today, I want to present another simple algorithm. It is the Floyd-Warshall algorithm. This algorithm will in essence tell you between which vertexes a road exists. The idea is simple; we start one iteration and go through point to point, and if from point "i" to "j" there exists a road, and from "j" to "k" also, then we can conclude that from point "i" to "k" there exists a road as well.
The algorithm works with an adjacency matrix, where mat[i][j] is one if there exists an edge between "i" and "j". In the end, mat[i][j] will be one if there exists a road between "i" and "j," and zero otherwise.
If we would make these checks in the right order, a single iteration should be enough; however, determination of this order is hard, so we will execute the upper theory as long as it assures that we will find all connection between the vertexes. It is proved that n times will suffice in a graph with n vertexes.
The code snippet in C:
for( i=0; i <n; ++i)
for( j =0; j < n;++j)
for ( k =0; k < n; ++k)
if(mat[i][k] && mat[k][j])
mat[i][j] = 1;
We can use this code for determination if, inside a graph, there is a circle, for instance. If at the end on the main diagonal of the matrix, we find one, that means that there exists a road from a point to the same point -- which in practice means a circle. This is a very slow way to determine this (there are much better solutions, as we will find out), yet it is a possible solution.
If we choose to save the distances between the vertexes, and when we make the check, the new value will be the distance between "i" to "j" plus "j" to "k," we will find out the distance in number of edges between the vertexes in the end matrix.
Here is a little C++ program that uses this algorithm to tell if there exists a circle and answers a couple of other, more general questions about graphs, like finding the vertex with the most incoming or outgoing edges, the isolated vertexes, vertexes which have only incoming edges, vertexes which have only outgoing edges, and vertexes which have edges in both directions.
The code is simple and should be perfect for illustrating the use of the adjacency matrix. Feel free to skim through it and learn from it. Thank you for reading up to this point; I invite you to come back next time for the next chapter of this series about graphs. Rating this article is appreciated, as is posting your thoughts and comments here on the blog. Until we meet again, Live With Passion!