At some portions in our life we arrive at a situation where it is no longer enough to be average, or to perform at an average rate. In order to succeed we need to bring forth the best solution possible from the circumstances. If this means finding the shortest spanning tree inside something that may be represented as a graph, then you've come to the right place. Today, we are going to examine two techniques to achieve this: Kruskall’s and Prim’s.
Contributed by Gabor Bernat Rating: / 10 May 26, 2009
Before we venture more deeply into the waters of minimal spanning trees, I would like to remind you that this article is part of a larger series I am writing about graphs, and I invite you to read the rest of them if you are interested. Each article presents a set of new techniques to resolve a particular problem. It is also illustrated with code snippets written in C. This article is the seventh part of a 13-part series.
Strictly speaking, if you want to understand this article, only the first article in this series, An Insight into Graphs, is needed. It will explain what graphs are, why we use them, and how we incorporate them into a coding language. For now, let us discuss minimal spanning trees, shall we?
Until now we've treated the trees only from the point of view of whether or not there exists an edge between two vertexes. We used the adjacent matrix or the edge list, and sometimes the adjacent list, to represent this. Nevertheless, the edges do not have the same traits all the time.
We may like a few of them more, and even build a list of how favorable each one of them is for us. This idea stems from day-to-day problems. For instance, we may want to build the road system of a country. Now we do not have the money to construct a road between all of the cities, so the idea is to construct a system that allows us to spend the least amount of money (it is a world crisis, remember?).
If we are on a plane (i.e. the country is completely flat), then all we need to take into account is the distance. Then again, if the country has some mountains, we also need to consider how much each road would cost to build between the cities. This way we can assign weights to the edges. Storing them is usually done in two ways. The first solution is simply to add another member to the edge list, that we can call intuitively the weight of the edge.
The second one is to build the weight/distance/niceness matrix of the graph. Here we will store at position "i" "j" the weight/distance/niceness of the edge, and otherwise infinity. Since we are limited to a maximal number in programming, we are going to use INT_MAX/2, assuring for us that adding infinity with infinity will be still a positive number. The closer to the best an edge is in the rank of weight/ distance/niceness, the closer it will be to the number 0 (or the more negative it is).
The best solution for our road system will be the minimal spanning tree. This is defined for graphs with no directed edges inside them. Therefore, if we find the correct n-1 edges (recall the rule that a tree with n edges will have exactly n-1 edges), we will have a road from all vertexes to all others, with the minimal cost between them.
The algorithm relies on a simple idea. We compile a list in ascending order of the edges based on their weight/distance/niceness (in the future we will refer to them only as weight). Now we kick off with a clean graph, from which we remove all the edges.
The idea is to now take all the edges in this increasing order and add n-1 number of edges in such a manner that we do not form a circle. In a tree, there will be no circle, so if we want to avoid having this happen, the problem is solved. The tree has no directed edges inside it; therefore, we can form a circle only if we add an edge to a couple of nodes, both of which are already corrected to an edge.
For optimal performance, we are going to resolve this problem by first assigning to each of the vertexes a different identification number (made up of their vertex number). We will perceive each node first as a member of a different tree. Adding new edges between these trees will result in the integration of one tree into another.
This way we can add an edge to the minimal spanning tree only if the two ends are members of a different tree. Systematically, we are going to melt one tree into another, and in the end, we are going to have a single tree. This will be the minimal spanning tree. Add up the weight of the edges that are members of the tree, and you will have the overall cost of the minimal spanning tree as well.
It is quite easy to understand and to program. The only detail left is how we are going to sort the tree. I decided to call the qsort function residing inside the libraries of C. For this, we also need to write a function comparing the two edges. I've included that as well. Study both of them:
int compareEdges(const void* edge1, const void * edge2)
{
pEdge first = (pEdge)edge1;
pEdge second = (pEdge)edge2;
return first->weight - second->weight;
}
void Kruskal ()
{
int i = 0, overallWeight = 0, j =0;
int fromTree = 0, toTree = 0, changeNr = 0;
// sort the edge list
qsort ((void*) edges, m, sizeof (Edge), compareEdges);
printf ("n");
print ();
printf ("n");
printf ("nThe list of the edges with the algorithm of Kruskall:n");
Kruskall's algorithm can already solve our problem. However, there exists a different solution to this, initially formed by Vojtech Jarník in 1930 and later rediscovered by Robert C. Prim in 1957. The approach is different and the complexity of the algorithms is different.
Kruskall's algorithm, coded with the previously-shown solution to avoid the formation of circles, has a complexity of O( m*log(m)) together with the sort. On the other hand, if we code with a binary heap (which is a lot of work to do), it may yield a complexity of only (( m*log(n)). So if you invest a little effort, this solution may run faster. The theory is also a bit more complicated.
You could already see the greedy traits in Kruskall's algorithm; this technique will behave in the same manner. Now, instead of constructing multiple trees at the same time, we will start from a root vertex and build a tree with the root item being that specific node.
At every step, we will increase it with the minimal edge relating to the current tree. The edge related to the tree is composed of the edges that have exactly one node as the member of the tree. This way, for each time, we will have added another edge that is related to the current tree, and after n-1 steps our tree is complete.
What we need to resolve is the inclusion and exclusion of edges inside this list from where we are going to find the next smallest edge. I opted for the following solution. First, I sorted the edges into an increasing order. Second, I added another trait to the edges in the edge list.
Finally, after we add another vertex to the tree (via the add root, or find another good minimal edge), we need to remove the edges that go inside the tree -- that is, both edges are inside the tree -- and add those that will come with the addition of the new vertex. The edges from which we are going to choose the next/minimal edge are the ones that connect our current tree with the rest of the graph. This way, they can be perceived as the ones standing in the cut of our tree and the graph.
I will only change the value of whether or not an edge is within this cut, by setting the inTheCut trait of the edge to false or true. Finally, we will choose the first inTheCut edge from the list, as this will be minimal, because our list is sorted in ascending order.
Adding up the weight of the edges that make up the tree is child's play, so in the end we may print out the weight of the minimal spanning tree. This is what interests us if we are building that road. This can represent the distance in miles that we need to construct, the money that it will cost, and so on, based on what we input as weight.
printf( "nn The overall Weight: %d " , overallWeight);
return 0;
}
From the following link you can see this encapsulated into a working C file that, after you compile it you can execute it, test it and see if you can improve it. It is painlessly commented, so if you feel like doing some homework in coding, you'll be able to understand it with no effort, even without this article.
I invite you to rate my article or express your ideas, questions and other thoughts you may have here on the blog for the article. If you want, you may also join our friendly and ever-growing forum at DevHardware. Here our experts will eagerly answer your questions, whatever they may be. During our next meeting with graphs, I am going to present how to calculate the minimal distances between points inside a graph. Until that time arrives, Live With Passion!