An understanding of graphs can actually help simplify apparently difficult problems. They're a useful tool that belongs in every programmer's toolkit. In this introductory article, you will learn how graphs and programming relate to each other.
Contributed by Gabor Bernat Rating: / 2 May 06, 2009
"Everything should be made as simple as possible, but not simpler," said the great theoretical physicist Albert Einstein. Indeed, why should we bother to understand and possibly even solve practical problems beyond our comprehension? If only we could bring them down to a level that is straightforward enough for an immediate solution.
This is the entry point of an article series I will write relating to the graphs. During this series, I will present many algorithms that usually relate to a simple observation. I will focus on the programming part most of the time and offer to you only minimal background knowledge.
During this first article, you will learn some general information about graphs, and more importantly, how we represent them in our coding languages, which for me will be C most of the time. So if you are interested in this field, come aboard and keep reading.
Graphs in their essence were constructed and used to solve complex-looking problems that we face in our everyday life. The idea is that we need to find the common bonds between similar-looking issues and make them simple enough so that we can find for them an answer that will work every time.
Graphs are the road to simplification. There are countless problems that may be represented as a graph; however, before we learn that we need to find out what a graph is. Graphs come from the vast land of mathematics; they try to represent a set of objects that are connected by some rule with another set of links.
-A prime example of a simple directed graph-
To make all this a little more interesting, the objects at the points are vertices(nodes/points) while the links are referred to as edges (lines). That's pretty simple so far, right? To make all this a little more complicated, the relationships may be bidirectional (as in the case of you and your mother) or they may work only in one direction (as in the case of the president and the vast majority of people). The latter connections are called directed edges.
The problems stem from our daily life, and with the knowledge described above, I'm sure you can already think of a couple. Nevertheless, if not, let me help you with one: the road system of a country where the cities represent the objects and the roads the edges. Alternatively, think about the relationships between people at a meeting, where the vertices are the individuals, and if there is a connection between two of them, there will be an edge.
The pipe system transporting gases, water, oil and so forth, also has the potential to be the base of a graph. Finding out the maximum amount that can flow through the pipes in a moment is something to take into account when we build them, and oil companies are willing to pay a lot of cash for this information.
From this, we can also deduce that sometimes the edges may have an additional attribute -- something that will point out the length of the edge or how much of a substance can flow through it. Moreover, the examples can follow on endlessly. The idea is that this is something worth investigating and developing a theory around.
Now, despite what you might think from the title, I will not embark on a deep adventure into the waters related to the theory shores. Following the quote starting this article, I will try to keep everything as simple as possible -- but not a bit simpler. This is the starting point of an article series, throughout which I will endeavor to explain the main points of interest of the graphs in use to this day.
I will try to present all of this from the point of view of the programmer, putting the mathematical parts in the background. So you'll see no mathematical theory proofs here, sorry. However, every presentation of an algorithm will be accompanied by some pure code snippet in the language of C (maybe mixed in with a little C++; nevertheless, there will be no object-oriented programming involved).
Regardless, we still need some minimal theoretical knowledge to understand the future articles, so here it goes, as compressed and straightforward as possible.
Every graph is composed of n vertexes (objects) and m edges (links between the objects). Edges that have the same starting point and ending point are called loop edges. You can see one in the graph below at vertex number 3. Multiple edges are the ones that have the same start and end, just like the two between vertexes one and five on the image below:
Simple graphs are those that contain no loop or multiple edges. Two vertexes are considered neighbors if an edge exists between them. For example, on the upper graph, vertex one and two are neighbors.
Isolated vertexes are vertexes that have no edge attached to them (vertex 6 is the correct illustration for this). Further, we can declare in and out degrees for vertexes. This will show how many edges come into (in) and go out from a specific vertex. For instance the vertex four has an in-degree of two (edges coming from vertices three and five) and out-degree of one (edge towards vertex five).
The in- and out-degree of a vertex will be equal if the graph's edges are not directed. A graph is k regular if every vertex has a degree equivalent to k. A walk is a consecutive sequence of edges between two points. If this sequence will not go through any of the vertexes twice, it is called a road. A circuit is a closed road (aka the start and the end are the same). If none of the edges stays twice in the succession, we're talking about a trail.
We can also define the complement of a graph, sub graph and complete graph, as the three images below illustrates.
--Graph A--
--Graph B --
--Graph C --
-Graph C -
A complete graph is one inside which there exists an edge between all of the vertexes, as in the Graph C. Graph A and Graph B are sub-graphs of Graph C, as we can obtain them by removing an arbitrary number of edges or vertices and the edges related to them. Graph A is the complement of Graph B and vice versa, because if we put the two together, they will provide a complete graph.
Inside a connected graph, there exists a road between any two of the points. Here the graph is not directed. In a strongly connected graph (which is also directed), there exists a road in both directions between any two vertices. Graphs inside which there are no circles are called forests. The ones that are also connected are simply called referred. Inside a tree, vertexes that have only one edge attached to them are leafs.
Now that we've finished our discussion of theory, we can start with the proper programs and algorithms. However, as sad as it is, we cannot draw up the graph for coding languages, so we need to code it somehow. There exists several representation modes; each of them will be more or less adequate to a specific algorithm, and to achieve maximal efficiency we need to use the correct one.
There are around seven types of representation, however I will present only a couple of them in detail, as the rest will have little importance to the algorithms we are going to study later on.
The first and the most important one is the adjacent list, where for every vertex is enumerated all the vertexes with which it has a direct connection (though for directed graphs, only the out-going edges will be taken into account). Realizing this in code is a little more complicated, so I have allocated it to it the next page.
The edge list is the enumeration of the edges with the syntax from node to node. This can be easily made with a matrix of 2 X m(number of edges). This is the one used most of the time when you read in the graph, as it makes it easy to add and remove edges.
The adjacent matrix will have n columns and n rows, where at the i-th row and j-th column will be:
---> k if there exists k number of edges between "i" and "j" if "i" is not equal with "j".
---> h if, there exists h number of loop edges if "i" equal "j".
---> 0 if there exists no edge between "i" and "j".
Furthermore, there exists the circle matrix representation, the incidence matrix or the cut matrix; however, I will not discuss these as they are too rarely used; search for them on the Internet if you are interested. The last one will be the representation of trees.
This is done most of the time with a single array of n, where the "i"-th place will be the number of its ancient. The root item will be -1 or 0 depending on implementation. This way we can even determine the road from the root to a leaf by applying recursive calls to this array.
In this section, I will present the implementation of the adjacent list in some C code. The idea is to make a structure that is linked together with no memory waste; to achieve this, it will be created dynamically.
For this, we will proceed as follows. Create a new type containing an int value (the vertex to which it is linked) and a pointer to this kind of type, meaning the following: point to where this edge has a connection. For an easy update of the list, we will create a function that will have as parameters the row where we insert the item and what will be inserted.
The function should take care that no point will be added twice. Moreover, it will arrange the points in an increasing order. The list should be created dynamically with the calloc function. At the nth position will be listed to what vertexes it has edges. If NULL is present in the p_next, the list is over. If the place itself is NULL, there are no edges. Take a look at the code snippets now. First, the creation of the new type:
// A structure for the element point
struct ListIt
{
int value;
ListIt *p_next;
};
// typedef for the pointer type
typedef list * pListIt;
The add function:
// Here we need to make sure that the items are added in // correct order
bool add(pListIt &dest, int val)
{
//create the item
pListIt p;
p = (pListIt) malloc(sizeof(ListIt));
p -> value = val;
if(!dest) // first item addition
{
p -> p_next = NULL;
dest = p;
}
else
{
pListIt find = dest;
pListIt at = NULL;
// first find the first greater number, insert before
while(find && find->value <= val)
{
if( find->value == val) // do not add a duplicate
{
free(p);
return false;
}
at = find;
find = find->p_next;
}
// insert at
if (at) // insert at a valid point
{
p->p_next = at->p_next;
at->p_next = p;
}
else // insert at the start
{
p->p_next = dest;
dest = p;
}
}
return true;
}
The creation of the type after we know its size:
v = (pListIt *) calloc(n+1,sizeof(pListIt));
The read method with an edge list in the file:
for (i = 1; ; i++)
{
scanf("%d %d",&from,&to); // from -> to
if(feof(stdin))
break;
add(v[from], to);
++m;
//add(v[to], value); // directed graph or not
}
}
Moreover, we read from it the data and print it:
void print()
{
int i;
pListIt j ;
for( i = 1; i <= n; ++i)
{
j = v[i];
printf("n%d -> ", i);
while( j)
{
printf(" %d ",j->value);
j = j->p_next;
}
}
}
Now if we use a little more complicated input file with an edge list:
22
1 2
1 3
1 4
2 5
5 7
5 8
3 6
6 9
6 10
8 2
8 1
9 11
10 3
11 12
11 13
11 14
11 3
12 8
12 15
13 16
12 20
14 17
14 18
15 19
15 20
16 21
16 22
18 9
19 12
20 11
22 13
The result will be just as clear:
1 -> 2 3 4
2 -> 5
3 -> 6
4 ->
5 -> 7 8
6 -> 9 10
7 ->
8 -> 1 2
9 -> 11
10 -> 3
11 -> 3 12 13 14
12 -> 8 15 20
13 -> 16
14 -> 17 18
15 -> 19 20
16 -> 21 22
17 ->
18 -> 9
19 -> 12
20 -> 11
21 ->
22 ->
Now you can already imagine the graph (which is a tree) that the large list of the edges tries to describe. The reading in of the edges assures that you miss none of them, while our code and algorithm will make sure that the code is as optimized and efficient as it can be for a couple of algorithms.
For now I will stop here; nevertheless, I will be back soon with the Prüfer code and traveling algorithms inside graphs. Thank you for investing your time and reading until this point; with the hope that you learned a lot from this article, I invite you to comment it here on the blog or even better join our forums at DevShed or DevHardware and ask there any question you might have. Live With Passion!