Bipartite Graphs

The economic crisis is upon us. You hear on the radio, in the newspapers and on television that we need to hold the line. You can no longer waste any of your time or money, because otherwise the bankruptcy of your company is almost certain. In these harsh times, every little cent you can save is worth a fortune. If your company has employees who can work in multiple places, their output depends on what they are doing, and you want to maximize the company’s productivity, you’ve come to the right place. This is the conclusion of a thirteen-part series that shows you how to use graphs and algorithms to solve everyday problems.

Although at first glance the bipartite trait of graphs has nothing in common with your problem, if you think about it, you will discover otherwise. All we need to do is rephrase the input data a little to get closer to your issue. So, let there be n workers who can work on a given number of machines or in a given number of places.

Of course, based on your experience, you know that each worker provides different levels of efficiency on each job. Therefore, it is crucial for you to make the assignments in a fashion such that your productivity will reach its maximum with your current team.

Now consider that a vertex represents every job. We will name this group A. The employees are also represented by vertexes, and we will name them group B. An edge between the two groups stands for one possible assignment of an employee to a job, together with its productivity. Translated into graph language, the problem is to find a maximum matching: the largest set of edges in which no vertex appears more than once. In the best case this gives n pairs.

Of course, you may have more than n jobs or fewer than n employees. In this case, only the number of vertexes in each group will differ; the problem remains the same. Here is a prime example of this:

[Figure: the same bipartite graph with two different pairings marked by bold edges. Image courtesy of Graph Theory Algorithms, Kátai Zoltán]

The first pairing (represented by the bold edges) offers two pairs, while the second is a little more efficient, offering three, which is the maximum. Now that you understand the issue before you, I need to clarify what you must already know to follow this article.

This is part of a larger series about graphs; you will find all of the other articles here on the ASP Free web site. If you are not familiar with any of the concepts I’m about to enumerate, just search for the articles; you can even click on my name to find them all. Come back later, after you have read the articles explaining the concepts. You should understand how to represent graphs in memory, the breadth-first search and the Ford-Fulkerson Algorithm for network flows.

{mospagebreak title=The Problem}

The first and most important question is, how do we divide a given graph into two individual sets? We could write a program that asks for the two sets in two different files, or asks for the vertexes one after another and only then for the edges.

However, this is a time-intensive job, and we will decrease the re-usability of the program. Any little modification would mean a major modification in the code. We want to make it as user-friendly as possible, so whoever adds the data in the input file should only add the edges representing the efficiency.
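To make the "edges only" input concrete, here is a minimal sketch of how an adjacency list could be built from nothing but a list of edges. It uses std::vector for brevity instead of the linked lists used later in this article, and the name buildAdjacency is hypothetical, not part of the downloadable code.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper (not from the article's code): builds a 1-based
// adjacency list for an undirected graph with n vertexes from a plain
// list of edges, so the user never has to say which set a vertex is in.
std::vector<std::vector<int>> buildAdjacency(
    int n, const std::vector<std::pair<int, int>>& edges)
{
    std::vector<std::vector<int>> adj(n + 1);   // index 0 stays unused
    for (const auto& e : edges) {
        adj[e.first].push_back(e.second);       // undirected edge:
        adj[e.second].push_back(e.first);       // store it in both directions
    }
    return adj;
}
```

Calling buildAdjacency(4, {{1, 2}, {3, 2}, {3, 4}}) gives vertex 2 the neighbors 1 and 3; the split into set A and set B is left to the program.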

The issue, translated again into the language of graphs, is to find out if the graph is bipartite, and if so, make the split into set A and set B. A bipartite graph is an undirected graph whose vertexes can be split into two separate sets so that every edge has one end in the first set and the other end in the second set.

When can a graph be split into two such sets? This can be done only if every cycle in the graph has an even length; it is enough to check the cycles of the basic circle system. If you are not familiar with circle systems, you can read my article entitled Circles and Connectivity in Graphs.

Before you jump in to check this, there is a shorter solution. It is possible with the Breadth-First Search. This search visits the vertexes level by level. First, it visits the root. It then visits all of the root's direct neighbors (also called children). Next it visits the children of all the children of the root (of course skipping the ones that were already touched). These steps are repeated until there are no more vertexes to visit.

If we follow this idea, we can add all the vertexes on the odd-numbered levels (in the parent tree) to set A and the others to set B. To detect invalid graph input, it is only a matter of testing whether any edge connects two vertexes that received the same color; if one does, the graph contains an odd cycle and is not bipartite.

This process is also called coloring the graph’s nodes with two colors, so we will refer to it as the colorizing process. From here on, assume that we have the correct input. During the search, we can count how many nodes received each color and return the number of the color that was used less often, which makes it clear which set has more elements. This can save us an extra iteration.
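The colorizing idea, including the test for invalid input, can be sketched in a few lines. This is only an illustration under my own assumptions: it uses std::vector and std::queue instead of the article's linked lists, and the name twoColor is hypothetical.

```cpp
#include <queue>
#include <vector>

// Sketch: two-colors the part of the graph reachable from s with a
// breadth-first search. adj is a 1-based adjacency list. painted[v]
// ends up as 1 or 2 (0 means the vertex was never reached).
// Returns false as soon as an edge joins two vertexes of the same
// color, which means the graph is not bipartite.
bool twoColor(const std::vector<std::vector<int>>& adj, int s,
              std::vector<int>& painted)
{
    painted.assign(adj.size(), 0);
    std::queue<int> q;
    painted[s] = 1;                        // the root gets color 1
    q.push(s);
    while (!q.empty()) {
        int u = q.front();
        q.pop();
        for (int v : adj[u]) {
            if (painted[v] == 0) {
                painted[v] = 3 - painted[u];   // the other color: 1 <-> 2
                q.push(v);
            } else if (painted[v] == painted[u]) {
                return false;              // same color on both ends
            }
        }
    }
    return true;
}
```

On a simple path 1-2-3-4 this paints the vertexes 1, 2, 1, 2; on a triangle it returns false, because the third edge closes an odd cycle.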

{mospagebreak title=The Colorize Code Source}

It is time to write into C code what I explained on the previous page. I will use color 1 and color 2 instead of red and yellow, as these are much easier to use in programming. The colors WHITE (not visited), GRAY (in the travel process), and BLACK (visited) signify only the state of the vertexes during the search.

The variables f1 and f2 count the number of appearances of each color.

    int colorize(Node*& list, int s, int* painted, const int& n)
    {
        // search state of every vertex: WHITE, GRAY or BLACK
        int* color = (int*) malloc((n + 1) * sizeof(int));
        for (int i = 0; i <= n; ++i) // memset would only be safe if WHITE is 0
            color[i] = WHITE;

        int f1 = 1, f2 = 0; // the root s already counts for color 1
        int u = 0;
        int v = 0;

        color[s] = GRAY;
        painted[s] = 1;

        pListIt Q;
        pListIt at;
        pListIt endQSe;

        // put s into the queue
        Q = (pListIt) malloc(sizeof(ListIt));
        Q->value = s;
        Q->p_next = NULL;

        while (Q) // while we have nodes to visit
        {
            u = Q->value;

            // visit the neighbors
            for (at = list[u].neighbors; at; at = at->p_next)
            {
                v = at->value;
                if (color[v] == WHITE)
                {
                    color[v] = GRAY;
                    if (painted[u] == 1) // which color its parent had
                    {
                        painted[v] = 2; // use the other color
                        ++f2;
                    }
                    else
                    {
                        painted[v] = 1;
                        ++f1;
                    }

                    // find the end of Q
                    for (endQSe = Q; endQSe->p_next; endQSe = endQSe->p_next);

                    // put v into the queue, at the end
                    endQSe->p_next = (pListIt) malloc(sizeof(ListIt));
                    endQSe->p_next->value = v;
                    endQSe->p_next->p_next = NULL;
                }
            }

            // delete the first item
            endQSe = Q;
            Q = Q->p_next;
            free(endQSe);
            color[u] = BLACK;
        }

        free(color); // the state array is only needed during the search

        // return the color that was used less often
        if (f1 < f2)
            return 1;
        else
            return 2;
    }

{mospagebreak title=The Solution}

With this, we’ve made the distinction between the two sets. For the final solution we will use the Ford-Fulkerson algorithm. The basic setup is simple. We need to find the maximum flow that can go through the network from set A to set B. We will make the edges directed, and all of them will point from the vertexes of set A to the vertexes of set B.

Each vertex from set A will behave as a source node, while the vertexes from set B will be terminal nodes. We will remove the issue of multiple sources and multiple terminals by adding a virtual source (n+1) and a virtual terminal (n+2). The virtual source will be connected to all of the sources from set A, and all of the terminals from set B will be connected to the virtual terminal. The capacity of every edge is one. What we get is something like this:

[Figure: the bipartite graph extended with a virtual source and a virtual terminal. Image courtesy of Graph Theory Algorithms, Kátai Zoltán]
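The construction above can be sketched as a small self-contained function. Please treat it as an illustration under my own assumptions, not as the Ford_Fulkerson function from this series: it stores capacities in a plain matrix, finds augmenting paths with a breadth-first search (the Edmonds-Karp variant of Ford-Fulkerson), and the name maxMatchingByFlow is hypothetical.

```cpp
#include <queue>
#include <vector>

// Sketch: maximum matching through a unit-capacity flow network.
// adj is a 1-based undirected adjacency list of n vertexes;
// painted[v] is 1 for set A and 2 for set B.
int maxMatchingByFlow(const std::vector<std::vector<int>>& adj,
                      const std::vector<int>& painted)
{
    const int n = static_cast<int>(adj.size()) - 1;
    const int src = n + 1, sink = n + 2;   // virtual source and terminal
    std::vector<std::vector<int>> cap(n + 3, std::vector<int>(n + 3, 0));
    for (int u = 1; u <= n; ++u) {
        if (painted[u] == 1) {
            cap[src][u] = 1;                       // source -> set A
            for (int v : adj[u]) cap[u][v] = 1;    // set A -> set B only
        } else if (painted[u] == 2) {
            cap[u][sink] = 1;                      // set B -> terminal
        }
    }
    int flow = 0;
    for (;;) {
        // breadth-first search for an augmenting path in the residual graph
        std::vector<int> parent(n + 3, 0);         // 0 means "not reached"
        std::queue<int> q;
        parent[src] = src;
        q.push(src);
        while (!q.empty() && parent[sink] == 0) {
            int u = q.front();
            q.pop();
            for (int v = 1; v <= n + 2; ++v) {
                if (parent[v] == 0 && cap[u][v] > 0) {
                    parent[v] = u;
                    q.push(v);
                }
            }
        }
        if (parent[sink] == 0) break;              // no path left: flow is maximal
        for (int v = sink; v != src; v = parent[v]) {
            cap[parent[v]][v] -= 1;                // use up the forward edge
            cap[v][parent[v]] += 1;                // open the reverse edge
        }
        ++flow;                                    // every path carries one unit
    }
    return flow;
}
```

For the path graph 1-2-3-4 colored 1, 2, 1, 2 the function returns 2: one unit of flow goes through the edge 1-2 and one through the edge 3-4.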

Now we will call the Ford-Fulkerson algorithm to calculate the maximum amount of flow that the network can handle, and we are done. The augmenting paths (the "improved roads") that the algorithm uses, or more plainly stated, the individual routes from the virtual source to the virtual terminal, will also tell us which of the connections are advisable to use. Here it is translated into C code, given that you already saw the Ford-Fulkerson algorithm in my previous article:

    int main()
    {
        // create the variables
        Vertex** neigMatrix;
        int n;
        Node* list;

        // Pairing in graphs
        print(neigMatrix, n, list);

        int* color = (int*) calloc(n + 1, sizeof(int));
        int result = 0;
        result = colorize(list, 1, color, n);

        printf("\n The colored array: ");

        // Now we will build from this a flow problem
        for (int i = 1; i <= n; ++i)
        {
            printf(" %d ", color[i]);
            for (int j = 1; j <= n; ++j)
                neigMatrix[i][j].capacity = neigMatrix[i][j].flowValue = 0;
        }

        // all point from left to right in the already existing system
        pListIt at;
        for (int i = 1; i <= n; ++i)
        {
            at = list[i].neighbors;
            if (color[i] == 1)
                while (at)
                {
                    neigMatrix[i][at->value].capacity = 1;
                    at = at->p_next;
                }
        }

        // now add the source and the drain
        // n+1 source
        // n+2 drain
        for (int i = 1; i <= n; ++i)
        {
            if (color[i] == 1) // connect the source to this
            {
                neigMatrix[n+1][i].capacity = 1;
                list[i].vertexNr++; // count incoming edges
                if (add(list[n+1].neighbors, i))
                    list[n+1].vertexNr++; // count outgoing edges
            }
            else // connect this to the drain
            {
                neigMatrix[i][n+2].capacity = 1;
                list[n+2].vertexNr++; // count incoming edges
                if (add(list[i].neighbors, n+2))
                    list[i].vertexNr++; // count outgoing edges
            }
        }

        // now call on this the function
        printf("\n Maximal parity: %d\n", Ford_Fulkerson(neigMatrix, list, n+1, n+2, n+2));

        free(color);
        return 0;
    }

Given that you also defined the verbose variable, which tells the Ford-Fulkerson algorithm to print the augmenting paths, the output for the graph shown above should look like this (remove the virtual source and virtual terminal vertexes from each path and you have the solution):

    The colored array: 1 2 1 2 1 2 1 1 2
    -Negative: 10 1 2 11 ---> with 1
    -Negative: 10 3 6 11 ---> with 1
    -Negative: 10 5 4 11 ---> with 1
    Maximal parity: 3

In this scenario we considered everyone to be equal, providing the same productivity. If this is not true, all you need to do is set the respective capacity on each edge before you call the Ford-Fulkerson algorithm. In this case, the number of augmenting paths will tell you the number of pairings. In this way, the Ford-Fulkerson algorithm will tell you the company’s maximum productivity.

For those of you who still do not fully understand all of the traits of this problem, below you’ll see a downloadable file with the full code for this implementation in C, with just a little extra C++ flourish via the references. Before I give you this, I should mention that there is an alternative solution, called the Hungarian method, that solves the problem in O(n*(n+m)).
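For the curious, here is a minimal sketch of such an alternative. I am assuming that the method meant here is the classic augmenting-path algorithm for unweighted bipartite matching (often credited to Kuhn), which stays within the O(n*(n+m)) bound; the function names are hypothetical and the code is not part of the downloadable file.

```cpp
#include <vector>

// Tries to find an augmenting path that starts at vertex a of set A.
// matchOfB[b] is the A-vertex currently paired with b (0 = b is free).
static bool tryAugment(int a, const std::vector<std::vector<int>>& adj,
                       std::vector<bool>& seen, std::vector<int>& matchOfB)
{
    for (int b : adj[a]) {
        if (seen[b]) continue;             // each B-vertex is tried once per search
        seen[b] = true;
        // b is free, or its current partner can be moved to another job
        if (matchOfB[b] == 0 || tryAugment(matchOfB[b], adj, seen, matchOfB)) {
            matchOfB[b] = a;
            return true;
        }
    }
    return false;
}

// adj is a 1-based undirected adjacency list; painted[v] is 1 for set A
// and 2 for set B. Returns the size of a maximum matching.
int maxMatchingByAugmenting(const std::vector<std::vector<int>>& adj,
                            const std::vector<int>& painted)
{
    const int n = static_cast<int>(adj.size()) - 1;
    std::vector<int> matchOfB(n + 1, 0);
    int pairs = 0;
    for (int a = 1; a <= n; ++a) {
        if (painted[a] != 1) continue;     // start searches only from set A
        std::vector<bool> seen(n + 1, false);
        if (tryAugment(a, adj, seen, matchOfB)) ++pairs;
    }
    return pairs;
}
```

Each of the searches costs at most O(n+m), and there is one search per vertex of set A, which is where the bound comes from.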

Thank you for reading through my article. I hope you found it to be worth the time you spent with it, and that you are willing to make the extra effort to rate it as well. With this, my article series related to graphs is rapidly reaching its end; as a bonus, I will later be covering the Euler and Hamilton graphs. If you have any kind of question, you can ask it any time here on the blog or over at the friendly community under the name of DevHardware. Live With Passion!