During our first week of class, the BGSE Data Science class of 2017 was asked to form groups of 3 (and one group of 4) for the upcoming group projects of the first term. The question at hand was how, and on what basis, to form these groups.
After some discussion, we decided on the following approach: each of the 25 students provides a ranking of their fellow students from 1 (most preferred) to 24 (least preferred), taking into account each person's previous academic background, interests, career aspirations, etc. From this input we would then compute the group allocation that is collectively most favorable, i.e. the one that minimizes the sum of the links between group members, weighted by rank.
Finding the optimal solution to this problem turned out to be non-trivial. In the following, I shall discuss a possible approach.
Representation as Graph
First, note that the problem at hand can be formulated as a complete, undirected graph in which each student represents a vertex and the weight of the edge between two vertices A and B is the sum of the rank student A assigned to student B and vice versa. The aggregated rank matrix therefore becomes the adjacency matrix of that graph.
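This construction can be sketched in a few lines of Python (the original implementation was in R and is not shown here; the function name and matrix encoding below are illustrative assumptions):

```python
def to_weight_matrix(ranks):
    """Build the symmetric adjacency matrix from a rank matrix.

    ranks[a][b] is the rank student a assigned to student b
    (1 = most preferred); the diagonal is unused.
    The edge weight between a and b is ranks[a][b] + ranks[b][a].
    """
    n = len(ranks)
    return [[ranks[a][b] + ranks[b][a] if a != b else 0
             for b in range(n)] for a in range(n)]


# Tiny 3-student example: each row is one student's ranking of the others.
ranks = [[0, 1, 2],
         [1, 0, 2],
         [2, 1, 0]]
weights = to_weight_matrix(ranks)
# weights is symmetric, e.g. weights[0][1] == weights[1][0] == 2
```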
The goal now is to form complete subgraphs of size 3 (and one subgraph with 4 vertices) such that the total weight of the edges within the subgraphs is minimized.
A graph with 25 vertices, split into seven groups of 3 and one group of 4, allows for over 4.5 × 10^14 different group allocations: 25! / ((3!)^7 · 7! · 4!) ≈ 4.6 × 10^14.
A brute-force approach – i.e. computing all possible combinations and then picking the combination with the minimum total edge weight – can therefore be deemed infeasible.
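The count above can be checked with a short Python computation (a sketch; the function name is illustrative): the multinomial coefficient counts ordered partitions, and dividing by the factorial of the multiplicity of each repeated group size removes the ordering among interchangeable groups.

```python
from math import factorial
from collections import Counter


def num_allocations(n, group_sizes):
    """Count the distinct ways to split n people into unlabeled
    groups of the given sizes."""
    assert sum(group_sizes) == n
    count = factorial(n)
    for size in group_sizes:
        count //= factorial(size)          # multinomial coefficient
    for mult in Counter(group_sizes).values():
        count //= factorial(mult)          # equal-sized groups are interchangeable
    return count


print(num_allocations(25, [3] * 7 + [4]))  # 458084026400000, i.e. ~4.6e14
```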
The next step was to review the literature for similar algorithmic problems and associated efficient solutions.
The stable matching (or marriage) problem considers finding a stable matching between two equally sized sets of elements given an ordering of preferences for each element. It differs from our problem in that each vertex is associated with one of two groups and it only allows for pair-wise matching. Furthermore, the solution to the problem is stable (i.e. no two individuals have an incentive to re-match) but not necessarily optimal from all individuals’ points of view.
The similar stable roommate problem places all elements (roommates) in a single pool; however, it is still restricted to pair-wise matching, and the efficient algorithm by Irving (1985) does not generalize to larger groups.
With no ready-to-use algorithm on hand, we were forced to design a new group-matching algorithm.
The proposed algorithm starts from any valid solution and then iteratively improves it by swapping unstable pairs of students from different groups. The produced solution is therefore guaranteed to be stable for pairs of individuals, but not necessarily for coalitions of 3 or more individuals trading "under-the-table". It follows that the algorithm is likely to find a "good" solution, but not necessarily an optimal one.
The pseudo-code of the algorithm is given below:
INPUT:  adjacency matrix, group_sizes
OUTPUT: group allocation
--------------------------------------------------------------------------
form groups of given group sizes
initial solution: randomly assign vertices to initial groups
--------------------------------------------------------------------------
WHILE solution has changed
    FOREACH pair of vertices (i,j)
        cur_cost <- sum(edge weights of i) + sum(edge weights of j)
        alt_cost <- sum(edge weights of i if in group(j)) + sum(edge weights of j if in group(i))
        IF alt_cost < cur_cost
            swap i and j
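A minimal Python sketch of this procedure is given below (the author's actual implementation was in R; the function and variable names here are illustrative assumptions):

```python
import random


def group_allocation(weights, group_sizes, rng=random):
    """Local-search group allocation per the pseudocode above:
    start from a random assignment, then greedily swap pairs of
    vertices across groups while the total cost decreases.

    weights is a symmetric n x n cost matrix; returns a list
    group where group[v] is the group index of vertex v.
    """
    n = len(weights)
    assert sum(group_sizes) == n

    # Initial solution: randomly assign vertices to groups.
    vertices = list(range(n))
    rng.shuffle(vertices)
    group = [0] * n
    start = 0
    for g, size in enumerate(group_sizes):
        for v in vertices[start:start + size]:
            group[v] = g
        start += size

    def cost_in(v, g, exclude):
        # Cost of v sitting in group g, ignoring vertex `exclude`
        # (the swap partner, who would leave that group).
        return sum(weights[v][u] for u in range(n)
                   if u != v and u != exclude and group[u] == g)

    changed = True
    while changed:
        changed = False
        for i in range(n):
            for j in range(i + 1, n):
                if group[i] == group[j]:
                    continue
                cur = cost_in(i, group[i], j) + cost_in(j, group[j], i)
                alt = cost_in(i, group[j], j) + cost_in(j, group[i], i)
                if alt < cur:
                    group[i], group[j] = group[j], group[i]
                    changed = True
    return group
```

Each accepted swap strictly lowers the total cost, so the WHILE loop terminates; it stops at a pair-wise stable assignment, matching the stability guarantee described above.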
The algorithm compares all pairs of students in each WHILE loop iteration. In the worst case, the algorithm could start from the worst possible solution and in each iteration only move to the next-best solution, leading to brute-force-like performance.
Tests with an implemented version of the algorithm (in R), however, showed that for graphs of size 25, the algorithm converges after an average of 20 swaps and no more than 3 WHILE loop iterations. Thus, the algorithm is believed to be efficient in the average case.
Further tests with different initializations for the same rank matrix confirmed that the algorithm only converges to a local minimum of the solution space. Running the algorithm several times with different initializations and keeping the solution with the lowest total cost is believed to produce a good solution, but by no means can it be guaranteed to be the global optimum.
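This restart-and-pick-best scheme can be sketched independently of the solver; `solve` below stands in for any allocation routine with the assumed signature `solve(weights, group_sizes, rng)`, and all names are illustrative:

```python
import random


def total_cost(weights, group):
    """Sum of edge weights between members of the same group."""
    n = len(group)
    return sum(weights[i][j] for i in range(n) for j in range(i + 1, n)
               if group[i] == group[j])


def best_of_restarts(weights, group_sizes, solve, restarts=10, seed=0):
    """Run `solve` from several random initializations and keep
    the allocation with the lowest total cost."""
    best, best_cost = None, float("inf")
    for r in range(restarts):
        group = solve(weights, group_sizes, random.Random(seed + r))
        cost = total_cost(weights, group)
        if cost < best_cost:
            best, best_cost = group, cost
    return best, best_cost
```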
The described ranking-based group allocation process turned out not to be self-enforcing, and therefore it was not actually used in the end.
About The Author (Hans-Peter Höllwirth):
I have several years of working experience in the software industry as a programmer, analyst, and project manager, and graduated from the University of Edinburgh in 2016 with a bachelor's degree in Computer Science and Management Science. My interests in machine learning, computing, and finance led me to join the BGSE Data Science program.