The Hungarian Algorithm: The quest for the best lineup for football teams

With football players skill data, the usage of the hungarian algorithm (also called the Munkres algorithm), combinatorics and a bit of Python we can design a solution that suggests the best modules/schemes to adopt by a football manager by deploying players to the roles where they perform best.

Andrea Grianti
8 min readMay 12, 2022
Photo by Lars Bo Nielsen on Unsplash

This is the result story of my data science passion and “amateurish” cooperation with a local football school, so the focus will be on the approach used and how you could replicate the same to other (mainly team sport contexts). I won’t show you the whole developed code but just the parts of python code I think are more interesting. If you are interested in the details drop me a comment.

Ingredient #1: The Hungarian (Munkres) algorithm.

For those who don’t know/remember what this algorithm is, in short, it’s an optimization method to solve a class of “operating research” problems called “assignment problems”.

Because there are many in depth articles on that I cut short and tell you that the typical example you can find around is something like this :
You have a set of resources (usually people) and a group of tasks to be performed by the resources. You have also a cost data matrix representing the values (could be $/€ or time or else) that every resource would spend in case they are assigned to perform the tasks, so you have a cost matrix resources/tasks.

The algorithm then solves the problem to find the optimal assignment so that every resource is assigned to every task in a way that minimizes the total cost to perform all of the tasks while deploying all of the available resources.

So far so good. Some notes:

  1. If you have more resources than tasks, some of the resources will be unassigned
  2. if you have more tasks than resources, some tasks (those with higher cost) will not be performed.
  3. If there are equal number of resources and tasks the algorithm will choose the optimal assignment based on minimizing the total cost
  4. if there are several assignment configurations with the same minimal cost the algorithm choose the first useful configuration (usually this depends on how the resources are sorted in the matrix) satisfying the minimal total cost constraint.

The hungarian algorithm also called the Munkres algorithm is available at the link in terms of python library, so if you don’t have it installed you have to install following the instructions (pip install munkres).

To use it I share this sample commented code that you can cut and paste in your python environment and start playing with it.

The algorithm returns a list of tuples where the first element of the tuple is row index and the second element of the tuple is column index.

if you run the above code you should get something like this (not exactly this because in every run the matrix is randomly generated, see line 21). In any case the logic is clear: the algorithm suggest to assign p0..p4 to tasks c0..c5 in that order.

How is this related to football ?

Now consider a football skills data matrix like this: on the rows you have the players , on the columns you have the different roles used in football schemes/modules/formations (like for example ‘GK’ for Goalkeeper, ‘LB’ for Left Back, … ‘ST’ for Striker and so on, if you are not familiar see legenda for details in the picture below). In every cell we must define their expected performance if they are assigned to any of the roles. In general you can have numbers where where higher numbers means higher skills (for example a number from 0 to 99). So instead of a cost matrix you have a skill matrix (see the table in the picture) which is in the end a profit matrix.

Fig.1 Roles in Football and skill data matrix. Legenda: GK=Goalkeeper, LB/RB=Left/Right Back, LCB/RCB=Left/Right Central Back, CB=Central Back, LWB/RWB=Left/Right Wing Back, LDM/RDM=Left/Right Defensive Middlefield, CDM=Central Defensive Middlefield, LM/RM=Left/Right Middlefield, LCM/RCM=Left/Right Central Middlefield, CM=Central Middlefield, LAM/RAM/CAM=Left/Right/Central Attacking Middlefield, LW/RW=Left/Right Wing, LF/RF/CF=Left,Right,Central Forward, LS/RS/ST=Left/Right/Central Striker. ON THE RIGHT the matrix with skills for every player and for every role

Because the hungarian algorithm works by minimization (usually costs) for our football use case we must first convert the skills matrix values subtracting a predefined max value from every cell so that the best players in a role are those with the min value number. Because I had values from 0 to 99, I subtracted 100 from all values of the matrix.

Ingredient #2: A module/scheme generator

Although every game is played in 11 per team, the possible roles a player could have are much more than that. If you take a look around you will find at least 26 different roles for players (excluding the GK).

In this case the question is how do you define the different game lineups given that you have 26 roles to chose.

So the problem becomes more mathematically intriguing if the manager wants to chose the team scheme as a consequence of the optimization based on the skills of the players by role.

So the question we try to focus is: which is the best combination of 10 (excluding the goalkeeper) players out of 26 different possible roles they could have in game which whould result in the highest team value given an input data matrix with their skills for each role ?

Binomial coefficient in football

Given that problem we could think to loop the hungarian algorithm a number o times for every possible scheme/lineup combining in all the different ways the 26 different roles resulting in a team of 11 players.

Although in theory this is possible, mathematics tells us that this excercise would imply the analysis of a number given by the binomial coefficient of n=26 possible roles with p=10 players which would result in 5.3 millions of different modules/schemes formations.

Binomial coefficient in fottoball: 26 roles, 10 players (excluding the goalkeeper) = 5+ millions of possible team schemes. which is the one that maximizes the value of the players skills ?

There’s a faster solution because you probably have realized that if we follow that approach we waste time analyzing formations which are yes made of 11 players, but totally unrealistic like for example a formation with all players in offensive roles or all in defensive roles, or all on the left or all on the right. So this approach, although theoretically valid, lacks balancement and knowledge.

To do that we can work both with a bit of math (combinatorics is useful) but also with our personal judgement and good sense to reduce the numbers by an order of magnitude and feed the hungarian algorithm with few thousands or even few hundred different formation scenarios to work out the best, or the list of the best, among them.

First of all we should have in mind the different roles so we divide the field in a matrix like in the figure below:

Figure 2: on the left the roles and on the right the possible mapping. Every line has its possible configurations of valid presence/absence of roles. The set of all the lines with the constraint that the sum of 1s must be == 11 makes up a football scheme that’s feeded to the Munkres algorithm resulting in a value based on the skills matrix. Looping results in finding out the scheme (or schemes) with the highest value and the therefore the best scheme.

To start simplify things the line0 is always present so that’s a constant like having [0,0,1,0,0].

For each of the other lines we can manually build a set of realistic configurations given by groups of 0s (=absent) and 1s(=present) using our judgement and also a symmetrical approach (if you have a player on the left you should have also 1 on the right and something that gives balance in every line) so we exclude those configurations mathematically possible but unbalanced and unrealistic.

Let’s take an example for realistic configurations for Line 1 out of 2⁵ = 32 possibilities we reduce to 4 if we consider as valid just these :

We can think a solution in python, to something like this: [[1,1,1,1,1],[0,1,1,1,0],[1,0,1,0,1], […]…] so a simple list of lists containing several valid configurations for a single line.

Because we have 7 lines for roles we just have to think the possible realisitic configurations for every line without having to worry about the compatibility of our choices with the other lines because we will take care of that setting a constraint that total number of players will be 11.

Now yes we can use combinatorics and if we have for example an average of 4 configurations for every line we have 4⁶ = approx 4 thousands combination which is not a problem to loop for the algorithm.

In my experiment I had even less configurations : 1*2*5*4*3*3*3 resulting in 1080 different modules to be generated .

Even in this case we still can run into the problem that among the few thosusand different formations there could be some of them with a number of players higher or lower than 11. So in our python code we must consider this contraint to filter out unwanted schemes.

To generate the valid schemes/formations/modules from a list of lists and build the loop that can be used for scheme evaluation, I share a python block of code with, in my opinion, a pearl I found in the net which in few lines does the job. In particular note the itertool.product(*lines) which takes as a parameter the lines data structure (which is made of different length lists) and how the asterisk takes this into account producing all the different combinations of lines as a cartesian product. If you don’t believe you can try yourself copy and paste:

You can copy and paste and run it to see how it works. The output represents possible football schemes where the 1s and 0s correspond to presence/absence of players in that role according to

Now that the loop structure has been defined, of course there are other steps to prepare the data in order to invoke the hungarian algorithm.

Ok but what is the result if we apply that to real football?

Now if you work around this core and you have a real player names (rows) and real roles (columns) like with skill values in the cells like in figure 1 and you loop for every generated module as described above, the program will process every scheme, will assign a total value based on the data matrix and the highest ranking schemes AND lineup will be the outcome.

To give a feeling of the outcome I built, as an example, a skill data matrix based on Sofifa.com web site data for some world known clubs like RealMadrid (Spain) and Inter Milan (Italy) and Paris S.Germain (Fra) and Liverpool (Eng). As you can see in the sample below the outcome is rather realistic and well aligned to real deployed schemes/formations:

One of the Real Madrid (left) Inter Milan (right) schemes and lineup resulting from the highest ranked within the hundred processed by the Munkres algorithm
One of the PSG (left) Liverpool (right) schemes and lineup resulting from the highest ranked within the hundred processed by the Munkres algorithm

Final Comments and…

The experiment results have been quite interesting because the assignments worked well in real situations and allow for a structured approach to scheme and lineup decision. The developed code is a bit more complex as it takes into account also other factors like the center of gravity of player positions and other player features in order to give more weight or less weight to the schemes that could be in some cases too much offensive than balanced or defensive. In general as expected tuning and further processing is mandatory to refine the results.

… just Follow me

Thanks for reading and please consider following me in order for me to reach the threshold of number of followers so that Medium platform consider me in their partner program.

--

--

Andrea Grianti

IT Senior Manager and Consultant. Data Warehouse and Business Intelligence expertise in design and build. Freelance.