
My FPL Team Generator

Intro

This is my third season of FPL, and ever since I was introduced to the game I’ve been meaning to try my hand at writing an automated team picker. This year, thanks to COVID, Stanford gave us an abnormally long break. This, coupled with the fact that I can’t really go anywhere (also thanks to COVID), has given me the perfect opportunity to do just that. After a lot of tinkering, I’m at the point where I’m happy enough with the algorithm’s output to share it, though it’s far from perfect. I plan to keep improving it to see how good it can get, and hopefully by next season I will have ironed out many of the kinks. First, credit where credit is due.

Algorithm overview

  1. Player Points Prediction

    In order to pick the best team, the algorithm first tries to predict each player’s point count over the next 10 gameweeks.

    The general process for player point prediction is:

    1. Calculate each team’s relative attacking and defensive strength and, from those, predict the score of each game
    2. Calculate how involved each player is in their team’s goals and assists, as a percentage
    3. From steps 1 and 2, predict the number of goals/assists for each player
    4. If the player is a GK, DEF, or MID, calculate the probability of keeping a clean sheet (CS) in each game
    5. If the player is a GK or DEF, calculate the probability of conceding 2+ goals
    6. If the player is a GK, try to predict the number of saves
    7. From the above data, try to predict bonus points
    8. Convert the predicted goals, assists, etc. into a predicted FPL score

    All player data was taken from the FPL API.

    The first and most important step of the algorithm is to find the relative attacking and defensive strength of each team (where a higher attacking strength and a lower defensive strength are better). Again, a lot of the credit for this part goes to u/blubbersassafras for his writeup on a similar process (see above). To find the relative strength of a given team, the program loops through that team’s last few results and compares the number of goals it scored and conceded with how an average team would have fared against the same opponent.

    For example, say team A beat team B 2-1, where team B is a fairly average team, and we want to find team A’s attacking and defensive strength. To find team A’s relative attacking strength from this game, the program compares the number of goals team A scored with the number of goals the average team would score. Since team B is a fairly average team and the average team concedes 1.36 goals/game, team A outperformed the average by a factor of 2/1.36, so its attacking strength is 2/1.36 ≈ 1.47. However, if team B’s defense were half as good as the average team’s (d. strength = 2), you would expect the average team to score 1.36 * 2 = 2.72 goals. Since team A only scored 2 goals, its attacking strength would be only 2/2.72 ≈ 0.74.

    Likewise, team A’s defensive strength is 1/1.36 ≈ 0.74, since the average team would concede 1.36 goals/game against team B’s average attack but team A only conceded 1 (recall that lower defensive strengths and higher attacking strengths indicate a stronger team). If team B’s attack were half as good as average (a. strength = 0.5), the average team would be expected to concede 0.5 * 1.36 = 0.68 goals. Therefore, team A’s defensive strength in this scenario would be 1/0.68 ≈ 1.47, which is worse than average.
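    To make the arithmetic concrete, here is a small sketch of the single-game calculation. single_game_strengths is a hypothetical helper name used only for illustration; it just reproduces the numbers from the example above, using the same 1.36 average.

```python
AVG_GOALS = 1.36  # EPL average goals per team per game


def single_game_strengths(goals_for, goals_against, opp_a_strength, opp_d_strength,
                          avg_goals=AVG_GOALS):
    """Return the (attacking, defensive) strength implied by one result.

    Attacking: goals scored divided by what an average team would score
    against this opponent's defense. Higher is better.
    Defensive: goals conceded divided by what an average team would concede
    against this opponent's attack. Lower is better.
    """
    expected_scored = avg_goals * opp_d_strength
    expected_conceded = avg_goals * opp_a_strength
    return goals_for / expected_scored, goals_against / expected_conceded


# Team A beats team B 2-1
print(single_game_strengths(2, 1, 1.0, 1.0))  # average B: roughly (1.47, 0.74)
print(single_game_strengths(2, 1, 1.0, 2.0))  # B's defense half as good: attack roughly 0.74
print(single_game_strengths(2, 1, 0.5, 1.0))  # B's attack half as good: defense roughly 1.47
```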

    Each team’s attacking and defensive strength is initially set to 1 (the average), and this calculation is performed for each of the team’s last few games and averaged to find an overall attacking and defensive strength. This process is then repeated, this time using the previously calculated strengths to get a better picture of the opponent’s level when assessing each result. Then it is repeated again using the new strengths from that calculation, and so on. I have included a simplified Python version that shows the process in a bit more detail.

        from statistics import mean

        AVG_GOALS = 1.36  # EPL average goals/team/game


        def team_strengths(scores, num_iterations=50, avg_goals=AVG_GOALS):
            # Past scores in the form:
            # {Team: [(Opposition1, Team goals, Opposition goals), ...] ... }
            # e.g. {'ARS': [('CRY', 2, 0), ('SOU', 1, 3), ('BUR', 1, 0), ...],
            #       'CRY': [('ARS', 0, 2), ('MUN', 3, 3), ('WBA', 4, 0), ...],
            #       ...}
            # Every opposition must also be a key of scores, and every team must
            # have scored and conceded at least once (a strength of 0 would cause
            # a division by zero in the next iteration).
            team_list = list(scores)
            # Initialize attacking and defensive strengths to 1 (average) for each team
            d_strengths = {team: 1.0 for team in team_list}
            a_strengths = {team: 1.0 for team in team_list}
            for _ in range(num_iterations):  # Update strengths num_iterations times
                # The new strengths calculated in this iteration
                new_d_strengths = {}
                new_a_strengths = {}
                # Find strengths for each team
                for current_team in team_list:
                    current_team_d_strengths = []
                    current_team_a_strengths = []
                    for opposition, goals_for, goals_against in scores[current_team]:
                        # goals_for: current_team goals scored
                        # goals_against: current_team goals conceded
                        opp_d_strength = d_strengths[opposition]
                        opp_a_strength = a_strengths[opposition]
                        # Defensive strength from this game. Lower is better
                        current_team_d_strengths.append(goals_against / avg_goals / opp_a_strength)
                        # Attacking strength from this game. Higher is better
                        current_team_a_strengths.append(goals_for / avg_goals / opp_d_strength)
                    # Average the per-game strengths to find the team's overall strength
                    new_d_strengths[current_team] = mean(current_team_d_strengths)
                    new_a_strengths[current_team] = mean(current_team_a_strengths)
                # Redo the process using the newly calculated team strengths
                a_strengths = new_a_strengths
                d_strengths = new_d_strengths
            return d_strengths, a_strengths
    

    The initial idea was that attacking and defensive strengths would converge after enough iterations. In reality, they often alternated between 2-4 values. As a result, the final algorithm calculates team strengths for 54 iterations and uses the average of the last 4. A higher iteration count might have led to convergence, but I’m not sure how long that would take, and the average is likely very close to the theoretical converged value. In my actual implementation, I cap the per-game defensive and attacking strength at 3 to limit the impact of outliers like WBA scoring against MCI, who hadn’t conceded in a month, and the 7-2 AVL-LIV game. Also, my implementation considers the past 10 gameweeks, with the 5 most recent fixtures weighted twice as heavily as the other 5. See the team_strengths function at the top of predict_points.py for the actual implementation. Once we have each team’s strength, we can predict any score. The formula for the score is:
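    The two tweaks described above (capping each game's strength at 3 and double-weighting the 5 most recent fixtures), plus averaging the last 4 of 54 iterations, could look roughly like the sketch below. This is an illustration rather than the code in predict_points.py: the function names are hypothetical, and it assumes each team's per-game strengths are listed oldest to newest.

```python
from statistics import mean


def weighted_capped_mean(per_game_strengths, cap=3.0, num_recent=5, recent_weight=2.0):
    """Combine one team's per-game strengths (ordered oldest -> newest).

    Each per-game value is capped at `cap`, and the `num_recent` most recent
    games count `recent_weight` times as much as the older ones.
    """
    capped = [min(s, cap) for s in per_game_strengths]
    n = len(capped)
    weights = [recent_weight if i >= n - num_recent else 1.0 for i in range(n)]
    return sum(w * s for w, s in zip(weights, capped)) / sum(weights)


def average_last_iterations(history, num_last=4):
    """Average each team's strength over the final `num_last` iterations.

    `history` holds one {team: strength} dict per iteration, so a strength that
    oscillates between a few values is smoothed into a single estimate.
    """
    last = history[-num_last:]
    return {team: mean(iteration[team] for iteration in last) for team in last[-1]}


# 10 games, oldest first: the last 5 count double and the 4.5 outlier is capped at 3
print(round(weighted_capped_mean([1, 1, 1, 1, 4.5, 2, 2, 2, 2, 2]), 3))  # 1.8

# Strengths oscillating across iterations are averaged over the last 4
history = [{"ARS": 1.0}, {"ARS": 1.25}, {"ARS": 0.75}, {"ARS": 1.5}, {"ARS": 0.5}]
print(average_last_iterations(history))  # {'ARS': 1.0}
```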

    avg_goals = 1.36

    (team1 goals, team2 goals) = (avg_goals * team1_a_strength * team2_d_strength, avg_goals * team2_a_strength * team1_d_strength)
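    As a quick sketch (predict_score is a hypothetical name), the formula translates directly to code. The strengths in the example are made up for illustration.

```python
AVG_GOALS = 1.36  # EPL average goals per team per game


def predict_score(team1_a, team1_d, team2_a, team2_d, avg_goals=AVG_GOALS):
    """Predicted (team1 goals, team2 goals) from attacking/defensive strengths."""
    team1_goals = avg_goals * team1_a * team2_d  # team1's attack vs team2's defense
    team2_goals = avg_goals * team2_a * team1_d  # team2's attack vs team1's defense
    return team1_goals, team2_goals


# Made-up strengths: a strong side (a=1.5, d=0.7) vs an average side (a=1, d=1)
print(predict_score(1.5, 0.7, 1.0, 1.0))  # roughly (2.04, 0.95)
```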

    The rest of point prediction is fairly straightforward. As many of you know, FPL provides its own custom statistics called creativity and threat, where 100 creativity roughly corresponds to 1 assist and 100 threat roughly corresponds to 1 goal. Here is a link to a breakdown of how the calculation works. To calculate the number of goals a player is going to score, we calculate that player’s threat per minute (tpm) and their team’s overall threat per minute over the last 10 games. Then we work out the expected goals their team will score in a game using the score formula above and apply the formula:

    predicted_goals_player = (player_tpm/team_tpm) * predicted_goals_team

    We do the same to figure out assists, but with creativity per minute (cpm), so that

    predicted_assists_player = (player_cpm/team_cpm) * predicted_goals_team * 0.75

    I multiply by .75 because I’d estimate roughly 25% of goals don’t have an assist.
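    Here is a minimal sketch of the goals and assists step, assuming threat, creativity, and minutes are totals over the last 10 games and that the team's minutes are match minutes (90 per game). The function and argument names are hypothetical, and the example numbers are made up.

```python
def per_minute(total, minutes):
    """Rate per minute, treating 0 minutes as a rate of 0."""
    return total / minutes if minutes > 0 else 0.0


def predict_goals_and_assists(player_threat, player_creativity, player_minutes,
                              team_threat, team_creativity, team_minutes,
                              predicted_goals_team, assisted_share=0.75):
    """Share out a team's predicted goals by threat, and its assists by creativity.

    assisted_share is the rough fraction of goals that have an assist (0.75).
    """
    team_tpm = per_minute(team_threat, team_minutes)
    team_cpm = per_minute(team_creativity, team_minutes)
    player_tpm = per_minute(player_threat, player_minutes)
    player_cpm = per_minute(player_creativity, player_minutes)
    goals = player_tpm / team_tpm * predicted_goals_team if team_tpm else 0.0
    assists = (player_cpm / team_cpm * predicted_goals_team * assisted_share
               if team_cpm else 0.0)
    return goals, assists


# Made-up example: 10 games (900 team minutes), the player played all 900 minutes
goals, assists = predict_goals_and_assists(450, 300, 900, 1800, 1500, 900, 1.6)
print(round(goals, 2), round(assists, 2))  # 0.4 0.24
```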

    I won’t go into too much detail on the rest of the steps, but here is a quick overview of the highlights:

    • Using the Poisson distribution, the probability of a clean sheet is P(clean sheet) = e^(-predicted goals conceded)
    • The Poisson distribution is also used to find the probability of the opposition scoring 2+ goals (calculate the probability of each score, then add up the probabilities of every score where the opposition scores 2+ goals)
    • Goalkeeper save count is predicted by taking the keeper’s saves per game divided by the opposition’s attacking strength, averaged over the last 10 games, and multiplying by the next opponent’s attacking strength
    • Bonus points are calculated by dividing a player’s predicted BPS (found using the official BPS formula and the previously calculated stats) by 16
      • I chose 16 because the average number of predicted bonus points when I use 16 is quite close to the real-life average
    • Points are found by multiplying each predicted stat by its FPL point value, and 2 points are added for playing time
      • For example, if Ings has 1.2 predicted goals and 0.23 predicted assists, his point value is 1.2*4 + 0.23*3 + 2 = 7.49 (forwards get 4 points per goal)
    • Players who have played fewer than 240 minutes over the last 4 games have their point value overridden to 0, as they aren’t considered regular starters
    • Players who are flagged with a 25%, 50%, or 75% chance of playing have their point values multiplied accordingly (see the get_data function)
    • Players who are suspended and have long term injuries have their point values modified accordingly
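    Several of the steps in the list above can be sketched in a few lines. This is an illustration under stated assumptions, not the get_data/predict_points code: all function names are hypothetical, and the scoring values are the standard FPL ones (6/6/5/4 points per goal for GK/DEF/MID/FWD, 3 per assist, 4/4/1 for a clean sheet, 1 per 3 saves), which are worth double-checking against the current season's rules. For independent Poisson scorelines, summing every score where the opposition gets 2+ reduces to 1 - P(0) - P(1), and, as in the list, a single -1 is weighted by that probability (ignoring the extra deduction for conceding 4+).

```python
import math
from statistics import mean

# Standard FPL scoring values (assumed; check the current season's rules)
GOAL_POINTS = {"GK": 6, "DEF": 6, "MID": 5, "FWD": 4}
CLEAN_SHEET_POINTS = {"GK": 4, "DEF": 4, "MID": 1, "FWD": 0}
ASSIST_POINTS = 3
APPEARANCE_POINTS = 2
SAVES_PER_POINT = 3
BPS_DIVISOR = 16  # predicted BPS / 16 ~ predicted bonus points


def clean_sheet_probability(predicted_conceded):
    """Poisson P(0 goals conceded) = e^(-lambda)."""
    return math.exp(-predicted_conceded)


def concede_two_plus_probability(predicted_conceded):
    """Poisson P(2+ goals conceded) = 1 - P(0) - P(1)."""
    lam = predicted_conceded
    return 1.0 - math.exp(-lam) * (1.0 + lam)


def predict_saves(past_saves, past_opp_a_strengths, next_opp_a_strength):
    """Average saves per unit of opposition attacking strength, scaled to the next opponent."""
    per_strength = mean(s / a for s, a in zip(past_saves, past_opp_a_strengths))
    return per_strength * next_opp_a_strength


def expected_points(position, goals, assists, predicted_conceded,
                    saves=0.0, predicted_bps=0.0):
    """Convert predicted stats for one game into predicted FPL points."""
    points = APPEARANCE_POINTS + goals * GOAL_POINTS[position] + assists * ASSIST_POINTS
    points += clean_sheet_probability(predicted_conceded) * CLEAN_SHEET_POINTS[position]
    if position in ("GK", "DEF"):
        # Expected -1 for conceding 2+ (ignores the extra -1 at 4+)
        points -= concede_two_plus_probability(predicted_conceded)
    if position == "GK":
        points += saves / SAVES_PER_POINT  # smooth approximation of 1 point per 3 saves
    return points + predicted_bps / BPS_DIVISOR


def availability_adjusted(points, minutes_last_4, chance_of_playing=100):
    """Zero out non-regulars (< 240 mins in the last 4 games); scale flagged players."""
    if minutes_last_4 < 240:
        return 0.0
    return points * chance_of_playing / 100


# The Ings example from the list (a forward, so no clean sheet points)
print(round(expected_points("FWD", 1.2, 0.23, predicted_conceded=1.0), 2))  # 7.49
```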
  2. Team selection

    The idea behind the team generation algorithm is simple: pick the team that maximizes predicted points (see the previous section) given the constraints of FPL (a 15-player squad, 11 starters, a maximum of 3 players per club, the budget, etc.). Thankfully, Python has a very convenient package called PuLP that makes solving problems like these a breeze. Essentially, all I had to do was input the constraints and predicted points, and the package would spit out the optimal team.

    To pick the initial team, I input all of the FPL constraints into a PuLP model along with each player’s total predicted points over the next 10 gameweeks. To account for the fact that subs don’t play all that much and captains get double points, I had the model pick an overall starting XI, 3 subs, and 2 captains for the next 10 weeks, assuming that the first, second, and third subs play 20%, 10%, and 5% of the time and that each captain gets 1.5x their original predicted points over the 10 weeks, since we rotate captains. The sub percentages and the number of captains were picked somewhat arbitrarily, by experimentation and intuition, rather than on any real statistical basis, which is something that could be improved in the future.

    To pick the starting XI for any given gameweek, the algorithm picks the starting 11 that maximizes predicted points for that gameweek and then orders the subs from highest to lowest predicted points. The transfer process is very similar to the team selection process. The only difference is that I added a constraint that the new team may differ from the original team by only 1 player (this can be changed to 2 players, 3 players, etc.).
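    For anyone who wants to try this themselves, below is a rough sketch of how the initial-team model could be written with PuLP. It is a simplified illustration, not the code from the repo: the player dictionary format, the function name pick_squad, and the 100.0 budget default are assumptions, the budget check ignores selling prices when making transfers, and the second goalkeeper is simply the squad player left over after the XI and the 3 ranked outfield subs are chosen. Passing current_squad with max_changes=1 adds the transfer constraint described above.

```python
import pulp

SQUAD_SIZE = {"GK": 2, "DEF": 5, "MID": 5, "FWD": 3}
MIN_STARTERS = {"GK": 1, "DEF": 3, "MID": 2, "FWD": 1}
SUB_WEIGHTS = [0.20, 0.10, 0.05]  # share of time subs 1-3 are assumed to play
NUM_CAPTAINS = 2
CAPTAIN_EXTRA = 0.5  # each captain counts 1.5x over the 10 weeks


def pick_squad(players, budget=100.0, max_per_club=3, current_squad=None, max_changes=1):
    """players: {id: {"pos", "club", "cost", "points"}}; points = 10-week prediction."""
    ids = list(players)
    pts = {i: players[i]["points"] for i in ids}
    prob = pulp.LpProblem("fpl_squad", pulp.LpMaximize)
    squad = pulp.LpVariable.dicts("squad", ids, cat="Binary")
    start = pulp.LpVariable.dicts("start", ids, cat="Binary")
    capt = pulp.LpVariable.dicts("capt", ids, cat="Binary")
    subs = [pulp.LpVariable.dicts(f"sub{k}", ids, cat="Binary") for k in range(len(SUB_WEIGHTS))]

    # Objective: starters count fully, captains get an extra 0.5x, subs are discounted
    prob += pulp.lpSum(
        pts[i] * (start[i] + CAPTAIN_EXTRA * capt[i]
                  + pulp.lpSum(w * sub[i] for w, sub in zip(SUB_WEIGHTS, subs)))
        for i in ids
    )

    for i in ids:
        # Each squad player has at most one role (starter or one ranked sub slot)
        prob += start[i] + pulp.lpSum(sub[i] for sub in subs) <= squad[i]
        prob += capt[i] <= start[i]  # captains must start
        if players[i]["pos"] == "GK":
            for sub in subs:
                prob += sub[i] == 0  # ranked subs are outfield players

    prob += pulp.lpSum(squad[i] for i in ids) == sum(SQUAD_SIZE.values())
    prob += pulp.lpSum(start[i] for i in ids) == 11
    prob += pulp.lpSum(capt[i] for i in ids) == NUM_CAPTAINS
    for sub in subs:
        prob += pulp.lpSum(sub[i] for i in ids) == 1  # each sub slot is filled once
    for pos, size in SQUAD_SIZE.items():
        in_pos = [i for i in ids if players[i]["pos"] == pos]
        prob += pulp.lpSum(squad[i] for i in in_pos) == size
        prob += pulp.lpSum(start[i] for i in in_pos) >= MIN_STARTERS[pos]
    # Exactly one starting goalkeeper
    prob += pulp.lpSum(start[i] for i in ids if players[i]["pos"] == "GK") == 1

    prob += pulp.lpSum(players[i]["cost"] * squad[i] for i in ids) <= budget
    for club in {p["club"] for p in players.values()}:
        prob += pulp.lpSum(squad[i] for i in ids if players[i]["club"] == club) <= max_per_club

    if current_squad is not None:
        # Transfers: keep all but max_changes of the current players
        prob += pulp.lpSum(squad[i] for i in current_squad) >= len(current_squad) - max_changes

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    if pulp.LpStatus[prob.status] != "Optimal":
        raise ValueError(f"No feasible squad: {pulp.LpStatus[prob.status]}")

    def chosen(var):
        return [i for i in ids if (var[i].value() or 0) > 0.5]

    return {
        "squad": chosen(squad),
        "starters": chosen(start),
        "captains": chosen(capt),
        "subs": [chosen(sub)[0] for sub in subs],  # ordered sub 1, 2, 3
    }
```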

Limitations/things to be improved/modifications

If you want to track my progress, I have an FPL team (team ID 7742703) that I plan to keep updating with the algorithm’s choices as the season goes on. My hope is to iron out the kinks for next season. I left out a number of details because this post is already fairly long, but feel free to check out the GitHub repo and leave any feedback/questions.