Team Gaming Rankings
Ranking My Friends on Generals
During the pandemic, I started playing a decade-old game called Command & Conquer Generals: Zero Hour with my brother and a bunch of friends.
Out of curiosity, we started keeping track of the stats from every game, showing wins and losses for each person.
Since this is a team game, though, pulling out an individual’s overall skill and measuring how much better one person is than another is a little trickier.
For this project, I wanted to ingest the stats, create a model for predicting each person’s relative strength, and end up with a score that could be used to create more even teams for future games.
The scores are stored in a Google Sheet located here.
The output Tableau dashboard can be found here, or embedded at the end of this page.
#import all the libraries we'll use
import math
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from numpy import mean, std
import matplotlib.pyplot as plt
from itertools import combinations
import pickle
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
RANDOM_STATE = 42
MAX_NUMBER_OF_GAMES = 25
The games are stored in a Google Sheet, so the following functions ingest the most up-to-date data.
def gsheet_api_check(SCOPES):
    creds = None
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)
    return creds
def pull_sheet_data(SCOPES, SPREADSHEET_ID, DATA_TO_PULL):
    creds = gsheet_api_check(SCOPES)
    service = build('sheets', 'v4', credentials=creds)
    sheet = service.spreadsheets()
    result = sheet.values().get(
        spreadsheetId=SPREADSHEET_ID,
        range=DATA_TO_PULL).execute()
    values = result.get('values', [])
    if not values:
        print('No data found.')
        return []
    print("COMPLETE: Data copied")
    return values
def push_sheet_data(SCOPES, SPREADSHEET_ID, RANGE, DATA_TO_PUSH):
    creds = gsheet_api_check(SCOPES)
    service = build('sheets', 'v4', credentials=creds)
    sheet = service.spreadsheets()
    body = {'values': DATA_TO_PUSH}
    result = sheet.values().update(
        spreadsheetId=SPREADSHEET_ID, range=RANGE,
        valueInputOption='USER_ENTERED', body=body).execute()
    data = result.get('updatedCells')
    print('{0} cells updated.'.format(data))
    return data
Import Data from Google Sheets
# If modifying these scopes, delete the file token.pickle.
#SCOPES = ['https://www.googleapis.com/auth/spreadsheets.readonly']
SCOPES = ['https://www.googleapis.com/auth/spreadsheets']
# The ID and range of a sample spreadsheet.
SPREADSHEET_ID = '1ks6mqMbTgVFkQE-rZDByKVnGH4WMRMOdLSdabeMfZaA'
#Pulls data from the entire spreadsheet tab.
#DATA_TO_PULL = 'Games'
#or
#Pulls data only from the specified range of cells.
DATA_TO_PULL = 'Games!A1:Q4000'
data = pull_sheet_data(SCOPES,SPREADSHEET_ID,DATA_TO_PULL)
games = pd.DataFrame(data[1:], columns=data[0])
games = games.set_index('Index',drop=True)
numeric_columns = ['Team', 'Win', 'Game', 'Units Created',
                   'Units Lost', 'Units Destroyed', 'Buildings Constructed',
                   'Buildings Lost', 'Buildings Destroyed', 'Supplies Collected', 'Rank',
                   'Inverse Rank', 'Normalized Rank']
for col in numeric_columns:
    games[col] = pd.to_numeric(games[col])
games['Date'] = pd.to_datetime(games['Date'])
games.head()
COMPLETE: Data copied
Date | Name | Faction | Team | Win | Game | Units Created | Units Lost | Units Destroyed | Buildings Constructed | Buildings Lost | Buildings Destroyed | Supplies Collected | Rank | Inverse Rank | Normalized Rank | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Index | ||||||||||||||||
1 | 2021-02-03 | Matt | USA | 1.0 | 1.0 | 1 | 216.0 | 195.0 | 140.0 | 47.0 | 6.0 | 14.0 | 360860.0 | 1.0 | 7.0 | 0.125 |
2 | 2021-02-03 | Skippy | USA | 2.0 | NaN | 1 | 129.0 | 132.0 | 115.0 | 63.0 | 12.0 | 9.0 | 260440.0 | 2.0 | 6.0 | 0.250 |
3 | 2021-02-03 | Neo | China | 1.0 | 1.0 | 1 | 175.0 | 83.0 | 83.0 | 53.0 | 12.0 | 8.0 | 233450.0 | 3.0 | 5.0 | 0.375 |
4 | 2021-02-03 | TVH | USA | 1.0 | 1.0 | 1 | 98.0 | 93.0 | 93.0 | 47.0 | 11.0 | 8.0 | 211565.0 | 4.0 | 4.0 | 0.500 |
5 | 2021-02-03 | Pancake | China | 2.0 | NaN | 1 | 122.0 | 64.0 | 137.0 | 40.0 | 11.0 | 6.0 | 192521.0 | 5.0 | 3.0 | 0.625 |
Each row has the player’s name (which must be consistent across the dataset), the faction they played (USA, GLA, or China), their team number (which, by convention, is 1 for the winning team), whether they won (1 if so, blank if not), stats from the end-game screen, and then three ways of calculating rank.
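The three rank columns appear to be simple transforms of one another. Here’s a minimal sketch of the relationship as I infer it from the first few rows (the formulas are my assumption, not something stored in the sheet):
#Sketch of how the three rank columns appear to relate (inferred from the sample rows)
n_players = 8                        #e.g., the first game recorded eight players
rank = 1                             #1 is the best finisher in that game
inverse_rank = n_players - rank      #7: flipped so that higher is better
normalized_rank = rank / n_players   #0.125: comparable across games of different sizes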
For basic familiarization, here are some summary statistics from the columns:
games.describe()
Team | Win | Game | Units Created | Units Lost | Units Destroyed | Buildings Constructed | Buildings Lost | Buildings Destroyed | Supplies Collected | Rank | Inverse Rank | Normalized Rank | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 1962.000000 | 961.0 | 2017.000000 | 1992.000000 | 1992.000000 | 1992.000000 | 1992.000000 | 1992.000000 | 1992.000000 | 1.992000e+03 | 1998.000000 | 1998.000000 | 1998.000000 |
mean | 1.544852 | 1.0 | 167.371344 | 130.930723 | 124.909137 | 116.313755 | 32.239960 | 13.498996 | 13.303715 | 1.515431e+05 | 3.644645 | 4.355355 | 0.589339 |
std | 0.593417 | 0.0 | 107.399041 | 109.309164 | 136.530457 | 132.316169 | 22.944272 | 16.695097 | 17.853459 | 1.763671e+05 | 2.037910 | 2.037910 | 0.286096 |
min | 1.000000 | 1.0 | 1.000000 | 1.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000e+00 | 1.000000 | 0.000000 | 0.125000 |
25% | 1.000000 | 1.0 | 75.000000 | 58.000000 | 43.000000 | 35.000000 | 17.000000 | 2.000000 | 2.000000 | 5.753900e+04 | 2.000000 | 3.000000 | 0.333333 |
50% | 2.000000 | 1.0 | 154.000000 | 96.000000 | 85.000000 | 78.000000 | 27.000000 | 9.000000 | 7.000000 | 9.960250e+04 | 3.000000 | 5.000000 | 0.600000 |
75% | 2.000000 | 1.0 | 265.000000 | 169.000000 | 163.000000 | 149.000000 | 39.000000 | 19.000000 | 19.000000 | 1.808620e+05 | 5.000000 | 6.000000 | 0.833333 |
max | 5.000000 | 1.0 | 357.000000 | 1038.000000 | 2108.000000 | 1953.000000 | 221.000000 | 159.000000 | 226.000000 | 3.142320e+06 | 8.000000 | 7.000000 | 1.000000 |
Basic Cleaning
Some rows of data need cleaning. The main approach is to drop the problem games entirely, since we have enough data regardless.
#Remove any row that doesn't have an index or a game number
games = games[(~games.index.isna()) & (~games.Game.isna())].copy()
#Cast the Game column to integers instead of floats since we'll use it to build ranges
games.loc[:, 'Game'] = games.Game.astype(np.int32)
#Remove any game where team data isn't present and keep only the columns up to Normalized Rank
games = games.loc[~games.Team.isna(), games.columns[0:16]].copy()
#Wins are recorded with a 1 and losses are left blank (NA)
#Fill losses with 0, otherwise they would throw off our average win calculations
games['Win'] = games['Win'].fillna(0)
#For every person, calculate their win ratio and average normalized rank
for name in games.Name.unique():
    games.loc[games.Name == name, 'Win Ratio'] = games.loc[games.Name == name, 'Win'].mean()
    games.loc[games.Name == name, 'Avg Rank'] = games.loc[games.Name == name, 'Normalized Rank'].mean()
#Team 1 designates the winning team elsewhere (to see which people win and lose together),
#but we need to shuffle the team labels here, or the predictive model could use them to cheat
for i in range(int(games.Game.max() + 1)):
    #randomize which team gets label 0 and which gets label 1 for each game
    team_1 = np.random.choice([0, 1])
    team_2 = 1 - team_1
    games.loc[(games.Game == i) & (games.Team == 1), 'Team'] = team_1
    games.loc[(games.Game == i) & (games.Team == 2), 'Team'] = team_2
games.head()
Date | Name | Faction | Team | Win | Game | Units Created | Units Lost | Units Destroyed | Buildings Constructed | Buildings Lost | Buildings Destroyed | Supplies Collected | Rank | Inverse Rank | Normalized Rank | Win Ratio | Avg Rank | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Index | ||||||||||||||||||
1 | 2021-02-03 | Matt | USA | 0.0 | 1.0 | 1 | 216.0 | 195.0 | 140.0 | 47.0 | 6.0 | 14.0 | 360860.0 | 1.0 | 7.0 | 0.125 | 0.606383 | 0.470567 |
2 | 2021-02-03 | Skippy | USA | 1.0 | 0.0 | 1 | 129.0 | 132.0 | 115.0 | 63.0 | 12.0 | 9.0 | 260440.0 | 2.0 | 6.0 | 0.250 | 0.456000 | 0.725281 |
3 | 2021-02-03 | Neo | China | 0.0 | 1.0 | 1 | 175.0 | 83.0 | 83.0 | 53.0 | 12.0 | 8.0 | 233450.0 | 3.0 | 5.0 | 0.375 | 0.521739 | 0.439990 |
4 | 2021-02-03 | TVH | USA | 0.0 | 1.0 | 1 | 98.0 | 93.0 | 93.0 | 47.0 | 11.0 | 8.0 | 211565.0 | 4.0 | 4.0 | 0.500 | 0.439024 | 0.708537 |
5 | 2021-02-03 | Pancake | China | 1.0 | 0.0 | 1 | 122.0 | 64.0 | 137.0 | 40.0 | 11.0 | 6.0 | 192521.0 | 5.0 | 3.0 | 0.625 | 0.450331 | 0.618046 |
Create New Training Data
We’re now going to create synthetic training data. For each game, we’ll take each player’s stats over their previous N games (N = MAX_NUMBER_OF_GAMES, set to 25 above), average them, and use that as the player’s nominal stats for that game. Then we pretend the two teams played each other and have the logistic regression model predict which team wins.
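Here’s the trailing-window idea in miniature, using a made-up win history rather than the real data:
#a player's trailing form over their last N games
import pandas as pd

wins = pd.Series([1, 0, 1, 1, 0, 1])          #hypothetical win history, oldest game first
N = 3
print(wins.tail(N).mean())                    #form over just the last 3 games: ~0.667
print(wins.rolling(N, min_periods=1).mean())  #the same trailing stat evaluated at every game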
#Create a new column that contains the winning team
df = pd.DataFrame(games.loc[games.Win == 1, :].groupby('Game').mean().Team.astype(np.int32))
df.columns = ['Winning_Team']
#We'll use this to segment out which columns to use for predicting the winning team
prediction_columns = ['Units Created',
                      'Units Lost',
                      'Units Destroyed',
                      'Buildings Constructed',
                      'Buildings Lost',
                      'Buildings Destroyed',
                      'Supplies Collected',
                      'Avg Rank',
                      'Win Ratio']
games_copy = games.copy()
df.head()
Winning_Team | |
---|---|
Game | |
1 | 0 |
2 | 1 |
3 | 1 |
4 | 1 |
5 | 1 |
#for each player and each game, compute their average win and rank stats over the previous N games
for i in range(int(games_copy.Game.max() + 1)):
    for name in games_copy.loc[games_copy.Game == i, 'Name'].unique():
        name_bool = games_copy.Name == name
        game_bool = games_copy.Game == i
        games_copy.loc[(name_bool) & (game_bool), 'Win Ratio'] = games_copy.loc[(name_bool) & (games_copy.Game <= i), 'Win'].tail(MAX_NUMBER_OF_GAMES).mean()
        games_copy.loc[(name_bool) & (game_bool), 'Avg Rank'] = games_copy.loc[(name_bool) & (games_copy.Game <= i), 'Normalized Rank'].tail(MAX_NUMBER_OF_GAMES).mean()
games_copy.head()
Date | Name | Faction | Team | Win | Game | Units Created | Units Lost | Units Destroyed | Buildings Constructed | Buildings Lost | Buildings Destroyed | Supplies Collected | Rank | Inverse Rank | Normalized Rank | Win Ratio | Avg Rank | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Index | ||||||||||||||||||
1 | 2021-02-03 | Matt | USA | 0.0 | 1.0 | 1 | 216.0 | 195.0 | 140.0 | 47.0 | 6.0 | 14.0 | 360860.0 | 1.0 | 7.0 | 0.125 | 1.0 | 0.125 |
2 | 2021-02-03 | Skippy | USA | 1.0 | 0.0 | 1 | 129.0 | 132.0 | 115.0 | 63.0 | 12.0 | 9.0 | 260440.0 | 2.0 | 6.0 | 0.250 | 0.0 | 0.250 |
3 | 2021-02-03 | Neo | China | 0.0 | 1.0 | 1 | 175.0 | 83.0 | 83.0 | 53.0 | 12.0 | 8.0 | 233450.0 | 3.0 | 5.0 | 0.375 | 1.0 | 0.375 |
4 | 2021-02-03 | TVH | USA | 0.0 | 1.0 | 1 | 98.0 | 93.0 | 93.0 | 47.0 | 11.0 | 8.0 | 211565.0 | 4.0 | 4.0 | 0.500 | 1.0 | 0.500 |
5 | 2021-02-03 | Pancake | China | 1.0 | 0.0 | 1 | 122.0 | 64.0 | 137.0 | 40.0 | 11.0 | 6.0 | 192521.0 | 5.0 | 3.0 | 0.625 | 0.0 | 0.625 |
Reduce each game to a single row of stats
- To predict each game, we sum the stats for each team, then take the difference (see the toy example after this list).
- A mostly negative difference indicates that team 0 had the stronger stats; a mostly positive one favors team 1.
- Most of the stats should be summed; the exceptions are the game, team, and win columns, which are averaged.
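Here’s a toy version of the difference feature for a single stat, with made-up numbers and hypothetical two-player teams:
#toy version of the team-difference feature for one stat
import pandas as pd

team_1 = pd.Series({'Units Destroyed': 120 + 80})   #team 1's summed stat: 200
team_0 = pd.Series({'Units Destroyed': 150 + 140})  #team 0's summed stat: 290
print(team_1 - team_0)                              #-90: the negative sign leans toward team 0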
#aggregate each team's per-game stats, then take the team 1 minus team 0 difference
agg_dict = {'Team': 'mean',
            'Win': 'mean',
            'Game': 'mean',
            'Units Created': 'sum',
            'Units Lost': 'sum',
            'Units Destroyed': 'sum',
            'Buildings Constructed': 'sum',
            'Buildings Lost': 'sum',
            'Buildings Destroyed': 'sum',
            'Supplies Collected': 'sum',
            'Rank': 'sum',
            'Inverse Rank': 'sum',    #not used
            'Normalized Rank': 'sum', #not used
            'Win Ratio': 'sum',
            'Avg Rank': 'sum'}
values = (games_copy.loc[games_copy.Team == 1, :].iloc[:, 3:].groupby('Game').agg(agg_dict)
          - games_copy.loc[games_copy.Team == 0, :].iloc[:, 3:].groupby('Game').agg(agg_dict))
#create the difference columns
diff_cols = []
for col in prediction_columns:
    column_name = col + '_diff'
    diff_cols += [column_name]
    #these raw differences aren't strictly needed anymore, but the winning team
    #label in df is still needed as the training target
    df.loc[:, column_name] = values.loc[:, col]
#create nominal game stats based on median stats for each player
predicted_games = []
#make stats for each game
for game in games_copy.Game.unique():
    team_values = []
    #make stats for each team
    for team in range(2):
        names = []
        #make stats for each player on this team
        for name in games_copy.loc[(games_copy.Game == game) & (games_copy.Team == team), 'Name'].values:
            name_stats = games_copy.loc[(games_copy.Name == name) & (games_copy.Game <= game)].tail(MAX_NUMBER_OF_GAMES).iloc[:, 6:].median()
            name_stats['Win Ratio'] = games_copy.loc[(games_copy.Name == name) & (games_copy.Game <= game), 'Win'].tail(MAX_NUMBER_OF_GAMES).mean()
            name_stats['Avg Rank'] = games_copy.loc[(games_copy.Name == name) & (games_copy.Game <= game), 'Normalized Rank'].tail(MAX_NUMBER_OF_GAMES).mean()
            names += [name_stats]
            names[-1].loc['Win_avg'] = games_copy.loc[(games_copy.Name == name) &
                                                      (~games_copy.Team.isna()) &
                                                      (games_copy.Game <= game)].Win.fillna(0).tail(MAX_NUMBER_OF_GAMES).mean()
        #combine all the medians and sum them together
        #Summing works better than an average or median: if the teams have uneven numbers
        #of players, the weight falls on the side with more players
        team_values += [pd.concat(names, axis=1).T.sum()]
    predicted_games += [team_values[1] - team_values[0]]
X_generated = pd.concat(predicted_games, axis=1).T[prediction_columns]
X_generated.columns = diff_cols
X_generated.head()
Units Created_diff | Units Lost_diff | Units Destroyed_diff | Buildings Constructed_diff | Buildings Lost_diff | Buildings Destroyed_diff | Supplies Collected_diff | Avg Rank_diff | Win Ratio_diff | |
---|---|---|---|---|---|---|---|---|---|
0 | -90.0 | -79.0 | 75.0 | 4.0 | 0.0 | -1.0 | -99050.0 | 0.500000 | -4.000000 |
1 | 199.0 | 251.5 | 65.0 | 57.5 | -13.0 | 45.0 | 378716.0 | -0.229167 | 2.500000 |
2 | 19.0 | 50.0 | -96.5 | 64.5 | -31.5 | 25.5 | 81672.5 | -0.638889 | 2.166667 |
3 | 208.0 | 202.0 | 15.0 | 54.0 | 58.5 | -37.0 | 173173.5 | 0.173611 | 1.083333 |
4 | 95.5 | 162.5 | 56.5 | 44.5 | 22.5 | 12.0 | 350781.5 | -1.067361 | 1.633333 |
Create the Training Data
y_cols= ['Winning_Team']
X = X_generated
y = np.ravel(df[y_cols])
y[0:5]
array([0, 1, 1, 1, 1])
Gridsearch for parameters
The following is how I decided which solver algorithm and inverse regularization strength C to use for the logistic regression part of the model.
solvers = ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']
Cs = [1, 3, 10, 30, 100]
parameters = {'logisticregression__solver': solvers, 'logisticregression__C': Cs}
#standard test/train split should hold back 20-30% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
#cross validation
cv = KFold(n_splits=10, random_state=10, shuffle=True)
#create model
scaler = StandardScaler()
logreg = LogisticRegression(random_state=RANDOM_STATE)
#simple pipeline of normalizing all the stats then applying logistic regression
pipe = make_pipeline(scaler, logreg)
clf = GridSearchCV(pipe, parameters, cv=cv)
clf.fit(X, y)
#make a heatmap of the results
def make_heatmap(ax, gs):
    """Helper to make a heatmap of grid search scores."""
    results = pd.DataFrame.from_dict(gs.cv_results_)
    results['params_str'] = results.params.apply(str)
    scores_matrix = results.pivot(index='param_logisticregression__solver',
                                  columns='param_logisticregression__C',
                                  values='mean_test_score')
    im = ax.imshow(scores_matrix)
    ax.set_xticks(np.arange(len(Cs)))
    ax.set_xticklabels([x for x in Cs])
    ax.set_xlabel('C', fontsize=15)
    ax.set_yticks(np.arange(len(solvers)))
    ax.set_yticklabels([x for x in solvers])
    ax.set_ylabel('solver', fontsize=15)
    #Rotate the tick labels and set their alignment
    plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
             rotation_mode="anchor")
    fig.subplots_adjust(right=0.8)
    cbar_ax = fig.add_axes([0.85, 0.15, 0.05, 0.7])
    fig.colorbar(im, cax=cbar_ax)
    cbar_ax.set_ylabel('Mean Test Score', rotation=-90, va="bottom",
                       fontsize=15)
fig, axes = plt.subplots(ncols=1, sharey=True)
ax2 = axes
make_heatmap(ax2, clf)
ax2.set_title('GridSearch', fontsize=15)
plt.show()
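Alongside the heatmap, GridSearchCV also exposes the winning combination directly, which is a quicker check than reading the plot (the printed values here are illustrative):
#the best parameter combination and its mean cross-validated score
print(clf.best_params_)  #e.g., {'logisticregression__C': 3, 'logisticregression__solver': 'lbfgs'}
print(clf.best_score_)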
Prediction Test
Create a logistic regression model with test and training splits, and 10 cross-validation folds for determining accuracy.
Also, since the supplies column is so much larger than the rest of the variables, we normalize the data set with StandardScaler, which standardizes each column (subtract the mean, divide by the standard deviation) so every feature is on a comparable scale.
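As a minimal sketch of what StandardScaler does, here’s a toy supplies column (made-up values, not the project data):
#StandardScaler in miniature: each value becomes (x - mean) / std
import numpy as np
from sklearn.preprocessing import StandardScaler

supplies = np.array([[50000.], [100000.], [300000.]])  #toy supply totals
scaled = StandardScaler().fit_transform(supplies)
print(scaled.ravel())  #approximately [-0.93, -0.46, 1.39]: mean 0, unit variance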
#standard test/train split should hold back 20-30% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
#cross validation
cv = KFold(n_splits=10, random_state=10, shuffle=True)
#create model with the parameters chosen by the grid search
scaler = StandardScaler()
logreg = LogisticRegression(random_state=RANDOM_STATE, C=3, solver='lbfgs')
#simple pipeline of normalizing all the stats then applying logistic regression
pipe = make_pipeline(scaler, logreg)
pipe.fit(X_train, y_train)  #fit with scaling on the training data
pipe.fit(X, y)              #refit on the full data set for the final model
Pipeline(steps=[('standardscaler', StandardScaler()),
('logisticregression',
LogisticRegression(C=3, random_state=42))])
#score it
scores = cross_val_score(pipe, X, y, scoring = 'accuracy', cv=cv, n_jobs = -1)
print('10-fold cross validation accuracy: %.3f (%.3f stdev)' % (mean(scores), std(scores)))
plt.hist(scores)
plt.xlabel("Scores")
plt.show()
10-fold cross validation accuracy: 0.761 (0.075 stdev)
For this run, we got a 10-fold cross-validation accuracy of about 76%, which is typical for this process.
Relative importance of each feature
fig, ax = plt.subplots()
ax.bar(x = np.arange(len(pipe.steps[1][1].coef_[0])), height = -(pipe.steps[1][1].coef_[0]))
ax.set_xticks(np.arange(len(prediction_columns)))
ax.set_xticklabels(prediction_columns)
plt.xticks(rotation=-90)
plt.title('Feature Importance')
plt.show()
We can see that the model rewards Win Ratio heavily: if a person has been winning a lot recently, it guesses they’ll continue to win.
There’s also some reward for losing a lot of units while also destroying units, and a smaller reward for creating a lot of units and collecting a ton of supplies.
Example ranking
Let’s see what the odds would be now for one team to play against another. For instance, how likely is it that a team of Neo, Shift, Matt, and Skippy can fend off four hard AI armies?
#chances of one team against another
first_team = ['Neo', 'Shift', 'Matt', 'Skippy']
second_team = ['Hard', 'Hard', 'Hard', 'Hard']
test_team = [first_team, second_team]
team_values = []
for team in range(2):
    names = []
    for name in test_team[team]:
        #make sure we're getting correct name inputs
        if name not in games.Name.unique():
            print(name + ' not found')
            break
        #each person's nominal stats from their last N games
        name_stats = games_copy.loc[games_copy.Name == name].tail(MAX_NUMBER_OF_GAMES).iloc[:, 6:].mean()
        name_stats['Win Ratio'] = games_copy.loc[games_copy.Name == name, 'Win'].tail(MAX_NUMBER_OF_GAMES).mean()
        name_stats['Avg Rank'] = games_copy.loc[games_copy.Name == name, 'Normalized Rank'].tail(MAX_NUMBER_OF_GAMES).mean()
        names += [name_stats]
        names[-1].loc['Win_avg'] = games_copy.loc[(games_copy.Name == name) &
                                                  (~games_copy.Team.isna())].Win.fillna(0).tail(MAX_NUMBER_OF_GAMES).mean()
    team_values += [pd.concat(names, axis=1).T.sum()]
predicted_games = [team_values[1] - team_values[0]]
X_predict = pd.concat(predicted_games, axis=1).T[prediction_columns]
predicted_win = pipe.predict(X_predict)[0]
probability = pipe.predict_proba(X_predict)[0][predicted_win]
print('Between ' + ', '.join(first_team) + ' and ' + ', '.join(second_team) + ',\nModel predicts ' + ', '.join(test_team[predicted_win]) + ' with a ' + "{:.2%}".format(probability) + ' chance')
Between Neo, Shift, Matt, Skippy and Hard, Hard, Hard, Hard,
Model predicts Neo, Shift, Matt, Skippy with a 83.50% chance
Team Generator
What we’re really interested in, though, is making even teams given a list of contestants.
We’ll use the combinations function from itertools to generate all possible teams that take half the players. For 8 players, there are 70 unique combinations, but each of those teams has a complement made up of the other four players. That means there are effectively 35 distinct ways to split the players into two teams; a quick sketch of this follows.
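Here’s a sketch verifying that count, plus the complement pairing the team generator loop relies on, using eight distinct labels (our real roster repeats 'Hard', so its combination list contains duplicates):
#verify the counting argument and the complement pairing for 8 distinct players
from itertools import combinations

players = list('ABCDEFGH')
combos = list(combinations(players, 4))
print(len(combos))  #70 ways to pick half the players
#in lexicographic order, combo i and combo (len(combos) - 1 - i) are complements,
#so pairing the first 35 with the last 35 covers every two-team split exactly once
print(set(combos[0]) | set(combos[-1]) == set(players))  #True
print(set(combos[1]) | set(combos[-2]) == set(players))  #True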
#all possibilities
all_names = ['Hard', 'Hard', 'Hard', 'Hard', 'Neo', 'Shift','Matt', 'pcap']
#how many combos are possible?
possible_combos = list(combinations(all_names,int(len(all_names)/2)))
non_parity_combos= int(len(possible_combos)/2)
possible_combos
[('Hard', 'Hard', 'Hard', 'Hard'),
('Hard', 'Hard', 'Hard', 'Neo'),
('Hard', 'Hard', 'Hard', 'Shift'),
('Hard', 'Hard', 'Hard', 'Matt'),
('Hard', 'Hard', 'Hard', 'pcap'),
('Hard', 'Hard', 'Hard', 'Neo'),
('Hard', 'Hard', 'Hard', 'Shift'),
('Hard', 'Hard', 'Hard', 'Matt'),
('Hard', 'Hard', 'Hard', 'pcap'),
('Hard', 'Hard', 'Neo', 'Shift'),
('Hard', 'Hard', 'Neo', 'Matt'),
('Hard', 'Hard', 'Neo', 'pcap'),
('Hard', 'Hard', 'Shift', 'Matt'),
('Hard', 'Hard', 'Shift', 'pcap'),
('Hard', 'Hard', 'Matt', 'pcap'),
('Hard', 'Hard', 'Hard', 'Neo'),
('Hard', 'Hard', 'Hard', 'Shift'),
('Hard', 'Hard', 'Hard', 'Matt'),
('Hard', 'Hard', 'Hard', 'pcap'),
('Hard', 'Hard', 'Neo', 'Shift'),
('Hard', 'Hard', 'Neo', 'Matt'),
('Hard', 'Hard', 'Neo', 'pcap'),
('Hard', 'Hard', 'Shift', 'Matt'),
('Hard', 'Hard', 'Shift', 'pcap'),
('Hard', 'Hard', 'Matt', 'pcap'),
('Hard', 'Hard', 'Neo', 'Shift'),
('Hard', 'Hard', 'Neo', 'Matt'),
('Hard', 'Hard', 'Neo', 'pcap'),
('Hard', 'Hard', 'Shift', 'Matt'),
('Hard', 'Hard', 'Shift', 'pcap'),
('Hard', 'Hard', 'Matt', 'pcap'),
('Hard', 'Neo', 'Shift', 'Matt'),
('Hard', 'Neo', 'Shift', 'pcap'),
('Hard', 'Neo', 'Matt', 'pcap'),
('Hard', 'Shift', 'Matt', 'pcap'),
('Hard', 'Hard', 'Hard', 'Neo'),
('Hard', 'Hard', 'Hard', 'Shift'),
('Hard', 'Hard', 'Hard', 'Matt'),
('Hard', 'Hard', 'Hard', 'pcap'),
('Hard', 'Hard', 'Neo', 'Shift'),
('Hard', 'Hard', 'Neo', 'Matt'),
('Hard', 'Hard', 'Neo', 'pcap'),
('Hard', 'Hard', 'Shift', 'Matt'),
('Hard', 'Hard', 'Shift', 'pcap'),
('Hard', 'Hard', 'Matt', 'pcap'),
('Hard', 'Hard', 'Neo', 'Shift'),
('Hard', 'Hard', 'Neo', 'Matt'),
('Hard', 'Hard', 'Neo', 'pcap'),
('Hard', 'Hard', 'Shift', 'Matt'),
('Hard', 'Hard', 'Shift', 'pcap'),
('Hard', 'Hard', 'Matt', 'pcap'),
('Hard', 'Neo', 'Shift', 'Matt'),
('Hard', 'Neo', 'Shift', 'pcap'),
('Hard', 'Neo', 'Matt', 'pcap'),
('Hard', 'Shift', 'Matt', 'pcap'),
('Hard', 'Hard', 'Neo', 'Shift'),
('Hard', 'Hard', 'Neo', 'Matt'),
('Hard', 'Hard', 'Neo', 'pcap'),
('Hard', 'Hard', 'Shift', 'Matt'),
('Hard', 'Hard', 'Shift', 'pcap'),
('Hard', 'Hard', 'Matt', 'pcap'),
('Hard', 'Neo', 'Shift', 'Matt'),
('Hard', 'Neo', 'Shift', 'pcap'),
('Hard', 'Neo', 'Matt', 'pcap'),
('Hard', 'Shift', 'Matt', 'pcap'),
('Hard', 'Neo', 'Shift', 'Matt'),
('Hard', 'Neo', 'Shift', 'pcap'),
('Hard', 'Neo', 'Matt', 'pcap'),
('Hard', 'Shift', 'Matt', 'pcap'),
('Neo', 'Shift', 'Matt', 'pcap')]
Now, for each unique combination, we’ll create the two teams and record which team is predicted to win, and with what probability.
possibilities = []
for combo_index in range(non_parity_combos):
    #create the two teams for this unique combination
    first_team = list(possible_combos[combo_index])
    second_team = list(possible_combos[2*non_parity_combos - combo_index - 1])
    test_team = [first_team, second_team]
    #holders for each team's stats
    team_values = []
    for team in range(2):
        names = []
        for name in test_team[team]:
            #each person's stats from the last N games
            names += [games.loc[games.Name == name].tail(MAX_NUMBER_OF_GAMES).iloc[:, 6:].mean()]
            #except for the win average, which needs extra handling for blank spots
            names[-1].loc['Win_avg'] = games.loc[(games.Name == name) & (~games.Team.isna())].Win.fillna(0).tail(MAX_NUMBER_OF_GAMES).mean()
        #combine each team member's stats into a single row for that team
        team_values += [pd.concat(names, axis=1).T.sum()]
    predicted_game = [team_values[1] - team_values[0]]
    X_predict = pd.concat(predicted_game, axis=1).T[prediction_columns]
    team_predicted_to_win = pipe.predict(X_predict)[0]
    probability_of_winning = pipe.predict_proba(X_predict)[0][team_predicted_to_win]
    possibilities += [{"Team 1": first_team, "Team 2": second_team,
                       "Predicted Team": team_predicted_to_win + 1,
                       "Probability": probability_of_winning}]
all_runs = pd.DataFrame(possibilities)
all_runs
Team 1 | Team 2 | Predicted Team | Probability | |
---|---|---|---|---|
0 | [Hard, Hard, Hard, Hard] | [Neo, Shift, Matt, pcap] | 2 | 0.965438 |
1 | [Hard, Hard, Hard, Neo] | [Hard, Shift, Matt, pcap] | 2 | 0.590536 |
2 | [Hard, Hard, Hard, Shift] | [Hard, Neo, Matt, pcap] | 2 | 0.594433 |
3 | [Hard, Hard, Hard, Matt] | [Hard, Neo, Shift, pcap] | 2 | 0.499961 |
4 | [Hard, Hard, Hard, pcap] | [Hard, Neo, Shift, Matt] | 2 | 0.996667 |
5 | [Hard, Hard, Hard, Neo] | [Hard, Shift, Matt, pcap] | 2 | 0.590536 |
6 | [Hard, Hard, Hard, Shift] | [Hard, Neo, Matt, pcap] | 2 | 0.594433 |
7 | [Hard, Hard, Hard, Matt] | [Hard, Neo, Shift, pcap] | 2 | 0.499961 |
8 | [Hard, Hard, Hard, pcap] | [Hard, Neo, Shift, Matt] | 2 | 0.996667 |
9 | [Hard, Hard, Neo, Shift] | [Hard, Hard, Matt, pcap] | 1 | 0.908797 |
10 | [Hard, Hard, Neo, Matt] | [Hard, Hard, Shift, pcap] | 1 | 0.940660 |
11 | [Hard, Hard, Neo, pcap] | [Hard, Hard, Shift, Matt] | 2 | 0.939488 |
12 | [Hard, Hard, Shift, Matt] | [Hard, Hard, Neo, pcap] | 1 | 0.944947 |
13 | [Hard, Hard, Shift, pcap] | [Hard, Hard, Neo, Matt] | 2 | 0.940093 |
14 | [Hard, Hard, Matt, pcap] | [Hard, Hard, Neo, Shift] | 2 | 0.914317 |
15 | [Hard, Hard, Hard, Neo] | [Hard, Shift, Matt, pcap] | 2 | 0.590536 |
16 | [Hard, Hard, Hard, Shift] | [Hard, Neo, Matt, pcap] | 2 | 0.594433 |
17 | [Hard, Hard, Hard, Matt] | [Hard, Neo, Shift, pcap] | 2 | 0.499961 |
18 | [Hard, Hard, Hard, pcap] | [Hard, Neo, Shift, Matt] | 2 | 0.996667 |
19 | [Hard, Hard, Neo, Shift] | [Hard, Hard, Matt, pcap] | 1 | 0.908797 |
20 | [Hard, Hard, Neo, Matt] | [Hard, Hard, Shift, pcap] | 1 | 0.940660 |
21 | [Hard, Hard, Neo, pcap] | [Hard, Hard, Shift, Matt] | 2 | 0.939488 |
22 | [Hard, Hard, Shift, Matt] | [Hard, Hard, Neo, pcap] | 1 | 0.944947 |
23 | [Hard, Hard, Shift, pcap] | [Hard, Hard, Neo, Matt] | 2 | 0.940093 |
24 | [Hard, Hard, Matt, pcap] | [Hard, Hard, Neo, Shift] | 2 | 0.914317 |
25 | [Hard, Hard, Neo, Shift] | [Hard, Hard, Matt, pcap] | 1 | 0.908797 |
26 | [Hard, Hard, Neo, Matt] | [Hard, Hard, Shift, pcap] | 1 | 0.940660 |
27 | [Hard, Hard, Neo, pcap] | [Hard, Hard, Shift, Matt] | 2 | 0.939488 |
28 | [Hard, Hard, Shift, Matt] | [Hard, Hard, Neo, pcap] | 1 | 0.944947 |
29 | [Hard, Hard, Shift, pcap] | [Hard, Hard, Neo, Matt] | 2 | 0.940093 |
30 | [Hard, Hard, Matt, pcap] | [Hard, Hard, Neo, Shift] | 2 | 0.914317 |
31 | [Hard, Neo, Shift, Matt] | [Hard, Hard, Hard, pcap] | 1 | 0.986479 |
32 | [Hard, Neo, Shift, pcap] | [Hard, Hard, Hard, Matt] | 1 | 0.545065 |
33 | [Hard, Neo, Matt, pcap] | [Hard, Hard, Hard, Shift] | 1 | 0.639590 |
34 | [Hard, Shift, Matt, pcap] | [Hard, Hard, Hard, Neo] | 1 | 0.639314 |
This gives us all the possible matchups, which team is predicted to win, and by how much.
All that’s left is to sort the matchups from the smallest chance of winning to the largest to get the most even teams. In other words, if the model has difficulty guessing which team would win, the teams are more evenly matched.
desired_team_bool = all_runs.Probability == all_runs.Probability.min()
first_team = all_runs.loc[desired_team_bool, 'Team 1'].values[0]
second_team = all_runs.loc[desired_team_bool, 'Team 2'].values[0]
team_predicted_to_win= all_runs.loc[desired_team_bool, 'Predicted Team'].values[0]-1
test_team = [first_team, second_team]
probability = all_runs.Probability.min()
print('\n\nFor '+ ', '.join(all_names)+',\nThe most even teams are '+ ', '.join(first_team)+' and ' + ', '.join(second_team)+ ',\nI predict ' + ', '.join(test_team[team_predicted_to_win]) + ' with a '+"{:.2%}".format(probability)+' chance')
For Hard, Hard, Hard, Hard, Neo, Shift, Matt, pcap,
The most even teams are Hard, Hard, Hard, Matt and Hard, Neo, Shift, pcap,
I predict Hard, Neo, Shift, pcap with a 50.00% chance
Update the Google Sheet with the team maker stats
This boils down to the model’s rating of each person: the ratings can simply be added together for each team, and Google Sheets or Tableau can use that sum to recreate the generated teams. Because logistic regression runs everything through the sigmoid function, the predicted probability shifts as individual ratings are added or subtracted, so finding even teams amounts to finding the split whose two rating sums are closest.
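Since the model is a logistic regression, the natural additive scale is log-odds rather than raw probability. Here’s a minimal sketch of that relationship with two hypothetical player ratings; it mirrors the commented-out -math.log((1 - p)/p) alternative rating in the cell below:
#log-odds make per-player ratings additive; the sigmoid maps the sum back to a probability
import math

def logit(p):
    return math.log(p / (1 - p))   #probability -> log-odds

def sigmoid(z):
    return 1 / (1 + math.exp(-z))  #log-odds -> probability

#two hypothetical player ratings expressed as probabilities
player_a, player_b = 0.95, 0.70
team_strength = sigmoid(logit(player_a) + logit(player_b))
print(team_strength)  #~0.98: adding ratings shifts the sigmoid, as described above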
#Player rankings
stats = {}
sheets_stats = [['Name', 'Predictive Rating']]
for name in games.Name.unique():
    #build a one-player "team" from this player's recent median stats
    name_stats = games_copy.loc[games_copy.Name == name].tail(MAX_NUMBER_OF_GAMES).iloc[:, 6:].median()
    name_stats['Win Ratio'] = games_copy.loc[games_copy.Name == name, 'Win'].tail(MAX_NUMBER_OF_GAMES).mean()
    name_stats['Avg Rank'] = games_copy.loc[games_copy.Name == name, 'Normalized Rank'].tail(MAX_NUMBER_OF_GAMES).mean()
    name_stats.loc['Win_avg'] = games_copy.loc[(games_copy.Name == name) &
                                               (~games_copy.Team.isna())].Win.fillna(0).tail(MAX_NUMBER_OF_GAMES).mean()
    X_predict = pd.concat([name_stats], axis=1).T[prediction_columns]
    probability = pipe.predict_proba(X_predict)[0][1]  #the player's side is always class 1
    #stats[name] = -math.log((1 - probability)/probability)  #alternative: a log-odds rating
    stats[name] = probability
    #sheets_stats += [[name, -math.log((1 - probability)/probability)]]
    sheets_stats += [[name, probability]]
    games.loc[games.Name == name, 'Predictive Rating'] = stats[name]
RANGE = "'Team Maker'!P1:Q" + str(len(sheets_stats))
push_sheet_data(SCOPES, SPREADSHEET_ID, RANGE, sheets_stats)
#make a quick sanity-check plot to ensure no one's rating is out of line
games.groupby('Name')['Predictive Rating'].max().sort_values().plot.barh()
plt.title('Ranking based on last '+str(MAX_NUMBER_OF_GAMES)+' games')
plt.show()
38 cells updated.
Update Tableau spreadsheet
The Tableau data source will have two tabs. One holds all the individual games, a straight copy of our games dataframe.
The other tab is a pre-calculated list of the possible team combinations for a 4v4, 3v3, or 2v2 game, encoded in binary. Both the Google Sheet and Tableau use these bit patterns to enumerate every name combination, build the resulting teams, and rank them (see the sketch below).
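As a sketch of the encoding, with a hypothetical roster and assuming the leftmost bit maps to the first player: each bit assigns one player to a side, so an 8-bit pattern with exactly four 1s defines a 4v4 split.
#how one 4v4 bit pattern splits eight players into two teams
players = ['P1', 'P2', 'P3', 'P4', 'P5', 'P6', 'P7', 'P8']  #hypothetical roster
pattern = '10000111'  #a 1 puts the player on one team, a 0 on the other
team_a = [p for p, bit in zip(players, pattern) if bit == '1']
team_b = [p for p, bit in zip(players, pattern) if bit == '0']
print(team_a, team_b)  #['P1', 'P6', 'P7', 'P8'] ['P2', 'P3', 'P4', 'P5']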
v2 = []
v3 = []
v4 = []
#Do 4v4 first since it will be the longest column; that way we don't have to error check
#the other two lists for rows with no names and ratings
for i in range(256):
    if i == 0:
        sheets_stats[0] += ['4v4']
    number = bin(i + 1).replace('0b', "")
    #keep 8-bit patterns with exactly four 1s: these are the 4v4 splits
    if (sum([int(x) for x in number]) == 4) & (len(number) == 8):
        v4 += [number]
        try:
            sheets_stats[len(v4)] += [number]
        except IndexError:
            #more patterns than named players, so pad with blank name/rating cells
            sheets_stats += [["", "", number]]
for i in range(64):
    if i == 0:
        sheets_stats[0] += ['3v3']
    number = bin(i + 1).replace('0b', "")
    if (sum([int(x) for x in number]) == 3) & (len(number) == 6):
        v3 += [number]
        sheets_stats[len(v3)] += [number]
for i in range(16):
    if i == 0:
        sheets_stats[0] += ['2v2']
    number = bin(i + 1).replace('0b', "")
    if (sum([int(x) for x in number]) == 2) & (len(number) == 4):
        v2 += [number]
        sheets_stats[len(v2)] += [number]
pd.DataFrame(sheets_stats[1:], columns=sheets_stats[0])
Name | Predictive Rating | 4v4 | 3v3 | 2v2 | |
---|---|---|---|---|---|
0 | Matt | 0.945633 | 10000111 | 100011 | 1001 |
1 | Skippy | 0.74034 | 10001011 | 100101 | 1010 |
2 | Neo | 0.872259 | 10001101 | 100110 | 1100 |
3 | TVH | 0.772892 | 10001110 | 101001 | None |
4 | Pancake | 0.698927 | 10010011 | 101010 | None |
5 | Jack | 0.868168 | 10010101 | 101100 | None |
6 | Scottagorn | 0.894642 | 10010110 | 110001 | None |
7 | SS | 0.339708 | 10011001 | 110010 | None |
8 | STM | 0.536517 | 10011010 | 110100 | None |
9 | CoreDawg | 0.964444 | 10011100 | 111000 | None |
10 | Mike | 0.95847 | 10100011 | None | None |
11 | Medium | 0.73781 | 10100101 | None | None |
12 | Shift | 0.884507 | 10100110 | None | None |
13 | Pcap | 0.794414 | 10101001 | None | None |
14 | Hard | 0.822299 | 10101010 | None | None |
15 | Spiff | 0.879811 | 10101100 | None | None |
16 | Copper Kettle | 0.17295 | 10110001 | None | None |
17 | Tytan | 0.96384 | 10110010 | None | None |
18 | 10110100 | None | None | ||
19 | 10111000 | None | None | ||
20 | 11000011 | None | None | ||
21 | 11000101 | None | None | ||
22 | 11000110 | None | None | ||
23 | 11001001 | None | None | ||
24 | 11001010 | None | None | ||
25 | 11001100 | None | None | ||
26 | 11010001 | None | None | ||
27 | 11010010 | None | None | ||
28 | 11010100 | None | None | ||
29 | 11011000 | None | None | ||
30 | 11100001 | None | None | ||
31 | 11100010 | None | None | ||
32 | 11100100 | None | None | ||
33 | 11101000 | None | None | ||
34 | 11110000 | None | None |
Save the data source
games.to_excel('Generals Statistics.xlsx', sheet_name="Games")
from openpyxl import load_workbook
workbook = load_workbook(filename="Generals Statistics.xlsx")
workbook.create_sheet('Teams')
sheet = workbook['Teams']
for i, rows in enumerate(sheets_stats):
    #each row holds between two and five entries: name, rating, and the 4v4/3v3/2v2 patterns
    for j, value in enumerate(rows[:5]):
        sheet.cell(row=i + 1, column=j + 1).value = value
#add a final name of 'None' with a rating of 0 so tableau doesn't need eight people every time
sheet.cell(row=len(games.Name.unique()) + 2, column=1).value = 'None'
sheet.cell(row=len(games.Name.unique()) + 2, column=2).value = 0
workbook.save(filename="Generals Statistics.xlsx")
Conclusion
This was a great project: take each person’s individual stats, create some synthetic games based on them, and have a logistic regression model predict which team would win. With an accuracy in the roughly 76-80% range from run to run, the model could probably be improved with something more complex, but that would be less explainable and harder to implement inside Tableau and Google Sheets. Making this usable by the other team members when I’m not around was more important than an extremely accurate model.
If you’d like to see the Tableau page of the latest stats, it can be found here, but I’ve also attempted to embed it below.
For a more interactive version, click here:
https://public.tableau.com/views/LifegroupTraxxNight/TraxxRaceStats?:language=en-US&publish=yes&:display_count=n&:showVizHome=no