Predicting the NBA MVP w/ Python

My sincerest apologies for my absence on this blog; other things have been eating up my time. Thank you to everyone who supported the last post. Way bigger things are coming soon. My goal is to post every other Monday. Ambitious, I know. Without further ado, here is the highly anticipated story behind Goat Grade and how you can build your own NBA MVP forecaster.
---
During February of 2021, one year ago, I used data from Basketball Reference and Python to rank NBA players. After I uploaded my rankings to the internet, I didn't think too much of it. A few months later, the NBA MVP finalists were announced. To my surprise, my program predicted the top two front runners in the race for most valuable player. After Nikola Jokic was dubbed league MVP, it was official: using a program I wrote in a weekend, I predicted the MVP months before the finalists were even announced. This is the story of how I built Goat Grade and why my prediction may have been a total fluke. Or not?

Beginnings

The inspiration behind this project came in a few ways. I had just finished the first semester of junior year (high school) and was looking for a new project to work on.
Earlier that year, I released Reporty, a Python library with useful tools for organizing and distributing visual data. The library took in data sets and figures and generated an HTML page with said figures. I also included an option to send that page embedded in an email [1].
After wrapping up that project, I wanted to get started on something new. Building Reporty, with some help from my mentor [2], had given me a solid introduction to the world of data science. Still, I hadn't quite undertaken a project where I actually organized, cleaned, and modeled data from a real data set. Reporty used figures that were already made with existing (or even fake) data [3].
So, I contacted a friend from Boise, Idaho [4], and asked him if he wanted to collaborate. At first, we didn't have any particular idea of what we wanted to do. We bounced around a few ideas and lingered on the subject of web scraping, which both of us were interested in, but unfamiliar with. What better way to learn than through building a project? Because of our love for basketball, we zeroed in on scraping basketball statistics and using that data to generate some kind of fun set of predictions.
We frequently debated who the best basketball player in the world was, so why not settle the debate with some programming and data analysis? The idea of Goat Grade was born.

Scraping the data

Before I talk about how we scraped the statistics, allow me to explain "web scraping" in simple terms.
As we all know, the Internet contains an almost unlimited amount of information. Whether it's Chick-fil-A's Sunday hours or the price of Yeezy Foam Runners on StockX, the information we seek is somewhere on a computer connected to the internet, a simple Google search away.
But, what if we could automate this process? Is it possible to write a program to extract the Ocean's Eleven rating from IMDb? Could it also alert me anytime the rating changes? The answer to these questions is absolutely, thanks to web scraping, which is the process of fetching text from a website and extracting data [5].
Okay, so now that we know what web scraping is, how do we actually do it with Python? Thanks to some useful libraries, it's quite simple.
In our case, we want to scrape statistics from Basketball Reference. We know that the player statistics page looks like this:

Screenshot from Basketball Reference

The data is displayed in a table, where each row contains each player's stats. The table headers contain the categories and the table rows contain the statistics.
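To make the header/row structure concrete before touching the real site, here is a minimal offline sketch. It is not the Goat Grade code: it uses Python's built-in html.parser instead of Beautiful Soup, and the tiny table in SAMPLE_HTML (along with the TableParser name) is made up for illustration.

```python
from html.parser import HTMLParser

# A made-up table in the same shape as the real stats page:
# one header row, then one row per player.
SAMPLE_HTML = """
<table>
  <tr><th>Rk</th><th>Player</th><th>PTS</th></tr>
  <tr><th>1</th><td>Stephen Curry</td><td>32.0</td></tr>
  <tr><th>2</th><td>Kevin Durant</td><td>26.9</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # one list of cell strings per table row
        self.in_cell = False  # True while inside a <th> or <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td"):
            self.in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self.in_cell = False

    def handle_data(self, data):
        # text outside a cell (whitespace between tags) is ignored
        if self.in_cell:
            self.rows[-1][-1] += data


parser = TableParser()
parser.feed(SAMPLE_HTML)

header_row, *data_rows = parser.rows
print(header_row)  # ['Rk', 'Player', 'PTS']
print(data_rows)   # [['1', 'Stephen Curry', '32.0'], ['2', 'Kevin Durant', '26.9']]
```

The first row gives the categories and every later row gives one player's numbers, which is the same split the Beautiful Soup code relies on.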
Heads up, I'm going to be walking through the actual code I wrote for Goat Grade, how it works, and my thought process behind writing said code. If you want to follow along and build your own Goat Grade, make sure you have a decent grip on programming fundamentals and Python syntax. If you're just here for the story, don't worry. Simply nod your head, pretend you understand, and gloss over the code.

Before we can make any predictions, we need data. To obtain this data, we use web scraping.
The following code [6] imports the urllib and Beautiful Soup Python modules, visits the website, and parses through the HTML. The code extracts all of the headers and stores them in a list.

from urllib.request import urlopen
from bs4 import BeautifulSoup

# i will be using data from the 2020 - 2021 NBA season
year = 2021
url = f"https://www.basketball-reference.com/leagues/NBA_{year}_per_game.html"
html = urlopen(url)
soup = BeautifulSoup(html, "html.parser")

# "th" means table headers and "tr" means table rows
headers = [th.getText() for th in soup.findAll("tr", limit=2)[0].findAll("th")]
headers = headers[1:]

headers = ["Player", "Pos", "Age", "Tm", "G", "GS", "MP", "FG", "FGA", "FG%", "3P", "3PA", "3P%", "2P", "2PA", "2P%", "eFG%", "FT", "FTA", "FT%", "ORB", "DRB", "TRB", "AST", "STL", "BLK", "TOV", "PF", "PTS"]

After collecting all the headers (the statistical categories we are interested in), we need to gather the actual data, which is stored in the table rows. Beautiful Soup makes it easy to scrape the HTML code from each row.

rows = soup.findAll("tr")[1:]

The rows list contains all of the HTML code for each row in the table. A single item from this list would look like this:

<tr class="full_table">
    <th class="right" csk="140" data-stat="ranker" scope="row">140</th>
    <td class="left" csk="Durant,Kevin" data-append-csv="duranke01" data-stat="player">
        <a href="/players/d/duranke01.html">Kevin Durant</a>
    </td>
    <td class="center" data-stat="pos">PF</td>
    <td class="right" data-stat="age">32</td>
    <td class="left" data-stat="team_id"><a href="/teams/BRK/2021.html">BRK</a></td>
    <td class="right" data-stat="g">35</td>
    <td class="right" data-stat="gs">32</td>
    <td class="right non_qual" data-stat="mp_per_g">33.1</td>
    <td class="right non_qual" data-stat="fg_per_g">9.3</td>
    <td class="right non_qual" data-stat="fga_per_g">17.2</td>
    <td class="right" data-stat="fg_pct">.537</td>
    <td class="right non_qual" data-stat="fg3_per_g">2.4</td>
    <td class="right non_qual" data-stat="fg3a_per_g">5.4</td>
    <td class="right" data-stat="fg3_pct">.450</td>
    <td class="right non_qual" data-stat="fg2_per_g">6.8</td>
    <td class="right non_qual" data-stat="fg2a_per_g">11.8</td>
    <td class="right" data-stat="fg2_pct">.577</td>
    <td class="right" data-stat="efg_pct">.608</td>
    <td class="right non_qual" data-stat="ft_per_g">6.0</td>
    <td class="right non_qual" data-stat="fta_per_g">6.8</td>
    <td class="right" data-stat="ft_pct">.882</td>
    <td class="right non_qual" data-stat="orb_per_g">0.4</td>
    <td class="right non_qual" data-stat="drb_per_g">6.7</td>
    <td class="right non_qual" data-stat="trb_per_g">7.1</td>
    <td class="right non_qual" data-stat="ast_per_g">5.6</td>
    <td class="right non_qual" data-stat="stl_per_g">0.7</td>
    <td class="right non_qual" data-stat="blk_per_g">1.3</td>
    <td class="right non_qual" data-stat="tov_per_g">3.4</td>
    <td class="right non_qual" data-stat="pf_per_g">2.0</td>
    <td class="right non_qual" data-stat="pts_per_g">26.9</td>
</tr>

Would you look at that! All the data is there; we just need to clean it up by getting rid of all the HTML code.
Below, I extract the data and store it in the stats dictionary by looping through each row and getting all the text between the <td> tags. For each player, I create a new sub-dictionary and use a double loop to fill each category in the headers list with the player's data.

stats = {}

for i in range(len(rows)):
    tds = rows[i].findAll("td")
    if len(tds) > 0:
        h = 0
        name = tds[0].getText()
        stats[name] = {}
        for td in tds:
            stats[name][headers[h]] = td.getText()
            h += 1

The stats dictionary now contains every player's statistics, neatly organized in the format we want. Here is what stats["Kevin Durant"] would look like:

{
    "Player": "Kevin Durant",
    "Pos": "PF",
    "Age": "32",
    "Tm": "BRK",
    "G": "35",
    "GS": "32",
    "MP": "33.1",
    "FG": "9.3",
    "FGA": "17.2",
    "FG%": ".537",
    "3P": "2.4",
    "3PA": "5.4",
    "3P%": ".450",
    "2P": "6.8",
    "2PA": "11.8",
    "2P%": ".577",
    "eFG%": ".608",
    "FT": "6.0",
    "FTA": "6.8",
    "FT%": ".882",
    "ORB": "0.4",
    "DRB": "6.7",
    "TRB": "7.1",
    "AST": "5.6",
    "STL": "0.7",
    "BLK": "1.3",
    "TOV": "3.4",
    "PF": "2.0",
    "PTS": "26.9"
}

All that remains is dumping this dictionary into a JSON file for storage and we can start working with our data.

import json
with open("stats.json", "w+", encoding="utf8") as file:
    file.write(json.dumps(stats, ensure_ascii=False, indent=4))
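
If you want to convince yourself the file round-trips cleanly, here is a small sketch. The one-player stats dictionary and the temporary directory are stand-ins I chose for the example, not part of the real script.

```python
import json
import os
import tempfile

# toy stand-in for the scraped stats dictionary
stats = {"Kevin Durant": {"Pos": "PF", "PTS": "26.9"}}

# write to a throwaway directory so the sketch leaves nothing behind in your project
path = os.path.join(tempfile.mkdtemp(), "stats.json")
with open(path, "w", encoding="utf8") as file:
    json.dump(stats, file, ensure_ascii=False, indent=4)

# read the file back and compare it with what we wrote
with open(path, "r", encoding="utf8") as file:
    loaded = json.load(file)

print(loaded == stats)                # True
print(loaded["Kevin Durant"]["PTS"])  # 26.9 (still a string, not a float)
```

Note that every value comes back as a string, so anything numeric has to be converted with float() before it can be compared or sorted.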

There is a lot more data available to us. Basketball Reference keeps track of advanced stats, giving us more data to base our predictions off of. We can scrape this in the exact same way as before and then combine the advanced statistics with the regular statistics. Just change the url variable like so:

url = f"https://www.basketball-reference.com/leagues/NBA_{year}_advanced.html"

Now, we can combine our advanced stats with the regular stats like this:

for player in stats:
    stats[player].update(advanced_stats[player])
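
One thing to watch: the loop above raises a KeyError if a player is missing from advanced_stats. A slightly more defensive sketch, using toy dictionaries I made up for the example, could look like this:

```python
# toy dictionaries standing in for the two scraped tables
stats = {
    "Kevin Durant": {"Player": "Kevin Durant", "PTS": "26.9"},
    "James Harden": {"Player": "James Harden", "PTS": "24.6"},
}
advanced_stats = {
    "Kevin Durant": {"Player": "Kevin Durant", "PER": "26.4"},
    # James Harden is deliberately missing here
}

for player in stats:
    # .get() falls back to an empty dict, so a missing player is left unchanged
    stats[player].update(advanced_stats.get(player, {}))

print(stats["Kevin Durant"])  # {'Player': 'Kevin Durant', 'PTS': '26.9', 'PER': '26.4'}
print(stats["James Harden"])  # {'Player': 'James Harden', 'PTS': '24.6'}
```

Keys that appear in both tables (like "Player") are simply overwritten by the advanced value, which is why the advanced "MP" column needs a different name before merging.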

With the added categories, a single player's sub-dictionary looks like this:

{
    "Player": "Kevin Durant",
    "Pos": "PF",
    "Age": "32",
    "Tm": "BRK",
    "G": "35",
    "GS": "32",
    "MP": "33.1",
    "FG": "9.3",
    "FGA": "17.2",
    "FG%": ".537",
    "3P": "2.4",
    "3PA": "5.4",
    "3P%": ".450",
    "2P": "6.8",
    "2PA": "11.8",
    "2P%": ".577",
    "eFG%": ".608",
    "FT": "6.0",
    "FTA": "6.8",
    "FT%": ".882",
    "ORB": "0.4",
    "DRB": "6.7",
    "TRB": "7.1",
    "AST": "5.6",
    "STL": "0.7",
    "BLK": "1.3",
    "TOV": "3.4",
    "PF": "2.0",
    "PTS": "26.9",
    "TMP": "1157",
    "PER": "26.4",
    "TS%": ".666",
    "3PAr": ".313",
    "FTr": ".395",
    "ORB%": "1.3",
    "DRB%": "21.3",
    "TRB%": "11.8",
    "AST%": "27.5",
    "STL%": "1.0",
    "BLK%": "3.4",
    "TOV%": "14.5",
    "USG%": "31.2",
    "OWS": "3.7",
    "DWS": "1.2",
    "WS": "5.0",
    "WS/48": ".206",
    "OBPM": "6.4",
    "DBPM": "0.8",
    "BPM": "7.2",
    "VORP": "2.7"
}

After tweaking a few technicalities [7] to make sure the data is formatted in the correct way, the final web scraping script is done.

# webscrape.py

from urllib.request import urlopen
from bs4 import BeautifulSoup
import json

year = 2021

def scrape(url):
    html = urlopen(url)
    soup = BeautifulSoup(html, "html.parser")

    # the first <tr> holds the column names; drop the leading rank column
    headers = [th.getText() for th in soup.findAll("tr", limit=2)[0].findAll("th")]
    headers = headers[1:]

    rows = soup.findAll("tr")[1:]

    stats = {}
    for row in rows:
        tds = row.findAll("td")
        if len(tds) == 0:
            continue  # header rows repeated inside the table have no <td> cells
        name = tds[0].getText()
        player_dict = {}
        for header, td in zip(headers, tds):
            # "MP" in the advanced table is total minutes, so rename it to
            # avoid overwriting minutes per game when the tables are merged
            if header == "MP" and "advanced" in url:
                header = "TMP"
            player_dict[header] = td.getText()
        # a traded player has several rows; keep the first one seen unless
        # this row is the season total ("TOT") [7]
        if name not in stats or player_dict.get("Tm") == "TOT":
            stats[name] = player_dict
    return stats

reg_stats_url = f"https://www.basketball-reference.com/leagues/NBA_{year}_per_game.html"
adv_stats_url = f"https://www.basketball-reference.com/leagues/NBA_{year}_advanced.html"

reg_stats = scrape(reg_stats_url)
adv_stats = scrape(adv_stats_url)

for player in reg_stats:
    reg_stats[player].update(adv_stats[player])
    # drop the blank spacer column from the advanced table (its header is a non-breaking space)
    reg_stats[player].pop("\xa0", None)

with open('stats.json', 'w+', encoding='utf8') as file:
    file.write(json.dumps(reg_stats, ensure_ascii=False, indent =4))

Ranking the players

We now need to develop some kind of algorithm to decide which player is the best overall. The first step in doing this is discerning what statistical categories are actually important and what categories don't matter. The thirteen categories that I thought would be most important were [8]:

"PTS": points
"AST": assists
"TRB": total rebounds
"FG%": field goal %
"FT%": free throw %
"3P%": three point %
"STL": steals
"BLK": blocks
"MP": minutes played
"PER": player efficiency rating
"TS%": true shooting %
"WS": win shares
"BPM": box plus/minus

categories = ["PTS", "AST", "TRB", "FG%", "FT%", "3P%", "STL", "BLK", "MP", "PER", "TS%", "WS", "BPM"]

Now that we have the categories by which we will evaluate each player, we have to make some choices. How important is each category? Is field goal percentage more important than three point percentage? Surely points matter more than steals, right? Unfortunately, at the time I was actually writing this code (one year ago), my main goal was to just get my rankings up and I didn't really care about how correct they were.
I determined the simplest way to rank every player was to give each player a ranking for each category, then add those rankings up to get a final score. Theoretically, the player with the lowest overall score is the best player.
For example, let's say a player is fifteenth overall in points and fourth overall in field goal percentage. That player's final score would be nineteen. If another player was eighth overall in points and ninth overall in field goal percentage, that player would have a better score than the first player, with a score of seventeen.

# brainstorming:

player_1_rankings = {
    "PTS": 15,
    "FG%": 4,
    "Overall rankings": 19
}

player_2_rankings = {
    "PTS": 8,
    "FG%": 9,
    "Overall rankings": 17  
}

# player_2 has a lower overall ranking, therefore player_2 is better

So, I'm weighting each of the thirteen categories the same. This means that a player with the highest free throw percentage gets the same number tacked onto their final score (or "grade") as a player who has the highest points per game. Anyone who knows basketball is probably cringing while reading this, but it did end up working pretty well *foreshadowing*.
Let's write some code to get the categorical rankings for each player. I thought it would be best to store our rankings in a new variable, so I made a dictionary called ranks, where each player gets their own sub-dictionary.

ranks = {}
for player in stats:
    ranks[player] = {}

For the purposes of this explanation, let's say the only three players in the league are Stephen Curry, Kevin Durant, and James Harden (my three favorite players). Here is what ranks looks like right now:

{
    "Stephen Curry": {},
    "Kevin Durant": {},
    "James Harden": {}
}

The next step, obviously, is to fill each of these dictionaries with categorical rankings. Let's talk about how to find the player rankings for a given category. If we wanted to find the rankings of the three players in the "PTS" category, we would need to loop through all the players, find their points per game average, and compare it with the other players.
I did this by using a two-dimensional list: a list storing lists that each contain the player name and the player's statistic in the category of interest. Then, I sorted the list by that statistic and reversed it, leaving the player with the highest number at the beginning of the list. Continuing with the points per game example, the list would be sorted in descending order of points per game.

category_rankings = []
for player in stats:
    category_rankings.append([player, float(stats[player][category])])
category_rankings = sorted(category_rankings, key=lambda x: x[1])
category_rankings.reverse()

[['Stephen Curry', 32.0], ['Kevin Durant', 26.9], ['James Harden', 24.6]]

Now that we have this list, we can add rankings to the ranks dictionary by looping through the ordered list and storing the index in each sub-dictionary. The list index plus one is that player's ranking in the category. Kevin Durant is at index one in the list, so his ranking for points per game would be two.

for i in range(len(category_rankings)):
    name = category_rankings[i][0]
    value = category_rankings[i][1]
    ranks[name][category] = i + 1

{
    "Stephen Curry": {
        "PTS": 1
    },
    "Kevin Durant": {
        "PTS": 2
    },
    "James Harden": {
        "PTS": 3
    }
}
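
For what it's worth, the same ranking step can be written more compactly. This is just an alternative sketch, not the code Goat Grade runs; it uses the three-player toy data from above, sorted(..., reverse=True) in place of the separate reverse() call, and enumerate(..., start=1) in place of i + 1.

```python
# toy points-per-game values from the three-player example
stats = {
    "Kevin Durant": {"PTS": "26.9"},
    "James Harden": {"PTS": "24.6"},
    "Stephen Curry": {"PTS": "32.0"},
}
ranks = {player: {} for player in stats}
category = "PTS"

# sort the player names by the category value, highest first
ordered = sorted(stats, key=lambda p: float(stats[p][category]), reverse=True)

# enumerate(..., start=1) hands back the 1-based ranking directly
for position, name in enumerate(ordered, start=1):
    ranks[name][category] = position

print(ranks)
# {'Kevin Durant': {'PTS': 2}, 'James Harden': {'PTS': 3}, 'Stephen Curry': {'PTS': 1}}
```

One small difference: when two players are tied, reverse=True keeps them in their original order, while sorting and then calling reverse() flips them. Either way, tied players end up with different rankings, which is a quirk of this approach.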

We can now loop through every category and gather rankings by putting that code block into a function and calling it for each category in the categories list.

def rank(category):
    category_rankings = []
    for player in stats:
        category_rankings.append([player, float(stats[player][category])])
    category_rankings = sorted(category_rankings, key=lambda x: x[1])
    category_rankings.reverse()

    for i in range(len(category_rankings)):
        name = category_rankings[i][0]
        value = category_rankings[i][1]
        ranks[name][category] = i + 1

for category in categories:
    rank(category)

The last thing we need to do is add all of these categorical rankings up into a final "grade" [9].

for player in ranks:
    score = 0
    for category in ranks[player]:
        score += ranks[player][category]
    ranks[player]['grade'] = score

{
    "Stephen Curry": {
        "PTS": 1,
        "AST": 2,
        "TRB": 3,
        "FG%": 2,
        "FT%": 1,
        "3P%": 2,
        "STL": 2,
        "BLK": 3,
        "MP": 2,
        "PER": 2,
        "TS%": 2,
        "WS": 1,
        "BPM": 1,
        "grade": 24
    },
    "Kevin Durant": {
        "PTS": 2,
        "AST": 3,
        "TRB": 2,
        "FG%": 1,
        "FT%": 2,
        "3P%": 1,
        "STL": 3,
        "BLK": 1,
        "MP": 3,
        "PER": 1,
        "TS%": 1,
        "WS": 3,
        "BPM": 3,
        "grade": 26
    },
    "James Harden": {
        "PTS": 3,
        "AST": 1,
        "TRB": 1,
        "FG%": 3,
        "FT%": 3,
        "3P%": 3,
        "STL": 1,
        "BLK": 2,
        "MP": 1,
        "PER": 3,
        "TS%": 3,
        "WS": 2,
        "BPM": 2,
        "grade": 28
    }
}
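
As a quick aside on the average-ranking idea from footnote [9], here is a sketch of what that could look like with the three grades above. The rounding to two decimals is my own choice for readability.

```python
# grades and category count from the three-player example
grades = {"Stephen Curry": 24, "Kevin Durant": 26, "James Harden": 28}
num_categories = 13

# divide each grade by the number of categories to get an average ranking
average_ranks = {
    player: round(grade / num_categories, 2) for player, grade in grades.items()
}
print(average_ranks)
# {'Stephen Curry': 1.85, 'Kevin Durant': 2.0, 'James Harden': 2.15}
```

The order of the players does not change, since every grade is divided by the same number. The average is just easier to read: 1.85 means "a bit better than second place on average."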

Let's write our final rankings to a text file. We just need to sort all the players by the "grade" variable, which we can do the same way we sorted players by any other category. Then, we just add the player name and their grade to a string variable and write it to a text file.

final_ranks = []
for player in ranks:
    final_ranks.append([player, int(ranks[player]['grade'])])
final_ranks = sorted(final_ranks, key=lambda x: x[1])

final_string = ""
for i in range(len(final_ranks)):
    name = final_ranks[i][0]
    grade = final_ranks[i][1]
    if i == len(final_ranks) - 1:
        final_string += f"{str(i + 1)}. {name} (score: {str(grade)})"
    else:
        final_string += f"{str(i + 1)}. {name} (score: {str(grade)})\n"

1. Stephen Curry (score: 24)
2. Kevin Durant (score: 26)
3. James Harden (score: 28)
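
The if/else in that loop only exists to avoid a trailing newline. A shorter way to get the same string, sketched here with the three-player list hard-coded, is to build the lines first and let "\n".join() handle the separators:

```python
# already sorted by grade, lowest (best) first
final_ranks = [["Stephen Curry", 24], ["Kevin Durant", 26], ["James Harden", 28]]

# one formatted line per player, numbered from 1
lines = [
    f"{position}. {name} (score: {grade})"
    for position, (name, grade) in enumerate(final_ranks, start=1)
]

# join() puts a newline between lines but not after the last one
final_string = "\n".join(lines)
print(final_string)
```

This prints the same three lines shown above.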

Putting it all together [10], this is the final ranking script:

# rankings.py

import json

with open('stats.json', 'r', encoding='utf8') as file:
    stats = json.load(file)

ranks = {}
for player in stats:
    ranks[player] = {}

categories = ["PTS", "AST", "TRB", "FG%", "FT%", "3P%", "STL", "BLK", "MP", "PER", "TS%", "WS", "BPM"]

def rank(category):
    category_rankings = []
    for player in stats:
        if stats[player][category] != "":
            category_rankings.append([player, float(stats[player][category])])
        else:
            category_rankings.append([player, 0])
    category_rankings = sorted(category_rankings, key=lambda x: x[1])
    category_rankings.reverse()

    for i in range(len(category_rankings)):
        name = category_rankings[i][0]
        value = category_rankings[i][1]
        ranks[name][category] = i + 1

for category in categories:
    rank(category)

for player in ranks:
    score = 0
    for category in ranks[player]:
        score += ranks[player][category]
    ranks[player]['grade'] = score

final_ranks = []
for player in ranks:
    final_ranks.append([player, int(ranks[player]['grade'])])
final_ranks = sorted(final_ranks, key=lambda x: x[1])

final_string = ""
for i in range(len(final_ranks)):
    name = final_ranks[i][0]
    grade = final_ranks[i][1]
    if i == len(final_ranks) - 1:
        final_string += f"{str(i + 1)}. {name} (score: {str(grade)})"
    else:
        final_string += f"{str(i + 1)}. {name} (score: {str(grade)})\n"  

with open('ranks.json', 'w+', encoding='utf8') as file:
    file.write(json.dumps(ranks, ensure_ascii=False, indent =4))

with open('final.txt', 'w+', encoding='utf8') as file:
    file.write(final_string)
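
The blank-stat check inside rank() (see footnote [10]) could also live in its own small helper. This is only a sketch; to_number is a name I made up, and it is not part of rankings.py.

```python
def to_number(raw, default=0.0):
    """Convert a scraped stat string to a float, using default for blank cells."""
    raw = raw.strip()
    if raw == "":
        return default
    return float(raw)


print(to_number(".882"))  # 0.882
print(to_number(""))      # 0.0
print(to_number("26.9"))  # 26.9
```

With a helper like this, the append inside rank() could become a single line: category_rankings.append([player, to_number(stats[player][category])]).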

Aftermath

After I got my rankings, I quickly put together a very basic [11] webpage displaying my results. This was in February 2021. I updated the site a few more times, extracting the statistics every so often (manually running the web scraping script). The last time I updated my rankings was April 18, 2021.

Screenshot from Goat Grade

After that, I left the page up for fun and forgot about updating it. Then, in late May, the MVP was announced, along with the runners-up. Nikola Jokic was crowned MVP and Joel Embiid was a close second.
I couldn't believe it. Looking back at my website, which I hadn't touched since April, Jokic and Embiid were the number one and two players on my list, respectively. In just a weekend of coding, I not only predicted who would win MVP, but also the runner-up!

Nikola Jokic crowned MVP

Of course, I'll be the first to admit that it's very likely this was a fluke. I used no advanced math or statistical analysis to make my predictions and used no existing data sets that defined what made a player valuable. I simply ranked players by how high they placed in the thirteen statistical categories I chose, weighting each of the categories the same.
I believe the reason this worked so well is that Jokic and Embiid are both centers with elite offensive and defensive skill sets. Because they are bigs, they average a lot of blocks and rebounds, and high shooting percentages [12] compared to the rest of the league. What makes both of these players special (better than your average big) is that they can shoot and pass very well, not just for a center, but in general. Jokic ranked fourth overall in assists per game! Because of their height and skill, Jokic and Embiid can place very high in all categories.
For example, Stephen Curry and Kyrie Irving are both great players who specialize in being quick guards that can shoot and score exceptionally well. In terms of rebounding and defense, their size limits how much they can contribute in those categories. They specialize in scoring. On the other hand, Jokic and Embiid are pretty much good at everything and don't specialize in any one particular skill, which elevates all of their statistical categories, and thus their ranking. Is Embiid better than Curry? No. Embiid is just decent in more categories, while Steph excels in a few categories.
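Here is the arithmetic behind that hunch, as a rough sketch. The rankings are hypothetical numbers I picked to make the point; they are not real player data.

```python
num_categories = 13

# hypothetical specialist: 1st in three categories, 60th in the other ten
specialist_ranks = [1] * 3 + [60] * 10
# hypothetical all-rounder: 15th in every single category
all_rounder_ranks = [15] * num_categories

specialist_grade = sum(specialist_ranks)    # 3 + 600 = 603
all_rounder_grade = sum(all_rounder_ranks)  # 15 * 13 = 195

# lower is better, so the all-rounder wins comfortably
print(specialist_grade, all_rounder_grade)  # 603 195
```

With equal weights, being the best in the league at three things cannot make up for being average at ten others.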
That's just my hypothesis. Last year, I personally didn't think Jokic was going to win MVP, as I said many times on Half Hour Hook. During January and February, I watched James Harden put the Nets on his back during that stretch when Kyrie Irving and Kevin Durant were both injured. Plus, the Nets had a better record than the Nuggets, with a way weaker supporting cast (after you take away Durant and Irving). In my opinion (as a Harden fan), Harden was the obvious MVP.
At the end of the day, the biggest shortcoming of Goat Grade is that it lacks the eye test. The eye test is simply determining who is better by watching the players, rather than studying the numbers all day. Is the eye test the best way to decide the MVP? No. You could easily get caught up in the hype of athletic dunks and forget about the guy who's been shooting a nearly perfect percentage from mid-range all season. Just like data analysis, the eye test is not the end-all, be-all way to determine greatness.
That being said, I believe that to truly determine if a player is great, you need to both watch them and study their numbers. The perfect "Goat Grade" is a combination of statistical analysis and the eye test.

The future

I'm definitely going to add more to this project. It's been a great way to tap into my love of both basketball and programming, and I'll continue working on it in my spare time.
Aside from the ranking process, there are a lot of things I need to improve on the website. The front end needs a ton of work, and my plan is to have multiple pages for each year's predictions and actual MVP results. After some work, Goat Grade will hopefully have a pretty slick UI.
For predictions, I can break out a lot more advanced math and statistical analysis. I'm currently not using any data science libraries like numpy or pandas, and I should definitely look into using some cool machine learning techniques to help me make predictions.
On top of that, I could also use data from previous seasons to make predictions for the future. Another useful category to incorporate into rankings could be team statistics, as MVP is largely determined by how good a player's team is and how much the player improves their team. Updating the data frequently is crucial as well. I'll set up some kind of way to automatically run the web scraping script and update the data more frequently.
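Pulling previous seasons could start with something as small as this sketch. The season_url helper is hypothetical, and I'm assuming older pages follow the same URL pattern as the two used in webscrape.py; whether their columns match exactly is something I'd still have to check.

```python
def season_url(year, table="per_game"):
    """Build a stats-page URL following the pattern used in webscrape.py."""
    return f"https://www.basketball-reference.com/leagues/NBA_{year}_{table}.html"


# the seasons ending in 2019, 2020, and 2021, oldest first
urls = [season_url(year) for year in range(2019, 2022)]
for url in urls:
    print(url)
```

Each URL could then be passed to scrape() in a loop, ideally with a pause between requests so the site isn't hammered.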
My end goal is to allow each user to make their own "Goat Grade." A user can determine how much each category is weighted based on what they value in a player. This allows Goat Grade to be absolutely customizable to please any and every basketball fan [13].
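I haven't built this yet, but a first sketch of a custom-weighted grade might look like the following. The weighted_grade function and the weights are hypothetical; the rankings come from the three-player example, trimmed to two categories.

```python
def weighted_grade(player_ranks, weights):
    """Sum each category ranking multiplied by its weight (the default weight is 1)."""
    return sum(
        rank * weights.get(category, 1) for category, rank in player_ranks.items()
    )


# rankings from the three-player example, trimmed to two categories
curry = {"PTS": 1, "TRB": 3}
harden = {"PTS": 3, "TRB": 1}

equal = {}                # every category counts once
scoring_fan = {"PTS": 3}  # hypothetical user who values points three times as much

print(weighted_grade(curry, equal), weighted_grade(harden, equal))              # 4 4
print(weighted_grade(curry, scoring_fan), weighted_grade(harden, scoring_fan))  # 6 10
```

With equal weights the two are tied, but a user who cares most about scoring would see Curry pull ahead (lower is still better).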

Goat Grade concept

Thank you

If you've gotten to the end of this post: congratulations! It was a long one, as I had a lot to talk about. I hope everything made sense, even if you only know a little about basketball, computer science, or both.
I've had a ton of fun working on and writing about Goat Grade. Hopefully you had fun reading about it. "I predicted the MVP with my epic coding skills," makes for an interesting conversation starter, even if it was a fluke. Or was it? I guess we have to look to next season's predictions to know for sure...

[1] I am still improving Reporty. I may even write a blog post explaining how I built it and what my plans are for the future of the library. Check out our GitHub and fork the repository if you would like to contribute!

[2] btengels

[3] That being said, building Reporty taught me a lot about Python modules, image encoding, and automated emails.

[4] kadebaxter

[5] Yes, I do acknowledge there are other ways of getting information from websites. A lot of apps have their own API, which makes things a lot easier when you want to automate the process of collecting data and information. The aforementioned IMDb problem can easily be solved with the IMDb Python library, which uses the IMDb API. This is what I use for my website's movies section. I can get titles, ratings, posters, and more with a simple Python function. I'll probably post a short tutorial on how to use my favorite APIs soon.

[6] I used some code from this article to get started.

[7] If a player has played on multiple teams during the same season, Basketball Reference has multiple rows for that player. See the Basketball Reference screenshot. LaMarcus Aldridge has three rows because he played on both the Spurs and Nets in one season. The "TOT" category is his stats in total from both teams. I added a chunk of code in the loop to make sure the player on the "TOT" team is the one added to the dictionary, because Python dictionaries do not allow duplicate keys. Another issue I thought of is if there are two players with the same name, there is no way of knowing if it is a different player or just that player on a different team. Luckily, I don't know of any two NBA players with the same exact name (fingers crossed).

[8] At the time, I didn't really put too much thought into this. I'm definitely going to do some more research and tweak these categories later.

[9] Going forward, I think the final score should be all the rankings added up and then divided by the number of categories, yielding an average ranking rather than just a tally of all the rankings. Or that could just be another statistic. I'm still trying to determine the final score that makes the most sense to users.

[10] I added a line to check if a statistic was left blank and replaced it with a zero. For example, a player with no free throw percentage would have a percentage of zero rather than one hundred.

[11] Very basic. Zero CSS. I was lazy.

[12] At center, they are mostly taking lay-ups and inside shots.

[13] I have a lot of other cool ideas that I won't bore you with on this post, but if you want to keep up on updates regarding Goat Grade, check out our GitHub repository and give it a star.


Andrew Boyer