Six Degrees of NBA Players - A BFS Approach

I. Introduction

Think of your favorite celebrity that has been in a movie. It is almost guranteed that through a web of mutual costars, there will always be a path that leads to Kevin Bacon. What's even more wild, is that you will probably find a connection that only needs 6 or less costars. Don't believe me, you can try it yourself by visiting The Oracle of Bacon. We'll go over the math in a bit, but I just wanted to introduce this website as a bit of a primer for what we are going to discuss, for it was this project that prompted the idea of creating a similar application, but using only NBA Players.

With this context, I created NBA Teams Apart (NBATA). NBATA is a simple web application in which you can select any two basketball players in professional history dating all the way back to 1947, and you will find a connection between them through shared teammates. If you want to try it yourself, you can visit NBATeamsApart.com.

NBA Teams Apart

II. Only Six Degrees, How?

It seems a bit remarkable that if you pick any two players in basketball history, that you will be able to find a chain of teammates that is pretty much always going to be less than 6 players. But why?

The answer to this is related to Graph Theory. Graph theory is a methematical branch that deals with the way objects are related to each other. I'm not going to go into a ton of depth into this because it's a bit outside of the scope of this simple blog post, so instead we'll go through a quick math exercise.

Four years ago, there was a Reddit post that indicated that there have been 5,018 players that have played in the NBA. To keep numbers simple, let's say that there are now 6,000 players in the league since the time this post was made. A big jump but we like round numbers over here.

\( T_p = 6000 \)

I feel like for some people, seeing this number is already making the claim that we can make a connection between two basketball players, a bit more believable. Let's continue. Now we're going to make the assumption that each player has 14 different teammates at a time. This is to assume that each NBA roster has 15 players.

\( N_p = 14 \)

Now, it's reasonable to assume that not every team was created only through draft picks since the inception of the team. So, to go take into account teams being traded, signed, and waived. Let's say that 7 players from each team have also been a part of a different team.

\( N_t = 7 \)

This means that those 7 teammates that came from a different team have a connection with 14 different players. Putting the pieces together, that means that any single player on a team \( P_c \) already has a network of 91 unique players.

\( P_c = (N_p * N_t) - N_t \)
\( P_c = (14 * 7) - 7 \)
\( P_c = 91 \)

So with just one player and one connection, we get 91. What happens if we want to expand this into two connections. Well to keep it simple, we can just square the number we calculated and get \( P_{c2} = 8,281 \). 8,281 players alraedy puts us above all NBA players in history. Of course the numbers don't explode that quickly when it comes to our example. There is a bit of overlap between players and teams and I don't think I need to clarify this but not all 6,000 players were active at the same time. The point of this example was to show how quickly numbers can blow up.

This example has some backing though. Facebook did an in-depth analysis to determine what the average distance between users on Facebook was and the answer was 4.7. A shockingly low number. If you want to read the entire paper, you can find it here, The Anatomy of the Facebook Social Graph

III. A Breadth First Search Approach

The math is great and all but how exactly do we implement an algorithm to filter through every player, every team, and every connection to ensure that not only do we get the correct answer, but we don't run a million calculations every single time. Enter Breadth-First Search (BFS).

BFS is an algorithm that traverses nodes in a graph by exploring every neighbor at the current level before moving on to the next. This is different from Depth-First Search (DFS), which follows a path as far as it can before backtracking. While both algorithms have their place, if the goal is to find the shortest path between two nodes, BFS is typically the better choice. Because it explores all nodes at one “layer” before moving deeper, the first time you reach a node you can be confident that you’ve found the shortest path to it. With DFS, you would need to explore every possible path to guarantee the shortest one.

Let's use an example with NBA context.

Let's say we want to find the relationship between Stephen Curry (Player A) and Anthony Edwards (Player B).

The first thing we need to do is get a list of all the teams that Steph Curry has played on in his career.

Team ID	Team Label	Season
GSW10	Golden State Warriors	2009-10
GSW11	Golden State Warriors	2010-11
GSW12	Golden State Warriors	2011-12
…	…	…
GSW25	Golden State Warriors	2024-25

Table 1. Stephen Curry Team History

Now, what we can do is iterate through each team and then create a queue that has a list of all players from each team.

Player ID	Player Label	Team ID
azubuke01	Kelenna Azubuike	GSW10
bellra01	Raja Bell	GSW10
ellismo01	Monta Ellis	GSW10
…	…	…
spencpa01	Pat Spencer	GSW25

Table 2. Players from Steph Curry Teams

At this point, I also want to bring up another important aspect of DSA type problems and that are sets and dictionaries. As we iterate through each team and player, we are storing them in a set so we can know if we have already visited a team or a player. This makes it so that we do not commit duplicate calculations or get stuck in an infinite loop.

\[ \text{Seen}_{\text{Teams}} = \{ \text{GSW}_{10}, \text{GSW}_{11}, \text{GSW}_{12}, \ldots, \text{GSW}_{25} \} \]

\[ \text{Seen}_{\text{Players}} = \{ \text{azubuke01}, \text{bellra01}, \text{ellismo01}, \ldots, \text{spencpa01} \} \]

I also mentioned a queue earlier. For our first iteration, our queue is going to look the same as the SeenPlayers set. That's because our queue is going to have every player from every team that Player A has played with.

\[ \text{queue} = \{ \text{azubuke01}, \text{bellra01}, \text{ellismo01}, \ldots, \text{spencpa01} \} \]

\[ \text{res} = 1\]

The res simply represents which iteration or "step" we are on. From here, because we know who Player B is, we simply iterate through all players until we get a match. In this case, we are iterating through all of the names in the queue until we find Anthony Edwards (edwaran01). This is where things get interesting! Because we know that Anthony Edwards and Steph Curry never played together, Anthony Edwards is not going to be in the queue. So what we do now, is we repeat that same process of iterating through each teammate that a player has had in their career, with every single player in the queue. Steph Curry alone has played with 100+ different teammates. Imagine how many players are going to be in our SeenPlayers set once we iterate through all of those players and figure out how many teammates all of those players have played with, combined. Doing an example like this shows us just how quickly these numbers can explode. It also takes a bit of the magic away from it but that's okay!

If you're curious to see what the link between Steph Curry and Anthony Edwards is, here's what my algorithm calculated.

Looking at it from a tree perspective, it sort of looks like this

IV. Thoughts

I wanted to pop by and make a few notes. First of all, even though our algorithm said that the shortest bath between Curry and Ant is through the use of Donte, that does not mean that it's the only valid path. To keep things optimized, the algorithm exits out of the iteration and returns the chain the moment it finds its first connection. As mentioned, this is because BFS will always return the shortest path between two objects. There is no need to keep iterating through names because we know for a fact that we alraedy have the shortest path. For example, we know Kyle Anderson also played for the Timberwolves alongside Anthony Edwards and he also played on the Warriors. Curry -> Anderson -> Edwards is a valid connection, but because it's the same length as the connection with Donte, it doesn't really matter.

Return to Portfolio