statisticallysound     About     Archive     Instagram

Jeopardy! Daily Double Heatmap

Jeopardy was one of many game shows I watched growing up. I’ve had fond memories sitting around the TV and competing with my family to see who could answer the most clues. Alex Trebek, you will be missed :(

To practice web scraping and work with heatmaps, I thought it would be a fun project to scrape the ~21,000 Daily Double locations from all the games archived on j-archive.com and see where Daily Doubles are most commonly placed. This heatmap includes Seasons 1-37.


Webscraping

The first step is to import the modules. BeautifulSoup was used to scrape data from j-archive.com.

import requests
from bs4 import BeautifulSoup
import numpy as np
from IPython.display import clear_output
import time

Next, I used BeautifulSoup to find the td_tag ‘clue_value_daily_double’ and appended the coordinates to a list. Notice that I set a time delay for 4 seconds between each request to reduce HTTP request errors. Once all the webpages have been scraped, the list is converted into a numpy array, grouping each x and y coordinate into one list.

positions = []
try:
    start = time.time()
    for i in range(9999):
        '{0:04}'.format(i)
        url = requests.get("http://www.j-archive.com/showgame.php?game_id={}".format(i))
        src = url.content
        soup = BeautifulSoup(src, 'lxml')

        for td_tag in soup.find_all("td"):
            try:
                if 'clue_value_daily_double' in td_tag.attrs['class']:
                    positions.append(td_tag.find_previous_sibling('td').attrs['id'])
            except KeyError:
                pass
        clear_output(wait=True)
        end = time.time()
        print("Game: " + str(i))
        print(end-start)
        time.sleep(4)
except:
    coordinates = []
    for position in positions:
        coordinates.append(position[-9])
        coordinates.append(position[-7])
    coordinates = np.array(coordinates, dtype='int32').reshape(-1, 2)
    print(coordinates)

Data Visualization

Now that the data has been collected, it’s time to visualize the data. With the numpy array from earlier, I converted the numpy array into a pandas dataframe to allow for easier plotting.

import pandas as pd
df = pd.DataFrame({'x':coordinates[:,0], 'y':coordinates[:,1]})
x y
0 5 3
1 1 3
2 2 3
3 5 5
4 3 3
... ... ...
20247 3 5
20248 5 5
20249 4 2
20250 1 3
20251 2 5

20252 rows × 2 columns


Matplotlib and Seaborn are utilized to visualize the data. To produce a heatmap, I used seaborn’s heatmap plotting and set the theme to plasma, which matches the color scheme used on Jeopardy.

import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm

prop = fm.FontProperties(fname='C:/Windows/Fonts/SourceSansPro-Regular.ttf')
matplotlib.rcParams['font.sans-serif'] = "Source Sans Pro"
matplotlib.rcParams['font.family'] = "sans-serif"
matplotlib.rcParams.update({'font.size': 24})

data = np.histogram2d(df_new['x'], df_new['y'], bins=(6,5))[0]
data = np.rot90(data, k=3)
data = np.fliplr(data)


fig, ax = plt.subplots(figsize=(18, 10))
ax = sns.heatmap(data, linewidth=3, linecolor='black', cmap='plasma', cbar=True, annot=True)
for t in ax.texts:
    t.set_text(str(np.round(float(t.get_text())/len(df_new)*100, 2)) + " %")
plt.axis('off')
plt.show()

jeopardy_raw


Conclusion

Based on the heatmap, it is clear that most Daily Doubles are placed in the second from bottom row.

From a gameshow standpoint, it makes sense that Jeopardy would place the Daily Doubles in a clue towards the bottom. The norm is to play the game from top to bottom, incrementally earning money from the easier clues and building your way down the grid.

Jeopardy would not want to place the Daily Double at the top of the grid, since a contestant may have little to no money to bet on the Daily Double.

Placing the Daily Doubles on the bottom can be problematic as stronger contestants could establish their lead by betting more with the Daily Double and opening the gap.

The sweetspot, then, is near the bottom, allowing for more exciting games for the viewers.


jeopardy_final


Thanks for reading!