
Tioga Pass Opening Date

Every winter, Tioga Pass in Yosemite National Park closes due to snow and unsafe driving conditions. Depending on the year's snowfall and road conditions, the pass reopens between May and June. Tioga Pass usually opens to bikes only for a week, and then it opens to the general public. What's a reasonable prediction for the 2021 Tioga Pass opening date?

Cleaning the Data

First, let's import the historical opening dates. The National Park Service (NPS) has recorded the opening dates for Tioga Pass since the 1980s.

import pandas as pd
import numpy as np

df = pd.read_csv('.../yose-Websitesortabledataelementsheetsroadscampgroundstrails.csv')
df
Year Tioga Opened Tioga Closed Glacier Pt Opened Glacier Pt Closed Mariposa Grove Opened Mariposa Grove Closed Snowpack as of Apr 1
0 2020 * 15-Jun 5-Nov 11-Jun 5-Nov 11-Jun 25-Dec 46%
1 2019 * 1-Jul 19-Nov 10-May 25-Nov 16-Apr 26-Nov 176%
2 2018 * 21-May 20-Nov 28-Apr 20-Nov 15-Jun 30-Nov 67%
3 2017 * 29-Jun 14-Nov 11-May 14-Nov Closed Closed 177%
4 2016 * 18-May 16-Nov 19-Apr 16-Nov Closed Closed 89%
5 2015 * 4-May 1-Nov 28-Mar 2-Nov No closure 6-Jul 7%
6 2014 * 2-May 13-Nov 14-Apr 28-Nov No closure No closure 33%
7 2013 11-May 18-Nov 3-May 18-Nov No closure No closure 52%
8 2012 7-May 8-Nov 20-Apr 8-Nov No closure No closure 43%
9 2011 * 18-Jun 17-Jan 27-May 19-Nov 15-Apr No closure 178%
10 2010 5-Jun 19-Nov 29-May 7-Nov 21-May 20-Nov 107%
11 2009 19-May 12-Nov 5-May 12-Nov NaN NaN 92%
12 2008 21-May 30-Oct 2-May 12-Dec NaN NaN 99%
13 2007 11-May 6-Dec 4-May 6-Dec NaN NaN 46%
14 2006 17-Jun 27-Nov 25-May 27-Nov NaN NaN 129%
15 2005 24-Jun 25-Nov 25-May ? NaN NaN 163%
16 2004 14-May 17-Oct 14-May ? NaN NaN 83%
17 2003 31-May 31-Oct 31-May 31-Oct NaN NaN 65%
18 2002 22-May 5-Nov 17-May 5-Nov NaN NaN 95%
19 2001 12-May 11-Nov 15-May ? NaN NaN 67%
20 2000 18-May 9-Nov 15-May 9-Nov NaN NaN 97%
21 1999 * 28-May 23-Nov 28-May 23-Nov NaN NaN 110%
22 1998 1-Jul 12-Nov 1-Jul 6-Nov NaN NaN 156%
23 1997 13-Jun 12-Nov 22-May 12-Nov NaN NaN 105%
24 1996 31-May 5-Nov 24-May 5-Nov NaN NaN 111%
25 1995 30-Jun 11-Dec 1-Jul 11-Dec NaN NaN 178%
26 1994 25-May 10-Nov NaN NaN NaN NaN 51%
27 1993 3-Jun 24-Nov NaN NaN NaN NaN 159%
28 1992 15-May 10-Nov NaN NaN NaN NaN 58%
29 1991 26-May 14-Nov NaN NaN NaN NaN 79%
30 1990 17-May 19-Nov NaN NaN NaN NaN 45%
31 1989 12-May 24-Nov NaN NaN NaN NaN 83%
32 1988 29-Apr 14-Nov NaN NaN NaN NaN 31%
33 1987 2-May 13-Nov NaN NaN NaN NaN 51%
34 1986 24-May 29-Nov NaN NaN NaN NaN 137%
35 1985 8-May 12-Nov NaN NaN NaN NaN 97%
36 1984 19-May 8-Nov NaN NaN NaN NaN 85%
37 1983 29-Jun 11-Nov NaN NaN NaN NaN 224%
38 1982 28-May 15-Nov NaN NaN NaN NaN 131%
39 1981 15-May 12-Nov NaN NaN NaN NaN 77%
40 1980 6-Jun 2-Dec NaN NaN NaN NaN 144%
41 Average ('96-'15) 26-May 14-Nov 14-May 15-Nov -- -- 92%
42 Median ('96-'15) 21-May 12-Nov 16-May 10-Nov -- -- 96%

The only columns needed for analysis are “Year” and “Tioga Opened,” and the other columns can be dropped.

Using April 1st as a reference date, I created the function formatDates() to calculate, for each year, the number of days after April 1st on which Tioga Pass opened, storing the result in a new column "Days Since Apr 1."

import datetime

df = df.drop([41, 42])  # drop the Average and Median summary rows
df = df.drop(columns=['Tioga Closed', 'Glacier Pt Opened', 'Glacier Pt Closed', 'Mariposa Grove Opened', 'Mariposa Grove Closed', 'Snowpack as of Apr 1'])
df['Days Since Apr 1'] = 0

def formatDates(row):
    # strptime defaults to year 1900 when no year is given, so use 1900-04-01 as the reference date
    snowpack_day = datetime.datetime.strptime('1900-04-01 00:00:00', '%Y-%m-%d %H:%M:%S')
    # Strip the "* " marker some years carry before parsing the day-month string
    tioga_open = datetime.datetime.strptime(row.lstrip('* '), '%d-%b')
    days_since = tioga_open - snowpack_day
    return days_since.days

for i in range(len(df)):
    # column 1 is 'Tioga Opened', column 2 is 'Days Since Apr 1'
    df.iloc[i, 2] = formatDates(df.iloc[i, 1])

df
Year Tioga Opened Days Since Apr 1
0 2020 * 15-Jun 75
1 2019 * 1-Jul 91
2 2018 * 21-May 50
3 2017 * 29-Jun 89
4 2016 * 18-May 47
5 2015 * 4-May 33
6 2014 * 2-May 31
7 2013 11-May 40
8 2012 7-May 36
9 2011 * 18-Jun 78
10 2010 5-Jun 65
11 2009 19-May 48
12 2008 21-May 50
13 2007 11-May 40
14 2006 17-Jun 77
15 2005 24-Jun 84
16 2004 14-May 43
17 2003 31-May 60
18 2002 22-May 51
19 2001 12-May 41
20 2000 18-May 47
21 1999 * 28-May 57
22 1998 1-Jul 91
23 1997 13-Jun 73
24 1996 31-May 60
25 1995 30-Jun 90
26 1994 25-May 54
27 1993 3-Jun 63
28 1992 15-May 44
29 1991 26-May 55
30 1990 17-May 46
31 1989 12-May 41
32 1988 29-Apr 28
33 1987 2-May 31
34 1986 24-May 53
35 1985 8-May 37
36 1984 19-May 48
37 1983 29-Jun 89
38 1982 28-May 57
39 1981 15-May 44
40 1980 6-Jun 66
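As a side note, the per-row loop can be replaced with a vectorized `apply`. A minimal sketch of the same conversion, using made-up sample values rather than the real dataframe:

```python
import pandas as pd

def days_since_apr1(opened: str) -> int:
    # Strip any leading "* " marker, parse the day-month string (to_datetime
    # defaults to year 1900), and count days from April 1
    tioga_open = pd.to_datetime(opened.lstrip('* '), format='%d-%b')
    return (tioga_open - pd.Timestamp(1900, 4, 1)).days

opened = pd.Series(['* 15-Jun', '1-Jul', '4-May'])  # hypothetical sample rows
print(opened.apply(days_since_apr1).tolist())  # [75, 91, 33]
```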

The snow depth data (imported below) only goes back to late 2004, so the Tioga Pass opening dates can be reduced to 2005-2020.

# Drops the years before 2005, and reverses the dataframe so data is sorted from old --> new.
df = df[0:16]
df = df[::-1].reset_index(drop=True)
df
Year Tioga Opened Days Since Apr 1
0 2005 24-Jun 84
1 2006 17-Jun 77
2 2007 11-May 40
3 2008 21-May 50
4 2009 19-May 48
5 2010 5-Jun 65
6 2011 * 18-Jun 78
7 2012 7-May 36
8 2013 11-May 40
9 2014 * 2-May 31
10 2015 * 4-May 33
11 2016 * 18-May 47
12 2017 * 29-Jun 89
13 2018 * 21-May 50
14 2019 * 1-Jul 91
15 2020 * 15-Jun 75

Next, let’s import the snow depth data from the California Data Exchange Center. The station of interest is in Tuolumne Meadows (TUM).

import pandas as pd
df_TUM = pd.read_excel('.../TUM_18.xlsx')
df_TUM
STATION_ID DURATION SENSOR_NUMBER SENS_TYPE DATE TIME OBS DATE VALUE DATA_FLAG UNITS
0 TUM D 18 SNOW DP NaN 20041001.0 2.0 INCHES
1 TUM D 18 SNOW DP NaN 20041002.0 0.0 INCHES
2 TUM D 18 SNOW DP NaN 20041003.0 -0.0 INCHES
3 TUM D 18 SNOW DP NaN 20041004.0 0.0 INCHES
4 TUM D 18 SNOW DP NaN 20041005.0 1.0 INCHES
... ... ... ... ... ... ... ... ... ...
6004 TUM D 18 SNOW DP NaN 20210310.0 38.0 INCHES
6005 TUM D 18 SNOW DP NaN 20210311.0 118.0 INCHES
6006 TUM D 18 SNOW DP NaN 20210312.0 118.0 INCHES
6007 TUM D 18 SNOW DP NaN 20210313.0 42.0 INCHES
6008 TUM D 18 SNOW DP NaN 20210314.0 NaN INCHES

6009 rows × 9 columns

The only columns needed for analysis are "OBS DATE" and "VALUE." The "DATE TIME" column is also kept, since it will hold the OBS DATE values once they're converted to datetime objects.

The other columns can be dropped, along with any rows missing snow data.

# Drops the columns that are no longer needed.
# Drops rows with NaN values in the VALUE or OBS DATE columns
df_TUM = df_TUM.drop(columns=['STATION_ID', 'DURATION', 'SENSOR_NUMBER', 'SENS_TYPE', 'DATA_FLAG', 'UNITS'])
df_TUM = df_TUM.dropna(subset=['VALUE']).reset_index(drop=True)
df_TUM = df_TUM.dropna(subset=['OBS DATE']).reset_index(drop=True)
df_TUM
DATE TIME OBS DATE VALUE
0 NaN 20041001.0 2.0
1 NaN 20041002.0 0.0
2 NaN 20041003.0 -0.0
3 NaN 20041004.0 0.0
4 NaN 20041005.0 1.0
... ... ... ...
5432 NaN 20210309.0 36.0
5433 NaN 20210310.0 38.0
5434 NaN 20210311.0 118.0
5435 NaN 20210312.0 118.0
5436 NaN 20210313.0 42.0

5437 rows × 3 columns

# Convert the OBS DATE column from float (e.g. 20041001.0) to string (e.g. '20041001')
df_TUM['OBS DATE'] = df_TUM['OBS DATE'].astype(int)
df_TUM['OBS DATE'] = df_TUM['OBS DATE'].astype(str)

# Fill in the DATE TIME column by parsing the OBS DATE strings into pandas Timestamp objects
df_TUM['DATE TIME'] = pd.to_datetime(df_TUM['OBS DATE'], format='%Y%m%d')
df_TUM
DATE TIME OBS DATE VALUE
0 2004-10-01 20041001 2.0
1 2004-10-02 20041002 0.0
2 2004-10-03 20041003 -0.0
3 2004-10-04 20041004 0.0
4 2004-10-05 20041005 1.0
... ... ... ...
5432 2021-03-09 20210309 36.0
5433 2021-03-10 20210310 38.0
5434 2021-03-11 20210311 118.0
5435 2021-03-12 20210312 118.0
5436 2021-03-13 20210313 42.0

5437 rows × 3 columns
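Aside: the float-to-int-to-string steps are needed because OBS DATE was read in as floats like 20041001.0. The same parse can also be written as one chained expression — a sketch on standalone sample values:

```python
import pandas as pd

obs = pd.Series([20041001.0, 20210313.0])  # sample OBS DATE floats
# Cast float -> int -> str, then parse YYYYMMDD strings in one chain
dates = pd.to_datetime(obs.astype(int).astype(str), format='%Y%m%d')
print(dates.tolist())
```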

With daily snow depth data from October 2004 through March 2021, let's average the snow depth for each month.

# Groupby the month and year
TUM_avg = df_TUM.groupby([(df_TUM['DATE TIME'].dt.year),(df_TUM['DATE TIME'].dt.month)]).mean()
TUM_avg
VALUE
DATE TIME DATE TIME
2004 10 9.833333
11 22.166667
12 35.258065
2005 1 78.064516
2 78.250000
... ... ...
2020 11 15.033333
12 24.258065
2021 1 33.838710
2 50.642857
3 50.538462

196 rows × 1 columns
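The two-level groupby above yields a (year, month) MultiIndex. An equivalent monthly average can also be computed with pd.Grouper — a sketch on a tiny synthetic frame (the values are invented):

```python
import pandas as pd

snow = pd.DataFrame({
    'DATE TIME': pd.to_datetime(['2005-01-01', '2005-01-02', '2005-02-01']),
    'VALUE': [10.0, 20.0, 30.0],
})
# One row per calendar month, labeled by month start ('MS')
monthly = snow.groupby(pd.Grouper(key='DATE TIME', freq='MS'))['VALUE'].mean()
print(monthly.tolist())  # [15.0, 30.0]
```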

Once the snow depth data has been averaged by month, let's split it by calendar month. For example, the TUM_1 dataframe will contain the average January snow depth for each year from 2005-2021.

# Set the name of the multiindex
TUM_avg.index.names = ['Year', 'Month']

# Select January subset from the multiindex dataframe
TUM_1 = TUM_avg[TUM_avg.index.get_level_values('Month') == 1]
TUM_1
VALUE
Year Month
2005 1 78.064516
2006 1 70.058824
2007 1 20.483871
2008 1 54.870968
2009 1 27.774194
2010 1 44.800000
2011 1 66.428571
2012 1 8.178571
2013 1 54.407407
2014 1 4.933333
2015 1 8.866667
2016 1 51.137931
2017 1 73.782609
2018 1 24.483871
2019 1 49.580645
2020 1 29.129032
2021 1 33.838710
# Repeat for the other months used in the analysis (Aug-Oct are skipped)
TUM_2 = TUM_avg[TUM_avg.index.get_level_values('Month') == 2]
TUM_3 = TUM_avg[TUM_avg.index.get_level_values('Month') == 3]
TUM_4 = TUM_avg[TUM_avg.index.get_level_values('Month') == 4]
TUM_5 = TUM_avg[TUM_avg.index.get_level_values('Month') == 5]
TUM_6 = TUM_avg[TUM_avg.index.get_level_values('Month') == 6]
TUM_7 = TUM_avg[TUM_avg.index.get_level_values('Month') == 7]
TUM_11 = TUM_avg[TUM_avg.index.get_level_values('Month') == 11]
TUM_12 = TUM_avg[TUM_avg.index.get_level_values('Month') == 12]
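Rather than nine separately named variables, the same split can be expressed as a dict keyed by month. A sketch using a small synthetic stand-in for TUM_avg:

```python
import pandas as pd

# Synthetic stand-in for TUM_avg: a (Year, Month) MultiIndex with invented values
idx = pd.MultiIndex.from_tuples(
    [(2005, 1), (2005, 3), (2006, 1), (2006, 3)], names=['Year', 'Month'])
TUM_avg = pd.DataFrame({'VALUE': [78.1, 81.5, 70.1, 89.2]}, index=idx)

# One sub-frame per month present in the index
by_month = {m: TUM_avg[TUM_avg.index.get_level_values('Month') == m]
            for m in sorted(set(TUM_avg.index.get_level_values('Month')))}
print(by_month[3]['VALUE'].tolist())  # [81.5, 89.2]
```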

With the snow depth averaged by month, we can restrict the model to historical data by creating a subset of the TUM_3 (March) dataframe that excludes 2021, the year being predicted.

TUM_3_Historical = TUM_3[TUM_3.index.get_level_values('Year') != 2021]
TUM_3_Historical
VALUE
Year Month
2005 3 81.548387
2006 3 89.200000
2007 3 31.225806
2008 3 55.645161
2009 3 58.387097
2010 3 66.548387
2011 3 83.333333
2012 3 20.793103
2013 3 37.000000
2014 3 19.724138
2015 3 1.064516
2016 3 44.566667
2017 3 97.516129
2018 3 50.709677
2019 3 93.870968
2020 3 34.709677
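The regression below pairs each year's March average with that year's opening date positionally, so both frames must list the same years in the same oldest-to-newest order. A quick sanity check, sketched on hypothetical two-year stand-ins for the real frames:

```python
import pandas as pd

# Hypothetical mini versions of the two frames (values invented)
df = pd.DataFrame({'Year': [2005, 2006], 'Days Since Apr 1': [84, 77]})
TUM_3_Historical = pd.DataFrame(
    {'VALUE': [81.5, 89.2]},
    index=pd.MultiIndex.from_tuples([(2005, 3), (2006, 3)], names=['Year', 'Month']))

# Compare year sequences element-by-element
years_match = list(df['Year']) == list(TUM_3_Historical.index.get_level_values('Year'))
print(years_match)  # True
```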

Plotting the Data

Now that the data has been processed, it's time to plot it and build the model. A simple linear regression of Days Since April 1st on March snow depth can be performed.

import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

x = TUM_3_Historical['VALUE'].values.reshape(-1, 1)  # average March snow depth (inches)
y = df['Days Since Apr 1'].values.reshape(-1, 1)     # opening date, days since April 1

linear_regressor = LinearRegression()  # Create object for the class
linear_regressor.fit(x, y)             # Perform linear regression
y_pred = linear_regressor.predict(x)   # Fitted values

plt.plot(x, y_pred, color='red')
plt.scatter(x, y)
plt.xlabel('Average March Snow Depth (inches)')
plt.ylabel('Opening Date (days since April 1)')
plt.show()

[Figure: Tioga Pass LSR model — March snow depth vs. days since April 1, with fitted regression line]

from scipy.stats import linregress

results = linregress(x.flatten(), y.flatten())
slope, intercept, r_value = results.slope, results.intercept, results.rvalue
print(results)
print('R-squared:', r_value**2)
LinregressResult(slope=0.6338435941606729, intercept=24.074433161488514, rvalue=0.8816178208666444, pvalue=6.354407547787794e-06, stderr=0.09068732698031783, intercept_stderr=5.541187818860552)
R-squared: 0.7772499820696507

The scatterplot shows a strong, positive, linear association between March snow depth and Days Since April 1st, with one potential outlier. 77.7% of the variability in Days Since April 1st is accounted for by the LSR model on March snow depth.
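For a one-predictor model, R-squared is just the squared Pearson correlation between the two variables, so it can be cross-checked without linregress. A sketch on illustrative numbers (not the actual snow-depth series):

```python
import numpy as np

x = np.array([20.0, 40.0, 60.0, 80.0, 100.0])  # made-up snow depths
y = np.array([35.0, 45.0, 55.0, 80.0, 90.0])   # made-up opening days
r = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient
print(round(r**2, 3))  # 0.969
```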

With the LSR model, we can estimate the number of days after April 1st on which Tioga Pass will open this year, based on the March 2021 snow depth data.

current_snow_depth = TUM_3['VALUE'].iloc[-1]
print(current_snow_depth, 'inches')

predicted_days = slope*current_snow_depth + intercept
print(np.round(predicted_days), 'days since April 1')
50.53846153846154 inches
56.0 days since April 1
raw_date = datetime.date(2021, 4, 1) + datetime.timedelta(days=int(np.round(predicted_days)))
print(raw_date)
2021-05-27

Historically, the NPS has typically opened Tioga Pass on a Monday. As such, May 27 should be rounded forward to the following Monday.

def nextMonday(date):
    # weekday() returns 0 for Monday; always jump forward 1-7 days to the next Monday
    monday = 0
    days_ahead = monday - date.weekday() + 7
    return date + datetime.timedelta(days_ahead)

predicted_date = nextMonday(raw_date)
print(predicted_date)
2021-05-31
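Note that nextMonday always moves at least one day forward: an input that is already a Monday maps to the Monday a week later. A quick check (the function is repeated here so the snippet stands alone):

```python
import datetime

def nextMonday(date):
    # weekday() returns 0 for Monday; always jump forward 1-7 days
    days_ahead = 0 - date.weekday() + 7
    return date + datetime.timedelta(days_ahead)

print(nextMonday(datetime.date(2021, 5, 27)))  # Thursday -> 2021-05-31
print(nextMonday(datetime.date(2021, 5, 31)))  # Monday   -> 2021-06-07
```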

Conclusion

Based on the snow depth data through March 13, 2021, my prediction for this year's Tioga Pass opening date is May 31, 2021.