Tioga Pass Opening Date
Every winter, Tioga Pass in Yosemite National Park closes because of snow and unsafe driving conditions. Depending on the year's snowfall and road conditions, it reopens sometime between May and June, usually to bikes only for about a week before opening to the general public. What's a reasonable prediction for the 2021 Tioga Pass Opening Date?
Cleaning the Data
First, let's import the historical opening dates from the National Park Service (NPS), which has recorded the opening dates for Tioga Pass going back to 1980.
import pandas as pd
import numpy as np

# Load the NPS table of historical opening dates
df = pd.read_csv('.../yose-Websitesortabledataelementsheetsroadscampgroundstrails.csv')
df
| | Year | Tioga Opened | Tioga Closed | Glacier Pt Opened | Glacier Pt Closed | Mariposa Grove Opened | Mariposa Grove Closed | Snowpack as of Apr 1 |
|---|---|---|---|---|---|---|---|---|
0 | 2020 * | 15-Jun | 5-Nov | 11-Jun | 5-Nov | 11-Jun | 25-Dec | 46% |
1 | 2019 * | 1-Jul | 19-Nov | 10-May | 25-Nov | 16-Apr | 26-Nov | 176% |
2 | 2018 * | 21-May | 20-Nov | 28-Apr | 20-Nov | 15-Jun | 30-Nov | 67% |
3 | 2017 * | 29-Jun | 14-Nov | 11-May | 14-Nov | Closed | Closed | 177% |
4 | 2016 * | 18-May | 16-Nov | 19-Apr | 16-Nov | Closed | Closed | 89% |
5 | 2015 * | 4-May | 1-Nov | 28-Mar | 2-Nov | No closure | 6-Jul | 7% |
6 | 2014 * | 2-May | 13-Nov | 14-Apr | 28-Nov | No closure | No closure | 33% |
7 | 2013 | 11-May | 18-Nov | 3-May | 18-Nov | No closure | No closure | 52% |
8 | 2012 | 7-May | 8-Nov | 20-Apr | 8-Nov | No closure | No closure | 43% |
9 | 2011 * | 18-Jun | 17-Jan | 27-May | 19-Nov | 15-Apr | No closure | 178% |
10 | 2010 | 5-Jun | 19-Nov | 29-May | 7-Nov | 21-May | 20-Nov | 107% |
11 | 2009 | 19-May | 12-Nov | 5-May | 12-Nov | NaN | NaN | 92% |
12 | 2008 | 21-May | 30-Oct | 2-May | 12-Dec | NaN | NaN | 99% |
13 | 2007 | 11-May | 6-Dec | 4-May | 6-Dec | NaN | NaN | 46% |
14 | 2006 | 17-Jun | 27-Nov | 25-May | 27-Nov | NaN | NaN | 129% |
15 | 2005 | 24-Jun | 25-Nov | 25-May | ? | NaN | NaN | 163% |
16 | 2004 | 14-May | 17-Oct | 14-May | ? | NaN | NaN | 83% |
17 | 2003 | 31-May | 31-Oct | 31-May | 31-Oct | NaN | NaN | 65% |
18 | 2002 | 22-May | 5-Nov | 17-May | 5-Nov | NaN | NaN | 95% |
19 | 2001 | 12-May | 11-Nov | 15-May | ? | NaN | NaN | 67% |
20 | 2000 | 18-May | 9-Nov | 15-May | 9-Nov | NaN | NaN | 97% |
21 | 1999 * | 28-May | 23-Nov | 28-May | 23-Nov | NaN | NaN | 110% |
22 | 1998 | 1-Jul | 12-Nov | 1-Jul | 6-Nov | NaN | NaN | 156% |
23 | 1997 | 13-Jun | 12-Nov | 22-May | 12-Nov | NaN | NaN | 105% |
24 | 1996 | 31-May | 5-Nov | 24-May | 5-Nov | NaN | NaN | 111% |
25 | 1995 | 30-Jun | 11-Dec | 1-Jul | 11-Dec | NaN | NaN | 178% |
26 | 1994 | 25-May | 10-Nov | NaN | NaN | NaN | NaN | 51% |
27 | 1993 | 3-Jun | 24-Nov | NaN | NaN | NaN | NaN | 159% |
28 | 1992 | 15-May | 10-Nov | NaN | NaN | NaN | NaN | 58% |
29 | 1991 | 26-May | 14-Nov | NaN | NaN | NaN | NaN | 79% |
30 | 1990 | 17-May | 19-Nov | NaN | NaN | NaN | NaN | 45% |
31 | 1989 | 12-May | 24-Nov | NaN | NaN | NaN | NaN | 83% |
32 | 1988 | 29-Apr | 14-Nov | NaN | NaN | NaN | NaN | 31% |
33 | 1987 | 2-May | 13-Nov | NaN | NaN | NaN | NaN | 51% |
34 | 1986 | 24-May | 29-Nov | NaN | NaN | NaN | NaN | 137% |
35 | 1985 | 8-May | 12-Nov | NaN | NaN | NaN | NaN | 97% |
36 | 1984 | 19-May | 8-Nov | NaN | NaN | NaN | NaN | 85% |
37 | 1983 | 29-Jun | 11-Nov | NaN | NaN | NaN | NaN | 224% |
38 | 1982 | 28-May | 15-Nov | NaN | NaN | NaN | NaN | 131% |
39 | 1981 | 15-May | 12-Nov | NaN | NaN | NaN | NaN | 77% |
40 | 1980 | 6-Jun | 2-Dec | NaN | NaN | NaN | NaN | 144% |
41 | Average ('96-'15) | 26-May | 14-Nov | 14-May | 15-Nov | -- | -- | 92% |
42 | Median ('96-'15) | 21-May | 12-Nov | 16-May | 10-Nov | -- | -- | 96% |
The only columns needed for analysis are “Year” and “Tioga Opened,” and the other columns can be dropped.
Using April 1 (the snowpack measurement date) as a reference, I created the function formatDates() to calculate, for each year, the number of days after April 1 that Tioga Pass opened, storing the result in a new column, “Days Since Apr 1.”
import datetime

# Drop the Average/Median summary rows and the columns that aren't needed.
df = df.drop([41, 42])
df = df.drop(columns=['Tioga Closed', 'Glacier Pt Opened', 'Glacier Pt Closed', 'Mariposa Grove Opened', 'Mariposa Grove Closed', 'Snowpack as of Apr 1'])
df['Days Since Apr 1'] = 0

def formatDates(date_str):
    # Both dates parse with strptime's default year of 1900, so the
    # subtraction gives the offset from April 1 regardless of the year.
    apr_1 = datetime.datetime.strptime('1900-04-01 00:00:00', '%Y-%m-%d %H:%M:%S')
    tioga_open = datetime.datetime.strptime(date_str, '%d-%b')
    return (tioga_open - apr_1).days

for i in range(len(df)):
    # Column 1 is 'Tioga Opened'; column 2 is 'Days Since Apr 1'.
    df.iloc[i, 2] = formatDates(df.iloc[i, 1])
df
df
| | Year | Tioga Opened | Days Since Apr 1 |
|---|---|---|---|
0 | 2020 * | 15-Jun | 75 |
1 | 2019 * | 1-Jul | 91 |
2 | 2018 * | 21-May | 50 |
3 | 2017 * | 29-Jun | 89 |
4 | 2016 * | 18-May | 47 |
5 | 2015 * | 4-May | 33 |
6 | 2014 * | 2-May | 31 |
7 | 2013 | 11-May | 40 |
8 | 2012 | 7-May | 36 |
9 | 2011 * | 18-Jun | 78 |
10 | 2010 | 5-Jun | 65 |
11 | 2009 | 19-May | 48 |
12 | 2008 | 21-May | 50 |
13 | 2007 | 11-May | 40 |
14 | 2006 | 17-Jun | 77 |
15 | 2005 | 24-Jun | 84 |
16 | 2004 | 14-May | 43 |
17 | 2003 | 31-May | 60 |
18 | 2002 | 22-May | 51 |
19 | 2001 | 12-May | 41 |
20 | 2000 | 18-May | 47 |
21 | 1999 * | 28-May | 57 |
22 | 1998 | 1-Jul | 91 |
23 | 1997 | 13-Jun | 73 |
24 | 1996 | 31-May | 60 |
25 | 1995 | 30-Jun | 90 |
26 | 1994 | 25-May | 54 |
27 | 1993 | 3-Jun | 63 |
28 | 1992 | 15-May | 44 |
29 | 1991 | 26-May | 55 |
30 | 1990 | 17-May | 46 |
31 | 1989 | 12-May | 41 |
32 | 1988 | 29-Apr | 28 |
33 | 1987 | 2-May | 31 |
34 | 1986 | 24-May | 53 |
35 | 1985 | 8-May | 37 |
36 | 1984 | 19-May | 48 |
37 | 1983 | 29-Jun | 89 |
38 | 1982 | 28-May | 57 |
39 | 1981 | 15-May | 44 |
40 | 1980 | 6-Jun | 66 |
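As an aside, the explicit loop used above could also be written as a single vectorized apply over the “Tioga Opened” column — an equivalent one-liner using the same formatDates function:

df['Days Since Apr 1'] = df['Tioga Opened'].apply(formatDates)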
The snow depth data, imported below, only goes back to late 2004, so the Tioga Pass opening dates are trimmed to 2005-2020.
# Drops the years before 2005, and reverses the dataframe so data is sorted from old --> new.
df = df[0:16]
df = df[::-1].reset_index(drop=True)
df
| | Year | Tioga Opened | Days Since Apr 1 |
|---|---|---|---|
0 | 2005 | 24-Jun | 84 |
1 | 2006 | 17-Jun | 77 |
2 | 2007 | 11-May | 40 |
3 | 2008 | 21-May | 50 |
4 | 2009 | 19-May | 48 |
5 | 2010 | 5-Jun | 65 |
6 | 2011 * | 18-Jun | 78 |
7 | 2012 | 7-May | 36 |
8 | 2013 | 11-May | 40 |
9 | 2014 * | 2-May | 31 |
10 | 2015 * | 4-May | 33 |
11 | 2016 * | 18-May | 47 |
12 | 2017 * | 29-Jun | 89 |
13 | 2018 * | 21-May | 50 |
14 | 2019 * | 1-Jul | 91 |
15 | 2020 * | 15-Jun | 75 |
Next, let’s import the snow depth data from the California Data Exchange Center. The station of interest is in Tuolumne Meadows (TUM).
import pandas as pd

# Load the daily snow depth readings for the TUM station
df_TUM = pd.read_excel('.../TUM_18.xlsx')
df_TUM
| | STATION_ID | DURATION | SENSOR_NUMBER | SENS_TYPE | DATE TIME | OBS DATE | VALUE | DATA_FLAG | UNITS |
|---|---|---|---|---|---|---|---|---|---|
0 | TUM | D | 18 | SNOW DP | NaN | 20041001.0 | 2.0 | | INCHES |
1 | TUM | D | 18 | SNOW DP | NaN | 20041002.0 | 0.0 | | INCHES |
2 | TUM | D | 18 | SNOW DP | NaN | 20041003.0 | -0.0 | | INCHES |
3 | TUM | D | 18 | SNOW DP | NaN | 20041004.0 | 0.0 | | INCHES |
4 | TUM | D | 18 | SNOW DP | NaN | 20041005.0 | 1.0 | | INCHES |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
6004 | TUM | D | 18 | SNOW DP | NaN | 20210310.0 | 38.0 | | INCHES |
6005 | TUM | D | 18 | SNOW DP | NaN | 20210311.0 | 118.0 | | INCHES |
6006 | TUM | D | 18 | SNOW DP | NaN | 20210312.0 | 118.0 | | INCHES |
6007 | TUM | D | 18 | SNOW DP | NaN | 20210313.0 | 42.0 | | INCHES |
6008 | TUM | D | 18 | SNOW DP | NaN | 20210314.0 | NaN | | INCHES |
6009 rows × 9 columns
The only columns needed for analysis are “OBS DATE” and “VALUE.” “DATE TIME” is kept as well, since it will hold the datetime version of “OBS DATE.” The other columns can be dropped, along with any rows missing snow data.
# Drop the columns that are no longer needed.
# Drop rows with NaN values in the VALUE or OBS DATE columns.
df_TUM = df_TUM.drop(columns=['STATION_ID', 'DURATION', 'SENSOR_NUMBER', 'SENS_TYPE', 'DATA_FLAG', 'UNITS'])
df_TUM = df_TUM.dropna(subset=['VALUE', 'OBS DATE']).reset_index(drop=True)
df_TUM
| | DATE TIME | OBS DATE | VALUE |
|---|---|---|---|
0 | NaN | 20041001.0 | 2.0 |
1 | NaN | 20041002.0 | 0.0 |
2 | NaN | 20041003.0 | -0.0 |
3 | NaN | 20041004.0 | 0.0 |
4 | NaN | 20041005.0 | 1.0 |
... | ... | ... | ... |
5432 | NaN | 20210309.0 | 36.0 |
5433 | NaN | 20210310.0 | 38.0 |
5434 | NaN | 20210311.0 | 118.0 |
5435 | NaN | 20210312.0 | 118.0 |
5436 | NaN | 20210313.0 | 42.0 |
5437 rows × 3 columns
# Convert the OBS DATE column to strings (e.g., 20041001.0 -> '20041001')
df_TUM['OBS DATE'] = df_TUM['OBS DATE'].astype(int).astype(str)
# Fill in the DATE TIME column by parsing OBS DATE into pandas Timestamp objects
df_TUM['DATE TIME'] = pd.to_datetime(df_TUM['OBS DATE'], format='%Y%m%d')
df_TUM
| | DATE TIME | OBS DATE | VALUE |
|---|---|---|---|
0 | 2004-10-01 | 20041001 | 2.0 |
1 | 2004-10-02 | 20041002 | 0.0 |
2 | 2004-10-03 | 20041003 | -0.0 |
3 | 2004-10-04 | 20041004 | 0.0 |
4 | 2004-10-05 | 20041005 | 1.0 |
... | ... | ... | ... |
5432 | 2021-03-09 | 20210309 | 36.0 |
5433 | 2021-03-10 | 20210310 | 38.0 |
5434 | 2021-03-11 | 20210311 | 118.0 |
5435 | 2021-03-12 | 20210312 | 118.0 |
5436 | 2021-03-13 | 20210313 | 42.0 |
5437 rows × 3 columns
With daily snow depth data from October 2004 through March 2021, let's compute the average snow depth for each month.
# Group by year and month, averaging the daily snow depth values.
# Selecting VALUE first keeps non-numeric columns out of the mean.
TUM_avg = df_TUM.groupby([df_TUM['DATE TIME'].dt.year, df_TUM['DATE TIME'].dt.month])[['VALUE']].mean()
TUM_avg
DATE TIME (year) | DATE TIME (month) | VALUE
---|---|---
2004 | 10 | 9.833333
2004 | 11 | 22.166667
2004 | 12 | 35.258065
2005 | 1 | 78.064516
2005 | 2 | 78.250000
... | ... | ...
2020 | 11 | 15.033333
2020 | 12 | 24.258065
2021 | 1 | 33.838710
2021 | 2 | 50.642857
2021 | 3 | 50.538462
196 rows × 1 columns
Once the snow depth data has been averaged, let's split it by calendar month. For example, the TUM_1 dataframe will hold the average snow depth for each January from 2005 to 2021.
# Set the name of the multiindex
TUM_avg.index.names = ['Year', 'Month']
# Select January subset from the multiindex dataframe
TUM_1 = TUM_avg[TUM_avg.index.get_level_values('Month') == 1]
TUM_1
Year | Month | VALUE
---|---|---
2005 | 1 | 78.064516 |
2006 | 1 | 70.058824 |
2007 | 1 | 20.483871 |
2008 | 1 | 54.870968 |
2009 | 1 | 27.774194 |
2010 | 1 | 44.800000 |
2011 | 1 | 66.428571 |
2012 | 1 | 8.178571 |
2013 | 1 | 54.407407 |
2014 | 1 | 4.933333 |
2015 | 1 | 8.866667 |
2016 | 1 | 51.137931 |
2017 | 1 | 73.782609 |
2018 | 1 | 24.483871 |
2019 | 1 | 49.580645 |
2020 | 1 | 29.129032 |
2021 | 1 | 33.838710 |
# Repeat for the other months of interest (Aug-Oct, when there is little to no snow, are skipped)
TUM_2 = TUM_avg[TUM_avg.index.get_level_values('Month') == 2]
TUM_3 = TUM_avg[TUM_avg.index.get_level_values('Month') == 3]
TUM_4 = TUM_avg[TUM_avg.index.get_level_values('Month') == 4]
TUM_5 = TUM_avg[TUM_avg.index.get_level_values('Month') == 5]
TUM_6 = TUM_avg[TUM_avg.index.get_level_values('Month') == 6]
TUM_7 = TUM_avg[TUM_avg.index.get_level_values('Month') == 7]
TUM_11 = TUM_avg[TUM_avg.index.get_level_values('Month') == 11]
TUM_12 = TUM_avg[TUM_avg.index.get_level_values('Month') == 12]
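Alternatively, all of the monthly subsets could be built in one pass with a dictionary comprehension — an equivalent sketch (the name monthly is my own):

monthly = {m: TUM_avg[TUM_avg.index.get_level_values('Month') == m] for m in range(1, 13)}
# monthly[3] is then the March subset, equivalent to TUM_3 above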
For the regression, the March averages need to line up with the 16 historical opening dates (2005-2020), so we create a subset of the TUM_3 dataframe that excludes the 2021 value.
TUM_3_Historical = TUM_3[TUM_3.index.get_level_values('Year') != 2021]
TUM_3_Historical
Year | Month | VALUE
---|---|---
2005 | 3 | 81.548387 |
2006 | 3 | 89.200000 |
2007 | 3 | 31.225806 |
2008 | 3 | 55.645161 |
2009 | 3 | 58.387097 |
2010 | 3 | 66.548387 |
2011 | 3 | 83.333333 |
2012 | 3 | 20.793103 |
2013 | 3 | 37.000000 |
2014 | 3 | 19.724138 |
2015 | 3 | 1.064516 |
2016 | 3 | 44.566667 |
2017 | 3 | 97.516129 |
2018 | 3 | 50.709677 |
2019 | 3 | 93.870968 |
2020 | 3 | 34.709677 |
Plotting the Data
Now that the data has been processed, it's time to plot it and build the model: a simple linear regression of Days Since April 1 (the response) on average March snow depth (the predictor).
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Both arrays are ordered 2005 -> 2020, so the rows line up year by year.
x = TUM_3_Historical['VALUE'].values.reshape(-1, 1)  # average March snow depth (inches)
y = df['Days Since Apr 1'].values.reshape(-1, 1)  # opening date offset (days)
linear_regressor = LinearRegression()  # Create object for the class
linear_regressor.fit(x, y)  # Perform linear regression
y_pred = linear_regressor.predict(x)  # Make predictions
plt.plot(x, y_pred, color='red')
plt.scatter(x, y)
plt.show()
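For a more readable figure, axis labels can be added before plt.show() — an optional touch (the label text is my own):

plt.xlabel('Average March Snow Depth (inches)')
plt.ylabel('Tioga Pass Opening (days since April 1)')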
from scipy.stats import linregress
# Run the regression once and unpack the statistics needed later.
results = linregress(x.flatten(), y.flatten())
slope, intercept, r_value = results.slope, results.intercept, results.rvalue
print(results)
print('R-squared:', r_value**2)
LinregressResult(slope=0.6338435941606729, intercept=24.074433161488514, rvalue=0.8816178208666444, pvalue=6.354407547787794e-06, stderr=0.09068732698031783, intercept_stderr=5.541187818860552)
R-squared: 0.7772499820696507
The scatterplot shows a strong, positive, linear association between March snow depth and Days Since April 1 (r = 0.88), with one potential outlier. About 77.7% of the variability in Days Since April 1 is accounted for by the least-squares regression on March snow depth.
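Written out, the fitted line from the linregress output above is approximately:

Days Since April 1 ≈ 0.634 × (average March snow depth in inches) + 24.1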
With this fitted model, we can estimate the number of days after April 1 that Tioga Pass will open this year, based on the March 2021 snow depth data.
current_snow_depth = TUM_3['VALUE'].iloc[-1]  # most recent row = March 2021 average
print(current_snow_depth, 'inches')
predicted_days = slope*current_snow_depth + intercept
print(np.round(predicted_days), 'days since April 1')
50.53846153846154 inches
56.0 days since April 1
raw_date = datetime.date(2021, 4, 1) + datetime.timedelta(days=int(np.round(predicted_days)))
print(raw_date)
2021-05-27
Historically, the NPS has typically opened Tioga Pass on a Monday. As such, May 27 needs to be rounded forward to the next Monday.
import datetime

def nextMonday(date):
    # weekday() is 0 for Monday; the +7 always moves forward to the
    # *next* Monday, even when the given date is itself a Monday.
    monday = 0
    days_ahead = monday - date.weekday() + 7
    return date + datetime.timedelta(days=days_ahead)
predicted_date = nextMonday(raw_date)
print(predicted_date)
2021-05-31
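A couple of quick sanity checks on the helper (illustrative calls of my own):

print(nextMonday(datetime.date(2021, 5, 27)))  # 2021-05-31: a Thursday rounds forward to Monday
print(nextMonday(datetime.date(2021, 5, 31)))  # 2021-06-07: a Monday is pushed a full week ahead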
Conclusion
Based on the snow depth data through March 13, 2021, my prediction for this year's Tioga Pass Opening Date is Monday, May 31, 2021.
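As a rough extra check of my own, the residual spread of the fitted line gives a sense of how far off this point prediction could typically be, using the x, y, and y_pred arrays defined earlier:

# In-sample root-mean-square error of the regression, in days.
residuals = y - y_pred
rmse = float(np.sqrt(np.mean(residuals**2)))
print('In-sample RMSE:', round(rmse, 1), 'days')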