A. Feature Descriptions
The tables below describe the features of the weather and bike share
datasets used in the project.
Table A1.
Weather Data Features
| Feature | Description |
|---|---|
| date | UTC date |
| hour | UTC hour |
| temp | Temperature (°C) |
| humidity | Humidity (%) |
| wind_speed | Wind speed (m/s) |
| cloudiness | Cloud cover (%) |
| weather_main | General weather label |
Table A2.
Bike Share Data Features
| Feature | Description |
|---|---|
| trip_id | Unique trip identifier |
| trip_start_time | Start time of trip |
| trip_stop_time | End time of trip |
| trip_duration_seconds | Duration of trip (s) |
| from_station_name | Origin station |
| to_station_name | Destination station |
| user_type | Type of user (member/casual) |
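The weather table is keyed by UTC date and hour, while each trip carries full timestamps. The Python sketch below shows one plausible way the two sources could be aligned: convert each trip's start time to UTC, truncate it to the hour, and look up the matching weather row. It is an illustration under stated assumptions, not the project's documented pipeline. It assumes `trip_start_time` and `trip_stop_time` are ISO 8601 strings with an explicit UTC offset, the sample rows are made up, and the helpers `hour_key`, `attach_weather`, and `duration_matches` are hypothetical. It also checks that `trip_duration_seconds` agrees with the stop minus start difference.

```python
from datetime import datetime, timezone
from typing import Tuple

WEATHER_COLS = ("temp", "humidity", "wind_speed", "cloudiness", "weather_main")

def hour_key(ts: str) -> Tuple[str, int]:
    """Map an ISO 8601 timestamp WITH a UTC offset (assumed) to (UTC date, UTC hour)."""
    utc = datetime.fromisoformat(ts).astimezone(timezone.utc)
    return utc.date().isoformat(), utc.hour

def attach_weather(trips, weather):
    """Left-join each trip to the weather row for its UTC start hour (None if absent)."""
    index = {(w["date"], w["hour"]): w for w in weather}
    joined = []
    for trip in trips:
        row = dict(trip)
        w = index.get(hour_key(trip["trip_start_time"]))
        for col in WEATHER_COLS:
            row[col] = w[col] if w else None
        joined.append(row)
    return joined

def duration_matches(trip, tol_seconds=1):
    """Check trip_duration_seconds against stop - start, within a tolerance."""
    start = datetime.fromisoformat(trip["trip_start_time"])
    stop = datetime.fromisoformat(trip["trip_stop_time"])
    elapsed = (stop - start).total_seconds()
    return abs(elapsed - trip["trip_duration_seconds"]) <= tol_seconds

# Made-up sample rows for illustration only (not project data).
weather = [
    {"date": "2017-07-01", "hour": 12, "temp": 24.5, "humidity": 60,
     "wind_speed": 3.1, "cloudiness": 20, "weather_main": "Clear"},
]
trips = [
    {"trip_id": 1,
     "trip_start_time": "2017-07-01T08:15:00-04:00",  # 12:15 UTC
     "trip_stop_time": "2017-07-01T08:27:30-04:00",
     "trip_duration_seconds": 750,
     "from_station_name": "Station A", "to_station_name": "Station B",
     "user_type": "member"},
]

joined = attach_weather(trips, weather)
print(joined[0]["temp"], joined[0]["weather_main"])  # 24.5 Clear
print(duration_matches(trips[0]))  # True
```

Trips whose start hour has no weather row keep `None` in the weather columns, which makes coverage gaps easy to count before modelling.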
B. Hyperparameter Grid Search
The table below summarizes the hyperparameter grids searched during
cross-validation for the Random Forest and XGBoost models.
Table B.
Hyperparameter Grid Considered for Random Forest and XGBoost
| Model | Parameter | Description | Values Considered |
|---|---|---|---|
| Random Forest | mtry | Number of variables randomly sampled as split candidates at each split | {2, 4, 6, 8} |
| Random Forest | ntree | Number of trees grown in the forest | {100, 250, 500} |
| Random Forest | nodesize | Minimum number of observations in a terminal node | {1, 5, 10} |
| XGBoost | eta | Learning rate (shrinks the contribution of each tree) | {0.05, 0.1} |
| XGBoost | max_depth | Maximum depth of each tree | {4, 6} |
| XGBoost | min_child_weight | Minimum sum of instance weights required in a child node | {1, 3} |
| XGBoost | subsample | Fraction of training instances sampled for each tree | {0.8, 1} |
| XGBoost | colsample_bytree | Fraction of features sampled for each tree | {0.8, 1} |
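Table B implies a full grid of 4 × 3 × 3 = 36 Random Forest configurations and 2 × 2 × 2 × 2 × 2 = 32 XGBoost configurations. The Python sketch below makes those counts explicit by enumerating every combination, similar in spirit to R's `expand.grid`. It only builds the parameter dictionaries and does not fit any models; the helper name `expand_grid` is hypothetical, and the parameter names simply mirror Table B.

```python
from itertools import product

# Candidate values copied from Table B.
rf_grid = {
    "mtry": [2, 4, 6, 8],
    "ntree": [100, 250, 500],
    "nodesize": [1, 5, 10],
}
xgb_grid = {
    "eta": [0.05, 0.1],
    "max_depth": [4, 6],
    "min_child_weight": [1, 3],
    "subsample": [0.8, 1.0],
    "colsample_bytree": [0.8, 1.0],
}

def expand_grid(grid):
    """Return every combination of the grid's values as a list of parameter dicts."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]

rf_configs = expand_grid(rf_grid)
xgb_configs = expand_grid(xgb_grid)
print(len(rf_configs), len(xgb_configs))  # 36 32
print(rf_configs[0])  # {'mtry': 2, 'ntree': 100, 'nodesize': 1}
```

Under k-fold cross-validation every configuration is fitted once per fold, so a full search costs 36k Random Forest fits and 32k XGBoost fits.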