Datasets

clustercast includes two example datasets that can be imported and used to experiment with the forecasting classes.


Airline Passengers Dataset

function clustercast.datasets.load_airline_passengers()

This function returns the well-known airline passengers dataset as a pandas dataframe.[1] The airline passengers dataset records monthly airline passengers over a period of many years. This dataset is great for learning about time series forecasting because it is non-stationary and exhibits strong seasonality.

Example

from clustercast.datasets import load_airline_passengers

# load in the dataset
data = load_airline_passengers()
print(data)
            YM  Passengers
0   1949-01-01         112
1   1949-02-01         118
2   1949-03-01         132
3   1949-04-01         129
4   1949-05-01         121
..         ...         ...
139 1960-08-01         606
140 1960-09-01         508
141 1960-10-01         461
142 1960-11-01         390
143 1960-12-01         432

[144 rows x 2 columns]

Store Sales Dataset

function clustercast.datasets.load_store_sales()

This function returns a superstore sales dataset as a pandas dataframe.[2] This dataset is a modified version of an open dataset found on Kaggle, linked in the references below. The store sales dataset is a good example for global forecasting of grouped time series, as there are 12 closely related time series that have shared and overlapping attributes (store region and product category).[3]

Example

from clustercast.datasets import load_store_sales

# load in the dataset
data = load_store_sales()
print(data)
     ID         YM   Region    Category     Sales
0     1 2015-01-01  Central   Furniture   506.358
1     1 2015-02-01  Central   Furniture   439.310
2     1 2015-03-01  Central   Furniture  3639.290
3     1 2015-04-01  Central   Furniture  1468.218
4     1 2015-05-01  Central   Furniture  2304.382
..   ..        ...      ...         ...       ...
568  12 2018-08-01     West  Technology  6230.788
569  12 2018-09-01     West  Technology  5045.440
570  12 2018-10-01     West  Technology  4651.807
571  12 2018-11-01     West  Technology  7584.580
572  12 2018-12-01     West  Technology  8064.524

[573 rows x 5 columns]

References

[1] “Airline Passengers.” Kaggle. Accessed 2025.

[2] “Superstore Sales Dataset.” Kaggle. Accessed 2025.

[3] Hyndman, Rob J., and George Athanasopoulos. “Hierarchical and Grouped Time Series.” Forecasting: Principles and Practice, Otexts, 2021.