Time Series Data Mining with Python
The Western Australian Branch of the Statistical Society of Australia (SSA WA) is delighted to host this hands-on Python workshop on the analysis of time series data. It’s our privilege to have the 2021 Frank Hansford-Miller Fellow, Dr Manuel Herrera, as the instructor.
This two-day workshop aims to enable students and practitioners in data science to add time series data mining methodologies to their skillset for future use in both academic and industry projects. After an introduction to Python for time series analysis, this workshop explores data mining techniques for pattern extraction in time series, ranging from dimensionality reduction to anomaly detection. Participants will benefit from data wrangling for time series analysis with Python on the first day and a practical overview of time series data mining tools on the second day.
Registration
Workshop participants must first register on the SSA Events page: https://www.statsoc.org.au/event-4528724
Slack
Links to Zoom sessions will be distributed via the Slack workspace for this workshop. Slack is also the place to ask for assistance and continue any discussions that arise during the workshop. The Slack workspace will remain active for six months after the event (and perhaps longer).
Schedule
The online workshop is structured as 4 x 2h sessions over two consecutive afternoons:
Day | Session | Time | Content |
---|---|---|---|
18 Nov | 1 | 1:30pm (2h) | Introduction to Python |
18 Nov | 2 | 4:00pm (2h) | Time Series |
19 Nov | 3 | 1:30pm (2h) | Introduction to SAX |
19 Nov | 4 | 4:00pm (2h) | Matrix Profile and Other Tools |
Materials
The workshop is designed as a Python notebook that can be run with the cloud service Colaboratory (‘Colab’), which lets you write and execute Python in your browser without setting up a local Python environment. For more information, visit the Google Colab website.
Links to materials will be posted here as they become available.
Curriculum
Day 1 - Fundamentals of Time Series Analysis with Python
This is a hands-on course for learning the basics of data wrangling and time series analysis with Python.
We will begin the course with a quick introduction to Python and the Google Colab environment, which runs a Jupyter Notebook service for executing Python code in a web browser with little or no user setup. We will then explore the use of libraries such as `pandas`, `numpy` and `matplotlib` for data acquisition, timestamping, preprocessing and visualisation. We will continue the session by introducing the fundamentals of time series analysis. Throughout the workshop you will gain experience implementing these analyses in Python on real-life case studies.
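As a rough illustration of the timestamping and preprocessing steps mentioned above, here is a minimal sketch with `pandas`. It is not taken from the workshop materials: the data are synthetic, and the column names `time` and `value` are made up for the example.

```python
import numpy as np
import pandas as pd

# Synthetic half-hourly readings over three days (made-up data, not workshop data).
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "time": pd.date_range("2021-11-18", periods=144, freq="30min").astype(str),
    "value": rng.normal(loc=20.0, scale=2.0, size=144),
})
raw.loc[[10, 11, 50], "value"] = np.nan  # simulate missing sensor readings

# Timestamping: parse the text column and use it as the index.
raw["time"] = pd.to_datetime(raw["time"])
series = raw.set_index("time")["value"]

# Preprocessing: fill the gaps by linear interpolation over time.
clean = series.interpolate(method="time")

# Aggregation: one mean value per calendar day.
daily_mean = clean.resample("D").mean()
print(daily_mean)
```

With `matplotlib` installed, `clean.plot()` would draw the cleaned series directly from the `pandas` object.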
At the end of this module you will be able to:
- Work comfortably with Python and the Google Colab environment.
- Use the Python libraries `pandas` and `matplotlib` to import, preprocess and visualise data.
- Analyse time series data with the Python libraries `pandas` and `statsmodels`.
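To give a flavour of what basic time series analysis can look like, the sketch below estimates a trend with a rolling mean and finds the dominant seasonal lag from the autocorrelation. It uses only `pandas` and `numpy` on synthetic data to keep dependencies light. This is an assumption made for the example rather than the workshop's own approach; `statsmodels` provides fuller tools for the same tasks.

```python
import numpy as np
import pandas as pd

# Synthetic daily series: linear trend + weekly cycle + noise (made-up data).
rng = np.random.default_rng(1)
n = 140
idx = pd.date_range("2021-01-01", periods=n, freq="D")
t = np.arange(n)
y = pd.Series(
    0.05 * t + 2.0 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 0.3, n),
    index=idx,
)

# A centred 7-day rolling mean averages out the weekly cycle, leaving the trend.
trend = y.rolling(window=7, center=True).mean()

# Autocorrelation of the detrended series at lags 1 to 10.
detrended = (y - trend).dropna()
acf = {lag: detrended.autocorr(lag=lag) for lag in range(1, 11)}

# The lag with the highest autocorrelation suggests the seasonal period.
best_lag = max(acf, key=acf.get)
print(best_lag)
```

Because the synthetic series was built with a weekly cycle, the strongest autocorrelation should appear at a lag of 7 days.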
Day 2 - Introduction to Time Series Data Mining
This workshop will introduce time series data mining techniques using Symbolic Aggregate approXimation (SAX) with the dedicated Python library `saxpy`, as well as with `tslearn`, which provides more general machine learning tools for the analysis of time series data. We will see the benefits of dimensionality reduction using SAX, as well as how it supports the further application of clustering and classification techniques.
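The idea behind SAX can be sketched in a few lines: z-normalise the series, average it over equal-width segments (Piecewise Aggregate Approximation), and map each segment mean to a letter using breakpoints that split the standard normal distribution into equally probable regions. The function below is a from-scratch illustration using `numpy` and the standard library. It does not use the `saxpy` or `tslearn` APIs, and the name `sax_word` is hypothetical.

```python
from statistics import NormalDist

import numpy as np


def sax_word(series, n_segments, alphabet_size):
    """Convert a numeric series into a SAX word (illustrative sketch)."""
    x = np.asarray(series, dtype=float)
    # Step 1: z-normalise (assumes the series is not constant).
    x = (x - x.mean()) / x.std()
    # Step 2: Piecewise Aggregate Approximation, one mean per segment.
    paa = np.array([seg.mean() for seg in np.array_split(x, n_segments)])
    # Step 3: breakpoints giving equiprobable regions under N(0, 1).
    breakpoints = [
        NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)
    ]
    # Step 4: map each segment mean to a letter ('a' is the lowest region).
    symbols = np.searchsorted(breakpoints, paa)
    return "".join(chr(ord("a") + int(s)) for s in symbols)


# One period of a sine wave: 64 points reduced to 8 symbols.
t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
word = sax_word(np.sin(t), n_segments=8, alphabet_size=4)
print(word)  # cddcbaab
```

The 64 values collapse to an 8-letter word that still traces the rise and fall of the wave, which is what makes SAX useful as a preprocessing step for clustering and classification.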
Matrix profile is a more advanced technique than SAX for time series data mining. The workshop will introduce its theoretical basics while using the Python library `matrixprofile` for motif and novelty/discord discovery. The former helps extract the most common patterns in a time series; the latter detects points and subsequences that are potential anomalies. Other data mining problems, such as clustering and shapelet discovery for time series classification, will also be explored.
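To make the idea concrete: the matrix profile stores, for every subsequence of a given length, the z-normalised Euclidean distance to its nearest non-overlapping neighbour. Low values point to motifs, and high values point to discords. The brute-force sketch below is written from scratch with `numpy` for illustration. It does not use the `matrixprofile` API, the function name `naive_matrix_profile` is hypothetical, and dedicated libraries rely on much faster algorithms.

```python
import numpy as np


def naive_matrix_profile(series, m):
    """Brute-force matrix profile, O(n^2 * m) time (illustrative sketch)."""
    x = np.asarray(series, dtype=float)
    # Every subsequence of length m, one per row.
    subs = np.lib.stride_tricks.sliding_window_view(x, m)
    # z-normalise each subsequence (assumes none is exactly constant).
    subs = (subs - subs.mean(axis=1, keepdims=True)) / subs.std(axis=1, keepdims=True)
    # Euclidean distance between every pair of subsequences.
    dists = np.linalg.norm(subs[:, None, :] - subs[None, :, :], axis=2)
    # Exclusion zone: ignore trivial matches with heavily overlapping windows.
    starts = np.arange(len(subs))
    dists[np.abs(starts[:, None] - starts[None, :]) < m // 2] = np.inf
    return dists.min(axis=1), dists.argmin(axis=1)


# Synthetic data: a noisy sine wave with an injected anomaly (made-up data).
rng = np.random.default_rng(42)
t = np.arange(300)
signal = np.sin(2 * np.pi * t / 25) + rng.normal(0, 0.05, size=300)
signal[150:160] += 2.0  # the anomaly

profile, neighbour = naive_matrix_profile(signal, m=25)
motif_start = int(profile.argmin())      # best-matching repeated pattern
discord_start = int(profile.argmax())    # most unusual subsequence
print(motif_start, int(neighbour[motif_start]), discord_start)
```

The discord should start in a window that overlaps the injected anomaly, while the motif pair falls in the regular part of the wave.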
This session covers:
- Using the Python library `saxpy` to work with SAX for time series dimensionality reduction, clustering and classification.
- Exploring the Python library `tslearn` for basic analysis based on SAX as well as for other machine learning techniques for time series.
- Working on time series data mining using matrix profile and the Python library `matrixprofile`.
- Matrix profile analysis, including the discovery of time series discords, which opens new possibilities for anomaly detection.
About the Speaker
Dr Manuel Herrera is a Research Associate in distributed intelligent systems at the University of Cambridge. He has a PhD in Hydraulic Engineering and a degree in Statistics. His research focuses on predictive analytics and complex (adaptive) networks for smart and resilient critical infrastructure and utilities. Manuel’s interdisciplinary profile has led to a substantial number of high-quality publications with high academic impact. His latest research deals with AI-enabled management and maintenance of the UK national infrastructure. He is currently involved in projects on topics ranging from telecommunications to 5G ports. Manuel is a fellow of the Royal Statistical Society and a member of IEEE and the Complex Systems Society.