
Time Series Data Mining with Python

The Western Australian Branch of the Statistical Society of Australia (SSA WA) is delighted to host this hands-on Python workshop on the analysis of time series data. It’s our privilege to have the 2021 Frank Hansford-Miller Fellow, Dr Manuel Herrera, as the instructor.

This two-day workshop aims to enable students and practitioners in data science to add time series data mining methods to their skillset for future academic and industry projects. After an introduction to Python for time series analysis, the workshop explores data mining techniques for pattern extraction in time series, ranging from dimensionality reduction to anomaly detection. Participants will practise data wrangling for time series analysis with Python on the first day and get a practical overview of time series data mining tools on the second day.

Registration

Workshop participants must first register on the SSA Events page: https://www.statsoc.org.au/event-4528724

Slack

Links to Zoom sessions will be distributed via the Slack workspace for this workshop. Slack is also the place to ask for assistance and to continue any discussions that arise during the workshop. The Slack workspace will remain active for six months after the event (and perhaps longer).

Schedule

The online workshop is structured as 4 x 2h sessions over two consecutive afternoons:

Day     Session  Time         Content
18 Nov  1        1:30pm (2h)  Introduction to Python
18 Nov  2        4:00pm (2h)  Time Series
19 Nov  3        1:30pm (2h)  Introduction to SAX
19 Nov  4        4:00pm (2h)  Matrix Profile and Other Tools

Materials

The workshop is designed as a Python notebook that can be run with the cloud service Colaboratory (‘Colab’). This allows you to write and execute Python in your browser without needing to set up a local Python environment. For more information, visit the Google Colab website.

Links to materials will be posted here as they become available.

Curriculum

Day 1 - Fundamentals of Time Series Analysis with Python

This is a hands-on course for learning the basics of data wrangling and time series analysis with Python.

We will begin the course with a quick introduction to Python and the Google Colab environment, which runs a Jupyter Notebook service for executing Python code in a web browser with little to no user setup. We will then explore the use of libraries such as pandas, numpy and matplotlib for data acquisition, timestamping, preprocessing and visualization. We will continue the session by introducing the fundamentals of time series analysis. Throughout the workshop you will gain experience implementing these analyses in Python in real-life case studies.
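To give a flavour of the timestamping and preprocessing steps mentioned above, here is a minimal sketch using pandas and numpy. It is not taken from the workshop materials: the data are synthetic, and the variable names, the 7-day window and the choice of time-based interpolation are assumptions made purely for illustration. Plotting with matplotlib is left out to keep the example short.

```python
import numpy as np
import pandas as pd

# Hypothetical data: four weeks of daily readings with a weekly cycle.
rng = np.random.default_rng(0)
index = pd.date_range("2021-11-01", periods=28, freq="D")  # timestamping
values = 10 + np.sin(np.arange(28) * 2 * np.pi / 7) + rng.normal(0, 0.1, 28)
series = pd.Series(values, index=index, name="reading")

# Simulate two missing observations, as often happens with real sensors.
series.iloc[[5, 12]] = np.nan

# Preprocessing: fill the gaps by interpolating along the time axis.
clean = series.interpolate(method="time")

# Smoothing: 7-day rolling mean (the first 6 values have no full window).
smooth = clean.rolling(window=7).mean()

# Aggregation: mean of each consecutive 7-day block.
weekly = clean.resample("7D").mean()

print(weekly)
```

In a notebook, `clean.plot()` would then draw the cleaned series, assuming matplotlib is installed.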

At the end of this module you will be able to:

Day 2 - Introduction to Time Series Data Mining

This workshop will introduce time series data mining techniques using Symbolic Aggregate approXimation (SAX) with the dedicated Python library saxpy, as well as with tslearn, which provides more general machine learning tools for the analysis of time series data. We will see the benefits of dimensionality reduction using SAX, as well as how it supports the subsequent application of clustering and classification techniques.
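The idea behind SAX can be sketched in a few lines of standard-library Python. This is an illustrative implementation only and deliberately does not use the saxpy or tslearn APIs; the function names (`znormalize`, `paa`, `sax_word`) are hypothetical. It follows the usual three steps: z-normalise the series, reduce it with Piecewise Aggregate Approximation (PAA), then map each segment mean to a letter using breakpoints that split the standard normal distribution into equally probable regions.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def znormalize(values):
    """Rescale to zero mean and unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)  # constant series: nothing to scale
    return [(v - mu) / sigma for v in values]


def paa(values, n_segments):
    """Piecewise Aggregate Approximation: mean of each equal-length segment.

    For simplicity this sketch requires len(values) to be a multiple
    of n_segments.
    """
    if len(values) % n_segments != 0:
        raise ValueError("series length must be a multiple of n_segments")
    size = len(values) // n_segments
    return [mean(values[i * size:(i + 1) * size]) for i in range(n_segments)]


def sax_word(values, n_segments, alphabet="abcd"):
    """Convert a numeric series into a SAX word over the given alphabet."""
    k = len(alphabet)
    # Breakpoints cut N(0, 1) into k regions of equal probability.
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    reduced = paa(znormalize(values), n_segments)
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in reduced)


word = sax_word([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4)
print(word)  # abcd
```

Here eight observations are compressed into a four-letter word, which is the dimensionality reduction that makes later clustering or classification cheaper.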

Matrix profile is a more advanced technique than SAX for time series data mining. The workshop will introduce its theoretical basics while using the Python library matrixprofile for motif and novelty/discord discovery. The former helps to extract the most common patterns in a time series; the latter detects points and subsequences that are potential anomalies. Other data mining problems, such as clustering and shapelet discovery for time series classification, will also be explored.
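As a rough illustration of what a matrix profile contains, the sketch below computes one by brute force in standard-library Python. It does not use the matrixprofile library, whose algorithms are far faster; the function names, the toy series and the exclusion zone of half a window are assumptions for this example. For each subsequence of length `m`, the profile stores the z-normalised Euclidean distance to its nearest non-overlapping neighbour: low values indicate motifs, high values indicate discords.

```python
from math import inf, sqrt
from statistics import mean, pstdev


def znorm(window):
    """Z-normalise one subsequence (constant windows map to zeros)."""
    mu, sigma = mean(window), pstdev(window)
    if sigma == 0:
        return [0.0] * len(window)
    return [(v - mu) / sigma for v in window]


def matrix_profile(series, m):
    """Brute-force matrix profile: O(n^2 * m), for illustration only."""
    n = len(series) - m + 1
    windows = [znorm(series[i:i + m]) for i in range(n)]
    exclusion = max(1, m // 2)  # assumed zone that skips trivial matches
    profile = [inf] * n
    neighbour = [-1] * n
    for i in range(n):
        for j in range(n):
            if abs(i - j) <= exclusion:
                continue
            d = sqrt(sum((a - b) ** 2 for a, b in zip(windows[i], windows[j])))
            if d < profile[i]:
                profile[i], neighbour[i] = d, j
    return profile, neighbour


# Toy series: a repeating pattern with one injected anomaly at index 13.
series = [0, 1, 2, 1] * 6
series[13] = 5

profile, neighbour = matrix_profile(series, m=4)
motif_start = profile.index(min(profile))    # most repeated pattern
discord_start = profile.index(max(profile))  # most unusual subsequence
print("motif at", motif_start, "matches", neighbour[motif_start])
print("discord at", discord_start)
```

The discord found this way is a window that overlaps the injected anomaly, while the motif is a clean repetition of the base pattern.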

This session covers:

About the Speaker

Dr Manuel Herrera is a Research Associate in distributed intelligent systems at the University of Cambridge. He has a PhD in Hydraulic Engineering and a degree in Statistics. His research focuses on predictive analytics and complex (adaptive) networks for smart and resilient critical infrastructure and utilities. Manuel’s interdisciplinary profile has proven successful in terms of the number and quality of his publications, which have had a high academic impact. His latest research deals with AI-enabled management and maintenance of the UK national infrastructure. He is currently involved in projects on topics ranging from telecommunications to 5G ports. Manuel is a fellow of the Royal Statistical Society and a member of IEEE and the Complex Systems Society.