Time Series Introduction
- Time series analysis often comes down to the question of causality: How did the past influence the future?
Timestamp troubles
- what process generated the timestamp, how and when
- Time zone information
- Record your own actions and see how they are captured
- Local or universal time
Filling missing values
- Forward fill
- Moving average of past values - use only the data that occurred before the missing data point
- Interpolation - Determining the values of missing points based on geometric constraints regarding how we want the overall data to behave
Upsampling and downsampling
- To match the frequency of different time series data
- We either increase or decrease the timestamp frequency
Downsampling the data
- The original resolution of the data isn’t sensible
- Match against data at a lower frequency - we can either downsample, take average, weighted mean etc.
- Selecting out every nth element
Upsampling the data
- convert irregularly sampled time series to a regularly timed one
- Inputs sampled at different frequencies
- Knowledge of time series dynamics - Treat as a missing data problem and missing data techniques can be applied.
Smoothing data
Moving average is used to eliminate measurement spikes, errors of measurement etc.
purpose of smoothing
Data preparation - Raw data unsuitable
Feature generation
Prediction - For many processes prediction is mean, which we get from a smoothed feature
Visualization - Add signal to a noisy plot
Exponential smoothing - All data points are not treated equally
More weight to recent data
Equation for exponential smoothing Doing smoothing with pandas Exponential smoothing does not perform well in case of data with a long-term trend
Holt’s method and holt-winters smoothing are two exponential smoothing methods applied to data with a trend or with a trend and a seasonality
Kalman filters and LOESS(locally estimated scatter plot smoothing) are other computationally involved methods - These methods leak information from the future - Not appropriate for forecasting
Smoothing is a commonly used form of forecasting, and you can use a smoothed time series (without lookahead) as one easy null model when you are testing whether a fancier method is actually producing a successful forecast
Seasonality
- Seasonal time series are time series in which behaviors recur over a fixed period. There can be multiple periodicities reflecting different tempos of seasonality, such as the seasonality of the 24-hour day versus the 12-month calendar season, both of which exhibit strong features in most time series relating to human behavior.
Time Zones
- Python libraries to deal with Timezone - datetime, pytz and dateutil
- Be careful with timezone conversions
- Dealing with daylight savings can be tricky
Lookahead
- Processing and cleaning time-related data can be a tedious and detail-oriented process.There is tremendous danger in data cleaning and processing of introducing a lookahead! You should have lookaheads only if they are intentional, and this is rarely appropriate.