How to utilise timeseries in pandas?
Last Updated: 28 Feb, 2022
A time series is an ordered sequence of values of a variable recorded at evenly spaced points in time. Time series help identify the underlying factors and structure that produced the observed data; once a model has been fitted, it can be used for forecasting and monitoring. Applications include stock market analysis, yield estimation, and studies of the spread of diseases such as COVID-19. With pandas we can also select a particular subset of time-series data based on date conditions. This article demonstrates how to work with time-series data in pandas.
Utilize timeseries in Pandas
All the examples use a COVID-19 dataset. After reading the CSV file, the 'ObservationDate' and 'Last Update' columns are converted to datetime with the pd.to_datetime() function.
Python3
# import packages
import pandas as pd
# read csv file
df = pd.read_csv('covid_19.csv', encoding='UTF-8')
df['ObservationDate'] = pd.to_datetime(df['ObservationDate'])
df['Last Update'] = pd.to_datetime(df['Last Update'])
print(df)
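As an alternative to calling pd.to_datetime() after loading, the date columns can be parsed while the file is read. The following is a minimal sketch on a hypothetical three-row sample held in memory; the sample values and the Deaths column layout are assumptions made for illustration, not rows from the real file.

```python
import io
import pandas as pd

# Hypothetical sample that mimics the two date columns used in this article
csv_text = """ObservationDate,Last Update,Deaths
01/22/2020,2020-01-22 17:00:00,0
01/23/2020,2020-01-23 17:00:00,1
01/24/2020,2020-01-24 17:00:00,3
"""

# parse_dates converts the listed columns to datetime during the read
sample = pd.read_csv(io.StringIO(csv_text),
                     parse_dates=['ObservationDate', 'Last Update'])

# Both date columns should now report a datetime64 dtype
print(sample.dtypes)
```

Checking dtypes right after loading is a quick way to confirm the conversion worked before any date filtering is attempted.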
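Extract all observations up to the start of 2021. The string '2021' is parsed as 2021-01-01 00:00:00, so the <= comparison keeps every row dated on or before 1 January 2021. 192466 rows are retrieved.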
Python3
df[df['ObservationDate']<='2021']
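The effect of the comparison operator at the year boundary can be checked on a tiny example. This sketch uses three made-up dates (an assumption for illustration) to show that <= '2021' keeps 1 January 2021 while < '2021' does not.

```python
import pandas as pd

# Hypothetical dates straddling the start of 2021
dates = pd.Series(pd.to_datetime(['2020-12-31', '2021-01-01', '2021-01-02']))

# The string '2021' is interpreted as 2021-01-01 00:00:00
inclusive = dates[dates <= '2021']   # keeps 2020-12-31 and 2021-01-01
strict = dates[dates < '2021']       # keeps 2020-12-31 only

print(len(inclusive), len(strict))
```

Use the strict form when the goal is "everything before 2021" in the literal sense.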
Retrieve the observations of a particular day. In this example the comparison value is '2020-06', which pandas parses as 2020-06-01, so only rows dated 1 June 2020 are returned.
Python3
df[df['ObservationDate'] == '2020-06']
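If the goal is every observation in June 2020 rather than a single day, the equality test on the column is not enough. One possible approach, sketched here on a made-up four-row frame (the values are assumptions), is to build the mask from the .dt accessor.

```python
import pandas as pd

# Hypothetical observations around June 2020
obs = pd.DataFrame({
    'ObservationDate': pd.to_datetime(
        ['2020-05-31', '2020-06-01', '2020-06-15', '2020-07-01']),
    'Deaths': [1, 2, 3, 4],
})

# Equality against '2020-06' matches only 2020-06-01
first_of_june = obs[obs['ObservationDate'] == '2020-06']

# A year/month mask matches every day in June 2020
is_june = ((obs['ObservationDate'].dt.year == 2020)
           & (obs['ObservationDate'].dt.month == 6))
whole_june = obs[is_june]

print(len(first_of_june), len(whole_june))
```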
Retrieve the row with the highest death count. In our data, the maximum is recorded for the UK on 2021-05-29.
Python3
df[df['Deaths'] == df['Deaths'].max()]
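When only the first row holding the maximum is needed, idxmax() is a compact alternative to the boolean mask. The sketch below runs on a made-up frame; the Country column name and all values are hypothetical.

```python
import pandas as pd

# Hypothetical data; 'Country' is an illustrative column name
toy = pd.DataFrame({
    'ObservationDate': pd.to_datetime(
        ['2021-05-27', '2021-05-28', '2021-05-29']),
    'Country': ['A', 'B', 'C'],
    'Deaths': [10.0, 25.0, 7.0],
})

# idxmax() returns the index label of the first maximum value
peak_row = toy.loc[toy['Deaths'].idxmax()]

print(peak_row['ObservationDate'].date(), peak_row['Country'])
```

Note that idxmax() returns a single row, whereas the boolean mask returns every row tied for the maximum.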
Sum of all the deaths on '2021-05-20'.
Python3
df.loc[df['ObservationDate'] == '2021-05-20', 'Deaths'].sum()
Output:
3430539.0
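To get the total for every date at once instead of one date at a time, the rows can be grouped by the date column. This is a minimal sketch on made-up numbers (all values are assumptions), not output from the real dataset.

```python
import pandas as pd

# Hypothetical rows: two regions report on 2021-05-20
toy = pd.DataFrame({
    'ObservationDate': pd.to_datetime(
        ['2021-05-19', '2021-05-20', '2021-05-20']),
    'Deaths': [5.0, 10.0, 20.0],
})

# One total per observation date
daily_deaths = toy.groupby('ObservationDate')['Deaths'].sum()

# Look up a single day in the result
total_on_20th = daily_deaths.loc[pd.Timestamp('2021-05-20')]
print(total_on_20th)
```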
Instead of retrieving data the hard way with boolean masks, we can convert the date column to datetime and set it as the index of the dataframe, which makes date-based lookups much simpler. ObservationDate is set as the index in this example. With df.loc[] we can then select rows by date directly. Because the index is a DatetimeIndex, df.loc['2020-01'] retrieves all the rows from January 2020. The output shows that there are 513 observations.
Python3
# import packages
import pandas as pd
# read csv file
df = pd.read_csv('covid_19.csv')
df['ObservationDate'] = pd.to_datetime(df['ObservationDate'])
df['Last Update'] = pd.to_datetime(df['Last Update'])
df = df.set_index('ObservationDate')
print(df.loc['2020-01'])
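Selecting by a partial date string works at several granularities once the index is a DatetimeIndex. The following sketch uses a made-up four-row frame (the Confirmed values are assumptions) to show month-level and year-level selection.

```python
import pandas as pd

# Hypothetical frame indexed by observation date
idx = pd.to_datetime(['2020-01-22', '2020-01-31', '2020-02-01', '2021-01-05'])
toy = pd.DataFrame({'Confirmed': [1, 2, 3, 4]}, index=idx)
toy.index.name = 'ObservationDate'

jan_2020 = toy.loc['2020-01']   # every row in January 2020
year_2020 = toy.loc['2020']     # every row in 2020

print(len(jan_2020), len(year_2020))
```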
Observations from 20 May to 21 May 2021 are retrieved by slicing the index; both endpoints are included.
Python3
# import packages
import pandas as pd
# read csv file
df = pd.read_csv('covid_19.csv')
df['ObservationDate'] = pd.to_datetime(df['ObservationDate'])
df['Last Update'] = pd.to_datetime(df['Last Update'])
df = df.set_index('ObservationDate')
# observations taken from may 20th to may 21st of 2021
df.loc['2021-05-20':'2021-05-21']
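Date-range slicing is most predictable on a sorted index, so it can help to call sort_index() first when the file order is not guaranteed. A small sketch on made-up, deliberately unsorted dates (the values are assumptions):

```python
import pandas as pd

# Hypothetical rows in non-chronological order
idx = pd.to_datetime(['2021-05-21', '2021-05-19', '2021-05-20', '2021-05-22'])
toy = pd.DataFrame({'Deaths': [4, 1, 2, 8]}, index=idx)

# Sort by date, then slice; both endpoints are included
toy = toy.sort_index()
window = toy.loc['2021-05-20':'2021-05-21']

print(window)
```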
In this example, df.groupby() groups the observations by the index, that is, by observation date, and count() reports the number of non-null values in each column for each date. For example, the first row says there are 40 observations on '2020-01-22'.
Python3
# import packages
import pandas as pd
# read csv file
df = pd.read_csv('covid_19.csv')
df['ObservationDate'] = pd.to_datetime(df['ObservationDate'])
df['Last Update'] = pd.to_datetime(df['Last Update'])
df = df.set_index('ObservationDate')
print(df.groupby(level=0).count())
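Because count() skips missing values, its per-column numbers can differ from the number of rows on a date. The sketch below contrasts it with size() on a made-up frame; the Province column name and all values are hypothetical.

```python
import pandas as pd

# Hypothetical frame with one missing province on 2020-01-22
idx = pd.to_datetime(['2020-01-22', '2020-01-22', '2020-01-23'])
toy = pd.DataFrame({'Province': ['Anhui', None, 'Hubei'],
                    'Deaths': [0, 0, 1]}, index=idx)

rows_per_day = toy.groupby(level=0).size()        # counts rows
non_null_per_day = toy.groupby(level=0).count()   # counts non-null cells

print(rows_per_day)
print(non_null_per_day)
```

size() is the safer choice when the question is simply how many observations were recorded per day.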
After setting a datetime column as the index of the dataframe (here 'Last Update'), we use the df.plot.line() method to draw every numeric column as a line against time in a single plot. Plotting the data over time makes trends easier to see.
Python3
# import packages and libraries
import pandas as pd
from matplotlib import pyplot as plt
# reading the dataset
df = pd.read_csv('covid_19_data.csv', encoding='UTF-8')
# convert Last update column to datetime
df['Last Update'] = pd.to_datetime(df['Last Update'])
# setting index
df.set_index('Last Update', inplace=True)
# plotting figure
df.plot.line()
# display the figure when running as a script
plt.show()
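Plotting every column at once can be hard to read when several rows share a timestamp. One possible refinement, sketched here on made-up numbers (the Confirmed values are assumptions), is to aggregate a single column per date before plotting it.

```python
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical frame with two rows on the same date
idx = pd.to_datetime(['2020-01-22', '2020-01-22', '2020-01-23', '2020-01-24'])
toy = pd.DataFrame({'Confirmed': [1, 2, 5, 9]}, index=idx)

# One value per date, then a single line on the plot
daily_confirmed = toy['Confirmed'].groupby(level=0).sum()
ax = daily_confirmed.plot.line(title='Confirmed cases per day')
ax.set_ylabel('Confirmed')
```

Call plt.show() afterwards when running this as a script rather than in a notebook.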