
Data Science Interview Preparation 7

The document discusses various techniques for time series analysis and forecasting. It provides explanations of key concepts like differencing and transforming to make time series stationary, checking for stationarity using visual and statistical tests, understanding ACF and PACF plots, decomposing time series into trend, seasonality, and noise components, and forecasting techniques like simple moving average, exponential smoothing, and ARIMA models. It also includes examples of time series forecasting problems and questions regarding time series analysis concepts.

Uploaded by Julian Tolosa


DATA SCIENCE INTERVIEW PREPARATION
(30 Days of Interview Preparation)

# DAY 07
Q1. What is the process to make non-stationary time series data stationary?
Ans:
The two most common ways to make a non-stationary time series stationary are:

• Differencing
• Transforming

Let us look at some details for each of them:

Differencing:
To make a series stationary, you take the difference between consecutive data points. Suppose your
original time series is:

$$X_1, X_2, X_3, \ldots, X_n$$

Your series with a difference of degree 1 becomes:

$$X_2 - X_1,\ X_3 - X_2,\ X_4 - X_3,\ \ldots,\ X_n - X_{n-1}$$

Once you have differenced the series, plot it and check whether the ACF curve has improved (for a
stationary series it should decay quickly towards zero). If not, you can try second- or even
third-order differencing. Keep in mind that the more you difference, the more complicated your
analysis becomes.
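A minimal sketch of differencing with NumPy, assuming a small made-up series with a linear trend (the data and the helper name `difference` are illustrative, not from the original). One pass of differencing turns the linear trend into a constant; a second pass removes that constant as well.

```python
import numpy as np

def difference(x, order=1):
    """Difference a series `order` times; each pass maps x to x[t] - x[t-1]."""
    x = np.asarray(x, dtype=float)
    for _ in range(order):
        x = x[1:] - x[:-1]  # same as np.diff(x); the result is one element shorter
    return x

# Hypothetical series with a linear trend (illustrative data only).
series = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
first = difference(series)            # [2. 2. 2. 2.]: the trend becomes a constant
second = difference(series, order=2)  # [0. 0. 0.]
print(first, second)
```

`np.diff(x, n=order)` does the same thing; in pandas, `Series.diff()` keeps the original index and puts NaN in the first position instead of shortening the series.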
Transforming:
If differencing alone cannot make a time series stationary, you can try transforming the variables.
The log transform is probably the most commonly used transformation when the series is diverging
(its swings grow as its level rises). However, it is suggested that you use a transformation only if
differencing is not working.
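A minimal sketch, assuming a made-up series that grows by a fixed 10% per step, so its absolute changes keep widening. Plain differencing still leaves an increasing series here; taking the log first turns the multiplicative growth into a straight line, and differencing the log then leaves a constant. Pairing the log with differencing like this is a common choice, shown as an illustration rather than the document's prescribed procedure.

```python
import numpy as np

# Hypothetical series growing 10% per step: the absolute changes keep widening.
t = np.arange(8)
series = 100.0 * 1.1 ** t

log_series = np.log(series)       # log(100) + t * log(1.1): a straight line in t
log_diff = np.diff(log_series)    # every step equals log(1.1), roughly 0.0953
raw_diff = np.diff(series)        # still increasing, so plain differencing is not enough here

print(np.round(log_diff, 4))
print(np.round(raw_diff, 2))
```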

Q2. How do you check whether data is stationary?

Ans:

Stationary series: a series whose statistical properties (mean, variance and covariance) do not
vary with time.

Let us get an idea from the following plots:

• In the first plot, the mean varies (increases) with time, which results in an upward trend. This
is a non-stationary series; to be classified as stationary, a series should not exhibit a trend.
• In the second plot, we do not see a trend, but the variance of the series is a function of time.
As mentioned previously, a stationary series must have a constant variance.
• In the third plot, the spread becomes narrower as time increases, which implies that the
covariance is a function of time.

All three of these plots show non-stationary time series. Now consider the fourth plot:

In this case, the mean, variance and covariance are constant over time. This is what a stationary
time series looks like.

Most statistical forecasting models require the series to be stationary to make effective and
precise predictions.

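You can check whether your data is stationary using the following methods:
1. Visual test: plot the series and look for changes in its mean, variance or covariance over time.
2. Statistical tests, such as:
   • the ADF (Augmented Dickey-Fuller) test, whose null hypothesis is that the series has a unit root;
   • the KPSS (Kwiatkowski-Phillips-Schmidt-Shin) test, whose null hypothesis is that the series is stationary.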
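A simple way to back up the visual test is to split the series into consecutive chunks and compare their summary statistics: for a stationary series the chunk means and variances should be roughly equal. The sketch below assumes simulated data (white noise versus white noise plus a linear trend) and a hypothetical helper `chunk_stats`; it is an informal check, not a substitute for the formal tests (statsmodels provides `adfuller` and `kpss` in `statsmodels.tsa.stattools`).

```python
import numpy as np

def chunk_stats(x, n_chunks=4):
    """Split x into n_chunks consecutive pieces and return (mean, variance) for each."""
    chunks = np.array_split(np.asarray(x, dtype=float), n_chunks)
    return [(c.mean(), c.var()) for c in chunks]

rng = np.random.default_rng(0)
noise = rng.normal(0.0, 1.0, 400)          # stationary: white noise
trending = noise + 0.05 * np.arange(400)   # non-stationary: the mean drifts upward

for name, x in [("white noise", noise), ("trending", trending)]:
    stats = chunk_stats(x)
    print(name, "means:", [round(float(m), 2) for m, _ in stats],
          "variances:", [round(float(v), 2) for _, v in stats])
```

The trending series shows chunk means that climb steadily, while the white-noise chunk means stay close to zero.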

Q3. What are ACF and PACF?

Ans:
ACF is the (complete) autocorrelation function, which gives the autocorrelation of a series with its
own lagged values. Plotting these values along with a confidence band gives us an ACF plot. In
simple terms, it describes how well the present value of the series is related to its past values.
A time series can have components such as trend, seasonality, cyclic behaviour and residuals; the
ACF considers all of these components when finding correlations, hence it is a ‘complete
autocorrelation plot’.
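A minimal NumPy sketch of the sample ACF, assuming a simulated AR(1) series (x_t = 0.8·x_{t-1} + noise) as input; the helper name `acf` and the data are illustrative. The ±1.96/√n band is the usual approximate 95% confidence band for a white-noise series. For an AR(1) series like this one, the ACF decays gradually over the lags rather than cutting off. In practice, `plot_acf` in `statsmodels.graphics.tsaplots` draws this kind of plot with a confidence band.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag (denominator: total sum of squares)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[k:], x[:len(x) - k]) / denom for k in range(max_lag + 1)])

# Hypothetical AR(1) series: x_t = 0.8 * x_{t-1} + noise
rng = np.random.default_rng(1)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.8 * x[t - 1] + rng.normal()

r = acf(x, 5)
band = 1.96 / np.sqrt(len(x))   # approximate 95% band for a white-noise series
for k, rk in enumerate(r):
    flag = "outside band" if abs(rk) > band else ""
    print(f"lag {k}: {rk:+.2f} {flag}")
```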
PACF is the partial autocorrelation function. Instead of correlating the present value directly with
each lag, as the ACF does, it finds the correlation of the residuals (what remains after removing the
effect of the shorter lags) with the next lag value. It is ‘partial’ rather than ‘complete’ because
we remove the variation already explained before computing the next correlation. So if the residual
contains hidden information that can be modelled by the next lag, we may get a good correlation, and
we keep that lag as a feature while modelling. Remember that while modelling we don’t want to keep
too many correlated features, as that can create multicollinearity issues; hence we retain only the
relevant lags.
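The "remove what is already explained, then correlate with the next lag" idea can be sketched with ordinary least squares: the partial autocorrelation at lag k is estimated as the coefficient on x_{t-k} when x_t is regressed on x_{t-1}, ..., x_{t-k}. This regression form is one common estimator (statsmodels' `pacf` offers it via `method="ols"`) and is used here as an assumption; the helper name `pacf_ols` and the simulated AR(1) data are illustrative. For an AR(1) series only lag 1 should stand out, which is why the PACF helps decide how many lags to keep.

```python
import numpy as np

def pacf_ols(x, max_lag):
    """PACF by regression: for each k, regress x_t on an intercept and x_{t-1}..x_{t-k},
    and keep the coefficient on x_{t-k}."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    out = [1.0]
    for k in range(1, max_lag + 1):
        y = x[k:]
        lags = [x[k - j:n - j] for j in range(1, k + 1)]   # column j holds x_{t-j}
        X = np.column_stack([np.ones(n - k)] + lags)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        out.append(coef[-1])                               # coefficient on x_{t-k}
    return np.array(out)

# Hypothetical AR(1) series: x_t = 0.8 * x_{t-1} + noise
rng = np.random.default_rng(1)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.8 * x[t - 1] + rng.normal()

p = pacf_ols(x, 4)
band = 1.96 / np.sqrt(len(x))
print(np.round(p, 2), f"band = +/-{band:.3f}")
```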

Q4. What do you understand by the trend of data?

Ans:
A trend is a general systematic linear or (most often) nonlinear component that changes over time
and does not repeat.
There are different approaches to identifying a trend. A positive trend suggests that growth is
likely to continue. Let's illustrate this with a simple example:

This looks like there is a trend. To build confidence, let's add a linear regression line to this
graph:

With the linear regression line added, it is now clear that there is a trend in the graph.
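The "add a linear regression" step can be sketched with `np.polyfit`, which fits a least-squares straight line; a clearly positive slope supports the presence of an upward trend. The data below are simulated (intercept 10, slope 0.5 per step, plus noise) as an assumption, since the data behind the original figure are not available.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(100)
# Hypothetical data: intercept 10, slope 0.5 per time step, plus Gaussian noise.
y = 10 + 0.5 * t + rng.normal(0, 3, size=t.size)

slope, intercept = np.polyfit(t, y, deg=1)   # highest power first: [slope, intercept]
fitted_trend = intercept + slope * t         # the regression line to overlay on the plot
print(f"slope = {slope:.3f} per step, intercept = {intercept:.2f}")
```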
Q5. What is the Augmented Dickey-Fuller Test?
Ans:
The Dickey-Fuller test is one of the most popular statistical tests. It is used to determine whether
a unit root is present in a series, and hence helps us understand whether the series is stationary.
It is based on the model $y_t = a\,y_{t-1} + \varepsilon_t$, and its null and alternative hypotheses are:
Null hypothesis: the series has a unit root (a = 1).
Alternative hypothesis: the series has no unit root.
If we fail to reject the null hypothesis, we can say that the series is non-stationary; it may be
difference stationary, i.e. differencing may make it stationary. The Augmented Dickey-Fuller (ADF)
test extends the basic test by adding lagged differences of the series to the regression, which
accounts for higher-order autocorrelation.
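To make the hypotheses concrete, here is a minimal sketch of the plain (non-augmented) Dickey-Fuller regression, assuming a model with a constant and no trend: regress Δy_t on y_{t-1}, so the coefficient γ = a − 1 and the null "a = 1" becomes "γ = 0". The statistic does not follow the usual t distribution, so it is compared with approximately −2.86, the commonly tabulated asymptotic 5% critical value for this case. The helper name and simulated series are illustrative; in practice you would use statsmodels' `adfuller` (in `statsmodels.tsa.stattools`), which adds the lagged-difference terms of the augmented test and reports p-values.

```python
import numpy as np

CRIT_5PCT = -2.86  # approximate asymptotic 5% critical value (constant, no trend)

def dickey_fuller_stat(y):
    """t-statistic of gamma in  dy_t = c + gamma * y_{t-1} + e_t  (gamma = a - 1)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    sigma2 = resid @ resid / (len(dy) - X.shape[1])   # residual variance
    se_gamma = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1] / se_gamma

rng = np.random.default_rng(3)
e = rng.normal(size=500)
random_walk = np.cumsum(e)              # a = 1: has a unit root
ar1 = np.zeros(500)
for t in range(1, 500):
    ar1[t] = 0.5 * ar1[t - 1] + e[t]   # a = 0.5: stationary

for name, series in [("random walk", random_walk), ("AR(1), a = 0.5", ar1)]:
    stat = dickey_fuller_stat(series)
    verdict = "reject H0: no unit root" if stat < CRIT_5PCT else "fail to reject H0: unit root"
    print(f"{name}: DF statistic = {stat:.2f} -> {verdict}")
```

Because this is a test at the 5% level, a true random walk will still be (wrongly) rejected in roughly one run out of twenty.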

Q6. What are AIC and BIC in time series?

Ans:
Akaike’s information criterion (AIC) compares the quality of a set of statistical models relative to
each other. For example, you might be interested in which variables contribute to low socioeconomic
status and how they contribute to it. Say you create several regression models for various factors
such as education, family size or disability status; AIC takes each model and ranks them from best
to worst, with a lower AIC indicating a better model. The “best” model is the one that neither
under-fits nor over-fits. AIC is defined as:

$$\mathrm{AIC} = 2k - 2\log L(\hat{\theta})$$

where k and L(θ̂) are defined as for BIC below.
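The Bayesian Information Criterion (BIC) can be defined as:

$$\mathrm{BIC} = k\log(n) - 2\log L(\hat{\theta})$$

Here, n is the sample size, k is the number of parameters your model estimates, θ is the set of all
parameters, and L(θ̂) is the likelihood of the model tested, evaluated at the maximum likelihood
values of θ. As with AIC, a lower BIC indicates a better model; because log(n) exceeds 2 once
n > e² ≈ 7.4, BIC penalises extra parameters more heavily than AIC for all but the smallest samples.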
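A minimal sketch of comparing models with both criteria, assuming Gaussian errors so that the maximised log-likelihood is −(n/2)(log(2πσ̂²) + 1) with σ̂² = RSS/n. It fits autoregressive models of increasing order to a simulated AR(2) series by least squares, on a common sample so the likelihoods are comparable, and counts k as the AR coefficients plus an intercept and the noise variance. The helper names and data are illustrative assumptions.

```python
import numpy as np

def gaussian_loglik(resid):
    """Maximised Gaussian log-likelihood given residuals (MLE variance = RSS / n)."""
    n = len(resid)
    sigma2 = resid @ resid / n
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)

def ar_information_criteria(x, p, max_p):
    """Fit AR(p) by least squares on x[max_p:] (same sample for every p); return (AIC, BIC)."""
    y = x[max_p:]
    n = len(y)
    cols = [np.ones(n)] + [x[max_p - j:len(x) - j] for j in range(1, p + 1)]  # x_{t-j}
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    loglik = gaussian_loglik(y - X @ coef)
    k = p + 2                                   # AR coefficients + intercept + noise variance
    return 2 * k - 2 * loglik, k * np.log(n) - 2 * loglik

# Hypothetical AR(2) process: x_t = 0.6 x_{t-1} - 0.3 x_{t-2} + noise
rng = np.random.default_rng(4)
x = np.zeros(600)
for t in range(2, 600):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.normal()

max_p = 5
results = {p: ar_information_criteria(x, p, max_p) for p in range(1, max_p + 1)}
for p, (aic, bic) in results.items():
    print(f"AR({p}): AIC = {aic:.1f}, BIC = {bic:.1f}")
print("lowest BIC at p =", min(results, key=lambda p: results[p][1]))
```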

Q7. What are the components of a time series?

Ans:
Time series analysis provides a body of techniques for understanding a dataset better. One of the
most useful is decomposing the time series into four constituent parts:
1. Level: the baseline value for the series if it were a straight line.
2. Trend: the optional and often linear increasing or decreasing behaviour of the series over time.
3. Seasonality: optional repeated patterns or cycles of behaviour over time.
4. Noise: the optional variability in the observations that cannot be explained by the model.
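A minimal sketch of a classical additive decomposition (series = level/trend + seasonality + noise), assuming an additive model, a known period, and simulated data with period 12. The trend (which here absorbs the level) is a centred moving average, the seasonal component is the average detrended value at each position in the cycle, and the noise is whatever remains. The helper name and data are illustrative; `seasonal_decompose` in `statsmodels.tsa.seasonal` implements a similar classical approach.

```python
import numpy as np

def additive_decompose(x, period):
    """Classical additive decomposition: x = trend + seasonal + residual.
    Trend: centred moving average (2 x period weights when period is even), NaN at the ends.
    Seasonal: mean detrended value at each cycle position, shifted to sum to zero."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(x, weights, mode="valid")
    detrended = x - trend
    cycle = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    cycle -= cycle.mean()
    seasonal = np.resize(cycle, n)          # repeat the cycle over the whole series
    residual = x - trend - seasonal
    return trend, seasonal, residual

# Hypothetical data: level 10, trend 0.1 per step, seasonality of period 12, small noise
rng = np.random.default_rng(5)
t = np.arange(120)
true_seasonal = 3 * np.sin(2 * np.pi * t / 12)
x = 10 + 0.1 * t + true_seasonal + rng.normal(0, 0.2, t.size)

trend, seasonal, residual = additive_decompose(x, period=12)
print("estimated seasonal cycle:", np.round(seasonal[:12], 2))
print("residual std:", round(float(np.nanstd(residual)), 3))
```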

Q8. What is Time Series Analysis?

Ans:
Time series analysis involves developing models that best capture or describe an observed time
series in order to understand its underlying causes. It seeks the “why” behind a time series
dataset. This involves making assumptions about the form of the data and decomposing the time series
into its constituent components.

The quality of a descriptive model is determined by how well it describes all the available data and
how well its interpretation informs the problem domain.
Q9. Give some examples of time series forecasting problems.
Ans:
There is an almost endless supply of time series forecasting problems. Below are ten examples from a
range of industries that make the notions of time series analysis and forecasting more concrete.
1. Forecasting the corn yield in tons by state each year.
2. Forecasting whether an EEG trace in seconds indicates a patient is having a seizure or not.
3. Forecasting the closing price of a stock each day.
4. Forecasting the birth rate at all hospitals in a city each year.
5. Forecasting product sales in units sold each day for a store.
6. Forecasting the number of passengers through a train station each day.
7. Forecasting unemployment for a state each quarter.
8. Forecasting utilisation demand on a server each hour.
9. Forecasting the size of the rabbit population in a state each breeding season.
10. Forecasting the average price of gasoline in a city each day.

Q10. What are some forecasting techniques?

Ans:
There are many statistical techniques available for time series forecasting; however, we have found
a few effective ones, listed below:

• Simple Moving Average (SMA)
• Simple Exponential Smoothing (SES)
• Autoregressive Integrated Moving Average (ARIMA)

Q11. What is the Moving Average?

Ans:
The moving average model is probably the most naive approach to time series modelling. This model
states that the next observation is the mean of all past observations.

Although simple, this model might be surprisingly good, and it represents a good starting point.
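A minimal sketch of the "mean of all past observations" model as a one-step-ahead forecaster, assuming a small made-up series; the helper name is illustrative. Each forecast uses only the observations available before that time step, which is how the model would be evaluated in practice.

```python
import numpy as np

def mean_model_forecasts(x):
    """One-step-ahead forecasts: the forecast for x[t] is the mean of x[0..t-1].
    The result lines up with x[1:]."""
    x = np.asarray(x, dtype=float)
    return np.cumsum(x)[:-1] / np.arange(1, len(x))

x = np.array([12.0, 15.0, 11.0, 14.0, 13.0])   # hypothetical observations
forecasts = mean_model_forecasts(x)            # approximately [12.0, 13.5, 12.67, 13.0]
next_value = x.mean()                          # forecast for the next, unseen step: 13.0
errors = x[1:] - forecasts                     # forecast errors, e.g. for computing MAE
print(forecasts, next_value, np.abs(errors).mean())
```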

Beyond forecasting, the moving average can be used to identify interesting trends in the data. We
can define a window over which to apply the moving average, which smooths the time series and
highlights different trends.
Example of a moving average on a 24h window

In the plot above, we applied the moving average model over a 24h window. The green line shows the
smoothed time series, and we can see that there are two peaks in each 24h period.

The longer the window, the smoother the trend will be.

Below is an example of moving average on a smaller window.

Example of a moving average on a 12h window
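A minimal sketch of window smoothing, assuming a trailing (not centred) window and simulated hourly data with a daily cycle standing in for the plotted series; the helper name and data are illustrative. In pandas the same trailing average is `Series.rolling(window).mean()`. Comparing the 12h and 24h windows shows that the longer window averages out more of the variation, so its output changes less from hour to hour.

```python
import numpy as np

def rolling_mean(x, window):
    """Trailing moving average: entry i is the mean of x[i - window + 1 .. i].
    The first window - 1 entries lack a full window and are NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full(len(x), np.nan)
    out[window - 1:] = np.convolve(x, np.ones(window) / window, mode="valid")
    return out

# Hypothetical hourly data for one week: a daily cycle plus noise.
rng = np.random.default_rng(6)
hours = np.arange(24 * 7)
series = 50 + 10 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 3, hours.size)

smooth_12h = rolling_mean(series, 12)
smooth_24h = rolling_mean(series, 24)   # a full 24h window also averages out the daily cycle

# Typical hour-to-hour change of each smoothed line: smaller means smoother.
print(np.nanstd(np.diff(smooth_12h)), np.nanstd(np.diff(smooth_24h)))
```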

Q12. What is Exponential smoothing?
Ans:
Exponential smoothing uses logic similar to the moving average, but this time a different,
decreasing weight is assigned to each observation: the further an observation is from the present,
the less importance it is given.

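Mathematically, exponential smoothing is expressed as:

$$s_t = \alpha x_t + (1 - \alpha)\, s_{t-1}, \qquad t > 0$$

where x_t is the observation at time t and s_t is the smoothed value.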

Here, alpha (α) is the smoothing factor, which takes values between 0 and 1. It determines how fast
the weights decrease for previous observations.

In the plot above, the dark blue line represents exponential smoothing of the time series with a
smoothing factor of 0.3, and the orange line uses a smoothing factor of 0.05. As we can see, the
smaller the smoothing factor, the smoother the time series. This is because as the smoothing factor
approaches 0, the weights decay more and more slowly, so each smoothed value averages over many past
observations and the result approaches the moving average model.
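A minimal sketch of the recursion, assuming the common initialisation s_0 = x_0 (other choices exist) and a simulated noisy series in place of the plotted data; the function name is illustrative. Running it with the two smoothing factors from the plot (0.3 and 0.05) shows the smaller factor producing the smoother line. statsmodels provides a fitted version as `SimpleExpSmoothing` in `statsmodels.tsa.holtwinters`.

```python
import numpy as np

def exponential_smoothing(x, alpha):
    """Simple exponential smoothing: s_0 = x_0, s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    x = np.asarray(x, dtype=float)
    s = np.empty_like(x)
    s[0] = x[0]                         # assumed initialisation
    for t in range(1, len(x)):
        s[t] = alpha * x[t] + (1.0 - alpha) * s[t - 1]
    return s

# Hypothetical noisy series (not the data behind the plot in the text).
rng = np.random.default_rng(7)
x = 20 + np.cumsum(rng.normal(0, 1, 200)) * 0.2 + rng.normal(0, 2, 200)

fast = exponential_smoothing(x, 0.3)    # reacts quickly, follows the noise more
slow = exponential_smoothing(x, 0.05)   # weights decay slowly, much smoother
print(np.std(np.diff(fast)), np.std(np.diff(slow)))
```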