Data Science Interview Preparation 7
INTERVIEW PREPARATION
(30 Days of Interview Preparation)
# DAY 07
Q1. What is the process to make data stationary from non-stationary in time series?
Ans:
The two most common ways to make a non-stationary time series stationary are:
Differencing
Transforming
Differencing:
To make a series stationary, take the difference between consecutive data points. For example, if the
original time series is $X_1, X_2, X_3, \ldots, X_n$, the first-differenced series is
$X_2 - X_1, X_3 - X_2, \ldots, X_n - X_{n-1}$.
Once you have differenced the series, plot it and check whether the ACF (autocorrelation function) curve improves.
If not, try second- or even third-order differencing. Keep in mind that each additional order of
differencing makes the analysis more complicated.
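A minimal NumPy sketch of repeated differencing, using a synthetic quadratic series (an illustrative assumption, not data from the original). One pass of differencing leaves a linear trend; a second pass removes it, which is why higher-order differencing is sometimes needed.

```python
import numpy as np

def difference(series, order=1):
    """Apply first-order differencing `order` times: d_t = x_t - x_{t-1}."""
    result = np.asarray(series, dtype=float)
    for _ in range(order):
        result = result[1:] - result[:-1]  # each pass shortens the series by one point
    return result

t = np.arange(6)
quadratic = t ** 2  # values 0, 1, 4, 9, 16, 25: a series with a curved trend

first = difference(quadratic, order=1)   # values 1, 3, 5, 7, 9: still trending upward
second = difference(quadratic, order=2)  # values 2, 2, 2, 2: the trend is gone

print(first, second)
# np.diff(quadratic, n=2) computes the same second-order difference
```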
Transforming:
If differencing does not make a time series stationary, you can try transforming the variables. The log
transform is probably the most commonly used transformation when the time series diverges.
However, use a transformation only if differencing is not working.
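A minimal sketch of why a log transform helps with a diverging series, assuming a synthetic series that grows 5% per step with multiplicative noise (both numbers are illustrative assumptions). The raw differences swing more and more as the level grows, while the differences of the logged series fluctuate around a constant, log(1.05).

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(50)
# Synthetic exponential growth with 2% multiplicative noise (illustrative assumption)
series = 100 * 1.05 ** t * np.exp(rng.normal(scale=0.02, size=t.size))

raw_diff = np.diff(series)          # the size of the changes grows with the level
log_diff = np.diff(np.log(series))  # roughly constant growth rate per step

print("raw diff spread, first vs last 10:", raw_diff[:10].std(), raw_diff[-10:].std())
print("log diff spread, first vs last 10:", log_diff[:10].std(), log_diff[-10:].std())
print("mean log difference:", log_diff.mean(), "vs log(1.05) =", np.log(1.05))
```

Note that the log transform is combined with differencing here: the log alone only turns exponential growth into a linear trend.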
Ans:
Stationary series: a series whose statistical properties (mean, variance and covariance) do not vary
with time.
In the first plot, we can see that the mean varies (increases) with time, which results in an
upward trend. This is a non-stationary series.
For a series to be classified as stationary, it should not exhibit a trend.
Moving on to the second plot, we do not see a trend in the series, but the variance of the series
is a function of time. As mentioned previously, a stationary series must have a constant
variance.
In the third plot, the spread becomes closer as time increases, which implies that the
covariance is a function of time.
All three of these plots show non-stationary time series. Now look at the fourth plot:
In this case, the mean, variance and covariance are constant over time. This is what a stationary time
series looks like.
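The first two properties can be checked numerically by splitting a series into consecutive chunks and comparing each chunk's mean and variance. This is a minimal sketch on synthetic series chosen to mimic the first, second and fourth plots (all data and the helper `chunk_stats` are illustrative assumptions); it does not check the covariance property from the third plot.

```python
import numpy as np

def chunk_stats(series, n_chunks=4):
    """Mean and variance of consecutive, roughly equal-sized chunks of the series."""
    chunks = np.array_split(np.asarray(series, dtype=float), n_chunks)
    return [(float(chunk.mean()), float(chunk.var())) for chunk in chunks]

rng = np.random.default_rng(42)
t = np.arange(1000)
noise = rng.normal(size=t.size)

series = {
    "stationary (white noise)": noise,
    "mean grows over time": 0.05 * t + noise,
    "variance grows over time": noise * (1 + t / 200),
}

for name, s in series.items():
    stats = [(round(m, 2), round(v, 2)) for m, v in chunk_stats(s)]
    print(f"{name}: {stats}")
```

For the stationary series, every chunk has roughly the same mean and variance; for the other two, one of them drifts from chunk to chunk.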
Most statistical models require the series to be stationary to make effective and precise
predictions.
You can check whether your data is stationary using the following methods:
1. Visual Test
2. Statistical Test
3. ADF (Augmented Dickey-Fuller) Test
4. KPSS (Kwiatkowski-Phillips-Schmidt-Shin) Test
Hmm, this looks like there is a trend. To build up confidence, let's add a linear regression line to this
graph:
Great, with the linear regression line added, it's now clear that there is a trend in the graph.
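The regression line from the visual test can be computed with an ordinary least-squares fit. A minimal sketch, assuming a synthetic daily series with a mild upward trend (the data is an illustrative assumption, not the series from the plot):

```python
import numpy as np

rng = np.random.default_rng(7)
t = np.arange(365, dtype=float)                           # hypothetical daily index
y = 20.0 + 0.03 * t + rng.normal(scale=1.0, size=t.size)  # synthetic series with a mild trend

slope, intercept = np.polyfit(t, y, deg=1)  # least-squares line; highest-power coefficient first
trend_line = intercept + slope * t          # the line you would overlay on the plot

print(f"slope per step: {slope:.4f}")
print(f"rise of the fitted line over the sample: {trend_line[-1] - trend_line[0]:.2f}")
```

A fitted line whose total rise is large relative to the noise level supports the visual impression of a trend; a formal test such as ADF or KPSS is still the more reliable check.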
Q5. What is the Augmented Dickey-Fuller Test?
Ans:
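The Dickey-Fuller test is one of the most popular statistical tests. It is used to determine the
presence of a unit root in a series, and hence helps us understand whether the series is stationary.
The augmented version adds lagged differences of the series to the test regression to account for
autocorrelation.
The null and alternate hypotheses for this test are:
Null Hypothesis: The series has a unit root (a = 1, where a is the coefficient in $Y_t = a Y_{t-1} + \varepsilon_t$)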
Alternate Hypothesis: The series has no unit root.
If we fail to reject the null hypothesis, we can say that the series is non-stationary. This means that
the series can be linear or difference stationary.
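A minimal NumPy sketch of the plain Dickey-Fuller regression (no lagged-difference terms, so it is not the full augmented test). It regresses $\Delta Y_t$ on a constant and $Y_{t-1}$; the slope is $\gamma = a - 1$, so the null $a = 1$ becomes $\gamma = 0$. The t-statistic of $\gamma$ is compared with a Dickey-Fuller critical value, here the approximate asymptotic 5% value of -2.86 for the constant-only case, not with an ordinary t table. The helper name and the synthetic series are illustrative assumptions; in practice, `statsmodels.tsa.stattools.adfuller` implements the augmented test with p-values.

```python
import numpy as np

CRITICAL_5PCT = -2.86  # approximate asymptotic 5% DF critical value, constant-only case

def dickey_fuller_stat(y):
    """t-statistic of gamma in: dY_t = c + gamma * Y_{t-1} + e_t (gamma = a - 1)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])  # constant and lagged level
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)            # OLS coefficient covariance
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(1)
e = rng.normal(size=500)
random_walk = np.cumsum(e)    # a = 1: has a unit root
ar1 = np.zeros(500)
for i in range(1, 500):
    ar1[i] = 0.5 * ar1[i - 1] + e[i]  # a = 0.5: stationary

for name, series in [("random walk", random_walk), ("AR(1), a=0.5", ar1)]:
    stat = dickey_fuller_stat(series)
    verdict = "reject H0 (no unit root)" if stat < CRITICAL_5PCT else "fail to reject H0"
    print(f"{name}: DF statistic = {stat:.2f} -> {verdict}")
```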
$$\mathrm{BIC} = k \log(n) - 2 \log L(\hat{\theta})$$
Here, n is the sample size.
k is the number of parameters your model estimates.
$L(\hat{\theta})$ is the likelihood of the model, evaluated at the maximum likelihood estimate $\hat{\theta}$
of the parameters θ.
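A minimal sketch of the formula above, used to compare two candidate models. It assumes polynomial fits with i.i.d. Gaussian errors, so the maximized log-likelihood has a closed form; the data, the helper `polynomial_bic`, and the choice of degrees are illustrative assumptions. The lower BIC wins: the degree-5 fit reduces the residual error slightly, but not enough to pay the $k \log(n)$ penalty for its extra parameters.

```python
import numpy as np

def bic(log_likelihood, k, n):
    """BIC = k*log(n) - 2*log(L(theta_hat)); lower is better."""
    return k * np.log(n) - 2.0 * log_likelihood

def polynomial_bic(t, y, degree):
    """BIC of a least-squares polynomial fit, assuming i.i.d. Gaussian errors."""
    n = len(y)
    coeffs = np.polyfit(t, y, degree)
    residuals = y - np.polyval(coeffs, t)
    sigma2 = np.mean(residuals ** 2)                       # MLE of the error variance
    log_l = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)  # maximized Gaussian log-likelihood
    k = (degree + 1) + 1                                   # coefficients plus the variance
    return bic(log_l, k, n)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 500)
y = 3.0 * t + rng.normal(scale=0.2, size=t.size)  # data generated by a straight line

for degree in (1, 5):
    print(degree, round(float(polynomial_bic(t, y, degree)), 1))
```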
The quality of a descriptive model is determined by how well it describes all the available data and by
how well its interpretation informs the problem domain.
Q9. Give some examples of time series forecasting problems.
Ans:
There is an almost endless supply of time series forecasting problems. Below are ten examples
from a range of industries that make the notions of time series analysis and forecasting more
concrete.
1. Forecasting the corn yield in tons by state each year.
2. Forecasting whether an EEG trace in seconds indicates that a patient is having a seizure.
3. Forecasting the closing price of stocks every day.
4. Forecasting the birth rates at all hospitals in a city every year.
5. Forecasting product sales in units sold each day for a store.
6. Forecasting the number of passengers through a train station each day.
7. Forecasting unemployment for a state each quarter.
8. Forecasting the utilisation demand on a server every hour.
9. Forecasting the size of the rabbit population in a state each breeding season.
10. Forecasting the average price of gasoline in a city each day.
Although simple, this model might be surprisingly good, and it represents a good starting point.
Otherwise, the moving average can be used to identify interesting trends in the data. We can define
a window to apply the moving average model to smooth the time series and highlight different trends.
Example of a moving average on a 24h window
In the plot above, we applied the moving average model with a 24h window. The green
line smooths the time series, and we can see that there are two peaks in the 24h period.
The longer the window, the smoother the trend will be.
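A minimal sketch of a trailing moving average with NumPy, assuming two weeks of synthetic hourly readings with a daily cycle plus noise (all data is an illustrative assumption). Roughness, measured here as the standard deviation of step-to-step changes, drops as the window grows, which is the "longer window, smoother trend" effect described above.

```python
import numpy as np

def moving_average(series, window):
    """Mean of every run of `window` consecutive points (output is window-1 points shorter)."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(series, dtype=float), kernel, mode="valid")

def roughness(s):
    """Standard deviation of step-to-step changes; lower means smoother."""
    return float(np.std(np.diff(s)))

rng = np.random.default_rng(5)
hours = np.arange(24 * 14)                          # two weeks of hypothetical hourly data
signal = 50 + 10 * np.sin(2 * np.pi * hours / 24)   # synthetic daily cycle
series = signal + rng.normal(scale=2.0, size=hours.size)

print("raw", round(roughness(series), 3))
for window in (6, 24):
    smoothed = moving_average(series, window)
    print(window, len(smoothed), round(roughness(smoothed), 3))
```

With hourly data, a 24-point window averages over one full day, so a purely daily cycle is flattened almost completely.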
Simple exponential smoothing computes each smoothed value as $s_t = \alpha x_t + (1 - \alpha) s_{t-1}$.
Here, alpha is the smoothing factor, which takes values between 0 and 1. It determines how fast the
weights decrease for older observations.
In the plot above, the dark blue line represents the exponential smoothing of the time series
using a smoothing factor of 0.3, and the orange line uses a smoothing factor of 0.05. As we can see,
the smaller the smoothing factor, the smoother the time series will be. This is because as the smoothing
factor approaches 0, the weights on past observations decay more slowly, and the result behaves like a
moving average over a long window.
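A minimal sketch of simple exponential smoothing using the recursion $s_t = \alpha x_t + (1 - \alpha) s_{t-1}$, initialized with $s_0 = x_0$ (a common choice, assumed here). The noisy random-walk series is an illustrative assumption; the two smoothing factors, 0.3 and 0.05, are the ones from the plot discussion.

```python
import numpy as np

def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_0 = x_0, s_t = alpha*x_t + (1 - alpha)*s_{t-1}."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [float(series[0])]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return np.array(smoothed)

def roughness(s):
    """Standard deviation of step-to-step changes; lower means smoother."""
    return float(np.std(np.diff(s)))

rng = np.random.default_rng(3)
x = np.cumsum(rng.normal(size=300)) + rng.normal(scale=3.0, size=300)  # synthetic noisy series

print("raw", round(roughness(x), 3))
for alpha in (0.3, 0.05):
    print(alpha, round(roughness(exponential_smoothing(x, alpha)), 3))
```

With alpha = 1 the output equals the input; smaller values give older observations more weight and a smoother, more lagged curve.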
------------------------------------------------------------------------------------------------------------------------