Data Science Questions and Answers

The document outlines various time series forecasting models including Autoregression (AR), Moving Average (MA), Autoregressive Moving Average (ARMA), Autoregressive Integrated Moving Average (ARIMA), Seasonal ARIMA (SARIMA), and their extensions with exogenous variables (SARIMAX, VAR, VARMA, VARMAX). It also discusses the architecture and significance of deep learning models like AlexNet and VGGNet in image classification tasks. Each model is explained with its components, parameters, and applications in data science.


DATA SCIENCE INTERVIEW PREPARATION
(30 Days of Interview Preparation)
# Day 13

Q1. What is Autoregression?
Answer:
The autoregressive (AR) model is commonly used to model time-varying processes and to solve problems in fields such as the natural sciences, economics, and finance. These models are usually discussed in the context of random processes and are often perceived as statistical tools for time-series data.
A regression model, such as linear regression, models an output value as a linear combination of input values.
Example: y^ = b0 + b1*X1
where y^ is the prediction, b0 and b1 are coefficients found by optimising the model on training data, and X1 is an input value.
This modelling technique can be used on time series where the input variables are observations at previous time steps, called lag variables.
For example, we can predict the value at the next time step (t+1) given the observations at the last two time steps (t-1 and t-2). As a regression model, this would look as follows:
X(t+1) = b0 + b1*X(t-1) + b2*X(t-2)
Because the regression model uses data from the same input variable at previous time steps, it is referred to as an autoregression.
The notation AR(p) refers to the autoregressive model of order p. The AR(p) model is written as
X(t) = c + φ1*X(t-1) + ... + φp*X(t-p) + e(t)
where φ1, ..., φp are the model parameters, c is a constant, and e(t) is white noise.
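As an illustration only, the sketch below fits an AR(2) model on a simulated series using statsmodels' AutoReg class; the coefficients 0.6 and -0.3 are arbitrary choices for the example.

```python
# A sketch only: fit an AR(2) model with statsmodels on a simulated series.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(0)
x = np.zeros(200)
for t in range(2, 200):          # simulate X(t) = 0.6*X(t-1) - 0.3*X(t-2) + noise
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.normal()

result = AutoReg(x, lags=2).fit()                     # estimates b0, b1, b2
print(result.params)                                  # intercept and lag coefficients
print(result.predict(start=len(x), end=len(x) + 4))   # forecast the next 5 steps
```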

Q2. What is Moving Average?
Answer:
Moving average: This technique gives an overall idea of the trends in a dataset; it is an average of any subset of numbers. The moving average is extremely useful for forecasting long-term trends, and it can be calculated for any period. For example, if we have sales data for twenty years, we can calculate a five-year moving average, a four-year moving average, a three-year moving average, and so on. Stock-market analysts often use a 50- or 200-day moving average to help them see trends in the market and (hopefully) forecast where stocks are headed.

The notation MA(q) refers to the moving average model of order q, in which the series is modelled as a function of past forecast errors:
X(t) = μ + e(t) + θ1*e(t-1) + ... + θq*e(t-q)
where μ is the mean of the series, θ1, ..., θq are the model parameters, and e(t) is white noise. Note that the MA(q) forecasting model is different from the simple rolling average described above: it regresses on past forecast errors rather than averaging past observations.
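The sketch below, on a purely synthetic sales series, contrasts the rolling average used for smoothing with an MA(1) forecasting model fitted via statsmodels; the window length and model order are illustrative assumptions.

```python
# A sketch only: a rolling (smoothing) average versus an MA(1) forecasting model.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

sales = pd.Series(100 + np.random.default_rng(1).normal(0, 5, 120))

trend = sales.rolling(window=12).mean()      # 12-period moving average (trend view)

ma1 = ARIMA(sales, order=(0, 0, 1)).fit()    # MA(1): regression on past forecast errors
print(trend.tail())
print(ma1.forecast(steps=3))                 # next three forecasts
```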

Q3. What is Autoregressive Moving Average (ARMA)?


Answer:
ARMA: It is a forecasting model in which the methods of autoregression (AR) and moving average (MA) analysis are both applied to well-behaved time-series data. ARMA assumes that the time series is stationary, i.e., that it fluctuates uniformly around a particular level.
AR (Autoregression) model:
The autoregressive (AR) model is commonly used in modern spectrum estimation.

The following is the procedure for using ARMA.

 Selecting the AR model and then equating its output to the signal being studied when the input is an impulse function or white noise. The output should be at least a good approximation of the signal.
 Finding the number of model parameters using the known autocorrelation function or the data.
 Using the derived model parameters to estimate the power spectrum of the signal.
Moving Average (MA) model:
It is a commonly used model in modern spectrum estimation and is also one of the methods of parametric spectrum analysis. The procedure for estimating the MA model's signal spectrum is as follows.

 Selecting the MA model and then equating its output to the signal under study when the input is an impulse function or white noise. The output should be at least a good approximation of the signal.
 Finding the model's parameters using the known autocorrelation function.
 Estimating the signal's power spectrum using the derived model parameters.
In the estimation of the ARMA parameter spectrum, the AR parameters are estimated first, and then the MA parameters are estimated based on these AR parameters; the spectral estimates of the ARMA model are then obtained. The parameter estimation of the MA model is therefore often calculated as part of the ARMA parameter spectrum estimation.
The notation ARMA(p, q) refers to the model with p autoregressive terms and q moving-average terms. This model contains the AR(p) and MA(q) models as special cases:
X(t) = c + e(t) + φ1*X(t-1) + ... + φp*X(t-p) + θ1*e(t-1) + ... + θq*e(t-q)
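As a rough illustration, the snippet below simulates an ARMA(1,1) process and recovers its parameters; note that recent statsmodels versions fit ARMA as an ARIMA model with d = 0.

```python
# A sketch only: simulate an ARMA(1,1) process and recover its parameters.
from statsmodels.tsa.arima_process import arma_generate_sample
from statsmodels.tsa.arima.model import ARIMA

# phi = 0.6 (AR side) and theta = 0.4 (MA side); note the sign convention
# on the AR lag polynomial expected by arma_generate_sample.
x = arma_generate_sample(ar=[1, -0.6], ma=[1, 0.4], nsample=300)

results = ARIMA(x, order=(1, 0, 1)).fit()   # ARMA(p=1, q=1) is ARIMA with d = 0
print(results.params)                        # recovered phi and theta (approximately)
print(results.forecast(steps=5))
```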

Q4. What is Autoregressive Integrated Moving Average (ARIMA)?
Answer:
ARIMA: It is a statistical analysis model that uses time-series data either to better understand the data set or to predict future trends.
An ARIMA model can be understood by outlining each of its components as follows:

 Autoregression (AR): It refers to a model in which a changing variable regresses on its own lagged, or prior, values.
 Integrated (I): It represents the differencing of raw observations to allow the time series to become stationary, i.e., data values are replaced by the differences between the data values and the previous values.
 Moving average (MA): It incorporates the dependency between an observation and the residual error from a moving average model applied to lagged observations.

Each component functions as a parameter with a standard notation. For ARIMA models, the standard notation is ARIMA(p, d, q), where integer values are substituted for the parameters to indicate the type of ARIMA model used. The parameters can be defined as:

 p: It is the number of lag observations in the model; also known as the lag order.
 d: It is the number of times that the raw observations are differenced; also known as the degree of differencing.
 q: It is the size of the moving average window; also known as the order of the moving average.
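A minimal sketch of these parameters in practice, using statsmodels on a synthetic drifting series; the order (1, 1, 1) is an arbitrary example, not a recommendation.

```python
# A sketch only: an ARIMA(1, 1, 1) fit on a synthetic non-stationary (drifting) series.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(9)
y = np.cumsum(0.5 + rng.normal(0, 1, 200))   # random walk with drift

results = ARIMA(y, order=(1, 1, 1)).fit()    # d = 1 differences the series once
print(results.params)                        # AR, MA coefficients and noise variance
print(results.forecast(steps=5))
```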

Q5.What is SARIMA (Seasonal Autoregressive Integrated Moving-
Average)?
Answer:
Seasonal ARIMA: It is an extension of ARIMA that explicitly supports univariate time series data with a seasonal component.
It adds three new hyperparameters to specify the autoregression (AR), differencing (I) and moving average (MA) for the seasonal component of the series, as well as an additional parameter for the period of the seasonality.

Configuring a SARIMA model requires selecting hyperparameters for both the trend and seasonal elements of the series.

Trend Elements

Three trend elements require configuration.
They are the same as in the ARIMA model, specifically:

p: the trend autoregression order.
d: the trend difference order.
q: the trend moving-average order.

Seasonal Elements

Four seasonal elements that are not part of ARIMA must be configured:

P: the seasonal autoregressive order.
D: the seasonal difference order.
Q: the seasonal moving-average order.
m: the number of time steps in a single seasonal period.

Together, the notation for the SARIMA model is specified as

SARIMA(p,d,q)(P,D,Q)m

The elements can be chosen through careful analysis of the ACF and PACF plots looking at the
correlations of recent time steps.
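For illustration, the sketch below fits a SARIMA(1,1,1)(1,1,1,12) model with statsmodels' SARIMAX class on a synthetic monthly series; all orders are assumed for the example.

```python
# A sketch only: SARIMA(1,1,1)(1,1,1,12) on a synthetic monthly series with
# trend and yearly seasonality.
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(6)
months = pd.date_range("2015-01-01", periods=96, freq="MS")
y = pd.Series(0.3 * np.arange(96)                            # trend
              + 10 * np.sin(2 * np.pi * np.arange(96) / 12)  # yearly seasonality
              + rng.normal(0, 1, 96), index=months)

model = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)).fit(disp=False)
print(model.forecast(steps=12))   # forecast one seasonal cycle ahead
```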

Q6. What is Seasonal Autoregressive Integrated Moving-Average
with Exogenous Regressors (SARIMAX) ?
Answer:
SARIMAX: It is an extension of the SARIMA model that also includes the modelling of exogenous variables.
Exogenous variables are also called covariates and can be thought of as parallel input sequences that have observations at the same time steps as the original series. The primary series may be referred to as the endogenous data to contrast it with the exogenous sequence(s). The observations for exogenous variables are included in the model directly at each time step and are not modelled in the same way as the primary endogenous sequence (e.g. as an AR, MA, etc. process).

The SARIMAX method can also be used to model the subsumed models with exogenous variables, such as ARX, MAX, ARMAX, and ARIMAX.
The method is suitable for univariate time series with trend and/or seasonal components and exogenous variables.
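A minimal sketch, assuming a synthetic holiday indicator as the exogenous variable; future exogenous values must be supplied for the forecast horizon.

```python
# A sketch only: the same seasonal model plus one assumed exogenous regressor.
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(7)
n = 96
holiday = rng.integers(0, 2, n).reshape(-1, 1)   # exogenous covariate, shape (n, 1)
y = (10 * np.sin(2 * np.pi * np.arange(n) / 12)
     + 5 * holiday[:, 0] + rng.normal(0, 1, n))

model = SARIMAX(y, exog=holiday, order=(1, 1, 1),
                seasonal_order=(1, 1, 1, 12)).fit(disp=False)
future_holiday = np.array([[1], [0], [0]])       # exog must cover the forecast horizon
print(model.forecast(steps=3, exog=future_holiday))
```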

Q7. What is Vector autoregression (VAR)?
Answer:
VAR: It is a stochastic process model used to capture the linear interdependencies among multiple time series. VAR models generalise the univariate autoregressive model (AR model) by allowing for more than one evolving variable. All variables in a VAR enter the model in the same way: each variable has an equation explaining its evolution based on its own lagged values, the lagged values of the other model variables, and an error term. VAR modelling does not require as much knowledge about the forces influencing the variables as do structural models with simultaneous equations: the only prior knowledge required is a list of variables which can be hypothesised to affect each other intertemporally.
A VAR model describes the evolution of a set of k variables over the same sample period (t = 1, ..., T) as a linear function of only their past values. The variables are collected in a k-vector ((k × 1)-matrix) y(t), whose i-th element, y(i,t), is the observation at time t of the i-th variable. Example: if the i-th variable is GDP, then y(i,t) is the value of GDP at time t.
A p-th order VAR, denoted VAR(p), is written as
y(t) = c + A1*y(t-1) + A2*y(t-2) + ... + Ap*y(t-p) + e(t)
where the observation y(t-i) is called the i-th lag of y, c is a k-vector of constants (intercepts), Ai is a time-invariant (k × k)-matrix, and e(t) is a k-vector of error terms.
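A minimal sketch of a two-variable VAR with statsmodels; the series and column names are synthetic placeholders.

```python
# A sketch only: a two-variable VAR on synthetic, differenced series.
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(2)
raw = pd.DataFrame(rng.normal(size=(200, 2)).cumsum(axis=0), columns=["gdp", "cpi"])
data = raw.diff().dropna()      # difference to a (roughly) stationary series

results = VAR(data).fit(2)      # each variable regressed on 2 lags of both variables
print(results.params)
print(results.forecast(data.values[-2:], steps=3))   # 3-step-ahead joint forecast
```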

Q8. What is Vector Autoregression Moving-Average (VARMA)?


Answer:
VARMA: This method models the next step in each time series using an ARMA model. It is the generalisation of ARMA to multiple parallel time series, i.e., a multivariate time series.

The notation for the model involves specifying the order of the AR(p) and MA(q) components as parameters to the VARMA function, e.g. VARMA(p, q). The VARMA model can also be used to develop VAR or VMA models.
This method is suitable for multivariate time series without trend and seasonal components.
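For illustration, the sketch below fits a VARMA(1,1) on two synthetic stationary series using statsmodels' VARMAX class (with no exogenous inputs).

```python
# A sketch only: a VARMA(1,1) fit via statsmodels' VARMAX class, no exogenous inputs.
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.varmax import VARMAX

rng = np.random.default_rng(4)
data = pd.DataFrame(rng.normal(size=(200, 2)), columns=["series_1", "series_2"])

results = VARMAX(data, order=(1, 1)).fit(disp=False)   # p = 1 AR terms, q = 1 MA terms
print(results.forecast(steps=3))                        # joint 3-step-ahead forecast
```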

Q9. What is Vector Autoregression Moving-Average with Exogenous


Regressors (VARMAX)?
Answer:
VARMAX: It is an extension of the VARMA model that also includes the modelling of exogenous variables. It is the multivariate version of the ARMAX method.
Exogenous variables are also called covariates and can be thought of as parallel input sequences that have observations at the same time steps as the original series. The primary series are referred to as the endogenous data to contrast them with the exogenous sequence(s). The observations for the exogenous variables are included in the model directly at each time step and are not modelled in the same way as the primary endogenous sequences (e.g. as an AR, MA, etc. process).
This method can also be used to model subsumed models with exogenous variables, such as VARX and VMAX.
This method is suitable for multivariate time series without trend and seasonal components and with exogenous variables.
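A minimal sketch with one assumed exogenous regressor ("promo"); as with SARIMAX, exogenous values must also be provided for the forecast steps.

```python
# A sketch only: VARMAX(1,1) with one assumed exogenous regressor.
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.varmax import VARMAX

rng = np.random.default_rng(5)
endog = pd.DataFrame(rng.normal(size=(200, 2)), columns=["y1", "y2"])
exog = pd.DataFrame({"promo": rng.integers(0, 2, 200)})   # parallel input sequence

results = VARMAX(endog, exog=exog, order=(1, 1)).fit(disp=False)
future_exog = pd.DataFrame({"promo": [1, 0, 1]})          # exog values for the horizon
print(results.forecast(steps=3, exog=future_exog))
```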

Q10. What is Simple Exponential Smoothing (SES)?
Answer:
SES: This method models the next time step as an exponentially weighted linear function of observations at prior time steps.
It is suitable for univariate time series without trend and seasonal components.
Exponential smoothing is a rule-of-thumb technique for smoothing time-series data using the exponential window function. Whereas in a simple moving average the past observations are weighted equally, exponential functions are used to assign exponentially decreasing weights over time. It is an easily learned and easily applied procedure for making some determination based on prior assumptions by the user, such as seasonality. Exponential smoothing is often used for the analysis of time-series data.
Exponential smoothing is one of many window functions commonly applied to smooth data in signal processing, acting as a low-pass filter to remove high-frequency noise.
The raw data sequence is often represented by {x(t)} beginning at time t = 0, and the output of the exponential smoothing algorithm is commonly written as {s(t)}, which may be regarded as a best estimate of what the next value of x will be. When the sequence of observations begins at time t = 0, the simplest form of exponential smoothing is given by the formulas:
s(0) = x(0)
s(t) = α*x(t) + (1 - α)*s(t-1),  t > 0
where α is the smoothing factor, 0 < α < 1.
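A minimal sketch of simple exponential smoothing with statsmodels on a synthetic series; the smoothing level 0.3 is an arbitrary choice.

```python
# A sketch only: simple exponential smoothing with a fixed smoothing level alpha = 0.3.
import numpy as np
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

x = 50 + np.random.default_rng(3).normal(0, 2, 100)   # synthetic level-only series

fit = SimpleExpSmoothing(x).fit(smoothing_level=0.3, optimized=False)
print(fit.fittedvalues[-5:])   # smoothed estimates s(t)
print(fit.forecast(3))         # flat forecast of the next 3 values
```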

DATA SCIENCE INTERVIEW PREPARATION
(30 Days of Interview Preparation)
# Day 14

Q1. What is Alexnet?
Answer:
Alex Krizhevsky, Geoffrey Hinton, and Ilya Sutskever created the neural network architecture called 'AlexNet' and won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012. They trained the network on 1.2 million high-resolution images across 1000 different classes, with 60 million parameters and 650,000 neurons. The training was done on two GPUs with a split-layer concept because GPUs were relatively slow at that time.
AlexNet is the name of a convolutional neural network which has had a large impact on the field of machine learning, specifically in the application of deep learning to machine vision. The network has a very similar architecture to LeNet by Yann LeCun et al., but is deeper, with more filters per layer and with stacked convolutional layers. It consists of 11×11, 5×5 and 3×3 convolutions, max pooling, dropout, data augmentation, ReLU activations, and SGD with momentum. ReLU activations are attached after every convolutional and fully connected layer. AlexNet was trained for six days simultaneously on two Nvidia GeForce GTX 580 GPUs, which is why the network is split into two pipelines.
Architecture

AlexNet contains eight layers with weights; the first five are convolutional, and the remaining three are fully connected. The output of the last fully connected layer is fed to a 1000-way softmax, which produces a distribution over the 1000 class labels. The network maximises the multinomial logistic regression objective, which is equivalent to maximising the average across training cases of the log-probability of the correct label under the prediction distribution. The kernels of the second, fourth, and fifth convolutional layers are connected only to those kernel maps in the previous layer which reside on the same GPU. The kernels of the third convolutional layer are connected to all the kernel maps in the second layer. The neurons in the fully connected layers are connected to all the neurons in the previous layer.

In short, AlexNet contains five convolutional layers and three fully connected layers. ReLU is applied after every convolutional and fully connected layer. Dropout is applied before the first and second fully connected layers. The network has 62.3 million parameters and needs 1.1 billion computation units in a forward pass. We can also see that the convolutional layers, which account for only about 6% of all the parameters, consume about 95% of the computation.
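A rough Keras sketch of an AlexNet-style stack (single pipeline, no GPU split); the layer sizes follow the description above, but this is an approximation, not the exact original model.

```python
# Approximate AlexNet-style stack in Keras (single pipeline; a sketch, not the
# exact two-GPU original).
from tensorflow.keras import layers, models

alexnet = models.Sequential([
    layers.Conv2D(96, 11, strides=4, activation="relu", input_shape=(227, 227, 3)),
    layers.MaxPooling2D(3, strides=2),
    layers.Conv2D(256, 5, padding="same", activation="relu"),
    layers.MaxPooling2D(3, strides=2),
    layers.Conv2D(384, 3, padding="same", activation="relu"),
    layers.Conv2D(384, 3, padding="same", activation="relu"),
    layers.Conv2D(256, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(3, strides=2),
    layers.Flatten(),
    layers.Dropout(0.5),                       # dropout around the first FC layers
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(4096, activation="relu"),
    layers.Dense(1000, activation="softmax"),  # 1000-way ILSVRC classifier
])
alexnet.summary()                              # roughly 60M+ parameters
```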

Q2. What is VGGNet?


Answer:
VGGNet consists of 16 layers with weights and is very appealing because of its very uniform architecture. Similar to AlexNet, it uses only 3x3 convolutions, but with many filters. It was trained on 4 GPUs for 2–3 weeks. It is currently one of the most preferred choices in the community for extracting features from images. The weight configuration of VGGNet is publicly available and has been used in many other applications and challenges as a baseline feature extractor. However, VGGNet consists of 138 million parameters, which can be a bit challenging to handle.
There are multiple variants of VGGNet (VGG16, VGG19, etc.) which differ only in the total number of layers in the network. The structural details of the VGG16 network are described below.

The idea behind having fixed-size kernels is that all the variable-size convolutional kernels used in AlexNet (11x11, 5x5, 3x3) can be replicated by making use of multiple 3x3 kernels as building blocks. The replication is in terms of the receptive field covered by the kernels.

Let's consider an example. Say we have an input layer of size 5x5x1. Implementing a conv layer with a kernel size of 5x5 and stride 1 results in an output feature map of size 1x1. The same output feature map can be obtained by implementing two 3x3 conv layers with a stride of 1.

Now, let's look at the number of variables that need to be trained. For a 5x5 conv layer filter, the number of variables is 25. On the other hand, two conv layers of kernel size 3x3 have a total of 3x3x2 = 18 variables (a reduction of 28%).
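The small Keras check below (assuming a single-channel input and no biases) confirms the 25-vs-18 weight comparison and the matching 1x1 output size.

```python
# A quick check that two stacked 3x3 convolutions cover the same 5x5 receptive field
# with fewer weights (single-channel input, no biases).
from tensorflow.keras import layers, models

five = models.Sequential([layers.Conv2D(1, 5, use_bias=False, input_shape=(5, 5, 1))])
two_threes = models.Sequential([
    layers.Conv2D(1, 3, use_bias=False, input_shape=(5, 5, 1)),
    layers.Conv2D(1, 3, use_bias=False),
])
print(five.count_params(), two_threes.count_params())   # 25 vs 18 weights
print(five.output_shape, two_threes.output_shape)       # both reduce 5x5 to 1x1
```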

Q3. What is VGG16?
Answer:
VGG16: It is a convolutional neural network model proposed by K. Simonyan and A. Zisserman from the University of Oxford in the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition". The model achieves 92.7% top-5 test accuracy on ImageNet, which is a dataset of over 14 million images belonging to 1000 classes. It was one of the famous models submitted to ILSVRC-2014. It improves on AlexNet by replacing the large kernel-sized filters (11 and 5 in the first and second convolutional layers, respectively) with multiple 3×3 kernel-sized filters one after another. VGG16 was trained for weeks using NVIDIA Titan Black GPUs.

The Architecture
The VGG16 architecture is described below.

The input to the Conv1 layer is a fixed-size 224 x 224 RGB image. The image is passed through a stack of convolutional (conv.) layers, where the filters are used with a very small receptive field: 3×3 (which is the smallest size that captures the notion of left/right, up/down, and centre). In one of the configurations, it also utilises 1×1 convolution filters, which can be seen as a linear transformation of the input channels. The convolution stride is fixed to 1 pixel, and the spatial padding of the conv. layer input is such that the spatial resolution is preserved after the convolution, i.e. the padding is 1 pixel for the 3×3 conv. layers. Spatial pooling is carried out by five max-pooling layers, which follow some of the conv. layers. Max-pooling is performed over a 2×2 pixel window, with stride 2.
Three fully connected (FC) layers follow the stack of convolutional layers (which has a different depth in different architectures): the first two have 4096 channels each, and the third performs 1000-way ILSVRC classification and thus contains 1000 channels. The final layer is the softmax layer. The configuration of the fully connected layers is the same in all the networks.
All hidden layers are equipped with the rectification (ReLU) non-linearity. It is also noted that none of the networks (except for one) contains Local Response Normalisation (LRN); such normalisation does not improve the performance on the ILSVRC dataset, but leads to increased memory consumption and computation time.
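For illustration, the pre-trained VGG16 shipped with Keras can be used directly as a classifier or feature extractor; the image path below is a placeholder, and the ImageNet weights are downloaded on first use.

```python
# A sketch only: run the pre-trained VGG16 from Keras on a placeholder image.
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

model = VGG16(weights="imagenet")                   # 16 weight layers, 1000-way softmax
img = image.load_img("example.jpg", target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
print(decode_predictions(model.predict(x), top=5))  # top-5 ImageNet labels
```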

Q4. What is ResNet?


Answer:
At ILSVRC 2015, the so-called Residual Neural Network (ResNet) by Kaiming He et al. introduced a novel architecture with "skip connections" and features heavy batch normalisation. Such skip connections are also known as gated units or gated recurrent units and have a strong similarity to recent successful elements applied in RNNs. Thanks to this technique, they were able to train a network with 152 layers while still having lower complexity than VGGNet. It achieves a top-5 error rate of 3.57%, which beats human-level performance on this dataset.
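A minimal sketch of a residual (skip-connection) block in Keras; the filter counts are illustrative, the block assumes the input already has the same number of channels, and it is not the exact ResNet-152 configuration.

```python
# A sketch only: a residual block with an identity skip connection.
from tensorflow.keras import layers

def residual_block(x, filters):
    # Assumes x already has `filters` channels so the addition is shape-compatible.
    shortcut = x                                    # identity skip connection
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])                 # add the input back to the output
    return layers.ReLU()(y)
```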

Q5. What is HAAR CASCADE?


Answer:
Haar Cascade: It is a machine learning object detection algorithm used to identify objects in an image or video, based on the concept of features proposed by Paul Viola and Michael Jones in their 2001 paper "Rapid Object Detection using a Boosted Cascade of Simple Features".
It is a machine learning-based approach where a cascade function is trained from a large number of positive and negative images. It is then used to detect objects in other images.
The algorithm has four stages:
 Haar Feature Selection
 Creating Integral Images
 Adaboost Training
 Cascading Classifiers
It is well known for being able to detect faces and body parts in an image, but it can be trained to identify almost any object.
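A minimal OpenCV sketch using the frontal-face cascade bundled with opencv-python; the image path is a placeholder.

```python
# A sketch only: face detection with the bundled frontal-face Haar cascade.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
img = cv2.imread("people.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)   # draw detections
cv2.imwrite("people_detected.jpg", img)
```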

Q6. What is Transfer Learning?


Answer:
Transfer learning: It is a machine learning method where a model developed for one task is reused as the starting point for a model on a second task.
Transfer learning differs from traditional machine learning in that it uses pre-trained models that were developed for another task to jump-start the development process on a new task or problem.

The benefit of transfer learning is that it can reduce the time needed to develop and train a model by reusing pieces or modules of already developed models. This speeds up the model training process and accelerates results.
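A minimal transfer-learning sketch in Keras: a frozen VGG16 base provides the features and only a new head is trained; the 10-class output is an assumed example task.

```python
# A sketch only: reuse frozen VGG16 features and train a new classification head.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                       # freeze the pre-trained convolutional base

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dense(10, activation="softmax"),  # new head for the assumed 10-class task
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)  # train on the new task's data
```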

Q7. What is Faster, R-CNN?


Answer:
Faster R-CNN: It has two networks: a region proposal network (RPN) for generating region proposals, and a network that uses these proposals to detect objects. The main difference from Fast R-CNN is that the latter uses selective search to generate region proposals. The time cost of generating region proposals is much smaller with an RPN than with selective search, because the RPN shares most of its computation with the object detection network. In brief, the RPN ranks region boxes (called anchors) and proposes the ones most likely to contain objects.
Anchors
Anchors play a very important role in Faster R-CNN. An anchor is a box. In the default configuration of Faster R-CNN, there are nine anchors at each position of an image; for example, one can plot the nine anchors at position (320, 320) of an image of size (600, 800).

Region Proposal Network:
The output of the region proposal network is a set of boxes/proposals that will be examined by a classifier and regressor to eventually check for the occurrence of objects. To be more precise, the RPN predicts the probability of an anchor being background or foreground, and refines the anchor.
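For illustration, torchvision ships a pre-trained Faster R-CNN that can be run directly; the image path is a placeholder, and the `weights="DEFAULT"` argument assumes torchvision 0.13 or newer.

```python
# A sketch only: run a pre-trained Faster R-CNN from torchvision on a placeholder image.
import torch
from torchvision.io import read_image
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
img = read_image("street.jpg").float() / 255.0      # CHW tensor scaled to [0, 1]
with torch.no_grad():
    pred = model([img])[0]                          # dict of boxes, labels, scores
print(pred["boxes"][:5], pred["scores"][:5])
```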

Q8. What is RCNN?


Answer:
To bypass the problem of selecting a huge number of regions, Ross Girshick et al. proposed a method that uses selective search to extract just 2000 regions from the image, called region proposals. Therefore, instead of trying to classify a huge number of regions, you can work with just 2000.

Problems with R-CNN:

 It still takes a huge amount of time to train the network, as we would have to classify 2000 region proposals per image.
 It cannot be implemented in real time, as it takes around 47 seconds for each test image.
 The selective search algorithm is a fixed algorithm, so no learning happens at that stage. This can lead to the generation of bad candidate region proposals.
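As a rough illustration of the selective-search step that R-CNN relies on, the snippet below uses the implementation from opencv-contrib-python; the image path is a placeholder.

```python
# A sketch only: generate region proposals with OpenCV's selective search
# (requires the opencv-contrib-python package).
import cv2

ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(cv2.imread("street.jpg"))
ss.switchToSelectiveSearchFast()
rects = ss.process()            # (x, y, w, h) region proposals
print(len(rects), rects[:5])    # R-CNN keeps roughly 2000 of these per image
```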

Q9.What is GoogLeNet/Inception?
Answer:
The winner of the ILSVRC 2014 competition was GoogLeNet from Google. It achieved a top-5 error rate of 6.67%! This was very close to human-level performance, which the organisers of the challenge were now forced to evaluate. As it turns out, this was rather hard to do and required some human training to beat GoogLeNet's accuracy. After a few days of training, the human expert (Andrej Karpathy) was able to achieve a top-5 error rate of 5.1% (single model) and 3.6% (ensemble). The network used a CNN inspired by LeNet but implemented a novel element dubbed the inception module. It used batch normalisation, image distortions, and RMSprop. This module is based on several very small convolutions in order to reduce the number of parameters drastically. The architecture consists of a 22-layer deep CNN but reduces the number of parameters from 60 million (AlexNet) to 4 million.
It contains 1×1 convolutions in the middle of the network, and global average pooling is used at the end of the network instead of fully connected layers. These two techniques are from another paper, "Network In Network" (NIN). Another technique, called the inception module, is to have different sizes/types of convolutions for the same input and to stack all the outputs.
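A simplified Keras sketch of an inception-style module: parallel 1x1, 3x3 and 5x5 convolutions plus pooling, concatenated along the channel axis. It omits the 1x1 reduction layers of the real GoogLeNet, and the filter counts are left as arguments.

```python
# A sketch only: an inception-style module with four parallel branches.
from tensorflow.keras import layers

def inception_module(x, f1, f3, f5, fpool):
    b1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f3, 3, padding="same", activation="relu")(x)
    b5 = layers.Conv2D(f5, 5, padding="same", activation="relu")(x)
    bp = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    bp = layers.Conv2D(fpool, 1, padding="same", activation="relu")(bp)
    return layers.Concatenate()([b1, b3, b5, bp])   # stack all branch outputs
```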

Q10. What is LeNet-5?
Answer:
LeNet-5, a pioneering 7-level convolutional network by LeCun et al. in 1998 that classifies digits, was applied by several banks to recognise hand-written numbers on cheques, digitised as 32x32 pixel greyscale input images. The ability to process higher-resolution images requires larger and more numerous convolutional layers, so the availability of computing resources constrains this technique.

LeNet-5 is a very simple network. It has only seven layers, among which there are three convolutional layers (C1, C3 and C5), two sub-sampling (pooling) layers (S2 and S4), and one fully connected layer (F6), followed by the output layer. The convolutional layers use 5x5 convolutions with stride 1. The sub-sampling layers are 2x2 average pooling layers. Tanh sigmoid activations are used throughout the network. Several interesting architectural choices were made in LeNet-5 that are not very common in the modern era of deep learning.
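A rough Keras sketch of the LeNet-5 layer stack described above; padding the digits to 32x32 and using a softmax output are assumptions for the example.

```python
# A sketch only: a LeNet-5-style stack with average pooling and tanh activations.
from tensorflow.keras import layers, models

lenet5 = models.Sequential([
    layers.Conv2D(6, 5, activation="tanh", input_shape=(32, 32, 1)),   # C1
    layers.AveragePooling2D(2),                                        # S2
    layers.Conv2D(16, 5, activation="tanh"),                           # C3
    layers.AveragePooling2D(2),                                        # S4
    layers.Conv2D(120, 5, activation="tanh"),                          # C5
    layers.Flatten(),
    layers.Dense(84, activation="tanh"),                               # F6
    layers.Dense(10, activation="softmax"),                            # output layer
])
lenet5.summary()
```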

------------------------------------------------------------------------------------------------------------------------
