Cloudera Data Analyst

Cloudera Data Analyst INTERVIEW QUESTIONS

Uploaded by

shankar das

Question set 1

1. What is the Sliding Window method for Time Series Forecasting?
● Time series can be phrased as supervised learning. Given a sequence of numbers for a time
series dataset, we can restructure the data to look like a supervised learning problem.
● In the sliding window method, the previous time steps can be used as input variables, and
the next time steps can be used as the output variable.
● In statistics and time series analysis, this is called a lag or lag method. The number of
previous time steps is called the window width or size of the lag. This sliding window is
the basis for how we can turn any time series dataset into a supervised learning
problem.
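
As a minimal sketch of the restructuring described above, the following Python function turns a sequence into (inputs, output) pairs. The function name `sliding_window`, the parameter `width`, and the sample values are illustrative assumptions, not part of the original answer.

```python
def sliding_window(series, width):
    """Build supervised-learning pairs from a sequence.

    Each pair holds `width` previous time steps as inputs and the
    next time step as the output.
    """
    pairs = []
    for i in range(len(series) - width):
        inputs = series[i:i + width]   # lagged values (the window)
        target = series[i + width]     # the value to predict
        pairs.append((inputs, target))
    return pairs


# Hypothetical sample series, window width (lag size) of 2
series = [10, 20, 30, 40, 50]
for inputs, target in sliding_window(series, width=2):
    print(inputs, "->", target)
```

A larger `width` gives each row more lagged inputs but leaves fewer rows, since the first `width` observations have no complete window behind them.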

2. What are Ensemble Methods?

Ensemble methods are machine learning techniques that combine several base models in
order to produce a single, stronger predictive model. Random Forest is one type of ensemble method.
The number of component classifiers in an ensemble has a large impact on the accuracy of the
prediction, although adding more classifiers is subject to diminishing returns.
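
One simple way to combine base models is majority voting, sketched below. The three threshold rules (`model_a`, `model_b`, `model_c`) are hypothetical stand-ins for trained classifiers, chosen only to keep the example self-contained; a Random Forest applies the same voting idea to many decision trees.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most base models."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical base models: each is a simple threshold rule
def model_a(x):
    return "high" if x > 5 else "low"


def model_b(x):
    return "high" if x > 3 else "low"


def model_c(x):
    return "high" if x > 7 else "low"


def ensemble_predict(x):
    """Combine the base models' predictions by majority vote."""
    return majority_vote([model(x) for model in (model_a, model_b, model_c)])


print(ensemble_predict(6))  # two of three models vote "high"
print(ensemble_predict(4))  # two of three models vote "low"
```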

3. What are constraints in SQL?

Constraints are rules applied to the data in a table. That is, we can use constraints to limit
the values that can be stored in a particular column of a table.
NOT NULL, UNIQUE, DEFAULT, PRIMARY KEY, FOREIGN KEY, and CHECK are the different
constraints in SQL.
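
The constraints listed above can be tried out with Python's built-in `sqlite3` module, as in the sketch below. The table and column names (`departments`, `employees`, `age`, `status`) and the sample rows are assumptions made up for illustration, and other databases may differ in syntax details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces FOREIGN KEY constraints only when this pragma is on
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE departments (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    )
""")
conn.execute("""
    CREATE TABLE employees (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        age     INTEGER CHECK (age >= 18),
        status  TEXT DEFAULT 'active',
        dept_id INTEGER REFERENCES departments(id)
    )
""")

# Hypothetical sample rows that satisfy every constraint
conn.execute("INSERT INTO departments (id, name) VALUES (1, 'Analytics')")
conn.execute("INSERT INTO employees (name, age, dept_id) VALUES ('Asha', 30, 1)")


def violates(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


print(violates("INSERT INTO employees (name, age, dept_id) VALUES (NULL, 30, 1)"))   # NOT NULL
print(violates("INSERT INTO employees (name, age, dept_id) VALUES ('Ben', 10, 1)"))  # CHECK
print(violates("INSERT INTO employees (name, age, dept_id) VALUES ('Cy', 40, 99)"))  # FOREIGN KEY
print(violates("INSERT INTO departments (id, name) VALUES (2, 'Analytics')"))        # UNIQUE
```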

4. How do you apply a single format to all the sheets present in a workbook?

To apply the same format to all the sheets of a workbook, follow these steps:

1. Right-click any sheet tab in the workbook.
2. Click the Select All Sheets option.
3. Format any one of the sheets; the format is applied to all the other sheets as well.

Question set 2

1. What is the difference between the RANK() and DENSE_RANK() functions?

The RANK() function assigns a rank to each row within an ordered partition.
Rows that tie receive the same rank, and the next rank is the tied rank plus the
number of tied rows, which leaves gaps. If we have three records at rank 4, for example, the next rank is
7. The DENSE_RANK() function also gives tied rows the same rank within a partition, based on
the provided column value, but leaves no gaps. If we have three records at rank 4, for example, the
next rank is 5.
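
The difference can be reproduced in plain Python. This is a sketch of the ranking logic only, not of any database's implementation; the function names and the sample scores are assumptions, and ranks are assigned in descending order of value.

```python
def rank(values):
    """RANK(): ties share a rank; the following rank skips ahead by the tie count."""
    ordered = sorted(values, reverse=True)
    # index() finds the first position of v, i.e. how many values are strictly greater
    return [ordered.index(v) + 1 for v in values]


def dense_rank(values):
    """DENSE_RANK(): ties share a rank; ranks stay consecutive with no gaps."""
    distinct = sorted(set(values), reverse=True)
    position = {v: i + 1 for i, v in enumerate(distinct)}
    return [position[v] for v in values]


# Hypothetical scores with three records tied at rank 4
scores = [100, 90, 80, 70, 70, 70, 60]
print(rank(scores))        # [1, 2, 3, 4, 4, 4, 7]
print(dense_rank(scores))  # [1, 2, 3, 4, 4, 4, 5]
```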

2. Explain One-hot encoding and Label Encoding. How do they affect the dimensionality of
the given dataset?

One-hot encoding is the representation of categorical variables as binary vectors. Label
encoding is converting labels/words into numeric form. Using one-hot encoding
increases the dimensionality of the data set. Label encoding doesn’t affect the
dimensionality of the data set. One-hot encoding creates a new 0/1 variable for each level in
the variable whereas, in label encoding, each level is encoded as an integer (0, 1, 2, ...)
in a single column.
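
A small sketch in plain Python shows the effect on dimensionality. The helper names and the sample `colors` column are illustrative assumptions; libraries such as scikit-learn or pandas provide their own encoders.

```python
def label_encode(values):
    """Map each level to one integer; the result is still a single column."""
    levels = sorted(set(values))
    code = {level: i for i, level in enumerate(levels)}
    return [code[v] for v in values]


def one_hot_encode(values):
    """Create one 0/1 column per level; the width equals the number of levels."""
    levels = sorted(set(values))
    return [[1 if v == level else 0 for level in levels] for v in values]


# Hypothetical categorical column with three levels
colors = ["red", "green", "blue", "green"]
print(label_encode(colors))    # [2, 1, 0, 1]
print(one_hot_encode(colors))  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Here one column becomes three under one-hot encoding, while label encoding keeps a single column. Label encoding does imply an ordering (blue < green < red in this sketch) that may not be meaningful for the data.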

3. Explain the difference between a worksheet, dashboard, story, and workbook in Tableau.

● Tableau uses a workbook and sheet file structure, much like Microsoft Excel.
● A workbook contains sheets, which can be a worksheet, dashboard, or a story.
● A worksheet contains a single view along with shelves, legends, and the Data
pane.
● A dashboard is a collection of views from multiple worksheets.
● A story contains a sequence of worksheets or dashboards that work together
to convey information.

4. How can you split a column into 2 or more columns?

You can split a column into 2 or more columns by following these steps:

1. Select the cell that you want to split. Then navigate to the Data tab and select Text to Columns.
2. Select the delimiter.
3. Choose the column data format and select the destination where the split data should appear.
4. The text is then split into multiple columns.
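
Outside Excel, the same delimiter-based split can be sketched in Python. This is only an analogy to Text to Columns, not what Excel runs internally; the function name `split_column` and the sample names are assumptions.

```python
def split_column(values, delimiter=","):
    """Split each cell on the delimiter and pad rows to the same number of columns."""
    rows = [[part.strip() for part in value.split(delimiter)] for value in values]
    width = max((len(row) for row in rows), default=0)
    return [row + [""] * (width - len(row)) for row in rows]


# Hypothetical column of "Last, First" values
names = ["Smith, John", "Lee, Ana, M"]
for row in split_column(names):
    print(row)
```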