
INTERNAL

Data Science Exercise


Introduction

This exercise tests your ability to apply machine learning skills to build models elegantly. You should
use the latest solutions available on the market to your advantage: the less code you write to achieve
a defined piece of functionality, the higher your score. Use appropriate packages to solve the
technical challenges.

Exercise #1: Identify and build a predictive model for any dataset
For the given dataset, complete the steps listed below. Make sure proper comments are included in
the code so that the dataset and the logic are easy to understand.

Please use a Jupyter notebook to complete the exercise, preferably in Python. Organize your code into
sections following the details of the exercise below. You are encouraged to use classical machine
learning techniques and observe their behaviour. Notebooks should be uploaded with their output for
evaluation.

1. Import needed Libraries and Datasets from Kaggle

a. Import the required libraries


b. Since our clients are banks or financial institutions, internet access is restricted, so make
sure the required libraries are available offline and installable. Package the required
libraries as .zip or .tar archives.
c. Write the commands to download and install the required libraries offline (see the sketch
below). Save your commands in a text file for assessment purposes.
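
For example, the offline install workflow might look like the following notebook cells. This is a minimal sketch; the requirements.txt file and the ./offline_packages directory are placeholder names.

    # On a machine WITH internet access: download the required packages
    # and all their dependencies into a local directory, then archive
    # that directory (.zip or .tar) for transfer to the offline machine.
    !pip download -r requirements.txt -d ./offline_packages

    # On the OFFLINE machine: install from the local directory only,
    # without reaching out to the internet.
    !pip install --no-index --find-links=./offline_packages -r requirements.txt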

2. Data Visualization and Exploration

a. Print at least five rows as a sanity check to identify all the features present in the dataset
and whether the target matches them.
b. Print the description and shape of the dataset.
c. Provide appropriate visualizations to gain insight into the dataset.
d. Explore the data and see what insights can be drawn from it.
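
These checks might look like the following sketch, assuming the data has already been loaded into a pandas DataFrame named df with a class label column named "target" (both names are placeholders):

    import matplotlib.pyplot as plt
    import seaborn as sns

    # Sanity check: inspect the first five rows and confirm that the
    # features and the target column look as expected.
    print(df.head())

    # Summary statistics and dataset dimensions.
    print(df.describe())
    print(df.shape)

    # One example visualization: the distribution of the target classes.
    sns.countplot(x="target", data=df)
    plt.show()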

3. Data Pre-processing and cleaning

a. Perform the appropriate pre-processing of the data: identify NULL or missing values,
handle outliers in the dataset, correct skewed data, etc. Apply appropriate
feature engineering techniques for them.
b. Apply feature transformation techniques such as Standardization, Normalization, etc. You
are free to apply whichever transformations suit the structure and the complexity of
your dataset.
c. Perform a correlation analysis on the dataset and provide a visualization for it.
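
One possible sketch of these steps, again assuming a pandas DataFrame df; the median imputation, the clipping thresholds, and the choice of scaler are illustrative, not prescriptive:

    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.preprocessing import StandardScaler

    # Count missing values per column.
    print(df.isnull().sum())

    # Example imputation: fill numeric gaps with the column median.
    num_cols = df.select_dtypes(include=np.number).columns
    df[num_cols] = df[num_cols].fillna(df[num_cols].median())

    # Example outlier handling: clip numeric columns to the 1st/99th percentiles.
    df[num_cols] = df[num_cols].clip(df[num_cols].quantile(0.01),
                                     df[num_cols].quantile(0.99), axis=1)

    # Feature transformation: standardize the numeric features.
    df[num_cols] = StandardScaler().fit_transform(df[num_cols])

    # Correlation analysis with a heatmap visualization.
    sns.heatmap(df[num_cols].corr(), annot=True, cmap="coolwarm")
    plt.show()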

4. Data Preparation
a. Perform the final feature selection and extract the selected features into a feature matrix X
and the class label into a target vector y.
b. Split the dataset into training and test sets.
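
A sketch of this preparation, assuming the selected feature columns are held in a list named feature_cols and the label column is "target" (both placeholders):

    from sklearn.model_selection import train_test_split

    # Final feature matrix X and class label vector y.
    X = df[feature_cols]
    y = df["target"]

    # Hold out 20% of the data for testing; stratify to preserve class balance.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y)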

5. Model Building
a. Develop Random Forest and XGBoost models.
b. Train the models and print the training accuracy and loss values.

c. Save the models for further use.
d. Load the saved models and perform the steps below.
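
A sketch of building, saving and reloading the two models using scikit-learn, xgboost and joblib; the hyperparameters and file names shown are illustrative:

    import joblib
    from sklearn.ensemble import RandomForestClassifier
    from xgboost import XGBClassifier
    from sklearn.metrics import accuracy_score, log_loss

    # Train both models on the training split.
    rf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
    xgb = XGBClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)

    # Print the training accuracy and loss for each model.
    for name, model in [("Random Forest", rf), ("XGBoost", xgb)]:
        acc = accuracy_score(y_train, model.predict(X_train))
        loss = log_loss(y_train, model.predict_proba(X_train))
        print(f"{name}: train accuracy={acc:.3f}, train log-loss={loss:.3f}")

    # Save the models for further use, then load them back.
    joblib.dump(rf, "rf_model.joblib")
    joblib.dump(xgb, "xgb_model.joblib")
    rf = joblib.load("rf_model.joblib")
    xgb = joblib.load("xgb_model.joblib")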

6. Performance Evaluation
a. Generate predictions on the test data and display the results for inference.
b. Use K-fold cross-validation to find the optimal accuracy and loss values on the test dataset.
c. Print the confusion matrix and provide an appropriate analysis of it.
d. Compute the Precision, Recall and F1 score.
e. Perform model tuning and identify the optimal hyperparameters for the above two models.
f. Compare Random Forest and XGBoost and identify the optimal/best model for this problem.
g. Describe how you arrived at the best of the two models created.
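
These evaluation steps could look roughly like the following sketch for the random forest (the same pattern applies to the XGBoost model; the parameter grid is illustrative):

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score, GridSearchCV
    from sklearn.metrics import confusion_matrix, classification_report

    # Predict on the held-out test set.
    y_pred = rf.predict(X_test)

    # K-fold cross-validated accuracy (5 folds here).
    scores = cross_val_score(rf, X, y, cv=5, scoring="accuracy")
    print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))

    # Confusion matrix plus per-class precision, recall and F1 score.
    print(confusion_matrix(y_test, y_pred))
    print(classification_report(y_test, y_pred))

    # Model tuning: search a small hyperparameter grid.
    grid = GridSearchCV(RandomForestClassifier(random_state=42),
                        {"n_estimators": [100, 200], "max_depth": [None, 10]},
                        cv=5)
    grid.fit(X_train, y_train)
    print(grid.best_params_, grid.best_score_)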

Exercise #2: Deploy the model

a. Download the created model, deploy it, and expose it as an API so it can be consumed.
b. Create a MySQL connection and store a log entry every time the API is accessed (a sketch
follows after this list). The table should have at least the attributes below:
* ID, Primary Key
* Input given to the model
* Output from the model
* Created On
* Created By
* Updated On
* Updated By

c. Use Postman to call the API and share a screenshot of the Postman result.
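
A minimal sketch of such a deployment, using Flask for the API and mysql-connector-python for the logging; the table name api_log, the column names, and the connection credentials are all placeholders that would need to match your own schema:

    import datetime
    import joblib
    import mysql.connector
    from flask import Flask, request, jsonify

    app = Flask(__name__)
    model = joblib.load("rf_model.joblib")  # best model from Exercise #1

    def log_request(model_input, model_output, user="api_user"):
        # Insert one log row per API call; the ID column is assumed to be
        # an auto-increment primary key, so it is not set explicitly here.
        conn = mysql.connector.connect(host="localhost", user="root",
                                       password="secret", database="ml_api")
        cursor = conn.cursor()
        now = datetime.datetime.now()
        cursor.execute(
            "INSERT INTO api_log (input, output, created_on, created_by, "
            "updated_on, updated_by) VALUES (%s, %s, %s, %s, %s, %s)",
            (str(model_input), str(model_output), now, user, now, user))
        conn.commit()
        cursor.close()
        conn.close()

    @app.route("/predict", methods=["POST"])
    def predict():
        # Expect a JSON body such as {"features": [ ... ]} from Postman.
        features = request.get_json()["features"]
        prediction = model.predict([features]).tolist()
        log_request(features, prediction)
        return jsonify({"prediction": prediction})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)

From Postman, this would be exercised with a POST request to http://localhost:5000/predict carrying the "features" JSON body; the response view is what item c above asks you to capture in a screenshot.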
