Data Science

Uploaded by malathula00
Copyright © All Rights Reserved

1. Installation of Python/R and the Visual Studio Code editor can be demonstrated along with Kaggle dataset usage.
2. Write programs in Python/R and execute them in Visual Studio Code, PyCharm Community Edition, or any other suitable environment.
3. A study was conducted to understand the effect of the number of hours students spent studying on their performance in the final exams. Write code to plot a line chart with the number of hours spent studying on the x-axis and the score in the final exam on the y-axis. Use a red '*' as the point character, label the axes, and give the plot a title. (03.09.2022)

   Number of hrs spent studying (x):       10   9   2  15  10  16  11  16
   Score in the final exam (0–100) (y):    95  80  10  50  45  98  38  93

4. For the given dataset mtcars.csv (www.kaggle.com/ruiromanini/mtcars), plot a histogram to check the frequency distribution of the variable 'mpg' (Miles per gallon).
ChatGPT
Sure, let's break down each of your requirements and address them step by step.

Installation of Python/R and Visual Studio Code

To install Python, you can download it from the official Python website
(https://www.python.org/downloads/). For R, you can download it from the Comprehensive R
Archive Network (CRAN) website (https://cran.r-project.org/).

For Visual Studio Code, you can download and install it from the official Visual Studio Code
website (https://code.visualstudio.com/).

Once installed, you can set up Python and R extensions in Visual Studio Code to enable coding in
these languages.

Demonstrate Kaggle Dataset Usage

For using Kaggle datasets, you can utilize the Kaggle API. First, install the Kaggle package using
pip:

    pip install kaggle

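Then, you can download datasets directly from Kaggle with the kaggle command-line tool (this requires a Kaggle API token saved as ~/.kaggle/kaggle.json), for example:

    kaggle datasets download -d ruiromanini/mtcars --unzip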
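Once a dataset has been downloaded, it can be loaded with pandas. The sketch below is a minimal example under stated assumptions: the helper name `summarize_dataset` is hypothetical, and it relies on `pandas.read_csv` inferring compression from a `.zip` extension, which works when the archive holds a single CSV file.

```python
from pathlib import Path

import pandas as pd


def summarize_dataset(path) -> pd.DataFrame:
    """Hypothetical helper: load a Kaggle CSV (plain or single-file .zip) and print an overview."""
    path = Path(path)
    # compression="infer" (the default) decompresses a .zip archive that contains one CSV
    df = pd.read_csv(path)
    print(f"{path.name}: {df.shape[0]} rows x {df.shape[1]} columns")
    print(df.head())
    return df
```

Usage, assuming the mtcars file was unzipped into the working directory: `cars = summarize_dataset("mtcars.csv")`.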

Writing Programs and Executing in Visual Studio Code

After setting up Python and R extensions in Visual Studio Code, you can create new
Python (.py) or R (.R) files, write your code, and execute them directly within the
editor.

Code for Plotting Line Chart
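A minimal matplotlib sketch for exercise 3, using the eight (hours, score) pairs from the table. The function name `plot_study_hours`, the title text, and the output file `study_hours.png` are illustrative choices, not part of the exercise; the Agg backend keeps the script runnable without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line and call plt.show() for a window
import matplotlib.pyplot as plt

# Data from the exercise table
hours = [10, 9, 2, 15, 10, 16, 11, 16]
scores = [95, 80, 10, 50, 45, 98, 38, 93]


def plot_study_hours(hours, scores):
    """Line chart of exam score vs. hours studied, with red '*' point markers."""
    fig, ax = plt.subplots()
    ax.plot(hours, scores, color="red", marker="*", linestyle="-")
    ax.set_xlabel("Number of hours spent studying")
    ax.set_ylabel("Score in the final exam (0-100)")
    ax.set_title("Effect of study hours on final exam score")
    return fig, ax


if __name__ == "__main__":
    fig, ax = plot_study_hours(hours, scores)
    fig.savefig("study_hours.png")
```

The points are joined in the order given, so the line doubles back where hours decrease; sorting the pairs first (for example with `sorted(zip(hours, scores))`) gives a left-to-right line if that is preferred.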


Code for Plotting Histogram
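A minimal sketch for exercise 4, assuming mtcars.csv has been downloaded to the working directory and has a numeric column named `mpg` (as the exercise states). The function name `plot_mpg_histogram` and the default of 10 bins are illustrative choices.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line and call plt.show() for a window
import matplotlib.pyplot as plt
import pandas as pd


def plot_mpg_histogram(df: pd.DataFrame, bins: int = 10):
    """Histogram of the 'mpg' column; returns the figure, axes and per-bin counts."""
    fig, ax = plt.subplots()
    counts, _, _ = ax.hist(df["mpg"], bins=bins, color="steelblue", edgecolor="black")
    ax.set_xlabel("Miles per gallon (mpg)")
    ax.set_ylabel("Frequency")
    ax.set_title("Frequency distribution of mpg")
    return fig, ax, counts


if __name__ == "__main__":
    csv_path = Path("mtcars.csv")  # assumed location of the Kaggle download
    if csv_path.exists():
        fig, ax, counts = plot_mpg_histogram(pd.read_csv(csv_path))
        fig.savefig("mpg_histogram.png")
    else:
        print(f"{csv_path} not found; download it from Kaggle first")
```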

You can execute these scripts in your preferred environment, such as Visual Studio Code or
PyCharm Community Edition, by running them as Python scripts. Make sure you have the
necessary packages installed (matplotlib, pandas) before running the scripts.
2. Consider the books dataset BL-Flickr-Images-Book.csv from Kaggle
(https://www.kaggle.com/adeyoyintemidayo/publication-of-books), which contains
information about books. Write a program to demonstrate the following:
   - Import the data into a DataFrame.
   - Find and drop the columns that are irrelevant to the book information.
   - Change the index of the DataFrame.
   - Tidy up fields in the data, such as the date of publication, with the help of a simple regular expression.
   - Combine str methods with NumPy to clean columns.

Sure, here's a Python program demonstrating those steps using the Pandas library:

This program first imports the data into a DataFrame, then drops irrelevant
columns. It changes the index of the DataFrame to the 'Identifier' column. It
then tidies up the 'Date of Publication' field using a regular expression to
extract the year, and cleans up the 'Place of Publication' field using a
combination of string methods and NumPy. Finally, it displays the cleaned
DataFrame.
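The steps described above can be sketched as follows. This is a minimal version under stated assumptions: the column names (`Identifier`, `Date of Publication`, `Place of Publication`, and those in the hypothetical `DROP_COLUMNS` list) are assumed from the Kaggle file and should be checked against `df.columns`, and which columns count as irrelevant is a judgment call. The regular expression keeps only a leading four-digit year, so entries such as "[1897?]" become NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical selection of columns treated as irrelevant; verify against your copy of the CSV
DROP_COLUMNS = [
    "Edition Statement", "Corporate Author", "Corporate Contributors",
    "Former owner", "Engraver", "Contributors", "Issuance type", "Shelfmarks",
]


def clean_books(df: pd.DataFrame) -> pd.DataFrame:
    """Drop unused columns, index by Identifier, and tidy the date and place fields."""
    # Drop irrelevant columns, skipping any that are absent from this copy of the file
    df = df.drop(columns=[c for c in DROP_COLUMNS if c in df.columns])

    # Use the book identifier as the index
    df = df.set_index("Identifier")

    # Keep only a leading four-digit year: "1879 [1878]" -> 1879, "[1897?]" -> NaN
    years = df["Date of Publication"].str.extract(r"^(\d{4})", expand=False)
    df["Date of Publication"] = pd.to_numeric(years)

    # Combine str methods with np.where to normalise the place of publication
    pub = df["Place of Publication"]
    london = pub.str.contains("London", na=False)
    oxford = pub.str.contains("Oxford", na=False)
    df["Place of Publication"] = np.where(
        london, "London",
        np.where(oxford, "Oxford", pub.str.replace("-", " ", regex=False)),
    )
    return df
```

Usage, assuming the CSV is in the working directory: `books = clean_books(pd.read_csv("BL-Flickr-Images-Book.csv"))`, then `books.head()` to inspect the result.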

3. Train a regularized logistic regression classifier on the iris dataset
(https://archive.ics.uci.edu/ml/machine-learning-databases/iris/ or the built-in iris dataset) using
sklearn. Train the model with the hyperparameter C = 1e4 and report the best
classification accuracy.

Here's how you can train a regularized logistic regression classifier on the iris dataset using
scikit-learn and report the classification accuracy:
This code first loads the iris dataset, splits it into training and testing sets, and then creates a
pipeline with a StandardScaler and a LogisticRegression model with a regularization
parameter (C) set to 1e4. It then trains the model on the training set, makes predictions on the
test set, and calculates the classification accuracy using the accuracy_score function from
scikit-learn. Finally, it prints the classification accuracy.
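A minimal sketch of the pipeline just described. The 80/20 stratified split, `random_state=42`, and `max_iter=1000` are assumptions chosen for reproducibility and convergence, not values given in the exercise. In scikit-learn, `C` is the inverse of the (L2 by default) regularization strength, so C = 1e4 applies only a very weak penalty.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_logreg(C=1e4, test_size=0.2, random_state=42):
    """Fit scaler + L2-regularized logistic regression on iris; return test accuracy."""
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )
    # C is the INVERSE regularization strength: large C means weak regularization
    model = make_pipeline(StandardScaler(), LogisticRegression(C=C, max_iter=1000))
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


if __name__ == "__main__":
    print(f"Classification accuracy: {train_logreg():.3f}")
```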

3b. Train an SVM classifier on the iris dataset using sklearn. Try different kernels and the associated
hyperparameters. Train the model with the following set of hyperparameters: RBF kernel, gamma = 0.5,
one-vs-rest classifier, no feature normalization. Also try C = 0.01, 1, 10. For the above set
of hyperparameters, find the best classification accuracy along with the total number of support vectors
on the test data.

Here's how you can train an SVM classifier on the iris dataset using scikit-learn with different
kernels and hyperparameters, and then find the best classification accuracy along with the
total number of support vectors on the test data:
This code iterates over different combinations of kernels, gammas, and Cs, creates an SVM
classifier for each combination, trains it on the training set, predicts on the test set, calculates
accuracy, and counts the number of support vectors. Finally, it prints the best classification
accuracy along with the total number of support vectors.
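A minimal sketch for the specified setting (RBF kernel, gamma = 0.5, C in {0.01, 1, 10}, raw unscaled features). Because `SVC` trains one-vs-one internally for multiclass data, this sketch wraps it in `OneVsRestClassifier` to get an explicit one-vs-rest model. The split parameters are assumptions, and the helper name `evaluate_svm` is hypothetical. Support vectors are chosen from the training data; here the total is summed over the three per-class binary SVMs, so a point that is a support vector for several classes is counted more than once.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC


def evaluate_svm(C_values=(0.01, 1, 10), kernel="rbf", gamma=0.5,
                 test_size=0.2, random_state=42):
    """Return [(C, test accuracy, total support vectors), ...] and the best entry."""
    X, y = load_iris(return_X_y=True)  # raw features: no normalization
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )
    results = []
    for C in C_values:
        # One binary SVM per class (explicit one-vs-rest)
        clf = OneVsRestClassifier(SVC(kernel=kernel, gamma=gamma, C=C))
        clf.fit(X_train, y_train)
        acc = accuracy_score(y_test, clf.predict(X_test))
        # Total support vectors summed over the per-class binary SVMs
        n_sv = sum(len(est.support_) for est in clf.estimators_)
        results.append((C, acc, n_sv))
    best = max(results, key=lambda r: r[1])
    return results, best


if __name__ == "__main__":
    results, best = evaluate_svm()
    for C, acc, n_sv in results:
        print(f"C={C}: accuracy={acc:.3f}, support vectors={n_sv}")
    print(f"Best: C={best[0]}, accuracy={best[1]:.3f}, support vectors={best[2]}")
```

Other kernels from the exercise can be tried by passing, for example, `kernel="linear"` or `kernel="poly"`.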
