
EDA_UNIT_1

The document outlines a lab exercise focused on Exploratory Data Analysis (EDA) using Python, specifically with the Cars4U dataset from Kaggle. It includes instructions for downloading the dataset, installing necessary Python libraries (numpy, pandas, matplotlib, seaborn), and performing basic operations with numpy arrays and pandas dataframes. Additionally, it covers loading datasets, selecting rows and columns in dataframes, and provides code examples for each task.

Uploaded by

arafaths062

EDA LAB

UNIT-I

1.a) Download the dataset from Kaggle using the following link:

https://www.kaggle.com/datasets/sukhmanibedi/cars4u

b) Install the Python libraries required for Exploratory Data Analysis (numpy,
pandas, matplotlib, seaborn)

Theory:

1.a) Open any browser and paste the above Kaggle link. A zip file will be
downloaded. Unzip it and study the Cars4U dataset (used_cars_data.csv, 785.45 kB)
in detail.
Explain the columns of the dataset and count the number of rows.

b) Install python libraries

pip is a package management system used to install and manage software
packages/libraries written in Python. The name is commonly expanded as "Pip
Installs Packages" or "Preferred Installer Program".
Prerequisites:
Python should be installed on your Windows machine.
How to check if Python is installed?
Run the following command to test whether Python is installed.
>python --version
If it is installed, you will see something like this:
Python 3.12.4

pip can be downloaded and installed with the following method. (Recent Python
installers for Windows usually include pip already; run >pip --version to check.)

Follow these instructions to install pip on Windows:
Step 1: Open the cmd terminal.
Step 2: curl is a command-line tool for transferring data to and from a
server. Use the following command to download the installer script:
>curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py

Step 3: Run the downloaded script with Python:
>python get-pip.py

pandas, NumPy, Matplotlib, and seaborn can all be installed via pip from PyPI:

>pip install pandas

>pip install numpy

>pip install matplotlib

>pip install seaborn

After installation, verify it by importing the libraries in a Jupyter notebook.
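The verification step could look like the following minimal sketch, which can be pasted into a notebook cell. The helper name installed_versions is made up for this illustration; it simply tries to import each library and records its version, or None if the import fails.

```python
import importlib


def installed_versions(names):
    """Return {library name: version string, or None if it cannot be imported}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            # Most libraries expose __version__; fall back to "unknown" otherwise.
            versions[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            versions[name] = None
    return versions


report = installed_versions(["numpy", "pandas", "matplotlib", "seaborn"])
for name, version in report.items():
    print(name, version if version else "NOT INSTALLED")
```

Any library reported as NOT INSTALLED can be installed with the corresponding pip command above.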

Conclusions:

2. Perform NumPy array basic operations and explore NumPy built-in functions.

Theory:

For importing numpy, we will use the following code:

import numpy as np

For creating different types of numpy arrays, we will use the following code:

# Importing numpy
import numpy as np

# Defining and printing a 1D array
my1DArray = np.array([1, 8, 27, 64])
print(my1DArray)

# Defining and printing a 2D array
my2DArray = np.array([[1, 2, 3, 4], [2, 4, 9, 16], [4, 8, 18, 32]])
print(my2DArray)

# Defining and printing a 3D array
my3Darray = np.array([[[1, 2, 3, 4], [5, 6, 7, 8]],
                      [[1, 2, 3, 4], [9, 10, 11, 12]]])
print(my3Darray)
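The exercise title also asks for basic array operations. The following is a minimal sketch of a few common ones (element-wise arithmetic, aggregation along an axis, and reshaping). The variable names a, b, and grid are chosen for this illustration; the values reuse the 1D and 2D arrays defined above.

```python
import numpy as np

a = np.array([1, 8, 27, 64])
b = np.array([1, 2, 3, 4])

# Arithmetic operators work element by element
total = a + b      # [ 2 10 30 68]
product = a * b    # [  1  16  81 256]
scaled = a * 2     # [  2  16  54 128]
print(total, product, scaled)

grid = np.array([[1, 2, 3, 4], [2, 4, 9, 16], [4, 8, 18, 32]])

# Aggregations over the whole array or along one axis
grand_total = grid.sum()         # sum of all 12 elements
col_sums = grid.sum(axis=0)      # one sum per column
row_means = grid.mean(axis=1)    # one mean per row
print(grand_total, col_sums, row_means)

# Reshaping keeps the same 12 elements in a different layout
reshaped = grid.reshape(4, 3)
print(reshaped.shape)
```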

For displaying basic information, such as the data type, shape, size, and strides of
a NumPy array, we will use the following code:

# Print the buffer object (memoryview) that points to the start of the array's data
print(my2DArray.data)

# Print the shape of the array
print(my2DArray.shape)

# Print the total number of elements in the array
print(my2DArray.size)

# Print the data type of the array
print(my2DArray.dtype)

Strides in a NumPy array:
A stride is the number of bytes we have to skip in memory to move to the next position
along a certain axis.

# Print the strides of the array
print(my2DArray.strides)
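To make the stride definition concrete, here is a small worked sketch. It fixes the data type to int64 (8 bytes per element) as an assumption, because the default integer type, and therefore the stride values, can differ between platforms and NumPy versions.

```python
import numpy as np

# dtype is set explicitly so that every element occupies exactly 8 bytes
a = np.array([[1, 2, 3, 4], [2, 4, 9, 16], [4, 8, 18, 32]], dtype=np.int64)

print(a.itemsize)  # bytes per element: 8
print(a.strides)   # (32, 8)

# Moving one step along axis 1 (next column) skips one element: 8 bytes.
# Moving one step along axis 0 (next row) skips a full row of 4 elements: 4 * 8 = 32 bytes.

# Transposing does not copy the data; it only swaps the strides.
t = a.T
print(t.strides)   # (8, 32)
```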

For creating an array using built-in NumPy functions, we will use the following
code:

# Array of ones
ones = np.ones((3, 4))
print(ones)

# Array of zeros
zeros = np.zeros((2, 3, 4), dtype=np.int16)
print(zeros)

# Array with random values in the interval [0, 1)
randomArray = np.random.random((2, 2))
print(randomArray)

# Empty array (uninitialized, so its contents are arbitrary)
emptyArray = np.empty((3, 2))
print(emptyArray)

# Full array (every element set to 7)
fullArray = np.full((2, 2), 7)
print(fullArray)

# Array of evenly spaced values (from 10 up to, but not including, 25 in steps of 5)
evenSpacedArray = np.arange(10, 25, 5)
print(evenSpacedArray)

# Array of evenly spaced values (9 samples from 0 to 2, both ends included)
evenSpacedArray2 = np.linspace(0, 2, 9)
print(evenSpacedArray2)
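Both np.arange and np.linspace produce evenly spaced values, but they are parameterized differently: arange takes a step size and excludes the stop value, while linspace takes the number of samples and includes the stop value by default. A short sketch of the difference, using the same arguments as above (the variable names stepped, sampled, and spacing are illustrative):

```python
import numpy as np

# arange(start, stop, step): the stop value 25 is excluded
stepped = np.arange(10, 25, 5)
print(stepped.tolist())   # [10, 15, 20]

# linspace(start, stop, num): 9 samples, both 0 and 2 are included
sampled = np.linspace(0, 2, 9)
print(sampled.tolist())

# The spacing is (stop - start) / (num - 1) = 2 / 8 = 0.25
spacing = sampled[1] - sampled[0]
print(spacing)
```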

CONCLUSIONS:

3. Loading a dataset into a pandas dataframe

A. Reading a CSV file with pandas

import pandas as pd
# Reading the CSV file (Iris.csv must be in the current working directory)
df = pd.read_csv("Iris.csv")
# Displaying the top 5 rows
df.head()
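The same call can be used to load used_cars_data.csv from exercise 1 and count its rows. The sketch below is self-contained, so it reads a tiny in-memory CSV instead of the real file; the column names and values in csv_text are invented for illustration and are not taken from the Cars4U dataset.

```python
import io

import pandas as pd

# Stand-in data: made-up columns and values, NOT the real Cars4U contents.
csv_text = """name,year,price
Car A,2015,4.5
Car B,2018,7.25
Car C,2012,2.1
"""

# With the real file this would be: cars = pd.read_csv("used_cars_data.csv")
cars = pd.read_csv(io.StringIO(csv_text))

# shape is a (rows, columns) tuple, which answers "count the number of rows"
n_rows, n_cols = cars.shape
print(n_rows, n_cols)

# Column names and the first rows give a quick overview of the dataset
print(cars.columns.tolist())
print(cars.head(2))
```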

B. Using scikit-learn to load a dataset

from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()

# Access the features and target variable
X = iris.data    # Features (sepal length, sepal width, petal length, petal width)
y = iris.target  # Target variable (species: 0 for setosa, 1 for versicolor, 2 for virginica)

# Print the feature names and target names
print("Feature names:", iris.feature_names)
print("Target names:", iris.target_names)
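Since this exercise is about pandas dataframes, it may help to wrap the scikit-learn arrays in one. The following is a minimal sketch, assuming scikit-learn is installed; the names iris_df and the added "species" column are chosen for this illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Build a dataframe from the feature matrix, using the feature names as column labels
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Map the numeric target codes (0, 1, 2) to their species names
iris_df["species"] = iris.target_names[iris.target]

print(iris_df.shape)
print(iris_df.head())
```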

CONCLUSION:

4. Selecting rows and columns in the dataframe

The following code displays the number of rows, the columns, their data types, and the
memory used by the dataframe. It uses the dataframe df loaded in 3.A:

df.info()

Let's now see how we can select rows and columns in any dataframe:
# Selects the row at position 10 (the 11th row)
df.iloc[10]

# Selects the first 10 rows
df.iloc[0:10]

# Selects a range of rows (positions 10 to 14)
df.iloc[10:15]

# Selects the last 2 rows
df.iloc[-2:]

# Selects every other row, restricted to the columns at positions 3 and 4
df.iloc[::2, 3:5].head()
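The iloc selections above are position-based. The sketch below repeats them on a small made-up dataframe (6 rows, columns c0 to c4, all hypothetical) so the results are easy to check by eye, and adds label-based column selection for comparison.

```python
import numpy as np
import pandas as pd

# Made-up data: the value in row i, column j is 5 * i + j
frame = pd.DataFrame(np.arange(30).reshape(6, 5),
                     columns=["c0", "c1", "c2", "c3", "c4"])

one_row = frame.iloc[2]             # row at position 2, returned as a Series
last_two = frame.iloc[-2:]          # last 2 rows
every_other = frame.iloc[::2, 3:5]  # rows 0, 2, 4 and the columns at positions 3 and 4

# Columns can also be selected by name
single_col = frame["c1"]            # one column as a Series
two_cols = frame[["c0", "c2"]]      # several columns as a dataframe

# loc combines a row condition with column labels
filtered = frame.loc[frame["c0"] > 10, ["c0", "c4"]]

print(one_row.tolist())
print(every_other)
print(filtered)
```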

Refer to the code in 3.B (selecting columns):

# Access the features and target variable
X = iris.data    # Features (sepal length, sepal width, petal length, petal width)
y = iris.target  # Target variable (species: 0 for setosa, 1 for versicolor, 2 for virginica)
print(X)
print(y)
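X and y here are NumPy arrays rather than dataframes, so individual columns are selected with array slicing instead of iloc. A minimal sketch, assuming scikit-learn is installed (the names sepal_length, petal_columns, and setosa_rows are illustrative):

```python
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data
y = iris.target

# All rows, first column (sepal length)
sepal_length = X[:, 0]

# All rows, columns at positions 2 and 3 (petal length and petal width)
petal_columns = X[:, 2:4]

# Rows whose target is 0 (setosa), all columns
setosa_rows = X[y == 0]

print(sepal_length.shape, petal_columns.shape, setosa_rows.shape)
```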

CONCLUSIONS:
