exp1

The document outlines a series of experiments for data analysis using Python, focusing on downloading a dataset from Kaggle, installing necessary libraries, and performing basic operations with NumPy and pandas. It details steps for loading a dataset into a pandas DataFrame and selecting specific rows and columns. Additionally, it suggests extending experiments to include advanced exploratory data analysis techniques.

Uploaded by

aimlbtech7
Copyright
© All Rights Reserved

Sample Experiments:

1. a) Download Dataset from Kaggle using the following link:
https://www.kaggle.com/datasets/sukhmanibedi/cars4u
b) Install Python libraries required for Exploratory Data Analysis (numpy, pandas, matplotlib, seaborn)

2. Perform NumPy Array basic operations and Explore NumPy Built-in functions.

3. Loading Dataset into pandas dataframe

4. Selecting rows and columns in the dataframe

Sample Experiments for Data Analysis and Numpy Operations

1. Downloading Dataset from Kaggle

a) Download Dataset from Kaggle: To download a dataset through the Kaggle website, you need a Kaggle account; API access is only required if you download programmatically. Follow these steps:

• Step 1: Go to the Kaggle dataset page Cars4U Dataset.

• Step 2: Click on the "Download" button to download the dataset as a .zip file.

After downloading, extract the file to a suitable location on your system.

b) Install Python Libraries for Exploratory Data Analysis (EDA):

Install the required libraries using pip:

pip install numpy pandas matplotlib seaborn
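To confirm that the installation worked, a quick check along the following lines can be run. This is a minimal sketch that uses only the Python standard library; the helper name `missing_packages` is an illustrative choice and not part of any library.

```python
import importlib.util

# Packages this experiment relies on (same names as in the pip command above)
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn"]


def missing_packages(names):
    """Return the package names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required libraries are installed.")
```

If any name is reported as not installed, re-run the pip command in the same environment that runs the script.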

2. Perform NumPy Array Basic Operations and Explore NumPy Built-in Functions

Once you've installed numpy, you can perform some basic array operations. Here's how you can explore some essential NumPy operations:

Basic Operations in NumPy:

import numpy as np

# Creating a NumPy array
arr = np.array([1, 2, 3, 4, 5])
print("Array:", arr)

# Array shape and size
print("Shape:", arr.shape)
print("Size:", arr.size)

# Array Indexing and Slicing
print("Element at index 2:", arr[2])
print("Sliced array (from index 1 to 3):", arr[1:4])

# Array Operations
arr2 = np.array([5, 4, 3, 2, 1])
print("Array sum:", np.sum(arr))
print("Array dot product:", np.dot(arr, arr2))

# Element-wise addition
print("Element-wise addition:", arr + arr2)
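The difference between element-wise operations and the dot product is easy to miss, so the short sketch below spells it out using the same two arrays. The variable names `products`, `dot`, and `scaled` are illustrative only.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
arr2 = np.array([5, 4, 3, 2, 1])

# Element-wise multiplication pairs up values at the same position
products = arr * arr2
print("Element-wise products:", products)  # [5 8 9 8 5]

# The dot product of two 1-D arrays is the sum of those products
dot = np.dot(arr, arr2)
print("Dot product:", dot)  # 35

# A scalar is applied to every element (broadcasting)
scaled = arr * 10
print("Scaled array:", scaled)  # [10 20 30 40 50]
```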

Exploring NumPy Built-in Functions:

NumPy provides many useful functions for mathematical operations:

# Create arrays with specific values
zeros_array = np.zeros((2, 3)) # 2x3 array filled with zeros
ones_array = np.ones((3, 2)) # 3x2 array filled with ones
random_array = np.random.rand(2, 3) # 2x3 array with random values between 0 and 1

print("Zeros Array:\n", zeros_array)
print("Ones Array:\n", ones_array)
print("Random Array:\n", random_array)

# Statistical functions
print("Mean of array:", np.mean(arr))
print("Standard deviation of array:", np.std(arr))
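The built-in functions above also work on two-dimensional arrays, where the `axis` argument controls the direction of the calculation. The following is a small sketch with made-up values, not part of the original experiment.

```python
import numpy as np

# A small 2x3 matrix (values chosen only for illustration)
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print("Shape:", matrix.shape)                        # (2, 3)
print("Element at row 1, column 2:", matrix[1, 2])   # 6
print("First column:", matrix[:, 0])                 # [1 4]

# axis=0 collapses the rows, giving one result per column
col_sums = matrix.sum(axis=0)
# axis=1 collapses the columns, giving one result per row
row_means = matrix.mean(axis=1)
print("Column sums:", col_sums)
print("Row means:", row_means)

# reshape rearranges the same six values into 3 rows and 2 columns
reshaped = matrix.reshape(3, 2)
print("Reshaped:\n", reshaped)
```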

3. Loading Dataset into Pandas DataFrame

To load the dataset into a pandas DataFrame, first ensure you have the
pandas library installed.

Example code for loading a dataset into pandas:

import pandas as pd

# Load the dataset into a pandas DataFrame
# Adjust this path to the location where you saved the dataset
file_path = 'path_to_your_downloaded_file/cars4u.csv'
df = pd.read_csv(file_path)

# Display the first 5 rows of the dataframe
print(df.head())

The read_csv() function in pandas is used to read CSV files into a DataFrame. Once the file is loaded, you can explore the first few rows of the data using .head(), which will show the top 5 rows by default.
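To try read_csv() and .head() without the downloaded file, the sketch below reads a tiny in-memory CSV instead. The data and the column names (`Car_Model`, `Year`, `Price`) are hypothetical placeholders and may not match the real dataset.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real file
csv_text = """Car_Model,Year,Price
Toyota,2016,25000
Honda,2014,18000
Ford,2018,30000
"""

# read_csv accepts a file path or any file-like object
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.head(2))   # pass a number to change how many rows are shown
print(sample_df.shape)     # (number of rows, number of columns)
print(sample_df.dtypes)    # data type inferred for each column
```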

4. Selecting Rows and Columns in the DataFrame

After loading the dataset into a DataFrame, you can select specific rows
and columns. Here's how you can do it:

Selecting Columns:

• You can select a single column by passing the column name in square brackets ([]).

# Selecting a single column
print(df['Car_Model']) # Replace 'Car_Model' with the actual column name

• For selecting multiple columns, pass a list of column names:

# Selecting multiple columns
# Replace these with actual column names
selected_columns = df[['Car_Model', 'Price', 'Year']]
print(selected_columns.head())

Selecting Rows:

• You can select specific rows using .iloc[] (by integer position) or .loc[] (by index label):

# Select the 10th row (positions start at 0)
print(df.iloc[9])

# Select rows based on a condition (e.g., Car Model)
# Replace 'Toyota' with an actual car model from the dataset
condition = df['Car_Model'] == 'Toyota'
print(df[condition])
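The distinction between position-based and label-based selection only becomes visible when the index labels differ from the row positions. The sketch below uses a small hypothetical DataFrame with labels 10, 20, and 30 to show this; the data is made up for illustration.

```python
import pandas as pd

# Hypothetical data with index labels that are not 0, 1, 2
sample_df = pd.DataFrame(
    {"Car_Model": ["Toyota", "Honda", "Ford"], "Price": [25000, 18000, 30000]},
    index=[10, 20, 30],
)

by_position = sample_df.iloc[0]   # first row, whatever its label is
by_label = sample_df.loc[10]      # row whose index label is 10
print(by_position["Car_Model"], by_label["Car_Model"])

# Slices behave differently: .iloc excludes the end, .loc includes it
pos_slice = sample_df.iloc[0:2]     # positions 0 and 1
label_slice = sample_df.loc[10:30]  # labels 10, 20 and 30
print(len(pos_slice), len(label_slice))
```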

Filtering Data based on multiple conditions:

# Filter data where Price is greater than 20,000 and Year is greater than 2015
filtered_data = df[(df['Price'] > 20000) & (df['Year'] > 2015)]
print(filtered_data)
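Conditions are combined with `&` (and), `|` (or), and `~` (not), and each comparison needs its own parentheses because these operators bind more tightly than `>` or `==`. The sketch below illustrates this on hypothetical data; the column names are placeholders.

```python
import pandas as pd

# Hypothetical stand-in data; the real dataset's columns may differ
sample_df = pd.DataFrame({
    "Car_Model": ["Toyota", "Honda", "Ford", "Toyota"],
    "Price": [25000, 18000, 30000, 15000],
    "Year": [2016, 2017, 2018, 2012],
})

# Both conditions must hold
both = sample_df[(sample_df["Price"] > 20000) & (sample_df["Year"] > 2015)]

# At least one condition must hold
either = sample_df[(sample_df["Price"] > 20000) | (sample_df["Year"] > 2015)]

# Negate a condition with ~
not_toyota = sample_df[~(sample_df["Car_Model"] == "Toyota")]

# isin() is a compact way to match any value from a list
in_list = sample_df[sample_df["Car_Model"].isin(["Honda", "Ford"])]

print(len(both), len(either), len(not_toyota), len(in_list))
```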

Using .loc[] for label-based indexing:

# Select rows where Car Model is 'Toyota' and display 'Price' and 'Year' columns
toyota_data = df.loc[df['Car_Model'] == 'Toyota', ['Price', 'Year']]
print(toyota_data)
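As a rough illustration of what the .loc[] call does, the sketch below compares it with selecting rows and columns in two separate steps. The DataFrame is hypothetical and the names `one_step`, `two_step`, and `prices` are illustrative.

```python
import pandas as pd

# Hypothetical stand-in data; the real dataset's column names may differ
sample_df = pd.DataFrame({
    "Car_Model": ["Toyota", "Honda", "Toyota"],
    "Price": [25000, 18000, 15000],
    "Year": [2016, 2017, 2012],
})

is_toyota = sample_df["Car_Model"] == "Toyota"

# Rows and columns selected in a single .loc[] call
one_step = sample_df.loc[is_toyota, ["Price", "Year"]]

# The same result via two chained selections
two_step = sample_df[is_toyota][["Price", "Year"]]

# A single column name (not a list) returns a Series instead of a DataFrame
prices = sample_df.loc[is_toyota, "Price"]

print(one_step)
print(type(prices).__name__)
```

The single .loc[] call is generally the clearer form, since it states the row filter and the column list together.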

Summary:

These experiments cover the basics of data loading, exploration, and manipulation. Here's a summary of the steps:

1. Download the dataset from Kaggle and extract it.

2. Install necessary libraries (e.g., numpy, pandas, matplotlib, seaborn).

3. Perform basic NumPy operations such as array creation, indexing, and element-wise operations.

4. Load the dataset into a pandas DataFrame using pd.read_csv().

5. Select rows and columns from the DataFrame using indexing techniques like .iloc[], .loc[], and condition-based filtering.

You can extend these experiments further by performing more advanced EDA techniques like data visualization, handling missing data, and calculating statistical summaries.
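As one possible starting point for those extensions, the sketch below counts missing values, fills them, and prints summary statistics. The data is hypothetical, and filling with the median is just one common choice rather than a recommendation from the original experiments.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing price
sample_df = pd.DataFrame({
    "Price": [25000.0, np.nan, 30000.0, 15000.0],
    "Year": [2016, 2014, 2018, 2012],
})

# Count missing values in each column
missing_per_column = sample_df.isna().sum()
print(missing_per_column)

# Summary statistics (count, mean, std, min, quartiles, max) per numeric column
summary = sample_df.describe()
print(summary)

# Fill the missing price with the column median (one option among several)
filled = sample_df.fillna({"Price": sample_df["Price"].median()})
print(filled)
```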
