Lab 1 ML

This document summarizes key information about a supermarket sales dataset containing 1000 rows and 13 columns. It prints the first few rows, describes the data types and features for each column, and provides summary statistics for the numeric columns including mean, standard deviation, minimum, quartiles, and maximum values.

Uploaded by kashish.k

1/11/24, 9:37 PM Untitled10.ipynb - Colaboratory

import pandas as pd
df = pd.read_csv("supermarketsales.csv")
print("First few rows of the dataset:")
print(df.head())

First few rows of the dataset:
    Invoice ID  Branch       City  Customer type  Gender  \
0  716-39-1409       B   Mandalay         Normal    Male
1  704-48-3927       A     Yangon         Member    Male
2  628-34-3388       C  Naypyitaw         Normal    Male
3  630-74-5166       A     Yangon         Normal    Male
4  588-01-7461       C  Naypyitaw         Normal  Female

             Product line  Unit price  Quantity   Tax 5%  Total amount   Date  \
0       Health and beauty       30.35         7  10.6225        223.07  April
1  Electronic accessories       88.67        10  44.3350        931.04  April
2     Fashion accessories       27.38         6   8.2140        172.49  April
3       Sports and travel       62.13         6  18.6390        391.42  April
4      Food and beverages       33.98         9  15.2910        321.11  April

       Payment  Rating
0         Cash     8.0
1      Ewallet     7.3
2  Credit card     7.9
3         Cash     7.4
4         Cash     4.2
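
The printed rows suggest that "Tax 5%" is 5% of Unit price × Quantity and that "Total amount" is the subtotal plus that tax. The sketch below checks this on the five rows shown above only; the `sample` frame is a hypothetical stand-in typed in from the printout, not the full CSV, and the tolerances assume the totals were rounded to two decimals.

```python
import pandas as pd

# Hypothetical sample: the five rows printed by df.head(), restricted to
# the columns involved in the arithmetic.
sample = pd.DataFrame({
    "Unit price": [30.35, 88.67, 27.38, 62.13, 33.98],
    "Quantity": [7, 10, 6, 6, 9],
    "Tax 5%": [10.6225, 44.3350, 8.2140, 18.6390, 15.2910],
    "Total amount": [223.07, 931.04, 172.49, 391.42, 321.11],
})

subtotal = sample["Unit price"] * sample["Quantity"]
expected_tax = 0.05 * subtotal
expected_total = subtotal + expected_tax

# Compare with a tolerance rather than ==, because the values are floats
# and the printed totals appear to be rounded to two decimals.
tax_ok = (sample["Tax 5%"] - expected_tax).abs() < 1e-6
total_ok = (sample["Total amount"] - expected_total).abs() < 0.01

print("Tax matches 5% of subtotal:", bool(tax_ok.all()))
print("Total matches subtotal + tax:", bool(total_ok.all()))
```

If the relationship holds on the full dataset as well, "Tax 5%" and "Total amount" carry no information beyond Unit price and Quantity, which is worth knowing before using them together as model features.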

print("Information about the dataset:")
# df.info() prints its report itself and returns None,
# so it is called directly instead of being wrapped in print().
df.info()

Information about the dataset:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 13 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   Invoice ID     1000 non-null   object
 1   Branch         1000 non-null   object
 2   City           1000 non-null   object
 3   Customer type  1000 non-null   object
 4   Gender         1000 non-null   object
 5   Product line   1000 non-null   object
 6   Unit price     1000 non-null   float64
 7   Quantity       1000 non-null   int64
 8   Tax 5%         1000 non-null   float64
 9   Total amount   1000 non-null   float64
 10  Date           1000 non-null   object
 11  Payment        1000 non-null   object
 12  Rating         1000 non-null   float64
dtypes: float64(4), int64(1), object(8)
memory usage: 101.7+ KB
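
Eight of the thirteen columns are text columns, and the "+" in the memory figure means pandas did not count the strings they hold. A possible follow-up, sketched here on a hypothetical five-row sample copied from the head printout, is to convert the low-cardinality text columns to the `category` dtype and compare memory with `memory_usage(deep=True)`. Which columns count as low-cardinality is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical sample: text columns from the five rows printed by df.head().
sample = pd.DataFrame({
    "Branch": ["B", "A", "C", "A", "C"],
    "City": ["Mandalay", "Yangon", "Naypyitaw", "Yangon", "Naypyitaw"],
    "Customer type": ["Normal", "Member", "Normal", "Normal", "Normal"],
    "Gender": ["Male", "Male", "Male", "Male", "Female"],
    "Payment": ["Cash", "Ewallet", "Credit card", "Cash", "Cash"],
})

# Assumption: every column in this sample has few distinct values.
categorical_cols = list(sample.columns)
converted = sample.astype({col: "category" for col in categorical_cols})

print(converted.dtypes)

# deep=True also counts the Python strings stored in text columns.
print("bytes before:", sample.memory_usage(deep=True).sum())
print("bytes after: ", converted.memory_usage(deep=True).sum())
```

On a five-row sample the saving can be negligible or even negative; the comparison is only meaningful on the full 1000-row frame.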

print("Summary statistics of numerical columns:")
print(df.describe())

Summary statistics of numerical columns:
        Unit price     Quantity       Tax 5%  Total amount      Rating
count  1000.000000  1000.000000  1000.000000   1000.000000  1000.00000
mean     55.672130     5.510000    15.379369    322.967430     6.97270
std      26.494628     2.923431    11.708825    245.885557     1.71858
min      10.080000     1.000000     0.508500     10.680000     4.00000
25%      32.875000     3.000000     5.924875    124.425000     5.50000
50%      55.230000     5.000000    12.088000    253.850000     7.00000
75%      77.935000     8.000000    22.445250    471.350000     8.50000
max      99.960000    10.000000    49.650000   1042.650000    10.00000
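
By default `describe()` reports only the numeric columns, so the eight text columns are missing from the table above. A minimal sketch of summarizing them as well, using a hypothetical five-row sample copied from the head printout, passes `exclude=["number"]` so that pandas reports count, unique, top and freq for the non-numeric columns instead.

```python
import pandas as pd

# Hypothetical sample: a few columns from the five rows printed by df.head().
sample = pd.DataFrame({
    "City": ["Mandalay", "Yangon", "Naypyitaw", "Yangon", "Naypyitaw"],
    "Gender": ["Male", "Male", "Male", "Male", "Female"],
    "Payment": ["Cash", "Ewallet", "Credit card", "Cash", "Cash"],
    "Rating": [8.0, 7.3, 7.9, 7.4, 4.2],
})

# Excluding numeric dtypes leaves only the text columns in the summary.
text_summary = sample.describe(exclude=["number"])
print(text_summary)
```

Here `top` is the most frequent value in each column and `freq` is how often it occurs; the numbers from this sample say nothing about the full dataset.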

https://colab.research.google.com/drive/1wPr-x9-kma0doO2UknDmqGOj9sQ-15ZM#scrollTo=GEGiyS_jMHM2&printMode=true 1/2
print("Data types of each column:")
print(df.dtypes)

Data types of each column:
Invoice ID        object
Branch            object
City              object
Customer type     object
Gender            object
Product line      object
Unit price       float64
Quantity           int64
Tax 5%           float64
Total amount     float64
Date              object
Payment           object
Rating           float64
dtype: object
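
The dtype listing can also be used programmatically, for example to split the columns into numeric and non-numeric groups before preprocessing. The sketch below does this with `select_dtypes` on a hypothetical two-row sample whose column names and values are copied from the printouts above; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical sample: two rows with a mix of text and numeric columns.
sample = pd.DataFrame({
    "Invoice ID": ["716-39-1409", "704-48-3927"],
    "Unit price": [30.35, 88.67],
    "Quantity": [7, 10],
    "Date": ["April", "April"],
    "Rating": [8.0, 7.3],
})

# "number" matches every integer and floating-point dtype.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
other_cols = sample.select_dtypes(exclude="number").columns.tolist()

print("Numeric columns:    ", numeric_cols)
print("Non-numeric columns:", other_cols)
```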
