Family Main

The document analyzes a dataset containing family income and spending information to answer several questions, including identifying the highest and lowest earning families, determining if any families have inadequate income to cover spending, and checking for errors in the data.

Uploaded by

hamburgerhenry13

2. Family Dataset
Q1. Which family boasts the highest annual income, and which has the lowest?
How do you ascertain this?

In [ ]: import pandas as pd
import numpy as np

In [ ]: family_df = pd.read_csv('family_data.csv')
print(family_df.head())

    Family  Member   Income    Spend
0  family1  Adult1  2376330  1119433
1  family1  Adult2   130268    37337
2  family1  Adult3  2254489   972327
3  family2  Adult1  2292355   649806
4  family2  Adult2   298167   100723

In [ ]: # print a concise summary: column names, non-null counts, and dtypes

print(family_df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 279 entries, 0 to 278
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Family 279 non-null object
1 Member 279 non-null object
2 Income 279 non-null int64
3 Spend 279 non-null int64
dtypes: int64(2), object(2)
memory usage: 8.8+ KB
None

In [ ]: print(family_df.describe())

Income Spend
count 2.790000e+02 2.790000e+02
mean 9.477808e+05 3.344265e+05
std 1.001295e+06 3.760808e+05
min 0.000000e+00 1.391000e+03
25% 0.000000e+00 1.550700e+04
50% 5.458300e+05 1.744480e+05
75% 1.808509e+06 5.432650e+05
max 2.979034e+06 1.475664e+06

In [ ]: # derive a numeric family id from labels such as "family1"
family_df["Family id"] = family_df["Family"].apply(lambda x: int(x.replace("family", "")))

In [ ]: # get the total income for each family

total_income = family_df.groupby('Family id')['Income'].sum().sort_index()

print(total_income, '\n')
print(f"The family with the max income: {total_income.idxmax()} with {total_income.max()}")
print(f"The family with the min income: {total_income.idxmin()} with {total_income.min()}")

Family id
1 4761087
2 2939887
3 2301931
4 2896133
5 1428679
...
96 325062
97 2663794
98 3018609
99 1827150
100 1031646
Name: Income, Length: 100, dtype: int64

The family with the max income: 6 with 7804425

The family with the min income: 94 with 46790

We group the rows by family with the groupby method and sum each family's income with the
sum method. The idxmax and idxmin methods then return the families with the highest and
lowest total annual income.

As shown above, the family with the highest annual income is family 6 with 7,804,425 dollars;
the family with the lowest annual income is family 94 with 46,790 dollars.
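
The group-and-sum approach described above can be sketched on a small toy table. The values and the name `toy_df` below are hypothetical and are not taken from family_data.csv; only the column names follow the dataset.

```python
import pandas as pd

# Toy data (hypothetical values, not from family_data.csv)
toy_df = pd.DataFrame({
    "Family": ["family1", "family1", "family2", "family2", "family3"],
    "Member": ["Adult1", "Child1", "Adult1", "Adult2", "Adult1"],
    "Income": [500, 0, 300, 400, 100],
    "Spend": [200, 50, 100, 150, 80],
})

# Sum income per family, then look up the labels of the extremes
toy_income = toy_df.groupby("Family")["Income"].sum()
richest = toy_income.idxmax()
poorest = toy_income.idxmin()

print(f"max income: {richest} with {toy_income.max()}")
print(f"min income: {poorest} with {toy_income.min()}")
```

Note that idxmax and idxmin return only the first matching label, so if two families tied for the highest or lowest total, only one of them would be reported.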

Q2. Which families do not possess adequate annual income to cover all members' spending?
What is the maximum shortfall? How do you determine this?

In [ ]: total_spend = family_df.groupby('Family id')['Spend'].sum().sort_index()

family_income_spend = pd.concat([total_income, total_spend], axis=1)
family_income_spend["Surplus"] = family_income_spend["Income"] - family_income_spend["Spend"]
print(family_income_spend[family_income_spend["Surplus"] == family_income_spend["Surplus"].min()])

           Income  Spend  Surplus
Family id
94          46790  30029    16761

In [ ]: families_deficit = family_income_spend[family_income_spend["Surplus"] < 0]

print(families_deficit, '\n')
# print(f"The family with the max deficit: {families_deficit['Surplus'].idxmin()} w

Empty DataFrame
Columns: [Income, Spend, Surplus]
Index: []

In [ ]: print(family_income_spend)
Income Spend Surplus
Family id
1 4761087 2129097 2631990
2 2939887 890424 2049463
3 2301931 807835 1494096
4 2896133 1128708 1767425
5 1428679 501827 926852
... ... ... ...
96 325062 135954 189108
97 2663794 774694 1889100
98 3018609 1031955 1986654
99 1827150 493578 1333572
100 1031646 258414 773232

[100 rows x 3 columns]

From the above, every family has enough annual income to cover all members' spending: the
filter for a negative surplus returns an empty DataFrame, so there is no shortfall. The
family with the smallest surplus is family 94 with 16,761 dollars. These results follow from
the per-family income and spending totals computed with groupby and sum.
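
In the real dataset the deficit filter returns nothing, so it may help to see the same surplus logic on a case where a shortfall does exist. The sketch below uses hypothetical toy values in which one family overspends; the names `toy_df`, `totals`, and `max_shortfall` are illustrative only.

```python
import pandas as pd

# Toy data (hypothetical): family3 spends more than it earns
toy_df = pd.DataFrame({
    "Family": ["family1", "family1", "family2", "family3", "family3"],
    "Income": [500, 0, 300, 100, 0],
    "Spend": [200, 50, 100, 120, 30],
})

# Per-family totals and surplus (income minus spending)
totals = toy_df.groupby("Family")[["Income", "Spend"]].sum()
totals["Surplus"] = totals["Income"] - totals["Spend"]

# Families with a negative surplus cannot cover their spending
deficit = totals[totals["Surplus"] < 0]

# The maximum shortfall is the most negative surplus, reported as a positive amount
max_shortfall = -deficit["Surplus"].min() if not deficit.empty else 0

print(deficit)
print("Maximum shortfall:", max_shortfall)
```

Guarding on `deficit.empty` avoids reporting a NaN shortfall when, as in the actual dataset, no family runs a deficit.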

Q3. Are there any single-parent families, where only one Adult is present? Are
there any childless families? How do you discern this?

Using groupby and apply, we count, for each family, the entries in the Member column whose
text contains Adult and those whose text contains Child. A family with exactly one Adult is
treated as a single-parent family, and a family with no Child entries is a childless family.

As shown below, there are 40 single-parent families and 35 childless families in the dataset.

In [ ]: # print the number of families with only one adult

adult_counts = family_df.groupby('Family id')["Member"].apply(lambda x: x.str.contains("Adult").sum())
print("Counts of single-parent families:", adult_counts[adult_counts == 1].count())

# print the number of families with no children

childless_counts = family_df.groupby('Family id')["Member"].apply(lambda x: x.str.contains("Child").sum())
print("Counts of childless families:", childless_counts[childless_counts == 0].count())

Counts of single-parent families: 40

Counts of childless families: 35
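
As an alternative sketch of the same counting idea, the role can be separated from the member number and tabulated per family in one step with pd.crosstab. This assumes Member labels of the form "Adult1" or "Child1"; the "Child" label pattern and the toy values are assumptions, not taken from family_data.csv.

```python
import pandas as pd

# Toy data (hypothetical); label patterns "AdultN" / "ChildN" are assumed
toy_df = pd.DataFrame({
    "Family": ["family1", "family1", "family1", "family2", "family2", "family3"],
    "Member": ["Adult1", "Adult2", "Child1", "Adult1", "Child1", "Adult1"],
})

# Strip the trailing digits so "Adult2" becomes "Adult"
role = toy_df["Member"].str.replace(r"\d+$", "", regex=True)

# One row per family, one column per role; reindex guards against a missing role
counts = pd.crosstab(toy_df["Family"], role).reindex(
    columns=["Adult", "Child"], fill_value=0
)

single_adult = counts[counts["Adult"] == 1]
childless = counts[counts["Child"] == 0]

# Stricter reading: exactly one adult AND at least one child
single_parent_with_child = counts[(counts["Adult"] == 1) & (counts["Child"] > 0)]

print(counts)
```

In this toy table family3 has one adult and no children, so it appears in both the one-adult group and the childless group. Whether such a family should count as "single-parent" is a judgment call; the stricter filter excludes it.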

Q4. Do you suspect any errors within this dataset? Examples may include negative figures,
missing or duplicate data, etc. Why?

To check the dataset for errors, we use the describe method to inspect its basic statistics,
the isnull method to look for missing values, and the duplicated method to look for
duplicate rows. We also count the negative values in the numeric columns.
As the results below show, the dataset contains no negative figures, no missing values, and
no duplicate rows.

In [ ]: summary = family_df.describe()

# Check for missing values

missing_values = family_df.isnull().sum()

# Check for duplicated rows

duplicated_rows = family_df.duplicated().sum()

# Check for negative values

numeric_cols = family_df.select_dtypes(include=['number'])
negative_values = (numeric_cols < 0).sum()

print("Summary Statistics:")
print(summary)
print("\nMissing Values:")
print(missing_values)
print("\nDuplicated Rows:")
print(duplicated_rows)
print("\nNegative Values:")
print(negative_values)

Summary Statistics:
Income Spend Family id
count 2.790000e+02 2.790000e+02 279.000000
mean 9.477808e+05 3.344265e+05 47.906810
std 1.001295e+06 3.760808e+05 28.701739
min 0.000000e+00 1.391000e+03 1.000000
25% 0.000000e+00 1.550700e+04 22.000000
50% 5.458300e+05 1.744480e+05 47.000000
75% 1.808509e+06 5.432650e+05 71.500000
max 2.979034e+06 1.475664e+06 100.000000

Missing Values:
Family 0
Member 0
Income 0
Spend 0
Family id 0
dtype: int64

Duplicated Rows:
0

Negative Values:
Income 0
Spend 0
Family id 0
dtype: int64
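
The checks above all come back clean on the real data. To illustrate what they would flag, here is a minimal sketch on a toy table with deliberately planted problems. The values are hypothetical, and the extra key-based check with `subset=["Family", "Member"]` is an assumption about what a duplicate could mean here, not part of the original analysis.

```python
import pandas as pd

# Toy data (hypothetical) with planted issues:
# family1/Adult1 appears twice with different figures, and one income is negative
toy_df = pd.DataFrame({
    "Family": ["family1", "family1", "family2", "family2"],
    "Member": ["Adult1", "Adult1", "Adult1", "Child1"],
    "Income": [500, 520, -300, 0],
    "Spend": [200, 210, 100, 40],
})

# Fully identical rows (the check used in the analysis)
full_row_dupes = int(toy_df.duplicated().sum())

# Repeated (Family, Member) pairs, even when the figures differ
key_dupes = int(toy_df.duplicated(subset=["Family", "Member"]).sum())

# Negative values per numeric column
negatives = (toy_df[["Income", "Spend"]] < 0).sum()

print("Full-row duplicates:", full_row_dupes)
print("Duplicate (Family, Member) keys:", key_dupes)
print(negatives)
```

A full-row duplicated() check misses the repeated member here because the two rows differ in Income and Spend, which is why a key-based check can be a useful complement.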

Q5. Can ChatGPT or Bing assist with the aforementioned four questions? If so, to what extent?
How do you issue commands to the AI tool? If not, why not?

Under their current limitations, ChatGPT and Bing cannot fully answer these questions on
their own, but they can provide useful help with the usage of pandas functions. Below are
examples of prompts used during the process:

ChatGPT: "How do I get the count of a column in each group satisfying a specific
condition in Pandas?"

ChatGPT: "How do I get the sum of a column in each group in Pandas?"

ChatGPT: "How do I get the maximum value of a column in each group in Pandas?"
