Practica 11

The document contains code to analyze and summarize data using NumPy and Pandas. It imports data from an Excel file containing employee data, then performs statistical calculations and grouping on the data. Calculations include sums, means, percentiles, standard deviations across the full dataset and grouped by variables like gender. Frequency counts and cross tabulations are also generated to analyze relationships between variables.

Uploaded by

2marlenehh2003


import sys

print(sys.version)

3.10.11 (main, Apr 5 2023, 14:15:10) [GCC 9.4.0]

from IPython.core.interactiveshell import InteractiveShell

InteractiveShell.ast_node_interactivity="all"

Exploratory data analysis

Calculations on arrays

import numpy as np

a = np.array([4,5,9,4,6,3,2])
a

array([4, 5, 9, 4, 6, 3, 2])

np.sum(a); print()
np.mean(a); print()
np.median(a); print()

import statistics as stat

stat.mode(a)

33
4.714285714285714
4.0
4

np.percentile(a, 25); print()  # roughly 25% of the values in a lie below 3.5
np.percentile(a, 50); print()  # roughly 50% of the values in a lie at or below 4.0 (the median)
np.percentile(a, 75)           # roughly 75% of the values in a lie below 5.5

3.5
4.0
5.5
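The quartiles above are not all values present in `a`, because `np.percentile` interpolates linearly between neighbouring sorted values by default. The following is a minimal sketch of that rule in plain Python; `percentile_linear` is a hypothetical helper written for illustration, not a NumPy function.

```python
def percentile_linear(values, q):
    """Percentile q (0-100) using linear interpolation between sorted values."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100   # fractional index into the sorted data
    lower = int(position)                     # index just below that position
    upper = min(lower + 1, len(ordered) - 1)  # next index, clamped to the last element
    fraction = position - lower               # how far we are between the two neighbours
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


a = [4, 5, 9, 4, 6, 3, 2]
quartiles = [percentile_linear(a, q) for q in (25, 50, 75)]
print(quartiles)  # [3.5, 4.0, 5.5]
```

For the 25th percentile the fractional index is 1.5, halfway between the sorted values 3 and 4, which gives 3.5.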

min(a); print()
max(a); print()
np.std(a)

2
9
2.1189138534559038
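The value returned by `np.std(a)` is the population standard deviation (it divides by n), whereas pandas `describe()` later in this notebook reports the sample standard deviation (it divides by n - 1). As a small sketch of the difference, the standard-library `statistics` module exposes both conventions:

```python
import math
import statistics

a = [4, 5, 9, 4, 6, 3, 2]

population_std = statistics.pstdev(a)  # divides by n, same convention as np.std(a)
sample_std = statistics.stdev(a)       # divides by n - 1, same convention as np.std(a, ddof=1)

print(population_std)  # matches the np.std(a) output above, up to rounding
print(sample_std)      # slightly larger, because the divisor is smaller
```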

from scipy.stats import skew  # skewness

from scipy.stats import kurtosis  # excess kurtosis (Fisher definition by default)

skew(a); print()
kurtosis(a)

0.8274271039321606
-0.1402479338842988
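With their default arguments, `skew` and `kurtosis` return moment-based statistics: the third central moment divided by the 1.5 power of the variance, and the fourth central moment divided by the squared variance minus 3. The sketch below recomputes both in plain Python; `central_moment` is a hypothetical helper introduced only for this illustration.

```python
import math


def central_moment(values, k):
    """k-th central moment, dividing by n (the biased, population form)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)


a = [4, 5, 9, 4, 6, 3, 2]
m2 = central_moment(a, 2)                              # population variance
skewness = central_moment(a, 3) / m2 ** 1.5            # positive: longer right tail
excess_kurtosis = central_moment(a, 4) / m2 ** 2 - 3   # 0 would match a normal distribution

print(skewness, excess_kurtosis)
```

The positive skewness reflects the single large value 9 pulling the right tail out.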

np.random.seed(2021)
a = np.random.randint(1,9,size=(3,3))
a

array([[5, 6, 2],
       [1, 6, 7],
       [7, 5, 8]])

np.sum(a); print()
np.sum(a,0); print()  # by column
np.sum(a,1)  # by row

47
array([13, 17, 17])
array([13, 14, 20])
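The second positional argument in these calls is `axis`. `axis=0` collapses the rows and leaves one result per column, while `axis=1` collapses the columns and leaves one result per row. A short sketch using the same 3x3 matrix, with an explicit loop added for comparison:

```python
import numpy as np

a = np.array([[5, 6, 2],
              [1, 6, 7],
              [7, 5, 8]])

column_sums = a.sum(axis=0)  # one total per column
row_sums = a.sum(axis=1)     # one total per row

# Equivalent explicit loop for the column totals, to show what axis=0 collapses.
manual_column_sums = [sum(int(a[i, j]) for i in range(a.shape[0]))
                      for j in range(a.shape[1])]

print(column_sums)  # [13 17 17]
print(row_sums)     # [13 14 20]
```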

np.mean(a); print()
np.mean(a,0); print()  # by column
np.mean(a,1)  # by row

5.222222222222222
array([4.33333333, 5.66666667, 5.66666667])
array([4.33333333, 4.66666667, 6.66666667])

np.std(a); print()
np.std(a,0); print()  # by column
np.std(a,1)  # by row

2.199887763691481
array([2.49443826, 0.47140452, 2.62466929])
array([1.69967317, 2.62466929, 1.24721913])

np.percentile(a, 25); print()

np.percentile(a,25,0); print()  # by column
np.percentile(a,25,1)  # by row

5.0
array([3. , 5.5, 4.5])
array([3.5, 3.5, 6. ])

a.min(); print()
a.min(0); print()  # by column
a.min(1)  # by row

1
array([1, 5, 2])
array([2, 1, 5])

a.max(); print()
a.max(0); print()  # by column
a.max(1)  # by row

8
array([7, 6, 8])
array([6, 7, 8])

Calculations on DataFrames

import pandas as pd

import os
os.chdir('/content/dataset')
os.getcwd()

'/content/dataset'

ed = pd.read_excel('EmployeeData2.xlsx')
ed.head()

   id    sexo     fechnac  educ          catlab  salario  salini  tiempemp  expprev minoria
0   1  Hombre  1952-02-03    15       Directivo    57000   27000        98    144.0      No
1   2  Hombre  1958-05-23    16  Administrativo    40200   18750        98     36.0      No
2   3   Mujer  1929-07-26    12  Administrativo    21450   12000        98    381.0      No
3   4   Mujer  1947-04-15     8  Administrativo    21900   13200        98    190.0      No
4   5  Hombre  1955-02-09    15  Administrativo    45000   21000        98    138.0      No

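ed.info  # without parentheses this displays the bound method (the repr below); ed.info() prints the column summary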

<bound method DataFrame.info of id sexo fechnac educ catlab salario salini tiempemp \
0 1 Hombre 1952-02-03 15 Directivo 57000 27000 98
1 2 Hombre 1958-05-23 16 Administrativo 40200 18750 98
2 3 Mujer 1929-07-26 12 Administrativo 21450 12000 98
3 4 Mujer 1947-04-15 8 Administrativo 21900 13200 98
4 5 Hombre 1955-02-09 15 Administrativo 45000 21000 98
.. ... ... ... ... ... ... ... ...
469 470 Hombre 1964-01-22 12 Administrativo 26250 15750 64
470 471 Hombre 1966-08-03 15 Administrativo 26400 15750 64
471 472 Hombre 1966-02-21 15 Administrativo 39150 15750 63
472 473 Mujer 1937-11-25 12 Administrativo 21450 12750 63
473 474 Mujer 1968-11-05 12 Administrativo 29400 14250 63

expprev minoria
0 144.0 No
1 36.0 No
2 381.0 No
3 190.0 No
4 138.0 No
.. ... ...
469 69.0 Sí
470 32.0 Sí
471 46.0 No
472 139.0 No
473 9.0 No

[474 rows x 10 columns]>

ed.describe()
#ed.describe(include = [np.number])

               id        educ        salario        salini    tiempemp     expprev
count  474.000000  474.000000     474.000000    474.000000  474.000000  450.000000
mean   237.500000   13.491561   34419.567511  17016.086498   81.109705  100.973333
std    136.976275    2.884846   17075.661465   7870.638154   10.060945  104.907443
min      1.000000    8.000000   15750.000000   9000.000000   63.000000    2.000000
25%    119.250000   12.000000   24000.000000  12487.500000   72.000000   24.000000
50%    237.500000   12.000000   28875.000000  15000.000000   81.000000   59.000000
75%    355.750000   15.000000   36937.500000  17490.000000   90.000000  144.000000
max    474.000000   21.000000  135000.000000  79980.000000   98.000000  476.000000

ed.describe(include = ['O'])
# Includes the columns with object dtype (character strings)

          sexo          catlab minoria
count      474             474     474
unique       2               3       2
top     Hombre  Administrativo      No
freq       258             363     370

ed.isnull().sum()

id 0
sexo 0
fechnac 1
educ 0
catlab 0
salario 0
salini 0
tiempemp 0
expprev 24
minoria 0
dtype: int64
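The 24 missing values in `expprev` explain why `describe()` reported a count of 450 for that column instead of 474: pandas summary statistics skip NaN by default. The sketch below shows the same behaviour on a small made-up frame; `toy` and its values are hypothetical and stand in for the Excel file, which is not reproduced here.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row frame; only the column names mirror the notebook's data.
toy = pd.DataFrame({
    "salario": [57000, 40200, 21450, 21900],
    "expprev": [144.0, np.nan, 381.0, np.nan],
})

missing_per_column = toy.isnull().sum()   # number of NaN values in each column
non_missing = toy.count()                 # what describe() reports as "count"
mean_skipping_nan = toy["expprev"].mean() # NaN rows are ignored, not treated as 0

print(missing_per_column)
print(non_missing)
print(mean_skipping_nan)  # 262.5
```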

ed['sexo'].value_counts(); print('\n')
ed['catlab'].value_counts()

Hombre 258
Mujer 216
Name: sexo, dtype: int64

Administrativo 363
Directivo 84
Seguridad 27
Name: catlab, dtype: int64

ed['sexo'].value_counts(normalize=True); print('\n')
ed['catlab'].value_counts(normalize=True)

Hombre 0.544304
Mujer 0.455696
Name: sexo, dtype: float64

Administrativo 0.765823
Directivo 0.177215
Seguridad 0.056962
Name: catlab, dtype: float64

pd.crosstab(ed['sexo'], ed['catlab'])

catlab  Administrativo  Directivo  Seguridad
sexo
Hombre             157         74         27
Mujer              206         10          0

pd.crosstab(ed['sexo'], ed['catlab'], normalize = 'index')

catlab  Administrativo  Directivo  Seguridad
sexo
Hombre        0.608527   0.286822   0.104651
Mujer         0.953704   0.046296   0.000000

pd.crosstab(ed['sexo'], ed['catlab'], normalize = 'columns')

catlab  Administrativo  Directivo  Seguridad
sexo
Hombre        0.432507   0.880952        1.0
Mujer         0.567493   0.119048        0.0

pd.crosstab(ed['sexo'], ed['catlab'], normalize = 'all')

catlab  Administrativo  Directivo  Seguridad
sexo
Hombre        0.331224   0.156118   0.056962
Mujer         0.434599   0.021097   0.000000
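The three `normalize` options differ only in the denominator: `'index'` divides each cell by its row total, `'columns'` by its column total, and `'all'` by the grand total. As a sketch, the counts from the plain crosstab above can be re-entered by hand and normalized manually; the variable names are illustrative.

```python
import pandas as pd

# Counts copied from the unnormalized crosstab output above.
counts = pd.DataFrame(
    [[157, 74, 27], [206, 10, 0]],
    index=pd.Index(["Hombre", "Mujer"], name="sexo"),
    columns=pd.Index(["Administrativo", "Directivo", "Seguridad"], name="catlab"),
)

row_shares = counts.div(counts.sum(axis=1), axis=0)     # like normalize='index'
column_shares = counts.div(counts.sum(axis=0), axis=1)  # like normalize='columns'
overall_shares = counts / counts.to_numpy().sum()       # like normalize='all'

print(row_shares.round(6))
print(column_shares.round(6))
print(overall_shares.round(6))
```

Row shares answer "what fraction of men hold each job category", while column shares answer "what fraction of each job category is men".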

Groupby

ed.groupby(by='sexo')

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f23685c9e40>

ed.groupby(by='sexo').describe()

id educ ... tiempemp expprev

count mean std min 25% 50% 75% max count mean ... 75% max count mean std min 25% 50%

sexo

Hombre 258.0 227.550388 141.168440 1.0 103.25 216.5 342.50 472.0 258.0 14.430233 ... 91.0 98.0 258.0 111.620155 109.692296 3.0 37.25 67.5

Mujer 216.0 249.384259 131.130737 3.0 141.75 247.5 360.25 474.0 216.0 12.370370 ... 88.0 98.0 192.0 86.666667 96.553993 2.0 11.00 48.0

2 rows × 48 columns

ed.groupby(by='sexo').describe().stack()
# compute descriptive statistics for each group

                      id        educ        salario        salini    tiempemp     expprev
sexo
Hombre count  258.000000  258.000000     258.000000    258.000000  258.000000  258.000000
       mean   227.550388   14.430233   41441.782946  20301.395349   81.720930  111.620155
       std    141.168440    2.979335   19499.213736   9111.780867   10.351020  109.692296
       min      1.000000    8.000000   19650.000000   9000.000000   63.000000    3.000000
       25%    103.250000   12.000000   28050.000000  15000.000000   73.250000   37.250000
       50%    216.500000   15.000000   32850.000000  15750.000000   82.000000   67.500000
       75%    342.500000   16.000000   50412.500000  22372.500000   91.000000  149.750000
       max    472.000000   21.000000  135000.000000  79980.000000   98.000000  476.000000
Mujer  count  216.000000  216.000000     216.000000    216.000000  216.000000  192.000000
       mean   249.384259   12.370370   26031.921296  13091.967593   80.379630   86.666667
       std    131.130737    2.319152    7558.021452   2935.599213    9.676361   96.553993
       min      3.000000    8.000000   15750.000000   9000.000000   63.000000    2.000000
       25%    141.750000   12.000000   21562.500000  11193.750000   72.000000   11.000000
       50%    247.500000   12.000000   24300.000000  12375.000000   81.000000   48.000000
       75%    360.250000   15.000000   28500.000000  14250.000000   88.000000  137.500000
       max    474.000000   17.000000   58125.000000  30000.000000   98.000000  412.000000

ed.groupby(by='sexo').describe(include=['O']).stack()

                       catlab minoria
sexo
Hombre count              258     258
       unique               3       2
       top     Administrativo      No
       freq               157     194
Mujer  count              216     216
       unique               2       2
       top     Administrativo      No
       freq               206     176

# mean of the numeric variables, by sex
# numeric_only=True is passed explicitly; relying on the default raised a FutureWarning in this pandas version

ed.groupby(by='sexo').mean(numeric_only=True)

                id       educ       salario        salini  tiempemp     expprev
sexo
Hombre  227.550388  14.430233  41441.782946  20301.395349  81.72093  111.620155
Mujer   249.384259  12.370370  26031.921296  13091.967593  80.37963   86.666667

# mean of the salario variable, by sex

ed.groupby(by='sexo')['salario'].mean()

sexo
Hombre 41441.782946
Mujer 26031.921296
Name: salario, dtype: float64

pd.DataFrame(ed.groupby(by='sexo')['salario'].mean())

             salario
sexo
Hombre  41441.782946
Mujer   26031.921296

pd.DataFrame(ed.groupby(by='sexo', as_index = False)['salario'].mean())

     sexo       salario
0  Hombre  41441.782946
1   Mujer  26031.921296

ed.groupby(by=['sexo', 'catlab'])['salario'].mean()

sexo    catlab
Hombre  Administrativo    31558.152866
        Directivo         66243.243243
        Seguridad         30938.888889
Mujer   Administrativo    25003.689320
        Directivo         47213.500000
Name: salario, dtype: float64

pd.DataFrame(ed.groupby(by=['sexo', 'catlab'])['salario'].mean())

                            salario
sexo   catlab
Hombre Administrativo  31558.152866
       Directivo       66243.243243
       Seguridad       30938.888889
Mujer  Administrativo  25003.689320
       Directivo       47213.500000

ed.groupby(by=['sexo', 'catlab'])['salario'].mean().unstack()
# ['salario'].mean() computes the mean of the salario column for each group
# unstack() reshapes the result into a pivot-style table,
# in which the values of the 'catlab' level become separate columns

catlab  Administrativo     Directivo     Seguridad
sexo
Hombre    31558.152866  66243.243243  30938.888889
Mujer     25003.689320  47213.500000           NaN
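The NaN for Mujer/Seguridad appears because that combination has no rows (the crosstab count is 0), so `unstack()` has nothing to place in the cell. The sketch below reproduces the reshaping on a made-up frame and compares it with `pivot_table`, which yields the same layout in one call; `toy` and its salaries are hypothetical.

```python
import pandas as pd

# Hypothetical rows; note there is no Mujer/Seguridad and no Hombre/Directivo row.
toy = pd.DataFrame({
    "sexo": ["Hombre", "Hombre", "Hombre", "Mujer", "Mujer"],
    "catlab": ["Administrativo", "Seguridad", "Administrativo",
               "Administrativo", "Directivo"],
    "salario": [30000, 31000, 32000, 25000, 47000],
})

long_means = toy.groupby(["sexo", "catlab"])["salario"].mean()  # MultiIndex Series
wide_means = long_means.unstack()                               # catlab level becomes columns
pivot = toy.pivot_table(index="sexo", columns="catlab",
                        values="salario", aggfunc="mean")       # same table, one call

print(wide_means)
print(pivot)
```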

# string names replace np.mean / np.std; pandas computes the same group-wise mean and sample std
ed[['sexo', 'catlab', 'salario', 'tiempemp']].groupby(by=['sexo', 'catlab']).aggregate(['min', 'max', 'mean', 'std'])

                      salario                                      tiempemp
                          min     max          mean           std      min  max       mean        std
sexo   catlab
Hombre Administrativo   19650   80000  31558.152866   7997.977675       63   98  81.726115  10.670239
       Directivo        38700  135000  66243.243243  18051.569628       64   98  81.770270  10.403565
       Seguridad        24300   35250  30938.888889   2114.616411       67   95  81.555556   8.486792
Mujer  Administrativo   15750   54000  25003.689320   5812.838103       63   98  80.563107   9.658228
       Directivo        34410   58125  47213.500000   8501.252538       64   90  76.600000   9.766155

# a dict maps each column to its own aggregation
ed[['sexo', 'catlab', 'salario', 'tiempemp']].groupby(by=['sexo', 'catlab']).aggregate({'salario': 'min', 'tiempemp': 'mean'})

                       salario   tiempemp
sexo   catlab
Hombre Administrativo    19650  81.726115
       Directivo         38700  81.770270
       Seguridad         24300  81.555556
Mujer  Administrativo    15750  80.563107
       Directivo         34410  76.600000
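When different columns get different aggregations, pandas named aggregation is an alternative to the dict form: each keyword becomes the output column name and points at a `(column, function)` pair. The sketch below uses a made-up four-row frame; `toy`, its values, and the output names `salario_min` and `tiempemp_mean` are all hypothetical.

```python
import pandas as pd

# Hypothetical data; only the column names mirror the notebook's dataset.
toy = pd.DataFrame({
    "sexo": ["Hombre", "Hombre", "Mujer", "Mujer"],
    "salario": [57000, 40200, 21450, 21900],
    "tiempemp": [98, 96, 98, 90],
})

summary = toy.groupby("sexo").agg(
    salario_min=("salario", "min"),      # lowest salary in each group
    tiempemp_mean=("tiempemp", "mean"),  # average tenure in each group
)

print(summary)
```

The result has flat, self-describing column names, which avoids the two-level column header produced when a list of functions is passed.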
