Sesion_01b_pandas_V1.ipynb
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"<a href=\"https://colab.research.google.com/github/CienciaDatosUdea/002_EstudiantesAprendizajeEstadistico/blob/main/semestre2024-2/Sesiones/Sesion_01b_pandas_V1.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
"\n",
"\n"
],
"metadata": {
"id": "NsRH0Q4Yqn_D"
}
},
{
"cell_type": "markdown",
"metadata": {
"id": "mO08F9Bh2fU6"
},
"source": [
"# Building a DataFrame from dictionaries and loading data into a DataFrame\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "aU4aSvHu56Hw"
},
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import os\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns"
],
"execution_count": 2,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "NO59NdYu2W1z"
},
"source": [
"x = np.linspace(0, 10, 10)\n",
"y = np.linspace(0, 10, 10)\n",
"\n",
"d = {\"x\": x, \"y\": y}"
],
"execution_count": 3,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "hYctW3-G6OzD"
},
"source": [
"df = pd.DataFrame(d)"
],
"execution_count": 4,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 385
},
"id": "YCtY7mjr6QTA",
"outputId": "07287fee-303b-42e4-ac41-10c61fce3ee1"
},
"source": [
"df.y"
],
"execution_count": 5,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"0 0.000000\n",
"1 1.111111\n",
"2 2.222222\n",
"3 3.333333\n",
"4 4.444444\n",
"5 5.555556\n",
"6 6.666667\n",
"7 7.777778\n",
"8 8.888889\n",
"9 10.000000\n",
"Name: y, dtype: float64"
],
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1.111111</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2.222222</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3.333333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>4.444444</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>5.555556</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>6.666667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>7.777778</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>8.888889</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>10.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div><br><label><b>dtype:</b> float64</label>"
]
},
"metadata": {},
"execution_count": 5
}
]
},
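The construction above can be sketched in a self-contained form: the dictionary keys become the column labels, and each array becomes one column. The names `df_demo` and the extra column `z` are hypothetical and used only for illustration. Bracket access (`df_demo["y"]`) returns the same Series as attribute access (`df_demo.y`), and it also works for column names that are not valid Python identifiers.

```python
import numpy as np
import pandas as pd

# Two equally sized arrays, as in the cells above.
x = np.linspace(0, 10, 10)
d = {"x": x, "y": x.copy()}

# Dictionary keys become column labels, in insertion order.
df_demo = pd.DataFrame(d)
columns = list(df_demo.columns)

# Bracket access and attribute access return the same Series.
same = df_demo["y"].equals(df_demo.y)

# A new column can be added by assignment ("z" is a hypothetical name).
df_demo["z"] = df_demo["x"] ** 2
```

All values in the dictionary must have the same length; otherwise `pd.DataFrame` raises a `ValueError`.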
{
"cell_type": "markdown",
"source": [
"# Methods for reading data into a DataFrame\n",
"\n",
"\n",
"| Reader | Description |\n",
"| --- | --- |\n",
"| `read_csv` | Reads a CSV (comma-separated values) file into a DataFrame. |\n",
"| `read_excel` | Reads an Excel file into a DataFrame. |\n",
"| `read_sql` | Runs a SQL query against a database and returns the result as a DataFrame. |\n",
"| `read_json` | Reads a JSON (JavaScript Object Notation) file into a DataFrame. |\n",
"| `read_html` | Reads all HTML tables found in a web page or HTML file and returns them as a list of DataFrames. |\n",
"| `read_parquet` | Reads a Parquet file, a binary format for storing tabular data, into a DataFrame. |\n",
"| `read_feather` | Reads a Feather file, a binary format for storing tabular data, into a DataFrame. |\n",
"| `read_hdf` | Reads an HDF5 (Hierarchical Data Format) file, a format for storing scientific data, into a DataFrame. |\n",
"| `read_clipboard` | Reads the clipboard contents into a DataFrame. |\n"
],
"metadata": {
"id": "813MCCGuqwDJ"
}
},
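As a minimal sketch of how two of the readers in the table behave, the example below parses a small in-memory sample instead of a file on disk. The two sample rows and the names `csv_text`, `df_csv`, and `df_json` are assumptions made for illustration; `io.StringIO` stands in for a file path or URL.

```python
import io

import pandas as pd

# Small illustrative sample; a real call would pass a file path or URL.
csv_text = (
    "country,iso_code,total_vaccinations\n"
    "Afghanistan,AFG,0\n"
    "Zimbabwe,ZWE,124753\n"
)

# read_csv accepts any file-like object, so StringIO works in place of a path.
df_csv = pd.read_csv(io.StringIO(csv_text))

# Round trip through JSON: serialize the rows, then parse them with read_json.
json_text = df_csv.to_json(orient="records")
df_json = pd.read_json(io.StringIO(json_text))
```

The other readers follow the same pattern of taking a source and returning a DataFrame, although some of them need optional dependencies (for example, an Excel engine for `read_excel`).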
{
"cell_type": "code",
"metadata": {
"id": "ZWDT-7Z16khQ",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 596
},
"outputId": "11ba0fb6-2467-4abc-a6e4-017638f6f458"
},
"source": [
"# Dataset source: https://www.kaggle.com/gpreda/covid-world-vaccination-progress?select=country_vaccinations\n",
"# To read a local copy instead:\n",
"# df = pd.read_excel(os.path.join(os.getcwd(), \"datasets\", \"country_vaccinations.xlsx\"))\n",
"path = \"https://github.com/hernansalinas/Curso_aprendizaje_estadistico/blob/main/datasets/sesion_01b_country_vaccinations.xlsx?raw=true\"\n",
"df = pd.read_excel(path)\n",
"df"
],
"execution_count": 6,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" country iso_code date total_vaccinations
people_vaccinated \\\n",
"0 Afghanistan AFG 2021-02-22 0.0
0.0 \n",
"1 Afghanistan AFG 2021-02-23 NaN
NaN \n",
"2 Afghanistan AFG 2021-02-24 NaN
NaN \n",
"3 Afghanistan AFG 2021-02-25 NaN
NaN \n",
"4 Afghanistan AFG 2021-02-26 NaN
NaN \n",
"... ... ... ... ...
... \n",
"9571 Zimbabwe ZWE 2021-03-29 81610.0
69751.0 \n",
"9572 Zimbabwe ZWE 2021-03-30 85866.0
72944.0 \n",
"9573 Zimbabwe ZWE 2021-03-31 91880.0
76995.0 \n",
"9574 Zimbabwe ZWE 2021-04-01 105307.0
87791.0 \n",
"9575 Zimbabwe ZWE 2021-04-02 124753.0
103815.0 \n",
"\n",
" people_fully_vaccinated daily_vaccinations_raw
daily_vaccinations \\\n",
"0 NaN NaN
NaN \n",
"1 NaN NaN
1367.0 \n",
"2 NaN NaN
1367.0 \n",
"3 NaN NaN
1367.0 \n",
"4 NaN NaN
1367.0 \n",
"... ... ...
... \n",
"9571 11859.0 2471.0
5434.0 \n",
"9572 12922.0 4256.0
5810.0 \n",
"9573 14885.0 6014.0
5712.0 \n",
"9574 17516.0 13427.0
6617.0 \n",
"9575 20938.0 19446.0
8156.0 \n",
"\n",
" total_vaccinations_per_hundred
people_vaccinated_per_hundred \\\n",
"0 0.0
0.0 \n",
"1 NaN
NaN \n",
"2 NaN
NaN \n",
"3 NaN
NaN \n",
"4 NaN
NaN \n",