Pandas AI

Pandas is a Python library for working with structured data. It allows analyzing, cleaning, exploring and manipulating data. Pandas can import data from CSV and JSON files into DataFrames for analysis. DataFrames are like tables with rows and columns that allow accessing and filtering data.

Uploaded by

Saif Jutt
Copyright: © All Rights Reserved

INTRODUCTION TO PANDAS
• Pandas is a Python library used for working with data sets.
• It has functions for analyzing, cleaning, exploring, and manipulating data.
• The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created by Wes McKinney in 2008.
WHY USE PANDAS?
• Pandas lets us analyze large data sets and draw conclusions based on statistical theory.
• Pandas can clean messy data sets and make them readable and relevant.
• Relevant data is very important in data science.
• Pandas gives you answers about the data, such as:
• Is there a correlation between two or more columns?
• What is the average value?
• What is the maximum value?
• What is the minimum value?
• Pandas can also delete rows that are not relevant or that contain wrong values, such as empty or NULL values. This is called cleaning the data.
• The source code for Pandas is located in this GitHub repository: https://github.com/pandas-dev/pandas
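The questions above can be sketched in a few lines. This is a minimal illustration, not part of the original slides: the workout numbers are made up, and the None entry stands in for an empty value that needs cleaning.

```python
import pandas as pd

# Hypothetical workout data; None marks a missing (empty) value.
df = pd.DataFrame({
    "duration": [60, 45, 30, 45],
    "calories": [400.0, 300.0, None, 320.0],
})

average = df["duration"].mean()   # average value
maximum = df["duration"].max()    # max value
minimum = df["duration"].min()    # min value

# Correlation between two columns (rows with missing values are skipped).
correlation = df["duration"].corr(df["calories"])

# Cleaning: drop every row that contains an empty value.
cleaned = df.dropna()

print(average, maximum, minimum)
print(correlation)
print(cleaned)
```

Here dropna() is one possible way to clean; other strategies, such as filling in a default value, may suit other data sets better.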
HOW TO START WITH PANDAS?

• Install Pandas with pip: pip install pandas
• Import it under the alias pd: import pandas as pd
• Now the Pandas package can be referred to as pd instead of pandas.
• Check the installed version: print(pd.__version__)
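Put together, a first script might look like the sketch below. It assumes Pandas is already installed in the active Python environment. The variable name version is only for illustration.

```python
import pandas as pd  # pd is the conventional alias for the pandas package

# __version__ is a string such as "2.x.y"; the exact value depends on your install.
version = pd.__version__
print(version)
```

If the import fails with ModuleNotFoundError, the install step has probably not been run for this environment.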
SERIES -- LABELS
• If nothing else is specified, the values are labeled with their index number. The first value has index 0, the second value has index 1, and so on.
• This label can be used to access a specified value.
• With the index argument, you can name your own labels.
• import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a, index = ["x", "y", "z"])
print(myvar)
• print(myvar["y"])
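To compare the two cases side by side, here is a small sketch using the same list as above. The variable names default_series and labeled_series are chosen for this illustration only.

```python
import pandas as pd

a = [1, 7, 2]

# Without an index argument, the labels are the positions 0, 1, 2.
default_series = pd.Series(a)
first_value = default_series[0]

# With an index argument, the labels are the names you supply.
labeled_series = pd.Series(a, index=["x", "y", "z"])
y_value = labeled_series["y"]

print(first_value, y_value)
print(list(labeled_series.index))
```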
KEY/VALUE OBJECTS AS SERIES

• You can also use a key/value object, like a dictionary, when creating a
Series.
• The keys of the dictionary become the labels.
• import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)
print(myvar)
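As a quick check that the keys really become the labels, the sketch below inspects the index and reads one value by its key. The names calories_series, labels and day2_value are illustrative.

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}
calories_series = pd.Series(calories)

# The dictionary keys are now the Series labels.
labels = list(calories_series.index)

# A value can be read back by its key, just like in the dictionary.
day2_value = calories_series["day2"]

print(labels)
print(day2_value)
```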
KEY/VALUE OBJECTS AS SERIES (CONT.)

• To select only some of the items in the dictionary, use the index
argument and specify only the items you want to include in the Series.
• import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories, index=["day1", "day2"])
print(myvar)
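One detail worth knowing: if the index argument names a label that is not a key in the dictionary, Pandas does not raise an error but stores a missing value (NaN) for it. A minimal sketch, where "day4" is a hypothetical label that does not exist in the data:

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

# "day4" is not a key in the dictionary, so its value becomes NaN.
subset = pd.Series(calories, index=["day1", "day4"])
day4_is_missing = pd.isna(subset["day4"])

print(subset)
print(day4_is_missing)
```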
DATA FRAMES

• Data sets in Pandas are usually stored as two-dimensional tables called DataFrames.
• A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional array or a table with rows and columns.
• A Series is like a column; a DataFrame is the whole table.
• import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
myvar = pd.DataFrame(data)
print(myvar)
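The relationship between a Series and a DataFrame can be checked directly. The sketch below reuses the same data and pulls out one column; the name calories_column is illustrative.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
df = pd.DataFrame(data)

# Selecting a single column returns a Series.
calories_column = df["calories"]

print(type(calories_column).__name__)
print(df.shape)  # (number of rows, number of columns)
```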
DATA FRAMES -- LOCATE ROW

• As you can see from the result above, the DataFrame is like a table with rows and columns.
• Pandas uses the loc attribute to return one or more specified rows.

• import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
df = pd.DataFrame(data)
print(df)
• print(df.loc[0])
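The phrase "one or more rows" matters for the result type. As a sketch with the same data: a single label gives back a Series, while a list of labels gives back a DataFrame. The names row and rows are illustrative.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
df = pd.DataFrame(data)

# One label: the row comes back as a Series.
row = df.loc[0]

# A list of labels: the rows come back as a DataFrame.
rows = df.loc[[0, 1]]

print(row)
print(rows)
```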
NAMED INDEXES

• With the index argument, you can name your own indexes.
• import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])
print(df)
NAMED INDEXES (CONT.)

• Use the named index in the loc attribute to return the specified row(s).
• print(df.loc["day2"])
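A self-contained version of this lookup, extended to several named rows at once, might look like the sketch below. The names day2 and two_days are illustrative.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

# One named row, returned as a Series.
day2 = df.loc["day2"]

# Several named rows, returned as a DataFrame.
two_days = df.loc[["day1", "day3"]]

print(day2)
print(two_days)
```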
READ CSV FILES

• A simple way to store big data sets is to use CSV files (comma-separated values files).
• CSV files contain plain text and are a well-known format that can be read by almost any tool, including Pandas.
• import pandas as pd
df = pd.read_csv('data.csv')
print(df)
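The example above needs a file called data.csv on disk. To try read_csv without one, the sketch below feeds it an in-memory text buffer instead. The CSV content is made up for illustration and reuses the calories and duration columns from earlier slides.

```python
import io

import pandas as pd

# Hypothetical CSV content: the first line holds the column names.
csv_text = "calories,duration\n420,50\n380,40\n390,45\n"

# read_csv accepts a file path or a file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

print(df)
```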
READ JSON
• Big data sets are often stored or extracted as JSON.
• JSON is plain text in the format of an object. It is well known in the world of programming and can be read by Pandas.
• import pandas as pd
df = pd.read_json('data.json')
print(df)
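As with CSV, read_json can also be tried without a file on disk. The sketch below assumes the JSON is a list of records (one object per row); the content is made up for illustration, and orient="records" is passed to make that layout explicit.

```python
import io

import pandas as pd

# Hypothetical JSON content: a list with one object per row.
json_text = (
    '[{"calories": 420, "duration": 50},'
    ' {"calories": 380, "duration": 40},'
    ' {"calories": 390, "duration": 45}]'
)

# read_json accepts a file path or a file-like object such as StringIO.
df = pd.read_json(io.StringIO(json_text), orient="records")

print(df)
```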
READ JSON (CONT.)

• If your JSON code is not in a file, but in a Python dictionary, you can load it into a DataFrame directly.
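A minimal sketch of loading a dictionary directly, assuming a JSON-style nested dictionary in which the outer keys are column names and the inner keys are row labels. The values are made up for illustration.

```python
import pandas as pd

# Hypothetical JSON-style data held in a Python dictionary.
data = {
    "calories": {"0": 420, "1": 380, "2": 390},
    "duration": {"0": 50, "1": 40, "2": 45},
}

# Outer keys become columns; inner keys become the row labels.
df = pd.DataFrame(data)

print(df)
print(df.loc["1", "calories"])
```

Note that the row labels here are the strings "0", "1" and "2", not integers, because that is how they appear in the dictionary.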
