Pandas

Pandas is an open-source Python library used for data manipulation and analysis. It provides fast and efficient data structures including Series and DataFrame. Series is a one-dimensional array-like structure, while DataFrame is a two-dimensional structure that allows for heterogeneous data types across columns. Pandas loads data from different file formats and performs operations like merging, joining, reshaping and grouping of data.

Uploaded by

rajeshd231

Python - Pandas

Pandas is an open-source Python library for high-performance data manipulation and data
analysis, built around its Series and DataFrame data structures. Python with pandas is used in a variety of academic and
commercial domains, including finance, economics, statistics, advertising, and web analytics.

Key Features of Pandas

• Fast and efficient DataFrame object with default and customized indexing.
• Tools for loading data into in-memory data objects from different file formats.
• Data alignment and integrated handling of missing data.
• Reshaping and pivoting of data sets.
• Label-based slicing, indexing, and subsetting of large data sets.
• Columns can be inserted into or deleted from a data structure.
• Group-by operations for aggregating and transforming data.
• High-performance merging and joining of data.
• Time series functionality.
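
A few of these features can be sketched in a handful of lines. The example below is only an illustration: the `sales` and `targets` tables, their column names, and their values are invented for the purpose and do not come from the slides.

```python
import pandas as pd

# Hypothetical sales records, invented for illustration only.
sales = pd.DataFrame({
    "rep": ["Steve", "Lia", "Steve", "Lia"],
    "region": ["East", "West", "East", "West"],
    "amount": [100.0, 250.0, None, 50.0],  # None is stored as NaN (missing)
})
# Hypothetical per-rep targets, also invented.
targets = pd.DataFrame({"rep": ["Steve", "Lia"], "target": [150.0, 280.0]})

# Missing data: count the gaps instead of failing on them.
missing_count = sales["amount"].isna().sum()

# Group by: total amount per rep; sum() skips missing values by default.
totals = sales.groupby("rep", as_index=False)["amount"].sum()

# Merging: combine the totals with the targets on the shared "rep" column.
report = totals.merge(targets, on="rep")

# Label-based indexing: look up one value by row label and column label.
lia_total = report.set_index("rep").loc["Lia", "amount"]
```

Here `report` has one row per rep with the columns `rep`, `amount`, and `target`.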

Pandas deals with the following two data structures:

• Series

• DataFrame

These data structures are built on top of NumPy arrays, making them fast and efficient.

Data Structure   Dimensions   Description
Series           1            1D labeled homogeneous array, size-immutable.
DataFrame        2            General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns.
Series

A Series is a one-dimensional array-like structure with homogeneous data. For example, the
following Series is a collection of the integers 10, 23, 56, …

10 23 56 17 52 61 73 90 26 72

Key Points of Series

• Homogeneous data
• Size immutable
• Values of data mutable
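
As a minimal sketch, the integers above can be placed in a Series with `pd.Series`. The variable names are chosen here for illustration.

```python
import pandas as pd

# The ten integers shown above, stored as a one-dimensional labeled array.
s = pd.Series([10, 23, 56, 17, 52, 61, 73, 90, 26, 72])

# Homogeneous data: the whole Series shares a single integer dtype.
dtype_kind = s.dtype.kind  # "i" means a signed integer type

# Default indexing: positions 0..9 are used as labels.
first_label = s.index[0]

# Values are mutable: overwrite the first element in place.
s.iloc[0] = 11
length = len(s)
```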

DataFrame

A DataFrame is a two-dimensional, table-like structure that can hold heterogeneous data. For example,

Name    Age   Gender   Rating
Steve   32    Male     3.45
Lia     28    Female   4.6
Vin     45    Male     3.9
Katie   38    Female   2.78

The table represents the data of a sales team of an organization with their overall
performance rating. The data is represented in rows and columns. Each column represents
an attribute and each row represents a person.
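
One way to build this table in pandas is from a dictionary of columns, as sketched below. The variable names are illustrative.

```python
import pandas as pd

# Each dictionary key becomes a column (an attribute);
# each position across the lists becomes a row (a person).
df = pd.DataFrame({
    "Name": ["Steve", "Lia", "Vin", "Katie"],
    "Age": [32, 28, 45, 38],
    "Gender": ["Male", "Female", "Male", "Female"],
    "Rating": [3.45, 4.6, 3.9, 2.78],
})

n_rows, n_cols = df.shape      # number of people, number of attributes
first_person = df.iloc[0]      # one row, returned as a Series
ratings = df["Rating"]         # one column, also a Series
```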

Data Type of Columns

The data types of the four columns are as follows:

Column   Type
Name     String
Age      Integer
Gender   String
Rating   Float

Key Points of DataFrame

• Heterogeneous data
• Size mutable
• Data mutable
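
The column types and the mutability points can be checked with a short sketch. The `Senior` column is hypothetical and added only to show size mutability. Note that pandas reports string columns as `object` or as a dedicated string dtype depending on the version, so the check below uses the helper functions in `pandas.api.types` rather than comparing dtype names.

```python
import pandas as pd
from pandas.api import types as ptypes

df = pd.DataFrame({
    "Name": ["Steve", "Lia", "Vin", "Katie"],
    "Age": [32, 28, 45, 38],
    "Gender": ["Male", "Female", "Male", "Female"],
    "Rating": [3.45, 4.6, 3.9, 2.78],
})

# Heterogeneous data: each column carries its own type.
name_is_text = ptypes.is_string_dtype(df["Name"])
age_is_int = ptypes.is_integer_dtype(df["Age"])
rating_is_float = ptypes.is_float_dtype(df["Rating"])

# Size mutable: add a (hypothetical) column, then drop an existing one.
df["Senior"] = df["Age"] > 40
df = df.drop(columns=["Gender"])

# Data mutable: overwrite a single cell by row label and column label.
df.loc[0, "Rating"] = 3.5
```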
Python - NumPy

NumPy, short for 'Numerical Python', is a Python package. It is a library consisting of
multidimensional array objects and a collection of routines for processing those arrays.

Operations using NumPy

Using NumPy, a developer can perform the following operations:

• Mathematical and logical operations on arrays.
• Fourier transforms and routines for shape manipulation.
• Operations related to linear algebra. NumPy has built-in functions for linear algebra and random
number generation.
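
Each of these operations can be sketched in a line or two. The arrays and the seed below are arbitrary values chosen for illustration.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])

# Mathematical and logical operations apply element by element.
doubled = a * 2
mask = a > 2

# Shape manipulation: view the same four values as a 2x2 matrix.
matrix = a.reshape(2, 2)

# Fourier transform of the original 1D signal.
spectrum = np.fft.fft(a)

# Linear algebra: solve matrix @ x = b for x.
b = np.array([1.0, 2.0])
x = np.linalg.solve(matrix, b)

# Random number generation with a seeded generator (seed is arbitrary).
rng = np.random.default_rng(seed=0)
samples = rng.random(3)
```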
NumPy

NumPy is often used along with packages like SciPy (Scientific Python) and Matplotlib (a plotting
library). This combination is widely used as a replacement for MATLAB, a popular platform for technical
computing. Python, the alternative to MATLAB, is now seen as a more modern and complete
programming language.

NumPy is open source, which is an added advantage.

ndarray Object

The most important object defined in NumPy is an N-dimensional array type called ndarray. It
describes a collection of items of the same type. Items in the collection can be accessed using a zero-
based index. Every item in an ndarray takes the same size of block in memory. The type of each element
is described by a data-type object (called dtype). Any single item extracted from an ndarray (by
indexing) is represented by a Python object of one of the array scalar types.
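
These properties can be inspected directly on a small array. The values and the choice of `np.int32` are illustrative.

```python
import numpy as np

# A 2D array whose items all share one explicitly chosen type.
arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

ndim = arr.ndim          # number of dimensions
dtype = arr.dtype        # the data-type object describing every item
itemsize = arr.itemsize  # bytes per item; identical for all items

# Zero-based indexing: row 0, column 1.
item = arr[0, 1]

# A single extracted item is an array scalar, not a plain Python int.
is_array_scalar = isinstance(item, np.generic)

# Selecting a whole row yields another ndarray.
row = arr[0]
```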
Python - SciPy

The SciPy library is built to work with NumPy arrays and provides many user-friendly and
efficient numerical routines, such as routines for numerical integration and optimization. Together, NumPy
and SciPy run on all popular operating systems, are quick to install, and are free of charge. They are easy
to use, yet powerful enough to be depended upon by some of the world's leading scientists and engineers.
SciPy Sub-packages

SciPy is organized into sub-packages covering different scientific computing domains.
These are summarized in the following table:

Sub-package         Description
scipy.constants     Physical and mathematical constants
scipy.fftpack       Fourier transform
scipy.integrate     Integration routines
scipy.interpolate   Interpolation
scipy.io            Data input and output
scipy.linalg        Linear algebra routines
scipy.optimize      Optimization
scipy.signal        Signal processing
scipy.sparse        Sparse matrices
scipy.spatial       Spatial data structures and algorithms
scipy.special       Special mathematical functions
scipy.stats         Statistics
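
As a brief sketch of how the sub-packages are used, the example below touches three of them. The integrand and the function being minimized are arbitrary choices made for illustration.

```python
import numpy as np
from scipy import constants, integrate, optimize

# scipy.integrate: integrate sin(x) from 0 to pi (the exact answer is 2).
area, abs_error = integrate.quad(np.sin, 0.0, np.pi)

# scipy.optimize: find the minimum of (x - 3)^2, which lies at x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2)
minimum_at = result.x

# scipy.constants: the speed of light in vacuum, in metres per second.
speed_of_light = constants.c
```

`quad` returns both the estimate and an upper bound on its absolute error, which is why two names are unpacked.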
Python - Matplotlib

Matplotlib is a Python library used to create 2D graphs and plots from Python scripts. It has
a module named pyplot, which makes plotting easy by providing features to control
line styles, font properties, axis formatting, and so on. It supports a wide variety of graphs and
plots, such as histograms, bar charts, power spectra, and error charts. It is used along with
NumPy to provide an environment that is an effective open-source alternative to MATLAB. It
can also be used with graphics toolkits like PyQt and wxPython.

Conventionally, the package is imported into a Python script by adding the following
statement:

from matplotlib import pyplot as plt
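
With that import in place, a basic line plot might look like the sketch below. The sine curve, the labels, and the non-interactive "Agg" backend are choices made here so that the example runs without a display; they are not part of the slides.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no window needed
from matplotlib import pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots()
# Line style is controlled through keyword arguments.
(line,) = ax.plot(x, np.sin(x), linestyle="--", label="sin(x)")

# Axis labels, title, and legend are set on the Axes object.
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.set_title("A simple line plot")
ax.legend()

# Write the figure as PNG into an in-memory buffer instead of a file.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

To write the image to disk instead, pass a file name such as `"plot.png"` to `fig.savefig`.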
