Pandas
Pandas is an open-source Python library for high-performance data manipulation and analysis,
built around its powerful data structures. Python with pandas is used in a wide range of academic and
commercial domains, including finance, economics, statistics, advertising, and web analytics.
Python - Pandas
Fast and efficient DataFrame object with default and customized indexing.
Tools for loading data into in-memory data objects from different file formats.
Data alignment and integrated handling of missing data.
Reshaping and pivoting of data sets.
Label-based slicing, indexing, and subsetting of large data sets.
Columns from a data structure can be deleted or inserted.
Group by data for aggregation and transformations.
High performance merging and joining of data.
Time Series functionality.
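A few of the features listed above can be sketched in a handful of lines. This is a minimal illustration, not a complete tour: the `sales` and `managers` tables, their column names, and all values are hypothetical and invented for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for illustration only.
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "amount": [100.0, np.nan, 250.0, 80.0],
})
managers = pd.DataFrame({
    "region": ["North", "South"],
    "manager": ["Asha", "Ben"],
})

# Integrated handling of missing data: replace the NaN with 0.0.
sales["amount"] = sales["amount"].fillna(0.0)

# Group by a column and aggregate.
totals = sales.groupby("region")["amount"].sum()

# Merge (join) two tables on a shared key column.
merged = sales.merge(managers, on="region", how="left")

# Insert a new column, then delete it again.
merged["amount_k"] = merged["amount"] / 1000
merged = merged.drop(columns=["amount_k"])

print(totals)
print(merged)
```

Here `fillna`, `groupby`, `merge`, and `drop` stand in for the missing-data, aggregation, joining, and column-deletion features respectively.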
Pandas provides two primary data structures:
Series
DataFrame
These data structures are built on top of NumPy arrays, which makes them fast and efficient.
A Series is a one-dimensional, array-like structure with homogeneous data. For example, the
following Series is a collection of integers 10, 23, 56, …
Homogeneous data
Size Immutable
Values of Data Mutable
10 23 56 17 52 61 73 90 26 72
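As a minimal sketch, the Series above can be built from a plain Python list. The variable name `s` is just an illustrative choice.

```python
import pandas as pd

# Build a Series from the ten integers shown above.
s = pd.Series([10, 23, 56, 17, 52, 61, 73, 90, 26, 72])

# Homogeneous data: every element shares a single integer dtype.
print(s.dtype)

# Values are mutable: overwrite the first element in place.
s.iloc[0] = 11

# The length of this Series object is unchanged by the assignment.
print(len(s))
print(s.head(3))
```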
DataFrame
Heterogeneous data
Size Mutable
Data Mutable
Column    Type
Name      String
Age       Integer
Gender    String
Rating    Float
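A DataFrame with the four columns in the table above might be created as follows. This is a sketch only: the row values are hypothetical, and the extra `Senior` column is invented to show that the size can change.

```python
import pandas as pd

# Hypothetical rows; only the column names and types come from the table above.
df = pd.DataFrame({
    "Name": ["Steve", "Lia", "Vin"],
    "Age": [32, 28, 45],
    "Gender": ["Male", "Female", "Male"],
    "Rating": [3.45, 4.6, 3.9],
})

# Heterogeneous data: each column has its own dtype.
print(df.dtypes)

# Size mutable: add a new (hypothetical) column.
df["Senior"] = df["Age"] > 40

# Data mutable: change a single cell by row label and column name.
df.loc[0, "Rating"] = 3.5

print(df)
```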
Python - NumPy
NumPy is a Python package whose name stands for 'Numerical Python'. It is a library consisting of
multidimensional array objects and a collection of routines for processing those arrays.
ndarray Object
The most important object defined in NumPy is an N-dimensional array type called ndarray. It
describes a collection of items of the same type. Items in the collection can be accessed using a
zero-based index. Every item in an ndarray occupies the same size of block in memory. Each element
of an ndarray is described by a data-type object (called dtype). Any single item extracted from an
ndarray object (by indexing) is represented by a Python object of one of the array scalar types.
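The properties described above can be checked directly on a small array. This is an illustrative sketch; the array contents and the choice of `int32` are arbitrary.

```python
import numpy as np

# A 2 x 3 array whose items all share one dtype (32-bit integers).
a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

print(a.ndim, a.shape, a.dtype)

# Every item occupies the same number of bytes (4 for int32).
print(a.itemsize)

# Zero-based indexing: row 0, column 1.
item = a[0, 1]

# A single extracted item is an array scalar, not a plain Python int.
print(type(item))

# Slicing, by contrast, returns another ndarray.
row = a[0, :]
print(type(row))
```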
Python - SciPy
The SciPy library is built to work with NumPy arrays and provides many user-friendly and
efficient numerical routines, such as those for numerical integration and optimization. Together, they
run on all popular operating systems, are quick to install, and are free of charge. NumPy and SciPy are
easy to use, yet powerful enough to be relied on by some of the world's leading scientists and engineers.
SciPy Sub-packages
scipy.stats Statistics
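A brief sketch of the routines mentioned above, using `scipy.integrate`, `scipy.optimize`, and `scipy.stats`. The functions being integrated and minimized are arbitrary examples chosen for illustration, and the sample reuses the ten integers from the Series example.

```python
import numpy as np
from scipy import integrate, optimize, stats

# Numerical integration: the integral of x^2 from 0 to 1 is 1/3.
area, abs_err = integrate.quad(lambda x: x ** 2, 0.0, 1.0)
print(area)

# Optimization: find the x that minimizes (x - 2)^2.
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)
print(result.x)

# Statistics: summarize a small NumPy sample.
sample = np.array([10, 23, 56, 17, 52, 61, 73, 90, 26, 72])
summary = stats.describe(sample)
print(summary.nobs, summary.mean, summary.minmax)
```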
Python - Matplotlib
Matplotlib is a Python library for creating 2D graphs and plots from Python scripts. It has
a module named pyplot that simplifies plotting by providing features to control
line styles, font properties, axis formatting, and so on. It supports a wide variety of graphs and
plots, including histograms, bar charts, power spectra, and error charts. Used together with
NumPy, it provides an effective open-source alternative to MATLAB. It
can also be used with graphics toolkits such as PyQt and wxPython.
Conventionally, the package is imported into the Python script by adding the following
statement −