suraj report file
Seminar Report On
“Python Libraries For Data Science”
Submitted
In partial fulfillment
For the award of the Degree of
Bachelor of Technology
In Department of Computer Engineering
CANDIDATE'S DECLARATION
This is to certify that the seminar report on "Python Libraries For Data Science", submitted
by "Suraj Kumar Kushwah (21EAYCS134)" in partial fulfillment for the award
of the degree of Bachelor of Technology in Computer Science & Engineering, has
been found satisfactory and is approved for submission.
ACKNOWLEDGEMENT
ABSTRACT
Table of Contents
TITLE PAGE No
Cover Page i
Candidate’s Declaration ii
Acknowledgement iii
Abstract iv
1. Introduction 1
2. Importance of Python Libraries for Data Science 2
3. NumPy Library 3
4. Pandas Library 4-5
5. Matplotlib Library 6-7
6. Seaborn Library 8-9
7. Scikit-Learn Library 10
8. TensorFlow 11
9. References 12
Chapter-1
Introduction
Data science has emerged as one of the most impactful domains in technology, enabling
businesses and researchers to derive meaningful insights from data. Python, a versatile and
high-level programming language, has established itself as a preferred tool for data science
due to its simplicity and the rich ecosystem of libraries.
The journey of data science involves multiple stages, including data collection,
preprocessing, analysis, visualization, and model building. Python provides dedicated
libraries for each of these stages. For instance, NumPy facilitates numerical computations,
Pandas enables efficient data manipulation, and Scikit-learn and TensorFlow power
machine learning and deep learning tasks. This report highlights the features, advantages, and
practical applications of these libraries with hands-on code examples.
The versatility and active community support of Python have made it an indispensable tool in
data science. As data continues to grow in volume and complexity, mastering Python and its
libraries is becoming a necessity for professionals across industries.
Chapter-2
Importance of Python Libraries for Data Science
Python’s dominance in data science is rooted in its ability to adapt to the diverse needs of the
field. Below are some of the core reasons why Python is indispensable in data science:
Chapter-3
NumPy Library
NumPy, short for Numerical Python, is the foundation of numerical computing in Python. It
offers support for large, multi-dimensional arrays and matrices, along with a rich set of
mathematical functions to operate on these arrays.
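As a minimal sketch of the array model described above, the following builds a three-dimensional array and applies a few of NumPy's mathematical functions to it. The values and variable names are illustrative and are not part of the original report.

```python
import numpy as np

# Illustrative data: a 3-D array with shape (2, 3, 4) holding 0..23
cube = np.arange(24).reshape(2, 3, 4)

total = cube.sum()                     # sum over every element
layer_means = cube.mean(axis=(1, 2))   # one mean per 3x4 layer
roots = np.sqrt(cube)                  # element-wise square root

print(cube.ndim, cube.shape)
print(total, layer_means)
```

Each function works on the whole array at once, so no explicit Python loop over the elements is needed.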
Key Features:
Advantages:
• Acts as the backbone for other libraries such as Pandas, Scikit-learn, and TensorFlow.
• Stores elements of a single data type in compact buffers, so large datasets can be
processed with low memory overhead.
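The memory point above can be checked directly, since every array reports how many bytes its data occupies. The sketch below assumes an array of one thousand integers chosen purely for illustration.

```python
import numpy as np

# One thousand integers stored as 32-bit values (size chosen for illustration)
values = np.arange(1000, dtype=np.int32)
print(values.itemsize)   # bytes per element
print(values.nbytes)     # bytes for the whole data buffer

# Widening the dtype doubles the storage needed per element
as_float64 = values.astype(np.float64)
print(as_float64.nbytes)

# Vectorized arithmetic operates on the whole buffer without a Python loop
doubled = values * 2
```

Choosing the smallest dtype that fits the data is one simple way to keep memory use down.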
Code
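import numpy as np
# Create a 2D array (illustrative values; the original listing is truncated here)
arr = np.array([[1, 2, 3], [4, 5, 6]])
print("Shape:", arr.shape)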
Applications:
NumPy is essential for tasks like:
Chapter-4
Pandas Library
Pandas is a powerful, flexible, and easy-to-use library for data manipulation and analysis. It
provides data structures like Series (one-dimensional) and DataFrames (two-dimensional) to
manage structured data efficiently.
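To make the relationship between the two structures concrete, here is a small sketch in which a Series becomes one column of a DataFrame. The city names are hypothetical values added only for illustration.

```python
import pandas as pd

# A Series is a one-dimensional labeled array
ages = pd.Series([25, 30, 35], index=["Alice", "Bob", "Charlie"], name="Age")

# A DataFrame is a two-dimensional table whose columns share one index
# (the "City" values are hypothetical)
people = pd.DataFrame({"Age": ages, "City": ["Pune", "Delhi", "Jaipur"]})

# Selecting a single column from a DataFrame gives back a Series
column = people["Age"]

print(people)
print(type(column).__name__, people.shape)
```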
Key Features:
Advantages:
Code
import pandas as pd
# Create a DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 40],
'Salary': [50000, 60000, 70000, 80000]
}
df = pd.DataFrame(data)
# Perform operations
print("DataFrame:\n", df)
print("Summary statistics:\n", df.describe())
print("Rows where Age > 30:\n", df[df['Age'] > 30])
Applications:
Pandas is widely used in:
Chapter-5
Matplotlib Library
Matplotlib is a powerful Python library for creating data visualizations and plots. It provides
various functions and modules that enable users to create, customize, and display a wide range
of visualizations. Here are some functions and concepts associated with Matplotlib in the
context of data analysis and visualization.
plt.plot(): Creates line plots and can be used for visualizing trends over continuous data
points.
plt.scatter(): Generates scatter plots for visualizing relationships between two variables.
plt.boxplot(), plt.violinplot(): Used to create box plots and violin plots for displaying the
distribution of data.
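The four functions listed above can be tried side by side in one figure. This is a minimal sketch under stated assumptions: the data is randomly generated for illustration, and the non-interactive "Agg" backend is selected so the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data, not taken from the report
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
trend = np.sin(x)
noisy = trend + rng.normal(scale=0.2, size=x.size)
groups = [rng.normal(loc=m, size=100) for m in (0, 1, 2)]

fig = plt.figure(figsize=(8, 6))

plt.subplot(2, 2, 1)
plt.plot(x, trend)          # line plot of a continuous trend
plt.title("plt.plot")

plt.subplot(2, 2, 2)
plt.scatter(x, noisy)       # relationship between two variables
plt.title("plt.scatter")

plt.subplot(2, 2, 3)
plt.boxplot(groups)         # distribution summary per group
plt.title("plt.boxplot")

plt.subplot(2, 2, 4)
plt.violinplot(groups)      # distribution shape per group
plt.title("plt.violinplot")

plt.tight_layout()
```

In an interactive session, calling plt.show() at the end displays the figure.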
Key Features:
Code
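import matplotlib.pyplot as plt
# Illustrative data; the original listing does not show how these were defined
categories, values = ['A', 'B', 'C'], [10, 20, 15]
plt.bar(categories, values)
plt.title("Bar Plot")
plt.show()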
Figure 1
Applications:
Chapter-6
Seaborn Library
Seaborn is a Python data visualization library based on Matplotlib that provides a
high-level interface for creating informative and aesthetically pleasing statistical graphics. It is
particularly well-suited for data analysis and exploration, as it simplifies the process of
creating complex visualizations with concise code.
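As a rough illustration of the high-level interface, the sketch below draws a grouped box plot from a DataFrame in a single call. The column names and values are hypothetical, and the "Agg" backend is an assumption made so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; omit this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: one categorical column and one numeric column
df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B"],
    "value": [1.0, 2.0, 3.0, 2.0, 4.0, 6.0],
})

# Seaborn reads the column names, groups the rows, and labels the axes
ax = sns.boxplot(data=df, x="group", y="value")
ax.set_title("Value by group")
```

Because Seaborn returns a Matplotlib Axes object, the plot can still be customized with ordinary Matplotlib calls.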
Key Features:
Code
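import matplotlib.pyplot as plt
import seaborn as sns
# Create a histogram (sample values are illustrative; the original data is not shown)
sns.histplot([1, 2, 2, 3, 3, 3, 4, 4, 5], bins=5)
plt.title("Histogram Example")
plt.show()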
Figure 2
Applications:
Chapter-7
Scikit-Learn Library
Scikit-learn is a library for machine learning, providing tools for data preprocessing, model
building, and evaluation.
Key Features:
Code
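from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
# Load the Iris dataset and fit a decision tree
iris = load_iris()
X, y = iris.data, iris.target
model = DecisionTreeClassifier()
model.fit(X, y)
# Predict on the first five training rows
print("Predictions:", model.predict(X[:5]))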
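The listing above predicts on rows the model has already seen during training. A sketch of the full preprocessing, model building, and evaluation flow might look like the following. The choice of StandardScaler, LogisticRegression, and a 25% test split is an assumption made for illustration, not something the report prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows so evaluation uses unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing (scaling) and the model are chained into one estimator
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
clf.fit(X_train, y_train)

# Evaluation on the held-out rows
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Held-out accuracy: {accuracy:.2f}")
```

Wrapping the scaler and the classifier in a pipeline ensures the scaling statistics are learned from the training rows only.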
Applications:
Chapter-8
TensorFlow
TensorFlow is an open-source machine learning framework developed by Google. It's
designed for creating, training, and deploying machine learning models, particularly deep
learning models. TensorFlow allows you to build and train neural networks for a wide range
of machine learning tasks. Here's an explanation of TensorFlow in the context of machine
learning, along with some key functions and concepts:
TensorFlow Core:
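• Tensors: TensorFlow is named after its core concept, tensors, which are
multi-dimensional arrays. Tensors can be created as constants or variables;
placeholders belong to the older TensorFlow 1.x graph API.
• Computational Graph: TensorFlow can build a computational graph that
represents the operations to be performed on tensors. This allows for efficient
execution and optimization. TensorFlow 2 runs operations eagerly by default and
builds a graph when a function is wrapped with tf.function.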
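The two core ideas above can be sketched in a few lines, assuming TensorFlow 2 is installed. The function name scaled_sum and all values are hypothetical and chosen only for illustration.

```python
import tensorflow as tf

# A constant tensor has fixed values; a variable can be updated in place
c = tf.constant([[1.0, 2.0], [3.0, 4.0]])
v = tf.Variable(tf.zeros((2, 2)))
v.assign_add(c)  # v now holds the same values as c

# tf.function traces the Python function into a computational graph
@tf.function
def scaled_sum(x, scale):
    return tf.reduce_sum(x) * scale

result = scaled_sum(v, tf.constant(2.0))
print(c.shape, float(result))
```

Without the decorator the same function would still run, just eagerly, one operation at a time.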
Building Models:
• Functional API: TensorFlow allows you to create models with more complex
architectures, including multiple inputs and outputs.
Layers and Activation Functions:
• model.compile(): Configures the model with the chosen loss function, optimizer, and
metrics.
• model.fit(): Trains the model on labeled training data, specifying the number of
epochs.
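The pieces named in this chapter (the functional API, model.compile(), and model.fit()) fit together roughly as sketched below. This assumes TensorFlow 2 with its bundled Keras API. The input names, layer sizes, and random training data are hypothetical and serve only to make the example runnable.

```python
import numpy as np
import tensorflow as tf

# Functional API: two separate inputs merged into one network
# (names and sizes are hypothetical)
numeric_in = tf.keras.Input(shape=(4,), name="numeric")
aux_in = tf.keras.Input(shape=(2,), name="auxiliary")
merged = tf.keras.layers.Concatenate()([numeric_in, aux_in])
hidden = tf.keras.layers.Dense(8, activation="relu")(merged)
output = tf.keras.layers.Dense(1, activation="sigmoid")(hidden)
model = tf.keras.Model(inputs=[numeric_in, aux_in], outputs=output)

# compile(): choose the loss function, optimizer, and metrics
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# Random stand-in data with binary labels
rng = np.random.default_rng(0)
X_numeric = rng.normal(size=(32, 4)).astype("float32")
X_aux = rng.normal(size=(32, 2)).astype("float32")
y = rng.integers(0, 2, size=(32, 1)).astype("float32")

# fit(): train for a fixed number of epochs
history = model.fit([X_numeric, X_aux], y, epochs=2, batch_size=8, verbose=0)
print(len(history.history["loss"]))
```

The history object records one loss value per epoch, which is a convenient way to check that training actually ran.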
Chapter-9
References
NumPy: Core library for numerical computations, offering powerful array objects and
functions for mathematical operations.
Pandas: Ideal for data manipulation and analysis. It provides DataFrame and Series objects
to handle structured data efficiently.
Matplotlib: A plotting library for creating static, interactive, and animated visualizations.
Scikit-learn: A comprehensive library for machine learning, including tools for classification,
regression, clustering, and preprocessing.
TensorFlow and PyTorch: Popular frameworks for deep learning and neural network tasks.
SciPy: Enhances NumPy with advanced scientific computing features like optimization and
integration.