
A
Seminar Report
On
“Python Libraries For Data Science”
Submitted
in partial fulfillment
for the award of the Degree of
Bachelor of Technology
in the Department of Computer Science & Engineering

Submitted To:                                  Submitted By:

Mr. Pawan Sen                                  Suraj Kumar Kushwah
Head of Department                             Roll No. 21EAYCS134
                                               IV Year, VII Sem

Department of Computer Science & Engineering


Arya College of Engineering & Research Centre, Kukas, Jaipur
Rajasthan Technical University, Kota
(2024-25)

CANDIDATE'S DECLARATION

This is to certify that the seminar report on "Python Libraries For Data Science", submitted
by Suraj Kumar Kushwah (21EAYCS134) in partial fulfillment for the award
of the degree of Bachelor of Technology in Computer Science & Engineering, has
been found satisfactory and is approved for submission.

Suraj Kumar Kushwah


21EAYCS134

ACKNOWLEDGEMENT

I do not have words sufficient to express my wholehearted feelings of gratitude
to my respected Mr. Pawan Sen, HOD, Department of Computer Science &
Engineering, and Prof. (Dr.) Himanshu Arora, Principal, ACERC, Jaipur. I am
very thankful and grateful to have had the opportunity to work under his
supervision, and for his generous guidance, valuable help, and endless
encouragement through his personal interest and attention.

I am also very thankful to Dr. Arvind Agrawal, Chairperson of Arya College, and
Dr. Pooja Agrawal, Vice Chairperson of Arya College, who gave me immense
support throughout the seminar work in B.Tech. I wish to express my sincere
regards and thanks to all the teachers of the Department of Computer Science &
Engineering, Jaipur, for being more than willing to share their treasure of
knowledge with me.

Suraj Kumar Kushwah


Roll no: 21EAYCS134

ABSTRACT

Python is a versatile programming language that has become a cornerstone in
data science due to its robust ecosystem of libraries. Key libraries include
NumPy for numerical computations, Pandas for data manipulation, and
Matplotlib and Seaborn for data visualization. For machine learning,
Scikit-learn provides powerful algorithms, while TensorFlow and PyTorch enable
deep learning applications. These libraries, together with tools like Statsmodels
for statistical modeling and NLTK for natural language processing, empower
data scientists to analyze, model, and visualize data efficiently, making Python
an essential tool for insights and decision-making.

Table of Contents
TITLE PAGE No
Cover Page i
Candidate’s Declaration ii
Acknowledgement iii
Abstract iv
1. Introduction 1
2. Importance of Python Libraries for Data Science 2
3. NumPy Library 3
4. Pandas Library 4-5
5. Matplotlib Library 6-7
6. Seaborn Library 8-9
7. Scikit-Learn Library 10
8. TensorFlow 11
9. References 12

Chapter-1

Introduction

Data science has emerged as one of the most impactful domains in technology, enabling
businesses and researchers to derive meaningful insights from data. Python, a versatile and
high-level programming language, has established itself as a preferred tool for data science
due to its simplicity and the rich ecosystem of libraries.

The journey of data science involves multiple stages, including data collection,
preprocessing, analysis, visualization, and model building. Python provides dedicated
libraries for each of these stages. For instance, NumPy facilitates numerical computations,
Pandas enables efficient data manipulation, and Scikit-learn and TensorFlow power
machine learning and deep learning tasks. This report highlights the features, advantages, and
practical applications of these libraries with hands-on code examples.

The versatility and active community support of Python have made it an indispensable tool in
data science. As data continues to grow in volume and complexity, mastering Python and its
libraries is becoming a necessity for professionals across industries.

Chapter-2

Importance of Python Libraries for Data Science

Python’s dominance in data science is rooted in its ability to adapt to the diverse needs of the
field. Below are some of the core reasons why Python is indispensable in data science:

1. Simplicity and Readability: Python’s syntax is straightforward and mimics natural
language, making it easier for non-programmers, such as statisticians and analysts, to
adopt.
2. Extensive Libraries: Python offers libraries for every aspect of data science.
Libraries like NumPy and Pandas simplify data manipulation, while TensorFlow and
PyTorch provide powerful tools for advanced analytics and deep learning.
3. Community and Support: Python has a large, active community that contributes to
the development of libraries, tutorials, and troubleshooting forums. This ensures that
resources and support are always available.
4. Cross-Platform Compatibility: Python runs on all major operating systems
(Windows, macOS, Linux), making it highly versatile for data science projects that
require collaboration across platforms.
5. Scalability and Integration: Python can handle large-scale data processing tasks
with libraries like PySpark and Dask. It also integrates seamlessly with other
programming languages and databases.
6. Open-Source Nature: Python and its libraries are open-source, reducing costs and
encouraging widespread adoption across industries.

Chapter-3

NumPy Library

NumPy, short for Numerical Python, is the foundation of numerical computing in Python. It
offers support for large, multi-dimensional arrays and matrices, along with a rich set of
mathematical functions to operate on these arrays.

Key Features:

• Efficient array computation using ndarrays.


• Broadcasting capabilities for performing operations on arrays of different shapes.
• Tools for linear algebra, Fourier transform, and random number generation.
• High performance due to optimized C and Fortran code under the hood.

Advantages:

• Acts as the backbone for other libraries such as Pandas, Scikit-learn, and TensorFlow.
• Handles data processing tasks with minimal memory consumption.

Code

import numpy as np

# Create a 2D array

array = np.array([[1, 2, 3], [4, 5, 6]])

print("Original Array:\n", array)

Applications:
NumPy is essential for tasks like:

• Performing numerical simulations in engineering.


• Developing statistical models in data analysis.
• Accelerating computational tasks in machine learning.

Chapter-4

Pandas Library

Pandas is a powerful, flexible, and easy-to-use library for data manipulation and analysis. It
provides data structures like Series (one-dimensional) and DataFrames (two-dimensional) to
manage structured data efficiently.

Key Features:

• DataFrames: Tabular data structure with labeled rows and columns.


• Flexible indexing, filtering, and subsetting tools.
• Support for handling missing data.
• Integration with visualization libraries for quick data plotting.

Advantages:

• Simplifies data preprocessing tasks like merging, reshaping, and grouping.


• Handles large datasets efficiently with optimized performance.

Code

import pandas as pd

# Create a DataFrame
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 40],
'Salary': [50000, 60000, 70000, 80000]
}
df = pd.DataFrame(data)

# Perform operations
print("DataFrame:\n", df)
print("Summary statistics:\n", df.describe())

print("Rows where Age > 30:\n", df[df['Age'] > 30])

# Adding a new column
df['Tax'] = df['Salary'] * 0.2
print("Updated DataFrame:\n", df)

Applications:
Pandas is widely used in:

• Exploratory Data Analysis (EDA).


• Data cleaning and wrangling.
• Preparing data for machine learning models.

Chapter-5

Matplotlib Library
Matplotlib is a powerful Python library for creating data visualizations and plots. It provides
various functions and modules that enable users to create, customize, and display a wide range
of visualizations. Here are some functions and concepts associated with Matplotlib in the
context of data analysis and visualization.

Basic Plotting Functions:

plt.plot(): Creates line plots and can be used for visualizing trends over continuous data
points.

plt.scatter(): Generates scatter plots for visualizing relationships between two variables.

plt.bar(), plt.barh(): Creates bar charts for displaying categorical data.

plt.hist(): Generates histograms for visualizing the distribution of data.

plt.boxplot(), plt.violinplot(): Used to create box plots and violin plots for displaying the
distribution of data and identifying outliers.

plt.pie(): Generates pie charts for displaying parts of a whole.

Key Features:

1. Customizable plots (colors, styles, labels).


2. Support for 2D and 3D plotting.
3. Integration with NumPy and Pandas.

Code

import matplotlib.pyplot as plt

categories = ['A', 'B', 'C']

values = [10, 20, 15]

plt.bar(categories, values)

plt.title("Bar Plot")

plt.show()

Figure 1: Bar plot produced by the code above.
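The plt.plot() function described earlier can be sketched in the same way; the x and y values
below are illustrative only:

import matplotlib.pyplot as plt

# Illustrative data for a simple trend line
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 6, 5]

plt.plot(x, y, marker='o', label='Trend')
plt.xlabel("X values")
plt.ylabel("Y values")
plt.title("Line Plot")
plt.legend()
plt.show()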

Applications:

1. Visualizing time-series data.


2. Enhancing research papers with professional plots.
3. Business reporting and presentations.

Chapter-6

Seaborn Library

Seaborn is a Python data visualization library based on Matplotlib that provides a
high-level interface for creating informative and aesthetically pleasing statistical graphics. It is
particularly well-suited for data analysis and exploration, as it simplifies the process of
creating complex visualizations with concise code.

Key Features:

1. Built-in themes for better visuals.


2. Statistical plots like boxplots, heatmaps, and violin plots.
3. Seamless integration with Pandas.

Code

import seaborn as sns
import matplotlib.pyplot as plt

# Create a histogram

sns.histplot([1, 2, 2, 3, 3, 3, 4, 4, 5], kde=True)

plt.title("Histogram Example")

plt.show()

Figure 2: Histogram with KDE curve produced by the code above.
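The heatmap support mentioned in the key features can be sketched as follows; the DataFrame
values are made up for this illustration:

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative numeric data for a correlation heatmap
data = pd.DataFrame({
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000],
    'Experience': [2, 5, 9, 15]
})

# Heatmap of pairwise correlations between the numeric columns
sns.heatmap(data.corr(), annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap")
plt.show()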

Applications:

1. Exploring distributions and relationships in datasets.


2. Quick data visualizations during EDA.
3. Enhancing data storytelling.

Chapter-7

Scikit-Learn Library

Scikit-learn is a library for machine learning, providing tools for data preprocessing, model
building, and evaluation.

Key Features:

1. Support for supervised and unsupervised learning.


2. Feature engineering and selection tools.
3. Easy-to-use API for quick implementation

Code

from sklearn.datasets import load_iris

from sklearn.tree import DecisionTreeClassifier

# Load dataset and train model

iris = load_iris()

X, y = iris.data, iris.target

model = DecisionTreeClassifier()

model.fit(X, y)

print("Predictions:", model.predict(X[:5]))

Applications:

1. Building predictive models.


2. Feature extraction and dimensionality reduction.
3. Evaluating and fine-tuning machine learning pipelines.

Chapter-8

TensorFlow
TensorFlow is an open-source machine learning framework developed by Google. It's
designed for creating, training, and deploying machine learning models, particularly deep
learning models. TensorFlow allows you to build and train neural networks for a wide range
of machine learning tasks. Here's an explanation of TensorFlow in the context of machine
learning, along with some key functions and concepts:

TensorFlow Core:

• Tensors: TensorFlow is named after its core concept, tensors, which are multi-
dimensional arrays. Tensors can be constants, variables, or placeholders.
• Computational Graph: TensorFlow builds a computational graph that
represents the operations to be performed on tensors. This allows for efficient
execution and optimization.
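A minimal sketch of working with tensors, assuming TensorFlow 2.x where operations run
eagerly by default (the values are illustrative):

import tensorflow as tf

# A constant tensor and a trainable variable (illustrative values)
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
w = tf.Variable(tf.ones((2, 2)))

# Tensor operations such as matrix multiplication and reduction
print(tf.matmul(a, w))
print(tf.reduce_sum(a))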

Building Models:

• Sequential API (Keras): TensorFlow's Keras API is the preferred high-level
interface for building deep learning models. It simplifies the process of creating
and training neural networks.

• Functional API: TensorFlow allows you to create models with more complex
architectures, including multiple inputs and outputs.
Layers and Activation Functions:

• TensorFlow provides a variety of layers like Dense, Conv2D, and LSTM, and
activation functions like relu, sigmoid, and tanh to build neural network
architectures.

• Optimizers: TensorFlow offers optimizers such as SGD, Adam, and RMSprop to
minimize the loss during training.


Model Training:

• model.compile(): Configures the model with the chosen loss function, optimizer, and
metrics.

• model.fit(): Trains the model on labeled training data, specifying the number of
training epochs and the batch size.
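Putting these pieces together, a minimal sketch of building and training a small Keras model is
shown below, assuming TensorFlow 2.x; the random data stands in for a real labeled dataset:

import numpy as np
import tensorflow as tf
from tensorflow import keras

# Random placeholder data standing in for a real labeled dataset
X = np.random.rand(100, 4).astype("float32")
y = np.random.randint(0, 3, size=(100,))

# A small Sequential model with Dense layers and relu/softmax activations
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(16, activation='relu'),
    keras.layers.Dense(3, activation='softmax')
])

# compile() sets the optimizer, loss, and metrics; fit() runs the training loop
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X, y, epochs=5, batch_size=16, verbose=0)
print("Training finished")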
Chapter-9

References
NumPy: Core library for numerical computations, offering powerful array objects and
functions for mathematical operations.

Pandas: Ideal for data manipulation and analysis. It provides DataFrame and Series objects
to handle structured data efficiently.

Matplotlib: A plotting library for creating static, interactive, and animated visualizations.

Seaborn: Built on Matplotlib, it simplifies complex visualizations with beautiful, high-level
interfaces.

Scikit-learn: A comprehensive library for machine learning, including tools for classification,
regression, clustering, and preprocessing.

TensorFlow and PyTorch: Popular frameworks for deep learning and neural network tasks.

Statsmodels: Focused on statistical modeling and hypothesis testing.

SciPy: Enhances NumPy with advanced scientific computing features like optimization and
integration.
