Dsa Report
TECHNOLOGY, BHOPAL
Parth Rajput
211112249
5th Sem
CSE – 2
DECLARATION
I hereby certify that the work presented in this report,
entitled “Data Science”, in fulfilment of the requirements for
completion of summer industrial training in the Department of
Computer Science of Maulana Azad National Institute of
Technology, Bhopal, is an authentic record of my own work and
project carried out during industrial training this summer.
PARTH RAJPUT
211112249
5th Sem
CSE-2
ABSTRACT
Data Science:
In this project we downloaded the Gfg.1 CSV file from the website,
which contains the IMDB Movie_ID values. We then used web scraping
to extract the data from the corresponding pages of the IMDB
website.
Each Movie_ID is passed as input; the HTML retrieved from the
webpage is parsed using Beautiful Soup, and the desired information
is extracted, namely “Series_name”, “Series_rating” and
“Series_genre”.
The corresponding output graph is then plotted using the Matplotlib
library.
3) NumPy:
NumPy, short for "Numerical Python," is a fundamental open-
source library in Python for numerical and scientific computing. It
provides support for creating and manipulating arrays and matrices
of data, along with a wide range of mathematical functions to
operate on these arrays efficiently. NumPy is an essential tool for
tasks involving numerical computations, data analysis, and
scientific research. It's particularly valuable for its speed and
memory efficiency, making it a foundation for many other
Python libraries, including Pandas and Matplotlib.
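A minimal sketch of the array operations described above. The rating values are hypothetical and serve only as illustration; they are not data from the project.

```python
import numpy as np

# Hypothetical ratings, used only for illustration.
ratings = np.array([8.1, 7.4, 9.0, 6.5])

# Vectorized operations act on the whole array without an explicit loop.
mean_rating = ratings.mean()
above_mean = ratings[ratings > mean_rating]  # boolean-mask filtering
scaled = ratings / 10.0                      # element-wise division

# The same data viewed as a 2x2 matrix, summed column by column.
matrix = ratings.reshape(2, 2)
column_sums = matrix.sum(axis=0)

print(mean_rating, above_mean, column_sums)
```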
4) Pandas:
Pandas is a popular open-source Python library for data manipulation
and analysis. It provides easy-to-use data structures and functions for
working with structured and tabular data, making it a valuable tool for
data scientists, analysts, and researchers. Pandas introduces two
primary data structures: the DataFrame, which is a two-dimensional
table-like data structure with rows and columns, and the Series, which
is a one-dimensional array-like structure.
Pandas simplifies various data operations, including data cleaning,
transformation, filtering, aggregation, and exploration. It allows users
to import data from a variety of sources, such as CSV files, Excel
spreadsheets, databases, and more, and then perform data wrangling
and analysis tasks with ease.
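The DataFrame and Series structures and a few of the operations listed above can be sketched as follows. The CSV text, the placeholder IDs, and the values are assumptions made for illustration; only the column names come from this report.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a downloaded file.
csv_text = """Movie_ID,Series_name,Series_rating,Series_genre
id_1,Alpha,8.1,Drama
id_2,Beta,7.4,Comedy
id_3,Gamma,9.0,Drama
"""

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

names = df["Series_name"]                   # a single column is a Series
top_rated = df[df["Series_rating"] >= 8.0]  # filtering rows with a boolean mask
mean_by_genre = df.groupby("Series_genre")["Series_rating"].mean()  # aggregation

print(top_rated)
print(mean_by_genre)
```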
5) Matplotlib:
Matplotlib is a popular open-source Python library for creating static,
animated, and interactive visualizations and plots. It offers a wide
range of tools and functions for generating high-quality charts,
graphs, and figures, making it an essential tool for data visualization
in fields such as data analysis, scientific research, and engineering.
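As an illustration, a bar chart of ratings might be drawn as below. The names and ratings are hypothetical, and the Agg backend is chosen here only so that the script also runs without a display; in a Jupyter Notebook the figure would normally be shown inline instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plt

# Hypothetical data, used only for illustration.
names = ["Alpha", "Beta", "Gamma"]
ratings = [8.1, 7.4, 9.0]

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(names, ratings, color="steelblue")
ax.set_xlabel("Series_name")
ax.set_ylabel("Series_rating")
ax.set_title("Rating per series (illustrative data)")
ax.set_ylim(0, 10)
fig.tight_layout()

# Render to an in-memory PNG; a file name such as "ratings.png" also works.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```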
6) json.loads:
‘json.loads’ is a Python function whose name is short for "JSON load
string." It is part of the json module in the Python standard library
and is used to parse a JSON-formatted string and convert it into a
Python data structure, typically a dictionary, a list, or a
combination of both, depending on the JSON content.
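A short sketch of this conversion, using a made-up JSON string:

```python
import json

# Hypothetical JSON text, used only for illustration.
text = '{"name": "Alpha", "rating": 8.1, "genre": ["Drama", "Crime"], "ended": false, "sequel": null}'

data = json.loads(text)

# JSON objects become dicts, arrays become lists,
# true/false become True/False, and null becomes None.
print(type(data))          # <class 'dict'>
print(data["genre"][0])    # Drama

# Malformed input raises json.JSONDecodeError.
try:
    json.loads("{not valid json}")
    parse_failed = False
except json.JSONDecodeError:
    parse_failed = True
```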
Implementation:
7) Beautiful Soup:
It is available through the bs4 library in Python.
Beautiful Soup is a popular Python library for web scraping and
parsing HTML and XML documents. It provides a convenient way
to extract specific information from web pages.
Beautiful Soup makes it easier to work with web data by providing
functions and methods for navigating, searching, and modifying the
parsed document tree.
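A minimal sketch of parsing and searching a document with Beautiful Soup. The HTML snippet is invented for illustration and is not taken from any real page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, used only for illustration.
html = """
<html>
  <head><title>Example page</title></head>
  <body>
    <h1 id="main">Alpha</h1>
    <p class="genre">Drama</p>
    <p class="genre">Crime</p>
    <a href="/title/one">First</a>
  </body>
</html>
"""

# "html.parser" is the parser bundled with the Python standard library.
soup = BeautifulSoup(html, "html.parser")

page_title = soup.title.string                           # navigate by tag name
heading = soup.find("h1", id="main").get_text(strip=True)  # first match
genres = [p.get_text(strip=True)
          for p in soup.find_all("p", class_="genre")]   # all matches
link_target = soup.find("a")["href"]                     # read an attribute

print(page_title, heading, genres, link_target)
```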
IMPLEMENTATION:
APPLICATION AND ITS OUTCOME
Here I implemented the above-mentioned technologies and tools
using Jupyter Notebook and demonstrated their application in
my project: “Web Scraping – IMDB”.
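The extraction step of such a project can be sketched as below. This is an illustrative sketch, not the notebook code of this project. It assumes the page embeds its metadata in a script tag of type application/ld+json with "name", "genre" and "aggregateRating" fields; the helper name extract_series_info and the sample HTML are hypothetical, and downloading the page for a given Movie_ID is left out.

```python
import json

from bs4 import BeautifulSoup


def extract_series_info(html):
    """Return Series_name, Series_rating and Series_genre from one page.

    Hypothetical helper: assumes the metadata is embedded as JSON-LD.
    """
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("script", type="application/ld+json")
    if tag is None or not tag.string:
        return None  # page has no embedded metadata block

    data = json.loads(tag.string)
    rating = data.get("aggregateRating", {}).get("ratingValue")
    genre = data.get("genre", [])
    if isinstance(genre, str):  # a single genre may be stored as a plain string
        genre = [genre]

    return {
        "Series_name": data.get("name"),
        "Series_rating": float(rating) if rating is not None else None,
        "Series_genre": genre,
    }


# Hypothetical page content, used only for illustration.
sample_html = """
<html><head>
<script type="application/ld+json">
{"name": "Alpha", "genre": ["Drama", "Crime"],
 "aggregateRating": {"ratingValue": 8.1}}
</script>
</head><body></body></html>
"""

info = extract_series_info(sample_html)
print(info)
```

Keeping the parsing in a function that takes an HTML string means it can be checked against saved pages before any request is sent to the website.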
CODE:
OUTPUT:
I would like to express my gratitude to the trainers and mentors who
have guided me throughout this program, providing their expertise,
support, and encouragement. I also appreciate the opportunity to
collaborate with my fellow trainees, as the exchange of ideas and
experiences has enriched my learning.