
Big Data Analytics

Instructor: Mr. Hasib Shamshad


Introduction to the Course
• What is Big Data?
According to Gartner's definition, "Big data is high-volume, high-velocity,
and high-variety information assets that demand cost-effective, innovative
forms of information processing for enhanced insight and decision making."
– Big Data refers to complex and large data sets that have to be processed and
analyzed to uncover valuable information that can benefit businesses and
organizations.
The History of Big Data
• Although the concept of big data itself is relatively new, the origins of large data sets go
back to the 1960s and '70s when the world of data was just getting started with the first
data centers and the development of the relational database.
• Around 2005, people began to realize just how much data users generated through
Facebook, YouTube, and other online services. Hadoop (an open-source framework
created specifically to store and analyze big data sets) was developed that same year.
NoSQL also began to gain popularity during this time.
• The development of open-source frameworks, such as Hadoop (and more recently,
Spark) was essential for the growth of big data because they make big data easier to
work with and cheaper to store. In the years since then, the volume of big data has
skyrocketed. Users are still generating huge amounts of data—but it’s not just humans
who are doing it.
• With the advent of the Internet of Things (IoT), more objects and devices are connected
to the internet, gathering data on customer usage patterns and product performance.
The emergence of machine learning has produced still more data.
Benefits of Big Data and Data Analytics
• Big data makes it possible for you to gain more complete answers
because you have more information.
• More complete answers mean more confidence in the data—which
means a completely different approach to tackling problems.
• Key benefits include:
1. Customer Acquisition and Retention
2. Focused and Targeted Promotions
3. Potential Risk Identification
4. Innovation
5. Complex Supplier Networks
6. Cost Optimization
Types of Big Data
1. Structured (Relational data):
Structured data is data that can be processed, stored, and retrieved in a
fixed format. It refers to highly organized information that can be readily
and seamlessly stored in and accessed from a database by simple search
engine algorithms. For instance, the employee table in a company database
is structured: the employee details, their job positions, their salaries,
etc., are present in an organized manner.
Types of Big Data (Cont..)
• Structured data is generally tabular data that is represented by
columns and rows in a database.
• Databases that hold tables in this form are called relational
databases.
• The mathematical term "relation" refers to a set of data organized
as a table.
• In structured data, every row in a table has the same set of columns.
• SQL (Structured Query Language) is the language used to query
structured data, as in the sketch below.
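To make this concrete, here is a minimal sketch using Python's built-in
sqlite3 module and a hypothetical employee table like the one described
earlier (the names, positions, and salaries are invented for illustration):

import sqlite3

# An in-memory database is enough for a quick demonstration
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Every row has the same fixed set of columns - the defining trait of
# structured data
cur.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, "
            "position TEXT, salary REAL)")
cur.executemany(
    "INSERT INTO employee (name, position, salary) VALUES (?, ?, ?)",
    [("Ali", "Analyst", 50000.0), ("Sara", "Engineer", 65000.0)],
)

# SQL retrieves rows in the same fixed format in which they were stored
for row in cur.execute("SELECT name, position, salary FROM employee "
                       "WHERE salary > 55000"):
    print(row)  # ('Sara', 'Engineer', 65000.0)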
Types of Big Data(Cont..)
2. Unstructured (Non-Relational data):
Unstructured data refers to data that lacks any specific form or structure.
This makes it very difficult and time-consuming to process and analyze.
Email is an example of unstructured data.
• It is information that either is not organized in a pre-defined
manner or does not have a pre-defined data model.
• Unstructured information is typically text-heavy but may contain data
such as numbers, dates, and facts as well.
• Videos, audio, and binary data files might not have a specific
structure; they are classified as unstructured data.
Types of Big Data (Cont..)
3. Semi-structured:
Semi-structured data contains elements of both formats mentioned above,
structured and unstructured. To be precise, it refers to data that has not
been classified under a particular repository (database), yet contains
vital information or tags that segregate individual elements within the
data.
• Semi-structured data is information that does not fit the relational
model of structured data but still has some structure to it.
• Semi-structured data often consists of documents held in JavaScript
Object Notation (JSON) format. It also includes key-value stores and
graph databases. A small JSON sketch follows.
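For instance, the hypothetical JSON document below has no fixed table
schema, yet its keys still tag and segregate individual elements, which is
what makes it semi-structured:

import json

record = """
{
  "name": "Ali",
  "position": "Analyst",
  "skills": ["SQL", "Python"],
  "manager": {"name": "Sara"}
}
"""

doc = json.loads(record)  # parse the JSON text into Python objects
print(doc["skills"][0])   # the tags let us reach individual elements: SQL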
Assignment 2
Write the pros and cons of structured, unstructured, and
semi-structured data.
Characteristics of Structured & Unstructured data
Relational Data
• Relational databases provide undoubtedly the most well-understood
model for holding data.
• The simple structure of columns and tables makes them very easy to use
initially, but the inflexible structure can cause some problems.
• We communicate with relational databases using Structured Query
Language (SQL).
• SQL allows the joining of tables in a few lines of code, with a
structure most beginners can learn quickly; a minimal join sketch
follows the examples below.
• Examples of relational databases:
• MySQL
• PostgreSQL
• Db2
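As a quick illustration of how little code a join takes, here is a sketch
with two hypothetical tables, again using Python's built-in sqlite3 module
(the table and column names are invented):

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, "
            "dept_id INTEGER)")
cur.execute("INSERT INTO department VALUES (1, 'Sales')")
cur.execute("INSERT INTO employee VALUES (1, 'Ali', 1)")

# The join itself is a few lines of declarative SQL
query = """
SELECT employee.name, department.name
FROM employee
JOIN department ON employee.dept_id = department.id
"""
print(cur.execute(query).fetchall())  # [('Ali', 'Sales')]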
Characteristics of Structured & Unstructured data(cont..)
Non-Relational Data
• Non-relational databases permit us to store data in a format that more
closely matches its original structure.
• A non-relational database is a database that does not use the tabular
schema of columns and rows found in most traditional database systems.
• It uses a storage model that is optimized for the specific requirements
of the type of data being stored.
• In a non-relational database the data may be stored as JSON documents,
as simple key/value pairs, or as a graph consisting of edges and
vertices; each model is sketched after the examples below.
• Examples of non-relational databases:
• Redis
• JanusGraph
• MongoDB
• RabbitMQ
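The rough sketch below mimics each of those three storage models with
plain Python structures; the field names and values are hypothetical, and
a real system would use a dedicated database such as the ones listed
above:

import json

# Document store: each record is a self-describing JSON document
document = json.dumps({"user": "ali", "tags": ["big data", "nosql"]})

# Key/value store: opaque values looked up by key
kv_store = {"session:42": "logged_in"}

# Graph store: data held as vertices and edges
vertices = {"ali", "sara"}
edges = [("ali", "follows", "sara")]

print(document)
print(kv_store["session:42"])
print(edges[0])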
Characteristics of Big Data
3 ‘V’s of Big Data – Variety, Velocity, and Volume
• Variety:
Variety refers to the structured, unstructured, and semi-structured
data that is gathered from multiple sources. While in the past data
could only be collected from spreadsheets and databases, today data
comes in an array of forms such as emails, PDFs, photos, videos,
audio, social media posts, and much more.
Characteristics of Big Data(cont..)
• Velocity
Velocity refers to the speed at which data is being created in
real time. In a broader perspective, it comprises the rate of change,
the linking of incoming data sets arriving at varying speeds, and
bursts of activity.
• Volume
Volume is one of the defining characteristics of big data. Big Data
implies huge volumes of data generated on a daily basis from sources
such as social media platforms, business processes, machines, networks,
and human interactions. Such large amounts of data are stored in data
warehouses.
References
• Simplilearn.com:
https://www.simplilearn.com/benefits-of-big-data-and-analytics-article
• K21academy.com:
https://k21academy.com/microsoft-azure/dp-900/structured-data-vs-unstructured-data-vs-semi-structured-data/
Big Data Analytics
Lecture 2
Instructor: Mr. Hasib Shamshad
Why is Big Data Important?
• Every company uses data in its own way; the more efficiently a company
uses its data, the more potential it has to grow. A company can take data
from any source and analyze it for better performance. Benefits include:
• Cost Savings: Big Data tools like Hadoop and cloud-based analytics
can bring cost advantages to businesses when large amounts of data are
to be stored, and these tools also help in identifying more efficient
ways of doing business.
• Time Reductions: The high speed of tools like Hadoop and in-memory
analytics can easily identify new sources of data, which helps businesses
analyze data immediately and make quick decisions based on the findings.
Why is Big Data Important? (cont..)
• Understand market conditions: By analyzing big data you can get
a better understanding of current market conditions. For example, by
analyzing customers' purchasing behavior, a company can find out
which products sell the most and produce products according to this
trend. In this way, it can get ahead of its competitors.
• Control online reputation: Big data tools can perform sentiment
analysis, so you can get feedback about who is saying what about your
company. If you want to monitor and improve the online presence of
your business, big data tools can help with all of this.
Why is Big Data Important? (cont..)
• Using Big Data Analytics to Boost Customer Acquisition and
Retention :
The customer is the most important asset any business depends on.
There is no single business that can claim success without first having
to establish a solid customer base. However, even with a customer
base, a business cannot afford to disregard the high competition it
faces. If a business is slow to learn what customers are looking for, then
it is very easy to begin offering poor quality products. In the end, loss of
clientele will result, and this creates an adverse overall effect on
business success. The use of big data allows businesses to observe
various customer-related patterns and trends. Observing customer
behavior is important to trigger loyalty.
Why is Big Data Important? (cont..)
• Using Big Data Analytics to Solve Advertisers' Problems and Offer
Marketing Insights:
Big data analytics can help change all business operations. This includes
the ability to match customer expectations, change the company's product
line, and ensure that marketing campaigns are powerful.
• Big Data Analytics as a Driver of Innovation and Product
Development:
It helps companies innovate and redevelop their products.
Activities performed on Big Data
1. Store – Big data needs to be collected in a seamless repository; it
does not have to be stored in a single physical database.
2. Process – Processing big data is more tedious than traditional
processing in terms of cleansing, enriching, calculating, transforming,
and running algorithms.
3. Access – The data makes no business sense at all if it cannot be
searched and retrieved easily and virtually showcased along the
business lines.
Big Data Analytics
Instructor: Mr. Hasib Shamshad
Topics
1. Classification of analytics (advantages, examples)
2. Big Data Analytics Challenges
Classification of analytics
1. Descriptive analytics
Descriptive analytics is a statistical method that is used to search and
summarize historical data in order to identify patterns or meaning.
Data aggregation and data mining are two techniques used in
descriptive analytics to discover historical data. Data is first gathered
and sorted by data aggregation in order to make the datasets more
manageable for analysts.
Data mining describes the next step of the analysis and involves a
search of the data to identify patterns and meaning. Identified patterns
are analyzed to discover the specific ways that learners interacted with
the learning content and within the learning environment.
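As a small illustration of the aggregation step, the sketch below groups a
hypothetical table of learner scores with the pandas library (the data and
column names are made up for this example):

import pandas as pd

activity = pd.DataFrame({
    "learner": ["ali", "ali", "sara", "sara"],
    "module":  ["intro", "sql", "intro", "sql"],
    "score":   [70, 85, 90, 60],
})

# Aggregate the historical records to make them manageable for analysts
summary = activity.groupby("learner")["score"].agg(["mean", "count"])
print(summary)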
Classification of analytics(Cont..)
Advantages:
• Quickly and easily report on the Return on Investment (ROI) by
showing how performance met business or target goals.
• Identify gaps and performance issues early - before they become
problems.
• Identify specific learners who require additional support, regardless
of how many students or employees there are.
• Identify successful learners in order to offer positive feedback or
additional resources.
• Analyze the value and impact of course design and learning resources.
Classification of analytics(Cont..)
2. Predictive analytics:
Predictive Analytics is a statistical method that utilizes algorithms and
machine learning to identify trends in data and predict future
behaviors.
The software for predictive analytics has moved beyond the realm of
statisticians and is becoming more affordable and accessible for
different markets and industries, including the field of learning &
development.
Classification of analytics(Cont..)
For online learning specifically, predictive analytics is often found
incorporated in the Learning Management System (LMS), but can also
be purchased separately as specialized software.
For the learner, predictive forecasting could be as simple as a
dashboard located on the main screen after logging in to access a
course. Analyzing data from past and current progress, visual indicators
in the dashboard could be provided to signal whether the employee
was on track with training requirements.
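As a toy sketch of the idea, the code below fits a scikit-learn model on
hypothetical past-learner records and predicts whether a new learner is on
track; the features, labels, and numbers are all invented for illustration:

from sklearn.linear_model import LogisticRegression

# Features: [hours_logged, modules_completed]; label: 1 = finished training
X = [[2, 1], [10, 4], [1, 0], [12, 5], [6, 3], [3, 1]]
y = [0, 1, 0, 1, 1, 0]

model = LogisticRegression().fit(X, y)
print(model.predict([[8, 3]]))        # predicted outcome for a new learner
print(model.predict_proba([[8, 3]]))  # the confidence behind that prediction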
Classification of analytics(Cont..)
Advantages:
• Personalize the training needs of employees by identifying their gaps,
strengths, and weaknesses; specific learning resources and training can be
offered to support individual needs.
• Retain Talent by tracking and understanding employee career progression
and forecasting what skills and learning resources would best benefit their
career paths. Knowing what skills employees need also benefits the design
of future training.
• Support employees who may be falling behind or not reaching their
potential by offering intervention support before their performance puts
them at risk.
• Simplified reporting and visuals that keep everyone updated when
predictive forecasting is required.
Classification of analytics(Cont..)
3. Prescriptive analytics:
Prescriptive analytics is a statistical method used to generate
recommendations and make decisions based on the computational
findings of algorithmic models.
Generating automated decisions or recommendations requires specific
and unique algorithmic models and clear direction from those utilizing
the analytical technique. A recommendation cannot be generated
without knowing what to look for or what problem is desired to be
solved. In this way, prescriptive analytics begins with a problem.
Classification of analytics(Cont..)
Example
A Training Manager uses predictive analysis to discover that most learners
without a particular skill will not complete the newly launched course. What
could be done? Now prescriptive analytics can be of assistance on the matter
and help determine options for action. Perhaps an algorithm can detect the
learners who require that new course, but lack that particular skill, and send
an automated recommendation that they take an additional training
resource to acquire the missing skill.
The accuracy of a generated decision or recommendation, however, is only
as good as the quality of data and the algorithmic models developed. What
may work for one company’s training needs may not make sense when put
into practice in another company's training department. Models are
generally recommended to be tailored to each unique situation and need.
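A highly simplified sketch of that prescriptive step is shown below: a
hypothetical predicted completion probability is combined with a rule that
recommends an action. A real system would rely on tuned algorithmic models
rather than a single hand-written threshold:

def recommend(learner, predicted_completion_prob, has_required_skill):
    # Turn a prediction into a recommended action
    if not has_required_skill and predicted_completion_prob < 0.5:
        return f"Send {learner} the prerequisite skills module"
    return f"No action needed for {learner}"

print(recommend("ali", 0.35, has_required_skill=False))
print(recommend("sara", 0.90, has_required_skill=True))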
Classification of analytics(Cont..)
Descriptive vs Predictive vs Prescriptive Analytics
Descriptive Analytics is focused solely on historical data. Predictive
Analytics then uses this historical data to develop statistical models
that forecast future possibilities. Prescriptive Analytics takes
Predictive Analytics a step further: it takes the forecasted outcomes,
anticipates their consequences, and recommends actions accordingly.
Big Data Analytics Challenges
1. Need For Synchronization Across Disparate Data Sources
As data sets are becoming bigger and more diverse, there is a big challenge to
incorporate them into an analytical platform. If this is overlooked, it will create gaps
and lead to wrong messages and insights.
2. Acute Shortage Of Professionals Who Understand Big Data Analysis
Analysis is what makes the voluminous amount of data produced every
minute useful. With the exponential rise of data, a huge demand for big
data scientists and Big Data analysts has been created in the market. It
is important for business organizations to hire data scientists with
varied skills, as the job of a data scientist is multidisciplinary.
Another major challenge faced by businesses is the shortage of
professionals who understand Big Data analysis: there is a sharp shortage
of data scientists in comparison to the massive amount of data being
produced.
Big Data Analytics Challenges (Cont..)
3. Getting Meaningful Insights Through The Use Of Big Data Analytics
It is imperative for business organizations to gain important insights from
Big Data analytics, and it is equally important that only the relevant
department has access to this information. A big challenge in Big Data
analytics is bridging this wide gap in an effective manner.
4. Getting Voluminous Data Into The Big Data Platform
It is hardly surprising that data is growing with every passing day. This
simply means that business organizations need to handle large amounts of
data on a daily basis. The amount and variety of data available these days
can overwhelm any data engineer, which is why it is vital to make data
accessibility easy and convenient for brand owners and managers.
Big Data Analytics Challenges (Cont..)
5. Uncertainty Of Data Management Landscape:
With the rise of Big Data, new technologies and companies are being developed
every day. However, a big challenge faced by the companies in the Big Data
analytics is to find out which technology will be best suited to them without the
introduction of new problems and potential risks.
6. Data Storage And Quality:
Business organizations are growing at a rapid pace. As companies and large
business organizations grow, the amount of data they produce increases.
Storing this massive amount of data is becoming a real challenge for
everyone. Popular storage options like data lakes and warehouses are
commonly used to gather and store large quantities of unstructured and
structured data in its native format. The real problem arises when a data
lake or warehouse tries to combine unstructured and inconsistent data from
diverse sources: it encounters errors. Missing data, inconsistent data,
logic conflicts, and duplicate data all result in data quality challenges.
Big Data Analytics Challenges (Cont..)
7. Security And Privacy Of Data
Once business enterprises discover how to use Big Data, it brings them
a wide range of possibilities and opportunities. However, it also
involves potential risks when it comes to the privacy and security of
the data. The Big Data tools used for analysis and storage utilize data
from disparate sources. This eventually leads to a high risk of exposure,
making the data vulnerable. Thus, the rise of voluminous amounts of data
increases privacy and security concerns.
Big Data Analytics
Instructor: Mr. Hasib Shamshad
Topics
1. Life Cycle Phases of Data Analytics
2. Top Analytics Tools
Life Cycle Phases of Data Analytics
The data analytics lifecycle was designed to address Big Data
problems and data science projects. The process is iterative, to reflect
how real projects unfold. To address the specific demands of conducting
analysis on Big Data, a step-by-step methodology is required to plan the
various tasks associated with the acquisition, processing, analysis,
and repurposing of data.
Phase 1: Discovery –
• The data science team learns about and investigates the problem.
• It creates context and gains understanding.
• It learns about the data sources that are needed and available for
the project.
• The team comes up with an initial hypothesis, which can later be
confirmed with evidence.
Life Cycle Phases of Data Analytics(cont..)
• Phase 2: Data Preparation -
• The team explores, pre-processes, and conditions the data prior
to analysis and modelling.
• An analytic sandbox is required; the team extracts, loads, and
transforms data to bring it into the sandbox.
• Data preparation tasks can be repeated and are not bound to a
predetermined sequence.
• Some of the tools commonly used for this process include
Hadoop, Alpine Miner, OpenRefine, etc.
Life Cycle Phases of Data Analytics(cont..)
• Phase 3: Model Planning -
• The team studies the data to discover the connections between
variables, then selects the most significant variables as well
as the most effective models.
• In this phase, the data science team creates data sets that can
be used for training, testing, and production purposes.
• The team builds and executes models based on the work
completed in the model planning phase.
• Some of the tools commonly used for this stage are MATLAB
and STATISTICA.
Life Cycle Phases of Data Analytics(cont..)
• Phase 4: Model Building -
• The team creates datasets for training, testing, and
production use.
• The team also evaluates whether its current tools will be
sufficient to run the models or whether a more robust
environment is needed.
• Free or open-source tools: R and PL/R, Octave, WEKA.
• Commercial tools: MATLAB, STATISTICA.
Life Cycle Phases of Data Analytics(cont..)
• Phase 5: Communication Results -
• Following the execution of the model, team members evaluate
the outcomes of the model to establish criteria for
its success or failure.
• The team considers how best to present findings and
outcomes to the various members of the team and other
stakeholders, taking into consideration caveats
and assumptions.
• The team should determine the most important findings,
quantify their value to the business, and create a narrative to
present and summarize them for all stakeholders.
Life Cycle Phases of Data Analytics(cont..)
• Phase 6: Operationalize -
• The team distributes the benefits of the project to a wider audience.
It sets up a pilot project that will deploy the work in a controlled
manner prior to expanding the project to the entire enterprise of
users.
• This technique allows the team to gain insight into the performance
and constraints related to the model within a production setting at a
small scale and then make necessary adjustments before full
deployment.
• The team produces the final reports, presentations, and code.
• Open-source or free tools include WEKA, SQL, MADlib, and Octave.
Top Analytics Tools
1. R-Language: R is a language for statistical computing and graphics that
is also used for big data analysis. It provides a wide variety of
statistical tests.
Features:
• Effective data handling and storage facilities.
• It provides a suite of operators for calculations on arrays, in
particular matrices.
• It provides a coherent, integrated collection of big data tools for
data analysis.
• It provides graphical facilities for data analysis that display either
on-screen or in hard copy.
Top Analytics Tools(cont..)
2. Apache Spark is a powerful open source big data analytics tool. It
offers over 80 high-level operators that make it easy to build parallel
apps. It is used at a wide range of organizations to process large
datasets.
Features:
• It helps run applications in a Hadoop cluster up to 100 times faster
in memory and ten times faster on disk.
• It offers lightning-fast processing.
• It supports sophisticated analytics.
• It can integrate with Hadoop and existing Hadoop data.
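A minimal PySpark sketch is shown below; it assumes the pyspark package is
installed and a local Spark runtime is available, and the file name and
column name are hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("demo").getOrCreate()

# Read a (hypothetical) CSV file and run a high-level grouped aggregation;
# Spark parallelizes the work across the cluster or local cores
df = spark.read.csv("sales.csv", header=True, inferSchema=True)
df.groupBy("region").count().show()

spark.stop()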
Top Analytics Tools(cont..)
3. Plotly: It is an analytics tool that lets users create charts and
dashboards to share online.
Features:
• Easily turn any data into eye-catching and informative graphics
• It provides audited industries with fine-grained information on data
provenance(origin).
• Plotly offers unlimited public file hosting through its free community
plan.
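A brief Plotly sketch (assuming the plotly package is installed; the
numbers are invented sample data):

import plotly.express as px

# fig.show() renders an interactive chart in a browser or notebook
fig = px.bar(x=["Q1", "Q2", "Q3"], y=[120, 150, 90],
             labels={"x": "Quarter", "y": "Sales"},
             title="Quarterly sales (sample data)")
fig.show()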
Top Analytics Tools(cont..)
4. Lumify: It is a big data fusion, analysis, and visualization platform. It helps
users to discover connections and explore relationships in their data via a suite
of analytic options.
Features:
• It provides both 2D and 3D graph visualizations with a variety of automatic
layouts.
• It provides a variety of options for analyzing the links between entities on the
graph
• It comes with specific ingest processing and interface elements for textual
content, images, and videos
• Its spaces feature allows you to organize work into a set of projects,
or workspaces.
• It is built on proven, scalable big data technologies
Top Analytics Tools(cont..)
5. IBM SPSS Modeler: It is a predictive big data analytics platform. It offers
predictive models and delivers to individuals, groups, systems and the
enterprise. It has a range of advanced algorithms and analysis techniques.
Features:
• Discover insights and solve problems faster by analyzing structured and
unstructured data
• Use an intuitive interface for everyone to learn
• You can select from on-premises, cloud and hybrid deployment options
• Quickly choose the best performing algorithm based on model
performance
Top Analytics Tools(cont..)
6. MongoDB: MongoDB is a NoSQL, document-oriented database written in C,
C++, and JavaScript. It is free to use and is an open-source tool that
supports multiple operating systems, including Windows Vista (and
later versions), OS X (10.7 and later versions), Linux, Solaris, and
FreeBSD.
Its main features include aggregation, ad hoc queries, use of the BSON
format, sharding, indexing, replication, server-side execution of
JavaScript, schemaless storage, capped collections, MongoDB Management
Service (MMS), load balancing, and file storage.
Top Analytics Tools(cont..)
Features:
• Easy to learn.
• Provides support for multiple technologies and platforms.
• No hiccups in installation and maintenance.
• Reliable and low cost
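A brief pymongo sketch (assuming a MongoDB server is running on localhost
and the pymongo package is installed; the collection and field names are
hypothetical):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["demo"]

# Schemaless insert: documents in one collection need not share fields
db.users.insert_one({"name": "Ali", "skills": ["SQL", "Python"]})

# Ad hoc query by any field, with no schema migration required
print(db.users.find_one({"name": "Ali"}))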
# DATA VISUALIZATION:
# Bar Chart
# Histogram
# Scatter Plot
# Stack Plot/ Area Plot
# Pie Chart

# Matplotlib library

from matplotlib import pyplot as plt


plt.plot([1,2,3,4,5],[10,20,30,40,50])
plt.show()

plt.scatter([1,2,3,4,5],[10,20,30,40,50])

import numpy as np
import matplotlib.pyplot as plt
x= np.array([1,2,3,4,5,6,7,8])
y= np.array([15,32,66,45,90,153,170,200])
plt.title("Graph")
plt.xlabel("X axis")
plt.ylabel("Y axis")
plt.scatter(x,y)

plt.title("Scatter Plot")
plt.xlabel("X axis")
plt.ylabel("Y axis")
plt.scatter(x,y,label= 'nothing', s=50, color = 'r', marker ='*')
plt.legend()
plt.show()
# A scatter plot is used to check the relationship between two
# variables (correlation)
plt.title("Graph")
plt.xlabel("X axis")
plt.ylabel("Y axis")
plt.plot(x,y, color='r')

# We can also visualize each variable by histogram
plt.hist(y)

plt.hist(x)
plt.hist(y)


from matplotlib import pyplot as plt


from matplotlib import style
style.use('ggplot')
x1= [5,8,10]
y1= [12,16,6]

x2= [6,9,11]
y2= [6,15,7]

plt.plot(x1,y1,'g',label='Fast food',linewidth=5)
plt.plot(x2,y2,'c',label='motu',linewidth=5)
plt.title("FOOD")
plt.xlabel("x axis")
plt.ylabel("y axis")
plt.legend()
plt.grid(True,color='k')

plt.show()
# Bar Graph: bar graphs are used to compare things between different
# groups.
# Especially when we are trying to track changes over time, bar graphs
# are well suited.

import matplotlib.pyplot as plt


plt.bar([1,4,6,7,9],[5,6,7,8,2],label= "Example one",color='y')
plt.bar([2,4,6,8,10],[8,6,2,5,6],label="Example two", color='c')

plt.legend()
plt.title("BAR CHART ")
plt.xlabel("bar number")
plt.ylabel("bar height")
plt.show()
# HISTOGRAM:

import matplotlib.pyplot as plt


import numpy as np

age = np.array([22,55,62,45,75,21,22,34,42,4,99,100,104,52])

bins= np.array([0,10,20,30,40,50,60,70,80,90,100,110])

plt.hist(age, bins, histtype='bar', label='nothing', rwidth=0.8,
         linewidth=5, color='c')
#plt.hist(age, bins)

plt.title("Histogram")
plt.xlabel("x axis")
plt.ylabel("y axis")
plt.legend()
#plt.grid(True,color='k')
plt.show()
# Difference between a Bar Chart & a Histogram?
# A histogram shows a quantitative variable - e.g. how each age group
# contributes towards GDP in a specific country
# A bar chart shows categorical variables - e.g. GDP per country

# Stack Plot (Area Plot): used to track changes over time for one or
# more related groups that make up one whole category

import matplotlib.pyplot as plt

days= [1,2,3,4,5]

sleeping= [7,8,6,11,7]
eating = [2,3,4,3,2]
working = [7,8,7,2,2]
playing = [8,5,7,8,13]

# The empty plot() calls only exist to create legend entries for the
# stackplot colors
plt.plot([], [], color='m', label="Sleeping", linewidth=5)
plt.plot([], [], color='c', label="Eating", linewidth=5)
plt.plot([], [], color='r', label="Working", linewidth=5)
plt.plot([], [], color='k', label="Playing", linewidth=5)

plt.stackplot(days, sleeping, eating, working, playing,
              colors=['m','c','r','k'])
plt.xlabel('x')
plt.ylabel('y')
plt.title("Stack plot")
plt.legend()
plt.show()

# Pie Chart: used to show parts of a whole at a certain point in time
# (unlike a stack plot, which tracks the parts over time)

import matplotlib.pyplot as plt

slices = [7,2,2,13]
activities = ['sleeping', 'eating', 'working', 'playing']
col = ['c','m','r','b']

plt.pie(
slices,
labels = activities,
colors = col,
startangle = 90,
#shadow = True,
explode = (0,0.1,0,0),
autopct = '%1.1f%%'
)

plt.title("Pie Plot")
#plt.legend()
plt.show()
# Scikit-learn: loading built-in sample datasets
from sklearn import datasets
iris = datasets.load_iris()
digits = datasets.load_digits()
print(digits.data)

[[ 0. 0. 5. ... 0. 0. 0.]
[ 0. 0. 0. ... 10. 0. 0.]
[ 0. 0. 0. ... 16. 9. 0.]
...
[ 0. 0. 1. ... 6. 0. 0.]
[ 0. 0. 2. ... 12. 0. 0.]
[ 0. 0. 10. ... 12. 1. 0.]]

# NUMPY Library

from numpy import arange


a = arange(15)
print(a)

[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14]

from numpy import * # 2nd method


a = arange(15)
print(a)

[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14]

import numpy as np # 3rd method


a = np.array([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14])
a

array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])

a[4]  # accessing a specific value (returns 4)

import numpy as np
a = np.array([[1,2],[3,4]])
print(a)

[[1 2]
[3 4]]

import numpy as np
a= np.array([(1,2),(3,4)]) # same output as above
print(a)

[[1 2]
[3 4]]
import numpy as np # same output as above
a = np.matrix('1,2;3,4')  # note: np.matrix is legacy; prefer np.array
print(a)

[[1 2]
[3 4]]

a.shape # 2 rows & 2 columns

(2, 2)

a.ndim # checking the array dimension

2

# Numpy operations
# Numpy operation 1: Reshaping arrays
import numpy as np
a= np.array([(1,2,3,4),(3,4,5,6)]) # 2x4 array having 2 rows & 4 columns
a

array([[1, 2, 3, 4],
[3, 4, 5, 6]])

# reshaping: reshape(4,2) changes the shape to 4 rows & 2 columns,
# refilling the values row by row (not the same as a transpose)

a = a.reshape(4,2) # 1st method


a

array([[1, 2],
[3, 4],
[3, 4],
[5, 6]])

a = a.T # 2nd method: the transpose swaps rows and columns (note the different result)


a

array([[1, 3, 3, 5],
[2, 4, 4, 6]])

# Numpy operation(2): Slicing

import numpy as np
a= np.array([(1,2,3,4),(3,4,5,6)])
a[0,2] # slicing a single element: row 0, column 2 (returns 3)

a= np.array([(1,2,3,4),(3,4,5,6),(7,8,9,10)])
a[0:,3] # slicing: all rows, column 3
array([ 4, 6, 10])

a[0:2,3] # slicing: rows 0 and 1, column 3

array([4, 6])

# Numpy operation(3): Sum


a.sum()

62

# Numpy operation(4): finding the square root using numpy

a = np.sqrt(a)
a

array([[1.        , 1.41421356, 1.73205081, 2.        ],
       [1.73205081, 2.        , 2.23606798, 2.44948974],
       [2.64575131, 2.82842712, 3.        , 3.16227766]])

# Numpy operation(5): finding the standard deviation using numpy

a = np.std(a)
a

0.6321412059754348

# Numpy operation(6): element-wise multiplication, division,
# addition and subtraction of arrays
import numpy as np
a= np.array([(1,2,3,4),(3,4,5,6),(7,8,9,10)])
b= np.array([(1,2,3,4),(3,4,5,6),(7,8,9,10)])
print(a*b)
# The same syntax works for addition, subtraction & division (a+b, a-b, a/b)

[[ 1 4 9 16]
[ 9 16 25 36]
[ 49 64 81 100]]
