Unit I - BigData

Big data analytics is the process of extracting meaningful insights from large datasets. It allows for better decision making by identifying hidden patterns, trends, and customer preferences in data. Big data now fuels industries such as music streaming, where recommendations are generated for users based on their activity. Key roles in big data analytics projects include business users, who provide domain expertise; project sponsors, who allocate resources; and data scientists, who analyze data and generate insights.

Uploaded by

Farhan Sj
Copyright
© All Rights Reserved

Unit I

Introduction to Big Data Business Analytics
What is Big Data Analytics?
• Big Data analytics is a process used to extract meaningful insights,
such as hidden patterns, unknown correlations, market trends, and
customer preferences. Big Data analytics provides various advantages
—it can be used for better decision making, preventing fraudulent
activities, among other things.
Why is big data analytics important?
• In today’s world, Big Data analytics is fueling everything we do online— in every industry.
• Take the music streaming platform Spotify for example. The company has nearly 96
million users that generate a tremendous amount of data every day.
• Through this information, the cloud-based platform automatically generates suggested
songs—through a smart recommendation engine—based on likes, shares, search history,
and more.
• What enables this are the techniques, tools, and frameworks that result from Big Data
analytics.
• If you are a Spotify user, you have probably come across the top recommendation
section, which is based on your likes, listening history, and other signals. It is driven by a
recommendation engine that collects user data and then filters it with algorithms to
select songs for each user.
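As a rough illustration of the filtering idea (not Spotify's actual system), the sketch below scores songs by how many likes a user shares with other users. The user names, song titles, and the `recommend` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical listening data: user -> set of liked songs.
likes = {
    "ana": {"song_a", "song_b", "song_c"},
    "ben": {"song_a", "song_b", "song_d"},
    "cho": {"song_b", "song_d", "song_e"},
}

def recommend(user, likes, top_n=2):
    """Suggest songs liked by users whose tastes overlap with `user`."""
    own = likes[user]
    scores = Counter()
    for other, songs in likes.items():
        if other == user:
            continue
        overlap = len(own & songs)  # shared likes act as a similarity weight
        for song in songs - own:    # only score songs the user has not liked yet
            scores[song] += overlap
    # Sort by score (descending), then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [song for song, score in ranked[:top_n] if score > 0]

print(recommend("ana", likes))
```

Real recommendation engines combine many more signals (shares, search history, and so on); this sketch uses likes only to keep the idea visible.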
What is Big Data Analytics and Why Is It Important?
• What is Big Data?
Big Data refers to data sets so large that they cannot be stored,
processed, or analyzed using traditional tools.

Today, there are millions of data sources that generate data at a very
rapid rate. These data sources are present across the world.
Some of the largest sources of data are social media platforms and
networks.
Let’s use Facebook as an example—it generates more than 500
terabytes of data every day.
This data includes pictures, videos, messages, and more. Data also
exists in different formats, like structured data, semi-structured data,
and unstructured data.
For example, in a regular Excel sheet, data is classified as structured
data—with a definite format.
In contrast, emails fall under semi-structured, and your pictures and
videos fall under unstructured data.
All this data combined makes up Big Data.
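The three formats can be contrasted in a few lines of Python. This is only a sketch: the sample rows, the e-mail fields, and the byte values are made up for illustration.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields (like a spreadsheet).
structured = io.StringIO("id,name,amount\n1,Asha,250\n2,Ravi,400\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing fields, but records may differ in shape.
email = json.loads('{"from": "a@example.com", "subject": "Hi", "body": "Free text here"}')

# Unstructured: raw bytes with no field layout (a stand-in for image or video data).
image_bytes = bytes([137, 80, 78, 71])

print(rows[1]["amount"])     # fields are addressable by column name
print(sorted(email.keys()))  # keys are discoverable, the body is still free text
print(len(image_bytes))      # only the raw size is known without further processing
```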
Benefits and Advantages of Big Data Analytics
1. Risk Management:
Use Case: Banco de Oro, a Philippine banking company, uses Big Data
analytics to identify fraudulent activities and discrepancies. The
organization leverages it to narrow down a list of suspects or root
causes of problems.
2. Product Development and Innovations:
Use Case: Rolls-Royce, one of the largest manufacturers of jet engines
for airlines and armed forces across the globe, uses Big Data analytics
to analyze how efficient the engine designs are and if there is any need
for improvements.

3. Quicker and Better Decision Making Within Organizations:


Use Case: Starbucks uses Big Data analytics to make strategic decisions.
For example, the company leverages it to decide if a particular location
would be suitable for a new outlet or not. They will analyze several
different factors, such as population, demographics, accessibility of the
location, and more.
4. Improve Customer Experience:
Use Case: Delta Air Lines uses Big Data analysis to improve customer
experiences. They monitor tweets to find out their customers’
experience regarding their journeys, delays, and so on. The airline
identifies negative tweets and does what’s necessary to remedy the
situation. By publicly addressing these issues and offering solutions, it
helps the airline build good customer relations.
The Lifecycle Phases of Big Data Analytics:
• Stage 1 - Business case evaluation - The Big Data analytics lifecycle
begins with a business case, which defines the reason and goal behind
the analysis.
• Stage 2 - Identification of data - Here, a broad variety of data sources
are identified.
• Stage 3 - Data filtering - All of the identified data from the previous
stage is filtered here to remove corrupt data.
• Stage 4 - Data extraction - Data that is not compatible with the tool is
extracted and then transformed into a compatible form.
• Stage 5 - Data aggregation - In this stage, data with the same fields
across different datasets are integrated.
• Stage 6 - Data analysis - Data is evaluated using analytical and
statistical tools to discover useful information.
• Stage 7 - Visualization of data - With tools like Tableau, Power BI, and
QlikView, Big Data analysts can produce graphic visualizations of the
analysis.
• Stage 8 - Final analysis result - This is the last step of the Big Data
analytics lifecycle, where the final results of the analysis are made
available to business stakeholders who will take action.
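Stages 3 to 6 can be sketched on a toy dataset. The records, field names, and the `is_valid` helper below are hypothetical; a real pipeline would typically use dedicated tools rather than plain lists.

```python
from statistics import mean

# Hypothetical raw records from two sources (Stage 2: identified data).
store_sales = [
    {"customer": "c1", "amount": "120.50"},
    {"customer": "c2", "amount": "n/a"},      # corrupt value
    {"customer": "c3", "amount": "80.00"},
]
web_visits = [
    {"customer": "c1", "visits": 4},
    {"customer": "c3", "visits": 9},
]

def is_valid(record):
    """Stage 3: keep only records whose amount parses as a number."""
    try:
        float(record["amount"])
        return True
    except ValueError:
        return False

filtered = [r for r in store_sales if is_valid(r)]

# Stage 4: transform string amounts into a form the analysis can use.
extracted = [{"customer": r["customer"], "amount": float(r["amount"])} for r in filtered]

# Stage 5: integrate the datasets on their shared field, `customer`.
visits_by_customer = {v["customer"]: v["visits"] for v in web_visits}
aggregated = [
    {**r, "visits": visits_by_customer.get(r["customer"], 0)} for r in extracted
]

# Stage 6: a simple statistic over the integrated data.
average_amount = mean(r["amount"] for r in aggregated)
print(aggregated)
print(average_amount)
```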
Different Types of Big Data Analytics:
• Here are the four types of Big Data analytics:
1. Descriptive Analytics:
• This summarizes past data into a form that people can easily read. This
helps in creating reports, like a company’s revenue, profit, sales, and so on.
• Also, it helps in the tabulation of social media metrics.
• Use Case: The Dow Chemical Company analyzed its past data to increase
facility utilization across its office and lab space.
• Using descriptive analytics, Dow was able to identify underutilized space.
This space consolidation helped the company save nearly US $4 million
annually.
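A minimal sketch of descriptive analytics: past records are summarized into a report-style total per month. The sales figures are invented for illustration.

```python
from collections import defaultdict

# Hypothetical past sales records: (month, revenue).
sales = [("Jan", 1200), ("Jan", 800), ("Feb", 1500), ("Feb", 500), ("Mar", 900)]

# Summarize the raw records into one total per month.
revenue_by_month = defaultdict(int)
for month, revenue in sales:
    revenue_by_month[month] += revenue

for month, total in revenue_by_month.items():
    print(f"{month}: {total}")
```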
2. Diagnostic Analytics :
• This is done to understand what caused a problem in the first place.
Techniques like drill-down, data mining, and data discovery are all
examples.
• Organizations use diagnostic analytics because it provides in-depth
insight into a particular problem.
• Use Case: An e-commerce company’s report shows that its sales
have gone down, although customers are adding products to their
carts. Possible reasons include a form that didn’t load
correctly, a shipping fee that is too high, or too few
payment options. Diagnostic analytics can be used
to find the actual reason.
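One way to drill down into a case like this is to compare how many sessions reach each checkout step. The step names and counts below are hypothetical, and `biggest_drop` is an illustrative helper, not a standard function.

```python
# Hypothetical funnel counts for one week: step name -> sessions reaching it.
funnel = [
    ("added_to_cart", 1000),
    ("checkout_form_loaded", 950),
    ("shipping_fee_shown", 900),
    ("payment_options_shown", 400),
    ("order_placed", 380),
]

def biggest_drop(funnel):
    """Return the transition with the largest share of sessions lost."""
    worst = None
    for (prev_step, prev_n), (step, n) in zip(funnel, funnel[1:]):
        loss = (prev_n - n) / prev_n
        if worst is None or loss > worst[2]:
            worst = (prev_step, step, loss)
    return worst

prev_step, step, loss = biggest_drop(funnel)
print(f"Largest drop: {prev_step} -> {step} ({loss:.0%} lost)")
```

With these made-up numbers, most sessions are lost right after the shipping fee is shown, which would point the investigation at the fee rather than at the form or the payment options.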
3. Predictive Analytics:
• This type of analytics looks into the historical and present data to
make predictions of the future.
• Predictive analytics uses data mining, AI, and machine learning to
analyze current data and make predictions about the future. It works
on predicting customer trends, market trends, and so on.
• Use Case: PayPal determines what kind of precautions it has to
take to protect its clients against fraudulent transactions. Using
predictive analytics, the company combines its historical payment data
and user behavior data to build an algorithm that predicts
fraudulent activities.
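The simplest form of prediction from historical data is a trend line. The sketch below fits one with ordinary least squares and extrapolates one step ahead; the monthly counts are invented, and production fraud models are far more elaborate than this.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x

# Hypothetical history: month index -> number of flagged transactions.
months = [1, 2, 3, 4, 5]
flagged = [10, 12, 14, 16, 18]

slope, intercept = fit_line(months, flagged)
forecast = slope * 6 + intercept  # extrapolate one month ahead
print(forecast)
```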
4. Prescriptive Analytics :
• This type of analytics prescribes the solution to a particular problem.
Prescriptive analytics works with both descriptive and predictive
analytics.
• Most of the time, it relies on AI and machine learning.
• Use Case: Prescriptive analytics can be used to maximize an airline’s
profit. This type of analytics is used to build an algorithm that will
automatically adjust the flight fares based on numerous factors,
including customer demand, weather, destination, holiday season and
oil prices.
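A rule-based sketch of the fare example follows. The multipliers, thresholds, and parameter names are assumptions chosen for illustration; a real system would learn such adjustments from data rather than hard-code them.

```python
def adjust_fare(base_fare, demand_ratio, is_holiday, fuel_index):
    """Recommend a fare from a few hypothetical, hand-picked rules.

    demand_ratio: seats requested / seats available (1.0 means exactly full).
    fuel_index:   current fuel price relative to a baseline of 1.0.
    """
    fare = base_fare
    if demand_ratio > 1.0:
        fare *= 1.20          # demand exceeds supply: raise the fare
    elif demand_ratio < 0.5:
        fare *= 0.90          # weak demand: discount to fill seats
    if is_holiday:
        fare *= 1.10          # holiday season surcharge
    fare *= fuel_index        # pass fuel cost changes through
    return round(fare, 2)

print(adjust_fare(200.0, demand_ratio=1.3, is_holiday=True, fuel_index=1.0))
print(adjust_fare(200.0, demand_ratio=0.4, is_holiday=False, fuel_index=1.0))
```

The point of the sketch is that the output is an action (a fare to charge), not just a description or a forecast.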
Big Data Analytics Tools:
Here are some of the key big data analytics tools :
• Hadoop - helps in storing and analyzing data
• MongoDB - used on datasets that change frequently
• Talend - used for data integration and management
• Cassandra - a distributed database used to handle large volumes of data
• Spark - used for real-time processing and analyzing large amounts of data
• Storm - an open-source real-time computation system
• Kafka - a distributed streaming platform that is used for fault-tolerant
storage
Big Data Industry Application:
Here are some of the sectors where Big Data is actively used:
• Ecommerce - Predicting customer trends and optimizing prices are a
few of the ways e-commerce uses Big Data analytics
• Marketing - Big Data analytics helps to drive high ROI marketing
campaigns, which result in improved sales
• Education - Used to develop new and improve existing courses based
on market requirements
• Healthcare - With the help of a patient’s medical history, Big Data
analytics is used to predict how likely they are to have health issues
• Media and entertainment - Used to understand the demand of shows,
movies, songs, and more to deliver a personalized recommendation
list to its users.
• Banking - Customer income and spending patterns help to predict the
likelihood of choosing various banking offers, like loans and credit
cards
• Telecommunications - Used to forecast network capacity and improve
customer experience
• Government - Big Data analytics helps governments in law
enforcement, among other things.
Key Roles for Data Analytics project:
• Certain key roles are required for a data science team to execute
analytics projects successfully.
• There are seven key roles. Each role plays a crucial part in
developing a successful analytics project.
• There is no hard and fast rule about these seven roles; fewer or more
may be used depending on the scope of the project, the skills of the
participants, and the organizational structure.
• Example –
For a small, versatile team, the seven roles may be filled by only
three to four people, whereas a large project may require 20 or more
people to fill them.
Key Roles for a Data analytics project :
1. Business User :
• The business user is the one who understands the domain area of the
project and usually benefits from the results.
• This user advises and consults with the team working on the project
about the value of the results obtained and how the outputs will be
put into operation.
• A business manager, line manager, or deep subject matter expert in
the project domain usually fulfills this role.
2. Project Sponsor :
• The Project Sponsor is responsible for initiating the project. The
Project Sponsor provides the actual requirements for the project and
presents the core business issue.
• This person generally provides the funding and gauges the degree of
value from the final output of the team working on the project.
• This person sets the priorities for the project and clarifies the
desired output.
3. Project Manager :
• This person ensures that the key milestones and objectives of the
project are met on time and at the expected quality.
4. Business Intelligence Analyst :
• The Business Intelligence Analyst provides business domain expertise
based on a detailed and deep understanding of the data, key
performance indicators (KPIs), key metrics, and business intelligence
from a reporting point of view.
• This person generally creates dashboards and reports and knows about
the data feeds and sources.
5. Database Administrator (DBA) :
• The DBA provisions and configures the database environment to support
the analytics needs of the team working on a project.
• Responsibilities may include providing access to key databases
or tables and making sure that the appropriate security levels are in
place for the data repositories.
6. Data Engineer :
• The data engineer applies deep technical skills to assist with tuning SQL
queries for data management and data extraction and provides
support for data ingestion into the analytic sandbox.
• The data engineer works closely with the data scientist to help shape
data in the right ways for analysis.
7. Data Scientist :
• The data scientist provides subject matter expertise for
analytical techniques and data modelling, and applies the correct analytical
techniques to a given business issue.
• This person ensures that the overall analytical objectives are met.
• Data scientists design and execute analytical methods and approaches
with the data available for the project.
Data Science vs. Data Analytics:
• Data Science and Data Analytics deal with Big Data, each taking a
unique approach.
• Data Science is an umbrella that encompasses Data Analytics.
• Data Science is a combination of multiple disciplines – Mathematics,
Statistics, Computer Science, Information Science, Machine Learning,
and Artificial intelligence.
Data Science vs. Data Analytics: Job roles of Data Scientist and Data Analyst
• Data Scientists and Data Analysts utilize data in different ways.
• Data Scientists use a combination of Mathematical, Statistical, and
Machine Learning techniques to clean, process, and interpret data to
extract insights from it.
• They design advanced data modeling processes using prototypes, ML
algorithms, predictive models, and custom analysis.
• Data Analysts, in contrast, examine data sets to identify trends and draw
conclusions.
• Data Analysts collect large volumes of data, organize it, and analyze it
to identify relevant patterns.
• After the analysis part is done, they strive to present their findings
through data visualization methods like charts, graphs, etc.
• Thus, Data Analysts transform the complex insights into business-
savvy language that both technical and nontechnical members of an
organization can understand.
• Both roles perform varying degrees of data collection, cleaning,
and analysis to gain actionable insights for data-driven decision
making. Hence, the responsibilities of Data Scientists and Data
Analysts often overlap.
Responsibilities of Data Scientists:
• To process, clean, and validate the integrity of data.
• To perform Exploratory Data Analysis on large datasets.
• To perform data mining by creating ETL pipelines.
• To perform statistical analysis using ML algorithms like logistic
regression, KNN, Random Forest, Decision Trees, etc.
• To write code for automation and build resourceful ML libraries.
• To glean business insights using ML tools and algorithms.
• To identify new trends in data for making business predictions.
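To make one of the listed algorithms concrete, here is a minimal k-nearest-neighbours (KNN) classifier in plain Python. The training points, labels, and the `knn_predict` helper are hypothetical; in practice a library implementation would normally be used.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labelled points: (features, label).
train = [
    ((1.0, 1.0), "low_risk"),
    ((1.5, 2.0), "low_risk"),
    ((2.0, 1.5), "low_risk"),
    ((6.0, 6.5), "high_risk"),
    ((7.0, 6.0), "high_risk"),
    ((6.5, 7.0), "high_risk"),
]

print(knn_predict(train, (1.8, 1.6)))
print(knn_predict(train, (6.4, 6.4)))
```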
Responsibilities of Data Analysts:
• To collect and interpret data.
• To identify relevant patterns in a dataset.
• To perform data querying using SQL.
• To experiment with different analytical approaches like predictive analytics,
prescriptive analytics, descriptive analytics, and diagnostic analytics.
• To use data visualization tools like Tableau, IBM Cognos Analytics, etc.,
for presenting the extracted information.
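Data querying with SQL can be tried without any server by using Python's built-in `sqlite3` module. The `orders` table and its rows below are made up for illustration.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 60.0), ("east", 200.0)],
)

# Total sales per region, largest first.
totals = conn.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM orders GROUP BY region ORDER BY total DESC"
).fetchall()
conn.close()

for region, total in totals:
    print(region, total)
```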
Core Deliverables:
• In the big data life cycle, the data products that result from a big
data project are, in most cases, some of the
following:
Machine learning implementation − This could be a classification
algorithm, a regression model or a segmentation model.
Recommender system − The objective is to develop a system that
recommends choices based on user behavior. Netflix is the
characteristic example of this data product, where based on the ratings
of users, other movies are recommended.
Dashboard − Business normally needs tools to visualize aggregated
data. A dashboard is a graphical mechanism to make this data
accessible.
Ad-Hoc analysis − Normally business areas have questions, hypotheses
or myths that can be answered by doing ad-hoc analysis with data.
Key Stakeholders - Check who and where are the sponsors of other
projects similar to the one that interests you.
• Having personal contacts in key management positions helps, because
they can be approached if the project is promising.
• Who would benefit from your project? Who would be your client
once the project is on track?
• Develop a simple, clear, and exciting proposal and share it with the key
players in your organization.
• The best way to find sponsors for a project is to understand the
problem and what would be the resulting data product once it has
been implemented. This understanding will give an edge in convincing
the management of the importance of the big data project.
The nature of data:
• Data is the plural of datum, so formally it is treated as plural.
• We can find data in every situation in the world around us, whether
structured or unstructured, continuous or discrete: in
weather records, stock market logs, photo albums, music playlists,
or our Twitter accounts.
• In fact, data can be seen as the essential raw material of any kind of
human activity.
• According to the Oxford English Dictionary: Data are known facts or
things used as basis for inference or reckoning.
As shown in the following figure, we can see Data in
two distinct ways: Categorical and Numerical:
Categorical data:
• Categorical data are values or observations that can be sorted into
groups or categories.
• There are two types of categorical values, nominal and ordinal. A
nominal variable has no intrinsic ordering to its categories.
• For example, housing is a categorical variable having two categories
(own and rent). An ordinal variable has an established ordering. For
example, age can be a variable with three ordered categories (young, adult,
and elder).
Numerical data:
• Numerical data are values or observations that can be measured.
• There are two kinds of numerical values, discrete and continuous.
• Discrete data are values or observations that can be counted and are
distinct and separate.
• For example, the number of lines in a program.
• Continuous data are values or observations that may take on any
value within a finite or infinite interval. For example, an economic
time series such as historic gold prices.
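The four kinds of values behave differently in code. The sketch below reuses the examples from the text (housing, age, lines of code, gold prices) with made-up sample values to show which operations make sense for each.

```python
from collections import Counter
from statistics import mean

housing = ["own", "rent", "rent", "own", "rent"]   # nominal: no natural order
age_group = ["adult", "young", "elder", "adult"]   # ordinal: ordered categories
lines_of_code = [120, 45, 300]                      # discrete: countable integers
gold_price = [1812.35, 1820.10, 1799.80]            # continuous: any real value

# Nominal data supports counting, but not ordering.
housing_counts = Counter(housing)

# Ordinal data can be sorted once the category order is stated explicitly.
order = {"young": 0, "adult": 1, "elder": 2}
sorted_ages = sorted(age_group, key=order.get)

# Numerical data supports arithmetic such as sums and means.
total_lines = sum(lines_of_code)
average_price = mean(gold_price)

print(housing_counts, sorted_ages, total_lines, average_price)
```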
The kinds of datasets are as follows:
• E-mails (unstructured, discrete)
• Digital images (unstructured, discrete)
• Stock market logs (structured, continuous)
• Historic gold prices (structured, continuous)
• Credit approval records (structured, discrete)
• Social media friends and relationships (unstructured, discrete)
• Tweets and trending topics (unstructured, continuous)
• Sales records (structured, continuous)
Analytical Processes and Tools
• Big Data Analytics is the process of collecting large chunks of
structured/unstructured data, segregating and analyzing it and
discovering the patterns and other useful business insights from it.
• These days, organizations are realising the value they get out of big data
analytics and hence they are deploying big data tools and processes to
bring more efficiency in their work environment.
• Many big data tools and processes are being utilised by companies
these days in the processes of discovering insights and supporting
decision making.
• Big data processing is a set of techniques or programming models for
accessing large-scale data to extract useful information that supports
decision making.
Below is the list of some of the data
analytics tools used most in the industry:
• R Programming (Leading Analytics Tool in the industry)
• Python
• Excel
• SAS
• Apache Spark
• Splunk
• RapidMiner
• Tableau Public
• KNIME
Difference Between Analysis and Reporting:
Reporting :
Need for Reporting:
• The problem with raw data is that it is not intelligible.
• It needs to be organized after it is collected so that it is easier to
visualize.
• In other words, it is much easier for users to use the information when
viewing it from within the reports.
• It is during this process that a transformation takes place. It is no longer
simply data – through reporting, it becomes usable information.
• Reporting follows a push approach, where reports are pushed to users
who are then expected to extract meaningful insights and take
appropriate actions for themselves.

Types of Reports:
1. Canned Reports:
• These are the generic reports that businesses can access within the
analysis tool or which can also be delivered regularly to a group of
end-users.
• These reports are static with fixed metrics and dimensions. In general,
some reports are more valuable than others, and a report’s value may
depend on how relevant it is to an individual’s role (e.g., product
manager, marketer).
2. Dashboards:
• These custom-made reports combine different KPIs and reports to
provide a comprehensive, high-level view of business performance for
specific audiences.
• Dashboards may include data from various data sources and are also
usually static.
3. Alerts: These conditional reports are triggered when data falls
outside of expected ranges, or some other pre-defined criteria are met.
Once people are notified of what happened, they can take appropriate
action as necessary.
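An alert of this kind reduces to a range check over incoming values. In the sketch below, the daily order counts, the expected range of 80 to 150, and the `check_alerts` helper are all assumptions for illustration.

```python
def check_alerts(readings, low, high):
    """Return (label, value) pairs for readings outside the range [low, high]."""
    return [(label, value) for label, value in readings if not low <= value <= high]

# Hypothetical daily order counts and an assumed expected range of 80 to 150.
daily_orders = [("Mon", 120), ("Tue", 95), ("Wed", 40), ("Thu", 160)]

for label, value in check_alerts(daily_orders, low=80, high=150):
    print(f"ALERT: {label} orders = {value}, outside expected range")
```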
Analysis:
• The analysis follows a pull approach, where an analyst pulls the data
to answer specific business questions.
Types of Analysis:
1. Ad hoc Analysis:
• Analysts receive requests to answer various business questions that
may be spurred by questions raised from the reports.
• Typically, these requests are time-sensitive and demand a quick
turnaround.
2. Analysis Presentations:
• Some business questions are more complex and require more time to
perform a comprehensive, deep-dive analysis.
• These analysis projects result in a more formal deliverable, which
includes two key sections: key findings and recommendations.
