The document provides an overview of Big Data, defining it as large and complex datasets that traditional systems cannot manage effectively. It discusses the challenges associated with Big Data, including volume, velocity, variety, veracity, value, variability, complexity, security, privacy, and ethical considerations. Additionally, it outlines the evolution of Big Data technologies, the importance of Big Data platforms, and the processes involved in intelligent data analysis and analytics.

Big Data Analytics

Unit-1
What is Data?
• Data is a collection of facts or statistics. It can be gathered through
observations, measurements, research, or analysis.
• Data can include: Numbers, Names, Figures, Descriptions, Text,
Images, Videos, Symbols.
• Data can be organized into graphs, charts, or tables. When arranged
in an organized form, data can be called information.
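The idea that organized, summarized data becomes information can be sketched in a few lines of Python (the temperature readings below are hypothetical, purely for illustration):

```python
# Raw facts (data): hypothetical (date, temperature °C) observations.
raw_observations = [
    ("2024-01-01", 21.5),
    ("2024-01-02", 19.0),
    ("2024-01-03", 23.5),
]

# Organize the raw facts into a table-like structure.
table = [{"date": d, "temp_c": t} for d, t in raw_observations]

# Summarize: the average is information derived from the data.
avg_temp = sum(row["temp_c"] for row in table) / len(table)
print(f"Average temperature: {avg_temp:.1f} °C")
```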
Characteristics of Data
Nature of Data
Definition of Big Data
Big data is a large and diverse set of information that is growing at an increasing rate. It can include structured, unstructured, and semi-structured data. Big data is so large and complex that traditional data management systems cannot store, process, or analyze it effectively.
Challenges with Big Data
• Volume:
• Definition: Big Data involves the processing of massive volumes of data.
• Challenge: Traditional databases and data processing tools may struggle to
handle such vast amounts of information efficiently.
• Velocity:
• Definition: Big Data is often generated at high speeds, requiring real-time or
near-real-time processing.
• Challenge: Traditional data processing systems may not keep up with the
pace at which data is generated and needs to be analyzed.
Challenges with Big Data
• Variety:
• Definition: Big Data comes in diverse formats, including structured, semi-
structured, and unstructured data.
• Challenge: Integrating and analyzing different data types from various sources
can be complex and may require flexible data processing techniques.
• Veracity:
• Definition: Big Data may contain errors, inconsistencies, and inaccuracies.
• Challenge: Ensuring the quality and accuracy of the data is challenging, and
decisions based on flawed data can lead to incorrect results.
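The variety challenge above can be made concrete with a short sketch: the same kind of record arriving as structured CSV and as semi-structured JSON, integrated into one normalized list. The inputs are invented for illustration; real pipelines face far messier sources.

```python
import csv
import io
import json

# Hypothetical inputs: the same kind of record in two formats.
csv_text = "id,name,age\n1,Ana,34\n2,Bo,41\n"
json_text = '[{"id": 3, "name": "Cy", "age": 29}]'

# Structured: CSV rows with a fixed schema.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: JSON objects whose fields may vary.
json_rows = json.loads(json_text)

# Integrate both sources into one normalized list of records.
records = [
    {"id": int(r["id"]), "name": r["name"], "age": int(r["age"])}
    for r in csv_rows + json_rows
]
print(records)
```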
Challenges with Big Data
• Value:
• Definition: Extracting meaningful insights from Big Data to create value for
organizations.
• Challenge: Identifying relevant information and deriving actionable insights
from the vast amount of data can be challenging and requires sophisticated
analytics tools and techniques.
• Variability:
• Definition: Data flows are inconsistent, with unpredictable peaks and troughs in volume and processing requirements.
• Challenge: Handling the dynamic nature of data flow and adapting processing capabilities accordingly can be a significant challenge.
Challenges with Big Data
• Complexity:
• Definition: Big Data systems are often complex, involving multiple
technologies, tools, and platforms.
• Challenge: Managing and integrating these complex systems requires
specialized skills and expertise.
• Security:
• Definition: Big Data systems may be susceptible to security threats and
breaches.
• Challenge: Protecting sensitive information and ensuring the privacy and
security of data is crucial but can be challenging due to the scale and
complexity of Big Data systems.
Challenges with Big Data
• Privacy:
• Definition: Big Data analytics often involve the use of personal and sensitive
information.
• Challenge: Balancing the need for data-driven insights with privacy concerns
and complying with data protection regulations poses a significant challenge.
• Ethical Considerations:
• Definition: The use of Big Data raises ethical concerns related to data
ownership, consent, and the potential for biased algorithms.
• Challenge: Addressing these ethical considerations and ensuring responsible
and fair use of Big Data is an ongoing challenge.
Characteristics of Big data
Why is Big Data needed?
• Volume: Manages massive datasets that exceed the capacity of
traditional tools.
• Velocity: Processes data in real-time or near-real-time, crucial for
timely decision-making.
• Variety: Handles diverse data types, including structured, semi-
structured, and unstructured data.
• Veracity: Ensures data quality through tools for cleaning and
processing, addressing uncertainties.
• Value: Extracts valuable insights, leading to better decision-making
and a competitive edge.
Why is Big Data needed?
• Complexity: Deals with the complexity of modern data through
advanced analytics and processing tools.
• Innovation: Drives innovation in various industries by enabling new
approaches and business models.
• Time Saving: Distributed processing makes analysis of large datasets far faster than traditional sequential approaches.
• Decision-Making: Provides insights crucial for informed decision-
making in business, healthcare, finance, and more.
• Cost Efficiency: Offers cost-effective solutions, particularly through
scalable cloud-based infrastructure.
Evolution of Big Data:
• Data Warehousing:
In the 1990s, data warehousing emerged as a solution to store and
analyze large volumes of structured data.
• Hadoop:
Hadoop, introduced in 2006 by Doug Cutting and Mike Cafarella, is an open-source framework that provides distributed storage and large-scale data processing.
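The processing model Hadoop popularized, MapReduce, can be sketched in pure Python. This is only a single-process illustration over a toy document list; real Hadoop distributes the same map, shuffle, and reduce phases across a cluster.

```python
from collections import defaultdict
from itertools import chain

documents = ["big data", "big ideas", "data data data"]

# Map phase: emit (word, 1) pairs from each input split.
mapped = chain.from_iterable(
    ((word, 1) for word in doc.split()) for doc in documents
)

# Shuffle phase: group emitted values by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce phase: sum the counts for each word.
word_counts = {word: sum(counts) for word, counts in groups.items()}
print(word_counts)  # {'big': 2, 'data': 4, 'ideas': 1}
```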
• NoSQL Databases:
In 2009, NoSQL databases were introduced, which provide a flexible
way to store and retrieve unstructured data.
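The flexibility NoSQL offers can be illustrated with a toy in-memory "document store": documents in one collection need not share a schema. This is purely illustrative; real stores such as MongoDB add indexing, persistence, and replication.

```python
# A toy document store: one collection, flexible schema per document.
collection = {}

def insert(doc_id, doc):
    collection[doc_id] = doc

def find(predicate):
    return [d for d in collection.values() if predicate(d)]

# Documents with different fields coexist in the same collection.
insert("u1", {"name": "Ana", "tags": ["admin"]})
insert("u2", {"name": "Bo", "email": "bo@example.com"})

admins = find(lambda d: "admin" in d.get("tags", []))
print(admins)
```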
Evolution of Big Data:
• Cloud Computing:
Cloud Computing lets companies store their important data in remote data centers, saving infrastructure and maintenance costs.
• Machine Learning:
Machine Learning algorithms analyze huge amounts of data to extract meaningful insights from it. This has enabled the development of artificial intelligence (AI) applications.
• Data Streaming:
Data Streaming technology has emerged as a solution to process large
volumes of data in real time.
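A core streaming idea, computing statistics over a sliding window as events arrive, can be sketched with a bounded deque. The sensor readings are hypothetical; frameworks like Flink or Kafka Streams apply the same idea at scale.

```python
from collections import deque

WINDOW = 3
window = deque(maxlen=WINDOW)  # oldest events fall off automatically
running_averages = []

# Process a hypothetical sensor stream one event at a time.
for reading in [10, 12, 11, 50, 13]:
    window.append(reading)
    running_averages.append(sum(window) / len(window))

print(running_averages)
```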
Evolution of Big Data:
• Edge Computing:
Edge Computing is a distributed computing paradigm in which data processing is done at the edge of the network, closer to the source of the data.
BIG DATA PLATFORM
• A big data platform manages large amounts of information, storing it in a manner that is organized and understandable enough to extract useful insights.
• Big data platforms utilize a combination of data management
hardware and software tools to aggregate data on a massive scale,
usually onto the cloud.
Benefits of Big Data Platforms
• How does Netflix or Spotify know exactly what you want to stream
next? This is due in a large part to big data platforms working behind
the scenes.
• Understanding big data has become an asset in nearly every industry,
ranging from healthcare to retail and beyond. Companies increasingly
rely on these platforms to collect loads of data and turn them into
categorized, actionable business decisions. This helps firms get a
better view of their customers, target audiences, discover new
markets and make predictions about future steps.
Features of Big Data Platforms
• Big data platform features tend to involve the abilities to be scalable,
quick and equipped with built-in analysis tools to account for the
information at hand.
• For even more efficiency, some of the best big data platforms include
features for accommodating large sets of streaming or at-rest data,
converting data between multiple data formats and attaching new
applications at any necessary point.
Big Data Platforms
Intelligent Data Analysis
• Intelligent data analysis is the process of applying artificial intelligence techniques, such as machine learning, natural language processing, computer vision, and deep learning, to analyze and extract insights from large and complex data sets.
Uses of Intelligent Data Analysis
IDA can be used for various purposes, such as:
• Predicting future outcomes or trends based on historical data
• Classifying or clustering data into meaningful groups or categories
• Detecting anomalies or outliers in data
• Generating natural language summaries or visualizations of data
• Recommending products or services based on user preferences or
behavior
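One item from the list above, detecting outliers, can be sketched with a simple z-score rule: flag values far from the mean in units of standard deviation. The data is invented; real pipelines use more robust methods.

```python
import statistics

# Hypothetical readings with one obvious anomaly.
data = [10, 11, 9, 10, 12, 11, 95]

mean = statistics.mean(data)
stdev = statistics.stdev(data)

# Flag values more than 2 standard deviations from the mean.
outliers = [x for x in data if abs(x - mean) / stdev > 2]
print(outliers)
```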
Applications of Intelligent Data Analysis
Some examples of intelligent data analysis applications are:
• Sentiment analysis: determining the emotional tone or attitude of a
text
• Face recognition: identifying or verifying a person’s identity from a
digital image
• Speech recognition: converting spoken words into text
• Recommendation systems: suggesting items or content that a user
might like
• Fraud detection: identifying fraudulent or suspicious transactions or
activities
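A recommendation system from the list above can be sketched with simple item co-occurrence: suggest items chosen by users with overlapping histories. The users and films are hypothetical; production systems use matrix factorization or neural models.

```python
from collections import Counter

# Hypothetical viewing histories.
histories = {
    "ana": {"film_a", "film_b"},
    "bo":  {"film_a", "film_b", "film_c"},
    "cy":  {"film_b", "film_d"},
}

def recommend(user):
    seen = histories[user]
    scores = Counter()
    for other, items in histories.items():
        if other != user and seen & items:  # overlapping taste
            scores.update(items - seen)     # score their unseen items
    return [item for item, _ in scores.most_common()]

print(recommend("ana"))
```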
Three types of data analytics
• Descriptive Analytics, which use data aggregation and data mining to
provide insight into the past and answer: “What has happened?”
• Predictive Analytics, which use statistical models and forecasting
techniques to understand the future and answer: “What could
happen?”
• Prescriptive Analytics, which use optimization and simulation
algorithms to advise on possible outcomes and answer: “What should
we do?”
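The three types can be walked through on one toy series. The monthly sales figures are hypothetical and the "models" deliberately minimal, just enough to show each question being answered.

```python
import statistics

sales = [100, 110, 120, 130]  # hypothetical monthly sales

# Descriptive: what has happened?
average = statistics.mean(sales)

# Predictive: what could happen? Naive forecast from the average step.
step = statistics.mean(b - a for a, b in zip(sales, sales[1:]))
forecast = sales[-1] + step

# Prescriptive: what should we do? Pick the action with the best
# outcome under a simple simulated effect of each option.
actions = {"discount": forecast * 1.10, "status_quo": forecast}
best_action = max(actions, key=actions.get)

print(average, forecast, best_action)
```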
Stages in IDA
Intelligent Data Analysis (IDA) has three stages:
• Data preparation: This stage involves selecting data from a relevant
source and integrating it into a data set for data mining. Data
preparation processes include data cleaning, data integration, data
collection, and data transformation.
• Data mining: This stage involves discovering patterns and rules in the prepared data.
• Result validation and explanation: This stage involves validating and
explaining the results.
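The data mining stage can be sketched by counting frequently co-occurring item pairs in transactions, the basis of association rules such as "bread → butter". The baskets are toy data; algorithms like Apriori or FP-growth do this at scale.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

# Count every item pair that appears together in a basket.
pair_counts = Counter()
for basket in transactions:
    pair_counts.update(combinations(sorted(basket), 2))

# Keep pairs that meet a minimum support threshold.
frequent = [pair for pair, n in pair_counts.items() if n >= 3]
print(frequent)
```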
Data Analytic Processes
1. Data Collection: Gather data from diverse sources.
2. Data Cleaning and Preprocessing: Address errors and missing values, and ensure consistency.
3. Data Storage: Utilize distributed storage solutions for efficient data storage.
4. Data Exploration: Understand data characteristics and identify relevant variables.
5. Data Analysis: Apply statistical and machine learning techniques to extract insights.
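The cleaning and preprocessing step can be sketched on a few hypothetical records: fix types, drop duplicates, and impute a missing value with the mean. Real pipelines would log and audit each fix.

```python
# Hypothetical raw records with a missing value and a duplicate.
raw = [
    {"id": "1", "age": "34"},
    {"id": "2", "age": None},   # missing value
    {"id": "1", "age": "34"},   # duplicate id
]

# Impute missing ages with the mean of the known ages.
ages = [int(r["age"]) for r in raw if r["age"] is not None]
default_age = round(sum(ages) / len(ages))

cleaned, seen_ids = [], set()
for r in raw:
    if r["id"] in seen_ids:
        continue  # drop duplicates
    seen_ids.add(r["id"])
    cleaned.append({
        "id": int(r["id"]),  # fix types: strings -> ints
        "age": int(r["age"]) if r["age"] is not None else default_age,
    })
print(cleaned)
```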
Data Analytic Processes
6. Modeling: Develop predictive models and fine-tune them for performance.
7. Visualization: Create visual representations to communicate insights.
8. Interpretation and Decision-Making: Interpret results and make informed decisions.
9. Optimization: Continuously improve models and analytic processes.
10. Deployment: Implement solutions into operational systems.
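The modeling step can be sketched with the simplest predictive model, a linear trend fitted by ordinary least squares computed by hand on toy data.

```python
# Toy data: fit y = slope * x + intercept by least squares.
xs = [1, 2, 3, 4]
ys = [2.1, 4.0, 6.2, 7.9]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    / sum((x - mean_x) ** 2 for x in xs)
)
intercept = mean_y - slope * mean_x

def predict(x):
    return slope * x + intercept

print(round(predict(5), 2))
```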
Big Data analytics tools
• Hadoop: Open-source framework for distributed storage and
processing.
• Apache Spark: Fast and general-purpose cluster-computing
framework with in-memory processing.
• Apache Flink: Open-source stream processing framework for real-
time analytics.
• Apache Kafka: Distributed streaming platform for creating real-time
data pipelines.
• Hive: Data warehouse infrastructure with a SQL-like interface built on
Hadoop.
Big Data analytics tools
• Pig: High-level platform and scripting language for data processing on
Hadoop.
• NoSQL Databases (e.g., MongoDB, Cassandra): Scalable databases
for handling unstructured data.
• Tableau: Data visualization tool for creating interactive dashboards.
• Splunk: Platform for searching, monitoring, and analyzing machine-
generated data.
• R and Python (with libraries like pandas, NumPy): General-purpose
programming languages with rich data analysis and machine learning
capabilities.
Analytics vs Reporting
• Reporting is the practice of collecting existing information and
presenting it in a way that's easy to understand for specific audiences.
• Analytics is the process of manually or automatically analyzing large
amounts of data to generate meaningful conclusions that can improve
business performance.
• Reporting is the process of converting raw data into useful
information, while analysis transforms information into insights.
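The contrast can be sketched on hypothetical sales data: the report organizes existing information for an audience, while the analysis transforms it into an insight.

```python
sales = {"Jan": 100, "Feb": 120, "Mar": 90}  # hypothetical

# Reporting: present existing information clearly.
report_lines = [f"{month}: {amount}" for month, amount in sales.items()]

# Analysis: derive an insight from the information.
months = list(sales)
changes = {
    months[i]: sales[months[i]] - sales[months[i - 1]]
    for i in range(1, len(months))
}
worst_month = min(changes, key=changes.get)
insight = f"Sales fell most in {worst_month} ({changes[worst_month]:+d})"
print(insight)
```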
