• How quickly do you need analytic results: in real time, in seconds, or is an hour a more
appropriate time frame?
• How much value will these analytics provide your organization and what budget
constraints exist?
• How large is the data and what is its growth rate?
• How is the data structured?
• What integration capabilities do the producers and consumers have?
• How much latency is acceptable between the producers and consumers?
• What is the cost of downtime or how available and durable does the solution need to be?
• Is the analytic workload consistent or elastic?
Each one of these questions helps guide you to the right tool. In some cases, you can
simply map your big data analytics workload into one of the services based on a set of
requirements. However, in most real-world, big data analytic workloads, there are many
different, and sometimes conflicting, characteristics and requirements on the same data
set.
For example, some result sets may have real-time requirements as a user interacts with a
system, while other analytics could be batched and run on a daily basis. These different
requirements over the same data set should be decoupled and solved by using more than
one tool. If you try to solve both of these examples using the same toolset, you end up
either over-provisioning, and therefore overpaying, for unnecessary response time, or with a
solution that does not respond fast enough to your users in real time. Matching the
best-suited tool to each analytical problem results in the most cost-effective use of your
compute and storage resources.
Big data doesn’t need to mean “big costs”. So, when designing your applications, it’s important
to make sure that your design is cost efficient. If it’s not, relative to the alternatives, then it’s
probably not the right design. Another common misconception is that using multiple tool
sets to solve a big data problem is more expensive or harder to manage than using one big
tool. If you take the same example of two different requirements on the same data set, the
real-time request may be low on CPU but high on I/O, while the slower processing request
may be very compute intensive.
Decoupling can end up being much less expensive and easier to manage, because you can
build each tool to exact specifications and not overprovision. With the AWS pay-as-you-go
model, this equates to a much better value because you could run the batch analytics in just
one hour and therefore only pay for the compute resources for that hour. Also, you may find
this approach easier to manage rather than leveraging a single system that tries to meet all
of the requirements. Solving for different requirements with one tool results in attempting
to fit a square peg (real-time requests) into a round hole (a large data warehouse).
The AWS platform makes it easy to decouple your architecture by having different tools
analyze the same data set. AWS services have built-in integration so that moving a subset of
data from one tool to another can be done very easily and quickly using parallelization.
Following are some real-world big data analytics problem scenarios, and an AWS architectural
solution for each.
Example 1: Queries against an Amazon S3 data lake
1. An AWS Glue crawler connects to a data store, progresses through a prioritized list of
classifiers to extract the schema of your data and other statistics, and then populates the
AWS Glue Data Catalog with this metadata. Crawlers can run periodically to detect the
availability of new data as well as changes to existing data, including table definition
changes. Crawlers automatically add new tables, new partitions to existing tables, and new
versions of table definitions. You can customize AWS Glue crawlers to classify your own file
types. A minimal sketch of driving a crawler and querying the resulting tables
programmatically follows this list.
2. The AWS Glue Data Catalog is a central repository to store structural and operational
metadata for all your data assets. For a given data set, you can store its table definition
and physical location, add business-relevant attributes, and track how the data has
changed over time. The AWS Glue Data Catalog is Apache Hive Metastore compatible and is
a drop-in replacement for the Apache Hive Metastore for big data applications running on
Amazon EMR. For more information on setting up your EMR cluster to use the AWS Glue Data
Catalog as an Apache Hive Metastore, see the AWS Glue documentation.
3. The AWS Glue Data Catalog also provides out-of-the-box integration with Amazon Athena,
Amazon EMR, and Amazon Redshift Spectrum. After you add your table definitions to the
AWS Glue Data Catalog, they are available for ETL and also readily available for querying
in Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum so that you can have a
common view of your data between these services.
4. Using a BI tool like Amazon QuickSight enables you to easily build visualizations, perform
ad hoc analysis, and quickly get business insights from your data. Amazon QuickSight
supports data sources such as Amazon Athena, Amazon Redshift Spectrum, Amazon S3
and many others. See Supported Data Sources.
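As a rough illustration of steps 1 through 3, the following sketch uses the AWS SDK for Python (boto3) to create and start a Glue crawler over an S3 prefix and then run an Athena query against the resulting Data Catalog table. The bucket paths, IAM role, database, table, and query are illustrative placeholders, not resources defined in this whitepaper.

```python
import time
import boto3

# Placeholder names -- substitute your own bucket, role, database, and table.
DATA_PATH = "s3://example-datalake-bucket/sales/"
RESULTS_PATH = "s3://example-athena-results-bucket/"
CRAWLER_ROLE = "arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole"

glue = boto3.client("glue")
athena = boto3.client("athena")

# 1. Create and start a crawler that catalogs the data set under DATA_PATH.
glue.create_crawler(
    Name="example-datalake-crawler",
    Role=CRAWLER_ROLE,
    DatabaseName="example_datalake_db",
    Targets={"S3Targets": [{"Path": DATA_PATH}]},
)
glue.start_crawler(Name="example-datalake-crawler")

# 2. After the crawler has populated the Data Catalog, query the table with Athena.
query = athena.start_query_execution(
    QueryString="SELECT region, SUM(amount) AS total FROM sales GROUP BY region",
    QueryExecutionContext={"Database": "example_datalake_db"},
    ResultConfiguration={"OutputLocation": RESULTS_PATH},
)
query_id = query["QueryExecutionId"]

# 3. Poll until the query finishes, then read the result rows.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    results = athena.get_query_results(QueryExecutionId=query_id)
    for row in results["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```

Amazon QuickSight can then use the same Athena table as a data source for the visualizations described in step 4.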
Example 2: Capturing and analyzing sensor
data
An international air conditioner manufacturer has many large air conditioners that it sells
to various commercial and industrial companies. Not only do they sell the air conditioner
units but, to better position themselves against their competitors, they also offer add-on
services where you can see real-time dashboards in a mobile app or a web browser. Each
unit sends its sensor information for
processing and analysis. This data is used by the manufacturer and its customers. With this
capability, the manufacturer can visualize the dataset and spot trends.
Currently, they have a few thousand pre-purchased air conditioning (A/C) units with this
capability. They expect to deliver these to customers in the next couple of months and are
hoping that, in time, thousands of units throughout the world will use this platform. If
successful, they would like to expand this offering to their consumer line as well, with a
much larger volume and a greater market share. The solution needs to be able to handle
massive amounts of data and scale as they grow their business without interruption. How
should you design such a system?
First, break it up into two work streams, both originating from the same data:
• A/C unit’s current information with near-real-time requirements and a large number of
customers consuming this information
• All historical information on the A/C units to run trending and analytics for internal use
The data-flow architecture in the following figure shows how to solve this problem.
1. The process begins with each A/C unit providing a constant data stream to Amazon
Kinesis Data Streams. This provides an elastic and durable interface that the units can
talk to and that can be scaled seamlessly as more and more A/C units are sold and brought
online.
2. Using the tools provided with Amazon Kinesis Data Streams, such as the Kinesis Client
Library or the AWS SDK, a simple application is built on Amazon EC2 to read data as it comes
into Amazon Kinesis Data Streams, analyze it, and determine if the data warrants an update
to the real-time dashboard. It looks for changes in system operation, temperature
fluctuations, and any errors that the units encounter. A minimal producer and consumer
sketch follows this list.
3. This data flow needs to occur in near real time so that customers and maintenance teams
can be alerted quickly if there is an issue with the unit. The data in the dashboard does
include some aggregated trend information, but it is mainly the current state as well as any
system errors.
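The following is a minimal sketch, using boto3, of both sides of the stream described in steps 1 and 2: a put_record call that an A/C unit (or a gateway in front of it) might make, and a single-shard consumer loop that flags readings warranting a dashboard update. The stream name, sensor fields, and temperature threshold are assumptions made for illustration; a production consumer would typically use the Kinesis Client Library to handle multiple shards, checkpointing, and failover.

```python
import json
import time
import boto3

STREAM_NAME = "example-ac-telemetry"  # placeholder stream name
kinesis = boto3.client("kinesis")

# Producer side: an A/C unit (or gateway) publishes a sensor reading.
reading = {"unit_id": "ac-unit-042", "temp_c": 27.5, "status": "OK", "ts": int(time.time())}
kinesis.put_record(
    StreamName=STREAM_NAME,
    Data=json.dumps(reading).encode("utf-8"),
    PartitionKey=reading["unit_id"],  # keeps each unit's records ordered within a shard
)

# Consumer side: read records from one shard and decide whether the
# real-time dashboard needs an update (single-shard sketch only).
shard_id = kinesis.describe_stream(StreamName=STREAM_NAME)["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=STREAM_NAME, ShardId=shard_id, ShardIteratorType="LATEST"
)["ShardIterator"]

while True:
    batch = kinesis.get_records(ShardIterator=iterator, Limit=100)
    for record in batch["Records"]:
        data = json.loads(record["Data"])
        # Flag temperature excursions or error states for the dashboard.
        if data["temp_c"] > 30.0 or data["status"] != "OK":
            print("dashboard update needed:", data)
    iterator = batch["NextShardIterator"]
    time.sleep(1)
```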
Example 3: Sentiment analysis of social media
Capturing the data from various social networks is relatively easy but the challenge is building
the intelligence programmatically. After the data is ingested, the company wants to be able
to analyze and classify the data in a cost-effective and programmatic way. To do this, they
can use the architecture in the following figure.
1. First, deploy an Amazon EC2 instance in an Amazon VPC that ingests tweets from Twitter.
2. Next, create an Amazon Kinesis Data Firehose delivery stream that loads the streaming
tweets into the raw prefix in the solution's S3 bucket.
3. Amazon S3 invokes an AWS Lambda function to analyze the raw tweets, using Amazon
Translate to translate non-English tweets into English and Amazon Comprehend to perform
entity extraction and sentiment analysis with natural language processing (NLP). A minimal
sketch of such a function follows this list.
4. A second Kinesis Data Firehose delivery stream loads the translated tweets and
sentiment values into the sentiment prefix in the S3 bucket. A third delivery stream loads
entities in the entities prefix in the S3 bucket.
5. This architecture also deploys a data lake that includes AWS Glue for data
transformation, Amazon Athena for data analysis, and Amazon QuickSight for data
visualization. AWS Glue Data Catalog contains a logical database used to organize the
tables for the data in S3. Athena uses these table definitions to query the data stored in
S3 and return the information to an Amazon QuickSight dashboard.
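The following is a minimal sketch of the Lambda function described in step 3, assuming the raw tweets arrive as newline-delimited JSON objects under the raw prefix and that Amazon Comprehend is also used to detect the tweet language before translation. The delivery stream names and tweet fields are illustrative placeholders, not the actual resource names created by the solution.

```python
import json
import boto3

s3 = boto3.client("s3")
translate = boto3.client("translate")
comprehend = boto3.client("comprehend")
firehose = boto3.client("firehose")

# Placeholder delivery stream names for the sentiment and entities prefixes.
SENTIMENT_STREAM = "example-sentiment-delivery-stream"
ENTITIES_STREAM = "example-entities-delivery-stream"

def handler(event, context):
    """Triggered by Amazon S3 when a new object lands under the raw/ prefix."""
    for rec in event["Records"]:
        bucket = rec["s3"]["bucket"]["name"]
        key = rec["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

        for line in filter(None, body.splitlines()):
            tweet = json.loads(line)

            # Detect the dominant language; translate non-English tweets to English.
            lang = comprehend.detect_dominant_language(Text=tweet["text"])["Languages"][0]["LanguageCode"]
            english = tweet["text"]
            if lang != "en":
                english = translate.translate_text(
                    Text=tweet["text"], SourceLanguageCode=lang, TargetLanguageCode="en"
                )["TranslatedText"]

            # Sentiment analysis and entity extraction with Amazon Comprehend.
            sentiment = comprehend.detect_sentiment(Text=english, LanguageCode="en")
            entities = comprehend.detect_entities(Text=english, LanguageCode="en")["Entities"]

            # Deliver enriched records to the sentiment and entities prefixes via Firehose.
            firehose.put_record(
                DeliveryStreamName=SENTIMENT_STREAM,
                Record={"Data": (json.dumps({
                    "id": tweet.get("id_str"),
                    "text": english,
                    "sentiment": sentiment["Sentiment"],
                    "scores": sentiment["SentimentScore"],
                }) + "\n").encode("utf-8")},
            )
            for entity in entities:
                firehose.put_record(
                    DeliveryStreamName=ENTITIES_STREAM,
                    Record={"Data": (json.dumps({"id": tweet.get("id_str"), **entity}) + "\n").encode("utf-8")},
                )
```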
By using ML and BI services from AWS including Amazon Translate, Amazon Comprehend,
Amazon Kinesis, Amazon Athena, and Amazon QuickSight, you can build meaningful, low-
cost social media dashboards to analyze customer sentiment, which can lead to better
opportunities for acquiring leads, improved website traffic, stronger customer relationships,
and better customer service.
This example solution automatically provisions and configures the AWS services necessary
to capture multi-language tweets in near-real-time, translate them, and display them on a
dashboard powered by Amazon QuickSight. You can also capture both the raw and
enriched datasets and durably store them in the solution's data lake. This enables data
analysts to quickly and easily perform new types of analytics and ML on this data. For
more information, see the AI-Driven Social Media Dashboard solution.
Conclusion
As more and more data is generated and collected, data analysis requires scalable, flexible, and
high-performing tools to provide insights in a timely fashion. However, organizations are
facing a growing big data environment, where new tools emerge and become outdated very
quickly. Therefore, it can be very difficult to keep pace and choose the right tools.
This whitepaper offers a first step to help you solve this challenge. With a broad set of
managed services to collect, process, and analyze big data, AWS makes it easier to build,
deploy, and scale big data applications. This enables you to focus on business problems
instead of updating and managing these tools.
AWS provides many solutions to address your big data analytic requirements. Most big data
architecture solutions use multiple AWS tools to build a complete solution. This approach
helps meet stringent business requirements in the most cost-optimized, performant, and
resilient way possible. The result is a flexible big data architecture that is able to scale along
with your business.