
Big Data Analytics

Harnessing the Value of Big Data Analytics

How to Gain Business Insight Using MapReduce and Apache Hadoop with SQL-Based Analytics

By: Shaun Connolly, VP, Corporate Strategy, Hortonworks
Steve Wooledge, Sr. Director, Marketing, Teradata Aster
Table of Contents

Executive Summary  2
The Challenges of Converting Big Data Volumes into Insight  3
Clear Path to New Value  4
Choosing the Ideal Big Data Analytics Solution  5
Benefits of a Unified Big Data Architecture for Analytics  7
Choosing the Right Big Data Analytics Solution  8
Big Data Analytics in Action  11

Executive Summary

In business publications and IT trade journals, the buzz about big data challenges is nearly deafening. Rapidly growing volumes of data from transactional systems like enterprise resource planning (ERP) software and non-transactional sources such as web logs, customer call center records, and video images are everywhere. A tsunami of data, some experts call it.

Most companies know how to collect, store, and analyze their operational data. But these new multi-structured data types are often too variable and dynamic to be cost-effectively captured in a traditional data schema using only Structured Query Language (SQL) for analytics.

Some data scientists, business analysts, enterprise architects, developers, and IT managers are looking beyond these big data volumes and focusing on the analytic value they can deliver. Their companies are searching for new analytic solutions that can transform huge volumes of complex, high-velocity data into pure business insight. They also seek new data-driven applications and analytics that can give them a competitive edge.

EB-7234 > 0612 > PAGE 2 OF 13


Leading organizations are exploring alternative solutions that use the MapReduce software framework, such as Apache Hadoop. While Hadoop can cost-effectively load, store, and refine multi-structured data, it is not well suited for low-latency, iterative data discovery or classic enterprise business intelligence (BI). These applications require a strong ecosystem of tools that provide ANSI SQL support as well as high performance and interactivity.

The more complete solution is to implement a data discovery platform that can integrate Hadoop with a relational integrated data warehouse. New data discovery platforms like the Teradata Aster MapReduce Platform combine the power of the MapReduce analytic framework with SQL-based BI tools that are familiar to analysts. The result is a unified solution that helps companies gain valuable business insight from new and existing data, using existing BI tools and skill sets as well as enhanced MapReduce analytic capabilities.

But which analytic workloads are best suited for Hadoop, the data discovery platform, and an integrated data warehouse? How can these specialized systems best work together? What are the schema requirements for different data types? Which system provides an optimized processing environment that delivers maximum business value with the lowest total cost of ownership? This paper answers these questions and shows you how to use MapReduce, Hadoop, and a unified big data architecture to support big data analytics.

The Challenges of Converting Big Data Volumes into Insight

What business value does data bring to your organization? If your company is like most, you wouldn't think of shifting production schedules, developing a marketing campaign, or forging a product strategy without insight gleaned from business analytics tools. Using data from transactional systems, your team reviews historical purchase patterns, tracks sales, balances the books, and seeks to understand transactional trends and behaviors. If your analytics practice is advanced, you may even predict the likely outcomes of events.

But it's not enough. Despite the value delivered by your current data warehouse and analytics practices, you are only skimming the surface of the deep pool of business value that data can deliver. Today there are huge volumes of interactional and observational data being created by businesses and consumers around the world. Generated by web logs, sensors, social media sites, and call centers, for example, these so-called big data volumes are difficult to process, store, and analyze.

According to industry analyst Gartner,1 any effort to tackle the big data challenge must address multiple factors, including:

> Volume: The amount of data generated by companies and their customers, competitors, and partners continues to grow exponentially. According to industry analyst IDC, the digital universe created and replicated 1.8 trillion gigabytes in 2011.2 That's the equivalent of 57.5 billion 32GB Apple iPads.

> Velocity: Data continues changing at an increasing rate of speed, making it difficult for companies to capture and analyze. For example, machine-generated data from sensors and web log data is being ingested in real time by many applications. Without real-time analytics to decipher these dynamic data streams, companies cannot make sense of the information in time to take meaningful action.

> Variety: It's no longer enough to collect just transactional data such as sales, inventory details, or procurement information. Analysts are increasingly interested in new data types, such as sentiments expressed in product reviews, unstructured text from call records and service reports, online behavior such as click streams, images and videos, and geospatial and temporal details. These data types add richness that supports more detailed analyses.

> Complexity: With more details and sources, the data is more complex and difficult to analyze. In the past, banks used just transactional data to predict the probability of a customer closing an account. Now, these companies want to understand the "last mile" of the customer's decision process. By gaining visibility into common consumer behavior patterns across the web site, social networks, call centers, and branches, banks can address issues impacting customer loyalty before

1 Source: Big Data is Only the Beginning of Extreme Information Management, Gartner, April 2011
2 Source: Extracting Value from Chaos, John Gantz and David Reinsel, IDC, June 2011



consumers decide to defect. Analyzing and detecting patterns on the fly across all customer records is time-consuming and costly. Replicating that effort over time can be even more challenging.

Addressing the multiple challenges posed by big data volumes is not easy. Unlike transactional data, which can be stored in a stable schema that changes infrequently, interactional data types are more dynamic. They require an evolving schema, which is defined dynamically, often on the fly at query runtime. The ability to load data quickly, and evolve the schema over time if needed, is a tremendous advantage for analysts who want to reduce time to valuable insights.

Some data formats may not fit well into a schema without heavy pre-processing, or may have requirements for loading and storing in their native format. Dealing with this variety of data types efficiently can be difficult. As a result, many organizations simply delete this data or never bother to capture it at all.

Clear Path to New Value

Companies that recognize the opportunities inherent in big data analytics can take steps to unlock the value of these new data flows. According to Gartner, "CIOs face significant challenges in addressing the issues surrounding big data. New technologies and applications are emerging and should be investigated to understand their potential value."3

Data scientists, business analysts, enterprise architects, developers, and IT managers are looking for alternative methods to collect and analyze big data streams. What's needed is a unified big data architecture that lets them refine raw data into valuable analytical assets. (See Figure 1.) Specifically, they need to:

> Capture, store, and refine raw, multi-structured data in a data refinery platform. This platform extends existing architectures that have traditionally been used to store data from structured information sources, such as transactional systems.

> Explore and uncover value and new insights, quickly and iteratively, in a data discovery platform

[Figure 1. Architecture for Refining Big Data Volumes into Analytical Assets: multi-structured sources (audio/video, images, docs, text, web and social data, machine logs, CRM, SCM, ERP) feed a capture/store/refine layer; data is then discovered and explored in the enterprise and used for reporting and execution with tools such as Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, and visualization.]

3 Source: CEO Advisory: Big Data Equals Big Opportunity, Gartner, March 31, 2011



> Provide IT and business users with a variety of analytic tools and techniques to discover and explore patterns

> Store valuable data and metadata in an integrated data warehouse so analysts and business applications can operationalize new insights from multi-structured data

Choosing the Ideal Big Data Analytics Solution

To maximize the value of traditional and multi-structured data assets, companies need to deploy technologies that integrate Hadoop and relational database systems. Although the two worlds were separate not long ago, vendors are beginning to introduce solutions that effectively combine the technologies. For example, market leaders like Teradata and Hortonworks are partnering to deliver reference architectures and innovative product integration that unify Hadoop with data discovery platforms and integrated data warehouses.

MapReduce and Hadoop: A Primer

How do technologies such as MapReduce and Hadoop help organizations harness the value of unstructured and semi-structured data?

MapReduce supports distributed processing of the common map and reduce operations. In the map step, a master node divides a query or request into smaller problems. It distributes each query to a set of map tasks scheduled on worker nodes within a cluster of execution nodes. The output of the map steps is sent to nodes that combine, or reduce, the output and create a response to the query. Because both the map and reduce functions can be distributed to clusters of commodity hardware and performed in parallel, MapReduce techniques are appropriate for large datasets.

Apache Hadoop consists of two components: Hadoop MapReduce for parallel data processing and the Hadoop Distributed File System (HDFS) for low-cost, reliable data storage. Hadoop, the most popular open-source implementation of the MapReduce framework, can be used to refine unstructured and semi-structured data into structured formats that can be analyzed or loaded into other analytic platforms.
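The map and reduce steps described in the primer can be sketched as a toy, single-process Python analogue. A real Hadoop cluster distributes the map tasks across nodes and shuffles intermediate key/value pairs before reducing, but the data flow is the same; the word-count task here is illustrative, not taken from the paper.

```python
from collections import defaultdict

# Toy single-process analogue of MapReduce word count. Hadoop would
# run many map tasks in parallel on HDFS blocks, shuffle intermediate
# (key, value) pairs by key, then run reduce tasks on each group.

def map_step(line):
    # Emit (key, value) pairs for one input record.
    for word in line.split():
        yield word.lower(), 1

def reduce_step(key, values):
    # Combine all values observed for one key.
    return key, sum(values)

def mapreduce(records):
    groups = defaultdict(list)
    for record in records:              # "map" phase
        for key, value in map_step(record):
            groups[key].append(value)   # "shuffle": group by key
    return dict(reduce_step(k, v) for k, v in groups.items())  # "reduce"

counts = mapreduce(["big data", "big insight"])
# counts == {"big": 2, "data": 1, "insight": 1}
```

Because each map call touches one record and each reduce call touches one key's group, both phases parallelize naturally across commodity hardware, which is the property the primer highlights.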

What should companies look for to get the most value from Hadoop? Most importantly, you need a unified big data architecture that tightly integrates the Hadoop/MapReduce programming model with traditional SQL-based enterprise data warehousing. (See Figure 2.)

The unified big data architecture is based on a system that can capture and store a wide range of multi-structured raw data sources. It uses MapReduce to refine this data into usable formats, helping to fuel new insights for the business. In this respect, Hadoop is an ideal choice for capturing and refining many multi-structured data types with unknown initial value. It also serves as a cost-effective platform for retaining large volumes of data and files for long periods of time.

The unified big data architecture also preserves the declarative and storage-independence benefits of SQL, without compromising MapReduce's ability to extend SQL's analytic capabilities. By offering the intuitiveness of SQL, the solution helps less-experienced users exploit the analytical capabilities of existing and packaged MapReduce functions, without needing to understand the programming behind them. With this architecture, enterprise architects can easily and cost-effectively incorporate Hadoop's storage and batch processing strengths together with the relational database system.

A critical part of the unified big data architecture is a discovery platform that leverages the strengths of Hadoop for scale and processing while bridging the gaps around BI tool support, SQL access, and interactive analytical workloads. SQL-MapReduce helps bridge this gap by providing a distinct execution engine within



[Figure 2. Unified Big Data Architecture: engineers, data scientists, quants, and business analysts use tools such as Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, and visualization against an analytics layer comprising a discovery platform and an integrated data warehouse, fed by a capture/store/refine layer over audio/video, images, docs, text, web and social data, machine logs, CRM, SCM, and ERP sources.]

the discovery platform. This allows the advanced analytical functions to execute automatically, in parallel across the nodes of the machine cluster, while providing a standard SQL interface that can be leveraged by BI tools.

Some products include a library of prebuilt analytic functions such as path, pattern, statistical, graph, text and cluster analysis, and data transformation that help speed the deployment of analytic applications. Users should be able to write custom functions as needed, in a variety of languages, for use in both batch and interactive environments.

Finally, an interactive development tool can reduce the effort required to build and test custom-developed functions. Such tools can also be used to import existing Java MapReduce programs.

To ensure that the platform delivers relevant insights, it must also offer enough scalability to support entire data sets, not just data samples. The more data you can analyze, the more accurate your results will be. As data science expert Anand Rajaraman wrote on the Datawocky blog, "Adding more, independent data usually beats out designing ever-better algorithms to analyze an existing data set."4

To support rapid iterations in the data discovery process, the solution also must offer high performance and ease of analytic iteration. Look for standard SQL and BI tools that can leverage both SQL and MapReduce natively. By leveraging relational technology as the data store, analysts receive the performance benefits of a query optimizer, indexes, data partitioning, and simple SQL statements that execute quickly.

In sum, a unified big data architecture blends the best of Hadoop and SQL, allowing users to:

> Capture and refine data from a wide variety of sources

4 More data usually beats better algorithms, Anand Rajaraman, Datawocky, March 24, 2008,
http://anand.typepad.com/datawocky/2008/03/more-data-usual.html.



> Perform necessary multi-structured data preprocessing

> Develop analytics rapidly

> Process embedded analytics, analyzing both relational and non-relational data

> Produce semi-structured data as output, often with metadata and heuristic analysis

> Solve new analytical workloads with reduced time to insight

> Use massively parallel storage in Hadoop to efficiently store and retain data

Why Not Replace Analytical Relational Databases with Hadoop?

Analytical relational databases were created for rapid access to large data sets by many concurrent users. Typical analytical databases support SQL and connectivity to a large ecosystem of analysis tools. They efficiently combine complex data sets, automate data partitioning and indexing techniques, and provide complex analytics on structured data. They also offer security, workload management, and service-level guarantees on top of a relational store. Thus, the database abstracts the user from the mundane tasks of partitioning data and optimizing query performance.

Since Hadoop is founded on a distributed file system rather than a relational database, it removes the requirement of a data schema. Unfortunately, Hadoop also eliminates the benefits of an analytical relational database, such as interactive data access and a broad ecosystem of SQL-compatible tools. Integrating the best parts of Hadoop with the benefits of analytical relational databases is the optimum solution for a big data analytics architecture.

Benefits of a Unified Big Data Architecture for Analytics

Using Hadoop with a data discovery platform and integrated data warehouse can help you meet the challenges of gaining insight from big data volumes. With a blended solution, you can combine the developer-oriented MapReduce platform with the SQL-based BI tools familiar to business analysts. Providing the best of both worlds, this type of solution lets you use the right tool for the job: SQL for structured data, and MapReduce processing for large-scale procedural analytics that would otherwise require complex, multi-pass SQL. Business users can easily and intuitively perform analytics processes that would otherwise be difficult or impossible.

This ease of use in turn enables extended use of the data. Data scientists and analysts alike can manage and analyze both relational and non-relational data, inside and outside the integrated data warehouse. They can also perform iterative, rich, big data analytics with greater accuracy and effectiveness.

Because a blended solution offers higher performance than SQL-only analytics, users can gain insights faster. What's more, native integration of SQL and MapReduce lets users perform analysis without changing the underlying code, so they can dig deeper for insight.

Unifying these best-of-breed solutions also offers a lower total cost of ownership (TCO) than individual tools. Many software-only products come with deployment options that use commodity hardware and offer linear, elastic scaling. Appliance-based solutions deliver high value in a prepackaged form.

By unifying these solutions into a single reference architecture, companies can unlock value from big data volumes without needing to retrain personnel or hire expensive data scientists. By protecting your existing investment in relational database technology and user skill sets, blended solutions also are kind to your budget. And unlike typical open-source offerings, a blended solution supports corporate compliance, security, and usability requirements with greater rigor.
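As an illustration of the kind of large-scale procedural analytic discussed above, here is a minimal single-process Python sketch of clickstream sessionization, a task that is awkward in multi-pass SQL but natural in MapReduce style. The 30-minute gap rule, field names, and sample data are assumptions for the example, not from the source; a SQL-MapReduce engine would distribute the per-user partitions across cluster nodes.

```python
from collections import defaultdict

SESSION_GAP = 30 * 60  # assumed session timeout, in seconds

def sessionize(clicks):
    # Partition clicks by user -- the step a parallel engine
    # would distribute across nodes.
    by_user = defaultdict(list)
    for user, ts in clicks:
        by_user[user].append(ts)

    # Walk each user's clicks in time order, starting a new
    # session whenever the gap between clicks exceeds SESSION_GAP.
    sessions = {}
    for user, stamps in by_user.items():
        stamps.sort()
        ids, session, prev = [], 0, None
        for cur in stamps:
            if prev is not None and cur - prev > SESSION_GAP:
                session += 1
            ids.append(session)
            prev = cur
        sessions[user] = ids
    return sessions

clicks = [("u1", 0), ("u1", 600), ("u1", 4000), ("u2", 100)]
# u1's third click arrives 3400s after the second, so it opens session 1
print(sessionize(clicks))  # {'u1': [0, 0, 1], 'u2': [0]}
```

Expressing this per-partition ordered walk in pure SQL typically requires self-joins or multiple window-function passes, which is why the paper positions MapReduce-style processing alongside SQL rather than in place of it.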



Choosing the Right Big Data Analytics Solution

As big data challenges become more pressing, vendors are introducing products designed to help companies effectively handle the huge volumes of data and perform insight-enhancing analytics. But selecting the appropriate solution for your requirements need not be difficult.

With the inherent technical differences in data types, schema requirements, and analytical workloads, it's no surprise that certain solutions lend themselves to optimal performance in different parts of the unified big data architecture. The first criterion to consider should be what types of data and schema exist in your environment. Possibilities include:

> Data that uses a stable schema (structured): This can include data from packaged business processes with well-defined and known attributes, such as ERP data, inventory records, and supply chain records.

> Data that has an evolving schema (semi-structured): Examples include data generated by machine processes, with known but changing sets of attributes, such as web logs, call detail records, sensor logs, JSON (JavaScript Object Notation), social profiles, and Twitter feeds.

> Data that has a format, but no schema (unstructured): Unstructured data includes data captured by machines in a well-defined format but with no semantics, such as images, videos, web pages, and PDF documents. Semantics can be extracted from the raw data by interpreting the format and pulling out required data, as is done when detecting shapes in a video, recognizing faces in images, or spotting logos. Sometimes formatted data is accompanied by metadata, which can have a stable or evolving schema and needs to be classified and treated separately.

Each of these three schema types may include a wide spectrum of workloads that must be performed on the data. Table 1 lists several common data tasks and workload considerations.

Data Task: Low-cost storage and retention
Potential Workloads: Retains raw data in a manner that provides a low TCO-per-terabyte storage cost. Requires access to deep storage, but not at the same speeds as a front-line system.

Data Task: Loading
Potential Workloads: Brings data into the system from the source system.

Data Task: Pre-processing/prep/cleansing/constraint validation
Potential Workloads: Prepares data for downstream processing by, for example, fetching dimension data, recording a new incoming batch, or archiving an old window batch.

Data Task: Transformation
Potential Workloads: Converts one structure of data into another. This may mean going from third-normal form in a relational database to a star or snowflake schema, from text to a relational database, or from relational technology to a graph, as with structural transformations.

Data Task: Reporting
Potential Workloads: Queries historical data, such as what happened, where it happened, how much happened, and who did it (e.g., sales of a given product by region).

Data Task: Analytics (including user-driven, interactive, or ad hoc)
Potential Workloads: Performs relationship modeling via declarative SQL (e.g., scoring or basic statistics) or via procedural MapReduce (e.g., model building or time series).

Table 1. Matching Data Tasks and Workloads



Teradata Solutions for Big Data Analytics

To help companies cost-effectively gain valuable insight from big data volumes, Teradata recently introduced the Teradata Aster MapReduce Platform. This data discovery platform helps companies bridge the gap between esoteric data science technologies and the language of business. It combines the developer-oriented MapReduce platform with the SQL-based BI tools familiar to business analysts.

This unified, intuitive software product gives business analysts with ordinary SQL skills the ability to work like data scientists, asking questions and getting valuable insight from huge stores of data. Using this product, business analysts can quickly identify patterns and trends using a variety of techniques, such as pattern and graph analysis. For the first time, they can rapidly and intuitively explore massive volumes of multi-structured digital data from a wide variety of sources. And companies can unlock value from big data without needing to retrain personnel or hire expensive data scientists.

The Teradata Aster MapReduce Platform includes Aster Database 5.0, a suite of prepackaged analytic functions and apps, and an integrated development environment for easy development of custom SQL-MapReduce functions. The platform includes an embedded analytics engine that supports an array of common programming languages, allowing analysts to build and easily modify sophisticated algorithms without additional training or investment. Analysts can integrate insights from the discovery platform into the data warehouse, where they are available for ongoing strategic and operational analysis. Business analysts can then ask and answer even more differentiated business questions.

Democratizing Analytics

It's no secret that leading companies use data and analytics to drive competitive advantage. Some organizations build data science teams to mine big data volumes, using enhanced analytic techniques and tools that can expose hidden patterns in consumer behavior, preferences, fraud, and other business trends.

A new class of data discovery platform tools extends these valuable practices from highly skilled, highly paid developers and data scientists to analysts and business users. These platforms give users their choice of tools, whether they prefer SQL, BI tools, statistical packages (such as R or SAS), or programming languages.

By extending the use of these tools to a broader constituency of users, data discovery platforms help democratize the power of data science throughout the business. Instead of confining data discovery to data engineers, who may lack the business context of the problems they are asked to solve, the data discovery platform brings powerful analytics tools to the entire business community.

Teradata, Aster, and Hadoop: When to Use Which Solution

Figure 3 offers a framework to help enterprise architects most effectively use each part of a unified big data architecture. This framework allows a best-of-breed approach that you can apply to each schema type, helping you achieve maximum performance, rapid enterprise adoption, and the lowest TCO. The following use cases demonstrate how you can apply this framework to your big data challenges.

Stable Schema

Sample applications: Financial analysis, ad-hoc/OLAP queries, enterprise-wide BI and reporting, spatial/temporal analysis, and active execution (in-process, operational insights)

Characteristics: In applications with a stable schema, the data model is relatively fixed. For example, financial reporting and analysis is conducted much the same way throughout the fiscal quarter or year. Transactions collected from point-of-sale, inventory, customer relationship management, and accounting systems are known and change infrequently. This business requires ACID (atomicity, consistency, isolation, durability) transaction guarantees, as well as security, well-documented data models, extract, transform, and



load (ETL) jobs, data lineage, and metadata management throughout the data pipeline, from storage to refining through reporting and analytics.

[Figure 3. Choosing Solutions by Workload and Data Type. Stable schema: Teradata/Hadoop for low-cost storage and retention; Teradata for loading and refining, transformations, and reporting; Teradata (SQL analytics) for user-driven, interactive analytics. Evolving schema: Hadoop for low-cost storage and retention; Aster/Hadoop for loading and refining; Aster for transformations (joining with structured data) and reporting; Aster (SQL + MapReduce analytics) for user-driven, interactive analytics. Format, no schema: Hadoop for low-cost storage and retention, loading and refining, and transformations; Aster (MapReduce analytics) for analytics.]

Recommended Approach: Leverage the strength of the relational model and SQL. You may also want to use Hadoop to support low-cost, scale-out storage and retention for some transactional data that requires less rigor in security and metadata management.

Suggested Products: Teradata provides multiple solutions to handle low-cost storage and retention applications as well as loading and transformation tasks. With this architectural flexibility, Teradata products help customers meet varying cost, data latency, and performance requirements. For example:

> Customers that want to store large data volumes and perform light transformations can use the Teradata Extreme Data Appliance. This platform offers low-cost data storage with high compression rates at a highly affordable price.

> For CPU-intensive transformations, the Teradata Data Warehouse Appliance supports mid-level data storage with built-in automatic compression engines.

> Customers that want to minimize data movement and complexity, and are executing transformations that require reference data, can use the Teradata Active Enterprise Data Warehouse. This appliance provides a hybrid, multi-temperature architecture that places cold data on hard disks and hot data on solid-state storage devices. With Teradata Database, customers can dynamically and automatically compress cold data, driving higher volumes of data into the cold tier.

Evolving Schema

Sample applications: Interactive data discovery, including web click streams, social feeds, set-top box analysis, sensor logs, and JSON

Characteristics: Data generated by machine processes typically requires a schema that changes or evolves rapidly. The schema itself may be structured, but the changes occur too quickly for most data models, ETL steps, and reports to keep pace. Company e-commerce sites, social media, and other fast-changing systems are good examples of evolving schema. In many cases, an evolving schema has two components: one fixed and one variable. For example, web logs



generate an IP address, time stamp, and cookie ID, which are fixed. The URL string, which is rich with information such as referral URLs and the search terms used to find a page, varies more.

Recommended Approach: The design of web sites, applications, third-party sites, search engine marketing, and search engine optimization strategies changes dynamically over time. Look for a solution that eases the management of evolving-schema data by providing features that:

> Leverage the back end of the relational database management system (RDBMS), so you can easily add or remove columns

> Make it easy for queries to do late binding of the structure

> Optimize queries dynamically by collecting relevant statistics on the variable part of the data

> Support encoding and enforcement of constraints on the variable part of the data

Suggested Products: Teradata Aster is an ideal platform for ingesting and analyzing data in an evolving schema. The product provides a discovery platform that allows evolving data to be stored natively, without pre-defining how the variable part of the data should be broken up. Teradata Aster also allows the fixed part of the data to be stored in a schema and indexed for performance. With this feature, analysts can define the structure of the variable component at query run time. This task happens as part of the SQL-MapReduce analytic workflow, in a process called late data binding or "schema on read." The system handles this processing behind the scenes, allowing the analyst to interpret and model data on the fly, based on different analytic requirements. Analysts never need to change data models or build new ETL scripts in order to break out the variable data. This feature reduces cost and saves time, giving analysts the freedom to explore data without being constrained by a rigid schema.

Hadoop can also ingest files and store them without structure, providing a scalable data landing and staging area for huge volumes of machine-generated data. Because Hadoop uses the HDFS file system for storage instead of a relational database, it requires additional processing steps to create schema on the fly for analysis. Therefore, Hadoop can slow an iterative, interactive data discovery process.

However, if your process includes known batch data transformation steps that require limited interactivity, Hadoop MapReduce can be a good choice. Hadoop MapReduce enables large-scale data refining, so you can extract higher-value data from raw files for downstream data discovery and analytics. In an evolving schema, Hadoop and Teradata Aster are a perfect complement for ingesting, refining, and discovering valuable insights from big data volumes.

No Schema

Sample applications: Image processing; audio/video storage and refining; storage; and batch transformation and extraction

Characteristics: Data that has a format but no schema typically arrives in a well-defined file format. However, it lacks semantics and does not easily fit into the notion of traditional RDBMS rows and columns. There is often a need to store these data types in their native file formats.

Recommended Approach: Hadoop MapReduce provides a large-scale processing framework for workloads that need to extract semantics from raw file data. By interpreting the format and pulling out required data, Hadoop can discern and categorize shapes from video and perform face recognition in images. Sometimes formatted data is accompanied by metadata, which can be extracted, classified, and treated separately.

Suggested Products: When running batch jobs to extract metadata from images or text, Hadoop is an ideal platform. You can then analyze or join this metadata with other dimensional data to provide additional value. Once you've used Hadoop to prepare the refined data, load it into Teradata Aster to quickly and easily join the data with other evolving- or stable-schema data.

Big Data Analytics in Action

Since its recent release, the Teradata Aster discovery platform has already helped dozens of customers realize dramatic business benefit through enhanced insight. The following examples illustrate how a company can use the Teradata Aster discovery platform, Teradata integrated data warehouse technology, and Hadoop to deliver new business insight from big data analytics.

EB-7234 > 0612 > PAGE 11 OF 13


Harnessing the Value of Big Data Analytics
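Before turning to the examples, the late data binding ("schema on read") approach described in the evolving-schema discussion can be illustrated with a small sketch. This is plain Python, not Teradata Aster's actual SQL-MapReduce syntax; the record fields and function names are illustrative only. The fixed part of each record sits in a stable schema, while structure is imposed on the variable part only at query run time.

```python
import json

# Each record has a fixed part (stored in a stable schema) and a
# variable part kept as a raw string, with no structure imposed at
# load time. The fields below are illustrative.
records = [
    {"user_id": 1, "event_time": "2012-06-01T10:00:00",
     "payload": '{"campaign": "spring", "clicks": 3}'},
    {"user_id": 2, "event_time": "2012-06-01T10:05:00",
     "payload": '{"referrer": "search", "query": "checking account"}'},
]

def late_bind(record, fields):
    """Impose structure on the variable part at query run time
    ("schema on read"): parse the payload and pull out only the
    fields this particular query cares about."""
    parsed = json.loads(record["payload"])
    return {f: parsed.get(f) for f in fields}

# One query binds one view of the payload...
print([late_bind(r, ["campaign", "clicks"]) for r in records])
# ...and a later query can bind a different one, with no reload.
print([late_bind(r, ["referrer"]) for r in records])
```

Because nothing about the payload is declared at load time, a later query can bind a completely different structure without reloading or re-modeling the data, which is the flexibility the evolving-schema discussion describes.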

[Figure 4 diagram: multi-structured raw data (call center voice records, check images, web and social data) flows into Hadoop, which captures, stores, and transforms the images and call records; the Teradata Aster discovery platform performs path and sentiment analysis on the multi-structured call and check data; dimensional data and analytic results flow through ETL tools into the Teradata integrated data warehouse, which drives marketing automation for a customer retention campaign.]

Figure 4. Customer Retention Example.

Customer Retention and Profitability

Banks and other companies with retail operations know that keeping a customer satisfied is far less costly than replacing a dissatisfied customer. A unified big data architecture can help companies better understand customer communications and take action to prevent unhappy consumers from defecting to a competitor. (See Figure 4.)

For example, assume that a customer, Mr. Jones, calls a bank's contact center to complain about an account fee. The bank collects interactive voice response information from the call center, storing this unstructured data in the data discovery platform. The bank also uses its Teradata integrated data warehouse to store and analyze high-resolution check images.

Using Hadoop, analysts can efficiently capture these huge volumes of image and call data. Then they can use the Aster-Hadoop adaptor or Aster SQL-H method for on-the-fly access of Hadoop data at query runtime to merge the unhappy-customer data from call center records with the check data.

By using Aster nPath, one of the SQL-MapReduce-enabled functions in the Teradata Aster MapReduce Platform, an analyst can quickly determine whether Mr. Jones may be about to switch over to the new financial institution. The analyst identifies the unhappy sentiment data from Mr. Jones' call to the contact center. In addition, the analyst notes that one of the customer's deposited checks is drawn on the account of another bank, with the note "brokerage account opening bonus." The analyst can recommend that a customer support agent reach out to Mr. Jones with an offer designed to prevent him from leaving.

Furthermore, the analyst can use these tools to reveal customers with similar behavior to that of Mr. Jones. Marketing and sales personnel can proactively approach these dissatisfied customers, making offers that save those relationships, too.


[Figure 5 diagram: documents on a file server (MS Office 97-2010 doc/ppt/xls/msg, PDF, HTML, ePub, text, jpg) flow into the Aster Database, where SQL-MapReduce document and email parsers run; processing steps include tokenization, parallel extraction, sentiment analysis, form/table/mail ID handling, image metadata extraction, custom parsing of subject lines (e.g., CaseNumber), and custom text processing, with results feeding SQL analysis and graph visualization. All MapReduce code automatically executes in parallel in the database.]

Figure 5. Text Extraction and Analysis Example.
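The flow sketched in Figure 5 (parse documents, tokenize, then score sentiment) can be illustrated in miniature. This is plain Python with a toy word-list scorer, not the platform's actual SQL-MapReduce functions; the word lists and sample documents are illustrative assumptions.

```python
import re

# Toy tokenize-and-score pipeline in the spirit of Figure 5.
# The positive/negative word lists are illustrative only; a real
# sentiment function would use a far richer model.
POSITIVE = {"great", "helpful", "happy"}
NEGATIVE = {"fee", "unhappy", "cancel", "complaint"}

def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def sentiment(text):
    """Return a net score: positive minus negative token counts."""
    tokens = tokenize(text)
    return (sum(t in POSITIVE for t in tokens)
            - sum(t in NEGATIVE for t in tokens))

docs = {
    "call_1001": "Customer unhappy about the account fee, may cancel.",
    "email_2002": "Great service, the agent was very helpful.",
}
scores = {doc_id: sentiment(text) for doc_id, text in docs.items()}
print(scores)  # {'call_1001': -3, 'email_2002': 2}
```

In the architecture shown in the figure, each of these steps would run as a parallel function inside the database across the full document corpus, rather than as a single-machine script.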

Text Extraction and Analysis

Applications such as e-discovery, sentiment analysis, and search rely on the ability to store, process, and analyze massive amounts of documents, text, and emails. In their native formats, these data types are very difficult to analyze, and huge data volumes further complicate analysis efforts.

The Teradata Aster MapReduce Platform includes features that support text extraction and analysis applications. Hadoop's HDFS is ideal for quickly loading and storing any type of file in its native format. Once stored, these files can be processed to extract the relevant data and structure it for analysis.

Next, analysts can use SQL-MapReduce functions for tokenization, e-mail parsing, sentiment analysis, and other types of processing. These features allow businesses to identify positive or negative consumer sentiments or look for trends or correlations in email communications. New insights can be combined with other information about the customer in the integrated data warehouse. Analyzing this data can help companies identify customers likely to churn or to identify brand advocates who might be open to a marketing affiliation program to help drive awareness and sales.

For More Information

For more information about how you can bring more value to the business through a unified big data architecture, contact your Teradata or Teradata Aster representative, or visit us on the web at Teradata.com or Asterdata.com.

The Best Decision Possible is a trademark, and Teradata and the Teradata logo are registered trademarks of Teradata Corporation and/or its affiliates in the U.S.
or worldwide. Teradata continually improves products as new technologies and components become available. Teradata, therefore, reserves the right to change
specifications without prior notice. All features, functions, and operations described herein may not be marketed in all parts of the world. Consult your Teradata
representative or Teradata.com for more information.
Copyright © 2012 by Teradata Corporation. All Rights Reserved. Produced in U.S.A.

EB-7234 > 0612 > PAGE 13 OF 13
