AWS Big Data
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
PROCESS
STORE
PROCESS: Amazon EMR with Spark & Hive
http://aws.amazon.com/big-data/use-cases/
Collect
"http://www.swivel.com/graphs/show/1163466"
"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11 (.NET CLR 3.5.30729)"
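The two quoted values above are the referrer and User-Agent fields of a web access-log entry. As a sketch, assuming the logs use Apache's standard "combined" format (an assumption: the slide shows only these two fields), a whole line can be parsed with one regular expression in plain Scala. AccessLogEntry and CombinedLogParser are hypothetical names used only for illustration.

```scala
import scala.util.matching.Regex

object CombinedLogParser {
  // Hypothetical record type for one access-log entry (Apache combined format assumed).
  final case class AccessLogEntry(
      host: String, ident: String, user: String, timestamp: String,
      request: String, status: Int, bytes: String,
      referrer: String, userAgent: String)

  // Combined format: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
  private val CombinedLog: Regex =
    """^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"$""".r

  // Returns None for lines that do not match the assumed format.
  def parse(line: String): Option[AccessLogEntry] = line match {
    case CombinedLog(h, l, u, t, r, s, b, ref, ua) =>
      Some(AccessLogEntry(h, l, u, t, r, s.toInt, b, ref, ua))
    case _ => None
  }
}
```

A matching line yields Some(entry) with the referrer and User-Agent already unquoted; anything else yields None. This regex approach is an alternative to the token-recombining code in the Process step, not a description of what the demo itself runs.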
Process
Apache Spark
Fast, general-purpose engine for large-scale data processing
Write applications quickly in Java, Scala, or Python
Combine SQL, streaming, and complex analytics
Spark SQL
Spark's module for working with structured data using SQL
Apache Zeppelin
https://zeppelin.incubator.apache.org/
Open Zeppelin in your local web browser and create a new Note:
http://localhost:18890
Combine fields: tokens A, B, C (split apart at spaces) → one field "A B C"
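// Assumes accessLogFields: RDD[Array[String]], i.e. each log line split on spaces.
// Step 1: re-join tokens that belong to one quoted or bracketed field,
// e.g. [10/Oct/2000:13:55:36 -0700] or "GET /index.html HTTP/1.0".
val accessLogColumns = accessLogFields
  .map { arrayOfFields =>
    var pending = "" // partially assembled field, or "" when none is open
    for (field <- arrayOfFields) yield {
      val token = field.replaceAll("\\[|\\]", "\"") // treat [ and ] like quotes
      pending =
        if (pending.startsWith("\"") && !pending.endsWith("\"")) pending + " " + token
        else token
      val current = pending
      if (pending.endsWith("\"")) pending = "" // field complete: start fresh
      current
    }
  }
  // Step 2: keep complete quoted fields and unquoted fields; drop partial fragments.
  .map(fields => fields.filter(f =>
    (f.startsWith("\"") && f.endsWith("\"")) || !f.startsWith("\"")))
  // Step 3: strip the quote characters.
  .map(fields => fields.map(_.replaceAll("\"", "")))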
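To check the field-combining logic of the Spark pipeline without a cluster, the same three steps can be applied to a single line in plain Scala. This is a sketch: CombineFields.combine is a hypothetical helper, and it assumes, as the Spark code does, that each line has already been split on single spaces.

```scala
object CombineFields {
  // Plain-Scala version of the three Spark steps, applied to one split line.
  def combine(arrayOfFields: Array[String]): Array[String] = {
    var pending = "" // partially assembled quoted field, or "" when none is open
    val joined = for (field <- arrayOfFields) yield {
      val token = field.replaceAll("\\[|\\]", "\"") // treat [ and ] like quotes
      pending =
        if (pending.startsWith("\"") && !pending.endsWith("\"")) pending + " " + token
        else token
      val current = pending
      if (pending.endsWith("\"")) pending = "" // field complete: start fresh
      current
    }
    joined
      // keep complete quoted fields and unquoted fields; drop partial fragments
      .filter(f => (f.startsWith("\"") && f.endsWith("\"")) || !f.startsWith("\""))
      // strip the quote characters
      .map(_.replaceAll("\"", ""))
  }
}
```

For the hypothetical line 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326, combine(line.split(" ")) returns seven fields: 127.0.0.1, -, -, 10/Oct/2000:13:55:36 -0700, GET /index.html HTTP/1.0, 200, 2326. Because any token ending in a double quote closes the current field, an escaped quote inside a quoted value (for example in a User-Agent) would end that field early.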
Analyze
psql -h YOUR-REDSHIFT-ENDPOINT \
-p 8192 -U master demo
Or use any JDBC or ODBC SQL client with the PostgreSQL 8.x drivers or native Amazon Redshift support:
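The psql flags (-h host, -p port, -U user, database name) map directly onto a JDBC connection. As a hedged sketch using only java.sql, assuming the PostgreSQL JDBC driver (URL scheme jdbc:postgresql://) is on the classpath; with Amazon's Redshift JDBC driver the scheme would be jdbc:redshift:// instead. RedshiftJdbc, jdbcUrl, and printQuery are hypothetical names, and the password is not given on the slide.

```scala
import java.sql.{Connection, DriverManager}

object RedshiftJdbc {
  // Build a JDBC URL from the same values passed to psql (-h, -p, database).
  // Assumes the PostgreSQL driver's URL scheme.
  def jdbcUrl(host: String, port: Int, database: String): String =
    s"jdbc:postgresql://$host:$port/$database"

  // Run a query and print the first column of each row.
  // Requires a PostgreSQL (or Redshift) JDBC driver on the classpath at run time.
  def printQuery(url: String, user: String, password: String, sql: String): Unit = {
    val conn: Connection = DriverManager.getConnection(url, user, password)
    try {
      val rs = conn.createStatement().executeQuery(sql)
      while (rs.next()) println(rs.getString(1))
    } finally conn.close() // closing the connection also closes its statements
  }
}
```

For the endpoint above, jdbcUrl("YOUR-REDSHIFT-ENDPOINT", 8192, "demo") gives jdbc:postgresql://YOUR-REDSHIFT-ENDPOINT:8192/demo, which printQuery can then use with the master user.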
Aginity Workbench for Amazon Redshift
SQL Workbench/J
DEMO
Amazon QuickSight
Demo architecture (diagram): Amazon Kinesis Firehose → Amazon S3 → Event Notification → Amazon EMR (Spark job; list of objects from Lambda) → Amazon S3 → Amazon Redshift (write to Amazon Redshift using spark-redshift) → Amazon QuickSight
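In the demo, the Spark job receives a list of objects from Lambda. One plausible way to turn such a list into Spark input, sketched here as an assumption rather than the demo's actual code: decode each object key (keys in S3 event notifications are URL-encoded, so a space arrives as "+") and join the resulting s3:// paths with commas, since SparkContext.textFile accepts a comma-separated list of paths. S3InputPaths, decodeKey, and inputPath are hypothetical names.

```scala
import java.net.URLDecoder

object S3InputPaths {
  // Object keys in S3 event notifications are URL-encoded ("+" for a space,
  // %XX for other characters), so decode them before building paths.
  def decodeKey(encodedKey: String): String =
    URLDecoder.decode(encodedKey, "UTF-8")

  // Build one comma-separated input string, e.g. for sc.textFile(...) on EMR,
  // where the s3:// scheme is handled by EMRFS.
  def inputPath(bucket: String, encodedKeys: Seq[String]): String =
    encodedKeys.map(k => s"s3://$bucket/${decodeKey(k)}").mkString(",")
}
```

Keys that themselves contain commas would break the comma-separated form; in that case reading each path separately and combining the results with union is the safer choice.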