Amazon's trademarks and trade dress may not be used in connection with any product or service that is not
Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or
discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may
or may not be affiliated with, connected to, or sponsored by Amazon.
Amazon Kinesis Data Analytics for SQL Applications Developer Guide
Table of Contents
What Is Amazon Kinesis Data Analytics for SQL Applications? .................................................................. 1
When Should I Use Amazon Kinesis Data Analytics? ........................................................................ 1
Are You a First-Time User of Amazon Kinesis Data Analytics? ........................................................... 1
How It Works .................................................................................................................................... 3
Input ........................................................................................................................................ 5
Configuring a Streaming Source ........................................................................................... 5
Configuring a Reference Source ........................................................................................... 7
Working with JSONPath ...................................................................................................... 9
Mapping Streaming Source Elements to SQL Input Columns .................................................. 13
Using the Schema Discovery Feature on Streaming Data ....................................................... 17
Using the Schema Discovery Feature on Static Data .............................................................. 18
Preprocessing Data Using a Lambda Function ...................................................................... 21
Parallelizing Input Streams for Increased Throughput ........................................................... 28
Application Code ...................................................................................................................... 31
Output .................................................................................................................................... 33
Creating an Output Using the AWS CLI ............................................................................... 33
Using a Lambda Function as Output ................................................................................... 34
Application Output Delivery Model ..................................................................................... 40
Error Handling ......................................................................................................................... 41
Reporting Errors Using an In-Application Error Stream .......................................................... 41
Auto Scaling Applications .......................................................................................................... 42
Tagging ................................................................................................................................... 42
Adding Tags when an Application is Created ........................................................................ 42
Adding or Updating Tags for an Existing Application ............................................................. 43
Listing Tags for an Application ........................................................................................... 43
Removing Tags from an Application ................................................................................... 43
Getting Started ................................................................................................................................ 45
Step 1: Set Up an Account ........................................................................................................ 45
Sign Up for AWS .............................................................................................................. 45
Create an IAM User .......................................................................................................... 46
Next Step ........................................................................................................................ 46
Step 2: Set Up the AWS CLI ....................................................................................................... 46
Next Step ........................................................................................................................ 47
Step 3: Create Your Starter Analytics Application ........................................................................ 47
Step 3.1: Create an Application .......................................................................................... 49
Step 3.2: Configure Input .................................................................................................. 50
Step 3.3: Add Real-Time Analytics (Add Application Code) ..................................................... 53
Step 3.4: (Optional) Update the Application Code ................................................................. 56
Step 4 (Optional) Edit the Schema and SQL Code Using the Console ............................................... 58
Working with the Schema Editor ........................................................................................ 58
Working with the SQL Editor ............................................................................................. 65
Streaming SQL Concepts ................................................................................................................... 68
In-Application Streams and Pumps ............................................................................................. 68
Timestamps and the ROWTIME Column ...................................................................................... 69
Understanding Various Times in Streaming Analytics ............................................................ 69
Continuous Queries .................................................................................................................. 71
Windowed Queries ................................................................................................................... 72
Stagger Windows ............................................................................................................. 72
Tumbling Windows ........................................................................................................... 77
Sliding Windows ............................................................................................................... 78
Stream Joins ............................................................................................................................ 82
Example 1: Report Orders Where There Are Trades Within One Minute of the Order Being
Placed ............................................................................................................................. 82
Examples ......................................................................................................................................... 84
When Should I Use Amazon Kinesis Data Analytics?
To get started with Kinesis Data Analytics, you create a Kinesis data analytics application that
continuously reads and processes streaming data. The service supports ingesting data from Amazon
Kinesis Data Streams and Amazon Kinesis Data Firehose streaming sources. Then, you author your SQL
code using the interactive editor and test it with live streaming data. You can also configure destinations
where you want Kinesis Data Analytics to send the results.
Kinesis Data Analytics supports Amazon Kinesis Data Firehose (Amazon S3, Amazon Redshift, Amazon
Elasticsearch Service, and Splunk), AWS Lambda, and Amazon Kinesis Data Streams as destinations.
The following are typical scenarios for using Kinesis Data Analytics:
• Generate time-series analytics – You can calculate metrics over time windows, and then stream values to Amazon S3 or Amazon Redshift through a Kinesis data delivery stream.
• Feed real-time dashboards – You can send aggregated and processed streaming data results
downstream to feed real-time dashboards.
• Create real-time metrics – You can create custom metrics and triggers for use in real-time monitoring,
notifications, and alarms.
For information about the SQL language elements that are supported by Kinesis Data Analytics, see
Amazon Kinesis Data Analytics SQL Reference.
Are You a First-Time User of Amazon Kinesis Data Analytics?
If you are a first-time user of Amazon Kinesis Data Analytics, we recommend that you read the sections in this guide in the following order:
1. Read the How It Works section of this guide. This section introduces various Kinesis Data Analytics components that you work with to create an end-to-end experience. For more information, see Amazon Kinesis Data Analytics for SQL Applications: How It Works (p. 3).
2. Try the Getting Started exercises. For more information, see Getting Started with Amazon Kinesis Data Analytics for SQL Applications (p. 45).
3. Explore the streaming SQL concepts. For more information, see Streaming SQL Concepts (p. 68).
4. Try additional examples. For more information, see Example Applications (p. 84).
How It Works
Kinesis Data Analytics applications continuously read and process streaming data in real time. You write
application code using SQL to process the incoming streaming data and produce output. Then, Kinesis
Data Analytics writes the output to a configured destination. The following diagram illustrates a typical
application architecture.
Each application has a name, description, version ID, and status. Amazon Kinesis Data Analytics assigns
a version ID when you first create an application. This version ID is updated when you update any
application configuration. For example, if you add an input configuration, add or delete a reference
data source, add or delete an output configuration, or update application code, Kinesis Data Analytics
updates the current application version ID. Kinesis Data Analytics also maintains timestamps for when an
application was created and last updated.
• Input – The streaming source for your application. You can select either a Kinesis data stream or
a Kinesis Data Firehose data delivery stream as the streaming source. In the input configuration,
you map the streaming source to an in-application input stream. The in-application stream is like a
continuously updating table upon which you can perform the SELECT and INSERT SQL operations.
In your application code, you can create additional in-application streams to store intermediate query
results.
You can optionally partition a single streaming source into multiple in-application input streams to improve throughput. For more information, see Limits (p. 165) and Configuring Application Input (p. 5).
Amazon Kinesis Data Analytics provides a timestamp column called ROWTIME in each in-application stream (see Timestamps and the ROWTIME Column (p. 69)). You can use this column in time-based windowed queries (a sketch follows this list). For more information, see Windowed Queries (p. 72).
You can optionally configure a reference data source to enrich your input data stream within the
application. It results in an in-application reference table. You must store your reference data as
an object in your S3 bucket. When the application starts, Amazon Kinesis Data Analytics reads
the Amazon S3 object and creates an in-application table. For more information, see Configuring
Application Input (p. 5).
• Application code – A series of SQL statements that process input and produce output. You can write
SQL statements against in-application streams and reference tables. You can also write JOIN queries to
combine data from both of these sources.
For information about the SQL language elements that are supported by Kinesis Data Analytics, see
Amazon Kinesis Data Analytics SQL Reference.
In its simplest form, application code can be a single SQL statement that selects from a streaming
input and inserts results into a streaming output. It can also be a series of SQL statements where
output of one feeds into the input of the next SQL statement. Further, you can write application code
to split an input stream into multiple streams. You can then apply additional queries to process these
streams. For more information, see Application Code (p. 31).
• Output – In application code, query results go to in-application streams. In your application code,
you can create one or more in-application streams to hold intermediate results. You can then
optionally configure the application output to persist data in the in-application streams that hold
your application output (also referred to as in-application output streams) to external destinations.
External destinations can be a Kinesis Data Firehose delivery stream or a Kinesis data stream. Note the
following about these destinations:
• You can configure a Kinesis Data Firehose delivery stream to write results to Amazon S3, Amazon
Redshift, or Amazon Elasticsearch Service (Amazon ES).
• You can also write application output to a custom destination instead of Amazon S3 or Amazon
Redshift. To do that, you specify a Kinesis data stream as the destination in your output
configuration. Then, you configure AWS Lambda to poll the stream and invoke your Lambda
function. Your Lambda function code receives stream data as input. In your Lambda function code,
you can write the incoming data to your custom destination. For more information, see Using AWS
Lambda with Amazon Kinesis Data Analytics.
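As a minimal sketch of how these components fit together (the stream, column, and window length shown here are illustrative, not taken from this guide), application code might aggregate the in-application input stream into a destination stream using the ROWTIME column:

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    ticker_count  INTEGER);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
  INSERT INTO "DESTINATION_SQL_STREAM"
    -- Count records per ticker over one-minute tumbling windows based on ROWTIME.
    SELECT STREAM ticker_symbol, COUNT(*) AS ticker_count
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol,
             STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);

The destination stream can then be configured as the application output and persisted to an external destination.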
In addition, note the following:
• Amazon Kinesis Data Analytics needs permissions to read records from a streaming source and write application output to the external destinations. You use IAM roles to grant these permissions.
• Kinesis Data Analytics automatically provides an in-application error stream for each application. If your application has issues while processing certain records (for example, because of a type mismatch or late arrival), those records are written to the error stream. You can configure application output to direct Kinesis Data Analytics to persist the error stream data to an external destination for further evaluation (a sketch of such a pump follows this list). For more information, see Error Handling (p. 41).
• Amazon Kinesis Data Analytics ensures that your application output records are written to the
configured destination. It uses an "at least once" processing and delivery model, even if you experience
an application interruption. For more information, see Delivery Model for Persisting Application
Output to an External Destination (p. 40).
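To act on the error stream mentioned in this list, you can treat it like any other in-application stream. The following sketch (not from this guide) copies errors into a stream that you could then persist to an external destination; the column names selected from "error_stream" are illustrative, so check your application's error stream schema for the exact names.

CREATE OR REPLACE STREAM "ERROR_OUTPUT_STREAM" (
    error_time    TIMESTAMP,
    error_message VARCHAR(1024));

CREATE OR REPLACE PUMP "ERROR_PUMP" AS
  INSERT INTO "ERROR_OUTPUT_STREAM"
    -- "error_stream" is the in-application error stream that the service provides;
    -- the column names used here are illustrative.
    SELECT STREAM "ERROR_TIME", "MESSAGE"
    FROM "error_stream";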
Topics
• Configuring Application Input (p. 5)
• Application Code (p. 31)
• Configuring Application Output (p. 33)
• Error Handling (p. 41)
• Automatically Scaling Applications to Increase Throughput (p. 42)
• Using Tagging (p. 42)
Topics
• Configuring a Streaming Source (p. 5)
• Configuring a Reference Source (p. 7)
• Working with JSONPath (p. 9)
• Mapping Streaming Source Elements to SQL Input Columns (p. 13)
• Using the Schema Discovery Feature on Streaming Data (p. 17)
• Using the Schema Discovery Feature on Static Data (p. 18)
• Preprocessing Data Using a Lambda Function (p. 21)
• Parallelizing Input Streams for Increased Throughput (p. 28)
Configuring a Streaming Source
Note
If the Kinesis data stream is encrypted, Kinesis Data Analytics accesses the data in the encrypted
stream seamlessly with no further configuration needed. Kinesis Data Analytics does not store
unencrypted data read from Kinesis Data Streams. For more information, see What Is Server-
Side Encryption For Kinesis Data Streams?.
Kinesis Data Analytics continuously polls the streaming source for new data and ingests it in in-
application streams according to the input configuration.
Note
Adding a Kinesis Stream as your application's input does not affect the data in the stream.
If another resource such as a Kinesis Data Firehose delivery stream also accessed the same
Kinesis stream, both the Kinesis Data Firehose delivery stream and the Kinesis Data Analytics
application would receive the same data. Throughput and throttling might be affected, however.
Your application code can query the in-application stream. As part of the input configuration, you provide the following:
• Streaming source – You provide the Amazon Resource Name (ARN) of the stream and an IAM role that
Kinesis Data Analytics can assume to access the stream on your behalf.
• In-application stream name prefix – When you start the application, Kinesis Data Analytics creates
the specified in-application stream. In your application code, you access the in-application stream
using this name.
You can optionally map a streaming source to multiple in-application streams. For more information,
see Limits (p. 165). In this case, Amazon Kinesis Data Analytics creates the specified number of in-
application streams with names as follows: prefix_001, prefix_002, and prefix_003. By default,
Kinesis Data Analytics maps the streaming source to one in-application stream named prefix_001.
There is a limit on the rate that you can insert rows in an in-application stream. Therefore, Kinesis
Data Analytics supports multiple such in-application streams so that you can bring records into your
application at a much faster rate. If you find that your application is not keeping up with the data in
the streaming source, you can add units of parallelism to improve performance.
• Mapping schema – You describe the record format (JSON, CSV) on the streaming source. You also
describe how each record on the stream maps to columns in the in-application stream that is created.
This is where you provide column names and data types.
Note
Kinesis Data Analytics adds quotation marks around the identifiers (stream name and column
names) when creating the input in-application stream. When querying this stream and the
columns, you must specify them in quotation marks using the same casing (matching lowercase
and uppercase letters exactly). For more information about identifiers, see Identifiers in the
Amazon Kinesis Data Analytics SQL Reference.
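For example (a sketch, not from this guide), suppose that the input configuration created a stream named "SOURCE_SQL_STREAM_001" with a column named "Company_Name". Because the identifiers were created with quotation marks, your queries must repeat the exact casing:

CREATE OR REPLACE STREAM "COMPANY_OUTPUT" (company_name VARCHAR(64));

CREATE OR REPLACE PUMP "COMPANY_PUMP" AS
  INSERT INTO "COMPANY_OUTPUT"
    -- Works: the quoted identifier matches the casing used when the input
    -- stream and column were created.
    SELECT STREAM "Company_Name" FROM "SOURCE_SQL_STREAM_001";

-- Referencing the column without quotation marks (Company_Name) would fail,
-- because the unquoted identifier is folded to uppercase (COMPANY_NAME).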
You can create an application and configure inputs in the Amazon Kinesis Data Analytics console. The console then makes the necessary API calls. You can also configure application input when you create a new application or when you add an input configuration to an existing application. For more information, see CreateApplication (p. 203) and AddApplicationInput (p. 191). The following is the input configuration part of the CreateApplication API request body:
"Inputs": [
{
"InputSchema": {
"RecordColumns": [
{
"Mapping": "string",
"Name": "string",
"SqlType": "string"
}
],
"RecordEncoding": "string",
"RecordFormat": {
"MappingParameters": {
"CSVMappingParameters": {
"RecordColumnDelimiter": "string",
"RecordRowDelimiter": "string"
},
"JSONMappingParameters": {
"RecordRowPath": "string"
}
},
"RecordFormatType": "string"
}
},
"KinesisFirehoseInput": {
"ResourceARN": "string",
"RoleARN": "string"
},
"KinesisStreamsInput": {
"ResourceARN": "string",
"RoleARN": "string"
},
"Name": "string"
}
]
Configuring a Reference Source
You store reference data in the Amazon S3 object using supported formats (CSV, JSON). For example,
suppose that your application performs analytics on stock orders. Assume the following record format
on the streaming source:
In this case, you might then consider maintaining a reference data source to provide details for each
stock ticker, such as company name.
Ticker, Company
AMZN, Amazon
XYZ, SomeCompany
...
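For example, application code could join the streaming source with the in-application reference table to attach the company name to each record. The following is a sketch, not an example from this guide; the stream name SOURCE_SQL_STREAM_001, the reference table name COMPANY_DETAILS, and the column names are assumptions.

CREATE OR REPLACE STREAM "ENRICHED_STREAM" (
    ticker  VARCHAR(4),
    company VARCHAR(64));

CREATE OR REPLACE PUMP "ENRICH_PUMP" AS
  INSERT INTO "ENRICHED_STREAM"
    -- Attach the company name from the reference table to each streaming record.
    SELECT STREAM "SOURCE_SQL_STREAM_001"."Ticker", "c"."Company"
    FROM "SOURCE_SQL_STREAM_001"
    LEFT JOIN "COMPANY_DETAILS" AS "c"
      ON "SOURCE_SQL_STREAM_001"."Ticker" = "c"."Ticker";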
You can add a reference data source to your application either with the API (for example, the AddApplicationReferenceDataSource action shown later in this section) or with the console.
For information about adding reference data using the console, see Example: Adding Reference Data to a
Kinesis Data Analytics Application (p. 118).
Note the following about how Kinesis Data Analytics loads the reference data:
• If the application is running, Kinesis Data Analytics creates an in-application reference table, and then loads the reference data immediately.
• If the application is not running (for example, it's in the ready state), Kinesis Data Analytics saves only
the updated input configuration. When the application starts running, Kinesis Data Analytics loads the
reference data in your application as a table.
Suppose that you want to refresh the data after Kinesis Data Analytics creates the in-application
reference table. Perhaps you updated the Amazon S3 object, or you want to use a different Amazon
S3 object. In this case, you can either explicitly call UpdateApplication (p. 239), or choose Actions,
Synchronize reference data table in the console. Kinesis Data Analytics does not refresh the in-
application reference table automatically.
There is a limit on the size of the Amazon S3 object that you can create as a reference data source. For
more information, see Limits (p. 165). If the object size exceeds the limit, Kinesis Data Analytics can't
load the data. The application state appears as running, but the data is not being read.
When you add a reference data source, you provide the following information:
• S3 bucket and object key name – In addition to the bucket name and object key, you also provide an
IAM role that Kinesis Data Analytics can assume to read the object on your behalf.
• In-application reference table name – Kinesis Data Analytics creates this in-application table and
populates it by reading the Amazon S3 object. This is the table name you specify in your application
code.
• Mapping schema – You describe the record format (JSON, CSV) and the encoding of the data stored in the Amazon S3 object. You also describe how each data element maps to columns in the in-application reference table.
The following shows the request body in the AddApplicationReferenceDataSource API request.
{
"applicationName": "string",
"CurrentapplicationVersionId": number,
"ReferenceDataSource": {
"ReferenceSchema": {
"RecordColumns": [
{
"IsDropped": boolean,
"Mapping": "string",
"Name": "string",
"SqlType": "string"
}
],
"RecordEncoding": "string",
"RecordFormat": {
"MappingParameters": {
"CSVMappingParameters": {
"RecordColumnDelimiter": "string",
"RecordRowDelimiter": "string"
},
"JSONMappingParameters": {
"RecordRowPath": "string"
}
},
"RecordFormatType": "string"
}
},
"S3ReferenceDataSource": {
"BucketARN": "string",
"FileKey": "string",
"ReferenceRoleARN": "string"
},
"TableName": "string"
}
}
Working with JSONPath
JSONPath is a standardized way to query elements of a JSON object. Amazon Kinesis Data Analytics uses JSONPath expressions in the application's source schema to identify data elements in a streaming source that contains JSON-format data.
For more information about how to map streaming data to your application's input stream, see the section called “Mapping Streaming Source Elements to SQL Input Columns” (p. 13).
The examples in this section are based on the following JSON record.

{
   "customerName":"John Doe",
   "address":
   {
      "streetAddress":
      {
         "number":"123",
         "street":"AnyStreet"
      },
      "city":"Anytown"
   },
   "orders":
[
{ "orderId":"23284", "itemName":"Widget", "itemPrice":"33.99" },
{ "orderId":"63122", "itemName":"Gadget", "itemPrice":"22.50" },
{ "orderId":"77284", "itemName":"Sprocket", "itemPrice":"12.00" }
]
}
You can query elements at the top level of the JSON record with the following syntax.

$.elementName
The following expression queries the customerName element in the preceding JSON example.
$.customerName
The preceding expression returns the following from the preceding JSON record.
John Doe
Note
Path expressions are case sensitive. The expression $.customername returns null from the
preceding JSON example.
Note
If no element appears at the location that the path expression specifies, the expression returns null. The following expression returns null from the preceding JSON example, because there is no matching element.
$.customerId
You can query child elements with the following syntax.

$.parentElement.element
The following expression queries the city element in the preceding JSON example.
$.address.city
The preceding expression returns the following from the preceding JSON record.
Anytown
You can query further levels of subelements using the following syntax.
$.parentElement.element.subElement
The following expression queries the street element in the preceding JSON example.
$.address.streetAddress.street
The preceding expression returns the following from the preceding JSON record.
AnyStreet
Accessing Arrays
You can access the data in a JSON array in the following ways:
To query the entire contents of an array as a single row, use the following syntax.
$.arrayObject[0:]
The following expression queries the entire contents of the orders element in the preceding JSON
example used in this section. It returns the array contents in a single column in a single row.
$.orders[0:]
The preceding expression returns the following from the example JSON record used in this section.
[{"orderId":"23284","itemName":"Widget","itemPrice":"33.99"},
{"orderId":"61322","itemName":"Gadget","itemPrice":"22.50"},
{"orderId":"77284","itemName":"Sprocket","itemPrice":"12.00"}]
To query the individual elements in an array as separate rows, use the following syntax.
$.arrayObject[0:].element
The following expression queries the orderId elements in the preceding JSON example, and returns
each array element as a separate row.
$.orders[0:].orderId
The preceding expression returns the following from the preceding JSON record, with each data item
returned as a separate row.
23284
63122
77284
Note
If expressions that query nonarray elements are included in a schema that queries individual
array elements, the nonarray elements are repeated for each element in the array. For example,
suppose that a schema for the preceding JSON example includes the following expressions:
• $.customerName
• $.orders[0:].orderId
In this case, the returned data rows from the sample input stream element resemble the following, with the name element repeated for every orderId element.

John Doe, 23284
John Doe, 63122
John Doe, 77284
Note
The following limitations apply to array expressions in Amazon Kinesis Data Analytics:
• Only one level of dereferencing is supported in an array expression. The following expression
format is not supported.
$.arrayObject[0:].element[0:].subElement
• Only one array can be flattened in a schema. Multiple arrays can be referenced—returned as
one row containing all of the elements in the array. However, only one array can have each of
its elements returned as individual rows.
A schema containing elements in the following format is valid. This format returns the
contents of the second array as a single column, repeated for every element in the first array.
$.arrayObjectOne[0:].element
$.arrayObjectTwo[0:]
A schema containing elements in the following format is not valid, because each of the two arrays would have to be flattened into separate rows.
$.arrayObjectOne[0:].element
$.arrayObjectTwo[0:].element
Other Considerations
Additional considerations for working with JSONPath are as follows:
• If no arrays are accessed by an individual element in the JSONPath expressions in the application
schema, then a single row is created in the application's input stream for each JSON record processed.
• When an array is flattened (that is, its elements are returned as individual rows), any missing elements
result in a null value being created in the in-application stream.
• An array is always flattened to at least one row. If no values would be returned (that is, the array is
empty or none of its elements are queried), a single row with all null values is returned.
The following expression returns records with null values from the preceding JSON example, because
there is no matching element at the specified path.
$.orders[0:].itemId
The preceding expression returns the following from the preceding JSON example record.
null
null
null
Related Topics
• Introducing JSON
Mapping Streaming Source Elements to SQL Input Columns
• To process and analyze streaming CSV data, you assign column names and data types for the columns
of the input stream. Your application imports one column from the input stream per column definition,
in order.
You don't have to include all of the columns in the application input stream, but you cannot skip
columns from the source stream. For example, you can import the first three columns from an input
stream containing five elements, but you cannot import only columns 1, 2, and 4.
• To process and analyze streaming JSON data, you use JSONPath expressions to map JSON elements
from a streaming source to SQL columns in an input stream. For more information about using
JSONPath with Amazon Kinesis Data Analytics, see Working with JSONPath (p. 9). The columns in
the SQL table have data types that are mapped from JSON types. For supported data types, see Data
Types. For details about converting JSON data to SQL data, see Mapping JSON Data Types to SQL Data
Types (p. 15).
For more information about how to configure input streams, see Configuring Application Input (p. 5).
• To map elements to columns using the console, see Working with the Schema Editor (p. 58).
• To map elements to columns using the Kinesis Data Analytics API, see the following section.
To map JSON elements to columns in the in-application input stream, you need a schema with the
following information for each column:
• Source Expression: The JSONPath expression that identifies the location of the data for the column.
• Column Name: The name that your SQL queries use to reference the data.
• Data Type: The SQL data type for the column.
"Mapping": "String",
"Name": "String",
"SqlType": "String"
}
The fields in the RecordColumn (p. 294) object have the following values:
• Mapping: The JSONPath expression that identifies the location of the data in the input stream record.
This value is not present for an input schema for a source stream in CSV format.
• Name: The column name in the in-application SQL data stream.
• SqlType: The data type of the data in the in-application SQL data stream.
"InputSchema": {
"RecordColumns": [
{
"SqlType": "VARCHAR(4)",
"Name": "TICKER_SYMBOL",
"Mapping": "$.TICKER_SYMBOL"
},
{
"SqlType": "VARCHAR(16)",
"Name": "SECTOR",
"Mapping": "$.SECTOR"
},
{
"SqlType": "TINYINT",
"Name": "CHANGE",
"Mapping": "$.CHANGE"
},
{
"SqlType": "DECIMAL(5,2)",
"Name": "PRICE",
"Mapping": "$.PRICE"
}
],
"RecordFormat": {
"MappingParameters": {
"JSONMappingParameters": {
"RecordRowPath": "$"
}
},
"RecordFormatType": "JSON"
},
"RecordEncoding": "UTF-8"
}
"InputSchema": {
"RecordColumns": [
{
"SqlType": "VARCHAR(16)",
"Name": "LastName"
},
{
"SqlType": "VARCHAR(16)",
"Name": "FirstName"
},
{
"SqlType": "INTEGER",
"Name": "CustomerId"
}
],
"RecordFormat": {
"MappingParameters": {
"CSVMappingParameters": {
"RecordColumnDelimiter": ",",
"RecordRowDelimiter": "\n"
}
},
"RecordFormatType": "CSV"
},
"RecordEncoding": "UTF-8"
}
Mapping JSON Data Types to SQL Data Types
Null Literal
A null literal in the JSON input stream (that is, "City":null) converts to a SQL null regardless of
destination data type.
Boolean Literal
A Boolean literal in the JSON input stream (that is, "Contacted":true) converts to SQL data as follows:
• Numeric (DECIMAL, INT, and so on): Conversion fails and a coercion error is written to the error stream.
• Binary (BINARY or VARBINARY): Conversion fails and a coercion error is written to the error stream.
• BOOLEAN: Converts directly.
• Character (CHAR or VARCHAR): Converts to the corresponding string value (true or false).
• Datetime (DATE, TIME, or TIMESTAMP): Conversion fails and a coercion error is written to the error stream.
Number
A number literal in the JSON input stream (that is, "CustomerId":67321) converts to SQL data as
follows:
• Numeric (DECIMAL, INT, and so on): Converts directly. If the converted value exceeds the size or
precision of the target data type (that is, converting 123.4 to INT), conversion fails and a coercion
error is written to the error stream.
• Binary (BINARY or VARBINARY): Conversion fails and a coercion error is written to the error stream.
• BOOLEAN:
• 0: Converts to false.
• All other numbers: Converts to true.
• Character (CHAR or VARCHAR): Converts to a string representation of the number.
• Datetime (DATE, TIME, or TIMESTAMP): Conversion fails and a coercion error is written to the error
stream.
String
A string value in the JSON input stream (that is, "CustomerName":"John Doe") converts to SQL data
as follows:
• Numeric (DECIMAL, INT, and so on): Amazon Kinesis Data Analytics attempts to convert the value to
the target data type. If the value cannot be converted, conversion fails and a coercion error is written
to the error stream.
• Binary (BINARY or VARBINARY): If the source string is a valid binary literal (that is, X'3F67A23A', with an even number of hexadecimal digits), the value is converted to the target data type. Otherwise, conversion fails and a coercion error is written to the error stream.
• BOOLEAN: If the source string is "true", converts to true. This comparison is case-insensitive.
Otherwise, converts to false.
• Character (CHAR or VARCHAR): Converts to the string value in the input. If the value is longer than the
target data type, it is truncated and no error is written to the error stream.
• Datetime (DATE, TIME, or TIMESTAMP): If the source string is in a format that can be converted to the
target value, the value is converted. Otherwise, conversion fails and a coercion error is written to the
error stream.
Array or Object
An array or object in the JSON input stream converts to SQL data as follows:
• Character (CHAR or VARCHAR): Converts to the source text of the array or object. See Accessing
Arrays (p. 10).
• All other data types: Conversion fails and a coercion error is written to the error stream.
For an example of a JSON array, see Working with JSONPath (p. 9).
Related Topics
• Configuring Application Input (p. 5)
• Data Types
• Working with the Schema Editor (p. 58)
• CreateApplication (p. 203)
Using the Schema Discovery Feature on Streaming Data
The console uses the Discovery API to generate a schema for a specified streaming source. Using the
console, you can also update the schema, including adding or removing columns, changing column
names or data types, and so on. However, make changes carefully to ensure that you do not create an
invalid schema.
After you finalize a schema for your in-application stream, there are functions you can use to manipulate
string and datetime values. You can use these functions in your application code when working with
rows in the resulting in-application stream. For more information, see Example: Transforming DateTime
Values (p. 98).
Kinesis Data Analytics renames a column in the inferred schema in the following cases:
• The source stream column name is a reserved SQL keyword, such as TIMESTAMP, USER, VALUES, or YEAR.
• The source stream column name contains unsupported characters. Only letters, numbers, and the
underscore character ( _ ) are supported.
• The source stream column name begins with a number.
• The source stream column name is longer than 100 characters.
If a column is renamed, the renamed schema column name begins with COL_. In some cases, none of the original column name can be retained (for example, if the entire name consists of unsupported characters). In such a case, the column is named COL_#, with # being a number indicating the column's place in the column order.
After discovery completes, you can update the schema using the console to add or remove columns, or
change column names, data types, or data size.
The following are examples of columns that are renamed during discovery:

Column name in data source      Column name in schema
USER                            COL_USER
USER@DOMAIN                     COL_USERDOMAIN
@@                              COL_0
Kinesis Data Analytics infers your schema for common formats, such as CSV and JSON, which are UTF-8
encoded. Kinesis Data Analytics supports any UTF-8 encoded records (including raw text like application
logs and records) with a custom column and row delimiter. If Kinesis Data Analytics doesn't infer a
schema, you can define a schema manually using the schema editor on the console (or using the API).
If your data does not follow a pattern (which you can specify using the schema editor), you can define
a schema as a single column of type VARCHAR(N), where N is the largest number of characters you
expect your record to include. From there, you can use string and date-time manipulation to structure
your data after it is in an in-application stream. For examples, see Example: Transforming DateTime
Values (p. 98).
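As a sketch of that approach (not from this guide), assume the input was defined as a single VARCHAR(1024) column named "RAW_LOG" and that each record begins with a 19-character timestamp followed by a space and a message:

CREATE OR REPLACE STREAM "PARSED_STREAM" (
    event_time VARCHAR(19),
    message    VARCHAR(1024));

CREATE OR REPLACE PUMP "PARSE_PUMP" AS
  INSERT INTO "PARSED_STREAM"
    -- Split each raw record into a timestamp prefix and the remaining message.
    SELECT STREAM
        SUBSTRING("RAW_LOG" FROM 1 FOR 19),
        SUBSTRING("RAW_LOG" FROM 21)
    FROM "SOURCE_SQL_STREAM_001";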
For more information on how to add reference data and discover schema in the console, see Example:
Adding Reference Data to a Kinesis Data Analytics Application (p. 118).
Using the Schema Discovery Feature on Static Data
To run schema discovery on reference data stored in an Amazon S3 object, you provide the following:
• BucketARN: The Amazon Resource Name (ARN) of the Amazon S3 bucket that contains the file. For the format of an Amazon S3 bucket ARN, see Amazon Resource Names (ARNs) and AWS Service Namespaces: Amazon Simple Storage Service (Amazon S3).
• RoleARN: The ARN of an IAM role with the AmazonS3ReadOnlyAccess policy. For information about
how to add a policy to a role, see Modifying a Role.
• FileKey: The file name of the object.
1. Make sure that you have the AWS CLI set up. For more information, see Step 2: Set Up the AWS
Command Line Interface (AWS CLI) (p. 46) in the Getting Started section.
2. Create a file named data.csv with the following contents:
year,month,state,producer_type,energy_source,units,consumption
2001,1,AK,TotalElectricPowerIndustry,Coal,ShortTons,47615
2001,1,AK,ElectricGeneratorsElectricUtilities,Coal,ShortTons,16535
2001,1,AK,CombinedHeatandPowerElectricPower,Coal,ShortTons,22890
2001,1,AL,TotalElectricPowerIndustry,Coal,ShortTons,3020601
2001,1,AL,ElectricGeneratorsElectricUtilities,Coal,ShortTons,2987681
3. Upload the data.csv file to an Amazon S3 bucket in your account, and then run schema discovery on the object (for example, by calling the DiscoverInputSchema action with the bucket ARN, the file key data.csv, and an IAM role that can read the object). The response resembles the following:

{
"InputSchema": {
"RecordEncoding": "UTF-8",
"RecordColumns": [
{
"SqlType": "INTEGER",
"Name": "COL_year"
},
{
"SqlType": "INTEGER",
"Name": "COL_month"
},
{
"SqlType": "VARCHAR(4)",
"Name": "state"
},
{
"SqlType": "VARCHAR(64)",
"Name": "producer_type"
},
{
"SqlType": "VARCHAR(4)",
"Name": "energy_source"
},
{
"SqlType": "VARCHAR(16)",
"Name": "units"
},
{
"SqlType": "INTEGER",
"Name": "consumption"
}
],
"RecordFormat": {
"RecordFormatType": "CSV",
"MappingParameters": {
"CSVMappingParameters": {
"RecordRowDelimiter": "\r\n",
"RecordColumnDelimiter": ","
}
}
}
},
"RawInputRecords": [
"year,month,state,producer_type,energy_source,units,consumption
\r\n2001,1,AK,TotalElectricPowerIndustry,Coal,ShortTons,47615\r
\n2001,1,AK,ElectricGeneratorsElectricUtilities,Coal,ShortTons,16535\r
\n2001,1,AK,CombinedHeatandPowerElectricPower,Coal,ShortTons,22890\r
\n2001,1,AL,TotalElectricPowerIndustry,Coal,ShortTons,3020601\r
\n2001,1,AL,ElectricGeneratorsElectricUtilities,Coal,ShortTons,2987681"
],
"ParsedInputRecords": [
[
null,
null,
"state",
"producer_type",
"energy_source",
"units",
null
],
[
"2001",
"1",
"AK",
"TotalElectricPowerIndustry",
"Coal",
"ShortTons",
"47615"
],
[
"2001",
"1",
"AK",
"ElectricGeneratorsElectricUtilities",
"Coal",
"ShortTons",
"16535"
],
[
"2001",
"1",
"AK",
"CombinedHeatandPowerElectricPower",
"Coal",
"ShortTons",
"22890"
],
[
"2001",
"1",
"AL",
"TotalElectricPowerIndustry",
"Coal",
"ShortTons",
"3020601"
],
[
"2001",
"1",
"AL",
"ElectricGeneratorsElectricUtilities",
"Coal",
"ShortTons",
"2987681"
]
]
}
Preprocessing Data Using a Lambda Function
Using a Lambda function for preprocessing records is useful in the following scenarios:
• Transforming records from other formats (such as KPL or GZIP) into formats that Kinesis Data Analytics
can analyze. Kinesis Data Analytics currently supports JSON or CSV data formats.
• Expanding data into a format that is more accessible for operations such as aggregation or anomaly
detection. For instance, if several data values are stored together in a string, you can expand the data
into separate columns.
• Data enrichment with other AWS services, such as extrapolation or error correction.
• Applying complex string transformation to record fields.
• Data filtering for cleaning up the data.
To add a preprocessing Lambda function to your application's input configuration in the console:
1. Sign in to the AWS Management Console and open the Kinesis Data Analytics console at https://console.aws.amazon.com/kinesisanalytics.
2. On the Connect to a Source page for your application, choose Enabled in the Record pre-
processing with AWS Lambda section.
3. To use a Lambda function that you have already created, choose the function in the Lambda
function drop-down list.
4. To create a new Lambda function from one of the Lambda preprocessing templates, choose the
template from the drop-down list. Then choose View <template name> in Lambda to edit the
function.
5. To create a new Lambda function, choose Create new. For information about creating a Lambda
function, see Create a HelloWorld Lambda Function and Explore the Console in the AWS Lambda
Developer Guide.
6. Choose the version of the Lambda function to use. To use the latest version, choose $LATEST.
When you choose or create a Lambda function for record preprocessing, the records are preprocessed
before your application SQL code executes or your application generates a schema from the records.
To use a Lambda function for record preprocessing, the IAM role associated with your application must include a permissions policy statement like the following:

{
"Sid": "UseLambdaFunction",
"Effect": "Allow",
"Action": [
"lambda:InvokeFunction",
"lambda:GetFunctionConfiguration"
],
"Resource": "<FunctionARN>"
}
For more information about adding permissions policies, see Authentication and Access Control for
Amazon Kinesis Data Analytics for SQL Applications (p. 175).
To get the necessary project code and instructions, see the Kinesis Producer Library Deaggregation
Modules for AWS Lambda on GitHub. You can use the components in this project to process KPL
serialized data within AWS Lambda in Java, Node.js, and Python. You can also use these components as
part of a multi-lang KCL application.
The input model to your preprocessing function varies slightly, depending on whether the data was
received from a Kinesis data stream or a Kinesis Data Firehose delivery stream.
If the source is a Kinesis Data Firehose delivery stream, the event contains a records array. Each record carries a recordId, the Base64-encoded data payload, and a kinesisFirehoseRecordMetadata object with the record's approximateArrivalTimestamp, as in the following example:
{
"invocationId":"00540a87-5050-496a-84e4-e7d92bbaf5e2",
"applicationArn":"arn:aws:kinesisanalytics:us-east-1:12345678911:application/lambda-
test",
"streamArn":"arn:aws:firehose:us-east-1:AAAAAAAAAAAA:deliverystream/lambda-test",
"records":[
{
"recordId":"49572672223665514422805246926656954630972486059535892482",
"data":"aGVsbG8gd29ybGQ=",
"kinesisFirehoseRecordMetadata":{
"approximateArrivalTimestamp":1520280173
}
}
]
}
If the source is a Kinesis data stream, the event contains a records array. Each record carries a recordId, the Base64-encoded data payload, and a kinesisStreamRecordMetadata object with the record's shardId, partitionKey, sequenceNumber, and approximateArrivalTimestamp, as in the following example:
{
"invocationId": "00540a87-5050-496a-84e4-e7d92bbaf5e2",
"applicationArn": "arn:aws:kinesisanalytics:us-east-1:12345678911:application/lambda-
test",
"streamArn": "arn:aws:kinesis:us-east-1:AAAAAAAAAAAA:stream/lambda-test",
"records": [
{
"recordId": "49572672223665514422805246926656954630972486059535892482",
"data": "aGVsbG8gd29ybGQ=",
"kinesisStreamRecordMetadata":{
"shardId" :"shardId-000000000003",
"partitionKey":"7400791606",
"sequenceNumber":"49572672223665514422805246926656954630972486059535892482",
"approximateArrivalTimestamp":1520280173
}
}
]
}
The records returned from the Lambda function to Kinesis Data Analytics must include the following fields:
• recordId – The record ID that was passed from Kinesis Data Analytics to the Lambda function during the invocation. The transformed record must include the same record ID.
• result – The status of the data transformation of the record. The possible values are Ok, Dropped, and ProcessingFailed.
• data – The transformed and Base64-encoded data payload.
{
"records": [
{
"recordId": "49572672223665514422805246926656954630972486059535892482",
"result": "Ok",
"data": "SEVMTE8gV09STEQ="
}
]
}
Preprocessing can fail for reasons such as the following:
• Not all records (with record IDs) in a batch that are sent to the Lambda function are returned to the Kinesis Data Analytics service.
• The response is missing either the record ID, status, or data payload field. The data payload field is optional for a Dropped or ProcessingFailed record.
• The Lambda function timeouts are not sufficient to preprocess the data.
• The Lambda function response exceeds the response limits imposed by the AWS Lambda service.
For data preprocessing failures, Kinesis Data Analytics continues to retry Lambda invocations on the
same set of records until successful. You can monitor the following CloudWatch metrics to gain insight
into failures.
• Kinesis Data Analytics application MillisBehindLatest: Indicates how far behind an application is
reading from the streaming source.
• Kinesis Data Analytics application InputPreprocessing CloudWatch metrics: Indicates the number
of successes and failures, among other statistics. For more information, see Amazon Kinesis Analytics
Metrics.
• AWS Lambda function CloudWatch metrics and logs.
Topics
• Creating a Preprocessing Lambda Function in Node.js (p. 26)
• Creating a Preprocessing Lambda Function in Python (p. 26)
• Creating a Preprocessing Lambda Function in Java (p. 27)
• Creating a Preprocessing Lambda Function in .NET (p. 27)
The console provides the following Lambda preprocessing templates:

Compressed Input Processing (Node.js 6.10) – A Kinesis Data Analytics record processor that receives compressed (GZIP or Deflate compressed) JSON or CSV records as input and returns decompressed records with a processing status.
KPL Input Processing (Python 2.7) – A Kinesis Data Analytics record processor that receives Kinesis Producer Library (KPL) aggregates of JSON or CSV records as input and returns disaggregated records with a processing status.
The following code demonstrates a sample Lambda function that preprocesses records using Java:
@Override
public KinesisAnalyticsInputPreprocessingResponse handleRequest(
        KinesisAnalyticsStreamsInputPreprocessingEvent event, Context context) {
    // Collect transformed records in a response object. This assumes the response
    // type exposes a public records list, matching the event classes used here.
    KinesisAnalyticsInputPreprocessingResponse response = new KinesisAnalyticsInputPreprocessingResponse();
    response.records = new ArrayList<>();
    context.getLogger().log("InvocationId is : " + event.invocationId);
context.getLogger().log("StreamArn is : " + event.streamArn);
context.getLogger().log("ApplicationArn is : " + event.applicationArn);
event.records.stream().forEach(record -> {
context.getLogger().log("recordId is : " + record.recordId);
context.getLogger().log("record aat is :" +
record.kinesisStreamRecordMetadata.approximateArrivalTimestamp);
// Add your record.data pre-processing logic here.
// response.records.add(new Record(record.recordId,
KinesisAnalyticsInputPreprocessingResult.Ok, <preprocessedrecordData>));
});
return response;
}
You can write a similar Lambda function to preprocess records using C#.
For more information about creating Lambda functions for preprocessing and destinations in .NET, see
Amazon.Lambda.KinesisAnalyticsEvents.
Parallelizing Input Streams for Increased Throughput
In almost all cases, Amazon Kinesis Data Analytics scales your application to handle the capacity of the
Kinesis streams or Kinesis Data Firehose source streams that feed into your application. However, if your
source stream's throughput exceeds the throughput of a single in-application input stream, you can
explicitly increase the number of in-application input streams that your application uses. You do so with
the InputParallelism parameter.
When the InputParallelism parameter is greater than one, Amazon Kinesis Data Analytics evenly
splits the partitions of your source stream among the in-application streams. For instance, if your source
stream has 50 shards, and you set InputParallelism to 2, each in-application input stream receives
the input from 25 source stream shards.
When you increase the number of in-application streams, your application must access the data in each
stream explicitly. For information about accessing multiple in-application streams in your code, see
Accessing Separate In-Application Streams in Your Amazon Kinesis Data Analytics Application (p. 30).
Although Kinesis Data Streams and Kinesis Data Firehose stream shards are both divided among in-
application streams in the same way, they differ in the way they appear to your application:
• The records from a Kinesis data stream include a shard_id field that can be used to identify the
source shard for the record.
• The records from a Kinesis Data Firehose delivery stream don't include a field that identifies the
record's source shard or partition. This is because Kinesis Data Firehose abstracts this information away
from your application.
If the InputBytes metric is greater than 100 MB/sec (or you anticipate that it will be), the load can cause an increase in MillisBehindLatest and increase the impact of application issues. To address this, we recommend making the following language choices for your application:
• Use multiple streams and Kinesis Data Analytics for SQL applications if your application has scaling
needs beyond 100 MB/second.
• Use Kinesis Data Analytics for Java Applications if you want to use a single stream and application.
If the MillisBehindLatest metric has either of the following characteristics, you should increase your
application's InputParallelism setting:
• The MillisBehindLatest metric is gradually increasing, indicating that your application is falling
behind the latest data in the stream.
• The MillisBehindLatest metric is consistently above 1000 (one second).
You don't need to increase your application's InputParallelism setting if the following are true:
• The MillisBehindLatest metric is gradually decreasing, indicating that your application is catching
up to the latest data in the stream.
• The MillisBehindLatest metric is below 1000 (one second).
For more information on using CloudWatch, see the CloudWatch User Guide.
You set the InputParallelism parameter when you create an application with the CreateApplication action, as in the following request body:

{
    "ApplicationCode": "<The SQL code the new application will run on the input stream>",
    "ApplicationDescription": "<A friendly description for the new application>",
    "ApplicationName": "<The name for the new application>",
    "Inputs": [
        {
            "InputId": "ID for the new input stream",
            "InputParallelism": {
                "Count": 2
            }
        }
    ],
    "Outputs": [ ... ]
}
You can also change the setting for an existing application with the UpdateApplication action, using an InputParallelismUpdate as in the following partial request body:

{
    "InputUpdates": [
        {
            "InputId": "yourInputId",
            "InputParallelismUpdate": {
                "CountUpdate": 2
            }
        }
    ]
}
In the following example, each source stream is first aggregated using COUNT before being combined
into a single in-application stream called in_application_stream001. Aggregating the source
streams beforehand helps make sure that the combined in-application stream can handle the traffic from
multiple streams without being overloaded.
Note
To run this example and get results from both in-application input streams, update both
the number of shards in your source stream and the InputParallelism parameter in your
application.
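The SQL for this example is not reproduced here, so the following is a sketch consistent with the description above. It assumes an InputParallelism count of 2, which creates the in-application input streams source_sql_stream_001 and source_sql_stream_002, each containing a ticker_symbol column; the one-minute interval is an assumption.

CREATE OR REPLACE STREAM in_application_stream001 (
    ticker_symbol        VARCHAR(64),
    ticker_symbol_count  INTEGER);

-- Aggregate the first in-application input stream over one-minute intervals.
CREATE OR REPLACE PUMP pump001 AS
  INSERT INTO in_application_stream001
    SELECT STREAM ticker_symbol, COUNT(ticker_symbol)
    FROM source_sql_stream_001
    GROUP BY STEP(source_sql_stream_001.ROWTIME BY INTERVAL '60' SECOND),
             ticker_symbol;

-- Aggregate the second in-application input stream into the same combined stream.
CREATE OR REPLACE PUMP pump002 AS
  INSERT INTO in_application_stream001
    SELECT STREAM ticker_symbol, COUNT(ticker_symbol)
    FROM source_sql_stream_002
    GROUP BY STEP(source_sql_stream_002.ROWTIME BY INTERVAL '60' SECOND),
             ticker_symbol;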
The preceding code produces rows in in_application_stream001 that contain each ticker symbol and its count for the interval.
Additional Considerations
When you use multiple in-application input streams, remember that each stream created from the same streaming source has the same mapping schema and is named with the same prefix (for example, prefix_001 and prefix_002).
Application Code
Application code is a series of SQL statements that process input and produce output. These SQL
statements operate on in-application streams and reference tables. For more information, see Amazon
Kinesis Data Analytics for SQL Applications: How It Works (p. 3).
For information about the SQL language elements that are supported by Kinesis Data Analytics, see
Amazon Kinesis Data Analytics SQL Reference.
In relational databases, you work with tables, using INSERT statements to add records and the SELECT
statement to query the data. In Amazon Kinesis Data Analytics, you work with streams. You can write
a SQL statement to query these streams. The results of querying one in-application stream are always
sent to another in-application stream. When performing complex analytics, you might create several
in-application streams to hold the results of intermediate analytics. And then finally, you configure
application output to persist results of the final analytics (from one or more in-application streams) to
external destinations. In summary, the following is a typical pattern for writing application code:
• The SELECT statement is always used in the context of an INSERT statement. That is, when you select
rows, you insert results into another in-application stream.
• The INSERT statement is always used in the context of a pump. That is, you use pumps to write to an
in-application stream.
The following example application code reads records from one in-application stream (SOURCE_SQL_STREAM_001) and writes them to another in-application stream (DESTINATION_SQL_STREAM). You insert records into in-application streams using pumps, as shown following:
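The original code listing is not reproduced here; the following is a minimal sketch consistent with that description (the column names are assumptions):

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    change        DOUBLE,
    price         DOUBLE);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
  INSERT INTO "DESTINATION_SQL_STREAM"
    -- Continuously copy selected columns from the input stream to the output stream.
    SELECT STREAM ticker_symbol, change, price
    FROM "SOURCE_SQL_STREAM_001";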
Note
The identifiers that you specify for stream names and column names follow standard SQL
conventions. For example, if you put quotation marks around an identifier, it makes the
identifier case sensitive. If you don't, the identifier defaults to uppercase. For more information
about identifiers, see Identifiers in the Amazon Kinesis Data Analytics SQL Reference.
Your application code can consist of many SQL statements. For example:
• You can write SQL queries in a sequential manner where the result of one SQL statement feeds into
the next SQL statement.
• You can also write SQL queries that run independently of each other. For example, you can write two SQL statements that query the same in-application stream, but send output into different in-application streams. You can then query the newly created in-application streams independently (a sketch follows this list).
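The following sketch (not from this guide; stream and column names are illustrative) shows two such independent queries that read the same in-application stream and write to different in-application streams:

CREATE OR REPLACE STREAM "STREAM_A" (ticker_symbol VARCHAR(4));
CREATE OR REPLACE STREAM "STREAM_B" (ticker_symbol VARCHAR(4), price DOUBLE);

-- Each pump runs independently against the same source stream.
CREATE OR REPLACE PUMP "PUMP_A" AS
  INSERT INTO "STREAM_A"
    SELECT STREAM ticker_symbol FROM "SOURCE_SQL_STREAM_001";

CREATE OR REPLACE PUMP "PUMP_B" AS
  INSERT INTO "STREAM_B"
    SELECT STREAM ticker_symbol, price FROM "SOURCE_SQL_STREAM_001";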
You can create in-application streams to save intermediate results. You insert data in in-application
streams using pumps. For more information, see In-Application Streams and Pumps (p. 68).
If you add an in-application reference table, you can write SQL to join data in in-application streams and
reference tables. For more information, see Example: Adding Reference Data to a Kinesis Data Analytics
Application (p. 118).
Amazon Kinesis Data Analytics writes data from specific in-application streams to the external destination according to the application's output configuration. Make sure that your application code writes to the in-application streams specified in the output configuration.
Output
There is a limit on the number of external destinations you can use to persist an application output. For
more information, see Limits (p. 165).
Note
We recommend that you use one external destination to persist in-application error stream data
so that you can investigate the errors.
For each in-application stream in your output configuration, you specify the following:
• In-application stream name – The stream that you want to persist to an external destination.
Kinesis Data Analytics looks for the in-application stream that you specified in the output
configuration. (The stream name is case sensitive and must match exactly.) Make sure that your
application code creates this in-application stream.
• External destination – You can persist data to a Kinesis data stream, a Kinesis Data Firehose delivery stream, or a Lambda function. You provide the Amazon Resource Name (ARN) of the stream or function. You also provide an IAM role that Kinesis Data Analytics can assume to write to the stream or function on your behalf. In addition, you describe the record format (JSON or CSV) for Kinesis Data Analytics to use when writing to the external destination.
If Kinesis Data Analytics can't write to the streaming or Lambda destination, the service continues to
try indefinitely. This creates back pressure, causing your application to fall behind. If this issue is not
resolved, your application eventually stops processing new data. You can monitor Kinesis Data Analytics
Metrics and set alarms for failures. For more information about metrics and alarms, see Using Amazon
CloudWatch Metrics and Creating Amazon CloudWatch Alarms.
You can configure the application output using the AWS Management Console. The console makes the API call to save the configuration. The following examples show the Outputs portion of the request body for a Kinesis data stream destination, a Kinesis Data Firehose delivery stream destination, and a Lambda function destination, respectively.
"Outputs": [
{
"DestinationSchema": {
"RecordFormatType": "string"
},
"KinesisStreamsOutput": {
"ResourceARN": "string",
"RoleARN": "string"
},
"Name": "string"
}
]
"Outputs": [
{
"DestinationSchema": {
"RecordFormatType": "string"
},
"KinesisFirehoseOutput": {
"ResourceARN": "string",
"RoleARN": "string"
},
"Name": "string"
}
]
"Outputs": [
{
"DestinationSchema": {
"RecordFormatType": "string"
},
"LambdaOutput": {
"ResourceARN": "string",
"RoleARN": "string"
},
"Name": "string"
}
]
Using a Lambda Function as Output
Using a Lambda function as a destination lets you perform post-processing of your SQL results, such as data encryption, before delivery. Lambda functions can then deliver the processed information to a variety of AWS services and other destinations.
For more information about creating Lambda applications, see Getting Started with AWS Lambda.
Topics
• Lambda as Output Permissions (p. 35)
• Lambda as Output Metrics (p. 35)
• Lambda as Output Event Input Data Model and Record Response Model (p. 35)
• Lambda Output Invocation Frequency (p. 37)
• Adding a Lambda Function for Use as an Output (p. 37)
• Common Lambda as Output Failures (p. 38)
• Creating Lambda Functions for Application Destinations (p. 38)
Lambda as Output Permissions
To use a Lambda function as output, the IAM role that you provide in the output configuration must have a permissions policy with a statement such as the following:
{
"Sid": "UseLambdaFunction",
"Effect": "Allow",
"Action": [
"lambda:InvokeFunction",
"lambda:GetFunctionConfiguration"
],
"Resource": "FunctionARN"
}
Lambda as Output Event Input Data Model and Record Response Model
The event that Kinesis Data Analytics sends to the Lambda as output function contains a records field. Each record carries lambdaDeliveryRecordMetadata, which includes a retryHint value.
Note
The retryHint is a value that increases for every delivery failure. This value is not durably
persisted, and resets if the application is disrupted.
The record response model that your Lambda function returns also contains a records field. Each returned record includes a result field that indicates the status of the delivery of the record.
If delivery of a record fails, Kinesis Data Analytics continuously retries sending the delivery-failed records to the Lambda as output function.
Lambda Output Invocation Frequency
How frequently the Lambda as output function is invoked depends on how records are emitted to the destination in-application stream:
• If records are emitted to the destination in-application stream within the data analytics application
as a tumbling window, the AWS Lambda destination function is invoked per tumbling window trigger.
For example, if a tumbling window of 60 seconds is used to emit the records to the destination in-
application stream, the Lambda function is invoked once every 60 seconds.
• If records are emitted to the destination in-application stream within the application as a continuous
query or a sliding window, the Lambda destination function is invoked about once per second.
Note
Per-Lambda function invoke request payload size limits apply. Exceeding those limits results in
output records being split and sent across multiple Lambda function calls.
Adding a Lambda Function for Use as an Output
Use the following procedure to add a Lambda function as an output for your application.
1. Sign in to the AWS Management Console and open the Kinesis Data Analytics console at https://
console.aws.amazon.com/kinesisanalytics.
2. Choose the application in the list, and then choose Application details.
3. In the Destination section, choose Connect new destination.
4. For the Destination item, choose AWS Lambda function.
5. In the Deliver records to AWS Lambda section, either choose an existing Lambda function and
version, or choose Create new.
6. If you are creating a new Lambda function, do the following:
a. Choose one of the templates provided. For more information, see Creating Lambda Functions for Application Destinations (p. 38).
b. The Create Function page opens in a new browser tab. In the Name box, give the function a
meaningful name (for example, myLambdaFunction).
c. Update the template with post-processing functionality for your application. For information
about creating a Lambda function, see Getting Started in the AWS Lambda Developer Guide.
d. On the Kinesis Data Analytics console, in the Lambda function list, choose the Lambda function
that you just created. Choose $LATEST for the Lambda function version.
7. In the In-application stream section, choose Choose an existing in-application stream. For In-
application stream name, choose your application's output stream. The results from the selected
output stream are sent to the Lambda output function.
8. Leave the rest of the form with the default values, and choose Save and continue.
Your application now sends records from the in-application stream to your Lambda function. You can see the results of the default template in the Amazon CloudWatch console. Monitor the AWS/KinesisAnalytics metrics in CloudWatch to verify that records are being delivered to your Lambda function.
Common Lambda as Output Failures
The following are common reasons why delivery to a Lambda as output function can fail:
• Not all records (with record IDs) in a batch that are sent to the Lambda function are returned to the
Kinesis Data Analytics service.
• The response is missing either the record ID or the status field.
• The Lambda function timeouts are not sufficient to accomplish the business logic within the Lambda
function.
• The business logic within the Lambda function does not catch all the errors, resulting in a timeout and
backpressure due to unhandled exceptions. These are often referred to as “poison pill” messages.
For data delivery failures, Kinesis Data Analytics continues to retry Lambda invocations on the same
set of records until successful. To gain insight into failures, you can monitor the following CloudWatch
metrics:
• Kinesis Data Analytics application Lambda as Output CloudWatch metrics: Indicates the number of
successes and failures, among other statistics. For more information, see Amazon Kinesis Analytics
Metrics.
• AWS Lambda function CloudWatch metrics and logs.
Creating Lambda Functions for Application Destinations
Topics
• Creating a Lambda Function Destination in Node.js (p. 38)
• Creating a Lambda Function Destination in Python (p. 38)
• Creating a Lambda Function Destination in Java (p. 39)
• Creating a Lambda Function Destination in .NET (p. 39)
Creating a Lambda Function Destination in Java
The following code demonstrates a sample destination Lambda function using Java:
@Override
public KinesisAnalyticsOutputDeliveryResponse handleRequest(KinesisAnalyticsOutputDeliveryEvent event,
        Context context) {
    context.getLogger().log("InvocationId is : " + event.invocationId);
    context.getLogger().log("ApplicationArn is : " + event.applicationArn);
    // The sample creates a KinesisAnalyticsOutputDeliveryResponse named response, with an empty
    // records list, before processing the incoming records.
    event.records.stream().forEach(record -> {
        context.getLogger().log("recordId is : " + record.recordId);
        context.getLogger().log("record retryHint is :" + record.lambdaDeliveryRecordMetadata.retryHint);
        // Add logic here to transform and send the record to the final destination of your choice.
        response.records.add(new Record(record.recordId,
                KinesisAnalyticsOutputDeliveryResponse.Result.Ok));
    });
    return response;
}
Creating a Lambda Function Destination in .NET
The following code demonstrates a sample destination Lambda function using C#:
context.Logger.LogLine($"ApplicationArn: {evnt.ApplicationArn}");
For more information about creating Lambda functions for pre-processing and destinations in .NET, see
Amazon.Lambda.KinesisAnalyticsEvents.
Application Output Delivery Model
In a normal situation, your application processes incoming data continuously. Kinesis Data Analytics writes the output to the configured destinations, such as a Kinesis data stream or a Kinesis Data Firehose delivery stream. However, your application can occasionally be interrupted, for example, during system maintenance or when a failure occurs.
When your application restarts, Kinesis Data Analytics ensures that it continues to process and write
output from a point before or equal to when the failure occurred. This helps ensure that it doesn't miss
delivering any application output to the configured destinations.
Suppose that you configured multiple destinations from the same in-application stream. After the
application recovers from failure, Kinesis Data Analytics resumes persisting output to the configured
destinations from the last record that was delivered to the slowest destination. This might result in
the same output record delivered more than once to other destinations. In this case, you must handle
potential duplications in the destination externally.
Error Handling
Amazon Kinesis Data Analytics returns API or SQL errors directly to you. For more information about API
operations, see Actions (p. 188). For more information about handling SQL errors, see Amazon Kinesis
Data Analytics SQL Reference.
Amazon Kinesis Data Analytics reports runtime errors using an in-application error stream called
error_stream.
The following are examples of situations that can result in runtime errors:
• A record read from the streaming source does not conform to the input schema.
• Your application code specifies division by zero.
• The rows are out of order (for example, a user modified a record's ROWTIME value so that the record appears out of order on the stream).
• The data in the source stream can't be converted to the data type specified in the schema (Coercion
error). For information about what data types can be converted, see Mapping JSON Data Types to SQL
Data Types (p. 15).
We recommend that you handle these errors programmatically in your SQL code or persist the data
on the error stream to an external destination. This requires that you add an output configuration (see
Configuring Application Output (p. 33)) to your application. For an example of how the in-application
error stream works, see Example: Exploring the In-Application Error Stream (p. 143).
Note
Your Kinesis data analytics application can't access or modify the error stream programmatically
because the error stream is created using the system account. You must use the error output to
determine what errors your application might encounter. You then write your application's SQL
code to handle anticipated error conditions.
The error stream includes, among other columns, the following:
ERROR_LEVEL VARCHAR(10)
ERROR_NAME VARCHAR(32)
MESSAGE VARCHAR(4096)
Auto Scaling Applications
Kinesis Data Analytics elastically scales your application by provisioning capacity in the form of Kinesis Processing Units (KPUs). The default limit for KPUs for your application is eight. For instructions on how to request an increase to this limit, see To request a limit increase in AWS Service Limits.
Using Tagging
This section describes how to add key-value metadata tags to Kinesis Data Analytics applications. These
tags can be used for the following purposes:
• Determining billing for individual Kinesis Data Analytics applications. For more information, see Using
Cost Allocation Tags in the AWS Billing and Cost Management Guide.
• Controlling access to application resources based on tags. For more information, see Controlling Access
Using Tags in the AWS Identity and Access Management User Guide.
• User-defined purposes. You can define application functionality based on the presence of user tags.
Note the following limits on application tags:
• The maximum number of application tags includes system tags. The maximum number of user-defined
application tags is 50.
• If an action includes a tag list that has duplicate Key values, the service throws an
InvalidArgumentException.
Adding Tags when an Application is Created
The following example request shows the Tags node for a CreateApplication request:
"Tags": [
{
"Key": "Key1",
"Value": "Value1"
},
{
"Key": "Key2",
"Value": "Value2"
}
]
Adding or Updating Tags for an Existing Application
To update an existing tag, add a tag with the same key as the existing tag.
The following example request for the TagResource action adds new tags or updates existing tags:
{
"ResourceARN": "string",
"Tags": [
{
"Key": "NewTagKey",
"Value": "NewTagValue"
},
{
"Key": "ExistingKeyOfTagToUpdate",
"Value": "NewValueForExistingTag"
}
]
}
The following example request for the ListTagsForResource action lists tags for an application:
{
"ResourceARN": "arn:aws:kinesisanalytics:us-west-2:012345678901:application/
MyApplication"
}
Removing Tags from an Application
The following example request for the UntagResource action removes tags from an application:
{
"ResourceARN": "arn:aws:kinesisanalytics:us-west-2:012345678901:application/
MyApplication",
"TagKeys": [ "KeyOfFirstTagToRemove", "KeyOfSecondTagToRemove" ]
Getting Started with Amazon Kinesis Data Analytics for SQL Applications
Topics
• Step 1: Set Up an AWS Account and Create an Administrator User (p. 45)
• Step 2: Set Up the AWS Command Line Interface (AWS CLI) (p. 46)
• Step 3: Create Your Starter Amazon Kinesis Data Analytics Application (p. 47)
• Step 4 (Optional) Edit the Schema and SQL Code Using the Console (p. 58)
Step 1: Set Up an AWS Account and Create an Administrator User
With Kinesis Data Analytics, you pay only for the resources you use. If you are a new AWS customer, you
can get started with Kinesis Data Analytics for free. For more information, see AWS Free Usage Tier.
If you already have an AWS account, skip to the next task. If you don't have an AWS account, perform the
steps in the following procedure to create one.
1. Open https://portal.aws.amazon.com/billing/signup.
2. Follow the online instructions.
Part of the sign-up procedure involves receiving a phone call and entering a verification code on the
phone keypad.
Note your AWS account ID because you'll need it for the next task.
Create an IAM User
If you signed up for AWS, but you haven't created an IAM user for yourself, you can create one using the
IAM console.
The Getting Started exercises in this guide assume that you have a user (adminuser) with administrator
privileges. Follow the procedure to create adminuser in your account.
1. Create an administrator user called adminuser in your AWS account. For instructions, see Creating
Your First IAM User and Administrators Group in the IAM User Guide.
2. A user can sign in to the AWS Management Console using a special URL. For more information, see How Users Sign In to Your Account in the IAM User Guide.
Next Step
Step 2: Set Up the AWS Command Line Interface (AWS CLI) (p. 46)
Step 2: Set Up the AWS Command Line Interface (AWS CLI)
1. Download and configure the AWS CLI. For instructions, see the following topics in the AWS
Command Line Interface User Guide:
2. Add a named profile for the administrator user in the AWS CLI config file. You use this profile when executing the AWS CLI commands.
[profile adminuser]
aws_access_key_id = adminuser access key ID
aws_secret_access_key = adminuser secret access key
region = aws-region
For a list of available AWS Regions, see Regions and Endpoints in the Amazon Web Services General
Reference.
3. Verify the setup by entering the following help command at the command prompt:
aws help
Next Step
Step 3: Create Your Starter Amazon Kinesis Data Analytics Application (p. 47)
Step 3: Create Your Starter Amazon Kinesis Data Analytics Application
For this Getting Started exercise, you can use the console to work with either the demo stream or
templates with application code.
• If you choose to use the demo stream, the console creates a Kinesis data stream in your account that is
called kinesis-analytics-demo-stream.
A Kinesis data analytics application requires a streaming source. For this source, several SQL examples
in this guide use the demo stream kinesis-analytics-demo-stream. The console also runs a
script that continuously adds sample data (simulated stock trade records) to this stream, as shown
following.
You can use kinesis-analytics-demo-stream as the streaming source for your application in this
exercise.
Note
The demo stream remains in your account. You can use it to test other examples in this guide.
However, when you leave the console, the script that the console uses stops populating the
data. When needed, the console provides the option to start populating the stream again.
• If you choose to use the templates with example application code, you use template code that the
console provides to perform simple analytics on the demo stream.
You use these features to quickly set up your first application as follows:
1. Create an application – You only need to provide a name. The console creates the application and the
service sets the application state to READY.
2. Configure input – First, you add a streaming source, the demo stream. You must create a demo stream
in the console before you can use it. Then, the console takes a random sample of records on the demo
stream and infers a schema for the in-application input stream that is created. The console names the
in-application stream SOURCE_SQL_STREAM_001.
The console uses the discovery API to infer the schema. If necessary, you can edit the inferred schema.
For more information, see DiscoverInputSchema (p. 223). Kinesis Data Analytics uses this schema to
create an in-application stream.
When you start the application, Kinesis Data Analytics reads the demo stream continuously on your
behalf and inserts rows in the SOURCE_SQL_STREAM_001 in-application input stream.
3. Specify application code – You use a template (called Continuous filter) that provides the following
code:
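The console supplies the actual template text. The following is a minimal sketch of a continuous filter, assuming the demo stream columns ticker_symbol, sector, change, and price and an illustrative filter on the sector value; the template the console provides can differ.
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (ticker_symbol VARCHAR(4), sector VARCHAR(16), change REAL, price REAL);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
   INSERT INTO "DESTINATION_SQL_STREAM"
      -- The WHERE clause filters the rows before they are written to the destination stream.
      SELECT STREAM ticker_symbol, sector, change, price
      FROM "SOURCE_SQL_STREAM_001"
      WHERE sector SIMILAR TO '%TECH%';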
The application code queries the in-application stream SOURCE_SQL_STREAM_001. The code then
inserts the resulting rows in another in-application stream DESTINATION_SQL_STREAM, using pumps.
For more information about this coding pattern, see Application Code (p. 31).
For information about the SQL language elements that are supported by Kinesis Data Analytics, see
Amazon Kinesis Data Analytics SQL Reference.
4. Configure output – In this exercise, you don't configure any output. That is, you don't persist data in
the in-application stream that your application creates to any external destination. Instead, you verify
query results in the console. Additional examples in this guide show how to configure output. For one
example, see Example: Creating Simple Alerts (p. 141).
Important
The exercise uses the US East (N. Virginia) Region (us-east-1) to set up the application. You can
use any of the supported AWS Regions.
Step 3.1: Create an Application
1. Sign in to the AWS Management Console and open the Kinesis Data Analytics console at https://
console.aws.amazon.com/kinesisanalytics.
2. Choose Create application.
3. On the Create application page, type an application name, type a description, choose SQL for the
application's Runtime setting, and then choose Create application.
Doing this creates a Kinesis data analytics application with a status of READY. The console shows the
application hub where you can configure input and output.
Note
To create an application, the CreateApplication (p. 203) operation requires only the
application name. You can add input and output configuration after you create an
application in the console.
In the next step, you configure input for the application. In the input configuration, you add a
streaming data source to the application and discover a schema for an in-application input stream
by sampling data on the streaming source.
Step 3.2: Configure Input
1. On the application hub page in the console, choose Connect streaming data.
• Source section, where you specify a streaming source for your application. You can select an
existing stream source or create one. In this exercise, you create a new stream, the demo stream.
By default the console names the in-application input stream that is created as
INPUT_SQL_STREAM_001. For this exercise, keep this name as it appears.
• Stream reference name – This option shows the name of the in-application input stream that is
created, SOURCE_SQL_STREAM_001. You can change the name, but for this exercise, keep this
name.
In the input configuration, you map the demo stream to an in-application input stream that is
created. When you start the application, Amazon Kinesis Data Analytics continuously reads the
demo stream and inserts rows in the in-application input stream. You query this in-application
input stream in your application code.
• Record pre-processing with AWS Lambda: This option is where you specify an AWS Lambda function that modifies the records in the input stream before your application code executes. In this exercise, leave the Disabled option selected. For more information about Lambda preprocessing, see Preprocessing Data Using a Lambda Function (p. 21).
After you provide all the information on this page, the console sends an update request (see UpdateApplication (p. 239)) to add the input configuration to the application.
3. On the Source page, choose Configure a new stream.
4. Choose Create demo stream. The console configures the application input by doing the following:
• The Raw stream sample tab shows the raw stream records sampled by the
DiscoverInputSchema (p. 223) API action to infer the schema.
• The Formatted stream sample tab shows the tabular version of the data in the Raw stream
sample tab.
• If you choose Edit schema, you can edit the inferred schema. For this exercise, don't change the
inferred schema. For more information about editing a schema, see Working with the Schema
Editor (p. 58).
If you choose Rediscover schema, you can request the console to run
DiscoverInputSchema (p. 223) again and infer the schema.
5. Choose Save and continue.
You now have an application with input configuration added to it. In the next step, you add SQL code to perform analytics on the data in the in-application input stream.
Next Step
Step 3.3: Add Real-Time Analytics (Add Application Code) (p. 53)
Step 3.3: Add Real-Time Analytics (Add Application Code)
1. On the application hub page, choose Go to SQL editor.
2. In the Would you like to start running "ExampleApp"? dialog box, choose Yes, start application.
The console sends a request to start the application (see StartApplication (p. 231)), and then the
SQL editor page appears.
3. The console opens the SQL editor page. Review the page, including the buttons (Add SQL from
templates, Save and run SQL) and various tabs.
4. In the SQL editor, choose Add SQL from templates.
5. From the available template list, choose Continuous filter. The sample code reads data from one in-application stream (the WHERE clause filters the rows) and inserts it into another in-application stream.
Remember, you already started the application (status is RUNNING). Therefore, Amazon Kinesis
Data Analytics is already continuously reading from the streaming source and adding rows to the in-
application stream SOURCE_SQL_STREAM_001.
a. In the SQL Editor, choose Save and run SQL. The console first sends an update request to save the application code. Then, the code runs continuously.
b. You can see the results in the Real-time analytics tab.
• The Source data tab shows an in-application input stream that is mapped to the streaming
source. Choose the in-application stream, and you can see data coming in. Note the additional
columns in the in-application input stream that weren't specified in the input configuration.
These include the following timestamp columns:
• ROWTIME – Each row in an in-application stream has a special column called ROWTIME.
This column is the timestamp when Amazon Kinesis Data Analytics inserted the row in the
first in-application stream (the in-application input stream that is mapped to the streaming
source).
• Approximate_Arrival_Time – Each Kinesis Data Analytics record includes a value called
Approximate_Arrival_Time. This value is the approximate arrival timestamp that is
set when the streaming source successfully receives and stores the record. When Kinesis
Data Analytics reads records from a streaming source, it fetches this column into the in-
application input stream.
These timestamp values are useful in windowed queries that are time-based. For more
information, see Windowed Queries (p. 72).
• The Real-time analytics tab shows all the other in-application streams created by your
application code. It also includes the error stream. Kinesis Data Analytics sends any rows it
cannot process to the error stream. For more information, see Error Handling (p. 41).
Choose DESTINATION_SQL_STREAM to view the rows your application code inserted. Note
the additional columns that your application code didn't create. These columns include the
ROWTIME timestamp column. Kinesis Data Analytics simply copies these values from the
source (SOURCE_SQL_STREAM_001).
• The Destination tab shows the external destination where Kinesis Data Analytics writes the
query results. You haven't configured any external destination for your application output yet.
Step 3.4: (Optional) Update the Application Code
1. In the SQL editor, append the following code to the existing application code:
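For example, you might append a sketch like the following, which creates one more in-application stream fed from DESTINATION_SQL_STREAM; the new stream name and columns are illustrative.
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM_2" (ticker_symbol VARCHAR(4), change REAL, price REAL);

CREATE OR REPLACE PUMP "STREAM_PUMP_2" AS
   INSERT INTO "DESTINATION_SQL_STREAM_2"
      -- Read the rows that the first query produced and copy them to the new stream.
      SELECT STREAM ticker_symbol, change, price
      FROM "DESTINATION_SQL_STREAM";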
Save and run the code. Additional in-application streams appear on the Real-time analytics tab.
2. Create two in-application streams. Filter rows in the SOURCE_SQL_STREAM_001 based on the stock ticker, and then insert them into these separate streams, as in the following sketch.
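A minimal sketch follows; the ticker values used in the filters (AMZN and TGT) and the stream names are illustrative.
CREATE OR REPLACE STREAM "AMZN_STREAM" (ticker_symbol VARCHAR(4), change REAL, price REAL);

CREATE OR REPLACE PUMP "AMZN_PUMP" AS
   INSERT INTO "AMZN_STREAM"
      SELECT STREAM ticker_symbol, change, price
      FROM "SOURCE_SQL_STREAM_001"
      WHERE ticker_symbol SIMILAR TO '%AMZN%';

CREATE OR REPLACE STREAM "TGT_STREAM" (ticker_symbol VARCHAR(4), change REAL, price REAL);

CREATE OR REPLACE PUMP "TGT_PUMP" AS
   INSERT INTO "TGT_STREAM"
      SELECT STREAM ticker_symbol, change, price
      FROM "SOURCE_SQL_STREAM_001"
      WHERE ticker_symbol SIMILAR TO '%TGT%';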
Save and run the code. Notice additional in-application streams on the Real-time analytics tab.
You now have your first working Amazon Kinesis Data Analytics application. In this exercise, you did the
following:
• Configured application input that identified the demo stream as the streaming source and mapped
it to an in-application stream (SOURCE_SQL_STREAM_001) that is created. Kinesis Data Analytics
continuously reads the demo stream and inserts records in the in-application stream.
• Your application code queried the SOURCE_SQL_STREAM_001 and wrote output to another in-
application stream called DESTINATION_SQL_STREAM.
Now you can optionally configure application output to write the application output to an
external destination. That is, you can configure the application output to write records in the
DESTINATION_SQL_STREAM to an external destination. For this exercise, this is an optional step. To
learn how to configure the destination, go to the next step.
Next Step
Step 4 (Optional) Edit the Schema and SQL Code Using the Console (p. 58).
Step 4 (Optional) Edit the Schema and SQL Code Using the Console
Topics
• Working with the Schema Editor (p. 58)
• Working with the SQL Editor (p. 65)
Working with the Schema Editor
The schema contains selection criteria for determining what part of the streaming input is transformed
into a data column in the in-application input stream. This input can be one of the following:
• A JSONPath expression for JSON input streams. JSONPath is a tool for querying JSON data.
• A column number for input streams in comma-separated values (CSV) format.
• A column name and a SQL data type for presenting the data in the in-application data stream. The
data type also contains a length for character or binary data.
The console attempts to generate the schema using DiscoverInputSchema (p. 223). If schema discovery
fails or returns an incorrect or incomplete schema, you must edit the schema manually by using the
schema editor.
• Add a column (1): You might need to add a data column if a data item is not detected automatically.
• Delete a column (2): You can exclude data from the source stream if your application doesn't require
it. This exclusion doesn't affect the data in the source stream. If data is excluded, that data simply isn't
made available to the application.
• Rename a column (3). A column name can't be blank, must be longer than a single character, and
must not contain reserved SQL keywords. The name must also meet naming criteria for SQL ordinary
identifiers: The name must start with a letter and contain only letters, underscore characters, and
digits.
• Change the data type (4) or length (5) of a column: You can specify a compatible data type for a
column. If you specify an incompatible data type, the column is either populated with NULL or the in-
application stream is not populated at all. In the latter case, errors are written to the error stream. If
you specify a length for a column that is too small, the incoming data is truncated.
• Change the selection criteria of a column (6): You can edit the JSONPath expression or CSV column
order used to determine the source of the data in a column. To change the selection criteria for a
JSON schema, enter a new value for the row path expression. A CSV schema uses the column order as
selection criteria. To change the selection criteria for a CSV schema, change the order of the columns.
3. For Format, choose JSON or CSV. For JSON or CSV format, the supported encoding is ISO 8859-1.
For further information on editing the schema for JSON or CSV format, see the procedures in the next
sections.
A new column appears in the first column position. To change the column order, choose the up and
down arrows next to the column name.
A column name cannot be blank, must be longer than a single character, and must not contain
reserved SQL keywords. It must also meet naming criteria for SQL ordinary identifiers: It must
start with a letter and contain only letters, underscore characters, and digits.
• For Column type, type an SQL data type.
A column type can be any supported SQL data type. If the new data type is CHAR, VARBINARY, or
VARCHAR, specify a data length for Length. For more information, see Data Types.
• For Row path, provide a row path. A row path is a valid JSONPath expression that maps to a JSON
element.
Note
The base Row path value is the path to the top-level parent that contains the data to
be imported. This value is $ by default. For more information, see RecordRowPath in
JSONMappingParameters.
2. To delete a column, choose the x icon next to the column number.
3. To rename a column, enter a new name for Column name. The new column name cannot be blank,
must be longer than a single character, and must not contain reserved SQL keywords. It must also
meet naming criteria for SQL ordinary identifiers: It must start with a letter and contain only letters,
underscore characters, and digits.
4. To change the data type of a column, choose a new data type for Column type. If the new data type
is CHAR, VARBINARY, or VARCHAR, specify a data length for Length. For more information, see Data
Types.
5. Choose Save schema and update stream to save your changes.
The modified schema appears in the editor and looks similar to the following.
If your schema has many rows, you can filter the rows using Filter by column name. For example, to edit
column names that start with P, such as a Price column, enter P in the Filter by column name box.
1. In the schema editor, for Row delimiter, choose the delimiter used by your incoming data stream.
This is the delimiter between records of data in your stream, such as a newline character.
2. For Column delimiter, choose the delimiter used by your incoming data stream. This is the delimiter
between fields of data in your stream, such as a comma.
3. To add a column, choose Add column.
A new column appears in the first column position. To change the column order, choose the up and
down arrows next to the column name.
A column name cannot be blank, must be longer than a single character, and must not contain
reserved SQL keywords. It must also meet naming criteria for SQL ordinary identifiers: It must
start with a letter and contain only letters, underscore characters, and digits.
• For Column type, enter a SQL data type.
A column type can be any supported SQL data type. If the new data type is CHAR, VARBINARY, or
VARCHAR, specify a data length for Length. For more information, see Data Types.
5. To rename a column, enter a new name in Column name. The new column name cannot be blank,
must be longer than a single character, and must not contain reserved SQL keywords. It must also
meet naming criteria for SQL ordinary identifiers: It must start with a letter and contain only letters,
underscore characters, and digits.
6. To change the data type of a column, choose a new data type for Column type. If the new data type
is CHAR, VARBINARY, or VARCHAR, specify a data length for Length. For more information, see Data
Types.
7. Choose Save schema and update stream to save your changes.
The modified schema appears in the editor and looks similar to the following.
If your schema has many rows, you can filter the rows using Filter by column name. For example, to edit
column names that start with P, such as a Price column, enter P in the Filter by column name box.
Working with the SQL Editor
Amazon Kinesis Data Analytics provides the following timestamp columns, so that you don't need to
provide explicit mapping in your input configuration:
• ROWTIME – Each row in an in-application stream has a special column called ROWTIME. This column
is the timestamp for the point when Kinesis Data Analytics inserted the row in the first in-application
stream.
• Approximate_Arrival_Time – Records on your streaming source include the
Approximate_Arrival_Timestamp column. It is the approximate arrival timestamp that is set when
the streaming source successfully receives and stores the related record. Kinesis Data Analytics fetches
this column into the in-application input stream as Approximate_Arrival_Time. Amazon Kinesis
Data Analytics provides this column only in the in-application input stream that is mapped to the
streaming source.
These timestamp values are useful in windowed queries that are time-based. For more information, see
Windowed Queries (p. 72).
Destination Tab
The Destination tab enables you to configure the application output to persist in-application streams
to external destinations. You can configure output to persist data in any of the in-application streams to
external destinations. For more information, see Configuring Application Output (p. 33).
Streaming SQL Concepts
Topics
• In-Application Streams and Pumps (p. 68)
• Timestamps and the ROWTIME Column (p. 69)
• Continuous Queries (p. 71)
• Windowed Queries (p. 72)
• Streaming Data Operations: Stream Joins (p. 82)
In-Application Streams and Pumps
You can also create more in-application streams as needed to store intermediate query results. Creating
an in-application stream is a two-step process. First, you create an in-application stream, and then
you pump data into it. For example, suppose that the input configuration of your application creates
an in-application stream named INPUTSTREAM. In the following example, you create another stream
(TEMPSTREAM), and then you pump data from INPUTSTREAM into it.
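1. Create the in-application stream. The column names and types in the following sketch are illustrative:
CREATE OR REPLACE STREAM "TEMPSTREAM" (
   "column1" BIGINT NOT NULL,
   "column2" INTEGER,
   "column3" VARCHAR(64));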
The column names are specified in quotes, making them case sensitive. For more information, see
Identifiers in the Amazon Kinesis Data Analytics SQL Reference.
2. Insert data into the stream using a pump. A pump is a continuously running insert query that inserts data from one in-application stream into another in-application stream. The following statement creates a pump (SAMPLEPUMP) and inserts data into the TEMPSTREAM by selecting records from another stream (INPUTSTREAM).
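A sketch of such a pump follows, assuming INPUTSTREAM carries three columns that map onto the TEMPSTREAM columns; the input column names are illustrative.
CREATE OR REPLACE PUMP "SAMPLEPUMP" AS
   INSERT INTO "TEMPSTREAM" ("column1", "column2", "column3")
      -- Continuously copy rows from INPUTSTREAM into TEMPSTREAM.
      SELECT STREAM inputcolumn1, inputcolumn2, inputcolumn3
      FROM "INPUTSTREAM";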
You can have multiple writers insert into an in-application stream, and there can be multiple readers that select from the stream. Think of an in-application stream as implementing a publish/subscribe
messaging paradigm. In this paradigm, the data row, including the time of creation and time of receipt,
can be processed, interpreted, and forwarded by a cascade of streaming SQL statements, without having
to be stored in a traditional RDBMS.
After an in-application stream is created, you can perform normal SQL queries.
Note
When you query streams, most SQL statements are bound using a row-based or time-based
window. For more information, see Windowed Queries (p. 72).
You can also join streams. For examples of joining streams, see Streaming Data Operations: Stream
Joins (p. 82).
Timestamps and the ROWTIME Column
Amazon Kinesis Data Analytics guarantees that the ROWTIME values are monotonically increasing.
You use this timestamp in time-based windowed queries. For more information, see Windowed
Queries (p. 72).
You can access the ROWTIME column in your SELECT statement like any other columns in your in-
application stream. For example:
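A minimal sketch follows, assuming the demo stream columns ticker_symbol and price; the destination stream and pump names are illustrative.
CREATE OR REPLACE STREAM "TICKER_WITH_TIME" (arrival_time TIMESTAMP, ticker_symbol VARCHAR(4), price REAL);

CREATE OR REPLACE PUMP "TIME_PUMP" AS
   INSERT INTO "TICKER_WITH_TIME"
      -- ROWTIME can be selected just like any other column.
      SELECT STREAM ROWTIME, ticker_symbol, price
      FROM "SOURCE_SQL_STREAM_001";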
Understanding Various Times in Streaming Analytics
In streaming analytics, the following types of time are relevant:
• Event time – The timestamp when the event occurred. This is also sometimes called the client-side
time. It is often desirable to use this time in analytics because it is the time when an event occurred.
However, many event sources, such as mobile phones and web clients, do not have reliable clocks,
which can lead to inaccurate times. In addition, connectivity issues can lead to records appearing on a
stream not in the same order the events occurred.
• Ingest time – The timestamp of when the record was added to the streaming source. Amazon Kinesis Data Streams includes a field called APPROXIMATE_ARRIVAL_TIME in every record that provides this timestamp. This is also sometimes referred to as the server-side time. This ingest time is often a close approximation of event time. If there is any kind of delay in ingesting records into the stream, this can lead to inaccuracies, which are typically rare. Also, the ingest time is rarely out of order, but it can occur due to the distributed nature of streaming data. Therefore, ingest time is a mostly accurate and in-order reflection of the event time.
• Processing time – The timestamp when Amazon Kinesis Data Analytics inserts a row in the first in-
application stream. Amazon Kinesis Data Analytics provides this timestamp in the ROWTIME column
that exists in each in-application stream. The processing time is always monotonically increasing. But it
will not be accurate if your application falls behind. (If an application falls behind, the processing time
does not accurately reflect the event time.) This ROWTIME is accurate in relation to the wall clock, but it
might not be the time when the event actually occurred.
Using each of these times in windowed queries that are time-based has advantages and disadvantages.
We recommend that you choose one or more of these times, and a strategy to deal with the relevant
disadvantages based on your use case scenario.
Note
If you are using row-based windows, time is not an issue and you can ignore this section.
We recommend a two-window strategy that uses two time-based windows: ROWTIME and one of the other times (ingest or event time).
• Use ROWTIME as the first window, which controls how frequently the query emits the results, as shown
in the following example. It is not used as a logical time.
• Use one of the other times that is the logical time that you want to associate with your analytics. This
time represents when the event occurred. In the following example, the analytics goal is to group the
records and return count by ticker.
The advantage of this strategy is that it can use a time that represents when the event occurred. It
can gracefully handle when your application falls behind or when events arrive out of order. If the
application falls behind when bringing records into the in-application stream, they are still grouped by
the logical time in the second window. The query uses ROWTIME to guarantee the order of processing.
Any records that are late (the ingest timestamp shows an earlier value compared to the ROWTIME value)
are also processed successfully.
Consider the following query against the demo stream used in the Getting Started Exercise. The query
uses the GROUP BY clause and emits a ticker count in a one-minute tumbling window.
"TICKER_SYMBOL",
COUNT(*) AS "symbol_count"
FROM "SOURCE_SQL_STREAM_001"
GROUP BY "TICKER_SYMBOL",
STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND),
STEP("SOURCE_SQL_STREAM_001".APPROXIMATE_ARRIVAL_TIME BY INTERVAL '60' SECOND);
In GROUP BY, you first group the records based on ROWTIME in a one-minute window and then by
APPROXIMATE_ARRIVAL_TIME.
The timestamp values in the result are rounded down to the nearest 60-second interval. The first group
result emitted by the query shows records in the first minute. The second group of results emitted shows
records in the next minutes based on ROWTIME. The last record indicates that the application was late in
bringing the record in the in-application stream (it shows a late ROWTIME value compared to the ingest
timestamp).
You can combine the results for a final accurate count per minute by pushing the results to a
downstream database. For example, you can configure the application output to persist the results to a
Kinesis Data Firehose delivery stream that can write to an Amazon Redshift table. After results are in an
Amazon Redshift table, you can query the table to compute the total count group by Ticker_Symbol.
In the case of XYZ, the total is accurate (6+1) even though a record arrived late.
Continuous Queries
A query over a stream executes continuously over streaming data. This continuous execution enables
scenarios, such as the ability for applications to continuously query a stream and generate alerts.
In the Getting Started exercise, you have an in-application stream named SOURCE_SQL_STREAM_001. It
continuously receives stock prices from a demo stream (a Kinesis data stream). The schema is as follows:
(TICKER_SYMBOL VARCHAR(4),
SECTOR varchar(16),
CHANGE REAL,
PRICE REAL)
Suppose that you are interested in stock price changes greater than 15 percent. You can use the
following query in your application code. This query runs continuously and emits records when a stock
price change greater than 15 percent is detected.
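A minimal sketch of such a query follows. It interprets the change column as the absolute price change, so the percent change is computed against the prior price (price - change); the exact expression you use may differ.
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (ticker_symbol VARCHAR(4), price REAL, change REAL);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
   INSERT INTO "DESTINATION_SQL_STREAM"
      SELECT STREAM ticker_symbol, price, change
      FROM "SOURCE_SQL_STREAM_001"
      -- Emit a row only when the price moved by more than 15 percent.
      WHERE ABS(change / (price - change)) * 100 > 15;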
Use the following procedure to set up an Amazon Kinesis Data Analytics application and test this query.
Windowed Queries
SQL queries in your application code execute continuously over in-application streams. An in-application
stream represents unbounded data that flows continuously through your application. Therefore, to get
result sets from this continuously updating input, you often bound queries using a window defined in
terms of time or rows. These are also called windowed SQL.
For a time-based windowed query, you specify the window size in terms of time (for example, a one-
minute window). This requires a timestamp column in your in-application stream that is monotonically
increasing. (The timestamp for a new row is greater than or equal to the previous row.) Amazon Kinesis
Data Analytics provides such a timestamp column called ROWTIME for each in-application stream. You
can use this column when specifying time-based queries. For your application, you might choose some
other timestamp option. For more information, see Timestamps and the ROWTIME Column (p. 69).
For a row-based windowed query, you specify the window size in terms of the number of rows.
You can specify a query to process records in a tumbling window, sliding window, or stagger window
manner, depending on your application needs. Kinesis Data Analytics supports the following window
types:
• Stagger Windows (p. 72): A query that aggregates data using keyed time-based windows that open
as data arrives. The keys allow for multiple overlapping windows. This is the recommended way to
aggregate data using time-based windows, because Stagger Windows reduce late or out-of-order data
compared to Tumbling windows.
• Tumbling Windows (p. 77): A query that aggregates data using distinct time-based windows that
open and close at regular intervals.
• Sliding Windows (p. 78): A query that aggregates data continuously, using a fixed time or rowcount
interval.
Stagger Windows
Using stagger windows is a windowing method that is suited for analyzing groups of data that arrive at
inconsistent times. It is well suited for any time-series analytics use case, such as a set of related sales or
log records.
For example, VPC Flow Logs have a capture window of approximately 10 minutes. But they can have a
capture window of up to 15 minutes if you're aggregating data on the client. Stagger windows are ideal
for aggregating these logs for analysis.
Stagger windows address the issue of related records not falling into the same time-restricted window,
such as when tumbling windows were used.
If tumbling windows are used to analyze groups of time-related data, the individual records might fall
into separate windows. So then the partial results from each window must be combined later to yield
complete results for each group of records.
In the following tumbling window query, records are grouped into windows by row time, event time, and
ticker symbol:
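A minimal sketch of such a query follows. It assumes the source stream carries an EVENT_TIME column in addition to the ticker symbol, and that a DESTINATION_SQL_STREAM with matching columns has already been created; the column names are illustrative.
CREATE OR REPLACE PUMP "TUMBLING_PUMP" AS
   INSERT INTO "DESTINATION_SQL_STREAM"
      SELECT STREAM
         ticker_symbol,
         FLOOR(event_time TO MINUTE) AS event_time_minute,
         COUNT(*) AS ticker_count
      FROM "SOURCE_SQL_STREAM_001"
      -- Group by the one-minute ROWTIME window, the event-time minute, and the ticker symbol.
      GROUP BY ticker_symbol,
         FLOOR(event_time TO MINUTE),
         STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);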
In the following diagram, an application is counting the number of trades it receives, based on when
the trades happened (event time) with one minute of granularity. The application can use a tumbling
window for grouping data based on row time and event time. The application receives four records
that all arrive within one minute of each other. It groups the records by row time, event time, and ticker
symbol. Because some of the records arrive after the first tumbling window ends, the records do not all
fall within the same one-minute tumbling window.
The result set from the tumbling window application looks similar to the following.
• A record with a ROWTIME of 11:01:00 that aggregates the first two records.
• A record at 11:02:00 that aggregates just the third record. This record has a ROWTIME within the
second window, but an EVENT_TIME within the first window.
• A record at 11:02:00 that aggregates just the fourth record.
To analyze the complete result set, the records must be aggregated in the persistence store. This adds
complexity and processing requirements to the application.
A stagger window is a separate time-restricted window for each key grouping in a window clause. The
application aggregates each result of the window clause inside its own time window, rather than using a
single window for all results.
In the following stagger window query, records are grouped into windows by event time and ticker
symbol:
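A minimal sketch of such a query follows, using the WINDOWED BY STAGGER clause described later in this section. It assumes an EVENT_TIME column on the source stream and a DESTINATION_SQL_STREAM with matching columns.
CREATE OR REPLACE PUMP "STAGGER_PUMP" AS
   INSERT INTO "DESTINATION_SQL_STREAM"
      SELECT STREAM
         ticker_symbol,
         FLOOR(event_time TO MINUTE) AS event_time_minute,
         COUNT(*) AS ticker_count
      FROM "SOURCE_SQL_STREAM_001"
      -- A one-minute stagger window opens when the first record with a given
      -- (event-time minute, ticker symbol) combination arrives.
      WINDOWED BY STAGGER (
         PARTITION BY FLOOR(event_time TO MINUTE), ticker_symbol
         RANGE INTERVAL '1' MINUTE);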
In the following diagram, events are aggregated by event time and ticker symbol into stagger windows.
The preceding diagram shows the same events that the tumbling window application analyzed.
The result set from the stagger window application looks similar to the following.
The returned record aggregates the first three input records. The records are grouped by one-minute
stagger windows. The stagger window starts when the application receives the first AMZN record (with
a ROWTIME of 11:00:20). When the 1-minute stagger window expires (at 11:01:20), a record with the
results that fall within the stagger window (based on ROWTIME and EVENT_TIME) is written to the
output stream. Using a stagger window, all of the records with a ROWTIME and EVENT_TIME within a
one-minute window are emitted in a single result.
The last record (with an EVENT_TIME outside the one-minute aggregation) is aggregated separately. This
is because EVENT_TIME is one of the partition keys that is used to separate the records into result sets,
and the partition key for EVENT_TIME for the first window is 11:00.
The syntax for a stagger window is defined in a special clause, WINDOWED BY. This clause is used instead
of the GROUP BY clause for streaming aggregations. The clause appears immediately after the optional
WHERE clause and before the HAVING clause.
The stagger window is defined in the WINDOWED BY clause and takes two parameters: partition keys
and window length. The partition keys partition the incoming data stream and define when the window
opens. A stagger window opens when the first event with a unique partition key appears on the stream.
The stagger window closes after a fixed time period defined by the window length. The syntax is shown
in the following code example:
...
FROM <stream-name>
WHERE <... optional statements...>
WINDOWED BY STAGGER(
PARTITION BY <partition key(s)>
RANGE INTERVAL <window length, interval>
);
Tumbling Windows
When a windowed query processes each window in a non-overlapping manner, the window is referred to as a tumbling window. For example, an aggregation query using a GROUP BY clause processes rows in a tumbling window. The demo stream in the Getting Started exercise receives stock price data that is mapped to the in-application stream SOURCE_SQL_STREAM_001 in your application. This stream has the following schema.
(TICKER_SYMBOL VARCHAR(4),
SECTOR varchar(16),
CHANGE REAL,
PRICE REAL)
In your application code, suppose that you want to find aggregate (min, max) prices for each ticker over a
one-minute window. You can use the following query.
SELECT STREAM
    Ticker_Symbol,
MIN(Price) AS Price,
MAX(Price) AS Price
FROM "SOURCE_SQL_STREAM_001"
GROUP BY Ticker_Symbol,
STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);
The preceding is an example of a windowed query that is time-based. The query groups records by
ROWTIME values. For reporting on a per-minute basis, the STEP function rounds down the ROWTIME
values to the nearest minute.
Note
You can also use the FLOOR function to group records into windows. However, FLOOR can
only round time values down to a whole time unit (hour, minute, second, and so on). STEP is
recommended for grouping records into tumbling windows because it can round values down to
an arbitrary interval, for example, 30 seconds.
This query is an example of a nonoverlapping (tumbling) window. The GROUP BY clause groups records
in a one-minute window, and each record belongs to a specific window (no overlapping). The query
emits one output record per minute, providing the min/max ticker price recorded at the specific minute.
This type of query is useful for generating periodic reports from the input data stream. In this example,
reports are generated each minute.
Sliding Windows
Instead of grouping records using GROUP BY, you can define a time-based or row-based window. You do
this by adding an explicit WINDOW clause.
In this case, as the window slides with time, Amazon Kinesis Data Analytics emits an output when new
records appear on the stream. Kinesis Data Analytics emits this output by processing rows in the window.
Windows can overlap in this type of processing, and a record can be part of multiple windows and be
processed with each window. The following example illustrates a sliding window.
Consider a simple query that counts records on the stream. This example assumes a 5-second window. In
the following example stream, new records arrive at time t1, t2, t6, and t7, and three records arrive at time
t8 seconds.
• The example assumes a 5-second window. The 5-second window slides continuously with time.
• For every row that enters a window, an output row is emitted by the sliding window. Soon after the
application starts, you see the query emit output for every new record that appears on the stream,
even though a 5-second window hasn't passed yet. For example, the query emits output when a record
appears in the first second and second second. Later, the query processes records in the 5-second
window.
• The windows slide with time. If an old record on the stream falls out of the window, the query doesn't
emit output unless there is also a new record on the stream that falls within that 5-second window.
Suppose that the query starts executing at t0. Then the following occurs:
1. At the time t0, the query starts. The query doesn't emit output (count value) because there are no
records at this time.
2. At time t1, a new record appears on the stream, and the query emits count value 1.
3. At time t2, another record appears, and the query emits count 2.
4. At times t3, t4, and t5, no new records appear on the stream. The 5-second window contains the same records at each of these times, so the query doesn't emit any output.
5. At time t6, the 5-second window is (t6 to t1). The query detects one new record at t6 so it emits output
2. The record at t1 is no longer in the window and doesn't count.
6. At time t7, the 5-second window is t7 to t2. The query detects one new record at t7 so it emits output
2. The record at t2 is no longer in the 5-second window, and therefore isn't counted.
7. At time t8, the 5-second window is t8 to t3. The query detects three new records, and therefore emits
record count 5.
In summary, the window is a fixed size and slides with time. The query emits output when new records
appear.
Note
We recommend that you use a sliding window no longer than one hour. If you use a longer
window, the application takes longer to restart after regular system maintenance. This is
because the source data must be read from the stream again.
The following example queries use the WINDOW clause to define windows and perform aggregates.
Because the queries don't specify GROUP BY, the query uses the sliding window approach to process
records on the stream. The examples assume the demo stream, which populates the in-application stream SOURCE_SQL_STREAM_001 with the following schema:
(TICKER_SYMBOL VARCHAR(4),
SECTOR varchar(16),
CHANGE REAL,
PRICE REAL)
Suppose that you want your application to compute aggregates using a sliding 1-minute window. That is,
for each new record that appears on the stream, you want the application to emit an output by applying
aggregates on records in the preceding 1-minute window.
You can use the following time-based windowed query. The query uses the WINDOW clause to define the
1-minute range interval. The PARTITION BY in the WINDOW clause groups records by ticker values within
the sliding window.
PARTITION BY ticker_symbol
RANGE INTERVAL '1' MINUTE PRECEDING);
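Only the tail of the WINDOW clause appears above. The following is a minimal sketch of a complete query of this shape; the aggregate shown (an average price per ticker) and the window and column names are illustrative and assume the demo stream schema shown earlier:
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    avg_price     REAL);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
        -- Average price per ticker over the preceding one-minute sliding window.
        SELECT STREAM
            ticker_symbol,
            AVG(price) OVER W1 AS avg_price
        FROM "SOURCE_SQL_STREAM_001"
        WINDOW W1 AS (
            PARTITION BY ticker_symbol
            RANGE INTERVAL '1' MINUTE PRECEDING);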
In the following example, the query emits the output ticker, price, a2, and a10. It emits output for ticker
symbols whose two-row moving average crosses the ten-row moving average. The a2 and a10 column
values are derived from two-row and ten-row sliding windows.
To test this query against the demo stream, follow the test procedure described in Example 1 (p. 80).
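The query described in the preceding paragraph is not reproduced in this excerpt. A minimal sketch of such a query, assuming the demo stream schema shown earlier (window names and column aliases are illustrative), follows:
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker VARCHAR(4),
    price  REAL,
    a2     REAL,
    a10    REAL);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
        -- Emit rows where the two-row moving average exceeds the ten-row moving average.
        SELECT STREAM ticker_symbol, price, a2, a10
        FROM (SELECT STREAM ticker_symbol,
                            price,
                            AVG(price) OVER last2rows  AS a2,
                            AVG(price) OVER last10rows AS a10
              FROM "SOURCE_SQL_STREAM_001"
              WINDOW last2rows  AS (PARTITION BY ticker_symbol ROWS 2 PRECEDING),
                     last10rows AS (PARTITION BY ticker_symbol ROWS 10 PRECEDING))
        WHERE a2 > a10;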
Stream Joins
Suppose that your application receives records from two streams: an OrderStream that carries stock orders and a TradeStream that carries the corresponding trades, with schemas along these lines:
OrderStream (orderId SqlType, ticker SqlType, amount SqlType, ROWTIME TimeStamp)
TradeStream (tradeId SqlType, orderId SqlType, ticker SqlType, amount SqlType, ROWTIME TimeStamp)
The following are JOIN query examples that correlate data on these streams.
Example 1: Report Orders Where There Are Trades Within One Minute of the Order Being Placed
SELECT STREAM
ROWTIME,
o.orderId, o.ticker, o.amount AS orderAmount,
t.amount AS tradeAmount
FROM OrderStream AS o
JOIN TradeStream OVER (RANGE INTERVAL '1' MINUTE PRECEDING) AS t
ON o.orderId = t.orderId;
You can also define the window explicitly by using the WINDOW clause, rewriting the preceding query as
follows:
SELECT STREAM
ROWTIME,
o.orderId, o.ticker, o.amount AS orderAmount,
t.amount AS tradeAmount
FROM OrderStream AS o
JOIN TradeStream OVER t
ON o.orderId = t.orderId
WINDOW t AS
(RANGE INTERVAL '1' MINUTE PRECEDING);
When you include this query in your application code, the application code runs continuously. For each
arriving record on the OrderStream, the application emits an output if there are trades within the 1-
minute window following the order being placed.
The join in the preceding query is an inner join: the query emits records in OrderStream for which
there is a matching record in TradeStream (and vice versa). Using an outer join, you can create another
interesting scenario. Suppose that you want the stock orders for which there are no trades within one minute
of the order being placed, as well as the trades reported within the same window but for other orders. This
is an example of an outer join.
SELECT STREAM
ROWTIME,
o.orderId, o.ticker, o.amount AS orderAmount,
t.ticker, t.tradeId, t.amount AS tradeAmount
FROM OrderStream AS o
OUTER JOIN TradeStream OVER (RANGE INTERVAL '1' MINUTE PRECEDING) AS t
ON o.orderId = t.orderId;
Example Applications
This section provides examples of creating and working with applications in Amazon Kinesis Data
Analytics. They include example code and step-by-step instructions to help you create Kinesis data
analytics applications and test your results.
Before you explore these examples, we recommend that you first review Amazon Kinesis Data Analytics
for SQL Applications: How It Works (p. 3) and Getting Started with Amazon Kinesis Data Analytics for
SQL Applications (p. 45).
Topics
• Examples: Transforming Data (p. 84)
• Examples: Windows and Aggregation (p. 106)
• Examples: Joins (p. 118)
• Examples: Machine Learning (p. 121)
• Examples: Alerts and Errors (p. 141)
• Examples: Solution Accelerators (p. 144)
Examples: Transforming Data
This section provides examples of how to use the available string functions to normalize data, how to
extract information that you need from string columns, and so on. The section also points to date time
functions that you might find useful.
Topics
• Examples: Transforming String Values (p. 84)
• Example: Transforming DateTime Values (p. 98)
• Example: Transforming Multiple Data Types (p. 101)
Examples: Transforming String Values
Your application's input configuration specifies how record fields in the streaming source map to columns in an in-application stream.
This mapping works when records on the streaming source follow the supported formats, which results
in an in-application stream with normalized data. But what if data on your streaming source does not
conform to the supported formats? For example, what if your streaming source contains data such as
clickstream data, IoT sensor readings, or application logs?
• Streaming source contains application logs – The application logs follow the standard Apache log
format, and are written to the stream using JSON format.
{
"Log":"192.168.254.30 - John [24/May/2004:22:01:02 -0700] "GET /icons/apache_pb.gif
HTTP/1.1" 304 0"
}
For more information about the standard Apache log format, see Log Files on the Apache website.
• Streaming source contains semi-structured data – The following example shows two records. The
Col_E_Unstructured field value is a series of comma-separated values. There are five columns: the
first four have string type values, and the last column contains comma-separated values.
{ "Col_A" : "string",
"Col_B" : "string",
"Col_C" : "string",
"Col_D" : "string",
"Col_E_Unstructured" : "value,value,value,value"}
{ "Col_A" : "string",
"Col_B" : "string",
"Col_C" : "string",
"Col_D" : "string",
"Col_E_Unstructured" : "value,value,value,value"}
• Records on your streaming source contain URLs, and you need a portion of the URL domain name for
analytics.
{ "referrer" : "http://www.amazon.com"}
{ "referrer" : "http://www.stackoverflow.com" }
In such cases, the following two-step process generally works for creating in-application streams that
contain normalized data:
1. Configure application input to map the unstructured field to a column of the VARCHAR(N) type in the
in-application input stream that is created.
2. In your application code, use string functions to split this single column into multiple columns and
then save the rows in another in-application stream. This in-application stream that your application
code creates will have normalized data. You can then perform analytics on this in-application stream.
Amazon Kinesis Data Analytics provides the following string operations, standard SQL functions, and
extensions to the SQL standard for working with string columns:
• String operators – Operators such as LIKE and SIMILAR are useful in comparing strings. For more
information, see String Operators in the Amazon Kinesis Data Analytics SQL Reference.
• SQL functions – The following functions are useful when manipulating individual strings. For more
information, see String and Search Functions in the Amazon Kinesis Data Analytics SQL Reference.
• CHAR_LENGTH – Provides the length of a string.
• INITCAP – Returns a converted version of the input string such that the first character of each
space-delimited word is uppercase, and all other characters are lowercase.
• LOWER/UPPER – Converts a string to lowercase or uppercase.
• OVERLAY – Replaces a portion of the first string argument (the original string) with the second string
argument (the replacement string).
• POSITION – Searches for a string within another string.
• REGEX_REPLACE – Replaces a substring with an alternative substring.
• SUBSTRING – Extracts a portion of a source string starting at a specific position.
• TRIM – Removes instances of the specified character from the beginning or end of the source string.
• SQL extensions – These are useful for working with unstructured strings such as logs and URIs. For
more information, see Log Parsing Functions in the Amazon Kinesis Data Analytics SQL Reference.
• FAST_REGEX_LOG_PARSER – Works similarly to the regex parser, but takes several shortcuts to
ensure faster results. For example, the fast regex parser stops at the first match it finds (known as
lazy semantics).
• FIXED_COLUMN_LOG_PARSE – Parses fixed-width fields and automatically converts them to the
given SQL types.
• REGEX_LOG_PARSE – Parses a string based on default Java regular expression patterns.
• SYS_LOG_PARSE – Parses entries commonly found in UNIX/Linux system logs.
• VARIABLE_COLUMN_LOG_PARSE – Splits an input string into fields separated by a delimiter
character or a delimiter string.
• W3C_LOG_PARSE – Can be used for quickly formatting Apache logs.
Topics
• Example: Extracting a Portion of a String (SUBSTRING Function) (p. 86)
• Example: Replacing a Substring using Regex (REGEX_REPLACE Function) (p. 88)
• Example: Parsing Log Strings Based on Regular Expressions (REGEX_LOG_PARSE Function) (p. 91)
• Example: Parsing Web Logs (W3C_LOG_PARSE Function) (p. 93)
• Example: Split Strings into Multiple Fields (VARIABLE_COLUMN_LOG_PARSE Function) (p. 95)
Example: Extracting a Portion of a String (SUBSTRING Function)
In this example, you write the following records to an Amazon Kinesis data stream.
{ "REFERRER" : "http://www.amazon.com" }
{ "REFERRER" : "http://www.amazon.com"}
{ "REFERRER" : "http://www.amazon.com"}
...
You then create an Amazon Kinesis data analytics application on the console, using the Kinesis data
stream as the streaming source. The discovery process reads sample records on the streaming source and
infers an in-application schema with one column (REFERRER), as shown.
Then, you use the application code with the SUBSTRING function to parse the URL string and retrieve
the company name. You then insert the resulting data into another in-application stream, as shown
following:
Topics
• Step 1: Create a Kinesis Data Stream (p. 87)
• Step 2: Create the Kinesis Data Analytics Application (p. 88)
1. Sign in to the AWS Management Console and open the Kinesis console at https://
console.aws.amazon.com/kinesis.
2. Choose Data Streams in the navigation pane.
3. Choose Create Kinesis stream, and create a stream with one shard. For more information, see Create
a Stream in the Amazon Kinesis Data Streams Developer Guide.
4. Run the following Python code to populate sample log records. This simple code continuously writes
the same log record to the stream.
import json
import boto3
import random
kinesis = boto3.client('kinesis')
def getReferrer():
data = {}
data['REFERRER'] = 'http://www.amazon.com'
return data
while True:
data = json.dumps(getReferrer())
print(data)
kinesis.put_record(
StreamName="ExampleInputStream",
Data=data,
PartitionKey="partitionkey")
a. Copy the following application code and paste it into the editor.
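The application code itself is not shown in this excerpt. As a rough sketch only (not the guide's exact code), a pump of the following shape uses SUBSTRING to pull out the company name that sits between "www." and ".com" in the REFERRER column; the destination column names are illustrative:
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    REFERRER VARCHAR(64),
    COMPANY  VARCHAR(32));
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
        -- Extract the characters between "www." and ".com", for example "amazon".
        SELECT STREAM
            REFERRER,
            SUBSTRING(REFERRER
                      FROM POSITION('www.' IN REFERRER) + 4
                      FOR  POSITION('.com' IN REFERRER) - POSITION('www.' IN REFERRER) - 4)
        FROM "SOURCE_SQL_STREAM_001";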
b. Choose Save and run SQL. On the Real-time analytics tab, you can see all the in-application
streams that the application created and verify the data.
Example: Replacing a Substring using Regex (REGEX_REPLACE Function)
In this example, you write the following records to an Amazon Kinesis data stream:
{ "REFERRER" : "http://www.amazon.com" }
{ "REFERRER" : "http://www.amazon.com"}
{ "REFERRER" : "http://www.amazon.com"}
...
You then create an Amazon Kinesis data analytics application on the console, with the Kinesis data
stream as the streaming source. The discovery process reads sample records on the streaming source and
infers an in-application schema with one column (REFERRER) as shown.
Then, you use the application code with the REGEX_REPLACE function to convert the URL to use
https:// instead of http://. You insert the resulting data into another in-application stream, as
shown following:
Topics
• Step 1: Create a Kinesis Data Stream (p. 89)
• Step 2: Create the Kinesis Data Analytics Application (p. 90)
1. Sign in to the AWS Management Console and open the Kinesis console at https://
console.aws.amazon.com/kinesis.
2. Choose Data Streams in the navigation pane.
3. Choose Create Kinesis stream, and create a stream with one shard. For more information, see Create
a Stream in the Amazon Kinesis Data Streams Developer Guide.
4. Run the following Python code to populate the sample log records. This simple code continuously
writes the same log record to the stream.
import json
import boto3
import random
kinesis = boto3.client('kinesis')
def getReferrer():
data = {}
data['REFERRER'] = 'http://www.amazon.com'
return data
while True:
data = json.dumps(getReferrer())
print(data)
kinesis.put_record(
StreamName="ExampleInputStream",
Data=data,
PartitionKey="partitionkey")
a. Copy the following application code, and paste it into the editor:
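The application code is not shown in this excerpt. As a sketch only (the REGEX_REPLACE argument order—source, pattern, replacement, start position, occurrence—should be confirmed against the SQL Reference), a pump of this shape performs the http:// to https:// conversion:
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    REFERRER        VARCHAR(64),
    SECURE_REFERRER VARCHAR(64));
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
        -- Replace http:// with https:// in every incoming URL (occurrence 0 = all matches).
        SELECT STREAM
            REFERRER,
            REGEX_REPLACE(REFERRER, 'http://', 'https://', 1, 0)
        FROM "SOURCE_SQL_STREAM_001";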
b. Choose Save and run SQL. On the Real-time analytics tab, you can see all the in-application
streams that the application created and verify the data.
Example: Parsing Log Strings Based on Regular Expressions (REGEX_LOG_PARSE Function)
In this example, you write the following records to an Amazon Kinesis stream:
You then create an Amazon Kinesis data analytics application on the console, with the Kinesis data
stream as the streaming source. The discovery process reads sample records on the streaming source and
infers an in-application schema with one column (LOGENTRY), as shown following.
Then, you use the application code with the REGEX_LOG_PARSE function to parse the log string to
retrieve the data elements. You insert the resulting data into another in-application stream, as shown in
the following screenshot:
Topics
• Step 1: Create a Kinesis Data Stream (p. 92)
• Step 2: Create the Kinesis Data Analytics Application (p. 92)
1. Sign in to the AWS Management Console and open the Kinesis console at https://
console.aws.amazon.com/kinesis.
2. Choose Data Streams in the navigation pane.
3. Choose Create Kinesis stream, and create a stream with one shard. For more information, see Create
a Stream in the Amazon Kinesis Data Streams Developer Guide.
4. Run the following Python code to populate sample log records. This simple code continuously writes
the same log record to the stream.
import json
import boto3
import random
kinesis = boto3.client('kinesis')
def getReferrer():
data = {}
data['LOGENTRY'] = '203.0.113.24 - - [25/Mar/2018:15:25:37 -0700] "GET /index.php HTTP/1.1" 200 125 "-" "Mozilla/5.0 [en] Gecko/20100101 Firefox/52.0"'
return data
while True:
data = json.dumps(getReferrer())
print(data)
kinesis.put_record(
StreamName="ExampleInputStream",
Data=data,
PartitionKey="partitionkey")
a. Copy the following application code and paste it into the editor.
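The application code is not shown in this excerpt. A sketch of one common shape for it follows; treat the regular expression, the nested-SELECT calling convention, and the destination column names as assumptions to verify against the REGEX_LOG_PARSE entry in the Amazon Kinesis Data Analytics SQL Reference:
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    logentry VARCHAR(256),
    match1   VARCHAR(64),
    match2   VARCHAR(64));
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
        -- Parse each log entry with a Java regular expression and keep the first two captures.
        SELECT STREAM T.LOGENTRY, T.REC.COLUMN1, T.REC.COLUMN2
        FROM (SELECT STREAM LOGENTRY,
                     REGEX_LOG_PARSE(LOGENTRY, '(\w.+) (\d.+) (\w.+) (\w.+)') AS REC
              FROM "SOURCE_SQL_STREAM_001") AS T;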
b. Choose Save and run SQL. On the Real-time analytics tab, you can see all the in-application
streams that the application created and verify the data.
Example: Parsing Web Logs (W3C_LOG_PARSE Function)
In this example, you write log records to an Amazon Kinesis data stream. Example logs are shown
following:
You then create an Amazon Kinesis data analytics application on the console, with the Kinesis data
stream as the streaming source. The discovery process reads sample records on the streaming source and
infers an in-application schema with one column (log), as shown following:
Then, you use the application code with the W3C_LOG_PARSE function to parse the log, and create
another in-application stream with various log fields in separate columns, as shown following:
Topics
• Step 1: Create a Kinesis Data Stream (p. 94)
• Step 2: Create the Kinesis Data Analytics Application (p. 94)
1. Sign in to the AWS Management Console and open the Kinesis console at https://
console.aws.amazon.com/kinesis.
2. Choose Data Streams in the navigation pane.
3. Choose Create Kinesis stream, and create a stream with one shard. For more information, see Create
a Stream in the Amazon Kinesis Data Streams Developer Guide.
4. Run the following Python code to populate the sample log records. This simple code continuously
writes the same log record to the stream.
import json
import boto3
import random
kinesis = boto3.client('kinesis')
def getLog():
data = {}
data['log'] = '192.168.254.30 - John [24/May/2004:22:01:02 -0700] "GET /icons/apache_pb.gif HTTP/1.1" 304 0'
return data
while True:
data = json.dumps(getLog())
print(data)
kinesis.put_record(
StreamName="ExampleInputStream",
Data=data,
PartitionKey="partitionkey")
a. Copy the following application code and paste it into the editor.
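The application code is not shown in this excerpt. A sketch of one common shape for it follows; the 'COMMON' format argument and the l(r) aliasing pattern are assumptions to verify against the W3C_LOG_PARSE entry in the SQL Reference:
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    column1 VARCHAR(16), column2 VARCHAR(16), column3 VARCHAR(16), column4 VARCHAR(16),
    column5 VARCHAR(16), column6 VARCHAR(16), column7 VARCHAR(16));
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
        -- Split the Apache common-format log line into its individual fields.
        SELECT STREAM l.r.COLUMN1, l.r.COLUMN2, l.r.COLUMN3, l.r.COLUMN4,
                      l.r.COLUMN5, l.r.COLUMN6, l.r.COLUMN7
        FROM (SELECT STREAM W3C_LOG_PARSE("log", 'COMMON')
              FROM "SOURCE_SQL_STREAM_001") AS l(r);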
b. Choose Save and run SQL. On the Real-time analytics tab, you can see all the in-application
streams that the application created and verify the data.
Example: Split Strings into Multiple Fields (VARIABLE_COLUMN_LOG_PARSE Function)
In this example, you write semi-structured records to an Amazon Kinesis data stream. The example
records are as follows:
{ "Col_A" : "string",
"Col_B" : "string",
"Col_C" : "string",
"Col_D_Unstructured" : "value,value,value,value"}
{ "Col_A" : "string",
"Col_B" : "string",
"Col_C" : "string",
"Col_D_Unstructured" : "value,value,value,value"}
You then create an Amazon Kinesis data analytics application on the console, using the Kinesis stream as
the streaming source. The discovery process reads sample records on the streaming source and infers an
in-application schema with four columns, as shown following:
Then, you use the application code with the VARIABLE_COLUMN_LOG_PARSE function to parse the
comma-separated values, and insert normalized rows in another in-application stream, as shown
following:
Topics
• Step 1: Create a Kinesis Data Stream (p. 96)
• Step 2: Create the Kinesis Data Analytics Application (p. 97)
1. Sign in to the AWS Management Console and open the Kinesis console at https://
console.aws.amazon.com/kinesis.
2. Choose Data Streams in the navigation pane.
3. Choose Create Kinesis stream, and create a stream with one shard. For more information, see Create
a Stream in the Amazon Kinesis Data Streams Developer Guide.
4. Run the following Python code to populate the sample log records. This simple code continuously
writes the same log record to the stream.
import json
import boto3
import random
kinesis = boto3.client('kinesis')
def getSampleRecord():
data = {}
data['Col_A'] = 'a'
data['Col_B'] = 'b'
data['Col_C'] = 'c'
data['Col_E_Unstructured'] = 'x,y,z'
return data
while True:
data = json.dumps(getSampleRecord())
print(data)
kinesis.put_record(
StreamName="ExampleInputStream",
Data=data,
PartitionKey="partitionkey")
a. Copy the following application code and paste it into the editor:
FROM "SOURCE_SQL_STREAM_001") as t;
b. Choose Save and run SQL. On the Real-time analytics tab, you can see all the in-application
streams that the application created and verify the data.
Example: Transforming DateTime Values
Amazon Kinesis Data Analytics provides the following operators, SQL functions, and SQL extensions for working with dates and times:
• Date and time operators – You can perform arithmetic operations on dates, times, and interval data
types. For more information, see Date, Timestamp, and Interval Operators in the Amazon Kinesis Data
Analytics SQL Reference.
• SQL Functions – These include the following. For more information, see Date and Time Functions in
the Amazon Kinesis Data Analytics SQL Reference.
• EXTRACT() – Extracts one field from a date, time, time stamp, or interval expression.
• CURRENT_TIME – Returns the time when the query executes (UTC).
• CURRENT_DATE – Returns the date when the query executes (UTC).
• CURRENT_TIMESTAMP – Returns the time stamp when the query executes (UTC).
• LOCALTIME – Returns the current time when the query executes as defined by the environment on
which Kinesis Data Analytics is running (UTC).
• LOCALTIMESTAMP – Returns the current time stamp as defined by the environment on which Kinesis
Data Analytics is running (UTC).
• SQL Extensions – These include the following. For more information, see Date and Time Functions and
Datetime Conversion Functions in the Amazon Kinesis Data Analytics SQL Reference.
• CURRENT_ROW_TIMESTAMP – Returns a new time stamp for each row in the stream.
• TSDIFF – Returns the difference of two time stamps in milliseconds.
• CHAR_TO_DATE – Converts a string to a date.
• CHAR_TO_TIME – Converts a string to time.
• CHAR_TO_TIMESTAMP – Converts a string to a time stamp.
• DATE_TO_CHAR – Converts a date to a string.
• TIME_TO_CHAR – Converts a time to a string.
• TIMESTAMP_TO_CHAR – Converts a time stamp to a string.
Most of the preceding SQL functions use a format to convert the columns. The format is flexible. For
example, you can specify the format yyyy-MM-dd hh:mm:ss to convert an input string 2009-09-16
03:15:24 into a time stamp. For more information, Char To Timestamp(Sys) in the Amazon Kinesis Data
Analytics SQL Reference.
You then create an Amazon Kinesis data analytics application on the console, with the Kinesis stream as
the streaming source. The discovery process reads sample records on the streaming source and infers an
in-application schema with two columns (EVENT_TIME and TICKER) as shown.
Then, you use the application code with SQL functions to convert the EVENT_TIME time stamp field
in various ways. You then insert the resulting data into another in-application stream, as shown in the
following screenshot:
1. Sign in to the AWS Management Console and open the Kinesis console at https://
console.aws.amazon.com/kinesis.
2. Choose Data Streams in the navigation pane.
3. Choose Create Kinesis stream, and create a stream with one shard.
4. Run the following Python code to populate the stream with sample data. This simple code
continuously writes a record with a random ticker symbol and the current time stamp to the stream.
import json
import boto3
import random
import datetime
kinesis = boto3.client('kinesis')
def getReferrer():
data = {}
now = datetime.datetime.now()
str_now = now.isoformat()
data['EVENT_TIME'] = str_now
data['TICKER'] = random.choice(['AAPL', 'AMZN', 'MSFT', 'INTC', 'TBV'])
price = random.random() * 100
data['PRICE'] = round(price, 2)
return data
while True:
data = json.dumps(getReferrer())
print(data)
kinesis.put_record(
StreamName="ExampleInputStream",
Data=data,
PartitionKey="partitionkey")
a. Copy the following application code and paste it into the editor.
SELECT STREAM
TICKER,
EVENT_TIME,
EVENT_TIME - INTERVAL '5' MINUTE,
UNIX_TIMESTAMP(EVENT_TIME),
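-- The rest of the original SELECT list is not shown in this excerpt. A hedged sketch of how
-- the query might continue, using the conversion functions described earlier (the column
-- choices and the format-first argument order are assumptions; see the Datetime Conversion
-- Functions in the SQL Reference):
    TIMESTAMP_TO_CHAR('yyyy-MM-dd hh:mm:ss', EVENT_TIME),
    EXTRACT(MINUTE FROM EVENT_TIME)
FROM "SOURCE_SQL_STREAM_001";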
b. Choose Save and run SQL. On the Real-time analytics tab, you can see all the in-application
streams that the application created and verify the data.
Example: Transforming Multiple Data Types
If your streaming source contains records of multiple types, you can process them with the following two-step approach:
1. First, you map the streaming source to an in-application input stream, similar to all other Kinesis
data analytics applications.
2. Then, in your application code, you write SQL statements to retrieve rows of specific types from the
in-application input stream. You then insert them into separate in-application streams. (You can
create additional in-application streams in your application code.)
In this exercise, you have a streaming source that receives records of two types (Order and Trade).
These are stock orders and corresponding trades. For each order, there can be zero or more trades.
Example records of each type are shown following:
Order record
{"RecordType": "Order", "Oprice": 9047, "Otype": "Sell", "Oid": 3811, "Oticker": "AAAA"}
Trade record
When you create an application using the AWS Management Console, the console displays the following
inferred schema for the in-application input stream created. By default, the console names this in-
application stream SOURCE_SQL_STREAM_001.
When you save the configuration, Amazon Kinesis Data Analytics continuously reads data from the
streaming source and inserts rows in the in-application stream. You can now perform analytics on data in
the in-application stream.
In the application code in this example, you first create two additional in-application streams,
Order_Stream and Trade_Stream. You then filter the rows from the SOURCE_SQL_STREAM_001
stream based on the record type and insert them in the newly created streams using pumps. For
information about this coding pattern, see Application Code (p. 31).
a. Filter the order records in the SOURCE_SQL_STREAM_001, and save the orders in the
Order_Stream.
--Create Order_Stream.
CREATE OR REPLACE STREAM "Order_Stream"
(
order_id integer,
order_type varchar(10),
ticker varchar(4),
order_price DOUBLE,
record_type varchar(10)
);
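The pump that populates Order_Stream is not shown above. A sketch of such a pump follows; the quoted source column names ("Oid", "Otype", and so on) assume that schema discovery preserved the JSON field names from the order record shown earlier:
CREATE OR REPLACE PUMP "ORDER_PUMP" AS
    INSERT INTO "Order_Stream"
        -- Keep only the order records.
        SELECT STREAM "Oid", "Otype", "Oticker", "Oprice", "RecordType"
        FROM "SOURCE_SQL_STREAM_001"
        WHERE "RecordType" = 'Order';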
b. Filter the trade records in the SOURCE_SQL_STREAM_001, and save the orders in the
Trade_Stream.
--Create Trade_Stream.
CREATE OR REPLACE STREAM "Trade_Stream"
(trade_id integer,
order_id integer,
trade_price DOUBLE,
ticker varchar(4),
record_type varchar(10)
);
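As with the orders, a pump moves the trade rows into Trade_Stream. The sketch below assumes trade field names ("Tid", "Toid", "Tprice", "Tticker") that mirror the order fields; the actual names depend on the trade record's JSON keys:
CREATE OR REPLACE PUMP "TRADE_PUMP" AS
    INSERT INTO "Trade_Stream"
        -- Keep only the trade records.
        SELECT STREAM "Tid", "Toid", "Tprice", "Tticker", "RecordType"
        FROM "SOURCE_SQL_STREAM_001"
        WHERE "RecordType" = 'Trade';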
2. Now you can perform additional analytics on these streams. In this example, you count the number
of trades by the ticker in a one-minute tumbling window and save the results to yet another stream,
DESTINATION_SQL_STREAM.
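A sketch of that aggregation (a count of trades per ticker in one-minute tumbling windows, using the STEP function described in Windowed Queries) might look like the following; the destination column names are illustrative:
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker      VARCHAR(4),
    trade_count INTEGER);
CREATE OR REPLACE PUMP "DESTINATION_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
        SELECT STREAM ticker, COUNT(*) AS trade_count
        FROM "Trade_Stream"
        GROUP BY ticker,
                 STEP("Trade_Stream".ROWTIME BY INTERVAL '60' SECOND);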
Topics
• Step 1: Prepare the Data (p. 103)
• Step 2: Create the Application (p. 105)
Next Step
Topics
• Step 1.1: Create a Streaming Source (p. 103)
• Step 1.2: Populate the Streaming Source (p. 104)
• Using the console – Sign in to the AWS Management Console and open the Kinesis console at https://
console.aws.amazon.com/kinesis. Choose Data Streams, and then create a stream with one shard. For
more information, see Create a Stream in the Amazon Kinesis Data Streams Developer Guide.
• Using the AWS CLI – Use the following Kinesis create-stream AWS CLI command to create the
stream:
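The command itself is not shown in this excerpt. A sketch, using the stream name that the Python script below writes to:
aws kinesis create-stream --stream-name OrdersAndTradesStream --shard-count 1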
You can install dependencies using pip. For information about installing pip, see Installation on the
pip website.
2. Run the following Python code. The put-record command in the code writes the JSON records to
the stream.
import json
import boto3
import random
kinesis = boto3.client('kinesis')
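# The getOrderData and getTradeData helpers called below are not shown in this excerpt.
# The following definitions are a sketch only: the order fields match the example order
# record shown earlier, and the trade field names are assumptions.
def getOrderData(orderId, ticker):
    data = {}
    data['RecordType'] = 'Order'
    data['Oid'] = orderId
    data['Oticker'] = ticker
    data['Oprice'] = random.randint(500, 10000)
    data['Otype'] = 'Sell'
    return data

def getTradeData(orderId, tradeId, ticker, tradePrice):
    data = {}
    data['RecordType'] = 'Trade'
    data['Tid'] = tradeId
    data['Toid'] = orderId
    data['Tticker'] = ticker
    data['Tprice'] = tradePrice
    return data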
x = 1
while True:
#rnd = random.random()
rnd = random.randint(1,3)
if rnd == 1:
ticker = "AAAA"
elif rnd == 2:
ticker = "BBBB"
else:
ticker = "CCCC"
data = json.dumps(getOrderData(x, ticker))
kinesis.put_record(StreamName="OrdersAndTradesStream", Data=data,
PartitionKey="partitionkey")
print(data)
tId = 1
for y in range (0, random.randint(0,6)):
tradeId = tId
tradePrice = random.randint(0, 3000)
data2 = json.dumps(getTradeData(x, tradeId, ticker, tradePrice));
kinesis.put_record(StreamName="OrdersAndTradesStream", Data=data2,
PartitionKey="partitionkey")
print(data2)
tId+=1
x+=1
Next Step
a. Choose the stream that you created in Step 1: Prepare the Data (p. 103).
b. Choose to create an IAM role.
c. Wait for the console to show the inferred schema and the sample records that are used to infer the
schema for the in-application stream created.
d. Choose Save and continue.
5. On the application hub, choose Go to SQL editor. To start the application, choose Yes, start
application in the dialog box that appears.
6. In the SQL editor, write the application code and verify the results:
a. Copy the following application code and paste it into the editor.
--Create Order_Stream.
CREATE OR REPLACE STREAM "Order_Stream"
(
"order_id" integer,
"order_type" varchar(10),
"ticker" varchar(4),
"order_price" DOUBLE,
"record_type" varchar(10)
);
b. Choose Save and run SQL. Choose the Real-time analytics tab to see all of the in-application
streams that the application created and verify the data.
Next Step
You can configure application output to persist results to an external destination, such as another Kinesis
stream or a Kinesis Data Firehose data delivery stream.
Examples: Windows and Aggregation
Topics
• Example: Stagger Window (p. 106)
• Example: Tumbling Window Using ROWTIME (p. 109)
• Example: Tumbling Window Using an Event Timestamp (p. 111)
• Example: Retrieving the Most Frequently Occurring Values (TOP_K_ITEMS_TUMBLING) (p. 114)
• Example: Aggregating Partial Results from a Query (p. 116)
Example: Stagger Window
In this example, you write the following records to a Kinesis data stream at the following times. The
script does not write the times to the stream, but the time that the record is ingested by the application
is written to the ROWTIME field:
You then create a Kinesis Data Analytics application in the AWS Management Console, with the Kinesis
data stream as the streaming source. The discovery process reads sample records on the streaming
source and infers an in-application schema with two columns (EVENT_TIME and TICKER) as shown
following.
You use the application code with the COUNT function to create a windowed aggregation of the data.
Then you insert the resulting data into another in-application stream, as shown in the following
screenshot:
In the following procedure, you create a Kinesis Data Analytics application that aggregates values in the
input stream in a stagger window based on EVENT_TIME and TICKER.
Topics
• Step 1: Create a Kinesis Data Stream (p. 107)
• Step 2: Create the Kinesis Data Analytics Application (p. 108)
1. Sign in to the AWS Management Console and open the Kinesis console at https://
console.aws.amazon.com/kinesis.
2. Choose Data Streams in the navigation pane.
3. Choose Create Kinesis stream, and then create a stream with one shard. For more information, see
Create a Stream in the Amazon Kinesis Data Streams Developer Guide.
4. To write records to a Kinesis data stream in a production environment, we recommend using either
the Kinesis Producer Library or Kinesis Data Streams API. For simplicity, this example uses the
following Python script to generate records. Run the code to populate the sample ticker records.
This simple code continuously writes a group of six records with the same random EVENT_TIME and
ticker symbol to the stream, over the course of one minute. Keep the script running so that you can
generate the application schema in a later step.
import json
import boto3
import random
import datetime
import time
kinesis = boto3.client('kinesis')
def getData():
data = {}
now = datetime.datetime.utcnow() - datetime.timedelta(seconds=10)
str_now = now.isoformat()
data['EVENT_TIME'] = str_now
data['TICKER'] = random.choice(['AAPL', 'AMZN', 'MSFT', 'INTC', 'TBV'])
return data
while True:
data = json.dumps(getData())
# Send six records, ten seconds apart, with the same event time and ticker
for x in range(0, 6):
print(data)
kinesis.put_record(
StreamName="ExampleInputStream",
Data=data,
PartitionKey="partitionkey")
time.sleep(10)
6. In the SQL editor, write the application code, and verify the results as follows:
a. Copy the following application code and paste it into the editor.
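The application code is not shown in this excerpt. A sketch of a stagger-window aggregation of this kind follows; it assumes that EVENT_TIME was discovered as a TIMESTAMP column, and the WINDOWED BY STAGGER clause and partition keys shown here are assumptions to confirm against the stagger window documentation:
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    EVENT_TIME   TIMESTAMP,
    TICKER       VARCHAR(4),
    TICKER_COUNT INTEGER);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
        -- Count records per ticker, windowed on the event time carried in the data.
        SELECT STREAM EVENT_TIME, TICKER, COUNT(*) AS TICKER_COUNT
        FROM "SOURCE_SQL_STREAM_001"
        WINDOWED BY STAGGER (
            PARTITION BY TICKER, EVENT_TIME RANGE INTERVAL '1' MINUTE);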
On the Real-time analytics tab, you can see all the in-application streams that the application
created and verify the data.
Example: Tumbling Window Using ROWTIME
In this example, you write the following records to a Kinesis data stream.
You then create a Kinesis Data Analytics application in the AWS Management Console, with the Kinesis
data stream as the streaming source. The discovery process reads sample records on the streaming
source and infers an in-application schema with two columns (TICKER and PRICE) as shown following.
You use the application code with the MIN and MAX functions to create a windowed aggregation of the
data. Then you insert the resulting data into another in-application stream, as shown in the following
screenshot:
In the following procedure, you create a Kinesis Data Analytics application that aggregates values in the
input stream in a tumbling window based on ROWTIME.
Topics
• Step 1: Create a Kinesis Data Stream (p. 110)
• Step 2: Create the Kinesis Data Analytics Application (p. 111)
1. Sign in to the AWS Management Console and open the Kinesis console at https://
console.aws.amazon.com/kinesis.
2. Choose Data Streams in the navigation pane.
3. Choose Create Kinesis stream, and then create a stream with one shard. For more information, see
Create a Stream in the Amazon Kinesis Data Streams Developer Guide.
4. To write records to a Kinesis data stream in a production environment, we recommend using either
the Kinesis Producer Library or the Kinesis Data Streams API. For simplicity, this example uses the following
Python script to generate records. Run the code to populate the sample ticker records. This simple
code continuously writes a random ticker record to the stream. Keep the script running so that you
can generate the application schema in a later step.
import json
import boto3
import random
import datetime
kinesis = boto3.client('kinesis')
def getReferrer():
data = {}
now = datetime.datetime.now()
str_now = now.isoformat()
data['EVENT_TIME'] = str_now
data['TICKER'] = random.choice(['AAPL', 'AMZN', 'MSFT', 'INTC', 'TBV'])
price = random.random() * 100
data['PRICE'] = round(price, 2)
return data
while True:
data = json.dumps(getReferrer())
print(data)
kinesis.put_record(
StreamName="ExampleInputStream",
Data=data,
PartitionKey="partitionkey")
a. Copy the following application code and paste it into the editor.
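The application code is not shown in this excerpt. Based on the description above (MIN and MAX per ticker over one-minute tumbling windows keyed on ROWTIME), a sketch might look like the following; the column names are illustrative:
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    TICKER    VARCHAR(4),
    MIN_PRICE REAL,
    MAX_PRICE REAL);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
        SELECT STREAM TICKER,
                      MIN(PRICE) AS MIN_PRICE,
                      MAX(PRICE) AS MAX_PRICE
        FROM "SOURCE_SQL_STREAM_001"
        GROUP BY TICKER,
                 STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);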
On the Real-time analytics tab, you can see all the in-application streams that the application
created and verify the data.
Example: Tumbling Window Using an Event Timestamp
In this example, you write the following records to an Amazon Kinesis stream. The EVENT_TIME value is
set to 5 seconds in the past, to simulate processing and transmission lag that might create a delay from
when the event occurred, to when the record is ingested into Kinesis Data Analytics.
You then create a Kinesis Data Analytics application in the AWS Management Console, with the Kinesis
data stream as the streaming source. The discovery process reads sample records on the streaming
source and infers an in-application schema with three columns (EVENT_TIME, TICKER, and PRICE) as
shown following.
You use the application code with the MIN and MAX functions to create a windowed aggregation of the
data. Then you insert the resulting data into another in-application stream, as shown in the following
screenshot:
In the following procedure, you create a Kinesis Data Analytics application that aggregates values in the
input stream in a tumbling window based on an event time.
Topics
• Step 1: Create a Kinesis Data Stream (p. 112)
• Step 2: Create the Kinesis Data Analytics Application (p. 113)
1. Sign in to the AWS Management Console and open the Kinesis console at https://
console.aws.amazon.com/kinesis.
2. Choose Data Streams in the navigation pane.
3. Choose Create Kinesis stream, and then create a stream with one shard. For more information, see
Create a Stream in the Amazon Kinesis Data Streams Developer Guide.
4. To write records to a Kinesis data stream in a production environment, we recommend using either
the Kinesis Producer Library or the Kinesis Data Streams API. For simplicity, this example uses the following
Python script to generate records. Run the code to populate the sample ticker records. This simple
code continuously writes a random ticker record to the stream. Keep the script running so that you
can generate the application schema in a later step.
import json
import boto3
import random
import datetime
kinesis = boto3.client('kinesis')
def getReferrer():
data = {}
now = datetime.datetime.now()
str_now = now.isoformat()
data['EVENT_TIME'] = str_now
data['TICKER'] = random.choice(['AAPL', 'AMZN', 'MSFT', 'INTC', 'TBV'])
price = random.random() * 100
data['PRICE'] = round(price, 2)
return data
while True:
data = json.dumps(getReferrer())
print(data)
kinesis.put_record(
StreamName="ExampleInputStream",
Data=data,
PartitionKey="partitionkey")
6. In the SQL editor, write the application code, and verify the results as follows:
a. Copy the following application code and paste it into the editor.
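The application code is not shown in this excerpt. A sketch follows, assuming that the EVENT_TIME column was discovered as a TIMESTAMP; it groups on both the ROWTIME step and the event-time step, as described above:
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    EVENT_TIME TIMESTAMP,
    TICKER     VARCHAR(4),
    MIN_PRICE  REAL,
    MAX_PRICE  REAL);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
        SELECT STREAM STEP("SOURCE_SQL_STREAM_001".EVENT_TIME BY INTERVAL '60' SECOND) AS EVENT_TIME,
                      TICKER,
                      MIN(PRICE) AS MIN_PRICE,
                      MAX(PRICE) AS MAX_PRICE
        FROM "SOURCE_SQL_STREAM_001"
        GROUP BY TICKER,
                 STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND),
                 STEP("SOURCE_SQL_STREAM_001".EVENT_TIME BY INTERVAL '60' SECOND);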
On the Real-time analytics tab, you can see all the in-application streams that the application
created and verify the data.
Example: Retrieving the Most Frequently Occurring Values (TOP_K_ITEMS_TUMBLING)
The TOP_K_ITEMS_TUMBLING function is useful when aggregating over tens or hundreds of thousands
of keys, and you want to reduce your resource usage. The function produces the same result as
aggregating with GROUP BY and ORDER BY clauses.
In this example, you write the following records to an Amazon Kinesis data stream:
{"TICKER": "TBV"}
{"TICKER": "INTC"}
{"TICKER": "MSFT"}
{"TICKER": "AMZN"}
...
You then create a Kinesis Data Analytics application in the AWS Management Console, with the Kinesis
data stream as the streaming source. The discovery process reads sample records on the streaming
source and infers an in-application schema with one column (TICKER) as shown following.
You use the application code with the TOP_K_ITEMS_TUMBLING function to create a windowed
aggregation of the data. Then you insert the resulting data into another in-application stream, as shown
in the following screenshot:
In the following procedure, you create a Kinesis Data Analytics application that retrieves the most
frequently occurring values in the input stream.
Topics
• Step 1: Create a Kinesis Data Stream (p. 115)
• Step 2: Create the Kinesis Data Analytics Application (p. 116)
1. Sign in to the AWS Management Console and open the Kinesis console at https://
console.aws.amazon.com/kinesis.
2. Choose Data Streams in the navigation pane.
3. Choose Create Kinesis stream, and then create a stream with one shard. For more information, see
Create a Stream in the Amazon Kinesis Data Streams Developer Guide.
4. To write records to a Kinesis data stream in a production environment, we recommend using either
the Kinesis Producer Library or the Kinesis Data Streams API. For simplicity, this example uses the following
Python script to generate records. Run the code to populate the sample ticker records. This simple
code continuously writes a random ticker record to the stream. Leave the script running so that you
can generate the application schema in a later step.
import json
import boto3
import random
import datetime
kinesis = boto3.client('kinesis')
def getReferrer():
data = {}
now = datetime.datetime.now()
str_now = now.isoformat()
data['EVENT_TIME'] = str_now
data['TICKER'] = random.choice(['AAPL', 'AMZN', 'MSFT', 'INTC', 'TBV'])
price = random.random() * 100
data['PRICE'] = round(price, 2)
return data
while True:
data = json.dumps(getReferrer())
print(data)
kinesis.put_record(
StreamName="ExampleInputStream",
Data=data,
PartitionKey="partitionkey")
a. Copy the following application code and paste it into the editor:
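The application code is not shown in this excerpt. A sketch follows; the TOP_K_ITEMS_TUMBLING argument order (input cursor, column name, number of values, window size in seconds) is an assumption to confirm against the SQL Reference:
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "TICKER"               VARCHAR(4),
    "MOST_FREQUENT_VALUES" BIGINT);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
        SELECT STREAM *
        FROM TABLE (TOP_K_ITEMS_TUMBLING(
            CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001"),
            'TICKER',   -- column in which to find the most frequent values
            5,          -- number of most frequently occurring values to return
            60          -- tumbling window size, in seconds
        ));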
On the Real-time analytics tab, you can see all the in-application streams that the application
created and verify the data.
Example: Aggregating Partial Results from a Query
If data arrives late relative to its event time, a windowed aggregation can include records that arrived, but did not
necessarily occur, within the window. In this case, the tumbling window contains only a partial set of the
results that you want. There are several approaches that you can use to correct this issue:
• Use a tumbling window only, and aggregate partial results in post processing through a database
or data warehouse using upserts. This approach is efficient in processing an application. It handles
the late data indefinitely for aggregate operators (sum, min, max, and so on). The downside to this
approach is that you must develop and maintain additional application logic in the database layer.
• Use a tumbling and sliding window, which produces partial results early, but also continues to produce
complete results over the sliding window period. This approach handles late data with an overwrite
instead of an upsert so that no additional application logic needs to be added in the database layer.
The downside to this approach is that it uses more Kinesis processing units (KPUs) and still produces
two results, which might not work for some use cases.
For more information about tumbling and sliding windows, see Windowed Queries (p. 72).
In the following procedure, the tumbling window aggregation produces two partial results (sent to the
CALC_COUNT_SQL_STREAM in-application stream) that must be combined to produce a final result. The
application then produces a second aggregation (sent to the DESTINATION_SQL_STREAM in-application
stream) that combines the two partial results.
1. Sign in to the AWS Management Console and open the Kinesis console at https://
console.aws.amazon.com/kinesis.
2. Choose Data Analytics in the navigation pane. Create a Kinesis Data Analytics application as
described in the Getting Started with Amazon Kinesis Data Analytics for SQL Applications (p. 45) tutorial.
3. In the SQL editor, replace the application code with the following:
The SELECT statement in the application code filters rows in the SOURCE_SQL_STREAM_001 for
stock price changes greater than 1 percent and inserts those rows into another in-application stream
CHANGE_STREAM using a pump.
4. Choose Save and run SQL.
The first pump outputs a stream to CALC_COUNT_SQL_STREAM similar to the following. Note that the
result set is incomplete:
The second pump then outputs a stream to DESTINATION_SQL_STREAM that contains the complete
result set:
Examples: Joins
This section provides examples of Amazon Kinesis data analytics applications that use join queries.
Each example provides step-by-step instructions for setting up and testing your Kinesis data analytics
application.
Topics
• Example: Adding Reference Data to a Kinesis Data Analytics Application (p. 118)
• Amazon Kinesis Data Analytics for SQL Applications: How It Works (p. 3)
• Configuring Application Input (p. 5)
Example: Adding Reference Data to a Kinesis Data Analytics Application
In this exercise, you add reference data to the application you created in the Kinesis Data Analytics
Getting Started exercise. The reference data provides the company name for each ticker symbol; for
example:
Ticker, Company
AMZN,Amazon
ASD, SomeCompanyA
MMB, SomeCompanyB
WAS, SomeCompanyC
First, complete the steps in the Getting Started exercise to create a starter application. Then follow these
steps to set up and add reference data to your application:
1. Store the reference data as an object in your Amazon S3 bucket.
2. Add the reference data source to your application configuration.
Kinesis Data Analytics reads the Amazon S3 object and creates an in-application reference table that
you can query in your application code.
3. Test the code.
In your application code, you write a join query to join the in-application stream with the in-
application reference table, to get the company name for each ticker symbol.
Topics
• Step 1: Prepare (p. 119)
• Step 2: Add the Reference Data Source to the Application Configuration (p. 120)
• Step 3: Test: Query the In-Application Reference Table (p. 121)
Step 1: Prepare
In this section, you store sample reference data as an object in an Amazon S3 bucket. You also create an
IAM role that Kinesis Data Analytics can assume to read the object on your behalf.
1. Open a text editor, add the following data, and save the file as TickerReference.csv.
Ticker, Company
AMZN,Amazon
ASD, SomeCompanyA
MMB, SomeCompanyB
WAS, SomeCompanyC
2. Upload the TickerReference.csv file to your S3 bucket. For instructions, see Uploading Objects
into Amazon S3 in the Amazon Simple Storage Service Console User Guide.
1. In AWS Identity and Access Management (IAM), create an IAM role named KinesisAnalytics-
ReadS3Object. To create the role, follow the instructions in Creating a Role for an AWS Service
(AWS Management Console) in the IAM User Guide.
• For Select Role Type, choose AWS Lambda. After creating the role, you will change the trust
policy to allow Kinesis Data Analytics (not AWS Lambda) to assume the role.
• Do not attach any policy on the Attach Policy page.
2. Update the IAM role policies:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "kinesisanalytics.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
c. On the Permissions tab, attach an AWS managed policy called AmazonS3ReadOnlyAccess. This
grants the role permissions to read an Amazon S3 object. This policy is shown following:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:Get*",
"s3:List*"
],
"Resource": "*"
}
]
}
Step 2: Add the Reference Data Source to the Application Configuration
1. In the main page for the application, choose Connect reference data.
2. In the Connect reference data source page, choose the Amazon S3 bucket containing your reference
data object, and enter the object's key name.
3. Enter CompanyName for the In-application reference table name.
4. In the Access to chosen resources section, choose Choose from IAM roles that Kinesis Analytics
can assume, and choose the KinesisAnalytics-ReadS3Object IAM role you created in the previous
section.
5. Choose Discover schema. The console detects two columns in the reference data.
6. Choose Save and close.
Step 3: Test: Query the In-Application Reference Table
1. Replace your application code with the following. The query joins the in-application input stream
with the in-application reference table. The application code writes the results to another in-
application stream, DESTINATION_SQL_STREAM.
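A sketch of such a join follows; it assumes the demo stream's TICKER_SYMBOL and PRICE columns and the Ticker and Company columns discovered from the reference CSV (the exact code in the guide may differ):
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    "Company"     VARCHAR(32),
    price         REAL);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
        -- Look up the company name for each ticker in the CompanyName reference table.
        SELECT STREAM ticker_symbol, "c"."Company", price
        FROM "SOURCE_SQL_STREAM_001"
        LEFT JOIN "CompanyName" AS "c"
            ON "SOURCE_SQL_STREAM_001".ticker_symbol = "c"."Ticker";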
2. Verify that the application output appears in the SQLResults tab. Make sure that some of the rows
show company names (your sample reference data does not have all company names).
Examples: Machine Learning
Topics
• Example: Detecting Data Anomalies on a Stream (RANDOM_CUT_FOREST Function) (p. 121)
• Example: Detecting Data Anomalies and Getting an Explanation
(RANDOM_CUT_FOREST_WITH_EXPLANATION Function) (p. 127)
• Example: Detecting Hotspots on a Stream (HOTSPOTS Function) (p. 131)
Example: Detecting Data Anomalies on a Stream (RANDOM_CUT_FOREST Function)
In this exercise, you write application code to assign an anomaly score to records on your application's
streaming source. To set up the application, you do the following:
1. Set up a streaming source – You set up a Kinesis data stream and write sample heartRate data, as
shown following:
The procedure provides a Python script for you to populate the stream. The heartRate values are
randomly generated, with 99 percent of the records having heartRate values between 60 and
100, and only 1 percent of heartRate values between 150 and 200. Thus, the records that have
heartRate values between 150 and 200 are anomalies.
2. Configure input – Using the console, you create a Kinesis Data Analytics application and
configure the application input by mapping the streaming source to an in-application stream
(SOURCE_SQL_STREAM_001). When the application starts, Kinesis Data Analytics continuously reads
the streaming source and inserts records into the in-application stream.
3. Specify application code – The example uses the following application code (a sketch appears after this list):
The code reads rows in the SOURCE_SQL_STREAM_001, assigns an anomaly score, and writes
the resulting rows to another in-application stream (TEMP_STREAM). The application code then
sorts the records in the TEMP_STREAM and saves the results to another in-application stream
(DESTINATION_SQL_STREAM). You use pumps to insert rows in in-application streams. For more
information, see In-Application Streams and Pumps (p. 68).
4. Configure output – You configure the application output to persist data in the
DESTINATION_SQL_STREAM to an external destination, which is another Kinesis data stream.
Reviewing the anomaly scores that are assigned to each record and determining what score indicates
an anomaly (and that you need to be alerted) is external to the application. You can use an AWS
Lambda function to process these anomaly scores and configure alerts.
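The application code referred to in step 3 is not reproduced in this excerpt. A minimal sketch of code with that shape, assuming the heartRate column discovered from the sample records (the actual code may carry additional columns), is:
CREATE OR REPLACE STREAM "TEMP_STREAM" (
    "heartRate"     INTEGER,
    "ANOMALY_SCORE" DOUBLE);
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "heartRate"     INTEGER,
    "ANOMALY_SCORE" DOUBLE);
-- Assign an anomaly score to each record using the RANDOM_CUT_FOREST function.
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "TEMP_STREAM"
        SELECT STREAM "heartRate", ANOMALY_SCORE
        FROM TABLE(RANDOM_CUT_FOREST(
            CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001")));
-- Sort the scored records (highest score first within each second) into the destination stream.
CREATE OR REPLACE PUMP "OUTPUT_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
        SELECT STREAM * FROM "TEMP_STREAM"
        ORDER BY FLOOR("TEMP_STREAM".ROWTIME TO SECOND), ANOMALY_SCORE DESC;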
The exercise uses the US East (N. Virginia) (us-east-1) AWS Region to create these streams and your
application. If you use any other Region, you must update the code accordingly.
Topics
• Step 1: Prepare (p. 123)
• Step 2: Create an Application (p. 124)
• Step 3: Configure Application Output (p. 126)
• Step 4: Verify Output (p. 127)
Next Step
Step 1: Prepare
Before you create an Amazon Kinesis Data Analytics application for this exercise, you must create two
Kinesis data streams. Configure one of the streams as the streaming source for your application, and the
other stream as the destination where Kinesis Data Analytics persists your application output.
Topics
• Step 1.1: Create the Input and Output Data Streams (p. 123)
• Step 1.2: Write Sample Records to the Input Stream (p. 123)
1. Sign in to the AWS Management Console and open the Kinesis console at https://
console.aws.amazon.com/kinesis.
2. Choose Create data stream. Create a stream with one shard named ExampleInputStream. For
more information, see Create a Stream in the Amazon Kinesis Data Streams Developer Guide.
3. Repeat the previous step, creating a stream with one shard named ExampleOutputStream.
• To use the AWS CLI
1. Use the following Kinesis create-stream AWS CLI command to create the first stream
(ExampleInputStream).
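The command itself is not shown in this excerpt; for example:
aws kinesis create-stream --stream-name ExampleInputStream --shard-count 1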
2. Run the same command, changing the stream name to ExampleOutputStream. This command
creates the second stream that the application uses to write output.
You can install dependencies using pip. For information about installing pip, see Installation on the
pip website.
2. Run the following Python code. The put-record command in the code writes the JSON records to
the stream.
import json
import boto3
import random
kinesis = boto3.client('kinesis')
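# The getHighHeartRate and getNormalHeartRate helpers called below are not shown in this
# excerpt. The following definitions are a sketch that matches the value ranges described
# above; the actual script may include additional fields.
def getNormalHeartRate():
    return {'heartRate': random.randint(60, 100)}

def getHighHeartRate():
    return {'heartRate': random.randint(150, 200)}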
while True:
rnd = random.random()
if (rnd < 0.01):
data = json.dumps(getHighHeartRate())
print(data)
kinesis.put_record(
StreamName="ExampleInputStream",
Data=data,
PartitionKey="partitionkey")
else:
data = json.dumps(getNormalHeartRate())
print(data)
kinesis.put_record(
StreamName="ExampleInputStream",
Data=data,
PartitionKey="partitionkey")
Next Step
• Configure the application input to use the Kinesis data stream that you created in the section called
“Step 1: Prepare” (p. 123) as the streaming source.
• Use the Anomaly Detection template on the console.
To create an application
1. Follow steps 1, 2, and 3 in the Kinesis Data Analytics Getting Started exercise (see Step 3.1: Create
an Application (p. 49)).
Most of the heart rate values are normal, and the discovery process will most likely assign the
TINYINT type to this column. But a small percentage of the values show a high heart rate. If
these high values don't fit in the TINYINT type, Kinesis Data Analytics sends these rows to an
error stream. Update the data type to INTEGER so that it can accommodate all the generated
heart rate data.
• Use the Anomaly Detection template on the console. You then update the template code to
provide the appropriate column name.
2. Update the application code by providing column names. The resulting application code is shown
following (paste this code into the SQL editor):
3. Run the SQL code and review the results on the Kinesis Data Analytics console:
Next Step
You can now send the application results from the in-application stream to an external destination,
which is another Kinesis data stream (OutputStreamTestingAnomalyScores). You can analyze the
anomaly scores and determine which heart rate is anomalous. You can then extend this application
further to generate alerts.
1. Open the Amazon Kinesis Data Analytics console. In the SQL editor, choose either Destination or
Add a destination in the application dashboard.
2. On the Connect to destination page, choose the OutputStreamTestingAnomalyScores stream
that you created in the preceding section.
Now you have an external destination, where Amazon Kinesis Data Analytics persists any records
your application writes to the in-application stream DESTINATION_SQL_STREAM.
3. You can optionally configure AWS Lambda to monitor the OutputStreamTestingAnomalyScores
stream and send you alerts. For instructions, see Preprocessing Data Using a Lambda
Function (p. 21). If you don't set alerts, you can review the records that Kinesis
Data Analytics writes to the external destination, which is the Kinesis data stream
OutputStreamTestingAnomalyScores, as described in Step 4: Verify Output (p. 127).
Next Step
1. Run the get-shard-iterator command to get a pointer to data on the output stream.
You get a response with a shard iterator value, as shown in the following example response:
{
    "ShardIterator": "shard-iterator-value"
}
2. Run the get-records command with the shard iterator from the previous step. The command returns a
page of records and another shard iterator that you can use in the subsequent get-records command
to fetch the next set of records.
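The commands themselves are not shown above. Hedged sketches follow; the shard ID shown is the default first shard of a one-shard stream:
aws kinesis get-shard-iterator \
    --stream-name OutputStreamTestingAnomalyScores \
    --shard-id shardId-000000000000 \
    --shard-iterator-type TRIM_HORIZON

aws kinesis get-records --shard-iterator shard-iterator-value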
In this exercise, you write application code to obtain anomaly scores for records in your application's
streaming source. You also obtain an explanation for each anomaly.
Topics
• Step 1: Prepare the Data (p. 128)
• Step 2: Create an Analytics Application (p. 129)
• Step 3: Examine the Results (p. 130)
Topics
• Step 1.1: Create a Kinesis Data Stream (p. 123)
• Step 1.2: Write Sample Records to the Input Stream (p. 123)
1. Sign in to the AWS Management Console and open the Kinesis console at https://console.aws.amazon.com/kinesis.
2. Choose Data Streams in the navigation pane. Then choose Create Kinesis stream.
3. For the name, type ExampleInputStream. For the number of shards, type 1.
• Alternatively, to use the AWS CLI to create the data stream, run the following command:
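The command is not reproduced in this excerpt; it looks similar to the following (this sketch assumes that your default Region is configured):

aws kinesis create-stream --stream-name ExampleInputStream --shard-count 1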
You can install dependencies using pip. For information about installing pip, see Installation in the
pip documentation.
2. Run the following Python code. You can change the Region to the one you want to use for this
example. The put-record command in the code writes the JSON records to the stream.
import json
import boto3
import random

kinesis = boto3.client('kinesis')
while True:
    rnd = random.random()
    if (rnd < 0.005):
        data = json.dumps(getLowBloodPressure())
        print(data)
        kinesis.put_record(
            StreamName="BloodPressureExampleInputStream",
            Data=data,
            PartitionKey="partitionkey")
    elif (rnd > 0.995):
        data = json.dumps(getHighBloodPressure())
        print(data)
        kinesis.put_record(
            StreamName="BloodPressureExampleInputStream",
            Data=data,
            PartitionKey="partitionkey")
    else:
        data = json.dumps(getNormalBloodPressure())
        print(data)
        kinesis.put_record(
            StreamName="BloodPressureExampleInputStream",
            Data=data,
            PartitionKey="partitionkey")
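The blood pressure generator functions are defined in the full Step 1 script and are not reproduced in this excerpt. A minimal sketch, assuming each record carries the Systolic, Diastolic, and BloodPressureLevel fields that the application code selects later (the value ranges are assumptions):

def getNormalBloodPressure():
    return {'Systolic': random.randint(90, 120),
            'Diastolic': random.randint(60, 80),
            'BloodPressureLevel': 'NORMAL'}

def getHighBloodPressure():
    return {'Systolic': random.randint(140, 200),
            'Diastolic': random.randint(90, 150),
            'BloodPressureLevel': 'HIGH'}

def getLowBloodPressure():
    return {'Systolic': random.randint(50, 80),
            'Diastolic': random.randint(30, 50),
            'BloodPressureLevel': 'LOW'}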
To create an application
6. Under Real time analytics, choose Go to SQL editor. When prompted, choose to run your
application.
7. Paste the following code into the SQL editor, and then choose Save and run SQL.
-- Compute an anomaly score with explanation for each record in the input stream
-- using RANDOM_CUT_FOREST_WITH_EXPLANATION
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "TEMP_STREAM"
    SELECT STREAM "Systolic", "Diastolic", "BloodPressureLevel",
           ANOMALY_SCORE, ANOMALY_EXPLANATION
    FROM TABLE(RANDOM_CUT_FOREST_WITH_EXPLANATION(
        CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001"),
        100, 256, 100000, 1, true));
Example: Detect Hotspots
In this exercise, you write application code to locate hotspots on your application's streaming source. To
set up the application, you do the following steps:
1. Set up a streaming source – You set up a Kinesis stream and write sample coordinate data as shown
following:
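For example, a record might look similar to the following (the values are illustrative):

{"x": 7.92, "y": 8.74, "is_hot": "N"}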
The example provides a Python script for you to populate the stream. The x and y values are
randomly generated, with some records being clustered around certain locations.
The is_hot field indicates whether the script intentionally generated the value as part of a hotspot. This can help you evaluate whether the hotspot detection function is working properly.
2. Create the application – Using the AWS Management Console, you then create a Kinesis data
analytics application. Configure the application input by mapping the streaming source to an in-
application stream (SOURCE_SQL_STREAM_001). When the application starts, Kinesis Data Analytics
continuously reads the streaming source and inserts records into the in-application stream.
In this exercise, you use the following code for the application:
"y" DOUBLE,
"is_hot" VARCHAR(4),
HOTSPOTS_RESULT VARCHAR(10000)
);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
INSERT INTO "DESTINATION_SQL_STREAM"
SELECT "x", "y", "is_hot", "HOTSPOTS_RESULT"
FROM TABLE (
HOTSPOTS(
CURSOR(SELECT STREAM "x", "y", "is_hot" FROM "SOURCE_SQL_STREAM_001"),
1000,
0.2,
17)
);
The code reads rows in the SOURCE_SQL_STREAM_001 in-application stream, analyzes them for significant hotspots, and writes the resulting data to another in-application stream (DESTINATION_SQL_STREAM). You use pumps to insert rows in in-application streams. For more information, see In-Application Streams and Pumps (p. 68).
3. Configure the output – You configure the application output to send data from the application
to an external destination, which is another Kinesis data stream. Review the hotspot scores and
determine what scores indicate that a hotspot occurred (and that you need to be alerted). You can
use an AWS Lambda function to further process hotspot information and configure alerts.
4. Verify the output – The example includes a JavaScript application that reads data from the output
stream and displays it graphically, so you can view the hotspots that the application generates in
real time.
The exercise uses the US West (Oregon) (us-west-2) AWS Region to create these streams and your
application. If you use any other Region, update the code accordingly.
Topics
• Step 1: Create the Input and Output Streams (p. 132)
• Step 2: Create the Kinesis Data Analytics Application (p. 135)
• Step 3: Configure the Application Output (p. 136)
• Step 4: Verify the Application Output (p. 136)
Topics
• Step 1.1: Create the Kinesis Data Streams (p. 132)
• Step 1.2: Write Sample Records to the Input Stream (p. 133)
Create these data streams using the console or the AWS CLI.
1. Sign in to the AWS Management Console and open the Kinesis console at https://console.aws.amazon.com/kinesis.
2. Choose Data Streams in the navigation pane.
3. Choose Create Kinesis stream, and create a stream with one shard named
ExampleInputStream.
4. Repeat the previous step, creating a stream with one shard named ExampleOutputStream.
• To create data streams using the AWS CLI:
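The commands are not reproduced in this excerpt; they look similar to the following sketches:

aws kinesis create-stream --stream-name ExampleInputStream --shard-count 1
aws kinesis create-stream --stream-name ExampleOutputStream --shard-count 1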
You can install dependencies using pip. For information about installing pip, see Installation on the
pip website.
2. Run the following Python code. This code does the following:
Important
Do not upload this file to a web server because it contains your AWS credentials.
import boto3
import json
import time
from random import random   # random() is used by update_hotspot below
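# The configuration values below do not appear in this excerpt of the script.
# The x and y ranges match the viewer script later in this example; the hotspot
# side length and weight are illustrative assumptions.
xRange = [0, 10]          # range of generated x values
yRange = [0, 10]          # range of generated y values
hotspotSideLength = 1     # side length of the square hotspot
hotspotWeight = 0.2       # probability that a point is drawn from the hotspot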
class RecordGenerator(object):
    """A class used to generate points used as input to the hotspot detection
    algorithm. With probability hotspotWeight, a point is drawn from a hotspot;
    otherwise it is drawn from the base distribution. The location of the
    hotspot changes after every 1000 points generated."""

    def __init__(self):
        self.x_min = xRange[0]
        self.width = xRange[1] - xRange[0]
        self.y_min = yRange[0]
        self.height = yRange[1] - yRange[0]
        self.points_generated = 0
        self.hotspot_x_min = None
        self.hotspot_y_min = None

    def get_record(self):
        if self.points_generated % 1000 == 0:
            self.update_hotspot()
        self.points_generated += 1
        # The code that builds the record (drawing a point from the hotspot or the
        # base distribution and setting the is_hot flag) is not shown in this excerpt.
        data = json.dumps(record)
        return {'Data': bytes(data, 'utf-8'), 'PartitionKey': 'partition_key'}

    def update_hotspot(self):
        self.hotspot_x_min = self.x_min + random() * (self.width - hotspotSideLength)
        self.hotspot_y_min = self.y_min + random() * (self.height - hotspotSideLength)
def main():
    kinesis = boto3.client('kinesis')
    generator = RecordGenerator()
    batch_size = 10

    while True:
        # get_records (not shown in this excerpt) batches batch_size calls to get_record
        records = generator.get_records(batch_size)
        print(records)
        kinesis.put_records(StreamName="ExampleInputStream", Records=records)
        time.sleep(0.1)

if __name__ == "__main__":
    main()
• Configure the application input to use the Kinesis data stream you created as the streaming source in
Step 1 (p. 132).
• Use the provided application code in the AWS Management Console.
To create an application
1. Create a Kinesis data analytics application by following steps 1, 2, and 3 in the Getting Started
exercise (see Step 3.1: Create an Application (p. 49)).
• Specify the streaming source you created in the section called “Step 1: Create Streams” (p. 132).
• After the console infers the schema, edit the schema. Ensure that the x and y column types are set
to DOUBLE and that the IS_HOT column type is set to VARCHAR.
2. Use the following application code (you can paste this code into the SQL editor):
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    "x" DOUBLE,
    "y" DOUBLE,
    "is_hot" VARCHAR(4),
    HOTSPOTS_RESULT VARCHAR(10000)
);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT "x", "y", "is_hot", "HOTSPOTS_RESULT"
    FROM TABLE (
        HOTSPOTS(
            CURSOR(SELECT STREAM "x", "y", "is_hot" FROM "SOURCE_SQL_STREAM_001"),
            1000,
            0.2,
            17)
    );
You can now send the application result from the in-application stream to an external destination, which
is another Kinesis data stream (ExampleOutputStream). You can then analyze the hotspot scores and
determine what an appropriate threshold is for hotspot heat. You can extend this application further to
generate alerts.
Now you have an external destination, where Amazon Kinesis Data Analytics persists any records your application writes to the in-application stream DESTINATION_SQL_STREAM.
4. You can optionally configure AWS Lambda to monitor the ExampleOutputStream stream and
send you alerts. For more information, see Using a Lambda Function as Output (p. 34). You can also
review the records that Kinesis Data Analytics writes to the external destination, which is the Kinesis
stream ExampleOutputStream, as described in Step 4: Verify the Application Output (p. 136).
<!doctype html>
<html lang=en>
<head>
<meta charset=utf-8>
<title>hotspots viewer</title>
<style>
#visualization {
display: block;
margin: auto;
}
.point {
opacity: 0.2;
}
.hot {
fill: red;
}
.cold {
fill: blue;
}
.hotspot {
stroke: black;
stroke-opacity: 0.8;
stroke-width: 1;
fill: none;
}
</style>
<script src="https://sdk.amazonaws.com/js/aws-sdk-2.202.0.min.js"></script>
<script src="https://d3js.org/d3.v4.min.js"></script>
</head>
<body>
<svg id="visualization" width="600" height="600"></svg>
<script src="hotspots_viewer.js"></script>
</body>
</html>
2. Create a file in the same directory named hotspots_viewer.js with the following contents.
Provide your AWS Region, credentials, and output stream name in the variables provided.
// Visualize example output from the Kinesis Analytics hotspot detection algorithm.
// This script assumes that the output stream has a single shard.

// The variables in this section should reflect the way the input data was generated and
// the parameters that the HOTSPOTS function was called with.
var windowSize = 1000,        // The window size used for hotspot detection
    minimumDensity = 40,      // A filter applied to returned hotspots before visualization
    xRange = [0, 10],         // The range of values to display on the x-axis
    yRange = [0, 10];         // The range of values to display on the y-axis
///////////////////////////////////////////////////////////////////////////////////////////////////
// D3 setup
///////////////////////////////////////////////////////////////////////////////////////////////////

// Return the linear function that maps the segment [a, b] to the segment [c, d].
function linearScale(a, b, c, d) {
    var m = (d - c) / (b - a);
    return function(x) {
        return c + m * (x - a);
    };
}

// helper functions to extract the x-value from a stream record and scale it for output
var xValue = function(r) { return r.x; },
    xScale = linearScale(xRange[0], xRange[1], 0, graphWidth),
    xMap = function(r) { return xScale(xValue(r)); };

// helper functions to extract the y-value from a stream record and scale it for output
var yValue = function(r) { return r.y; },
    yScale = linearScale(yRange[0], yRange[1], 0, graphHeight),
    yMap = function(r) { return yScale(yValue(r)); };

// a helper function that assigns a CSS class to a point based on whether it was
// generated as part of a hotspot
var classMap = function(r) { return r.is_hot == "Y" ? "point hot" : "point cold"; };

// Note: the SVG selection setup (svg, margin, graphWidth, graphHeight) and the opening of
// the update(records, hotspots) function that the following code belongs to are not shown
// in this excerpt.
var g = svg.append("g")
    .attr("transform", "translate(" + margin.left + "," + margin.top + ")");

points.enter().append("circle")
    .attr("class", classMap)
    .attr("r", 3)
    .attr("cx", xMap)
    .attr("cy", yMap);

points.exit().remove();

if (hotspots) {
    var boxes = g.selectAll("rect").data(hotspots);
    boxes.enter().append("rect")
        .merge(boxes)
        .attr("class", "hotspot")
        .attr("x", function(h) { return xScale(h.minValues[0]); })
        .attr("y", function(h) { return yScale(h.minValues[1]); })
        .attr("width", function(h) { return xScale(h.maxValues[0]) - xScale(h.minValues[0]); })
        .attr("height", function(h) { return yScale(h.maxValues[1]) - yScale(h.minValues[1]); });
    boxes.exit().remove();
}
}
///////////////////////////////////////////////////////////////////////////////////////////////////
// Use the AWS SDK to pull output records from Kinesis and update the visualization
///////////////////////////////////////////////////////////////////////////////////////////////////
// Fetch new records from the shard iterator, append them to records, and update the
// visualization
function getRecordsAndUpdateVisualization(shardIterator, records, lastRecordIndex) {
    kinesis.getRecords({
        "ShardIterator": shardIterator
    }, function(err, data) {
        if (err) {
            console.log(err, err.stack);
            return;
        }

        // Note: the code that appends the fetched records to records and extracts the
        // hotspots is not shown in this excerpt.
        update(records, hotspots);

        getRecordsAndUpdateVisualization(data.NextShardIterator, records, lastRecordIndex);
    });
}
// Get a shard iterator for the output stream and begin updating the visualization. Note
// that this script will only read records from the first shard in the stream.
function init() {
    kinesis.describeStream({
        "StreamName": outputStream
    }, function(err, data) {
        if (err) {
            console.log(err, err.stack);
            return;
        }

        // Note: the line that reads the first shard ID from the DescribeStream response
        // is not shown in this excerpt.
        kinesis.getShardIterator({
            "StreamName": outputStream,
            "ShardId": shardId,
            "ShardIteratorType": "LATEST"
        }, function(err, data) {
            if (err) {
                console.log(err, err.stack);
                return;
            }
            getRecordsAndUpdateVisualization(data.ShardIterator, [], 0);
        })
    });
}
3. With the Python code from the first section running, open index.html in a web browser. The
hotspot information appears on the page, as shown following.
Alerts and Errors
Topics
• Example: Creating Simple Alerts (p. 141)
• Example: Creating Throttled Alerts (p. 142)
• Example: Exploring the In-Application Error Stream (p. 143)
If any rows show a stock price change that is greater than 1 percent, those rows are inserted into another
in-application stream. In the exercise, you can configure the application output to persist the results to
an external destination. You can then further investigate the results. For example, you can use an AWS
Lambda function to process records and send you alerts.
1. Create the analytics application as described in the Kinesis Data Analytics Getting Started exercise.
2. In the SQL editor in Kinesis Data Analytics, replace the application code with the following:
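The code itself is not reproduced in this excerpt. A minimal sketch of the kind of statement described next, assuming the demo stream's ticker_symbol, sector, change, and price columns (the column list and types are assumptions):

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM"
    (ticker_symbol VARCHAR(4), sector VARCHAR(12), change DOUBLE, price DOUBLE);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM ticker_symbol, sector, change, price
    FROM "SOURCE_SQL_STREAM_001"
    WHERE (ABS(change / (price - change)) * 100) > 1;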
The SELECT statement in the application code filters rows in the SOURCE_SQL_STREAM_001 for
stock price changes greater than 1 percent. It then inserts those rows into another in-application
stream DESTINATION_SQL_STREAM using a pump. For more information about the coding pattern
that explains using pumps to insert rows into in-application streams, see Application Code (p. 31).
3. Choose Save and run SQL.
4. Add a destination. To do this, either choose the Destination tab in the SQL editor or choose Add a
destination on the application details page.
a. In the SQL editor, choose the Destination tab, and then choose Connect to a destination.
Now you have an external destination, a Kinesis data stream, where Kinesis Data Analytics persists
your application output in the DESTINATION_SQL_STREAM in-application stream.
5. Configure AWS Lambda to monitor the Kinesis stream you created and invoke a Lambda function.
For instructions, see Preprocessing Data Using a Lambda Function (p. 21).
1. Create a Kinesis data analytics application as described in the Kinesis Data Analytics Getting Started
exercise.
2. In the SQL editor in Kinesis Data Analytics, replace the application code with the following:
The SELECT statement in the application code filters rows in the SOURCE_SQL_STREAM_001 for
stock price changes greater than 1 percent and inserts those rows into another in-application stream
CHANGE_STREAM using a pump.
The application then creates a second stream named TRIGGER_COUNT_STREAM for the throttled
alerts. A second query selects records from a window that hops forward every time a record is
admitted into it, such that only one record per stock ticker per minute is written to the stream.
3. Choose Save and run SQL.
You perform the following exercises on the console. In these examples, you introduce errors in the input
configuration by editing the schema that is inferred by the discovery process, and then you verify the
rows that are sent to the error stream.
Topics
• Introducing a Parse Error (p. 143)
• Introducing a Divide by Zero Error (p. 144)
1. Create a Kinesis data analytics application as described in the Kinesis Data Analytics Getting Started
exercise.
2. On the application details page, choose Connect streaming data.
3. If you followed the Getting Started exercise, you have a demo stream (kinesis-analytics-
demo-stream) in your account. On the Connect to source page, choose this demo stream.
4. Kinesis Data Analytics takes a sample from the demo stream to infer a schema for the in-application
input stream it creates. The console shows the inferred schema and sample data in the Formatted
stream sample tab.
5. Next, edit the schema and modify the column type to introduce the parse error. Choose Edit
schema.
6. Change the TICKER_SYMBOL column type from VARCHAR(4) to INTEGER.
Now that the column type in the in-application schema is invalid, Kinesis Data Analytics can't bring data into the in-application stream. Instead, it sends the rows to the error stream.
7. Choose Save schema.
8. Choose Refresh schema samples.
Notice that there are no rows in the Formatted stream sample. However, the Error stream tab
shows data with an error message. The Error stream tab shows data sent to the in-application error
stream.
Because you changed the column data type, Kinesis Data Analytics could not bring the data into the in-application input stream. It sent the data to the error stream instead.
1. Create a Kinesis data analytics application as described in the Kinesis Data Analytics Getting Started
exercise.
2. Update the SELECT statement in the application code to introduce divide by zero; for example:
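The modified statement is not reproduced in this excerpt. Any expression that divides by zero works; for example, a SELECT list item similar to the following (the column names assume the demo stream schema):

SELECT STREAM ticker_symbol, sector, change, (price / 0) AS ProblemColumn
FROM "SOURCE_SQL_STREAM_001";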
Because the division by zero runtime error occurs, instead of writing the results to the
DESTINATION_SQL_STREAM, Kinesis Data Analytics sends rows to the in-application error stream.
On the Real-time analytics tab, choose the error stream, and then you can see the rows in the in-
application error stream.
Real-Time IoT Device Monitoring with Kinesis Data Analytics
Security is a shared responsibility between AWS and you. The shared responsibility model describes this
as security of the cloud and security in the cloud:
• Security of the cloud – AWS is responsible for protecting the infrastructure that runs AWS services
in the AWS Cloud. AWS also provides you with services that you can use securely. The effectiveness
of our security is regularly tested and verified by third-party auditors as part of the AWS compliance
programs. To learn about the compliance programs that apply to Kinesis Data Analytics, see AWS
Services in Scope by Compliance Program.
• Security in the cloud – Your responsibility is determined by the AWS service that you use. You are also
responsible for other factors including the sensitivity of your data, your organization’s requirements,
and applicable laws and regulations.
This documentation helps you understand how to apply the shared responsibility model when using
Kinesis Data Analytics. The following topics show you how to configure Kinesis Data Analytics to meet
your security and compliance objectives. You'll also learn how to use other AWS services that can help
you to monitor and secure your Kinesis Data Analytics resources.
Topics
• Data Protection in Amazon Kinesis Data Analytics for SQL Applications (p. 146)
• Identity and Access Management in Kinesis Data Analytics (p. 147)
• Monitoring Amazon Kinesis Data Analytics (p. 150)
• Compliance Validation for Amazon Kinesis Data Analytics for SQL Applications (p. 150)
• Resilience in Amazon Kinesis Data Analytics (p. 150)
• Infrastructure Security in Kinesis Data Analytics for SQL Applications (p. 151)
• Security Best Practices for Kinesis Data Analytics (p. 151)
• You can encrypt data on the incoming Kinesis data stream using StartStreamEncryption. For more
information, see What Is Server-Side Encryption for Kinesis Data Streams?.
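For example, a minimal sketch of enabling server-side encryption on the example input stream with the AWS CLI, assuming the AWS managed key for Kinesis:

aws kinesis start-stream-encryption --stream-name ExampleInputStream --encryption-type KMS --key-id alias/aws/kinesis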
• Output data can be encrypted at rest using Kinesis Data Firehose to store data in an encrypted
Amazon S3 bucket. You can specify the encryption key that your Amazon S3 bucket uses. For more
information, see Protecting Data Using Server-Side Encryption with AWS KMS–Managed Keys (SSE-
KMS).
• Your application's code is encrypted at rest.
• Your application's reference data is encrypted at rest.
Encryption In Transit
Kinesis Data Analytics encrypts all data in transit. Encryption in transit is enabled for all Kinesis Data
Analytics applications and cannot be disabled.
Key Management
Data encryption in Kinesis Data Analytics uses service-managed keys. Customer-managed keys are not
supported.
You can grant these permissions by creating an IAM role that Amazon Kinesis Data Analytics can assume.
Permissions that you grant to this role determine what Amazon Kinesis Data Analytics can do when the
service assumes the role.
Note
The information in this section is useful if you want to create an IAM role yourself. When you
create an application in the Amazon Kinesis Data Analytics console, the console can create an
IAM role for you at that time. The console uses the following naming convention for IAM roles
that it creates:
kinesis-analytics-ApplicationName
After the role is created, you can review the role and attached policies in the IAM console.
Each IAM role has two policies attached to it. In the trust policy, you specify who can assume the role. In
the permissions policy (there can be one or more), you specify the permissions that you want to grant to
this role. The following sections describe these policies, which you can use when you create an IAM role.
Trust Policy
To grant Amazon Kinesis Data Analytics permissions to assume a role to access a streaming or reference
source, you can attach the following trust policy to an IAM role:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "kinesisanalytics.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
Permissions Policy
If you are creating an IAM role to allow Amazon Kinesis Data Analytics to read from an application's
streaming source, you must grant permissions for relevant read actions. Depending on your source (for
example, an Kinesis stream, a Kinesis Data Firehose delivery stream, or a reference source in an Amazon
S3 bucket), you can attach the following permissions policy.
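For example, for a Kinesis data stream as the streaming source, a policy along the lines of the following sketch grants the read actions that are needed (the Sid and the resource ARN are placeholders; the full policy in this guide also covers Kinesis Data Firehose and Amazon S3 sources):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadInputKinesis",
            "Effect": "Allow",
            "Action": [
                "kinesis:DescribeStream",
                "kinesis:GetShardIterator",
                "kinesis:GetRecords"
            ],
            "Resource": [
                "arn:aws:kinesis:aws-region:aws-account-id:stream/inputStreamName"
            ]
        }
    ]
}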
Note
The firehose:Get* permission refers to an internal accessor that Kinesis Data Analytics uses
to access the stream. There is no public accessor for a Kinesis Data Firehose delivery stream.
If you direct Amazon Kinesis Data Analytics to write output to external destinations in your application
output configuration, you need to grant the following permission to the IAM role.
"Effect": "Allow",
"Action": [
"s3:Get*",
"s3:List*"
],
"Resource": "*"
}
]
}
For a list of AWS services in scope of specific compliance programs, see AWS Services in Scope by
Compliance Program. For general information, see AWS Compliance Programs.
You can download third-party audit reports using AWS Artifact. For more information, see Downloading
Reports in AWS Artifact.
Your compliance responsibility when using Kinesis Data Analytics is determined by the sensitivity of your
data, your company's compliance objectives, and applicable laws and regulations. If your use of Kinesis
Data Analytics is subject to compliance with standards such as HIPAA or PCI, AWS provides resources to
help:
• Security and Compliance Quick Start Guides – These deployment guides discuss architectural
considerations and provide steps for deploying security- and compliance-focused baseline
environments on AWS.
• Architecting for HIPAA Security and Compliance Whitepaper – This whitepaper describes how
companies can use AWS to create HIPAA-compliant applications.
• AWS Compliance Resources – This collection of workbooks and guides might apply to your industry
and location.
• AWS Config – This AWS service assesses how well your resource configurations comply with internal
practices, industry guidelines, and regulations.
• AWS Security Hub – This AWS service provides a comprehensive view of your security state within AWS
that helps you check your compliance with security industry standards and best practices.
Availability Zones are more highly available, fault tolerant, and scalable than traditional single or
multiple data center infrastructures.
For more information about AWS Regions and Availability Zones, see AWS Global Infrastructure.
In addition to the AWS global infrastructure, Kinesis Data Analytics offers several features to help
support your data resiliency and backup needs.
Disaster Recovery
Kinesis Data Analytics runs in a serverless mode, and takes care of host degradations, Availability Zone availability, and other infrastructure-related issues by performing automatic migration. When this happens, Kinesis Data Analytics ensures that the application continues processing without any loss of data. For more information, see Delivery Model for Persisting Application Output to an External Destination (p. 40).
You use AWS published API calls to access Kinesis Data Analytics through the network. Clients must
support Transport Layer Security (TLS) 1.0 or later. We recommend TLS 1.2 or later. Clients must also
support cipher suites with perfect forward secrecy (PFS) such as Ephemeral Diffie-Hellman (DHE) or
Elliptic Curve Ephemeral Diffie-Hellman (ECDHE). Most modern systems such as Java 7 and later support
these modes.
Additionally, requests must be signed by using an access key ID and a secret access key that is associated
with an IAM principal. Or you can use the AWS Security Token Service (AWS STS) to generate temporary
security credentials to sign requests.
Do not store long-term AWS credentials (such as access keys) in your application. Long-term credentials are not automatically rotated and could have a significant business impact if they are compromised. Instead, use an IAM role to manage temporary credentials for your application to access other resources. When you use a role, you don't have to use long-term credentials (such as a user name and password or access keys) to access other resources.
For more information, see the following topics in the IAM User Guide:
• IAM Roles
• Common Scenarios for Roles: Users, Applications, and Services
Using the information collected by CloudTrail, you can determine the request that was made to Kinesis
Data Analytics, the IP address from which the request was made, who made the request, when it was
made, and additional details.
For more information, see the section called “Using AWS CloudTrail” (p. 162).
The next step is to establish a baseline for normal Amazon Kinesis Data Analytics performance in your
environment, by measuring performance at various times and under different load conditions. As you
monitor Amazon Kinesis Data Analytics, you can store historical monitoring data. If you do, you can
compare it with current performance data, identify normal performance patterns and performance
anomalies, and devise methods to address issues.
With Amazon Kinesis Data Analytics, you monitor the application. The application processes data
streams (input or output), both of which include identifiers which you can use to narrow your search on
CloudWatch logs. For information about how Amazon Kinesis Data Analytics processes data streams, see
Amazon Kinesis Data Analytics for SQL Applications: How It Works (p. 3).
The most important metric is millisBehindLatest, which indicates how far behind an application is reading from the streaming source. In a typical case, the milliseconds behind should be at or near zero. Brief spikes are common and appear as an increase in millisBehindLatest.
We recommend that you set up a CloudWatch alarm that triggers when the application is behind by
more than an hour reading the streaming source. For some use cases that require very close to real-time
processing, such as emitting processed data to a live application, you might choose to set the alarm at a
lower value, such as five minutes.
Topics
• Monitoring Tools (p. 153)
• Monitoring with Amazon CloudWatch (p. 154)
• Logging Kinesis Data Analytics API Calls with AWS CloudTrail (p. 162)
Monitoring Tools
AWS provides various tools that you can use to monitor Amazon Kinesis Data Analytics. You can
configure some of these tools to do the monitoring for you, while some of the tools require manual
intervention. We recommend that you automate monitoring tasks as much as possible.
Automated Tools
• Amazon CloudWatch Alarms – Watch a single metric over a time period that you specify, and perform
one or more actions based on the value of the metric relative to a given threshold over a number of
time periods. The action is a notification sent to an Amazon Simple Notification Service (Amazon SNS)
topic or Amazon EC2 Auto Scaling policy. CloudWatch alarms do not invoke actions simply because
they are in a particular state; the state must have changed and been maintained for a specified
number of periods. For more information, see Monitoring with Amazon CloudWatch (p. 154).
• Amazon CloudWatch Logs – Monitor, store, and access your log files from AWS CloudTrail or other
sources. For more information, see Monitoring Log Files in the Amazon CloudWatch User Guide.
• Amazon CloudWatch Events – Match events and route them to one or more target functions or
streams to make changes, capture state information, and take corrective action. For more information,
see What is Amazon CloudWatch Events in the Amazon CloudWatch User Guide.
• AWS CloudTrail Log Monitoring – Share log files between accounts, monitor CloudTrail log files in real
time by sending them to CloudWatch Logs, write log processing applications in Java, and validate that
your log files have not changed after delivery by CloudTrail. For more information, see Working with
CloudTrail Log Files in the AWS CloudTrail User Guide.
Topics
• Kinesis Data Analytics Metrics and Dimensions (p. 155)
• Viewing Amazon Kinesis Data Analytics Metrics and Dimensions (p. 156)
• Creating CloudWatch Alarms to Monitor Amazon Kinesis Data Analytics (p. 157)
• Working with Amazon CloudWatch Logs (p. 158)
(The tables of Kinesis Data Analytics metrics, at the application level and per input stream, with their descriptions, are not reproduced in this excerpt.)
Amazon Kinesis Data Analytics provides metrics for the following dimensions.
(The table of dimensions and their descriptions is not reproduced in this excerpt.)
On the console, metrics are grouped first by service namespace, and then by the dimension combinations
within each namespace.
• Application
• Input stream
• Output stream
Alarms invoke actions for sustained state changes only. For a CloudWatch alarm to invoke an action, the
state must have changed and been maintained for a specified amount of time.
You can set alarms using the AWS Management Console, CloudWatch AWS CLI, or CloudWatch API, as
described following.
1. Sign in to the AWS Management Console and open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.
2. Choose Create Alarm. The Create Alarm Wizard starts.
3. Choose Kinesis Analytics Metrics. Then scroll through the Amazon Kinesis Data Analytics metrics to
locate the metric you want to place an alarm on.
To display just the Amazon Kinesis Data Analytics metrics, search for the name of your application. Choose the metric to create an alarm for, and then choose Next.
4. Enter values for Name, Description, and Whenever for the metric.
5. If you want CloudWatch to send you an email when the alarm state is reached, in the Whenever this
alarm: field, choose State is ALARM. In the Send notification to: field, choose an existing SNS topic.
If you select Create topic, you can set the name and email addresses for a new email subscription
list. This list is saved and appears in the field for future alarms.
Note
If you use Create topic to create a new Amazon SNS topic, the email addresses must be
verified before they receive notifications. Emails are only sent when the alarm enters an
alarm state. If this alarm state change happens before the email addresses are verified, they
do not receive a notification.
6. In the Alarm Preview section, preview the alarm you’re about to create.
7. Choose Create Alarm to create the alarm.
• Call mon-put-metric-alarm. For more information, see the Amazon CloudWatch CLI Reference.
• Call PutMetricAlarm. For more information, see the Amazon CloudWatch API Reference.
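As an illustration, an alarm on millisBehindLatest could also be created with the aws cloudwatch put-metric-alarm command. The namespace, dimension names, and values below are assumptions for a hypothetical application; check them against the metrics and dimensions for your own application:

aws cloudwatch put-metric-alarm \
    --alarm-name myapp-falling-behind \
    --namespace AWS/KinesisAnalytics \
    --metric-name MillisBehindLatest \
    --dimensions Name=Application,Value=myapp Name=Flow,Value=Input Name=Id,Value=1.1 \
    --statistic Average \
    --period 60 \
    --evaluation-periods 1 \
    --threshold 3600000 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --alarm-actions arn:aws:sns:us-east-1:123456789012:my-alerts-topic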
Amazon Kinesis Data Analytics can generate configuration errors under the following conditions:
For more information about Amazon CloudWatch, see the Amazon CloudWatch User Guide.
Trust Policy
To grant Kinesis Data Analytics permissions to assume an IAM role, you can attach the following trust
policy to the role.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "kinesisanalytics.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
Permissions Policy
To grant an application permissions to write log events to CloudWatch from a Kinesis Data Analytics
resource, you can use the following IAM permissions policy.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Stmt0123456789000",
"Effect": "Allow",
"Action": [
"logs:PutLogEvents"
],
"Resource": [
"arn:aws:logs:us-east-1:123456789012:log-group:my-log-group:log-stream:my-
log-stream*"
]
}
]
}
The following example shows how to include a CloudWatch logging option in a CreateApplication request body:

{
    "ApplicationCode": "<The SQL code the new application will run on the input stream>",
    "ApplicationDescription": "<A friendly description for the new application>",
    "ApplicationName": "<The name for the new application>",
    "Inputs": [ ... ],
    "Outputs": [ ... ],
    "CloudWatchLoggingOptions": [{
        "LogStreamARN": "<Amazon Resource Name (ARN) of the CloudWatch log stream to add to the new application>",
        "RoleARN": "<ARN of the role to use to access the log>"
    }]
}
To add a CloudWatch logging option to an existing application, use the AddApplicationCloudWatchLoggingOption action. The rest of this request body is not shown in the excerpt; the CloudWatchLoggingOption object takes the same LogStreamARN and RoleARN fields as in the preceding example:

{
    "ApplicationName": "<Name of the application to add the log option to>",
    "CloudWatchLoggingOption": {
        "LogStreamARN": "<ARN of the CloudWatch log stream to add>",
        "RoleARN": "<ARN of the role to use to access the log>"
    }
}
To update an existing CloudWatch logging option, use the UpdateApplication action:

{
    "ApplicationName": "<Name of the application to update the log option for>",
    "ApplicationUpdate": {
        "CloudWatchLoggingOptionUpdates": [
            {
                "CloudWatchLoggingOptionId": "<ID of the logging option to modify>",
                "LogStreamARNUpdate": "<ARN of the new log stream to use>",
                "RoleARNUpdate": "<ARN of the new role to use to access the log stream>"
            }
        ]
    },
    "CurrentApplicationVersionId": <ID of the application version to modify>
}
To delete a CloudWatch logging option from an application, use the DeleteApplicationCloudWatchLoggingOption action:

{
    "ApplicationName": "<Name of application to delete log option from>",
    "CloudWatchLoggingOptionId": "<ID of the application log option to delete>",
    "CurrentApplicationVersionId": <Version of the application to delete the log option from>
}
Configuration Errors
The following sections contain details about errors that you might see in Amazon CloudWatch Logs from
a misconfigured application.
{
"applicationARN": "string",
"applicationVersionId": integer,
"messageType": "ERROR",
"message": "string",
"inputId": "string",
"referenceId": "string",
"errorCode": "string"
"messageSchemaVersion": "integer",
}
• applicationARN: The Amazon Resource Name (ARN) of the generating application, for example:
arn:aws:kinesisanalytics:us-east-1:112233445566:application/sampleApp
• applicationVersionId: The version of the application at the time the error was encountered. For
more information, see ApplicationDetail (p. 244).
• messageType: The message type. Currently, this type can be only ERROR.
• message: The details of the error, for example:
There is a problem related to the configuration of your input. Please check that the
resource exists, the role has the correct permissions to access the resource and that
Kinesis Analytics can assume the role provided.
• inputId: The ID associated with the application input. This value is only present if this input is the
cause of the error. This value is not present if referenceId is present. For more information, see
DescribeApplication (p. 219).
• referenceId: The ID associated with the application reference data source. This value is only present
if this source is the cause of the error. This value is not present if inputId is present. For more
information, see DescribeApplication (p. 219).
• errorCode: The identifier for the error. This ID is either InputError or ReferenceDataError.
• messageSchemaVersion: A value that specifies the current message schema version, currently 1. You
can check this value to see if the error message schema has been updated.
Errors
The errors that might appear in CloudWatch Logs for Amazon Kinesis Data Analytics include the
following.
If an ARN is specified for a Kinesis input stream that doesn't exist, but the ARN is syntactically correct, an
error like the following is generated.
{
"applicationARN": "arn:aws:kinesisanalytics:us-east-1:112233445566:application/sampleApp",
"applicationVersionId": "5",
"messageType": "ERROR",
"message": "There is a problem related to the configuration of your input. Please check
that the resource exists, the role has the correct permissions to access the resource and
that Kinesis Analytics can assume the role provided.",
"inputId":"1.1",
"errorCode": "InputError",
"messageSchemaVersion": "1"
}
If an incorrect Amazon S3 file key is used for reference data, an error like the following is generated.
{
"applicationARN": "arn:aws:kinesisanalytics:us-east-1:112233445566:application/sampleApp",
"applicationVersionId": "5",
"messageType": "ERROR",
"message": "There is a problem related to the configuration of your reference data. Please
check that the bucket and the file exist, the role has the correct permissions to access
these resources and that Kinesis Analytics can assume the role provided.",
"referenceId":"1.1",
"errorCode": "ReferenceDataError",
"messageSchemaVersion": "1"
}
If an ARN is specified for an IAM input role that doesn't exist, but the ARN is syntactically correct, an error
like the following is generated.
{
"applicationARN": "arn:aws:kinesisanalytics:us-east-1:112233445566:application/sampleApp",
"applicationVersionId": "5",
"messageType": "ERROR",
"message": "There is a problem related to the configuration of your input. Please check
that the resource exists, the role has the correct permissions to access the resource and
that Kinesis Analytics can assume the role provided.",
"inputId":null,
"errorCode": "InputError",
"messageSchemaVersion": "1"
}
If an input role is used that doesn't have permission to access the input resources, such as a Kinesis
source stream, an error like the following is generated.
{
"applicationARN": "arn:aws:kinesisanalytics:us-east-1:112233445566:application/sampleApp",
"applicationVersionId": "5",
"messageType": "ERROR",
"message": "There is a problem related to the configuration of your input. Please check
that the resource exists, the role has the correct permissions to access the resource and
that Kinesis Analytics can assume the role provided.",
"inputId":null,
"errorCode": "InputError",
"messageSchemaVersion": "1"
}
To learn more about CloudTrail, see the AWS CloudTrail User Guide.
Kinesis Data Analytics Information in CloudTrail
For an ongoing record of events in your AWS account, including events for Kinesis Data Analytics, create
a trail. A trail enables CloudTrail to deliver log files to an Amazon S3 bucket. By default, when you create
a trail in the console, the trail applies to all AWS Regions. The trail logs events from all Regions in the
AWS partition and delivers the log files to the Amazon S3 bucket that you specify. Additionally, you can
configure other AWS services to further analyze and act upon the event data collected in CloudTrail logs.
For more information, see the following:
All Kinesis Data Analytics actions are logged by CloudTrail and are documented in the Kinesis Data
Analytics API reference. For example, calls to the CreateApplication and UpdateApplication
actions generate entries in the CloudTrail log files.
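For example, a sketch of listing recent Kinesis Data Analytics events from the AWS CLI (this assumes CloudTrail is enabled in the Region):

aws cloudtrail lookup-events --lookup-attributes AttributeKey=EventSource,AttributeValue=kinesisanalytics.amazonaws.com --max-results 10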
Every event or log entry contains information about who generated the request. The identity
information helps you determine the following:
• Whether the request was made with root or AWS Identity and Access Management (IAM) user
credentials.
• Whether the request was made with temporary security credentials for a role or federated user.
• Whether the request was made by another AWS service.
The following example shows a CloudTrail log entry that demonstrates the
AddApplicationCloudWatchLoggingOption and DescribeApplication actions.
{
"Records": [
{
"eventVersion": "1.05",
"userIdentity": {
"type": "IAMUser",
"principalId": "EX_PRINCIPAL_ID",
"arn": "arn:aws:iam::012345678910:user/Alice",
"accountId": "012345678910",
"accessKeyId": "EXAMPLE_KEY_ID",
"userName": "Alice"
},
"eventTime": "2019-03-14T01:03:00Z",
"eventSource": "kinesisanalytics.amazonaws.com",
"eventName": "AddApplicationCloudWatchLoggingOption",
"awsRegion": "us-east-1",
"sourceIPAddress": "127.0.0.1",
"userAgent": "aws-sdk-java/unknown-version Linux/x.xx",
"requestParameters": {
"currentApplicationVersionId": 1,
"cloudWatchLoggingOption": {
"roleARN": "arn:aws:iam::012345678910:role/cloudtrail_test",
"logStreamARN": "arn:aws:logs:us-east-1:012345678910:log-
group:cloudtrail-test:log-stream:sql-cloudwatch"
},
"applicationName": "cloudtrail-test"
},
"responseElements": null,
"requestID": "e897cd34-45f4-11e9-8912-e52573a36cd9",
"eventID": "57fe50e9-c764-47c3-a0aa-d0c271fa1cbb",
"eventType": "AwsApiCall",
"apiVersion": "2015-08-14",
"recipientAccountId": "303967445486"
},
{
"eventVersion": "1.05",
"userIdentity": {
"type": "IAMUser",
"principalId": "EX_PRINCIPAL_ID",
"arn": "arn:aws:iam::012345678910:user/Alice",
"accountId": "012345678910",
"accessKeyId": "EXAMPLE_KEY_ID",
"userName": "Alice"
},
"eventTime": "2019-03-14T05:37:20Z",
"eventSource": "kinesisanalytics.amazonaws.com",
"eventName": "DescribeApplication",
"awsRegion": "us-east-1",
"sourceIPAddress": "127.0.0.1",
"userAgent": "aws-sdk-java/unknown-version Linux/x.xx",
"requestParameters": {
"applicationName": "cloudtrail-test"
},
"responseElements": null,
"requestID": "3b74eb29-461b-11e9-a645-fb677e53d147",
"eventID": "750d0def-17b6-4c20-ba45-06d9d45e87ee",
"eventType": "AwsApiCall",
"apiVersion": "2015-08-14",
"recipientAccountId": "012345678910"
}
]
}
Limits
When working with Amazon Kinesis Data Analytics for SQL Applications, note the following limits:
• The size of a row in an in-application stream is limited to 512 KB. Kinesis Data Analytics uses up to 1
KB to store metadata. This metadata counts against the row limit.
• The SQL code in an application is limited to 100 KB.
• The service is available in specific AWS Regions. For more information, see Amazon Kinesis Data
Analytics in the AWS General Reference.
• You can create up to 50 Kinesis Data Analytics applications per AWS Region in your account. You
can create a case to request additional applications via the service limit increase form. For more
information, see the AWS Support Center.
• The maximum streaming throughput a single Kinesis Data Analytics for SQL application can process
is approximately 100 MB/sec. This assumes that you have increased the number of in-application
streams to the maximum value of 64, and you have increased your KPU limit beyond 8 (see the
following limit for details). If your application needs to process more than 100 MB/sec of input, do one
of the following:
• Use multiple Kinesis Data Analytics for SQL applications to process input
• Use Kinesis Data Analytics for Java Applications if you want to continue to use a single stream and
application.
• The number of Kinesis processing units (KPU) is limited to eight. For instructions on how to request an
increase to this limit, see To request a limit increase in AWS Service Limits.
With Kinesis Data Analytics, you pay only for what you use. You are charged an hourly rate based on
the average number of KPUs that are used to run your stream-processing application. A single KPU
provides you with 1 vCPU and 4 GB of memory.
• Each application can have one streaming source and up to one reference data source.
• You can configure up to three destinations for your Kinesis Data Analytics application. We recommend
that you use one of these destinations to persist in-application error stream data.
• The Amazon S3 object that stores reference data can be up to 1 GB in size.
• If you change the reference data that is stored in the S3 bucket after you upload reference data to an
in-application table, you need to use the UpdateApplication (p. 239) operation (using the API or AWS
CLI) to refresh the data in the in-application table. Currently, the AWS Management Console doesn't
support refreshing reference data in your application.
• Currently, Kinesis Data Analytics doesn't support data generated by the Amazon Kinesis Producer
Library (KPL).
• You can assign up to 50 tags per application.
Best Practices
This section describes best practices when working with Amazon Kinesis Data Analytics applications.
Topics
• Managing Applications (p. 167)
• Scaling Applications (p. 168)
• Defining Input Schema (p. 168)
• Connecting to Outputs (p. 169)
• Authoring Application Code (p. 169)
• Testing Applications (p. 170)
Managing Applications
When managing Amazon Kinesis Data Analytics applications, follow these best practices:
• Set up Amazon CloudWatch alarms – You can use the CloudWatch metrics that Kinesis Data Analytics
provides to monitor the following:
• Input bytes and input records (number of bytes and records entering the application)
• Output bytes and output records
• MillisBehindLatest (how far behind the application is in reading from the streaming source)
We recommend that you set up at least two CloudWatch alarms on the following metrics for your in-
production applications:
• MillisBehindLatest – For most cases, we recommend that you set this alarm to trigger when
your application is 1 hour behind the latest data, for an average of 1 minute. For applications with
lower end-to-end processing needs, you can tune this to a lower tolerance. This alarm can help
ensure that your application is reading the latest data.
• To avoid getting the ReadProvisionedThroughputException exception, limit the number of
production applications reading from the same Kinesis data stream to two applications.
Note
In this case, application refers to any application that can read from the streaming source.
Only a Kinesis Data Analytics application can read from a Kinesis Data Firehose delivery
stream. However, many applications can read from a Kinesis data stream, such as a Kinesis
Data Analytics application or AWS Lambda. The recommended application limit refers to all
applications that you configure to read from a streaming source.
Amazon Kinesis Data Analytics reads a streaming source approximately once per second per
application. However, an application that falls behind might read data at a faster rate to catch up. To
allow adequate throughput for applications to catch up, limit the number of applications reading the
same data source.
• Limit the number of production applications reading from the same Kinesis Data Firehose delivery
stream to one application.
A Kinesis Data Firehose delivery stream can write to destinations such as Amazon S3 and Amazon
Redshift. It can also be a streaming source for your Kinesis Data Analytics application. Therefore, we
recommend that you do not configure more than one Kinesis Data Analytics application per Kinesis
Data Firehose delivery stream. This helps ensure that the delivery stream can also deliver to other
destinations.
Scaling Applications
Set up your application for your future scaling needs by proactively increasing the number of input in-
application streams from the default (one). We recommend the following language choices based on the
throughput of your application:
• Use multiple streams and Kinesis Data Analytics for SQL applications if your application has scaling
needs beyond 100 MB/second.
• Use Kinesis Data Analytics for Java Applications if you want to use a single stream and application.
• Adequately test the inferred schema. The discovery process uses only a sample of records on the
streaming source to infer a schema. If your streaming source has many record types, the discovery API
might have missed sampling one or more record types. This situation can result in a schema that does
not accurately reflect data on the streaming source.
When your application starts, these missed record types might result in parsing errors. Amazon Kinesis
Data Analytics sends these records to the in-application error stream. To reduce these parsing errors,
we recommend that you test the inferred schema interactively in the console and monitor the in-
application stream for missed records.
• The Kinesis Data Analytics API does not support specifying the NOT NULL constraint on columns in
the input configuration. If you want NOT NULL constraints on columns in your in-application stream,
create these in-application streams using your application code. You can then copy data from one in-
application stream into another, and then the constraint is enforced.
Any attempt to insert rows with NULL values when a value is required results in an error. Kinesis Data
Analytics sends these errors to the in-application error stream.
• Relax data types inferred by the discovery process. The discovery process recommends columns and
data types based on a random sampling of records on the streaming source. We recommend that
you review these carefully and consider relaxing these data types to cover all of the possible cases of
records in your input. This ensures fewer parsing errors across the application while it is running. For
example, if an inferred schema has a SMALLINT as a column type, consider changing it to an INTEGER.
• Use SQL functions in your application code to handle any unstructured data or columns. You might
have unstructured data or columns, such as log data, in your input. For examples, see Example:
Transforming DateTime Values (p. 98). One approach to handling this type of data is to define the
schema with only one column of type VARCHAR(N), where N is the largest possible row that you would
expect to see in your stream. In your application code, you can then read the incoming records and use
the String and Date Time functions to parse and schematize the raw data.
• Make sure that you completely handle streaming source data that contains nesting more than two
levels deep. When source data is JSON, you can have nesting. The discovery API infers a schema that
flattens one level of nesting. For two levels of nesting, the discovery API also tries to flatten these.
Beyond two levels of nesting, there is limited support for flattening. To handle nesting completely, you
have to manually modify the inferred schema to suit your needs. Use either of the following strategies
to do this:
• Use the JSON row path to selectively pull out only the required key value pairs for your application.
A JSON row path provides a pointer to the specific key value pair that you want to bring in your
application. You can do this for any level of nesting.
• Use the JSON row path to selectively pull out complex JSON objects and then use string
manipulation functions in your application code to pull the specific data that you need.
Connecting to Outputs
We recommend that every application have at least two outputs:
• Use the first destination to insert the results of your SQL queries.
• Use the second destination to insert the entire error stream and send it to an S3 bucket through a
Kinesis Data Firehose delivery stream.
• In your SQL statement, don't specify a time-based window that is longer than one hour for the
following reasons:
• Sometimes an application needs to be restarted, either because you updated the application or for
Kinesis Data Analytics internal reasons. When it restarts, all data included in the window must be
read again from the streaming data source. This takes time before Kinesis Data Analytics can emit
output for that window.
• Kinesis Data Analytics must maintain everything related to the application's state, including relevant
data, for the duration. This consumes significant Kinesis Data Analytics processing units.
• During development, keep the window size small in your SQL statements so that you can see the
results faster. When you deploy the application to your production environment, you can set the
window size as appropriate.
• Instead of a single complex SQL statement, consider breaking it into multiple statements, in each step
saving results in intermediate in-application streams. This might help you debug faster.
• When you're using tumbling windows, we recommend that you use two windows, one for processing
time and one for your logical time (ingest time or event time). For more information, see Timestamps
and the ROWTIME Column (p. 69).
Testing Applications
When you're changing the schema or application code for your Kinesis Data Analytics application, we
recommend using a test application to verify your changes before deploying them to production.
When setting up a test application, you can either connect the application to your live data, or you can
populate a stream with mock data to test against. We recommend two methods for populating a stream
with mock data:
• Use the Kinesis Data Generator (KDG). The KDG uses a data template to send random data to a Kinesis
stream. The KDG is simple to use, but isn't appropriate for testing complex relationships between data
items, such as for applications that detect data hotspots or anomalies.
• Use a custom Python application to send more complex data to a Kinesis data stream. A Python
application can generate complex relationships between data items, such as hotspots or anomalies.
For an example of a Python application that sends data clustered into a data hotspot, see Example:
Detecting Hotspots on a Stream (HOTSPOTS Function) (p. 131).
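The following is a minimal sketch of such a Python producer, using boto3 and the Kinesis PutRecord API. The stream name, record fields, and hotspot coordinates are placeholders for illustration; adapt them to the schema that your application expects.

import json
import random
import time

import boto3

kinesis = boto3.client("kinesis")

STREAM_NAME = "ExampleInputStream"  # placeholder stream name


def make_record(hotspot_probability=0.2):
    # Most points are spread uniformly; some cluster around a "hotspot"
    # so that hotspot or anomaly detection logic has something to find.
    if random.random() < hotspot_probability:
        x, y = random.gauss(5.0, 0.1), random.gauss(5.0, 0.1)
    else:
        x, y = random.uniform(0, 10), random.uniform(0, 10)
    return {"x": x, "y": y, "is_hot": "N"}


while True:
    record = make_record()
    kinesis.put_record(
        StreamName=STREAM_NAME,
        Data=json.dumps(record).encode("utf-8"),
        PartitionKey="partitionkey",
    )
    time.sleep(0.1)  # roughly ten records per second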
When running your test application, view your results using a destination (such as a Kinesis Data Firehose
delivery stream to an Amazon Redshift database) instead of viewing your in-application stream on the
console. The data that is displayed on the console is a sampling of the stream and doesn't contain all of
the records.
As part of your testing, verify the following:
• The data from your stream is being coerced into the correct data type. For example, ensure that datetime data is not being ingested into the application as a string.
• The data is being parsed and coerced into the data type that you want. If parsing or coercion errors
occur, you can view them on the console, or assign a destination to the error stream and view the
errors in the destination store.
• The data fields for character data are of sufficient length, and the application isn't truncating the
character data. You can check the data records in your destination store to verify that your application
data isn't being truncated.
Troubleshooting
Topics
• Unable to Run SQL Code (p. 171)
• Unable to Detect or Discover My Schema (p. 171)
• Reference Data is Out of Date (p. 172)
• Application Not Writing to Destination (p. 172)
• Important Application Health Parameters to Monitor (p. 172)
• Invalid Code Errors When Running an Application (p. 173)
• Application is Writing Errors to the Error Stream (p. 173)
• Insufficient Throughput or High MillisBehindLatest (p. 173)
Unable to Run SQL Code
If you need help getting a particular SQL statement to work correctly, you have several resources:
• For more information about SQL statements, see Example Applications (p. 84). This section provides a number of SQL examples that you can use.
• The Amazon Kinesis Data Analytics SQL Reference provides a detailed guide to authoring streaming
SQL statements.
• If you're still running into issues, we recommend that you ask a question on the Kinesis Data Analytics
Forums.
Unable to Detect or Discover My Schema
If your UTF-8 encoded data doesn't use a delimiter, uses a format other than comma-separated values (CSV), or was otherwise not recognized by the discovery API, you can define a schema manually or use string manipulation functions to structure your data.
To discover the schema for your stream, Kinesis Data Analytics randomly samples the latest data in your
stream. If you aren't consistently sending data to your stream, Kinesis Data Analytics might not be able
to retrieve a sample and detect a schema. For more information, see Using the Schema Discovery Feature
on Streaming Data (p. 17).
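If you want to see what the service would infer while data is flowing, you can call the DiscoverInputSchema operation directly. The following is a minimal sketch with the AWS SDK for Python (Boto3); the stream ARN and role ARN are placeholders.

import boto3

client = boto3.client("kinesisanalytics")

response = client.discover_input_schema(
    ResourceARN="arn:aws:kinesis:us-east-1:123456789012:stream/ExampleInputStream",  # placeholder
    RoleARN="arn:aws:iam::123456789012:role/service-role/kinesis-analytics-example",  # placeholder
    InputStartingPositionConfiguration={"InputStartingPosition": "NOW"},
)

# InputSchema contains the inferred RecordColumns and RecordFormat;
# ParsedInputRecords shows the sampled rows that were used.
print(response["InputSchema"])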
Reference Data is Out of Date
Reference data is not automatically reloaded into the application when the underlying Amazon S3 object is updated. If the reference data in the application is out of date, you can reload it by following these steps:
1. On the Kinesis Data Analytics console, choose the application name in the list, and then choose
Application details.
2. Choose Go to SQL editor to open the Real-time analytics page for the application.
3. In the Source Data view, choose your reference data table name.
4. Choose Actions, Synchronize reference data table.
Application Not Writing to Destination
If data is not being written to your destination, check the following:
• Verify that the application's role has sufficient permission to access the destination. For more information, see Permissions Policy for Writing to a Kinesis Stream (p. 149) or Permissions Policy for Writing to a Firehose Delivery Stream (p. 149).
• Verify that the application destination is correctly configured and that the application is using the
correct name for the output stream.
• Check the Amazon CloudWatch metrics for your output stream to see if data is being written. For
information about using CloudWatch metrics, see Monitoring with Amazon CloudWatch (p. 154).
• Add a CloudWatch log stream using AddApplicationCloudWatchLoggingOption (p. 189). Your application writes configuration errors to the log stream.
If the role and destination configuration look correct, try restarting the application, specifying
LAST_STOPPED_POINT for the InputStartingPositionConfiguration (p. 268).
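For example, a restart from the last stopped point might look like the following boto3 sketch. The application name and input ID are placeholders; you can find the input ID by calling DescribeApplication.

import boto3

client = boto3.client("kinesisanalytics")

client.start_application(
    ApplicationName="ExampleApplication",  # placeholder application name
    InputConfigurations=[
        {
            "Id": "1.1",  # input ID from DescribeApplication
            "InputStartingPositionConfiguration": {
                "InputStartingPosition": "LAST_STOPPED_POINT"
            },
        }
    ],
)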
Important Application Health Parameters to Monitor
The most important parameter to monitor is the Amazon CloudWatch metric MillisBehindLatest. This metric represents how far behind the current time you are reading from the stream, and it helps you determine whether you are processing records from the source stream fast enough.
As a general rule, you should set up a CloudWatch alarm to trigger if you fall behind more than one hour.
However, the amount of time depends on your use case. You can adjust it as needed.
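A sketch of such an alarm using boto3 follows. The alarm name, SNS topic, and dimension values are placeholders, and the dimension names are an assumption; verify the dimensions your application actually reports in the CloudWatch console before relying on them.

import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="kda-millis-behind-latest",  # placeholder alarm name
    Namespace="AWS/KinesisAnalytics",
    MetricName="MillisBehindLatest",
    Dimensions=[
        {"Name": "Application", "Value": "ExampleApplication"},  # placeholder; verify dimension names
        {"Name": "Flow", "Value": "Input"},
    ],
    Statistic="Maximum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=3600 * 1000,  # one hour, in milliseconds
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],  # placeholder SNS topic
)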
Invalid Code Errors When Running an Application
If you can't save and run the SQL code for your application, the following are some common causes:
• The stream was redefined in your SQL code – After you create a stream and the pump associated with the stream, you can't redefine the same stream in your code. For more information about creating a stream, see CREATE STREAM in the Amazon Kinesis Data Analytics SQL Reference. For more information about creating a pump, see CREATE PUMP.
• A GROUP BY clause uses multiple ROWTIME columns – You can specify only one ROWTIME column
in the GROUP BY clause. For more information, see GROUP BY and ROWTIME in the Amazon Kinesis
Data Analytics SQL Reference.
• One or more data types have an invalid casting – In this case, your code has an invalid implicit cast.
For example, you might be casting a timestamp to a bigint in your code.
• A stream has the same name as a service reserved stream name – A stream can't have the same
name as the service-reserved stream error_stream.
Insufficient Throughput or High MillisBehindLatest
If your application's MillisBehindLatest metric is steadily increasing or consistently high, check the following:
• Check your application's InputBytes CloudWatch metric. If you are ingesting more than 4 MB/sec, this can cause an increase in MillisBehindLatest. To improve your application's throughput, increase the value of the InputParallelism parameter. For more information, see Parallelizing Input Streams for Increased Throughput (p. 28).
• Check your application's output delivery Success metric for failures in delivering to your destination.
Verify that you have correctly configured the output, and that your output stream has sufficient
capacity.
• If your application uses an AWS Lambda function for pre-processing or as an output, check the
application’s InputProcessing.Duration or LambdaDelivery.Duration CloudWatch metric. If the Lambda
function invocation duration is longer than 5 seconds, consider doing the following:
• Increase the Lambda function’s Memory allocation. You can do this on the AWS Lambda console,
on the Configuration page, under Basic settings. For more information, see Configuring Lambda
Functions in the AWS Lambda Developer Guide.
• Increase the number of shards in your input stream of the application. This increases the number of
parallel functions that the application will invoke, which might increase throughput.
• Verify that the function is not making blocking calls that are affecting performance, such as
synchronous requests for external resources.
• Examine your AWS Lambda function to see whether there are other areas where you can
improve performance. Check the CloudWatch Logs of the application Lambda function. For more
information, see Accessing Amazon CloudWatch Metrics for AWS Lambda in the AWS Lambda
Developer Guide.
• Verify that your application is not reaching the default limit for Kinesis Processing Units (KPU). If
your application is reaching this limit, you can request a limit increase. For more information, see
Automatically Scaling Applications to Increase Throughput (p. 42).
Authentication
You can access AWS as any of the following types of identities:
• AWS account root user – When you first create an AWS account, you begin with a single sign-in
identity that has complete access to all AWS services and resources in the account. This identity is
called the AWS account root user and is accessed by signing in with the email address and password
that you used to create the account. We strongly recommend that you do not use the root user for
your everyday tasks, even the administrative ones. Instead, adhere to the best practice of using the
root user only to create your first IAM user. Then securely lock away the root user credentials and use
them to perform only a few account and service management tasks.
• IAM user – An IAM user is an identity within your AWS account that has specific custom permissions
(for example, permissions to create an application in Amazon Kinesis Data Analytics). You can use an
IAM user name and password to sign in to secure AWS webpages like the AWS Management Console,
AWS Discussion Forums, or the AWS Support Center.
In addition to a user name and password, you can also generate access keys for each user. You can
use these keys when you access AWS services programmatically, either through one of the several
SDKs or by using the AWS Command Line Interface (CLI). The SDK and CLI tools use the access keys
to cryptographically sign your request. If you don’t use AWS tools, you must sign the request yourself.
Amazon Kinesis Data Analytics supports Signature Version 4, a protocol for authenticating inbound API
requests. For more information about authenticating requests, see Signature Version 4 Signing Process
in the AWS General Reference.
• IAM role – An IAM role is an IAM identity that you can create in your account that has specific
permissions. An IAM role is similar to an IAM user in that it is an AWS identity with permissions policies
that determine what the identity can and cannot do in AWS. However, instead of being uniquely
associated with one person, a role is intended to be assumable by anyone who needs it. Also, a role
does not have standard long-term credentials such as a password or access keys associated with it.
Instead, when you assume a role, it provides you with temporary security credentials for your role
session. IAM roles with temporary credentials are useful in the following situations:
• Federated user access – Instead of creating an IAM user, you can use existing identities from AWS
Directory Service, your enterprise user directory, or a web identity provider. These are known as
federated users. AWS assigns a role to a federated user when access is requested through an identity
provider. For more information about federated users, see Federated Users and Roles in the IAM User
Guide.
• AWS service access – A service role is an IAM role that a service assumes to perform actions in your
account on your behalf. When you set up some AWS service environments, you must define a role
for the service to assume. This service role must include all the permissions that are required for
the service to access the AWS resources that it needs. Service roles vary from service to service, but
many allow you to choose your permissions as long as you meet the documented requirements
for that service. Service roles provide access only within your account and cannot be used to grant
access to services in other accounts. You can create, modify, and delete a service role from within
IAM. For example, you can create a role that allows Amazon Redshift to access an Amazon S3 bucket
on your behalf and then load data from that bucket into an Amazon Redshift cluster. For more
information, see Creating a Role to Delegate Permissions to an AWS Service in the IAM User Guide.
• Applications running on Amazon EC2 – You can use an IAM role to manage temporary credentials
for applications that are running on an EC2 instance and making AWS CLI or AWS API requests. This
is preferable to storing access keys within the EC2 instance. To assign an AWS role to an EC2 instance
and make it available to all of its applications, you create an instance profile that is attached to
the instance. An instance profile contains the role and enables programs that are running on the
EC2 instance to get temporary credentials. For more information, see Using an IAM Role to Grant
Permissions to Applications Running on Amazon EC2 Instances in the IAM User Guide.
Access Control
You can have valid credentials to authenticate your requests, but unless you also have permissions, you cannot create or access Amazon Kinesis Data Analytics resources. For example, you must have permissions to create an Amazon Kinesis Data Analytics application.
The following sections describe how to manage permissions for Amazon Kinesis Data Analytics. We
recommend that you read the overview first.
• Overview of Managing Access Permissions to Your Amazon Kinesis Data Analytics Resources (p. 176)
• Using Identity-Based Policies (IAM Policies) for Amazon Kinesis Data Analytics (p. 180)
• Amazon Kinesis Data Analytics API Permissions: Actions, Permissions, and Resources
Reference (p. 185)
Overview of Managing Access Permissions to Your Amazon Kinesis Data Analytics Resources
Note
An account administrator (or administrator user) is a user with administrator privileges. For more
information, see IAM Best Practices in the IAM User Guide.
When granting permissions, you decide who is getting the permissions, the resources they get
permissions for, and the specific actions that you want to allow on those resources.
Topics
• Amazon Kinesis Data Analytics Resources and Operations (p. 177)
• Understanding Resource Ownership (p. 177)
• Managing Access to Resources (p. 177)
• Specifying Policy Elements: Actions, Effects, and Principals (p. 179)
• Specifying Conditions in a Policy (p. 179)
Amazon Kinesis Data Analytics Resources and Operations
In Amazon Kinesis Data Analytics, the primary resource is an application. These resources have unique Amazon Resource Names (ARNs) associated with them, as shown in the following table.
Resource Type: Application
ARN Format: arn:aws:kinesisanalytics:region:account-id:application/application-name
Amazon Kinesis Data Analytics provides a set of operations to work with Amazon Kinesis Data Analytics
resources. For a list of available operations, see Amazon Kinesis Data Analytics Actions (p. 188).
Understanding Resource Ownership
The resource owner is the AWS account that created the resource:
• If you use the root account credentials of your AWS account to create an application, your AWS account is the owner of the resource. (In Amazon Kinesis Data Analytics, the resource is an application.)
• If you create an IAM user in your AWS account and grant permissions to create an application to that
user, the user can create an application. However, your AWS account, to which the user belongs, owns
the application resource.
• If you create an IAM role in your AWS account with permissions to create an application, anyone who
can assume the role can create an application. Your AWS account, to which the user belongs, owns the
application resource.
Managing Access to Resources
Note
This section discusses using IAM in the context of Amazon Kinesis Data Analytics. It doesn't
provide detailed information about the IAM service. For complete IAM documentation, see What
Is IAM? in the IAM User Guide. For information about IAM policy syntax and descriptions, see IAM
JSON Policy Reference in the IAM User Guide.
Policies that are attached to an IAM identity are referred to as identity-based policies (IAM policies).
Policies that are attached to a resource are referred to as resource-based policies. Amazon Kinesis Data
Analytics supports only identity-based policies (IAM policies).
Topics
• Identity-Based Policies (IAM Policies) (p. 178)
• Resource-Based Policies (p. 179)
Identity-Based Policies (IAM Policies)
You can attach policies to IAM identities. For example, you can do the following:
• Attach a permissions policy to a user or a group in your account – To grant a user permissions to create an Amazon Kinesis Data Analytics resource, such as an application, you can attach a permissions policy to a user or group that the user belongs to.
• Attach a permissions policy to a role (grant cross-account permissions) – You can attach an
identity-based permissions policy to an IAM role to grant cross-account permissions. For example,
the administrator in account A can create a role to grant cross-account permissions to another AWS
account (for example, account B) or an AWS service as follows:
1. Account A administrator creates an IAM role and attaches a permissions policy to the role that
grants permissions on resources in account A.
2. Account A administrator attaches a trust policy to the role identifying account B as the principal
who can assume the role.
3. Account B administrator can then delegate permissions to assume the role to any users in account B.
Doing this allows users in account B to create or access resources in account A. The principal in the
trust policy can also be an AWS service principal if you want to grant an AWS service permissions to
assume the role.
For more information about using IAM to delegate permissions, see Access Management in the IAM User Guide.
The following is an example policy that grants permission for the kinesisanalytics:CreateApplication action:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Stmt1473028104000",
"Effect": "Allow",
"Action": [
"kinesisanalytics:CreateApplication"
],
"Resource": [
"*"
]
}
]
}
For more information about using identity-based policies with Amazon Kinesis Data Analytics, see Using
Identity-Based Policies (IAM Policies) for Amazon Kinesis Data Analytics (p. 180). For more information
about users, groups, roles, and permissions, see Identities (Users, Groups, and Roles) in the IAM User
Guide.
Resource-Based Policies
Other services, such as Amazon S3, also support resource-based permissions policies. For example, you
can attach a policy to an S3 bucket to manage access permissions to that bucket. Amazon Kinesis Data
Analytics doesn't support resource-based policies.
Specifying Policy Elements: Actions, Effects, and Principals
The following are the most basic policy elements:
• Resource – You use an Amazon Resource Name (ARN) to identify the resource that the policy applies to. For more information, see Amazon Kinesis Data Analytics Resources and Operations (p. 177).
• Action – You use action keywords to identify resource operations that you want to allow or deny. For example, you can use kinesisanalytics:CreateApplication to allow users to create an application.
• Effect – You specify the effect, either allow or deny, when the user requests the specific action. If you
don't explicitly grant access to (allow) a resource, access is implicitly denied. You can also explicitly
deny access to a resource, which you might do to make sure that a user cannot access it, even if a
different policy grants access.
• Principal – In identity-based policies (IAM policies), the user that the policy is attached to is the
implicit principal. For resource-based policies, you specify the user, account, service, or other entity
that you want to receive permissions (applies to resource-based policies only). Amazon Kinesis Data
Analytics doesn't support resource-based policies.
To learn more about IAM policy syntax and descriptions, see IAM JSON Policy Reference in the IAM User
Guide.
For a list showing all of the Amazon Kinesis Data Analytics API operations and the resources that they
apply to, see Amazon Kinesis Data Analytics API Permissions: Actions, Permissions, and Resources
Reference (p. 185).
Specifying Conditions in a Policy
For more information about specifying conditions in a policy language, see Condition in the IAM User
Guide.
To express conditions, you use predefined condition keys. There are no condition keys specific to Amazon
Kinesis Data Analytics. However, there are AWS-wide condition keys that you can use as appropriate. For
a complete list of AWS-wide keys, see Available Keys for Conditions in the IAM User Guide.
Using Identity-Based Policies (IAM Policies) for Amazon Kinesis Data Analytics
Topics
• Permissions Required to Use the Amazon Kinesis Data Analytics Console (p. 181)
• AWS Managed (Predefined) Policies for Amazon Kinesis Data Analytics (p. 181)
• Customer Managed Policy Examples (p. 182)
The following shows an example of a permissions policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Stmt1473028104000",
"Effect": "Allow",
"Action": [
"kinesisanalytics:CreateApplication"
],
"Resource": [
"*"
]
}
]
}
• The policy statement grants permissions for one Kinesis Data Analytics action (kinesisanalytics:CreateApplication) on a resource using the Amazon Resource Name (ARN) for the application. The ARN in this case specifies a wildcard character (*) to indicate that the permission is granted for any resource.
For a table showing all of the Kinesis Data Analytics API operations and the resources that they
apply to, see Amazon Kinesis Data Analytics API Permissions: Actions, Permissions, and Resources
Reference (p. 185).
Permissions Required to Use the Amazon Kinesis Data Analytics Console
For a user to work with the Amazon Kinesis Data Analytics console, grant permissions in one of the following ways:
• Use the AWS managed policies to grant user permissions. For available policies, see AWS Managed
(Predefined) Policies for Amazon Kinesis Data Analytics (p. 181).
• Create custom policies. In this case, we recommend that you review the example provided in this
section. For more information, see Customer Managed Policy Examples (p. 182).
AWS Managed (Predefined) Policies for Amazon Kinesis Data Analytics
The following AWS managed policies, which you can attach to users in your account, are specific to Amazon Kinesis Data Analytics:
• AmazonKinesisAnalyticsFullAccess – Grants permissions for all Amazon Kinesis Data Analytics actions and all other permissions that allow a user to create and manage Amazon Kinesis Data Analytics applications. However, note the following:
• These permissions are not sufficient if the user wants to create a new IAM role in the console (these
permissions allow the user to select an existing role). If you want the user to be able to create an IAM
role in the console, add the IAMFullAccess AWS managed policy.
• A user must have permission for the iam:PassRole action to specify an IAM role when configuring an Amazon Kinesis Data Analytics application. This AWS managed policy grants permission for the iam:PassRole action to the user only on the IAM roles that start with the prefix service-role/kinesis-analytics.
If the user wants to configure the Amazon Kinesis Data Analytics application with a role that does
not have this prefix, you first must explicitly grant the user permission for the iam:PassRole action
on the specific role.
Note
You can review these permissions policies by signing in to the IAM console and searching for
specific policies there.
Customer Managed Policy Examples
You can also create your own custom IAM policies to allow permissions for Amazon Kinesis Data Analytics actions and resources. You can attach these custom policies to the IAM users or groups that require those permissions.
Initially, the user doesn't have permissions and can't do anything on the console. As you attach policies
to the user, you can verify that the user can perform various actions on the console.
We recommend that you use two browser windows. In one window, create the user and grant
permissions. In the other, sign in to the AWS Management Console using the user's credentials and verify
permissions as you grant them.
For examples that show how to create an IAM role that you can use as an execution role for your Amazon
Kinesis Data Analytics application, see Creating IAM Roles in the IAM User Guide.
Example steps
• Step 1: Create an IAM User (p. 182)
• Step 2: Allow the User Permissions for Actions that Are Not Specific to Amazon Kinesis Data
Analytics (p. 182)
• Step 3: Allow the User to View a List of Applications and View Details (p. 183)
• Step 4: Allow the User to Start a Specific Application (p. 184)
• Step 5: Allow the User to Create an Amazon Kinesis Data Analytics Application (p. 184)
• Step 6: Allow the Application to Use Lambda Preprocessing (p. 185)
Step 1: Create an IAM User
For instructions, see Creating Your First IAM User and Administrators Group in the IAM User Guide.
Step 2: Allow the User Permissions for Actions that Are Not
Specific to Amazon Kinesis Data Analytics
First, grant a user permission for all actions that aren't specific to Amazon Kinesis Data Analytics that the
user will need when working with Amazon Kinesis Data Analytics applications. These include permissions
for working with streams (Amazon Kinesis Data Streams actions, Amazon Kinesis Data Firehose actions),
and permissions for CloudWatch actions. Attach the following policy to the user.
You need to update the policy by providing the name of an IAM role for which you want to grant the iam:PassRole permission, or by specifying a wildcard character (*) to indicate all IAM roles. Using a wildcard is not a secure practice; however, you might not have a specific IAM role created during this testing.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"kinesis:CreateStream",
"kinesis:DeleteStream",
"kinesis:DescribeStream",
"kinesis:ListStreams",
"kinesis:PutRecord",
"kinesis:PutRecords"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"firehose:DescribeDeliveryStream",
"firehose:ListDeliveryStreams"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"cloudwatch:GetMetricStatistics",
"cloudwatch:ListMetrics"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": "logs:GetLogEvents",
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"iam:ListPolicyVersions",
"iam:ListRoles"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": "iam:PassRole",
"Resource": "arn:aws:iam::*:role/service-role/role-name"
}
]
}
Step 3: Allow the User to View a List of Applications and View Details
Grant the user the following permissions:
• Permission for the kinesisanalytics:ListApplications action so the user can view a list of applications. This is a service-level API call, and therefore you specify "*" as the Resource value.
• Permission for the kinesisanalytics:DescribeApplication action so that you can get
information about any of the applications.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"kinesisanalytics:ListApplications"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"kinesisanalytics:DescribeApplication"
],
"Resource": "arn:aws:kinesisanalytics:aws-region:aws-account-id:application/*"
}
]
}
Verify these permissions by signing in to the Amazon Kinesis Data Analytics console using the IAM user's credentials.
Step 4: Allow the User to Start a Specific Application
If you want the user to be able to start a specific application, attach the following policy to the user:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"kinesisanalytics:StartApplication"
],
"Resource": "arn:aws:kinesisanalytics:aws-region:aws-account-
id:application/application-name"
}
]
}
Step 5: Allow the User to Create an Amazon Kinesis Data Analytics Application
If you want the user to be able to create an Amazon Kinesis Data Analytics application, attach the following policy to the user:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Stmt1473028104000",
"Effect": "Allow",
"Action": [
"kinesisanalytics:CreateApplication"
],
"Resource": [
"*"
]
},
{
"Effect": "Allow",
"Action": [
"kinesisanalytics:StartApplication",
"kinesisanalytics:UpdateApplication",
"kinesisanalytics:AddApplicationInput",
"kinesisanalytics:AddApplicationOutput"
],
"Resource": "arn:aws:kinesisanalytics:aws-region:aws-account-
id:application/application-name"
}
]
}
Step 6: Allow the Application to Use Lambda Preprocessing
To allow the application to use Lambda preprocessing, add a policy statement similar to the following, which grants permission to invoke the preprocessing Lambda function:
{
"Sid": "UseLambdaFunction",
"Effect": "Allow",
"Action": [
"lambda:InvokeFunction",
"lambda:GetFunctionConfiguration"
],
"Resource": "<FunctionARN>"
}
Amazon Kinesis Data Analytics API Permissions: Actions, Permissions, and Resources Reference
You can use AWS-wide condition keys in your Amazon Kinesis Data Analytics policies to express conditions. For a complete list of AWS-wide keys, see Available Keys in the IAM User Guide.
Note
To specify an action, use the kinesisanalytics prefix followed by the API operation name
(for example, kinesisanalytics:AddApplicationInput).
Amazon Kinesis Data Analytics API and Required Permissions for Actions

API Operation: AddApplicationInput
Required Permissions (API Action): kinesisanalytics:AddApplicationInput
Resources: arn:aws:kinesisanalytics:region:accountId:application/application-name
GetApplicationState
The console uses an internal method called GetApplicationState to sample or access application
data. Your Kinesis Data Analytics service application needs to have permissions for the internal
kinesisanalytics:GetApplicationState API to sample or access application data through the
AWS Management Console.
API Reference
Note
This documentation is for version 1 of the Amazon Kinesis Data Analytics API, which only
supports SQL applications. Version 2 of the API supports SQL and Java applications. For more
information about version 2, see Amazon Kinesis Data Analytics API V2 Documentation.
You can use the AWS CLI to explore the Amazon Kinesis Data Analytics API. This guide provides Getting
Started with Amazon Kinesis Data Analytics for SQL Applications (p. 45) exercises that use the AWS CLI.
Topics
Actions
The following actions are supported:
AddApplicationCloudWatchLoggingOption
Note
This documentation is for version 1 of the Amazon Kinesis Data Analytics API, which only
supports SQL applications. Version 2 of the API supports SQL and Java applications. For more
information about version 2, see Amazon Kinesis Data Analytics API V2 Documentation.
Adds a CloudWatch log stream to monitor application configuration errors. For more information about
using CloudWatch log streams with Amazon Kinesis Analytics applications, see Working with Amazon
CloudWatch Logs.
Request Syntax
{
"ApplicationName": "string",
"CloudWatchLoggingOption": {
"LogStreamARN": "string",
"RoleARN": "string"
},
"CurrentApplicationVersionId": number
}
Request Parameters
The request accepts the following data in JSON format.
ApplicationName (p. 189)
The Kinesis Analytics application name.
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: Yes
CloudWatchLoggingOption (p. 189)
Provides the CloudWatch log stream Amazon Resource Name (ARN) and the IAM role ARN. Note: To
write application messages to CloudWatch, the IAM role that is used must have the PutLogEvents
policy action enabled.
Required: Yes
CurrentApplicationVersionId (p. 189)
Type: Long
Required: Yes
Response Elements
If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.
Errors
ConcurrentModificationException
The request was rejected because a specified parameter is not supported or a specified resource is
not valid for this operation.
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
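For example, with the AWS SDK for Python (Boto3) the call might look like the following sketch. The application name, version ID, and ARNs are placeholders.

import boto3

client = boto3.client("kinesisanalytics")

client.add_application_cloud_watch_logging_option(
    ApplicationName="ExampleApplication",
    CurrentApplicationVersionId=1,  # use DescribeApplication to get the current version
    CloudWatchLoggingOption={
        "LogStreamARN": "arn:aws:logs:us-east-1:123456789012:log-group:my-group:log-stream:my-stream",
        "RoleARN": "arn:aws:iam::123456789012:role/service-role/kinesis-analytics-example",
    },
)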
AddApplicationInput
Note
This documentation is for version 1 of the Amazon Kinesis Data Analytics API, which only
supports SQL applications. Version 2 of the API supports SQL and Java applications. For more
information about version 2, see Amazon Kinesis Data Analytics API V2 Documentation.
Adds a streaming source to your Amazon Kinesis application. For conceptual information, see
Configuring Application Input.
You can add a streaming source either when you create an application or you can use this operation to
add a streaming source after you create an application. For more information, see CreateApplication.
Any configuration update, including adding a streaming source using this operation, results in a new
version of the application. You can use the DescribeApplication operation to find the current application
version.
Request Syntax
{
"ApplicationName": "string",
"CurrentApplicationVersionId": number,
"Input": {
"InputParallelism": {
"Count": number
},
"InputProcessingConfiguration": {
"InputLambdaProcessor": {
"ResourceARN": "string",
"RoleARN": "string"
}
},
"InputSchema": {
"RecordColumns": [
{
"Mapping": "string",
"Name": "string",
"SqlType": "string"
}
],
"RecordEncoding": "string",
"RecordFormat": {
"MappingParameters": {
"CSVMappingParameters": {
"RecordColumnDelimiter": "string",
"RecordRowDelimiter": "string"
},
"JSONMappingParameters": {
"RecordRowPath": "string"
}
},
"RecordFormatType": "string"
}
},
"KinesisFirehoseInput": {
"ResourceARN": "string",
"RoleARN": "string"
},
"KinesisStreamsInput": {
"ResourceARN": "string",
"RoleARN": "string"
},
"NamePrefix": "string"
}
}
Request Parameters
The request accepts the following data in JSON format.
ApplicationName (p. 191)
Name of your existing Amazon Kinesis Analytics application to which you want to add the streaming source.
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: Yes
CurrentApplicationVersionId (p. 191)
Current version of your Amazon Kinesis Analytics application. You can use the DescribeApplication
operation to find the current application version.
Type: Long
Required: Yes
Input (p. 191)
Required: Yes
Response Elements
If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.
Errors
CodeValidationException
User-provided application code (query) is invalid. This can be a simple syntax error.
InvalidArgumentException
The request was rejected because a specified parameter is not supported or a specified resource is
not valid for this operation.
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
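The following boto3 sketch adds a Kinesis stream as a streaming source, following the request syntax shown above. The names, ARNs, version ID, and schema columns are placeholders for illustration.

import boto3

client = boto3.client("kinesisanalytics")

client.add_application_input(
    ApplicationName="ExampleApplication",
    CurrentApplicationVersionId=3,  # from DescribeApplication
    Input={
        "NamePrefix": "SOURCE_SQL_STREAM",
        "KinesisStreamsInput": {
            "ResourceARN": "arn:aws:kinesis:us-east-1:123456789012:stream/ExampleInputStream",
            "RoleARN": "arn:aws:iam::123456789012:role/service-role/kinesis-analytics-example",
        },
        "InputSchema": {
            "RecordFormat": {
                "RecordFormatType": "JSON",
                "MappingParameters": {"JSONMappingParameters": {"RecordRowPath": "$"}},
            },
            "RecordEncoding": "UTF-8",
            "RecordColumns": [
                {"Name": "TICKER", "Mapping": "$.ticker", "SqlType": "VARCHAR(8)"},
                {"Name": "PRICE", "Mapping": "$.price", "SqlType": "REAL"},
            ],
        },
    },
)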
AddApplicationInputProcessingConfiguration
Note
This documentation is for version 1 of the Amazon Kinesis Data Analytics API, which only
supports SQL applications. Version 2 of the API supports SQL and Java applications. For more
information about version 2, see Amazon Kinesis Data Analytics API V2 Documentation.
Adds an InputProcessingConfiguration to an application. An input processor preprocesses records on the input stream before the application's SQL code executes. Currently, the only input processor available is AWS Lambda.
Request Syntax
{
"ApplicationName": "string",
"CurrentApplicationVersionId": number,
"InputId": "string",
"InputProcessingConfiguration": {
"InputLambdaProcessor": {
"ResourceARN": "string",
"RoleARN": "string"
}
}
}
Request Parameters
The request accepts the following data in JSON format.
ApplicationName (p. 194)
Name of the application to which you want to add the input processing configuration.
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: Yes
CurrentApplicationVersionId (p. 194)
Version of the application to which you want to add the input processing configuration. You can use
the DescribeApplication operation to get the current application version. If the version specified is
not the current version, the ConcurrentModificationException is returned.
Type: Long
Required: Yes
InputId (p. 194)
The ID of the input configuration to add the input processing configuration to. You can get a list of
the input IDs for an application using the DescribeApplication operation.
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: Yes
InputProcessingConfiguration (p. 194)
Required: Yes
Response Elements
If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.
Errors
ConcurrentModificationException
The request was rejected because a specified parameter is not supported or a specified resource is
not valid for this operation.
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
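A boto3 sketch of this call follows; the application name, input ID, and ARNs are placeholders, and you can look up the input ID and current version with DescribeApplication.

import boto3

client = boto3.client("kinesisanalytics")

client.add_application_input_processing_configuration(
    ApplicationName="ExampleApplication",
    CurrentApplicationVersionId=5,   # from DescribeApplication
    InputId="1.1",                   # from DescribeApplication
    InputProcessingConfiguration={
        "InputLambdaProcessor": {
            "ResourceARN": "arn:aws:lambda:us-east-1:123456789012:function:example-preprocessor",
            "RoleARN": "arn:aws:iam::123456789012:role/service-role/kinesis-analytics-example",
        }
    },
)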
AddApplicationOutput
Note
This documentation is for version 1 of the Amazon Kinesis Data Analytics API, which only
supports SQL applications. Version 2 of the API supports SQL and Java applications. For more
information about version 2, see Amazon Kinesis Data Analytics API V2 Documentation.
If you want Amazon Kinesis Analytics to deliver data from an in-application stream within your
application to an external destination (such as an Amazon Kinesis stream, an Amazon Kinesis Firehose
delivery stream, or an AWS Lambda function), you add the relevant configuration to your application
using this operation. You can configure one or more outputs for your application. Each output
configuration maps an in-application stream and an external destination.
You can use one of the output configurations to deliver data from your in-application error stream to
an external destination so that you can analyze the errors. For more information, see Understanding
Application Output (Destination).
Any configuration update, including adding an output configuration using this operation, results in a new version of the application. You can use the DescribeApplication operation to find the current application version.
For the limits on the number of application inputs and outputs you can configure, see Limits.
Request Syntax
{
"ApplicationName": "string",
"CurrentApplicationVersionId": number,
"Output": {
"DestinationSchema": {
"RecordFormatType": "string"
},
"KinesisFirehoseOutput": {
"ResourceARN": "string",
"RoleARN": "string"
},
"KinesisStreamsOutput": {
"ResourceARN": "string",
"RoleARN": "string"
},
"LambdaOutput": {
"ResourceARN": "string",
"RoleARN": "string"
},
"Name": "string"
}
}
Request Parameters
The request accepts the following data in JSON format.
ApplicationName (p. 197)
Name of the application to which you want to add the output configuration.
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: Yes
CurrentApplicationVersionId (p. 197)
Version of the application to which you want to add the output configuration. You can use the
DescribeApplication operation to get the current application version. If the version specified is not
the current version, the ConcurrentModificationException is returned.
Type: Long
Required: Yes
Output (p. 197)
An object describing the output configuration. In the output configuration, you specify the name of an in-application stream, a destination (that is, an Amazon Kinesis stream, an Amazon Kinesis Firehose delivery stream, or an AWS Lambda function), and the record format to use when writing to the destination.
Required: Yes
Response Elements
If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.
Errors
ConcurrentModificationException
UnsupportedOperationException
The request was rejected because a specified parameter is not supported or a specified resource is
not valid for this operation.
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
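The following boto3 sketch maps an in-application stream to a Kinesis stream destination. The in-application stream name, version ID, and ARNs are placeholders.

import boto3

client = boto3.client("kinesisanalytics")

client.add_application_output(
    ApplicationName="ExampleApplication",
    CurrentApplicationVersionId=7,  # from DescribeApplication
    Output={
        "Name": "DESTINATION_SQL_STREAM",  # in-application stream created by your SQL code
        "KinesisStreamsOutput": {
            "ResourceARN": "arn:aws:kinesis:us-east-1:123456789012:stream/ExampleOutputStream",
            "RoleARN": "arn:aws:iam::123456789012:role/service-role/kinesis-analytics-example",
        },
        "DestinationSchema": {"RecordFormatType": "JSON"},
    },
)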
AddApplicationReferenceDataSource
Note
This documentation is for version 1 of the Amazon Kinesis Data Analytics API, which only
supports SQL applications. Version 2 of the API supports SQL and Java applications. For more
information about version 2, see Amazon Kinesis Data Analytics API V2 Documentation.
Amazon Kinesis Analytics reads reference data (that is, an Amazon S3 object) and creates an in-application table within your application. In the request, you provide the source (S3 bucket name and object key name), the name of the in-application table to create, and the necessary mapping information that describes how data in the Amazon S3 object maps to columns in the resulting in-application table.
For conceptual information, see Configuring Application Input. For the limits on data sources you can add
to your application, see Limits.
Request Syntax
{
"ApplicationName": "string",
"CurrentApplicationVersionId": number,
"ReferenceDataSource": {
"ReferenceSchema": {
"RecordColumns": [
{
"Mapping": "string",
"Name": "string",
"SqlType": "string"
}
],
"RecordEncoding": "string",
"RecordFormat": {
"MappingParameters": {
"CSVMappingParameters": {
"RecordColumnDelimiter": "string",
"RecordRowDelimiter": "string"
},
"JSONMappingParameters": {
"RecordRowPath": "string"
}
},
"RecordFormatType": "string"
}
},
"S3ReferenceDataSource": {
"BucketARN": "string",
"FileKey": "string",
"ReferenceRoleARN": "string"
},
"TableName": "string"
}
}
Request Parameters
The request accepts the following data in JSON format.
ApplicationName (p. 200)
Name of an existing application.
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: Yes
CurrentApplicationVersionId (p. 200)
Version of the application for which you are adding the reference data source. You can use the
DescribeApplication operation to get the current application version. If the version specified is not
the current version, the ConcurrentModificationException is returned.
Type: Long
Required: Yes
ReferenceDataSource (p. 200)
The reference data source can be an object in your Amazon S3 bucket. Amazon Kinesis Analytics
reads the object and copies the data into the in-application table that is created. You provide an S3
bucket, object key name, and the resulting in-application table that is created. You must also provide
an IAM role with the necessary permissions that Amazon Kinesis Analytics can assume to read the
object from your S3 bucket on your behalf.
Required: Yes
Response Elements
If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.
Errors
ConcurrentModificationException
The request was rejected because a specified parameter is not supported or a specified resource is
not valid for this operation.
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
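A boto3 sketch of this call follows, using the request shape shown above. The bucket, object key, table name, column names, and ARNs are placeholders.

import boto3

client = boto3.client("kinesisanalytics")

client.add_application_reference_data_source(
    ApplicationName="ExampleApplication",
    CurrentApplicationVersionId=9,  # from DescribeApplication
    ReferenceDataSource={
        "TableName": "CompanyName",  # in-application reference table to create
        "S3ReferenceDataSource": {
            "BucketARN": "arn:aws:s3:::example-bucket",
            "FileKey": "TickerReference.csv",
            "ReferenceRoleARN": "arn:aws:iam::123456789012:role/service-role/kinesis-analytics-example",
        },
        "ReferenceSchema": {
            "RecordFormat": {
                "RecordFormatType": "CSV",
                "MappingParameters": {
                    "CSVMappingParameters": {"RecordRowDelimiter": "\n", "RecordColumnDelimiter": ","}
                },
            },
            "RecordEncoding": "UTF-8",
            "RecordColumns": [
                {"Name": "TICKER", "SqlType": "VARCHAR(8)"},
                {"Name": "COMPANY", "SqlType": "VARCHAR(64)"},
            ],
        },
    },
)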
CreateApplication
Note
This documentation is for version 1 of the Amazon Kinesis Data Analytics API, which only
supports SQL applications. Version 2 of the API supports SQL and Java applications. For more
information about version 2, see Amazon Kinesis Data Analytics API V2 Documentation.
Creates an Amazon Kinesis Analytics application. You can configure each application with one streaming
source as input, application code to process the input, and up to three destinations where you want
Amazon Kinesis Analytics to write the output data from your application. For an overview, see How it
Works.
In the input configuration, you map the streaming source to an in-application stream, which you
can think of as a constantly updating table. In the mapping, you must provide a schema for the in-
application stream and map each data column in the in-application stream to a data element in the
streaming source.
Your application code is one or more SQL statements that read input data, transform it, and generate
output. Your application code can create one or more SQL artifacts like SQL streams or pumps.
In the output configuration, you can configure the application to write data from in-application streams
created in your applications to up to three destinations.
To read data from your source stream or write data to destination streams, Amazon Kinesis Analytics
needs your permissions. You grant these permissions by creating IAM roles. This operation requires
permissions to perform the kinesisanalytics:CreateApplication action.
For introductory exercises to create an Amazon Kinesis Analytics application, see Getting Started.
Request Syntax
{
"ApplicationCode": "string",
"ApplicationDescription": "string",
"ApplicationName": "string",
"CloudWatchLoggingOptions": [
{
"LogStreamARN": "string",
"RoleARN": "string"
}
],
"Inputs": [
{
"InputParallelism": {
"Count": number
},
"InputProcessingConfiguration": {
"InputLambdaProcessor": {
"ResourceARN": "string",
"RoleARN": "string"
}
},
"InputSchema": {
"RecordColumns": [
{
"Mapping": "string",
"Name": "string",
"SqlType": "string"
}
],
"RecordEncoding": "string",
"RecordFormat": {
"MappingParameters": {
"CSVMappingParameters": {
"RecordColumnDelimiter": "string",
"RecordRowDelimiter": "string"
},
"JSONMappingParameters": {
"RecordRowPath": "string"
}
},
"RecordFormatType": "string"
}
},
"KinesisFirehoseInput": {
"ResourceARN": "string",
"RoleARN": "string"
},
"KinesisStreamsInput": {
"ResourceARN": "string",
"RoleARN": "string"
},
"NamePrefix": "string"
}
],
"Outputs": [
{
"DestinationSchema": {
"RecordFormatType": "string"
},
"KinesisFirehoseOutput": {
"ResourceARN": "string",
"RoleARN": "string"
},
"KinesisStreamsOutput": {
"ResourceARN": "string",
"RoleARN": "string"
},
"LambdaOutput": {
"ResourceARN": "string",
"RoleARN": "string"
},
"Name": "string"
}
],
"Tags": [
{
"Key": "string",
"Value": "string"
}
]
}
Request Parameters
The request accepts the following data in JSON format.
ApplicationCode (p. 203)
One or more SQL statements that read input data, transform it, and generate output. For example, you can write a SQL statement that reads data from one in-application stream, generates a running average of the number of advertisement clicks by vendor, and inserts the resulting rows into another in-application stream using pumps. For more information about the typical pattern, see Application Code.
You can provide a series of SQL statements, where the output of one statement can be used as the input for the next statement. You store intermediate results by creating in-application streams and pumps.
Note that the application code must create the streams with names specified in the Outputs.
For example, if your Outputs defines output streams named ExampleOutputStream1 and
ExampleOutputStream2, then your application code must create these streams.
Type: String
Required: No
ApplicationDescription (p. 203)
Type: String
Required: No
ApplicationName (p. 203)
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: Yes
CloudWatchLoggingOptions (p. 203)
Use this parameter to configure a CloudWatch log stream to monitor application configuration
errors. For more information, see Working with Amazon CloudWatch Logs.
Required: No
Inputs (p. 203)
You can configure your application to receive input from a single streaming source. In this
configuration, you map this streaming source to an in-application stream that is created. Your
application code can then query the in-application stream like a table (you can think of it as a
constantly updating table).
For the streaming source, you provide its Amazon Resource Name (ARN) and format of data on
the stream (for example, JSON, CSV, etc.). You also must provide an IAM role that Amazon Kinesis
Analytics can assume to read this stream on your behalf.
To create the in-application stream, you need to specify a schema to transform your data into a
schematized version used in SQL. In the schema, you provide the necessary mapping of the data
elements in the streaming source to record columns in the in-app stream.
Required: No
Outputs (p. 203)
You can configure application output to write data from any of the in-application streams to up to
three destinations.
These destinations can be Amazon Kinesis streams, Amazon Kinesis Firehose delivery streams, AWS
Lambda destinations, or any combination of the three.
In the configuration, you specify the in-application stream name, the destination stream or Lambda
function Amazon Resource Name (ARN), and the format to use when writing data. You must also
provide an IAM role that Amazon Kinesis Analytics can assume to write to the destination stream or
Lambda function on your behalf.
In the output configuration, you also provide the output stream or Lambda function ARN. For stream
destinations, you provide the format of data in the stream (for example, JSON, CSV). You also must
provide an IAM role that Amazon Kinesis Analytics can assume to write to the stream or Lambda
function on your behalf.
Required: No
Tags (p. 203)
A list of one or more tags to assign to the application. A tag is a key-value pair that identifies an
application. Note that the maximum number of application tags includes system tags. The maximum
number of user-defined application tags is 50. For more information, see Using Tagging.
Required: No
Response Syntax
{
"ApplicationSummary": {
"ApplicationARN": "string",
"ApplicationName": "string",
"ApplicationStatus": "string"
}
}
Response Elements
If the action is successful, the service sends back an HTTP 200 response.
In response to your CreateApplication request, Amazon Kinesis Analytics returns a response with
a summary of the application it created, including the application Amazon Resource Name (ARN),
name, and status.
Errors
CodeValidationException
User-provided application code (query) is invalid. This can be a simple syntax error.
TooManyTagsException
Application created with too many tags, or too many tags added to an application. Note that the maximum number of application tags includes system tags. The maximum number of user-defined application tags is 50.
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
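The following boto3 sketch creates a small application with one streaming source and simple pass-through SQL. The names, ARNs, schema, and SQL are placeholders for illustration only.

import boto3

client = boto3.client("kinesisanalytics")

response = client.create_application(
    ApplicationName="ExampleApplication",
    ApplicationDescription="Example SQL application",
    ApplicationCode=(
        'CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (ticker VARCHAR(8), price REAL);\n'
        'CREATE OR REPLACE PUMP "STREAM_PUMP" AS\n'
        '  INSERT INTO "DESTINATION_SQL_STREAM"\n'
        '    SELECT STREAM "TICKER", "PRICE" FROM "SOURCE_SQL_STREAM_001";\n'
    ),
    Inputs=[
        {
            "NamePrefix": "SOURCE_SQL_STREAM",
            "KinesisStreamsInput": {
                "ResourceARN": "arn:aws:kinesis:us-east-1:123456789012:stream/ExampleInputStream",
                "RoleARN": "arn:aws:iam::123456789012:role/service-role/kinesis-analytics-example",
            },
            "InputSchema": {
                "RecordFormat": {
                    "RecordFormatType": "JSON",
                    "MappingParameters": {"JSONMappingParameters": {"RecordRowPath": "$"}},
                },
                "RecordEncoding": "UTF-8",
                "RecordColumns": [
                    {"Name": "TICKER", "Mapping": "$.ticker", "SqlType": "VARCHAR(8)"},
                    {"Name": "PRICE", "Mapping": "$.price", "SqlType": "REAL"},
                ],
            },
        }
    ],
)

# The response summary includes the application ARN, name, and status.
print(response["ApplicationSummary"]["ApplicationARN"])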
DeleteApplication
Note
This documentation is for version 1 of the Amazon Kinesis Data Analytics API, which only
supports SQL applications. Version 2 of the API supports SQL and Java applications. For more
information about version 2, see Amazon Kinesis Data Analytics API V2 Documentation.
Deletes the specified application. Amazon Kinesis Analytics halts application execution and deletes the
application, including any application artifacts (such as in-application streams, reference table, and
application code).
Request Syntax
{
"ApplicationName": "string",
"CreateTimestamp": number
}
Request Parameters
The request accepts the following data in JSON format.
ApplicationName (p. 208)
Name of the Amazon Kinesis Analytics application to delete.
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: Yes
CreateTimestamp (p. 208)
Type: Timestamp
Required: Yes
Response Elements
If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.
Errors
ConcurrentModificationException
ResourceInUseException
The request was rejected because a specified parameter is not supported or a specified resource is
not valid for this operation.
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
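Because DeleteApplication requires the application's creation timestamp, a typical boto3 call first looks it up with DescribeApplication, as in the following sketch (the application name is a placeholder):

import boto3

client = boto3.client("kinesisanalytics")

detail = client.describe_application(ApplicationName="ExampleApplication")

client.delete_application(
    ApplicationName="ExampleApplication",
    CreateTimestamp=detail["ApplicationDetail"]["CreateTimestamp"],
)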
DeleteApplicationCloudWatchLoggingOption
Note
This documentation is for version 1 of the Amazon Kinesis Data Analytics API, which only
supports SQL applications. Version 2 of the API supports SQL and Java applications. For more
information about version 2, see Amazon Kinesis Data Analytics API V2 Documentation.
Deletes a CloudWatch log stream from an application. For more information about using CloudWatch log
streams with Amazon Kinesis Analytics applications, see Working with Amazon CloudWatch Logs.
Request Syntax
{
"ApplicationName": "string",
"CloudWatchLoggingOptionId": "string",
"CurrentApplicationVersionId": number
}
Request Parameters
The request accepts the following data in JSON format.
ApplicationName (p. 210)
The Kinesis Analytics application name.
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: Yes
CloudWatchLoggingOptionId (p. 210)
The CloudWatchLoggingOptionId of the CloudWatch logging option to delete. You can get the
CloudWatchLoggingOptionId by using the DescribeApplication operation.
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: Yes
CurrentApplicationVersionId (p. 210)
Type: Long
Required: Yes
Response Elements
If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.
Errors
ConcurrentModificationException
The request was rejected because a specified parameter is not supported or a specified resource is
not valid for this operation.
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
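The following boto3 sketch looks up the logging option ID and current version with DescribeApplication, then deletes the logging option. The application name is a placeholder, and the sketch assumes the application has exactly one logging option.

import boto3

client = boto3.client("kinesisanalytics")

detail = client.describe_application(ApplicationName="ExampleApplication")["ApplicationDetail"]

# Both the option ID and the current version come from DescribeApplication.
option_id = detail["CloudWatchLoggingOptionDescriptions"][0]["CloudWatchLoggingOptionId"]

client.delete_application_cloud_watch_logging_option(
    ApplicationName="ExampleApplication",
    CurrentApplicationVersionId=detail["ApplicationVersionId"],
    CloudWatchLoggingOptionId=option_id,
)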
DeleteApplicationInputProcessingConfiguration
Note
This documentation is for version 1 of the Amazon Kinesis Data Analytics API, which only
supports SQL applications. Version 2 of the API supports SQL and Java applications. For more
information about version 2, see Amazon Kinesis Data Analytics API V2 Documentation.
Deletes an InputProcessingConfiguration from an input.
Request Syntax
{
"ApplicationName": "string",
"CurrentApplicationVersionId": number,
"InputId": "string"
}
Request Parameters
The request accepts the following data in JSON format.
ApplicationName (p. 212)
The Kinesis Analytics application name.
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: Yes
CurrentApplicationVersionId (p. 212)
Type: Long
Required: Yes
InputId (p. 212)
The ID of the input configuration from which to delete the input processing configuration. You can
get a list of the input IDs for an application by using the DescribeApplication operation.
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: Yes
Response Elements
If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.
Errors
ConcurrentModificationException
The request was rejected because a specified parameter is not supported or a specified resource is
not valid for this operation.
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
DeleteApplicationOutput
Note
This documentation is for version 1 of the Amazon Kinesis Data Analytics API, which only
supports SQL applications. Version 2 of the API supports SQL and Java applications. For more
information about version 2, see Amazon Kinesis Data Analytics API V2 Documentation.
Deletes output destination configuration from your application configuration. Amazon Kinesis
Analytics will no longer write data from the corresponding in-application stream to the external output
destination.
Request Syntax
{
"ApplicationName": "string",
"CurrentApplicationVersionId": number,
"OutputId": "string"
}
Request Parameters
The request accepts the following data in JSON format.
ApplicationName (p. 214)
Amazon Kinesis Analytics application name.
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: Yes
CurrentApplicationVersionId (p. 214)
Amazon Kinesis Analytics application version. You can use the DescribeApplication operation
to get the current application version. If the version specified is not the current version, the
ConcurrentModificationException is returned.
Type: Long
Required: Yes
OutputId (p. 214)
The ID of the configuration to delete. Each output configuration that is added to the application,
either when the application is created or later using the AddApplicationOutput operation, has a
unique ID. You need to provide the ID to uniquely identify the output configuration that you want to
delete from the application configuration. You can use the DescribeApplication operation to get the
specific OutputId.
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: Yes
Response Elements
If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.
Errors
ConcurrentModificationException
The request was rejected because a specified parameter is not supported or a specified resource is
not valid for this operation.
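For illustration, a Boto3 sketch that looks up the output ID and current version with DescribeApplication and then deletes the output; ExampleApp is a placeholder name and at least one output is assumed to be configured.

import boto3

client = boto3.client("kinesisanalytics")

# Look up the current version and the first output's ID, then delete that output.
detail = client.describe_application(ApplicationName="ExampleApp")["ApplicationDetail"]
client.delete_application_output(
    ApplicationName="ExampleApp",
    CurrentApplicationVersionId=detail["ApplicationVersionId"],
    OutputId=detail["OutputDescriptions"][0]["OutputId"],
)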
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
DeleteApplicationReferenceDataSource
Note
This documentation is for version 1 of the Amazon Kinesis Data Analytics API, which only
supports SQL applications. Version 2 of the API supports SQL and Java applications. For more
information about version 2, see Amazon Kinesis Data Analytics API V2 Documentation.
Deletes a reference data source configuration from the specified application configuration.
If the application is running, Amazon Kinesis Analytics immediately removes the in-application table that
you created using the AddApplicationReferenceDataSource operation.
Request Syntax
{
"ApplicationName": "string",
"CurrentApplicationVersionId": number,
"ReferenceId": "string"
}
Request Parameters
The request accepts the following data in JSON format.
ApplicationName (p. 217)
Name of an existing application.
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: Yes
CurrentApplicationVersionId (p. 217)
Version of the application. You can use the DescribeApplication operation to get the
current application version. If the version specified is not the current version, the
ConcurrentModificationException is returned.
Type: Long
Required: Yes
ReferenceId (p. 217)
ID of the reference data source. When you add a reference data source to your application using
the AddApplicationReferenceDataSource, Amazon Kinesis Analytics assigns an ID. You can use the
DescribeApplication operation to get the reference ID.
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: Yes
Response Elements
If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.
Errors
ConcurrentModificationException
The request was rejected because a specified parameter is not supported or a specified resource is
not valid for this operation.
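A comparable Boto3 sketch for removing a reference data source; ExampleApp is a placeholder, a reference data source is assumed to be configured, and the reference ID is taken from DescribeApplication.

import boto3

client = boto3.client("kinesisanalytics")

detail = client.describe_application(ApplicationName="ExampleApp")["ApplicationDetail"]
reference_id = detail["ReferenceDataSourceDescriptions"][0]["ReferenceId"]

# Drop the in-application reference table backed by the S3 object.
client.delete_application_reference_data_source(
    ApplicationName="ExampleApp",
    CurrentApplicationVersionId=detail["ApplicationVersionId"],
    ReferenceId=reference_id,
)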
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
DescribeApplication
Note
This documentation is for version 1 of the Amazon Kinesis Data Analytics API, which only
supports SQL applications. Version 2 of the API supports SQL and Java applications. For more
information about version 2, see Amazon Kinesis Data Analytics API V2 Documentation.
Returns information about a specific Amazon Kinesis Analytics application.
If you want to retrieve a list of all applications in your account, use the ListApplications operation.
Request Syntax
{
"ApplicationName": "string"
}
Request Parameters
The request accepts the following data in JSON format.
ApplicationName (p. 219)
Name of the application.
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: Yes
Response Syntax
{
"ApplicationDetail": {
"ApplicationARN": "string",
"ApplicationCode": "string",
"ApplicationDescription": "string",
"ApplicationName": "string",
"ApplicationStatus": "string",
"ApplicationVersionId": number,
"CloudWatchLoggingOptionDescriptions": [
{
"CloudWatchLoggingOptionId": "string",
"LogStreamARN": "string",
"RoleARN": "string"
}
],
"CreateTimestamp": number,
"InputDescriptions": [
{
"InAppStreamNames": [ "string" ],
"InputId": "string",
"InputParallelism": {
"Count": number
},
"InputProcessingConfigurationDescription": {
"InputLambdaProcessorDescription": {
"ResourceARN": "string",
"RoleARN": "string"
}
},
"InputSchema": {
"RecordColumns": [
{
"Mapping": "string",
"Name": "string",
"SqlType": "string"
}
],
"RecordEncoding": "string",
"RecordFormat": {
"MappingParameters": {
"CSVMappingParameters": {
"RecordColumnDelimiter": "string",
"RecordRowDelimiter": "string"
},
"JSONMappingParameters": {
"RecordRowPath": "string"
}
},
"RecordFormatType": "string"
}
},
"InputStartingPositionConfiguration": {
"InputStartingPosition": "string"
},
"KinesisFirehoseInputDescription": {
"ResourceARN": "string",
"RoleARN": "string"
},
"KinesisStreamsInputDescription": {
"ResourceARN": "string",
"RoleARN": "string"
},
"NamePrefix": "string"
}
],
"LastUpdateTimestamp": number,
"OutputDescriptions": [
{
"DestinationSchema": {
"RecordFormatType": "string"
},
"KinesisFirehoseOutputDescription": {
"ResourceARN": "string",
"RoleARN": "string"
},
"KinesisStreamsOutputDescription": {
"ResourceARN": "string",
"RoleARN": "string"
},
"LambdaOutputDescription": {
"ResourceARN": "string",
"RoleARN": "string"
},
"Name": "string",
"OutputId": "string"
}
],
"ReferenceDataSourceDescriptions": [
{
"ReferenceId": "string",
"ReferenceSchema": {
"RecordColumns": [
{
"Mapping": "string",
"Name": "string",
"SqlType": "string"
}
],
"RecordEncoding": "string",
"RecordFormat": {
"MappingParameters": {
"CSVMappingParameters": {
"RecordColumnDelimiter": "string",
"RecordRowDelimiter": "string"
},
"JSONMappingParameters": {
"RecordRowPath": "string"
}
},
"RecordFormatType": "string"
}
},
"S3ReferenceDataSourceDescription": {
"BucketARN": "string",
"FileKey": "string",
"ReferenceRoleARN": "string"
},
"TableName": "string"
}
]
}
}
Response Elements
If the action is successful, the service sends back an HTTP 200 response.
ApplicationDetail (p. 221)
Provides a description of the application, such as the application Amazon Resource Name (ARN),
status, latest version, and input and output configuration details.
Errors
ResourceNotFoundException
The request was rejected because a specified parameter is not supported or a specified resource is
not valid for this operation.
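As a quick illustration, the Boto3 call below retrieves the detail record described above and prints a few fields; ExampleApp is a placeholder name.

import boto3

client = boto3.client("kinesisanalytics")

detail = client.describe_application(ApplicationName="ExampleApp")["ApplicationDetail"]

# A few commonly inspected fields from ApplicationDetail.
print(detail["ApplicationStatus"])      # e.g. READY or RUNNING
print(detail["ApplicationVersionId"])   # needed for update and delete operations
print([o["Name"] for o in detail.get("OutputDescriptions", [])])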
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
DiscoverInputSchema
Note
This documentation is for version 1 of the Amazon Kinesis Data Analytics API, which only
supports SQL applications. Version 2 of the API supports SQL and Java applications. For more
information about version 2, see Amazon Kinesis Data Analytics API V2 Documentation.
Infers a schema by evaluating sample records on the specified streaming source (Amazon Kinesis stream
or Amazon Kinesis Firehose delivery stream) or S3 object. In the response, the operation returns the
inferred schema and also the sample records that the operation used to infer the schema.
You can use the inferred schema when configuring a streaming source for your application. For
conceptual information, see Configuring Application Input. Note that when you create an application
using the Amazon Kinesis Analytics console, the console uses this operation to infer a schema and show
it in the console user interface.
Request Syntax
{
"InputProcessingConfiguration": {
"InputLambdaProcessor": {
"ResourceARN": "string",
"RoleARN": "string"
}
},
"InputStartingPositionConfiguration": {
"InputStartingPosition": "string"
},
"ResourceARN": "string",
"RoleARN": "string",
"S3Configuration": {
"BucketARN": "string",
"FileKey": "string",
"RoleARN": "string"
}
}
Request Parameters
The request accepts the following data in JSON format.
InputProcessingConfiguration (p. 223)
The InputProcessingConfiguration to use to preprocess the records before discovering the schema of
the records.
Required: No
InputStartingPositionConfiguration (p. 223)
Point at which you want Amazon Kinesis Analytics to start reading records from the specified
streaming source for discovery purposes.
Required: No
ResourceARN (p. 223)
Type: String
Pattern: arn:.*
Required: No
RoleARN (p. 223)
ARN of the IAM role that Amazon Kinesis Analytics can assume to access the stream on your behalf.
Type: String
Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
Required: No
S3Configuration (p. 223)
Required: No
Response Syntax
{
"InputSchema": {
"RecordColumns": [
{
"Mapping": "string",
"Name": "string",
"SqlType": "string"
}
],
"RecordEncoding": "string",
"RecordFormat": {
"MappingParameters": {
"CSVMappingParameters": {
"RecordColumnDelimiter": "string",
"RecordRowDelimiter": "string"
},
"JSONMappingParameters": {
"RecordRowPath": "string"
}
},
"RecordFormatType": "string"
}
},
"ParsedInputRecords": [
[ "string" ]
],
"ProcessedInputRecords": [ "string" ],
"RawInputRecords": [ "string" ]
Response Elements
If the action is successful, the service sends back an HTTP 200 response.
InputSchema (p. 225)
Schema inferred from the streaming source. It identifies the format of the data in the streaming
source and how each data element maps to corresponding columns in the in-application stream that
you can create.
ParsedInputRecords (p. 225)
An array of elements, where each element corresponds to a row in a stream record (a stream record
can have more than one row).
Errors
InvalidArgumentException
Discovery failed to get a record from the streaming source because of the Amazon Kinesis Streams
ProvisionedThroughputExceededException. For more information, see GetRecords in the Amazon
Kinesis Streams API Reference.
Data format is not valid. Amazon Kinesis Analytics is not able to detect schema for the given
streaming source.
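The following Boto3 sketch runs schema discovery against a Kinesis stream; the stream and role ARNs are placeholder values, and InputStartingPosition is set to NOW so sampling starts at the current end of the stream.

import boto3

client = boto3.client("kinesisanalytics")

resp = client.discover_input_schema(
    ResourceARN="arn:aws:kinesis:us-east-1:123456789012:stream/ExampleStream",  # placeholder
    RoleARN="arn:aws:iam::123456789012:role/ExampleAnalyticsRole",              # placeholder
    InputStartingPositionConfiguration={"InputStartingPosition": "NOW"},
)

# Inspect the inferred schema and the sampled records.
print(resp["InputSchema"]["RecordFormat"]["RecordFormatType"])
print(resp["ParsedInputRecords"][:2])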
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
ListApplications
Note
This documentation is for version 1 of the Amazon Kinesis Data Analytics API, which only
supports SQL applications. Version 2 of the API supports SQL and Java applications. For more
information about version 2, see Amazon Kinesis Data Analytics API V2 Documentation.
Returns a list of Amazon Kinesis Analytics applications in your account. For each application, the
response includes the application name, Amazon Resource Name (ARN), and status. If the response
returns the HasMoreApplications value as true, you can send another request by adding
ExclusiveStartApplicationName to the request body and setting its value to the last
application name from the previous response.
Request Syntax
{
"ExclusiveStartApplicationName": "string",
"Limit": number
}
Request Parameters
The request accepts the following data in JSON format.
ExclusiveStartApplicationName (p. 227)
Name of the application to start the list with. When using pagination to retrieve the list, you don't
need to specify this parameter in the first request. However, in subsequent requests, you add the last
application name from the previous response to get the next page of applications.
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: No
Limit (p. 227)
Type: Integer
Required: No
Response Syntax
{
"ApplicationSummaries": [
{
"ApplicationARN": "string",
"ApplicationName": "string",
"ApplicationStatus": "string"
}
],
"HasMoreApplications": boolean
}
Response Elements
If the action is successful, the service sends back an HTTP 200 response.
HasMoreApplications (p. 228)
Returns true if there are more applications to retrieve.
Type: Boolean
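Because the response is paginated with HasMoreApplications, callers typically loop; a minimal Boto3 sketch follows.

import boto3

client = boto3.client("kinesisanalytics")

summaries = []
kwargs = {"Limit": 50}
while True:
    page = client.list_applications(**kwargs)
    summaries.extend(page["ApplicationSummaries"])
    if not page["HasMoreApplications"] or not page["ApplicationSummaries"]:
        break
    # Continue after the last application name returned on this page.
    kwargs["ExclusiveStartApplicationName"] = page["ApplicationSummaries"][-1]["ApplicationName"]

print([s["ApplicationName"] for s in summaries])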
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
ListTagsForResource
Retrieves the list of key-value tags assigned to the application. For more information, see Using Tagging.
Request Syntax
{
"ResourceARN": "string"
}
Request Parameters
The request accepts the following data in JSON format.
ResourceARN (p. 229)
The ARN of the application for which to retrieve tags.
Type: String
Pattern: arn:aws:kinesisanalytics:[a-z]{2}-[a-z]+-\d{1}+:\d{12}+:application/
[a-zA-Z0-9_.-]{1,128}
Required: Yes
Response Syntax
{
"Tags": [
{
"Key": "string",
"Value": "string"
}
]
}
Response Elements
If the action is successful, the service sends back an HTTP 200 response.
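For illustration, the Boto3 call below lists the tags on an application; the ARN is a placeholder.

import boto3

client = boto3.client("kinesisanalytics")

arn = "arn:aws:kinesisanalytics:us-east-1:123456789012:application/ExampleApp"  # placeholder ARN
tags = client.list_tags_for_resource(ResourceARN=arn)["Tags"]

# Each entry is a {"Key": ..., "Value": ...} pair.
for tag in tags:
    print(tag["Key"], "=", tag["Value"])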
Errors
ConcurrentModificationException
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
StartApplication
Note
This documentation is for version 1 of the Amazon Kinesis Data Analytics API, which only
supports SQL applications. Version 2 of the API supports SQL and Java applications. For more
information about version 2, see Amazon Kinesis Data Analytics API V2 Documentation.
Starts the specified Amazon Kinesis Analytics application. After creating an application, you must
exclusively call this operation to start your application.
After the application starts, it begins consuming the input data, processes it, and writes the output to the
configured destination.
The application status must be READY for you to start an application. You can get the application status
in the console or using the DescribeApplication operation.
After you start the application, you can stop the application from processing the input by calling the
StopApplication operation.
Request Syntax
{
"ApplicationName": "string",
"InputConfigurations": [
{
"Id": "string",
"InputStartingPositionConfiguration": {
"InputStartingPosition": "string"
}
}
]
}
Request Parameters
The request accepts the following data in JSON format.
ApplicationName (p. 231)
Name of the application.
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: Yes
InputConfigurations (p. 231)
Identifies the specific input, by ID, that the application starts consuming. Amazon Kinesis Analytics
starts reading the streaming source associated with the input. You can also specify where in the
streaming source you want Amazon Kinesis Analytics to start reading.
Required: Yes
Response Elements
If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.
Errors
InvalidApplicationConfigurationException
The request was rejected because a specified parameter is not supported or a specified resource is
not valid for this operation.
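A minimal Boto3 sketch for starting an application whose status is READY; ExampleApp and the input ID are placeholders, and reading begins at the current end of the stream.

import boto3

client = boto3.client("kinesisanalytics")

client.start_application(
    ApplicationName="ExampleApp",
    InputConfigurations=[
        {
            "Id": "1.1",  # input ID from DescribeApplication
            "InputStartingPositionConfiguration": {"InputStartingPosition": "NOW"},
        }
    ],
)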
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
StopApplication
Note
This documentation is for version 1 of the Amazon Kinesis Data Analytics API, which only
supports SQL applications. Version 2 of the API supports SQL and Java applications. For more
information about version 2, see Amazon Kinesis Data Analytics API V2 Documentation.
Stops the application from processing input data. You can stop an application only if it is in the running
state. You can use the DescribeApplication operation to find the application state. After the application
is stopped, Amazon Kinesis Analytics stops reading data from the input, the application stops processing
data, and there is no output written to the destination.
Request Syntax
{
"ApplicationName": "string"
}
Request Parameters
The request accepts the following data in JSON format.
ApplicationName (p. 233)
Name of the running application to stop.
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: Yes
Response Elements
If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.
Errors
ResourceInUseException
The request was rejected because a specified parameter is not supported or a specified resource is
not valid for this operation.
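Stopping an application takes only the application name, as in this Boto3 sketch (ExampleApp is a placeholder).

import boto3

client = boto3.client("kinesisanalytics")

# Stops reading from the input and halts processing; no further output is written.
client.stop_application(ApplicationName="ExampleApp")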
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
TagResource
Adds one or more key-value tags to a Kinesis Analytics application. Note that the maximum number of
application tags includes system tags. The maximum number of user-defined application tags is 50. For
more information, see Using Tagging.
Request Syntax
{
"ResourceARN": "string",
"Tags": [
{
"Key": "string",
"Value": "string"
}
]
}
Request Parameters
The request accepts the following data in JSON format.
ResourceARN (p. 235)
The ARN of the application to assign the tags.
Type: String
Pattern: arn:aws:kinesisanalytics:[a-z]{2}-[a-z]+-\d{1}+:\d{12}+:application/
[a-zA-Z0-9_.-]{1,128}
Required: Yes
Tags (p. 235)
Required: Yes
Response Elements
If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.
Errors
ConcurrentModificationException
InvalidArgumentException
Application created with too many tags, or too many tags added to an application. Note that the
maximum number of application tags includes system tags. The maximum number of user-defined
application tags is 50.
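A short Boto3 sketch that adds two user-defined tags; the application ARN and tag values are placeholders.

import boto3

client = boto3.client("kinesisanalytics")

client.tag_resource(
    ResourceARN="arn:aws:kinesisanalytics:us-east-1:123456789012:application/ExampleApp",
    Tags=[
        {"Key": "Environment", "Value": "test"},
        {"Key": "Team", "Value": "analytics"},
    ],
)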
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
UntagResource
Removes one or more tags from a Kinesis Analytics application. For more information, see Using Tagging.
Request Syntax
{
"ResourceARN": "string",
"TagKeys": [ "string" ]
}
Request Parameters
The request accepts the following data in JSON format.
ResourceARN (p. 237)
The ARN of the Kinesis Analytics application from which to remove the tags.
Type: String
Pattern: arn:aws:kinesisanalytics:[a-z]{2}-[a-z]+-\d{1}+:\d{12}+:application/
[a-zA-Z0-9_.-]{1,128}
Required: Yes
TagKeys (p. 237)
Required: Yes
Response Elements
If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.
Errors
ConcurrentModificationException
ResourceInUseException
Application created with too many tags, or too many tags added to an application. Note that the
maximum number of application tags includes system tags. The maximum number of user-defined
application tags is 50.
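The corresponding Boto3 sketch for removing tags by key; the ARN and key names are placeholders.

import boto3

client = boto3.client("kinesisanalytics")

client.untag_resource(
    ResourceARN="arn:aws:kinesisanalytics:us-east-1:123456789012:application/ExampleApp",
    TagKeys=["Environment", "Team"],
)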
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
UpdateApplication
Note
This documentation is for version 1 of the Amazon Kinesis Data Analytics API, which only
supports SQL applications. Version 2 of the API supports SQL and Java applications. For more
information about version 2, see Amazon Kinesis Data Analytics API V2 Documentation.
Updates an existing Amazon Kinesis Analytics application. Using this API, you can update application
code, input configuration, and output configuration.
Note that Amazon Kinesis Analytics updates the CurrentApplicationVersionId each time you
update your application.
Request Syntax
{
"ApplicationName": "string",
"ApplicationUpdate": {
"ApplicationCodeUpdate": "string",
"CloudWatchLoggingOptionUpdates": [
{
"CloudWatchLoggingOptionId": "string",
"LogStreamARNUpdate": "string",
"RoleARNUpdate": "string"
}
],
"InputUpdates": [
{
"InputId": "string",
"InputParallelismUpdate": {
"CountUpdate": number
},
"InputProcessingConfigurationUpdate": {
"InputLambdaProcessorUpdate": {
"ResourceARNUpdate": "string",
"RoleARNUpdate": "string"
}
},
"InputSchemaUpdate": {
"RecordColumnUpdates": [
{
"Mapping": "string",
"Name": "string",
"SqlType": "string"
}
],
"RecordEncodingUpdate": "string",
"RecordFormatUpdate": {
"MappingParameters": {
"CSVMappingParameters": {
"RecordColumnDelimiter": "string",
"RecordRowDelimiter": "string"
},
"JSONMappingParameters": {
"RecordRowPath": "string"
}
},
"RecordFormatType": "string"
}
},
"KinesisFirehoseInputUpdate": {
"ResourceARNUpdate": "string",
"RoleARNUpdate": "string"
},
"KinesisStreamsInputUpdate": {
"ResourceARNUpdate": "string",
"RoleARNUpdate": "string"
},
"NamePrefixUpdate": "string"
}
],
"OutputUpdates": [
{
"DestinationSchemaUpdate": {
"RecordFormatType": "string"
},
"KinesisFirehoseOutputUpdate": {
"ResourceARNUpdate": "string",
"RoleARNUpdate": "string"
},
"KinesisStreamsOutputUpdate": {
"ResourceARNUpdate": "string",
"RoleARNUpdate": "string"
},
"LambdaOutputUpdate": {
"ResourceARNUpdate": "string",
"RoleARNUpdate": "string"
},
"NameUpdate": "string",
"OutputId": "string"
}
],
"ReferenceDataSourceUpdates": [
{
"ReferenceId": "string",
"ReferenceSchemaUpdate": {
"RecordColumns": [
{
"Mapping": "string",
"Name": "string",
"SqlType": "string"
}
],
"RecordEncoding": "string",
"RecordFormat": {
"MappingParameters": {
"CSVMappingParameters": {
"RecordColumnDelimiter": "string",
"RecordRowDelimiter": "string"
},
"JSONMappingParameters": {
"RecordRowPath": "string"
}
},
"RecordFormatType": "string"
}
},
"S3ReferenceDataSourceUpdate": {
"BucketARNUpdate": "string",
"FileKeyUpdate": "string",
"ReferenceRoleARNUpdate": "string"
},
"TableNameUpdate": "string"
}
]
},
"CurrentApplicationVersionId": number
Request Parameters
The request accepts the following data in JSON format.
ApplicationName (p. 239)
Name of the Amazon Kinesis Analytics application to update.
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: Yes
ApplicationUpdate (p. 239)
Required: Yes
CurrentApplicationVersionId (p. 239)
The current application version ID. You can use the DescribeApplication operation to get this value.
Type: Long
Required: Yes
Response Elements
If the action is successful, the service sends back an HTTP 200 response with an empty HTTP body.
Errors
CodeValidationException
User-provided application code (query) is invalid. This can be a simple syntax error.
ResourceInUseException
The request was rejected because a specified parameter is not supported or a specified resource is
not valid for this operation.
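Because every update must carry the current version ID, a typical Boto3 call first reads it from DescribeApplication, as in this sketch; ExampleApp and the SQL shown are placeholders.

import boto3

client = boto3.client("kinesisanalytics")

detail = client.describe_application(ApplicationName="ExampleApp")["ApplicationDetail"]

# Replace the application code; Kinesis Analytics increments the version ID on success.
client.update_application(
    ApplicationName="ExampleApp",
    CurrentApplicationVersionId=detail["ApplicationVersionId"],
    ApplicationUpdate={
        "ApplicationCodeUpdate": 'CREATE OR REPLACE STREAM "DEST_STREAM" (ticker VARCHAR(4));'
    },
)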
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
Data Types
The following data types are supported:
ApplicationDetail
Note
This documentation is for version 1 of the Amazon Kinesis Data Analytics API, which only
supports SQL applications. Version 2 of the API supports SQL and Java applications. For more
information about version 2, see Amazon Kinesis Data Analytics API V2 Documentation.
Provides a description of the application, including the application Amazon Resource Name (ARN), status,
latest version, and input and output configuration.
Contents
ApplicationARN
Type: String
Pattern: arn:.*
Required: Yes
ApplicationCode
Returns the application code that you provided to perform data analysis on any of the in-application
streams in your application.
Type: String
Required: No
ApplicationDescription
Type: String
Required: No
ApplicationName
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: Yes
ApplicationStatus
Type: String
Required: Yes
ApplicationVersionId
Type: Long
Required: Yes
CloudWatchLoggingOptionDescriptions
Describes the CloudWatch log streams that are configured to receive application messages. For more
information about using CloudWatch log streams with Amazon Kinesis Analytics applications, see
Working with Amazon CloudWatch Logs.
Required: No
CreateTimestamp
Type: Timestamp
Required: No
InputDescriptions
Describes the application input configuration. For more information, see Configuring Application
Input.
Required: No
LastUpdateTimestamp
Type: Timestamp
Required: No
OutputDescriptions
Describes the application output configuration. For more information, see Configuring Application
Output.
Required: No
ReferenceDataSourceDescriptions
Describes reference data sources configured for the application. For more information, see
Configuring Application Input.
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
ApplicationSummary
Note
This documentation is for version 1 of the Amazon Kinesis Data Analytics API, which only
supports SQL applications. Version 2 of the API supports SQL and Java applications. For more
information about version 2, see Amazon Kinesis Data Analytics API V2 Documentation.
Provides application summary information, including the application Amazon Resource Name (ARN),
name, and status.
Contents
ApplicationARN
Type: String
Pattern: arn:.*
Required: Yes
ApplicationName
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: Yes
ApplicationStatus
Type: String
Required: Yes
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
ApplicationUpdate
Describes updates to apply to an existing Amazon Kinesis Analytics application.
Contents
ApplicationCodeUpdate
Type: String
Required: No
CloudWatchLoggingOptionUpdates
Required: No
InputUpdates
Required: No
OutputUpdates
Required: No
ReferenceDataSourceUpdates
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
CloudWatchLoggingOption
Provides a description of CloudWatch logging options, including the log stream Amazon Resource Name
(ARN) and the role ARN.
Contents
LogStreamARN
Type: String
Pattern: arn:.*
Required: Yes
RoleARN
IAM ARN of the role to use to send application messages. Note: To write application messages to
CloudWatch, the IAM role that is used must have the PutLogEvents policy action enabled.
Type: String
Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
Required: Yes
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
CloudWatchLoggingOptionDescription
Description of the CloudWatch logging option.
Contents
CloudWatchLoggingOptionId
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: No
LogStreamARN
Type: String
Pattern: arn:.*
Required: Yes
RoleARN
IAM ARN of the role to use to send application messages. Note: To write application messages to
CloudWatch, the IAM role used must have the PutLogEvents policy action enabled.
Type: String
Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
Required: Yes
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
CloudWatchLoggingOptionUpdate
Describes CloudWatch logging option updates.
Contents
CloudWatchLoggingOptionId
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: Yes
LogStreamARNUpdate
Type: String
Pattern: arn:.*
Required: No
RoleARNUpdate
IAM ARN of the role to use to send application messages. Note: To write application messages to
CloudWatch, the IAM role used must have the PutLogEvents policy action enabled.
Type: String
Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
CSVMappingParameters
Provides additional mapping information when the record format uses delimiters, such as CSV. For
example, the following sample records use CSV format, where the records use the '\n' as the row
delimiter and a comma (",") as the column delimiter:
"name1", "address1"
"name2", "address2"
Contents
RecordColumnDelimiter
Column delimiter. For example, in a CSV format, a comma (",") is the typical column delimiter.
Type: String
Required: Yes
RecordRowDelimiter
Row delimiter. For example, in a CSV format, '\n' is the typical row delimiter.
Type: String
Required: Yes
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
DestinationSchema
Describes the data format when records are written to the destination. For more information, see
Configuring Application Output.
Contents
RecordFormatType
Type: String
Required: Yes
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
Input
When you configure the application input, you specify the streaming source, the in-application stream
name that is created, and the mapping between the two. For more information, see Configuring
Application Input.
Contents
InputParallelism
Required: No
InputProcessingConfiguration
The InputProcessingConfiguration for the input. An input processor transforms records as they are
received from the stream, before the application's SQL code executes. Currently, the only input
processing configuration available is InputLambdaProcessor.
Required: No
InputSchema
Describes the format of the data in the streaming source, and how each data element maps to
corresponding columns in the in-application stream that is being created.
Required: Yes
KinesisFirehoseInput
If the streaming source is an Amazon Kinesis Firehose delivery stream, identifies the delivery
stream's ARN and an IAM role that enables Amazon Kinesis Analytics to access the stream on your
behalf.
Required: No
KinesisStreamsInput
If the streaming source is an Amazon Kinesis stream, identifies the stream's Amazon Resource Name
(ARN) and an IAM role that enables Amazon Kinesis Analytics to access the stream on your behalf.
Required: No
NamePrefix
Name prefix to use when creating an in-application stream. Suppose that you specify a
prefix "MyInApplicationStream." Amazon Kinesis Analytics then creates one or more (as
per the InputParallelism count you specified) in-application streams with names
"MyInApplicationStream_001," "MyInApplicationStream_002," and so on.
Type: String
Required: Yes
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
InputConfiguration
When you start your application, you provide this configuration, which identifies the input source and the
point in the input source at which you want the application to start processing records.
Contents
Id
Input source ID. You can get this ID by calling the DescribeApplication operation.
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: Yes
InputStartingPositionConfiguration
Point at which you want the application to start processing records from the streaming source.
Required: Yes
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
InputDescription
Describes the application input configuration. For more information, see Configuring Application Input.
Contents
InAppStreamNames
Returns the in-application stream names that are mapped to the stream source.
Required: No
InputId
Input ID associated with the application input. This is the ID that Amazon Kinesis Analytics assigns to
each input configuration you add to your application.
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: No
InputParallelism
Describes the configured parallelism (number of in-application streams mapped to the streaming
source).
Required: No
InputProcessingConfigurationDescription
The description of the preprocessor that executes on records in this input before the application's
code is run.
Required: No
InputSchema
Describes the format of the data in the streaming source, and how each data element maps to
corresponding columns in the in-application stream that is being created.
Required: No
InputStartingPositionConfiguration
Point at which the application is configured to read from the input stream.
Required: No
KinesisFirehoseInputDescription
If an Amazon Kinesis Firehose delivery stream is configured as a streaming source, provides the
delivery stream's ARN and an IAM role that enables Amazon Kinesis Analytics to access the stream
on your behalf.
Required: No
KinesisStreamsInputDescription
If an Amazon Kinesis stream is configured as the streaming source, provides the Amazon Kinesis
stream's Amazon Resource Name (ARN) and an IAM role that enables Amazon Kinesis Analytics to
access the stream on your behalf.
Required: No
NamePrefix
Type: String
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
InputLambdaProcessor
An object that contains the Amazon Resource Name (ARN) of the AWS Lambda function that is used to
preprocess records in the stream, and the ARN of the IAM role that is used to access the AWS Lambda
function.
Contents
ResourceARN
The ARN of the AWS Lambda function that operates on records in the stream.
Note
To specify an earlier version of the Lambda function than the latest, include the Lambda
function version in the Lambda function ARN. For more information about Lambda ARNs,
see Example ARNs: AWS Lambda
Type: String
Pattern: arn:.*
Required: Yes
RoleARN
The ARN of the IAM role that is used to access the AWS Lambda function.
Type: String
Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
Required: Yes
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
InputLambdaProcessorDescription
An object that contains the Amazon Resource Name (ARN) of the AWS Lambda function that is used to
preprocess records in the stream, and the ARN of the IAM role that is used to access the AWS Lambda
function.
Contents
ResourceARN
The ARN of the AWS Lambda function that is used to preprocess the records in the stream.
Type: String
Pattern: arn:.*
Required: No
RoleARN
The ARN of the IAM role that is used to access the AWS Lambda function.
Type: String
Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
InputLambdaProcessorUpdate
Represents an update to the InputLambdaProcessor that is used to preprocess the records in the stream.
Contents
ResourceARNUpdate
The Amazon Resource Name (ARN) of the new AWS Lambda function that is used to preprocess the
records in the stream.
Note
To specify an earlier version of the Lambda function than the latest, include the Lambda
function version in the Lambda function ARN. For more information about Lambda ARNs,
see Example ARNs: AWS Lambda
Type: String
Pattern: arn:.*
Required: No
RoleARNUpdate
The ARN of the new IAM role that is used to access the AWS Lambda function.
Type: String
Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
InputParallelism
Describes the number of in-application streams to create for a given streaming source. For information
about parallelism, see Configuring Application Input.
Contents
Count
Type: Integer
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
InputParallelismUpdate
Provides updates to the parallelism count.
Contents
CountUpdate
Type: Integer
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
InputProcessingConfiguration
Provides a description of a processor that is used to preprocess the records in the stream before being
processed by your application code. Currently, the only input processor available is AWS Lambda.
Contents
InputLambdaProcessor
The InputLambdaProcessor that is used to preprocess the records in the stream before being
processed by your application code.
Required: Yes
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
InputProcessingConfigurationDescription
Provides configuration information about an input processor. Currently, the only input processor
available is AWS Lambda.
Contents
InputLambdaProcessorDescription
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
InputProcessingConfigurationUpdate
Describes updates to an InputProcessingConfiguration.
Contents
InputLambdaProcessorUpdate
Required: Yes
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
InputSchemaUpdate
Describes updates for the application's input schema.
Contents
RecordColumnUpdates
A list of RecordColumn objects. Each object describes the mapping of the streaming source element
to the corresponding column in the in-application stream.
Required: No
RecordEncodingUpdate
Specifies the encoding of the records in the streaming source. For example, UTF-8.
Type: String
Pattern: UTF-8
Required: No
RecordFormatUpdate
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
InputStartingPositionConfiguration
Describes the point at which the application reads from the streaming source.
Contents
InputStartingPosition
Type: String
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
InputUpdate
Describes updates to a specific input configuration (identified by the InputId of an application).
Contents
InputId
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: Yes
InputParallelismUpdate
Describes the parallelism updates (the number of in-application streams Amazon Kinesis Analytics
creates for the specific streaming source).
Required: No
InputProcessingConfigurationUpdate
Required: No
InputSchemaUpdate
Describes the data format on the streaming source, and how record elements on the streaming
source map to columns of the in-application stream that is created.
Required: No
KinesisFirehoseInputUpdate
If an Amazon Kinesis Firehose delivery stream is the streaming source to be updated, provides an
updated stream ARN and IAM role ARN.
Required: No
KinesisStreamsInputUpdate
If an Amazon Kinesis stream is the streaming source to be updated, provides an updated stream
Amazon Resource Name (ARN) and IAM role ARN.
Required: No
NamePrefixUpdate
Name prefix for in-application streams that Amazon Kinesis Analytics creates for the specific
streaming source.
Type: String
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
JSONMappingParameters
Provides additional mapping information when JSON is the record format on the streaming source.
Contents
RecordRowPath
Type: String
Required: Yes
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
KinesisFirehoseInput
Identifies an Amazon Kinesis Firehose delivery stream as the streaming source. You provide the delivery
stream's Amazon Resource Name (ARN) and an IAM role ARN that enables Amazon Kinesis Analytics to
access the stream on your behalf.
Contents
ResourceARN
Type: String
Pattern: arn:.*
Required: Yes
RoleARN
ARN of the IAM role that Amazon Kinesis Analytics can assume to access the stream on your behalf.
You need to make sure that the role has the necessary permissions to access the stream.
Type: String
Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
Required: Yes
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
KinesisFirehoseInputDescription
Describes the Amazon Kinesis Firehose delivery stream that is configured as the streaming source in the
application input configuration.
Contents
ResourceARN
Amazon Resource Name (ARN) of the Amazon Kinesis Firehose delivery stream.
Type: String
Pattern: arn:.*
Required: No
RoleARN
ARN of the IAM role that Amazon Kinesis Analytics assumes to access the stream.
Type: String
Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
KinesisFirehoseInputUpdate
When updating application input configuration, provides information about an Amazon Kinesis Firehose
delivery stream as the streaming source.
Contents
ResourceARNUpdate
Amazon Resource Name (ARN) of the input Amazon Kinesis Firehose delivery stream to read.
Type: String
Pattern: arn:.*
Required: No
RoleARNUpdate
ARN of the IAM role that Amazon Kinesis Analytics can assume to access the stream on your behalf.
You need to grant the necessary permissions to this role.
Type: String
Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
KinesisFirehoseOutput
When configuring application output, identifies an Amazon Kinesis Firehose delivery stream as the
destination. You provide the stream Amazon Resource Name (ARN) and an IAM role that enables Amazon
Kinesis Analytics to write to the stream on your behalf.
Contents
ResourceARN
ARN of the destination Amazon Kinesis Firehose delivery stream to write to.
Type: String
Pattern: arn:.*
Required: Yes
RoleARN
ARN of the IAM role that Amazon Kinesis Analytics can assume to write to the destination stream on
your behalf. You need to grant the necessary permissions to this role.
Type: String
Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
Required: Yes
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
KinesisFirehoseOutputDescription
For an application output, describes the Amazon Kinesis Firehose delivery stream configured as its
destination.
Contents
ResourceARN
Amazon Resource Name (ARN) of the Amazon Kinesis Firehose delivery stream.
Type: String
Pattern: arn:.*
Required: No
RoleARN
ARN of the IAM role that Amazon Kinesis Analytics can assume to access the stream.
Type: String
Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
KinesisFirehoseOutputUpdate
When updating an output configuration using the UpdateApplication operation, provides information
about an Amazon Kinesis Firehose delivery stream configured as the destination.
Contents
ResourceARNUpdate
Amazon Resource Name (ARN) of the Amazon Kinesis Firehose delivery stream to write to.
Type: String
Pattern: arn:.*
Required: No
RoleARNUpdate
ARN of the IAM role that Amazon Kinesis Analytics can assume to access the stream on your behalf.
You need to grant the necessary permissions to this role.
Type: String
Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
KinesisStreamsInput
Identifies an Amazon Kinesis stream as the streaming source. You provide the stream's Amazon Resource
Name (ARN) and an IAM role ARN that enables Amazon Kinesis Analytics to access the stream on your
behalf.
Contents
ResourceARN
Type: String
Pattern: arn:.*
Required: Yes
RoleARN
ARN of the IAM role that Amazon Kinesis Analytics can assume to access the stream on your behalf.
You need to grant the necessary permissions to this role.
Type: String
Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
Required: Yes
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
KinesisStreamsInputDescription
Describes the Amazon Kinesis stream that is configured as the streaming source in the application input
configuration.
Contents
ResourceARN
Type: String
Pattern: arn:.*
Required: No
RoleARN
ARN of the IAM role that Amazon Kinesis Analytics can assume to access the stream.
Type: String
Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
KinesisStreamsInputUpdate
When updating application input configuration, provides information about an Amazon Kinesis stream as
the streaming source.
Contents
ResourceARNUpdate
Amazon Resource Name (ARN) of the input Amazon Kinesis stream to read.
Type: String
Pattern: arn:.*
Required: No
RoleARNUpdate
ARN of the IAM role that Amazon Kinesis Analytics can assume to access the stream on your behalf.
You need to grant the necessary permissions to this role.
Type: String
Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
KinesisStreamsOutput
When configuring application output, identifies an Amazon Kinesis stream as the destination. You
provide the stream Amazon Resource Name (ARN) and also an IAM role ARN that Amazon Kinesis
Analytics can use to write to the stream on your behalf.
Contents
ResourceARN
Type: String
Pattern: arn:.*
Required: Yes
RoleARN
ARN of the IAM role that Amazon Kinesis Analytics can assume to write to the destination stream on
your behalf. You need to grant the necessary permissions to this role.
Type: String
Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
Required: Yes
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
KinesisStreamsOutputDescription
For an application output, describes the Amazon Kinesis stream configured as its destination.
Contents
ResourceARN
Type: String
Pattern: arn:.*
Required: No
RoleARN
ARN of the IAM role that Amazon Kinesis Analytics can assume to access the stream.
Type: String
Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
KinesisStreamsOutputUpdate
When updating an output configuration using the UpdateApplication operation, provides information
about an Amazon Kinesis stream configured as the destination.
Contents
ResourceARNUpdate
Amazon Resource Name (ARN) of the Amazon Kinesis stream where you want to write the output.
Type: String
Pattern: arn:.*
Required: No
RoleARNUpdate
ARN of the IAM role that Amazon Kinesis Analytics can assume to access the stream on your behalf.
You need to grant the necessary permissions to this role.
Type: String
Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
LambdaOutput
When configuring application output, identifies an AWS Lambda function as the destination. You provide
the function Amazon Resource Name (ARN) and also an IAM role ARN that Amazon Kinesis Analytics can
use to write to the function on your behalf.
Contents
ResourceARN
Amazon Resource Name (ARN) of the destination Lambda function to write to.
Note
To specify an earlier version of the Lambda function than the latest, include the Lambda
function version in the Lambda function ARN. For more information about Lambda ARNs,
see Example ARNs: AWS Lambda
Type: String
Pattern: arn:.*
Required: Yes
RoleARN
ARN of the IAM role that Amazon Kinesis Analytics can assume to write to the destination function
on your behalf. You need to grant the necessary permissions to this role.
Type: String
Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
Required: Yes
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
LambdaOutputDescription
For an application output, describes the AWS Lambda function configured as its destination.
Contents
ResourceARN
Type: String
Pattern: arn:.*
Required: No
RoleARN
ARN of the IAM role that Amazon Kinesis Analytics can assume to write to the destination function.
Type: String
Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
LambdaOutputUpdate
When updating an output configuration using the UpdateApplication operation, provides information
about an AWS Lambda function configured as the destination.
Contents
ResourceARNUpdate
Type: String
Pattern: arn:.*
Required: No
RoleARNUpdate
ARN of the IAM role that Amazon Kinesis Analytics can assume to write to the destination function
on your behalf. You need to grant the necessary permissions to this role.
Type: String
Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
MappingParameters
When configuring application input at the time of creating or updating an application, provides
additional mapping information specific to the record format (such as JSON, CSV, or record fields
delimited by some delimiter) on the streaming source.
Contents
CSVMappingParameters
Provides additional mapping information when the record format uses delimiters (for example, CSV).
Required: No
JSONMappingParameters
Provides additional mapping information when JSON is the record format on the streaming source.
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
Output
Describes application output configuration in which you identify an in-application stream and a
destination where you want the in-application stream data to be written. The destination can be an
Amazon Kinesis stream or an Amazon Kinesis Firehose delivery stream.
For limits on how many destinations an application can write to, and other limitations, see Limits.
Contents
DestinationSchema
Describes the data format when records are written to the destination. For more information, see
Configuring Application Output.
Required: Yes
KinesisFirehoseOutput
Required: No
KinesisStreamsOutput
Required: No
LambdaOutput
Required: No
Name
Type: String
Required: Yes
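The following sketch, again using Boto3, shows an Output object that targets an Amazon Kinesis stream. The names, version ID, and ARNs are placeholders.

import boto3

kinesisanalytics = boto3.client("kinesisanalytics")

# Exactly one of KinesisStreamsOutput, KinesisFirehoseOutput, or LambdaOutput
# is supplied for a given output configuration.
kinesisanalytics.add_application_output(
    ApplicationName="example-app",
    CurrentApplicationVersionId=3,
    Output={
        "Name": "DESTINATION_SQL_STREAM",   # in-application stream to read from
        "KinesisStreamsOutput": {
            "ResourceARN": "arn:aws:kinesis:us-east-1:111122223333:stream/example-output-stream",
            "RoleARN": "arn:aws:iam::111122223333:role/example-kinesis-analytics-role",
        },
        "DestinationSchema": {"RecordFormatType": "CSV"},
    },
)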
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
OutputDescription
Describes the application output configuration, which includes the in-application stream name and the
destination where the stream data is written. The destination can be an Amazon Kinesis stream, an
Amazon Kinesis Firehose delivery stream, or an AWS Lambda function.
Contents
DestinationSchema
Required: No
KinesisFirehoseOutputDescription
Describes the Amazon Kinesis Firehose delivery stream configured as the destination where output is
written.
Required: No
KinesisStreamsOutputDescription
Describes the Amazon Kinesis stream configured as the destination where output is written.
Required: No
LambdaOutputDescription
Describes the AWS Lambda function configured as the destination where output is written.
Required: No
Name
Type: String
Required: No
OutputId
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: No
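OutputDescription objects are returned, for example, by the DescribeApplication operation. The following Boto3 sketch lists each configured destination; the application name is a placeholder.

import boto3

kinesisanalytics = boto3.client("kinesisanalytics")

detail = kinesisanalytics.describe_application(
    ApplicationName="example-app"
)["ApplicationDetail"]

# Each entry is an OutputDescription; OutputId is needed later for updates.
for output in detail.get("OutputDescriptions", []):
    print(output["OutputId"], output.get("Name"))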
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
OutputUpdate
Describes updates to the output configuration identified by the OutputId.
Contents
DestinationSchemaUpdate
Describes the data format when records are written to the destination. For more information, see
Configuring Application Output.
Required: No
KinesisFirehoseOutputUpdate
Describes an Amazon Kinesis Firehose delivery stream as the destination for the output.
Required: No
KinesisStreamsOutputUpdate
Required: No
LambdaOutputUpdate
Required: No
NameUpdate
If you want to specify a different in-application stream for this output configuration, use this field to
specify the new in-application stream name.
Type: String
Required: No
OutputId
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: Yes
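The following sketch shows an OutputUpdate passed to the UpdateApplication operation to repoint an existing output configuration at a different Lambda function. The OutputId would come from DescribeApplication; all values shown are placeholders.

import boto3

kinesisanalytics = boto3.client("kinesisanalytics")

kinesisanalytics.update_application(
    ApplicationName="example-app",
    CurrentApplicationVersionId=5,
    ApplicationUpdate={
        "OutputUpdates": [
            {
                "OutputId": "1.1",  # obtained from DescribeApplication
                "LambdaOutputUpdate": {
                    "ResourceARNUpdate": "arn:aws:lambda:us-east-1:111122223333:function:example-function:2",
                    "RoleARNUpdate": "arn:aws:iam::111122223333:role/example-kinesis-analytics-role",
                },
                "DestinationSchemaUpdate": {"RecordFormatType": "JSON"},
            }
        ]
    },
)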
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
RecordColumn
Describes the mapping of each data element in the streaming source to the corresponding column in the
in-application stream.
Contents
Mapping
Reference to the data element in the streaming input or the reference data source. This element is
required if the RecordFormatType is JSON.
Type: String
Required: No
Name
Name of the column created in the in-application input stream or reference table.
Type: String
Required: Yes
SqlType
Type: String
Required: Yes
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
RecordFormat
Describes the record format and relevant mapping information that should be applied to schematize the
records on the stream.
Contents
MappingParameters
When configuring application input at the time of creating or updating an application, provides
additional mapping information specific to the record format (such as JSON, CSV, or record fields
delimited by some delimiter) on the streaming source.
Required: No
RecordFormatType
Type: String
Required: Yes
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
ReferenceDataSource
Describes the reference data source by providing the source information (S3 bucket name and object
key name), the resulting in-application table name that is created, and the necessary schema to map the
data elements in the Amazon S3 object to the in-application table.
Contents
ReferenceSchema
Describes the format of the data in the streaming source, and how each data element maps to
corresponding columns created in the in-application stream.
Required: Yes
S3ReferenceDataSource
Identifies the S3 bucket and object that contains the reference data. Also identifies the IAM
role Amazon Kinesis Analytics can assume to read this object on your behalf. An Amazon
Kinesis Analytics application loads reference data only once. If the data changes, you call the
UpdateApplication operation to trigger reloading of data into your application.
Required: No
TableName
Type: String
Required: Yes
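The following Boto3 sketch adds a reference data source with the AddApplicationReferenceDataSource operation. The bucket, object key, role ARN, and column definitions are placeholder assumptions for a small CSV object.

import boto3

kinesisanalytics = boto3.client("kinesisanalytics")

kinesisanalytics.add_application_reference_data_source(
    ApplicationName="example-app",
    CurrentApplicationVersionId=2,
    ReferenceDataSource={
        "TableName": "COMPANY_NAMES",  # in-application table that is created
        "S3ReferenceDataSource": {
            "BucketARN": "arn:aws:s3:::example-bucket",
            "FileKey": "reference/company-names.csv",
            "ReferenceRoleARN": "arn:aws:iam::111122223333:role/example-kinesis-analytics-role",
        },
        "ReferenceSchema": {
            "RecordFormat": {
                "RecordFormatType": "CSV",
                "MappingParameters": {
                    "CSVMappingParameters": {
                        "RecordRowDelimiter": "\n",
                        "RecordColumnDelimiter": ",",
                    }
                },
            },
            "RecordEncoding": "UTF-8",
            "RecordColumns": [
                {"Name": "TICKER", "SqlType": "VARCHAR(4)"},
                {"Name": "COMPANY", "SqlType": "VARCHAR(64)"},
            ],
        },
    },
)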
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
ReferenceDataSourceDescription
Describes the reference data source configured for an application.
Contents
ReferenceId
ID of the reference data source. This is the ID that Amazon Kinesis Analytics assigns when you
add the reference data source to your application using the AddApplicationReferenceDataSource
operation.
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: Yes
ReferenceSchema
Describes the format of the data in the streaming source, and how each data element maps to
corresponding columns created in the in-application stream.
Required: No
S3ReferenceDataSourceDescription
Provides the S3 bucket name and the object key name that contains the reference data. It also provides
the Amazon Resource Name (ARN) of the IAM role that Amazon Kinesis Analytics can assume to read
the Amazon S3 object and populate the in-application reference table.
Required: Yes
TableName
The in-application table name created by the specific reference data source configuration.
Type: String
Required: Yes
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
ReferenceDataSourceUpdate
When you update a reference data source configuration for an application, this object provides all the
updated values (such as the source bucket name and object key name), the in-application table name
that is created, and updated mapping information that maps the data in the Amazon S3 object to the
in-application reference table that is created.
Contents
ReferenceId
ID of the reference data source being updated. You can use the DescribeApplication operation to get
this value.
Type: String
Pattern: [a-zA-Z0-9_.-]+
Required: Yes
ReferenceSchemaUpdate
Describes the format of the data in the streaming source, and how each data element maps to
corresponding columns created in the in-application stream.
Required: No
S3ReferenceDataSourceUpdate
Describes the S3 bucket name, object key name, and IAM role that Amazon Kinesis Analytics can
assume to read the Amazon S3 object on your behalf and populate the in-application reference
table.
Required: No
TableNameUpdate
Type: String
Required: No
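The following sketch uses the UpdateApplication operation to point an existing reference data source at a new object key. The ReferenceId and other values are placeholders; the ReferenceId would come from DescribeApplication.

import boto3

kinesisanalytics = boto3.client("kinesisanalytics")

kinesisanalytics.update_application(
    ApplicationName="example-app",
    CurrentApplicationVersionId=4,
    ApplicationUpdate={
        "ReferenceDataSourceUpdates": [
            {
                "ReferenceId": "1.1",  # obtained from DescribeApplication
                "S3ReferenceDataSourceUpdate": {
                    "FileKeyUpdate": "reference/company-names-2019.csv"
                },
            }
        ]
    },
)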
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
S3Configuration
Provides a description of an Amazon S3 data source, including the Amazon Resource Name (ARN) of the
S3 bucket, the ARN of the IAM role that is used to access the bucket, and the name of the Amazon S3
object that contains the data.
Contents
BucketARN
Type: String
Pattern: arn:.*
Required: Yes
FileKey
Type: String
Required: Yes
RoleARN
Type: String
Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
Required: Yes
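S3Configuration is used with the DiscoverInputSchema operation when you run schema discovery against static data in Amazon S3. A minimal Boto3 sketch with placeholder ARNs and object key:

import boto3

kinesisanalytics = boto3.client("kinesisanalytics")

response = kinesisanalytics.discover_input_schema(
    S3Configuration={
        "RoleARN": "arn:aws:iam::111122223333:role/example-kinesis-analytics-role",
        "BucketARN": "arn:aws:s3:::example-bucket",
        "FileKey": "sample-data/records.json",
    }
)

# The inferred schema can be reviewed and then reused as an input schema.
print(response["InputSchema"])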
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
S3ReferenceDataSource
Identifies the S3 bucket and object that contains the reference data. Also identifies the IAM role Amazon
Kinesis Analytics can assume to read this object on your behalf.
An Amazon Kinesis Analytics application loads reference data only once. If the data changes, you call the
UpdateApplication operation to trigger reloading of data into your application.
Contents
BucketARN
Type: String
Pattern: arn:.*
Required: Yes
FileKey
Type: String
Required: Yes
ReferenceRoleARN
ARN of the IAM role that the service can assume to read data on your behalf. This role must have
permission for the s3:GetObject action on the object and a trust policy that allows the Amazon Kinesis
Analytics service principal to assume this role.
Type: String
Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
Required: Yes
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
S3ReferenceDataSourceDescription
Provides the bucket name and object key name that stores the reference data.
Contents
BucketARN
Type: String
Pattern: arn:.*
Required: Yes
FileKey
Type: String
Required: Yes
ReferenceRoleARN
ARN of the IAM role that Amazon Kinesis Analytics can assume to read the Amazon S3 object on
your behalf to populate the in-application reference table.
Type: String
Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
Required: Yes
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
S3ReferenceDataSourceUpdate
Describes the S3 bucket name, object key name, and IAM role that Amazon Kinesis Analytics can assume
to read the Amazon S3 object on your behalf and populate the in-application reference table.
Contents
BucketARNUpdate
Type: String
Pattern: arn:.*
Required: No
FileKeyUpdate
Type: String
Required: No
ReferenceRoleARNUpdate
ARN of the IAM role that Amazon Kinesis Analytics can assume to read the Amazon S3 object and
populate the in-application reference table.
Type: String
Pattern: arn:aws:iam::\d{12}:role/?[a-zA-Z_0-9+=,.@\-_/]+
Required: No
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
SourceSchema
Describes the format of the data in the streaming source, and how each data element maps to
corresponding columns created in the in-application stream.
Contents
RecordColumns
Required: Yes
RecordEncoding
Specifies the encoding of the records in the streaming source. For example, UTF-8.
Type: String
Pattern: UTF-8
Required: No
RecordFormat
Required: Yes
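The following sketch shows a complete SourceSchema (RecordFormat, RecordEncoding, and RecordColumns) supplied as the InputSchema when adding a streaming input with the AddApplicationInput operation. The entries correspond to the RecordFormat and RecordColumn types described earlier; the stream ARN, role, and column mappings are placeholder assumptions for a JSON source.

import boto3

kinesisanalytics = boto3.client("kinesisanalytics")

kinesisanalytics.add_application_input(
    ApplicationName="example-app",
    CurrentApplicationVersionId=1,
    Input={
        "NamePrefix": "SOURCE_SQL_STREAM",
        "KinesisStreamsInput": {
            "ResourceARN": "arn:aws:kinesis:us-east-1:111122223333:stream/example-input-stream",
            "RoleARN": "arn:aws:iam::111122223333:role/example-kinesis-analytics-role",
        },
        "InputSchema": {                       # a SourceSchema
            "RecordFormat": {
                "RecordFormatType": "JSON",
                "MappingParameters": {
                    "JSONMappingParameters": {"RecordRowPath": "$"}
                },
            },
            "RecordEncoding": "UTF-8",
            "RecordColumns": [
                {"Name": "TICKER", "Mapping": "$.ticker", "SqlType": "VARCHAR(4)"},
                {"Name": "PRICE", "Mapping": "$.price", "SqlType": "REAL"},
            ],
        },
    },
)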
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
Tag
A key-value pair (the value is optional) that you can define and assign to AWS resources. If you specify a
tag that already exists, the tag value is replaced with the value that you specify in the request. Note that
the maximum number of application tags includes system tags. The maximum number of user-defined
application tags is 50. For more information, see Using Tagging.
Contents
Key
Type: String
Required: Yes
Value
Type: String
Required: No
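The following sketch applies tags to an existing application with the TagResource operation; the application ARN and tag values are placeholders.

import boto3

kinesisanalytics = boto3.client("kinesisanalytics")

kinesisanalytics.tag_resource(
    ResourceARN="arn:aws:kinesisanalytics:us-east-1:111122223333:application/example-app",
    Tags=[
        {"Key": "Environment", "Value": "test"},
        {"Key": "Team"},  # Value is optional
    ],
)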
See Also
For more information about using this API in one of the language-specific AWS SDKs, see the following:
Document History
The following entries describe important changes to the documentation.

Logging Kinesis Data Analytics API Calls with AWS CloudTrail (March 22, 2019)
Amazon Kinesis Data Analytics is integrated with AWS CloudTrail, a service that provides a record of actions taken by a user, role, or an AWS service in Kinesis Data Analytics. For more information, see Using AWS CloudTrail (p. 162).

Kinesis Data Analytics available in Frankfurt Region (July 18, 2018)
Kinesis Data Analytics is now available in the Europe (Frankfurt) Region. For more information, see AWS Regions and Endpoints: Kinesis Data Analytics.

Use reference data in the console (July 13, 2018)
You can now work with application reference data in the console. For more information, see Example: Adding Reference Data to a Kinesis Data Analytics Application (p. 118).

Increase in size of returned rows and SQL code (May 2, 2018)
The limit for the size of a returned row is increased to 512 KB, and the limit for the size of the SQL code in an application is increased to 100 KB. For more information, see Limits (p. 165).

AWS Lambda function examples in Java and .NET (March 22, 2018)
Code samples for creating Lambda functions for preprocessing records and for application destinations. For more information, see Creating Lambda Functions for Preprocessing (p. 26) and Creating Lambda Functions for Application Destinations (p. 38).

New HOTSPOTS function (March 19, 2018)
Locate and return information about relatively dense regions in your data. For more information, see Example: Detecting Hotspots on a Stream (HOTSPOTS Function) (p. 131).

Schema discovery on static data (October 6, 2017)
Run schema discovery on static data stored in an Amazon S3 bucket. For more information, see Using the Schema Discovery Feature on Static Data (p. 18).

Auto scaling applications (September 13, 2017)
Automatically increase the data throughput of your application with auto scaling. For more information, see Automatically Scaling Applications to Increase Throughput (p. 42).

Guide to using the AWS Management Console for Kinesis Data Analytics (April 7, 2017)
Edit an inferred schema and SQL code using the schema editor and SQL editor in the Kinesis Data Analytics console. For more information, see Step 4 (Optional) Edit the Schema and SQL Code Using the Console (p. 58).
AWS Glossary
For the latest AWS terminology, see the AWS Glossary in the AWS General Reference.