[DRAFT] The DataWeave Book
## Acknowledgements
My family
Amanda Pearo
Cyril Thornton
David Wang
William Gradin
Mariano De Achaval
Ana Felisatti
Leandro Shoki
Sabrina Marechal
Jordan Schuetz
Meg Durcan
Alex Mendoza
Bera Aksoy
Aaron Lieberman
Manik Magar
John Callahan
Patryk Bandurski
Ernie Maldonado
## About Me
<text>
# Overview
This section will give an overview of the DataWeave language. By the end of the section you
will be able to describe what the DataWeave language is and what problems it was designed to
solve. You will also be able to describe the high-level components of the runtime and how they
work together to solve integration problems.
## What is DataWeave?
DataWeave is a functional programming language designed to quickly create efficient data
transformations with fewer bugs. It is a free-to-use programming language built by MuleSoft,
and is the primary transformation and expression language of the Mule 4 Runtime. DataWeave
takes the messy concerns of serializing and deserializing data and turns them into an
implementation detail of the language, allowing the developer to devote more energy to creating
a correct data transformation. It uses a small number of data types, simple syntax, and powerful
functions to transform data to and from a variety of different data formats like JSON, XML and
CSV. Just like Java, Scala, Groovy, and the Mule Runtime as a whole, DataWeave runs on the
battle-tested JVM.
## Why DataWeave?
In the niche of software integration, creating and testing data transformations is a daily concern.
While DataWeave excels at enabling developers to create complex transformations with a
relatively small amount of code, that’s only part of why it is so valuable. Let’s dig into some of
the pain points around transformation to help illustrate what DataWeave does that makes it so
helpful to integration developers.
1. A “pre-transform” phase where data is parsed into a program’s data structures (typically
done with a library). This is known as deserialization.
Serialization allows languages like DataWeave to create data in a format that other programs
know how to work with. Formats like JSON and XML allow for programs written with different
programming languages to exchange data. Without serialization, programs cannot effectively
communicate with other programs.
Most languages have libraries for serializing and deserializing the most common data formats
like JSON, XML, and CSV. For example, in the case of Java, developers often use third-party
libraries like Jackson to parse JSON data into Java objects. The Java developer designs classes
that mirror the JSON data being consumed by the system. This design includes the individual
fields of the JSON, their expected types as they relate to Java types, as well as how the overall
structure of the JSON relates to Java Collection types like List and Map. Jackson then uses those
classes to create Java objects from the JSON data (deserialization). Once the data has been
transformed and is ready to be transported out of the system, it must be serialized. In this
example, the Java object might be serialized into a byte[] representing JSON, then sent over the
network to a remote system.
While these libraries shield the developer from the complexity of parsing and serializing
different data formats, they are still left with the mundane work of building the appropriate
representation of the data in their language and library of choice. But what library should the
developer use? Does the developer need to work with XML in Java? Then they must choose
between SAX, DOM, StAX, or JAXB. Does the developer need to work with JSON in Java?
They’ll need to pick between Jackson, Gson, json-io, Genson, and others. Each of these libraries
has its own unique set of APIs, types, and performance tradeoffs that a developer must
understand before making an informed decision.
At this point you might be thinking it would be great if there were a single library that could
efficiently (de)serialize nearly any data format you throw at it. This way, developers would only
need to understand one library.
## Execution Model
DataWeave solves the problem of developers needing to understand multiple (de)serialization
libraries by making (de)serialization an implementation detail of the language. The developer
only needs to tell DataWeave what data format to expect, and DataWeave automatically takes
care of parsing incoming data into its own data types that can be easily queried and transformed
using a diverse and powerful set of selectors, operators, and functions. It does the same thing
when it’s time to write data out into a particular data format. You tell DataWeave what you want
and it takes care of the rest.
How does DataWeave do this? DataWeave uses a Reader and a Writer that are integral to the
language and its design.
Data comes into DataWeave and hits the Reader first. The Reader takes instructions from the
DataWeave script and deserializes the data into DataWeave values based on those instructions.
The DataWeave script then queries and transforms the input to create output. This output is then
sent to the Writer, which uses the instructions from the DataWeave script to serialize the
DataWeave values into the desired output data format.
Here’s an example script that takes in an XML payload and writes out a JSON payload:
```
%dw 2.0
input payload application/xml
output application/json
---
payload
```
# Intro Concepts
This chapter will cover most of the basic concepts necessary to understand simple DataWeave
scripts. By the end of the chapter you should have a firm grasp of the following:
```
%dw 2.0
input payload application/xml
output application/json
fun greeting(name) =
  "Hello, " ++ name
```
The first line declares that this script is for DataWeave 2.0. There is another version of
DataWeave, 1.0, that ships with the Mule 3.7 and later 3.x Runtimes.
The second line declares that the script expects its payload to be XML.
The third line declares that this script will serialize its output as JSON.
After the output declaration is an import declaration. This is how you can use other DataWeave
code within your script.
Below the import line and above the triple dash (---) are a couple of other declarations. In this case,
the script is creating a function called “greeting”, and a variable called “obj”.
Everything above the triple dash (---) is referred to as the header of the script. This is the area of
the script where all the declarations are made. The input and output data formats, imports,
variable declarations and function declarations are all done in the header. DataWeave does not
enforce a particular order, but the common convention is to list your declarations in the
following order if it makes sense to do so:
1. Version
2. Input (if needed)
3. Output
4. Imports
5. Functions
6. Variables
Note that while DataWeave does not enforce a particular order, it does evaluate the header
sequentially, meaning you must declare variables, functions, etc., before you use them later in the
header. For example, the following would fail because the variable “x” is used before it’s
declared:
```
%dw 2.0
output application/json
var y = 1 + x
var x = 2
---
y
```
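Declaring x before y resolves the problem. Here is the same script with the two declarations swapped. It should output 3:

```
%dw 2.0
output application/json
// x is declared first, so y can reference it
var x = 2
var y = 1 + x
---
y
```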
In a DataWeave script, everything below the triple dash (---) is called the body. The body
defines the output of the DataWeave script.
You may see DataWeave scripts that only have a header and contain no input or output
declarations:
```
%dw 2.0
fun greeting(salutation, location) =
  salutation ++ ", " ++ location
```
These are not executable DataWeave scripts, but are instead modules that can be imported and
used by executable scripts. We will learn about modules in a later chapter.
DataWeave has a core set of types that enable it to represent most common data formats, and
even a few uncommon ones. We can break these types into two categories: scalar types that
represent a single value, and vector types that represent a collection of values. DataWeave’s main
scalar types should be familiar if you’ve worked with other programming languages before. They
are:
1. String
2. Number
3. Boolean
4. The date and time types (Date, DateTime, LocalDateTime, Time, LocalTime, and the types for periods of time)
DataWeave’s vector types are:
1. Object
2. Array
### Strings
Strings represent a series of characters or text. They are created by surrounding text in a pair of
double or single quotes, just like quotes for a character speaking in a novel:
```
var s1 = "I am a string."
var s2 = 'I am also a string.'
```
The choice between double and single quotes should depend on whether or not the String itself
contains double or single quotes. Single quotes within a single-quoted String need to be escaped.
Escaping a character in a String just tells DataWeave to treat that character in a special way. If
you escape a single quote in a single-quoted String, that informs DataWeave that particular single
quote is not the end of the String, and should be treated like the rest of the characters. The same
goes for double quotes. However, if a String is created using double quotes, it does not need to
escape single quotes, and vice versa:
```
var s1 = "I'm a string."
var s2 = 'He said, "I am also a string."'
```
If we take the case above and use single quotes for “s1” and double quotes for “s2”, we would
need to escape the quotes within the String itself with a backslash:
```
var s1 = 'I\'m a string.'
var s2 = "He said, \"I am also a string.\""
```
Escaping quotes within Strings is important because it allows developers to have double quotes
in double-quoted Strings while still allowing DataWeave to accurately determine where the
String starts and ends.
Just because you can use double or single quotes for Strings does not mean you should use
whatever your gut tells you at the time. Be consistent. Pick double or single and stick with it,
only deviating from your choice when you can reap the benefits of not having to escape quotes
within your String.
### Numbers
Numbers in DataWeave are represented like so:
```
var int = 1
var float = 6.67
var eNotation = 6.67E-10
```
Floating point numbers and integers alike are both represented by the Number type in
DataWeave. The language also supports e-notation to represent very small and very large
numbers. By default, when performing mathematical operations, DataWeave will coerce a String
operand to a Number, and the result keeps the precision of the decimal operand:
```
6.6700 * "2"
// Returns: 13.3400
```
I personally find this implicit coercion troublesome. In the interest of creating code that is easily
understood and explicit about its intent, I generally advise against relying on it unless
necessary. This book will cover techniques you can use when writing your own functions to help
guard against this kind of coercion.
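As a small preview, one way to avoid depending on the implicit conversion is to convert the String yourself with the as operator. That way, anyone reading the script can see that a conversion happens. The variable quantity below is hypothetical and stands in for a numeric value that arrived as a String:

```
%dw 2.0
output application/json
// Hypothetical input: a number that arrived as a String
var quantity = "2"
---
// Convert explicitly instead of letting the * operator coerce the String
6.6700 * (quantity as Number)
```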
### Booleans
The two Boolean values in DataWeave are represented by true and false.
```
var booleanTrue = true
var booleanFalse = false
```
DataWeave does not try to coerce non-Boolean values to be true or false. For example, some
languages will interpret the number 0 as being false, and any non-zero number as being true
(looking at you, JavaScript). DataWeave makes no attempt at doing any type coercion to
Boolean.
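In practice, this means conditions must be written as explicit Boolean expressions. Here is a minimal sketch, where itemCount is a hypothetical value:

```
%dw 2.0
output application/json
var itemCount = 0
---
// Writing `if (itemCount)` would fail rather than treat 0 as false,
// so the comparison is spelled out explicitly
if (itemCount > 0) "has items" else "empty"
```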
### Dates and Times
The Date type represents a particular day on the calendar, and does not contain any information
about time. To create a Date, use the format |yyyy-MM-dd|.
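For example, here is July 26, 2020, the same date used in the examples that follow:

```
var date = |2020-07-26|
```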
The DateTime type represents a specific instant in time. DateTime values are always
associated with a time zone. This is because without a time zone, a date and time combination
could represent multiple instants in time (e.g., New Year’s Day in Denver and London begins
at different points on a fixed timeline, even though the clock time is the same). Time is
specified in HH:mm:ss format, and the time zone is specified at the end in +/-HH:mm
format:
```
var datetime = |2020-07-26T15:32:16-08:00|
```
The LocalDateTime type is just like a DateTime type but without a timezone:
```
var localdatetime = |2020-07-26T15:32:16|
```
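If you do know which time zone a LocalDateTime belongs to, you can attach it with the ++ operator. The ++ operator combines a LocalDateTime and a TimeZone into a DateTime. The -08:00 offset here is only an illustrative choice:

```
%dw 2.0
output application/json
var localdatetime = |2020-07-26T15:32:16|
---
// LocalDateTime ++ TimeZone produces a DateTime with the given offset
localdatetime ++ |-08:00|
```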
It’s important to always question yourself when deciding to use LocalDateTime instead of
DateTime. They look extremely similar, but represent two subtly different things. DateTimes
represent an exact instant in global time, like when an order was placed, or when a user last
logged in to their Twitter account. They represent a singular, unambiguous point on a timeline
because they have a date component, a time component, and a time zone component.
LocalDateTimes, on the other hand, do not have the precision to represent a particular instant in
time. They are ambiguous unless they come with an associated location or a time zone that could
be reasonably assumed. For example, the date and time printed on an invitation to a wedding in
Hawaii would be a LocalDateTime. They are called LocalDateTimes because they can only be
reasoned about as instants in time within the locality to which they apply. LocalDateTimes only
have a date component and a time component; they do not have a time zone component.
Unfortunately, LocalDateTimes tend to be easier to use in the short term because they relieve the
developer of having to think about time zones.
However, providing a client with an ambiguous instant in time instead of a specific one can be
Times are like DateTimes but without a date component. Just like DateTimes, they must contain
a time zone component. However, because they lack a date, they cannot represent a particular
instant in time, just a particular time of day.
```
var time = |15:32:16-08:00|
```
LocalTimes are like LocalDateTimes but without a date component. Just like LocalDateTimes,
they do not contain a time zone component, and therefore can only be reasoned about within the
locality they apply to. If you invite a colleague for lunch at 12:00, that’s a LocalTime.
```
var localtime = |15:32:16|
```
Whereas all of the previous date and time types represent an instant in time (either
specifically or ambiguously), the next set of types represent an amount of time. When
representing an amount of time, it’s important to remember there are two different ways to think
of a period of time: we can think of it as time-based (referred to as a Duration) or date-based
(referred to as a DatePeriod). The difference, again, is subtle. A Duration is a specific amount of
time. It is the amount of time between two points on a fixed timeline. On the other hand, a
DatePeriod has ambiguity as to how much time actually passes during the period. This is because
DatePeriods like “2 months” are dependent on a particular starting date to determine the precise
amount of time. January and February have a different number of days than March and April do.
January and February 2020 have a different number of total days than January and February
2019 because of how leap years affect the number of days in February. It is because of this subtle
ambiguity that DataWeave has the concepts of Durations, which represent a precise amount of
time, and DatePeriods, which represent a more ambiguous and informal amount of time.
DatePeriods start with a P and use Y, M and D to represent years, months, and days. Here is a
period of 1 year, 2 months, and 3 days:
```
var period = |P1Y2M3D|
```
Durations start with PT, and use H, M, and S to represent hours, minutes, and seconds. Here’s a
duration of 1 hour, 2 minutes, and 3 seconds:
```
var duration = |PT1H2M3S|
```
DatePeriods and Durations can be applied to the above date and time types with the + and -
operators:
```
var futureTime = |12:00:00| + |PT1H2M3S|
```
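The same operators work with the other date and time types. The script below is a small sketch that adds and subtracts periods of time from a few of the literals introduced earlier:

```
%dw 2.0
output application/json
---
{
  // A DatePeriod added to a Date
  twoMonthsLater: |2020-01-01| + |P2M|,
  // A Duration added to a DateTime
  anHourLater: |2020-07-26T15:32:16-08:00| + |PT1H|,
  // A DatePeriod subtracted from a Date
  aYearEarlier: |2020-07-26| - |P1Y|
}
```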
### Objects
Objects are a collection of key:value pairs. The Keys typically represent some sort of label for
the value. Here’s an example of a simple Object:
```
var object = {
name: "Jerney.io",
age: 29
}
```
One thing you might notice right away is that DataWeave’s Objects look a lot like JSON objects.
At first glance, the only difference is that DataWeave does not require quotes around Object
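keys. Like JSON objects, Objects in DataWeave support multiple value types within a single
Object. In the example above, the Object contains both a String and a Number. In computer
science terms, this means that DataWeave supports heterogeneous Objects. This is in contrast to
homogeneous collections, such as a Java Map<String, Integer>, in which every value must share a
single type. Objects can also contain other Objects: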
```
var nestedObject = {
company: {
name: "Josh",
age: 29
}
}
```
DataWeave’s Object type has two interesting qualities that probably make it different from other
programming languages that you’re used to. Its Keys support Attributes, and Namespaces. Let’s
look at Attributes first.
```
var object = {
name @(id: 1, loc: "Denver"): "Jerney.io"
}
```
If you’re following along in the DataWeave Playground and you output object as
application/json, you might notice that your Attributes do not show up in the output data. We’ll
cover why this happens in more detail in the MIME Types vs data types section, but for now it
will suffice to say that Key Attributes exist in DataWeave for the purposes of XML support and
are stripped for some data formats like application/json. Go ahead and change your output
declaration to be “application/xml”, and you will see the Attributes displayed in the output:
```
%dw 2.0
output application/xml
```
Knowing that DataWeave supports Attributes for XML support, it shouldn’t be surprising to hear
that DataWeave supports Namespaces for the same reason. Namespaces can be declared in the
header and can be used on Keys like this:
```
ns data https://www.data.com
var object = {
data#name: "Jerney.io"
}
```
Notice that there is no equals sign in the namespace declaration and that the namespace itself is
not quoted. This is important; a DataWeave script will not run if it does not adhere to these rules
for declaring namespaces. Just like above, if you’re following along with the DataWeave
Playground make sure you change your output type to “application/xml”.
Objects in DataWeave support one other helpful feature: conditional elements. With conditional
elements, we can provide DataWeave with a test to execute to determine if a key:value pair is
included in the created Object. Here’s an example:
```
var object = {
(name: "Jerney.io") if (0 > 1),
(location: "Denver, CO") if (1 == 1)
}
```
In this example, the output Object would not contain the “name” key, but it would include the
“location” key. The parentheses around the if conditions are optional, but the parentheses
around the key:value pairs are required.
DataWeave uses this same concept for multiple aspects of the language. For example, Attributes
can also be created conditionally:
```
var object = {
name @(
(id: 1) if (0 > 1),
loc: "Denver"
): "Jerney.io"
}
```
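In real scripts, the condition usually depends on data rather than on a comparison between constants. Here is a sketch, where includeLocation is a hypothetical flag (for example, one derived from a query parameter):

```
%dw 2.0
output application/json
// Hypothetical flag controlling whether "location" is included
var includeLocation = true
---
{
  name: "Jerney.io",
  (location: "Denver, CO") if (includeLocation)
}
```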
### Arrays
Arrays might be DataWeave’s most prolific and powerful data type. I really did save the best for
last. Like DataWeave’s Objects, Arrays are heterogeneous, meaning they support multiple
different types within the same Array. Here’s an example of an Array:
```
var array = [
"foo",
2,
|2020-01-01|,
[ { foo: "bar" } ]
]
```
Compared to Objects, Arrays don’t have any unique properties like Attributes and Namespaces;
they merely represent an ordered collection of items. Like Objects, however, Arrays support
conditional elements:
```
var array = [
("foo") if (0 > 1),
("bar") if (1 == 1)
]
```
# Reader/Writer Concepts
TODO: INTRO
- MIME types, data types, and how DataWeave glues them together
- How to manipulate the Reader and Writer to handle variations of different data formats
  (e.g., reading CSV data without headers, writing XML data without a declaration)
TODO: INTRO
A crucial aspect of DataWeave that is often overlooked is the relationship between the major
MIME types (application/json, application/xml, and application/csv) and DataWeave’s own data
types (e.g., Array, Object, etc.). This section will explain how MIME types and data types are
translated to and from each other by the DataWeave Reader and Writer. It will also cover the
application/dw MIME type.
Before beginning, let’s describe what a MIME type is. There’s a lengthy formal specification
from the IETF, but for our purposes, Mozilla’s succinct definition will do:
> “[A MIME type is] a standard that indicates the nature and format of a document, file, or
assortment of bytes.”[1]
In other words, MIME types are metadata (i.e., data describing other data) that describe the
format of a piece of data. If a piece of data has a MIME type of application/json, a program
knows to interpret it as JSON.
Why do MIME types exist? Remember that computers are incredibly stupid. They only do
exactly what a program instructs them to do. The graphic below represents a stream of characters
as they may be received by a program with the leftmost character being the first, and the
rightmost character being the last.
You, a human with an incredibly sophisticated cognition, a knack for pattern recognition, and
knowledge about the JSON data format, can immediately identify that this was probably meant
to be a JSON object. However, to a computer with no contextual information this is just, at best,
an ordered collection of bytes. The computer cannot infer any meaning beyond that. The job of
the computer in this environment is not to infer; its job is to do exactly what you tell it to do. It
can’t make out that “foo” and “bar” are a key:value pair because it doesn’t have any idea what
constitutes a key:value pair. It doesn’t know what the curly brackets mean, or that there is an
error here because the opening curly bracket is not eventually succeeded by a closing curly
bracket.
MIME types give computers the context they need to make sense out of data like this. When you
tell a program that it should expect application/json and it receives this, it now has the tools it
needs to identify that this is not a properly formatted JSON document. Equipped with
information about the MIME type of the data, the program can now be more useful.
What about data types? Wikipedia has a good definition for data types:
> “In computer science and computer programming, a data type... is an attribute of data which
tells the compiler or interpreter how the programmer intends to use the data”[2]
Like MIME types, data types are metadata (i.e., data that describes other data). Unlike MIME
types, data types are bound to a particular compiler or interpreter. Said another way, data types
only have relevance within the context of a particular programming language environment.
Now that you have an understanding of what MIME types and data types are, how do they apply
to DataWeave? To understand this, we’ll need to take a closer look at the Reader and the Writer.
Here’s a modification of the diagram from earlier:
Notice that MIME types only apply when it comes to data outside of the DataWeave script, and
that data types only apply to data inside the DataWeave script. This is the mental model you need
when reasoning about how DataWeave works.
If data types only exist within a DataWeave script, and MIME types only apply outside of a
script, and DataWeave’s claim to fame is that it can transform data of many different MIME
types, then there must be some kind of translation between MIME types, and data types. That
concern is delegated to the Reader and Writer. They effectively act as translators from MIME
type to data type and back again.
And so on...
While the application/dw MIME type will likely never be used as the output format of your Mule
application, it is still important to know, particularly when debugging scripts. The majority of
this section is used to describe how DataWeave maps between external MIME types and its
internal data types, but when it comes to application/dw, there isn’t any mapping that needs to
take place; just look at the translation table again.
```
%dw 2.0
output application/dw
ns jerney https://www.jerney.io
```
The only visible differences between a DataWeave script and a piece of data in the application/dw
MIME type is that application/dw does not contain:
1. Unevaluated code - All code in the script is evaluated before it is sent to the Writer.*
2. Variable & function declarations - These are evaluated and used if needed. This
information is not sent to the Writer.
3. Input/Output declarations - These declarations are stripped. They do not have
significance outside of an executable DataWeave script.
* This is not entirely true. The application/dw format does allow unevaluated code, but it will
never be sent out of the Writer. If you send unevaluated code through the Reader, it is eventually
evaluated by the script before it is sent to the Writer.
The application/dw format may or may not contain a triple dash (---) with a header. If the output
contains no namespaces, it will not contain a header.
Data that adheres to the application/dw MIME type will always contain a clearly visible
representation of the structure of the data, including language features like Attributes and
Namespaces. This might not seem useful at first, but consider some actions the Writer might take
in an attempt to coerce DataWeave data types into compliance with a specific MIME type:
Using the application/dw as your output type is perfect for times when you want to minimize the
involvement of the Writer and see exactly how DataWeave is representing the output of the script
using its own syntax.
Put another way, application/dw is the MIME type that glues all the other MIME types together
so that DataWeave can work with them.
If you’re ever curious about how DataWeave is representing a particular piece of data internally,
just make it the output of the script and set the output MIME type to application/dw. When
working with Mule, you’ll likely be doing this with the payload of the message:
```
%dw 2.0
output application/dw
---
payload
```
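To see what the Writer would otherwise strip, try a small script whose output has an Attribute. With application/dw, the Attribute remains visible in the output. Change the output declaration to application/json and the Attribute disappears:

```
%dw 2.0
output application/dw
---
{
  root: {
    person @(id: 1): "jerney"
  }
}
```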
JSON’s translation table is very simple because of how closely related DataWeave’s syntax is to
JSON. JSON objects translate to DataWeave’s Objects, arrays translate to Arrays, and so on. The
symbols they use to represent objects, arrays, strings, and numbers are almost exactly the same.
The biggest discrepancy between the two is that DataWeave has direct support for temporal types
like Date, DateTime, etc., and JSON does not. The Writer merely translates these temporal values
to JSON strings.
Attribute N/A
Namespace N/A
JSON is more flexible compared to other formats because even something like a lone string or
number with no containing array or object is considered valid JSON. It’s like the Wild West of
data formats. We’ll see with other formats like CSV and XML that the rules are much more
strict.
CSV’s translation table is simple, but the translation that takes place is not as obvious because
CSV doesn’t have concepts of arrays and objects. Instead it has headers, rows, and fields:
Namespace N/A
While the table above can be helpful in determining how certain elements translate to and from
each other, it unfortunately cannot tell the whole story. For example, if the output of a
DataWeave script is a single Number, that cannot be translated into properly-formatted CSV
data, so DataWeave will throw an error at runtime. The most important thing to remember when
working with CSV data is that DataWeave represents CSVs using an Array of Objects. Every
Object in the Array is a CSV row, and every key in the Object corresponds to a header field on
the CSV. Because of this, Objects cannot have any kind of nesting. CSVs are limited to
two dimensions, but DataWeave’s Objects can be nested. With DataWeave, it is always the
developer’s job to make sure the structure of the script’s output is in agreement with what the
Writer expects for the corresponding output MIME type.
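For example, if your data contains nested Objects, you need to flatten them before writing CSV. Here is a minimal sketch that assumes a hypothetical people Array where each person has a nested address. It uses map to build one flat Object per row:

```
%dw 2.0
output application/csv
// Hypothetical nested input: each person has a nested address Object
var people = [
  { name: "David", address: { city: "Denver", state: "CO" } },
  { name: "Heather", address: { city: "Boulder", state: "CO" } }
]
---
// Build one flat Object per row so every key maps to a single CSV column
people map (person) -> {
  name: person.name,
  city: person.address.city,
  state: person.address.state
}
```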
It’s perfectly valid for CSVs to not have headers at all. For example, the following CSV
document does not have a header:
```
david,kim,25
heather,smith,42
```
By default, when the Reader reads in this CSV file and translates it into DataWeave’s data types,
this will be the result:
```
[
{
"david": "heather",
"kim": "smith",
"25": "42"
}
]
```
Probably not the kind of data you were expecting, right? However, if the script informs the
Reader that inbound CSV data will not have a header, it will translate that same CSV to this:
```
[
{
"column_0": "david",
"column_1": "kim",
"column_2": "25"
},
{
"column_0": "heather",
"column_1": "smith",
"column_2": "42"
}
]
```
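You give that instruction as a reader property on the input declaration. Here is a minimal sketch, assuming the headerless CSV above arrives as the payload:

```
%dw 2.0
// header=false tells the Reader the first row is data, not column names
input payload application/csv header=false
output application/json
---
payload
```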
XML is the most complex data format that DataWeave handles. Aspects like attributes and
namespaces are relatively unique to the format, and like CSV there are peculiarities around the
overall structure of XML that need to be accounted for when reading from and writing to this
data format. Let’s start with the translation table:
Depending on your knowledge of XML, this table could be extremely confusing. Let’s use some
concrete examples to further explain each row.
```
<foo>bar</foo>
```
The Reader translates this XML into the following Object:
```
{foo: "bar"}
```
Now, suppose the output of your script is this Object instead:
```
{foo: "bar", baz: "bat"}
```
Would DataWeave be able to output this to XML? Before you give it a try in the DataWeave
playground, take a moment to speculate what might happen.
If you thought that DataWeave would not be able to output the Object to XML, you would be
correct! This is because all XML documents must have a single root element. In DataWeave, this
means the root of the data you’re outputting to XML must be an Object with a single key! You
may have any number of keys in child Objects, but the root Object must only contain a single
key.
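The usual fix is to nest everything under a single root key. Here is a minimal sketch, where the name root is an arbitrary choice:

```
%dw 2.0
output application/xml
---
// A single root key gives the XML document exactly one root element
{
  root: {
    foo: "bar",
    baz: "bat"
  }
}
```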
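Now consider this XML document:
```
<root>
  <name>Jerney</name>
  <age>500</age>
  <location>
    <state>CO</state>
    <city>Denver</city>
  </location>
</root>
```
The Reader translates it into the following Object:
```
{
  root: {
    name: "Jerney",
    age: "500",
    location: {
      state: "CO",
      city: "Denver"
    }
  }
}
```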
Notice that “age”, which we would identify as a Number, is instead translated by the Reader to
be a String. This is because XML Tag Content has no type information associated with it,
making it very flexible. To accommodate this flexibility, DataWeave translates all XML Tag
Content to the String data type.
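If you need the Number back, you have to convert it yourself. Here is a sketch that reads the XML document above and converts “age” explicitly, assuming that document is the payload:

```
%dw 2.0
input payload application/xml
output application/json
---
{
  // Tag Content arrives as a String, so convert it explicitly
  age: payload.root.age as Number,
  city: payload.root.location.city
}
```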
Writing is the same concept in reverse. With Numbers and Dates, the DataWeave Writer casts
these types to String and writes them out as XML Tag Content.
XML also allows the same element to repeat inside a single parent element:
```
<people>
<person>
<name>Xiao</name>
</person>
<person>
<name>Raj</name>
</person>
<person>
<name>Katherine</name>
</person>
</people>
```
The Reader translates those repeated elements into repeated keys within a single Object:
```
{
people: {
person: { name: "Xiao" },
person: { name: "Raj" },
person: { name: "Katherine" }
}
}
```
You might notice something strange when comparing the XML translation tables to JSON and
CSV: XML is the only data format that has no clear translation for DataWeave’s Array type.
That’s odd. DataWeave is supposed to be a language that supports easy reading and writing to
multiple different kinds of data formats. The developer should be able to read in JSON and write
to XML. But JSON supports Arrays and XML does not. How does a developer translate an
Array of values to repeating elements in XML? You have two options: you can leave it to the
Writer to attempt to sort it out, or you can do it yourself using dynamic elements. Neither
of these options has an analogue in other languages, and in my experience this ends up
being a pain point when working with XML.
Here’s what happens when you leave it to the Writer:
```
%dw 2.0
output application/xml
---
{
root: {
people: [
{person: "Xiao"},
{person: "Raj"},
{person: "Katherine"}
]
}
}
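```
The Writer produces the following XML, repeating the “people” element once for every Object in
the Array:
```
<?xml version='1.0' encoding='UTF-8'?>
<root>
  <people>
    <person>Xiao</person>
  </people>
  <people>
    <person>Raj</person>
  </people>
  <people>
    <person>Katherine</person>
  </people>
</root>
```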
If the Writer can handle writing DataWeave Arrays to XML, why isn’t it in the translation table?
I can’t claim to know exactly what’s going on here because I don’t have access to the DataWeave
source code, but here’s my understanding: When dealing with XML, the Writer preprocesses the
DataWeave output, coercing Arrays to be Objects before finally serializing the data. It does so
with the following algorithm:
```
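# Python sketch. DataWeave Objects allow duplicate keys and a Python dict does not,
# so an Object is modeled here as a list of (key, value) pairs.
# obj_with_array example: [("data", [1, 2, 3, 4])]
def coerce_to_object(obj_with_array):
    key, arr = obj_with_array[0]  # the single key and its Array value
    new_obj = []
    for item in arr:
        # repeat the key once for every element of the Array
        new_obj.append((key, item))
    return new_obj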
```
Let’s imagine that you needed a single “people” element containing all three “person” elements,
instead of the output above.
Given this use case, a developer cannot leave it up to the Writer to provide the correct output,
because the algorithm it uses to handle the Array will always result in a repeated “people” tag for
every “person” tag. DataWeave provides an interesting feature for handling this scenario called
“dynamic elements”. Dynamic Elements is a technique to dynamically add key:value pairs to an
Object using existing Objects and/or an Array of Objects. Confusing? I’ll show how a developer
would use this feature to obtain the above output, then describe how it’s working:
```
%dw 2.0
output application/xml
---
{
  root: {
    people: {
      ([
        {person: "Xiao"},
        {person: "Raj"},
        {person: "Katherine"}
      ])
    }
  }
}
```
There are three pieces that the developer needs to get in place to use dynamic elements:
1. The Object that you are dynamically adding elements to must be declared explicitly with
curly braces. In the script above, that is the { } associated with the people key.
2. The dynamic elements that you wish to add to the Object must be wrapped in
parentheses. In the script above, these are the ( ) directly inside the people Object.
3. The dynamic elements that you wish to add to the Object must be either an Object or an
Array of Objects. In the script above, this is the Array of three Objects, each with a single
person key.
Once the dynamic elements are evaluated, the people Object is equivalent to the following:
```
{
  root: {
    people: {
      person: "Xiao",
      person: "Raj",
      person: "Katherine"
    }
  }
}
```
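In practice, the Array inside the parentheses usually comes from another expression rather than being written out by hand. As a sketch, assuming the names arrive as a plain Array of Strings (the names variable is hypothetical, standing in for real input data), map can build the Array of Objects that the dynamic elements then expand into the people Object:

```
%dw 2.0
output application/xml
// Hypothetical input: in a real flow this would come from the payload
var names = ["Xiao", "Raj", "Katherine"]
---
{
  root: {
    people: {
      // map builds [{person: "Xiao"}, {person: "Raj"}, {person: "Katherine"}],
      // and the parentheses expand those Objects into people's key:value pairs
      (names map (name) -> {person: name})
    }
  }
}
```

This should produce the same single <people> element containing three <person> elements as the hand-written version above.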
The translation to and from DataWeave Attributes and XML Attributes is fairly straightforward.
For the following script:
```
%dw 2.0
output application/xml
---
{
root: {
person @(id: 1): "jerney"
}
}
```
If you were to take the XML output and use it as input to a script that outputs the same data as
application/dw, you would see output very similar to the previous script:
```
{
root: {
person @(id: 1): "jerney"
}
}
```
Like Attributes, the translation of DataWeave Namespaces to and from XML is fairly
straightforward. Given the following script:
```
%dw 2.0
output application/xml
ns jerney https://www.jerney.io
---
{
  jerney#root: {
    jerney#person: "jerney"
  }
}
```
This script produces the following XML:
```
<?xml version='1.0' encoding='UTF-8'?>
<jerney:root xmlns:jerney="https://www.jerney.io">
  <jerney:person>jerney</jerney:person>
</jerney:root>
```
DataWeave handles XML CData as well, which brings us to a new DataWeave type: CData.
XML CData is translated by the Reader into DataWeave’s CData data type, and the Writer
translates it back to XML CData. XML CData is really just a string of characters embedded in an
element’s content that an XML parser does not interpret as markup. Because of this,
DataWeave’s CData type can be thought of as a type alias for the String type; all operations that
work on String work on CData as well. The only difference between a String and CData is how
the Reader and Writer deal with values of this type when working with XML. Let’s look at the
Reader first. We’ll send an XML document with CData as input, and view it as application/dw:
Input:
```
<?xml version='1.0' encoding='UTF-8'?>
<root>
<person><![CDATA[<embedded>xml</embedded>]]></person>
</root>
```
Representation as application/dw:
```
{
  root: {
    person: "<embedded>xml</embedded>" as String {cdata: true}
  }
}
```
The details of how DataWeave handles CData are more complicated than actually working with
it. The Reader reads the CData content in as a String and appends as String {cdata: true}
to it. The as operator is used for casting, which transforms data from one type to another in
DataWeave. However, that’s not exactly what’s going on here. DataWeave is casting a String to
a String, but attaching the metadata {cdata: true} to it so that if that String gets passed to the
Writer, and the Writer is serializing the data to XML, it knows that String should be written as
CData. This metadata doesn’t make the String different from any other String in DataWeave.
The following evaluates to true:
```
"Hello" as String {cdata: true} == "Hello"
```
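The lesson here is that when you’re working with CData after reading it in from XML data, you
can simply treat it as any other String. Going the other way, you can tell the Writer to output a
String as CData by attaching the same metadata to it: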
```
%dw 2.0
output application/xml
---
{
  root: {
    person: "<embedded>jerney</embedded>"
      as String {cdata: true}
  }
}
```
```
<?xml version='1.0' encoding='UTF-8'?>
<root>
<person><![CDATA[<embedded>jerney</embedded>]]></person>
</root>
```
Alternatively, you can use DataWeave’s CData type directly:
```
%dw 2.0
output application/xml
---
{
  root: {
    person: "<embedded>jerney</embedded>" as CData
  }
}
```
This has the exact same effect. "String" as String {cdata: true} and "String"
as CData are functionally equivalent in DataWeave.
### Conclusion
<text>
## Writer Properties
Writer properties are instructions that a DataWeave script supplies to the Writer. These
instructions tell the Writer exactly how to serialize the output. For example, sometimes you may
want to create CSV data with a header, and other times without one.
Writer properties do not always affect something that can be seen in the output of the script.
Some writer properties influence the execution aspects of the script, like streaming and buffer
size.
Using the diagram from earlier, here is the aspect of DataWeave this section will focus on:
Here is an example of a Writer property in a DataWeave script. This one creates a CSV with no
header where all values are quoted:
```
%dw 2.0
output application/csv header=false, quoteValues=true
---
[
{
name: "Alexis",
age: 42
},
{
name: "Raj",
age: 35
}
]
```
```
"Alexis","42"
"Raj","35"
```
Writer properties are simply key:value pairs on the same line as the output declaration. However,
unlike most key:value pairs in DataWeave, keys are associated with values through an equals
sign, and pairs are separated by a comma.
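For instance, here is a sketch that sets two of the JSON Writer properties covered below on one output declaration. The data is made up for illustration; note the equals signs binding each property to its value and the comma separating the two properties:

```
%dw 2.0
// indent=false writes the JSON on a single line;
// skipNullOn="everywhere" drops nulls from both objects and arrays
output application/json indent=false, skipNullOn="everywhere"
---
{
  name: "Alexis",
  nickname: null,
  tags: ["admin", null, "dev"]
}
```

With both properties applied, the nickname key and the null inside tags should be absent, and the whole document should be written on one line.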
### JSON
Writer properties for the application/json MIME type can be used to handle duplicate object
keys, compress the output by removing unnecessary whitespace, and filter nulls from the output
JSON. Other properties can change the Writer’s buffer size or the encoding of the output, but
they are less commonly used. This section will cover the duplicateKeyAsArray, indent,
skipNullOn, and writeAttributes Writer properties.
The duplicateKeyAsArray property informs the Writer on how to handle duplicate Object keys.
By default, DataWeave Objects containing duplicate keys will be serialized to JSON objects
as-is. If you set duplicateKeyAsArray to true, the Writer will collapse the duplicate keys into one,
and collect all values for the duplicate key into a JSON array. This prevents you from having to
implement this functionality yourself. Here’s an example:
```
%dw 2.0
output application/json duplicateKeyAsArray=true
---
{
name: "Alexis",
name: "Tim",
name: "Raj",
location: "Denver"
}
```
```
{
  "name": [
    "Alexis",
    "Tim",
    "Raj"
  ],
  "location": "Denver"
}
```
Setting the indent property to false informs the Writer that it should not “pretty print” the JSON
payload, and should instead write it all on a single line. This can be useful when you wish to
conserve network bandwidth, as it prevents insignificant whitespace from being sent over the
wire. On the other hand, it can make debugging a little more cumbersome because the JSON
won’t be as easy to read. By default, this value is set to true, meaning the output is pretty printed.
Here’s an example:
```
%dw 2.0
output application/json indent=false
---
{
foo: "bar",
baz: "bat"
}
```
```
{"foo": "bar","baz": "bat"}
```
The skipNullOn property is one of the most commonly used properties in DataWeave. It allows
you to remove null values from JSON arrays, remove key:value pairs from JSON objects when the
value is null, or both. Unlike the other properties covered so far, skipNullOn does not take a
Boolean value; it takes a String: "arrays", "objects", or "everywhere". By default, this
property is effectively turned off. Here’s an example of removing null values from JSON arrays:
```
%dw 2.0
output application/json skipNullOn="arrays"
---
{
foo: null,
bar: ["foo", null, "bar"]
}
```
```
{
"foo": null,
"bar": [
"foo",
"bar"
]
}
```
Here’s an example of removing key:value pairs in a JSON object when the value is null:
```
%dw 2.0
output application/json skipNullOn="objects"
---
{
  foo: null,
  bar: ["foo", null, "bar"]
}
```
```
{
  "bar": [
    "foo",
    null,
    "bar"
  ]
}
```
Finally, here is an example of removing null values for both JSON arrays and JSON objects:
```
%dw 2.0
output application/json skipNullOn="everywhere"
---
{
foo: null,
bar: ["foo", null, "bar"]
}
```
```
{
  "bar": [
    "foo",
    "bar"
  ]
}
```
The writeAttributes property is mostly used when the input is XML, the output is JSON, and the
JSON output must retain the attributes. By default, this property is set to false, meaning attributes
will not be serialized to the output.
```
%dw 2.0
output application/json writeAttributes=true
---
{
  foo @(bar: "bat"): ["foo", "bar", "baz"]
}
```
```
{
"foo": {
"@bar": "bat",
"__text": [
"foo",
"bar",
"baz"
]
}
}
```
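The “foo” key is now associated with a JSON object, despite it being associated with a
DataWeave Array in the script. The JSON object has two keys. Keys prefixed with “@” hold the
attributes (here, “@bar”), and the “__text” key holds the value originally associated with “foo”.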
### CSV
Writer properties for the application/csv MIME type can be used to change the character that
separates fields, create a CSV without a header, and more. There are also less commonly used
Writer properties, like those that set an alternative quote character or change the output
encoding. This section will cover the separator and header Writer properties.
The separator property is useful for when you want to change the character that separates fields
in a CSV. This is how you make tab-separated and pipe-separated files.
```
%dw 2.0
output application/csv separator="\t"
---
[
{
name: "Alexis",
age: 42
},
{
name: "Raj",
age: 35
}
]
```
```
name	age
Alexis	42
Raj	35
```
You can set the separator property to any single character. If you provide more than one
character, DataWeave will ignore all but the first.
By default, the header writer property is set to true, meaning the output CSV will contain a
header describing each column of data. You can set the header property to false if you wish to
output application/csv without a header. Here’s an example:
```
%dw 2.0
output application/csv header=false
---
[
{
name: "Alexis",
age: 42
},
{
name: "Raj",
age: 35
}
]
```
```
Alexis,42
Raj,35
```
### XML
Writer properties for the application/xml MIME type can be used to avoid writing null values to
the output, compress the output by removing unnecessary whitespace, avoid writing the XML
declaration to the output, choose between inline close tags and separate open and close tags,
and more. This section will cover the skipNullOn, indent, writeDeclaration, and inlineCloseOn
Writer properties.
The skipNullOn writer property for XML is similar to the one used for JSON. The difference is
“arrays” and “objects” are no longer valid values, but “elements” and “attributes” are. The
“everywhere” value still applies to both JSON and XML’s skipNullOn property.
When using the “elements” value, DataWeave will remove all null elements from the output.
Here’s an example:
```
%dw 2.0
output application/xml skipNullOn="elements"
---
{
root: {
person @(id: 1, status: null): {
name: "Alex",
age: null
}
}
}
```
```
<?xml version='1.0' encoding='UTF-8'?>
<root>
<person id="1" status="null">
<name>Alex</name>
</person>
</root>
```
Notice that the <age> element does not make it to the output. This is because the value
associated with that element was null, so DataWeave removed it in the output XML.
You can do the same with null XML attributes by setting skipNullOn to “attributes”. Here’s an
example using the same data:
```
%dw 2.0
output application/xml skipNullOn="attributes"
---
{
root: {
person @(id: 1, status: null): {
name: "Alex",
age: null
}
}
}
```
```
<?xml version='1.0' encoding='UTF-8'?>
<root>
<person id="1">
<name>Alex</name>
<age/>
</person>
</root>
```
In this case, the status attribute is not output at all because it was null. However, the empty age
element still exists, because skipNullOn="attributes" does not remove null elements. To remove
both null attributes and null elements, set skipNullOn to “everywhere”:
```
%dw 2.0
output application/xml skipNullOn="everywhere"
---
{
root: {
person @(id: 1, status: null): {
name: "Alex",
age: null
}
}
}
```
```
<?xml version='1.0' encoding='UTF-8'?>
<root>
<person id="1">
<name>Alex</name>
</person>
</root>
```
In this case, neither the null attributes nor the null elements make it to the output.
The indent writer property can be used to remove any unnecessary whitespace. Here’s an
example using the above data:
```
%dw 2.0
output application/xml indent=false
---
{
  root: {
    person @(id: 1, status: null): {
      name: "Alex",
      age: null
    }
  }
}
```
```
<?xml version='1.0' encoding='UTF-8'?><root><person id="1" status="null"><name>Alex</name><age/></person></root>
```
While this makes the XML more difficult for people to read, a computer can parse this easily. By
using indent=false the Writer effectively compresses the output, making it more efficient to
transmit the data over the network.
You can use writeDeclaration=false if you would like the Writer to not include the XML
declaration in the output. Here’s an example:
```
%dw 2.0
output application/xml writeDeclaration=false
---
{
root: {
person @(id: 1, status: null): {
name: "Alex",
age: null
}
}
}
```
```
<root>
<person id="1" status="null">
<name>Alex</name>
<age/>
</person>
</root>
```
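This feature is useful if you’re working with legacy software or other software that has problems
parsing an XML declaration.
The inlineCloseOn Writer property controls whether elements without content are written with
an inline close tag, like <age/>, or with separate open and close tags. Here’s an example: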
```
%dw 2.0
output application/xml inlineCloseOn="none"
---
{
root: {
person @(id: 1, status: null): {
name : "Alex",
age : null,
other : ""
}
}
}
```
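```
<?xml version='1.0' encoding='UTF-8'?>
<root>
  <person id="1" status="null">
    <name>Alex</name>
    <age></age>
    <other></other>
  </person>
</root>
```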
If inlineCloseOn="none" were not set, the age and other elements would appear as <age/> and
<other/> respectively. With this property set, null values and empty strings appear as elements
with an open and close tag containing no value.
## Reader Properties
Reader properties are instructions that a DataWeave script provides to the Reader. The job of
Reader properties is to inform the Reader how to parse the input data so that it can be correctly
deserialized into DataWeave’s data types. For example, you may know that incoming data of
MIME type application/csv does not contain a header. Without Reader properties, the Reader will
always parse the first line of application/csv data as the header, not as a row of data. You can use
Reader properties to inform the Reader that the first line of the application/csv data is not a
header, but the first row of data.
Reader properties not only inform the Reader how to properly parse the input data; they can also
control execution aspects of the Reader, like whether to stream the input data.
This section will demonstrate how Reader properties work by showing the input data, a simple
DataWeave script with the reader properties, and the output of the script as application/dw. By
outputting the input data as application/dw, we can see exactly how DataWeave is representing
the data in terms of its own data types.
Using the diagram from earlier, here is the aspect of DataWeave this section will focus on:
### CSV
DataWeave has a few important Reader properties that you should be aware of. The most
important and commonly-used Reader properties for application/csv are header, which declares
whether or not the input CSV’s first line is a header, and separator, which declares what character
the input CSV is using as a field separator.
By default, DataWeave reads application/csv with two big assumptions: the data contains a
header, and fields are separated by a comma. For example, take a look at the following CSV:
```
Josh,29,Denver
Gina,22,Chicago
Edward,33,New York
```
```
%dw 2.0
input payload application/csv
output application/dw
---
payload
```
```
[
{
Josh: "Gina",
"29": "22",
Denver: "Chicago"
},
{
Josh: "Edward",
"29": "33",
Denver: "New York"
}
]
```
Instead of 3 records there are only 2, and instead of the headers being translated into meaningful
keys like “name”, “age” and “location”, DataWeave used its default behavior and parsed the first
record of the CSV as the header and the remaining two rows as data. To get around the default
behavior, you need to set the Reader property header to false. Given the same input as before:
```
Josh,29,Denver
Gina,22,Chicago
Edward,33,New York
```
If you update the script to inform the Reader that the input CSV will not contain a header:
```
%dw 2.0
input payload application/csv header=false
output application/dw
---
payload
```
DataWeave will output something more in line with how you originally understood the CSV data:
```
[
{
column_0: "Josh",
column_1: "29",
column_2: "Denver"
},
{
column_0: "Gina",
column_1: "22",
column_2: "Chicago"
},
{
column_0: "Edward",
column_1: "33",
column_2: "New York"
}
]
```
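The generic column_N keys are easy to replace once the data has been parsed. Here is a minimal sketch using map; the keys name, age, and location are my own choice for this example, since DataWeave has no way to infer them from headerless data:

```
%dw 2.0
input payload application/csv header=false
output application/dw
---
payload map (row) -> {
  name: row.column_0,
  // The CSV Reader produces Strings, so convert the age explicitly
  age: row.column_1 as Number,
  location: row.column_2
}
```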
What if you’re reading data that conforms to the application/csv MIME type, but is separated by
pipes instead of commas?
```
name|age|location
Josh|29|Denver, CO
Gina|22|Chicago, IL
Edward|33|New York, NY
```
```
%dw 2.0
input payload application/csv
output application/dw
---
payload
```
```
[
  {
    "name|age|location": "Josh|29|Denver",
    column_1: " CO"
  },
  {
    "name|age|location": "Gina|22|Chicago",
    column_1: " IL"
  },
  {
    "name|age|location": "Edward|33|New York",
    column_1: " NY"
  }
]
```
That’s probably not what you were hoping for! Let’s break down what happened. First,
DataWeave parsed the CSV header, looking for commas to separate the individual header values.
Since there are no commas in the header, DataWeave interpreted the header as having only one
value, “name|age|location”. From that point forward, DataWeave parsed the CSV as if each row
contained a single value. DataWeave then began parsing the subsequent rows, trying to fit them
to the header data. It parsed each line, again looking for commas to determine the individual
values. In this case DataWeave does find a comma in each row, but now there’s a dilemma: after
parsing the header, DataWeave assumed each row of data would have only one field. What it
found instead was that each row contained two fields (everything before the comma, and
everything after). To accommodate this additional field, DataWeave assigns it a default column
name based on its position in the CSV row, which is column_1 in this case.
These kinds of issues can easily be addressed with the separator Reader property. You can inform
DataWeave to treat any single-character String as a separator, including a tab ("\t"). Let’s see it
in action.
```
name|age|location
Josh|29|Denver, CO
Gina|22|Chicago, IL
Edward|33|New York, NY
```
```
%dw 2.0
input payload application/csv separator="|"
output application/dw
---
payload
```
You’ll get output that’s parsed into data types that better represent the original data:
```
[
{
name: "Josh",
age: "29",
location: "Denver, CO"
},
{
name: "Gina",
age: "22",
location: "Chicago, IL"
},
{
name: "Edward",
age: "33",
location: "New York, NY"
}
]
```
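Reader properties can be combined just like Writer properties, with commas between them. As a sketch, suppose the pipe-separated file above arrived without its header line (this variant of the input is hypothetical). Both properties can then be set on the same input directive:

```
%dw 2.0
// Assumed input, with no header line:
//   Josh|29|Denver, CO
//   Gina|22|Chicago, IL
input payload application/csv header=false, separator="|"
output application/dw
---
payload
```

Each row should then be parsed into column_0, column_1, and column_2 keys, with the comma in each location kept as part of the value rather than treated as a separator.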
### XML
XML is another MIME type that does not have many Reader properties outside the domains
of performance, streaming, and security.
While input and output directives cover most of our Data Type / MIME Type translation needs,
having direct access to the Reader and Writer via functions is necessary in some situations. For
example, maybe for some horrible reason you need to embed JSON into XML. That’s easy to do
with the write function:
```
%dw 2.0
output application/xml
---
{
root: {
data: write({hello: "world"},
"application/json",
{indent: false}) as CData
}
}
```
```
<?xml version='1.0' encoding='UTF-8'?>
<root>
<data><![CDATA[{"hello": "world"}]]></data>
</root>
```
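The third argument to write accepts the same Writer properties you would otherwise set on an output declaration, written as an Object. As a sketch, here is a headerless CSV embedded in XML; the people data is made up for illustration:

```
%dw 2.0
output application/xml
var people = [
  {name: "Alexis", age: 42},
  {name: "Raj", age: 35}
]
---
{
  root: {
    // {header: false} plays the same role as header=false on an output declaration
    data: write(people, "application/csv", {header: false}) as CData
  }
}
```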
The read function is useful in the opposite situation from write. If write is useful for writing
XML with embedded JSON, then read is useful for reading XML with embedded JSON.
```
<?xml version='1.0' encoding='UTF-8'?>
<root>
<data><![CDATA[{"hello": "world"}]]></data>
</root>
```
```
%dw 2.0
input payload application/xml
output application/dw
---
{
beforeRead : payload.root.data,
afterRead : read(payload.root.data, "application/json")
}
```
```
{
  beforeRead: "{\"hello\": \"world\"}" as String {cdata: true},
  afterRead: {
    hello: "world"
  }
}
```
In other words, the read function, just like the Reader, deserializes strings of data into
DataWeave’s native data types. This is fantastic because now we can treat that embedded JSON
just like we would if it came directly through the Reader. If we didn’t have the read function,
we would be stuck working with the JSON data as String instead of an Object.
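Because read and write are the function forms of the Reader and Writer, passing the output of one into the other should give you back an equivalent value. Here is a small sketch for experimenting with that idea; the variable names are arbitrary:

```
%dw 2.0
output application/dw
var original = {hello: "world"}
// Serialize the Object to a String of JSON...
var asJson = write(original, "application/json")
---
{
  asJson: asJson,
  // ...then deserialize that String back into a DataWeave Object
  roundTrip: read(asJson, "application/json"),
  // Compare the parsed value with the value we started with
  same: read(asJson, "application/json") == original
}
```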
# DataWeave Syntax
TODO: Intro
Selectors for Object values:
● Single-value selector
● Multi-value selector
● Descendants selector
Selectors for Array and String values:
● Index selector
● Range selector
The single-value selector is the most commonly used selector. It is typically used to query keys
on Objects, returning the value associated with the given key. For example:
```
%dw 2.0
output application/dw
var obj = {
name: "Alexis",
age: 42
}
---
obj.name
```
This would return "Alexis". While the single-value selector is mostly used on Objects, it can
be used on an Array of Objects as well:
```
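%dw 2.0
output application/dw
var arr = [
  {
    name: "Alexis",
    age: 42
  },
  {
    name: "Raj",
    age: 35
  },
  {
    name: "Josh",
    detail: {age: 29}
  }
]
---
arr.age
```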
This script would return [42,35]. When the single-value selector is used on an Array of
Objects, it loops through each Object in the Array and uses the selector to query the specified
key on that Object (in the example above that key was age). If the specified key is associated
with a value on the top level of the Object, it will be added to the returned Array. The selector
will not traverse through nested Objects, which is why the script would not return
[42,35,29].
(!) I generally do not recommend using the single-value selector on Arrays to return multiple
values. As the name implies, the single-value selector should be used to return a single value. If
you want to return multiple values like in the example above, you should use the multi-value
selector so that the intentions of your code are more clear to other developers.
Like all selectors, you can chain together single-value selectors to traverse through a data
structure and return nested data:
```
%dw 2.0
output application/dw
var data = {
  name: "Josh",
  detail: {age: 29}
}
---
data.detail.age
```
An alternative (but equally important) syntax for the single-value selector is shown below:
```
%dw 2.0
output application/dw
var obj = {
name: "Alexis",
age: 42
}
---
obj["name"]
```
This syntax is particularly important when you need to use the single-value selector but the key
you need to provide is dynamic and not known until the application is running. For example, the
key may be in a variable:
```
%dw 2.0
output application/dw
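var obj = {
  name: "Alexis",
  age: 42
}
var key = "name"
---
obj[key]
```
The multi-value selector, written .* instead of ., returns every value associated with a key as an
Array. Here it is used on an Array of Objects:
```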
%dw 2.0
output application/dw
var arr = [
{
name: "Alexis",
age: 42
},
{
name: "Raj",
age: 35
},
{
name: "Josh",
detail: {age: 29}
}
]
---
arr.*age
```
This script returns [42,35], just like the single-value selector did. Like the single-value
selector, the multi-value selector only searches keys on the top level of each Object; it does not
traverse additional levels of nesting searching for the key. To do that, you’ll need the next
selector.
Like the single-value selector, the multi-value selector will return null when the key provided
does not match with any keys it searches.
While the single-value and multi-value selectors can only query the top level of an Object, the
descendants selector is used to retrieve values associated with an Object key at any level of
nesting. For example:
```
%dw 2.0
output application/dw
var arr = [
{ // 1
name: "Alexis",
age: 42
},
{ // 2
name: "Raj",
age: 35
},
{ // 3
name: "Josh",
detail: {age: 29}
}
]
---
arr..age
```
Would return [42,35,29]. The descendants selector not only retrieved the 42 and 35 values
associated with the first two Objects, it also traversed the detail key in the third Object as well
and retrieved the 29 associated with the age key. If you’re dealing with Objects that do not
Like the single-value and multi-value selectors, the descendants selector will return null if it
does not find any instances of the provided key.
The single-value, multi-value, and descendants selectors each handle duplicate Object keys
differently. This section will explore the differences.
The single-value selector will always return the value associated with the first occurrence of a
key. If there are duplicate keys in the Object, they will be ignored:
```
%dw 2.0
output application/json
var data = {
  message: "Hello world!", // First occurrence
  message: "Goodbye space!" // Last occurrence
}
---
data.message
```
This script will return "Hello world!", ignoring the second instance of the message key in
the Object.
The multi-value selector will return all matches, even on duplicate keys so long as they are on
the top level of the Object:
```
%dw 2.0
output application/json
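var data = {
  message: "Hello world!",
  message: "Goodbye space!",
  moreData: {message: "Hello again!"}
}
---
data.*message
```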
This script will return ["Hello world!","Goodbye space!"]. The message "Hello
again!" is not in the output because its key did not reside on the root level of the Object being
queried by the multi-value selector.
These same rules apply when using the multi-value selector on an Array of Objects:
```
%dw 2.0
output application/dw
var data = [
{
message: "Hello world!",
message: "Goodbye space!",
moreData: {message: "TEST"}
},
{
message: "Hello space!",
message: "Goodbye world!"
}
]
---
data.*message
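```
This script returns every top-level message value from both Objects, including the duplicates,
but not "TEST", which is nested inside moreData.
The descendants selector handles duplicate keys differently: it searches every level of nesting,
but it will not return values associated with duplicate keys. Consider the following script:
```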
%dw 2.0
output application/dw
var data = [
{
key: "hello",
key: [
{key: 1,key:2}
]
},
{
key: [
{key: 3},
{key: 4}
],
key: "goodbye"
}
]
---
data..key
```
This will return the following. I’ve added comments to further explain:
```
[
  // first occurrence of key
  "hello",
  // ...
]
```
At this point you may be wondering if there’s a way to combine the functionality of the
multi-value and descendants selector to retrieve all the values associated with a key, regardless of
their level of nesting and whether the key is a duplicate of a previous key. There is! In this case
you must combine the descendants selector with the multi-value selector:
```
%dw 2.0
output application/dw
var data = [
  {
    key: "hello",
    key: [
      {key: 1, key: 2}
    ]
  },
  {
    key: [
      {key: 3},
      {key: 4}
    ],
    key: "goodbye"
  }
]
---
data..*key
```
This script will return the following (comments added for clarity):
```
[
"hello",
[
{
key: 1,
key: 2
}
],
1,
2, // Duplicate key values are returned
[
{
key: 3
},
{
key: 4
}
],
"goodbye", // Duplicate key values are returned
3,
4
]
```
You just learned about 3 different ways to query your data in DataWeave using selectors. The
single-value, multi-value, and descendants selectors all work by querying keys on Objects and
returning the associated values. While these selectors work to return values from Objects, all of
them can be called directly on Arrays as well. However, it is not recommended that you use the
single-value selector directly on Arrays; you should use the multi-value selector instead. The
multi-value selector can also be used directly on Objects when you need to return all values
associated with a duplicate key. The descendants selector is great for querying keys that may
exist at multiple levels of nesting in an Array or Object, but it will not return values associated
with duplicate keys. Finally, we learned that we can combine the multi-value selector and the
descendants selector to work around this limitation. It will return all values associated with a
particular key regardless of their level of nesting and whether the value is associated with a
duplicate key.
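To compare the selectors side by side, here is a sketch that runs all four queries against a single Object containing both a duplicated key and a nested key. The data is made up for illustration; run it and check each result against the rules summarized above:

```
%dw 2.0
output application/dw
var data = {
  message: "Hello world!",
  message: "Goodbye space!",
  moreData: {message: "Hello again!"}
}
---
{
  singleValue: data.message,        // first top-level match only
  multiValue: data.*message,        // every top-level match, duplicates included
  descendants: data..message,       // matches at any depth, ignoring duplicate keys
  descendantsAll: data..*message    // every match at any depth
}
```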
The index selector is used to retrieve a particular item in an Array by its index. The index of an
Array item is determined by its position in the Array relative to the beginning or end of the
Array. DataWeave indexes Arrays starting with the number 0, so the first item of an Array is at
index 0, the second item at index 1, and so on. Here’s an example:
```
%dw 2.0
output application/dw
var arr = [1, 2, 3, 4, 5]
---
arr[1]
```
This script will return the number 2 since the number 2 is located at Array index 1. DataWeave
supports reverse indexes, meaning that you can provide a negative number as the index and
DataWeave will locate the element from the end of the Array instead of the beginning. For
example:
```
%dw 2.0
output application/dw
var arr = [1, 2, 3, 4, 5]
---
arr[-3]
```
This script would return the number 3, since 3 is the third-to-last item in the Array.
The index selector will return null if you provide it with an index that is not applicable to the
Array. For example, if you use the index 100 on an Array with only 5 items, the index selector
will return null.
One other cool thing to note about the index selector is that it works on Strings as well by
treating Strings as Arrays of characters. For example:
```
%dw 2.0
output application/dw
var s = "Hello!"
---
s[1]
```
This script would return "e". All of the rules for the index selector on Arrays also apply to
Strings.
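Reverse indexes work on Strings as well, so -1 selects the last character. A quick sketch reusing the same String:

```
%dw 2.0
output application/dw
var s = "Hello!"
---
{
  first: s[0],  // "H"
  last: s[-1]   // "!"
}
```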
You may have noticed that the index selector and the alternative dynamic syntax for the
single-value selector are the same: value[key/index]. There are a few consequences to this.
For example, what should the following script return?
```
%dw 2.0
output application/dw
var arr = [1, 2, 3, 4, 5]
---
arr["1"]
```
DataWeave could handle this in one of two ways:
1. Recognize the operation is taking place on an Array, coerce the String to a Number, and
return 2.
2. Return null, since "1" is not a valid Array index.
DataWeave chooses the more strict option, 2, and will return null in this circumstance. In other
words: when encountering the value[key/index] syntax during execution, DataWeave
chooses whether to treat the operation as a single-value selection or index selection based on the
data type of the value in the square brackets. If the value supplied is of type String DataWeave
treats the operation as single-value selection. If the value supplied is of type Number DataWeave
treats the operation as index selection.
(!) Because of the flexibility of the value[key/index] syntax, it’s easier to create code that
may be ambiguous to the reader. For example in this code value[lookup] it’s not obvious if
the variable lookup will be of type String or Number. In these cases it may be beneficial to cast
the key/index value appropriately to make the intention of the code more obvious to the reader.
For example value[lookup as String] when using the syntax as a single-value selector,
and value[lookup as Number] when using the syntax as an index selector.
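As a sketch of that advice, suppose an index arrives as a String, for example from an HTTP query parameter (the lookup variable and its value are hypothetical). Casting it to a Number makes the intent explicit and ensures DataWeave performs index selection:

```
%dw 2.0
output application/dw
var arr = [10, 20, 30]
// Hypothetical: imagine this value was read from a query parameter
var lookup = "1"
---
arr[lookup as Number] // returns 20, the item at index 1
```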
Another situation to consider is that DataWeave’s Number type supports both whole numbers
and decimal values. Therefore, it’s perfectly valid to supply the index selector with an index like
0.193. How does DataWeave handle this case? In these cases, DataWeave truncates the
decimal, leaving a whole number. 0.193 would be treated as index 0, 1.98 would be treated
as index 1, and so on. Here’s an example:
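A minimal sketch, assuming the truncation rule described above holds for the DataWeave version you are running; the comments restate the behavior the text describes, so compare them against what you actually see:

```
%dw 2.0
output application/dw
var arr = [1, 2, 3, 4, 5]
---
{
  decimalIndex: arr[1.98], // per the rule above, treated as index 1
  wholeIndex: arr[1]       // index 1 directly, for comparison
}
```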
The range selector is used when you want to extract a contiguous section of an existing Array.
You can use the range selector to retrieve all values between the indexes 1 and 5, or all values
between the indexes of -2 and -4, or any other valid range of indexes. Here’s an example:
```
%dw 2.0
output application/dw
var arr = [1, 2, 3, 4, 5]
---
arr[1 to 3]
```
This script would return [2,3,4]. Note that the items associated with both of the indexes
provided to the range selector make it to the output; the range selector treats the indexes as
inclusive.
Just like with the index selector, you can use negative numbers to query for ranges from the end
of an Array:
```
%dw 2.0
output application/dw
var arr = [1, 2, 3, 4, 5]
---
arr[-2 to -4]
```
This script would return [4,3,2]. While both of the above scripts returned the same values, the
order of the output values was different. This is because the range selector selects items from
the first index provided to the second index. Because of this, the range selector is not only good
for extracting subsets of existing Arrays, but also for creating Arrays that are the reverse of the
provided Array:
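```
%dw 2.0
output application/dw
var arr = [1, 2, 3, 4, 5]
---
arr[-1 to 0]
```
This script would return [5,4,3,2,1].
The range selector will return null in all situations where one of the indexes provided is not
applicable to the Array. For example, suppose you try to retrieve all items in a 5-item Array with
the following code:
```
%dw 2.0
output application/dw
var arr = [1, 2, 3, 4, 5]
---
arr[0 to 5]
```
Because the last item of a 5-item Array is at index 4, index 5 is not applicable, so this script
returns null.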
Just like the index selector, the range selector works on Strings by treating them as an Array of
single characters:
```
%dw 2.0
output application/dw
var s = "Hello!"
---
s[1 to 4]
```
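Combining this with the reverse-range technique shown earlier gives a compact way to reverse a String. A short sketch:

```
%dw 2.0
output application/dw
var s = "Hello!"
---
// From the last character (-1) back to the first (0)
s[-1 to 0] // "!olleH"
```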
## Variables
This section is the formal introduction to variables in DataWeave. You’ve already seen them
used to associate a value to a name that can be used later; this is the primary purpose of having
variables in any programming language. Here’s the basic syntax for variables in DataWeave:
```
var <name> = <value>
```
The <name> must follow two rules:
1. The first character is an alpha (meaning only letters from the alphabet), and all remaining
characters are alphanumeric or an underscore, and
2. The name is not a reserved keyword like var or output.
The <value> can be:
1. A literal value of any valid data type in DataWeave (e.g. "hello", 1, {}, []). This includes
functions, which will be covered in the next section.
Variables can be declared in any part of the script where declarations can be made. This includes
the script header, and as we’ll see later, do expressions.
Once the name is associated with the value via the var statement, you may access that value
anywhere in the script by simply referencing that name. However, unlike many other languages,
once that value is associated with the name, you can no longer associate that name with another
value. For example, the following DataWeave script would not run because the variable x is
declared twice:
```
%dw 2.0
output application/dw
var x = 1
var x = 2
---
x
```
If variables cannot be reassigned, what are they good for? Well, they’re great for storing
intermediate calculations, or giving a useful name to the output of a particularly complex
operation. We’ll discuss variables further in the section on do expressions, and cover variable
scope as well.
Variables in DataWeave can also include type information, which restricts the type of value the
variable can hold. To restrict the type of the variable, add a colon after the variable name and
specify the type after the colon. Here’s a quick example:
```
var s: String = "Hello, world!"
```
DataWeave will guarantee that your variables are only able to be associated with values that
match the type of the variable at both design time and run time:
```
var x_1: Boolean = 1 // ERROR
var x_2: String = 1 + 1 // ERROR
```
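For contrast with the errors above, here is a sketch of a script whose typed variables all match
their declared types, so it should run without complaint. The variable names are illustrative,
and tags uses the Array<String> type to restrict the Array to Strings.

```
%dw 2.0
output application/dw
// Each value matches the type declared after the colon
var greeting: String = "Hello, world!"
var total: Number = 1 + 1
var isValid: Boolean = total > 1
var tags: Array<String> = ["a", "b"]
---
{
  greeting: greeting,
  total: total,
  isValid: isValid,
  tags: tags
}
```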
## Functions
Functions are DataWeave’s most powerful tool for abstraction. In programming, abstraction is
the ability to hide the implementation details of a certain computation behind an easy-to-use
interface. For example, given the length of a right-triangle’s two short sides, a developer could
use the Pythagorean Theorem to calculate the hypotenuse like this:
```
%dw 2.0
output application/dw
var short1 = 2
var short2 = 3
---
sqrt(pow(short1,2) + pow(short2,2))
```
This code works, but it has a few problems:
1. As a developer looking at this code for the first time, it’s not very obvious what the
original developer was trying to calculate. For example, if I didn’t know the Pythagorean
Theorem, I wouldn’t even know what to look up to verify this code is correct!
2. If a developer did understand what the original developer was trying to calculate, they’re
immediately exposed to the implementation details of how to calculate the hypotenuse.
3. A developer cannot take the formula and reuse it elsewhere in the script without
copy-pasting it.
A function solves all three problems. Here is the same calculation wrapped in a function named
hypotenuse and reused twice:
```
%dw 2.0
output application/dw
fun hypotenuse(n, m) =
sqrt(pow(n,2) + pow(m,2))
---
{
twoAndTwo: hypotenuse(2, 2),
threeAndThree: hypotenuse(3, 3)
}
```
Given this example, it’s clear that functions provide developers the ability to abstract, name, and
reuse functionality.
If you’re coming from a language like Java, none of this should sound surprising to you.
Functions are a lot like methods in the sense that they abstract functionality, but that’s where the
similarities stop. DataWeave functions enjoy much more privilege than Java’s methods. What do
I mean by this? DataWeave treats functions like normal values, just as it does Strings, Numbers,
etc. It’s a very simple concept, but its implications are profound: DataWeave functions can be
stored in variables, passed to other functions as arguments, and returned from functions.
Languages that treat functions this way (often described as having first-class functions) usually
fall under a programming paradigm called Functional Programming, which is completely different
from the Object-Oriented Programming paradigm that Java lives under. The distinction exists
because the thought process (i.e. paradigm) and design patterns used to develop solutions in
Object-Oriented languages are completely different from those used in Functional languages.
We will delve much further into this topic in the section on Functional Programming.
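To make those three abilities concrete, here is a minimal sketch. It borrows the lambda syntax
covered in the Lambdas section below, and the names doubleIt, applyTwice, adder, and addTen are
hypothetical. The script should return { stored: 10, passed: 12, returned: 11 }.

```
%dw 2.0
output application/dw
// 1. A function stored in a variable
var doubleIt = (n) -> n * 2
// 2. A function that receives another function as a parameter
fun applyTwice(f, x) = f(f(x))
// 3. A function that returns a new function, which remembers n
fun adder(n) = (m) -> n + m
var addTen = adder(10)
---
{
  stored: doubleIt(5),
  passed: applyTwice(doubleIt, 3),
  returned: addTen(1)
}
```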
Because functions are so central to DataWeave’s design, the language supports convenient ways
to declare and call them. We’ll explore these in this section.
Functions are typically declared with the fun keyword, which is valid in the header of
DataWeave scripts and the header of do expressions (more on do expressions later). Here’s an
outline:
```
fun <name>(<param_1>, <param_2>, ..., <param_n>) =
<body>
```
Function names and parameter names share the same limitations as variables do. They must start
with an alpha, and every character following must be an alphanumeric or an underscore. Here is a
simple function that adds two numbers:
```
%dw 2.0
output application/dw
fun add(n, m) =
n + m
---
add(1, 1)
```
The above script would, as expected, return 2. The above script also demonstrates how to call
functions using open and closed parentheses surrounding a comma-separated list of arguments.
This is called prefix notation because the name of the function prefixes the list of arguments.
However, this isn’t the only way to call functions in DataWeave. Functions that receive exactly
two arguments can also be called using infix notation, meaning the name of the function may
appear between the arguments, with no need for parentheses:
```
1 add 1
```
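As a quick comparison, this sketch calls the same two-argument function both ways, and does the
same with the built-in contains function, which checks whether one String contains another. Each
pair should produce the same result: 5 for add, and true for contains.

```
%dw 2.0
output application/dw
fun add(n, m) =
  n + m
---
{
  // Prefix notation: the function name comes before the argument list
  prefix: add(2, 3),
  // Infix notation: the function name sits between its two arguments
  infix: 2 add 3,
  // Built-in two-argument functions can be called either way too
  builtinPrefix: contains("DataWeave", "Weave"),
  builtinInfix: "DataWeave" contains "Weave"
}
```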
If you’re familiar with popular languages like Java or Python, you’ll probably find it odd that
DataWeave functions don’t require a return keyword. In fact, DataWeave doesn’t have a return
keyword at all. Why is that? We could go into a fairly detailed discussion on why this design
choice was probably made, but for now it suffices to say that DataWeave does not need a return
keyword because the expressions in a function body must evaluate to a single value, even if that
single value is an Array containing other values. Once that single value is reached, it is
returned. In other words, a DataWeave function always returns the value its body evaluates to.
In the add example above, n + m was substituted with the input arguments to form the expression
1 + 1, which evaluates to 2. Since no other expressions needed to be evaluated, the function
returned 2.
Functions can also be declared without any parameters:
```
%dw 2.0
output application/dw
fun hello() =
"Hello"
---
hello()
```
The above script would return "Hello", but it uses poor style. Most functions that do not
receive arguments merely return the same value over and over, just like variables. If you need
this functionality you’re often better off using variables instead.
### Lambdas
DataWeave allows you to create functions without names as well. These functions are generally
referred to using any of the following names:
● Anonymous functions
● Lambdas
● Function literals
“Function literal” is my favorite, but this text will use the term “lambda” because it is used most
widely in the DataWeave documentation. Recall from the section on types that most types have a
notation that allows you to create a fixed value on the spot, called literal notation. Here are some
examples of literals:
```
var stringLiteral = "String"
var numberLiteral = 5
var dateLiteral = |2020-01-01|
var booleanLiteral = true
var arrayLiteral = []
var objectLiteral = {}
```
If you’re familiar with object-oriented programming languages like Java, you can think of these
literals as special constructor syntax. They allow you to create commonly-used types like Strings
and ints on the fly. DataWeave has literal syntax for functions as well:
```
%dw 2.0
output application/dw
var functionLiteral = (n,m) -> n + m
---
functionLiteral(1,1)
```
Let’s break this down. In the above example we’re declaring a variable called
functionLiteral and setting it to a function. The function declares the two parameters it
receives and then, following the ->, defines the body of the function, which returns the sum of
its two arguments. Everything on the right side of the equals sign is a function literal, which
we’ll refer to from this point forward as a lambda.
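Once a lambda is assigned to a variable like in the above script, it functions no differently than a
function that was declared with the fun keyword. The only difference is that this syntax is more
convenient when you need to pass a function directly to another function. For example, the map
function receives an Array and a function, and applies that function to every item in the Array: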
```
[1,2,3] map (n) -> n + 1
```
The above expression returns [2,3,4]. The map function is the tool of choice when you need to
change every value in an Array using the same function.
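Because map works on Arrays of any type, the lambda can do more than arithmetic. In this sketch
(the people data is made up), the lambda pulls a field out of each Object and upper-cases it with
the built-in upper function, so the script returns ["ANA", "LEO"].

```
%dw 2.0
output application/dw
var people = [{ name: "ana" }, { name: "leo" }]
---
// map calls the lambda once per Object and collects the results into a new Array
people map (person) -> upper(person.name)
```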
Lambdas are nice because they relieve the developer of needing to come up with a name for a
function every time they simply need to pass it to another function. Imagine if you needed to
come up with a variable name for a Number before you could pass it to a function. We have
lambdas for the same reason.
You may ask yourself if the name n used in the (n) -> n + 1 lambda has any significance. It
does not; you may choose any name for your lambda parameters, provided they follow the same
rules used for naming variables and functions.
You may also be asking yourself how map knows to pass each of its values to the function, or
how you would know it’s ok to pass map a function that receives a single argument. Why not a
function that receives 2 or 3 arguments? It is the map function’s responsibility to pass values to
the function you provide. The map function also defines how many parameters the input function
should take in, as well as what it can return. You might be surprised to hear that the map function
passes two parameters to the lambda: a value from the input Array, and the index of that value.
The second parameter is optional and can be left out, but if you only need the second parameter,
you have no choice but to define the first:
```
["one","two","three"] map (v,idx) -> idx
```
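When a lambda does use both parameters, the index can be combined with the value. This sketch
uses String interpolation ("$( )", covered in the section on Strings) to label each item with its
position, producing ["0: one", "1: two", "2: three"]. The parameter names word and idx are
arbitrary.

```
%dw 2.0
output application/dw
---
// word receives each item, idx receives that item's index
["one", "two", "three"] map (word, idx) -> "$(idx): $(word)"
```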
DataWeave has lambdas to avoid burdening the developer with creating countless function names.
This is especially convenient when passing functions to other functions. However, when you use
lambdas you still need to create parameter names. DataWeave’s dollar syntax allows the
developer to create a lambda with even less syntax and mental overhead, without the need to
name parameters. We’ll use the same example as above, using map to add 1 to each number in the
input Array. However, in this example we’ll use dollar syntax:
```
[1,2,3] map $ + 1
```
This also returns [2,3,4]. A single dollar sign refers to the first parameter passed to the
lambda. You can also use a double dollar sign to refer to the second parameter:
```
[1,2,3] map $$
```
The second parameter passed to the lambda is the index of the first parameter, so this would
return [0,1,2]. You can also use a triple dollar sign to refer to the third parameter where
applicable (mostly when dealing with Objects, not Arrays).
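Here is a sketch of the Object case using mapObject, which passes its lambda the value, the key,
and the index of each key-value pair. The same transformation is written once with dollar syntax
and once with named parameters (the prices data is hypothetical). Both fields should hold
{ apple: 1, pear: 3 }, since each value is added to its index.

```
%dw 2.0
output application/dw
var prices = { apple: 1, pear: 2 }
---
{
  // Dollar syntax: $ is the value, $$ is the key, $$$ is the index
  withDollars: prices mapObject { ($$): $ + $$$ },
  // The same lambda with named parameters
  withNames: prices mapObject (value, key, index) -> { (key): value + index }
}
```

The named version is longer, but a reader does not have to remember which number of dollar
signs refers to which parameter.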
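This convenience does not come without a cost, however. I generally only recommend using the
dollar syntax when you’re doing something simple with the lambda. It’s almost always a better
long-term choice to create thoughtful names for your lambda parameters so that developers
reading your code can more easily understand your intentions.
Like variables, functions can also specify type information. Add a colon and a type after each
parameter name to restrict the parameter’s type, and after the closing parenthesis to restrict the
type of the value the function returns: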
```
fun add(n: Number, m: Number): Number =
n + m
```
Adding type information to your functions can be helpful in scenarios where you want your
function to make certain guarantees about the types of its input parameters and output. When you
specify type information, some of it is enforced at design-time (before the code is running) and
some of it is enforced at run-time (when the code is running). For example, the following
function definition would give you a design-time error because the function is guaranteeing a
Number will be returned, but it can only possibly return a String:
```
fun number(): Number =
"hello"
```
DataWeave’s type system is relatively dynamic when compared to languages like Java, and in
some situations it implicitly coerces values from one type to another, such as a Number to a
String. While type coercion can be convenient, it also creates ambiguity. When the author wrote
the following function, did they intend it to be used with Strings, or Numbers?
```
fun add(n, m) =
n + m
```
The more code I read, the more I tend to steer away from this kind of ambiguity; I want the
person reading my code to have a clear understanding of what I originally wanted the code to do.
Because of this aversion to ambiguity, I often specify type information when creating my
functions. You should give some consideration to doing the same!
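Putting that advice into practice, here is a sketch of the earlier hypotenuse function with its
types spelled out, plus a hypothetical greet function whose types make it clear it is meant for
Strings rather than Numbers.

```
%dw 2.0
output application/dw
// The earlier hypotenuse function, now with parameter and return types
fun hypotenuse(n: Number, m: Number): Number =
  sqrt(pow(n, 2) + pow(m, 2))
// The types tell the reader this function is meant for Strings
fun greet(greeting: String, name: String): String =
  greeting ++ ", " ++ name
---
{
  hypotenuse: hypotenuse(3, 4),
  greeting: greet("Hello", "world")
}
```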
## Do Expressions
This section covers do expressions. We’ll start by explaining the core concepts that make do
expressions useful: scoping and name resolution. Once this is established, we’ll discuss what do
expressions are, and useful ways to use them.
```
%dw 2.0
output application/dw
var x = 1
fun addOne(n) = n + 1
---
addOne(x)
```
Instead of concerning ourselves with the output of this script, let’s concentrate on when it’s valid
to reference a variable or function name. If we define any variable, function, or namespace in the
header of the script, it is available everywhere in that script. The variable x and the function
addOne are both used in the body without problem. Let’s change the script a little:
```
%dw 2.0
output application/dw
var x = 1
fun add(n) = n + x
---
add(1)
```
In this scenario, the function add is referencing a variable, x, that was declared outside of the
function body. Since x was declared in an outer scope relative to the add function, DataWeave
can successfully resolve the name x. What is an outer scope? Simply put, an outer scope is a
scope that encapsulates the current scope. For DataWeave, I like to think of scopes as a series of
nested circles: here, the “add function body” scope is a circle drawn inside the larger “script”
scope. DataWeave resolves names using two rules:
1. Relative to the current scope, DataWeave may use outer scopes to resolve names
2. Relative to the current scope, DataWeave may not use inner scopes to resolve names
In reference to our script above, this means that the “script” scope cannot resolve the name n
using the “add function body” scope, but the function can resolve the x name using the outer
“script” scope.
```
%dw 2.0
output application/dw
var n = 2
fun addOne(n) = n + 1
---
addOne(1)
```
Should this script return 2, or 3? The real question is whether the name n in the addOne body is
resolved to the variable or the input parameter. The answer is that n in the function body refers to
the input parameter, not the variable. In computer programming this is called variable
shadowing. DataWeave has the following name resolution rules for this case: if there exist
duplicate names in the current scope and the outer scopes, the name available in the current
scope is used, and the others are ignored. Imagine you are the computer responsible for
executing this program (exciting, right?). When the program starts, the header is evaluated first,
setting the variable n to 2 and creating the addOne function. Next, the body of the script is
evaluated, so the function addOne is called with the value 1. Within the addOne function body, n
can reference two possible values, 2 from the variable and 1 from the input parameter. In the
current scope n is 1, so it is used. The program would need to reach outside of the current scope
to access the other n, so it is ignored.
Adding this case to our earlier rules gives us the complete set of name resolution rules:
1. Relative to the current scope, DataWeave may use outer scopes to resolve names
2. Relative to the current scope, DataWeave may not use inner scopes to resolve names
3. If a duplicate name exists in the current scope and an outer scope, the name resolves to
the value declared in the current scope.
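The following sketch exercises rule 3. The names label and describe are hypothetical. Inside
describe, label resolves to the parameter, while the body of the script, which cannot see into the
function’s scope, still resolves label to the variable. The script should return
{ fromFunction: "inner: parameter", fromScript: "script" }.

```
%dw 2.0
output application/dw
var label = "script"
// The parameter label shadows the variable label inside this function
fun describe(label) = "inner: " ++ label
---
{
  fromFunction: describe("parameter"),
  // Outside the function, only the script-level label is visible
  fromScript: label
}
```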
We’ve seen that a DataWeave script has its own script-level scope, and that scopes can be created
by functions. There is a more flexible way to create a scope in DataWeave: do expressions. Let’s
look at an example:
```
%dw 2.0
output application/dw
var x = 1
---
do {
  var y = 1
  ---
  x + y
}
```
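The above script would return 2. From this script we can deduce a few things about do
expressions:
1. A do expression is created with the do keyword followed by curly braces.
2. Like a script, a do expression has a header where declarations are made and a body that
produces its value, separated by ---.
3. A do expression creates a new scope: y is declared inside it, while x is still resolved from
the outer “script” scope.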
Now that we know what do expressions are and how to create them, let’s discuss how they’re
useful. In my experience, do expressions are great for a few things:
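One such use, hinted at in the section on variables, is giving names to intermediate calculations
without adding them to the script header. The following sketch rewrites the earlier hypotenuse
function this way; nSquared and mSquared are hypothetical names that exist only inside the do
expression’s scope.

```
%dw 2.0
output application/dw
fun hypotenuse(n, m) = do {
  // These names are visible only inside this do expression
  var nSquared = pow(n, 2)
  var mSquared = pow(m, 2)
  ---
  sqrt(nSquared + mSquared)
}
---
hypotenuse(3, 4)
```

Because nSquared and mSquared live in the do expression’s scope, the script body cannot
reference them, which keeps the header free of names that only one function needs.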
## Conditional Logic
Conditional logic in DataWeave is typically expressed in one of three ways:
1. if/else expressions
2. Pattern matching
3. The default keyword
Most developers are familiar with if/else, but many are not familiar with pattern matching. This
section will cover both and when it’s best to make use of one over the other.
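Here is a simple if/else expression:
```
%dw 2.0
output application/dw
var n = 5
---
if (n > 3) "big" else "small"
```
This script would return "big".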
The parentheses after the if keyword are not optional, and the expression within the parentheses
must evaluate to a Boolean value.
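if/else expressions can also be chained with else if to check several conditions in order; the
first condition that evaluates to true wins. In this sketch, sizeLabel is a hypothetical function
and its thresholds are arbitrary. The script returns ["small", "medium", "large"].

```
%dw 2.0
output application/dw
fun sizeLabel(n: Number): String =
  if (n < 10) "small"
  else if (n < 100) "medium"
  else "large"
---
[5, 50, 500] map (n) -> sizeLabel(n)
```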
# Functional Programming
<Text>
## Expressions vs Statements
<Text>
## Arrays
++, +, map, filter, reduce, groupBy, distinctBy
## Objects
++, -, mapObject, filterObject, pluck
## Strings
++, “$()”, contains, replace, splitBy, joinBy, read, write
## Dates
<Text>
## Numbers
<Text>
# Future