[DRAFT] The DataWeave Book


Please note that this work is protected by a Creative Commons License:
https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode

You DO have permission to:

● Copy, distribute, and display this content FOR NON-COMMERCIAL PURPOSES ONLY.

You DO NOT have permission to:

● Create derivative works off of this content

By continuing to read, you agree to comply with the above limitations.

© 2023 Jerney.io LLC 1


# Intro
<text>

## About this Book

## Acknowledgements
My family
Amanda Pearo
Cyril Thornton
David Wang
William Gradin
Mariano De Achaval
Ana Felisatti
Leandro Shoki
Sabrina Marechal
Jordan Schuetz
Meg Durcan
Alex Mendoza
Bera Aksoy
Aaron Lieberman
Manik Magar
John Callahan
Patryk Bandurski
Ernie Maldonado

## About Me
<text>

## Setting up the DataWeave CLI + Playground
<text>

# Overview
This section will give an overview of the DataWeave language. By the end of the section you
will be able to describe what the DataWeave language is and what problems it was designed to
solve. You will also be able to describe the high-level components of the runtime and how they
work together to solve integration problems.

## What is DataWeave?
DataWeave is a functional programming language designed to quickly create efficient data
transformations with fewer bugs. It is a free-to-use programming language built by MuleSoft,
and is the primary transformation and expression language of the Mule 4 Runtime. DataWeave
takes the messy concerns of serializing and deserializing data and turns them into an
implementation detail of the language, allowing the developer to devote more energy to creating
a correct data transformation. It uses a small number of data types, simple syntax, and powerful
functions to transform data to and from a variety of different data formats like JSON, XML and
CSV. Just like Java, Scala, Groovy, and the Mule Runtime as a whole, DataWeave runs on the
battle-tested JVM.

## Why DataWeave?
In the niche of software integration, creating and testing data transformations is a daily concern.
While DataWeave excels at enabling developers to create complex transformations with a
relatively small amount of code, that’s only part of why it is so valuable. Let’s dig into some of
the pain points around transformation to help illustrate what DataWeave does that makes it so
helpful to integration developers.

Traditionally, the difficulty in transformation lies in three places:

1. a “pre-transform” phase where data is parsed into a program’s data structures (typically
done with a library). This is known as deserialization.

2. the actual transforming of the data,
3. a “post-transform” phase where data is written to a particular data format. This is known
as serialization.

Deserialization allows languages like DataWeave to manipulate JSON arrays as a sequence of
values, to manipulate JSON objects as a sequence of key:value pairs, etc. Data formats like
JSON support not only strings, but also arrays, objects, booleans, and numbers. Without
deserialization the data effectively comes into the program as a String, a very limiting type for
data transformation.

Serialization allows languages like DataWeave to create data in a format that other programs
know how to work with. Formats like JSON and XML allow for programs written with different
programming languages to exchange data. Without serialization, programs cannot effectively
communicate with other programs.

Most languages have libraries for serializing and deserializing the most common data formats
like JSON, XML, and CSV. For example, in the case of Java, developers often use 3rd party
libraries like Jackson to parse JSON data into Java Objects. The Java developer designs Classes
that mirror the JSON data being consumed by the system. This design includes the individual
fields of the JSON, their expected types as they relate to Java types, as well as how the overall
structure of the JSON relates to Java Collection types like List and Map. Jackson then uses those
classes to create Java Objects from the JSON data (deserialization). When the data is effectively
transformed and ready to be transported out of the system, it then must be serialized. In this
example, the Java object might be serialized into a byte[] representing JSON, then sent over the
network to a remote system.

While these libraries shield the developer from the complexity of parsing and serializing
different data formats, the developer is still left with the mundane work of building the
appropriate representation of the data in their language and library of choice. But what library
should the developer use? Does the developer need to work with XML in Java? Then they must
choose between SAX, DOM, StAX, or JAXB. Does the developer need to work with JSON in Java?
They’ll need to pick between Jackson, Gson, json-io, Genson, and others. Each of these libraries
has its own unique set of APIs, types, and performance tradeoffs that a developer must
understand before making an informed decision.

At this point you might be thinking it would be great if there was a single library that could
efficiently (de)serialize nearly any data format you throw at it. This way developers would only
need to learn a single API, and a single set of types. They would only need to develop a single
mental model for how to handle the (de)serialization aspects of data transformation. This kind of
ideal exists within MuleSoft but it’s not a library, it’s a programming language: DataWeave.

## Execution Model
DataWeave solves the problem of developers needing to understand multiple (de)serialization
libraries by making (de)serialization an implementation detail of the language. The developer
only needs to tell DataWeave what data format to expect, and DataWeave automatically takes
care of parsing incoming data into its own data types that can be easily queried and transformed
using a diverse and powerful set of selectors, operators, and functions. It does the same thing
when it’s time to write data out into a particular data format. You tell DataWeave what you want
and it takes care of the rest.

How does DataWeave do this? DataWeave utilizes a Reader and Writer that are integral to the
language and its design:

Data comes into DataWeave and hits the Reader first. The Reader takes instructions from the
DataWeave script and deserializes the data into DataWeave objects based on those instructions.
The DataWeave script then queries and transforms the input to create output. This output is then
sent to the Writer, which uses the instructions from the DataWeave script to serialize the
DataWeave values into the desired output data format.

Here’s an example script that takes in an XML payload and writes out a JSON payload:

```
%dw 2.0
input payload application/xml
output application/json
---
payload
```
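
To try the Reader and Writer without supplying an external payload, the sketch below uses the standard library's `read` function to run the XML Reader on a String that lives inside the script. The XML snippet is a made-up sample for illustration, not data from this book's examples.

```
%dw 2.0
output application/json

// Hypothetical sample input, standing in for a real XML payload
var xmlText = "<user><name>Josh</name></user>"
---
// read() deserializes the String as XML; the output directive
// tells the Writer to serialize the resulting value as JSON
read(xmlText, "application/xml")
```

The script contains no parsing or serialization logic of its own. Running it should produce a JSON object with a single `user` key whose value is an object containing `name: "Josh"`.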

# Intro Concepts
This chapter will cover most of the basic concepts necessary to understand simple DataWeave
scripts. By the end of the chapter you should have a firm grasp of the following:

● Components of a DataWeave script
● DataWeave’s primary data types

## Anatomy of a DataWeave Script


In this section, we’ll examine all of the common components of a DataWeave script.

Below is an example of a DataWeave script:

```
%dw 2.0
input payload application/xml
output application/json

import * from dw::core::Arrays

fun greeting(name) =
  "Hello, " ++ name

var obj = {hello: greeting("Foo")}
---
payload
```

The first line declares that this script is for DataWeave 2.0. There is another version of
DataWeave, 1.0, that comes with the Mule 3.7 and later 3.x Runtimes.

The second line declares that this script will deserialize its input as XML, and assign it to the
variable `payload`.

The third line declares that this script will serialize its output as JSON.

After the output declaration is an import declaration. This is how you can use other DataWeave
code within your script.

Below the import line and above the triple dash (---) are a couple other declarations. In this case,
the script is creating a function called “greeting”, and a variable called “obj”.
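
The script above declares “greeting” and “obj” but never uses them, because its body only returns the payload. As a minimal sketch of how header declarations feed the body, here is a variation (not part of the original example) that drops the input and import lines and returns “obj” instead:

```
%dw 2.0
output application/json

// Function declaration: joins a fixed prefix and the given name
fun greeting(name) =
  "Hello, " ++ name

// Variable declaration: calls the function declared above
var obj = {hello: greeting("Foo")}
---
obj
```

This should write out a JSON object with one key, `hello`, whose value is `"Hello, Foo"`.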

Everything above the triple dash (---) is referred to as the header of the script. This is the area of
the script where all the declarations are made. The input and output data formats, imports,
variable declarations and function declarations are all done in the header. DataWeave does not
enforce a particular order, but the common convention is to list your declarations in the
following order if it makes sense to do so:

1. Version
2. Input (if needed)
3. Output
4. Imports
5. Functions
6. Variables

Note that while DataWeave does not enforce a particular order, it does evaluate the header
sequentially, meaning you must declare variables, functions, etc before you use them later in the
header. For example, the following would fail because the variable “x” is used before it’s
declared:

```
%dw 2.0
output application/json

var y = 1 + x
var x = 2
---
y
```

In a DataWeave script, everything below the triple dash (---) is called the body. The body
defines the output of the DataWeave script.

You may see DataWeave scripts that only have a header and contain no input or output
declarations:

```
%dw 2.0

var location = "Denver"

fun greeting(salutation) =
  salutation ++ ", " ++ location
```

These are not executable DataWeave scripts, but are instead modules that can be imported and
used by executable scripts. We will learn about modules in a later chapter.

## Primary Data Types


This subsection will describe DataWeave’s primary data types. It will cover all of the common
types you will work with on a daily basis. We’ll go in depth on some of the more misunderstood
types like DateTime vs LocalDateTime, and cover all the unique features DataWeave gives us for
creating Objects and Arrays. The scope of this subsection is limited to explaining what the types
are and how to create data of these types using DataWeave’s syntax. How to manipulate these
types is reserved for other sections, like the subsection on selectors, and the section on
DataWeave’s standard library.

DataWeave has a core set of types that enable it to represent most common data formats, and
even a few uncommon ones. We can break these types into two categories: scalar types that
represent a single value, and vector types that represent a collection of values. DataWeave’s main
scalar types should be familiar if you’ve worked with other programming languages before. They
are:

1. String
2. Number
3. Boolean
4. Date & Time types:
   a. Date
   b. DateTime
   c. LocalDateTime
   d. Time
   e. LocalTime
   f. Period
      i. DatePeriod
      ii. Duration

DataWeave has two main vector, or collection, types:

1. Object
2. Array

### Strings
Strings represent a series of characters or text. They are created by surrounding text in a pair of
double or single quotes, just like quotes for a character speaking in a novel:

```
var s1 = "I am a string."
var s2 = 'I am also a string.'
```

The choice between double and single quotes should depend on whether or not the String itself
contains double or single quotes. Single quotes within a single-quoted String need to be escaped.
Escaping a character in a String just tells DataWeave to treat that character in a special way. If
you escape a single quote in a single-quoted String, that informs DataWeave that particular single
quote is not the end of the String, and should be treated like the rest of the characters. The same
goes for double quotes. However, if a String is created using double quotes, it does not need to
escape single quotes, and vice versa:

```
var s1 = "I'm a string."
var s2 = 'He said, "I am also a string."'
```

If we take the case above and use single quotes for “s1” and double quotes for “s2”, we would
need to escape the quotes within the String itself with a backslash:

```
var s1 = 'I\'m a string.'
var s2 = "He said, \"I am also a string.\""
```

Escaping quotes within Strings is important because it allows developers to have double quotes
in double-quoted Strings while still allowing DataWeave to accurately determine where the
String starts and ends.

Just because you can use double or single quotes for Strings does not mean you should use
whatever your gut tells you at the time. Be consistent. Pick double or single and stick with it,
only deviating from your choice when you can reap the benefits of not having to escape quotes
within your String.

### Numbers
Numbers in DataWeave are represented like so:

```
var int = 1
var float = 6.67
var eNotation = 6.67E-10
```

Floating point numbers and integers alike are both represented by the Number type in
DataWeave. The language also supports e-notation to represent very small and very large
numbers. By default, when performing mathematical operations, DataWeave will carry through
the precision of the smallest number:

```
6.6700 * 2
// Returns: 13.3400
```

If DataWeave is instructed to perform a mathematical operation on a Number and a String, it will
make a best-effort attempt to coerce that String to a Number and perform the operation:

```
6.6700 * "2"
// Returns: 13.3400
```

I personally find this feature troublesome. In the interest of creating code that is easily
understood and explicit about its intent, I generally advise not to use this feature unless
necessary. This book will cover techniques you can use when writing your own functions to help
guard against this kind of coercion.
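
One way to make the intent explicit is to coerce the String yourself with the `as` operator before doing any math. This is only a sketch of that idea, not necessarily the technique covered later in the book, and the variable name is made up for illustration:

```
%dw 2.0
output application/json

// Hypothetical value that arrived as text, e.g. from a CSV column
var quantity = "2"
---
// The coercion is spelled out instead of being left to DataWeave
6.67 * (quantity as Number)
```

A reader of this script can see that `quantity` is expected to be text that holds a number, which the implicit version hides.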

### Booleans
The two Boolean values in DataWeave are represented by true and false.

```
var booleanTrue = true
var booleanFalse = false
```

DataWeave does not try to coerce non-Boolean values to be true or false. For example, some
languages will interpret the number 0 as being false, and any non-zero number as being true
(looking at you, JavaScript). DataWeave makes no attempt at doing any type coercion to
Boolean.
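
In practice this means a condition should be written as an expression that already evaluates to a Boolean. A minimal sketch, using a made-up variable name:

```
%dw 2.0
output application/json

// Hypothetical count used only for illustration
var itemCount = 0
---
// An explicit comparison produces a real Boolean value
{
  hasItems: itemCount > 0
}
```

Here `hasItems` is `false` because the comparison says so, not because the number 0 was treated as a false value.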

### Date & Time Types


DataWeave supports a powerful set of date and time types.

The Date type represents a certain day on the calendar, and does not contain any information
about time. To create a Date, use the following format, |yyyy-MM-dd|:

```
var date = |2020-07-26|
```

The DateTime type represents a specific instant in time. DateTime values are always
associated with a timezone. This is because without a timezone, a date and time combination
could represent multiple instants in time (e.g., New Year’s Day in Denver and London take
place at different points on a fixed timeline, even though the clock time is the same). Time is
specified in a HH:mm:ss format, and the timezone is specified at the end with a +/-HH:mm
format:

```
var datetime = |2020-07-26T15:32:16-08:00|
```

The LocalDateTime type is just like a DateTime type but without a timezone:

```
var localdatetime = |2020-07-26T15:32:16|
```

It’s important to always question yourself when deciding to use LocalDateTime instead of
DateTime. They look extremely similar, but represent two subtly different things. DateTimes
represent an exact instant in global time, like when an order was placed, or when a user last
logged in to their Twitter account. They represent a singular, unambiguous point on a timeline
because they have a date component, a time component, and a time zone component.
LocalDateTimes, on the other hand, do not have the precision to represent a particular instant in
time. They are ambiguous unless they come with an associated location or a time zone that could
be reasonably assumed. For example, the date and time printed on an invitation to a wedding in
Hawaii would be a LocalDateTime. They are called LocalDateTimes because they can only be
reasoned about as instants in time within the locality to which they apply. LocalDateTimes only
have a date component and a time component; they do not have a time zone component.
Unfortunately, LocalDateTimes tend to be easier to use in the short term because they relieve the
developer of having to think about time zones. However, providing a client with an ambiguous
instant in time instead of a specific one can be disastrous. It is extremely important that the
integration you develop is not the place where time zone information is lost. Be careful when
using LocalDateTime!
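
To make the difference concrete, the sketch below takes a LocalDateTime and attaches a time zone to it with the `++` operator from the standard library, which should yield a DateTime. The variable name and the -10:00 offset (Hawaii) are assumptions chosen to match the wedding example, not values from a real integration:

```
%dw 2.0
output application/json

// Hypothetical ceremony time as printed on the invitation (no time zone)
var ceremony = |2020-07-26T15:32:16|
---
{
  // Ambiguous: only meaningful if the reader knows the locality
  asPrinted: ceremony,
  // Unambiguous: the same clock time pinned to an offset of -10:00
  pinnedToHawaii: ceremony ++ |-10:00|
}
```

The second value carries enough information for any downstream system to place it on a fixed timeline; the first does not.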

Times are like DateTimes but without a date component. Just like DateTimes, they must contain
a time zone component. However, because they lack a date, they cannot represent a particular
instant in time, just a particular time of day.

```
var time = |15:32:16-08:00|
```

LocalTimes are like LocalDateTimes but without a date component. Just like LocalDateTimes,
they do not contain a TimeZone component, and therefore can only be reasoned about within the
locality they apply to. If you invite a colleague for lunch at 12:00, that’s a LocalTime.

```
var localtime = |15:32:16|
```

Whereas all of the previous date and time types represent an instant in time (either
specifically or ambiguously), the next set of types represent an amount of time. When
representing an amount of time, it’s important to remember there are two different ways to think
of a period of time: we can think of it as time-based (referred to as a Duration) or date-based
(referred to as a DatePeriod). The difference, again, is subtle. A Duration is a specific amount of
time. It is the amount of time between two points on a fixed timeline. On the other hand, a
DatePeriod has ambiguity as to how much time actually passes during the period. This is because
DatePeriods like “2 months” are dependent on a particular starting date to determine the precise
amount of time. January and February have a different number of days than March and April do.
January and February 2020 have a different number of total days than January and February
2019 because of how leap years affect the number of days in February. It is because of this subtle
ambiguity that DataWeave has both Durations, which represent a precise amount of time, and
DatePeriods, which represent a more ambiguous and informal amount of time.

DatePeriods start with a P and use Y, M and D to represent years, months, and days. Here is a
period of 1 year, 2 months, and 3 days:

```
var dateperiod = |P1Y2M3D|
```

Durations start with PT, and use H, M, and S to represent hours, minutes, and seconds. Here’s a
duration of 1 hour, 2 minutes, and 3 seconds:

```
var duration = |PT1H2M3S|
```

At the time of this writing, Durations do not support sub-second precision.

DatePeriods and Durations can be applied to the above date and time types with the + and -
operators:

```
var futureTime = |12:00:00| + |PT1H2M3S|
```
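
As a further sketch, here is a complete script that applies a DatePeriod to a Date and a Duration to a DateTime. The key names are made up for illustration; the literals reuse the values from the examples above:

```
%dw 2.0
output application/json
---
{
  // Date + DatePeriod of one day: expected to be 2020-07-27
  nextDay: |2020-07-26| + |P1D|,
  // DateTime - Duration of one hour: expected to be 2020-07-26T14:32:16-08:00
  oneHourEarlier: |2020-07-26T15:32:16-08:00| - |PT1H|
}
```

Because the output format is JSON, both results are written as Strings.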

### Objects
Objects are a collection of key:value pairs. The Keys typically represent some sort of label for
the value. Here’s an example of a simple Object:

```
var object = {
  name: "Jerney.io",
  age: 29
}
```

One thing you might notice right away is that DataWeave’s Objects look a lot like JSON objects.
At first glance, the only difference is that DataWeave does not require quotes around Object
keys. Like JSON objects, Objects in DataWeave support multiple value types within a single
Object. In the example above, the Object contains both a String and Number. In computer
science terms, this means that DataWeave supports heterogeneous Objects. This is in contrast to
a generically typed Map in Java, such as Map<String, Integer>, which is homogeneous: all of the
keys must be the same type, and all of the values must be the same type as well.

Objects can be nested like this:

```
var nestedObject = {
  company: {
    name: "Josh",
    age: 29
  }
}
```

DataWeave’s Object type has two interesting qualities that probably make it different from
Objects in other programming languages you’re used to: its Keys support Attributes and
Namespaces. Let’s look at Attributes first.

Here’s how you add Attributes to a Key:

```
var object = {
  name @(id: 1, loc: "Denver"): "Jerney.io"
}
```

If you’re following along in the DataWeave Playground and you output object as
application/json, you might notice that your Attributes do not show up in the output data. We’ll
cover why this happens in more detail in the MIME Types vs data types section, but for now it
will suffice to say that Key Attributes exist in DataWeave for the purposes of XML support and
are stripped for some data formats like application/json. Go ahead and change your output
declaration to be “application/xml”, and you will see the Attributes displayed in the output:

```
%dw 2.0
output application/xml

var object = {
  name @(id: 1, loc: "Denver"): "Jerney.io"
}
---
object
```

Knowing that DataWeave supports Attributes for XML support, it shouldn’t be surprising to hear
that DataWeave supports Namespaces for the same reason. Namespaces can be declared in the
header and can be used on Keys like this:

```
ns data https://www.data.com

var object = {
  data#name: "Jerney.io"
}
```

Notice that there is no equals sign in the namespace declaration and that the namespace itself is
not quoted. This is important; a DataWeave script will not run if it does not adhere to these rules
for declaring namespaces. Just like above, if you’re following along with the DataWeave
Playground make sure you change your output type to “application/xml”.
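
For reference, here is the namespace snippet placed in a complete script with an XML output declaration. This is only a sketch for the Playground; it reuses the namespace and key from the example above:

```
%dw 2.0
output application/xml

// Namespace declaration: no equals sign, and the URI is not quoted
ns data https://www.data.com

var object = {
  data#name: "Jerney.io"
}
---
object
```

The XML output should contain a single element named with the `data` prefix, along with a matching `xmlns:data` declaration for the URI.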

Objects in DataWeave support one other helpful feature: conditional elements. With conditional
elements, we can provide DataWeave with a test to execute to determine if a key:value pair is
included in the created Object. Here’s an example:

```
var object = {
  (name: "Jerney.io") if (0 > 1),
  (location: "Denver, CO") if (1 == 1)
}
```

In this example, the output Object would not contain the “name” key, but it would include the
“location” key. The parentheses around the if expressions are optional, but the parentheses
around the key:value pairs are not. This feature is extremely useful when building dynamic
queries for Database calls. In these situations, you give Mule a DataWeave Object where each
key is the name of a parameter you want to insert, and each value is the value of that parameter.
Mule will throw an error if it receives a key:value pair that is not used by the query. Conditional
key:value pairs allow developers to handle this scenario in a concise and elegant way.
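
A minimal sketch of that scenario follows. The `filters` variable and its field names are hypothetical stand-ins for whatever search criteria your flow receives; only the criteria that are actually present end up in the parameters Object:

```
%dw 2.0
output application/json

// Hypothetical search criteria; city was not supplied
var filters = {
  status: "ACTIVE",
  city: null
}
---
// Each pair is included only when its value is present
{
  (status: filters.status) if (filters.status != null),
  (city: filters.city) if (filters.city != null)
}
```

With these sample values the result contains only the `status` key, so a query that has no `city` parameter would not receive an unused one.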

DataWeave uses this same concept for multiple aspects of the language. For example, Attributes
can also be created conditionally:

```
var object = {
  name @(
    (id: 1) if (0 > 1),
    loc: "Denver"
  ): "Jerney.io"
}
```

### Arrays
Arrays might be DataWeave’s most prolific and powerful data type. I really did save the best for
last. Like DataWeave’s Objects, Arrays are heterogeneous, meaning they support multiple
different types within the same Array. Here’s an example of an Array:

```
var array = [
  "foo",
  2,
  |2020-01-01|,
  [ { foo: "bar" } ]
]
```

Unlike Objects, Arrays don’t have any unique properties like Attributes and Namespaces;
they merely represent an ordered collection of items.

Like Objects, you can conditionally add items to an Array while the Array is being created:

```
var array = [
  ("foo") if (0 > 1),
  ("bar") if (1 > 1)
]
```
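
Both conditions in the snippet above are false, so that Array would come out empty. As a sketch with one unconditional item, one failing condition, and one passing condition (the item values are made up for illustration):

```
%dw 2.0
output application/json
---
[
  "always",
  // Condition is false, so this item is left out
  ("skipped") if (0 > 1),
  // Condition is true, so this item is included
  ("kept") if (1 == 1)
]
```

The resulting Array should contain two items, `"always"` and `"kept"`, in that order.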

# Reader/Writer Concepts
TODO: INTRO

● MIME types, data types, and how DataWeave glues them together
● How to manipulate the Reader and Writer to handle variations of different data formats
(e.g. reading CSV data without headers, writing XML data without a declaration)

## MIME Types vs Data Types

TODO: INTRO

A crucial aspect of DataWeave that is often overlooked is the relationship between the major
MIME types (application/json, application/xml, and application/csv) and DataWeave’s own data
types (i.e., Array, Object, etc). This section will explain how MIME Types and data types are
translated to and from each other by the DataWeave Reader and Writer. It will also cover the
application/dw MIME type.

Before beginning, let’s describe what a MIME type is. There’s a lengthy formal specification
from the IETF here, but for our purposes, Mozilla’s succinct definition will do:

> “[A MIME type is] a standard that indicates the nature and format of a document, file, or
assortment of bytes.”[1]

In other words, MIME types are metadata (i.e., data describing other data) that describe the
format of a piece of data. If a piece of data has a MIME type of application/json, a program
should assume the data is formatted as a JSON document, and attempt to work with it as such. If
it has a MIME type of application/xml, a program should assume the data is an XML document,
and so on. MIME types can describe data at rest, like a file stored in AWS S3, as well as in
motion, like data traveling through your MuleSoft application, or over the network.

Why do MIME types exist? Remember that computers are incredibly stupid. They only do
exactly what a program instructs them to do. The graphic below represents a stream of characters
as they may be received by a program with the leftmost character being the first, and the
rightmost character being the last.

You, a human with incredibly sophisticated cognition, a knack for pattern recognition, and
knowledge about the JSON data format, can immediately identify that this was probably meant
to be a JSON object. However, to a computer with no contextual information, this is, at best,
just an ordered collection of bytes. The computer cannot infer any meaning beyond that. The job
of the computer in this environment is not to infer; its job is to do exactly what you tell it to do.
It can’t make out that “foo” and “bar” are a key:value pair because it doesn’t have any idea what
constitutes a key:value pair. It doesn’t know what the curly brackets mean, or that there is an
error here because the opening curly bracket is not eventually succeeded by a closing curly
bracket.

MIME types give computers the context they need to make sense out of data like this. When you
tell a program that it should expect application/json and it receives this, it now has the tools it
needs to identify that this is not a properly formatted JSON document. Equipped with
information about the MIME type of the data, the program can now be more useful.

What about data types? Wikipedia has a good definition for data types:

> “In computer science and computer programming, a data type... is an attribute of data which
tells the compiler or interpreter how the programmer intends to use the data”[2]

Like MIME types, data types are metadata (i.e., data that describes other data). Unlike MIME
types, data types are bound to a particular compiler or interpreter. Said another way, data types
only have relevance within the context of a particular programming language environment.
When a program knows the data type of a particular piece of data, it knows exactly what
operations it can and cannot perform on that particular piece of data.

Now that you have an understanding of what MIME types and data types are, how do they apply
to DataWeave? To understand this, we’ll need to take a closer look at the Reader and the Writer.
Here’s a modification of the diagram from earlier:

Notice that MIME types only apply when it comes to data outside of the DataWeave script, and
that data types only apply to data inside the DataWeave script. This is the mental model you need
when reasoning about how DataWeave works.

If data types only exist within a DataWeave script, and MIME types only apply outside of a
script, and DataWeave’s claim to fame is that it can transform data of many different MIME
types, then there must be some kind of translation between MIME types, and data types. That
concern is delegated to the Reader and Writer. They effectively act as translators from MIME
type to data type and back again.

To be able to effectively leverage DataWeave, it is of paramount importance to understand
exactly how MIME types and DataWeave’s data types are translated back and forth by the
Reader and Writer.
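
As a rough sketch of this boundary, the script below uses the standard library's `read` function to push a String through the XML Reader, then checks which DataWeave data types the script sees. The XML text and key names are made up for illustration:

```
%dw 2.0
output application/json

// Hypothetical XML text, translated into DataWeave values by the Reader
var parsed = read("<order><id>42</id></order>", "application/xml")
---
{
  // Inside the script there is no XML, only DataWeave data types
  rootIsObject: parsed is Object,
  idIsString: parsed.order.id is String,
  id: parsed.order.id
}
```

Both checks should come back `true`: the XML element became an Object, and its text content became a String, even though it looks like a number.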

### DW - MIME Type / Data Type Translation Table


The first MIME type we’ll discuss is application/dw. This MIME type is the data format that
DataWeave uses to externally represent its own data structures. The translation table is very
simple; everything is a passthrough:

| Reading DW | DataWeave Data Type | Writing DW |
|------------|---------------------|------------|
| Object     | Object              | Object     |
| Array      | Array               | Array      |
| Attribute  | Attribute           | Attribute  |

And so on...

While the application/dw MIME type will likely never be used as the output format of your Mule
application, it is still important to know, particularly when debugging scripts. The majority of
this section is used to describe how DataWeave maps between external MIME types and its
internal data types, but when it comes to application/dw, there isn’t any mapping that needs to
take place; just look at the translation table again.

Take the following DataWeave script for example:

```
%dw 2.0
output application/dw

ns jerney https://www.jerney.io

var city = "Denver"
---
{
  jerney#person @(id: 1): {
    location: city,
    age: 28 + 1
  }
}
```

This is what it would output:


```
%dw 2.0
ns jerney https://www.jerney.io
---
{
  jerney#person @(id: 1): {
    location: "Denver",
    age: 29
  }
}
```

The only visible differences between a DataWeave script and a piece of data in the application/dw
MIME type are that application/dw does not contain:

1. Unevaluated code - All code in the script is evaluated before it is sent to the Writer.*
2. Variable & function declarations - These are evaluated and used if needed. This
information is not sent to the Writer.
3. Input/Output declarations - These declarations are stripped. They do not have
significance outside of an executable DataWeave script.

* This is not entirely true. The application/dw format does allow unevaluated code, but it will
never be sent out of the Writer. If you send unevaluated code through the Reader, it is eventually
evaluated by the script before it is sent to the Writer.

The application/dw format may or may not contain a triple dash (---) with a header. If the output
contains no namespaces, it will not contain a header.

Data that adheres to the application/dw MIME type will always contain a clearly visible
representation of the structure of the data, including language features like Attributes and
Namespaces. This might not seem useful at first, but consider some actions the Writer might take
in an attempt to coerce DataWeave data types into compliance with a specific MIME type:

1. Coercing Arrays into Objects
2. Stripping type information in favor of a more flexible type (e.g., DateTime to String)
3. Stripping out data that does not apply to the specified output MIME type

Using application/dw as your output type is perfect for times when you want to minimize the
involvement of the Writer and see exactly how DataWeave is representing the output of the script
using its own syntax.
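
To make the second point in the list above concrete, here is a minimal sketch (the key names asDate and asString are made up for illustration) that places a Date next to a String holding the same characters:

```
%dw 2.0
output application/dw
---
{
  asDate: |2023-01-15|,
  asString: "2023-01-15"
}
```

Written as application/dw, the two values should remain distinguishable, because the Date keeps its pipe delimiters:

```
{
  asDate: |2023-01-15|,
  asString: "2023-01-15"
}
```

If the output were changed to application/json, both values would be written as the JSON string "2023-01-15", and the type information would no longer be visible.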

The application/dw MIME type is also important to know because it is the only MIME type
associated with the data going into and out of the DataWeave script itself. Here’s a visual to see
what I mean:

Put another way, application/dw is the MIME type that glues all the other MIME types together
so that DataWeave can work with them.

If you’re ever curious about how DataWeave is representing a particular piece of data internally,
just return it as the output of the script and write that output as application/dw. When working
with Mule, you’ll likely be doing this with the payload of the message, like so:

```
%dw 2.0
output application/dw
---
payload
```

### JSON - MIME Type / Data Type Translation Table

JSON’s translation table is very simple because of how closely related DataWeave’s syntax is to
JSON. JSON objects translate to DataWeave’s Objects, arrays translate to Arrays, and so on. The
symbols they use to represent objects, arrays, strings, and numbers are almost exactly the same.
The biggest discrepancy between the two is that DataWeave has direct support for temporal types
like Date, DateTime, etc., and JSON does not. The Writer merely translates these temporal values
to JSON strings.

| Reading JSON | DataWeave Data Type | Writing JSON |
| --- | --- | --- |
| object | Object (nested or not) | object |
| array | Array (nested or not) | array |
| string | String | string |
| number | Number | number |
| boolean | Boolean | boolean |
| | Date, DateTime, etc. | string |
| | Attribute | N/A |
| | Namespace | N/A |

JSON is more flexible compared to other formats because even something like a lone string or
number with no containing array or object is considered valid JSON. It’s like the Wild West of
data formats. We’ll see with other formats like CSV and XML that the rules are much more
strict.
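
To illustrate the temporal rows of the table, here is a minimal sketch (the key names and values are made up for illustration) that writes a Date and a DateTime as JSON:

```
%dw 2.0
output application/json
---
{
  name: "Jerney",
  founded: |2018-01-15|,
  checkedAt: |2023-06-01T09:30:00Z|
}
```

The Writer should serialize both temporal values as plain JSON strings:

```
{
  "name": "Jerney",
  "founded": "2018-01-15",
  "checkedAt": "2023-06-01T09:30:00Z"
}
```

Anything that later reads this JSON sees only Strings, so a consumer that needs real dates has to cast them back itself.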

### CSV - MIME Type / Data Type Translation Table

CSV’s translation table is simple, but the translation that takes place is not as obvious because
CSV doesn’t have concepts of arrays and objects. Instead it has headers, rows, and fields:

| Reading CSV | DataWeave Data Type | Writing CSV |
| --- | --- | --- |
| CSV Document | Array of Objects with no nesting | CSV Document |
| Row | Object with no nesting | Row |
| Header field | Object key | Header field |
| Row field | Object value, always String | Row field |
| | Number | Coerced to String, Row field |
| | Boolean | Coerced to String, Row field |
| | Date, DateTime, etc. | Coerced to String, Row field |
| | Attribute | N/A |
| | Namespace | N/A |
| | Object with nesting | Error |
| | Array with nesting | Error |

While the table above can be helpful in determining how certain elements translate to and from
each other, it unfortunately cannot tell the whole story. For example, if the output of a
DataWeave script is a single Number, that cannot be translated into properly formatted CSV
data, so DataWeave will throw an error at runtime. The most important thing to remember when
working with CSV data is that DataWeave represents CSVs using an Array of Objects. Every
Object in the Array is a CSV row, and every key in the Object corresponds to a header field on
the CSV. Because of this, Objects cannot have any kind of nesting: CSVs are limited to two
dimensions, but DataWeave’s Objects can be nested. With DataWeave it is always the
developer’s job to make sure the structure of the script’s output is in agreement with what the
Writer expects for the corresponding output MIME type.
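
As a minimal sketch of what that job can look like, the script below flattens nested Objects before they reach the CSV Writer. The people variable and its field names are made up for illustration:

```
%dw 2.0
output application/csv

var people = [
  {name: "Josh", location: {city: "Denver", state: "CO"}},
  {name: "Gina", location: {city: "Chicago", state: "IL"}}
]
---
// Build one flat Object per row so that no value is itself an Object
people map (person) -> {
  name: person.name,
  city: person.location.city,
  state: person.location.state
}
```

Because every Object is now flat, the Writer has a header field for each key and a row for each Object:

```
name,city,state
Josh,Denver,CO
Gina,Chicago,IL
```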

It’s perfectly valid for CSVs to not have headers at all. For example, the following CSV
document does not have a header:

```
david,kim,25
heather,smith,42
```

By default, when the Reader reads in this CSV file and translates it into DataWeave’s data types,
this will be the result:

```
[
{
"david": "heather",
"kim": "smith",
"25": "42"
}
]
```

Probably not the kind of data you were expecting, right? However, if the script informs the
Reader that inbound CSV data will not have a header, it will translate that same CSV to this:

```
[
{
"column_0": "david",
"column_1": "kim",
"column_2": "25"
},
{
"column_0": "heather",
"column_1": "smith",
"column_2": "42"
}
]
```

This brings us to an important point: DataWeave removes a lot of complexity around
(de)serializing these data formats, but it’s not magic. DataWeave cannot automatically determine
whether a CSV file has a header, or whether it should write a CSV with one.
By default, the Reader will always read the first line of a CSV file as the header, and the Writer
will always write Object keys as the CSV header. You can affect this behavior with Reader and
Writer properties, which will be covered in the next two sections.

### XML - MIME Type / Data Type Translation Table

XML is the most complex data format that DataWeave handles. Aspects like attributes and
namespaces are relatively unique to the format, and like CSV there are peculiarities around the
overall structure of XML that need to be accounted for when reading from and writing to this
data format. Let’s start with the translation table:

| Reading XML | DataWeave Data Type | Writing XML |
| --- | --- | --- |
| XML Document | Object at root with one key | XML Document |
| XML Element | Object | XML Element |
| XML Tag | Object key | XML Tag or Attribute Key |
| XML Tag Content | String | XML Tag Content/Attribute Value |
| Repeating XML Element | Object with Repeating keys | Repeating XML Element |
| XML Attribute | Attribute | XML Attribute |
| XML Namespace | Namespace | XML Namespace |
| XML CData | CData | XML CData |
| | Date, DateTime, etc. | XML Tag Content or Attribute Value |
| | Number | XML Tag Content or Attribute Value |
| | Boolean | XML Tag Content or Attribute Value |
| | Array (without nested Arrays) | Translated to Object w/ Repeated keys if possible, then to Repeating XML Element |
| | Array (with nested Arrays) | Error |

Depending on your knowledge of XML, this table could be extremely confusing. Let’s use some
concrete examples to further explain each row.

| Reading XML | DataWeave Data Type | Writing XML |
| --- | --- | --- |
| XML Document | Object with one key | XML Document |

The simplest XML Document might look like this:

```
<foo>bar</foo>
```

The Reader translates this to the following in DataWeave:

```
{foo: "bar"}
```

Take the following Object with two keys:

```
{foo: "bar", baz: "bat"}
```

Would DataWeave be able to output this to XML? Before you give it a try in the DataWeave
playground, take a moment to speculate what might happen.

If you thought that DataWeave would not be able to output the Object to XML, you would be
correct! This is because all XML documents must have a single root element. In DataWeave, this
means the root of the data you’re outputting to XML must be an Object with a single key! You
may have any number of keys in child Objects, but the root Object must only contain a single
key.
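
One way to satisfy this rule is to wrap the Object in a single root key before it reaches the Writer. This is a minimal sketch; the key name root and the variable data are arbitrary choices for illustration:

```
%dw 2.0
output application/xml

var data = {foo: "bar", baz: "bat"}
---
{
  // A single key at the root becomes the XML document's root element
  root: data
}
```

With the wrapper in place, the two original keys become child elements:

```
<?xml version='1.0' encoding='UTF-8'?>
<root>
  <foo>bar</foo>
  <baz>bat</baz>
</root>
```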

| Reading XML | DataWeave Data Type | Writing XML |
| --- | --- | --- |
| XML Element | Object | XML Element |
| XML Tag | Object key | XML Tag or Attribute Key |
| XML Tag Content | String | XML Tag Content/Attribute Value |

We saw this in our previous example: XML Elements like <hello>world</hello>
translate to DataWeave Objects. XML Elements are composed of two parts, the tag <hello>,
and the tag content, world. These two XML concepts translate to DataWeave Object keys and
values respectively. Given the following input:

```
<root>
  <name>Jerney</name>
  <age>500</age>
  <location>
    <state>CO</state>
    <city>Denver</city>
  </location>
</root>
```

The Reader will translate this like so:

```
{
root: {
name: "Jerney",
age: "500",
location: {
state: "CO",
city: "Denver"
}
}
}
```

Notice that “age”, which we would identify as a Number, is instead translated by the Reader to
be a String. This is because XML Tag Content has no type information associated with it,
making it very flexible. To accommodate this flexibility, DataWeave translates all XML Tag
Content to the String data type.

Writing is the same concept in reverse. With Numbers and Dates, the DataWeave Writer casts
these types to String and writes them out as XML Tag Content.
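
If a script needs “age” as a Number, it has to cast the value itself. Here is a minimal sketch, assuming the XML document above arrives as the payload:

```
%dw 2.0
input payload application/xml
output application/dw
---
{
  name: payload.root.name,
  // The Reader produced a String; cast it explicitly to get a Number
  age: payload.root.age as Number
}
```

Viewed as application/dw, age should no longer be quoted:

```
{
  name: "Jerney",
  age: 500
}
```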

| Reading XML | DataWeave Data Type | Writing XML |
| --- | --- | --- |
| Repeating XML Element | Object with Repeating keys | Repeating XML Element |

XML has support for repeated elements. It is how XML supports array-like features. Here’s an
example:

```
<people>
<person>
<name>Xiao</name>
</person>
<person>
<name>Raj</name>
</person>
<person>
<name>Katherine</name>
</person>
</people>
```

Here’s how the above XML would be translated to DataWeave:

```
{
people: {
person: { name: "Xiao" },
person: { name: "Raj" },
person: { name: "Katherine" }
}
}
```

Interesting: DataWeave Objects support duplicate keys! To most developers, a programming
language supporting duplicate keys on an Object or Map would seem like a bug. However, if you
look at DataWeave and zoom out to see the data formats it needs to support, it becomes clear that
for DataWeave to fully support XML, it needs to support duplicate keys on Objects. This has
some interesting ramifications. If the Object data type supports duplicate keys, how does the
Writer handle them when outputting JSON and CSV? The JSON and CSV specs do not
explicitly forbid duplicate object keys or duplicate headers, respectively. In order to be
fully supportive of those specs, DataWeave simply allows duplicate keys for JSON and duplicate
headers for CSV. It is the developer’s job to handle duplicate keys if there should not be any in
the output.
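
Duplicate keys also matter when selecting values. As a hedged sketch, assuming the people XML document from above arrives as the payload: the single-value selector (payload.people.person) returns only the first match, while the multi-value selector (.*), which this section has not introduced, collects every matching key into an Array:

```
%dw 2.0
input payload application/xml
output application/json
---
// .*person gathers the value of every "person" key into an Array
payload.people.*person map (person) -> person.name
```

Since the result is a real Array, the JSON Writer can serialize it directly:

```
[
  "Xiao",
  "Raj",
  "Katherine"
]
```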

You might notice something strange when comparing the XML translation tables to JSON and
CSV: XML is the only data format that has no clear translation for DataWeave’s Array type.
That’s odd. DataWeave is supposed to be a language that supports easy reading and writing to
multiple different kinds of data formats. The developer should be able to read in JSON and write
to XML. But JSON supports Arrays and XML does not. How does a developer translate an
Array of values to repeating elements in XML? You have two options: you can leave it to the
Writer to attempt to sort it out, or you can do it yourself using dynamic elements. Neither
of these options has an analog in other languages, and in my experience this ends up
being a pain point when working with XML.

If the developer leaves it to the Writer, they might be surprised:

Here’s the script:

```
%dw 2.0
output application/xml
---
{
root: {
people: [
{person: "Xiao"},
{person: "Raj"},
{person: "Katherine"}
]
}
}
```

And here’s the output:

```
<?xml version='1.0' encoding='UTF-8'?>
<root>
  <people>
    <person>Xiao</person>
  </people>
  <people>
    <person>Raj</person>
  </people>
  <people>
    <person>Katherine</person>
  </people>
</root>
```

If the Writer can handle writing DataWeave Arrays to XML, why isn’t it in the translation table?
I can’t claim to know exactly what’s going on here because I don’t have access to the DataWeave
source code, but here’s my understanding: When dealing with XML, the Writer preprocesses the
DataWeave output, coercing Arrays to be Objects before finally serializing the data. It does so
with the following algorithm:

```
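# objWithArray example: {"data": [1, 2, 3, 4]}
def coerceToObject(objWithArray):
    # Assumes the Object has exactly one key, and that key holds an Array
    key = next(iter(objWithArray))
    arr = objWithArray[key]

    # Arrays nested inside the Array cannot be written as XML
    for item in arr:
        if isinstance(item, list):
            raise ValueError("nested Arrays are not supported")

    # A list of (key, value) pairs stands in for a DataWeave Object,
    # since a Python dict cannot hold duplicate keys
    new_obj = []
    for item in arr:
        new_obj.append((key, item))

    return new_obj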
```

Let’s imagine that you needed this output instead of the one above:

```
<?xml version='1.0' encoding='UTF-8'?>
<root>
<people>
<person>Xiao</person>
<person>Raj</person>
<person>Katherine</person>
</people>
</root>
```

Given this use case, a developer cannot leave it up to the Writer to provide the correct output,
because the algorithm it uses to handle the Array will always result in a repeated “people” tag for
every “person” tag. DataWeave provides an interesting feature for handling this scenario called
“dynamic elements”. Dynamic elements are a technique for dynamically adding key:value pairs to an
Object using existing Objects and/or Arrays of Objects. Confusing? I’ll show how a developer
would use this feature to obtain the above output, then describe how it works:

```
%dw 2.0
output application/xml
---
{
root: {
people: {
([
{person: "Xiao"},
{person: "Raj"},
{person: "Katherine"}
])
}
}
}
```

This would then yield the correct output:

```
<?xml version='1.0' encoding='UTF-8'?>
<root>
<people>
<person>Xiao</person>
<person>Raj</person>
<person>Katherine</person>
</people>
</root>
```

There are three pieces that the developer needs to get in place to use Dynamic Elements:

1. The Object that you are dynamically adding elements to must be declared explicitly with
curly braces.
2. The dynamic elements that you wish to add to the Object must be wrapped in
parentheses.
3. The dynamic elements that you wish to add to the Object must be either an Object, or an
Array of Objects.

The comments in the script below mark where each of the three rules applies:

```
%dw 2.0
output application/xml
---
{
  root: {
    // 1. The containing Object, declared explicitly with curly braces
    people: {
      // 2. The dynamic elements, wrapped in parentheses
      (
        // 3. An Array of Objects (a single Object works too)
        [
          {person: "Xiao"},
          {person: "Raj"},
          {person: "Katherine"}
        ]
      )
    }
  }
}
```


If you follow these rules, DataWeave will take the key:value pairs of all of the Objects provided
in the parentheses and add them to the root of the containing Object. This is much easier to see
when outputting to application/dw since the data format more closely mirrors DataWeave’s
syntax. If the output of the above script was changed to be application/dw, this would be the
result:

```
{
root: {
people: {
person: "Xiao",
person: "Raj",
person: "Katherine"
}
}
}
```
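
In practice the Array of Objects is rarely written out by hand. A minimal sketch, assuming a hypothetical names variable that stands in for data read from an input, builds it with map inside the parentheses:

```
%dw 2.0
output application/xml

var names = ["Xiao", "Raj", "Katherine"]
---
{
  root: {
    people: {
      // map yields an Array of single-key Objects, which the
      // parentheses then spread into the containing Object
      (names map (name) -> {person: name})
    }
  }
}
```

This should produce the same XML as the hand-written version:

```
<?xml version='1.0' encoding='UTF-8'?>
<root>
  <people>
    <person>Xiao</person>
    <person>Raj</person>
    <person>Katherine</person>
  </people>
</root>
```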

| Reading XML | DataWeave Data Type | Writing XML |
| --- | --- | --- |
| XML Attribute | Attribute | XML Attribute |

The translation to and from DataWeave Attributes and XML Attributes is fairly straightforward.
For the following script:

```
%dw 2.0
output application/xml
---
{
root: {
person @(id: 1): "jerney"
}
}
```

The XML output would be:

```
<?xml version='1.0' encoding='UTF-8'?>
<root>
<person id="1">jerney</person>
</root>
```

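If you were to take the XML output and use it as input to a script that outputs the same data
with application/dw, you would see an output very similar to the previous script. The one
difference is that the attribute value comes back as a String, because XML carries no type
information for attribute values:

```
{
  root: {
    person @(id: "1"): "jerney"
  }
}
```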

| Reading XML | DataWeave Data Type | Writing XML |
| --- | --- | --- |
| XML Namespace | Namespace | XML Namespace |

Like Attributes, the translation of DataWeave Namespaces to and from XML is fairly
straightforward. Given the following script:

```
%dw 2.0
output application/xml

ns jerney https://www.jerney.io
---
{
  jerney#root: {
    jerney#person: "jerney"
  }
}
```

The XML output would be:

```
<?xml version='1.0' encoding='UTF-8'?>
<jerney:root xmlns:jerney="https://www.jerney.io">
  <jerney:person>jerney</jerney:person>
</jerney:root>
```

| Reading XML | DataWeave Data Type | Writing XML |
| --- | --- | --- |
| XML CData | CData | XML CData |

DataWeave handles XML CData as well, which brings us to a new DataWeave type: CData.
XML CData is translated by the Reader into DataWeave’s CData data type, and the Writer
translates it back to XML CData. XML CData is really just a string of characters
embedded as XML Tag Content that an XML parser does not interpret as markup. Because of this,
DataWeave’s CData type can be thought of as a type alias to the String type; all operations that
work on String work on CData as well. The only difference between a String and CData is how
the Reader and Writer deal with values of this type when working with XML. Let’s look at the
Reader first. We’ll send an XML document with CData as input, and view it as application/dw:

Input:

```
<?xml version='1.0' encoding='UTF-8'?>
<root>
<person><![CDATA[<embedded>xml</embedded>]]></person>
</root>
```

Representation as application/dw:

```
{
  root: {
    person: "<embedded>xml</embedded>" as String {cdata: true}
  }
}
```

The details of how DataWeave handles CData are more complicated than actually working with
it. The CData information is read in as a String by the Reader, and is appended with as
String {cdata: true}. This is called casting, and it’s used to transform data from one
type to another type in DataWeave. However, that’s not exactly what’s going on here.
DataWeave is casting a String to a String, but adding this metadata of {cdata: true} to it so
that if that String gets passed to the Writer, and the Writer is serializing the data to XML, it
knows that String should be CData. This metadata doesn’t make the String different from any
other String in DataWeave. The following evaluates to true:

```
"Hello" as String {cdata: true} == "Hello"
```

The lesson here is that when you’re working with CData after reading it in from XML data, you
can simply treat it as any other String.

When writing CData you can do the same thing:

```
%dw 2.0
output application/xml
---
{
root: {
person: "<embedded>jerney</embedded>"
as String {cdata: true}
}
}
```

This would output:

```
<?xml version='1.0' encoding='UTF-8'?>
<root>
<person><![CDATA[<embedded>jerney</embedded>]]></person>
</root>
```

However, it is preferred that you simply cast the String as CData:

```
%dw 2.0
output application/xml
---
{
root: {
person: "<embedded>jerney</embedded>" as CData
}
}
```

This has the exact same effect. "String" as String {cdata: true} and "String"
as CData are functionally equivalent in DataWeave.

### Conclusion
<text>

[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types
[2] https://en.wikipedia.org/wiki/Data_type

## Writer Properties
Writer properties are instructions that a DataWeave script supplies to the Writer. These
instructions inform the Writer on exactly how to serialize the output. For example, sometimes
you may want to create some CSV data with a header, and other times without a header. Maybe
you want a tab-separated document instead of a comma-separated one; Writer properties allow
for this kind of flexibility.

Writer properties do not always affect something that can be seen in the output of the script.
Some writer properties influence the execution aspects of the script, like streaming and buffer
size.

Using the diagram from earlier, here is the aspect of DataWeave this section will focus on:

Here is an example of a Writer property in a DataWeave script. This one creates a CSV with no
header where all values are quoted:

```
%dw 2.0
output application/csv header=false, quoteValues=true
---
[
{
name: "Alexis",
age: 42
},
{
name: "Raj",
age: 35
}
]
```

Which would output the following:

```
"Alexis","42"
"Raj","35"

```

Writer properties are simply key:value pairs on the same line as the output declaration. However,
unlike most key:value pairs in DataWeave, keys are associated with values through an equals
sign, and pairs are separated by a comma.

### JSON

Writer properties for the application/json MIME type can be used to handle duplicate object
keys, compress the output by removing unnecessary whitespace, and filter nulls from the output
JSON. There are other properties that can be used to change the buffer size of the Writer or change
the encoding of the output, but they are less commonly used. This section will cover the
duplicateKeyAsArray, indent, skipNullOn, and writeAttributes Writer properties.

The duplicateKeyAsArray property informs the Writer on how to handle duplicate Object keys.
By default, DataWeave Objects containing duplicate keys will be serialized to JSON objects
as-is. If you set duplicateKeyAsArray to true, the Writer will collapse the duplicate keys into one,
and collect all values for the duplicate key into a JSON array. This prevents you from having to
implement this functionality yourself. Here’s an example:

```
%dw 2.0
output application/json duplicateKeyAsArray=true
---
{
name: "Alexis",
name: "Tim",
name: "Raj",
location: "Denver"
}
```

This will output:

```
{
"name": [
"Alexis",
"Tim",
"Raj"
],
"location": "Denver"
}
```

The indent property controls whether the Writer “pretty prints” the JSON payload. By default,
this value is set to true. Setting it to false informs the Writer that it should instead create
the JSON all on a single line. This can be useful when you wish to conserve
network bandwidth, as it acts as a filter to prevent sending insignificant whitespace over the
wire. On the other hand, it can make debugging a little more cumbersome because the JSON
won’t be as easy to read.

Here’s an example:

```
%dw 2.0
output application/json indent=false
---
{
foo: "bar",
baz: "bat"
}
```

This will output:

```
{"foo": "bar","baz": "bat"}

```

The skipNullOn property is one of the most commonly used properties in DataWeave. It allows
you to remove null values from JSON arrays, remove key:value pairs in JSON objects when the
value is null, or remove nulls in both cases. Unlike the other properties covered so far, the
skipNullOn property does not use a Boolean value, but instead uses a String. By default, this
property is effectively turned off.

Here’s an example of removing null values from JSON arrays:

```
%dw 2.0
output application/json skipNullOn="arrays"
---
{
foo: null,
bar: ["foo", null, "bar"]
}
```

This will output:

```
{
"foo": null,
"bar": [
"foo",
"bar"
]
}
```

Here’s an example of removing key:value pairs in a JSON object when the value is null:

```
%dw 2.0
output application/json skipNullOn="objects"
---
{
foo: null,
bar: ["foo", null, "bar"]
}
```

This will output:

```
{
"bar": [
"foo",
null,
"bar"
]
}
```

Finally, here is an example of removing null values for both JSON arrays and JSON objects:

```
%dw 2.0
output application/json skipNullOn="everywhere"
---
{
foo: null,
bar: ["foo", null, "bar"]
}
```

This will output:

```
{
  "bar": [
    "foo",
    "bar"
  ]
}
```

The writeAttributes property is mostly used when the input is XML, the output is JSON, and the
JSON output must retain the attributes. By default, this property is set to false, meaning attributes
will not be serialized to the output.

Here’s an example of the writeAttributes property set to true:

```
%dw 2.0
output application/json writeAttributes=true
---
{
foo @(bar: "bat"): ["foo", "bar", "baz"]
}
```

This will output:

```
{
"foo": {
"@bar": "bat",
"__text": [
"foo",
"bar",
"baz"
]
}
}
```

The “foo” key is now associated with a JSON object, despite it being associated with a
DataWeave Array in the script. The JSON object has two keys. Keys prefixed with “@” are
attribute keys from the script paired with their associated value. The value associated with the
“__text” key is the same value associated with the “foo” key in the DataWeave script.

### CSV

Writer properties for the application/csv MIME type can be used to change the character that
separates fields, create a CSV without a header, and more. There are less commonly used Writer
properties, like those used to set an alternative quote character, change the output encoding, and
others. This section will cover the separator and header Writer properties.

The separator property is useful for when you want to change the character that separates fields
in a CSV. This is how you make tab-separated and pipe-separated files.

Here’s an example of creating a tab-separated file:

```
%dw 2.0
output application/csv separator="\t"
---
[
{
name: "Alexis",
age: 42
},
{
name: "Raj",
age: 35
}
]
```

Which would output the following:

```
name	age
Alexis	42
Raj	35

```

You can set the separator property to any single character. If you provide more than one
character, DataWeave will ignore all but the first.
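
The same property produces the pipe-separated files mentioned above. Here is a minimal sketch, reusing the sample data from the tab-separated example:

```
%dw 2.0
output application/csv separator="|"
---
[
  {
    name: "Alexis",
    age: 42
  },
  {
    name: "Raj",
    age: 35
  }
]
```

The header and every row should now use a pipe between fields:

```
name|age
Alexis|42
Raj|35
```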

By default, the header writer property is set to true, meaning the output CSV will contain a
header describing each column of data. You can set the header property to false if you wish to
output application/csv without a header. Here’s an example:

```
%dw 2.0
output application/csv header=false
---
[
{
name: "Alexis",
age: 42
},
{
name: "Raj",
age: 35
}
]
```

This script would output the following application/csv:

```
Alexis,42
Raj,35

```

### XML

Writer properties for the application/xml MIME type can be used to avoid writing null values to
the output, compress the output by removing unnecessary whitespace, avoid writing the XML
declaration to the output, choose between self-closing tags and separate open and close tags for
empty elements, and more. This section will cover the skipNullOn, indent, writeDeclaration, and
inlineCloseOn writer properties.

The skipNullOn writer property for XML is similar to the one used for JSON. The difference is
“arrays” and “objects” are no longer valid values, but “elements” and “attributes” are. The
“everywhere” value still applies to both JSON and XML’s skipNullOn property.

When using the “elements” value, DataWeave will remove all null elements from the output.
Here’s an example:

```
%dw 2.0
output application/xml skipNullOn="elements"
---
{
root: {
person @(id: 1, status: null): {
name: "Alex",
age: null
}
}
}
```

This script would output the following XML:

```
<?xml version='1.0' encoding='UTF-8'?>
<root>
<person id="1" status="null">
<name>Alex</name>
</person>
</root>
```

Notice that the <age> element does not make it to the output. This is because the value
associated with that element was null, so DataWeave removed it in the output XML.

You can do the same with null XML attributes by setting skipNullOn to “attributes”. Here’s an
example using the same data:

```
%dw 2.0
output application/xml skipNullOn="attributes"
---
{
root: {
person @(id: 1, status: null): {
name: "Alex",
age: null
}
}
}
```

This script would output the following XML:

```
<?xml version='1.0' encoding='UTF-8'?>
<root>
<person id="1">
<name>Alex</name>
<age/>
</person>
</root>
```

In this case, the status attribute is not output at all because it was null. However, the empty age
element exists because skipNullOn="attributes" will not remove null elements.


If you need to remove both null elements and null attributes, use
skipNullOn="everywhere". Here’s the same example with the same data as before:

```
%dw 2.0
output application/xml skipNullOn="everywhere"
---
{
root: {
person @(id: 1, status: null): {
name: "Alex",
age: null
}
}
}
```

This script would output the following XML:

```
<?xml version='1.0' encoding='UTF-8'?>
<root>
<person id="1">
<name>Alex</name>
</person>
</root>
```

In this case, neither the null attributes nor the null elements make it to the output.

The indent writer property can be used to remove any unnecessary whitespace. Here’s an
example using the above data:

```
%dw 2.0
output application/xml indent=false
---
{
root: {
person @(id: 1, status: null): {
name: "Alex",
age: null
}
}
}
```

This script would output the following XML:

```
<?xml version='1.0' encoding='UTF-8'?><root><person id="1" status="null"><name>Alex</name><age/></person></root>
```

While this makes the XML more difficult for people to read, a computer can parse this easily. By
using indent=false the Writer effectively compresses the output, making it more efficient to
transmit the data over the network.

You can use writeDeclaration=false if you would like the Writer to not include the XML
declaration in the output. Here’s an example:

```
%dw 2.0
output application/xml writeDeclaration=false
---
{
root: {
person @(id: 1, status: null): {
name: "Alex",
age: null
}
}
}
```

This script would output the following XML:

```
<root>
<person id="1" status="null">
<name>Alex</name>
<age/>
</person>
</root>
```

This feature is useful if you’re working with legacy software or other software that has problems
parsing an XML declaration.

Use inlineCloseOn="none" to avoid using self-closing XML tags. Here’s an example:

```
%dw 2.0
output application/xml inlineCloseOn="none"
---
{
root: {
person @(id: 1, status: null): {
name : "Alex",
age : null,
other : ""
}
}
}
```

This script would output the following XML:

```
<?xml version='1.0' encoding='UTF-8'?>
<root>
<person id="1" status="null">
<name>Alex</name>
<age></age>
<other></other>
</person>
</root>
```

If inlineCloseOn="none" was not set, the age and other tags would appear as <age/> and
<other/> respectively. With this property set, null values and empty strings appear as elements
with an open and close tag containing no value.

## Reader Properties
Reader properties are instructions that a DataWeave script provides to the Reader. The job of
Reader properties is to inform the Reader how to parse the input data so that it may be correctly
deserialized into DataWeave’s data types. For example, you may know incoming data of MIME
type application/csv does not contain a header. Without using Reader properties, the Reader will
always parse the first line of application/csv data as the header, not a row of data. You can use
Reader properties to inform the Reader that the first line of the application/csv data is not a
header, but the first row of data.

Reader properties not only inform the Reader how to properly parse the input data, they can
also control execution aspects of the Reader, like whether to stream the input data.

This section will demonstrate how Reader properties work by showing the input data, a simple
DataWeave script with the reader properties, and the output of the script as application/dw. By
outputting the input data as application/dw, we can see exactly how DataWeave is representing
the data in terms of its own data types.

Using the diagram from earlier, here is the aspect of DataWeave this section will focus on:

### JSON
DataWeave only has one Reader property related to application/json: it is used to inform the
Reader that the input should be streamed. Streaming is outside the scope of this book at this
time, and may be covered in a future iteration. When it comes to application/json, either the
Reader can parse the input data as-is, or the input data is not compliant with the JSON
specification.

### CSV
DataWeave has a few important Reader properties that you should be aware of. The most
important and commonly-used Reader properties for application/csv are header, which declares
whether or not the input CSV’s first line is a header, and separator, which declares what character
the input CSV is using as a field separator.

By default, DataWeave reads application/csv with two big assumptions: the data contains a
header, and fields are separated by a comma. For example, take a look at the following CSV:

```
Josh,29,Denver
Gina,22,Chicago
Edward,33,New York

```

To us, the CSV clearly has 3 records (rows), and 3 fields per record (columns). They seem to be
related to a person’s name, age, and some kind of location. However, if we take that data and
send it to the following DataWeave script:

```
%dw 2.0
input payload application/csv
output application/dw
---
payload
```

The script would output this:

```
[
{
Josh: "Gina",
"29": "22",
Denver: "Chicago"
},
{
Josh: "Edward",
"29": "33",
Denver: "New York"
}
]
```

Instead of 3 records there are only 2, and instead of the headers being translated into meaningful
keys like “name”, “age” and “location”, DataWeave used its default behavior and parsed the first
record of the CSV as the header and the remaining two rows as data. To get around the default
behavior, you need to set the Reader property header to false. Given the same input as before:

```
Josh,29,Denver
Gina,22,Chicago
Edward,33,New York

```

If you update the script to inform the Reader that the input CSV will not contain a header:

```
%dw 2.0
input payload application/csv header=false
output application/dw
---
payload
```

DataWeave will output something more inline with how you originally understood the CSV data:

```
[
{
column_0: "Josh",
column_1: "29",
column_2: "Denver"
},
{
column_0: "Gina",
column_1: "22",
column_2: "Chicago"
},
{
column_0: "Edward",
column_1: "33",
column_2: "New York"
}
]
```

Instead of parsing the first line of data as the header, DataWeave now interprets the first line as
the first row of data. The output is still an Array of Objects, but the keys are now given names
related to the columns of the CSV.

What if you’re reading data that conforms to the application/csv MIME type, but is separated by
pipes instead of commas?

```
name|age|location
Josh|29|Denver, CO
Gina|22|Chicago, IL
Edward|33|New York, NY

```

If we send the above data to the following DataWeave script:

```
%dw 2.0
input payload application/csv
output application/dw
---
payload
```

It will output the following:

```
[
{
"name|age|location": "Josh|29|Denver",
column_1: " CO"
},
{
"name|age|location": "Gina|22|Chicago",
column_1: " IL"
},
{
"name|age|location": "Edward|33|New York",
column_1: " NY"
}
]
```

That’s probably not what you were hoping for! Let’s break down what happened. First,
DataWeave parsed the CSV header, looking for commas to separate the individual header values.
Since there are no commas in the header, DataWeave interpreted the header as having only one
value, “name|age|location”. From that point forward, DataWeave parsed the CSV as if each row
contained a single value. It then parsed each subsequent row, again looking for commas to
determine the individual values and trying to fit them to the header. In this case DataWeave
does find a comma in each row, but now there’s a dilemma: after parsing the header, it moved
forward assuming each row of data would have only one field. What it found instead was that
each row contained two fields (everything before the comma, and everything after). To
accommodate the additional field, DataWeave assigns it a default column name based on its
position in the CSV row, which is column_1 in this case.

These kinds of issues can easily be addressed with the separator Reader property. You can tell
DataWeave to treat any single-character String as a separator, including a tab ("\t"). Let’s see it
in action.

```
name|age|location
Josh|29|Denver, CO
Gina|22|Chicago, IL
Edward|33|New York, NY

```

If we send the above input to the following DataWeave script:

```
%dw 2.0
input payload application/csv separator="|"
output application/dw
---
payload
```

You’ll get output that’s parsed into data types that better represent the original data:

```
[
{
name: "Josh",
age: "29",
location: "Denver, CO"
},
{
name: "Gina",
age: "22",
location: "Chicago, IL"
},
{
name: "Edward",
age: "33",
location: "New York, NY"
}
]
```
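
Reader properties can also be combined. The following is a minimal sketch, assuming a hypothetical pipe-separated input that has no header row (the sample data in the comment is made up for illustration). Multiple properties are separated by a comma in the input directive:

```dataweave
%dw 2.0
// Assumed sample input (no header row, pipe-separated):
//   Josh|29|Denver, CO
//   Gina|22|Chicago, IL
input payload application/csv header=false, separator="|"
output application/dw
---
payload
```

Based on the behavior shown in the previous examples, each row should be read as an Object with the default keys column_0, column_1, and column_2, and the commas inside the location values should no longer split the field.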

### XML

application/xml is another MIME type that has few Reader properties outside the domains
of performance, streaming, and security.

## Advanced: Programmatically Accessing the Reader and Writer
DataWeave’s Reader and Writer contain a lot of power. Between the two of them they can
(de)serialize all the common MIME types that you’re likely to run into. So far we’ve only
discussed accessing the Reader and Writer through the input and output directives contained in
the header of a DataWeave script. DataWeave also provides two functions, read and write, that
allow direct access to the Reader and Writer, respectively.

While input and output directives cover most of our Data Type / MIME Type translation needs,
having direct access to the Reader and Writer via functions is necessary in some situations. For
example, maybe for some horrible reason you need to embed JSON into XML. That’s easy to do
with the write function:

```
%dw 2.0
output application/xml
---
{
root: {
data: write({hello: "world"},
"application/json",
{indent: false}) as CData
}
}
```

The above script would output the following:

```
<?xml version='1.0' encoding='UTF-8'?>
<root>
<data><![CDATA[{"hello": "world"}]]></data>
</root>
```

Notice that the write function mimics how the Writer works. The first parameter is just like the
output of a DataWeave script before it hits the Writer. The second parameter is the MIME type
that the Writer should use when serializing the data, just like an output declaration. The third
parameter is optional and contains any Writer properties formatted as an Object.

The read function is typically useful in the opposite situations: if write is useful for writing
XML with embedded JSON, then read is useful for reading XML with embedded JSON.

Take the following input for example:

```
<?xml version='1.0' encoding='UTF-8'?>
<root>
<data><![CDATA[{"hello": "world"}]]></data>
</root>
```

If you pass it to the following script:

```
%dw 2.0
input payload application/xml
output application/dw
---
{
beforeRead : payload.root.data,
afterRead : read(payload.root.data, "application/json")
}
```

You get the following output:

```
{
beforeRead: "{\"hello\": \"world\"}" as String {cdata: true},
afterRead: {
hello: "world"
}
}
```

In other words, the read function, just like the Reader, deserializes strings of data into
DataWeave’s native data types. This is fantastic because now we can treat that embedded JSON
just like we would if it came directly through the Reader. If we didn’t have the read function,
we would be stuck working with the JSON data as a String instead of an Object.
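
Like write, the read function accepts an optional third parameter, in this case containing Reader properties formatted as an Object. Here is a minimal sketch that reuses the CSV Reader properties from the previous section; the variable raw and its contents are hypothetical and exist only for illustration:

```dataweave
%dw 2.0
output application/dw

// Hypothetical pipe-separated text with no header row
var raw = "Josh|29|Denver\nGina|22|Chicago"
---
// The third argument plays the same role as the properties
// on an input directive
read(raw, "application/csv", {header: false, separator: "|"})
```

Given how the Reader behaved in the CSV examples, this should produce an Array of two Objects keyed by column_0, column_1, and column_2.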

# DataWeave Syntax
TODO: Intro

● How to use selectors to query data
● How to use variables, functions and “do” expressions to declare reusable constructs in
your DataWeave code
● How to execute conditional logic and pattern matching in DataWeave, and how it differs
from imperative languages like Java

## Querying Data with Selectors


Once data is read into the DataWeave script through the Reader, the data can be queried using
selectors. Selectors are a syntax feature that allows the developer to traverse through values like
Arrays and Objects to extract sections of data from a larger payload. While there are over 10
selectors in DataWeave 2.0, this part of the book will only cover the ones you will use most
frequently:

● Single-value selector
● Multi-value selector
● Descendants selector
● Index selector
● Range selector

Generally, selectors can be grouped into 2 major categories: selectors that return values
associated with Object keys, and selectors that return values associated with Array and String
indices.

| Selectors for Object values | Selectors for Array and String values |
| --- | --- |
| Single-value selector | Index selector |
| Multi-value selector | Range selector |
| Descendants selector | |

### Single-Value Selector

The single-value selector is the most commonly used selector. It is typically used to query keys
on Objects, returning the value associated with the given key. For example:

```
%dw 2.0
output application/dw

var obj = {
name: "Alexis",
age: 42
}
---
obj.name
```

This would return "Alexis". While the single-value selector is mostly used on Objects, it can
be used on an Array of Objects as well:

```
%dw 2.0
output application/dw

var arr = [
{
name: "Alexis",
age: 42
},
{
name: "Raj",
age: 35
},
{
name: "Josh",
detail: {age: 29}
}
]
---
arr.age
```

This script would return [42,35]. When the single-value selector is used on an Array of
Objects, it loops through each Object in the Array and uses the selector to query the specified
key on that Object (in the example above that key was age). If the specified key is associated
with a value on the top level of the Object, it will be added to the returned Array. The selector
will not traverse through nested Objects, which is why the script would not return
[42,35,29].

(!) I generally do not recommend using the single-value selector on Arrays to return multiple
values. As the name implies, the single-value selector should be used to return a single value. If
you want to return multiple values like in the example above, you should use the multi-value
selector so that the intentions of your code are more clear to other developers.

Like all selectors, you can chain together single-value selectors to traverse through a data
structure and return nested data:

```
%dw 2.0
output application/dw

var data = {
name: "Josh",
detail: {age: 29}
}
---
data.detail.age
```

This script would return 29.

An alternative (but equally important) syntax for the single-value selector is shown below:

```
%dw 2.0
output application/dw

var obj = {
name: "Alexis",
age: 42
}
---
obj["name"]
```

This syntax is particularly important when you need to use the single-value selector but the key
you need to provide is dynamic and not known until the application is running. For example, the
key may be in a variable:

```
%dw 2.0
output application/dw

var obj = {
name: "Alexis",
age: 42
}

var key = "name"


---
obj[key]
```

If the single-value selector is called with a key that the Object does not contain, it will return
null. This is also true when the single-value selector is called on an Array of Objects (not
recommended).
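
Because a missing key yields null rather than an error, it is often convenient to pair the selector with a fallback value. The sketch below assumes the same obj as above and uses DataWeave’s default operator, which this section has not otherwise covered; the email key is hypothetical and deliberately absent from the Object:

```dataweave
%dw 2.0
output application/dw

var obj = {
  name: "Alexis",
  age: 42
}
---
{
  // obj has no "email" key, so the selector returns null
  missing: obj.email,
  // default substitutes a value when the left side is null
  withDefault: obj.email default "unknown"
}
```

Here missing evaluates to null and withDefault evaluates to "unknown".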

### Multi-Value Selector

DataWeave’s multi-value selector is typically used on Arrays containing Objects. It is used to
retrieve values associated with the same key name for Objects in an Array. For example:

```
%dw 2.0
output application/dw

var arr = [
{
name: "Alexis",
age: 42
},
{
name: "Raj",
age: 35
},
{
name: "Josh",
detail: {age: 29}
}
]
---
arr.*age
```

This script returns [42,35] just like the single-value selector did. Just like the single-value
selector, the multi-value selector only queries keys on the top level of the Object; it does not
traverse additional levels of nesting searching for the key. To do that, you’ll need the next
selector.

The multi-value selector can also be used on Objects to return duplicate keys, which I’ll cover in
the section “Detour: Handling Duplicate Object Keys”.

Like the single-value selector, the multi-value selector will return null when the key provided
does not match with any keys it searches.

### Descendants Selector

While the single-value and multi-value selectors can only query the top level of an Object, the
descendants selector is used to retrieve values associated with an Object key at any level of
nesting. For example:

```
%dw 2.0
output application/dw

var arr = [
{ // 1
name: "Alexis",
age: 42
},
{ // 2
name: "Raj",
age: 35
},
{ // 3
name: "Josh",
detail: {age: 29}
}
]
---
arr..age
```

This script would return [42,35,29]. The descendants selector not only retrieved the 42 and 35
values associated with the first two Objects, it also traversed the detail key in the third Object
and retrieved the 29 associated with its age key. If you’re dealing with Objects that do not
contain duplicate keys, you can expect the descendants selector to return all values associated
with a particular key in any nested data structure.

Like the single-value and multi-value selectors, the descendants selector will return null if it
does not find any instances of the provided key.
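
The descendants selector is not limited to Arrays of Objects. As a minimal sketch, assuming a hypothetical order Object with several levels of nesting, the selector can be applied to the Object directly:

```dataweave
%dw 2.0
output application/dw

// Hypothetical data: "id" appears at two different nesting levels
var order = {
  orderNumber: 1,
  customer: {
    id: 7,
    address: {
      id: 99
    }
  }
}
---
order..id
```

This should return [7,99], since the selector visits both the customer Object and the address Object nested inside it.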

### Detour: Handling Duplicate Object Keys

The single-value, multi-value, and descendants selectors all handle duplicate Object keys in a
different way. This section will explore the differences.

The single-value selector will always return the value associated with the first occurrence of a
key. If there are duplicate keys in the Object, they will be ignored:

```
%dw 2.0
output application/json

var data = {
message: "Hello world!", // First occurrence
message: "Goodbye space!" // Last occurrence
}
---
data.message
```

This script will return "Hello world!", ignoring the second instance of the message key in
the Object.

The multi-value selector will return all matches, even on duplicate keys so long as they are on
the top level of the Object:

```
%dw 2.0
output application/json

var data = {
message: "Hello world!",
message: "Goodbye space!",
moreData: {message: "TEST"}
}
---
data.*message
```

This script will return ["Hello world!","Goodbye space!"]. The message "TEST" is not in
the output because its key did not reside on the root level of the Object being queried by the
multi-value selector.

These same rules apply when using the multi-value selector on an Array of Objects:

```
%dw 2.0
output application/dw

var data = [
{
message: "Hello world!",
message: "Goodbye space!",
moreData: {message: "TEST"}
},
{
message: "Hello space!",
message: "Goodbye world!"
}
]
---
data.*message
```

This will return ["Hello world!","Goodbye space!","Hello space!","Goodbye world!"].

When using the descendants selector on Objects with duplicate keys, the selector will only return
the first occurrence of the key, but it will traverse all occurrences of the key in search of
additional matches. For example:

```
%dw 2.0
output application/dw

var data = [
{
key: "hello",
key: [
{key: 1,key:2}
]
},
{
key: [
{key: 3},
{key: 4}
],
key: "goodbye"
}
]
---
data..key
```

This will return the following. I’ve added comments to further explain:

```
[
// first occurrence of key
"hello",

// skipped the second (duplicate) occurrence of key
1,
// notice 2 is not returned because it is a duplicate key

// found in the second Object of the Array
[
{
key: 3
},
{
key: 4
}
],

// also returns the keys contained in previous matches
3,
4
// "goodbye" is not returned because it is a duplicate key
]
```

At this point you may be wondering if there’s a way to combine the functionality of the
multi-value and descendants selectors to retrieve all the values associated with a key, regardless
of their level of nesting and whether the key is a duplicate of a previous key. There is! In this
case you must combine the descendants selector with the multi-value selector:

```
%dw 2.0
output application/dw

var data = [
{
key: "hello",
key: [
{key: 1,key:2}
]
},
{
key: [
{key: 3},
{key: 4}
],
key: "goodbye"
}
]
---
data..*key
```

This script will return the following (comments added for clarity):

```
[
"hello",
[
{
key: 1,
key: 2
}
],
1,
2, // Duplicate key values are returned
[
{
key: 3
},
{
key: 4
}
],
"goodbye", // Duplicate key values are returned
3,
4
]

```

### Recap

You just learned about 3 different ways to query your data in DataWeave using selectors. The
single-value, multi-value, and descendants selectors all work by querying keys on Objects and
returning the associated values. While these selectors return values from Objects, all of them
can be called directly on Arrays as well. Using the single-value selector directly on Arrays is
not recommended; use the multi-value selector instead. The multi-value selector can also be
used directly on Objects when you need to return all values associated with a duplicate key.
The descendants selector is great for querying keys that may exist at multiple levels of nesting
in an Array or Object, but it will not return values associated with duplicate keys. Finally, we
learned that we can combine the multi-value selector and the descendants selector to work
around this limitation. The combined selector returns all values associated with a particular
key regardless of their level of nesting and whether the value is associated with a duplicate key.

### Index Selector

The index selector is used to retrieve a particular item in an Array by its index. The index of an
Array item is determined by its position in the Array relative to the beginning or end of the
Array. DataWeave indexes Arrays starting with the number 0, so the first item of an Array is at
index 0, the second item at index 1, and so on. Here’s an example:

```
%dw 2.0
output application/dw

var arr = [1,2,3,4,5]


---
arr[1]
```

This script will return the number 2 since the number 2 is located at Array index 1. DataWeave
supports reverse indexes, meaning that you can provide a negative number as the index and
DataWeave will locate the element from the end of the Array instead of the beginning. For
example, the last item in an Array is at index -1, the second-to-last item is at index -2, and so
on. Here’s an example:

```
%dw 2.0
output application/dw

var arr = [1,2,3,4,5]


---
arr[-3]
```

This script would return the number 3, since 3 is the third-to-last item in the Array.

The index selector will return null if you provide it with an index that is not applicable to the
Array. For example, if you use the index 100 on an Array with only 5 items, the index selector
will return null.

One other cool thing to note about the index selector is that it works on Strings as well by
treating Strings as Arrays of characters. For example:

```
%dw 2.0
output application/dw

var s = "Hello!"
---
s[1]
```

This script would return "e". All of the rules for the index selector on Arrays also apply to
Strings.
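
As a quick illustration of those shared rules, here is a minimal sketch that applies both a forward index and a reverse index to the same String used above:

```dataweave
%dw 2.0
output application/dw

var s = "Hello!"
---
{
  first: s[0],  // counted from the start of the String
  last: s[-1]   // counted from the end of the String
}
```

Here first evaluates to "H" and last evaluates to "!".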

### Detour: Additional Considerations

You may have noticed that the index selector and the alternative dynamic syntax for the
single-value selector are the same: value[key/index]. There are a few consequences to this
design choice that you must keep in mind as you make use of these features. For example, what
should DataWeave return for the following:

```
%dw 2.0
output application/dw

var arr = [1,2,3,4,5]


---
arr["1"]
```

There are a couple of valid options:

1. Recognize the operation is taking place on an Array. Coerce the String to a Number, and
return 2.
2. Return null, since "1" is not a valid Array index.

DataWeave chooses the stricter option, 2, and will return null in this circumstance. In other
words: when encountering the value[key/index] syntax during execution, DataWeave
chooses whether to treat the operation as a single-value selection or index selection based on the
data type of the value in the square brackets. If the value supplied is of type String, DataWeave
treats the operation as single-value selection. If the value supplied is of type Number, DataWeave
treats the operation as index selection.

(!) Because of the flexibility of the value[key/index] syntax, it’s easy to write code that
may be ambiguous to the reader. For example, in the code value[lookup] it’s not obvious
whether the variable lookup will be of type String or Number. In these cases it may be beneficial
to cast the key/index value appropriately to make the intention of the code more obvious to the
reader: for example, value[lookup as String] when using the syntax as a single-value selector,
and value[lookup as Number] when using the syntax as an index selector.
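
A minimal sketch of that advice follows. The variable names arr, obj, and lookup are hypothetical; the point is that the same lookup value is read as an Object key in one case and as an Array index in the other, and the cast makes each intention explicit:

```dataweave
%dw 2.0
output application/dw

var arr = [10, 20, 30]
var obj = {"1": "one"}
var lookup = "1"
---
{
  // Treated as single-value selection on the Object
  asKey: obj[lookup as String],
  // Coerced to the Number 1, so treated as index selection
  asIndex: arr[lookup as Number]
}
```

Here asKey evaluates to "one" and asIndex evaluates to 20.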

Another situation to consider is that DataWeave’s Number type supports both whole numbers
and decimal values. Therefore, it’s perfectly valid to supply the index selector with an index like
0.193. How does DataWeave handle this case? It truncates the decimal, leaving a whole
number. 0.193 would be treated as index 0, 1.98 would be treated as index 1, and so on.
Here’s an example:

```
%dw 2.0
output application/dw

var arr = [1,2,3,4,5]


---
arr[0.123]
```

This script would return 1, because 0.123 would be truncated to 0.

### Range Selector

The range selector is used when you want to extract a contiguous section of an existing Array.
You can use the range selector to retrieve all values between the indexes 1 and 5, or all values
between the indexes of -2 and -4, or any other valid range of indexes. Here’s an example:

```
%dw 2.0
output application/dw

var arr = [1,2,3,4,5]


---
arr[1 to 3]
```

This script would return [2,3,4]. Note that the items associated with both of the indexes
provided to the range selector make it to the output; the range selector treats the indexes as
inclusive.

Just like with the index selector, you can use negative numbers to query for ranges from the end
of an Array:

```
%dw 2.0
output application/dw

var arr = [1,2,3,4,5]


---
arr[-2 to -4]
```

This script would return [4,3,2]. While both of the above scripts returned the same values, the
order of the output values was different. This is because the range selector selects items from
the first index provided to the second index. Because of this functionality, the range selector is
not only good for extracting subsets of existing Arrays, but it is also useful for creating Arrays
that are a reverse of the provided Array:

```
%dw 2.0
output application/dw

var arr = [1,2,3,4,5]


---
arr[-1 to 0]
```

This would return [5,4,3,2,1].
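
Positive and negative indexes can also be mixed in a single range, which is handy when you do not know the length of the Array ahead of time. The text above does not show this combination, so treat the following as a minimal sketch:

```dataweave
%dw 2.0
output application/dw

var arr = [1,2,3,4,5]
---
{
  // From the second item through the last item
  withoutFirst: arr[1 to -1],
  // From the first item through the second-to-last item
  withoutLast: arr[0 to -2]
}
```

Here withoutFirst evaluates to [2,3,4,5] and withoutLast evaluates to [1,2,3,4].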

The range selector will return null in all situations where one of the indexes provided is not
applicable to the Array. For example, if you try to retrieve all items in a 5-item Array with the
following code:

```
%dw 2.0
output application/dw

var arr = [1,2,3,4,5]


---
arr[0 to 100]
```

DataWeave would return null since 100 is not a valid index for the provided Array, even though
0 is.

Just like the index selector, the range selector works on Strings by treating them as an Array of
single characters:

```
%dw 2.0
output application/dw

var s = "Hello!"
---
s[1 to 4]
```

This script would return "ello".

## Variables
This section is the formal introduction to variables in DataWeave. You’ve already seen them
used to associate a value to a name that can be used later; this is the primary purpose of having
variables in any programming language. Here’s the basic syntax for variables in DataWeave:

```
var <name> = <value>
```

Where name can be any string of characters as long as:

1. The first character is an alpha (meaning only letters from the alphabet), and all remaining
characters are alphanumeric or an underscore, and
2. The name is not a reserved keyword like var or output

And value can be:

1. A literal value of any valid data type in DataWeave (e.g. "hello", 1, {}, []). This includes
functions, which will be covered in the next section.
2. The name of another variable or function

Variables can be declared in any part of the script where declarations can be made. This includes
the script header, and as we’ll see later, do expressions.

Once the name is associated with the value via the var statement, you may access that value
anywhere in the script by simply referencing that name. However, unlike in many other
languages, once that value is associated with the name, you can no longer associate that name
with another value. For example, the following DataWeave script would not run because the
variable x is declared twice:

```
%dw 2.0
output application/dw

var x = 1
var x = 2
---
x
```

These restrictions make DataWeave variables behave more like constants.

If variables cannot be reassigned, what are they good for? Well, they’re great for storing
intermediate calculations, or giving a useful name to the output of a particularly complex
operation. We’ll discuss variables further in the section on do expressions, and cover variable
scope as well.
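
As a minimal sketch of using variables for intermediate calculations, the script below names each step of a small computation. The names prices, subtotal, and tax and the 10% rate are made up for illustration; sum is a function from DataWeave’s core library:

```dataweave
%dw 2.0
output application/dw

// Hypothetical input values
var prices = [10, 20, 30]

// Each variable names one intermediate result
var subtotal = sum(prices)
var tax = subtotal * 0.1
---
{
  subtotal: subtotal,
  tax: tax,
  total: subtotal + tax
}
```

None of these names is ever reassigned; each one simply labels a value so the body of the script reads more clearly.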

### Advanced: Type Information

Variables in DataWeave can also carry type information that restricts the type of value the
variable can hold. To restrict the type of the variable, add a colon after the variable name and
specify the type after the colon. Here’s a quick example:

```
var s: String = "Hello, world!"
var n: Number = 1.2
```

DataWeave will guarantee that your variables are only able to be associated with values that
match the type of the variable at both design time and run time:

```
var x_1: Boolean = 1 // ERROR
var x_2: String = 1 + 1 // ERROR
```

## Functions
Functions are DataWeave’s most powerful tool for abstraction. In programming, abstraction is
the ability to hide the implementation details of a certain computation behind an easy-to-use
interface. For example, given the length of a right-triangle’s two short sides, a developer could
use the Pythagorean Theorem to calculate the hypotenuse like this:

```
%dw 2.0
output application/dw

var short1 = 2
var short2 = 3
---
sqrt(pow(short1,2) + pow(short2,2))
```

However, this isn’t ideal because:

1. As a developer looking at this code for the first time, it’s not very obvious what the
original developer was trying to calculate. For example, if I didn’t know the Pythagorean
Theorem, I wouldn’t even know what to look up to verify this code is correct!
2. If a developer did understand what the original developer was trying to calculate, they’re
immediately exposed to the implementation details of how to calculate the hypotenuse.
3. A developer cannot take the formula and reuse it elsewhere in the script without
copy-pasting it.

If this functionality is abstracted away into a function, we resolve all of the above problems
while still solving for the original requirement:

```
%dw 2.0
output application/dw

fun hypotenuse(n, m) =
sqrt(pow(n,2) + pow(m,2))
---
{
twoAndTwo: hypotenuse(2, 2),
threeAndThree: hypotenuse(3, 3)
}
```

Given this example, it’s clear that functions provide developers the ability to abstract, name, and
reuse functionality.

If you’re coming from a language like Java, none of this should sound surprising to you.
Functions are a lot like methods in the sense that they abstract functionality, but that’s where the
similarities stop. DataWeave functions enjoy much more privilege than Java’s methods. What do
I mean by this? DataWeave treats functions like normal values, just as it does Strings, Numbers,
etc. It’s a very simple concept, but the implications are profound: DataWeave functions can be
stored and reused as variables, passed to other functions as parameters, and returned from
functions as well. Languages supporting this ability often fall under a programming paradigm
called Functional Programming, which is completely different from the Object-Oriented
Programming paradigm that Java lives under. The distinction exists because the thought process
(i.e. paradigm) and design patterns used to develop solutions in Object-Oriented languages are
completely different from those used in Functional languages. We will delve much further into
this topic in the section on Functional Programming.
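
To make “functions as values” concrete, here is a minimal sketch showing a function being stored in a variable, passed as a parameter, and returned from another function. All of the names (addOne, applyTwice, makeAdder, stored, addTen) are hypothetical, and makeAdder uses the lambda syntax introduced in the Lambdas section:

```dataweave
%dw 2.0
output application/dw

fun addOne(n) = n + 1

// Receives a function f as a parameter and calls it twice
fun applyTwice(f, x) = f(f(x))

// Returns a new function that adds k to its argument
fun makeAdder(k) = (n) -> n + k

// A function stored under another name
var stored = addOne
var addTen = makeAdder(10)
---
{
  stored: stored(1),
  passed: applyTwice(addOne, 1),
  returned: addTen(1)
}
```

Here stored evaluates to 2, passed evaluates to 3, and returned evaluates to 11.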

Because functions are so central to DataWeave’s design, the language supports convenient ways
to declare and call functions. We’ll be exploring these methods in this section.

### Declaring Functions

Functions are typically declared with the fun keyword which is valid in the header of
DataWeave scripts and the header of do expressions (more on do expressions later). Here’s an
outline:

```
fun <name>(<param_1>, <param_2>, ..., <param_n>) =
<body>
```

Function names and parameter names share the same limitations as variables do. They must start
with an alpha, and every character following must be an alphanumeric or an underscore.

Let’s declare a simple function that adds two numbers together:

```
%dw 2.0
output application/dw

fun add(n, m) =
n + m
---
add(1, 1)
```

The above script would, as expected, return 2. The above script also demonstrates how to call
functions using open and closed parentheses surrounding a comma-separated list of arguments.
This is called prefix notation because the name of the function prefixes the list of arguments.
However, this isn’t the only way to call functions in DataWeave. Functions that receive exactly
two arguments can also be called using infix notation, meaning the name of the function may
appear between the arguments, with no need for parentheses:

```
1 add 1
```

We’ll cover why you might want to call 2-argument functions using infix notation shortly.
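
For reference, here is a minimal complete script that calls the same add function both ways, so the two notations can be compared side by side:

```dataweave
%dw 2.0
output application/dw

fun add(n, m) =
  n + m
---
{
  prefix: add(1, 1), // function name before the arguments
  infix: 1 add 1     // function name between the arguments
}
```

Both prefix and infix evaluate to 2.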

If you’re familiar with popular languages like Java or Python, you’ll probably find it odd that
DataWeave functions don’t require a return keyword. In fact, DataWeave doesn’t even have a
return keyword. Why is that? We could go into a fairly detailed discussion on why this design
choice was probably made, but for now it suffices to say that DataWeave does not use a return
keyword because the expression in the function body must evaluate to a single value, even if
that single value is an Array containing other values. Once that single value is reached, it is
returned. In other words, it is inevitable that functions in DataWeave return values. In the
example above, the parameters in n + m were replaced with the arguments to form the
expression 1 + 1, which evaluates to 2. Since no other expressions needed to be evaluated,
the function returned 2.

It’s possible to declare a function with no parameters as well:

```
%dw 2.0
output application/dw

fun hello() =
"Hello"
---
hello()
```

The above script would return "Hello", but it uses poor style. Most functions that do not
receive arguments merely return the same value over and over, just like variables. If you need
this functionality you’re often better off using variables instead.

### Lambdas

DataWeave allows you to create functions without names as well. These functions are generally
referred to using any of the following names:

● Function literals
● Anonymous functions
● Lambdas

“Function literal” is my favorite, but this text will use the term “lambda” because it is used most
widely in the DataWeave documentation. Recall from the section on types that most types have a
notation that allows you to create a fixed value on the spot, called literal notation. Here are some
examples of literals:

```
var stringLiteral = "String"
var numberLiteral = 5
var dateLiteral = |2020-01-01|
var booleanLiteral = true
var arrayLiteral = []
var objectLiteral = {}
```

If you’re familiar with object-oriented programming languages like Java, you can think of these
literals as special constructor syntax. They allow you to create commonly-used types like Strings
and ints on the fly. DataWeave has literal syntax for functions as well:

```
var functionLiteral = (n,m) -> n + m
---
functionLiteral(1,1)
```

Let’s break this down. In the above example we’re declaring a variable called
functionLiteral, and setting it to a function. The function defines the two arguments it
receives and then, following a -> defines the body of the function which returns the sum of its
two arguments. Everything on the right side of the equals sign is a function literal, which we’ll
refer to from this point forward as a lambda:

var functionLiteral = (n,m) -> n + m

Once a lambda is assigned to a variable like in the above script, it functions no differently than a
function that was declared with the fun keyword. The only difference is that this syntax is more
confusing! The value of lambdas becomes apparent when you see that one of DataWeave’s most
common patterns for transforming Arrays is to use a function that takes two arguments: an Array
and a lambda. Yes, in DataWeave it is commonplace to pass functions to other functions. We will
cover this concept in great depth in the section on functional programming. For now, here’s an
example of the map function, which passes each of the items in the input Array to the provided
lambda to create the returned Array:

```
[1,2,3] map (n) -> n + 1
```

The above expression returns [2,3,4]. The map function is the tool of choice when you need to
change every value in an Array using the same function.
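To make the "two arguments: an Array and a lambda" idea a little more concrete, here is a small sketch that calls map in both infix and prefix notation. The variable names infixResult and prefixResult are made up for this illustration:

```dataweave
%dw 2.0
output application/dw

// Infix notation: the Array sits to the left of map, the lambda to the right.
var infixResult = [1,2,3] map ((n) -> n + 1)

// Prefix notation: the same call written like any other two-argument function.
var prefixResult = map([1,2,3], (n) -> n + 1)
---
{
  infix: infixResult,
  prefix: prefixResult
}
```

Both fields should hold [2,3,4]. Infix notation is what you will see most often, but the prefix form makes it easier to see that the lambda is just the second argument.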

Lambdas are nice because they relieve the developer of needing to come up with a name for a
function every time they simply need to pass it to another function. Imagine if you needed to
come up with a variable name for a Number before you could pass it to a function. We have
lambdas for the same reason.

You may ask yourself if the name n used in the (n) -> n + 1 lambda has any significance. It
does not. You may choose any name for your lambda parameters, provided they follow the same
rules used for naming variables and functions.

You may also be asking yourself how map knows to pass each of its values to the function, or
how you would know it’s ok to pass map a function that receives a single argument. Why not a
function that receives 2 or 3 arguments? It is the map function’s responsibility to pass values to
the function you provide. The map function also defines how many parameters the input function
should take in, as well as what it can return. You might be surprised to hear that the map function
passes two parameters to the lambda: a value from the input Array, and the index of that value.
The second parameter is optional and can be left out, but if you only need the second parameter,
you have no choice but to define the first:

```
["one","two","three"] map (v,idx) -> idx
```

This would return [0,1,2]. When you’re trying to determine how many parameters a lambda
provided to a function like map would take, it’s always a good idea to refer to the
documentation: https://docs.mulesoft.com/mule-runtime/4.3/dw-core-functions-map.
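Here is a short sketch that uses both parameters at once. The field names position and word are invented for the example, as are the parameter names value and index:

```dataweave
%dw 2.0
output application/dw
---
// map hands the lambda each item and then that item's index.
["one","two","three"] map ((value, index) -> {
  position: index,
  word: value
})
```

Each item in the returned Array should be an Object pairing a word with its zero-based position, starting with position 0 and word "one".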

### Dollar Syntax

DataWeave has lambdas to avoid burdening the developer with creating countless function names.
This is especially convenient when passing functions to other functions using lambdas. However,
when you use lambdas you still need to create parameter names. DataWeave’s dollar syntax
allows the developer to create a lambda with even less syntax and mental overhead, without the
need to name parameters. We’ll use the same example as above, using map to add 1 to each
number in the input Array. However, in this example we’ll use dollar syntax:

```
[1,2,3] map $ + 1
```

This also returns [2,3,4]. A single dollar sign refers to the first parameter passed to the
lambda. You can also use a double dollar sign to refer to the second parameter:

```
[1,2,3] map $$
```

The second parameter passed to the lambda is the index of the first parameter, so this would
return [0,1,2]. You can also use a triple dollar sign to refer to the third parameter where
applicable (mostly when dealing with Objects, not Arrays).
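As a sketch of where the third parameter shows up, the mapObject function passes the value, the key, and the index to its lambda, in that order. The prices Object and the variable names below are made up for illustration:

```dataweave
%dw 2.0
output application/dw

var prices = {apple: 1, banana: 2}

// Named parameters: value, key, and index, in that order.
var named = prices mapObject ((value, key, index) -> {(key): index})

// The same lambda in dollar syntax: $ is the value, $$ the key, $$$ the index.
var dollar = prices mapObject {($$): $$$}
---
{
  named: named,
  dollar: dollar
}
```

Both fields should evaluate to {apple: 0, banana: 1}. Notice how much more the named version tells you about what each parameter is.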

This convenience does not come without a cost, however. I generally only recommend using the
dollar syntax when you’re doing something simple with the lambda. It’s almost always a better
long-term choice to create thoughtful names for your lambda parameters so that developers
reading your code can more easily understand your intentions.

### Advanced: Type Information

Like variables, functions can also contain type information. With functions, you can define the
type of all input parameters and the type of the output. Here’s an example:

```
fun add(n: Number, m: Number): Number =
n + m
```

Adding type information to your functions can be helpful in scenarios where you want your
function to make certain guarantees about the types of its input parameters and output. When you
specify type information, some of it is enforced at design-time (before the code is running) and
some of it is enforced at run-time (when the code is running). For example, the following
function definition would give you a design-time error because the function is guaranteeing a
Number will be returned, but it can only possibly return a String:

```
fun number(): Number =
"hello"
```

DataWeave’s type system is relatively dynamic when compared to languages like Java, and it
allows the implicit coercion of Number to String in situations like those above. While type
coercion can be convenient, it also creates ambiguity. When the author wrote the following
function, did they intend it to be used with Strings, or Numbers?

```
fun add(n, m) =
n + m
```

The more code I read, the more I tend to steer away from this kind of ambiguity; I want the
person reading my code to have a clear understanding of what I originally wanted the code to do.
Because of this aversion to ambiguity, I often specify type information when creating my
functions. You should give some consideration to doing the same!

As an aside, type information isn’t an all-or-nothing deal. You can specify the type of some of
the input parameters and not others, you can specify the type of the output and not any of the
input, etc.
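A minimal sketch of partial type information follows. The function double and the field names are hypothetical and exist only for this example:

```dataweave
%dw 2.0
output application/dw

// Only the first parameter is typed; m and the return type are left open.
fun add(n: Number, m) =
  n + m

// Only the return type is declared; the parameter is untyped.
fun double(n): Number =
  n * 2
---
{
  total: add(1, 2),
  doubled: double(4)
}
```

This should evaluate to an Object with total set to 3 and doubled set to 8. Typing only some of the pieces can be a reasonable middle ground when you want a guarantee about one part of a function without committing to the rest.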

## Do Expressions
This section covers do expressions. We’ll start by explaining the core concepts that make do
expressions useful: scoping and name resolution. Once this is established, we’ll discuss what do
expressions are, and useful ways to use them.

### Scope & Name Resolution


All popular programming languages have a concept of scope, which determines where or when a
particular name is accessible to other parts of the program. Scopes are ultimately used by
programmers to leverage name resolution, which is the process of associating a name in a
program with some value or functionality. The name resolution rules of a language are important
in assisting the programmer in reasoning about the code. This is accomplished by effectively
reducing the number of possible ways a name could be resolved to a value.

Let’s look at a simple script to start:

```
%dw 2.0
output application/dw

var x = 1

fun addOne(n: Number): Number =
  n + 1
---
addOne(x)
```

Instead of concerning ourselves with the output of this script, let’s concentrate on when it’s valid
to reference a variable or function name. If we define any variable, function, or namespace in the
header of the script, it is available everywhere in that script. The variable x and the function
addOne are both used in the body without problem. Let’s change the script a little:

```
%dw 2.0
output application/dw

var x = 1

fun add(n: Number): Number =
  n + x
---
add(5)
```

In this scenario, the function add is referencing a variable, x, that was declared outside of the
function body. Since x was declared in an outer scope relative to the add function, DataWeave
can successfully resolve the name x. What is an outer scope? Simply, an outer scope is a scope
that encapsulates the current scope. For DataWeave, I like to think of scopes as a series of nested
circles:

[Figure: two nested circles. The outer circle is the “script” scope; the inner circle is the “add
function body” scope.]

In this case, the “script” scope is an outer scope relative to the “add function body” scope. When
you think of scopes like this, name resolution rules for DataWeave are extremely simple to
explain. There are two main rules:

1. Relative to the current scope, DataWeave may use outer scopes to resolve names
2. Relative to the current scope, DataWeave may not use inner scopes to resolve names

In reference to our script above, this means that the “script” scope cannot resolve the name n
using the “add function body” scope, but the function can resolve the x name using the outer
“script” scope.

Let’s complicate things a bit:

```
%dw 2.0
output application/dw

var n = 2

fun addOne(n: Number): Number =
  n + 1
---
addOne(1)
```

Should this script return 2, or 3? The real question is whether the name n in the addOne body is
resolved to the variable or the input parameter. The answer is that n in the function body refers to
the input parameter, not the variable. In computer programming this is called variable
shadowing. DataWeave has the following name resolution rule for this case: if there exist
duplicate names in the current scope and the outer scopes, the name available in the current
scope is used, and the others are ignored.

Imagine you are the computer responsible for executing this program (exciting, right?). When the
program starts, the header is evaluated first, setting the variable n to 2 and creating the addOne
function. Next, the body of the script is evaluated, so the function addOne is called with the
value 1. Within the addOne function body, n can reference two possible values: 2 from the
variable and 1 from the input parameter. In the current scope n is 1, so it is used. The program
would need to reach outside of the current scope to access the other n, so it is ignored.

This completes our name resolution rules for DataWeave:

1. Relative to the current scope, DataWeave may use outer scopes to resolve names
2. Relative to the current scope, DataWeave may not use relative inner scopes to resolve
names
3. If a duplicate name exists in the current scope and a relative outer scope, the name
resolves to the value declared in the current scope.
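Here is a small sketch that shows rule 3 from both sides. It reuses the addOne function from above, but sets the script-level n to 10 so the two values are easy to tell apart. The field names are invented for the example:

```dataweave
%dw 2.0
output application/dw

var n = 10

fun addOne(n: Number): Number =
  n + 1
---
{
  // Inside addOne, n is the parameter (1), which shadows the variable.
  insideFunction: addOne(1),
  // At the script level there is only one n: the variable.
  atScriptLevel: n
}
```

This should evaluate to an Object with insideFunction set to 2 and atScriptLevel set to 10. The variable is never changed by the function; it is simply hidden while the function body is being evaluated.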

### Using do Expressions

We’ve seen that a DataWeave script has its own script-level scope, and that scopes can be created
by functions. There is a more flexible way to create a scope in DataWeave: do expressions. Let’s
look at an example:

```
%dw 2.0
output application/dw

var x = 1
---
do {
  var y = 1

  fun add(n: Number): Number =
    n + y
  ---
  add(x)
}
```

The above script would return 2. From this script we can deduce a few things about do
expressions:

1. They look like adorable mini DataWeave scripts
2. The expression itself is surrounded by curly braces
3. The braces have nothing to do with Object literals
4. do expressions can be used to declare variables and functions (namespaces too)
5. They evaluate to a single return value

This is in addition to what we discussed earlier about scopes and name resolution.

Now that we know what do expressions are and how to create them, let’s discuss how they’re
useful. In my experience, do expressions are great for a few things:

● Avoiding repeated calculations - save the result of the calculation as a variable in a do
expression
● Refactoring - use do expressions to give helpful variable and function names and enhance
readability of your code
● “Private” functions and variables - you can use DataWeave’s name resolution rules and
do expressions to create private functions and variables that are not available outside the
scope. This helps keep the global namespace clean.
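The first and third points can be sketched together. The summarize function below is hypothetical; it uses the sum and sizeOf functions from DataWeave's core module:

```dataweave
%dw 2.0
output application/dw

fun summarize(numbers: Array<Number>) = do {
  // total is computed once and reused below. It is "private": nothing
  // outside this do expression can reference it.
  var total = sum(numbers)
  ---
  {
    total: total,
    average: total / sizeOf(numbers)
  }
}
---
summarize([1, 2, 3])
```

This should evaluate to an Object with total set to 6 and average set to 2. Without the do expression, sum(numbers) would need to be written (and evaluated) twice, or total would need to live in the script-level scope where every other function could see it.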

## Conditional Logic
Conditional logic in DataWeave is typically expressed in one of three ways:

1. if/else expressions
2. Pattern matching
3. The default keyword

Most developers are familiar with if/else, but many are not familiar with pattern matching. This
section will cover both and when it’s best to make use of one over the other.

### If/else Expressions


If/else expressions are used when you need to return some value when a condition is met, and
return a different value when the condition is not met. Here’s an example:

```
%dw 2.0
output application/dw

var condition = true

---
if (condition)
"Hello"
else
"Goodbye"
```

The parentheses after the if keyword are not optional, and the expression within the parentheses
must evaluate to a Boolean value.
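Because if/else is an expression, the else branch can itself be another if/else, which is how you check more than one condition. A minimal sketch, with a made-up score variable and made-up grade boundaries:

```dataweave
%dw 2.0
output application/dw

var score = 75
---
// Conditions are checked top to bottom; the first one that is true wins.
if (score >= 90)
  "A"
else if (score >= 70)
  "B"
else
  "C"
```

With score set to 75, the first condition is false and the second is true, so this evaluates to "B".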

### Pattern Matching


<Text>

# Functional Programming
<Text>

## What is Functional Programming?


<Text>

## State and Immutable Data


<Text>

## Expressions vs Statements
<Text>

## First Class Functions


(passing functions as arguments)

# Standard Library
<Text>

## Arrays
++, +, map, filter, reduce, groupBy, distinctBy

## Objects
++, -, mapObject, filterObject, pluck

## Strings
++, “$()”, contains, replace, splitBy, joinBy, read, write

## Dates
<Text>

## Numbers
<Text>

# Future
