Achieving Predictable Multicore Execution of Automotive Applications Using The LET Paradigm
Abstract—Next generation automotive applications require support for safe, predictable, and deterministic execution. The Logical Execution Time (LET) model has been introduced to improve the predictability and correctness of time-critical applications. The advent of multicore architectures, together with the need to ensure time predictability despite the complex memory hierarchy and the hardware resources shared by the cores, is an additional motivation for the use of the LET paradigm in conjunction with a suitable scheduling and memory access model. In this paper, we show how an implementation of the LET model on actual multicore platforms for automotive systems brings the potential to improve time determinism at the price of a modest run-time overhead. Multiple implementation options are discussed using the automotive AUTOSAR model and operating system standard, and a realistic application defined by Bosch for the 2017 WATERS challenge. Experimental data of executions on the Infineon Aurix platform show the feasibility of the proposed approach. The paper also provides a discussion on further implementation optimizations and other issues related to the general problem of memory-aware analysis of automotive applications on multicores.

I. INTRODUCTION

The introduction of safety-critical functions in automotive systems, together with the advent of multicore platforms, brings the need to rethink the development and execution paradigms for embedded functionality. Developers need high levels of predictability, testability, and ultimately determinism in the execution of their code. The LET model was introduced as part of the GIOTTO framework [1] to eliminate output jitter and provide time determinism in the code implementation of controls. Recently, there has been a renewed interest in the LET execution paradigm by automotive electronics vendors, as witnessed by the recent WATERS challenges [2].

In essence, the LET delays the program output of a task (or any function executed inside the task) to the end of the task period, trading delay for output jitter. The LET model is also characterized by an execution model of functional units with execution order (causality) constraints. The adoption of this model brings to the foreground not only the concept of timeliness, but also that of causality, which is typical of synchronous languages and their implementations.

A key observation is that the LET execution model not only avoids output jitter but has the additional benefit of scheduling precisely in time the accesses to the communication variables. This can be extremely valuable in the multicore execution of tasks communicating remotely. Several techniques have been proposed to analyze the time performance of real-time tasks on multicores in the face of the sharing of memory and other hardware resources, including interconnects, arbiters and I/O devices. Unfortunately, COTS multicore platforms are not designed with the aim of providing predictability, with the consequence that conventional analysis techniques can be at best pessimistic. The LET execution model can improve and restore predictability by controlling the time when memory resources are accessed.

For modern automotive systems, the AUTOSAR standard [3] provides a reference model for the development of applications, including a model of the functions and the tasks, a standard API for communication and execution, and a standard platform architecture. In AUTOSAR, the application consists of a set of communicating runnables grouped into tasks and statically allocated and scheduled on the system cores. The AUTOSAR model is based on the concept that the task model and the communication implementation are automatically generated by dedicated tools based on configuration information, the model of the application, and platform constraints. Such aspects are of paramount importance when designing a LET implementation for automotive applications.

This paper. In this paper, we draw analogies from all these concepts and propose an integrated approach to the problem of implementing and scheduling task communications in multicores. We first provide a characterization of possible variants of the LET paradigm. Next, we discuss the implementation of the LET paradigm in agreement with the AUTOSAR model and API on a multicore platform that is very common in the automotive domain and representative of typical HW configurations: the Infineon Aurix microcontroller. Then, we provide an analysis of possible actual implementation options based on the ERIKA RTOS (compliant with the OSEK automotive standard and a de-facto representative of the typical behavior of AUTOSAR OS kernels). Finally, we provide our results on the evaluation of a code implementation of the application proposed by Bosch in the context of the WATERS 2017 challenge [2], executed with our LET implementation on the Aurix. Other related issues, including the schedulability analysis with explicit consideration of memory contention, are briefly discussed but are not the main concern of this work.

II. MODELING AND BACKGROUND

This paper considers applications composed of a set of n periodic tasks Γ = {τ_1, ..., τ_n}, each characterized by a worst-case execution time (WCET) C_i, a period T_i, and a relative deadline D_i ≤ T_i. A bound on the response time of τ_i is denoted by R_i. The tasks execute upon a platform that comprises m processors P_1, ..., P_m, with local memories M_1, ..., M_m (one for each core), and a global memory M_{m+1}. The platform provides a crossbar switch that enables point-to-point communication between each core and each memory. Concurrent accesses to memories are arbitrated with a FIFO policy. Blocking memory access is assumed, i.e., there are no write or read buffers. Tasks are scheduled according to partitioned fixed-priority scheduling, and hp(i) denotes the set of tasks with higher priority than τ_i. Each task is statically allocated to a given processor P(τ_i). The symbol Γ_x denotes the set of tasks allocated to the processor P_x, while Γ(τ_i) denotes the set of tasks allocated to the same processor to which τ_i is allocated.
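The task model above can be written down as a small data structure. The following is a minimal sketch, not part of the paper's tooling: the class name `Task`, the helper names `gamma` and `hp`, and the convention that a larger integer means a higher priority are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Task:
    name: str
    wcet: float      # C_i
    period: float    # T_i
    deadline: float  # D_i, constrained to D_i <= T_i
    priority: int    # assumption: larger value means higher priority
    core: int        # index x of the processor P_x the task is allocated to

    def __post_init__(self) -> None:
        # The model only admits constrained deadlines.
        if self.deadline > self.period:
            raise ValueError(f"{self.name}: D_i must not exceed T_i")


def gamma(tasks: List[Task], core: int) -> List[Task]:
    """Gamma_x: the tasks statically allocated to processor P_x."""
    return [t for t in tasks if t.core == core]


def hp(tasks: List[Task], task: Task) -> List[Task]:
    """hp(i): the tasks with higher priority than the given task."""
    return [t for t in tasks if t.priority > task.priority]


# Hypothetical task set, used only to exercise the helpers.
TASKS = [
    Task("t1", wcet=1.0, period=5.0, deadline=5.0, priority=3, core=0),
    Task("t2", wcet=2.0, period=10.0, deadline=8.0, priority=2, core=0),
    Task("t3", wcet=4.0, period=20.0, deadline=20.0, priority=1, core=1),
]
```

With this task set, `gamma(TASKS, 0)` selects `t1` and `t2`, and `hp(TASKS, TASKS[2])` selects the two tasks with a larger priority value than `t3`.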
As a representative model for automotive AUTOSAR applications, each task τ_i is composed of an ordered sequence of n_i runnables ρ_{i,1}, ..., ρ_{i,n_i}, each of which has WCET C_{i,j}. The WCET of a task τ_i is simply computed (as a first-order approximation) as the sum of the WCETs of its runnables.

Runnables communicate by means of labels: variables that can be read and written in an atomic manner. Each runnable ρ_{i,j} may read or write labels from a set L = {ℓ_1, ℓ_2, ..., ℓ_q, ...}. Each label ℓ_q is characterized by a size (an integer number of bytes no larger than the processor word) and an access cost λ_q. L_i denotes the set of labels accessed by task τ_i, which can be constructed by looking at all the labels accessed by the runnables in τ_i. Each label is written by at most one task, while it can be read by multiple tasks. Labels that are written and read by tasks on different cores are mapped to the global memory, while all the other labels (including constant data) are mapped to the local memories (including their duplicates). The set of labels mapped in global memory and accessed by τ_i is denoted by L^G_i ⊆ L_i. Task τ_i accesses label ℓ_v at most N_{i,v} times. For a given pair of communicating tasks, a producer τ_P and a consumer τ_C, L^W(τ_P, τ_C) denotes the set of labels that are written by τ_P and read by τ_C. L^R(τ_C, τ_P) denotes the set of labels that are read by τ_C and written by τ_P. In order to compare the effects of different memory access policies, the WCETs do not include the execution cost to read and write the memory labels.

A. Logical Execution Time

The LET model we assume is inspired by the original proposal in [1]. However, in Section III we discuss other semantics and implementation options that are still inspired by the need for predictable and deterministic execution. In addition, we include a model for the implementation of the LET execution paradigm in the context of the AUTOSAR standard. For this option, we adopt from AUTOSAR the definitions and most of the semantics for the activation and communication of functions (runnables in AUTOSAR).

Functional and runnable model. In the original LET proposal, the execution of functions is characterized by a predictable and deterministic execution that preserves the order of execution of the functions and provides for deterministic communication and actuation times. In the LET model, the system is a network of functional blocks B = {b_1, b_2, ..., b_n}. Communicating blocks may be related by execution order constraints (expression of causality). Each block is characterized by a periodic activation and execution. Each block can perform multiple reads and multiple writes. Communication may occur between blocks with different periods, and each writer can have multiple readers for the same piece of information. In the LET execution model, blocks are executed by tasks (or threads) and their input and output operations are grouped together at the task level.

The LET execution model can be summarized as depicted in Figure 1. In the figure, the output of task τ_2 (denoted by the upward arrow at the end of the box representing the task execution) has a significant jitter. Because of variable interference from τ_1, it occurs late in the first task instance and much earlier in the second. The LET solution is shown in the bottom timeline for task τ_3 (taken as an example). The input of the task data is performed at the task activation, and the output is performed at the end of the task period. All task inputs are stored in local variables at the task activation. Similarly, all outputs need to be stored in local variables and are actually output only by the LET code at the end of the cycle. This requires allocating memory for local variables mirroring all input and output variables.

[Figure 1: timelines of tasks τ_1, τ_2, and τ_3, plus a LET timeline for τ_3 showing the LET input (from input to program variables) and the LET output (from program to output variables).]
Fig. 1. The LET model of execution. The short arrows upon the dots denote the input/output operations performed by the tasks.

Several mechanisms can be used to enforce the LET synchronization of input and output operations. In essence, LET is a sample and hold mechanism with synchronized execution of the input and output part.

III. LET SEMANTIC OPTIONS

The following sections present and discuss three different LET semantics characterized by different timing properties and implementation concerns using a simple running example.

Running example. Consider a producer task τ_P communicating with a consumer task τ_C by means of a shared label ℓ. Task τ_P acquires input data from a sensor, then processes the data, producing an update for ℓ. In a dual manner, τ_C reads data from ℓ, performs further processing on such data, and then performs a control output operation.

A. The GIOTTO LET semantics

In abstract terms, the LET paradigm assumes that the input/output operations happen in zero time. However, in a real implementation, the actual input/output operations must be scheduled for execution. The order in which they are executed influences the timing properties of the system, especially when flow preservation along communication chains is required. To ensure time determinism, the GIOTTO programming paradigm [4] specifies an order of execution for the writes and reads of blocks communicating using LET to enforce causality (see GIOTTO micro steps in [4]).

Without going into too much detail, the order of execution in GIOTTO can be recapped as follows: (i) first, data write and control output (i.e., actuation) operations are performed, then (ii) input (i.e., sensing) and data read operations are undertaken. This order is applied at every periodic instance of the tasks in the system and considers the input/output operations of all the tasks in a holistic manner, i.e., if the periodic instances of two tasks begin at the same time, then the communication is collapsed within a pair of phases (i) and (ii), each comprising the communication operations for both tasks.

Figure 2 illustrates an example schedule of LET communication with GIOTTO semantics. The communication phases are scheduled at the beginning of each periodic instance, which is compatible with the case in which they are performed by a high-priority task. As shown in the figure (dashed arrow), write operations take precedence over read operations, and the third periodic instance of τ_C reads the data written by the first instance of τ_P.
[Figure 2: timelines of τ_P and τ_C over time, with the logical end-to-end latency marked.]

[Figure 3: timelines of τ_P and τ_C over time, with the logical end-to-end latency marked.]
Fig. 3. Example schedule of LET communication with interleaved communication phases. The producer task τ_P has a period of T_P = 4 ms while the consumer task τ_C has a period of T_C = 2 ms. The same legend of Figure 2 applies.

As long as the tasks complete their execution before the release of their next instance (i.e., according to the implicit-deadline model), and ignoring the time needed to perform the actual input/output operations, the end-to-end latency with which the system reacts to the control input is deterministic, i.e., it is independent of the tasks' response times and equal to T_P + T_C.

The same semantics can be realized by scheduling the input/output operations at different times than the ones in Figure 2: implementation issues related to the scheduling of LET communication are addressed in Section V.
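The value T_P + T_C can be checked with a short calculation. The sketch below is an illustration rather than the paper's analysis: the function name `giotto_latency` is hypothetical, and it assumes synchronous release at time 0, integer time units, and T_C ≤ T_P so that every published value is read before it is overwritten.

```python
def giotto_latency(tp: int, tc: int, j: int) -> int:
    """Logical end-to-end latency for the j-th producer job (j = 0, 1, ...).

    Assumptions: both tasks are released at time 0, time is an integer
    number of units, and tc <= tp.
    """
    if tc > tp:
        raise ValueError("sketch only covers tc <= tp")
    sample = j * tp               # sensor input taken at producer release
    publish = sample + tp         # data write at the end of the producer period
    k = -(-publish // tc)         # first consumer release at or after the write
    actuate = (k + 1) * tc        # control output at the end of that consumer period
    return actuate - sample


LATENCIES = [giotto_latency(4, 2, j) for j in range(5)]
```

For T_P = 4 ms and T_C = 2 ms, every entry of `LATENCIES` is 6 ms, which matches T_P + T_C. When T_C does not divide T_P, the consumer release no longer coincides with the write and this sketch returns a larger value (for example 8 for periods 5 and 2).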
B. Interleaved LET communications

By altering the order in which the input/output operations are performed, it is possible to obtain different end-to-end latencies. For instance, consider the case where the LET communication phases are grouped by tasks, i.e., input and output operations are interleaved. This case is compatible with a LET implementation where each task delegates the LET communication for its input/output operations to a dedicated high-priority task.

Figure 3 illustrates an example schedule of LET communication where the input/output operations of task τ_C take precedence over those of τ_P, i.e., they follow the rate-monotonic order (note the periods of the tasks in the figure caption). As can be observed from the figure, unlike the case discussed in the previous section, the third periodic instance of τ_C is not able to read the data produced by the first instance of τ_P. This happens because the read operations of τ_C are scheduled before the write operations of τ_P. As a consequence, the data produced by the first instance of τ_P are only available to the fourth instance of τ_C, which increases the end-to-end latency with which the system reacts to the control input. Specifically, the latter becomes T_P + 2T_C.

C. LET for task chains

In the particular case in which a producer task τ_P only communicates with a consumer task τ_C, the LET model can be dropped for the internal communication of the chain and restored only at its boundaries, by enforcing an order of execution with an explicit activation signal. Under this scenario, the tasks have the same period T_P = T_C = T, but the consumer task τ_C incurs release jitter, which depends on the response time of τ_P.

Figure 4(a) illustrates an example schedule for the considered task chain where LET communication follows the GIOTTO semantics introduced in Section III-A. Since the communication phases are performed at the beginning of the periodic instances, the data produced by the first instance of τ_P is available to the second instance of τ_C. As a result, the (logical) end-to-end latency is equal to T_P + T_C = 2T. However, this latency may be reduced while still preserving the flows of data values between two consecutive instances of such tasks, i.e., the data produced by a job of task τ_P must be available to the successor job of τ_C explicitly activated at the completion of τ_P. As illustrated in Figure 4(b), data writes and reads are performed at a time instant (e.g., at the response time of τ_P as in the figure) within the period of the tasks. Nevertheless, the LET paradigm can be retained for external inputs and outputs, thus maintaining predictability for the control timing. In other words, this scheme can be seen as a LET paradigm applied in a holistic manner to the task chain, rather than to each individual task. As depicted in Figure 4(b), the resulting end-to-end latency with which the system reacts to the control input is equal to T.

Fig. 4. Two examples of LET communication for a task chain. When it completes its execution, the producer task τ_P activates the consumer task τ_C (dotted arrow). Both tasks have the same period, but τ_C incurs release jitter. The same legend of Figure 2 applies. The marker with a large dot indicates the completion of a job of τ_P. Inset (a) depicts the case where the GIOTTO semantics is applied, while inset (b) depicts an alternative case where data write and read operations are scheduled when τ_C is activated.

IV. REALIZING LET WITH GIOTTO SEMANTICS ON MULTICORES

This section presents a method for realizing the LET communication with GIOTTO semantics on a multicore platform. To generalize the proposed method, the following sections consider
the abstract platform model in Section II. The method is later instantiated for a real platform in Section V. The local copies of the labels required by the LET are allocated to local memories, i.e., a task τ_i running upon processor P_k and accessing a label ℓ_q has a local copy of ℓ_q, named ℓ_{i,q}, allocated to M_k. Since application tasks work only on local copies, their execution is not affected by memory contention. The global communication labels are allocated to the global memory M_{m+1}. Contention in the access to such labels is avoided by the LET communication mechanism at the price of a (predictable) synchronization delay. For the sake of simplicity, only data read and write operations are considered: possible improvements and optimizations are discussed at the end of this section.

A. LET as an opportunity to avoid memory contention

A major issue in executing real-time applications upon multicore platforms is the contention of architectural shared resources in the memory hierarchy (e.g., levels of caches and global memories). Works in the literature [5], [6] addressed such a problem by proposing clever solutions to improve the predictability of memory traffic.

As discussed in Section III-A, LET communication can be [...]

[...] τ_P executed with a period of T_P = 2 ms that is communicating with a consumer task τ_C running with a period of T_C = 10 ms. Suppose also that both tasks are synchronously released at the system startup. As a function of the ratio of their periods, for each job of τ_C there are T_C/T_P = 5 jobs of τ_P that overlap over time. For a given job J_C of τ_C, the data produced by the first four overlapping jobs of τ_P are never used by J_C, as they are overwritten by the data write operations performed by the last overlapping job, i.e., the last job of τ_P that completes no later than the release of the next job of τ_C (following J_C). In a dual manner, a consumer does not always need to read the shared copies of the labels. By leveraging these observations, it is possible to derive an analytical characterization of the timing of LET communications.

[Figure, panel (a): timelines of τ_P and τ_C from time 0 to k·T_C, with ⌊k·T_C/T_P⌋ jobs of τ_P marked.]
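The observation on skippable writes and reads for the producer with T_P = 2 ms and the consumer with T_C = 10 ms can be made concrete as follows. This is one possible reading of the observation, not the paper's analytical characterization: the function names are hypothetical, and the sketch assumes synchronous release at time 0, integer periods, GIOTTO ordering (a write at an instant precedes a read at the same instant), and producer jobs that publish at the end of their period.

```python
from math import gcd
from typing import List


def necessary_writes(tp: int, tc: int) -> List[int]:
    """1-based indices of the producer jobs, within one hyperperiod,
    whose published data is actually sampled by some consumer job."""
    hyper = tp * tc // gcd(tp, tc)
    needed = set()
    for k in range(1, hyper // tc + 1):
        # Number of producer jobs completed by the consumer release at k*tc;
        # the last of them is the one whose data is sampled.
        j = (k * tc) // tp
        if j >= 1:
            needed.add(j)
    return sorted(needed)


def necessary_reads(tp: int, tc: int) -> List[int]:
    """Indices k of the consumer releases (at time k*tc, k >= 1, within one
    hyperperiod) that find new data since the previous consumer release."""
    hyper = tp * tc // gcd(tp, tc)
    return [
        k
        for k in range(1, hyper // tc + 1)
        if (k * tc) // tp > ((k - 1) * tc) // tp
    ]
```

For `tp=2, tc=10`, only the fifth producer job in each hyperperiod appears in `necessary_writes`, in line with the four overwritten jobs mentioned above. For `tp=4, tc=2`, `necessary_reads` keeps only the consumer release at time 4, since the release at time 2 would find no new data.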
[Figure: bar chart with two series, LET and Explicit; the horizontal axis lists the application tasks and ISRs, the vertical axis ranges from 0 to 0.9.]

The results are reported in Table II. The table on the left reports the net execution times of the first frames without the kernel overhead. Note that the first frame is analogous to the one executed at the tasks' hyperperiod, where all the LET communications are performed, and is the heaviest in terms of execution time. Collecting the net execution times for all the frames was beyond the capability of our tracing hardware due to the limited trace buffer of the microprocessor. The execution times, including the kernel overhead, for the first eight frames are reported in the table on the right. The GMF tasks require a