Real-Time Linux Testbench On Raspberry Pi 3 Using Xenomai
Keywords
Sammanfattning
Test benches are often used to simulate events for an embedded system during validation. For simple test benches, microcontrollers can be used; for more advanced test benches, an RTOS can be used on more complex hardware. An RTOS has limited functionality in order to guarantee high predictability. A GPOS offers a large amount of functionality but instead has low predictability. The literature study therefore investigated the possibilities of making Linux handle real-time. The literature study found an approach named Xenomai Cobalt to be the optimal solution for turning Linux into Real-Time Linux.
Xenomai Cobalt was evaluated on an RPi 3 using its GPIO pins and a latency test. An application was written using Xenomai's API. The application used the GPIO pins to read from a function generator and to write to an oscilloscope. The measurements from the oscilloscope were then compared with the application's measurements.
The results showed the measurement differences between the RPi 3 and the oscilloscope with the system at idle. The measurements showed that reading varied by 66.20 µs and writing by 56.20 µs. The latency test was performed under stress testing and showed a worst measured latency of 82 µs.
The resulting measurement differences were, however, too high for the project's requirements. The majority of the measurements were much smaller than the worst cases, at 23.52 µs for reading and 34.05 µs for writing. This means the system can be used with better precision as a firm real-time system rather than a hard real-time system.
Keywords
Acknowledgements
I would like to thank my KTH supervisor Tage Mohammadat, who was very
helpful in answering all my emails with questions, and who gave me continuous
feedback and suggestions on the report.
I would also like to thank my supervisors at Saab Dynamics AB, Björn Johans-
son, Håkan Pettersson, and Mattias Helsing, for helping me and providing me
with all the necessary tools.
Finally I would like to thank Greg Gallagher at Xenomai for answering all my
emails, and helping me debug the Xenomai setup.
Stockholm, 23rd of June, 2018
Gustav Johansson
Table of Contents

Abstract
Sammanfattning
Acknowledgements
Table of Contents
List of Figures

1 Introduction
1.1 Project Description
1.2 Requirements
1.3 Problem Statement
1.4 Purpose
1.5 Goals
1.6 Method
1.7 Thesis Structure

2 Background
2.1 Real-Time Systems
2.2 Linux
2.3 Real-Time Linux
2.4 Xenomai's Architecture
2.5 Raspberry Pi
2.6 Related Work
2.7 Summary

3 Method
3.1 Xenomai Cobalt
3.2 Application
3.3 Data Collection
3.4 Summary
4.3 Summary

5 Experimental Setup
5.1 Automation
5.2 Data Collection
5.3 Encountered Problems
5.4 Summary

References
Appendices
B GPIO Template
B.1 Makefile
B.2 Code
List of Figures

2.1 Interrupt Latency
2.2 Linux Architecture
2.3 Topology of the two approaches for real-time Linux
2.4 Xenomai's Architecture
2.5 The interrupt pipeline, inspired by [26, Figure 13-2]
2.6 Xenomai's Cobalt core configuration, taken from [27, Figure 1]
2.7 Xenomai's Mercury core configuration, taken from [27, Figure 2]
2.8 Cache topology for Raspberry Pi 3
List of Tables

2.1 Raspberry Pi hardware specifications
2.2 Cache information
List of Acronyms
RTAI Real-Time Application Interface
RTDM Real-Time Driver Model
RTOS Real-Time Operating System
SBC Single-Board Computer
SoC System on a Chip
SSH Secure Shell
TSC Time Stamp Counter
USB Universal Serial Bus
VISA Virtual Instrument Software Architecture
1 Introduction
A common approach in building test benches is to use microcontrollers such as
the Arduino. The microcontrollers are then used to simulate trigger signals towards
a target and to capture signals using Input/Output (I/O) pins. A microcontroller
without an OS can be programmed to support timely measurement of events and
I/O trigger signals at the lowest possible overhead, i.e. with high efficiency. This,
however, comes at the expense of lower extensibility and higher development
time. Moreover, microcontrollers have limited memory and computational power,
which limits their use cases, e.g. the inability to use them to log large
amounts of data.
RPis are a series of inexpensive platforms with a large and active community.
The RPi models are similar in physical size and features to an Arduino but have
more computational resources and larger memories, which make them capable
of running complex OSs. Instead of having a fixed memory size like the Arduino,
the RPi uses SD cards as secondary storage, which makes much more room
available for applications and logging. However, the RPi sacrifices time
predictability for complexity and features when running a Linux OS.
The problem with Linux, and many other GPOSs, is that it was not originally
designed with real-time properties in mind; the focus was instead on desktop
and server environments [1]. The result is a very user-friendly environment
that is, however, highly unpredictable with regard to timely execution.
A solution to the unpredictability of GPOSs is to use an RTOS instead. RTOSs
have many features that help application developers be more efficient while
providing real-time guarantees [2, PP. 79-80]. Compared with a GPOS, an RTOS
is more limited in terms of features, as the RTOS only provides functionality
that can guarantee real-time execution. Many developers would want the
features provided by a GPOS, but they are aware of the compromises made by
RTOSs.
1.1.1 Problem
1.1.2 Motivation
The RPi is a series of low-cost platforms with a large and active community.
It can be equipped with a large SD card and thus has the capacity to store
large amounts of sampled data. The Linux kernel offers several useful features,
such as communication and various hardware services. If real-time performance
were achieved, the RPi would be a powerful replacement for the microcontrollers
used for integration testing. The RPi also supports high-level languages such
as Python, which could make it easier to develop the test code.
1.1.3 Task
The task is to examine the RPi and the theory behind RTOSs to find out whether
the RPi can be used for real-time triggering and measuring. The goal is to
output signals with a time precision within ±10 µs, and also to timestamp
measured changes of input signals with the same precision. These measurements
must not be delayed by general OS operations.
1.2 Requirements
From the background presented so far and from the project description, require-
ments have been established for the system and the implementation.
• The chosen approach needs to support the hardware of the RPi 3.
• The preferred GPOS is Raspbian.
• The approach needs hard real-time support.
• The system needs to handle real-time tasks that will not be interrupted or
degraded while generic GPOS tasks are running.
• The system needs to be able to set up communication between real-time
tasks and general-purpose tasks.
• The implementation of the system needs to be able to read and write GPIO
pins with a time precision of ±10 µs, and include timestamps within the
same precision.
• The implementation of the system needs to be able to monitor and log
data, either during or after the real-time measurements have been per-
formed.
Previous attempts have been made at combining the features of GPOSs with
the predictability of RTOSs. The question that remains is: how does such an
approach perform in terms of time predictability, especially on an RPi 3 for
test bench purposes?
1.4 Purpose
1.5 Goals
While a GPOS provides a vast number of complex use cases, it does not provide
real-time properties. The RPi is one of the most popular computer systems
used for IoT projects. The most popular OS executed on the RPi is called Rasp-
bian. Raspbian is, however, a GPOS and is thus limited with regard to real-time
properties. The main goals of the thesis are:
1. Modify the GPOS called Raspbian for RPi 3 so it can handle real-time
executions.
2. Evaluate the real-time capabilities of the system to make timely digital
triggers/stimuli using the GPIO pins accessible on the RPi 3.
Before the project, a microcontroller was used for the test bench. The microcon-
troller, however, needed to be connected to a computer which had complete
control over it. The computer could then retrieve the logged data and re-flash the
microcontroller whenever needed. With a successful project, the microcontroller
and computer could be replaced by a single RPi. The benefit of this is a sim-
pler solution where only one hardware device is needed. Another benefit is the
power consumption: the RPi 3 uses at most 12.75 W [3], while an average desktop
computer uses 60-250 W [4]. Therefore, if an RPi 3 were able
to replace a desktop computer in a test bench, a more sustainable approach
could be demonstrated.
The procedure of the thesis will not include personal information of any kind.
It will also not reveal any information about the company at which the thesis is
conducted. It has therefore been decided that no further discussion of ethics is needed.
1.6 Method
Acquiring real-time properties for Linux has been a focus for some time. Mainly
two approaches have been attempted [1, P. 432]. The first approach is called
virtualization, interrupt abstraction, or dual-kernel. The second approach
modifies the Linux kernel itself to decrease latencies and introduce real-time
features. More details can be found in section 2.3.
The first real-time extension introduced to Linux was called RTLinux
[5] (not to be confused with PREEMPT RT [6], which is often called real-time
Linux), and it used the dual-kernel approach. In a dual-kernel approach, a micro-
kernel is introduced which controls everything on the system. The microkernel
is responsible for providing the real-time executions. The microkernel executes
the Linux kernel as a thread with the lowest priority. After RTLinux, multiple
attempts were made with the same approach, two of which are called
Real-Time Application Interface (RTAI) [7] and Xenomai [8].
The most popular kernel-modification approach is called PREEMPT RT
[6]. Its focus is to make the Linux kernel more deterministic and predictable
by making it as preemptible as possible, in order to limit latencies and
jitter. Another kernel-modification approach is called SCHED DEADLINE
[9], or SCHED EDF, which is a Central Processing Unit (CPU) scheduler. As
the name suggests, its scheduling algorithm is based on Earliest Deadline First
(EDF). SCHED DEADLINE has been included in the Linux kernel since version
3.14 [10].
Xenomai was chosen to be used for the thesis as it provides everything necessary
to fulfill the requirements set in section 1.2. Details on Xenomai can be read in
section 2.4.
The objectives in this thesis can be seen in the list below:
1. Research on the current approaches for acquiring real-time properties for
Linux (section 2.3).
2. Evaluate the possibility of using the chosen approach on the RPi 3 (section 1.2).
3. Setup the approach for a RPi 3 running the Raspbian OS (section 4.1).
4. Implement an application which uses the GPIO pins of the RPi 3 in order
to characterize its performance (section 4.2).
5. Setup an automated experimental setup for data collection (chapter 5).
6. Analyze the result from the data collection (chapter 6).
The thesis is divided into the six following chapters. Each chapter is given a short
description below.
Chapter 2: Background, describes the theoretical background of the thesis.
It presents the important insights from the current literature, covering topics
such as real-time systems, real-time Linux approaches, Xenomai, and RPi
hardware. The chapter ends with a project description and conclusions based
on the literature insights and the project requirements.
Chapter 3: Method, describes the method used to fulfill the goals set for the
project. The method of installing Xenomai Cobalt onto the RPi 3 is described
first. Secondly, the measuring application is described. Thirdly, the data
collection is described, which uses an automated process to gather all the
necessary data. The data collection also covers how the analysis is to be
executed, as well as how reasonable sample sizes for the data collection were
determined.
Chapter 4: Xenomai Installation & Application, describes how the
real-time characteristics are introduced to the RPi 3. Afterward, the
measuring application needed for the project is described in detail, with the
motivation behind every step.
Chapter 5: Experimental Setup, describes in full detail how the experimental
setup was realized. It describes the automation process which was implemented
and used. The data collection is then described in detail, i.e. how the automation
process gathered all the necessary data for the next chapter.
Chapter 6: Results & Analysis, describes which types of measurements were
done and displays the results. The results are then compared and analyzed in
detail.
Chapter 7: Conclusions & Future Work, describes the conclusions of the
project based on the results derived from the data measurements and the overall
experience. The project requirements made at the beginning of the
project are then compared with the end result. Lastly, examples of future work
are described and motivated.
2 Background
In order to progress further into the thesis, an insight into the current literature
is necessary. This chapter goes through the topics and areas needed to
understand the subsequent chapters of the thesis.
Many computational systems rely on computing tasks that require precise
execution timing. These systems are called real-time systems. In such systems, it
is most important that the tasks the system contains are executed when they
should be. The result of tasks executing too late or too early can be considered
useless, or even dangerous, depending on the use and responsibility of such a
system [1, P. 1].
Today, real-time systems are almost everywhere in our society, as most
computer systems are in fact real-time systems. Applications where real-time
systems are crucial include, for example, power plants, robotics, and flight control.
2.1.1 Classifications
The tasks in real-time systems are often put into three categories depending
on the consequence of a missed deadline. The categories are
hard, firm, and soft [1, P. 9].
Hard Tasks considered hard must never miss their deadlines, as the consequences
can be catastrophic for either the system or its environment.
Firm Tasks considered firm may miss their deadlines, but the computed result of
a late task is then considered useless. The result does, however, not damage
the system or its environment.
Soft When tasks considered soft miss their deadlines, the result is still
useful, but not as valuable as it would have been had the deadline been
met.
difficult with the increasing complexity of the computer system architecture
design development [11]. The focus on computer system architecture design
is often to improve the performance for general-purpose use, which does not
include a strict requirement of timing. Because of this, it is not unusual for
improvements to affect the predictability in a negative way.
Microarchitecture
Multi-Core
each have a private cache but share caches at higher levels of the cache hierarchy.
The CPU cores are not isolated from each other, because of the shared resources
in the system. This makes the timing and performance of each core
dependent on the workload of the other cores. For example, data in the shared
cache can be invalidated by other cores, which increases the number of cache
misses for a given core and thus decreases predictability [11].
2.2 Linux
Linux is a free and open-source GPOS with many different variants called dis-
tributions. It was originally developed for the Intel x86 architecture in 1991
[12] but has since been ported to many other platforms. Linux was originally
designed for desktops and servers [1, P. 432]. Linux is today the most widely
ported OS and also the most common OS on servers and mainframe computers.
The Android OS is based on a modified version of the Linux kernel; Android
has a market share of over 73% of all mobile phone devices (late 2017) [13].
2.2.1 Development
The main difference between Linux and most other OSs is that Linux has
always been developed as open-source and free software. The most common
open-source license used in Linux is the GNU General Public License (GPL),
which is a copyleft license, meaning that anything derived from GPL-licensed
code must be distributed under the same license [14].
2.2.2 Linux Kernel
The Linux kernel can be considered either the heart or the brain of the Linux
OS. The kernel is the first program loaded when the system boots. It is loaded
into a protected area of memory called kernel space [15]. Everything the average
user does runs in a different memory area called userspace. The kernel manages
everything on the system, e.g. processes, memory, files, and I/O. Whenever a
user process needs access to protected resources, for example hardware details,
the kernel takes control and retrieves the needed information for the user
process. This mechanism is called a system call, or syscall for short.
The properties of the kernel vary between OSs, but the kernel usually includes
a scheduler. The scheduler determines how all the processes in the system are
handled, and in what order and with what priority.
The Linux kernel, like many OSs, is interrupt-driven. For example, the sched-
uler is controlled by the timer interrupts of a clock: the scheduler is woken by
the interrupt and then reschedules whatever is necessary. Other hardware can
also generate interrupts for the scheduler, allowing fast handling of hardware events.
An example of an interrupt sequence can be seen in Figure 2.1 below. A task
is waiting to be executed with the help of an interrupt; it can, for example, be
a task waiting to be rescheduled when a timer interrupt occurs.
[Figure 2.1: Interrupt Latency. Timeline: an interrupt occurs while a task is waiting; the interrupt latency passes before the interrupt handler runs, followed by the scheduler latency and the scheduler's execution, after which the waiting task finally runs.]
The architecture topology of Linux can be viewed below in Figure 2.2. The
architecture shows a Hardware Abstraction Layer (HAL) sitting beneath the
kernel and above the hardware. The HAL abstracts the hardware in software
to achieve a less hardware-dependent architecture, so that the kernel and the
processes on top do not need to be changed for each different type of hardware.
For a user process to gain access to a device driver, it needs to go through the
kernel with a system call.
The Linux kernel is in continuous development and can be acquired through its
kernel source tree Git repository [16].
[Figure 2.2: Linux Architecture. User processes run in user mode; device drivers and the OS kernel form the software layer on top of the HAL; the CPU, motherboard, and other specific hardware form the hardware layer.]
Linux is, as mentioned, a GPOS designed for desktops and servers.
Because of this, Linux is not suitable for real-time use, as it can cause high and
uncontrolled latencies. But because of the popularity of Linux, and because it is
open-source, a few approaches have been undertaken to make Linux
more suitable for real-time computing. There have mainly been two different
approaches to achieving real-time properties on Linux. The first approach is
commonly called dual-kernel, where interrupt abstraction or virtualization
applies. The second approach modifies the Linux kernel directly [1].
When comparing the two, it is common to refer to them as the dual-kernel and
single-kernel approaches. For a visual description of the two approaches, see
Figure 2.3 below.
Virtualization-based
Virtualization-based solutions work by having a very small kernel
(often called a hypervisor) which redirects all hardware interrupts.
The hypervisor assigns resources such as processor time, memory space,
peripherals, and more; this is the virtualization step. With this solution,
the hypervisor can prioritize real-time processes over the Linux kernel pro-
cesses. The Linux kernel runs as a thread with idle priority, meaning
that it only executes when all the real-time tasks have finished exe-
cuting. Because of this architecture, the dual-kernel approach can achieve
hard real-time [17, P. 4]. This approach yields significantly lower latencies com-
pared to the single-kernel approach [18].
Kernel modification
This approach instead focuses on modifying the Linux kernel itself to make
its performance more predictable. It is commonly referred to as the
single-kernel approach, to easily distinguish it from the dual-kernel
approach.
2.3.1 RTLinux
RTLinux [5] was developed by Finite State Machine Labs, Inc. (FSMLabs) and
was covered by a US patent (5885745), which was not valid outside the USA [18].
RTLinux used a very small kernel, called a microkernel, to handle the hardware,
and is therefore a dual-kernel approach. The Linux OS and its kernel run
in a thread with the lowest priority. When the microkernel receives an interrupt,
it first checks whether the interrupt is related to the running real-time tasks. If
the interrupt is real-time related, the correct real-time task is notified.
If the interrupt is not real-time related, it is flagged and handled
later, when the Linux OS is allowed to execute. The system achieves low
latencies for real-time tasks but suffers from some drawbacks. One is that device
drivers often need to be rewritten in order to work for real-time tasks [1,
P. 433].
A company called Wind River Systems acquired FSMLabs in 2007 and
made a renamed version, Real-Time Core, available for their Linux distribution.
In 2011, however, development of Real-Time Core was discontinued
[19].
2.3.3 Xenomai
Both RTLinux and RTAI had the same drawback: real-time tasks executed at
the same level as the Linux kernel code, and there was no memory protection
between them [18]. An error in a real-time task (e.g. a segmentation fault)
could cause the entire system to crash. This problem often occurred during
development, and the developer often needed to reboot the entire system to
continue.
This is where Xenomai [8] came in; it used the ADEOS nanokernel but also
allowed real-time tasks to execute in Linux userspace. The real-time tasks can
execute in two domains. A real-time task starts in the primary do-
main, controlled by the RTOS; the other domain is controlled by the Linux
scheduler. When a real-time task needs to use a function available only in the Linux
API, the task is transferred temporarily into the Linux domain until the
function has finished executing. This allows the developer to take advantage
of Linux, but increases unpredictability while real-time tasks are inside the
Linux domain [18].
2.3.4 PREEMPT RT
A project called the Real-Time Linux Collaborative Project was publicly an-
nounced in 2015. The project works continuously on a kernel patch
called PREEMPT RT, which modifies the kernel to be more preemptible. Paul
McKenney describes the aim of PREEMPT RT in the quote below
[20].
”The key point of the PREEMPT RT patch is to minimize the
amount of kernel code that is non-preemptible”
The project uses knowledge from existing RTOSs and has released
stable versions since kernel version v2.6.11. PREEMPT RT is used to let
real-time tasks reach lower latencies and jitter. This is achieved by modifying
the Linux kernel to be as preemptible as possible; some of the approaches are
described below.
Spinlocks
The kernel uses spinlocks to ensure that only one thread at a time has
access to a certain critical section. They are chosen instead of mutexes
for being simpler and faster [21]. But spinlocks were regarded as a perfor-
mance bottleneck, and PREEMPT RT therefore converted the majority of
spinlocks into rt mutexes.
rt mutex
PREEMPT RT replaces all mutexes with rt mutexes. The key
difference is that the rt mutex implements priority inheritance in order
to avoid priority inversion [22].
2.3.5 SCHED DEADLINE
2.3.6 LITMUSrt
The main goal and focus of the Linux Testbed for Multiprocessor Scheduling in
Real-Time (LITMUSrt) [23], [24] is to provide an experimental platform for
real-time systems research. LITMUSrt provides abstractions and interfaces
within the kernel which simplify further kernel modifications compared
to an unmodified kernel. LITMUSrt modifies the kernel to support
a sporadic task model, modular scheduler plugins, and reservation-based
scheduling. LITMUSrt has also implemented support for clustered, partitioned,
and global schedulers, as well as semi-partitioned scheduling.
LITMUSrt is intended to serve as a proof of concept for the predictability
of multiprocessor scheduling on existing hardware. LITMUSrt provides
an API but is not considered stable, meaning that implementations can
change between releases without warning [25].
In order for Xenomai to keep latencies predictable, the Linux kernel must be
prevented from directly handling interrupts. Interrupts must instead be redi-
rected to pass first through Xenomai and then through the Linux kernel. This is
achieved by placing a microkernel between the hardware on one side and Linux
and Xenomai on the other. The microkernel acts as a virtual programmable
interrupt controller, separating the interrupt masks of Linux and Xenomai.
This microkernel is called the Interrupt Pipeline (I-Pipe).
[Figure 2.4: Xenomai's Architecture. Xenomai tasks and Linux threads/processes run in user mode; below them sit the Xenomai real-time kernel and the Linux kernel, each with its own device drivers, on top of the Hardware Abstraction Layer (HAL) and the Interrupt Pipeline; at the bottom are the CPU, motherboard, and specific hardware.]
Figure 2.5 shows the virtual Interrupt Request (IRQ) handling, which has the
ability to lock out certain interrupts for other domains when needed. I-Pipe
replaces the hardware interrupt masks with multiple virtual masks. The virtual
masks are used such that domains sharing the same interrupt mask are not
affected by another domain's actions. I-Pipe exposes an architecture-neutral API.
The API has been ported to a variety of CPUs, which simplifies porting Xenomai,
as much less architecture-dependent code needs to be implemented.
2.4.2 Cores
The Xenomai core supplies all the necessary resources that the Xenomai skins
(section 2.4.4) require in order to mimic existing RTOS APIs. Real-time pro-
cesses on the system only need to use the Xenomai services in order to keep the
necessary predictability [26, P. 371].
Xenomai is now divided into two different approaches: the Cobalt core and the
Mercury core. The cores have significant differences and are used depending on
what the developer wants and needs.
Cobalt Core
The Cobalt core is the dual-kernel approach and requires I-Pipe. Together with
I-Pipe, Cobalt is built into the system to handle all of its time-critical parts,
such as interrupt handling and the scheduling of real-time tasks. A
visual overview of Cobalt's configuration can be viewed below in Figure 2.6.
Figure 2.6: Xenomai’s Cobalt core configuration, taken from [27, Figure 1]
The Cobalt core is only recommended when the system uses at most four CPUs
for real-time tasks. If the system is going to use more CPUs, it is recommended
to use the Mercury core instead [27]. Cobalt also depends on hardware support,
as all drivers need to be modified in order for Cobalt to function.
Mercury Core
core is chosen as PREEMPT RT improves the latencies. The configuration of
a Mercury core can be viewed below in Figure 2.7.
Figure 2.7: Xenomai’s Mercury core configuration, taken from [27, Figure 2]
Xenomai has used the Real-Time Driver Model (RTDM) since early in its
development. RTDM is a common framework for developing real-time device
drivers. RTDM development started when dual-kernel approaches such as
RTLinux and RTAI became known, as those approaches required new real-time
drivers [28]. When a real-time application needs to access a certain driver,
RTDM acts as a mediator between the application and the device driver.
2.4.4 Skins
2.5 Raspberry Pi
The RPi [30] is a series of Single-Board Computers (SBCs), which are a popular
choice for embedded projects. They are developed by the RPi Foundation. The
first RPi, Model B, was released in 2012 and became a success. Since then,
several different models of the RPi have been developed and released. A comparison
of all the latest models can be seen below in section 2.5.1.
The current RPi models have different specifications, but some are shared;
overall details can be seen in Table 2.1 below. All RPi models use an Advanced
RISC Machine (ARM) architecture. The System on a Chip (SoC) and the
architecture of the RPi chosen later will be of importance.
General-Purpose Input/Output
All of the RPi models have GPIO pins, which allow sensors, actuators, and much
more to communicate with the RPi.
The RPi which will be used is the RPi 3, and the rest of the hardware
specifications therefore only consider the RPi 3. As shown earlier in Table 2.1,
the RPi 3 uses the BCM2837 SoC. The BCM2837 in the RPi 3 is nearly identical
to the BCM2836 used in the original RPi 2. The most important difference is
that the RPi 3 uses a quad-core ARM Cortex-A53 processor, which replaced the
quad-core Cortex-A7 of the RPi 2. The cores run at a maximum of 1.2 GHz
without overclocking, which is approximately 50% faster than the RPi 2. The
ARM Cortex-A53 is considered a mid-range, low-power processor with the
ARMv8-A architecture [31].
ARM Architecture
AArch64 The Cortex-A53 supports AArch64, the ARM 64-bit execution
state. It also supports the A64 ISA (also called armv8-a) needed for
AArch64.
AArch32 The Cortex-A53 also supports AArch32, the corresponding 32-bit
execution state, and includes its A32 ISA (also called armv7-a). A32
was previously called the ARM ISA.
Thumb instruction set Now called T32. T32 is a subset of the A32 instruction
set where each instruction is instead 16 bits long. For every 16-bit T32
instruction there is also a corresponding 32-bit instruction.
Exception levels The Cortex-A53 supports exception levels EL0 to EL3.
The number corresponds to the software execution privilege, where 0 is
the lowest (unprivileged) and 3 is the highest. EL2 provides support for
processor virtualization and EL3 provides support for security states. The
exception levels are supported in both AArch64 and AArch32.
Pipeline
Cache
Each core has its own level 1 cache of size 32 KB, and the cores share a level 2
cache of size 512 KB [32]. According to the ARM Cortex-A53 datasheet [31],
instructions and data are kept in separate caches. It is therefore uncertain
whether [32] is referring to the 32 KB level 1 cache as the sum of both the
instruction cache and the data cache. A pessimistic assumption is that the
instruction cache and the data cache are 16 KB each. See Figure 2.8 below for
a visual description of the cache topology. See Table 2.2 below for the gathered
cache specifications.
Table 2.2: Cache information
Cache level  Data/Instruction  Line size  Set associative  Replacement policy  Size
1            Instruction       64 bytes   2-way            Pseudo-random       16 KB
1            Data              64 bytes   4-way            Pseudo-random       16 KB
2            Both              64 bytes   16-way           Not found           512 KB
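The figures in the table are mutually consistent and allow the cache geometry to be derived; for example, the number of sets in each cache follows from size / (ways × line size). A small sketch of that arithmetic, using only the constants from Table 2.2 (this derivation is not in the thesis, just standard cache arithmetic):

```cpp
#include <cassert>
#include <cstddef>

// Number of sets in a set-associative cache:
// sets = total size / (ways * line size).
constexpr std::size_t cache_sets(std::size_t size_bytes,
                                 std::size_t ways,
                                 std::size_t line_bytes) {
    return size_bytes / (ways * line_bytes);
}

// Constants from Table 2.2: 16 KB L1 caches, 512 KB L2, 64-byte lines.
constexpr std::size_t l1i_sets = cache_sets(16 * 1024, 2, 64);   // 128 sets
constexpr std::size_t l1d_sets = cache_sets(16 * 1024, 4, 64);   // 64 sets
constexpr std::size_t l2_sets  = cache_sets(512 * 1024, 16, 64); // 512 sets
```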
As ARM Cortex-A53 has multiple cores, it needs a data cache coherence protocol
in order to avoid data corruption when cores share data. The protocol used is
called Modified Owned Exclusive Shared Invalid (MOESI) [31], where each word
in the name describes the state a shareable cache line can be in.
2.5.3 Raspbian OS
The OS that the RPi Foundation officially support is Raspbian [33]. Raspbian
is based on the Debian Linux distribution but is modified to run as smoothly
as possible for the RPi devices. Raspbian is currently only executing in 32-bit,
even on RPi 3 which has a 64-bit CPU. This limits the available ISAs on RPi 3
as armv8-a cannot be executed in a 32-bit environment [31]. Successful attempts
at running Raspbian in 64-bit have been made but is not yet considered stable
enough for an official release [34].
The RPi kernel has its own Git repository, forked from the mainline kernel. The
repository provides, for example, heavily modified Universal Serial Bus (USB)
drivers and other modifications specifically for the hardware on the RPis [35].
Related work has been researched. The related works have different purposes
and different approaches; however, their areas are all related in terms of
real-time, GPOS, and Linux.
The authors Claudio Scordino and Giuseppe Lipari have provided an article
describing current approaches and future opportunities regarding real-time for
Linux [18]. Even though the article can be considered outdated today, it
introduced approaches, such as interrupt abstraction and kernel modification,
which are still used today.
2.6.2 An Embedded Multi-Core Platform for Mixed-Crit-
icality Systems
The author Youssef Zaki has written a master's thesis on his study and analysis
of virtualization techniques for mixed-criticality systems [36]. Mixed-criticality
systems combine tasks of different criticality/importance on the same computing
platform. It is most important that the lower-criticality tasks do not interrupt
or degrade the performance of the higher-criticality tasks. This is achieved by
system isolation and virtualization with multiple OSs. Although virtualization
was not a sought-after solution, the report provided good insight into approaches
to virtualization.
Nitin Kulkarni worked on audio processing with soft real-time requirements
using a dual-kernel approach [37]. The dual-kernel approach consisted of using
Xenomai with Cobalt on an Intel Atom board [38, P. 41]. The conclusions from
the analysis done by the author state that the overall responsiveness of the
system was improved and that Xenomai can indeed be used for hard real-time
applications [37, P. 64].
The author Adam Lundström has written a study on approaches for executing
Ada code in Real-Time Linux [39]. The author chose PREEMPT RT as the
most suitable approach for the purpose but describes that RTAI and Xenomai
could also be promising solutions. Xenomai and RTAI provided better
performance in terms of latencies but lacked support for Ada [39]. The lack of
Ada support in Xenomai and RTAI was one of the main arguments for using
PREEMPT RT instead. Adam Lundström also states, through [40], that
PREEMPT RT is not able to guarantee hard real-time because of its
architecture [39].
2.7 Summary
The conclusions based on the research in the background and the project de-
scription (Section 1.1) deemed that a system using Xenomai Cobalt could be an
appropriate solution. The system will be running on a RPi 3 with the Raspbian
GPOS together with Xenomai Cobalt and the implementation will be using
Xenomai’s API.
Measuring the predictability of a real-time system is a very complicated process,
and with an entire GPOS sharing resources with the real-time parts, the process
is complicated further. Because of complexity and time constraints, the thesis
will analyze the predictability of the system through its response jitter to
external stimuli, recognized by means of interrupts.
This chapter has given details on the terms, topics, and areas on which the
thesis is based. Based on the topics and areas in the literature, combined with
the project description, a conclusion has been made on how the rest of the
thesis should continue. The final conclusion was to use Xenomai Cobalt on the
RPi 3 in order to provide hard real-time characteristics on the Raspbian OS.
3 Method
This chapter describes the research method in more detail. The Xenomai
section describes how the installation process will be performed and what is
achieved by it. The application section describes what needs to be implemented.
The data collection section describes how and why the measurements will be
done for the coming analysis.
The installation process follows the official installation guide for Xenomai 3.x
[41]. RPi 3 is supported according to the hardware section of the webpage.
RPi 3 is shown in section: “Dual kernel Configuration → ARM → Supported
Evaluation Boards”. The installation process might still encounter problems
because of the lack of information on the webpage. Some developers have
attempted this installation process and, luckily, provided guides on how they
succeeded [42, 43].
The installation process could encounter problems so severe that they could
hinder the development of this thesis. A backup plan would be needed, should
that happen. The backup plan is to use an image provided by [44]. The image
contains the Raspbian OS with Xenomai already installed, for the RPi models
Zero, 1, 2, and 3.
3.2 Application
In this section the overall planned structure of the application is described, with
motivations of why it was implemented as such.
Xenomai’s API is used as much as possible during the implementation, as using
the generic APIs and libraries available in Linux can cause mode switches,
meaning that the application may switch from the primary domain to the
secondary domain and vice versa. This may, consequently, cause the application
to temporarily execute in the Linux domain, thus causing unpredictable
latencies.
3.2.1 Description
The application needs to read and write using the GPIO pins on the RPi 3
with hard real-time requirements. This means that tasks must never miss their
deadlines. The aim is to determine the rate which describes how accurately the
GPIO pins can be written and read. For example, if a rate of 10 kHz is found,
then the application will always be able to write and read a pin within 100 µs.
The application is developed to read a Comma Separated Values (CSV) file
which describes when the application should write to the GPIO pins and how
long each high signal should be. Reading files causes a switch to the secondary
domain, meaning that files will be read and stored before the application requires
real-time execution.
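The pre-loading step described above can be sketched as follows. The two-column layout (write time, pulse length) and the field names are assumptions for illustration; the thesis does not specify the exact CSV format:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One scheduled pulse: when to set the pin high, and for how long.
// (Hypothetical layout; the thesis does not give the column format.)
struct Pulse {
    long start_us;   // time offset of the rising edge
    long length_us;  // duration the pin stays high
};

// Parse a CSV body of "start,length" rows into memory up front, so no
// file I/O (and thus no secondary-domain switch) is needed once
// real-time execution begins.
std::vector<Pulse> parse_pulses(std::istream& in) {
    std::vector<Pulse> pulses;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        std::istringstream row(line);
        std::string start, length;
        if (std::getline(row, start, ',') && std::getline(row, length, ','))
            pulses.push_back({std::stol(start), std::stol(length)});
    }
    return pulses;
}
```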
Furthermore, the application will handle a limited amount of data and can thus
avoid dynamic allocation. The application will initialize everything needed
before any real-time requirements are set. This way, there will not be any
background tasks running within the application, and it can instead focus on
the real-time measurements. A simple flowchart for the application is illustrated
in Figure 3.1 below.
When the application seems to work as described, a trial-and-error method
will be used to increase the rate. Approaches such as creating CPU sets and
isolating CPUs are considered. A CPU set is a hierarchy describing which
resources processes are allowed to use, for example restricting processes from
using certain CPU cores. Isolating CPUs refers to the kernel boot option
isolcpus, which removes the specified CPU cores from the kernel scheduler.
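Besides cpusets and the isolcpus boot option, a process can also pin itself to a core programmatically. The sketch below uses the Linux sched_setaffinity call, a related mechanism (not the cpuset hierarchy itself), to keep a process on one core:

```cpp
#include <cassert>
#include <sched.h>
#include <unistd.h>

// Restrict the calling process to a single CPU core, so the scheduler
// will not migrate it between cores. This is the programmatic analogue
// of restricting a process with a CPU set.
bool pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    // pid 0 means "the calling process".
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}
```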
3.2.2 Testing
The application will be tested throughout the whole implementation using an
oscilloscope. Measurements from the oscilloscope are compared with the
configuration in the application. The test is considered successful if the
measurement differences are lower than the stated rate. During the tests, the
system will be under different kinds of load using stress tests (see section 3.3.4).
This is to ensure that the real-time requirement will still be met independently
of the state of the rest of the system. Scripts will also be written in order to
automate the testing procedure, thus reducing test time.

Figure 3.1: The flow of the application
This section describes how the data collection procedure will be done, what it
requires and finally how the data will be analyzed.
3.3.2 Automation
As mentioned earlier in section 3.2.2, attempts at making the tests and
measurements automatic are made. It is common that instruments such as
oscilloscopes or function generators use the standard Virtual Instrument Software
Architecture (VISA) [46]. VISA is a standard for configuring, programming,
and troubleshooting instruments through communication interfaces, for example
General-Purpose Interface Bus (GPIB), Serial, Ethernet, and USB interfaces.
The VISA standard has been implemented as a Python package called PyVISA
[47], meaning that the communication can be done with Python. The Python
community is vast and open and offers a huge amount of different utilities,
making the automation process easier; hence Python was chosen for the
automation process.
In order to more easily understand the coming automation steps, see Figure 3.2
below for an illustrated description of the measurement setup.
4. execute the application with a specified CSV file. The oscilloscope will be
triggered when it receives a high signal from the application.
5. extract data from the oscilloscope and compare it to the application’s
data.
3.3.3 Latency Test
A latency test is a measurement tool commonly used as a reference for how
predictable a system can be. The latency test has a periodic task. As soon as
the task is awakened, it compares the system time with the time at which it was
supposed to be awoken. The difference between these values is the latency. The
latency depends on many things, such as the hardware, the OS, priorities, the
architecture, and much more. This latency can also be described as scheduling
latency. An interrupt latency has been described before; see Figure 2.1.
Figure 3.3 below illustrates the latency described as release time. The release
time varies depending on the scheduler interrupt latency. The response time
denotes how long time it takes for the task to finish after being released. Xeno-
mai’s API comes with a latency test application by default.
Figure 3.3: Release time, response time, relative deadline, and absolute deadline
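The scheduling latency described above is simply the actual wake-up time minus the programmed wake-up time. A small sketch of that computation on synthetic timestamps (this is plain arithmetic, not the Xenomai latency test application itself):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Scheduling latency of one release: actual wake-up time minus the
// programmed wake-up time, both in nanoseconds.
std::int64_t latency_ns(std::int64_t expected_ns, std::int64_t actual_ns) {
    return actual_ns - expected_ns;
}

// Worst observed latency over a series of releases.
std::int64_t worst_latency_ns(const std::vector<std::int64_t>& expected,
                              const std::vector<std::int64_t>& actual) {
    std::int64_t worst = 0;
    for (std::size_t i = 0; i < expected.size(); ++i)
        worst = std::max(worst, latency_ns(expected[i], actual[i]));
    return worst;
}
```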
3.3.4 Stress
During each measurement the system will be under stress from applications not
considered real-time applications. This is done to simulate different loads that
the system might experience while responding to the real-time stimuli. Stress-NG
[48] was chosen as the stress application. The motivation was that Stress-NG
has many features, such as cache thrashing, I/O syncs, context switching, and
more.
3.3.5 Analysis
The application writes to a GPIO pin at specified points in time. The
oscilloscope will then measure the signals. The measurement data from the
oscilloscope will be compared with when the application was supposed to write
to the GPIO pin. See Figure 3.4 below for an illustrated example.
The application also measures signals read from a GPIO pin. The signals are
sent from the function generator, which is also connected to the oscilloscope.
The measurement data from the oscilloscope will be compared with the
application’s measurement data. See Figure 3.5 below for an illustrated example.
The comparison of the data mentioned will show the minimum, average, and
maximum latencies measured. The result represents how accurately the RPi 3
can write and read signals using the GPIO. See Table 3.1 below for a simplistic
example of the resulting data.
Table 3.1: Example showing time difference from application and oscilloscope
If the data is rarely close to either the minimum or the maximum, it can be
useful to plot the data using box plots. A box plot will show where the majority
of the data ends up.
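The reduction of the per-sample differences to the minimum, average, and maximum values reported in tables like Table 3.1 can be sketched as follows (synthetic data; microseconds assumed as the unit):

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

struct Stats {
    double min, avg, max;
};

// Summarize the measured RPi-vs-oscilloscope differences into the
// three values reported per test: minimum, average, and maximum.
Stats summarize(const std::vector<double>& diffs_us) {
    auto [lo, hi] = std::minmax_element(diffs_us.begin(), diffs_us.end());
    double sum = std::accumulate(diffs_us.begin(), diffs_us.end(), 0.0);
    return {*lo, sum / diffs_us.size(), *hi};
}
```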
Sample Size
A sample is in this case defined as the measured difference between the RPi and
the oscilloscope. The samples will most probably vary for each test; it is
therefore necessary to decide how many samples are needed in order to have
reliable results. This decision depends on how the standard deviation changes
with the number of samples. If the standard deviation reaches a constant value
at a certain number of samples, then it is reasonable to limit the number of
samples to that number. The number might be lowered further, however,
because of time constraints.
The standard deviations for reading and writing GPIO pins are shown below in
Figure 3.6 and Figure 3.7.
Figure 3.6: Standard deviation versus number of samples (GPIO reading)
Figure 3.7: Standard deviation versus number of samples (GPIO writing)
What can be seen from Figures 3.6 and 3.7 is that the standard deviation is
relatively high for the first 100 samples. The standard deviation then stabilizes
after 1.00 × 10^3 samples. A reasonable decision on the sample size is therefore
between 1.00 × 10^3 and 10.00 × 10^3 samples. During the data collection, the
standard deviation will always be taken into consideration, as the result may
vary depending on changes in the system.
3.4 Summary
The planned Xenomai installation process has been described, as well as the
backup plan in case of an unsuccessful installation.
The application implementation for the RPi has been described, including meth-
ods to possibly improve its rate. The data collection process has been described
to be automated, with a description of acceptable sample sizes.
4 Xenomai Installation & Application
This chapter explains first the Xenomai installation on RPi 3. The chapter
continues with the application implementation needed to fulfill the requirements
in section 1.2.
In order to install Xenomai Cobalt on the RPi 3, a few steps were needed. A
kernel needed to be chosen, and there exist a few options for the RPi 3. The RPi
Foundation has developed its own kernel, forked from the mainline kernel
(Torvalds’ Linux kernel). They have modified their kernel especially for the RPi
models, which could be a good argument for choosing the RPi kernel. The
problem with the RPi kernel is that Xenomai does not focus its development on
forked kernels but only on the mainline kernel.
Xenomai has released support for the RPi kernel, for kernel version 4.1, on a
forked repository. The last commit on that repository, however, was on 14
February 2017, and it is now considered outdated.
Xenomai has support for the mainline kernel up to version 4.9, which is also
more up to date than the other options. This is one of the reasons why the
mainline kernel was chosen. The other option was attempted but without
success; details on the unsuccessful attempt to run Xenomai on the RPi kernel
can be found in Appendix A.
The kernel version needed to be decided as well. It was important that the
kernel version matched the I-Pipe version; otherwise the differences could cause
the kernel to malfunction or simply fail to build. I-Pipe was going to be acquired
as a patch, which simplifies the merge with the kernel. A patch is a file created
using git diff, which essentially shows all the differences between the latest
commit and the unstaged files. The patch can then be applied to different
repositories.
Both the kernel and the Xenomai API needed to be built, which was done
using a cross-compiler. A cross-compiler is essentially an application which
translates source code into machine code for a different architecture. Cross-
compilers are usually used for embedded systems in order to reduce compilation
time. The host computer used for the entire Xenomai installation ran Ubuntu
16.04, a 64-bit OS.
4.1.1 Preparation
The parts needed were: the kernel source, a cross-compiler, the Xenomai source,
and the I-Pipe patch.
Acquiring I-Pipe
As mentioned earlier, the kernel and the I-Pipe patch need to have the same
version or at least be as close as possible. The available releases of I-Pipe can be
acquired from the Xenomai’s download page [49]. The latest release for ARM
architecture was 4.9.51. The I-Pipe patch was downloaded with the command:
~/$ wget http://xenomai.org/downloads/ipipe/v4.x/arm/ipipe-core
,→ -4.9.51-arm-4.patch
Acquiring Kernel
The mainline kernel source was then acquired by using the command:
~/$ git clone https://git.kernel.org/pub/scm/linux/kernel/git/
,→ stable/linux-stable.git ~/linux
The latest version in the kernel source tree was different from the sought-out
version. It was therefore necessary to change it to 4.9.51. This was done by
first listing the kernel versions using ”git tag”. The kernel was then changed
to 4.9.51 with the command:
~/linux$ git checkout v4.9.51
Acquiring Cross-Compiler
The chosen cross-compiler was from RPi own kernel building guide [50]. The
cross-compiler was built especially for RPi models. It was downloaded by the
command:
~/$ git clone https://github.com/raspberrypi/tools ~/tools
If the host computer were using a 32-bit OS instead, a 32-bit cross-compiler
would be needed. A 32-bit cross-compiler is located in the same repository at
the path:
~/tools/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian/bin
Acquiring Xenomai
The Xenomai source could be found on Xenomai’s Git repository webpage [51].
The latest Xenomai version was 3.x, acquired with the command:
~/$ git clone http://git.xenomai.org/xenomai-3.git/ ~/xenomai
The next step was to continue with the Xenomai installation. The steps for the
Cobalt core and the Mercury core are similar, but the Cobalt core has slightly
more configuration because the kernel needs to be modified.
The next step was to apply I-Pipe and Xenomai to the kernel. I-Pipe could
be applied to the kernel separately or with a script included in the Xenomai
repository. A “dry run” could be performed when patching I-Pipe, in order to
make sure that it would be applied as intended. This was done with the
command:
~/linux$ patch -p1 --dry-run < ~/ipipe-core-4.9.51-arm-4.patch
Everything patched successfully, meaning that the Xenomai script could be used
without any problems.
~/xenomai$ ./scripts/prepare-kernel.sh --linux=~/linux --arch=arm
,→ --ipipe=~/ipipe-core-4.9.51-arm-4.patch
The kernel source tree needs a configuration file specifying the hardware and
which features should be included. A configuration could be acquired from an
already running RPi, provided that its kernel is the same version or close
enough. However, the already running RPi was using the RPi kernel and not
the mainline kernel, so a different approach was chosen instead. A configuration
could be chosen from one of the default configurations available in the kernel
source tree. The configuration file for the RPi 3 was called ”multi_v7_defconfig”.
Before specifying the kernel configuration, the architecture and cross-compiler
must be specified for the kernel source tree. This was done by the commands:
~/linux$ export ARCH=arm
~/linux$ export CROSS_COMPILE=arm-linux-gnueabihf-
The default configuration was made to fit multiple different systems. This was
avoided by deselecting all systems which were not for the RPi 3, as shown in
Figure 4.1 below.
Then the GPIO device driver was added to Xenomai as a loadable module, see
Figure 4.2 below.
Some features needed to be edited in order for Xenomai to work properly.
CONFIG_CPU_FREQ This option allows the CPU frequency to be modulated
depending on the workload. This feature adds unpredictability and was
therefore disabled.
CONFIG_CPU_IDLE This feature allows the CPU to enter sleep states. As it
takes time for the CPU to be awoken, it adds latencies and unpredictability.
The feature could also cause timers for Xenomai to stop working. It was
therefore decided to disable this feature.
CONFIG_KGDB This is a kernel debugger which can only be enabled for x86
architectures. As the RPi 3 has an ARM architecture, it was necessary to
disable it.
CONFIG_CONTEXT_TRACKING_FORCE This option is automatically
disabled with the I-Pipe patch.
These configurations can be viewed below in Figure 4.3.
When the configurations were done, the kernel could finally be built. This is
done by the command:
~/linux$ make zImage modules dtbs -j12
The next step was to install the kernel in a temporary location. The Raspbian
OS was already installed on the SD card, which was mounted at ”/mnt/rpi”.
The mount contained two partitions: boot and rootfs. The boot partition
handles only the boot sequence (which selects the kernel and its arguments),
while rootfs contains the rest of the entire OS. A temporary location was created
with the same structure as the SD card. Everything was going to be installed
to that location and, when finished, the temporary files would be copied to the
SD card.
The parts needed to be installed were the kernel image, modules, and Device
Tree Blobs (DTBS). These were installed with the commands:
~/linux$ export INSTALL_MOD_PATH=~/tmp/rootfs
~/linux$ make modules_install
~/linux$ cp arch/arm/boot/dts/bcm*.dtb ~/tmp/boot
~/linux$ cp arch/arm/boot/zImage ~/tmp/boot/kernel7.img
The last step regarding the kernel was to set which device tree should be used.
The device tree provided by Xenomai was called “bcm2837-rpi-b-cobalt.dtb”.
It was set by adding the line “device_tree=bcm2837-rpi-b-cobalt.dtb” to
“config.txt” in the SD card’s boot partition.
The kernel was then finally installed with Xenomai Cobalt. The next step was
to install the Xenomai API so that real-time applications could be developed
and used.
4.1.3 Installing Xenomai API
The Xenomai source was acquired through Git; because of this, a configuration
script needed to be executed in order to generate the necessary Makefiles. The
automatic configuration script was executed with the command:
~/xenomai$ ./scripts/bootstrap
The next step required a build directory and a staging directory. The build
directory was needed to store the files that the succeeding commands would
generate. The staging directory would then store the installed files temporarily
before they were moved to their final location. The chosen staging directory
was ~/tmp/rootfs from the previous section.
It was required to generate an installation configuration for the specific chosen
platform. This was done by the commands:
~/$ export CFLAGS="-march=armv7-a -mfloat-abi=hard -mfpu=neon -
,→ ffast-math"
~/$ export LDFLAGS="-march=armv7-a -mfloat-abi=hard -mfpu=neon -
,→ ffast-math"
~/$ export DESTDIR=~/tmp/rootfs
~/xenomai/build$ ../configure --enable-smp --host=arm-linux-
,→ gnueabihf --with-core=cobalt
The arguments in the commands describe the architecture of the platform. The
OS used was Raspbian, which is a 32-bit OS, meaning that the ISA armv7-a or
below is necessary. The floating-point hardware is of type NEON. The
floating-point convention was chosen to be hard, meaning that the code will be
compiled into instructions specific to the Floating Point Unit (FPU) the RPi 3
uses. Finally, math algorithms were optimized for speed with the -ffast-math
flag.
When the configuration step was done, the installation of the API could finally
start. The installation was done by using the command:
~/xenomai/build$ make install
The last step for the Xenomai installation was then to copy everything to the
SD card. This was done with the command:
~/tmp$ sudo cp -r * /mnt/rpi/ --verbose
The Xenomai installation was complete. The final step was to run it on the
RPi 3. The path to the Xenomai API needed to be added to ldconfig. This
was done by creating a file called /etc/ld.so.conf.d/xenomai.conf, containing
the path to the Xenomai library. When that was done, ldconfig could set up all
library paths with the command:
RPI:~/$ sudo ldconfig --verbose
4.1.4 Xenomai Installation Summary
4.2 Application
The application is described in detail in this section. The application was
developed in C++, as this makes it easy to continue the development in future
projects. The Xenomai API skins used were RTDM and Alchemy. RTDM was
necessary as a real-time driver is needed when using the GPIO pins. A template
for how to use the RTDM GPIO device driver can be seen in Appendix B.
4.2.1 Structure
The structure of the application is divided into five parts: Main, Init, Write,
Read, and Logging. Init, Write, and Read are real-time tasks, while Main and
Logging are only parts of the execution process. For an illustration of the overall
structure of the application, see Figure 3.1 in section 3.2.1.
Main
Main is the first part to be executed when the application starts. It starts as
a regular Linux process without any real-time characteristics. Main does four
things: it creates the real-time tasks in a specified order, waits for the real-time
tasks to finish, logs the acquired data, and finally writes low to all used GPIO
pins.
Init Task
The purpose of Init is to set up the necessary variables and data structures
used by the Read and Write tasks. In addition, Init provides synchronization
between the tasks by using a counting priority-based semaphore. For details on
how this is done, see section 4.2.2. Afterward, Init waits a specified time before
writing to a specific GPIO pin, which the function generator uses as a trigger.
The time is specified by the user when starting the application.
Write Task
The purpose of the Write task is to write to a given GPIO pin at specified
points in time. When the task first executes after being released by the Init
task, it reads the current system time and stores it for future reference. The
task receives an array containing when each signal should be written and how
long each signal should last. The stored system time is then used with the array
to determine when each signal should be written. The Write task idles between
each write to the GPIO pin. Setting a task to idle allows other tasks to execute
until the idling task is awoken. However, the task will not be awoken at the
exact time specified, because of latencies, as described in section 3.3.3.
The Write task also synchronizes the oscilloscope with the RPi by writing to an
additional GPIO pin which the oscilloscope uses as a trigger. The trigger pin is
written to directly before reading the system time.
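The timing logic described above amounts to turning the stored start time and the per-signal offsets into absolute wake-up times. A plain-arithmetic sketch (the actual task uses Xenomai's clock and sleep services; the names here are illustrative):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Convert relative signal offsets (when each write should happen,
// counted from the task's release) into absolute points in time,
// based on the system time read once when the task starts.
std::vector<std::int64_t>
absolute_times_ns(std::int64_t start_ns,
                  const std::vector<std::int64_t>& offsets_ns) {
    std::vector<std::int64_t> abs_times;
    abs_times.reserve(offsets_ns.size());
    for (std::int64_t off : offsets_ns)
        abs_times.push_back(start_ns + off);  // wake at start + offset
    return abs_times;
}
```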
Read Task
The purpose of the Read task is only to read a specified GPIO pin and store
the system time in an array when a signal is received. The GPIO pin used for
reading is set up as an IRQ with both rising- and falling-edge detection. This
means that whenever an edge is detected by the Xenomai IRQ handler, the
handler calls the callback function. Whenever the Read task calls the read
function, it gets blocked until a signal edge is detected. The current time can
then be stored directly after the read function returns.
Logging
When both the Write and Read tasks are finished, the logging begins. The
logging reads the array filled by the Read task and converts its values from
nanoseconds to milliseconds. The logging then writes the data to a CSV file.
The system calls used by Logging cause mode switches. However, this is not a
problem, as Logging only starts when the real-time tasks have finished.
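The conversion and CSV output step can be sketched as follows. The column names and layout are illustrative assumptions; the thesis does not specify the file format:

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Convert the recorded timestamps from nanoseconds to milliseconds
// and emit one CSV row per sample. This runs only after the
// real-time tasks have finished, so mode switches here are harmless.
std::string to_csv_ms(const std::vector<std::int64_t>& stamps_ns) {
    std::ostringstream out;
    out << "sample,time_ms\n";
    for (std::size_t i = 0; i < stamps_ns.size(); ++i)
        out << i << ',' << stamps_ns[i] / 1e6 << '\n';  // ns -> ms
    return out.str();
}
```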
4.2.2 Execution
In order for every part to work together, a specific execution order is needed.
Figure 4.4 below illustrates how the whole application is executed.
7. Write timestamps the current time and starts the writing process by
waiting until the first GPIO signal should be written.
8. When the Read and Write tasks finish, Main saves the data acquired by
the Read task to a file. The application is then done.
4.2.3 Details
In this subsection, some details of the application are described, for example,
implementation decisions based on the hardware of RPi 3, and how they would
affect the system’s predictability.
Cores
The RPi 3 has four cores, which gives the possibility to set up the application
in such a way that the real-time tasks execute concurrently on different cores.
This might, however, increase the unpredictability if, for example, data were
unknowingly shared between cores. This would cause the data cache coherence
protocol to move the data back and forth between the cores. The real-time
tasks were therefore chosen to be placed on only one core.
Scheduling
When real-time tasks are created and started with Xenomai’s API, they are
scheduled with a First In First Out (FIFO) queue and start in the Xenomai
domain (also called the primary domain). When a real-time task is executing
in the primary domain it has higher priority than the Linux kernel. When a
real-time task is scheduled to start, Xenomai preempts the Linux kernel and all
conflicting running processes on the specified core. When a real-time task makes
a syscall owned by the Linux kernel, it changes domain to the secondary domain,
owned by the Linux kernel, and is then scheduled by the Linux kernel itself.
This is not the case for the Read, Write, or Init tasks, as they only use syscalls
owned by Xenomai.
Xenomai’s scheduler uses a system timer to determine when rescheduling should
occur. Xenomai does not set the system timer to be periodic but instead pro-
grams the system timer in a one-shot mode with the time of the closest coming
event scheduled in the timeline. One-shot mode simply means that the sys-
tem timer will only interrupt once when the time is due and then needs to be
reprogrammed if another event should be scheduled.
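The one-shot behaviour can be illustrated with the plain-Linux timerfd interface. This is only an analogue for illustration; Xenomai programs the hardware timer through its own scheduler core, not through timerfd:

```cpp
#include <cassert>
#include <cstdint>
#include <sys/timerfd.h>
#include <unistd.h>

// Arm a timer that fires exactly once after `ns` nanoseconds,
// mirroring one-shot mode: it_interval stays zero, so the timer is
// not periodic and must be re-armed for the next scheduled event.
int one_shot_timer(long ns) {
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    itimerspec spec{};
    spec.it_value.tv_sec = ns / 1000000000L;
    spec.it_value.tv_nsec = ns % 1000000000L;
    // spec.it_interval is {0, 0}: one shot only.
    timerfd_settime(fd, 0, &spec, nullptr);
    return fd;
}
```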
Cache
As shown in Table 2.2, the cache line size is 64 bytes. Knowing the line size can
be beneficial, as it is then possible to align data to 64 bytes. When data is
aligned, it starts at a memory address which is a multiple of 64. When the data
is placed in the cache, the entire line will be used, provided that the data type
is big enough.
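A sketch of such alignment in C++, padding a structure to the 64-byte line size from Table 2.2 so it occupies its cache line alone (the structure and field names are illustrative, not taken from the thesis application):

```cpp
#include <cassert>
#include <cstdint>

// Align the structure to a 64-byte cache line so it starts at an
// address that is a multiple of the line size and does not share a
// line with unrelated data (avoiding false sharing between cores).
struct alignas(64) LineAligned {
    std::int64_t timestamp_ns;
    std::int32_t pin;
};

static_assert(alignof(LineAligned) == 64, "must match the cache line size");
static_assert(sizeof(LineAligned) % 64 == 0, "padded to whole lines");
```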
It might, however, be problematic to use caches because of the increased
unpredictability that caches introduce. The unpredictability is caused by the
delay difference between a cache hit and a cache miss. The unpredictability
may be increased further because the application will compete with many other
applications executing in Linux.
It is possible to reduce the number of applications executing on the same CPU
by using CPU sets and isolcpus (described in section 3.2.1). This can decrease
the number of cache misses because fewer applications will interfere with the
same cache as the real-time application uses.
It is also possible to enable or disable the level 2 cache in an attempt to get
better performance. Enabling it would, however, let other applications
interfere, as all applications have access to the level 2 cache.
RTDM
When using the GPIO pins, RTDM is needed. If a regular Linux driver were
used to control the GPIO pins, the calling task would switch from the primary
domain to the secondary. That would make the Linux kernel the scheduler
instead of Xenomai, which would not guarantee timed executions. The RTDM drivers
provided with the Xenomai Cobalt have the devices located in /dev/rtdm, not
to be confused with regular Linux drivers within /dev. Each GPIO pin has its
own file within /dev/rtdm/pinctrl-bcm2835 and can be accessed separately.
In order for Xenomai to use drivers in real-time, all drivers need their
interrupts redirected to the I-Pipe instead of the Linux kernel's generic
interrupt handler. This was done in section 4.1, either within the I-Pipe patch
or in the backup.
Memory Locking
Linux uses an on-demand paging scheme, meaning that memory pages are
loaded into primary memory only when first used. This increases
unpredictability, as loading a page into primary memory after a page fault
causes additional latency. A task executing in the primary domain is forced
into the secondary domain if a page fault happens.
Fortunately, calling mlockall solves the problem. Mlockall forces the entire
process to be allocated in primary memory. This includes, but is not limited
to, userspace code, data, stack, shared libraries, shared memory, and
memory-mapped files. The pages are guaranteed to stay in primary memory
until the memory is later unlocked.
Since version 2.6.3, Xenomai's API automatically calls mlockall when the
application starts. As Xenomai API version 3.0.X was used, mlockall did not
need to be called manually in the application code.
Measuring
System halts During testing, various errors were encountered which caused
the system to halt and required a hard reboot. Implementation mistakes,
such as letting a real-time task spin without any sleep, would take up
all execution time, and such an application could not be canceled. The
problems were considered easy to solve but were sometimes time-consuming.
4.3 Summary
This chapter described how the Xenomai installation was done. Complications
with the Xenomai installation caused a backup plan to be used. The backup
was a distributed OS image with Xenomai version 3.0.5 already installed. The
application, as implemented, was detailed in every aspect deemed important.
5 Experimental Setup
Once Xenomai was installed and the application implemented, work on the
experimental setup was carried out. The experimental setup focused first on
how measuring the performance of the application could be automated, and
secondly on gathering the data needed for the analysis.
5.1 Automation
5.1.1 Function Generator
Before the automated testing started, it was necessary to set up the function
generator: how it should interact with the RPi and the properties of the
signals being sent. The function generator is capable of sending signals with a
much higher voltage than the RPi can handle, meaning a wrong setup could
potentially break the RPi.
The RPi used its GPIO pins at a voltage of 3.30 V, meaning that the function
generator needed to send its signals at 3.30 V as well. The function generator
was set up to send square waves, as only digital signals were considered. The
GPIO on the RPi used direct current, and the function generator was set up
accordingly.
The square waves of the function generator have an asymmetry of 1% + 5 ns.
A highly accurate function generator was, however, not important, because
the measurements are compared between the oscilloscope and the RPi.
The function generator was then set up to accept a trigger signal from the
RPi. When the function generator received the trigger, it would output a
specified number of signals, which both the RPi and the oscilloscope would
read. The frequency of the function generator was set to 10 kHz, as required
by the project.
5.1.2 Oscilloscope
It was important to set up the oscilloscope correctly, as it was the only device
used as a reference for comparison with the RPi.
Preparation
The oscilloscope had many properties which needed to be set up before the
automated testing. Each channel of the oscilloscope was set up with the same
properties. The horizontal position on the screen of the oscilloscope was also
set up. A few examples of the configured properties can be seen in the list below.
Coupling Was set to DC, as the signals are direct current and not alternating
current.
Bandwidth Was set to full; the options are either full or 20 MHz. On this
particular oscilloscope, the full option was 70 MHz.
Probe gain Was set to 1X.
The trigger of the oscilloscope was directly connected to the RPi on a separate
channel and GPIO pin. The trigger was set up to detect the rising edge of the
signal.
The width of the window on the oscilloscope was set to 20 ms. The window
width was important to set up, as a smaller width gives higher accuracy while
a larger width gives lower accuracy. The horizontal position sets the trigger
point location. It was changed to the leftmost position of the window. With a
window width of 20 ms, the leftmost position is −10 ms, as the center is 0 ms.
This was done to display as much as possible in the window without wasting
any width.
The record length of the oscilloscope was also set. The options were either
125k or 10M, and 10M was chosen. A higher record length results in more
accurate data because the measured points are closer together.
Receiving and Converting Data
The oscilloscope was set up to wait for a trigger before the application would
start. When the application finished, data from the oscilloscope was received
and converted into CSV files.
The data received from the oscilloscope was the output of the RPi and the
output from the function generator. The data was first converted into two
arrays containing the voltage for each measured point and the time since the
trigger. The two arrays were then used to determine when each signal started
and what length it had. The signal information was stored in a CSV file with
the same structure as before. See Figure 5.2 below for an illustration of the
conversion.
Figure 5.2: Illustration of the conversion from voltage and time samples to
signal start times and lengths.
The largest gap between measured points was measured to be 16 ns, which is
much smaller than the estimated accuracy of the RPi. It was therefore not
necessary to discuss oscilloscope inaccuracies in chapter 6.
5.1.3 Test Cases
A test case was a CSV file which the application read and output to the
oscilloscope, as mentioned before. Each test case was pseudo-randomly
generated with a number of signals, the length of each signal, and when the
signals should be sent. The number of readings the application should perform
was included in each test case, as well as a pseudo-randomly chosen delay for
when the trigger to the function generator should be sent. The test cases were
made pseudo-random in order to have a more varied set of tests.
A typical test case was extracted from the oscilloscope and can be viewed in
Figure 5.3 below.
Figure 5.3 shows an example of a test case on the oscilloscope. Channels 1
(yellow) to 4 (green) were used for each test. Channel 1 shows the output of
the RPi. Channel 2 shows the output from the function generator, which was
also read by the RPi. Channel 3 was used as a trigger for the oscilloscope and
channel 4 as a trigger for the function generator.
Figure 5.3: A test case example displayed on the oscilloscope
5.1.4 Stress
Applying stress to the RPi while running test cases was not without
difficulties. When the stress application (Stress-NG) was executing,
communication with the RPi through SSH was not possible. The SSH process
on the RPi was delayed for so long that attempts to communicate with it
timed out.
The issue was addressed with a second real-time application (the stress
controller), which starts the stress application and then, after a few seconds,
starts the main application. When the main application finished, the stress
controller closed the stress application. Figure 5.4 below illustrates the stress
controller. This solution is, however, not ideal, for at least three reasons:
Figure 5.4: The stress controller starts Stress-NG, waits a few seconds, starts
the main application, and closes Stress-NG when the main application finishes.
5.2 Data Collection
The automation process created a lot of measurement data. This section de-
scribes how the data was used to make it easier to analyze.
With each test case, two readings from the oscilloscope were made: the output
of the function generator and the output from the RPi.
The oscilloscope's readings of the RPi were compared with the CSV file which
the RPi used for writing. The comparison between the two files showed how
accurately the RPi wrote. The result was a CSV file showing the difference for
each signal. A negative value meant that the RPi wrote later than specified;
a positive value meant that the RPi wrote earlier.
The oscilloscope's readings of the function generator were compared with the
CSV file which the RPi application generated when reading the function
generator. The comparison showed how accurately the RPi could read. The
resulting CSV had the same form as in the previous comparison. A negative
value meant that the RPi thought it read earlier than the oscilloscope, most
probably because of the inaccuracy of the timestamp; a positive value meant
that the RPi read later than the oscilloscope.
All the generated data was read once a certain number of test cases had been
executed. The standard deviation of the data was then displayed to see
whether enough test cases had been made. If the standard deviation had
stabilized, the data was plotted into box plots, showing the time difference
results for writing and reading.
5.3 Complications
A few problems were encountered during the implementation of the
automation process, as well as during the data collection. Below is a list of
some of the problems which occurred and how they were handled.
Stress
Difficulties of adding stress to the system during the automation process
were described earlier in section 5.1.4. Because of the difficulties, it was
not clear how effective the stress application was within the limited time
frame in which it executed. Because of these complications and the
unknown effectiveness of the stress, it was decided not to use stress
testing at all. As a consequence, the measurements could not capture a
worst-case execution time and instead reflect an ideal case.
Oscilloscope
During the automated measuring process, the oscilloscope sometimes
disappeared from detection by the PC. No further communication with the
oscilloscope was then possible, and the automated measuring was canceled.
This problem was worked around by manually changing the USB settings on
the oscilloscope after each occurrence. Because of the problem, the
measurements could stop at any point in time and thus needed supervision.
The root cause of the problem was not identified.
5.4 Summary
This chapter gave insights and details on how the automation process was done,
how the data collection was achieved, and what complications occurred. Because
of the complications, the stress testing was not used. The automation process
needed supervision as the oscilloscope could stop responding at any time.
6 Results & Analysis
This chapter gives a detailed description of the results from the data
collection. The data distribution is shown first as a histogram, then as box
plots to clearly show the data. A few different measurement types were made,
and their differences are compared, evaluated, and briefly discussed. The
differences may provide insights relevant when using the system for future
purposes.
Different system configurations were used on the RPi 3 in order to try to
improve the results. Each system configuration is described in the subsections
below.
The level 2 cache is disabled by default on a RPi 3 with Raspbian.
Measurements were therefore also done with the level 2 cache enabled. The
application was also executed on a core together with regular programs.
Enabling the level 2 cache should change the number of cache misses and
cache hits during the execution of the application.
6.1.3 Regular
The regular data was gathered without using the isolated-core method and
with the level 2 cache disabled.
6.2 Results
In this section, the results are presented and explained. First, the data
distribution is shown and described. Afterward, each data type is shown in
box plots, then compared, and finally evaluated.
6.2.1 Latency
The latency test (described in section 3.3.3) was executed overnight together
with Stress-NG. The result from the latency test can be viewed in Table 6.1
below. The table shows the best, average, and worst encountered latency.
The reason why the best latency value is negative is that the timer interrupts
are triggered earlier than expected. This is because Xenomai tries to improve
the latencies by setting the timer trigger earlier. The result shows a latency
range of 87.96 µs.
6.2.2 Distribution
It is interesting to see how the data was distributed. The data distribution is
illustrated as a histogram below in Figure 6.1. Only the data distribution of
the Regular type is shown, because the data from the other measurement
types was distributed similarly.
Subfigures 6.1a and 6.1b clearly show which data occurred the most. An
interesting thing to notice is how the data distribution is almost split in two
in both subfigures. It is especially noticeable in Subfigure 6.1a.
The results of each measurement type can be seen in Figure 6.2 and
Figure 6.3 below. Only the minimum and maximum fliers are shown in the
box plots.
Figure 6.1: Histogram of reading and writing for the Regular type
Figures 6.2 and 6.3: Box plots of the time differences (ms) for the
measurement types Isol0, Isol1, Isol2, Isol3, Regular, and L2Cache. The boxes
lie around ±0.022 ms, while the minimum and maximum fliers reach roughly
±0.12 to ±0.13 ms.
6.3 Analysis
In this section, the data is analyzed. Each section below describes each part of
the data of interest.
The fliers show the worst execution times that occurred. A possible
explanation for the fliers being so far from the rest of the data could be that a
cache miss occurred when calling the read-timer function, causing a later
timer reading than usual.
The time differences show the interrupt latency from when the GPIO
interrupt occurred to when the reading task received the timer value.
Figure 2.1 shows a representation of the interrupt latency.
The time differences were the result of the timer interrupt latency of the
scheduler. Additional latency could be caused by task Read, as it had higher
priority than task Write. This means that task Write could be preempted by
task Read, or delayed before it could write. Another measurement was made
without task Read enabled to demonstrate the difference; see Figure 6.4 below.
Figure 6.4 clearly shows an improvement when task Read does not interrupt
task Write, thus confirming that the additional latency was caused by task
Read. The latency which remained was the scheduling latency. When
comparing the results from Figure 6.4 and Table 6.1, it can be seen that the
range from the figure (51.20 µs) was close to the worst case (82.00 µs) from
the table.
The data type Isol0 had better reading performance than the other data
types. This is very likely explained by where the interrupts were handled.
Almost all interrupts were handled by core 0; the exceptions were, for
example, rescheduling interrupts or function call interrupts, which appeared
on all cores. This was confirmed by reading /proc/interrupts on the RPi,
which showed which core each interrupt was handled on.
Figure 6.4: Box plot of writing differences comparing with and without reading
6.4 Summary
This chapter has presented the results sought throughout the thesis. The
results have been analyzed and compared based on information given in
previous chapters.
7 Conclusions & Future Work
This chapter describes the conclusions of the project. The final results are
shown and discussed. The requirements set at the beginning of the project are
then compared with the results. Lastly, examples of possible future work are
described and motivated.
7.1 Conclusions
The final conclusions will be based on the measurement results from the
previous chapter, on the process of the Xenomai installation and the
application implementation, and also on the automation process. The results
will then be evaluated based on the requirements from the project description
in section 1.2.
Installing Xenomai on the RPi 3 proved to be more difficult than anticipated.
Much time was spent researching other developers' success with Xenomai on
the RPi and trying to follow their guidelines. The Xenomai installation was
deemed unsuccessful, as the RPi 3 would either not boot or be unstable.
Thanks to the backup, it was possible to continue the project and gather the
data needed. Because the Xenomai setup was unsuccessful, an initial idea of
configuring the kernel in ways that might improve the measurements was
scrapped, as it was not possible with the backup plan.
Implementing the application was a success, thanks to the available exercises
on how to use Xenomai's API and the documentation of Xenomai's API.
The final results which are used to determine if the requirements are met are
from the measurement type Isolate Core as they had the best performance. A
summary can be seen below in Table 7.1.
            Together                          Separate
        Min         Max        Range         Min         Max        Range
Reading −28.10 µs   38.10 µs   66.20 µs      −28.10 µs   38.10 µs   66.20 µs
Writing −142 µs     116 µs     258 µs        −38.10 µs   18.10 µs   56.20 µs

Table 7.1: Summary of the reading and writing results, with the tasks
scheduled together and separately.
What can be seen in the table is that when task Read and task Write were
scheduled together, task Read caused worse performance for task Write, as
explained earlier. What is interesting is that task Write had better
performance than task Read when they were executed separately.
The project had requirements for the implementation in order to determine
whether it was successful or not. The requirements can be seen in the list below.
✓ The approach chosen needs to support the hardware of the RPi 3.
✓ The preferred GPOS is Raspbian.
✓ The approach needs hard real-time support.
✓ The system needs to handle real-time tasks which won't be interrupted or
degraded while using generic tasks from the GPOS.
✓ The system needs to be able to set up communication between real-time
tasks and general-purpose tasks.
✗ The implementation of the system needs to be able to read and write GPIO
pins with a time precision of ±10 µs, and include timestamps within the
same precision.
✓ The implementation of the system needs to be able to monitor and log
data, either during or after the real-time measurements have been
performed.
What can be seen from the list is that the time precision requirement was not
met. This is because the minimum and maximum results in Table 7.1
exceeded the required time precision of ±10 µs.
The box plots in chapter 6 showed short boxes, meaning that the majority of
the measurement data had very small variance. This means that with enough
test samples, the minority of the measurement data could be discarded. In
other words, the system can be used with firm real-time characteristics at a
better precision than with hard real-time characteristics.
The measurements shown displayed only a system executing in an idle state,
which can be interpreted as ideal execution time instead. The latency test in
section 6.2.1 was, however, done with stress testing and can be used as a
reference point for how far the GPIO measurements could extend.
7.2 Future Work
Future work can be done on the project. A few possibilities are described in
the list below.
• Modifying the RTDM GPIO device driver to read and write pins using
bitmasks instead of selecting separate pins. This would allow users to read
and write multiple pins concurrently.
• Creating an experimental setup with effective stress testing of the system.
This would create more reliable data, as the Cobalt core shares resources
with the Linux kernel.
• Adding the PREEMPT_RT patch to the Linux kernel together with
Xenomai Cobalt. PREEMPT_RT makes the Linux kernel more preemptible,
thus reducing the unpredictability whenever a real-time task in Cobalt
needs to make a syscall to the Linux kernel.
7.3 Summary
This chapter has described the overall results and conclusions of the project. The
final results have been shown in Table 7.1. The project has been compared with
its requirements which were set at the beginning of the project. All requirements
except for one were met. Examples of future work were also described and
motivated.
References
[1] Giorgio C. Buttazzo. Hard Real-Time Computing Systems. Vol. 24. Real-Time Systems Series. Boston, MA: Springer US, 2011. isbn: 978-1-4614-0675-4. doi: 10.1007/978-1-4614-0676-1. url: http://link.springer.com/10.1007/978-1-4614-0676-1 (visited on 26/01/2018).
[2] Phillip A. Laplante and Seppo J. Ovaska. Real-Time Systems Design and Analysis: Tools for the Practitioner. 4th ed. The Institute of Electrical and Electronics Engineers, Inc.: John Wiley & Sons, Inc., 2012. isbn: 978-1-118-13660-7.
[3] Power Supply - Raspberry Pi Documentation. 2018. url: https://www.raspberrypi.org/documentation/hardware/raspberrypi/power/README.md (visited on 07/05/2018).
[4] Power Management Statistics: Information Technology - Northwestern University. Oct. 2017. url: http://www.it.northwestern.edu/hardware/eco/stats.html (visited on 07/05/2018).
[5] Victor Yodaiken. "The RTLinux Manifesto". In: Linux Expo 5 (1999), p. 12.
[6] Real-Time Linux Wiki. Aug. 2016. url: https://rt.wiki.kernel.org/index.php/Main_Page (visited on 07/05/2018).
[7] RTAI - the RealTime Application Interface for Linux. Jan. 2018. url: https://www.rtai.org/ (visited on 09/02/2018).
[8] Xenomai. 2018. url: https://xenomai.org/ (visited on 09/02/2018).
[9] Jonathan Corbet. Deadline scheduling for Linux [LWN.net]. Oct. 2009. url: https://lwn.net/Articles/356576/ (visited on 07/05/2018).
[10] A. Stahlhofen and D. Zöbel. "Linux SCHED_DEADLINE vs. MARTOP-EDF". In: 2015 IEEE 13th International Conference on Embedded and Ubiquitous Computing. Oct. 2015, pp. 168–172. doi: 10.1109/EUC.2015.28.
[11] Philip Axer et al. "Building timing predictable embedded systems". In: ACM Transactions on Embedded Computing Systems (TECS) 13.4 (2014), p. 82.
[12] Steven J. Vaughan-Nichols. Twenty Years of Linux according to Linus Torvalds. ZDNet. Apr. 2011. url: http://www.zdnet.com/article/twenty-years-of-linux-according-to-linus-torvalds/ (visited on 07/02/2018).
[13] Mobile operating systems' market share worldwide from January 2012 to December 2017. 2017. url: https://www.statista.com/statistics/272698/global-market-share-held-by-mobile-operating-systems-since-2009/ (visited on 07/02/2018).
[14] GNU Operating System. What is Copyleft. 2018. url: https://www.gnu.org/copyleft/ (visited on 07/02/2018).
[15] Kernel Definition. May 2004. url: http://www.linfo.org/kernel.html (visited on 26/01/2018).
[16] Linus Torvalds. Linux kernel stable tree. May 2018. url: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git (visited on 08/05/2018).
[17] Philippe Gerum. "Xenomai - Implementing a RTOS emulation framework on GNU/Linux". In: White Paper, Xenomai (2004).
[18] Claudio Scordino and Giuseppe Lipari. "Linux and real-time: Current approaches and future opportunities". In: ANIPLA International Congress, Rome. 2006.
[19] Real-Time Linux. May 2017. url: https://wiki.linuxfoundation.org/realtime/start (visited on 26/01/2018).
[20] Paul McKenney. A real-time preemption overview. Aug. 2005. url: https://lwn.net/Articles/146861/ (visited on 19/01/2018).
[21] Paul McKenney. Sleeping Spinlocks. July 2017. url: https://wiki.linuxfoundation.org/realtime/documentation/technical_details/sleeping_spinlocks (visited on 09/05/2018).
[22] Paul McKenney. Technical details of PREEMPT_RT patch. Feb. 2017. url: https://wiki.linuxfoundation.org/realtime/documentation/technical_details/start (visited on 09/05/2018).
[23] J. Calandrino, H. Leontyev, A. Block, U. Devi and J. Anderson. "LITMUS^RT: A Testbed for Empirically Comparing Real-Time Multiprocessor Schedulers". In: IEEE Real-Time Systems Symposium 27 (Dec. 2006), pp. 111–123.
[24] B. Brandenburg. "Scheduling and Locking in Multiprocessor Real-Time Operating Systems". PhD thesis. Chapel Hill: UNC, 2011.
[25] LITMUS-RT: Linux Testbed for Multiprocessor Scheduling in Real-Time Systems. 2017. url: https://www.litmus-rt.org/index.html (visited on 20/02/2018).
[26] Karim Yaghmour et al. Building embedded Linux systems: concepts, techniques, tricks & traps. Ed. by Karim Yaghmour. 2nd ed. Beijing: O'Reilly, 2008. 439 pp. isbn: 978-0-596-52968-0.
[27] Start Here: Xenomai. 2018. url: https://xenomai.org/start-here/ (visited on 15/02/2018).
[28] J. Kiszka. "The real-time driver model and first applications". In: 7th Real-Time Linux Workshop, Lille, France. 2005.
[29] Wind River Systems, Inc. VxWorks: Product Overview. Sept. 2016.
[30] Raspberry Pi. Jan. 2018. url: https://en.wikipedia.org/w/index.php?title=Raspberry_Pi&oldid=820745037 (visited on 18/01/2018).
[31] ARM. ARM Cortex-A53 MPCore Processor Technical Reference Manual. 2013. url: http://infocenter.arm.com/help/topic/com.arm.doc.ddi0500d/DDI0500D_cortex_a53_r0p2_trm.pdf (visited on 17/01/2018).
[32] Raspberry Pi 3 is out now! Specs, benchmarks & more. The MagPi Magazine. Feb. 2016. url: https://www.raspberrypi.org/magpi/raspberry-pi-3-specs-benchmarks/ (visited on 20/02/2018).
[33] Raspberry Pi Downloads - Software for the Raspberry Pi. Raspberry Pi. 2018. url: https://www.raspberrypi.org/downloads/ (visited on 12/02/2018).
[34] TalOrg. Build 64-bit kernel for Raspberry Pi 3, using native tools. Mar. 2017. url: http://www.tal.org/tutorials/raspberry-pi3-build-64-bit-kernel (visited on 12/02/2018).
[35] Kernel source tree for Raspberry Pi Foundation. Jan. 2018. url: https://github.com/raspberrypi/linux (visited on 26/01/2018).
[36] Youssef Zaki. "An embedded multi-core platform for mixed-criticality systems: Study and analysis of virtualization techniques". Master thesis. Stockholm, Sweden: KTH, 2016. 65 pp.
[37] Nitin Kulkarni. "Real-time audio processing for an embedded Linux system using a dual-kernel approach". Master thesis. KTH, 2017.
[38] Salman Rafiq. "Measuring Performance of Soft Real-Time Tasks on Multi-core Systems". Master thesis. KTH, 2011.
[39] Adam Lundström. "Finding strategies for executing Ada-code in real-time on Linux using an embedded computer". Master thesis. Stockholm: KTH, 2016. url: http://kth.diva-portal.org/smash/get/diva2:931386/FULLTEXT01.pdf.
[40] Radoslaw Rybaniec and Piotr Z. Wieczorek. "Measuring and minimizing interrupt latency in Linux-based embedded systems". In: Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2012. Vol. 8454. 2012. doi: 10.1117/12.2000230.
[41] Installing Xenomai 3.x: Xenomai. 2018. url: https://xenomai.org/installing-xenomai-3-x/ (visited on 23/02/2018).
[42] Koide Masahiro. Raspberry Pi 3 and real-time kernel, introduction of Xenomai. Japanese. Aug. 2016. url: http://artteknika.hatenablog.com/entry/2016/08/23/143400 (visited on 18/06/2018).
[43] Christophe Blaess. Xenomai sur Raspberry Pi 3 : bon espoir mais bilan mitigé. French. Mar. 2017. url: https://www.blaess.fr/christophe/2017/03/20/xenomai-sur-raspberry-pi-3-bilan-mitige/ (visited on 18/06/2018).
[44] Harco Kuppens. Raspberry Pi image for the Pi zero, 1, 2, 3 with Xenomai 3.0.5 on Raspbian Linux 4.1.y Debian 8 jessie. Aug. 2017. url: http://www.cs.kun.nl/lab/xenomai/ (visited on 13/02/2018).
[45] Xenomai 3.0.5 API. 2018. url: https://xenomai.org/documentation/xenomai-3/html/xeno3prm/index.html (visited on 07/03/2018).
[46] IVI Foundation. The VISA Library. VXIplug&play Systems Alliance, Oct. 2017. (Visited on 09/03/2018).
[47] PyVISA. PyVISA: Control your instruments with Python. 2016. url: https://pyvisa.readthedocs.io/en/stable/ (visited on 09/03/2018).
[48] Ubuntu Manpage: stress-ng - a tool to load and stress a computer system. Ubuntu manuals. Mar. 2016. url: http://manpages.ubuntu.com/manpages/xenial/man1/stress-ng.1.html (visited on 09/03/2018).
[49] Xenomai. I-Pipe download page. Oct. 2017. url: https://xenomai.org/downloads/ipipe/ (visited on 15/02/2018).
[50] Kernel building - Raspberry Pi Documentation. 2018. url: https://www.raspberrypi.org/documentation/linux/kernel/building.md (visited on 16/02/2018).
[51] Xenomai Git Repositories. 2018. url: http://git.xenomai.org/ (visited on 16/02/2018).
[52] Harco Kuppens. Xenomai 3 on RPI with GPIO. Aug. 2017. url: https://github.com/harcokuppens/xenomai3_rpi_gpio (visited on 24/04/2018).
A Xenomai Setup (Unsuccessful Attempt Using RPi Kernel)
This approach used a guide created by Harco Kuppens [52], in which he
reported successfully setting up Xenomai Cobalt on a RPi 3. In order to have
a Xenomai Cobalt core on the RPi 3, a few steps needed to be done. A kernel
version needed to be decided on. This was done by comparing the available
versions of the I-Pipe. It is important that the versions are as close as
possible, because the I-Pipe is version dependent and is later patched onto
the kernel.
A patch is a file created with git diff, which essentially lists all the differences between the latest commit and the unstaged files. Such patches can then be applied to other kernel repositories.
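As a sketch of the workflow described above, the following commands build a toy repository, create a patch from an unstaged change with git diff, and apply it with patch. All paths and file contents here are hypothetical, chosen only for the demonstration:

```shell
set -e
rm -rf /tmp/patch-demo && mkdir -p /tmp/patch-demo && cd /tmp/patch-demo
git init -q repo && cd repo
echo "obj-y += gpio.o" > Makefile
git add Makefile
git -c user.email=demo@example.com -c user.name=demo commit -qm "initial"
echo "obj-y += gpio-ipipe.o" >> Makefile   # an unstaged change
git diff > ../demo.patch                   # the patch: diff vs the last commit
git checkout -- Makefile                   # restore the pristine tree
patch -p1 < ../demo.patch                  # apply it, as done with I-Pipe
grep "gpio-ipipe" Makefile                 # the change is back in the tree
```

The -p1 flag strips the leading a/ and b/ path components that git diff emits, which is why the same flag is used when applying the I-Pipe patch to the kernel tree.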
Both the kernel and the Xenomai API had to be built, which was done using a cross-compiler. A cross-compiler is essentially an application that translates source code into machine code for a different architecture. Cross-compilers are commonly used when building for embedded systems, because embedded hardware is usually much slower than the host computer.
A.0.1 Preparation
The parts needed were: the kernel source, a cross-compiler, the Xenomai source, and the I-Pipe patch.
Acquiring I-Pipe
As mentioned earlier, the kernel and I-Pipe needed to have the same version, or versions as close as possible. The available releases of I-Pipe can be acquired from Xenomai's download page[49]. It did not, however, include a release for the RPi 3, so a similar version was acquired instead. In addition to that patch, a second one was downloaded to fix certain configurations in the kernel source tree. A third file, an updated version of the GPIO device driver used by the RPi 3, was also acquired; it had been updated to work with I-Pipe.
~/$ wget https://raw.githubusercontent.com/harcokuppens/xenomai3_rpi_gpio/master/install/patches_for_pi2+/ipipe-core-4.1.18-arm-9.fixed.patch
~/$ wget http://www.blaess.fr/christophe/files/article-2016-05-22/patch-xenomai-3-on-bcm-2709.patch
~/$ wget -O pinctrl-bcm2835.c "http://git.xenomai.org/ipipe.git/plain/drivers/pinctrl/bcm/pinctrl-bcm2835.c?h=vendors/raspberry/ipipe-4.1"
Acquiring Kernel
The RPi has its own kernel source tree, accessible on The Raspberry Pi Foundation's official git repository[35]. The kernel source tree was downloaded with the command:
~/$ git clone https://github.com/raspberrypi/linux.git ~/rpi-kernel
The latest version in the kernel source tree differed from the sought-after version, so it had to be changed to 4.1.21. That was done by checking out the branch whose latest commit has the needed kernel version:

~/rpi-kernel$ git checkout rpi-4.1.y
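The checked-out version can be confirmed from the kernel's top-level Makefile, whose VERSION, PATCHLEVEL and SUBLEVEL fields encode the release. The sketch below runs against a mock Makefile, since the real source tree is not assumed to be present; in an actual tree, `make kernelversion` prints the same string:

```shell
mkdir -p /tmp/kver && cd /tmp/kver
# Mock of the first lines of the real tree's top-level Makefile
printf 'VERSION = 4\nPATCHLEVEL = 1\nSUBLEVEL = 21\n' > Makefile
awk -F' = ' '/^(VERSION|PATCHLEVEL|SUBLEVEL)/ {v = v sep $2; sep="."} END {print v}' Makefile
```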
Acquiring Xenomai
The Xenomai source can be found on Xenomai's git repository page[51]. The latest Xenomai version was 3.x, and it was downloaded with the command:
~/$ git clone http://git.xenomai.org/xenomai-3.git/ ~/xenomai
The preparations were then done, and the next step was to continue with the Xenomai setup. The steps for the Cobalt core and the Mercury core are similar, but the Cobalt core has slightly more configurations that need to be made.
Configure Kernel
The kernel source had to be prepared with a Xenomai script. What this script does is add the Cobalt core into the kernel source, together with the I-Pipe patch. This was done with the commands:

~/xenomai$ ./scripts/prepare-kernel.sh --linux=~/rpi-kernel --arch=arm --ipipe=../ipipe-core-4.1.18-arm-9.fixed.patch
~/rpi-kernel$ patch -p1 < ../patch-xenomai-3-on-bcm-2709.patch
~/rpi-kernel$ cp ../pinctrl-bcm2835.c ./drivers/pinctrl/bcm/
The configuration file was then copied over to the host computer, replacing ~/rpi-kernel/.config. After that, the kernel could be configured. Configuration was done using the built-in menuconfig, which was run with the command:
~/rpi-kernel$ make ARCH=arm menuconfig
Some features had to be edited in order for Xenomai to work properly.

CONFIG_CPU_FREQ This option allows the CPU frequency to be modulated depending on the workload. The feature adds unpredictability and had to be disabled.

CONFIG_CPU_IDLE This feature allows the CPU to enter sleep states. As it takes time for the CPU to be woken up, it adds latencies and unpredictability. The feature could also cause Xenomai's timers to stop working. It therefore had to be disabled.

CONFIG_KGDB Is a kernel debugger which can only be enabled for x86 architectures. As the RPi 3 is an ARM architecture, it had to be disabled.

CONFIG_CONTEXT_TRACKING_FORCE This option was automatically disabled by the I-Pipe patch.

XENO_DRIVERS_GPIO_BCM2835 Is the option to use the GPIO on the RPi. It was enabled with "m", meaning that it is built as a loadable kernel module.
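After menuconfig, the choices above can be verified directly in the generated .config, where disabled options appear as "# ... is not set" and module-built options end in "=m". The sketch below uses a mock .config, since the real kernel tree is not assumed to be present:

```shell
mkdir -p /tmp/kcfg && cd /tmp/kcfg
# Mock of the relevant lines of ~/rpi-kernel/.config
printf '%s\n' \
  '# CONFIG_CPU_FREQ is not set' \
  '# CONFIG_CPU_IDLE is not set' \
  'CONFIG_XENO_DRIVERS_GPIO_BCM2835=m' > .config
grep -E 'CPU_(FREQ|IDLE)|XENO_DRIVERS_GPIO_BCM2835' .config
```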
Installing the Kernel
When the configuration was done, the kernel could finally be built. This was done with the command:

~/rpi-kernel$ make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- zImage modules dtbs -j12
The next step was to install the kernel in the proper destination. The Raspbian OS was already installed on the SD card, which was mounted at /mnt/rpi. The mount contains two partitions: boot, which only handles the boot sequence that selects the kernel and its arguments, and rootfs, which contains the rest of the OS.
The parts that needed to be installed were the kernel image, the modules, and the device tree blobs (dtbs). These were installed with the commands:

~/rpi-kernel$ sudo make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- INSTALL_MOD_PATH=/mnt/rpi/rootfs modules_install
~/rpi-kernel$ sudo cp arch/arm/boot/zImage /mnt/rpi/boot/kernel7.img
~/rpi-kernel$ sudo cp arch/arm/boot/dts/*.dtb /mnt/rpi/boot
~/rpi-kernel$ sudo cp arch/arm/boot/dts/overlays/*.dtb* /mnt/rpi/boot/overlays
The kernel was then finally installed with Xenomai Cobalt. The next step was to install the Xenomai API so that real-time applications could be implemented. Because the Xenomai source was acquired via Git, a configuration script had to be executed in order to generate the necessary Makefiles. The script was executed with the command:
~/xenomai$ ./scripts/bootstrap
The next step required a build root directory and a staging path. The build root was needed to store the files that the coming commands would generate. The staging path is usually a path where installed files are stored temporarily before being moved to their final location. In this case, however, the files could be stored directly in their final path on the SD card.

An installation configuration had to be generated for the specific platform that was going to be used. This was done with the commands:
~/xenomai$ cd buildroot
~/xenomai/buildroot$ ../configure \
    CFLAGS="-mcpu=cortex-a53 -march=armv7-a" \
    LDFLAGS="-mcpu=cortex-a53 -march=armv7-a" \
    --enable-smp \
    --disable-debug \
    --build=i686-pc-linux-gnu \
    --host=arm-linux-gnueabihf \
    --with-core=cobalt
The arguments in the script describe the architecture of the platform. The CPU used in the RPi 3 is an ARM Cortex-A53. The OS used was Raspbian, which is a 32-bit OS, meaning that the armv7-a instruction set is necessary.
When the configuration step was done, the installation of the API could finally start. The installation was done with the command:
~/xenomai/buildroot$ make DESTDIR=/mnt/rpi/rootfs install
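The DESTDIR variable used above is a standard make convention: it prefixes every install path, redirecting the whole installation into another root (here the SD card's rootfs). A minimal sketch with a mock Makefile, not the real Xenomai one:

```shell
set -e
rm -rf /tmp/destdir-demo && mkdir -p /tmp/destdir-demo && cd /tmp/destdir-demo
# Mock Makefile whose install target honours DESTDIR
printf 'PREFIX = /usr/xenomai\ninstall:\n\tmkdir -p $(DESTDIR)$(PREFIX)/bin\n\ttouch $(DESTDIR)$(PREFIX)/bin/xeno-config\n' > Makefile
make DESTDIR=/tmp/destdir-demo/rootfs install
ls rootfs/usr/xenomai/bin                  # the file landed under the staged root
```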
B GPIO Template
B.1 Makefile
# Xenomai setup
XENO_CONFIG := /usr/xenomai/bin/xeno-config
XENO_CFLAGS := $(shell $(XENO_CONFIG) --rtdm --alchemy --cflags)
XENO_LDFLAGS := $(shell $(XENO_CONFIG) --rtdm --alchemy --ldflags)
CC := arm-linux-gnueabihf-gcc
# The Target
TARGET := main
# Directories and extensions. These definitions were missing from the
# original listing; the values below are assumed defaults.
SRCDIR := src
INCDIR := include
BUILDDIR := build
SRCEXT := cpp
INCEXT := h
OBJEXT := o
# Use the Xenomai flags when compiling and linking
CFLAGS := $(XENO_CFLAGS)
LDFLAGS := $(XENO_LDFLAGS)
INC := -I$(INCDIR)
#
# DO NOT EDIT BELOW THIS LINE
#
SOURCES := $(wildcard $(SRCDIR)/*.$(SRCEXT))
INCLUDES := $(wildcard $(INCDIR)/*.$(INCEXT))
OBJECTS := $(patsubst $(SRCDIR)/%,$(BUILDDIR)/%,$(SOURCES:.$(SRCEXT)=.$(OBJEXT)))
# Default Make
all: $(TARGET)
# Clean
clean:
	@$(RM) -r $(BUILDDIR) $(TARGET) *.map
	@echo "cleaned"
# Link
$(TARGET): $(OBJECTS)
	@echo Linking $@
	@$(CC) -o $(TARGET) $^ $(LDFLAGS)
	@size $(TARGET)
# Compile
$(BUILDDIR)/%.$(OBJEXT): $(SRCDIR)/%.$(SRCEXT)
	@echo "building "$@
	@mkdir -p $(dir $@)
	@$(CC) $(CFLAGS) $(INC) -c -o $@ $<
# Non-File Targets
.PHONY: all clean
B.2 Code
//! @file gpio_template.cpp
//! @author Gustav Johansson <[email protected]>
//! @brief A template for using RTDM GPIO on RPi3 with Xenomai
//! Cobalt kernel. Don’t forget to load the GPIO device
//! driver module with "modprobe xeno-gpio-bcm2835"
//! before running.
//! The device driver in the mainline linux tree is very different from the
//! rpi linux tree. A problem with this is that the gpio pins that appear
//! with the gpio module do not have the same numbers as on a regular RPi.
//! The pins can be seen in /dev/rtdm/pinctrl-bcm2835/. For me, the pins
//! started at 970 and ended at 1023. The pins work without any problems,
//! however; you only need to know which pin corresponds to a regular pin,
//! which is easy: 970+16=986 corresponds to gpio pin 16 (bcm).
#include <stdio.h>
#include <alchemy/task.h>
#include <rtdm/gpio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>

//! NOTE: the helper definitions below were lost at a page break in the
//! original listing; they are reconstructed from the calls in rtTask()
//! and are an approximation, not the original code.
int pinInit(const char* path, int oflag, unsigned int request, void* arg)
{
    int pin = open(path, oflag);           // open the RTDM pin device
    if(pin < 0){
        rt_printf("Error:%d couldn't open %s\n", pin, path);
        return pin;
    }
    if(ioctl(pin, request, arg) < 0){      // configure direction/IRQ
        rt_printf("couldn't configure %s\n", path);
    }
    return pin;
}

int pinGet(int pin)                        // read the current pin value
{
    int value = 0;
    read(pin, &value, sizeof(value));
    return value;
}

void pinSet(int pin, int value)            // write a value to the pin
{
    write(pin, &value, sizeof(value));
}
// real-time task
RT_TASK task;

void rtTask(void* arg)
{
    int edges = GPIO_TRIGGER_EDGE_RISING | GPIO_TRIGGER_EDGE_FALLING;
    int writeValue = 0;
    int retval;

    //! @brief open and read value from GPIO20 (BCM) using interrupt.
    //! input change detect on rising and falling edge.
    rt_printf("INPUT IRQ\n");
    int pinInputIRQ = pinInit("/dev/rtdm/pinctrl-bcm2835/gpio990",
                              O_RDONLY,
                              GPIO_RTIOC_IRQEN,
                              &edges);
    rt_printf("read value:%d\n", pinGet(pinInputIRQ));

    //! @brief open an output pin. The opening lines of this call were
    //! lost at a page break in the original listing and are reconstructed
    //! here; the pin path is an assumption following the 970+n mapping
    //! described above (970+21 = gpio991 for BCM pin 21).
    rt_printf("OUTPUT\n");
    int pinOutput = pinInit("/dev/rtdm/pinctrl-bcm2835/gpio991",
                            O_WRONLY,
                            GPIO_RTIOC_DIR_OUT,
                            &writeValue);

    for(int i=0; i<5; ++i){
        pinSet(pinOutput, writeValue = !writeValue);
        retval = rt_task_sleep(500000000);
        if(retval < 0){
            rt_printf("Error:%d couldn't put task to sleep\n", retval);
        }
    }
}
//! The start of main() was lost at a page break in the original listing;
//! its opening is reconstructed here.
int main(int argc, char* argv[])
{
    int retval;

    // create task
    retval = rt_task_create(&task, "rtTask", 0, 99, T_JOINABLE);
    if(retval < 0){
        rt_printf("Error:%d couldn't create task.\n", -retval);
    }

    // start task
    retval = rt_task_start(&task, rtTask, NULL);
    if(retval < 0){
        rt_printf("Error:%d couldn't start task.\n", -retval);
    }

    // join task
    retval = rt_task_join(&task);
    if(retval < 0){
        rt_printf("Error:%d couldn't join task.\n", -retval);
    }

    return retval;
}
TRITA-EECS-EX-2018:179
www.kth.se