Rust Language Tutorial (Basics)
Written by: Apriorit Inc.
Author: Alexey Lozovsky, Software Designer in System Programming Team
https://www.apriorit.com
info@apriorit.com
Introduction
This Rust Programming Language Tutorial and feature overview was prepared by system
programming professionals from the Apriorit team. The tutorial covers the main features of
the Rust programming language in depth, provides examples of their use, and gives a brief
comparative analysis with C++ in terms of complexity and capabilities.
Rust is a relatively new systems programming language, but it has already gained a lot of
loyal fans in the development community. Created as a low-level language, Rust has
managed to achieve goals that are usually associated with high-level languages.
The main advantages of Rust are its concurrency support, safety, and speed, which it
achieves by eliminating data races, avoiding a garbage collector, and relying on zero-cost
abstractions. Unlike many other popular programming languages, Rust offers a minimal
runtime together with safety checks, while also providing a wide range of libraries and
bindings to other languages.
This tutorial is divided into sections, with each section covering one of the main features of
the Rust language:
● zero-cost abstractions
● move semantics
● guaranteed memory safety
● threads without data races
● trait-based generics
● pattern matching
● type inference
● minimal runtime
● efficient C bindings
In addition, we have added a detailed chart comparing the feature set of Rust to that of
C++. As a leading language for low-level software development, C++ serves as a great
reference point for illustrating the advantages and disadvantages of Rust.
This tutorial will be useful for anyone who is just starting their journey with Rust, as well
as for those who want to gain a more in-depth perspective on Rust features.
Table of Contents
Introduction
Summary of Features
Trait-Based Generics
Traits Define Type Interfaces
Traits Implement Polymorphism
Traits May be Implemented Automatically
Pattern Matching
Type Inference
Minimal Runtime
Efficient C Bindings
Calling C from Rust
The Libc Crate and Unsafe Blocks
Beyond Primitive Types
Calling Rust from C
Rust can be used for web applications as well as for backend operations
due to the many libraries that are available through the Cargo package registry.
Summary of Features
Before describing the features of Rust, we’d like to mention some issues
that the language successfully manages.
Issue | Rust’s Solution
Use-after-free, double-free bugs, dangling pointers | Smart pointers and references avoid these issues by design
Legacy design of utility types heavily used by the standard library | Built-in, composable, structured types: tuples, structures, enumerations
Embedded and bare-metal programming place high restrictions on runtime environment | Minimal runtime size (which can be reduced even further); absence of built-in garbage collector, thread scheduler, or virtual machine
Using existing libraries written in C and other languages | Only header declarations are needed to call C functions from Rust, or vice versa
Now let’s look more closely at the features provided by the Rust
programming language and see how they’re useful for developing system
software.
Zero-Cost Abstractions
“What you don’t use, you don’t pay for.” And further: “What you do
use, you couldn’t hand code any better.”
In general, abstracted code is less efficient than specific code. However, with clever
language design and compiler optimizations, some abstractions can be
made to have effectively zero runtime cost. The usual sources of these
optimizations are static polymorphism (templates) and aggressive inlining,
both of which Rust embraces fully.
// And here is what we can see if we print out the resulting vector:
println!("{:?}", numbers); // ===> [6, 7, 8, 10]
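The combinator chain that produces this vector is not reproduced above. As a minimal sketch, assuming two hypothetical input slices `data1` and `data2` (sample values chosen here so that the result matches the printed vector) and a hypothetical helper name, it might look like this:

```rust
// Hypothetical helper: multiplies the two inputs pairwise, keeps products
// greater than 5, and collects at most four of them.
fn first_large_products(data1: &[i32], data2: &[i32]) -> Vec<i32> {
    data1
        .iter()
        .zip(data2.iter())   // pairs: (3, 2), (1, 7), (4, 1), ...
        .map(|(a, b)| a * b) // products: 6, 7, 4, ...
        .filter(|n| *n > 5)  // drop products that are not greater than 5
        .take(4)             // stop after four results
        .collect()           // gather the results into a vector
}

fn main() {
    // Assumed sample data, not taken from the original text.
    let data1 = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5];
    let data2 = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9];
    let numbers = first_large_products(&data1, &data2);
    println!("{:?}", numbers); // prints [6, 7, 8, 10]
}
```

Collecting into a different container only requires changing the return type of the helper, since `collect()` picks the container from the expected type.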
Combinators use high-level concepts such as closures and lambda
functions, which would have significant costs if compiled naively. However,
thanks to optimizations powered by LLVM, this code compiles as efficiently as
the explicit hand-coded version shown here:
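use std::cmp::min;

// `data1` and `data2` stand for the two input slices (names assumed here).
let mut numbers = Vec::new();
for i in 0..min(data1.len(), data2.len()) {
    let n = data1[i] * data2[i];
    if n > 5 {
        numbers.push(n);
    }
    if numbers.len() == 4 {
        break;
    }
}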
While this version is more explicit in what it does, the code using
combinators is easier to understand and maintain. Switching the type of
container where values are collected requires changes in only one line with
combinators versus three in the expanded version. Adding new conditions
and transformations is also less error-prone.
The C++ standard library has a shared_ptr template class that’s used to
express shared ownership of an object. Internally, it uses reference
counting to keep track of an object’s lifetime. An object is destroyed when
its last shared_ptr is destroyed and the count drops to zero.
However, some objects (e.g. tree nodes) may need shared ownership but
may not need to be shared between threads. Atomic operations are
unnecessary overhead in this case. It may be possible to implement some
non_atomic_shared_ptr class, but accidentally sharing it between threads
would lead to data races. Rust, by contrast, provides the single-threaded
reference-counted pointer Rc and allows the
compiler to ensure at compilation time that Rcs are never shared between
threads (more on this later). Therefore, it’s not possible to accidentally
share data that isn’t meant to be shared, and we are freed from the
unnecessary overhead of atomic operations.
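As a rough illustration of this distinction, the sketch below uses the standard library's `Rc` (non-atomic counter) and `Arc` (atomic counter). The function names and sample values are hypothetical and chosen only for this example.

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

// Single-threaded sharing: Rc keeps a plain (non-atomic) reference count.
fn rc_owner_count() -> usize {
    let node = Rc::new(String::from("tree node"));
    let alias = Rc::clone(&node);
    // Uncommenting the next line fails to compile, because Rc is not Send:
    // thread::spawn(move || println!("{}", alias));
    let count = Rc::strong_count(&node);
    drop(alias);
    count
}

// Cross-thread sharing: Arc pays for an atomic counter and may be moved
// into another thread.
fn arc_len_in_thread() -> usize {
    let shared = Arc::new(vec![1, 2, 3]);
    let for_thread = Arc::clone(&shared);
    let handle = thread::spawn(move || for_thread.len());
    handle.join().unwrap()
}

fn main() {
    println!("Rc owners: {}", rc_owner_count());
    println!("length seen by the thread: {}", arc_len_in_thread());
}
```

The choice between the two is made per use site: code that never crosses a thread boundary can stay with `Rc`, and the compiler points out the places where `Arc` is required.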
Move Semantics
Move semantics can improve programs by avoiding unnecessary copying of
temporary values, enabling safe storage of non-copyable objects like mutexes
in containers, and more.
The punchline here is that after the move, you generally can’t use the
previous location of the value (foo in our case) because no value remains
there. But C++ doesn’t make this an error. Instead, it declares foo to have
an unspecified value (defined by the move constructor). In some cases,
you can still safely use the variable (like with primitive types). In other
cases, you shouldn’t (like with mutexes).
On the other hand, Rust has a more advanced type system and it’s a
compilation error to use a value after it has been moved, no matter how
complex the control flow or data structure:
error[E0382]: use of moved value: `foo`
--> src/main.rs:13:1
|
11 | let bar = foo;
| --- value moved here
12 |
13 | foo.some_method();
| ^^^ value used here after move
|
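A program that triggers this kind of diagnostic could look like the following sketch. The `Foo` type, its field, and `some_method()` are assumptions made for illustration; the offending line is commented out so that the example compiles.

```rust
// Hypothetical type used only for this illustration.
struct Foo {
    name: String,
}

impl Foo {
    fn some_method(&self) -> usize {
        self.name.len()
    }
}

fn moved_len() -> usize {
    let foo = Foo { name: String::from("foo") };
    let bar = foo; // the value moves from `foo` to `bar`

    // Uncommenting the next line is rejected with error E0382,
    // because `foo` no longer holds a value:
    // foo.some_method();

    bar.some_method()
}

fn main() {
    println!("{}", moved_len()); // prints 3
}
```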
In fact, the Rust type system allows programmers to safely encode more
use cases than they can with C++. Consider converting between various
value representations. Let’s say you have a string in UTF-8 and you want
to convert it to a corresponding vector of bytes for further processing. You
don’t need the original string afterwards. In C++, the only safe option is to
copy the whole string using the vector copy constructor:
However, Rust allows you to move the internal buffer of the string into a
new vector, making the conversion efficient and disallowing use of the
original string afterwards:
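A minimal sketch of such a conversion uses `String::into_bytes` from the standard library, which consumes the string and returns its buffer as a `Vec<u8>` without copying. The helper name and sample text are assumptions for this example.

```rust
// Hypothetical helper: takes ownership of the string and returns its bytes.
fn to_bytes(text: String) -> Vec<u8> {
    text.into_bytes() // reuses the string's buffer, no copy is made
}

fn main() {
    let text = String::from("hello");
    let bytes = to_bytes(text);

    // Uncommenting the next line fails to compile: `text` was moved.
    // println!("{}", text);

    println!("{:?}", bytes); // prints [104, 101, 108, 108, 111]
}
```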
are an example of such a copyable type, and any user-defined type can also
be marked as copyable with the #[derive(Copy)] attribute.
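As a small sketch of a user-defined copyable type, consider the hypothetical `Point` structure below. Note that `Copy` can only be derived together with `Clone`, and only if every field is itself copyable.

```rust
// Hypothetical type; all of its fields are Copy, so it can be Copy too.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Takes its argument by value; for a Copy type the caller keeps its own copy.
fn shifted(p: Point) -> Point {
    Point { x: p.x + 1, y: p.y }
}

fn main() {
    let a = Point { x: 1, y: 2 };
    let b = a;          // copies `a` instead of moving it
    let c = shifted(a); // `a` is still usable afterwards
    println!("{:?} {:?} {:?}", a, b, c);
}
```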
● segmentation faults
● use-after-free and double-free bugs
● dangling pointers
● null dereferences
● unsafe concurrent modification
● buffer overflows
Ownership
1 fn f() {
2 let v = Foo::new(); // ----+ v's lifetime
3 // |
4 /* some code */ // |
5 } // <---+
In this case, the object Foo is owned by the variable v and will die at line 5,
when function f() returns.
Ownership can be transferred by moving the object (which is performed by
default when the variable is assigned or used):
1 fn f() {
2 let v = Foo::new(); // ----+ v's lifetime
3 { // |
4 let u = v; // <---X---+ u's lifetime
5 // |
6 do_something(u); // <-------X
7 } //
8 } //
Initially, the variable v would be alive for lines 2 through 7, but its lifetime
ends at line 4 where v is assigned to u. At that point we can’t use v
anymore (or a compiler error will occur). But the object Foo isn’t dead yet;
it merely has a new owner u that is alive for lines 4 through 6. However, at
line 6 the ownership of Foo is transferred to the function do_something().
That function will destroy Foo as soon as it returns.
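The moment of destruction can be made visible with a `Drop` implementation. The sketch below is one possible way to observe it; the event log, the `Foo` field, and the `run()` helper are assumptions introduced for this illustration.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Shared event log, used here only to record the order of events.
type Log = Rc<RefCell<Vec<&'static str>>>;

struct Foo {
    log: Log,
}

impl Drop for Foo {
    fn drop(&mut self) {
        self.log.borrow_mut().push("Foo destroyed");
    }
}

// Takes ownership of its argument; Foo is destroyed when the function returns.
fn do_something(foo: Foo) {
    foo.log.borrow_mut().push("do_something running");
}

fn run() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let v = Foo { log: Rc::clone(&log) };
    {
        let u = v;       // ownership moves from v to u
        do_something(u); // ownership moves into the function
        log.borrow_mut().push("back in caller");
    }
    let events = log.borrow().clone();
    events
}

fn main() {
    // prints ["do_something running", "Foo destroyed", "back in caller"]
    println!("{:?}", run());
}
```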
Borrowing
But what if you don’t want to transfer ownership to the function? Then you
need to use references to pass a pointer to an object instead:
fn f() {
    let v = Foo::new();       // ---+ v's lifetime
                              //    |
    do_something(&v);         // :--|----.
                              //    |     } v's borrowed
    do_something_else(&v);    // :--|----'
}                             // <--+
It’s expected that an object will stay alive for at least as long as any
reference to it. This notion is implicit in C++ references, but Rust makes it an
explicit part of the reference type:
The argument v is in fact a reference to Foo with the lifetime 'a, where 'a is
defined by the function do_something() as the duration of its call.
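A signature with an explicit lifetime parameter could be written as in the sketch below. The body of `do_something()` and the `value` field of `Foo` are assumptions made so that the example is complete.

```rust
// Hypothetical type used only for this illustration.
struct Foo {
    value: i32,
}

// 'a names the lifetime of the borrow: the reference `v` is valid for at
// least the duration of this call.
fn do_something<'a>(v: &'a Foo) -> i32 {
    v.value * 2
}

fn main() {
    let v = Foo { value: 21 };
    println!("{}", do_something(&v)); // prints 42
    println!("{}", v.value);          // `v` is still owned by main
}
```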
C++ can handle simple cases like this just as well. But what if we want to
return a reference? What lifetime should the reference have? Obviously,
not longer than the object it refers to. However, since lifetimes aren’t part
of C++ reference types, the following code is syntactically correct for C++
and will compile just fine:
Although this code is syntactically correct, it is semantically
incorrect and has undefined behavior if the caller of some_call() actually
uses the returned reference. Such errors may be hard to spot in casual
code review and generally require an external static code analyzer to
detect.
fn some_call(v: &Foo) -> &Foo {   // ------------------+ expected
    let w = Foo::new();           // ---+ w's lifetime | lifetime
                                  //    |              | of the
    return &w;                    // <--+              | returned
}                                 //                   | value
                                  // <-----------------+
You don’t have to explicitly spell out all lifetimes for all references. In many
cases, like in the example above, the compiler is able to automatically infer
lifetimes, freeing the programmer from the burden of manual specification.
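To illustrate, the first two functions in the sketch below have the same signature as far as the compiler is concerned, one with lifetimes elided and one with them written out. The third shows a case where elision does not apply and an annotation is required. All names here are hypothetical.

```rust
// Hypothetical type used only for this illustration.
struct Foo {
    name: String,
}

// Lifetimes elided: the returned reference is tied to the single input.
fn name_of(v: &Foo) -> &str {
    &v.name
}

// The same signature with the lifetime spelled out.
fn name_of_explicit<'a>(v: &'a Foo) -> &'a str {
    &v.name
}

// With two reference inputs the compiler cannot pick one for the result,
// so the lifetime has to be written explicitly.
fn longer<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

fn main() {
    let foo = Foo { name: String::from("ferris") };
    println!("{}", name_of(&foo));
    println!("{}", name_of_explicit(&foo));
    println!("{}", longer("rust", "c++"));
}
```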
Foo c;
Foo *a = &c;
const Foo *b = &c;
Here, pointers a and b are aliases of the Foo object owned by c.
Modifications performed via a will be visible when b is dereferenced.
Usually, aliasing doesn’t cause errors, but there are some cases where it
might.
Consider the memcpy() function. It can be, and is, used for copying data, but
it’s known to be unsafe and can cause memory corruption when applied to
overlapping regions:
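#include <string.h>

char array[5] = { 1, 2, 3, 4, 5 };
char *a = &array[0];        /* destination: elements 0..2 */
const char *b = &array[2];  /* source: elements 2..4, overlaps the destination */
memcpy(a, b, 3);            /* undefined behavior: the regions overlap */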
In the sample above, the first three elements are now undefined because
their values depend on the order in which memcpy() performs the copying.
The ultimate issue here is that the program contains two aliasing
references to the same object (the array), one of which is non-constant. If
such programs were rejected at compile time, then memcpy() (and any other
function with pointer arguments) would always be safe to use.
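For comparison, here is a sketch of how copying within one array can be expressed in safe Rust with standard slice methods. `copy_within` is defined for overlapping ranges, while `split_at_mut` hands out two non-overlapping mutable parts. The function names and sample data are assumptions for this example.

```rust
// Overlapping copy inside one array: `copy_within` has well-defined results
// even when source and destination overlap.
fn shift_left(mut array: [u8; 5]) -> [u8; 5] {
    array.copy_within(2..5, 0);
    array
}

// Non-overlapping copy: `split_at_mut` yields two disjoint mutable slices,
// so source and destination can never alias.
fn copy_tail_to_head(mut array: [u8; 5]) -> [u8; 5] {
    let (head, tail) = array.split_at_mut(2);
    head.copy_from_slice(&tail[1..3]);
    array
}

fn main() {
    println!("{:?}", shift_left([1, 2, 3, 4, 5]));        // prints [3, 4, 5, 4, 5]
    println!("{:?}", copy_tail_to_head([1, 2, 3, 4, 5])); // prints [4, 5, 3, 4, 5]
}
```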
The second rule relates to ownership, which was discussed in the previous
section. The first rule is the real novelty of Rust.
It’s obviously safe to have multiple aliasing pointers to the same object if
none of them can be used to modify the object (i.e. they are constant
references). If there are two mutable references, however, then
modifications can conflict with each other. Also, if there is a
const-reference A and a mutable reference B, then presumably the
constant object as seen via A can in fact change if modifications are made
via B. But it’s perfectly safe if only one mutable reference to the object is
allowed to exist in the program. The Rust borrow checker enforces these
rules during compilation, effectively making each reference act as a
read-write lock for the object.
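The effect of these checks can be sketched as follows. The function name and sample data are assumptions, and the conflicting borrow is commented out so that the example compiles.

```rust
fn borrow_demo() -> Vec<i32> {
    let mut data = vec![1, 2, 3];

    // Any number of shared (read-only) references may coexist.
    let a = &data;
    let b = &data;
    let total = a.len() + b.len();

    // Once the shared references are no longer used, a single mutable
    // reference may be taken.
    let m = &mut data;
    m.push(total as i32);

    // Uncommenting the next two lines is rejected with error E0502,
    // because `data` would be borrowed as shared while `m` is still in use:
    // let r = &data;
    // m.push(r.len() as i32);

    data
}

fn main() {
    println!("{:?}", borrow_demo()); // prints [1, 2, 3, 6]
}
```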