Introduction
Participation
If you are interested in contributing to the Patterns catalogue, check out the contribution guidelines.
Design patterns
When developing programs, we have to solve many problems. A program can be viewed as a solution to a problem. It can also be viewed as a collection of solutions to many different problems. All of these solutions work together to solve a bigger problem.
Design patterns in Rust
There are many problems that share the same form. Because Rust is not object-oriented, its design patterns differ from those of object-oriented programming languages. While the details differ, problems that have the same form can be solved using the same fundamental methods.
Design patterns are methods to solve common problems when writing software.
Anti-patterns are methods to solve these same common problems.
However, while design patterns give us benefits, anti-patterns create more problems.
Idioms are guidelines to follow when coding. They are social norms of the community. You can break them, but if you do you should have a good reason for it.
TODO: Mention why Rust is a bit special - functional elements, type system, borrow checker
Contributing
Introduction
This book is a catalogue of Rust programming techniques, (anti-)patterns, idioms and other explanations. It is a compilation of collective (sometimes implicit) knowledge as well as experiences that have emerged through collaborative work.
The patterns described here are not rules, but should be taken as guidelines for writing idiomatic code in Rust. We are collecting Rust patterns in this book so people can learn the tradeoffs between Rust idioms and use them properly in their own code.
If you want to be part of this effort here are some ways you can participate:
Discussion board
If you have a question or an idea regarding certain content, but you want feedback from fellow community members and you think it may not be appropriate to file an issue, open a discussion in our discussion board.
Writing a new article
Before writing a new article please check in one of the following resources if there is an existing discussion or if someone is already working on that topic:
If you don't find an issue regarding your topic, and you are sure it is not more feasible to open a thread in the discussion board, please open a new issue so we can discuss the ideas and future content of the article together and maybe give some feedback/input on it.
When writing a new article, it's recommended to copy the pattern template into the appropriate directory and start editing it. You may not want to fill out every section: remove the ones that don't apply, or add extra sections where they help.
Consider writing your article in a way that has a low barrier of entry, so that newcomers to Rust can also follow and understand the thought process behind it. That way we can encourage people to use these patterns early on.
We encourage you to write idiomatic Rust code that builds in the playground.
If you link to blog posts or, in general, content that may not exist in a few years (e.g. PDFs), please take a snapshot with the Wayback Machine and use the link to that snapshot in your article.
Don't forget to add your new article to the SUMMARY.md to let it be rendered to the book.
Please make Draft Pull Requests early so we can follow your progress and give early feedback (see the following section).
Check the article locally
Before submitting the PR, run mdbook build to make sure that the book builds, and mdbook test to make sure that the code examples are correct.
Markdown lint
To make sure the files comply with our Markdown style we use markdownlint-cli. To spare you some manual work to get through the CI test you can use the following commands to automatically fix most of the emerging problems when writing Markdown files.
- Install: npm install -g markdownlint-cli
- Check all markdown files:
  - unix: markdownlint '**/*.md'
  - windows: markdownlint **/*.md
- Automatically fix basic errors:
  - unix: markdownlint -f '**/*.md'
  - windows: markdownlint -f **/*.md
Creating a Pull Request
"Release early and often!" also applies to pull requests!
Once your article has some visible work, create a [WIP]
draft pull request and give it a description of what you did or want to do.
Early reviews of the community are not meant as an offense but to give feedback.
A good principle: "Work together, share ideas, teach others."
Important Note
Please don't force push commits in your branch, in order to keep commit history and make it easier for us to see changes between reviews.
Make sure to check Allow edits by maintainers (under the text box) in the PR so people can actually collaborate on things or fix smaller issues themselves.
Idioms
Idioms are commonly used styles and patterns largely agreed upon by a community. They are guidelines. Writing idiomatic code allows other developers to understand what is happening because they are familiar with the form that it has.
The computer understands the machine code that is generated by the compiler. The language is therefore mostly beneficial to the developer. So, since we have this abstraction layer, why not put it to good use and make it simple?
Remember the KISS principle: "Keep It Simple, Stupid". It claims that "most systems work best if they are kept simple rather than made complicated; therefore, simplicity should be a key goal in design, and unnecessary complexity should be avoided".
Code is there for humans, not computers, to understand.
Use borrowed types for arguments
Description
Using a target of a deref coercion can increase the flexibility of your code when you are deciding which argument type to use for a function argument. In this way, the function will accept more input types.
This is not limited to slice-able or fat pointer types. In fact, you should always prefer using the borrowed type over borrowing the owned type. E.g., &str over &String, &[T] over &Vec<T>, or &T over &Box<T>.
Using borrowed types you can avoid layers of indirection for those instances where the owned type already provides a layer of indirection. For instance, a String has a layer of indirection, so a &String will have two layers of indirection. We can avoid this by using &str instead, and letting &String coerce to a &str whenever the function is invoked.
Example
For this example, we will illustrate some differences for using &String as a function argument versus using a &str, but the ideas apply as well to using &Vec<T> versus using a &[T] or using a &T versus a &Box<T>.
Consider an example where we wish to determine if a word contains three consecutive vowels. We don't need to own the string to determine this, so we will take a reference.
The code might look something like this:
fn three_vowels(word: &String) -> bool {
    let mut vowel_count = 0;
    for c in word.chars() {
        match c {
            'a' | 'e' | 'i' | 'o' | 'u' => {
                vowel_count += 1;
                if vowel_count >= 3 {
                    return true
                }
            }
            _ => vowel_count = 0
        }
    }
    false
}

fn main() {
    let ferris = "Ferris".to_string();
    let curious = "Curious".to_string();
    println!("{}: {}", ferris, three_vowels(&ferris));
    println!("{}: {}", curious, three_vowels(&curious));

    // This works fine, but the following two lines would fail:
    // println!("Ferris: {}", three_vowels("Ferris"));
    // println!("Curious: {}", three_vowels("Curious"));
}
This works fine because we are passing a &String type as a parameter. If we uncomment the last two lines, the example fails, because a &str type will not coerce to a &String type.
We can fix this by simply modifying the type for our argument.
For instance, if we change our function declaration to:
fn three_vowels(word: &str) -> bool {
then both versions will compile and print the same output.
Ferris: false
Curious: true
But wait, that's not all! There is more to this story.
It's likely that you may say to yourself: that doesn't matter, I will never be using a &'static str as an input anyway (as we did when we used "Ferris"). Even ignoring this special example, you may still find that using &str will give you more flexibility than using a &String.
Let's now take an example where someone gives us a sentence, and we want to determine if any of the words in the sentence contains three consecutive vowels. We probably should make use of the function we have already defined and simply feed in each word from the sentence.
An example of this could look like this:
fn three_vowels(word: &str) -> bool {
    let mut vowel_count = 0;
    for c in word.chars() {
        match c {
            'a' | 'e' | 'i' | 'o' | 'u' => {
                vowel_count += 1;
                if vowel_count >= 3 {
                    return true
                }
            }
            _ => vowel_count = 0
        }
    }
    false
}

fn main() {
    let sentence_string =
        "Once upon a time, there was a friendly curious crab named Ferris".to_string();
    for word in sentence_string.split(' ') {
        if three_vowels(word) {
            println!("{} has three consecutive vowels!", word);
        }
    }
}
Running this example using our function declared with an argument type &str will yield:
curious has three consecutive vowels!
However, this example will not run when our function is declared with an argument type &String. This is because string slices are &str, not &String; converting a &str to a &String would require an allocation and is not implicit, whereas converting a String to a &str is cheap and implicit.
See also
- Rust Language Reference on Type Coercions
- For more discussion on how to handle String and &str, see this blog series (2015) by Herman J. Radtke III.
Concatenating strings with format!
Description
It is possible to build up strings using the push and push_str methods on a mutable String, or using its + operator. However, it is often more convenient to use format!, especially where there is a mix of literal and non-literal strings.
Example
fn say_hello(name: &str) -> String {
    // We could construct the result string manually.
    // let mut result = "Hello ".to_owned();
    // result.push_str(name);
    // result.push('!');
    // result

    // But using format! is better.
    format!("Hello {}!", name)
}
Advantages
Using format! is usually the most succinct and readable way to combine strings.
Disadvantages
It is usually not the most efficient way to combine strings - a series of push operations on a mutable string is usually the most efficient (especially if the string has been pre-allocated to the expected size).
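For illustration, a minimal sketch of that more efficient approach, reusing the say_hello signature from the example above with a pre-allocated buffer:
fn say_hello(name: &str) -> String {
    // Reserve roughly the right capacity up front, then push the pieces.
    let mut result = String::with_capacity("Hello !".len() + name.len());
    result.push_str("Hello ");
    result.push_str(name);
    result.push('!');
    result
}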
Constructors
Description
Rust does not have constructors as a language construct. Instead, the convention
is to use a static new
method to create an object.
Example
// A Rust vector, see liballoc/vec.rs
pub struct Vec<T> {
buf: RawVec<T>,
len: usize,
}
impl<T> Vec<T> {
// Constructs a new, empty `Vec<T>`.
// Note this is a static method - no self.
// This constructor doesn't take any arguments, but some might in order to
// properly initialise an object
pub fn new() -> Vec<T> {
// Create a new Vec with fields properly initialised.
Vec {
// Note that here we are calling RawVec's constructor.
buf: RawVec::new(),
len: 0,
}
}
}
See also
The builder pattern for constructing objects where there are multiple configurations.
The Default Trait
Description
Many types in Rust have a constructor. However, this is specific to the type; Rust cannot abstract over "everything that has a new() method". To allow this, the Default trait was conceived, which can be used with containers and other generic types (e.g. see Option::unwrap_or_default()). Notably, some containers already implement it where applicable.
Not only do one-element containers like Cow, Box or Arc implement Default for contained Default types, one can automatically #[derive(Default)] for structs whose fields all implement it, so the more types implement Default, the more useful it becomes.
On the other hand, constructors can take multiple arguments, while the default() method does not. There can even be multiple constructors with different names, but there can only be one Default implementation per type.
Example
use std::{path::PathBuf, time::Duration};

// note that we can simply auto-derive Default here.
#[derive(Default, Debug)]
struct MyConfiguration {
    // Option defaults to None
    output: Option<PathBuf>,
    // Vecs default to empty vector
    search_path: Vec<PathBuf>,
    // Duration defaults to zero time
    timeout: Duration,
    // bool defaults to false
    check: bool,
}

impl MyConfiguration {
    // add setters here
}

fn main() {
    // construct a new instance with default values
    let mut conf = MyConfiguration::default();
    // do something with conf here
    conf.check = true;
    println!("conf = {:#?}", conf);
}
See also
- The constructor idiom is another way to generate instances that may or may not be "default"
- The Default documentation (scroll down for the list of implementors)
- Option::unwrap_or_default()
- derive(new)
Collections are smart pointers
Description
Use the Deref
trait to treat collections like smart pointers, offering owning
and borrowed views of data.
Example
use std::ops::Deref;
struct Vec<T> {
data: T,
//..
}
impl<T> Deref for Vec<T> {
type Target = [T];
fn deref(&self) -> &[T] {
//..
}
}
A Vec<T> is an owning collection of Ts, while a slice (&[T]) is a borrowed collection of Ts. Implementing Deref for Vec allows implicit dereferencing from &Vec<T> to &[T] and includes the relationship in auto-dereferencing searches. Most methods you might expect to be implemented for Vecs are instead implemented for slices.
See also String and &str.
Motivation
Ownership and borrowing are key aspects of the Rust language. Data structures must account for these semantics properly in order to give a good user experience. When implementing a data structure which owns its data, offering a borrowed view of that data allows for more flexible APIs.
Advantages
Most methods can be implemented only for the borrowed view, they are then implicitly available for the owning view.
Gives clients a choice between borrowing or taking ownership of data.
Disadvantages
Methods and traits only available via dereferencing are not taken into account
when bounds checking, so generic programming with data structures using this
pattern can get complex (see the Borrow
and AsRef
traits, etc.).
Discussion
Smart pointers and collections are analogous: a smart pointer points to a single object, whereas a collection points to many objects. From the point of view of the type system there is little difference between the two. A collection owns its data if the only way to access each datum is via the collection and the collection is responsible for deleting the data (even in cases of shared ownership, some kind of borrowed view may be appropriate). If a collection owns its data, it is usually useful to provide a view of the data as borrowed so that it can be multiply referenced.
Most smart pointers (e.g., Foo<T>) implement Deref<Target=T>. However, collections will usually dereference to a custom type. [T] and str have some language support, but in the general case, this is not necessary. Foo<T> can implement Deref<Target=Bar<T>> where Bar is a dynamically sized type and &Bar<T> is a borrowed view of the data in Foo<T>.
Commonly, ordered collections will implement Index for Ranges to provide slicing syntax. The target will be the borrowed view.
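As a rough sketch of that idea (the MyVec type and its field are made up for illustration):
use std::ops::{Index, Range};

struct MyVec<T> {
    data: Vec<T>,
}

impl<T> Index<Range<usize>> for MyVec<T> {
    // The target of range indexing is the borrowed view: a slice.
    type Output = [T];

    fn index(&self, range: Range<usize>) -> &[T] {
        &self.data[range]
    }
}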
See also
Deref polymorphism anti-pattern.
Documentation for Deref
trait.
Finalisation in destructors
Description
Rust does not provide the equivalent to finally
blocks - code that will be
executed no matter how a function is exited. Instead an object's destructor can
be used to run code that must be run before exit.
Example
fn bar() -> Result<(), ()> {
// These don't need to be defined inside the function.
struct Foo;
// Implement a destructor for Foo.
impl Drop for Foo {
fn drop(&mut self) {
println!("exit");
}
}
// The dtor of _exit will run however the function `bar` is exited.
let _exit = Foo;
// Implicit return with `?` operator.
baz()?;
// Normal return.
Ok(())
}
Motivation
If a function has multiple return points, then executing code on exit becomes
difficult and repetitive (and thus bug-prone). This is especially the case where
return is implicit due to a macro. A common case is the ? operator, which returns if the result is an Err, but continues if it is Ok. ? is used as an exception handling mechanism, but unlike Java (which has finally), there is no way to schedule code to run in both the normal and exceptional cases.
Panicking will also exit a function early.
Advantages
Code in destructors will (nearly) always be run - copes with panics, early returns, etc.
Disadvantages
It is not guaranteed that destructors will run. For example, if there is an infinite loop in a function or if running a function crashes before exit. Destructors are also not run in the case of a panic in an already panicking thread. Therefore destructors cannot be relied on as finalisers where it is absolutely essential that finalisation happens.
This pattern introduces some hard to notice, implicit code. Reading a function gives no clear indication of destructors to be run on exit. This can make debugging tricky.
Requiring an object and Drop
impl just for finalisation is heavy on boilerplate.
Discussion
There is some subtlety about how exactly to store the object used as a
finaliser. It must be kept alive until the end of the function and must then be
destroyed. The object must always be a value or uniquely owned pointer (e.g., Box<Foo>). If a shared pointer (such as Rc) is used, then the finaliser can be kept alive beyond the lifetime of the function. For similar reasons, the finaliser should not be moved or returned.
The finaliser must be assigned into a variable, otherwise it will be destroyed immediately, rather than when it goes out of scope. The variable name must start with _ if the variable is only used as a finaliser, otherwise the compiler will warn that the finaliser is never used. However, do not call the variable _ with no suffix - in that case it will be destroyed immediately.
In Rust, destructors are run when an object goes out of scope. This happens whether we reach the end of block, there is an early return, or the program panics. When panicking, Rust unwinds the stack running destructors for each object in each stack frame. So, destructors get called even if the panic happens in a function being called.
If a destructor panics while unwinding, there is no good action to take, so Rust aborts the thread immediately, without running further destructors. This means that destructors are not absolutely guaranteed to run. It also means that you must take extra care in your destructors not to panic, since it could leave resources in an unexpected state.
See also
RAII.
mem::replace to keep owned values in changed enums
Description
Say we have a &mut MyEnum which has (at least) two variants, A { name: String, x: u8 } and B { name: String }. Now we want to change MyEnum::A to a B if x is zero, while keeping MyEnum::B intact. We can do this without cloning the name.
Example
use std::mem;

enum MyEnum {
    A { name: String, x: u8 },
    B { name: String }
}

fn a_to_b(e: &mut MyEnum) {
    // we mutably borrow `e` here. This precludes us from changing it directly
    // as in `*e = ...`, because the borrow checker won't allow it. Therefore
    // the assignment to `e` must be outside the `if let` clause.
    *e = if let MyEnum::A { ref mut name, x: 0 } = *e {
        // this takes out our `name` and puts in an empty String instead
        // (note that empty strings don't allocate).
        // Then, construct the new enum variant (which will
        // be assigned to `*e`, because it is the result of the `if let` expression).
        MyEnum::B { name: mem::replace(name, String::new()) }
    // In all other cases, we return immediately, thus skipping the assignment
    } else {
        return
    }
}
This also works with more variants:
use std::mem;
enum MultiVariateEnum {
A { name: String },
B { name: String },
C,
D
}
fn swizzle(e: &mut MultiVariateEnum) {
use self::MultiVariateEnum::*;
*e = match *e {
// Ownership rules do not allow taking `name` by value, but we cannot
// take the value out of a mutable reference, unless we replace it:
A { ref mut name } => B { name: mem::replace(name, String::new()) },
B { ref mut name } => A { name: mem::replace(name, String::new()) },
C => D,
D => C
}
}
Motivation
When working with enums, we may want to change an enum value in place, perhaps to another variant. This is usually done in two phases to keep the borrow checker happy. In the first phase, we observe the existing value and look at its parts to decide what to do next. In the second phase we may conditionally change the value (as in the example above).
The borrow checker won't allow us to take name out of the enum (because something must be there). We could of course .clone() name and put the clone into our MyEnum::B, but that would be an instance of the [Clone to satisfy the borrow checker] antipattern. Anyway, we can avoid the extra allocation by changing e with only a mutable borrow.
mem::replace lets us swap out the value, replacing it with something else. In this case, we put in an empty String, which does not need to allocate. As a result, we get the original name as an owned value. We can then wrap this in another enum.
Note, however, that if we are using an Option and want to replace its value with a None, Option's take() method provides a shorter and more idiomatic alternative.
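A minimal sketch of that alternative:
let mut opt = Some(String::from("hello"));

// `take()` is equivalent to `mem::replace(&mut opt, None)`, but shorter.
let owned = opt.take();

assert_eq!(owned, Some(String::from("hello")));
assert_eq!(opt, None);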
Advantages
Look ma, no allocation! Also you may feel like Indiana Jones while doing it.
Disadvantages
This gets a bit wordy. Getting it wrong repeatedly will make you hate the borrow checker. The compiler may fail to optimize away the double store, resulting in reduced performance as opposed to what you'd do in unsafe languages.
Discussion
This pattern is only of interest in Rust. In GC'd languages, you'd take the reference to the value by default (and the GC would keep track of refs), and in other low-level languages like C you'd simply alias the pointer and fix things later.
However, in Rust, we have to do a little more work to do this. An owned value may only have one owner, so to take it out, we need to put something back in – like Indiana Jones, replacing the artifact with a bag of sand.
See also
This gets rid of the [Clone to satisfy the borrow checker] antipattern in a specific case.
[Clone to satisfy the borrow checker](TODO: Hinges on PR #23)
On-Stack Dynamic Dispatch
Description
We can dynamically dispatch over multiple values, however, to do so, we need to declare multiple variables to bind differently-typed objects. To extend the lifetime as necessary, we can use deferred conditional initialization, as seen below:
Example
use std::io;
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let arg = "-";

    // These must live longer than `readable`, and thus are declared first:
    let (mut stdin_read, mut file_read);

    // We need to ascribe the type to get dynamic dispatch.
    let readable: &mut dyn io::Read = if arg == "-" {
        stdin_read = io::stdin();
        &mut stdin_read
    } else {
        file_read = fs::File::open(arg)?;
        &mut file_read
    };

    // Read from `readable` here.

    Ok(())
}
Motivation
Rust monomorphises code by default. This means a copy of the code will be generated for each type it is used with and optimized independently. While this allows for very fast code on the hot path, it also bloats the code in places where performance is not of the essence, thus costing compile time and cache usage.
Luckily, Rust allows us to use dynamic dispatch, but we have to explicitly ask for it.
Advantages
We do not need to allocate anything on the heap. Neither do we need to initialize something we won't use later, nor do we need to monomorphize the whole code that follows to work with both File and Stdin.
Disadvantages
The code needs more moving parts than the Box-based version:
// We still need to ascribe the type for dynamic dispatch.
let readable: Box<dyn io::Read> = if arg == "-" {
Box::new(io::stdin())
} else {
Box::new(fs::File::open(arg)?)
};
// Read from `readable` here.
Discussion
Rust newcomers will usually learn that Rust requires all variables to be initialized before use, so it's easy to overlook the fact that unused variables may well be uninitialized. Rust works quite hard to ensure that this works out fine and only the initialized values are dropped at the end of their scope.
The example meets all the constraints Rust places on us:
- All variables are initialized before using (in this case borrowing) them
- Each variable only holds values of a single type. In our example, stdin_read is of type Stdin, file_read is of type File, and readable is of type &mut dyn Read
- Each borrowed value outlives all the references borrowed from it
See also
- Finalisation in destructors and RAII guards can benefit from tight control over lifetimes.
- For conditionally filled
Option<&T>
s of (mutable) references, one can initialize anOption<T>
directly and use its.as_ref()
method to get an optional reference.
FFI Idioms
Writing FFI code is an entire course in itself.
However, there are several idioms here that can act as pointers, and avoid traps for inexperienced users of unsafe
Rust.
This section contains idioms that may be useful when doing FFI.
- Idiomatic Errors - Error handling with integer codes and sentinel return values (such as NULL pointers)
- Accepting Strings with minimal unsafe code
- Passing Strings to FFI functions
Error Handling in FFI
Description
In foreign languages like C, errors are represented by return codes. However, Rust's type system allows much richer error information to be captured and propagated through a full type.
This best practice shows different kinds of error codes, and how to expose them in a usable way:
- Flat Enums should be converted to integers and returned as codes.
- Structured Enums should be converted to an integer code with a string error message for detail.
- Custom Error Types should become "transparent", with a C representation.
Code Example
Flat Enums
enum DatabaseError {
IsReadOnly = 1, // user attempted a write operation
IOError = 2, // user should read the C errno() for what it was
FileCorrupted = 3, // user should run a repair tool to recover it
}
impl From<DatabaseError> for libc::c_int {
fn from(e: DatabaseError) -> libc::c_int {
(e as i8).into()
}
}
Structured Enums
pub mod errors {
pub enum DatabaseError {
IsReadOnly,
IOError(std::io::Error),
FileCorrupted(String), // message describing the issue
}
impl From<DatabaseError> for libc::c_int {
fn from(e: DatabaseError) -> libc::c_int {
match e {
DatabaseError::IsReadOnly => 1,
DatabaseError::IOError(_) => 2,
DatabaseError::FileCorrupted(_) => 3,
}
}
}
}
pub mod c_api {
use super::errors::DatabaseError;
#[no_mangle]
pub extern "C" fn db_error_description(
e: *const DatabaseError
) -> *mut libc::c_char {
let error: &DatabaseError = unsafe {
/* SAFETY: pointer lifetime is greater than the current stack frame */
&*e
};
let error_str: String = match error {
    DatabaseError::IsReadOnly => {
        format!("cannot write to read-only database")
    }
    DatabaseError::IOError(e) => {
        format!("I/O Error: {}", e)
    }
    DatabaseError::FileCorrupted(s) => {
        format!("File corrupted, run repair: {}", &s)
    }
};
let c_error = unsafe {
// SAFETY: copying error_str to an allocated buffer with a NUL
// character at the end
let mut malloc: *mut u8 = libc::malloc(error_str.len() + 1) as *mut _;
if malloc.is_null() {
return std::ptr::null_mut();
}
let src = error_str.as_bytes().as_ptr();
std::ptr::copy_nonoverlapping(src, malloc, error_str.len());
std::ptr::write(malloc.add(error_str.len()), 0);
malloc as *mut libc::c_char
};
c_error
}
}
Custom Error Types
struct ParseError {
expected: char,
line: u32,
ch: u16
}
impl ParseError { /* ... */ }
/* Create a second version which is exposed as a C structure */
#[repr(C)]
pub struct parse_error {
pub expected: libc::c_char,
pub line: u32,
pub ch: u16
}
impl From<ParseError> for parse_error {
fn from(e: ParseError) -> parse_error {
let ParseError { expected, line, ch } = e;
parse_error { expected, line, ch }
}
}
Advantages
This ensures that the foreign language has clear access to error information while not compromising the Rust code's API at all.
Disadvantages
It's a lot of typing, and some types may not be able to be converted easily to C.
Accepting Strings
Description
When accepting strings via FFI through pointers, there are two principles that should be followed:
- Keep foreign strings "borrowed", rather than copying them directly.
- Minimize unsafe code during the conversion.
Motivation
Rust has built-in support for C-style strings with its CString
and CStr
types.
However, there are different approaches one can take with strings that are being accepted from a foreign caller of a Rust function.
The best practice is simple: use CStr
in such a way as to minimize unsafe code, and create a borrowed slice.
If an owned String is needed, call to_string()
on the string slice.
Code Example
pub mod unsafe_module {
// other module content
#[no_mangle]
pub extern "C" fn mylib_log(msg: *const libc::c_char, level: libc::c_int) {
let level: crate::LogLevel = match level { /* ... */ };
let msg_str: &str = unsafe {
// SAFETY: accessing raw pointers expected to live for the call,
// and creating a shared reference that does not outlive the current
// stack frame.
match std::ffi::CStr::from_ptr(msg).to_str() {
Ok(s) => s,
Err(e) => {
crate::log_error("FFI string conversion failed");
return;
}
}
};
crate::log(msg_str, level);
}
}
Advantages
The example is written to ensure that:
- The unsafe block is as small as possible.
- The pointer with an "untracked" lifetime becomes a "tracked" shared reference
Consider an alternative, where the string is actually copied:
pub mod unsafe_module {
// other module content
pub extern "C" fn mylib_log(msg: *const libc::c_char, level: libc::c_int) {
/* DO NOT USE THIS CODE. IT IS UGLY, VERBOSE, AND CONTAINS A SUBTLE BUG. */
let level: crate::LogLevel = match level { /* ... */ };
let msg_len = unsafe { /* SAFETY: strlen is what it is, I guess? */
libc::strlen(msg)
};
let mut msg_data = Vec::with_capacity(msg_len + 1);
let msg_cstr: std::ffi::CString = unsafe {
// SAFETY: copying from a foreign pointer expected to live
// for the entire stack frame into owned memory
std::ptr::copy_nonoverlapping(msg, msg_data.as_mut(), msg_len);
msg_data.set_len(msg_len + 1);
std::ffi::CString::from_vec_with_nul(msg_data).unwrap()
};
let msg_str: String = unsafe {
match msg_cstr.into_string() {
Ok(s) => s,
Err(e) => {
crate::log_error("FFI string conversion failed");
return;
}
}
};
crate::log(&msg_str, level);
}
}
This code is inferior to the original in two respects:
- There is much more unsafe code, and more importantly, more invariants it must uphold.
- Due to the extensive arithmetic required, there is a bug in this version that causes Rust undefined behaviour.
The bug here is a simple mistake in pointer arithmetic: the string was copied, all msg_len bytes of it. However, the NUL terminator at the end was not.
The vector then had its size set to the length of the zero-padded string - rather than resized to it, which could have added a zero at the end. As a result, the last byte in the vector is uninitialized memory. When the CString is created at the bottom of the block, its read of the vector will cause undefined behaviour!
Like many such issues, this would be a difficult issue to track down. Sometimes it would panic because the string was not UTF-8, sometimes it would put a weird character at the end of the string, sometimes it would just completely crash.
Disadvantages
None?
Passing Strings
Description
When passing strings to FFI functions, there are four principles that should be followed:
- Make the lifetime of owned strings as long as possible.
- Minimize unsafe code during the conversion.
- If the C code can modify the string data, use Vec instead of CString.
- Unless the Foreign Function API requires it, the ownership of the string should not transfer to the callee.
Motivation
Rust has built-in support for C-style strings with its CString and CStr types. However, there are different approaches one can take with strings that are being sent to a foreign function call from a Rust function.
The best practice is simple: use CString in such a way as to minimize unsafe code. However, a secondary caveat is that the object must live long enough, meaning the lifetime should be maximized. In addition, the documentation explains that "round-tripping" a CString after modification is UB, so additional work is necessary in that case.
Code Example
pub mod unsafe_module {
// other module content
extern "C" {
fn seterr(message: *const libc::c_char);
fn geterr(buffer: *mut libc::c_char, size: libc::c_int) -> libc::c_int;
}
fn report_error_to_ffi<S: Into<String>>(
err: S
) -> Result<(), std::ffi::NulError>{
let c_err = std::ffi::CString::new(err.into())?;
unsafe {
// SAFETY: calling an FFI whose documentation says the pointer is
// const, so no modification should occur
seterr(c_err.as_ptr());
}
Ok(())
// The lifetime of c_err continues until here
}
fn get_error_from_ffi() -> Result<String, std::ffi::IntoStringError> {
let mut buffer = vec![0u8; 1024];
unsafe {
// SAFETY: calling an FFI whose documentation implies
// that the input need only live as long as the call
let written = geterr(buffer.as_mut_ptr() as *mut libc::c_char, 1023) as usize;
buffer.truncate(written + 1);
}
std::ffi::CString::new(buffer).unwrap().into_string()
}
}
Advantages
The example is written in a way to ensure that:
- The unsafe block is as small as possible.
- The CString lives long enough.
- Errors with typecasts are always propagated when possible.
A common mistake (so common it's in the documentation) is to not use the variable in the first block:
pub mod unsafe_module {
// other module content
fn report_error<S: Into<String>>(err: S) -> Result<(), std::ffi::NulError> {
unsafe {
// SAFETY: whoops, this contains a dangling pointer!
seterr(std::ffi::CString::new(err.into())?.as_ptr());
}
Ok(())
}
}
This code will result in a dangling pointer, because the lifetime of the CString
is not extended by the pointer creation, unlike if a reference were created.
Another issue frequently raised is that the initialization of a 1k vector of zeroes is "slow". However, recent versions of Rust actually optimize that particular macro to a call to zmalloc, meaning it is as fast as the operating system's ability to return zeroed memory (which is quite fast).
Disadvantages
None?
Iterating over an Option
Description
Option can be viewed as a container that contains either zero or one element. In particular, it implements the IntoIterator trait, and as such can be used with generic code that needs such a type.
Examples
Since Option implements IntoIterator, it can be used as an argument to .extend():
let turing = Some("Turing");
let mut logicians = vec!["Curry", "Kleene", "Markov"];

logicians.extend(turing);

// equivalent to
if let Some(turing_inner) = turing {
    logicians.push(turing_inner);
}
If you need to tack an Option to the end of an existing iterator, you can pass it to .chain():
let turing = Some("Turing");
let logicians = vec!["Curry", "Kleene", "Markov"];

for logician in logicians.iter().chain(turing.iter()) {
    println!("{} is a logician", logician);
}
Note that if the Option is always Some, then it is more idiomatic to use std::iter::once on the element instead.
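For example, a small sketch of the same loop as above, using std::iter::once instead of an Option:
use std::iter;

let turing = "Turing";
let logicians = vec!["Curry", "Kleene", "Markov"];

// `iter::once` yields exactly one element, so no `Option` is needed.
for logician in logicians.iter().copied().chain(iter::once(turing)) {
    println!("{} is a logician", logician);
}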
Also, since Option implements IntoIterator, it's possible to iterate over it using a for loop. This is equivalent to matching it with if let Some(..), and in most cases you should prefer the latter.
See also
- std::iter::once is an iterator which yields exactly one element. It's a more readable alternative to Some(foo).into_iter().
- Iterator::filter_map is a version of Iterator::flat_map, specialized to mapping functions which return Option.
- The ref_slice crate provides functions for converting an Option to a zero- or one-element slice.
Pass variables to closure
Description
By default, closures capture their environment by borrowing, or you can use a move closure to move the whole environment. However, often you want to move just some variables to the closure, give it a copy of some data, pass it by reference, or perform some other transformation. Use variable rebinding in a separate scope for that.
Example
Use
use std::rc::Rc;

let num1 = Rc::new(1);
let num2 = Rc::new(2);
let num3 = Rc::new(3);
let closure = {
    // `num1` is moved
    let num2 = num2.clone();  // `num2` is cloned
    let num3 = num3.as_ref();  // `num3` is borrowed
    move || {
        *num1 + *num2 + *num3;
    }
};
instead of
use std::rc::Rc;

let num1 = Rc::new(1);
let num2 = Rc::new(2);
let num3 = Rc::new(3);
let num2_cloned = num2.clone();
let num3_borrowed = num3.as_ref();
let closure = move || {
    *num1 + *num2_cloned + *num3_borrowed;
};
Advantages
Copied data are grouped together with the closure's definition, so their purpose is clearer, and they will be dropped immediately even if they are not consumed by the closure.
The closure uses the same variable names as the surrounding code, whether data is copied or moved.
Disadvantages
Additional indentation of closure body.
Privacy for extensibility
Description
Use a private field to ensure that a struct is extensible without breaking stability guarantees.
Example
mod a {
// Public struct.
pub struct S {
pub foo: i32,
// Private field.
bar: i32,
}
}
fn main(s: a::S) {
// Because S::bar is private, it cannot be named here and we must use `..`
// in the pattern.
let a::S { foo: _, ..} = s;
}
Discussion
Adding a field to a struct is a mostly backwards compatible change. However, if a client uses a pattern to deconstruct a struct instance, they might name all the fields in the struct and adding a new one would break that pattern. The client could name some of the fields and use ..
in the pattern, in which case adding another field is backwards compatible. Making at least one of the struct's fields private forces clients to use the latter form of patterns, ensuring that the struct is future-proof.
The downside of this approach is that you might need to add an otherwise unneeded field to the struct. You can use the () type so that there is no runtime overhead and prepend _ to the field name to avoid the unused field warning.
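A minimal sketch of that trick (the field name is arbitrary):
pub struct S {
    pub foo: i32,
    // Zero-sized, private, and prefixed with `_` to silence the unused field
    // warning; its only job is to keep the struct extensible.
    _private: (),
}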
If Rust allowed private variants of enums, we could use the same trick to make adding a variant to an enum backwards compatible. The problem there is exhaustive match expressions. A private variant would force clients to have a _
wildcard pattern.
Easy doc initialization
Description
If a struct takes significant effort to initialize, when writing docs, it can be quicker to wrap your example with a function which takes the struct as an argument.
Motivation
Sometimes there is a struct with multiple or complicated parameters and several methods. Each of these methods should have examples.
For example:
struct Connection {
name: String,
stream: TcpStream,
}
impl Connection {
/// Sends a request over the connection.
///
/// # Example
/// ```no_run
/// # // Boilerplate is required to get an example working.
/// # let stream = TcpStream::connect("127.0.0.1:34254");
/// # let connection = Connection { name: "foo".to_owned(), stream };
/// # let request = Request::new("RequestId", RequestType::Get, "payload");
/// let response = connection.send_request(request);
/// assert!(response.is_ok());
/// ```
fn send_request(&self, request: Request) -> Result<Status, SendErr> {
// ...
}
/// Oh no, all that boilerplate needs to be repeated here!
fn check_status(&self) -> Status {
// ...
}
}
Example
Instead of typing all of this boilerplate to create a Connection and Request, it is easier to just create a wrapping dummy function which takes them as arguments:
struct Connection {
name: String,
stream: TcpStream,
}
impl Connection {
/// Sends a request over the connection.
///
/// # Example
/// ```
/// # fn call_send(connection: Connection, request: Request) {
/// let response = connection.send_request(request);
/// assert!(response.is_ok());
/// # }
/// ```
fn send_request(&self, request: Request) {
// ...
}
}
Note in the above example the line assert!(response.is_ok()); will not actually run while testing, because it is inside a function which is never invoked.
Advantages
This is much more concise and avoids repetitive code in examples.
Disadvantages
As the example is in a function, the code will not be tested. (Though it will still be checked to make sure it compiles when running cargo test.)
So this pattern is most useful where you would otherwise need no_run; with this approach, you do not need to add it.
Discussion
If assertions are not required this pattern works well.
If they are, an alternative can be to create a public method to create a dummy instance which is annotated with #[doc(hidden)] (so that users won't see it). Then this method can be called inside of rustdoc because it is part of the crate's public API.
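A rough sketch of that alternative, using a hypothetical dummy() constructor, a simplified Connection, and a made-up crate path in the doc example:
pub struct Connection {
    name: String,
}

impl Connection {
    /// Builds a throwaway instance for doc examples only.
    /// Hidden from the rendered documentation.
    #[doc(hidden)]
    pub fn dummy() -> Connection {
        Connection { name: "example".to_owned() }
    }

    /// Checks the status of the connection.
    ///
    /// # Example
    /// ```
    /// # let connection = my_crate::Connection::dummy();
    /// let status = connection.check_status();
    /// ```
    pub fn check_status(&self) -> bool {
        // ...
        true
    }
}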
Temporary mutability
Description
Often it is necessary to prepare and process some data, but after that the data is only inspected and never modified. The intention can be made explicit by redefining the mutable variable as immutable. It can be done either by processing data within a nested block or by redefining the variable.
Example
Say, a vector must be sorted before usage.
Using nested block:
let data = {
let mut data = get_vec();
data.sort();
data
};
// Here `data` is immutable.
Using variable rebinding:
let mut data = get_vec();
data.sort();
let data = data;
// Here `data` is immutable.
Advantages
Compiler ensures that you don't accidentally mutate data after some point.
Disadvantages
Nested block requires additional indentation of block body. One more line to return data from block or redefine variable.
Design Patterns
Design patterns are "general reusable solutions to a commonly occurring problem within a given context in software design". Design patterns are a great way to describe some of the culture and 'tribal knowledge' of programming in a language. Design patterns are very language-specific - what is a pattern in one language may be unnecessary in another due to a language feature, or impossible to express due to a missing feature.
If overused, design patterns can add unnecessary complexity to programs. However, they are a great way to share intermediate and advanced level knowledge about a programming language.
Design patterns in Rust
Rust has many very unique features. These features give us great benefit by removing whole classes of problems. Some of them are also patterns that are unique to Rust.
YAGNI
If you're not familiar with it, YAGNI is an acronym that stands for You Aren't Going to Need It. It's an important software design principle to apply as you write code.
The best code I ever wrote was code I never wrote.
If we apply YAGNI to design patterns, we see that the features of Rust allow us to throw out many patterns. For instance, there is no need for the strategy pattern in Rust because we can just use traits.
TODO: Maybe include some code to illustrate the traits.
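As a minimal sketch of that point, a trait with a couple of implementations can play the role a strategy interface would play in other languages (the names here are purely illustrative):
// The trait plays the role of the strategy interface.
trait CompressionStrategy {
    fn compress(&self, data: &[u8]) -> Vec<u8>;
}

struct NoCompression;
struct RunLength;

impl CompressionStrategy for NoCompression {
    fn compress(&self, data: &[u8]) -> Vec<u8> {
        data.to_vec()
    }
}

impl CompressionStrategy for RunLength {
    fn compress(&self, data: &[u8]) -> Vec<u8> {
        // A real algorithm would go here.
        data.to_vec()
    }
}

// Callers pick the "strategy" by choosing an implementation, either
// statically (generics) or dynamically (`&dyn CompressionStrategy`).
fn archive(strategy: &dyn CompressionStrategy, data: &[u8]) -> Vec<u8> {
    strategy.compress(data)
}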
Builder
Description
Construct an object with calls to a builder helper.
Example
#[derive(Debug, PartialEq)]
pub struct Foo {
    // Lots of complicated fields.
    bar: String,
}

pub struct FooBuilder {
    // Probably lots of optional fields.
    bar: String,
}

impl FooBuilder {
    pub fn new(/* ... */) -> FooBuilder {
        // Set the minimally required fields of Foo.
        FooBuilder {
            bar: String::from("X"),
        }
    }

    pub fn name(mut self, bar: String) -> FooBuilder {
        // Set the name on the builder itself, and return the builder by value.
        self.bar = bar;
        self
    }

    // If we can get away with not consuming the Builder here, that is an
    // advantage. It means we can use the FooBuilder as a template for
    // constructing many Foos.
    pub fn build(self) -> Foo {
        // Create a Foo from the FooBuilder, applying all settings in
        // FooBuilder to Foo.
        Foo { bar: self.bar }
    }
}

#[test]
fn builder_test() {
    let foo = Foo {
        bar: String::from("Y"),
    };
    let foo_from_builder: Foo = FooBuilder::new().name(String::from("Y")).build();
    assert_eq!(foo, foo_from_builder);
}
Motivation
Useful when you would otherwise require many different constructors or where construction has side effects.
Advantages
Separates methods for building from other methods.
Prevents proliferation of constructors
Can be used for one-liner initialisation as well as more complex construction.
Disadvantages
More complex than creating a struct object directly, or a simple constructor function.
Discussion
This pattern is seen more frequently in Rust (and for simpler objects) than in many other languages because Rust lacks overloading. Since you can only have a single method with a given name, having multiple constructors is less nice in Rust than in C++, Java, or others.
This pattern is often used where the builder object is useful in its own right, rather than being just a builder. For example, std::process::Command is a builder for Child (a process). In these cases, the T and TBuilder naming pattern is not used.
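For instance, a small sketch of Command used as such a builder (this assumes a Unix-like system where an echo binary exists):
use std::process::Command;

fn main() -> std::io::Result<()> {
    // `Command` is the builder; `output()` runs the configured child
    // process and collects what it printed.
    let output = Command::new("echo").arg("hello").output()?;
    println!("{}", String::from_utf8_lossy(&output.stdout));
    Ok(())
}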
The example takes and returns the builder by value. It is often more ergonomic (and more efficient) to take and return the builder as a mutable reference. The borrow checker makes this work naturally. This approach has the advantage that one can write code like
let mut fb = FooBuilder::new();
fb.a();
fb.b();
let f = fb.build();
as well as the FooBuilder::new().a().b().build() style.
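A minimal sketch of that non-consuming variant (the fields and setters are illustrative):
#[derive(Debug)]
pub struct Foo {
    a: bool,
    b: bool,
}

#[derive(Default)]
pub struct FooBuilder {
    a: bool,
    b: bool,
}

impl FooBuilder {
    pub fn new() -> FooBuilder {
        FooBuilder::default()
    }

    // Taking and returning `&mut self` lets callers keep the builder in a
    // variable and call the setters as plain statements.
    pub fn a(&mut self) -> &mut FooBuilder {
        self.a = true;
        self
    }

    pub fn b(&mut self) -> &mut FooBuilder {
        self.b = true;
        self
    }

    // `build` only borrows, so the same builder can be reused as a template.
    pub fn build(&self) -> Foo {
        Foo { a: self.a, b: self.b }
    }
}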
See also
- Description in the style guide
- derive_builder, a crate for automatically implementing this pattern while avoiding the boilerplate.
- Constructor pattern for when construction is simpler.
- Builder pattern (wikipedia)
- Builders enable construction of complex values (C-BUILDER) from the Rust API guidelines
Compose structs together for better borrowing
TODO - this is not a very snappy name
Description
Sometimes a large struct will cause issues with the borrow checker - although fields can be borrowed independently, sometimes the whole struct ends up being used at once, preventing other uses. A solution might be to decompose the struct into several smaller structs. Then compose these together into the original struct. Then each struct can be borrowed separately and have more flexible behaviour.
This will often lead to a better design in other ways: applying this design pattern often reveals smaller units of functionality.
Example
Here is a contrived example of where the borrow checker foils us in our plan to use a struct:
struct A {
    f1: u32,
    f2: u32,
    f3: u32,
}

fn foo(a: &mut A) -> &u32 { &a.f2 }
fn bar(a: &mut A) -> u32 { a.f1 + a.f3 }

fn baz(a: &mut A) {
    // The later usage of x causes a to be borrowed for the rest of the function.
    let x = foo(a);
    // Borrow checker error:
    // let y = bar(a); // ~ ERROR: cannot borrow `*a` as mutable more than once at a time
    println!("{}", x);
}
We can apply this design pattern and refactor A into two smaller structs, thus solving the borrow checking issue:
// A is now composed of two structs - B and C.
struct A {
    b: B,
    c: C,
}

struct B {
    f2: u32,
}

struct C {
    f1: u32,
    f3: u32,
}

// These functions take a B or C, rather than A.
fn foo(b: &mut B) -> &u32 { &b.f2 }
fn bar(c: &mut C) -> u32 { c.f1 + c.f3 }

fn baz(a: &mut A) {
    let x = foo(&mut a.b);
    // Now it's OK!
    let y = bar(&mut a.c);
    println!("{}", x);
}
Motivation
TODO Why and where you should use the pattern
Advantages
Lets you work around limitations in the borrow checker.
Often produces a better design.
Disadvantages
Leads to more verbose code.
Sometimes, the smaller structs are not good abstractions, and so we end up with a worse design. That is probably a 'code smell', indicating that the program should be refactored in some way.
Discussion
This pattern is not required in languages that don't have a borrow checker, so in that sense is unique to Rust. However, making smaller units of functionality often leads to cleaner code: a widely acknowledged principle of software engineering, independent of the language.
This pattern relies on Rust's borrow checker to be able to borrow fields independently of each other. In the example, the borrow checker knows that a.b and a.c are distinct and can be borrowed independently; it does not try to borrow all of a, which would make this pattern useless.
Entry API
Description
A short, prose description of the pattern.
Example
#![allow(unused)] fn main() { // An example of the pattern in action, should be mostly code, commented // liberally. }
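As a rough sketch of what the entry API looks like in practice (std::collections::HashMap; counting word frequencies is an illustrative choice):
use std::collections::HashMap;

fn main() {
    let mut counts: HashMap<&str, u32> = HashMap::new();

    for word in "a b a c a b".split_whitespace() {
        // `entry` looks the key up once and lets us insert or update in
        // place, instead of a separate lookup followed by an insert.
        *counts.entry(word).or_insert(0) += 1;
    }

    assert_eq!(counts["a"], 3);
}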
Motivation
Why and where you should use the pattern
Advantages
Good things about this pattern.
Disadvantages
Bad things about this pattern. Possible contraindications.
Discussion
TODO vs insert_or_update etc.
See also
FFI Patterns
Writing FFI code is an entire course in itself. However, there are several idioms here that can act as pointers, and avoid traps for inexperienced users of unsafe Rust.
This section contains design patterns that may be useful when doing FFI.
- Object-Based API design that has good memory safety characteristics, and a clean boundary of what is safe and what is unsafe
- Type Consolidation into Wrappers - group multiple Rust types together into an opaque "object"
Object-Based APIs
Description
When designing APIs in Rust which are exposed to other languages, there are some important design principles which are contrary to normal Rust API design:
- All Encapsulated types should be owned by Rust, managed by the user, and opaque.
- All Transactional data types should be owned by the user, and transparent.
- All library behavior should be functions acting upon Encapsulated types.
- All library behavior should be encapsulated into types not based on structure, but provenance/lifetime.
Motivation
Rust has built-in FFI support to other languages. It does this by providing a way for crate authors to provide C-compatible APIs through different ABIs (though that is unimportant to this practice).
Well-designed Rust FFI follows C API design principles, while compromising the design in Rust as little as possible. There are three goals with any foreign API:
- Make it easy to use in the target language.
- Avoid the API dictating internal unsafety on the Rust side as much as possible.
- Keep the potential for memory unsafety and Rust undefined behaviour as small as possible.
Rust code must trust the memory safety of the foreign language beyond a certain point.
However, every bit of unsafe code on the Rust side is an opportunity for bugs, or to exacerbate undefined behaviour.
For example, if a pointer provenance is wrong, that may be a segfault due to invalid memory access. But if it is manipulated by unsafe code, it could become full-blown heap corruption.
The Object-Based API design allows for writing shims that have good memory safety characteristics, and a clean boundary of what is safe and what is unsafe.
Code Example
The POSIX standard defines the API to access an on-file database, known as DBM. It is an excellent example of an "object-based" API.
Here is the definition in C, which hopefully should be easy to read for those involved in FFI. The commentary below should help explain it for those who miss the subtleties.
struct DBM;
typedef struct { void *dptr; size_t dsize; } datum;
int dbm_clearerr(DBM *);
void dbm_close(DBM *);
int dbm_delete(DBM *, datum);
int dbm_error(DBM *);
datum dbm_fetch(DBM *, datum);
datum dbm_firstkey(DBM *);
datum dbm_nextkey(DBM *);
DBM *dbm_open(const char *, int, mode_t);
int dbm_store(DBM *, datum, datum, int);
This API defines two types: DBM and datum.
The DBM type was called an "encapsulated" type above. It is designed to contain internal state, and acts as an entry point for the library's behavior. It is completely opaque to the user, who cannot create a DBM themselves since they don't know its size or layout. Instead, they must call dbm_open, and that only gives them a pointer to one.
This means all DBMs are "owned" by the library in a Rust sense. The internal state of unknown size is kept in memory controlled by the library, not the user. The user can only manage its life cycle with open and close, and perform operations on it with the other functions.
The datum type was called a "transactional" type above. It is designed to facilitate the exchange of information between the library and its user.
The database is designed to store "unstructured data", with no pre-defined length or meaning. As a result, the datum is the C equivalent of a Rust slice: a bunch of bytes, and a count of how many there are. The main difference is that there is no type information, which is what void indicates.
Keep in mind that this header is written from the library's point of view.
The user likely has some type they are using, which has a known size.
But the library does not care, and by the rules of C casting, any type behind a pointer can be cast to void.
As noted earlier, this type is transparent to the user. But also, this type is owned by the user. This has subtle ramifications, due to that pointer inside it. The question is, who owns the memory that pointer points to?
The answer for best memory safety is, "the user".
But in cases such as retrieving a value, the user does not know how to allocate it correctly (since they don't know how long the value is).
In this case, the library code is expected to use the heap that the user has access to -- such as the C library malloc and free -- and then transfer ownership in the Rust sense.
This may all seem speculative, but this is what a pointer means in C. It means the same thing as Rust: "user defined lifetime." The user of the library needs to read the documentation in order to use it correctly. That said, there are some decisions that have fewer or greater consequences if users do it wrong. Minimizing those is what this best practice is about, and the key is to transfer ownership of everything that is transparent.
Advantages
This minimizes the number of memory safety guarantees the user must uphold to a relatively small number:
- Do not call any function with a pointer not returned by dbm_open (invalid access or corruption).
- Do not call any function on a pointer after close (use after free).
- The dptr on any datum must be NULL, or point to a valid slice of memory at the advertised length.
In addition, it avoids a lot of pointer provenance issues. To understand why, let us consider an alternative in some depth: key iteration.
Rust is well known for its iterators.
When implementing one, the programmer makes a separate type with a bounded lifetime to its owner, and implements the Iterator trait. Here is how iteration would be done in Rust for DBM:
struct Dbm { ... }
impl Dbm {
/* ... */
pub fn keys<'it>(&'it self) -> DbmKeysIter<'it> { ... }
/* ... */
}
struct DbmKeysIter<'it> {
owner: &'it Dbm,
}
impl<'it> Iterator for DbmKeysIter<'it> { ... }
This is clean, idiomatic, and safe, thanks to Rust's guarantees. However, consider what a straightforward API translation would look like:
#[no_mangle]
pub extern "C" fn dbm_iter_new(owner: *const Dbm) -> *mut DbmKeysIter {
/* THIS API IS A BAD IDEA! For real applications, use object-based design instead. */
}
#[no_mangle]
pub extern "C" fn dbm_iter_next(iter: *mut DbmKeysIter, key_out: *const datum) -> libc::c_int {
/* THIS API IS A BAD IDEA! For real applications, use object-based design instead. */
}
#[no_mangle]
pub extern "C" fn dbm_iter_del(*mut DbmKeysIter) {
/* THIS API IS A BAD IDEA! For real applications, use object-based design instead. */
}
This API loses a key piece of information: the lifetime of the iterator must not exceed the lifetime of the Dbm object that owns it.
A user of the library could use it in a way which causes the iterator to outlive the data it is iterating on, resulting in reading uninitialized memory.
This example written in C contains a bug that will be explained afterwards:
int count_key_sizes(DBM *db) {
/* DO NOT USE THIS FUNCTION. IT HAS A SUBTLE BUT SERIOUS BUG! */
datum key;
int len = 0;
if (!dbm_iter_new(db)) {
dbm_close(db);
return -1;
}
int l;
while ((l = dbm_iter_next(db, &key)) >= 0) { // an error is indicated by -1
free(key.dptr);
len += key.dsize;
if (l == 0) { // end of the iterator
dbm_close(db);
}
}
if (l >= 0) {
return -1;
} else {
return len;
}
}
This bug is a classic. Here's what happens when the iterator returns the end-of-iteration marker:
- The loop condition sets l to zero, and enters the loop because 0 >= 0.
- The length is incremented, in this case by zero.
- The if statement is true, so the database is closed. There should be a break statement here.
- The loop condition executes again, causing a next call on the closed object.
The worst part about this bug?
If the Rust implementation was careful, this code will work most of the time!
If the memory for the Dbm object is not immediately reused, an internal check will almost certainly fail, resulting in the iterator returning a -1 indicating an error. But occasionally, it will cause a segmentation fault, or even worse, nonsensical memory corruption!
None of this can be avoided by Rust. From its perspective, it put those objects on its heap, returned pointers to them, and gave up control of their lifetimes. The C code simply must "play nice".
The programmer must read and understand the API documentation.
While some consider that par for the course in C, a good API design can mitigate this risk.
The POSIX API for DBM
did this by consolidating the ownership of the iterator with its parent:
datum dbm_firstkey(DBM *);
datum dbm_nextkey(DBM *);
Thus, all of the lifetimes were bound together, and such unsafety was prevented.
Disadvantages
However, this design choice also has a number of drawbacks, which should be considered as well.
First, the API itself becomes less expressive. With POSIX DBM, there is only one iterator per object, and every call changes its state. This is much more restrictive than iterators in almost any language, even though it is safe. For other related objects whose lifetimes are less hierarchical, this limitation may cost more than the safety it buys.
Second, depending on the relationships of the API's parts, significant design effort may be involved. Many of the easier design points have other patterns associated with them:
- Wrapper Type Consolidation groups multiple Rust types together into an opaque "object".
- FFI Error Passing explains error handling with integer codes and sentinel return values (such as `NULL` pointers).
- Accepting Foreign Strings allows accepting strings with minimal unsafe code, and is easier to get right than Passing Strings to FFI.
However, not every API can be done this way. It is up to the best judgement of the programmer as to who their audience is.
Type Consolidation into Wrappers
Description
This pattern is designed to allow gracefully handling multiple related types, while minimizing the surface area for memory unsafety.
One of the cornerstones of Rust's aliasing rules is lifetimes. This ensures that many patterns of access between types can be memory safe, data race safety included.
However, when Rust types are exported to other languages, they are usually transformed into pointers. In Rust, a pointer means "the user manages the lifetime of the pointee." It is their responsibility to avoid memory unsafety.
Some level of trust in the user code is thus required, notably around use-after-free which Rust can do nothing about. However, some API designs place higher burdens than others on the code written in the other language.
The lowest risk API is the "consolidated wrapper", where all possible interactions with an object are folded into a "wrapper type", while keeping the Rust API clean.
Code Example
To understand this, let us look at a classic example of an API to export: iteration through a collection.
That API looks like this:
- The iterator is initialized with `first_key`.
- Each call to `next_key` will advance the iterator.
- Calls to `next_key` if the iterator is at the end will do nothing.
- As noted above, the iterator is "wrapped into" the collection (unlike the native Rust API).
If the iterator implements nth()
efficiently, then it is possible to make it ephemeral to each function call:
struct MySetWrapper {
myset: MySet,
iter_next: usize,
}
impl MySetWrapper {
pub fn first_key(&mut self) -> Option<&Key> {
self.iter_next = 0;
self.next_key()
}
pub fn next_key(&mut self) -> Option<&Key> {
if let Some(next) = self.myset.keys().nth(self.iter_next) {
self.iter_next += 1;
Some(next)
} else {
None
}
}
}
As a result, the wrapper is simple and contains no unsafe
code.
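To illustrate how such a wrapper might then be exposed over FFI, here is a rough, self-contained sketch. The function name, the `BTreeSet<String>` stand-in for the collection, and the output-parameter convention are all illustrative assumptions, not part of the pattern itself:

```rust
use std::collections::BTreeSet;
use std::os::raw::{c_char, c_int};

// Stand-in for the wrapper above: the collection and the iteration cursor
// live in one struct, so the C caller only ever holds one pointer.
pub struct MySetWrapper {
    myset: BTreeSet<String>,
    iter_next: usize,
}

#[no_mangle]
pub extern "C" fn myset_next_key(
    myset: *mut MySetWrapper,
    key_out: *mut *const c_char,
    len_out: *mut usize,
) -> c_int {
    // SAFETY: sketch only; a real implementation must check for NULL pointers
    // and document the pointer contract to the C caller.
    let wrapper = unsafe { &mut *myset };
    match wrapper.myset.iter().nth(wrapper.iter_next) {
        Some(key) => {
            wrapper.iter_next += 1;
            unsafe {
                // The returned pointer borrows from the wrapper; the caller
                // must not use it after mutating or freeing the wrapper.
                *key_out = key.as_ptr() as *const c_char;
                *len_out = key.len();
            }
            1 // a key was produced
        }
        None => 0, // end of iteration
    }
}
```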
Advantages
This makes APIs safer to use, avoiding issues with lifetimes between types. See Object-Based APIs for more on the advantages and pitfalls this avoids.
Disadvantages
Often, wrapping types is quite difficult, and sometimes a Rust API compromise would make things easier.
As an example, consider an iterator which does not efficiently implement `nth()`. It would definitely be worth putting in special logic to make the object handle iteration internally, or to support a different access pattern efficiently that only the Foreign Function API will use.
Trying to Wrap Iterators (and Failing)
To wrap any type of iterator into the API correctly, the wrapper would need to do what a C version of the code would do: erase the lifetime of the iterator, and manage it manually.
Suffice it to say, this is incredibly difficult.
Here is an illustration of just one pitfall.
A first version of MySetWrapper
would look like this:
struct MySetWrapper {
myset: MySet,
iter_next: usize,
// created from a transmuted Box<KeysIter + 'self>
iterator: Option<NonNull<KeysIter<'static>>>,
}
With `transmute` being used to extend a lifetime, and a pointer to hide it, it's ugly already. But it gets even worse: any other operation can cause Rust undefined behaviour.
Consider that the MySet
in the wrapper could be manipulated by other functions during iteration, such as storing a new value to the key it was iterating over.
The API doesn't discourage this, and in fact some similar C libraries expect it.
A simple implementation of myset_store
would be:
pub mod unsafe_module {
// other module content
pub fn myset_store(
myset: *mut MySetWrapper,
key: datum,
value: datum) -> libc::c_int {
/* DO NOT USE THIS CODE. IT IS UNSAFE TO DEMONSTRATE A PROBLEM. */
let myset: &mut MySet = unsafe { // SAFETY: whoops, UB occurs in here!
&mut (*myset).myset
};
/* ...check and cast key and value data... */
match myset.store(casted_key, casted_value) {
Ok(_) => 0,
Err(e) => e.into()
}
}
}
If the iterator exists when this function is called, we have violated one of Rust's aliasing rules.
According to Rust, the mutable reference in this block must have exclusive access to the object.
If the iterator simply exists, it's not exclusive, so we have undefined behaviour!
To avoid this, we must have a way of ensuring that mutable reference really is exclusive. That basically means clearing out the iterator's shared reference while it exists, and then reconstructing it. In most cases, that will still be less efficient than the C version.
Some may ask: how can C do this more efficiently? The answer is, it cheats. Rust's aliasing rules are the problem, and C simply ignores them for its pointers. In exchange, it is common to see code that is declared in the manual as "not thread safe" under some or all circumstances. In fact, the GNU C library has an entire lexicon dedicated to concurrent behavior!
Rust would rather make everything memory safe all the time, for both safety and optimizations that C code cannot attain. Being denied access to certain shortcuts is the price Rust programmers need to pay.
For the C programmers out there scratching their heads, the iterator does not need to be read during this code for it to cause UB. The exclusivity rule also enables compiler optimizations which may cause inconsistent observations by the iterator's shared reference (e.g. stack spills or reordering instructions for efficiency). These observations may happen any time after the mutable reference is created.
Fold
Description
Run an algorithm over each item in a collection of data to create a new item, thus creating a whole new collection.
The etymology here is unclear to me. The terms 'fold' and 'folder' are used in the Rust compiler, although it appears to me to be more like a map than a fold in the usual sense. See the discussion below for more details.
Example
// The data we will fold, a simple AST.
mod ast {
pub enum Stmt {
Expr(Box<Expr>),
Let(Box<Name>, Box<Expr>),
}
pub struct Name {
value: String,
}
pub enum Expr {
IntLit(i64),
Add(Box<Expr>, Box<Expr>),
Sub(Box<Expr>, Box<Expr>),
}
}
// The abstract folder
mod fold {
use ast::*;
pub trait Folder {
// A leaf node just returns the node itself. In some cases, we can do this
// to inner nodes too.
fn fold_name(&mut self, n: Box<Name>) -> Box<Name> { n }
// Create a new inner node by folding its children.
fn fold_stmt(&mut self, s: Box<Stmt>) -> Box<Stmt> {
match *s {
Stmt::Expr(e) => Box::new(Stmt::Expr(self.fold_expr(e))),
Stmt::Let(n, e) => Box::new(Stmt::Let(self.fold_name(n), self.fold_expr(e))),
}
}
fn fold_expr(&mut self, e: Box<Expr>) -> Box<Expr> { ... }
}
}
use fold::*;
use ast::*;
// An example concrete implementation - renames every name to 'foo'.
struct Renamer;
impl Folder for Renamer {
fn fold_name(&mut self, n: Box<Name>) -> Box<Name> {
Box::new(Name { value: "foo".to_owned() })
}
// Use the default methods for the other nodes.
}
The result of running the Renamer
on an AST is a new AST identical to the old
one, but with every name changed to foo
. A real life folder might have some
state preserved between nodes in the struct itself.
A folder can also be defined to map one data structure to a different (but usually similar) data structure. For example, we could fold an AST into a HIR tree (HIR stands for high-level intermediate representation).
Motivation
It is common to want to map a data structure by performing some operation on
each node in the structure. For simple operations on simple data structures,
this can be done using Iterator::map
. For more complex operations, perhaps
where earlier nodes can affect the operation on later nodes, or where iteration
over the data structure is non-trivial, using the fold pattern is more
appropriate.
Like the visitor pattern, the fold pattern allows us to separate traversal of a data structure from the operations performed to each node.
Discussion
Mapping data structures in this fashion is common in functional languages. In OO languages, it would be more common to mutate the data structure in place. The 'functional' approach is common in Rust, mostly due to the preference for immutability. Using fresh data structures, rather than mutating old ones, makes reasoning about the code easier in most circumstances.
The trade-off between efficiency and reusability can be tweaked by changing how
nodes are accepted by the fold_*
methods.
In the above example we operate on Box
pointers. Since these own their data
exclusively, the original copy of the data structure cannot be re-used. On the
other hand if a node is not changed, reusing it is very efficient.
If we were to operate on borrowed references, the original data structure can be reused; however, a node must be cloned even if unchanged, which can be expensive.
Using a reference counted pointer gives the best of both worlds - we can reuse the original data structure and we don't need to clone unchanged nodes. However, they are less ergonomic to use and mean that the data structures cannot be mutable.
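As a rough sketch of that reference-counted variant (an illustrative, cut-down AST rather than the exact `ast` module above), the folder can hand unchanged leaves back by cloning the `Rc`, which is only a reference-count bump:

```rust
use std::rc::Rc;

// Illustrative Rc-based AST node; not the exact types from the example above.
pub enum Expr {
    IntLit(i64),
    Add(Rc<Expr>, Rc<Expr>),
}

pub trait RcFolder {
    // A leaf that is not changed is "reused" by cloning the Rc (a cheap
    // refcount bump); the original tree stays intact and shareable.
    fn fold_expr(&mut self, e: Rc<Expr>) -> Rc<Expr> {
        match *e {
            Expr::IntLit(_) => Rc::clone(&e),
            Expr::Add(ref lhs, ref rhs) => Rc::new(Expr::Add(
                self.fold_expr(Rc::clone(lhs)),
                self.fold_expr(Rc::clone(rhs)),
            )),
        }
    }
}
```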
See also
Iterators have a `fold` method; however, this folds a data structure into a value, rather than into a new data structure. An iterator's `map` is more like this fold pattern.
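To make the contrast concrete, here is a quick sketch of the two iterator methods being referred to:

```rust
fn main() {
    let v = vec![1, 2, 3];

    // Iterator::fold collapses the sequence into a single value...
    let sum: i32 = v.iter().fold(0, |acc, x| acc + x);

    // ...whereas Iterator::map produces a new collection, which is closer
    // to the fold pattern described in this article.
    let doubled: Vec<i32> = v.iter().map(|x| x * 2).collect();

    assert_eq!(sum, 6);
    assert_eq!(doubled, vec![2, 4, 6]);
}
```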
In other languages, fold is usually used in the sense of Rust's iterators, rather than this pattern. Some functional languages have powerful constructs for performing flexible maps over data structures.
The visitor pattern is closely related to fold. They share the concept of walking a data structure performing an operation on each node. However, the visitor does not create a new data structure nor consume the old one.
Newtype
What if in some cases we want a type to behave similarly to another type or enforce some behaviour at compile time where using only type aliases would not be enough?
For example, if we want to create a custom Display
implementation for String
due to security considerations (e.g. passwords).
For such cases we could use the Newtype
pattern to provide type safety and encapsulation.
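As a rough sketch of that password example (the `Password` name and the redaction behaviour here are illustrative assumptions, not a prescribed API):

```rust
use std::fmt;

// A newtype around String whose Display implementation redacts the contents.
pub struct Password(String);

impl fmt::Display for Password {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "****")
    }
}

fn main() {
    let secret = Password("hunter2".to_owned());
    println!("{}", secret); // prints "****", never the wrapped string
}
```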
Description
Use a tuple struct with a single field to make an opaque wrapper for a type.
This creates a new type, rather than an alias to a type (`type` items).
Example
// Some type, not necessarily in the same module or even crate.
struct Foo {
//..
}
impl Foo {
// These functions are not present on Bar.
//..
}
// The newtype.
pub struct Bar(Foo);
impl Bar {
// Constructor.
pub fn new(
//..
) -> Bar {
//..
}
//..
}
fn main() {
let b = Bar::new(...);
// Foo and Bar are type incompatible, the following do not type check.
// let f: Foo = b;
// let b: Bar = Foo { ... };
}
Motivation
The primary motivation for newtypes is abstraction. It allows you to share implementation details between types while precisely controlling the interface. By using a newtype rather than exposing the implementation type as part of an API, it allows you to change implementation backwards compatibly.
Newtypes can be used for distinguishing units, e.g., wrapping `f64` to give distinguishable `Miles` and `Kms`.
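A minimal sketch of that idea (the conversion method is illustrative):

```rust
// Wrapping f64 in distinct newtypes prevents accidentally mixing units.
struct Miles(f64);
struct Kms(f64);

impl Miles {
    fn to_kms(&self) -> Kms {
        Kms(self.0 * 1.609_344)
    }
}

fn main() {
    let distance = Miles(100.0);
    let as_km = distance.to_kms();
    println!("{} km", as_km.0);
    // let wrong: Kms = distance; // does not compile: Miles and Kms are distinct types
}
```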
Advantages
The wrapped and wrapper types are not type compatible (as opposed to using `type`), so users of the newtype will never 'confuse' the wrapped and wrapper types.
Newtypes are a zero-cost abstraction - there is no runtime overhead.
The privacy system ensures that users cannot access the wrapped type (if the field is private, which it is by default).
Disadvantages
The downside of newtypes (especially compared with type aliases) is that there is no special language support. This means there can be a lot of boilerplate. You need a 'pass through' method for every method you want to expose on the wrapped type, and an impl for every trait you want to also be implemented for the wrapper type.
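For instance, a minimal self-contained sketch of one such pass-through method (`do_thing` is a hypothetical method name, not from the example above):

```rust
struct Foo;

impl Foo {
    fn do_thing(&self) {
        println!("doing the thing");
    }
}

pub struct Bar(Foo);

impl Bar {
    // A 'pass through' method delegating to the wrapped Foo.
    pub fn do_thing(&self) {
        self.0.do_thing()
    }
}

fn main() {
    Bar(Foo).do_thing();
}
```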
Discussion
Newtypes are very common in Rust code. Abstraction or representing units are the most common uses, but they can be used for other reasons:
- restricting functionality (reduce the functions exposed or traits implemented),
- making a type with copy semantics have move semantics,
- abstraction by providing a more concrete type and thus hiding internal types, e.g.,
pub struct Foo(Bar<T1, T2>);
Here, `Bar` might be some public, generic type and `T1` and `T2` are some internal types. Users of our module shouldn't know that we implement `Foo` by using a `Bar`, but what we're really hiding here is the types `T1` and `T2`, and how they are used with `Bar`.
See also
- Advanced Types in the book
- Newtypes in Haskell
- Type aliases
- derive_more, a crate for deriving many builtin traits on newtypes.
- The Newtype Pattern In Rust
RAII with guards
Description
RAII stands for "Resource Acquisition is Initialisation" which is a terrible name. The essence of the pattern is that resource initialisation is done in the constructor of an object and finalisation in the destructor. This pattern is extended in Rust by using an RAII object as a guard of some resource and relying on the type system to ensure that access is always mediated by the guard object.
Example
Mutex guards are the classic example of this pattern from the std library (this is a simplified version of the real implementation):
use std::ops::Deref;
struct Foo {}
struct Mutex<T> {
// We keep a reference to our data: T here.
//..
}
struct MutexGuard<'a, T: 'a> {
data: &'a T,
//..
}
// Locking the mutex is explicit.
impl<T> Mutex<T> {
fn lock(&self) -> MutexGuard<T> {
// Lock the underlying OS mutex.
//..
// MutexGuard keeps a reference to self
MutexGuard {
data: self,
//..
}
}
}
// Destructor for unlocking the mutex.
impl<'a, T> Drop for MutexGuard<'a, T> {
fn drop(&mut self) {
// Unlock the underlying OS mutex.
//..
}
}
// Implementing Deref means we can treat MutexGuard like a pointer to T.
impl<'a, T> Deref for MutexGuard<'a, T> {
type Target = T;
fn deref(&self) -> &T {
self.data
}
}
fn baz(x: Mutex<Foo>) {
let xx = x.lock();
xx.foo(); // foo is a method on Foo.
// The borrow checker ensures we can't store a reference to the underlying
// Foo which will outlive the guard xx.
// x is unlocked when we exit this function and xx's destructor is executed.
}
Motivation
Where a resource must be finalised after use, RAII can be used to do this finalisation. If it is an error to access that resource after finalisation, then this pattern can be used to prevent such errors.
Advantages
Prevents errors where a resource is not finalised and where a resource is used after finalisation.
Discussion
RAII is a useful pattern for ensuring resources are properly deallocated or finalised. We can make use of the borrow checker in Rust to statically prevent errors stemming from using resources after finalisation takes place.
The core aim of the borrow checker is to ensure that references to data do not
outlive that data. The RAII guard pattern works because the guard object
contains a reference to the underlying resource and only exposes such
references. Rust ensures that the guard cannot outlive the underlying resource
and that references to the resource mediated by the guard cannot outlive the
guard. To see how this works it is helpful to examine the signature of deref
without lifetime elision:
fn deref<'a>(&'a self) -> &'a T {
//..
}
The returned reference to the resource has the same lifetime as `self` (`'a`). The borrow checker therefore ensures that the lifetime of the reference to `T` is shorter than the lifetime of `self`.
Note that implementing `Deref` is not a core part of this pattern; it only makes using the guard object more ergonomic. Implementing a `get` method on the guard works just as well.
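Here is a rough, self-contained sketch of that alternative (all names are illustrative, not from any particular library): a guard that exposes the resource through an accessor method and releases it in `Drop`.

```rust
struct Connection;

impl Connection {
    fn query(&self, _sql: &str) { /* .. */ }
}

struct Pool {
    conn: Connection,
}

// RAII guard: holding one means the connection is checked out of the pool.
struct PooledConn<'a> {
    pool: &'a Pool,
}

impl Pool {
    fn get(&self) -> PooledConn<'_> {
        // Mark the connection as checked out here.
        PooledConn { pool: self }
    }
}

impl<'a> PooledConn<'a> {
    // An accessor method instead of Deref; the returned reference cannot
    // outlive the guard, which in turn cannot outlive the pool.
    fn conn(&self) -> &Connection {
        &self.pool.conn
    }
}

impl<'a> Drop for PooledConn<'a> {
    fn drop(&mut self) {
        // Hand the connection back to the pool here.
    }
}

fn main() {
    let pool = Pool { conn: Connection };
    let guard = pool.get();
    guard.conn().query("SELECT 1");
    // guard's destructor runs here, returning the connection.
}
```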
See also
Finalisation in destructors idiom
RAII is a common pattern in C++: cppreference.com, wikipedia.
Style guide entry (currently just a placeholder).
Prefer small crates
Description
Prefer small crates that do one thing well.
Cargo and crates.io make it easy to add third-party libraries, much more so than in, say, C or C++. Moreover, since packages on crates.io cannot be edited or removed after publication, any build that works now should continue to work in the future. We should take advantage of this tooling, and use smaller, more fine-grained dependencies.
Advantages
- Small crates are easier to understand, and encourage more modular code.
- Crates allow for re-using code between projects. For example, the
url
crate was developed as part of the Servo browser engine, but has since found wide use outside the project. - Since the compilation unit of Rust is the crate, splitting a project into multiple crates can allow more of the code to be built in parallel.
Disadvantages
- This can lead to "dependency hell", when a project depends on multiple conflicting versions of a crate at the same time. For example, the `url` crate has both versions 1.0 and 0.5. Since the `Url` from `url:1.0` and the `Url` from `url:0.5` are different types, an HTTP client that uses `url:0.5` would not accept `Url` values from a web scraper that uses `url:1.0`.
- Packages on crates.io are not curated. A crate may be poorly written, have unhelpful documentation, or be outright malicious.
- Two small crates may be less optimized than one large one, since the compiler does not perform link-time optimization (LTO) by default.
Examples
The `ref_slice` crate provides functions for converting `&T` to `&[T]`.

The `url` crate provides tools for working with URLs.

The `num_cpus` crate provides a function to query the number of CPUs on a machine.
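For instance, a minimal use of the last of these (assuming `num_cpus` has been added as a dependency):

```rust
fn main() {
    // Query the number of logical CPUs via the num_cpus crate.
    println!("logical CPUs: {}", num_cpus::get());
}
```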
See also
Contain unsafety in small modules
Description
If you have unsafe
code, create the smallest possible module that can uphold the needed invariants to build a minimal safe interface upon the unsafety. Embed this into a larger module that contains only safe code and presents an ergonomic interface. Note that the outer module can contain unsafe functions and methods that call directly into the unsafe code. Users may use this to gain speed benefits.
Advantages
- This restricts the unsafe code that must be audited
- Writing the outer module is much easier, since you can count on the guarantees of the inner module
Disadvantages
- Sometimes, it may be hard to find a suitable interface.
- The abstraction may introduce inefficiencies.
Examples
- The `toolshed` crate contains its unsafe operations in submodules, presenting a safe interface to users.
- `std`'s `String` type is a wrapper over `Vec<u8>` with the added invariant that the contents must be valid UTF-8. The operations on `String` ensure this behavior. However, users have the option of using an `unsafe` method to create a `String`, in which case the onus is on them to guarantee the validity of the contents.
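In the same spirit as the `String` example, here is a minimal, self-contained sketch of the idea; the `ascii` module and its names are illustrative, not from any real crate:

```rust
mod ascii {
    /// A byte string guaranteed to contain only ASCII.
    pub struct AsciiStr {
        bytes: Vec<u8>,
    }

    impl AsciiStr {
        /// Safe constructor: checks the invariant up front.
        pub fn new(bytes: Vec<u8>) -> Option<AsciiStr> {
            if bytes.iter().all(|b| b.is_ascii()) {
                Some(AsciiStr { bytes })
            } else {
                None
            }
        }

        /// Unsafe constructor for callers who can guarantee the invariant
        /// themselves (for example, for speed), mirroring the idea behind
        /// String::from_utf8_unchecked.
        ///
        /// # Safety
        /// `bytes` must contain only ASCII bytes.
        pub unsafe fn new_unchecked(bytes: Vec<u8>) -> AsciiStr {
            AsciiStr { bytes }
        }

        /// Because the invariant is established at construction, the unsafe
        /// std conversion below is sound, and callers never see `unsafe`.
        pub fn as_str(&self) -> &str {
            unsafe { std::str::from_utf8_unchecked(&self.bytes) }
        }
    }
}

fn main() {
    let s = ascii::AsciiStr::new(b"hello".to_vec()).unwrap();
    assert_eq!(s.as_str(), "hello");
}
```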
See also
Visitor
Description
A visitor encapsulates an algorithm that operates over a heterogeneous collection of objects. It allows multiple different algorithms to be written over the same data without having to modify the data (or their primary behaviour).
Furthermore, the visitor pattern allows separating the traversal of a collection of objects from the operations performed on each object.
Example
// The data we will visit
mod ast {
pub enum Stmt {
Expr(Expr),
Let(Name, Expr),
}
pub struct Name {
value: String,
}
pub enum Expr {
IntLit(i64),
Add(Box<Expr>, Box<Expr>),
Sub(Box<Expr>, Box<Expr>),
}
}
// The abstract visitor
mod visit {
use ast::*;
pub trait Visitor<T> {
fn visit_name(&mut self, n: &Name) -> T;
fn visit_stmt(&mut self, s: &Stmt) -> T;
fn visit_expr(&mut self, e: &Expr) -> T;
}
}
use visit::*;
use ast::*;
// An example concrete implementation - walks the AST interpreting it as code.
struct Interpreter;
impl Visitor<i64> for Interpreter {
fn visit_name(&mut self, n: &Name) -> i64 { panic!() }
fn visit_stmt(&mut self, s: &Stmt) -> i64 {
match *s {
Stmt::Expr(ref e) => self.visit_expr(e),
Stmt::Let(..) => unimplemented!(),
}
}
fn visit_expr(&mut self, e: &Expr) -> i64 {
match *e {
Expr::IntLit(n) => n,
Expr::Add(ref lhs, ref rhs) => self.visit_expr(lhs) + self.visit_expr(rhs),
Expr::Sub(ref lhs, ref rhs) => self.visit_expr(lhs) - self.visit_expr(rhs),
}
}
}
One could implement further visitors, for example a type checker, without having to modify the AST data.
Motivation
The visitor pattern is useful anywhere that you want to apply an algorithm to heterogeneous data. If data is homogeneous, you can use an iterator-like pattern. Using a visitor object (rather than a functional approach) allows the visitor to be stateful and thus communicate information between nodes.
Discussion
It is common for the `visit_*` methods to return void (unlike in the example above). In that case it is possible to factor out the traversal code and share it between algorithms (and also to provide noop default methods). In Rust, the common way to do this is to provide `walk_*` functions for each datum. For example,
pub fn walk_expr(visitor: &mut dyn Visitor<()>, e: &Expr) {
match *e {
Expr::IntLit(_) => {},
Expr::Add(ref lhs, ref rhs) => {
visitor.visit_expr(lhs);
visitor.visit_expr(rhs);
}
Expr::Sub(ref lhs, ref rhs) => {
visitor.visit_expr(lhs);
visitor.visit_expr(rhs);
}
}
}
In other languages (e.g., Java) it is common for data to have an accept
method
which performs the same duty.
See also
The visitor pattern is a common pattern in most OO languages.
The fold pattern is similar to visitor but produces a new version of the visited data structure.
Anti-patterns
An anti-pattern is a solution to a "recurring problem that is usually ineffective and risks being highly counterproductive". Just as valuable as knowing how to solve a problem, is knowing how not to solve it. Anti-patterns give us great counter-examples to consider relative to design patterns. Anti-patterns are not confined to code. For example, a process can be an anti-pattern, too.
#![deny(warnings)]
Description
A well-intentioned crate author wants to ensure their code builds without warnings. So they annotate their crate root with the following:
Example
#![deny(warnings)]

// All is well.
Advantages
It is short and will stop the build if anything is amiss.
Drawbacks
By not allowing the compiler to build with warnings, a crate author opts out of Rust's famed stability. Sometimes new features or old misfeatures need a change in how things are done, thus lints are written that `warn` for a certain grace period before being turned to `deny`.
For example, it was discovered that a type could have two `impl`s with the same method. This was deemed a bad idea, but in order to make the transition smooth, the `overlapping-inherent-impls` lint was introduced to give a warning to those stumbling on this fact, before it becomes a hard error in a future release.
Also sometimes APIs get deprecated, so their use will emit a warning where before there was none.
All this conspires to potentially break the build whenever something changes.
Furthermore, crates that supply additional lints (e.g. rust-clippy) can no longer be used unless the annotation is removed. This is mitigated with `--cap-lints`. The `--cap-lints=warn` command line argument turns all `deny` lint errors into warnings. But be aware that `forbid` lints are stronger than `deny`, hence the 'forbid' level cannot be overridden to be anything lower than an error. As a result, `forbid` lints will still stop compilation.
Alternatives
There are two ways of tackling this problem: First, we can decouple the build setting from the code, and second, we can name the lints we want to deny explicitly.
The following command line will build with all warnings set to deny
:
RUSTFLAGS="-D warnings" cargo build
This can be done by any individual developer (or be set in a CI tool like Travis, but remember that this may break the build when something changes) without requiring a change to the code.
Alternatively, we can specify the lints that we want to deny
in the code.
Here is a list of warning lints that is (hopefully) safe to deny:
#![deny(bad_style,
        const_err,
        dead_code,
        extra_requirement_in_impl,
        improper_ctypes,
        legacy_directory_ownership,
        non_shorthand_field_patterns,
        no_mangle_generic_items,
        overflowing_literals,
        path_statements,
        patterns_in_fns_without_body,
        plugin_as_library,
        private_in_public,
        private_no_mangle_fns,
        private_no_mangle_statics,
        raw_pointer_derive,
        safe_extern_statics,
        unconditional_recursion,
        unions_with_drop_fields,
        unused,
        unused_allocation,
        unused_comparisons,
        unused_parens,
        while_true)]
In addition, the following `allow`ed lints may be a good idea to `deny`:
#![deny(missing_debug_implementations,
        missing_docs,
        trivial_casts,
        trivial_numeric_casts,
        unused_extern_crates,
        unused_import_braces,
        unused_qualifications,
        unused_results)]
Some may also want to add `missing_copy_implementations` to their list.
Note that we explicitly did not add the deprecated
lint, as it is fairly
certain that there will be more deprecated APIs in the future.
See also
- deprecate attribute documentation
- Type `rustc -W help` for a list of lints on your system. Also type `rustc --help` for a general list of options.
- rust-clippy is a collection of lints for better Rust code.
Deref polymorphism
Description
Abuse the Deref
trait to emulate inheritance between structs, and thus reuse
methods.
Example
Sometimes we want to emulate the following common pattern from OO languages such as Java:
class Foo {
void m() { ... }
}
class Bar extends Foo {}
public static void main(String[] args) {
Bar b = new Bar();
b.m();
}
We can use the deref polymorphism anti-pattern to do so:
use std::ops::Deref;
struct Foo {}
impl Foo {
fn m(&self) {
//..
}
}
struct Bar {
f: Foo
}
impl Deref for Bar {
type Target = Foo;
fn deref(&self) -> &Foo {
&self.f
}
}
fn main() {
let b = Bar { f: Foo {} };
b.m();
}
There is no struct inheritance in Rust. Instead we use composition and include
an instance of Foo
in Bar
(since the field is a value, it is stored inline,
so if there were fields, they would have the same layout in memory as the Java
version (probably, you should use #[repr(C)]
if you want to be sure)).
In order to make the method call work we implement `Deref` for `Bar` with `Foo` as the target (returning the embedded `Foo` field). That means that when we dereference a `Bar` (for example, using `*`) then we will get a `Foo`. That is pretty weird. Dereferencing usually gives a `T` from a reference to `T`; here we have two unrelated types. However, since the dot operator does implicit dereferencing, it means that the method call will search for methods on `Foo` as well as `Bar`.
Advantages
You save a little boilerplate, e.g.,
impl Bar {
fn m(&self) {
self.f.m()
}
}
Disadvantages
Most importantly this is a surprising idiom - future programmers reading this in
code will not expect this to happen. That's because we are abusing the Deref
trait rather than using it as intended (and documented, etc.). It's also because
the mechanism here is completely implicit.
This pattern does not introduce subtyping between `Foo` and `Bar` like inheritance in Java or C++ does. Furthermore, traits implemented by `Foo` are not automatically implemented for `Bar`, so this pattern interacts badly with trait bounds and thus generic programming.
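Here is a rough sketch of that last point, reusing the `Foo` and `Bar` definitions from the example above (the `Greet` trait and the `greet_twice` function are illustrative additions):

```rust
trait Greet {
    fn hello(&self) -> String;
}

impl Greet for Foo {
    fn hello(&self) -> String {
        "hello from Foo".to_owned()
    }
}

fn greet_twice<T: Greet>(t: &T) -> String {
    format!("{} {}", t.hello(), t.hello())
}

fn demo(b: &Bar) {
    b.hello();          // OK: the dot operator finds Foo::hello via deref coercion.
    // greet_twice(b);  // Does not compile: Bar does not implement Greet.
}
```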
Using this pattern gives subtly different semantics from most OO languages with regards to `self`. Usually it remains a reference to the sub-class; with this pattern, it will be the 'class' where the method is defined.
Finally, this pattern only supports single inheritance, and has no notion of interfaces, class-based privacy, or other inheritance-related features. So, it gives an experience that will be subtly surprising to programmers used to Java inheritance, etc.
Discussion
There is no one good alternative. Depending on the exact circumstances it might
be better to re-implement using traits or to write out the facade methods to
dispatch to Foo
manually. We do intend to add a mechanism for inheritance
similar to this to Rust, but it is likely to be some time before it reaches
stable Rust. See these blog
posts
and this RFC issue for more details.
The `Deref` trait is designed for the implementation of custom pointer types. The intention is that it will take a pointer-to-`T` to a `T`, not convert between different types. It is a shame that this isn't (probably cannot be) enforced by the trait definition.
Rust tries to strike a careful balance between explicit and implicit mechanisms, favouring explicit conversions between types. Automatic dereferencing in the dot operator is a case where the ergonomics strongly favour an implicit mechanism, but the intention is that this is limited to degrees of indirection, not conversion between arbitrary types.
See also
Collections are smart pointers idiom.
Documentation for the `Deref` trait.
Functional Usage of Rust
Rust is an imperative language, but it follows many functional programming paradigms. One of the biggest hurdles to understanding functional programs when coming from an imperative background is the shift in thinking. Imperative programs describe how to do something, whereas declarative programs describe what to do. Let's sum the numbers from 1 to 10 to show this.
Imperative
fn main() {
    let mut sum = 0;
    for i in 1..11 {
        sum += i;
    }
    println!("{}", sum);
}
With imperative programs, we have to play compiler to see what is happening. Here, we start with a sum
of 0
. Next, we iterate through the range from 1 to 10. Each time through the loop, we add the corresponding value in the range. Then we print it out.
i | sum |
---|---|
1 | 1 |
2 | 3 |
3 | 6 |
4 | 10 |
5 | 15 |
6 | 21 |
7 | 28 |
8 | 36 |
9 | 45 |
10 | 55 |
This is how most of us start out programming. We learn that a program is a set of steps.
Declarative
fn main() {
    println!("{}", (1..11).fold(0, |a, b| a + b));
}
Whoa! This is really different! What's going on here? Remember that with declarative programs we are describing what to do, rather than how to do it. `fold` is a function that combines a sequence of values by repeatedly applying another function. The name is a convention from Haskell.
Here, we are composing functions of addition (this closure: `|a, b| a + b`) with a range from 1 to 10. The `0` is the starting point, so `a` is `0` at first. `b` is the first element of the range, `1`. `0 + 1 = 1` is the result. So now we `fold` again, with `a = 1`, `b = 2` and so `1 + 2 = 3` is the next result. This process continues until we get to the last element in the range, `10`.
a | b | result |
---|---|---|
0 | 1 | 1 |
1 | 2 | 3 |
3 | 3 | 6 |
6 | 4 | 10 |
10 | 5 | 15 |
15 | 6 | 21 |
21 | 7 | 28 |
28 | 8 | 36 |
36 | 9 | 45 |
45 | 10 | 55 |
Additional resources
A collection of complementary helpful content
Talks
- Design Patterns in Rust by Nicholas Cameron at the PDRust (2016)
- Writing Idiomatic Libraries in Rust by Pascal Hertleif at RustFest (2017)
- Rust Programming Techniques by Nicholas Cameron at LinuxConfAu (2018)