Introduction
Design patterns
What are design patterns? What are idioms? Anti-patterns.
Design patterns in Rust
Why Rust is a bit special - functional elements, type system - borrow checker
Idioms
TODO: add description/explanation
Concatenating strings with format!
Description
It is possible to build up strings using the push and push_str methods on a mutable String, or using its + operator. However, it is often more convenient to use format!, especially where there is a mix of literal and non-literal strings.
Example
#![allow(unused)]
fn main() {
    fn say_hello(name: &str) -> String {
        // We could construct the result string manually.
        // let mut result = "Hello ".to_owned();
        // result.push_str(name);
        // result.push('!');
        // result

        // But using format! is better.
        format!("Hello {}!", name)
    }
}
Advantages
Using format! is usually the most succinct and readable way to combine strings.
Disadvantages
It is usually not the most efficient way to combine strings: a series of push operations on a mutable string is usually the most efficient (especially if the string has been pre-allocated to the expected size).
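To make that trade-off concrete, here is a small sketch that builds the same greeting both ways. The pushed version pre-sizes its buffer with String::with_capacity so the pushes should not need to reallocate. The function names are hypothetical, and no performance figures are implied.

```rust
/// Builds "Hello <name>!" by pushing into a pre-sized String.
fn say_hello_pushed(name: &str) -> String {
    // "Hello " is 6 bytes and "!" is 1 byte: reserve everything up front.
    let mut result = String::with_capacity(6 + name.len() + 1);
    result.push_str("Hello ");
    result.push_str(name);
    result.push('!');
    result
}

/// The same string built with format!, for comparison.
fn say_hello_formatted(name: &str) -> String {
    format!("Hello {}!", name)
}

fn main() {
    let pushed = say_hello_pushed("Ferris");
    assert_eq!(pushed, say_hello_formatted("Ferris"));
    println!("{}", pushed);
}
```

The format! version remains the more readable default; the pushed version is only worth it on a measured hot path.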
Constructors
Description
Rust does not have constructors as a language construct. Instead, the convention is to use an associated function named new (a "static method", with no self parameter) to create an object.
Example
// A Rust vector, see liballoc/vec.rs
// (Simplified excerpt: RawVec is internal to the standard library,
// so this does not compile on its own.)
pub struct Vec<T> {
    buf: RawVec<T>,
    len: usize,
}

impl<T> Vec<T> {
    // Constructs a new, empty `Vec<T>`.
    // Note this is a static method - no self.
    // This constructor doesn't take any arguments, but some might in order to
    // properly initialise an object
    pub fn new() -> Vec<T> {
        // Create a new Vec with fields properly initialised.
        Vec {
            // Note that here we are calling RawVec's constructor.
            buf: RawVec::new(),
            len: 0,
        }
    }
}
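Because RawVec is internal to the standard library, the excerpt above cannot be compiled by itself. The following is a minimal, self-contained sketch of the same convention on a hypothetical Stack<T> type; the type and its methods are illustrative, not from any library. It also shows that a type can offer more than one constructor under different names.

```rust
/// A hypothetical stack type, used only to illustrate the `new` convention.
pub struct Stack<T> {
    items: Vec<T>,
}

impl<T> Stack<T> {
    /// Constructs a new, empty `Stack<T>`. There is no `self` parameter:
    /// this is an associated function, called as `Stack::new()`.
    pub fn new() -> Stack<T> {
        Stack { items: Vec::new() }
    }

    /// A second constructor with a different name, taking an argument.
    pub fn with_capacity(capacity: usize) -> Stack<T> {
        Stack { items: Vec::with_capacity(capacity) }
    }

    pub fn push(&mut self, item: T) {
        self.items.push(item);
    }

    pub fn pop(&mut self) -> Option<T> {
        self.items.pop()
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }
}

fn main() {
    let mut s = Stack::new();
    s.push(1);
    s.push(2);
    assert_eq!(s.pop(), Some(2));

    let empty: Stack<u8> = Stack::with_capacity(16);
    assert_eq!(empty.len(), 0);
}
```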
See also
The builder pattern for constructing objects where there are multiple configurations.
The Default Trait
Description
Many types in Rust have a constructor. However, this is specific to the type; Rust cannot abstract over "everything that has a new() method". To allow this, the Default trait was conceived, which can be used with containers and other generic types (e.g. see Option::unwrap_or_default()).
Notably, some containers already implement it where applicable. Not only do one-element containers like Cow, Box or Arc implement Default for contained Default types, one can automatically #[derive(Default)] for structs whose fields all implement it, so the more types implement Default, the more useful it becomes.
On the other hand, constructors can take multiple arguments, while the default() method does not. There can even be multiple constructors with different names, but there can only be one Default implementation per type.
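A short sketch of that contrast, using a hypothetical Settings type. It has exactly one Default implementation, but it also offers a named constructor that takes an argument. That constructor fills in the remaining fields from the default with struct update syntax (..).

```rust
use std::time::Duration;

/// Hypothetical connection settings, used only for illustration.
#[derive(Debug, PartialEq)]
struct Settings {
    timeout: Duration,
    retries: u32,
    verbose: bool,
}

// There can be only one `Default` impl for `Settings`...
impl Default for Settings {
    fn default() -> Self {
        Settings {
            timeout: Duration::from_secs(30),
            retries: 3,
            verbose: false,
        }
    }
}

impl Settings {
    // ...but any number of named constructors, which may take arguments.
    // `..Settings::default()` fills every field not listed explicitly.
    fn with_timeout(timeout: Duration) -> Self {
        Settings { timeout, ..Settings::default() }
    }
}

fn main() {
    let t = Settings::with_timeout(Duration::from_secs(5));
    assert_eq!(t.timeout, Duration::from_secs(5));
    assert_eq!(t.retries, 3);

    // Generic helpers such as `Option::unwrap_or_default` rely on `Default`.
    let missing: Option<Settings> = None;
    assert_eq!(missing.unwrap_or_default(), Settings::default());
}
```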
Example
use std::path::PathBuf;
use std::time::Duration;

// note that we can simply auto-derive Default here.
#[derive(Default)]
struct MyConfiguration {
    // Option defaults to None
    output: Option<PathBuf>,
    // Vecs default to empty vector
    search_path: Vec<PathBuf>,
    // Duration defaults to zero time
    timeout: Duration,
    // bool defaults to false
    check: bool,
}

impl MyConfiguration {
    // add setters here
}

fn main() {
    // construct a new instance with default values
    let mut conf = MyConfiguration::default();
    // do something with conf here
    conf.check = true;
}
See also
- The constructor idiom is another way to generate instances that may or may not be "default"
- The
Default
documentation (scroll down for the list of implementors) Option::unwrap_or_default()
derive(new)
Collections are smart pointers
Description
Use the Deref trait to treat collections like smart pointers, offering owning and borrowed views of data.
Example
use std::ops::Deref;

// Sketch only: field and method bodies are elided.
struct Vec<T> { ... }

impl<T> Deref for Vec<T> {
    type Target = [T];

    fn deref(&self) -> &[T] { ... }
}
A Vec<T> is an owning collection of Ts, while a slice (&[T]) is a borrowed collection of Ts. Implementing Deref for Vec allows implicit dereferencing from &Vec<T> to &[T] and includes the relationship in auto-dereferencing searches. Most methods you might expect to be implemented for Vecs are instead implemented for slices.
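As a minimal, self-contained sketch of the same idea, the hypothetical SortedVec below owns a Vec<T> and dereferences to a slice. Slice methods such as len, first and binary_search then become available on it through auto-deref, and &SortedVec<i32> coerces to &[i32] wherever a slice is expected. It deliberately implements only Deref, not DerefMut, so callers cannot reorder the elements through the borrowed view.

```rust
use std::ops::Deref;

/// A hypothetical owning collection that keeps its elements sorted.
struct SortedVec<T> {
    items: Vec<T>,
}

impl<T: Ord> SortedVec<T> {
    fn from_vec(mut items: Vec<T>) -> Self {
        items.sort();
        SortedVec { items }
    }
}

// The borrowed view is a plain slice, so all slice methods become available.
impl<T> Deref for SortedVec<T> {
    type Target = [T];

    fn deref(&self) -> &[T] {
        &self.items
    }
}

// An ordinary function written against the borrowed view.
fn largest(xs: &[i32]) -> Option<&i32> {
    xs.last()
}

fn main() {
    let v = SortedVec::from_vec(vec![3, 1, 2]);
    // `len`, `first` and `binary_search` are slice methods, found via auto-deref.
    assert_eq!(v.len(), 3);
    assert_eq!(v.first(), Some(&1));
    assert_eq!(v.binary_search(&2), Ok(1));
    // Deref coercion turns `&SortedVec<i32>` into `&[i32]` here.
    assert_eq!(largest(&v), Some(&3));
}
```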
See also String and &str.
Motivation
Ownership and borrowing are key aspects of the Rust language. Data structures must account for these semantics properly in order to give a good user experience. When implementing a data structure which owns its data, offering a borrowed view of that data allows for more flexible APIs.
Advantages
Most methods can be implemented only for the borrowed view; they are then implicitly available for the owning view.
Gives clients a choice between borrowing or taking ownership of data.
Disadvantages
Methods and traits only available via dereferencing are not taken into account when checking trait bounds, so generic programming with data structures using this pattern can get complex (see the Borrow and AsRef traits, etc.).
Discussion
Smart pointers and collections are analogous: a smart pointer points to a single object, whereas a collection points to many objects. From the point of view of the type system there is little difference between the two. A collection owns its data if the only way to access each datum is via the collection and the collection is responsible for deleting the data (even in cases of shared ownership, some kind of borrowed view may be appropriate). If a collection owns its data, it is usually useful to provide a view of the data as borrowed so that it can be multiply referenced.
Most smart pointers (e.g., Foo<T>) implement Deref<Target=T>. However, collections will usually dereference to a custom type. [T] and str have some language support, but in the general case, this is not necessary. Foo<T> can implement Deref<Target=Bar<T>> where Bar is a dynamically sized type and &Bar<T> is a borrowed view of the data in Foo<T>.
Commonly, ordered collections will implement Index for Ranges to provide slicing syntax. The target will be the borrowed view.
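A sketch of that slicing support for a hypothetical Buffer<T> wrapper. It assumes only half-open Range<usize> indices are needed; other range types such as RangeFrom would each need their own impl.

```rust
use std::ops::{Index, Range};

/// A hypothetical owning collection wrapping a Vec.
struct Buffer<T> {
    items: Vec<T>,
}

// Indexing by a range yields the borrowed view: a slice.
impl<T> Index<Range<usize>> for Buffer<T> {
    type Output = [T];

    fn index(&self, range: Range<usize>) -> &[T] {
        &self.items[range]
    }
}

fn main() {
    let buf = Buffer { items: vec!['a', 'b', 'c', 'd'] };
    // `buf[1..3]` calls `Index::index`; `&` borrows the resulting slice.
    let middle: &[char] = &buf[1..3];
    assert_eq!(middle, &['b', 'c'][..]);
}
```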
See also
Deref polymorphism anti-pattern.
Documentation for the Deref trait.
Finalisation in destructors
Description
Rust does not provide the equivalent of finally blocks: code that will be executed no matter how a function is exited. Instead, an object's destructor can be used to run code that must be run before exit.
Example
#![allow(unused)]
fn main() {
    // Hypothetical fallible operation, standing in for any call that may fail.
    fn baz() -> Result<(), ()> {
        Ok(())
    }

    fn bar() -> Result<(), ()> {
        // These don't need to be defined inside the function.
        struct Foo;

        // Implement a destructor for Foo.
        impl Drop for Foo {
            fn drop(&mut self) {
                println!("exit");
            }
        }

        // The dtor of _exit will run however the function `bar` is exited.
        let _exit = Foo;
        // Implicit return with `?` operator.
        baz()?;
        // Normal return.
        Ok(())
    }
}
Motivation
If a function has multiple return points, then executing code on exit becomes difficult and repetitive (and thus bug-prone). This is especially the case where return is implicit due to a macro. A common case is the ? operator, which returns if the result is an Err, but continues if it is Ok. ? is used as an exception handling mechanism, but unlike Java (which has finally), there is no way to schedule code to run in both the normal and exceptional cases. Panicking will also exit a function early.
Advantages
Code in destructors will (nearly) always be run - copes with panics, early returns, etc.
Disadvantages
It is not guaranteed that destructors will run. For example, if there is an infinite loop in a function or if running a function crashes before exit. Destructors are also not run in the case of a panic in an already panicking thread. Therefore destructors cannot be relied on as finalisers where it is absolutely essential that finalisation happens.
This pattern introduces some hard-to-notice, implicit code. Reading a function gives no clear indication of destructors to be run on exit. This can make debugging tricky.
Requiring an object and Drop impl just for finalisation is heavy on boilerplate.
Discussion
There is some subtlety about how exactly to store the object used as a finaliser. It must be kept alive until the end of the function and must then be destroyed. The object must always be a value or uniquely owned pointer (e.g., Box<Foo>). If a shared pointer (such as Rc) is used, then the finaliser can be kept alive beyond the lifetime of the function. For similar reasons, the finaliser should not be moved or returned.
The finaliser must be assigned into a variable, otherwise it will be destroyed immediately, rather than when it goes out of scope. The variable name must start with _ if the variable is only used as a finaliser, otherwise the compiler will warn that the finaliser is never used. However, do not call the variable _ with no suffix: in that case it will again be destroyed immediately.
In Rust, destructors are run when an object goes out of scope. This happens whether we reach the end of a block, there is an early return, or the program panics. When panicking, Rust unwinds the stack, running destructors for each object in each stack frame. So, destructors get called even if the panic happens in a function being called.
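The following sketch checks that claim: a guard's destructor runs even though the function that owns it panics. It assumes the default panic = "unwind" strategy; with panic = "abort", no destructors run. The names Guard, fails and CLEANED_UP are hypothetical. The default panic hook will still print the panic message to stderr.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

// Records whether the guard's destructor has run.
static CLEANED_UP: AtomicBool = AtomicBool::new(false);

struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        CLEANED_UP.store(true, Ordering::SeqCst);
    }
}

fn fails() {
    let _guard = Guard;
    panic!("something went wrong");
}

fn main() {
    // catch_unwind stops the unwinding at this point and returns Err.
    let result = panic::catch_unwind(fails);
    assert!(result.is_err());
    // The panic unwound through `fails`, running `Guard::drop` on the way out.
    assert!(CLEANED_UP.load(Ordering::SeqCst));
}
```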
If a destructor panics while unwinding, there is no good action to take, so Rust aborts the process immediately, without running further destructors. This means that destructors are not absolutely guaranteed to run. It also means that you must take extra care in your destructors not to panic, since it could leave resources in an unexpected state.
See also
RAII.
mem::replace to keep owned values in changed enums
Description
Say we have a &mut MyEnum which has (at least) two variants, A { name: String, x: u8 } and B { name: String }. Now we want to change MyEnum::A to a B if x is zero, while keeping MyEnum::B intact.
We can do this without cloning the name.
Example
#![allow(unused)]
fn main() {
    use std::mem;

    enum MyEnum {
        A { name: String, x: u8 },
        B { name: String },
    }

    fn a_to_b(e: &mut MyEnum) {
        // we mutably borrow `e` here. This precludes us from changing it directly
        // as in `*e = ...`, because the borrow checker won't allow it. Therefore
        // the assignment to `e` must be outside the `if let` clause.
        *e = if let MyEnum::A { ref mut name, x: 0 } = *e {
            // this takes out our `name` and puts in an empty String instead
            // (note that empty strings don't allocate).
            // Then, construct the new enum variant (which will
            // be assigned to `*e`, because it is the result of the `if let` expression).
            MyEnum::B { name: mem::replace(name, String::new()) }

        // In all other cases, we return immediately, thus skipping the assignment
        } else {
            return;
        }
    }
}
This also works with more variants:
use std::mem;

enum MultiVariateEnum {
    A { name: String },
    B { name: String },
    C,
    D,
}

fn swizzle(e: &mut MultiVariateEnum) {
    use self::MultiVariateEnum::*;
    *e = match *e {
        // Ownership rules do not allow taking `name` by value, but we cannot
        // take the value out of a mutable reference, unless we replace it:
        A { ref mut name } => B { name: mem::replace(name, String::new()) },
        B { ref mut name } => A { name: mem::replace(name, String::new()) },
        C => D,
        D => C,
    }
}
Motivation
When working with enums, we may want to change an enum value in place, perhaps to another variant. This is usually done in two phases to keep the borrow checker happy. In the first phase, we observe the existing value and look at its parts to decide what to do next. In the second phase we may conditionally change the value (as in the example above).
The borrow checker won't allow us to take out name of the enum (because something must be there). We could of course .clone() name and put the clone into our MyEnum::B, but that would be an instance of the Clone to satisfy the borrow checker anti-pattern. Anyway, we can avoid the extra allocation by changing e with only a mutable borrow.
mem::replace lets us swap out the value, replacing it with something else. In this case, we put in an empty String, which does not need to allocate. As a result, we get the original name as an owned value. We can then wrap this in another enum.
Note, however, that if we are using an Option and want to replace its value with a None, Option's take() method provides a shorter and more idiomatic alternative.
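A small sketch comparing the two spellings, with hypothetical values. It also uses std::mem::take, which is equivalent to mem::replace(dest, Default::default()). For a String, it therefore covers the mem::replace(name, String::new()) case from the examples above.

```rust
use std::mem;

fn main() {
    // mem::replace with None versus Option::take: same result, less typing.
    let mut a = Some(String::from("idol"));
    let mut b = Some(String::from("idol"));
    let via_replace = mem::replace(&mut a, None);
    let via_take = b.take();
    assert_eq!(via_replace, via_take);
    assert!(a.is_none() && b.is_none());

    // For any type implementing Default, mem::take swaps in the default
    // (an empty String here) and hands back the old value.
    let mut name = String::from("Indiana");
    let owned = mem::take(&mut name);
    assert_eq!(owned, "Indiana");
    assert!(name.is_empty());
}
```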
Advantages
Look ma, no allocation! Also you may feel like Indiana Jones while doing it.
Disadvantages
This gets a bit wordy. Getting it wrong repeatedly will make you hate the borrow checker. The compiler may fail to optimize away the double store, resulting in reduced performance as opposed to what you'd do in unsafe languages.
Discussion
This pattern is only of interest in Rust. In GC'd languages, you'd take the reference to the value by default (and the GC would keep track of refs), and in other low-level languages like C you'd simply alias the pointer and fix things later.
However, in Rust, we have to do a little more work to do this. An owned value may only have one owner, so to take it out, we need to put something back in – like Indiana Jones, replacing the artifact with a bag of sand.
See also
This gets rid of the Clone to satisfy the borrow checker anti-pattern in a specific case.
[Clone to satisfy the borrow checker](TODO: Hinges on PR #23)
On-Stack Dynamic Dispatch
Description
We can dynamically dispatch over multiple values; however, to do so, we need to declare multiple variables to bind differently-typed objects. To extend the lifetime as necessary, we can use deferred conditional initialization, as seen below:
Example
#![allow(unused)]
fn main() {
    use std::{fs, io};

    // `open` is only a wrapper so that `?` has a function to return from.
    fn open(arg: &str) -> io::Result<()> {
        // These must live longer than `readable`, and thus are declared first:
        let (mut stdin_read, mut file_read);

        // We need to ascribe the type to get dynamic dispatch.
        let readable: &mut dyn io::Read = if arg == "-" {
            stdin_read = io::stdin();
            &mut stdin_read
        } else {
            file_read = fs::File::open(arg)?;
            &mut file_read
        };

        // Read from `readable` here.
        Ok(())
    }
}
Motivation
Rust monomorphises code by default. This means a copy of the code will be generated for each type it is used with and optimized independently. While this allows for very fast code on the hot path, it also bloats the code in places where performance is not of the essence, thus costing compile time and cache usage.
Luckily, Rust allows us to use dynamic dispatch, but we have to explicitly ask for it.
Advantages
We do not need to allocate anything on the heap. Neither do we need to initialize something we won't use later, nor do we need to monomorphize the whole code that follows to work with both File and Stdin, with all the
Disadvantages
The code needs more moving parts than the Box-based version:
#![allow(unused)]
fn main() {
    use std::{fs, io};

    // `open` is only a wrapper so that `?` has a function to return from.
    fn open(arg: &str) -> io::Result<()> {
        // We still need to ascribe the type for dynamic dispatch.
        let readable: Box<dyn io::Read> = if arg == "-" {
            Box::new(io::stdin())
        } else {
            Box::new(fs::File::open(arg)?)
        };

        // Read from `readable` here.
        Ok(())
    }
}
Discussion
Rust newcomers will usually learn that Rust requires all variables to be initialized before use, so it's easy to overlook the fact that unused variables may well be uninitialized. Rust works quite hard to ensure that this works out fine and only the initialized values are dropped at the end of their scope.
The example meets all the constraints Rust places on us:
- All variables are initialized before using (in this case borrowing) them
- Each variable only holds values of a single type. In our example, stdin_read is of type Stdin, file_read is of type File and readable is of type &mut dyn Read
- Each borrowed value outlives all the references borrowed from it
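The following is a self-contained variant of the example that runs without a terminal or a file. It assumes two in-memory sources instead: a std::io::Cursor over a Vec<u8> and a plain byte slice, both of which implement Read. The function read_from and its source parameter are hypothetical.

```rust
use std::io::{self, Cursor, Read};

/// Reads everything from one of two in-memory sources chosen at runtime,
/// dispatching dynamically without boxing the reader.
fn read_from(source: &str) -> io::Result<String> {
    // Both must outlive `readable`, so they are declared (uninitialized) first.
    let (mut cursor_read, mut slice_read);

    let readable: &mut dyn Read = if source == "cursor" {
        cursor_read = Cursor::new(b"from cursor".to_vec());
        &mut cursor_read
    } else {
        slice_read = &b"from slice"[..];
        &mut slice_read
    };

    let mut out = String::new();
    readable.read_to_string(&mut out)?;
    Ok(out)
}

fn main() -> io::Result<()> {
    assert_eq!(read_from("cursor")?, "from cursor");
    assert_eq!(read_from("anything else")?, "from slice");
    Ok(())
}
```

Only the variable that was actually initialized is dropped at the end of the function, as described above.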
See also
- Finalisation in destructors and RAII guards can benefit from tight control over lifetimes.
- For conditionally filled Option<&T>s of (mutable) references, one can initialize an Option<T> directly and use its .as_ref() method to get an optional reference.
Iterating over an Option
Description
Option can be viewed as a container that contains either zero or one element. In particular, it implements the IntoIterator trait, and as such can be used with generic code that needs such a type.
Examples
Since Option implements IntoIterator, it can be used as an argument to .extend():
#![allow(unused)]
fn main() {
    let turing = Some("Turing");
    let mut logicians = vec!["Curry", "Kleene", "Markov"];

    logicians.extend(turing);

    // equivalent to
    if let Some(turing_inner) = turing {
        logicians.push(turing_inner);
    }
}
If you need to tack an Option to the end of an existing iterator, you can pass it to .chain():
#![allow(unused)]
fn main() {
    let turing = Some("Turing");
    let logicians = vec!["Curry", "Kleene", "Markov"];

    for logician in logicians.iter().chain(turing.iter()) {
        println!("{} is a logician", logician);
    }
}
Note that if the Option is always Some, then it is more idiomatic to use std::iter::once on the element instead.
Also, since Option implements IntoIterator, it's possible to iterate over it using a for loop. This is equivalent to matching it with if let Some(..), and in most cases you should prefer the latter.
See also
- std::iter::once is an iterator which yields exactly one element. It's a more readable alternative to Some(foo).into_iter().
- Iterator::filter_map is a version of Iterator::flat_map, specialized to mapping functions which return Option.
- The ref_slice crate provides functions for converting an Option to a zero- or one-element slice.
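A brief sketch of the first two items; the helper names parse_all and with_header are hypothetical. str::parse returns a Result, and .ok() turns it into the Option that filter_map expects.

```rust
use std::iter;

/// Keeps every input that parses as an integer and silently drops the rest.
fn parse_all(inputs: &[&str]) -> Vec<i32> {
    inputs.iter().filter_map(|s| s.parse::<i32>().ok()).collect()
}

/// Prepends a value that is always present, using `iter::once`
/// rather than `Some(header).into_iter()`.
fn with_header(header: i32, rest: &[i32]) -> Vec<i32> {
    iter::once(header).chain(rest.iter().copied()).collect()
}

fn main() {
    let numbers = parse_all(&["1", "two", "3"]);
    assert_eq!(numbers, vec![1, 3]);
    assert_eq!(with_header(0, &numbers), vec![0, 1, 3]);
}
```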
Pass variables to closure
Description
By default, closures capture their environment by borrowing. Alternatively, you can use a move closure to move the whole environment. However, often you want to move just some variables into the closure, give it a copy of some data, pass it by reference, or perform some other transformation. Use variable rebinding in a separate scope for that.
Example
Use
#![allow(unused)]
fn main() {
    use std::rc::Rc;

    let num1 = Rc::new(1);
    let num2 = Rc::new(2);
    let num3 = Rc::new(3);
    let closure = {
        // `num1` is moved
        let num2 = num2.clone(); // `num2` is cloned
        let num3 = num3.as_ref(); // `num3` is borrowed
        move || *num1 + *num2 + *num3
    };
}
instead of
#![allow(unused)]
fn main() {
    use std::rc::Rc;

    let num1 = Rc::new(1);
    let num2 = Rc::new(2);
    let num3 = Rc::new(3);

    let num2_cloned = num2.clone();
    let num3_borrowed = num3.as_ref();
    let closure = move || *num1 + *num2_cloned + *num3_borrowed;
}
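The same rebinding trick is common when handing data to a thread. In this sketch, only a clone of an Arc is moved into the spawned closure, so the caller's handle stays usable afterwards. The helper sum_in_thread is hypothetical.

```rust
use std::sync::Arc;
use std::thread;

/// Sums the shared numbers on a separate thread.
fn sum_in_thread(shared: &Arc<Vec<i32>>) -> i32 {
    let handle = {
        // Rebind in a separate scope: only this clone is moved into the thread,
        // and it keeps the same name as the surrounding code.
        let shared = Arc::clone(shared);
        thread::spawn(move || shared.iter().sum::<i32>())
    };
    handle.join().expect("worker thread panicked")
}

fn main() {
    let numbers = Arc::new(vec![1, 2, 3]);
    assert_eq!(sum_in_thread(&numbers), 6);
    // The caller's handle is still usable afterwards.
    assert_eq!(numbers.len(), 3);
}
```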
Advantages
Copied data are grouped together with the closure definition, so their purpose is clearer, and they will be dropped immediately even if they are not consumed by the closure.
The closure uses the same variable names as the surrounding code, whether the data are copied or moved.
Disadvantages
Additional indentation of closure body.
Privacy for extensibility
Description
Use a private field to ensure that a struct is extensible without breaking stability guarantees.
Example
mod a {
    // Public struct.
    pub struct S {
        pub foo: i32,
        // Private field.
        bar: i32,
    }
}

// A plain helper function (`main` cannot take arguments).
fn take_s(s: a::S) {
    // Because S::bar is private, it cannot be named here and we must use `..`
    // in the pattern.
    let a::S { foo: _, .. } = s;
}

fn main() {}
Discussion
Adding a field to a struct is a mostly backwards compatible change. However, if a client uses a pattern to deconstruct a struct instance, they might name all the fields in the struct and adding a new one would break that pattern. The client could name some of the fields and use .. in the pattern, in which case adding another field is backwards compatible. Making at least one of the struct's fields private forces clients to use the latter form of patterns, ensuring that the struct is future-proof.
The downside of this approach is that you might need to add an otherwise unneeded field to the struct. You can use the () type so that there is no runtime overhead, and prepend _ to the field name to avoid the unused field warning.
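A minimal sketch of such a () field, using a hypothetical config::Options type. Outside the config module, clients cannot build the struct with a literal, because the private field cannot be named. They also cannot destructure it without .., so fields can be added later without breaking them.

```rust
mod config {
    /// Public fields for clients, plus a private zero-sized field.
    pub struct Options {
        pub verbose: bool,
        pub level: u8,
        // `()` costs nothing at runtime; the leading `_` silences the
        // unused-field warning.
        _private: (),
    }

    impl Options {
        pub fn new(verbose: bool, level: u8) -> Options {
            Options { verbose, level, _private: () }
        }
    }
}

fn describe(opts: &config::Options) -> String {
    // `_private` cannot be named here, so the pattern must end with `..`.
    let config::Options { verbose, level, .. } = opts;
    format!("verbose={} level={}", verbose, level)
}

fn main() {
    let opts = config::Options::new(true, 2);
    assert_eq!(describe(&opts), "verbose=true level=2");
}
```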
If Rust allowed private variants of enums, we could use the same trick to make adding a variant to an enum backwards compatible. The problem there is exhaustive match expressions. A private variant would force clients to have a _ wildcard pattern.
Easy doc initialization
Description
If a struct takes significant effort to initialize, when writing docs, it can be quicker to wrap your example with a function which takes the struct as an argument.
Motivation
Sometimes there is a struct with multiple or complicated parameters and several methods. Each of these methods should have examples.
For example:
struct Connection {
    name: String,
    stream: TcpStream,
}

impl Connection {
    /// Sends a request over the connection.
    ///
    /// # Example
    /// ```no_run
    /// # // Boilerplate is required to get an example working.
    /// # let stream = TcpStream::connect("127.0.0.1:34254").unwrap();
    /// # let connection = Connection { name: "foo".to_owned(), stream };
    /// # let request = Request::new("RequestId", RequestType::Get, "payload");
    /// let response = connection.send_request(request);
    /// assert!(response.is_ok());
    /// ```
    fn send_request(&self, request: Request) -> Result<Status, SendErr> {
        // ...
    }

    /// Oh no, all that boilerplate needs to be repeated here!
    fn check_status(&self) -> Status {
        // ...
    }
}
Example
Instead of typing all of this boilerplate to create a Connection and Request, it is easier to just create a wrapping dummy function which takes them as arguments:
struct Connection {
    name: String,
    stream: TcpStream,
}

impl Connection {
    /// Sends a request over the connection.
    ///
    /// # Example
    /// ```
    /// # fn call_send(connection: Connection, request: Request) {
    /// let response = connection.send_request(request);
    /// assert!(response.is_ok());
    /// # }
    /// ```
    fn send_request(&self, request: Request) -> Result<Status, SendErr> {
        // ...
    }
}
Note that in the above example the line assert!(response.is_ok()); will not actually run while testing, because it is inside a function which is never invoked.
Advantages
This is much more concise and avoids repetitive code in examples.
Disadvantages
As the example is in a function, the code will not be tested. (It will still be checked to make sure it compiles when running cargo test.) So this pattern is most useful when you would otherwise need no_run. With this pattern, you do not need to add no_run.
Discussion
If assertions are not required, this pattern works well.
If they are, an alternative can be to create a public method, annotated with #[doc(hidden)] (so that users won't see it), that creates a dummy instance.
This method can then be called inside of rustdoc because it is part of the crate's public API.
Temporary mutability
Description
Often it is necessary to prepare and process some data, but after that the data are only inspected and never modified. The intention can be made explicit by redefining the mutable variable as immutable.
It can be done either by processing the data within a nested block or by redefining the variable.
Example
Say a vector must be sorted before use.
Using nested block:
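#![allow(unused)]
fn main() {
    // Hypothetical data source, so that the example compiles.
    fn get_vec() -> Vec<i32> {
        vec![3, 1, 2]
    }

    let data = {
        let mut data = get_vec();
        data.sort();
        data
    };

    // Here `data` is immutable.
}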
Using variable rebinding:
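#![allow(unused)]
fn main() {
    // Hypothetical data source, so that the example compiles.
    fn get_vec() -> Vec<i32> {
        vec![3, 1, 2]
    }

    let mut data = get_vec();
    data.sort();
    let data = data;

    // Here `data` is immutable.
}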
Advantages
Compiler ensures that you don't accidentally mutate data after some point.
Disadvantages
A nested block requires additional indentation of the block body. Either way, one more line is needed to return the data from the block or to redefine the variable.
Design Patterns
TODO: add description/explanation
Builder
Description
Construct an object with calls to a builder helper.
Example
struct Foo {
    // Lots of complicated fields.
}

struct FooBuilder {
    // Probably lots of optional fields.
    ...
}

impl FooBuilder {
    fn new(...) -> FooBuilder {
        // Set the minimally required fields of Foo.
    }

    fn named(mut self, name: &str) -> FooBuilder {
        // Set the name on the builder itself, and return the builder by value.
    }

    // More methods that take `mut self` and return `FooBuilder` setting up
    // various aspects of a Foo.
    ...

    // If we can get away with not consuming the Builder here, that is an
    // advantage. It means we can use the builder as a template for constructing
    // many Foos.
    fn finish(&self) -> Foo {
        // Create a Foo from the FooBuilder, applying all settings in FooBuilder to Foo.
    }
}

fn main() {
    let f = FooBuilder::new().named("Bar").with_attribute(...).finish();
}
Motivation
Useful when you would otherwise require many different constructors or where construction has side effects.
Advantages
Separates methods for building from other methods.
Prevents proliferation of constructors.
Can be used for one-liner initialisation as well as more complex construction.
Disadvantages
More complex than creating a struct object directly, or a simple constructor function.
Discussion
This pattern is seen more frequently in Rust (and for simpler objects) than in many other languages because Rust lacks overloading. Since you can only have a single method with a given name, having multiple constructors is less nice in Rust than in C++, Java, or others.
This pattern is often used where the builder object is useful in its own right, rather than being just a builder. For example, std::process::Command is a builder for Child (a process). In these cases, the T and TBuilder pattern of naming is not used.
The example takes and returns the builder by value. It is often more ergonomic (and more efficient) to take and return the builder as a mutable reference. The borrow checker makes this work naturally. This approach has the advantage that one can write code like
let mut fb = FooBuilder::new();
fb.a();
fb.b();
let f = fb.finish();
as well as the FooBuilder::new().a().b().finish() style.
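Here is a runnable sketch of the mutable-reference variant. The name and size fields are hypothetical and stand in for real configuration. Because finish takes &self, the same builder can be adjusted and reused as a template.

```rust
/// The (hypothetical) type being built.
#[derive(Debug, PartialEq)]
struct Foo {
    name: String,
    size: u32,
}

#[derive(Default)]
struct FooBuilder {
    name: String,
    size: u32,
}

impl FooBuilder {
    fn new() -> FooBuilder {
        FooBuilder::default()
    }

    // Setters take and return `&mut self`, so calls can be chained or not.
    fn name(&mut self, name: &str) -> &mut FooBuilder {
        self.name = name.to_owned();
        self
    }

    fn size(&mut self, size: u32) -> &mut FooBuilder {
        self.size = size;
        self
    }

    // Borrowing rather than consuming lets the builder act as a template.
    fn finish(&self) -> Foo {
        Foo { name: self.name.clone(), size: self.size }
    }
}

fn main() {
    // Step-by-step style.
    let mut fb = FooBuilder::new();
    fb.name("a");
    fb.size(1);
    let first = fb.finish();

    // Reuse the same builder for a second, slightly different Foo.
    fb.size(2);
    let second = fb.finish();

    // One-liner style: the temporary builder lives until the end of the statement.
    let third = FooBuilder::new().name("a").size(1).finish();

    assert_eq!(first, third);
    assert_eq!(second.size, 2);
}
```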
See also
Description in the style guide
derive_builder, a crate for automatically implementing this pattern while avoiding the boilerplate.
Constructor pattern for when construction is simpler.
Compose structs together for better borrowing
TODO - this is not a very snappy name
Description
Sometimes a large struct will cause issues with the borrow checker - although fields can be borrowed independently, sometimes the whole struct ends up being used at once, preventing other uses. A solution might be to decompose the struct into several smaller structs. Then compose these together into the original struct. Then each struct can be borrowed separately and have more flexible behaviour.
This will often lead to a better design in other ways: applying this design pattern often reveals smaller units of functionality.
Example
Here is a contrived example of where the borrow checker foils us in our plan to use a struct:
struct A {
    f1: u32,
    f2: u32,
    f3: u32,
}

fn foo(a: &mut A) -> &u32 { &a.f2 }
fn bar(a: &mut A) -> u32 { a.f1 + a.f3 }

// A plain helper function (`main` cannot take arguments).
fn baz(a: &mut A) {
    // The later use of x causes a to stay borrowed until that use.
    let x = foo(a);
    // Borrow check error
    let y = bar(a); //~ ERROR: cannot borrow `*a` as mutable more than once at a time
    println!("{}", x);
}
We can apply this design pattern and refactor A into two smaller structs, thus solving the borrow checking issue:
// A is now composed of two structs - B and C.
struct A {
    b: B,
    c: C,
}
struct B {
    f2: u32,
}
struct C {
    f1: u32,
    f3: u32,
}

// These functions take a B or C, rather than A.
fn foo(b: &mut B) -> &u32 { &b.f2 }
fn bar(c: &mut C) -> u32 { c.f1 + c.f3 }

// A plain helper function (`main` cannot take arguments).
fn baz(a: &mut A) {
    let x = foo(&mut a.b);
    // Now it's OK!
    let y = bar(&mut a.c);
    println!("{}", x);
}
Motivation
Why and where you should use the pattern
Advantages
Lets you work around limitations in the borrow checker.
Often produces a better design.
Disadvantages
Leads to more verbose code.
Sometimes, the smaller structs are not good abstractions, and so we end up with a worse design. That is probably a 'code smell', indicating that the program should be refactored in some way.
Discussion
This pattern is not required in languages that don't have a borrow checker, so in that sense is unique to Rust. However, making smaller units of functionality often leads to cleaner code: a widely acknowledged principle of software engineering, independent of the language.
This pattern relies on Rust's borrow checker being able to borrow fields independently of each other. In the example, the borrow checker knows that a.b and a.c are distinct and can be borrowed independently; it does not try to borrow all of a, which would make this pattern useless.
Entry API
Description
A short, prose description of the pattern.
Example
#![allow(unused)] fn main() { // An example of the pattern in action, should be mostly code, commented // liberally. }
Motivation
Why and where you should use the pattern
Advantages
Good things about this pattern.
Disadvantages
Bad things about this pattern. Possible contraindications.
Discussion
TODO vs insert_or_update etc.
See also
Fold
Description
Run an algorithm over each item in a collection of data to create a new item, thus creating a whole new collection.
The etymology here is unclear to me. The terms 'fold' and 'folder' are used in the Rust compiler, although it appears to me to be more like a map than a fold in the usual sense. See the discussion below for more details.
Example
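#![allow(unused)]

// The data we will fold, a simple AST.
mod ast {
    pub enum Stmt {
        Expr(Box<Expr>),
        Let(Box<Name>, Box<Expr>),
    }

    pub struct Name {
        // `pub` so that folders outside this module can build new names.
        pub value: String,
    }

    pub enum Expr {
        IntLit(i64),
        Add(Box<Expr>, Box<Expr>),
        Sub(Box<Expr>, Box<Expr>),
    }
}

// The abstract folder
mod fold {
    use super::ast::*;

    pub trait Folder {
        // A leaf node just returns the node itself. In some cases, we can do this
        // to inner nodes too.
        fn fold_name(&mut self, n: Box<Name>) -> Box<Name> { n }
        // Create a new inner node by folding its children.
        fn fold_stmt(&mut self, s: Box<Stmt>) -> Box<Stmt> {
            match *s {
                Stmt::Expr(e) => Box::new(Stmt::Expr(self.fold_expr(e))),
                Stmt::Let(n, e) => Box::new(Stmt::Let(self.fold_name(n), self.fold_expr(e))),
            }
        }
        // Expressions are folded the same way (written here by analogy with
        // fold_stmt): rebuild each node from its folded children.
        fn fold_expr(&mut self, e: Box<Expr>) -> Box<Expr> {
            match *e {
                Expr::IntLit(i) => Box::new(Expr::IntLit(i)),
                Expr::Add(l, r) => Box::new(Expr::Add(self.fold_expr(l), self.fold_expr(r))),
                Expr::Sub(l, r) => Box::new(Expr::Sub(self.fold_expr(l), self.fold_expr(r))),
            }
        }
    }
}

use ast::*;
use fold::*;

// An example concrete implementation - renames every name to 'foo'.
struct Renamer;
impl Folder for Renamer {
    fn fold_name(&mut self, _n: Box<Name>) -> Box<Name> {
        Box::new(Name { value: "foo".to_owned() })
    }
    // Use the default methods for the other nodes.
}

fn main() {
    let stmt = Box::new(Stmt::Let(
        Box::new(Name { value: "x".to_owned() }),
        Box::new(Expr::IntLit(1)),
    ));
    let renamed = Renamer.fold_stmt(stmt);
    if let Stmt::Let(name, _) = *renamed {
        assert_eq!(name.value, "foo");
    }
}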
The result of running the Renamer on an AST is a new AST identical to the old one, but with every name changed to foo. A real-life folder might have some state preserved between nodes in the struct itself.
A folder can also be defined to map one data structure to a different (but usually similar) data structure. For example, we could fold an AST into a HIR tree (HIR stands for high-level intermediate representation).
Motivation
It is common to want to map a data structure by performing some operation on each node in the structure. For simple operations on simple data structures, this can be done using Iterator::map. For more complex operations, perhaps where earlier nodes can affect the operation on later nodes, or where iteration over the data structure is non-trivial, using the fold pattern is more appropriate.
Like the visitor pattern, the fold pattern allows us to separate traversal of a data structure from the operations performed on each node.
Discussion
Mapping data structures in this fashion is common in functional languages. In OO languages, it would be more common to mutate the data structure in place. The 'functional' approach is common in Rust, mostly due to the preference for immutability. Using fresh data structures, rather than mutating old ones, makes reasoning about the code easier in most circumstances.
The trade-off between efficiency and reusability can be tweaked by changing how nodes are accepted by the fold_* methods.
In the above example we operate on Box pointers. Since these own their data exclusively, the original copy of the data structure cannot be re-used. On the other hand, if a node is not changed, reusing it is very efficient.
If we were to operate on borrowed references, the original data structure can be reused; however, a node must be cloned even if unchanged, which can be expensive.
Using a reference counted pointer gives the best of both worlds - we can reuse the original data structure and we don't need to clone unchanged nodes. However, they are less ergonomic to use and mean that the data structures cannot be mutable.
See also
Iterators have a fold method; however, this folds a data structure into a value, rather than into a new data structure. An iterator's map is more like this fold pattern.
In other languages, fold is usually used in the sense of Rust's iterators, rather than this pattern. Some functional languages have powerful constructs for performing flexible maps over data structures.
The visitor pattern is closely related to fold. They share the concept of walking a data structure performing an operation on each node. However, the visitor does not create a new data structure nor consume the old one.
Late bound bounds
Description
TODO late binding of bounds for better APIs (e.g., Mutexes don't require Send)
Example
```rust
// An example of the pattern in action, should be mostly code, commented
// liberally.
```
Motivation
Why and where you should use the pattern
Advantages
Good things about this pattern.
Disadvantages
Bad things about this pattern. Possible contraindications.
Discussion
A deeper discussion about this pattern. You might want to cover how this is done in other languages, alternative approaches, why this is particularly nice in Rust, etc.
See also
Related patterns (link to the pattern file). Versions of this pattern in other languages.
Newtype
Description
Use a tuple struct with a single field to make an opaque wrapper for a type.
This creates a new type, rather than an alias to a type (type items).
Example
```rust
// Some type, not necessarily in the same module or even crate.
struct Foo {
    // ...
}

impl Foo {
    // These functions are not present on Bar.
    // ...
}

// The newtype.
pub struct Bar(Foo);

impl Bar {
    // Constructor.
    pub fn new(/* ... */) -> Bar {
        // ...
    }

    // ...
}

fn main() {
    let b = Bar::new(/* ... */);

    // Foo and Bar are type incompatible, the following do not type check.
    // let f: Foo = b;
    // let b: Bar = Foo { ... };
}
```
Motivation
The primary motivation for newtypes is abstraction. It allows you to share implementation details between types while precisely controlling the interface. Using a newtype rather than exposing the implementation type as part of an API allows you to change the implementation in a backwards-compatible way.
Newtypes can be used for distinguishing units, e.g., wrapping f64 to give distinguishable Miles and Kms.
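A minimal sketch of the units case, reusing the Miles and Kms names. The to_kms method and the Add impl are illustrative additions. The conversion factor is the international mile, 1.609344 km. Because the types differ, adding Miles to Kms is rejected at compile time.

```rust
use std::ops::Add;

// Two newtypes over f64 representing different units.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Miles(f64);

#[derive(Debug, Clone, Copy, PartialEq)]
struct Kms(f64);

impl Miles {
    // Conversion is explicit; 1 international mile = 1.609344 km.
    fn to_kms(self) -> Kms {
        Kms(self.0 * 1.609344)
    }
}

// Addition is only defined between values of the same unit.
impl Add for Miles {
    type Output = Miles;
    fn add(self, other: Miles) -> Miles {
        Miles(self.0 + other.0)
    }
}

fn main() {
    let trip = Miles(3.0) + Miles(2.0);
    // let bad = Miles(3.0) + Kms(2.0); // does not compile: mismatched types
    println!("{:?} is {:?}", trip, trip.to_kms());
}
```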
Advantages
The wrapped and wrapper types are not type compatible (as opposed to using type), so users of the newtype will never 'confuse' the wrapped and wrapper types.
Newtypes are a zero-cost abstraction - there is no runtime overhead.
The privacy system ensures that users cannot access the wrapped type (if the field is private, which it is by default).
Disadvantages
The downside of newtypes (especially compared with type aliases) is that there is no special language support. This means there can be a lot of boilerplate. You need a 'pass through' method for every method you want to expose on the wrapped type, and an impl for every trait you want to also be implemented for the wrapper type.
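To illustrate that boilerplate, here is a sketch assuming a hypothetical Names wrapper around Vec<String>. Every Vec method the wrapper should offer is forwarded by hand, and Display is implemented explicitly, because the wrapper inherits neither methods nor traits from Vec.

```rust
use std::fmt;

// A hypothetical newtype hiding its Vec<String> representation.
pub struct Names(Vec<String>);

impl Names {
    pub fn new() -> Names {
        Names(Vec::new())
    }

    // Pass-through methods: each one forwards to the wrapped Vec.
    pub fn push(&mut self, name: &str) {
        self.0.push(name.to_string());
    }

    pub fn len(&self) -> usize {
        self.0.len()
    }
}

// Traits must also be implemented (or derived) for the wrapper itself.
impl fmt::Display for Names {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0.join(", "))
    }
}

fn main() {
    let mut names = Names::new();
    names.push("Ann");
    names.push("Bob");
    println!("{} names: {}", names.len(), names); // 2 names: Ann, Bob
}
```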
Discussion
Newtypes are very common in Rust code. Abstraction or representing units are the most common uses, but they can be used for other reasons:
- restricting functionality (reduce the functions exposed or traits implemented),
- making a type with copy semantics have move semantics,
- abstraction by providing a more concrete type and thus hiding internal types, e.g.,
```rust
pub struct Foo(Bar<T1, T2>);
```
Here, Bar might be some public, generic type and T1 and T2 are some internal types. Users of our module shouldn't know that we implement Foo by using a Bar, but what we're really hiding here are the types T1 and T2, and how they are used with Bar.
See also
RAII with guards
Description
RAII stands for "Resource Acquisition is Initialisation", which is a terrible name. The essence of the pattern is that resource initialisation is done in the constructor of an object and finalisation in the destructor. This pattern is extended in Rust by using an RAII object as a guard of some resource and relying on the type system to ensure that access is always mediated by the guard object.
Example
Mutex guards are the classic example of this pattern from the std library (this is a simplified version of the real implementation):
```rust
use std::ops::Deref;

struct Mutex<T> {
    // We keep our data: T here.
    data: T,
    // ...
}

struct MutexGuard<'a, T: 'a> {
    data: &'a T,
    // ...
}

// Locking the mutex is explicit.
impl<T> Mutex<T> {
    fn lock(&self) -> MutexGuard<T> {
        // Lock the underlying OS mutex.
        // ...

        // MutexGuard keeps a reference to the data inside self.
        MutexGuard {
            data: &self.data,
            // ...
        }
    }
}

// Destructor for unlocking the mutex.
impl<'a, T> Drop for MutexGuard<'a, T> {
    fn drop(&mut self) {
        // Unlock the underlying OS mutex.
        // ...
    }
}

// Implementing Deref means we can treat MutexGuard like a pointer to T.
impl<'a, T> Deref for MutexGuard<'a, T> {
    type Target = T;

    fn deref(&self) -> &T {
        self.data
    }
}

fn baz(x: Mutex<Foo>) {
    let xx = x.lock();
    xx.foo(); // foo is a method on Foo.
    // The borrow checker ensures we can't store a reference to the underlying
    // Foo which will outlive the guard xx.

    // x is unlocked when we exit this function and xx's destructor is executed.
}
```
Motivation
Where a resource must be finalised after use, RAII can be used to do this finalisation. If it is an error to access that resource after finalisation, then this pattern can be used to prevent such errors.
Advantages
Prevents errors where a resource is not finalised and where a resource is used after finalisation.
Discussion
RAII is a useful pattern for ensuring resources are properly deallocated or finalised. We can make use of the borrow checker in Rust to statically prevent errors stemming from using resources after finalisation takes place.
The core aim of the borrow checker is to ensure that references to data do not outlive that data. The RAII guard pattern works because the guard object contains a reference to the underlying resource and only exposes such references. Rust ensures that the guard cannot outlive the underlying resource and that references to the resource mediated by the guard cannot outlive the guard. To see how this works, it is helpful to examine the signature of deref without lifetime elision:
```rust
fn deref<'a>(&'a self) -> &'a T { ... }
```
The returned reference to the resource has the same lifetime as self ('a). The borrow checker therefore ensures that the reference to T cannot outlive self.
Note that implementing Deref is not a core part of this pattern; it only makes using the guard object more ergonomic. Implementing a get method on the guard works just as well.
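Here is a runnable sketch of a guard that uses a get method instead of Deref. It assumes a hypothetical Resource whose "in use" flag lives in a Cell. It is not std's Mutex. Acquisition sets the flag, Drop clears it, and get returns a reference whose elided lifetime is tied to the guard, so that reference cannot outlive the guard.

```rust
use std::cell::Cell;

// A hypothetical resource that must be released after use.
struct Resource {
    data: String,
    in_use: Cell<bool>,
}

// The guard borrows the resource, so it cannot outlive it.
struct Guard<'a> {
    res: &'a Resource,
}

impl Resource {
    fn new(data: &str) -> Resource {
        Resource { data: data.to_string(), in_use: Cell::new(false) }
    }

    // Acquisition: mark the resource as in use and hand out a guard.
    fn acquire(&self) -> Guard<'_> {
        assert!(!self.in_use.get(), "resource already acquired");
        self.in_use.set(true);
        Guard { res: self }
    }

    fn is_in_use(&self) -> bool {
        self.in_use.get()
    }
}

impl<'a> Guard<'a> {
    // A plain getter instead of Deref; the result borrows from the guard.
    fn get(&self) -> &str {
        &self.res.data
    }
}

// Finalisation runs automatically when the guard goes out of scope.
impl<'a> Drop for Guard<'a> {
    fn drop(&mut self) {
        self.res.in_use.set(false);
    }
}

fn main() {
    let r = Resource::new("hello");
    {
        let g = r.acquire();
        println!("{} (in use: {})", g.get(), r.is_in_use()); // hello (in use: true)
    } // g is dropped here, releasing the resource
    println!("in use after scope: {}", r.is_in_use()); // in use after scope: false
}
```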
See also
Finalisation in destructors idiom
RAII is a common pattern in C++: cppreference.com, Wikipedia.
Style guide entry (currently just a placeholder).
Prefer small crates
Description
Prefer small crates that do one thing well.
Cargo and crates.io make it easy to add third-party libraries, much more so than in, say, C or C++. Moreover, since packages on crates.io cannot be edited or removed after publication, any build that works now should continue to work in the future. We should take advantage of this tooling, and use smaller, more fine-grained dependencies.
Advantages
- Small crates are easier to understand, and encourage more modular code.
- Crates allow for re-using code between projects. For example, the url crate was developed as part of the Servo browser engine, but has since found wide use outside the project.
- Since the compilation unit of Rust is the crate, splitting a project into multiple crates can allow more of the code to be built in parallel.
Disadvantages
- This can lead to "dependency hell", when a project depends on multiple conflicting versions of a crate at the same time. For example, the url crate has both versions 1.0 and 0.5. Since the Url from url:1.0 and the Url from url:0.5 are different types, an HTTP client that uses url:0.5 would not accept Url values from a web scraper that uses url:1.0.
- Packages on crates.io are not curated. A crate may be poorly written, have unhelpful documentation, or be outright malicious.
- Two small crates may be less optimized than one large one, since the compiler does not perform link-time optimization (LTO) by default.
Examples
The ref_slice crate provides functions for converting &T to &[T].
The url crate provides tools for working with URLs.
The num_cpus crate provides a function to query the number of CPUs on a machine.
See also
Contain unsafety in small modules
Description
If you have unsafe code, create the smallest possible module that can uphold the needed invariants to build a minimal safe interface upon the unsafety. Embed this into a larger module that contains only safe code and presents an ergonomic interface. Note that the outer module can contain unsafe functions and methods that call directly into the unsafe code. Users may use this to gain speed benefits.
Advantages
- This restricts the unsafe code that must be audited.
- Writing the outer module is much easier, since you can count on the guarantees of the inner module.
Disadvantages
- Sometimes, it may be hard to find a suitable interface.
- The abstraction may introduce inefficiencies.
Examples
- The toolshed crate contains its unsafe operations in submodules, presenting a safe interface to users.
- std's String type is a wrapper over Vec<u8> with the added invariant that the contents must be valid UTF-8. The operations on String ensure this behavior. However, users have the option of using an unsafe method to create a String, in which case the onus is on them to guarantee the validity of the contents.
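A minimal sketch of the idea, loosely modeled on the String example. It assumes a hypothetical AsciiString type and a hypothetical shout function. The ascii module is the only place with unsafe code. Its safe constructor and mutator admit only ASCII bytes, and ASCII is valid UTF-8, which is what makes the call to the real std function String::from_utf8_unchecked sound. The outer code uses only the safe interface.

```rust
mod ascii {
    // The only module containing unsafe code. Invariant: the bytes held by
    // AsciiString are always ASCII, and therefore valid UTF-8.
    pub struct AsciiString(Vec<u8>);

    impl AsciiString {
        // Safe constructor: checks the invariant up front.
        pub fn new(bytes: Vec<u8>) -> Option<AsciiString> {
            if bytes.is_ascii() {
                Some(AsciiString(bytes))
            } else {
                None
            }
        }

        // Safe mutator: refuses anything that would break the invariant.
        pub fn push(&mut self, c: char) -> bool {
            if c.is_ascii() {
                self.0.push(c as u8);
                true
            } else {
                false
            }
        }

        pub fn into_string(self) -> String {
            // SAFETY: every constructor and mutator above only admits ASCII
            // bytes, and ASCII is valid UTF-8.
            unsafe { String::from_utf8_unchecked(self.0) }
        }
    }
}

// Outer, entirely safe code relying on the inner module's guarantees.
fn shout(word: &str) -> Option<String> {
    let mut s = ascii::AsciiString::new(word.to_ascii_uppercase().into_bytes())?;
    s.push('!');
    Some(s.into_string())
}

fn main() {
    println!("{:?}", shout("hello")); // Some("HELLO!")
    println!("{:?}", shout("héllo")); // None, since the input is not ASCII
}
```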
See also
Visitor
Description
A visitor encapsulates an algorithm that operates over a heterogeneous collection of objects. It allows multiple different algorithms to be written over the same data without having to modify the data (or their primary behaviour).
Furthermore, the visitor pattern allows separating the traversal of a collection of objects from the operations performed on each object.
Example
```rust
// The data we will visit
mod ast {
    pub enum Stmt {
        Expr(Expr),
        Let(Name, Expr),
    }

    pub struct Name {
        value: String,
    }

    pub enum Expr {
        IntLit(i64),
        Add(Box<Expr>, Box<Expr>),
        Sub(Box<Expr>, Box<Expr>),
    }
}

// The abstract visitor
mod visit {
    use crate::ast::*;

    pub trait Visitor<T> {
        fn visit_name(&mut self, n: &Name) -> T;
        fn visit_stmt(&mut self, s: &Stmt) -> T;
        fn visit_expr(&mut self, e: &Expr) -> T;
    }
}

use visit::*;
use ast::*;

// An example concrete implementation - walks the AST interpreting it as code.
struct Interpreter;
impl Visitor<i64> for Interpreter {
    fn visit_name(&mut self, _n: &Name) -> i64 { panic!() }
    fn visit_stmt(&mut self, s: &Stmt) -> i64 {
        match *s {
            Stmt::Expr(ref e) => self.visit_expr(e),
            Stmt::Let(..) => unimplemented!(),
        }
    }

    fn visit_expr(&mut self, e: &Expr) -> i64 {
        match *e {
            Expr::IntLit(n) => n,
            Expr::Add(ref lhs, ref rhs) => self.visit_expr(lhs) + self.visit_expr(rhs),
            Expr::Sub(ref lhs, ref rhs) => self.visit_expr(lhs) - self.visit_expr(rhs),
        }
    }
}

fn main() {
    // 1 + (5 - 2)
    let e = Expr::Add(
        Box::new(Expr::IntLit(1)),
        Box::new(Expr::Sub(Box::new(Expr::IntLit(5)), Box::new(Expr::IntLit(2)))),
    );
    let mut interpreter = Interpreter;
    assert_eq!(interpreter.visit_expr(&e), 4);
}
```
One could implement further visitors, for example a type checker, without having to modify the AST data.
Motivation
The visitor pattern is useful anywhere that you want to apply an algorithm to heterogeneous data. If data is homogeneous, you can use an iterator-like pattern. Using a visitor object (rather than a functional approach) allows the visitor to be stateful and thus communicate information between nodes.
Discussion
It is common for the visit_* methods to return nothing, i.e. the unit type () (as opposed to in the example). In that case it is possible to factor out the traversal code and share it between algorithms (and also to provide noop default methods). In Rust, the common way to do this is to provide walk_* functions for each datum. For example,
```rust
pub fn walk_expr(visitor: &mut dyn Visitor<()>, e: &Expr) {
    match *e {
        Expr::IntLit(_) => {},
        Expr::Add(ref lhs, ref rhs) => {
            visitor.visit_expr(lhs);
            visitor.visit_expr(rhs);
        }
        Expr::Sub(ref lhs, ref rhs) => {
            visitor.visit_expr(lhs);
            visitor.visit_expr(rhs);
        }
    }
}
```
In other languages (e.g., Java) it is common for data to have an accept method which performs the same duty.
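Here is a self-contained sketch of that walk_* arrangement. It assumes a trimmed-down copy of the Expr type and a unit-returning Visitor trait, rather than the generic Visitor<T> from the example. NoopVisitor and LiteralCounter are hypothetical. The default visit_expr simply calls walk_expr, so NoopVisitor overrides nothing. LiteralCounter overrides visit_expr to keep state and delegates traversal back to walk_expr. This version takes a generic V: Visitor + ?Sized instead of a trait object. The ?Sized bound lets the default method pass self straight through.

```rust
// A trimmed-down copy of the expression type.
enum Expr {
    IntLit(i64),
    Add(Box<Expr>, Box<Expr>),
    Sub(Box<Expr>, Box<Expr>),
}

// A visitor whose methods return (), so traversal can be shared.
trait Visitor {
    // Default behaviour: no work at this node, just visit the children.
    fn visit_expr(&mut self, e: &Expr) {
        walk_expr(self, e);
    }
}

// The shared traversal code: visits every child of `e`.
fn walk_expr<V: Visitor + ?Sized>(visitor: &mut V, e: &Expr) {
    match e {
        Expr::IntLit(_) => {}
        Expr::Add(lhs, rhs) | Expr::Sub(lhs, rhs) => {
            visitor.visit_expr(lhs);
            visitor.visit_expr(rhs);
        }
    }
}

// Relies entirely on the noop default.
struct NoopVisitor;
impl Visitor for NoopVisitor {}

// A stateful visitor: counts literals, delegating traversal to walk_expr.
struct LiteralCounter {
    count: usize,
}

impl Visitor for LiteralCounter {
    fn visit_expr(&mut self, e: &Expr) {
        if let Expr::IntLit(_) = e {
            self.count += 1;
        }
        walk_expr(self, e);
    }
}

fn main() {
    // (1 + 2) - 3
    let e = Expr::Sub(
        Box::new(Expr::Add(Box::new(Expr::IntLit(1)), Box::new(Expr::IntLit(2)))),
        Box::new(Expr::IntLit(3)),
    );
    NoopVisitor.visit_expr(&e);
    let mut counter = LiteralCounter { count: 0 };
    counter.visit_expr(&e);
    println!("{} literals", counter.count); // 3 literals
}
```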
See also
The visitor pattern is a common pattern in most OO languages.
The fold pattern is similar to visitor but produces a new version of the visited data structure.
Anti-patterns
TODO: add description/explanation
#![deny(warnings)]
Description
A well-intentioned crate author wants to ensure their code builds without warnings. So they annotate their crate root with the following:
Example
```rust
#![deny(warnings)]

// All is well.
```
Advantages
It is short and will stop the build if anything is amiss.
Drawbacks
By forbidding the compiler from building with warnings, a crate author opts out of Rust's famed stability. Sometimes new features or old misfeatures need a change in how things are done, thus lints are written that warn for a certain grace period before being turned to deny.
For example, it was discovered that a type could have two impls with the same method. This was deemed a bad idea, but in order to make the transition smooth, the overlapping_inherent_impls lint was introduced to give a warning to those stumbling on this fact, before it becomes a hard error in a future release.
Also sometimes APIs get deprecated, so their use will emit a warning where before there was none.
All this conspires to potentially break the build whenever something changes.
Furthermore, crates that supply additional lints (e.g. rust-clippy) can no longer be used unless the annotation is removed. This is mitigated with --cap-lints.
Alternatives
There are two ways of tackling this problem: First, we can decouple the build setting from the code, and second, we can name the lints we want to deny explicitly.
The following command line will build with all warnings set to deny:
```shell
RUSTFLAGS="-D warnings" cargo build
```
This can be done by any individual developer (or be set in a CI tool like Travis, but remember that this may break the build when something changes) without requiring a change to the code.
Alternatively, we can specify the lints that we want to deny in the code.
Here is a list of warning lints that is (hopefully) safe to deny:
```rust
// Some of these lint names have been renamed or removed in later compiler
// versions; rustc reports such names via the renamed_and_removed_lints lint.
#![deny(bad_style, const_err, dead_code, extra_requirement_in_impl,
        improper_ctypes, legacy_directory_ownership,
        non_shorthand_field_patterns, no_mangle_generic_items,
        overflowing_literals, path_statements, patterns_in_fns_without_body,
        plugin_as_library, private_in_public, private_no_mangle_fns,
        private_no_mangle_statics, raw_pointer_derive, safe_extern_statics,
        unconditional_recursion, unions_with_drop_fields, unused,
        unused_allocation, unused_comparisons, unused_parens, while_true)]
```
In addition, the following lints, which are allowed by default, may be a good idea to deny:
```rust
#![deny(missing_debug_implementations, missing_docs, trivial_casts,
        trivial_numeric_casts, unused_extern_crates, unused_import_braces,
        unused_qualifications, unused_results)]
```
Some may also want to add missing_copy_implementations to their list.
Note that we explicitly did not add the deprecated lint, as it is fairly certain that there will be more deprecated APIs in the future.
See also
- deprecated attribute documentation
- Type rustc -W help for a list of lints on your system. Also type rustc --help for a general list of options
- rust-clippy is a collection of lints for better Rust code
Deref polymorphism
Description
Abuse the Deref trait to emulate inheritance between structs, and thus reuse methods.
Example
Sometimes we want to emulate the following common pattern from OO languages such as Java:
```java
class Foo {
    void m() { ... }
}

class Bar extends Foo {}

public static void main(String[] args) {
    Bar b = new Bar();
    b.m();
}
```
We can use the deref polymorphism anti-pattern to do so:
```rust
use std::ops::Deref;

struct Foo {}

impl Foo {
    fn m(&self) {
        // ...
    }
}

struct Bar {
    f: Foo,
}

impl Deref for Bar {
    type Target = Foo;
    fn deref(&self) -> &Foo {
        &self.f
    }
}

fn main() {
    let b = Bar { f: Foo {} };
    b.m();
}
```
There is no struct inheritance in Rust. Instead we use composition and include an instance of Foo in Bar. Since the field is a value, it is stored inline, so if there were fields, they would have the same layout in memory as the Java version (probably; use #[repr(C)] if you want to be sure).
In order to make the method call work, we implement Deref for Bar with Foo as the target (returning the embedded Foo field). That means that when we dereference a Bar (for example, using *), we get a Foo. That is pretty weird. Dereferencing usually gives a T from a reference to T; here we have two unrelated types. However, since the dot operator does implicit dereferencing, the method call will search for methods on Foo as well as Bar.
Advantages
You save a little boilerplate, e.g.,
```rust
impl Bar {
    fn m(&self) {
        self.f.m()
    }
}
```
Disadvantages
Most importantly, this is a surprising idiom: future programmers reading this in code will not expect this to happen. That's because we are abusing the Deref trait rather than using it as intended (and documented, etc.). It's also because the mechanism here is completely implicit.
This pattern does not introduce subtyping between Foo and Bar like inheritance in Java or C++ does. Furthermore, traits implemented by Foo are not automatically implemented for Bar, so this pattern interacts badly with trait bounds and thus with generic programming.
Using this pattern gives subtly different semantics from most OO languages with regard to self. Usually it remains a reference to the sub-class; with this pattern it will be the 'class' where the method is defined.
Finally, this pattern only supports single inheritance, and has no notion of interfaces, class-based privacy, or other inheritance-related features. So, it gives an experience that will be subtly surprising to programmers used to Java inheritance, etc.
Discussion
There is no one good alternative. Depending on the exact circumstances, it might be better to re-implement using traits or to write out the facade methods to dispatch to Foo manually. We do intend to add a mechanism for inheritance similar to this to Rust, but it is likely to be some time before it reaches stable Rust. See these blog posts and this RFC issue for more details.
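As a minimal sketch of combining both alternatives, the code below reuses the Foo and Bar names but assumes a hypothetical trait M and function call_m. M declares the shared method, and Bar's implementation is a hand-written facade that dispatches to its Foo field. Unlike the Deref version, Bar now satisfies a T: M bound, so generic code can accept either type.

```rust
// A hypothetical trait naming the behaviour that Foo and Bar should share.
trait M {
    fn m(&self) -> String;
}

struct Foo {}

impl M for Foo {
    fn m(&self) -> String {
        "m called on a Foo".to_string()
    }
}

// Composition as before, but without a Deref impl.
struct Bar {
    f: Foo,
}

// Facade: Bar dispatches to its embedded Foo explicitly.
impl M for Bar {
    fn m(&self) -> String {
        self.f.m()
    }
}

// Generic code can bound on M and accept both types.
fn call_m<T: M>(x: &T) -> String {
    x.m()
}

fn main() {
    let b = Bar { f: Foo {} };
    println!("{}", call_m(&b)); // m called on a Foo
    println!("{}", call_m(&Foo {}));
}
```

The extra typing is exactly the boilerplate that Deref polymorphism tries to avoid, but the dispatch is now visible and works with trait bounds.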
The Deref trait is designed for the implementation of custom pointer types. The intention is that it will take a pointer-to-T to a T, not convert between different types. It is a shame that this isn't (probably cannot be) enforced by the trait definition.
Rust tries to strike a careful balance between explicit and implicit mechanisms, favouring explicit conversions between types. Automatic dereferencing in the dot operator is a case where the ergonomics strongly favour an implicit mechanism, but the intention is that this is limited to degrees of indirection, not conversion between arbitrary types.