Introduction

Design patterns

When developing programs, we have to solve many problems. A program can be viewed as a solution to a problem. It can also be viewed as a collection of solutions to many different problems. All of these solutions work together to solve a bigger problem.

Design patterns in Rust

There are many problems that share the same form. Because Rust is not object-oriented, its design patterns differ from those of object-oriented programming languages. While the details differ, problems that share the same form can be solved using the same fundamental methods.

Design patterns are methods to solve common problems when writing software.

Anti-patterns are methods to solve these same common problems.

However, while design patterns give us benefits, anti-patterns create more problems. There are some problems that we don't need to solve because Rust rocks!

Idioms are guidelines to follow when coding. They are social norms of the community. You can break them, but if you do, you should have a good reason for it.

Refactoring is the process by which you convert code that works, but is hard to understand, into code that works and is easy to understand.

TODO: Mention why Rust is a bit special - functional elements, type system, borrow checker

Idioms

Idioms are commonly used styles and patterns largely agreed upon by a community. They are guidelines. Writing idiomatic code allows other developers to understand what is happening because they are familiar with the form that it has.

The computer understands the machine code that is generated by the compiler. The language is therefore mostly beneficial to the developer. So, since we have this abstraction layer, why not put it to good use and make it simple?

Remember the KISS principle: "Keep It Simple, Stupid". It claims that "most systems work best if they are kept simple rather than made complicated; therefore, simplicity should be a key goal in design, and unnecessary complexity should be avoided".

Code is there for humans, not computers, to understand.

Concatenating strings with `format!`

Description

It is possible to build up strings using the push and push_str methods on a mutable String, or using its + operator. However, it is often more convenient to use format!, especially where there is a mix of literal and non-literal strings.

Example


fn say_hello(name: &str) -> String {
    // We could construct the result string manually.
    // let mut result = "Hello ".to_owned();
    // result.push_str(name);
    // result.push('!');
    // result

    // But using format! is better.
    format!("Hello {}!", name)
}

Advantages

Using format! is usually the most succinct and readable way to combine strings.

Disadvantages

It is usually not the most efficient way to combine strings - a series of push operations on a mutable string is usually the most efficient (especially if the string has been pre-allocated to the expected size).
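
As a rough sketch of the trade-off described above, the two approaches can be compared side by side. The helper name `say_hello_prealloc` is hypothetical and exists only for this illustration; whether the pre-allocated version is measurably faster depends on the workload and should be benchmarked rather than assumed.

```rust
// The idiomatic version from the example above.
fn say_hello(name: &str) -> String {
    format!("Hello {}!", name)
}

// Hypothetical alternative for illustration: the same result built with
// explicit pushes into a buffer that is pre-allocated to the final size.
fn say_hello_prealloc(name: &str) -> String {
    // "Hello " plus the name plus the trailing '!' (1 byte).
    let mut result = String::with_capacity("Hello ".len() + name.len() + 1);
    result.push_str("Hello ");
    result.push_str(name);
    result.push('!');
    result
}

fn main() {
    // Both functions produce the same string.
    assert_eq!(say_hello("Ferris"), say_hello_prealloc("Ferris"));
    println!("{}", say_hello_prealloc("Ferris"));
}
```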

Constructors

Description

Rust does not have constructors as a language construct. Instead, the convention is to use an associated function named new (a static method, in other languages' terms) to create an object.

Example

// A Rust vector, see liballoc/vec.rs
pub struct Vec<T> {
    buf: RawVec<T>,
    len: usize,
}

impl<T> Vec<T> {
    // Constructs a new, empty `Vec<T>`.
    // Note this is a static method - no self.
    // This constructor doesn't take any arguments, but some might in order to
    // properly initialise an object
    pub fn new() -> Vec<T> {
        // Create a new Vec with fields properly initialised.
        Vec {
            // Note that here we are calling RawVec's constructor.
            buf: RawVec::new(),
            len: 0,
        }
    }
}

The `Default` Trait

Description

Many types in Rust have a constructor. However, this is specific to the type; Rust cannot abstract over "everything that has a new() method". To allow this, the Default trait was conceived, which can be used with containers and other generic types (e.g. see Option::unwrap_or_default()). Notably, some containers already implement it where applicable.

Not only do one-element containers like Cow, Box or Arc implement Default for contained Default types, one can automatically #[derive(Default)] for structs whose fields all implement it, so the more types implement Default, the more useful it becomes.

On the other hand, constructors can take multiple arguments, while the default() method does not. There can even be multiple constructors with different names, but there can only be one Default implementation per type.
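
To make the abstraction concrete, here is a minimal sketch of generic code that relies only on Default. The `Settings` struct and the `or_default` helper are hypothetical names invented for this illustration.

```rust
// Hypothetical type for illustration; every field type implements Default.
#[derive(Debug, Default, PartialEq)]
struct Settings {
    retries: u32,
    verbose: bool,
    label: String,
}

// Generic code can ask for "some default value" without knowing
// anything else about the type.
fn or_default<T: Default>(value: Option<T>) -> T {
    value.unwrap_or_default()
}

fn main() {
    // No value supplied, so every field takes its default.
    let s: Settings = or_default(None);
    assert_eq!(s, Settings { retries: 0, verbose: false, label: String::new() });

    // Struct update syntax overrides only the fields that differ.
    let custom = Settings { retries: 3, ..Default::default() };
    assert_eq!(custom.retries, 3);
    assert!(!custom.verbose);
}
```

The struct update syntax in the last step is one way to recover some of the flexibility of multi-argument constructors while still starting from the single Default implementation.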

Example

use std::path::PathBuf;
use std::time::Duration;

// note that we can simply auto-derive Default here.
#[derive(Default)]
struct MyConfiguration {
    // Option defaults to None
    // (PathBuf is used because Path is unsized and cannot be stored by value)
    output: Option<PathBuf>,
    // Vecs default to empty vector
    search_path: Vec<PathBuf>,
    // Duration defaults to zero time
    timeout: Duration,
    // bool defaults to false
    check: bool,
}

impl MyConfiguration {
    // add setters here
}

fn main() {
    // construct a new instance with default values
    let mut conf = MyConfiguration::default();
    // do something with conf here
}

Collections are smart pointers

Description

Use the Deref trait to treat collections like smart pointers, offering owning and borrowed views of data.

Example

use std::ops::Deref;

struct Vec<T> {
    data: T,
    //..
}

impl<T> Deref for Vec<T> {
    type Target = [T];

    fn deref(&self) -> &[T] {
        //..
    }
}

A Vec<T> is an owning collection of Ts, a slice (&[T]) is a borrowed collection of Ts. Implementing Deref for Vec allows implicit dereferencing from &Vec<T> to &[T] and includes the relationship in auto-dereferencing searches. Most methods you might expect to be implemented for Vecs are instead implemented for slices.
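
The same relationship can be sketched for a user-defined collection. The `Stack` type and the `sum` function below are hypothetical; they only illustrate how a Deref implementation makes slice methods and slice-taking functions available on the owning type.

```rust
use std::ops::Deref;

// Hypothetical owning collection that stores its elements in a boxed slice.
struct Stack<T> {
    items: Box<[T]>,
}

impl<T> Deref for Stack<T> {
    type Target = [T];

    // The borrowed view of the collection is a plain slice.
    fn deref(&self) -> &[T] {
        &self.items
    }
}

// Written against the borrowed view only.
fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let stack = Stack { items: vec![1, 2, 3].into_boxed_slice() };

    // Slice methods are found through auto-dereferencing.
    assert_eq!(stack.len(), 3);
    assert_eq!(stack.first(), Some(&1));

    // Deref coercion turns &Stack<i32> into &[i32] at the call site.
    assert_eq!(sum(&stack), 6);
}
```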

Motivation

Ownership and borrowing are key aspects of the Rust language. Data structures must account for these semantics properly in order to give a good user experience. When implementing a data structure which owns its data, offering a borrowed view of that data allows for more flexible APIs.

Advantages

Most methods can be implemented only for the borrowed view; they are then implicitly available for the owning view.

Gives clients a choice between borrowing or taking ownership of data.

Disadvantages

Methods and traits only available via dereferencing are not taken into account when checking trait bounds, so generic programming with data structures using this pattern can get complex (see the Borrow and AsRef traits, etc.).

Discussion

Smart pointers and collections are analogous: a smart pointer points to a single object, whereas a collection points to many objects. From the point of view of the type system there is little difference between the two. A collection owns its data if the only way to access each datum is via the collection and the collection is responsible for deleting the data (even in cases of shared ownership, some kind of borrowed view may be appropriate). If a collection owns its data, it is usually useful to provide a view of the data as borrowed so that it can be multiply referenced.

Most smart pointers (e.g., Foo<T>) implement Deref<Target=T>. However, collections will usually dereference to a custom type. [T] and str have some language support, but in the general case, this is not necessary. Foo<T> can implement Deref<Target=Bar<T>> where Bar is a dynamically sized type and &Bar<T> is a borrowed view of the data in Foo<T>.

Commonly, ordered collections will implement Index for Ranges to provide slicing syntax. The target will be the borrowed view.

Finalisation in destructors

Description

Rust does not provide the equivalent to finally blocks - code that will be executed no matter how a function is exited. Instead an object's destructor can be used to run code that must be run before exit.

Example

fn bar() -> Result<(), ()> {
    // These don't need to be defined inside the function.
    struct Foo;

    // Implement a destructor for Foo.
    impl Drop for Foo {
        fn drop(&mut self) {
            println!("exit");
        }
    }

    // The dtor of _exit will run however the function `bar` is exited.
    let _exit = Foo;
    // Implicit return with `?` operator.
    baz()?;
    // Normal return.
    Ok(())
}

Motivation

If a function has multiple return points, then executing code on exit becomes difficult and repetitive (and thus bug-prone). This is especially the case where return is implicit due to a macro. A common case is the ? operator which returns if the result is an Err, but continues if it is Ok. ? is used as an exception handling mechanism, but unlike Java (which has finally), there is no way to schedule code to run in both the normal and exceptional cases. Panicking will also exit a function early.

Advantages

Code in destructors will (nearly) always be run - copes with panics, early returns, etc.

Disadvantages

It is not guaranteed that destructors will run: for example, they will not run if there is an infinite loop in a function or if the program crashes before the function exits. Destructors are also not run in the case of a panic in an already panicking thread. Therefore destructors cannot be relied on as finalisers where it is absolutely essential that finalisation happens.

This pattern introduces some hard to notice, implicit code. Reading a function gives no clear indication of destructors to be run on exit. This can make debugging tricky.

Requiring an object and Drop impl just for finalisation is heavy on boilerplate.

Discussion

There is some subtlety about how exactly to store the object used as a finaliser. It must be kept alive until the end of the function and must then be destroyed. The object must always be a value or uniquely owned pointer (e.g., Box<Foo>). If a shared pointer (such as Rc) is used, then the finaliser can be kept alive beyond the lifetime of the function. For similar reasons, the finaliser should not be moved or returned.

The finaliser must be assigned into a variable, otherwise it will be destroyed immediately, rather than when it goes out of scope. The variable name must start with _ if the variable is only used as a finaliser, otherwise the compiler will warn that the finaliser is never used. However, do not call the variable _ with no suffix - in that case it will again be destroyed immediately.
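
The difference between the two bindings can be observed with a small sketch. The `Guard` type and the `run` function are hypothetical; the guard simply records a message in a shared log when it is dropped.

```rust
use std::cell::RefCell;

// Hypothetical guard that records a message when it is dropped.
struct Guard<'a> {
    log: &'a RefCell<Vec<&'static str>>,
    label: &'static str,
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.label);
    }
}

fn run(log: &RefCell<Vec<&'static str>>) {
    // Bound to a named variable: dropped when `run` returns.
    let _guard = Guard { log, label: "named guard dropped" };
    // Bound to `_`: the value is not kept alive and is dropped right away.
    let _ = Guard { log, label: "underscore guard dropped" };
    log.borrow_mut().push("body finished");
}

fn main() {
    let log = RefCell::new(Vec::new());
    run(&log);
    assert_eq!(
        *log.borrow(),
        vec!["underscore guard dropped", "body finished", "named guard dropped"]
    );
}
```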

In Rust, destructors are run when an object goes out of scope. This happens whether we reach the end of block, there is an early return, or the program panics. When panicking, Rust unwinds the stack running destructors for each object in each stack frame. So, destructors get called even if the panic happens in a function being called.

If a destructor panics while unwinding, there is no good action to take, so Rust aborts the process immediately, without running further destructors. This means that destructors are not absolutely guaranteed to run. It also means that you must take extra care in your destructors not to panic, since it could leave resources in an unexpected state.

`mem::replace` to keep owned values in changed enums

Description

Say we have a &mut MyEnum which has (at least) two variants, A { name: String, x: u8 } and B { name: String }. Now we want to change MyEnum::A to a B if x is zero, while keeping MyEnum::B intact.

We can do this without cloning the name.

Example


use std::mem;

enum MyEnum {
    A { name: String, x: u8 },
    B { name: String }
}

fn a_to_b(e: &mut MyEnum) {
    // we mutably borrow `e` here. This precludes us from changing it directly
    // as in `*e = ...`, because the borrow checker won't allow it. Therefore
    // the assignment to `e` must be outside the `if let` clause.
    *e = if let MyEnum::A { ref mut name, x: 0 } = *e {
        // this takes out our `name` and puts in an empty String instead
        // (note that empty strings don't allocate).
        // Then, construct the new enum variant (which will
        // be assigned to `*e`, because it is the result of the `if let` expression).
        MyEnum::B { name: mem::replace(name, String::new()) }

    // In all other cases, we return immediately, thus skipping the assignment
    } else { return }
}

This also works with more variants:

use std::mem;

enum MultiVariateEnum {
    A { name: String },
    B { name: String },
    C,
    D
}

fn swizzle(e: &mut MultiVariateEnum) {
    use self::MultiVariateEnum::*;
    *e = match *e {
        // Ownership rules do not allow taking `name` by value, but we cannot
        // take the value out of a mutable reference, unless we replace it:
        A { ref mut name } => B { name: mem::replace(name, String::new()) },
        B { ref mut name } => A { name: mem::replace(name, String::new()) },
        C => D,
        D => C
    }
}

Motivation

When working with enums, we may want to change an enum value in place, perhaps to another variant. This is usually done in two phases to keep the borrow checker happy. In the first phase, we observe the existing value and look at its parts to decide what to do next. In the second phase we may conditionally change the value (as in the example above).

The borrow checker won't allow us to take name out of the enum (because something must be there). We could of course .clone() name and put the clone into our MyEnum::B, but that would be an instance of the [Clone to satisfy the borrow checker] antipattern. Anyway, we can avoid the extra allocation by changing e with only a mutable borrow.

mem::replace lets us swap out the value, replacing it with something else. In this case, we put in an empty String, which does not need to allocate. As a result, we get the original name as an owned value. We can then wrap this in another enum.

Note, however, that if we are using an Option and want to replace its value with a None, Option’s take() method provides a shorter and more idiomatic alternative.
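
A minimal sketch of the Option case follows. The `Slot` struct and the `take_name` helper are hypothetical names used only for illustration.

```rust
// Hypothetical struct for illustration.
struct Slot {
    name: Option<String>,
}

// Moves the name out of the slot, leaving `None` behind.
fn take_name(slot: &mut Slot) -> Option<String> {
    // Has the same effect as `std::mem::replace(&mut slot.name, None)`.
    slot.name.take()
}

fn main() {
    let mut slot = Slot { name: Some("Turing".to_owned()) };

    // The first call yields the owned value and empties the slot.
    assert_eq!(take_name(&mut slot), Some("Turing".to_owned()));
    assert_eq!(slot.name, None);

    // Taking from an empty slot yields `None`.
    assert_eq!(take_name(&mut slot), None);
}
```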

Advantages

Look ma, no allocation! Also you may feel like Indiana Jones while doing it.

Disadvantages

This gets a bit wordy. Getting it wrong repeatedly will make you hate the borrow checker. The compiler may fail to optimize away the double store, resulting in reduced performance as opposed to what you'd do in unsafe languages.

Discussion

This pattern is only of interest in Rust. In GC'd languages, you'd take the reference to the value by default (and the GC would keep track of refs), and in other low-level languages like C you'd simply alias the pointer and fix things later.

However, in Rust, we have to do a little more work to do this. An owned value may only have one owner, so to take it out, we need to put something back in – like Indiana Jones, replacing the artifact with a bag of sand.

On-Stack Dynamic Dispatch

Description

We can dynamically dispatch over multiple values. However, to do so, we need to declare multiple variables to bind differently-typed objects. To extend the lifetime as necessary, we can use deferred conditional initialization, as seen below:

Example

use std::fs;
use std::io;

// These must live longer than `readable`, and thus are declared first:
let (mut stdin_read, mut file_read);

// We need to ascribe the type to get dynamic dispatch.
let readable: &mut dyn io::Read = if arg == "-" {
    stdin_read = io::stdin();
    &mut stdin_read
} else {
    file_read = fs::File::open(arg)?;
    &mut file_read
};

// Read from `readable` here.

Motivation

Rust monomorphises code by default. This means a copy of the code will be generated for each type it is used with and optimized independently. While this allows for very fast code on the hot path, it also bloats the code in places where performance is not of the essence, thus costing compile time and cache usage.

Luckily, Rust allows us to use dynamic dispatch, but we have to explicitly ask for it.

Advantages

We do not need to allocate anything on the heap. Neither do we need to initialize something we won't use later, nor do we need to monomorphize the whole code that follows to work with both File and Stdin.

Disadvantages

The code needs more moving parts than the Box-based version:

// We still need to ascribe the type for dynamic dispatch.
let readable: Box<dyn io::Read> = if arg == "-" {
    Box::new(io::stdin())
} else {
    Box::new(fs::File::open(arg)?)
};
// Read from `readable` here.

Discussion

Rust newcomers will usually learn that Rust requires all variables to be initialized before use, so it's easy to overlook the fact that unused variables may well be uninitialized. Rust works quite hard to ensure that this works out fine and only the initialized values are dropped at the end of their scope.

The example meets all the constraints Rust places on us:

All variables are initialized before using (in this case borrowing) them
Each variable only holds values of a single type. In our example, stdin_read is of type Stdin, file_read is of type File and readable is of type &mut dyn Read
Each borrowed value outlives all the references borrowed from it
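
The constraints listed above can be checked with a self-contained variant that does not touch the file system. This is only a sketch: the `read_all` function is hypothetical, and it swaps Stdin and File for two in-memory readers (`io::Cursor` and `io::Empty`) so that it can run anywhere.

```rust
use std::io::{self, Read};

// Hypothetical helper: reads from an in-memory buffer when `input` is
// `Some`, otherwise from an empty reader, without boxing either one.
fn read_all(input: Option<&str>) -> io::Result<String> {
    // Declared first so they outlive `readable`; only one gets initialized.
    let (mut cursor, mut empty);

    // The type annotation is what requests dynamic dispatch.
    let readable: &mut dyn Read = match input {
        Some(text) => {
            cursor = io::Cursor::new(text.as_bytes());
            &mut cursor
        }
        None => {
            empty = io::empty();
            &mut empty
        }
    };

    let mut out = String::new();
    readable.read_to_string(&mut out)?;
    Ok(out)
}

fn main() -> io::Result<()> {
    assert_eq!(read_all(Some("hello"))?, "hello");
    assert_eq!(read_all(None)?, "");
    Ok(())
}
```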

Iterating over an `Option`

Description

Option can be viewed as a container that contains either zero or one elements. In particular, it implements the IntoIterator trait, and as such can be used with generic code that needs such a type.

Examples

Since Option implements IntoIterator, it can be used as an argument to .extend():


fn main() {
    let turing = Some("Turing");
    let mut logicians = vec!["Curry", "Kleene", "Markov"];

    logicians.extend(turing);

    // equivalent to
    if let Some(turing_inner) = turing {
        logicians.push(turing_inner);
    }
}

If you need to tack an Option to the end of an existing iterator, you can pass it to .chain():


fn main() {
    let turing = Some("Turing");
    let logicians = vec!["Curry", "Kleene", "Markov"];

    for logician in logicians.iter().chain(turing.iter()) {
        println!("{} is a logician", logician);
    }
}

Note that if the Option is always Some, then it is more idiomatic to use std::iter::once on the element instead.
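
For the case where the extra element is always present, a sketch using std::iter::once might look like this. The function name `logicians_with` is hypothetical.

```rust
use std::iter;

// Hypothetical helper: appends one element that is known to exist.
fn logicians_with(extra: &'static str) -> Vec<&'static str> {
    let logicians = vec!["Curry", "Kleene", "Markov"];
    // `once` yields exactly one item, so no Option wrapper is needed.
    logicians.into_iter().chain(iter::once(extra)).collect()
}

fn main() {
    for logician in logicians_with("Turing") {
        println!("{} is a logician", logician);
    }
}
```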

Also, since Option implements IntoIterator, it's possible to iterate over it using a for loop. This is equivalent to matching it with if let Some(..), and in most cases you should prefer the latter.

Pass variables to closure

Description

By default, closures capture their environment by borrowing. Alternatively, you can use a move closure to move the whole environment. However, often you want to move just some variables into the closure, give it a copy of some data, pass some data by reference, or perform some other transformation.

Use variable rebinding in a separate scope for that.

Example

Use


use std::rc::Rc;

fn main() {
    let num1 = Rc::new(1);
    let num2 = Rc::new(2);
    let num3 = Rc::new(3);
    let closure = {
        // `num1` is moved
        let num2 = num2.clone();  // `num2` is cloned
        let num3 = num3.as_ref();  // `num3` is borrowed
        move || {
            *num1 + *num2 + *num3
        }
    };
}

instead of


use std::rc::Rc;

fn main() {
    let num1 = Rc::new(1);
    let num2 = Rc::new(2);
    let num3 = Rc::new(3);

    let num2_cloned = num2.clone();
    let num3_borrowed = num3.as_ref();
    let closure = move || {
        *num1 + *num2_cloned + *num3_borrowed
    };
}

Advantages

Copied data are grouped together with the closure definition, so their purpose is clearer, and they will be dropped immediately even if they are not consumed by the closure.

The closure uses the same variable names as the surrounding code, whether data are copied or moved.

Disadvantages

Additional indentation of closure body.

Privacy for extensibility

Description

Use a private field to ensure that a struct is extensible without breaking stability guarantees.

Example

mod a {
    // Public struct.
    pub struct S {
        pub foo: i32,
        // Private field.
        bar: i32,
    }
}

fn destructure(s: a::S) {
    // Because S::bar is private, it cannot be named here and we must use `..`
    // in the pattern.
    let a::S { foo: _, ..} = s;
}

Discussion

Adding a field to a struct is a mostly backwards compatible change. However, if a client uses a pattern to deconstruct a struct instance, they might name all the fields in the struct and adding a new one would break that pattern. The client could name some of the fields and use .. in the pattern, in which case adding another field is backwards compatible. Making at least one of the struct's fields private forces clients to use the latter form of patterns, ensuring that the struct is future-proof.

The downside of this approach is that you might need to add an otherwise unneeded field to the struct. You can use the () type so that there is no runtime overhead and prepend _ to the field name to avoid the unused field warning.
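
A minimal sketch of the zero-sized private field described above. The field name `_private`, the constructor `new`, and the function `read_foo` are assumptions made for this illustration.

```rust
mod a {
    pub struct S {
        pub foo: i32,
        // Zero-sized private field: it costs nothing at runtime, but clients
        // can neither name it in a pattern nor build `S` with a struct literal.
        _private: (),
    }

    impl S {
        // Because of the private field, clients need a constructor.
        pub fn new(foo: i32) -> S {
            S { foo, _private: () }
        }
    }
}

// Client code must use `..`, so adding fields to `S` later will not break it.
fn read_foo(s: a::S) -> i32 {
    let a::S { foo, .. } = s;
    foo
}

fn main() {
    assert_eq!(read_foo(a::S::new(7)), 7);
    // The unit field does not change the size of the struct.
    assert_eq!(std::mem::size_of::<a::S>(), std::mem::size_of::<i32>());
}
```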

If Rust allowed private variants of enums, we could use the same trick to make adding a variant to an enum backwards compatible. The problem there is exhaustive match expressions. A private variant would force clients to have a _ wildcard pattern.

Easy doc initialization

Description

If a struct takes significant effort to initialize, when writing docs, it can be quicker to wrap your example with a function which takes the struct as an argument.

Motivation

Sometimes there is a struct with multiple or complicated parameters and several methods. Each of these methods should have examples.

For example:

struct Connection {
    name: String,
    stream: TcpStream,
}

impl Connection {
    /// Sends a request over the connection.
    ///
    /// # Example
    /// ```no_run
    /// # // Boilerplate is required to get an example working.
    /// # let stream = TcpStream::connect("127.0.0.1:34254").unwrap();
    /// # let connection = Connection { name: "foo".to_owned(), stream };
    /// # let request = Request::new("RequestId", RequestType::Get, "payload");
    /// let response = connection.send_request(request);
    /// assert!(response.is_ok());
    /// ```
    fn send_request(&self, request: Request) -> Result<Status, SendErr> {
        // ...
    }
        
    /// Oh no, all that boilerplate needs to be repeated here!
    fn check_status(&self) -> Status {
        // ...
    }
}

Example

Instead of typing all of this boilerplate to create a Connection and Request, it is easier to just create a wrapping dummy function which takes them as arguments:

struct Connection {
    name: String,
    stream: TcpStream,
}

impl Connection {
    /// Sends a request over the connection.
    ///
    /// # Example
    /// ```
    /// # fn call_send(connection: Connection, request: Request) {
    /// let response = connection.send_request(request);
    /// assert!(response.is_ok()); 
    /// # }
    /// ```
    fn send_request(&self, request: Request) -> Result<Status, SendErr> {
        // ...
    }
}

Note in the above example the line assert!(response.is_ok()); will not actually run while testing because it is inside of a function which is never invoked.

Advantages

This is much more concise and avoids repetitive code in examples.

Disadvantages

As the example is in a function, the code will not be tested (though it will still be checked to make sure it compiles when running cargo test). So this pattern is most useful when you would otherwise need no_run. With this, you do not need to add no_run.

Discussion

If assertions are not required this pattern works well.

If they are, an alternative can be to create a public method to create a dummy instance which is annotated with #[doc(hidden)] (so that users won't see it). Then this method can be called inside of rustdoc because it is part of the crate's public API.

Temporary mutability

Description

Often it is necessary to prepare and process some data, but after that data are only inspected and never modified. The intention can be made explicit by redefining the mutable variable as immutable.

This can be done either by processing the data within a nested block or by redefining the variable.

Example

Say, a vector must be sorted before use.

Using nested block:

let data = {
    let mut data = get_vec();
    data.sort();
    data
};

// Here `data` is immutable.

Using variable rebinding:

let mut data = get_vec();
data.sort();
let data = data;

// Here `data` is immutable.
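
Both snippets above assume a `get_vec` function that is not shown. A self-contained sketch, with a hypothetical stand-in for `get_vec` and a hypothetical wrapper `sorted_data`, could look like this:

```rust
// Hypothetical stand-in for the `get_vec` function used above.
fn get_vec() -> Vec<i32> {
    vec![3, 1, 2]
}

fn sorted_data() -> Vec<i32> {
    let data = {
        // Mutability is confined to this block.
        let mut data = get_vec();
        data.sort();
        data
    };

    // From here on `data` is immutable; uncommenting the next line
    // would be rejected by the compiler.
    // data.push(4);
    data
}

fn main() {
    assert_eq!(sorted_data(), vec![1, 2, 3]);
}
```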

Advantages

The compiler ensures that you don't accidentally mutate data after some point.

Disadvantages

A nested block requires additional indentation of the block body. Either approach needs one more line: to return the data from the block, or to redefine the variable.

Design Patterns

Design patterns are "general reusable solutions to a commonly occurring problem within a given context in software design". Design patterns are a great way to describe some of the culture and 'tribal knowledge' of programming in a language. Design patterns are very language-specific - what is a pattern in one language may be unnecessary in another due to a language feature, or impossible to express due to a missing feature.

If overused, design patterns can add unnecessary complexity to programs. However, they are a great way to share intermediate and advanced level knowledge about a programming language.

Design patterns in Rust

Rust has many unique features. These features give us great benefit by removing whole classes of problems. Some of them are also patterns that are unique to Rust.

YAGNI

If you're not familiar with it, YAGNI is an acronym that stands for You Aren't Going to Need It. It's an important software design principle to apply as you write code.

The best code I ever wrote was code I never wrote.

If we apply YAGNI to design patterns, we see that the features of Rust allow us to throw out many patterns. For instance, there is no need for the strategy pattern in Rust because we can just use traits.
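
As a sketch of what "just use traits" can mean here, the interchangeable behaviour that a strategy object would carry can be expressed as a trait, or even as a closure. All names below (`Discount`, `NoDiscount`, `HalfOff`, `total`, `total_with`) are hypothetical and chosen only for illustration.

```rust
// Hypothetical trait standing in for a "strategy" interface.
trait Discount {
    fn apply(&self, price: u32) -> u32;
}

struct NoDiscount;
struct HalfOff;

impl Discount for NoDiscount {
    fn apply(&self, price: u32) -> u32 {
        price
    }
}

impl Discount for HalfOff {
    fn apply(&self, price: u32) -> u32 {
        price / 2
    }
}

// The behaviour is selected through a type parameter (static dispatch).
fn total<D: Discount>(prices: &[u32], discount: &D) -> u32 {
    prices.iter().map(|&p| discount.apply(p)).sum()
}

// Closures implement the `Fn` traits, so they can play the same role.
fn total_with<F: Fn(u32) -> u32>(prices: &[u32], f: F) -> u32 {
    prices.iter().map(|&p| f(p)).sum()
}

fn main() {
    let prices = [10, 20];
    assert_eq!(total(&prices, &NoDiscount), 30);
    assert_eq!(total(&prices, &HalfOff), 15);
    assert_eq!(total_with(&prices, |p| p - 1), 28);
}
```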

TODO: Maybe include some code to illustrate the traits.

Builder

Description

Construct an object with calls to a builder helper.

Example

struct Foo {
    // Lots of complicated fields.
}

struct FooBuilder {
    // Probably lots of optional fields.
    //..
}

impl FooBuilder {
    fn new(
        //..
    ) -> FooBuilder {
        // Set the minimally required fields of Foo.
    }

    fn named(mut self, name: &str) -> FooBuilder {
        // Set the name on the builder itself, and return the builder by value.
    }

    // More methods that take `mut self` and return `FooBuilder` setting up
    // various aspects of a Foo.
    ...

    // If we can get away with not consuming the Builder here, that is an
    // advantage. It means we can use the builder as a template for constructing
    // many Foos.
    fn finish(&self) -> Foo {
        // Create a Foo from the FooBuilder, applying all settings in FooBuilder to Foo.
    }
}

fn main() {
    let f = FooBuilder::new().named("Bar").with_attribute(...).finish();
}

Motivation

Useful when you would otherwise require many different constructors or where construction has side effects.

Advantages

Separates methods for building from other methods.

Prevents proliferation of constructors.

Can be used for one-liner initialisation as well as more complex construction.

Disadvantages

More complex than creating a struct object directly, or a simple constructor function.

Discussion

This pattern is seen more frequently in Rust (and for simpler objects) than in many other languages because Rust lacks overloading. Since you can only have a single method with a given name, having multiple constructors is less nice in Rust than in C++, Java, or others.

This pattern is often used where the builder object is useful in its own right, rather than being just a builder. For example, std::process::Command is a builder for Child (a process). In these cases, the T and TBuilder pattern of naming is not used.

The example takes and returns the builder by value. It is often more ergonomic (and more efficient) to take and return the builder as a mutable reference. The borrow checker makes this work naturally. This approach has the advantage that one can write code like

let mut fb = FooBuilder::new();
fb.a();
fb.b();
let f = fb.finish();

as well as the FooBuilder::new().a().b().finish() style.
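
A runnable sketch of the mutable-reference variant follows. The fields `name` and `retries` and the setter `retries` are assumptions invented for this illustration; the original example leaves the fields of Foo and FooBuilder unspecified.

```rust
// Hypothetical fields; the original example does not specify them.
#[derive(Debug, PartialEq)]
struct Foo {
    name: String,
    retries: u32,
}

#[derive(Default)]
struct FooBuilder {
    name: String,
    retries: u32,
}

impl FooBuilder {
    fn new() -> FooBuilder {
        FooBuilder::default()
    }

    // Setters take and return `&mut Self`, so calls can be chained
    // or issued one at a time.
    fn named(&mut self, name: &str) -> &mut FooBuilder {
        self.name = name.to_owned();
        self
    }

    fn retries(&mut self, retries: u32) -> &mut FooBuilder {
        self.retries = retries;
        self
    }

    // Does not consume the builder, so it can serve as a template.
    fn finish(&self) -> Foo {
        Foo { name: self.name.clone(), retries: self.retries }
    }
}

fn main() {
    // Chained style.
    let a = FooBuilder::new().named("Bar").retries(3).finish();

    // Step-by-step style.
    let mut fb = FooBuilder::new();
    fb.named("Bar");
    fb.retries(3);
    let b = fb.finish();

    assert_eq!(a, b);
}
```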

Compose structs together for better borrowing

TODO - this is not a very snappy name

Description

Sometimes a large struct will cause issues with the borrow checker - although fields can be borrowed independently, sometimes the whole struct ends up being used at once, preventing other uses. A solution might be to decompose the struct into several smaller structs. Then compose these together into the original struct. Then each struct can be borrowed separately and have more flexible behaviour.

This will often lead to a better design in other ways: applying this design pattern often reveals smaller units of functionality.

Example

Here is a contrived example of where the borrow checker foils us in our plan to use a struct:

struct A {
    f1: u32,
    f2: u32,
    f3: u32,
}

fn foo(a: &mut A) -> &u32 { &a.f2 }
fn bar(a: &mut A) -> u32 { a.f1 + a.f3 }

fn baz(a: &mut A) {
    // x causes a to be borrowed for the rest of the function.
    let x = foo(a);
    // Borrow check error
    let y = bar(a); //~ ERROR: cannot borrow `*a` as mutable more than once at a time
}

We can apply this design pattern and refactor A into two smaller structs, thus solving the borrow checking issue:


// A is now composed of two structs - B and C.
struct A {
    b: B,
    c: C,
}
struct B {
    f2: u32,
}
struct C {
    f1: u32,
    f3: u32,
}

// These functions take a B or C, rather than A.
fn foo(b: &mut B) -> &u32 { &b.f2 }
fn bar(c: &mut C) -> u32 { c.f1 + c.f3 }

fn baz(a: &mut A) {
    let x = foo(&mut a.b);
    // Now it's OK!
    let y = bar(&mut a.c);
}

Motivation

Why and where you should use the pattern

Advantages

Lets you work around limitations in the borrow checker.

Often produces a better design.

Disadvantages

Leads to more verbose code.

Sometimes, the smaller structs are not good abstractions, and so we end up with a worse design. That is probably a 'code smell', indicating that the program should be refactored in some way.

Discussion

This pattern is not required in languages that don't have a borrow checker, so in that sense is unique to Rust. However, making smaller units of functionality often leads to cleaner code: a widely acknowledged principle of software engineering, independent of the language.

This pattern relies on Rust's borrow checker to be able to borrow fields independently of each other. In the example, the borrow checker knows that a.b and a.c are distinct and can be borrowed independently; it does not try to borrow all of a, which would make this pattern useless.

Entry API

Description

A short, prose description of the pattern.

Example


#![allow(unused)]
fn main() {
// An example of the pattern in action, should be mostly code, commented
// liberally.
}

Motivation

Why and where you should use the pattern

Advantages

Good things about this pattern.

Disadvantages

Bad things about this pattern. Possible contraindications.

Discussion

TODO vs insert_or_update etc.
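
This section is still a template. As a tentative illustration only, and assuming the pattern refers to the entry API of the standard library's HashMap, a typical use is updating a value in place while inserting a default if the key is missing. The `word_counts` function is a hypothetical example.

```rust
use std::collections::HashMap;

// Hypothetical example: counts how often each word occurs in `text`.
fn word_counts(text: &str) -> HashMap<&str, u32> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        // A single lookup: insert 0 if the key is absent, then
        // increment the stored value in place.
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let counts = word_counts("a b a");
    assert_eq!(counts["a"], 2);
    assert_eq!(counts["b"], 1);
}
```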

Fold

Description

Run an algorithm over each item in a collection of data to create a new item, thus creating a whole new collection.

The etymology here is unclear to me. The terms 'fold' and 'folder' are used in the Rust compiler, although it appears to me to be more like a map than a fold in the usual sense. See the discussion below for more details.

Example

// The data we will fold, a simple AST.
mod ast {
    pub enum Stmt {
        Expr(Box<Expr>),
        Let(Box<Name>, Box<Expr>),
    }

    pub struct Name {
        pub value: String,
    }

    pub enum Expr {
        IntLit(i64),
        Add(Box<Expr>, Box<Expr>),
        Sub(Box<Expr>, Box<Expr>),
    }
}

// The abstract folder
mod fold {
    use super::ast::*;

    pub trait Folder {
        // A leaf node just returns the node itself. In some cases, we can do this
        // to inner nodes too.
        fn fold_name(&mut self, n: Box<Name>) -> Box<Name> { n }
        // Create a new inner node by folding its children.
        fn fold_stmt(&mut self, s: Box<Stmt>) -> Box<Stmt> {
            match *s {
                Stmt::Expr(e) => Box::new(Stmt::Expr(self.fold_expr(e))),
                Stmt::Let(n, e) => Box::new(Stmt::Let(self.fold_name(n), self.fold_expr(e))),
            }
        }
        fn fold_expr(&mut self, e: Box<Expr>) -> Box<Expr> { ... }
    }
}

use fold::*;
use ast::*;

// An example concrete implementation - renames every name to 'foo'.
struct Renamer;
impl Folder for Renamer {
    fn fold_name(&mut self, _n: Box<Name>) -> Box<Name> {
        Box::new(Name { value: "foo".to_owned() })
    }
    // Use the default methods for the other nodes.
}

The result of running the Renamer on an AST is a new AST identical to the old one, but with every name changed to foo. A real life folder might have some state preserved between nodes in the struct itself.

A folder can also be defined to map one data structure to a different (but usually similar) data structure. For example, we could fold an AST into a HIR tree (HIR stands for high-level intermediate representation).

Motivation

It is common to want to map a data structure by performing some operation on each node in the structure. For simple operations on simple data structures, this can be done using Iterator::map. For more complex operations, perhaps where earlier nodes can affect the operation on later nodes, or where iteration over the data structure is non-trivial, using the fold pattern is more appropriate.

Like the visitor pattern, the fold pattern allows us to separate traversal of a data structure from the operations performed on each node.

Discussion

Mapping data structures in this fashion is common in functional languages. In OO languages, it would be more common to mutate the data structure in place. The 'functional' approach is common in Rust, mostly due to the preference for immutability. Using fresh data structures, rather than mutating old ones, makes reasoning about the code easier in most circumstances.

The trade-off between efficiency and reusability can be tweaked by changing how nodes are accepted by the fold_* methods.

In the above example we operate on Box pointers. Since these own their data exclusively, the original copy of the data structure cannot be re-used. On the other hand if a node is not changed, reusing it is very efficient.

If we were to operate on borrowed references, the original data structure can be reused; however, a node must be cloned even if unchanged, which can be expensive.

Using a reference counted pointer gives the best of both worlds - we can reuse the original data structure and we don't need to clone unchanged nodes. However, reference counted pointers are less ergonomic to use and mean that the data structures cannot be mutable.
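The reference counted variant might be sketched as follows. This is a simplified illustration, not the example above: the cut-down Expr type (with a Name variant) and the free function rename are assumptions made for brevity.

```rust
use std::rc::Rc;

// Hypothetical, cut-down expression type used only for this sketch.
#[derive(Debug)]
enum Expr {
    IntLit(i64),
    Name(String),
    Add(Rc<Expr>, Rc<Expr>),
}

// Renames every Name to "foo", sharing unchanged subtrees with the input.
fn rename(e: &Rc<Expr>) -> Rc<Expr> {
    match &**e {
        // Unchanged leaf: reuse it by bumping the reference count.
        Expr::IntLit(_) => Rc::clone(e),
        Expr::Name(_) => Rc::new(Expr::Name("foo".to_owned())),
        Expr::Add(l, r) => {
            let (nl, nr) = (rename(l), rename(r));
            if Rc::ptr_eq(&nl, l) && Rc::ptr_eq(&nr, r) {
                // Neither child changed, so the whole node can be shared.
                Rc::clone(e)
            } else {
                Rc::new(Expr::Add(nl, nr))
            }
        }
    }
}

fn main() {
    let shared = Rc::new(Expr::Add(
        Rc::new(Expr::IntLit(1)),
        Rc::new(Expr::IntLit(2)),
    ));
    let tree = Rc::new(Expr::Add(
        Rc::clone(&shared),
        Rc::new(Expr::Name("x".to_owned())),
    ));
    let folded = rename(&tree);
    // Both the original and the folded tree remain usable.
    println!("{:?}\n{:?}", tree, folded);
}
```

The original tree is still available after the fold, and the subtree that contains no names is shared between the two trees rather than cloned.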

Late bound bounds

Description

TODO late binding of bounds for better APIs (i.e., Mutex's don't require Send)

Example


#![allow(unused)]
fn main() {
// An example of the pattern in action, should be mostly code, commented
// liberally.
}

Motivation

Why and where you should use the pattern

Advantages

Good things about this pattern.

Disadvantages

Bad things about this pattern. Possible contraindications.

Discussion

A deeper discussion about this pattern. You might want to cover how this is done in other languages, alternative approaches, why this is particularly nice in Rust, etc.

Newtype

Rust has strong static types. This can be very different from what you are used to if you are coming from a loosely typed language. Don't worry, though. Once you get used to them, you'll find the types actually make your life easier. Why? Because you are making implicit assumptions explicit.

A really convenient application of the Rust type system is the Newtype pattern.

Description

Use a tuple struct with a single field to make an opaque wrapper for a type. This creates a new type, rather than an alias to a type (which is what a type item creates).

Example

// Some type, not necessarily in the same module or even crate.
struct Foo {
    //..
}

impl Foo {
    // These functions are not present on Bar.
    //..
}

// The newtype.
pub struct Bar(Foo);

impl Bar {
    // Constructor.
    pub fn new(
        //..
    ) -> Bar {
    
        //..
    
    }

    //..
}

fn main() {
    let b = Bar::new(...);

    // Foo and Bar are type incompatible, the following do not type check.
    // let f: Foo = b;
    // let b: Bar = Foo { ... };
}

Motivation

The primary motivation for newtypes is abstraction. A newtype allows you to share implementation details between types while precisely controlling the interface. Using a newtype rather than exposing the implementation type as part of an API lets you change the implementation backwards compatibly.

Newtypes can be used for distinguishing units, e.g., wrapping f64 to give distinguishable Miles and Kms.
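A minimal sketch of the units case. The type names Miles and Kms come from the text; the to_kms conversion and the remaining function are hypothetical additions for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Miles(f64);

#[derive(Debug, Clone, Copy, PartialEq)]
struct Kms(f64);

impl Miles {
    // Explicit conversion: one international mile is 1.609344 km.
    fn to_kms(self) -> Kms {
        Kms(self.0 * 1.609344)
    }
}

// Hypothetical function that only accepts kilometres.
fn remaining(total: Kms, travelled: Kms) -> Kms {
    Kms(total.0 - travelled.0)
}

fn main() {
    let total = Kms(10.0);
    let travelled = Miles(1.0);
    // remaining(total, travelled) would not type check: expected Kms, found Miles.
    let left = remaining(total, travelled.to_kms());
    println!("{:?}", left);
}
```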

Advantages

The wrapped and wrapper types are not type compatible (as opposed to using type), so users of the newtype will never 'confuse' the wrapped and wrapper types.

Newtypes are a zero-cost abstraction - there is no runtime overhead.

The privacy system ensures that users cannot access the wrapped type (if the field is private, which it is by default).

Disadvantages

The downside of newtypes (especially compared with type aliases) is that there is no special language support. This means there can be a lot of boilerplate. You need a 'pass through' method for every method you want to expose on the wrapped type, and an impl for every trait you want to also be implemented for the wrapper type.
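To give a feel for that boilerplate, here is a small sketch with a hypothetical Username newtype over String. The type and its methods are made up for illustration; each forwarded method and trait has to be written out by hand.

```rust
use std::fmt;

// Hypothetical newtype over String.
pub struct Username(String);

impl Username {
    pub fn new(s: &str) -> Username {
        Username(s.to_owned())
    }

    // Pass-through method: String::len is not available on Username
    // unless we forward it explicitly.
    pub fn len(&self) -> usize {
        self.0.len()
    }
}

// Traits implemented by String are not implemented for Username either,
// so Display has to be forwarded by hand as well.
impl fmt::Display for Username {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(&self.0, f)
    }
}

fn main() {
    let u = Username::new("ann");
    println!("{} has {} bytes", u, u.len());
}
```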

Discussion

Newtypes are very common in Rust code. Abstraction or representing units are the most common uses, but they can be used for other reasons:

restricting functionality (reduce the functions exposed or traits implemented),
making a type with copy semantics have move semantics,
abstraction by providing a more concrete type and thus hiding internal types, e.g.,

pub struct Foo(Bar<T1, T2>);

Here, Bar might be some public, generic type and T1 and T2 are some internal types. Users of our module shouldn't know that we implement Foo by using a Bar, but what we're really hiding here is the types T1 and T2, and how they are used with Bar.

RAII with guards

Description

RAII stands for "Resource Acquisition is Initialisation" which is a terrible name. The essence of the pattern is that resource initialisation is done in the constructor of an object and finalisation in the destructor. This pattern is extended in Rust by using an RAII object as a guard of some resource and relying on the type system to ensure that access is always mediated by the guard object.

Example

Mutex guards are the classic example of this pattern from the std library (this is a simplified version of the real implementation):

use std::ops::Deref;

struct Foo {}

struct Mutex<T> {
    // We keep a reference to our data: T here.
    //..
}

struct MutexGuard<'a, T: 'a> {
    data: &'a T,
    //..
}

// Locking the mutex is explicit.
impl<T> Mutex<T> {
    fn lock(&self) -> MutexGuard<T> {
        // Lock the underlying OS mutex.
        //..

        // MutexGuard keeps a reference to self
        MutexGuard { 
            data: self, 
            //.. 
        }
    }
}

// Destructor for unlocking the mutex.
impl<'a, T> Drop for MutexGuard<'a, T> {
    fn drop(&mut self) {
        // Unlock the underlying OS mutex.
        //..
    }
}

// Implementing Deref means we can treat MutexGuard like a pointer to T.
impl<'a, T> Deref for MutexGuard<'a, T> {
    type Target = T;

    fn deref(&self) -> &T {
        self.data
    }
}

fn baz(x: Mutex<Foo>) {
    let xx = x.lock();
    xx.foo(); // foo is a method on Foo.
    // The borrow checker ensures we can't store a reference to the underlying
    // Foo which will outlive the guard xx.

    // x is unlocked when we exit this function and xx's destructor is executed.
}

Motivation

Where a resource must be finalised after use, RAII can be used to do this finalisation. If it is an error to access that resource after finalisation, then this pattern can be used to prevent such errors.

Advantages

Prevents errors where a resource is not finalised and where a resource is used after finalisation.

Discussion

RAII is a useful pattern for ensuring resources are properly deallocated or finalised. We can make use of the borrow checker in Rust to statically prevent errors stemming from using resources after finalisation takes place.

The core aim of the borrow checker is to ensure that references to data do not outlive that data. The RAII guard pattern works because the guard object contains a reference to the underlying resource and only exposes such references. Rust ensures that the guard cannot outlive the underlying resource and that references to the resource mediated by the guard cannot outlive the guard. To see how this works it is helpful to examine the signature of deref without lifetime elision:

fn deref<'a>(&'a self) -> &'a T {
    //..
}

The returned reference to the resource has the same lifetime as self ('a). The borrow checker therefore ensures that the reference to T cannot outlive self.

Note that implementing Deref is not a core part of this pattern, it only makes using the guard object more ergonomic. Implementing a get method on the guard works just as well.
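The simplified Mutex above elides most of its body. As a self-contained sketch of the same guard idea, consider the following single-threaded stand-in. The Lock and Guard types are hypothetical, use a Cell<bool> flag instead of an OS mutex, and are not how std implements Mutex.

```rust
use std::cell::Cell;
use std::ops::Deref;

// Hypothetical single-threaded "lock": a flag plus the protected data.
struct Lock<T> {
    locked: Cell<bool>,
    data: T,
}

// The guard borrows the lock, so it cannot outlive it.
struct Guard<'a, T> {
    lock: &'a Lock<T>,
}

impl<T> Lock<T> {
    fn new(data: T) -> Lock<T> {
        Lock { locked: Cell::new(false), data }
    }

    // Acquiring the resource is explicit and hands back a guard.
    fn lock(&self) -> Guard<'_, T> {
        assert!(!self.locked.get(), "already locked");
        self.locked.set(true);
        Guard { lock: self }
    }

    fn is_locked(&self) -> bool {
        self.locked.get()
    }
}

// Finalisation happens in the guard's destructor.
impl<T> Drop for Guard<'_, T> {
    fn drop(&mut self) {
        self.lock.locked.set(false);
    }
}

// Access to the data is only possible through the guard.
impl<T> Deref for Guard<'_, T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.lock.data
    }
}

fn main() {
    let l = Lock::new(String::from("hello"));
    {
        let g = l.lock();
        println!("locked: {}, len: {}", l.is_locked(), g.len());
    } // g is dropped here, which releases the lock.
    println!("locked: {}", l.is_locked());
}
```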

Prefer small crates

Description

Prefer small crates that do one thing well.

Cargo and crates.io make it easy to add third-party libraries, much more so than in, say, C or C++. Moreover, since packages on crates.io cannot be edited or removed after publication, any build that works now should continue to work in the future. We should take advantage of this tooling, and use smaller, more fine-grained dependencies.

Advantages

Small crates are easier to understand, and encourage more modular code.
Crates allow for re-using code between projects. For example, the url crate was developed as part of the Servo browser engine, but has since found wide use outside the project.
Since the compilation unit of Rust is the crate, splitting a project into multiple crates can allow more of the code to be built in parallel.

Disadvantages

This can lead to "dependency hell", when a project depends on multiple conflicting versions of a crate at the same time. For example, the url crate has both versions 1.0 and 0.5. Since the Url from url:1.0 and the Url from url:0.5 are different types, an HTTP client that uses url:0.5 would not accept Url values from a web scraper that uses url:1.0.
Packages on crates.io are not curated. A crate may be poorly written, have unhelpful documentation, or be outright malicious.
Two small crates may be less optimized than one large one, since the compiler does not perform link-time optimization (LTO) by default.

Examples

The ref_slice crate provides functions for converting &T to &[T].

The url crate provides tools for working with URLs.

The num_cpus crate provides a function to query the number of CPUs on a machine.

Contain unsafety in small modules

Description

If you have unsafe code, create the smallest possible module that can uphold the needed invariants to build a minimal safe interface upon the unsafety. Embed this into a larger module that contains only safe code and presents an ergonomic interface. Note that the outer module can contain unsafe functions and methods that call directly into the unsafe code. Users may use this to gain speed benefits.

Advantages

This restricts the unsafe code that must be audited.
Writing the outer module is much easier, since you can count on the guarantees of the inner module.

Disadvantages

Sometimes, it may be hard to find a suitable interface.
The abstraction may introduce inefficiencies.

Examples

The toolshed crate contains its unsafe operations in submodules, presenting a safe interface to users.
std's String type is a wrapper over Vec<u8> with the added invariant that the contents must be valid UTF-8. The operations on String ensure this behavior. However, users have the option of using an unsafe method to create a String, in which case the onus is on them to guarantee the validity of the contents.
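A minimal sketch of the idea, using a hypothetical non_empty module that is not taken from any real crate. The invariant (the vector is never empty) is established by the only constructor and protected by field privacy, so the single unsafe block can be audited in isolation.

```rust
mod non_empty {
    // The field is private, so code outside this module cannot break the invariant.
    pub struct NonEmpty(Vec<i32>);

    impl NonEmpty {
        // The only way to build a NonEmpty; rejects empty input.
        pub fn new(v: Vec<i32>) -> Option<NonEmpty> {
            if v.is_empty() {
                None
            } else {
                Some(NonEmpty(v))
            }
        }

        // Safe interface built on an unchecked access.
        pub fn first(&self) -> i32 {
            // SAFETY: `new` rejects empty vectors and nothing in this module
            // removes elements, so index 0 is always in bounds.
            unsafe { *self.0.get_unchecked(0) }
        }
    }
}

fn main() {
    let v = non_empty::NonEmpty::new(vec![7, 8]).expect("vector is not empty");
    println!("{}", v.first());
}
```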

Visitor

Description

A visitor encapsulates an algorithm that operates over a heterogeneous collection of objects. It allows multiple different algorithms to be written over the same data without having to modify the data (or their primary behaviour).

Furthermore, the visitor pattern allows separating the traversal of a collection of objects from the operations performed on each object.

Example

// The data we will visit
mod ast {
    pub enum Stmt {
        Expr(Expr),
        Let(Name, Expr),
    }

    pub struct Name {
        value: String,
    }

    pub enum Expr {
        IntLit(i64),
        Add(Box<Expr>, Box<Expr>),
        Sub(Box<Expr>, Box<Expr>),
    }
}

// The abstract visitor
mod visit {
    use crate::ast::*;

    pub trait Visitor<T> {
        fn visit_name(&mut self, n: &Name) -> T;
        fn visit_stmt(&mut self, s: &Stmt) -> T;
        fn visit_expr(&mut self, e: &Expr) -> T;
    }
}

use visit::*;
use ast::*;

// An example concrete implementation - walks the AST interpreting it as code.
struct Interpreter;
impl Visitor<i64> for Interpreter {
    fn visit_name(&mut self, _n: &Name) -> i64 { panic!() }
    fn visit_stmt(&mut self, s: &Stmt) -> i64 {
        match *s {
            Stmt::Expr(ref e) => self.visit_expr(e),
            Stmt::Let(..) => unimplemented!(),
        }
    }

    fn visit_expr(&mut self, e: &Expr) -> i64 {
        match *e {
            Expr::IntLit(n) => n,
            Expr::Add(ref lhs, ref rhs) => self.visit_expr(lhs) + self.visit_expr(rhs),
            Expr::Sub(ref lhs, ref rhs) => self.visit_expr(lhs) - self.visit_expr(rhs),
        }
    }
}

One could implement further visitors, for example a type checker, without having to modify the AST data.

Motivation

The visitor pattern is useful anywhere that you want to apply an algorithm to heterogeneous data. If data is homogeneous, you can use an iterator-like pattern. Using a visitor object (rather than a functional approach) allows the visitor to be stateful and thus communicate information between nodes.

Discussion

It is common for the visit_* methods to return void (as opposed to in the example). In that case it is possible to factor out the traversal code and share it between algorithms (and also to provide noop default methods). In Rust, the common way to do this is to provide walk_* functions for each datum. For example,

// Assumes a visitor whose visit_* methods return nothing, i.e. Visitor<()>.
pub fn walk_expr<V: Visitor<()>>(visitor: &mut V, e: &Expr) {
    match *e {
        Expr::IntLit(_) => {},
        Expr::Add(ref lhs, ref rhs) => {
            visitor.visit_expr(lhs);
            visitor.visit_expr(rhs);
        }
        Expr::Sub(ref lhs, ref rhs) => {
            visitor.visit_expr(lhs);
            visitor.visit_expr(rhs);
        }
    }
}

In other languages (e.g., Java) it is common for data to have an accept method which performs the same duty.
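Putting the walk_* idea together, a self-contained sketch might look like this. It assumes a non-generic Visitor trait whose visit_expr method returns nothing and defaults to walking the children; the LitCounter visitor and the count_literals helper are hypothetical names introduced for illustration.

```rust
mod ast {
    pub enum Expr {
        IntLit(i64),
        Add(Box<Expr>, Box<Expr>),
        Sub(Box<Expr>, Box<Expr>),
    }
}

mod visit {
    use crate::ast::*;

    pub trait Visitor: Sized {
        // Default method: do nothing for this node, just traverse its children.
        fn visit_expr(&mut self, e: &Expr) {
            walk_expr(self, e)
        }
    }

    // The traversal, written once and shared by every visitor.
    pub fn walk_expr<V: Visitor>(visitor: &mut V, e: &Expr) {
        match e {
            Expr::IntLit(_) => {}
            Expr::Add(lhs, rhs) | Expr::Sub(lhs, rhs) => {
                visitor.visit_expr(lhs);
                visitor.visit_expr(rhs);
            }
        }
    }
}

use ast::*;
use visit::*;

// A stateful visitor that counts integer literals.
struct LitCounter {
    count: usize,
}

impl Visitor for LitCounter {
    fn visit_expr(&mut self, e: &Expr) {
        if let Expr::IntLit(_) = e {
            self.count += 1;
        }
        // Delegate the traversal to the shared walk function.
        walk_expr(self, e);
    }
}

fn count_literals(e: &Expr) -> usize {
    let mut counter = LitCounter { count: 0 };
    counter.visit_expr(e);
    counter.count
}

fn main() {
    // (1 + 2) - 3
    let e = Expr::Sub(
        Box::new(Expr::Add(Box::new(Expr::IntLit(1)), Box::new(Expr::IntLit(2)))),
        Box::new(Expr::IntLit(3)),
    );
    println!("{}", count_literals(&e));
}
```

The visitor only states what to do at each node, while walk_expr decides how to reach the children.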

Anti-patterns

An anti-pattern is a solution to a "recurring problem that is usually ineffective and risks being highly counterproductive". Knowing how not to solve a problem is just as valuable as knowing how to solve it. Anti-patterns give us great counter-examples to consider relative to design patterns. Anti-patterns are not confined to code. For example, a process can be an anti-pattern, too.

`#![deny(warnings)]`

Description

A well-intentioned crate author wants to ensure their code builds without warnings. So they annotate their crate root with the following:

Example


#![deny(warnings)]

fn main() {
    // All is well.
}

Advantages

It is short and will stop the build if anything is amiss.

Drawbacks

By forbidding the compiler from building with warnings, a crate author opts out of Rust's famed stability. Sometimes new features or old misfeatures need a change in how things are done, so lints are written that warn for a certain grace period before being turned to deny.

For example, it was discovered that a type could have two impls with the same method. This was deemed a bad idea, but in order to make the transition smooth, the overlapping-inherent-impls lint was introduced to give a warning to those stumbling on this fact, before it becomes a hard error in a future release.

Also sometimes APIs get deprecated, so their use will emit a warning where before there was none.

All this conspires to potentially break the build whenever something changes.

Furthermore, crates that supply additional lints (e.g. rust-clippy) can no longer be used unless the annotation is removed. This is mitigated with --cap-lints.

Alternatives

There are two ways of tackling this problem: First, we can decouple the build setting from the code, and second, we can name the lints we want to deny explicitly.

The following command line will build with all warnings set to deny:

RUSTFLAGS="-D warnings" cargo build

This can be done by any individual developer (or be set in a CI tool like Travis, but remember that this may break the build when something changes) without requiring a change to the code.

Alternatively, we can specify the lints that we want to deny in the code. Here is a list of warning lints that is (hopefully) safe to deny:

#[deny(bad_style,
       const_err,
       dead_code,
       extra_requirement_in_impl,
       improper_ctypes,
       legacy_directory_ownership,
       non_shorthand_field_patterns,
       no_mangle_generic_items,
       overflowing_literals,
       path_statements,
       patterns_in_fns_without_body,
       plugin_as_library,
       private_in_public,
       private_no_mangle_fns,
       private_no_mangle_statics,
       raw_pointer_derive,
       safe_extern_statics,
       unconditional_recursion,
       unions_with_drop_fields,
       unused,
       unused_allocation,
       unused_comparisons,
       unused_parens,
       while_true)]

In addition, the following allowed lints may be a good idea to deny:

#[deny(missing_debug_implementations,
       missing_docs,
       trivial_casts,
       trivial_numeric_casts,
       unused_extern_crates,
       unused_import_braces,
       unused_qualifications,
       unused_results)]

Some may also want to add missing_copy_implementations to their list.

Note that we explicitly did not add the deprecated lint, as it is fairly certain that there will be more deprecated APIs in the future.

`Deref` polymorphism

Description

Abuse the Deref trait to emulate inheritance between structs, and thus reuse methods.

Example

Sometimes we want to emulate the following common pattern from OO languages such as Java:

class Foo {
    void m() { ... }
}

class Bar extends Foo {}

public static void main(String[] args) {
    Bar b = new Bar();
    b.m();
}

We can use the deref polymorphism anti-pattern to do so:

use std::ops::Deref;

struct Foo {}

impl Foo {
    fn m(&self) { 
        //.. 
    }

}

struct Bar {
    f: Foo
}

impl Deref for Bar {
    type Target = Foo;
    fn deref(&self) -> &Foo {
        &self.f
    }
}

fn main() {
    let b = Bar { f: Foo {} };
    b.m();
}

There is no struct inheritance in Rust. Instead we use composition and include an instance of Foo in Bar. Since the field is a value, it is stored inline, so if there were fields, they would probably have the same layout in memory as the Java version (use #[repr(C)] if you want to be sure).

In order to make the method call work we implement Deref for Bar with Foo as the target (returning the embedded Foo field). That means that when we dereference a Bar (for example, using *) then we will get a Foo. That is pretty weird. Dereferencing usually gives a T from a reference to T, whereas here we have two unrelated types. However, since the dot operator does implicit dereferencing, it means that the method call will search for methods on Foo as well as Bar.

Advantages

You save a little boilerplate, e.g.,

impl Bar {
    fn m(&self) { 
        self.f.m()
    }
}

Disadvantages

Most importantly this is a surprising idiom - future programmers reading this in code will not expect this to happen. That's because we are abusing the Deref trait rather than using it as intended (and documented, etc.). It's also because the mechanism here is completely implicit.

This pattern does not introduce subtyping between Foo and Bar like inheritance in Java or C++ does. Furthermore, traits implemented by Foo are not automatically implemented for Bar, so this pattern interacts badly with trait bound checking and thus generic programming.

Using this pattern gives subtly different semantics from most OO languages with regards to self. Usually it remains a reference to the sub-class; with this pattern it will be the 'class' where the method is defined.

Finally, this pattern only supports single inheritance, and has no notion of interfaces, class-based privacy, or other inheritance-related features. So, it gives an experience that will be subtly surprising to programmers used to Java inheritance, etc.

Discussion

There is no one good alternative. Depending on the exact circumstances it might be better to re-implement using traits or to write out the facade methods to dispatch to Foo manually. We do intend to add a mechanism for inheritance similar to this to Rust, but it is likely to be some time before it reaches stable Rust. See these blog posts and this RFC issue for more details.

The Deref trait is designed for the implementation of custom pointer types. The intention is that it will take a pointer-to-T to a T, not convert between different types. It is a shame that this isn't (probably cannot be) enforced by the trait definition.

Rust tries to strike a careful balance between explicit and implicit mechanisms, favouring explicit conversions between types. Automatic dereferencing in the dot operator is a case where the ergonomics strongly favour an implicit mechanism, but the intention is that this is limited to degrees of indirection, not conversion between arbitrary types.
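The two alternatives mentioned in this discussion (writing the facade method by hand, or using a trait) might be sketched as follows. The trait name HasM, its call_m method and the use_generic function are hypothetical, and m returns a string here only so that the result can be observed.

```rust
struct Foo;

impl Foo {
    fn m(&self) -> &'static str {
        "Foo::m"
    }
}

struct Bar {
    f: Foo,
}

// Alternative 1: an explicit facade method that dispatches to Foo.
impl Bar {
    fn m(&self) -> &'static str {
        self.f.m()
    }
}

// Alternative 2: a trait (hypothetical name) implemented by both types,
// so that generic code can accept either one.
trait HasM {
    fn call_m(&self) -> &'static str;
}

impl HasM for Foo {
    fn call_m(&self) -> &'static str {
        self.m()
    }
}

impl HasM for Bar {
    fn call_m(&self) -> &'static str {
        self.m()
    }
}

fn use_generic<T: HasM>(t: &T) -> &'static str {
    t.call_m()
}

fn main() {
    let b = Bar { f: Foo };
    println!("{} {}", b.m(), use_generic(&b));
}
```

Both variants cost a little more typing than the Deref trick, but the forwarding is visible in the code and the trait version works with trait bounds.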

Functional Usage of Rust

Rust is an imperative language, but it follows many functional programming paradigms. One of the biggest hurdles to understanding functional programs when coming from an imperative background is the shift in thinking. Imperative programs describe how to do something, whereas declarative programs describe what to do. Let's sum the numbers from 1 to 10 to show this.

Imperative


fn main() {
    let mut sum = 0;
    for i in 1..11 {
        sum += i;
    }
    println!("{}", sum);
}

With imperative programs, we have to play compiler to see what is happening. Here, we start with a sum of 0. Next, we iterate through the range from 1 to 10. Each time through the loop, we add the corresponding value in the range. Then we print it out.

`i`	`sum`
1	1
2	3
3	6
4	10
5	15
6	21
7	28
8	36
9	45
10	55

This is how most of us start out programming. We learn that a program is a set of steps.

Declarative


fn main() {
    println!("{}", (1..11).fold(0, |a, b| a + b));
}

Whoa! This is really different! What's going on here? Remember that with declarative programs we are describing what to do, rather than how to do it. fold is a function that composes functions. The name is a convention from Haskell.

Here, we are composing functions of addition (this closure: |a, b| a + b) with a range from 1 to 10. The 0 is the starting point, so a is 0 at first. b is the first element of the range, 1. 0 + 1 = 1 is the result. So now we fold again, with a = 1, b = 2 and so 1 + 2 = 3 is the next result. This process continues until we get to the last element in the range, 10.

`a`	`b`	result
0	1	1
1	2	3
3	3	6
6	4	10
10	5	15
15	6	21
21	7	28
28	8	36
36	9	45
45	10	55
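The result column of the table above can be reproduced with Iterator::scan, which behaves like fold but yields every intermediate accumulator value. The function names sum_fold and fold_steps are hypothetical helpers for this sketch.

```rust
// The fold from the text, wrapped in a function.
fn sum_fold(n: u32) -> u32 {
    (1..=n).fold(0, |a, b| a + b)
}

// Same accumulation, but collecting each intermediate result.
fn fold_steps(n: u32) -> Vec<u32> {
    (1..=n)
        .scan(0, |acc, b| {
            *acc += b;
            Some(*acc)
        })
        .collect()
}

fn main() {
    println!("{}", sum_fold(10));
    println!("{:?}", fold_steps(10));
}
```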

Rust Design Patterns