Lenses and Prisms

Lenses are a pure functional concept that is not frequently used in Rust. Nevertheless, exploring the concept can help in understanding other patterns in Rust APIs, such as visitors. Lenses also have niche use cases of their own.

Lenses: Uniform Access Across Types

A lens is a concept from functional programming languages that allows accessing parts of a data type in an abstract, unified way.1 Conceptually, it is similar to the way Rust traits work with type erasure, but it has a bit more power and flexibility.

For example, suppose a bank stores customer data in several JSON formats because the records come from different databases or legacy systems. One database contains the data needed to perform credit checks:

{ "name": "Jane Doe",
  "dob": "2002-02-24",
  [...]
  "customer_id": 1048576332,
}

Another one contains the account information:

{ "customer_id": 1048576332,
  "accounts": [
      { "account_id": 2121,
        "account_type": "savings",
        "joint_customer_ids": [],
        [...]
      },
      { "account_id": 2122,
        "account_type": "checking",
        "joint_customer_ids": [1048576333],
        [...]
      },
  ]
}

Notice that both types have a customer ID number which corresponds to a person. How would a single function handle both records of different types?

In Rust, a struct could represent each of these types, and a trait would have a get_customer_id function they would implement:

use std::collections::HashSet;

pub struct Account {
    account_id: u32,
    account_type: String,
    // other fields omitted
}

pub trait CustomerId {
    fn get_customer_id(&self) -> u64;
}

pub struct CreditRecord {
    customer_id: u64,
    name: String,
    dob: String,
    // other fields omitted
}

impl CustomerId for CreditRecord {
    fn get_customer_id(&self) -> u64 {
        self.customer_id
    }
}

pub struct AccountRecord {
    customer_id: u64,
    accounts: Vec<Account>,
}

impl CustomerId for AccountRecord {
    fn get_customer_id(&self) -> u64 {
        self.customer_id
    }
}

// static polymorphism: only one type, but each function call can choose it
fn unique_ids_set<R: CustomerId>(records: &[R]) -> HashSet<u64> {
    records.iter().map(|r| r.get_customer_id()).collect()
}

// dynamic dispatch: iterates over any type with a customer ID, collecting all
// values together
fn unique_ids_iter<I>(iterator: I) -> HashSet<u64>
    where I: Iterator<Item=Box<dyn CustomerId>>
{
    iterator.map(|r| r.as_ref().get_customer_id()).collect()
}

Lenses, however, allow the code supporting customer ID to be moved from the type to the accessor function. Rather than implementing a trait on each type, all matching structures can simply be accessed the same way.

While the Rust language itself does not support this (type erasure is the preferred solution to this problem), the lens-rs crate allows code that feels like this to be written with macros:

use std::collections::HashSet;

use lens_rs::{optics, Lens, LensRef, Optics};

#[derive(Clone, Debug, Lens /* derive to allow lenses to work */)]
pub struct CreditRecord {
    #[optic(ref)] // macro attribute to allow viewing this field
    customer_id: u64,
    name: String,
    dob: String,
    // other fields omitted
}

#[derive(Clone, Debug)]
pub struct Account {
    account_id: u32,
    account_type: String,
    // other fields omitted
}

#[derive(Clone, Debug, Lens)]
pub struct AccountRecord {
    #[optic(ref)]
    customer_id: u64,
    accounts: Vec<Account>,
}

fn unique_ids_lens<T>(iter: impl Iterator<Item = T>) -> HashSet<u64>
where
    T: LensRef<Optics![customer_id], u64>, // any type with this field
{
    iter.map(|r| *r.view_ref(optics!(customer_id))).collect()
}

The version of unique_ids_lens shown here allows any type to be in the iterator, so long as it has a field called customer_id which can be accessed by the function. This is how most functional programming languages operate on lenses.

Rather than macros, functional languages achieve this with a technique known as "currying". That is, they "partially construct" the function, leaving the type of the final parameter (the value being operated on) unfilled until the function is called. Thus it can be called with different types dynamically even from one place in the code. That is what optics! and view_ref in the example above simulate.
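
As a rough plain-Rust analogue of that partial construction, the sketch below fixes the accessor first and supplies the records later. The helper ids_with and the trimmed-down record fields are hypothetical and are not part of lens-rs. Unlike a true lens, each returned closure is tied to one record type, which is the gap the macros fill.

```rust
use std::collections::HashSet;

pub struct CreditRecord {
    pub customer_id: u64,
    pub name: String,
}

pub struct AccountRecord {
    pub customer_id: u64,
    pub account_ids: Vec<u32>, // simplified stand-in for the accounts list
}

// Hypothetical helper: takes only the accessor and returns a function that
// still waits for its final argument, the records themselves.
pub fn ids_with<T>(get_id: impl Fn(&T) -> u64) -> impl Fn(&[T]) -> HashSet<u64> {
    move |records: &[T]| records.iter().map(|r| get_id(r)).collect()
}
```

Calling ids_with with a closure such as |r: &CreditRecord| r.customer_id yields a function over credit records; calling it again with an AccountRecord closure yields a separate function over account records. The accessor logic lives in the closure rather than in a trait implemented on each type.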

The functional approach need not be restricted to accessing members. More powerful lenses can be created which both set and get data in a structure. But the concept really becomes interesting when used as a building block for composition. That is where the concept appears more clearly in Rust.
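
To make the get/set and composition ideas concrete, here is a minimal sketch of a lens as a pair of boxed closures. The Lens type, its then method, and the primary field are all hypothetical names chosen for illustration; lens-rs and functional languages model optics differently, and this version clones values for simplicity.

```rust
use std::rc::Rc;

// A lens bundles a getter and a setter for one part (A) of a whole (S).
pub struct Lens<S, A> {
    getter: Box<dyn Fn(&S) -> A>,
    setter: Box<dyn Fn(S, A) -> S>,
}

impl<S: 'static, A: 'static> Lens<S, A> {
    pub fn new(
        getter: impl Fn(&S) -> A + 'static,
        setter: impl Fn(S, A) -> S + 'static,
    ) -> Self {
        Lens { getter: Box::new(getter), setter: Box::new(setter) }
    }

    pub fn view(&self, whole: &S) -> A {
        (self.getter)(whole)
    }

    pub fn set(&self, whole: S, part: A) -> S {
        (self.setter)(whole, part)
    }

    // Composition: focus on A inside S, then on B inside A.
    pub fn then<B: 'static>(self, inner: Lens<A, B>) -> Lens<S, B> {
        let (outer_get, inner_get) = (Rc::new(self), Rc::new(inner));
        let (outer_set, inner_set) = (outer_get.clone(), inner_get.clone());
        Lens::new(
            move |whole: &S| inner_get.view(&outer_get.view(whole)),
            move |whole: S, part: B| {
                let updated = inner_set.set(outer_set.view(&whole), part);
                outer_set.set(whole, updated)
            },
        )
    }
}

#[derive(Clone, Debug, PartialEq)]
pub struct Account {
    pub account_id: u32,
    pub account_type: String,
}

#[derive(Clone, Debug, PartialEq)]
pub struct AccountRecord {
    pub customer_id: u64,
    pub primary: Account, // hypothetical field, for illustration only
}

pub fn primary_account() -> Lens<AccountRecord, Account> {
    Lens::new(
        |record: &AccountRecord| record.primary.clone(),
        |mut record: AccountRecord, account: Account| {
            record.primary = account;
            record
        },
    )
}

pub fn account_type() -> Lens<Account, String> {
    Lens::new(
        |account: &Account| account.account_type.clone(),
        |mut account: Account, kind: String| {
            account.account_type = kind;
            account
        },
    )
}
```

With these definitions, primary_account().then(account_type()) is itself a lens from AccountRecord to String: viewing through it reads the nested account type, and setting through it rebuilds the record with only that field changed.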

Prisms: A Higher-Order form of "Optics"

A simple function such as unique_ids_lens above operates on a single lens. A prism is a function that operates on a family of lenses. It is one conceptual level higher, using lenses as a building block, and, continuing the metaphor, is part of a family of "optics". It is the main one that is useful in understanding Rust APIs, so it will be the focus here.

The same way that traits allow "lens-like" design with static polymorphism and dynamic dispatch, prism-like designs appear in Rust APIs which split problems into multiple associated types to be composed. A good example of this is the traits in the parsing crate Serde.

Trying to understand the way Serde works by only reading the API is a challenge, especially the first time. Consider the Deserializer trait, implemented by some type in any library which parses a new format:

pub trait Deserializer<'de>: Sized {
    type Error: Error;

    fn deserialize_any<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    fn deserialize_bool<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    // remainder omitted
}

For a trait that is just supposed to parse data from a format and return a value, this looks odd.

Why are all the return types type erased?

To understand that, we need to keep the lens concept in mind and look at the definition of the Visitor type that is passed in generically:

pub trait Visitor<'de>: Sized {
    type Value;

    fn visit_bool<E>(self, v: bool) -> Result<Self::Value, E>
    where
        E: Error;

    fn visit_u64<E>(self, v: u64) -> Result<Self::Value, E>
    where
        E: Error;

    fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
    where
        E: Error;

    // remainder omitted
}

The job of the Visitor type is to construct values in the Serde data model, which are represented by its associated Value type.

These values represent parts of the Rust value being deserialized. If construction fails, the Visitor returns an error whose type is chosen by the Deserializer whose method was called.

This highlights that Deserializer is similar to CustomerId from earlier, allowing any format parser which implements it to create Values based on what it parsed. The Visitor trait, through its associated Value type, is acting like a lens in functional programming languages.

But unlike the CustomerId trait, the return types of Visitor methods are generic, and the concrete Value type is determined by the Visitor itself.

Instead of acting as one lens, it effectively acts as a family of lenses, one for each concrete type of Visitor.

The Deserializer API is based on having a generic set of "lenses" work across a set of other generic types for "observation". It is a prism.
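
The relationship can be sketched without Serde itself. The traits below are deliberately simplified stand-ins (no lifetimes, a plain String error, only two visit methods), and every name in the block is hypothetical. The point to notice is that one format implementation returns different types depending on which visitor it is handed.

```rust
// Simplified stand-in for Serde's Visitor: each implementor picks its Value.
pub trait MiniVisitor: Sized {
    type Value;
    fn visit_bool(self, v: bool) -> Result<Self::Value, String>;
    fn visit_u64(self, v: u64) -> Result<Self::Value, String>;
}

// Simplified stand-in for Serde's Deserializer.
pub trait MiniDeserializer: Sized {
    // The return type comes from the visitor, not from the format.
    fn deserialize_any<V: MiniVisitor>(self, visitor: V) -> Result<V::Value, String>;
}

// A toy "format": the text "true" or "false", or a decimal number.
pub struct TextFormat<'a>(pub &'a str);

impl<'a> MiniDeserializer for TextFormat<'a> {
    fn deserialize_any<V: MiniVisitor>(self, visitor: V) -> Result<V::Value, String> {
        match self.0.trim() {
            "true" => visitor.visit_bool(true),
            "false" => visitor.visit_bool(false),
            other => match other.parse::<u64>() {
                Ok(n) => visitor.visit_u64(n),
                Err(_) => Err(format!("unsupported input: {}", other)),
            },
        }
    }
}

// One "lens": builds a u64 and rejects booleans.
pub struct IdVisitor;

impl MiniVisitor for IdVisitor {
    type Value = u64;
    fn visit_bool(self, _v: bool) -> Result<u64, String> {
        Err("expected an ID, found a bool".to_string())
    }
    fn visit_u64(self, v: u64) -> Result<u64, String> {
        Ok(v)
    }
}

// Another "lens": builds a description String from either kind of input.
pub struct DescribeVisitor;

impl MiniVisitor for DescribeVisitor {
    type Value = String;
    fn visit_bool(self, v: bool) -> Result<String, String> {
        Ok(format!("bool {}", v))
    }
    fn visit_u64(self, v: u64) -> Result<String, String> {
        Ok(format!("number {}", v))
    }
}
```

Here TextFormat("1048576332").deserialize_any(IdVisitor) produces a u64, while the same call with DescribeVisitor produces a String. The format code is written once and never names either result type.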

For example, consider the identity record from earlier but simplified:

{ "name": "Jane Doe", "customer_id": 1048576332 }

How would the Serde library deserialize this JSON into struct CreditRecord?

  1. The user would call a library function to deserialize the data. This would create a Deserializer based on the JSON format.
  2. Based on the fields in the struct, a Visitor would be created (more on that in a moment) which knows how to create each type in a generic data model that was needed to represent it: u64 and String.
  3. The deserializer would make calls to the Visitor as it parsed items.
  4. The Visitor would indicate if the items found were expected, and if not, raise an error to indicate deserialization has failed.

For our very simple structure above, the expected pattern would be:

  1. Visit a map (Serde's equivalent to HashMap or JSON's dictionary).
  2. Visit a string key called "name".
  3. Visit a string value, which will go into the name field.
  4. Visit a string key called "customer_id".
  5. Visit a u64 value, which will go into the customer_id field.
  6. Visit the end of the map.
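
The expected sequence above can be sketched as a stream of events checked by hand-written code. This is not how Serde represents visits internally; the Event enum and build_id_record function are hypothetical, and the sketch assumes the customer ID arrives as an unsigned integer, as it appears in the JSON sample.

```rust
// Hypothetical events a format parser might report while walking the input.
#[derive(Debug, PartialEq)]
pub enum Event<'a> {
    MapStart,
    Key(&'a str),
    Str(&'a str),
    U64(u64),
    MapEnd,
}

#[derive(Debug, PartialEq)]
pub struct IdRecord {
    pub name: String,
    pub customer_id: u64,
}

// Accepts the expected pattern and reports an error for anything else.
pub fn build_id_record(events: &[Event]) -> Result<IdRecord, String> {
    let mut iter = events.iter();
    if iter.next() != Some(&Event::MapStart) {
        return Err("expected the start of a map".to_string());
    }
    let mut name = None;
    let mut customer_id = None;
    loop {
        match iter.next() {
            Some(Event::Key(key)) => match *key {
                "name" => match iter.next() {
                    Some(Event::Str(s)) => name = Some(s.to_string()),
                    other => return Err(format!("name: expected a string, found {:?}", other)),
                },
                "customer_id" => match iter.next() {
                    Some(Event::U64(n)) => customer_id = Some(*n),
                    other => return Err(format!("customer_id: expected a u64, found {:?}", other)),
                },
                unknown => return Err(format!("unknown key: {}", unknown)),
            },
            Some(Event::MapEnd) => break,
            other => return Err(format!("unexpected event: {:?}", other)),
        }
    }
    Ok(IdRecord {
        name: name.ok_or("missing field: name")?,
        customer_id: customer_id.ok_or("missing field: customer_id")?,
    })
}
```

This sketch accepts the two fields in either order. Writing such code by hand for every struct is the tedious part that the rest of this section addresses.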

But what determines which "observation" pattern is expected?

A functional programming language would be able to use currying to create reflection of each type based on the type itself. Rust does not support that, so every single type would need to have its own code written based on its fields and their properties.

Serde solves this usability challenge with a derive macro:

use serde::Deserialize;

#[derive(Deserialize)]
struct IdRecord {
    name: String,
    customer_id: u64,
}

That macro simply generates an impl block causing the struct to implement a trait called Deserialize.

It is defined this way:

pub trait Deserialize<'de>: Sized {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>;
}

This is the function that determines how to create the struct itself. Code is generated based on the struct's fields. When the parsing library is called - in our example, a JSON parsing library - it creates a Deserializer and calls Type::deserialize with it as a parameter.

The deserialize code will then create a Visitor which will have its calls "refracted" by the Deserializer. If everything goes well, eventually that Visitor will construct a value corresponding to the type being parsed and return it.

For a complete example, see the Serde documentation.

To wrap up, this is the power of Serde:

  1. The structure being parsed is represented by an impl block for Deserialize
  2. The input data format (e.g. JSON) is represented by a Deserializer called by Deserialize
  3. The Deserializer acts like a prism which "refracts" lens-like Visitor calls which actually build the data value

The result is that types to be deserialized only implement the "top layer" of the API, and file formats only need to implement the "bottom layer". Each piece can then "just work" with the rest of the ecosystem, since generic types will bridge them.
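
The layering can be illustrated with a miniature, Serde-free model. All three traits below are simplified, hypothetical stand-ins for Deserialize, Deserializer, and Visitor (no lifetimes, String errors, two visit methods), and the two toy formats are invented for the example. Each type implements only the top layer, each format only the bottom layer, and any pairing works.

```rust
pub trait MiniVisitor: Sized {
    type Value;
    fn visit_u64(self, v: u64) -> Result<Self::Value, String>;
    fn visit_str(self, v: &str) -> Result<Self::Value, String>;
}

pub trait MiniDeserializer: Sized {
    fn deserialize_any<V: MiniVisitor>(self, visitor: V) -> Result<V::Value, String>;
}

pub trait MiniDeserialize: Sized {
    fn deserialize<D: MiniDeserializer>(deserializer: D) -> Result<Self, String>;
}

// Bottom layer: two toy formats that know nothing about the target types.
pub struct DecimalText<'a>(pub &'a str);

impl<'a> MiniDeserializer for DecimalText<'a> {
    fn deserialize_any<V: MiniVisitor>(self, visitor: V) -> Result<V::Value, String> {
        match self.0.parse::<u64>() {
            Ok(n) => visitor.visit_u64(n),
            Err(_) => visitor.visit_str(self.0),
        }
    }
}

pub struct HexText<'a>(pub &'a str);

impl<'a> MiniDeserializer for HexText<'a> {
    fn deserialize_any<V: MiniVisitor>(self, visitor: V) -> Result<V::Value, String> {
        let parsed = self.0.strip_prefix("0x").and_then(|h| u64::from_str_radix(h, 16).ok());
        match parsed {
            Some(n) => visitor.visit_u64(n),
            None => visitor.visit_str(self.0),
        }
    }
}

// Top layer: two types that know nothing about the formats.
#[derive(Debug, PartialEq)]
pub struct CustomerId(pub u64);

impl MiniDeserialize for CustomerId {
    fn deserialize<D: MiniDeserializer>(deserializer: D) -> Result<Self, String> {
        struct IdVisitor;
        impl MiniVisitor for IdVisitor {
            type Value = CustomerId;
            fn visit_u64(self, v: u64) -> Result<CustomerId, String> {
                Ok(CustomerId(v))
            }
            fn visit_str(self, v: &str) -> Result<CustomerId, String> {
                Err(format!("expected an ID, found text {:?}", v))
            }
        }
        deserializer.deserialize_any(IdVisitor)
    }
}

#[derive(Debug, PartialEq)]
pub struct Name(pub String);

impl MiniDeserialize for Name {
    fn deserialize<D: MiniDeserializer>(deserializer: D) -> Result<Self, String> {
        struct NameVisitor;
        impl MiniVisitor for NameVisitor {
            type Value = Name;
            fn visit_u64(self, v: u64) -> Result<Name, String> {
                Err(format!("expected a name, found number {}", v))
            }
            fn visit_str(self, v: &str) -> Result<Name, String> {
                Ok(Name(v.to_string()))
            }
        }
        deserializer.deserialize_any(NameVisitor)
    }
}

// Entry points a format library might offer: generic over every target type.
pub fn from_decimal<T: MiniDeserialize>(text: &str) -> Result<T, String> {
    T::deserialize(DecimalText(text))
}

pub fn from_hex<T: MiniDeserialize>(text: &str) -> Result<T, String> {
    T::deserialize(HexText(text))
}
```

Adding a third format requires one more MiniDeserializer impl and no changes to CustomerId or Name; adding a third type requires one more MiniDeserialize impl and no changes to the formats.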

To emphasize, the only reason this model works on any format and any type is that the Deserializer trait's output type is specified by the implementor of Visitor it is passed, rather than being tied to one specific type. This was not true in the account example earlier.

Rust's generic-inspired type system can bring the language close to these concepts and use their power, as shown in this API design. But it may also need procedural macros to create bridges for its generics.

See Also

  • lens-rs crate for a pre-built lenses implementation, with a cleaner interface than these examples
  • serde itself, which makes these concepts intuitive for end users (i.e. defining the structs) without needing to understand the details
  • luminance is a crate for drawing computer graphics that uses lens API design, including procedural macros to create full prisms for buffers of different pixel types that remain generic
  • An Article about Lenses in Scala that is very readable even without Scala expertise.
  • Paper: Profunctor Optics: Modular Data Accessors