# Lenses and Prisms
This is a pure functional concept that is not frequently used in Rust.
Nevertheless, exploring the concept may be helpful to understand other patterns
in Rust APIs, such as [visitors](../patterns/behavioural/visitor.md). They also
have niche use cases.
## Lenses: Uniform Access Across Types
A lens is a concept from functional programming languages that allows accessing
parts of a data type in an abstract, unified way.[^1] Conceptually, it is
similar to the way Rust traits work with type erasure, but it has a bit more
power and flexibility.
For example, suppose a bank stores customer data in several JSON formats
because the records come from different databases or legacy systems. One
database contains the data needed to perform credit checks:
```json
{ "name": "Jane Doe",
  "dob": "2002-02-24",
  [...]
  "customer_id": 1048576332,
}
```
Another one contains the account information:
```json
{ "customer_id": 1048576332,
  "accounts": [
      { "account_id": 2121,
        "account_type": "savings",
        "joint_customer_ids": [],
        [...]
      },
      { "account_id": 2122,
        "account_type": "checking",
        "joint_customer_ids": [1048576333],
        [...]
      },
  ]
}
```
Notice that both types have a customer ID number which corresponds to a person.
How would a single function handle both records of different types?
In Rust, a `struct` could represent each of these types, and a trait would have
a `get_customer_id` function they would implement:
```rust
use std::collections::HashSet;

pub struct Account {
    account_id: u32,
    account_type: String,
    // other fields omitted
}

pub trait CustomerId {
    fn get_customer_id(&self) -> u64;
}

pub struct CreditRecord {
    customer_id: u64,
    name: String,
    dob: String,
    // other fields omitted
}

impl CustomerId for CreditRecord {
    fn get_customer_id(&self) -> u64 {
        self.customer_id
    }
}

pub struct AccountRecord {
    customer_id: u64,
    accounts: Vec<Account>,
}

impl CustomerId for AccountRecord {
    fn get_customer_id(&self) -> u64 {
        self.customer_id
    }
}

// static polymorphism: only one type, but each function call can choose it
fn unique_ids_set<R: CustomerId>(records: &[R]) -> HashSet<u64> {
    records.iter().map(|r| r.get_customer_id()).collect()
}

// dynamic dispatch: iterates over any type with a customer ID, collecting all
// values together
fn unique_ids_iter<I>(iterator: I) -> HashSet<u64>
where
    I: Iterator<Item = Box<dyn CustomerId>>,
{
    iterator.map(|r| r.as_ref().get_customer_id()).collect()
}
```
Lenses, however, allow the code supporting customer ID to be moved from the
*type* to the *accessor function*. Rather than implementing a trait on each
type, all matching structures can simply be accessed the same way.
While the Rust language itself does not support this (type erasure is the
preferred solution to this problem), the
[lens-rs crate](https://github.com/TOETOE55/lens-rs/blob/master/guide.md) allows
code that feels like this to be written with macros:
```rust,ignore
use std::collections::HashSet;

use lens_rs::{optics, Lens, LensRef, Optics};

#[derive(Clone, Debug, Lens /* derive to allow lenses to work */)]
pub struct CreditRecord {
    #[optic(ref)] // macro attribute to allow viewing this field
    customer_id: u64,
    name: String,
    dob: String,
    // other fields omitted
}

#[derive(Clone, Debug)]
pub struct Account {
    account_id: u32,
    account_type: String,
    // other fields omitted
}

#[derive(Clone, Debug, Lens)]
pub struct AccountRecord {
    #[optic(ref)]
    customer_id: u64,
    accounts: Vec<Account>,
}

fn unique_ids_lens<T>(iter: impl Iterator<Item = T>) -> HashSet<u64>
where
    T: LensRef<Optics![customer_id], u64>, // any type with this field
{
    iter.map(|r| *r.view_ref(optics!(customer_id))).collect()
}
```
The version of `unique_ids_lens` shown here allows any type to be in the
iterator, so long as it has an attribute called `customer_id` which can be
accessed by the function. This is how most functional programming languages
operate on lenses.
Rather than macros, they achieve this with a technique known as "currying". That
is, they "partially construct" the function, leaving the type of the final
parameter (the value being operated on) unfilled until the function is called.
Thus it can be called with different types dynamically even from one place in
the code. That is what the `optics!` and `view_ref` in the example above
simulate.
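As a rough illustration of partial application in plain Rust (standard library only, not the lens-rs API), the sketch below fixes the accessor first and leaves the records unfilled until the returned function is called. The name `unique_ids_with` and the reduced record types are hypothetical. Unlike a true lens, each closure here is tied to one concrete record type.

```rust
use std::collections::HashSet;

// Reduced, hypothetical versions of the record types used above.
#[allow(dead_code)] // `name` only exists to make the two types differ
struct CreditRecord {
    customer_id: u64,
    name: String,
}

#[allow(dead_code)] // `account_ids` only exists to make the two types differ
struct AccountRecord {
    customer_id: u64,
    account_ids: Vec<u32>,
}

/// Fixes the accessor now and returns a function that still awaits its final
/// argument: the records to operate on.
fn unique_ids_with<T>(getter: impl Fn(&T) -> u64) -> impl Fn(&[T]) -> HashSet<u64> {
    move |records: &[T]| -> HashSet<u64> { records.iter().map(|r| getter(r)).collect() }
}

fn main() {
    // The same building block, partially applied for two unrelated types.
    let credit_ids = unique_ids_with(|r: &CreditRecord| r.customer_id);
    let account_ids = unique_ids_with(|r: &AccountRecord| r.customer_id);

    let credit = [
        CreditRecord { customer_id: 1048576332, name: "Jane Doe".to_string() },
        CreditRecord { customer_id: 1048576332, name: "Jane Doe".to_string() },
    ];
    let accounts = [AccountRecord { customer_id: 1048576333, account_ids: vec![2122] }];

    println!("{:?}", credit_ids(&credit[..]));
    println!("{:?}", account_ids(&accounts[..]));
}
```

The accessor closure plays the role of the lens, and `unique_ids_with` never needs a trait implemented on the record types.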
The functional approach need not be restricted to accessing members. More
powerful lenses can be created which both *set* and *get* data in a structure.
But the concept really becomes interesting when used as a building block for
composition. That is where the concept appears more clearly in Rust.
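A minimal sketch of a lens that both gets and sets, and that composes with another lens, might look like the following. It uses only the standard library; the `Lens` struct, its `then` method, the `primary` field, and the two helper functions are hypothetical names chosen for illustration, not part of lens-rs.

```rust
use std::rc::Rc;

/// A minimal lens: a getter/setter pair focused on a part `A` of a whole `S`.
/// Hypothetical type for illustration; this is not the lens-rs API.
struct Lens<S, A> {
    get: Rc<dyn Fn(&S) -> A>,
    set: Rc<dyn Fn(S, A) -> S>,
}

impl<S: 'static, A: 'static> Lens<S, A> {
    fn new(get: impl Fn(&S) -> A + 'static, set: impl Fn(S, A) -> S + 'static) -> Self {
        Lens { get: Rc::new(get), set: Rc::new(set) }
    }

    /// Composes two lenses: `S -> A` followed by `A -> B` gives `S -> B`.
    fn then<B: 'static>(self, inner: Lens<A, B>) -> Lens<S, B> {
        let Lens { get: outer_get, set: outer_set } = self;
        let Lens { get: inner_get, set: inner_set } = inner;
        let outer_get_for_set = Rc::clone(&outer_get);
        Lens {
            get: Rc::new(move |whole: &S| inner_get(&outer_get(whole))),
            set: Rc::new(move |whole: S, part: B| {
                // Read the middle value, update it, then write it back.
                let middle = outer_get_for_set(&whole);
                outer_set(whole, inner_set(middle, part))
            }),
        }
    }
}

#[derive(Clone, Debug, PartialEq)]
struct Account {
    account_id: u32,
    account_type: String,
}

#[derive(Clone, Debug, PartialEq)]
struct AccountRecord {
    customer_id: u64,
    primary: Account, // hypothetical field, added for this sketch
}

fn primary_lens() -> Lens<AccountRecord, Account> {
    Lens::new(
        |record: &AccountRecord| record.primary.clone(),
        |mut record: AccountRecord, account: Account| {
            record.primary = account;
            record
        },
    )
}

fn account_type_lens() -> Lens<Account, String> {
    Lens::new(
        |account: &Account| account.account_type.clone(),
        |mut account: Account, kind: String| {
            account.account_type = kind;
            account
        },
    )
}

fn main() {
    let record = AccountRecord {
        customer_id: 1048576332,
        primary: Account { account_id: 2121, account_type: "savings".to_string() },
    };
    // One composed lens reaches through the record into the nested account.
    let primary_type = primary_lens().then(account_type_lens());
    println!("{}", (primary_type.get)(&record));
    let updated = (primary_type.set)(record, "checking".to_string());
    println!("{:?}", updated);
}
```

Cloning in the getters keeps the sketch short; a production design would more likely hand out references.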
## Prisms: A Higher-Order Form of "Optics"
A simple function such as `unique_ids_lens` above operates on a single lens. A
*prism* is a function that operates on a *family* of lenses. It is one
conceptual level higher, using lenses as a building block, and continuing the
metaphor, is part of a family of "optics". It is the main one that is useful in
understanding Rust APIs, so will be the focus here.
The same way that traits allow "lens-like" design with static polymorphism and
dynamic dispatch, prism-like designs appear in Rust APIs which split problems
into multiple associated types to be composed. A good example of this is the
traits in the parsing crate *Serde*.
Trying to understand the way *Serde* works by only reading the API is a
challenge, especially the first time. Consider the `Deserializer` trait,
implemented by some type in any library which parses a new format:
```rust,ignore
pub trait Deserializer<'de>: Sized {
    type Error: Error;

    fn deserialize_any<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    fn deserialize_bool<V>(self, visitor: V) -> Result<V::Value, Self::Error>
    where
        V: Visitor<'de>;

    // remainder omitted
}
```
For a trait that is just supposed to parse data from a format and return a
value, this looks odd.
Why are all the return types type erased?
To understand that, we need to keep the lens concept in mind and look at the
definition of the `Visitor` type that is passed in generically:
```rust,ignore
pub trait Visitor<'de>: Sized {
    type Value;

    fn visit_bool<E>(self, v: bool) -> Result<Self::Value, E>
    where
        E: Error;

    fn visit_u64<E>(self, v: u64) -> Result<Self::Value, E>
    where
        E: Error;

    fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
    where
        E: Error;

    // remainder omitted
}
```
The job of the `Visitor` type is to construct values in the *Serde* data model,
which are represented by its associated `Value` type.
These values represent parts of the Rust value being deserialized. If this
fails, it returns an `Error` type - an error type determined by the
`Deserializer` when its methods were called.
This highlights that `Deserializer` is similar to `CustomerId` from earlier,
allowing any format parser which implements it to create `Value`s based on what
it parsed. The associated `Value` type acts like a lens in functional
programming languages.
But unlike the `CustomerId` trait, the return types of `Visitor` methods are
*generic*, and the concrete `Value` type is *determined by the Visitor itself*.
Instead of acting as one lens, it effectively acts as a family of lenses, one
for each concrete type of `Visitor`.
The `Deserializer` API is based on having a generic set of "lenses" work across
a set of other generic types for "observation". It is a *prism*.
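To make the "family of lenses" idea concrete, here is a minimal std-only sketch. It is not Serde's real API: `MiniVisitor`, `MiniDeserializer`, `Token`, `IdVisitor`, and `TextVisitor` are hypothetical stand-ins, and errors are plain `String`s. One format implementation yields a `u64` or a `String` depending solely on the visitor it is handed.

```rust
/// Simplified stand-in for Serde's `Visitor`: each implementor picks its own
/// output type through the associated `Value`.
trait MiniVisitor {
    type Value;
    fn visit_u64(self, v: u64) -> Result<Self::Value, String>;
    fn visit_str(self, v: &str) -> Result<Self::Value, String>;
}

/// Simplified stand-in for Serde's `Deserializer`: the return type is chosen
/// by the visitor, not by the format.
trait MiniDeserializer {
    fn deserialize_any<V: MiniVisitor>(self, visitor: V) -> Result<V::Value, String>;
}

/// A toy "format": one token, treated as a number if it parses as one.
struct Token<'a>(&'a str);

impl<'a> MiniDeserializer for Token<'a> {
    fn deserialize_any<V: MiniVisitor>(self, visitor: V) -> Result<V::Value, String> {
        match self.0.parse::<u64>() {
            Ok(n) => visitor.visit_u64(n),
            Err(_) => visitor.visit_str(self.0),
        }
    }
}

/// Accepts only numbers and produces a `u64`.
struct IdVisitor;

impl MiniVisitor for IdVisitor {
    type Value = u64;
    fn visit_u64(self, v: u64) -> Result<u64, String> {
        Ok(v)
    }
    fn visit_str(self, v: &str) -> Result<u64, String> {
        Err(format!("expected a number, found {v:?}"))
    }
}

/// Accepts anything and produces a `String`.
struct TextVisitor;

impl MiniVisitor for TextVisitor {
    type Value = String;
    fn visit_u64(self, v: u64) -> Result<String, String> {
        Ok(v.to_string())
    }
    fn visit_str(self, v: &str) -> Result<String, String> {
        Ok(v.to_owned())
    }
}

fn main() {
    // Same format type, different output types, selected by the visitor.
    let id: Result<u64, String> = Token("1048576332").deserialize_any(IdVisitor);
    let name: Result<String, String> = Token("Jane Doe").deserialize_any(TextVisitor);
    println!("{:?} {:?}", id, name);
}
```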
For example, consider the identity record from earlier but simplified:
```json
{ "name": "Jane Doe", "customer_id": 1048576332 }
```
How would the *Serde* library deserialize this JSON into `struct CreditRecord`?
1. The user would call a library function to deserialize the data. This would
create a `Deserializer` based on the JSON format.
1. Based on the fields in the struct, a `Visitor` would be created (more on that
in a moment) which knows how to create each type in a generic data model that
was needed to represent it: `u64` and `String`.
1. The deserializer would make calls to the `Visitor` as it parsed items.
1. The `Visitor` would indicate if the items found were expected, and if not,
raise an error to indicate deserialization has failed.
For our very simple structure above, the expected pattern would be:
1. Visit a map (*Serde*'s equivalent to `HashMap` or JSON's dictionary).
1. Visit a string key called "name".
1. Visit a string value, which will go into the `name` field.
1. Visit a string key called "customer_id".
1. Visit an unsigned integer value, which will go into the `customer_id`
field.
1. Visit the end of the map.
But what determines which "observation" pattern is expected?
A functional programming language would be able to use currying to create
reflection of each type based on the type itself. Rust does not support that, so
every single type would need to have its own code written based on its fields
and their properties.
*Serde* solves this usability challenge with a derive macro:
```rust,ignore
use serde::Deserialize;

#[derive(Deserialize)]
struct IdRecord {
    name: String,
    customer_id: u64,
}
```
That macro simply generates an impl block causing the struct to implement a
trait called `Deserialize`.
It is defined this way:
```rust,ignore
pub trait Deserialize<'de>: Sized {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>;
}
```
This is the function that determines how to create the struct itself. Code is
generated based on the struct's fields. When the parsing library is called - in
our example, a JSON parsing library - it creates a `Deserializer` and calls
`Type::deserialize` with it as a parameter.
The `deserialize` code will then create a `Visitor` which will have its calls
"refracted" by the `Deserializer`. If everything goes well, eventually that
`Visitor` will construct a value corresponding to the type being parsed and
return it.
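The flow described above can be approximated with a std-only toy. This is a sketch under loud assumptions, not what Serde's derive macro actually generates: `FromFormat`, `MapFormat`, `MapVisitor`, `Scalar`, `KeyValueText`, and `IdRecordVisitor` are hypothetical stand-ins for `Deserialize`, `Deserializer`, `Visitor`, and a data format, and the "format" is simple `key=value` lines rather than JSON.

```rust
#[derive(Debug, PartialEq)]
struct IdRecord {
    name: String,
    customer_id: u64,
}

/// Toy scalar values a format can produce (Serde's data model is richer).
#[derive(Debug, PartialEq)]
enum Scalar {
    Str(String),
    U64(u64),
}

/// Stand-in for `Visitor`: observes map entries and builds a `Value`.
trait MapVisitor {
    type Value;
    fn visit_entry(&mut self, key: &str, value: Scalar) -> Result<(), String>;
    fn end(self) -> Result<Self::Value, String>;
}

/// Stand-in for `Deserializer`: the "bottom layer" implemented per format.
trait MapFormat {
    fn deserialize_map<V: MapVisitor>(self, visitor: V) -> Result<V::Value, String>;
}

/// Stand-in for `Deserialize`: the "top layer" implemented per type.
trait FromFormat: Sized {
    fn from_format<F: MapFormat>(format: F) -> Result<Self, String>;
}

/// A toy format made of `key=value` lines.
struct KeyValueText<'a>(&'a str);

impl<'a> MapFormat for KeyValueText<'a> {
    fn deserialize_map<V: MapVisitor>(self, mut visitor: V) -> Result<V::Value, String> {
        for line in self.0.lines() {
            let (key, raw) = line
                .split_once('=')
                .ok_or_else(|| format!("malformed line: {line:?}"))?;
            let value = match raw.parse::<u64>() {
                Ok(n) => Scalar::U64(n),
                Err(_) => Scalar::Str(raw.to_owned()),
            };
            visitor.visit_entry(key, value)?;
        }
        visitor.end()
    }
}

/// The per-type code a derive macro would otherwise write for us.
#[derive(Default)]
struct IdRecordVisitor {
    name: Option<String>,
    customer_id: Option<u64>,
}

impl MapVisitor for IdRecordVisitor {
    type Value = IdRecord;

    fn visit_entry(&mut self, key: &str, value: Scalar) -> Result<(), String> {
        match (key, value) {
            ("name", Scalar::Str(s)) => self.name = Some(s),
            ("customer_id", Scalar::U64(n)) => self.customer_id = Some(n),
            (key, value) => return Err(format!("unexpected entry {key}: {value:?}")),
        }
        Ok(())
    }

    fn end(self) -> Result<IdRecord, String> {
        Ok(IdRecord {
            name: self.name.ok_or("missing field `name`")?,
            customer_id: self.customer_id.ok_or("missing field `customer_id`")?,
        })
    }
}

impl FromFormat for IdRecord {
    fn from_format<F: MapFormat>(format: F) -> Result<Self, String> {
        // The type creates its visitor; the format drives the calls.
        format.deserialize_map(IdRecordVisitor::default())
    }
}

fn main() {
    let parsed = IdRecord::from_format(KeyValueText("name=Jane Doe\ncustomer_id=1048576332"));
    println!("{:?}", parsed);
}
```

Adding a second format would only require another `MapFormat` implementation, and adding a second record type would only require another visitor and `FromFormat` implementation.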
For a complete example, see the
[*Serde* documentation](https://serde.rs/deserialize-struct.html).
To wrap up, this is the power of *Serde*:
1. The structure being parsed is represented by an `impl` block for
`Deserialize`
1. The input data format (e.g. JSON) is represented by a `Deserializer` called
by `Deserialize`
1. The `Deserializer` acts like a prism which "refracts" lens-like `Visitor`
calls which actually build the data value
The result is that types to be deserialized only implement the "top layer" of
the API, and file formats only need to implement the "bottom layer". Each piece
can then "just work" with the rest of the ecosystem, since generic types will
bridge them.
To emphasize, the only reason this model works on any format and any type is
because the `Deserializer` trait's output type **is specified by the implementor
of `Visitor` it is passed**, rather than being tied to one specific type. This
was not true in the account example earlier.
Rust's generic-inspired type system can bring it close to these concepts and use
their power, as shown in this API design. But it may also need procedural macros
to create bridges for its generics.
## See Also
- [lens-rs crate](https://crates.io/crates/lens-rs) for a pre-built lens
implementation, with a cleaner interface than these examples
- [serde](https://serde.rs) itself, which makes these concepts intuitive for end
users (i.e. defining the structs) without needing to understand the details
- [luminance](https://github.com/phaazon/luminance-rs) is a crate for drawing
computer graphics that uses lens API design, including procedural macros to
create full prisms for buffers of different pixel types that remain generic
- [An Article about Lenses in Scala](https://web.archive.org/web/20221128185849/https://medium.com/zyseme-technology/functional-references-lens-and-other-optics-in-scala-e5f7e2fdafe)
that is very readable even without Scala expertise.
- [Paper: Profunctor Optics: Modular Data
Accessors](https://web.archive.org/web/20220701102832/https://arxiv.org/ftp/arxiv/papers/1703/1703.10857.pdf)
[^1]: [School of Haskell: A Little Lens Starter Tutorial](https://web.archive.org/web/20221128190041/https://www.schoolofhaskell.com/school/to-infinity-and-beyond/pick-of-the-week/a-little-lens-starter-tutorial)