Iteration on Functional optics (#364)
Co-authored-by: simonsan <14062932+simonsan@users.noreply.github.com>
Co-authored-by: Julian Ganz <neither@nut.email>
pull/373/head
parent
0a12425a64
commit
bbcf07985d
@ -1,355 +0,0 @@
|
|||||||
# Lenses and Prisms
|
|
||||||
|
|
||||||
This is a pure functional concept that is not frequently used in Rust.
|
|
||||||
Nevertheless, exploring the concept may be helpful to understand other patterns
|
|
||||||
in Rust APIs, such as [visitors](../patterns/behavioural/visitor.md). They also
|
|
||||||
have niche use cases.
|
|
||||||
|
|
||||||
## Lenses: Uniform Access Across Types
|
|
||||||
|
|
||||||
A lens is a concept from functional programming languages that allows accessing
|
|
||||||
parts of a data type in an abstract, unified way.[^1] Conceptually, it is
|
|
||||||
similar to the way Rust traits work with type erasure, but it has a bit more
|
|
||||||
power and flexibility.
|
|
||||||
|
|
||||||
For example, suppose a bank contains several JSON formats for customer data.
|
|
||||||
This is because they come from different databases or legacy systems. One
|
|
||||||
database contains the data needed to perform credit checks:
|
|
||||||
|
|
||||||
```json
|
|
||||||
{ "name": "Jane Doe",
|
|
||||||
"dob": "2002-02-24",
|
|
||||||
[...]
|
|
||||||
"customer_id": 1048576332,
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
Another one contains the account information:
|
|
||||||
|
|
||||||
```json
|
|
||||||
{ "customer_id": 1048576332,
|
|
||||||
"accounts": [
|
|
||||||
{ "account_id": 2121,
"account_type": "savings",
|
|
||||||
"joint_customer_ids": [],
|
|
||||||
[...]
|
|
||||||
},
|
|
||||||
{ "account_id": 2122,
"account_type": "checking",
|
|
||||||
"joint_customer_ids": [1048576333],
|
|
||||||
[...]
|
|
||||||
},
|
|
||||||
]
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
Notice that both types have a customer ID number which corresponds to a person.
|
|
||||||
How would a single function handle both records of different types?
|
|
||||||
|
|
||||||
In Rust, a `struct` could represent each of these types, and a trait would have
|
|
||||||
a `get_customer_id` function they would implement:
|
|
||||||
|
|
||||||
```rust
|
|
||||||
use std::collections::HashSet;
|
|
||||||
|
|
||||||
pub struct Account {
|
|
||||||
account_id: u32,
|
|
||||||
account_type: String,
|
|
||||||
// other fields omitted
|
|
||||||
}
|
|
||||||
|
|
||||||
pub trait CustomerId {
|
|
||||||
fn get_customer_id(&self) -> u64;
|
|
||||||
}
|
|
||||||
|
|
||||||
pub struct CreditRecord {
|
|
||||||
customer_id: u64,
|
|
||||||
name: String,
|
|
||||||
dob: String,
|
|
||||||
// other fields omitted
|
|
||||||
}
|
|
||||||
|
|
||||||
impl CustomerId for CreditRecord {
|
|
||||||
fn get_customer_id(&self) -> u64 {
|
|
||||||
self.customer_id
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
pub struct AccountRecord {
|
|
||||||
customer_id: u64,
|
|
||||||
accounts: Vec<Account>,
|
|
||||||
}
|
|
||||||
|
|
||||||
impl CustomerId for AccountRecord {
|
|
||||||
fn get_customer_id(&self) -> u64 {
|
|
||||||
self.customer_id
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
// static polymorphism: only one type, but each function call can choose it
|
|
||||||
fn unique_ids_set<R: CustomerId>(records: &[R]) -> HashSet<u64> {
|
|
||||||
records.iter().map(|r| r.get_customer_id()).collect()
|
|
||||||
}
|
|
||||||
|
|
||||||
// dynamic dispatch: iterates over any type with a customer ID, collecting all
|
|
||||||
// values together
|
|
||||||
fn unique_ids_iter<I>(iterator: I) -> HashSet<u64>
|
|
||||||
where I: Iterator<Item=Box<dyn CustomerId>>
|
|
||||||
{
|
|
||||||
iterator.map(|r| r.as_ref().get_customer_id()).collect()
|
|
||||||
}
|
|
||||||
```
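
As a rough usage sketch of the two functions above: the record types are trimmed to just `customer_id` so the example stays short, and the sample ID values are invented for illustration.

```rust
use std::collections::HashSet;

pub trait CustomerId {
    fn get_customer_id(&self) -> u64;
}

// Trimmed-down records: every field except `customer_id` is omitted here.
pub struct CreditRecord {
    customer_id: u64,
}

pub struct AccountRecord {
    customer_id: u64,
}

impl CustomerId for CreditRecord {
    fn get_customer_id(&self) -> u64 {
        self.customer_id
    }
}

impl CustomerId for AccountRecord {
    fn get_customer_id(&self) -> u64 {
        self.customer_id
    }
}

// Static polymorphism: one concrete record type per call.
fn unique_ids_set<R: CustomerId>(records: &[R]) -> HashSet<u64> {
    records.iter().map(|r| r.get_customer_id()).collect()
}

// Dynamic dispatch: any mix of record types behind trait objects.
fn unique_ids_iter<I>(iterator: I) -> HashSet<u64>
where
    I: Iterator<Item = Box<dyn CustomerId>>,
{
    iterator.map(|r| r.get_customer_id()).collect()
}

fn main() {
    // The slice holds a single type, so duplicates collapse into one entry.
    let credit = [CreditRecord { customer_id: 7 }, CreditRecord { customer_id: 7 }];
    assert_eq!(unique_ids_set(&credit), HashSet::from([7]));

    // Boxed trait objects let both record types share one iterator.
    let mixed: Vec<Box<dyn CustomerId>> = vec![
        Box::new(CreditRecord { customer_id: 1 }),
        Box::new(AccountRecord { customer_id: 2 }),
    ];
    assert_eq!(unique_ids_iter(mixed.into_iter()), HashSet::from([1, 2]));
}
```

Either way, each record type still has to opt in by implementing the trait itself.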
|
|
||||||
|
|
||||||
Lenses, however, allow the code supporting customer ID to be moved from the
|
|
||||||
*type* to the *accessor function*. Rather than implementing a trait on each
|
|
||||||
type, all matching structures can simply be accessed the same way.
|
|
||||||
|
|
||||||
While the Rust language itself does not support this (type erasure is the
|
|
||||||
preferred solution to this problem), the
|
|
||||||
[lens-rs crate](https://github.com/TOETOE55/lens-rs/blob/master/guide.md) allows
|
|
||||||
code that feels like this to be written with macros:
|
|
||||||
|
|
||||||
```rust,ignore
|
|
||||||
use std::collections::HashSet;
|
|
||||||
|
|
||||||
use lens_rs::{optics, Lens, LensRef, Optics};
|
|
||||||
|
|
||||||
#[derive(Clone, Debug, Lens /* derive to allow lenses to work */)]
|
|
||||||
pub struct CreditRecord {
|
|
||||||
#[optic(ref)] // macro attribute to allow viewing this field
|
|
||||||
customer_id: u64,
|
|
||||||
name: String,
|
|
||||||
dob: String,
|
|
||||||
// other fields omitted
|
|
||||||
}
|
|
||||||
|
|
||||||
#[derive(Clone, Debug)]
|
|
||||||
pub struct Account {
|
|
||||||
account_id: u32,
|
|
||||||
account_type: String,
|
|
||||||
// other fields omitted
|
|
||||||
}
|
|
||||||
|
|
||||||
#[derive(Clone, Debug, Lens)]
|
|
||||||
pub struct AccountRecord {
|
|
||||||
#[optic(ref)]
|
|
||||||
customer_id: u64,
|
|
||||||
accounts: Vec<Account>,
|
|
||||||
}
|
|
||||||
|
|
||||||
fn unique_ids_lens<T>(iter: impl Iterator<Item = T>) -> HashSet<u64>
|
|
||||||
where
|
|
||||||
T: LensRef<Optics![customer_id], u64>, // any type with this field
|
|
||||||
{
|
|
||||||
iter.map(|r| *r.view_ref(optics!(customer_id))).collect()
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
The version of `unique_ids_lens` shown here allows any type to be in the
|
|
||||||
iterator, so long as it has an attribute called `customer_id` which can be
|
|
||||||
accessed by the function. This is how most functional programming languages
|
|
||||||
operate on lenses.
|
|
||||||
|
|
||||||
Rather than macros, they achieve this with a technique known as "currying". That
|
|
||||||
is, they "partially construct" the function, leaving the type of the final
|
|
||||||
parameter (the value being operated on) unfilled until the function is called.
|
|
||||||
Thus it can be called with different types dynamically even from one place in
|
|
||||||
the code. That is what the `optics!` and `view_ref` in the example above
simulate.
|
|
||||||
|
|
||||||
The functional approach need not be restricted to accessing members. More
|
|
||||||
powerful lenses can be created which both *set* and *get* data in a structure.
|
|
||||||
But the concept really becomes interesting when used as a building block for
|
|
||||||
composition. That is where the concept appears more clearly in Rust.
|
|
||||||
|
|
||||||
## Prisms: A Higher-Order form of "Optics"
|
|
||||||
|
|
||||||
A simple function such as `unique_ids_lens` above operates on a single lens. A
|
|
||||||
*prism* is a function that operates on a *family* of lenses. It is one
|
|
||||||
conceptual level higher, using lenses as a building block, and continuing the
|
|
||||||
metaphor, is part of a family of "optics". It is the main one that is useful in
|
|
||||||
understanding Rust APIs, so will be the focus here.
|
|
||||||
|
|
||||||
The same way that traits allow "lens-like" design with static polymorphism and
|
|
||||||
dynamic dispatch, prism-like designs appear in Rust APIs which split problems
|
|
||||||
into multiple associated types to be composed. A good example of this is the
|
|
||||||
traits in the parsing crate *Serde*.
|
|
||||||
|
|
||||||
Trying to understand the way *Serde* works by only reading the API is a
|
|
||||||
challenge, especially the first time. Consider the `Deserializer` trait,
|
|
||||||
implemented by some type in any library which parses a new format:
|
|
||||||
|
|
||||||
```rust,ignore
|
|
||||||
pub trait Deserializer<'de>: Sized {
|
|
||||||
type Error: Error;
|
|
||||||
|
|
||||||
fn deserialize_any<V>(self, visitor: V) -> Result<V::Value, Self::Error>
|
|
||||||
where
|
|
||||||
V: Visitor<'de>;
|
|
||||||
|
|
||||||
fn deserialize_bool<V>(self, visitor: V) -> Result<V::Value, Self::Error>
|
|
||||||
where
|
|
||||||
V: Visitor<'de>;
|
|
||||||
|
|
||||||
// remainder omitted
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
For a trait that is just supposed to parse data from a format and return a
|
|
||||||
value, this looks odd.
|
|
||||||
|
|
||||||
Why are all the return types type erased?
|
|
||||||
|
|
||||||
To understand that, we need to keep the lens concept in mind and look at the
|
|
||||||
definition of the `Visitor` type that is passed in generically:
|
|
||||||
|
|
||||||
```rust,ignore
|
|
||||||
pub trait Visitor<'de>: Sized {
|
|
||||||
type Value;
|
|
||||||
|
|
||||||
fn visit_bool<E>(self, v: bool) -> Result<Self::Value, E>
|
|
||||||
where
|
|
||||||
E: Error;
|
|
||||||
|
|
||||||
fn visit_u64<E>(self, v: u64) -> Result<Self::Value, E>
|
|
||||||
where
|
|
||||||
E: Error;
|
|
||||||
|
|
||||||
fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
|
|
||||||
where
|
|
||||||
E: Error;
|
|
||||||
|
|
||||||
// remainder omitted
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
The job of the `Visitor` type is to construct values in the *Serde* data model,
|
|
||||||
which are represented by its associated `Value` type.
|
|
||||||
|
|
||||||
These values represent parts of the Rust value being deserialized. If this
|
|
||||||
fails, it returns an `Error` type - an error type determined by the
|
|
||||||
`Deserializer` when its methods were called.
|
|
||||||
|
|
||||||
This highlights that `Deserializer` is similar to `CustomerId` from earlier,
|
|
||||||
allowing any format parser which implements it to create `Value`s based on what
|
|
||||||
it parsed. The `Visitor` trait is acting like a lens in functional programming
|
|
||||||
languages.
|
|
||||||
|
|
||||||
But unlike the `CustomerId` trait, the return types of `Visitor` methods are
|
|
||||||
*generic*, and the concrete `Value` type is *determined by the Visitor itself*.
|
|
||||||
|
|
||||||
Instead of acting as one lens, it effectively acts as a family of lenses, one
|
|
||||||
for each concrete type of `Visitor`.
|
|
||||||
|
|
||||||
The `Deserializer` API is based on having a generic set of "lenses" work across
|
|
||||||
a set of other generic types for "observation". It is a *prism*.
|
|
||||||
|
|
||||||
For example, consider the identity record from earlier but simplified:
|
|
||||||
|
|
||||||
```json
|
|
||||||
{ "name": "Jane Doe", "customer_id": 1048576332 }
|
|
||||||
```
|
|
||||||
|
|
||||||
How would the *Serde* library deserialize this JSON into `struct CreditRecord`?
|
|
||||||
|
|
||||||
1. The user would call a library function to deserialize the data. This would
|
|
||||||
create a `Deserializer` based on the JSON format.
|
|
||||||
1. Based on the fields in the struct, a `Visitor` would be created (more on that
|
|
||||||
in a moment) which knows how to create each type in a generic data model that
|
|
||||||
was needed to represent it: `u64` and `String`.
|
|
||||||
1. The deserializer would make calls to the `Visitor` as it parsed items.
|
|
||||||
1. The `Visitor` would indicate if the items found were expected, and if not,
|
|
||||||
raise an error to indicate deserialization has failed.
|
|
||||||
|
|
||||||
For our very simple structure above, the expected pattern would be:
|
|
||||||
|
|
||||||
1. Visit a map (*Serde*'s equivalent to `HashMap` or JSON's dictionary).
|
|
||||||
1. Visit a string key called "name".
|
|
||||||
1. Visit a string value, which will go into the `name` field.
|
|
||||||
1. Visit a string key called "customer_id".
|
|
||||||
1. Visit an integer value, which will go into the `customer_id` field.
|
|
||||||
1. Visit the end of the map.
|
|
||||||
|
|
||||||
But what determines which "observation" pattern is expected?
|
|
||||||
|
|
||||||
A functional programming language would be able to use currying to create
|
|
||||||
reflection of each type based on the type itself. Rust does not support that, so
|
|
||||||
every single type would need to have its own code written based on its fields
|
|
||||||
and their properties.
|
|
||||||
|
|
||||||
*Serde* solves this usability challenge with a derive macro:
|
|
||||||
|
|
||||||
```rust,ignore
|
|
||||||
use serde::Deserialize;
|
|
||||||
|
|
||||||
#[derive(Deserialize)]
|
|
||||||
struct IdRecord {
|
|
||||||
name: String,
|
|
||||||
customer_id: u64,
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
That macro simply generates an impl block causing the struct to implement a
|
|
||||||
trait called `Deserialize`.
|
|
||||||
|
|
||||||
It is defined this way:
|
|
||||||
|
|
||||||
```rust,ignore
|
|
||||||
pub trait Deserialize<'de>: Sized {
|
|
||||||
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
|
|
||||||
where
|
|
||||||
D: Deserializer<'de>;
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
This is the function that determines how to create the struct itself. Code is
|
|
||||||
generated based on the struct's fields. When the parsing library is called - in
|
|
||||||
our example, a JSON parsing library - it creates a `Deserializer` and calls
|
|
||||||
`Type::deserialize` with it as a parameter.
|
|
||||||
|
|
||||||
The `deserialize` code will then create a `Visitor` which will have its calls
|
|
||||||
"refracted" by the `Deserializer`. If everything goes well, eventually that
|
|
||||||
`Visitor` will construct a value corresponding to the type being parsed and
|
|
||||||
return it.
|
|
||||||
|
|
||||||
For a complete example, see the
|
|
||||||
[*Serde* documentation](https://serde.rs/deserialize-struct.html).
|
|
||||||
|
|
||||||
To wrap up, this is the power of *Serde*:
|
|
||||||
|
|
||||||
1. The structure being parsed is represented by an `impl` block for
|
|
||||||
`Deserialize`
|
|
||||||
1. The input data format (e.g. JSON) is represented by a `Deserializer` called
|
|
||||||
by `Deserialize`
|
|
||||||
1. The `Deserializer` acts like a prism which "refracts" lens-like `Visitor`
|
|
||||||
calls which actually build the data value
|
|
||||||
|
|
||||||
The result is that types to be deserialized only implement the "top layer" of
|
|
||||||
the API, and file formats only need to implement the "bottom layer". Each piece
|
|
||||||
can then "just work" with the rest of the ecosystem, since generic types will
|
|
||||||
bridge them.
|
|
||||||
|
|
||||||
To emphasize, the only reason this model works on any format and any type is
|
|
||||||
because the `Deserializer` trait's output type **is specified by the implementor
|
|
||||||
of `Visitor` it is passed**, rather than being tied to one specific type. This
|
|
||||||
was not true in the account example earlier.
|
|
||||||
|
|
||||||
Rust's generic-inspired type system can bring it close to these concepts and use
|
|
||||||
their power, as shown in this API design. But it may also need procedural macros
|
|
||||||
to create bridges for its generics.
|
|
||||||
|
|
||||||
## See Also
|
|
||||||
|
|
||||||
- [lens-rs crate](https://crates.io/crates/lens-rs) for a pre-built lenses
|
|
||||||
implementation, with a cleaner interface than these examples
|
|
||||||
- [serde](https://serde.rs) itself, which makes these concepts intuitive for end
|
|
||||||
users (i.e. defining the structs) without needing to understand the details
|
|
||||||
- [luminance](https://github.com/phaazon/luminance-rs) is a crate for drawing
|
|
||||||
computer graphics that uses lens API design, including procedural macros to
|
|
||||||
create full prisms for buffers of different pixel types that remain generic
|
|
||||||
- [An Article about Lenses in Scala](https://web.archive.org/web/20221128185849/https://medium.com/zyseme-technology/functional-references-lens-and-other-optics-in-scala-e5f7e2fdafe)
|
|
||||||
that is very readable even without Scala expertise.
|
|
||||||
- [Paper: Profunctor Optics: Modular Data
|
|
||||||
Accessors](https://web.archive.org/web/20220701102832/https://arxiv.org/ftp/arxiv/papers/1703/1703.10857.pdf)
|
|
||||||
|
|
||||||
[^1]: [School of Haskell: A Little Lens Starter Tutorial](https://web.archive.org/web/20221128190041/https://www.schoolofhaskell.com/school/to-infinity-and-beyond/pick-of-the-week/a-little-lens-starter-tutorial)
|
|
@ -0,0 +1,507 @@
|
|||||||
|
# Functional Language Optics
|
||||||
|
|
||||||
|
Optics are a style of API design common in functional languages. It is a
purely functional concept that is not frequently used in Rust.
|
||||||
|
|
||||||
|
Nevertheless, exploring the concept may be helpful to understand other patterns
|
||||||
|
in Rust APIs, such as [visitors](../patterns/behavioural/visitor.md). They also
|
||||||
|
have niche use cases.
|
||||||
|
|
||||||
|
This is quite a large topic, and it would take entire books on language design
to explore it fully. However, the part that applies to Rust is much simpler.
|
||||||
|
|
||||||
|
To explain the relevant parts of the concept, the *Serde* API will be used as an
example, as many find it difficult to understand from the API documentation
alone.
|
||||||
|
|
||||||
|
In the process, different specific patterns, called Optics, will be covered.
|
||||||
|
These are *The Iso*, *The Poly Iso*, and *The Prism*.
|
||||||
|
|
||||||
|
## An API Example: Serde
|
||||||
|
|
||||||
|
Trying to understand the way *Serde* works by only reading the API is a
|
||||||
|
challenge, especially the first time. Consider the `Deserializer` trait,
|
||||||
|
implemented by any library which parses a new data format:
|
||||||
|
|
||||||
|
```rust,ignore
|
||||||
|
pub trait Deserializer<'de>: Sized {
|
||||||
|
type Error: Error;
|
||||||
|
|
||||||
|
fn deserialize_any<V>(self, visitor: V) -> Result<V::Value, Self::Error>
|
||||||
|
where
|
||||||
|
V: Visitor<'de>;
|
||||||
|
|
||||||
|
fn deserialize_bool<V>(self, visitor: V) -> Result<V::Value, Self::Error>
|
||||||
|
where
|
||||||
|
V: Visitor<'de>;
|
||||||
|
|
||||||
|
// remainder omitted
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
And here's the definition of the `Visitor` trait passed in generically:
|
||||||
|
|
||||||
|
```rust,ignore
|
||||||
|
pub trait Visitor<'de>: Sized {
|
||||||
|
type Value;
|
||||||
|
|
||||||
|
fn visit_bool<E>(self, v: bool) -> Result<Self::Value, E>
|
||||||
|
where
|
||||||
|
E: Error;
|
||||||
|
|
||||||
|
fn visit_u64<E>(self, v: u64) -> Result<Self::Value, E>
|
||||||
|
where
|
||||||
|
E: Error;
|
||||||
|
|
||||||
|
fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
|
||||||
|
where
|
||||||
|
E: Error;
|
||||||
|
|
||||||
|
// remainder omitted
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
There is a lot of type erasure going on here, with multiple levels of associated
|
||||||
|
types being passed back and forth.
|
||||||
|
|
||||||
|
But what is the big picture? Why not just have the `Visitor` return the pieces
|
||||||
|
the caller needs in a streaming API, and call it a day? Why all the extra
|
||||||
|
pieces?
|
||||||
|
|
||||||
|
One way to understand it is to look at a concept from functional languages
called *optics*.
|
||||||
|
|
||||||
|
This is a way to compose behavior and properties that is designed to
|
||||||
|
facilitate patterns common to Rust: failure, type transformation, etc.[^1]
|
||||||
|
|
||||||
|
The Rust language does not have very good support for these directly. However,
|
||||||
|
they appear in the design of the language itself, and their concepts can help to
|
||||||
|
understand some of Rust's APIs. As a result, this article attempts to explain
the concepts through the way Rust does it.
|
||||||
|
|
||||||
|
This will perhaps shed light on what those APIs are achieving: specific
|
||||||
|
properties of composability.
|
||||||
|
|
||||||
|
## Basic Optics
|
||||||
|
|
||||||
|
### The Iso
|
||||||
|
|
||||||
|
The Iso is a value transformer between two types. It is extremely simple, but a
|
||||||
|
conceptually important building block.
|
||||||
|
|
||||||
|
As an example, suppose that we have a custom hash table structure used as a
|
||||||
|
concordance for a document.[^2] It uses strings for keys (words) and a list of
|
||||||
|
indexes for values (file offsets, for instance).
|
||||||
|
|
||||||
|
A key feature is the ability to serialize this format to disk. A "quick and
|
||||||
|
dirty" approach would be to implement a conversion to and from a string in JSON
|
||||||
|
format. (Errors are ignored for the time being; they will be handled later.)
|
||||||
|
|
||||||
|
To write it in a normal form expected by functional language users:
|
||||||
|
|
||||||
|
```text
|
||||||
|
case class ConcordanceSerDe {
|
||||||
|
serialize: Concordance -> String
|
||||||
|
deserialize: String -> Concordance
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
The Iso is thus a pair of functions which convert values of different types:
|
||||||
|
`serialize` and `deserialize`.
|
||||||
|
|
||||||
|
A straightforward implementation:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
use std::collections::HashMap;
|
||||||
|
|
||||||
|
struct Concordance {
|
||||||
|
keys: HashMap<String, usize>,
|
||||||
|
value_table: Vec<(usize, usize)>,
|
||||||
|
}
|
||||||
|
|
||||||
|
struct ConcordanceSerde {}
|
||||||
|
|
||||||
|
impl ConcordanceSerde {
|
||||||
|
fn serialize(value: Concordance) -> String {
|
||||||
|
todo!()
|
||||||
|
}
|
||||||
|
// invalid concordances are empty
|
||||||
|
fn deserialize(value: String) -> Concordance {
|
||||||
|
todo!()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
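
To make the round trip concrete, here is a minimal runnable sketch. It assumes a much simpler, hypothetical `Offsets` type and a comma-separated text format in place of `Concordance` and JSON; neither is part of the design above.

```rust
/// Hypothetical stand-in for a richer structure: a list of file offsets.
#[derive(Debug, PartialEq)]
struct Offsets(Vec<usize>);

/// The Iso: a pair of conversions between two fixed types.
struct OffsetsSerde;

impl OffsetsSerde {
    /// Offsets -> String, e.g. [3, 14, 15] becomes "3,14,15".
    fn serialize(value: &Offsets) -> String {
        value
            .0
            .iter()
            .map(|n| n.to_string())
            .collect::<Vec<_>>()
            .join(",")
    }

    /// String -> Offsets. Invalid input yields an empty value, mirroring the
    /// "invalid concordances are empty" rule above.
    fn deserialize(value: &str) -> Offsets {
        let parsed: Result<Vec<usize>, _> =
            value.split(',').map(|s| s.parse::<usize>()).collect();
        Offsets(parsed.unwrap_or_default())
    }
}

fn main() {
    let original = Offsets(vec![3, 14, 15]);
    let text = OffsetsSerde::serialize(&original);
    let restored = OffsetsSerde::deserialize(&text);
    // Going there and back again returns the value we started with.
    assert_eq!(restored, original);
}
```

Both directions are tied to exactly one pair of types, which is the limitation the following sections work around.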
|
||||||
|
|
||||||
|
This may seem rather silly. In Rust, this type of behavior is typically done
|
||||||
|
with traits. After all, the standard library has `FromStr` and `ToString` in it.
|
||||||
|
|
||||||
|
But that is where our next subject comes in: Poly Isos.
|
||||||
|
|
||||||
|
### Poly Isos
|
||||||
|
|
||||||
|
The previous example was simply converting between values of two fixed types.
|
||||||
|
This next block builds upon it with generics, and is more interesting.
|
||||||
|
|
||||||
|
Poly Isos allow an operation to be generic over any type while returning a
|
||||||
|
single type.
|
||||||
|
|
||||||
|
This brings us closer to parsing. Consider what a basic parser would do ignoring
|
||||||
|
error cases. Again, this is its normal form:
|
||||||
|
|
||||||
|
```text
|
||||||
|
case class Serde[T] {
|
||||||
|
deserialize: String -> T
serialize: T -> String
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Here we have our first generic, the type `T` being converted.
|
||||||
|
|
||||||
|
In Rust, this could be implemented with a pair of traits in the standard
|
||||||
|
library: `FromStr` and `ToString`. The Rust version even handles errors:
|
||||||
|
|
||||||
|
```rust,ignore
|
||||||
|
pub trait FromStr: Sized {
|
||||||
|
type Err;
|
||||||
|
|
||||||
|
fn from_str(s: &str) -> Result<Self, Self::Err>;
|
||||||
|
}
|
||||||
|
|
||||||
|
pub trait ToString {
|
||||||
|
fn to_string(&self) -> String;
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Unlike the Iso, the Poly Iso can be applied to many types and returns each of
them generically. This is what you would want for a basic string parser.
|
||||||
|
|
||||||
|
At first glance, this seems like a good option for writing a parser. Let's see
|
||||||
|
it in action:
|
||||||
|
|
||||||
|
```rust,ignore
|
||||||
|
use anyhow;
|
||||||
|
|
||||||
|
use std::str::FromStr;
|
||||||
|
|
||||||
|
struct TestStruct {
|
||||||
|
a: usize,
|
||||||
|
b: String,
|
||||||
|
}
|
||||||
|
|
||||||
|
impl FromStr for TestStruct {
|
||||||
|
type Err = anyhow::Error;
|
||||||
|
fn from_str(s: &str) -> Result<TestStruct, Self::Err> {
|
||||||
|
todo!()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
impl ToString for TestStruct {
|
||||||
|
fn to_string(&self) -> String {
|
||||||
|
todo!()
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
fn main() {
|
||||||
|
let a = TestStruct { a: 5, b: "hello".to_string() };
|
||||||
|
println!("Our Test Struct as JSON: {}", a.to_string());
|
||||||
|
}
|
||||||
|
```
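
For a version that actually runs, here is a sketch using only the standard library. The `a:b` text layout and the `ParseTestStructError` type are assumptions made for illustration, and it implements `Display` rather than `ToString` directly, since the standard library provides `to_string` for every `Display` type.

```rust
use std::fmt;
use std::str::FromStr;

#[derive(Debug, PartialEq)]
struct TestStruct {
    a: usize,
    b: String,
}

// Hypothetical error type; it carries no detail about what went wrong.
#[derive(Debug, PartialEq)]
struct ParseTestStructError;

impl FromStr for TestStruct {
    type Err = ParseTestStructError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Split at the first ':' only, so `b` may itself contain colons.
        let (a, b) = s.split_once(':').ok_or(ParseTestStructError)?;
        let a = a.parse::<usize>().map_err(|_| ParseTestStructError)?;
        Ok(TestStruct { a, b: b.to_string() })
    }
}

// `Display` gives us `to_string()` through the blanket `ToString` impl.
impl fmt::Display for TestStruct {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}:{}", self.a, self.b)
    }
}

fn main() {
    let original = TestStruct { a: 5, b: "hello".to_string() };
    let text = original.to_string();
    let parsed: TestStruct = text.parse().expect("round trip should succeed");
    assert_eq!(parsed, original);
}
```

Note that nothing in these signatures says which text format is in use; the layout lives entirely inside the two method bodies.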
|
||||||
|
|
||||||
|
That seems quite logical. However, there are two problems with this.
|
||||||
|
|
||||||
|
First, `to_string` does not indicate to API users, "this is JSON." Every type
would need to agree on a JSON representation, and many of the types in the Rust
standard library already don't. That makes `to_string` a poor fit. This can
easily be resolved with our own trait.
|
||||||
|
|
||||||
|
But there is a second, subtler problem: scaling.
|
||||||
|
|
||||||
|
When every type writes `to_string` by hand, this works. But if every single
person who wants their type to be serializable has to write a bunch of code
themselves, possibly using different JSON libraries, it will turn into a mess
very quickly!
|
||||||
|
|
||||||
|
The answer is one of Serde's two key innovations: an independent data model to
|
||||||
|
represent Rust data in structures common to data serialization languages. The
|
||||||
|
result is that it can use Rust's code generation abilities to create an
|
||||||
|
intermediary conversion type it calls a `Visitor`.
|
||||||
|
|
||||||
|
This means, in normal form (again, skipping error handling for simplicity):
|
||||||
|
|
||||||
|
```text
|
||||||
|
case class Serde[T] {
|
||||||
|
deserialize: Visitor[T] -> T
|
||||||
|
serialize: T -> Visitor[T]
|
||||||
|
}
|
||||||
|
|
||||||
|
case class Visitor[T] {
|
||||||
|
toJson: Visitor[T] -> String
|
||||||
|
fromJson: String -> Visitor[T]
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
The result is one Poly Iso and one Iso (respectively). Both of these can be
|
||||||
|
implemented with traits:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
trait Serde {
|
||||||
|
type V;
|
||||||
|
fn deserialize(visitor: Self::V) -> Self;
|
||||||
|
fn serialize(self) -> Self::V;
|
||||||
|
}
|
||||||
|
|
||||||
|
trait Visitor {
|
||||||
|
fn to_json(self) -> String;
|
||||||
|
fn from_json(json: String) -> Self;
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Because there is a uniform set of rules to transform Rust structures to the
|
||||||
|
independent form, it is even possible to have code generation creating the
|
||||||
|
`Visitor` associated with type `T`:
|
||||||
|
|
||||||
|
```rust,ignore
|
||||||
|
#[derive(Default, Serde)] // the "Serde" derive creates the trait impl block
|
||||||
|
struct TestStruct {
|
||||||
|
a: usize,
|
||||||
|
b: String,
|
||||||
|
}
|
||||||
|
|
||||||
|
// the user invokes this macro to generate an associated visitor type
|
||||||
|
generate_visitor!(TestStruct);
|
||||||
|
```
|
||||||
|
|
||||||
|
Or is it?
|
||||||
|
|
||||||
|
```rust,ignore
|
||||||
|
fn main() {
|
||||||
|
let a = TestStruct { a: 5, b: "hello".to_string() };
|
||||||
|
let a_data = a.serialize().to_json();
|
||||||
|
println!("Our Test Struct as JSON: {}", a_data);
|
||||||
|
let b = TestStruct::deserialize(
|
||||||
|
generated_visitor_for!(TestStruct)::from_json(a_data));
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
It turns out that the conversion isn't symmetric after all! On paper it is, but
|
||||||
|
with the auto-generated code the name of the actual type necessary to convert
|
||||||
|
all the way from `String` is hidden. We'd need some kind of
|
||||||
|
`generated_visitor_for!` macro to obtain the type name.
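
The asymmetry is easier to see with the generated parts written out by hand. The sketch below is only an illustration: `TestStructVisitor` stands in for whatever type a generator would produce, and the JSON handling is deliberately naive (no escaping, and only the exact layout written by `to_json` is accepted).

```rust
trait Serde {
    type V;
    fn deserialize(visitor: Self::V) -> Self;
    fn serialize(self) -> Self::V;
}

trait Visitor {
    fn to_json(self) -> String;
    fn from_json(json: String) -> Self;
}

#[derive(Debug, Default, PartialEq)]
struct TestStruct {
    a: usize,
    b: String,
}

// Hand-written stand-in for the type a code generator would create.
#[derive(Default)]
struct TestStructVisitor {
    a: usize,
    b: String,
}

impl Serde for TestStruct {
    type V = TestStructVisitor;

    fn deserialize(visitor: Self::V) -> Self {
        TestStruct { a: visitor.a, b: visitor.b }
    }

    fn serialize(self) -> Self::V {
        TestStructVisitor { a: self.a, b: self.b }
    }
}

impl Visitor for TestStructVisitor {
    fn to_json(self) -> String {
        // No escaping: assumes `b` contains no quotes or backslashes.
        format!("{{\"a\":{},\"b\":\"{}\"}}", self.a, self.b)
    }

    fn from_json(json: String) -> Self {
        // Accepts only the exact layout produced by `to_json`;
        // anything else falls back to the default value.
        fn parse(json: &str) -> Option<TestStructVisitor> {
            let rest = json.strip_prefix("{\"a\":")?;
            let (a, rest) = rest.split_once(",\"b\":\"")?;
            let b = rest.strip_suffix("\"}")?;
            Some(TestStructVisitor { a: a.parse().ok()?, b: b.to_string() })
        }
        parse(&json).unwrap_or_default()
    }
}

fn main() {
    let a = TestStruct { a: 5, b: "hello".to_string() };
    // Serializing chains method calls and never names the visitor type.
    let a_data = a.serialize().to_json();
    println!("Our Test Struct as JSON: {}", a_data);
    // Deserializing starts from the visitor type, so its name appears here.
    let b = TestStruct::deserialize(TestStructVisitor::from_json(a_data));
    println!("{:?}", b);
}
```

In this hand-written form the name `TestStructVisitor` is known; with generated code it would not be, which is the wrinkle described above.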
|
||||||
|
|
||||||
|
It's wonky, but it works... until we get to the elephant in the room.
|
||||||
|
|
||||||
|
The only format currently supported is JSON. How would we support more formats?
|
||||||
|
|
||||||
|
The current design requires completely re-writing all of the code generation and
|
||||||
|
creating a new Serde trait. That is quite terrible and not extensible at all!
|
||||||
|
|
||||||
|
In order to solve that, we need something more powerful.
|
||||||
|
|
||||||
|
## Prism
|
||||||
|
|
||||||
|
To take format into account, we need something in normal form like this:
|
||||||
|
|
||||||
|
```text
|
||||||
|
case class Serde[T, F] {
|
||||||
|
serialize: T, F -> String
|
||||||
|
deserialize: String, F -> Result[T, Error]
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
This construct is called a Prism. It is "one level higher" in generics than Poly
|
||||||
|
Isos (in this case, the "intersecting" type F is the key).
|
||||||
|
|
||||||
|
Unfortunately, because `Visitor` is a trait (since each incarnation requires its
|
||||||
|
own custom code), this would require a kind of generic type boundary that Rust
|
||||||
|
does not support.
|
||||||
|
|
||||||
|
Fortunately, we still have that `Visitor` type from before. What is the
|
||||||
|
`Visitor` doing? It is attempting to allow each data structure to define the way
|
||||||
|
it is itself parsed.
|
||||||
|
|
||||||
|
Well, what if we could add one more interface for the generic format? Then the
|
||||||
|
`Visitor` is just an implementation detail, and it would "bridge" the two APIs.
|
||||||
|
|
||||||
|
In normal form:
|
||||||
|
|
||||||
|
```text
|
||||||
|
case class Serde[T] {
|
||||||
|
serialize: F -> String
|
||||||
|
deserialize: F, String -> Result[T, Error]
|
||||||
|
}
|
||||||
|
|
||||||
|
case class VisitorForT {
|
||||||
|
build: F, String -> Result[T, Error]
|
||||||
|
decompose: F, T -> String
|
||||||
|
}
|
||||||
|
|
||||||
|
case class SerdeFormat[T, V] {
|
||||||
|
toString: T, V -> String
|
||||||
|
fromString: V, String -> Result[T, Error]
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
And what do you know, a pair of Poly Isos at the bottom which can be implemented
|
||||||
|
as traits!
|
||||||
|
|
||||||
|
Thus we have the Serde API:
|
||||||
|
|
||||||
|
1. Each type to be serialized implements `Deserialize` or `Serialize`,
|
||||||
|
equivalent to the `Serde` class
|
||||||
|
1. They get a type (well two, one for each direction) implementing the `Visitor`
|
||||||
|
trait, which is usually (but not always) done through code generated by a
|
||||||
|
derive macro. This contains the logic to construct or destruct between the
|
||||||
|
data type and the format of the Serde data model.
|
||||||
|
1. The type implementing the `Deserializer` trait handles all details specific
|
||||||
|
to the format, being "driven by" the `Visitor`.
|
||||||
|
|
||||||
|
This splitting and Rust type erasure is really to achieve a Prism through
|
||||||
|
indirection.
|
||||||
|
|
||||||
|
You can see it on the `Deserializer` trait:
|
||||||
|
|
||||||
|
```rust,ignore
|
||||||
|
pub trait Deserializer<'de>: Sized {
|
||||||
|
type Error: Error;
|
||||||
|
|
||||||
|
fn deserialize_any<V>(self, visitor: V) -> Result<V::Value, Self::Error>
|
||||||
|
where
|
||||||
|
V: Visitor<'de>;
|
||||||
|
|
||||||
|
fn deserialize_bool<V>(self, visitor: V) -> Result<V::Value, Self::Error>
|
||||||
|
where
|
||||||
|
V: Visitor<'de>;
|
||||||
|
|
||||||
|
// remainder omitted
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
And the visitor:
|
||||||
|
|
||||||
|
```rust,ignore
|
||||||
|
pub trait Visitor<'de>: Sized {
|
||||||
|
type Value;
|
||||||
|
|
||||||
|
fn visit_bool<E>(self, v: bool) -> Result<Self::Value, E>
|
||||||
|
where
|
||||||
|
E: Error;
|
||||||
|
|
||||||
|
fn visit_u64<E>(self, v: u64) -> Result<Self::Value, E>
|
||||||
|
where
|
||||||
|
E: Error;
|
||||||
|
|
||||||
|
fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
|
||||||
|
where
|
||||||
|
E: Error;
|
||||||
|
|
||||||
|
// remainder omitted
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
And the trait `Deserialize` implemented by the macros:
|
||||||
|
|
||||||
|
```rust,ignore
|
||||||
|
pub trait Deserialize<'de>: Sized {
|
||||||
|
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
|
||||||
|
where
|
||||||
|
D: Deserializer<'de>;
|
||||||
|
}
|
||||||
|
```
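
How the three traits interlock can be sketched without Serde at all. The traits below borrow Serde's names but are simplified, hypothetical versions (no lifetimes, one shared error type, two visit methods), and `DecimalText`, `HexText`, and `CustomerId` are invented for this illustration.

```rust
#[derive(Debug, PartialEq)]
struct Error(String);

// Simplified `Visitor`: builds a `Value` from items of a tiny data model.
// Each default method rejects the item, so a visitor only overrides what it expects.
trait Visitor: Sized {
    type Value;

    fn visit_u64(self, _v: u64) -> Result<Self::Value, Error> {
        Err(Error("unexpected integer".into()))
    }

    fn visit_str(self, _v: &str) -> Result<Self::Value, Error> {
        Err(Error("unexpected string".into()))
    }
}

// Simplified `Deserializer`: implemented once per format.
trait Deserializer: Sized {
    fn deserialize_any<V: Visitor>(self, visitor: V) -> Result<V::Value, Error>;
}

// Simplified `Deserialize`: implemented once per data type.
trait Deserialize: Sized {
    fn deserialize<D: Deserializer>(deserializer: D) -> Result<Self, Error>;
}

// Format 1: integers written in decimal; anything else is a string.
struct DecimalText<'a>(&'a str);

impl<'a> Deserializer for DecimalText<'a> {
    fn deserialize_any<V: Visitor>(self, visitor: V) -> Result<V::Value, Error> {
        match self.0.parse::<u64>() {
            Ok(n) => visitor.visit_u64(n),
            Err(_) => visitor.visit_str(self.0),
        }
    }
}

// Format 2: integers written in hexadecimal; anything else is a string.
struct HexText<'a>(&'a str);

impl<'a> Deserializer for HexText<'a> {
    fn deserialize_any<V: Visitor>(self, visitor: V) -> Result<V::Value, Error> {
        match u64::from_str_radix(self.0, 16) {
            Ok(n) => visitor.visit_u64(n),
            Err(_) => visitor.visit_str(self.0),
        }
    }
}

#[derive(Debug, PartialEq)]
struct CustomerId(u64);

// The visitor for `CustomerId` accepts integers only.
struct CustomerIdVisitor;

impl Visitor for CustomerIdVisitor {
    type Value = CustomerId;

    fn visit_u64(self, v: u64) -> Result<CustomerId, Error> {
        Ok(CustomerId(v))
    }
}

impl Deserialize for CustomerId {
    fn deserialize<D: Deserializer>(deserializer: D) -> Result<Self, Error> {
        // The data type never learns which format it is being read from.
        deserializer.deserialize_any(CustomerIdVisitor)
    }
}

fn main() {
    assert_eq!(CustomerId::deserialize(DecimalText("255")), Ok(CustomerId(255)));
    assert_eq!(CustomerId::deserialize(HexText("ff")), Ok(CustomerId(255)));
    assert!(CustomerId::deserialize(DecimalText("abc")).is_err());
}
```

The data type and the formats never mention each other: the return type of `deserialize_any` comes from whichever visitor is passed in, which is what lets any type pair with any format.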
|
||||||
|
|
||||||
|
This has been abstract, so let's look at a concrete example.

How does *Serde* actually deserialize a bit of JSON into the
`struct Concordance` from earlier?

1. The user would call a library function to deserialize the data. This would
   create a `Deserializer` based on the JSON format.
1. Based on the fields in the struct, a `Visitor` would be created (more on that
   in a moment) which knows how to create each type in the generic data model
   needed to represent it: `Vec` (list), `u64` and `String`.
1. The deserializer would make calls to the `Visitor` as it parsed items.
1. The `Visitor` would indicate if the items found were expected, and if not,
   raise an error to indicate deserialization has failed.

For our very simple structure above, the expected pattern would be:

1. Begin visiting a map (*Serde*'s equivalent to `HashMap` or a JSON object).
1. Visit a string key called "keys".
1. Begin visiting a map value.
1. For each item, visit a string key then an integer value.
1. Visit the end of the map.
1. Store the map into the `keys` field of the data structure.
1. Visit a string key called "value_table".
1. Begin visiting a list value.
1. For each item, visit an integer.
1. Visit the end of the list.
1. Store the list into the `value_table` field.
1. Visit the end of the map.

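The steps above can be sketched as a small state walk over a stream of parse
events. This is only a model, not how *Serde* is implemented: the `Event` enum
and the `visit_concordance` function are hypothetical, and the field types of
this simplified `Concordance` are assumptions inferred from the steps (string
keys with integer values, and a list of integers). Unlike a real `Visitor`, the
sketch also insists on the exact field order listed above.

```rust
use std::collections::HashMap;

/// Hypothetical events a format-specific parser might report.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    MapStart,
    MapEnd,
    ListStart,
    ListEnd,
    Str(String),
    Int(u64),
}

/// Simplified stand-in; the field types are assumptions based on the steps.
#[derive(Debug, PartialEq)]
pub struct Concordance {
    pub keys: HashMap<String, u64>,
    pub value_table: Vec<u64>,
}

type Events = std::vec::IntoIter<Event>;

/// Consume one event and fail if it is not the one the pattern expects.
fn expect(events: &mut Events, wanted: Event) -> Result<(), String> {
    let found = events.next();
    if found.as_ref() == Some(&wanted) {
        Ok(())
    } else {
        Err(format!("expected {:?}, found {:?}", wanted, found))
    }
}

/// Steps 3 to 5: a map of string keys to integer values.
fn read_keys(events: &mut Events) -> Result<HashMap<String, u64>, String> {
    expect(events, Event::MapStart)?;
    let mut keys = HashMap::new();
    loop {
        match events.next() {
            Some(Event::MapEnd) => return Ok(keys),
            Some(Event::Str(key)) => match events.next() {
                Some(Event::Int(value)) => {
                    keys.insert(key, value);
                }
                other => return Err(format!("expected integer, found {:?}", other)),
            },
            other => return Err(format!("expected key or map end, found {:?}", other)),
        }
    }
}

/// Steps 8 to 10: a list of integers.
fn read_list(events: &mut Events) -> Result<Vec<u64>, String> {
    expect(events, Event::ListStart)?;
    let mut items = Vec::new();
    loop {
        match events.next() {
            Some(Event::ListEnd) => return Ok(items),
            Some(Event::Int(value)) => items.push(value),
            other => return Err(format!("expected integer or list end, found {:?}", other)),
        }
    }
}

/// Walk the whole expected pattern and build the struct if it matches.
pub fn visit_concordance(input: Vec<Event>) -> Result<Concordance, String> {
    let mut events = input.into_iter();
    expect(&mut events, Event::MapStart)?;
    expect(&mut events, Event::Str("keys".to_string()))?;
    let keys = read_keys(&mut events)?;
    expect(&mut events, Event::Str("value_table".to_string()))?;
    let value_table = read_list(&mut events)?;
    expect(&mut events, Event::MapEnd)?;
    Ok(Concordance { keys, value_table })
}
```

Any event that departs from the pattern, such as a list where the outer map
should begin, makes `visit_concordance` return an error, which mirrors the
`Visitor` signalling that deserialization has failed.
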
But what determines which "observation" pattern is expected?

A functional programming language would be able to use currying to create
reflection of each type based on the type itself. Rust does not support that, so
every single type would need its own hand-written code based on its fields and
their properties.

*Serde* solves this usability challenge with a derive macro:

```rust,ignore
use serde::Deserialize;

#[derive(Deserialize)]
struct IdRecord {
    name: String,
    customer_id: String,
}
```

That macro simply generates an impl block causing the struct to implement a
trait called `Deserialize`.

The trait's `deserialize` function determines how to create the struct itself.
Its code is generated based on the struct's fields. When the parsing library is
called (in our example, a JSON parsing library), it creates a `Deserializer` and
calls `Type::deserialize` with it as a parameter.

The `deserialize` code will then create a `Visitor` which will have its calls
"refracted" by the `Deserializer`. If everything goes well, eventually that
`Visitor` will construct a value corresponding to the type being parsed and
return it.

For a complete example, see the
[*Serde* documentation](https://serde.rs/deserialize-struct.html).

The result is that types to be deserialized only implement the "top layer" of
the API, and file formats only need to implement the "bottom layer". Each piece
can then "just work" with the rest of the ecosystem, since generic types will
bridge them.

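To see why this layering pays off, consider a small std-only sketch in which one
data type is written once and then read from two unrelated formats. All names
here are hypothetical and much simpler than *Serde*'s real traits:
`FieldVisitor` and `FromFormat` play the role of the "top layer", `Format` the
"bottom layer", and `QueryString` and `ColonLines` are two made-up text formats.

```rust
/// Top layer, part 1: receives fields one at a time and builds a value.
pub trait FieldVisitor: Sized {
    type Value;
    fn visit_field(&mut self, key: &str, value: &str) -> Result<(), String>;
    fn finish(self) -> Result<Self::Value, String>;
}

/// Bottom layer: a format knows how to find fields and drives the visitor.
pub trait Format {
    fn drive<V: FieldVisitor>(self, visitor: V) -> Result<V::Value, String>;
}

/// Top layer, part 2: implemented once per data type, generic over formats.
pub trait FromFormat: Sized {
    fn from_format<F: Format>(format: F) -> Result<Self, String>;
}

/// Format A: `key=value` pairs separated by `&`.
pub struct QueryString<'a>(pub &'a str);

impl<'a> Format for QueryString<'a> {
    fn drive<V: FieldVisitor>(self, mut visitor: V) -> Result<V::Value, String> {
        for pair in self.0.split('&') {
            let (key, value) = pair
                .split_once('=')
                .ok_or_else(|| format!("malformed pair {:?}", pair))?;
            visitor.visit_field(key, value)?;
        }
        visitor.finish()
    }
}

/// Format B: one `key: value` entry per line.
pub struct ColonLines<'a>(pub &'a str);

impl<'a> Format for ColonLines<'a> {
    fn drive<V: FieldVisitor>(self, mut visitor: V) -> Result<V::Value, String> {
        for line in self.0.lines() {
            let (key, value) = line
                .split_once(':')
                .ok_or_else(|| format!("malformed line {:?}", line))?;
            visitor.visit_field(key.trim(), value.trim())?;
        }
        visitor.finish()
    }
}

/// The data type, written without knowledge of either format.
#[derive(Debug, PartialEq)]
pub struct IdRecord {
    pub name: String,
    pub customer_id: String,
}

#[derive(Default)]
struct IdRecordVisitor {
    name: Option<String>,
    customer_id: Option<String>,
}

impl FieldVisitor for IdRecordVisitor {
    type Value = IdRecord;

    fn visit_field(&mut self, key: &str, value: &str) -> Result<(), String> {
        match key {
            "name" => self.name = Some(value.to_string()),
            "customer_id" => self.customer_id = Some(value.to_string()),
            other => return Err(format!("unknown field {:?}", other)),
        }
        Ok(())
    }

    fn finish(self) -> Result<IdRecord, String> {
        Ok(IdRecord {
            name: self.name.ok_or("missing field name")?,
            customer_id: self.customer_id.ok_or("missing field customer_id")?,
        })
    }
}

impl FromFormat for IdRecord {
    fn from_format<F: Format>(format: F) -> Result<Self, String> {
        format.drive(IdRecordVisitor::default())
    }
}
```

Here `IdRecord::from_format` accepts both
`QueryString("name=Jane Doe&customer_id=1048576332")` and
`ColonLines("name: Jane Doe\ncustomer_id: 1048576332")`. Adding a third format
would not touch `IdRecord`, and adding a second record type would not touch the
formats.
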
In conclusion, Rust's type system, with its generics, can come close to these
concepts and harness their power, as shown in this API design. But it may also
need procedural macros to create bridges for its generics.

If you are interested in learning more about this topic, please check the
following section.

## See Also

- [lens-rs crate](https://crates.io/crates/lens-rs) for a pre-built lenses
  implementation, with a cleaner interface than these examples
- [Serde](https://serde.rs) itself, which makes these concepts intuitive for end
  users (i.e. defining the structs) without needing to understand the details
- [luminance](https://github.com/phaazon/luminance-rs) is a crate for drawing
  computer graphics that uses a similar API design, including procedural macros
  to create full prisms for buffers of different pixel types that remain generic
- [An Article about Lenses in Scala](https://web.archive.org/web/20221128185849/https://medium.com/zyseme-technology/functional-references-lens-and-other-optics-in-scala-e5f7e2fdafe)
  that is very readable even without Scala expertise.
- [Paper: Profunctor Optics: Modular Data
  Accessors](https://web.archive.org/web/20220701102832/https://arxiv.org/ftp/arxiv/papers/1703/1703.10857.pdf)
- [Musli](https://github.com/udoprog/musli) is a library which attempts to use a
  similar structure with a different approach, e.g. doing away with the visitor

[^1]: [School of Haskell: A Little Lens Starter Tutorial](https://web.archive.org/web/20221128190041/https://www.schoolofhaskell.com/school/to-infinity-and-beyond/pick-of-the-week/a-little-lens-starter-tutorial)

[^2]: [Concordance on Wikipedia](https://en.wikipedia.org/wiki/Concordance_(publishing))