Iteration on Functional optics (#364)
Co-authored-by: simonsan <14062932+simonsan@users.noreply.github.com> Co-authored-by: Julian Ganz <neither@nut.email>pull/373/head
parent
0a12425a64
commit
bbcf07985d
@ -1,355 +0,0 @@
|
||||
# Lenses and Prisms
|
||||
|
||||
This is a pure functional concept that is not frequently used in Rust.
|
||||
Nevertheless, exploring the concept may be helpful to understand other patterns
|
||||
in Rust APIs, such as [visitors](../patterns/behavioural/visitor.md). They also
|
||||
have niche use cases.
|
||||
|
||||
## Lenses: Uniform Access Across Types
|
||||
|
||||
A lens is a concept from functional programming languages that allows accessing
|
||||
parts of a data type in an abstract, unified way.[^1] In basic concept, it is
|
||||
similar to the way Rust traits work with type erasure, but it has a bit more
|
||||
power and flexibility.
|
||||
|
||||
For example, suppose a bank contains several JSON formats for customer data.
|
||||
This is because they come from different databases or legacy systems. One
|
||||
database contains the data needed to perform credit checks:
|
||||
|
||||
```json
|
||||
{ "name": "Jane Doe",
|
||||
"dob": "2002-02-24",
|
||||
[...]
|
||||
"customer_id": 1048576332,
|
||||
}
|
||||
```
|
||||
|
||||
Another one contains the account information:
|
||||
|
||||
```json
|
||||
{ "customer_id": 1048576332,
|
||||
"accounts": [
|
||||
{ "account_id": 2121,
|
||||
"account_type: "savings",
|
||||
"joint_customer_ids": [],
|
||||
[...]
|
||||
},
|
||||
{ "account_id": 2122,
|
||||
"account_type: "checking",
|
||||
"joint_customer_ids": [1048576333],
|
||||
[...]
|
||||
},
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
Notice that both types have a customer ID number which corresponds to a person.
|
||||
How would a single function handle both records of different types?
|
||||
|
||||
In Rust, a `struct` could represent each of these types, and a trait would have
|
||||
a `get_customer_id` function they would implement:
|
||||
|
||||
```rust
|
||||
use std::collections::HashSet;
|
||||
|
||||
pub struct Account {
|
||||
account_id: u32,
|
||||
account_type: String,
|
||||
// other fields omitted
|
||||
}
|
||||
|
||||
pub trait CustomerId {
|
||||
fn get_customer_id(&self) -> u64;
|
||||
}
|
||||
|
||||
pub struct CreditRecord {
|
||||
customer_id: u64,
|
||||
name: String,
|
||||
dob: String,
|
||||
// other fields omitted
|
||||
}
|
||||
|
||||
impl CustomerId for CreditRecord {
|
||||
fn get_customer_id(&self) -> u64 {
|
||||
self.customer_id
|
||||
}
|
||||
}
|
||||
|
||||
pub struct AccountRecord {
|
||||
customer_id: u64,
|
||||
accounts: Vec<Account>,
|
||||
}
|
||||
|
||||
impl CustomerId for AccountRecord {
|
||||
fn get_customer_id(&self) -> u64 {
|
||||
self.customer_id
|
||||
}
|
||||
}
|
||||
|
||||
// static polymorphism: only one type, but each function call can choose it
|
||||
fn unique_ids_set<R: CustomerId>(records: &[R]) -> HashSet<u64> {
|
||||
records.iter().map(|r| r.get_customer_id()).collect()
|
||||
}
|
||||
|
||||
// dynamic dispatch: iterates over any type with a customer ID, collecting all
|
||||
// values together
|
||||
fn unique_ids_iter<I>(iterator: I) -> HashSet<u64>
|
||||
where I: Iterator<Item=Box<dyn CustomerId>>
|
||||
{
|
||||
iterator.map(|r| r.as_ref().get_customer_id()).collect()
|
||||
}
|
||||
```
|
||||
|
||||
Lenses, however, allow the code supporting customer ID to be moved from the
|
||||
*type* to the *accessor function*. Rather than implementing a trait on each
|
||||
type, all matching structures can simply be accessed the same way.
|
||||
|
||||
While the Rust language itself does not support this (type erasure is the
|
||||
preferred solution to this problem), the
|
||||
[lens-rs crate](https://github.com/TOETOE55/lens-rs/blob/master/guide.md) allows
|
||||
code that feels like this to be written with macros:
|
||||
|
||||
```rust,ignore
|
||||
use std::collections::HashSet;
|
||||
|
||||
use lens_rs::{optics, Lens, LensRef, Optics};
|
||||
|
||||
#[derive(Clone, Debug, Lens /* derive to allow lenses to work */)]
|
||||
pub struct CreditRecord {
|
||||
#[optic(ref)] // macro attribute to allow viewing this field
|
||||
customer_id: u64,
|
||||
name: String,
|
||||
dob: String,
|
||||
// other fields omitted
|
||||
}
|
||||
|
||||
#[derive(Clone, Debug)]
|
||||
pub struct Account {
|
||||
account_id: u32,
|
||||
account_type: String,
|
||||
// other fields omitted
|
||||
}
|
||||
|
||||
#[derive(Clone, Debug, Lens)]
|
||||
pub struct AccountRecord {
|
||||
#[optic(ref)]
|
||||
customer_id: u64,
|
||||
accounts: Vec<Account>,
|
||||
}
|
||||
|
||||
fn unique_ids_lens<T>(iter: impl Iterator<Item = T>) -> HashSet<u64>
|
||||
where
|
||||
T: LensRef<Optics![customer_id], u64>, // any type with this field
|
||||
{
|
||||
iter.map(|r| *r.view_ref(optics!(customer_id))).collect()
|
||||
}
|
||||
```
|
||||
|
||||
The version of `unique_ids_lens` shown here allows any type to be in the
|
||||
iterator, so long as it has an attribute called `customer_id` which can be
|
||||
accessed by the function. This is how most functional programming languages
|
||||
operate on lenses.
|
||||
|
||||
Rather than macros, they achieve this with a technique known as "currying". That
|
||||
is, they "partially construct" the function, leaving the type of the final
|
||||
parameter (the value being operated on) unfilled until the function is called.
|
||||
Thus it can be called with different types dynamically even from one place in
|
||||
the code. That is what the `optics!` and `view_ref` in the example above
|
||||
simulates.
|
||||
|
||||
The functional approach need not be restricted to accessing members. More
|
||||
powerful lenses can be created which both *set* and *get* data in a structure.
|
||||
But the concept really becomes interesting when used as a building block for
|
||||
composition. That is where the concept appears more clearly in Rust.
|
||||
|
||||
## Prisms: A Higher-Order form of "Optics"
|
||||
|
||||
A simple function such as `unique_ids_lens` above operates on a single lens. A
|
||||
*prism* is a function that operates on a *family* of lenses. It is one
|
||||
conceptual level higher, using lenses as a building block, and continuing the
|
||||
metaphor, is part of a family of "optics". It is the main one that is useful in
|
||||
understanding Rust APIs, so will be the focus here.
|
||||
|
||||
The same way that traits allow "lens-like" design with static polymorphism and
|
||||
dynamic dispatch, prism-like designs appear in Rust APIs which split problems
|
||||
into multiple associated types to be composed. A good example of this is the
|
||||
traits in the parsing crate *Serde*.
|
||||
|
||||
Trying to understand the way *Serde* works by only reading the API is a
|
||||
challenge, especially the first time. Consider the `Deserializer` trait,
|
||||
implemented by some type in any library which parses a new format:
|
||||
|
||||
```rust,ignore
|
||||
pub trait Deserializer<'de>: Sized {
|
||||
type Error: Error;
|
||||
|
||||
fn deserialize_any<V>(self, visitor: V) -> Result<V::Value, Self::Error>
|
||||
where
|
||||
V: Visitor<'de>;
|
||||
|
||||
fn deserialize_bool<V>(self, visitor: V) -> Result<V::Value, Self::Error>
|
||||
where
|
||||
V: Visitor<'de>;
|
||||
|
||||
// remainder ommitted
|
||||
}
|
||||
```
|
||||
|
||||
For a trait that is just supposed to parse data from a format and return a
|
||||
value, this looks odd.
|
||||
|
||||
Why are all the return types type erased?
|
||||
|
||||
To understand that, we need to keep the lens concept in mind and look at the
|
||||
definition of the `Visitor` type that is passed in generically:
|
||||
|
||||
```rust,ignore
|
||||
pub trait Visitor<'de>: Sized {
|
||||
type Value;
|
||||
|
||||
fn visit_bool<E>(self, v: bool) -> Result<Self::Value, E>
|
||||
where
|
||||
E: Error;
|
||||
|
||||
fn visit_u64<E>(self, v: u64) -> Result<Self::Value, E>
|
||||
where
|
||||
E: Error;
|
||||
|
||||
fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
|
||||
where
|
||||
E: Error;
|
||||
|
||||
// remainder omitted
|
||||
}
|
||||
```
|
||||
|
||||
The job of the `Visitor` type is to construct values in the *Serde* data model,
|
||||
which are represented by its associated `Value` type.
|
||||
|
||||
These values represent parts of the Rust value being deserialized. If this
|
||||
fails, it returns an `Error` type - an error type determined by the
|
||||
`Deserializer` when its methods were called.
|
||||
|
||||
This highlights that `Deserializer` is similar to `CustomerId` from earlier,
|
||||
allowing any format parser which implements it to create `Value`s based on what
|
||||
it parsed. The `Value` trait is acting like a lens in functional programming
|
||||
languages.
|
||||
|
||||
But unlike the `CustomerId` trait, the return types of `Visitor` methods are
|
||||
*generic*, and the concrete `Value` type is *determined by the Visitor itself*.
|
||||
|
||||
Instead of acting as one lens, it effectively acts as a family of lenses, one
|
||||
for each concrete type of `Visitor`.
|
||||
|
||||
The `Deserializer` API is based on having a generic set of "lenses" work across
|
||||
a set of other generic types for "observation". It is a *prism*.
|
||||
|
||||
For example, consider the identity record from earlier but simplified:
|
||||
|
||||
```json
|
||||
{ "name": "Jane Doe", "customer_id": 1048576332 }
|
||||
```
|
||||
|
||||
How would the *Serde* library deserialize this JSON into `struct CreditRecord`?
|
||||
|
||||
1. The user would call a library function to deserialize the data. This would
|
||||
create a `Deserializer` based on the JSON format.
|
||||
1. Based on the fields in the struct, a `Visitor` would be created (more on that
|
||||
in a moment) which knows how to create each type in a generic data model that
|
||||
was needed to represent it: `u64` and `String`.
|
||||
1. The deserializer would make calls to the `Visitor` as it parsed items.
|
||||
1. The `Visitor` would indicate if the items found were expected, and if not,
|
||||
raise an error to indicate deserialization has failed.
|
||||
|
||||
For our very simple structure above, the expected pattern would be:
|
||||
|
||||
1. Visit a map (*Serde*'s equvialent to `HashMap` or JSON's dictionary).
|
||||
1. Visit a string key called "name".
|
||||
1. Visit a string value, which will go into the `name` field.
|
||||
1. Visit a string key called "customer_id".
|
||||
1. Visit a string value, which will go into the `customer_id` field.
|
||||
1. Visit the end of the map.
|
||||
|
||||
But what determines which "observation" pattern is expected?
|
||||
|
||||
A functional programming language would be able to use currying to create
|
||||
reflection of each type based on the type itself. Rust does not support that, so
|
||||
every single type would need to have its own code written based on its fields
|
||||
and their properties.
|
||||
|
||||
*Serde* solves this usability challenge with a derive macro:
|
||||
|
||||
```rust,ignore
|
||||
use serde::Deserialize;
|
||||
|
||||
#[derive(Deserialize)]
|
||||
struct IdRecord {
|
||||
name: String,
|
||||
customer_id: String,
|
||||
}
|
||||
```
|
||||
|
||||
That macro simply generates an impl block causing the struct to implement a
|
||||
trait called `Deserialize`.
|
||||
|
||||
It is defined this way:
|
||||
|
||||
```rust,ignore
|
||||
pub trait Deserialize<'de>: Sized {
|
||||
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
|
||||
where
|
||||
D: Deserializer<'de>;
|
||||
}
|
||||
```
|
||||
|
||||
This is the function that determines how to create the struct itself. Code is
|
||||
generated based on the struct's fields. When the parsing library is called - in
|
||||
our example, a JSON parsing library - it creates a `Deserializer` and calls
|
||||
`Type::deserialize` with it as a parameter.
|
||||
|
||||
The `deserialize` code will then create a `Visitor` which will have its calls
|
||||
"refracted" by the `Deserializer`. If everything goes well, eventually that
|
||||
`Visitor` will construct a value corresponding to the type being parsed and
|
||||
return it.
|
||||
|
||||
For a complete example, see the
|
||||
[*Serde* documentation](https://serde.rs/deserialize-struct.html).
|
||||
|
||||
To wrap up, this is the power of *Serde*:
|
||||
|
||||
1. The structure being parsed is represented by an `impl` block for
|
||||
`Deserialize`
|
||||
1. The input data format (e.g. JSON) is represented by a `Deserializer` called
|
||||
by `Deserialize`
|
||||
1. The `Deserializer` acts like a prism which "refracts" lens-like `Visitor`
|
||||
calls which actually build the data value
|
||||
|
||||
The result is that types to be deserialized only implement the "top layer" of
|
||||
the API, and file formats only need to implement the "bottom layer". Each piece
|
||||
can then "just work" with the rest of the ecosystem, since generic types will
|
||||
bridge them.
|
||||
|
||||
To emphasize, the only reason this model works on any format and any type is
|
||||
because the `Deserializer` trait's output type **is specified by the implementor
|
||||
of `Visitor` it is passed**, rather than being tied to one specific type. This
|
||||
was not true in the account example earlier.
|
||||
|
||||
Rust's generic-inspired type system can bring it close to these concepts and use
|
||||
their power, as shown in this API design. But it may also need procedural macros
|
||||
to create bridges for its generics.
|
||||
|
||||
## See Also
|
||||
|
||||
- [lens-rs crate](https://crates.io/crates/lens-rs) for a pre-built lenses
|
||||
implementation, with a cleaner interface than these examples
|
||||
- [serde](https://serde.rs) itself, which makes these concepts intuitive for end
|
||||
users (i.e. defining the structs) without needing to undestand the details
|
||||
- [luminance](https://github.com/phaazon/luminance-rs) is a crate for drawing
|
||||
computer graphics that uses lens API design, including proceducal macros to
|
||||
create full prisms for buffers of different pixel types that remain generic
|
||||
- [An Article about Lenses in Scala](https://web.archive.org/web/20221128185849/https://medium.com/zyseme-technology/functional-references-lens-and-other-optics-in-scala-e5f7e2fdafe)
|
||||
that is very readable even without Scala expertise.
|
||||
- [Paper: Profunctor Optics: Modular Data
|
||||
Accessors](https://web.archive.org/web/20220701102832/https://arxiv.org/ftp/arxiv/papers/1703/1703.10857.pdf)
|
||||
|
||||
[^1]: [School of Haskell: A Little Lens Starter Tutorial](https://web.archive.org/web/20221128190041/https://www.schoolofhaskell.com/school/to-infinity-and-beyond/pick-of-the-week/a-little-lens-starter-tutorial)
|
@ -0,0 +1,507 @@
|
||||
# Functional Language Optics
|
||||
|
||||
Optics is a type of API design that is common to functional languages. This is a
|
||||
pure functional concept that is not frequently used in Rust.
|
||||
|
||||
Nevertheless, exploring the concept may be helpful to understand other patterns
|
||||
in Rust APIs, such as [visitors](../patterns/behavioural/visitor.md). They also
|
||||
have niche use cases.
|
||||
|
||||
This is quite a large topic, and would require actual books on language design
|
||||
to fully get into its abilities. However their applicability in Rust is much
|
||||
simpler.
|
||||
|
||||
To explain the relevant parts of the concept, the `Serde`-API will be used as an
|
||||
example, as it is one that is difficult for many to to understand from simply
|
||||
the API documentation.
|
||||
|
||||
In the process, different specific patterns, called Optics, will be covered.
|
||||
These are *The Iso*, *The Poly Iso*, and *The Prism*.
|
||||
|
||||
## An API Example: Serde
|
||||
|
||||
Trying to understand the way *Serde* works by only reading the API is a
|
||||
challenge, especially the first time. Consider the `Deserializer` trait,
|
||||
implemented by any library which parses a new data format:
|
||||
|
||||
```rust,ignore
|
||||
pub trait Deserializer<'de>: Sized {
|
||||
type Error: Error;
|
||||
|
||||
fn deserialize_any<V>(self, visitor: V) -> Result<V::Value, Self::Error>
|
||||
where
|
||||
V: Visitor<'de>;
|
||||
|
||||
fn deserialize_bool<V>(self, visitor: V) -> Result<V::Value, Self::Error>
|
||||
where
|
||||
V: Visitor<'de>;
|
||||
|
||||
// remainder omitted
|
||||
}
|
||||
```
|
||||
|
||||
And here's the definition of the `Visitor` trait passed in generically:
|
||||
|
||||
```rust,ignore
|
||||
pub trait Visitor<'de>: Sized {
|
||||
type Value;
|
||||
|
||||
fn visit_bool<E>(self, v: bool) -> Result<Self::Value, E>
|
||||
where
|
||||
E: Error;
|
||||
|
||||
fn visit_u64<E>(self, v: u64) -> Result<Self::Value, E>
|
||||
where
|
||||
E: Error;
|
||||
|
||||
fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
|
||||
where
|
||||
E: Error;
|
||||
|
||||
// remainder omitted
|
||||
}
|
||||
```
|
||||
|
||||
There is a lot of type erasure going on here, with multiple levels of associated
|
||||
types being passed back and forth.
|
||||
|
||||
But what is the big picture? Why not just have the `Visitor` return the pieces
|
||||
the caller needs in a streaming API, and call it a day? Why all the extra
|
||||
pieces?
|
||||
|
||||
One way to understand it is to look at a functional languages concept called
|
||||
*optics*.
|
||||
|
||||
This is a way to do composition of behavior and proprieties that is designed to
|
||||
facilitate patterns common to Rust: failure, type transformation, etc.[^1]
|
||||
|
||||
The Rust language does not have very good support for these directly. However,
|
||||
they appear in the design of the language itself, and their concepts can help to
|
||||
understand some of Rust's APIs. As a result, this attempts to explain the
|
||||
concepts with the way Rust does it.
|
||||
|
||||
This will perhaps shed light on what those APIs are achieving: specific
|
||||
properties of composability.
|
||||
|
||||
## Basic Optics
|
||||
|
||||
### The Iso
|
||||
|
||||
The Iso is a value transformer between two types. It is extremely simple, but a
|
||||
conceptually important building block.
|
||||
|
||||
As an example, suppose that we have a custom Hash table structure used as a
|
||||
concordance for a document.[^2] It uses strings for keys (words) and a list of
|
||||
indexes for values (file offsets, for instance).
|
||||
|
||||
A key feature is the ability to serialize this format to disk. A "quick and
|
||||
dirty" approach would be to implement a conversion to and from a string in JSON
|
||||
format. (Errors are ignored for the time being, they will be handled later.)
|
||||
|
||||
To write it in a normal form expected by functional language users:
|
||||
|
||||
```text
|
||||
case class ConcordanceSerDe {
|
||||
serialize: Concordance -> String
|
||||
deserialize: String -> Concordance
|
||||
}
|
||||
```
|
||||
|
||||
The Iso is thus a pair of functions which convert values of different types:
|
||||
`serialize` and `deserialize`.
|
||||
|
||||
A straightforward implementation:
|
||||
|
||||
```rust
|
||||
use std::collections::HashMap;
|
||||
|
||||
struct Concordance {
|
||||
keys: HashMap<String, usize>,
|
||||
value_table: Vec<(usize, usize)>,
|
||||
}
|
||||
|
||||
struct ConcordanceSerde {}
|
||||
|
||||
impl ConcordanceSerde {
|
||||
fn serialize(value: Concordance) -> String {
|
||||
todo!()
|
||||
}
|
||||
// invalid concordances are empty
|
||||
fn deserialize(value: String) -> Concordance {
|
||||
todo!()
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
This may seem rather silly. In Rust, this type of behavior is typically done
|
||||
with traits. After all, the standard library has `FromStr` and `ToString` in it.
|
||||
|
||||
But that is where our next subject comes in: Poly Isos.
|
||||
|
||||
### Poly Isos
|
||||
|
||||
The previous example was simply converting between values of two fixed types.
|
||||
This next block builds upon it with generics, and is more interesting.
|
||||
|
||||
Poly Isos allow an operation to be generic over any type while returning a
|
||||
single type.
|
||||
|
||||
This brings us closer to parsing. Consider what a basic parser would do ignoring
|
||||
error cases. Again, this is its normal form:
|
||||
|
||||
```text
|
||||
case class Serde[T] {
|
||||
deserialize(String) -> T
|
||||
serialize(T) -> String
|
||||
}
|
||||
```
|
||||
|
||||
Here we have our first generic, the type `T` being converted.
|
||||
|
||||
In Rust, this could be implemented with a pair of traits in the standard
|
||||
library: `FromStr` and `ToString`. The Rust version even handles errors:
|
||||
|
||||
```rust,ignore
|
||||
pub trait FromStr: Sized {
|
||||
type Err;
|
||||
|
||||
fn from_str(s: &str) -> Result<Self, Self::Err>;
|
||||
}
|
||||
|
||||
pub trait ToString {
|
||||
fn to_string(&self) -> String;
|
||||
}
|
||||
```
|
||||
|
||||
Unlike the Iso, the Poly Iso allows application of multiple types, and returns
|
||||
them generically. This is what you would want for a basic string parser.
|
||||
|
||||
At first glance, this seems like a good option for writing a parser. Let's see
|
||||
it in action:
|
||||
|
||||
```rust,ignore
|
||||
use anyhow;
|
||||
|
||||
use std::str::FromStr;
|
||||
|
||||
struct TestStruct {
|
||||
a: usize,
|
||||
b: String,
|
||||
}
|
||||
|
||||
impl FromStr for TestStruct {
|
||||
type Err = anyhow::Error;
|
||||
fn from_str(s: &str) -> Result<TestStruct, Self::Err> {
|
||||
todo!()
|
||||
}
|
||||
}
|
||||
|
||||
impl ToString for TestStruct {
|
||||
fn to_string(&self) -> String {
|
||||
todo!()
|
||||
}
|
||||
}
|
||||
|
||||
fn main() {
|
||||
let a = TestStruct { a: 5, b: "hello".to_string() };
|
||||
println!("Our Test Struct as JSON: {}", a.to_string());
|
||||
}
|
||||
```
|
||||
|
||||
That seems quite logical. However, there are two problems with this.
|
||||
|
||||
First, `to_string` does not indicate to API users, "this is JSON." Every type
|
||||
would need to agree on a JSON representation, and many of the types in the Rust
|
||||
standard library already don't. Using this is a poor fit. This can easily be
|
||||
resolved with our own trait.
|
||||
|
||||
But there is a second, subtler problem: scaling.
|
||||
|
||||
When every type writes `to_string` by hand, this works. But if every single
|
||||
person who wants their type to be serializable has to write a bunch of code --
|
||||
and possibly different JSON libraries -- to do it themselves, it will turn into
|
||||
a mess very quickly!
|
||||
|
||||
The answer is one of Serde's two key innovations: an independent data model to
|
||||
represent Rust data in structures common to data serialization languages. The
|
||||
result is that it can use Rust's code generation abilities to create an
|
||||
intermediary conversion type it calls a `Visitor`.
|
||||
|
||||
This means, in normal form (again, skipping error handling for simplicity):
|
||||
|
||||
```text
|
||||
case class Serde[T] {
|
||||
deserialize: Visitor[T] -> T
|
||||
serialize: T -> Visitor[T]
|
||||
}
|
||||
|
||||
case class Visitor[T] {
|
||||
toJson: Visitor[T] -> String
|
||||
fromJson: String -> Visitor[T]
|
||||
}
|
||||
```
|
||||
|
||||
The result is one Poly Iso and one Iso (respectively). Both of these can be
|
||||
implemented with traits:
|
||||
|
||||
```rust
|
||||
trait Serde {
|
||||
type V;
|
||||
fn deserialize(visitor: Self::V) -> Self;
|
||||
fn serialize(self) -> Self::V;
|
||||
}
|
||||
|
||||
trait Visitor {
|
||||
fn to_json(self) -> String;
|
||||
fn from_json(json: String) -> Self;
|
||||
}
|
||||
```
|
||||
|
||||
Because there is a uniform set of rules to transform Rust structures to the
|
||||
independent form, it is even possible to have code generation creating the
|
||||
`Visitor` associated with type `T`:
|
||||
|
||||
```rust,ignore
|
||||
#[derive(Default, Serde)] // the "Serde" derive creates the trait impl block
|
||||
struct TestStruct {
|
||||
a: usize,
|
||||
b: String,
|
||||
}
|
||||
|
||||
// user writes this macro to generate an associated visitor type
|
||||
generate_visitor!(TestStruct);
|
||||
```
|
||||
|
||||
Or do they?
|
||||
|
||||
```rust,ignore
|
||||
fn main() {
|
||||
let a = TestStruct { a: 5, b: "hello".to_string() };
|
||||
let a_data = a.serialize().to_json();
|
||||
println!("Our Test Struct as JSON: {}", a_data);
|
||||
let b = TestStruct::deserialize(
|
||||
generated_visitor_for!(TestStruct)::from_json(a_data));
|
||||
}
|
||||
```
|
||||
|
||||
It turns out that the conversion isn't symmetric after all! On paper it is, but
|
||||
with the auto-generated code the name of the actual type necessary to convert
|
||||
all the way from `String` is hidden. We'd need some kind of
|
||||
`generated_visitor_for!` macro to obtain the type name.
|
||||
|
||||
It's wonky, but it works... until we get to the elephant in the room.
|
||||
|
||||
The only format currently supported is JSON. How would we support more formats?
|
||||
|
||||
The current design requires completely re-writing all of the code generation and
|
||||
creating a new Serde trait. That is quite terrible and not extensible at all!
|
||||
|
||||
In order to solve that, we need something more powerful.
|
||||
|
||||
## Prism
|
||||
|
||||
To take format into account, we need something in normal form like this:
|
||||
|
||||
```text
|
||||
case class Serde[T, F] {
|
||||
serialize: T, F -> String
|
||||
deserialize: String, F -> Result[T, Error]
|
||||
}
|
||||
```
|
||||
|
||||
This construct is called a Prism. It is "one level higher" in generics than Poly
|
||||
Isos (in this case, the "intersecting" type F is the key).
|
||||
|
||||
Unfortunately because `Visitor` is a trait (since each incarnation requires its
|
||||
own custom code), this would require a kind of generic type boundary that Rust
|
||||
does not support.
|
||||
|
||||
Fortunately, we still have that `Visitor` type from before. What is the
|
||||
`Visitor` doing? It is attempting to allow each data structure to define the way
|
||||
it is itself parsed.
|
||||
|
||||
Well what if we could add one more interface for the generic format? Then the
|
||||
`Visitor` is just an implementation detail, and it would "bridge" the two APIs.
|
||||
|
||||
In normal form:
|
||||
|
||||
```text
|
||||
case class Serde[T] {
|
||||
serialize: F -> String
|
||||
deserialize F, String -> Result[T, Error]
|
||||
}
|
||||
|
||||
case class VisitorForT {
|
||||
build: F, String -> Result[T, Error]
|
||||
decompose: F, T -> String
|
||||
}
|
||||
|
||||
case class SerdeFormat[T, V] {
|
||||
toString: T, V -> String
|
||||
fromString: V, String -> Result[T, Error]
|
||||
}
|
||||
```
|
||||
|
||||
And what do you know, a pair of Poly Isos at the bottom which can be implemented
|
||||
as traits!
|
||||
|
||||
Thus we have the Serde API:
|
||||
|
||||
1. Each type to be serialized implements `Deserialize` or `Serialize`,
|
||||
equivalent to the `Serde` class
|
||||
1. They get a type (well two, one for each direction) implementing the `Visitor`
|
||||
trait, which is usually (but not always) done through code generated by a
|
||||
derive macro. This contains the logic to construct or destruct between the
|
||||
data type and the format of the Serde data model.
|
||||
1. The type implementing the `Deserializer` trait handles all details specific
|
||||
to the format, being "driven by" the `Visitor`.
|
||||
|
||||
This splitting and Rust type erasure is really to achieve a Prism through
|
||||
indirection.
|
||||
|
||||
You can see it on the `Deserializer` trait
|
||||
|
||||
```rust,ignore
|
||||
pub trait Deserializer<'de>: Sized {
|
||||
type Error: Error;
|
||||
|
||||
fn deserialize_any<V>(self, visitor: V) -> Result<V::Value, Self::Error>
|
||||
where
|
||||
V: Visitor<'de>;
|
||||
|
||||
fn deserialize_bool<V>(self, visitor: V) -> Result<V::Value, Self::Error>
|
||||
where
|
||||
V: Visitor<'de>;
|
||||
|
||||
// remainder omitted
|
||||
}
|
||||
```
|
||||
|
||||
And the visitor:
|
||||
|
||||
```rust,ignore
|
||||
pub trait Visitor<'de>: Sized {
|
||||
type Value;
|
||||
|
||||
fn visit_bool<E>(self, v: bool) -> Result<Self::Value, E>
|
||||
where
|
||||
E: Error;
|
||||
|
||||
fn visit_u64<E>(self, v: u64) -> Result<Self::Value, E>
|
||||
where
|
||||
E: Error;
|
||||
|
||||
fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
|
||||
where
|
||||
E: Error;
|
||||
|
||||
// remainder omitted
|
||||
}
|
||||
```
|
||||
|
||||
And the trait `Deserialize` implemented by the macros:
|
||||
|
||||
```rust,ignore
|
||||
pub trait Deserialize<'de>: Sized {
|
||||
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
|
||||
where
|
||||
D: Deserializer<'de>;
|
||||
}
|
||||
```
|
||||
|
||||
This has been abstract, so let's look at a concrete example.
|
||||
|
||||
How does actual Serde deserialize a bit of JSON into `struct Concordance` from
|
||||
earlier?
|
||||
|
||||
1. The user would call a library function to deserialize the data. This would
|
||||
create a `Deserializer` based on the JSON format.
|
||||
1. Based on the fields in the struct, a `Visitor` would be created (more on that
|
||||
in a moment) which knows how to create each type in a generic data model that
|
||||
was needed to represent it: `Vec` (list), `u64` and `String`.
|
||||
1. The deserializer would make calls to the `Visitor` as it parsed items.
|
||||
1. The `Visitor` would indicate if the items found were expected, and if not,
|
||||
raise an error to indicate deserialization has failed.
|
||||
|
||||
For our very simple structure above, the expected pattern would be:
|
||||
|
||||
1. Begin visiting a map (*Serde*'s equivalent to `HashMap` or JSON's
|
||||
dictionary).
|
||||
1. Visit a string key called "keys".
|
||||
1. Begin visiting a map value.
|
||||
1. For each item, visit a string key then an integer value.
|
||||
1. Visit the end of the map.
|
||||
1. Store the map into the `keys` field of the data structure.
|
||||
1. Visit a string key called "value_table".
|
||||
1. Begin visiting a list value.
|
||||
1. For each item, visit an integer.
|
||||
1. Visit the end of the list
|
||||
1. Store the list into the `value_table` field.
|
||||
1. Visit the end of the map.
|
||||
|
||||
But what determines which "observation" pattern is expected?
|
||||
|
||||
A functional programming language would be able to use currying to create
|
||||
reflection of each type based on the type itself. Rust does not support that, so
|
||||
every single type would need to have its own code written based on its fields
|
||||
and their properties.
|
||||
|
||||
*Serde* solves this usability challenge with a derive macro:
|
||||
|
||||
```rust,ignore
|
||||
use serde::Deserialize;
|
||||
|
||||
#[derive(Deserialize)]
|
||||
struct IdRecord {
|
||||
name: String,
|
||||
customer_id: String,
|
||||
}
|
||||
```
|
||||
|
||||
That macro simply generates an impl block causing the struct to implement a
|
||||
trait called `Deserialize`.
|
||||
|
||||
This is the function that determines how to create the struct itself. Code is
|
||||
generated based on the struct's fields. When the parsing library is called - in
|
||||
our example, a JSON parsing library - it creates a `Deserializer` and calls
|
||||
`Type::deserialize` with it as a parameter.
|
||||
|
||||
The `deserialize` code will then create a `Visitor` which will have its calls
|
||||
"refracted" by the `Deserializer`. If everything goes well, eventually that
|
||||
`Visitor` will construct a value corresponding to the type being parsed and
|
||||
return it.
|
||||
|
||||
For a complete example, see the
|
||||
[*Serde* documentation](https://serde.rs/deserialize-struct.html).
|
||||
|
||||
The result is that types to be deserialized only implement the "top layer" of
|
||||
the API, and file formats only need to implement the "bottom layer". Each piece
|
||||
can then "just work" with the rest of the ecosystem, since generic types will
|
||||
bridge them.
|
||||
|
||||
In conclusion, Rust's generic-inspired type system can bring it close to these
|
||||
concepts and use their power, as shown in this API design. But it may also need
|
||||
procedural macros to create bridges for its generics.
|
||||
|
||||
If you are interested in learning more about this topic, please check the
|
||||
following section.
|
||||
|
||||
## See Also
|
||||
|
||||
- [lens-rs crate](https://crates.io/crates/lens-rs) for a pre-built lenses
|
||||
implementation, with a cleaner interface than these examples
|
||||
- [Serde](https://serde.rs) itself, which makes these concepts intuitive for end
|
||||
users (i.e. defining the structs) without needing to understand the details
|
||||
- [luminance](https://github.com/phaazon/luminance-rs) is a crate for drawing
|
||||
computer graphics that uses similar API design, including procedural macros to
|
||||
create full prisms for buffers of different pixel types that remain generic
|
||||
- [An Article about Lenses in Scala](https://web.archive.org/web/20221128185849/https://medium.com/zyseme-technology/functional-references-lens-and-other-optics-in-scala-e5f7e2fdafe)
|
||||
that is very readable even without Scala expertise.
|
||||
- [Paper: Profunctor Optics: Modular Data
|
||||
Accessors](https://web.archive.org/web/20220701102832/https://arxiv.org/ftp/arxiv/papers/1703/1703.10857.pdf)
|
||||
- [Musli](https://github.com/udoprog/musli) is a library which attempts to use a
|
||||
similar structure with a different approach, e.g. doing away with the visitor
|
||||
|
||||
[^1]: [School of Haskell: A Little Lens Starter Tutorial](https://web.archive.org/web/20221128190041/https://www.schoolofhaskell.com/school/to-infinity-and-beyond/pick-of-the-week/a-little-lens-starter-tutorial)
|
||||
|
||||
[^2]: [Concordance on Wikipedia](https://en.wikipedia.org/wiki/Concordance_(publishing))
|
Loading…
Reference in New Issue