Several FFI Sections: Strings, Errors, and API Design (#106)

jhwgh1968 3 years ago committed by GitHub
parent 501ae92a43
commit 11a4b712c0

@ -11,7 +11,6 @@ language that you can read [here](https://rust-unofficial.github.io/patterns/).
* TODO trait to separate visibility of methods from visibility of data (<https://github.com/sfackler/rust-postgres/blob/v0.9.6/src/lib.rs#L1400>)
* TODO leak amplification ("Vec::drain sets the Vec's len to 0 prematurely so that mem::forgetting Drain "only" mem::forgets more stuff. instead of exposing uninitialized memory or having to update the len on every iteration")
* TODO interior mutability - UnsafeCell, Cell, RefCell
* TODO FFI usage (By being mindful of how to provide Rust libraries, and make use of existing libraries across the FFI, you can get more out of benefits Rust can bring)
### Design patterns

@ -11,6 +11,10 @@
- [Finalisation in Destructors](./idioms/dtor-finally.md)
- [`mem::replace(_)`](./idioms/mem-replace.md)
- [On-Stack Dynamic Dispatch](./idioms/on-stack-dyn-dispatch.md)
- [Foreign function interface usage](./idioms/ffi-intro.md)
- [Idiomatic Errors](./idioms/ffi-errors.md)
- [Accepting Strings](./idioms/ffi-accepting-strings.md)
- [Passing Strings](./idioms/ffi-passing-strings.md)
- [Iterating over an `Option`](./idioms/option-iter.md)
- [Pass Variables to Closure](./idioms/pass-var-to-closure.md)
- [Privacy For Extensibility](./idioms/priv-extend.md)
@ -20,6 +24,10 @@
- [Design Patterns](./patterns/index.md)
- [Builder](./patterns/builder.md)
- [Compose Structs](./patterns/compose-structs.md)
- [Entry API](./patterns/entry.md)
- [Foreign function interface usage](./patterns/ffi-intro.md)
- [Object-Based APIs](./patterns/ffi-export.md)
- [Type Consolidation into Wrappers](./patterns/ffi-wrappers.md)
- [Fold](./patterns/fold.md)
- [Newtype](./patterns/newtype.md)
- [RAII Guards](./patterns/RAII.md)

@ -0,0 +1,113 @@
# Accepting Strings
## Description
When accepting strings via FFI through pointers, there are two principles that should be followed:
1. Keep foreign strings "borrowed", rather than copying them directly.
2. Minimize `unsafe` code during the conversion.
## Motivation
Rust has built-in support for C-style strings with its `CString` and `CStr` types.
However, there are different approaches one can take with strings that are being accepted from a foreign caller of a Rust function.
The best practice is simple: use `CStr` in such a way as to minimize unsafe code, and create a borrowed slice.
If an owned String is needed, call `to_string()` on the string slice.
## Code Example
```rust,ignore
pub mod unsafe_module {
    // other module content
    #[no_mangle]
    pub extern "C" fn mylib_log(msg: *const libc::c_char, level: libc::c_int) {
        let level: crate::LogLevel = match level { /* ... */ };
        let msg_str: &str = unsafe {
            // SAFETY: accessing raw pointers expected to live for the call,
            // and creating a shared reference that does not outlive the current
            // stack frame.
            match std::ffi::CStr::from_ptr(msg).to_str() {
                Ok(s) => s,
                Err(e) => {
                    crate::log_error("FFI string conversion failed");
                    return;
                }
            }
        };
        crate::log(msg_str, level);
    }
}
```
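The borrowing step can also be isolated into a helper that is easy to exercise on its own. The following is a minimal sketch, not part of the original example: the name `borrow_foreign_str` is hypothetical, and it uses `std::os::raw::c_char` instead of the `libc` crate so that it has no dependencies.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Borrows a foreign NUL-terminated string as `&str` without copying it.
/// Returns `None` for a null pointer or for bytes that are not valid UTF-8.
///
/// # Safety
/// `ptr` must be null, or point to a NUL-terminated string that stays valid
/// and unmodified for the lifetime `'a` chosen by the caller.
pub unsafe fn borrow_foreign_str<'a>(ptr: *const c_char) -> Option<&'a str> {
    if ptr.is_null() {
        return None;
    }
    // SAFETY: the pointer is non-null (checked above); the caller guarantees
    // that it is NUL-terminated and outlives `'a`.
    let c_str: &'a CStr = unsafe { CStr::from_ptr(ptr) };
    // UTF-8 validation is ordinary safe code.
    c_str.to_str().ok()
}
```

Only the `CStr::from_ptr` call needs `unsafe`; everything after it works on a tracked shared reference.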
## Advantages
The example is written to ensure that:
1. The `unsafe` block is as small as possible.
2. The pointer with an "untracked" lifetime becomes a "tracked" shared reference.
Consider an alternative, where the string is actually copied:
```rust,ignore
pub mod unsafe_module {
    // other module content
    pub extern "C" fn mylib_log(msg: *const libc::c_char, level: libc::c_int) {
        /* DO NOT USE THIS CODE. IT IS UGLY, VERBOSE, AND CONTAINS A SUBTLE BUG. */
        let level: crate::LogLevel = match level { /* ... */ };
        let msg_len = unsafe { /* SAFETY: strlen is what it is, I guess? */
            libc::strlen(msg)
        };
        let mut msg_data = Vec::with_capacity(msg_len + 1);
        let msg_cstr: std::ffi::CString = unsafe {
            // SAFETY: copying from a foreign pointer expected to live
            // for the entire stack frame into owned memory
            std::ptr::copy_nonoverlapping(msg.cast::<u8>(), msg_data.as_mut_ptr(), msg_len);
            msg_data.set_len(msg_len + 1);
            std::ffi::CString::from_vec_with_nul(msg_data).unwrap()
        };
        let msg_str: String = unsafe {
            match msg_cstr.into_string() {
                Ok(s) => s,
                Err(e) => {
                    crate::log_error("FFI string conversion failed");
                    return;
                }
            }
        };
        crate::log(&msg_str, level);
    }
}
```
This code is inferior to the original in two respects:
1. There is much more `unsafe` code, and more importantly, more invariants it must uphold.
2. Due to the extensive arithmetic required, there is a bug in this version that causes Rust `undefined behaviour`.
The bug here is a simple mistake in pointer arithmetic: the string was copied, all `msg_len` bytes of it.
However, the `NUL` terminator at the end was not.
The Vector then had its size *set* to the length of the *zero padded string* -- rather than *resized* to it, which could have added a zero at the end. As a result, the last byte in the Vector is uninitialized memory.
When the `CString` is created at the bottom of the block, its read of the Vector will cause `undefined behaviour`!
Like many such issues, this would be a difficult issue to track down.
Sometimes it would panic because the string was not `UTF-8`, sometimes it would put a weird character at the end of the string, sometimes it would just completely crash.
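If an owned copy really is required, it can be made from the borrowed `CStr` without any manual length arithmetic. This is a hedged sketch rather than the original author's code: `copy_foreign_string` is a hypothetical name, and `std::os::raw::c_char` stands in for `libc::c_char`.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;
use std::str::Utf8Error;

/// Copies a foreign NUL-terminated string into an owned `String`.
///
/// # Safety
/// `msg` must be non-null and point to a NUL-terminated string that is
/// valid for the duration of the call.
pub unsafe fn copy_foreign_string(msg: *const c_char) -> Result<String, Utf8Error> {
    // SAFETY: guaranteed by the caller; this is the only unsafe operation.
    let borrowed: &CStr = unsafe { CStr::from_ptr(msg) };
    // Validation and the copy are both safe code, and the standard library
    // works out the length, so there is no terminator to forget.
    borrowed.to_str().map(str::to_owned)
}
```

The copy outlives the foreign buffer, and the single invariant left to uphold is the one `CStr::from_ptr` documents.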
## Disadvantages
None?

@ -0,0 +1,134 @@
# Error Handling in FFI
## Description
In foreign languages like C, errors are represented by return codes.
However, Rust's type system allows much richer error information to be captured and propagated through a full type.
This best practice shows different kinds of error codes, and how to expose them in a usable way:
1. Flat Enums should be converted to integers and returned as codes.
2. Structured Enums should be converted to an integer code with a string error message for detail.
3. Custom Error Types should become "transparent", with a C representation.
## Code Example
### Flat Enums
```rust,ignore
enum DatabaseError {
    IsReadOnly = 1,    // user attempted a write operation
    IOError = 2,       // user should read the C errno() for what it was
    FileCorrupted = 3, // user should run a repair tool to recover it
}
impl From<DatabaseError> for libc::c_int {
    fn from(e: DatabaseError) -> libc::c_int {
        (e as i8).into()
    }
}
```
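To show how such a conversion is typically used at the boundary, here is a small sketch under stated assumptions: `db_write_byte` and `try_write` are hypothetical, `0` is assumed to mean success, and `std::os::raw::c_int` replaces `libc::c_int` to keep the block dependency-free. A real export would also carry `#[no_mangle]`.

```rust
use std::os::raw::c_int;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i8)]
pub enum DatabaseError {
    IsReadOnly = 1,
    IOError = 2,
    FileCorrupted = 3,
}

impl From<DatabaseError> for c_int {
    fn from(e: DatabaseError) -> c_int {
        // The discriminant fits in an i8, which widens losslessly to c_int.
        (e as i8).into()
    }
}

/// Hypothetical internal operation that fails on a read-only database.
fn try_write(read_only: bool) -> Result<(), DatabaseError> {
    if read_only {
        Err(DatabaseError::IsReadOnly)
    } else {
        Ok(())
    }
}

/// Hypothetical exported function: returns 0 on success, otherwise the
/// integer code of the `DatabaseError`.
pub extern "C" fn db_write_byte(read_only: c_int, _byte: u8) -> c_int {
    match try_write(read_only != 0) {
        Ok(()) => 0,
        Err(e) => e.into(),
    }
}
```

Keeping `0` out of the enum's discriminants leaves it free to signal success, which matches common C conventions.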
### Structured Enums
```rust,ignore
pub mod errors {
    pub enum DatabaseError {
        IsReadOnly,
        IOError(std::io::Error),
        FileCorrupted(String), // message describing the issue
    }
    impl From<DatabaseError> for libc::c_int {
        fn from(e: DatabaseError) -> libc::c_int {
            match e {
                DatabaseError::IsReadOnly => 1,
                DatabaseError::IOError(_) => 2,
                DatabaseError::FileCorrupted(_) => 3,
            }
        }
    }
}
pub mod c_api {
    use super::errors::DatabaseError;
    #[no_mangle]
    pub extern "C" fn db_error_description(
        e: *const DatabaseError
    ) -> *mut libc::c_char {
        let error: &DatabaseError = unsafe {
            /* SAFETY: pointer lifetime is greater than the current stack frame */
            &*e
        };
        // Each arm evaluates to a String, so the format! calls must not end
        // with a semicolon.
        let error_str: String = match error {
            DatabaseError::IsReadOnly => {
                format!("cannot write to read-only database")
            }
            DatabaseError::IOError(e) => {
                format!("I/O Error: {}", e)
            }
            DatabaseError::FileCorrupted(s) => {
                format!("File corrupted, run repair: {}", &s)
            }
        };
        let c_error = unsafe {
            // SAFETY: copying error_str to an allocated buffer with a NUL
            // character at the end
            let malloc: *mut u8 = libc::malloc(error_str.len() + 1) as *mut _;
            if malloc.is_null() {
                return std::ptr::null_mut();
            }
            let src = error_str.as_bytes().as_ptr();
            std::ptr::copy_nonoverlapping(src, malloc, error_str.len());
            std::ptr::write(malloc.add(error_str.len()), 0);
            malloc as *mut libc::c_char
        };
        c_error
    }
}
```
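The example above allocates with `libc::malloc`, so the C caller can release the message with `free`. An alternative, sketched here as an assumption and not taken from the original text, keeps the allocation on the Rust side with `CString::into_raw` and pairs it with an explicit free function. The names `mylib_error_message` and `mylib_error_message_free` are hypothetical.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

/// Hands a Rust-built message to the caller as an owned C string.
/// Returns null if the message contains an interior NUL byte.
pub extern "C" fn mylib_error_message(code: i32) -> *mut c_char {
    let text = match code {
        1 => "cannot write to read-only database".to_string(),
        2 => "I/O error".to_string(),
        other => format!("unknown error code {other}"),
    };
    match CString::new(text) {
        // Ownership moves to the caller until the free function is called.
        Ok(s) => s.into_raw(),
        Err(_) => std::ptr::null_mut(),
    }
}

/// Releases a message returned by `mylib_error_message`.
///
/// # Safety
/// `msg` must be null, or a pointer obtained from `mylib_error_message`
/// that has not been freed yet.
pub unsafe extern "C" fn mylib_error_message_free(msg: *mut c_char) {
    if !msg.is_null() {
        // SAFETY: the caller guarantees the pointer came from `into_raw`.
        drop(unsafe { CString::from_raw(msg) });
    }
}
```

The trade-off is that the C side must call the matching free function instead of `free`, since the two allocators are not guaranteed to be the same.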
### Custom Error Types
```rust,ignore
struct ParseError {
    expected: char,
    line: u32,
    ch: u16
}
impl ParseError { /* ... */ }
/* Create a second version which is exposed as a C structure */
#[repr(C)]
pub struct parse_error {
    pub expected: libc::c_char,
    pub line: u32,
    pub ch: u16
}
impl From<ParseError> for parse_error {
    fn from(e: ParseError) -> parse_error {
        let ParseError { expected, line, ch } = e;
        // A Rust `char` is not a C `char`: the cast truncates anything
        // outside the single-byte range.
        parse_error { expected: expected as libc::c_char, line, ch }
    }
}
```
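A transparent error struct is usually delivered through an out-parameter next to an integer return code. The sketch below illustrates that under assumptions: `mylib_parse` and the toy `parse` function are hypothetical, and `std::os::raw` types replace the `libc` ones.

```rust
use std::os::raw::{c_char, c_int};

pub struct ParseError {
    pub expected: char,
    pub line: u32,
    pub ch: u16,
}

#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[allow(non_camel_case_types)]
pub struct parse_error {
    pub expected: c_char,
    pub line: u32,
    pub ch: u16,
}

impl From<ParseError> for parse_error {
    fn from(e: ParseError) -> parse_error {
        parse_error { expected: e.expected as c_char, line: e.line, ch: e.ch }
    }
}

/// Toy parser (hypothetical): accepts only input that starts with `{`.
fn parse(input: &[u8]) -> Result<(), ParseError> {
    match input.first() {
        Some(b'{') => Ok(()),
        _ => Err(ParseError { expected: '{', line: 1, ch: 1 }),
    }
}

/// Returns 0 on success. On failure returns -1 and, if `err_out` is
/// non-null, writes the error details through it.
///
/// # Safety
/// `data` must be null or point to `len` readable bytes; `err_out` must be
/// null or valid for writing one `parse_error`.
pub unsafe extern "C" fn mylib_parse(
    data: *const u8,
    len: usize,
    err_out: *mut parse_error,
) -> c_int {
    let input: &[u8] = if data.is_null() {
        &[]
    } else {
        // SAFETY: the caller guarantees `len` readable bytes at `data`.
        unsafe { std::slice::from_raw_parts(data, len) }
    };
    match parse(input) {
        Ok(()) => 0,
        Err(e) => {
            if !err_out.is_null() {
                // SAFETY: the caller guarantees `err_out` is writable.
                unsafe { err_out.write(e.into()) };
            }
            -1
        }
    }
}
```

The caller owns the `parse_error` value, so no allocation crosses the boundary.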
## Advantages
This ensures that the foreign language has clear access to error information while not compromising the Rust code's API at all.
## Disadvantages
It's a lot of typing, and some types may not be able to be converted easily to C.

@ -0,0 +1,12 @@
# FFI Idioms
Writing FFI code is an entire course in itself.
However, there are several idioms here that can act as pointers, and avoid traps for inexperienced users of `unsafe` Rust.
This section contains idioms that may be useful when doing FFI.
1. [Idiomatic Errors](./ffi-errors.md) - Error handling with integer codes and sentinel return values (such as `NULL` pointers)
2. [Accepting Strings](./ffi-accepting-strings.md) with minimal unsafe code
3. [Passing Strings](./ffi-passing-strings.md) to FFI functions

@ -0,0 +1,95 @@
# Passing Strings
## Description
When passing strings to FFI functions, there are four principles that should be followed:
1. Make the lifetime of owned strings as long as possible.
2. Minimize `unsafe` code during the conversion.
3. If the C code can modify the string data, use `Vec` instead of `CString`.
4. Unless the Foreign Function API requires it, the ownership of the string should not transfer to the callee.
## Motivation
Rust has built-in support for C-style strings with its `CString` and `CStr` types.
However, there are different approaches one can take with strings that are being sent to a foreign function call from a Rust function.
The best practice is simple: use `CString` in such a way as to minimize `unsafe` code.
However, a secondary caveat is that *the object must live long enough*, meaning the lifetime should be maximized.
In addition, the documentation explains that "round-tripping" a `CString` after modification is UB, so additional work is necessary in that case.
## Code Example
```rust,ignore
pub mod unsafe_module {
    // other module content
    extern "C" {
        fn seterr(message: *const libc::c_char);
        fn geterr(buffer: *mut libc::c_char, size: libc::c_int) -> libc::c_int;
    }
    fn report_error_to_ffi<S: Into<String>>(
        err: S
    ) -> Result<(), std::ffi::NulError> {
        let c_err = std::ffi::CString::new(err.into())?;
        unsafe {
            // SAFETY: calling an FFI whose documentation says the pointer is
            // const, so no modification should occur
            seterr(c_err.as_ptr());
        }
        Ok(())
        // The lifetime of c_err continues until here
    }
    fn get_error_from_ffi() -> Result<String, std::ffi::IntoStringError> {
        let mut buffer = vec![0u8; 1024];
        unsafe {
            // SAFETY: calling an FFI whose documentation implies
            // that the input need only live as long as the call
            let written = geterr(buffer.as_mut_ptr().cast::<libc::c_char>(), 1023);
            // Assumes geterr returns the number of bytes written, excluding
            // the NUL terminator; CString::new appends its own terminator.
            buffer.truncate(usize::try_from(written).unwrap_or(0));
        }
        std::ffi::CString::new(buffer).unwrap().into_string()
    }
}
```
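The buffer-filling direction can be tried out without a C library by standing in a Rust function for the foreign one. This is a sketch under stated assumptions: `mock_geterr` is a hypothetical substitute for `geterr`, and it is assumed to write at most `size` bytes plus a NUL and to return the number of bytes written, not counting the NUL.

```rust
use std::ffi::CStr;
use std::os::raw::{c_char, c_int};

/// Stand-in for the foreign `geterr` (assumption: see the contract above).
unsafe extern "C" fn mock_geterr(buffer: *mut c_char, size: c_int) -> c_int {
    let msg = b"disk full";
    let n = msg.len().min(size.max(0) as usize);
    // SAFETY: the caller provides at least `size + 1` writable bytes.
    unsafe {
        std::ptr::copy_nonoverlapping(msg.as_ptr().cast::<c_char>(), buffer, n);
        buffer.add(n).write(0);
    }
    n as c_int
}

pub fn get_error_from_ffi() -> Option<String> {
    // A plain Vec rather than a CString, because the callee mutates the bytes.
    let mut buffer = vec![0u8; 1024];
    // Leave one byte of room for the NUL terminator.
    let capacity = (buffer.len() - 1) as c_int;
    // SAFETY: `buffer` has `capacity + 1` writable bytes and outlives the call.
    let written = unsafe { mock_geterr(buffer.as_mut_ptr().cast::<c_char>(), capacity) };
    if written < 0 {
        return None;
    }
    // The buffer started zero-filled, so a terminator is always found.
    let c_str = CStr::from_bytes_until_nul(&buffer).ok()?;
    c_str.to_str().ok().map(str::to_owned)
}
```

Using `CStr::from_bytes_until_nul` lets the standard library find the terminator, so the result does not depend on arithmetic with the returned length.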
## Advantages
The example is written in a way to ensure that:
1. The `unsafe` block is as small as possible.
2. The `CString` lives long enough.
3. Errors with typecasts are always propagated when possible.
A common mistake (so common it's in the documentation) is to not use the variable in the first block:
```rust,ignore
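pub mod unsafe_module {
    // other module content
    fn report_error<S: Into<String>>(err: S) -> Result<(), std::ffi::NulError> {
        // The temporary CString is dropped at the end of this statement,
        // so only the raw pointer survives.
        let c_err_ptr = std::ffi::CString::new(err.into())?.as_ptr();
        unsafe {
            // SAFETY: whoops, this contains a dangling pointer!
            seterr(c_err_ptr);
        }
        Ok(())
    }
}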
```
This code will result in a dangling pointer, because the lifetime of the `CString` is not extended by the pointer creation, unlike if a reference were created.
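For contrast, here is a runnable sketch of the correct shape, where the `CString` is bound to a variable that outlives the call. The foreign function is replaced by a hypothetical Rust stand-in, `mock_seterr`, which returns the length it observed so the behaviour can be checked.

```rust
use std::ffi::{CStr, CString, NulError};
use std::os::raw::c_char;

/// Stand-in for a foreign function that only reads the string
/// (hypothetical; it returns the number of bytes it saw).
unsafe extern "C" fn mock_seterr(message: *const c_char) -> usize {
    // SAFETY: the caller passes a valid NUL-terminated string.
    unsafe { CStr::from_ptr(message) }.to_bytes().len()
}

pub fn report_error(err: impl Into<Vec<u8>>) -> Result<usize, NulError> {
    // Binding the CString to a named variable keeps its buffer alive
    // until the end of the function, not just the end of one statement.
    let c_err = CString::new(err)?;
    // SAFETY: `c_err` is NUL-terminated and lives past the call.
    let seen = unsafe { mock_seterr(c_err.as_ptr()) };
    Ok(seen)
    // `c_err` is dropped here, after the foreign call has returned.
}
```

The `?` also propagates the `NulError` raised when the input contains an interior NUL byte.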
Another issue frequently raised is that the initialization of a 1k vector of zeroes is "slow".
However, recent versions of Rust actually optimize that particular macro to a zeroed allocation request (`alloc_zeroed`, which the system allocator typically serves with `calloc`), meaning it is as fast as the operating system's ability to return zeroed memory (which is quite fast).
## Disadvantages
None?

@ -0,0 +1,215 @@
# Object-Based APIs
## Description
When designing APIs in Rust which are exposed to other languages, there are some important design principles which are contrary to normal Rust API design:
1. All Encapsulated types should be *owned* by Rust, *managed* by the user, and *opaque*.
2. All Transactional data types should be *owned* by the user, and *transparent*.
3. All library behavior should be functions acting upon Encapsulated types.
4. All library behavior should be encapsulated into types not based on structure, but *provenance/lifetime*.
## Motivation
Rust has built-in FFI support to other languages.
It does this by providing a way for crate authors to provide C-compatible APIs through different ABIs (though that is unimportant to this practice).
Well-designed Rust FFI follows C API design principles, while compromising the design in Rust as little as possible. There are three goals with any foreign API:
1. Make it easy to use in the target language.
2. Avoid the API dictating internal unsafety on the Rust side as much as possible.
3. Keep the potential for memory unsafety and Rust `undefined behaviour` as small as possible.
Rust code must trust the memory safety of the foreign language beyond a certain point.
However, every bit of `unsafe` code on the Rust side is an opportunity for bugs, or to exacerbate `undefined behaviour`.
For example, if a pointer provenance is wrong, that may be a segfault due to invalid memory access.
But if it is manipulated by unsafe code, it could become full-blown heap corruption.
The Object-Based API design allows for writing shims that have good memory safety characteristics, and a clean boundary of what is safe and what is `unsafe`.
## Code Example
The POSIX standard defines the API to access an on-file database, known as [DBM](https://web.archive.org/web/20210105035602/https://www.mankier.com/0p/ndbm.h). It is an excellent example of an "object-based" API.
Here is the definition in C, which hopefully should be easy to read for those involved in FFI.
The commentary below should help explain it for those who miss the subtleties.
```C
typedef struct DBM DBM;
typedef struct { void *dptr; size_t dsize; } datum;
int dbm_clearerr(DBM *);
void dbm_close(DBM *);
int dbm_delete(DBM *, datum);
int dbm_error(DBM *);
datum dbm_fetch(DBM *, datum);
datum dbm_firstkey(DBM *);
datum dbm_nextkey(DBM *);
DBM *dbm_open(const char *, int, mode_t);
int dbm_store(DBM *, datum, datum, int);
```
This API defines two types: `DBM` and `datum`.
The `DBM` type was called an "encapsulated" type above.
It is designed to contain internal state, and acts as an entry point for the library's behavior.
It is completely opaque to the user, who cannot create a `DBM` themselves since they don't know its size or layout.
Instead, they must call `dbm_open`, and that only gives them *a pointer to one*.
This means all `DBM`s are "owned" by the library in a Rust sense. The internal state of unknown size is kept in memory controlled by the library, not the user.
The user can only manage its life cycle with `open` and `close`, and perform operations on it with the other functions.
The `datum` type was called a "transactional" type above. It is designed to facilitate the exchange of information between the library and its user.
The database is designed to store "unstructured data", with no pre-defined length or meaning.
As a result, the `datum` is the C equivalent of a Rust slice: a bunch of bytes, and a count of how many there are.
The main difference is that there is no type information, which is what `void` indicates.
Keep in mind that this header is written from the library's point of view.
The user likely has some type they are using, which has a known size.
But the library does not care, and by the rules of C casting, any type behind a pointer can be cast to `void`.
As noted earlier, this type is *transparent* to the user. But also, this type is *owned* by the user.
This has subtle ramifications, due to that pointer inside it.
The question is, who owns the memory that pointer points to?
The answer for best memory safety is, "the user".
But in cases such as retrieving a value, the user does not know how to allocate it correctly (since they don't know how long the value is).
In this case, the library code is expected to use the heap that the user has access to -- such as the C library `malloc` and `free` -- and then *transfer ownership* in the Rust sense.
This may all seem speculative, but this is what a pointer means in C.
It means the same thing as Rust: "user defined lifetime."
The user of the library needs to read the documentation in order to use it correctly.
That said, there are some decisions that have fewer or greater consequences if users do it wrong.
Minimizing those is what this best practice is about, and the key is to *transfer ownership of everything that is transparent*.
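To make the two kinds of types concrete from the Rust side, here is a minimal sketch of an object-based export. Everything in it is an assumption for illustration: `Db`, `Datum`, and the `db_*` functions are hypothetical and are not the POSIX API. A real export would add `#[no_mangle]`, and fetching values (which requires transferring ownership of a buffer) is left out.

```rust
use std::collections::HashMap;
use std::os::raw::c_int;

/// Encapsulated type: owned by Rust, opaque to the caller.
pub struct Db {
    entries: HashMap<Vec<u8>, Vec<u8>>,
}

/// Transactional type: transparent, owned by the caller.
#[repr(C)]
pub struct Datum {
    pub dptr: *const u8,
    pub dsize: usize,
}

/// Creates a database and hands the caller a pointer to manage.
pub extern "C" fn db_open() -> *mut Db {
    Box::into_raw(Box::new(Db { entries: HashMap::new() }))
}

/// # Safety
/// `db` must be null or come from `db_open`, and must not be used afterwards.
pub unsafe extern "C" fn db_close(db: *mut Db) {
    if !db.is_null() {
        // SAFETY: the pointer came from `Box::into_raw` in `db_open`.
        drop(unsafe { Box::from_raw(db) });
    }
}

/// # Safety
/// `d.dptr` must be null or point to `d.dsize` readable bytes.
unsafe fn datum_bytes<'a>(d: &'a Datum) -> &'a [u8] {
    if d.dptr.is_null() {
        &[]
    } else {
        unsafe { std::slice::from_raw_parts(d.dptr, d.dsize) }
    }
}

/// Copies key and value into the database. Returns 0, or -1 on a null handle.
///
/// # Safety
/// `db` must be null or a live handle; both datums must be valid.
pub unsafe extern "C" fn db_store(db: *mut Db, key: Datum, value: Datum) -> c_int {
    let Some(db) = (unsafe { db.as_mut() }) else { return -1 };
    let (k, v) = unsafe { (datum_bytes(&key).to_vec(), datum_bytes(&value).to_vec()) };
    db.entries.insert(k, v);
    0
}

/// Returns 1 if the key exists, 0 if not, -1 on a null handle.
///
/// # Safety
/// `db` must be null or a live handle; `key` must be valid.
pub unsafe extern "C" fn db_contains(db: *const Db, key: Datum) -> c_int {
    let Some(db) = (unsafe { db.as_ref() }) else { return -1 };
    let k = unsafe { datum_bytes(&key) };
    db.entries.contains_key(k) as c_int
}
```

All library behavior hangs off the `Db` handle, and the library copies whatever a `Datum` points to, so it never retains a pointer into caller-owned memory.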
## Advantages
This minimizes the number of memory safety guarantees the user must uphold to a relatively small number:
1. Do not call any function with a pointer not returned by `dbm_open` (invalid access or corruption).
2. Do not call any function on a pointer after close (use after free).
3. The `dptr` on any `datum` must be `NULL`, or point to a valid slice of memory at the advertised length.
In addition, it avoids a lot of pointer provenance issues.
To understand why, let us consider an alternative in some depth: key iteration.
Rust is well known for its iterators.
When implementing one, the programmer makes a separate type with a bounded lifetime to its owner, and implements the `Iterator` trait.
Here is how iteration would be done in Rust for `DBM`:
```rust,ignore
struct Dbm { ... }
impl Dbm {
    /* ... */
    pub fn keys<'it>(&'it self) -> DbmKeysIter<'it> { ... }
    /* ... */
}
struct DbmKeysIter<'it> {
    owner: &'it Dbm,
}
impl<'it> Iterator for DbmKeysIter<'it> { ... }
```
This is clean, idiomatic, and safe, thanks to Rust's guarantees.
However, consider what a straightforward API translation would look like:
```rust,ignore
#[no_mangle]
pub extern "C" fn dbm_iter_new(owner: *const Dbm) -> *mut DbmKeysIter {
    /* THIS API IS A BAD IDEA! For real applications, use object-based design instead. */
}
#[no_mangle]
pub extern "C" fn dbm_iter_next(iter: *mut DbmKeysIter, key_out: *mut datum) -> libc::c_int {
    /* THIS API IS A BAD IDEA! For real applications, use object-based design instead. */
}
#[no_mangle]
pub extern "C" fn dbm_iter_del(iter: *mut DbmKeysIter) {
    /* THIS API IS A BAD IDEA! For real applications, use object-based design instead. */
}
```
This API loses a key piece of information: the lifetime of the iterator must not exceed the lifetime of the `Dbm` object that owns it.
A user of the library could use it in a way which causes the iterator to outlive the data it is iterating on, resulting in reading uninitialized memory.
This example written in C contains a bug that will be explained afterwards:
```C
int count_key_sizes(DBM *db) {
    /* DO NOT USE THIS FUNCTION. IT HAS A SUBTLE BUT SERIOUS BUG! */
    datum key;
    int len = 0;
    DbmKeysIter *iter = dbm_iter_new(db);
    if (!iter) {
        dbm_close(db);
        return -1;
    }
    int l;
    while ((l = dbm_iter_next(iter, &key)) >= 0) { // an error is indicated by -1
        free(key.dptr);
        len += key.dsize;
        if (l == 0) { // end of the iterator
            dbm_close(db);
        }
    }
    if (l >= 0) {
        return -1;
    } else {
        return len;
    }
}
```
This bug is a classic. Here's what happens when the iterator returns the end-of-iteration marker:
1. The loop condition sets `l` to zero, and enters the loop because `0 >= 0`.
2. The length is incremented, in this case by zero.
3. The if statement is true, so the database is closed. There should be a break statement here.
4. The loop condition executes again, causing a `next` call on the closed object.
The worst part about this bug?
If the Rust implementation was careful, this code will work most of the time!
If the memory for the `Dbm` object is not immediately reused, an internal check will almost certainly fail, resulting in the iterator returning a `-1` indicating an error.
But occasionally, it will cause a segmentation fault, or even worse, nonsensical memory corruption!
None of this can be avoided by Rust.
From its perspective, it put those objects on its heap, returned pointers to them, and gave up control of their lifetimes. The C code simply must "play nice".
The programmer must read and understand the API documentation.
While some consider that par for the course in C, a good API design can mitigate this risk.
The POSIX API for `DBM` did this by *consolidating the ownership* of the iterator with its parent:
```C
datum dbm_firstkey(DBM *);
datum dbm_nextkey(DBM *);
```
Thus, all of the lifetimes were bound together, and such unsafety was prevented.
## Disadvantages
However, this design choice also has a number of drawbacks, which should be considered as well.
First, the API itself becomes less expressive.
With POSIX DBM, there is only one iterator per object, and every call changes its state.
This is much more restrictive than iterators in almost any language, even though it is safe.
Perhaps with other related objects, whose lifetimes are less hierarchical, this limitation is more of a cost than the safety.
Second, depending on the relationships of the API's parts, significant design effort may be involved.
Many of the easier design points have other patterns associated with them:
- [Wrapper Type Consolidation](./ffi-wrappers.md) groups multiple Rust types together into an opaque "object"
- [FFI Error Passing](../idioms/ffi-errors.md) explains error handling with integer codes and sentinel return values (such as `NULL` pointers)
- [Accepting Foreign Strings](../idioms/ffi-accepting-strings.md) allows accepting strings with minimal unsafe code, and is easier to get right than [Passing Strings to FFI](../idioms/ffi-passing-strings.md)
However, not every API can be done this way.
It is up to the best judgement of the programmer as to who their audience is.

@ -0,0 +1,10 @@
# FFI Patterns
Writing FFI code is an entire course in itself.
However, there are several idioms here that can act as pointers, and avoid traps for inexperienced users of unsafe Rust.
This section contains design patterns that may be useful when doing FFI.
1. [Object-Based API](./ffi-export.md) design that has good memory safety characteristics, and a clean boundary of what is safe and what is unsafe
2. [Type Consolidation into Wrappers](./ffi-wrappers.md) - group multiple Rust types together into an opaque "object"

@ -0,0 +1,138 @@
# Type Consolidation into Wrappers
## Description
This pattern is designed to allow gracefully handling multiple related types, while minimizing the surface area for memory unsafety.
One of the cornerstones of Rust's aliasing rules is lifetimes.
This ensures that many patterns of access between types can be memory safe, data race safety included.
However, when Rust types are exported to other languages, they are usually transformed into pointers.
In Rust, a pointer means "the user manages the lifetime of the pointee." It is their responsibility to avoid memory unsafety.
Some level of trust in the user code is thus required, notably around use-after-free which Rust can do nothing about.
However, some API designs place higher burdens than others on the code written in the other language.
The lowest risk API is the "consolidated wrapper", where all possible interactions with an object are folded into a "wrapper type", while keeping the Rust API clean.
## Code Example
To understand this, let us look at a classic example of an API to export: iteration through a collection.
That API looks like this:
1. The iterator is initialized with `first_key`.
2. Each call to `next_key` will advance the iterator.
3. Calls to `next_key` if the iterator is at the end will do nothing.
4. As noted above, the iterator is "wrapped into" the collection (unlike the native Rust API).
If the iterator implements `nth()` efficiently, then it is possible to make it ephemeral to each function call:
```rust,ignore
struct MySetWrapper {
    myset: MySet,
    iter_next: usize,
}
impl MySetWrapper {
    pub fn first_key(&mut self) -> Option<&Key> {
        self.iter_next = 0;
        self.next_key()
    }
    pub fn next_key(&mut self) -> Option<&Key> {
        if let Some(next) = self.myset.keys().nth(self.iter_next) {
            self.iter_next += 1;
            Some(next)
        } else {
            None
        }
    }
}
```
As a result, the wrapper is simple and contains no `unsafe` code.
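A self-contained variant of the wrapper can be run as-is. In this sketch `Key` and `MySet` are hypothetical stand-ins (`String` and `BTreeSet<String>`), chosen only because they give a deterministic order; note that `nth()` on a `BTreeSet` iterator is linear, so this particular choice does not satisfy the "efficient `nth()`" condition for large sets.

```rust
use std::collections::BTreeSet;

// Hypothetical stand-ins for the types named in the example above.
type Key = String;
type MySet = BTreeSet<Key>;

pub struct MySetWrapper {
    myset: MySet,
    iter_next: usize,
}

impl MySetWrapper {
    pub fn new(myset: MySet) -> Self {
        MySetWrapper { myset, iter_next: 0 }
    }

    /// Restarts iteration and returns the first key, if any.
    pub fn first_key(&mut self) -> Option<&Key> {
        self.iter_next = 0;
        self.next_key()
    }

    /// Returns the next key, or `None` once the end has been reached.
    pub fn next_key(&mut self) -> Option<&Key> {
        // A fresh iterator is built on every call, so the wrapper never
        // stores a borrow of `myset` between calls.
        let next = self.myset.iter().nth(self.iter_next)?;
        self.iter_next += 1;
        Some(next)
    }
}
```

Calling `next_key` at the end keeps returning `None` without advancing the position, which matches rule 3 of the API described above.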
## Advantages
This makes APIs safer to use, avoiding issues with lifetimes between types.
See [Object-Based APIs](./ffi-export.md) for more on the advantages and pitfalls this avoids.
## Disadvantages
Often, wrapping types is quite difficult, and sometimes a Rust API compromise would make things easier.
As an example, consider an iterator which does not efficiently implement `nth()`.
It would definitely be worth putting in special logic to make the object handle iteration internally, or to support a different access pattern efficiently that only the Foreign Function API will use.
### Trying to Wrap Iterators (and Failing)
To wrap any type of iterator into the API correctly, the wrapper would need to do what a C version of the code would do: erase the lifetime of the iterator, and manage it manually.
Suffice it to say, this is *incredibly* difficult.
Here is an illustration of just *one* pitfall.
A first version of `MySetWrapper` would look like this:
```rust,ignore
struct MySetWrapper {
    myset: MySet,
    iter_next: usize,
    // created from a transmuted Box<KeysIter + 'self>
    iterator: Option<NonNull<KeysIter<'static>>>,
}
```
With `transmute` being used to extend a lifetime, and a pointer to hide it, it's ugly already.
But it gets even worse: *any other operation can cause Rust `undefined behaviour`*.
Consider that the `MySet` in the wrapper could be manipulated by other functions during iteration, such as storing a new value to the key it was iterating over.
The API doesn't discourage this, and in fact some similar C libraries expect it.
A simple implementation of `myset_store` would be:
```rust,ignore
pub mod unsafe_module {
    // other module content
    pub fn myset_store(
        myset: *mut MySetWrapper,
        key: datum,
        value: datum) -> libc::c_int {
        /* DO NOT USE THIS CODE. IT IS UNSAFE TO DEMONSTRATE A PROBLEM. */
        let myset: &mut MySet = unsafe { // SAFETY: whoops, UB occurs in here!
            &mut (*myset).myset
        };
        /* ...check and cast key and value data... */
        match myset.store(casted_key, casted_value) {
            Ok(_) => 0,
            Err(e) => e.into()
        }
    }
}
```
If the iterator exists when this function is called, we have violated one of Rust's aliasing rules.
According to Rust, the mutable reference in this block must have *exclusive* access to the object.
If the iterator simply exists, it's not exclusive, so we have `undefined behaviour`! [^1]
To avoid this, we must have a way of ensuring that mutable reference really is exclusive.
That basically means clearing out the iterator's shared reference while it exists, and then reconstructing it.
In most cases, that will still be less efficient than the C version.
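One safe way to approximate "clearing and reconstructing" the iterator is to store only the last key that was handed out and rebuild a short-lived iterator from that position on each call. The sketch below is an illustration under assumptions, not the approach of any particular library: `StoreWrapper` is hypothetical and uses a `BTreeMap<u32, String>` so that a position can be resumed with `range`.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

pub struct StoreWrapper {
    map: BTreeMap<u32, String>,
    // Last key handed out: a plain value, not a borrow of `map`.
    cursor: Option<u32>,
}

impl StoreWrapper {
    pub fn new() -> Self {
        StoreWrapper { map: BTreeMap::new(), cursor: None }
    }

    /// Mutation needs `&mut self`, and that access really is exclusive:
    /// no iterator exists between calls.
    pub fn store(&mut self, key: u32, value: &str) {
        self.map.insert(key, value.to_owned());
    }

    pub fn first_key(&mut self) -> Option<u32> {
        self.cursor = None;
        self.next_key()
    }

    pub fn next_key(&mut self) -> Option<u32> {
        // Reconstruct the iterator just past the last key handed out.
        let lower = match self.cursor {
            Some(last) => Bound::Excluded(last),
            None => Bound::Unbounded,
        };
        let next = self
            .map
            .range((lower, Bound::Unbounded))
            .next()
            .map(|(k, _)| *k)?;
        self.cursor = Some(next);
        Some(next)
    }
}
```

Each `next_key` call pays for a tree lookup instead of a pointer bump, which is the kind of cost the text refers to, but storing values during iteration is well-defined.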
Some may ask: how can C do this more efficiently?
The answer is, it cheats. Rust's aliasing rules are the problem, and C simply ignores them for its pointers.
In exchange, it is common to see code that is declared in the manual as "not thread safe" under some or all circumstances.
In fact, [The GNU C library has an entire lexicon dedicated to concurrent behavior!](https://manpages.debian.org/buster/manpages/attributes.7.en.html)
Rust would rather make everything memory safe all the time, for both safety and optimizations that C code cannot attain.
Being denied access to certain shortcuts is the price Rust programmers need to pay.
[^1]: For the C programmers out there scratching their heads, the iterator need not be read *during* this code to cause the UB.
The exclusivity rule also enables compiler optimizations which may cause inconsistent observations by the iterator's shared reference (e.g. stack spills or reordering instructions for efficiency).
These observations may happen *any time after* the mutable reference is created.