Rust is a new language that already has good textbooks. But sometimes its textbooks are difficult because they are for native English speakers. Many companies and people now learn Rust, and could learn faster with a book that has easy English. This textbook is for these companies and people to learn Rust with simple English.
It is now late July, and *Easy Rust* is about 200 pages long. I am still writing it so there will be much more content. I plan to finish writing the main content by around August 15. You can contact me here or [on LinkedIn](https://www.linkedin.com/in/davemacleod) if you have any questions. I am a Canadian who lives in Korea, and as I write Easy Rust I think of how to make it easy for companies here to start using it. I hope that other countries that don't use English as a first language can use it too.
Maybe you don't want to install Rust yet, and that's okay. You can go to [https://play.rust-lang.org/](https://play.rust-lang.org/) and start writing Rust. You can write your code there and click Run to see the results.
- Change Debug to Release if you want your code to be faster. Debug: compiles faster, runs slower, contains debug information. Release: compiles slower, runs much faster, removes debug information.
- Click on Share to get a url. You can use that to share your code if you want help.
- Tools: Rustfmt will format your code nicely.
- Tools: Clippy will give you extra information about how to make your code better.
- Config: here you can change your theme to dark mode, and many other configurations.
If you want to install Rust, go here [https://www.rust-lang.org/tools/install](https://www.rust-lang.org/tools/install) and follow the instructions. Usually you will use `rustup` to install and update Rust.
Rust has simple types that are called **primitive types**. We will start with integers. Integers are whole numbers with no decimal point. There are two types of integers:
Signs means `+` (plus sign) and `-` (minus sign), so signed integers can be positive or negative (e.g. +8, -8). But unsigned integers can only be positive, because they do not have a sign.
The number after the i or the u means the number of bits for the number, so numbers with more bits can be larger. 8 bits = one byte, so `i8` is one byte, `i64` is 8 bytes, and so on. Number types with larger sizes can hold larger numbers. For example, a `u8` can hold up to 255, but a `u16` can hold up to 65535. And a `u128` can hold up to 340282366920938463463374607431768211455.
So what is `isize` and `usize`? This means the number of bits on your type of computer. (This is called the **architecture** of your computer.) So `isize` and `usize` on a 32-bit computer is like `i32` and `u32`, and `isize` and `usize` on a 64-bit computer is like `i64` and `u64`.
There are many reasons for the different types of integers. One reason is computer performance: a smaller number of bytes is faster to process. But here are some other uses:
Characters in Rust are called `char`. Every `char` has a number: the letter `A` is number 65, while the character `友` ("friend" in Chinese) is number 21451. The list of numbers is called "Unicode". Unicode uses smaller numbers for characters that are used more, like A through Z, or digits 0 through 9, or space. The characters that are used most get numbers that are less than 256, and they can fit into a `u8`. This means that Rust can safely **cast** a `u8` into a `char`, using `as`. (Cast `u8` as `char` means "pretend `u8` is a `char`")
Here is another reason for the different sizes: `usize` is the size that Rust uses for *indexing*. (Indexing means "which item is first", "which item is second", etc.) `usize` is the best size for indexing because:
`slice` is six characters in length and six bytes, but `slice2` is three characters in length and seven bytes. `char` needs to fit any character in any language, so it is 4 bytes long.
Type inference means that if you don't tell the compiler the type, but it can decide by itself, it will decide. The compiler always needs to know the type of the variables, but you don’t always need to tell it. For example, for `let my_number = 8`, `my_number` will be an `i32`. That is because the compiler chooses i32 for integers if you don't tell it. But if you say `let my_number: u8 = 8`, it will make `my_number` a `u8`, because you told it `u8`.
But the types are not called `float`, they are called `f32` and `f64`. It is the same as integers: the number after `f` shows the number of bits. If you don't write the type, Rust will choose `f64`.
`println!` is a **macro** that prints to the console. A **macro** is like a function that writes code for you. Macros have a `!` after them. We will learn about making macros later. For now, remember that `!` means that it is a macro.
Inside the function is just `8`. Because there is no `;`, this is the value it returns. If it had a `;`, it would not return anything. Rust will not compile this if it has a `;`, because the return is `i32` and `;` returns `()`, not `i32`:
This means "you told me that `number()` returns an `i32`, but you added a `;` so it doesn't return anything". So the compiler suggests removing the semicolon.
Variables start and end inside a code block `{}`. In this example, `my_number` ends before we call `println!`, because it is inside its own code block.
Simple variables in Rust can be printed with `{}` inside `println!()`. But some variables can't, and you need to **debug print**. Debug print is printing for the programmer, because it usually shows more information. Debug sometimes doesn't look pretty, because it has extra information to help you.
This is a lot of information. But the important part is: `you may be able to use {:?} (or {:#?} for pretty-print) instead`. This means that you can try `{:?}`, and also `{:#?}` (`{:#?}` prints with different formatting).
The compiler says: `error[E0384]: cannot assign twice to immutable variable my_number`. This is because variables are immutable if you only write `let`.
Shadowing means using `let` to declare a new variable with the same name as another variable. It looks like mutability, but it is completely different. Shadowing looks like this:
So is the first `my_number` destroyed? No, but when we call `my_number` we now get `my_number` the `f64`. And because they are in the same scope block, we can't see the first `my_number` anymore.
- The stack is very fast, but the heap is not so fast.
- The stack needs to know the size of a variable at compile time. So simple variables like `i32` go on the stack, because we know their exact size.
- Some types don't know the size at compile time. But the stack needs to know the exact size. What do you do? You put the data in the heap, because the heap can have any size of data. And to find it, a pointer goes on the stack, because we always know the size of a pointer.
The pointer you usually see in Rust is called a **reference**. This is the important part to know: a reference points to the memory of another value. A reference means you *borrow* the value, but you don't own it. In Rust, references have a `&`. So:
If you want to print characters like `\n` (called "escape characters"), you can add an extra `\`:
```rust
fn main() {
println!("Here are two escape characters: \\n and \\t");
```
This prints:
```rust
Here are two escape characters: \n and \t
```
Sometimes you have many `"` and escape characters inside a string, and want Rust to ignore everything. To do this, you can add `r#` to the beginning and `#` to the end. If you need to print `#` then you can start with `r##` and end with `##`. And if you need more than one, you can add one more # on each side.
Here are four examples:
```rust
fn main() {
let my_string = "'Ice to see you,' he said."; // single quotes
let quote_string = r#""Ice to see you," he said."#; // double quotes
let hashtag_string = r##"The hashtag #IceToSeeYou had become very popular."##; // Has one # so we need at least ##
let many_hashtags = r####""You don't have to type ### to use a hashtag. You can just use #.""####; // Has three ### so we need at least ####
`r#` has another use: with it you can use a keyword as a variable name.
```rust
fn main() {
let r#let = 6; // The variable's name is let
let mut r#mut = 10; // This variable's name is mut
}
```
`r#` also has this function because older versions of Rust didn't have all the same keywords that Rust has now. So with `r#` it's easier to avoid mistakes with variable names that were not keywords before. You probably won't need it, but if you really need to use a keyword for a variable then you can use `r#`.
If you want to print the bytes of a `&str` or a `char`, you can just write `b'` before the string. This works for all ASCII characters. These are all the ASCII characters:
There is also a Unicode escape that lets you print any Unicode character inside a string: `\u{}`. A hexidecimal number goes inside the `{}` to print it. Here is a short example of how to get the Unicode number, and how to print it again.
We know that `println!` can print with `{}` (for Display) and `{:?}` (for Debug), plus `{:#?}` for pretty printing. But there are many other ways to print.
`father_name` is in position 0, `son_name` is in position 1, and `family_name` is in position 2. So it prints `This is Adrian Fahrenheit Țepeș, son of Vlad Țepeș`.
- Do you want a variable name? `{:ㅎ^11}` No variable name: it comes before `:`.
- Do you want a padding character? `{:ㅎ^11}` Yes. ㅎ comes after the `:` and has a `^`. `<` means padding with the character on the left, `>` means on the right, and `^` means in the middle.
- Do you want a minimum length? `{:ㅎ^11}` Yes: there is an 11 after.
- Do you want a maximum length? `{:ㅎ^11}` No: there is no number with a `.` before.
-`str` is a dynamically sized type (dynamically sized = the size can be different). For example, the names "서태지" and "Adrian Fahrenheit Țepeș" are not the same size on the stack:
println!("A String is always {:?} bytes. It is Sized.", std::mem::size_of::<String>()); // std::mem::size_of::<Type>() gives you the size in bytes of a type
println!("And an i8 is always {:?} bytes. It is Sized.", std::mem::size_of::<i8>());
println!("And an f64 is always {:?} bytes. It is Sized.", std::mem::size_of::<f64>());
println!("But a &str? It can be anything. '서태지' is {:?} bytes. It is not Sized.", std::mem::size_of_val("서태지")); // std::mem::size_of_val() gives you the size in bytes of a variable
println!("And 'Adrian Fahrenheit Țepeș' is {:?} bytes. It is not Sized.", std::mem::size_of_val("Adrian Fahrenheit Țepeș"));
That is why we need a &, because `&` makes a pointer, and Rust knows the size of the pointer. So the pointer goes on the stack. If we wrote `str`, Rust wouldn't know what to do because it doesn't know the size.
One other way to make a String is called `.into()` but it is a bit different. Some types can easily convert to and from another type using `From` and `.into()`. And if you have `From`, then you also have `.into()`. `From` is clearer because you already know the types: you know that `String::from("Some str")` is a `String` from a `&str`. But with `.into()`, sometimes the compiler doesn't know:
There are two types that don't use `let` to declare: `const` and `static`. Also, you need to write the type for them. These are for variables that don't change (`const` means constant). The difference is that:
References are very important in Rust. Rust uses references to make sure that all memory access is safe. We know that we use `&` to create a reference:
`country` is a `String`. We created two references to `country`. They have the type `&String`: a "reference to a String". We could create one hundred references to `country` and it would be no problem.
The function `return_str()` creates a String, then it creates a reference to the string. Then it tries to return the reference. But `country` only lives inside the function. So after the function is over, `country_ref` is referring to memory that is already gone. Rust prevents us from making a mistake with memory.
So let's use it to add 10 to my_number. But you can't write `num_ref += 10`, because `num_ref` is not the `i32` value. To reach the value, we use `*`. `*` means "I don't want the reference, I want the value behind the reference". In other words, one `*` erases one `&`.
Situation one: An employee is writing a Powerpoint presentation. He wants his manager to help him. The employee gives his login information to his manager, and asks him to help by making edits. Now the manager has a "mutable reference" to the employee's presentation. This is fine, because nobody else is looking at the presentation.
Situation two is about only immutable references.
Situation two: The employee is giving the presentation to 100 people. All 100 people can now see the employee's data. This is fine, because nobody can change the data.
Situation three: The Employee gives his manager his login information. Then the employee went to give the presentation to 100 people. This is not fine, because the manager can log in and do anything. Maybe his manager will log into the computer and type an email to his mother? Now the 100 people have to watch the manager write an email to his mother instead of the presentation. That's not what they expected to see.
The compiler knows that we used `number_change` to change `number`, but didn't use it again. So here there is no problem. We are not using immutable and mutable references together.
Does this print `Austria, 8` or `8, 8`? It prints `Austria, 8`. First we declare a `String` called `country`. Then we create a reference `country_ref` to this string. Then we shadow country with 8, which is an `i32`. But the first `country` was not destroyed, so `country_ref` still says "Austria", not "8".
- Step 1: We create the `String``country`. `country` is the owner.
- Step 2: We give `country` to `print_country`. `print_country` doesn't have an `->`, so it doesn't return anything. After `print_country` finishes, our `String` is now dead.
- Step 3: We try to give `country` to `print_country`, but we already did that. We don't have `country` to give anymore.
Now `print_country()` is a function that takes a reference to a `String`: a `&String`. Also, we give it a reference to country by writing `&country`. This says "you can look at it, but I will keep it".
How is this possible? It is because `mut country` is not a reference: `adds_hungary` owns `country` now. (Remember, it takes `String` and not `&String`). `adds_hungary` is the full owner, so it can take `country` as mutable.
Some types in Rust are very simple. They are called **copy types**. These simple types are all on the stack, and the compiler knows their size. That means that they are very easy to copy, so the compiler always copies when you send it to a function. So you don't need to worry about ownership.
These simple types include: integers, floats, booleans (true and false), and char.
How do you know if a type **implements** copy? (implements = can use) You can check the documentation. For example, here is the documentation for char:
On the left in **Trait Implementations** you can look in alphabetical order. A, B, C... there is no **Copy** in C. But there is **Clone**. **Clone** is similar to **Copy**, but needs more memory.
The important part is `which does not implement the Copy trait`. But in the documentation we saw that String implements the Clone trait. So we can add `.clone()`. This creates a clone, and we send the clone to the function. `country` is still alive, so we can use it.
Of course, if the `String` is very large, `.clone()` can use a lot of memory. For example, one `String` can be a whole book, and every time we call `.clone()` it will copy the book. So using `&` for a reference is faster, if you can.
It is important to know that `my_number` was declared in the `main()` function, so it lives until the end. But it gets its value from inside a loop. However, that value lives as long as `my_number`, because `my_number` has the value.
It helps to imagine if you simplify the code. `loop_then_return(number)` gives the result 50, so let's delete it and write `50` instead. Also, now we don't need `number` so we will delete it too. Now it looks like this:
The type of an array is: `[type; number]`. For example, the type of `["One", "Two"]` is `[&str; 2]`. This means that even these two arrays have different types:
This method is used a lot to create buffers. For example, `let mut buffer = [0; 640]` creates an array of 640 zeroes. Then we can change zero to other numbers in order to add data.
So `[0..2]` means the first index and the second index (0 and 1). Or you can call it the "zeroth and first" index. It doesn't have the third item, which is index 2.
You can also have an **inclusive** range, which means it includes the last number too. To do this, add `=` to write `..=` instead of `..`. So instead of `[0..2]` you can write `[0..=2]` if you want the first, second, and third item.
In the same way that we have `&str` and `String`, we have arrays and vectors. Arrays are faster with less functionality, and vectors are slower with more functionality. The type is written `Vec`.
The type is `Vec<i32>`. You call it a "vec of i32s". And a `Vec<String>` is a "vec of strings". And a `Vec<Vec<String>>` is a "vec of a vec of strings".
Because a vec is slower than an array, we can use some methods to make it faster. A vec has a **capacity**, which means the space given to the vector. If you add more to a vector than its capacity, it will make its capacity double and copy the items into the new space. This is called reallocation.
For example:
```rust
fn main() {
let mut num_vec = Vec::new();
num_vec.push('a'); // add one character
println!("{}", num_vec.capacity()); // prints 1
num_vec.push('a'); // add one more
println!("{}", num_vec.capacity()); // prints 2
num_vec.push('a'); // add one more
println!("{}", num_vec.capacity()); // prints 4. It has three elements, but capacity is 4
num_vec.push('a'); // add one more
num_vec.push('a'); // add one more // Now we have 5 elements
println!("{}", num_vec.capacity()); // Now capacity is 8
}
```
So this vector has three reallocations: 1 to 2, 2 to 4, and 4 to 8. We can make it faster:
```rust
fn main() {
let mut num_vec = Vec::with_capacity(8); // Give it capacity 8
num_vec.push('a'); // add one character
println!("{}", num_vec.capacity()); // prints 8
num_vec.push('a'); // add one more
println!("{}", num_vec.capacity()); // prints 8
num_vec.push('a'); // add one more
println!("{}", num_vec.capacity()); // prints 8.
num_vec.push('a'); // add one more
num_vec.push('a'); // add one more // Now we have 5 elements
This vector has 0 reallocations, which is better. So if you think you know how many elements you need, you can use `Vec::with_capacity()` to make it faster.
You remember that you can use `.into()` to make a `&str` into a `String`. You can also use it to make an array into a `Vec`. You have to tell `.into()` that you want a `Vec`, but you don't have to choose the type of `Vec`. If you don't want to choose, you can write `Vec<_>`.
Tuples in Rust use `()`. We have seen many empty tuples already. `fn do_something() {}` has an empty tuple. Also, when you don't return anything in a function, you actually return an empty tuple.
let (a, b, c) = (str_vec[0], str_vec[1], str_vec[2]);
println!("{:?}", b);
}
```
There are many more collection types, and many more ways to use arrays, vecs, and tuples. We will learn more about them. But first we will learn control flow.
Too much `if`, `else`, and `else if` can be difficult to read. You can use `match` instead. But you must match for every possible result. For example, this will not work:
This means "you told me about 0 to 2, but u8s can go up to 255. What about 3? What about 4? What about 5?" And so on. So you can add `_` which means "anything else".
This also shows how `match` statements work, because in the first example it only printed `Not much blue`. But `first` also has not much green. A `match` statement always stops when it finds a match, and doesn't check the rest. This is a good example of code that compiles well but is not the code you want. You can make a really big `match` statement to fix it, but it is probably better to use a `for` loop. We will talk about loops soon.
You can also use `@` to use the value of a `match` expression when you want to. In this example we match an `i32` input in a function. If it's 4 or 13 we want to use that number in a `println!` statement. Otherwise, we don't need to use it.
With structs, you can create your own type. Structs are created with the keyword `struct`. The name of a struct should be in UpperCamelCase (capital letter for each word, no spaces).
The next is a tuple struct, or an unnamed struct. It is "unnamed" because you only need to write the types, not the variable names. Tuple structs are good when you need a simple struct and don't need to remember names.
The third type is the named struct. This is probably the most common struct. In this struct you declare variable names and types inside a `{}` code block.
In a named struct, you separate variables by commas. For the last variable you can add a comma or not - it's up to you. `SizeAndColour` had a comma after `colour`:
```rust
struct SizeAndColour {
size: u32,
colour: Colour, // And we put it in our new named struct
}
```
but you don't need it. But it can be a good idea to always put a comma, because sometimes you will change the order of the variables:
```rust
struct SizeAndColour {
size: u32,
colour: Colour // No comma here
}
// then we decide to change the order...
struct SizeAndColour {
colour: Colour // Whoops! Now this doesn't have a comma.
size: u32,
}
```
But it is not very important either way so you can choose whether to use a comma or not.
Did you notice that we wrote the same thing twice? Actually, you don't need to do that. If the field name and variable name are the same, you don't have to write it twice.
To declare an enum, write `enum` and use a code block with the options, separated by commas. Just like a `struct`, the last part can have a comma or not. We will create an enum called `ThingsInTheSky`:
You know that items in a `Vec`, array, etc. all need the same type (only tuples are different). But you can actually use an enum to put different types in. Imagine we want to have a `Vec` with `u32`s or `i32`s. Of course, you can make a `Vec<(u32, i32)>` (a vec with `(u32, i32)` tuples) but we only want one. So here you can use an enum. Here is a simple example:
So there are two variants: the `U32` variant with a `u32` inside, and the `I32` variant with `i32` inside. `U32` and `I32` are just names we made. They could have been `UThirtyTwo` or `IThirtyTwo` or anything else.
Now, if we put them into a `Vec` we just have a `Vec<Number>`, and the compiler is happy. Because it's an enum, you have to pick one. We will use the `.is_positive()` method to pick. If it's `true` then we will choose `U32`, and if it's `false` then we will choose `I32`.
You can get the values from a struct or enum by using `let` backwards. This is called `destructuring`, and gives you the values separately. First a simple example:
You can see that it's backwards. First we say let `papa_doc = Person { fields }` to create the struct. Then we say `let Person {fields} = papa_doc` to destructure it.
Now a bigger example. In this example we have a `City` struct. We give it a `new` function to make it. Then we have a `process_city_values` function to do things with the values. In the function we just create a `Vec`, but you can imagine that we can do much more after we destructure it.
With loops you can tell Rust to continue something until you want it to stop. With `loop` you can start a loop that does not stop, unless you tell it when to `break`.
If you have a loop inside of a loop, you can give them names. With names, you can tell Rust which loop to `break` out of. Use `'` (called a "tick") and a `:` to give it a name:
A `while` loop is a loop that continues while something is still `true`. Each loop, Rust will check if it is still `true`. If it becomes `false`, Rust will stop the loop.
A `for` loop lets you tell Rust what to do each time. But in a `for` loop, the loop stops after a certain number of times. `for` loops use **ranges** very often. You use `..` and `..=` to create a range.
Rust also suggests `_number`. Putting `_` in front of a variable name means "maybe I will use it later". But using just `_` means "I don't care about this variable at all".
You can also use `break` to return a value. You write the value right after `break` and use a `;`. Here is an example with a `loop` and a break that gives `my_number` its value.
Now that we know how to use loops, here is a better solution to the `match` problem with colours. It is a better solution because we want to compare everything, and a `for` loop looks at every item.
To call functions on a `struct` or an `enum`, use an `impl` block. These functions are called **methods**. There are two kinds of methods in an `impl` block.
- Regular methods: these take **self** (or **&self** or **&mut self**). Regular methods use a `.`. `.clone()` is a regular method.
- Associated methods (or "static" methods): these do not take self. They are written differently, using `::`. `String::from()` is an associated method. You usually use associated methods to create new variables.
In our example we are going to create animals and print them. For a new struct or enum, you need to give it **Debug** if you want to use `{:?}` to print. If you write `#[derive(Debug)]` above the struct or enum then you can print it with `{:?}`.
Rust has many more types of collections. You can see them at https://doc.rust-lang.org/beta/std/collections/ in the standard library. That page has good explanations for why to use one type, so go there if you don't know what type you want. We will start with `HashMap`, which is very common.
## HashMap (and BTreeMap)
A HashMap is a collection made out of *keys* and *values*. You use the key to look up the value that matches the key. You can create a new `HashMap` with just `HashMap::new()` and use `.insert(key, value)` to insert items.
A `HashMap` is not in order, so if you print every key in a `HashMap` together it will probably print differently. We can see this in an example:
```rust
use std::collections::HashMap; // You have to bring HashMap in to use it
struct City {
name: String,
population: HashMap<u32,u32>, // This will have the date and the population for the date
}
fn main() {
let mut tallinn = City {
name: "Tallinn".to_string(),
population: HashMap::new(), // So far the HashMap is empty
};
tallinn.population.insert(1372, 3_250); // insert three dates
tallinn.population.insert(1851, 24_000);
tallinn.population.insert(2020, 437_619);
for (year, population) in tallinn.population { // The HashMap is HashMap<u32,u32> so it returns a tuple with two items
println!("In the year {} the city of {} had a population of {}.", year, tallinn.name, population);
}
}
```
This prints:
```text
In the year 1372 the city of Tallinn had a population of 3250.
In the year 2020 the city of Tallinn had a population of 437619.
In the year 1851 the city of Tallinn had a population of 24000.
```
or it might print:
```text
In the year 1851 the city of Tallinn had a population of 24000.
In the year 2020 the city of Tallinn had a population of 437619.
In the year 1372 the city of Tallinn had a population of 3250.
```
If you want a `HashMap` that you can sort, you can use a `BTreeMap`. Actually they are very similar to each other, so we can quickly change our `HashMap` to a `BTreeMap` to see. You will notice that it is almost the same code.
```rust
use std::collections::BTreeMap; // Just change HashMap to BTreeMap
struct City {
name: String,
population: BTreeMap<u32,u32>, // Just change HashMap to BTreeMap
}
fn main() {
let mut tallinn = City {
name: "Tallinn".to_string(),
population: BTreeMap::new(), // Just change HashMap to BTreeMap
};
tallinn.population.insert(1372, 3_250);
tallinn.population.insert(1851, 24_000);
tallinn.population.insert(2020, 437_619);
for (year, population) in tallinn.population.iter() { // just add .iter() - it will be sorted
println!("In the year {} the city of {} had a population of {}.", year, tallinn.name, population);
}
}
```
Now it will always print:
```text
In the year 1372 the city of Tallinn had a population of 3250.
In the year 1851 the city of Tallinn had a population of 24000.
In the year 2020 the city of Tallinn had a population of 437619.
You can get a value in a `HashMap` by just putting the key in `[]` square brackets.
```rust
use std::collections::HashMap;
fn main() {
let canadian_cities = vec!["Calgary", "Vancouver", "Gimli"];
let german_cities = vec!["Karlsruhe", "Bad Doberan", "Bielefeld"];
let mut city_hashmap = HashMap::new();
for city in canadian_cities {
city_hashmap.insert(city, "Canada");
}
for city in german_cities {
city_hashmap.insert(city, "Germany");
}
println!("{:?}", city_hashmap["Bielefeld"]);
}
```
This will bring up the value for the key `Bielefeld`, which is `Germany`. But be careful, because the program will crash if there is no key. If you write `println!("{:?}", city_hashmap["Bielefeldd"]);` for example then it will crash, because `Bielefeldd` doesn't exist. If you are not sure that there will be a key, you can use `.get()` which returns an `Option`. Then you will get `None` instead of crashing the program.
```rust
println!("{:?}", city_hashmap.get("Bielefeld"));
println!("{:?}", city_hashmap.get("Bielefeldd"));
```
This prints
```text
Some("Germany")
None
```
because *Bielefeld* exists, but *Bielefeldd* does not exist.
If a `HashMap` already has a key when you try to put it in, it will overwrite the value that matches it:
```rust
use std::collections::HashMap;
fn main() {
let mut book_hashmap = HashMap::new();
book_hashmap.insert(1, "L'Allemagne Moderne");
book_hashmap.insert(1, "Le Petit Prince");
book_hashmap.insert(1, "Eye of the World");
println!("{:?}", book_hashmap.get(&1));
}
```
This prints `Some("Eye of the World")`, because it was the last one you used `.insert()` for.
It is easy to check if an entry exists, because you can check with `.get()` which gives an `Option`:
```rust
use std::collections::HashMap;
fn main() {
let mut book_hashmap = HashMap::new();
book_hashmap.insert(1, "L'Allemagne Moderne");
if book_hashmap.get(&1).is_none() {
book_hashmap.insert(1, "Le Petit Prince");
}
println!("{:?}", book_hashmap.get(&1));
}
```
On this subject, `HashMap` has a very interesting method called `.entry()`. With this you can try to make an entry and use another method like `.or_insert()` to insert the value if there is no key. The interesting part is that it also gives a mutable reference so you can change it. First is an example where we just insert `true` every time we insert a book title into the `HashMap`.
```rust
use std::collections::HashMap;
fn main() {
let book_collection = vec!["L'Allemagne Moderne", "Le Petit Prince", "Eye of the World", "Eye of the World"]; // Eye of the World appears twice
let mut book_hashmap = HashMap::new();
for book in book_collection {
book_hashmap.entry(book).or_insert(true);
}
}
```
But maybe it would be better to count the number of books so that we know that there are two copies of *Eye of the World*. First let's look at what `.entry()` does, and what `.or_insert()` does. `.entry()` actually returns an `enum` called `Entry`:
```rust
pub fn entry(&mut self, key: K) -> Entry<K,V>
```
[Here is the page for Entry](https://doc.rust-lang.org/std/collections/hash_map/enum.Entry.html). There we can see the code for it:
```rust
pub enum Entry<'a, K: 'a, V: 'a> {
Occupied(OccupiedEntry<'a, K, V>),
Vacant(VacantEntry<'a, K, V>),
}
```
Then when we call `.or_insert()`, it looks at the enum and decides what to do.
```rust
pub fn or_insert(self, default: V) -> &'a mut V {
match self {
Occupied(entry) => entry.into_mut(),
Vacant(entry) => entry.insert(default),
}
}
```
The interesting part is that it returns a `mut` reference. That means you can bind it to a variable, and change the variable to change the value in the `HashMap`. So for every book we will insert a 0 if there is no entry, and we will use `+= 1` on the reference. Now it looks like this:
```rust
use std::collections::HashMap;
fn main() {
let book_collection = vec!["L'Allemagne Moderne", "Le Petit Prince", "Eye of the World", "Eye of the World"];
let mut book_hashmap = HashMap::new();
for book in book_collection {
let return_value = book_hashmap.entry(book).or_insert(0);
*return_value +=1;
}
for (book, number) in book_hashmap {
println!("{:?}, {:?}", book, number);
}
}
```
The important part is `let return_value = book_hashmap.entry(book).or_insert(0);`. If you take out the variable, you get `book_hashmap.entry(book).or_insert(0)`. If you do that then the mutable reference just doesn't go anywhere: it inserts 0, and then has a mutable reference to 0 that nobody takes. So we bind it to `return_value` so we can keep the 0. Then we increase the value by 1, which gives at least 1 for every book in the `HashMap`. Then when `.entry()` looks at *Eye of the World* again it doesn't insert anything, but it gives us a mutable 1. Then we increase it to 2, and that's why it prints this:
```text
"L\'Allemagne Moderne", 1
"Eye of the World", 2
"Le Petit Prince", 1
```
You can also use `.or_insert_with()` which lets you use a closure. You can always just do this:
```rust
let return_value = book_hashmap.entry(book).or_insert_with(|| 0); // Closure with nothing
```
or add any logic that you want. Here is something simple.
```rust
for book in book_collection {
let return_value =
book_hashmap.entry(book).or_insert_with(|| {
if book == "Eye of the World" { // Maybe an extra copy is arriving next week
// so we know there will be at least one more
1
} else {
0
}
},
);
*return_value += 1;
}
```
You can also do things with `.or_insert()` like insert a vec and then push into the vec. Let's pretend that we asked men and women on the street what they think of a politican. They give a rating from 0 to 10. Then we want to put the numbers together to see if the politician is more popular with men or women. It can look like this:
```rust
use std::collections::HashMap;
fn main() {
let data = [ // This is the raw data
("male", 9),
("female", 5),
("male", 0),
("female", 6),
("female", 5),
("male", 10),
];
let mut survey_hash = HashMap::new();
for item in data.iter() { // This gives a tuple of &(&str, i32)
The important line is: `survey_hash.entry(item.0).or_insert(Vec::new()).push(item.1);` So if it sees "female" it will check to see if there is "female" already in the `HashMap`. If not, it will insert a `Vec::new()`, then push the number in. If it sees "female" already in the `HashMap`, it will not insert a new Vec, and will just push the number into it.
A `HashSet` is actually a `HashMap` that only has keys. On [the page for HashSet](https://doc.rust-lang.org/std/collections/struct.HashSet.html) it explains this on the top:
`A hash set implemented as a HashMap where the value is ().`
You often use a `HashSet` if you just want to know if a key exists, or doesn't exist.
Imagine that you have 100 random numbers, and each number between 1 and 100. If you do this, some numbers will appear more than once, while some won't appear at all. If you put them into a `HashSet` then you will have a list of all the numbers that appeared.
A `BTreeSet` is similar to a `HashSet` in the same way that a `BTreeMap` is similar to a `HashMap`. If we print each item in the `HashSet`, we don't know what the order will be:
```rust```
for entry in number_hashset {
print!("{} ", entry);
}
```
Maybe it will print this: `67 28 42 25 95 59 87 11 5 81 64 34 8 15 13 86 10 89 63 93 49 41 46 57 60 29 17 22 74 43 32 38 36 76 71 18 14 84 61 16 35 90 56 54 91 19 94 44 3 0 68 80 51 92 24 20 82 26 58 33 55 96 37 66 79 73`. But it will almost never print it in the same way again.
Here as well, it is easy to change your `HashSet` to a `BTreeSet` if you decide you need ordering. For our code, we only change two places.
```rust
use std::collections::BTreeSet; // Change HashSet to BTreeSet
A `BinaryHeap` is an interesting collection type, because is is mostly unordered but has a bit of order. It is a collection that keeps the largest item in the front, but the other items are in any order.
We will use another list of items for an example, but this time smaller.
```rust
use std::collections::BinaryHeap;
fn show_remainder(input: &BinaryHeap<i32>) -> Vec<i32> { // This function shows the remainder in the BinaryHeap. Actually an iterator would be
// faster than a function - we will learn them later.
let mut remainder_vec = vec![];
for number in input {
remainder_vec.push(*number)
}
remainder_vec
}
fn main() {
let many_numbers = vec![0, 5, 10, 15, 20, 25, 30]; // These numbers are in order
let mut my_heap = BinaryHeap::new();
for number in many_numbers {
my_heap.push(number);
}
while let Some(number) = my_heap.pop() { // .pop() returns Some(number) if a number is there, None if not. It pops from the front
println!("Popped off {}. Remaining numbers are: {:?}", number, show_remainder(&my_heap));
Popped off 20. Remaining numbers are: [15, 10, 5, 0]
Popped off 15. Remaining numbers are: [10, 0, 5]
Popped off 10. Remaining numbers are: [5, 0]
Popped off 5. Remaining numbers are: [0]
Popped off 0. Remaining numbers are: []
```
You can see that the number in the 0 index is always largest: 25, 20, 15, 10, 5, then 0. But the other ones are all different.
A good way to use a `BinaryHeap` is for a collection of tasks. Here we create a `BinaryHeap<u8, &str>` where the `u8` is a number for the importance of the task. The `&str` is a description of what to do.
```rust
use std::collections::BinaryHeap;
fn main() {
let mut jobs = BinaryHeap::new();
// Add jobs to do throughout the day
jobs.push((100, "Write back to email from the CEO"));
jobs.push((80, "Finish the report today"));
jobs.push((5, "Watch some YouTube"));
jobs.push((70, "Tell your team members thanks for always working hard"));
jobs.push((30, "Plan who to hire next for the team"));
while let Some(job) = jobs.pop() {
println!("You need to: {}", job.1);
}
}
```
This will always print:
```text
You need to: Write back to email from the CEO
You need to: Finish the report today
You need to: Tell your team members thanks for always working hard
For generics, you use angle brackets with the type inside: `<T>` This means "any type you put into the function". Usually, generics uses types with one capital letter (T, U, V, etc.).
The important part is the `<T>` after the function name. Without this, Rust will think that T is a concrete (concrete = not generic) type, like `String` or `i8`.
You will remember that some types in Rust are **Copy**, some are **Clone**, some are **Display**, some are **Debug**, and so on. With **Debug**, we can print with `{}`. So now you can see that this is a problem:
29 | println!("Here is your number: {:?}", number);
| ^^^^^^ `T` cannot be formatted using `{:?}` because it doesn't implement `std::fmt::Debug`
```
T doesn't implement **Debug**. So do we implement Debug for T? No, because we don't know what T is. But we can tell the function: "only accept types that have Debug".
```rust
use std::fmt::Debug; // Debug is located at std::fmt::Debug. If we write this,
So now the compiler knows: "Okay, I will only take a type if it has Debug". Now the code works, because `i32` is Debug. Now we can give it many types: `String`, `&str`, and so on.
Sometimes we need more than one type in a generic function. We have to write out each type name, and think about how we want to use it. In this example, we want two types. First we want to print a statement for type T. Printing with `{}` is nicer, so we will require Display for T.
Next is type U, and two variables have type U (U is some sort of number). We want to compare them, so we need PartialOrd. We want to print them too, so we require Display for U as well.
```rust
use std::fmt::Display;
use std::cmp::PartialOrd;
fn compare_and_display<T:Display,U:Display+PartialOrd>(statement: T, num_1: U, num_2: U) {
println!("{}! Is {} greater than {}? {}", statement, num_1, num_2, num_1 > num_2);
We understand enums and generics now, so we can understand `Option` and `Result`. Rust uses these two enums to make code safer. We will start with Option.
thread 'main' panicked at 'index out of bounds: the len is 2 but the index is 4', src\main.rs:34:5
```
Panic means that the program stops before the problem happens. Rust sees that the function wants something impossible, and stops. It "unwinds the stack" (takes the values off the stack) and tells you "sorry, I can't do that".
So now we will change the return type from `i32` to `Option<i32>`. This means "give me an `i32` if it's there, and give me `None` if it's not". We say that the `i32` is "wrapped" in an Option, which means that it's inside an Option.
The important point to remember: with `Some`, you have a value of type `T` (any type). But with `None`, you don't have anything. So in a `match` statement for Option you can't say:
`Result<T, E>` means you need to think of what you want to return for `Ok`, and what you want to return for `Err`. Actually, you can decide anything. Even this is okay:
This information helps you fix your code. `src\main.rs:30:20` means "inside main.rs in directory src, on line 30 and column 20". So you can go there to look at your code and fix the problem.
This function take a vector of bytes (`u8`) and tries to make a `String`. So the success case for the Result is a `String` and the error case is `FromUtf8Error`. You can give your error type any name you want. We will create our own error types later, because first we need to learn other things.
This is good, but we don't do anything for `None`. Here we can make the code smaller by using `if let`. `if let` means "do something if it matches, and don't do anything if it doesn't". `if let` is when you don't care about matching for everything.
We want to get the numbers, but not the words. For the numbers, we can use a method called `parse::<i32>()`. `parse()` is the method, and `::<i32>` is the type. It will try to turn the `&str` into an `i32`, and give it to us if it can.
There is an even shorter way to deal with Result (and Option), shorter than `match` and even shorter than `if let`. It is called the "question mark operator", and is just `?`. After a function that returns a result, you can add `?`. This will:
This function takes a `&str`. If it is `Ok`, it gives an `i32` wrapped in `Ok`. If it is an `Err`, it returns a `std::num::ParseIntError`. Then we try to parse the number, and add `?`. That means "check if it is an error, and give the result if it is okay". If it is not okay, it will return the error and end. But if it is okay, it will go to the next line. On the next line is the number inside of `Ok()`. We need to wrap it in `Ok` because the return is `Result<i32, std::num::ParseIntError>`, not `i32`.
We don't need to write `std::result::Result` because `Result` is always "in scope" (in scope = ready to use). But `std::num::ParseIntError` is not in scope. We can bring it in scope if we want:
You will remember that `src\main.rs` is the directory and file name, and `2:3` is the line and column name. With this information, you can find the code and fix it.
`panic!` is a good macro to use when writing your code to make sure that you know when something changes. For example, our function `prints_three_things` always prints index [0], [1], and [2] from a vector. It is okay because we always give it a vector with three items:
This gives us `thread 'main' panicked at 'my_vec must always have three items', src\main.rs:8:9`. Now we remember that `my_vec` should only have three items. So `panic!` is a good macro to create reminders in your code.
`unwrap` is also good when you want the program to crash when there is a problem. Later, when your code is finished it is good to change `unwrap` to something else that won't crash. You can also use `expect`, which is like `unwrap` but with your own message.
We have seen traits before: Debug, Copy, Clone are all traits. To give a type a trait, you have to implement it. Because Debug and the others are so common, it's easy to do:
But other traits are more difficult, so you need to implement them manually with `impl`. For example, std::ops::Add is used to add two things. But you can add in many ways.
Or maybe we want to just put `self.first_thing` next to `self.second_thing` and say that this is how we want to add. So if we add 55 to 33.4, we want to see 5533, not 88.
This is okay, but we don't want to print "The dog is running". We can change the method `.run()`, but we have to follow the signature. The signature says:
Actually, a trait doesn't need to write out the whole function. Now we change `bark()` and `run()` to just say `fn bark(&self)` and `fn run(&self);`. This is not a full function, so the user must write it.
So when you create a trait, you must think: "Which functions should I write? And which functions should the user write?" If you think the user will use the function the same way every time, then write out the function. If you think the user will use it differently, then just write the function signature.
So we need to implement Display for Cat. On [https://doc.rust-lang.org/std/fmt/trait.Display.html](https://doc.rust-lang.org/std/fmt/trait.Display.html) we can see the information for Display, and one example. It says:
Some parts of this we don't understand yet, like `<'_>` and what `f` is doing. But we understand the `Position` struct: it is just two `f32`s. We also understand that `self.longitude` and `self.latitude` are the values in the struct. So maybe we can just use this for our struct, with `self.name` and `self.age`. Also, `write!` looks a lot like `println!`. So we write this:
*From* is a very convenient trait to use, and you know this because you have seen it so much already. With *From* you can make a `String` from a `&str`, but you can make many types from many other types. For example, Vec uses *From* for the following:
If you look at the type, the second and third vectors are `Vec<u8>`, which means the bytes of the `&str` and the `String`. So you can see that `From` is very flexible and used a lot.
Let's make two structs and then implement `From` for one of them. One struct will be `City`, and the other will be `Country`. We want to be able to do this: `let country_name = Country::from(vector_of_cities)`.
It looks like this:
```rust
#[derive(Debug)] // So we can print City
struct City {
name: String,
population: u32,
}
impl City {
fn new(name: &str, population: u32) -> Self { // just a new function
Self {
name: name.to_string(),
population,
}
}
}
#[derive(Debug)] // Country also needs to be printed
struct Country {
cities: Vec<City>, // Our cities go in here
}
impl From<Vec<City>> for Country { // Note: we don't have to write From<City>, we can also do
// From<Vec<City>>. So we can also implement on a type that
// we didn't create
fn from(cities: Vec<City>) -> Self {
Self { cities }
}
}
impl Country {
fn print_cities(&self) { // function to print the cities in Country
for city in &self.cities {
// & because Vec<Cities> isn't Copy
println!("{:?} has a population of {:?}.", city.name, city.population);
}
}
}
fn main() {
let helsinki = City::new("Helsinki", 631_695);
let turku = City::new("Turku", 186_756);
let finland_cities = vec![helsinki, turku]; // This is the Vec<City>
let finland = Country::from(finland_cities); // So now we can use From
finland.print_cities();
}
```
You can see that `From` is easy to implement from types you didn't create like `Vec`, `i32`, and so on. Here is one more example where we create a vector that has two vectors. The first vector holds even numbers, and the second holds odd numbers. With `From` you can give it a vector of `i32`s and it will turn it into a `Vec<Vec<i32>>`: a vector that holds vectors of `i32`.
```rust
use std::convert::From;
#[derive(Debug)]
struct EvenOddVec(Vec<Vec<i32>>);
impl From<Vec<i32>> for EvenOddVec {
fn from(input: Vec<i32>) -> Self {
let mut even_odd_vec: Vec<Vec<i32>> = vec![vec![], vec![]]; // A vec with two empty vecs inside
// This is the return value but first we must fill it
A type like `EvenOddVec` is probably better as a generic `T` so we can use many number types. You can try to make the example generic if you want for practice.
Sometimes you want a function that can take both a `String` and a `&str`. You can do this with generics and the `AsRef` trait. `AsRef` is used to give a reference from one type to another type. If you look at the documentation for `String`, you can see that it has `AsRef` for many types:
You can see that it takes `&self` and gives a reference to the other type. This means that if you have a generic type T, you can say that it needs `AsRef<str>`. If you do that, it will be able to take a `&str` and a `String`.
Now it works and prints `Please print me`. That is good, but T can still be too many things. It can be an `i8`, an `f32` and anything else with just `Display`. So we add `AsRef<str>`, and now T needs both `AsRef<str>` and `Display`.
Don't forget that you can write the function differently when it gets long. If we add Debug then it becomes `fn print_it<T: AsRef<str> + Display + Debug>(input: T)` which is long for one line. So we can write it like this:
Rust is a systems programming language, but it also has a functional style. Both styles are okay, but functional style is usually shorter. Here is an example of declarative style to make a Vec from 1 to 10:
With functional style you can chain methods. That means to put many methods together in a single statement. Here is an example of many methods chained together:
This creates a Vec with `[3, 4, 5, 6]`. It is a good idea to put each method on a new line if you have many chained methods. This helps you to read the code. Here is the same code with each method on a new line:
An iterator is a collection that can give you the items in the collection, one at a time. Actually, we have already used iterators: the `for` loop gives you an iterator. When you want to use an iterator other times, you have to choose what kind:
First we used `.iter()` on `vector1` to get references. We added 1 to each, and made it into a new Vec. `vector1` is still alive because we only used references: we didn't take by value. Now we have `vector1`, and a new Vec called `vector1_a`.
Then we used `.iter_mut()` for `vector2`. It is mutable, so we don't need to use `.collect()` to create a new Vec. Instead, we change the values in the same Vec with mutable references. So `vector2` is still there. Because we don't need a new Vec, we use `for_each`: it's just like a `for` loop.
Finally we used `into_iter` to get an iterator by value from `vector1`. This destroys `vector1`, so after we make `vector1_b` we can't use `vector1` again.
An iterator works by using a method called `.next()`, which gives an `Option`. When you use an iterator, Rust calls `next()`. If it gets `Some`, it keeps going. If it gets `None`, it stops.
That works well. Now we want to implement `Iterator` for the library so we can use it in a `for` loop. Right now if we try a `for` loop, it doesn't work:
But we can make library into an iterator with `impl Iterator for Library`. Information on the `Iterator` trait is here in the standard library: [https://doc.rust-lang.org/std/iter/trait.Iterator.html](https://doc.rust-lang.org/std/iter/trait.Iterator.html)
On the top left of the page it says: `Associated Types: Item` and `Required Methods: next`. An "associated type" means "a type that goes together". Our associated type will be `String`, because we want the iterator to give us Strings.
You can see that under `impl Iterator for Alternate` it says `type Item = i32`. This is the associated type. Our iterator will be for our list of books, which is a `Vec<String>>`. When we call next, it will give us a `String`. So we will write `type Item = String;`. That is the associated item.
To implement `Iterator`, you need to write the `fn next()` function. This is where you decide what the iterator should do. For our `Library`, we want it to give us the last books first. So we will `match` with `.pop()` which takes the last item off it it is `Some`. We also want to print " is found!" for each item. Now it looks like this:
Closures are like quick functions that don't need a name. Sometimes they are called lambdas. Closures are easy to find because they use `||` instead of `()`.
Usually you see closures in Rust inside of a method, because it is very convenient to have a closure inside. For example, there is the `unwrap_or` method that we know that you can use to give a value if `unwrap` doesn't work. Before, we wrote: `let fourth = my_vec.get(3).unwrap_or_(&0);`. But there is also an `unwrap_or_else` method that has a closure inside. So you can do this:
Of course, a closure can be very simple. You can just write `let fourth = my_vec.get(3).unwrap_or_else(|| &0);` for example. As long as you put the `||` in, the compiler knows that you have put in the closure that you need.
Another good example is with `.enumerate()`. This gives an iterator with the index number, and the item. For example: `[10, 9, 8]` becomes `(0, 10), (1, 9), (2, 8)`. So you can do this:
In this case we use `for_each` instead of `map`. `map` is for **doing something to** each item and passing it on, and `for_each` is **doing something when you see each item**. Also, `map` doesn't do anything unless you use a method like `collect`.
Actually, this is the interesting thing about iterators. If you try to `map` without a method like `collect`, the compiler will tell you that it doesn't do anything:
So this `Map<Enumerate<Iter<i32>>>` is a structure that is ready to go, when we tell it what to do. Rust does this because it needs to be fast. It doesn't want to do this:
Rust only wants to do one calculation, so it creates the structure and waits. Then if we say `.collect::<Vec<i32>>()` it knows what to do, and starts moving. This is what `iterators are lazy and do nothing unless consumed` means. The iterators don't do anything until you "consume" them (use them up).
You can even create complicated things like `HashMap` using `.collect()`, so it is very powerful. Here is an example of how to put two vecs into a `HashMap`. First we make the two vectors, and then we will use `.into_iter()` on them to get an iterator of values. Then we use the `.zip()` method. This method takes two iterators and attaches them together, like a zipper. Finally, we use `.collect()` to make the `HashMap`.
Here is the code:
```rust
use std::collections::HashMap;
fn main() {
let some_numbers = vec![0, 1, 2, 3, 4, 5]; // a Vec<i32>
let some_words = vec!["zero", "one", "two", "three", "four", "five"]; // a Vec<&str>
let number_word_hashmap = some_numbers
.into_iter() // now it is an iter
.zip(some_words.into_iter()) // inside .zip() we put in the other iter. Now they are together.
.collect::<HashMap<_,_>>();
println!("For key {} we get {}.", 2, number_word_hashmap.get(&2).unwrap());
You can see that we wrote `<HashMap<_, _>>` because that is enough information for Rust to decide on the type `HashMap<i32, &str>`. You can write `.collect::<HashMap<i32, &str>>();` if you want, or you can write it like this if you prefer:
Sometimes you see `|_|` in a closure. This means that the closure needs an argument, but you don't need to use it. So `|_|` means "Okay, here is the argument but I won't give it a name because I don't care about it".
This code creates a new mutable number and changes it. Then it creates a vec, and uses `iter` and `map` and `collect` to create a new vec. We can put `dbg!` almost everywhere in this code. `dbg!` asks the compiler: "What are you doing here?"
- String literals: you make these when you write `let my_str = "I am a &str"`. They last for the whole program, because they are written directly into the binary. They have the type `&'static str`. `'` means its lifetime, and string literal have a lifetime called `static`.
- Borrowed str: This is the regular `&str` form without a `static` lifetime. If you create a `String` and get a reference to it, Rust will convert it to a `&str` when you need it. For example:
A lifetime means "how long the variable lives". You only need to think about lifetimes with references. This is because references can't live longer than the object they come from. For example, this function does not work:
The problem is that `my_string` only lives inside `returns_reference`. We try to return `&my_string`, but `&my_string` can't exist without `my_string`. So the compiler says no.
`missing lifetime specifier` means that we need to add a `'` with the lifetime. Then it says that it `contains a borrowed value, but there is no value for it to be borrowed from`. That means that `I am a str` isn't borrowed from anything. It says `consider using the 'static lifetime` by writing `&'static str`. So it thinks we should try saying that this is a string literal.
So now `fn returns_str() -> &'static str` tells Rust: "don't worry, we will only return a string literal". String literals live for the whole program, so Rust is happy.
But `'static` is not the only lifetime. Actually, every variable has a lifetime, but usually we don't have to write it. We only have to write the lifetime when the compiler doesn't know.
Here is an example of another lifetime. Imagine we want to create a `City` struct and give it a `&str` for the name. Later we will have many more `&str`s because we need faster performance than with `String`. So we write it like this, but it won't work:
println!("{} was founded in {}", my_city.name, my_city.date_founded);
}
```
Okay, that works. And maybe this is what you wanted for the struct. However, note that we can only take "string literals", so not references to something else. So this will not work:
```rust
#[derive(Debug)]
struct City {
name: &'static str, // must live for the whole program
date_founded: u32,
}
fn main() {
let city_names = vec!["Ichinomiya".to_string(), "Kurume".to_string()]; // city_names does not live for the whole program
let my_city = City {
name: &city_names[0], // This is a &str, but not a &'static str. It is a reference to a value inside city_names
So now we will try what the compiler suggested before. It said to try writing `struct City<'a>` and `name: &'a str`. This means that it will only take a reference for `name` if it lives as long as `City`.
Also remember this important fact: `'a` etc. don't change the actual lifetime of variables. They are like traits for generics. Remember when we wrote generics? For example:
**Interior mutability** means having a little bit of mutability on the inside. Rust has some ways to let you safely change values inside of a struct that is immutable. First, let's look at a simple example where we would want this. Imagine a `struct` called `PhoneModel` with many fields:
It is better for the fields in `PhoneModel` to be immutable, because we don't want the data to change. The `date_issued` and `screen_size` never change, for example.
But inside is one field called `on_sale`. A phone model will first be on sale (`true`), but later the company will stop selling it. Can we make just this one field mutable? Because we don't want to write `let mut super_phone_3000`. If we do, then every field will become mutable.
Rust has many ways to allow some safe mutability inside of something that is immutable. The most simple is called `Cell`. First we use `use std::cell::Cell` so that we can just write `Cell` instead of `std::cell::Cell` every time.
`Cell` works for all types, but works best for simple Copy types because it gives values, not references. `Cell` has a method called `get()` for example that only works on Copy types.
There are many methods for `RefCell`. Two of them are `.borrow()` and `.borrow_mut()`. With these methods, you can do the same thing you do with `&` and `&mut`. The rules are the same:
`Mutex` is another way to change values without declaring `mut`. Mutex means `mutual exclusion`, which means "only one at a time". This is why a `Mutex` is safe, because it only lets one process change it at a time. To do this, it uses `.lock()`. `Lock` is like locking a door from the inside. You go into a room, lock the door, and now you can change things inside the room. Nobody else can come in and stop you, because you locked the door.
But `mutex_changer` still has a lock. How do we stop it? A `Mutex` is unlocked when the `MutexGuard` goes out of scope. "Go out of scope" means the code block is finished. For example:
One other method is `try_lock()`. Then it will try once, and if it doesn't get the lock it will give up. Don't do `try_lock().unwrap()`, because it will panic if it doesn't work. `if let` or `match` is better:
`*my_mutex.lock().unwrap() = 6;` means "unlock my_mutex and make it 6". There is no variable that holds it so you don't need to call `std::mem::drop`. You can do it 100 times if you want - it doesn't matter:
`RwLock` means "read write lock". It is like a `Mutex` but also like a `RefCell`. You use `.write().unwrap()` instead of `.lock().unwrap()` to change it. But you can also use `.read().unwrap()` to get read access. It is like `RefCell` because it follows the rules:
Cow is a very convenient enum. It means "clone on write" and lets you return a `&str` if you don't need a `String`, and a `String` if you need it. (It can also do the same with arrays vs. Vecs, etc.)
You know right away that `'a` means it works with references. The `ToOwned` trait means that it is a type that can be turned into an owned type. For example, `str` is usually a reference (`&str`) and you can turn it into an owned `String`.
Next is `?Sized`. This means "maybe Sized, but maybe not". Almost every type in Rust is Sized, but types like `str` are not. That is why we need a `&` for a `str`, because the compiler doesn't know the size. So if you want a trait that can use something like a `str`, you add `?Sized.`
Imagine that you have a function that returns `Cow<'static, str>`. If you tell the function to return `"My message".into()`, it will look at the type: "My message" is a `str`. This is a `Borrowed` type, so it chooses `Borrowed(&'a B)`. So it becomes `Cow::Borrowed(&'static str)`.
And if you give it a `format!("{}", "My message".into()` then it will look at the type. This time it is a `String`, because `format!` makes a `String`. So this time it will select "Owned".
Here is an example to test `Cow`. We will put a number into a function that returns a `Cow<'static, str>`. Depending on the number, it will create a `&str` or a `String`. Then it uses `.into()` to turn it into a `Cow`. When you do that, it will choose either `Cow:::Borrowed` or `Cow::Owned`. Then we will match to see which one it chose.
A type alias means "giving a new name to another type". Type aliases are very easy. Usually you use them when you have a very long type and don't want to write it every time. It is also good when you want to give a type a better name that is easy to remember. Here are two examples of type aliases.
The type is not difficult, but you want to make your code easier to understand for other people (or for you):
Sometimes you want to write code in general to help you imagine your project. For example, imagine a simple project to do something with books. Here's what you think as you write it:
```rust
struct Book {} // Okay, first I need a book struct.
// Nothing in there yet - will add later
enum BookType { // A book can be hardcover or softcover, so add an enum
HardCover,
SoftCover,
}
fn get_book(book: &Book) -> Option<String> {} // get_book should take a &Book and return an Option<String>
fn delete_book(book: Book) -> Result<(), String> {} // delete_book should take a Book and return a Result...
// TODO: impl block and make these functions methods...
fn check_book_type(book_type: &BookType) { // Let's make sure the match statement works
But you don't care about `get_book` and `delete_book` right now. This is where you can use `todo!()`. If you add that to the function, Rust will not complain, and will compile.
`todo!()` is actually the same as another macro: `unimplemented!()`. Programmers were using `unimplemented()` a lot but it was long to type, so they created `todo!()` which is shorter.
After `takes_a_string` takes `user_name`, you can't use it anymore. Here that is no problem: you can just give it `user_name.clone()`. But sometimes a variable is part of a struct, and maybe you can't clone the struct. Or maybe the `String` is really long and you don't want to clone it. These are some reasons for `Rc`, which lets you have more than one owner. An `Rc` is like a good office worker: `Rc` writes down who has ownership, and how many. Then once the number of owners goes down to 0, the variable can disappear.
Here's how you use an `Rc`. First imagine two structs: one called `City`, and another called `Cities`. `City` has information for one city, and `Cities` puts all the cities together in `Vec`s.
Of course, it doesn't work because `canada_cities` now owns the data and `calgary` doesn't. We can clone the name: `names: vec![calgary.name.clone()]` but we don't want to clone the `city_history`, which is long. So we can use an `Rc`.
So if there are strong pointers, are there weak pointers? Yes, there are. Weak pointers are useful because if two Rcs point at each other, they can't die. This is called a "reference cycle". If item 1 has an Rc to item 2, and item 2 has an Rc to item 1, they can't get to 0. In this case you want to use weak references. `Rc` will count the references, but if it only has weak references then it can die. You use `Rc::downgrade(&item)` instead of `Rc::clone(&item)` to make weak references. Also, you use `Rc::weak_count(&item)` to see the weak count.
If you use multiple threads, you can do many things at the same time. Rust uses threads that are called "OS threads". OS thread means the operating system creates the thread on a different core.
You create threads with `std::thread::spawn` and then a closure to tell it what to do. Threads are interesting because they run at the same time. Here is a simple example:
If you run this, it will be different every time. Sometimes it will print, and sometimes it won't print. That is because sometimes `main()` finishes before the thread finishes. And when `main()` finishes, the program is over. This is easier to see in a `for` loop:
But that is a silly way to give the threads time to finish. The better way is to bind the threads to a variable. If you add `let`, then you will create a `JoinHandle`.
`handle` is now a `JoinHandle`, and it has the method `.join()`. This method means "wait until all the threads are done" (it waits for the threads to join it). So now just write `handle.join()` and it will wait for all ten threads to finish.
A closure will try to use `Fn` if it can. But if it needs to change the value it will use `FnMut`, and if it needs to take the whole value, it will use `FnOnce`. `FnOnce` is a good name because it explains what it does: it takes the value once, and then it can't take it again.
It is a long message, but helpful: it says to ``use the `move` keyword``. The problem is that we can do anything to `my_string` while the thread is using it. That would be unsafe.
You remember that we used an `Rc` to give a variable more than one owner. If we are doing the same thing in a thread, we need an `Arc`. `Arc` means "atomic reference counter". Atomic means that it uses the computer's processor so that data only gets written once each time. This is important because if two threads write data at the same time, you will get the wrong result. For example, imagine if you could do this in Rust:
An `Arc` uses the processor to make sure this doesn't happen, so it is the method you must use when you have threads. You don't want an `Arc` for just one thread though, because `Rc` is a bit faster.
Now let's make one more thread. Each thread will do the same thing. You can see that the threads are working at the same time. Sometimes it will say `Thread 1 is working!` first, but other times `Thread 1 is working!` is first. This is called **concurrency**, which means "running together".
Now we want to change the value of `my_number`. Right now it is an `i32`. We will change it to an `Arc<Mutex<i32>>`: an `i32` that can be changed, protected by an `Arc`.
This looks complicated but `Arc<Mutex>>` is used very often in Rust, so it becomes natural. Also, you can always write your code to make it cleaner. Here is the same code with one more `use` statement and two functions. The functions don't do anything new, but they move some code out of `main()`. You can try rewriting code like this if it is hard to read.
You can also make a vector of handles, and use `.join().unwrap()` on them. Then you can do what you want with each handle inside. Here is `main()` with the handles in a vector:
A channel is an easy way to use many threads that send to one place. You can create a channel in Rust with `std::sync::mpsc`. `mpsc` means "multiple producer, single consumer", so "many threads sending to one place". To start a channel, you use `channel()`. This creates a `Sender` and a `Receiver` that are tied together. You can see this in the function signature:
So you have to choose one name for the sender and one for the receiver. Usually you see `let (sender, receiver) = channel();` to start. Because it's generic, Rust won't know the type if that is all you write:
Now the compiler knows the type. `sender` is a `Result<(), SendError<i32>>` and `receiver` is a `Result<i32, RecvError`. So you can use `unwrap` to see if the sending works, or use better error handling. Let's add `.unwrap()` and also `println!` to see what we get:
A `channel` is like an `Arc` because you can clone it and send the clones into other threads. Let's make two threads and send values to `receiver`. This code will work, but it is not exactly what we want.
The two threads start sending, and then we `println!`. Sometimes it will say `Send a &str this time` and sometimes it will say `And here is another &str`. Let's make a join handle to make them wait.
Now let's pretend that we have a lot of work to do, and want to use threads. We have a big vec with 10,000 items, all 0. We want to change each 0 to a 1. We will use ten threads, and each thread will do one tenth of the work. We will create a new vec and use `.extend()` to put the work in.
It is important to know how to read documentation in Rust so you can understand what other people wrote. Here are some things to know in Rust documentation:
You saw that `assert_eq!` is used when doing testing. You put two items inside the function and the program will panic if they are not equal. Here is a simple example where we need an even number.
Maybe you don't have any plans to use `assert_eq!` in your code, but it is everywhere in Rust documentation. This is because in a document you would need a lot of room to `println!` everything. Also, you would require `Display` or `Debug` for the things you want to print. That's why documentation has `assert_eq!` everywhere. Here is an example from here [https://doc.rust-lang.org/std/vec/struct.Vec.html](https://doc.rust-lang.org/std/vec/struct.Vec.html) showing how to use a Vec:
In these examples, you can just think of `assert_eq!(a, b)` as saying "a is b". Now look at the same example with comments on the right. The comments show what it actually means.
The top bar of a Rust document is the search bar. It shows you results as you type. When you go down a page you can't see the search bar anymore, but if you press `<kbd>s</kbd>` on the keyboard you can search again. So pressing `<kbd>s</kbd>` anywhere lets you search right away.
Usually the code for a method, struct, etc. will not be complete. This is because you don't usually need to see the full source to know how it works, and the full code can be confusing. But if you want to know more, you can click on [src] and see everything. For example, on the page for `String` you can see this signature:
Interesting! Now you can see that a String is a kind of `Vec`. And actually a `String` is a vector of `u8` bytes, which is interesting to know. But you don't need to know that to use the `with_capacity` method so you only see it if you click [src]. So clicking on [src] is a good idea if the document doesn't have much detail and you want to know more.
The important part of the documentation for a trait is "Required Methods" on the left. If you see Required Methods, it probabl means that you have to write the method yourself. For example, for `Iterator` you need to write the `.next()` method. And for `From` you need to write the `.from()` method. But some traits can be implemented with just an **attribute**, like we see in `#[derive(Debug)]`. `Debug` needs the `.fmt()` method, but usually you just use `#[derive(Debug)]` unless you want to do it yourself. That's why the page on `std::fmt::Debug` says that "Generally speaking, you should just derive a Debug implementation."
`Box` is a very convenient type in Rust. When you use a `Box`, you can put a type on the heap instead of the stack. To make a new `Box`, just use `Box::new()` and put the item inside.
```rust
fn just_takes_a_variable<T>(item: T) {} // Takes anything and drops it.
just_takes_a_variable(my_number); // Using this function twice is no problem, because it's Copy
let my_box = Box::new(1); // This is a Box<i32>
just_takes_a_variable(my_box.clone()); // WIthout .clone() the second function would make an error
just_takes_a_variable(my_box); // because Box is not Copy
}
```
At first it is hard to imagine where to use it, but you use it in Rust a lot. You remember that `&` is used for `str` because the compiler doesn't know the size of a `str`: it can be any length. But the `&` reference is always the same length, so the compiler can use it. `Box` is similar. Also, you can use `*` on a `Box` to get to the value, just like with `&`:
```rust
fn main() {
let my_box = Box::new(1); // This is a Box<i32>
let an_integer = *my_box; // This is an i32
println!("{:?}", my_box);
println!("{:?}", an_integer);
}
```
You can also use a Box to create structs with the same struct inside. These are called *recursive*, which means that inside Struct A is maybe another Struct A. Sometimes programmers use these types to create lists, although this type of list is not very popular in Rust. But if you want to create a recursive struct, you can use a `Box`. Here's what happens if you try without a `Box`:
```rust
struct List {
item: Option<List>,
}
```
This simple `List` has one item, that may be `Some<List>` (another list), or `None`. Because you can choose `None`, it will not be recursive forever. But the compiler still doesn't know the size:
error[E0072]: recursive type `List` has infinite size
--> src\main.rs:16:1
|
16 | struct List {
| ^^^^^^^^^^^ recursive type has infinite size
17 | item: Option<List>,
| ------------------ recursive without indirection
|
= help: insert indirection (e.g., a `Box`, `Rc`, or `&`) at some point to make `List` representable
```
You can see that it even suggests trying a `Box`. So let's put a `Box` around List:
```rust
struct List {
item: Option<Box<List>>,
}
```
Now the compiler is fine with the `List`, because everything is behind a `Box`, and it knows the size of a `Box`. Then a very simple list might look like this:
```rust
struct List {
item: Option<Box<List>>,
}
impl List {
fn new() -> List {
List {
item: Some(Box::new(List { item: None })),
}
}
}
fn main() {
let mut my_list = List::new();
}
```
Even without data it is a bit complicated, and Rust does not use this type of pattern very much. This is because Rust has strict rules on borrowing and ownership, as you know. But if you want to start a list like this (a linked list), `Box` can help.