This part opens a new chapter in the [linux-insides](https://proninyaroslav.gitbooks.io/linux-insides-ru/content/) book. Timers and time management were described in the previous [chapter](https://proninyaroslav.gitbooks.io/linux-insides-ru/content/Timers/index.html). This chapter describes [synchronization](https://en.wikipedia.org/wiki/Synchronization_%28computer_science%29) primitives in the Linux kernel.
As always, before we consider anything synchronization related, we will look at the concept of a `synchronization primitive` in general. A synchronization primitive is a software mechanism which lets two or more [parallel](https://en.wikipedia.org/wiki/Parallel_computing) processes or threads coordinate, for example so that they do not execute the same segment of code simultaneously (such as writing different values to a shared variable). Let's look at the following piece of code:
from the [kernel/time/clocksource.c](https://github.com/torvalds/linux/blob/master/kernel/time/clocksource.c) source code file. This code is from the `__clocksource_register_scale` function, which adds the given [clocksource](https://proninyaroslav.gitbooks.io/linux-insides-ru/content/Timers/timers-2.html) to a list shared by more than one process. It's the `mutex_lock` and `mutex_unlock` functions we're interested in here. Each takes one parameter, the `clocksource_mutex` in our case. These functions provide [mutual exclusion](https://en.wikipedia.org/wiki/Mutual_exclusion) with the mutex synchronization primitive, enabling processes or threads to coordinate use of a shared resource. The clocksource is added with the `clocksource_enqueue` function (after the clock source in the list which has the biggest rating, the highest frequency clocksource registered in the system):
If two parallel processes execute this function simultaneously, some [nasty things](https://en.wikipedia.org/wiki/Race_condition) can happen. In the code above, the clocksource is added to a sorted list: the correct place to add the clocksource is found, and then the entry is added. Two processes executing this code at the same time may choose the same place to add an entry, and the second process calling `list_add` may unintentionally overwrite the clocksource just added by the first process.
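The find-then-insert pattern described above can be sketched in user space. This is only an illustration under stated assumptions: it uses POSIX `pthread_mutex_t` rather than the kernel's mutex, and `struct demo_source`, `demo_enqueue`, `demo_register` and `demo_mutex` are hypothetical names that do not appear in the kernel sources.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a clocksource: only what the sketch needs. */
struct demo_source {
    int rating;
    struct demo_source *next;
};

/* Shared list, kept sorted with the highest rating first. */
static struct demo_source *demo_list;
static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Find the insertion point, then link the entry in. On its own this is
 * not safe: two concurrent callers may pick the same slot and one of
 * the entries would be lost.
 */
static void demo_enqueue(struct demo_source *src)
{
    struct demo_source **pos = &demo_list;

    /* Skip every entry whose rating is at least as high as ours. */
    while (*pos && (*pos)->rating >= src->rating)
        pos = &(*pos)->next;

    src->next = *pos;
    *pos = src;
}

/* The search and the insertion run as one critical section. */
void demo_register(struct demo_source *src)
{
    pthread_mutex_lock(&demo_mutex);
    demo_enqueue(src);
    pthread_mutex_unlock(&demo_mutex);
}
```

Because every caller goes through `demo_register`, only one of them at a time can be between the search and the insertion, which is the property the mutex in the kernel code provides.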
Synchronization primitives are ubiquitous in the Linux kernel. A quick look through any of the chapters of this book will demonstrate their extensive use. The following set of synchronization primitives is provided:
A process attempting to acquire or release a `spinlock` must write the associated value to the spinlock variable. A process trying to execute code protected by a `spinlock` which another process has already acquired will wait until the spinlock is released. To safely acquire or release a spinlock, the write operation performed on the spinlock must be [atomic](https://en.wikipedia.org/wiki/Linearizability) to prevent [race conditions](https://en.wikipedia.org/wiki/Race_condition). The `spinlock` is represented by the `spinlock_t` type in the Linux kernel. If we look at the Linux kernel code, we will see that this type is [widely](http://lxr.free-electrons.com/ident?i=spinlock_t) used. The `spinlock_t` is defined as:
and located in the [include/linux/spinlock_types.h](https://github.com/torvalds/linux/blob/master/include/linux/spinlock_types.h) header file. We may see that its implementation depends on the state of the `CONFIG_DEBUG_LOCK_ALLOC` kernel configuration option. We will skip this for now, because all debugging related stuff will be at the end of this part. So, if the `CONFIG_DEBUG_LOCK_ALLOC` kernel configuration option is disabled, the `spinlock_t` contains a [union](https://en.wikipedia.org/wiki/Union_type#C.2FC.2B.2B) with one field, `raw_spinlock`:
where the `arch_spinlock_t` represents the architecture-specific `spinlock` implementation. The `break_lock` field is set to `1` when one processor starts to wait while the lock is held by another processor (on [SMP](https://en.wikipedia.org/wiki/Symmetric_multiprocessing) systems). This helps prevent locks of excessive duration. We focus on the [x86_64](https://en.wikipedia.org/wiki/X86-64) architecture in this book, so the `arch_spinlock_t` is defined in the [arch/x86/include/asm/spinlock_types.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/spinlock_types.h) header file and looks like:
The definition of the `arch_spinlock` structure depends on the value of the `CONFIG_QUEUED_SPINLOCKS` kernel configuration option. This configuration option provides a specialized spinlock with a queue. Instead of the `acquired` and `released` [atomic](https://en.wikipedia.org/wiki/Linearizability) values, this special type of `spinlock` uses `atomic` operations on a `queue`. If the `CONFIG_QUEUED_SPINLOCKS` kernel configuration option is enabled, the `arch_spinlock_t` will be represented by the following structure:
and the implementation of the `spin_lock_init` macro (from [include/linux/spinlock.h](https://github.com/torvalds/linux/blob/master/include/linux/spinlock.h)):
Here `spinlock_check` just returns the `raw_spinlock_t` of the given `spinlock`, ensuring that a `normal` raw spinlock has been provided as an argument:
assigns the value of `__RAW_SPIN_LOCK_UNLOCKED` to the given `spinlock`. As the name of the `__RAW_SPIN_LOCK_UNLOCKED` macro suggests, it initializes the given `spinlock` and sets it to the `released` state. This macro is defined in the [include/linux/spinlock_types.h](https://github.com/torvalds/linux/blob/master/include/linux/spinlock_types.h) header file and expands to the following macros:
So here we see (for the [x86_64](https://en.wikipedia.org/wiki/X86-64) architecture with the `CONFIG_QUEUED_SPINLOCKS` kernel configuration option enabled) that the `spin_lock_init` macro simply initializes a given `spinlock` atomically with the value 0 (corresponding to an `unlocked` state).
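To make the idea of a lock variable that starts at 0 and is changed atomically more concrete, here is a minimal user-space sketch built on C11 `<stdatomic.h>`. It is not the kernel implementation: `struct demo_spinlock` and the `demo_spin_*` functions are hypothetical names, and the sketch assumes the simplest possible encoding, where 0 means released and 1 means acquired.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: 0 means released, 1 means acquired. */
struct demo_spinlock {
    atomic_int val;
};

/* Put the lock into the released state. */
static void demo_spin_lock_init(struct demo_spinlock *lock)
{
    atomic_store(&lock->val, 0);
}

/* One attempt: atomically change 0 to 1. Returns true on success. */
static bool demo_spin_trylock(struct demo_spinlock *lock)
{
    int expected = 0;

    return atomic_compare_exchange_strong(&lock->val, &expected, 1);
}

/* Busy-wait until an attempt succeeds. */
static void demo_spin_lock(struct demo_spinlock *lock)
{
    while (!demo_spin_trylock(lock))
        ;
}

/* Release the lock by atomically writing 0 back. */
static void demo_spin_unlock(struct demo_spinlock *lock)
{
    atomic_store(&lock->val, 0);
}
```

The compare-and-exchange makes the check and the write a single atomic step, so two callers can never both observe 0 and both believe they own the lock.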
Now that we know how a `spinlock` is initialized, let's consider the [API](https://en.wikipedia.org/wiki/Application_programming_interface) which the Linux kernel provides for operating on `spinlocks`. Starting with:
the function used to `acquire` a spinlock, from [include/linux/spinlock.h](https://github.com/torvalds/linux/blob/master/include/linux/spinlock.h). The `raw_spin_lock` macro is defined in the same header file and expands to the call of the `_raw_spin_lock` function:
If [SMP](https://en.wikipedia.org/wiki/Symmetric_multiprocessing) is enabled in the Linux kernel, the `_raw_spin_lock` macro is defined in the [include/linux/spinlock_api_smp.h](https://github.com/torvalds/linux/blob/master/include/linux/spinlock_api_smp.h) header file and looks like:
This function first disables [preemption](https://en.wikipedia.org/wiki/Preemption_%28computing%29) by calling the `preempt_disable` macro (from [include/linux/preempt.h](https://github.com/torvalds/linux/blob/master/include/linux/preempt.h), more about this in [part](https://proninyaroslav.gitbooks.io/linux-insides-ru/content/Initialization/linux-initialization-9.html) nine of the Linux kernel initialization process chapter). When we unlock the given `spinlock`, preemption will be re-enabled:
We need to do this because, while a process is spinning on a lock, other processes must be prevented from preempting the process which acquired the lock. The `spin_acquire` macro, through a chain of other macros, expands to a call of:
The `lock_acquire` function disables hardware interrupts by calling the `raw_local_irq_save` macro. This is to ensure the process is not preempted until the `spinlock` has been safely acquired. Hardware interrupts are re-enabled with the `raw_local_irq_restore` macro before the function exits.
The interesting work here is being done by the `__lock_acquire` function (defined in [kernel/locking/lockdep.c](https://github.com/torvalds/linux/blob/master/kernel/locking/lockdep.c)), which is large, so we won't get into it right away. It's mostly related to the Linux kernel [lock validator](https://www.kernel.org/doc/Documentation/locking/lockdep-design.txt).
The `LOCK_CONTENDED` macro is defined in the [include/linux/lockdep.h](https://github.com/torvalds/linux/blob/master/include/linux/lockdep.h) header file as:
In our case, the `lock` is the `do_raw_spin_lock` function from [include/linux/spinlock.h](https://github.com/torvalds/linux/blob/master/include/linux/spinlock.h) and the `_lock` is the given `raw_spinlock_t`:
`__acquire` here is just a [sparse](https://en.wikipedia.org/wiki/Sparse) related macro and is not immediately interesting. The definition of the `arch_spin_lock` function depends on the architecture of the system and on whether queued spinlocks are supported.
For the `x86_64` architecture without queued spinlocks (we'll get there later) it's defined in [arch/x86/include/asm/spinlock.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/spinlock.h). Let's quickly look at the definition of the `arch_spinlock` structure again:
This variant of the `spinlock` is called a [ticket spinlock](https://en.wikipedia.org/wiki/Ticket_lock). A process wanting to acquire the spinlock will increment `tail`. If the ticket it drew (the previous value of `tail`) is not equal to `head`, the process will spin, waiting for `head` to reach its ticket. Let's look at the implementation of the `arch_spin_lock` function:
The function performs an [xadd](http://x86.renejeschke.de/html/file_module_x86_id_327.html) (exchange and add) on `inc` and `lock->tickets`: this stores the previous value of `lock->tickets` of the given `lock` in `inc` and increments `tickets.tail` by the previous value of `inc` (one). If the `head` and `tail` now held in `inc` are equal, the lock has been acquired and the function exits with `goto out`.
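The ticket mechanism can be sketched in user space with C11 atomics, where `atomic_fetch_add` plays the role of `xadd`: it returns the old value and adds to the variable in one atomic step. This is a simplified illustration, not the kernel code. The names `struct demo_ticket_lock` and `demo_ticket_*` are hypothetical, and unlike the kernel the sketch keeps `head` and `tail` in two separate variables instead of packing them into one word. The release function is included only so the sketch is complete.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical ticket lock with head and tail kept as separate words. */
struct demo_ticket_lock {
    atomic_uint head;   /* ticket currently being served */
    atomic_uint tail;   /* next ticket to hand out */
};

static void demo_ticket_init(struct demo_ticket_lock *lock)
{
    atomic_store(&lock->head, 0);
    atomic_store(&lock->tail, 0);
}

/* Draw a ticket: return the old tail and increment it atomically. */
static unsigned int demo_ticket_take(struct demo_ticket_lock *lock)
{
    return atomic_fetch_add(&lock->tail, 1);
}

/* Draw a ticket, then spin until head reaches it. */
static void demo_ticket_acquire(struct demo_ticket_lock *lock)
{
    unsigned int ticket = demo_ticket_take(lock);

    while (atomic_load(&lock->head) != ticket)
        ;
}

/* Serve the next ticket in line. */
static void demo_ticket_release(struct demo_ticket_lock *lock)
{
    atomic_fetch_add(&lock->head, 1);
}
```

If the lock is free, the drawn ticket already equals `head` and `demo_ticket_acquire` returns immediately. Otherwise the caller waits its turn, and callers are served in the order in which they drew tickets.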
The `barrier` macro, called just before the function exits, ensures the compiler will not change the order of operations that access memory (more about memory barriers can be found in the kernel [documentation](https://www.kernel.org/doc/Documentation/memory-barriers.txt)).
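As a rough illustration of what a compiler barrier does, the sketch below uses a common GCC/Clang idiom: an empty inline assembly statement with a `memory` clobber. It assumes a compiler that supports GNU extended asm. `demo_barrier`, `demo_data`, `demo_ready` and `demo_publish` are hypothetical names. Note that this constrains only the compiler, not the CPU, so it is not a substitute for hardware memory barriers.

```c
#include <assert.h>

/*
 * Compiler-only barrier (GNU extended asm, an assumption of this sketch).
 * The "memory" clobber tells the compiler that memory may have changed,
 * so it cannot move memory accesses across this point.
 */
#define demo_barrier() __asm__ __volatile__("" ::: "memory")

static int demo_data;
static int demo_ready;

void demo_publish(int value)
{
    demo_data = value;
    demo_barrier();     /* the store above is emitted before the store below */
    demo_ready = 1;
}
```

Without the barrier, the compiler would be free to emit the two stores in either order, since they touch unrelated variables.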
The `spin_unlock` operation goes through a similar set of macros and functions as `spin_lock` (disabling hardware interrupts etc.) before the `arch_spin_unlock` function is called, which simply increments `arch_spinlock->ticket->head`:
In brief, the ticket spinlock is simply a fair queuing mechanism, where processes waiting to acquire a lock gain [first-in first-out](https://en.wikipedia.org/wiki/FIFO_(computing_and_electronics)) access. `head` contains an index number corresponding to the process currently holding the lock, and `tail` maps to the last process which queued to gain access to the lock:
We won't cover more of the `spinlock` API in this part, but hopefully the mechanism provided by the Linux kernel spinlock and the basics of its implementation on x86 are clear.
This concludes the first part covering synchronization primitives in the Linux kernel. In this part, we met the first synchronization primitive provided by the Linux kernel, the `spinlock`. In the next part we will continue to dive into this interesting theme and will see other `synchronization` related stuff.
If you have questions or suggestions, feel free to ping me on twitter [0xAX](https://twitter.com/0xAX), drop me an [email](anotherworldofworld@gmail.com) or just create an [issue](https://github.com/0xAX/linux-insides/issues/new).
**Please note that English is not my first language and I am really sorry for any inconvenience. If you find any mistakes, please send me a PR to [linux-insides](https://github.com/0xAX/linux-insides).**