Synchronization primitives in the Linux kernel. Part 1.
Introduction
This part opens a new chapter in the linux-insides book. The previous chapter described timers and time management; this chapter describes synchronization primitives in the Linux kernel.
As always, before we consider anything synchronization related, we will look at the concept of a synchronization primitive in general. A synchronization primitive is a software mechanism that lets two or more parallel processes or threads coordinate, for example so that they do not execute the same segment of code simultaneously (such as writing different values to a shared variable). Let's look at the following piece of code:
mutex_lock(&clocksource_mutex);
...
...
...
clocksource_enqueue(cs);
clocksource_enqueue_watchdog(cs);
clocksource_select();
...
...
...
mutex_unlock(&clocksource_mutex);
from the kernel/time/clocksource.c source code file. This code is from the __clocksource_register_scale function, which adds the given clocksource to a list shared by more than one process. It's the mutex_lock and mutex_unlock functions we're interested in here; each takes one parameter, the clocksource_mutex in our case. These functions provide mutual exclusion with the mutex synchronization primitive, enabling processes or threads to coordinate use of a shared resource. The clocksource is added with the clocksource_enqueue function (the list is kept sorted by rating, so the new clocksource is placed after the last registered clocksource whose rating is greater than or equal to its own):
static void clocksource_enqueue(struct clocksource *cs)
{
struct list_head *entry = &clocksource_list;
struct clocksource *tmp;
list_for_each_entry(tmp, &clocksource_list, list)
if (tmp->rating >= cs->rating)
entry = &tmp->list;
list_add(&cs->list, entry);
}
If two parallel processes execute this function simultaneously, some nasty things can happen. The function adds the clocksource to a sorted list: it first finds the correct place and then adds the entry. Two processes executing this code at the same time may choose the same place to add an entry, and the second process calling list_add may unintentionally overwrite the list entry just added by the first process.
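The find-then-insert pattern can be tried in user space. The following is a minimal sketch, not kernel code: the `struct source` type, the `source_list` variable and the `source_register` function are hypothetical names invented for this illustration, and a POSIX `pthread_mutex_t` stands in for the kernel mutex. The point is that the search and the insertion sit inside one critical section, so no other thread can change the list between the two steps.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct clocksource: only the rating matters here. */
struct source {
    int rating;
    struct source *next;
};

static struct source *source_list;   /* shared list, highest rating first */
static pthread_mutex_t source_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Find the insertion point and link the node. Callers must hold source_mutex. */
static void source_enqueue_locked(struct source *s)
{
    struct source **link = &source_list;

    /* Skip every entry whose rating is at least as high as the new one. */
    while (*link && (*link)->rating >= s->rating)
        link = &(*link)->next;

    s->next = *link;
    *link = s;
}

/* Search and insertion happen under one lock, so other threads see both or neither. */
void source_register(struct source *s)
{
    pthread_mutex_lock(&source_mutex);
    source_enqueue_locked(s);
    pthread_mutex_unlock(&source_mutex);
}
```

Without the lock and unlock calls, two threads could both pick the same `link` and the later store to `*link` would silently drop the earlier node.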
Synchronization primitives are ubiquitous in the Linux kernel. A quick look through any of the chapters of this book will demonstrate their extensive use. The following set of synchronization primitives is provided:
- mutex;
- semaphores;
- seqlocks;
- atomic operations;
- etc.
We will start this chapter with the spinlock.
Spinlocks in the Linux kernel.
The spinlock is a low-level synchronization mechanism; simply a variable with two possible states:
- acquired;
- released.
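The two states can be pictured with a toy user-space lock. This is a sketch under stated assumptions, not the kernel's implementation: the `toy_spinlock` type and its functions are hypothetical, and C11 `<stdatomic.h>` operations stand in for the architecture-specific atomic instructions.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space lock: 0 means released, 1 means acquired. */
struct toy_spinlock {
    atomic_int state;
};

enum { TOY_RELEASED = 0, TOY_ACQUIRED = 1 };

/* Try once: atomically write "acquired" and look at the value that was there before. */
bool toy_spin_trylock(struct toy_spinlock *lock)
{
    return atomic_exchange_explicit(&lock->state, TOY_ACQUIRED,
                                    memory_order_acquire) == TOY_RELEASED;
}

/* Busy-wait ("spin") until the exchange observes the released state. */
void toy_spin_lock(struct toy_spinlock *lock)
{
    while (!toy_spin_trylock(lock))
        ;   /* another thread holds the lock; keep trying */
}

/* Publish the released state so that a waiting thread can take the lock. */
void toy_spin_unlock(struct toy_spinlock *lock)
{
    atomic_store_explicit(&lock->state, TOY_RELEASED, memory_order_release);
}
```

The exchange is what makes acquisition safe: only one thread can observe the transition from released to acquired, however many try at the same moment.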
A process attempting to acquire or release a spinlock must write the corresponding value to the spinlock variable. A process trying to execute code protected by a spinlock that another process has already acquired will wait (spin) until the spinlock is released. To acquire or release a spinlock safely, the write to the spinlock variable must be atomic, which prevents race conditions. The spinlock is represented by the spinlock_t type in the Linux kernel. If we look at the Linux kernel code, we will see that this type is widely used. The spinlock_t is defined as:
typedef struct spinlock {
union {
struct raw_spinlock rlock;
#ifdef CONFIG_DEBUG_LOCK_ALLOC
# define LOCK_PADSIZE (offsetof(struct raw_spinlock, dep_map))
struct {
u8 __padding[LOCK_PADSIZE];
struct lockdep_map dep_map;
};
#endif
};
} spinlock_t;
and located in the include/linux/spinlock_types.h header file. We can see that its implementation depends on the state of the CONFIG_DEBUG_LOCK_ALLOC kernel configuration option. We will skip this for now, because all debugging related material comes at the end of this part. So, if the CONFIG_DEBUG_LOCK_ALLOC kernel configuration option is disabled, the spinlock_t contains a union with one field, a raw_spinlock:
typedef struct spinlock {
union {
struct raw_spinlock rlock;
};
} spinlock_t;
The raw_spinlock
structure is defined in the same header file:
typedef struct raw_spinlock {
arch_spinlock_t raw_lock;
#ifdef CONFIG_GENERIC_LOCKBREAK
unsigned int break_lock;
#endif
} raw_spinlock_t;
where arch_spinlock_t represents the architecture-specific spinlock implementation. The break_lock field is set to 1 when one processor starts to wait while the lock is held by another processor (on SMP systems). This helps prevent locks from being held for an excessive duration. We focus on the x86_64 architecture in this book, so the arch_spinlock_t is defined in the arch/x86/include/asm/spinlock_types.h header file and looks like:
#ifdef CONFIG_QUEUED_SPINLOCKS
#include <asm-generic/qspinlock_types.h>
#else
typedef struct arch_spinlock {
union {
__ticketpair_t head_tail;
struct __raw_tickets {
__ticket_t head, tail;
} tickets;
};
} arch_spinlock_t;
#endif
The definition of the arch_spinlock structure depends on the value of the CONFIG_QUEUED_SPINLOCKS kernel configuration option. This option provides a specialized spinlock with a queue: instead of atomically switching a value between acquired and released, this type of spinlock uses atomic operations on a queue. If the CONFIG_QUEUED_SPINLOCKS kernel configuration option is enabled, the arch_spinlock_t is represented by the following structure:
typedef struct qspinlock {
atomic_t val;
} arch_spinlock_t;
from the include/asm-generic/qspinlock_types.h header file.
We will not dwell on these structures for now. Before we consider both arch_spinlock and qspinlock, let's look at the operations that can be performed on a spinlock:
- spin_lock_init - initializes the given spinlock;
- spin_lock - acquires the given spinlock;
- spin_lock_bh - disables software interrupts and acquires the given spinlock;
- spin_lock_irqsave and spin_lock_irq - disable interrupts on the local processor and acquire the given spinlock; spin_lock_irqsave preserves the previous interrupt state in flags, spin_lock_irq does not;
- spin_unlock - releases the given spinlock;
- spin_unlock_bh - releases the given spinlock and enables software interrupts;
- spin_is_locked - returns the state of the given spinlock;
- etc.
Let's start with the implementation of the spin_lock_init macro (from include/linux/spinlock.h):
#define spin_lock_init(_lock) \
do { \
spinlock_check(_lock); \
raw_spin_lock_init(&(_lock)->rlock); \
} while (0)
Here spinlock_check just returns the raw_spinlock_t of the given spinlock. Because its parameter is typed as spinlock_t *, calling it also lets the compiler check that a normal spinlock, and not a raw one, has been passed as the argument:
static __always_inline raw_spinlock_t *spinlock_check(spinlock_t *lock)
{
return &lock->rlock;
}
And the raw_spin_lock_init
macro:
# define raw_spin_lock_init(lock) \
do { \
*(lock) = __RAW_SPIN_LOCK_UNLOCKED(lock); \
} while (0)
assigns the value of __RAW_SPIN_LOCK_UNLOCKED to the given spinlock. As the name of the __RAW_SPIN_LOCK_UNLOCKED macro suggests, it initializes the given spinlock and sets it to the released state. This macro is defined in the include/linux/spinlock_types.h header file and expands to the following macros:
#define __RAW_SPIN_LOCK_UNLOCKED(lockname) \
(raw_spinlock_t) __RAW_SPIN_LOCK_INITIALIZER(lockname)
#define __RAW_SPIN_LOCK_INITIALIZER(lockname) \
{ \
.raw_lock = __ARCH_SPIN_LOCK_UNLOCKED, \
SPIN_DEBUG_INIT(lockname) \
SPIN_DEP_MAP_INIT(lockname) \
}
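These macros combine two C idioms: a brace-enclosed initializer turned into a value by a cast (a compound literal), and a do { } while (0) wrapper that makes a multi-statement macro behave like a single statement. The following sketch shows both in isolation. All `toy_` and `TOY_` names are hypothetical stand-ins chosen for this example; they are not kernel identifiers.

```c
/* Hypothetical stand-ins for arch_spinlock_t and raw_spinlock_t. */
typedef struct { unsigned int slock; } toy_arch_lock_t;
typedef struct { toy_arch_lock_t raw_lock; } toy_raw_lock_t;

#define TOY_ARCH_LOCK_UNLOCKED   { 0 }

/* Brace-enclosed initializer with a designated field, usable in a definition. */
#define TOY_RAW_LOCK_INITIALIZER { .raw_lock = TOY_ARCH_LOCK_UNLOCKED }

/* The cast turns the initializer into a compound literal, i.e. a value of the type. */
#define TOY_RAW_LOCK_UNLOCKED    (toy_raw_lock_t) TOY_RAW_LOCK_INITIALIZER

/* do { } while (0) makes the macro act as one statement, even after "if". */
#define toy_raw_lock_init(lock)                 \
    do {                                        \
        *(lock) = TOY_RAW_LOCK_UNLOCKED;        \
    } while (0)
```

The initializer form can only appear in a definition such as `toy_raw_lock_t l = TOY_RAW_LOCK_INITIALIZER;`, while the compound literal form can be assigned to an existing object at run time, which is what an init macro needs.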
We won't consider the debugging related SPIN_DEBUG_INIT and SPIN_DEP_MAP_INIT macros. Without them, the assignment of __RAW_SPIN_LOCK_UNLOCKED in spin_lock_init effectively becomes:
*(&(_lock)->rlock) = __ARCH_SPIN_LOCK_UNLOCKED;
where the __ARCH_SPIN_LOCK_UNLOCKED
is:
#define __ARCH_SPIN_LOCK_UNLOCKED { { 0 } }
for the ticket spinlock, or, if the CONFIG_QUEUED_SPINLOCKS kernel configuration option is enabled:
#define __ARCH_SPIN_LOCK_UNLOCKED { ATOMIC_INIT(0) }
So here we see (for the x86_64 architecture with the CONFIG_QUEUED_SPINLOCKS kernel configuration option enabled) that the spin_lock_init macro simply initializes the given spinlock with the atomic value 0 (corresponding to the unlocked state).
Now that we know how a spinlock is initialized, let's consider the API which the Linux kernel provides for operating on spinlocks. Starting with:
static __always_inline void spin_lock(spinlock_t *lock)
{
raw_spin_lock(&lock->rlock);
}
the function used to acquire
a spinlock, from include/linux/spinlock.h. The raw_spin_lock
macro is defined in the same header file and expands to the call of the _raw_spin_lock
function:
#define raw_spin_lock(lock) _raw_spin_lock(lock)
The definition of the _raw_spin_lock
macro depends on the CONFIG_SMP
kernel configuration parameter:
#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK)
# include <linux/spinlock_api_smp.h>
#else
# include <linux/spinlock_api_up.h>
#endif
If SMP is enabled in the Linux kernel, the _raw_spin_lock macro is defined in the include/linux/spinlock_api_smp.h header file and looks like:
#define _raw_spin_lock(lock) __raw_spin_lock(lock)
The __raw_spin_lock function looks like:
static inline void __raw_spin_lock(raw_spinlock_t *lock)
{
preempt_disable();
spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
}
This function first of all disables preemption by calling the preempt_disable macro (from include/linux/preempt.h; more about this in part nine of the Linux kernel initialization process chapter). When we unlock the given spinlock, preemption will be re-enabled:
static inline void __raw_spin_unlock(raw_spinlock_t *lock)
{
...
...
...
preempt_enable();
}
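The pairing of preempt_disable in the lock path with preempt_enable in the unlock path can be modeled as a nesting counter. The sketch below is a deliberately simplified user-space model, assuming a single task and a hypothetical `toy_schedule` function; all `toy_` names are invented here and the real kernel bookkeeping is considerably more involved.

```c
#include <stdbool.h>

/* Hypothetical model of per-task state; the names are invented for this sketch. */
static int toy_preempt_count;      /* greater than 0 means "do not preempt me" */
static bool toy_need_resched;      /* set when a reschedule has been requested */
static int toy_reschedules;        /* how many times the toy scheduler ran */

static void toy_schedule(void)
{
    toy_need_resched = false;
    toy_reschedules++;
}

/* Disabling nests: each call must be paired with one toy_preempt_enable(). */
void toy_preempt_disable(void)
{
    toy_preempt_count++;
}

/* Only the outermost enable may hand the CPU to another task. */
void toy_preempt_enable(void)
{
    if (--toy_preempt_count == 0 && toy_need_resched)
        toy_schedule();
}
```

Because the counter nests, a task holding two locks stays non-preemptible until the second unlock brings the count back to zero.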
Preemption has to stay disabled for as long as the lock is held: while one process is spinning on a lock, other processes must be prevented from preempting the process which acquired it. Next comes the spin_acquire macro, which through a chain of other macros expands to a call of the lock_acquire function:
#define spin_acquire(l, s, t, i) lock_acquire_exclusive(l, s, t, NULL, i)
#define lock_acquire_exclusive(l, s, t, n, i) lock_acquire(l, s, t, 0, 1, n, i)
Let's look at the lock_acquire function:
void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
int trylock, int read, int check,
struct lockdep_map *nest_lock, unsigned long ip)
{
unsigned long flags;
if (unlikely(current->lockdep_recursion))
return;
raw_local_irq_save(flags);
check_flags(flags);
current->lockdep_recursion = 1;
trace_lock_acquire(lock, subclass, trylock, read, check, nest_lock, ip);
__lock_acquire(lock, subclass, trylock, read, check,
irqs_disabled_flags(flags), nest_lock, ip, 0, 0);
current->lockdep_recursion = 0;
raw_local_irq_restore(flags);
}
The lock_acquire function disables hardware interrupts on the local processor by calling the raw_local_irq_save macro, so that the bookkeeping done here cannot be interrupted. The previous interrupt state is restored with the raw_local_irq_restore macro before the function exits.
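Besides saving and restoring the interrupt state, lock_acquire shows a common guard against re-entrancy: it returns immediately when current->lockdep_recursion is set, and sets that field around the call to __lock_acquire. A minimal user-space sketch of the same pattern follows. The `toy_` names and the thread-local flag are assumptions of this example, standing in for the per-task field.

```c
/* Hypothetical per-thread flag, standing in for current->lockdep_recursion. */
static _Thread_local int toy_recursion;
static int toy_events_recorded;

static void toy_record_event(int depth);

/* Bookkeeping that may itself trigger the hook again (simulated by 'depth'). */
static void toy_bookkeeping(int depth)
{
    toy_events_recorded++;
    if (depth > 0)
        toy_record_event(depth - 1);   /* re-entrant call, ignored by the guard */
}

/* Entry point: returns early when called from inside its own bookkeeping. */
static void toy_record_event(int depth)
{
    if (toy_recursion)
        return;

    toy_recursion = 1;
    toy_bookkeeping(depth);
    toy_recursion = 0;
}
```

However deep the simulated nesting is, only the outermost call does any work, which keeps the instrumentation from instrumenting itself.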
The interesting work here is being done by the __lock_acquire
function (defined in kernel/locking/lockdep.c), which is large, so we won't get into it right away. It's mostly related to the Linux kernel lock validator.
Returning to the __raw_spin_lock
function, we will see that it contains the following:
LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
The LOCK_CONTENDED macro is defined in the include/linux/lockdep.h header file and, when lock statistics (CONFIG_LOCK_STAT) are disabled, is simply:
#define LOCK_CONTENDED(_lock, try, lock) \
lock(_lock)
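The macro above receives two function names as arguments and simply calls the second one. How such a macro substitutes its arguments can be tried outside the kernel. In the sketch below every `toy_` and `TOY_` name is hypothetical, and the counting variant is an assumption made for illustration of why a try function might be passed at all; it is not presented as the kernel's definition.

```c
#include <stdbool.h>

struct toy_lock { bool held; int contended; };

/* Hypothetical fast path: succeeds only if the lock is free. */
static bool toy_trylock(struct toy_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

/* Hypothetical slow path; a real one would wait, this sketch just takes the lock. */
static void toy_lock_slow(struct toy_lock *l)
{
    l->held = true;
}

/* Same shape as the definition above: 'try' is accepted but unused. */
#define TOY_LOCK_SIMPLE(_lock, try, lock) \
    lock(_lock)

/* Assumed instrumented variant: try first, count a contention event on failure. */
#define TOY_LOCK_COUNTING(_lock, try, lock)     \
    do {                                        \
        if (!try(_lock)) {                      \
            (_lock)->contended++;               \
            lock(_lock);                        \
        }                                       \
    } while (0)
```

Passing both functions at every call site lets one configuration switch change what all lock acquisitions do without touching the callers.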
In our case, lock is the do_raw_spin_lock function from include/linux/spinlock.h and _lock is the given raw_spinlock_t:
static inline void do_raw_spin_lock(raw_spinlock_t *lock) __acquires(lock)
{
__acquire(lock);
arch_spin_lock(&lock->raw_lock);
}
__acquire
here is just sparse related macro and is not immediately interesting. The definition of the arch_spin_lock
function is dependant on the architecture of system and whether queued spinlocks are supported.
For the x86_64
architecture without queued spin locks (we'll get there later) it's defined in arch/x86/include/asm/spinlock.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/spinlock.h). Let's quickly look at the definition of the arch_spinlock
structure again:
typedef struct arch_spinlock {
union {
__ticketpair_t head_tail;
struct __raw_tickets {
__ticket_t head, tail;
} tickets;
};
} arch_spinlock_t;
This variant of the spinlock is called a ticket spinlock. A process wanting to acquire the spinlock increments tail and keeps the previous value as its ticket. If that ticket is not equal to head, the process spins, waiting for head to reach the matching value. Let's look at the implementation of the arch_spin_lock function:
static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
{
register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };
inc = xadd(&lock->tickets, inc);
if (likely(inc.head == inc.tail))
goto out;
for (;;) {
unsigned count = SPIN_THRESHOLD;
do {
inc.head = READ_ONCE(lock->tickets.head);
if (__tickets_equal(inc.head, inc.tail))
goto clear_slowpath;
cpu_relax();
} while (--count);
__ticket_lock_spinning(lock, inc.tail);
}
clear_slowpath:
__ticket_check_and_clear_slowpath(lock, inc.head);
out:
barrier();
}
At the beginning of the arch_spin_lock function, a __raw_tickets structure is initialized with tail = 1:
#define __TICKET_LOCK_INC 1
Next, xadd (exchange and add) is applied to inc and lock->tickets: it stores the previous value of lock->tickets in inc and increments tickets.tail by the previous value of inc.tail (one). If head and tail in inc are equal, the lock has been acquired and the function exits with goto out.
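The effect of xadd on the head/tail pair can be reproduced in user space with a C11 atomic fetch-and-add. This is a sketch under assumptions: the packing of head into the low 16 bits and tail into the high 16 bits, and every `toy_` and `TOY_` name, are choices made for this example rather than details taken from the kernel source.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical packing: head in the low 16 bits, tail in the high 16 bits. */
#define TOY_TAIL_SHIFT 16
#define TOY_TAIL_INC   ((uint32_t)1 << TOY_TAIL_SHIFT)

struct toy_tickets {
    uint16_t head;   /* ticket being served when the caller arrived */
    uint16_t tail;   /* ticket handed to the caller */
};

/*
 * Take a ticket: one atomic fetch-and-add bumps tail and returns the
 * previous head/tail pair, mirroring what xadd does in the code above.
 */
struct toy_tickets toy_take_ticket(_Atomic uint32_t *head_tail)
{
    uint32_t old = atomic_fetch_add(head_tail, TOY_TAIL_INC);
    struct toy_tickets t = {
        .head = (uint16_t)(old & 0xffff),
        .tail = (uint16_t)(old >> TOY_TAIL_SHIFT),
    };
    return t;
}
```

A caller whose returned head and tail are equal owns the lock at once; any other caller holds a ticket and has to wait until head catches up with it.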
Otherwise the function spins, periodically checking the value of head until the head == tail condition is met and the spinlock is acquired.
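Note: cpu_relax is little more than a NOP. The rep; nop sequence is the encoding of the PAUSE instruction, which hints to the processor that the code is in a spin-wait loop: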
#define cpu_relax() asm volatile("rep; nop")
The barrier macro, called just before the function exits, ensures that the compiler will not reorder operations that access memory across it (more about memory barriers can be found in the kernel documentation).
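Both helpers can be tried in a bounded polling loop, similar in spirit to the SPIN_THRESHOLD loop above. The sketch assumes GCC or Clang inline assembly syntax; the `toy_` names, the fallback for non-x86 targets and the `toy_spin_until_set` function are hypothetical. Kernel code would read the shared value with READ_ONCE rather than through a plain pointer.

```c
/* Compiler-only barrier (GCC/Clang syntax): no instruction is emitted. */
#define toy_barrier() __asm__ __volatile__("" ::: "memory")

/* Spin-wait hint; the fallback for non-x86 targets is an assumption of this sketch. */
#if defined(__x86_64__) || defined(__i386__)
#define toy_cpu_relax() __asm__ __volatile__("rep; nop" ::: "memory")
#else
#define toy_cpu_relax() toy_barrier()
#endif

/*
 * Poll *flag at most max_spins times. Returns 1 if the flag was seen
 * non-zero, 0 if the spin budget ran out first.
 */
int toy_spin_until_set(const int *flag, unsigned int max_spins)
{
    while (max_spins--) {
        toy_barrier();          /* the compiler must re-read *flag after this point */
        if (*flag)
            return 1;
        toy_cpu_relax();
    }
    return 0;
}
```

Without the barrier the compiler would be free to read `*flag` once and reuse the cached value on every iteration.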
The spin_unlock operation goes through a similar set of macros and functions as spin_lock (disabling hardware interrupts etc.) before the arch_spin_unlock function is called, which simply increments the head field of the lock's tickets:
__add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
In brief, the ticket spinlock is simply a fair queuing mechanism, where processes waiting to acquire a lock gain first-in first-out access. head contains an index number corresponding to the process currently holding the lock, and tail maps to the last process which queued to gain access to the lock:
+-------+ +-------+
| | | |
head | 7 | - - - | 7 |
| | | |
+-------+ +-------+
|
+-------+
| |
| 8 |
| |
+-------+
|
+-------+
| |
| 9 | tail
| |
+-------+
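The first-in first-out behaviour in the diagram can be replayed with a small user-space ticket lock. This is a hedged sketch, not the kernel implementation: the `toy_ticket_lock` type and its functions are hypothetical, head and tail are kept as two separate C11 atomics, and tail here holds the next ticket to hand out rather than the last one handed out.

```c
#include <stdatomic.h>

/* Hypothetical user-space ticket lock with separate head and tail counters. */
struct toy_ticket_lock {
    atomic_uint head;   /* ticket of the current owner */
    atomic_uint tail;   /* next ticket to hand out */
};

/* Join the queue: take the next ticket number without waiting. */
unsigned int toy_ticket_take(struct toy_ticket_lock *lock)
{
    return atomic_fetch_add_explicit(&lock->tail, 1, memory_order_relaxed);
}

/* A ticket owns the lock once head has advanced to it. */
int toy_ticket_is_served(struct toy_ticket_lock *lock, unsigned int ticket)
{
    return atomic_load_explicit(&lock->head, memory_order_acquire) == ticket;
}

/* Acquire: take a ticket, then spin until it is served. */
void toy_ticket_acquire(struct toy_ticket_lock *lock)
{
    unsigned int ticket = toy_ticket_take(lock);

    while (!toy_ticket_is_served(lock, ticket))
        ;   /* earlier tickets are still ahead of us */
}

/* Release: serve the next ticket in line. */
void toy_ticket_release(struct toy_ticket_lock *lock)
{
    atomic_fetch_add_explicit(&lock->head, 1, memory_order_release);
}
```

Starting from head = 7, the owner holds ticket 7 and the next two arrivals receive tickets 8 and 9; each release advances head by one, so they are served strictly in arrival order.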
We won't cover more of the spinlock API in this part, but hopefully the mechanism provided by the Linux kernel spinlock and the basics of its implementation on x86 are clear.
Conclusion
This concludes the first part covering synchronization primitives in the Linux kernel. In this part, we met the first synchronization primitive provided by the Linux kernel, the spinlock. In the next part we will continue to dive into this interesting theme and will see other synchronization related stuff.
If you have questions or suggestions, feel free to ping me on twitter 0xAX, drop me an email or just create an issue.
Please note that English is not my first language and I am really sorry for any inconvenience. If you find any mistakes, please send me a PR to linux-insides.