linux-insides/Concepts/cpumask.md

CPU masks

Introduction

Cpumasks are a special mechanism provided by the Linux kernel for storing information about the CPUs in the system. The API for manipulating cpumasks is declared in the include/linux/cpumask.h header file.

As the comment in include/linux/cpumask.h says: Cpumasks provide a bitmap suitable for representing the set of CPU's in a system, one bit position per CPU number. We already saw a bit about cpumasks in the boot_cpu_init function from the Kernel entry point part. This function marks the boot CPU as online, active, present, and possible:

set_cpu_online(cpu, true);
set_cpu_active(cpu, true);
set_cpu_present(cpu, true);
set_cpu_possible(cpu, true);

Before we look at the implementation of these functions, let's consider each of these masks.

The cpu_possible mask is the set of CPU IDs which can be plugged in at any time during the life of that system boot. In other words, the mask of possible CPUs covers the maximum number of CPUs which are possible in the system. Its upper bound is the value of NR_CPUS, which is set statically via the CONFIG_NR_CPUS kernel configuration option.

The cpu_present mask represents which CPUs are currently plugged in.

The cpu_online mask represents a subset of cpu_present and indicates the CPUs which are available for scheduling. In other words, a bit in this mask tells the kernel whether a processor may be utilized by the Linux kernel.

The last mask is cpu_active. Bits of this mask tell the Linux kernel whether a task may be moved to a certain processor.

All of these masks depend on the CONFIG_HOTPLUG_CPU configuration option: if this option is disabled, possible == present and active == online. The implementations of the set_cpu_* functions are very similar. Each function checks its second parameter. If it is true, the function calls cpumask_set_cpu; otherwise it calls cpumask_clear_cpu.
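The pattern shared by the set_cpu_* functions can be sketched in user space. This is only an illustration, not kernel code: struct demo_mask, demo_set_cpu and demo_test_cpu are hypothetical names, the mask is a single unsigned long, and the bit operations are not atomic.

```c
#include <stdbool.h>

/* Hypothetical stand-in for one of the cpu_* masks (one word only). */
struct demo_mask {
	unsigned long bits;
};

/* Set the bit for 'cpu' when 'state' is true, clear it otherwise. */
static void demo_set_cpu(struct demo_mask *mask, unsigned int cpu, bool state)
{
	if (state)
		mask->bits |= 1UL << cpu;
	else
		mask->bits &= ~(1UL << cpu);
}

/* Report whether the bit for 'cpu' is currently set. */
static bool demo_test_cpu(const struct demo_mask *mask, unsigned int cpu)
{
	return (mask->bits >> cpu) & 1UL;
}
```

Calling demo_set_cpu(&mask, 0, true) marks CPU 0 in the mask, and calling it again with false clears the same bit.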

There are two ways to create a cpumask. The first is to use cpumask_t. It is defined as:

typedef struct cpumask { DECLARE_BITMAP(bits, NR_CPUS); } cpumask_t;

It is a typedef for the cpumask structure, which contains a single bitmap field named bits. The DECLARE_BITMAP macro takes two parameters:

  • bitmap name;
  • number of bits.

and creates an array of unsigned long with the given name. Its implementation is pretty simple:

#define DECLARE_BITMAP(name,bits) \
        unsigned long name[BITS_TO_LONGS(bits)]

where BITS_TO_LONGS:

#define BITS_TO_LONGS(nr)       DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))
#define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))

As we are focusing on the x86_64 architecture, unsigned long is 8 bytes (64 bits) in size. If we take NR_CPUS equal to 8 as an example, our array will contain only one element:

(((8) + (64) - 1) / (64)) = 1
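The array size can also be checked for a few values in user space. The two macros are copied from the text above; BITS_PER_BYTE is defined here as 8, and demo_mask_words is a hypothetical helper added only for this sketch.

```c
#include <stddef.h>

#define BITS_PER_BYTE 8
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define BITS_TO_LONGS(nr) DIV_ROUND_UP(nr, BITS_PER_BYTE * sizeof(long))

/* Number of unsigned long words needed for a mask of nr_cpus bits. */
static size_t demo_mask_words(size_t nr_cpus)
{
	return BITS_TO_LONGS(nr_cpus);
}
```

On x86_64, where long is 64 bits wide, demo_mask_words(8) and demo_mask_words(64) both give 1, while demo_mask_words(65) gives 2.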

The NR_CPUS macro represents the maximum number of CPUs the kernel supports and depends on the CONFIG_NR_CPUS configuration option. It is defined in include/linux/threads.h and looks like this:

#ifndef CONFIG_NR_CPUS
        #define CONFIG_NR_CPUS  1
#endif

#define NR_CPUS         CONFIG_NR_CPUS

The second way to define a cpumask is to use the DECLARE_BITMAP macro directly together with the to_cpumask macro, which converts the given bitmap to struct cpumask *:

#define to_cpumask(bitmap)                                              \
        ((struct cpumask *)(1 ? (bitmap)                                \
                            : (void *)sizeof(__check_is_bitmap(bitmap))))

We can see the ternary operator here, whose condition is always true. The __check_is_bitmap inline function is defined as:

static inline int __check_is_bitmap(const unsigned long *bitmap)
{
        return 1;
}

It returns 1 every time. We need it here for only one purpose: at compile time it checks that a given bitmap really is a bitmap, or in other words that it has the type unsigned long *. So we just pass cpu_possible_bits to the to_cpumask macro to convert an array of unsigned long to struct cpumask *.
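The same compile-time check can be reproduced in user space. The sketch below mirrors the macro above under renamed, hypothetical identifiers (struct demo_cpumask, demo_check_is_bitmap, demo_to_cpumask, demo_bits, demo_get_mask); it only converts the pointer and never dereferences it.

```c
/* Hypothetical one-word mask type, standing in for struct cpumask. */
struct demo_cpumask {
	unsigned long bits[1];
};

/* Never executed: it exists only so the compiler checks the argument type. */
static inline int demo_check_is_bitmap(const unsigned long *bitmap)
{
	(void)bitmap;
	return 1;
}

/* The condition is always 1, so the result is always 'bitmap'. The other
 * branch sits under sizeof and is only type-checked, never evaluated. */
#define demo_to_cpumask(bitmap)                                         \
	((struct demo_cpumask *)(1 ? (bitmap)                           \
		: (void *)sizeof(demo_check_is_bitmap(bitmap))))

static unsigned long demo_bits[1];

/* Convert the raw bitmap to a mask pointer through the checked macro. */
static struct demo_cpumask *demo_get_mask(void)
{
	return demo_to_cpumask(demo_bits);
}
```

Passing something that is not an unsigned long pointer, for example an array of int, makes the compiler report an incompatible pointer type at the demo_check_is_bitmap call, even though that call is never executed.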

cpumask API

Whichever method we use to define a cpumask, the Linux kernel provides an API for manipulating it. Let's consider one of the functions presented above, for example set_cpu_online. This function takes two parameters:

  • Number of CPU;
  • CPU status;

The implementation of this function looks like this:

void set_cpu_online(unsigned int cpu, bool online)
{
	if (online) {
		cpumask_set_cpu(cpu, to_cpumask(cpu_online_bits));
		cpumask_set_cpu(cpu, to_cpumask(cpu_active_bits));
	} else {
		cpumask_clear_cpu(cpu, to_cpumask(cpu_online_bits));
	}
}

First of all it checks the second parameter and calls cpumask_set_cpu or cpumask_clear_cpu depending on it. Here we can see the second argument of cpumask_set_cpu being cast to struct cpumask *. In our case it is cpu_online_bits, which is a bitmap defined as:

static DECLARE_BITMAP(cpu_online_bits, CONFIG_NR_CPUS) __read_mostly;

The cpumask_set_cpu function makes only one call to the set_bit function:

static inline void cpumask_set_cpu(unsigned int cpu, struct cpumask *dstp)
{
        set_bit(cpumask_check(cpu), cpumask_bits(dstp));
}

The set_bit function takes two parameters too, and sets a given bit (the first parameter) in memory (the second parameter, here the cpu_online_bits bitmap). We can see that before set_bit is called, its two arguments are passed through:

  • cpumask_check;
  • cpumask_bits.

Let's consider these two. First of all, cpumask_check does nothing in our case and just returns the given parameter. The second, cpumask_bits, just returns the bits field from the given struct cpumask *:

#define cpumask_bits(maskp) ((maskp)->bits)

Now let's look at the set_bit implementation:

static __always_inline void
set_bit(long nr, volatile unsigned long *addr)
{
        if (IS_IMMEDIATE(nr)) {
                asm volatile(LOCK_PREFIX "orb %1,%0"
                        : CONST_MASK_ADDR(nr, addr)
                        : "iq" ((u8)CONST_MASK(nr))
                        : "memory");
        } else {
                asm volatile(LOCK_PREFIX "bts %1,%0"
                        : BITOP_ADDR(addr) : "Ir" (nr) : "memory");
        }
}

This function looks scary, but it is not as hard as it seems. First of all it passes nr, the number of the bit, to the IS_IMMEDIATE macro, which just calls the GCC built-in __builtin_constant_p function:

#define IS_IMMEDIATE(nr)    (__builtin_constant_p(nr))

__builtin_constant_p checks whether the given parameter is known to be constant at compile time. As our cpu is not a compile-time constant, the else clause will be executed:

asm volatile(LOCK_PREFIX "bts %1,%0" : BITOP_ADDR(addr) : "Ir" (nr) : "memory");
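The difference between the two branches can be seen with a small user-space sketch of __builtin_constant_p, which is available in GCC and Clang. The two function names are hypothetical, and the volatile read is used only to make sure the compiler cannot know the value in advance.

```c
/* A literal is always a compile-time constant, so this reports 1. */
static int demo_literal_is_constant(void)
{
	return __builtin_constant_p(5);
}

/* A value read from a volatile object is only known at run time,
 * so the builtin reports 0 for it. */
static int demo_runtime_is_constant(void)
{
	volatile int source = 0;
	int nr = source;

	return __builtin_constant_p(nr);
}
```

In set_bit the same test selects the orb form for constant bit numbers and the bts form for everything else.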

Let's try to understand how it works step by step:

LOCK_PREFIX expands to the x86 lock instruction prefix. This prefix tells the CPU to execute the instruction that follows atomically, historically by locking the system bus for the duration of that instruction. This allows the CPU to synchronize memory access, preventing simultaneous access of multiple processors (or devices, the DMA controller for example) to one memory cell.

BITOP_ADDR casts the given parameter to volatile long *, dereferences it, and adds the +m constraint. + means that this operand is both read and written by the instruction. m shows that this is a memory operand. BITOP_ADDR is defined as:

#define BITOP_ADDR(x) "+m" (*(volatile long *) (x))

Next is the memory clobber. It tells the compiler that the assembly code performs memory reads or writes to items other than those listed in the input and output operands (for example, accessing the memory pointed to by one of the input parameters).

Ir means that the operand may be either an immediate constant (I, an integer in the range 0 to 31 on x86) or a general-purpose register (r).

The bts instruction sets a given bit in a bit string and stores the previous value of that bit in the CF flag. So we passed the cpu number, which is zero in our case, and after set_bit is executed, bit zero is set in the cpu_online_bits cpumask. It means that the first cpu is online at this moment.

Besides the set_cpu_* API, cpumask of course provides other functions for manipulating cpumasks. Let's consider them briefly.
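For comparison, the effect of the locked bts can be approximated in portable C with the __atomic builtins of GCC and Clang. This is a sketch under that assumption, not the kernel's implementation: demo_test_and_set_bit and DEMO_BITS_PER_LONG are hypothetical names, and the function returns the previous value of the bit, which is what bts leaves in CF.

```c
#include <limits.h>
#include <stdbool.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Atomically set bit 'nr' in the bitmap at 'addr' and return its
 * previous value. */
static bool demo_test_and_set_bit(unsigned long nr, unsigned long *addr)
{
	unsigned long mask = 1UL << (nr % DEMO_BITS_PER_LONG);
	unsigned long *word = addr + nr / DEMO_BITS_PER_LONG;
	unsigned long old = __atomic_fetch_or(word, mask, __ATOMIC_SEQ_CST);

	return (old & mask) != 0;
}
```

Setting bit 0 of a zeroed bitmap returns false and leaves the first word equal to 1; setting it a second time returns true and changes nothing.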

Additional cpumask API

cpumask provides a set of macros for getting the numbers of CPUs in various states. For example:

#define num_online_cpus()	cpumask_weight(cpu_online_mask)

This macro returns the number of online CPUs. It calls the cpumask_weight function with the cpu_online_mask bitmap. The cpumask_weight function makes one call to the bitmap_weight function with two parameters:

  • cpumask bitmap;
  • nr_cpumask_bits - which is NR_CPUS in our case.

static inline unsigned int cpumask_weight(const struct cpumask *srcp)
{
	return bitmap_weight(cpumask_bits(srcp), nr_cpumask_bits);
}

and calculates the number of set bits in the given bitmap. Besides num_online_cpus, cpumask provides macros for all of the CPU states:

  • num_possible_cpus;
  • num_active_cpus;
  • cpu_online;
  • cpu_possible.

and many more.
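The counting performed by bitmap_weight can be sketched in user space as follows. This is an illustration built on the GCC/Clang __builtin_popcountl builtin, not the kernel's own code; demo_bitmap_weight and DEMO_BITS_PER_LONG are hypothetical names.

```c
#include <limits.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Count the set bits among the first 'nbits' bits of 'bitmap'. */
static unsigned int demo_bitmap_weight(const unsigned long *bitmap,
				       unsigned int nbits)
{
	unsigned int full = nbits / DEMO_BITS_PER_LONG;
	unsigned int rest = nbits % DEMO_BITS_PER_LONG;
	unsigned int weight = 0;

	/* Whole words: count every set bit. */
	for (unsigned int i = 0; i < full; i++)
		weight += __builtin_popcountl(bitmap[i]);

	/* Trailing partial word: ignore bits at or above 'nbits'. */
	if (rest)
		weight += __builtin_popcountl(bitmap[full] &
					      ((1UL << rest) - 1));

	return weight;
}
```

For a one-word bitmap holding 0x0b (CPUs 0, 1 and 3), demo_bitmap_weight(bitmap, 8) returns 3.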

Besides that, the Linux kernel provides the following API for manipulating cpumasks:

  • for_each_cpu - iterates over every cpu in a mask;
  • for_each_cpu_not - iterates over every cpu in a complemented mask;
  • cpumask_clear_cpu - clears a cpu in a cpumask;
  • cpumask_test_cpu - tests a cpu in a mask;
  • cpumask_setall - sets all cpus in a mask;
  • cpumask_size - returns size to allocate for a 'struct cpumask' in bytes;

and many many more...
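To give a feeling for how an iterator like for_each_cpu works, here is a simplified user-space sketch. It is not the kernel macro: demo_next_cpu and demo_for_each_cpu are hypothetical names, and the bitmap is scanned one bit at a time for clarity rather than speed.

```c
#include <limits.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Return the index of the first set bit at or after 'start',
 * or 'nbits' when there is none. */
static unsigned int demo_next_cpu(const unsigned long *bitmap,
				  unsigned int nbits, unsigned int start)
{
	for (unsigned int cpu = start; cpu < nbits; cpu++) {
		unsigned long word = bitmap[cpu / DEMO_BITS_PER_LONG];

		if (word & (1UL << (cpu % DEMO_BITS_PER_LONG)))
			return cpu;
	}
	return nbits;
}

/* Visit every set bit of 'bitmap' in increasing order. */
#define demo_for_each_cpu(cpu, bitmap, nbits)                           \
	for ((cpu) = demo_next_cpu((bitmap), (nbits), 0);               \
	     (cpu) < (nbits);                                           \
	     (cpu) = demo_next_cpu((bitmap), (nbits), (cpu) + 1))
```

With a bitmap holding 0x15, the loop body runs three times, for CPUs 0, 2 and 4.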