Reduce latencies even further

Signed-off-by: Tyler Nijmeh <tylernij@gmail.com>
parent d2c4bb18ed
commit 979798b9ad
2 changed files with 13 additions and 15 deletions
--- a/README.md
+++ b/README.md
@@ -61,14 +61,14 @@ When forking a child process from the parent, execute the child process before t
 ### kernel.sched_tunable_scaling: 0
 This is more of a precaution than anything. Since the next few tunables will be scheduler timing related, we don't want the scheduler to scale our values for multiple CPUs, as we will be providing CPU-agnostic values.

-### kernel.sched_latency_ns: 5000000 (5ms)
-Set the default scheduling period to 5ms. Reduce the maximum scheduling period to reduce overall scheduling latency.
+### kernel.sched_latency_ns: 1000000 (1ms)
+Set the default scheduling period to 1ms. Reducing the scheduling period reduces overall scheduling latency. With HRTick enabled, the scheduler can use timers that are accurate to within microseconds. In testing, reducing this value to 1ms cut even the minimum latencies by more than half.

-### kernel.sched_min_granularity_ns: 500000 (0.5ms)
-Set the minimum task scheduling period to 0.5ms. With kernel.sched_latency_ns set to 5ms, this means that 10 active tasks may execute within the 5ms scheduling period before we exceed it.
+### kernel.sched_min_granularity_ns: 100000 (0.1ms)
+Set the minimum task scheduling period to 0.1ms. With kernel.sched_latency_ns set to 1ms, this means that 10 active tasks may execute within the 1ms scheduling period before we exceed it.

-### kernel.sched_wakeup_granularity_ns: 1000000 (1ms)
-Require tasks to be running for at least 1ms longer than the waiting task before preemption can happen. Reducing this value to 1ms reduces wakeup preemption latencies by up to 50% at the 50th percentile and around 10% at higher percentiles. Hackbench scores suffer if this value is reduced too low.
+### kernel.sched_wakeup_granularity_ns: 100000 (0.1ms)
+Require tasks to be running for at least 0.1ms longer than the waiting task before preemption can happen. Reducing this value to 0.1ms reduces wakeup preemption latencies by up to 50% at the 50th percentile and around 10% at higher percentiles. Hackbench scores suffer if this value is reduced too low.

 ### kernel.sched_migration_cost_ns: 500000 (0.5ms) --> 5000000 (5ms)
 Increase the time that a task is considered to be cache hot. According to Red Hat, increasing this tunable reduces the number of task migrations. This should reduce time spent balancing tasks and increase per-task performance.
@@ -113,15 +113,12 @@ This tunable controls the kernel's tendency to reclaim inodes and dentries over
 ### Next Buddy
 By scheduling the last woken task first, we can increase cache locality since that task is likely to touch the same data as before.

-### No Strict Skip Buddy
-Usually, the scheduler will always choose to skip tasks that call `yield()`. However, these yielding tasks may be of higher importance than the last or next buddy that are available. Do not always skip the skip buddy if we don't have to.
-
-### No Nontask Capacity
-The scheduler decrements the perceived CPU capacity the longer the CPU has been idle. This means that an idle CPU may be skipped during task placement, and a task can be grouped with a busier CPU. Disable this to improve task start latency.
-
 ### TTWU Queue
 Allow the scheduler to place tasks on their origin CPU, increasing cache locality if the CPU is non-local (i.e. a cache hit would definitely have been missed).

+### HRTick
+Usually, the scheduler tick interrupts the running task only once every 1/HZ seconds, unless another event interrupts it first. This means we rely on opportunism to keep scheduling latencies low. Enabling HRTick arms an hrtimer that can interrupt the running task more often than once every 1/HZ seconds, timed to the end of the current task's slice, which depends on the number of running tasks. In testing with schbench, latencies (including max latencies) were reduced to almost half of their original values.
+
 ### Governor Tweaks
 * {up_,down_}rate_limit_us / min_sample_time: 0 --> 5000: Only adjust frequencies once per scheduling cycle to reduce jitter or stutter caused by unrealistic frequency scaling.
 * hispeed_load / go_hispeed_load: 90: Jump to a higher frequency if we are approaching the end of the frequency list, where a task may begin to starve or begin to stutter.
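The reason for `kernel.sched_tunable_scaling: 0` can be illustrated with a small bash sketch of how the kernel picks a scaling factor for the CFS timing tunables. It follows `get_update_sysctl_factor()` in `kernel/sched/fair.c` on kernels that still expose these sysctls under `/proc/sys/kernel`; the function name `sysctl_factor` is hypothetical, and the sketch only computes the factor without touching any real tunable.

```bash
#!/usr/bin/env bash
# Sketch of the CFS tunable scaling factor (modelled on
# get_update_sysctl_factor()). Modes: 0 = none, 1 = log (default), 2 = linear.
# sysctl_factor is a hypothetical helper name.

# Arguments: scaling mode, number of online CPUs. Prints the factor.
sysctl_factor() {
    local mode=$1 ncpus=$2
    local cpus=$(( ncpus < 8 ? ncpus : 8 ))  # the kernel caps this at 8 CPUs
    case $mode in
        0) echo 1 ;;            # no scaling: values are used as written
        2) echo "$cpus" ;;      # linear: one unit per CPU
        *)                      # log (and default): 1 + ilog2(cpus)
            local log=0 n=$cpus
            while (( n > 1 )); do
                n=$(( n >> 1 ))
                log=$(( log + 1 ))
            done
            echo $(( 1 + log )) ;;
    esac
}

for mode in 0 1 2; do
    echo "mode=$mode cpus=8 factor=$(sysctl_factor "$mode" 8)"
done
```

On an 8-core device in the default logarithmic mode the factor works out to 4, while mode 0 pins it at 1, so the CPU-agnostic values written by this script should not be rescaled by the kernel.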
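The relationship between `kernel.sched_latency_ns` and `kernel.sched_min_granularity_ns` can be checked with a minimal bash sketch. It mirrors the period calculation CFS uses on kernels that still expose these sysctls (`__sched_period()` in `kernel/sched/fair.c`), assuming equal-weight (nice 0) tasks and no tunable scaling; the function name `sched_period` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the CFS scheduling period with this commit's values.
# Assumes kernel.sched_tunable_scaling = 0 and equal-weight (nice 0) tasks.

sched_latency_ns=1000000         # kernel.sched_latency_ns (1ms)
sched_min_granularity_ns=100000  # kernel.sched_min_granularity_ns (0.1ms)

# Number of runnable tasks that fit in one period (rounded up, as the kernel does).
sched_nr_latency=$(( (sched_latency_ns + sched_min_granularity_ns - 1) / sched_min_granularity_ns ))

# Argument: number of runnable tasks. Prints the scheduling period in ns.
sched_period() {
    local nr_running=$1
    if (( nr_running > sched_nr_latency )); then
        # Too many tasks: stretch the period so each still gets the minimum slice.
        echo $(( nr_running * sched_min_granularity_ns ))
    else
        echo "$sched_latency_ns"
    fi
}

for n in 1 10 11 20; do
    p=$(sched_period "$n")
    echo "nr_running=$n period=${p}ns slice=$(( p / n ))ns"
done
```

With 10 or fewer runnable tasks the period stays at 1ms; beyond that it grows by 0.1ms per task, so each task still gets at least the minimum granularity.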
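The wakeup granularity rule ("running for at least 0.1ms longer than the waiting task") can be sketched as a comparison of virtual runtimes. This is a simplified bash model of the check in `wakeup_preempt_entity()` in `kernel/sched/fair.c`; it assumes nice-0 tasks, so the granularity is not weight-scaled as the kernel would do for other priorities, and `should_preempt` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Simplified model of the CFS wakeup-preemption check.
# Assumes nice-0 tasks, so the granularity is used unscaled.

sched_wakeup_granularity_ns=100000  # kernel.sched_wakeup_granularity_ns (0.1ms)

# Arguments: vruntime (ns) of the running task, then of the waking task.
# Returns success (0) if the waking task should preempt the running one.
should_preempt() {
    local curr_vruntime=$1 wakee_vruntime=$2
    local vdiff=$(( curr_vruntime - wakee_vruntime ))
    # Preempt only if the running task is ahead by more than the granularity.
    (( vdiff > sched_wakeup_granularity_ns ))
}

should_preempt 5150000 5000000 && echo "150us ahead: preempt"
should_preempt 5050000 5000000 || echo "50us ahead: keep running"
```

A lower granularity lets a waking task preempt sooner, which is where the latency gain comes from; the cost is more frequent context switches, consistent with the hackbench regression noted above.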
--- a/7
+++ b/7
@@ -91,9 +91,9 @@ write /proc/sys/kernel/perf_cpu_time_max_percent 5
 write /proc/sys/kernel/sched_autogroup_enabled 0
 write /proc/sys/kernel/sched_child_runs_first 1
 write /proc/sys/kernel/sched_tunable_scaling 0
-write /proc/sys/kernel/sched_latency_ns 5000000
-write /proc/sys/kernel/sched_min_granularity_ns 500000
-write /proc/sys/kernel/sched_wakeup_granularity_ns 1000000
+write /proc/sys/kernel/sched_latency_ns 1000000
+write /proc/sys/kernel/sched_min_granularity_ns 100000
+write /proc/sys/kernel/sched_wakeup_granularity_ns 100000
 write /proc/sys/kernel/sched_migration_cost_ns 5000000
 write /proc/sys/kernel/sched_min_task_util_for_colocation 0
 write /proc/sys/kernel/sched_nr_migrate 8
@@ -115,6 +115,7 @@ if [[ -f "/sys/kernel/debug/sched_features" ]]
 then
 	write /sys/kernel/debug/sched_features NEXT_BUDDY
 	write /sys/kernel/debug/sched_features TTWU_QUEUE
+	write /sys/kernel/debug/sched_features HRTICK
 fi

 # CPU
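To confirm that the `sched_features` writes took effect, the file can be read back: it lists every feature either as `NAME` (enabled) or `NO_NAME` (disabled). The bash sketch below assumes debugfs is mounted at `/sys/kernel/debug` and that the script runs as root; `sched_feature_enabled` is a hypothetical helper, not part of the original script.

```bash
#!/usr/bin/env bash
# Sketch: check whether scheduler features are active after writing them.
# Assumes debugfs at /sys/kernel/debug and root privileges.

# Arguments: feature name, then the contents of the sched_features file.
# Returns success (0) only if the exact name appears (not its NO_ form).
sched_feature_enabled() {
    local feature=$1 features=$2 f
    for f in $features; do
        [[ $f == "$feature" ]] && return 0
    done
    return 1
}

features_file=/sys/kernel/debug/sched_features
if [[ -r $features_file ]]; then
    for feat in NEXT_BUDDY TTWU_QUEUE HRTICK; do
        if sched_feature_enabled "$feat" "$(<"$features_file")"; then
            echo "$feat: enabled"
        else
            echo "$feat: disabled or not present"
        fi
    done
fi
```

Matching whole words keeps `NO_HRTICK` from being mistaken for `HRTICK`.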