Tuesday, 22 November 2016

linux-4.8-ck8, MuQSS version 0.144

Here's a new release to go along with and commemorate the 4.8.10 stable release (they're releasing stable releases faster than my development code now.)

linux-4.8-ck8 patch:

MuQSS by itself:

There are a small number of updates to MuQSS itself.
Notably there's an improvement in interactive mode when SMT nice is enabled and/or realtime tasks are running, or there are users of CPU affinity. Tasks previously would not schedule on CPUs when they were stuck behind those as the highest priority task and it would refuse to schedule them transiently.
The old hacks for CPU frequency changes from BFS have been removed, leaving the tunables to default as per mainline.
The default of 100Hz has been removed, but in its place a new and recommended 128Hz has been implemented - this just a silly microoptimisation to take advantage of the fast shifts that /128 has on CPUs compared to /100, and is close enough to 100Hz to behave otherwise the same.

For the -ck patch only I've reinstated updated and improved versions of the high resolution timeouts to improve behaviour of userspace that is inappropriately Hz dependent allowing low Hz choices to not affect latency.
Additionally by request I've added a couple of tunables to adjust the behaviour of the high res timers and timeouts.

Both of these are in microseconds and can be set from 1-10,000. The first is how accurate high res timers will be in the kernel and is set to 100us by default (on mainline it is Hz accuracy).
The second is how small to make a request for a "minimum timeout" generically in all kernel code. The default is set to 1000us by default (on mainline it is one tick).

I doubt you'll find anything useful by tuning these but feel free to go nuts. Decreasing the second tunable much further risks breaking some driver behaviour.


Saturday, 12 November 2016

linux-4.8-ck7, MuQSS version 0.140

Another week has passed, another stable linux release, and to follow, another -ck and MuQSS release.

linux-4.7-ck7 patch:

Split out patches:

MuQSS by itself for 4.8:

MuQSS by itself for 4.7:

This release marks a change towards conservative changes only.

I've rolled back the extensive timer changes outside the main scheduler code. There are too many assumptions made about timeouts in the kernel code that are potentially problematic in the real world, and there is code that is poorly prepared for freezer usage (suspend to ram) that breaks. Additionally, not a single user reported a workload that they noticed benefited from the lower latency accurate timeouts. Finally, the added overhead is demonstrable in throughput benchmarks, and when doing comparisons with mainline it is doing MuQSS a disservice to mix in other code that it's not actually responsible for.

There are also a small number of bugfixes for warnings/crashes in the updated MuQSS that showed up after the last release as people are using it on more and varied hardware in the wild now. These may have positive effects on other less defined issues in the wild too.

The -ck release also includes an updated version of BFQ. Along with this updated version, I would like to issue a warning regarding BFQ. I have heard rumour that a number of users have reported filesystem corruption with the combination of BTRFS and BFQ. If you are using this filesystem, I urge you to not compile in BFQ at all, or at the very least not make it default to BFQ, using it selectively on devices you are running a different filesystem (I still recommend people use ext4.) I would like to encourage users who have run into this problem to report it to the BFQ maintainer.

I've cleaned up the patches in the -ck tarball once again to include only the changes in combined related patches. This will ease the burden of porting to the next major linux kernel release and allow users to easily select which patches they wish to use themselves.

As always, make sure to give me your feedback, bug reports, warnings, and bitcoin.


Saturday, 5 November 2016

linux-4.8-ck6, MuQSS version 0.135

Announcing a new version of MuQSS and a -ck release

4.8-ck6 patchset:

MuQSS by itself for 4.8:

MuQSS by itself for 4.7:

Git tree:

A week has passed since the last major update to BFS and -ck was posted, allowing me to concentrate on receiving and responding to any bug reports. As it turns out, there were very few apart from the recurring local_softirq_pending warning/stalls. This is nice because it means MuQSS is mostly ~stable now. Mainline has even had more "stable" releases in the same time as MuQSS for 4.8, moving to 4.8.6 in the interim.

In this version I've added aggressive handling of pending softirqs in the hope the warnings and stalls all go away. The true reason the handling of softirqs are being dropped still escapes me but is likely related to the fact that MuQSS does a lot of lockless rescheduling across CPUs to decrease overhead but this does not give guarantees that locking would.

Additionally, I've added a number of APIs to the kernel to do specified millisecond schedule timeouts which use the highres timers which are mandatory now for MuQSS. The reason for doing this is there are many timeouts in the kernel that specify values below 10ms and the timer resolution at 100Hz only guarantees timeouts under 20ms.

I've also added a code sweep across the entire kernel looking for timeout calls under 50ms and use the new interface in its place. Additionally there are numerous places where schedule_timeout(1) are used in the kernel where a "minimum timeout" is expected, yet this is entirely Hz dependent, again being up to 20ms in duration. I've replaced all these with a 1ms timeout, emulating what would happen on a 1000Hz kernel, but without the overhead of running the higher Hz kernel. I'm not entirely sure this will equate to any real world improvements but the fact it's used in things like audio drivers worries me that it might.

Finally I've replaced the standard msleep call from userspace to use highres timers, in case there are userspace applications that expects msleep to actually give some kind of sleep that resembles what's asked of it, instead of something Hz limited, in case this is leading to slowdowns in userspace due to assumptions on the userspace coders' part. Calls to msleep() from userspace now give 100us accuracy at 100Hz instead of 20ms.

All these timing changes add overhead since they're trying to emulate the timing accuracy of running at 1000Hz but in a latency-focused scheduler I believe they're appropriate, and they do not incur the overhead that actually changing Hz would incur. Additionally they add accuracy to timers and timeouts that 1000Hz does not afford.

In the -ck tarball of broken-out patches, I've kept these timer changes separate to allow the muqss scheduler to be applied by itself should they prove problematic, and they will make merging with future kernels easier.


Saturday, 29 October 2016

linux-4.8-ck5, MuQSS version 0.120

Announcing a new version of MuQSS and a -ck release to go with it in concert with mainline releasing 4.8.5

4.8-ck5 patchset:

MuQSS by itself for 4.8:

MuQSS by itself for 4.7:

Git tree:

This is a fairly substantial update to MuQSS which includes bugfixes for the previous version, performance enhancements, new features, and completed documentation. This will likely be the first publicly announced version on LKML.

EDIT: Announce here: LKML

New features:
- MuQSS is now a tickless scheduler. That means it can maintain its guaranteed low latency even in a build configured with a low Hz tick rate. To that end, it is now defaulting to 100Hz, and it is recommended to use this as the default choice for it leads to more throughput and power savings as well.
- Improved performance for single threaded workloads with CPU frequency scaling.
- Full NoHZ now supported. This disables ticks on busy CPUs instead of just idle ones. Unlike mainline, MuQSS can do this virtually all the time, regardless of how many tasks are currently running. However this option is for very specific use cases (compute servers running specific workloads) and not for regular desktops or servers.
- Numerous other configuration options that were previously disabled from mainline are now allowed again (though not recommended for regular users.)
- Completed documentation can now be found in Documentation/scheduler/sched-MuQSS.txt
- Fix for the various stalls some people were still experiencing, along with the softirq pending warnings.
- Fix for some loss of CPU for heavily sched_yielding tasks.
- Fix for the BFQ warning (-ck only)