Collabora have sent in a fresh patch for discussion to the Linux Kernel list to help Linux gaming, acting as a follow-up to their previous attempt.
The patches, developed in collaboration with Valve, are primarily aimed at Wine, and therefore Proton for Steam Play. Windows and Linux handle synchronization differently, and Wine needs to bridge that gap to get good performance. As the original patch explained:
The use case lies in the Wine implementation of the Windows NT interface WaitMultipleObjects. This Windows API function allows a thread to sleep waiting on the first of a set of event sources (mutexes, timers, signal, console input, etc) to signal. Considering this is a primitive synchronization operation for Windows applications, being able to quickly signal events on the producer side, and quickly go to sleep on the consumer side is essential for good performance of those running over Wine.
They went on to explain that the current Linux Kernel interfaces fall short on performance. With their code in use, they saw reduced CPU utilization in multiple titles running through the Steam Play Proton compatibility layer compared with current methods. Additionally, it doesn't rely on file descriptors, so it also solves the problem of running out of them.
The new patch under discussion takes a different approach from before. Instead of extending the current interface in the Linux Kernel, they're building a new system call, 'futex2'. It's still early: this patch only adds the new interface, which they can then expand upon.
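To make the file descriptor point concrete, here is a rough sketch of one fd-based way to approximate "sleep until the first of several events signals" on Linux, using one eventfd per event plus poll(). It is only an illustration under stated assumptions, not Wine's or Collabora's actual code: the helper names (event_create, event_signal, event_wait_any) and the cap of 64 events are made up for this example.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define SKETCH_MAX_EVENTS 64   /* arbitrary cap chosen for this sketch */

/* Hypothetical helper: one event object costs one file descriptor. */
int event_create(void)
{
    /* Counter starts at 0, meaning "not signaled". */
    return eventfd(0, EFD_NONBLOCK);
}

/* Producer side: signaling always costs a write() syscall. */
int event_signal(int efd)
{
    uint64_t one = 1;
    return write(efd, &one, sizeof one) == (ssize_t)sizeof one ? 0 : -1;
}

/* Consumer side: sleep until any of the n events is signaled.
 * Returns the index of the first signaled event it managed to consume,
 * or -1 on timeout or error. */
int event_wait_any(const int *efds, int n, int timeout_ms)
{
    struct pollfd pfds[SKETCH_MAX_EVENTS];

    if (n <= 0 || n > SKETCH_MAX_EVENTS)
        return -1;

    for (int i = 0; i < n; i++) {
        pfds[i].fd = efds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    if (poll(pfds, (nfds_t)n, timeout_ms) <= 0)
        return -1;

    for (int i = 0; i < n; i++) {
        if (pfds[i].revents & POLLIN) {
            uint64_t value;
            /* Reading resets the counter, so the event is consumed. */
            if (read(efds[i], &value, sizeof value) == (ssize_t)sizeof value)
                return i;
        }
    }
    return -1;
}
```

Every event here holds a descriptor open for its whole lifetime, and both signaling and waiting go through the kernel, which is the kind of overhead and resource pressure the paragraph above describes.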
In short: it would make Linux gaming better with Wine / Proton in future Linux Kernel versions. However, it would likely have other uses too. You can see the patch set here which is currently under discussion.
Quoting: toojays
There is no way to use fds for synchronization without a syscall. That's no good for performance-critical paths. Pthreads primitives like mutex, condition, semaphore are designed to avoid syscalls where possible. Ideally (e.g. uncontended mutex lock) they use only atomic operations, but they call futex when they need to block, or to wake other threads.
Being able to wait on multiple futexes at once seems generally useful to me.
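For readers who have not seen the pattern, the split between an atomic fast path and a futex slow path can be sketched roughly as below. This is a simplified illustration in the style of the well-known three-state futex mutex, not the actual pthreads implementation; the names sketch_mutex, sketch_mutex_lock, sketch_mutex_unlock and futex_op are invented for the example, and it assumes Linux with C11 atomics.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* state: 0 = unlocked, 1 = locked, 2 = locked and there may be waiters */
typedef struct {
    atomic_int state;
} sketch_mutex;

/* Thin wrapper around the raw futex syscall (glibc has no futex() function). */
static long futex_op(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

void sketch_mutex_lock(sketch_mutex *m)
{
    int expected = 0;

    /* Fast path: uncontended lock is a single atomic operation, no syscall. */
    if (atomic_compare_exchange_strong(&m->state, &expected, 1))
        return;

    /* Slow path: mark the lock as contended, then sleep in the kernel
     * for as long as the state still reads 2. */
    while (atomic_exchange(&m->state, 2) != 0)
        futex_op(&m->state, FUTEX_WAIT_PRIVATE, 2);
}

void sketch_mutex_unlock(sketch_mutex *m)
{
    /* Only enter the kernel if someone might be sleeping on the lock. */
    if (atomic_exchange(&m->state, 0) == 2)
        futex_op(&m->state, FUTEX_WAKE_PRIVATE, 1);
}
```

In the uncontended case neither function makes a syscall, which is the property toojays is pointing at. Note that each FUTEX_WAIT call sleeps on exactly one address, which is why waiting on several futexes at once needs new kernel support.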
Do you know of a common concurrency problem where it makes sense? To me it sounds like a developer trying to outsmart the scheduler because of bad software design, and that always ends badly.
Not long ago a Google engineer published some code where he implemented spinlocks in order to "improve" performance (basically busy-waiting code to avoid doing the syscall), but in his numbers std::mutex proved to be a good solution on Linux and a "bad one" on Windows. So the issue here was that Windows programmers decided to program a Windows workaround on Linux, where it's not necessary... IMO this is proof of why you want to keep thread priority setup on the programmer's side to a bare minimum.
Quoting: toojays
There is no way to use fds for synchronization without a syscall. That's no good for performance-critical paths.
It is possible to build a fast-pathed mutex (and condition variable) that is userspace only and falls back on an eventfd for the slow path, exactly like futex-based mutexes [1]. In fact, because they do not have to fiddle with VM stuff, I have seen claims that eventfds can be slightly faster on the slow path (but nobody really cares about that, so keep using futexes unless you need to interoperate with poll and friends).
The advantage of futexes is that they are ephemeral: the kernel-side support data structures are allocated implicitly on a futex_wait call (when a futex is used to wait for an event) and destroyed as soon as there are no waiters, while eventfds are allocated and destroyed explicitly. You can have millions of inactive futexes without any issues, while with eventfd you can hit the fd limit very easily. Apparently there are a lot of broken programs that allocate and leak Windows mutex handles (which are pretty much the equivalent of a Unix fd), but it is not much of an issue there, probably because Windows has code to work around this brokenness or because handles are more lightweight. Note that Windows today has keyed events (which are exactly like futexes) and those can't be used with WaitForMultipleObjects either.
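As a rough illustration of that idea (and not the code referred to in [1]), a mutex with a userspace fast path and an eventfd slow path could look something like the sketch below. The names efd_mutex, efd_mutex_init, efd_mutex_lock, efd_mutex_unlock and efd_mutex_destroy are hypothetical, error handling is minimal, and it assumes Linux with C11 atomics.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

typedef struct {
    atomic_int state;  /* 0 = unlocked, 1 = locked, 2 = locked, maybe waiters */
    int efd;           /* eventfd used only when a thread has to block */
} efd_mutex;

int efd_mutex_init(efd_mutex *m)
{
    atomic_init(&m->state, 0);
    /* Semaphore mode: each read() consumes exactly one wake-up token. */
    m->efd = eventfd(0, EFD_SEMAPHORE);
    return m->efd < 0 ? -1 : 0;
}

void efd_mutex_lock(efd_mutex *m)
{
    int expected = 0;

    /* Fast path: userspace only, no syscall when uncontended. */
    if (atomic_compare_exchange_strong(&m->state, &expected, 1))
        return;

    /* Slow path: flag contention, then block on the eventfd until an
     * unlocker posts a token. A stale token only causes one extra loop. */
    while (atomic_exchange(&m->state, 2) != 0) {
        uint64_t token;
        ssize_t r = read(m->efd, &token, sizeof token);
        (void)r;  /* on EINTR or error this sketch simply retries */
    }
}

void efd_mutex_unlock(efd_mutex *m)
{
    /* Only pay for a write() syscall if a waiter may be blocked. */
    if (atomic_exchange(&m->state, 0) == 2) {
        uint64_t one = 1;
        ssize_t r = write(m->efd, &one, sizeof one);
        (void)r;
    }
}

void efd_mutex_destroy(efd_mutex *m)
{
    close(m->efd);  /* the descriptor must be released explicitly */
}
```

Because the blocking side is a plain descriptor, it could also be handed to poll() alongside other fds, which is the interoperability case mentioned above. The cost is that every such mutex holds a descriptor for its whole lifetime.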
[1] I know because I have done it.
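On the fd limit: if you want to see how much headroom a process actually has, the soft limit on open descriptors can be read with getrlimit(). The helper name fd_soft_limit is made up for this small sketch.

```c
#include <sys/resource.h>

/* Returns the soft limit on open file descriptors for this process,
 * or 0 if the limit could not be read. */
unsigned long long fd_soft_limit(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
        return 0;
    return (unsigned long long)lim.rlim_cur;
}
```

Every live eventfd counts against that number, whereas an idle futex is just an integer in the program's own memory.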