
After a long, bumpy road with many revisions, it appears that the futex2 work sponsored by Valve is finally heading into the upstream Linux kernel. The work was initially much larger, but was slimmed down so that the essential parts could land first, with the rest to follow.

So what is it? As developer André Almeida previously described it: "The use case of this syscall is to allow low level locking libraries to wait for multiple locks at the same time. This is specially useful for emulating Windows' WaitForMultipleObjects. A futex_waitv()-based solution has been used for some time at Proton's Wine (a compatibility layer to run Windows games on Linux). Compared to a solution that uses eventfd(), futex was able to reduce CPU utilization for games, and even increase frames per second for some games. This happens because eventfd doesn't scale very well for a huge number of read, write and poll calls compared to futex. Native game engines will benefit of this as well, given that this wait pattern is common for games.".

Speaking on Twitter, Valve developer Pierre-Loup Griffais said "It's amazing news that futex_waitv() seems to be on its way to the upstream kernel! Many thanks to the continued efforts of our partners at Collabora, CodeWeavers, and to the upstream community.".

Ideally then this will help Windows games in Proton on Linux run better. But that's not all!

Also interesting is the follow-up post from Griffais that mentions "Beyond Wine/Proton, we are also excited to bring those multi-threaded efficiency gains to Linux-native game engines and applications through some variant of the following primitive, pending more discussion with the glibc community:" with a link to some glibc work.

Article taken from GamingOnLinux.com.
About the author -
I am the owner of GamingOnLinux. After discovering Linux back in the days of Mandrake in 2003, I constantly checked on the progress of Linux until Ubuntu appeared on the scene and it helped me to really love it. You can reach me easily by emailing GamingOnLinux directly.
The comments on this article are closed.
14 comments

jordicoma Oct 10, 2021
What? A functionality coming from the Windows world that's good?
kit89 Oct 10, 2021
What? A functionality coming from the Windows world that's good?

Not really. WaitForMultipleObjects() is different from (and higher-level than) futex2; however, futex2 can be used to implement a faster, better version of WaitForMultipleObjects().
3zekiel Oct 10, 2021
What? A functionality coming from the Windows world that's good?
Well, it's no great surprise that gaming hasn't been the main focus of Linux development, right?
From there, it follows that Windows has better mechanisms for at least some gaming-related workloads. Also, this just takes some ideas from Windows and extends a Linux concept (futexes); it's not a port of the Windows syscall itself.

Generally speaking, all OSes have their pros and cons, and looking at your neighbour for some inspiration is always a good idea. As long as you do not blindly copy :)
ShabbyX Oct 10, 2021
I wonder why they didn't instead try and make eventfd more efficient?


Last edited by ShabbyX on 10 October 2021 at 5:03 pm UTC
ShabbyX Oct 10, 2021
What? A functionality coming from the Windows world that's good?

It's not necessarily good. If Windows has feature X, games will depend on it. It may have been better if games did Y instead, but Y doesn't exist, so they are stuck with X.

Just because Linux now needs a way to implement X doesn't mean that X was the best option, just that because of Windows we are now stuck with it.
nenoro Oct 10, 2021
I thought it was already in Zen Kernel and maybe Xanmod ?
BielFPs Oct 10, 2021
I wonder why they didn't instead try and make eventfd more efficient?
Probably because they would have the burden of not breaking software that depends on earlier versions, the same reason why they called it "futex2" instead of just modifying the original futex.

With futex2 they have all the freedom they need without having to worry about legacy compatibility.

I thought it was already in Zen Kernel and maybe Xanmod ?
Yes, but those are not the official Linux kernel; in other words, they are Linux kernels with custom patches (think of the difference between Wine and Proton, or Proton and Proton GE, for example).

What changed in this case is that futex2 will also be part of the official kernel, so any distro using mainline kernel 5.16+ will support futex2 by default, without needing the custom patches (commonly called "out of tree") that Xanmod, for example, currently carries.
nenoro Oct 10, 2021
I wonder why they didn't instead try and make eventfd more efficient?

What changed in this case is that futex2 will also be part of the official kernel, so any distro using mainline kernel 5.16+ will support futex2 by default, without needing the custom patches (commonly called "out of tree") that Xanmod, for example, currently carries.

sounds good to me


Last edited by nenoro on 10 October 2021 at 4:25 pm UTC
3zekiel Oct 10, 2021
I wonder why they didn't instead try and make eventfd more efficient?

Short answer: they kinda already tried that at first, and judged it to be a dead end.

Long answer:
Modifying an existing (set of) syscall(s) is extremely limiting. You cannot break compatibility in any way, since that would break thousands of apps with no way for users to fix it. Unlike with libraries, you cannot just install another kernel or use a lightweight container to work around a kernel ABI breakage. So all the issues with that syscall set are pretty much set in stone.

More generally, it seems more natural and clean to use a tool that is actually made for your problem. File descriptors (which eventfd is built on) are made to deal with file-like things (and that's a lot of things in Linux), while futexes are made to deal with synchronization. Futexes are also designed to be used in large numbers and accessed very frequently - they were built for synchronizing heavily multi-threaded workloads - whereas file descriptors are not (there's memory overhead and so on).

So futexes are in principle the better tool for the task at hand, but they were missing a couple of syscalls to match the need. The solution that was chosen is to add a syscall for waiting on multiple futexes, which is exactly the need here. "futex2" is a bit of a misnomer from what I saw of the patch, as it just adds syscalls around futexes rather than a whole new concept/object. The need to wait on multiple futexes is sound and seems reasonable to me, so solving it once and for all does seem like the thing to do - and it gives a cleaner, better solution to the problem than eventfd.


Last edited by 3zekiel on 10 October 2021 at 6:43 pm UTC
soulsource Oct 10, 2021
What? A functionality coming from windows world that its good?

It's not necessarily good. If windows has feature X, games will depend on it. Now it may have been better if games did Y instead, but that doesn't exist and they are stuck with X.

Just because Linux now needs a way to implement X doesn't mean that X was the best option, just that because of windows we are now stuck with it.

This.
Usually the problem of having to wait on multiple events can be solved by building a cleaner (more readable, more maintainable) software architecture. I'm writing "usually" on purpose here, as there definitely are problems where going for a cleaner solution is not worth the effort, or where waiting on multiple events really is the most readable implementation (to be honest, right now I can't think of a problem where the latter would be the case, but that's probably just lack of experience on my side).
However, synchronizing by waiting on multiple events is nearly always the easiest (quickest, cheapest in the short term) solution to implement. If there's a deadline coming up, it's almost certainly the solution that's going to be picked, even though in the long term it might cost more, since it's inherently more difficult to read and debug.

Generally speaking, if an API offers certain functionality, it will be used sooner or later. If one wants to be compatible, that functionality has to be there, and has to be about as performant as the original implementation (at least have the same asymptotic scaling behaviour).
ShabbyX Oct 11, 2021
Short answer: they kinda already tried that at first, and judged it to be a dead end.

Fair enough.

Long answer:
Modifying an existing (set of) syscall(s) is extremely limiting. You cannot break compatibility in any way, since that would break thousands of apps with no way for users to fix it. Unlike with libraries, you cannot just install another kernel or use a lightweight container to work around a kernel ABI breakage. So all the issues with that syscall set are pretty much set in stone.

Sure, but that doesn't mean you cannot provide the same functionality more efficiently. The point was eventfd doesn't scale, and making that scale doesn't necessarily have to interfere with its functionality.

More generally, it seems more natural and clean to use a tool that is actually made for your problem. File descriptors (which eventfd is built on) are made to deal with file-like things (and that's a lot of things in Linux), while futexes are made to deal with synchronization. Futexes are also designed to be used in large numbers, whereas file descriptors are not (there's memory overhead and so on).

Everything is a file. In fact the few things that Unix didn't make a file turned out to be the most problematic areas (pids and signals notably). At least the pid problem is remedied with fds (pidfd), and if signals aren't already, I'm sure they will be turned into fds too.

I said all that to say that given how central fds are, it's worthwhile to make sure eventfd is actually efficient, rather than keep trying to work around it.
3zekiel Oct 12, 2021
Sure, but that doesn't mean you cannot provide the same functionality more efficiently. The point was eventfd doesn't scale, and making that scale doesn't necessarily have to interfere with its functionality.

I think the point here is more that eventfd does not scale for this particular purpose. And I'd guess that holds for thread synchronization in general, because all synchronization primitives (mutexes, semaphores) on Linux have been built on futexes for quite a while.

Everything is a file. In fact the few things that Unix didn't make a file turned out to be the most problematic areas (pids and signals notably). At least the pid problem is remedied with fds (pidfd), and if signals aren't already, I'm sure they will be turned into fds too.

Well, you can see futexes as an extremely low-overhead fd too. As for signals... they are a different beast. They basically break the user-space application's illusion that all of its context is safe at every point - that nothing will come along and trash its current state. Basically, you bring kernel-like issues into user space. They have their uses sometimes, but the only real way to fix them is not to use them. Turning them into fds won't fix anything. Read the comments about signals in the kernel code and you will see how much the kernel devs love them :)

I said all that to say that given how central fds are, it's worthwhile to make sure eventfd is actually efficient, rather than keep trying to work around it.

eventfd is actually efficient enough for its purpose, I would expect. But the issue is that the inner counter is - by spec - maintained by the kernel. That means many round trips between kernel and user space, which limits performance, and I guess that is why you cannot just poll it as much as you want. And that is the syscall spec; you can't do much about it.
Futexes, on the other hand, were made so that you only go to kernel space if you cannot take the lock ownership/if the semaphore is at 0 (basically a yield). So you have far fewer round trips with futexes. They are also stored as a simple `intptr` (an address), whereas eventfd looks like this:
struct eventfd_ctx {
	struct kref kref;
	wait_queue_head_t wqh;
	/*
	 * Every time that a write(2) is performed on an eventfd, the
	 * value of the __u64 being written is added to "count" and a
	 * wakeup is performed on "wqh". A read(2) will return the "count"
	 * value to userspace, and will reset "count" to zero. The kernel
	 * side eventfd_signal() also, adds to the "count" counter and
	 * issue a wakeup.
	 */
	__u64 count;
	unsigned int flags;
	int id;
};

From the look of it, eventfd is more real-time (you wake up as soon as something happens, if you have the priority), whereas on the futex side you will wake up at your next quantum (I only see a mechanism unsuspending you, nothing scheduling you). Futexes also do not hold a list of who is waiting; they are just a counter. So the first one who comes and retakes the lock wins, it seems. That's coherent, since the scheduler is fair anyway.
So I would say they serve orthogonal purposes. I would typically use eventfd for IO-related waits, or if I need something a bit more real-time, and futexes for all the rest.
ShabbyX Oct 12, 2021
eventfd is actually efficient enough for its purpose, I would expect. But the issue is that the inner counter is - by spec - maintained by the kernel. That means many round trips between kernel and user space, which limits performance, and I guess that is why you cannot just poll it as much as you want. And that is the syscall spec; you can't do much about it.
Futexes, on the other hand, were made so that you only go to kernel space if you cannot take the lock ownership/if the semaphore is at 0 (basically a yield). So you have far fewer round trips with futexes. They are also stored as a simple `intptr` (an address), whereas eventfd looks like this:
struct eventfd_ctx {
	struct kref kref;
	wait_queue_head_t wqh;
	/*
	 * Every time that a write(2) is performed on an eventfd, the
	 * value of the __u64 being written is added to "count" and a
	 * wakeup is performed on "wqh". A read(2) will return the "count"
	 * value to userspace, and will reset "count" to zero. The kernel
	 * side eventfd_signal() also, adds to the "count" counter and
	 * issue a wakeup.
	 */
	__u64 count;
	unsigned int flags;
	int id;
};

From the look of it, eventfd is more real-time (you wake up as soon as something happens, if you have the priority), whereas on the futex side you will wake up at your next quantum (I only see a mechanism unsuspending you, nothing scheduling you). Futexes also do not hold a list of who is waiting; they are just a counter. So the first one who comes and retakes the lock wins, it seems. That's coherent, since the scheduler is fair anyway.
So I would say they serve orthogonal purposes. I would typically use eventfd for IO-related waits, or if I need something a bit more real-time, and futexes for all the rest.

Ack. I don't know eventfds well enough to actually have a proposal to improve it. But agreed, having futexes go through the kernel unconditionally is completely in contradiction with them being futexes, so a new syscall is reasonable.
HariboKing Jan 2, 2022
Hello folks,

Not really a Linux gamer, but a real-time software engineer (and contributor to the Linux kernel). Not sure how I stumbled on this post, but I just wanted to clarify a couple of points made by 3zekiel regarding futex functionality.

Futexes also do not hold a list of who is waiting, they are just a counter. So the first one who comes and retakes the lock wins, it seems. It's coherent since the scheduler is fair anyway.

Futexes *do* hold a list of waiters.

Before Pierre Peiffer's 'futex priority based wakeup' patch, non-Priority Inheritance (PI) futexes made use of a simple linked list to store the tasks waiting on a futex. With the old scheme, tasks would be enqueued on this list when required to wait. When waiting tasks were to be awoken, the relevant number of waiting tasks would be dequeued from the list of waiters and made runnable again. However, this scheme did not take account of the waiting tasks' priorities - waiters were woken in first-come-first-served (FIFO) order.

PI futexes, however, do not behave in the same way due to their very nature - they are priority aware (and more than that, they temporarily alter task priorities under certain conditions to avoid Priority Inversion).

Pierre's patch made changes to non-PI futexes such that futex wakeups are orchestrated with respect to the priority of awaiting tasks. See futex_wake() within kernel/futex.c for any mainline kernel version > v2.6.21.

From the look of it, eventfd will be more real-time (you wake up as soon as something happens, if you have the priority), whereas on the futex side you will wake up at your next quantum (I only see a mechanism unsuspending you, nothing scheduling you). Futexes also do not hold a list of who is waiting, they are just a counter. So the first one who comes and retakes the lock wins, it seems. It's coherent since the scheduler is fair anyway.

On !CONFIG_PREEMPT configured kernels, the required number of waiting tasks are simply made runnable on a futex wake event; there is no invocation of schedule(). However, on CONFIG_PREEMPT configured kernels, a reschedule is invoked. This ultimately happens within the call chain of wake_up_q():

futex_wake()
--> wake_up_q()
    --> wake_up_process()
        --> try_to_wake_up()
            --> preempt_enable() << This function invokes a reschedule on CONFIG_PREEMPT kernels in the subsequent call chain (details below here omitted).


So the reschedule invocation depends on the build-time configuration of the kernel and is buried relatively deep within the call chain (hence why I think 3zekiel missed it - I did too, the first time I looked).

Jack