Disclaimer: this information is all easily available elsewhere; I’m trying to condense it a bit and focus on a particular topic for discussion. I may gloss over or simplify some details, but the central ideas should apply.
There’s been a fair bit of talk recently about attempting to multi-thread using OpenGL so I thought I’d write a bit more about what “multi-threading” an OpenGL game is, what’s normally done, and how it compares to multi-threading in Vulkan.
Some History
For those not aware, OpenGL is, in computing terms, old. It was designed before multiple CPU cores were even available to the general consumer, and long before just about every part of a graphics pipeline was programmable.
The central concept of OpenGL is a state machine. This has served it very well for a long time, but a single OpenGL context (state machine) is based on sequential inputs from the application - the application calls the API, and the OpenGL implementation reacts. State is changed, rendering commands are issued, resources are loaded, and so on.
State of OpenGL
Because OpenGL is state based, and because any API call has the potential to change state, multi-threaded access to a single OpenGL context is very difficult. Say one thread sets a piece of state "x" and expects it to stay that way, but another thread changes "x" to "x+1"; the original thread then carries on without knowing the state has changed underneath it. This kind of access is not permitted in most cases and can result in undefined behaviour. There is one exception: contexts are allowed to share certain data such as texture information and vertex buffers, but more on that later.
A further consequence of a state based design is that OpenGL implementations must ensure that the state is always valid. They must ensure that data is correctly bound and in range, and that nothing will break the system. Up to this point, everything is still CPU-side. If everything is okay, the implementation may then generate commands that can be sent to the GPU itself.
Drivers can do some fancy things behind the scenes of course, but the end result, as presented to the application, is the same. Accept an API command, modify and verify state, send hardware commands to the GPU.
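To make that "accept, verify, send" cycle a bit more concrete, here is a toy sketch of a state-machine style API. Everything in it (`ToyContext`, `bindBuffer`, `draw`) is a made-up name for illustration; it is not how any real OpenGL driver is written, only a rough model of the per-call checking described above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of a state-machine API. All names are hypothetical; this is not
// how any real OpenGL driver is implemented.
class ToyContext {
public:
    // State-changing call: only updates CPU-side state.
    void bindBuffer(int bufferSize) {
        assert(bufferSize >= 0);
        boundBufferSize_ = bufferSize;
    }

    // Draw call: verify the current state first, then emit a "hardware" command.
    // Returns false (and emits nothing) if the state or arguments are invalid.
    bool draw(int first, int count) {
        if (boundBufferSize_ <= 0) return false;             // nothing bound
        if (first < 0 || count <= 0) return false;           // bad arguments
        if (first >= boundBufferSize_) return false;         // starts past the end
        if (count > boundBufferSize_ - first) return false;  // runs past the end
        gpuCommands_.push_back("draw " + std::to_string(first) + "+" +
                               std::to_string(count));
        return true;
    }

    // The commands that would have been sent to the GPU, in order.
    const std::vector<std::string>& gpuCommands() const { return gpuCommands_; }

private:
    int boundBufferSize_ = 0;
    std::vector<std::string> gpuCommands_;
};
```

Note that every `draw` pays for the checks, even when the application already knows the state is fine, and the result depends on whatever the last `bindBuffer` call happened to be.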
Recent versions of OpenGL have been a big help in cutting out a lot of the overhead. The amount of state validity checking can be reduced, greatly shortening the time from API call to GPU command, but it’s still very much a case of having to do that in a single thread.
Threading
So how can developers “multi-thread” OpenGL? It is possible to have multiple contexts in multiple threads, and use them to load texture data, update vertex buffers, possibly compile new shaders, in different threads. The tricky part about this is that sharing this information between OpenGL contexts is dependent on the drivers behaving themselves, in addition to the application not trying (by accident or intent) to do anything strange, so it’s often unstable. It can be quite the adventure getting a game running with this approach, and the runtime improvements are often simply not even worth the effort - it can often run worse if drivers need to synchronise data between contexts often enough. For the curious, things like editors with multiple rendering windows do this, but that’s a different scenario - each window isn’t trying to interfere with every other window while rendering, so multi-threading doesn’t normally come into play.
This leads to the second approach to multi-threading OpenGL: developers don’t! If OpenGL works best by submitting commands sequentially on the thread where a context is active, then that’s simply the best thing to do. Nothing stops a game developer making their own queue of OpenGL API calls they want to perform though, and building that queue can be multi-threaded. To give an example, if a game has a big list of objects, there’s going to be a bit of processing to do when deciding whether to draw each object or not; for each object, the game first decides if the object might be visible, and only tries to render it if it will actually be seen. The check for each object takes time, but processing each object is independent. So the list can be split into multiple sub-lists, and each sub-list given to a separate thread to run a visibility check on. Each thread will have its own rendering list to which objects that should be rendered are added. When done, each rendering list can be iterated over in turn and objects submitted to OpenGL in a single thread. This is a very simple example, but there’s normally a fair amount of similar logic in deciding what to render. So it’s not multi-threading OpenGL, but rather multi-threading the decisions about how to use OpenGL.
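As a rough sketch of that visibility example, assuming a made-up `Object` type and a deliberately simplistic one-dimensional `isVisible` test (both are my own inventions for illustration, not taken from any real engine), the split-then-submit pattern could look something like this:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical game object: a position along one axis and a radius.
struct Object {
    int id;
    float x;
    float radius;
};

// Hypothetical visibility test: the object overlaps the range [viewMin, viewMax].
bool isVisible(const Object& o, float viewMin, float viewMax) {
    return o.x + o.radius >= viewMin && o.x - o.radius <= viewMax;
}

// Split `objects` into `workerCount` sub-lists, check each sub-list on its own
// thread, and return one render list per worker. No OpenGL calls happen here.
std::vector<std::vector<int>> cullInParallel(const std::vector<Object>& objects,
                                             std::size_t workerCount,
                                             float viewMin, float viewMax) {
    assert(workerCount > 0);
    std::vector<std::vector<int>> renderLists(workerCount);
    std::vector<std::thread> workers;
    const std::size_t chunk = (objects.size() + workerCount - 1) / workerCount;

    for (std::size_t w = 0; w < workerCount; ++w) {
        const std::size_t begin = std::min(w * chunk, objects.size());
        const std::size_t end = std::min(begin + chunk, objects.size());
        // Each worker writes only to its own render list, so no locking is needed.
        workers.emplace_back([&objects, &renderLists, w, begin, end, viewMin, viewMax] {
            for (std::size_t i = begin; i < end; ++i) {
                if (isVisible(objects[i], viewMin, viewMax)) {
                    renderLists[w].push_back(objects[i].id);
                }
            }
        });
    }
    for (std::thread& t : workers) t.join();
    return renderLists;
}

// Runs on the single thread that owns the OpenGL context. A real game would
// issue draw calls here; this sketch only records the submission order.
std::vector<int> submitInOrder(const std::vector<std::vector<int>>& renderLists) {
    std::vector<int> submitted;
    for (const auto& list : renderLists) {
        for (int id : list) submitted.push_back(id);  // stand-in for draw calls
    }
    return submitted;
}
```

The worker threads never touch the graphics API at all; only `submitInOrder` would, and it runs on one thread, which is the whole point of the approach.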
Vulkan
Earlier, I mentioned that OpenGL verifies state information and then generates commands to the GPU.
Firstly, once a developer has finished making everything work, then all that verification is still done, but isn’t actually required. It’s useful during development, but later it’s (hopefully!) a waste of time. So even on a dedicated thread submitting commands to OpenGL, there’s quite the overhead for the final task of actually sending commands to the GPU itself. It would be nice if there was some way to pre-build a list of commands to send to the GPU that were known to be valid.
Secondly, as with the example above of a game splitting object visibility checks into multiple sub-lists, it would also be nice if multiple GPU command lists could be created on separate threads, and then submitted to the GPU in turn. They are separate after all, and don’t require GPU access to actually prepare.
This is essentially what Vulkan allows. There are some requirements: all the state must be known up-front, and prepared well before it’s time to actually render something. The flip side is that there is much, much less driver overhead, and the API itself can be used multi-threaded. Actual submission of commands to the GPU is still done sequentially, in a single thread; however, there’s very little overhead: all error checking has been done, and it’s just sending commands directly to the GPU (feeding the beast).
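The shape of that idea can be sketched without touching the real API. To be clear, the code below is not Vulkan: `CommandList`, `recordBatch`, `recordInParallel` and `submitAll` are hypothetical names of my own, and a "command" is just a string. It only models the pattern of recording several lists in parallel and then handing them over in order from one thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Toy stand-in for a pre-built command list. This is NOT the Vulkan API; it
// only models "record in parallel, submit in order".
struct CommandList {
    std::vector<std::string> commands;
};

// Record the draw commands for one batch of (hypothetical) object IDs.
// Recording touches only the list passed in, so it needs no shared state.
void recordBatch(CommandList& list, int firstId, int count) {
    for (int i = 0; i < count; ++i) {
        list.commands.push_back("draw " + std::to_string(firstId + i));
    }
}

// Record `batchCount` command lists, one thread per list.
std::vector<CommandList> recordInParallel(std::size_t batchCount, int objectsPerBatch) {
    assert(objectsPerBatch >= 0);
    std::vector<CommandList> lists(batchCount);  // sized up-front, never resized
    std::vector<std::thread> workers;
    for (std::size_t b = 0; b < batchCount; ++b) {
        const int firstId = static_cast<int>(b) * objectsPerBatch;
        workers.emplace_back([&lists, b, firstId, objectsPerBatch] {
            recordBatch(lists[b], firstId, objectsPerBatch);
        });
    }
    for (std::thread& t : workers) t.join();
    return lists;
}

// Single-threaded submission: the lists are handed over in a fixed order and
// no further checking is done at this point.
std::vector<std::string> submitAll(const std::vector<CommandList>& lists) {
    std::vector<std::string> queue;
    for (const CommandList& list : lists) {
        queue.insert(queue.end(), list.commands.begin(), list.commands.end());
    }
    return queue;
}
```

The expensive part (building the lists) scales across threads, while the final hand-over stays cheap and ordered.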
There are other areas of Vulkan that lend themselves nicely to application level multi-threading, but I won’t cover them here. Suffice to say that Vulkan does not contain a central state machine, and instead tries to keep everything as isolated and contained as possible, meaning things like building a shader don’t block loading a texture, making multi-threaded designs easier to achieve.
Not Always Applicable
On a final note: when porting games, the way a game handles its data is not always compatible with some of the multi-threading ideas mentioned above, so they can’t be expected in every game. In addition, it might simply be easier in time and effort (not to mention testing and stability) to run things in a single thread anyway. That is not as efficient, but it is possibly less error prone and faster for delivering a port.
I think, whether OpenGL, Vulkan or D3D12, I would still be inclined to allocate a single thread/work queue to do nothing but execute GPU commands, simply to ensure the submission order and avoid subtle bugs, leaving other threads to prepare data and execute API commands that do not cause GPU submission.
One of the most difficult things to do is to determine what benefit you will actually achieve. Multi-threading is not cost-free. Even CPUs that support hardware threading generally just allow a lower-cost switch between 2 thread contexts per core. Designing something that works well on anything from 2-16 hardware threads is not always trivial.
One intriguing aspect of OpenGL 4 that offers a potential alternative method for reducing draw call overhead (a primary driver for Mantle/Vulkan/D3D12 and their multi-threading enhancements) is MultiDrawIndirect. This allows you to construct a collection of draw commands as a data array on the GPU, which can then be invoked as a single draw call. This could lead to a dramatic reduction in draw call overhead in certain circumstances, but I'm not sure whether it is flexible enough for all uses. If anyone has used this on Linux, I would be interested to hear their views.
Is that true? From what I've read, modern GPUs support multiple queues for input (some for graphics, some for compute). I'm not sure what a GPU is supposed to do with multiple graphics queues, for example, since in the end the rendered image is a single frame, but if they exist, it means it should be possible to feed them from multiple threads (one thread per GPU input queue). And Vulkan should support that.
Also, it's possible to have multiple GPUs working in parallel (Vulkan aims to support that) to increase computational power. You for sure don't want to have one thread feeding such a hardware setup - it's going to be underutilized.
Last edited by Shmerl on 12 February 2017 at 4:58 am UTC
Yes, Vulkan uses a single thread for GPU submission. AFAIK, in hardware terms, the most common case where a single thread may cause throttling would be for an extremely powerful GPU with a relatively weak CPU. In such a case, you would be using a dedicated PCIe card for the GPU, and the need to use PCIe would enforce single-thread synchronization in the driver regardless of what you do higher up the software stack.
AMD's APUs and Intel integrated graphics are different in that they are monolithic silicon, and therefore might be expected to benefit from multi-threading; but as the GPU elements are relatively weak, it is probably not the case.
Either way, an application design is probably more robust if it explicitly synchronizes submission order through a single thread. Responsibility for explicit synchronization is the trade-off developers accept for the benefits of using Vulkan.
Multiple independently operating GPUs (say, one for compute and another for graphics) could clearly benefit from a submission thread per GPU, but if they are co-operating on the same tasks, you still need to synchronize submissions, so you would probably still want a single thread to do that work.
Search for "GPU queue" there. Synchronization is also covered there with queue semaphores. So I assume Vulkan should have an analog of the same idea.
I can't find it now, but I saw someone asking a similar question in one of the Khronos Q&As, and they said Vulkan should support multiple parallel GPU queues.
Last edited by Shmerl on 12 February 2017 at 5:31 pm UTC
But they avoid details about practical usage and threading in regards to that. I suppose some higher level articles should dive into that.
Last edited by Shmerl on 12 February 2017 at 6:21 pm UTC
From there, at least, it seems that GCN hardware has only one graphics queue, but in theory nothing prevents there being multiple, which is clearly a possibility with a multi-GPU setup. I.e. SLI/Crossfire-like scenarios, where multiple GPUs are used for rendering a single target, would be such a case. Supposedly it's coming in Vulkan-next.
Last edited by Shmerl on 12 February 2017 at 6:39 pm UTC
But, in a few words:
Is OpenGL obsolete because it doesn't make efficient use of modern CPUs?
Now, for the sake of porting Windows games to Linux, and for the sake of the performance of those ports, would it not be more convenient to teach Linux how to speak D3D11 (like gallium9 does with D3D9) instead of using OpenGL?
Vulkan may be the future, but it is not the standard in current blockbuster games.