DRI3K — First Steps
Here’s an update on DRI3000. I’ll start by describing what I’ve
managed to get working and then summarize discussions that happened on
the xorg-devel mailing list.
Private Back Buffers
One of the big goals for DRI3000 is to finish the job of moving buffer
management out of the X server and into applications. The only things
still allocated by DRI2 in the X server are back buffers; everything
else moved to the client side. Yes, I know, this breaks the GLX
requirement for sharing buffers between applications, but we just
don’t care anymore.
As a quick hack, I figured out how to do this with DRI2 today —
allocate our back buffers separately by creating X pixmaps for them,
and then use the existing DRI2GetBuffersWithFormat request to get a
GEM handle for them.
Of course, now that all I’ve got is a pixmap, I can’t use the existing
DRI2 swap buffer support, so for now I’m just using CopyArea to get
stuff on the screen. But, that works fine, as long as you don’t care
about synchronization.
Handling Window Resize
The biggest pain in DRI2 has been dealing with window resize. When the
window resizes in the X server, a new back buffer is allocated and the
old one discarded. An event is delivered to ‘invalidate’ the old back
buffer, but anything done between the time the back buffer is
discarded and when the application responds to the event is
lost.
You can easily see this with any GL application today — resize the window
and you’ll see occasional black frames.
By allocating the back buffer in the application, the application
handles the resize within GL; at some point in the rendering process
the resize is discovered, and GL creates a new buffer, copies the
existing data over, and continues rendering. So, the rendered data are
never lost, and every frame gets displayed on the screen (although,
perhaps at the wrong size).
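
Here is a minimal sketch of that client-side resize step, assuming a plain
CPU-visible buffer of 32-bit pixels. The names (back_buffer,
back_buffer_resize) are made up for illustration; a real driver would
allocate a GEM buffer and copy with the GPU rather than use calloc and
memcpy.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical client-side back buffer: 32-bit pixels, tightly packed. */
struct back_buffer {
    uint32_t *pixels;
    int width, height;
};

/* Allocate a zero-filled buffer; returns 0 on success, -1 on failure. */
static int back_buffer_init(struct back_buffer *bb, int width, int height)
{
    assert(width > 0 && height > 0);
    bb->pixels = calloc((size_t)width * (size_t)height, sizeof *bb->pixels);
    if (!bb->pixels)
        return -1;
    bb->width = width;
    bb->height = height;
    return 0;
}

/* Create a buffer at the new size, copy the overlapping region of the
 * old contents into it, and discard the old buffer. Nothing already
 * rendered inside the overlap is lost. */
static int back_buffer_resize(struct back_buffer *bb, int width, int height)
{
    struct back_buffer fresh;
    int copy_w, copy_h, y;

    if (width == bb->width && height == bb->height)
        return 0;
    if (back_buffer_init(&fresh, width, height) < 0)
        return -1;

    copy_w = width < bb->width ? width : bb->width;
    copy_h = height < bb->height ? height : bb->height;
    for (y = 0; y < copy_h; y++)
        memcpy(fresh.pixels + (size_t)y * (size_t)fresh.width,
               bb->pixels + (size_t)y * (size_t)bb->width,
               (size_t)copy_w * sizeof *bb->pixels);

    free(bb->pixels);
    *bb = fresh;
    return 0;
}
```

The point of the sketch is only the ordering: the new buffer exists and
holds the old contents before the old one goes away, so there is never a
moment where a frame has nowhere to live.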
The puzzle here was how to tell that the window was resized. Ideally,
we’d have the application tell us when it received the X configure
notify event and was drawing the frame at the new size. We thought of
a cute hack that might do this; track GL calls to change the viewport
and make sure the back buffer could hold the viewport contents. In
theory, the application would receive the X configure notify event,
change the viewport and render at the new size.
Tracking the viewport settings for an entire frame and constructing
their bounding box should describe the size of the window; at least it
should describe the intended size of the window.
There’s at least one serious problem with this plan — applications may
well call glClear before calling glViewport, and as glClear does not
use the current viewport, instead clearing the “whole” window, we
couldn’t use the viewport as an indication of the current window size.
However, what this exercise did lead us to realize was that we don’t
care what size the window actually is, we only care what size the
application thinks it is. More accurately, the GL library just needs
to be aware of any window configuration changes before the
application, so that it will construct a buffer that is not older than
the application knowledge of the window size.
I came up with two possible mechanisms here; the first was to
construct a shared memory block between application and X server where
the X server would store window configuration changes and signal the
application by incrementing a sequence number in the shared page; the
GL library would simply look at the sequence number and reallocate
buffers when it changed.
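
A rough sketch of what that shared page might have looked like, using C11
atomics. The layout and function names are assumptions for illustration
only; nothing like this exists in the server. Packing the size into one
atomic word is my own simplification so the client can never read a torn
width/height pair.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of the page shared by the X server and a client. */
struct shared_window_config {
    atomic_uint_least32_t size;     /* (width << 16) | height */
    atomic_uint_least32_t sequence; /* bumped after every change */
};

/* Client-side view of the drawable, private to the GL library. */
struct gl_drawable {
    uint32_t seen_sequence;
    uint16_t width, height;
};

/* Server side: publish the new size, then bump the sequence number. */
static void server_store_config(struct shared_window_config *page,
                                uint16_t width, uint16_t height)
{
    assert(page != NULL);
    atomic_store_explicit(&page->size, ((uint32_t)width << 16) | height,
                          memory_order_relaxed);
    atomic_fetch_add_explicit(&page->sequence, 1, memory_order_release);
}

/* Client side: returns true when the sequence number moved, meaning the
 * buffers should be reallocated at the size now stored in *d. If the
 * server races ahead, the client may read a size newer than the sequence
 * it saw; that is harmless, as the next check picks up the change. */
static bool client_check_config(struct shared_window_config *page,
                                struct gl_drawable *d)
{
    uint32_t seq, size;

    assert(page != NULL && d != NULL);
    seq = atomic_load_explicit(&page->sequence, memory_order_acquire);
    if (seq == d->seen_sequence)
        return false;
    size = atomic_load_explicit(&page->size, memory_order_relaxed);
    d->seen_sequence = seq;
    d->width = (uint16_t)(size >> 16);
    d->height = (uint16_t)(size & 0xffff);
    return true;
}
```

The check on the client side is a single load and compare in the common
case, which is what made this plan attractive.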
The problem with the shared memory plan was that it wouldn’t work
across the network, and we have a future project in mind to replace
GLX indirect rendering with local direct rendering and PutImage, which
still needs accurate window size tracking. More about that project in
a future post though…
X Events to the Rescue
So, I decided to just have the X server send me events when the window
size changed. I could simply use the existing X configure notify
events, but that would require a huge infrastructure change in the
application so that my GL library could get those events and have
the application also see them. Not knowing what the application is up
to, we’d have to track every ChangeWindowAttributes call and make sure
the event_mask included the right bits. Ick.
Fortunately, there’s another reason to use a new event — we need more
information than is provided in the ConfigureNotify event; as you
know, the Swap extension wants to have applications draw their content
within a larger buffer that can have the window decorations placed
around it to avoid a copy from back buffer to window buffer. So, our
new ConfigureNotify event would also contain that information.
Making sure that the new ConfigureNotify event is delivered before the
core ConfigureNotify event ensures that the GL library will always
know about window size changes before the application does.
Splitting the XCB Event Stream
Ok, so I’ve got these new events coming from the X server. I don’t
want the application to have to receive them and hand them down to the
GL library; that would mean changing every application on the planet,
something which doesn’t seem very likely at all.
Xlib does this kind of thing by allowing applications to stick
themselves into the middle of the event processing code with a
callback to filter out the events they’re interested in before they
hit the main event queue. That’s how DRI2 captures Invalidate events,
and it “works”, but using callbacks from the middle of the X event
processing code creates all kinds of locking nightmares.
As discussed above, I don’t care when GL sees the configure events, as
long as it gets them before the application finds out about the
window size change. So, we don’t need to synchronously handle these
events, we just need to be able to know they’ve arrived and then
handle them on the next call to a GL drawing function.
What I’ve created as a prototype is the ability to identify
specific events and place them in a separate event queue, and when
events are placed in that event queue, to bump a ‘sequence number’ so
that the application can quickly identify that there’s something to
process.
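
To make the idea concrete, here is a small stand-alone sketch of a side
queue with a sequence number, and of a GL library draining it lazily at
the start of a drawing call. This is not the XCB prototype code; every
name here is hypothetical, and the sketch is single-threaded where the
real thing would need locking against the XCB reader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIDE_QUEUE_CAPACITY 16

/* Hypothetical, trimmed-down configure event. */
struct config_event {
    uint32_t window;
    uint16_t width, height;
};

/* Separate queue for events that were filtered out of the main stream. */
struct side_queue {
    struct config_event events[SIDE_QUEUE_CAPACITY];
    size_t head, count;
    uint32_t sequence;  /* bumped every time an event is queued */
};

/* Called by the event reader when it identifies one of our events. */
static bool side_queue_push(struct side_queue *q, struct config_event ev)
{
    assert(q != NULL);
    if (q->count == SIDE_QUEUE_CAPACITY)
        return false;
    q->events[(q->head + q->count) % SIDE_QUEUE_CAPACITY] = ev;
    q->count++;
    q->sequence++;
    return true;
}

static bool side_queue_pop(struct side_queue *q, struct config_event *ev)
{
    if (q->count == 0)
        return false;
    *ev = q->events[q->head];
    q->head = (q->head + 1) % SIDE_QUEUE_CAPACITY;
    q->count--;
    return true;
}

/* GL library's view of one drawable. */
struct gl_surface {
    uint32_t seen_sequence;
    uint16_t width, height;
};

/* Called at the start of a GL drawing function. The common case is one
 * integer compare; the queue is drained only when something arrived.
 * Returns true when the back buffer needs to be reallocated. */
static bool gl_surface_update(struct gl_surface *s, struct side_queue *q)
{
    struct config_event ev;
    bool resized = false;

    if (s->seen_sequence == q->sequence)
        return false;
    while (side_queue_pop(q, &ev)) {
        if (ev.width != s->width || ev.height != s->height) {
            s->width = ev.width;
            s->height = ev.height;
            resized = true;
        }
    }
    s->seen_sequence = q->sequence;
    return resized;
}
```

No callback runs from inside the event reader; all it does is store the
event and bump a counter, which is what keeps the locking simple.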
Making the Event Mask Per-API Instead of Per-Client
The problem described above about using the core ConfigureNotify
events made me think about how to manage multiple APIs all wanting to
track window configuration. For core events, the selection of which
events to receive is all based on the client; each client has a single
event mask, and each client receives one copy of each event.
Monolithic applications work fine with this model; there’s one place
in the application selecting for events and one place processing
them. However, modern applications end up using different
APIs for 3D, 2D and media. Getting those libraries to cooperate and
use a common API for event management seems pretty intractable. Making
the X server treat each API as a separate entity seemed a whole lot
easier; if two APIs want events, just have them register separately
and deliver two events flagged for the separate APIs.
So, the new DRI3 configure notify events are created with their own
XID to identify the client-side owner of the event. Within the X
server, this required a tiny change; we already needed to allocate an
XID for each event selection so that it could be automatically cleaned
up when the client exited, so the only change was to use the one
provided by the client instead of allocating one in the server.
On the wire, the event includes this new XID so that the library can
use it to sort out which event queue to stick the event in using the
new XCB event stream splitting code.
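
A minimal sketch of that sorting step, assuming each library registers
the XID it used for its event selection along with the queue it wants
events delivered to. The structures and function names are invented for
this example and are not part of libxcb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SELECTIONS 8

/* Hypothetical per-API queue; only a counter here to keep things short. */
struct api_queue {
    const char *owner;
    unsigned pending;
};

/* One event selection: the client-chosen XID and where its events go. */
struct selection {
    uint32_t eid;
    struct api_queue *queue;
};

struct event_router {
    struct selection selections[MAX_SELECTIONS];
    size_t count;
};

/* Register a selection. Returns 0 on success, -1 if the XID is already
 * in use or the table is full. */
static int router_select(struct event_router *r, uint32_t eid,
                         struct api_queue *queue)
{
    size_t i;

    assert(r != NULL && queue != NULL);
    for (i = 0; i < r->count; i++)
        if (r->selections[i].eid == eid)
            return -1;
    if (r->count == MAX_SELECTIONS)
        return -1;
    r->selections[r->count].eid = eid;
    r->selections[r->count].queue = queue;
    r->count++;
    return 0;
}

/* Route an incoming event by the XID carried on the wire. Returns the
 * queue that took it, or NULL if no library claimed that XID and the
 * event belongs in the main queue. */
static struct api_queue *router_dispatch(struct event_router *r, uint32_t eid)
{
    size_t i;

    for (i = 0; i < r->count; i++) {
        if (r->selections[i].eid == eid) {
            r->selections[i].queue->pending++;
            return r->selections[i].queue;
        }
    }
    return NULL;
}
```

Two libraries watching the same window each register their own XID, so
the server sends two events and each lands in its own queue without the
libraries knowing about each other.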
Current Status
The sections above describe the work that I’ve got running; with it, I
can run GL applications and have them correctly track window size
changes without losing a frame. It’s all available on the ‘dri3’
branches of my various repositories for xcb proto, libxcb, dri3proto
and the X server.
Future Directions
The first obvious change needed is to move the configuration events
from the DRI3 extension to the as-yet-unspecified new ‘Swap’ extension
(which I may rename as ‘Present’, as in ‘please present this pixmap in
this window’). That’s because they aren’t related to direct rendering,
but rather to tracking window sizes for off-screen rendering,
whether direct, indirect, or even done by the CPU in memory.
DRI3 and Fences
Right now, I’m not synchronizing the direct rendering with the
CopyArea call; that means the X server will end up with essentially
random contents as the application may be mid-way through the next
frame before it processes the CopyArea. A simple XSync call would
suffice to fix that, but I want a more efficient way of doing this.
With the current Linux DRI kernel APIs, it is sufficient to serialize
calls that post rendering requests to the kernel to ensure that the
rendering requests are themselves serialized. So, all I need to do is
have the application wait until the X server has sent the CopyArea
request down to the kernel.
I could do that by having the X server send me an X event, but I think
there’s a better way that will extend to systems that don’t offer the
kernel serialization guarantee. James Jones and Aaron Plattner put
together a proposal to add Fences to the X Sync extension. In the X
world, those offer a method to serialize rendering between two X
applications, but of course the real goal is to expose those fences to
GL applications through the various GL sync extensions (including
GL_ARB_sync and GL_NV_fence).
With the current Linux DRI implementation, I think it would be pretty
easy to implement these fences using pthread semaphores in a block of
memory shared between the server and application. That would be
DRI-specific; other direct rendering interfaces would use alternate
means to share the fences between X server and application.
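
As a sketch of how small that could be, here is a one-shot fence built on
a POSIX semaphore initialized as process-shared. The shm_fence names are
hypothetical, and the sketch leaves out the part that matters most in
practice: getting the block of memory mapped into both the server and the
application. It assumes the struct already lives in such a mapping.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fence; assumed to live in memory mapped by both the
 * X server and the application. */
struct shm_fence {
    sem_t sem;
};

/* Initialize untriggered. The second argument to sem_init marks the
 * semaphore as shared between processes. */
static int shm_fence_init(struct shm_fence *f)
{
    assert(f != NULL);
    return sem_init(&f->sem, 1, 0);
}

/* Server side: called once the CopyArea has been handed to the kernel. */
static int shm_fence_trigger(struct shm_fence *f)
{
    return sem_post(&f->sem);
}

/* Client side: block until triggered, consuming the trigger so the
 * fence can be reused for the next frame. */
static int shm_fence_await(struct shm_fence *f)
{
    int ret;

    do {
        ret = sem_wait(&f->sem);
    } while (ret < 0 && errno == EINTR);
    return ret;
}

/* Client side: non-blocking check; consumes the trigger if present. */
static bool shm_fence_try_await(struct shm_fence *f)
{
    return sem_trywait(&f->sem) == 0;
}
```

The application would call shm_fence_await before starting to render
into the pixmap again; the server would call shm_fence_trigger after
queuing the copy. Neither side needs a round trip through the X protocol.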
Swap/Present — The Second Extension
By simply using CopyArea for my application presentation step, I think
I’ve neatly split this problem into manageable pieces. Once I’ve got
the DRI3 piece working, I’ll move on to fixing the presentation
issue.
By making that depend solely on existing core Pixmap objects as the
source of data to present, I can develop that without any reference to
DRI. This will make the extension useful to existing X applications
that currently have only CopyArea for this operation.
Presentation of application contents occurs in two phases; the first
is to identify which objects are involved in the presentation. The
second is to perform the presentation operation, either using
CopyArea, or by swapping pages or the entire frame buffer. For
offscreen objects, these can occur at the same time. For
onscreen objects, the presentation will likely be synchronized with the
scanout engine.
The second form will mean that the Fences that mark when the
presentation has occurred will need to be signaled only once the
operation completes.
A CopyArea operation means that the source pixmap is “ready”
immediately after the Copy has completed. Doing the presentation by
using the source pixmap as the new front buffer means that the source
pixmap doesn’t become “ready” until after the next swap completes.
What I don’t know now is whether we’ll need to report up-front
whether the presentation will involve a copy or a swap. At this point,
I don’t think so — the application will need two back buffers in all
cases to avoid blocking between the presentation request and the
presentation execution. Yes, it could use a fence for this, but that
still sticks a bubble in the 3D hardware where it’s blocked waiting
for vblank instead of starting on the next frame immediately.
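
The bookkeeping for that is tiny. Here is a sketch, with made-up names,
of a two-entry pool where a buffer is busy from the presentation request
until its fence says the presentation has executed. Whether the server
did a copy or a swap only changes when the release happens, not how the
application picks its next buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_BACK_BUFFERS 2

/* Hypothetical pool of back buffers owned by the application. */
struct back_buffer_pool {
    bool busy[NUM_BACK_BUFFERS]; /* true while a presentation is pending */
};

/* Pick an idle buffer to render the next frame into. Returns its index,
 * or -1 if every buffer is still waiting to be presented. */
static int pool_acquire(const struct back_buffer_pool *pool)
{
    int i;

    assert(pool != NULL);
    for (i = 0; i < NUM_BACK_BUFFERS; i++)
        if (!pool->busy[i])
            return i;
    return -1;
}

/* The presentation request for this buffer has been sent. */
static void pool_present(struct back_buffer_pool *pool, int index)
{
    assert(index >= 0 && index < NUM_BACK_BUFFERS);
    assert(!pool->busy[index]);
    pool->busy[index] = true;
}

/* The fence for this buffer was signaled: after the copy completes, or
 * after the next swap if the buffer became the front buffer. */
static void pool_release(struct back_buffer_pool *pool, int index)
{
    assert(index >= 0 && index < NUM_BACK_BUFFERS);
    pool->busy[index] = false;
}
```

With only one buffer, pool_acquire would return -1 right after every
presentation request and the application would have to stall; with two,
it can start on the next frame while the first is still waiting.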
Plan of Attack
Right now, I’m working on finishing up the DRI3 piece:
1. Replace the DRI2 buffer allocation kludge with actual local
   buffer allocation, mapping them into pixmaps using FD passing.
2. Replace the DRI2 authentication scheme with having the X server
   open the DRI object, preparing it for rendering and passing it
   back to the application.
3. Work on the XCB pieces to get the split event-queue stuff landed
   upstream.
4. Implement the Fencing stuff to correctly serialize access to the
   pixmap.
The first three seem fairly straightforward. The fencing stuff will
involve working with James and Aaron to integrate their XSync changes
into the server.
After that, I’ll start working on the presentation piece. Foremost
there is figuring out the right name for this new extension; I started
with the name ‘Swap’ as that’s the GL call it implements. However,
‘Swap’ is quite misleading as to the actual functionality; a name more
like ‘Present’ might provide a better indication of what it actually
does. Of course, ‘Present’ is both a verb and a noun, with very
different connotations. Suggestions on this most complicated part of
the project are welcome!