Discussion:
[PATCH RFC wayland-protocols] unstable/linux-dmabuf: add wp_linux_dmabuf_device_hint
Simon Ser
2018-11-01 16:44:58 UTC
On multi-GPU setups, multiple devices can be used for rendering. Clients need
hints about the device in use by the compositor. For instance, if they render
on another GPU, then they need to make sure the memory is accessible between
devices and that their buffers are not placed in hidden memory.

This commit introduces a new wp_linux_dmabuf_device_hints object. This object
advertizes a preferred device via a file descriptor and a set of preferred
formats/modifiers.

Each object is bound to a wl_surface and can dynamically update its hints. This
enables fine-grained per-surface optimizations. For instance, when a surface is
scanned out on a GPU the compositor isn't compositing with, the preferred
device can be set to this GPU to avoid unnecessary roundtrips.

Signed-off-by: Simon Ser <***@emersion.fr>
---

These additions are inspired by [1]. The goal here is to be able to get rid
of wl_drm, enabling more use-cases in the process.

I'm not a DRM/Mesa specialist, so let me know if I've made horrible mistakes.
As always, comments and questions are welcome.

[1]: https://gitlab.freedesktop.org/wayland/wayland/issues/59

.../linux-dmabuf/linux-dmabuf-unstable-v1.xml | 67 ++++++++++++++++++-
1 file changed, 65 insertions(+), 2 deletions(-)

diff --git a/unstable/linux-dmabuf/linux-dmabuf-unstable-v1.xml b/unstable/linux-dmabuf/linux-dmabuf-unstable-v1.xml
index 154afe2..eafb559 100644
--- a/unstable/linux-dmabuf/linux-dmabuf-unstable-v1.xml
+++ b/unstable/linux-dmabuf/linux-dmabuf-unstable-v1.xml
@@ -24,7 +24,7 @@
DEALINGS IN THE SOFTWARE.
</copyright>

- <interface name="zwp_linux_dmabuf_v1" version="3">
+ <interface name="zwp_linux_dmabuf_v1" version="4">
<description summary="factory for creating dmabuf-based wl_buffers">
Following the interfaces from:
https://www.khronos.org/registry/egl/extensions/EXT/EGL_EXT_image_dma_buf_import.txt
@@ -35,6 +35,9 @@
the set of supported formats and format modifiers is sent with
'format' and 'modifier' events.

+ Clients can use the get_surface_device_hints request to get dmabuf hints
+ for a particular surface.
+
The following are required from clients:

- Clients must ensure that either all data in the dma-buf is
@@ -138,9 +141,19 @@
<arg name="modifier_lo" type="uint"
summary="low 32 bits of layout modifier"/>
</event>
+
+ <request name="get_surface_device_hints" since="4">
+ <description summary="get device hints for a surface">
+ This request creates a new wp_linux_dmabuf_device_hints object for the
+ specified wl_surface. This object will deliver hints about dmabuf
+ parameters to use for buffers attached to this surface.
+ </description>
+ <arg name="id" type="new_id" interface="zwp_linux_dmabuf_device_hints_v1"/>
+ <arg name="surface" type="object" interface="wl_surface"/>
+ </request>
</interface>

- <interface name="zwp_linux_buffer_params_v1" version="3">
+ <interface name="zwp_linux_buffer_params_v1" version="4">
<description summary="parameters for creating a dmabuf-based wl_buffer">
This temporary object is a collection of dmabufs and other
parameters that together form a single logical buffer. The temporary
@@ -345,4 +358,54 @@

</interface>

+ <interface name="zwp_linux_dmabuf_device_hints_v1" version="4">
+ <description summary="dmabuf device hints">
+ This object advertizes dmabuf hints for a surface. Such hints include the
+ primary device and the formats that are preferred for this surface.
+
+ These hints are sent once when this object is created and whenever they
+ change. The done event is always sent once after all hints have been sent.
+ </description>
+
+ <request name="destroy" type="destructor">
+ <description summary="destroy the device hints">
+ Using this request a client can tell the server that it is not going to
+ use the wp_linux_dmabuf_device_hints object anymore.
+ </description>
+ </request>
+
+ <event name="primary_device">
+ <description summary="preferred primary device">
+ This event advertizes the primary device that the server prefers. There
+ is exactly one primary device.
+ </description>
+ <arg name="fd" type="fd" summary="device file descriptor"/>
+ </event>
+
+ <event name="modifier">
+ <description summary="preferred buffer format modifier">
+ This event advertises the formats that the server prefers, along with
+ the modifiers preferred for each format.
+
+ For the definition of the format and modifier codes, see the
+ wp_linux_buffer_params::create request.
+ </description>
+ <arg name="format" type="uint" summary="DRM_FORMAT code"/>
+ <arg name="modifier_hi" type="uint"
+ summary="high 32 bits of layout modifier"/>
+ <arg name="modifier_lo" type="uint"
+ summary="low 32 bits of layout modifier"/>
+ </event>
+
+ <event name="done">
+ <description summary="all hints have been sent">
+ This event is sent after all properties of a
+ wp_linux_dmabuf_device_hints have been sent.
+
+ This allows changes to the wp_linux_dmabuf_device_hints properties to be
+ seen as atomic, even if they happen via multiple events.
+ </description>
+ </event>
+ </interface>
+
</protocol>
--
2.19.1
Daniel Stone
2018-11-01 17:04:51 UTC
Hi Simon,
Thanks a lot for taking this on! :)
Post by Simon Ser
This commit introduces a new wp_linux_dmabuf_device_hints object. This object
advertizes a preferred device via a file descriptor and a set of preferred
formats/modifiers.
s/advertizes/advertises/g (including in the XML doc)

I also think this would be better called
wp_linux_dmabuf_surface_hints, since the change over the dmabuf
protocol is that it's surface-specific.
Post by Simon Ser
+ <interface name="zwp_linux_dmabuf_device_hints_v1" version="4">
+ <description summary="dmabuf device hints">
+ This object advertizes dmabuf hints for a surface. Such hints include the
*advertises
Post by Simon Ser
+ <event name="primary_device">
+ <description summary="preferred primary device">
+ This event advertizes the primary device that the server prefers. There
+ is exactly one primary device.
+ </description>
+ <arg name="fd" type="fd" summary="device file descriptor"/>
+ </event>
I _think_ this might want to refer to separate objects.

When we receive an FD from the server, we don't know what device it
refers to, so we have to open the device to probe it. Opening the
device can be slow: if a device is in a low PCI power state, it can be
a couple of seconds to physically power up the device and then wait
for it to initialise before we can interrogate it.

One way around this would be to have a separate wp_linux_dmabuf_device
object, lazily sent as a new object in an event by the root
wp_linux_dmabuf object, with the per-surface hints then referring to a
previously-sent device. This would allow clients to only probe each
device once per EGLDisplay, rather than once per EGLSurface.
Post by Simon Ser
+ <event name="modifier">
+ <description summary="preferred buffer format modifier">
+ This event advertises the formats that the server prefers, along with
+ the modifiers preferred for each format.
+
+ For the definition of the format and modifier codes, see the
+ wp_linux_buffer_params::create request.
+ </description>
+ <arg name="format" type="uint" summary="DRM_FORMAT code"/>
+ <arg name="modifier_hi" type="uint"
+ summary="high 32 bits of layout modifier"/>
+ <arg name="modifier_lo" type="uint"
+ summary="low 32 bits of layout modifier"/>
+ </event>
I think we want another event here, to group sets of modifiers
together by preference.

For example, say the surface could be directly scanned out, but only
if it uses the linear or X-tiled modifiers. Our surface-preferred
modifiers would be LINEAR + X_TILED. However, the client may not be
able to produce that combination. If the GPU still supports Y_TILED,
then we want to indicate that the client _can_ use Y_TILED if it needs
to, but _should_ use LINEAR or X_TILED.

DRI3 implements this by sending sets of modifiers in 'tranches', which
are arrays of arrays, which in this case would be:
tranches = {
[0 /* optimal */] = {
{ .format = XRGB8888, .modifier = LINEAR }
{ .format = XRGB8888, .modifier = X_TILED }
},
[1 /* less optimal */] = {
{ .format = XRGB8888, .modifier = Y_TILED }
}
}

I imagine the best way to do it with Wayland events would be to add a
'marker' event to indicate the border between these tranches. So we
would send:
modifier(XRGB8888, LINEAR)
modifier(XRGB8888, X_TILED)
barrier()
modifier(XRGB8888, Y_TILED)
barrier()
done()

For a simple 'GPU composition or scanout' case, this would only be two
tranches, which are 'most optimal' and 'fallback'. For multiple GPUs
though, we could end up with three tranches: scanout-capable,
same-GPU-composition, or cross-GPU-composition. Similarly, if we take
media recording into account, we could end up with more than two
tranches.
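
To make the grouping concrete, here is a rough client-side sketch of
accumulating these events into tranches. The structures and handler names
are made up, and the actual Wayland listener wiring is omitted:

#include <stdint.h>
#include <stdlib.h>

struct fmt_mod {
    uint32_t format;
    uint64_t modifier;
};

struct tranche {
    struct fmt_mod *entries;
    size_t len;
};

struct hints_state {
    struct tranche *tranches;  /* ordered from most to least preferred */
    size_t n_tranches;
    struct tranche current;    /* entries received since the last barrier */
};

/* modifier event: append to the tranche currently being built
 * (error handling omitted for brevity) */
static void handle_modifier(struct hints_state *s, uint32_t format,
                            uint32_t mod_hi, uint32_t mod_lo)
{
    struct fmt_mod fm = {
        .format = format,
        .modifier = ((uint64_t)mod_hi << 32) | mod_lo,
    };

    s->current.entries = realloc(s->current.entries,
                                 (s->current.len + 1) * sizeof(fm));
    s->current.entries[s->current.len++] = fm;
}

/* barrier event: close the current tranche, start a less preferred one */
static void handle_barrier(struct hints_state *s)
{
    s->tranches = realloc(s->tranches,
                          (s->n_tranches + 1) * sizeof(*s->tranches));
    s->tranches[s->n_tranches++] = s->current;
    s->current = (struct tranche){0};
}

/* done event: the update is complete; the client can now pick the first
 * tranche that contains a format+modifier it can actually render with */
static void handle_done(struct hints_state *s)
{
    (void)s;
}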

What do you think?

Cheers,
Daniel
Pekka Paalanen
2018-11-02 08:53:28 UTC
On Thu, 1 Nov 2018 17:04:51 +0000
Post by Daniel Stone
Hi Simon,
Thanks a lot for taking this on! :)
Post by Simon Ser
This commit introduces a new wp_linux_dmabuf_device_hints object. This object
advertizes a preferred device via a file descriptor and a set of preferred
formats/modifiers.
s/advertizes/advertises/g (including in the XML doc)
I also think this would be better called
wp_linux_dmabuf_surface_hints, since the change over the dmabuf
protocol is that it's surface-specific.
Post by Simon Ser
+ <interface name="zwp_linux_dmabuf_device_hints_v1" version="4">
+ <description summary="dmabuf device hints">
+ This object advertizes dmabuf hints for a surface. Such hints include the
*advertises
Post by Simon Ser
+ <event name="primary_device">
+ <description summary="preferred primary device">
+ This event advertizes the primary device that the server prefers. There
+ is exactly one primary device.
+ </description>
+ <arg name="fd" type="fd" summary="device file descriptor"/>
+ </event>
I _think_ this might want to refer to separate objects.
When we receive an FD from the server, we don't know what device it
refers to, so we have to open the device to probe it. Opening the
device can be slow: if a device is in a low PCI power state, it can be
a couple of seconds to physically power up the device and then wait
for it to initialise before we can interrogate it.
Hi,

wouldn't drmGetDevice2() with flags=0 get us everything needed without
waking up a sleeping PCI device?

I just read it from Emil:
https://lists.freedesktop.org/archives/mesa-dev/2018-October/207447.html
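
A minimal sketch of that approach, assuming libdrm and that hint_fd is the
fd received in the primary_device event (the helper name is made up):

#include <stdbool.h>
#include <xf86drm.h>

/* flags=0 means the PCI revision is not read (no DRM_DEVICE_GET_PCI_REVISION),
 * which is the part that could otherwise wake a sleeping device */
static bool same_device(int hint_fd, int known_fd)
{
    drmDevicePtr a = NULL, b = NULL;
    bool same = false;

    if (drmGetDevice2(hint_fd, 0, &a) == 0 &&
        drmGetDevice2(known_fd, 0, &b) == 0)
        same = drmDevicesEqual(a, b);

    if (a)
        drmFreeDevice(&a);
    if (b)
        drmFreeDevice(&b);
    return same;
}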
Post by Daniel Stone
One way around this would be to have a separate wp_linux_dmabuf_device
object, lazily sent as a new object in an event by the root
wp_linux_dmabuf object, with the per-surface hints then referring to a
previously-sent device. This would allow clients to only probe each
device once per EGLDisplay, rather than once per EGLSurface.
This optimization does sound attractive to me in any case.
Post by Daniel Stone
Post by Simon Ser
+ <event name="modifier">
+ <description summary="preferred buffer format modifier">
+ This event advertises the formats that the server prefers, along with
+ the modifiers preferred for each format.
+
+ For the definition of the format and modifier codes, see the
+ wp_linux_buffer_params::create request.
+ </description>
+ <arg name="format" type="uint" summary="DRM_FORMAT code"/>
+ <arg name="modifier_hi" type="uint"
+ summary="high 32 bits of layout modifier"/>
+ <arg name="modifier_lo" type="uint"
+ summary="low 32 bits of layout modifier"/>
+ </event>
I think we want another event here, to group sets of modifiers
together by preference.
For example, say the surface could be directly scanned out, but only
if it uses the linear or X-tiled modifiers. Our surface-preferred
modifiers would be LINEAR + X_TILED. However, the client may not be
able to produce that combination. If the GPU still supports Y_TILED,
Combination? I thought modifiers are never combined with other
modifiers?
Post by Daniel Stone
then we want to indicate that the client _can_ use Y_TILED if it needs
to, but _should_ use LINEAR or X_TILED.
DRI3 implements this by sending sets of modifiers in 'tranches', which
tranches = {
[0 /* optimal */] = {
{ .format = XRGB8888, .modifier = LINEAR }
{ .format = XRGB8888, .modifier = X_TILED }
},
[1 /* less optimal */] = {
{ .format = XRGB8888, .modifier = Y_TILED }
}
}
I imagine the best way to do it with Wayland events would be to add a
'marker' event to indicate the border between these tranches. So we
modifier(XRGB8888, LINEAR)
modifier(XRGB8888, X_TILED)
barrier()
modifier(XRGB8888, Y_TILED)
barrier()
done()
Yeah. Another option is to send a wl_array of modifiers per format and
tranche.

I suppose it will be enough to send tranches for just the currently
used format? Otherwise it could be "a lot" of data.
Post by Daniel Stone
For a simple 'GPU composition or scanout' case, this would only be two
tranches, which are 'most optimal' and 'fallback'. For multiple GPUs
though, we could end up with three tranches: scanout-capable,
same-GPU-composition, or cross-GPU-composition. Similarly, if we take
media recording into account, we could end up with more than two
tranches.
What do you think?
At first I didn't understand this at all. I wonder if Simon is as
puzzled as I was. :-)

Is the idea of tranches such that within a tranche, a client will be able
to pick a modifier that is optimal for its rendering? This would convey
the knowledge that all modifiers within a tranche are equally good
for the compositor, so the client can pick what it can use best.

This is contrary to a flat preference list, where a client would pick
the first modifier it can use, even if it is less optimal than a later
modifier for its rendering, while for the compositor it would not make a
difference.

I'm also not sure I understand your tranche categories. Are you thinking
that, for instance, if a client uses same-GPU-composition modifiers
which exclude cross-GPU-composition that a compositor would start
copy-converting buffers if the composition no longer happens on the
same GPU, until the client adjusts to the new preference? That makes
sense, if I guessed right what you meant.

I'm wondering how the requirement "a compositor must always be able to
consume the buffer regardless of where it will be shown" is accounted
for here. Do we need a reminder about that in the spec?


Thanks,
pq
Simon Ser
2018-11-02 18:38:10 UTC
Post by Pekka Paalanen
Post by Daniel Stone
I think we want another event here, to group sets of modifiers
together by preference.
For example, say the surface could be directly scanned out, but only
if it uses the linear or X-tiled modifiers. Our surface-preferred
modifiers would be LINEAR + X_TILED. However, the client may not be
able to produce that combination. If the GPU still supports Y_TILED,
Combination? I thought modifiers are never combined with other
modifiers?
I think Daniel refers to the format + modifier combination. Yes, modifiers
cannot be mixed with each other.
Post by Pekka Paalanen
Post by Daniel Stone
then we want to indicate that the client _can_ use Y_TILED if it needs
to, but _should_ use LINEAR or X_TILED.
DRI3 implements this by sending sets of modifiers in 'tranches', which
tranches = {
[0 /* optimal */] = {
{ .format = XRGB8888, .modifier = LINEAR }
{ .format = XRGB8888, .modifier = X_TILED }
},
[1 /* less optimal */] = {
{ .format = XRGB8888, .modifier = Y_TILED }
}
}
I imagine the best way to do it with Wayland events would be to add a
'marker' event to indicate the border between these tranches. So we
modifier(XRGB8888, LINEAR)
modifier(XRGB8888, X_TILED)
barrier()
modifier(XRGB8888, Y_TILED)
barrier()
done()
Yeah. Another option is to send a wl_array of modifiers per format and
tranch.
True. Any reason why this hasn't been done in the global?
Post by Pekka Paalanen
I suppose it will be enough to send tranches for just the currently
used format? Otherwise it could be "a lot" of data.
What do you mean by "the currently used format"?

I expect clients to bind to this interface and create a surface hints object
before the surface is mapped. In this case there's no "currently used format".

It will be a fair amount of data, yes. However it's just a list of integers.
When we send strings over the protocol (e.g. toplevel title in xdg-shell) it's
about the same amount of data I guess.
Post by Pekka Paalanen
Post by Daniel Stone
For a simple 'GPU composition or scanout' case, this would only be two
tranches, which are 'most optimal' and 'fallback'. For multiple GPUs
though, we could end up with three tranches: scanout-capable,
same-GPU-composition, or cross-GPU-composition. Similarly, if we take
media recording into account, we could end up with more than two
tranches.
What do you think?
At first I didn't understand this at all. I wonder if Simon is as
puzzled as I was. :-)
Is the idea of tranches such that within a tranch, a client will be able
to pick a modifier that is optimal for its rendering? This would convey
the knowledge that all modifiers withing a tranch are equally good
for the compositor, so the client can pick what it can use the best.
This is contrary to a flat preference list, where a client would pick
the first modifier it can use, even if it is less optimal than a later
modifer for its rendering while for compositor it would not make a
difference.
Yeah, that's what I've understood too.
Post by Pekka Paalanen
I'm also not sure I understand your tranch categories. Are you thinking
that, for instance, if a client uses same-GPU-composition modifers
which exclude cross-GPU-composition that a compositor would start
copy-converting buffers if the composition no longer happens on the
same GPU, until the client adjusts to the new preference? That makes
sense, if I guessed right what you meant.
Right. I don't think we can do any better.
Post by Pekka Paalanen
I'm wondering how the requirement "a compositor must always be able to
consume the buffer regardless of where it will be shown" is accounted
for here. Do we need a reminder about that in the spec?
A reminder might be a good idea. The whole surface hints are just hints. The
client can choose to use another device or another format, and in the worst case
it'll just be more work and more copies on the compositor side.
Pekka Paalanen
2018-11-05 08:57:34 UTC
On Fri, 02 Nov 2018 18:38:10 +0000
Post by Simon Ser
Post by Pekka Paalanen
Post by Daniel Stone
I think we want another event here, to group sets of modifiers
together by preference.
For example, say the surface could be directly scanned out, but only
if it uses the linear or X-tiled modifiers. Our surface-preferred
modifiers would be LINEAR + X_TILED. However, the client may not be
able to produce that combination. If the GPU still supports Y_TILED,
Combination? I thought modifiers are never combined with other
modifiers?
I think Daniel refers to the format + modifier combination. Yes, modifiers
cannot be mixed with each other.
Post by Pekka Paalanen
Post by Daniel Stone
then we want to indicate that the client _can_ use Y_TILED if it needs
to, but _should_ use LINEAR or X_TILED.
DRI3 implements this by sending sets of modifiers in 'tranches', which
tranches = {
[0 /* optimal */] = {
{ .format = XRGB8888, .modifier = LINEAR }
{ .format = XRGB8888, .modifier = X_TILED }
},
[1 /* less optimal */] = {
{ .format = XRGB8888, .modifier = Y_TILED }
}
}
I imagine the best way to do it with Wayland events would be to add a
'marker' event to indicate the border between these tranches. So we
modifier(XRGB8888, LINEAR)
modifier(XRGB8888, X_TILED)
barrier()
modifier(XRGB8888, Y_TILED)
barrier()
done()
Yeah. Another option is to send a wl_array of modifiers per format and
tranch.
True. Any reason why this hasn't been done in the global?
For formats? Well, it is simpler without a wl_array, and there might be
a lot of formats.

Could there be a lot of modifiers per format? Would a wl_array make
anything easier? Just a thought.
Post by Simon Ser
Post by Pekka Paalanen
I suppose it will be enough to send tranches for just the currently
used format? Otherwise it could be "a lot" of data.
What do you mean by "the currently used format"?
This interface is used to send clients hints after they are already
presenting, which means they already have a format chosen and probably
want to stick with it, just changing the modifiers to be more optimal.
Post by Simon Ser
I expect clients to bind to this interface and create a surface hints object
before the surface is mapped. In this case there's no "currently used format".
Right, that's another use case.
Post by Simon Ser
It will be a fair amount of data, yes. However it's just a list of integers.
When we send strings over the protocol (e.g. toplevel title in xdg-shell) it's
about the same amount of data I guess.
If the EGLConfig or GLXFBConfig or GLX visual lists are of any
indication... yes, they account for depth, stencil, aux, etc. but then
we will have modifiers.

We already advertise the list of everything supported of format+modifier
in the linux_dmabuf extension. Could we somehow minimize the number of
recommended format+modifiers in hints? Or maybe that's not a concern
for the protocol spec?
Post by Simon Ser
Post by Pekka Paalanen
Post by Daniel Stone
For a simple 'GPU composition or scanout' case, this would only be two
tranches, which are 'most optimal' and 'fallback'. For multiple GPUs
though, we could end up with three tranches: scanout-capable,
same-GPU-composition, or cross-GPU-composition. Similarly, if we take
media recording into account, we could end up with more than two
tranches.
What do you think?
At first I didn't understand this at all. I wonder if Simon is as
puzzled as I was. :-)
Is the idea of tranches such that within a tranch, a client will be able
to pick a modifier that is optimal for its rendering? This would convey
the knowledge that all modifiers withing a tranch are equally good
for the compositor, so the client can pick what it can use the best.
This is contrary to a flat preference list, where a client would pick
the first modifier it can use, even if it is less optimal than a later
modifer for its rendering while for compositor it would not make a
difference.
Yeah, that's what I've understood too.
Post by Pekka Paalanen
I'm also not sure I understand your tranch categories. Are you thinking
that, for instance, if a client uses same-GPU-composition modifers
which exclude cross-GPU-composition that a compositor would start
copy-converting buffers if the composition no longer happens on the
same GPU, until the client adjusts to the new preference? That makes
sense, if I guessed right what you meant.
Right. I don't think we can do any better.
Post by Pekka Paalanen
I'm wondering how the requirement "a compositor must always be able to
consume the buffer regardless of where it will be shown" is accounted
for here. Do we need a reminder about that in the spec?
A reminder might be a good idea. The whole surface hints are just hints. The
client can choose to use another device or another format, and in the worst case
it'll just be more work and more copies on the compositor side.
Yeah. What I precisely mean is that even if a client chooses a
recommended format+modifier, the compositor will not be exempt from the
requirement that it must work always. I.e. a compositor cannot
advertise a format+modifier that would work only for scanout but not
for fallback composition, even if the surface is on scanout right now.


Thanks,
pq
Simon Ser
2018-11-10 13:34:31 UTC
Post by Pekka Paalanen
Post by Simon Ser
Post by Pekka Paalanen
Yeah. Another option is to send a wl_array of modifiers per format and
tranch.
True. Any reason why this hasn't been done in the global?
For formats? Well, it is simpler without a wl_array, and there might be
a lot of formats.
Could there be a lot of modifiers per format? Would a wl_array make
anything easier? Just a thought.
It's true that for this list of formats sorted by preference, we'll probably
need to split modifiers anyway so I don't think we'd benefit a lot from this
approach.
Post by Pekka Paalanen
Post by Simon Ser
Post by Pekka Paalanen
I suppose it will be enough to send tranches for just the currently
used format? Otherwise it could be "a lot" of data.
What do you mean by "the currently used format"?
This interface is used to send clients hints after they are already
presenting, which means they already have a format chosen and probably
want to stick with it, just changing the modifiers to be more optimal.
If we only send the modifiers for the current format, how do clients tell the
difference between the initial hints (which don't have a "currently used
format") and the subsequent hints?
Post by Pekka Paalanen
Post by Simon Ser
I expect clients to bind to this interface and create a surface hints object
before the surface is mapped. In this case there's no "currently used format".
Right, that's another use case.
Post by Simon Ser
It will be a fair amount of data, yes. However it's just a list of integers.
When we send strings over the protocol (e.g. toplevel title in xdg-shell) it's
about the same amount of data I guess.
If the EGLConfig or GLXFBConfig or GLX visual lists are of any
indication... yes, they account for depth, stencil, aux, etc. but then
we will have modifiers.
We already advertise the list of everything supported of format+modifer
in the linux_dmabuf extension. Could we somehow minimize the number of
recommended format+modifiers in hints? Or maybe that's not a concern
for the protocol spec?
I'm not sure.

After this patch, I'm not even sure how the formats+modifiers advertised by the
global work. Are these formats+modifiers supported on the GPU the compositor
uses for rendering? Intersection or union of formats+modifiers supported on all
GPUs?
Post by Pekka Paalanen
Post by Simon Ser
Post by Pekka Paalanen
Post by Daniel Stone
For a simple 'GPU composition or scanout' case, this would only be two
tranches, which are 'most optimal' and 'fallback'. For multiple GPUs
though, we could end up with three tranches: scanout-capable,
same-GPU-composition, or cross-GPU-composition. Similarly, if we take
media recording into account, we could end up with more than two
tranches.
What do you think?
At first I didn't understand this at all. I wonder if Simon is as
puzzled as I was. :-)
Is the idea of tranches such that within a tranch, a client will be able
to pick a modifier that is optimal for its rendering? This would convey
the knowledge that all modifiers withing a tranch are equally good
for the compositor, so the client can pick what it can use the best.
This is contrary to a flat preference list, where a client would pick
the first modifier it can use, even if it is less optimal than a later
modifer for its rendering while for compositor it would not make a
difference.
Yeah, that's what I've understood too.
Post by Pekka Paalanen
I'm also not sure I understand your tranch categories. Are you thinking
that, for instance, if a client uses same-GPU-composition modifers
which exclude cross-GPU-composition that a compositor would start
copy-converting buffers if the composition no longer happens on the
same GPU, until the client adjusts to the new preference? That makes
sense, if I guessed right what you meant.
Right. I don't think we can do any better.
Post by Pekka Paalanen
I'm wondering how the requirement "a compositor must always be able to
consume the buffer regardless of where it will be shown" is accounted
for here. Do we need a reminder about that in the spec?
A reminder might be a good idea. The whole surface hints are just hints. The
client can choose to use another device or another format, and in the worst case
it'll just be more work and more copies on the compositor side.
Yeah. What I precisely mean is that even if a client chooses a
recommended format+modifier, the compositor will not be exempt from the
requirement that it must work always. I.e. a compositor cannot
advertise a format+modifier that would work only for scanout but not
for fallback composition, even if the surface is on scanout right now.
Yeah, this makes sense.
Pekka Paalanen
2018-11-12 09:18:16 UTC
On Sat, 10 Nov 2018 13:34:31 +0000
Post by Simon Ser
Post by Pekka Paalanen
Post by Simon Ser
Post by Pekka Paalanen
Yeah. Another option is to send a wl_array of modifiers per format and
tranch.
True. Any reason why this hasn't been done in the global?
For formats? Well, it is simpler without a wl_array, and there might be
a lot of formats.
Could there be a lot of modifiers per format? Would a wl_array make
anything easier? Just a thought.
It's true that for this list of formats sorted by preference, we'll probably
need to split modifiers anyway so I don't think we'd benefit a lot from this
approach.
Hi Simon,

just to be clear, I was thinking of something like:

event(uint format, wl_array(modifiers))

But I definitely do not insist on it if you don't see any obvious
benefits with it.

It seems you and I made very different assumptions on how the hints
would be sent, I only realized it just now. More about that below.
Post by Simon Ser
Post by Pekka Paalanen
Post by Simon Ser
Post by Pekka Paalanen
I suppose it will be enough to send tranches for just the currently
used format? Otherwise it could be "a lot" of data.
What do you mean by "the currently used format"?
This interface is used to send clients hints after they are already
presenting, which means they already have a format chosen and probably
want to stick with it, just changing the modifiers to be more optimal.
If we only send the modifiers for the current format, how do clients tell the
difference between the initial hints (which don't have a "currently used
format") and the subsequent hints?
I'm not sure I understand why they would need to see the difference.
But yes, I was short-sighted here and didn't consider the
initialization when a surface is not mapped yet. I didn't expect that
hints can be calculated if the surface is not mapped, but of course a
compositor can provide some defaults. I suppose the initial default
hints would boil down to what is most efficient to composite.
Post by Simon Ser
Post by Pekka Paalanen
Post by Simon Ser
I expect clients to bind to this interface and create a surface hints object
before the surface is mapped. In this case there's no "currently used format".
Right, that's another use case.
Post by Simon Ser
It will be a fair amount of data, yes. However it's just a list of integers.
When we send strings over the protocol (e.g. toplevel title in xdg-shell) it's
about the same amount of data I guess.
If the EGLConfig or GLXFBConfig or GLX visual lists are of any
indication... yes, they account for depth, stencil, aux, etc. but then
we will have modifiers.
We already advertise the list of everything supported of format+modifer
in the linux_dmabuf extension. Could we somehow minimize the number of
recommended format+modifiers in hints? Or maybe that's not a concern
for the protocol spec?
I'm not sure.
After this patch, I'm not even sure how the formats+modifiers advertised by the
global work. Are these formats+modifiers supported on the GPU the compositor
uses for rendering? Intersection or union of formats+modifiers supported on all
GPUs?
The format+modifier pairs advertised by the global before this patch are the
ones that can work at all, or the compositor is willing to make them
work at least in the worst fallback case. This patch must not change
that meaning. These formats also must always work regardless of which
GPU a client decides to use, but that is already implied by the
compositor being able to import a dmabuf. The compositor does not need
to try to factor in what other GPUs on the system might be able to
render or not, that is for the client to figure out when it knows the
formats the compositor can accept and is choosing a GPU to render with.
It is theoretically possible that a client tries to use a GPU that
cannot render any formats the compositor can use, but that is the
client's responsibility to figure out.

So clearly the formats from the global can be used by a client at any
time. The hint formats OTOH have no reason to list absolutely
everything the compositor supports, but a compositor can choose on its
own judgement to send only a sub-set it would prefer.

However, after a client has picked a format and used it, then there
should be hints with that format, at least if they can make any
difference.

I'm not sure. Not listing everything always was my intuitive
assumption, and I believe you perhaps assumed the opposite so that a
client has absolutely all the information to e.g. optimize the modifier
of a format that the compositor would not prefer at all even though it
does work.

It would be simpler to always send everything, but that will be much
more protocol traffic. Would it be too much? I don't know, could you
calculate some examples of how many bytes a typical hints update would
be if sending everything always?


Thanks,
pq
Simon Ser
2018-11-12 12:16:04 UTC
Post by Pekka Paalanen
On Sat, 10 Nov 2018 13:34:31 +0000
Post by Simon Ser
Post by Pekka Paalanen
Post by Simon Ser
Post by Pekka Paalanen
Yeah. Another option is to send a wl_array of modifiers per format and
tranch.
True. Any reason why this hasn't been done in the global?
For formats? Well, it is simpler without a wl_array, and there might be
a lot of formats.
Could there be a lot of modifiers per format? Would a wl_array make
anything easier? Just a thought.
It's true that for this list of formats sorted by preference, we'll probably
need to split modifiers anyway so I don't think we'd benefit a lot from this
approach.
Hi Simon,
event(uint format, wl_array(modifiers))
But I definitely do not insist on it if you don't see any obvious
benefits with it.
Yeah. I think the benefits would not be substantial as we need to "split" these
to order them by preference. It would look like this:

event(format1, wl_array(modifiers))
barrier()
event(format1, wl_array(modifiers))
event(format2, wl_array(modifiers))
barrier()
event(format1, wl_array(modifiers))
barrier()

Also this is not consistent with the rest of the protocol. Maybe we can discuss
this again for linux-dmabuf-unstable-v2.
Post by Pekka Paalanen
It seems you and I made very different assumptions on how the hints
would be sent, I only realized it just now. More about that below.
Post by Simon Ser
Post by Pekka Paalanen
Post by Simon Ser
Post by Pekka Paalanen
I suppose it will be enough to send tranches for just the currently
used format? Otherwise it could be "a lot" of data.
What do you mean by "the currently used format"?
This interface is used to send clients hints after they are already
presenting, which means they already have a format chosen and probably
want to stick with it, just changing the modifiers to be more optimal.
If we only send the modifiers for the current format, how do clients tell the
difference between the initial hints (which don't have a "currently used
format") and the subsequent hints?
I'm not sure I understand why they would need to see the difference.
But yes, I was short-sighted here and didn't consider the
initialization when a surface is not mapped yet. I didn't expect that
hints can be calculated if the surface is not mapped, but of course a
compositor can provide some defaults. I suppose the initial default
hints would boil down to what is most efficient to composite.
Post by Simon Ser
Post by Pekka Paalanen
Post by Simon Ser
I expect clients to bind to this interface and create a surface hints object
before the surface is mapped. In this case there's no "currently used format".
Right, that's another use case.
Post by Simon Ser
It will be a fair amount of data, yes. However it's just a list of integers.
When we send strings over the protocol (e.g. toplevel title in xdg-shell) it's
about the same amount of data I guess.
If the EGLConfig or GLXFBConfig or GLX visual lists are of any
indication... yes, they account for depth, stencil, aux, etc. but then
we will have modifiers.
We already advertise the list of everything supported of format+modifer
in the linux_dmabuf extension. Could we somehow minimize the number of
recommended format+modifiers in hints? Or maybe that's not a concern
for the protocol spec?
I'm not sure.
After this patch, I'm not even sure how the formats+modifiers advertised by the
global work. Are these formats+modifiers supported on the GPU the compositor
uses for rendering? Intersection or union of formats+modifiers supported on all
GPUs?
The format+modifier advertised by the global before this patch are the
ones that can work at all, or the compositor is willing to make them
work at least in the worst fallback case. This patch must not change
that meaning. These formats also must always work regardless of which
GPU a client decides to use, but that is already implied by the
compositor being able to import a dmabuf. The compositor does not need
to try to factor in what other GPUs on the system might be able to
render or not, that is for the client to figure out when it knows the
formats the compositor can accept and is choosing a GPU to render with.
It is theoretically possible that a client tries to use a GPU that
cannot render any formats the compositor can use, but that is the
client's responsibility to figure out.
Okay, that makes sense. And if a GPU doesn't support direct scan-out for some
format+modifier, then it can always fall back to good ol' compositing.
Post by Pekka Paalanen
So clearly the formats from the global can be used by a client at any
time. The hint formats OTOH has no reason to list absolutely
everything the compositor supports, but a compositor can choose on its
own judgement to send only a sub-set it would prefer.
Yes, this makes sense.
Post by Pekka Paalanen
However, after a client has picked a format and used it, then there
should be hints with that format, at least if they can make any
difference.
Okay, I get it now. :)
Post by Pekka Paalanen
I'm not sure. Not listing everything always was my intuitive
assumption, and I believe you perhaps assumed the opposite so that a
client has absolutely all the information to e.g. optimize the modifier
of a format that the compositor would not prefer at all even though it
does work.
It would be simpler to always send everything, but that will be much
more protocol traffic. Would it be too much? I don't know, could you
calculate some examples of how many bytes a typical hints update would
be if sending everything always?
If I'm understanding the protocol marshalling right, each event would have:

* A 2*32-bit header for sender, size and opcode
* One 32-bit field for the format
* A 2*32-bit field for the modifier

Total 5*32 = 160 bits, ie. 20 bytes per event.

My current setup lists ~100 format+modifier combinations. So that means each
time a client binds to wp_linux_dmabuf, ~2KB of data is sent. That's a lot, your
concerns are correct.

(In a future version of the protocol, maybe we could use shared memory, just
like the wl_keyboard keymap.)

In the meantime, we could decide to do as you suggest. So the compositor would
always advertise a subset of the supported modifiers. When the hints object is
created, the compositor would send its preferred format+modifier pairs. When the
client submits a buffer with a new format, the compositor can decide to send the
preferred modifiers for this format. I wonder how we should phrase this in the
protocol (can/should/must?). Thoughts?

In the case where a client is scanned out and the GPU used for scan-out
supports more formats than the GPU used for compositing, as discussed before the
compositor won't be able to advertise these additional formats, because the
client could keep using them when not scanned out anymore.
Pekka Paalanen
2018-11-12 12:48:19 UTC
On Mon, 12 Nov 2018 12:16:04 +0000
Post by Simon Ser
Post by Pekka Paalanen
On Sat, 10 Nov 2018 13:34:31 +0000
Post by Simon Ser
Post by Pekka Paalanen
Post by Simon Ser
Post by Pekka Paalanen
Yeah. Another option is to send a wl_array of modifiers per format and
tranch.
True. Any reason why this hasn't been done in the global?
For formats? Well, it is simpler without a wl_array, and there might be
a lot of formats.
Could there be a lot of modifiers per format? Would a wl_array make
anything easier? Just a thought.
It's true that for this list of formats sorted by preference, we'll probably
need to split modifiers anyway so I don't think we'd benefit a lot from this
approach.
Hi Simon,
event(uint format, wl_array(modifiers))
But I definitely do not insist on it if you don't see any obvious
benefits with it.
Yeah. I think the benefits would not be substantial as we need to "split" these
event(format1, wl_array(modifiers))
barrier()
event(format1, wl_array(modifiers))
event(format2, wl_array(modifiers))
barrier()
event(format1, wl_array(modifiers))
barrier()
Also this is not consistent with the rest of the protocol. Maybe we can discuss
this again for linux-dmabuf-unstable-v2.
Post by Pekka Paalanen
It seems you and I made very different assumptions on how the hints
would be sent, I only realized it just now. More about that below.
Post by Simon Ser
Post by Pekka Paalanen
Post by Simon Ser
Post by Pekka Paalanen
I suppose it will be enough to send tranches for just the currently
used format? Otherwise it could be "a lot" of data.
What do you mean by "the currently used format"?
This interface is used to send clients hints after they are already
presenting, which means they already have a format chosen and probably
want to stick with it, just changing the modifiers to be more optimal.
If we only send the modifiers for the current format, how do clients tell the
difference between the initial hints (which don't have a "currently used
format") and the subsequent hints?
I'm not sure I understand why they would need to see the difference.
But yes, I was short-sighted here and didn't consider the
initialization when a surface is not mapped yet. I didn't expect that
hints can be calculated if the surface is not mapped, but of course a
compositor can provide some defaults. I suppose the initial default
hints would boil down to what is most efficient to composite.
Post by Simon Ser
Post by Pekka Paalanen
Post by Simon Ser
I expect clients to bind to this interface and create a surface hints object
before the surface is mapped. In this case there's no "currently used format".
Right, that's another use case.
Post by Simon Ser
It will be a fair amount of data, yes. However it's just a list of integers.
When we send strings over the protocol (e.g. toplevel title in xdg-shell) it's
about the same amount of data I guess.
If the EGLConfig or GLXFBConfig or GLX visual lists are of any
indication... yes, they account for depth, stencil, aux, etc. but then
we will have modifiers.
We already advertise the list of everything supported of format+modifer
in the linux_dmabuf extension. Could we somehow minimize the number of
recommended format+modifiers in hints? Or maybe that's not a concern
for the protocol spec?
I'm not sure.
After this patch, I'm not even sure how the formats+modifiers advertised by the
global work. Are these formats+modifiers supported on the GPU the compositor
uses for rendering? Intersection or union of formats+modifiers supported on all
GPUs?
The format+modifier advertised by the global before this patch are the
ones that can work at all, or the compositor is willing to make them
work at least in the worst fallback case. This patch must not change
that meaning. These formats also must always work regardless of which
GPU a client decides to use, but that is already implied by the
compositor being able to import a dmabuf. The compositor does not need
to try to factor in what other GPUs on the system might be able to
render or not, that is for the client to figure out when it knows the
formats the compositor can accept and is choosing a GPU to render with.
It is theoretically possible that a client tries to use a GPU that
cannot render any formats the compositor can use, but that is the
client's responsibility to figure out.
Okay, that makes sense. And if a GPU doesn't support direct scan-out for some
format+modifier, then it can always fallback to good ol' compositing.
Post by Pekka Paalanen
So clearly the formats from the global can be used by a client at any
time. The hint formats OTOH has no reason to list absolutely
everything the compositor supports, but a compositor can choose on its
own judgement to send only a sub-set it would prefer.
Yes, this makes sense.
Post by Pekka Paalanen
However, after a client has picked a format and used it, then there
should be hints with that format, at least if they can make any
difference.
Okay, I get it now. :)
Post by Pekka Paalanen
I'm not sure. Not listing everything always was my intuitive
assumption, and I believe you perhaps assumed the opposite so that a
client has absolutely all the information to e.g. optimize the modifier
of a format that the compositor would not prefer at all even though it
does work.
It would be simpler to always send everything, but that will be much
more protocol traffic. Would it be too much? I don't know, could you
calculate some examples of how many bytes a typical hints update would
be if sending everything always?
* A 2*32-bit header for sender, size and opcode
* One 32-bit field for the format
* A 2*32-bit field for the modifier
Total 5*32 = 160 bits, ie. 20 bytes per event.
My current setup lists ~100 format+modifier combinations. So that means each
time a client binds to wp_linux_dmabuf, ~2KB of data is sent. That's a lot, your
concerns are correct.
(In a future version of the protocol, maybe we could use shared memory, just
like the wl_keyboard keymap.)
In the meantime, we could decide to do as you suggest. So the compositor would
always advertise a subset of the supported modifiers. When the hints object is
created, the compositor would send its preferred format+modifier pairs. When the
client submits a buffer with a new format, the compositor can decide to send the
preferred modifiers for this format. I wonder how we should phrase this in the
protocol (can/should/must?). Thoughts?
Yeah. It can be left to the compositor implementation to decide which
format+modifiers it suggests, while clients are free to pick any
supported format+modifier they want.

Quite likely we need to revisit this in any case. Using shared memory
feels complicated, but OTOH it *is* a relatively large amount of data. Even the
kernel UABI does not use a flat list of format+modifier but a fairly
"interesting" bitfield encoding. That's probably not appropriate for
Wayland though, so maybe we have to use shared memory for it.

I wonder if there could be a yet another option...
Post by Simon Ser
In the case where a client is scanned out and the GPU used for scan-out
supports more formats than the GPU used for compositing, as discussed before the
compositor won't be able to advertise these additional formats, because the
client could keep using them when not scanned out anymore.
Yes. It is actually unavoidable, because the compositor first makes the
decision to stop scanning out while it is already compositing, then
sends the new set of hints, and only afterwards can the client use the
new hints.


Thanks,
pq
Pekka Paalanen
2018-11-12 13:54:07 UTC
On Mon, 12 Nov 2018 14:48:19 +0200
Post by Pekka Paalanen
Quite likely we need to revisit this in any case. Using shared memory
feels complicated, but OTOH it *is* a relatively large amount of data. Even the
kernel UABI does not use a flat list of format+modifier but a fairly
"interesting" bitfield encoding. That's probably not appropriate for
Wayland though, so maybe we have to use shared memory for it.
Hi,

having thought about this, I have the feeling that Wayland handles tiny
bits of data well as protocol messages and large chunks of data as
shared memory file descriptors, but it seems we lack a good solution
for intermediate-sized data in the range of 1 kB - 8 kB, just to
throw some random numbers out there.

It is too risky to put these through the protocol messages inline, but
the trouble of setting up a shared memory file seems disproportionate
to the amount of data. Yet, setting up a shared memory file seems to be
the only solution, since sending the data inline is too risky.

I started wondering if we should have a generic shared memory
interface, something like the following sketch of a Wayland extension.

interface shm_factory
Is the global.

- request: create_shm_file(new shm_file, fd, size, seals, direction)
Creates a new shm_file object that refers to the memory backing
the fd, of the given size, and being sealed with the mentioned
seals. Direction means whether the server or the client will be
the writer, so this will be a one-way street but a re-usable
one.

(This is a good chance to get memfd and seals properly used.)

interface shm_file
Represents a piece of shared memory. Comes in two mutually
exclusive flavours:
- server-writable
- client-writable
Has a fixed size.

The usage pattern is that the writer signals the reader when it
needs to copy the data out. This is done by a custom protocol
message carrying a shm_file as an argument, which makes the
shm_file read-locked. The reader copies the data out of the
shared memory and sends client_read_done or server_read_done
ASAP, releasing the read-lock. While the shm_file is
read-locked, the writer may not write into it. While the
shm_file is not read-locked, the reader may not read it.

- request: client_read_done
Sent by the client when it has copied the data out. Releases
the read-lock.

- event: server_read_done
Sent by the server when it has copied the data out. Releases
the read-lock.


When e.g. zwp_linux_dmabuf provides the list of pixel formats and
modifiers, the server needs to first send the required shared memory
size to the client; the client then creates a server-writable shm_file and
sends it to the server. The server fills in the data and sends an event
with the shm_file as an argument that tells the client to read it (sets
the read-lock). The rest goes according to the generic protocol
above.

Why all the roundtripping to get the shm_file created?

Because I would prefer the memory allocation is always
accounted to the client, not the server. We should try to keep
server allocations on behalf of clients to a minimum so that
OOM killer etc. can find the right culprit.

Why so much copying?

Because the amount of data should be small enough that copying
it is insignificant. By assuming that readers maintain their
own copy, the protocol is simpler. No need to juggle multiple
shm_files like we do with wl_buffers.

Why unidirectional?

To keep it simple. Need bidirectional transfers? Make one
shm_file for each direction.

Isn't creating and tearing down shared memory relatively expensive?

Yes, but shm_file is meant to be repeatedly re-used. After
reader has read, the writer can write again. No need to tear it
down, if you expect repeated transfers.
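
As a rough sketch of the memfd/seals part, the client-side allocation could
look like this (the seal set is only a suggestion; F_SEAL_WRITE is left out
on purpose because the server is the writer for a server-writable shm_file):

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

static int create_sealed_shm(size_t size)
{
    /* size would be the value announced by the server */
    int fd = memfd_create("dmabuf-hints", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd < 0)
        return -1;

    if (ftruncate(fd, size) < 0 ||
        fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_SEAL) < 0) {
        close(fd);
        return -1;
    }

    /* the fd then gets passed in create_shm_file() together with the size,
     * the seals and the direction */
    return fd;
}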


While writing this, I have a strong feeling I am reinventing the wheel
here...

Just throwing this idea out there, not sure if it was a good one.


Thanks,
pq
Simon Ser
2018-11-02 09:00:09 UTC
Hi,

Thanks for your review!
Post by Daniel Stone
Post by Simon Ser
This commit introduces a new wp_linux_dmabuf_device_hints object. This object
advertizes a preferred device via a file descriptor and a set of preferred
formats/modifiers.
s/advertizes/advertises/g (including in the XML doc)
Ah, it seems that for once British English and American English agree on the
spelling. Noted!
Post by Daniel Stone
I also think this would be better called
wp_linux_dmabuf_surface_hints, since the change over the dmabuf
protocol is that it's surface-specific.
Right. The intent was to be able to re-use this object for hints not bound to
surfaces in the future. But better not to try to think of all possible
extensions (which will probably have different requirements).

Updated to use wp_linux_dmabuf_surface_hints.
Post by Daniel Stone
Post by Simon Ser
+ <event name="primary_device">
+ <description summary="preferred primary device">
+ This event advertizes the primary device that the server prefers. There
+ is exactly one primary device.
+ </description>
+ <arg name="fd" type="fd" summary="device file descriptor"/>
+ </event>
I _think_ this might want to refer to separate objects.
When we receive an FD from the server, we don't know what device it
refers to, so we have to open the device to probe it. Opening the
device can be slow: if a device is in a low PCI power state, it can be
a couple of seconds to physically power up the device and then wait
for it to initialise before we can interrogate it.
One way around this would be to have a separate wp_linux_dmabuf_device
object, lazily sent as a new object in an event by the root
wp_linux_dmabuf object, with the per-surface hints then referring to a
previously-sent device. This would allow clients to only probe each
device once per EGLDisplay, rather than once per EGLSurface.
I see. One other way to fix this issue would be to keep the protocol as-is but
to make the client use stat(3p) to check if it doesn't already know about the
fd it received. From stat(3p) [1]:

    The st_ino and st_dev fields taken together uniquely identify the file within
    the system.

This would remove the overhead and complexity of server-allocated objects, which
are hard to tear down.
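
A minimal sketch of what that check could look like (the helper name is made
up, and it of course only matches when both fds refer to the same device
node):

#include <stdbool.h>
#include <sys/stat.h>

static bool refers_to_known_device(int hint_fd, int known_fd)
{
    struct stat a, b;

    if (fstat(hint_fd, &a) != 0 || fstat(known_fd, &b) != 0)
        return false;

    /* st_dev + st_ino together uniquely identify the device node */
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}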

But I'm maybe missing some use-cases here?

[1]: http://pubs.opengroup.org/onlinepubs/009696699/basedefs/sys/stat.h.html
Post by Daniel Stone
Post by Simon Ser
+ <event name="modifier">
+ <description summary="preferred buffer format modifier">
+ This event advertises the formats that the server prefers, along with
+ the modifiers preferred for each format.
+
+ For the definition of the format and modifier codes, see the
+ wp_linux_buffer_params::create request.
+ </description>
+ <arg name="format" type="uint" summary="DRM_FORMAT code"/>
+ <arg name="modifier_hi" type="uint"
+ summary="high 32 bits of layout modifier"/>
+ <arg name="modifier_lo" type="uint"
+ summary="low 32 bits of layout modifier"/>
+ </event>
I think we want another event here, to group sets of modifiers
together by preference.
For example, say the surface could be directly scanned out, but only
if it uses the linear or X-tiled modifiers. Our surface-preferred
modifiers would be LINEAR + X_TILED. However, the client may not be
able to produce that combination. If the GPU still supports Y_TILED,
then we want to indicate that the client _can_ use Y_TILED if it needs
to, but _should_ use LINEAR or X_TILED.
DRI3 implements this by sending sets of modifiers in 'tranches', which
tranches = {
[0 /* optimal */] = {
{ .format = XRGB8888, .modifier = LINEAR }
{ .format = XRGB8888, .modifier = X_TILED }
},
[1 /* less optimal */] = {
{ .format = XRGB8888, .modifier = Y_TILED }
}
}
I imagine the best way to do it with Wayland events would be to add a
'marker' event to indicate the border between these tranches. So we
modifier(XRGB8888, LINEAR)
modifier(XRGB8888, X_TILED)
barrier()
modifier(XRGB8888, Y_TILED)
barrier()
done()
For a simple 'GPU composition or scanout' case, this would only be two
tranches, which are 'most optimal' and 'fallback'. For multiple GPUs
though, we could end up with three tranches: scanout-capable,
same-GPU-composition, or cross-GPU-composition. Similarly, if we take
media recording into account, we could end up with more than two
tranches.
What do you think?
This seems like a good idea. Other solutions include having an enum for tranches
(preferred, fallback, etc) but that restricts the number of tranches. Using
tranche indexes makes the protocol more complicated. So your idea LGTM.

I'll also change the wording from "preferred" to "supported in order of
preference".

I have another question: what if the compositor doesn't know about the preferred
device? For instance if it's running nested in another Wayland compositor that
doesn't support this new protocol version. Maybe we should make all events
optional to let the compositor say "I have no idea"?
Philipp Zabel
2018-11-02 11:30:46 UTC
Post by Daniel Stone
Hi Simon,
Thanks a lot for taking this on! :)
Post by Simon Ser
This commit introduces a new wp_linux_dmabuf_device_hints object. This object
advertizes a preferred device via a file descriptor and a set of preferred
formats/modifiers.
s/advertizes/advertises/g (including in the XML doc)
I also think this would be better called
wp_linux_dmabuf_surface_hints, since the change over the dmabuf
protocol is that it's surface-specific.
Post by Simon Ser
+ <interface name="zwp_linux_dmabuf_device_hints_v1" version="4">
+ <description summary="dmabuf device hints">
+ This object advertizes dmabuf hints for a surface. Such hints include the
*advertises
Post by Simon Ser
+ <event name="primary_device">
+ <description summary="preferred primary device">
+ This event advertizes the primary device that the server prefers. There
+ is exactly one primary device.
Which device should this be if the scanout engine is separate from the
render engine (e.g. IPU/imx-drm and GPU/etnaviv on i.MX6)

[...]
Post by Daniel Stone
Post by Simon Ser
+ <event name="modifier">
+ <description summary="preferred buffer format modifier">
+ This event advertises the formats that the server prefers, along with
+ the modifiers preferred for each format.
+
+ For the definition of the format and modifier codes, see the
+ wp_linux_buffer_params::create request.
+ </description>
+ <arg name="format" type="uint" summary="DRM_FORMAT code"/>
+ <arg name="modifier_hi" type="uint"
+ summary="high 32 bits of layout modifier"/>
+ <arg name="modifier_lo" type="uint"
+ summary="low 32 bits of layout modifier"/>
+ </event>
I think we want another event here, to group sets of modifiers
together by preference.
For example, say the surface could be directly scanned out, but only
if it uses the linear or X-tiled modifiers. Our surface-preferred
modifiers would be LINEAR + X_TILED. However, the client may not be
able to produce that combination. If the GPU still supports Y_TILED,
then we want to indicate that the client _can_ use Y_TILED if it needs
to, but _should_ use LINEAR or X_TILED.
DRI3 implements this by sending sets of modifiers in 'tranches', which
group format+modifier pairs by order of preference, e.g.:
tranches = {
[0 /* optimal */] = {
{ .format = XRGB8888, .modifier = LINEAR }
{ .format = XRGB8888, .modifier = X_TILED }
},
[1 /* less optimal */] = {
{ .format = XRGB8888, .modifier = Y_TILED }
}
}
I imagine the best way to do it with Wayland events would be to add a
'marker' event to indicate the border between these tranches. So we
would send something like:
modifier(XRGB8888, LINEAR)
modifier(XRGB8888, X_TILED)
barrier()
modifier(XRGB8888, Y_TILED)
barrier()
done()
For a simple 'GPU composition or scanout' case, this would only be two
tranches, which are 'most optimal' and 'fallback'. For multiple GPUs
though, we could end up with three tranches: scanout-capable,
same-GPU-composition, or cross-GPU-composition. Similarly, if we take
media recording into account, we could end up with more than two
tranches.
What do you think?
What about contiguous vs non-contiguous memory?

On i.MX6QP (Vivante GC3000) we would probably want the client to always
render DRM_FORMAT_MOD_VIVANTE_SUPER_TILED, because this can be directly
read by both texture samplers (non-contiguous) and scanout (must be
contiguous).

On i.MX6Q (Vivante GC2000) we always want to use the most efficient
DRM_FORMAT_MOD_VIVANTE_SPLIT_SUPER_TILED, because neither of the
supported render formats can be sampled or scanned out directly.
Since the compositor has to resolve into DRM_FORMAT_MOD_VIVANTE_TILED
(non-contiguous) for texture sampling or DRM_FORMAT_MOD_LINEAR
(contiguous) for scanout, the client buffers can always be non-
contiguous.

On i.MX6S (Vivante GC880) the optimal render format for texture sampling
would be DRM_FORMAT_MOD_VIVANTE_TILED (non-contiguous) and for scanout
DRM_FORMAT_MOD_VIVANTE_SUPER_TILED (non-contiguous) which would be
resolved into DRM_FORMAT_MOD_LINEAR (contiguous) by the compositor.

All three could always handle DRM_FORMAT_MOD_LINEAR (contiguous) client
buffers for scanout directly, but those would be suboptimal if the
compositor decides to render on short notice, because the client would
have already resolved into linear and then the compositor would have to
resolve back into a texture sampler tiling format.

regards
Philipp
Simon Ser
2018-11-02 18:49:35 UTC
Permalink
Post by Philipp Zabel
Post by Simon Ser
+ <event name="primary_device">
+ <description summary="preferred primary device">
+ This event advertizes the primary device that the server prefers. There
+ is exactly one primary device.
Which device should this be if the scanout engine is separate from the
render engine (e.g. IPU/imx-drm and GPU/etnaviv on i.MX6)
When the surface hints are created, I expect the compositor to send the device
it uses for compositing as the primary device (assuming it's using only one
device).

When the surface becomes fullscreen on a different GPU (meaning it becomes
fullscreen on an output which is managed by another GPU), I'd expect the
compositor to change the primary device for this surface to this other GPU.

If the compositor uses multiple devices for compositing, it'll probably switch
the primary device when the surface is moved from one GPU to the other.

I'm not sure how i.MX6 works, but: even if the same GPU is used for compositing
and scanout, when the compositing preferred formats are different from the
scanout preferred formats, the compositor can update the preferred format
without changing the preferred device.

Is there an issue with this? Maybe something should be added to the protocol to
explain it better?
Post by Philipp Zabel
What about contiguous vs non-contiguous memory?
On i.MX6QP (Vivante GC3000) we would probably want the client to always
render DRM_FORMAT_MOD_VIVANTE_SUPER_TILED, because this can be directly
read by both texture samplers (non-contiguous) and scanout (must be
contiguous).
On i.MX6Q (Vivante GC2000) we always want to use the most efficient
DRM_FORMAT_MOD_VIVANTE_SPLIT_SUPER_TILED, because neither of the
supported render formats can be sampled or scanned out directly.
Since the compositor has to resolve into DRM_FORMAT_MOD_VIVANTE_TILED
(non-contiguous) for texture sampling or DRM_FORMAT_MOD_LINEAR
(contiguous) for scanout, the client buffers can always be non-
contiguous.
On i.MX6S (Vivante GC880) the optimal render format for texture sampling
would be DRM_FORMAT_MOD_VIVANTE_TILED (non-contiguous) and for scanout
DRM_FORMAT_MOD_VIVANTE_SUPER_TILED (non-contiguous) which would be
resolved into DRM_FORMAT_MOD_LINEAR (contiguous) by the compositor.
I think all of this works with Daniel's design.
Post by Philipp Zabel
All three could always handle DRM_FORMAT_MOD_LINEAR (contiguous) client
buffers for scanout directly, but those would be suboptimal if the
compositor decides to render on short notice, because the client would
have already resolved into linear and then the compositor would have to
resolve back into a texture sampler tiling format.
Is the concern here that switching between scanout and compositing is
non-optimal until the client chooses the preferred format?
Philipp Zabel
2018-11-12 17:43:38 UTC
Permalink
Hi Simon,
Post by Simon Ser
Post by Philipp Zabel
Post by Simon Ser
+ <event name="primary_device">
+ <description summary="preferred primary device">
+ This event advertizes the primary device that the server prefers. There
+ is exactly one primary device.
Which device should this be if the scanout engine is separate from the
render engine (e.g. IPU/imx-drm and GPU/etnaviv on i.MX6)
When the surface hints are created, I expect the compositor to send the device
it uses for compositing as the primary device (assuming it's using only one
device).
i.MX6 has a separate scanout device without any acceleration capabilities
except some hardware overlay planes, and a pure GPU render device without
any connection to the outside world. The compositor uses both devices for
compositing and output.
Post by Simon Ser
Post by Philipp Zabel
When the surface becomes fullscreen on a different GPU (meaning it becomes
fullscreen on an output which is managed by another GPU), I'd expect the
compositor to change the primary device for this surface to this other GPU.
If the compositor uses multiple devices for compositing, it'll probably switch
the primary device when the surface is moved from one GPU to the other.
I'm not sure how i.MX6 works, but: even if the same GPU is used for compositing
and scanout, but the compositing preferred formats are different from the
scanout preferred formats, the compositor can update the preferred format
without changing the preferred device.
Is there an issue with this? Maybe something should be added to the protocol to
explain it better?
It is not clear to me from the protocol description whether the primary
device means the scanout engine or the GPU, in case they are different.

What is the client process supposed to do with this fd? Is it expected
to be able to render on this device? Or use it to allocate the optimal
buffers?
Post by Simon Ser
Post by Philipp Zabel
What about contiguous vs non-contiguous memory?
On i.MX6QP (Vivante GC3000) we would probably want the client to always
render DRM_FORMAT_MOD_VIVANTE_SUPER_TILED, because this can be directly
read by both texture samplers (non-contiguous) and scanout (must be
contiguous).
On i.MX6Q (Vivante GC2000) we always want to use the most efficient
DRM_FORMAT_MOD_VIVANTE_SPLIT_SUPER_TILED, because neither of the
supported render formats can be sampled or scanned out directly.
Since the compositor has to resolve into DRM_FORMAT_MOD_VIVANTE_TILED
(non-contiguous) for texture sampling or DRM_FORMAT_MOD_LINEAR
(contiguous) for scanout, the client buffers can always be non-
contiguous.
On i.MX6S (Vivante GC880) the optimal render format for texture sampling
would be DRM_FORMAT_MOD_VIVANTE_TILED (non-contiguous) and for scanout
DRM_FORMAT_MOD_VIVANTE_SUPER_TILED (non-contiguous) which would be
resolved into DRM_FORMAT_MOD_LINEAR (contiguous) by the compositor.
I think all of this works with Daniel's design.
Post by Philipp Zabel
All three could always handle DRM_FORMAT_MOD_LINEAR (contiguous) client
buffers for scanout directly, but those would be suboptimal if the
compositor decides to render on short notice, because the client would
have already resolved into linear and then the compositor would have to
resolve back into a texture sampler tiling format.
Is the concern here that switching between scanout and compositing is
non-optimal until the client chooses the preferred format?
My point is just that whether or not the buffer must be contiguous in
physical memory is the essential piece of information on i.MX6QP,
whereas the optimal tiling modifier is the same for both GPU composition
and direct scanout cases.

If the client provides non-contiguous buffers, the "optimal" tiling
doesn't help one bit in the scanout case, as the scanout hardware can't
read from those.

regards
Philipp
Simon Ser
2018-11-13 18:19:29 UTC
Permalink
Post by Daniel Stone
Hi Simon,
Post by Simon Ser
Post by Philipp Zabel
Post by Simon Ser
+ <event name="primary_device">
+ <description summary="preferred primary device">
+ This event advertizes the primary device that the server prefers. There
+ is exactly one primary device.
Which device should this be if the scanout engine is separate from the
render engine (e.g. IPU/imx-drm and GPU/etnaviv on i.MX6)
When the surface hints are created, I expect the compositor to send the device
it uses for compositing as the primary device (assuming it's using only one
device).
i.MX6 has a separate scanout device without any acceleration capabilities
except some hardware overlay planes, and a pure GPU render device without
any connection to the outside world. The compositor uses both devices for
compositing and output.
But most of the time, client buffers will go through compositing. So the
primary device is still the render device.

The situation doesn't change a lot compared to wl_drm to be honest. The device
that is advertised via wl_drm will be the primary device advertised by this
protocol.

Maybe when the compositor decides to scan out a client, it can switch the
primary device to the scan-out device. Sorry, I don't know enough about these
particular devices to say for sure.
Post by Daniel Stone
Post by Simon Ser
Post by Philipp Zabel
When the surface becomes fullscreen on a different GPU (meaning it becomes
fullscreen on an output which is managed by another GPU), I'd expect the
compositor to change the primary device for this surface to this other GPU.
If the compositor uses multiple devices for compositing, it'll probably switch
the primary device when the surface is moved from one GPU to the other.
I'm not sure how i.MX6 works, but: even if the same GPU is used for compositing
and scanout, but the compositing preferred formats are different from the
scanout preferred formats, the compositor can update the preferred format
without changing the preferred device.
Is there an issue with this? Maybe something should be added to the protocol to
explain it better?
It is not clear to me from the protocol description whether the primary
device means the scanout engine or the GPU, in case they are different.
What is the client process supposed to do with this fd? Is it expected
to be able to render on this device? Or use it to allocate the optimal
buffers?
The client is expected to allocate its buffers there. I'm not sure about
rendering.
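
One generic thing a client can do with the FD (assuming libdrm) is map it back
to a render node it can open for itself, e.g.:

#include <fcntl.h>
#include <stdlib.h>
#include <xf86drm.h>

/* Open the render node matching the FD received in the primary_device event,
 * so the client can allocate (and possibly render) on the same device. */
static int open_render_node_for(int primary_device_fd)
{
	char *path = drmGetRenderDeviceNameFromFd(primary_device_fd);
	if (path == NULL)
		return -1;
	int fd = open(path, O_RDWR | O_CLOEXEC);
	free(path);
	return fd;
}
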
Post by Daniel Stone
Post by Simon Ser
Post by Philipp Zabel
What about contiguous vs non-contiguous memory?
On i.MX6QP (Vivante GC3000) we would probably want the client to always
render DRM_FORMAT_MOD_VIVANTE_SUPER_TILED, because this can be directly
read by both texture samplers (non-contiguous) and scanout (must be
contiguous).
On i.MX6Q (Vivante GC2000) we always want to use the most efficient
DRM_FORMAT_MOD_VIVANTE_SPLIT_SUPER_TILED, because neither of the
supported render formats can be sampled or scanned out directly.
Since the compositor has to resolve into DRM_FORMAT_MOD_VIVANTE_TILED
(non-contiguous) for texture sampling or DRM_FORMAT_MOD_LINEAR
(contiguous) for scanout, the client buffers can always be non-
contiguous.
On i.MX6S (Vivante GC880) the optimal render format for texture sampling
would be DRM_FORMAT_MOD_VIVANTE_TILED (non-contiguous) and for scanout
DRM_FORMAT_MOD_VIVANTE_SUPER_TILED (non-contiguous) which would be
resolved into DRM_FORMAT_MOD_LINEAR (contiguous) by the compositor.
I think all of this works with Daniel's design.
Post by Philipp Zabel
All three could always handle DRM_FORMAT_MOD_LINEAR (contiguous) client
buffers for scanout directly, but those would be suboptimal if the
compositor decides to render on short notice, because the client would
have already resolved into linear and then the compositor would have to
resolve back into a texture sampler tiling format.
Is the concern here that switching between scanout and compositing is
non-optimal until the client chooses the preferred format?
My point is just that whether or not the buffer must be contiguous in
physical memory is the essential piece of information on i.MX6QP,
whereas the optimal tiling modifier is the same for both GPU composition
and direct scanout cases.
If the client provides non-contiguous buffers, the "optimal" tiling
doesn't help one bit in the scanout case, as the scanout hardware can't
read from those.
Sorry, I don't get what you mean. Can you please try to explain again?
Pekka Paalanen
2018-11-14 09:03:57 UTC
Permalink
On Tue, 13 Nov 2018 18:19:29 +0000
Post by Simon Ser
Post by Daniel Stone
Hi Simon,
Post by Simon Ser
Post by Philipp Zabel
Post by Simon Ser
+ <event name="primary_device">
+ <description summary="preferred primary device">
+ This event advertizes the primary device that the server prefers. There
+ is exactly one primary device.
Which device should this be if the scanout engine is separate from the
render engine (e.g. IPU/imx-drm and GPU/etnaviv on i.MX6)
When the surface hints are created, I expect the compositor to send the device
it uses for compositing as the primary device (assuming it's using only one
device).
i.MX6 has a separate scanout device without any acceleration capabilities
except some hardware overlay planes, and a pure GPU render device without
any connection to the outside world. The compositor uses both devices for
compositing and output.
But most of the time, client buffers will go through compositing. So the
primary device is still the render device.
The situation doesn't change a lot compared to wl_drm to be honest. The device
that is advertised via wl_drm will be the primary device advertised by this
protocol.
Maybe when the compositor decides to scan-out a client, it can switch the
primary device to the scan-out device. Sorry, I don't know enough about these
particular devices to say for sure.
Hi,

I do see Philipp's point after thinking for a while. I'll explain below.
Post by Simon Ser
Post by Daniel Stone
Post by Simon Ser
Post by Philipp Zabel
When the surface becomes fullscreen on a different GPU (meaning it becomes
fullscreen on an output which is managed by another GPU), I'd expect the
compositor to change the primary device for this surface to this other GPU.
If the compositor uses multiple devices for compositing, it'll probably switch
the primary device when the surface is moved from one GPU to the other.
I'm not sure how i.MX6 works, but: even if the same GPU is used for compositing
and scanout, but the compositing preferred formats are different from the
scanout preferred formats, the compositor can update the preferred format
without changing the preferred device.
Is there an issue with this? Maybe something should be added to the protocol to
explain it better?
It is not clear to me from the protocol description whether the primary
device means the scanout engine or the GPU, in case they are different.
What is the client process supposed to do with this fd? Is it expected
to be able to render on this device? Or use it to allocate the optimal
buffers?
The client is expected to allocate its buffers there. I'm not sure about
rendering.
Well, actually...
Post by Simon Ser
Post by Daniel Stone
Post by Simon Ser
Post by Philipp Zabel
What about contiguous vs non-contiguous memory?
On i.MX6QP (Vivante GC3000) we would probably want the client to always
render DRM_FORMAT_MOD_VIVANTE_SUPER_TILED, because this can be directly
read by both texture samplers (non-contiguous) and scanout (must be
contiguous).
On i.MX6Q (Vivante GC2000) we always want to use the most efficient
DRM_FORMAT_MOD_VIVANTE_SPLIT_SUPER_TILED, because neither of the
supported render formats can be sampled or scanned out directly.
Since the compositor has to resolve into DRM_FORMAT_MOD_VIVANTE_TILED
(non-contiguous) for texture sampling or DRM_FORMAT_MOD_LINEAR
(contiguous) for scanout, the client buffers can always be non-
contiguous.
On i.MX6S (Vivante GC880) the optimal render format for texture sampling
would be DRM_FORMAT_MOD_VIVANTE_TILED (non-contiguous) and for scanout
DRM_FORMAT_MOD_VIVANTE_SUPER_TILED (non-contiguous) which would be
resolved into DRM_FORMAT_MOD_LINEAR (contiguous) by the compositor.
I think all of this works with Daniel's design.
Post by Philipp Zabel
All three could always handle DRM_FORMAT_MOD_LINEAR (contiguous) client
buffers for scanout directly, but those would be suboptimal if the
compositor decides to render on short notice, because the client would
have already resolved into linear and then the compositor would have to
resolve back into a texture sampler tiling format.
Is the concern here that switching between scanout and compositing is
non-optimal until the client chooses the preferred format?
My point is just that whether or not the buffer must be contiguous in
physical memory is the essential piece of information on i.MX6QP,
whereas the optimal tiling modifier is the same for both GPU composition
and direct scanout cases.
If the client provides non-contiguous buffers, the "optimal" tiling
doesn't help one bit in the scanout case, as the scanout hardware can't
read from those.
Sorry, I don't get what you mean. Can you please try to explain again?
The hints protocol we are discussing here is a subset of what
https://github.com/cubanismo/allocator aims to achieve. Originally we
only concentrated on getting the format and modifier more optimal, but
the question of where and how to allocate the buffers is valid too.
Whether it is in scope for this extension is the big question below.

Ideally, the protocol would do something like this:

- Tell the client which device and for which use case the device must
be able to access the buffer at minimum and always.

- Tell the client that if it could make the buffer suitable also for a
secondary device and a secondary use case, the compositor could do a
more optimal job (e.g. putting the buffer in direct scanout,
bypassing composition, or a hardware video encoder in case the output
is going to be streamed).

We don't have the vocabulary for use cases and there are tons of
different details to be taken into account, which is the whole point of
the allocator project. So we cannot do the complete solution here and
now, but we can do an approximate solution by negotiating pixel
formats and modifiers.

The primary device is what the compositor uses for the fallback path,
which is compositing with a GPU. Therefore at very minimum, clients
need to allocate buffers that can be used with the primary device. We
guarantee this in the zwp_linux_dmabuf protocol by having the
compositor test the buffer import into EGL (or equivalent) before it
accepts that the buffer even exists. The client does not absolutely
necessarily need the primary device for this, but it will have much
better chances of making usable buffers if it uses it for allocation at
least.
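
For reference, a stripped-down version of that import test might look roughly
like the following (assuming a single-plane buffer and the
EGL_EXT_image_dma_buf_import + modifiers extensions; a real compositor looks
the entry points up with eglGetProcAddress and handles all planes):

#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <stdbool.h>
#include <stdint.h>

/* Try to import a single-plane dmabuf; if this fails, the compositor refuses
 * to create the wl_buffer. */
static bool test_dmabuf_import(EGLDisplay dpy,
                               PFNEGLCREATEIMAGEKHRPROC create_image,
                               PFNEGLDESTROYIMAGEKHRPROC destroy_image,
                               int fd, uint32_t width, uint32_t height,
                               uint32_t fourcc, uint64_t modifier,
                               uint32_t offset, uint32_t stride)
{
	const EGLint attribs[] = {
		EGL_WIDTH, (EGLint)width,
		EGL_HEIGHT, (EGLint)height,
		EGL_LINUX_DRM_FOURCC_EXT, (EGLint)fourcc,
		EGL_DMA_BUF_PLANE0_FD_EXT, fd,
		EGL_DMA_BUF_PLANE0_OFFSET_EXT, (EGLint)offset,
		EGL_DMA_BUF_PLANE0_PITCH_EXT, (EGLint)stride,
		EGL_DMA_BUF_PLANE0_MODIFIER_LO_EXT, (EGLint)(modifier & 0xffffffff),
		EGL_DMA_BUF_PLANE0_MODIFIER_HI_EXT, (EGLint)(modifier >> 32),
		EGL_NONE,
	};

	EGLImageKHR img = create_image(dpy, EGL_NO_CONTEXT,
	                               EGL_LINUX_DMA_BUF_EXT, NULL, attribs);
	if (img == EGL_NO_IMAGE_KHR)
		return false;
	destroy_image(dpy, img);
	return true;
}
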

The primary device also has another very different meaning: the
compositor will likely be using the primary device anyway so it is kept
active and if clients use the same device instead of some other device,
it probably results in considerable power savings. IOW, the primary
device is the preferred rendering device as well. Or so I assume, these
two concepts could be decoupled as well.

A secondary device is optional. In systems where the GPU and display
devices are separate DRM devices, the GPU will be the primary device,
and the display device would be the secondary device. So there seems to
be a use case for sending the secondary device (or devices?) in
addition to the primary device.

AFAIK, the unix device memory allocator project does not yet have
anything we should be encoding as a Wayland extension, so all we seem
to be able to do is to deliver the device file descriptors and the
format+modifier sets.

Now the design question: do we want to communicate the secondary
devices in this extension? Quite likely we need a different extension
to be used with the allocator project.

Is communicating the display device fd useful already when it differs
from the rendering device? Is there a way for generic client userspace
to use it effectively, or would it rely on hardware-specific code in
clients rather than in e.g. Mesa drivers? Are there EGL or Vulkan APIs
to tell the driver it should make the buffer work on one device while
rendering on another?

My current opinion is that if there is no generic way for an
application to benefit from the secondary device fd, then we should not
add secondary devices in this extension yet.


Thanks,
pq
Philipp Zabel
2018-11-15 17:11:32 UTC
Permalink
Hi Pekka,

thank you for the explanation.

On Wed, 2018-11-14 at 11:03 +0200, Pekka Paalanen wrote:
[...]
Post by Pekka Paalanen
The hints protocol we are discussing here is a subset of what
https://github.com/cubanismo/allocator aims to achieve. Originally we
only concentrated on getting the format and modifier more optimal, but
the question of where and how to allocate the buffers is valid too.
Whether it is in scope for this extension is the big question below.
My guess is: probably not. Either way, I'd prefer the protocol docs to
be explicit about this.
Post by Pekka Paalanen
- Tell the client which device and for which use case the device must
be able to access the buffer at minimum and always.
- Tell the client that if it could make the buffer suitable also for a
secondary device and a secondary use case, the compositor could do a
more optimal job (e.g. putting the buffer in direct scanout,
bypassing composition, or a hardware video encoder in case the output
is going to be streamed).
We don't have the vocabulary for use cases and there are tons of
different details to be taken into account, which is the whole point of
the allocator project. So we cannot do the complete solution here and
now, but we can do an approximate solution by negotiating pixel
formats and modifiers.
The primary device is what the compositor uses for the fallback path,
which is compositing with a GPU.
Therefore at very minimum, clients
need to allocate buffers that can be used with the primary device. We
guarantee this in the zwp_linux_dmabuf protocol by having the
compositor test the buffer import into EGL (or equivalent) before it
accepts that the buffer even exists. The client does not absolutely
necessarily need the primary device for this, but it will have much
better chances of making usable buffers if it uses it for allocation at
least.
So the client must provide buffers that the primary device can import
and sample a texture from, ideally directly.
Can something like this be added to the interface description, to make
it clear what the primary device actually is supposed to be in this
context?
Post by Pekka Paalanen
The primary device also has another very different meaning: the
compositor will likely be using the primary device anyway so it is kept
active and if clients use the same device instead of some other device,
it probably results in considerable power savings. IOW, the primary
device is the preferred rendering device as well. Or so I assume, these
two concepts could be decoupled as well.
And the client should default to using the same primary device for
rendering for power savings.
Post by Pekka Paalanen
A secondary device is optional. In systems where the GPU and display
devices are separate DRM devices, the GPU will be the primary device,
and the display device would be the secondary device. So there seems to
be a use case for sending the secondary device (or devices?) in
addition to the primary device.
AFAIK, the unix device memory allocator project does not yet have
anything we should be encoding as a Wayland extension, so all we seem
to be able to do is to deliver the device file descriptors and the
format+modifier sets.
Ok.
Post by Pekka Paalanen
Now the design question: do we want to communicate the secondary
devices in this extension? Quite likely we need a different extension
to be used with the allocator project.
As long as the use case is not clear, I'd say leave it out.
A "secondary_device" event may be added later with a version update if
needed.
Post by Pekka Paalanen
Is communicating the display device fd useful already when it differs
from the rendering device? Is there a way for generic client userspace
to use it effectively, or would it rely on hardware-specific code in
clients rather than in e.g. Mesa drivers? Are there EGL or Vulkan APIs
to tell the driver it should make the buffer work on one device while
rendering on another?
I have not found anything specific about this in the Vulkan spec.

The VK_KHR_external_memory extension even states:

"However, only the same concrete physical device can be used when
sharing memory, [...]"

and:

"Note this does not attempt to address cross-device transitions, nor
transitions to engines on the same device which are not visible
within the Vulkan API.
    Both of these are beyond the scope of this extension."

in the issues. So even though sharing between different devices should be
possible with VK_EXT_external_memory_dma_buf and
VK_EXT_image_drm_format_modifier bolted on top, it is not the main focus of
these extensions.
Post by Pekka Paalanen
My current opinion is that if there is no generic way for an
application to benefit from the secondary device fd, then we should not
add secondary devices in this extension yet.
I agree.

regards
Philipp
Simon Ser
2018-11-18 14:08:57 UTC
Permalink
Post by Philipp Zabel
Hi Pekka,
thank you for the explanation.
Hi,

Thanks Pekka for clarifying.
Post by Philipp Zabel
[...]
Post by Pekka Paalanen
The hints protocol we are discussing here is a subset of what
https://github.com/cubanismo/allocator aims to achieve. Originally we
only concentrated on getting the format and modifier more optimal, but
the question of where and how to allocate the buffers is valid too.
Whether it is in scope for this extension is the big question below.
My guess is: probably not. Either way, I'd prefer the protocol docs to
be explicit about this.
Post by Pekka Paalanen
- Tell the client which device and for which use case the device must
be able to access the buffer at minimum and always.
- Tell the client that if it could make the buffer suitable also for a
secondary device and a secondary use case, the compositor could do a
more optimal job (e.g. putting the buffer in direct scanout,
bypassing composition, or a hardware video encoder in case the output
is going to be streamed).
We don't have the vocabulary for use cases and there are tons of
different details to be taken into account, which is the whole point of
the allocator project. So we cannot do the complete solution here and
now, but we can do an approximate solution by negotiating pixel
formats and modifiers.
The primary device is what the compositor uses for the fallback path,
which is compositing with a GPU.
Therefore at very minimum, clients
need to allocate buffers that can be used with the primary device. We
guarantee this in the zwp_linux_dmabuf protocol by having the
compositor test the buffer import into EGL (or equivalent) before it
accepts that the buffer even exists. The client does not absolutely
necessarily need the primary device for this, but it will have much
better chances of making usable buffers if it uses it for allocation at
least.
So the client must provide buffers that the primary device can import
and sample a texture from, ideally directly.
Can something like this be added to the interface description, to make
it clear what the primary device actually is supposed to be in this
context?
This seems sensible, I'll do that.
Post by Philipp Zabel
Post by Pekka Paalanen
The primary device also has another very different meaning: the
compositor will likely be using the primary device anyway so it is kept
active and if clients use the same device instead of some other device,
it probably results in considerable power savings. IOW, the primary
device is the preferred rendering device as well. Or so I assume, these
two concepts could be decoupled as well.
And the client should default to using the same primary device for
rendering for power savings.
Will be in the next version, but with "can" instead of "should", because some
clients (games with DRI_PRIME) might want to use another device to get better
performance.
Post by Philipp Zabel
Post by Pekka Paalanen
A secondary device is optional. In systems where the GPU and display
devices are separate DRM devices, the GPU will be the primary device,
and the display device would be the secondary device. So there seems to
be a use case for sending the secondary device (or devices?) in
addition to the primary device.
AFAIK, the unix device memory allocator project does not yet have
anything we should be encoding as a Wayland extension, so all we seem
to be able to do is to deliver the device file descriptors and the
format+modifier sets.
Ok.
Post by Pekka Paalanen
Now the design question: do we want to communicate the secondary
devices in this extension? Quite likely we need a different extension
to be used with the allocator project.
As long as the use case is not clear, I'd say leave it out.
A "secondary_device" event may be added later with a version update if
needed.
Yes, I agree, I'd prefer not having this in the protocol for now.
Post by Philipp Zabel
Post by Pekka Paalanen
My current opinion is that if there is no generic way for an
application to benefit from the secondary device fd, then we should not
add secondary devices in this extension yet.
I agree.
+1

Simon Ser
2018-11-10 13:54:19 UTC
Permalink
Just a general update about this: I tried to see how we could make Mesa use this
new protocol.

The bad news is that the DRM FD is per-EGLDisplay, and I think it would require
quite some changes to make it per-EGLSurface. I'm still new to the Mesa
codebase, so it'd probably make sense to only use the new protocol to get the
device FD, without relying on wl_drm anymore. We could talk about using the
protocol more efficiently in the future. I also think a lot of clients weren't
designed to support multiple device FDs, so it would be nice to have a smoother
upgrade path.

That leaves an issue: the whole protocol provides hints for a surface. When the
EGLDisplay is created we don't have a surface yet. I can think of a few possible
solutions:

* Create a wl_surface, get the hints, and destroy everything (without mapping
the surface)
* Allow the get_surface_hints to take a NULL surface
* Add a get_hints request without a wl_surface argument
* Forget about per-surface hints, make hints global
* (Someone else volunteers to patch Mesa to use per-surface FDs)

What do you think?
Pekka Paalanen
2018-11-12 09:14:13 UTC
Permalink
On Sat, 10 Nov 2018 13:54:19 +0000
Post by Simon Ser
Just a general update about this: I tried to see how we could make Mesa use this
new protocol.
The bad news is that the DRM FD is per-EGLDisplay, and I think it would require
quite some changes to make it per-EGLSurface. I'm still new to the Mesa
codebase, so it'd probably make sense to only use the new protocol to get the
device FD, without relying on wl_drm anymore. We could talk about using the
protocol more efficiently in the future. I also think a lot of clients weren't
designed to support multiple device FDs, so it would be nice to have a smoother
upgrade path.
Hi,

yeah, that sounds fine to me: use the new protocol, if available, to
only find the default device at EGLDisplay creation.

What can be done per surface later is only the changing of
format+modifier, within the limits of what EGLConfig the app is using,
so maybe it's the modifier alone. If EGL should do that automatically
and internally to begin with... it could change the modifier at least.
Post by Simon Ser
That leaves an issue: the whole protocol provides hints for a surface. When the
EGLDisplay is created we don't have a surface yet. I can think of a few possible
Indeed.
Post by Simon Ser
* Create a wl_surface, get the hints, and destroy everything (without mapping
the surface)
* Allow the get_surface_hints to take a NULL surface
* Add a get_hints request without a wl_surface argument
* Forget about per-surface hints, make hints global
* (Someone else volunteers to patch Mesa to use per-surface FDs)
What do you think?
I think maybe it would be best to make the device hint "global" in a
way, not tied to any surface, while leaving the format+modifier hints
per-surface. IOW, just move the primary_device event from
zwp_linux_dmabuf_device_hints_v1 into zwp_linux_dmabuf_v1 (or
equivalent).

Can anyone think of practical uses where the default device would need
to depend on the surface somehow?

I seem to recall we agreed that the primary device is the one the
compositor is compositing with. Using the compositing device as the
recommended default device makes sense from a power consumption point of
view: the compositor will be keeping that GPU awake anyway, so apps
that don't care much about performance but do want to use a GPU should
use it.

Your possible solutions are a valid list for another problem as well:
the initial/default format+modifier hints before a surface is mapped. I
think it should be either allowing get_surface_hints with a NULL surface
or adding a get_default_hints request that doesn't take a surface.
Technically the two are equivalent.

I do not like the temp wl_surface approach, and we really do want hints
to be per-surface because that's the whole point with the
format+modifier hints.


Thanks,
pq
Simon Ser
2018-11-12 10:13:39 UTC
Permalink
Post by Pekka Paalanen
Post by Simon Ser
* Create a wl_surface, get the hints, and destroy everything (without mapping
the surface)
* Allow the get_surface_hints to take a NULL surface
* Add a get_hints request without a wl_surface argument
* Forget about per-surface hints, make hints global
* (Someone else volunteers to patch Mesa to use per-surface FDs)
What do you think?
I think maybe it would be best to make the device hint "global" in a
way, not tied to any surface, while leaving the format+modifier hints
per-surface. IOW, just move the primary_device event from
zwp_linux_dmabuf_device_hints_v1 into zwp_linux_dmabuf_v1 (or
equivalent).
Can anyone think of practical uses where the default device would need
to depend on the surface somehow?
I seem to recall we agreed that the primary device is the one the
compositor is compositing with. Using the compositing device as the
recommended default device makes sense from a power consumption point of
view: the compositor will be keeping that GPU awake anyway, so apps
that don't care much about performance but do want to use a GPU should
use it.
In the case of compositing the surface, yes the primary device will be the one
used for compositing. However there are two cases in which a per-surface device
hint would be useful.

First, what happens if the surface isn't composited and is directly scanned out?
Let's say I have two GPUs, with one output each. The compositor is using one GPU
for compositing, and the surface is fullscreened on the other's output. If we
only have a global device hint, then the primary device will be the one used for
compositing. However this causes an unnecessary copy between the two GPUs: the
client will render on one, and then the compositor will copy the DMA-BUF to the
other one for scan-out. It would be better if the client can render directly on
the GPU it will be scanned out with.

Second, some compositors could support rendering with multiple GPUs. For
instance, if I have two GPUs with one output each, the compositor could use GPU
1 for compositing output 1 and GPU 2 for compositing output 2. In this case, it
would be better if the client could render using the GPU it will be composited
with, and this depends on the output the surface is displayed on.
Post by Pekka Paalanen
the initial/default format+modifier hints before a surface is mapped. I
think it should be either allowing get_surface_hints with a NULL surface
or adding a get_default_hints request that doesn't take a surface.
Technically the two are equivalent.
I think the cleanest solution would be to add get_default_hints, which would
create a wp_linux_dmabuf_hints object.
Post by Pekka Paalanen
I do not like the temp wl_surface approach, and we really do want hints
to be per-surface because that's the whole point with the
format+modifier hints.
Aye.
Pekka Paalanen
2018-11-12 12:13:09 UTC
Permalink
On Mon, 12 Nov 2018 10:13:39 +0000
Post by Simon Ser
Post by Pekka Paalanen
Post by Simon Ser
* Create a wl_surface, get the hints, and destroy everything (without mapping
the surface)
* Allow the get_surface_hints to take a NULL surface
* Add a get_hints request without a wl_surface argument
* Forget about per-surface hints, make hints global
* (Someone else volunteers to patch Mesa to use per-surface FDs)
What do you think?
I think maybe it would be best to make the device hint "global" in a
way, not tied to any surface, while leaving the format+modifier hints
per-surface. IOW, just move the primary_device event from
zwp_linux_dmabuf_device_hints_v1 into zwp_linux_dmabuf_v1 (or
equivalent).
Can anyone think of practical uses where the default device would need
to depend on the surface somehow?
I seem to recall we agreed that the primary device is the one the
compositor is compositing with. Using the compositing device as the
recommended default device makes sense from a power consumption point of
view: the compositor will be keeping that GPU awake anyway, so apps
that don't care much about performance but do want to use a GPU should
use it.
In the case of compositing the surface, yes the primary device will be the one
used for compositing. However there are two cases in which a per-surface device
hint would be useful.
First, what happens if the surface isn't composited and is directly scanned out?
Let's say I have two GPUs, with one output each. The compositor is using one GPU
for compositing, and the surface is fullscreened on the other's output. If we
only have a global device hint, then the primary device will be the one used for
compositing. However this causes an unnecessary copy between the two GPUs: the
client will render on one, and then the compositor will copy the DMA-BUF to the
other one for scan-out. It would be better if the client can render directly on
the GPU it will be scanned out with.
Theoretically yes. However, apps are not usually prepared to switch the
GPU they render with.

Rendering with and being scanned out on are somewhat orthogonal. In the
above case, the compositor could keep the default device as the
compositing GPU, but change the modifiers so that it would be possible
to import the dmabuf to the scanout GPU either for direct scanout or
having the scanout GPU make the copy. It's not always possible for
other reasons like an incompatible memory domain, I give you that.

If you envision that apps (toolkits) might be willing to implement GPU
switching sometimes, then I have no objections. It is again the
difference between initial default hints vs. optimization hints after
the surface is mapped.
Post by Simon Ser
Second, some compositors could support rendering with multiple GPUs. For
instance, if I have two GPUs with one output each, the compositor could use GPU
1 for compositing output 1 and GPU 2 for compositing output 2. In this case, it
would be better if the client could render using the GPU it will be composited
with, and this depends on the output the surface is displayed on.
From a protocol point of view, this does not differ from the first case.
Post by Simon Ser
Post by Pekka Paalanen
the initial/default format+modifier hints before a surface is mapped. I
think it should be either allowing get_surface_hints with a NULL surface
or adding a get_default_hints request that doesn't take a surface.
Technically the two are equivalent.
I think the cleanest solution would be to add get_default_hints, which would
create a wp_linux_dmabuf_hints object.
Right. And if we want the preferred device to also follow the split between
initial hints and optimized hints after mapping, you'd keep the device event
in zwp_linux_dmabuf_device_hints_v1.

Sounds fine to me.


Thanks,
pq