Discussion:
Introduction and updates from NVIDIA
Miguel Angel Vico
2016-03-21 16:28:13 UTC
Hi all,

First of all, I'd like to introduce myself to the Wayland community: my
name is Miguel A. Vico, and I've been working as a Software Engineer
for NVIDIA for some time now, more specifically in the Linux drivers
team. Although I've never spoken up here before, I've lately been
following the amazing work that you all have been doing.

We have been working on adding to our drivers all the features required
to run Wayland and Weston on top of them. We have just released the
NVIDIA 364.12 GPU driver, which brings initial DRM KMS support (among
other things). Please check out our public announcement here:

https://devtalk.nvidia.com/default/topic/925605/linux/nvidia-364-12-release-vulkan-glvnd-drm-kms-and-eglstreams/


In order to make the Weston DRM compositor work with our drivers, we
have used EGLDevice, EGLOutput, and EGLStream objects.

For those not familiar with this set of EGL structures, here I'll try to
summarize the most important parts of them, and how they would fit into
the current Weston DRM compositor design:

EGLDevice provides means to enumerate native devices, and then
create an EGL display connection from them.

Similarly, EGLOutput will provide means to access different
portions of display control hardware associated with an EGLDevice.

For instance, EGLOutputLayer represents a portion of display
control hardware that accepts an image as input and processes it
for presentation on a display device.

EGLStream implements a mechanism to connect frame producers with
frame consumers. By attaching an EGLOutputLayer consumer to a
stream, a producer will be able to present frames on a display
device.

Thus, a compositor could produce frames and feed them to an
EGLOutputLayer through an EGLStream for presentation on a display
device.

In a similar way, by attaching a GLTexture consumer to a stream, a
producer (wayland client) could feed frames to a texture, which in
turn can be used by a compositor to prepare the final frame to be
presented.

Whenever the EGL_EXT_device_drm extension is present, EGLDevice can
be used to enumerate and access DRM KMS devices, and EGLOutputLayer
to enumerate and access DRM KMS crtcs and planes.

By using EGLStreams and attaching an EGLOutputLayer consumer
(representing a DRM KMS crtc or plane) to it, compositor-drm can
produce final composition frames and present them on a DRM device.
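
To make the scanout path concrete, here is a minimal sketch of that
sequence (error handling omitted; crtc_id, config, w, and h come from
the compositor's existing KMS setup, and the extension entry points
must be fetched with eglGetProcAddress()):

    /* Enumerate devices and create a display from one
     * (EGL_EXT_device_base, EGL_EXT_platform_device): */
    EGLDeviceEXT devices[16];
    EGLint num_devices = 0;
    eglQueryDevicesEXT(16, devices, &num_devices);
    EGLDisplay dpy = eglGetPlatformDisplayEXT(EGL_PLATFORM_DEVICE_EXT,
                                              devices[0], NULL);
    eglInitialize(dpy, NULL, NULL);

    /* Find the EGLOutputLayer for a KMS crtc we already picked
     * (EGL_EXT_output_base + EGL_EXT_output_drm): */
    EGLAttrib layer_attribs[] = { EGL_DRM_CRTC_EXT, (EGLAttrib)crtc_id,
                                  EGL_NONE };
    EGLOutputLayerEXT layer;
    EGLint num_layers = 0;
    eglGetOutputLayersEXT(dpy, layer_attribs, &layer, 1, &num_layers);

    /* Connect a stream between the compositor's rendering and the layer
     * (EGL_KHR_stream, EGL_EXT_stream_consumer_egloutput,
     * EGL_KHR_stream_producer_eglsurface): */
    EGLStreamKHR stream = eglCreateStreamKHR(dpy, NULL);
    eglStreamConsumerOutputEXT(dpy, stream, layer);

    EGLint surf_attribs[] = { EGL_WIDTH, w, EGL_HEIGHT, h, EGL_NONE };
    EGLSurface surface = eglCreateStreamProducerSurfaceKHR(dpy, config,
                                                           stream,
                                                           surf_attribs);
    /* eglSwapBuffers(dpy, surface) now presents frames on the crtc. */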


Most of the EGL extensions required to implement this can already be
found in the Khronos registry, but we also needed extended
functionality for EGLStreams and EGLOutput consumers, provided by the
following extensions:

- EGL_NV_stream_attrib:

https://github.com/aritger/eglstreams-kms-example/blob/master/proposed-extensions/EGL_NV_stream_attrib.txt

Among other things, this extension defines a version of the stream
acquire function that takes an EGLAttrib parameter, allowing the
acquire behavior to be modified/extended in several cases.

- EGL_EXT_stream_acquire_mode:

https://github.com/aritger/eglstreams-kms-example/blob/master/proposed-extensions/EGL_EXT_stream_acquire_mode.txt

By default, EGLOutputLayer consumers are set to automatically acquire
frames, so an eglSwapBuffers() call on the producer side will present
to the display without any further action. This extension defines a
new EGLStream attribute which allows this behavior to be changed so
that acquire operations must be issued manually with
eglStreamConsumerAcquireAttribNV() (see the sketch after this list).

- EGL_NV_output_drm_flip_event:

https://github.com/aritger/eglstreams-kms-example/blob/master/proposed-extensions/EGL_NV_output_drm_flip_event.txt

This extension defines a new acquire attribute for EGLOutputLayer
consumers tied to DRM KMS CRTCs. It allows clients to get notified
whenever an acquire operation issued with
eglStreamConsumerAcquireAttribNV() has completed.
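
To illustrate, a minimal sketch combining the two proposed extensions
(assuming, per the proposed texts, that EGL_CONSUMER_AUTO_ACQUIRE_EXT
can be set at stream creation; output, on_flip, and drm_fd are
hypothetical compositor-side names):

    /* Disable automatic acquire so the compositor controls presentation
     * (EGL_EXT_stream_acquire_mode): */
    EGLint stream_attribs[] = { EGL_CONSUMER_AUTO_ACQUIRE_EXT, EGL_FALSE,
                                EGL_NONE };
    EGLStreamKHR stream = eglCreateStreamKHR(dpy, stream_attribs);

    /* ... attach the EGLOutputLayer consumer, render, swap ... */

    /* Present explicitly and request a DRM flip event on completion
     * (EGL_NV_stream_attrib + EGL_NV_output_drm_flip_event): */
    EGLAttrib acquire_attribs[] = { EGL_DRM_FLIP_EVENT_DATA_NV,
                                    (EGLAttrib)output, EGL_NONE };
    eglStreamConsumerAcquireAttribNV(dpy, stream, acquire_attribs);

    /* Completion arrives through the usual DRM event path: */
    drmEventContext evctx = {
        .version = DRM_EVENT_CONTEXT_VERSION,
        .page_flip_handler = on_flip,  /* receives 'output' as user_data */
    };
    drmHandleEvent(drm_fd, &evctx);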


Additionally, in order to allow wl_buffers to be bound to EGLStreams, we
kludged eglQueryWaylandBufferWL(EGL_WAYLAND_BUFFER_WL) to return the
stream file descriptor.

We think the proper way to handle this should be:

- Update WL_bind_wayland_display such that eglQueryWaylandBufferWL()
accepts a new attribute EGL_WAYLAND_BUFFER_TYPE_WL, returning
EGL_WAYLAND_BUFFER_EGLIMAGE_WL for the non-stream case.

- Add a new WL_wayland_buffer_eglstream extension, which would define
EGL_WAYLAND_BUFFER_EGLSTREAM_WL as a return value for
EGL_WAYLAND_BUFFER_TYPE_WL, and yet another attribute
EGL_WAYLAND_BUFFER_EGLSTREAM_FD_WL to query the stream file
descriptor.
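
Under that proposal, the compositor side might look like this sketch
(attribute names as proposed above; egl_dpy and buffer_resource
assumed):

    EGLint type = 0;
    eglQueryWaylandBufferWL(egl_dpy, buffer_resource,
                            EGL_WAYLAND_BUFFER_TYPE_WL, &type);

    if (type == EGL_WAYLAND_BUFFER_EGLSTREAM_WL) {
        /* Stream-backed buffer: fetch the stream fd and attach it to a
         * GL texture consumer. */
        EGLint stream_fd = -1;
        eglQueryWaylandBufferWL(egl_dpy, buffer_resource,
                                EGL_WAYLAND_BUFFER_EGLSTREAM_FD_WL,
                                &stream_fd);
    } else {
        /* EGL_WAYLAND_BUFFER_EGLIMAGE_WL: create an EGLImage from the
         * buffer as compositors do today. */
    }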


I'm planning to post to this mailing list the set of patches that add
the support mentioned above, hoping to get your feedback.


Thanks in advance,
--
Miguel

Nicole Fontenot
2016-03-22 13:39:57 UTC
Hello Miguel,

I cannot comment on whether these patches are within the scope of
Wayland, but I think now is the perfect time to consider API extensions.

It would be great if someone with an NVIDIA card has time to run
performance tests when you submit your patches. I think a proper
decision on including the patches is more likely to happen that way.
If they do not get accepted into Wayland, I'm sure that users of NVIDIA
cards, particularly Steam users, would still want this as an extension
in their favorite compositor.
Daniel Stone
2016-03-22 13:49:59 UTC
Hi Miguel,
Post by Miguel Angel Vico
First of all, I'd like to introduce myself to the Wayland community: my
name is Miguel A. Vico, and I've been working as a Software Engineer
for NVIDIA for some time now, more specifically in the Linux drivers
team. Although I've never spoken up here before, I've lately been
following the amazing work that you all have been doing.
Welcome!

I'm sorry I don't have some better news for you, but Andy and Aaron
can tell you it's not personal: this has been going on for years.
Post by Miguel Angel Vico
In order to make the Weston DRM compositor work with our drivers, we
have used EGLDevice, EGLOutput, and EGLStream objects.
This is ... unfortunate. To echo what Daniel Vetter said, on the whole
these modesetting-in-EGL extensions are not something which has that
wide support, or even implementation. That being said, it's
interesting to have an implementation, because it has helped shape my
feelings and arguments a little, into something more concrete.
Post by Miguel Angel Vico
For those not familiar with this set of EGL structures, here I'll try to
summarize the most important parts of them, and how they would fit into
EGLDevice provides means to enumerate native devices, and then
create an EGL display connection from them.
This is generically useful: we would like to extend
eglGetPlatformDisplay to take an attrib naming an EGLDevice, which we
could then use with platform_gbm (to select GPU and scanout device
separately, either for multi-GPU systems or also for SoCs with
discrete GPU/dispc setups) as well as platform_wayland and co.
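
To make that concrete, the rough shape could be something like this
sketch (entirely hypothetical: no such input attribute is specified
today, and EGL_DEVICE_EXT currently exists only as a query token):

    /* Hypothetical: render on one device, scan out through gbm on
     * another. */
    EGLAttrib attribs[] = { EGL_DEVICE_EXT, (EGLAttrib)render_device,
                            EGL_NONE };
    EGLDisplay dpy = eglGetPlatformDisplay(EGL_PLATFORM_GBM_KHR,
                                           gbm_dev, attribs);
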
Post by Miguel Angel Vico
Similarly, EGLOutput will provide means to access different
portions of display control hardware associated with an EGLDevice.
For instance, EGLOutputLayer represents a portion of display
control hardware that accepts an image as input and processes it
for presentation on a display device.
I still struggle to see the value of what is essentially an
abstraction over KMS, but oh well.
Post by Miguel Angel Vico
EGLStream implements a mechanism to connect frame producers with
frame consumers. By attaching an EGLOutputLayer consumer to a
stream, a producer will be able to present frames on a display
device.
This is understating things quite a bit, I think. On the
Wayland-client side, it's a pretty big change from the EGLSurface
model, particularly if you use the default mailbox mode (see comments
on patch 4/7 as to how this breaks real-world setups, AFAICT). On the
Wayland-compositor side, it's two _huge_ changes.

Firstly, again looking at the case where a Wayland client is a stream
producer and the Wayland compositor is a consumer, we move from a
model where references to individual buffers are explicitly passed
through the Wayland protocol, to where those buffers merely carry a
reference to a stream. Again, as stated in the review of 4/7, that
looks like it has the potential to break some actual real-world cases,
and I have no idea how to solve it, other than banning mailbox mode,
which would seem to mostly defeat the point of Streams (more on that
below).

Secondly, looking at the compositor-drm case, the use of the dumb
buffer to display undefined content as a dummy modeset really makes me
uneasy, again because both gl-renderer and compositor-drm are written
for explicit individual buffer management, rather than streams in +
streams out. I think the combination of the two pushes them long
beyond the point of readability, and I'd encourage you to look at
trying to split those files up, or at least the functions within them.
Attempting to keep both modes in there just looks like a maintenance
nightmare, especially when this streams implementation
(unsurprisingly) has to bypass almost the entire runtime (as opposed
to init-time) functionality of compositor-drm.

Also, I'm not quite sure how you're testing the compositor-as-consumer
mode: I can't seem to see any EGL extensions which allow you to
connect a Wayland surface as an EGLStream consumer. Do you have
something else unpublished that's being used here, or is this what the
libnvidia-egl-wayland library is for? Or do you just have clients
using EGLSurfaces as normal, which happen to be implemented internally
as EGLStreams? (Also, that the only way to test this is through
proprietary drivers implementing only-just-published extensions not
only makes me very sad, but hugely increases the potential for this to
be inadvertently broken.)
Post by Miguel Angel Vico
Thus, a compositor could produce frames and feed them to an
EGLOutputLayer through an EGLStream for presentation on a display
device.
In a similar way, by attaching a GLTexture consumer to a stream, a
producer (wayland client) could feed frames to a texture, which in
turn can be used by a compositor to prepare the final frame to be
presented.
Quick aside: this reminds me in many unfortunate ways of
GLX_EXT_texture_from_pixmap. tfp gave us the same 'capture stream of
stuff and make it appear in a texture' model as streams, whereas most
of the rest of the world (EGL, Vulkan WSI, Wayland, Android, ChromeOS,
etc) have all moved explicitly _away_ from that model to passing
references to individual buffers, this in many ways brings us back to
tfp.
Post by Miguel Angel Vico
Whenever the EGL_EXT_device_drm extension is present, EGLDevice can
be used to enumerate and access DRM KMS devices, and EGLOutputLayer
to enumerate and access DRM KMS crtcs and planes.
Again, the enumeration isn't so much used as bypassed. The original
enumeration is used, and all we do with the EGL objects is a) list all
of them, b) filter them to find the one we already have, and c)
perhaps replace their internal representation of the device with the
one we already have.
Post by Miguel Angel Vico
By using EGLStreams and attaching an EGLOutputLayer consumer
(representing a DRM KMS crtc or plane) to it, compositor-drm can
produce final composition frames and present them on a DRM device.
Arguably it's gl-renderer producing the frames, with compositor-drm
kind of acting as a fake consumer (EGL_NV_stream_attrib).
Post by Miguel Angel Vico
Additionally, in order to allow wl_buffers to be bound to EGLStreams, we
kludged eglQueryWaylandBufferWL(EGL_WAYLAND_BUFFER_WL) to return the
stream file descriptor.
As said earlier, I don't think this is the right way to go, and have
other suggestions.

I'd like to look at the elephant in the room, which is why you're
using this in the first place (aside from general NVIDIA enthusiasm
for encapsulating everything within EGL Streams/Output/Device/etc,
dating back many years). Andy/Aaron, you've said that you found GBM to
be inadequate, and I'd like to find out explicitly how. Through a few
snippets of IRC and NVIDIA devtalk, so far I can see:

'We can't choose an optimal rendering configuration, because we don't
know how it's going to be used' - (almost completely) untrue. The FD
you pass to gbm_device_create is that of the KMS device, a gbm_surface
contains information as to how the plane (primary or overlay) will be
configured, and an EGLDisplay lets you tie the rendering and scanout
devices together. What more information do you need? It's true that we
don't have a way to select individual rendering devices at the moment,
but as said earlier, passing an EGLDevice as an attrib to
GetPlatformDisplay would resolve that, as you would have the render
device identified by the EGLDevice and the scanout device identified
by the gbm_device. At that point, you have the full pipeline and can
determine the optimal configuration.
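
For reference, that information flow as a minimal gbm sketch (error
handling omitted; w and h assumed):

    int kms_fd = open("/dev/dri/card0", O_RDWR | O_CLOEXEC);
    struct gbm_device *gbm = gbm_create_device(kms_fd); /* KMS device fd */

    /* The surface encodes the scanout intent and format up front: */
    struct gbm_surface *gs =
        gbm_surface_create(gbm, w, h, GBM_FORMAT_XRGB8888,
                           GBM_BO_USE_SCANOUT | GBM_BO_USE_RENDERING);

    /* And the EGLDisplay ties rendering to that scanout device: */
    EGLDisplay dpy = eglGetPlatformDisplayEXT(EGL_PLATFORM_GBM_MESA,
                                              gbm, NULL);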

'We don't know when to schedule decompression, because there's no
explicit barrier' - completely untrue. eglSwapBuffers is that barrier.
For example, in Freescale i.MX6, the Vivante GPU and Freescale IPU
(display controller) do not share a single common format between GPU
render targets and IPU scanout sources, so require a mandatory
detiling pass in between render and display. These work just fine with
gbm with that pass scheduled by eglSwapBuffers. This to me seems
completely explicit, unless there was something else you were meaning
... ?

'Width, height, pitch and format aren't enough information' - this is
true, but not necessarily relevant. I'm not sure what the source of
this actually is: is it the gbm_bo_get_*() APIs? If so, yes, they need
to be extended with a gbm_bo_get_modifier() call, which would allow
you to get the DRM format modifier to describe tiling/compression/et
al (as well as perhaps being extended to allow you to extract multiple
buffers/planes, e.g. to attach auxiliary compression buffers). If it's
not gbm, what actually is it? The only other place I can think of
(suggested by Pekka, I think) was the wl_drm protocol, which it should
be stressed is a) not required in any way by Wayland, b) not a
published/public protocol, c) not a stable protocol. wl_drm just
happens to be the way that Mesa shares buffers, just as wl_viv is how
Vivante's proprietary driver shares buffers, and mali_buffer_sharing
is how the Mali driver does it. Since the server side is bound by
eglBindWaylandDisplayWL and the client side is also only used through
EGL, there is _no_ requirement for you to also implement wl_drm. As it
is a hidden private Mesa protocol, there is also no requirement for
the protocol to remain stable.
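
And a sketch of how the proposed call could slot in next to the
existing libdrm entry point, continuing the gbm sketch above
(gbm_bo_get_modifier is the proposed API, not something gbm offers
today):

    struct gbm_bo *bo = gbm_surface_lock_front_buffer(gs);

    uint32_t handles[4] = { gbm_bo_get_handle(bo).u32 };
    uint32_t pitches[4] = { gbm_bo_get_stride(bo) };
    uint32_t offsets[4] = { 0 };
    uint64_t modifiers[4] = { gbm_bo_get_modifier(bo) }; /* proposed API */

    uint32_t fb_id;
    drmModeAddFB2WithModifiers(kms_fd, w, h, GBM_FORMAT_XRGB8888,
                               handles, pitches, offsets, modifiers,
                               &fb_id, DRM_MODE_FB_MODIFIERS);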

'EGLStreams is the direction taken in Vulkan' - I would argue not. IMO
the explicit buffer management on the client side does not parallel
EGLStreams, and notably there is no equivalent consumer interface
offered on the server side, but instead the individual-buffer-driven
approach is taken. It's true that VK_WSI_display_swapchain does exist
and does match the EGLStreams model fairly closely, but also that it
does not have universal implementation: the Intel 'anv' Mesa-based
driver does not implement display_swapchain, instead having an
interface to export a VkImage as a dmabuf. It's true that the latter
is not optimal (it lacks the explicit targeting required to determine
the most optimal tiling/compression strategy), but OTOH it is
precedent for explicitly avoiding the
VK_WSI_display_swapchain/EGLStreams model for Vulkan on KMS, just as
GBM avoids it for EGL on KMS.

I think it's been good to have this series to push the discussion
further in more concrete terms, but unfortunately I have to say that
I'm even less convinced now than I have ever been. Sorry.

Cheers,
Daniel
Daniel Vetter
2016-03-22 21:43:18 UTC
Post by Daniel Stone
I'd like to look at the elephant in the room, which is why you're
using this in the first place (aside from general NVIDIA enthusiasm
for encapsulating everything within EGL Streams/Output/Device/etc,
dating back many years). Andy/Aaron, you've said that you found GBM to
be inadequate, and I'd like to find out explicitly how. Through a few
'We can't choose an optimal rendering configuration, because we don't
know how it's going to be used' - (almost completely) untrue. The FD
you pass to gbm_device_create is that of the KMS device, a gbm_surface
contains information as to how the plane (primary or overlay) will be
configured, and an EGLDisplay lets you tie the rendering and scanout
devices together. What more information do you need? It's true that we
don't have a way to select individual rendering devices at the moment,
but as said earlier, passing an EGLDevice as an attrib to
GetPlatformDisplay would resolve that, as you would have the render
device identified by the EGLDevice and the scanout device identified
by the gbm_device. At that point, you have the full pipeline and can
determine the optimal configuration.
'We don't know when to schedule decompression, because there's no
explicit barrier' - completely untrue. eglSwapBuffers is that barrier.
For example, in Freescale i.MX6, the Vivante GPU and Freescale IPU
(display controller) do not share a single common format between GPU
render targets and IPU scanout sources, so require a mandatory
detiling pass in between render and display. These work just fine with
gbm with that pass scheduled by eglSwapBuffers. This to me seems
completely explicit, unless there was something else you were meaning
... ?
There's display engines which can directly scan out buffers compressed by
the render engine. It's awesome, except randomly limited, so you need a
communication backchannel from your display driver all the way to your
buffer allocator thing on the client side. And depending upon luck you
really can't tell who should do the decompress pass for most optimal
result upfront.

I think on android the most common way to do that is to attach arbitrary
metadata with a hand-rolled ioctl to dma-buf fds, which is ofc horrible.
Imo the right way is to create a real platform and start to standardize
some of this stuff (fb modifier) more, so that we can pass it from kms to
gbm, then to compositor clients through either a generic transport or
private extensions. Or maybe we can mostly hide all that.
Post by Daniel Stone
'Width, height, pitch and format aren't enough information' - this is
true, but not necessarily relevant. I'm not sure what the source of
this actually is: is it the gbm_bo_get_*() APIs? If so, yes, they need
to be extended with a gbm_bo_get_modifier() call, which would allow
you to get the DRM format modifier to describe tiling/compression/et
al (as well as perhaps being extended to allow you to extract multiple
buffers/planes, e.g. to attach auxiliary compression buffers). If it's
not gbm, what actually is it? The only other place I can think of
(suggested by Pekka, I think) was the wl_drm protocol, which it should
be stressed is a) not required in any way by Wayland, b) not a
published/public protocol, c) not a stable protocol. wl_drm just
happens to be the way that Mesa shares buffers, just as wl_viv is how
Vivante's proprietary driver shares buffers, and mali_buffer_sharing
is how the Mali driver does it. Since the server side is bound by
eglBindWaylandDisplayWL and the client side is also only used through
EGL, there is _no_ requirement for you to also implement wl_drm. As it
is a hidden private Mesa protocol, there is also no requirement for
the protocol to remain stable.
So I've looked at what our own android folks all transport, and I think most of it
we can transport with the current addfb2.1 kms metadata. And we could even
add hints that kms atomic returns if a plane doesn't work with the most
preferred format that would just work in this config. Thus far I've
stumbled over 2 cases:
- compression formats that can't be easily described in addfb2.1 because
they allocate a side buffer in some fancy special memory. The solution
for that, discussed at xdc2014, was to use a dma-buf to wrap that
up, and then use it as an aux buffer (there's patches floating for that) with
normal addfb2.1.
- content protection. Can't talk about this, but worst case it can all be
captured in special-purpose buffers too I think.
Post by Daniel Stone
'EGLStreams is the direction taken in Vulkan' - I would argue not. IMO
the explicit buffer management on the client side does not parallel
EGLStreams, and notably there is no equivalent consumer interface
offered on the server side, but instead the individual-buffer-driven
approach is taken. It's true that VK_WSI_display_swapchain does exist
and does match the EGLStreams model fairly closely, but also that it
does not have universal implementation: the Intel 'anv' Mesa-based
driver does not implement display_swapchain, instead having an
interface to export a VkImage as a dmabuf. It's true that the latter
is not optimal (it lacks the explicit targeting required to determine
the most optimal tiling/compression strategy), but OTOH it is
precedent for explicitly avoiding the
VK_WSI_display_swapchain/EGLStreams model for Vulkan on KMS, just as
GBM avoids it for EGL on KMS.
I'm not sure a swapchain/stream is good enough, since the trouble really
starts when you have tons of hw planes and changing configurations.
Looking at individual streams instead of the global state is pointless in
that case.

Same for atomic, syncing multiple streams looks pretty tricky. And iirc
when I pinged Jakob Bornecrantz (who seems to know/like streams somewhat)
there's no way to achieve that.
Post by Daniel Stone
I think it's been good to have this series to push the discussion
further in more concrete terms, but unfortunately I have to say that
I'm even less convinced now than I have ever been. Sorry.
Well the thing that irks me is that this isn't aiming to build a common
platform. There's definitely issues with gbm/gralloc+kms+egl in upstream
repos, and vendors have hacked around those in all kinds of horrible ways.
But trying to fix this mess with yet another vendor-private solution just
doesn't help. Instead we need to fix what is there, for everyone, instead
of fragmenting more.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Daniel Stone
2016-03-22 21:52:21 UTC
Hi,
Post by Daniel Vetter
Post by Daniel Stone
'We don't know when to schedule decompression, because there's no
explicit barrier' - completely untrue. eglSwapBuffers is that barrier.
For example, in Freescale i.MX6, the Vivante GPU and Freescale IPU
(display controller) do not share a single common format between GPU
render targets and IPU scanout sources, so require a mandatory
detiling pass in between render and display. These work just fine with
gbm with that pass scheduled by eglSwapBuffers. This to me seems
completely explicit, unless there was something else you were meaning
... ?
There's display engines which can directly scan out buffers compressed by
the render engine. It's awesome, except randomly limited, so you need a
communication backchannel from your display driver all the way to your
buffer allocator thing on the client side. And depending upon luck you
really can't tell who should do the decompress pass for most optimal
result upfront.
I think on android the most common way to do that is to attach arbitrary
metadata with a hand-rolled ioctl to dma-buf fds, which is ofc horrible.
Imo the right way is to create a real platform and start to standardize
some of this stuff (fb modifier) more, so that we can pass it from kms to
gbm, then to compositor clients through either a generic transport or
private extensions. Or maybe we can mostly hide all that.
Right, at least with some (AFBC), just the buffer data + FB modifier
completely describes what you need to scan out transparently. Though
this is not the case for Intel and Tegra.
Post by Daniel Vetter
Post by Daniel Stone
'Width, height, pitch and format aren't enough information' - this is
true, but not necessarily relevant. I'm not sure what the source of
this actually is: is it the gbm_bo_get_*() APIs? If so, yes, they need
to be extended with a gbm_bo_get_modifier() call, which would allow
you to get the DRM format modifier to describe tiling/compression/et
al (as well as perhaps being extended to allow you to extract multiple
buffers/planes, e.g. to attach auxiliary compression buffers). If it's
not gbm, what actually is it? The only other place I can think of
(suggested by Pekka, I think) was the wl_drm protocol, which it should
be stressed is a) not required in any way by Wayland, b) not a
published/public protocol, c) not a stable protocol. wl_drm just
happens to be the way that Mesa shares buffers, just as wl_viv is how
Vivante's proprietary driver shares buffers, and mali_buffer_sharing
is how the Mali driver does it. Since the server side is bound by
eglBindWaylandDisplayWL and the client side is also only used through
EGL, there is _no_ requirement for you to also implement wl_drm. As it
is a hidden private Mesa protocol, there is also no requirement for
the protocol to remain stable.
So I've looked at what our own android folks all transport, and I think most of it
we can transport with the current addfb2.1 kms metadata. And we could even
add hints that kms atomic returns if a plane doesn't work with the most
preferred format that would just work in this config. Thus far I've
- compression formats that can't be easily described in addfb2.1 because
they allocate a side buffer in some fancy special memory. The solution
for that, discussed at xdc2014, was to use a dma-buf to wrap that
up, and then use it as an aux buffer (there's patches floating for that) with
normal addfb2.1.
Indeed, although sadly the current Intel patches go in the other
direction and use a driver-private plane property to describe the
current compression status. :( Hopefully the Tegra/Nouveau people are
able to prepare something which is usable from generic userspace.
Post by Daniel Vetter
Post by Daniel Stone
'EGLStreams is the direction taken in Vulkan' - I would argue not. IMO
the explicit buffer management on the client side does not parallel
EGLStreams, and notably there is no equivalent consumer interface
offered on the server side, but instead the individual-buffer-driven
approach is taken. It's true that VK_WSI_display_swapchain does exist
and does match the EGLStreams model fairly closely, but also that it
does not have universal implementation: the Intel 'anv' Mesa-based
driver does not implement display_swapchain, instead having an
interface to export a VkImage as a dmabuf. It's true that the latter
is not optimal (it lacks the explicit targeting required to determine
the most optimal tiling/compression strategy), but OTOH it is
precedent for explicitly avoiding the
VK_WSI_display_swapchain/EGLStreams model for Vulkan on KMS, just as
GBM avoids it for EGL on KMS.
I'm not sure a swapchain/stream is good enough, since the trouble really
starts when you have tons of hw planes and changing configurations.
Looking at individual streams instead of the global state is pointless in
that case.
True and irrelevant, at once. ;) You have to examine the global state
(as a compositor, just like HWComposer does) to determine the optimal
configuration, but to actually get that configuration to land, you
have to push down to individual clients, which means dealing with a
swapchain primitive. If you want to do seamless transitions and
reallocations, you need to get the client to gradually reallocate its
swapchain at a time convenient for it.

As to streams lacking atomicity et al, I do agree, and think the only
model which will actually work out is HWComposer.
Post by Daniel Vetter
Post by Daniel Stone
I think it's been good to have this series to push the discussion
further in more concrete terms, but unfortunately I have to say that
I'm even less convinced now than I have ever been. Sorry.
Well the thing that irks me is that this isn't aiming to build a common
platform. There's definitely issues with gbm/gralloc+kms+egl in upstream
repos, and vendors have hacked around those in all kinds of horrible ways.
But trying to fix this mess with yet another vendor-private solution just
doesn't help. Instead we need to fix what is there, for everyone, instead
of fragmenting more.
Agreed. One of the things I've been incredibly happy with is how our
platform has managed to stay completely generic and vendor-neutral so
far, and I'd love to preserve that.

Cheers,
Daniel
Andy Ritger
2016-03-23 00:33:57 UTC
Post by Daniel Stone
Hi,
[...]
Post by Daniel Stone
Post by Daniel Vetter
Post by Daniel Stone
I think it's been good to have this series to push the discussion
further in more concrete terms, but unfortunately I have to say that
I'm even less convinced now than I have ever been. Sorry.
Well the thing that irks me is that this isn't aiming to build a common
platform. There's definitely issues with gbm/gralloc+kms+egl in upstream
repos, and vendors have hacked around those in all kinds of horrible ways.
But trying to fix this mess with yet another vendor-private solution just
doesn't help. Instead we need to fix what is there, for everyone, instead
of fragmenting more.
Agreed. One of the things I've been incredibly happy with is how our
platform has managed to stay completely generic and vendor-neutral so
far, and I'd love to preserve that.
I don't think you'll find any disagreement to that from NVIDIA, either.

I apologize if the EGLStreams proposal gave the impression of a
vendor-private solution. That wasn't the intent. The EGLStream family
of extensions are, after all, an open specification that any EGL vendor
can implement. If there are aspects of any of these EGL extensions that
seem useful, I'd hope that Mesa would be willing to adopt them.

We (NVIDIA) clearly think EGLStreams is a good direction for expressing
buffer sharing semantics. In our ideal world, everyone would implement
these extensions and Wayland compositors would migrate to using them as
the generic vendor-neutral mechanism for buffer sharing :)

But, I'm also happy to discuss ways to incrementally improve gbm. I tried
Post by Daniel Stone
There's definitely issues with gbm/gralloc+kms+egl in upstream
repos, and vendors have hacked around those in all kinds of horrible ways.
Some examples were given earlier in this thread. What are some of the
other horrible hacks drivers have had to use today with gbm?

Thanks,
- Andy
Daniel Vetter
2016-03-23 10:48:01 UTC
Post by Andy Ritger
Post by Daniel Stone
Hi,
[...]
Post by Daniel Stone
Post by Daniel Vetter
Post by Daniel Stone
I think it's been good to have this series to push the discussion
further in more concrete terms, but unfortunately I have to say that
I'm even less convinced now than I have ever been. Sorry.
Well the thing that irks me is that this isn't aiming to build a common
platform. There's definitely issues with gbm/gralloc+kms+egl in upstream
repos, and vendors have hacked around those in all kinds of horrible ways.
But trying to fix this mess with yet another vendor-private solution just
doesn't help. Instead we need to fix what is there, for everyone, instead
of fragmenting more.
Agreed. One of the things I've been incredibly happy with is how our
platform has managed to stay completely generic and vendor-neutral so
far, and I'd love to preserve that.
I don't think you'll find any disagreement to that from NVIDIA, either.
I apologize if the EGLStreams proposal gave the impression of a
vendor-private solution. That wasn't the intent. The EGLStream family
of extensions are, after all, an open specification that any EGL vendor
can implement. If there are aspects of any of these EGL extensions that
seem useful, I'd hope that Mesa would be willing to adopt them.
We (NVIDIA) clearly think EGLStreams is a good direction for expressing
buffer sharing semantics. In our ideal world, everyone would implement
these extensions and Wayland compositors would migrate to using them as
the generic vendor-neutral mechanism for buffer sharing :)
But, I'm also happy to discuss ways to incrementally improve gbm. I tried
So I guess the top level issue with eglstreams+kms that at least I see is
that if we really want to do this, we would need to terminate the
eglstream in the kernel. Since with kms really only the kernel knows why
exactly a buffer isn't the right one, and what the producer should change
to get to a more optimal setup.

But the problem is that KMS is ABI and vendor-neutral, which means all
that fancy metadata that you want to attach would need to be standardized
in some way. And we'd need to have in-kernel eglstreams. So you'd face
both the problem of getting a new primitive into upstream (dma-buf took
massive efforts, same for fences going on now). And you'd lose the benefit
of eglstreams being able to encapsulate vendor metadata.

And we need to figure out how to standardize this a bit better even
without eglstreams, so that's why I don't really understand why eglstreams
has benefits. It's clearly a nice concept if you're in a world of
one-vendor-only, but that's not what KMS is aiming for really.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Andy Ritger
2016-03-24 17:06:04 UTC
Post by Daniel Vetter
Post by Andy Ritger
Post by Daniel Stone
Hi,
[...]
Post by Daniel Stone
Post by Daniel Vetter
Post by Daniel Stone
I think it's been good to have this series to push the discussion
further in more concrete terms, but unfortunately I have to say that
I'm even less convinced now than I have ever been. Sorry.
Well the thing that irks me is that this isn't aiming to build a common
platform. There's definitely issues with gbm/gralloc+kms+egl in upstream
repos, and vendors have hacked around those in all kinds of horrible ways.
But trying to fix this mess with yet another vendor-private solution just
doesn't help. Instead we need to fix what is there, for everyone, instead
of fragmenting more.
Agreed. One of the things I've been incredibly happy with is how our
platform has managed to stay completely generic and vendor-neutral so
far, and I'd love to preserve that.
I don't think you'll find any disagreement to that from NVIDIA, either.
I apologize if the EGLStreams proposal gave the impression of a
vendor-private solution. That wasn't the intent. The EGLStream family
of extensions are, after all, an open specification that any EGL vendor
can implement. If there are aspects of any of these EGL extensions that
seem useful, I'd hope that Mesa would be willing to adopt them.
We (NVIDIA) clearly think EGLStreams is a good direction for expressing
buffer sharing semantics. In our ideal world, everyone would implement
these extensions and Wayland compositors would migrate to using them as
the generic vendor-neutral mechanism for buffer sharing :)
But, I'm also happy to discuss ways to incrementally improve gbm. I tried
So I guess the top level issue with eglstreams+kms that at least I see is
that if we really want to do this, we would need to terminate the
eglstream in the kernel. Since with kms really only the kernel knows why
exactly a buffer isn't the right one, and what the producer should change
to get to a more optimal setup.
But the problem is that KMS is ABI and vendor-neutral, which means all
that fancy metadata that you want to attach would need to be standardized
in some way. And we'd need to have in-kernel eglstreams. So you'd face
both the problem of getting a new primitive into upstream (dma-buf took
massive efforts, same for fences going on now). And you'd lose the benefit
of eglstreams being able to encapsulate vendor metadata.
And we need to figure out how to standardize this a bit better even
without eglstreams, so that's why I don't really understand why eglstreams
has benefits. It's clearly a nice concept if you're in a world of
one-vendor-only, but that's not what KMS is aiming for really.
eglstreams or gbm or any other implementation aside, is it always _only_
the KMS driver that knows what the optimal configuration would be?
It seems like part of the decision could require knowledge of the graphics
hardware, which presumably the OpenGL/EGL driver is best positioned
to have.

For that aspect: would it be reasonable to execute hardware-specific
driver code in the drmModeAtomicCommit() call chain between the
application calling libdrm to make the atomic update, and the ioctl
into the kernel? Maybe that would be a call to libgbm that dispatches to
the hardware-specific gbm backend. However it is structured, having
hardware-specific graphics driver code execute as part of the flip
request might be one way to let the graphics driver piece and the display
driver piece coordinate on hardware specifics, without polluting the
application-facing API with hardware-specifics?
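
As a strawman of the shape I mean, where every name except
drmModeAtomicCommit() is hypothetical:

    /* Hypothetical: let the hardware-specific backend resolve detiling/
     * decompression before the real atomic ioctl is issued. */
    int vendor_atomic_commit(int fd, drmModeAtomicReq *req, uint32_t flags,
                             void *user_data)
    {
        gbm_backend_prepare_commit(gbm, req); /* hypothetical dispatch */
        return drmModeAtomicCommit(fd, req, flags, user_data);
    }
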
Post by Daniel Vetter
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Jasper St. Pierre
2016-03-24 18:43:51 UTC
On Thu, Mar 24, 2016 at 10:06 AM, Andy Ritger <***@nvidia.com> wrote:

... snip ...
Post by Andy Ritger
eglstreams or gbm or any other implementation aside, is it always _only_
the KMS driver that knows what the optimal configuration would be?
It seems like part of the decision could require knowledge of the graphics
hardware, which presumably the OpenGL/EGL driver is best positioned
to have.
Why would the OpenGL driver be the best thing to know about display
controller configuration? On a lot of ARM SoCs, the two are separate
modules, often provided by separate companies. For instance, the Mali
GPUs don't have display controllers, and the Mali driver is often
provided as a blob to vendors, who must use it with their custom-built
display controller.

Buffer allocation is currently done through DRI2 with the Mali blob,
so it's expected that the best allocation is done server-side in your
xf86-video-* driver.

I agree that we need somewhere better to hook up smart buffer
allocation, but OpenGL/EGL isn't necessarily the best place. We
decided a little while ago that a separate shared library and
interface designed to do buffer allocation that can be configured on a
per-hardware basis would be a better idea, and that's how gbm started
-- as a generic buffer manager.
Post by Andy Ritger
For that aspect: would it be reasonable to execute hardware-specific
driver code in the drmModeAtomicCommit() call chain between the
application calling libdrm to make the atomic update, and the ioctl
into the kernel? Maybe that would be a call to libgbm that dispatches to
the hardware-specific gbm backend. However it is structured, having
hardware-specific graphics driver code execute as part of the flip
request might be one way to let the graphics driver piece and the display
driver piece coordinate on hardware specifics, without polluting the
application-facing API with hardware-specifics?
Wait a minute. Once you're in commit, isn't that far too late for
hardware specifics? Aren't we talking about buffer allocation and
such, which would need to happen far, far before the commit? Or did I
miss something here?
Post by Andy Ritger
Post by Daniel Vetter
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
--
Jasper
Andy Ritger
2016-04-02 00:18:57 UTC
Post by Jasper St. Pierre
... snip ...
Post by Andy Ritger
eglstreams or gbm or any other implementation aside, is it always _only_
the KMS driver that knows what the optimal configuration would be?
It seems like part of the decision could require knowledge of the graphics
hardware, which presumably the OpenGL/EGL driver is best positioned
to have.
Why would the OpenGL driver be the best thing to know about display
controller configuration?
Sorry I was unclear: I didn't mean exclusively the display controller
configuration in the above. Rather, the combination of the display
controller configuration and the graphics rendering capabilities.
Post by Jasper St. Pierre
On a lot of ARM SoCs, the two are separate
modules, often provided by separate companies. For instance, the Mali
GPUs don't have display controllers, and the Mali driver is often
provided as a blob to vendors, who must use it with their custom-built
display controller.
Buffer allocation is currently done through DRI2 with the Mali blob,
so it's expected that the best allocation is done server-side in your
xf86-video-* driver.
I agree that we need somewhere better to hook up smart buffer
allocation, but OpenGL/EGL isn't necessarily the best place. We
decided a little while ago that a separate shared library and
interface designed to do buffer allocation that can be configured on a
per-hardware basis would be a better idea, and that's how gbm started
-- as a generic buffer manager.
OK.
Post by Jasper St. Pierre
Post by Andy Ritger
For that aspect: would it be reasonable to execute hardware-specific
driver code in the drmModeAtomicCommit() call chain between the
application calling libdrm to make the atomic update, and the ioctl
into the kernel? Maybe that would be a call to libgbm that dispatches to
the hardware-specific gbm backend. However it is structured, having
hardware-specific graphics driver code execute as part of the flip
request might be one way to let the graphics driver piece and the display
driver piece coordinate on hardware specifics, without polluting the
application-facing API with hardware-specifics?
Wait a minute. Once you're in commit, isn't that far too late for
hardware specifics? Aren't we talking about buffer allocation and
such, which would need to happen far, far before the commit? Or did I
miss something here?
I think I led the discussion off course with my previous response to
Daniel Vetter.

Definitely buffer allocation for the current frame can't be altered
at commit time. But, it seems to me like there is a class of graphics
hardware specifics that _are_ applicable to commit time: detiling, color
decompression, or any other sorts of graphics/display coherency that needs
to be resolved by the graphics driver. If the graphics driver were in the
commit call chain, then it would have the option to perform those sorts
of resolutions at commit time. This would, in turn, allow the graphics
driver to _not_ perform these sorts of resolutions (unnecessarily, and
potentially expensively) if the client-produced buffer were going to be
used by something other than display (e.g., texture).

Thanks,
- Andy
Post by Jasper St. Pierre
Post by Andy Ritger
Post by Daniel Vetter
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
--
Jasper
Daniel Vetter
2016-03-28 18:12:59 UTC
Post by Andy Ritger
Post by Daniel Vetter
Post by Andy Ritger
Post by Daniel Stone
Hi,
[...]
Post by Daniel Stone
Post by Daniel Vetter
Post by Daniel Stone
I think it's been good to have this series to push the discussion
further in more concrete terms, but unfortunately I have to say that
I'm even less convinced now than I have ever been. Sorry.
Well the thing that irks me is that this isn't aiming to build a common
platform. There's definitely issues with gbm/gralloc+kms+egl in upstream
repos, and vendors have hacked around those in all kinds of horrible ways.
But trying to fix this mess with yet another vendor-private solution just
doesn't help. Instead we need to fix what is there, for everyone, instead
of fragmenting more.
Agreed. One of the things I've been incredibly happy with is how our
platform has managed to stay completely generic and vendor-neutral so
far, and I'd love to preserve that.
I don't think you'll find any disagreement to that from NVIDIA, either.
I apologize if the EGLStreams proposal gave the impression of a
vendor-private solution. That wasn't the intent. The EGLStream family
of extensions are, after all, an open specification that any EGL vendor
can implement. If there are aspects of any of these EGL extensions that
seem useful, I'd hope that Mesa would be willing to adopt them.
We (NVIDIA) clearly think EGLStreams is a good direction for expressing
buffer sharing semantics. In our ideal world, everyone would implement
these extensions and Wayland compositors would migrate to using them as
the generic vendor-neutral mechanism for buffer sharing :)
But, I'm also happy to discuss ways to incrementally improve gbm. I tried
So I guess the top level issue with eglstreams+kms that at least I see is
that if we really want to do this, we would need to terminate the
eglstream in the kernel. Since with kms really only the kernel knows why
exactly a buffer isn't the right one, and what the producer should change
to get to a more optimal setup.
But the problem is that KMS is ABI and vendor-neutral, which means all
that fancy metadata that you want to attach would need to be standardized
in some way. And we'd need to have in-kernel eglstreams. So you'd face
both the problem of getting a new primitive into upstream (dma-buf took
massive efforts, same for fences going on now). And you'd lose the benefit
of eglstreams being able to encapsulate vendor metadata.
And we need to figure out how to standardize this a bit better even
without eglstreams, so that's why I don't really understand why eglstreams
has benefits. It's clearly a nice concept if you're in a world of
one-vendor-only, but that's not what KMS is aiming for really.
eglstreams or gbm or any other implementation aside, is it always _only_
the KMS driver that knows what the optimal configuration would be?
It seems like part of the decision could require knowledge of the graphics
hardware, which presumably the OpenGL/EGL driver is best positioned
to have.
Android agrees with that and stuffs all these decisions into hwc. And I
agree that there's cases with combinations of display block, 2d engine
and 3d engine where that full-system overview is definitely necessary. But
OpenGL still doesn't look like the right place to me. Something in-between
everything else, like hwc+gralloc on android (which has its own issues)
makes a lot more sense imo in a world where you can combine things wildly.

I do believe though that with just kms + sensible heuristics to allocate
surfaces to hw planes + some semi-clever fallback mechanism/hints (which
is what we currently lack) it should be possible to pull something off
without special-case vendor magic in hwc for every combination. That's
purely a conjecture though on my part, otoh no one has ever really tried
all that hard yet.
Post by Andy Ritger
For that aspect: would it be reasonable to execute hardware-specific
driver code in the drmModeAtomicCommit() call chain between the
application calling libdrm to make the atomic update, and the ioctl
into the kernel? Maybe that would be a call to libgbm that dispatches to
the hardware-specific gbm backend. However it is structured, having
hardware-specific graphics driver code execute as part of the flip
request might be one way to let the graphics driver piece and the display
driver piece coordinate on hardware specifics, without polluting the
application-facing API with hardware-specifics?
That's essentially the hwc interface, except much less powerful (since you
have no influence on the surface->plane assignment). I think it would be
better to add hwc support to weston (there's other people asking for
that), instead of inventing a new wheel. Also, hwc and upstream atomic
seem to be converging in some of the semantic details if my gossip sources
are correct ;-)

Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Daniel Stone
2016-03-29 16:41:15 UTC
Hi,
Post by Daniel Vetter
Post by Andy Ritger
eglstreams or gbm or any other implementation aside, is it always _only_
the KMS driver that knows what the optimal configuration would be?
It seems like part of the decision could require knowledge of the graphics
hardware, which presumably the OpenGL/EGL driver is best positioned
to have.
Android agrees with that and stuffs all these decisions into hwc. And I
agree that there's cases with combinations of display block, 2d engine
and 3d engine where that full-system overview is definitely necessary. But
OpenGL still doesn't look like the right place to me. Something in-between
everything else, like hwc+gralloc on android (which has its own issues)
makes a lot more sense imo in a world where you can combine things wildly.
Right. Samsung decided that answer was correct, and Tizen has the
Tizen Buffer Manager, which started off life as GBM with the copyright
notices filed off[0] and the addition of separate allocation
intended-use flags for 2D/blit and media decode engines. So for them,
GBM has mutated from the thing that knows about the intersection of
GPU + display, to the gralloc-like thing that can determine optimal
allocation strategies.

Unfortunately I don't expect to ever get meaningful input there, as I
only discovered its existence by semi-accident, back when you needed a
Tizen login to access it as well. It's only ever really been mentioned
in passing, and has no users outside Tizen (and I still don't know
what exactly uses their 'surface queue'). Oh well.
Post by Daniel Vetter
I do believe though that with just kms + sensible heuristics to allocate
surfaces to hw planes + some semi-clever fallback mechanism/hints (which
is what we currently lack) it should be possible to pull something off
without special-case vendor magic in hwc for every combination. That's
purely a conjecture though on my part, otoh no one has ever really tried
all that hard yet.
Another fun suggestion that came back would be feedback from the
atomic ioctl: when rejecting a configuration, optionally return a list
of property changes with which a future configuration would have a
larger chance of success (e.g. wider stride, different tiling mode).
Plumbing that back through to clients isn't beyond the realm of
reason, though would require more user-visible API. This is something
that would fit in quite nicely with the Weston atomic KMS
implementation, where we attempt to enlarge the configuration one
plane at a time: start with the primary plane, and attempt to place
every other scanout target on a plane, seeing at every turn if they
succeed or need to be punted down to GPU composition.
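
A sketch of that incremental approach using test-only commits and
libdrm's atomic cursor for rollback (add_view_to_plane and the
views/planes arrays are stand-ins for compositor internals):

    drmModeAtomicReq *req = drmModeAtomicAlloc();
    /* ... primary plane state added here ... */

    for (int i = 0; i < num_views; i++) {
        int cursor = drmModeAtomicGetCursor(req);
        add_view_to_plane(req, &views[i], &planes[i]); /* stand-in helper */

        if (drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_TEST_ONLY, NULL)) {
            /* Rejected: roll back and punt this view to GPU composition. */
            drmModeAtomicSetCursor(req, cursor);
            views[i].on_plane = false;
        }
    }
    drmModeAtomicCommit(fd, req, 0, NULL);
    drmModeAtomicFree(req);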

Cheers,
Daniel

[0]: tbm_bo_handle must be copied from gbm_bo_handle, because to write
that even once makes no sense; to write it independently is so
improbable as to be impossible.
https://review.tizen.org/git/?p=platform/core/uifw/libtbm.git;a=blob;f=src/tbm_bufmgr.h;h=7bf2597f3fee53d3b00ca7ba760675c977ba4435;hb=ecc409c142cd77b1d92cb35f444099e2c782b6ad
Andy Ritger
2016-03-23 00:12:52 UTC
Thanks for the thorough responses, Daniel.
Post by Daniel Stone
Hi Miguel,
Post by Miguel Angel Vico
First of all, I'd like to introduce myself to the Wayland community: my
name is Miguel A. Vico, and I've been working as a Software Engineer
for NVIDIA for some time now, more specifically in the Linux drivers
team. Although I've never spoken up here before, I've lately been
following the amazing work that you all have been doing.
Welcome!
I'm sorry I don't have some better news for you, but Andy and Aaron
can tell you it's not personal: this has been going on for years.
Yes, we expected this would be somewhat controversial. I appreciate you
looking at the patch series seriously, Daniel, and especially trying to
drill into the crux of the gbm concerns.
Post by Daniel Stone
Post by Miguel Angel Vico
In order to make the Weston DRM compositor work with our drivers, we
have used EGLDevice, EGLOutput, and EGLStream objects.
This is ... unfortunate. To echo what Daniel Vetter said, on the whole
these modesetting-in-EGL extensions are not something which has that
wide support, or even implementation. That being said, it's
interesting to have an implementation, because it has helped shape my
feelings and arguments a little, into something more concrete.
Post by Miguel Angel Vico
For those not familiar with this set of EGL structures, here I'll try to
summarize the most important parts of them, and how they would fit into
EGLDevice provides means to enumerate native devices, and then
create an EGL display connection from them.
This is generically useful: we would like to extend
eglGetPlatformDisplay to take an attrib naming an EGLDevice, which we
could then use with platform_gbm (to select GPU and scanout device
separately, either for multi-GPU systems or also for SoCs with
discrete GPU/dispc setups) as well as platform_wayland and co.
Post by Miguel Angel Vico
Similarly, EGLOutput will provide means to access different
portions of display control hardware associated with an EGLDevice.
For instance, EGLOutputLayer represents a portion of display
control hardware that accepts an image as input and processes it
for presentation on a display device.
I still struggle to see the value of what is essentially an
abstraction over KMS, but oh well.
The intent wasn't to abstract all of KMS, just the surface presentation
aspect where EGL and KMS intersect. Besides the other points below,
an additional motivation for abstraction is to allow EGL to work with
the native modesetting APIs on other platforms (e.g., OpenWF on QNX).
Post by Daniel Stone
Post by Miguel Angel Vico
EGLStream implements a mechanism to connect frame producers with
frame consumers. By attaching an EGLOutputLayer consumer to a
stream, a producer will be able to present frames on a display
device.
This is understating things quite a bit, I think. On the
Wayland-client side, it's a pretty big change from the EGLSurface
model, particularly if you use the default mailbox mode (see comments
on patch 4/7 as to how this breaks real-world setups, AFAICT).
This shouldn't result in a change on the client side: libnvidia-wayland-egl.so
abstracts away the DRM buffer versus EGLStream differences. These patches
have no effect on the behavior of clients, barring bugs.
Post by Daniel Stone
On the Wayland-compositor side, it's two _huge_ changes.
Firstly, again looking at the case where a Wayland client is a stream
producer and the Wayland compositor is a consumer, we move from a
model where references to individual buffers are explicitly passed
through the Wayland protocol, to where those buffers merely carry a
reference to a stream. Again, as stated in the review of 4/7, that
looks like it has the potential to break some actual real-world cases,
and I have no idea how to solve it, other than banning mailbox mode,
which would seem to mostly defeat the point of Streams (more on that
below).
Streams are just a transport for frames. The client still explicitly
communicates when a frame is delivered through the stream via wayland
protocol, and the compositor controls when it grabs a new frame, via
eglStreamConsumerAcquireKHR(). Unless there are bugs in the patches,
the flow of buffers is still explicit and fully under the wayland protocol
and compositor's control.

Also, mailbox mode versus FIFO mode should essentially equate to Vsync
off versus Vsync on, respectively. It shouldn't have anything to do
with the benefits of streams, but mailbox mode is a nice feature for
benchmarking games/simulations or naively displaying your latest &
greatest content without tearing.
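
For reference, the knob selecting between the two is the FIFO length at
stream creation (EGL_KHR_stream_fifo; length 0, the default, gives
mailbox behavior):

    /* FIFO mode: roughly vsync-on; the producer blocks when the queue
     * is full. */
    EGLint fifo_attribs[] = { EGL_STREAM_FIFO_LENGTH_KHR, 1, EGL_NONE };
    EGLStreamKHR fifo_stream = eglCreateStreamKHR(dpy, fifo_attribs);

    /* Mailbox mode: length 0 (the default); the newest frame replaces
     * any undelivered one. */
    EGLStreamKHR mailbox_stream = eglCreateStreamKHR(dpy, NULL);
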
Post by Daniel Stone
Secondly, looking at the compositor-drm case, the use of the dumb
buffer to display undefined content as a dummy modeset really makes me
uneasy,
Yes, the use of the dumb buffer in this patch series is a kludge. If we
were going to use drmModeSetCrtc + EGLStreams, I think we'd want to
pass no fb to drmModeSetCrtc, but that currently gets rejected by DRM.
Are surface-less modesets intended to be allowable in DRM? I can hunt
that down if that is intended to work. Of course, better to work out
how EGLStreams should cooperate with atomic KMS.

It was definitely an oversight to not zero initialize the dumb buffer.
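
For what it's worth, creating and zero-filling the dumb buffer takes
only the standard ioctls (sketch; error handling omitted):

    struct drm_mode_create_dumb creq = { .width = w, .height = h,
                                         .bpp = 32 };
    drmIoctl(fd, DRM_IOCTL_MODE_CREATE_DUMB, &creq);

    struct drm_mode_map_dumb mreq = { .handle = creq.handle };
    drmIoctl(fd, DRM_IOCTL_MODE_MAP_DUMB, &mreq);

    void *map = mmap(NULL, creq.size, PROT_READ | PROT_WRITE, MAP_SHARED,
                     fd, mreq.offset);
    memset(map, 0, creq.size); /* defined (black) contents */

    uint32_t fb_id;
    drmModeAddFB(fd, w, h, 24, 32, creq.pitch, creq.handle, &fb_id);
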
Post by Daniel Stone
again because both gl-renderer and compositor-drm are written
for explicit individual buffer management, rather than streams in +
streams out. I think the combination of the two pushes them long
beyond the point of readability, and I'd encourage you to look at
trying to split those files up, or at least the functions within them.
Attempting to keep both modes in there just looks like a maintenance
nightmare, especially when this streams implementation
(unsurprisingly) has to bypass almost the entire runtime (as opposed
to init-time) functionality of compositor-drm.
Also, I'm not quite sure how you're testing the compositor-as-consumer
mode: I can't seem to see any EGL extensions which allow you to
connect a Wayland surface as an EGLStream consumer. Do you have
something else unpublished that's being used here, or is this what the
libnvidia-egl-wayland library is for? Or do you just have clients
using EGLSurfaces as normal, which happen to be implemented internally
as EGLStreams? (Also, that the only way to test this is through
proprietary drivers implementing only-just-published extensions not
only makes me very sad, but hugely increases the potential for this to
be inadvertently broken.)
Sorry if this seemed cryptic. You are correct that EGL Wayland clients
just use EGLSurfaces as normal (no Wayland client changes), and that
gets implemented using EGLStreams within libnvidia-egl-wayland.

FWIW, we plan to release the source to libnvidia-egl-wayland
eventually... it has a few driver-specific warts right now, but the
intent is that it is a vendor-independent implementation (though, using
EGLStreams, so...) of EGL_KHR_platform_wayland using a set of EGL API
"wrappers". The goal was to allow window systems to write these EGL
platform bindings themselves, so that each EGL implementation doesn't
have to implement each EGL_KHR_platform_*. Anyway, we'll try to get
libnvidia-egl-wayland cleaned up and released.
Post by Daniel Stone
Post by Miguel Angel Vico
Thus, a compositor could produce frames and feed them to an
EGLOutputLayer through an EGLStream for presentation on a display
device.
In a similar way, by attaching a GLTexture consumer to a stream, a
producer (wayland client) could feed frames to a texture, which in
turn can be used by a compositor to prepare the final frame to be
presented.
Quick aside: this reminds me in many unfortunate ways of
GLX_EXT_texture_from_pixmap. tfp gave us the same 'capture stream of
stuff and make it appear in a texture' model as streams, whereas most
of the rest of the world (EGL, Vulkan WSI, Wayland, Android, ChromeOS,
etc) have all moved explicitly _away_ from that model to passing
references to individual buffers, this in many ways brings us back to
tfp.
Is that really an accurate comparison? The texture_from_pixmap extension
let X11 composite managers bind a single X pixmap to an OpenGL texture.
It seems to me what was missing in TFP usage was explicit synchronization
between X and/or OpenGL rendering into the pixmap and OpenGL texturing
from the pixmap.
Post by Daniel Stone
Post by Miguel Angel Vico
Whenever EGL_EXT_device_drm extension is present, EGLDevice can
be used to enumerate and access DRM KMS devices, and EGLOutputLayer
to enumerate and access DRM KMS crtcs and planes.
Again, the enumeration isn't so much used as bypassed. The original
enumeration is used, and all we do with the EGL objects is a) list all
of them, b) filter them to find the one we already have, and c)
perhaps replace their internal representation of the device with the
one we already have.
That's fair in the context of this patch set.

In general, EGLDevice provides device enumeration for other use cases
where it is the basis for bootstrapping. Maybe we could better reconcile
udev and EGLDevice in the patch set, but some of this is a natural, though
unfortunate, artifact of correlating objects between two enumeration APIs.
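
For example, bootstrapping from EGLDevice and correlating with udev looks
roughly like this (assuming EGL_EXT_device_base and EGL_EXT_device_drm;
drm_node_from_udev is a placeholder for whatever node udev handed us):

    EGLDeviceEXT devices[16];
    EGLint n = 0;
    eglQueryDevicesEXT(16, devices, &n);

    for (EGLint i = 0; i < n; i++) {
        /* Correlate the EGLDevice with the udev enumeration by
         * comparing DRM device file names. */
        const char *node =
            eglQueryDeviceStringEXT(devices[i], EGL_DRM_DEVICE_FILE_EXT);
        if (node && strcmp(node, drm_node_from_udev) == 0) {
            dpy = eglGetPlatformDisplayEXT(EGL_PLATFORM_DEVICE_EXT,
                                           devices[i], NULL);
            break;
        }
    }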
Post by Daniel Stone
Post by Miguel Angel Vico
By using EGLStreams and attaching an EGLOutputLayer consumer
(representing a DRM KMS crtc or plane) to it, compositor-drm can
produce final composition frames and present them on a DRM device.
Arguably it's gl-renderer producing the frames, with compositor-drm
kind of acting as a fake consumer (EGL_NV_stream_attrib).
Post by Miguel Angel Vico
Additionally, in order to allow wl_buffers to be bound to EGLStreams, we
kludged eglQueryWaylandBufferWL(EGL_WAYLAND_BUFFER_WL) to return the
stream file descriptor.
As said earlier, I don't think this is the right way to go, and have
other suggestions.
I'd like to look at the elephant in the room, which is why you're
using this in the first place (aside from general NVIDIA enthusiasm
for encapsulating everything within EGL Streams/Output/Device/etc,
dating back many years). Andy/Aaron, you've said that you found GBM to
be inadequate, and I'd like to find out explicitly how.
Thanks. This is the real heart of the debate.
Post by Daniel Stone
Through a few
'We can't choose an optimal rendering configuration, because we don't
know how it's going to be used' - (almost completely) untrue. The FD
you pass to gbm_device_create is that of the KMS device, a gbm_surface
contains information as to how the plane (primary or overlay) will be
configured,
Maybe I'm not looking in the right place, but where does gbm_surface get
the intended plane configuration? Are there other display-related flags
besides GBM_BO_USE_SCANOUT? Then again, the particular plane doesn't
impact us for current GPUs.
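
For comparison, this is all the display-related information gbm takes at
allocation time today, as far as I can see:

    /* Usage flags are the only display-related hint; which plane (if
     * any) the buffer ends up on is decided later by the compositor. */
    struct gbm_surface *surf =
        gbm_surface_create(gbm, width, height,
                           GBM_FORMAT_XRGB8888,
                           GBM_BO_USE_SCANOUT | GBM_BO_USE_RENDERING);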
Post by Daniel Stone
and an EGLDisplay lets you tie the rendering and scanout
devices together. What more information do you need? It's true that we
don't have a way to select individual rendering devices at the moment,
but as said earlier, passing an EGLDevice as an attrib to
GetPlatformDisplay would resolve that, as you would have the render
device identified by the EGLDevice and the scanout device identified
by the gbm_device. At that point, you have the full pipeline and can
determine the optimal configuration.
Beyond choosing optimal rendering configuration, there is arbitration of
the scarce resources needed for optimal rendering configuration. E.g.,
for Wayland compositor flipping to client-produced buffers, presumably the
client's buffer needs to be allocated with GBM_BO_USE_SCANOUT. NVIDIA's
display hardware requires physically contiguous buffers, so we wouldn't
want clients to _always_ allocate buffers with the GBM_BO_USE_SCANOUT
flag. It would be nice to have feedback between the EGL driver instance
in the compositor and the EGL driver running in the client, to know how
the buffer is going to be used by the Wayland compositor.

I imagine other hardware has even more severe constraints on displayable
memory, though, so maybe I'm misunderstanding something about how buffers
are shared between wayland clients and compositors?

This ties into the next point...
Post by Daniel Stone
'We don't know when to schedule decompression, because there's no
explicit barrier' - completely untrue. eglSwapBuffers is that barrier.
For example, in Freescale i.MX6, the Vivante GPU and Freescale IPU
(display controller) do not share a single common format between GPU
render targets and IPU scanout sources, so require a mandatory
detiling pass in between render and display. These work just fine with
gbm with that pass scheduled by eglSwapBuffers. This to me seems
completely explicit, unless there was something else you were meaning
... ?
The Vivante+Freescale example is a good one, but it would be more
interesting if they shared /some/ formats and you could only use those
common formats in /some/ cases.

I think a lot of the concern is about passing client-produced frames
all the way through to scanout (i.e., zero-copy). E.g., if the wayland
client is producing frames that the wayland compositor is going to use
as a texture, then we don't want the client to decompress as part of its
eglSwapBuffers: the wayland compositor will texture from the compressed
frame for best performance. But, if the wayland compositor is going to
flip to the surface, then we would want the client to decompress during
its eglSwapBuffers.

The nice thing about EGLStreams here is that if the consumer (the Wayland
compositor) wants to use the content in a different way, the producer
must be notified first, in order to produce something suitable for the
new consumer.
Post by Daniel Stone
'Width, height, pitch and format aren't enough information' - this is
true, but not necessarily relevant. I'm not sure what the source of
this actually is: is it the gbm_bo_get_*() APIs? If so, yes, they need
to be extended with a gbm_bo_get_modifier() call, which would allow
you to get the DRM format modifier to describe tiling/compression/et
al (as well as perhaps being extended to allow you to extract multiple
buffers/planes, e.g. to attach auxiliary compression buffers). If it's
not gbm, what actually is it? The only other place I can think of
(suggested by Pekka, I think) was the wl_drm protocol, which it should
be stressed is a) not required in any way by Wayland, b) not a
published/public protocol, c) not a stable protocol. wl_drm just
happens to be the way that Mesa shares buffers, just as wl_viv is how
Vivante's proprietary driver shares buffers, and mali_buffer_sharing
is how the Mali driver does it. Since the server side is bound by
eglBindWaylandDisplayWL and the client side is also only used through
EGL, there is _no_ requirement for you to also implement wl_drm. As it
is a hidden private Mesa protocol, there is also no requirement for
the protocol to remain stable.
I agree that wl_drm doesn't factor into it.

Maybe some of this is my confusion over what parts of gbm.h are
application-facing, and what parts are driver-facing? We, and
presumably most hardware vendors, would want the ability to associate
arbitrary metadata with gbm_bo's, but most of that metadata is
implementation-specific, and not really something an application should
be looking at without sacrificing portability.
Post by Daniel Stone
'EGLStreams is the direction taken in Vulkan' - I would argue not. IMO
the explicit buffer management on the client side does not parallel
EGLStreams, and notably there is no equivalent consumer interface
offered on the server side, but instead the individual-buffer-driven
approach is taken. It's true that VK_WSI_display_swapchain does exist
and does match the EGLStreams model fairly closely, but also that it
does not have universal implementation: the Intel 'anv' Mesa-based
driver does not implement display_swapchain, instead having an
interface to export a VkImage as a dmabuf. It's true that the latter
is not optimal (it lacks the explicit targeting required to determine
the most optimal tiling/compression strategy), but OTOH it is
precedent for explicitly avoiding the
VK_WSI_display_swapchain/EGLStreams model for Vulkan on KMS, just as
GBM avoids it for EGL on KMS.
From your perspective, what would be more optimal than VkImage+dmabuf?
Post by Daniel Stone
I think it's been good to have this series to push the discussion
further in more concrete terms, but unfortunately I have to say that
I'm even less convinced now than I have ever been. Sorry.
Thanks for the feedback so far.
- Andy
Post by Daniel Stone
Cheers,
Daniel
Daniel Vetter
2016-03-23 10:17:48 UTC
Permalink
Post by Andy Ritger
Thanks for the thorough responses, Daniel.
Post by Daniel Stone
Secondly, looking at the compositor-drm case, the use of the dumb
buffer to display undefined content as a dummy modeset really makes me
uneasy,
Yes, the use of dumb buffer in this patch series is a kludge. If we
were going to use drmModeSetCrtc + EGLStreams, I think we'd want to
pass no fb to drmModeSetCrtc, but that currently gets rejected by DRM.
Are surface-less modesets intended to be allowable in DRM? I can hunt
that down if that is intended to work. Of course, better to work out
how EGLStreams should cooperate with atomic KMS.
Side comment: With universal planes (and hence atomic) you can light up a
CRTC without any planes enabled, if your hw can do it. It's supposed to be
black in that case, and there are some patches floating around to control
the background colour (if anyone ever wants to change that ...).
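
Roughly, assuming the property ids have already been looked up with
drmModeObjectGetProperties and the mode blob created:

    drmModeAtomicReq *req = drmModeAtomicAlloc();
    /* Full modeset, but no plane state touched at all. */
    drmModeAtomicAddProperty(req, crtc_id, active_prop, 1);
    drmModeAtomicAddProperty(req, crtc_id, mode_id_prop, mode_blob_id);
    drmModeAtomicAddProperty(req, conn_id, crtc_id_prop, crtc_id);
    drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_ALLOW_MODESET, NULL);
    drmModeAtomicFree(req);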
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Carsten Haitzler (The Rasterman)
2016-03-24 23:52:31 UTC
Permalink
Post by Andy Ritger
Maybe I'm not looking in the right place, but where does gbm_surface get
the intended plane configuration? Are there other display-related flags
besides GBM_BO_USE_SCANOUT? Then again, the particular plane doesn't
impact us for current GPUs.
however you will not know the intended plane config because a compositor will
make this choice long after a buffer is allocated. it has received buffers from
clients and now has to choose how best to display this current screen setup
based on the input. it may use gpu to render, may assign buffers for scanout,
or anything else. the point is the layout may and often WILL change long after
the buffer has been allocated and even rendered to (or at least rendering has
started with fences able to ensure sync with an ongoing render).

so at best you can query current config - this is not totally correct and
streams don't solve this either. it's a fundamental issue that if you want real
optimal layout, you need an explicit protocol at a higher layer.
Post by Andy Ritger
Beyond choosing optimal rendering configuration, there is arbitration of
the scarce resources needed for optimal rendering configuration. E.g.,
for Wayland compositor flipping to client-produced buffers, presumably the
client's buffer needs to be allocated with GBM_BO_USE_SCANOUT. NVIDIA's
display hardware requires physically contiguous buffers, so we wouldn't
want clients to _always_ allocate buffers with the GBM_BO_USE_SCANOUT
flag. It would be nice to have feedback between the EGL driver instance
in the compositor and the EGL driver running in the client, to know how
the buffer is going to be used by the Wayland compositor.
I imagine other hardware has even more severe constraints on displayable
memory, though, so maybe I'm misunderstanding something about how buffers
are shared between wayland clients and compositors?
same thing as above. you really cannot do this at the egl level because you
don't know that usage scenario beforehand. this really needs to be at a higher
level likely with an explicit wayland protocol and client-side co-operation.

for example. let's pretend that we have hardware with a fixed limited number of
hw planes. 1 is limited 256x256 argb (cursor), 1 is yuv only (can scale and
rotate 90 degrees), 2 are yuv or rgba (can scale and rotate), and 1 is rgba
only (can scale and rotate).

you have 5 applications drawing stuff. some apps display some video, some
not... do you really want all apps to split up their rendering into subsurfaces
AND thus scanout capable buffers? unlikely. you do not have enough planes to
support this. so the compositor likely wants to send "hints" to clients as to
how many buffers may be available for them and what capabilities they have. the
compositor may choose to hide the cursor layer because it's busy using it for
the cursor. :) clients can break up their display into lots of subsurfaces and
buffers - eg render browser content separately from chrome so it could
pan/scroll the content simply by offsetting a larger buffer and not
re-rendering. if one client becomes fullscreen/maximized, the compositor may
choose to tell all clients that they now can't display except this one, and
tell this one that it has 4 planes available, so the fullscreen client can
maximize efficiency, whilst the other hidden clients can stop using scanout
capable buffers (because they likely only will be displayed when task switching
as thumbnails etc. and thus only need memory the gpu can use as a texture).

but all of this would be much higher level that percolates up into the
toolkit/widget set and even client logic directly. it would require some time
for clients to adapt and re-render.

i just don't think you can make this all magically perfect at purely the egl or
kms or drm etc. layer. these layers are simple and explicit. compositor will do
a "best effort" given the buffer inputs it has. if you want this more optimal
you need to tell clients much more and then hope toolkits etc. respond.
Post by Andy Ritger
This ties into the next point...
The Vivante+Freescale example is a good one, but it would be more
interesting if they shared /some/ formats and you could only use those
common formats in /some/ cases.
I think a lot of the concern is about passing client-produced frames
all the way through to scanout (i.e., zero-copy). E.g., if the wayland
client is producing frames that the wayland compositor is going to use
as a texture, then we don't want the client to decompress as part of its
eglSwapBuffers: the wayland compositor will texture from the compressed
frame for best performance. But, if the wayland compositor is going to
flip to the surface, then we would want the client to decompress during
its eglSwapBuffers.
correct, but as above... there is no way the client WILL know what WILL be done
because that decision is made much later. long after client has allocated and
rendered its frame. the compositor now reacts to this input and makes a
decision (and may change its decision frame by frame).

it's an inefficiency then to de-tile and re-tile (or compress then
decompress ... etc.). there really should be a compositor to client hinting
protocol that covers how many subsurfaces might be best, what formats might be
best etc. etc. - e.g. in this case if there are many surfaces on screen the
compositor might just tell all clients "please stick to 1 surface with argb, no
scanout" and at least until all clients re-draw and copy/convert their buffers
into non-scanout buffers there is a cost to display (de-tile/de-compress). too
bad. then once all clients have adapted, things work better.
Post by Andy Ritger
The nice thing about EGLStreams here is that if the consumer (the Wayland
compositor) wants to use the content in a different way, the producer
must be notified first, in order to produce something suitable for the
new consumer.
that's the problem... the compositor (consumer) makes this decision LATER, not
BEFORE. :) things have to work, efficiently or not, regardless of the
compositor (consumer) decisions. adapting to become more efficient is far more
than a stream of 1 surface and a stream of buffers.
--
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler) ***@rasterman.com
Andy Ritger
2016-04-02 00:21:23 UTC
Permalink
Post by Carsten Haitzler (The Rasterman)
Post by Andy Ritger
Maybe I'm not looking in the right place, but where does gbm_surface get
the intended plane configuration? Are there other display-related flags
besides GBM_BO_USE_SCANOUT? Then again, the particular plane doesn't
impact us for current GPUs.
however you will not know the intended plane config because a compositor will
make this choice long after a buffer is allocated. it has received buffers from
clients and now has to choose how best to display this current screen setup
based on the input. it may use gpu to render, may assign buffers for scanout,
or anything else. the point is the layout may and often WILL change long after
the buffer has been allocated and even rendered to (or at least rendering has
started with fences able to ensure sync with an ongoing render).
so at best you can query current config - this is not totally correct and
streams don't solve this either. it's a fundamental issue that if you want real
optimal layout, you need an explicit protocol at a higher layer.
Thanks. Sorry, I think I steered the discussion wrong with the talk
of plane configuration. I was only asking a clarifying question about
Daniel's speculation as to what we were concerned about in gbm.

As-is, yes, plane configuration is the domain of the Wayland compositor,
and the EGLStreams proposal doesn't alter that. The point of the
EGLStreams proposal is to make sure that the driver performing the
hw-specific details of the buffer allocation has a complete picture of
how the buffer will be used. Of course, your point that the usage could
change dynamically is good.
Post by Carsten Haitzler (The Rasterman)
Post by Andy Ritger
Beyond choosing optimal rendering configuration, there is arbitration of
the scarce resources needed for optimal rendering configuration. E.g.,
for Wayland compositor flipping to client-produced buffers, presumably the
client's buffer needs to be allocated with GBM_BO_USE_SCANOUT. NVIDIA's
display hardware requires physically contiguous buffers, so we wouldn't
want clients to _always_ allocate buffers with the GBM_BO_USE_SCANOUT
flag. It would be nice to have feedback between the EGL driver instance
in the compositor and the EGL driver running in the client, to know how
the buffer is going to be used by the Wayland compositor.
I imagine other hardware has even more severe constraints on displayable
memory, though, so maybe I'm misunderstanding something about how buffers
are shared between wayland clients and compositors?
same thing as above. you really cannot do this at the egl level because you
don't know that usage scenario beforehand. this really needs to be at a higher
level likely with an explicit wayland protocol and client-side co-operation.
for example. let's pretend that we have hardware with a fixed limited number of
hw planes. 1 is limited 256x256 argb (cursor), 1 is yuv only (can scale and
rotate 90 degrees), 2 are yuv or rgba (can scale and rotate), and 1 is rgba
only (can scale and rotate).
you have 5 applications drawing stuff. some apps display some video, some
not... do you really want all apps to split up their rendering into subsurfaces
AND thus scanout capable buffers? unlikely. you do not have enough planes to
support this. so the compositor likely wants to send "hints" to clients as to
how many buffers may be available for them and what capabilities they have. the
compositor may choose to hide the cursor layer because it's busy using it for
the cursor. :) clients can break up their display into lots of subsurfaces and
buffers - eg render browser content separately from chrome so it could
pan/scroll the content simply by offsetting a larger buffer and not
re-rendering. if one client becomes fullscreen/maximized, the compositor may
choose to tell all clients that they now can't display except this one, and
tell this one that it has 4 planes available, so the fullscreen client can
maximize efficiency, whilst the other hidden clients can stop using scanout
capable buffers (because they likely only will be displayed when task switching
as thumbnails etc. and thus only need memory the gpu can use as a texture).
but all of this would be much higher level that percolates up into the
toolkit/widget set and even client logic directly. it would require some time
for clients to adapt and re-render.
i just don't think you can make this all magically perfect at purely the egl or
kms or drm etc. layer. these layers are simple and explicit. compositor will do
a "best effort" given the buffer inputs it has. if you want this more optimal
you need to tell clients much more and then hope toolkits etc. respond.
I'm all for pushing decision making higher in the software stack,
in general. But for the sorts of things you describe above, it seems
like a lot of complexity to impose on clients. For fully optimizing
the plane usage, I wonder if a HWC-like solution is a better way to go.

But in any case, I didn't mean to get into plane usage decisions.
The EGLStreams proposal is meant to keep plane usage decisions where
they currently are in compositors.
Post by Carsten Haitzler (The Rasterman)
Post by Andy Ritger
This ties into the next point...
The Vivante+Freescale example is a good one, but it would be more
interesting if they shared /some/ formats and you could only use those
common formats in /some/ cases.
I think a lot of the concern is about passing client-produced frames
all the way through to scanout (i.e., zero-copy). E.g., if the wayland
client is producing frames that the wayland compositor is going to use
as a texture, then we don't want the client to decompress as part of its
eglSwapBuffers: the wayland compositor will texture from the compressed
frame for best performance. But, if the wayland compositor is going to
flip to the surface, then we would want the client to decompress during
its eglSwapBuffers.
correct, but as above... there is no way the client WILL know what WILL be done
because that decision is made much later. long after client has allocated and
rendered its frame. the compositor now reacts to this input and makes a
decision (and may change its decision frame by frame).
Agreed that the usage can change dynamically. And we should make sure
that things don't fall off a cliff when the usage changes. But, I think
the important performance case is the steady state.

Thanks,
- Andy
Post by Carsten Haitzler (The Rasterman)
it's an inefficiency then to de-tile and re-tile (or compress then
decompress ... etc.). there really should be a compositor to client hinting
protocol that covers how many subsurfaces might be best, what formats might be
best etc. etc. - e.g. in this case if there are many surfaces on screen the
compositor might just tell all clients "please stick to 1 surface with argb, no
scanout" and at least until all clients re-draw and copy/convert their buffers
into non-scanout buffers there is a cost to display (de-tile/de-compress). too
bad. then once all clients have adapted, things work better.
Post by Andy Ritger
The nice thing about EGLStreams here is that if the consumer (the Wayland
compositor) wants to use the content in a different way, the producer
must be notified first, in order to produce something suitable for the
new consumer.
that's the problem... the compositor (consumer) makes this decision LATER, not
BEFORE. :) things have to work, efficiently or not, regardless of the
compositor (consumer) decisions. adapting to become more efficient is far more
than a stream of 1 surface and a stream of buffers.
--
------------- Codito, ergo sum - "I code, therefore I am" --------------
Daniel Stone
2016-03-29 16:44:41 UTC
Permalink
Hi Andy,
Post by Andy Ritger
Thanks for the thorough responses, Daniel.
No problem; as I said, I'm actually really happy to see an
implementation out there.
Post by Andy Ritger
Post by Daniel Stone
Post by Miguel Angel Vico
Similarly, EGLOutput will provide means to access different
portions of display control hardware associated with an EGLDevice.
For instance, EGLOutputLayer represents a portion of display
control hardware that accepts an image as input and processes it
for presentation on a display device.
I still struggle to see the value of what is essentially an
abstraction over KMS, but oh well.
The intent wasn't to abstract all of KMS, just the surface presentation
aspect where EGL and KMS intersect. Besides the other points below,
an additional motivation for abstraction is to allow EGL to work with
the native modesetting APIs on other platforms (e.g., OpenWF on QNX).
Fair enough. And, ah, _that's_ where the OpenWF implementation is - I
was honestly unsure for years since the last implementation I saw was
from the ex-Hybrid NVIDIA guys in Helsinki, back when it was aimed at
Series 60.
Post by Andy Ritger
Post by Daniel Stone
Firstly, again looking at the case where a Wayland client is a stream
producer and the Wayland compositor is a consumer, we move from a
model where references to individual buffers are explicitly passed
through the Wayland protocol, to where those buffers merely carry a
reference to a stream. Again, as stated in the review of 4/7, that
looks like it has the potential to break some actual real-world cases,
and I have no idea how to solve it, other than banning mailbox mode,
which would seem to mostly defeat the point of Streams (more on that
below).
Streams are just a transport for frames. The client still explicitly
communicates when a frame is delivered through the stream via wayland
protocol, and the compositor controls when it grabs a new frame, via
eglStreamConsumerAcquireKHR(). Unless there are bugs in the patches,
the flow of buffers is still explicit and fully under the wayland protocol
and compositor's control.
Right, I believe if you have FIFO mode and strictly enforce
synchronisation to wl_surface::frame, then you should be safe. Mailbox
mode or any other kind of SwapInterval(0) equivalent opens you up to a
series of issues.
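
I.e. the usual frame-callback throttling, sketched here with can_draw as
the client's own bookkeeping flag:

    static int can_draw = 1;

    static void frame_done(void *data, struct wl_callback *cb,
                           uint32_t time)
    {
        wl_callback_destroy(cb);
        *(int *)data = 1; /* compositor used the frame; draw again */
    }
    static const struct wl_callback_listener frame_listener = {
        frame_done
    };

    /* Before committing each frame: */
    struct wl_callback *cb = wl_surface_frame(surface);
    wl_callback_add_listener(cb, &frame_listener, &can_draw);
    can_draw = 0;
    eglSwapBuffers(dpy, egl_surface); /* attaches + commits */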
Post by Andy Ritger
Also, mailbox mode versus FIFO mode should essentially equate to Vsync
off versus Vsync on, respectively. It shouldn't have anything to do
with the benefits of streams, but mailbox mode is a nice feature for
benchmarking games/simulations or naively displaying your latest &
greatest content without tearing.
I agree it's definitely a nice thing to have, but it does bring up the
serialisation issue: we expect any configuration performed by the
client (say, wl_surface::set_opaque_area to let the compositor know
where it can disable blending) to be fully in-line with buffer
attachment. The extreme case of this is resize, but there are quite a
few valid cases where you need serialisation.

I don't know quite off the top of my head how you'd support mailbox
mode with Streams, given this constraint - you need three-way feedback
between the compositor (recording all associated surface state,
including subsurfaces), clients (recording the surface state valid
when that buffer was posted), and the Streams implementation
(determining which frames to dequeue, which to discard and return to
the client, etc).
Post by Andy Ritger
Post by Daniel Stone
Secondly, looking at the compositor-drm case, the use of the dumb
buffer to display undefined content as a dummy modeset really makes me
uneasy,
Yes, the use of dumb buffer in this patch series is a kludge. If we
were going to use drmModeSetCrtc + EGLStreams, I think we'd want to
pass no fb to drmModeSetCrtc, but that currently gets rejected by DRM.
Are surface-less modesets intended to be allowable in DRM? I can hunt
that down if that is intended to work. Of course, better to work out
how EGLStreams should cooperate with atomic KMS.
It was definitely an oversight to not zero initialize the dumb buffer.
Right, atomic allows you separate pipe/CRTC configuration from
plane/overlay configuration. So you'd have two options: one is to use
atomic and require the CRTC be configured with planes off before using
Streams to post flips, and the other is to add KMS configuration to
the EGL output.

Though, now I think of it, this effectively precludes one case, which
is scaling a Streams-sourced buffer inside the display controller. In
the GBM case, the compositor gets every buffer, so can configure the
plane scaling in line with buffer display. I don't see how you'd do
that with Streams.

There's another hurdle to overcome too, which would currently preclude
avoiding the intermediate dumb buffer at all. One of the invariants
the atomic KMS API enforces is that (!!plane->crtc_id ==
!!plane->fb_id), i.e. that a plane cannot be assigned to a CRTC
without an active buffer. So again, we're left with either having the
plane fully configured and active (assigned to a CRTC and displaying,
I assume, a pre-allocated dumb buffer), or pushing more configuration
into Streams - specifically, connecting an EGLOutputLayer to an
EGLOutputPort.
Post by Andy Ritger
Post by Daniel Stone
Also, I'm not quite sure how you're testing the compositor-as-consumer
mode: I can't seem to see any EGL extensions which allow you to
connect a Wayland surface as an EGLStream consumer. Do you have
something else unpublished that's being used here, or is this what the
libnvidia-egl-wayland library is for? Or do you just have clients
using EGLSurfaces as normal, which happen to be implemented internally
as EGLStreams? (Also, that the only way to test this is through
proprietary drivers implementing only-just-published extensions not
only makes me very sad, but hugely increases the potential for this to
be inadvertently broken.)
Sorry if this seemed cryptic. You are correct that EGL Wayland clients
just use EGLSurfaces as normal (no Wayland client changes), and that
gets implemented using EGLStreams within libnvidia-egl-wayland.
Sorry, I'd missed this whilst reading through.
Post by Andy Ritger
FWIW, we plan to release the source to libnvidia-egl-wayland
eventually... it has a few driver-specific warts right now, but the
intent is that it is a vendor-independent implementation (though, using
EGLStreams, so...) of EGL_KHR_platform_wayland using a set of EGL API
"wrappers". The goal was to allow window systems to write these EGL
platform bindings themselves, so that each EGL implementation doesn't
have to implement each EGL_KHR_platform_*. Anyway, we'll try to get
libnvidia-egl-wayland cleaned up and released.
Interesting!
Post by Andy Ritger
Post by Daniel Stone
Post by Miguel Angel Vico
Thus, a compositor could produce frames and feed them to an
EGLOutputLayer through an EGLStream for presentation on a display
device.
In a similar way, by attaching a GLTexture consumer to a stream, a
producer (wayland client) could feed frames to a texture, which in
turn can be used by a compositor to prepare the final frame to be
presented.
Quick aside: this reminds me in many unfortunate ways of
GLX_EXT_texture_from_pixmap. tfp gave us the same 'capture stream of
stuff and make it appear in a texture' model as streams, whereas most
of the rest of the world (EGL, Vulkan WSI, Wayland, Android, ChromeOS,
etc) have all moved explicitly _away_ from that model to passing
references to individual buffers, this in many ways brings us back to
tfp.
Is that really an accurate comparison? The texture_from_pixmap extension
let X11 composite managers bind a single X pixmap to an OpenGL texture.
It seems to me what was missing in TFP usage was explicit synchronization
between X and/or OpenGL rendering into the pixmap and OpenGL texturing
from the pixmap.
I'd argue that synchronisation (in terms of serialisation with the
rest of the client's protocol stream) is missing from Streams as well,
at least in mailbox mode.

(As an aside, I wonder if it's properly done in FIFO mode as well; the
compositor may very validly choose not to dequeue a buffer if a
surface is completely occluded. How does Streams then know that it can
submit another frame? Generally we use wl_surface::frame to deal with
this - the equivalent of eglSwapInterval(1) - but it sounds like
Streams relies more on strictly-paired internal queue/dequeue pairing
in FIFO mode. Maybe this isn't true.)
Post by Andy Ritger
Post by Daniel Stone
Post by Miguel Angel Vico
Whenever EGL_EXT_device_drm extension is present, EGLDevice can
be used to enumerate and access DRM KMS devices, and EGLOutputLayer
to enumerate and access DRM KMS crtcs and planes.
Again, the enumeration isn't so much used as bypassed. The original
enumeration is used, and all we do with the EGL objects is a) list all
of them, b) filter them to find the one we already have, and c)
perhaps replace their internal representation of the device with the
one we already have.
That's fair in the context of this patch set.
In general, EGLDevice provides device enumeration for other use cases
where it is the basis for bootstrapping. Maybe we could better reconcile
udev and EGLDevice in the patch set, but some of this is a natural, though
unfortunate, artifact of correlating objects between two enumeration APIs.
Mind you, this wasn't intended as a criticism, just noting that the
commit message didn't accurately describe the code.
Post by Andy Ritger
Post by Daniel Stone
I'd like to look at the elephant in the room, which is why you're
using this in the first place (aside from general NVIDIA enthusiasm
for encapsulating everything within EGL Streams/Output/Device/etc,
dating back many years). Andy/Aaron, you've said that you found GBM to
be inadequate, and I'd like to find out explicitly how.
Thanks. This is the real heart of the debate.
Yes!
Post by Andy Ritger
Post by Daniel Stone
Through a few
'We can't choose an optimal rendering configuration, because we don't
know how it's going to be used' - (almost completely) untrue. The FD
you pass to gbm_device_create is that of the KMS device, a gbm_surface
contains information as to how the plane (primary or overlay) will be
configured,
Maybe I'm not looking in the right place, but where does gbm_surface get
the intended plane configuration? Are there other display-related flags
besides GBM_BO_USE_SCANOUT? Then again, the particular plane doesn't
impact us for current GPUs.
Well, nowhere. By current plane configuration, I assume you're (to the
extent that you can discuss it) talking about asymmetric plane
capabilities, e.g. support for disjoint colour formats, scaling units,
etc? As Dan V says, I still see Streams as a rather incomplete fix to
this, given that plane assignment is pre-determined: what do you do
when your buffers are configured as optimally as possible, but the
compositor has picked the 'wrong' plane? I really think you need
something like HWC to rewrite your scene graph into the optimal setup.
Post by Andy Ritger
Post by Daniel Stone
and an EGLDisplay lets you tie the rendering and scanout
devices together. What more information do you need? It's true that we
don't have a way to select individual rendering devices at the moment,
but as said earlier, passing an EGLDevice as an attrib to
GetPlatformDisplay would resolve that, as you would have the render
device identified by the EGLDevice and the scanout device identified
by the gbm_device. At that point, you have the full pipeline and can
determine the optimal configuration.
Beyond choosing optimal rendering configuration, there is arbitration of
the scarce resources needed for optimal rendering configuration. E.g.,
for Wayland compositor flipping to client-produced buffers, presumably the
client's buffer needs to be allocated with GBM_BO_USE_SCANOUT. NVIDIA's
display hardware requires physically contiguous buffers, so we wouldn't
want clients to _always_ allocate buffers with the GBM_BO_USE_SCANOUT
flag. It would be nice to have feedback between the EGL driver instance
in the compositor and the EGL driver running in the client, to know how
the buffer is going to be used by the Wayland compositor.
I imagine other hardware has even more severe constraints on displayable
memory, though, so maybe I'm misunderstanding something about how buffers
are shared between wayland clients and compositors?
Ah! This is something I've very much had in mind for quite a while -
but I keep getting pre-empted - and didn't bring it up as it didn't
seem implemented in the current patchset. (IIRC,
jajones had some code to allow you to retarget Streams at different
consumers, but he's on leave.)

Also, I should add that there's nothing requiring clients to use GBM
to allocate. The client EGLSurface implementation is free to do purely
internal allocations that are only accessible to it, if it wants to;
gbm_bo_import would then note that the buffer is not usable for
scanout and fail the import, leaving the compositor to fall back to
EGLImage.
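
Sketched out (resource being the client's wl_buffer resource):

    /* Try to import for direct scanout; a purely-internal client
     * allocation will simply fail the import. */
    struct gbm_bo *bo =
        gbm_bo_import(gbm, GBM_BO_IMPORT_WL_BUFFER,
                      resource, GBM_BO_USE_SCANOUT);
    if (!bo) {
        /* Fall back to GPU composition. */
        EGLImageKHR img =
            eglCreateImageKHR(dpy, EGL_NO_CONTEXT,
                              EGL_WAYLAND_BUFFER_WL, resource, NULL);
        /* ... glEGLImageTargetTexture2DOES() and composite ... */
    }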
Post by Andy Ritger
This ties into the next point...
Post by Daniel Stone
'We don't know when to schedule decompression, because there's no
explicit barrier' - completely untrue. eglSwapBuffers is that barrier.
For example, in Freescale i.MX6, the Vivante GPU and Freescale IPU
(display controller) do not share a single common format between GPU
render targets and IPU scanout sources, so require a mandatory
detiling pass in between render and display. These work just fine with
gbm with that pass scheduled by eglSwapBuffers. This to me seems
completely explicit, unless there was something else you were meaning
... ?
The Vivante+Freescale example is a good one, but it would be more
interesting if they shared /some/ formats and you could only use those
common formats in /some/ cases.
That's also fairly common, particularly for tiling. Intel has more
tiling modes than I can remember, of which only one (X-tiling) is a
valid source for scanout. As you say, physical contiguity is also a
valid requirement, plus pitch alignment.
Post by Andy Ritger
I think a lot of the concern is about passing client-produced frames
all the way through to scanout (i.e., zero-copy). E.g., if the wayland
client is producing frames that the wayland compositor is going to use
as a texture, then we don't want the client to decompress as part of its
eglSwapBuffers: the wayland compositor will texture from the compressed
frame for best performance. But, if the wayland compositor is going to
flip to the surface, then we would want the client to decompress during
its eglSwapBuffers.
Yes, very much so. Taking the Freescale example, you want the client
to do a detiling blit during its swap if the surface is a valid
scanout target, but not at all if it's just getting textured by the
GPU anyway. Similarly, Intel wants to allocate X-tiled if scanout is
possible, but otherwise it wants to be Y/Yf/...-tiled.
Post by Andy Ritger
The nice thing about EGLStreams here is that if the consumer (the Wayland
compositor) wants to use the content in a different way, the producer
must be notified first, in order to produce something suitable for the
new consumer.
I believe this is entirely doable with GBM right now, taking advantage
of the fact that libgbm.so and libEGL.so must be as tightly paired as
libEGL.so and libGLESv2.so. For all of these, read 'wl_drm' as 'wl_drm
or its equivalent interface in other implementations'.

Firstly, create a new interface in wl_drm to represent a swapchain (in
the Vulkan sense), and modify its buffer-creation requests to take a
swapchain parameter. This we can do without penalty, since the only
users (aside from VA-API, which is really broken and also hopefully
soon to lose its Wayland sink anyway) are EGL_EXT_platform_wayland and
EGL_WL_bind_wayland_display, both within the same DSO.

Secondly, instrument gbm_bo_import's wl_buffer path (proxy for intent
to use a buffer for direct scanout) and EGLImage's
EGL_WAYLAND_BUFFER_WL path (proxy for intent to use via GPU
composition) to determine what the compositor is actually doing with
these buffers, and use that to store target/intent in the swapchain.

Thirdly, when the target/intent changes (e.g. 'was scanout every
frame, has been EGLImage for the last 120 frames'), send an event down
to the client to let it know to modify its allocation. The combination
of EGL/GBM are in the correct place to determine this, since between
them they already have to know the intersection of capabilities
between render and scanout.

That still doesn't solve the optimal-display-configuration problem -
that you have generic code determining not only the display strategy
(scanout vs. GPU composition) as well as the exact display controller
configuration - but neither does EGLStreams, or indeed anything
current short of HWC.

Do you see any problem with doing that within GBM? It's not actually
done yet, but then again, neither is direct scanout through Streams.
;)
Post by Andy Ritger
Post by Daniel Stone
'Width, height, pitch and format aren't enough information' - this is
true, but not necessarily relevant. I'm not sure what the source of
this actually is: is it the gbm_bo_get_*() APIs? If so, yes, they need
to be extended with a gbm_bo_get_modifier() call, which would allow
you to get the DRM format modifier to describe tiling/compression/et
al (as well as perhaps being extended to allow you to extract multiple
buffers/planes, e.g. to attach auxiliary compression buffers). If it's
not gbm, what actually is it? The only other place I can think of
(suggested by Pekka, I think) was the wl_drm protocol, which it should
be stressed is a) not required in any way by Wayland, b) not a
published/public protocol, c) not a stable protocol. wl_drm just
happens to be the way that Mesa shares buffers, just as wl_viv is how
Vivante's proprietary driver shares buffers, and mali_buffer_sharing
is how the Mali driver does it. Since the server side is bound by
eglBindWaylandDisplayWL and the client side is also only used through
EGL, there is _no_ requirement for you to also implement wl_drm. As it
is a hidden private Mesa protocol, there is also no requirement for
the protocol to remain stable.
I agree that wl_drm doesn't factor into it.
Maybe some of this is my confusion over what parts of gbm.h are
application-facing, and what parts are driver-facing? We, and
presumably most hardware vendors, would want the ability to associate
arbitrary metadata with gbm_bo's, but most of that metadata is
implementation-specific, and not really something an application should
be looking at without sacrificing portability.
All of gbm.h is user-facing; how you implement that API is completely
up to you, including arbitrary metadata. For instance, it's the driver
that allocates its own struct gbm_surface/gbo_bo/etc (which is
opaque), so it can do whatever it likes in terms of metadata. Is there
anything in particular you're thinking of that you're not sure you'd
be able to store portably?

Might also be worth striking a common misconception here: the Mesa GBM
implementation is _not_ canonical. gbm.h is the user-facing API you
have to implement, but beyond that, you don't need to be implemented
by Mesa's src/gbm/. As the gbm.h types are all opaque, I'm not sure
what you couldn't express/hide/store - do you have any examples?
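
To sketch what I mean - a vendor implementation can define the opaque
types however it likes; the private fields here are invented for
illustration:

    struct gbm_bo {
        struct gbm_device *gbm;
        uint32_t width, height, format;
        union gbm_bo_handle handle;
        /* driver-private metadata; applications never see this: */
        uint64_t tiling_mode;
        int physically_contiguous;
        void *compression_state;
    };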
Post by Andy Ritger
Post by Daniel Stone
'EGLStreams is the direction taken in Vulkan' - I would argue not. IMO
the explicit buffer management on the client side does not parallel
EGLStreams, and notably there is no equivalent consumer interface
offered on the server side, but instead the individual-buffer-driven
approach is taken. It's true that VK_WSI_display_swapchain does exist
and does match the EGLStreams model fairly closely, but also that it
does not have universal implementation: the Intel 'anv' Mesa-based
driver does not implement display_swapchain, instead having an
interface to export a VkImage as a dmabuf. It's true that the latter
is not optimal (it lacks the explicit targeting required to determine
the most optimal tiling/compression strategy), but OTOH it is
precedent for explicitly avoiding the
VK_WSI_display_swapchain/EGLStreams model for Vulkan on KMS, just as
GBM avoids it for EGL on KMS.
From your perspective, what would be more optimal than VkImage+dmabuf?
Well, it's pretty much on par with GBM-compositor-Wayland-client and
an EGLStreams pipeline ending in an EGLOutput. Not having something
like HWC means that you can't determine the optimal plane-allocation
strategy.
Post by Andy Ritger
Post by Daniel Stone
Agreed. One of the things I've been incredibly happy with is how our
platform has managed to stay completely generic and vendor-neutral so
far, and I'd love to preserve that.
I don't think you'll find any disagreement to that from NVIDIA, either.
I apologize if the EGLStreams proposal gave the impression of a
vendor-private solution. That wasn't the intent. The EGLStream family
of extensions are, after all, an open specification that any EGL vendor
can implement. If there are aspects of any of these EGL extensions that
seem useful, I'd hope that Mesa would be willing to adopt them.
Indeed, this wasn't to cast any aspersions on how you guys have
developed Streams. Having it out there and having these patches has
really been tremendously useful.
Post by Andy Ritger
We (NVIDIA) clearly think EGLStreams is a good direction for expressing
buffer sharing semantics. In our ideal world, everyone would implement
these extensions and Wayland compositors would migrate to using them as
the generic vendor-neutral mechanism for buffer sharing :)
But here's where my problem lies. At the moment, the 'how do I
Wayland' story is very straightforward, and not entirely
coincidentally similar to ChromeOS's: you implement GBM+KMS, you
implement the ~25 LoC of libwayland-egl, you implement
EGL_EXT_platform_{gbm,wayland}, and ... that's it. Introducing Streams
as an alternate model is certainly interesting, and I understand why
you would do it, but having it as the sole option muddies the 'how do
I Wayland' story significantly.
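
To underline the '~25 LoC' point, libwayland-egl is essentially just this
handle plus trivial create/resize/destroy (quoting from memory, so treat
as approximate):

    #include <stdlib.h>

    struct wl_egl_window {
        struct wl_surface *surface;
        int width, height;
        int dx, dy;
        int attached_width, attached_height;
        void *private; /* owned by the EGL implementation */
        void (*resize_callback)(struct wl_egl_window *, void *);
    };

    struct wl_egl_window *
    wl_egl_window_create(struct wl_surface *surface, int width, int height)
    {
        struct wl_egl_window *w = calloc(1, sizeof *w);
        if (w) {
            w->surface = surface;
            w->width = width;
            w->height = height;
        }
        return w;
    }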

Getting away from the vendor-bound DDX model was something we were
desperate to do (see also xf86-video-modesetting landing on GBM+EGL),
and I'd really just like to avoid that becoming 'well, for most
platforms you do this, but for this platform / these platforms, you do
this instead ...'.

Cheers,
Daniel
Andy Ritger
2016-04-02 00:28:17 UTC
Permalink
Post by Daniel Stone
Hi Andy,
Post by Andy Ritger
Thanks for the thorough responses, Daniel.
No problem; as I said, I'm actually really happy to see an
implementation out there.
Post by Andy Ritger
Post by Daniel Stone
Post by Miguel Angel Vico
Similarly, EGLOutput will provide means to access different
portions of display control hardware associated with an EGLDevice.
For instance, EGLOutputLayer represents a portion of display
control hardware that accepts an image as input and processes it
for presentation on a display device.
I still struggle to see the value of what is essentially an
abstraction over KMS, but oh well.
The intent wasn't to abstract all of KMS, just the surface presentation
aspect where EGL and KMS intersect. Besides the other points below,
an additional motivation for abstraction is to allow EGL to work with
the native modesetting APIs on other platforms (e.g., OpenWF on QNX).
Fair enough. And, ah, _that's_ where the OpenWF implementation is - I
was honestly unsure for years since the last implementation I saw was
from the ex-Hybrid NVIDIA guys in Helsinki, back when it was aimed at
Series 60.
Yes. I haven't had any direct interaction with the QNX implementation
of OpenWF. In any case, portability across OSes has been an important
part of our downstream Wayland efforts in automotive.
Post by Daniel Stone
Post by Andy Ritger
Post by Daniel Stone
Firstly, again looking at the case where a Wayland client is a stream
producer and the Wayland compositor is a consumer, we move from a
model where references to individual buffers are explicitly passed
through the Wayland protocol, to where those buffers merely carry a
reference to a stream. Again, as stated in the review of 4/7, that
looks like it has the potential to break some actual real-world cases,
and I have no idea how to solve it, other than banning mailbox mode,
which would seem to mostly defeat the point of Streams (more on that
below).
Streams are just a transport for frames. The client still explicitly
communicates when a frame is delivered through the stream via wayland
protocol, and the compositor controls when it grabs a new frame, via
eglStreamConsumerAcquireKHR(). Unless there are bugs in the patches,
the flow of buffers is still explicit and fully under the wayland protocol
and compositor's control.
Right, I believe if you have FIFO mode and strictly enforce
synchronisation to wl_surface::frame, then you should be safe. Mailbox
mode or any other kind of SwapInterval(0) equivalent opens you up to a
series of issues.
Post by Andy Ritger
Also, mailbox mode versus FIFO mode should essentially equate to Vsync
off versus Vsync on, respectively. It shouldn't have anything to do
with the benefits of streams, but mailbox mode is a nice feature for
benchmarking games/simulations or naively displaying your latest &
greatest content without tearing.
I agree it's definitely a nice thing to have, but it does bring up the
serialisation issue: we expect any configuration performed by the
client (say, wl_surface::set_opaque_area to let the compositor know
where it can disable blending) to be fully in-line with buffer
attachment. The extreme case of this is resize, but there are quite a
few valid cases where you need serialisation.
I don't know quite off the top of my head how you'd support mailbox
mode with Streams, given this constraint - you need three-way feedback
between the compositor (recording all associated surface state,
including subsurfaces), clients (recording the surface state valid
when that buffer was posted), and the Streams implementation
(determining which frames to dequeue, which to discard and return to
the client, etc).
It is possible we don't get all of that completely right in our implementation yet.
Post by Daniel Stone
Post by Andy Ritger
Post by Daniel Stone
Secondly, looking at the compositor-drm case, the use of the dumb
buffer to display undefined content as a dummy modeset really makes me
uneasy,
Yes, the use of dumb buffer in this patch series is a kludge. If we
were going to use drmModeSetCrtc + EGLStreams, I think we'd want to
pass no fb to drmModeSetCrtc, but that currently gets rejected by DRM.
Are surface-less modesets intended to be allowable in DRM? I can hunt
that down if that is intended to work. Of course, better to work out
how EGLStreams should cooperate with atomic KMS.
It was definitely an oversight to not zero initialize the dumb buffer.
Right, atomic allows you separate pipe/CRTC configuration from
plane/overlay configuration. So you'd have two options: one is to use
atomic and require the CRTC be configured with planes off before using
Streams to post flips, and the other is to add KMS configuration to
the EGL output.
Yes, I think those are the two general directions, though neither
is great. It seems like you'd want a way to express the EGLStream to
use in a plane of a KMS configuration, to be latched on a subsequent
KMS atomic request. But, one API bleeding into the other, in either
direction, gets ugly.
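
For reference, today the consumer hookup is roughly the following
(EGL_EXT_output_drm plus EGL_EXT_stream_consumer_egloutput); the missing
piece is a way to defer the actual flip into an atomic request:

    /* Look up the EGLOutputLayer for a KMS plane and make it the
     * stream's consumer; presentation then happens on acquire. */
    EGLAttrib attrs[] = { EGL_DRM_PLANE_EXT, (EGLAttrib)plane_id,
                          EGL_NONE };
    EGLOutputLayerEXT layer;
    EGLint n;
    eglGetOutputLayersEXT(dpy, attrs, &layer, 1, &n);
    eglStreamConsumerOutputEXT(dpy, stream, layer);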
Post by Daniel Stone
Though, now I think of it, this effectively precludes one case, which
is scaling a Streams-sourced buffer inside the display controller. In
the GBM case, the compositor gets every buffer, so can configure the
plane scaling in line with buffer display. I don't see how you'd do
that with Streams.
Agreed. I think we'd need something like I described above in order to
solve that within the context of EGLStreams.
Post by Daniel Stone
There's another hurdle to overcome too, which would currently preclude
avoiding the intermediate dumb buffer at all. One of the invariants
the atomic KMS API enforces is that (!!plane->crtc_id ==
!!plane->fb_id), i.e. that a plane cannot be assigned to a CRTC
without an active buffer. So again, we're left with either having the
plane fully configured and active (assigned to a CRTC and displaying,
I assume, a pre-allocated dumb buffer), or pushing more configuration
into Streams - specifically, connecting an EGLOutputLayer to an
EGLOutputPort.
Agreed.
Post by Daniel Stone
Post by Andy Ritger
Post by Daniel Stone
Also, I'm not quite sure how you're testing the compositor-as-consumer
mode: I can't seem to see any EGL extensions which allow you to
connect a Wayland surface as an EGLStream consumer. Do you have
something else unpublished that's being used here, or is this what the
libnvidia-egl-wayland library is for? Or do you just have clients
using EGLSurfaces as normal, which happen to be implemented internally
as EGLStreams? (Also, that the only way to test this is through
proprietary drivers implementing only-just-published extensions not
only makes me very sad, but hugely increases the potential for this to
be inadvertently broken.)
Sorry if this seemed cryptic. You are correct that EGL Wayland clients
just use EGLSurfaces as normal (no Wayland client changes), and that
gets implemented using EGLStreams within libnvidia-egl-wayland.
Sorry, I'd missed this whilst reading through.
Post by Andy Ritger
FWIW, we plan to release the source to libnvidia-egl-wayland
eventually... it has a few driver-specific warts right now, but the
intent is that it is a vendor-independent implementation (though, using
EGLStreams, so...) of EGL_KHR_platform_wayland using a set of EGL API
"wrappers". The goal was to allow window systems to write these EGL
platform bindings themselves, so that each EGL implementation doesn't
have to implement each EGL_KHR_platform_*. Anyway, we'll try to get
libnvidia-egl-wayland cleaned up and released.
Interesting!
Post by Andy Ritger
Post by Daniel Stone
Post by Miguel Angel Vico
Thus, a compositor could produce frames and feed them to an
EGLOutputLayer through an EGLStream for presentation on a display
device.
In a similar way, by attaching a GLTexture consumer to a stream, a
producer (wayland client) could feed frames to a texture, which in
turn can be used by a compositor to prepare the final frame to be
presented.
Quick aside: this reminds me in many unfortunate ways of
GLX_EXT_texture_from_pixmap. tfp gave us the same 'capture stream of
stuff and make it appear in a texture' model as streams, whereas most
of the rest of the world (EGL, Vulkan WSI, Wayland, Android, ChromeOS,
etc) have all moved explicitly _away_ from that model to passing
references to individual buffers; this in many ways brings us back to
tfp.
Is that really an accurate comparison? The texture_from_pixmap extension
let X11 composite managers bind a single X pixmap to an OpenGL texture.
It seems to me what was missing in TFP usage was explicit synchronization
between X and/or OpenGL rendering into the pixmap and OpenGL texturing
from the pixmap.
I'd argue that synchronisation (in terms of serialisation with the
rest of the client's protocol stream) is missing from Streams as well,
at least in mailbox mode.
(As an aside, I wonder if it's properly done in FIFO mode as well; the
compositor may very validly choose not to dequeue a buffer if a
surface is completely occluded. How does Streams then know that it can
submit another frame? Generally we use wl_surface::frame to deal with
this - the equivalent of eglSwapInterval(1) - but it sounds like
Streams relies more on strictly-paired internal queue/dequeue pairing
in FIFO mode. Maybe this isn't true.)
Right: in the case that the compositor wants to drop a frame, it would
need to dequeue it from the FIFO if it wants the client to be able to
produce a new frame. Otherwise, as I understand it, the client would
block in its next call to eglSwapBuffers().
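In compositor terms, that dequeue-to-unblock step might look like this
(a sketch using the EGL_KHR_stream_consumer_gltexture entry points;
surface_is_occluded() is a hypothetical helper):

    if (surface_is_occluded(surface)) {
        /* Not going to draw this frame, but drain the FIFO anyway so
         * the client doesn't block in its next eglSwapBuffers(). */
        if (eglStreamConsumerAcquireKHR(dpy, stream))
            eglStreamConsumerReleaseKHR(dpy, stream);
    } else {
        /* Latch the frame into the bound GL texture and composite. */
        eglStreamConsumerAcquireKHR(dpy, stream);
    }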
Post by Daniel Stone
Post by Andy Ritger
Post by Daniel Stone
Post by Miguel Angel Vico
Whenever EGL_EXT_device_drm extension is present, EGLDevice can
be used to enumerate and access DRM KMS devices, and EGLOutputLayer
to enumerate and access DRM KMS crtcs and planes.
Again, the enumeration isn't so much used as bypassed. The original
enumeration is used, and all we do with the EGL objects is a) list all
of them, b) filter them to find the one we already have, and c)
perhaps replace their internal representation of the device with the
one we already have.
That's fair in the context of this patch set.
In general, EGLDevice provides device enumeration for other use cases
where it is the basis for bootstrapping. Maybe we could better reconcile
udev and EGLDevice in the patch set, but some of this is a natural, though
unfortunate, artifact of correlating objects between two enumeration APIs.
Mind you, this wasn't intended as a criticism, just noting that the
commit message didn't accurately describe the code.
Got it; thanks.
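For the record, the correlation between the two enumeration APIs boils
down to something like this (a sketch; the eglQuery*EXT entry points
must be fetched with eglGetProcAddress(), and drm_path is the node udev
already gave us):

    #include <string.h>

    EGLDeviceEXT devices[16], match = EGL_NO_DEVICE_EXT;
    EGLint n = 0;

    eglQueryDevicesEXT(16, devices, &n);
    for (EGLint i = 0; i < n; i++) {
        /* EGL_DRM_DEVICE_FILE_EXT comes from EGL_EXT_device_drm */
        const char *file =
            eglQueryDeviceStringEXT(devices[i], EGL_DRM_DEVICE_FILE_EXT);
        if (file && strcmp(file, drm_path) == 0) { /* e.g. "/dev/dri/card0" */
            match = devices[i];
            break;
        }
    }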
Post by Daniel Stone
Post by Andy Ritger
Post by Daniel Stone
I'd like to look at the elephant in the room, which is why you're
using this in the first place (aside from general NVIDIA enthusiasm
for encapsulating everything within EGL Streams/Output/Device/etc,
dating back many years). Andy/Aaron, you've said that you found GBM to
be inadequate, and I'd like to find out explicitly how.
Thanks. This is the real heart of the debate.
Yes!
Post by Andy Ritger
Post by Daniel Stone
Through a few
'We can't choose an optimal rendering configuration, because we don't
know how it's going to be used' - (almost completely) untrue. The FD
you pass to gbm_device_create is that of the KMS device, a gbm_surface
contains information as to how the plane (primary or overlay) will be
configured,
Maybe I'm not looking in the right place, but where does gbm_surface get
the intended plane configuration? Are there other display-related flags
beside GBM_BO_USE_SCANOUT? Then again, the particular plane doesn't
impact us for current GPUs.
Well, nowhere. By current plane configuration, I assume you're (to the
extent that you can discuss it) talking about asymmetric plane
capabilities, e.g. support for disjoint colour formats, scaling units,
etc? As Dan V says, I still see Streams as a rather incomplete fix to
this, given that plane assignment is pre-determined: what do you do
when your buffers are configured as optimally as possible, but the
compositor has picked the 'wrong' plane? I really think you need
something like HWC to rewrite your scene graph into the optimal setup.
Yes, encapsulating the composition within something more like HWC would
be ideal to allow for optimal use of planes.

My questions above were prompted by your statement that "a gbm_surface
contains information as to how the plane... will be configured." Maybe I
misunderstood what you meant by that.

In any case, I didn't mean to imply that EGLStreams arbitrates planes.
Rather, the intent with EGLStreams is to allow the driver to make the
most intelligent buffer allocation decisions, given the compositor's
current plane configuration.
Post by Daniel Stone
Post by Andy Ritger
Post by Daniel Stone
and an EGLDisplay lets you tie the rendering and scanout
devices together. What more information do you need? It's true that we
don't have a way to select individual rendering devices at the moment,
but as said earlier, passing an EGLDevice as an attrib to
GetPlatformDisplay would resolve that, as you would have the render
device identified by the EGLDevice and the scanout device identified
by the gbm_device. At that point, you have the full pipeline and can
determine the optimal configuration.
Beyond choosing optimal rendering configuration, there is arbitration of
the scarce resources needed for optimal rendering configuration. E.g.,
for Wayland compositor flipping to client-produced buffers, presumably the
client's buffer needs to be allocated with GBM_BO_USE_SCANOUT. NVIDIA's
display hardware requires physically contiguous buffers, so we wouldn't
want clients to _always_ allocate buffers with the GBM_BO_USE_SCANOUT
flag. It would be nice to have feedback between the EGL driver instance
in the compositor and the EGL driver running in the client, to know how
the buffer is going to be used by the Wayland compositor.
I imagine other hardware has even more severe constraints on displayable
memory, though, so maybe I'm misunderstanding something about how buffers
are shared between wayland clients and compositors?
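In GBM terms, the trade-off Andy describes is just a question of which
usage flags the client's winsys allocation requests, e.g. (a sketch):

    /* Scanout-capable: on NVIDIA hardware this means physically
     * contiguous, so only wanted when the compositor will really flip. */
    struct gbm_surface *flippable =
        gbm_surface_create(gbm, width, height, GBM_FORMAT_XRGB8888,
                           GBM_BO_USE_SCANOUT | GBM_BO_USE_RENDERING);

    /* Texture-only: cheaper, but the compositor can never flip to it. */
    struct gbm_surface *texture_only =
        gbm_surface_create(gbm, width, height, GBM_FORMAT_XRGB8888,
                           GBM_BO_USE_RENDERING);

The missing piece is the feedback loop telling the client which of the
two to pick.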
Ah! This is something I've very much had in mind for quite a while -
but keep getting pre-empted - and didn't bring up as it didn't seem
implemented in the current patchset.
It was abstracted too well :)

I think you spelled it out below, but I'd love to hear any other thoughts
you have for the right direction on this class of resource arbitration.
Post by Daniel Stone
(IIRC,
jajones had some code to allow you to retarget Streams at different
consumers, but he's on leave.)
Yes, I think James had some draft specs related to this. He'll be back
from paternity leave fairly soon; it will be good to get him involved
in this discussion.
Post by Daniel Stone
Also, I should add that there's nothing requiring clients to use GBM
to allocate. The client EGLSurface implementation is free to do purely
internal allocations that are only accessible to it, if it wants to;
gbm_bo_import would then note that the buffer is not usable for
scanout and fail the import, leaving the compositor to fall back to
EGLImage.
Got it.
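A sketch of that fallback path on the compositor side (error handling
elided; buffer_resource is the client's wl_buffer resource):

    struct gbm_bo *bo =
        gbm_bo_import(gbm, GBM_BO_IMPORT_WL_BUFFER,
                      buffer_resource, GBM_BO_USE_SCANOUT);
    if (!bo) {
        /* Not scanout-capable: texture from it via EGLImage instead. */
        EGLImageKHR image =
            eglCreateImageKHR(dpy, EGL_NO_CONTEXT, EGL_WAYLAND_BUFFER_WL,
                              (EGLClientBuffer) buffer_resource, NULL);
        /* ... glEGLImageTargetTexture2DOES() and composite on the GPU */
    }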
Post by Daniel Stone
Post by Andy Ritger
This ties into the next point...
Post by Daniel Stone
'We don't know when to schedule decompression, because there's no
explicit barrier' - completely untrue. eglSwapBuffers is that barrier.
For example, in Freescale i.MX6, the Vivante GPU and Freescale IPU
(display controller) do not share a single common format between GPU
render targets and IPU scanout sources, so require a mandatory
detiling pass in between render and display. These work just fine with
gbm with that pass scheduled by eglSwapBuffers. This to me seems
completely explicit, unless there was something else you were meaning
... ?
The Vivante+Freescale example is a good one, but it would be more
interesting if they shared /some/ formats and you could only use those
common formats in /some/ cases.
That's also fairly common, particularly for tiling. Intel has more
tiling modes than I can remember, of which only one (X-tiling) is a
valid source for scanout. As you say, physical contiguity is also a
valid requirement, plus pitch alignment.
Post by Andy Ritger
I think a lot of the concern is about passing client-produced frames
all the way through to scanout (i.e., zero-copy). E.g., if the wayland
client is producing frames that the wayland compositor is going to use
as a texture, then we don't want the client to decompress as part of its
eglSwapBuffers: the wayland compositor will texture from the compressed
frame for best performance. But, if the wayland compositor is going to
flip to the surface, then we would want the client to decompress during
its eglSwapBuffers.
Yes, very much so. Taking the Freescale example, you want the client
to do a detiling blit during its swap if the surface is a valid
scanout target, but not at all if it's just getting textured by the
GPU anyway. Similarly, Intel wants to allocate X-tiled if scanout is
possible, but otherwise it wants to be Y/Yf/...-tiled.
That is good to know. How are those decisions made today?
Post by Daniel Stone
Post by Andy Ritger
The nice thing about EGLStreams here is that if the consumer (the Wayland
compositor) wants to use the content in a different way, the producer
must be notified first, in order to produce something suitable for the
new consumer.
I believe this is entirely doable with GBM right now, taking advantage
of the fact that libgbm.so and libEGL.so must be as tightly paired as
libEGL.so and libGLESv2.so. For all of these, read 'wl_drm' as 'wl_drm
or its equivalent interface in other implementations'.
Firstly, create a new interface in wl_drm to represent a swapchain (in
the Vulkan sense), and modify its buffer-creation requests to take a
swapchain parameter. This we can do without penalty, since the only
users (aside from VA-API, which is really broken and also hopefully
soon to lose its Wayland sink anyway) are EGL_EXT_platform_wayland and
EGL_WL_bind_wayland_display, both within the same DSO.
Secondly, instrument gbm_bo_import's wl_buffer path (proxy for intent
to use a buffer for direct scanout) and EGLImage's
EGL_WAYLAND_BUFFER_WL path (proxy for intent to use via GPU
composition) to determine what the compositor is actually doing with
these buffers, and use that to store target/intent in the swapchain.
Thirdly, when the target/intent changes (e.g. 'was scanout every
frame, has been EGLImage for the last 120 frames'), send an event down
to the client to let it know to modify its allocation. The combination
of EGL/GBM are in the correct place to determine this, since between
them they already have to know the intersection of capabilities
between render and scanout.
Thanks. The suggestion in the second step is particularly interesting.
I haven't tried to poke any holes in the proxy-for-intent cases, yet.
Do you think those inferences are reliable?
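To sketch the bookkeeping the second and third steps imply (every name
here is hypothetical; the real thing would live inside the paired
EGL/GBM DSO):

    enum intent { INTENT_UNKNOWN, INTENT_SCANOUT, INTENT_TEXTURE };

    struct swapchain {
        enum intent current;          /* what the client allocated for */
        unsigned frames_of_mismatch;  /* consecutive frames used otherwise */
    };

    /* Called from gbm_bo_import's wl_buffer path (INTENT_SCANOUT) and
     * from EGLImage's EGL_WAYLAND_BUFFER_WL path (INTENT_TEXTURE). */
    static void note_usage(struct swapchain *sc, enum intent used)
    {
        if (used == sc->current) {
            sc->frames_of_mismatch = 0;
            return;
        }
        /* 'was scanout every frame, has been EGLImage for the last
         * 120 frames': tell the client to reallocate. */
        if (++sc->frames_of_mismatch >= 120) {
            sc->current = used;
            sc->frames_of_mismatch = 0;
            send_reallocate_event(sc, used); /* hypothetical protocol event */
        }
    }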
Post by Daniel Stone
That still doesn't solve the optimal-display-configuration problem -
that you have generic code determining not only the display strategy
(scanout vs. GPU composition) but also the exact display controller
configuration - but neither does EGLStreams, or indeed anything
current short of HWC.
Do you see any problem with doing that within GBM? It's not actually
done yet, but then again, neither is direct scanout through Streams.
;)
This definitely seems worth exploring.
Post by Daniel Stone
Post by Andy Ritger
Post by Daniel Stone
'Width, height, pitch and format aren't enough information' - this is
true, but not necessarily relevant. I'm not sure what the source of
this actually is: is it the gbm_bo_get_*() APIs? If so, yes, they need
to be extended with a gbm_bo_get_modifier() call, which would allow
you to get the DRM format modifier to describe tiling/compression/et
al (as well as perhaps being extended to allow you to extract multiple
buffers/planes, e.g. to attach auxiliary compression buffers). If it's
not gbm, what actually is it? The only other place I can think of
(suggested by Pekka, I think) was the wl_drm protocol, which it should
be stressed is a) not required in any way by Wayland, b) not a
published/public protocol, c) not a stable protocol. wl_drm just
happens to be the way that Mesa shares buffers, just as wl_viv is how
Vivante's proprietary driver shares buffers, and mali_buffer_sharing
is how the Mali driver does it. Since the server side is bound by
eglBindWaylandDisplayWL and the client side is also only used through
EGL, there is _no_ requirement for you to also implement wl_drm. As it
is a hidden private Mesa protocol, there is also no requirement for
the protocol to remain stable.
I agree that wl_drm doesn't factor into it.
Maybe some of this is my confusion over what parts of gbm.h are
application-facing, and what parts are driver-facing? We, and
presumably most hardware vendors, would want the ability to associate
arbitrary metadata with gbm_bo's, but most of that metadata is
implementation-specific, and not really something an application should
be looking at without sacrificing portability.
All of gbm.h is user-facing; how you implement that API is completely
up to you, including arbitrary metadata. For instance, it's the driver
that allocates its own struct gbm_surface/gbm_bo/etc (which is
opaque), so it can do whatever it likes in terms of metadata. Is there
anything in particular you're thinking of that you're not sure you'd
be able to store portably?
Might also be worth striking a common misconception here: the Mesa GBM
implementation is _not_ canonical. gbm.h is the user-facing API you
have to implement, but beyond that, you don't need to be implemented
by Mesa's src/gbm/. As the gbm.h types are all opaque, I'm not sure
what you couldn't express/hide/store - do you have any examples?
Good points. No, I don't have any examples off hand of things that
couldn't be encapsulated within that.
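As a concrete (and entirely hypothetical) illustration of the point
about opaque types, a vendor backend could define the type however it
likes, since gbm.h never exposes the layout:

    #include <stdint.h>

    /* gbm.h only hands applications a 'struct gbm_bo *'; the layout
     * behind it is the implementation's business. */
    struct gbm_bo {
        struct gbm_device *dev;
        uint32_t width, height, stride, format;
        /* vendor-private metadata, invisible to applications: */
        uint64_t tiling_mode;              /* hypothetical */
        int needs_decompress_for_scanout;  /* hypothetical */
        void *vendor_alloc_handle;         /* hypothetical */
    };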

I agree that the Mesa GBM implementation is not canonical. Though, it
would be nice to avoid libgbm.so collisions. Let me know if I should
ask this separately on, e.g., mesa-dev, but would it be reasonable to
treat Mesa's libgbm as the "vendor neutral" library? It looks like
there are currently two opportunities to load into libgbm:

(a) Load as a "backend" DSO (i.e., get loaded by
mesa/src/gbm/main/backend.c:_gbm_create_device()).

(b) Load as a DRI driver by the DRI libgbm backend (i.e., get loaded
by mesa/src/gbm/backends/dri/gbm_dri.c).

For purposes of vendor-specific opaque data, it looks like (a) would
make the most sense. However, (b) currently conveniently infers a DSO
name to load, by querying the name of the DRM driver that corresponds
to the provided fd. Maybe it would make sense to hoist some of that
inference logic from (b) to (a)? It probably also depends on which of
(a) or (b) we'd consider a stabler ABI?
Post by Daniel Stone
Post by Andy Ritger
Post by Daniel Stone
'EGLStreams is the direction taken in Vulkan' - I would argue not. IMO
the explicit buffer management on the client side does not parallel
EGLStreams, and notably there is no equivalent consumer interface
offered on the server side, but instead the individual-buffer-driven
approach is taken. It's true that VK_WSI_display_swapchain does exist
and does match the EGLStreams model fairly closely, but also that it
does not have universal implementation: the Intel 'anv' Mesa-based
driver does not implement display_swapchain, instead having an
interface to export a VkImage as a dmabuf. It's true that the latter
is not optimal (it lacks the explicit targeting required to determine
the most optimal tiling/compression strategy), but OTOH it is
precedent for explicitly avoiding the
VK_WSI_display_swapchain/EGLStreams model for Vulkan on KMS, just as
GBM avoids it for EGL on KMS.
From your perspective, what would be more optimal than VkImage+dmabuf?
Well, it's pretty much on par with GBM-compositor-Wayland-client and
an EGLStreams pipeline ending in an EGLOutput. Not having something
like HWC means that you can't determine the optimal plane-allocation
strategy.
Post by Andy Ritger
Post by Daniel Stone
Agreed. One of the things I've been incredibly happy with is how our
platform has managed to stay completely generic and vendor-neutral so
far, and I'd love to preserve that.
I don't think you'll find any disagreement to that from NVIDIA, either.
I apologize if the EGLStreams proposal gave the impression of a
vendor-private solution. That wasn't the intent. The EGLStream family
of extensions are, after all, an open specification that any EGL vendor
can implement. If there are aspects of any of these EGL extensions that
seem useful, I'd hope that Mesa would be willing to adopt them.
Indeed, this wasn't to cast any aspersions on how you guys have
developed Streams. Having it out there and having these patches has
really been tremendously useful.
Great.
Post by Daniel Stone
Post by Andy Ritger
We (NVIDIA) clearly think EGLStreams is a good direction for expressing
buffer sharing semantics. In our ideal world, everyone would implement
these extensions and Wayland compositors would migrate to using them as
the generic vendor-neutral mechanism for buffer sharing :)
But here's where my problem lies. At the moment, the 'how do I
Wayland' story is very straightforward, and not entirely
coincidentally similar to ChromeOS's: you implement GBM+KMS, you
implement the ~25 LoC of libwayland-egl, you implement
EGL_EXT_platform_{gbm,wayland}, and ... that's it. Introducing Streams
as an alternate model is certainly interesting, and I understand why
you would do it, but having it as the sole option muddies the 'how do
I Wayland' story significantly.
Getting away from the vendor-bound DDX model was something we were
desperate to do (see also xf86-video-modesetting landing on GBM+EGL),
and I'd really just like to avoid that becoming 'well, for most
platforms you do this, but for this platform / these platforms, you do
this instead ...'.
Yes, fair and understood.
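For reference, the '~25 LoC of libwayland-egl' really is about this
much (a sketch along the lines of Mesa's implementation; the struct
layout is the informal ABI shared with the EGL driver):

    #include <stdlib.h>
    #include <wayland-client.h>

    struct wl_egl_window {
        struct wl_surface *surface;
        int width, height;
        int dx, dy;
        int attached_width, attached_height;
        void *private;  /* owned by the EGL implementation */
        void (*resize_callback)(struct wl_egl_window *, void *);
    };

    struct wl_egl_window *
    wl_egl_window_create(struct wl_surface *surface, int width, int height)
    {
        struct wl_egl_window *win = calloc(1, sizeof *win);
        if (win) {
            win->surface = surface;
            win->width = width;
            win->height = height;
        }
        return win;
    }

    void
    wl_egl_window_resize(struct wl_egl_window *win,
                         int width, int height, int dx, int dy)
    {
        win->width = width;
        win->height = height;
        win->dx = dx;
        win->dy = dy;
        if (win->resize_callback)
            win->resize_callback(win, win->private);
    }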

Thanks,
- Andy
Post by Daniel Stone
Cheers,
Daniel
Miguel Angel Vico
2016-04-02 10:12:05 UTC
Permalink
A couple additions to Andy's comments inline.

On Fri, 1 Apr 2016 17:28:17 -0700
Post by Daniel Stone
Post by Andy Ritger
Secondly, looking at the compositor-drm case, the use of the dumb
buffer to display undefined content as a dummy modeset really
makes me uneasy,
Yes, the use of dumb buffer in this patch series is a kludge. If
we were going to use drmModeSetCrtc + EGLStreams, I think we'd
want to pass no fb to drmModeSetCrtc, but that currently gets
rejected by DRM. Are surface-less modesets intended to be
allowable in DRM? I can hunt that down if that is intended to
work. Of course, better to work out how EGLStreams should
cooperate with atomic KMS.
It was definitely an oversight to not zero initialize the dumb buffer.
Post by Daniel Stone
(As an aside, I wonder if it's properly done in FIFO mode as well;
the compositor may very validly choose not to dequeue a buffer if a
surface is completely occluded. How does Streams then know that it
can submit another frame? Generally we use wl_surface::frame to
deal with this - the equivalent of eglSwapInterval(1) - but it
sounds like Streams relies more on strictly-paired internal
queue/dequeue pairing in FIFO mode. Maybe this isn't true.)
Right: in the case that the compositor wants to drop a frame, it would
need to dequeue it from the FIFO if it wants the client to be able to
produce a new frame. Otherwise, as I understand it, the client would
block in its next call to eglSwapBuffers().
That's correct: in FIFO mode the EGL producer will block in
eglSwapBuffers() if the FIFO is full.

IIUC, it's the Wayland client's responsibility to request wl_surface::frame
notifications. Our implementation doesn't forbid the client from making
use of wl_surface::frame to skip eglSwapBuffers() calls, and thereby
avoid blocking in eglSwapBuffers().

FWIW, in my next round of weston patches I'm planning to better align
our eglstreams implementation with the non-eglstreams one when it comes
to updating client texture content. If I'm reading weston code correctly,
that happens at buffer attach time, and weston calls into the renderer's
attach regardless of whether the client surface is completely occluded.

The only behavior mismatch I see between the eglstream and
non-eglstream implementations arises when a compositor doesn't update
the client texture content because it's occluded and the client
application doesn't make use of wl_surface::frame. In that case, the
eglstream implementation would block if in FIFO mode, while the
non-eglstream one wouldn't.
Post by Andy Ritger
Post by Daniel Stone
Mind you, this wasn't intended as a criticism, just noting that the
commit message didn't accurately describe the code.
Got it; thanks.
The intent was to give a briefing about those extensions/EGL structures
so people could better understand our patches. It doesn't mean we
replaced the current enumeration mechanism in compositor-drm.c; rather,
we use it to find the EGLDevice corresponding to the selected DRM
device.

Anyway, I'll try to make it clearer in the commit message of the
updated patch.

Thanks,
Miguel.
--
Miguel

Daniel Stone
2016-04-04 15:35:15 UTC
Permalink
Hi Miguel,
Post by Miguel Angel Vico
On Fri, 1 Apr 2016 17:28:17 -0700
Post by Andy Ritger
Post by Daniel Stone
(As an aside, I wonder if it's properly done in FIFO mode as well;
the compositor may very validly choose not to dequeue a buffer if a
surface is completely occluded. How does Streams then know that it
can submit another frame? Generally we use wl_surface::frame to
deal with this - the equivalent of eglSwapInterval(1) - but it
sounds like Streams relies more on strictly-paired internal
queue/dequeue pairing in FIFO mode. Maybe this isn't true.)
Right: in the case that the compositor wants to drop a frame, it would
need to dequeue it from the FIFO if it wants the client to be able to
produce a new frame. Otherwise, as I understand it, the client would
block in its next call to eglSwapBuffers().
That's correct: in FIFO mode the EGL producer will block in
eglSwapBuffers() if the FIFO is full.
IIUC, it's the Wayland client's responsibility to request wl_surface::frame
notifications. Our implementation doesn't forbid the client from making
use of wl_surface::frame to skip eglSwapBuffers() calls, and thereby
avoid blocking in eglSwapBuffers().
Yes-ish; it depends on whether you define 'client' at the process
boundary, or the DSO boundary. ;)

Taking the definition at the DSO boundary (i.e. 'Wayland client' as
something separate to the EGL Wayland implementation), then Wayland
clients _may_ call wl_surface_frame themselves to ensure they don't
block in SwapBuffers. However, the EGL implementation must also - when
in SwapInterval(1) mode - call wl_surface_frame itself, so that the
classic dumb rendering loop of while (true) { glDraw*();
eglSwapBuffers(); } is throttled to the compositor's repaint loop.
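A sketch of that throttling as it would sit inside the EGL winsys code
(real implementations use a private event queue rather than the default
one; simplified here):

    #include <wayland-client.h>

    static void frame_done(void *data, struct wl_callback *cb, uint32_t time)
    {
        *(int *)data = 0;
        wl_callback_destroy(cb);
    }

    static const struct wl_callback_listener frame_listener = { frame_done };

    /* Called from eglSwapBuffers() when the swap interval is 1. */
    static void throttle_to_repaint(struct wl_display *display,
                                    struct wl_surface *surface)
    {
        int pending = 1;
        struct wl_callback *cb = wl_surface_frame(surface);
        wl_callback_add_listener(cb, &frame_listener, &pending);
        wl_surface_commit(surface);  /* attach/damage elided */
        while (pending)
            wl_display_dispatch(display);
    }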
Post by Miguel Angel Vico
FWIW, in my next round of weston patches I'm planning to better align
our eglstreams implementation with the non-eglstreams one when it comes
to updating client texture content. If I'm reading weston code correctly,
that happens at buffer attach time, and weston calls into the renderer's
attach regardless of whether the client surface is completely occluded.
Right, reading through it now, it seems like always dequeuing in
renderer->attach() will solve this (non-)issue.
Post by Miguel Angel Vico
The only behavior mismatch I see between the eglstream and
non-eglstream implementations arises when a compositor doesn't update
the client texture content because it's occluded and the client
application doesn't make use of wl_surface::frame. In that case, the
eglstream implementation would block if in FIFO mode, while the
non-eglstream one wouldn't.
Both Weston and Mutter can support this (i.e. dequeue at
attach+commit, regardless of repaint), so in practical terms you'll be
fine if you just need to support those two.

Cheers,
Daniel
Daniel Stone
2016-04-04 15:27:56 UTC
Permalink
Hi,
Post by Andy Ritger
Post by Daniel Stone
Post by Andy Ritger
Also, mailbox mode versus FIFO mode should essentially equate to Vsync
off versus Vsync on, respectively. It shouldn't have anything to do
with the benefits of streams, but mailbox mode is a nice feature for
benchmarking games/simulations or naively displaying your latest &
greatest content without tearing.
I agree it's definitely a nice thing to have, but it does bring up the
serialisation issue: we expect any configuration performed by the
client (say, wl_surface::set_opaque_area to let the compositor know
where it can disable blending) to be fully in-line with buffer
attachment. The extreme case of this is resize, but there are quite a
few valid cases where you need serialisation.
I don't know quite off the top of my head how you'd support mailbox
mode with Streams, given this constraint - you need three-way feedback
between the compositor (recording all associated surface state,
including subsurfaces), clients (recording the surface state valid
when that buffer was posted), and the Streams implementation
(determining which frames to dequeue, which to discard and return to
the client, etc).
It is possible we don't get that all completely right in our implementation, yet.
Again this comes down to the synchronisation. In this case, assuming a
mailbox stream:
- wl_egl_surface_resize(w1, h1)
- gl*()
- eglSwapBuffers() <- commit 1
- wl_egl_surface_resize(w2, h2)
- gl*()
- eglSwapBuffers() <- commit 2

For this, you would need some kind of synchronisation, to ensure that
processing commit 1 didn't pick up on the differently-sized frames for
commit 2.
Post by Andy Ritger
Post by Daniel Stone
Right, atomic allows you separate pipe/CRTC configuration from
plane/overlay configuration. So you'd have two options: one is to use
atomic and require the CRTC be configured with planes off before using
Streams to post flips, and the other is to add KMS configuration to
the EGL output.
Yes, I think those are the two general directions, though neither
are great. It seems like you'd want a way to express the EGLStream to
use in a plane of a KMS configuration, to be latched on a subsequent
KMS atomic request. But, one API bleeding into the other, in either
direction, gets ugly.
Post by Daniel Stone
Though, now I think of it, this effectively precludes one case, which
is scaling a Streams-sourced buffer inside the display controller. In
the GBM case, the compositor gets every buffer, so can configure the
plane scaling in line with buffer display. I don't see how you'd do
that with Streams.
Agreed. I think we'd need something like I described above in order to
solve that within the context of EGLStreams.
Hm, so you'd effectively want to hand an atomic-KMS request object to
Streams, requesting that it stage its current state into that request.
The pending state is private ABI for libdrm, so doing post-hoc
rewrites wouldn't really work.

One detail which comes to mind: our assign_planes hook is what's
responsible for scanning the scene graph and pulling things out into
planes. We do a test request for each plane, to iteratively determine
(via trial and error) which scanout-candidate buffers we can and can't
hoist into planes. This can fail for any number of reasons (exceeded
global bandwidth limits, run out of shared scaler/detiling units, too
many planes on a single scanline, etc etc), so one key requirement we
have is that this fail gracefully and fall back to EGLImage
composition.

Would this work without client intervention, i.e. one buffer used in a
(failed) kernel request and then subsequently used for GPU
composition?
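That iterative probing looks roughly like this (a sketch;
stage_plane_props() is a hypothetical helper that adds the
FB_ID/CRTC_ID/SRC_*/CRTC_* properties for one candidate view):

    drmModeAtomicReq *req = drmModeAtomicAlloc();

    for (int i = 0; i < n_candidates; i++) {
        int cursor = drmModeAtomicGetCursor(req);

        stage_plane_props(req, &candidates[i]);
        if (drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_TEST_ONLY, NULL)) {
            /* Bandwidth/scaler/scanline limit hit: roll this plane back
             * and leave the view for GPU (EGLImage) composition. */
            drmModeAtomicSetCursor(req, cursor);
            candidates[i].use_gpu = 1;
        }
    }
    /* the request that survived testing is committed for real later */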
Post by Andy Ritger
Post by Daniel Stone
I'd argue that synchronisation (in terms of serialisation with the
rest of the client's protocol stream) is missing from Streams as well,
at least in mailbox mode.
(As an aside, I wonder if it's properly done in FIFO mode as well; the
compositor may very validly choose not to dequeue a buffer if a
surface is completely occluded. How does Streams then know that it can
submit another frame? Generally we use wl_surface::frame to deal with
this - the equivalent of eglSwapInterval(1) - but it sounds like
Streams relies more on strictly-paired internal queue/dequeue pairing
in FIFO mode. Maybe this isn't true.)
Right: in the case that the compositor wants to drop a frame, it would
need to dequeue it from the FIFO if it wants the client to be able to
produce a new frame. Otherwise, as I understand it, the client would
block in its next call to eglSwapBuffers().
Right, this should be doable with the existing attach hooks. I had
some concerns about subsurface commits, but am not sure they hold up.
Either way, they're fixable with Weston.
Post by Andy Ritger
Post by Daniel Stone
Post by Andy Ritger
Maybe I'm not looking in the right place, but where does gbm_surface get
the intended plane configuration? Are there other display-related flags
beside GBM_BO_USE_SCANOUT? Then again, the particular plane doesn't
impact us for current GPUs.
Well, nowhere. By current plane configuration, I assume you're (to the
extent that you can discuss it) talking about asymmetric plane
capabilities, e.g. support for disjoint colour formats, scaling units,
etc? As Dan V says, I still see Streams as a rather incomplete fix to
this, given that plane assignment is pre-determined: what do you do
when your buffers are configured as optimally as possible, but the
compositor has picked the 'wrong' plane? I really think you need
something like HWC to rewrite your scene graph into the optimal setup.
Yes, encapsulating the composition within something more like HWC would
be ideal to allow for optimal use of planes.
My questions above were prompted by your statement that "a gbm_surface
contains information as to how the plane... will be configured." Maybe I
misunderstood what you meant by that.
Oh right: I was just talking about basic dimensions and format. Sorry
for the confusion. Which other attributes would you like to see? I
guess scaling is a fairly obvious one.
Post by Andy Ritger
Post by Daniel Stone
Post by Andy Ritger
I think a lot of the concern is about passing client-produced frames
all the way through to scanout (i.e., zero-copy). E.g., if the wayland
client is producing frames that the wayland compositor is going to use
as a texture, then we don't want the client to decompress as part of its
eglSwapBuffers: the wayland compositor will texture from the compressed
frame for best performance. But, if the wayland compositor is going to
flip to the surface, then we would want the client to decompress during
its eglSwapBuffers.
Yes, very much so. Taking the Freescale example, you want the client
to do a detiling blit during its swap if the surface is a valid
scanout target, but not at all if it's just getting textured by the
GPU anyway. Similarly, Intel wants to allocate X-tiled if scanout is
possible, but otherwise it wants to be Y/Yf/...-tiled.
That is good to know. How are those decisions made today?
The dumbest way possible: Intel and AMD drivers just force all winsys
buffers to be scanout-compatible, partly as a hangover from X11 where
it was a lot more complicated to schedule composition. Freescale is
as-yet unresolved, but I believe it goes the opposite direction and
never aims for scanout-compatible buffers, except when sat directly on
top of GBM. Something I've been hoping to get to, but endlessly
pre-empted.

I agree it's a massive issue though and something we need to get fixed properly.
Post by Andy Ritger
Post by Daniel Stone
I believe this is entirely doable with GBM right now, taking advantage
of the fact that libgbm.so and libEGL.so must be as tightly paired as
libEGL.so and libGLESv2.so. For all of these, read 'wl_drm' as 'wl_drm
or its equivalent interface in other implementations'.
Firstly, create a new interface in wl_drm to represent a swapchain (in
the Vulkan sense), and modify its buffer-creation requests to take a
swapchain parameter. This we can do without penalty, since the only
users (aside from VA-API, which is really broken and also hopefully
soon to lose its Wayland sink anyway) are EGL_EXT_platform_wayland and
EGL_WL_bind_wayland_display, both within the same DSO.
Secondly, instrument gbm_bo_import's wl_buffer path (proxy for intent
to use a buffer for direct scanout) and EGLImage's
EGL_WAYLAND_BUFFER_WL path (proxy for intent to use via GPU
composition) to determine what the compositor is actually doing with
these buffers, and use that to store target/intent in the swapchain.
Thirdly, when the target/intent changes (e.g. 'was scanout every
frame, has been EGLImage for the last 120 frames'), send an event down
to the client to let it know to modify its allocation. The combination
of EGL/GBM are in the correct place to determine this, since between
them they already have to know the intersection of capabilities
between render and scanout.
Thanks. The suggestion in the second step is particularly interesting.
I haven't tried to poke any holes in the proxy-for-intent cases, yet.
Do you think those inferences are reliable?
Reliable-ish. The gbm_bo_import part is entirely reliable, since that
does only get called in assign_planes, when we've determined that we
would like to use that view as a scanout target. EGLImages will always
be created at attach time, so that's not a determination of intent,
_but_ as the configuration can change at any time without the client
posting new buffers, we do need the buffer to be EGL/GLES-compatible
as our lowest common denominator anyway, so.
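(A sketch of that import path, modelled on Weston's drm backend -
error handling elided, and the wl_buffer resource pointer is a
placeholder:)

    #include <gbm.h>

    static struct gbm_bo *
    try_scanout(struct gbm_device *gbm, void *wl_buffer_resource)
    {
        /* Only called once assign_planes has decided it wants to scan
         * this client buffer out - that call is the "intent" proxy: */
        struct gbm_bo *bo =
            gbm_bo_import(gbm, GBM_BO_IMPORT_WL_BUFFER,
                          wl_buffer_resource, GBM_BO_USE_SCANOUT);
        /* NULL => not scanout-capable; the compositor falls back to
         * texturing via eglCreateImageKHR(dpy, EGL_NO_CONTEXT,
         * EGL_WAYLAND_BUFFER_WL, wl_buffer_resource, NULL). */
        return bo;
    }
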

(All of the above that I'm discussing is specific to Weston. Mutter
does not support composition bypass due to internal architectural
issues - its deep tie to Clutter's scene graph; Enlightenment are
still working heavily on their KMS backend and haven't got to that
point yet; and I'm not sure KWin does either.)
Post by Andy Ritger
Post by Daniel Stone
That still doesn't solve the optimal-display-configuration problem -
that you have generic code determining not only the display strategy
(scanout vs. GPU composition) as well as the exact display controller
configuration - but neither does EGLStreams, or indeed anything
current short of HWC.
Do you see any problem with doing that within GBM? It's not actually
done yet, but then again, neither is direct scanout through Streams.
;)
This definitely seems worth exploring.
Great! Let me know if I can be of any use, if you do end up exploring
this angle.
Post by Andy Ritger
Post by Daniel Stone
Might also be worth striking a common misconception here: the Mesa GBM
implementation is _not_ canonical. gbm.h is the user-facing API you
have to implement, but beyond that, you don't need to be implemented
by Mesa's src/gbm/. As the gbm.h types are all opaque, I'm not sure
what you couldn't express/hide/store - do you have any examples?
Good points. No, I don't have any examples off hand of things that
couldn't be encapsulated within that.
I agree that the Mesa GBM implementation is not canonical. Though, it
would be nice to avoid libgbm.so collisions.
Oh, yes. We should probably avoid creating new glvnd-type issues for
ourselves, yes ...
Post by Andy Ritger
Let me know if I should
ask this separately on, e.g., mesa-dev, but would it be reasonable to
treat Mesa's libgbm as the "vendor neutral" library? It looks like
a vendor implementation could plug in at either of two points:
(a) Load as a "backend" DSO (i.e., get loaded by
mesa/src/gbm/main/backend.c:_gbm_create_device()).
(b) Load as a DRI driver by the DRI libgbm backend (i.e., get loaded
by mesa/src/gbm/backends/dri/gbm_dri.c).
For purposes of vendor-specific opaque data, it looks like (a) would
make the most sense. However, (b) currently conveniently infers a DSO
name to load, by querying the name of the DRM driver that corresponds
to the provided fd. Maybe it would make sense to hoist some of that
inference logic from (b) to (a)? It probably also depends on which of
(a) or (b) we'd consider a stabler ABI?
Yes, I'd suggest that (a) would be the better way to go, with
backend/loader logic pulled up as appropriate. egl_dri ties you quite
heavily into __DRIscreen and __DRIimage interfaces, which get you an
alarming amount of the way towards having a full Mesa driver. I guess
if that's what you guys want to do, then great, but short of that
having your own GBM backend would definitely make the most sense.

Considering that, I'd suggest hoisting the non-gbm_dri parts of GBM
out of Mesa and into a separate repository, and trying to get minigbm
built as a GBM backend as well.
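(A hedged sketch of what option (a) could look like. Mesa's internal
backend interface (gbmint.h) is not an installed or stable ABI, so the
struct layout and the exported symbol name below are hypothetical:)

    #include <gbm.h>

    struct gbm_backend {                 /* mirrors Mesa's gbmint.h idea */
        const char *backend_name;
        struct gbm_device *(*create_device)(int fd);
    };

    static struct gbm_device *
    vendor_create_device(int fd)
    {
        /* Allocate the vendor's gbm_device wrapper and fill in its
         * function table (bo_create, bo_import, surface_create, ...). */
        return NULL; /* elided */
    }

    /* Symbol a loader would dlsym() out of the vendor DSO: */
    const struct gbm_backend gbm_vendor_backend = {
        .backend_name  = "vendor",
        .create_device = vendor_create_device,
    };
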

Cheers,
Daniel
Jonas Ådahl
2016-04-06 08:41:39 UTC
Permalink
Post by Daniel Stone
Hi,
Post by Andy Ritger
Post by Daniel Stone
Post by Andy Ritger
Also, mailbox mode versus FIFO mode should essentially equate to Vsync
off versus Vsync on, respectively. It shouldn't have anything to do
with the benefits of streams, but mailbox mode is a nice feature for
benchmarking games/simulations or naively displaying your latest &
greatest content without tearing.
I agree it's definitely a nice thing to have, but it does bring up the
serialisation issue: we expect any configuration performed by the
client (say, wl_surface::set_opaque_area to let the compositor know
where it can disable blending) to be fully in-line with buffer
attachment. The extreme case of this is resize, but there are quite a
few valid cases where you need serialisation.
I don't know quite off the top of my head how you'd support mailbox
mode with Streams, given this constraint - you need three-way feedback
between the compositor (recording all associated surface state,
including subsurfaces), clients (recording the surface state valid
when that buffer was posted), and the Streams implementation
(determining which frames to dequeue, which to discard and return to
the client, etc).
It is possible we don't get that all completely right in our implementation, yet.
Again this comes down to the synchronisation. In this case, assuming a
client which does:
- wl_egl_surface_resize(w1, h1)
- gl*()
- eglSwapBuffers() <- commit 1
- wl_egl_surface_resize(w2, h2)
- gl*()
- eglSwapBuffers() <- commit 2
For this, you would need some kind of synchronisation, to ensure that
processing commit 1 didn't pick up on the differently-sized frames for
commit 2.
Just to point out the obvious, using wl_egl_surface_resize as a
barrier/separator/synchronization-triggerer is not enough for this. The
mailbox vs FIFO mode needs to be tightly coupled with subsurface
asynchronous/synchronous mode. For example, ignoring the resize calls
in the above code, commit 1 and commit 2 might have drawn UI elements
that are expected to be aligned with subsurfaces that were moved.


Jonas
Daniel Stone
2016-04-06 12:14:26 UTC
Permalink
Hi,
Post by Jonas Ådahl
Post by Daniel Stone
Again this comes down to the synchronisation. In this case, assuming a
client which does:
- wl_egl_surface_resize(w1, h1)
- gl*()
- eglSwapBuffers() <- commit 1
- wl_egl_surface_resize(w2, h2)
- gl*()
- eglSwapBuffers() <- commit 2
For this, you would need some kind of synchronisation, to ensure that
processing commit 1 didn't pick up on the differently-sized frames for
commit 2.
Just to point out the obvious, using wl_egl_surface_resize as a
barrier/separator/synchronization-triggerer is not enough for this. The
mailbox vs FIFO mode needs to be tightly coupled with subsurface
asynchronous/synchronous mode. For example, ignoring the resize calls
in the above code, commit 1 and commit 2 might have drawn UI elements
that are expected to be aligned with subsurfaces that were moved.
Yep, subsurface in particular. It sounds like FIFO mode will DTRT with
some adjustments to the patches, but I just can't for the life of me
see how mailbox mode would work properly in all cases. You'd really
need some kind of explicit synchronisation barriers with the
compositor in order to determine when it was safe to replace one frame
with a newer frame.

Cheers,
Daniel
Miguel Angel Vico
2016-04-06 12:59:19 UTC
Permalink
On Wed, 6 Apr 2016 13:14:26 +0100
Post by Daniel Stone
Hi,
Post by Jonas Ådahl
Post by Daniel Stone
Again this comes down to the synchronisation. In this case, assuming
a client which does:
- wl_egl_surface_resize(w1, h1)
- gl*()
- eglSwapBuffers() <- commit 1
- wl_egl_surface_resize(w2, h2)
- gl*()
- eglSwapBuffers() <- commit 2
For this, you would need some kind of synchronisation, to ensure
that processing commit 1 didn't pick up on the differently-sized
frames for commit 2.
Just to point out the obvious, using wl_egl_surface_resize as a
barrier/separator/synchronization-triggerer is not enough for this.
The mailbox vs FIFO mode needs to be tightly coupled with subsurface
asynchronous/synchronous mode. For example, ignoring the resize
calls in the above code, commit 1 and commit 2 might have drawn UI
elements that are expected to be aligned with subsurfaces that were
moved.
Yep, subsurface in particular. It sounds like FIFO mode will DTRT with
some adjustments to the patches, but I just can't for the life of me
see how mailbox mode would work properly in all cases. You'd really
need some kind of explicit synchronisation barriers with the
compositor in order to determine when it was safe to replace one frame
with a newer frame.
Agree. Mailbox mode is something we are aware we need to revisit for
this particular use case.
Post by Daniel Stone
Cheers,
Daniel
--
Miguel
James Jones
2016-04-29 21:16:28 UTC
Permalink
I was on leave when this discussion was started. Now that I'm back, I'd
like to follow up on a few of the points below.
Post by Daniel Stone
Hi Andy,
Post by Andy Ritger
Thanks for the thorough responses, Daniel.
No problem; as I said, I'm actually really happy to see an
implementation out there.
Post by Andy Ritger
Post by Daniel Stone
Post by Miguel Angel Vico
Similarly, EGLOutput will provide means to access different
portions of display control hardware associated with an EGLDevice.
For instance, EGLOutputLayer represents a portion of display
control hardware that accepts an image as input and processes it
for presentation on a display device.
I still struggle to see the value of what is essentially an
abstraction over KMS, but oh well.
The intent wasn't to abstract all of KMS, just the surface presentation
aspect where EGL and KMS intersect. Besides the other points below,
an additional motivation for abstraction is to allow EGL to work with
the native modesetting APIs on other platforms (e.g., OpenWF on QNX).
Fair enough. And, ah, _that's_ where the OpenWF implementation is - I
was honestly unsure for years since the last implementation I saw was
from the ex-Hybrid NVIDIA guys in Helsinki, back when it was aimed at
Series 60.
Post by Andy Ritger
Post by Daniel Stone
Firstly, again looking at the case where a Wayland client is a stream
producer and the Wayland compositor is a consumer, we move from a
model where references to individual buffers are explicitly passed
through the Wayland protocol, to where those buffers merely carry a
reference to a stream. Again, as stated in the review of 4/7, that
looks like it has the potential to break some actual real-world cases,
and I have no idea how to solve it, other than banning mailbox mode,
which would seem to mostly defeat the point of Streams (more on that
below).
Streams are just a transport for frames. The client still explicitly
communicates when a frame is delivered through the stream via wayland
protocol, and the compositor controls when it grabs a new frame, via
eglStreamConsumerAcquireKHR(). Unless there are bugs in the patches,
the flow of buffers is still explicit and fully under the wayland protocol
and compositor's control.
Right, I believe if you have FIFO mode and strictly enforce
synchronisation to wl_surface::frame, then you should be safe. Mailbox
mode or any other kind of SwapInterval(0) equivalent opens you up to a
series of issues.
Post by Andy Ritger
Also, mailbox mode versus FIFO mode should essentially equate to Vsync
off versus Vsync on, respectively. It shouldn't have anything to do
with the benefits of streams, but mailbox mode is a nice feature for
benchmarking games/simulations or naively displaying your latest &
greatest content without tearing.
I agree it's definitely a nice thing to have, but it does bring up the
serialisation issue: we expect any configuration performed by the
client (say, wl_surface::set_opaque_area to let the compositor know
where it can disable blending) to be fully in-line with buffer
attachment. The extreme case of this is resize, but there are quite a
few valid cases where you need serialisation.
I don't know quite off the top of my head how you'd support mailbox
mode with Streams, given this constraint - you need three-way feedback
between the compositor (recording all associated surface state,
including subsurfaces), clients (recording the surface state valid
when that buffer was posted), and the Streams implementation
(determining which frames to dequeue, which to discard and return to
the client, etc).
Post by Andy Ritger
Post by Daniel Stone
Secondly, looking at the compositor-drm case, the use of the dumb
buffer to display undefined content as a dummy modeset really makes me
uneasy,
Yes, the use of a dumb buffer in this patch series is a kludge. If we
were going to use drmModeSetCrtc + EGLStreams, I think we'd want to
pass no fb to drmModeSetCrtc, but that currently gets rejected by DRM.
Are surface-less modesets intended to be allowable in DRM? I can hunt
that down if that is intended to work. Of course, better to work out
how EGLStreams should cooperate with atomic KMS.
It was definitely an oversight to not zero initialize the dumb buffer.
Right, atomic allows you to separate pipe/CRTC configuration from
plane/overlay configuration. So you'd have two options: one is to use
atomic and require the CRTC be configured with planes off before using
Streams to post flips, and the other is to add KMS configuration to
the EGL output.
Though, now I think of it, this effectively precludes one case, which
is scaling a Streams-sourced buffer inside the display controller. In
the GBM case, the compositor gets every buffer, so can configure the
plane scaling in line with buffer display. I don't see how you'd do
that with Streams.
There's another hurdle to overcome too, which would currently preclude
avoiding the intermediate dumb buffer at all. One of the invariants
the atomic KMS API enforces is that (!!plane->crtc_id ==
!!plane->fb_id), i.e. that a plane cannot be assigned to a CRTC
without an active buffer. So again, we're left with either having the
plane fully configured and active (assigned to a CRTC and displaying,
I assume, a pre-allocated dumb buffer), or pushing more configuration
into Streams - specifically, connecting an EGLOutputLayer to an
EGLOutputPort.
Not having a full mode-setting API within EGL did make this initial
configuration chicken-and-egg problem hard to solve.
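(To make the invariant concrete: with libdrm's atomic API, a plane's
CRTC_ID and FB_ID must be set or cleared together. The property IDs
below would really come from drmModeObjectGetProperties(); here they
are placeholders:)

    #include <xf86drm.h>
    #include <xf86drmMode.h>

    static int
    disable_plane(int fd, uint32_t plane_id,
                  uint32_t prop_crtc_id, uint32_t prop_fb_id)
    {
        drmModeAtomicReq *req = drmModeAtomicAlloc();

        /* OK: CRTC_ID and FB_ID cleared together. Setting one
         * without the other violates !!plane->crtc_id ==
         * !!plane->fb_id and the commit is rejected. */
        drmModeAtomicAddProperty(req, plane_id, prop_crtc_id, 0);
        drmModeAtomicAddProperty(req, plane_id, prop_fb_id, 0);

        int ret = drmModeAtomicCommit(fd, req,
                                      DRM_MODE_ATOMIC_TEST_ONLY, NULL);
        drmModeAtomicFree(req);
        return ret;
    }
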

I agree that EGLStreams/EGLOutput should integrate with atomic better
than is shown in this initial patchset.

Maybe a better way to achieve that would be to give EGL an opportunity
to amend an already created atomic request before committing it? E.g.,

eglStreamsAcquire(dpy, <listOfStreams>, <atomicRequest>);

That would take a filled-out atomic request that does any necessary
reconfiguration and just add the new framebuffers to it from
<listOfStreams>. Any planes that don't need a new frame wouldn't be
included in <listOfStreams> and would keep their current frame. Planes
could also be turned off, moved, re-scaled, etc. Whatever atomic can
express.

Maybe we would need an eglStreamsCheckAcquire/eglStreamsCommitAcquire()
to fail and/or hint to the user that the suggested stream+atomic request
produces sub-optimal results and should be recreated with more optimal
buffers?

In any case, the idea should be nothing would limit the atomic API usage
just because streams are involved.
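(A sketch of how such a call might sit in a compositor's repaint loop.
eglStreamsAcquire() is the proposed entry point from the paragraph
above, not a shipping extension, and its exact signature is guesswork;
the libdrm calls are real:)

    #include <EGL/egl.h>
    #include <EGL/eglext.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    /* Hypothetical prototype for the proposal: */
    EGLBoolean eglStreamsAcquire(EGLDisplay dpy,
                                 const EGLStreamKHR *streams, EGLint n,
                                 drmModeAtomicReq *req);

    static void
    repaint(EGLDisplay dpy, int fd,
            EGLStreamKHR surface_stream, EGLStreamKHR cursor_stream)
    {
        drmModeAtomicReq *req = drmModeAtomicAlloc();
        /* ... add plane moves, scaling, mode changes as usual ... */

        EGLStreamKHR streams[] = { surface_stream, cursor_stream };
        /* EGL would append each stream's newest frame as that
         * plane's FB_ID: */
        eglStreamsAcquire(dpy, streams, 2, req);

        if (drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_NONBLOCK,
                                NULL) < 0)
            ; /* re-check, or recreate with more optimal buffers */
        drmModeAtomicFree(req);
    }
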
Post by Daniel Stone
Post by Andy Ritger
Post by Daniel Stone
Also, I'm not quite sure how you're testing the compositor-as-consumer
mode: I can't seem to see any EGL extensions which allow you to
connect a Wayland surface as an EGLStream consumer. Do you have
something else unpublished that's being used here, or is this what the
libnvidia-egl-wayland library is for? Or do you just have clients
using EGLSurfaces as normal, which happen to be implemented internally
as EGLStreams? (Also, that the only way to test this is through
proprietary drivers implementing only-just-published extensions not
only makes me very sad, but hugely increases the potential for this to
be inadvertently broken.)
Sorry if this seemed cryptic. You are correct that EGL Wayland clients
just use EGLSurfaces as normal (no Wayland client changes), and that
gets implemented using EGLStreams within libnvidia-egl-wayland.
Sorry, I'd missed this whilst reading through.
Post by Andy Ritger
FWIW, we plan to release the source to libnvidia-egl-wayland
eventually... it has a few driver-specific warts right now, but the
intent is that it is a vendor-independent implementation (though, using
EGLStreams, so...) of EGL_KHR_platform_wayland using a set of EGL API
"wrappers". The goal was to allow window systems to write these EGL
platform bindings themselves, so that each EGL implementation doesn't
have to implement each EGL_KHR_platform_*. Anyway, we'll try to get
libnvidia-egl-wayland cleaned up and released.
Interesting!
Post by Andy Ritger
Post by Daniel Stone
Post by Miguel Angel Vico
Thus, a compositor could produce frames and feed them to an
EGLOutputLayer through an EGLStream for presentation on a display
device.
In a similar way, by attaching a GLTexture consumer to a stream, a
producer (wayland client) could feed frames to a texture, which in
turn can be used by a compositor to prepare the final frame to be
presented.
Quick aside: this reminds me in many unfortunate ways of
GLX_EXT_texture_from_pixmap. tfp gave us the same 'capture stream of
stuff and make it appear in a texture' model as streams, whereas most
of the rest of the world (EGL, Vulkan WSI, Wayland, Android, ChromeOS,
etc) have all moved explicitly _away_ from that model to passing
references to individual buffers; this in many ways brings us back to
tfp.
Is that really an accurate comparison? The texture_from_pixmap extension
let X11 composite managers bind a single X pixmap to an OpenGL texture.
It seems to me what was missing in TFP usage was explicit synchronization
between X and/or OpenGL rendering into the pixmap and OpenGL texturing
from the pixmap.
I'd argue that synchronisation (in terms of serialisation with the
rest of the client's protocol stream) is missing from Streams as well,
at least in mailbox mode.
(As an aside, I wonder if it's properly done in FIFO mode as well; the
compositor may very validly choose not to dequeue a buffer if a
surface is completely occluded. How does Streams then know that it can
submit another frame? Generally we use wl_surface::frame to deal with
this - the equivalent of eglSwapInterval(1) - but it sounds like
Streams relies more on strictly-paired internal queue/dequeue pairing
in FIFO mode. Maybe this isn't true.)
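(For reference, the wl_surface::frame throttling in question -
standard Wayland client API, which is how the eglSwapInterval(1)
behaviour is normally obtained:)

    #include <wayland-client.h>

    static void
    frame_done(void *data, struct wl_callback *cb, uint32_t time)
    {
        wl_callback_destroy(cb);
        /* Safe to render and attach the next buffer now. */
    }

    static const struct wl_callback_listener frame_listener = {
        frame_done
    };

    static void
    submit_frame(struct wl_surface *surface, struct wl_buffer *buffer)
    {
        struct wl_callback *cb = wl_surface_frame(surface);
        wl_callback_add_listener(cb, &frame_listener, NULL);
        wl_surface_attach(surface, buffer, 0, 0);
        wl_surface_commit(surface);
    }
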
Post by Andy Ritger
Post by Daniel Stone
Post by Miguel Angel Vico
Whenever EGL_EXT_device_drm extension is present, EGLDevice can
be used to enumerate and access DRM KMS devices, and EGLOutputLayer
to enumerate and access DRM KMS crtcs and planes.
Again, the enumeration isn't so much used as bypassed. The original
enumeration is used, and all we do with the EGL objects is a) list all
of them, b) filter them to find the one we already have, and c)
perhaps replace their internal representation of the device with the
one we already have.
That's fair in the context of this patch set.
In general, EGLDevice provides device enumeration for other use cases
where it is the basis for bootstrapping. Maybe we could better reconcile
udev and EGLDevice in the patch set, but some of this is a natural, though
unfortunate, artifact of correlating objects between two enumeration APIs.
Mind you, this wasn't intended as a criticism, just noting that the
commit message didn't accurately describe the code.
Post by Andy Ritger
Post by Daniel Stone
I'd like to look at the elephant in the room, which is why you're
using this in the first place (aside from general NVIDIA enthusiasm
for encapsulating everything within EGL Streams/Output/Device/etc,
dating back many years). Andy/Aaron, you've said that you found GBM to
be inadequate, and I'd like to find out explicitly how.
Thanks. This is the real heart of the debate.
Yes!
Post by Andy Ritger
Post by Daniel Stone
Through a few
'We can't choose an optimal rendering configuration, because we don't
know how it's going to be used' - (almost completely) untrue. The FD
you pass to gbm_create_device is that of the KMS device, a gbm_surface
contains information as to how the plane (primary or overlay) will be
configured,
Maybe I'm not looking in the right place, but where does gbm_surface get
the intended plane configuration? Are there other display-related flags
beside GBM_BO_USE_SCANOUT? Then again, the particular plane doesn't
impact us for current GPUs.
I believe Andy is correct for current discrete NVIDIA GPUs, but I think
the particular plane configuration does matter on some Tegra display
engines.
Post by Daniel Stone
Well, nowhere. By current plane configuration, I assume you're (to the
extent that you can discuss it) talking about asymmetric plane
capabilities, e.g. support for disjoint colour formats, scaling units,
etc? As Dan V says, I still see Streams as a rather incomplete fix to
this, given that plane assignment is pre-determined: what do you do
when your buffers are configured as optimally as possible, but the
compositor has picked the 'wrong' plane? I really think you need
something like HWC to rewrite your scene graph into the optimal setup.
Streams could provide a way to express that the compositor picked the
wrong plane, but they don't solve the optimal configuration problem.
Configuration is a tricky mix of policy and capabilities that something
like HWComposer or a wayland compositor with access to HW-specific
knowledge needs to solve. I agree with other statements here that
encapsulating direct HW knowledge within individual Wayland compositors
is probably not a great idea, but some separate standard or shared
library taking input from hardware-specific modules and wrangling scene
graphs is probably needed to get optimal behavior.

What streams do is allow allocating the most optimal set of buffers,
and using the most optimal method available to present them, for a
given configuration. So, streams would kick in after the scene-graph
logic has generated a config.
Post by Daniel Stone
Post by Andy Ritger
Post by Daniel Stone
and an EGLDisplay lets you tie the rendering and scanout
devices together. What more information do you need? It's true that we
don't have a way to select individual rendering devices at the moment,
but as said earlier, passing an EGLDevice as an attrib to
GetPlatformDisplay would resolve that, as you would have the render
device identified by the EGLDevice and the scanout device identified
by the gbm_device. At that point, you have the full pipeline and can
determine the optimal configuration.
Beyond choosing optimal rendering configuration, there is arbitration of
the scarce resources needed for optimal rendering configuration. E.g.,
for Wayland compositor flipping to client-produced buffers, presumably the
client's buffer needs to be allocated with GBM_BO_USE_SCANOUT. NVIDIA's
display hardware requires physically contiguous buffers, so we wouldn't
want clients to _always_ allocate buffers with the GBM_BO_USE_SCANOUT
flag. It would be nice to have feedback between the EGL driver instance
in the compositor and the EGL driver running in the client, to know how
the buffer is going to be used by the Wayland compositor.
I imagine other hardware has even more severe constraints on displayable
memory, though, so maybe I'm misunderstanding something about how buffers
are shared between wayland clients and compositors?
Ah! This is something I've very much had in mind for quite a while,
but I keep getting pre-empted, and I didn't bring it up as it didn't
seem implemented in the current patchset. (IIRC, jajones had some code
to allow you to retarget Streams at different consumers, but he's on
leave.)
Also, I should add that there's nothing requiring clients to use GBM
to allocate. The client EGLSurface implementation is free to do purely
internal allocations that are only accessible to it, if it wants to;
gbm_bo_import would then note that the buffer is not usable for
scanout and fail the import, leaving the compositor to fall back to
EGLImage.
Post by Andy Ritger
This ties into the next point...
Post by Daniel Stone
'We don't know when to schedule decompression, because there's no
explicit barrier' - completely untrue. eglSwapBuffers is that barrier.
For example, in Freescale i.MX6, the Vivante GPU and Freescale IPU
(display controller) do not share a single common format between GPU
render targets and IPU scanout sources, so require a mandatory
detiling pass in between render and display. These work just fine with
gbm with that pass scheduled by eglSwapBuffers. This to me seems
completely explicit, unless there was something else you were meaning
... ?
The Vivante+Freescale example is a good one, but it would be more
interesting if they shared /some/ formats and you could only use those
common formats in /some/ cases.
That's also fairly common, particularly for tiling. Intel has more
tiling modes than I can remember, of which only one (X-tiling) is a
valid source for scanout. As you say, physical contiguity is also a
valid requirement, plus pitch alignment.
Post by Andy Ritger
I think a lot of the concern is about passing client-produced frames
all the way through to scanout (i.e., zero-copy). E.g., if the wayland
client is producing frames that the wayland compositor is going to use
as a texture, then we don't want the client to decompress as part of its
eglSwapBuffers: the wayland compositor will texture from the compressed
frame for best performance. But, if the wayland compositor is going to
flip to the surface, then we would want the client to decompress during
its eglSwapBuffers.
Yes, very much so. Taking the Freescale example, you want the client
to do a detiling blit during its swap if the surface is a valid
scanout target, but not at all if it's just getting textured by the
GPU anyway. Similarly, Intel wants to allocate X-tiled if scanout is
possible, but otherwise it wants to be Y/Yf/...-tiled.
Post by Andy Ritger
The nice thing about EGLStreams here is that if the consumer (the Wayland
compositor) wants to use the content in a different way, the producer
must be notified first, in order to produce something suitable for the
new consumer.
I believe this is entirely doable with GBM right now, taking advantage
of the fact that libgbm.so and libEGL.so must be as tightly paired as
libEGL.so and libGLESv2.so. For all of these, read 'wl_drm' as 'wl_drm
or its equivalent interface in other implementations'.
Firstly, create a new interface in wl_drm to represent a swapchain (in
the Vulkan sense), and modify its buffer-creation requests to take a
swapchain parameter. This we can do without penalty, since the only
users (aside from VA-API, which is really broken and also hopefully
soon to lose its Wayland sink anyway) are EGL_EXT_platform_wayland and
EGL_WL_bind_wayland_display, both within the same DSO.
Secondly, instrument gbm_bo_import's wl_buffer path (proxy for intent
to use a buffer for direct scanout) and EGLImage's
EGL_WAYLAND_BUFFER_WL path (proxy for intent to use via GPU
composition) to determine what the compositor is actually doing with
these buffers, and use that to store target/intent in the swapchain.
Thirdly, when the target/intent changes (e.g. 'was scanout every
frame, has been EGLImage for the last 120 frames'), send an event down
to the client to let it know to modify its allocation. The combination
of EGL/GBM are in the correct place to determine this, since between
them they already have to know the intersection of capabilities
between render and scanout.
That still doesn't solve the optimal-display-configuration problem -
that you have generic code determining not only the display strategy
(scanout vs. GPU composition) as well as the exact display controller
configuration - but neither does EGLStreams, or indeed anything
current short of HWC.
Do you see any problem with doing that within GBM? It's not actually
done yet, but then again, neither is direct scanout through Streams.
;)
With new Wayland protocol, patches to all Wayland compositors to send
proper hints to clients using this protocol, improvements to GBM, and
updates to both of these when new GPU architectures introduced new
requirements, what you describe could do anything streams can do.
However, then the problem will have been solved only in the context of
top-of-tree Wayland and Weston.

There are far more use cases for streams or similar producer/consumer
constructs than Wayland. Streams allow drivers to solve the problem in
one place. Streams also allow vendors to ship new drivers when new
hardware appears that will enable that new hardware to work (and work
optimally, scenegraph issues aside) with existing compositors and
applications without modification. That second point is a guiding
principle for what should be encapsulated within a driver API vs. what
should be on the application side.
Post by Daniel Stone
Post by Andy Ritger
Post by Daniel Stone
'Width, height, pitch and format aren't enough information' - this is
true, but not necessarily relevant. I'm not sure what the source of
this actually is: is it the gbm_bo_get_*() APIs? If so, yes, they need
to be extended with a gbm_bo_get_modifier() call, which would allow
you to get the DRM format modifier to describe tiling/compression/et
al (as well as perhaps being extended to allow you to extract multiple
buffers/planes, e.g. to attach auxiliary compression buffers). If it's
not gbm, what actually is it? The only other place I can think of
(suggested by Pekka, I think) was the wl_drm protocol, which it should
be stressed is a) not required in any way by Wayland, b) not a
published/public protocol, c) not a stable protocol. wl_drm just
happens to be the way that Mesa shares buffers, just as wl_viv is how
Vivante's proprietary driver shares buffers, and mali_buffer_sharing
is how the Mali driver does it. Since the server side is bound by
eglBindWaylandDisplayWL and the client side is also only used through
EGL, there is _no_ requirement for you to also implement wl_drm. As it
is a hidden private Mesa protocol, there is also no requirement for
the protocol to remain stable.
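(A sketch of the proposed query - calls of this shape were later
merged into Mesa's gbm.h, but treat them as illustrative here:)

    #include <gbm.h>

    static void
    describe_bo(struct gbm_bo *bo)
    {
        uint64_t modifier = gbm_bo_get_modifier(bo); /* tiling etc. */
        int n = gbm_bo_get_plane_count(bo);  /* aux buffers/planes */
        for (int i = 0; i < n; i++) {
            union gbm_bo_handle h = gbm_bo_get_handle_for_plane(bo, i);
            uint32_t stride = gbm_bo_get_stride_for_plane(bo, i);
            /* handles/strides/modifier would feed e.g.
             * drmModeAddFB2WithModifiers(). */
            (void)h; (void)stride;
        }
        (void)modifier;
    }
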
I agree that wl_drm doesn't factor into it.
Maybe some of this is my confusion over what parts of gbm.h are
application-facing, and what parts are driver-facing? We, and
presumably most hardware vendors, would want the ability to associate
arbitrary metadata with gbm_bo's, but most of that metadata is
implementation-specific, and not really something an application should
be looking at without sacrificing portability.
All of gbm.h is user-facing; how you implement that API is completely
up to you, including arbitrary metadata. For instance, it's the driver
that allocates its own struct gbm_surface/gbo_bo/etc (which is
opaque), so it can do whatever it likes in terms of metadata. Is there
anything in particular you're thinking of that you're not sure you'd
be able to store portably?
Might also be worth striking a common misconception here: the Mesa GBM
implementation is _not_ canonical. gbm.h is the user-facing API you
have to implement, but beyond that, you don't need to be implemented
by Mesa's src/gbm/. As the gbm.h types are all opaque, I'm not sure
what you couldn't express/hide/store - do you have any examples?
If we could work out how to install vendor-specific GBM implementations,
I believe you're correct, the API is sufficiently high-level to
represent our allocation metadata.
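(Sketch: since gbm.h only exposes opaque pointers, a vendor
implementation is free to define the structs itself. The metadata
fields below are purely illustrative:)

    struct gbm_bo {                  /* vendor's private definition */
        struct gbm_device *device;
        uint32_t width, height, format;
        /* Vendor-specific allocation metadata, never visible to
         * applications through gbm.h: */
        uint64_t tiling_layout;
        int      compressed;
        void    *vidmem_handle;
    };
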
Post by Daniel Stone
Post by Andy Ritger
Post by Daniel Stone
'EGLStreams is the direction taken in Vulkan' - I would argue not. IMO
the explicit buffer management on the client side does not parallel
EGLStreams, and notably there is no equivalent consumer interface
offered on the server side, but instead the individual-buffer-driven
approach is taken. It's true that VK_WSI_display_swapchain does exist
and does match the EGLStreams model fairly closely, but also that it
does not have universal implementation: the Intel 'anv' Mesa-based
driver does not implement display_swapchain, instead having an
interface to export a VkImage as a dmabuf. It's true that the latter
is not optimal (it lacks the explicit targeting required to determine
the most optimal tiling/compression strategy), but OTOH it is
precedent for explicitly avoiding the
VK_WSI_display_swapchain/EGLStreams model for Vulkan on KMS, just as
GBM avoids it for EGL on KMS.
From your perspective, what would be more optimal than VkImage+dmabuf?
Well, it's pretty much on par with GBM-compositor-Wayland-client and
an EGLStreams pipeline ending in an EGLOutput. Not having something
like HWC means that you can't determine the optimal plane-allocation
strategy.
Post by Andy Ritger
Post by Daniel Stone
Agreed. One of the things I've been incredibly happy with is how our
platform has managed to stay completely generic and vendor-neutral so
far, and I'd love to preserve that.
I don't think you'll find any disagreement to that from NVIDIA, either.
I apologize if the EGLStreams proposal gave the impression of a
vendor-private solution. That wasn't the intent. The EGLStream family
of extensions are, after all, an open specification that any EGL vendor
can implement. If there are aspects of any of these EGL extensions that
seem useful, I'd hope that Mesa would be willing to adopt them.
Indeed, this wasn't to cast any aspersions on how you guys have
developed Streams. Having it out there and having these patches has
really been tremendously useful.
Post by Andy Ritger
We (NVIDIA) clearly think EGLStreams is a good direction for expressing
buffer sharing semantics. In our ideal world, everyone would implement
these extensions and Wayland compositors would migrate to using them as
the generic vendor-neutral mechanism for buffer sharing :)
But here's where my problem lies. At the moment, the 'how do I
Wayland' story is very straightforward, and not entirely
coincidentally similar to ChromeOS's: you implement GBM+KMS, you
implement the ~25 LoC of libwayland-egl, you implement
EGL_EXT_platform_{gbm,wayland}, and ... that's it. Introducing Streams
as an alternate model is certainly interesting, and I understand why
you would do it, but having it as the sole option muddies the 'how do
I Wayland' story significantly.
Getting away from the vendor-bound DDX model was something we were
desperate to do (see also xf86-video-modesetting landing on GBM+EGL),
and I'd really just like to avoid that becoming 'well, for most
platforms you do this, but for this platform / these platforms, you do
this instead ...'.
I also have no desire to start creating a Wayland DDX system. The fact
that we all seem to agree something like hwcomposer may be needed to
make things like Wayland optimal does not bode well for that, but that's
a separate discussion.

Yes, streams introduce a slightly different way of doing things than
GBM+the wl_drm protocol. However, the differences are minimal. I don't
think the patchset Miguel has proposed is that invasive, and as we say,
there's nothing preventing Mesa and others from implementing streams as
well. They're part of an open standard, and we'd certainly welcome
collaboration on the specifications. I hope we can at least consider
EGLStreams as a potentially better solution, even if it wasn't the first
solution.

Further, another thing I'd like to get rid of is "implement the ~25 LoC
of libwayland-egl". Streams let us do that. I want Wayland support to
be the last windowing/compositing system for which driver vendors need
to explicitly maintain support in their code. Once we clean up &
standardize the very minimal driver interfaces beyond current EGL that
our libwayland-egl code is using, anyone should be able to write a
windowing system and provide hooks to enable any EGL driver supporting
the standardized window system hook ABI to run as a client of it. The
same should be done for Vulkan WSI platforms, where the per-platform
driver API is already even more self-contained. In other words, my hope
is that Wayland EGL and Vulkan support will soon be something that ships
with GLVND and the Vulkan common loader, not with the drivers.
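(For scale, the "~25 LoC" being discussed is roughly the following,
paraphrasing the stock wayland-egl.c of the time; the struct normally
lives in wayland-egl-priv.h:)

    #include <stdlib.h>

    struct wl_egl_window {
        struct wl_surface *surface;
        int width, height, dx, dy;
        void (*resize_callback)(struct wl_egl_window *, void *);
        void *driver_private;
    };

    void
    wl_egl_window_resize(struct wl_egl_window *win,
                         int width, int height, int dx, int dy)
    {
        win->width = width;
        win->height = height;
        win->dx = dx;
        win->dy = dy;
        if (win->resize_callback)        /* the EGL driver hooks in here */
            win->resize_callback(win, win->driver_private);
    }

    struct wl_egl_window *
    wl_egl_window_create(struct wl_surface *surface, int width, int height)
    {
        struct wl_egl_window *win = calloc(1, sizeof *win);
        if (!win)
            return NULL;
        win->surface = surface;
        wl_egl_window_resize(win, width, height, 0, 0);
        return win;
    }

    void
    wl_egl_window_destroy(struct wl_egl_window *win)
    {
        free(win);
    }
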

Thanks,
-James
Post by Daniel Stone
Cheers,
Daniel
Daniel Stone
2016-04-29 22:07:01 UTC
Permalink
Hi James,
Post by James Jones
I was on leave when this discussion was started. Now that I'm back, I'd
like to follow up on a few of the points below.
Welcome back!
Post by James Jones
Post by Daniel Stone
Right, atomic allows you to separate pipe/CRTC configuration from
plane/overlay configuration. So you'd have two options: one is to use
atomic and require the CRTC be configured with planes off before using
Streams to post flips, and the other is to add KMS configuration to
the EGL output.
Though, now I think of it, this effectively precludes one case, which
is scaling a Streams-sourced buffer inside the display controller. In
the GBM case, the compositor gets every buffer, so can configure the
plane scaling in line with buffer display. I don't see how you'd do
that with Streams.
There's another hurdle to overcome too, which would currently preclude
avoiding the intermediate dumb buffer at all. One of the invariants
the atomic KMS API enforces is that (!!plane->crtc_id ==
!!plane->fb_id), i.e. that a plane cannot be assigned to a CRTC
without an active buffer. So again, we're left with either having the
plane fully configured and active (assigned to a CRTC and displaying,
I assume, a pre-allocated dumb buffer), or pushing more configuration
into Streams - specifically, connecting an EGLOutputLayer to an
EGLOutputPort.
Not having a full mode-setting API within EGL did make this initial
configuration chicken-and-egg problem hard to solve.
I agree that EGLStreams/EGLOutput should integrate with atomic better than
is shown in this initial patchset.
Maybe a better way to achieve that would be to give EGL an opportunity to
amend an already created atomic request before committing it? E.g.,
eglStreamsAcquire(dpy, <listOfStreams>, <atomicRequest>);
That would take a filled-out atomic request that does any necessary
reconfiguration and just add the new framebuffers to it from
<listOfStreams>. Any planes that don't need a new frame wouldn't be
included in <listOfStreams> and would keep their current frame. Planes
could also be turned off, moved, re-scaled, etc. Whatever atomic can
express.
Maybe we would need an eglStreamsCheckAcquire/eglStreamsCommitAcquire() to
fail and/or hint to the user that the suggested stream+atomic request
produces sub-optimal results and should be recreated with more optimal
buffers?
In any case, the idea should be nothing would limit the atomic API usage
just because streams are involved.
That is indeed a possibility, though I'm concerned that it leaks KMS
atomic details through the Streams API. Certainly if the check failed,
you'd need to rewind using the atomic cursor API for it to be useful. It
would also complicate the Streams implementation, as you'd need the
operation to be a 'peek' at the stream head, rather than popping the
frame for a test, failing, and then blocking waiting for a new frame.
You'd also need somewhere to store a reference to that frame, so you
could reuse it later (say you turn the display off and later turn it
back on).

The alternative is, as you allude to, to push the modesetting into
EGL, so that the application feeds EGL its desired outcome and lets
EGL determine the optimal configuration, rather than driving the two
APIs in lockstep.
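(The rewind in question, using real libdrm calls: snapshot the request
cursor, speculatively add state, test, and roll back on failure.
Property IDs are placeholders:)

    #include <xf86drm.h>
    #include <xf86drmMode.h>

    static int
    try_flip(int fd, drmModeAtomicReq *req, uint32_t plane_id,
             uint32_t prop_fb_id, uint32_t prop_crtc_id,
             uint32_t new_fb, uint32_t crtc_id)
    {
        int cursor = drmModeAtomicGetCursor(req);

        drmModeAtomicAddProperty(req, plane_id, prop_fb_id, new_fb);
        drmModeAtomicAddProperty(req, plane_id, prop_crtc_id, crtc_id);

        int ret = drmModeAtomicCommit(fd, req,
                                      DRM_MODE_ATOMIC_TEST_ONLY, NULL);
        if (ret < 0)
            drmModeAtomicSetCursor(req, cursor); /* drop speculative adds */
        return ret;
    }
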
Post by James Jones
Post by Daniel Stone
Well, nowhere. By current plane configuration, I assume you're (to the
extent that you can discuss it) talking about asymmetric plane
capabilities, e.g. support for disjoint colour formats, scaling units,
etc? As Dan V says, I still see Streams as a rather incomplete fix to
this, given that plane assignment is pre-determined: what do you do
when your buffers are configured as optimally as possible, but the
compositor has picked the 'wrong' plane? I really think you need
something like HWC to rewrite your scene graph into the optimal setup.
Streams could provide a way to express that the compositor picked the wrong
plane, but they don't solve the optimal configuration problem. Configuration
is a tricky mix of policy and capabilities that something like HWComposer or
a wayland compositor with access to HW-specific knowledge needs to solve. I
agree with other statements here that encapsulating direct HW knowledge
within individual Wayland compositors is probably not a great idea, but some
separate standard or shared library taking input from hardware-specific
modules and wrangling scene graphs is probably needed to get optimal
behavior.
Yeah, I would lean towards HWC itself, but that's a separate discussion.
Post by James Jones
Post by Daniel Stone
Do you see any problem with doing that within GBM? It's not actually
done yet, but then again, neither is direct scanout through Streams.
;)
With new Wayland protocol, patches to all Wayland compositors to send proper
hints to clients using this protocol, improvements to GBM, and updates to
both of these when new GPU architectures introduce new requirements, what
you describe could do anything streams can do. However, then the problem
will have been solved only in the context of top-of-tree Wayland and Weston.
This doesn't require explicit/new compositor interaction at all.
Extensions can be done within the gbm/EGL bundle itself (via
EGL_WL_bind_wayland_display), so you're only changing one DSO (or DSO
bundle), and the API usage there today does seem to stand up. Given
that the protocol is private - I'm certainly not advocating for a
DRI2-style all-things-to-all-hardware standard protocol to communicate
this - and that it's localised in a vendor bundle, it seems widely
applicable to me. As someone who's writing this from
Mutter/Wayland/GBM, I'm certainly not interested in Weston-only
solutions.
Post by James Jones
There are far more use cases for streams or similar producer/consumer
constructs than Wayland. Streams allow drivers to solve the problem in one
place.
Certainly there are, but then again, there are far more usecases than
EGL. Looking at media playback, Vulkan, etc, where you don't have EGL
yet need to solve the same problems.
Post by James Jones
Streams also allow vendors to ship new drivers when new hardware
appears that will enable that new hardware to work (and work optimally,
scenegraph issues aside) with existing compositors and applications without
modification. That second point is a guiding principle for what should be
encapsulated within a driver API vs. what should be on the application side.
I agree, and I'm not arguing this to be on the application or
compositor side either. I believe the GBM and HWC suggestions are
entirely doable, and further that these problems will need to be
solved outside EGL anyway, for the other usecases. My worry - quite
aside from how vendors who struggle to produce a conformant EGL 1.4
implementation today will ever implement the complexity of Streams,
though this isn't your problem - is that EGL is really the wrong place
to be solving this.
Post by James Jones
Post by Daniel Stone
All of gbm.h is user-facing; how you implement that API is completely
up to you, including arbitrary metadata. For instance, it's the driver
that allocates its own struct gbm_surface/gbo_bo/etc (which is
opaque), so it can do whatever it likes in terms of metadata. Is there
anything in particular you're thinking of that you're not sure you'd
be able to store portably?
Might also be worth striking a common misconception here: the Mesa GBM
implementation is _not_ canonical. gbm.h is the user-facing API you
have to implement, but beyond that, you don't need to be implemented
by Mesa's src/gbm/. As the gbm.h types are all opaque, I'm not sure
what you couldn't express/hide/store - do you have any examples?
If we could work out how to install vendor-specific GBM implementations, I
believe you're correct, the API is sufficiently high-level to represent our
allocation metadata.
After some thought, I've come around to the view that we should
declare the Mesa implementation canonical and allow others to install
plugins into it.
The EGLDisplay -> gbm_device bind happens too late to do it otherwise,
I think.
Post by James Jones
Yes, streams introduce a slightly different way of doing things than GBM+the
wl_drm protocol. However, the differences are minimal. I don't think the
patchset Miguel has proposed is that invasive, and as we say, there's
nothing preventing Mesa and others from implementing streams as well.
I think it's large enough that it warrants a split of gl-renderer and
compositor-drm, rather than trying to shoehorn them into the same
file. There's going to be quite some complexity hiding between the
synchronise-with-client-event-stream and direct-scanout boxes, that
will push it over the limit of what's tractable. Those files are
already pretty huge and complex.
Post by James Jones
They're part of an open standard, and we'd certainly welcome collaboration
on the specifications. I hope we can at least consider EGLStreams as a
potentially better solution, even if it wasn't the first solution.
Further, another thing I'd like to get rid of is "implement the ~25 LoC of
libwayland-egl". Streams let us do that. I want Wayland support to be the
last windowing/compositing system for which driver vendors need to
explicitly maintain support in their code. Once we clean up & standardize
the very minimal driver interfaces beyond current EGL that our
libwayland-egl code is using, anyone should be able to write a windowing
system and provide hooks to enable any EGL driver supporting the
standardized window system hook ABI to run as a client of it. The same
should be done for Vulkan WSI platforms, where the per-platform driver API
is already even more self-contained. In other words, my hope is that
Wayland EGL and Vulkan support will soon be something that ships with GLVND
and the Vulkan common loader, not with the drivers.
I share the hope, and maybe with the WSI and Streams available, we can
design future window systems and display control APIs towards
something like that. But at the moment, the impedance mismatch between
Streams and the (deliberately very different) Wayland and KMS APIs is
already fairly glaring. The winsys support is absolutely trivial to
write, and with winsys interactions only getting more featureful and
complex, so too will the common stream protocol have to become.

If I was starting from the position of the EGL ideal: that everything
is EGL, and the only external interactions are creating native types
for it, then I would surely arrive at the same position as you. But
everything we've seen so far - and again, ChromeOS have taken this to
a much further extent - has been chipping away at EGL, rather than
putting more into it, and this has been for the better. I don't think
that's a difference we'll ever resolve though.

Cheers,
Daniel
James Jones
2016-05-03 16:07:12 UTC
Permalink
Post by Daniel Stone
Hi James,
Post by James Jones
I was on leave when this discussion was started. Now that I'm back, I'd
like to follow up on a few of the points below.
Welcome back!
Thanks!
Post by Daniel Stone
Post by James Jones
Post by Daniel Stone
Right, atomic allows you to separate pipe/CRTC configuration from
plane/overlay configuration. So you'd have two options: one is to use
atomic and require the CRTC be configured with planes off before using
Streams to post flips, and the other is to add KMS configuration to
the EGL output.
Though, now I think of it, this effectively precludes one case, which
is scaling a Streams-sourced buffer inside the display controller. In
the GBM case, the compositor gets every buffer, so can configure the
plane scaling in line with buffer display. I don't see how you'd do
that with Streams.
There's another hurdle to overcome too, which would currently preclude
avoiding the intermediate dumb buffer at all. One of the invariants
the atomic KMS API enforces is that (!!plane->crtc_id ==
!!plane->fb_id), i.e. that a plane cannot be assigned to a CRTC
without an active buffer. So again, we're left with either having the
plane fully configured and active (assigned to a CRTC and displaying,
I assume, a pre-allocated dumb buffer), or pushing more configuration
into Streams - specifically, connecting an EGLOutputLayer to an
EGLOutputPort.
Not having a full mode-setting API within EGL did make this initial
configuration chicken-and-egg problem hard to solve.
I agree that EGLStreams/EGLOutput should integrate with atomic better than
is shown in this initial patchset.
Maybe a better way to achieve that would be to give EGL an opportunity to
amend an already created atomic request before committing it? E.g.,
eglStreamsAcquire(dpy, <listOfStreams>, <atomicRequest>);
That would take a filled-out atomic request that does any necessary
reconfiguration and just add the new framebuffers to it from
<listOfStreams>. Any planes that don't need a new frame wouldn't be
included in <listOfStreams> and would keep their current frame. Planes
could also be turned off, moved, re-scaled, etc. Whatever atomic can
express.
Maybe we would need an eglStreamsCheckAcquire/eglStreamsCommitAcquire() to
fail and/or hint to the user that the suggested stream+atomic request
produces sub-optimal results and should be recreated with more optimal
buffers?
In any case, the idea should be nothing would limit the atomic API usage
just because streams are involved.
That is indeed a possibility, though I'm concerned that it leaks KMS
atomic details through the Streams API.
The atomic/KMS usage, like the DRM integration itself, would be optional
though. This wouldn't, for example, leak KMS details into an
OpenWF-based EGLOutput+EGLStream application. I don't see it as any
worse than having EGL_KHR_platform_x11 and friends, for example.
Post by Daniel Stone
Certainly if the check failed,
you'd need to rewind using the atomic cursor API for it to be useful. It
would also complicate the Streams implementation, as you'd need the
operation to be a 'peek' at the stream head, rather than popping the
frame for a test, failing, and then blocking waiting for a new frame.
You'd also need somewhere to store a reference to that frame, so you
could reuse it later (say you turn the display off and later turn it
back on).
I believe streams require all this already. They need to maintain a
reference to the current frame for re-use if a new one is not available,
and they need to essentially "peek" at the beginning of an acquire and
"commit" at the end, so exposing that via the API wouldn't be a large
change.
Post by Daniel Stone
The alternative is, as you allude to, to push the modesetting into
EGL, so that the application feeds EGL its desired outcome and lets
EGL determine the optimal configuration, rather than driving the two
APIs in lockstep.
Indeed.
Post by Daniel Stone
Post by James Jones
Post by Daniel Stone
Well, nowhere. By current plane configuration, I assume you're (to the
extent that you can discuss it) talking about asymmetric plane
capabilities, e.g. support for disjoint colour formats, scaling units,
etc? As Dan V says, I still see Streams as a rather incomplete fix to
this, given that plane assignment is pre-determined: what do you do
when your buffers are configured as optimally as possible, but the
compositor has picked the 'wrong' plane? I really think you need
something like HWC to rewrite your scene graph into the optimal setup.
Streams could provide a way to express that the compositor picked the wrong
plane, but they don't solve the optimal configuration problem. Configuration
is a tricky mix of policy and capabilities that something like HWComposer or
a wayland compositor with access to HW-specific knowledge needs to solve. I
agree with other statements here that encapsulating direct HW knowledge
within individual Wayland compositors is probably not a great idea, but some
separate standard or shared library taking input from hardware-specific
modules and wrangling scene graphs is probably needed to get optimal
behavior.
Yeah, I would lean towards HWC itself, but that's a separate discussion.
Post by James Jones
Post by Daniel Stone
Do you see any problem with doing that within GBM? It's not actually
done yet, but then again, neither is direct scanout through Streams.
;)
With new Wayland protocol, patches to all Wayland compositors to send proper
hints to clients using this protocol, improvements to GBM, and updates to
both of these when new GPU architectures introduce new requirements, what
you describe could do anything streams can do. However, then the problem
will have been solved only in the context of top-of-tree Wayland and Weston.
This doesn't require explicit/new compositor interaction at all.
Extensions can be done within the gbm/EGL bundle itself (via
EGL_WL_bind_wayland_display), so you're only changing one DSO (or DSO
bundle), and the API usage there today does seem to stand up. Given
that the protocol is private - I'm certainly not advocating for a
DRI2-style all-things-to-all-hardware standard protocol to communicate
this - and that it's localised in a vendor bundle, it seems widely
applicable to me. As someone who's writing this from
Mutter/Wayland/GBM, I'm certainly not interested in Weston-only
solutions.
No, the necessary extensions can not be contained within the binding.
There is not enough information within the driver layer alone. Something
needs to tell the driver when the configuration changes (E.g., the
consumer of a wayland surface switches from a texture to a plane) and
what the new configuration is. This would trigger the protocol
notifications & subsequent optimization within the driver. By the
nature of their API, streams would require the compositor to take action
on such configuration changes, and streams can discover the new
configuration. Something equivalent would be required to make this work
in the GBM+wl_drm/EGL case.

Further, as a driver vendor, the idea of requiring even in-driver
platform-specific modifications for this sounds undesirable. If it was
something that could be contained entirely within GBM, that would be
interesting. However, distributing the architecture-specific code
throughout the window-system specific code in the driver means a lot
more maintenance burden in a world with X, Chrome OS, Wayland, and
several others.
Post by Daniel Stone
Post by James Jones
There are far more use cases for streams or similar producer/consumer
constructs than Wayland. Streams allow drivers to solve the problem in one
place.
Certainly there are, but then again, there are far more usecases than
EGL. Looking at media playback, Vulkan, etc, where you don't have EGL
yet need to solve the same problems.
EGLStreams, Vulkan swapchains, and (for example) VDPAU presentation
queues are all varying levels of abstraction on top of the same thing
within the driver: a presentation engine or buffer queue, depending on
whether the target is a physical output or a compositor. These
API-level components can be hooked up to eachother as long as the
lower-level details are fully contained within the driver abstraction.
A Vulkan swapchain can be internally implemented as an EGLStream
producer, for example. In fact, Vulkan swapchains borrow many ideas
directly and indirectly from EGLStream.
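(For comparison, the swapchain-side expression of that same buffer
queue - standard VK_KHR_swapchain usage; vk_surface and device come
from the usual instance setup, and the values are illustrative:)

    #include <vulkan/vulkan.h>

    VkSwapchainCreateInfoKHR info = {
        .sType            = VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR,
        .surface          = vk_surface,
        .minImageCount    = 3,
        .imageFormat      = VK_FORMAT_B8G8R8A8_UNORM,
        .imageColorSpace  = VK_COLOR_SPACE_SRGB_NONLINEAR_KHR,
        .imageExtent      = { 1920, 1080 },
        .imageArrayLayers = 1,
        .imageUsage       = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT,
        .preTransform     = VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR,
        .compositeAlpha   = VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR,
        .presentMode      = VK_PRESENT_MODE_FIFO_KHR, /* ~ Streams FIFO */
        .clipped          = VK_TRUE,
    };
    VkSwapchainKHR swapchain;
    vkCreateSwapchainKHR(device, &info, NULL, &swapchain);
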
Post by Daniel Stone
Post by James Jones
Streams also allow vendors to ship new drivers when new hardware
appears that will enable that new hardware to work (and work optimally,
scenegraph issues aside) with existing compositors and applications without
modification. That second point is a guiding principle for what should be
encapsulated within a driver API vs. what should be on the application side.
I agree, and I'm not arguing this to be on the application or
compositor side either. I believe the GBM and HWC suggestions are
entirely doable, and further that these problems will need to be
solved outside EGL anyway, for the other usecases. My worry - quite
aside from how vendors who struggle to produce a conformant EGL 1.4
implementation today will ever implement the complexity of Streams,
though this isn't your problem - is that EGL is really the wrong place
to be solving this.
Could you elaborate on what the other usecases are? If you mean the
Vulkan/media playback cases mentioned above, then I don't see what is
fundamentally wrong about using EGL as a backend within the window
system for those. If a Vulkan application needs to display on an
EGL+GLES-based Wayland compositor, there will be some point where a
transition is made from Vulkan -> EGL+GLES regardless.
Post by Daniel Stone
Post by James Jones
Post by Daniel Stone
All of gbm.h is user-facing; how you implement that API is completely
up to you, including arbitrary metadata. For instance, it's the driver
that allocates its own struct gbm_surface/gbo_bo/etc (which is
opaque), so it can do whatever it likes in terms of metadata. Is there
anything in particular you're thinking of that you're not sure you'd
be able to store portably?
Might also be worth striking a common misconception here: the Mesa GBM
implementation is _not_ canonical. gbm.h is the user-facing API you
have to implement, but beyond that, you don't need to be implemented
by Mesa's src/gbm/. As the gbm.h types are all opaque, I'm not sure
what you couldn't express/hide/store - do you have any examples?
If we could work out how to install vendor-specific GBM implementations, I
believe you're correct, the API is sufficiently high-level to represent our
allocation metadata.
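For illustration (every field below is invented; this is not our actual
implementation), the opaque types leave room for something like:

#include <gbm.h>
#include <stdint.h>

/* A vendor-internal completion of the opaque type from gbm.h.
 * Applications only ever see 'struct gbm_bo *', so anything can be
 * stored behind it. */
struct gbm_bo {
    struct gbm_device *device;
    uint32_t width, height, stride, format;
    union gbm_bo_handle handle;
    void *user_data;

    /* Private allocation metadata, invisible through the public API: */
    uint64_t internal_layout;   /* e.g. a tiling/compression descriptor */
    uint32_t placement_flags;   /* e.g. which memory pool to use */
};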
After some thought, I've come around to the view that we should
declare the Mesa implementation canonical and allow others to install plugins.
The EGLDisplay -> gbm_device bind happens too late to do it otherwise,
I think.
Post by James Jones
Yes, streams introduce a slightly different way of doing things than GBM+the
wl_drm protocol. However, the differences are minimal. I don't think the
patchset Miguel has proposed is that invasive, and as we say, there's
nothing preventing Mesa and others from implementing streams as well.
I think it's large enough that it warrants a split of gl-renderer and
compositor-drm, rather than trying to shoehorn them into the same
file. There's going to be quite some complexity hiding between the
synchronise-with-client-event-stream and direct-scanout boxes, that
will push it over the limit of what's tractable. Those files are
already pretty huge and complex.
Would it be better to wait until such complexities arise in future
patches and split the files at that point, or would you prefer we split
the backends now? Perhaps I'm just more optimistic about the
complexity, but it seems like it would be easier to evaluate once that
currently-hypothetical portion of the code exists.
Post by Daniel Stone
Post by James Jones
They're part of an open standard, and we'd certainly welcome collaboration
on the specifications. I hope we can at least consider EGLStreams as a
potentially better solution, even if it wasn't the first solution.
Further, another thing I'd like to get rid of is "implement the ~25 LoC of
libwayland-egl". Streams let us do that. I want Wayland support to be the
last windowing/compositing system for which driver vendors need to
explicitly maintain support in their code. Once we clean up & standardize
the very minimal driver interfaces beyond current EGL that our
libwayland-egl code is using, anyone should be able to write a windowing
system and provide hooks to enable any EGL driver supporting the
standardized window system hook ABI to run as a client of it. The same
should be done for Vulkan WSI platforms, where the per-platform driver API
is already even more self-contained. In other words, my hope is that
Wayland EGL and Vulkan support will soon be something that ships with GLVND
and the Vulkan common loader, not with the drivers.
I share the hope, and maybe with the WSI and Streams available, we can
design future window systems and display control APIs towards
something like that. But at the moment, the impedance mismatch between
Streams and the (deliberately very different) Wayland and KMS APIs is
already fairly glaring. The winsys support is absolutely trivial to
write, and as winsys interactions only get more featureful and
complex, the common stream protocol will have to follow suit.
If I was starting from the position of the EGL ideal: that everything
is EGL, and the only external interactions are creating native types
for it, then I would surely arrive at the same position as you. But
everything we've seen so far - and again, ChromeOS have taken this to
a much further extent - has been chipping away at EGL, rather than
putting more into it, and this has been for the better.
The direction ChromeOS is taking is even more problematic, and I'd hate
to see it being held up as an example of proper design direction. We
spent a good deal of time working with Google to support ChromeOS and
ended up essentially allowing them to punch through the driver
abstraction via very opaque EGL extensions that no engineer besides the
extension authors could be expected to use correctly, and embed
HW-specific knowledge within some component of ChromeOS, such that it
will likely only run optimally on a single generation of our hardware
and will need to be revisited. That's the type of problem we're trying
to avoid here. ChromeOS has made other design compromises that cost us
(and I suspect other vendors) 10-20% performance across the board to
optimize for a very specific use case (i.e., a browser) and within very
constrained schedules. It is not the right direction for OS<->graphics
driver interactions to evolve.
Post by Daniel Stone
I don't think that's a difference we'll ever resolve though.
I believe thus far we've all tried to focus objectively on specific
issues, proposed solutions for them, and the merits of those solutions.
Weston and the other Wayland compositors I'm aware of are based on EGL
at the moment, so regardless of its merits as an API it doesn't seem
problematic purely from a dependency standpoint to add EGLStream as an
option next to the existing EGLImage and EGLDisplay+GBM paths. I'm
certainly willing to continue discussing the merits of EGL on a broader
scale, but does that discussion need to block the patches proposed here?

Thanks,
-James
Post by Daniel Stone
Cheers,
Daniel
Daniel Stone
2016-05-03 16:53:03 UTC
Permalink
Hi James,
Post by Daniel Stone
Post by James Jones
With new Wayland protocol, patches to all Wayland compositors to send proper
hints to clients using this protocol, improvements to GBM, and updates to
both of these when new GPU architectures introduced new requirements, what
you describe could do anything streams can do. However, then the problem
will have been solved only in the context of top-of-tree Wayland and Weston.
This doesn't require explicit/new compositor interaction at all.
Extensions can be done within the gbm/EGL bundle itself (via
EGL_WL_bind_wayland_display), so you're only changing one DSO (or DSO
bundle), and the API usage there today does seem to stand up. Given
that the protocol is private - I'm certainly not advocating for a
DRI2-style all-things-to-all-hardware standard protocol to communicate
this - and that it's localised in a vendor bundle, it seems completely
widely applicable to me. As someone who's writing this from
Mutter/Wayland/GBM, I'm certainly not interested in Weston-only
solutions.
No, the necessary extensions cannot be contained within the binding. There
is not enough information within the driver layer alone. Something needs to
tell the driver when the configuration changes (e.g., the consumer of a
wayland surface switches from a texture to a plane) and what the new
configuration is. This would trigger the protocol notifications &
subsequent optimization within the driver. By the nature of their API,
streams would require the compositor to take action on such configuration
changes, and streams can discover the new configuration. Something
equivalent would be required to make this work in the GBM+wl_drm/EGL case.
I don't think this is the case. As I went through with Andy, we
_already_ have intent expressed in the GBM case, in the exact same way
that EGLStreams does: consider gbm_bo_import as equivalent for
attaching to an EGLOutput(Layer) consumer, and EGLImage import +
TargetTexture2D as equivalent for attaching a gltexture consumer. This
is the exact same proxy for intent to display, and in fact the GBM
approach is slightly more flexible, because it allows you to both do
direct scanout as well as GPU composition (e.g. if you're
capturing/streaming at the same time as display).
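Concretely, the import path a compositor uses today already looks
something like this (a sketch with error handling omitted; 'buffer' is
the client's wl_buffer resource):

#include <gbm.h>
#include <xf86drmMode.h>
#include <stdint.h>

/* Intent to scan out is expressed at import time: GBM_BO_USE_SCANOUT
 * on the client's buffer, then a KMS framebuffer wrapped around the
 * resulting bo. */
static uint32_t import_for_scanout(struct gbm_device *gbm, int drm_fd,
                                   void *buffer)
{
    struct gbm_bo *bo = gbm_bo_import(gbm, GBM_BO_IMPORT_WL_BUFFER,
                                      buffer, GBM_BO_USE_SCANOUT);
    uint32_t fb_id = 0;

    drmModeAddFB(drm_fd, gbm_bo_get_width(bo), gbm_bo_get_height(bo),
                 24, 32, gbm_bo_get_stride(bo),
                 gbm_bo_get_handle(bo).u32, &fb_id);
    return fb_id;
}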

Again though, without stream-retargeting, this is not something which
exists in Streams today, and doing so is going to require more
extensions: more code in your driver, more code in every
implementation. GBM today, for all its faults, does not require
further API extension to make this work.
Further, as a driver vendor, the idea of requiring even in-driver
platform-specific modifications for this sounds undesirable. If it was
something that could be contained entirely within GBM, that would be
interesting. However, distributing the architecture-specific code
throughout the window-system specific code in the driver means a lot more
maintenance burden in a world with X, Chrome OS, Wayland, and several
others.
This would hold true if Streams was a perfect encapsulation, but I
don't really see how doing so adds any burden over layering the
winsys/platform layer over Streams in the first place. I mean, you've
written Wayland bindings for Streams in the first place ... how would
this be too much different? Even if the protocol is designed to be the
perfect transport for Streams, you _still_ need transport bindings to
your target protocol.
Post by Daniel Stone
Certainly there are, but then again, there are far more usecases than
EGL. Looking at media playback, Vulkan, etc, where you don't have EGL
yet need to solve the same problems.
EGLStreams, Vulkan swapchains, and (for example) VDPAU presentation queues
are all varying levels of abstraction on top of the same thing within the
driver: a presentation engine or buffer queue, depending on whether the
target is a physical output or a compositor. These API-level components can
be hooked up to each other as long as the lower-level details are fully
contained within the driver abstraction. A Vulkan swapchain can be
internally implemented as an EGLStream producer, for example. In fact,
Vulkan swapchains borrow many ideas directly and indirectly from EGLStream.
Indeed, I noted the similarity, but primarily for the device_swapchain
extension.
Post by Daniel Stone
I agree, and I'm not arguing this to be on the application or
compositor side either. I believe the GBM and HWC suggestions are
entirely doable, and further that these problems will need to be
solved outside EGL anyway, for the other usecases. My worry - quite
aside from how vendors who struggle to produce a conformant EGL 1.4
implementation today will ever implement the complexity of Streams,
though this isn't your problem - is that EGL is really the wrong place
to be solving this.
Could you elaborate on what the other usecases are? If you mean the
Vulkan/media playback cases mentioned above, then I don't see what is
fundamentally wrong about using EGL as a backend within the window system
for those. If a Vulkan application needs to display on an EGL+GLES-based
Wayland compositor, there will be some point where a transition is made from
Vulkan -> EGL+GLES regardless.
Media falls down because currently there is no zerocopy binding from
either hardware or software media decode engines. Perhaps not the case
on your hardware, unusually blessed with a great deal of memory
bandwidth, but a great many devices physically cannot cope with a
single copy in the pipeline, given the ratio of content size to memory
bandwidth. Doing this in EGL would require a 'draw' step which simply
presented an existing buffer - a step which would unnecessarily
involve the GPU if the pipeline is direct from decode to scanout - or
it would involve having every media engine write their own bindings to
the Streams protocol.

There are also incredibly exacting timing requirements for media
display, which the Streams model of 'single permanently fixed latency'
does not even come close to achieving. So for that you'd need another
extension, to report actual achieved timings back. Wayland today
fulfills these requirements with the zlinux_dmabuf and
presentation_timing protocols, with the original hardware timings fed
back through KMS.
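For reference, the client side of that feedback loop is tiny (a sketch
against the presentation-time protocol as it is being stabilised, with
the usual wayland-scanner-generated glue assumed):

#include <stdio.h>
#include <stdint.h>
#include "presentation-time-client-protocol.h" /* wayland-scanner output */

/* The compositor reports the hardware timestamp it received from KMS. */
static void presented(void *data, struct wp_presentation_feedback *fb,
                      uint32_t tv_sec_hi, uint32_t tv_sec_lo,
                      uint32_t tv_nsec, uint32_t refresh,
                      uint32_t seq_hi, uint32_t seq_lo, uint32_t flags)
{
    uint64_t sec = ((uint64_t)tv_sec_hi << 32) | tv_sec_lo;
    printf("presented at %llu.%09u, refresh period %u ns\n",
           (unsigned long long)sec, tv_nsec, refresh);
    wp_presentation_feedback_destroy(fb);
}

static void discarded(void *data, struct wp_presentation_feedback *fb)
{
    wp_presentation_feedback_destroy(fb); /* the frame was never shown */
}

static void sync_output(void *data, struct wp_presentation_feedback *fb,
                        struct wl_output *output)
{
}

static const struct wp_presentation_feedback_listener feedback_listener = {
    .sync_output = sync_output,
    .presented = presented,
    .discarded = discarded,
};

/* Per frame, before committing the surface: */
static void request_feedback(struct wp_presentation *p, struct wl_surface *s)
{
    struct wp_presentation_feedback *fb = wp_presentation_feedback(p, s);
    wp_presentation_feedback_add_listener(fb, &feedback_listener, NULL);
}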
Post by Daniel Stone
I think it's large enough that it warrants a split of gl-renderer and
compositor-drm, rather than trying to shoehorn them into the same
file. There's going to be quite some complexity hiding between the
synchronise-with-client-event-stream and direct-scanout boxes, that
will push it over the limit of what's tractable. Those files are
already pretty huge and complex.
Would it be better to wait until such complexities arise in future patches
and split the files at that point, or would you prefer we split the backends
now? Perhaps I'm just more optimistic about the complexity, but it seems
like it would be easier to evaluate once that currently-hypothetical portion
of the code exists.
Well, there were quite a few issues with the previous set of patches,
and honestly I'm expecting just resolving those to bring enough
complexity to require a three-way split (common, Streams, and
EGLImage/GBM), let alone the features you're talking about solving
with Streams: direct scanout via retargeting of Streams, etc.
Post by Daniel Stone
I share the hope, and maybe with the WSI and Streams available, we can
design future window systems and display control APIs towards
something like that. But at the moment, the impedance mismatch between
Streams and the (deliberately very different) Wayland and KMS APIs is
already fairly glaring. The winsys support is absolutely trivial to
write, and as winsys interactions only get more featureful and
complex, the common stream protocol will have to follow suit.
If I was starting from the position of the EGL ideal: that everything
is EGL, and the only external interactions are creating native types
for it, then I would surely arrive at the same position as you. But
everything we've seen so far - and again, ChromeOS have taken this to
a much further extent - has been chipping away at EGL, rather than
putting more into it, and this has been for the better.
The direction ChromeOS is taking is even more problematic, and I'd hate to
see it being held up as an example of proper design direction. We spent a
good deal of time working with Google to support ChromeOS and ended up
essentially allowing them to punch through the driver abstraction via very
opaque EGL extensions that no engineer besides the extension authors could
be expected to use correctly, and embed HW-specific knowledge within some
component of ChromeOS, such that it will likely only run optimally on a
single generation of our hardware and will need to be revisited. That's the
type of problem we're trying to avoid here. ChromeOS has made other design
compromises that cost us (and I suspect other vendors) 10-20% performance
across the board to optimize for a very specific use case (i.e., a browser)
and within very constrained schedules. It is not the right direction for
OS<->graphics driver interactions to evolve.
Direction and extent are two very different things: I largely agree
with their direction (less encapsulation inside vendor drivers), and
disagree on the extent to which they've taken it.
Post by Daniel Stone
I don't think that's a difference we'll ever resolve though.
I believe thus far we've all tried to focus objectively on specific issues,
proposed solutions for them, and the merits of those solutions. Weston and
the other Wayland compositors I'm aware of are based on EGL at the moment,
so regardless of its merits as an API it doesn't seem problematic purely
from a dependency standpoint to add EGLStream as an option next to the
existing EGLImage and EGLDisplay+GBM paths. I'm certainly willing to
continue discussing the merits of EGL on a broader scale, but does that
discussion need to block the patches proposed here?
Every additional codepath has its cost. Even if you just look at
Mutter and Weston in a vacuum, it seems like it'll be quite the large
patchset(s) by the time it's done, let alone extending it out to all
the other compositors. This is a patchset which will need constant
care and feeding: if it's not tested, it's broken. Right now, there is
only one Streams implementation available, which is in a driver whose
legal status is seen to be sufficiently problematic that it is not
generally distributed by downstreams, which requires a whole set of
external kernel patches to run. So even getting it to run is
non-trivial.

But then we'd have to do that in such a way that it was generally
available, else any refactoring or changes we wanted to do internally
would have to be blocked on testing/review from someone who knew that
backend well enough. Either that, or it would just get broken.
Introducing these codepaths has a very, very real cost to the
projects you're talking about.

You could quite rightly point to the Raspberry Pi DispManX backend as
an example of the same, and you'd be right. And that's why I'm
extremely enthused about how their new KMS/GBM driver allows us to
nuke the entire backend from orbit, and reduce our testing load by
shifting them to the generic driver.

Cheers,
Daniel
James Jones
2016-05-03 18:44:51 UTC
Permalink
Post by Daniel Stone
Hi James,
Post by Daniel Stone
Post by James Jones
With new Wayland protocol, patches to all Wayland compositors to send proper
hints to clients using this protocol, improvements to GBM, and updates to
both of these when new GPU architectures introduced new requirements, what
you describe could do anything streams can do. However, then the problem
will have been solved only in the context of top-of-tree Wayland and Weston.
This doesn't require explicit/new compositor interaction at all.
Extensions can be done within the gbm/EGL bundle itself (via
EGL_WL_bind_wayland_display), so you're only changing one DSO (or DSO
bundle), and the API usage there today does seem to stand up. Given
that the protocol is private - I'm certainly not advocating for a
DRI2-style all-things-to-all-hardware standard protocol to communicate
this - and that it's localised in a vendor bundle, it seems completely
widely applicable to me. As someone who's writing this from
Mutter/Wayland/GBM, I'm certainly not interested in Weston-only
solutions.
No, the necessary extensions cannot be contained within the binding. There
is not enough information within the driver layer alone. Something needs to
tell the driver when the configuration changes (e.g., the consumer of a
wayland surface switches from a texture to a plane) and what the new
configuration is. This would trigger the protocol notifications &
subsequent optimization within the driver. By the nature of their API,
streams would require the compositor to take action on such configuration
changes, and streams can discover the new configuration. Something
equivalent would be required to make this work in the GBM+wl_drm/EGL case.
I don't think this is the case. As I went through with Andy, we
_already_ have intent expressed in the GBM case, in the exact same way
that EGLStreams does: consider gbm_bo_import as equivalent for
attaching to an EGLOutput(Layer) consumer, and EGLImage import +
TargetTexture2D as equivalent for attaching a gltexture consumer.
"Will be used for display on device X" is not sufficient information, as
Daniel Vetter outlined.
Post by Daniel Stone
This
is the exact same proxy for intent to display, and in fact the GBM
approach is slightly more flexible, because it allows you to both do
direct scanout as well as GPU composition (e.g. if you're
capturing/streaming at the same time as display).
Again though, without stream-retargeting, this is not something which
exists in Streams today, and doing so is going to require more
extensions: more code in your driver, more code in every
implementation. GBM today, for all its faults, does not require
further API extension to make this work.
Agreed. We're working on similar flexibility for streams via an
EGLSwitch muxing extension. As mentioned above, GBM would require API
extensions and driver updates to reach the expressiveness of streams as
well though.
Post by Daniel Stone
Further, as a driver vendor, the idea of requiring even in-driver
platform-specific modifications for this sounds undesirable. If it was
something that could be contained entirely within GBM, that would be
interesting. However, distributing the architecture-specific code
throughout the window-system specific code in the driver means a lot more
maintenance burden in a world with X, Chrome OS, Wayland, and several
others.
This would hold true if Streams was a perfect encapsulation, but I
don't really see how doing so adds any burden over layering the
winsys/platform layer over Streams in the first place. I mean, you've
written Wayland bindings for Streams in the first place ... how would
this be too much different? Even if the protocol is designed to be the
perfect transport for Streams, you _still_ need transport bindings to
your target protocol.
We wrote the wayland protocol as an example of what is possible using
streams, and we intend to open-source it. Presumably window-system
authors would write the protocol for other windowing systems. Further,
since streams would encapsulate all the device-specific stuff, the
protocol library wouldn't require as much maintenance as a
driver-specific protocol library.

In a world with only Wayland, yes, we'd be doing slightly more work to
bootstrap streams support than we would to support GBM+wayland.
However, other windowing systems and stream use cases exist.

What streams exposes is intended to lower the amount of stuff hidden in
drivers, not increase it. Streams is a generic swapchain mechanism
exposed to any user, whereas we would need to write something
proprietary (maybe open source, maybe closed source, but NVIDIA-specific
nonetheless) for each window system to get equivalent performance if
we pushed the abstraction to a lower level.
Post by Daniel Stone
Post by Daniel Stone
Certainly there are, but then again, there are far more usecases than
EGL. Looking at media playback, Vulkan, etc, where you don't have EGL
yet need to solve the same problems.
EGLStreams, Vulkan swapchains, and (for example) VDPAU presentation queues
are all varying levels of abstraction on top of the same thing within the
driver: a presentation engine or buffer queue, depending on whether the
target is a physical output or a compositor. These API-level components can
be hooked up to each other as long as the lower-level details are fully
contained within the driver abstraction. A Vulkan swapchain can be
internally implemented as an EGLStream producer, for example. In fact,
Vulkan swapchains borrow many ideas directly and indirectly from EGLStream.
Indeed, I noted the similarity, but primarily for the device_swapchain
extension.
Post by Daniel Stone
I agree, and I'm not arguing this to be on the application or
compositor side either. I believe the GBM and HWC suggestions are
entirely doable, and further that these problems will need to be
solved outside EGL anyway, for the other usecases. My worry - quite
aside from how vendors who struggle to produce a conformant EGL 1.4
implementation today will ever implement the complexity of Streams,
though this isn't your problem - is that EGL is really the wrong place
to be solving this.
Could you elaborate on what the other usecases are? If you mean the
Vulkan/media playback cases mentioned above, then I don't see what is
fundamentally wrong about using EGL as a backend within the window system
for those. If a Vulkan application needs to display on an EGL+GLES-based
Wayland compositor, there will be some point where a transition is made from
Vulkan -> EGL+GLES regardless.
Media falls down because currently there is no zerocopy binding from
either hardware or software media decode engines. Perhaps not the case
on your hardware, unusually blessed with a great deal of memory
bandwidth, but a great many devices physically cannot cope with a
single copy in the pipeline, given the ratio of content size to memory
bandwidth. Doing this in EGL would require a 'draw' step which simply
presented an existing buffer - a step which would unnecessarily
involve the GPU if the pipeline is direct from decode to scanout - or
it would involve having every media engine write their own bindings to
the Streams protocol.
Right. Streams are meant to support lots of different producers and
consumers.
Post by Daniel Stone
There are also incredibly exacting timing requirements for media
display, which the Streams model of 'single permanently fixed latency'
does not even come close to achieving. So for that you'd need another
extension, to report actual achieved timings back. Wayland today
fulfills these requirements with the zlinux_dmabuf and
presentation_timing protocols, with the original hardware timings fed
back through KMS.
Would it be reasonable to support such existing extensions while using
streams?
Post by Daniel Stone
Post by Daniel Stone
I think it's large enough that it warrants a split of gl-renderer and
compositor-drm, rather than trying to shoehorn them into the same
file. There's going to be quite some complexity hiding between the
synchronise-with-client-event-stream and direct-scanout boxes, that
will push it over the limit of what's tractable. Those files are
already pretty huge and complex.
Would it be better to wait until such complexities arise in future patches
and split the files at that point, or would you prefer we split the backends
now? Perhaps I'm just more optimistic about the complexity, but it seems
like it would be easier to evaluate once that currently-hypothetical portion
of the code exists.
Well, there were quite a few issues with the previous set of patches,
and honestly I'm expecting just resolving those to bring enough
complexity to require a three-way split (common, Streams, and
EGLImage/GBM), let alone the features you're talking about solving
with Streams: direct scanout via retargeting of Streams, etc.
Post by Daniel Stone
I share the hope, and maybe with the WSI and Streams available, we can
design future window systems and display control APIs towards
something like that. But at the moment, the impedance mismatch between
Streams and the (deliberately very different) Wayland and KMS APIs is
already fairly glaring. The winsys support is absolutely trivial to
write, and as winsys interactions only get more featureful and
complex, the common stream protocol will have to follow suit.
If I was starting from the position of the EGL ideal: that everything
is EGL, and the only external interactions are creating native types
for it, then I would surely arrive at the same position as you. But
everything we've seen so far - and again, ChromeOS have taken this to
a much further extent - has been chipping away at EGL, rather than
putting more into it, and this has been for the better.
The direction ChromeOS is taking is even more problematic, and I'd hate to
see it being held up as an example of proper design direction. We spent a
good deal of time working with Google to support ChromeOS and ended up
essentially allowing them to punch through the driver abstraction via very
opaque EGL extensions that no engineer besides the extension authors could
be expected to use correctly, and embed HW-specific knowledge within some
component of ChromeOS, such that it will likely only run optimally on a
single generation of our hardware and will need to be revisited. That's the
type of problem we're trying to avoid here. ChromeOS has made other design
compromises that cost us (and I suspect other vendors) 10-20% performance
across the board to optimize for a very specific use case (i.e., a browser)
and within very constrained schedules. It is not the right direction for
OS<->graphics driver interactions to evolve.
Direction and extent are two very different things: I largely agree
with their direction (less encapsulation inside vendor drivers), and
disagree on the extent to which they've taken it.
That's a very good point. I agree minimal encapsulation is a good goal.
Post by Daniel Stone
Post by Daniel Stone
I don't think that's a difference we'll ever resolve though.
I believe thus far we've all tried to focus objectively on specific issues,
proposed solutions for them, and the merits of those solutions. Weston and
the other Wayland compositors I'm aware of are based on EGL at the moment,
so regardless of its merits as an API it doesn't seem problematic purely
from a dependency standpoint to add EGLStream as an option next to the
existing EGLImage and EGLDisplay+GBM paths. I'm certainly willing to
continue discussing the merits of EGL on a broader scale, but does that
discussion need to block the patches proposed here?
Every additional codepath has its cost. Even if you just look at
Mutter and Weston in a vacuum, it seems like it'll be quite the large
patchset(s) by the time it's done, let alone extending it out to all
the other compositors. This is a patchset which will need constant
care and feeding: if it's not tested, it's broken. Right now, there is
only one Streams implementation available, which is in a driver whose
legal status is seen to be sufficiently problematic that it is not
generally distributed by downstreams, which requires a whole set of
external kernel patches to run. So even getting it to run is
non-trivial.
But then we'd have to do that in such a way that it was generally
available, else any refactoring or changes we wanted to do internally
would have to be blocked on testing/review from someone who knew that
backend well enough. Either that, or it would just get broken.
Introducing these codepaths has a very, very real cost to the
projects you're talking about.
If there were an open source implementation of streams, would that
affect your view?

Agreed, all new code, and especially significant new branches in code,
has costs. However, a balance always needs to be struck.
Post by Daniel Stone
You could quite rightly point to the Raspberry Pi DispManX backend as
an example of the same, and you'd be right. And that's why I'm
extremely enthused about how their new KMS/GBM driver allows us to
nuke the entire backend from orbit, and reduce our testing load by
shifting them to the generic driver.
I hope we can avoid an entirely forked compositor-drm/eglstream (and
especially gl-renderer) for these reasons. The majority of the code is
still common and would be exercised using either path.

Thanks,
-James
Post by Daniel Stone
Cheers,
Daniel
Daniel Stone
2016-05-04 15:56:01 UTC
Permalink
Hi,
Interleaving both replies ...
Post by James Jones
Post by Daniel Stone
No, the necessary extensions cannot be contained within the binding. There
is not enough information within the driver layer alone. Something needs to
tell the driver when the configuration changes (e.g., the consumer of a
wayland surface switches from a texture to a plane) and what the new
configuration is. This would trigger the protocol notifications &
subsequent optimization within the driver. By the nature of their API,
streams would require the compositor to take action on such configuration
changes, and streams can discover the new configuration. Something
equivalent would be required to make this work in the GBM+wl_drm/EGL case.
I don't think this is the case. As I went through with Andy, we
_already_ have intent expressed in the GBM case, in the exact same way
that EGLStreams does: consider gbm_bo_import as equivalent for
attaching to an EGLOutput(Layer) consumer, and EGLImage import +
TargetTexture2D as equivalent for attaching a gltexture consumer.
"Will be used for display on device X" is not sufficient information, as
Daniel Vetter outlined.
Indeed, but nothing we have - including both the initial Streams
patchset, and the subsequent proposals for adding muxing as well as
KMS config passthrough - is sufficient for that.

The Streams Check/Commit proposal you outlined a couple of mails ago
isn't sufficient because you often need to know global configuration
to determine if a configuration is even usable, let alone optimal:
shared decompression/detiling units, global bandwidth/watermark
limits, etc. Having just one entrypoint to Streams where it gets very
limited information about each plane that Streams is using isn't
enough, because it needs to know the global configuration.

So to actually make this work on other hardware, you'd need to pass
the full request (including content which came via other sources, e.g.
dmabuf) through to Streams. And by the time you're handing your entire
scene graph off to an external component to determine the optimal
configuration ... well, that's basically HWC.

I'm also not sure what the plan is for integrating with Vulkan
compositors: does that end up as an interop extension? Does VK WSI
gain an equivalent which allows you to mux swapchain/device_swapchain?
(Similar questions for the Check/Commit API really.)
Post by James Jones
Post by Daniel Stone
This
is the exact same proxy for intent to display, and in fact the GBM
approach is slightly more flexible, because it allows you to both do
direct scanout as well as GPU composition (e.g. if you're
capturing/streaming at the same time as display).
Again though, without stream-retargeting, this is not something which
exists in Streams today, and doing so is going to require more
extensions: more code in your driver, more code in every
implementation. GBM today, for all its faults, does not require
further API extension to make this work.
Agreed. We're working on similar flexibility for streams via an EGLSwitch
muxing extension. As mentioned above, GBM would require API extensions and
driver updates to reach the expressiveness of streams as well though.
Right - but as with the point I was making below, GBM _right now_ is
more capable than Streams _right now_. GBM right now would require API
additions to match EGLStreams + EGLSwitch + Streams/KMS-interop, but
the last two aren't written either, so. (More below.)
Post by James Jones
What streams exposes is intended to lower the amount of stuff hidden in
drivers, not increase it. Streams is a generic swapchain mechanism exposed
to any user, whereas we would need to write something proprietary (maybe
open source, maybe closed source, but NVIDIA-specific none the less) for
each window system to get equivalent performance if we pushed the
abstraction to a lower level.
Hm, I'm not quite sure how this adds up. Streams + Switch +
Streams/KMS interop is a _lot_ of complexity that gets buried in
drivers, with no external visibility. I don't doubt your ability to
get it right, but I _do_ doubt the ability of others to get this
right. As you say, Streams is intended to make these problems go away,
but they don't disappear, they just shift elsewhere. I worry that, by
the time you're done building out all the capability you're talking
about on top of Streams, we'll end up with a spec that will be
interpreted and implemented quite differently by every vendor.
Post by James Jones
Post by Daniel Stone
Media falls down because currently there is no zerocopy binding from
either hardware or software media decode engines. Perhaps not the case
on your hardware, unusually blessed with a great deal of memory
bandwidth, but a great many devices physically cannot cope with a
single copy in the pipeline, given the ratio of content size to memory
bandwidth. Doing this in EGL would require a 'draw' step which simply
presented an existing buffer - a step which would unnecessarily
involve the GPU if the pipeline is direct from decode to scanout - or
it would involve having every media engine write their own bindings to
the Streams protocol.
Right. Streams are meant to support lot's of different producers and
consumers.
Have you looked much at the media landscape, and discussed it with
relevant projects - GStreamer, Kodi/XBMC, etc?
Post by James Jones
Post by Daniel Stone
There are also incredibly exacting timing requirements for media
display, which the Streams model of 'single permanently fixed latency'
does not even come close to achieving. So for that you'd need another
extension, to report actual achieved timings back. Wayland today
fulfills these requirements with the zlinux_dmabuf and
presentation_timing protocols, with the original hardware timings fed
back through KMS.
Would it be reasonable to support such existing extensions while using
streams?
Again, you'd need to add quite a bit of new API to Streams. In
particular, every frame would need to gain two EGL objects: one for
the producer which could be used to obtain presentation feedback, and
one for the consumer which could be used to submit presentation
feedback. And with this, you bang hard into EGL's lack of signalling,
unless clients are expected to either poll or spin up a separate
thread just to block.
Post by James Jones
Post by Daniel Stone
Every additional codepath has its cost. Even if you just look at
Mutter and Weston in a vacuum, it seems like it'll be quite the large
patchset(s) by the time it's done, let alone extending it out to all
the other compositors. This is a patchset which will need constant
care and feeding: if it's not tested, it's broken. Right now, there is
only one Streams implementation available, which is in a driver whose
legal status is seen to be sufficiently problematic that it is not
generally distributed by downstreams, which requires a whole set of
external kernel patches to run. So even getting it to run is
non-trivial.
But then we'd have to do that in such a way that it was generally
available, else any refactoring or changes we wanted to do internally
would have to be blocked on testing/review from someone who knew that
backend well enough. Either that, or it would just get broken.
Introducing these codepaths has a very, very real cost to the
projects you're talking about.
If there were an open source implementation of streams, would that affect
your view?
It would definitely make it significantly easier, especially as we
work towards things like continuous integration (see kernelci.org -
and then extend that upwards a bit). Something that is open, doesn't
require non-mainline kernels (or at least has a path where you can see
it working towards running on mainline), runs on real hardware, etc,
would really make it much easier.
Post by James Jones
Post by Daniel Stone
You could quite rightly point to the Raspberry Pi DispManX backend as
an example of the same, and you'd be right. And that's why I'm
extremely enthused about how their new KMS/GBM driver allows us to
nuke the entire backend from orbit, and reduce our testing load by
shifting them to the generic driver.
I hope we can avoid an entirely forked compositor-drm/eglstream (and
especially gl-renderer) for these reasons. The majority of the code is
still common and would be exercised using either path.
Oh, I'm talking about a three-way split: gl-renderer-common.c,
gl-renderer-eglimage.c, gl-renderer-eglstreams.c, and the same for
compositor-drm.c. It's not reasonable to require you to write your own
DRM backlight property handling, or Weston -> GL scene-graph
transformation handling.
Post by James Jones
Post by Daniel Stone
It is unfortunate that you seem to discuss 'Streams' as an abstract
concept of a cross-process swapchain which can be infinitely adjusted
to achieve perfection, and yet 'GBM' gets discussed as a singular
fixed-in-time thing which has all the flaws of just one of its
particular platform implementations.
I have a stronger understanding of the design direction for streams than I
do for GBM, and EGLStream is indeed intended to evolve towards the best
abstraction of a swapchain possible. My views of GBM are based on the
current API. I'm not that familiar with the Mesa implementation details.
I'd be happy to learn more about the direction the GBM API is taking in the
future, and that's half of what I was attempting to do in my
responses/questions here.
Well, this thread is hopefully shaping it!
Post by James Jones
Post by Daniel Stone
I don't see how GBM could really perform any worse in such a design.
The current GBM API is not expressive enough to support optimal buffer
allocation (at least on our hardware) in such a design.
Currently, that's objectively true of both GBM and Streams. Both are
going to need extension to work as hoped.

Cheers,
Daniel
James Jones
2016-05-11 20:43:45 UTC
Permalink
Post by Daniel Stone
Hi,
Interleaving both replies ...
Post by James Jones
Post by Daniel Stone
No, the necessary extensions cannot be contained within the binding. There
is not enough information within the driver layer alone. Something needs to
tell the driver when the configuration changes (e.g., the consumer of a
wayland surface switches from a texture to a plane) and what the new
configuration is. This would trigger the protocol notifications &
subsequent optimization within the driver. By the nature of their API,
streams would require the compositor to take action on such configuration
changes, and streams can discover the new configuration. Something
equivalent would be required to make this work in the GBM+wl_drm/EGL case.
I don't think this is the case. As I went through with Andy, we
_already_ have intent expressed in the GBM case, in the exact same way
that EGLStreams does: consider gbm_bo_import as equivalent for
attaching to an EGLOutput(Layer) consumer, and EGLImage import +
TargetTexture2D as equivalent for attaching a gltexture consumer.
"Will be used for display on device X" is not sufficient information, as
Daniel Vetter outlined.
Indeed, but nothing we have - including both the initial Streams
patchset, and the subsequent proposals for adding muxing as well as
KMS config passthrough - is sufficient for that.
The Streams Check/Commit proposal you outlined a couple of mails ago
isn't sufficient because you often need to know global configuration
to determine if a configuration is even usable, let alone optimal:
shared decompression/detiling units, global bandwidth/watermark
limits, etc. Having just one entrypoint to Streams where it gets very
limited information about each plane that Streams is using isn't
enough, because it needs to know the global configuration.
So to actually make this work on other hardware, you'd need to pass
the full request (including content which came via other sources, e.g.
dmabuf) through to Streams. And by the time you're handing your entire
scene graph off to an external component to determine the optimal
configuration ... well, that's basically HWC.
I'm sorry for mixing them up again by alluding to Daniel Vetter's
statement, but there are two separate things being discussed here:

- A fully optimal scene graph. This is important, but not solved by
streams alone. Streams could work as one of several building blocks in
a solution for this.

- Optimal presentation and allocation of buffers between two endpoints
(i.e., optimizing frame allocation and delivery for what Weston can do
right now). My claim was that current streams solve this, while current
GBM does not provide enough information for even this optimization.

Solving the global scene graph optimization problem is important, but
will require additional work. The incremental gains from using streams
(worth around 10% raw throughput on Kepler-based NVIDIA GPUs, for
example; supposedly more on later hardware, though I've not yet
benchmarked there directly) should not be ignored just because they
don't achieve perfection in a single step. Incremental improvements are
still valuable.
Post by Daniel Stone
I'm also not sure what the plan is for integrating with Vulkan
compositors: does that end up as an interop extension? Does VK WSI
gain an equivalent which allows you to mux swapchain/device_swapchain?
(Similar questions for the Check/Commit API really.)
Yes, if an EGL-based client was presenting to a Vulkan-based compositor,
interop would be happening somewhere. Either yet-to-be-developed Vulkan
primitives could be used to implement the wayland-egl library with
interop on the client side, or EGLStreams could be used to implement the
wayland-egl library with interop on the server side. Or there could be
EGL->(wl_drm)->Vulkan, which is essentially 2 interop steps, but that
has the same shortcomings we've been discussing for the current
EGL->(wl_drm)->EGL/GBM+DRM situation.
Post by Daniel Stone
Post by James Jones
Post by Daniel Stone
This
is the exact same proxy for intent to display, and in fact the GBM
approach is slightly more flexible, because it allows you to both do
direct scanout as well as GPU composition (e.g. if you're
capturing/streaming at the same time as display).
Again though, without stream-retargeting, this is not something which
exists in Streams today, and doing so is going to require more
extensions: more code in your driver, more code in every
implementation. GBM today, for all its faults, does not require
further API extension to make this work.
Agreed. We're working on similar flexibility for streams via an EGLSwitch
muxing extension. As mentioned above, GBM would require API extensions and
driver updates to reach the expressiveness of streams as well though.
Right - but as with the point I was making below, GBM _right now_ is
more capable than Streams _right now_. GBM right now would require API
additions to match EGLStreams + EGLSwitch + Streams/KMS-interop, but
the last two aren't written either, so. (More below.)
The current behavior that enables this, where basically all Wayland
buffers must be allocated as scanout-capable, isn't reasonable on NVIDIA
hardware. The requirements for scanout are too onerous. I'm sure it
works in demos on nouveau, but it's not realistic for a production driver.
Post by Daniel Stone
Post by James Jones
What streams exposes is intended to lower the amount of stuff hidden in
drivers, not increase it. Streams is a generic swapchain mechanism exposed
to any user, whereas we would need to write something proprietary (maybe
open source, maybe closed source, but NVIDIA-specific none the less) for
each window system to get equivalent performance if we pushed the
abstraction to a lower level.
Hm, I'm not quite sure how this adds up. Streams + Switch +
Streams/KMS interop is a _lot_ of complexity that gets buried in
drivers, with no external visibility. I don't doubt your ability to
get it right, but I _do_ doubt the ability of others to get this
right. As you say, Streams is intended to make these problems go away,
but they don't disappear, they just shift elsewhere.
I agree with much of the above, but I don't think it's at odds with my
statement.

Yes, something still needs to solve the problem of which type of buffer
is best for the combination of producer X and consumer Y. However, this
is always going to be hardware-specific, so a vendor-specific backend is
going to be the best place for it regardless of where that backend
lives. EGLSwitch/supporting multiple possible consumers with one
preferred one just makes that decision more complex, but doesn't change
the HW-specific nature of the process.

Something needs to handle the operations that prepare a buffer for use
on consumer Y after producer X has completed its work, and vice-versa.
Again, what exactly those operations are is HW-specific, so they're
going to live in HW-specific portions of the library (eglSwapBuffers(),
or the Vulkan layout transitions + memory barriers).
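As a sketch of what one such producer-to-consumer transition looks like
at the API level on the Vulkan side (standard core API, nothing
vendor-specific):

#include <vulkan/vulkan.h>

/* Move an image from "being rendered by the client" to "readable by
 * the presentation engine". What the barrier does underneath -- cache
 * maintenance, layout changes -- is exactly the HW-specific part. */
void transition_for_present(VkCommandBuffer cmd, VkImage image)
{
    VkImageMemoryBarrier barrier = {
        .sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
        .srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
        .dstAccessMask = VK_ACCESS_MEMORY_READ_BIT,
        .oldLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
        .newLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
        .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .image = image,
        .subresourceRange = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 },
    };

    vkCmdPipelineBarrier(cmd,
                         VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
                         VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
                         0, 0, NULL, 0, NULL, 1, &barrier);
}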

The KMS interactions are trivial: Filling in some framebuffer attributes
on an atomic request. The rest of the atomic request setup could still
be done non-opaquely since, as you've pointed out, EGLStreams don't
solve the overall configuration optimization problem.
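That is, on the order of the following (a sketch; the plane's FB_ID
property ID is assumed to have been looked up beforehand via
drmModeObjectGetProperties, and in the streams case the framebuffer
value would come from the stream's most recently acquired frame):

#include <xf86drm.h>
#include <xf86drmMode.h>
#include <stdint.h>

/* Only the FB_ID value is stream-owned; the rest of the atomic request
 * stays entirely in the compositor's hands. */
int flip_plane(int fd, uint32_t plane_id, uint32_t prop_fb_id,
               uint32_t fb_id, void *user_data)
{
    drmModeAtomicReq *req = drmModeAtomicAlloc();
    int ret;

    drmModeAtomicAddProperty(req, plane_id, prop_fb_id, fb_id);
    /* ... other plane/crtc/connector properties would be added here ... */
    ret = drmModeAtomicCommit(fd, req, DRM_MODE_PAGE_FLIP_EVENT,
                              user_data);
    drmModeAtomicFree(req);
    return ret;
}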

Comparing:

(a) The minimal set (or as close to it as possible) of HW-specific
operations encapsulated in one object (a stream) that can be re-used
across various higher-level projects.

(b) Implementing several similar but slightly different window system
integration modules in each driver along with the above necessary
encapsulations.

It seems to me that (a) results in less overall encapsulation.
Post by Daniel Stone
I worry that, by
the time you're done building out all the capability you're talking
about on top of Streams, we'll end up with a spec that will be
interpreted and implemented quite differently by every vendor.
The same could be said of any standard or API that attempts to address a
complex use case. We could agree to require standardized testing at the
Khronos level (it wouldn't be the first time EGL conformance was
suggested), or unofficially require piglit tests for the necessary
stream extensions if that would help. Arguably, Weston could act as the
de facto conformance test too, though.
Post by Daniel Stone
Post by James Jones
Post by Daniel Stone
Media falls down because currently there is no zerocopy binding from
either hardware or software media decode engines. Perhaps not the case
on your hardware, unusually blessed with a great deal of memory
bandwidth, but a great many devices physically cannot cope with a
single copy in the pipeline, given the ratio of content size to memory
bandwidth. Doing this in EGL would require a 'draw' step which simply
presented an existing buffer - a step which would unnecessarily
involve the GPU if the pipeline is direct from decode to scanout - or
it would involve having every media engine write their own bindings to
the Streams protocol.
Right. Streams are meant to support lots of different producers and
consumers.
Have you looked much at the media landscape, and discussed it with
relevant projects - GStreamer, Kodi/XBMC, etc?
I haven't personally. Others at NVIDIA are working on the multimedia
aspects of streams.
Post by Daniel Stone
Post by James Jones
Post by Daniel Stone
There are also incredibly exacting timing requirements for media
display, which the Streams model of 'single permanently fixed latency'
does not even come close to achieving. So for that you'd need another
extension, to report actual achieved timings back. Wayland today
fulfills these requirements with the zlinux_dmabuf and
presentation_timing protocols, with the original hardware timings fed
back through KMS.
Would it be reasonable to support such existing extensions while using
streams?
Again, you'd need to add quite a bit of new API to Streams. In
particular, every frame would need to gain two EGL objects: one for
the producer which could be used to obtain presentation feedback, and
one for the consumer which could be used to submit presentation
feedback. And with this, you bang hard into EGL's lack of signalling,
unless clients are expected to either poll or spin up a separate
thread just to block.
The existing feedback mechanisms couldn't be used alongside streams
without integrating them into EGL? Streams just deliver frames, but it
should be possible to correlate those frames with some external
mechanism providing feedback on them.
Post by Daniel Stone
Post by James Jones
Post by Daniel Stone
Every additional codepath has its cost. Even if you just look at
Mutter and Weston in a vacuum, it seems like it'll be quite the large
patchset(s) by the time it's done, let alone extending it out to all
the other compositors. This is a patchset which will need constant
care and feeding: if it's not tested, it's broken. Right now, there is
only one Streams implementation available, which is in a driver whose
legal status is seen to be sufficiently problematic that it is not
generally distributed by downstreams, which requires a whole set of
external kernel patches to run. So even getting it to run is
non-trivial.
But then we'd have to do that in such a way that it was generally
available, else any refactoring or changes we wanted to do internally
would have to be blocked on testing/review from someone who knew that
backend well enough. Either that, or it would just get broken.
Introducing these codepaths has a very, very real cost to the
projects you're talking about.
If there were an open source implementation of streams, would that affect
your view?
It would definitely make it significantly easier, especially as we
work towards things like continuous integration (see kernelci.org -
and then extend that upwards a bit). Something that is open, doesn't
require non-mainline kernels (or at least has a path where you can see
it working towards running on mainline), runs on real hardware, etc,
would really make it much easier.
Post by James Jones
Post by Daniel Stone
You could quite rightly point to the Raspberry Pi DispManX backend as
an example of the same, and you'd be right. And that's why I'm
extremely enthused about how their new KMS/GBM driver allows us to
nuke the entire backend from orbit, and reduce our testing load by
shifting them to the generic driver.
I hope we can avoid an entirely forked compositor-drm/eglstream (and
especially gl-renderer) for these reasons. The majority of the code is
still common and would be exercised using either path.
Oh, I'm talking about a three-way split: gl-renderer-common.c,
gl-renderer-eglimage.c, gl-renderer-eglstreams.c, and the same for
compositor-drm.c. It's not reasonable to require you to write your own
DRM backlight property handling, or Weston -> GL scene-graph
transformation handling.
That does sound like a reasonable direction. Would you consider such a
refactoring palatable?
Post by Daniel Stone
Post by James Jones
Post by Daniel Stone
It is unfortunate that you seem to discuss 'Streams' as an abstract
concept of a cross-process swapchain which can be infinitely adjusted
to achieve perfection, and yet 'GBM' gets discussed as a singular
fixed-in-time thing which has all the flaws of just one of its
particular platform implementations.
I have a stronger understanding of the design direction for streams than I
do for GBM, and EGLStream is indeed intended to evolve towards the best
abstraction of a swapchain possible. My views of GBM are based on the
current API. I'm not that familiar with the Mesa implementation details.
I'd be happy to learn more about the direction the GBM API is taking in the
future, and that's half of what I was attempting to do in my
responses/questions here.
Well, this thread is hopefully shaping it!
Post by James Jones
Post by Daniel Stone
I don't see how GBM could really perform any worse in such a design.
The current GBM API is not expressive enough to support optimal buffer
allocation (at least on our hardware) in such a design.
Currently, that's objectively true of both GBM and Streams. Both are
going to need extension to work as hoped.
Yes. Given more work is needed (a lot more, apparently), my hope is to
leverage that work as broadly as possible. I hope NVIDIA's statements
thus far have shown that a solution based on streams is more valuable in
that regard than a solution spread across EGL, Wayland protocol, and GBM.

Thanks,
-James
Post by Daniel Stone
Cheers,
Daniel
Daniel Stone
2016-05-11 21:31:21 UTC
Permalink
Hi James,
Post by Daniel Stone
Right - but as with the point I was making below, GBM _right now_ is
more capable than Streams _right now_. GBM right now would require API
additions to match EGLStreams + EGLSwitch + Streams/KMS-interop, but
the last two aren't written either, so. (More below.)
The current behavior that enables this, where basically all Wayland buffers
must be allocated as scanout-capable, isn't reasonable on NVIDIA hardware.
The requirements for scanout are too onerous.
I think we're talking past each other, so I'd like to pare the
discussion down to these two sentences, and my two resultant points,
for now:

I posit that the Streams proposal you (plural) have put forward is, at
best, no better at meeting these criteria:
- there is currently no support for direct scanout from client
buffers in Streams, so it must always pessimise towards GPU
composition
- GBM stacks can obviously do the same: implement a no-op
gbm_bo_import, and have your client always allocate non-scanout
buffers - presto, you've matched Streams

I posit that GBM _can_ match the capability of a hypothetical
EGLStreams/EGLSwitch implementation. Current _implementations_ of GBM
cannot, but I posit that it is not a limitation of the API it exposes,
and unlike Streams, the capability can be plumbed in with no new
external API required.

These seem pretty fundamental, so ... am I missing something? :\ If
so, can you please outline fairly specifically how you think
non-Streams implementations are not capable of meeting the criteria in
your two sentences?

Cheers,
Daniel
James Jones
2016-05-11 23:08:13 UTC
Permalink
Post by Daniel Stone
Hi James,
Post by Daniel Stone
Right - but as with the point I was making below, GBM _right now_ is
more capable than Streams _right now_. GBM right now would require API
additions to match EGLStreams + EGLSwitch + Streams/KMS-interop, but
the last two aren't written either, so. (More below.)
The current behavior that enables this, where basically all Wayland buffers
must be allocated as scanout-capable, isn't reasonable on NVIDIA hardware.
The requirements for scanout are too onerous.
I think we're talking past each other, so I'd like to pare the
discussion down to these two sentences, and my two resultant points,
for now:
I posit that the Streams proposal you (plural) have put forward is, at
best, no better at meeting these criteria:
- there is currently no support for direct scanout from client
buffers in Streams, so it must always pessimise towards GPU
composition
- GBM stacks can obviously do the same: implement a no-op
gbm_bo_import, and have your client always allocate non-scanout
buffers - presto, you've matched Streams
I posit that GBM _can_ match the capability of a hypothetical
EGLStreams/EGLSwitch implementation. Current _implementations_ of GBM
cannot, but I posit that it is not a limitation of the API it exposes,
and unlike Streams, the capability can be plumbed in with no new
external API required.
These seem pretty fundamental, so ... am I missing something? :\ If
so, can you please outline fairly specifically how you think
non-Streams implementations are not capable of meeting the criteria in
your two sentences?
I respect the need to rein in the discussion, but I think several
substantive aspects have been lost here. I typed up a much longer
response below, but I'll try to summarize in 4 sentences:

GBM could match the allocation aspects of streams used in Miguel's first
round of patches. However, I disagree that its core API is sufficient
to match the allocation capabilities of EGLStream+EGLSwitch where all
producing and consuming devices+engines are known at allocation time.
Further, streams have additional equally valuable functionality beyond
allocation that GBM does not seem intended to address. Absent
agreement, I believe co-existence of EGLStreams and GBM+wl_drm in
Wayland/Weston is a reasonable path forward in the short term.

The longer version:

GBM alone cannot perform as well as EGLStreams unless it is extended
into something more or less the same as EGLStreams, where it knows
exactly which engines are being used to produce the buffer content (along
with their current configuration), and exactly which
engines/configurations are being used to consume it. This implies
allocating against multiple specific objects, rather than against a device
and a set of allocation modifier flags, and/or importing an external
allocation and hoping it meets the current requirements. From what I
can see, GBM fundamentally understands at most the consumer side of the
equation.
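
To make the contrast concrete, here is everything today's GBM allocation
call can be told (real gbm.h API; the stream-style alternative exists only
as a comment, since no such GBM entry point is defined today):

#include <gbm.h>

/* All the allocator learns: one device, a format, and coarse usage
 * hints. Nothing identifies the producing engine, its configuration,
 * or the full set of consumers. */
static struct gbm_bo *
alloc_composition_buffer(struct gbm_device *gbm)
{
    return gbm_bo_create(gbm, 1920, 1080, GBM_FORMAT_XRGB8888,
                         GBM_BO_USE_RENDERING | GBM_BO_USE_SCANOUT);
}

/* A stream-style allocation would instead be made against the concrete
 * producer and consumer objects themselves, e.g. (hypothetical):
 *     alloc_for(client_render_context, crtc0_primary_plane, gles_texture);
 */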

Suppose, however, that GBM were taught everything streams know implicitly
about all users of the buffers at allocation time. After allocation, GBM is
done with its job, but streams and drivers aren't.

The act of transitioning a buffer from optimal "producer mode" to
optimal "consumer mode" relies on all of that device & configuration
information as well, meaning it would have to be fed into the graphics
driver (EGL or whatever window-system binding is used) by every window
system the driver runs on in order to achieve capabilities equivalent to
EGLStream's.

Fundamentally, the API-level view of individual graphics buffers as raw,
globally coherent and accessible stores of pixels with a static layout is
flawed. Images on a GPU are more of a mutating spill space for a
collection of state describing the side effects of various commands than
a 2D array of pixels. Forcing GPUs to resolve an image to a 2D array of
pixels in any particular layout can be very inefficient. The
GL+GLX/EGL/etc. driver model hides this well, but it breaks down in a
few cases such as EGLImage and GLX_EXT_texture_from_pixmap: the former
never really lived up to its implied potential because of this, and the
latter mostly works only because it has a very limited domain in which
things can be shared, yet it still requires a lot of platform-specific
code to support properly. Vulkan brings a lot more of this out into the
open with its very explicit image state transitions and limitations on
which engines can access an image in any given state, but so far that
applies only within the Vulkan API itself (i.e., strictly on a single GPU
and optionally an associated display engine within the same driver &
process).
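
For illustration, this is the hand-off Vulkan already makes applications
spell out when an image moves from rendering to presentation (standard
Vulkan 1.0 API; the command buffer and image are assumed valid):

#include <vulkan/vulkan.h>

/* Transition a just-rendered color attachment into a layout the
 * presentation engine can consume; nothing about the image is
 * implicitly coherent, so the application describes the hand-off. */
static void
transition_for_present(VkCommandBuffer cmd, VkImage image)
{
    VkImageMemoryBarrier barrier = {
        .sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
        .srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
        .dstAccessMask = VK_ACCESS_MEMORY_READ_BIT,
        .oldLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
        .newLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
        .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .image = image,
        .subresourceRange = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 },
    };
    vkCmdPipelineBarrier(cmd,
                         VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
                         VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
                         0, 0, NULL, 0, NULL, 1, &barrier);
}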

The EGLStream encapsulation takes into consideration the new use cases
EGLImage, GBM, etc. were intended to address, and restores what I
believe to be the minimal amount of the traditional GL+GLX/EGL/etc.
model, while still allowing as much of the flexibility of the "a bunch
of buffers" mental model as possible. We can re-invent that with GBM
API adjustments, a set of restrictions on how the buffers it allocates
can be used, and another layer of metadata being pumped into drivers on
top of that, but I suspect we'd wind up with something that looks very
similar to streams.
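
For concreteness, a rough sketch of that encapsulation using the published
EGL_KHR_stream entry points (this assumes EGL_EGLEXT_PROTOTYPES or the
equivalent eglGetProcAddress plumbing, and a valid display and config):

#define EGL_EGLEXT_PROTOTYPES
#include <EGL/egl.h>
#include <EGL/eglext.h>

/* One stream connects a producer to a consumer; the driver, not the
 * application, decides buffer count, layout, and transitions. */
static EGLSurface
connect_stream(EGLDisplay dpy, EGLConfig config)
{
    static const EGLint surf_attribs[] = {
        EGL_WIDTH, 1920, EGL_HEIGHT, 1080, EGL_NONE,
    };
    EGLStreamKHR stream = eglCreateStreamKHR(dpy, NULL);

    /* Consumer: latch frames into the GL_TEXTURE_EXTERNAL_OES texture
     * bound in the current context (EGL_KHR_stream_consumer_gltexture). */
    eglStreamConsumerGLTextureExternalKHR(dpy, stream);

    /* Producer: an EGLSurface whose eglSwapBuffers() inserts frames
     * into the stream (EGL_KHR_stream_producer_eglsurface). */
    return eglCreateStreamProducerSurfaceKHR(dpy, config, stream,
                                             surf_attribs);
}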

We're both delving into future developments and hypotheticals to some
degree here. If we can't agree now on which direction is best, I
believe the right solution is to allow the two to co-exist and compete
collegially until the benefits of one or the other become more apparent.
The Wayland protocol and Weston compositor were designed in a manner
that makes this as painless as possible. It's not like we're going to
get a ton of Wayland clients that suddenly rely on EGLStream. At worst,
streams lose out and some dead code needs to be deleted from any
compositors that adopted them. As we discussed, there is some
maintenance cost to having two paths, but I believe it is reasonably
contained.

Thanks,
-James
Mike Blumenkrantz
2016-05-11 23:55:35 UTC
Permalink
Post by James Jones
[...]
We're both delving into future developments and hypotheticals to some
degree here. If we can't agree now on which direction is best, I
believe the right solution is to allow the two to co-exist and compete
collegially until the benefits of one or the other become more apparent.
The Wayland protocol and Weston compositor were designed in a manner
that makes this as painless as possible. It's not like we're going to
get a ton of Wayland clients that suddenly rely on EGLStream. At worst,
streams lose out and some dead code needs to be deleted from any
compositors that adopted them. As we discussed, there is some
maintenance cost to having two paths, but I believe it is reasonably
contained.
Hi,

I've been following this thread for some time, and you've raised some
interesting points. This one in particular concerns me, however. As I
understand it, you're proposing a stream-based approach that would exist
alongside the current standard (and universally used) GBM. Additionally, in
order to run on your specific brand of hardware, all toolkit and compositor
authors would need to implement your proposed streams functionality,
otherwise only software rendering would be available?

If this is true then it seems a bit strange to me that, despite still
speaking in hypothetical terms about future developments in both GBM and
streams, you're stating that GBM cannot be improved to match the
functionality of your proposed approach and are instead advocating that
everyone who has already written support for GBM now also support streams.

As someone with more than a casual interest in both toolkit and compositor
development, I'd like to see the best approach succeed, but I don't see any
fundamental blocker to providing the functionality you've described in GBM,
and I'm not overly enthusiastic about someone requiring even more work from
those who write toolkits and compositors, especially when having "full"
Wayland support is already such an enormous undertaking.

If I'm misunderstanding things, I'd appreciate some clarifications.

Thanks,
Mike
James Jones
2016-05-12 00:13:56 UTC
Permalink
Post by Mike Blumenkrantz
[...]
Hi,
I've been following this thread for some time, and you've raised some
interesting points. This one in particular concerns me, however. As I
understand it, you're proposing your stream-based approach which would
exist alongside the current standard (and universally-used) GBM.
Additionally, in order to run on your specific brand of hardware, all
toolkit and compositor authors would need to implement your proposed
streams functionality otherwise only software rendering would be available?
If this is true then it seems a bit strange to me that, despite still
speaking in hypothetical terms about future developments in both GBM and
streams, you're stating that GBM cannot be improved to match the
functionality of your proposed approach and are instead advocating that
everyone who has already written support for GBM now also support streams.
As someone with more than a casual interest in both toolkit and
compositor development, I'd like to see the best approach succeed, but I
don't see any fundamental blocker to providing the functionality you've
described in GBM, and I'm not overly enthusiastic about someone
requiring even more work from those who write toolkits and compositors,
especially when having "full" Wayland support is already such an
enormous undertaking.
If I'm misunderstanding things, I'd appreciate some clarifications.
I understand the concern, and thanks for following the discussion.
Toolkits shouldn't need any modification. Compositors would. The
changes required for compositors are not large.

Changes to all compositors would also be needed to improve GBM and
related software enough to reach functional and performance parity with
EGLStream (or X11, for that matter), and even more invasive changes would
be needed to solve the only loosely related scene-graph optimization
issues raised in this thread; change of some sort is a given. In general,
Wayland is a young standard, and I expect it will continue to evolve and
require updates to its implementations regardless of the issues discussed
here.

While only NVIDIA currently supports streams, this is not an
NVIDIA-specific set of problems, nor is it intended to be an
NVIDIA-specific solution if other vendors adopt the open EGL standards
it is based on.

Thanks,
-James
Carsten Haitzler (The Rasterman)
2016-05-12 01:56:33 UTC
Permalink
Post by James Jones
[...]
I understand the concern, and thanks for following the discussion.
Toolkits shouldn't need any modification. Compositors would. The
changes required for compositors are not large.
actually for us toolkits do need mods, because the toolkit is ALSO used on
the compositor side and thus would have to support eglstreams as a SOURCE.
we don't render by hand in the compositor - we punt it back into the
toolkit (this effectively makes it easier to write compositors, since the
toolkit can deal both with producing output for a wayland compositor and
with consuming that output from clients, and can then ALSO render it or
pass it on to drm/kms etc.).

so just saying your assumption here is wrong. this digs deep into the
toolkit too.
Post by James Jones
Changes to all compositors would also be needed to improve GBM and
related software enough to reach functional and performance parity with
EGLStream (or X11, for that matter) [...] change of some sort is a given.
but given the state of things, we'd be left with having both a gbm path and
an eglstreams path and would have to runtime-select based on the driver.
this means maintaining both, and sooner or later the lesser-used one
bitrots. we get bugs that happen in one path only and not the other, and so
on.

if there were to be an eglstreams etc. implementation for mesa and other
drivers (i'm not even getting into the mali, imgtec, etc. drivers that
would ALSO have to rev, and every embedded oem now needs to provide an
eglstreams version of their drivers, as they have to date been doing things
the gbm way)... if this were universal, then ok - pain once to move over,
then we're done.

the current trajectory is having to have both gbm and eglstreams, and this
is undesirable.
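
to be concrete, the fork in every compositor would look something like this
(just a sketch; it assumes the streams path is advertised via the EGL client
extension string, and init_eglstreams()/init_gbm() are stand-ins for the
real backend setup):

#include <string.h>
#include <EGL/egl.h>

/* probe the EGL client extension string (NULL on implementations without
 * EGL_EXT_client_extensions) and pick a backend at runtime. */
static int
use_streams_path(void)
{
    const char *exts = eglQueryString(EGL_NO_DISPLAY, EGL_EXTENSIONS);
    return exts && strstr(exts, "EGL_EXT_device_base") &&
           strstr(exts, "EGL_EXT_output_base");
}

/* ... and then: if (use_streams_path()) init_eglstreams(); else init_gbm(); */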
Post by James Jones
While only NVIDIA currently supports streams, this is not an
NVIDIA-specific set of problems, nor is it intended to be an
NVIDIA-specific solution if other vendors adopt the open EGL standards
it is based on.
right now the others aren't budging. :)
--
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler) ***@rasterman.com
Jonas Ådahl
2016-05-13 01:37:02 UTC
Permalink
Post by Carsten Haitzler (The Rasterman)
[...]
the current trajectory is having to have both gbm and eglstreams, and this
is undesirable.
Post by James Jones
While only NVIDIA currently supports streams, this is not an
NVIDIA-specific set of problems, nor is it intended to be an
NVIDIA-specific solution if other vendors adopt the open EGL standards
it is based on.
right now the others aren't budging. :)
As one of the people* working on GNOME on Wayland, I can only agree that
having multiple paths, indirectly depending on the GPU vendor, seems like a
highly undesirable way forward; one that we have so far managed to mostly
avoid. I also find it hard to believe that additions making the GBM API
equally competent to a hypothetical EGL stream API would be as invasive as
adding support for a relatively different API altogether.

I don't see the gbm path going away any time soon, and I don't see a
hypothetical EGL streams path going universal any time soon either, so
having to introduce the EGL streams path would not be so much a "migration"
towards using only that API; it'd just mean adding an extra non-trivial
path. An extra path that we'd have to maintain indefinitely, and which only
people with the right hardware can test.

FWIW, I don't think we can really take the weston patches as a hint of how
large or complex the changes to other compositors (especially ones that are
relatively old) might become.


* note that I'm expressing my own personal opinion here


Jonas
Dave Airlie
2016-05-13 01:31:54 UTC
Permalink
While only NVIDIA currently supports streams, this is not an NVIDIA-specific
set of problems, nor is it intended to be an NVIDIA-specific solution if
other vendors adopt the open EGL standards it is based on.
Open standards are great; what's better is open conformance tests and
open implementations.

EGLStreams is complicated; I know this because I've spent some time
digging into it on a couple of occasions.

How is anyone going to validate compatibility of streams implementations?

btw, I'm not saying that an open implementation or open conformance tests
would get it adopted, but I haven't seen anyone except nvidia even remotely
interested in EGLStreams.

This is old-school Khronos collaboration, when what you probably really
need is new-world open source collaboration.

I also think Streams is too big a hammer for the job, and that
building up gbm collaboratively in the open
will result in a better solution for everyone in the long run.

Dave.
Pekka Paalanen
2016-05-12 09:30:50 UTC
Permalink
On Wed, 11 May 2016 16:08:13 -0700
Post by James Jones
[...]
GBM could match the allocation aspects of streams used in Miguel's first
round of patches. However, I disagree that its core API is sufficient
to match the allocation capabilities of EGLStream+EGLSwitch where all
producing and consuming devices+engines are known at allocation time.
Further, streams have additional equally valuable functionality beyond
allocation that GBM does not seem intended to address. Absent
agreement, I believe co-existence of EGLStreams and GBM+wl_drm in
Wayland/Weston is a reasonable path forward in the short term.
Hi,

I've been following this conversation from the sidelines with great
interest, and at the risk of sounding stupid, I'd like to note a couple of
things.

I twitch a little every time you mention wl_drm, because it is so easy to
understand it as "the wl_drm protocol specified in Mesa", while for this
discussion you actually mean "any Wayland-based protocol extension you
might ever want to write for communicating between the client and the
server sides of the EGL implementation". Could we use a different word for
it, please?
Pekka Paalanen
2016-05-12 09:52:58 UTC
Permalink
On Thu, 12 May 2016 12:30:50 +0300
Post by Pekka Paalanen
[...]
I twitch a little every time you mention wl_drm [...]. Could we use a
different word for it, please?
(Argh, sorry, ctrl+enter slip.)

A word like "EGL_WL_bind_wayland_display" that tells what we are talking
about without tying it into a specific implementation.

Weston and apps have zero code for dealing with wl_drm.
Post by Pekka Paalanen
Post by James Jones
GBM alone cannot perform as well as EGLStreams unless it is extended into
something more or less the same as EGLStreams [...] From what I can see,
GBM fundamentally understands at most the consumer side of the equation.
Wouldn't that be provided by the EGL_WL_bind_wayland_display protocol you
write to suit your needs, very much like you write the streams protocol,
but without the burden of forcing everyone to use the EGLStreams API?
Post by Pekka Paalanen
Post by James Jones
Suppose, however, that GBM were taught everything streams know implicitly
about all users of the buffers at allocation time. After allocation, GBM is
done with its job, but streams and drivers aren't.
The GBM API is used by compositors to allocate the buffers they will
composite into and then present via KMS for display.

I get the feeling there is some assumption that Wayland clients might be
using the GBM API. Is there? That is not the case (today). If the EGL
implementation internally uses GBM, that's up to it, but it also means the
EGL implementation has all the arbitrary information available from the
EGL_WL_bind_wayland_display protocol to do whatever allocation calls it
needs to do.

If you write your EGL_WL_bind_wayland_display protocol to support it, you
can swap the allocation from under an existing wl_buffer without destroying
the wl_buffer. At least I think it's possible; I have never looked into
what corner cases it might raise. But I also do not quite see why you would
need to avoid destroying a wl_buffer and making a new one based on what
your EGL_WL_bind_wayland_display protocol tells you client-side. wl_buffer
is just a handle to arbitrary (meta-)data, anyway.
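
For reference, the compositor-facing half of that extension is tiny (these
are the entry points from the EGL_WL_bind_wayland_display spec, with
EGL_EGLEXT_PROTOTYPES assumed; the display and buffer handles are assumed
valid):

#define EGL_EGLEXT_PROTOTYPES
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <wayland-server.h>

/* Hand the server-side wl_display to the vendor EGL; from here on the
 * vendor defines whatever private client<->server protocol it wants. */
static void
bind_and_query(EGLDisplay egl_dpy, struct wl_display *wl_dpy,
               struct wl_resource *buffer_resource)
{
    EGLint format, width, height;

    eglBindWaylandDisplayWL(egl_dpy, wl_dpy);

    /* Later, interrogate an attached client buffer without caring how
     * (or how often) the vendor re-allocated its backing storage. */
    eglQueryWaylandBufferWL(egl_dpy, buffer_resource,
                            EGL_TEXTURE_FORMAT, &format);
    eglQueryWaylandBufferWL(egl_dpy, buffer_resource, EGL_WIDTH, &width);
    eglQueryWaylandBufferWL(egl_dpy, buffer_resource, EGL_HEIGHT, &height);
}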


Thanks,
pq
Post by Pekka Paalanen
Post by James Jones
The act of transitioning a buffer from optimal "producer mode" to
optimal "consumer mode" relies on all the device & config information as
well, meaning it would need to be fed into the graphics driver (EGL or
whatever window system binding is used) by each window system the
graphics driver was running on to achieve equivalent capabilities to
EGLStream.
Fundamentally, the API-level view of individual graphics buffers as raw
globally coherent & accessible stores of pixels with static layout is
flawed. Images on a GPU are more of a mutating spill space for a
collection of state describing the side effects of various commands than
a 2D array of pixels. Forcing GPUs to resolve an image to a 2D array of
pixels in any particular layout can be very inefficient. The
GL+GLX/EGL/etc. driver model hides this well, but it breaks down in a
few cases like EGLImage and GLX_EXT_texture_from_pixmap, the former not
really living up to its implied potential because of this, and the
latter mostly working only because it has a very limited domain where
things can be shared, but still requires a lot of platform-specific code
to support properly. Vulkan brings a lot more of this out into the open
with its very explicit image state transitions and limitations on which
engines can access an image in any given state, but that's just within
the Vulkan API itself (i.e., strictly on a single GPU and optionally an
associated display engine within the same driver & process) so far.
The EGLStream encapsulation takes into consideration the new use cases
EGLImage, GBM, etc. were intended to address, and restores what I
believe to be the minimal amount of the traditional GL+GLX/EGL/etc.
model, while still allowing as much of the flexibility of the "a bunch
of buffers" mental model as possible. We can re-invent that with GBM
API adjustments, a set of restrictions on how the buffers it allocates
can be used, and another layer of metadata being pumped into drivers on
top of that, but I suspect we'd wind up with something that looks very
similar to streams.
We're both delving into future developments and hypotheticals to some
degree here. If we can't agree now on which direction is best, I
believe the right solution is to allow the two to co-exist and compete
collegially until the benefits of one or the other become more apparent.
The Wayland protocol and Weston compositor were designed in a manner
that makes this as painless as possible. It's not like we're going to
get a ton of Wayland clients that suddenly rely on EGLStream. At worst,
streams lose out and some dead code needs to be deleted from any
compositors that adopted them. As we discussed, there is some
maintenance cost to having two paths, but I believe it is reasonably
contained.
Thanks,
-James
Post by Daniel Stone
Cheers,
Daniel
Kristian Høgsberg
2016-05-13 05:07:10 UTC
Permalink
Post by James Jones
Post by Daniel Stone
Hi James,
Post by Daniel Stone
Right - but as with the point I was making below, GBM _right now_ is
more capable than Streams _right now_. GBM right now would require API
additions to match EGLStreams + EGLSwitch + Streams/KMS-interop, but
the last two aren't written either, so. (More below.)
The current behavior that enables this, where basically all Wayland buffers
must be allocated as scanout-capable, isn't reasonable on NVIDIA hardware.
The requirements for scanout are too onerous.
I think we're talking past each other, so I'd like to pare the
discussion down to these two sentences, and my two resultant points.
I posit that the Streams proposal you (plural) have put forward is, at
best, equivalent here:
- there is currently no support for direct scanout from client
buffers in Streams, so it must always pessimise towards GPU
composition
- GBM stacks can obviously do the same: implement a no-op
gbm_bo_import, and have your client always allocate non-scanout
buffers - presto, you've matched Streams
I posit that GBM _can_ match the capability of a hypothetical
EGLStreams/EGLSwitch implementation. Current _implementations_ of GBM
cannot, but I posit that it is not a limitation of the API it exposes,
and unlike Streams, the capability can be plumbed in with no new
external API required.
These seem pretty fundamental, so ... am I missing something? :\ If
so, can you please outline fairly specifically how you think
non-Streams implementations are not capable of meeting the criteria in
your two sentences?
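
(For reference, the no-op-import fallback being described looks roughly
like the following sketch; the helper name try_direct_scanout is made up
here, and whether the import succeeds is entirely up to the driver's GBM
implementation.)

  #include <stdbool.h>
  #include <gbm.h>

  struct wl_resource;

  /* Sketch: try to use a client's wl_buffer directly on a plane; if
   * gbm_bo_import() rejects it (e.g. not scanout-capable), fall back
   * to GPU composition. */
  static bool
  try_direct_scanout(struct gbm_device *gbm, struct wl_resource *buffer)
  {
      struct gbm_bo *bo = gbm_bo_import(gbm, GBM_BO_IMPORT_WL_BUFFER,
                                        buffer, GBM_BO_USE_SCANOUT);
      if (!bo)
          return false;   /* composite with the GPU instead */

      /* ... drmModeAddFB() + assign the fb to a plane ... */
      return true;
  }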
I respect the need to rein in the discussion, but I think several
substantive aspects have been lost here. I typed up a much longer
response, but it boils down to this:
GBM could match the allocation aspects of streams used in Miguel's first
round of patches. However, I disagree that its core API is sufficient to
match the allocation capabilities of EGLStream+EGLSwitch where all producing
and consuming devices+engines are known at allocation time. Further, streams
have additional equally valuable functionality beyond allocation that GBM
does not seem intended to address. Absent agreement, I believe co-existence
of EGLStreams and GBM+wl_drm in Wayland/Weston is a reasonable path forward
in the short term.
GBM alone can not perform as well as EGLStreams unless it is extended into
something more or less the same as EGLStreams, where it knows exactly what
engines are being used to produce the buffer content (along with their
current configuration), and exactly what engines/configuration are being
used to consume it. This implies allocating against multiple specific
objects, rather than a device and a set of allocation modifier flags, and/or
importing an external allocation and hoping it meets the current
requirements. From what I can see, GBM fundamentally understands at most
the consumer side of the equation.
Suppose however, GBM was taught everything streams know implicitly about all
users of the buffers at allocation time. After allocation, GBM is done with
its job, but streams & drivers aren't.
The act of transitioning a buffer from optimal "producer mode" to optimal
"consumer mode" relies on all the device & config information as well,
meaning it would need to be fed into the graphics driver (EGL or whatever
window system binding is used) by each window system the graphics driver was
running on to achieve equivalent capabilities to EGLStream.
Fundamentally, the API-level view of individual graphics buffers as raw
globally coherent & accessible stores of pixels with static layout is
flawed. Images on a GPU are more of a mutating spill space for a collection
of state describing the side effects of various commands than a 2D array of
pixels. Forcing GPUs to resolve an image to a 2D array of pixels in any
particular layout can be very inefficient. The GL+GLX/EGL/etc. driver model
hides this well, but it breaks down in a few cases like EGLImage and
GLX_EXT_texture_from_pixmap, the former not really living up to its implied
potential because of this, and the latter mostly working only because it has
a very limited domain where things can be shared, but still requires a lot
of platform-specific code to support properly. Vulkan brings a lot more of
this out into the open with its very explicit image state transitions and
limitations on which engines can access an image in any given state, but
that's just within the Vulkan API itself (i.e., strictly on a single GPU and
optionally an associated display engine within the same driver & process) so
far.
The EGLStream encapsulation takes into consideration the new use cases
EGLImage, GBM, etc. were intended to address, and restores what I believe to
be the minimal amount of the traditional GL+GLX/EGL/etc. model, while still
allowing as much of the flexibility of the "a bunch of buffers" mental model
as possible. We can re-invent that with GBM API adjustments, a set of
restrictions on how the buffers it allocates can be used, and another layer
of metadata being pumped into drivers on top of that, but I suspect we'd
wind up with something that looks very similar to streams.
I think this is where the disconnect is. I (and others) don't see
reinventing some of the EGLStream functionality in gbm + wl_drm (or
similar EGL implementation private protocol) as a problem or that the
result will be worse than EGLStreams. Compositors use gbm today and
would much rather grow one code path incrementally and in a
backwards-compatible way. I know that's already been done for various
non-Mesa stacks, like SoCs where scanout memory is a scarce resource. If we
end up with something similar to what EGLStream will be one day, that
doesn't mean we should've used EGLStreams. It just means they're
different solutions to the same problem.

Kristian
Post by James Jones
We're both delving into future developments and hypotheticals to some degree
here. If we can't agree now on which direction is best, I believe the right
solution is to allow the two to co-exist and compete collegially until the
benefits of one or the other become more apparent. The Wayland protocol and
Weston compositor were designed in a manner that makes this as painless as
possible. It's not like we're going to get a ton of Wayland clients that
suddenly rely on EGLStream. At worst, streams lose out and some dead code
needs to be deleted from any compositors that adopted them. As we
discussed, there is some maintenance cost to having two paths, but I believe
it is reasonably contained.
Thanks,
-James
Post by Daniel Stone
Cheers,
Daniel
Daniel Stone
2016-05-14 16:46:51 UTC
Permalink
Hi James,
Post by James Jones
GBM alone can not perform as well as EGLStreams unless it is extended into
something more or less the same as EGLStreams, where it knows exactly what
engines are being used to produce the buffer content (along with their
current configuration), and exactly what engines/configuration are being
used to consume it. This implies allocating against multiple specific
objects, rather than a device and a set of allocation modifier flags, and/or
importing an external allocation and hoping it meets the current
requirements. From what I can see, GBM fundamentally understands at most
the consumer side of the equation.
I disagree with the last part of this. GBM is integrated with EGL, and
thus has the facility to communicate with the producer as it pleases,
through private protocol.
Post by James Jones
Suppose however, GBM was taught everything streams know implicitly about all
users of the buffers at allocation time. After allocation, GBM is done with
its job, but streams & drivers aren't.
The act of transitioning a buffer from optimal "producer mode" to optimal
"consumer mode" relies on all the device & config information as well,
meaning it would need to be fed into the graphics driver (EGL or whatever
window system binding is used) by each window system the graphics driver was
running on to achieve equivalent capabilities to EGLStream.
Sure. But this leads into one huge (unaddressed) concern I have:
integration with the world outside libEGL.so. Vulkan and media APIs
are going to need to gain explicit knowledge - read, an extra
dependency on - EGL in order to deal with this. Then let's throw a
media device into the mix: how does Streams ensure optimal
configuration? Does that require teaching EGL about media decode
devices, and growing a whole other API for that? More pressingly, how
do you deal with other devices?

Tegra devices are in an enviable position where NVIDIA produces all
the IP, but in that regard it stands alone in the SoC world. The only
two cases I know of where the IP blocks are homogeneous are Tegra and
some Qualcomm devices - but then again, some Qualcomm blocks use a
Samsung media decode IP. Same story for multi-GPU drivers: how do you
do interop between an Intel GPU doing composition and an NVIDIA GPU
producing content?

From where I stand, there are two options to deal with this: one is to
declare that the world must use EGLStreams for optimal allocation,
even if they'd never previously used Streams, and the other is to
surface the interactions with Streams into a public API that can be
used by, say, media producers. Which model are you looking towards
here?

Again, NVIDIA are fine with producing a very large libEGL.so, and
Tegra's nature makes that easier to do, but what about everyone else?
Post by James Jones
Fundamentally, the API-level view of individual graphics buffers as raw
globally coherent & accessible stores of pixels with static layout is
flawed. Images on a GPU are more of a mutating spill space for a collection
of state describing the side effects of various commands than a 2D array of
pixels. Forcing GPUs to resolve an image to a 2D array of pixels in any
particular layout can be very inefficient. The GL+GLX/EGL/etc. driver model
hides this well, but it breaks down in a few cases like EGLImage and
GLX_EXT_texture_from_pixmap, the former not really living up to its implied
potential because of this, and the latter mostly working only because it has
a very limited domain where things can be shared, but still requires a lot
of platform-specific code to support properly. Vulkan brings a lot more of
this out into the open with its very explicit image state transitions and
limitations on which engines can access an image in any given state, but
that's just within the Vulkan API itself (i.e., strictly on a single GPU and
optionally an associated display engine within the same driver & process) so
far.
There's nothing in this I disagree with, but I also don't read it as an
indictment of GBM. You've previously made the point that looking
beyond frames to streams is a better way of looking at things, which
is fine, but both Wayland and KMS are fundamentally frame-based at
their core, so the impedance mismatch is already pretty obvious from
the start.
Post by James Jones
The EGLStream encapsulation takes into consideration the new use cases
EGLImage, GBM, etc. were intended to address, and restores what I believe to
be the minimal amount of the traditional GL+GLX/EGL/etc. model, while still
allowing as much of the flexibility of the "a bunch of buffers" mental model
as possible. We can re-invent that with GBM API adjustments, a set of
restrictions on how the buffers it allocates can be used, and another layer
of metadata being pumped into drivers on top of that, but I suspect we'd
wind up with something that looks very similar to streams.
The only allocation GBM does is for buffers produced by the compositor
and used for scanout, so in this regard it's quite straightforward.
Client buffers are a separate topic, and I don't buy that the
non-Streams model precludes things like render compression. In fact,
Ben Widawsky, Dan Vetter, and some others are as we speak working on
support for render compression within both Wayland EGL and GBM itself
(for direct scanout from compressed buffers with an auxiliary plane).
So far, the only external impact has been a very small extension to
the GBM API to allow use of multiple planes and FB modifiers: a far
smaller change than implementing the whole of Streams and all its
future extensions (Switch et al).
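
(For context, the API additions being referred to eventually took roughly
the following shape in Mesa's gbm.h and libdrm; treat the exact entry
points as illustrative of the size of the change rather than as the state
of the tree at the time of this mail. Error handling omitted.)

  #include <gbm.h>
  #include <xf86drmMode.h>

  /* Sketch: allocate with a list of acceptable modifiers, then hand
   * the resulting (possibly multi-plane) layout to KMS. */
  static uint32_t
  create_scanout_fb(int drm_fd, struct gbm_device *gbm,
                    uint32_t width, uint32_t height,
                    const uint64_t *modifiers, unsigned int num_modifiers)
  {
      struct gbm_bo *bo;
      uint32_t handles[4] = {0}, strides[4] = {0}, offsets[4] = {0};
      uint64_t mods[4] = {0};
      uint32_t fb_id;
      int i;

      /* The compositor offers the modifiers it can scan out; the
       * driver picks a layout and reports it back. */
      bo = gbm_bo_create_with_modifiers(gbm, width, height,
                                        GBM_FORMAT_XRGB8888,
                                        modifiers, num_modifiers);

      for (i = 0; i < gbm_bo_get_plane_count(bo); i++) {
          handles[i] = gbm_bo_get_handle_for_plane(bo, i).u32;
          strides[i] = gbm_bo_get_stride_for_plane(bo, i);
          offsets[i] = gbm_bo_get_offset(bo, i);
          mods[i] = gbm_bo_get_modifier(bo);
      }

      drmModeAddFB2WithModifiers(drm_fd, width, height,
                                 GBM_FORMAT_XRGB8888, handles, strides,
                                 offsets, mods, &fb_id,
                                 DRM_MODE_FB_MODIFIERS);
      return fb_id;
  }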
Post by James Jones
We're both delving into future developments and hypotheticals to some degree
here. If we can't agree now on which direction is best, I believe the right
solution is to allow the two to co-exist and compete collegially until the
benefits of one or the other become more apparent. The Wayland protocol and
Weston compositor were designed in a manner that makes this as painless as
possible. It's not like we're going to get a ton of Wayland clients that
suddenly rely on EGLStream. At worst, streams lose out and some dead code
needs to be deleted from any compositors that adopted them. As we
discussed, there is some maintenance cost to having two paths, but I believe
it is reasonably contained.
It would be interesting to see the full Streams patchset - including
EGLSwitch and direct-scanout - to see what the final impact would be
like.

As Kristian says, I really don't see where the existing non-Streams
solutions, being GBM on the compositor side and private frame-based
protocols between compositor and client, leave you unable to reach
full performance potential. Do you have any concrete usecases that you
can point to in as much detail as possible, outlining exactly how the
GBM/private-Wayland-protocol model forces you to compromise
performance?

Cheers,
Daniel
Daniel Vetter
2016-05-16 09:36:48 UTC
Permalink
Post by Daniel Stone
Post by James Jones
The EGLStream encapsulation takes into consideration the new use cases
EGLImage, GBM, etc. were intended to address, and restores what I believe to
be the minimal amount of the traditional GL+GLX/EGL/etc. model, while still
allowing as much of the flexibility of the "a bunch of buffers" mental model
as possible. We can re-invent that with GBM API adjustments, a set of
restrictions on how the buffers it allocates can be used, and another layer
of metadata being pumped into drivers on top of that, but I suspect we'd
wind up with something that looks very similar to streams.
The only allocation GBM does is for buffers produced by the compositor
and used for scanout, so in this regard it's quite straightforward.
Client buffers are a separate topic, and I don't buy that the
non-Streams model precludes things like render compression. In fact,
Ben Widawsky, Dan Vetter, and some others are as we speak working on
support for render compression within both Wayland EGL and GBM itself
(for direct scanout from compressed buffers with an auxiliary plane).
So far, the only external impact has been a very small extension to
the GBM API to allow use of multiple planes and FB modifiers: a far
smaller change than implementing the whole of Streams and all its
future extensions (Switch et al).
Just a quick correction: for render compression we also need some
allocation hinting interface, since on Intel GPUs you can't always scan
out render-compressed buffers. That is exactly what EGLstreams also tries
to solve (at least if my understanding is correct). So we need a bit more
in gbm than just being able to pass fb modifiers around.

I still think it's the better approach though since it's still fairly
incremental. And exposing the allocation hints and making them explicit
will avoid the need to teach everything in the world about EGLstreams (vk,
v4l, drm, ...). Which as Daniel Stone pointed out, doesn't really work
well if you have IP blocks from multiple vendors on your SoC.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
James Jones
2016-05-16 18:12:35 UTC
Permalink
Post by Daniel Vetter
Post by Daniel Stone
Post by James Jones
The EGLStream encapsulation takes into consideration the new use cases
EGLImage, GBM, etc. were intended to address, and restores what I believe to
be the minimal amount of the traditional GL+GLX/EGL/etc. model, while still
allowing as much of the flexibility of the "a bunch of buffers" mental model
as possible. We can re-invent that with GBM API adjustments, a set of
restrictions on how the buffers it allocates can be used, and another layer
of metadata being pumped into drivers on top of that, but I suspect we'd
wind up with something that looks very similar to streams.
The only allocation GBM does is for buffers produced by the compositor
and used for scanout, so in this regard it's quite straightforward.
Client buffers are a separate topic, and I don't buy that the
non-Streams model precludes things like render compression. In fact,
Ben Widawsky, Dan Vetter, and some others are as we speak working on
support for render compression within both Wayland EGL and GBM itself
(for direct scanout from compressed buffers with an auxiliary plane).
So far, the only external impact has been a very small extension to
the GBM API to allow use of multiple planes and FB modifiers: a far
smaller change than implementing the whole of Streams and all its
future extensions (Switch et al).
Just a quick correction: for render compression we also need some
allocation hinting interface, since on Intel GPUs you can't always scan
out render-compressed buffers. That is exactly what EGLstreams also tries
to solve (at least if my understanding is correct). So we need a bit more
in gbm than just being able to pass fb modifiers around.
Yes, this, and it goes beyond just hinting at allocation time for us if
you intend to reconfigure the output without reallocating the surface
(e.g., switch to a different plane, start rotating the output, etc.).
Post by Daniel Vetter
I still think it's the better approach though since it's still fairly
incremental. And exposing the allocation hints and making them explicit
will avoid the need to teach everything in the world about EGLstreams (vk,
v4l, drm, ...). Which as Daniel Stone pointed out, doesn't really work
well if you have IP blocks from multiple vendors on your SoC.
-Daniel
Yeah, IP blocks from multiple vendors are hard. I don't see how they're
any harder with streams, though, vs. the alternate GBM-based proposals
that have been suggested thus far. We're not entirely immune to this at
NVIDIA. Sometimes we want to present to an Intel display engine, for
example. An EGL-based solution doesn't necessarily mean a single
vendor's EGL driver (GLVND is coming, slowly), and even if it does, it
only requires explicit cooperation if both vendors share some more
optimal layout than basic pitch-linear with minimal alignment
requirements and whatnot, no compression, either fully-coherent caches
or no caching.

However, there are two ways to solve this:

-Always resort to the lowest common denominator when the
producer/consumer aren't from the same vendor, as mentioned above.

-Have some sort of coordination, either handled by the application and a
bunch of capability bits, or handled by a driver<->driver API below the
level of the application API.

Neither of these seem specific to either a streams-based or EGL-based
solution to me. The important part is to standardize the interfaces
exposed to applications or drivers to coordinate the right formats.

As to needing to teach everything about EGLStreams, I think there's a
misconception that this means every component vendor needs to get on the
EGL bandwagon and start writing a bunch of no-op eglGetConfig() entry
points and whatnot. Even with all our in-house IP, that's not the case
at NVIDIA. Our media codecs aren't baked into the same driver module as
our OpenGL drivers for example, and the drivers and engineers
maintaining them know very little about each other. Our EGL driver
allows stream producers/consumers to plug into it using some
internal-standard interfaces and a relatively minimal amount of code,
and without even including any Khronos EGL headers.

The current Khronos EGL API doesn't need to be the only interface
through which drivers plug in to a libEGL or vendor EGL implementation.
The proposal to expose a vendor-agnostic set of hooks to allow writing
EGL platform implementations without EGL vendor involvement is one
example of a non-application facing EGL API. EGLStream producer and
consumer hooks could be handled with another non-application facing API.
Post by Daniel Vetter
As Kristian says, I really don't see where the existing non-Streams
solutions, being GBM on the compositor side and private frame-based
protocols between compositor and client, leave you unable to reach
full performance potential. Do you have any concrete usecases that you
can point to in as much detail as possible, outlining exactly how the
GBM/private-Wayland-protocol model forces you to compromise
performance?
Unfortunately, the only realistic way to get to the full patchset is
incrementally. We haven't even finished the EGLSwitch extension, let
alone writing Weston code to use it. This is why I believe temporary
co-existence of the two paths is a reasonable path for now. Not all the
benefits of streams are demonstrable yet, nor is GBM in its final form.

Daniel Stone, I'd like to hear more about how you envision a GBM library
communicating with an EGL producer in a remote process. Would GBM be
sending wayland protocol directly? If so, this is really starting to
sound like streams-rewritten-using-wayland-protocol, and I don't think
wayland is the right domain to solve these non-wayland-specific issues
in. If, on the other hand, GBM is going to gain its own set of
per-vendor cross-process communication mechanisms, that really sounds
like a re-invention of EGLStreams.

Perhaps both of my assumed solutions above are way off the mark, and it
does seem like we're talking past each other at times. It seems like you
have a pretty strong understanding of how this would all work in GBM,
even if it's not there in the code yet. I understand you're quite busy,
but perhaps we could have a brief real-time communication session (IRC?
Phone?) where we can talk through some of your ideas for GBM, so we
can at least start from the same basic understanding when talking about
this stuff. Let me know if you want to schedule something.

Thanks,
-James
Daniel Vetter
2016-05-16 20:44:47 UTC
Permalink
Post by Daniel Vetter
Post by Daniel Stone
Post by James Jones
The EGLStream encapsulation takes into consideration the new use cases
EGLImage, GBM, etc. were intended to address, and restores what I believe to
be the minimal amount of the traditional GL+GLX/EGL/etc. model, while still
allowing as much of the flexibility of the "a bunch of buffers" mental model
as possible. We can re-invent that with GBM API adjustments, a set of
restrictions on how the buffers it allocates can be used, and another layer
of metadata being pumped into drivers on top of that, but I suspect we'd
wind up with something that looks very similar to streams.
The only allocation GBM does is for buffers produced by the compositor
and used for scanout, so in this regard it's quite straightforward.
Client buffers are a separate topic, and I don't buy that the
non-Streams model precludes things like render compression. In fact,
Ben Widawsky, Dan Vetter, and some others are as we speak working on
support for render compression within both Wayland EGL and GBM itself
(for direct scanout from compressed buffers with an auxiliary plane).
So far, the only external impact has been a very small extension to
the GBM API to allow use of multiple planes and FB modifiers: a far
smaller change than implementing the whole of Streams and all its
future extensions (Switch et al).
Just a quick correction: for render compression we also need some
allocation hinting interface, since on Intel GPUs you can't always scan
out render-compressed buffers. That is exactly what EGLstreams also tries
to solve (at least if my understanding is correct). So we need a bit more
in gbm than just being able to pass fb modifiers around.
Yes, this, and it goes beyond just hinting at allocation time for us if you
intend to reconfigure the output without reallocating the surface (e.g.,
switch to a different plane, start rotating the output, etc.).
Post by Daniel Vetter
I still think it's the better approach though since it's still fairly
incremental. And exposing the allocation hints and making them explicit
will avoid the need to teach everything in the world about EGLstreams (vk,
v4l, drm, ...). Which as Daniel Stone pointed out, doesn't really work
well if you have IP blocks from multiple vendors on your SoC.
-Daniel
Yeah, IP blocks from multiple vendors are hard. I don't see how they're any
harder with streams, though, vs. the alternate GBM-based proposals that have
been suggested thus far. We're not entirely immune to this at NVIDIA.
Sometimes we want to present to an Intel display engine, for example. An
EGL-based solution doesn't necessarily mean a single vendor's EGL driver
(GLVND is coming, slowly), and even if it does, it only requires explicit
cooperation if both vendors share some more optimal layout than basic
pitch-linear with minimal alignment requirements and whatnot, no
compression, either fully-coherent caches or no caching.
-Always resort to the lowest common denominator when the producer/consumer
aren't from the same vendor, as mentioned above.
-Have some sort of coordination, either handled by the application and a
bunch of capability bits, or handled by a driver<->driver API below the
level of the application API.
Neither of these seem specific to either a streams-based or EGL-based
solution to me. The important part is to standardize the interfaces exposed
to applications or drivers to coordinate the right formats.
As to needing to teach everything about EGLStreams, I think there's a
misconception that this means every component vendor needs to get on the EGL
bandwagon and start writing a bunch of no-op eglGetConfig() entry points and
whatnot. Even with all our in-house IP, that's not the case at NVIDIA. Our
media codecs aren't baked into the same driver module as our OpenGL drivers
for example, and the drivers and engineers maintaining them know very little
about each other. Our EGL driver allows stream producers/consumers to plug
into it using some internal-standard interfaces and a relatively minimal
amount of code, and without even including any Khronos EGL headers.
I think this is the crux here - we need to standardize those hints/that
minimal set of shared metadata, in a public, cross-vendor interface.
Obviously, without such an interface it doesn't even work for you in your
one-vendor case.

And as soon as we have that standard, I don't really see the benefit of
EGLstreams any more. At least my understanding is that the entire point of
EGLstreams is to hide these hints and metadata and allow them to be vendor
specific.

The other argument for EGLstreams seems to be that once you use EGLstreams
everywhere, there's less wheel-reinventing or, well, protocol re-typing
going on. But right now we (at least the open source community) are in a
world where EGLstreams doesn't exist, so we need to do all that typing
anyway. And typing the protocol itself has the advantage that you can more
easily fit it into whatever's there already. E.g. for
xfree86-video-modesetting we want to add any metadata we standardize to
DRI3 (probably, haven't looked at the details). On wayland we can just add
a new protocol object and reuse all the fancy stuff wayland provides, but
put that proto within libEGL (or gbm) since that fits more with how wl
works right now. And for Android we can do an Android-specific version
(again, haven't looked into the details there). So afaics more
flexibility, with roughly the same amount of work.

Or is there something else I'm not seeing?
The current Khronos EGL API doesn't need to be the only interface through
which drivers plug in to a libEGL or vendor EGL implementation. The
proposal to expose a vendor-agnostic set of hooks to allow writing EGL
platform implementations without EGL vendor involvement is one example of a
non-application facing EGL API. EGLStream producer and consumer hooks could
be handled with another non-application facing API.
Post by Daniel Vetter
As Kristian says, I really don't see where the existing non-Streams
solutions, being GBM on the compositor side and private frame-based
protocols between compositor and client, leave you unable to reach
full performance potential. Do you have any concrete usecases that you
can point to in as much detail as possible, outlining exactly how the
GBM/private-Wayland-protocol model forces you to compromise
performance?
Unfortunately, the only realistic way to get to the full patchset is
incrementally. We haven't even finished the EGLSwitch extension, let alone
writing Weston code to use it. This is why I believe temporary co-existence
of the two paths is a reasonable path for now. Not all the benefits of
streams are demonstrable yet, nor is GBM in its final form.
Daniel Stone, I'd like to hear more about how you envision a GBM library
communicating with an EGL producer in a remote process. Would GBM be
sending wayland protocol directly? If so, this is really starting to sound
like streams-rewritten-using-wayland-protocol, and I don't think wayland is
the right domain to solve these non-wayland-specific issues in. If, on the
other hand, GBM is going to gain its own set of per-vendor cross-process
communication mechanisms, that really sounds like a re-invention of
EGLStreams.
Perhaps both of my assumed solutions above are way off the mark, and it does
seem like we're talking past each other at times. It seems like you have a
pretty strong understanding of how this would all work in GBM, even if it's
not there in the code yet. I understand you're quite busy, but perhaps we
could have a brief real-time communication session (IRC? Phone?) where we
can talk through some of your ideas for GBM, so we can at least start from
the same basic understanding when talking about this stuff. Let me know if
you want to schedule something.
Yeah, I think we're at the point where code is probably much easier to
understand. I'm trying to type up the gbm-based idea in hopefully the near
future. That should ground the discussion a lot more.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Daniel Vetter
2016-05-03 16:06:18 UTC
Permalink
Post by James Jones
Streams could provide a way to express that the compositor picked the wrong
plane, but they don't solve the optimal configuration problem. Configuration
is a tricky mix of policy and capabilities that something like HWComposer or
a wayland compositor with access to HW-specific knowledge needs to solve. I
agree with other statements here that encapsulating direct HW knowledge
within individual Wayland compositors is probably not a great idea, but some
separate standard or shared library taking input from hardware-specific
modules and wrangling scene graphs is probably needed to get optimal
behavior.
What streams do is allow allocating the most optimal set of buffers and
using the most optimal method to present them possible given a
configuration. So, streams would kick in after the scene graph thing
generated a config.
Daniel's reply cut out this crucial bit somehow, and he replied somewhere
else that he agrees that eglstreams solves at least the "optimal
allocation once scene graph is fixed" problem. I disagree since this
entire thing is highly dynamic - at least on SoC chips, how you allocate
your buffers has a big impact on what the display engine can do, and the
other way round:
- depending upon the tiling layout, fifo space requirements change
  drastically, and going for the "optimal" tiling might push some other
  plane over the edge
- there's simpler stuff like some planes only supporting certain features
  like render compression, which is why even for a TEST_ONLY atomic commit
  you must supply all the buffers already (see the sketch below)
- other fun stuff happens around rotation/scaling/planar vs. single-plane
  yuv buffers. All these tend to need special hw resources, which means
  your choice in how to use them on the kms side has effects on what kind
  of buffer you need to allocate. And the other way round.
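
(A minimal sketch of the TEST_ONLY probing referred to above, using the
libdrm atomic API; the property-ID parameters are placeholders that real
code would look up via drmModeObjectGetProperties().)

  #include <stdbool.h>
  #include <xf86drmMode.h>

  /* Probe whether a plane can actually scan out this framebuffer
   * before committing the configuration for real. */
  static bool
  config_works(int fd, uint32_t plane_id, uint32_t crtc_id, uint32_t fb_id,
               uint32_t plane_fb_prop, uint32_t plane_crtc_prop)
  {
      drmModeAtomicReq *req = drmModeAtomicAlloc();
      bool ok;

      /* The real buffer must be attached even for the test: its
       * tiling/compression determines whether the plane can use it. */
      drmModeAtomicAddProperty(req, plane_id, plane_fb_prop, fb_id);
      drmModeAtomicAddProperty(req, plane_id, plane_crtc_prop, crtc_id);

      ok = drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_TEST_ONLY,
                               NULL) == 0;
      drmModeAtomicFree(req);
      return ok;
  }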

I don't think there's any way at all, at least for a generic system that
wants to support embedded/mobile SoCs, to solve the kms config and buffer
alloc problems as two separate steps. You absolutely need these two
pieces to talk to each other, and to talk the same language. Either some
vendor horror show (what most Android BSPs end up doing behind everyone's
back) or something standardized (what we're trying to pull off around
kms+gbm).

Hiding half of the story behind eglstreams doesn't help anyone afaict. If
you do that, you also need to hide the other half. Which means proprietary
hw composer driver (or similar), which can understand/change the metadata
you internally attach to eglstreams/buffers. And once you've decided to go
the fully hidden route hw composer seems to be the best choice really.
SurfaceFlinger isn't really great for multi-screen, but the hwc interface
itself is already fixed and handles that properly. But even with hwc you
don't have eglstreams, because once both ends are proprietary there's
really no need for any standard any more ;-)
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
James Jones
2016-05-03 16:29:58 UTC
Permalink
Post by Daniel Vetter
Post by James Jones
Streams could provide a way to express that the compositor picked the wrong
plane, but they don't solve the optimal configuration problem. Configuration
is a tricky mix of policy and capabilities that something like HWComposer or
a wayland compositor with access to HW-specific knowledge needs to solve. I
agree with other statements here that encapsulating direct HW knowledge
within individual Wayland compositors is probably not a great idea, but some
separate standard or shared library taking input from hardware-specific
modules and wrangling scene graphs is probably needed to get optimal
behavior.
What streams do is allow allocating the most optimal set of buffers and
using the most optimal method to present them possible given a
configuration. So, streams would kick in after the scene graph thing
generated a config.
Daniel's reply cut out this crucial bit somehow, and he replied somewhere
else that he agrees that eglstreams solves at least the "optimal
allocation once scene graph is fixed" problem. I disagree since this
entire thing is highly dynamic - at least on SoC chips how you allocate
your buffers has big impacts on what the display engine can do, and the
- depending upon tiling layout fifo space requirements change drastically,
and going for the "optimal" tiling might push some other plane over the
edge
- there's simpler stuff like some planes can only do some features like
render compression, which is why even for a TEST_ONLY atomic commit you
must supply all the buffers already
- other fun stuff happens around rotation/scaling/planar vs. single-plane
yuv buffers. All these tend to need special hw resources, which means
your choice in how to use it on the kms side has effects on what kind of
buffer you need to allocate. And the other way round.
I don't think there's any way at all, at least for a generic system that
wants to support embedded/mobile SoCs to solve the kms config and buffer
alloc problems as 2 separate steps. You absolutely need these two pieces
to talk to each other, and talk the same language. Either some vendor
horror show (what most of android bsp end up doing behind the back) or
something standardized (what we're trying to pull off around kms+gbm).
Hiding half of the story behind eglstreams doesn't help anyone afaict. If
you do that, you also need to hide the other half. Which means proprietary
hw composer driver (or similar), which can understand/change the metadata
you internally attach to eglstreams/buffers. And once you've decided to go
the fully hidden route hw composer seems to be the best choice really.
SurfaceFlinger isn't really great for multi-screen, but the hwc interface
itself is already fixed and handles that properly. But even with hwc you
don't have eglstreams, because once both ends are proprietary there's
really no need for any standard any more ;-)
-Daniel
Thank you for the additional information. If I follow this correctly:

1) You believe HW composer is a reasonable solution to optimizing the
scenegraph of a compositor.

2) Your preferred solution would not be HW composer, but rather GBM+KMS
(presumably with the addition of some yet-to-be-developed APIs)

3) You believe the constraints of the system are sufficiently
interdependent that to optimize the system, all allocations and display
engine configuration must be done atomically, in a sense.

Is that correct?

If so, I have some questions:

-Do you believe (2) is reasonably achievable, or just the style of
solution you would prefer in general?

-Why is GBM+DRM-KMS better suited to meet the requirements presented by
(3) than EGLStream+DRM-KMS?

Given Wayland is designed such that clients drive buffer allocation, and
I tend to agree that the compositor (along with its access to drivers
like KMS) is the component uniquely able to optimize the scene, I think
the best that can be achieved is a system that gravitates toward the
optimal solution in the steady state. Therefore, it seems that KMS
should optimize display engine resources assuming the Wayland compositor
and its clients will adjust to meet KMS' suggestions over time, where
"time" would hopefully be only a small number of additional frames.
Streams will perform quite well in such a design.

There would of course be cases where multiple iterations are required to
get from the current buffers and their display requirements to the
optimal buffers and the optimal display settings, but I don't see a way
around that. Hopefully over time display hardware will be optimized
towards these use-cases, as they are becoming ubiquitous.

Thanks,
-James
Daniel Stone
2016-05-03 16:58:02 UTC
Permalink
Hi James,
Post by James Jones
Given Wayland is designed such that clients drive buffer allocation
I'd just note that this isn't strictly true. I've personally
implemented Wayland support for platforms (media playback on an
extremely idiosyncratic platform) where server-side buffer allocation
was required for optimal performance, and that's what was done. wl_drm
is not exemplary for these platforms as it does not have a protocol
concept of a swapchain, but you can add one to your own private
protocol implementation (analogous to wl_eglstream) and it works with
no changes required to external clients or compositors.
Post by James Jones
, and I
tend to agree that the compositor (along with its access to drivers like
KMS) is the component uniquely able to optimize the scene, I think the best
that can be achieved is a system that gravitates toward the optimal solution
in the steady state. Therefore, it seems that KMS should optimize display
engine resources assuming the Wayland compositor and its clients will adjust
to meet KMS' suggestions over time, where "time" would hopefully be only a
small number of additional frames. Streams will perform quite well in such a
design.
It is unfortunate that you seem to discuss 'Streams' as an abstract
concept of a cross-process swapchain which can be infinitely adjusted
to achieve perfection, and yet 'GBM' gets discussed as a singular
fixed-in-time thing which has all the flaws of just one of its
particular platform implementations.

I don't see how GBM could really perform any worse in such a design.

Cheers,
Daniel
James Jones
2016-05-03 18:58:32 UTC
Permalink
Post by Daniel Stone
Hi James,
Post by James Jones
Given Wayland is designed such that clients drive buffer allocation
I'd just note that this isn't strictly true. I've personally
implemented Wayland support for platforms (media playback on an
extremely idiosyncratic platform) where server-side buffer allocation
was required for optimal performance, and that's what was done. wl_drm
is not exemplary for these platforms as it does not have a protocol
concept of a swapchain, but you can add one to your own private
protocol implementation (analogous to wl_eglstream) and it works with
no changes required to external clients or compositors.
Indeed, streams blur this a bit as well. What I meant to say is that
clients drive the timing of when new buffers are available for
compositing. Perhaps the server could perform a non-destructive
reallocation to avoid this, though, if the cost of such an operation were
not considered prohibitive?
Post by Daniel Stone
Post by James Jones
, and I
tend to agree that the compositor (along with its access to drivers like
KMS) is the component uniquely able to optimize the scene, I think the best
that can be achieved is a system that gravitates toward the optimal solution
in the steady state. Therefore, it seems that KMS should optimize display
engine resources assuming the Wayland compositor and its clients will adjust
to meet KMS' suggestions over time, where "time" would hopefully be only a
small number of additional frames. Streams will perform quite well in such a
design.
It is unfortunate that you seem to discuss 'Streams' as an abstract
concept of a cross-process swapchain which can be infinitely adjusted
to achieve perfection, and yet 'GBM' gets discussed as a singular
fixed-in-time thing which has all the flaws of just one of its
particular platform implementations.
I have a stronger understanding of the design direction for streams than
I do for GBM, and EGLStream is indeed intended to evolve towards the
best abstraction of a swapchain possible. My views of GBM are based on
the current API. I'm not that familiar with the Mesa implementation
details. I'd be happy to learn more about the direction the GBM API is
taking in the future, and that's half of what I was attempting to do in
my responses/questions here.
Post by Daniel Stone
I don't see how GBM could really perform any worse in such a design.
The current GBM API is not expressive enough to support optimal buffer
allocation (at least on our hardware) in such a design.
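
(For concreteness, the allocation entry point under discussion is roughly
the following: a device, a format, and a small set of usage flags, with
nothing describing which engines will produce or consume the buffer, or
how the output might later be reconfigured.)

  #include <gbm.h>

  /* The whole allocation-time vocabulary of 2016-era GBM (sketch). */
  static struct gbm_bo *
  alloc_buffer(struct gbm_device *gbm)
  {
      return gbm_bo_create(gbm, 1920, 1080, GBM_FORMAT_XRGB8888,
                           GBM_BO_USE_SCANOUT | GBM_BO_USE_RENDERING);
  }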

Thanks,
-James
Post by Daniel Stone
Cheers,
Daniel
Kristian Høgsberg
2016-05-03 19:49:21 UTC
Permalink
Post by James Jones
Post by Daniel Stone
Hi James,
Post by James Jones
Given Wayland is designed such that clients drive buffer allocation
I'd just note that this isn't strictly true. I've personally
implemented Wayland support for platforms (media playback on an
extremely idiosyncratic platform) where server-side buffer allocation
was required for optimal performance, and that's what was done. wl_drm
is not exemplary for these platforms as it does not have a protocol
concept of a swapchain, but you can add one to your own private
protocol implementation (analogous to wl_eglstream) and it works with
no changes required to external clients or compositors.
Indeed, streams blur this a bit as well. What I meant to say is that
clients drive the timing of when new buffers are available for compositing.
Perhaps the server could perform a non-destructive reallocation to avoid
this though if the cost of such an operation were not considered
prohibitive?
Post by Daniel Stone
Post by James Jones
, and I
tend to agree that the compositor (along with its access to drivers like
KMS) is the component uniquely able to optimize the scene, I think the best
that can be achieved is a system that gravitates toward the optimal solution
in the steady state. Therefore, it seems that KMS should optimize display
engine resources assuming the Wayland compositor and its clients will adjust
to meet KMS' suggestions over time, where "time" would hopefully be only a
small number of additional frames. Streams will perform quite well in such a
design.
It is unfortunate that you seem to discuss 'Streams' as an abstract
concept of a cross-process swapchain which can be infinitely adjusted
to achieve perfection, and yet 'GBM' gets discussed as a singular
fixed-in-time thing which has all the flaws of just one of its
particular platform implementations.
I have a stronger understanding of the design direction for streams than I
do for GBM, and EGLStream is indeed intended to evolve towards the best
abstraction of a swapchain possible. My views of GBM are based on the
current API. I'm not that familiar with the Mesa implementation details.
I'd be happy to learn more about the direction the GBM API is taking in the
future, and that's half of what I was attempting to do in my
responses/questions here.
Post by Daniel Stone
I don't see how GBM could really perform any worse in such a design.
The current GBM API is not expressive enough to support optimal buffer
allocation (at least on our hardware) in such a design.
I'm curious about the performance concern. What exactly is missing?

Kristian
Daniel Vetter
2016-05-03 17:10:05 UTC
Permalink
Post by James Jones
Post by Daniel Vetter
Post by James Jones
Streams could provide a way to express that the compositor picked the wrong
plane, but they don't solve the optimal configuration problem. Configuration
is a tricky mix of policy and capabilities that something like HWComposer or
a wayland compositor with access to HW-specific knowledge needs to solve. I
agree with other statements here that encapsulating direct HW knowledge
within individual Wayland compositors is probably not a great idea, but some
separate standard or shared library taking input from hardware-specific
modules and wrangling scene graphs is probably needed to get optimal
behavior.
What streams do is allow allocating the most optimal set of buffers and
using the most optimal method to present them possible given a
configuration. So, streams would kick in after the scene graph thing
generated a config.
Daniel's reply cut out this crucial bit somehow, and he replied somewhere
else that he agrees that eglstreams solves at least the "optimal
allocation once scene graph is fixed" problem. I disagree since this
entire thing is highly dynamic - at least on SoC chips how you allocate
your buffers has big impacts on what the display engine can do, and the
- depending upon tiling layout fifo space requirements change drastically,
and going for the "optimal" tiling might push some other plane over the
edge
- there's simpler stuff like some planes can only do some features like
render compression, which is why even for a TEST_ONLY atomic commit you
must supply all the buffers already
- other fun stuff happens around rotation/scaling/planar vs. single-plane
yuv buffers. All these tend to need special hw resources, which means
your choice in how to use it on the kms side has effects on what kind of
buffer you need to allocate. And the other way round.
I don't think there's any way at all, at least for a generic system that
wants to support embedded/mobile SoCs to solve the kms config and buffer
alloc problems as 2 separate steps. You absolutely need these two pieces
to talk to each other, and talk the same language. Either some vendor
horror show (what most of android bsp end up doing behind the back) or
something standardized (what we're trying to pull off around kms+gbm).
Hiding half of the story behind eglstreams doesn't help anyone afaict. If
you do that, you also need to hide the other half. Which means proprietary
hw composer driver (or similar), which can understand/change the metadata
you internally attach to eglstreams/buffers. And once you've decided to go
the fully hidden route hw composer seems to be the best choice really.
SurfaceFlinger isn't really great for multi-screen, but the hwc interface
itself is already fixed and handles that properly. But even with hwc you
don't have eglstreams, because once both ends are proprietary there's
really no need for any standard any more ;-)
-Daniel
1) You believe HW composer is a reasonable solution to optimizing the
scenegraph of a compositor.
2) Your preferred solution would not be HW composer, but rather GBM+KMS
(presumably with the addition of some yet-to-be-developed APIs)
3) You believe the constraints of the system are sufficiently interdependent
that to optimize the system, all allocations and display engine
configuration must be done atomically, in a sense.
Is that correct?
No to (2): I want to drive kms/gbm towards hwc/gralloc so that at least for
90% of use-cases you can run a generic drm_hwcomposer on top of kms and
generic_gralloc on top of gbm. But there will always be use-cases that
need that last bit of efficiency in some very specific use-case, and for
those hwc+gralloc seem perfectly suited. So no unconditional preference
from my side at all, just a desire to standardize things more, and share
more code across vendors and platforms.

Agreed on 1) & 3).
Post by James Jones
-Do you believe (2) is reasonably achievable, or just the style of solution
you would prefer in general?
See above. Aim for 90 percent.
Post by James Jones
-Why is GBM+DRM-KMS better suited to meet the requirements presented by (3)
than EGLStream+DRM-KMS?
It exists and is widely used in shipping open-source systems like CrOS, X,
wayland and whatever else. eglstreams lacks the adoption, both in
compositors and in open source drivers. You could try to fix that by just
writing the eglstreams support for everyone (including mesa drivers and
all the existing compositors people are building), but I don't see that
happening.
Post by James Jones
Given Wayland is designed such that clients drive buffer allocation, and I
tend to agree that the compositor (along with its access to drivers like
KMS) is the component uniquely able to optimize the scene, I think the best
that can be achieved is a system that gravitates toward the optimal solution
in the steady state. Therefore, it seems that KMS should optimize display
engine resources assuming the Wayland compositor and its clients will adjust
to meet KMS' suggestions over time, where "time" would hopefully be only a
small number of additional frames. Streams will perform quite well in such a
design.
There would of course be cases where multiple iterations are required to get
from the current buffers and their display requirements to the optimal
buffers and the optimal display settings, but I don't see a way around that.
Hopefully over time display hardware will be optimized towards these
use-cases, as they are becoming ubiquitous.
Yeah, that's the idea I have in mind too. Except there's no reason why
you'd hide half of that iterative pipeline improvement behind streams, in
my opinion. At least if you want to support a semi-generic compositor,
which seems to be the goal you have with the proposed eglstreams/egloutput
extensions.

If your goal is simply to eke out the last bit of performance your hw
affords, then we already have hwc+gralloc. It works, and for the last 1-2
years Google engineers have been very open about extending it and fixing
corner cases to make it more widely suitable.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Miguel Angel Vico
2016-05-11 13:24:48 UTC
Permalink
Hi all,

I just sent a second round of patches to add support for EGLStream &
friends in Weston.

Also, we've uploaded two weston branches that include all these patches
on top of weston master branch.

You can find them here:

https://cgit.freedesktop.org/~jjones/weston/

'nvidia_head' contains the same set of patches I sent out for review.

'nvidia_r364' contains a slightly different set of patches to make
weston work with our r364 driver, which doesn't have an implementation
of EGL_WL_wayland_eglstream.


Don't hesitate to send me an email if you have doubts or suggestions
about the patches.

I'm also available on Freenode IRC as 'mvicomoya'.


Thanks,
Miguel.


On Mon, 21 Mar 2016 17:28:13 +0100
Post by Miguel Angel Vico
[...]
https://github.com/aritger/eglstreams-kms-example/blob/master/proposed-extensions/EGL_NV_output_drm_flip_event.txt
This extension defines a new acquire attribute for EGLOutputLayer
consumers tied to DRM KMS CRTCs. It allows clients to be notified,
via a DRM flip event, whenever an acquire operation issued with
eglStreamConsumerAcquireAttribNV() completes.
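
To illustrate how this might look in compositor-drm, here is a
minimal, hypothetical C sketch of a manual acquire whose completion
is delivered through the ordinary DRM page-flip event. The attribute
name EGL_DRM_FLIP_EVENT_DATA_NV and the user-data plumbing are
assumptions taken from the proposed extension text, and the helper
is made up for the example:

    /* Assumed already set up: dpy, stream, drm_fd, and an
     * EGLOutputLayer consumer switched to manual acquire mode. */
    struct output *out = get_output();  /* hypothetical helper */

    /* Issue the acquire; the assumed EGL_DRM_FLIP_EVENT_DATA_NV
     * attribute asks the driver to deliver a DRM flip event
     * carrying this pointer once the flip completes. */
    EGLAttrib attribs[] = {
        EGL_DRM_FLIP_EVENT_DATA_NV, (EGLAttrib)(uintptr_t)out,
        EGL_NONE,
    };
    eglStreamConsumerAcquireAttribNV(dpy, stream, attribs);

    /* Later, in the compositor's event loop, the completion shows
     * up as a regular DRM page-flip event. */
    drmEventContext evctx = {
        .version = DRM_EVENT_CONTEXT_VERSION,
        .page_flip_handler = on_page_flip,  /* receives 'out' back */
    };
    drmHandleEvent(drm_fd, &evctx);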
Additionally, in order to allow wl_buffers to be bound to EGLStreams,
we kludged eglQueryWaylandBufferWL(EGL_WAYLAND_BUFFER_WL) to return
the stream file descriptor. A cleaner way to expose this would be to
(see the sketch after this list):
- Update WL_bind_wayland_display such that eglQueryWaylandBufferWL()
accepts a new attribute EGL_WAYLAND_BUFFER_TYPE_WL, returning
EGL_WAYLAND_BUFFER_EGLIMAGE_WL for the non-stream case.
- Add a new WL_wayland_buffer_eglstream extension, which would define
EGL_WAYLAND_BUFFER_EGLSTREAM_WL as a return value for
EGL_WAYLAND_BUFFER_TYPE_WL, and yet another attribute
EGL_WAYLAND_BUFFER_EGLSTREAM_FD_WL to query the stream file
descriptor.
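
For illustration, here is a short, hypothetical C sketch of how a
compositor might use the proposed query; the attribute names are
taken from the proposal above, while the two helper functions are
made up for the example:

    EGLint type = 0;
    eglQueryWaylandBufferWL(dpy, buffer_resource,
                            EGL_WAYLAND_BUFFER_TYPE_WL, &type);

    if (type == EGL_WAYLAND_BUFFER_EGLSTREAM_WL) {
        /* Stream-backed wl_buffer: fetch the stream fd and attach
         * a GLTexture consumer to it, as described earlier. */
        EGLint stream_fd = -1;
        eglQueryWaylandBufferWL(dpy, buffer_resource,
                                EGL_WAYLAND_BUFFER_EGLSTREAM_FD_WL,
                                &stream_fd);
        attach_texture_consumer(stream_fd);        /* hypothetical */
    } else {
        /* EGL_WAYLAND_BUFFER_EGLIMAGE_WL: behave as today and
         * create an EGLImage from the wl_buffer. */
        create_image_from_buffer(buffer_resource); /* hypothetical */
    }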
I'm planning on posting to this mailing list the set of patches that
add the support mentioned above, hoping to get feedback from you.
Thanks in advance,
--
Miguel
Martin Peres
2016-06-13 09:32:52 UTC
Permalink
Post by Miguel Angel Vico
[...]
Hi everyone,

This discussion has been going on for years (not this thread, the
general discussion). The issue is hindering both Wayland and X, as
Mesa developers cannot enable HW features such as compression or new
tiling modes for scanout buffers, and we need to create one standard
that also works for AMD, NVIDIA, and ARM platforms.

How about we have a dedicated forum/track at XDC just for these
issues, and we do not leave the room/conference without a plan? That
is, after all, the raison d'être of XDC: making it easier for
developers to talk with each other!

During this session, I would like to have all the goals and
requirements laid out in clear lists; then we can work on defining
priorities and try to create the smallest number of protocols that
address as many problems as possible at the same time. Of course, all
the interested parties should be coming to XDC 2016 (Helsinki,
2016-09-21 -> 23).

Martin
Martin Peres
2016-06-13 10:14:42 UTC
Permalink
Post by Martin Peres
This discussion has been going on for years (not this thread, the general
discussion).
Pekka made me realize on IRC that I was not specific enough
about what I meant here.

By discussion here, I am talking about sharing buffers across
blocks/drivers/manufacturers while honouring Wayland's mantra that
every frame must be perfect, as efficiently as possible. This is
related to Google's gralloc, GBM, EGLStreams, and probably more.
James Jones
2016-07-19 18:10:50 UTC
Permalink
Post by Martin Peres
[...]
Thanks for proposing this Martin.

This sounds great, and we're happy to participate if we can get together
a quorum.

Besides the people who were most vocal in this thread, it would be
good to have perspective from several of the ARM SoC vendors and from
Google (ChromeOS and Android).

I'll reach out to some of my contacts there to make sure the right
groups will be represented at XDC, and we'll get the right NVIDIA people
signed up.

Thanks,
-James
Daniel Vetter
2016-07-21 07:16:32 UTC
Permalink
Post by James Jones
[...]
Thanks for proposing this Martin.
This sounds great, and we're happy to participate if we can get together a
quorum.
Besides the people who were most vocal in this thread, it would be good to
have perspective from several of the ARM SoCs and Google (ChromeOS and
Android).
I'll reach out to some of my contacts there to make sure the right groups
will be represented at XDC, and we'll get the right NVIDIA people signed up.
Plenty of kernel folks relevant to this topic should be there (me
included), and I think some Google CrOS folks will show up too.
There's even going to be a presentation about Google's
drm_hwcomposer. I think mostly you need to make sure NVIDIA folks
show up ;-)
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch