Discussion:
Introduction and updates from NVIDIA
Miguel Angel Vico
2016-03-21 16:28:13 UTC
Hi all,

First of all, I'd like to introduce myself to the Wayland community: my
name is Miguel A. Vico, and I've been working as a Software Engineer
for NVIDIA for some time now, more specifically in the Linux drivers
team. Although I've never spoken up here before, I've lately been
following the amazing work that you all have been doing.

We have been working on adding to our drivers all the features required
to run Wayland and Weston on top of them. We have just released the
NVIDIA 364.12 GPU driver, which brings initial DRM KMS support (among
other things). Please check out our public announcement here:

https://devtalk.nvidia.com/default/topic/925605/linux/nvidia-364-12-release-vulkan-glvnd-drm-kms-and-eglstreams/


In order to make the Weston DRM compositor work with our drivers, we
have used EGLDevice, EGLOutput, and EGLStream objects.

For those not familiar with this set of EGL structures, here I'll try to
summarize the most important parts of them, and how they would fit into
the current Weston DRM compositor design:

EGLDevice provides means to enumerate native devices, and then
create an EGL display connection from them.

Similarly, EGLOutput will provide means to access different
portions of display control hardware associated with an EGLDevice.

For instance, EGLOutputLayer represents a portion of display
control hardware that accepts an image as input and processes it
for presentation on a display device.

EGLStream implements a mechanism to connect frame producers with
frame consumers. By attaching an EGLOutputLayer consumer to a
stream, a producer will be able to present frames on a display
device.

Thus, a compositor could produce frames and feed them to an
EGLOutputLayer through an EGLStream for presentation on a display
device.

In a similar way, by attaching a GLTexture consumer to a stream, a
producer (wayland client) could feed frames to a texture, which in
turn can be used by a compositor to prepare the final frame to be
presented.

Whenever the EGL_EXT_device_drm extension is present, EGLDevice can
be used to enumerate and access DRM KMS devices, and EGLOutputLayer
to enumerate and access DRM KMS crtcs and planes.

By using EGLStreams and attaching an EGLOutputLayer consumer
(representing a DRM KMS crtc or plane) to it, compositor-drm can
produce final composition frames and present them on a DRM device.
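
To make the scanout path concrete, here is a minimal sketch of that
sequence (error handling omitted; crtc_id, config, w, and h come from
the compositor's existing KMS setup, and the extension entry points
must be fetched with eglGetProcAddress()):

    /* Enumerate devices and create a display from one
     * (EGL_EXT_device_base, EGL_EXT_platform_device): */
    EGLDeviceEXT devices[16];
    EGLint num_devices = 0;
    eglQueryDevicesEXT(16, devices, &num_devices);
    EGLDisplay dpy = eglGetPlatformDisplayEXT(EGL_PLATFORM_DEVICE_EXT,
                                              devices[0], NULL);
    eglInitialize(dpy, NULL, NULL);

    /* Find the EGLOutputLayer for a KMS crtc we already picked
     * (EGL_EXT_output_base + EGL_EXT_output_drm): */
    EGLAttrib layer_attribs[] = { EGL_DRM_CRTC_EXT, (EGLAttrib)crtc_id,
                                  EGL_NONE };
    EGLOutputLayerEXT layer;
    EGLint num_layers = 0;
    eglGetOutputLayersEXT(dpy, layer_attribs, &layer, 1, &num_layers);

    /* Connect a stream between the compositor's rendering and the layer
     * (EGL_KHR_stream, EGL_EXT_stream_consumer_egloutput,
     * EGL_KHR_stream_producer_eglsurface): */
    EGLStreamKHR stream = eglCreateStreamKHR(dpy, NULL);
    eglStreamConsumerOutputEXT(dpy, stream, layer);

    EGLint surf_attribs[] = { EGL_WIDTH, w, EGL_HEIGHT, h, EGL_NONE };
    EGLSurface surface = eglCreateStreamProducerSurfaceKHR(dpy, config,
                                                           stream,
                                                           surf_attribs);
    /* eglSwapBuffers(dpy, surface) now presents frames on the crtc. */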


Most of the EGL extensions required to implement this can already be
found in the Khronos registry, but we also needed extended
functionality for EGLStreams and EGLOutput consumers, provided by the
following extensions:

- EGL_NV_stream_attrib:

https://github.com/aritger/eglstreams-kms-example/blob/master/proposed-extensions/EGL_NV_stream_attrib.txt

Among other things, this extension defines a version of the stream
acquire function that takes an EGLAttrib parameter, allowing the
acquire behavior to be modified/extended in several cases.

- EGL_EXT_stream_acquire_mode:

https://github.com/aritger/eglstreams-kms-example/blob/master/proposed-extensions/EGL_EXT_stream_acquire_mode.txt

By default, EGLOutputLayer consumers are set to automatically acquire
frames, so an eglSwapBuffers() call on the producer side will present
to the display without any further action. This extension defines a
new EGLStream attribute which allows this behavior to be changed so
that acquire operations must be issued manually with
eglStreamConsumerAcquireAttribNV() (see the sketch after this list).

- EGL_NV_output_drm_flip_event:

https://github.com/aritger/eglstreams-kms-example/blob/master/proposed-extensions/EGL_NV_output_drm_flip_event.txt

This extension defines a new acquire attribute for EGLOutputLayer
consumers tied to DRM KMS CRTCs. It allows clients to get notified
whenever an acquire operation issued with
eglStreamConsumerAcquireAttribNV() has completed.
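
To illustrate, a minimal sketch combining the two proposed extensions
(assuming, per the proposed texts, that EGL_CONSUMER_AUTO_ACQUIRE_EXT
can be set at stream creation; output, on_flip, and drm_fd are
hypothetical compositor-side names):

    /* Disable automatic acquire so the compositor controls presentation
     * (EGL_EXT_stream_acquire_mode): */
    EGLint stream_attribs[] = { EGL_CONSUMER_AUTO_ACQUIRE_EXT, EGL_FALSE,
                                EGL_NONE };
    EGLStreamKHR stream = eglCreateStreamKHR(dpy, stream_attribs);

    /* ... attach the EGLOutputLayer consumer, render, swap ... */

    /* Present explicitly and request a DRM flip event on completion
     * (EGL_NV_stream_attrib + EGL_NV_output_drm_flip_event): */
    EGLAttrib acquire_attribs[] = { EGL_DRM_FLIP_EVENT_DATA_NV,
                                    (EGLAttrib)output, EGL_NONE };
    eglStreamConsumerAcquireAttribNV(dpy, stream, acquire_attribs);

    /* Completion arrives through the usual DRM event path: */
    drmEventContext evctx = {
        .version = DRM_EVENT_CONTEXT_VERSION,
        .page_flip_handler = on_flip,  /* receives 'output' as user_data */
    };
    drmHandleEvent(drm_fd, &evctx);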


Additionally, in order to allow wl_buffers to be bound to EGLStreams, we
kludged eglQueryWaylandBufferWL(EGL_WAYLAND_BUFFER_WL) to return the
stream file descriptor.

We think the proper way to handle this should be:

- Update WL_bind_wayland_display such that eglQueryWaylandBufferWL()
accepts a new attribute EGL_WAYLAND_BUFFER_TYPE_WL, returning
EGL_WAYLAND_BUFFER_EGLIMAGE_WL for the non-stream case.

- Add a new WL_wayland_buffer_eglstream extension, which would define
EGL_WAYLAND_BUFFER_EGLSTREAM_WL as a return value for
EGL_WAYLAND_BUFFER_TYPE_WL, and yet another attribute
EGL_WAYLAND_BUFFER_EGLSTREAM_FD_WL to query the stream file
descriptor.
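
Under that proposal, the compositor side might look like this sketch
(attribute names as proposed above; egl_dpy and buffer_resource
assumed):

    EGLint type = 0;
    eglQueryWaylandBufferWL(egl_dpy, buffer_resource,
                            EGL_WAYLAND_BUFFER_TYPE_WL, &type);

    if (type == EGL_WAYLAND_BUFFER_EGLSTREAM_WL) {
        /* Stream-backed buffer: fetch the stream fd and attach it to a
         * GL texture consumer. */
        EGLint stream_fd = -1;
        eglQueryWaylandBufferWL(egl_dpy, buffer_resource,
                                EGL_WAYLAND_BUFFER_EGLSTREAM_FD_WL,
                                &stream_fd);
    } else {
        /* EGL_WAYLAND_BUFFER_EGLIMAGE_WL: create an EGLImage from the
         * buffer as compositors do today. */
    }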


I'm planning to post to this mailing list the set of patches that add
the support mentioned above, hoping to get your feedback.


Thanks in advance,
--
Miguel

Nicole Fontenot
2016-03-22 13:39:57 UTC
Hello Miguel,

I cannot comment on whether these patches are within the scope of
Wayland, but I think now is the perfect time to consider API extensions.

It would be great if someone with an NVIDIA card has time to run
performance tests when you submit your patches. I think a proper
decision on including the patches is more likely to happen that way.
If they do not get accepted into Wayland, I'm sure that users of NVIDIA
cards, particularly Steam users, would still want this as an extension
in their favorite compositor.
Daniel Stone
2016-03-22 13:49:59 UTC
Hi Miguel,
Post by Miguel Angel Vico
First of all, I'd like to introduce myself to the Wayland community: my
name is Miguel A. Vico, and I've been working as a Software Engineer
for NVIDIA for some time now, more specifically in the Linux drivers
team. Although I've never spoken up here before, I've lately been
following the amazing work that you all have been doing.
Welcome!

I'm sorry I don't have some better news for you, but Andy and Aaron
can tell you it's not personal: this has been going on for years.
Post by Miguel Angel Vico
In order to make the Weston DRM compositor work with our drivers, we
have used EGLDevice, EGLOutput, and EGLStream objects.
This is ... unfortunate. To echo what Daniel Vetter said, on the whole
these modesetting-in-EGL extensions are not something which has that
wide support, or even implementation. That being said, it's
interesting to have an implementation, because it has helped shape my
feelings and arguments a little, into something more concrete.
Post by Miguel Angel Vico
For those not familiar with this set of EGL structures, here I'll try to
summarize the most important parts of them, and how they would fit into
EGLDevice provides means to enumerate native devices, and then
create an EGL display connection from them.
This is generically useful: we would like to extend
eglGetPlatformDisplay to take an attrib naming an EGLDevice, which we
could then use with platform_gbm (to select GPU and scanout device
separately, either for multi-GPU systems or also for SoCs with
discrete GPU/dispc setups) as well as platform_wayland and co.
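
To make that concrete, the rough shape could be something like this
sketch (entirely hypothetical: no such input attribute is specified
today, and EGL_DEVICE_EXT currently exists only as a query token):

    /* Hypothetical: render on one device, scan out through gbm on
     * another. */
    EGLAttrib attribs[] = { EGL_DEVICE_EXT, (EGLAttrib)render_device,
                            EGL_NONE };
    EGLDisplay dpy = eglGetPlatformDisplay(EGL_PLATFORM_GBM_KHR,
                                           gbm_dev, attribs);
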
Post by Miguel Angel Vico
Similarly, EGLOutput will provide means to access different
portions of display control hardware associated with an EGLDevice.
For instance, EGLOutputLayer represents a portion of display
control hardware that accepts an image as input and processes it
for presentation on a display device.
I still struggle to see the value of what is essentially an
abstraction over KMS, but oh well.
Post by Miguel Angel Vico
EGLStream implements a mechanism to connect frame producers with
frame consumers. By attaching an EGLOutputLayer consumer to a
stream, a producer will be able to present frames on a display
device.
This is understating things quite a bit, I think. On the
Wayland-client side, it's a pretty big change from the EGLSurface
model, particularly if you use the default mailbox mode (see comments
on patch 4/7 as to how this breaks real-world setups, AFAICT). On the
Wayland-compositor side, it's two _huge_ changes.

Firstly, again looking at the case where a Wayland client is a stream
producer and the Wayland compositor is a consumer, we move from a
model where references to individual buffers are explicitly passed
through the Wayland protocol, to where those buffers merely carry a
reference to a stream. Again, as stated in the review of 4/7, that
looks like it has the potential to break some actual real-world cases,
and I have no idea how to solve it, other than banning mailbox mode,
which would seem to mostly defeat the point of Streams (more on that
below).

Secondly, looking at the compositor-drm case, the use of the dumb
buffer to display undefined content as a dummy modeset really makes me
uneasy, again because both gl-renderer and compositor-drm are written
for explicit individual buffer management, rather than streams in +
streams out. I think the combination of the two pushes them long
beyond the point of readability, and I'd encourage you to look at
trying to split those files up, or at least the functions within them.
Attempting to keep both modes in there just looks like a maintenance
nightmare, especially when this streams implementation
(unsurprisingly) has to bypass almost the entire runtime (as opposed
to init-time) functionality of compositor-drm.

Also, I'm not quite sure how you're testing the compositor-as-consumer
mode: I can't seem to see any EGL extensions which allow you to
connect a Wayland surface as an EGLStream consumer. Do you have
something else unpublished that's being used here, or is this what the
libnvidia-egl-wayland library is for? Or do you just have clients
using EGLSurfaces as normal, which happen to be implemented internally
as EGLStreams? (Also, that the only way to test this is through
proprietary drivers implementing only-just-published extensions not
only makes me very sad, but hugely increases the potential for this to
be inadvertently broken.)
Post by Miguel Angel Vico
Thus, a compositor could produce frames and feed them to an
EGLOutputLayer through an EGLStream for presentation on a display
device.
In a similar way, by attaching a GLTexture consumer to a stream, a
producer (wayland client) could feed frames to a texture, which in
turn can be used by a compositor to prepare the final frame to be
presented.
Quick aside: this reminds me in many unfortunate ways of
GLX_EXT_texture_from_pixmap. tfp gave us the same 'capture stream of
stuff and make it appear in a texture' model as streams, whereas most
of the rest of the world (EGL, Vulkan WSI, Wayland, Android, ChromeOS,
etc) have all moved explicitly _away_ from that model to passing
references to individual buffers, this in many ways brings us back to
tfp.
Post by Miguel Angel Vico
Whenever the EGL_EXT_device_drm extension is present, EGLDevice can
be used to enumerate and access DRM KMS devices, and EGLOutputLayer
to enumerate and access DRM KMS crtcs and planes.
Again, the enumeration isn't so much used as bypassed. The original
enumeration is used, and all we do with the EGL objects is a) list all
of them, b) filter them to find the one we already have, and c)
perhaps replace their internal representation of the device with the
one we already have.
Post by Miguel Angel Vico
By using EGLStreams and attaching an EGLOutputLayer consumer
(representing a DRM KMS crtc or plane) to it, compositor-drm can
produce final composition frames and present them on a DRM device.
Arguably it's gl-renderer producing the frames, with compositor-drm
kind of acting as a fake consumer (EGL_NV_stream_attrib).
Post by Miguel Angel Vico
Additionally, in order to allow wl_buffers to be bound to EGLStreams, we
kludged eglQueryWaylandBufferWL(EGL_WAYLAND_BUFFER_WL) to return the
stream file descriptor.
As said earlier, I don't think this is the right way to go, and have
other suggestions.

I'd like to look at the elephant in the room, which is why you're
using this in the first place (aside from general NVIDIA enthusiasm
for encapsulating everything within EGL Streams/Output/Device/etc,
dating back many years). Andy/Aaron, you've said that you found GBM to
be inadequate, and I'd like to find out explicitly how. Through a few
snippets of IRC and NVIDIA devtalk, so far I can see:

'We can't choose an optimal rendering configuration, because we don't
know how it's going to be used' - (almost completely) untrue. The FD
you pass to gbm_device_create is that of the KMS device, a gbm_surface
contains information as to how the plane (primary or overlay) will be
configured, and an EGLDisplay lets you tie the rendering and scanout
devices together. What more information do you need? It's true that we
don't have a way to select individual rendering devices at the moment,
but as said earlier, passing an EGLDevice as an attrib to
GetPlatformDisplay would resolve that, as you would have the render
device identified by the EGLDevice and the scanout device identified
by the gbm_device. At that point, you have the full pipeline and can
determine the optimal configuration.
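
For reference, that information flow as a minimal gbm sketch (error
handling omitted; w and h assumed):

    int kms_fd = open("/dev/dri/card0", O_RDWR | O_CLOEXEC);
    struct gbm_device *gbm = gbm_create_device(kms_fd); /* KMS device fd */

    /* The surface encodes the scanout intent and format up front: */
    struct gbm_surface *gs =
        gbm_surface_create(gbm, w, h, GBM_FORMAT_XRGB8888,
                           GBM_BO_USE_SCANOUT | GBM_BO_USE_RENDERING);

    /* And the EGLDisplay ties rendering to that scanout device: */
    EGLDisplay dpy = eglGetPlatformDisplayEXT(EGL_PLATFORM_GBM_MESA,
                                              gbm, NULL);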

'We don't know when to schedule decompression, because there's no
explicit barrier' - completely untrue. eglSwapBuffers is that barrier.
For example, in Freescale i.MX6, the Vivante GPU and Freescale IPU
(display controller) do not share a single common format between GPU
render targets and IPU scanout sources, so require a mandatory
detiling pass in between render and display. These work just fine with
gbm with that pass scheduled by eglSwapBuffers. This to me seems
completely explicit, unless there was something else you were meaning
... ?

'Width, height, pitch and format aren't enough information' - this is
true, but not necessarily relevant. I'm not sure what the source of
this actually is: is it the gbm_bo_get_*() APIs? If so, yes, they need
to be extended with a gbm_bo_get_modifier() call, which would allow
you to get the DRM format modifier to describe tiling/compression/et
al (as well as perhaps being extended to allow you to extract multiple
buffers/planes, e.g. to attach auxiliary compression buffers). If it's
not gbm, what actually is it? The only other place I can think of
(suggested by Pekka, I think) was the wl_drm protocol, which it should
be stressed is a) not required in any way by Wayland, b) not a
published/public protocol, c) not a stable protocol. wl_drm just
happens to be the way that Mesa shares buffers, just as wl_viv is how
Vivante's proprietary driver shares buffers, and mali_buffer_sharing
is how the Mali driver does it. Since the server side is bound by
eglBindWaylandDisplayWL and the client side is also only used through
EGL, there is _no_ requirement for you to also implement wl_drm. As it
is a hidden private Mesa protocol, there is also no requirement for
the protocol to remain stable.
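
And a sketch of how the proposed call could slot in next to the
existing libdrm entry point, continuing the gbm sketch above
(gbm_bo_get_modifier is the proposed API, not something gbm offers
today):

    struct gbm_bo *bo = gbm_surface_lock_front_buffer(gs);

    uint32_t handles[4] = { gbm_bo_get_handle(bo).u32 };
    uint32_t pitches[4] = { gbm_bo_get_stride(bo) };
    uint32_t offsets[4] = { 0 };
    uint64_t modifiers[4] = { gbm_bo_get_modifier(bo) }; /* proposed API */

    uint32_t fb_id;
    drmModeAddFB2WithModifiers(kms_fd, w, h, GBM_FORMAT_XRGB8888,
                               handles, pitches, offsets, modifiers,
                               &fb_id, DRM_MODE_FB_MODIFIERS);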

'EGLStreams is the direction taken in Vulkan' - I would argue not. IMO
the explicit buffer management on the client side does not parallel
EGLStreams, and notably there is no equivalent consumer interface
offered on the server side, but instead the individual-buffer-driven
approach is taken. It's true that VK_WSI_display_swapchain does exist
and does match the EGLStreams model fairly closely, but also that it
does not have universal implementation: the Intel 'anv' Mesa-based
driver does not implement display_swapchain, instead having an
interface to export a VkImage as a dmabuf. It's true that the latter
is not optimal (it lacks the explicit targeting required to determine
the most optimal tiling/compression strategy), but OTOH it is
precedent for explicitly avoiding the
VK_WSI_display_swapchain/EGLStreams model for Vulkan on KMS, just as
GBM avoids it for EGL on KMS.

I think it's been good to have this series to push the discussion
further in more concrete terms, but unfortunately I have to say that
I'm even less convinced now than I have ever been. Sorry.

Cheers,
Daniel
Daniel Vetter
2016-03-22 21:43:18 UTC
Post by Daniel Stone
I'd like to look at the elephant in the room, which is why you're
using this in the first place (aside from general NVIDIA enthusiasm
for encapsulating everything within EGL Streams/Output/Device/etc,
dating back many years). Andy/Aaron, you've said that you found GBM to
be inadequate, and I'd like to find out explicitly how. Through a few
'We can't choose an optimal rendering configuration, because we don't
know how it's going to be used' - (almost completely) untrue. The FD
you pass to gbm_device_create is that of the KMS device, a gbm_surface
contains information as to how the plane (primary or overlay) will be
configured, and an EGLDisplay lets you tie the rendering and scanout
devices together. What more information do you need? It's true that we
don't have a way to select individual rendering devices at the moment,
but as said earlier, passing an EGLDevice as an attrib to
GetPlatformDisplay would resolve that, as you would have the render
device identified by the EGLDevice and the scanout device identified
by the gbm_device. At that point, you have the full pipeline and can
determine the optimal configuration.
'We don't know when to schedule decompression, because there's no
explicit barrier' - completely untrue. eglSwapBuffers is that barrier.
For example, in Freescale i.MX6, the Vivante GPU and Freescale IPU
(display controller) do not share a single common format between GPU
render targets and IPU scanout sources, so require a mandatory
detiling pass in between render and display. These work just fine with
gbm with that pass scheduled by eglSwapBuffers. This to me seems
completely explicit, unless there was something else you were meaning
... ?
There's display engines which can directly scan out buffers compressed by
the render engine. It's awesome, except randomly limited, so you need a
communication backchannel from your display driver all the way to your
buffer allocator thing on the client side. And depending upon luck you
really can't tell who should do the decompress pass for most optimal
result upfront.

I think on android the most common way to do that is to attach arbitrary
metadata with a hand-rolled ioctl to dma-buf fds, which is ofc horrible.
Imo the right way is to create a real platform and start to standardize
some of this stuff (fb modifier) more, so that we can pass it from kms to
gbm, then to compositor clients through either a generic transport or
private extensions. Or maybe we can mostly hide all that.
Post by Daniel Stone
'Width, height, pitch and format aren't enough information' - this is
true, but not necessarily relevant. I'm not sure what the source of
this actually is: is it the gbm_bo_get_*() APIs? If so, yes, they need
to be extended with a gbm_bo_get_modifier() call, which would allow
you to get the DRM format modifier to describe tiling/compression/et
al (as well as perhaps being extended to allow you to extract multiple
buffers/planes, e.g. to attach auxiliary compression buffers). If it's
not gbm, what actually is it? The only other place I can think of
(suggested by Pekka, I think) was the wl_drm protocol, which it should
be stressed is a) not required in any way by Wayland, b) not a
published/public protocol, c) not a stable protocol. wl_drm just
happens to be the way that Mesa shares buffers, just as wl_viv is how
Vivante's proprietary driver shares buffers, and mali_buffer_sharing
is how the Mali driver does it. Since the server side is bound by
eglBindWaylandDisplayWL and the client side is also only used through
EGL, there is _no_ requirement for you to also implement wl_drm. As it
is a hidden private Mesa protocol, there is also no requirement for
the protocol to remain stable.
So I've looked at what our own android folks all transport, and I think most of it
we can transport with the current addfb2.1 kms metadata. And we could even
add hints that kms atomic returns if a plane doesn't work with the most
preferred format that would just work in this config. Thus far I've
stumbled over 2 cases:
- compression formats that can't be easily described in addfb2.1 because
they allocate a side buffer in some fancy special memory. The solution
for that, discussed at xdc2014, was to use a dma-buf to wrap that
up, and then use it as an aux buffer (there's patches floating for that) with
normal addfb2.1.
- content protection. Can't talk about this, but worst case it can all be
captured in special-purpose buffers too I think.
Post by Daniel Stone
'EGLStreams is the direction taken in Vulkan' - I would argue not. IMO
the explicit buffer management on the client side does not parallel
EGLStreams, and notably there is no equivalent consumer interface
offered on the server side, but instead the individual-buffer-driven
approach is taken. It's true that VK_WSI_display_swapchain does exist
and does match the EGLStreams model fairly closely, but also that it
does not have universal implementation: the Intel 'anv' Mesa-based
driver does not implement display_swapchain, instead having an
interface to export a VkImage as a dmabuf. It's true that the latter
is not optimal (it lacks the explicit targeting required to determine
the most optimal tiling/compression strategy), but OTOH it is
precedent for explicitly avoiding the
VK_WSI_display_swapchain/EGLStreams model for Vulkan on KMS, just as
GBM avoids it for EGL on KMS.
I'm not sure a swapchain/stream is good enough, since the trouble really
starts when you have tons of hw planes and changing configurations.
Looking at individual streams instead of the global state is pointless in
that case.

Same for atomic, syncing multiple streams looks pretty tricky. And iirc
when I pinged Jakob Bornecrantz (who seems to know/like streams somewhat)
there's no way to achieve that.
Post by Daniel Stone
I think it's been good to have this series to push the discussion
further in more concrete terms, but unfortunately I have to say that
I'm even less convinced now than I have ever been. Sorry.
Well the thing that irks me is that this isn't aiming to build a common
platform. There's definitely issues with gbm/gralloc+kms+egl in upstream
repos, and vendors have hacked around those in all kinds of horrible ways.
But trying to fix this mess with yet another vendor-private solution just
doesn't help. Instead we need to fix what is there, for everyone, instead
of fragmenting more.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Daniel Stone
2016-03-22 21:52:21 UTC
Hi,
Post by Daniel Vetter
Post by Daniel Stone
'We don't know when to schedule decompression, because there's no
explicit barrier' - completely untrue. eglSwapBuffers is that barrier.
For example, in Freescale i.MX6, the Vivante GPU and Freescale IPU
(display controller) do not share a single common format between GPU
render targets and IPU scanout sources, so require a mandatory
detiling pass in between render and display. These work just fine with
gbm with that pass scheduled by eglSwapBuffers. This to me seems
completely explicit, unless there was something else you were meaning
... ?
There's display engines which can directly scan out buffers compressed by
the render engine. It's awesome, except randomly limited, so you need a
communication backchannel from your display driver all the way to your
buffer allocator thing on the client side. And depending upon luck you
really can't tell who should do the decompress pass for most optimal
result upfront.
I think on android the most common way to do that is to attach arbitrary
metadata with a hand-rolled ioctl to dma-buf fds, which is ofc horrible.
Imo the right way is to create a real platform and start to standardize
some of this stuff (fb modifier) more, so that we can pass it from kms to
gbm, then to compositor clients through either a generic transport or
private extensions. Or maybe we can mostly hide all that.
Right, at least with some (AFBC), just the buffer data + FB modifier
completely describes what you need to scan out transparently. Though
this is not the case for Intel and Tegra.
Post by Daniel Vetter
Post by Daniel Stone
'Width, height, pitch and format aren't enough information' - this is
true, but not necessarily relevant. I'm not sure what the source of
this actually is: is it the gbm_bo_get_*() APIs? If so, yes, they need
to be extended with a gbm_bo_get_modifier() call, which would allow
you to get the DRM format modifier to describe tiling/compression/et
al (as well as perhaps being extended to allow you to extract multiple
buffers/planes, e.g. to attach auxiliary compression buffers). If it's
not gbm, what actually is it? The only other place I can think of
(suggested by Pekka, I think) was the wl_drm protocol, which it should
be stressed is a) not required in any way by Wayland, b) not a
published/public protocol, c) not a stable protocol. wl_drm just
happens to be the way that Mesa shares buffers, just as wl_viv is how
Vivante's proprietary driver shares buffers, and mali_buffer_sharing
is how the Mali driver does it. Since the server side is bound by
eglBindWaylandDisplayWL and the client side is also only used through
EGL, there is _no_ requirement for you to also implement wl_drm. As it
is a hidden private Mesa protocol, there is also no requirement for
the protocol to remain stable.
So I've looked at what our own android folks all transport, and I think most of it
we can transport with the current addfb2.1 kms metadata. And we could even
add hints that kms atomic returns if a plane doesn't work with the most
preferred format that would just work in this config. Thus far I've
- compression formats that can't be easily described in addfb2.1 because
they allocate a side buffer in some fancy special memory. The solution
for that, discussed at xdc2014, was to use a dma-buf to wrap that
up, and then use it as an aux buffer (there's patches floating for that) with
normal addfb2.1.
Indeed, although sadly the current Intel patches go in the other
direction and use a driver-private plane property to describe the
current compression status. :( Hopefully the Tegra/Nouveau people are
able to prepare something which is usable from generic userspace.
Post by Daniel Vetter
Post by Daniel Stone
'EGLStreams is the direction taken in Vulkan' - I would argue not. IMO
the explicit buffer management on the client side does not parallel
EGLStreams, and notably there is no equivalent consumer interface
offered on the server side, but instead the individual-buffer-driven
approach is taken. It's true that VK_WSI_display_swapchain does exist
and does match the EGLStreams model fairly closely, but also that it
does not have universal implementation: the Intel 'anv' Mesa-based
driver does not implement display_swapchain, instead having an
interface to export a VkImage as a dmabuf. It's true that the latter
is not optimal (it lacks the explicit targeting required to determine
the most optimal tiling/compression strategy), but OTOH it is
precedent for explicitly avoiding the
VK_WSI_display_swapchain/EGLStreams model for Vulkan on KMS, just as
GBM avoids it for EGL on KMS.
I'm not sure a swapchain/stream is good enough, since the trouble really
starts when you have tons of hw planes and changing configurations.
Looking at individual streams instead of the global state is pointless in
that case.
True and irrelevant, at once. ;) You have to examine the global state
(as a compositor, just like HWComposer does) to determine the optimal
configuration, but to actually get that configuration to land, you
have to push down to individual clients, which means dealing with a
swapchain primitive. If you want to do seamless transitions and
reallocations, you need to get the client to gradually reallocate its
swapchain at a time convenient for it.

As to streams lacking atomicity et al, I do agree, and think the only
model which will actually work out is HWComposer.
Post by Daniel Vetter
Post by Daniel Stone
I think it's been good to have this series to push the discussion
further in more concrete terms, but unfortunately I have to say that
I'm even less convinced now than I have ever been. Sorry.
Well the thing that irks me is that this isn't aiming to build a common
platform. There's definitely issues with gbm/gralloc+kms+egl in upstream
repos, and vendors have hacked around those in all kinds of horrible ways.
But trying to fix this mess with yet another vendor-private solution just
doesn't help. Instead we need to fix what is there, for everyone, instead
of fragmenting more.
Agreed. One of the things I've been incredibly happy with is how our
platform has managed to stay completely generic and vendor-neutral so
far, and I'd love to preserve that.

Cheers,
Daniel
Andy Ritger
2016-03-23 00:33:57 UTC
Post by Daniel Stone
Hi,
[...]
Post by Daniel Stone
Post by Daniel Vetter
Post by Daniel Stone
I think it's been good to have this series to push the discussion
further in more concrete terms, but unfortunately I have to say that
I'm even less convinced now than I have ever been. Sorry.
Well the thing that irks me is that this isn't aiming to build a common
platform. There's definitely issues with gbm/gralloc+kms+egl in upstream
repos, and vendors have hacked around those in all kinds of horrible ways.
But trying to fix this mess with yet another vendor-private solution just
doesn't help. Instead we need to fix what is there, for everyone, instead
of fragmenting more.
Agreed. One of the things I've been incredibly happy with is how our
platform has managed to stay completely generic and vendor-neutral so
far, and I'd love to preserve that.
I don't think you'll find any disagreement to that from NVIDIA, either.

I apologize if the EGLStreams proposal gave the impression of a
vendor-private solution. That wasn't the intent. The EGLStream family
of extensions are, after all, an open specification that any EGL vendor
can implement. If there are aspects of any of these EGL extensions that
seem useful, I'd hope that Mesa would be willing to adopt them.

We (NVIDIA) clearly think EGLStreams is a good direction for expressing
buffer sharing semantics. In our ideal world, everyone would implement
these extensions and Wayland compositors would migrate to using them as
the generic vendor-neutral mechanism for buffer sharing :)

But, I'm also happy to discuss ways to incrementally improve gbm. I tried
Post by Daniel Stone
There's definitely issues with gbm/gralloc+kms+egl in upstream
repos, and vendors have hacked around those in all kinds of horrible ways.
Some examples were given earlier in this thread. What are some of the
other horrible hacks drivers have had to use today with gbm?

Thanks,
- Andy
Daniel Vetter
2016-03-23 10:48:01 UTC
Post by Andy Ritger
Post by Daniel Stone
Hi,
[...]
Post by Daniel Stone
Post by Daniel Vetter
Post by Daniel Stone
I think it's been good to have this series to push the discussion
further in more concrete terms, but unfortunately I have to say that
I'm even less convinced now than I have ever been. Sorry.
Well the thing that irks me is that this isn't aiming to build a common
platform. There's definitely issues with gbm/gralloc+kms+egl in upstream
repos, and vendors have hacked around those in all kinds of horrible ways.
But trying to fix this mess with yet another vendor-private solution just
doesn't help. Instead we need to fix what is there, for everyone, instead
of fragmenting more.
Agreed. One of the things I've been incredibly happy with is how our
platform has managed to stay completely generic and vendor-neutral so
far, and I'd love to preserve that.
I don't think you'll find any disagreement to that from NVIDIA, either.
I apologize if the EGLStreams proposal gave the impression of a
vendor-private solution. That wasn't the intent. The EGLStream family
of extensions are, after all, an open specification that any EGL vendor
can implement. If there are aspects of any of these EGL extensions that
seem useful, I'd hope that Mesa would be willing to adopt them.
We (NVIDIA) clearly think EGLStreams is a good direction for expressing
buffer sharing semantics. In our ideal world, everyone would implement
these extensions and Wayland compositors would migrate to using them as
the generic vendor-neutral mechanism for buffer sharing :)
But, I'm also happy to discuss ways to incrementally improve gbm. I tried
So I guess the top level issue with eglstreams+kms that at least I see is
that if we really want to do this, we would need to terminate the
eglstream in the kernel. Since with kms really only the kernel knows why
exactly a buffer isn't the right one, and what the producer should change
to get to a more optimal setup.

But the problem is that KMS is ABI and vendor-neutral, which means all
that fancy metadata that you want to attach would need to be standardized
in some way. And we'd need to have in-kernel eglstreams. So you'd face
both the problem of getting a new primitive into upstream (dma-buf took
massive efforts, same for fences going on now). And you'd lose the benefit
of eglstreams being able to encapsulate vendor metadata.

And we need to figure out how to standardize this a bit better even
without eglstreams, so that's why I don't really understand why eglstreams
has benefits. It's clearly a nice concept if you're in a world of
one-vendor-only, but that's not what KMS is aiming for really.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Andy Ritger
2016-03-24 17:06:04 UTC
Post by Daniel Vetter
Post by Andy Ritger
Post by Daniel Stone
Hi,
[...]
Post by Daniel Stone
Post by Daniel Vetter
Post by Daniel Stone
I think it's been good to have this series to push the discussion
further in more concrete terms, but unfortunately I have to say that
I'm even less convinced now than I have ever been. Sorry.
Well the thing that irks me is that this isn't aiming to build a common
platform. There's definitely issues with gbm/gralloc+kms+egl in upstream
repos, and vendors have hacked around those in all kinds of horrible ways.
But trying to fix this mess with yet another vendor-private solution just
doesn't help. Instead we need to fix what is there, for everyone, instead
of fragmenting more.
Agreed. One of the things I've been incredibly happy with is how our
platform has managed to stay completely generic and vendor-neutral so
far, and I'd love to preserve that.
I don't think you'll find any disagreement to that from NVIDIA, either.
I apologize if the EGLStreams proposal gave the impression of a
vendor-private solution. That wasn't the intent. The EGLStream family
of extensions are, after all, an open specification that any EGL vendor
can implement. If there are aspects of any of these EGL extensions that
seem useful, I'd hope that Mesa would be willing to adopt them.
We (NVIDIA) clearly think EGLStreams is a good direction for expressing
buffer sharing semantics. In our ideal world, everyone would implement
these extensions and Wayland compositors would migrate to using them as
the generic vendor-neutral mechanism for buffer sharing :)
But, I'm also happy to discuss ways to incrementally improve gbm. I tried
So I guess the top level issue with eglstreams+kms that at least I see is
that if we really want to do this, we would need to terminate the
eglstream in the kernel. Since with kms really only the kernel knows why
exactly a buffer isn't the right one, and what the producer should change
to get to a more optimal setup.
But the problem is that KMS is ABI and vendor-neutral, which means all
that fancy metadata that you want to attach would need to be standardized
in some way. And we'd need to have in-kernel eglstreams. So you'd face
both the problem of getting a new primitive into upstream (dma-buf took
massive efforts, same for fences going on now). And you'd lose the benefit
of eglstreams being able to encapsulate vendor metadata.
And we need to figure out how to standardize this a bit better even
without eglstreams, so that's why I don't really understand why eglstreams
has benefits. It's clearly a nice concept if you're in a world of
one-vendor-only, but that's not what KMS is aiming for really.
eglstreams or gbm or any other implementation aside, is it always _only_
the KMS driver that knows what the optimal configuration would be?
It seems like part of the decision could require knowledge of the graphics
hardware, which presumably the OpenGL/EGL driver is best positioned
to have.

For that aspect: would it be reasonable to execute hardware-specific
driver code in the drmModeAtomicCommit() call chain between the
application calling libdrm to make the atomic update, and the ioctl
into the kernel? Maybe that would be a call to libgbm that dispatches to
the hardware-specific gbm backend. However it is structured, having
hardware-specific graphics driver code execute as part of the flip
request might be one way to let the graphics driver piece and the display
driver piece coordinate on hardware specifics, without polluting the
application-facing API with hardware-specifics?
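
As a strawman of the shape I mean, where every name except
drmModeAtomicCommit() is hypothetical:

    /* Hypothetical: let the hardware-specific backend resolve detiling/
     * decompression before the real atomic ioctl is issued. */
    int vendor_atomic_commit(int fd, drmModeAtomicReq *req, uint32_t flags,
                             void *user_data)
    {
        gbm_backend_prepare_commit(gbm, req); /* hypothetical dispatch */
        return drmModeAtomicCommit(fd, req, flags, user_data);
    }
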
Post by Daniel Vetter
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Jasper St. Pierre
2016-03-24 18:43:51 UTC
On Thu, Mar 24, 2016 at 10:06 AM, Andy Ritger <***@nvidia.com> wrote:

... snip ...
Post by Andy Ritger
eglstreams or gbm or any other implementation aside, is it always _only_
the KMS driver that knows what the optimal configuration would be?
It seems like part of the decision could require knowledge of the graphics
hardware, which presumably the OpenGL/EGL driver is best positioned
to have.
Why would the OpenGL driver be the best thing to know about display
controller configuration? On a lot of ARM SoCs, the two are separate
modules, often provided by separate companies. For instance, the Mali
GPUs don't have display controllers, and the Mali driver is often
provided as a blob to vendors, who must use it with their custom-built
display controller.

Buffer allocation is currently done through DRI2 with the Mali blob,
so it's expected that the best allocation is done server-side in your
xf86-video-* driver.

I agree that we need somewhere better to hook up smart buffer
allocation, but OpenGL/EGL isn't necessarily the best place. We
decided a little while ago that a separate shared library and
interface designed to do buffer allocation that can be configured on a
per-hardware basis would be a better idea, and that's how gbm started
-- as a generic buffer manager.
Post by Andy Ritger
For that aspect: would it be reasonable to execute hardware-specific
driver code in the drmModeAtomicCommit() call chain between the
application calling libdrm to make the atomic update, and the ioctl
into the kernel? Maybe that would be a call to libgbm that dispatches to
the hardware-specific gbm backend. However it is structured, having
hardware-specific graphics driver code execute as part of the flip
request might be one way to let the graphics driver piece and the display
driver piece coordinate on hardware specifics, without polluting the
application-facing API with hardware-specifics?
Wait a minute. Once you're in commit, isn't that far too late for
hardware specifics? Aren't we talking about buffer allocation and
such, which would need to happen far, far before the commit? Or did I
miss something here?
Post by Andy Ritger
Post by Daniel Vetter
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
--
Jasper
Andy Ritger
2016-04-02 00:18:57 UTC
Post by Jasper St. Pierre
... snip ...
Post by Andy Ritger
eglstreams or gbm or any other implementation aside, is it always _only_
the KMS driver that knows what the optimal configuration would be?
It seems like part of the decision could require knowledge of the graphics
hardware, which presumably the OpenGL/EGL driver is best positioned
to have.
Why would the OpenGL driver be the best thing to know about display
controller configuration?
Sorry I was unclear: I didn't mean exclusively the display controller
configuration in the above. Rather, the combination of the display
controller configuration and the graphics rendering capabilities.
Post by Jasper St. Pierre
On a lot of ARM SoCs, the two are separate
modules, often provided by separate companies. For instance, the Mali
GPUs don't have display controllers, and the Mali driver is often
provided as a blob to vendors, who must use it with their custom-built
display controller.
Buffer allocation is currently done through DRI2 with the Mali blob,
so it's expected that the best allocation is done server-side in your
xf86-video-* driver.
I agree that we need somewhere better to hook up smart buffer
allocation, but OpenGL/EGL isn't necessarily the best place. We
decided a little while ago that a separate shared library and
interface designed to do buffer allocation that can be configured on a
per-hardware basis would be a better idea, and that's how gbm started
-- as a generic buffer manager.
OK.
Post by Jasper St. Pierre
Post by Andy Ritger
For that aspect: would it be reasonable to execute hardware-specific
driver code in the drmModeAtomicCommit() call chain between the
application calling libdrm to make the atomic update, and the ioctl
into the kernel? Maybe that would be a call to libgbm that dispatches to
the hardware-specific gbm backend. However it is structured, having
hardware-specific graphics driver code execute as part of the flip
request might be one way to let the graphics driver piece and the display
driver piece coordinate on hardware specifics, without polluting the
application-facing API with hardware-specifics?
Wait a minute. Once you're in commit, isn't that far too late for
hardware specifics? Aren't we talking about buffer allocation and
such, which would need to happen far, far before the commit? Or did I
miss something here?
I think I led the discussion off course with my previous response to
Daniel Vetter.

Definitely buffer allocation for the current frame can't be altered
at commit time. But, it seems to me like there is a class of graphics
hardware specifics that _are_ applicable to commit time: detiling, color
decompression, or any other sorts of graphics/display coherency that needs
to be resolved by the graphics driver. If the graphics driver were in the
commit call chain, then it would have the option to perform those sorts
of resolutions at commit time. This would, in turn, allow the graphics
driver to _not_ perform these sorts of resolutions (unnecessarily, and
potentially expensively) if the client-produced buffer were going to be
used by something other than display (e.g., texture).

Thanks,
- Andy
Post by Jasper St. Pierre
Post by Andy Ritger
Post by Daniel Vetter
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
--
Jasper
Daniel Vetter
2016-03-28 18:12:59 UTC
Post by Andy Ritger
Post by Daniel Vetter
Post by Andy Ritger
Post by Daniel Stone
Hi,
[...]
Post by Daniel Stone
Post by Daniel Vetter
Post by Daniel Stone
I think it's been good to have this series to push the discussion
further in more concrete terms, but unfortunately I have to say that
I'm even less convinced now than I have ever been. Sorry.
Well the thing that irks me is that this isn't aiming to build a common
platform. There's definitely issues with gbm/gralloc+kms+egl in upstream
repos, and vendors have hacked around those in all kinds of horrible ways.
But trying to fix this mess with yet another vendor-private solution just
doesn't help. Instead we need to fix what is there, for everyone, instead
of fragmenting more.
Agreed. One of the things I've been incredibly happy with is how our
platform has managed to stay completely generic and vendor-neutral so
far, and I'd love to preserve that.
I don't think you'll find any disagreement to that from NVIDIA, either.
I apologize if the EGLStreams proposal gave the impression of a
vendor-private solution. That wasn't the intent. The EGLStream family
of extensions are, after all, an open specification that any EGL vendor
can implement. If there are aspects of any of these EGL extensions that
seem useful, I'd hope that Mesa would be willing to adopt them.
We (NVIDIA) clearly think EGLStreams is a good direction for expressing
buffer sharing semantics. In our ideal world, everyone would implement
these extensions and Wayland compositors would migrate to using them as
the generic vendor-neutral mechanism for buffer sharing :)
But, I'm also happy to discuss ways to incrementally improve gbm. I tried
So I guess the top level issue with eglstreams+kms that at least I see is
that if we really want to do this, we would need to terminate the
eglstream in the kernel. Since with kms really only the kernel knows why
exactly a buffer isn't the right one, and what the producer should change
to get to a more optimal setup.
But the problem is that KMS is ABI and vendor-neutral, which means all
that fancy metadata that you want to attach would need to be standardized
in some way. And we'd need to have in-kernel eglstreams. So you'd face
both the problem of getting a new primitive into upstream (dma-buf took
massive efforts, same for fences going on now). And you'd lose the benefit
of eglstreams being able to encapsulate vendor metadata.
And we need to figure out how to standardize this a bit better even
without eglstreams, so that's why I don't really understand why eglstreams
has benefits. It's clearly a nice concept if you're in a world of
one-vendor-only, but that's not what KMS is aiming for really.
eglstreams or gbm or any other implementation aside, is it always _only_
the KMS driver that knows what the optimal configuration would be?
It seems like part of the decision could require knowledge of the graphics
hardware, which presumably the OpenGL/EGL driver is best positioned
to have.
Android agrees with that and stuffs all these decisions into hwc. And I
agree that there's cases with combinations of display block, 2d engine
and 3d engine where that full-system overview is definitely necessary. But
OpenGL still doesn't look like the right place to me. Something in-between
everything else, like hwc+gralloc on android (which has its own issues)
makes a lot more sense imo in a world where you can combine things wildly.

I do believe though that with just kms + sensible heuristics to allocate
surfaces to hw planes + some semi-clever fallback mechanism/hints (which
is what we currently lack) it should be possible to pull something off
without special-case vendor magic in hwc for every combination. That's
purely a conjecture though on my part, otoh no one has ever really tried
all that hard yet.
Post by Andy Ritger
For that aspect: would it be reasonable to execute hardware-specific
driver code in the drmModeAtomicCommit() call chain between the
application calling libdrm to make the atomic update, and the ioctl
into the kernel? Maybe that would be a call to libgbm that dispatches to
the hardware-specific gbm backend. However it is structured, having
hardware-specific graphics driver code execute as part of the flip
request might be one way to let the graphics driver piece and the display
driver piece coordinate on hardware specifics, without polluting the
application-facing API with hardware-specifics?
That's essentially the hwc interface, except much less powerful (since you
have no influence on the surface->plane assignment). I think it would be
better to add hwc support to weston (there's other people asking for
that), instead of inventing a new wheel. Also, hwc and upstream atomic
seem to be converging in some of the semantic details if my gossip sources
are correct ;-)

Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Daniel Stone
2016-03-29 16:41:15 UTC
Hi,
Post by Daniel Vetter
Post by Andy Ritger
eglstreams or gbm or any other implementation aside, is it always _only_
the KMS driver that knows what the optimal configuration would be?
It seems like part of the decision could require knowledge of the graphics
hardware, which presumably the OpenGL/EGL driver is best positioned
to have.
Android agrees with that and stuffs all these decisions into hwc. And I
agree that there's cases with combinations of display block, 2d engine
and 3d engine where that full-system overview is definitely necessary. But
OpenGL still doesn't look like the right place to me. Something in-between
everything else, like hwc+gralloc on android (which has its own issues)
makes a lot more sense imo in a world where you can combine things wildly.
Right. Samsung decided that answer was correct, and Tizen has the
Tizen Buffer Manager, which started off life as GBM with the copyright
notices filed off[0] and the addition of separate allocation
intended-use flags for 2D/blit and media decode engines. So for them,
GBM has mutated from the thing that knows about the intersection of
GPU + display, to the gralloc-like thing that can determine optimal
allocation strategies.

Unfortunately I don't expect to ever get meaningful input there, as I
only discovered its existence by semi-accident, back when you needed a
Tizen login to access it as well. It's only ever really been mentioned
in passing, and has no users outside Tizen (and I still don't know
what exactly uses their 'surface queue'). Oh well.
Post by Daniel Vetter
I do believe though that with just kms + sensible heuristics to allocate
surfaces to hw planes + some semi-clever fallback mechanism/hints (which
is what we currently lack) it should be possible to pull something off
without special-case vendor magic in hwc for every combination. That's
purely a conjecture though on my part, otoh no one has ever really tried
all that hard yet.
Another fun suggestion that came back would be feedback from the
atomic ioctl: when rejecting a configuration, optionally return a list
of property changes with which a future configuration would have a
larger chance of success (e.g. wider stride, different tiling mode).
Plumbing that back through to clients isn't beyond the realm of
reason, though would require more user-visible API. This is something
that would fit in quite nicely with the Weston atomic KMS
implementation, where we attempt to enlarge the configuration one
plane at a time: start with the primary plane, and attempt to place
every other scanout target on a plane, seeing at every turn if they
succeed or need to be punted down to GPU composition.
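
A sketch of that incremental approach using test-only commits and
libdrm's atomic cursor for rollback (add_view_to_plane and the
views/planes arrays are stand-ins for compositor internals):

    drmModeAtomicReq *req = drmModeAtomicAlloc();
    /* ... primary plane state added here ... */

    for (int i = 0; i < num_views; i++) {
        int cursor = drmModeAtomicGetCursor(req);
        add_view_to_plane(req, &views[i], &planes[i]); /* stand-in helper */

        if (drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_TEST_ONLY, NULL)) {
            /* Rejected: roll back and punt this view to GPU composition. */
            drmModeAtomicSetCursor(req, cursor);
            views[i].on_plane = false;
        }
    }
    drmModeAtomicCommit(fd, req, 0, NULL);
    drmModeAtomicFree(req);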

Cheers,
Daniel

[0]: tbm_bo_handle must be copied from gbm_bo_handle, because to write
that even once makes no sense; to write it independently is so
improbable as to be impossible.
https://review.tizen.org/git/?p=platform/core/uifw/libtbm.git;a=blob;f=src/tbm_bufmgr.h;h=7bf2597f3fee53d3b00ca7ba760675c977ba4435;hb=ecc409c142cd77b1d92cb35f444099e2c782b6ad
Andy Ritger
2016-03-23 00:12:52 UTC
Thanks for the thorough responses, Daniel.
Post by Daniel Stone
Hi Miguel,
Post by Miguel Angel Vico
First of all, I'd like to introduce myself to the Wayland community: my
name is Miguel A. Vico, and I've been working as a Software Engineer
for NVIDIA for some time now, more specifically in the Linux drivers
team. Although I've never spoken up here before, I've lately been
following the amazing work that you all have been doing.
Welcome!
I'm sorry I don't have some better news for you, but Andy and Aaron
can tell you it's not personal: this has been going on for years.
Yes, we expected this would be somewhat controversial. I appreciate you
looking at the patch series seriously, Daniel, and especially trying to
drill into the crux of the gbm concerns.
Post by Daniel Stone
Post by Miguel Angel Vico
In order to make the Weston DRM compositor work with our drivers, we
have used EGLDevice, EGLOutput, and EGLStream objects.
This is ... unfortunate. To echo what Daniel Vetter said, on the whole
these modesetting-in-EGL extensions are not something which has that
wide support, or even implementation. That being said, it's
interesting to have an implementation, because it has helped shape my
feelings and arguments a little, into something more concrete.
Post by Miguel Angel Vico
For those not familiar with this set of EGL structures, here I'll try to
summarize the most important parts of them, and how they would fit into
EGLDevice provides means to enumerate native devices, and then
create an EGL display connection from them.
This is generically useful: we would like to extend
eglGetPlatformDisplay to take an attrib naming an EGLDevice, which we
could then use with platform_gbm (to select GPU and scanout device
separately, either for multi-GPU systems or also for SoCs with
discrete GPU/dispc setups) as well as platform_wayland and co.
Post by Miguel Angel Vico
Similarly, EGLOutput will provide means to access different
portions of display control hardware associated with an EGLDevice.
For instance, EGLOutputLayer represents a portion of display
control hardware that accepts an image as input and processes it
for presentation on a display device.
I still struggle to see the value of what is essentially an
abstraction over KMS, but oh well.
The intent wasn't to abstract all of KMS, just the surface presentation
aspect where EGL and KMS intersect. Besides the other points below,
an additional motivation for abstraction is to allow EGL to work with
the native modesetting APIs on other platforms (e.g., OpenWF on QNX).
Post by Daniel Stone
Post by Miguel Angel Vico
EGLStream implements a mechanism to connect frame producers with
frame consumers. By attaching an EGLOutputLayer consumer to a
stream, a producer will be able to present frames on a display
device.
This is understating things quite a bit, I think. On the
Wayland-client side, it's a pretty big change from the EGLSurface
model, particularly if you use the default mailbox mode (see comments
on patch 4/7 as to how this breaks real-world setups, AFAICT).
This shouldn't result in a change on the client side: libnvidia-wayland-egl.so
abstracts away the DRM buffer versus EGLStream differences. These patches
have no effect on the behavior of clients, barring bugs.
Post by Daniel Stone
On the Wayland-compositor side, it's two _huge_ changes.
Firstly, again looking at the case where a Wayland client is a stream
producer and the Wayland compositor is a consumer, we move from a
model where references to individual buffers are explicitly passed
through the Wayland protocol, to where those buffers merely carry a
reference to a stream. Again, as stated in the review of 4/7, that
looks like it has the potential to break some actual real-world cases,
and I have no idea how to solve it, other than banning mailbox mode,
which would seem to mostly defeat the point of Streams (more on that
below).
Streams are just a transport for frames. The client still explicitly
communicates when a frame is delivered through the stream via wayland
protocol, and the compositor controls when it grabs a new frame, via
eglStreamConsumerAcquireKHR(). Unless there are bugs in the patches,
the flow of buffers is still explicit and fully under the wayland protocol
and compositor's control.

Also, mailbox mode versus FIFO mode should essentially equate to Vsync
off versus Vsync on, respectively. It shouldn't have anything to do
with the benefits of streams, but mailbox mode is a nice feature for
benchmarking games/simulations or naively displaying your latest &
greatest content without tearing.
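
For reference, the knob selecting between the two is the FIFO length at
stream creation (EGL_KHR_stream_fifo; length 0, the default, gives
mailbox behavior):

    /* FIFO mode: roughly vsync-on; the producer blocks when the queue
     * is full. */
    EGLint fifo_attribs[] = { EGL_STREAM_FIFO_LENGTH_KHR, 1, EGL_NONE };
    EGLStreamKHR fifo_stream = eglCreateStreamKHR(dpy, fifo_attribs);

    /* Mailbox mode: length 0 (the default); the newest frame replaces
     * any undelivered one. */
    EGLStreamKHR mailbox_stream = eglCreateStreamKHR(dpy, NULL);
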
Post by Daniel Stone
Secondly, looking at the compositor-drm case, the use of the dumb
buffer to display undefined content as a dummy modeset really makes me
uneasy,
Yes, the use of the dumb buffer in this patch series is a kludge. If we
were going to use drmModeSetCrtc + EGLStreams, I think we'd want to
pass no fb to drmModeSetCrtc, but that currently gets rejected by DRM.
Are surface-less modesets intended to be allowable in DRM? I can hunt
that down if that is intended to work. Of course, better to work out
how EGLStreams should cooperate with atomic KMS.

It was definitely an oversight to not zero initialize the dumb buffer.
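
For what it's worth, creating and zero-filling the dumb buffer takes
only the standard ioctls (sketch; error handling omitted):

    struct drm_mode_create_dumb creq = { .width = w, .height = h,
                                         .bpp = 32 };
    drmIoctl(fd, DRM_IOCTL_MODE_CREATE_DUMB, &creq);

    struct drm_mode_map_dumb mreq = { .handle = creq.handle };
    drmIoctl(fd, DRM_IOCTL_MODE_MAP_DUMB, &mreq);

    void *map = mmap(NULL, creq.size, PROT_READ | PROT_WRITE, MAP_SHARED,
                     fd, mreq.offset);
    memset(map, 0, creq.size); /* defined (black) contents */

    uint32_t fb_id;
    drmModeAddFB(fd, w, h, 24, 32, creq.pitch, creq.handle, &fb_id);
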
Post by Daniel Stone
again because both gl-renderer and compositor-drm are written
for explicit individual buffer management, rather than streams in +
streams out. I think the combination of the two pushes them long
beyond the point of readability, and I'd encourage you to look at
trying to split those files up, or at least the functions within them.
Attempting to keep both modes in there just looks like a maintenance
nightmare, especially when this streams implementation
(unsurprisingly) has to bypass almost the entire runtime (as opposed
to init-time) functionality of compositor-drm.
Also, I'm not quite sure how you're testing the compositor-as-consumer
mode: I can't seem to see any EGL extensions which allow you to
connect a Wayland surface as an EGLStream consumer. Do you have
something else unpublished that's being used here, or is this what the
libnvidia-egl-wayland library is for? Or do you just have clients
using EGLSurfaces as normal, which happen to be implemented internally
as EGLStreams? (Also, that the only way to test this is through
proprietary drivers implementing only-just-published extensions not
only makes me very sad, but hugely increases the potential for this to
be inadvertently broken.)
Sorry if this seemed cryptic. You are correct that EGL Wayland clients
just use EGLSurfaces as normal (no Wayland client changes), and that
gets implemented using EGLStreams within libnvidia-egl-wayland.

FWIW, we plan to release the source to libnvidia-egl-wayland
eventually... it has a few driver-specific warts right now, but the
intent is that it is a vendor-independent implementation (though, using
EGLStreams, so...) of EGL_KHR_platform_wayland using a set of EGL API
"wrappers". The goal was to allow window systems to write these EGL
platform bindings themselves, so that each EGL implementation doesn't
have to implement each EGL_KHR_platform_*. Anyway, we'll try to get
libnvidia-egl-wayland cleaned up and released.
Post by Daniel Stone
Post by Miguel Angel Vico
Thus, a compositor could produce frames and feed them to an
EGLOutputLayer through an EGLStream for presentation on a display
device.
In a similar way, by attaching a GLTexture consumer to a stream, a
producer (wayland client) could feed frames to a texture, which in
turn can be used by a compositor to prepare the final frame to be
presented.
Quick aside: this reminds me in many unfortunate ways of
GLX_EXT_texture_from_pixmap. tfp gave us the same 'capture stream of
stuff and make it appear in a texture' model as streams, whereas most
of the rest of the world (EGL, Vulkan WSI, Wayland, Android, ChromeOS,
etc) have all moved explicitly _away_ from that model to passing
references to individual buffers, this in many ways brings us back to
tfp.
Is that really an accurate comparison? The texture_from_pixmap extension
let X11 composite managers bind a single X pixmap to an OpenGL texture.
It seems to me what was missing in TFP usage was explicit synchronization
between X and/or OpenGL rendering into the pixmap and OpenGL texturing
from the pixmap.
Post by Daniel Stone
Post by Miguel Angel Vico
Whenever EGL_EXT_device_drm extension is present, EGLDevice can
be used to enumerate and access DRM KMS devices, and EGLOutputLayer
to enumerate and access DRM KMS crtcs and planes.
Again, the enumeration isn't so much used as bypassed. The original
enumeration is used, and all we do with the EGL objects is a) list all
of them, b) filter them to find the one we already have, and c)
perhaps replace their internal representation of the device with the
one we already have.
That's fair in the context of this patch set.

In general, EGLDevice provides device enumeration for other use cases
where it is the basis for bootstrapping. Maybe we could better reconcile
udev and EGLDevice in the patch set, but some of this is a natural, though
unfortunate, artifact of correlating objects between two enumeration APIs.
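
For example, bootstrapping from EGLDevice and correlating with udev looks
roughly like this (assuming EGL_EXT_device_base and EGL_EXT_device_drm;
drm_node_from_udev is a placeholder for whatever node udev handed us):

    EGLDeviceEXT devices[16];
    EGLint n = 0;
    eglQueryDevicesEXT(16, devices, &n);

    for (EGLint i = 0; i < n; i++) {
        /* Correlate the EGLDevice with the udev enumeration by
         * comparing DRM device file names. */
        const char *node =
            eglQueryDeviceStringEXT(devices[i], EGL_DRM_DEVICE_FILE_EXT);
        if (node && strcmp(node, drm_node_from_udev) == 0) {
            dpy = eglGetPlatformDisplayEXT(EGL_PLATFORM_DEVICE_EXT,
                                           devices[i], NULL);
            break;
        }
    }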
Post by Daniel Stone
Post by Miguel Angel Vico
By using EGLStreams and attaching an EGLOutputLayer consumer
(representing a DRM KMS crtc or plane) to it, compositor-drm can
produce final composition frames and present them on a DRM device.
Arguably it's gl-renderer producing the frames, with compositor-drm
kind of acting as a fake consumer (EGL_NV_stream_attrib).
Post by Miguel Angel Vico
Additionally, in order to allow wl_buffers to be bound to EGLStreams, we
kludged eglQueryWaylandBufferWL(EGL_WAYLAND_BUFFER_WL) to return the
stream file descriptor.
As said earlier, I don't think this is the right way to go, and have
other suggestions.
I'd like to look at the elephant in the room, which is why you're
using this in the first place (aside from general NVIDIA enthusiasm
for encapsulating everything within EGL Streams/Output/Device/etc,
dating back many years). Andy/Aaron, you've said that you found GBM to
be inadequate, and I'd like to find out explicitly how.
Thanks. This is the real heart of the debate.
Post by Daniel Stone
Through a few
'We can't choose an optimal rendering configuration, because we don't
know how it's going to be used' - (almost completely) untrue. The FD
you pass to gbm_device_create is that of the KMS device, a gbm_surface
contains information as to how the plane (primary or overlay) will be
configured,
Maybe I'm not looking in the right place, but where does gbm_surface get
the intended plane configuration? Are there other display-related flags
besides GBM_BO_USE_SCANOUT? Then again, the particular plane doesn't
impact us for current GPUs.
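
For comparison, this is all the display-related information gbm takes at
allocation time today, as far as I can see:

    /* Usage flags are the only display-related hint; which plane (if
     * any) the buffer ends up on is decided later by the compositor. */
    struct gbm_surface *surf =
        gbm_surface_create(gbm, width, height,
                           GBM_FORMAT_XRGB8888,
                           GBM_BO_USE_SCANOUT | GBM_BO_USE_RENDERING);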
Post by Daniel Stone
and an EGLDisplay lets you tie the rendering and scanout
devices together. What more information do you need? It's true that we
don't have a way to select individual rendering devices at the moment,
but as said earlier, passing an EGLDevice as an attrib to
GetPlatformDisplay would resolve that, as you would have the render
device identified by the EGLDevice and the scanout device identified
by the gbm_device. At that point, you have the full pipeline and can
determine the optimal configuration.
Beyond choosing optimal rendering configuration, there is arbitration of
the scarce resources needed for optimal rendering configuration. E.g.,
for Wayland compositor flipping to client-produced buffers, presumably the
client's buffer needs to be allocated with GBM_BO_USE_SCANOUT. NVIDIA's
display hardware requires physically contiguous buffers, so we wouldn't
want clients to _always_ allocate buffers with the GBM_BO_USE_SCANOUT
flag. It would be nice to have feedback between the EGL driver instance
in the compositor and the EGL driver running in the client, to know how
the buffer is going to be used by the Wayland compositor.

I imagine other hardware has even more severe constraints on displayable
memory, though, so maybe I'm misunderstanding something about how buffers
are shared between wayland clients and compositors?

This ties into the next point...
Post by Daniel Stone
'We don't know when to schedule decompression, because there's no
explicit barrier' - completely untrue. eglSwapBuffers is that barrier.
For example, in Freescale i.MX6, the Vivante GPU and Freescale IPU
(display controller) do not share a single common format between GPU
render targets and IPU scanout sources, so require a mandatory
detiling pass in between render and display. These work just fine with
gbm with that pass scheduled by eglSwapBuffers. This to me seems
completely explicit, unless there was something else you were meaning
... ?
The Vivante+Freescale example is a good one, but it would be more
interesting if they shared /some/ formats and you could only use those
common formats in /some/ cases.

I think a lot of the concern is about passing client-produced frames
all the way through to scanout (i.e., zero-copy). E.g., if the wayland
client is producing frames that the wayland compositor is going to use
as a texture, then we don't want the client to decompress as part of its
eglSwapBuffers: the wayland compositor will texture from the compressed
frame for best performance. But, if the wayland compositor is going to
flip to the surface, then we would want the client to decompress during
its eglSwapBuffers.

The nice thing about EGLStreams here is that if the consumer (the Wayland
compositor) wants to use the content in a different way, the producer
must be notified first, in order to produce something suitable for the
new consumer.
Post by Daniel Stone
'Width, height, pitch and format aren't enough information' - this is
true, but not necessarily relevant. I'm not sure what the source of
this actually is: is it the gbm_bo_get_*() APIs? If so, yes, they need
to be extended with a gbm_bo_get_modifier() call, which would allow
you to get the DRM format modifier to describe tiling/compression/et
al (as well as perhaps being extended to allow you to extract multiple
buffers/planes, e.g. to attach auxiliary compression buffers). If it's
not gbm, what actually is it? The only other place I can think of
(suggested by Pekka, I think) was the wl_drm protocol, which it should
be stressed is a) not required in any way by Wayland, b) not a
published/public protocol, c) not a stable protocol. wl_drm just
happens to be the way that Mesa shares buffers, just as wl_viv is how
Vivante's proprietary driver shares buffers, and mali_buffer_sharing
is how the Mali driver does it. Since the server side is bound by
eglBindWaylandDisplayWL and the client side is also only used through
EGL, there is _no_ requirement for you to also implement wl_drm. As it
is a hidden private Mesa protocol, there is also no requirement for
the protocol to remain stable.
I agree that wl_drm doesn't factor into it.

Maybe some of this is my confusion over what parts of gbm.h are
application-facing, and what parts are driver-facing? We, and
presumably most hardware vendors, would want the ability to associate
arbitrary metadata with gbm_bo's, but most of that metadata is
implementation-specific, and not really something an application should
be looking at without sacrificing portability.
Post by Daniel Stone
'EGLStreams is the direction taken in Vulkan' - I would argue not. IMO
the explicit buffer management on the client side does not parallel
EGLStreams, and notably there is no equivalent consumer interface
offered on the server side, but instead the individual-buffer-driven
approach is taken. It's true that VK_WSI_display_swapchain does exist
and does match the EGLStreams model fairly closely, but also that it
does not have universal implementation: the Intel 'anv' Mesa-based
driver does not implement display_swapchain, instead having an
interface to export a VkImage as a dmabuf. It's true that the latter
is not optimal (it lacks the explicit targeting required to determine
the most optimal tiling/compression strategy), but OTOH it is
precedent for explicitly avoiding the
VK_WSI_display_swapchain/EGLStreams model for Vulkan on KMS, just as
GBM avoids it for EGL on KMS.
From your perspective, what would be more optimal than VkImage+dmabuf?
Post by Daniel Stone
I think it's been good to have this series to push the discussion
further in more concrete terms, but unfortunately I have to say that
I'm even less convinced now than I have ever been. Sorry.
Thanks for the feedback so far.
- Andy
Post by Daniel Stone
Cheers,
Daniel
Daniel Vetter
2016-03-23 10:17:48 UTC
Permalink
Post by Andy Ritger
Thanks for the thorough responses, Daniel.
Post by Daniel Stone
Secondly, looking at the compositor-drm case, the use of the dumb
buffer to display undefined content as a dummy modeset really makes me
uneasy,
Yes, the use of dumb buffer in this patch series is a kludge. If we
were going to use drmModeSetCrtc + EGLStreams, I think we'd want to
pass no fb to drmModeSetCrtc, but that currently gets rejected by DRM.
Are surface-less modesets intended to be allowable in DRM? I can hunt
that down if that is intended to work. Of course, better to work out
how EGLStreams should cooperate with atomic KMS.
Side comment: With universal planes (and hence atomic) you can light up a
CRTC without any planes enabled, if your hw can do it. It's supposed to be
black in that case, and there are some patches floating around to control
the background colour (if anyone ever wants to change that ...).
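
Roughly, assuming the property ids have already been looked up with
drmModeObjectGetProperties and the mode blob created:

    drmModeAtomicReq *req = drmModeAtomicAlloc();
    /* Full modeset, but no plane state touched at all. */
    drmModeAtomicAddProperty(req, crtc_id, active_prop, 1);
    drmModeAtomicAddProperty(req, crtc_id, mode_id_prop, mode_blob_id);
    drmModeAtomicAddProperty(req, conn_id, crtc_id_prop, crtc_id);
    drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_ALLOW_MODESET, NULL);
    drmModeAtomicFree(req);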
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Carsten Haitzler (The Rasterman)
2016-03-24 23:52:31 UTC
Permalink
Post by Andy Ritger
Maybe I'm not looking in the right place, but where does gbm_surface get
the intended plane configuration? Are there other display-related flags
besides GBM_BO_USE_SCANOUT? Then again, the particular plane doesn't
impact us for current GPUs.
however you will not know the intended plane config because a compositor will
make this choice long after a buffer is allocated. it has received buffers from
clients and now has to choose how best to display this current screen setup
based on the input. it may use gpu to render, may assign buffers for scanout,
or anything else. the point is the layout may and often WILL change long after
the buffer has been allocated and even rendered to (or at least rendering has
started with fences able to ensure sync with an ongoing render).

so at best you can query current config - this is not totally correct and
streams don't solve this either. it's a fundamental issue that if you want real
optimal layout, you need an explicit protocol at a higher layer.
Post by Andy Ritger
Beyond choosing optimal rendering configuration, there is arbitration of
the scarce resources needed for optimal rendering configuration. E.g.,
for Wayland compositor flipping to client-produced buffers, presumably the
client's buffer needs to be allocated with GBM_BO_USE_SCANOUT. NVIDIA's
display hardware requires physically contiguous buffers, so we wouldn't
want clients to _always_ allocate buffers with the GBM_BO_USE_SCANOUT
flag. It would be nice to have feedback between the EGL driver instance
in the compositor and the EGL driver running in the client, to know how
the buffer is going to be used by the Wayland compositor.
I imagine other hardware has even more severe constraints on displayable
memory, though, so maybe I'm misunderstanding something about how buffers
are shared between wayland clients and compositors?
same thing as above. you really cannot do this at the egl level because you
don't know that usage scenario beforehand. this really needs to be at a higher
level likely with an explicit wayland protocol and client-side co-operation.

for example. let's pretend that we have hardware with a fixed limited number of
hw planes. 1 is limited 256x256 argb (cursor), 1 is yuv only (can scale and
rotate 90 degrees), 2 are yuv or rgba (can scale and rotate), and 1 is rgba
only (can scale and rotate).

you have 5 applications drawing stuff. some apps display some video, some
not... do you really want all apps to split up their rendering into subsurfaces
AND thus scanout capable buffers? unlikely. you do not have enough planes to
support this. so the compositor likely wants to send "hints" to clients as to
how many buffers may be available for them and what capabilities they have. the
compositor may choose to hide the cursor layer because it's busy using it for
the cursor. :) clients can break up their display into lots of subsurfaces and
buffers - eg render browser content separately from chrome so it could
pan/scroll the content simply by offsetting a larger buffer and not
re-rendering. if one client becomes fullscreen/maximized, the compositor may
choose to tell all clients that they now can't display except this one, and
tell this one that it has 4 planes available, so the fullscreen client can
maximize efficiency, whilst the other hidden clients can stop using scanout
capable buffers (because they likely only will be displayed when task switching
as thumbnails etc. and thus only need memory the gpu can use as a texture).

but all of this would be much higher level that percolates up into the
toolkit/widget set and even client logic directly. it would require some time
for clients to adapt and re-render.

i just don't think you can make this all magically perfect at purely the egl or
kms or drm etc. layer. these layers are simple and explicit. compositor will do
a "best effort" given the buffer inputs it has. if you want this more optimal
you need to tell clients much more and then hope toolkits etc. respond.
Post by Andy Ritger
This ties into the next point...
The Vivante+Freescale example is a good one, but it would be more
interesting if they shared /some/ formats and you could only use those
common formats in /some/ cases.
I think a lot of the concern is about passing client-produced frames
all the way through to scanout (i.e., zero-copy). E.g., if the wayland
client is producing frames that the wayland compositor is going to use
as a texture, then we don't want the client to decompress as part of its
eglSwapBuffers: the wayland compositor will texture from the compressed
frame for best performance. But, if the wayland compositor is going to
flip to the surface, then we would want the client to decompress during
its eglSwapBuffers.
correct, but as above... there is no way the client WILL know what WILL be done
because that decision is made much later. long after client has allocated and
rendered its frame. the compositor now reacts to this input and makes a
decision (and may change its decision frame by frame).

it's an inefficiency then to de-tile and re-tile (or compress then
decompress ... etc.). there really should be a compositor to client hinting
protocol that covers how many subsurfaces might be best, what formats might be
best etc. etc. - e.g. in this case if there are many surfaces on screen the
compositor might just tell all clients "please stick to 1 surface with argb, no
scanout" and at least until all clients re-draw and copy/convert their buffers
into non-scanout buffers there is a cost to display (de-tile/de-compress). too
bad. then once all clients have adapted, things work better.
Post by Andy Ritger
The nice thing about EGLStreams here is that if the consumer (the Wayland
compositor) wants to use the content in a different way, the producer
must be notified first, in order to produce something suitable for the
new consumer.
that's the problem... the compositor (consumer) makes this decision LATER, not
BEFORE. :) things have to work, efficiently or not, regardless of the
compositor (consumer) decisions. adapting to become more efficient is far more
than a stream of 1 surface and a stream of buffers.
--
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler) ***@rasterman.com
Andy Ritger
2016-04-02 00:21:23 UTC
Permalink
Post by Carsten Haitzler (The Rasterman)
Post by Andy Ritger
Maybe I'm not looking in the right place, but where does gbm_surface get
the intended plane configuration? Are there other display-related flags
besides GBM_BO_USE_SCANOUT? Then again, the particular plane doesn't
impact us for current GPUs.
however you will not know the intended plane config because a compositor will
make this choice long after a buffer is allocated. it has received buffers from
clients and now has to choose how best to display this current screen setup
based on the input. it may use gpu to render, may assign buffers for scanout,
or anything else. the point is the layout may and often WILL change long after
the buffer has been allocated and even rendered to (or at least rendering has
started with fences able to ensure sync with an ongoing render).
so at best you can query current config - this is not totally correct and
streams don't solve this either. it's a fundamental issue that if you want real
optimal layout, you need an explicit protocol at a higher layer.
Thanks. Sorry, I think I steered the discussion wrong with the talk
of plane configuration. I was only asking a clarifying question about
Daniel's speculation as to what we were concerned about in gbm.

As-is, yes, plane configuration is the domain of the Wayland compositor,
and the EGLStreams proposal doesn't alter that. The point of the
EGLStreams proposal is to make sure that the driver performing the
hw-specific details of the buffer allocation has a complete picture of
how the buffer will be used. Of course, your point that the usage could
change dynamically is good.
Post by Carsten Haitzler (The Rasterman)
Post by Andy Ritger
Beyond choosing optimal rendering configuration, there is arbitration of
the scarce resources needed for optimal rendering configuration. E.g.,
for Wayland compositor flipping to client-produced buffers, presumably the
client's buffer needs to be allocated with GBM_BO_USE_SCANOUT. NVIDIA's
display hardware requires physically contiguous buffers, so we wouldn't
want clients to _always_ allocate buffers with the GBM_BO_USE_SCANOUT
flag. It would be nice to have feedback between the EGL driver instance
in the compositor and the EGL driver running in the client, to know how
the buffer is going to be used by the Wayland compositor.
I imagine other hardware has even more severe constraints on displayable
memory, though, so maybe I'm misunderstanding something about how buffers
are shared between wayland clients and compositors?
same thing as above. you really cannot do this at the egl level because you
don't know that usage scenario beforehand. this really needs to be at a higher
level likely with an explicit wayland protocol and client-side co-operation.
for example. let's pretend that we have hardware with a fixed limited number of
hw planes. 1 is limited 256x256 argb (cursor), 1 is yuv only (can scale and
rotate 90 degrees), 2 are yuv or rgba (can scale and rotate), and 1 is rgba
only (can scale and rotate).
you have 5 applications drawing stuff. some apps display some video, some
not... do you really want all apps to split up their rendering into subsurfaces
AND thus scanout capable buffers? unlikely. you do not have enough planes to
support this. so the compositor likely wants to send "hints" to clients as to
how many buffers may be available for them and what capabilities they have. the
compositor may choose to hide the cursor layer because it's busy using it for
the cursor. :) clients can break up their display into lots of subsurfaces and
buffers - eg render browser content separately from chrome so it could
pan/scroll the content simply by offsetting a larger buffer and not
re-rendering. if one client becomes fullscreen/maximized, the compositor may
choose to tell all clients that they now can't display except this one, and
tell this one that it has 4 planes available, so the fullscreen client can
maximize efficiency, whilst the other hidden clients can stop using scanout
capable buffers (because they likely only will be displayed when task switching
as thumbnails etc. and thus only need memory the gpu can use as a texture).
but all of this would be much higher level that percolates up into the
toolkit/widget set and even client logic directly. it would require some time
for clients to adapt and re-render.
i just don't think you can make this all magically perfect at purely the egl or
kms or drm etc. layer. these layers are simple and explicit. compositor will do
a "best effort" given the buffer inputs it has. if you want this more optimal
you need to tell clients much more and then hope toolkits etc. respond.
I'm all for pushing decision making higher in the software stack,
in general. But for the sorts of things you describe above, it seems
like a lot of complexity to impose on clients. For fully optimizing
the plane usage, I wonder if a HWC-like solution is a better way to go.

But in any case, I didn't mean to get into plane usage decisions.
The EGLStreams proposal is meant to keep plane usage decisions where
they currently are in compositors.
Post by Carsten Haitzler (The Rasterman)
Post by Andy Ritger
This ties into the next point...
The Vivante+Freescale example is a good one, but it would be more
interesting if they shared /some/ formats and you could only use those
common formats in /some/ cases.
I think a lot of the concern is about passing client-produced frames
all the way through to scanout (i.e., zero-copy). E.g., if the wayland
client is producing frames that the wayland compositor is going to use
as a texture, then we don't want the client to decompress as part of its
eglSwapBuffers: the wayland compositor will texture from the compressed
frame for best performance. But, if the wayland compositor is going to
flip to the surface, then we would want the client to decompress during
its eglSwapBuffers.
correct, but as above... there is no way the client WILL know what WILL be done
because that decision is made much later. long after client has allocated and
rendered its frame. the compositor now reacts to this input and makes a
decision (and may change its decision frame by frame).
Agreed that the usage can change dynamically. And we should make sure
that things don't fall off a cliff when the usage changes. But, I think
the important performance case is the steady state.

Thanks,
- Andy
Post by Carsten Haitzler (The Rasterman)
it's an inefficiency then to de-tile and re-tile (or compress then
decompress ... etc.). there really should be a compositor to client hinting
protocol that covers how many subsurfaces might be best, what formats might be
best etc. etc. - e.g. in this case if there are many surfaces on screen the
compositor might just tell all clients "please stick to 1 surface with argb, no
scanout" and at least until all clients re-draw and copy/convert their buffers
into non-scanout buffers there is a cost to display (de-tile/de-compress). too
bad. then once all clients have adapted, things work better.
Post by Andy Ritger
The nice thing about EGLStreams here is that if the consumer (the Wayland
compositor) wants to use the content in a different way, the producer
must be notified first, in order to produce something suitable for the
new consumer.
that's the problem... the compositor (consumer) makes this decision LATER, not
BEFORE. :) things have to work, efficiently or not, regardless of the
compositor (consumer) decisions. adapting to become more efficient is far more
than a stream of 1 surface and a stream of buffers.
--
------------- Codito, ergo sum - "I code, therefore I am" --------------
Daniel Stone
2016-03-29 16:44:41 UTC
Permalink
Hi Andy,
Post by Andy Ritger
Thanks for the thorough responses, Daniel.
No problem; as I said, I'm actually really happy to see an
implementation out there.
Post by Andy Ritger
Post by Daniel Stone
Post by Miguel Angel Vico
Similarly, EGLOutput will provide means to access different
portions of display control hardware associated with an EGLDevice.
For instance, EGLOutputLayer represents a portion of display
control hardware that accepts an image as input and processes it
for presentation on a display device.
I still struggle to see the value of what is essentially an
abstraction over KMS, but oh well.
The intent wasn't to abstract all of KMS, just the surface presentation
aspect where EGL and KMS intersect. Besides the other points below,
an additional motivation for abstraction is to allow EGL to work with
the native modesetting APIs on other platforms (e.g., OpenWF on QNX).
Fair enough. And, ah, _that's_ where the OpenWF implementation is - I
was honestly unsure for years since the last implementation I saw was
from the ex-Hybrid NVIDIA guys in Helsinki, back when it was aimed at
Series 60.
Post by Andy Ritger
Post by Daniel Stone
Firstly, again looking at the case where a Wayland client is a stream
producer and the Wayland compositor is a consumer, we move from a
model where references to individual buffers are explicitly passed
through the Wayland protocol, to where those buffers merely carry a
reference to a stream. Again, as stated in the review of 4/7, that
looks like it has the potential to break some actual real-world cases,
and I have no idea how to solve it, other than banning mailbox mode,
which would seem to mostly defeat the point of Streams (more on that
below).
Streams are just a transport for frames. The client still explicitly
communicates when a frame is delivered through the stream via wayland
protocol, and the compositor controls when it grabs a new frame, via
eglStreamConsumerAcquireKHR(). Unless there are bugs in the patches,
the flow of buffers is still explicit and fully under the wayland protocol
and compositor's control.
Right, I believe if you have FIFO mode and strictly enforce
synchronisation to wl_surface::frame, then you should be safe. Mailbox
mode or any other kind of SwapInterval(0) equivalent opens you up to a
series of issues.
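
I.e. the usual frame-callback throttling, sketched here with can_draw as
the client's own bookkeeping flag:

    static int can_draw = 1;

    static void frame_done(void *data, struct wl_callback *cb,
                           uint32_t time)
    {
        wl_callback_destroy(cb);
        *(int *)data = 1; /* compositor used the frame; draw again */
    }
    static const struct wl_callback_listener frame_listener = {
        frame_done
    };

    /* Before committing each frame: */
    struct wl_callback *cb = wl_surface_frame(surface);
    wl_callback_add_listener(cb, &frame_listener, &can_draw);
    can_draw = 0;
    eglSwapBuffers(dpy, egl_surface); /* attaches + commits */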
Post by Andy Ritger
Also, mailbox mode versus FIFO mode should essentially equate to Vsync
off versus Vsync on, respectively. It shouldn't have anything to do
with the benefits of streams, but mailbox mode is a nice feature for
benchmarking games/simulations or naively displaying your latest &
greatest content without tearing.
I agree it's definitely a nice thing to have, but it does bring up the
serialisation issue: we expect any configuration performed by the
client (say, wl_surface::set_opaque_area to let the compositor know
where it can disable blending) to be fully in-line with buffer
attachment. The extreme case of this is resize, but there are quite a
few valid cases where you need serialisation.

I don't know quite off the top of my head how you'd support mailbox
mode with Streams, given this constraint - you need three-way feedback
between the compositor (recording all associated surface state,
including subsurfaces), clients (recording the surface state valid
when that buffer was posted), and the Streams implementation
(determining which frames to dequeue, which to discard and return to
the client, etc).
Post by Andy Ritger
Post by Daniel Stone
Secondly, looking at the compositor-drm case, the use of the dumb
buffer to display undefined content as a dummy modeset really makes me
uneasy,
Yes, the use of dumb buffer in this patch series is a kludge. If we
were going to use drmModeSetCrtc + EGLStreams, I think we'd want to
pass no fb to drmModeSetCrtc, but that currently gets rejected by DRM.
Are surface-less modesets intended to be allowable in DRM? I can hunt
that down if that is intended to work. Of course, better to work out
how EGLStreams should cooperate with atomic KMS.
It was definitely an oversight to not zero initialize the dumb buffer.
Right, atomic allows you separate pipe/CRTC configuration from
plane/overlay configuration. So you'd have two options: one is to use
atomic and require the CRTC be configured with planes off before using
Streams to post flips, and the other is to add KMS configuration to
the EGL output.

Though, now I think of it, this effectively precludes one case, which
is scaling a Streams-sourced buffer inside the display controller. In
the GBM case, the compositor gets every buffer, so can configure the
plane scaling in line with buffer display. I don't see how you'd do
that with Streams.

There's another hurdle to overcome too, which would currently preclude
avoiding the intermediate dumb buffer at all. One of the invariants
the atomic KMS API enforces is that (!!plane->crtc_id ==
!!plane->fb_id), i.e. that a plane cannot be assigned to a CRTC
without an active buffer. So again, we're left with either having the
plane fully configured and active (assigned to a CRTC and displaying,
I assume, a pre-allocated dumb buffer), or pushing more configuration
into Streams - specifically, connecting an EGLOutputLayer to an
EGLOutputPort.
Post by Andy Ritger
Post by Daniel Stone
Also, I'm not quite sure how you're testing the compositor-as-consumer
mode: I can't seem to see any EGL extensions which allow you to
connect a Wayland surface as an EGLStream consumer. Do you have
something else unpublished that's being used here, or is this what the
libnvidia-egl-wayland library is for? Or do you just have clients
using EGLSurfaces as normal, which happen to be implemented internally
as EGLStreams? (Also, that the only way to test this is through
proprietary drivers implementing only-just-published extensions not
only makes me very sad, but hugely increases the potential for this to
be inadvertently broken.)
Sorry if this seemed cryptic. You are correct that EGL Wayland clients
just use EGLSurfaces as normal (no Wayland client changes), and that
gets implemented using EGLStreams within libnvidia-egl-wayland.
Sorry, I'd missed this whilst reading through.
Post by Andy Ritger
FWIW, we plan to release the source to libnvidia-egl-wayland
eventually... it has a few driver-specific warts right now, but the
intent is that it is a vendor-independent implementation (though, using
EGLStreams, so...) of EGL_KHR_platform_wayland using a set of EGL API
"wrappers". The goal was to allow window systems to write these EGL
platform bindings themselves, so that each EGL implementation doesn't
have to implement each EGL_KHR_platform_*. Anyway, we'll try to get
libnvidia-egl-wayland cleaned up and released.
Interesting!
Post by Andy Ritger
Post by Daniel Stone
Post by Miguel Angel Vico
Thus, a compositor could produce frames and feed them to an
EGLOutputLayer through an EGLStream for presentation on a display
device.
In a similar way, by attaching a GLTexture consumer to a stream, a
producer (wayland client) could feed frames to a texture, which in
turn can be used by a compositor to prepare the final frame to be
presented.
Quick aside: this reminds me in many unfortunate ways of
GLX_EXT_texture_from_pixmap. tfp gave us the same 'capture stream of
stuff and make it appear in a texture' model as streams, whereas most
of the rest of the world (EGL, Vulkan WSI, Wayland, Android, ChromeOS,
etc) have all moved explicitly _away_ from that model to passing
references to individual buffers, this in many ways brings us back to
tfp.
Is that really an accurate comparison? The texture_from_pixmap extension
let X11 composite managers bind a single X pixmap to an OpenGL texture.
It seems to me what was missing in TFP usage was explicit synchronization
between X and/or OpenGL rendering into the pixmap and OpenGL texturing
from the pixmap.
I'd argue that synchronisation (in terms of serialisation with the
rest of the client's protocol stream) is missing from Streams as well,
at least in mailbox mode.

(As an aside, I wonder if it's properly done in FIFO mode as well; the
compositor may very validly choose not to dequeue a buffer if a
surface is completely occluded. How does Streams then know that it can
submit another frame? Generally we use wl_surface::frame to deal with
this - the equivalent of eglSwapInterval(1) - but it sounds like
Streams relies more on strictly-paired internal queue/dequeue pairing
in FIFO mode. Maybe this isn't true.)
Post by Andy Ritger
Post by Daniel Stone
Post by Miguel Angel Vico
Whenever EGL_EXT_device_drm extension is present, EGLDevice can
be used to enumerate and access DRM KMS devices, and EGLOutputLayer
to enumerate and access DRM KMS crtcs and planes.
Again, the enumeration isn't so much used as bypassed. The original
enumeration is used, and all we do with the EGL objects is a) list all
of them, b) filter them to find the one we already have, and c)
perhaps replace their internal representation of the device with the
one we already have.
That's fair in the context of this patch set.
In general, EGLDevice provides device enumeration for other use cases
where it is the basis for bootstrapping. Maybe we could better reconcile
udev and EGLDevice in the patch set, but some of this is a natural, though
unfortunate, artifact of correlating objects between two enumeration APIs.
Mind you, this wasn't intended as a criticism, just noting that the
commit message didn't accurately describe the code.
Post by Andy Ritger
Post by Daniel Stone
I'd like to look at the elephant in the room, which is why you're
using this in the first place (aside from general NVIDIA enthusiasm
for encapsulating everything within EGL Streams/Output/Device/etc,
dating back many years). Andy/Aaron, you've said that you found GBM to
be inadequate, and I'd like to find out explicitly how.
Thanks. This is the real heart of the debate.
Yes!
Post by Andy Ritger
Post by Daniel Stone
Through a few
'We can't choose an optimal rendering configuration, because we don't
know how it's going to be used' - (almost completely) untrue. The FD
you pass to gbm_device_create is that of the KMS device, a gbm_surface
contains information as to how the plane (primary or overlay) will be
configured,
Maybe I'm not looking in the right place, but where does gbm_surface get
the intended plane configuration? Are there other display-related flags
besides GBM_BO_USE_SCANOUT? Then again, the particular plane doesn't
impact us for current GPUs.
Well, nowhere. By current plane configuration, I assume you're (to the
extent that you can discuss it) talking about asymmetric plane
capabilities, e.g. support for disjoint colour formats, scaling units,
etc? As Dan V says, I still see Streams as a rather incomplete fix to
this, given that plane assignment is pre-determined: what do you do
when your buffers are configured as optimally as possible, but the
compositor has picked the 'wrong' plane? I really think you need
something like HWC to rewrite your scene graph into the optimal setup.
Post by Andy Ritger
Post by Daniel Stone
and an EGLDisplay lets you tie the rendering and scanout
devices together. What more information do you need? It's true that we
don't have a way to select individual rendering devices at the moment,
but as said earlier, passing an EGLDevice as an attrib to
GetPlatformDisplay would resolve that, as you would have the render
device identified by the EGLDevice and the scanout device identified
by the gbm_device. At that point, you have the full pipeline and can
determine the optimal configuration.
Beyond choosing optimal rendering configuration, there is arbitration of
the scarce resources needed for optimal rendering configuration. E.g.,
for Wayland compositor flipping to client-produced buffers, presumably the
client's buffer needs to be allocated with GBM_BO_USE_SCANOUT. NVIDIA's
display hardware requires physically contiguous buffers, so we wouldn't
want clients to _always_ allocate buffers with the GBM_BO_USE_SCANOUT
flag. It would be nice to have feedback between the EGL driver instance
in the compositor and the EGL driver running in the client, to know how
the buffer is going to be used by the Wayland compositor.
I imagine other hardware has even more severe constraints on displayable
memory, though, so maybe I'm misunderstanding something about how buffers
are shared between wayland clients and compositors?
Ah! This is something I've very much had in mind for quite a while -
but I keep getting pre-empted - and didn't bring it up as it didn't
seem implemented in the current patchset. (IIRC,
jajones had some code to allow you to retarget Streams at different
consumers, but he's on leave.)

Also, I should add that there's nothing requiring clients to use GBM
to allocate. The client EGLSurface implementation is free to do purely
internal allocations that are only accessible to it, if it wants to;
gbm_bo_import would then note that the buffer is not usable for
scanout and fail the import, leaving the compositor to fall back to
EGLImage.
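
Sketched out (resource being the client's wl_buffer resource):

    /* Try to import for direct scanout; a purely-internal client
     * allocation will simply fail the import. */
    struct gbm_bo *bo =
        gbm_bo_import(gbm, GBM_BO_IMPORT_WL_BUFFER,
                      resource, GBM_BO_USE_SCANOUT);
    if (!bo) {
        /* Fall back to GPU composition. */
        EGLImageKHR img =
            eglCreateImageKHR(dpy, EGL_NO_CONTEXT,
                              EGL_WAYLAND_BUFFER_WL, resource, NULL);
        /* ... glEGLImageTargetTexture2DOES() and composite ... */
    }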
Post by Andy Ritger
This ties into the next point...
Post by Daniel Stone
'We don't know when to schedule decompression, because there's no
explicit barrier' - completely untrue. eglSwapBuffers is that barrier.
For example, in Freescale i.MX6, the Vivante GPU and Freescale IPU
(display controller) do not share a single common format between GPU
render targets and IPU scanout sources, so require a mandatory
detiling pass in between render and display. These work just fine with
gbm with that pass scheduled by eglSwapBuffers. This to me seems
completely explicit, unless there was something else you were meaning
... ?
The Vivante+Freescale example is a good one, but it would be more
interesting if they shared /some/ formats and you could only use those
common formats in /some/ cases.
That's also fairly common, particularly for tiling. Intel has more
tiling modes than I can remember, of which only one (X-tiling) is a
valid source for scanout. As you say, physical contiguity is also a
valid requirement, plus pitch alignment.
Post by Andy Ritger
I think a lot of the concern is about passing client-produced frames
all the way through to scanout (i.e., zero-copy). E.g., if the wayland
client is producing frames that the wayland compositor is going to use
as a texture, then we don't want the client to decompress as part of its
eglSwapBuffers: the wayland compositor will texture from the compressed
frame for best performance. But, if the wayland compositor is going to
flip to the surface, then we would want the client to decompress during
its eglSwapBuffers.
Yes, very much so. Taking the Freescale example, you want the client
to do a detiling blit during its swap if the surface is a valid
scanout target, but not at all if it's just getting textured by the
GPU anyway. Similarly, Intel wants to allocate X-tiled if scanout is
possible, but otherwise it wants to be Y/Yf/...-tiled.
Post by Andy Ritger
The nice thing about EGLStreams here is that if the consumer (the Wayland
compositor) wants to use the content in a different way, the producer
must be notified first, in order to produce something suitable for the
new consumer.
I believe this is entirely doable with GBM right now, taking advantage
of the fact that libgbm.so and libEGL.so must be as tightly paired as
libEGL.so and libGLESv2.so. For all of these, read 'wl_drm' as 'wl_drm
or its equivalent interface in other implementations'.

Firstly, create a new interface in wl_drm to represent a swapchain (in
the Vulkan sense), and modify its buffer-creation requests to take a
swapchain parameter. This we can do without penalty, since the only
users (aside from VA-API, which is really broken and also hopefully
soon to lose its Wayland sink anyway) are EGL_EXT_platform_wayland and
EGL_WL_bind_wayland_display, both within the same DSO.

Secondly, instrument gbm_bo_import's wl_buffer path (proxy for intent
to use a buffer for direct scanout) and EGLImage's
EGL_WAYLAND_BUFFER_WL path (proxy for intent to use via GPU
composition) to determine what the compositor is actually doing with
these buffers, and use that to store target/intent in the swapchain.

Thirdly, when the target/intent changes (e.g. 'was scanout every
frame, has been EGLImage for the last 120 frames'), send an event down
to the client to let it know to modify its allocation. The combination
of EGL/GBM are in the correct place to determine this, since between
them they already have to know the intersection of capabilities
between render and scanout.

That still doesn't solve the optimal-display-configuration problem -
that you have generic code determining not only the display strategy
(scanout vs. GPU composition) as well as the exact display controller
configuration - but neither does EGLStreams, or indeed anything
current short of HWC.

Do you see any problem with doing that within GBM? It's not actually
done yet, but then again, neither is direct scanout through Streams.
;)
Post by Andy Ritger
Post by Daniel Stone
'Width, height, pitch and format aren't enough information' - this is
true, but not necessarily relevant. I'm not sure what the source of
this actually is: is it the gbm_bo_get_*() APIs? If so, yes, they need
to be extended with a gbm_bo_get_modifier() call, which would allow
you to get the DRM format modifier to describe tiling/compression/et
al (as well as perhaps being extended to allow you to extract multiple
buffers/planes, e.g. to attach auxiliary compression buffers). If it's
not gbm, what actually is it? The only other place I can think of
(suggested by Pekka, I think) was the wl_drm protocol, which it should
be stressed is a) not required in any way by Wayland, b) not a
published/public protocol, c) not a stable protocol. wl_drm just
happens to be the way that Mesa shares buffers, just as wl_viv is how
Vivante's proprietary driver shares buffers, and mali_buffer_sharing
is how the Mali driver does it. Since the server side is bound by
eglBindWaylandDisplayWL and the client side is also only used through
EGL, there is _no_ requirement for you to also implement wl_drm. As it
is a hidden private Mesa protocol, there is also no requirement for
the protocol to remain stable.
I agree that wl_drm doesn't factor into it.
Maybe some of this is my confusion over what parts of gbm.h are
application-facing, and what parts are driver-facing? We, and
presumably most hardware vendors, would want the ability to associate
arbitrary metadata with gbm_bo's, but most of that metadata is
implementation-specific, and not really something an application should
be looking at without sacrificing portability.
All of gbm.h is user-facing; how you implement that API is completely
up to you, including arbitrary metadata. For instance, it's the driver
that allocates its own struct gbm_surface/gbo_bo/etc (which is
opaque), so it can do whatever it likes in terms of metadata. Is there
anything in particular you're thinking of that you're not sure you'd
be able to store portably?

Might also be worth striking a common misconception here: the Mesa GBM
implementation is _not_ canonical. gbm.h is the user-facing API you
have to implement, but beyond that, you don't need to be implemented
by Mesa's src/gbm/. As the gbm.h types are all opaque, I'm not sure
what you couldn't express/hide/store - do you have any examples?
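
To sketch what I mean - a vendor implementation can define the opaque
types however it likes; the private fields here are invented for
illustration:

    struct gbm_bo {
        struct gbm_device *gbm;
        uint32_t width, height, format;
        union gbm_bo_handle handle;
        /* driver-private metadata; applications never see this: */
        uint64_t tiling_mode;
        int physically_contiguous;
        void *compression_state;
    };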
Post by Andy Ritger
Post by Daniel Stone
'EGLStreams is the direction taken in Vulkan' - I would argue not. IMO
the explicit buffer management on the client side does not parallel
EGLStreams, and notably there is no equivalent consumer interface
offered on the server side, but instead the individual-buffer-driven
approach is taken. It's true that VK_WSI_display_swapchain does exist
and does match the EGLStreams model fairly closely, but also that it
does not have universal implementation: the Intel 'anv' Mesa-based
driver does not implement display_swapchain, instead having an
interface to export a VkImage as a dmabuf. It's true that the latter
is not optimal (it lacks the explicit targeting required to determine
the most optimal tiling/compression strategy), but OTOH it is
precedent for explicitly avoiding the
VK_WSI_display_swapchain/EGLStreams model for Vulkan on KMS, just as
GBM avoids it for EGL on KMS.
From your perspective, what would be more optimal than VkImage+dmabuf?
Well, it's pretty much on par with GBM-compositor-Wayland-client and
an EGLStreams pipeline ending in an EGLOutput. Not having something
like HWC means that you can't determine the optimal plane-allocation
strategy.
Post by Andy Ritger
Post by Daniel Stone
Agreed. One of the things I've been incredibly happy with is how our
platform has managed to stay completely generic and vendor-neutral so
far, and I'd love to preserve that.
I don't think you'll find any disagreement to that from NVIDIA, either.
I apologize if the EGLStreams proposal gave the impression of a
vendor-private solution. That wasn't the intent. The EGLStream family
of extensions are, after all, an open specification that any EGL vendor
can implement. If there are aspects of any of these EGL extensions that
seem useful, I'd hope that Mesa would be willing to adopt them.
Indeed, this wasn't to cast any aspersions on how you guys have
developed Streams. Having it out there and having these patches has
really been tremendously useful.
Post by Andy Ritger
We (NVIDIA) clearly think EGLStreams is a good direction for expressing
buffer sharing semantics. In our ideal world, everyone would implement
these extensions and Wayland compositors would migrate to using them as
the generic vendor-neutral mechanism for buffer sharing :)
But here's where my problem lies. At the moment, the 'how do I
Wayland' story is very straightforward, and not entirely
coincidentally similar to ChromeOS's: you implement GBM+KMS, you
implement the ~25 LoC of libwayland-egl, you implement
EGL_EXT_platform_{gbm,wayland}, and ... that's it. Introducing Streams
as an alternate model is certainly interesting, and I understand why
you would do it, but having it as the sole option muddies the 'how do
I Wayland' story significantly.
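
To underline the '~25 LoC' point, libwayland-egl is essentially just this
handle plus trivial create/resize/destroy (quoting from memory, so treat
as approximate):

    #include <stdlib.h>

    struct wl_egl_window {
        struct wl_surface *surface;
        int width, height;
        int dx, dy;
        int attached_width, attached_height;
        void *private; /* owned by the EGL implementation */
        void (*resize_callback)(struct wl_egl_window *, void *);
    };

    struct wl_egl_window *
    wl_egl_window_create(struct wl_surface *surface, int width, int height)
    {
        struct wl_egl_window *w = calloc(1, sizeof *w);
        if (w) {
            w->surface = surface;
            w->width = width;
            w->height = height;
        }
        return w;
    }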

Getting away from the vendor-bound DDX model was something we were
desperate to do (see also xf86-video-modesetting landing on GBM+EGL),
and I'd really just like to avoid that becoming 'well, for most
platforms you do this, but for this platform / these platforms, you do
this instead ...'.

Cheers,
Daniel
Andy Ritger
2016-04-02 00:28:17 UTC
Permalink
Post by Daniel Stone
Hi Andy,
Post by Andy Ritger
Thanks for the thorough responses, Daniel.
No problem; as I said, I'm actually really happy to see an
implementation out there.
Post by Andy Ritger
Post by Daniel Stone
Post by Miguel Angel Vico
Similarly, EGLOutput will provide means to access different
portions of display control hardware associated with an EGLDevice.
For instance, EGLOutputLayer represents a portion of display
control hardware that accepts an image as input and processes it
for presentation on a display device.
I still struggle to see the value of what is essentially an
abstraction over KMS, but oh well.
The intent wasn't to abstract all of KMS, just the surface presentation
aspect where EGL and KMS intersect. Besides the other points below,
an additional motivation for abstraction is to allow EGL to work with
the native modesetting APIs on other platforms (e.g., OpenWF on QNX).
Fair enough. And, ah, _that's_ where the OpenWF implementation is - I
was honestly unsure for years since the last implementation I saw was
from the ex-Hybrid NVIDIA guys in Helsinki, back when it was aimed at
Series 60.
Yes. I haven't had any direct interaction with the QNX implementation
of OpenWF. In any case, portability across OSes has been an important
part of our downstream Wayland efforts in automotive.
Post by Daniel Stone
Post by Andy Ritger
Post by Daniel Stone
Firstly, again looking at the case where a Wayland client is a stream
producer and the Wayland compositor is a consumer, we move from a
model where references to individual buffers are explicitly passed
through the Wayland protocol, to where those buffers merely carry a
reference to a stream. Again, as stated in the review of 4/7, that
looks like it has the potential to break some actual real-world cases,
and I have no idea how to solve it, other than banning mailbox mode,
which would seem to mostly defeat the point of Streams (more on that
below).
Streams are just a transport for frames. The client still explicitly
communicates when a frame is delivered through the stream via wayland
protocol, and the compositor controls when it grabs a new frame, via
eglStreamConsumerAcquireKHR(). Unless there are bugs in the patches,
the flow of buffers is still explicit and fully under the wayland protocol
and compositor's control.
Right, I believe if you have FIFO mode and strictly enforce
synchronisation to wl_surface::frame, then you should be safe. Mailbox
mode or any other kind of SwapInterval(0) equivalent opens you up to a
series of issues.
Post by Andy Ritger
Also, mailbox mode versus FIFO mode should essentially equate to Vsync
off versus Vsync on, respectively. It shouldn't have anything to do
with the benefits of streams, but mailbox mode is a nice feature for
benchmarking games/simulations or naively displaying your latest &
greatest content without tearing.
I agree it's definitely a nice thing to have, but it does bring up the
serialisation issue: we expect any configuration performed by the
client (say, wl_surface::set_opaque_area to let the compositor know
where it can disable blending) to be fully in-line with buffer
attachment. The extreme case of this is resize, but there are quite a
few valid cases where you need serialisation.
I don't know quite off the top of my head how you'd support mailbox
mode with Streams, given this constraint - you need three-way feedback
between the compositor (recording all associated surface state,
including subsurfaces), clients (recording the surface state valid
when that buffer was posted), and the Streams implementation
(determining which frames to dequeue, which to discard and return to
the client, etc).
It is possible we don't get all of that completely right in our implementation yet.
Post by Daniel Stone
Post by Andy Ritger
Post by Daniel Stone
Secondly, looking at the compositor-drm case, the use of the dumb
buffer to display undefined content as a dummy modeset really makes me
uneasy,
Yes, the use of dumb buffer in this patch series is a kludge. If we
were going to use drmModeSetCrtc + EGLStreams, I think we'd want to
pass no fb to drmModeSetCrtc, but that currently gets rejected by DRM.
Are surface-less modesets intended to be allowable in DRM? I can hunt
that down if that is intended to work. Of course, better to work out
how EGLStreams should cooperate with atomic KMS.
It was definitely an oversight to not zero initialize the dumb buffer.
Right, atomic allows you separate pipe/CRTC configuration from
plane/overlay configuration. So you'd have two options: one is to use
atomic and require the CRTC be configured with planes off before using
Streams to post flips, and the other is to add KMS configuration to
the EGL output.
Yes, I think those are the two general directions, though neither
is great. It seems like you'd want a way to express the EGLStream to
use in a plane of a KMS configuration, to be latched on a subsequent
KMS atomic request. But, one API bleeding into the other, in either
direction, gets ugly.
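
For reference, today the consumer hookup is roughly the following
(EGL_EXT_output_drm plus EGL_EXT_stream_consumer_egloutput); the missing
piece is a way to defer the actual flip into an atomic request:

    /* Look up the EGLOutputLayer for a KMS plane and make it the
     * stream's consumer; presentation then happens on acquire. */
    EGLAttrib attrs[] = { EGL_DRM_PLANE_EXT, (EGLAttrib)plane_id,
                          EGL_NONE };
    EGLOutputLayerEXT layer;
    EGLint n;
    eglGetOutputLayersEXT(dpy, attrs, &layer, 1, &n);
    eglStreamConsumerOutputEXT(dpy, stream, layer);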
Post by Daniel Stone
Though, now I think of it, this effectively precludes one case, which
is scaling a Streams-sourced buffer inside the display controller. In
the GBM case, the compositor gets every buffer, so can configure the
plane scaling in line with buffer display. I don't see how you'd do
that with Streams.
Agreed. I think we'd need something like I described above in order to
solve that within the context of EGLStreams.
Post by Daniel Stone
There's another hurdle to overcome too, which would currently preclude
avoiding the intermediate dumb buffer at all. One of the invariants
the atomic KMS API enforces is that (!!plane->crtc_id ==
!!plane->fb_id), i.e. that a plane cannot be assigned to a CRTC
without an active buffer. So again, we're left with either having the
plane fully configured and active (assigned to a CRTC and displaying,
I assume, a pre-allocated dumb buffer), or pushing more configuration
into Streams - specifically, connecting an EGLOutputLayer to an
EGLOutputPort.
Agreed.
Post by Daniel Stone
Post by Andy Ritger
Post by Daniel Stone
Also, I'm not quite sure how you're testing the compositor-as-consumer
mode: I can't seem to see any EGL extensions which allow you to
connect a Wayland surface as an EGLStream consumer. Do you have
something else unpublished that's being used here, or is this what the
libnvidia-egl-wayland library is for? Or do you just have clients
using EGLSurfaces as normal, which happen to be implemented internally
as EGLStreams? (Also, that the only way to test this is through
proprietary drivers implementing only-just-published extensions not
only makes me very sad, but hugely increases the potential for this to
be inadvertently broken.)
Sorry if this seemed cryptic. You are correct that EGL Wayland clients
just use EGLSurfaces as normal (no Wayland client changes), and that
gets implemented using EGLStreams within libnvidia-egl-wayland.
Sorry, I'd missed this whilst reading through.
Post by Andy Ritger
FWIW, we plan to release the source to libnvidia-egl-wayland
eventually... it has a few driver-specific warts right now, but the
intent is that it is a vendor-independent implementation (though, using
EGLStreams, so...) of EGL_KHR_platform_wayland using a set of EGL API
"wrappers". The goal was to allow window systems to write these EGL
platform bindings themselves, so that each EGL implementation doesn't
have to implement each EGL_KHR_platform_*. Anyway, we'll try to get
libnvidia-egl-wayland cleaned up and released.
Interesting!
Post by Andy Ritger
Post by Daniel Stone
Post by Miguel Angel Vico
Thus, a compositor could produce frames and feed them to an
EGLOutputLayer through an EGLStream for presentation on a display
device.
In a similar way, by attaching a GLTexture consumer to a stream, a
producer (wayland client) could feed frames to a texture, which in
turn can be used by a compositor to prepare the final frame to be
presented.
Quick aside: this reminds me in many unfortunate ways of
GLX_EXT_texture_from_pixmap. tfp gave us the same 'capture stream of
stuff and make it appear in a texture' model as streams, whereas most
of the rest of the world (EGL, Vulkan WSI, Wayland, Android, ChromeOS,
etc) have all moved explicitly _away_ from that model to passing
references to individual buffers; this in many ways brings us back to
tfp.
Is that really an accurate comparison? The texture_from_pixmap extension
let X11 composite managers bind a single X pixmap to an OpenGL texture.
It seems to me what was missing in TFP usage was explicit synchronization
between X and/or OpenGL rendering into the pixmap and OpenGL texturing
from the pixmap.
I'd argue that synchronisation (in terms of serialisation with the
rest of the client's protocol stream) is missing from Streams as well,
at least in mailbox mode.
(As an aside, I wonder if it's properly done in FIFO mode as well; the
compositor may very validly choose not to dequeue a buffer if a
surface is completely occluded. How does Streams then know that it can
submit another frame? Generally we use wl_surface::frame to deal with
this - the equivalent of eglSwapInterval(1) - but it sounds like
Streams relies more on strictly-paired internal queue/dequeue pairing
in FIFO mode. Maybe this isn't true.)
Right: in the case that the compositor wants to drop a frame, it would
need to dequeue it from the FIFO if it wants the client to be able to
produce a new frame. Otherwise, as I understand it, the client would
block in its next call to eglSwapBuffers().
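In compositor terms, that dequeue-to-unblock step might look like this
(a sketch using the EGL_KHR_stream_consumer_gltexture entry points;
surface_is_occluded() is a hypothetical helper):

    if (surface_is_occluded(surface)) {
        /* Not going to draw this frame, but drain the FIFO anyway so
         * the client doesn't block in its next eglSwapBuffers(). */
        if (eglStreamConsumerAcquireKHR(dpy, stream))
            eglStreamConsumerReleaseKHR(dpy, stream);
    } else {
        /* Latch the frame into the bound GL texture and composite. */
        eglStreamConsumerAcquireKHR(dpy, stream);
    }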
Post by Daniel Stone
Post by Andy Ritger
Post by Daniel Stone
Post by Miguel Angel Vico
Whenever EGL_EXT_device_drm extension is present, EGLDevice can
be used to enumerate and access DRM KMS devices, and EGLOutputLayer
to enumerate and access DRM KMS crtcs and planes.
Again, the enumeration isn't so much used as bypassed. The original
enumeration is used, and all we do with the EGL objects is a) list all
of them, b) filter them to find the one we already have, and c)
perhaps replace their internal representation of the device with the
one we already have.
That's fair in the context of this patch set.
In general, EGLDevice provides device enumeration for other use cases
where it is the basis for bootstrapping. Maybe we could better reconcile
udev and EGLDevice in the patch set, but some of this is a natural, though
unfortunate, artifact of correlating objects between two enumeration APIs.
Mind you, this wasn't intended as a criticism, just noting that the
commit message didn't accurately describe the code.
Got it; thanks.
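For the record, the correlation between the two enumeration APIs boils
down to something like this (a sketch; the eglQuery*EXT entry points
must be fetched with eglGetProcAddress(), and drm_path is the node udev
already gave us):

    #include <string.h>

    EGLDeviceEXT devices[16], match = EGL_NO_DEVICE_EXT;
    EGLint n = 0;

    eglQueryDevicesEXT(16, devices, &n);
    for (EGLint i = 0; i < n; i++) {
        /* EGL_DRM_DEVICE_FILE_EXT comes from EGL_EXT_device_drm */
        const char *file =
            eglQueryDeviceStringEXT(devices[i], EGL_DRM_DEVICE_FILE_EXT);
        if (file && strcmp(file, drm_path) == 0) { /* e.g. "/dev/dri/card0" */
            match = devices[i];
            break;
        }
    }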
Post by Daniel Stone
Post by Andy Ritger
Post by Daniel Stone
I'd like to look at the elephant in the room, which is why you're
using this in the first place (aside from general NVIDIA enthusiasm
for encapsulating everything within EGL Streams/Output/Device/etc,
dating back many years). Andy/Aaron, you've said that you found GBM to
be inadequate, and I'd like to find out explicitly how.
Thanks. This is the real heart of the debate.
Yes!
Post by Andy Ritger
Post by Daniel Stone
Through a few
'We can't choose an optimal rendering configuration, because we don't
know how it's going to be used' - (almost completely) untrue. The FD
you pass to gbm_device_create is that of the KMS device, a gbm_surface
contains information as to how the plane (primary or overlay) will be
configured,
Maybe I'm not looking in the right place, but where does gbm_surface get
the intended plane configuration? Are there other display-related flags
beside GBM_BO_USE_SCANOUT? Then again, the particular plane doesn't
impact us for current GPUs.
Well, nowhere. By current plane configuration, I assume you're (to the
extent that you can discuss it) talking about asymmetric plane
capabilities, e.g. support for disjoint colour formats, scaling units,
etc? As Dan V says, I still see Streams as a rather incomplete fix to
this, given that plane assignment is pre-determined: what do you do
when your buffers are configured as optimally as possible, but the
compositor has picked the 'wrong' plane? I really think you need
something like HWC to rewrite your scene graph into the optimal setup.
Yes, encapsulating the composition within something more like HWC would
be ideal to allow for optimal use of planes.

My questions above were prompted by your statement that "a gbm_surface
contains information as to how the plane... will be configured." Maybe I
misunderstood what you meant by that.

In any case, I didn't mean to imply that EGLStreams arbitrates planes.
Rather, the intent with EGLStreams is to allow the driver to make the
most intelligent buffer allocation decisions, given the compositor's
current plane configuration.
Post by Daniel Stone
Post by Andy Ritger
Post by Daniel Stone
and an EGLDisplay lets you tie the rendering and scanout
devices together. What more information do you need? It's true that we
don't have a way to select individual rendering devices at the moment,
but as said earlier, passing an EGLDevice as an attrib to
GetPlatformDisplay would resolve that, as you would have the render
device identified by the EGLDevice and the scanout device identified
by the gbm_device. At that point, you have the full pipeline and can
determine the optimal configuration.
Beyond choosing optimal rendering configuration, there is arbitration of
the scarce resources needed for optimal rendering configuration. E.g.,
for Wayland compositor flipping to client-produced buffers, presumably the
client's buffer needs to be allocated with GBM_BO_USE_SCANOUT. NVIDIA's
display hardware requires physically contiguous buffers, so we wouldn't
want clients to _always_ allocate buffers with the GBM_BO_USE_SCANOUT
flag. It would be nice to have feedback between the EGL driver instance
in the compositor and the EGL driver running in the client, to know how
the buffer is going to be used by the Wayland compositor.
I imagine other hardware has even more severe constraints on displayable
memory, though, so maybe I'm misunderstanding something about how buffers
are shared between wayland clients and compositors?
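In GBM terms, the trade-off Andy describes is just a question of which
usage flags the client's winsys allocation requests, e.g. (a sketch):

    /* Scanout-capable: on NVIDIA hardware this means physically
     * contiguous, so only wanted when the compositor will really flip. */
    struct gbm_surface *flippable =
        gbm_surface_create(gbm, width, height, GBM_FORMAT_XRGB8888,
                           GBM_BO_USE_SCANOUT | GBM_BO_USE_RENDERING);

    /* Texture-only: cheaper, but the compositor can never flip to it. */
    struct gbm_surface *texture_only =
        gbm_surface_create(gbm, width, height, GBM_FORMAT_XRGB8888,
                           GBM_BO_USE_RENDERING);

The missing piece is the feedback loop telling the client which of the
two to pick.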
Ah! This is something I've very much had in mind for quite a while -
but keep getting pre-empted - and didn't bring up as it didn't seem
implemented in the current patchset.
It was abstracted too well :)

I think you spelled it out below, but I'd love to hear any other thoughts
you have for the right direction on this class of resource arbitration.
Post by Daniel Stone
(IIRC,
jajones had some code to allow you to retarget Streams at different
consumers, but he's on leave.)
Yes, I think James had some draft specs related to this. He'll be back
from paternity leave fairly soon; it will be good to get him involved
in this discussion.
Post by Daniel Stone
Also, I should add that there's nothing requiring clients to use GBM
to allocate. The client EGLSurface implementation is free to do purely
internal allocations that are only accessible to it, if it wants to;
gbm_bo_import would then note that the buffer is not usable for
scanout and fail the import, leaving the compositor to fall back to
EGLImage.
Got it.
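A sketch of that fallback path on the compositor side (error handling
elided; buffer_resource is the client's wl_buffer resource):

    struct gbm_bo *bo =
        gbm_bo_import(gbm, GBM_BO_IMPORT_WL_BUFFER,
                      buffer_resource, GBM_BO_USE_SCANOUT);
    if (!bo) {
        /* Not scanout-capable: texture from it via EGLImage instead. */
        EGLImageKHR image =
            eglCreateImageKHR(dpy, EGL_NO_CONTEXT, EGL_WAYLAND_BUFFER_WL,
                              (EGLClientBuffer) buffer_resource, NULL);
        /* ... glEGLImageTargetTexture2DOES() and composite on the GPU */
    }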
Post by Daniel Stone
Post by Andy Ritger
This ties into the next point...
Post by Daniel Stone
'We don't know when to schedule decompression, because there's no
explicit barrier' - completely untrue. eglSwapBuffers is that barrier.
For example, in Freescale i.MX6, the Vivante GPU and Freescale IPU
(display controller) do not share a single common format between GPU
render targets and IPU scanout sources, so require a mandatory
detiling pass in between render and display. These work just fine with
gbm with that pass scheduled by eglSwapBuffers. This to me seems
completely explicit, unless there was something else you were meaning
... ?
The Vivante+Freescale example is a good one, but it would be more
interesting if they shared /some/ formats and you could only use those
common formats in /some/ cases.
That's also fairly common, particularly for tiling. Intel has more
tiling modes than I can remember, of which only one (X-tiling) is a
valid source for scanout. As you say, physical contiguity is also a
valid requirement, plus pitch alignment.
Post by Andy Ritger
I think a lot of the concern is about passing client-produced frames
all the way through to scanout (i.e., zero-copy). E.g., if the wayland
client is producing frames that the wayland compositor is going to use
as a texture, then we don't want the client to decompress as part of its
eglSwapBuffers: the wayland compositor will texture from the compressed
frame for best performance. But, if the wayland compositor is going to
flip to the surface, then we would want the client to decompress during
its eglSwapBuffers.
Yes, very much so. Taking the Freescale example, you want the client
to do a detiling blit during its swap if the surface is a valid
scanout target, but not at all if it's just getting textured by the
GPU anyway. Similarly, Intel wants to allocate X-tiled if scanout is
possible, but otherwise it wants to be Y/Yf/...-tiled.
That is good to know. How are those decisions made today?
Post by Daniel Stone
Post by Andy Ritger
The nice thing about EGLStreams here is that if the consumer (the Wayland
compositor) wants to use the content in a different way, the producer
must be notified first, in order to produce something suitable for the
new consumer.
I believe this is entirely doable with GBM right now, taking advantage
of the fact that libgbm.so and libEGL.so must be as tightly paired as
libEGL.so and libGLESv2.so. For all of these, read 'wl_drm' as 'wl_drm
or its equivalent interface in other implementations'.
Firstly, create a new interface in wl_drm to represent a swapchain (in
the Vulkan sense), and modify its buffer-creation requests to take a
swapchain parameter. This we can do without penalty, since the only
users (aside from VA-API, which is really broken and also hopefully
soon to lose its Wayland sink anyway) are EGL_EXT_platform_wayland and
EGL_WL_bind_wayland_display, both within the same DSO.
Secondly, instrument gbm_bo_import's wl_buffer path (proxy for intent
to use a buffer for direct scanout) and EGLImage's
EGL_WAYLAND_BUFFER_WL path (proxy for intent to use via GPU
composition) to determine what the compositor is actually doing with
these buffers, and use that to store target/intent in the swapchain.
Thirdly, when the target/intent changes (e.g. 'was scanout every
frame, has been EGLImage for the last 120 frames'), send an event down
to the client to let it know to modify its allocation. The combination
of EGL/GBM are in the correct place to determine this, since between
them they already have to know the intersection of capabilities
between render and scanout.
Thanks. The suggestion in the second step is particularly interesting.
I haven't tried to poke any holes in the proxy-for-intent cases, yet.
Do you think those inferences are reliable?
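To sketch the bookkeeping the second and third steps imply (every name
here is hypothetical; the real thing would live inside the paired
EGL/GBM DSO):

    enum intent { INTENT_UNKNOWN, INTENT_SCANOUT, INTENT_TEXTURE };

    struct swapchain {
        enum intent current;          /* what the client allocated for */
        unsigned frames_of_mismatch;  /* consecutive frames used otherwise */
    };

    /* Called from gbm_bo_import's wl_buffer path (INTENT_SCANOUT) and
     * from EGLImage's EGL_WAYLAND_BUFFER_WL path (INTENT_TEXTURE). */
    static void note_usage(struct swapchain *sc, enum intent used)
    {
        if (used == sc->current) {
            sc->frames_of_mismatch = 0;
            return;
        }
        /* 'was scanout every frame, has been EGLImage for the last
         * 120 frames': tell the client to reallocate. */
        if (++sc->frames_of_mismatch >= 120) {
            sc->current = used;
            sc->frames_of_mismatch = 0;
            send_reallocate_event(sc, used); /* hypothetical protocol event */
        }
    }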
Post by Daniel Stone
That still doesn't solve the optimal-display-configuration problem -
that you have generic code determining not only the display strategy
(scanout vs. GPU composition) but also the exact display controller
configuration - but neither does EGLStreams, or indeed anything
current short of HWC.
Do you see any problem with doing that within GBM? It's not actually
done yet, but then again, neither is direct scanout through Streams.
;)
This definitely seems worth exploring.
Post by Daniel Stone
Post by Andy Ritger
Post by Daniel Stone
'Width, height, pitch and format aren't enough information' - this is
true, but not necessarily relevant. I'm not sure what the source of
this actually is: is it the gbm_bo_get_*() APIs? If so, yes, they need
to be extended with a gbm_bo_get_modifier() call, which would allow
you to get the DRM format modifier to describe tiling/compression/et
al (as well as perhaps being extended to allow you to extract multiple
buffers/planes, e.g. to attach auxiliary compression buffers). If it's
not gbm, what actually is it? The only other place I can think of
(suggested by Pekka, I think) was the wl_drm protocol, which it should
be stressed is a) not required in any way by Wayland, b) not a
published/public protocol, c) not a stable protocol. wl_drm just
happens to be the way that Mesa shares buffers, just as wl_viv is how
Vivante's proprietary driver shares buffers, and mali_buffer_sharing
is how the Mali driver does it. Since the server side is bound by
eglBindWaylandDisplayWL and the client side is also only used through
EGL, there is _no_ requirement for you to also implement wl_drm. As it
is a hidden private Mesa protocol, there is also no requirement for
the protocol to remain stable.
I agree that wl_drm doesn't factor into it.
Maybe some of this is my confusion over what parts of gbm.h are
application-facing, and what parts are driver-facing? We, and
presumably most hardware vendors, would want the ability to associate
arbitrary metadata with gbm_bo's, but most of that metadata is
implementation-specific, and not really something an application should
be looking at without sacrificing portability.
All of gbm.h is user-facing; how you implement that API is completely
up to you, including arbitrary metadata. For instance, it's the driver
that allocates its own struct gbm_surface/gbm_bo/etc (which is
opaque), so it can do whatever it likes in terms of metadata. Is there
anything in particular you're thinking of that you're not sure you'd
be able to store portably?
Might also be worth striking a common misconception here: the Mesa GBM
implementation is _not_ canonical. gbm.h is the user-facing API you
have to implement, but beyond that, you don't need to be implemented
by Mesa's src/gbm/. As the gbm.h types are all opaque, I'm not sure
what you couldn't express/hide/store - do you have any examples?
Good points. No, I don't have any examples off hand of things that
couldn't be encapsulated within that.
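As a concrete (and entirely hypothetical) illustration of the point
about opaque types, a vendor backend could define the type however it
likes, since gbm.h never exposes the layout:

    #include <stdint.h>

    /* gbm.h only hands applications a 'struct gbm_bo *'; the layout
     * behind it is the implementation's business. */
    struct gbm_bo {
        struct gbm_device *dev;
        uint32_t width, height, stride, format;
        /* vendor-private metadata, invisible to applications: */
        uint64_t tiling_mode;              /* hypothetical */
        int needs_decompress_for_scanout;  /* hypothetical */
        void *vendor_alloc_handle;         /* hypothetical */
    };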

I agree that the Mesa GBM implementation is not canonical. Though, it
would be nice to avoid libgbm.so collisions. Let me know if I should
ask this separately on, e.g., mesa-dev, but would it be reasonable to
treat Mesa's libgbm as the "vendor neutral" library? It looks like
there are currently two opportunities to load into libgbm:

(a) Load as a "backend" DSO (i.e., get loaded by
mesa/src/gbm/main/backend.c:_gbm_create_device()).

(b) Load as a DRI driver by the DRI libgbm backend (i.e., get loaded
by mesa/src/gbm/backends/dri/gbm_dri.c).

For purposes of vendor-specific opaque data, it looks like (a) would
make the most sense. However, (b) currently conveniently infers a DSO
name to load, by querying the name of the DRM driver that corresponds
to the provided fd. Maybe it would make sense to hoist some of that
inference logic from (b) to (a)? It probably also depends on which of
(a) or (b) we'd consider a stabler ABI?
Post by Daniel Stone
Post by Andy Ritger
Post by Daniel Stone
'EGLStreams is the direction taken in Vulkan' - I would argue not. IMO
the explicit buffer management on the client side does not parallel
EGLStreams, and notably there is no equivalent consumer interface
offered on the server side, but instead the individual-buffer-driven
approach is taken. It's true that VK_WSI_display_swapchain does exist
and does match the EGLStreams model fairly closely, but also that it
does not have universal implementation: the Intel 'anv' Mesa-based
driver does not implement display_swapchain, instead having an
interface to export a VkImage as a dmabuf. It's true that the latter
is not optimal (it lacks the explicit targeting required to determine
the most optimal tiling/compression strategy), but OTOH it is
precedent for explicitly avoiding the
VK_WSI_display_swapchain/EGLStreams model for Vulkan on KMS, just as
GBM avoids it for EGL on KMS.
From your perspective, what would be more optimal than VkImage+dmabuf?
Well, it's pretty much on par with GBM-compositor-Wayland-client and
an EGLStreams pipeline ending in an EGLOutput. Not having something
like HWC means that you can't determine the optimal plane-allocation
strategy.
Post by Andy Ritger
Post by Daniel Stone
Agreed. One of the things I've been incredibly happy with is how our
platform has managed to stay completely generic and vendor-neutral so
far, and I'd love to preserve that.
I don't think you'll find any disagreement to that from NVIDIA, either.
I apologize if the EGLStreams proposal gave the impression of a
vendor-private solution. That wasn't the intent. The EGLStream family
of extensions are, after all, an open specification that any EGL vendor
can implement. If there are aspects of any of these EGL extensions that
seem useful, I'd hope that Mesa would be willing to adopt them.
Indeed, this wasn't to cast any aspersions on how you guys have
developed Streams. Having it out there and having these patches has
really been tremendously useful.
Great.
Post by Daniel Stone
Post by Andy Ritger
We (NVIDIA) clearly think EGLStreams is a good direction for expressing
buffer sharing semantics. In our ideal world, everyone would implement
these extensions and Wayland compositors would migrate to using them as
the generic vendor-neutral mechanism for buffer sharing :)
But here's where my problem lies. At the moment, the 'how do I
Wayland' story is very straightforward, and not entirely
coincidentally similar to ChromeOS's: you implement GBM+KMS, you
implement the ~25 LoC of libwayland-egl, you implement
EGL_EXT_platform_{gbm,wayland}, and ... that's it. Introducing Streams
as an alternate model is certainly interesting, and I understand why
you would do it, but having it as the sole option muddies the 'how do
I Wayland' story significantly.
Getting away from the vendor-bound DDX model was something we were
desperate to do (see also xf86-video-modesetting landing on GBM+EGL),
and I'd really just like to avoid that becoming 'well, for most
platforms you do this, but for this platform / these platforms, you do
this instead ...'.
Yes, fair and understood.
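For reference, the '~25 LoC of libwayland-egl' really is about this
much (a sketch along the lines of Mesa's implementation; the struct
layout is the informal ABI shared with the EGL driver):

    #include <stdlib.h>
    #include <wayland-client.h>

    struct wl_egl_window {
        struct wl_surface *surface;
        int width, height;
        int dx, dy;
        int attached_width, attached_height;
        void *private;  /* owned by the EGL implementation */
        void (*resize_callback)(struct wl_egl_window *, void *);
    };

    struct wl_egl_window *
    wl_egl_window_create(struct wl_surface *surface, int width, int height)
    {
        struct wl_egl_window *win = calloc(1, sizeof *win);
        if (win) {
            win->surface = surface;
            win->width = width;
            win->height = height;
        }
        return win;
    }

    void
    wl_egl_window_resize(struct wl_egl_window *win,
                         int width, int height, int dx, int dy)
    {
        win->width = width;
        win->height = height;
        win->dx = dx;
        win->dy = dy;
        if (win->resize_callback)
            win->resize_callback(win, win->private);
    }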

Thanks,
- Andy
Post by Daniel Stone
Cheers,
Daniel
Miguel Angel Vico
2016-04-02 10:12:05 UTC
Permalink
A couple additions to Andy's comments inline.

On Fri, 1 Apr 2016 17:28:17 -0700
Post by Daniel Stone
Post by Andy Ritger
Secondly, looking at the compositor-drm case, the use of the dumb
buffer to display undefined content as a dummy modeset really
makes me uneasy,
Yes, the use of dumb buffer in this patch series is a kludge. If
we were going to use drmModeSetCrtc + EGLStreams, I think we'd
want to pass no fb to drmModeSetCrtc, but that currently gets
rejected by DRM. Are surface-less modesets intended to be
allowable in DRM? I can hunt that down if that is intended to
work. Of course, better to work out how EGLStreams should
cooperate with atomic KMS.
It was definitely an oversight to not zero initialize the dumb buffer.
Post by Daniel Stone
(As an aside, I wonder if it's properly done in FIFO mode as well;
the compositor may very validly choose not to dequeue a buffer if a
surface is completely occluded. How does Streams then know that it
can submit another frame? Generally we use wl_surface::frame to
deal with this - the equivalent of eglSwapInterval(1) - but it
sounds like Streams relies more on strictly-paired internal
queue/dequeue pairing in FIFO mode. Maybe this isn't true.)
Right: in the case that the compositor wants to drop a frame, it would
need to dequeue it from the FIFO if it wants the client to be able to
produce a new frame. Otherwise, as I understand it, the client would
block in its next call to eglSwapBuffers().
That's correct: in FIFO mode the EGL producer will block in
eglSwapBuffers() if the FIFO is full.

IIUC, it's the Wayland client's responsibility to request wl_surface::frame
notifications. Our implementation doesn't forbid the client from making
use of wl_surface::frame to skip eglSwapBuffers() calls, and thereby
avoid blocking in eglSwapBuffers().

FWIW, in my next round of weston patches I'm planning to better align
our eglstreams implementation with the non-eglstreams one when it comes
to updating client texture content. If I'm reading weston code correctly,
that happens at buffer attach time, and weston calls into the renderer's
attach regardless of whether the client surface is completely occluded.

The only behavior mismatch I see between the eglstream and
non-eglstream implementations arises when a compositor doesn't update
the client texture content because it's occluded and the client
application doesn't make use of wl_surface::frame. In that case, the
eglstream implementation would block if in FIFO mode, while the
non-eglstream one wouldn't.
Post by Andy Ritger
Post by Daniel Stone
Mind you, this wasn't intended as a criticism, just noting that the
commit message didn't accurately describe the code.
Got it; thanks.
The intent was to give a briefing about those extensions/EGL structures
so people could better understand our patches. It doesn't mean we
replaced the current enumeration mechanism in compositor-drm.c; rather,
we use it to find the EGLDevice corresponding to the selected DRM
device.

Anyway, I'll try to make it clearer in the commit message of the
updated patch.

Thanks,
Miguel.
--
Miguel

Daniel Stone
2016-04-04 15:35:15 UTC
Permalink
Hi Miguel,
Post by Miguel Angel Vico
On Fri, 1 Apr 2016 17:28:17 -0700
Post by Andy Ritger
Post by Daniel Stone
(As an aside, I wonder if it's properly done in FIFO mode as well;
the compositor may very validly choose not to dequeue a buffer if a
surface is completely occluded. How does Streams then know that it
can submit another frame? Generally we use wl_surface::frame to
deal with this - the equivalent of eglSwapInterval(1) - but it
sounds like Streams relies more on strictly-paired internal
queue/dequeue pairing in FIFO mode. Maybe this isn't true.)
Right: in the case that the compositor wants to drop a frame, it would
need to dequeue it from the FIFO if it wants the client to be able to
produce a new frame. Otherwise, as I understand it, the client would
block in its next call to eglSwapBuffers().
That's correct: in FIFO mode the EGL producer will block in
eglSwapBuffers() if the FIFO is full.
IIUC, it's the Wayland client's responsibility to request wl_surface::frame
notifications. Our implementation doesn't forbid the client from making
use of wl_surface::frame to skip eglSwapBuffers() calls, and thereby
avoid blocking in eglSwapBuffers().
Yes-ish; it depends on whether you define 'client' at the process
boundary, or the DSO boundary. ;)

Taking the definition at the DSO boundary (i.e. 'Wayland client' as
something separate to the EGL Wayland implementation), then Wayland
clients _may_ call wl_surface_frame themselves to ensure they don't
block in SwapBuffers. However, the EGL implementation must also - when
in SwapInterval(1) mode - call wl_surface_frame itself, so that the
classic dumb rendering loop of while (true) { glDraw*();
eglSwapBuffers(); } is throttled to the compositor's repaint loop.
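A sketch of that throttling as it would sit inside the EGL winsys code
(real implementations use a private event queue rather than the default
one; simplified here):

    #include <wayland-client.h>

    static void frame_done(void *data, struct wl_callback *cb, uint32_t time)
    {
        *(int *)data = 0;
        wl_callback_destroy(cb);
    }

    static const struct wl_callback_listener frame_listener = { frame_done };

    /* Called from eglSwapBuffers() when the swap interval is 1. */
    static void throttle_to_repaint(struct wl_display *display,
                                    struct wl_surface *surface)
    {
        int pending = 1;
        struct wl_callback *cb = wl_surface_frame(surface);
        wl_callback_add_listener(cb, &frame_listener, &pending);
        wl_surface_commit(surface);  /* attach/damage elided */
        while (pending)
            wl_display_dispatch(display);
    }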
Post by Miguel Angel Vico
FWIW, in my next round of weston patches I'm planning to better align
our eglstreams implementation with the non-eglstreams one when it comes
to updating client texture content. If I'm reading weston code correctly,
that happens at buffer attach time, and weston calls into the renderer's
attach regardless of whether the client surface is completely occluded.
Right, reading through it now, it seems like always dequeuing in
renderer->attach() will solve this (non-)issue.
Post by Miguel Angel Vico
The only behavior mismatch I see between the eglstream and
non-eglstream implementations arises when a compositor doesn't update
the client texture content because it's occluded and the client
application doesn't make use of wl_surface::frame. In that case, the
eglstream implementation would block if in FIFO mode, while the
non-eglstream one wouldn't.
Both Weston and Mutter can support this (i.e. dequeue at
attach+commit, regardless of repaint), so in practical terms you'll be
fine if you just need to support those two.

Cheers,
Daniel
Daniel Stone
2016-04-04 15:27:56 UTC
Permalink
Hi,
Post by Andy Ritger
Post by Daniel Stone
Post by Andy Ritger
Also, mailbox mode versus FIFO mode should essentially equate to Vsync
off versus Vsync on, respectively. It shouldn't have anything to do
with the benefits of streams, but mailbox mode is a nice feature for
benchmarking games/simulations or naively displaying your latest &
greatest content without tearing.
I agree it's definitely a nice thing to have, but it does bring up the
serialisation issue: we expect any configuration performed by the
client (say, wl_surface::set_opaque_area to let the compositor know
where it can disable blending) to be fully in-line with buffer
attachment. The extreme case of this is resize, but there are quite a
few valid cases where you need serialisation.
I don't know quite off the top of my head how you'd support mailbox
mode with Streams, given this constraint - you need three-way feedback
between the compositor (recording all associated surface state,
including subsurfaces), clients (recording the surface state valid
when that buffer was posted), and the Streams implementation
(determining which frames to dequeue, which to discard and return to
the client, etc).
It is possible we don't get that all completely right in our implementation, yet.
Again this comes down to the synchronisation. In this case, assuming a
mailbox stream:
- wl_egl_surface_resize(w1, h1)
- gl*()
- eglSwapBuffers() <- commit 1
- wl_egl_surface_resize(w2, h2)
- gl*()
- eglSwapBuffers() <- commit 2

For this, you would need some kind of synchronisation, to ensure that
processing commit 1 didn't pick up on the differently-sized frames for
commit 2.
Post by Andy Ritger
Post by Daniel Stone
Right, atomic allows you separate pipe/CRTC configuration from
plane/overlay configuration. So you'd have two options: one is to use
atomic and require the CRTC be configured with planes off before using
Streams to post flips, and the other is to add KMS configuration to
the EGL output.
Yes, I think those are the two general directions, though neither
are great. It seems like you'd want a way to express the EGLStream to
use in a plane of a KMS configuration, to be latched on a subsequent
KMS atomic request. But, one API bleeding into the other, in either
direction, gets ugly.
Post by Daniel Stone
Though, now I think of it, this effectively precludes one case, which
is scaling a Streams-sourced buffer inside the display controller. In
the GBM case, the compositor gets every buffer, so can configure the
plane scaling in line with buffer display. I don't see how you'd do
that with Streams.
Agreed. I think we'd need something like I described above in order to
solve that within the context of EGLStreams.
Hm, so you'd effectively want to hand an atomic-KMS request object to
Streams, requesting that it stage its current state into that request.
The pending state is private ABI for libdrm, so doing post-hoc
rewrites wouldn't really work.

One detail which comes to mind: our assign_planes hook is what's
responsible for scanning the scene graph and pulling things out into
planes. We do a test request for each plane, to iteratively determine
(via trial and error) which scanout-candidate buffers we can and can't
hoist into planes. This can fail for any number of reasons (exceeded
global bandwidth limits, run out of shared scaler/detiling units, too
many planes on a single scanline, etc etc), so one key requirement we
have is that this fail gracefully and fall back to EGLImage
composition.

Would this work without client intervention, i.e. one buffer used in a
(failed) kernel request and then subsequently used for GPU
composition?
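That iterative probing looks roughly like this (a sketch;
stage_plane_props() is a hypothetical helper that adds the
FB_ID/CRTC_ID/SRC_*/CRTC_* properties for one candidate view):

    drmModeAtomicReq *req = drmModeAtomicAlloc();

    for (int i = 0; i < n_candidates; i++) {
        int cursor = drmModeAtomicGetCursor(req);

        stage_plane_props(req, &candidates[i]);
        if (drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_TEST_ONLY, NULL)) {
            /* Bandwidth/scaler/scanline limit hit: roll this plane back
             * and leave the view for GPU (EGLImage) composition. */
            drmModeAtomicSetCursor(req, cursor);
            candidates[i].use_gpu = 1;
        }
    }
    /* the request that survived testing is committed for real later */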
Post by Andy Ritger
Post by Daniel Stone
I'd argue that synchronisation (in terms of serialisation with the
rest of the client's protocol stream) is missing from Streams as well,
at least in mailbox mode.
(As an aside, I wonder if it's properly done in FIFO mode as well; the
compositor may very validly choose not to dequeue a buffer if a
surface is completely occluded. How does Streams then know that it can
submit another frame? Generally we use wl_surface::frame to deal with
this - the equivalent of eglSwapInterval(1) - but it sounds like
Streams relies more on strictly-paired internal queue/dequeue pairing
in FIFO mode. Maybe this isn't true.)
Right: in the case that the compositor wants to drop a frame, it would
need to dequeue it from the FIFO if it wants the client to be able to
produce a new frame. Otherwise, as I understand it, the client would
block in its next call to eglSwapBuffers().
Right, this should be doable with the existing attach hooks. I had
some concerns about subsurface commits, but am not sure they hold up.
Either way, they're fixable with Weston.
Post by Andy Ritger
Post by Daniel Stone
Post by Andy Ritger
Maybe I'm not looking in the right place, but where does gbm_surface get
the intended plane configuration? Are there other display-related flags
beside GBM_BO_USE_SCANOUT? Then again, the particular plane doesn't
impact us for current GPUs.
Well, nowhere. By current plane configuration, I assume you're (to the
extent that you can discuss it) talking about asymmetric plane
capabilities, e.g. support for disjoint colour formats, scaling units,
etc? As Dan V says, I still see Streams as a rather incomplete fix to
this, given that plane assignment is pre-determined: what do you do
when your buffers are configured as optimally as possible, but the
compositor has picked the 'wrong' plane? I really think you need
something like HWC to rewrite your scene graph into the optimal setup.
Yes, encapsulating the composition within something more like HWC would
be ideal to allow for optimal use of planes.
My questions above were prompted by your statement that "a gbm_surface
contains information as to how the plane... will be configured." Maybe I
misunderstood what you meant by that.
Oh right: I was just talking about basic dimensions and format. Sorry
for the confusion. Which other attributes would you like to see? I
guess scaling is a fairly obvious one.
Post by Andy Ritger
Post by Daniel Stone
Post by Andy Ritger
I think a lot of the concern is about passing client-produced frames
all the way through to scanout (i.e., zero-copy). E.g., if the wayland
client is producing frames that the wayland compositor is going to use
as a texture, then we don't want the client to decompress as part of its
eglSwapBuffers: the wayland compositor will texture from the compressed
frame for best performance. But, if the wayland compositor is going to
flip to the surface, then we would want the client to decompress during
its eglSwapBuffers.
Yes, very much so. Taking the Freescale example, you want the client
to do a detiling blit during its swap if the surface is a valid
scanout target, but not at all if it's just getting textured by the
GPU anyway. Similarly, Intel wants to allocate X-tiled if scanout is
possible, but otherwise it wants to be Y/Yf/...-tiled.
That is good to know. How are those decisions made today?
The dumbest way possible: Intel and AMD drivers just force all winsys
buffers to be scanout-compatible, partly as a hangover from X11 where
it was a lot more complicated to schedule composition. Freescale is
as-yet unresolved, but I believe it goes the opposite direction and
never aims for scanout-compatible buffers, except when sat directly on
top of GBM. Something I've been hoping to get to, but endlessly
pre-empted.

I agree it's a massive issue though and something we need to get fixed properly.
Post by Andy Ritger
Post by Daniel Stone
I believe this is entirely doable with GBM right now, taking advantage
of the fact that libgbm.so and libEGL.so must be as tightly paired as
libEGL.so and libGLESv2.so. For all of these, read 'wl_drm' as 'wl_drm
or its equivalent interface in other implementations'.
Firstly, create a new interface in wl_drm to represent a swapchain (in
the Vulkan sense), and modify its buffer-creation requests to take a
swapchain parameter. This we can do without penalty, since the only
users (aside from VA-API, which is really broken and also hopefully
soon to lose its Wayland sink anyway) are EGL_EXT_platform_wayland and
EGL_WL_bind_wayland_display, both within the same DSO.
Secondly, instrument gbm_bo_import's wl_buffer path (proxy for intent
to use a buffer for direct scanout) and EGLImage's
EGL_WAYLAND_BUFFER_WL path (proxy for intent to use via GPU
composition) to determine what the compositor is actually doing with
these buffers, and use that to store target/intent in the swapchain.
Thirdly, when the target/intent changes (e.g. 'was scanout every
frame, has been EGLImage for the last 120 frames'), send an event down
to the client to let it know to modify its allocation. The combination
of EGL/GBM are in the correct place to determine this, since between
them they already have to know the intersection of capabilities
between render and scanout.
Thanks. The suggestion in the second step is particularly interesting.
I haven't tried to poke any holes in the proxy-for-intent cases, yet.
Do you think those inferences are reliable?
Reliable-ish. The gbm_bo_import part is entirely reliable, since that
does only get called in assign_planes, when we've determined that we
would like to use that view as a scanout target. EGLImages will always
be created at attach time, so that's not a determination of intent,
_but_ as the configuration can change at any time without the client
posting new buffers, we do need the buffer to be EGL/GLES-compatible
as our lowest common denominator anyway, so.
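(A sketch of that import path, modelled on Weston's drm backend -
error handling elided, and the wl_buffer resource pointer is a
placeholder:)

    #include <gbm.h>

    static struct gbm_bo *
    try_scanout(struct gbm_device *gbm, void *wl_buffer_resource)
    {
        /* Only called once assign_planes has decided it wants to scan
         * this client buffer out - that call is the "intent" proxy: */
        struct gbm_bo *bo =
            gbm_bo_import(gbm, GBM_BO_IMPORT_WL_BUFFER,
                          wl_buffer_resource, GBM_BO_USE_SCANOUT);
        /* NULL => not scanout-capable; the compositor falls back to
         * texturing via eglCreateImageKHR(dpy, EGL_NO_CONTEXT,
         * EGL_WAYLAND_BUFFER_WL, wl_buffer_resource, NULL). */
        return bo;
    }
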

(All of the above that I'm discussing is specific to Weston. Mutter
does not support composition bypass due to internal architectural
issues - its deep tie to Clutter's scene graph; Enlightenment are
still working heavily on their KMS backend and haven't got to that
point yet; and I'm not sure KWin does either.)
Post by Andy Ritger
Post by Daniel Stone
That still doesn't solve the optimal-display-configuration problem -
that you have generic code determining not only the display strategy
(scanout vs. GPU composition) as well as the exact display controller
configuration - but neither does EGLStreams, or indeed anything
current short of HWC.
Do you see any problem with doing that within GBM? It's not actually
done yet, but then again, neither is direct scanout through Streams.
;)
This definitely seems worth exploring.
Great! Let me know if I can be of any use, if you do end up exploring
this angle.
Post by Andy Ritger
Post by Daniel Stone
Might also be worth striking a common misconception here: the Mesa GBM
implementation is _not_ canonical. gbm.h is the user-facing API you
have to implement, but beyond that, you don't need to be implemented
by Mesa's src/gbm/. As the gbm.h types are all opaque, I'm not sure
what you couldn't express/hide/store - do you have any examples?
Good points. No, I don't have any examples off hand of things that
couldn't be encapsulated within that.
I agree that the Mesa GBM implementation is not canonical. Though, it
would be nice to avoid libgbm.so collisions.
Oh, yes. We should probably avoid creating new glvnd-type issues for
ourselves, yes ...
Post by Andy Ritger
Let me know if I should
ask this separately on, e.g., mesa-dev, but would it be reasonable to
treat Mesa's libgbm as the "vendor neutral" library? It looks like
a vendor implementation could plug in at either of two points:
(a) Load as a "backend" DSO (i.e., get loaded by
mesa/src/gbm/main/backend.c:_gbm_create_device()).
(b) Load as a DRI driver by the DRI libgbm backend (i.e., get loaded
by mesa/src/gbm/backends/dri/gbm_dri.c).
For purposes of vendor-specific opaque data, it looks like (a) would
make the most sense. However, (b) currently conveniently infers a DSO
name to load, by querying the name of the DRM driver that corresponds
to the provided fd. Maybe it would make sense to hoist some of that
inference logic from (b) to (a)? It probably also depends on which of
(a) or (b) we'd consider a stabler ABI?
Yes, I'd suggest that (a) would be the better way to go, with
backend/loader logic pulled up as appropriate. egl_dri ties you quite
heavily into __DRIscreen and __DRIimage interfaces, which get you an
alarming amount of the way towards having a full Mesa driver. I guess
if that's what you guys want to do, then great, but short of that
having your own GBM backend would definitely make the most sense.

Considering that, I'd suggest hoisting the non-gbm_dri parts of GBM
out of Mesa and into a separate repository, and trying to get minigbm
built as a GBM backend as well.
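(A hedged sketch of what option (a) could look like. Mesa's internal
backend interface (gbmint.h) is not an installed or stable ABI, so the
struct layout and the exported symbol name below are hypothetical:)

    #include <gbm.h>

    struct gbm_backend {                 /* mirrors Mesa's gbmint.h idea */
        const char *backend_name;
        struct gbm_device *(*create_device)(int fd);
    };

    static struct gbm_device *
    vendor_create_device(int fd)
    {
        /* Allocate the vendor's gbm_device wrapper and fill in its
         * function table (bo_create, bo_import, surface_create, ...). */
        return NULL; /* elided */
    }

    /* Symbol a loader would dlsym() out of the vendor DSO: */
    const struct gbm_backend gbm_vendor_backend = {
        .backend_name  = "vendor",
        .create_device = vendor_create_device,
    };
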

Cheers,
Daniel
Jonas Ådahl
2016-04-06 08:41:39 UTC
Permalink
Post by Daniel Stone
Hi,
Post by Andy Ritger
Post by Daniel Stone
Post by Andy Ritger
Also, mailbox mode versus FIFO mode should essentially equate to Vsync
off versus Vsync on, respectively. It shouldn't have anything to do
with the benefits of streams, but mailbox mode is a nice feature for
benchmarking games/simulations or naively displaying your latest &
greatest content without tearing.
I agree it's definitely a nice thing to have, but it does bring up the
serialisation issue: we expect any configuration performed by the
client (say, wl_surface::set_opaque_area to let the compositor know
where it can disable blending) to be fully in-line with buffer
attachment. The extreme case of this is resize, but there are quite a
few valid cases where you need serialisation.
I don't know quite off the top of my head how you'd support mailbox
mode with Streams, given this constraint - you need three-way feedback
between the compositor (recording all associated surface state,
including subsurfaces), clients (recording the surface state valid
when that buffer was posted), and the Streams implementation
(determining which frames to dequeue, which to discard and return to
the client, etc).
It is possible we don't get that all completely right in our implementation, yet.
Again this comes down to the synchronisation. In this case, assuming a
client which does:
- wl_egl_surface_resize(w1, h1)
- gl*()
- eglSwapBuffers() <- commit 1
- wl_egl_surface_resize(w2, h2)
- gl*()
- eglSwapBuffers() <- commit 2
For this, you would need some kind of synchronisation, to ensure that
processing commit 1 didn't pick up on the differently-sized frames for
commit 2.
Just to point out the obvious, using wl_egl_surface_resize as a
barrier/separator/synchronization-triggerer is not enough for this. The
mailbox vs FIFO mode needs to be tightly coupled with subsurface
asynchronous/synchronous mode. For example, ignoring the resize calls
in the above code, commit 1 and commit 2 might have drawn UI elements
that are expected to be aligned with subsurfaces that were moved.


Jonas
Daniel Stone
2016-04-06 12:14:26 UTC
Permalink
Hi,
Post by Jonas Ådahl
Post by Daniel Stone
Again this comes down to the synchronisation. In this case, assuming a
client which does:
- wl_egl_surface_resize(w1, h1)
- gl*()
- eglSwapBuffers() <- commit 1
- wl_egl_surface_resize(w2, h2)
- gl*()
- eglSwapBuffers() <- commit 2
For this, you would need some kind of synchronisation, to ensure that
processing commit 1 didn't pick up on the differently-sized frames for
commit 2.
Just to point out the obvious, using wl_egl_surface_resize as a
barrier/separator/synchronization-triggerer is not enough for this. The
mailbox vs FIFO mode needs to be tightly coupled with subsurface
asynchronous/synchronous mode. For example, ignoring the resize calls
in the above code, commit 1 and commit 2 might have drawn UI elements
that are expected to be aligned with subsurfaces that were moved.
Yep, subsurface in particular. It sounds like FIFO mode will DTRT with
some adjustments to the patches, but I just can't for the life of me
see how mailbox mode would work properly in all cases. You'd really
need some kind of explicit synchronisation barriers with the
compositor in order to determine when it was safe to replace one frame
with a newer frame.

Cheers,
Daniel
Miguel Angel Vico
2016-04-06 12:59:19 UTC
Permalink
On Wed, 6 Apr 2016 13:14:26 +0100
Post by Daniel Stone
Hi,
Post by Jonas Ådahl
Post by Daniel Stone
Again this comes down to the synchronisation. In this case, assuming
a client which does:
- wl_egl_surface_resize(w1, h1)
- gl*()
- eglSwapBuffers() <- commit 1
- wl_egl_surface_resize(w2, h2)
- gl*()
- eglSwapBuffers() <- commit 2
For this, you would need some kind of synchronisation, to ensure
that processing commit 1 didn't pick up on the differently-sized
frames for commit 2.
Just to point out the obvious, using wl_egl_surface_resize as a
barrier/separator/synchronization-triggerer is not enough for this.
The mailbox vs FIFO mode needs to be tightly coupled with subsurface
asynchronous/synchronous mode. For example, ignoring the resize
calls in the above code, commit 1 and commit 2 might have drawn UI
elements that are expected to be aligned with subsurfaces that were
moved.
Yep, subsurface in particular. It sounds like FIFO mode will DTRT with
some adjustments to the patches, but I just can't for the life of me
see how mailbox mode would work properly in all cases. You'd really
need some kind of explicit synchronisation barriers with the
compositor in order to determine when it was safe to replace one frame
with a newer frame.
Agree. Mailbox mode is something we are aware we need to revisit for
this particular use case.
Post by Daniel Stone
Cheers,
Daniel
--
Miguel
James Jones
2016-04-29 21:16:28 UTC
Permalink
I was on leave when this discussion was started. Now that I'm back, I'd
like to follow up on a few of the points below.
Post by Daniel Stone
Hi Andy,
Post by Andy Ritger
Thanks for the thorough responses, Daniel.
No problem; as I said, I'm actually really happy to see an
implementation out there.
Post by Andy Ritger
Post by Daniel Stone
Post by Miguel Angel Vico
Similarly, EGLOutput will provide means to access different
portions of display control hardware associated with an EGLDevice.
For instance, EGLOutputLayer represents a portion of display
control hardware that accepts an image as input and processes it
for presentation on a display device.
I still struggle to see the value of what is essentially an
abstraction over KMS, but oh well.
The intent wasn't to abstract all of KMS, just the surface presentation
aspect where EGL and KMS intersect. Besides the other points below,
an additional motivation for abstraction is to allow EGL to work with
the native modesetting APIs on other platforms (e.g., OpenWF on QNX).
Fair enough. And, ah, _that's_ where the OpenWF implementation is - I
was honestly unsure for years since the last implementation I saw was
from the ex-Hybrid NVIDIA guys in Helsinki, back when it was aimed at
Series 60.
Post by Andy Ritger
Post by Daniel Stone
Firstly, again looking at the case where a Wayland client is a stream
producer and the Wayland compositor is a consumer, we move from a
model where references to individual buffers are explicitly passed
through the Wayland protocol, to where those buffers merely carry a
reference to a stream. Again, as stated in the review of 4/7, that
looks like it has the potential to break some actual real-world cases,
and I have no idea how to solve it, other than banning mailbox mode,
which would seem to mostly defeat the point of Streams (more on that
below).
Streams are just a transport for frames. The client still explicitly
communicates when a frame is delivered through the stream via wayland
protocol, and the compositor controls when it grabs a new frame, via
eglStreamConsumerAcquireKHR(). Unless there are bugs in the patches,
the flow of buffers is still explicit and fully under the wayland protocol
and compositor's control.
Right, I believe if you have FIFO mode and strictly enforce
synchronisation to wl_surface::frame, then you should be safe. Mailbox
mode or any other kind of SwapInterval(0) equivalent opens you up to a
series of issues.
Post by Andy Ritger
Also, mailbox mode versus FIFO mode should essentially equate to Vsync
off versus Vsync on, respectively. It shouldn't have anything to do
with the benefits of streams, but mailbox mode is a nice feature for
benchmarking games/simulations or naively displaying your latest &
greatest content without tearing.
I agree it's definitely a nice thing to have, but it does bring up the
serialisation issue: we expect any configuration performed by the
client (say, wl_surface::set_opaque_area to let the compositor know
where it can disable blending) to be fully in-line with buffer
attachment. The extreme case of this is resize, but there are quite a
few valid cases where you need serialisation.
I don't know quite off the top of my head how you'd support mailbox
mode with Streams, given this constraint - you need three-way feedback
between the compositor (recording all associated surface state,
including subsurfaces), clients (recording the surface state valid
when that buffer was posted), and the Streams implementation
(determining which frames to dequeue, which to discard and return to
the client, etc).
Post by Andy Ritger
Post by Daniel Stone
Secondly, looking at the compositor-drm case, the use of the dumb
buffer to display undefined content as a dummy modeset really makes me
uneasy,
Yes, the use of a dumb buffer in this patch series is a kludge. If we
were going to use drmModeSetCrtc + EGLStreams, I think we'd want to
pass no fb to drmModeSetCrtc, but that currently gets rejected by DRM.
Are surface-less modesets intended to be allowable in DRM? I can hunt
that down if that is intended to work. Of course, better to work out
how EGLStreams should cooperate with atomic KMS.
It was definitely an oversight to not zero initialize the dumb buffer.
Right, atomic allows you to separate pipe/CRTC configuration from
plane/overlay configuration. So you'd have two options: one is to use
atomic and require the CRTC be configured with planes off before using
Streams to post flips, and the other is to add KMS configuration to
the EGL output.
Though, now I think of it, this effectively precludes one case, which
is scaling a Streams-sourced buffer inside the display controller. In
the GBM case, the compositor gets every buffer, so can configure the
plane scaling in line with buffer display. I don't see how you'd do
that with Streams.
There's another hurdle to overcome too, which would currently preclude
avoiding the intermediate dumb buffer at all. One of the invariants
the atomic KMS API enforces is that (!!plane->crtc_id ==
!!plane->fb_id), i.e. that a plane cannot be assigned to a CRTC
without an active buffer. So again, we're left with either having the
plane fully configured and active (assigned to a CRTC and displaying,
I assume, a pre-allocated dumb buffer), or pushing more configuration
into Streams - specifically, connecting an EGLOutputLayer to an
EGLOutputPort.
Not having a full mode-setting API within EGL did make this initial
configuration chicken-and-egg problem hard to solve.
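(To make the invariant concrete: with libdrm's atomic API, a plane's
CRTC_ID and FB_ID must be set or cleared together. The property IDs
below would really come from drmModeObjectGetProperties(); here they
are placeholders:)

    #include <xf86drm.h>
    #include <xf86drmMode.h>

    static int
    disable_plane(int fd, uint32_t plane_id,
                  uint32_t prop_crtc_id, uint32_t prop_fb_id)
    {
        drmModeAtomicReq *req = drmModeAtomicAlloc();

        /* OK: CRTC_ID and FB_ID cleared together. Setting one
         * without the other violates !!plane->crtc_id ==
         * !!plane->fb_id and the commit is rejected. */
        drmModeAtomicAddProperty(req, plane_id, prop_crtc_id, 0);
        drmModeAtomicAddProperty(req, plane_id, prop_fb_id, 0);

        int ret = drmModeAtomicCommit(fd, req,
                                      DRM_MODE_ATOMIC_TEST_ONLY, NULL);
        drmModeAtomicFree(req);
        return ret;
    }
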

I agree that EGLStreams/EGLOutput should integrate with atomic better
than is shown in this initial patchset.

Maybe a better way to achieve that would be to give EGL an opportunity
to amend an already created atomic request before committing it? E.g.,

eglStreamsAcquire(dpy, <listOfStreams>, <atomicRequest>);

That would take a filled-out atomic request that does any necessary
reconfiguration and just add the new framebuffers to it from
<listOfStreams>. Any planes that don't need a new frame wouldn't be
included in <listOfStreams> and would keep their current frame. Planes
could also be turned off, moved, re-scaled, etc. Whatever atomic can
express.

Maybe we would need an eglStreamsCheckAcquire/eglStreamsCommitAcquire()
to fail and/or hint to the user that the suggested stream+atomic request
produces sub-optimal results and should be recreated with more optimal
buffers?

In any case, the idea should be nothing would limit the atomic API usage
just because streams are involved.
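(A sketch of how such a call might sit in a compositor's repaint loop.
eglStreamsAcquire() is the proposed entry point from the paragraph
above, not a shipping extension, and its exact signature is guesswork;
the libdrm calls are real:)

    #include <EGL/egl.h>
    #include <EGL/eglext.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    /* Hypothetical prototype for the proposal: */
    EGLBoolean eglStreamsAcquire(EGLDisplay dpy,
                                 const EGLStreamKHR *streams, EGLint n,
                                 drmModeAtomicReq *req);

    static void
    repaint(EGLDisplay dpy, int fd,
            EGLStreamKHR surface_stream, EGLStreamKHR cursor_stream)
    {
        drmModeAtomicReq *req = drmModeAtomicAlloc();
        /* ... add plane moves, scaling, mode changes as usual ... */

        EGLStreamKHR streams[] = { surface_stream, cursor_stream };
        /* EGL would append each stream's newest frame as that
         * plane's FB_ID: */
        eglStreamsAcquire(dpy, streams, 2, req);

        if (drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_NONBLOCK,
                                NULL) < 0)
            ; /* re-check, or recreate with more optimal buffers */
        drmModeAtomicFree(req);
    }
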
Post by Daniel Stone
Post by Andy Ritger
Post by Daniel Stone
Also, I'm not quite sure how you're testing the compositor-as-consumer
mode: I can't seem to see any EGL extensions which allow you to
connect a Wayland surface as an EGLStream consumer. Do you have
something else unpublished that's being used here, or is this what the
libnvidia-egl-wayland library is for? Or do you just have clients
using EGLSurfaces as normal, which happen to be implemented internally
as EGLStreams? (Also, that the only way to test this is through
proprietary drivers implementing only-just-published extensions not
only makes me very sad, but hugely increases the potential for this to
be inadvertently broken.)
Sorry if this seemed cryptic. You are correct that EGL Wayland clients
just use EGLSurfaces as normal (no Wayland client changes), and that
gets implemented using EGLStreams within libnvidia-egl-wayland.
Sorry, I'd missed this whilst reading through.
Post by Andy Ritger
FWIW, we plan to release the source to libnvidia-egl-wayland
eventually... it has a few driver-specific warts right now, but the
intent is that it is a vendor-independent implementation (though, using
EGLStreams, so...) of EGL_KHR_platform_wayland using a set of EGL API
"wrappers". The goal was to allow window systems to write these EGL
platform bindings themselves, so that each EGL implementation doesn't
have to implement each EGL_KHR_platform_*. Anyway, we'll try to get
libnvidia-egl-wayland cleaned up and released.
Interesting!
Post by Andy Ritger
Post by Daniel Stone
Post by Miguel Angel Vico
Thus, a compositor could produce frames and feed them to an
EGLOutputLayer through an EGLStream for presentation on a display
device.
In a similar way, by attaching a GLTexture consumer to a stream, a
producer (wayland client) could feed frames to a texture, which in
turn can be used by a compositor to prepare the final frame to be
presented.
Quick aside: this reminds me in many unfortunate ways of
GLX_EXT_texture_from_pixmap. tfp gave us the same 'capture stream of
stuff and make it appear in a texture' model as streams, whereas most
of the rest of the world (EGL, Vulkan WSI, Wayland, Android, ChromeOS,
etc) have all moved explicitly _away_ from that model to passing
references to individual buffers; this in many ways brings us back to
tfp.
Is that really an accurate comparison? The texture_from_pixmap extension
let X11 composite managers bind a single X pixmap to an OpenGL texture.
It seems to me what was missing in TFP usage was explicit synchronization
between X and/or OpenGL rendering into the pixmap and OpenGL texturing
from the pixmap.
I'd argue that synchronisation (in terms of serialisation with the
rest of the client's protocol stream) is missing from Streams as well,
at least in mailbox mode.
(As an aside, I wonder if it's properly done in FIFO mode as well; the
compositor may very validly choose not to dequeue a buffer if a
surface is completely occluded. How does Streams then know that it can
submit another frame? Generally we use wl_surface::frame to deal with
this - the equivalent of eglSwapInterval(1) - but it sounds like
Streams relies more on strictly-paired internal queue/dequeue pairing
in FIFO mode. Maybe this isn't true.)
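(For reference, the wl_surface::frame throttling in question -
standard Wayland client API, which is how the eglSwapInterval(1)
behaviour is normally obtained:)

    #include <wayland-client.h>

    static void
    frame_done(void *data, struct wl_callback *cb, uint32_t time)
    {
        wl_callback_destroy(cb);
        /* Safe to render and attach the next buffer now. */
    }

    static const struct wl_callback_listener frame_listener = {
        frame_done
    };

    static void
    submit_frame(struct wl_surface *surface, struct wl_buffer *buffer)
    {
        struct wl_callback *cb = wl_surface_frame(surface);
        wl_callback_add_listener(cb, &frame_listener, NULL);
        wl_surface_attach(surface, buffer, 0, 0);
        wl_surface_commit(surface);
    }
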
Post by Andy Ritger
Post by Daniel Stone
Post by Miguel Angel Vico
Whenever EGL_EXT_device_drm extension is present, EGLDevice can
be used to enumerate and access DRM KMS devices, and EGLOutputLayer
to enumerate and access DRM KMS crtcs and planes.
Again, the enumeration isn't so much used as bypassed. The original
enumeration is used, and all we do with the EGL objects is a) list all
of them, b) filter them to find the one we already have, and c)
perhaps replace their internal representation of the device with the
one we already have.
That's fair in the context of this patch set.
In general, EGLDevice provides device enumeration for other use cases
where it is the basis for bootstrapping. Maybe we could better reconcile
udev and EGLDevice in the patch set, but some of this is a natural, though
unfortunate, artifact of correlating objects between two enumeration APIs.
Mind you, this wasn't intended as a criticism, just noting that the
commit message didn't accurately describe the code.
Post by Andy Ritger
Post by Daniel Stone
I'd like to look at the elephant in the room, which is why you're
using this in the first place (aside from general NVIDIA enthusiasm
for encapsulating everything within EGL Streams/Output/Device/etc,
dating back many years). Andy/Aaron, you've said that you found GBM to
be inadequate, and I'd like to find out explicitly how.
Thanks. This is the real heart of the debate.
Yes!
Post by Andy Ritger
Post by Daniel Stone
Through a few
'We can't choose an optimal rendering configuration, because we don't
know how it's going to be used' - (almost completely) untrue. The FD
you pass to gbm_create_device is that of the KMS device, a gbm_surface
contains information as to how the plane (primary or overlay) will be
configured,
Maybe I'm not looking in the right place, but where does gbm_surface get
the intended plane configuration? Are there other display-related flags
beside GBM_BO_USE_SCANOUT? Then again, the particular plane doesn't
impact us for current GPUs.
I believe Andy is correct for current discrete NVIDIA GPUs, but I think
the particular plane configuration does matter on some Tegra display
engines.
Post by Daniel Stone
Well, nowhere. By current plane configuration, I assume you're (to the
extent that you can discuss it) talking about asymmetric plane
capabilities, e.g. support for disjoint colour formats, scaling units,
etc? As Dan V says, I still see Streams as a rather incomplete fix to
this, given that plane assignment is pre-determined: what do you do
when your buffers are configured as optimally as possible, but the
compositor has picked the 'wrong' plane? I really think you need
something like HWC to rewrite your scene graph into the optimal setup.
Streams could provide a way to express that the compositor picked the
wrong plane, but they don't solve the optimal configuration problem.
Configuration is a tricky mix of policy and capabilities that something
like HWComposer or a wayland compositor with access to HW-specific
knowledge needs to solve. I agree with other statements here that
encapsulating direct HW knowledge within individual Wayland compositors
is probably not a great idea, but some separate standard or shared
library taking input from hardware-specific modules and wrangling scene
graphs is probably needed to get optimal behavior.

What streams do is allow allocating the most optimal set of buffers,
and using the most optimal method available to present them, for a
given configuration. So, streams would kick in after the scene-graph
logic has generated a config.
Post by Daniel Stone
Post by Andy Ritger
Post by Daniel Stone
and an EGLDisplay lets you tie the rendering and scanout
devices together. What more information do you need? It's true that we
don't have a way to select individual rendering devices at the moment,
but as said earlier, passing an EGLDevice as an attrib to
GetPlatformDisplay would resolve that, as you would have the render
device identified by the EGLDevice and the scanout device identified
by the gbm_device. At that point, you have the full pipeline and can
determine the optimal configuration.
Beyond choosing optimal rendering configuration, there is arbitration of
the scarce resources needed for optimal rendering configuration. E.g.,
for Wayland compositor flipping to client-produced buffers, presumably the
client's buffer needs to be allocated with GBM_BO_USE_SCANOUT. NVIDIA's
display hardware requires physically contiguous buffers, so we wouldn't
want clients to _always_ allocate buffers with the GBM_BO_USE_SCANOUT
flag. It would be nice to have feedback between the EGL driver instance
in the compositor and the EGL driver running in the client, to know how
the buffer is going to be used by the Wayland compositor.
I imagine other hardware has even more severe constraints on displayable
memory, though, so maybe I'm misunderstanding something about how buffers
are shared between wayland clients and compositors?
Ah! This is something I've very much had in mind for quite a while,
but I keep getting pre-empted, and I didn't bring it up as it didn't
seem implemented in the current patchset. (IIRC, jajones had some code
to allow you to retarget Streams at different consumers, but he's on
leave.)
Also, I should add that there's nothing requiring clients to use GBM
to allocate. The client EGLSurface implementation is free to do purely
internal allocations that are only accessible to it, if it wants to;
gbm_bo_import would then note that the buffer is not usable for
scanout and fail the import, leaving the compositor to fall back to
EGLImage.
Post by Andy Ritger
This ties into the next point...
Post by Daniel Stone
'We don't know when to schedule decompression, because there's no
explicit barrier' - completely untrue. eglSwapBuffers is that barrier.
For example, in Freescale i.MX6, the Vivante GPU and Freescale IPU
(display controller) do not share a single common format between GPU
render targets and IPU scanout sources, so require a mandatory
detiling pass in between render and display. These work just fine with
gbm with that pass scheduled by eglSwapBuffers. This to me seems
completely explicit, unless there was something else you were meaning
... ?
The Vivante+Freescale example is a good one, but it would be more
interesting if they shared /some/ formats and you could only use those
common formats in /some/ cases.
That's also fairly common, particularly for tiling. Intel has more
tiling modes than I can remember, of which only one (X-tiling) is a
valid source for scanout. As you say, physical contiguity is also a
valid requirement, plus pitch alignment.
Post by Andy Ritger
I think a lot of the concern is about passing client-produced frames
all the way through to scanout (i.e., zero-copy). E.g., if the wayland
client is producing frames that the wayland compositor is going to use
as a texture, then we don't want the client to decompress as part of its
eglSwapBuffers: the wayland compositor will texture from the compressed
frame for best performance. But, if the wayland compositor is going to
flip to the surface, then we would want the client to decompress during
its eglSwapBuffers.
Yes, very much so. Taking the Freescale example, you want the client
to do a detiling blit during its swap if the surface is a valid
scanout target, but not at all if it's just getting textured by the
GPU anyway. Similarly, Intel wants to allocate X-tiled if scanout is
possible, but otherwise it wants to be Y/Yf/...-tiled.
Post by Andy Ritger
The nice thing about EGLStreams here is that if the consumer (the Wayland
compositor) wants to use the content in a different way, the producer
must be notified first, in order to produce something suitable for the
new consumer.
I believe this is entirely doable with GBM right now, taking advantage
of the fact that libgbm.so and libEGL.so must be as tightly paired as
libEGL.so and libGLESv2.so. For all of these, read 'wl_drm' as 'wl_drm
or its equivalent interface in other implementations'.
Firstly, create a new interface in wl_drm to represent a swapchain (in
the Vulkan sense), and modify its buffer-creation requests to take a
swapchain parameter. This we can do without penalty, since the only
users (aside from VA-API, which is really broken and also hopefully
soon to lose its Wayland sink anyway) are EGL_EXT_platform_wayland and
EGL_WL_bind_wayland_display, both within the same DSO.
Secondly, instrument gbm_bo_import's wl_buffer path (proxy for intent
to use a buffer for direct scanout) and EGLImage's
EGL_WAYLAND_BUFFER_WL path (proxy for intent to use via GPU
composition) to determine what the compositor is actually doing with
these buffers, and use that to store target/intent in the swapchain.
Thirdly, when the target/intent changes (e.g. 'was scanout every
frame, has been EGLImage for the last 120 frames'), send an event down
to the client to let it know to modify its allocation. The combination
of EGL/GBM are in the correct place to determine this, since between
them they already have to know the intersection of capabilities
between render and scanout.
That still doesn't solve the optimal-display-configuration problem -
that you have generic code determining not only the display strategy
(scanout vs. GPU composition) as well as the exact display controller
configuration - but neither does EGLStreams, or indeed anything
current short of HWC.
Do you see any problem with doing that within GBM? It's not actually
done yet, but then again, neither is direct scanout through Streams.
;)
With new Wayland protocol, patches to all Wayland compositors to send
proper hints to clients using this protocol, improvements to GBM, and
updates to both of these when new GPU architectures introduced new
requirements, what you describe could do anything streams can do.
However, then the problem will have been solved only in the context of
top-of-tree Wayland and Weston.

There are far more use cases for streams or similar producer/consumer
constructs than Wayland. Streams allow drivers to solve the problem in
one place. Streams also allow vendors to ship new drivers when new
hardware appears that will enable that new hardware to work (and work
optimally, scenegraph issues aside) with existing compositors and
applications without modification. That second point is a guiding
principle for what should be encapsulated within a driver API vs. what
should be on the application side.
Post by Daniel Stone
Post by Andy Ritger
Post by Daniel Stone
'Width, height, pitch and format aren't enough information' - this is
true, but not necessarily relevant. I'm not sure what the source of
this actually is: is it the gbm_bo_get_*() APIs? If so, yes, they need
to be extended with a gbm_bo_get_modifier() call, which would allow
you to get the DRM format modifier to describe tiling/compression/et
al (as well as perhaps being extended to allow you to extract multiple
buffers/planes, e.g. to attach auxiliary compression buffers). If it's
not gbm, what actually is it? The only other place I can think of
(suggested by Pekka, I think) was the wl_drm protocol, which it should
be stressed is a) not required in any way by Wayland, b) not a
published/public protocol, c) not a stable protocol. wl_drm just
happens to be the way that Mesa shares buffers, just as wl_viv is how
Vivante's proprietary driver shares buffers, and mali_buffer_sharing
is how the Mali driver does it. Since the server side is bound by
eglBindWaylandDisplayWL and the client side is also only used through
EGL, there is _no_ requirement for you to also implement wl_drm. As it
is a hidden private Mesa protocol, there is also no requirement for
the protocol to remain stable.
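(A sketch of the proposed query - calls of this shape were later
merged into Mesa's gbm.h, but treat them as illustrative here:)

    #include <gbm.h>

    static void
    describe_bo(struct gbm_bo *bo)
    {
        uint64_t modifier = gbm_bo_get_modifier(bo); /* tiling etc. */
        int n = gbm_bo_get_plane_count(bo);  /* aux buffers/planes */
        for (int i = 0; i < n; i++) {
            union gbm_bo_handle h = gbm_bo_get_handle_for_plane(bo, i);
            uint32_t stride = gbm_bo_get_stride_for_plane(bo, i);
            /* handles/strides/modifier would feed e.g.
             * drmModeAddFB2WithModifiers(). */
            (void)h; (void)stride;
        }
        (void)modifier;
    }
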
I agree that wl_drm doesn't factor into it.
Maybe some of this is my confusion over what parts of gbm.h are
application-facing, and what parts are driver-facing? We, and
presumably most hardware vendors, would want the ability to associate
arbitrary metadata with gbm_bo's, but most of that metadata is
implementation-specific, and not really something an application should
be looking at without sacrificing portability.
All of gbm.h is user-facing; how you implement that API is completely
up to you, including arbitrary metadata. For instance, it's the driver
that allocates its own struct gbm_surface/gbo_bo/etc (which is
opaque), so it can do whatever it likes in terms of metadata. Is there
anything in particular you're thinking of that you're not sure you'd
be able to store portably?
Might also be worth striking a common misconception here: the Mesa GBM
implementation is _not_ canonical. gbm.h is the user-facing API you
have to implement, but beyond that, you don't need to be implemented
by Mesa's src/gbm/. As the gbm.h types are all opaque, I'm not sure
what you couldn't express/hide/store - do you have any examples?
If we could work out how to install vendor-specific GBM implementations,
I believe you're correct, the API is sufficiently high-level to
represent our allocation metadata.
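(Sketch: since gbm.h only exposes opaque pointers, a vendor
implementation is free to define the structs itself. The metadata
fields below are purely illustrative:)

    struct gbm_bo {                  /* vendor's private definition */
        struct gbm_device *device;
        uint32_t width, height, format;
        /* Vendor-specific allocation metadata, never visible to
         * applications through gbm.h: */
        uint64_t tiling_layout;
        int      compressed;
        void    *vidmem_handle;
    };
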
Post by Daniel Stone
Post by Andy Ritger
Post by Daniel Stone
'EGLStreams is the direction taken in Vulkan' - I would argue not. IMO
the explicit buffer management on the client side does not parallel
EGLStreams, and notably there is no equivalent consumer interface
offered on the server side, but instead the individual-buffer-driven
approach is taken. It's true that VK_WSI_display_swapchain does exist
and does match the EGLStreams model fairly closely, but also that it
does not have universal implementation: the Intel 'anv' Mesa-based
driver does not implement display_swapchain, instead having an
interface to export a VkImage as a dmabuf. It's true that the latter
is not optimal (it lacks the explicit targeting required to determine
the most optimal tiling/compression strategy), but OTOH it is
precedent for explicitly avoiding the
VK_WSI_display_swapchain/EGLStreams model for Vulkan on KMS, just as
GBM avoids it for EGL on KMS.
From your perspective, what would be more optimal than VkImage+dmabuf?
Well, it's pretty much on par with GBM-compositor-Wayland-client and
an EGLStreams pipeline ending in an EGLOutput. Not having something
like HWC means that you can't determine the optimal plane-allocation
strategy.
Post by Andy Ritger
Post by Daniel Stone
Agreed. One of the things I've been incredibly happy with is how our
platform has managed to stay completely generic and vendor-neutral so
far, and I'd love to preserve that.
I don't think you'll find any disagreement to that from NVIDIA, either.
I apologize if the EGLStreams proposal gave the impression of a
vendor-private solution. That wasn't the intent. The EGLStream family
of extensions are, after all, an open specification that any EGL vendor
can implement. If there are aspects of any of these EGL extensions that
seem useful, I'd hope that Mesa would be willing to adopt them.
Indeed, this wasn't to cast any aspersions on how you guys have
developed Streams. Having it out there and having these patches has
really been tremendously useful.
Post by Andy Ritger
We (NVIDIA) clearly think EGLStreams is a good direction for expressing
buffer sharing semantics. In our ideal world, everyone would implement
these extensions and Wayland compositors would migrate to using them as
the generic vendor-neutral mechanism for buffer sharing :)
But here's where my problem lies. At the moment, the 'how do I
Wayland' story is very straightforward, and not entirely
coincidentally similar to ChromeOS's: you implement GBM+KMS, you
implement the ~25 LoC of libwayland-egl, you implement
EGL_EXT_platform_{gbm,wayland}, and ... that's it. Introducing Streams
as an alternate model is certainly interesting, and I understand why
you would do it, but having it as the sole option muddies the 'how do
I Wayland' story significantly.
Getting away from the vendor-bound DDX model was something we were
desperate to do (see also xf86-video-modesetting landing on GBM+EGL),
and I'd really just like to avoid that becoming 'well, for most
platforms you do this, but for this platform / these platforms, you do
this instead ...'.
I also have no desire to start creating a Wayland DDX system. The fact
that we all seem to agree something like hwcomposer may be needed to
make things like Wayland optimal does not bode well for that, but that's
a separate discussion.

Yes, streams introduce a slightly different way of doing things than
GBM+the wl_drm protocol. However, the differences are minimal. I don't
think the patchset Miguel has proposed is that invasive, and as we say,
there's nothing preventing Mesa and others from implementing streams as
well. They're part of an open standard, and we'd certainly welcome
collaboration on the specifications. I hope we can at least consider
EGLStreams as a potentially better solution, even if it wasn't the first
solution.

Further, another thing I'd like to get rid of is "implement the ~25 LoC
of libwayland-egl". Streams let us do that. I want Wayland support to
be the last windowing/compositing system for which driver vendors need
to explicitly maintain support in their code. Once we clean up &
standardize the very minimal driver interfaces beyond current EGL that
our libwayland-egl code is using, anyone should be able to write a
windowing system and provide hooks to enable any EGL driver supporting
the standardized window system hook ABI to run as a client of it. The
same should be done for Vulkan WSI platforms, where the per-platform
driver API is already even more self-contained. In other words, my hope
is that Wayland EGL and Vulkan support will soon be something that ships
with GLVND and the Vulkan common loader, not with the drivers.
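(For scale, the "~25 LoC" being discussed is roughly the following,
paraphrasing the stock wayland-egl.c of the time; the struct normally
lives in wayland-egl-priv.h:)

    #include <stdlib.h>

    struct wl_egl_window {
        struct wl_surface *surface;
        int width, height, dx, dy;
        void (*resize_callback)(struct wl_egl_window *, void *);
        void *driver_private;
    };

    void
    wl_egl_window_resize(struct wl_egl_window *win,
                         int width, int height, int dx, int dy)
    {
        win->width = width;
        win->height = height;
        win->dx = dx;
        win->dy = dy;
        if (win->resize_callback)        /* the EGL driver hooks in here */
            win->resize_callback(win, win->driver_private);
    }

    struct wl_egl_window *
    wl_egl_window_create(struct wl_surface *surface, int width, int height)
    {
        struct wl_egl_window *win = calloc(1, sizeof *win);
        if (!win)
            return NULL;
        win->surface = surface;
        wl_egl_window_resize(win, width, height, 0, 0);
        return win;
    }

    void
    wl_egl_window_destroy(struct wl_egl_window *win)
    {
        free(win);
    }
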

Thanks,
-James
Post by Daniel Stone
Cheers,
Daniel
Daniel Stone
2016-04-29 22:07:01 UTC
Permalink
Hi James,
Post by James Jones
I was on leave when this discussion was started. Now that I'm back, I'd
like to follow up on a few of the points below.
Welcome back!
Post by James Jones
Post by Daniel Stone
Right, atomic allows you to separate pipe/CRTC configuration from
plane/overlay configuration. So you'd have two options: one is to use
atomic and require the CRTC be configured with planes off before using
Streams to post flips, and the other is to add KMS configuration to
the EGL output.
Though, now I think of it, this effectively precludes one case, which
is scaling a Streams-sourced buffer inside the display controller. In
the GBM case, the compositor gets every buffer, so can configure the
plane scaling in line with buffer display. I don't see how you'd do
that with Streams.
There's another hurdle to overcome too, which would currently preclude
avoiding the intermediate dumb buffer at all. One of the invariants
the atomic KMS API enforces is that (!!plane->crtc_id ==
!!plane->fb_id), i.e. that a plane cannot be assigned to a CRTC
without an active buffer. So again, we're left with either having the
plane fully configured and active (assigned to a CRTC and displaying,
I assume, a pre-allocated dumb buffer), or pushing more configuration
into Streams - specifically, connecting an EGLOutputLayer to an
EGLOutputPort.
Not having a full mode-setting API within EGL did make this initial
configuration chicken-and-egg problem hard to solve.
I agree that EGLStreams/EGLOutput should integrate with atomic better than
is shown in this initial patchset.
Maybe a better way to achieve that would be to give EGL an opportunity to
amend an already created atomic request before committing it? E.g.,
eglStreamsAcquire(dpy, <listOfStreams>, <atomicRequest>);
That would take a filled-out atomic request that does any necessary
reconfiguration and just add the new framebuffers to it from
<listOfStreams>. Any planes that don't need a new frame wouldn't be
included in <listOfStreams> and would keep their current frame. Planes
could also be turned off, moved, re-scaled, etc. Whatever atomic can
express.
Maybe we would need an eglStreamsCheckAcquire/eglStreamsCommitAcquire() to
fail and/or hint to the user that the suggested stream+atomic request
produces sub-optimal results and should be recreated with more optimal
buffers?
In any case, the idea should be nothing would limit the atomic API usage
just because streams are involved.
That is indeed a possibility, though I'm concerned that it leaks KMS
atomic details through the Streams API. Certainly if the check failed,
you'd need to rewind using the atomic cursor API for it to be useful. It
would also complicate the Streams implementation, as you'd need the
operation to be a 'peek' at the stream head, rather than popping the
frame for a test, failing, and then blocking waiting for a new frame.
You'd also need somewhere to store a reference to that frame, so you
could reuse it later (say you turn the display off and later turn it
back on).

The alternative is, as you allude to, to push the modesetting into
EGL, so that the application feeds EGL its desired outcome and lets
EGL determine the optimal configuration, rather than driving the two
APIs in lockstep.
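(The rewind in question, using real libdrm calls: snapshot the request
cursor, speculatively add state, test, and roll back on failure.
Property IDs are placeholders:)

    #include <xf86drm.h>
    #include <xf86drmMode.h>

    static int
    try_flip(int fd, drmModeAtomicReq *req, uint32_t plane_id,
             uint32_t prop_fb_id, uint32_t prop_crtc_id,
             uint32_t new_fb, uint32_t crtc_id)
    {
        int cursor = drmModeAtomicGetCursor(req);

        drmModeAtomicAddProperty(req, plane_id, prop_fb_id, new_fb);
        drmModeAtomicAddProperty(req, plane_id, prop_crtc_id, crtc_id);

        int ret = drmModeAtomicCommit(fd, req,
                                      DRM_MODE_ATOMIC_TEST_ONLY, NULL);
        if (ret < 0)
            drmModeAtomicSetCursor(req, cursor); /* drop speculative adds */
        return ret;
    }
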
Post by James Jones
Post by Daniel Stone
Well, nowhere. By current plane configuration, I assume you're (to the
extent that you can discuss it) talking about asymmetric plane
capabilities, e.g. support for disjoint colour formats, scaling units,
etc? As Dan V says, I still see Streams as a rather incomplete fix to
this, given that plane assignment is pre-determined: what do you do
when your buffers are configured as optimally as possible, but the
compositor has picked the 'wrong' plane? I really think you need
something like HWC to rewrite your scene graph into the optimal setup.
Streams could provide a way to express that the compositor picked the wrong
plane, but they don't solve the optimal configuration problem. Configuration
is a tricky mix of policy and capabilities that something like HWComposer or
a wayland compositor with access to HW-specific knowledge needs to solve. I
agree with other statements here that encapsulating direct HW knowledge
within individual Wayland compositors is probably not a great idea, but some
separate standard or shared library taking input from hardware-specific
modules and wrangling scene graphs is probably needed to get optimal
behavior.
Yeah, I would lean towards HWC itself, but that's a separate discussion.
Post by James Jones
Post by Daniel Stone
Do you see any problem with doing that within GBM? It's not actually
done yet, but then again, neither is direct scanout through Streams.
;)
With new Wayland protocol, patches to all Wayland compositors to send proper
hints to clients using this protocol, improvements to GBM, and updates to
both of these when new GPU architectures introduce new requirements, what
you describe could do anything streams can do. However, then the problem
will have been solved only in the context of top-of-tree Wayland and Weston.
This doesn't require explicit/new compositor interaction at all.
Extensions can be done within the gbm/EGL bundle itself (via
EGL_WL_bind_wayland_display), so you're only changing one DSO (or DSO
bundle), and the API usage there today does seem to stand up. Given
that the protocol is private - I'm certainly not advocating for a
DRI2-style all-things-to-all-hardware standard protocol to communicate
this - and that it's localised in a vendor bundle, it seems widely
applicable to me. As someone who's writing this from
Mutter/Wayland/GBM, I'm certainly not interested in Weston-only
solutions.
Post by James Jones
There are far more use cases for streams or similar producer/consumer
constructs than Wayland. Streams allow drivers to solve the problem in one
place.
Certainly there are, but then again, there are far more usecases than
EGL. Looking at media playback, Vulkan, etc, where you don't have EGL
yet need to solve the same problems.
Post by James Jones
Streams also allow vendors to ship new drivers when new hardware
appears that will enable that new hardware to work (and work optimally,
scenegraph issues aside) with existing compositors and applications without
modification. That second point is a guiding principle for what should be
encapsulated within a driver API vs. what should be on the application side.
I agree, and I'm not arguing this to be on the application or
compositor side either. I believe the GBM and HWC suggestions are
entirely doable, and further that these problems will need to be
solved outside EGL anyway, for the other usecases. My worry - quite
aside from how vendors who struggle to produce a conformant EGL 1.4
implementation today will ever implement the complexity of Streams,
though this isn't your problem - is that EGL is really the wrong place
to be solving this.
Post by James Jones
Post by Daniel Stone
All of gbm.h is user-facing; how you implement that API is completely
up to you, including arbitrary metadata. For instance, it's the driver
that allocates its own struct gbm_surface/gbo_bo/etc (which is
opaque), so it can do whatever it likes in terms of metadata. Is there
anything in particular you're thinking of that you're not sure you'd
be able to store portably?
Might also be worth striking a common misconception here: the Mesa GBM
implementation is _not_ canonical. gbm.h is the user-facing API you
have to implement, but beyond that, you don't need to be implemented
by Mesa's src/gbm/. As the gbm.h types are all opaque, I'm not sure
what you couldn't express/hide/store - do you have any examples?
If we could work out how to install vendor-specific GBM implementations, I
believe you're correct, the API is sufficiently high-level to represent our
allocation metadata.
After some thought, I've come around to the view that we should
declare the Mesa implementation canonical and allow others to install
plugins into it.
The EGLDisplay -> gbm_device bind happens too late to do it otherwise,
I think.
Post by James Jones
Yes, streams introduce a slightly different way of doing things than GBM+the
wl_drm protocol. However, the differences are minimal. I don't think the
patchset Miguel has proposed is that invasive, and as we say, there's
nothing preventing Mesa and others from implementing streams as well.
I think it's large enough that it warrants a split of gl-renderer and
compositor-drm, rather than trying to shoehorn them into the same
file. There's going to be quite some complexity hiding between the
synchronise-with-client-event-stream and direct-scanout boxes, that
will push it over the limit of what's tractable. Those files are
already pretty huge and complex.
Post by James Jones
They're part of an open standard, and we'd certainly welcome collaboration
on the specifications. I hope we can at least consider EGLStreams as a
potentially better solution, even if it wasn't the first solution.
Further, another thing I'd like to get rid of is "implement the ~25 LoC of
libwayland-egl". Streams let us do that. I want Wayland support to be the
last windowing/compositing system for which driver vendors need to
explicitly maintain support in their code. Once we clean up & standardize
the very minimal driver interfaces beyond current EGL that our
libwayland-egl code is using, anyone should be able to write a windowing
system and provide hooks to enable any EGL driver supporting the
standardized window system hook ABI to run as a client of it. The same
should be done for Vulkan WSI platforms, where the per-platform driver API
is already even more self-contained. In other words, my hope is that
Wayland EGL and Vulkan support will soon be something that ships with GLVND
and the Vulkan common loader, not with the drivers.
I share the hope, and maybe with the WSI and Streams available, we can
design future window systems and display control APIs towards
something like that. But at the moment, the impedance mismatch between
Streams and the (deliberately very different) Wayland and KMS APIs is
already fairly glaring. The winsys support is absolutely trivial to
write, and with winsys interactions only getting more featureful and
complex, so too will the common stream protocol have to become.

If I was starting from the position of the EGL ideal: that everything
is EGL, and the only external interactions are creating native types
for it, then I would surely arrive at the same position as you. But
everything we've seen so far - and again, ChromeOS have taken this to
a much further extent - has been chipping away at EGL, rather than
putting more into it, and this has been for the better. I don't think
that's a difference we'll ever resolve though.

Cheers,
Daniel
James Jones
2016-05-03 16:07:12 UTC
Permalink
Post by Daniel Stone
Hi James,
Post by James Jones
I was on leave when this discussion was started. Now that I'm back, I'd
like to follow up on a few of the points below.
Welcome back!
Thanks!
Post by Daniel Stone
Post by James Jones
Post by Daniel Stone
Right, atomic allows you to separate pipe/CRTC configuration from
plane/overlay configuration. So you'd have two options: one is to use
atomic and require the CRTC be configured with planes off before using
Streams to post flips, and the other is to add KMS configuration to
the EGL output.
Though, now I think of it, this effectively precludes one case, which
is scaling a Streams-sourced buffer inside the display controller. In
the GBM case, the compositor gets every buffer, so can configure the
plane scaling in line with buffer display. I don't see how you'd do
that with Streams.
There's another hurdle to overcome too, which would currently preclude
avoiding the intermediate dumb buffer at all. One of the invariants
the atomic KMS API enforces is that (!!plane->crtc_id ==
!!plane->fb_id), i.e. that a plane cannot be assigned to a CRTC
without an active buffer. So again, we're left with either having the
plane fully configured and active (assigned to a CRTC and displaying,
I assume, a pre-allocated dumb buffer), or pushing more configuration
into Streams - specifically, connecting an EGLOutputLayer to an
EGLOutputPort.
Not having a full mode-setting API within EGL did make this initial
configuration chicken-and-egg problem hard to solve.
I agree that EGLStreams/EGLOutput should integrate with atomic better than
is shown in this initial patchset.
Maybe a better way to achieve that would be to give EGL an opportunity to
amend an already created atomic request before committing it? E.g.,
eglStreamsAcquire(dpy, <listOfStreams>, <atomicRequest>);
That would take a filled-out atomic request that does any necessary
reconfiguration and just add the new framebuffers to it from
<listOfStreams>. Any planes that don't need a new frame wouldn't be
included in <listOfStreams> and would keep their current frame. Planes
could also be turned off, moved, re-scaled, etc. Whatever atomic can
express.
Maybe we would need an eglStreamsCheckAcquire/eglStreamsCommitAcquire() to
fail and/or hint to the user that the suggested stream+atomic request
produces sub-optimal results and should be recreated with more optimal
buffers?
In any case, the idea should be nothing would limit the atomic API usage
just because streams are involved.
That is indeed a possibility, though I'm concerned that it leaks KMS
atomic details through the Streams API.
The atomic/KMS usage, like the DRM integration itself, would be optional
though. This wouldn't, for example, leak KMS details into an
OpenWF-based EGLOutput+EGLStream application. I don't see it as any
worse than having EGL_KHR_platform_x11 and friends, for example.
Post by Daniel Stone
Certainly if the check failed,
you'd need to rewind using the atomic cursor API for it to be useful. It
would also complicate the Streams implementation, as you'd need the
operation to be a 'peek' at the stream head, rather than popping the
frame for a test, failing, and then blocking waiting for a new frame.
You'd also need somewhere to store a reference to that frame, so you
could reuse it later (say you turn the display off and later turn it
back on).
I believe streams require all this already. They need to maintain a
reference to the current frame for re-use if a new one is not available,
and they need to essentially "peek" at the beginning of an acquire and
"commit" at the end, so exposing that via the API wouldn't be a large
change.
Post by Daniel Stone
The alternative is, as you allude to, to push the modesetting into
EGL, so that the application feeds EGL its desired outcome and lets
EGL determine the optimal configuration, rather than driving the two
APIs in lockstep.
Indeed.
Post by Daniel Stone
Post by James Jones
Post by Daniel Stone
Well, nowhere. By current plane configuration, I assume you're (to the
extent that you can discuss it) talking about asymmetric plane
capabilities, e.g. support for disjoint colour formats, scaling units,
etc? As Dan V says, I still see Streams as a rather incomplete fix to
this, given that plane assignment is pre-determined: what do you do
when your buffers are configured as optimally as possible, but the
compositor has picked the 'wrong' plane? I really think you need
something like HWC to rewrite your scene graph into the optimal setup.
Streams could provide a way to express that the compositor picked the wrong
plane, but they don't solve the optimal configuration problem. Configuration
is a tricky mix of policy and capabilities that something like HWComposer or
a wayland compositor with access to HW-specific knowledge needs to solve. I
agree with other statements here that encapsulating direct HW knowledge
within individual Wayland compositors is probably not a great idea, but some
separate standard or shared library taking input from hardware-specific
modules and wrangling scene graphs is probably needed to get optimal
behavior.
Yeah, I would lean towards HWC itself, but that's a separate discussion.
Post by James Jones
Post by Daniel Stone
Do you see any problem with doing that within GBM? It's not actually
done yet, but then again, neither is direct scanout through Streams.
;)
With new Wayland protocol, patches to all Wayland compositors to send proper
hints to clients using this protocol, improvements to GBM, and updates to
both of these when new GPU architectures introduce new requirements, what
you describe could do anything streams can do. However, then the problem
will have been solved only in the context of top-of-tree Wayland and Weston.
This doesn't require explicit/new compositor interaction at all.
Extensions can be done within the gbm/EGL bundle itself (via
EGL_WL_bind_wayland_display), so you're only changing one DSO (or DSO
bundle), and the API usage there today does seem to stand up. Given
that the protocol is private - I'm certainly not advocating for a
DRI2-style all-things-to-all-hardware standard protocol to communicate
this - and that it's localised in a vendor bundle, it seems widely
applicable to me. As someone who's writing this from
Mutter/Wayland/GBM, I'm certainly not interested in Weston-only
solutions.
No, the necessary extensions can not be contained within the binding.
There is not enough information within the driver layer alone. Something
needs to tell the driver when the configuration changes (E.g., the
consumer of a wayland surface switches from a texture to a plane) and
what the new configuration is. This would trigger the protocol
notifications & subsequent optimization within the driver. By the
nature of their API, streams would require the compositor to take action
on such configuration changes, and streams can discover the new
configuration. Something equivalent would be required to make this work
in the GBM+wl_drm/EGL case.

Further, as a driver vendor, the idea of requiring even in-driver
platform-specific modifications for this sounds undesirable. If it was
something that could be contained entirely within GBM, that would be
interesting. However, distributing the architecture-specific code
throughout the window-system specific code in the driver means a lot
more maintenance burden in a world with X, Chrome OS, Wayland, and
several others.
Post by Daniel Stone
Post by James Jones
There are far more use cases for streams or similar producer/consumer
constructs than Wayland. Streams allow drivers to solve the problem in one
place.
Certainly there are, but then again, there are far more usecases than
EGL. Looking at media playback, Vulkan, etc, where you don't have EGL
yet need to solve the same problems.
EGLStreams, Vulkan swapchains, and (for example) VDPAU presentation
queues are all varying levels of abstraction on top of the same thing
within the driver: a presentation engine or buffer queue, depending on
whether the target is a physical output or a compositor. These
API-level components can be hooked up to eachother as long as the
lower-level details are fully contained within the driver abstraction.
A Vulkan swapchain can be internally implemented as an EGLStream
producer, for example. In fact, Vulkan swapchains borrow many ideas
directly and indirectly from EGLStream.
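(For comparison, the swapchain-side expression of that same buffer
queue - standard VK_KHR_swapchain usage; vk_surface and device come
from the usual instance setup, and the values are illustrative:)

    #include <vulkan/vulkan.h>

    VkSwapchainCreateInfoKHR info = {
        .sType            = VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR,
        .surface          = vk_surface,
        .minImageCount    = 3,
        .imageFormat      = VK_FORMAT_B8G8R8A8_UNORM,
        .imageColorSpace  = VK_COLOR_SPACE_SRGB_NONLINEAR_KHR,
        .imageExtent      = { 1920, 1080 },
        .imageArrayLayers = 1,
        .imageUsage       = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT,
        .preTransform     = VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR,
        .compositeAlpha   = VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR,
        .presentMode      = VK_PRESENT_MODE_FIFO_KHR, /* ~ Streams FIFO */
        .clipped          = VK_TRUE,
    };
    VkSwapchainKHR swapchain;
    vkCreateSwapchainKHR(device, &info, NULL, &swapchain);
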
Post by Daniel Stone
Post by James Jones
Streams also allow vendors to ship new drivers when new hardware
appears that will enable that new hardware to work (and work optimally,
scenegraph issues aside) with existing compositors and applications without
modification. That second point is a guiding principle for what should be
encapsulated within a driver API vs. what should be on the application side.
I agree, and I'm not arguing this to be on the application or
compositor side either. I believe the GBM and HWC suggestions are
entirely doable, and further that these problems will need to be
solved outside EGL anyway, for the other usecases. My worry - quite
aside from how vendors who struggle to produce a conformant EGL 1.4
implementation today will ever implement the complexity of Streams,
though this isn't your problem - is that EGL is really the wrong place
to be solving this.
Could you elaborate on what the other usecases are? If you mean the
Vulkan/media playback cases mentioned above, then I don't see what is
fundamentally wrong about using EGL as a backend within the window
system for those. If a Vulkan application needs to display on an
EGL+GLES-based Wayland compositor, there will be some point where a
transition is made from Vulkan -> EGL+GLES regardless.
Post by Daniel Stone
Post by James Jones
Post by Daniel Stone
All of gbm.h is user-facing; how you implement that API is completely
up to you, including arbitrary metadata. For instance, it's the driver
that allocates its own struct gbm_surface/gbo_bo/etc (which is
opaque), so it can do whatever it likes in terms of metadata. Is there
anything in particular you're thinking of that you're not sure you'd
be able to store portably?
Might also be worth striking a common misconception here: the Mesa GBM
implementation is _not_ canonical. gbm.h is the user-facing API you
have to implement, but beyond that, you don't need to be implemented
by Mesa's src/gbm/. As the gbm.h types are all opaque, I'm not sure
what you couldn't express/hide/store - do you have any examples?
If we could work out how to install vendor-specific GBM implementations, I
believe you're correct, the API is sufficiently high-level to represent our
allocation metadata.
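For illustration (every field below is invented; this is not our actual
implementation), the opaque types leave room for something like:

#include <gbm.h>
#include <stdint.h>

/* A vendor-internal completion of the opaque type from gbm.h.
 * Applications only ever see 'struct gbm_bo *', so anything can be
 * stored behind it. */
struct gbm_bo {
    struct gbm_device *device;
    uint32_t width, height, stride, format;
    union gbm_bo_handle handle;
    void *user_data;

    /* Private allocation metadata, invisible through the public API: */
    uint64_t internal_layout;   /* e.g. a tiling/compression descriptor */
    uint32_t placement_flags;   /* e.g. which memory pool to use */
};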
After some thought, I've come around to the view that we should
declare the Mesa implementation canonical and allow others to install plugins.
The EGLDisplay -> gbm_device bind happens too late to do it otherwise,
I think.
Post by James Jones
Yes, streams introduce a slightly different way of doing things than GBM+the
wl_drm protocol. However, the differences are minimal. I don't think the
patchset Miguel has proposed is that invasive, and as we say, there's
nothing preventing Mesa and others from implementing streams as well.
I think it's large enough that it warrants a split of gl-renderer and
compositor-drm, rather than trying to shoehorn them into the same
file. There's going to be quite some complexity hiding between the
synchronise-with-client-event-stream and direct-scanout boxes, that
will push it over the limit of what's tractable. Those files are
already pretty huge and complex.
Would it be better to wait until such complexities arise in future
patches and split the files at that point, or would you prefer we split
the backends now? Perhaps I'm just more optimistic about the
complexity, but it seems like it would be easier to evaluate once that
currently-hypothetical portion of the code exists.
Post by Daniel Stone
Post by James Jones
They're part of an open standard, and we'd certainly welcome collaboration
on the specifications. I hope we can at least consider EGLStreams as a
potentially better solution, even if it wasn't the first solution.
Further, another thing I'd like to get rid of is "implement the ~25 LoC of
libwayland-egl". Streams let us do that. I want Wayland support to be the
last windowing/compositing system for which driver vendors need to
explicitly maintain support in their code. Once we clean up & standardize
the very minimal driver interfaces beyond current EGL that our
libwayland-egl code is using, anyone should be able to write a windowing
system and provide hooks to enable any EGL driver supporting the
standardized window system hook ABI to run as a client of it. The same
should be done for Vulkan WSI platforms, where the per-platform driver API
is already even more self-contained. In other words, my hope is that
Wayland EGL and Vulkan support will soon be something that ships with GLVND
and the Vulkan common loader, not with the drivers.
I share the hope, and maybe with the WSI and Streams available, we can
design future window systems and display control APIs towards
something like that. But at the moment, the impedance mismatch between
Streams and the (deliberately very different) Wayland and KMS APIs is
already fairly glaring. The winsys support is absolutely trivial to
write, and as winsys interactions only get more featureful and
complex, the common stream protocol will have to follow suit.
If I was starting from the position of the EGL ideal: that everything
is EGL, and the only external interactions are creating native types
for it, then I would surely arrive at the same position as you. But
everything we've seen so far - and again, ChromeOS have taken this to
a much further extent - has been chipping away at EGL, rather than
putting more into it, and this has been for the better.
The direction ChromeOS is taking is even more problematic, and I'd hate
to see it being held up as an example of proper design direction. We
spent a good deal of time working with Google to support ChromeOS and
ended up essentially allowing them to punch through the driver
abstraction via very opaque EGL extensions that no engineer besides the
extension authors could be expected to use correctly, and embed
HW-specific knowledge within some component of ChromeOS, such that it
will likely only run optimally on a single generation of our hardware
and will need to be revisited. That's the type of problem we're trying
to avoid here. ChromeOS has made other design compromises that cost us
(and I suspect other vendors) 10-20% performance across the board to
optimize for a very specific use case (i.e., a browser) and within very
constrained schedules. It is not the right direction for OS<->graphics
driver interactions to evolve.
Post by Daniel Stone
I don't think that's a difference we'll ever resolve though.
I believe thus far we've all tried to focus objectively on specific
issues, proposed solutions for them, and the merits of those solutions.
Weston and the other Wayland compositors I'm aware of are based on EGL
at the moment, so regardless of its merits as an API it doesn't seem
problematic purely from a dependency standpoint to add EGLStream as an
option next to the existing EGLImage and EGLDisplay+GBM paths. I'm
certainly willing to continue discussing the merits of EGL on a broader
scale, but does that discussion need to block the patches proposed here?

Thanks,
-James
Post by Daniel Stone
Cheers,
Daniel
Daniel Stone
2016-05-03 16:53:03 UTC
Permalink
Hi James,
Post by Daniel Stone
Post by James Jones
With new Wayland protocol, patches to all Wayland compositors to send proper
hints to clients using this protocol, improvements to GBM, and updates to
both of these when new GPU architectures introduced new requirements, what
you describe could do anything streams can do. However, then the problem
will have been solved only in the context of top-of-tree Wayland and Weston.
This doesn't require explicit/new compositor interaction at all.
Extensions can be done within the gbm/EGL bundle itself (via
EGL_WL_bind_wayland_display), so you're only changing one DSO (or DSO
bundle), and the API usage there today does seem to stand up. Given
that the protocol is private - I'm certainly not advocating for a
DRI2-style all-things-to-all-hardware standard protocol to communicate
this - and that it's localised in a vendor bundle, it seems completely
widely applicable to me. As someone who's writing this from
Mutter/Wayland/GBM, I'm certainly not interested in Weston-only
solutions.
No, the necessary extensions cannot be contained within the binding. There
is not enough information within the driver layer alone. Something needs to
tell the driver when the configuration changes (e.g., the consumer of a
wayland surface switches from a texture to a plane) and what the new
configuration is. This would trigger the protocol notifications &
subsequent optimization within the driver. By the nature of their API,
streams would require the compositor to take action on such configuration
changes, and streams can discover the new configuration. Something
equivalent would be required to make this work in the GBM+wl_drm/EGL case.
I don't think this is the case. As I went through with Andy, we
_already_ have intent expressed in the GBM case, in the exact same way
that EGLStreams does: consider gbm_bo_import as equivalent for
attaching to an EGLOutput(Layer) consumer, and EGLImage import +
TargetTexture2D as equivalent for attaching a gltexture consumer. This
is the exact same proxy for intent to display, and in fact the GBM
approach is slightly more flexible, because it allows you to both do
direct scanout as well as GPU composition (e.g. if you're
capturing/streaming at the same time as display).
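Concretely, the import path a compositor uses today already looks
something like this (a sketch with error handling omitted; 'buffer' is
the client's wl_buffer resource):

#include <gbm.h>
#include <xf86drmMode.h>
#include <stdint.h>

/* Intent to scan out is expressed at import time: GBM_BO_USE_SCANOUT
 * on the client's buffer, then a KMS framebuffer wrapped around the
 * resulting bo. */
static uint32_t import_for_scanout(struct gbm_device *gbm, int drm_fd,
                                   void *buffer)
{
    struct gbm_bo *bo = gbm_bo_import(gbm, GBM_BO_IMPORT_WL_BUFFER,
                                      buffer, GBM_BO_USE_SCANOUT);
    uint32_t fb_id = 0;

    drmModeAddFB(drm_fd, gbm_bo_get_width(bo), gbm_bo_get_height(bo),
                 24, 32, gbm_bo_get_stride(bo),
                 gbm_bo_get_handle(bo).u32, &fb_id);
    return fb_id;
}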

Again though, without stream-retargeting, this is not something which
exists in Streams today, and doing so is going to require more
extensions: more code in your driver, more code in every
implementation. GBM today, for all its faults, does not require
further API extension to make this work.
Further, as a driver vendor, the idea of requiring even in-driver
platform-specific modifications for this sounds undesirable. If it was
something that could be contained entirely within GBM, that would be
interesting. However, distributing the architecture-specific code
throughout the window-system specific code in the driver means a lot more
maintenance burden in a world with X, Chrome OS, Wayland, and several
others.
This would hold true if Streams was a perfect encapsulation, but I
don't really see how doing so adds any burden over layering the
winsys/platform layer over Streams in the first place. I mean, you've
written Wayland bindings for Streams in the first place ... how would
this be too much different? Even if the protocol is designed to be the
perfect transport for Streams, you _still_ need transport bindings to
your target protocol.
Post by Daniel Stone
Certainly there are, but then again, there are far more usecases than
EGL. Looking at media playback, Vulkan, etc, where you don't have EGL
yet need to solve the same problems.
EGLStreams, Vulkan swapchains, and (for example) VDPAU presentation queues
are all varying levels of abstraction on top of the same thing within the
driver: a presentation engine or buffer queue, depending on whether the
target is a physical output or a compositor. These API-level components can
be hooked up to each other as long as the lower-level details are fully
contained within the driver abstraction. A Vulkan swapchain can be
internally implemented as an EGLStream producer, for example. In fact,
Vulkan swapchains borrow many ideas directly and indirectly from EGLStream.
Indeed, I noted the similarity, but primarily for the device_swapchain
extension.
Post by Daniel Stone
I agree, and I'm not arguing this to be on the application or
compositor side either. I believe the GBM and HWC suggestions are
entirely doable, and further that these problems will need to be
solved outside EGL anyway, for the other usecases. My worry - quite
aside from how vendors who struggle to produce a conformant EGL 1.4
implementation today will ever implement the complexity of Streams,
though this isn't your problem - is that EGL is really the wrong place
to be solving this.
Could you elaborate on what the other usecases are? If you mean the
Vulkan/media playback cases mentioned above, then I don't see what is
fundamentally wrong about using EGL as a backend within the window system
for those. If a Vulkan application needs to display on an EGL+GLES-based
Wayland compositor, there will be some point where a transition is made from
Vulkan -> EGL+GLES regardless.
Media falls down because currently there is no zerocopy binding from
either hardware or software media decode engines. Perhaps not the case
on your hardware, unusually blessed with a great deal of memory
bandwidth, but a great many devices physically cannot cope with a
single copy in the pipeline, given the ratio of content size to memory
bandwidth. Doing this in EGL would require a 'draw' step which simply
presented an existing buffer - a step which would unnecessarily
involve the GPU if the pipeline is direct from decode to scanout - or
it would involve having every media engine write their own bindings to
the Streams protocol.

There are also incredibly exacting timing requirements for media
display, which the Streams model of 'single permanently fixed latency'
does not even come close to achieving. So for that you'd need another
extension, to report actual achieved timings back. Wayland today
fulfills these requirements with the zlinux_dmabuf and
presentation_timing protocols, with the original hardware timings fed
back through KMS.
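For reference, the client side of that feedback loop is tiny (a sketch
against the presentation-time protocol as it is being stabilised, with
the usual wayland-scanner-generated glue assumed):

#include <stdio.h>
#include <stdint.h>
#include "presentation-time-client-protocol.h" /* wayland-scanner output */

/* The compositor reports the hardware timestamp it received from KMS. */
static void presented(void *data, struct wp_presentation_feedback *fb,
                      uint32_t tv_sec_hi, uint32_t tv_sec_lo,
                      uint32_t tv_nsec, uint32_t refresh,
                      uint32_t seq_hi, uint32_t seq_lo, uint32_t flags)
{
    uint64_t sec = ((uint64_t)tv_sec_hi << 32) | tv_sec_lo;
    printf("presented at %llu.%09u, refresh period %u ns\n",
           (unsigned long long)sec, tv_nsec, refresh);
    wp_presentation_feedback_destroy(fb);
}

static void discarded(void *data, struct wp_presentation_feedback *fb)
{
    wp_presentation_feedback_destroy(fb); /* the frame was never shown */
}

static void sync_output(void *data, struct wp_presentation_feedback *fb,
                        struct wl_output *output)
{
}

static const struct wp_presentation_feedback_listener feedback_listener = {
    .sync_output = sync_output,
    .presented = presented,
    .discarded = discarded,
};

/* Per frame, before committing the surface: */
static void request_feedback(struct wp_presentation *p, struct wl_surface *s)
{
    struct wp_presentation_feedback *fb = wp_presentation_feedback(p, s);
    wp_presentation_feedback_add_listener(fb, &feedback_listener, NULL);
}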
Post by Daniel Stone
I think it's large enough that it warrants a split of gl-renderer and
compositor-drm, rather than trying to shoehorn them into the same
file. There's going to be quite some complexity hiding between the
synchronise-with-client-event-stream and direct-scanout boxes, that
will push it over the limit of what's tractable. Those files are
already pretty huge and complex.
Would it be better to wait until such complexities arise in future patches
and split the files at that point, or would you prefer we split the backends
now? Perhaps I'm just more optimistic about the complexity, but it seems
like it would be easier to evaluate once that currently-hypothetical portion
of the code exists.
Well, there were quite a few issues with the previous set of patches,
and honestly I'm expecting just resolving those to bring enough
complexity to require a three-way split (common, Streams, and
EGLImage/GBM), let alone the features you're talking about solving
with Streams: direct scanout via retargeting of Streams, etc.
Post by Daniel Stone
I share the hope, and maybe with the WSI and Streams available, we can
design future window systems and display control APIs towards
something like that. But at the moment, the impedance mismatch between
Streams and the (deliberately very different) Wayland and KMS APIs is
already fairly glaring. The winsys support is absolutely trivial to
write, and as winsys interactions only get more featureful and
complex, the common stream protocol will have to follow suit.
If I was starting from the position of the EGL ideal: that everything
is EGL, and the only external interactions are creating native types
for it, then I would surely arrive at the same position as you. But
everything we've seen so far - and again, ChromeOS have taken this to
a much further extent - has been chipping away at EGL, rather than
putting more into it, and this has been for the better.
The direction ChromeOS is taking is even more problematic, and I'd hate to
see it being held up as an example of proper design direction. We spent a
good deal of time working with Google to support ChromeOS and ended up
essentially allowing them to punch through the driver abstraction via very
opaque EGL extensions that no engineer besides the extension authors could
be expected to use correctly, and embed HW-specific knowledge within some
component of ChromeOS, such that it will likely only run optimally on a
single generation of our hardware and will need to be revisited. That's the
type of problem we're trying to avoid here. ChromeOS has made other design
compromises that cost us (and I suspect other vendors) 10-20% performance
across the board to optimize for a very specific use case (i.e., a browser)
and within very constrained schedules. It is not the right direction for
OS<->graphics driver interactions to evolve.
Direction and extent are two very different things: I largely agree
with their direction (less encapsulation inside vendor drivers), and
disagree on the extent to which they've taken it.
Post by Daniel Stone
I don't think that's a difference we'll ever resolve though.
I believe thus far we've all tried to focus objectively on specific issues,
proposed solutions for them, and the merits of those solutions. Weston and
the other Wayland compositors I'm aware of are based on EGL at the moment,
so regardless of its merits as an API it doesn't seem problematic purely
from a dependency standpoint to add EGLStream as an option next to the
existing EGLImage and EGLDisplay+GBM paths. I'm certainly willing to
continue discussing the merits of EGL on a broader scale, but does that
discussion need to block the patches proposed here?
Every additional codepath has its cost. Even if you just look at
Mutter and Weston in a vacuum, it seems like it'll be quite the large
patchset(s) by the time it's done, let alone extending it out to all
the other compositors. This is a patchset which will need constant
care and feeding: if it's not tested, it's broken. Right now, there is
only one Streams implementation available, which is in a driver whose
legal status is seen to be sufficiently problematic that it is not
generally distributed by downstreams, which requires a whole set of
external kernel patches to run. So even getting it to run is
non-trivial.

But then we'd have to do that in such a way that it was generally
available, else any refactoring or changes we wanted to do internally
would have to be blocked on testing/review from someone who knew that
backend well enough. Either that, or it would just get broken.
Introducing these codepaths has a very, very real cost to the
projects you're talking about.

You could quite rightly point to the Raspberry Pi DispManX backend as
an example of the same, and you'd be right. And that's why I'm
extremely enthused about how their new KMS/GBM driver allows us to
nuke the entire backend from orbit, and reduce our testing load by
shifting them to the generic driver.

Cheers,
Daniel
James Jones
2016-05-03 18:44:51 UTC
Permalink
Post by Daniel Stone
Hi James,
Post by Daniel Stone
Post by James Jones
With new Wayland protocol, patches to all Wayland compositors to send proper
hints to clients using this protocol, improvements to GBM, and updates to
both of these when new GPU architectures introduced new requirements, what
you describe could do anything streams can do. However, then the problem
will have been solved only in the context of top-of-tree Wayland and Weston.
This doesn't require explicit/new compositor interaction at all.
Extensions can be done within the gbm/EGL bundle itself (via
EGL_WL_bind_wayland_display), so you're only changing one DSO (or DSO
bundle), and the API usage there today does seem to stand up. Given
that the protocol is private - I'm certainly not advocating for a
DRI2-style all-things-to-all-hardware standard protocol to communicate
this - and that it's localised in a vendor bundle, it seems completely
widely applicable to me. As someone who's writing this from
Mutter/Wayland/GBM, I'm certainly not interested in Weston-only
solutions.
No, the necessary extensions cannot be contained within the binding. There
is not enough information within the driver layer alone. Something needs to
tell the driver when the configuration changes (e.g., the consumer of a
wayland surface switches from a texture to a plane) and what the new
configuration is. This would trigger the protocol notifications &
subsequent optimization within the driver. By the nature of their API,
streams would require the compositor to take action on such configuration
changes, and streams can discover the new configuration. Something
equivalent would be required to make this work in the GBM+wl_drm/EGL case.
I don't think this is the case. As I went through with Andy, we
_already_ have intent expressed in the GBM case, in the exact same way
that EGLStreams does: consider gbm_bo_import as equivalent for
attaching to an EGLOutput(Layer) consumer, and EGLImage import +
TargetTexture2D as equivalent for attaching a gltexture consumer.
"Will be used for display on device X" is not sufficient information, as
Daniel Vetter outlined.
Post by Daniel Stone
This
is the exact same proxy for intent to display, and in fact the GBM
approach is slightly more flexible, because it allows you to both do
direct scanout as well as GPU composition (e.g. if you're
capturing/streaming at the same time as display).
Again though, without stream-retargeting, this is not something which
exists in Streams today, and doing so is going to require more
extensions: more code in your driver, more code in every
implementation. GBM today, for all its faults, does not require
further API extension to make this work.
Agreed. We're working on similar flexibility for streams via an
EGLSwitch muxing extension. As mentioned above, GBM would require API
extensions and driver updates to reach the expressiveness of streams as
well though.
Post by Daniel Stone
Further, as a driver vendor, the idea of requiring even in-driver
platform-specific modifications for this sounds undesirable. If it was
something that could be contained entirely within GBM, that would be
interesting. However, distributing the architecture-specific code
throughout the window-system specific code in the driver means a lot more
maintenance burden in a world with X, Chrome OS, Wayland, and several
others.
This would hold true if Streams was a perfect encapsulation, but I
don't really see how doing so adds any burden over layering the
winsys/platform layer over Streams in the first place. I mean, you've
written Wayland bindings for Streams in the first place ... how would
this be too much different? Even if the protocol is designed to be the
perfect transport for Streams, you _still_ need transport bindings to
your target protocol.
We wrote the wayland protocol as an example of what is possible using
streams, and we intend to open-source it. Presumably window-system
authors would write the protocol for other windowing systems. Further,
since streams would encapsulate all the device-specific stuff, the
protocol library wouldn't require as much maintenance as a
driver-specific protocol library.

In a world with only Wayland, yes, we'd be doing slightly more work to
bootstrap streams support than we would to support GBM+wayland.
However, other windowing systems and stream use cases exist.

What streams exposes is intended to lower the amount of stuff hidden in
drivers, not increase it. Streams is a generic swapchain mechanism
exposed to any user, whereas we would need to write something
proprietary (maybe open source, maybe closed source, but NVIDIA-specific
nonetheless) for each window system to get equivalent performance if
we pushed the abstraction to a lower level.
Post by Daniel Stone
Post by Daniel Stone
Certainly there are, but then again, there are far more usecases than
EGL. Looking at media playback, Vulkan, etc, where you don't have EGL
yet need to solve the same problems.
EGLStreams, Vulkan swapchains, and (for example) VDPAU presentation queues
are all varying levels of abstraction on top of the same thing within the
driver: a presentation engine or buffer queue, depending on whether the
target is a physical output or a compositor. These API-level components can
be hooked up to each other as long as the lower-level details are fully
contained within the driver abstraction. A Vulkan swapchain can be
internally implemented as an EGLStream producer, for example. In fact,
Vulkan swapchains borrow many ideas directly and indirectly from EGLStream.
Indeed, I noted the similarity, but primarily for the device_swapchain
extension.
Post by Daniel Stone
I agree, and I'm not arguing this to be on the application or
compositor side either. I believe the GBM and HWC suggestions are
entirely doable, and further that these problems will need to be
solved outside EGL anyway, for the other usecases. My worry - quite
aside from how vendors who struggle to produce a conformant EGL 1.4
implementation today will ever implement the complexity of Streams,
though this isn't your problem - is that EGL is really the wrong place
to be solving this.
Could you elaborate on what the other usecases are? If you mean the
Vulkan/media playback cases mentioned above, then I don't see what is
fundamentally wrong about using EGL as a backend within the window system
for those. If a Vulkan application needs to display on an EGL+GLES-based
Wayland compositor, there will be some point where a transition is made from
Vulkan -> EGL+GLES regardless.
Media falls down because currently there is no zerocopy binding from
either hardware or software media decode engines. Perhaps not the case
on your hardware, unusually blessed with a great deal of memory
bandwidth, but a great many devices physically cannot cope with a
single copy in the pipeline, given the ratio of content size to memory
bandwidth. Doing this in EGL would require a 'draw' step which simply
presented an existing buffer - a step which would unnecessarily
involve the GPU if the pipeline is direct from decode to scanout - or
it would involve having every media engine write their own bindings to
the Streams protocol.
Right. Streams are meant to support lots of different producers and
consumers.
Post by Daniel Stone
There are also incredibly exacting timing requirements for media
display, which the Streams model of 'single permanently fixed latency'
does not even come close to achieving. So for that you'd need another
extension, to report actual achieved timings back. Wayland today
fulfills these requirements with the zlinux_dmabuf and
presentation_timing protocols, with the original hardware timings fed
back through KMS.
Would it be reasonable to support such existing extensions while using
streams?
Post by Daniel Stone
Post by Daniel Stone
I think it's large enough that it warrants a split of gl-renderer and
compositor-drm, rather than trying to shoehorn them into the same
file. There's going to be quite some complexity hiding between the
synchronise-with-client-event-stream and direct-scanout boxes, that
will push it over the limit of what's tractable. Those files are
already pretty huge and complex.
Would it be better to wait until such complexities arise in future patches
and split the files at that point, or would you prefer we split the backends
now? Perhaps I'm just more optimistic about the complexity, but it seems
like it would be easier to evaluate once that currently-hypothetical portion
of the code exists.
Well, there were quite a few issues with the previous set of patches,
and honestly I'm expecting just resolving those to bring enough
complexity to require a three-way split (common, Streams, and
EGLImage/GBM), let alone the features you're talking about solving
with Streams: direct scanout via retargeting of Streams, etc.
Post by Daniel Stone
I share the hope, and maybe with the WSI and Streams available, we can
design future window systems and display control APIs towards
something like that. But at the moment, the impedance mismatch between
Streams and the (deliberately very different) Wayland and KMS APIs is
already fairly glaring. The winsys support is absolutely trivial to
write, and as winsys interactions only get more featureful and
complex, the common stream protocol will have to follow suit.
If I was starting from the position of the EGL ideal: that everything
is EGL, and the only external interactions are creating native types
for it, then I would surely arrive at the same position as you. But
everything we've seen so far - and again, ChromeOS have taken this to
a much further extent - has been chipping away at EGL, rather than
putting more into it, and this has been for the better.
The direction ChromeOS is taking is even more problematic, and I'd hate to
see it being held up as an example of proper design direction. We spent a
good deal of time working with Google to support ChromeOS and ended up
essentially allowing them to punch through the driver abstraction via very
opaque EGL extensions that no engineer besides the extension authors could
be expected to use correctly, and embed HW-specific knowledge within some
component of ChromeOS, such that it will likely only run optimally on a
single generation of our hardware and will need to be revisited. That's the
type of problem we're trying to avoid here. ChromeOS has made other design
compromises that cost us (and I suspect other vendors) 10-20% performance
across the board to optimize for a very specific use case (i.e., a browser)
and within very constrained schedules. It is not the right direction for
OS<->graphics driver interactions to evolve.
Direction and extent are two very different things: I largely agree
with their direction (less encapsulation inside vendor drivers), and
disagree on the extent to which they've taken it.
That's a very good point. I agree minimal encapsulation is a good goal.
Post by Daniel Stone
Post by Daniel Stone
I don't think that's a difference we'll ever resolve though.
I believe thus far we've all tried to focus objectively on specific issues,
proposed solutions for them, and the merits of those solutions. Weston and
the other Wayland compositors I'm aware of are based on EGL at the moment,
so regardless of its merits as an API it doesn't seem problematic purely
from a dependency standpoint to add EGLStream as an option next to the
existing EGLImage and EGLDisplay+GBM paths. I'm certainly willing to
continue discussing the merits of EGL on a broader scale, but does that
discussion need to block the patches proposed here?
Every additional codepath has its cost. Even if you just look at
Mutter and Weston in a vacuum, it seems like it'll be quite the large
patchset(s) by the time it's done, let alone extending it out to all
the other compositors. This is a patchset which will need constant
care and feeding: if it's not tested, it's broken. Right now, there is
only one Streams implementation available, which is in a driver whose
legal status is seen to be sufficiently problematic that it is not
generally distributed by downstreams, which requires a whole set of
external kernel patches to run. So even getting it to run is
non-trivial.
But then we'd have to do that in such a way that it was generally
available, else any refactoring or changes we wanted to do internally
would have to be blocked on testing/review from someone who knew that
backend well enough. Either that, or it would just get broken.
Introducing these codepaths has a very, very real cost to the
projects you're talking about.
If there were an open source implementation of streams, would that
affect your view?

Agreed, all new code, and especially significant new branches in code,
has costs. However, a balance always needs to be struck.
Post by Daniel Stone
You could quite rightly point to the Raspberry Pi DispManX backend as
an example of the same, and you'd be right. And that's why I'm
extremely enthused about how their new KMS/GBM driver allows us to
nuke the entire backend from orbit, and reduce our testing load by
shifting them to the generic driver.
I hope we can avoid an entirely forked compositor-drm/eglstream (and
especially gl-renderer) for these reasons. The majority of the code is
still common and would be exercised using either path.

Thanks,
-James
Post by Daniel Stone
Cheers,
Daniel
Daniel Stone
2016-05-04 15:56:01 UTC
Permalink
Hi,
Interleaving both replies ...
Post by James Jones
Post by Daniel Stone
No, the necessary extensions cannot be contained within the binding. There
is not enough information within the driver layer alone. Something needs to
tell the driver when the configuration changes (e.g., the consumer of a
wayland surface switches from a texture to a plane) and what the new
configuration is. This would trigger the protocol notifications &
subsequent optimization within the driver. By the nature of their API,
streams would require the compositor to take action on such configuration
changes, and streams can discover the new configuration. Something
equivalent would be required to make this work in the GBM+wl_drm/EGL case.
I don't think this is the case. As I went through with Andy, we
_already_ have intent expressed in the GBM case, in the exact same way
that EGLStreams does: consider gbm_bo_import as equivalent for
attaching to an EGLOutput(Layer) consumer, and EGLImage import +
TargetTexture2D as equivalent for attaching a gltexture consumer.
"Will be used for display on device X" is not sufficient information, as
Daniel Vetter outlined.
Indeed, but nothing we have - including both the initial Streams
patchset, and the subsequent proposals for adding muxing as well as
KMS config passthrough - is sufficient for that.

The Streams Check/Commit proposal you outlined a couple of mails ago
isn't sufficient because you often need to know global configuration
to determine if a configuration is even usable, let alone optimal:
shared decompression/detiling units, global bandwidth/watermark
limits, etc. Having just one entrypoint to Streams where it gets very
limited information about each plane that Streams is using isn't
enough, because it needs to know the global configuration.

So to actually make this work on other hardware, you'd need to pass
the full request (including content which came via other sources, e.g.
dmabuf) through to Streams. And by the time you're handing your entire
scene graph off to an external component to determine the optimal
configuration ... well, that's basically HWC.

I'm also not sure what the plan is for integrating with Vulkan
compositors: does that end up as an interop extension? Does VK WSI
gain an equivalent which allows you to mux swapchain/device_swapchain?
(Similar questions for the Check/Commit API really.)
Post by James Jones
Post by Daniel Stone
This
is the exact same proxy for intent to display, and in fact the GBM
approach is slightly more flexible, because it allows you to both do
direct scanout as well as GPU composition (e.g. if you're
capturing/streaming at the same time as display).
Again though, without stream-retargeting, this is not something which
exists in Streams today, and doing so is going to require more
extensions: more code in your driver, more code in every
implementation. GBM today, for all its faults, does not require
further API extension to make this work.
Agreed. We're working on similar flexibility for streams via an EGLSwitch
muxing extension. As mentioned above, GBM would require API extensions and
driver updates to reach the expressiveness of streams as well though.
Right - but as with the point I was making below, GBM _right now_ is
more capable than Streams _right now_. GBM right now would require API
additions to match EGLStreams + EGLSwitch + Streams/KMS-interop, but
the last two aren't written either, so. (More below.)
Post by James Jones
What streams exposes is intended to lower the amount of stuff hidden in
drivers, not increase it. Streams is a generic swapchain mechanism exposed
to any user, whereas we would need to write something proprietary (maybe
open source, maybe closed source, but NVIDIA-specific none the less) for
each window system to get equivalent performance if we pushed the
abstraction to a lower level.
Hm, I'm not quite sure how this adds up. Streams + Switch +
Streams/KMS interop is a _lot_ of complexity that gets buried in
drivers, with no external visibility. I don't doubt your ability to
get it right, but I _do_ doubt the ability of others to get this
right. As you say, Streams is intended to make these problems go away,
but they don't disappear, they just shift elsewhere. I worry that, by
the time you're done building out all the capability you're talking
about on top of Streams, we'll end up with a spec that will be
interpreted and implemented quite differently by every vendor.
Post by James Jones
Post by Daniel Stone
Media falls down because currently there is no zerocopy binding from
either hardware or software media decode engines. Perhaps not the case
on your hardware, unusually blessed with a great deal of memory
bandwidth, but a great many devices physically cannot cope with a
single copy in the pipeline, given the ratio of content size to memory
bandwidth. Doing this in EGL would require a 'draw' step which simply
presented an existing buffer - a step which would unnecessarily
involve the GPU if the pipeline is direct from decode to scanout - or
it would involve having every media engine write their own bindings to
the Streams protocol.
Right. Streams are meant to support lot's of different producers and
consumers.
Have you looked much at the media landscape, and discussed it with
relevant projects - GStreamer, Kodi/XBMC, etc?
Post by James Jones
Post by Daniel Stone
There are also incredibly exacting timing requirements for media
display, which the Streams model of 'single permanently fixed latency'
does not even come close to achieving. So for that you'd need another
extension, to report actual achieved timings back. Wayland today
fulfills these requirements with the zlinux_dmabuf and
presentation_timing protocols, with the original hardware timings fed
back through KMS.
Would it be reasonable to support such existing extensions while using
streams?
Again, you'd need to add quite a bit of new API to Streams. In
particular, every frame would need to gain two EGL objects: one for
the producer which could be used to obtain presentation feedback, and
one for the consumer which could be used to submit presentation
feedback. And with this, you bang hard into EGL's lack of signalling,
unless clients are expected to either poll or spin up a separate
thread just to block.
Post by James Jones
Post by Daniel Stone
Every additional codepath has its cost. Even if you just look at
Mutter and Weston in a vacuum, it seems like it'll be quite the large
patchset(s) by the time it's done, let alone extending it out to all
the other compositors. This is a patchset which will need constant
care and feeding: if it's not tested, it's broken. Right now, there is
only one Streams implementation available, which is in a driver whose
legal status is seen to be sufficiently problematic that it is not
generally distributed by downstreams, which requires a whole set of
external kernel patches to run. So even getting it to run is
non-trivial.
But then we'd have to do that in such a way that it was generally
available, else any refactoring or changes we wanted to do internally
would have to be blocked on testing/review from someone who knew that
backend well enough. Either that, or it would just get broken.
Introducing these codepaths has a very, very real cost to the
projects you're talking about.
If there were an open source implementation of streams, would that affect
your view?
It would definitely make it significantly easier, especially as we
work towards things like continuous integration (see kernelci.org -
and then extend that upwards a bit). Something that is open, doesn't
require non-mainline kernels (or at least has a path where you can see
it working towards running on mainline), runs on real hardware, etc,
would really make it much easier.
Post by James Jones
Post by Daniel Stone
You could quite rightly point to the Raspberry Pi DispManX backend as
an example of the same, and you'd be right. And that's why I'm
extremely enthused about how their new KMS/GBM driver allows us to
nuke the entire backend from orbit, and reduce our testing load by
shifting them to the generic driver.
I hope we can avoid an entirely forked compositor-drm/eglstream (and
especially gl-renderer) for these reasons. The majority of the code is
still common and would be exercised using either path.
Oh, I'm talking about a three-way split: gl-renderer-common.c,
gl-renderer-eglimage.c, gl-renderer-eglstreams.c, and the same for
compositor-drm.c. It's not reasonable to require you to write your own
DRM backlight property handling, or Weston -> GL scene-graph
transformation handling.
Post by James Jones
Post by Daniel Stone
It is unfortunate that you seem to discuss 'Streams' as an abstract
concept of a cross-process swapchain which can be infinitely adjusted
to achieve perfection, and yet 'GBM' gets discussed as a singular
fixed-in-time thing which has all the flaws of just one of its
particular platform implementations.
I have a stronger understanding of the design direction for streams than I
do for GBM, and EGLStream is indeed intended to evolve towards the best
abstraction of a swapchain possible. My views of GBM are based on the
current API. I'm not that familiar with the Mesa implementation details.
I'd be happy to learn more about the direction the GBM API is taking in the
future, and that's half of what I was attempting to do in my
responses/questions here.
Well, this thread is hopefully shaping it!
Post by James Jones
Post by Daniel Stone
I don't see how GBM could really perform any worse in such a design.
The current GBM API is not expressive enough to support optimal buffer
allocation (at least on our hardware) in such a design.
Currently, that's objectively true of both GBM and Streams. Both are
going to need extension to work as hoped.

Cheers,
Daniel
James Jones
2016-05-11 20:43:45 UTC
Permalink
Post by Daniel Stone
Hi,
Interleaving both replies ...
Post by James Jones
Post by Daniel Stone
No, the necessary extensions cannot be contained within the binding. There
is not enough information within the driver layer alone. Something needs to
tell the driver when the configuration changes (e.g., the consumer of a
wayland surface switches from a texture to a plane) and what the new
configuration is. This would trigger the protocol notifications &
subsequent optimization within the driver. By the nature of their API,
streams would require the compositor to take action on such configuration
changes, and streams can discover the new configuration. Something
equivalent would be required to make this work in the GBM+wl_drm/EGL case.
I don't think this is the case. As I went through with Andy, we
_already_ have intent expressed in the GBM case, in the exact same way
that EGLStreams does: consider gbm_bo_import as equivalent for
attaching to an EGLOutput(Layer) consumer, and EGLImage import +
TargetTexture2D as equivalent for attaching a gltexture consumer.
"Will be used for display on device X" is not sufficient information, as
Daniel Vetter outlined.
Indeed, but nothing we have - including both the initial Streams
patchset, and the subsequent proposals for adding muxing as well as
KMS config passthrough - is sufficient for that.
The Streams Check/Commit proposal you outlined a couple of mails ago
isn't sufficient because you often need to know global configuration
to determine if a configuration is even usable, let alone optimal:
shared decompression/detiling units, global bandwidth/watermark
limits, etc. Having just one entrypoint to Streams where it gets very
limited information about each plane that Streams is using isn't
enough, because it needs to know the global configuration.
So to actually make this work on other hardware, you'd need to pass
the full request (including content which came via other sources, e.g.
dmabuf) through to Streams. And by the time you're handing your entire
scene graph off to an external component to determine the optimal
configuration ... well, that's basically HWC.
I'm sorry for mixing them up again by alluding to Daniel Vetter's
statement, but there are two separate things being discussed here:

- A fully optimal scene graph. This is important, but not solved by
streams alone. Streams could work as one of several building blocks in
a solution for this.

- Optimal presentation and allocation of buffers between two endpoints
(i.e., optimizing frame allocation and delivery for what Weston can do
right now). My claim was that current streams solve this, while current
GBM does not provide enough information for even this optimization.

Solving the global scene graph optimization problem is important, but
will require additional work. The incremental gains from using streams
(worth around 10% raw throughput on Kepler-based NVIDIA GPUs, for
example; supposedly more on later hardware, though I've not yet
benchmarked there directly) should not be ignored just because they
don't achieve perfection in a single step. Incremental improvements are
still valuable.
Post by Daniel Stone
I'm also not sure what the plan is for integrating with Vulkan
compositors: does that end up as an interop extension? Does VK WSI
gain an equivalent which allows you to mux swapchain/device_swapchain?
(Similar questions for the Check/Commit API really.)
Yes, if an EGL-based client was presenting to a Vulkan-based compositor,
interop would be happening somewhere. Either yet-to-be-developed Vulkan
primitives could be used to implement the wayland-egl library with
interop on the client side, or EGLStreams could be used to implement the
wayland-egl library with interop on the server side. Or there could be
EGL->(wl_drm)->Vulkan, which is essentially 2 interop steps, but that
has the same shortcomings we've been discussing for the current
EGL->(wl_drm)->EGL/GBM+DRM situation.
Post by Daniel Stone
Post by James Jones
Post by Daniel Stone
This
is the exact same proxy for intent to display, and in fact the GBM
approach is slightly more flexible, because it allows you to both do
direct scanout as well as GPU composition (e.g. if you're
capturing/streaming at the same time as display).
Again though, without stream-retargeting, this is not something which
exists in Streams today, and doing so is going to require more
extensions: more code in your driver, more code in every
implementation. GBM today, for all its faults, does not require
further API extension to make this work.
Agreed. We're working on similar flexibility for streams via an EGLSwitch
muxing extension. As mentioned above, GBM would require API extensions and
driver updates to reach the expressiveness of streams as well though.
Right - but as with the point I was making below, GBM _right now_ is
more capable than Streams _right now_. GBM right now would require API
additions to match EGLStreams + EGLSwitch + Streams/KMS-interop, but
the last two aren't written either, so. (More below.)
The current behavior that enables this, where basically all Wayland
buffers must be allocated as scanout-capable, isn't reasonable on NVIDIA
hardware. The requirements for scanout are too onerous. I'm sure it
works in demos on nouveau, but it's not realistic for a production driver.
Post by Daniel Stone
Post by James Jones
What streams exposes is intended to lower the amount of stuff hidden in
drivers, not increase it. Streams is a generic swapchain mechanism exposed
to any user, whereas we would need to write something proprietary (maybe
open source, maybe closed source, but NVIDIA-specific none the less) for
each window system to get equivalent performance if we pushed the
abstraction to a lower level.
Hm, I'm not quite sure how this adds up. Streams + Switch +
Streams/KMS interop is a _lot_ of complexity that gets buried in
drivers, with no external visibility. I don't doubt your ability to
get it right, but I _do_ doubt the ability of others to get this
right. As you say, Streams is intended to make these problems go away,
but they don't disappear, they just shift elsewhere.
I agree with much of the above, but I don't think it's at odds with my
statement.

Yes, something still needs to solve the problem of which type of buffer
is best for the combination of producer X and consumer Y. However, this
is always going to be hardware-specific, so a vendor-specific backend is
going to be the best place for it regardless of where that backend
lives. EGLSwitch/supporting multiple possible consumers with one
preferred one just makes that decision more complex, but doesn't change
the HW-specific nature of the process.

Something needs to handle the operations that prepare a buffer for use
on consumer Y after producer X has completed its work, and vice-versa.
Again, what exactly those operations are is HW-specific, so they're
going to live in HW-specific portions of the library (eglSwapBuffers(),
or the Vulkan layout transitions + memory barriers).
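As a sketch of what one such producer-to-consumer transition looks like
at the API level on the Vulkan side (standard core API, nothing
vendor-specific):

#include <vulkan/vulkan.h>

/* Move an image from "being rendered by the client" to "readable by
 * the presentation engine". What the barrier does underneath -- cache
 * maintenance, layout changes -- is exactly the HW-specific part. */
void transition_for_present(VkCommandBuffer cmd, VkImage image)
{
    VkImageMemoryBarrier barrier = {
        .sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
        .srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
        .dstAccessMask = VK_ACCESS_MEMORY_READ_BIT,
        .oldLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
        .newLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
        .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .image = image,
        .subresourceRange = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 },
    };

    vkCmdPipelineBarrier(cmd,
                         VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
                         VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
                         0, 0, NULL, 0, NULL, 1, &barrier);
}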

The KMS interactions are trivial: Filling in some framebuffer attributes
on an atomic request. The rest of the atomic request setup could still
be done non-opaquely since, as you've pointed out, EGLStreams don't
solve the overall configuration optimization problem.
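That is, on the order of the following (a sketch; the plane's FB_ID
property ID is assumed to have been looked up beforehand via
drmModeObjectGetProperties, and in the streams case the framebuffer
value would come from the stream's most recently acquired frame):

#include <xf86drm.h>
#include <xf86drmMode.h>
#include <stdint.h>

/* Only the FB_ID value is stream-owned; the rest of the atomic request
 * stays entirely in the compositor's hands. */
int flip_plane(int fd, uint32_t plane_id, uint32_t prop_fb_id,
               uint32_t fb_id, void *user_data)
{
    drmModeAtomicReq *req = drmModeAtomicAlloc();
    int ret;

    drmModeAtomicAddProperty(req, plane_id, prop_fb_id, fb_id);
    /* ... other plane/crtc/connector properties would be added here ... */
    ret = drmModeAtomicCommit(fd, req, DRM_MODE_PAGE_FLIP_EVENT,
                              user_data);
    drmModeAtomicFree(req);
    return ret;
}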

Comparing:

(a) The minimal set (or as close to it as possible) of HW-specific
operations encapsulated in one object (a stream) that can be re-used
across various higher-level projects.

(b) Implementing several similar but slightly different window system
integration modules in each driver along with the above necessary
encapsulations.

It seems to me that (a) results in less overall encapsulation.
Post by Daniel Stone
I worry that, by
the time you're done building out all the capability you're talking
about on top of Streams, we'll end up with a spec that will be
interpreted and implemented quite differently by every vendor.
The same could be said of any standard or API that attempts to address a
complex use case. We could agree to require standardized testing at the
Khronos level (it wouldn't be the first time EGL conformance was
suggested), or unofficially require piglit tests for the necessary
stream extensions if that would help. Arguably, Weston could act as the
de facto conformance test too, though.
Post by Daniel Stone
Post by James Jones
Post by Daniel Stone
Media falls down because currently there is no zerocopy binding from
either hardware or software media decode engines. Perhaps not the case
on your hardware, unusually blessed with a great deal of memory
bandwidth, but a great many devices physically cannot cope with a
single copy in the pipeline, given the ratio of content size to memory
bandwidth. Doing this in EGL would require a 'draw' step which simply
presented an existing buffer - a step which would unnecessarily
involve the GPU if the pipeline is direct from decode to scanout - or
it would involve having every media engine write their own bindings to
the Streams protocol.
Right. Streams are meant to support lots of different producers and
consumers.
Have you looked much at the media landscape, and discussed it with
relevant projects - GStreamer, Kodi/XBMC, etc?
I haven't personally. Others at NVIDIA are working on the multimedia
aspects of streams.
Post by Daniel Stone
Post by James Jones
Post by Daniel Stone
There are also incredibly exacting timing requirements for media
display, which the Streams model of 'single permanently fixed latency'
does not even come close to achieving. So for that you'd need another
extension, to report actual achieved timings back. Wayland today
fulfills these requirements with the zlinux_dmabuf and
presentation_timing protocols, with the original hardware timings fed
back through KMS.
Would it be reasonable to support such existing extensions while using
streams?
Again, you'd need to add quite a bit of new API to Streams. In
particular, every frame would need to gain two EGL objects: one for
the producer which could be used to obtain presentation feedback, and
one for the consumer which could be used to submit presentation
feedback. And with this, you bang hard into EGL's lack of signalling,
unless clients are expected to either poll or spin up a separate
thread just to block.
The existing feedback mechanisms couldn't be used alongside streams
without integrating them into EGL? Streams just deliver frames, but it
should be possible to correlate those frames with some external
mechanism providing feedback on them.
Post by Daniel Stone
Post by James Jones
Post by Daniel Stone
Every additional codepath has its cost. Even if you just look at
Mutter and Weston in a vacuum, it seems like it'll be quite the large
patchset(s) by the time it's done, let alone extending it out to all
the other compositors. This is a patchset which will need constant
care and feeding: if it's not tested, it's broken. Right now, there is
only one Streams implementation available, which is in a driver whose
legal status is seen to be sufficiently problematic that it is not
generally distributed by downstreams, which requires a whole set of
external kernel patches to run. So even getting it to run is
non-trivial.
But then we'd have to do that in such a way that it was generally
available, else any refactoring or changes we wanted to do internally
would have to be blocked on testing/review from someone who knew that
backend well enough. Either that, or it would just get broken.
Introducing these codepaths has a very, very real cost to the
projects you're talking about.
If there were an open source implementation of streams, would that affect
your view?
It would definitely make it significantly easier, especially as we
work towards things like continuous integration (see kernelci.org -
and then extend that upwards a bit). Something that is open, doesn't
require non-mainline kernels (or at least has a path where you can see
it working towards running on mainline), runs on real hardware, etc,
would really make it much easier.
Post by James Jones
Post by Daniel Stone
You could quite rightly point to the Raspberry Pi DispManX backend as
an example of the same, and you'd be right. And that's why I'm
extremely enthused about how their new KMS/GBM driver allows us to
nuke the entire backend from orbit, and reduce our testing load by
shifting them to the generic driver.
I hope we can avoid an entirely forked compositor-drm/eglstream (and
especially gl-renderer) for these reasons. The majority of the code is
still common and would be exercised using either path.
Oh, I'm talking about a three-way split: gl-renderer-common.c,
gl-renderer-eglimage.c, gl-renderer-eglstreams.c, and the same for
compositor-drm.c. It's not reasonable to require you to write your own
DRM backlight property handling, or Weston -> GL scene-graph
transformation handling.
That does sound like a reasonable direction. Would you consider such a
refactoring palatable?
Post by Daniel Stone
Post by James Jones
Post by Daniel Stone
It is unfortunate that you seem to discuss 'Streams' as an abstract
concept of a cross-process swapchain which can be infinitely adjusted
to achieve perfection, and yet 'GBM' gets discussed as a singular
fixed-in-time thing which has all the flaws of just one of its
particular platform implementations.
I have a stronger understanding of the design direction for streams than I
do for GBM, and EGLStream is indeed intended to evolve towards the best
abstraction of a swapchain possible. My views of GBM are based on the
current API. I'm not that familiar with the Mesa implementation details.
I'd be happy to learn more about the direction the GBM API is taking in the
future, and that's half of what I was attempting to do in my
responses/questions here.
Well, this thread is hopefully shaping it!
Post by James Jones
Post by Daniel Stone
I don't see how GBM could really perform any worse in such a design.
The current GBM API is not expressive enough to support optimal buffer
allocation (at least on our hardware) in such a design.
Currently, that's objectively true of both GBM and Streams. Both are
going to need extension to work as hoped.
Yes. Given more work is needed (a lot more, apparently), my hope is to
leverage that work as broadly as possible. I hope NVIDIA's statements
thus far have shown that a solution based on streams is more valuable in
that regard than a solution spread across EGL, Wayland protocol, and GBM.

Thanks,
-James
Post by Daniel Stone
Cheers,
Daniel
Daniel Stone
2016-05-11 21:31:21 UTC
Permalink
Hi James,
Post by Daniel Stone
Right - but as with the point I was making below, GBM _right now_ is
more capable than Streams _right now_. GBM right now would require API
additions to match EGLStreams + EGLSwitch + Streams/KMS-interop, but
the last two aren't written either, so. (More below.)
The current behavior that enables this, where basically all Wayland buffers
must be allocated as scanout-capable, isn't reasonable on NVIDIA hardware.
The requirements for scanout are too onerous.
I think we're talking past each other, so I'd like to pare the
discussion down to these two sentences, and my two resultant points,
for now:

I posit that the Streams proposal you (plural) have put forward is, at
best, no better at meeting these criteria:
- there is currently no support for direct scanout from client
buffers in Streams, so it must always pessimise towards GPU
composition
- GBM stacks can obviously do the same: implement a no-op
gbm_bo_import, and have your client always allocate non-scanout
buffers - presto, you've matched Streams

I posit that GBM _can_ match the capability of a hypothetical
EGLStreams/EGLSwitch implementation. Current _implementations_ of GBM
cannot, but I posit that it is not a limitation of the API it exposes,
and unlike Streams, the capability can be plumbed in with no new
external API required.

These seem pretty fundamental, so ... am I missing something? :\ If
so, can you please outline fairly specifically how you think
non-Streams implementations are not capable of meeting the criteria in
your two sentences?

Cheers,
Daniel
James Jones
2016-05-11 23:08:13 UTC
Permalink
Post by Daniel Stone
Hi James,
Post by Daniel Stone
Right - but as with the point I was making below, GBM _right now_ is
more capable than Streams _right now_. GBM right now would require API
additions to match EGLStreams + EGLSwitch + Streams/KMS-interop, but
the last two aren't written either, so. (More below.)
The current behavior that enables this, where basically all Wayland buffers
must be allocated as scanout-capable, isn't reasonable on NVIDIA hardware.
The requirements for scanout are too onerous.
I think we're talking past each other, so I'd like to pare the
discussion down to these two sentences, and my two resultant points,
for now:
I posit that the Streams proposal you (plural) have put forward is, at
best, no better at meeting these criteria:
- there is currently no support for direct scanout from client
buffers in Streams, so it must always pessimise towards GPU
composition
- GBM stacks can obviously do the same: implement a no-op
gbm_bo_import, and have your client always allocate non-scanout
buffers - presto, you've matched Streams
I posit that GBM _can_ match the capability of a hypothetical
EGLStreams/EGLSwitch implementation. Current _implementations_ of GBM
cannot, but I posit that it is not a limitation of the API it exposes,
and unlike Streams, the capability can be plumbed in with no new
external API required.
These seem pretty fundamental, so ... am I missing something? :\ If
so, can you please outline fairly specifically how you think
non-Streams implementations are not capable of meeting the criteria in
your two sentences?
I respect the need to rein in the discussion, but I think several
substantive aspects have been lost here. I typed up a much longer
response below, but I'll try to summarize in 4 sentences:

GBM could match the allocation aspects of streams used in Miguel's first
round of patches. However, I disagree that its core API is sufficient
to match the allocation capabilities of EGLStream+EGLSwitch where all
producing and consuming devices+engines are known at allocation time.
Further, streams have additional equally valuable functionality beyond
allocation that GBM does not seem intended to address. Absent
agreement, I believe co-existence of EGLStreams and GBM+wl_drm in
Wayland/Weston is a reasonable path forward in the short term.

The longer version:

GBM alone cannot perform as well as EGLStreams unless it is extended
into something more or less the same as EGLStreams, where it knows
exactly which engines are being used to produce the buffer content (along
with their current configuration), and exactly which
engines/configurations are being used to consume it. This implies
allocating against multiple specific objects, rather than against a device
and a set of allocation modifier flags, and/or importing an external
allocation and hoping it meets the current requirements. From what I
can see, GBM fundamentally understands at most the consumer side of the
equation.
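
To make the contrast concrete, here is everything today's GBM allocation
call can be told (real gbm.h API; the stream-style alternative exists only
as a comment, since no such GBM entry point is defined today):

#include <gbm.h>

/* All the allocator learns: one device, a format, and coarse usage
 * hints. Nothing identifies the producing engine, its configuration,
 * or the full set of consumers. */
static struct gbm_bo *
alloc_composition_buffer(struct gbm_device *gbm)
{
    return gbm_bo_create(gbm, 1920, 1080, GBM_FORMAT_XRGB8888,
                         GBM_BO_USE_RENDERING | GBM_BO_USE_SCANOUT);
}

/* A stream-style allocation would instead be made against the concrete
 * producer and consumer objects themselves, e.g. (hypothetical):
 *     alloc_for(client_render_context, crtc0_primary_plane, gles_texture);
 */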

Suppose, however, that GBM were taught everything streams know implicitly
about all users of the buffers at allocation time. After allocation, GBM is
done with its job, but streams and drivers aren't.

The act of transitioning a buffer from optimal "producer mode" to
optimal "consumer mode" relies on all of that device & configuration
information as well, meaning it would have to be fed into the graphics
driver (EGL or whatever window-system binding is used) by every window
system the driver runs on in order to achieve capabilities equivalent to
EGLStream's.

Fundamentally, the API-level view of individual graphics buffers as raw,
globally coherent and accessible stores of pixels with a static layout is
flawed. Images on a GPU are more of a mutating spill space for a
collection of state describing the side effects of various commands than
a 2D array of pixels. Forcing GPUs to resolve an image to a 2D array of
pixels in any particular layout can be very inefficient. The
GL+GLX/EGL/etc. driver model hides this well, but it breaks down in a
few cases such as EGLImage and GLX_EXT_texture_from_pixmap: the former
never really lived up to its implied potential because of this, and the
latter mostly works only because it has a very limited domain in which
things can be shared, yet it still requires a lot of platform-specific
code to support properly. Vulkan brings a lot more of this out into the
open with its very explicit image state transitions and limitations on
which engines can access an image in any given state, but so far that
applies only within the Vulkan API itself (i.e., strictly on a single GPU
and optionally an associated display engine within the same driver &
process).
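
For illustration, this is the hand-off Vulkan already makes applications
spell out when an image moves from rendering to presentation (standard
Vulkan 1.0 API; the command buffer and image are assumed valid):

#include <vulkan/vulkan.h>

/* Transition a just-rendered color attachment into a layout the
 * presentation engine can consume; nothing about the image is
 * implicitly coherent, so the application describes the hand-off. */
static void
transition_for_present(VkCommandBuffer cmd, VkImage image)
{
    VkImageMemoryBarrier barrier = {
        .sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
        .srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
        .dstAccessMask = VK_ACCESS_MEMORY_READ_BIT,
        .oldLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
        .newLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
        .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .image = image,
        .subresourceRange = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 },
    };
    vkCmdPipelineBarrier(cmd,
                         VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
                         VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
                         0, 0, NULL, 0, NULL, 1, &barrier);
}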

The EGLStream encapsulation takes into consideration the new use cases
EGLImage, GBM, etc. were intended to address, and restores what I
believe to be the minimal amount of the traditional GL+GLX/EGL/etc.
model, while still allowing as much of the flexibility of the "a bunch
of buffers" mental model as possible. We can re-invent that with GBM
API adjustments, a set of restrictions on how the buffers it allocates
can be used, and another layer of metadata being pumped into drivers on
top of that, but I suspect we'd wind up with something that looks very
similar to streams.
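
For concreteness, a rough sketch of that encapsulation using the published
EGL_KHR_stream entry points (this assumes EGL_EGLEXT_PROTOTYPES or the
equivalent eglGetProcAddress plumbing, and a valid display and config):

#define EGL_EGLEXT_PROTOTYPES
#include <EGL/egl.h>
#include <EGL/eglext.h>

/* One stream connects a producer to a consumer; the driver, not the
 * application, decides buffer count, layout, and transitions. */
static EGLSurface
connect_stream(EGLDisplay dpy, EGLConfig config)
{
    static const EGLint surf_attribs[] = {
        EGL_WIDTH, 1920, EGL_HEIGHT, 1080, EGL_NONE,
    };
    EGLStreamKHR stream = eglCreateStreamKHR(dpy, NULL);

    /* Consumer: latch frames into the GL_TEXTURE_EXTERNAL_OES texture
     * bound in the current context (EGL_KHR_stream_consumer_gltexture). */
    eglStreamConsumerGLTextureExternalKHR(dpy, stream);

    /* Producer: an EGLSurface whose eglSwapBuffers() inserts frames
     * into the stream (EGL_KHR_stream_producer_eglsurface). */
    return eglCreateStreamProducerSurfaceKHR(dpy, config, stream,
                                             surf_attribs);
}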

We're both delving into future developments and hypotheticals to some
degree here. If we can't agree now on which direction is best, I
believe the right solution is to allow the two to co-exist and compete
collegially until the benefits of one or the other become more apparent.
The Wayland protocol and Weston compositor were designed in a manner
that makes this as painless as possible. It's not like we're going to
get a ton of Wayland clients that suddenly rely on EGLStream. At worst,
streams lose out and some dead code needs to be deleted from any
compositors that adopted them. As we discussed, there is some
maintenance cost to having two paths, but I believe it is reasonably
contained.

Thanks,
-James
Mike Blumenkrantz
2016-05-11 23:55:35 UTC
Permalink
Post by James Jones
[...]
We're both delving into future developments and hypotheticals to some
degree here. If we can't agree now on which direction is best, I
believe the right solution is to allow the two to co-exist and compete
collegially until the benefits of one or the other become more apparent.
The Wayland protocol and Weston compositor were designed in a manner
that makes this as painless as possible. It's not like we're going to
get a ton of Wayland clients that suddenly rely on EGLStream. At worst,
streams lose out and some dead code needs to be deleted from any
compositors that adopted them. As we discussed, there is some
maintenance cost to having two paths, but I believe it is reasonably
contained.
Hi,

I've been following this thread for some time, and you've raised some
interesting points. This one in particular concerns me, however. As I
understand it, you're proposing a stream-based approach that would exist
alongside the current standard (and universally used) GBM. Additionally, in
order to run on your specific brand of hardware, all toolkit and compositor
authors would need to implement your proposed streams functionality,
otherwise only software rendering would be available?

If this is true then it seems a bit strange to me that, despite still
speaking in hypothetical terms about future developments in both GBM and
streams, you're stating that GBM cannot be improved to match the
functionality of your proposed approach and are instead advocating that
everyone who has already written support for GBM now also support streams.

As someone with more than a casual interest in both toolkit and compositor
development, I'd like to see the best approach succeed, but I don't see any
fundamental blocker to providing the functionality you've described in GBM,
and I'm not overly enthusiastic about someone requiring even more work from
those who write toolkits and compositors, especially when having "full"
Wayland support is already such an enormous undertaking.

If I'm misunderstanding things, I'd appreciate some clarifications.

Thanks,
Mike
James Jones
2016-05-12 00:13:56 UTC
Permalink
Post by Mike Blumenkrantz
[...]
Hi,
I've been following this thread for some time, and you've raised some
interesting points. This one in particular concerns me, however. As I
understand it, you're proposing your stream-based approach which would
exist alongside the current standard (and universally-used) GBM.
Additionally, in order to run on your specific brand of hardware, all
toolkit and compositor authors would need to implement your proposed
streams functionality otherwise only software rendering would be available?
If this is true then it seems a bit strange to me that, despite still
speaking in hypothetical terms about future developments in both GBM and
streams, you're stating that GBM cannot be improved to match the
functionality of your proposed approach and are instead advocating that
everyone who has already written support for GBM now also support streams.
As someone with more than a casual interest in both toolkit and
compositor development, I'd like to see the best approach succeed, but I
don't see any fundamental blocker to providing the functionality you've
described in GBM, and I'm not overly enthusiastic about someone
requiring even more work from those who write toolkits and compositors,
especially when having "full" Wayland support is already such an
enormous undertaking.
If I'm misunderstanding things, I'd appreciate some clarifications.
I understand the concern, and thanks for following the discussion.
Toolkits shouldn't need any modification. Compositors would. The
changes required for compositors are not large.

Changes to all compositors would also be needed to improve GBM and
related software enough to reach functional and performance parity with
EGLStream (or X11, for that matter), and even more invasive changes would
be needed to solve the only loosely related scene-graph optimization
issues raised in this thread; change of some sort is a given. In general,
Wayland is a young standard, and I expect it will continue to evolve and
require updates to its implementations regardless of the issues discussed
here.

While only NVIDIA currently supports streams, this is not an
NVIDIA-specific set of problems, nor is it intended to be an
NVIDIA-specific solution if other vendors adopt the open EGL standards
it is based on.

Thanks,
-James
Carsten Haitzler (The Rasterman)
2016-05-12 01:56:33 UTC
Permalink
Post by James Jones
[...]
I understand the concern, and thanks for following the discussion.
Toolkits shouldn't need any modification. Compositors would. The
changes required for compositors are not large.
actually for us toolkits do need mods, because the toolkit is ALSO used on
the compositor side and thus would have to support eglstreams as a SOURCE.
we don't render by hand in the compositor - we punt it back into the
toolkit (this effectively makes it easier to write compositors, since the
toolkit can deal both with producing output for a wayland compositor and
with consuming that output from clients, and can then ALSO render it or
pass it on to drm/kms etc.).

so just saying your assumption here is wrong. this digs deep into the
toolkit too.
Post by James Jones
Changes to all compositors would also be needed to improve GBM and
related software enough to reach functional and performance parity with
EGLStream (or X11, for that matter) [...] change of some sort is a given.
but given the state of things, we'd be left with having both a gbm path and
an eglstreams path and would have to runtime-select based on the driver.
this means maintaining both, and sooner or later the lesser-used one
bitrots. we get bugs that happen in one path only and not the other, and so
on.

if there were to be an eglstreams etc. implementation for mesa and other
drivers (i'm not even getting into the mali, imgtec, etc. drivers that
would ALSO have to rev, and every embedded oem now needs to provide an
eglstreams version of their drivers, as they have to date been doing things
the gbm way)... if this were universal, then ok - pain once to move over,
then we're done.

the current trajectory is having to have both gbm and eglstreams, and this
is undesirable.
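
to be concrete, the fork in every compositor would look something like this
(just a sketch; it assumes the streams path is advertised via the EGL client
extension string, and init_eglstreams()/init_gbm() are stand-ins for the
real backend setup):

#include <string.h>
#include <EGL/egl.h>

/* probe the EGL client extension string (NULL on implementations without
 * EGL_EXT_client_extensions) and pick a backend at runtime. */
static int
use_streams_path(void)
{
    const char *exts = eglQueryString(EGL_NO_DISPLAY, EGL_EXTENSIONS);
    return exts && strstr(exts, "EGL_EXT_device_base") &&
           strstr(exts, "EGL_EXT_output_base");
}

/* ... and then: if (use_streams_path()) init_eglstreams(); else init_gbm(); */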
Post by James Jones
While only NVIDIA currently supports streams, this is not an
NVIDIA-specific set of problems, nor is it intended to be an
NVIDIA-specific solution if other vendors adopt the open EGL standards
it is based on.
right now the others aren't budging. :)
--
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler) ***@rasterman.com
Jonas Ådahl
2016-05-13 01:37:02 UTC
Permalink
Post by Carsten Haitzler (The Rasterman)
[...]
the current trajectory is having to have both gbm and eglstreams, and this
is undesirable.
Post by James Jones
While only NVIDIA currently supports streams, this is not an
NVIDIA-specific set of problems, nor is it intended to be an
NVIDIA-specific solution if other vendors adopt the open EGL standards
it is based on.
right now the others aren't budging. :)
As one of the people* working on GNOME on Wayland, I can only agree that
having multiple paths, indirectly depending on the GPU vendor, seems like a
highly undesirable way forward; one that we have so far managed to mostly
avoid. I also find it hard to believe that additions making the GBM API
equally competent to a hypothetical EGL stream API would be as invasive as
adding support for a relatively different API altogether.

I don't see the gbm path going away any time soon, and I don't see a
hypothetical EGL streams path going universal any time soon either, so
having to introduce the EGL streams path would not be so much a "migration"
towards using only that API; it'd just mean adding an extra non-trivial
path. An extra path that we'd have to maintain indefinitely, and which only
people with the right hardware can test.

FWIW, I don't think we can really take the weston patches as a hint of how
large or complex the changes to other compositors (especially ones that are
relatively old) might become.


* note that I'm expressing my own personal opinion here


Jonas
Dave Airlie
2016-05-13 01:31:54 UTC
Permalink
While only NVIDIA currently supports streams, this is not an NVIDIA-specific
set of problems, nor is it intended to be an NVIDIA-specific solution if
other vendors adopt the open EGL standards it is based on.
Open standards are great; what's better is open conformance tests and
open implementations.

EGLStreams is complicated; I know this because I've spent some time
digging into it on a couple of occasions.

How is anyone going to validate compatibility of streams implementations?

btw, I'm not saying that an open implementation or open conformance tests
would get it adopted, but I haven't seen anyone except nvidia even remotely
interested in EGLStreams.

This is old-school Khronos collaboration, when what you probably really
need is new-world open source collaboration.

I also think Streams is too big a hammer for the job, and that
building up gbm collaboratively in the open
will result in a better solution for everyone in the long run.

Dave.
Pekka Paalanen
2016-05-12 09:30:50 UTC
Permalink
On Wed, 11 May 2016 16:08:13 -0700
Post by James Jones
[...]
GBM could match the allocation aspects of streams used in Miguel's first
round of patches. However, I disagree that its core API is sufficient
to match the allocation capabilities of EGLStream+EGLSwitch where all
producing and consuming devices+engines are known at allocation time.
Further, streams have additional equally valuable functionality beyond
allocation that GBM does not seem intended to address. Absent
agreement, I believe co-existence of EGLStreams and GBM+wl_drm in
Wayland/Weston is a reasonable path forward in the short term.
Hi,

I've been following this conversation from the sidelines with great
interest, and at the risk of sounding stupid, I'd like to note a couple of
things.

I twitch a little every time you mention wl_drm, because it is so easy to
understand it as "the wl_drm protocol specified in Mesa", while for this
discussion you actually mean "any Wayland-based protocol extension you
might ever want to write for communicating between the client and the
server sides of the EGL implementation". Could we use a different word for
it, please?
Pekka Paalanen
2016-05-12 09:52:58 UTC
Permalink
On Thu, 12 May 2016 12:30:50 +0300
Post by Pekka Paalanen
[...]
I twitch a little every time you mention wl_drm [...]. Could we use a
different word for it, please?
(Argh, sorry, ctrl+enter slip.)

A word like "EGL_WL_bind_wayland_display" that tells what we are talking
about without tying it into a specific implementation.

Weston and apps have zero code for dealing with wl_drm.
Post by Pekka Paalanen
Post by James Jones
GBM alone cannot perform as well as EGLStreams unless it is extended into
something more or less the same as EGLStreams [...] From what I can see,
GBM fundamentally understands at most the consumer side of the equation.
Wouldn't that be provided by the EGL_WL_bind_wayland_display protocol you
write to suit your needs, very much like you write the streams protocol,
but without the burden of forcing everyone to use the EGLStreams API?
Post by Pekka Paalanen
Post by James Jones
Suppose, however, that GBM were taught everything streams know implicitly
about all users of the buffers at allocation time. After allocation, GBM is
done with its job, but streams and drivers aren't.
The GBM API is used by compositors to allocate the buffers they will
composite into and then present via KMS for display.

I get the feeling there is some assumption that Wayland clients might be
using the GBM API. Is there? That is not the case (today). If the EGL
implementation internally uses GBM, that's up to it, but it also means the
EGL implementation has all the arbitrary information available from the
EGL_WL_bind_wayland_display protocol to do whatever allocation calls it
needs to do.

If you write your EGL_WL_bind_wayland_display protocol to support it, you
can swap the allocation from under an existing wl_buffer without destroying
the wl_buffer. At least I think it's possible; I have never looked into
what corner cases it might raise. But I also do not quite see why you would
need to avoid destroying a wl_buffer and making a new one based on what
your EGL_WL_bind_wayland_display protocol tells you client-side. wl_buffer
is just a handle to arbitrary (meta-)data, anyway.
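
For reference, the compositor-facing half of that extension is tiny (these
are the entry points from the EGL_WL_bind_wayland_display spec, with
EGL_EGLEXT_PROTOTYPES assumed; the display and buffer handles are assumed
valid):

#define EGL_EGLEXT_PROTOTYPES
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <wayland-server.h>

/* Hand the server-side wl_display to the vendor EGL; from here on the
 * vendor defines whatever private client<->server protocol it wants. */
static void
bind_and_query(EGLDisplay egl_dpy, struct wl_display *wl_dpy,
               struct wl_resource *buffer_resource)
{
    EGLint format, width, height;

    eglBindWaylandDisplayWL(egl_dpy, wl_dpy);

    /* Later, interrogate an attached client buffer without caring how
     * (or how often) the vendor re-allocated its backing storage. */
    eglQueryWaylandBufferWL(egl_dpy, buffer_resource,
                            EGL_TEXTURE_FORMAT, &format);
    eglQueryWaylandBufferWL(egl_dpy, buffer_resource, EGL_WIDTH, &width);
    eglQueryWaylandBufferWL(egl_dpy, buffer_resource, EGL_HEIGHT, &height);
}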


Thanks,
pq
Post by Pekka Paalanen
Post by James Jones
The act of transitioning a buffer from optimal "producer mode" to
optimal "consumer mode" relies on all the device & config information as
well, meaning it would need to be fed into the graphics driver (EGL or
whatever window system binding is used) by each window system the
graphics driver was running on to achieve equivalent capabilities to
EGLStream.
Fundamentally, the API-level view of individual graphics buffers as raw
globally coherent & accessible stores of pixels with static layout is
flawed. Images on a GPU are more of a mutating spill space for a
collection of state describing the side effects of various commands than
a 2D array of pixels. Forcing GPUs to resolve an image to a 2D array of
pixels in any particular layout can be very inefficient. The
GL+GLX/EGL/etc. driver model hides this well, but it breaks down in a
few cases like EGLImage and GLX_EXT_texture_from_pixmap, the former not
really living up to its implied potential because of this, and the
latter mostly working only because it has a very limited domain where
things can be shared, but still requires a lot of platform-specific code
to support properly. Vulkan brings a lot more of this out into the open
with its very explicit image state transitions and limitations on which
engines can access an image in any given state, but that's just within
the Vulkan API itself (i.e., strictly on a single GPU and optionally an
associated display engine within the same driver & process) so far.
The EGLStream encapsulation takes into consideration the new use cases
EGLImage, GBM, etc. were intended to address, and restores what I
believe to be the minimal amount of the traditional GL+GLX/EGL/etc.
model, while still allowing as much of the flexibility of the "a bunch
of buffers" mental model as possible. We can re-invent that with GBM
API adjustments, a set of restrictions on how the buffers it allocates
can be used, and another layer of metadata being pumped into drivers on
top of that, but I suspect we'd wind up with something that looks very
similar to streams.
We're both delving into future developments and hypotheticals to some
degree here. If we can't agree now on which direction is best, I
believe the right solution is to allow the two to co-exist and compete
collegially until the benefits of one or the other become more apparent.
The Wayland protocol and Weston compositor were designed in a manner
that makes this as painless as possible. It's not like we're going to
get a ton of Wayland clients that suddenly rely on EGLStream. At worst,
streams lose out and some dead code needs to be deleted from any
compositors that adopted them. As we discussed, there is some
maintenance cost to having two paths, but I believe it is reasonably
contained.
Thanks,
-James
Post by Daniel Stone
Cheers,
Daniel
Kristian Høgsberg
2016-05-13 05:07:10 UTC
Permalink
Post by James Jones
Post by Daniel Stone
Hi James,
Post by Daniel Stone
Right - but as with the point I was making below, GBM _right now_ is
more capable than Streams _right now_. GBM right now would require API
additions to match EGLStreams + EGLSwitch + Streams/KMS-interop, but
the last two aren't written either, so. (More below.)
The current behavior that enables this, where basically all Wayland buffers
must be allocated as scanout-capable, isn't reasonable on NVIDIA hardware.
The requirements for scanout are too onerous.
I think we're talking past each other, so I'd like to pare the
discussion down to these two sentences, and my two resultant points.
I posit that the Streams proposal you (plural) have put forward is, at
best, equivalent here:
- there is currently no support for direct scanout from client
buffers in Streams, so it must always pessimise towards GPU
composition
- GBM stacks can obviously do the same: implement a no-op
gbm_bo_import, and have your client always allocate non-scanout
buffers - presto, you've matched Streams
I posit that GBM _can_ match the capability of a hypothetical
EGLStreams/EGLSwitch implementation. Current _implementations_ of GBM
cannot, but I posit that it is not a limitation of the API it exposes,
and unlike Streams, the capability can be plumbed in with no new
external API required.
These seem pretty fundamental, so ... am I missing something? :\ If
so, can you please outline fairly specifically how you think
non-Streams implementations are not capable of meeting the criteria in
your two sentences?
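
(For reference, the no-op-import fallback being described looks roughly
like the following sketch; the helper name try_direct_scanout is made up
here, and whether the import succeeds is entirely up to the driver's GBM
implementation.)

  #include <stdbool.h>
  #include <gbm.h>

  struct wl_resource;

  /* Sketch: try to use a client's wl_buffer directly on a plane; if
   * gbm_bo_import() rejects it (e.g. not scanout-capable), fall back
   * to GPU composition. */
  static bool
  try_direct_scanout(struct gbm_device *gbm, struct wl_resource *buffer)
  {
      struct gbm_bo *bo = gbm_bo_import(gbm, GBM_BO_IMPORT_WL_BUFFER,
                                        buffer, GBM_BO_USE_SCANOUT);
      if (!bo)
          return false;   /* composite with the GPU instead */

      /* ... drmModeAddFB() + assign the fb to a plane ... */
      return true;
  }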
I respect the need to rein in the discussion, but I think several
substantive aspects have been lost here. I typed up a much longer
response, but it boils down to this:
GBM could match the allocation aspects of streams used in Miguel's first
round of patches. However, I disagree that its core API is sufficient to
match the allocation capabilities of EGLStream+EGLSwitch where all producing
and consuming devices+engines are known at allocation time. Further, streams
have additional equally valuable functionality beyond allocation that GBM
does not seem intended to address. Absent agreement, I believe co-existence
of EGLStreams and GBM+wl_drm in Wayland/Weston is a reasonable path forward
in the short term.
GBM alone can not perform as well as EGLStreams unless it is extended into
something more or less the same as EGLStreams, where it knows exactly what
engines are being used to produce the buffer content (along with their
current configuration), and exactly what engines/configuration are being
used to consume it. This implies allocating against multiple specific
objects, rather than a device and a set of allocation modifier flags, and/or
importing an external allocation and hoping it meets the current
requirements. From what I can see, GBM fundamentally understands at most
the consumer side of the equation.
Suppose however, GBM was taught everything streams know implicitly about all
users of the buffers at allocation time. After allocation, GBM is done with
its job, but streams & drivers aren't.
The act of transitioning a buffer from optimal "producer mode" to optimal
"consumer mode" relies on all the device & config information as well,
meaning it would need to be fed into the graphics driver (EGL or whatever
window system binding is used) by each window system the graphics driver was
running on to achieve equivalent capabilities to EGLStream.
Fundamentally, the API-level view of individual graphics buffers as raw
globally coherent & accessible stores of pixels with static layout is
flawed. Images on a GPU are more of a mutating spill space for a collection
of state describing the side effects of various commands than a 2D array of
pixels. Forcing GPUs to resolve an image to a 2D array of pixels in any
particular layout can be very inefficient. The GL+GLX/EGL/etc. driver model
hides this well, but it breaks down in a few cases like EGLImage and
GLX_EXT_texture_from_pixmap, the former not really living up to its implied
potential because of this, and the latter mostly working only because it has
a very limited domain where things can be shared, but still requires a lot
of platform-specific code to support properly. Vulkan brings a lot more of
this out into the open with its very explicit image state transitions and
limitations on which engines can access an image in any given state, but
that's just within the Vulkan API itself (i.e., strictly on a single GPU and
optionally an associated display engine within the same driver & process) so
far.
The EGLStream encapsulation takes into consideration the new use cases
EGLImage, GBM, etc. were intended to address, and restores what I believe to
be the minimal amount of the traditional GL+GLX/EGL/etc. model, while still
allowing as much of the flexibility of the "a bunch of buffers" mental model
as possible. We can re-invent that with GBM API adjustments, a set of
restrictions on how the buffers it allocates can be used, and another layer
of metadata being pumped into drivers on top of that, but I suspect we'd
wind up with something that looks very similar to streams.
I think this is where the disconnect is. I (and others) don't see
reinventing some of the EGLStream functionality in gbm + wl_drm (or
similar EGL implementation private protocol) as a problem or that the
result will be worse than EGLStreams. Compositors use gbm today and
would much rather grow one code path incrementally and in a
backwards-compatible way. I know that's already been done for various
non-Mesa stacks, like SoCs where scanout memory is a scarce resource. If we
end up with something similar to what EGLStream will be one day, that
doesn't mean we should've used EGLStreams. It just means they're
different solutions to the same problem.

Kristian
Post by James Jones
We're both delving into future developments and hypotheticals to some degree
here. If we can't agree now on which direction is best, I believe the right
solution is to allow the two to co-exist and compete collegially until the
benefits of one or the other become more apparent. The Wayland protocol and
Weston compositor were designed in a manner that makes this as painless as
possible. It's not like we're going to get a ton of Wayland clients that
suddenly rely on EGLStream. At worst, streams lose out and some dead code
needs to be deleted from any compositors that adopted them. As we
discussed, there is some maintenance cost to having two paths, but I believe
it is reasonably contained.
Thanks,
-James
Post by Daniel Stone
Cheers,
Daniel
Daniel Stone
2016-05-14 16:46:51 UTC
Permalink
Hi James,
Post by James Jones
GBM alone can not perform as well as EGLStreams unless it is extended into
something more or less the same as EGLStreams, where it knows exactly what
engines are being used to produce the buffer content (along with their
current configuration), and exactly what engines/configuration are being
used to consume it. This implies allocating against multiple specific
objects, rather than a device and a set of allocation modifier flags, and/or
importing an external allocation and hoping it meets the current
requirements. From what I can see, GBM fundamentally understands at most
the consumer side of the equation.
I disagree with the last part of this. GBM is integrated with EGL, and
thus has the facility to communicate with the producer as it pleases,
through private protocol.
Post by James Jones
Suppose however, GBM was taught everything streams know implicitly about all
users of the buffers at allocation time. After allocation, GBM is done with
its job, but streams & drivers aren't.
The act of transitioning a buffer from optimal "producer mode" to optimal
"consumer mode" relies on all the device & config information as well,
meaning it would need to be fed into the graphics driver (EGL or whatever
window system binding is used) by each window system the graphics driver was
running on to achieve equivalent capabilities to EGLStream.
Sure. But this leads into one huge (unaddressed) concern I have:
integration with the world outside libEGL.so. Vulkan and media APIs
are going to need to gain explicit knowledge - read, an extra
dependency on - EGL in order to deal with this. Then let's throw a
media device into the mix: how does Streams ensure optimal
configuration? Does that require teaching EGL about media decode
devices, and growing a whole other API for that? More pressingly, how
do you deal with other devices?

Tegra devices are in an enviable position where NVIDIA produces all
the IP, but in that regard it stands alone in the SoC world. The only
two cases I know of where the IP blocks are homogeneous are Tegra and
some Qualcomm devices - but then again, some Qualcomm blocks use a
Samsung media decode IP. Same story for multi-GPU drivers: how do you
do interop between an Intel GPU doing composition and an NVIDIA GPU
producing content?

From where I stand, there are two options to deal with this: one is to
declare that the world must use EGLStreams for optimal allocation,
even if they'd never previously used Streams, and the other is to
surface the interactions with Streams into a public API that can be
used by, say, media producers. Which model are you looking towards
here?

Again, NVIDIA are fine with producing a very large libEGL.so, and
Tegra's nature makes that easier to do, but what about everyone else?
Post by James Jones
Fundamentally, the API-level view of individual graphics buffers as raw
globally coherent & accessible stores of pixels with static layout is
flawed. Images on a GPU are more of a mutating spill space for a collection
of state describing the side effects of various commands than a 2D array of
pixels. Forcing GPUs to resolve an image to a 2D array of pixels in any
particular layout can be very inefficient. The GL+GLX/EGL/etc. driver model
hides this well, but it breaks down in a few cases like EGLImage and
GLX_EXT_texture_from_pixmap, the former not really living up to its implied
potential because of this, and the latter mostly working only because it has
a very limited domain where things can be shared, but still requires a lot
of platform-specific code to support properly. Vulkan brings a lot more of
this out into the open with its very explicit image state transitions and
limitations on which engines can access an image in any given state, but
that's just within the Vulkan API itself (i.e., strictly on a single GPU and
optionally an associated display engine within the same driver & process) so
far.
There's nothing in this I disagree with, but I also don't read it as an
indictment of GBM. You've previously made the point that looking
beyond frames to streams is a better way of looking at things, which
is fine, but both Wayland and KMS are fundamentally frame-based at
their core, so the impedance mismatch is already pretty obvious from
the start.
Post by James Jones
The EGLStream encapsulation takes into consideration the new use cases
EGLImage, GBM, etc. were intended to address, and restores what I believe to
be the minimal amount of the traditional GL+GLX/EGL/etc. model, while still
allowing as much of the flexibility of the "a bunch of buffers" mental model
as possible. We can re-invent that with GBM API adjustments, a set of
restrictions on how the buffers it allocates can be used, and another layer
of metadata being pumped into drivers on top of that, but I suspect we'd
wind up with something that looks very similar to streams.
The only allocation GBM does is for buffers produced by the compositor
and used for scanout, so in this regard it's quite straightforward.
Client buffers are a separate topic, and I don't buy that the
non-Streams model precludes things like render compression. In fact,
Ben Widawsky, Dan Vetter, and some others are as we speak working on
support for render compression within both Wayland EGL and GBM itself
(for direct scanout from compressed buffers with an auxiliary plane).
So far, the only external impact has been a very small extension to
the GBM API to allow use of multiple planes and FB modifiers: a far
smaller change than implementing the whole of Streams and all its
future extensions (Switch et al).
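
(For context, the API additions being referred to eventually took roughly
the following shape in Mesa's gbm.h and libdrm; treat the exact entry
points as illustrative of the size of the change rather than as the state
of the tree at the time of this mail. Error handling omitted.)

  #include <gbm.h>
  #include <xf86drmMode.h>

  /* Sketch: allocate with a list of acceptable modifiers, then hand
   * the resulting (possibly multi-plane) layout to KMS. */
  static uint32_t
  create_scanout_fb(int drm_fd, struct gbm_device *gbm,
                    uint32_t width, uint32_t height,
                    const uint64_t *modifiers, unsigned int num_modifiers)
  {
      struct gbm_bo *bo;
      uint32_t handles[4] = {0}, strides[4] = {0}, offsets[4] = {0};
      uint64_t mods[4] = {0};
      uint32_t fb_id;
      int i;

      /* The compositor offers the modifiers it can scan out; the
       * driver picks a layout and reports it back. */
      bo = gbm_bo_create_with_modifiers(gbm, width, height,
                                        GBM_FORMAT_XRGB8888,
                                        modifiers, num_modifiers);

      for (i = 0; i < gbm_bo_get_plane_count(bo); i++) {
          handles[i] = gbm_bo_get_handle_for_plane(bo, i).u32;
          strides[i] = gbm_bo_get_stride_for_plane(bo, i);
          offsets[i] = gbm_bo_get_offset(bo, i);
          mods[i] = gbm_bo_get_modifier(bo);
      }

      drmModeAddFB2WithModifiers(drm_fd, width, height,
                                 GBM_FORMAT_XRGB8888, handles, strides,
                                 offsets, mods, &fb_id,
                                 DRM_MODE_FB_MODIFIERS);
      return fb_id;
  }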
Post by James Jones
We're both delving into future developments and hypotheticals to some degree
here. If we can't agree now on which direction is best, I believe the right
solution is to allow the two to co-exist and compete collegially until the
benefits of one or the other become more apparent. The Wayland protocol and
Weston compositor were designed in a manner that makes this as painless as
possible. It's not like we're going to get a ton of Wayland clients that
suddenly rely on EGLStream. At worst, streams lose out and some dead code
needs to be deleted from any compositors that adopted them. As we
discussed, there is some maintenance cost to having two paths, but I believe
it is reasonably contained.
It would be interesting to see the full Streams patchset - including
EGLSwitch and direct-scanout - to see what the final impact would be
like.

As Kristian says, I really don't see where the existing non-Streams
solutions, being GBM on the compositor side and private frame-based
protocols between compositor and client, leave you unable to reach
full performance potential. Do you have any concrete usecases that you
can point to in as much detail as possible, outlining exactly how the
GBM/private-Wayland-protocol model forces you to compromise
performance?

Cheers,
Daniel
Daniel Vetter
2016-05-16 09:36:48 UTC
Permalink
Post by Daniel Stone
Post by James Jones
The EGLStream encapsulation takes into consideration the new use cases
EGLImage, GBM, etc. were intended to address, and restores what I believe to
be the minimal amount of the traditional GL+GLX/EGL/etc. model, while still
allowing as much of the flexibility of the "a bunch of buffers" mental model
as possible. We can re-invent that with GBM API adjustments, a set of
restrictions on how the buffers it allocates can be used, and another layer
of metadata being pumped into drivers on top of that, but I suspect we'd
wind up with something that looks very similar to streams.
The only allocation GBM does is for buffers produced by the compositor
and used for scanout, so in this regard it's quite straightforward.
Client buffers are a separate topic, and I don't buy that the
non-Streams model precludes things like render compression. In fact,
Ben Widawsky, Dan Vetter, and some others are as we speak working on
support for render compression within both Wayland EGL and GBM itself
(for direct scanout from compressed buffers with an auxiliary plane).
So far, the only external impact has been a very small extension to
the GBM API to allow use of multiple planes and FB modifiers: a far
smaller change than implementing the whole of Streams and all its
future extensions (Switch et al).
Just a quick correction: for render compression we also need some
allocation hinting interface, since on Intel GPUs you can't always scan
out render-compressed buffers. That is exactly what EGLstreams also tries
to solve (at least if my understanding is correct). So we need a bit more
in gbm than just being able to pass fb modifiers around.

I still think it's the better approach though since it's still fairly
incremental. And exposing the allocation hints and making them explicit
will avoid the need to teach everything in the world about EGLstreams (vk,
v4l, drm, ...). Which as Daniel Stone pointed out, doesn't really work
well if you have IP blocks from multiple vendors on your SoC.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
James Jones
2016-05-16 18:12:35 UTC
Permalink
Post by Daniel Vetter
Post by Daniel Stone
Post by James Jones
The EGLStream encapsulation takes into consideration the new use cases
EGLImage, GBM, etc. were intended to address, and restores what I believe to
be the minimal amount of the traditional GL+GLX/EGL/etc. model, while still
allowing as much of the flexibility of the "a bunch of buffers" mental model
as possible. We can re-invent that with GBM API adjustments, a set of
restrictions on how the buffers it allocates can be used, and another layer
of metadata being pumped into drivers on top of that, but I suspect we'd
wind up with something that looks very similar to streams.
The only allocation GBM does is for buffers produced by the compositor
and used for scanout, so in this regard it's quite straightforward.
Client buffers are a separate topic, and I don't buy that the
non-Streams model precludes things like render compression. In fact,
Ben Widawsky, Dan Vetter, and some others are as we speak working on
support for render compression within both Wayland EGL and GBM itself
(for direct scanout from compressed buffers with an auxiliary plane).
So far, the only external impact has been a very small extension to
the GBM API to allow use of multiple planes and FB modifiers: a far
smaller change than implementing the whole of Streams and all its
future extensions (Switch et al).
Just a quick correction: for render compression we also need some
allocation hinting interface, since on Intel GPUs you can't always scan
out render-compressed buffers. That is exactly what EGLstreams also tries
to solve (at least if my understanding is correct). So we need a bit more
in gbm than just being able to pass fb modifiers around.
Yes, this, and it goes beyond just hinting at allocation time for us if
you intend to reconfigure the output without reallocating the surface
(e.g., switch to a different plane, start rotating the output, etc.).
Post by Daniel Vetter
I still think it's the better approach though since it's still fairly
incremental. And exposing the allocation hints and making them explicit
will avoid the need to teach everything in the world about EGLstreams (vk,
v4l, drm, ...). Which as Daniel Stone pointed out, doesn't really work
well if you have IP blocks from multiple vendors on your SoC.
-Daniel
Yeah, IP blocks from multiple vendors are hard. I don't see how they're
any harder with streams, though, vs. the alternate GBM-based proposals
that have been suggested thus far. We're not entirely immune to this at
NVIDIA. Sometimes we want to present to an Intel display engine, for
example. An EGL-based solution doesn't necessarily mean a single
vendor's EGL driver (GLVND is coming, slowly), and even if it does, it
only requires explicit cooperation if both vendors share some more
optimal layout than basic pitch-linear with minimal alignment
requirements and whatnot, no compression, either fully-coherent caches
or no caching.

However, there are two ways to solve this:

-Always resort to the lowest common denominator when the
producer/consumer aren't from the same vendor, as mentioned above.

-Have some sort of coordination, either handled by the application and a
bunch of capability bits, or handled by a driver<->driver API below the
level of the application API.

Neither of these seem specific to either a streams-based or EGL-based
solution to me. The important part is to standardize the interfaces
exposed to applications or drivers to coordinate the right formats.

As to needing to teach everything about EGLStreams, I think there's a
misconception that this means every component vendor needs to get on the
EGL bandwagon and start writing a bunch of no-op eglGetConfig() entry
points and whatnot. Even with all our in-house IP, that's not the case
at NVIDIA. Our media codecs aren't baked into the same driver module as
our OpenGL drivers for example, and the drivers and engineers
maintaining them know very little about each other. Our EGL driver
allows stream producers/consumers to plug into it using some
internal-standard interfaces and a relatively minimal amount of code,
and without even including any Khronos EGL headers.

The current Khronos EGL API doesn't need to be the only interface
through which drivers plug in to a libEGL or vendor EGL implementation.
The proposal to expose a vendor-agnostic set of hooks to allow writing
EGL platform implementations without EGL vendor involvement is one
example of a non-application facing EGL API. EGLStream producer and
consumer hooks could be handled with another non-application facing API.
Post by Daniel Vetter
As Kristian says, I really don't see where the existing non-Streams
solutions, being GBM on the compositor side and private frame-based
protocols between compositor and client, leave you unable to reach
full performance potential. Do you have any concrete usecases that you
can point to in as much detail as possible, outlining exactly how the
GBM/private-Wayland-protocol model forces you to compromise
performance?
Unfortunately, the only realistic way to get to the full patchset is
incrementally. We haven't even finished the EGLSwitch extension, let
alone writing Weston code to use it. This is why I believe temporary
co-existence of the two paths is a reasonable path for now. Not all the
benefits of streams are demonstrable yet, nor is GBM in its final form.

Daniel Stone, I'd like to hear more about how you envision a GBM library
communicating with an EGL producer in a remote process. Would GBM be
sending wayland protocol directly? If so, this is really starting to
sound like streams-rewritten-using-wayland-protocol, and I don't think
wayland is the right domain to solve these non-wayland-specific issues
in. If, on the other hand, GBM is going to gain its own set of
per-vendor cross-process communication mechanisms, that really sounds
like a re-invention of EGLStreams.

Perhaps both of my assumed solutions above are way off the mark, and it
does seem like we're talking past each other at times. It seems like you
have a pretty strong understanding of how this would all work in GBM,
even if it's not there in the code yet. I understand you're quite busy,
but perhaps we could have a brief real-time communication session (IRC?
Phone?) where we can talk through some of your ideas for GBM, so we
can at least start from the same basic understanding when talking about
this stuff. Let me know if you want to schedule something.

Thanks,
-James
Daniel Vetter
2016-05-16 20:44:47 UTC
Permalink
Post by Daniel Vetter
Post by Daniel Stone
Post by James Jones
The EGLStream encapsulation takes into consideration the new use cases
EGLImage, GBM, etc. were intended to address, and restores what I believe to
be the minimal amount of the traditional GL+GLX/EGL/etc. model, while still
allowing as much of the flexibility of the "a bunch of buffers" mental model
as possible. We can re-invent that with GBM API adjustments, a set of
restrictions on how the buffers it allocates can be used, and another layer
of metadata being pumped into drivers on top of that, but I suspect we'd
wind up with something that looks very similar to streams.
The only allocation GBM does is for buffers produced by the compositor
and used for scanout, so in this regard it's quite straightforward.
Client buffers are a separate topic, and I don't buy that the
non-Streams model precludes things like render compression. In fact,
Ben Widawsky, Dan Vetter, and some others are as we speak working on
support for render compression within both Wayland EGL and GBM itself
(for direct scanout from compressed buffers with an auxiliary plane).
So far, the only external impact has been a very small extension to
the GBM API to allow use of multiple planes and FB modifiers: a far
smaller change than implementing the whole of Streams and all its
future extensions (Switch et al).
Just a quick correction: for render compression we also need some
allocation hinting interface, since on Intel GPUs you can't always scan
out render-compressed buffers. That is exactly what EGLstreams also tries
to solve (at least if my understanding is correct). So we need a bit more
in gbm than just being able to pass fb modifiers around.
Yes, this, and it goes beyond just hinting at allocation time for us if you
intend to reconfigure the output without reallocating the surface (e.g.,
switch to a different plane, start rotating the output, etc.).
Post by Daniel Vetter
I still think it's the better approach though since it's still fairly
incremental. And exposing the allocation hints and making them explicit
will avoid the need to teach everything in the world about EGLstreams (vk,
v4l, drm, ...). Which as Daniel Stone pointed out, doesn't really work
well if you have IP blocks from multiple vendors on your SoC.
-Daniel
Yeah, IP blocks from multiple vendors are hard. I don't see how they're any
harder with streams, though, vs. the alternate GBM-based proposals that have
been suggested thus far. We're not entirely immune to this at NVIDIA.
Sometimes we want to present to an Intel display engine, for example. An
EGL-based solution doesn't necessarily mean a single vendor's EGL driver
(GLVND is coming, slowly), and even if it does, it only requires explicit
cooperation if both vendors share some more optimal layout than basic
pitch-linear with minimal alignment requirements and whatnot, no
compression, either fully-coherent caches or no caching.
-Always resort to the lowest common denominator when the producer/consumer
aren't from the same vendor, as mentioned above.
-Have some sort of coordination, either handled by the application and a
bunch of capability bits, or handled by a driver<->driver API below the
level of the application API.
Neither of these seem specific to either a streams-based or EGL-based
solution to me. The important part is to standardize the interfaces exposed
to applications or drivers to coordinate the right formats.
As to needing to teach everything about EGLStreams, I think there's a
misconception that this means every component vendor needs to get on the EGL
bandwagon and start writing a bunch of no-op eglGetConfig() entry points and
whatnot. Even with all our in-house IP, that's not the case at NVIDIA. Our
media codecs aren't baked into the same driver module as our OpenGL drivers
for example, and the drivers and engineers maintaining them know very little
about each other. Our EGL driver allows stream producers/consumers to plug
into it using some internal-standard interfaces and a relatively minimal
amount of code, and without even including any Khronos EGL headers.
I think this is the crux here - we need to standardize those hints/that
minimal set of shared metadata, in a public, cross-vendor interface.
Obviously, without such an interface it doesn't even work for you in your
one-vendor case.

And as soon as we have that standard, I don't really see the benefit of
EGLstreams any more. At least my understanding is that the entire point of
EGLstreams is to hide these hints and metadata and allow them to be vendor
specific.

The other argument for EGLstreams seems to be that once you use EGLstreams
everywhere, there's less wheel-reinventing or, well, protocol re-typing
going on. But right now we (at least the open source community) are in a
world where EGLstreams doesn't exist, so we need to do all that typing
anyway. And typing the protocol itself has the advantage that you can more
easily fit it into whatever's there already. E.g. for
xfree86-video-modesetting we want to add any metadata we standardize to
DRI3 (probably, haven't looked at the details). On wayland we can just add
a new protocol object and reuse all the fancy stuff wayland provides, but
put that proto within libEGL (or gbm) since that fits more with how wl
works right now. And for Android we can do an Android-specific version
(again, haven't looked into the details there). So afaics more
flexibility, with roughly the same amount of work.

Or is there something else I'm not seeing?
The current Khronos EGL API doesn't need to be the only interface through
which drivers plug in to a libEGL or vendor EGL implementation. The
proposal to expose a vendor-agnostic set of hooks to allow writing EGL
platform implementations without EGL vendor involvement is one example of a
non-application facing EGL API. EGLStream producer and consumer hooks could
be handled with another non-application facing API.
Post by Daniel Vetter
As Kristian says, I really don't see where the existing non-Streams
solutions, being GBM on the compositor side and private frame-based
protocols between compositor and client, leave you unable to reach
full performance potential. Do you have any concrete usecases that you
can point to in as much detail as possible, outlining exactly how the
GBM/private-Wayland-protocol model forces you to compromise
performance?
Unfortunately, the only realistic way to get to the full patchset is
incrementally. We haven't even finished the EGLSwitch extension, let alone
writing Weston code to use it. This is why I believe temporary co-existence
of the two paths is a reasonable path for now. Not all the benefits of
streams are demonstrable yet, nor is GBM in its final form.
Daniel Stone, I'd like to hear more about how you envision a GBM library
communicating with an EGL producer in a remote process. Would GBM be
sending wayland protocol directly? If so, this is really starting to sound
like streams-rewritten-using-wayland-protocol, and I don't think wayland is
the right domain to solve these non-wayland-specific issues in. If, on the
other hand, GBM is going to gain its own set of per-vendor cross-process
communication mechanisms, that really sounds like a re-invention of
EGLStreams.
Perhaps both of my assumed solutions above are way off the mark, and it does
seem like we're talking past each other at times. It seems like you have a
pretty strong understanding of how this would all work in GBM, even if it's
not there in the code yet. I understand you're quite busy, but perhaps we
could have a brief real-time communication session (IRC? Phone?) where we
can talk through some of your ideas for GBM, so we can at least start from
the same basic understanding when talking about this stuff. Let me know if
you want to schedule something.
Yeah, I think we're at the point where code is probably much easier to
understand. I'm trying to type up the gbm-based idea in hopefully the near
future. That should ground the discussion a lot more.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Daniel Vetter
2016-05-03 16:06:18 UTC
Permalink
Post by James Jones
Streams could provide a way to express that the compositor picked the wrong
plane, but they don't solve the optimal configuration problem. Configuration
is a tricky mix of policy and capabilities that something like HWComposer or
a wayland compositor with access to HW-specific knowledge needs to solve. I
agree with other statements here that encapsulating direct HW knowledge
within individual Wayland compositors is probably not a great idea, but some
separate standard or shared library taking input from hardware-specific
modules and wrangling scene graphs is probably needed to get optimal
behavior.
What streams do is allow allocating the most optimal set of buffers and
using the most optimal method to present them possible given a
configuration. So, streams would kick in after the scene graph thing
generated a config.
Daniel's reply cut out this crucial bit somehow, and he replied somewhere
else that he agrees that eglstreams solves at least the "optimal
allocation once scene graph is fixed" problem. I disagree since this
entire thing is highly dynamic - at least on SoC chips, how you allocate
your buffers has a big impact on what the display engine can do, and the
other way round:
- depending upon the tiling layout, fifo space requirements change
  drastically, and going for the "optimal" tiling might push some other
  plane over the edge
- there's simpler stuff like some planes only supporting certain features
  like render compression, which is why even for a TEST_ONLY atomic commit
  you must supply all the buffers already (see the sketch below)
- other fun stuff happens around rotation/scaling/planar vs. single-plane
  yuv buffers. All these tend to need special hw resources, which means
  your choice in how to use them on the kms side has effects on what kind
  of buffer you need to allocate. And the other way round.
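
(A minimal sketch of the TEST_ONLY probing referred to above, using the
libdrm atomic API; the property-ID parameters are placeholders that real
code would look up via drmModeObjectGetProperties().)

  #include <stdbool.h>
  #include <xf86drmMode.h>

  /* Probe whether a plane can actually scan out this framebuffer
   * before committing the configuration for real. */
  static bool
  config_works(int fd, uint32_t plane_id, uint32_t crtc_id, uint32_t fb_id,
               uint32_t plane_fb_prop, uint32_t plane_crtc_prop)
  {
      drmModeAtomicReq *req = drmModeAtomicAlloc();
      bool ok;

      /* The real buffer must be attached even for the test: its
       * tiling/compression determines whether the plane can use it. */
      drmModeAtomicAddProperty(req, plane_id, plane_fb_prop, fb_id);
      drmModeAtomicAddProperty(req, plane_id, plane_crtc_prop, crtc_id);

      ok = drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_TEST_ONLY,
                               NULL) == 0;
      drmModeAtomicFree(req);
      return ok;
  }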

I don't think there's any way at all, at least for a generic system that
wants to support embedded/mobile SoCs, to solve the kms config and buffer
alloc problems as two separate steps. You absolutely need these two
pieces to talk to each other, and to talk the same language. Either some
vendor horror show (what most Android BSPs end up doing behind everyone's
back) or something standardized (what we're trying to pull off around
kms+gbm).

Hiding half of the story behind eglstreams doesn't help anyone afaict. If
you do that, you also need to hide the other half. Which means proprietary
hw composer driver (or similar), which can understand/change the metadata
you internally attach to eglstreams/buffers. And once you've decided to go
the fully hidden route hw composer seems to be the best choice really.
SurfaceFlinger isn't really great for multi-screen, but the hwc interface
itself is already fixed and handles that properly. But even with hwc you
don't have eglstreams, because once both ends are proprietary there's
really no need for any standard any more ;-)
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
James Jones
2016-05-03 16:29:58 UTC
Permalink
Post by Daniel Vetter
Post by James Jones
Streams could provide a way to express that the compositor picked the wrong
plane, but they don't solve the optimal configuration problem. Configuration
is a tricky mix of policy and capabilities that something like HWComposer or
a wayland compositor with access to HW-specific knowledge needs to solve. I
agree with other statements here that encapsulating direct HW knowledge
within individual Wayland compositors is probably not a great idea, but some
separate standard or shared library taking input from hardware-specific
modules and wrangling scene graphs is probably needed to get optimal
behavior.
What streams do is allow allocating the most optimal set of buffers and
using the most optimal method to present them possible given a
configuration. So, streams would kick in after the scene graph thing
generated a config.
Daniel's reply cut out this crucial bit somehow, and he replied somewhere
else that he agrees that eglstreams solves at least the "optimal
allocation once scene graph is fixed" problem. I disagree since this
entire thing is highly dynamic - at least on SoC chips how you allocate
your buffers has big impacts on what the display engine can do, and the
- depending upon tiling layout fifo space requirements change drastically,
and going for the "optimal" tiling might push some other plane over the
edge
- there's simpler stuff like some planes can only do some features like
render compression, which is why even for a TEST_ONLY atomic commit you
must supply all the buffers already
- other fun stuff happens around rotation/scaling/planar vs. single-plane
yuv buffers. All these tend to need special hw resources, which means
your choice in how to use it on the kms side has effects on what kind of
buffer you need to allocate. And the other way round.
I don't think there's any way at all, at least for a generic system that
wants to support embedded/mobile SoCs to solve the kms config and buffer
alloc problems as 2 separate steps. You absolutely need these two pieces
to talk to each other, and talk the same language. Either some vendor
horror show (what most of android bsp end up doing behind the back) or
something standardized (what we're trying to pull off around kms+gbm).
Hiding half of the story behind eglstreams doesn't help anyone afaict. If
you do that, you also need to hide the other half. Which means proprietary
hw composer driver (or similar), which can understand/change the metadata
you internally attach to eglstreams/buffers. And once you've decided to go
the fully hidden route hw composer seems to be the best choice really.
SurfaceFlinger isn't really great for multi-screen, but the hwc interface
itself is already fixed and handles that properly. But even with hwc you
don't have eglstreams, because once both ends are proprietary there's
really no need for any standard any more ;-)
-Daniel
Thank you for the additional information. If I follow this correctly:

1) You believe HW composer is a reasonable solution to optimizing the
scenegraph of a compositor.

2) Your preferred solution would not be HW composer, but rather GBM+KMS
(presumably with the addition of some yet-to-be-developed APIs)

3) You believe the constraints of the system are sufficiently
interdependent that to optimize the system, all allocations and display
engine configuration must be done atomically, in a sense.

Is that correct?

If so, I have some questions:

-Do you believe (2) is reasonably achievable, or just the style of
solution you would prefer in general?

-Why is GBM+DRM-KMS better suited to meet the requirements presented by
(3) than EGLStream+DRM-KMS?

Given Wayland is designed such that clients drive buffer allocation, and
I tend to agree that the compositor (along with its access to drivers
like KMS) is the component uniquely able to optimize the scene, I think
the best that can be achieved is a system that gravitates toward the
optimal solution in the steady state. Therefore, it seems that KMS
should optimize display engine resources assuming the Wayland compositor
and its clients will adjust to meet KMS' suggestions over time, where
"time" would hopefully be only a small number of additional frames.
Streams will perform quite well in such a design.

There would of course be cases where multiple iterations are required to
get from the current buffers and their display requirements to the
optimal buffers and the optimal display settings, but I don't see a way
around that. Hopefully over time display hardware will be optimized
towards these use-cases, as they are becoming ubiquitous.

Thanks,
-James
Daniel Stone
2016-05-03 16:58:02 UTC
Permalink
Hi James,
Post by James Jones
Given Wayland is designed such that clients drive buffer allocation
I'd just note that this isn't strictly true. I've personally
implemented Wayland support for platforms (media playback on an
extremely idiosyncratic platform) where server-side buffer allocation
was required for optimal performance, and that's what was done. wl_drm
is not exemplary for these platforms as it does not have a protocol
concept of a swapchain, but you can add one to your own private
protocol implementation (analogous to wl_eglstream) and it works with
no changes required to external clients or compositors.
Post by James Jones
, and I
tend to agree that the compositor (along with its access to drivers like
KMS) is the component uniquely able to optimize the scene, I think the best
that can be achieved is a system that gravitates toward the optimal solution
in the steady state. Therefore, it seems that KMS should optimize display
engine resources assuming the Wayland compositor and its clients will adjust
to meet KMS' suggestions over time, where "time" would hopefully be only a
small number of additional frames. Streams will perform quite well in such a
design.
It is unfortunate that you seem to discuss 'Streams' as an abstract
concept of a cross-process swapchain which can be infinitely adjusted
to achieve perfection, and yet 'GBM' gets discussed as a singular
fixed-in-time thing which has all the flaws of just one of its
particular platform implementations.

I don't see how GBM could really perform any worse in such a design.

Cheers,
Daniel
James Jones
2016-05-03 18:58:32 UTC
Permalink
Post by Daniel Stone
Hi James,
Post by James Jones
Given Wayland is designed such that clients drive buffer allocation
I'd just note that this isn't strictly true. I've personally
implemented Wayland support for platforms (media playback on an
extremely idiosyncratic platform) where server-side buffer allocation
was required for optimal performance, and that's what was done. wl_drm
is not exemplary for these platforms as it does not have a protocol
concept of a swapchain, but you can add one to your own private
protocol implementation (analogous to wl_eglstream) and it works with
no changes required to external clients or compositors.
Indeed, streams blur this a bit as well. What I meant to say is that
clients drive the timing of when new buffers are available for
compositing. Perhaps the server could perform a non-destructive
reallocation to avoid this, though, if the cost of such an operation were
not considered prohibitive?
Post by Daniel Stone
Post by James Jones
, and I
tend to agree that the compositor (along with its access to drivers like
KMS) is the component uniquely able to optimize the scene, I think the best
that can be achieved is a system that gravitates toward the optimal solution
in the steady state. Therefore, it seems that KMS should optimize display
engine resources assuming the Wayland compositor and its clients will adjust
to meet KMS' suggestions over time, where "time" would hopefully be only a
small number of additional frames. Streams will perform quite well in such a
design.
It is unfortunate that you seem to discuss 'Streams' as an abstract
concept of a cross-process swapchain which can be infinitely adjusted
to achieve perfection, and yet 'GBM' gets discussed as a singular
fixed-in-time thing which has all the flaws of just one of its
particular platform implementations.
I have a stronger understanding of the design direction for streams than
I do for GBM, and EGLStream is indeed intended to evolve towards the
best abstraction of a swapchain possible. My views of GBM are based on
the current API. I'm not that familiar with the Mesa implementation
details. I'd be happy to learn more about the direction the GBM API is
taking in the future, and that's half of what I was attempting to do in
my responses/questions here.
Post by Daniel Stone
I don't see how GBM could really perform any worse in such a design.
The current GBM API is not expressive enough to support optimal buffer
allocation (at least on our hardware) in such a design.
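
(For concreteness, the allocation entry point under discussion is roughly
the following: a device, a format, and a small set of usage flags, with
nothing describing which engines will produce or consume the buffer, or
how the output might later be reconfigured.)

  #include <gbm.h>

  /* The whole allocation-time vocabulary of 2016-era GBM (sketch). */
  static struct gbm_bo *
  alloc_buffer(struct gbm_device *gbm)
  {
      return gbm_bo_create(gbm, 1920, 1080, GBM_FORMAT_XRGB8888,
                           GBM_BO_USE_SCANOUT | GBM_BO_USE_RENDERING);
  }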

Thanks,
-James
Post by Daniel Stone
Cheers,
Daniel
Kristian Høgsberg
2016-05-03 19:49:21 UTC
Permalink
Post by James Jones
Post by Daniel Stone
Hi James,
Post by James Jones
Given Wayland is designed such that clients drive buffer allocation
I'd just note that this isn't strictly true. I've personally
implemented Wayland support for platforms (media playback on an
extremely idiosyncratic platform) where server-side buffer allocation
was required for optimal performance, and that's what was done. wl_drm
is not exemplary for these platforms as it does not have a protocol
concept of a swapchain, but you can add one to your own private
protocol implementation (analogous to wl_eglstream) and it works with
no changes required to external clients or compositors.
Indeed, streams blur this a bit as well. What I meant to say is that
clients drive the timing of when new buffers are available for compositing.
Perhaps the server could perform a non-destructive reallocation to avoid
this though if the cost of such an operation were not considered
prohibitive?
Post by Daniel Stone
Post by James Jones
, and I
tend to agree that the compositor (along with its access to drivers like
KMS) is the component uniquely able to optimize the scene, I think the best
that can be achieved is a system that gravitates toward the optimal solution
in the steady state. Therefore, it seems that KMS should optimize display
engine resources assuming the Wayland compositor and its clients will adjust
to meet KMS' suggestions over time, where "time" would hopefully be only a
small number of additional frames. Streams will perform quite well in such a
design.
It is unfortunate that you seem to discuss 'Streams' as an abstract
concept of a cross-process swapchain which can be infinitely adjusted
to achieve perfection, and yet 'GBM' gets discussed as a singular
fixed-in-time thing which has all the flaws of just one of its
particular platform implementations.
I have a stronger understanding of the design direction for streams than I
do for GBM, and EGLStream is indeed intended to evolve towards the best
abstraction of a swapchain possible. My views of GBM are based on the
current API. I'm not that familiar with the Mesa implementation details.
I'd be happy to learn more about the direction the GBM API is taking in the
future, and that's half of what I was attempting to do in my
responses/questions here.
Post by Daniel Stone
I don't see how GBM could really perform any worse in such a design.
The current GBM API is not expressive enough to support optimal buffer
allocation (at least on our hardware) in such a design.
I'm curious about the performance concern. What exactly is missing?

Kristian
Daniel Vetter
2016-05-03 17:10:05 UTC
Permalink
Post by James Jones
Post by Daniel Vetter
Post by James Jones
Streams could provide a way to express that the compositor picked the wrong
plane, but they don't solve the optimal configuration problem. Configuration
is a tricky mix of policy and capabilities that something like HWComposer or
a wayland compositor with access to HW-specific knowledge needs to solve. I
agree with other statements here that encapsulating direct HW knowledge
within individual Wayland compositors is probably not a great idea, but some
separate standard or shared library taking input from hardware-specific
modules and wrangling scene graphs is probably needed to get optimal
behavior.
What streams do is allow allocating the most optimal set of buffers and
using the most optimal method to present them possible given a
configuration. So, streams would kick in after the scene graph thing
generated a config.
Daniel's reply cut out this crucial bit somehow, and he replied somewhere
else that he agrees that eglstreams solves at least the "optimal
allocation once scene graph is fixed" problem. I disagree since this
entire thing is highly dynamic - at least on SoC chips how you allocate
your buffers has big impacts on what the display engine can do, and the
- depending upon tiling layout fifo space requirements change drastically,
and going for the "optimal" tiling might push some other plane over the
edge
- there's simpler stuff like some planes can only do some features like
render compression, which is why even for a TEST_ONLY atomic commit you
must supply all the buffers already
- other fun stuff happens around rotation/scaling/planar vs. single-plane
yuv buffers. All these tend to need special hw resources, which means
your choice in how to use it on the kms side has effects on what kind of
buffer you need to allocate. And the other way round.
I don't think there's any way at all, at least for a generic system that
wants to support embedded/mobile SoCs to solve the kms config and buffer
alloc problems as 2 separate steps. You absolutely need these two pieces
to talk to each other, and talk the same language. Either some vendor
horror show (what most of android bsp end up doing behind the back) or
something standardized (what we're trying to pull off around kms+gbm).
Hiding half of the story behind eglstreams doesn't help anyone afaict. If
you do that, you also need to hide the other half. Which means proprietary
hw composer driver (or similar), which can understand/change the metadata
you internally attach to eglstreams/buffers. And once you've decided to go
the fully hidden route hw composer seems to be the best choice really.
SurfaceFlinger isn't really great for multi-screen, but the hwc interface
itself is already fixed and handles that properly. But even with hwc you
don't have eglstreams, because once both ends are proprietary there's
really no need for any standard any more ;-)
-Daniel
1) You believe HW composer is a reasonable solution to optimizing the
scenegraph of a compositor.
2) Your preferred solution would not be HW composer, but rather GBM+KMS
(presumably with the addition of some yet-to-be-developed APIs)
3) You believe the constraints of the system are sufficiently interdependent
that to optimize the system, all allocations and display engine
configuration must be done atomically, in a sense.
Is that correct?
No to (2): I want to drive kms/gbm towards hwc/gralloc so that at least for
90% of use-cases you can run a generic drm_hwcomposer on top of kms and
generic_gralloc on top of gbm. But there will always be use-cases that
need that last bit of efficiency in some very specific use-case, and for
those hwc+gralloc seem perfectly suited. So no unconditional preference
from my side at all, just a desire to standardize things more, and share
more code across vendors and platforms.

Agreed on 1) & 3).
Post by James Jones
-Do you believe (2) is reasonably achievable, or just the style of solution
you would prefer in general?
See above. Aim for 90 percent.
Post by James Jones
-Why is GBM+DRM-KMS better suited to meet the requirements presented by (3)
than EGLStream+DRM-KMS?
It exists and is widely used in shipping open-source systems like CrOS, X,
wayland and whatever else. eglstreams lacks the adoption, both in
compositors and in open source drivers. You could try to fix that by just
writing the eglstreams support for everyone (including mesa drivers and
all the existing compositors people are building), but I don't see that
happening.
Post by James Jones
Given Wayland is designed such that clients drive buffer allocation, and I
tend to agree that the compositor (along with its access to drivers like
KMS) is the component uniquely able to optimize the scene, I think the best
that can be achieved is a system that gravitates toward the optimal solution
in the steady state. Therefore, it seems that KMS should optimize display
engine resources assuming the Wayland compositor and its clients will adjust
to meet KMS' suggestions over time, where "time" would hopefully be only a
small number of additional frames. Streams will perform quite well in such a
design.
There would of course be cases where multiple iterations are required to get
from the current buffers and their display requirements to the optimal
buffers and the optimal display settings, but I don't see a way around that.
Hopefully over time display hardware will be optimized towards these
use-cases, as they are becoming ubiquitous.
Yeah, that's the idea I have in mind too. Except there's no reason why
you'd hide half of that iterative pipeline improvement behind streams, in
my opinion. At least if you want to support a semi-generic compositor,
which seems to be the goal you have with the proposed eglstreams/egloutput
extensions.

If your goal is simply to eke out the last bit of performance your hw
affords, then we already have hwc+gralloc. It works, and for the last 1-2
years Google engineers have been very open about extending it and fixing
corner cases to make it more widely suitable.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Miguel Angel Vico
2016-05-11 13:24:48 UTC
Permalink
Hi all,

I just sent a second round of patches to add support for EGLStream &
friends in Weston.

Also, we've uploaded two weston branches that include all these patches
on top of weston master branch.

You can find them here:

https://cgit.freedesktop.org/~jjones/weston/

'nvidia_head' contains the same set of patches I sent out for review.

'nvidia_r364' contains a slightly different set of patches to make
weston work with our r364 driver, which doesn't have an implementation
of EGL_WL_wayland_eglstream.


Don't hesitate to send me an email if you have doubts or suggestions
about the patches.

I'm also available on Freenode IRC as 'mvicomoya'.


Thanks,
Miguel.


On Mon, 21 Mar 2016 17:28:13 +0100
Post by Miguel Angel Vico
[...]
https://github.com/aritger/eglstreams-kms-example/blob/master/proposed-extensions/EGL_NV_output_drm_flip_event.txt
This extension defines a new acquire attribute for EGLOutputLayer
consumers tied to DRM KMS CRTCs. It allows clients to be notified,
via a DRM flip event, whenever an acquire operation issued with
eglStreamConsumerAcquireAttribNV() completes.
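
To illustrate how this might look in compositor-drm, here is a
minimal, hypothetical C sketch of a manual acquire whose completion
is delivered through the ordinary DRM page-flip event. The attribute
name EGL_DRM_FLIP_EVENT_DATA_NV and the user-data plumbing are
assumptions taken from the proposed extension text, and the helper
is made up for the example:

    /* Assumed already set up: dpy, stream, drm_fd, and an
     * EGLOutputLayer consumer switched to manual acquire mode. */
    struct output *out = get_output();  /* hypothetical helper */

    /* Issue the acquire; the assumed EGL_DRM_FLIP_EVENT_DATA_NV
     * attribute asks the driver to deliver a DRM flip event
     * carrying this pointer once the flip completes. */
    EGLAttrib attribs[] = {
        EGL_DRM_FLIP_EVENT_DATA_NV, (EGLAttrib)(uintptr_t)out,
        EGL_NONE,
    };
    eglStreamConsumerAcquireAttribNV(dpy, stream, attribs);

    /* Later, in the compositor's event loop, the completion shows
     * up as a regular DRM page-flip event. */
    drmEventContext evctx = {
        .version = DRM_EVENT_CONTEXT_VERSION,
        .page_flip_handler = on_page_flip,  /* receives 'out' back */
    };
    drmHandleEvent(drm_fd, &evctx);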
Additionally, in order to allow wl_buffers to be bound to EGLStreams,
we kludged eglQueryWaylandBufferWL(EGL_WAYLAND_BUFFER_WL) to return
the stream file descriptor. A cleaner way to expose this would be to
(see the sketch after this list):
- Update WL_bind_wayland_display such that eglQueryWaylandBufferWL()
accepts a new attribute EGL_WAYLAND_BUFFER_TYPE_WL, returning
EGL_WAYLAND_BUFFER_EGLIMAGE_WL for the non-stream case.
- Add a new WL_wayland_buffer_eglstream extension, which would define
EGL_WAYLAND_BUFFER_EGLSTREAM_WL as a return value for
EGL_WAYLAND_BUFFER_TYPE_WL, and yet another attribute
EGL_WAYLAND_BUFFER_EGLSTREAM_FD_WL to query the stream file
descriptor.
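
For illustration, here is a short, hypothetical C sketch of how a
compositor might use the proposed query; the attribute names are
taken from the proposal above, while the two helper functions are
made up for the example:

    EGLint type = 0;
    eglQueryWaylandBufferWL(dpy, buffer_resource,
                            EGL_WAYLAND_BUFFER_TYPE_WL, &type);

    if (type == EGL_WAYLAND_BUFFER_EGLSTREAM_WL) {
        /* Stream-backed wl_buffer: fetch the stream fd and attach
         * a GLTexture consumer to it, as described earlier. */
        EGLint stream_fd = -1;
        eglQueryWaylandBufferWL(dpy, buffer_resource,
                                EGL_WAYLAND_BUFFER_EGLSTREAM_FD_WL,
                                &stream_fd);
        attach_texture_consumer(stream_fd);        /* hypothetical */
    } else {
        /* EGL_WAYLAND_BUFFER_EGLIMAGE_WL: behave as today and
         * create an EGLImage from the wl_buffer. */
        create_image_from_buffer(buffer_resource); /* hypothetical */
    }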
I'm planning on posting to this mailing list the set of patches that
add the support mentioned above, hoping to get feedback from you.
Thanks in advance,
--
Miguel
Martin Peres
2016-06-13 09:32:52 UTC
Permalink
Post by Miguel Angel Vico
[...]
Hi everyone,

This discussion has been going on for years (not this thread, the
general discussion). The issue is hindering both Wayland and X, as
Mesa developers cannot enable HW features such as compression or new
tiling modes for scanout buffers, and we need to create one standard
that also works for AMD, NVIDIA, and ARM platforms.

How about we have a dedicated forum/track at XDC just for these
issues, and we do not leave the room/conference without a plan? That
is, after all, the raison d'être of XDC: making it easier for
developers to talk with each other!

During this session, I would like to have all the goals and
requirements laid out in clear lists; then we can work on defining
priorities and try to create the smallest number of protocols that
address as many problems as possible at the same time. Of course, all
the interested parties should be coming to XDC 2016 (Helsinki,
2016-09-21 -> 23).

Martin
Martin Peres
2016-06-13 10:14:42 UTC
Permalink
Post by Martin Peres
This discussion has been going on for years (not this thread, the general
discussion).
Pekka made me realize on IRC that I was not specific enough
about what I meant here.

By discussion here, I am talking about sharing buffers across
blocks/drivers/manufacturers while honouring Wayland's mantra that
every frame must be perfect, as efficiently as possible. This is
related to Google's gralloc, GBM, EGLStreams, and probably more.
James Jones
2016-07-19 18:10:50 UTC
Permalink
Post by Martin Peres
[...]
Thanks for proposing this Martin.

This sounds great, and we're happy to participate if we can get together
a quorum.

Besides the people who were most vocal in this thread, it would be
good to have perspective from several of the ARM SoC vendors and from
Google (ChromeOS and Android).

I'll reach out to some of my contacts there to make sure the right
groups will be represented at XDC, and we'll get the right NVIDIA people
signed up.

Thanks,
-James
Daniel Vetter
2016-07-21 07:16:32 UTC
Permalink
Post by James Jones
[...]
Thanks for proposing this Martin.
This sounds great, and we're happy to participate if we can get together a
quorum.
Besides the people who were most vocal in this thread, it would be good to
have perspective from several of the ARM SoCs and Google (ChromeOS and
Android).
I'll reach out to some of my contacts there to make sure the right groups
will be represented at XDC, and we'll get the right NVIDIA people signed up.
Plenty of kernel folks relevant to this topic should be there (me
included), and I think some Google CrOS folks will show up too.
There's even going to be a presentation about Google's
drm_hwcomposer. I think mostly you need to make sure NVIDIA folks
show up ;-)
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch