Hi Andy,
Post by Andy Ritger
Thanks for the thorough responses, Daniel.
No problem; as I said, I'm actually really happy to see an
implementation out there.
Post by Andy Ritger
Post by Daniel Stone
Post by Miguel Angel Vico
Similarly, EGLOutput will provide means to access different
portions of display control hardware associated with an EGLDevice.
For instance, EGLOutputLayer represents a portion of display
control hardware that accepts an image as input and processes it
for presentation on a display device.
I still struggle to see the value of what is essentially an
abstraction over KMS, but oh well.
The intent wasn't to abstract all of KMS, just the surface presentation
aspect where EGL and KMS intersect. Besides the other points below,
an additional motivation for abstraction is to allow EGL to work with
the native modesetting APIs on other platforms (e.g., OpenWF on QNX).
Fair enough. And, ah, _that's_ where the OpenWF implementation is - I
was honestly unsure for years since the last implementation I saw was
from the ex-Hybrid NVIDIA guys in Helsinki, back when it was aimed at
Series 60.
Post by Andy Ritger
Post by Daniel Stone
Firstly, again looking at the case where a Wayland client is a stream
producer and the Wayland compositor is a consumer, we move from a
model where references to individual buffers are explicitly passed
through the Wayland protocol, to where those buffers merely carry a
reference to a stream. Again, as stated in the review of 4/7, that
looks like it has the potential to break some actual real-world cases,
and I have no idea how to solve it, other than banning mailbox mode,
which would seem to mostly defeat the point of Streams (more on that
below).
Streams are just a transport for frames. The client still explicitly
communicates when a frame is delivered through the stream via wayland
protocol, and the compositor controls when it grabs a new frame, via
eglStreamConsumerAcquireKHR(). Unless there are bugs in the patches,
the flow of buffers is still explicit and fully under the wayland protocol
and compositor's control.
Right, I believe if you have FIFO mode and strictly enforce
synchronisation to wl_surface::frame, then you should be safe. Mailbox
mode or any other kind of SwapInterval(0) equivalent opens you up to a
series of issues.
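
For reference, the consumer-side flow you describe maps in my head to
roughly the following sketch (using the EGL_KHR_stream_consumer_gltexture
entry points; error handling, the eglGetProcAddress lookups and the
producer attach are omitted, and dpy/stream are assumed to already exist):

  /* Sketch: compositor-side GL-texture consumer for an EGLStream. */
  GLuint tex;
  glGenTextures(1, &tex);
  glBindTexture(GL_TEXTURE_EXTERNAL_OES, tex);

  /* Bind the stream's consumer end to the bound external texture. */
  eglStreamConsumerGLTextureExternalKHR(dpy, stream);

  /* Per output frame: latch the latest (mailbox) or next (FIFO) client
   * frame into the texture, composite with it, then hand it back. */
  if (eglStreamConsumerAcquireKHR(dpy, stream)) {
      composite_with_texture(tex);   /* hypothetical compositor helper */
      eglStreamConsumerReleaseKHR(dpy, stream);
  }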
Post by Andy Ritger
Also, mailbox mode versus FIFO mode should essentially equate to Vsync
off versus Vsync on, respectively. It shouldn't have anything to do
with the benefits of streams, but mailbox mode is a nice feature for
benchmarking games/simulations or naively displaying your latest &
greatest content without tearing.
I agree it's definitely a nice thing to have, but it does bring up the
serialisation issue: we expect any configuration performed by the
client (say, wl_surface::set_opaque_region to let the compositor know
where it can disable blending) to be fully in line with buffer
attachment. The extreme case of this is resize, but there are quite a
few valid cases where you need serialisation.
I don't know quite off the top of my head how you'd support mailbox
mode with Streams, given this constraint - you need three-way feedback
between the compositor (recording all associated surface state,
including subsurfaces), clients (recording the surface state valid
when that buffer was posted), and the Streams implementation
(determining which frames to dequeue, which to discard and return to
the client, etc).
Post by Andy Ritger
Post by Daniel Stone
Secondly, looking at the compositor-drm case, the use of the dumb
buffer to display undefined content as a dummy modeset really makes me
uneasy,
Yes, the use of dumb buffer in this patch series is a kludge. If we
were going to use drmModeSetCrtc + EGLStreams, I think we'd want to
pass no fb to drmModeSetCrtc, but that currently gets rejected by DRM.
Are surface-less modesets intended to be allowable in DRM? I can hunt
that down if that is intended to work. Of course, better to work out
how EGLStreams should cooperate with atomic KMS.
It was definitely an oversight to not zero initialize the dumb buffer.
Right, atomic allows you to separate pipe/CRTC configuration from
plane/overlay configuration. So you'd have two options: one is to use
atomic and require the CRTC be configured with planes off before using
Streams to post flips, and the other is to add KMS configuration to
the EGL output.
Though, now I think of it, this effectively precludes one case, which
is scaling a Streams-sourced buffer inside the display controller. In
the GBM case, the compositor gets every buffer, so can configure the
plane scaling in line with buffer display. I don't see how you'd do
that with Streams.
There's another hurdle to overcome too, which would currently preclude
avoiding the intermediate dumb buffer at all. One of the invariants
the atomic KMS API enforces is that (!!plane->crtc_id ==
!!plane->fb_id), i.e. that a plane cannot be assigned to a CRTC
without an active buffer. So again, we're left with either having the
plane fully configured and active (assigned to a CRTC and displaying,
I assume, a pre-allocated dumb buffer), or pushing more configuration
into Streams - specifically, connecting an EGLOutputLayer to an
EGLOutputPort.
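
For what it's worth, the 'CRTC on, planes off' starting state would look
something like this through libdrm's atomic API (a sketch only; the
property IDs and the mode are assumed to have been fetched already via
drmModeObjectGetProperties() and friends):

  drmModeAtomicReq *req = drmModeAtomicAlloc();
  uint32_t mode_blob;

  drmModeCreatePropertyBlob(fd, &mode, sizeof(mode), &mode_blob);

  /* Light up the pipe with no plane assigned at all. */
  drmModeAtomicAddProperty(req, conn_id, conn_crtc_id_prop, crtc_id);
  drmModeAtomicAddProperty(req, crtc_id, crtc_mode_id_prop, mode_blob);
  drmModeAtomicAddProperty(req, crtc_id, crtc_active_prop, 1);

  /* Note: no plane properties here; setting a plane's CRTC_ID without
   * an FB_ID (or vice versa) trips the !!crtc_id == !!fb_id check. */
  drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_ALLOW_MODESET, NULL);
  drmModeAtomicFree(req);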
Post by Andy Ritger
Post by Daniel Stone
Also, I'm not quite sure how you're testing the compositor-as-consumer
mode: I can't seem to see any EGL extensions which allow you to
connect a Wayland surface as an EGLStream consumer. Do you have
something else unpublished that's being used here, or is this what the
libnvidia-egl-wayland library is for? Or do you just have clients
using EGLSurfaces as normal, which happen to be implemented internally
as EGLStreams? (Also, that the only way to test this is through
proprietary drivers implementing only-just-published extensions not
only makes me very sad, but hugely increases the potential for this to
be inadvertently broken.)
Sorry if this seemed cryptic. You are correct that EGL Wayland clients
just use EGLSurfaces as normal (no Wayland client changes), and that
gets implemented using EGLStreams within libnvidia-egl-wayland.
Sorry, I'd missed this whilst reading through.
Post by Andy Ritger
FWIW, we plan to release the source to libnvidia-egl-wayland
eventually... it has a few driver-specific warts right now, but the
intent is that it is a vendor-independent implementation (though, using
EGLStreams, so...) of EGL_KHR_platform_wayland using a set of EGL API
"wrappers". The goal was to allow window systems to write these EGL
platform bindings themselves, so that each EGL implementation doesn't
have to implement each EGL_KHR_platform_*. Anyway, we'll try to get
libnvidia-egl-wayland cleaned up and released.
Interesting!
Post by Andy Ritger
Post by Daniel Stone
Post by Miguel Angel Vico
Thus, a compositor could produce frames and feed them to an
EGLOutputLayer through an EGLStream for presentation on a display
device.
In a similar way, by attaching a GLTexture consumer to a stream, a
producer (wayland client) could feed frames to a texture, which in
turn can be used by a compositor to prepare the final frame to be
presented.
Quick aside: this reminds me in many unfortunate ways of
GLX_EXT_texture_from_pixmap. tfp gave us the same 'capture stream of
stuff and make it appear in a texture' model as streams, whereas most
of the rest of the world (EGL, Vulkan WSI, Wayland, Android, ChromeOS,
etc) have all moved explicitly _away_ from that model to passing
references to individual buffers; this in many ways brings us back to
tfp.
Is that really an accurate comparison? The texture_from_pixmap extension
let X11 composite managers bind a single X pixmap to an OpenGL texture.
It seems to me what was missing in TFP usage was explicit synchronization
between X and/or OpenGL rendering into the pixmap and OpenGL texturing
from the pixmap.
I'd argue that synchronisation (in terms of serialisation with the
rest of the client's protocol stream) is missing from Streams as well,
at least in mailbox mode.
(As an aside, I wonder if it's properly done in FIFO mode as well; the
compositor may very validly choose not to dequeue a buffer if a
surface is completely occluded. How does Streams then know that it can
submit another frame? Generally we use wl_surface::frame to deal with
this - the equivalent of eglSwapInterval(1) - but it sounds like
Streams relies more on strict internal queue/dequeue pairing in FIFO
mode. Maybe this isn't true.)
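
For comparison, the wl_surface::frame throttle I mean amounts to roughly
this on the client side - a sketch of what an EGL implementation's
SwapBuffers path does internally, names simplified:

  #include <stdint.h>
  #include <wayland-client.h>

  static int frame_pending;

  static void frame_done(void *data, struct wl_callback *cb, uint32_t time)
  {
      frame_pending = 0;
      wl_callback_destroy(cb);
  }

  static const struct wl_callback_listener frame_listener = { frame_done };

  /* Equivalent of eglSwapInterval(1): don't start the next frame until
   * the compositor has signalled the previous one. */
  static void submit_frame(struct wl_surface *surface, struct wl_buffer *buffer)
  {
      struct wl_callback *cb = wl_surface_frame(surface);
      wl_callback_add_listener(cb, &frame_listener, NULL);
      frame_pending = 1;

      wl_surface_attach(surface, buffer, 0, 0);
      wl_surface_damage(surface, 0, 0, INT32_MAX, INT32_MAX);
      wl_surface_commit(surface);
      /* ... wl_display_dispatch() until frame_pending drops ... */
  }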
Post by Andy Ritger
Post by Daniel Stone
Post by Miguel Angel Vico
Whenever EGL_EXT_device_drm extension is present, EGLDevice can
be used to enumerate and access DRM KMS devices, and EGLOutputLayer
to enumerate and access DRM KMS crtcs and planes.
Again, the enumeration isn't so much used as bypassed. The original
enumeration is used, and all we do with the EGL objects is a) list all
of them, b) filter them to find the one we already have, and c)
perhaps replace their internal representation of the device with the
one we already have.
That's fair in the context of this patch set.
In general, EGLDevice provides device enumeration for other use cases
where it is the basis for bootstrapping. Maybe we could better reconcile
udev and EGLDevice in the patch set, but some of this is a natural, though
unfortunate, artifact of correlating objects between two enumeration APIs.
Mind you, this wasn't intended as a criticism, just noting that the
commit message didn't accurately describe the code.
Post by Andy Ritger
Post by Daniel Stone
I'd like to look at the elephant in the room, which is why you're
using this in the first place (aside from general NVIDIA enthusiasm
for encapsulating everything within EGL Streams/Output/Device/etc,
dating back many years). Andy/Aaron, you've said that you found GBM to
be inadequate, and I'd like to find out explicitly how.
Thanks. This is the real heart of the debate.
Yes!
Post by Andy Ritger
Post by Daniel Stone
Through a few
'We can't choose an optimal rendering configuration, because we don't
know how it's going to be used' - (almost completely) untrue. The FD
you pass to gbm_device_create is that of the KMS device, a gbm_surface
contains information as to how the plane (primary or overlay) will be
configured,
Maybe I'm not looking in the right place, but where does gbm_surface get
the intended plane configuration? Are there other display-related flags
beside GBM_BO_USE_SCANOUT? Then again, the particular plane doesn't
impact us for current GPUs.
Well, nowhere. By current plane configuration, I assume you're (to the
extent that you can discuss it) talking about asymmetric plane
capabilities, e.g. support for disjoint colour formats, scaling units,
etc? As Dan V says, I still see Streams as a rather incomplete fix to
this, given that plane assignment is pre-determined: what do you do
when your buffers are configured as optimally as possible, but the
compositor has picked the 'wrong' plane? I really think you need
something like HWC to rewrite your scene graph into the optimal setup.
Post by Andy Ritger
Post by Daniel Stone
and an EGLDisplay lets you tie the rendering and scanout
devices together. What more information do you need? It's true that we
don't have a way to select individual rendering devices at the moment,
but as said earlier, passing an EGLDevice as an attrib to
GetPlatformDisplay would resolve that, as you would have the render
device identified by the EGLDevice and the scanout device identified
by the gbm_device. At that point, you have the full pipeline and can
determine the optimal configuration.
Beyond choosing optimal rendering configuration, there is arbitration of
the scarce resources needed for optimal rendering configuration. E.g.,
for Wayland compositor flipping to client-produced buffers, presumably the
client's buffer needs to be allocated with GBM_BO_USE_SCANOUT. NVIDIA's
display hardware requires physically contiguous buffers, so we wouldn't
want clients to _always_ allocate buffers with the GBM_BO_USE_SCANOUT
flag. It would be nice to have feedback between the EGL driver instance
in the compositor and the EGL driver running in the client, to know how
the buffer is going to be used by the Wayland compositor.
I imagine other hardware has even more severe constraints on displayable
memory, though, so maybe I'm misunderstanding something about how buffers
are shared between wayland clients and compositors?
Ah! This is something I've very much had in mind for quite a while -
but keep getting pre-empted - and didn't bring up as it didn't seem to
be implemented in the current patchset. (IIRC,
jajones had some code to allow you to retarget Streams at different
consumers, but he's on leave.)
Also, I should add that there's nothing requiring clients to use GBM
to allocate. The client EGLSurface implementation is free to do purely
internal allocations that are only accessible to it, if it wants to;
gbm_bo_import would then note that the buffer is not usable for
scanout and fail the import, leaving the compositor to fall back to
EGLImage.
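
Concretely, the import-or-fallback step on the compositor side is roughly
(a sketch; error handling trimmed, and the XRGB8888 depth/bpp are just an
example):

  struct gbm_bo *bo = gbm_bo_import(gbm, GBM_BO_IMPORT_WL_BUFFER,
                                    buffer_resource, GBM_BO_USE_SCANOUT);
  if (bo) {
      /* Scanout-capable: wrap it in a KMS framebuffer and flip to it. */
      uint32_t fb_id;
      drmModeAddFB(drm_fd, gbm_bo_get_width(bo), gbm_bo_get_height(bo),
                   24, 32, gbm_bo_get_stride(bo),
                   gbm_bo_get_handle(bo).u32, &fb_id);
      /* ... assign fb_id to the plane ... */
  } else {
      /* Not scanout-capable (e.g. a purely internal client allocation):
       * fall back to texturing from it via EGLImage. */
      EGLImageKHR img = eglCreateImageKHR(egl_dpy, EGL_NO_CONTEXT,
                                          EGL_WAYLAND_BUFFER_WL,
                                          (EGLClientBuffer)buffer_resource,
                                          NULL);
      /* ... glEGLImageTargetTexture2DOES() + composite ... */
  }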
Post by Andy Ritger
This ties into the next point...
Post by Daniel Stone
'We don't know when to schedule decompression, because there's no
explicit barrier' - completely untrue. eglSwapBuffers is that barrier.
For example, in Freescale i.MX6, the Vivante GPU and Freescale IPU
(display controller) do not share a single common format between GPU
render targets and IPU scanout sources, so require a mandatory
detiling pass in between render and display. These work just fine with
gbm with that pass scheduled by eglSwapBuffers. This to me seems
completely explicit, unless there was something else you were meaning
... ?
The Vivante+Freescale example is a good one, but it would be more
interesting if they shared /some/ formats and you could only use those
common formats in /some/ cases.
That's also fairly common, particularly for tiling. Intel has more
tiling modes than I can remember, of which only one (X-tiling) is a
valid source for scanout. As you say, physical contiguity is also a
valid requirement, plus pitch alignment.
Post by Andy Ritger
I think a lot of the concern is about passing client-produced frames
all the way through to scanout (i.e., zero-copy). E.g., if the wayland
client is producing frames that the wayland compositor is going to use
as a texture, then we don't want the client to decompress as part of its
eglSwapBuffers: the wayland compositor will texture from the compressed
frame for best performance. But, if the wayland compositor is going to
flip to the surface, then we would want the client to decompress during
its eglSwapBuffers.
Yes, very much so. Taking the Freescale example, you want the client
to do a detiling blit during its swap if the surface is a valid
scanout target, but not at all if it's just getting textured by the
GPU anyway. Similarly, Intel wants to allocate X-tiled if scanout is
possible, but otherwise it wants to be Y/Yf/...-tiled.
Post by Andy Ritger
The nice thing about EGLStreams here is that if the consumer (the Wayland
compositor) wants to use the content in a different way, the producer
must be notified first, in order to produce something suitable for the
new consumer.
I believe this is entirely doable with GBM right now, taking advantage
of the fact that libgbm.so and libEGL.so must be as tightly paired as
libEGL.so and libGLESv2.so. For all of these, read 'wl_drm' as 'wl_drm
or its equivalent interface in other implementations'.
Firstly, create a new interface in wl_drm to represent a swapchain (in
the Vulkan sense), and modify its buffer-creation requests to take a
swapchain parameter. This we can do without penalty, since the only
users (aside from VA-API, which is really broken and also hopefully
soon to lose its Wayland sink anyway) are EGL_EXT_platform_wayland and
EGL_WL_bind_wayland_display, both within the same DSO.
Secondly, instrument gbm_bo_import's wl_buffer path (proxy for intent
to use a buffer for direct scanout) and EGLImage's
EGL_WAYLAND_BUFFER_WL path (proxy for intent to use via GPU
composition) to determine what the compositor is actually doing with
these buffers, and use that to store target/intent in the swapchain.
Thirdly, when the target/intent changes (e.g. 'was scanout every
frame, has been EGLImage for the last 120 frames'), send an event down
to the client to let it know to modify its allocation. The combination
of EGL/GBM are in the correct place to determine this, since between
them they already have to know the intersection of capabilities
between render and scanout.
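
In entirely hypothetical pseudo-C (none of these protocol names exist
today), the bookkeeping inside the compositor-side EGL/GBM DSO could look
something like:

  enum swapchain_target { TARGET_UNKNOWN, TARGET_SCANOUT, TARGET_TEXTURE };

  struct wl_swapchain {             /* hypothetical wl_drm extension */
      struct wl_resource *resource;
      enum swapchain_target current;
      unsigned scanout_streak, texture_streak;
  };

  static void swapchain_note_usage(struct wl_swapchain *sc,
                                   enum swapchain_target used)
  {
      if (used == TARGET_SCANOUT) {
          sc->scanout_streak++;
          sc->texture_streak = 0;
      } else {
          sc->texture_streak++;
          sc->scanout_streak = 0;
      }

      /* 'was scanout every frame, has been EGLImage for the last 120
       * frames' -> tell the client to re-allocate for texturing.
       * (Symmetric logic for the other direction omitted.) */
      if (sc->current != TARGET_TEXTURE && sc->texture_streak >= 120) {
          sc->current = TARGET_TEXTURE;
          /* hypothetical event on the swapchain interface */
          wl_swapchain_send_retarget(sc->resource, TARGET_TEXTURE);
      }
  }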
That still doesn't solve the optimal-display-configuration problem -
having generic code determine not only the display strategy
(scanout vs. GPU composition) but also the exact display controller
configuration - but neither does EGLStreams, or indeed anything
current short of HWC.
Do you see any problem with doing that within GBM? It's not actually
done yet, but then again, neither is direct scanout through Streams.
;)
Post by Andy Ritger
Post by Daniel Stone
'Width, height, pitch and format aren't enough information' - this is
true, but not necessarily relevant. I'm not sure what the source of
this actually is: is it the gbm_bo_get_*() APIs? If so, yes, they need
to be extended with a gbm_bo_get_modifier() call, which would allow
you to get the DRM format modifier to describe tiling/compression/et
al (as well as perhaps being extended to allow you to extract multiple
buffers/planes, e.g. to attach auxiliary compression buffers). If it's
not gbm, what actually is it? The only other place I can think of
(suggested by Pekka, I think) was the wl_drm protocol, which it should
be stressed is a) not required in any way by Wayland, b) not a
published/public protocol, c) not a stable protocol. wl_drm just
happens to be the way that Mesa shares buffers, just as wl_viv is how
Vivante's proprietary driver shares buffers, and mali_buffer_sharing
is how the Mali driver does it. Since the server side is bound by
eglBindWaylandDisplayWL and the client side is also only used through
EGL, there is _no_ requirement for you to also implement wl_drm. As it
is a hidden private Mesa protocol, there is also no requirement for
the protocol to remain stable.
I agree that wl_drm doesn't factor into it.
Maybe some of this is my confusion over what parts of gbm.h are
application-facing, and what parts are driver-facing? We, and
presumably most hardware vendors, would want the ability to associate
arbitrary metadata with gbm_bo's, but most of that metadata is
implementation-specific, and not really something an application should
be looking at without sacrificing portability.
All of gbm.h is user-facing; how you implement that API is completely
up to you, including arbitrary metadata. For instance, it's the driver
that allocates its own struct gbm_surface/gbm_bo/etc (which is
opaque), so it can do whatever it likes in terms of metadata. Is there
anything in particular you're thinking of that you're not sure you'd
be able to store portably?
Might also be worth striking a common misconception here: the Mesa GBM
implementation is _not_ canonical. gbm.h is the user-facing API you
have to implement, but beyond that, you don't need to be implemented
by Mesa's src/gbm/. As the gbm.h types are all opaque, I'm not sure
what you couldn't express/hide/store - do you have any examples?
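
To illustrate, since gbm.h only ever hands out opaque pointers, a vendor
backend is free to define the objects however it likes; something along
these lines (field names purely illustrative, not Mesa's layout):

  struct gbm_bo {
      struct gbm_device *device;
      uint32_t width, height, stride, format;
      union gbm_bo_handle handle;
      void *user_data;

      /* implementation-private, never visible through gbm.h: */
      uint64_t tiling_mode;
      uint64_t compression_state;
      struct nv_alloc *vidmem;        /* hypothetical driver handle */
  };

  uint32_t gbm_bo_get_width(struct gbm_bo *bo)
  {
      return bo->width;
  }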
Post by Andy Ritger
Post by Daniel Stone
'EGLStreams is the direction taken in Vulkan' - I would argue not. IMO
the explicit buffer management on the client side does not parallel
EGLStreams, and notably there is no equivalent consumer interface
offered on the server side, but instead the individual-buffer-driven
approach is taken. It's true that VK_WSI_display_swapchain does exist
and does match the EGLStreams model fairly closely, but also that it
does not have universal implementation: the Intel 'anv' Mesa-based
driver does not implement display_swapchain, instead having an
interface to export a VkImage as a dmabuf. It's true that the latter
is not optimal (it lacks the explicit targeting required to determine
the most optimal tiling/compression strategy), but OTOH it is
precedent for explicitly avoiding the
VK_WSI_display_swapchain/EGLStreams model for Vulkan on KMS, just as
GBM avoids it for EGL on KMS.
From your perspective, what would be more optimal than VkImage+dmabuf?
Well, it's pretty much on par with the GBM compositor/Wayland-client
path, or an EGLStreams pipeline ending in an EGLOutput. Not having something
like HWC means that you can't determine the optimal plane-allocation
strategy.
Post by Andy Ritger
Post by Daniel Stone
Agreed. One of the things I've been incredibly happy with is how our
platform has managed to stay completely generic and vendor-neutral so
far, and I'd love to preserve that.
I don't think you'll find any disagreement to that from NVIDIA, either.
I apologize if the EGLStreams proposal gave the impression of a
vendor-private solution. That wasn't the intent. The EGLStream family
of extensions are, after all, an open specification that any EGL vendor
can implement. If there are aspects of any of these EGL extensions that
seem useful, I'd hope that Mesa would be willing to adopt them.
Indeed, this wasn't to cast any aspersions on how you guys have
developed Streams. Having it out there and having these patches has
really been tremendously useful.
Post by Andy Ritger
We (NVIDIA) clearly think EGLStreams is a good direction for expressing
buffer sharing semantics. In our ideal world, everyone would implement
these extensions and Wayland compositors would migrate to using them as
the generic vendor-neutral mechanism for buffer sharing :)
But here's where my problem lies. At the moment, the 'how do I
Wayland' story is very straightforward, and not entirely
coincidentally similar to ChromeOS's: you implement GBM+KMS, you
implement the ~25 LoC of libwayland-egl, you implement
EGL_EXT_platform_{gbm,wayland}, and ... that's it. Introducing Streams
as an alternate model is certainly interesting, and I understand why
you would do it, but having it as the sole option muddies the 'how do
I Wayland' story significantly.
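
(Those ~25 LoC really are just a thin handle, something like the sketch
below - the exact struct layout being a private contract with whatever
EGL implementation sits next to it:)

  #include <stdlib.h>
  #include <wayland-client.h>

  struct wl_egl_window {
      struct wl_surface *surface;
      int width, height;
      int dx, dy;
      void *driver_private;
  };

  struct wl_egl_window *
  wl_egl_window_create(struct wl_surface *surface, int width, int height)
  {
      struct wl_egl_window *win = calloc(1, sizeof(*win));
      if (!win)
          return NULL;
      win->surface = surface;
      win->width = width;
      win->height = height;
      return win;
  }

  void
  wl_egl_window_resize(struct wl_egl_window *win, int width, int height,
                       int dx, int dy)
  {
      win->width = width;
      win->height = height;
      win->dx = dx;
      win->dy = dy;
  }

  void
  wl_egl_window_destroy(struct wl_egl_window *win)
  {
      free(win);
  }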
Getting away from the vendor-bound DDX model was something we were
desperate to do (see also xf86-video-modesetting landing on GBM+EGL),
and I'd really just like to avoid that becoming 'well, for most
platforms you do this, but for this platform / these platforms, you do
this instead ...'.
Cheers,
Daniel