[PATCH] wayland-server: update the client fd when it failed to flush with EAGAIN

Discussion:

Jeonghyun Kang

2018-11-23 05:08:48 UTC

_______________________________________________
wayland-devel mailing list
wayland-***@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/wayland-devel

Derek Foreman

2018-11-26 17:12:08 UTC

Permalink

When a wayland compositor gets an EAGAIN error whenever
sending or receiving event(s) to a client in the
wl_closure_send() or the wl_closure_queue(), the error
variable of the wl_client for the client will be set to
true and the client is going to be killed by the
compositor later soon. This patch fixes the problem by
updating the socket fd's event mask as we're doing
in wl_display_flush_clients() for having another chance
to do it again.
Actually, this kind of problem can be watched in an
environment in which massive input events are coming
from multi-touch screen or multi-pen devices. In this
kind of environment, a client receiving the massive
input events are trying to drawing something very hard
but it's being killed sooner or later. With the given
patch, the client receiving the input events is not being
destroyed and is working well even though it doesn't get
the whole input events from the compositor.

This is actually intentional behavior to avoid infinitely growing send
queues for clients that process events too slowly.

I don't think dropping events randomly (or at all) because the client
can't keep up is something we should do in libwayland. It'll just make
for nearly impossible to reproduce and debug problems when client and
compositor state fall out of sync.

What if it's a frame event that gets dropped and not an input event? Or
a resource destroy?

The disconnect is a clear indication that there's a problem that needs
to be fixed somewhere else. The client needs to process events in a
timely fashion, or the compositor needs to send less events.

Thanks,
Derek

---
src/wayland-server.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/src/wayland-server.c b/src/wayland-server.c
index eae8d2e..5afaa28 100644
--- a/src/wayland-server.c
+++ b/src/wayland-server.c
@@ -222,8 +222,15 @@ handle_array(struct wl_resource *resource, uint32_t
opcode,

        log_closure(resource, closure, true);

-       if (send_func(closure, resource->client->connection))
-               resource->client->error = 1;
+       if (send_func(closure, resource->client->connection)) {
+               if (errno == EAGAIN) {
+                       wl_event_source_fd_update(resource->client->source,
+                                                 WL_EVENT_WRITABLE |
+                                                 WL_EVENT_READABLE);
+               } else {
+                       resource->client->error = 1;
+               }
+       }

        wl_closure_destroy(closure);
}
--
2.7.4
_______________________________________________
wayland-devel mailing list
https://lists.freedesktop.org/mailman/listinfo/wayland-devel

Pekka Paalanen

2018-11-27 08:17:43 UTC

Permalink

On Mon, 26 Nov 2018 11:12:08 -0600

Post by Derek Foreman

This is actually intentional behavior to avoid infinitely growing send
queues for clients that process events too slowly.

Indeed. If a client cannot keep up with the compositor because the
client is doing something else, the client is by definition not
responsive enough and need to be fixed. It may also be a compositor bug
attempting to send too much data to a client, in which case the
compositor or even the protocol design will need changes to reduce the
volume.

Post by Derek Foreman
I don't think dropping events randomly (or at all) because the client
can't keep up is something we should do in libwayland. It'll just make
for nearly impossible to reproduce and debug problems when client and
compositor state fall out of sync.

Yes, dropping events is never acceptable.

The whole Wayland protocol design relies on the fact that no message is
*ever* dropped. The fundamental protocol design also relies on all
messages being in order, which means that re-ordering is as bad as
dropping events.

If there is any situation where a message would need to be dropped, it
is much more merciful to just disconnect the client, because otherwise
the server and client would get out of sync, which just causes more
problems that are extremely painful to debug.

Post by Derek Foreman
What if it's a frame event that gets dropped and not an input event? Or
a resource destroy?
The disconnect is a clear indication that there's a problem that needs
to be fixed somewhere else. The client needs to process events in a
timely fashion, or the compositor needs to send less events.

Exactly.

NAK for the patch.

Thanks,
pq

Post by Derek Foreman
Thanks,
Derek

---
Â src/wayland-server.c | 11 +++++++++--
Â 1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/src/wayland-server.c b/src/wayland-server.c
index eae8d2e..5afaa28 100644
--- a/src/wayland-server.c
+++ b/src/wayland-server.c
@@ -222,8 +222,15 @@ handle_array(struct wl_resource *resource, uint32_t
opcode,
Â
Â Â Â Â Â Â Â log_closure(resource, closure, true);
Â
-Â Â Â Â Â Â if (send_func(closure, resource->client->connection))
-Â Â Â Â Â Â Â Â Â Â Â Â Â Â resource->client->error = 1;
+Â Â Â Â Â Â if (send_func(closure, resource->client->connection)) {
+Â Â Â Â Â Â Â Â Â Â Â Â Â Â if (errno == EAGAIN) {
+Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â wl_event_source_fd_update(resource->client->source,
+Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â WL_EVENT_WRITABLE |
+Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â WL_EVENT_READABLE);
+Â Â Â Â Â Â Â Â Â Â Â Â Â Â } else {
+Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â resource->client->error = 1;
+Â Â Â Â Â Â Â Â Â Â Â Â Â Â }
+Â Â Â Â Â Â }
Â
Â Â Â Â Â Â Â wl_closure_destroy(closure);
Â }
--
2.7.4

박성진

2018-11-27 08:59:16 UTC

Permalink

_______________________________________________
wayland-devel mailing list
wayland-***@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/wayland-devel

Pekka Paalanen

2018-11-27 12:24:07 UTC

Permalink

On Tue, 27 Nov 2018 17:59:16 +0900

Post by Derek Foreman
What if it's a frame event that gets dropped and not an input event? Or
a resource destroy?

Instead of destroying a connection, destroying a resource seems to be more
reasonable.

No, you cannot do that either. It too will cause the client and the
server to get out of sync, which may eventually lead to a fatal
protocol error.

Really, the only thing you can do is either disconnect the client or
halt its connection until it drains. Halting would require explicit
support throughout the compositor to avoid sending any events, so
disconnecting is the only thing that libwayland-server can do by itself.

Thanks,
pq

박성진

2018-11-28 00:49:54 UTC

Permalink

_______________________________________________
wayland-devel mailing list
wayland-***@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/wayland-devel

Pekka Paalanen

2018-11-28 08:48:45 UTC

Permalink

On Wed, 28 Nov 2018 09:49:54 +0900

Post by Pekka Paalanen
On Tue, 27 Nov 2018 17:59:16 +0900

Post by Derek Foreman
What if it's a frame event that gets dropped and not an input event? Or
a resource destroy?

Instead of destroying a connection, destroying a resource seems to be more
reasonable.

No, you cannot do that either. It too will cause the client and the
server to get out of sync, which may eventually lead to a fatal
protocol error.
Really, the only thing you can do is either disconnect the client or
halt its connection until it drains. Halting would require explicit
support throughout the compositor to avoid sending any events, so
disconnecting is the only thing that libwayland-server can do by itself.

Oops ! Actually I had to say "removing/re-creating resources like
wl_touch interface by updating capabilities via wl_seat capabilities
event seems better than destroying the client." As we have an
electronic whiteboard products that can be drawn with 4 pens
simultaneouly in high resolution, killing an important app that draws
trajectories of each pen may not be the best option.

That is going to be difficult as well, because to revoke a capability
you need to send an event and only then the compositor can decide to
stop sending the related input events. Sending an event will be hard
when the send buffers are already full.

It would also just remove one kind of event source from the client, but
you still have lots of other event sources that you cannot revoke this
way, and they also need to continue working. That will be very hard when
the send buffers are already full.

What you are thinking of here is just a tiny piece of the "halt the
connection" approach, and I very much doubt it alone would do any good.

Also, removing capabilities might prevent disconnects, but it will
still have the app misbehave from a user perspective, making the system
look broken in any case. I think it would be better to primarily avoid
ending up with overflowing messages, and secondary when it happens,
make it very obvious it happened so you get clear error reports, and
then recover by e.g. re-connecting if possible. I don't think papering
over the issue and obfuscating the symptoms would be a good idea.

Now we're going to do the further analysis about how many events are
coming from kernel/compositor and how fast the client can process the
events.

Keep your messages as small as possible, and return to your main event
loop to flush and read the socket as often as possible. That's the only
general advice coming to my mind.

Everything else will be kind of case by case, e.g. you might be
able to compress pointer motion events in the compositor if the input
device is sends too many too fast. Of course, compressing motion events
means that the client no longer sees the true trajectory but some
approximation of it.

Sometimes you may have to change your protocol design. If you need to
send a lot of data in one go, consider using shared memory instead of
many and/or big messages.

Thanks,
pq