hunchentoot-stuck-connection-monitor

2024-10-12

Monitors hunchentoot connections and logs the connections stuck in the same state for a long time (due to slow or inactive clients and network stream timeouts that hunchentoot tries to utilize not working properly). Offers an option to shutdown the stuck connections sockets manually or automatically, thus unblocking the connection threads and preventing thread and socket leak. See https://github.com/edicl/hunchentoot/issues/189

Upstream URL

github.com/avodonosov/hunchentoot-stuck-connection-monitor

Author

Anton Vodonosov <avodonosov@yandex.ru>

License

BSD-2-Clause
README

Monitors hunchentoot connections and logs the connections stuck in the same state for a long time.

Offers an option to shutdown the stuck connections sockets manually or automatically, thus unblocking the connection threads and preventing thread and socket leakage.

Usage

Create your own acceptor subclass and put our mixin class in the class precedence list before hunchentoot:acceptor, like this: test-stuck-connection-monitor.lisp#L47

The public API is in the defpackage form at the top of stuck-connection-monitor.lisp.

Background

Hunchentoot uses a thread per connection model. To prevent thread and socket leakage due to slow, inactive or lost clients, hunchentoot configures IO timeout for connection socket streams, using a lisp-implementation specific way: https://github.com/edicl/hunchentoot/blob/master/set-timeouts.lisp#L31

This does not always help.

One reported scenario of thread leakage is TLS mode (hunchentoot:ssl-acceptor). The default behaviour of cl+ssl is to retrieve socket file descriptor from socket lisp stream, and give the file descriptor to OpenSSL to perform all IO. In result the timeout settings of lisp stream are ignored.

To fix that users can switch cl+ssl to LispBIO mode, where all IO is performed over lisp stream, and so the timeouts remain working.

To do that either globally (setq cl+ssl:*default-unwrap-stream-p* nil) or override the hunchentoot:initialize-connection-stream for your acceptor, and inside of this method add :unwrap-stream-p nil parameter to the call to cl+ssl:make-ssl-server-stream: https://github.com/edicl/hunchentoot/blob/460a32c0378c659cf8740ad84a47169484f9b131/ssl.lisp#L83

However, there are more potential cases of stuck connection.

Slow request attack, where client sends data over the request stream very slowly, say one byte in 15 seconds, thus preventing timeout from happening, keeping server resources - socket and connection thread - occupied.

Another possibility is the lisp implementation-specific IO timeouts not working. For example on SBCL write timeouts are not supported (https://groups.google.com/g/sbcl-devel/c/-eLw-Wv3Prc). So if client does not read server response, the connection thread may be blocked. Although not forever. TCP implementation will be trying to re-transmit the data not acknowledged by the client, and will finally give up and close the socket (online articles suggest it will take round 15 minutes, in my testing that took 20 minutes).

Therefore it's good to have a health check service for hucnhentoot, that detects stuck connections.

Of course, not all applications want to consider connections staying in the same state for a long time as a problem. For example, a server offering some long polling endpoint, or supporting very large file downloads, may consider such connections as fine.

Reproducing

The file test-stuck-connection-monitor.lisp contains steps to reproduce various scenarios described above, and to see how hunchentoot-stuck-connection-monitor detects them.

Unblocking the Threads

To unblock the connection threads we suggest to shutdown the connection socket.

Strictly speaking, calling socket operations from a thread other than the connection thread that currently reads or writes to socket is not guaranteed to work correctly, socket API is single threaded.

But shutdown works fine on Linux and unblocks the connection thread. (Unlike, for example, close function).

Some StackOverflow user said (I don't have a link) that on Windows it's other way around, shutdown is unsafe to call from another thread, while close (or closesocket, I don't remember) is fine. I haven't verified that.

Alternatively to socket shutdown, the user can watch logs for stuck-connection-monitor messages and, for example, restart the web server container.

Dependencies (3)

  • bordeaux-threads
  • hunchentoot
  • usocket

Dependents (0)

    • GitHub
    • Quicklisp