hunchentoot-recycling-taskmaster

2026-01-01

An experiment to improve multithreading performance of Hunchentoot without any additional dependencies.

Upstream URL

github.com/y2q-actionman/hunchentoot-recycling-taskmaster

Author

Yokota Yuki

License

BSD 2-Clause
README

1 Abstract

hunchentoot-recycling-taskmaster is a taskmaster implementation for Hunchentoot, aiming to improve connection establishment efficiency through thread-pooling and flexible thread count adjustment.

2 Performance tl;dr

file:pics/benchmark-result-2025-12-26/sleep_1ms/keep-alive/100_connections_requests_sec__sleep_1ms__keep-alive__higher_is_better.svg

file:pics/benchmark-result-2025-12-26/sleep_1ms/no-keep-alive/100_connections_requests_sec__sleep_1ms__no_keep-alive__higher_is_better.svg

In this benchmark, my HTTP handler always responds after a 1ms delay to simulate some workload.

  • Hunchentoot is an all-rounder. It works well on keep-alive connections. I think it is good for typical use cases.
  • If your workload does not use keep-alive, hunchentoot-recycling-taskmaster may be useful.
  • Woo is very difficult to use. Woo seems fast only when your handlers run with very low latency. See About Woo below.

For details, see Benchmark below.

3 How to use

3.1 Caution about LispWorks

On LispWorks, hunchentoot-recycling-taskmaster does not work, because Hunchentoot on that platform does not handle the listen socket directly.

3.2 Installation

3.2.1 Loading

Currently hunchentoot-recycling-taskmaster is available only from Ultralisp, not from Quicklisp or ocicl.

If you use Ultralisp, just ql:quickload it:

  (ql:quickload "hunchentoot-recycling-taskmaster")

Or download it yourself:

  cd ~/quicklisp/local-projects
  git clone https://github.com/y2q-actionman/hunchentoot-recycling-taskmaster.git
  (ql:register-local-projects)            ; Do if required
  (ql:quickload "hunchentoot-recycling-taskmaster")

3.2.2 Running tests

  (ql:quickload "hunchentoot-recycling-taskmaster-test")
  (asdf:test-system '#:hunchentoot-recycling-taskmaster)

3.3 Starting/stopping the server

You can use hunchentoot-recycling-taskmaster just by changing hunchentoot:acceptor to hunchentoot-recycling-taskmaster:parallel-acceptor, or hunchentoot:easy-acceptor to hunchentoot-recycling-taskmaster:parallel-easy-acceptor.

  (defparameter *test-server*
    (make-instance 'hunchentoot-recycling-taskmaster:parallel-easy-acceptor
		   :port 4242))
  (hunchentoot:start *test-server*)
  curl "http://127.0.0.1:4242/yo"
  # => "Hey!"

To stop it, hunchentoot:stop can be used.

  (hunchentoot:stop *test-server*)

See demo.lisp for the sample code above.

3.4 API

These symbols are exported from the hunchentoot-recycling-taskmaster package. Please see their docstrings. A minimal usage sketch follows the list.

  • [Class] parallel-acceptor
  • [Class] parallel-easy-acceptor
  • [Class] parallel-ssl-acceptor
  • [Class] parallel-easy-ssl-acceptor
  • [Class] recycling-taskmaster
  • [Variable] *default-standby-thread-count*
  • [Function] abandon-acceptor
  • [Condition] recycling-taskmaster-corrupted-error
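
For example, starting a server with a non-default standby thread count might look like the sketch below. That *default-standby-thread-count* is consulted when the acceptor builds its taskmaster is my assumption based on the exported names above; check the docstrings for the authoritative behaviour.

  ;; Sketch only: the exact initargs of RECYCLING-TASKMASTER are not shown here,
  ;; so this relies on the (assumed) role of *DEFAULT-STANDBY-THREAD-COUNT*.
  (setf hunchentoot-recycling-taskmaster:*default-standby-thread-count* 8)

  (defparameter *server*
    (make-instance 'hunchentoot-recycling-taskmaster:parallel-easy-acceptor
                   :port 4242))
  (hunchentoot:start *server*)

  ;; ABANDON-ACCEPTOR is exported as well; see its docstring for how it differs
  ;; from HUNCHENTOOT:STOP.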

4 How it works

This section shows how hunchentoot-recycling-taskmaster works, compared with other implementations.

4.1 Hunchentoot: one thread per connection

file:pics/architecture/hunchentoot-architecture.dot.png

Hunchentoot uses one thread per connection. This means that when a client uses one keep-alive connection for multiple requests, Hunchentoot dedicates one thread to that connection. There is some delay for new connections, but it works well on keep-alive connections.
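
Schematically, that model looks like the sketch below. This is my own illustration using usocket and bordeaux-threads, not Hunchentoot's actual code.

  ;; Schematic only: one thread is created per accepted connection and owns it
  ;; until the client closes, so keep-alive requests reuse the same thread.
  (defun serve-one-thread-per-connection (port handle-requests)
    (let ((listener (usocket:socket-listen "0.0.0.0" port :reuse-address t)))
      (loop
        (let ((connection (usocket:socket-accept listener)))
          ;; Thread creation happens on every new connection -- this is the
          ;; per-connection delay mentioned above.
          (bt:make-thread
           (lambda ()
             (unwind-protect
                  (funcall handle-requests (usocket:socket-stream connection))
               (usocket:socket-close connection))))))))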

4.2 quux-hunchentoot and cl-tbnl-gserver-tmgr: thread pooling

file:pics/architecture/hunchentoot-thread-pooling.dot.png

These implementations put a thread pool around Hunchentoot. Instead of making a new thread for each new connection, they reuse threads kept in their thread pool, reducing the latency of new connections.
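
Schematically, the pooled model replaces per-connection thread creation with a shared queue, roughly like the sketch below. Again, this is my own illustration (using only bordeaux-threads), not the actual code of quux-hunchentoot or cl-tbnl-gserver-tmgr.

  ;; Schematic fixed-size pool: an acceptor pushes accepted connections onto a
  ;; queue, and N long-lived workers pop from it instead of being created per
  ;; connection.
  (defstruct pool
    (lock (bt:make-lock))
    (ready (bt:make-condition-variable))
    (queue '()))

  (defun pool-push (pool connection)
    (bt:with-lock-held ((pool-lock pool))
      (push connection (pool-queue pool))
      (bt:condition-notify (pool-ready pool))))

  (defun pool-pop (pool)
    (bt:with-lock-held ((pool-lock pool))
      (loop while (null (pool-queue pool))
            do (bt:condition-wait (pool-ready pool) (pool-lock pool)))
      (pop (pool-queue pool))))

  (defun start-workers (pool n handle-requests)
    (dotimes (i n)
      (bt:make-thread
       (lambda () (loop (funcall handle-requests (pool-pop pool)))))))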

However, their benchmarks don't show significant differences from the original Hunchentoot. I suspect there are two reasons:

  1. HTTP benchmarking tools, such as wrk, use keep-alive connections, so the effect of thread pooling is limited to the beginning of the benchmark.
  2. Their thread pools may be fixed in size, so unlike Hunchentoot they cannot increase the number of threads under high load.

4.3 hunchentoot-recycling-taskmaster

file:pics/architecture/hunchentoot-recycling-taskmaster-architecture.dot.png

hunchentoot-recycling-taskmaster tries to get both: thread pooling and dynamically changing the number of threads. In hunchentoot-recycling-taskmaster, threads do not only work on connected sockets but also handle management tasks, such as accepting a new connection from the listen socket, creating a new thread, or terminating themselves.

For this management, all threads share the listen socket to synchronize acceptance and to track how many threads are working on it. Because only the listen socket is used for this, hunchentoot-recycling-taskmaster implements these mechanisms without adding any new dependencies beyond Hunchentoot.
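
The following is a self-contained sketch of that idea under my reading of the description above; it is not the library's actual code, and names such as *standby* and start-worker are made up for illustration. Every worker blocks in accept on the same listen socket; a worker that becomes busy spawns a replacement if nobody else is standing by, and a worker terminates itself when enough others are already idle.

  ;; Sketch only (hypothetical helper names); uses usocket and bordeaux-threads.
  (defvar *standby* 8)          ; desired number of idle (accepting) workers
  (defvar *idle* 0)             ; how many workers are currently idle
  (defvar *idle-lock* (bt:make-lock))

  (defun worker (listener handle-requests)
    (loop
      ;; All workers call accept on the SAME listen socket.
      (let ((connection (usocket:socket-accept listener))
            (spawn-p nil))
        (bt:with-lock-held (*idle-lock*)
          (decf *idle*)                     ; this worker just became busy
          (setf spawn-p (zerop *idle*)))    ; nobody left accepting? add one
        (when spawn-p
          (start-worker listener handle-requests))
        (unwind-protect
             (funcall handle-requests (usocket:socket-stream connection))
          (usocket:socket-close connection))
        ;; Go back to standby, or terminate if enough workers are idle already.
        (bt:with-lock-held (*idle-lock*)
          (if (>= *idle* *standby*)
              (return)
              (incf *idle*))))))

  (defun start-worker (listener handle-requests)
    (bt:with-lock-held (*idle-lock*) (incf *idle*))
    (bt:make-thread (lambda () (worker listener handle-requests))))

  (defun start-sketch-server (port handle-requests)
    (let ((listener (usocket:socket-listen "0.0.0.0" port :reuse-address t)))
      (dotimes (i *standby*)
        (start-worker listener handle-requests))))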

5 Benchmark

5.1 Running benchmarks

To run the benchmark yourself, do the following:

  (ql:quickload "hunchentoot-recycling-taskmaster-benchmark")
  (asdf:test-system '#:hunchentoot-recycling-taskmaster-benchmark)

5.2 My Environments

5.2.1 My machine

CL function                  value
LISP-IMPLEMENTATION-TYPE     SBCL
LISP-IMPLEMENTATION-VERSION  2.2.9.debian
MACHINE-TYPE                 X86-64
MACHINE-VERSION              13th Gen Intel(R) Core(TM) i7-1360P
SOFTWARE-TYPE                Linux
SOFTWARE-VERSION             5.15.153.1-microsoft-standard-WSL2

5.2.2 Server library versions and parameters

name                              parameters              version  Git commit
hunchentoot-recycling-taskmaster  standby-thread-count 8  0.0.1    82913edaf3f65afb189f0d72ddeb6339bf0499ae
hunchentoot                                               1.3.1    d1617e9d4eab6cb801c56cf36d9b0aab134fb7e6
quux-hunchentoot                                          1.0.2    quux-hunchentoot-20211230-git
cl-tbnl-gserver-tmgr              max-thread-count 8      0.1.1    1ae71c9324e876761cd1ee51768a34f0793e6879
wookie                                                    0.3.15   1f74b6c24b463c1e6fff35377e477934f72bac20
woo                               worker-num 8            0.12.0   7f5219c55d49190f5ae17b123a8729b31c5d706e

5.2.3 Benchmarking tool

I used wrk as below:

  # keep-alive
  wrk -t 4 -c 100 -d 10 http://localhost:4242/yo
  # simulating no keep-alive
  wrk -H "Connection: close" -t 4 -c 100 -d 10 http://localhost:4242/yo

5.3 Results

5.3.1 keep-alive, sleep 1ms

file:pics/benchmark-result-2025-12-26/sleep_1ms/keep-alive/100_connections_requests_sec__sleep_1ms__keep-alive__higher_is_better.svg file:pics/benchmark-result-2025-12-26/sleep_1ms/keep-alive/100_connections_latency(us)__sleep_1ms__keep-alive__lower_is_better.svg

In this benchmark, my HTTP handler always responds after a 1ms delay to simulate some workload.

Hunchentoot was quite fast. As mentioned above, Hunchentoot assigns one thread per connection and keeps using that thread until the connection is closed. Therefore, if a connection is kept alive, there is no delay from thread creation and Hunchentoot does not slow down. wrk uses keep-alive by default, so this result benefits from that.

cl-tbnl-gserver-tmgr was not very fast. This is presumably because it only uses 8 worker threads.

Woo was not fast, even when set to 8 workers. See About Woo below.

5.3.2 no keep-alive, sleep 1ms

file:pics/benchmark-result-2025-12-26/sleep_1ms/no-keep-alive/100_connections_requests_sec__sleep_1ms__no_keep-alive__higher_is_better.svg file:pics/benchmark-result-2025-12-26/sleep_1ms/no-keep-alive/100_connections_latency(us)__sleep_1ms__no_keep-alive__lower_is_better.svg

In this benchmark the HTTP handler still adds 1ms of latency, and I simulated "no keep-alive" by adding the wrk option -H "Connection: close".

Hunchentoot became slow. In this test Hunchentoot effectively works as "one thread per request", so the latency of creating a new thread affects the result.

hunchentoot-recycling-taskmaster is designed to work well in this situation.

5.3.3 keep-alive, sleep 0ms

file:pics/benchmark-result-2025-12-26/sleep_0ms/keep-alive/100_connections_requests_sec__sleep_0ms__keep-alive__higher_is_better.svg file:pics/benchmark-result-2025-12-26/sleep_0ms/keep-alive/100_connections_latency(us)__sleep_0ms__keep-alive__lower_is_better.svg

In this benchmark I set the HTTP handler's latency to 0ms. (However, some latency from small computations still exists; see The delay that I overlooked below.)

hunchentoot-recycling-taskmaster became fast. I think this is because its threads accept connections in parallel.

5.3.4 no keep-alive, sleep 0ms

file:pics/benchmark-result-2025-12-26/sleep_0ms/no-keep-alive/100_connections_requests_sec__sleep_0ms__no_keep-alive__higher_is_better.svg file:pics/benchmark-result-2025-12-26/sleep_0ms/no-keep-alive/100_connections_latency(us)__sleep_0ms__no_keep-alive__lower_is_better.svg

In this benchmark I set the HTTP handler's latency to 0ms as above, and set the wrk option -H "Connection: close".

Here, Woo finally takes first place, with hunchentoot-recycling-taskmaster coming in a close second.

5.3.5 Other results

5.4 About Woo

5.4.1 How to sleep?

Woo becomes significantly slow if the handler is even slightly delayed. With the following setup for the "1ms sleep" benchmark, I observed poor results:

  (defparameter *handler-sleep-seconds* 0)   ; set to 0.001 for the "1ms sleep" benchmark

  (defun handler-small-sleep ()
    (sleep *handler-sleep-seconds*))

  (woo:run
   (lambda (env)
     (declare (ignore env))
     (handler-small-sleep)
     '(200 (:content-type "text/plain") ("Hello, World")))
   :worker-num 8)

This is because Woo is an async server that handles multiple connections in the same thread. If a delay occurs while processing one connection, all other connections handled by that thread are delayed as well. Given this architecture, a sleep like the one above is obviously discouraged.

In cases like this, you generally don't sleep inside the async server's event loop. You run time-consuming processing outside the event loop and, when it finishes, notify the event loop of the content to be sent, or arrange for a callback to be called. My code for benchmarking Wookie does this.
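
For reference, the pattern looks roughly like this with Wookie and cl-async: instead of cl:sleep, register a timer and send the response from its callback. This is my illustration of the idea (reusing the /yo route from the demo), not the repository's exact benchmark code; check the repository for what was actually benchmarked.

  ;; Respond after ~1ms without blocking the event loop: AS:DELAY schedules a
  ;; callback on the loop instead of putting the whole thread to sleep.
  (wookie:defroute (:get "/yo") (req res)
    (as:delay (lambda ()
                (wookie:send-response res :body "Hey!"))
              :time 0.001))

  (as:with-event-loop ()
    (wookie:start-server (make-instance 'wookie:listener :port 4242)))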

But for some reason, Woo doesn't seem to have such a mechanism; I couldn't find one. quickdocs-api, which is said to use Woo, does not seem to take this into account, and neither does this code.

Some people (here and here) have said that "offloading is possible with lparallel", but I have yet to find any code that actually does this. The following naive code, which creates a thread in the handler, results in an error.

  ;; Making a thread like this doesn't work, because `woo.ev:*evloop*' is NIL there.
  (defparameter *woo-callback-threads-app*
    (lambda (_env)
      (declare (ignore _env))
      (lambda (callback)
        (bt:make-thread (lambda ()
                          (funcall callback '(200 (:content-type "text/plain") ("Hello, World"))))))))

  (clack:clackup *woo-callback-threads-app* :server :woo)

  ;; Capturing the binding and re-establishing it in the new thread, like this
  ;; inside the (lambda (callback) ...) above, does not work either.
  (let ((evloop woo.ev:*evloop*))
    (bt:make-thread (lambda (&aux (woo.ev:*evloop* evloop))
                      (funcall callback '(200 (:content-type "text/plain") ("Hello, World"))))))

This is because woo.ev:*evloop*, the variable that holds the actual event loop, is dynamically bound per thread: in a thread newly created with bt:make-thread the handler's binding is not visible, so the variable is NIL there.
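
This is ordinary special-variable behaviour, as the stand-alone snippet below shows: a dynamic binding established in one thread is not inherited by a thread created afterwards, so the child sees only the global value.

  ;; The child thread sees the global value (NIL), not the parent's binding --
  ;; the same thing that happens to WOO.EV:*EVLOOP* in a handler-spawned thread.
  (defvar *loop-ish* nil)

  (let ((*loop-ish* :bound-in-parent))
    (bt:join-thread
     (bt:make-thread (lambda () *loop-ish*))))
  ;; => NIL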

This problem can be solved by handling libev, which Woo depends on, directly and driving the event loop yourself. Gemini-CLI wrote this code to do so. It certainly works, and the benchmarks aren't bad. However, I don't know how this code works. (Please don't ask me.)

So, to benchmark "sleep 1ms" on Woo, I have no choice but to use the bad code above. This is a huge disadvantage for Woo, but it is unavoidable as long as there is no supported way to offload blocking work. This is also the way quickdocs-api is written.

5.4.2 The delay that I overlooked

Some people might look at the keep-alive, sleep 0ms result above and wonder, "Why is Woo not as fast as people say?" I thought the same thing, investigated, and found the cause: the handler used for this benchmark calls (handler-small-sleep), defined above. Let's look at it again:

  (defparameter *handler-sleep-seconds* 0)

  (defun handler-small-sleep ()
    (sleep *handler-sleep-seconds*))

This code does the following:

  1. Evaluate the special variable (which yields 0).
  2. Call cl:sleep with 0 (which should return immediately).

These two operations should not take much time, but unfortunately they can cause performance problems on Woo. Changing handler-small-sleep as follows made it faster, as shown in the following graphs.

  (defun handler-small-sleep ()
    )

file:pics/benchmark-result-2025-12-26/sleep_0ms_no_special_vars/keep-alive/100_connections_requests_sec__sleep_commented_out__keep-alive__higher_is_better.svg file:pics/benchmark-result-2025-12-26/sleep_0ms_no_special_vars/keep-alive/100_connections_latency(us)__sleep_commented_out__keep-alive__lower_is_better.svg

I ran some more tests a day later; the results are below (all with (defparameter *handler-sleep-seconds* 0)):

handler body                                                            requests/sec  latency
(sleep *handler-sleep-seconds*)                                             44770.99   2.27ms
(sleep 0)                                                                   49158.22   2.06ms
*handler-sleep-seconds*                                                    215447.83   0.95ms
do nothing                                                                 247789.97   1.00ms
(when (plusp *handler-sleep-seconds*) (sleep *handler-sleep-seconds*))     240050.33   1.26ms

So, to get the most out of Woo's performance, I should not call sleep at all, even with 0.

In other words, when using Woo I would constantly be troubled by small delays I had overlooked; I would have to check that every handler runs with very low latency. Unless Woo's situation changes, it could be argued that Woo is very fast only when your handlers perform minimal computation. I think Woo is good for serving static pages.

6 TODO list

6.1 Register to systems

  • Quicklisp
  • Ultralisp
  • ocicl

6.2 Support LispWorks

Because Hunchentoot on LispWorks does not keep the listen socket in its acceptor object, hunchentoot-recycling-taskmaster cannot make use of it.

6.3 Benchmark other servers

[ ] conserv
Caused a cl:type-error when I ran its sample code.
[ ] house
Very fragile. See my memo.
[ ] teepeedee2
Could not be loaded on my machine because of a "heap exhausted" error.

6.4 Ideas

  • Use atomic variables -- the impact is small on SBCL but large on Allegro CL. See /atomic-op-taskmaster in this repository.
  • Make the standby thread count (standby-thread-count) adjustable rather than fixed.

7 License

BSD 2-Clause, same as Hunchentoot. See LICENSE.

Dependencies (2)

  • 1am
  • hunchentoot

Dependents (0)
