hunchentoot-recycling-taskmaster
2026-01-01
An experiment to improve multithreading performance of Hunchentoot without any additional dependencies.
1 Abstract
hunchentoot-recycling-taskmaster is a taskmaster implementation for Hunchentoot, aiming to improve connection establishment efficiency through thread-pooling and flexible thread count adjustment.
2 Performance tl;dr
In these benchmarks, the HTTP handler always responds after 1 ms to simulate a realistic workload.
- Hunchentoot is an all-rounder. It works well on keep-alive connections, and I think it is good for typical use-cases.
- If your workload does not use keep-alive connections, hunchentoot-recycling-taskmaster may be useful.
- Woo is very difficult to use well. Woo seems fast only when your handlers run with very low latency. See About Woo below.
For details, see Benchmark below.
3 How to use
3.1 Caution about LispWorks
On LispWorks, hunchentoot-recycling-taskmaster does not work, because Hunchentoot there does not handle the listen socket directly.
3.2 Installation
3.2.1 Loading
Currently hunchentoot-recycling-taskmaster is only in Ultralisp, not in Quicklisp or ocicl.
If you use Ultralisp, just ql:quickload it:
(ql:quickload "hunchentoot-recycling-taskmaster")
Alternatively, download it yourself:
cd ~/quicklisp/local-projects
git clone https://github.com/y2q-actionman/hunchentoot-recycling-taskmaster.git
(ql:register-local-projects) ; Do if required
(ql:quickload "hunchentoot-recycling-taskmaster")
3.2.2 Running tests
(ql:quickload "hunchentoot-recycling-taskmaster-test")
(asdf:test-system '#:hunchentoot-recycling-taskmaster)
3.3 Starting/stopping the server
You can use hunchentoot-recycling-taskmaster just by changing
- hunchentoot:acceptor to hunchentoot-recycling-taskmaster:parallel-acceptor, or
- hunchentoot:easy-acceptor to hunchentoot-recycling-taskmaster:parallel-easy-acceptor.
(defparameter *test-server*
  (make-instance 'hunchentoot-recycling-taskmaster:parallel-easy-acceptor
                 :port 4242))
(hunchentoot:start *test-server*)
curl "http://127.0.0.1:4242/yo"
# => "Hey!"
To stop it, hunchentoot:stop can be used.
(hunchentoot:stop *test-server*)
See demo.lisp for the sample code above.
3.4 API
These symbols are exported from the hunchentoot-recycling-taskmaster package.
Please see their docstrings.
- [Class] parallel-acceptor
- [Class] parallel-easy-acceptor
- [Class] parallel-ssl-acceptor
- [Class] parallel-easy-ssl-acceptor
- [Class] recycling-taskmaster
- [Variable] *default-standby-thread-count*
- [Function] abandon-acceptor
- [Condition] recycling-taskmaster-corrupted-error
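For example, the number of standby threads can presumably be tuned through the exported variable before creating an acceptor (a sketch only; check the docstrings for the exact semantics, and note that the value 16 and the name *tuned-server* are just illustrations):

(setf hunchentoot-recycling-taskmaster:*default-standby-thread-count* 16)

(defparameter *tuned-server*
  (make-instance 'hunchentoot-recycling-taskmaster:parallel-easy-acceptor
                 :port 4242))
(hunchentoot:start *tuned-server*)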
4 How it works
This section shows how hunchentoot-recycling-taskmaster works, compared with other implementations.
4.1 Hunchentoot: one thread per connection

Hunchentoot uses one thread per connection. This means that when a client uses one keep-alive connection for multiple requests, Hunchentoot dedicates one thread to that connection. There is some delay when establishing new connections, but it works well on keep-alive connections.
4.2 quux-hunchentoot and cl-tbnl-gserver-tmgr: thread pooling

These implementations add a thread pool around Hunchentoot. Instead of creating a new thread for each new connection, they reuse threads kept in their thread pool, reducing latency for new connections.
However, their benchmarks don't show significant differences from the original Hunchentoot. I suspect there are two reasons:
- HTTP benchmarking tools, such as wrk, use keep-alive connections, so the effect of thread-pooling is limited to the beginning of a benchmark run.
- Their thread pools may be fixed in size, so unlike Hunchentoot they cannot add threads under high load.
4.3 hunchentoot-recycling-taskmaster

hunchentoot-recycling-taskmaster tries to get both: thread-pooling and a dynamically changing number of threads. In hunchentoot-recycling-taskmaster, threads not only work on connected sockets but also perform management tasks, such as accepting a new connection from the listen socket, creating a new thread, or terminating themselves.
For this management, all threads share the listen socket to synchronize acceptance, and the taskmaster tracks how many threads are working on it. Because coordination happens through the listen socket alone, hunchentoot-recycling-taskmaster implements these mechanisms without adding any new dependencies to Hunchentoot.
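The thread lifecycle described above can be sketched roughly like this (illustrative pseudocode only, not the library's actual code; accept-connection, spawn-worker, standby-count, and too-many-standby-p are made-up names):

;; Illustrative sketch of one worker thread's loop (all names hypothetical).
(defun worker-loop (listen-socket pool-state)
  (loop
    ;; Block on the shared listen socket until a connection arrives.
    (let ((client (accept-connection listen-socket)))
      ;; Keep enough standby threads accepting; create a replacement if needed.
      (when (< (standby-count pool-state) *default-standby-thread-count*)
        (spawn-worker listen-socket pool-state))
      ;; Serve the connection (possibly many keep-alive requests).
      (process-connection client))
    ;; If there are already enough idle threads, terminate instead of recycling.
    (when (too-many-standby-p pool-state)
      (return))))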
5 Benchmark
5.1 Running benchmarks
To run the benchmarks yourself, do the following:
(ql:quickload "hunchentoot-recycling-taskmaster-benchmark")
(asdf:test-system '#:hunchentoot-recycling-taskmaster-benchmark)
5.2 My environment
5.2.1 My machine
| CL function | value |
|---|---|
| LISP-IMPLEMENTATION-TYPE | SBCL |
| LISP-IMPLEMENTATION-VERSION | 2.2.9.debian |
| MACHINE-TYPE | X86-64 |
| MACHINE-VERSION | 13th Gen Intel(R) Core(TM) i7-1360P |
| SOFTWARE-TYPE | Linux |
| SOFTWARE-VERSION | 5.15.153.1-microsoft-standard-WSL2 |
5.2.2 Server library versions and parameters
| name | parameters | version | Git commit |
|---|---|---|---|
| hunchentoot-recycling-taskmaster | standby-thread-count 8 | 0.0.1 | 82913edaf3f65afb189f0d72ddeb6339bf0499ae |
| hunchentoot | | 1.3.1 | d1617e9d4eab6cb801c56cf36d9b0aab134fb7e6 |
| quux-hunchentoot | | 1.0.2 | quux-hunchentoot-20211230-git |
| cl-tbnl-gserver-tmgr | max-thread-count 8 | 0.1.1 | 1ae71c9324e876761cd1ee51768a34f0793e6879 |
| wookie | | 0.3.15 | 1f74b6c24b463c1e6fff35377e477934f72bac20 |
| woo | worker-num 8 | 0.12.0 | 7f5219c55d49190f5ae17b123a8729b31c5d706e |
5.2.3 Benchmarking tool
I used wrk as follows:
# keep-alive
wrk -t 4 -c 100 -d 10 http://localhost:4242/yo
# simulating no keep-alive
wrk -H "Connection: close" -t 4 -c 100 -d 10 http://localhost:4242/yo
5.3 Results
5.3.1 keep-alive, sleep 1ms
In this benchmark, the HTTP handler always responds after 1 ms to simulate a realistic workload.
Hunchentoot was quite fast. As mentioned above, Hunchentoot assigns one thread per connection and keeps using that thread until the connection is closed. Therefore, if a connection is kept alive, there is no delay from thread creation and Hunchentoot does not slow down. wrk uses keep-alive by default, so Hunchentoot benefited from this here.
cl-tbnl-gserver-tmgr was not very fast. This is presumably because it only uses 8 worker threads.
Woo was not fast even when set to 8 worker threads. See About Woo below.
5.3.2 no keep-alive, sleep 1ms
In this benchmark, the HTTP handler still adds 1 ms of latency, and I simulated "no keep-alive" by adding the wrk option -H "Connection: close".
Hunchentoot became slow. In this test, Hunchentoot effectively works as "one thread per request", so the latency of creating a new thread affected the result.
hunchentoot-recycling-taskmaster is designed to work well in this situation.
5.3.3 keep-alive, sleep 0ms
In this benchmark, I set the HTTP handler's latency to 0 ms. (However, some latency from small computations still remains; see the section about this below.)
hunchentoot-recycling-taskmaster became fast. I think this is because its threads accept connections in parallel.
5.3.4 no keep-alive, sleep 0ms
In this benchmark, I set the HTTP handler's latency to 0 ms as above, and set the wrk option -H "Connection: close".
Here, Woo finally takes first place, with hunchentoot-recycling-taskmaster a close second.
5.3.5 Other results
- See benchmark-result-2025-12-26.org for graphs with "400 connections" parameters.
- See this directory for raw data.
5.4 About Woo
5.4.1 How to sleep?
Woo becomes significantly slow if the handler is even slightly delayed. With the following setup for the "1 ms sleep" benchmark, I observed poor results:
(defparameter *handler-sleep-seconds* 0)

(defun handler-small-sleep ()
  (sleep *handler-sleep-seconds*))

(woo:run
 (lambda (env)
   (declare (ignore env))
   (handler-small-sleep)
   '(200 (:content-type "text/plain") ("Hello, World")))
 :worker-num 8)
This is because Woo is an async server that handles multiple
connections simultaneously in a thread. If a delay occurs in the
processing of one connection, all other connections in the same thread
will be delayed. Given its async architecture, sleep like above is
obviously discouraged.
In cases like this, you generally don't sleep inside an async server's event loop. Instead, you run time-consuming processing outside the event loop, and when it finishes, you notify the event loop of the content to send, or arrange for a callback to be called. My code for benchmarking Wookie does this.
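As a rough illustration of that pattern (a generic sketch with made-up names; notify-event-loop, do-slow-work, and the surrounding shape are not real Woo or Wookie APIs):

;; Offloading slow work from an async event loop (hypothetical API).
(defun handle-request-async (request callback)
  ;; Run the slow part in a worker thread, outside the event loop.
  (bt:make-thread
   (lambda ()
     (let ((response (do-slow-work request)))
       ;; When done, hand the result back to the event loop thread,
       ;; which performs the actual socket write.
       (notify-event-loop (lambda () (funcall callback response)))))))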
But for some reason, Woo doesn't seem to have such a mechanism; I couldn't find one. quickdocs-api, which is said to use Woo, does not seem to take such considerations into account, and neither does this code.
Some people, here and here, have said that "offloading is possible with lparallel", but I have yet to find any code that actually does this. The following naive code, which creates a thread in the handler, results in an error.
;; Making a thread like below doesn't work, because `woo.ev:*evloop*' is NIL
;; inside the new thread.
(defparameter *woo-callback-threads-app*
  (lambda (_env)
    (declare (ignore _env))
    (lambda (callback)
      (bt:make-thread (lambda ()
                        (funcall callback '(200 (:content-type "text/plain") ("Hello, World"))))))))

(clack:clackup *woo-callback-threads-app* :server :woo)

;; Capturing the binding like this does not work either.
(let ((evloop woo.ev:*evloop*))
  (bt:make-thread (lambda (&aux (woo.ev:*evloop* evloop))
                    (funcall callback '(200 (:content-type "text/plain") ("Hello, World"))))))
This is because woo.ev:*evloop*, the variable holding the actual event loop, cannot be referenced from a thread newly created with make-thread; there it is NIL.
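To see why, here is a generic Common Lisp illustration (unrelated to Woo itself): a dynamic binding established with let is per-thread, so a freshly created thread sees only the global value.

(defvar *v* nil)

(let ((*v* :bound-in-parent))
  (bt:make-thread
   (lambda ()
     ;; This thread does not inherit the parent's dynamic binding of *V*;
     ;; it sees the global value NIL.
     (print *v*))))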
This problem can be solved by handling libev, which Woo depends on, directly, driving the event loop yourself. Gemini-CLI wrote this code to do so. It certainly works, and the benchmarks aren't bad. However, I don't know how this code works. (Please don't ask me.)
So, to benchmark "sleep 1ms" on Woo, I had no choice but to use the bad code above. This is a huge disadvantage for Woo, but it's unavoidable given its lack of async offloading support. This is also how quickdocs-api is written.
5.4.2 The delay that I overlooked
Some people might look at the keep-alive sleep-0ms result above and wonder, "Why is Woo not as fast as people say?" I thought the same thing, investigated, and found that the cause was the handler definition for this benchmark calling (handler-small-sleep), defined above. Let's look at it again:
(defparameter *handler-sleep-seconds* 0)

(defun handler-small-sleep ()
  (sleep *handler-sleep-seconds*))
This code does the following:
- Evaluates the special variable (which yields 0).
- Calls cl:sleep with 0 (which should return immediately).
These two processes should not take much time, but unfortunately they
may cause performance problems in Woo. Changing handler-small-sleep
as follows made it faster, as shown in the following graph.
(defun handler-small-sleep ()
)
After a day I ran a few more tests, with (defparameter *handler-sleep-seconds* 0), and got these results:
| handler body | requests/sec | latency |
|---|---|---|
| (sleep *handler-sleep-seconds*) | 44770.99 | 2.27ms |
| (sleep 0) | 49158.22 | 2.06ms |
| *handler-sleep-seconds* | 215447.83 | 0.95ms |
| do nothing | 247789.97 | 1.00ms |
| (when (plusp *handler-sleep-seconds*) (sleep *handler-sleep-seconds*)) | 240050.33 | 1.26ms |
So, to get the most out of Woo's performance, I should not call sleep even with an argument of 0.
In other words, when using Woo, any small overlooked delay becomes a problem, and I must verify that every handler runs with very low latency. Unless Woo's situation changes, it could be argued that Woo is very fast only when handlers perform minimal computation. I think Woo is a good fit for serving static pages.
6 TODO list
6.1 Register with package systems
- Quicklisp
- Ultralisp
- ocicl
6.2 Support LispWorks
Because Hunchentoot on LispWorks does not store the listen socket in its acceptor structure, hunchentoot-recycling-taskmaster cannot use it.
6.3 Benchmark other servers
- [ ] conserv
  - Caused a cl:type-error when I ran its sample code.
- [ ] house
  - Very fragile. See my memo.
- [ ] teepeedee2
  - Cannot be loaded on my machine because of a "heap exhausted" error.
6.4 Ideas
- Use atomic variables -- the impact is small on SBCL but large on Allegro CL. See /atomic-op-taskmaster in this repository.
- Make standby-thread-count variable rather than fixed.
7 License
BSD 2-Clause, same as Hunchentoot. See LICENSE.