cacle

2017-12-27

cacle - Extensible Cache services for Common Lisp

1. Introduction

cacle implements a generic cache management facility with configurable and extensible cache replacement policies. The actual cached data can be stored anywhere, with cacle taking charge of keeping track of which entry is to be discarded next when more space is needed for a new entry.

cacle is built to be thread safe and thus ready to be used in multithreaded environments, such as web services. In case multiple threads request the same piece of data simultaneously, the data is only obtained from the data provider once, then distributed to all the threads that requested it.

Note! While cacle itself is thread safe, the provider and cleanup functions are not called in a locked context, as they may take a long time to complete, during which fetches from the cache should be possible. It is on the user's responsibility to protect against potential thread conflicts in the provider and cleanup functions.

2. Installation

cacle can be installed using Quicklisp:

* (ql:quickload "cacle")
; Loading "cacle"
("cacle")
* (require "cacle")
"cacle"
NIL

3. Examples

3.1. Walkthrough

As cacle does not care about the nature of the data in the cache, nor where it comes from, it needs a provider function that is able to obtain a block of data, given the block's key. What the key is is up to the application; to cacle, it is just something that can function as a key to a hash table. Indeed, cacle stores the current contents of the cache in a hash table; the table's test can be specified when creating the cache.

Let's introduce a provider that maps key k into a block k units long. As the block's data content, a simple string is constructed. Of course, the string is not of the length indicated by the provider, but that is exactly the point: cacle does not care what the data is. The string could, for example, be the name of the actual file that holds the contents of the data block.

* (defun test-provider (key)
    "A provider returns two values: The data for the element, and the element's size."
    (format t "Providing data for key ~a~%" key)
    ;; Fetching the data takes some time...
    (sleep 1)
    (values (format nil "value for ~a" key)
            key))
TEST-PROVIDER

Now we are ready to create a cache that manages these blocks of data.

* (defparameter *my-cache* (cacle:make-cache 100 #'test-provider :policy :lru))
*MY-CACHE*
* (cacle:cache-max-size *my-cache*)
100
* (cacle:cache-size *my-cache*)
0
* (cacle:cache-count *my-cache*)
0

So, the cache is empty. Let's fetch some data.

* (cacle:cache-fetch *my-cache* 42)
Providing data for key 42
"value for 42"
NIL

Note the one-second delay in the function call. The function returned two values; meaning of the second value (here NIL) will be discussed later in this document.

etching the same data again does not cause a call to the provider, with the value returned immediately.

* (cacle:cache-fetch *my-cache* 42)
"value for 42"
NIL
* (cacle:cache-fetch *my-cache* 42)
"value for 42"
NIL

Next, widen the scope of requested items:

* (cacle:cache-fetch *my-cache* 17)
Providing data for key 17
"value for 17"
NIL
* (cacle:cache-fetch *my-cache* 33)
Providing data for key 33
"value for 33"
NIL
* (cacle:cache-fetch *my-cache* 42)
"value for 42"
NIL
* (cacle:cache-size *my-cache*)
92
* (cacle:cache-count *my-cache*)
3

The cache is already quite full, with 92 units out of 100 used. What happens if we now request data with a fresh key?

* (cacle:cache-fetch *my-cache* 24)
Providing data for key 24
"value for 24"
* (cacle:cache-size *my-cache*)
99
* (cacle:cache-count *my-cache*)
3

One of the stored blocks of data needs to go. Since of the three keys in the cache, both 42 and 33 have been referenced after 17, 17 has been discarded:

* (mapcar #'(lambda (key)
              (cacle:cache-fetch *my-cache* key :only-if-cached t))
          '(42 17 33 24))
("value for 42" NIL "value for 33" "value for 24")

Setting the :only-if-cached option prevents the calling of the provider if the queried data is not found in the cache, just returning NIL instead.

If multiple threads request the same data, the provider is only called once, with all threads eventually getting the same result data:

* (loop with out = *standard-output*
        for i below 10
        collect (bt:make-thread #'(lambda ()
                                    (let ((*standard-output* out))
                                      (cacle:cache-fetch *my-cache* 72)))))
Providing data for key 72
(#<PROCESS Anonymous thread(37) [Active] #x...> #<PROCESS Anonymous thread(38) [semaphore wait] #x...> #<PROCESS Anonymous thread(39) [semaphore wait] #x...> #<PROCESS Anonymous thread(40) [semaphore wait] #x...> #<PROCESS Anonymous thread(41) [semaphore wait] #x...> #<PROCESS Anonymous thread(42) [semaphore wait] #x...> #<PROCESS Anonymous thread(43) [semaphore wait] #x...> #<PROCESS Anonymous thread(44) [semaphore wait] #x...> #<PROCESS Anonymous thread(45) [semaphore wait] #x...> #<PROCESS Anonymous thread(46) [semaphore wait] #x...>)
* (mapcar #'bt:join-thread *)
("value for 72" "value for 72" "value for 72" "value for 72" "value for 72" "value for 72" "value for 72" "value for 72" "value for 72" "value for 72")
* (loop with first = (first *)
        for i in *
        always (eq i first))
T

Finally, get rid of all the cached data:

* (cacle:cache-flush *my-cache*)
NIL
* (cacle:cache-size *my-cache*)
0
* (cacle:cache-count *my-cache*)
0

To facilitate situations where data expiring from the cache needs some cleaning up - such as in the abovementioned case of the cache being on the disk - an optional cleanup function can be defined for the cache. This function is called whenever a block of data is discarded from the cache.

* (defun test-cleanup (data)
    (format t "Cleaning up: ~a~%" data))
TEST-CLEANUP
* (setf *my-cache* (cacle:make-cache 100 #'test-provider :policy :lru :cleanup #'test-cleanup))
#<CACHE #x...>

As cacle is designed to be used on multiple threads, a situation may arise where multiple threads request data from the cache simultaneously and an entry is removed from the cache by another thread before the thread that requested it can use the data. To prevent this situation, when a cleanup function has been defined for the cache, each call to cache-fetch must be paired with a call to cache-release. The release function is given as an argument the second value returned by cache-fetch. The cleanup function will not be called for the data if there are live references (fetches without corresponding releases) for the data.

* (defparameter *42* (multiple-value-list (cacle:cache-fetch *my-cache* 42)))
Providing data for key 42
*42*
* *42*
("value for 42" #<LINKED-CACHE-ENTRY key 42 #x...>)

The tag datum should be treated opaque by the caller and used only as an argument to cache-release.

A utility macro, with-cache-fetch, is provided for ensuring the pairing of cache-fetch and cache-release:

* (cacle:with-cache-fetch item (*my-cache* 17)
    (format t "my data: ~a~%" item))
Providing data for key 17
my data: value for 17
NIL
* (cacle:with-cache-fetch item (*my-cache* 33)
    (format t "my data: ~a~%" item))
Providing data for key 33
my data: value for 33
NIL
* (cacle:with-cache-fetch item (*my-cache* 24)
    (format t "my data: ~a~%" item))
Providing data for key 24
my data: value for 24
NIL
* (cacle:with-cache-fetch item (*my-cache* 55)
    (format t "my data: ~a~%" item))
Providing data for key 55
Cleaning up: value for 33
Cleaning up: value for 17
my data: value for 55
NIL

Note that even before the last function call, the item for the key 42 has already expired from the cache, since the total would otherwise exceed the cache's limit of 100. However, it has not been cleaned up, because it is still reserved by the very first call to cache-fetch that has not been matched with the call to cache-release yet.

* (cacle:cache-release *my-cache* (second *42*))
Cleaning up: value for 42
NIL

Note! This example also demonstrates a property in the design of cacle that should be understood before using it: The maximum size defined for the cache is the size of the live objects in the cache and does not include items that have already been scheduled for removal, pending a call to cache-release, or items that are being fetched to the cache. That is, the total size of the cache may exceed its limit by the combined size of the items currently being used by the application.

3.2. A simple CDN node

A simple node in a content distribution network could be built using cacle as follows. The content being distributed is fetched from a content server, and the cache resides on the local disk.

Warning: untested code - written as an example, not to be used as a real world CDN

(defparameter *content-server* "http://server.example.com/")
(defparameter *disk-space* #x1000000000) ;; 64 GB
(defparameter *cache-path* "/var/cache/%")

(defun fetch-content (uri)
  ;; This provider function retrieves data from the content server
  (let* (size
         (file (fad:with-output-to-temporary-file (out :template *cache-path* 
                                                       :element-type '(unsigned-byte 8))
                 (multiple-value-bind (in status)
                     (drakma:http-request (concatenate 'string *content-server* uri)
                                          :want-stream t)
                   ;; Copy the retrieved data into a file
                   (when (<= 200 status 299)
                     (fad:copy-stream in out)
                     (setf size (file-length out)))))))
    (if size
        (values file size) ; success
        (progn ; error
          (ignore-errors (delete-file file))
          (values nil 0)))))

(defun cleanup-content (file)
  ;; When content removed from the cache, delete the corresponding file
  (ignore-errors (delete-file file)))

;; Set object lifetime to 3600 seconds to force a refresh once per hour
(defparameter *cache* (cacle:make-cache *disk-space* #'fetch-content 
                                        :test 'equal
                                        :cleanup #'cleanup-content
                                        :policy :lfuda
                                        :lifetime 3600))

;; Function called by the web server to serve a certain file
(defun serve-file (uri)
  (cacle:with-cache-fetch file (*cache* uri)
    (if file
        (progn
          ;; Send back the data in file
          ...)
        (progn
          ;; Report a 404 not found
          ))))

That's it. On an incoming request, serve-file will fetch the corresponding content from a file in the cache. If the content is not cached, it is transparently fetched from the content server, stored in the cache, and sent to the end user.

4. Cache replacement policies

A cache replacement policy defines how existing entries are discarded from the cache to make room for the data that is currently being loaded. cacle implements a set of simple replacement policies and provides means for the user to build their own policy, if necessary. Additionally, a lifetime can be defined for the cache, after which cached data expires and a fresh copy is obtained instead.

The following cache replacement policies are implemented:

  • First In First Out (:fifo): Data that has been in the cache for the longest time is discarded
  • Last In First Out (:lifo): Most recently added data is discarded
  • Least Recently Used (:lru): Data that has gone unused for the longest time is discarded
  • Most Recently Used (:mru): Most recently used data is discarded
  • Random (:random): A randomly selected piece of data is discarded
  • Least Frequently Used (:lfu): Data with the lowest number of fetches is discarded
  • Least Frequently Used with Dynamic Aging (:lfuda): An aging variable is introduced to LFU to prefer discarding data that has been used a lot in the history but less often recently.

4.1. Creating your own replacement policy

Each cache replacement policy is responsible of keeping track of all the entries currently in the cache. A suitable daata structure should be chosen so that the relevant operations are as fast as possible.

A policy is a class that is instantiated once per a cache managed by the policy. Policies should be built as the replacement-policy class as the superclass. Additionally, a number of derived classes, listed in the next section, are exported by cacle and can be used as basis for a custom policy.

To be able to store entries in the policy's desired manner, each cache entry must be able to hold certain policy specific data. To accommodate this, policies may define specializations of the cache entry base class cache-entry. Policies must treat the base class opaque and access the base class's data only through the exported readers (entry-key, entry-size and entry-expiry). Policies can add slots as necessary for their own operation, and change-class of an entry-to-be-added in the entry-added generic function.

For example, linked-list-replacement-policy stores the entries in a circular doubly linked list to serve simple policies such as FIFO or LRU. In a list structure, insertion and removal of entries are constant time operations, but lookups for entries other than the first or last are costly. The respective cache entry class, linked-cache-entry, defines two additional slots that hold the forward and backward pointers:

(defclass linked-cache-entry (cache-entry)
  ((next)
   (prev)))

The replacement policy class itself holds the head of the list of entries, and, when a new entry is added to cache, changes the entry's class and pushes it at the head of the list. The call to change-class is done in a :before method so that classes derived from linked-list-replacement-policy do not need to remember to (call-next-method) in their entry-added method implementations.

(defclass linked-list-replacement-policy (replacement-policy)
  ((head :initform (make-instance 'linked-cache-entry))))

(defmethod entry-added :before ((policy linked-list-replacement-policy) (entry cache-entry))
  (change-class entry 'linked-cache-entry))

(defmethod entry-added ((policy linked-list-replacement-policy) (entry cache-entry))
  (link-after entry (slot-value policy 'head)))

The following classes have been implemented for the bundled replacement policies:

  • fifo-replacement-policy
  • lifo-replacement-policy
  • lru-replacement-policy
  • mru-replacement-policy
  • random-replacement-policy
  • lfu-replacement-policy
  • lfuda-replacement-policy.

lfu-replacement-policy uses a heap structure to store entries in their changing order of precedence. lfuda-replacement-policy builds on it.

5. Dictionary

5.1. The cache class

[Standard class] cache

cache is the main class of the system. It contains information abnout the data blocks currently stored in a certain cache. Note that while storing the information, cache leaves the storage of the actual data to the user: The data could be, for example, a vector of octets directly linked to the entry; a certain file in the filesystem; or a bunch of bananas in the storage room of a zoo.

While you can create an instance of cache directly with make-instance, it is recommended to use the convenience function make-cache.

[Generic reader] cache-provider cache => function

[Generic writer] (setf (cache-provider cache) new-provider)

Retrieves or sets the cache's provider function.

[Generic reader] cache-lifetime cache => number

[Generic writer] (setf (cache-lifetime cache) new-lifetime)

Retrieves or sets the cache object lifetime in seconds.

[Generic function] cache-size cache => number

Returns the sum of the sizes of data items currently stored in the cache.

[Generic function] cache-fetch cache &key only-if-cached => (values datum tag)

Fetches a datum from the cache for the given key. If the datum is not currently cached and the only-if-cached flag is not set, it is retrieved from the provider function, added to the cache, and then returned. The tag return value must be specified in a corresponding call to cache-release before it will be cleaned up.

[Macro] with-cache-fetch var (cache key &key only-if-cached) &body body

Wraps the given body between calls to cache-fetch and cache-release, ensuring that the data fetched from the cache is valid inside body and will be released afterwards.

[Generic function] cache-flush cache => NIL

Removes all entries from the cache.

5.3. Cache removal policies

[Standard class] cache-entry

Base class for cache entries. All entries are created as instances of this class, but the active removal policy may change the entry's class in the call to entry-added.

[Generic reader] entry-key

[Generic reader] entry-valid-p

[Generic reader] entry-size

[Generic reader] entry-expiry

Readers for the cache entry's basic information. entry-size will signal an unbound-slot condition when the entry is not valid (when entry-valid-p returns NIL).

To ensure thread safety, these functions must not be used outside of the cache removal policy callback functions entry-added, access-entry, entry-removed and evict-entry. The same restriction applies to all functions related to cache entries.

[Standard class] indexed-cache-entry

An indexed cache entry simply associates a free-form index with each cache entry.

[Generic reader] entry-index indexed-cache-entry => value

[Generic writer] (setf (entry-index indexed-cache-entry) new-index)

[Standard class] linked-list-replacement-policy

This policy stores entries in a linked list, pushing any new items after the head. evict-entry is not implemented, so the class cannot be directly used as a replacement policy.

[Generic reader] linked-list-head linked-list-replacement-policy => linked-cache-entry

Returns the head node of the linked list. The head node is a sentinel; it is a linked-cache-entry without a key or data, simply serving as a point for attaching the actual data nodes.


[Standard class] fifo-replacement-policy

[Standard class] lifo-replacement-policy

[Standard class] lru-replacement-policy

[Standard class] mru-replacement-policy

[Standard class] random-replacement-policy

[Standard class] lfu-replacement-policy

[Standard class] lfuda-replacement-policy

These classes implement their respective cache replacement policies.

6. Tests

Unit tests for cacle are written using FiveAM. They are hidden behind the #+5am read-time conditional; to enable the tests, load FiveAM before compiling cacle. After that, you can run the test suite from the REPL:

* (5am:run! 'cacle:cacle-tests)

Running test suite CACLE-TESTS
 Running test [...]
 [...]
 Did 619 checks.
    Pass: 619 (100%)
    Skip: 0 ( 0%)
    Fail: 0 ( 0%)

NIL

7. License

MIT

Author
Jussi Lahdenniemi <jussi@lahdenniemi.fi>
Maintainer
Jussi Lahdenniemi <jussi@lahdenniemi.fi>
License
MIT
Categories
data structure