distributions
2022-11-07
Random numbers and distributions
Distributions
The Distributions package provides a collection of probability distributions and related functions.
About the Project
DISTRIBUTIONS is a library for (1) generating random draws from various commonly used distributions, and (2) calculating statistical functions, such as density, distribution and quantiles for these distributions.
In the implementation and the interface, our primary considerations are:
- Correctness. Above everything, all calculations should be correct. Correctness shall not be sacrificed for speed or implementation simplicity. Consequently, everything should be unit-tested all the time.
- Simple and unified interface. Random variables are instances which can be used for calculations and random draws. The naming convention for building blocks is (draw|cdf|pdf|quantile|...)-(standard-)?distribution-name(possible-suffix)?, for example pdf-standard-normal or draw-standard-gamma1.
- Speed and exposed building blocks on demand. You can obtain the generator function for random draws as a closure using the accessor "generator" from an rv. In addition, the package exports independent building blocks such as draw-standard-normal, which can be inlined into your code if necessary.
Implementation note: Subclasses are allowed to calculate intermediate values (e.g. to speed up computation) at any time, e.g. right after the initialization of the instance, or on demand. The consequences of changing the slots of RV classes are UNDEFINED, but probably quite nasty. Don't do it. Note: lazy slots are currently not used; they will be reintroduced in the future after profiling/benchmarking.
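The two ways of drawing described above might look roughly like this. It is a minimal sketch, assuming the system is already installed, that generator returns a closure of no arguments (as the text describes), and that draw-standard-normal can be called without arguments. The variable names *rv* and *gen* are illustrative only.

```lisp
;; Load the system (see Getting Started for installation).
(asdf:load-system :distributions)

;; A standard normal random variable, created as in the Usage section.
(defparameter *rv* (distributions:r-normal))

;; Assumption: GENERATOR returns a closure of no arguments;
;; each call to the closure yields one draw from *RV*.
(defparameter *gen* (distributions:generator *rv*))
(funcall *gen*)

;; The exported building block draws directly, without an RV instance.
;; Assumption: it needs no required arguments.
(distributions:draw-standard-normal)
```

Holding on to the closure is mainly useful in tight loops, where repeated generic dispatch on the RV instance would otherwise add overhead.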
Built With
- anaphora
- alexandria
- array-operations
- select
- let-plus
- numerical-utilities
- cephes
- special-functions
- float-features
Getting Started
To get a local copy up and running, follow these steps:
Prerequisites
An ANSI Common Lisp implementation. Developed and tested with SBCL.
Installation
Lisp-Stat is composed of several systems that are designed to be independently useful. So you can, for example, use distributions for any project needing to manipulate statistical distributions.
Getting the source
To make the system accessible to ASDF (a build facility, similar to make in the C world), clone the repository into a directory ASDF knows about. By default, the common-lisp directory in your home directory is known. Create it if it doesn't already exist, and then:
- Clone the repository
cd ~/common-lisp && \
git clone https://github.com/Lisp-Stat/distributions.git
- Reset the ASDF source-registry to find the new system (from the REPL)
(asdf:clear-source-registry)
- Load the system
(ql:quickload :distributions)
This will download all of the dependencies for you.
Getting dependencies
To get the third-party systems that Lisp-Stat depends on, you can use a dependency manager such as Quicklisp or CLPM. Once installed, get the dependencies with either of:
(clpm-client:sync :sources "clpi") ;sources may vary
(ql:quickload :distributions)
You need to do this only once. After obtaining the dependencies, you can load the system with ASDF: (asdf:load-system :distributions). If you have installed the SLIME ASDF extensions, you can invoke this with a comma (',') from the SLIME REPL in Emacs.
Usage
Create a standard normal distribution
(defparameter *rv-normal* (distributions:r-normal))
and take a few draws from it:
LS-USER> (distributions:draw *rv-normal*)
1.037208743704438d0
LS-USER> (distributions:draw *rv-normal*)
-0.2847287516046668d0
LS-USER> (distributions:draw *rv-normal*)
-0.6793466378900889d0
LS-USER> (distributions:draw *rv-normal*)
1.5040711441992598d0
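Besides draws, the same instance can be used for the statistical functions. The following is a sketch only: it assumes that cdf and quantile are exported generic functions taking the random variable followed by a double-float argument, which the naming convention and the Coverage table suggest but this README does not spell out.

```lisp
(asdf:load-system :distributions)

;; Standard normal, as in the example above.
(defparameter *rv-normal* (distributions:r-normal))

;; Assumed signature: (cdf rv x).
;; Mathematically, the standard normal CDF at 0 is 1/2.
(distributions:cdf *rv-normal* 0d0)

;; Assumed signature: (quantile rv q).
;; Mathematically, the 0.975 quantile of a standard normal is about 1.96.
(distributions:quantile *rv-normal* 0.975d0)
```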
For more examples, please refer to the Documentation.
Roadmap
- Sketch the interface.
- Extend basic functionality (see Coverage below)
- Keep extending the library based on user demand.
- Optimize things on demand, see where the bottlenecks are.
Specific planned improvements, roughly in order of priority:
- More serious testing. I like the approach in Cook (2006): we should transform empirical quantiles to z-statistics and calculate the p-value using chi-square tests.
- (mm rv x) and similar methods for multivariate normal (and maybe T).
See the open issues for a list of proposed features (and known issues).
Coverage
Distribution | PDF | CDF | Quantile | Draw | Fit |
---|---|---|---|---|---|
Bernoulli | N/A | N/A | N/A | Yes | No |
Beta | Yes | Yes | Yes | Yes | Yes |
Binomial | No | No | No | Yes | No |
Chi-Square | No | No | No | No | No |
Discrete | Yes | Yes | No | Yes | No |
Exponential | Yes | Yes | Yes | Yes | No |
Gamma | Yes | Yes | Yes | Yes | No |
Geometric | No | No | No | Yes | No |
Inverse-Gamma | Yes | No | No | Yes | No |
Log-Normal | Yes | Yes | Yes | Yes | No |
Normal | Yes | Yes | Yes | Yes | No |
Poisson | No | No | No | Yes | No |
Rayleigh | No | Yes | No | Yes | No |
Student t | No | No | No | Yes | No |
Uniform | Yes | Yes | Yes | Yes | No |
Resources
This system is part of the Lisp-Stat project; that should be your first stop for information. Also see the resources and community page for more information.
Contributing
Always try to implement state-of-the-art generation and calculation methods. If you need something, read up on the literature: the field has developed a lot in recent decades, and most older books present obsolete methods. Good starting points are Gentle (2005) and Press et al. (2007), though you should use the latter with care. Don't copy its algorithms without reading a few recent articles, because they are not always the best ones (the authors admit this, but claim that some algorithms are included for pedagogical purposes).
Always document the references in the docstring, and include the full citation in doc/references.bib (BibTeX format).
Do at least basic optimization with declarations (e.g. until SBCL doesn't give any more notes; notes about return values are OK). Benchmarks are always welcome, and should be documented.
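As an illustration of what basic optimization with declarations can look like, here is a small self-contained sketch. The function sum-of-squares is a hypothetical example, not part of this library. Under (speed 3), SBCL may still emit a note about boxing the double-float return value, which is the kind of note considered acceptable above.

```lisp
(defun sum-of-squares (xs)
  "Sum of the squares of XS, a simple vector of double-floats."
  ;; Type and optimization declarations let the compiler use
  ;; unboxed double-float arithmetic inside the loop.
  (declare (type (simple-array double-float (*)) xs)
           (optimize (speed 3) (safety 1)))
  (let ((sum 0d0))
    (declare (type double-float sum))
    (dotimes (i (length xs) sum)
      (incf sum (* (aref xs i) (aref xs i))))))

;; Usage: the argument must be a specialized double-float vector.
(sum-of-squares
 (make-array 3 :element-type 'double-float
               :initial-contents '(1d0 2d0 3d0)))
;; => 14.0d0
```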
Document doubts and suggestions for improvements using !! and ??; more marks mean higher priority.
Please see CONTRIBUTING.md for details on the code of conduct, and the process for submitting pull requests.
License
Distributed under the MS-PL License. See LICENSE for more information.
Contact
Project Link: https://github.com/lisp-stat/distributions