*The `julia-observer-quote-cut-paste-0**work package has been deprecated since it has been merged into thejulia-observer-quote-cut-paste-1*

This package implements the following 6 MCMC algorithms:

Julia function | MCMC algorithm | |
---|---|---|

1 | mh | Metropolis-Hastings (MH) |

2 | mala | Metropolis adjusted Langevin algorithm (MALA) |

3 | smmala | Simplified Manifold MALA (SMMALA) |

4 | mmala | Manifold MALA (MMALA) |

5 | hmc | Hamiltonian Monte Carlo (HMC) |

6 | rmhmc | Riemannian manifold HMC (RMHMC) |

More details for the geometric MCMC methods of the package can be found in this article.

Furthermore, the package provides the `linearZV()`

and `quadraticZV()`

functions for the computation of the zero-variance (ZV) Monte Carlo Bayesian estimators, see this publication.

To install the GeometricMCMC package, the usual commands of the Julia package manager are called:

```
Pkg.update()
Pkg.add("GeometricMCMC")
```

After installation, the following typical statement is typed in the Julia command prompt so as to use the GeometricMCMC package:

```
using GeometricMCMC
```

The `Distributions`

package may be needed, mostly for sampling from
distributions, so it may be useful to make both packages available:

```
using Distributions, GeometricMCMC
```

This file serves as a tutorial explaining how to use the MCMC routines of the
package, as well as the `linearZV()`

and `quadraticZV()`

functions. Code for
the Bayesian logit model with multivariate uncorrelated zero-mean Normal prior
is provided to demonstrate usage and to follow through the tutorial.

To invoke each of the MCMC methods of the package, it is required to provide
two input arguments. The first argument is an instance of the `Model`

type,
defined in the package, and is common across all 6 MCMC routines. The second
argument is an instance of the algorithm's options type and is specific to the
algorithm.

`Model`

typeThe `Model`

type provides the statistical model to the MCMC routines. This
includes the functions defining the model, the number of the model's parameters
and the data.

More specifically, the functions required for defining the model are the
log-prior, the log-likelihood, the gradient of the log-posterior, the
position-specific metric tensor of the Riemannian manifold of the parameter
space and the tensor's partial derivatives with respect to each of the
parameters. These functions need to be known in closed form as the package
stands so far. The log-posterior is also one of the model's functions and it
does not need to be specified by the user, since the `Model`

type sets it to be
the sum of the log-likelihood with the log-prior.

It is apparent that the `Model`

type represents a Bayesian model. However, it
is also possible to accommodate simpler statistical models, such a non-Bayesian
log-target. This can be achieved, for instance, by setting the log-likelihood
to be equal to the log-target and the improper log-prior to be zero.

For ease of use, all the user-defined functions in the Model type (apart from the ones starting with "rand" as explained later in the tutorial) conventionally share the same signature

```
function myFunction(pars::Vector{Float64}, nPars::Int, data::Union(Array{Any}, Dict{Any, Any}))
```

where `pars`

are the model's parameters simulated by the MCMC algorithm and
thus not needed to be numerically specified by the user, `nPars`

is the number
of parameters and `data`

is an Array `Array{Any}`

or a dictionary ```
Dict{Any,
Any}
```

holding the data.

The Model can be instantiated with fewer arguments. For instance, the
Metropolis-Hastings function `mh()`

requires only the log-prior, log-likelihood
and the gradient of the log-posterior. In fact, the gradient of the
log-posterior is not necessary for running a Metropolis-Hastings MCMC
simulation. Nevertheless, it has been set as a required argument so that `mh()`

returns the zero-variance control variates along with the MCMC output.
Similarly, the log-prior, log-likelihood and the gradient of the log-posterior
suffice as arguments in the instantiation of `Model`

in order to run MALA or
HMC. SMMALA requires additionally the metric tensor. The partial derivatives of
the metric tensor with respect to the parameters are needed in the `Model`

instantiation only for MMALA or RMHMC simulations.

`data`

member of the `Model`

typeTo illustrate how to create a Bayesian `Model`

, the logit regression with a
multivariate uncorrelated zero-mean Normal prior is considered. The data are
then the design matrix `X`

, with `nData`

rows and `nPars`

columns, and the
binary response variable `y`

, which is a vector with `nData`

elements. `nData`

and`nPars`

are the number of data points and the number of parameters,
respectively. Assuming a Normal prior N(0, priorVar*I), the prior's variance
`priorVar`

is also part of the data, in the broad sense of the word.

It is up to the user's preference whether to define `data`

as `Array{Any}`

or
`Dict{Any, Any}`

, as long as `data`

is manipulated accordingly in
the subsequent definition of the `Model`

functions. In what follows, `data`

is
introduced as a dictionary, holding the design matrix `X`

, the response
variable `y`

, the prior variance `priorVar`

and the number of data points
`nData`

. Strictly speaking, `nData`

is not a requirement, since it can be
deduced by `size(X, 1)`

. It is however more efficient to pass `nData`

to the
`data`

dictionary, given that a typical MCMC simulation invokes the `Model`

functions thousands of times.

As a concrete example, `test/swiss.txt`

contains the Swiss banknotes dataset.
The dataset holds the measurements of 4 covariates on 200 Swiss banknotes,
of which 100 genuine and 100 counterfeit, representing the length of the bill,
the width of the left and the right edge, and the bottom margin width.
Therefore, the design matrix `X`

has 200 rows and 4 columns, and the 4
regression coefficients constitute the model parameters to be estimated. The
binary response variable `y`

is the type of banknote, 0 being genuine and 1
counterfeit. To create the `data`

dictionary for the Swiss banknotes, change
into the `test`

directory and run the `test/swiss.jl`

script:

```
# Create design matrix X and response variable y from swiss data array
swiss = readdlm("swiss.txt", ' ');
covariates = swiss[:, 1:end-1];
nData, nPars = size(covariates);
covariates = (bsxfun(-, covariates, mean(covariates, 1))
./repmat(std(covariates, 1), nData, 1));
polynomialOrder = 1;
X = zeros(nData, nPars*polynomialOrder);
for i = 1:polynomialOrder
X[:, ((i-1)*nPars+1):i*nPars] = covariates.^i;
end
y = swiss[:, end];
# Create data dictionary
data = {"X"=>X, "y"=>y, "priorVar"=>100., "nData"=>nData};
```

`Model`

Having defined the `data`

dictionary as above, `test/logitNormalPrior.jl`

provides the functions of the Bayesian logit model:

```
# Functions for Bayesian logit model with a Normal prior N(0, priorVar*I)
function logPrior(pars::Vector{Float64}, nPars::Int, data::Dict{Any, Any})
return (-dot(pars,pars)/data["priorVar"]
-nPars*log(2*pi*data["priorVar"]))/2
end
function logLikelihood(pars::Vector{Float64}, nPars::Int, data::Dict{Any, Any})
XPars = data["X"]*pars
return (XPars'*data["y"]-sum(log(1+exp(XPars))))[1]
end
function gradLogPosterior(pars::Vector{Float64}, nPars::Int,
data::Dict{Any, Any})
return (data["X"]'*(data["y"]-1./(1+exp(-data["X"]*pars)))
-pars/data["priorVar"])
end
function tensor(pars::Vector{Float64}, nPars::Int, data::Dict{Any, Any})
p = 1./(1+exp(-data["X"]*pars))
return ((data["X"]'.*repmat((p.*(1-p))', nPars, 1))*data["X"]
+(eye(nPars)/data["priorVar"]))
end
function derivTensor(pars::Vector{Float64}, nPars::Int, data::Dict{Any, Any})
matrix = Array(Float64, data["nData"], nPars)
output = Array(Float64, nPars, nPars, nPars)
p = 1./(1+exp(-data["X"]*pars))
for i = 1:nPars
for j =1:nPars
matrix[:, j] = data["X"][:, j].*((p.*(1-p)).*((1-2*p).*data["X"][:, i]))
end
output[:, :, i] = matrix'*data["X"]
end
return output
end
function randPrior(nPars::Int, data::Dict{Any, Any})
return rand(Normal(0.0, sqrt(data["priorVar"])), nPars)
end
```

There are two points to bare in mind when defining the functions of the model.
Firstly, it is practical, though not obligatory, to use the reserved names
`logPrior()`

, `logLikelihood()`

, `gradLogPosterior()`

, `tensor()`

,
`derivTensor()`

and `randPrior()`

in the relevant function definitions.
Secondly, it is mandatory to adhere to the signature

```
function myFunction(pars::Vector{Float64}, nPars::Int, data::Union(Array{Any}, Dict{Any, Any}))
```

for the deterministic functions of the model, and to the signature

```
function myFunction(nPars::Int, data::Union(Array{Any}, Dict{Any, Any}))
```

for the stochastic function `randPrior()`

.

The second argument `nPars`

to the model's functions seems redundant, since it
can be derived from the first argument `pars`

via the `size(pars, 1)`

command.
Nevertheless, it is more efficient practice to pass `nPars`

as an argument
given the number of times the functions are invoked in a single MCMC run.
Besides, once the functions are passed to `Model`

, then they are invoked with
fewer arguments as members of the instantiated `Model`

, so the interface remains
simple.

`Model`

Once the `data`

and the functions of `Model`

are defined, it is straightforward
to instantiate `Model`

with a single command. For example,
`test/logitNormalPriorSwissRmhmc.jl`

demosntrates how to create an instance of
`Model`

so as to run RMHMC:

```
model = Model(nPars, data, logPrior, logLikelihood, gradLogPosterior, tensor, derivTensor, randPrior);
```

As shown in `test/logitNormalPriorSwissMmala.jl`

, the same `Model`

instantiation is used for MMALA. In fact, the same command invocation suffices
to create an instance which can be then utilized by any of the 6 MCMC
algorithms of the GeometricMCMC package.

In order to make `Model`

more user friendly, it is possible to shorten the
`Model`

invocation in some cases. For example, the partial derivatives of the
metric tensor are not needed when running SMMALA. This is why `derivTensor()`

is omitted in `test/logitNormalPriorSwissSmmala.jl`

in the `model`

definition:

```
model = Model(nPars, data, logPrior, logLikelihood, gradLogPosterior, tensor, randPrior);
```

As a second example, the `tensor()`

function is not needed when running
Metropolis-Hastings, MALA or HMC. For this reason, both `tensor()`

and
`derivTensor()`

can be left out at `Model`

instantiation. So, the `model`

definition in `test/logitNormalPriorSwissMh.jl`

,
`test/logitNormalPriorSwissMala.jl`

and`test/logitNormalPriorSwissHmc.jl`

takes the more succinct form

```
model = Model(nPars, data, logPrior, logLikelihood, gradLogPosterior, randPrior);
```

Apparently, the corresponding function definitions `tensor()`

and
`derivTensor()`

are not required if there is no intention to run RMHMC or MMALA.

Although the functions in the `model`

instance are meant to be invoked
internally by the MCMC functions, they are also available at the disposal of
the user. For this reason, it is useful to demonstrate how they can be called.
Having defined `model`

, `model.randPrior()`

, without any input arguments,
samples from the prior. Any of the deterministic functions of the model are
called by passing to them a vector holding the values of the model's
paramater. For example

```
parsSample = model.randPrior()
model.logPrior(parsSample)
model.logLikelihood(parsSample)
model.logPosterior(parsSample)
model.gradLogPosterior(parsSample)
model.tensor(parsSample)
model.derivTensor(parsSample)
```

The common `Model`

type shared by all the MCMC routines is handy as it has
become clear from the tutorial. On the other hand, the options of each MCMC
algorithm are obviously specific to the algorithm. The only generic set of
common MCMC options are the total number of MCMC iterations `n`

, the number of
burnin iterations `nBurnin`

and the monitor rate `monitorRate`

, which is the
number of successive iterations for which the acceptance ratio is calculated.
It is therefore natural to gather `n`

, `nBurnin`

and `monitorRate`

in the
"base" MCMC type `McmcOpts`

. `nPostBurnin`

is also a member of `McmcOpts`

and
it is implicity set by the constructor of `McmcOpts`

to be the difference
`n-nBurnin`

.

`McmcOpts`

is in turn member of the other option types, which are specific to
each MCMC routine. Therefore, the Metropolis-Hastings `MhOpts`

type consists of
`McmcOpts`

and of the additional `widthCorrection`

member, which helps
adjusting the proposal's standard deviation when the acceptance ratio is
outside the [20%, 60%] acceptance ratio band. An empirically reasonable value
for `widthCorrection`

is `0.1`

for example.

The user does not need to initialize `McmcOpts`

separately, since the
constructor of the options type of each MCMC routine invokes the constructor of
`McmcOpts`

. As it can be seen in `test/logitNormalPriorSwissMh.jl`

for example,

```
mhOpts = MhOpts(55000, 5000, 0.1);
```

creates an instance of the Metropolis-Hastings `MhOpts`

type. This means that
the following values are accessible as members of `MhOpts`

:

mhOpts's member | Value | |
---|---|---|

1 | mhOpts.mcmc.n | 55000 |

2 | mhOpts.mcmc.Burnin | 5000 |

3 | mhOpts.mcmc.nPostBurnin | 50000 |

4 | mhOpts.mcmc.monitorRate | 100 |

5 | mhOpts.widthCorrection | 0.1 |

If a monitorRate other than the default of 100 is desired, say 200, then it
can be passed as the third argument to the constructor of `MhOpts`

:

```
mhOpts = MhOpts(55000, 5000, 200, 0.1);
```

In a similar way, the type `MalaOpts`

holds the options of the MALA algorithm,
which include `McmcOpts`

and the drift step. MALA's drift step can be either a
constant real number or a function with signature

```
myDriftStep(currentIter::Int, acceptanceRatio::Float64, nMcmc::Int, nBurnin::Int, currentStep::Float64)
```

which adjusts and returns the drift step during burnin on the basis of the
current value of the acceptance ratio. `src/setStep.jl`

contains auxiliary
functions to assist the user setting up a function for adjusting the drift step
during burnin. Without elaborating in full detail, if the user wants to use
the available auxiliary `setMalaDriftStep`

function, he needs to define only a
vector of 10 drift steps. The first value of the vector is the one empirically
considered to be reasonable. If the resulting acceptance ratio is too low or
too high for MALA, then the second drift step of the vector is used, which is
expected to be rather small. The successive steps of the vector should be
increasing at a pace that brings the acceptance ratio to the desired levels.
Apparently, defining appropriate numerical values for the 10 drift steps of the
vector is a trial-and-error process. `test/logitNormalPriorSwissMala.jl`

gives
an example of how to define the drift steps and subsequently how to invoke the
`setMalaDriftStep`

function:

```
driftSteps = [1, 1e-8, 1e-7, 1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 0.25];
setDriftStep(i::Int, acceptanceRatio::Float64, nMcmc::Int,
nBurnin::Int, currentStep::Float64) =
setMalaDriftStep(i, acceptanceRatio, nMcmc, nBurnin, currentStep, driftSteps)
```

Then the `MalaOpts`

type, holding the MALA options, is instantiated as

```
malaOpts = MalaOpts(55000, 5000, setDriftStep);
```

The `SmmalaOpts`

and `MmalaOpts`

types for SMMALA and MMALA, respectively, are
aliases of the `MalaOpts`

type.

In short, `MhOpts`

holds the options for HMC. These include `McmcOpts`

, the
number of leapfrog steps `nLeaps`

, the mass matrix `mass`

and the leapfrog
step, which can be specified either as a constant real number or as a function
that adjusts the leapfrog step, similarly to the MALA mechanism for adjusting
the drift step. An example of setting up the HMC options is available in
`test/logitNormalPriorSwissHmc.jl`

:

```
leapSteps = [0.4, 1e-3/2, 1e-3, 1e-2, 1e-1, 0.15, 0.2, 0.25, 0.3, 0.35];
setLeapStep(i::Int, acceptanceRatio::Float64, nMcmc::Int,
nBurnin::Int, currentStep::Float64) =
setHmcLeapStep(i, acceptanceRatio, nMcmc, nBurnin, currentStep, leapSteps)
hmcOpts = HmcOpts(55000, 5000, 10, setLeapStep, eye(nPars));
```

where 10 corresponds to `nLeaps`

and `eye(nPars)`

to `mass`

.

The `RmhmcOpts`

options type of RMHMC includes `McmcOpts`

, the number of
leapfrog steps `nLeaps`

, the number of Newton steps `nNewton`

and the leapfrog
step, which is either a constant real or it is tuned similarly to HMC's
leapfrog step. `test/logitNormalPriorSwissRmhmc.jl`

provides an example for
setting up the RMHMC options:

```
leapSteps = [0.9, 0.01, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8];
setLeapStep(i::Int, acceptanceRatio::Float64, nMcmc::Int,
nBurnin::Int, currentStep::Float64) =
setRmhmcLeapStep(i, acceptanceRatio, nMcmc, nBurnin, currentStep, leapSteps)
rmhmcOpts = RmhmcOpts(55000, 5000, 6, setLeapStep, 4);
```

where 6 corresponds to `nLeaps`

and 4 to `nNewton`

.

With the `Model`

and MCMC option types instantiated, the MCMC routines can be
called with a single command. Here are the examples for running all 6 MCMC
algorithms on the Bayesian logit model with multivariate uncorrelated zero-mean
Normal prior:

```
mcmc, z = mh(model, mhOpts);
mcmc, z = mala(model, malaOpts);
mcmc, z = smmala(model, smmalaOpts);
mcmc, z = mmala(model, mmalaOpts);
mcmc, z = hmc(model, hmcOpts);
mcmc, z = rmhmc(model, rmhmcOpts);
```

Each of the 6 MCMC routines return 2 `Float64`

arrays with
`opts.mcmc.nPostBurnin`

rows and `model.nPars`

columns each. The `mcmc`

array
holds the simulated MCMC chains, while the `z`

array holds the zero-variance
control variates, i.e. minus half the gradient of the log-target.

The `mcmc`

and `z`

arrays can be passed as arguments to the `linearZv()`

or
`quadraticZv()`

functions in order to compute the linear or the quadratic
zero-variance Bayesian estimators:

```
linearZvMcmc, linearCoef = linearZv(mcmc, z);
quadraticZvMcmc, quadraticCoef = quadraticZv(mcmc, z);
```

`linearZv()`

returns 2 `Float64`

arrays. The first one, saved in `linearZvMcmc`

above, has `opts.mcmc.nPostBurnin`

rows and `model.nPars`

columns and holds the
linear ZV-MCMC estimators. The second array, saved in `linearCoef`

, has
`model.nPars`

rows and `model.nPars`

columns. `linearCoef`

contains the
coefficients of the linear polynomial upon which the zero-variance approach
relies.

In a similar fashion, `quadraticZv()`

returns 2 `Float64`

arrays, the first of
which has `opts.mcmc.nPostBurnin`

rows and `model.nPars`

columns and holds the
quadratic ZV-MCMC estimators. The second array has
`model.nPars*(model.nPars+3)/2`

rows and `model.nPars`

columns, containing the
coefficients of the underlying quadratic polynomial of the zero-variance method.

The Bayesian probit model with multivariate uncorrelated zero-mean Normal is
also available in the `test`

directory, accompanied by the scripts for running
all 6 MCMC algorithms on the vasoconstriction dataset. The data come from an
experiment conducted on human physiology to study the effect of taking a single
deep breath on the occurrence of a reflex vasoconstriction in the skin of the
digits. 39 samples from three individuals are available, each of them
contributing 9, 8 and 22 samples. Although the data represent repeated
measurements, the observations can be assumed to be independent, therefore the
Bayesian probit model can be applied. Two explanatory variables are included in
the study, namely the rate of inhalation and the inhaled volume of air per
individual. An intercept is also added, so 3 regression coefficients comprise
the parameters of the model. The occurrence or non-occurrence of
vasoconstriction in the skin of the digits of each subject, corresponding to 1
and 0, plays the role of the binary response.

The GeometricMCMC package is being extended to include

- models of ordinary differential equations (ODEs),
- usage of the MCMC routines when the model's functions are not known in closed form.

04/04/2013

5 months ago

64 commits