MCMC samples with a random walk 

HOME 
In the last
post we looked at a way to use conjugate distributions for
several parameters
via a Gibbs sampler. The output from this was an MCMC sample of
random draws from the posterior distribution. We can produce
similar MCMC samples without using conjugate distributions with
a method often called "MetropoliswithinGibbs". The idea for the sampler was developed by Nicholas Metropolis and colleagues in a paper in 1953. This was before the Gibbs sampler was proposed, but it uses the same idea of updating the parameters one by one. A better name would be "componentwise random walk Metropolis sampler". The rules for the random walk ensure that a large number of samples will be a good description of the posterior distribution. We'll illustrate this with the same salamander data that we used before. For Gibbs sampling we used conjugacy, so we did not need to calculate the likelihood, we just needed the number of successes and failures. For the random walk method we do need to calculate likelihood, so take a look at the post where we looked at maximum likelihood estimation. Recall that 39 sites were visited on 5 separate occasions, and the number of occasions when at least one salamander was found recorded. Here are the data, sorted in descending order:
We want to estimate the probability that a site is occupied by salamanders (\(\psi\)) and probability of detecting salamanders on a single visit if they are present (\(\pi\)). The simplest model assumes that \(\psi\) is the same for all sites and \(\pi\) is the same for all sites and all occasions. Metropolis samplingThis is the componentwise Metropolis random walk algorithm:
Two important features make this algorithm simple:
The R code for the Metropolis sampler for the salamander data is at the foot of the page. The animation below shows how this works for the salamander data. I used uniform Beta(1, 1) priors and 0.5 as the starting values for both occupancy and detection. For the maximum step size, \(\delta\), I used 0.1 for both (though as we shall see, these are not the optimal values).
The general pattern is similar to the Gibbs sampler, but this time it takes several steps to move away from the starting point, and many of the steps are either vertical (constant \(\psi\)) or horizontal (constant \(\pi\)). You can download an MP4 version here. Checking... and tuningJust as with the Gibbs sampler, we discard the first values (burnin):
We can look at trace plots for the Metropolis output and check the effective sample size, just as we did for the Gibbs output:
This is based on 10,000 iterations! For the Gibbs sampler, the effective sizes were 1,400 and 2,100. Effective samples of less than 1000 are scarcely adequate descriptions of the posterior distribution. As the trace plots show, the sampler is moving in tiny steps, especially for \(\psi\). The autocorrelation is high, so effective sample sizes are small. We check the acceptance rates, the proportion of candidate values that are accepted:
Most of the candidate values are being accepted, which is a symptom of having a step size that is too small. With the Metropolis sampler we can control the step size. For the run we are looking at I used a maximum step size of 0.1, so the average step size will be half that, 0.05. Step size up to 0.6I reran the analysis with a maximum step size of 0.6. This gave acceptance rates of 0.27 for \(\psi\) and 0.12 for \(\pi\). Here are the trace plots:
Now the sampler takes big steps, but most of the time it isn't moving at all; it's trying to step too far and most of the candidate values are rejected. This also results in high autocorrelation and small effective sample sizes, now 788 and 512. Tuning the Metropolis samplerI did a series of runs with different combinations of step sizes and the one with the lowest autocorrelation and highest effective sample size used 0.3 for \(\psi\) and 0.2 for \(\pi\).
The effective sample sizes now are 1240 and 1420. The acceptance rates were 0.50 and 0.36. The optimum value for acceptance rates for updating parameters one at a time is 0.44, but rates between 0.15 and 0.50 are usable. This process of adjusting the step size is know as tuning or adapting the sampler, and software such as WinBUGS or JAGS include an adaptation phase before starting the main Markov chain. A practical advantage of the Gibbs sampler is that it does not need tuning, as all values generated are accepted. Metropolis sampler outputNow that we have an MCMC sample with reasonable effective
size, we plot
histograms and calculate means and credible intervals. If you
have the
You can also look at correlations between the MCMC chains and plot a credible region, just as we did for the Gibbs sampler, with the same results to within Monte Carlo error. R code for the Metropolis samplerHere's the code for the Metropolis sampler for the salamanders analysis:
The code for everything on this page, including the animated plot, is here.
ReferencesMetropolis N, Rosenbluth A, MN R, Teller E (1953). "Equation of State Calculations by Fast Computing Machines." Journal of Chemical Physics, 21, 10871092. 
Updated 5 March 2015 by Mike Meredith 