
Most runs of MCMC chains for Bayesian estimation begin with an adapt phase, when the samplers used to produce the chains are tuned for best performance. I recently did a long run for a complex model, and the output was flagged with "convergence failure". In fact the problem was caused by cutting short the adapt (tuning) phase. The default adapt phase for the R package jagsUI is only 100 iterations, which was woefully inadequate for my model. We should be specifying big numbers for n.adapt and small numbers, or even zero, for n.burnin.

## Viewing previous graphs in R for Windows

You have produced several plots in R, and now you want to go back and take a look at one of those old plots. On a Mac or RStudio, you can do that with the navigation keys. R for Windows has the same functionality but it is not enabled by default. Here's how to enable it.

## The wiqid package is now on CRAN

The wiqid package has been accepted on CRAN, with the first version being 0.1.0. This means that it can now be installed - together with its dependencies - with install.packages("wiqid"), and update.packages() will also get you the newest version.

## Making a habitat mask for SECR in JAGS

I have just put together an R package to automate the conversion of habitat masks produced with Murray Efford's secr package to the format needed to run in JAGS or WinBUGS/OpenBUGS. You can install it by opening R and running

install.packages("makeJAGSmask", repos="http://mikemeredith.net/R")

## Installing R and JAGS on Windows

I've previously posted on compatibility issues encountered when installing R and JAGS on Macs or Ubuntu, but Windows seemed immune to these problems ... until now!

The issue arises because current R versions, 3.3.0 or later, are not compatible with the current default installer for JAGS. Note that the problem is with the Windows installer, not JAGS source code, and doesn't affect other platforms. A compatible installer is available, but it's not the default. See Martyn Plummer's post for more details.

## Installing R and JAGS on Apple Mac

I don't have an Apple computer, but I have picked up some hints about installing R and JAGS on a Mac from trying to troubleshoot friends' installations. 10 June 2016: I now do have a Mac!

People seem to run into problems with different versions of the Mac OS, R, JAGS and the rjags package. The only way to stay sane is to use recent versions of all four.

### Check your Mac OS version!

From the Apple menu, choose About This Mac; the version number appears below the name. Note whether you have v. 10.9 (Mavericks) or later. If you have an earlier version, upgrade your OS before going further.

16 Jan, updated 10 June 2016

## Installing R and JAGS on Ubuntu OS

I recently tried installing R and JAGS on my machine running Ubuntu. I wanted to test my BEST and wiqid packages with the new version of JAGS on Ubuntu. It took me a while, but I finally found a simple way to do this which might be of interest to others.

I already had R and JAGS 3 installed, together with the rjags package version 3. Installing the rjags package within R (or updating it with update.packages()) installs the new version of rjags, v.4, which requires JAGS 4 and throws an error if it isn't found. But the Ubuntu repository still has JAGS 3, so you cannot update JAGS with Ubuntu Software Center.

## Bayesian estimation with a random walk sampler

In the last post we looked at a way to use conjugate distributions for several parameters via a Gibbs sampler. The output from this was an MCMC sample of random draws from the posterior distribution. We can produce similar MCMC samples without using conjugate distributions with a method often called "Metropolis-within-Gibbs".

The idea for the sampler was developed by Nicholas Metropolis and colleagues in a paper in 1953. This was before the Gibbs sampler was proposed, but it uses the same idea of updating the parameters one by one. A better name would be "componentwise random walk Metropolis sampler". The rules for the random walk ensure that a large number of samples will be a good description of the posterior distribution.
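A minimal sketch of a componentwise random walk Metropolis sampler, assuming a simple occupancy model with occupancy `psi` and detection `p`, uniform priors, and made-up detection data. The names `logPost`, `stepSize` and `chain` are illustrative only and are not taken from the wiqid package or from the post itself.

```r
set.seed(42)
# Hypothetical data: number of detections at each of 20 sites, 5 visits each
y <- c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 0, 1, 2, 0)
K <- 5

# Log posterior for psi and p with uniform priors (so prior terms drop out)
logPost <- function(psi, p) {
  if (psi <= 0 || psi >= 1 || p <= 0 || p >= 1) return(-Inf)
  lik <- psi * dbinom(y, K, p) + (1 - psi) * (y == 0)
  sum(log(lik))
}

nIter <- 5000
chain <- matrix(NA_real_, nIter, 2, dimnames = list(NULL, c("psi", "p")))
psi <- 0.5; p <- 0.5   # starting values
stepSize <- 0.1        # sd of the random walk proposals (not tuned)

for (i in seq_len(nIter)) {
  # Update psi with a random walk step, holding p fixed
  cand <- rnorm(1, psi, stepSize)
  if (log(runif(1)) < logPost(cand, p) - logPost(psi, p)) psi <- cand
  # Update p with a random walk step, holding psi fixed
  cand <- rnorm(1, p, stepSize)
  if (log(runif(1)) < logPost(psi, cand) - logPost(psi, p)) p <- cand
  chain[i, ] <- c(psi, p)
}

colMeans(chain[-(1:500), ])  # posterior means after discarding the first 500 draws
```

Each parameter is updated in turn, as in a Gibbs sampler, but the new value comes from an accept/reject step rather than a draw from a known conditional distribution.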

## Bayesian estimation with a simple Gibbs sampler

As discussed in the last post, conjugate distributions provide an easy way to calculate posterior distributions for a single parameter, such as detection of a species during a single visit to a site where it is present. If we have more than one unknown parameter in our model - as with a simple occupancy model, where we have detection and occupancy - we may still be able to use conjugacy via a Gibbs sampler.

Gibbs sampling works if we can describe the posterior for each parameter if we know all the other parameters in the model.
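A minimal sketch of that idea for the simple occupancy model, assuming uniform Beta(1, 1) priors, made-up detection data, and an extra unknown `z` for the true occupancy state of each site (a common device, but an assumption here rather than something stated above). Given the other unknowns, each of `z`, `psi` and `p` has a posterior we can draw from directly.

```r
set.seed(1)
# Hypothetical data: number of detections at each of 20 sites, 5 visits each
y <- c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 0, 1, 2, 0)
K <- 5
n <- length(y)

nIter <- 5000
chain <- matrix(NA_real_, nIter, 2, dimnames = list(NULL, c("psi", "p")))
psi <- 0.5; p <- 0.5   # starting values

for (i in seq_len(nIter)) {
  # 1. Occupancy state z given psi and p: sites with detections are
  #    certainly occupied; the others are occupied with probability probOcc
  probOcc <- psi * (1 - p)^K / (psi * (1 - p)^K + 1 - psi)
  z <- ifelse(y > 0, 1, rbinom(n, 1, probOcc))
  # 2. psi given z: beta posterior, conjugate to the Beta(1, 1) prior
  psi <- rbeta(1, 1 + sum(z), 1 + n - sum(z))
  # 3. p given z and y: only visits to occupied sites carry information
  p <- rbeta(1, 1 + sum(y), 1 + sum(z) * K - sum(y))
  chain[i, ] <- c(psi, p)
}

colMeans(chain[-(1:500), ])  # posterior means after discarding the first 500 draws
```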

## Bayesian estimation with conjugate priors

Conjugate distributions provide useful tricks for combining informative priors with likelihoods to produce posterior distributions. In the days before powerful computers and clever algorithms, they were often the only way. They only work for a single variable. Nevertheless, Gibbs sampling, which the wiqid package uses when possible, builds on the idea of conjugate distributions.

As our example, we'll use estimation of detection probability from data for repeat visits to a site which is known to be occupied by our target species. First, we'll describe the beta distribution, then see how that can be combined with our data. A discussion of priors will follow, and we'll finish with brief descriptions of conjugate priors for other types of data.
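As a small illustration of the beta prior combined with detection data, here is a sketch with made-up numbers: a uniform Beta(1, 1) prior and a species detected on 3 of 10 visits. The variable names are hypothetical.

```r
# Prior for detection probability: Beta(a, b); Beta(1, 1) is uniform
a <- 1
b <- 1

# Hypothetical data: species detected on 3 of 10 visits to an occupied site
nVisits <- 10
nDetect <- 3

# Conjugate update: add detections to a, non-detections to b
aPost <- a + nDetect
bPost <- b + nVisits - nDetect

postMean <- aPost / (aPost + bPost)                # posterior mean
cri <- qbeta(c(0.025, 0.975), aPost, bPost)        # 95% credible interval
postMean
cri
```

With an informative prior, only `a` and `b` change; the update itself stays the same.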

## Likelihood and maximum likelihood estimation

I'm planning a series of posts looking at what happens under the hood when we analyse a data set using some of the estimation functions in the wiqid package. I'll focus mainly on Bayesian methods, but this first post will look at the likelihood, which is used for both Bayesian analysis and maximum likelihood estimation.

We'll use a simple occupancy model. It has just two parameters and both must be between 0 and 1. That means that we can plot all possible combinations of the two parameters in a simple two-dimensional graph. As we'll see, we need to add a third dimension, but three is still manageable.
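A rough sketch of how such a likelihood surface could be computed, assuming made-up detection data and a grid of values for occupancy (`psi`) and detection (`p`). The function and variable names are illustrative and not taken from wiqid.

```r
# Hypothetical data: number of detections at each of 20 sites, 5 visits each
y <- c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 0, 1, 2, 0)
K <- 5

# Log-likelihood for one combination of psi and p: a site with no detections
# is either occupied but missed on every visit, or not occupied at all
logLik <- function(psi, p) {
  sum(log(psi * dbinom(y, K, p) + (1 - psi) * (y == 0)))
}

# Evaluate on a grid; outer() needs a vectorised function, hence Vectorize()
psiVals <- seq(0.01, 0.99, by = 0.01)
pVals <- seq(0.01, 0.99, by = 0.01)
surface <- outer(psiVals, pVals, Vectorize(logLik))

# Grid point with the highest likelihood: an approximate maximum likelihood estimate
best <- which(surface == max(surface), arr.ind = TRUE)
mle <- c(psi = psiVals[best[1, 1]], p = pVals[best[1, 2]])
mle
```

The third dimension is the likelihood itself: `persp(psiVals, pVals, exp(surface - max(surface)))` draws it as a surface over the two parameters.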

## Introducing the wiqid package

The wiqid package for R statistical software provides Quick and Dirty functions for the analysis of Wildlife data.

Currently it has functions for estimating occupancy, abundance from closed captures, density from spatial capture-recaptures, and survival from mark-recapture data, plus a slew of functions for species richness and alpha and beta diversity.

It is intended to be used for (1) simulations and bootstraps, (2) teaching, and (3) introducing Bayesian methods. And it should work on all platforms: Windows, Linux, and Mac.

## The jackknife estimator

Jackknife estimators are used in ecology in two situations:
• mark-recapture estimation of number of animals in a closed population;
• species richness estimation for a defined assemblage.
In both cases, the raw number of animals or species observed (Sobs) is often too low, as some animals/species are missed. The raw number is thus a biased estimator. The jackknife aims to produce unbiased estimates.
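As a sketch of the species richness case, the widely used first-order jackknife adds to Sobs a correction based on the number of species seen on only one occasion. The incidence matrix below is made up, and the variable names are hypothetical.

```r
# Hypothetical incidence matrix: rows = species, columns = sampling occasions
incid <- matrix(c(
  1, 1, 1, 1,
  1, 0, 1, 0,
  0, 1, 0, 0,
  0, 0, 0, 1,
  1, 1, 0, 1,
  0, 0, 0, 0), ncol = 4, byrow = TRUE)

nOcc <- ncol(incid)              # number of sampling occasions
freq <- rowSums(incid > 0)       # occasions on which each species was seen
Sobs <- sum(freq > 0)            # species observed at least once
uniques <- sum(freq == 1)        # species seen on exactly one occasion

# First-order jackknife estimate of species richness
Sjack1 <- Sobs + uniques * (nOcc - 1) / nOcc
c(Sobs = Sobs, Sjack1 = Sjack1)
```

Here 5 species are observed, 2 of them only once, so the estimate is 5 + 2 × 3/4 = 6.5.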

## SECR in BUGS/JAGS with patchy habitat

Analysis of spatially explicit capture-recapture (SECR) data can be done in a maximum likelihood (ML) or a Bayesian framework. Program DENSITY and the secr package take care of the former. Bayesian analysis with the usual workhorses, WinBUGS, OpenBUGS and JAGS, is straightforward if the traps are laid out in a large area of homogeneous habitat.

Faced with patches of suitable habitat surrounded by inhospitable terrain, or a large extent of habitat punctuated with patches of non-habitat, we had the choice of ML methods or one of the packages designed specifically for Bayesian SECR analysis, such as SPACECAP or SCRbayes. But then we are limited to the range of models provided by package authors: we don't have the flexibility to specify our own models that comes with WinBUGS, OpenBUGS or JAGS.

Here I present a way to incorporate patchy habitat into a BUGS/JAGS model specification.

22 Sept, updated 5 Nov 2013

## Probability densities and spinners

In our basic data analysis workshops, we use an idea from John Kruschke's Doing Bayesian Data Analysis: we use spinners to generate random values for continuous variables and introduce the concept of probability density.

We start off with simple spinners representing a uniform distribution over a range from, say, 0 to 0.5. We discuss the problems of attaching a probability to an exact value, which leads to probability of a range of values and hence probability density.
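The spinner idea can be mimicked in R with the uniform distribution functions. This is only a sketch: the range 0 to 0.5 comes from the text, while the values 0.2 and 0.3 are arbitrary choices.

```r
set.seed(123)
lower <- 0
upper <- 0.5

# The density is constant at 1 / (upper - lower) = 2, which is greater than 1:
# a density is not a probability
dens <- dunif(0.2, lower, upper)

# Probability of a range is the width of the range times the density
probRange <- punif(0.3, lower, upper) - punif(0.2, lower, upper)

# Simulated spins: hitting an exact value essentially never happens,
# but a range of values has a well-defined probability
spins <- runif(1e5, lower, upper)
propExact <- mean(spins == 0.2)
propRange <- mean(spins > 0.2 & spins < 0.3)

c(density = dens, probRange = probRange, propExact = propExact, propRange = propRange)
```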

## Camera-trap layout for SECR

I have recently been looking at the design of camera-trap studies to estimate the population density of tigers when populations are very sparse. The intention is to use recently-developed spatially explicit capture-recapture (SECR) methods to analyse the data. The optimal camera-trap layout for SECR may well differ from the design used for older methods.

Before the advent of SECR methods, putting all your traps into a single cluster with minimal perimeter length made sense, as you needed to estimate the area trapped animals came from to get a density. SECR estimates density directly, without needing to estimate area, so a single, large cluster may no longer be advantageous.

## SECR and circular home ranges

Do the models used for SECR (spatially explicit capture-recapture) assume that animals' home ranges are approximately circular?

I've seen this asserted a couple of times, in particular in Tobler and Powell (2013, p.110), and I've myself drawn circular home ranges when discussing the interpretation of the capture parameters, but I don't think it is a necessary assumption.

## SECR : spatially explicit capture recapture

I've a couple of ideas for blog posts on SECR (spatially explicit capture-recapture), and this post sets out the basic concepts of SECR which I will need to refer to in later posts.

Capture-recapture methods (also known as mark-recapture or capture-mark-recapture) have been used to estimate the size of animal populations for many years: the first software package for analysis of this kind of data, CAPTURE (Otis et al. 1978), is now 35 years old. Early methods did not use the spatial component in the data, the capture locations, and spatially explicit capture-recapture models (SECR, or just spatial capture-recapture, SCR) first appeared in 2004 (Efford 2004).

## BEST - Bayesian Estimation Supersedes the t-Test

John Kruschke's BEST code for R is a nice introduction to Bayesian thinking for folks used to t-tests. I've referred to it, linked to it, and used it in workshops before now.

The idea is to provide an R function which is as easy to use as t.test but which gives not a mere p-value but the kind of output Bayesians are used to - posterior probability distributions. John's BESTmcmc function uses JAGS, but handles all the preliminaries automatically and produces a result in a simple format.

## Animal activity patterns and overlap

A new R package called overlap, which estimates the overlap of animal activity patterns from camera-trap data, has now arrived on CRAN, R's central repository.

As soon as cameras with "data backs" came along in the early 90s, biologists realised that they could harvest data on the activity patterns of rare, secretive forest animals. Were they diurnal, nocturnal, crepuscular, or maybe cathemeral (active all around the clock)? More recently, people have tried to get clues about how species interact - competition or prey-predator interactions - from activity patterns, by examining the extent of overlap.

In our corner of the biological world, Martin Ridout and Matt Linkie published a paper (2009) on the activity patterns of tigers, clouded leopards and golden cats in Sumatra, with a lot of technical detail on how overlap could be quantified and confidence intervals estimated. They followed up (2011) with a paper on tigers and their prey, also in Sumatra. ...

## Comparing confidence intervals

Often we are interested in the difference between the means of two populations and whether we can infer from samples from the populations that the means are different.

This is often a silly question: the means of real populations are almost always different, even if the difference is microscopic. More useful would be to estimate the difference and the probability that it is big enough to be of practical importance. See the BEST software for a way to do this in R.

Sometimes we are presented with confidence intervals for each of the means. This happens in particular with the standard packages we use for wildlife data analysis, where the output includes confidence intervals for each coefficient or real value. Can we infer evidence of a difference from confidence intervals in the same way as for a p-value from a test of significance?
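One way to explore that question numerically is sketched below, assuming two independent estimates with made-up values and standard errors and a normal approximation. It illustrates a well-known point and is not necessarily the argument made in the full post.

```r
# Hypothetical estimates and standard errors for two populations
est <- c(A = 10, B = 13)
se <- c(A = 1, B = 1)

# 95% confidence interval for each mean (normal approximation)
ci <- cbind(lower = est - 1.96 * se, upper = est + 1.96 * se)

# Do the two intervals overlap?
overlap <- ci["A", "upper"] > ci["B", "lower"]

# Test of the difference, assuming the two estimates are independent
delta <- unname(est["B"] - est["A"])
seDelta <- sqrt(sum(se^2))
z <- delta / seDelta
pValue <- 2 * pnorm(-abs(z))

ci
c(overlap = overlap, z = z, pValue = pValue)
```

With these numbers the intervals overlap, yet the p-value for the difference is below 0.05, so overlapping intervals cannot simply be read as "no significant difference".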

## What if my data aren't normal?

Sometimes people I talk to are worried because their data aren't normally distributed, and they believe that they can't use the usual techniques such as t-tests or ANOVA without first transforming the data to be normal, or they must resort to non-parametric methods.

There are many good reasons for transforming data or NOT using t tests or F tests, but non-normal data is not usually one of them!

## Installing JAGS with AVG anti-virus software

A couple of people on a recent workshop had trouble with their AVG anti-virus software when installing JAGS 3.3.0.

This appears to be due to AVG's paranoia: see Martyn Plummer's comment. No malware is detected by McAfee AntiVirus Plus or Trend Micro Office Scan. See also information on false positives at the AVG forum.

## Displaying rasters in QGIS

In ecology and wildlife studies, a lot of our spatial data takes the form of rasters rather than vector files. When you first add a raster in QGIS, you usually get a plain grey rectangle, or maybe just a grey outline on a white background, as most raster file formats have no styling information. To make sense of a raster, you need to change the style.

Here I'll give some hints for "quick-and-dirty" styling to display the contents of a raster. For a more detailed tutorial, see here.

## Creating a GIS layer for "Distance from..." in QGIS

In a recent post, I showed how to deal with "distance from..." data in GIS layers using the R packages for handling spatial information. The example I used there involved

1. producing a layer with distance-from-nearest-road as a habitat layer so that we can calculate a probability of occupancy layer, and

2. extracting distance-from-nearest-road for each of our cameras in order to model our data.

Here we will see how to do the same thing in QGIS.

## Importing data into R for home range analysis

At our recent workshop on Geographical Information Systems (GIS) using Quantum GIS we had a number of people interested in working with radio telemetry or GPS data to model animal home ranges. The home range plugin for QGIS doesn't work with current versions, at least with Windows.

It is designed to pass data to R and get the adehabitat package to do the home range estimation and pass the result back to QGIS. QGIS uses Python code, and to get it to talk to R requires a bit of software called "RPy2". This was always difficult to set up on Windows, but Python has been upgraded and RPy2 no longer works. In any case, the adehabitat package has been replaced by new packages with a wider range of options.

So now it's better to prepare spatial data in QGIS, read the files into R, process with adehabitatHR, write the results to new files, and load into QGIS.

## Creating a GIS layer for "Distance from..."

We recently ran a workshop on Geographical Information Systems (GIS) using Quantum GIS for ecologists and wildlife researchers. For many species, distance from water, a road, forest edge, or a settlement may be an important habitat variable.

For example, we may be using automatic cameras to investigate occupancy of sites by leopards. Probability of occupancy may depend on distance from the nearest road. Given vector layers with roads and camera locations, we want to do two things:

1. produce a layer with distance-from-nearest-road as a habitat layer so that we can calculate a probability of occupancy layer, and

2. extract distance-from-nearest-road for each of our cameras in order to model our data.
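The two steps can be sketched in base R with plain geometry, assuming projected coordinates in metres and a road stored as a matrix of vertices. This swaps in simple point-to-segment distances for the spatial packages; the functions `distToSegment` and `distToRoad` and all the coordinates are hypothetical.

```r
# Distance from point (px, py) to the segment from (x1, y1) to (x2, y2)
distToSegment <- function(px, py, x1, y1, x2, y2) {
  dx <- x2 - x1
  dy <- y2 - y1
  len2 <- dx^2 + dy^2
  # Position of the closest point along the segment, clamped to [0, 1]
  pos <- if (len2 == 0) 0 else ((px - x1) * dx + (py - y1) * dy) / len2
  pos <- max(0, min(1, pos))
  sqrt((px - (x1 + pos * dx))^2 + (py - (y1 + pos * dy))^2)
}

# Distance from a point to the nearest part of a road (matrix of vertices)
distToRoad <- function(px, py, road) {
  nSeg <- nrow(road) - 1
  min(sapply(seq_len(nSeg), function(i)
    distToSegment(px, py, road[i, 1], road[i, 2], road[i + 1, 1], road[i + 1, 2])))
}

# Hypothetical road and camera locations
road <- cbind(x = c(0, 1000, 1000), y = c(0, 0, 1000))
cameras <- data.frame(x = c(500, 1300, -300), y = c(400, 500, -400))

# 1. Habitat layer: distance from road for every cell centre in a 100 m grid
cells <- expand.grid(x = seq(-500, 1500, by = 100), y = seq(-500, 1500, by = 100))
cells$distRoad <- mapply(distToRoad, cells$x, cells$y, MoreArgs = list(road = road))

# 2. Distance from road for each camera
cameras$distRoad <- mapply(distToRoad, cameras$x, cameras$y, MoreArgs = list(road = road))
cameras
```

For real data with many roads and large rasters, dedicated spatial tools will be much faster than this loop over segments.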

## Mathematical formulae with MathJax

I sometimes need to put formulae into my web pages, and I've been exploring the use of MathJax.

In the past I've typed the formula into MS Word with MS Equation 3.0, taken a screen capture, cropped the image to the formula I want, saved it as a .GIF file, and then displayed it on the web page as an image. So I get something like this for the Poisson distribution:

[Image: formula for the Poisson distribution]

That's not ideal. If I want to change anything, I have to start all over again from Word. It's also messy if I want to put a formula image inline in the text; for a start it doesn't line up properly. MathJax allows me to type the formula in LaTeX style directly into the HTML code for my web page.
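As an illustration, the Poisson probability mass function could be typed in LaTeX style roughly as follows. The symbols (λ for the mean, k for the count) and the `$$` display delimiters are assumptions, not necessarily what the original image or page used.

```latex
$$ P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \ldots $$
```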

## Format for data files

I have a collection of data sets for use during workshops or just to play with when trying out new statistical techniques or computer code.

A big question is what format to use, and I've changed my mind on this several times already!

After looking at this blog post by John Mount I've decided to try using tab-separated files with a .tsv extension.
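Writing and reading such files needs only base R. A small sketch, with a made-up data frame and a temporary file name:

```r
# Hypothetical data set
dat <- data.frame(site = c("A", "B", "C"),
                  species = c("tiger", "clouded leopard", "golden cat"),
                  detections = c(2L, 0L, 5L),
                  stringsAsFactors = FALSE)

# Write as a tab-separated file with a .tsv extension
tsvFile <- file.path(tempdir(), "example.tsv")
write.table(dat, tsvFile, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back; read.delim() uses tab as the separator by default
dat2 <- read.delim(tsvFile, stringsAsFactors = FALSE)
all.equal(dat, dat2)
```

Values containing spaces, such as "clouded leopard", survive the round trip without quoting because only tabs separate the fields.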