Tabula rasa

How to perform Correspondence Analysis (CA) and Detrended Correspondence Analysis (DCA) in R

Difference between CA and DCA based on the same species composition.

In this post, I will describe how to run unconstrained (indirect) ordinations, namely Correspondence Analysis (CA)¹,²,³ and Detrended Correspondence Analysis (DCA)⁴,⁵, using the vegan package⁶ in the R language⁷. Both CA and DCA are simple and straightforward to run, thanks to the vegan package. The post assumes that you are familiar with R's high-level and low-level plotting commands.

First of all, it is marvellous to be able to perform these multivariate analyses in R, because it is free for everyone, even though its learning curve is rather steep and getting used to its environment can be painful. Furthermore, the vegan package is a splendid gift for anyone who needs multivariate analysis. If you are interested in the mathematical background, there are a couple of books available⁷,⁸.

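First, we need to install the vegan and rioja packages. The package names must be passed as a single character vector, otherwise the second name is interpreted as the library path:

install.packages(c("vegan", "rioja"))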

Here, I will use an existing data set from the rioja package. It contains 41 diatom taxa in 20 samples from the Round Loch of Glenhead (RLGH), a small yet beautiful lake in south-west Scotland.
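Before running any ordination it can help to look at what the data object holds. The lines below are only a minimal sketch, assuming rioja is installed and that RLGH carries the `spec` element used in this post; the checks mirror the basic requirements of CA and DCA (non-negative abundances, no empty samples).

```r
library(rioja)

data(RLGH)
spec <- RLGH$spec

# number of rows and columns; the post describes 20 samples and 41 taxa
dim(spec)

# CA and DCA expect non-negative abundances
all(spec >= 0)

# a sample (row) that sums to zero makes cca() and decorana() stop
any(rowSums(spec) == 0)

# a taxon (column) that sums to zero carries no information
any(colSums(spec) == 0)
```

If either of the last two checks returns TRUE for your own data, drop the empty rows or columns before the ordination.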


# *********************************************************************** #

library(vegan)
library(rioja)

data(RLGH)
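spec <- RLGH$spec  # species (diatom) abundance table

# open a 10 x 5 inch graphics device; dev.new() works on any platform,
# whereas windows(10, 5) is only available on Windows
dev.new(width = 10, height = 5)
par(mfrow=c(1,2))  # two panels side by side: CA on the left, DCA on the right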


# Run Correspondence Analysis (CA)
RLGH.ca <- cca(spec)
# set up a blank canvas first 
plot(RLGH.ca , type = "n", las=1, mgp=c(1.8, 0.3, 0), tcl= 0.25) 

# site scores for axis 1 and axis 2
points(RLGH.ca, display="sites", choices=c(1,2), col="black", bg="lightblue", pch=22,  cex=1.1, lwd=0.3)

# site names
ordipointlabel(RLGH.ca, display="sites", col="black", add = TRUE, cex=0.8)
print(RLGH.ca)

# Run Detrended Correspondence Analysis (DCA) 
RLGH.dca <- decorana(spec) 

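# set up a blank canvas first; the default canvas would be:
#plot(RLGH.dca, type = "n", las=1, mgp=c(1.8, 0.3, 0), tcl= 0.25) 

# here an empty plot with custom axis labels is used instead;
# widen xlim and ylim if your site scores fall outside -1..1
plot(0:4, type = "n", las=1, mgp=c(1.8, 0.2, 0), tcl= 0.25, 
     xlab="DCA 1", ylab="DCA 2", xlim=c(-1,1), ylim=c(-1,1))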

# site scores for axis 1 and axis 2 
points(RLGH.dca , display="sites", choices=c(1,2), col="black", bg="brown3", pch=24,  cex=1.1, lwd=0.3)

# site names (optional: comment this line out if the labels clutter the plot)
ordipointlabel(RLGH.dca, display="sites", col="black", add = TRUE, cex=0.8) 

print(RLGH.dca)

# *********************************************************************** #

Results from CA and DCA with the RLGH data set

I must admit that my code looks untidy, especially if you have little experience with R, but it works well. In the figure I left out the sample names because they make the plot extremely messy; comment out the ordipointlabel() lines if you want the same result. There is another trick that I generally use, but it is a rather mundane process, so I might share it in another post.
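The plots are only one view of the results. If you prefer numbers, the eigenvalues and the sample scores can be pulled out of the fitted objects with vegan's eigenvals() and scores() functions. This is a small sketch under the same assumptions as the code above (RLGH from rioja, default scaling); the object names ca.eig, ca.sites and dca.sites are my own.

```r
library(vegan)
library(rioja)

data(RLGH)
spec <- RLGH$spec

RLGH.ca  <- cca(spec)
RLGH.dca <- decorana(spec)

# eigenvalues of the CA axes and the share of total inertia of axes 1 and 2
ca.eig <- as.numeric(eigenvals(RLGH.ca))
ca.eig[1:2]
ca.eig[1:2] / sum(ca.eig)

# site (sample) scores on axes 1 and 2 as plain matrices
ca.sites  <- scores(RLGH.ca,  display = "sites", choices = 1:2)
dca.sites <- scores(RLGH.dca, display = "sites", choices = 1:2)
head(ca.sites)
head(dca.sites)
```

These matrices are what points() draws in the plots, so they are also a convenient starting point for custom figures.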

The clearest difference between CA and DCA is the arch effect⁴. CA has a second well-known fault, the edge effect, which was also highlighted by Hill and Gauch (1980); I am setting it aside here and concentrating on the arch effect because it is evidently present in this data set. As you can see in the CA plot on the left-hand side, the sample scores are arranged in an arch. DCA removes this curvature, which makes the ordination of the samples easier to interpret. If you keep digging into these ordination methods you will come across many different opinions and perspectives. There is a controversy about DCA⁹, but this is not the place to discuss it. I will only say that the results from DCA are easy to interpret and help to decide whether the response in the data is better treated as linear or unimodal¹⁰,¹¹.
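The arch is easier to appreciate on data where the truth is known. The sketch below is not part of the RLGH analysis: it builds a hypothetical community table with a single gradient and Gaussian (unimodal) species responses, with all numbers chosen arbitrarily, and then checks how well CA axis 2 is described by a quadratic function of CA axis 1.

```r
library(vegan)

# Hypothetical coenocline: 50 samples along ONE gradient, 30 taxa with
# Gaussian response curves. All values are arbitrary illustration choices.
grad <- seq(0, 10, length.out = 50)   # sample positions on the gradient
opt  <- seq(-1, 11, length.out = 30)  # species optima
tol  <- 1.5                           # species tolerance (SD of each curve)

comm <- sapply(opt, function(u) exp(-(grad - u)^2 / (2 * tol^2)))
dimnames(comm) <- list(paste0("s", seq_along(grad)),
                       paste0("sp", seq_along(opt)))

sim.ca <- cca(comm)
sim.sc <- scores(sim.ca, display = "sites", choices = 1:2)

# Only one gradient exists, yet CA axis 2 follows a curve in CA axis 1.
# A high R-squared of this quadratic fit is the arch.
arch.fit <- lm(sim.sc[, 2] ~ poly(sim.sc[, 1], 2))
arch.r2  <- summary(arch.fit)$r.squared
arch.r2

plot(sim.sc, las = 1, xlab = "CA 1", ylab = "CA 2")
```

Replacing cca() with decorana() on the same table is a quick way to see what the detrending step does to the second axis.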

What does DCA say?

We would like to learn whether the biological data show a linear or a unimodal response. The most important result of DCA for this purpose is the length of the first axis, expressed in standard deviation (SD) units. An axis longer than 4 SD suggests that the data are nonlinear and that unimodal methods are suitable, whereas an axis shorter than 3 SD suggests that linear methods are preferable. If the length of the first axis is between 3 SD and 4 SD, both linear and unimodal methods can be applied. This rule of thumb is suggested by Lepš & Šmilauer (2003)¹².

Please note that these results are reasonable only if you accept DCA as a valid method. Take care to understand the data set at hand and the fundamental assumptions of the weighted averaging approach and of the detrending process.
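The rule of thumb is simple enough to write down in a few lines. In the sketch below, the length of the first axis is taken as the range of the DCA site scores on axis 1, which should agree with the "Axis lengths" row that print(RLGH.dca) reports. The helper suggest_response() is hypothetical, not a vegan function, and I have assumed that lengths of exactly 3 or 4 SD fall into the middle class.

```r
library(vegan)
library(rioja)

data(RLGH)
RLGH.dca <- decorana(RLGH$spec)

# length of DCA axis 1 in SD units = range of the site scores on that axis
dca1 <- scores(RLGH.dca, display = "sites", choices = 1)
axis1.length <- diff(range(dca1))

# hypothetical helper encoding the 3 SD / 4 SD rule of thumb
suggest_response <- function(len) {
  if (len > 4) {
    "unimodal"
  } else if (len < 3) {
    "linear"
  } else {
    "linear or unimodal"
  }
}

axis1.length
suggest_response(axis1.length)
```

The returned label is only a hint for choosing between linear and unimodal methods; it does not replace looking at the data.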

References

1. Escofier-Cordier, B. (1969). L'analyse factorielle des correspondances. Cah. Bur. univ. Rech. opér. Univ. Paris, 13.

2. Benzécri, J. P. (1969). Statistical analysis as a tool to make patterns emerge from data. In: Watanabe, S. (ed.) Methodologies of Pattern Recognition, pp. 35-60. New York: Academic Press.

3. Hill, M. O. (1973). Reciprocal averaging: an eigenvector method of ordination. Journal of Ecology, 61(1), 237-249. doi:10.2307/2258931

4. Hill, M. O. & Gauch, H. G. (1980). Detrended correspondence analysis: an improved ordination technique. Vegetatio, 42, 47-58.

5. Gauch, H. G., Jr. (1982). Multivariate Analysis in Community Ecology. Cambridge, UK: Cambridge University Press, 298 p.

6. Oksanen, J., Blanchet, F. G., Friendly, M., Kindt, R., Legendre, P., McGlinn, D., Minchin, P. R., O'Hara, R. B., Simpson, G. L., Solymos, P., Stevens, M. H. H., Szoecs, E. & Wagner, H. (2019). vegan: Community Ecology Package. R package version 2.5-6. https://CRAN.R-project.org/package=vegan

7. R Core Team (2020). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing.

8. Jongman, R. H. G., ter Braak, C. J. F. & van Tongeren, O. F. R. (1995). Data Analysis in Community and Landscape Ecology. Cambridge: Cambridge University Press.

9. Legendre, P. & Legendre, L. (2012). Numerical Ecology, Chapter 9. Amsterdam: Elsevier.

10. Birks, H. J. B. (1995). Statistical modelling of Quaternary science data. In: Maddy, D. & Brew, J. S. (eds.) Statistical Modelling of Quaternary Science Data. Cambridge: Quaternary Research Association.

11. Birks, H. J. B. (1998). Numerical tools in palaeolimnology – progress, potentialities, and problems. Journal of Paleolimnology, 20, 307-332.

12. Lepš, J. & Šmilauer, P. (2003). Multivariate Analysis of Ecological Data using CANOCO. Cambridge: Cambridge University Press.

Further Reading

Ordination Methods for Ecologists Website (Dr Mike Palmer, Oklahoma State University)

http://ordination.okstate.edu/DCA.htm

http://ordination.okstate.edu/CA.htm

http://ordination.okstate.edu/eigen.htm


Thursday, 10 December 2020
