Tabula rasa

Thursday, 10 December 2020

How to perform CA and DCA in R

How to perform Correspondence Analysis (CA) and Detrended Correspondence Analysis (DCA) in R

Difference between CA and DCA based on the same species composition.

In this post, I will try to describe how to run unconstrained (indirect) ordinations such as Correspondence Analysis (CA)^1,2,3 and Detrended Correspondence Analysis (DCA)^4,5 using the vegan package⁶ in the R language⁷. Both CA and DCA are simple and straightforward to run, thanks to the vegan package. This post will only be useful if you are familiar with R's high-level and low-level plotting commands.

First of all, it is absolutely marvellous to be able to perform these multivariate analyses in R, because it is free for everyone, even though its learning curve is rather steep and getting accustomed to its environment can be painful. Furthermore, the vegan package for R is a splendid gift for those who need to employ multivariate analysis. If you are interested in the mathematical background, there are a couple of books available^8,9.

First, we need to install the vegan and rioja packages:

install.packages(c("vegan", "rioja"))

Here, I will use the RLGH data set from the rioja package, which contains 41 diatom taxa in 20 samples from the Round Loch of Glenhead (RLGH), a small yet beautiful lake in south-west Scotland.
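Before running any ordination, it can help to check that the data look as described. The lines below are only a small sketch, assuming rioja is installed; the checks themselves (dimensions, row totals, empty taxa) are my own choice of what to look at.

```r
# Quick sanity check of the RLGH species data (runs only if rioja is available)
if (requireNamespace("rioja", quietly = TRUE)) {
  data(RLGH, package = "rioja")
  spec <- RLGH$spec

  print(dim(spec))                # rows are samples, columns are taxa
  print(summary(rowSums(spec)))   # row totals show how the abundances are scaled
  print(any(colSums(spec) == 0))  # TRUE would mean a taxon with no occurrences
}
```

Taxa that never occur should be dropped before CA or DCA, since an all-zero column carries no information for the ordination.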


# *********************************************************************** #

library(vegan)
library(rioja)

data(RLGH)
spec <- RLGH$spec #assign a variable for species

dev.new(width = 10, height = 5) # 10 x 5 inch device; windows(10, 5) only works on Windows
par(mfrow=c(1,2))


# Run Correspondence Analysis (CA)
RLGH.ca <- cca(spec)
# set up a blank canvas first 
plot(RLGH.ca , type = "n", las=1, mgp=c(1.8, 0.3, 0), tcl= 0.25) 

# site scores for axis 1 and axis 2
points(RLGH.ca, display="sites", choices=c(1,2), col="black", bg="lightblue", pch=22,  cex=1.1, lwd=0.3)

# site names (optional; comment out the next line for a less cluttered plot)
ordipointlabel(RLGH.ca, display="sites", col="black", add = TRUE, cex=0.8)
print(RLGH.ca)

# Run Detrended Correspondence Analysis (DCA) 
RLGH.dca <- decorana(spec) 

# set up a blank canvas first 
#plot(RLGH.dca, type = "n", las=1, mgp=c(1.8, 0.3, 0), tcl= 0.25) 

plot(0:4, type = "n", las=1, mgp=c(1.8, 0.2, 0), tcl= 0.25, 
     xlab="DCA 1", ylab="DCA 2", xlim=c(-1,1), ylim=c(-1,1))

# site scores for axis 1 and axis 2 
points(RLGH.dca , display="sites", choices=c(1,2), col="black", bg="brown3", pch=24,  cex=1.1, lwd=0.3)

# site names (optional; comment out the next line for a less cluttered plot)
ordipointlabel(RLGH.dca, display="sites", col="black", add = TRUE, cex=0.8) 

print(RLGH.dca)

# *********************************************************************** #

Results from CA and DCA with the RLGH data set

I must admit that my code may look untidy, especially if you have little experience in R, yet it works. I did not plot sample names in the figure because they make the plot extremely messy. There is another trick that I generally use, but it is a rather mundane process, so I might share it in another post.

The clear difference between CA and DCA is the arch effect⁴. I am setting aside the infamous edge effect, another important issue in CA highlighted by Hill and Gauch (1980), because I just want to point out the arch effect, which is evidently present in this data set. As you can see in the CA plot on the left-hand side, the sample scores are positioned along an arch. DCA removes this arch and thus makes the ordination of the samples easier to interpret. If you keep digging into these ordination methods, you may come across many different opinions and perspectives. Although DCA is controversial⁹, this is not the place to discuss the matter. I will only say that the results from DCA are easy to interpret and help to decide whether the response to an environmental variable is linear or unimodal^10,11.
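To see where the arch comes from, it can be reproduced with artificial data. The sketch below is not the RLGH analysis: it simulates taxa with Gaussian responses along a single gradient (all numbers are invented) and computes CA by singular value decomposition in base R. The helper name ca_sites is my own, and its site scores are in principal coordinates, so their scaling may differ from what vegan's cca() plots.

```r
# Simulated coenocline: 30 sites on one gradient, 25 taxa with Gaussian responses.
# All values are invented for illustration; this is not the RLGH data.
gradient <- seq(0, 10, length.out = 30)   # site positions on the gradient
optima   <- seq(-1, 11, length.out = 25)  # taxon optima
tol      <- 1                             # common tolerance of all taxa
abund <- outer(gradient, optima, function(x, u) exp(-(x - u)^2 / (2 * tol^2)))

# Correspondence analysis via SVD of the standardised residuals (base R only)
ca_sites <- function(N, naxes = 2) {
  P  <- N / sum(N)                                  # correspondence matrix
  rw <- rowSums(P)                                  # site (row) masses
  cw <- colSums(P)                                  # taxon (column) masses
  S  <- (P - outer(rw, cw)) / sqrt(outer(rw, cw))   # standardised residuals
  sv <- svd(S, nu = naxes, nv = 0)
  # site scores in principal coordinates
  sweep(sv$u, 2, sv$d[seq_len(naxes)], "*") / sqrt(rw)
}

site_scores <- ca_sites(abund)
ca1 <- site_scores[, 1]
ca2 <- site_scores[, 2]

# How much of axis 2 is just a quadratic function of axis 1?
arch_fit <- lm(ca2 ~ ca1 + I(ca1^2))
arch_r2  <- summary(arch_fit)$r.squared
print(arch_r2)

plot(ca1, ca2, xlab = "CA 1", ylab = "CA 2", pch = 22, bg = "lightblue", las = 1)
```

There is only one real gradient in these data, so a second axis that is largely a curved function of the first adds no new ecological information. That artefact is what detrending is meant to remove.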

What does DCA say?

We would like to learn whether the biological data exhibit a linear or a unimodal response. The most important aspect of DCA here is the length of the first axis, expressed in standard deviation (SD) units. An axis length greater than 4 SD suggests that the data are nonlinear and that unimodal methods are suitable, whilst an axis length shorter than 3 SD suggests that linear methods are preferable. If the length of the first axis is between 3 SD and 4 SD, both linear and unimodal methods can be applied. Lepš & Šmilauer (2003) suggest this as a rule of thumb.
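The rule of thumb can be written as a small helper. This is only a sketch: the function name and the returned labels are my own. I also assume that the first-axis length can be read as the range of the DCA site scores on axis 1, which should correspond to the "Axis lengths" row printed by print(RLGH.dca).

```r
# Rule-of-thumb helper (hypothetical name): maps the DCA axis 1 length
# in SD units to the kind of method it suggests
suggest_response_model <- function(axis1_length) {
  stopifnot(is.numeric(axis1_length), length(axis1_length) == 1, axis1_length >= 0)
  if (axis1_length > 4) {
    "unimodal"
  } else if (axis1_length < 3) {
    "linear"
  } else {
    "either"   # between 3 and 4 SD: both families of methods are defensible
  }
}

# Applied to the RLGH data (runs only if vegan and rioja are installed)
if (requireNamespace("vegan", quietly = TRUE) &&
    requireNamespace("rioja", quietly = TRUE)) {
  data(RLGH, package = "rioja")
  dca   <- vegan::decorana(RLGH$spec)
  axis1 <- vegan::scores(dca, display = "sites", choices = 1)
  axis1_length <- diff(range(axis1))   # assumed definition of the axis length
  print(axis1_length)
  print(suggest_response_model(axis1_length))
}
```

For example, suggest_response_model(2.1) falls in the linear class and suggest_response_model(4.6) in the unimodal class. The thresholds are a guide rather than a test, so borderline values deserve a closer look at the data.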

Please note that these results can be considered reasonable only if you accept DCA as a valid method. Care must be taken to understand the data set at hand and the fundamental assumptions of the weighted averaging approach and of the detrending process.

References

1. Escofier-Cordier, B. (1969). L'analyse factorielle des correspondances. Cah. Bur. univ. Rech. opér. Univ. Paris, 13.

2. Benzécri, J. P. (1969). Statistical analysis as a tool to make patterns emerge from data. Methodologies of Pattern Recognition (Ed. by S. Watanabe), pp. 35-60. Academic Press, New York.

3. Hill, M. (1973). Reciprocal Averaging: An Eigenvector Method of Ordination. Journal of Ecology, 61(1), 237-249. doi:10.2307/2258931

4. Hill, M. O. & Gauch, H. G. 1980. Detrended correspondence analysis: An improved ordination technique. Vegetatio, 42, 47-58.

5. Gauch, H. G., Jr. 1982. Multivariate Analysis in Community Ecology. Cambridge, UK, Cambridge University Press, 298 p.

6. Oksanen, J., Blanchet, F. G., Friendly, M., Kindt, R., Legendre, P., McGlinn, D., Minchin, P. R., O'Hara, R. B., Simpson, G. L., Solymos, P., Stevens, M. H. H., Szoecs, E. & Wagner, H. 2019. vegan: Community Ecology Package. R package version 2.5-6. https://CRAN.R-project.org/package=vegan.

7. R Core Team 2020. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing.

8. Jongman, R. H. G., ter Braak, C. J. F. & van Tongeren, O. F. R. 1995. Data Analysis in Community and Landscape Ecology. Cambridge, Cambridge University Press.

9. Legendre, P. & Legendre, L. 2012. Chapter 9. Numerical Ecology, Amsterdam, Elsevier.

10. Birks, H. J. B. 1995. Statistical Modelling of Quaternary Science Data. In: Maddy, D. & Brew, J. S. (eds.) Statistical Modelling of Quaternary Science Data. Cambridge: Quaternary Research Association.

11. Birks, H. J. B. 1998. Numerical tools in palaeolimnology – Progress, potentialities, and problems. Journal of Paleolimnology, 20, 307-332.

12. Lepš, J. & Šmilauer, P. 2003. Multivariate Analysis of Ecological Data using CANOCO. Cambridge, Cambridge University Press.

Further Reading

Ordination Methods for Ecologists Website (Dr Mike Palmer, Oklahoma State University)

http://ordination.okstate.edu/DCA.htm

http://ordination.okstate.edu/CA.htm

http://ordination.okstate.edu/eigen.htm

Friday, 13 March 2020

Preparing a Reference List Using EndNote

EndNoteX9

EndNote is a widely used commercial reference management program. Although it is paid software, a licensed copy can often be obtained from a university's software services office (software download centre or IT website).

After installing the software, we can download the .ris or .enw source files of the references we need from Nature, Science Direct, Science, AGU or Google Scholar. Note that not every website provides the .enw format, so there is no harm in downloading the .ris format instead.
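A .ris file is plain text with one "TAG  - value" pair per line, so it can be inspected before it is imported. The sketch below parses a hand-written record in base R. The record is not a downloaded file: it contains only details mentioned in this post plus the author's full name, and the helper name parse_ris is my own.

```r
# A minimal, hand-written RIS record; a file downloaded from a publisher
# will normally contain more fields than this illustration
ris_lines <- c(
  "TY  - JOUR",
  "AU  - Wilson, J. Tuzo",
  "TI  - A New Class of Faults and their Bearing on Continental Drift",
  "JO  - Nature",
  "PY  - 1965",
  "UR  - https://www.nature.com/articles/207343a0",
  "ER  - "
)

# Split each "TAG  - value" line into a tag and a value (hypothetical helper)
parse_ris <- function(lines) {
  pattern <- "^[A-Z][A-Z0-9]  - ?"
  lines   <- lines[grepl(pattern, lines)]   # keep only well-formed tag lines
  tags    <- substr(lines, 1, 2)
  values  <- trimws(sub(pattern, "", lines))
  keep    <- tags != "ER"                   # ER only marks the end of a record
  data.frame(tag = tags[keep], value = values[keep], stringsAsFactors = FALSE)
}

record <- parse_ris(ris_lines)
print(record)
```

For a real download, the lines would come from readLines() on the saved file, for example readLines("wilson1965.ris"), where the file name is only a placeholder.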

First, we type the title of the article we need into the Google Scholar search box.

After finding the relevant article, we click the quotation mark icon...

and, in the window that opens, click the "EndNote" option to download the file in .enw format, or the "RefMan" option to download the .ris file type. However, if Google Scholar does not provide a reference file, the file can be downloaded from the website of the journal in which the article was published.

Let us go to the official web page of the article titled "A New Class of Faults and their Bearing on Continental Drift", which Tuzo Wilson published in Nature in 1965. https://www.nature.com/articles/207343a0

We click the "Cite this article" text...

and then the Download citation link to download the source reference file.

We double-click the downloaded file to load it into the EndNote library.

To use the reference we need in Microsoft Word, click the EndNoteX9 tab and use the "Insert Citation" button to open the window that lists the available references.

Select the publication to be cited from the list of publications in the library and click the "Insert" button.

EndNote automatically adds the cited publication to the very end of the Word document we are working on, formatted according to the selected reference style (Harvard).

To insert more than one citation at once, hold down CTRL to make a multiple selection.
In addition, to add a prefix or a short text after a citation, click the Edit & Manage Citation(s) button in the EndNoteX9 menu and fill in the prefix or suffix fields in the menu that opens.

The prefix "e.g.", used here as an example

Finally, to set the heading, font and text layout of the automatically generated reference list, and to link citations directly to it, click Tools > Configure Bibliography in the same window.

Configure Bibliography

In the first tab that opens, enabling the Link in-text citation to references in the bibliography option adds links to the citations. Clicking the Layout tab in the Configure Bibliography window lets you adjust the font, its size and the heading of the reference list (references or bibliography).

The steps shared here cover the basic elements of the software and may guide many new users. For more information, it will undoubtedly be very useful to consult the detailed documentation prepared by the vendor[1] and the community forum[2].

References
[1] https://researchsoftware.com/sites/researchsoftware.com/files/files/product_attachments/EndNote%20X9%20Windows%20Documentation.pdf

[2] https://community.endnote.com
