Nonparametric Estimation and Comparison of Species Richness

Abstract

Species richness (the number of species) in an assemblage is a key metric in many research fields of ecology. Simple counts of species in samples typically underestimate the true species richness and depend strongly on sampling effort and sample completeness. When sampling effort is unequal and samples are incomplete, missing many species, there are two approaches to infer species richness and make fair comparisons among multiple assemblages: (1) An asymptotic approach via species richness estimation. This approach aims to compare species richness estimates across assemblages. We focus on the nonparametric estimators that are universally valid for all species abundance distributions. (2) A non‐asymptotic approach via sample‐size‐based and coverage‐based rarefaction and extrapolation, which standardises samples by sample size or by sample completeness (as measured by sample coverage). This approach aims to compare species richness estimates for equally large or equally complete samples. Two R packages (SpadeR and iNEXT) are applied to beetle data for illustration.

Key Concepts

  • Due to sampling limitation, there are undetected species in almost every biodiversity survey.
  • Empirical species counts underestimate species richness and depend strongly on sampling effort and sample completeness.
  • Based on incomplete samples, species richness (observed plus undetected) is statistically difficult to estimate accurately, especially for highly diverse assemblages with many rare species.
  • Abundant species (which are certain to be detected in samples) contain almost no information about the undetected species richness.
  • Rare species (which are likely to be either undetected or infrequently detected) contain nearly all the information about the undetected species richness.
  • Most nonparametric estimators of the number of undetected species are based on the frequency counts of the detected rare species, e.g. singletons and doubletons for abundance data.
  • Nonparametric estimators of species richness are universally valid for all species abundance distributions and thus are more robust than parametric estimators that are based on specified parametric abundance models.
  • Rarefaction and extrapolation methods allow for fair and meaningful comparison of species richness estimates for standardised samples on the basis of sample size or sample completeness.
  • Sample‐size‐based rarefaction and extrapolation methods aim to compare species richness estimates for equally large samples determined by samplers.
  • Coverage‐based rarefaction and extrapolation methods aim to compare species richness estimates for equally complete samples or equal fractions of population individuals reliably estimated from data.
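
As a rough illustration of how singletons and doubletons carry the information about undetected species, the Python sketch below computes the classic Chao1 lower bound (Chao, 1984), S_obs + f1²/(2 f2), and the simple Good–Turing estimate of sample coverage, 1 − f1/n. It is not the SpadeR or iNEXT code: the function names and the toy abundance vector are hypothetical, and the bias‐corrected form f1(f1 − 1)/2 used when there are no doubletons is one common convention assumed here.

```python
from collections import Counter


def frequency_counts(abundances):
    """Map k -> number of species observed exactly k times (k >= 1)."""
    return Counter(a for a in abundances if a > 0)


def chao1(abundances):
    """Chao1 lower bound of species richness for abundance data.

    Assumption: when no doubletons are present, fall back to the
    bias-corrected term f1 * (f1 - 1) / 2 to avoid dividing by zero.
    """
    counts = frequency_counts(abundances)
    s_obs = sum(counts.values())          # observed species richness
    f1 = counts.get(1, 0)                 # singletons
    f2 = counts.get(2, 0)                 # doubletons
    if f2 > 0:
        undetected = f1 * f1 / (2 * f2)
    else:
        undetected = f1 * (f1 - 1) / 2
    return s_obs + undetected


def good_turing_coverage(abundances):
    """Simple Good-Turing sample coverage estimate: 1 - f1 / n."""
    n = sum(abundances)                   # total number of individuals
    if n == 0:
        raise ValueError("sample contains no individuals")
    f1 = sum(1 for a in abundances if a == 1)
    return 1 - f1 / n


# Hypothetical toy sample: 9 species, 27 individuals, f1 = 4, f2 = 2.
toy = [10, 6, 3, 2, 2, 1, 1, 1, 1]
print(chao1(toy))                 # 9 + 16 / 4 = 13.0
print(good_turing_coverage(toy))  # 1 - 4 / 27, about 0.852
```

In this toy sample the abundant species contribute only to the observed count; the estimated number of undetected species (4) is driven entirely by f1 and f2.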

Keywords: abundance data; diversity; extrapolation; incidence data; interpolation; prediction; rarefaction; sample coverage; species richness; standardisation

Figure 1. Sample‐size‐based rarefaction (solid lines) and extrapolation (dashed lines) sampling curves with 95% confidence intervals (shaded areas, based on a bootstrap method with 200 replications) comparing beetle species richness between an old‐growth site and a second‐growth site (Janzen, 1973a,b). Observed samples are denoted by the solid dot and triangle. The extrapolation extends up to a maximum sample size of 1200. For each reference sample point, the numbers in parentheses show the x‐ and y‐axis coordinates. The estimated asymptote for each curve is shown next to the arrow at the right‐hand end of each curve.
Figure 2. Sample completeness curve, which depicts how sample completeness (measured by sample coverage) increases with sample size for beetle species data in an old‐growth site and a second‐growth site (Janzen, 1973a,b). For each site, the plot of sample coverage for rarefied samples (solid lines) and extrapolated samples (dashed lines) with 95% confidence intervals (shaded areas, based on a bootstrap method with 200 replications) is extrapolated up to a maximum sample size of 1200. The observed samples are denoted by the solid dot and triangle. For each reference sample point, the numbers in parentheses show the x‐ and y‐axis coordinates.
Figure 3. Coverage‐based rarefaction (solid lines) and extrapolation (dashed lines) sampling curves with 95% confidence intervals (shaded areas, based on a bootstrap method with 200 replications) comparing beetle species richness between an old‐growth site and a second‐growth site (Janzen, 1973a,b). Observed samples are denoted by the solid dot and triangle. The extrapolation extends up to the coverage value of the corresponding maximum sample size of 1200 in Figure 2 (86.6% in the old‐growth site, and 93.6% in the second‐growth site). For each reference sample point, the numbers in parentheses show the x‐ and y‐axis coordinates. The estimated asymptote for each curve is shown next to the arrow at the right‐hand end of each curve.
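
The solid rarefaction curves described in the captions above give the expected number of species in a subsample of m individuals. A minimal sketch, assuming the classical individual‐based rarefaction formula for sampling without replacement (Hurlbert, 1971), is given below in Python. The function name and the toy data are hypothetical; the extrapolated parts of the curves and the bootstrap confidence intervals are not reproduced, and this is not the iNEXT implementation.

```python
from math import comb


def rarefied_richness(abundances, m):
    """Expected species richness in a random subsample of m individuals.

    Uses E[S_m] = sum_i (1 - C(n - x_i, m) / C(n, m)), where x_i is the
    abundance of species i in the reference sample and n is its size.
    """
    counts = [a for a in abundances if a > 0]
    n = sum(counts)
    if not 0 <= m <= n:
        raise ValueError("m must lie between 0 and the reference sample size")
    total = comb(n, m)
    # comb(n - a, m) is 0 when m > n - a, i.e. the species cannot be missed.
    return sum(1 - comb(n - a, m) / total for a in counts)


# Hypothetical reference sample: two species with abundances 3 and 1.
toy = [3, 1]
for m in range(len(toy) + 3):
    print(m, rarefied_richness(toy, m))
```

At m equal to the reference sample size the function returns the observed richness, which corresponds to the reference sample point on each curve.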

References

Bulmer MG (1974) On fitting the Poisson lognormal distribution to species abundance data. Biometrics 30: 101–110.

Bunge J and Fitzpatrick M (1993) Estimating the number of species: a review. Journal of the American Statistical Association 88: 364–373.

Burnham KP and Overton WS (1978) Estimation of the size of a closed population when capture probabilities vary among animals. Biometrika 65: 625–633.

Burnham KP and Overton WS (1979) Robust estimation of population size when capture probabilities vary among animals. Ecology 60 (5): 927–936.

Chao A (1984) Nonparametric estimation of the number of classes in a population. Scandinavian Journal of Statistics 11 (4): 265–270.

Chao A (1987) Estimating the population size for capture‐recapture data with unequal catchability. Biometrics 43: 783–791.

Chao A and Lee S‐M (1992) Estimating the number of classes via sample coverage. Journal of the American Statistical Association 87: 210–217.

Chao A (2005) Species estimation and applications. In: Balakrishnan N, Read CB and Vidakovic B (eds) Encyclopedia of Statistical Sciences, pp. 7907–7916. New York: John Wiley & Sons, Inc.

Chao A and Chiu C‐H (2012) Estimation of species richness and shared species richness. In: Balakrishnan N (ed.) Methods and Applications of Statistics in the Atmospheric and Earth Sciences, pp. 76–111. New York: John Wiley & Sons, Inc.

Chao A and Jost L (2012) Coverage‐based rarefaction and extrapolation: standardizing samples by completeness rather than size. Ecology 93 (12): 2533–2547.

Chao A and Lin C‐W (2012) Nonparametric lower bounds for species richness and shared species richness under sampling without replacement. Biometrics 68 (3): 912–921.

Chao A, Gotelli NJ, Hsieh TC, et al. (2014) Rarefaction and extrapolation with Hill numbers: a framework for sampling and estimation in species diversity studies. Ecological Monographs 84 (1): 45–67.

Chao A, Chiu C‐H, Hsieh TC, et al. (2015) Rarefaction and extrapolation of phylogenetic diversity. Methods in Ecology and Evolution 6: 380–388.

Chiarucci A, Enright NJ, Perry GLW, et al. (2003) Performance of nonparametric species richness estimators in a high diversity plant community. Diversity and Distributions 9 (4): 283–295.

Chiarucci A, Bacaro G, Rocchini D, et al. (2008) Discovering and rediscovering the sample‐based rarefaction formula in the ecological literature. Community Ecology 9 (1): 121–123.

Chiu C‐H and Chao A (2016) Estimating and comparing microbial diversity when singletons are subject to sequencing errors. PeerJ 4: e1634.

Chiu C‐H, Wang YT, Walther BA, et al. (2014) An improved nonparametric lower bound of species richness via a modified Good–Turing frequency formula. Biometrics 70 (3): 671–682.

Coleman BD, Mares MA, Willig MR, et al. (1982) Randomness, area, and species richness. Ecology 63 (4): 1121–1133.

Colwell RK and Coddington JA (1994) Estimating terrestrial biodiversity through extrapolation. Philosophical Transactions of the Royal Society of London B – Biological Sciences 345: 101–118.

Colwell RK, Chao A, Gotelli NJ, et al. (2012) Models and estimators linking individual‐based and sample‐based rarefaction, extrapolation and comparison of assemblages. Journal of Plant Ecology 5 (1): 3–21.

Cormack RM (1989) Log‐linear models for capture‐recapture. Biometrics 45 (2): 395–413.

Darroch JN and Ratcliff D (1980) A note on capture‐recapture estimation. Biometrics 36: 149–153.

Esty WW (1986) The efficiency of Good's nonparametric coverage estimator. The Annals of Statistics 14: 1257–1260.

Fisher RA, Corbet AS and Williams CB (1943) The relation between the number of species and the number of individuals in a random sample of an animal population. Journal of Animal Ecology 12: 42–58.

Good IJ (1953) The population frequencies of species and the estimation of population parameters. Biometrika 40: 237–264.

Good IJ and Toulmin G (1956) The number of new species and the increase of population coverage when a sample is increased. Biometrika 43: 45–63.

Good IJ (2000) Turing's anticipation of empirical Bayes in connection with the cryptanalysis of the naval Enigma. Journal of Statistical Computation and Simulation 66 (2): 101–111.

Gotelli NJ and Colwell RK (2001) Quantifying biodiversity: procedures and pitfalls in the measurement and comparison of species richness. Ecology Letters 4 (4): 379–391.

Gotelli NJ and Colwell RK (2011) Estimating species richness. In: Magurran AE and McGill BJ (eds) Biological Diversity: Frontiers in Measurement and Assessment, pp. 39–54. Oxford: Oxford University Press.

Gotelli NJ and Chao A (2013) Measuring and estimating species richness, species diversity, and biotic similarity from sampling data. In: Levin SA (ed.) Encyclopedia of Biodiversity, 2nd edn, vol. 5, pp. 195–211. Waltham, MA: Academic Press.

Heck KL Jr, van Belle G and Simberloff D (1975) Explicit calculation of the rarefaction diversity measurement and the determination of sufficient sample size. Ecology 56: 1459–1461.

Hurlbert SH (1971) The nonconcept of species diversity: a critique and alternative parameters. Ecology 52 (4): 577–586.

Janzen DH (1973a) Sweep samples of tropical foliage insects: description of study sites, with data on species abundances and size distributions. Ecology 54: 659–686.

Janzen DH (1973b) Sweep samples of tropical foliage insects: effects of seasons, vegetation types, elevation, time of day, and insularity. Ecology 54: 687–708.

Magurran AE (2004) Measuring Biological Diversity. Oxford: Blackwell.

Magurran AE and McGill BJ (2011) Biological Diversity: Frontiers in Measurement and Assessment. Oxford: Oxford University Press.

O'Hara RB (2005) Species richness estimators: how many species can dance on the head of a pin? Journal of Animal Ecology 74 (2): 375–386.

Ord JK and Whitmore GA (1986) The Poisson‐inverse Gaussian distribution as a model for species abundance. Communications in Statistics A – Theory and Methods 15 (3): 853–871.

Palmer MW (1991) Estimating species richness: the second‐order jackknife reconsidered. Ecology 72 (4): 1512–1513.

Pielou E (1977) Mathematical Ecology. New York: John Wiley & Sons, Inc.

Preston FW (1948) The commonness and rarity of species. Ecology 29 (3): 254–283.

Sanders HL (1968) Marine benthic diversity: a comparative study. American Naturalist 102: 243–282.

Shen T‐J, Chao A and Lin J‐F (2003) Predicting the number of new species in further taxonomic sampling. Ecology 84 (3): 798–804.

Sichel HS (1997) Modeling species‐abundance frequencies and species‐individual functions with the generalized inverse Gaussian‐Poisson distribution. South African Statistical Journal 31 (1): 13–37.

Simberloff D (1979) Rarefaction as a distribution‐free method of expressing and estimating diversity. In: Grassle JF, Patil GP, Smith WK and Taillie C (eds) Ecological Diversity in Theory and Practice, pp. 159–176. Fairland, MD: International Cooperative Publishing House.

Smith W and Grassle JF (1977) Sampling properties of a family of diversity measures. Biometrics 33 (2): 283–292.

Walther BA and Moore JL (2005) The concepts of bias, precision and accuracy, and their use in testing the performance of species richness estimators, with a literature review of estimator performance. Ecography 28 (6): 815–829.

Xu H, Liu S, Li Y, et al. (2012) Assessing non‐parametric and area‐based methods for estimating regional species richness. Journal of Vegetation Science 23 (6): 1006–1012.

Further Reading

Chao A, Colwell RK, Lin CW, et al. (2009) Sufficient sampling for asymptotic minimum species richness estimators. Ecology 90 (4): 1125–1133.

Chao A, Chiu C‐H and Jost L (2014) Unifying species diversity, phylogenetic diversity, functional diversity, and related similarity and differentiation measures through Hill numbers. Annual Review of Ecology, Evolution, and Systematics 45: 297–324.

Chao A and Jost L (2015) Estimating diversity and entropy profiles via discovery rates of new species. Methods in Ecology and Evolution 6 (8): 873–882.

Hughes JB, Hellmann JJ, Ricketts TH, et al. (2001) Counting the uncountable: statistical approaches to estimating microbial diversity. Applied and Environmental Microbiology 67 (10): 4399–4406.

Jost L, Chao A and Chazdon RL (2011) Compositional similarity and beta diversity. In: Magurran A and McGill B (eds) Biological Diversity: Frontiers in Measurement and Assessment, pp. 66–84. Oxford: Oxford University Press.

Legendre P and Legendre L (2012) Numerical Ecology, 3rd edn. Amsterdam: Elsevier.

How to Cite

Chao, Anne, and Chiu, Chun‐Huo (May 2016) Nonparametric Estimation and Comparison of Species Richness. In: eLS. John Wiley & Sons Ltd, Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0026329]