Bias correction in species distribution models: pooling survey and collection data for multiple species.

William Fithian, Jane Elith, Trevor Hastie, David A Keith
Author Information
  1. William Fithian: Stanford University, Department of Statistics, 390 Serra Mall, Stanford, CA 94305, USA.
  2. Jane Elith: School of Botany, University of Melbourne, Parkville, VIC 3010, Australia.
  3. Trevor Hastie: Stanford University, Department of Statistics, 390 Serra Mall, Stanford, CA 94305, USA.
  4. David A Keith: Centre for Ecosystem Science, University of New South Wales, Sydney 2052, NSW, Australia.

Abstract

Presence-only records may provide data on the distributions of rare species, but commonly suffer from large, unknown biases due to their typically haphazard collection schemes. Presence-absence or count data collected in systematic, planned surveys are more reliable but typically less abundant. We propose a probabilistic model to allow for joint analysis of presence-only and survey data to exploit their complementary strengths. Our method pools presence-only and presence-absence data for many species and maximizes a joint likelihood, simultaneously estimating and adjusting for the sampling bias affecting the presence-only data. By assuming that the sampling bias is the same for all species, we can borrow strength across species to efficiently estimate the bias and improve our inference from presence-only data. We evaluate our model's performance on data for 36 eucalypt species in south-eastern Australia. We find that presence-only records exhibit a strong sampling bias towards the coast and towards Sydney, the largest city. Our data-pooling technique substantially improves the out-of-sample predictive performance of our model when presence-absence data for a given species are scarce. If we have only presence-only data and no presence-absence data for a given species, but both types of data for several other species that suffer from the same spatial sampling bias, then our method can obtain an unbiased estimate of the first species' geographic range.
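
The pooled likelihood described in the abstract can be illustrated with a minimal sketch. The abstract does not give the model's functional form, so everything below is an assumption made for illustration: a gridded log-linear Poisson model for presence-only counts, a complementary log-log link for survey detections, and a bias coefficient vector `delta` shared by all species. The function name, argument names, and array layouts are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def joint_neg_log_lik(alpha, beta, gamma, delta,
                      x_cells, z_cells, cell_area, po_counts,
                      x_sites, site_area, pa_obs):
    """Negative joint log-likelihood, up to an additive constant.

    Assumed (hypothetical) layout for K species, C grid cells, S survey sites:
      alpha, gamma : (K,)   species intercepts for abundance and for bias
      beta         : (K, p) species-specific environmental coefficients
      delta        : (q,)   bias coefficients SHARED by all species
      x_cells      : (C, p) environmental covariates per grid cell
      z_cells      : (C, q) bias covariates per grid cell (e.g. distance to coast)
      cell_area    : (C,)   area of each grid cell
      po_counts    : (K, C) presence-only record counts
      x_sites      : (S, p) environmental covariates at survey sites
      site_area    : float  area of one survey plot
      pa_obs       : (K, S) 0/1 detections, np.nan where a species was not surveyed
    """
    # Presence-only part: species intensity thinned by the sampling bias.
    log_lam_cells = alpha[:, None] + beta @ x_cells.T        # (K, C)
    log_bias = gamma[:, None] + (z_cells @ delta)[None, :]   # (K, C)
    mu = cell_area[None, :] * np.exp(log_lam_cells + log_bias)
    nll_po = np.sum(mu - po_counts * np.log(mu))             # Poisson, constant dropped

    # Presence-absence part: same intensity, no bias term.
    log_lam_sites = alpha[:, None] + beta @ x_sites.T        # (K, S)
    p_present = 1.0 - np.exp(-site_area * np.exp(log_lam_sites))
    p_present = np.clip(p_present, 1e-12, 1.0 - 1e-12)       # guard the logs
    surveyed = ~np.isnan(pa_obs)
    y = np.where(surveyed, pa_obs, 0.0)
    ll_pa = y * np.log(p_present) + (1.0 - y) * np.log1p(-p_present)
    nll_pa = -np.sum(ll_pa[surveyed])

    return nll_po + nll_pa
```

Because `delta` enters every species' presence-only term while `alpha` and `beta` enter both data types, survey data for well-sampled species help pin down the bias, which in turn corrects the presence-only fit for species with few or no survey records. In practice the parameters would be packed into one vector and passed to a general-purpose numerical optimizer.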

Grants

  1. R01 EB001988/NIBIB NIH HHS
  2. U54 EB020405/NIBIB NIH HHS
