Last week
- We saw how to do spatial cross validation in
tidymodels
- We downloaded occurrence points for a species using GBIF
Occurrence Data
- Up until now every model of occurrence we have done has used
Presence / Absence data
- We had sites where we knew the species was present and where we knew
it was absent
- The model could then estimate variation in the ‘probability’ of
occurrence across different sites
- The occurrence points we downloaded are only places where we know the
species is present (there are no absence data)
How to Model Presence-Only Data
- If we tried to put the data through a model as is, the fitting
procedure would be unable to choose sensible parameter values
- There is no variation in the response for the model to explain
- Therefore any model that always predicts presence will be
equally good
- We need to introduce variation
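To see the problem concretely, here is a minimal sketch in base R using simulated data (the covariate `x` and the sample size are made up for illustration). Every response is a presence, so a logistic regression has nothing to separate and simply pushes all fitted probabilities toward 1, whatever the covariate values are.

```r
set.seed(1)

# Hypothetical presence-only data: one covariate, every record is a presence
x <- rnorm(50)
y <- rep(1, 50)

# glm() warns here (e.g. about fitted probabilities of 0 or 1), which is itself
# a sign that the data do not constrain the model
fit <- suppressWarnings(glm(y ~ x, family = binomial))

# All fitted probabilities end up essentially equal to 1
range(fitted(fit))
```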
What Really Varies in Presence-Only Data?
- Density!
- Points have higher density in some areas and lower in others
What is Density?
- Density is a continuous analog of counts
- Density is a count per unit area, in the limit of infinitesimal area
(for a two dimensional space)
- To estimate a density over a continuous space, we need to use
integration
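One way to write the two statements above, using notation introduced here for illustration: $\lambda(s)$ is the density at location $s$, $N(A)$ is the number of points falling in a region $A$, and $|A|$ is the area of $A$.

```latex
\[
\lambda(s) \;=\; \lim_{|A| \to 0,\; s \in A} \frac{\mathbb{E}[N(A)]}{|A|},
\qquad
\mathbb{E}[N(B)] \;=\; \int_{B} \lambda(s)\, ds
\]
```

The first expression is the "count per unit area" limit. The second says that the expected count in any region $B$ is recovered by integrating the density over it.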
Quadrature
- A simple way to estimate an integral is to sample a space randomly
or regularly, evaluate the function at each sample point, and then take a
weighted sum of those values.
- Noting that models estimate the expectation of a function (which can
be formulated as a weighted sum), we can make our model predict the
density of points in space by using random or regular ‘background’
points, often called ‘pseudo-absences’.
- Estimating true density requires carefully choosing observation
weights and applying them during model fitting, but not all
models allow for weighting.
- Without appropriate weighting we can still say that the model
produces estimates that are proportional to density, which is often good
enough.
- We often just want to know where species are more or less likely to
be, not the exact probability (although this is useful too, when
possible).
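The ideas above can be sketched in base R on simulated data. Everything here is an assumption made for illustration: the unit-square study area, the "true" density `lambda()`, the regular grid of background points, and the weighting scheme (tiny weights on presences, area-share weights on background points, fitted with a weighted Poisson GLM, an approach sometimes called down-weighted Poisson regression). It is one possible way to apply observation weights, not necessarily the one used later in this course.

```r
set.seed(42)

# Hypothetical 'true' density on the unit square; it depends only on x
b0 <- log(1000)
b1 <- 2
lambda <- function(x) exp(b0 + b1 * x)

# Simulate presence points by thinning a uniform random pattern
lambda_max <- lambda(1)
n_cand <- rpois(1, lambda_max)
cand <- data.frame(x = runif(n_cand), y = runif(n_cand))
pres <- cand[runif(n_cand) < lambda(cand$x) / lambda_max, ]
pres$present <- 1
pres$w <- 1e-6               # tiny weight for observed points

# Regular background points: cell midpoints of a 100 x 100 grid
mid <- (seq_len(100) - 0.5) / 100
bg <- expand.grid(x = mid, y = mid)
bg$present <- 0
bg$w <- 1 / nrow(bg)         # each point stands for an equal share of the area (total = 1)

# Quadrature: a weighted sum over background points approximates the integral
# of the density, i.e. the expected number of points in the square
expected_n <- sum(bg$w * lambda(bg$x))

# Weighted Poisson regression on presence + background points
dat <- rbind(pres, bg)
fit <- suppressWarnings(     # non-integer responses trigger warnings from glm()
  glm(present / w ~ x, family = poisson(), weights = w, data = dat,
      control = glm.control(maxit = 100))
)

expected_n   # quadrature estimate of the expected count
nrow(pres)   # simulated count, for comparison
coef(fit)    # compare with b0 and b1 above
```

With the weights in place, the fitted coefficients should land near the simulated `b0` and `b1`, so predictions are on the scale of density itself. Dropping the weights would generally leave the slope informative but shift the intercept, which is the "proportional to density" case.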
Example of Pseudo-absences
## Download file size: 0 MB
## On disk at ./0107668-220831081235567.zip
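library(dplyr)    # select() and %>%
library(sf)       # st_as_sf()
library(mapview)  # mapview()
## convert to sf
sachsia <- sachsia %>%
  select(long = decimalLongitude,
         lat = decimalLatitude) %>%
  st_as_sf(coords = c("long", "lat"), crs = 4326)  # EPSG:4326 = WGS 84 lon/lat
mapview(sachsia)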
Sample Pseudo-Absences
sachsia_dat <- sdm_data(sachsia, bg = bg, n = 10000)
sachsia_dat
## Simple feature collection with 10032 features and 2 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -81.56721 ymin: 24.4205 xmax: -80.16127 ymax: 25.79203
## Geodetic CRS: WGS 84
## # A tibble: 10,032 × 3
## pnts present pnt_origin
## * <POINT [°]> <fct> <chr>
## 1 (-80.69476 25.39784) present data
## 2 (-80.63322 25.40324) present data
## 3 (-80.6333 25.40328) present data
## 4 (-80.69095 25.39616) present data
## 5 (-80.62141 25.39301) present data
## 6 (-80.38955 25.54626) present data
## 7 (-80.63322 25.40085) present data
## 8 (-80.51267 25.43387) present data
## 9 (-80.657 25.40336) present data
## 10 (-80.39395 25.54607) present data
## # … with 10,022 more rows
mapview(sachsia_dat, zcol = "present")
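The 10,032 rows in the output above appear consistent with 32 presence records plus the 10,000 background points requested with `n = 10000`. As a rough illustration of the idea only (this is not the implementation of `sdm_data()`), the base R sketch below samples background points uniformly inside a bounding box and stacks them under the presence points. The coordinates, the bounding box, and the `"absent"` and `"pseudo"` labels are all hypothetical. A real workflow would usually sample within a polygon, and points drawn uniformly in longitude and latitude are only approximately uniform in area.

```r
set.seed(1)

# Hypothetical presence coordinates (stand-ins for real occurrence records)
pres <- data.frame(long = c(-80.69, -80.63, -80.39),
                   lat  = c(25.40, 25.40, 25.55))

# Hypothetical background region: a simple bounding box
bbox <- c(xmin = -81.6, xmax = -80.1, ymin = 24.4, ymax = 25.8)

# Random background ('pseudo-absence') points inside the box
n_bg <- 1000
bg_pts <- data.frame(
  long = runif(n_bg, bbox[["xmin"]], bbox[["xmax"]]),
  lat  = runif(n_bg, bbox[["ymin"]], bbox[["ymax"]])
)

# Stack presences and background points, labelling where each row came from
dat <- rbind(
  cbind(pres,   present = "present", pnt_origin = "data"),
  cbind(bg_pts, present = "absent",  pnt_origin = "pseudo")
)
dat$present <- factor(dat$present, levels = c("present", "absent"))

table(dat$present)
```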
Add environmental variables
sachsia_dat <- add_env_vars(sachsia_dat, bioclim_fl)
sachsia_dat
## Simple feature collection with 10032 features and 21 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -81.56721 ymin: 24.4205 xmax: -80.16127 ymax: 25.79203
## Geodetic CRS: WGS 84
## # A tibble: 10,032 × 22
## pnts present pnt_origin BIO1 BIO2 BIO3 BIO4 BIO5 BIO6
## * <POINT [°]> <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (-80.69476 25.39784) present data 24.2 28.0 20.0 1360 235 39
## 2 (-80.63322 25.40324) present data 24.2 27.9 19.9 1386 238 35
## 3 (-80.6333 25.40328) present data 24.2 27.9 19.9 1386 238 35
## 4 (-80.69095 25.39616) present data 24.2 28.0 20.0 1364 235 38
## 5 (-80.62141 25.39301) present data 24.2 27.9 20.0 1388 238 37
## 6 (-80.38955 25.54626) present data 24.3 27.9 20.2 1464 246 35
## 7 (-80.63322 25.40085) present data 24.2 27.9 19.9 1386 238 35
## 8 (-80.51267 25.43387) present data 24.2 27.9 20 1417 241 37
## 9 (-80.657 25.40336) present data 24.2 27.9 20.0 1376 237 36
## 10 (-80.39395 25.54607) present data 24.3 27.9 20.1 1467 246 35
## # … with 10,022 more rows, and 13 more variables: BIO7 <dbl>, BIO8 <dbl>,
## # BIO9 <dbl>, BIO10 <dbl>, BIO11 <dbl>, BIO12 <dbl>, BIO13 <dbl>,
## # BIO14 <dbl>, BIO15 <dbl>, BIO16 <dbl>, BIO17 <dbl>, BIO18 <dbl>,
## # BIO19 <dbl>
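Conceptually, adding environmental variables means looking up, for each point, the value of the raster cell it falls in. The base R sketch below illustrates that lookup on a tiny made-up grid. It is not how `add_env_vars()` is implemented, and the grid extent, cell values, and point coordinates are hypothetical. It also shows why missing values can appear: a point may fall in a cell with no data or outside the raster entirely.

```r
# Hypothetical 3 x 4 raster of one variable; rows run north to south,
# columns run west to east
xmin <- 0; xmax <- 4; ymin <- 0; ymax <- 3
vals <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)
vals[1, 4] <- NA   # a cell with no data (e.g. ocean)

# Hypothetical points; the last one lies outside the raster
pts <- data.frame(x = c(0.5, 3.5, 2.2, 5.0),
                  y = c(2.5, 2.5, 0.1, 1.0))

# Column index from x, row index (counted from the bottom) from y
x_breaks <- seq(xmin, xmax, length.out = ncol(vals) + 1)
y_breaks <- seq(ymin, ymax, length.out = nrow(vals) + 1)
col_i    <- findInterval(pts$x, x_breaks, rightmost.closed = TRUE)
row_up   <- findInterval(pts$y, y_breaks, rightmost.closed = TRUE)
row_i    <- nrow(vals) + 1 - row_up   # flip: matrix row 1 is the northern edge

# Only look up points that fall inside the grid; the rest stay NA
inside <- col_i >= 1 & col_i <= ncol(vals) & row_up >= 1 & row_up <= nrow(vals)
pts$value <- NA_real_
pts$value[inside] <- vals[cbind(row_i[inside], col_i[inside])]

pts
```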
Remove NAs
## drop_na() is from tidyr: remove points with a missing value in any bioclim variable
sachsia_dat <- sachsia_dat %>%
  drop_na(BIO1:BIO19)
mapview(st_sf(sachsia_dat), zcol = "present")