Often, when presented with a set of population dynamics data, we may wish to investigate the abiotic conditions, such as temperature or humidity, under which those dynamics occurred.
Ideally this sort of data would have been collected at the same time as the dynamics data. However, when data are collected for one project and then repurposed for another, this is often not possible.
In such scenarios it is possible to link the location of the data with other measurements of abiotic variables in the area.
These can be retrieved from raster files provided by projects such as the ERA5 data store from the Copernicus project; however, this is often a large amount of data for a small number of data points.
An alternative method for gathering this data is through the AREAdata project, which provides historical and forecast data aggregated at different spatial scales.
The downloads for AREAdata are generally quite large, but rarely updated. As such it’s worthwhile keeping these files around if you’ll use them regularly.
To do this, let's run a command to set up a local cache.
Now you’ve done this, your cached data will not be deleted when you close R (though you must run this command once per session).
When you run this on your own computer, you will receive a message giving instructions on how to permanently set this using your .Rprofile file.
Now you have set up your cache, let’s take a look at the format of data retrieved from AREAdata (we will deconstruct this command properly in a bit):
ad_df <- fetch_ad(metric = "temp", gid = 0, use_cache = TRUE)
# Display only a brief summary of the data
summary(ad_df)
## <ohvbd.ad.matrix>
## Areadata matrix for temp at gid level 0 .
## Cached: FALSE
## Dates: 2020-01-01 -> 2025-12-31 (2192)
## Locations: 256
So that is quite a lot of data! It is far too much to print here, given that it contains 561,152 entries. However, if you want to print all the data rather than just a summary, you can use the print() command.
AREAdata outputs are formatted as matrices. Here the rows correspond to locations (identified by their GADM names):
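The location names are simply the matrix row names, so they can be inspected with base R. A quick check (assuming the ad_df object fetched above):

```r
# Peek at the first few location (row) names
head(rownames(ad_df))
```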
## [1] "Aruba" "Afghanistan" "Angola" "Anguilla" "Åland"
## [6] "Albania"
Meanwhile the column names correspond to the day that the data refers to:
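Likewise, the dates are the matrix column names; a quick check on the same object:

```r
# Peek at the first few dates (column names)
head(colnames(ad_df))
```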
## [1] "2020-01-01" "2020-01-02" "2020-01-03" "2020-01-04" "2020-01-05"
## [6] "2020-01-06"
So the value at ad_df["Angola", "2020-01-01"]
corresponds to the value of the metric (in this case temperature) in
Angola on 2020-01-01:
## [1] 29.33611
fetch_ad()

That call above to fetch_ad() used some slightly odd parameters, so let's break it down a bit.
The arguments we provided were:

metric

The metric argument tells fetch_ad() what type of data it should retrieve from AREAdata.
There are many valid options for this, including temperature and humidity (see AREAdata or run ?fetch_ad in the R console for a full list).
gid

The gid argument tells fetch_ad() the spatial scale to download (with higher numbers corresponding to finer scales, at the cost of larger download sizes).
use_cache and cache_location

As the AREAdata files are large and monolithic, it is often advantageous to download data at a given spatial scale only once.
We can do this by caching the files and retrieving them as needed.
Whilst you could do this manually, ohvbd already has the
functionality to perform this behind the scenes.
If we set use_cache = TRUE then instead of downloading
from the repository immediately, ohvbd will first try to
load any cached version that may be present.
Typically, the cache location is a folder in your user directory,
obtained from tools::R_user_dir(). However, if you want to
override it (for example, to share the cache between different
scripts), you can provide an alternative path to save data to.
Extracting data from this matrix can be fiddly, but luckily
ohvbd also provides a function to aid in that:
glean_ad().
So let’s again try to extract the same data as previously, the temperature in Angola on 2020-01-01:
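Based on the arguments described above and the output that follows, the call was presumably along these lines (a sketch inferred from the output, not taken verbatim from the source):

```r
# Extract the temperature in Angola on 2020-01-01
ad_df |> glean_ad(targetdate = "2020-01-01", places = "Angola")
```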
## <ohvbd.ad.matrix>
## Areadata matrix for temp at gid level 0 .
## Cached: FALSE
## Dates: 2020-01-01 -> 2020-01-01 (1)
## Locations: 1
## Data:
## 2020-01-01
## Angola 29.33611
## attr(,"metric")
## [1] "temp"
## attr(,"gid")
## [1] 0
## attr(,"cached")
## [1] FALSE
## attr(,"db")
## [1] "ad"
## attr(,"writetime")
## [1] "2026-02-05 15:31:57 GMT"
## attr(,"class")
## [1] "ohvbd.ad.matrix" "matrix" "array"
Note: ignore the attr part of the output; this data
just allows ohvbd to ensure consistency between its
functions.
Wonderful! But what if we want a range of dates, or many places? What if the dates fall outside the dates that we have data for, or the places are not actually present in the data?
This is where glean_ad() really shines.
Let’s first look at the flexibility of the date arguments to
glean_ad().
If, for example, we want more than one day, we can specify either a
vector of dates or a less exact date as targetdate:
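For the vector-of-dates case, the output below was presumably produced by a call like this (the specific dates are inferred from the output):

```r
# Ask for three specific dates at once
ad_df |> glean_ad(
  targetdate = c("2020-01-01", "2020-01-02", "2020-01-05"),
  places = "Angola"
)
```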
## <ohvbd.ad.matrix>
## Areadata matrix for temp at gid level 0 .
## Cached: FALSE
## Dates: 2020-01-01 -> 2020-01-05 (3)
## Locations: 1
## Data:
## 2020-01-01 2020-01-02 2020-01-05
## Angola 29.33611 28.77514 28.13407
## attr(,"metric")
## [1] "temp"
## attr(,"gid")
## [1] 0
## attr(,"cached")
## [1] FALSE
## attr(,"db")
## [1] "ad"
## attr(,"writetime")
## [1] "2026-02-05 15:31:57 GMT"
## attr(,"class")
## [1] "ohvbd.ad.matrix" "matrix" "array"
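The second output, covering all of January 2020, presumably came from the "less exact date" form, where a partial date is expanded to every matching day:

```r
# A partial date ("2020-01") matches every day in that month
ad_df |> glean_ad(targetdate = "2020-01", places = "Angola")
```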
## <ohvbd.ad.matrix>
## Areadata matrix for temp at gid level 0 .
## Cached: FALSE
## Dates: 2020-01-01 -> 2020-01-31 (31)
## Locations: 1
## Data:
## 2020-01-01 2020-01-02 2020-01-03 2020-01-04 2020-01-05 2020-01-06
## Angola 29.33611 28.77514 28.88312 27.4325 28.13407 28.64046
## 2020-01-07 2020-01-08 2020-01-09 2020-01-10 2020-01-11 2020-01-12
## Angola 28.09983 28.40937 28.71222 27.21757 26.85815 28.92744
## 2020-01-13 2020-01-14 2020-01-15 2020-01-16 2020-01-17 2020-01-18
## Angola 29.5375 28.88849 28.81174 29.81402 29.89741 29.5939
## 2020-01-19 2020-01-20 2020-01-21 2020-01-22 2020-01-23 2020-01-24
## Angola 29.87222 30.28832 30.18386 29.98904 29.97556 29.63855
## 2020-01-25 2020-01-26 2020-01-27 2020-01-28 2020-01-29 2020-01-30
## Angola 29.53917 29.00056 28.81855 29.47186 28.44818 28.64745
## 2020-01-31
## Angola 28.34594
## attr(,"metric")
## [1] "temp"
## attr(,"gid")
## [1] 0
## attr(,"cached")
## [1] FALSE
## attr(,"db")
## [1] "ad"
## attr(,"writetime")
## [1] "2026-02-05 15:31:57 GMT"
## attr(,"class")
## [1] "ohvbd.ad.matrix" "matrix" "array"
# 366 days of data are returned, as 2020 is a leap year
length(ad_df |> glean_ad(targetdate = "2020", places = "Angola"))
## [1] 366
If we would instead like to specify a date range, we can add the
argument enddate.
Note: the end date is defined exclusively, meaning that any date before it (and on or after the target date) is considered to be “in”.
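Given the exclusive end date, the output below presumably came from a call like the following (dates inferred from the output):

```r
# enddate is exclusive, so this returns 2020-01-01 through 2020-01-03
ad_df |> glean_ad(
  targetdate = "2020-01-01",
  enddate = "2020-01-04",
  places = "Angola"
)
```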
## <ohvbd.ad.matrix>
## Areadata matrix for temp at gid level 0 .
## Cached: FALSE
## Dates: 2020-01-01 -> 2020-01-03 (3)
## Locations: 1
## Data:
## 2020-01-01 2020-01-02 2020-01-03
## Angola 29.33611 28.77514 28.88312
## attr(,"metric")
## [1] "temp"
## attr(,"gid")
## [1] 0
## attr(,"cached")
## [1] FALSE
## attr(,"db")
## [1] "ad"
## attr(,"writetime")
## [1] "2026-02-05 15:31:57 GMT"
## attr(,"class")
## [1] "ohvbd.ad.matrix" "matrix" "array"
Data can also be extracted for multiple places at once by providing a vector of place names:
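The three-location output below presumably came from a call like this (places inferred from the output; note that spaces in names are acceptable as input even though the row names use underscores):

```r
# Extract one day for three countries at once
ad_df |> glean_ad(
  targetdate = "2020-01-01",
  places = c("Angola", "Latvia", "United Kingdom")
)
```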
## <ohvbd.ad.matrix>
## Areadata matrix for temp at gid level 0 .
## Cached: FALSE
## Dates: 2020-01-01 -> 2020-01-01 (1)
## Locations: 3
## Data:
## 2020-01-01
## Angola 29.336107
## Latvia 1.765062
## United_Kingdom 5.113141
## attr(,"metric")
## [1] "temp"
## attr(,"gid")
## [1] 0
## attr(,"cached")
## [1] FALSE
## attr(,"db")
## [1] "ad"
## attr(,"writetime")
## [1] "2026-02-05 15:31:57 GMT"
## attr(,"class")
## [1] "ohvbd.ad.matrix" "matrix" "array"
This can be combined with the date filters above to create a more restrictive extraction:
ad_df |> glean_ad(
targetdate = "2020-01-01",
enddate = "2020-01-04",
places = c("Angola", "Latvia", "United Kingdom")
)
## <ohvbd.ad.matrix>
## Areadata matrix for temp at gid level 0 .
## Cached: FALSE
## Dates: 2020-01-01 -> 2020-01-03 (3)
## Locations: 3
## Data:
## 2020-01-01 2020-01-02 2020-01-03
## Angola 29.336107 28.775145 28.883123
## Latvia 1.765062 3.268600 2.733880
## United_Kingdom 5.113141 8.206171 6.198989
## attr(,"metric")
## [1] "temp"
## attr(,"gid")
## [1] 0
## attr(,"cached")
## [1] FALSE
## attr(,"db")
## [1] "ad"
## attr(,"writetime")
## [1] "2026-02-05 15:31:57 GMT"
## attr(,"class")
## [1] "ohvbd.ad.matrix" "matrix" "array"
The glean_ad() function is also relatively resilient to
the use of dates or places that are not in AREAdata:
ad_df |> glean_ad(
targetdate = "2023-12-30",
enddate = "2024-01-02",
places = c("Atlantis", "Latvia", "United Kingdom")
)
## <ohvbd.ad.matrix>
## Areadata matrix for temp at gid level 0 .
## Cached: FALSE
## Dates: 2023-12-30 -> 2024-01-01 (3)
## Locations: 2
## Data:
## 2023-12-30 2023-12-31 2024-01-01
## United_Kingdom 7.007509 7.715774 7.124182
## Latvia 4.098629 -2.789258 -7.665634
## attr(,"metric")
## [1] "temp"
## attr(,"gid")
## [1] 0
## attr(,"cached")
## [1] FALSE
## attr(,"db")
## [1] "ad"
## attr(,"writetime")
## [1] "2026-02-05 15:31:57 GMT"
## attr(,"class")
## [1] "ohvbd.ad.matrix" "matrix" "array"
On top of providing a download interface for AREAdata,
ohvbd provides tools for easing the process of associating
your own data with data from AREAdata.
This is performed through the use of the assoc_ad()
function.
For example, let's imagine we have some data and their associated locations:
lonlatdf <- data.frame(
Country = c("UK", "DE", "US", "ES", "IT", "BG", "US"),
Latitude = c(
53.813727033336384, 50.94133730102917, 41.502614374776414, 37.09584576240546,
46.190816816324634, 43.40408987260468, 34.02921305111613
),
Longitude = c(
-1.5631510640531983, 6.95792487354786, -73.96228644647785,
-2.0967211553577694, 11.135228310071971, 28.148385835241964,
-84.36091597146886
),
count = c(6, 36, 340, 202, 802, 541, 325)
)
lonlatdf
## Country Latitude Longitude count
## 1 UK 53.81373 -1.563151 6
## 2 DE 50.94134 6.957925 36
## 3 US 41.50261 -73.962286 340
## 4 ES 37.09585 -2.096721 202
## 5 IT 46.19082 11.135228 802
## 6 BG 43.40409 28.148386 541
## 7 US 34.02921 -84.360916 325
If we know that we collected samples between the 4th and 5th of August 2021, we can associate the temperature values from these regions on these days at a given spatial scale.
Let’s do this at the country scale:
areadata <- fetch_ad(metric="temp", gid=0, use_cache=TRUE)
ad_extract_working <- assoc_ad(lonlatdf, areadata,
targetdate = c("2021-08-04"), enddate=c("2021-08-06"),
gid=0, lonlat_names = c("Longitude", "Latitude"))
# Suppress lonlat just to make the output easier to read
ad_extract_working |> dplyr::select(!c("Longitude", "Latitude"))
## Country count temp_2021.08.04 temp_2021.08.05
## 1 UK 6 15.98993 16.09847
## 2 DE 36 17.17353 18.25541
## 3 US 340 26.12418 26.02746
## 4 ES 202 26.00963 27.42062
## 5 IT 802 26.21531 24.99068
## 6 BG 541 29.31433 31.18788
## 7 US 325 26.12418 26.02746
If we prefer a finer resolution (e.g. county-level), we could change
the gid argument to a higher number (indicating a finer
scale, up to 2):
areadata_gid2 <- fetch_ad(metric="temp", gid=2, use_cache=TRUE)
ad_extract_working_gid2 <- assoc_ad(lonlatdf, areadata_gid2,
targetdate = c("2021-08-04"), enddate=c("2021-08-06"),
gid=2, lonlat_names = c("Longitude", "Latitude"))
ad_extract_working_gid2 |> dplyr::select(!c("Longitude", "Latitude"))
## Country count temp_2021.08.04 temp_2021.08.05
## 1 UK 6 16.35346 16.46636
## 2 DE 36 16.62339 18.94270
## 3 US 340 20.59405 20.87954
## 4 ES 202 29.42114 31.26401
## 5 IT 802 20.82837 23.16447
## 6 BG 541 25.01201 25.38486
## 7 US 325 23.69151 24.70063
We can see here that by using a finer spatial scale, the two entries from the US now have different associated temperatures.
Interacting with AREAdata requires the caching of a significant amount of data.
This is generally managed by ohvbd itself, and in most
cases you should not need to worry about it.
However, if you want to free up some space, guarantee fresh data,
or see what data ohvbd currently has stored on your
computer, you have a few options available to you.
refresh_cache for fetch_ad()

The fetch_ad() function has an argument called
refresh_cache. If this is set to TRUE, it
will force a refresh of the cache at download time (and suppress loading
from said cache):
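The two timestamps below presumably come from comparing the writetime attribute (seen in the summaries earlier) before and after a forced refresh. A sketch of what that might look like, as the exact code is not shown here:

```r
# writetime of the previously cached download
attr(ad_df, "writetime")

# Force a fresh download, ignoring any cached copy
ad_fresh <- fetch_ad(metric = "temp", gid = 0, use_cache = TRUE, refresh_cache = TRUE)
attr(ad_fresh, "writetime")
```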
## [1] "2026-02-05 15:31:57 GMT"
## [1] "2026-02-05 15:32:46 GMT"
Alternatively you can use the provided tools
list_ohvbd_cache() and clean_ohvbd_cache() to
explore and delete files within the ohvbd cache:
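The listing below was presumably produced by calling the listing function with no arguments:

```r
# Show every file ohvbd currently has cached
list_ohvbd_cache()
```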
── Cached files ────────────────────────
Cache location: ../cache/R/ohvbd
── <root>: 0 files ──
none
── adcache: 3 files ──
• relhumid-1.rda
• temp-0.rda
• temp-2.rda
── gadmcache: 6 files ──
• gadm-counties.gpkg
• gadm-countries.gpkg
• gadm-states.gpkg
── vecdyn: 1 file ──
• vd_meta_table.rds
You can filter this by subdirectory, so let’s just look at the data
downloaded from AREAdata (in adcache):
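A sketch of the filtered listing; note that the argument name used to select a subdirectory is an assumption here, so check ?list_ohvbd_cache for the real signature:

```r
# List only the AREAdata subdirectory of the cache
# NOTE: the "subdir" argument name is assumed, not confirmed
list_ohvbd_cache(subdir = "adcache")
```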
── Cached files ────────────────────────
Cache location: ../cache/R/ohvbd
── adcache: 3 files ────────────────────
• relhumid-1.rda
• temp-0.rda
• temp-2.rda
If we wished to delete these files, we could simply run the following:
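Presumably a call like the following, which prompts for confirmation before deleting; the subdirectory argument name is again an assumption (see ?clean_ohvbd_cache):

```r
# Delete the cached AREAdata files (asks for confirmation first)
# NOTE: the "subdir" argument name is assumed, not confirmed
clean_ohvbd_cache(subdir = "adcache")
```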
── Cached files ────────────────────────
Cache location: ../cache/R/ohvbd
── adcache: 3 files ──
• relhumid-1.rda
• temp-0.rda
• temp-2.rda
! This will permanently delete files from adcache
ℹ Are you sure? [y/N]
>> y
ℹ Clearing files from ../cache/R/ohvbd/adcache
✔ Removed 3 files