Working with Cal-Adapt Climate Data in R:
Large Queries

Querying Large Data

Imagine you want to extract the climate data for 36,000 vernal pool locations.

Several issues arise when querying a large number (1000s) of locations: many separate API calls are needed, the query can take a long time to run, and an interrupted connection can mean starting over.

General Strategies

1) Aggregate point features by LOCA grid cells

  • The same API call can be used for all points in the same LOCA grid cell (see the sketch after this list)

2) Download rasters

  • ca_getrst_stars()
  • Although the download takes longer, data extraction and geoprocessing may be faster locally (see the sketch after this list)

3) Save values in a local SQLite database

  • ca_getvals_db()
  • Values are saved as they are received
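
A minimal sketch of strategy 1, assuming a hypothetical sf point layer named vernal_pools_sf. It uses ca_locagrid_geom() from caladaptr to fetch the LOCA grid as polygons; the grid cell id column name ("id") is an assumption to verify against your version of the package.

library(caladaptr)
library(sf)
library(dplyr)

## Fetch the LOCA grid cells as an sf polygon layer
locagrid_sf <- ca_locagrid_geom()

## Tag each point with the id of the LOCA grid cell it falls in
## (vernal_pools_sf is a hypothetical point layer; reproject to match)
pools_cells <- vernal_pools_sf %>% 
  st_transform(st_crs(locagrid_sf)) %>% 
  st_join(locagrid_sf["id"])

## Keep one representative point per occupied cell; query only these,
## then join the climate values back to all points by cell id
query_pts <- pools_cells %>% distinct(id, .keep_all = TRUE)

Strategy 2 becomes a one-liner once you have an API request object; this sketch assumes a request named my_api_req and saves the downloaded TIFs to the working directory:

tif_files <- my_api_req %>% ca_getrst_stars(out_dir = ".")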


Saving Values to a Local Database

Use ca_getvals_db() Instead of ca_getvals_tbl()

Sample usage:

my_vals <- my_api_req %>% 
  ca_getvals_db(db_fn = "my_data.sqlite",   ## SQLite file to create or append to
                db_tbl = "daily_vals",      ## table to store the values in
                new_recs_only = TRUE)       ## skip records already downloaded

new_recs_only = TRUE → if the connection is interrupted, rerunning the command picks up where it left off, fetching only records not already in the database

ca_getvals_db() returns a ‘remote tibble’ linked to a local database

Work with ‘remote tibbles’ using many of the same techniques as regular tibbles (with a few exceptions)
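
For example, the summary below is computed inside the database, and only the result is pulled into memory by collect(). This is a sketch; the column names (gcm, scenario, val) are assumptions that depend on your API request.

library(dplyr)

avg_by_gcm <- my_vals %>% 
  group_by(gcm, scenario) %>% 
  summarize(avg_val = mean(val, na.rm = TRUE)) %>% 
  collect()     ## bring the (small) summary back as a regular tibble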

ca_db_info() and ca_db_indices() help you view and manage database files
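
For example, to print a summary of the database behind my_vals (a sketch; depending on your version of caladaptr, ca_db_info() may also accept the SQLite file name):

ca_db_info(my_vals)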

See the Large Queries Vignette for details


Notebook 3: Large Queries

In Notebook 3 you will:

  • query using an sf polygon object
  • download climate values into a SQLite database
  • summarize the values in a remote tibble with dplyr statements

Notebook 3. Large Queries and Rasters (solutions available)
