This notebook will demonstrate two ways of dealing with larger volumes of data by caching retrieved into a local SQLite database.
A functional definition of “large data” might include:
To manage large downloads and avoid downloading data twice, caladaptR provides ca_getvals_db()
. ca_getvals_db()
is very similar to ca_getvals_tbl()
, but saves the data into a SQLite database file on your computer as it comes in. Before making an API request, it checks to see if the data have already been downloaded, and if so skips it. If you get disconnected during a download, you can run the command again and it’ll just pick up where it left off. ca_getvals_db()
returns a ‘remote tibble’, which functions very similar to a regular ‘in-memory’ tibble however it points to the SQLite database file on disk.
For additional info on working with SQLite databases, see the Article on making Large Queries.
In this example we’ll download daily temperature data for 16 Congressional Districts in the LA region.
Load caladaptR and the other package we’re going to need. (If you haven’t installed these yet, see this setup script).
library(caladaptr)
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
Registered S3 methods overwritten by 'htmltools':
method from
print.html tools:rstudio
print.shiny.tag tools:rstudio
print.shiny.tag.list tools:rstudio
Registered S3 method overwritten by 'htmlwidgets':
method from
print.htmlwidget tools:rstudio
caladaptr (version 0.6.6)
URL: https://ucanr-igis.github.io/caladaptr
Bug reports: https://github.com/ucanr-igis/caladaptr/issues
library(units)
udunits database from C:/Users/Andy/Documents/R/win-library/4.1/units/share/udunits/udunits2.xml
library(ggplot2)
library(dplyr)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
library(lubridate)
Attaching package: ‘lubridate’
The following objects are masked from ‘package:base’:
date, intersect, setdiff, union
library(sf)
Linking to GEOS 3.9.1, GDAL 3.2.1, PROJ 7.2.1; sf_use_s2() is TRUE
library(tidyr)
library(tmap)
library(conflicted)
conflict_prefer("filter", "dplyr", quiet = TRUE)
conflict_prefer("count", "dplyr", quiet = TRUE)
conflict_prefer("select", "dplyr", quiet = TRUE)
The LA Congressional district boundaries are saved in the ‘data’ folder as a geopackage:
cdist_la_fn <- "./data/cdistricts_la.gpkg"
file.exists(cdist_la_fn)
[1] TRUE
cdist_la_sf <- st_read(cdist_la_fn)
Reading layer `cdistricts_la' from data source
`D:\GitHub\cal-adapt\caladaptr-res\docs\workshops\ca_intro_apr22\notebooks\data\cdistricts_la.gpkg'
using driver `GPKG'
Simple feature collection with 15 features and 7 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -118.9449 ymin: 33.70403 xmax: -117.3529 ymax: 34.8233
Geodetic CRS: WGS 84
tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(cdist_la_sf) + tm_borders()
Pro Tip:
Here we use ca_loc_sf()
as the location function for an API request for 30-years of modeled daily temperature data (minimum and maximum) for 4 GCMs and 2 RCPs:
cdist_la_cap <- ca_loc_sf(loc = cdist_la_sf, idfld = "geoid") %>%
ca_gcm(gcms[1:4]) %>%
ca_scenario(c("rcp45", "rcp85")) %>%
ca_period("day") %>%
ca_years(start = 2070, end = 2099) %>%
ca_cvar(c("tasmin", "tasmax")) %>%
ca_options(spatial_ag = "mean")
cdist_la_cap
Cal-Adapt API Request
Location(s):
Simple Feature MULTIPOLYGON (15 feature(s))
ID field: geoid
Variable(s): tasmin, tasmax
Temporal aggregration period(s): day
GCM(s): HadGEM2-ES, CNRM-CM5, CanESM2, MIROC5
Scenario(s): rcp45, rcp85
Dates: 2070-01-01 to 2099-12-31
Options:
spatial ag: mean
Do the standard plotting and preflight checks:
plot(cdist_la_cap, locagrid = TRUE)
cdist_la_cap %>% ca_preflight()
General issues
- none found
Issues for querying values
- none found
Issues for downloading rasters
- none found
To copy downloaded data into a database, use ca_getvals_db()
instead of ca_getvals_tbl()
. ca_getvals_db()
has two arguments which are mandatory:
db_fn
- the file name of a SQLite database (will be created if it doesn’t exist)
db_tbl
- the name of a table inside the database where the values should be saved
my_database_fn <- "./data/cdist_la_temp_data.sqlite"
cdist_la_rtbl <- cdist_la_cap %>%
ca_getvals_db(db_fn = my_database_fn,
db_tbl = "temp_data",
quiet = FALSE)
|
| | 0%
|
|= | 1%
|
|== | 2%
|
|=== | 3%
|
|==== | 3%
|
|==== | 4%
|
|===== | 5%
|
|====== | 5%
|
|====== | 6%
|
|======= | 6%
|
|======= | 7%
|
|======== | 7%
|
|======== | 8%
|
|========= | 8%
|
|========= | 9%
|
|========== | 9%
|
|========== | 10%
|
|=========== | 10%
|
|============ | 11%
|
|============ | 12%
|
|============= | 12%
|
|============= | 13%
|
|============== | 13%
|
|=============== | 14%
|
|================ | 15%
|
|================= | 16%
|
|================== | 17%
|
|=================== | 18%
|
|==================== | 18%
|
|==================== | 19%
|
|===================== | 20%
|
|====================== | 21%
|
|======================= | 21%
|
|======================= | 22%
|
|======================== | 22%
|
|======================== | 23%
|
|========================= | 23%
|
|========================= | 24%
|
|========================== | 24%
|
|========================== | 25%
|
|=========================== | 25%
|
|=========================== | 26%
|
|============================ | 26%
|
|============================ | 27%
|
|============================= | 27%
|
|============================= | 28%
|
|============================== | 28%
|
|=============================== | 29%
|
|=============================== | 30%
|
|================================ | 30%
|
|================================ | 31%
|
|================================= | 31%
|
|================================== | 32%
|
|=================================== | 33%
|
|==================================== | 34%
|
|===================================== | 35%
|
|====================================== | 36%
|
|======================================= | 36%
|
|======================================= | 37%
|
|======================================== | 38%
|
|========================================= | 38%
|
|========================================= | 39%
|
|========================================== | 39%
|
|========================================== | 40%
|
|=========================================== | 40%
|
|=========================================== | 41%
|
|============================================ | 41%
|
|============================================ | 42%
|
|============================================= | 42%
|
|============================================= | 43%
|
|============================================== | 43%
|
|============================================== | 44%
|
|=============================================== | 44%
|
|=============================================== | 45%
|
|================================================ | 45%
|
|================================================ | 46%
|
|================================================= | 46%
|
|================================================== | 47%
|
|=================================================== | 48%
|
|=================================================== | 49%
|
|==================================================== | 49%
|
|===================================================== | 50%
|
|====================================================== | 51%
|
|======================================================= | 51%
|
|======================================================= | 52%
|
|======================================================== | 53%
|
|========================================================= | 54%
|
|========================================================== | 54%
|
|========================================================== | 55%
|
|=========================================================== | 55%
|
|=========================================================== | 56%
|
|============================================================ | 56%
|
|============================================================ | 57%
|
|============================================================= | 57%
|
|============================================================= | 58%
|
|============================================================== | 58%
|
|============================================================== | 59%
|
|=============================================================== | 59%
|
|=============================================================== | 60%
|
|================================================================ | 60%
|
|================================================================ | 61%
|
|================================================================= | 61%
|
|================================================================= | 62%
|
|================================================================== | 62%
|
|=================================================================== | 63%
|
|=================================================================== | 64%
|
|==================================================================== | 64%
|
|===================================================================== | 65%
|
|====================================================================== | 66%
|
|======================================================================= | 67%
|
|======================================================================== | 68%
|
|========================================================================= | 69%
|
|========================================================================== | 69%
|
|========================================================================== | 70%
|
|=========================================================================== | 70%
|
|=========================================================================== | 71%
|
|============================================================================ | 72%
|
|============================================================================= | 72%
|
|============================================================================= | 73%
|
|============================================================================== | 73%
|
|============================================================================== | 74%
|
|=============================================================================== | 74%
|
|=============================================================================== | 75%
|
|================================================================================ | 75%
|
|================================================================================ | 76%
|
|================================================================================= | 76%
|
|================================================================================= | 77%
|
|================================================================================== | 77%
|
|================================================================================== | 78%
|
|=================================================================================== | 78%
|
|=================================================================================== | 79%
|
|==================================================================================== | 79%
|
|===================================================================================== | 80%
|
|====================================================================================== | 81%
|
|====================================================================================== | 82%
|
|======================================================================================= | 82%
|
|======================================================================================== | 83%
|
|========================================================================================= | 84%
|
|========================================================================================== | 85%
|
|=========================================================================================== | 86%
|
|============================================================================================ | 87%
|
|============================================================================================= | 87%
|
|============================================================================================= | 88%
|
|============================================================================================== | 88%
|
|============================================================================================== | 89%
|
|=============================================================================================== | 90%
|
|================================================================================================ | 90%
|
|================================================================================================ | 91%
|
|================================================================================================= | 91%
|
|================================================================================================= | 92%
|
|================================================================================================== | 92%
|
|================================================================================================== | 93%
|
|=================================================================================================== | 93%
|
|=================================================================================================== | 94%
|
|==================================================================================================== | 94%
|
|==================================================================================================== | 95%
|
|===================================================================================================== | 95%
|
|====================================================================================================== | 96%
|
|====================================================================================================== | 97%
|
|======================================================================================================= | 97%
|
|======================================================================================================== | 98%
|
|========================================================================================================= | 99%
|
|==========================================================================================================| 100%
Inspect the results:
cdist_la_rtbl %>% head()
The number of rows we retrieved:
cdist_la_rtbl %>% count() %>% pull(n)
[1] 2629680
Many of the base R operations that work with in-memory tibbles may or many not work with remote tibbles. For example as we saw above cdist_la_rtbl %>% count()
works, but:
dim(cdist_la_rtbl)
[1] NA 8
nrow(cdist_la_rtbl)
[1] NA
In general, the best way to work with remote tibbles is with
DBI
packageSimple filtering, sorting, grouping and simple numeric summaries generally work fine with dplyr verbs:
cdist_la_rtbl %>%
filter(geoid == "0632") %>%
group_by(scenario, gcm, cvar) %>%
summarize(mean_temp = mean(val, na.rm = TRUE))
`summarise()` has grouped output by 'scenario', 'gcm'. You can override using the `.groups` argument.
Pro Tip:
collect()
at the end of a dplyr expression.
If your wrangling workflow involves a lot steps that are difficult or impossible to do with remote tibbles, a good strategy is to do your filtering and grouping with dplyr statements on the remote tibble, and then convert the results to a regular tibble with collect()
.
Below we convert the grouped summary table into a tibble so we can use pivot_wider()
(which doesn’t work on remote tibbles):
temp_long_tbl <- cdist_la_rtbl %>%
filter(geoid == "0632") %>%
group_by(scenario, gcm, cvar) %>%
summarize(mean_temp = mean(val, na.rm = TRUE)) %>%
collect()
`summarise()` has grouped output by 'scenario', 'gcm'. You can override using the `.groups` argument.
class(temp_long_tbl)
[1] "grouped_df" "tbl_df" "tbl" "data.frame"
temp_wide_tbl <- temp_long_tbl %>%
pivot_wider(names_from = cvar, values_from = mean_temp) %>%
mutate(tas_range = tasmax - tasmin)
temp_wide_tbl %>% head()
Create a histogram of the mean daily temperatures for one Congressional District and one emissions scenario, grouping the data by GCM. Does the distribution of mean average temperature look the same across GCMs?
## Your answer here