Write values from an API request to a local database

Usage

ca_getvals_db(
  x,
  db_fn,
  db_tbl,
  omit_col = NULL,
  indices = NULL,
  new_recs_only = TRUE,
  trans_len = 100,
  lookup_tbls = TRUE,
  lookup_ret_joined = TRUE,
  pause_n = 1000,
  pause_secs = 60,
  write_sidecar = TRUE,
  stop_on_err = TRUE,
  quiet = FALSE,
  debug = FALSE
)

Arguments

x: A Cal-Adapt API request
db_fn: File name of a SQLite database. See Details.
db_tbl: The name of a database table. See Details.
omit_col: Columns to exclude from the tibble
indices: Name of fields to index. See Details.
new_recs_only: Write new records only to the database. See Details.
trans_len: Number of APIs calls per write transaction. See Details.
lookup_tbls: Use lookup tables
lookup_ret_joined: Return a table with lookup table fields, ignored if lookup_tbls = FALSE. See Details.
pause_n: Number of API calls after which a built-in pause is triggered. See Details.
pause_secs: Number of seconds to pause. See Details.
write_sidecar: Save table metadata in a separate file. See Details.
stop_on_err: Stop if the server returns an error
quiet: Suppress messages
debug: Print additional output at the console

Value

A remote tibble linked to the SQLite database.

Details

ca_getvals_db fetches data from the Cal-Adapt API and writes the data to a SQLite database as they're received. This allows you to fetch relatively large volumes of data in the background, and potentially over multiple sessions as it will pick up where it left off if interrupted. Saving the values in a database also reduces the risk of exhausting your RAM.

Use ca_getvals_db to fetch large volumes of data (i.e., hundreds of thousands of values), or whenever you'd like to keep a local copy of the data. Note however for small amounts of data there is no advantage to putting it in a database as it will be slighly slower to retrieve and work with.

db_fn should be a file name with path to a SQLite database. A SQLIte database is a single file typically with a .db or .sqlite extension. If the database doesn't exist, it will be created. If it already exists, the new data will be added to it.

db_tbl should the name of a table within the database where the new data will be saved. The table name should not contain special characters and spaces are discouraged. If new_recs_only = TRUE, only new records will be added to the database.

trans_len defines the number of API calls per SQLite transaction (i.e. how many API calls of data to accumulate before doing a write operation to the database). This can speed things up. Set it to 0 to disable transactions.

If lookup_tbls = TRUE, the database will create lookup tables for categorical columns such as GCM, scenario, cvar, period, slug, etc. This can dramatically reduce the size of the SQLite database file and is generally recommended. id lookup_ret_joined = TRUE, the tibble returned will have the lookup tables joined (i.e., column names will be unaltered); if not the returned tibble will have id values for certain values. A small text file is created for each SQLite database containing the names of the tables and SQL statement to join them (read automatically by ca_db_read).

indices is a vector of column names in db_tbl that you'd like indexed (ignored if lookup_tbls = FALSE). Creating indices can improve the performance of filters and joins when you generate summaries, but at the cost of a larger database file and slightly slower write operations. Fields you can create indices on include "feat_id" (the location id value), "cvar", "gcm", "scenario", "period", "slug", and "spag".

Indices can also be added to a SQLite database after downloading is complete with ca_db_indices. For large queries (e.g. thousands of API calls), it is recommended to not build indices during the download process, and only add indices for those fields you plan to filter on or join during your analysis. You can view which indices exist with ca_db_info.

pause_n is the number of API calls after which a built-in pause of length pause_secs is triggered. This is intended to avoid disruption on the Cal-Adapt server. The maximum value for pause_n is 2500, and the minimum value for pause_secs is 30 seconds.

The returned tibble is linked to the SQLite datbase. For the most part you can use the same dplyr functions to manipulate the results, but to retrieve the actual values you need to use `collect()`. For more info working with a linked database, see https://dbplyr.tidyverse.org/articles/dbplyr.html.

Write values from an API request to a local database

Usage

Arguments

Value

Details

See also