suppressPackageStartupMessages({
library(cimir)
library(dplyr)
library(lubridate)
})
How Does CIMIS Compute Daily Summaries?
An exercise in reverse engineering
The CIMIS API allows you to query weather station data by hour or day. The daily summaries appear to be generated by aggregating hourly data with the definition of a day as starting at 1am and ending at 24:00. This is a little unusual but that’s how they do it.
There is a small unexplained difference in the aggregated hourly data and the daily summaries. This could be due to rounding errors or the possibility that hourly and daily summaries are both generated from an even shorter time period (e.g., 5 minutes).
This file can be viewed at: https://ucanr-igis.github.io/sketchbook/cimis_hour2day_investigation.html
Overview
The CIMIS API allows you to query data in hourly or daily time intervals. This notebook does some analysis to understand exactly how the daily summaries are constructed from the hourly data.
Why is this useful? A very good question. This is useful because i) we need daily values for certain models, such as degree days or irrigation requirements, but ii) we might only have access to the hourly data (i.e, if we are accessing the data from the Synoptic API, which is a lot more stable and reliable than the CIMIS API1).
To figure this out, we’re going to import Hourly and Daily CIMIS Data for the same time period. We’ll then compute daily totals starting at different hours (we do this off camera in Excel for convenience), and compare the totals to the daily summaries.
Setup
Load my CIMIS Key:
<- readLines("andy_cimis_key.txt", warn = FALSE)
my_cimis_key set_key(my_cimis_key)
Define a directory for cached data
<- here::here("cimis_cache")
cache_dir dir.exists(cache_dir)
[1] TRUE
Check if the API is alive
# library(httr)
# (xx <- httr::GET(url = "http://et.water.ca.gov/api/station/125"))
# httr::status_code(xx) == 200
View CIMIS Items
cimis_items()
Get one month of daily precip data
<- file.path(cache_dir, "station152_day-precip_2024-03-01_2024-03-31.Rds")
precip_day_cache_fn
if (file.exists(precip_day_cache_fn)) {
cat("Going to use cached daily data \n")
<- readRDS(precip_day_cache_fn)
precip_day_tbl else {
} cat("Going to fetch daily data \n")
<- cimis_data(targets = 152, start.date = "2024-03-01", end.date = "2024-03-31",
precip_day_tbl items = "day-precip")
saveRDS(precip_day_tbl, file = precip_day_cache_fn)
}
Going to use cached daily data
Get one month of hourly precip data
<- file.path(cache_dir, "station152_hourly-precip_2024-03-01_2024-03-31.Rds")
precip_hly_cache_fn
if (file.exists(precip_hly_cache_fn)) {
cat("Going to use cached hourly data \n")
<- readRDS(precip_hly_cache_fn)
precip_hly_tbl
else {
} cat("Going to fetch hourly data \n")
<- cimis_data(targets = 152, start.date = "2024-03-01", end.date = "2024-03-31",
precip_hly_tbl items = "hly-precip")
saveRDS(precip_hly_tbl, file = precip_hly_cache_fn)
}
Going to use cached hourly data
|>
precip_hly_tbl select(Name, Type, Date, Hour, Station, Scope, Item, Value) |>
slice(1:30)
Note: we see our first clue above. The first hour of data for March 1 is 1am, and the final hour is “2400”. Most people would consider midnight the first hour of the day, but CIMIS considers it the last minute (“24:00”) of the previous day.
Do some date conversion and pull out just a couple of fields
<- precip_hly_tbl |>
hly_tbl mutate(date = ymd(Date), hour_num = as.numeric(Hour) / 100, dt = date + hours(hour_num)) |>
select(dt, precip_hly = Value)
head(hly_tbl, n=30)
<- precip_day_tbl |>
day_tbl mutate(date = ymd(Date)) |>
select(date, precip_day = Value)
head(day_tbl)
Next, we export the hourly data to Excel, where it will be easier to explore the hour:
<- file.path(cache_dir, "station152_day-precip_2024-03-01_2024-03-31.xlsx")
precip_hourly_xlsx_fn
if (file.exists(precip_hourly_xlsx_fn)) {
cat("Excel file already exists. \n")
else {
} cat("Going to save hourly data as Excel \n")
library(openxlsx)
::write.xlsx(hly_tbl, file = precip_hourly_xlsx_fn)
openxlsx }
Excel file already exists.
The Excel investigation confirms our suspicions. We get a very good match between the sum of the hourly data and the daily summaries if we start counting at 1am. There is still a small discrepancy which we can’t explain, but is small and could be due to a rounding error (or maybe the data are recorded at an even finer temporal resolution than 1 hour).
To check the closeness of the summed up hourly totals (starting at 1am) and the daily summaries, we can run the following:
|>
hly_tbl mutate(midnight_offset = as.numeric(hour(dt) == 0), date_group = date(dt) - days(midnight_offset)) |>
group_by(date = date_group) |>
summarise(precip_hly_sum = sum(precip_hly), .groups = "drop") |>
left_join(day_tbl, by = "date") |>
mutate(diff = precip_hly_sum - precip_day)
Footnotes
The CIMIS API is getting an upgrade in September 2024, so this may be a moot point by then.↩︎