This Notebook demonstrates how to import various types of vector GIS data into R.
First let’s look at the layers in the data folder by passing the directory to st_layers() from the sf package. This will show us the Shapefiles, but not layers that are in ‘containers’ such as file geodatabases, GeoJSON files, etc.
## Load sf, plus magrittr for the %>% pipe used in the plotting code below
library(magrittr)
library(sf)
Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
## View spatial layers in the data folder.
st_layers("./data")
Driver: ESRI Shapefile
Available layers:
Import the ‘yose_boundary’ layer (a Shapefile)
yose_bnd_ll <- st_read(dsn="./data", layer="yose_boundary")
Reading layer `yose_boundary' from data source `D:\Workshops\R-Spatial\rspatial_mod\outputs\rspatial_data\data' using driver `ESRI Shapefile'
Simple feature collection with 1 feature and 11 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -119.8864 ymin: 37.4947 xmax: -119.1964 ymax: 38.18515
Geodetic CRS: North_American_Datum_1983
# This also works:
# yose_bnd_ll <- st_read(dsn="./data/yose_boundary.shp")
Note 1: we don’t need to add the .shp extension
Note 2: this code follows a naming convention for variables such as yose_bnd_ll:
yose - all Yosemite layers start with this
bnd - tells me this is the park boundary
ll - lat-long coordinates
Write an expression that returns the class (type) of yose_bnd_ll. Answer
## Your answer here
We see that yose_bnd_ll is both an sf object (simple feature data frame) and a data.frame. This means we should be able to use the functions designed for either of those objects.
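The dual class can be explored without the workshop data. The following is a minimal sketch that assumes the North Carolina demo Shapefile bundled with the sf package; the object name nc is illustrative and not part of this Notebook's data.

```r
library(sf)

## Assumption: uses the demo Shapefile that ships with sf, not the workshop data
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

## An sf object carries both classes
class(nc)

## Functions written for data frames work on it ...
nrow(nc)
head(names(nc))

## ... and so do functions written for sf objects
st_geometry_type(nc, by_geometry = FALSE)
```

Because the object inherits from data.frame, familiar tools such as nrow(), names(), and head() need no special handling.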
View the properties of yose_bnd_ll by simply running it by itself:
yose_bnd_ll
Simple feature collection with 1 feature and 11 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -119.8864 ymin: 37.4947 xmax: -119.1964 ymax: 38.18515
Geodetic CRS: North_American_Datum_1983
UNIT_CODE GIS_NOTES UNIT_NAME
1 YOSE Lands - http://landsnet.nps.gov/tractsnet/documents/YOSE/METADATA/yose_metadata.xml Yosemite National Park
DATE_EDIT STATE REGION GNIS_ID UNIT_TYPE CREATED_BY METADATA
1 2016-01-27 CA PW 255923 National Park Lands http://nrdata.nps.gov/programs/Lands/YOSE_METADATA.xml
PARKNAME geometry
1 Yosemite POLYGON ((-119.8456 37.8327...
What coordinate reference system is yose_bnd_ll in? Answer
The names() function returns the column labels of a data frame (in this case the attribute table).
## View column names in the attribute table
names(yose_bnd_ll)
[1] "UNIT_CODE" "GIS_NOTES" "UNIT_NAME" "DATE_EDIT" "STATE" "REGION" "GNIS_ID" "UNIT_TYPE"
[9] "CREATED_BY" "METADATA" "PARKNAME" "geometry"
Take note of the last column, geometry. That’s where the geometry is saved (we’ll come back to that later).
View the first few rows of the attribute table with head():
head(yose_bnd_ll)
Simple feature collection with 1 feature and 11 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: -119.8864 ymin: 37.4947 xmax: -119.1964 ymax: 38.18515
Geodetic CRS: North_American_Datum_1983
UNIT_CODE GIS_NOTES UNIT_NAME
1 YOSE Lands - http://landsnet.nps.gov/tractsnet/documents/YOSE/METADATA/yose_metadata.xml Yosemite National Park
DATE_EDIT STATE REGION GNIS_ID UNIT_TYPE CREATED_BY METADATA
1 2016-01-27 CA PW 255923 National Park Lands http://nrdata.nps.gov/programs/Lands/YOSE_METADATA.xml
PARKNAME geometry
1 Yosemite POLYGON ((-119.8456 37.8327...
To plot just the geometry of an sf object (i.e., no symbology from the attribute table), we can use the st_geometry() function.
## Plot the geometry (outline) of the Yosemite boundary
plot(yose_bnd_ll %>% st_geometry(), asp=1)
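To see what st_geometry() actually returns, here is a small sketch that again assumes the North Carolina demo Shapefile bundled with sf (the names nc and nc_geom are illustrative). The result is a geometry list-column (class sfc) with one element per feature and no attribute columns.

```r
library(sf)

## Assumption: demo Shapefile that ships with sf, standing in for the Yosemite data
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

## Extract only the geometry; the attribute table is left behind
nc_geom <- st_geometry(nc)
class(nc_geom)

## One geometry per feature
length(nc_geom) == nrow(nc)

## Plotting the geometry draws outlines only
plot(nc_geom)
```

Plotting the full sf object instead would draw one map panel per attribute column, which is why the geometry is extracted first when only the outline is wanted.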
Import the Yosemite Points-of-Interest (POI) Shapefile and plot them. Answer
KML and KMZ files can have more than one layer. Hence the source is the KML file, and you must specify the layer by name.
Import a KML file containing the National Register of Historic Places in Yosemite. First find the KML file:
## Import KML file
kml_fn <- "./data/yose_historic_pts.kml"
file.exists(kml_fn)
[1] TRUE
View the layers within this KML:
## View the layers in this kml
st_layers(kml_fn)
Driver: KML
Available layers:
Import:
## Import the 'yose_historic_places' layer
yose_hp_ll <- st_read(kml_fn, layer="yose_historic_places")
Reading layer `yose_historic_places' from data source `D:\Workshops\R-Spatial\rspatial_mod\outputs\rspatial_data\data\yose_historic_pts.kml' using driver `KML'
Simple feature collection with 35 features and 2 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -119.8447 ymin: 37.51356 xmax: -119.2165 ymax: 38.08368
Geodetic CRS: WGS 84
View its properties:
## View properties
yose_hp_ll
Simple feature collection with 35 features and 2 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -119.8447 ymin: 37.51356 xmax: -119.2165 ymax: 38.08368
Geodetic CRS: WGS 84
First 10 features:
Name Description geometry
1 Hetch Hetchy Railroad Engine No. 6 POINT (-119.786 37.67437)
2 Hodgdon Homestead Cabin POINT (-119.656 37.53924)
3 Rangers' Club POINT (-119.5883 37.74735)
4 Buck Creek Cabin POINT (-119.4897 37.56131)
5 Wawona Covered Bridge POINT (-119.656 37.53859)
6 Crane Flat Fire Lookout POINT (-119.8207 37.75978)
7 Glacier Point Trailside Museum POINT (-119.5731 37.72916)
8 McCauley Cabin POINT (-119.3676 37.87812)
9 Bagby Stationhouse POINT (-119.7862 37.67439)
10 Great Sierra Mine POINT (-119.2688 37.9276)
Remember that to overlay more than one layer on a plot, pass add=TRUE to each plot() call after the first:
## Plot the boundary, then the historic places
{plot(yose_bnd_ll %>% st_geometry(), asp=1)
plot(yose_hp_ll %>% st_geometry(), add=TRUE)}
Import the California county boundaries, which are saved as a GeoJSON file.
## Import a GeoJSON file
counties_fn <- "./data/ca_counties.geojson"
file.exists(counties_fn)
[1] TRUE
View the layers in this GeoJSON file:
## View the layers
st_layers(counties_fn)
Driver: GeoJSON
Available layers:
Import the ‘ca_counties’ layer:
## Import the 'ca_counties' layer
ca_counties_ll <- st_read(counties_fn)
Reading layer `ca_counties' from data source `D:\Workshops\R-Spatial\rspatial_mod\outputs\rspatial_data\data\ca_counties.geojson' using driver `GeoJSON'
Simple feature collection with 58 features and 13 fields
Geometry type: MULTIPOLYGON
Dimension: XYZ
Bounding box: xmin: -124.4096 ymin: 32.53416 xmax: -114.1312 ymax: 42.00952
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
Plot the county boundaries. Answer
You can import (but not write to) an ESRI file geodatabase using the sf package. In this case, the source is the geodatabase folder itself (the folder whose name ends in .gdb).
Import Yosemite’s trails from a geodatabase. First locate the .gdb folder:
## Define the path to the file geodatabase (a folder)
gdb_fn <- "./data/yose_trails.gdb"
file.exists(gdb_fn)
[1] TRUE
View the layers in this source:
st_layers(gdb_fn)
Driver: OpenFileGDB
Available layers:
Import the ‘Trails’ layer
## Import the 'Trails' layer (case sensitive!)
yose_trails <- st_read(gdb_fn, layer="Trails")
Reading layer `Trails' from data source `D:\Workshops\R-Spatial\rspatial_mod\outputs\rspatial_data\data\yose_trails.gdb' using driver `OpenFileGDB'
Simple feature collection with 1074 features and 13 fields
Geometry type: MULTILINESTRING
Dimension: XY
Bounding box: xmin: 245134 ymin: 4153668 xmax: 323239.7 ymax: 4250703
Projected CRS: NAD83 / UTM zone 11N
Plot Yosemite’s Trails:
## Plot the trails layer
plot(st_geometry(yose_trails), axes=TRUE)
The following code does not work to make a plot of the park boundary and the trails. Can you tell why? Answer
{plot(yose_bnd_ll %>% st_geometry())
plot(yose_trails %>% st_geometry(), add=TRUE)}
Let’s import Yosemite’s watersheds from a GeoPackage file.
## Import watersheds from a geopackage
gpkg_watershd_fn <- "./data/yose_watersheds.gpkg"
file.exists(gpkg_watershd_fn)
[1] TRUE
st_layers(gpkg_watershd_fn)
Driver: GPKG
Available layers:
yose_watersheds <- st_read(gpkg_watershd_fn, layer="calw221")
Reading layer `calw221' from data source `D:\Workshops\R-Spatial\rspatial_mod\outputs\rspatial_data\data\yose_watersheds.gpkg' using driver `GPKG'
Simple feature collection with 127 features and 38 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 1383.82 ymin: -61442.93 xmax: 81596.71 ymax: 26405.66
Projected CRS: unnamed
Plot the watersheds:
plot(st_geometry(yose_watersheds), axes=TRUE)
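Like other container formats, a GeoPackage can hold several layers, which is why st_layers() is called before st_read(). The sketch below illustrates the list-then-read-by-name pattern with a throwaway file. The point layers, the layer names layer_a and layer_b, and the temporary file are all made up for illustration.

```r
library(sf)

## Assumption: two tiny made-up point layers, written to a temporary GeoPackage
pts_a <- st_as_sf(data.frame(id = 1:2, x = c(-119.6, -119.5), y = c(37.7, 37.8)),
                  coords = c("x", "y"), crs = 4326)
pts_b <- st_as_sf(data.frame(id = 3, x = -119.4, y = 37.9),
                  coords = c("x", "y"), crs = 4326)

tmp_gpkg <- tempfile(fileext = ".gpkg")
st_write(pts_a, tmp_gpkg, layer = "layer_a", quiet = TRUE)
st_write(pts_b, tmp_gpkg, layer = "layer_b", quiet = TRUE)

## List the layer names stored in the file
layer_names <- st_layers(tmp_gpkg)$name
layer_names

## Read one layer by name
pts_b_in <- st_read(tmp_gpkg, layer = "layer_b", quiet = TRUE)
nrow(pts_b_in)
```

Pulling the names out of st_layers() as a character vector is handy when a script needs to loop over every layer in a file.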
What CRS are the Yosemite watersheds in? Answer
ANS. California Albers, an equal-area projection (commonly used for statewide data in California)
Import a CSV file containing missing persons records. Step 1 is to import it as a data frame:
## Import missing people csv file
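## fileEncoding="UTF-8-BOM" strips the byte-order mark that would otherwise garble the first column name
missing_df <- read.csv("./data/yosemite_missing_people.csv", stringsAsFactors = FALSE, fileEncoding = "UTF-8-BOM")
tibble::glimpse(missing_df)
Rows: 213
Columns: 49
$ X <dbl> -119.6632, -119.8099, -119.5958, -119.5599, -119.5937, -119.6064, -119.4291, -119.5864, -119.5271, ~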
$ Y <dbl> 37.66355, 37.76910, 37.74595, 37.75631, 37.74561, 37.74521, 37.86868, 37.71233, 37.74873, 37.73601,~
$ OBJECTID_1 <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, ~
$ OBJECTID <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, ~
$ Georef_Unc <dbl> 336.3710, 526.3630, 56.3650, 126.3640, 41.3650, 846.5152, 41.3560, 51.3670, 41.3650, 431.3660, 41.3~
$ Distance <dbl> 1340.26046, 1293.06310, 0.00000, 1760.04205, 357.14291, 1823.43718, 651.53949, 2971.29565, 3025.122~
$ Type <chr> "IPP", "IPP", "IPP", "IPP", "IPP", "IPP", "IPP", "IPP", "IPP", "IPP", "IPP", "IPP", "IPP", "IPP", "~
$ Lat <dbl> 37.66355, 37.76910, 37.74595, 37.75631, 37.74561, 37.74521, 37.86868, 37.71233, 37.74873, 37.73601,~
$ Long <dbl> -119.6632, -119.8099, -119.5958, -119.5599, -119.5937, -119.6064, -119.4291, -119.5864, -119.5271, ~
$ Extent <dbl> 310, 500, 30, 100, 15, 15, 15, 25, 15, 405, 15, 30, 150, 15, 47, 70, 15, 15, 15, 15, 468, 200, 40, ~
$ CaseNumber <int> 20090248, 20090652, 20090940, 20091134, 20091252, 20091345, 20091382, 20091583, 20091755, 20091760,~
$ SARNumber <int> 2009004, 2009014, 2009024, 2009029, 2009036, 2009042, 2009043, 2009052, 2009059, 2009060, 2009069, ~
$ IncidYear <int> 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 200~
$ DateTimeLa <chr> "2009-02-01T00:00:00.000Z", "2009-03-30T00:00:00.000Z", "2009-04-25T00:00:00.000Z", "2009-05-12T00:~
$ DateTimeIn <chr> "2009-02-01T00:00:00.000Z", "2009-03-30T00:00:00.000Z", "2009-04-25T00:00:00.000Z", "2009-05-12T00:~
$ DateTimeSu <chr> "2009-02-01T00:00:00.000Z", "2009-03-30T00:00:00.000Z", "2009-04-25T00:00:00.000Z", "2009-05-12T00:~
$ DateTIme_1 <chr> "2009-02-01T00:00:00.000Z", "2009-03-30T00:00:00.000Z", "2009-04-25T00:00:00.000Z", "2009-05-12T00:~
$ ContactMet <chr> "Subject Cell Phone", "Reported Missing", "Reported Missing", "Subject Cell Phone", "Reported Missi~
$ EcoRegionD <chr> "Temperate", "Temperate", "Temperate", "Temperate", "Temperate", "Temperate", "Temperate", "Tempera~
$ EcoRegio_1 <chr> "M260 Mediterranean Regime Mountains", "M260 Mediterranean Regime Mountains", "M260 Mediterranean R~
$ IncidType <chr> "Search", "Separated Party", "Overdue", "Search", "Separated Party", "Overdue", "Overdue", "Overdue~
$ NumberofSu <int> 1, 1, 1, 1, 1, 1, 2, 2, 3, 1, 2, 2, 1, 3, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 3, 1, 1, 1, 1, 2, 1, 1, ~
$ GroupDynam <chr> "Solo Subject", "Solo Subject", "Solo Subject", "Solo Subject", "Solo Subject", "Solo Subject", "Gr~
$ SubjectCat <chr> "Mental Retardation", "Hiker", "Child (13-15)", "Hiker", "Child (4-6)", "Hiker", "Climber", "Hiker"~
$ SubSex <chr> "Male", "Male", "Male", "Male", "Male", "Male", "Group - Mixed Sex", "Group - All Males", "Group -~
$ SubAge <int> 31, 0, 14, 35, 6, 29, 0, 0, 0, 23, 0, 0, 54, 0, 15, 13, 72, 19, 0, 23, 0, 50, 62, 71, 18, 0, 40, 42~
$ IPPType <chr> "LKP", "PLS", "LKP", "LKP", "PLS", "PLS", "PLS", "PLS", "LKP", "PLS", "PLS", "PLS", "LKP", "PLS", "~
$ IPPClassif <chr> "Locality Description (Added)", "Woods", "Building", "Locality Description (Added)", "Trailhead", "~
$ IncidContr <chr> "Darkness", "Unknown", "Unknown", "Snow/Ice", "Unknown", "Unknown", "Weather - Cold", "Darkness", "~
$ IncidOutco <chr> "Subject Found Alive", "Subject Found Alive", "Subject Found Alive", "Subject Found Alive", "Subjec~
$ Scenario <chr> "Lost", "Separated", "Overdue", "Lost", "Separated", "Overdue", "Overdue", "Lost", "Lost", "Despond~
$ SubjMedInj <chr> "None", "None", "None", "None", "None", "None", "None", "Other", "None", "Other", "None", "None", "~
$ RescueMeth <chr> "Snow Machine", "Walkout", "Other", "Helicopter", "Other", "Other", "Walkout", "Vehicle", "Helicopt~
$ LostPerson <chr> "Route Traveling", "Route Traveling", "Unknown", "Unknown", "Not Lost", "Unknown", "Unknown", "View~
$ IPP_GR_Loc <chr> "Badger Pass Ski Area", "Tuolumne Grove", "Lower Falls Restroom", "North Dome", "Lower Falls Trailh~
$ IPP_GR_Typ <chr> "NEAR A FEATURE", "FEATURE (NAMED PLACE)", "NEAR A FEATURE", "FEATURE (NAMED PLACE)", "FEATURE (NAM~
$ IPP_GR_Pat <chr> "Null", "Null", "Null", "Null", "Null", "Trail", "Null", "Null", "Trail", "Null", "ClimbingRoute", ~
$ IPP_GR_Not <chr> "Subject's last known point was described as \"Near Badger Pass Ski Area\"", "PLS - in the Tuolumne~
$ Intended_D <chr> "Unknown", "Unknown", "Top of Yosemite Falls", "Loop - back to Yosemite Valley", "Unknown", "Top of~
$ FindFeatur <chr> "Forest/woods", "Road", "Structure", "Forest/woods", "Structure", "Road", "Linear Feature", "Road",~
$ Found_GR_L <chr> "Eagle Chair Lift", "Tuolumne Grove Parking", "Lower Falls Restroom", "Indian Ridge", "Yosemite Lod~
$ Found_GR_T <chr> "OFFSET DIRECTION", "FEATURE (NAMED PLACE)", "NEAR A FEATURE", "NEAR A FEATURE", "FEATURE (NAMED PL~
$ Found_GR_P <chr> "Null", "Null", "Null", "Null", "Null", "Road", "Null", "Road", "Null", "Null", "Null", "Null", "Hy~
$ Found_GR_N <chr> "Found just south of the top of the Eale Chair Lift at Badger Pass Ski at an elevation of approxima~
$ Motorized_ <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ Incident_N <chr> "Subject was snowshoeing, became disoriented, and called for help. Subject described as mentally ch~
$ TotalTimeM <int> 18, -19, 5, 15, 1, 5, -12, 8, 33, 37, 21, 24, 5, -19, 20, 17, 5, 20, 2, 16, 40, 20, 34, 31, 120, 16~
$ TotalSearc <int> 1, -19, 0, 1, 1, 1, -22, 5, 13, 28, 2, 1, 2, -23, 1, 2, 1, 1, 2, 1, 6, 2, 1, 12, -8, 2, 6, -8, 1, 1~
$ GlobalID <chr> "083c9dbc-711f-4127-861d-b2f7b5bb0470", "5f387c80-547a-4a46-9757-21bad561a810", "690530d3-5221-4cda~
Step 2 is to convert it to an sf data frame. We can surmise from the column names (Long and Lat) that the coordinates are geographic. We don’t know precisely which datum was used, but passing crs=4326 (WGS84) will be close enough.
## Convert to sf
yose_missing_ll <- st_as_sf(missing_df, coords=c("Long", "Lat"), crs=4326)
Plot to make sure:
{plot(yose_bnd_ll %>% st_geometry(), col=NA, border="chartreuse4", lwd=3, main = "Missing Persons!")
plot(yose_missing_ll %>% st_geometry(), pch=16, cex=0.5, add=TRUE)}
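The conversion step can be tried without the CSV file. This is a minimal sketch using a made-up three-row table (demo_df and its values are purely illustrative) to show what st_as_sf() does with the coordinate columns.

```r
library(sf)

## Assumption: a made-up three-row table standing in for the CSV
demo_df <- data.frame(
  CaseNumber = c(1, 2, 3),
  Long = c(-119.66, -119.81, -119.60),
  Lat  = c(37.66, 37.77, 37.75)
)

## coords takes the x column (longitude) first, then the y column (latitude)
demo_sf <- st_as_sf(demo_df, coords = c("Long", "Lat"), crs = 4326)

## The coordinate columns are replaced by a geometry column
names(demo_sf)

## Confirm the CRS that was assigned
st_crs(demo_sf)$epsg
st_is_longlat(demo_sf)
```

By default st_as_sf() drops the coordinate columns once they are turned into geometry; passing remove=FALSE keeps them in the attribute table as well.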
Look at the other GIS files in the data folder. Select one, import it, and plot it.
## Your answer here