In this Notebook, we’ll practice working with data frames including:

  • using the built-in sample data frames that come with base R
  • importing a csv file
  • viewing the properties of a data frame
  • grabbing individual columns
  • filtering rows and columns with square bracket notation
  • sorting rows

Sample Data Frames

R comes with several sample data frames, for example mtcars.

mtcars

Wondering where the mtcars data came from? Just like functions, sample datasets usually have their help pages. Run ?mtcars to see where this one came from.

Importing a CSV file

csv (comma separated values) is a common format for tabular data. You can import a csv file using base R with read.csv().

Import sf_libraries.csv in the data directory:

csv_fn <- "./data/ca_breweries.csv"
file.exists(csv_fn)
[1] TRUE
breweries_df <- read.csv(csv_fn)
head(breweries_df)

Viewing the Properties of a Data Frame

You can view the number of rows and columns of a dataframe with nrow() and ncol():

nrow(breweries_df)
[1] 311
ncol(mtcars)
[1] 11

You can view the names of the columns in a data frame with names():

names(mtcars)
 [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear" "carb"

The tibble package has a nice function called glimpse() that will show you the names, column types, and first few values for each column in a concise format:

tibble::glimpse(mtcars)
Rows: 32
Columns: 11
$ mpg  <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8, 16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 32.4, 30~
$ cyl  <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8, 8, 8, 8, 4, 4, 4, 8, 6, 8, 4
$ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 167.6, 167.6, 275.8, 275.8, 275.8, 472.0, 460.0~
$ hp   <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180, 205, 215, 230, 66, 52, 65, 97, 150, 150, 24~
$ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92, 3.07, 3.07, 3.07, 2.93, 3.00, 3.23, 4.08, 4.~
$ wt   <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.440, 3.440, 4.070, 3.730, 3.780, 5.250, 5.424~
$ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18.30, 18.90, 17.40, 17.60, 18.00, 17.98, 17.82~
$ vs   <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1
$ am   <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1
$ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3, 3, 3, 3, 4, 5, 5, 5, 5, 5, 4
$ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2, 2, 4, 2, 1, 2, 2, 4, 6, 8, 2

Grabbing Columns

You can grab a single column using the $ operator.

Extract the values in the mpg column:

mtcars$mpg
 [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3
[25] 19.2 27.3 26.0 30.4 15.8 19.7 15.0 21.4


CHALLENGE: Compute the average mpg

Compute the average mpg of vehicles in mtcars. Answer

mean(mtcars$mpg)
[1] 20.09062


CHALLENGE: Summarize quakes data frame

Answer the following questions about the quakes data frame, which has data about some earthquakes: Answer

  1. How many rows and columns does it contain?
  2. What is the average magnitude of the earthquakes recorded in this data frame?
  3. What is the minimum and maximum latitude and longitude of the earthquakes in this dataset? (hint: look at the range() function)
  4. In which year was the earliest earthquake recorded?
dim(quakes)
[1] 1000    5
mean(quakes$mag)
[1] 4.6204
range(quakes$lat)
[1] -38.59 -10.72
range(quakes$long)
[1] 165.67 188.13
# View the help page for this data set with 
# ?quakes
# "The events occurred in a cube near Fiji since 1964."

Filtering rows and columns

You can filter (subset) rows and columns using square bracket notation. Example:

my_df[rows-expression, cols-expression]

To view the first 5 rows of breweries_df, we pass a vector of integers as the rows expression:

quakes[1:5, ]

NOTE: You can omit the rows-expression or cols-expression, but you still need a comma instead the square brackets.

View every 5th row in quakes:

quakes[ c(5, 10, 15, 20, 25), ]

To return rows that meet a certain condition, rows can be an expression that returns TRUE/FALSE values:

## Quakes whose magnitude was >= 5.9
quakes[ quakes$mag >= 5.9,  ]


CHALLENGE: How many quakes?

How many earthquakes were detected by 100 or more stations? Answer

quakes[ quakes$stations >= 100,  ]
sum(quakes$stations >= 100)
[1] 18


CHALLENGE: Largest earthquake

What was the largest earthquake on record? Answer

max(quakes$mag)
[1] 6.4

Sorting Rows

You can also use the rows expression to sort the rows. The key to this is using order(), which returns the indices of elements in a vector sorted:

x <- c(50, 20, 70, 40, 90)
x
[1] 50 20 70 40 90
order(x)
[1] 2 4 1 3 5

To sort rows in a data frame, we simply pass a vector of integers in the desired order:

quakes[ order(quakes$mag), ]

The cols-expression can be vector of integers (corresponding to column numbers you want returned), or a character vector containing column names. You can also use the cols-expression to reorder the columns.

Write an expression that will return the longitude and latitude columns only (in that order) for the biggest 10 earthquakes (by magnitude).

mag_topten_idx <- order(quakes$mag, decreasing = TRUE)[1:10]
quakes[mag_topten_idx, ]


CHALLENGE: Compute MPG

Using the mtcars data frame, compute the average mpg for 4, 6, and 8 cylinder vehicles. Answer

## 4 cylinder
mean(mtcars[ mtcars$cyl == 4, "mpg" ])
[1] 26.66364
## 6 cylinder
mean(mtcars[ mtcars$cyl == 6, "mpg" ])
[1] 19.74286
## 8 cylinder
mean(mtcars[ mtcars$cyl == 8, "mpg" ])
[1] 15.1
LS0tDQp0aXRsZTogIkRhdGEgRnJhbWVzIDEwMSINCm91dHB1dDogDQogIGh0bWxfbm90ZWJvb2s6DQogICAgdG9jOiB5ZXMNCiAgICB0b2NfZmxvYXQ6IHllcw0KLS0tDQoNCkluIHRoaXMgTm90ZWJvb2ssIHdlJ2xsIHByYWN0aWNlIHdvcmtpbmcgd2l0aCBkYXRhIGZyYW1lcyBpbmNsdWRpbmc6DQoNCi0gdXNpbmcgdGhlIGJ1aWx0LWluIHNhbXBsZSBkYXRhIGZyYW1lcyB0aGF0IGNvbWUgd2l0aCBiYXNlIFIgDQotIGltcG9ydGluZyBhIGNzdiBmaWxlICAgDQotIHZpZXdpbmcgdGhlIHByb3BlcnRpZXMgb2YgYSBkYXRhIGZyYW1lICANCi0gZ3JhYmJpbmcgaW5kaXZpZHVhbCBjb2x1bW5zICANCi0gZmlsdGVyaW5nIHJvd3MgYW5kIGNvbHVtbnMgd2l0aCBzcXVhcmUgYnJhY2tldCBub3RhdGlvbiAgDQotIHNvcnRpbmcgcm93cyAgDQoNCiMjIFNhbXBsZSBEYXRhIEZyYW1lcw0KDQpSIGNvbWVzIHdpdGggc2V2ZXJhbCBzYW1wbGUgZGF0YSBmcmFtZXMsIGZvciBleGFtcGxlIGBtdGNhcnNgLg0KDQpgYGB7ciBjaHVuazAxfQ0KbXRjYXJzDQpgYGANCg0KV29uZGVyaW5nIHdoZXJlIHRoZSBgbXRjYXJzYCBkYXRhIGNhbWUgZnJvbT8gSnVzdCBsaWtlIGZ1bmN0aW9ucywgc2FtcGxlIGRhdGFzZXRzIHVzdWFsbHkgaGF2ZSB0aGVpciBoZWxwIHBhZ2VzLiBSdW4gYD9tdGNhcnNgIHRvIHNlZSB3aGVyZSB0aGlzIG9uZSBjYW1lIGZyb20uDQoNCiMjIEltcG9ydGluZyBhIENTViBmaWxlDQoNCmNzdiAoY29tbWEgc2VwYXJhdGVkIHZhbHVlcykgaXMgYSBjb21tb24gZm9ybWF0IGZvciB0YWJ1bGFyIGRhdGEuIFlvdSBjYW4gaW1wb3J0IGEgY3N2IGZpbGUgdXNpbmcgYmFzZSBSIHdpdGggIGByZWFkLmNzdigpYC4gDQoNCkltcG9ydCAqc2ZfbGlicmFyaWVzLmNzdiogaW4gdGhlIGRhdGEgZGlyZWN0b3J5OiAgDQoNCmBgYHtyIGNodW5rMDJ9DQpjc3ZfZm4gPC0gIi4vZGF0YS9jYV9icmV3ZXJpZXMuY3N2Ig0KZmlsZS5leGlzdHMoY3N2X2ZuKQ0KYnJld2VyaWVzX2RmIDwtIHJlYWQuY3N2KGNzdl9mbikNCmhlYWQoYnJld2VyaWVzX2RmKQ0KYGBgDQoNCiMjIFZpZXdpbmcgdGhlIFByb3BlcnRpZXMgb2YgYSBEYXRhIEZyYW1lDQoNCllvdSBjYW4gdmlldyB0aGUgbnVtYmVyIG9mICoqcm93cyoqIGFuZCAqKmNvbHVtbnMqKiBvZiBhIGRhdGFmcmFtZSB3aXRoIGBucm93KClgIGFuZCBgbmNvbCgpYDoNCg0KYGBge3IgY2h1bmswM30NCm5yb3coYnJld2VyaWVzX2RmKQ0KbmNvbChtdGNhcnMpDQpgYGANCllvdSBjYW4gdmlldyB0aGUgKipuYW1lcyoqIG9mIHRoZSBjb2x1bW5zIGluIGEgZGF0YSBmcmFtZSB3aXRoIG5hbWVzKCk6DQoNCmBgYHtyIGNodW5rMDR9DQpuYW1lcyhtdGNhcnMpDQpgYGANCg0KVGhlIHRpYmJsZSBwYWNrYWdlIGhhcyBhIG5pY2UgZnVuY3Rpb24gY2FsbGVkIGBnbGltcHNlKClgIHRoYXQgd2lsbCBzaG93IHlvdSB0aGUgbmFtZXMsIGNvbHVtbiB0eXBlcywgYW5kIGZpcnN0IGZldyB2YWx1ZXMgZm9yIGVhY2ggY29sdW1uIGluIGEgY29uY2lzZSBmb3JtYXQ6DQoNCmBgYHtyIGNodW5rMDV9DQp0aWJibGU6OmdsaW1wc2UobXRjYXJzKQ0KYGBgDQoNCiMjIEdyYWJiaW5nIENvbHVtbnMNCg0KWW91IGNhbiBncmFiIGEgc2luZ2xlIGNvbHVtbiB1c2luZyB0aGUgYCRgIG9wZXJhdG9yLiANCg0KRXh0cmFjdCB0aGUgdmFsdWVzIGluIHRoZSBtcGcgY29sdW1uOg0KDQpgYGB7ciBjaHVuazA2fQ0KbXRjYXJzJG1wZw0KYGBgDQoNClwNCg0KIyMgQ0hBTExFTkdFOiBDb21wdXRlIHRoZSBhdmVyYWdlIG1wZyANCg0KQ29tcHV0ZSB0aGUgYXZlcmFnZSBtcGcgb2YgdmVoaWNsZXMgaW4gbXRjYXJzLiBbQW5zd2VyXShodHRwOi8vYml0Lmx5LzMxMG84YlcpDQoNCmBgYHtyIGNodW5rMDd9DQptZWFuKG10Y2FycyRtcGcpDQpgYGANCg0KXA0KDQojIyBDSEFMTEVOR0U6IFN1bW1hcml6ZSBxdWFrZXMgZGF0YSBmcmFtZQ0KDQpBbnN3ZXIgdGhlIGZvbGxvd2luZyBxdWVzdGlvbnMgYWJvdXQgdGhlIGBxdWFrZXNgIGRhdGEgZnJhbWUsIHdoaWNoIGhhcyBkYXRhIGFib3V0IHNvbWUgZWFydGhxdWFrZXM6IFtBbnN3ZXJdKGh0dHA6Ly9iaXQubHkvM2VVb3lJVikNCg0KMSkgSG93IG1hbnkgcm93cyBhbmQgY29sdW1ucyBkb2VzIGl0IGNvbnRhaW4/ICANCjIpIFdoYXQgaXMgdGhlIGF2ZXJhZ2UgbWFnbml0dWRlIG9mIHRoZSBlYXJ0aHF1YWtlcyByZWNvcmRlZCBpbiB0aGlzIGRhdGEgZnJhbWU/ICANCjMpIFdoYXQgaXMgdGhlIG1pbmltdW0gYW5kIG1heGltdW0gbGF0aXR1ZGUgYW5kIGxvbmdpdHVkZSBvZiB0aGUgZWFydGhxdWFrZXMgaW4gdGhpcyBkYXRhc2V0PyAoaGludDogbG9vayBhdCB0aGUgYHJhbmdlKClgIGZ1bmN0aW9uKSAgDQo0KSBJbiB3aGljaCB5ZWFyIHdhcyB0aGUgZWFybGllc3QgZWFydGhxdWFrZSByZWNvcmRlZD8gIA0KDQpgYGB7ciBjaHVuazA4fQ0KZGltKHF1YWtlcykNCm1lYW4ocXVha2VzJG1hZykNCnJhbmdlKHF1YWtlcyRsYXQpDQpyYW5nZShxdWFrZXMkbG9uZykNCg0KIyBWaWV3IHRoZSBoZWxwIHBhZ2UgZm9yIHRoaXMgZGF0YSBzZXQgd2l0aCANCiMgP3F1YWtlcw0KIyAiVGhlIGV2ZW50cyBvY2N1cnJlZCBpbiBhIGN1YmUgbmVhciBGaWppIHNpbmNlIDE5NjQuIg0KYGBgDQoNCiMjIEZpbHRlcmluZyByb3dzIGFuZCBjb2x1bW5zDQoNCllvdSBjYW4gZmlsdGVyIChzdWJzZXQpIHJvd3MgYW5kIGNvbHVtbnMgdXNpbmcgc3F1YXJlIGJyYWNrZXQgbm90YXRpb24uIEV4YW1wbGU6DQoNCmBteV9kZltyb3dzLWV4cHJlc3Npb24sIGNvbHMtZXhwcmVzc2lvbl1gDQoNClRvIHZpZXcgdGhlIGZpcnN0IDUgcm93cyBvZiBgYnJld2VyaWVzX2RmYCwgd2UgcGFzcyBhIHZlY3RvciBvZiBpbnRlZ2VycyBhcyB0aGUgcm93cyBleHByZXNzaW9uOg0KDQpgYGB7ciBjaHVuazA5fQ0KcXVha2VzWzE6NSwgXQ0KYGBgDQoNCk5PVEU6IFlvdSBjYW4gb21pdCB0aGUgcm93cy1leHByZXNzaW9uIG9yIGNvbHMtZXhwcmVzc2lvbiwgYnV0IHlvdSBzdGlsbCBuZWVkIGEgY29tbWEgaW5zdGVhZCB0aGUgc3F1YXJlIGJyYWNrZXRzLg0KDQpWaWV3IGV2ZXJ5IDV0aCByb3cgaW4gcXVha2VzOg0KDQpgYGB7ciBjaHVuazEwfQ0KcXVha2VzWyBjKDUsIDEwLCAxNSwgMjAsIDI1KSwgXQ0KYGBgDQoNClRvIHJldHVybiByb3dzIHRoYXQgbWVldCBhIGNlcnRhaW4gY29uZGl0aW9uLCAqcm93cyogY2FuIGJlIGFuIGV4cHJlc3Npb24gdGhhdCByZXR1cm5zIFRSVUUvRkFMU0UgdmFsdWVzOg0KDQpgYGB7ciBjaHVuazExfQ0KIyMgUXVha2VzIHdob3NlIG1hZ25pdHVkZSB3YXMgPj0gNS45DQpxdWFrZXNbIHF1YWtlcyRtYWcgPj0gNS45LCAgXQ0KYGBgDQoNClwNCg0KIyMgQ0hBTExFTkdFOiBIb3cgbWFueSBxdWFrZXM/DQoNCkhvdyBtYW55IGVhcnRocXVha2VzIHdlcmUgZGV0ZWN0ZWQgYnkgMTAwIG9yIG1vcmUgc3RhdGlvbnM/IFtBbnN3ZXJdKGh0dHA6Ly9iaXQubHkvM3M1MXdUcSkNCg0KYGBge3IgY2h1bmsxMn0NCnF1YWtlc1sgcXVha2VzJHN0YXRpb25zID49IDEwMCwgIF0NCnN1bShxdWFrZXMkc3RhdGlvbnMgPj0gMTAwKQ0KYGBgDQoNClwNCg0KIyMgQ0hBTExFTkdFOiBMYXJnZXN0IGVhcnRocXVha2UNCg0KV2hhdCB3YXMgdGhlIGxhcmdlc3QgZWFydGhxdWFrZSBvbiByZWNvcmQ/IFtBbnN3ZXJdKGh0dHA6Ly9iaXQubHkvMzkxSDdIcykNCg0KYGBge3IgY2h1bmsxM30NCm1heChxdWFrZXMkbWFnKQ0KYGBgDQoNCiMjIFNvcnRpbmcgUm93cw0KDQpZb3UgY2FuIGFsc28gdXNlIHRoZSByb3dzIGV4cHJlc3Npb24gdG8gc29ydCB0aGUgcm93cy4gVGhlIGtleSB0byB0aGlzIGlzIHVzaW5nIGBvcmRlcigpYCwgd2hpY2ggcmV0dXJucyB0aGUgaW5kaWNlcyBvZiBlbGVtZW50cyBpbiBhIHZlY3RvciBzb3J0ZWQ6DQoNCmBgYHtyIGNodW5rMTR9DQp4IDwtIGMoNTAsIDIwLCA3MCwgNDAsIDkwKQ0KeA0Kb3JkZXIoeCkNCmBgYA0KDQpUbyBzb3J0IHJvd3MgaW4gYSBkYXRhIGZyYW1lLCB3ZSBzaW1wbHkgcGFzcyBhIHZlY3RvciBvZiBpbnRlZ2VycyBpbiB0aGUgZGVzaXJlZCBvcmRlcjoNCg0KYGBge3IgY2h1bmsxNX0NCnF1YWtlc1sgb3JkZXIocXVha2VzJG1hZyksIF0NCmBgYA0KDQoNClRoZSAqY29scy1leHByZXNzaW9uKiBjYW4gYmUgdmVjdG9yIG9mIGludGVnZXJzIChjb3JyZXNwb25kaW5nIHRvIGNvbHVtbiBudW1iZXJzIHlvdSB3YW50IHJldHVybmVkKSwgb3IgYSBjaGFyYWN0ZXIgdmVjdG9yIGNvbnRhaW5pbmcgY29sdW1uIG5hbWVzLiBZb3UgY2FuIGFsc28gdXNlIHRoZSBjb2xzLWV4cHJlc3Npb24gdG8gcmVvcmRlciB0aGUgY29sdW1ucy4NCg0KV3JpdGUgYW4gZXhwcmVzc2lvbiB0aGF0IHdpbGwgcmV0dXJuIHRoZSBsb25naXR1ZGUgYW5kIGxhdGl0dWRlIGNvbHVtbnMgb25seSAoaW4gdGhhdCBvcmRlcikgZm9yIHRoZSBiaWdnZXN0IDEwIGVhcnRocXVha2VzIChieSBtYWduaXR1ZGUpLg0KDQpgYGB7ciBjaHVuazE2fQ0KbWFnX3RvcHRlbl9pZHggPC0gb3JkZXIocXVha2VzJG1hZywgZGVjcmVhc2luZyA9IFRSVUUpWzE6MTBdDQpxdWFrZXNbbWFnX3RvcHRlbl9pZHgsIF0NCmBgYA0KDQpcDQoNCiMjIENIQUxMRU5HRTogQ29tcHV0ZSBNUEcNCg0KVXNpbmcgdGhlIG10Y2FycyBkYXRhIGZyYW1lLCBjb21wdXRlIHRoZSBhdmVyYWdlIG1wZyBmb3IgNCwgNiwgYW5kIDggY3lsaW5kZXIgdmVoaWNsZXMuIFtBbnN3ZXJdKGh0dHA6Ly9iaXQubHkvM3M2bWVTTCkNCg0KYGBge3IgY2h1bmsxN30NCiMjIDQgY3lsaW5kZXINCm1lYW4obXRjYXJzWyBtdGNhcnMkY3lsID09IDQsICJtcGciIF0pDQoNCiMjIDYgY3lsaW5kZXINCm1lYW4obXRjYXJzWyBtdGNhcnMkY3lsID09IDYsICJtcGciIF0pDQoNCiMjIDggY3lsaW5kZXINCm1lYW4obXRjYXJzWyBtdGNhcnMkY3lsID09IDgsICJtcGciIF0pDQoNCmBgYA0KDQoNCg0K