dplyr is a package for manipulating data frames, including subsetting rows and columns, sorting rows, adding new columns, grouping rows, joins, etc.

In this notebook we’ll practice using the dplyr package.

Set preferences with conflicted

Several dplyr functions have very generic names. To avoid potential conflicts with functions from other packages, we’ll start by stating our preferences using the conflicted package:

library(dplyr)

Attaching package: 㤼㸱dplyr㤼㸲

The following objects are masked from 㤼㸱package:stats㤼㸲:

    filter, lag

The following objects are masked from 㤼㸱package:base㤼㸲:

    intersect, setdiff, setequal, union
## Load the conflicted package
library(conflicted)

# Set conflict preferences
conflict_prefer("filter", "dplyr", quiet = TRUE)
conflict_prefer("count", "dplyr", quiet = TRUE)
conflict_prefer("select", "dplyr", quiet = TRUE)
conflict_prefer("arrange", "dplyr", quiet = TRUE)

Remember the iris data frame? How many rows does iris have?

nrow(iris)
[1] 150

Filter rows with filter()

Write an expression that returns just those rows where Sepal.Length > 7:

# Select rows where Sepal.Length > 7
iris %>% filter(Sepal.Length > 7)

Add another criteria specifying the species has to be virginica:

HINT: remember to use ‘==’ to test for equality

iris %>% filter(Sepal.Length > 7, Species=="virginica")

CHALLENGE: Filter

Write an expression that will return rows where the Petal.Length is between 4.1 and 4.4. Answer

# Your answer here

Filter rows with slice()

Sometimes you want to fliter rows by row number, rather than an attribute column. You can do that with slice().

Show the first 10 rows from iris:

iris %>% slice(1:10)

Filter columns with select()

To get a subset of columns, or if you want the columns in a different order, include the select() function in your expression.

For example if you only want the Sepal.Length and Sepal.Width columns, you could run:

iris %>% 
  filter(Sepal.Length > 7) %>% 
  select(Sepal.Length, Sepal.Width, Species)

Sort rows with arrange()

Add arrange() to your expression to sort rows. For example to sort the last expression by Sepal.Width we simply add one more piece to the expression:

iris %>% 
  filter(Sepal.Length > 7) %>% 
  select(Sepal.Length, Sepal.Width, Species) %>% 
  arrange(Sepal.Width)

To sort rows in descending order, wrap the field name in desc():

iris %>% 
  filter(Sepal.Length > 7) %>% 
  select(Sepal.Length, Sepal.Width, Species) %>% 
  arrange(desc(Sepal.Width))

CHALLENGE: Top-n Values

Write an expression that returns the top-5 Petal Lengths (all species)

Hint: there are a couple of ways to do this. Look at the dplyr Cheatsheet for inspiration. Answer

# Your answer here

Create a new column with mutate()

Create a new column which contains the product of the Sepal Length and Petal Length.

iris %>% mutate(sepal_petal_product = Sepal.Length * Petal.Length)

CHALLENGE: Filter on Multiple Columns

Write an expression from the quakes data frame that:

  • returns the lat, long, and mag columns
  • only return quakes that are i) north of 20 degrees South, and ii) larger than magnitude 5.0.
  • sorts the rows by magnitude

Hint: Fiji is south of the equator. Answer

# Your answer here

Group and Summarize Rows

You can ‘group’ rows using group_by(). By itself, grouping rows doesn’t do much. Usually you follow group_by() with summarise() which creates some kind of summary for each group.

Example: for each Species in iris, compute the mean petal length and width

iris %>% group_by(Species) %>% 
  summarise(avg_petal_length = mean(Petal.Length), avg_petal_width = mean(Petal.Width))

CHALLENGE: Group and Summarise

For each species in iris, compute the minimum and maximum sepal length. Answer

# Your answer here

End

Congratulations, you have completed the Notebook!

To view your Notebook at HTML, save it (again), then click the ‘Preview’ button in the RStudio toolbar.

LS0tDQp0aXRsZTogIldyYW5nbGluZyAyRCBEYXRhIHdpdGggZHBseXIiDQpvdXRwdXQ6IA0KICBodG1sX25vdGVib29rOg0KICAgIHRvYzogeWVzDQogICAgdG9jX2Zsb2F0OiB5ZXMNCi0tLQ0KDQpgZHBseXJgIGlzIGEgcGFja2FnZSBmb3IgbWFuaXB1bGF0aW5nIGRhdGEgZnJhbWVzLCBpbmNsdWRpbmcgc3Vic2V0dGluZyByb3dzIGFuZCBjb2x1bW5zLCBzb3J0aW5nIHJvd3MsIGFkZGluZyBuZXcgY29sdW1ucywgZ3JvdXBpbmcgcm93cywgam9pbnMsIGV0Yy4NCg0KSW4gdGhpcyBub3RlYm9vayB3ZSdsbCBwcmFjdGljZSB1c2luZyB0aGUgYGRwbHlyYCBwYWNrYWdlLg0KDQojIyBTZXQgcHJlZmVyZW5jZXMgd2l0aCBgY29uZmxpY3RlZGANCg0KU2V2ZXJhbCBgZHBseXJgIGZ1bmN0aW9ucyBoYXZlIHZlcnkgZ2VuZXJpYyBuYW1lcy4gVG8gYXZvaWQgcG90ZW50aWFsIGNvbmZsaWN0cyB3aXRoIGZ1bmN0aW9ucyBmcm9tICBvdGhlciBwYWNrYWdlcywgd2UnbGwgc3RhcnQgYnkgc3RhdGluZyBvdXIgcHJlZmVyZW5jZXMgdXNpbmcgdGhlIGNvbmZsaWN0ZWQgcGFja2FnZToNCg0KYGBge3IgY2h1bmswMX0NCmxpYnJhcnkoZHBseXIpDQoNCiMjIExvYWQgdGhlIGNvbmZsaWN0ZWQgcGFja2FnZQ0KbGlicmFyeShjb25mbGljdGVkKQ0KDQojIFNldCBjb25mbGljdCBwcmVmZXJlbmNlcw0KY29uZmxpY3RfcHJlZmVyKCJmaWx0ZXIiLCAiZHBseXIiLCBxdWlldCA9IFRSVUUpDQpjb25mbGljdF9wcmVmZXIoImNvdW50IiwgImRwbHlyIiwgcXVpZXQgPSBUUlVFKQ0KY29uZmxpY3RfcHJlZmVyKCJzZWxlY3QiLCAiZHBseXIiLCBxdWlldCA9IFRSVUUpDQpjb25mbGljdF9wcmVmZXIoImFycmFuZ2UiLCAiZHBseXIiLCBxdWlldCA9IFRSVUUpDQpgYGANCg0KUmVtZW1iZXIgdGhlIGlyaXMgZGF0YSBmcmFtZT8gSG93IG1hbnkgcm93cyBkb2VzIGlyaXMgaGF2ZT8gDQoNCmBgYHtyIGNodW5rMDJ9DQpucm93KGlyaXMpDQpgYGANCg0KIyMgRmlsdGVyIHJvd3Mgd2l0aCBgZmlsdGVyKClgDQoNCldyaXRlIGFuIGV4cHJlc3Npb24gdGhhdCByZXR1cm5zIGp1c3QgdGhvc2Ugcm93cyB3aGVyZSBTZXBhbC5MZW5ndGggPiA3Og0KDQpgYGB7ciBjaHVuazAzfQ0KIyBTZWxlY3Qgcm93cyB3aGVyZSBTZXBhbC5MZW5ndGggPiA3DQppcmlzICU+JSBmaWx0ZXIoU2VwYWwuTGVuZ3RoID4gNykNCmBgYA0KDQpBZGQgYW5vdGhlciBjcml0ZXJpYSBzcGVjaWZ5aW5nIHRoZSBzcGVjaWVzIGhhcyB0byBiZSBgdmlyZ2luaWNhYDoNCg0KSElOVDogcmVtZW1iZXIgdG8gdXNlICc9PScgdG8gdGVzdCBmb3IgZXF1YWxpdHkNCg0KYGBge3IgY2h1bmswNH0NCmlyaXMgJT4lIGZpbHRlcihTZXBhbC5MZW5ndGggPiA3LCBTcGVjaWVzPT0idmlyZ2luaWNhIikNCmBgYA0KDQojIyBDSEFMTEVOR0U6IEZpbHRlcg0KDQpXcml0ZSBhbiBleHByZXNzaW9uIHRoYXQgd2lsbCByZXR1cm4gcm93cyB3aGVyZSB0aGUgUGV0YWwuTGVuZ3RoIGlzIGJldHdlZW4gNC4xIGFuZCA0LjQuIFtBbnN3ZXJdKGh0dHBzOi8vYml0Lmx5LzN0TlBWSUgpDQoNCmBgYHtyIGNodW5rMDV9DQojIFlvdXIgYW5zd2VyIGhlcmUNCg0KYGBgDQoNCiMjIEZpbHRlciByb3dzIHdpdGggYHNsaWNlKClgDQoNClNvbWV0aW1lcyB5b3Ugd2FudCB0byBmbGl0ZXIgcm93cyBieSByb3cgbnVtYmVyLCByYXRoZXIgdGhhbiBhbiBhdHRyaWJ1dGUgY29sdW1uLiBZb3UgY2FuIGRvIHRoYXQgd2l0aCBgc2xpY2UoKWAuDQoNClNob3cgdGhlIGZpcnN0IDEwIHJvd3MgZnJvbSBpcmlzOg0KDQpgYGB7ciBjaHVuazA2fQ0KaXJpcyAlPiUgc2xpY2UoMToxMCkNCmBgYA0KDQojIyBGaWx0ZXIgY29sdW1ucyB3aXRoIGBzZWxlY3QoKWANCg0KVG8gZ2V0IGEgc3Vic2V0IG9mIGNvbHVtbnMsIG9yIGlmIHlvdSB3YW50IHRoZSBjb2x1bW5zIGluIGEgZGlmZmVyZW50IG9yZGVyLCBpbmNsdWRlIHRoZSBgc2VsZWN0KClgIGZ1bmN0aW9uIGluIHlvdXIgZXhwcmVzc2lvbi4NCg0KRm9yIGV4YW1wbGUgaWYgeW91IG9ubHkgd2FudCB0aGUgYFNlcGFsLkxlbmd0aGAgYW5kIGBTZXBhbC5XaWR0aGAgY29sdW1ucywgeW91IGNvdWxkIHJ1bjoNCg0KYGBge3IgY2h1bmswN30NCmlyaXMgJT4lIA0KICBmaWx0ZXIoU2VwYWwuTGVuZ3RoID4gNykgJT4lIA0KICBzZWxlY3QoU2VwYWwuTGVuZ3RoLCBTZXBhbC5XaWR0aCwgU3BlY2llcykNCmBgYA0KDQojIyBTb3J0IHJvd3Mgd2l0aCBgYXJyYW5nZSgpYA0KDQpBZGQgYGFycmFuZ2UoKWAgdG8geW91ciBleHByZXNzaW9uIHRvIHNvcnQgcm93cy4gRm9yIGV4YW1wbGUgdG8gc29ydCB0aGUgbGFzdCBleHByZXNzaW9uIGJ5IGBTZXBhbC5XaWR0aGAgd2Ugc2ltcGx5IGFkZCBvbmUgbW9yZSBwaWVjZSB0byB0aGUgZXhwcmVzc2lvbjoNCg0KYGBge3IgY2h1bmswOH0NCmlyaXMgJT4lIA0KICBmaWx0ZXIoU2VwYWwuTGVuZ3RoID4gNykgJT4lIA0KICBzZWxlY3QoU2VwYWwuTGVuZ3RoLCBTZXBhbC5XaWR0aCwgU3BlY2llcykgJT4lIA0KICBhcnJhbmdlKFNlcGFsLldpZHRoKQ0KYGBgDQoNClRvIHNvcnQgcm93cyBpbiBkZXNjZW5kaW5nIG9yZGVyLCB3cmFwIHRoZSBmaWVsZCBuYW1lIGluIGBkZXNjKClgOg0KDQpgYGB7ciBjaHVuazA5fQ0KaXJpcyAlPiUgDQogIGZpbHRlcihTZXBhbC5MZW5ndGggPiA3KSAlPiUgDQogIHNlbGVjdChTZXBhbC5MZW5ndGgsIFNlcGFsLldpZHRoLCBTcGVjaWVzKSAlPiUgDQogIGFycmFuZ2UoZGVzYyhTZXBhbC5XaWR0aCkpDQpgYGANCg0KIyMgQ0hBTExFTkdFOiBUb3AtbiBWYWx1ZXMNCg0KV3JpdGUgYW4gZXhwcmVzc2lvbiB0aGF0IHJldHVybnMgdGhlIHRvcC01IFBldGFsIExlbmd0aHMgKGFsbCBzcGVjaWVzKQ0KDQpIaW50OiB0aGVyZSBhcmUgYSBjb3VwbGUgb2Ygd2F5cyB0byBkbyB0aGlzLiBMb29rIGF0IHRoZSBkcGx5ciBbQ2hlYXRzaGVldF0oaHR0cHM6Ly9naXRodWIuY29tL3JzdHVkaW8vY2hlYXRzaGVldHMvcmF3L21hc3Rlci9kYXRhLXRyYW5zZm9ybWF0aW9uLnBkZikgZm9yIGluc3BpcmF0aW9uLiBbQW5zd2VyXShodHRwczovL2JpdC5seS8zZDlyVnNTKQ0KDQpgYGB7ciBjaHVuazEwfQ0KIyBZb3VyIGFuc3dlciBoZXJlDQoNCmBgYA0KDQojIyBDcmVhdGUgYSBuZXcgY29sdW1uIHdpdGggYG11dGF0ZSgpYA0KDQpDcmVhdGUgYSBuZXcgY29sdW1uIHdoaWNoIGNvbnRhaW5zIHRoZSBwcm9kdWN0IG9mIHRoZSBTZXBhbCBMZW5ndGggYW5kIFBldGFsIExlbmd0aC4gDQoNCmBgYHtyIGNodW5rMTF9DQppcmlzICU+JSBtdXRhdGUoc2VwYWxfcGV0YWxfcHJvZHVjdCA9IFNlcGFsLkxlbmd0aCAqIFBldGFsLkxlbmd0aCkNCmBgYA0KDQojIyBDSEFMTEVOR0U6IEZpbHRlciBvbiBNdWx0aXBsZSBDb2x1bW5zDQoNCldyaXRlIGFuIGV4cHJlc3Npb24gZnJvbSB0aGUgYHF1YWtlc2AgZGF0YSBmcmFtZSB0aGF0Og0KDQotIHJldHVybnMgdGhlIGxhdCwgbG9uZywgYW5kIG1hZyBjb2x1bW5zICANCi0gb25seSByZXR1cm4gcXVha2VzIHRoYXQgYXJlIGkpIG5vcnRoIG9mIDIwIGRlZ3JlZXMgU291dGgsIGFuZCBpaSkgbGFyZ2VyIHRoYW4gbWFnbml0dWRlIDUuMC4gDQotIHNvcnRzIHRoZSByb3dzIGJ5IG1hZ25pdHVkZQ0KDQpIaW50OiBGaWppIGlzIHNvdXRoIG9mIHRoZSBlcXVhdG9yLiBbQW5zd2VyXShodHRwczovL2JpdC5seS8zZmRmbExSKQ0KDQpgYGB7ciBjaHVuazEyfQ0KIyBZb3VyIGFuc3dlciBoZXJlDQoNCmBgYA0KDQojIyBHcm91cCBhbmQgU3VtbWFyaXplIFJvd3MNCg0KWW91IGNhbiAnZ3JvdXAnIHJvd3MgdXNpbmcgYGdyb3VwX2J5KClgLiBCeSBpdHNlbGYsIGdyb3VwaW5nIHJvd3MgZG9lc24ndCBkbyBtdWNoLiBVc3VhbGx5IHlvdSBmb2xsb3cgYGdyb3VwX2J5KClgIHdpdGggc3VtbWFyaXNlKCkgd2hpY2ggY3JlYXRlcyBzb21lIGtpbmQgb2Ygc3VtbWFyeSBmb3IgZWFjaCBncm91cC4NCg0KRXhhbXBsZTogZm9yIGVhY2ggU3BlY2llcyBpbiBpcmlzLCBjb21wdXRlIHRoZSBtZWFuIHBldGFsIGxlbmd0aCBhbmQgd2lkdGgNCg0KYGBge3IgY2h1bmsxM30NCmlyaXMgJT4lIGdyb3VwX2J5KFNwZWNpZXMpICU+JSANCiAgc3VtbWFyaXNlKGF2Z19wZXRhbF9sZW5ndGggPSBtZWFuKFBldGFsLkxlbmd0aCksIGF2Z19wZXRhbF93aWR0aCA9IG1lYW4oUGV0YWwuV2lkdGgpKQ0KYGBgDQoNCiMjIENIQUxMRU5HRTogR3JvdXAgYW5kIFN1bW1hcmlzZQ0KDQpGb3IgZWFjaCBzcGVjaWVzIGluIGlyaXMsIGNvbXB1dGUgdGhlIG1pbmltdW0gYW5kIG1heGltdW0gc2VwYWwgbGVuZ3RoLiBbQW5zd2VyXShodHRwczovL2JpdC5seS8yVXRobGFUKQ0KDQpgYGB7ciBjaHVuazE0fQ0KIyBZb3VyIGFuc3dlciBoZXJlDQoNCmBgYA0KDQojIyBFbmQNCg0KQ29uZ3JhdHVsYXRpb25zLCB5b3UgaGF2ZSBjb21wbGV0ZWQgdGhlIE5vdGVib29rISANCg0KVG8gdmlldyB5b3VyIE5vdGVib29rIGF0IEhUTUwsIHNhdmUgaXQgKGFnYWluKSwgdGhlbiBjbGljayB0aGUgJ1ByZXZpZXcnIGJ1dHRvbiBpbiB0aGUgUlN0dWRpbyB0b29sYmFyLg0KDQoNCg==