dplyr
is a package for manipulating data frames, including subsetting rows and columns, sorting rows, adding new columns, grouping rows, joins, etc.
In this notebook we’ll practice using the dplyr
package.
Set preferences with conflicted
Several dplyr
functions have very generic names. To avoid potential conflicts with functions from other packages, we’ll start by stating our preferences using the conflicted package:
library(dplyr)
Attaching package: 㤼㸱dplyr㤼㸲
The following objects are masked from 㤼㸱package:stats㤼㸲:
filter, lag
The following objects are masked from 㤼㸱package:base㤼㸲:
intersect, setdiff, setequal, union
## Load the conflicted package
library(conflicted)
# Set conflict preferences
conflict_prefer("filter", "dplyr", quiet = TRUE)
conflict_prefer("count", "dplyr", quiet = TRUE)
conflict_prefer("select", "dplyr", quiet = TRUE)
conflict_prefer("arrange", "dplyr", quiet = TRUE)
Remember the iris data frame? How many rows does iris have?
nrow(iris)
[1] 150
Filter rows with filter()
Write an expression that returns just those rows where Sepal.Length > 7:
# Select rows where Sepal.Length > 7
iris %>% filter(Sepal.Length > 7)
Add another criteria specifying the species has to be virginica
:
HINT: remember to use ‘==’ to test for equality
iris %>% filter(Sepal.Length > 7, Species=="virginica")
CHALLENGE: Filter
Write an expression that will return rows where the Petal.Length is between 4.1 and 4.4. Answer
# Your answer here
Filter rows with slice()
Sometimes you want to fliter rows by row number, rather than an attribute column. You can do that with slice()
.
Show the first 10 rows from iris:
iris %>% slice(1:10)
Filter columns with select()
To get a subset of columns, or if you want the columns in a different order, include the select()
function in your expression.
For example if you only want the Sepal.Length
and Sepal.Width
columns, you could run:
iris %>%
filter(Sepal.Length > 7) %>%
select(Sepal.Length, Sepal.Width, Species)
Sort rows with arrange()
Add arrange()
to your expression to sort rows. For example to sort the last expression by Sepal.Width
we simply add one more piece to the expression:
iris %>%
filter(Sepal.Length > 7) %>%
select(Sepal.Length, Sepal.Width, Species) %>%
arrange(Sepal.Width)
To sort rows in descending order, wrap the field name in desc()
:
iris %>%
filter(Sepal.Length > 7) %>%
select(Sepal.Length, Sepal.Width, Species) %>%
arrange(desc(Sepal.Width))
CHALLENGE: Top-n Values
Write an expression that returns the top-5 Petal Lengths (all species)
Hint: there are a couple of ways to do this. Look at the dplyr Cheatsheet for inspiration. Answer
# Your answer here
Create a new column with mutate()
Create a new column which contains the product of the Sepal Length and Petal Length.
iris %>% mutate(sepal_petal_product = Sepal.Length * Petal.Length)
CHALLENGE: Filter on Multiple Columns
Write an expression from the quakes
data frame that:
- returns the lat, long, and mag columns
- only return quakes that are i) north of 20 degrees South, and ii) larger than magnitude 5.0.
- sorts the rows by magnitude
Hint: Fiji is south of the equator. Answer
# Your answer here
Group and Summarize Rows
You can ‘group’ rows using group_by()
. By itself, grouping rows doesn’t do much. Usually you follow group_by()
with summarise() which creates some kind of summary for each group.
Example: for each Species in iris, compute the mean petal length and width
iris %>% group_by(Species) %>%
summarise(avg_petal_length = mean(Petal.Length), avg_petal_width = mean(Petal.Width))
CHALLENGE: Group and Summarise
For each species in iris, compute the minimum and maximum sepal length. Answer
# Your answer here
End
Congratulations, you have completed the Notebook!
To view your Notebook at HTML, save it (again), then click the ‘Preview’ button in the RStudio toolbar.
LS0tDQp0aXRsZTogIldyYW5nbGluZyAyRCBEYXRhIHdpdGggZHBseXIiDQpvdXRwdXQ6IA0KICBodG1sX25vdGVib29rOg0KICAgIHRvYzogeWVzDQogICAgdG9jX2Zsb2F0OiB5ZXMNCi0tLQ0KDQpgZHBseXJgIGlzIGEgcGFja2FnZSBmb3IgbWFuaXB1bGF0aW5nIGRhdGEgZnJhbWVzLCBpbmNsdWRpbmcgc3Vic2V0dGluZyByb3dzIGFuZCBjb2x1bW5zLCBzb3J0aW5nIHJvd3MsIGFkZGluZyBuZXcgY29sdW1ucywgZ3JvdXBpbmcgcm93cywgam9pbnMsIGV0Yy4NCg0KSW4gdGhpcyBub3RlYm9vayB3ZSdsbCBwcmFjdGljZSB1c2luZyB0aGUgYGRwbHlyYCBwYWNrYWdlLg0KDQojIyBTZXQgcHJlZmVyZW5jZXMgd2l0aCBgY29uZmxpY3RlZGANCg0KU2V2ZXJhbCBgZHBseXJgIGZ1bmN0aW9ucyBoYXZlIHZlcnkgZ2VuZXJpYyBuYW1lcy4gVG8gYXZvaWQgcG90ZW50aWFsIGNvbmZsaWN0cyB3aXRoIGZ1bmN0aW9ucyBmcm9tICBvdGhlciBwYWNrYWdlcywgd2UnbGwgc3RhcnQgYnkgc3RhdGluZyBvdXIgcHJlZmVyZW5jZXMgdXNpbmcgdGhlIGNvbmZsaWN0ZWQgcGFja2FnZToNCg0KYGBge3IgY2h1bmswMX0NCmxpYnJhcnkoZHBseXIpDQoNCiMjIExvYWQgdGhlIGNvbmZsaWN0ZWQgcGFja2FnZQ0KbGlicmFyeShjb25mbGljdGVkKQ0KDQojIFNldCBjb25mbGljdCBwcmVmZXJlbmNlcw0KY29uZmxpY3RfcHJlZmVyKCJmaWx0ZXIiLCAiZHBseXIiLCBxdWlldCA9IFRSVUUpDQpjb25mbGljdF9wcmVmZXIoImNvdW50IiwgImRwbHlyIiwgcXVpZXQgPSBUUlVFKQ0KY29uZmxpY3RfcHJlZmVyKCJzZWxlY3QiLCAiZHBseXIiLCBxdWlldCA9IFRSVUUpDQpjb25mbGljdF9wcmVmZXIoImFycmFuZ2UiLCAiZHBseXIiLCBxdWlldCA9IFRSVUUpDQpgYGANCg0KUmVtZW1iZXIgdGhlIGlyaXMgZGF0YSBmcmFtZT8gSG93IG1hbnkgcm93cyBkb2VzIGlyaXMgaGF2ZT8gDQoNCmBgYHtyIGNodW5rMDJ9DQpucm93KGlyaXMpDQpgYGANCg0KIyMgRmlsdGVyIHJvd3Mgd2l0aCBgZmlsdGVyKClgDQoNCldyaXRlIGFuIGV4cHJlc3Npb24gdGhhdCByZXR1cm5zIGp1c3QgdGhvc2Ugcm93cyB3aGVyZSBTZXBhbC5MZW5ndGggPiA3Og0KDQpgYGB7ciBjaHVuazAzfQ0KIyBTZWxlY3Qgcm93cyB3aGVyZSBTZXBhbC5MZW5ndGggPiA3DQppcmlzICU+JSBmaWx0ZXIoU2VwYWwuTGVuZ3RoID4gNykNCmBgYA0KDQpBZGQgYW5vdGhlciBjcml0ZXJpYSBzcGVjaWZ5aW5nIHRoZSBzcGVjaWVzIGhhcyB0byBiZSBgdmlyZ2luaWNhYDoNCg0KSElOVDogcmVtZW1iZXIgdG8gdXNlICc9PScgdG8gdGVzdCBmb3IgZXF1YWxpdHkNCg0KYGBge3IgY2h1bmswNH0NCmlyaXMgJT4lIGZpbHRlcihTZXBhbC5MZW5ndGggPiA3LCBTcGVjaWVzPT0idmlyZ2luaWNhIikNCmBgYA0KDQojIyBDSEFMTEVOR0U6IEZpbHRlcg0KDQpXcml0ZSBhbiBleHByZXNzaW9uIHRoYXQgd2lsbCByZXR1cm4gcm93cyB3aGVyZSB0aGUgUGV0YWwuTGVuZ3RoIGlzIGJldHdlZW4gNC4xIGFuZCA0LjQuIFtBbnN3ZXJdKGh0dHBzOi8vYml0Lmx5LzN0TlBWSUgpDQoNCmBgYHtyIGNodW5rMDV9DQojIFlvdXIgYW5zd2VyIGhlcmUNCg0KYGBgDQoNCiMjIEZpbHRlciByb3dzIHdpdGggYHNsaWNlKClgDQoNClNvbWV0aW1lcyB5b3Ugd2FudCB0byBmbGl0ZXIgcm93cyBieSByb3cgbnVtYmVyLCByYXRoZXIgdGhhbiBhbiBhdHRyaWJ1dGUgY29sdW1uLiBZb3UgY2FuIGRvIHRoYXQgd2l0aCBgc2xpY2UoKWAuDQoNClNob3cgdGhlIGZpcnN0IDEwIHJvd3MgZnJvbSBpcmlzOg0KDQpgYGB7ciBjaHVuazA2fQ0KaXJpcyAlPiUgc2xpY2UoMToxMCkNCmBgYA0KDQojIyBGaWx0ZXIgY29sdW1ucyB3aXRoIGBzZWxlY3QoKWANCg0KVG8gZ2V0IGEgc3Vic2V0IG9mIGNvbHVtbnMsIG9yIGlmIHlvdSB3YW50IHRoZSBjb2x1bW5zIGluIGEgZGlmZmVyZW50IG9yZGVyLCBpbmNsdWRlIHRoZSBgc2VsZWN0KClgIGZ1bmN0aW9uIGluIHlvdXIgZXhwcmVzc2lvbi4NCg0KRm9yIGV4YW1wbGUgaWYgeW91IG9ubHkgd2FudCB0aGUgYFNlcGFsLkxlbmd0aGAgYW5kIGBTZXBhbC5XaWR0aGAgY29sdW1ucywgeW91IGNvdWxkIHJ1bjoNCg0KYGBge3IgY2h1bmswN30NCmlyaXMgJT4lIA0KICBmaWx0ZXIoU2VwYWwuTGVuZ3RoID4gNykgJT4lIA0KICBzZWxlY3QoU2VwYWwuTGVuZ3RoLCBTZXBhbC5XaWR0aCwgU3BlY2llcykNCmBgYA0KDQojIyBTb3J0IHJvd3Mgd2l0aCBgYXJyYW5nZSgpYA0KDQpBZGQgYGFycmFuZ2UoKWAgdG8geW91ciBleHByZXNzaW9uIHRvIHNvcnQgcm93cy4gRm9yIGV4YW1wbGUgdG8gc29ydCB0aGUgbGFzdCBleHByZXNzaW9uIGJ5IGBTZXBhbC5XaWR0aGAgd2Ugc2ltcGx5IGFkZCBvbmUgbW9yZSBwaWVjZSB0byB0aGUgZXhwcmVzc2lvbjoNCg0KYGBge3IgY2h1bmswOH0NCmlyaXMgJT4lIA0KICBmaWx0ZXIoU2VwYWwuTGVuZ3RoID4gNykgJT4lIA0KICBzZWxlY3QoU2VwYWwuTGVuZ3RoLCBTZXBhbC5XaWR0aCwgU3BlY2llcykgJT4lIA0KICBhcnJhbmdlKFNlcGFsLldpZHRoKQ0KYGBgDQoNClRvIHNvcnQgcm93cyBpbiBkZXNjZW5kaW5nIG9yZGVyLCB3cmFwIHRoZSBmaWVsZCBuYW1lIGluIGBkZXNjKClgOg0KDQpgYGB7ciBjaHVuazA5fQ0KaXJpcyAlPiUgDQogIGZpbHRlcihTZXBhbC5MZW5ndGggPiA3KSAlPiUgDQogIHNlbGVjdChTZXBhbC5MZW5ndGgsIFNlcGFsLldpZHRoLCBTcGVjaWVzKSAlPiUgDQogIGFycmFuZ2UoZGVzYyhTZXBhbC5XaWR0aCkpDQpgYGANCg0KIyMgQ0hBTExFTkdFOiBUb3AtbiBWYWx1ZXMNCg0KV3JpdGUgYW4gZXhwcmVzc2lvbiB0aGF0IHJldHVybnMgdGhlIHRvcC01IFBldGFsIExlbmd0aHMgKGFsbCBzcGVjaWVzKQ0KDQpIaW50OiB0aGVyZSBhcmUgYSBjb3VwbGUgb2Ygd2F5cyB0byBkbyB0aGlzLiBMb29rIGF0IHRoZSBkcGx5ciBbQ2hlYXRzaGVldF0oaHR0cHM6Ly9naXRodWIuY29tL3JzdHVkaW8vY2hlYXRzaGVldHMvcmF3L21hc3Rlci9kYXRhLXRyYW5zZm9ybWF0aW9uLnBkZikgZm9yIGluc3BpcmF0aW9uLiBbQW5zd2VyXShodHRwczovL2JpdC5seS8zZDlyVnNTKQ0KDQpgYGB7ciBjaHVuazEwfQ0KIyBZb3VyIGFuc3dlciBoZXJlDQoNCmBgYA0KDQojIyBDcmVhdGUgYSBuZXcgY29sdW1uIHdpdGggYG11dGF0ZSgpYA0KDQpDcmVhdGUgYSBuZXcgY29sdW1uIHdoaWNoIGNvbnRhaW5zIHRoZSBwcm9kdWN0IG9mIHRoZSBTZXBhbCBMZW5ndGggYW5kIFBldGFsIExlbmd0aC4gDQoNCmBgYHtyIGNodW5rMTF9DQppcmlzICU+JSBtdXRhdGUoc2VwYWxfcGV0YWxfcHJvZHVjdCA9IFNlcGFsLkxlbmd0aCAqIFBldGFsLkxlbmd0aCkNCmBgYA0KDQojIyBDSEFMTEVOR0U6IEZpbHRlciBvbiBNdWx0aXBsZSBDb2x1bW5zDQoNCldyaXRlIGFuIGV4cHJlc3Npb24gZnJvbSB0aGUgYHF1YWtlc2AgZGF0YSBmcmFtZSB0aGF0Og0KDQotIHJldHVybnMgdGhlIGxhdCwgbG9uZywgYW5kIG1hZyBjb2x1bW5zICANCi0gb25seSByZXR1cm4gcXVha2VzIHRoYXQgYXJlIGkpIG5vcnRoIG9mIDIwIGRlZ3JlZXMgU291dGgsIGFuZCBpaSkgbGFyZ2VyIHRoYW4gbWFnbml0dWRlIDUuMC4gDQotIHNvcnRzIHRoZSByb3dzIGJ5IG1hZ25pdHVkZQ0KDQpIaW50OiBGaWppIGlzIHNvdXRoIG9mIHRoZSBlcXVhdG9yLiBbQW5zd2VyXShodHRwczovL2JpdC5seS8zZmRmbExSKQ0KDQpgYGB7ciBjaHVuazEyfQ0KIyBZb3VyIGFuc3dlciBoZXJlDQoNCmBgYA0KDQojIyBHcm91cCBhbmQgU3VtbWFyaXplIFJvd3MNCg0KWW91IGNhbiAnZ3JvdXAnIHJvd3MgdXNpbmcgYGdyb3VwX2J5KClgLiBCeSBpdHNlbGYsIGdyb3VwaW5nIHJvd3MgZG9lc24ndCBkbyBtdWNoLiBVc3VhbGx5IHlvdSBmb2xsb3cgYGdyb3VwX2J5KClgIHdpdGggc3VtbWFyaXNlKCkgd2hpY2ggY3JlYXRlcyBzb21lIGtpbmQgb2Ygc3VtbWFyeSBmb3IgZWFjaCBncm91cC4NCg0KRXhhbXBsZTogZm9yIGVhY2ggU3BlY2llcyBpbiBpcmlzLCBjb21wdXRlIHRoZSBtZWFuIHBldGFsIGxlbmd0aCBhbmQgd2lkdGgNCg0KYGBge3IgY2h1bmsxM30NCmlyaXMgJT4lIGdyb3VwX2J5KFNwZWNpZXMpICU+JSANCiAgc3VtbWFyaXNlKGF2Z19wZXRhbF9sZW5ndGggPSBtZWFuKFBldGFsLkxlbmd0aCksIGF2Z19wZXRhbF93aWR0aCA9IG1lYW4oUGV0YWwuV2lkdGgpKQ0KYGBgDQoNCiMjIENIQUxMRU5HRTogR3JvdXAgYW5kIFN1bW1hcmlzZQ0KDQpGb3IgZWFjaCBzcGVjaWVzIGluIGlyaXMsIGNvbXB1dGUgdGhlIG1pbmltdW0gYW5kIG1heGltdW0gc2VwYWwgbGVuZ3RoLiBbQW5zd2VyXShodHRwczovL2JpdC5seS8yVXRobGFUKQ0KDQpgYGB7ciBjaHVuazE0fQ0KIyBZb3VyIGFuc3dlciBoZXJlDQoNCmBgYA0KDQojIyBFbmQNCg0KQ29uZ3JhdHVsYXRpb25zLCB5b3UgaGF2ZSBjb21wbGV0ZWQgdGhlIE5vdGVib29rISANCg0KVG8gdmlldyB5b3VyIE5vdGVib29rIGF0IEhUTUwsIHNhdmUgaXQgKGFnYWluKSwgdGhlbiBjbGljayB0aGUgJ1ByZXZpZXcnIGJ1dHRvbiBpbiB0aGUgUlN0dWRpbyB0b29sYmFyLg0KDQoNCg==