Importing Messy Excel Data
Although your data may be in a spreadsheet or CSV file, it may not be
in an ideal format. See this registration list:
Here we see a bunch of stuff before the data, the column “names” are
unwieldy, there are several extra columns we don’t need, missing values,
etc.
You can tackle many of these issues when you import the data. First,
we construct the file name and verify it exists:
reg_list_fn = "./data/SurveyResults_36100.xlsx"
file.exists(reg_list_fn)
[1] TRUE
To import Excel files, we’ll go to read_xlsx()
from the
readxl package. This function has a lot of options, so it’s
worth reading the help page:
library(readxl)
reg_list_tbl = read_xlsx(path = reg_list_fn,
sheet = "Survey Output",
skip = 3,
col_names = c("user", "reg_date", "fname", "lname",
"email", "title", "org",
"cosponsor", "allergies"),
col_types = c("text", "text", "text", "text", "text", "text",
"numeric", "text", "text"))
reg_list_tbl
Likewise, the readr
package provides super-charged functions for importing and exporting
rectangular data in text formats.
Take Home Messags
dplyr
methods
library(dplyr)
We begin by looking at the Palmer Penguins data:
library(palmerpenguins)
head(penguins)
dplyr provides glimpse()
which is an alternative way
preview a tibble:
glimpse(penguins)
Rows: 344
Columns: 8
$ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adelie,…
$ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgersen, Torgersen, Torgersen, Torgersen, Torgersen, Torg…
$ bill_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, 42.0, 37.8, 37.8, 41.1, 38.6, 34.6, 36.6, 38.7, 42.…
$ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, 20.2, 17.1, 17.3, 17.6, 21.2, 21.1, 17.8, 19.0, 20.…
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186, 180, 182, 191, 198, 185, 195, 197, 184, 194, 174,…
$ body_mass_g <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, 4250, 3300, 3700, 3200, 3800, 4400, 3700, 3450, 450…
$ sex <fct> male, female, female, NA, female, male, female, male, NA, NA, NA, NA, female, male, male, female, femal…
$ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2…
Subset Columns
Suppose we want to compare how bill length and flipper length look
for each species. We could start by selecting just those columns:
library(dplyr)
penguins %>%
select(species, bill_length_mm, flipper_length_mm) %>%
head()
You can also rename columns with
select()
(dplyr also has a rename()
function):
penguins %>%
select(species, bill = bill_length_mm, flipper = flipper_length_mm) %>%
head()
If you want to extract the values from a single column and get the
results back as a vector, use pull()
(similar to the
$
operator):
penguins %>% pull(species) %>% table()
.
Adelie Chinstrap Gentoo
152 68 124
Adding/Modifying Columns
To add a column, we can use mutate()
.
Let’s create a new column with the ratio of the bill to flipper
length:
peng_billflip_tbl <- penguins %>%
select(species, bill_length_mm, flipper_length_mm) %>%
mutate(bill_flip_ratio = bill_length_mm / flipper_length_mm)
head(peng_billflip_tbl)
mutate()
can also be used to modify existing columns,
for example we could round bill_flip_ratio to 3 decimal places.
peng_billflip_tbl <- peng_billflip_tbl %>%
mutate(bill_flip_ratio = round(bill_flip_ratio, 3))
head(peng_billflip_tbl)
CHALLENGE #1: Add a column called
bill_flip_sum
which represents the sum of the bill and
flipper length.
Hint: use mutate()
## Your answer here
penguins %>%
mutate(bill_flip_sum = bill_length_mm + flipper_length_mm) %>%
head()
Managing Rows
Sorting
You can sort a table using arrange()
,
passing the column(s) you want to sort by. The default sort order is
ascending, but you can change that to descending by wrapping the column
name in desc()
.
Sort by flipper size (smallest to largest):
peng_billflip_tbl %>%
arrange(flipper_length_mm) %>%
head()
Largest to smallest:
peng_billflip_tbl %>%
arrange(desc(flipper_length_mm)) %>%
head()
Subsetting
The two functions most commonly used to subset rows include
filter()
and slice()
.
Logical expressions you can use in filter()
include:
Suppose we only wanted to work with observations of Gentoo
penguins:
gentoo_billflip_tbl <- peng_billflip_tbl %>%
filter(species == "Gentoo")
glimpse(gentoo_billflip_tbl)
Rows: 124
Columns: 4
$ species <fct> Gentoo, Gentoo, Gentoo, Gentoo, Gentoo, Gentoo, Gentoo, Gentoo, Gentoo, Gentoo, Gentoo, Gentoo, Gentoo,…
$ bill_length_mm <dbl> 46.1, 50.0, 48.7, 50.0, 47.6, 46.5, 45.4, 46.7, 43.3, 46.8, 40.9, 49.0, 45.5, 48.4, 45.8, 49.3, 42.0, 4…
$ flipper_length_mm <int> 211, 230, 210, 218, 215, 210, 211, 219, 209, 215, 214, 216, 214, 213, 210, 217, 210, 221, 209, 222, 218…
$ bill_flip_ratio <dbl> 0.218, 0.217, 0.232, 0.229, 0.221, 0.221, 0.215, 0.213, 0.207, 0.218, 0.191, 0.227, 0.213, 0.227, 0.218…
If you want to filter on two criteria, add a second expression to
filter()
:
peng_billflip_tbl %>%
filter(species == "Gentoo", flipper_length_mm > 220) %>%
select(species, flipper_length_mm) %>%
glimpse()
Rows: 35
Columns: 2
$ species <fct> Gentoo, Gentoo, Gentoo, Gentoo, Gentoo, Gentoo, Gentoo, Gentoo, Gentoo, Gentoo, Gentoo, Gentoo, Gentoo,…
$ flipper_length_mm <int> 230, 221, 222, 222, 230, 225, 222, 225, 225, 224, 221, 231, 230, 229, 223, 221, 221, 230, 223, 221, 224…
Suppose we wanted to identify the top 5 Gentoo observations with the
largest bill_flip_ratio value. We can do that using a combination of
arrange()
and slice()
:
gentoo_billflip_tbl %>%
arrange(desc(bill_flip_ratio)) %>%
slice(1:5)
We could also use top_n()
:
gentoo_billflip_tbl %>%
top_n(5, bill_flip_ratio) %>%
arrange(desc(bill_flip_ratio))
Challenge Questions
Mix and match the following functions for the challenge
questions.
select()
, mutate()
, pull()
,
arrange()
, filter()
, slice()
CHALLENGE #2: Write an expression that pulls out
just the males on Torgersen Island. HOw many are there?
penguins %>%
filter(island == "Torgersen", sex == "male") %>%
head()
Ans. 23
CHALLENGE #3: COmpute the ratio of body mass to bill
length and add it as a new column. Return just the species, sex, and
this ratio.
penguins %>%
mutate(bm_bill_len_ratio = body_mass_g / bill_length_mm) %>%
select(species, sex, bm_bill_len_ratio) %>%
head()
Table Summaries
The main dplyr function for creating summaries of data is
summarise()
. The syntax is similar to
mutate()
. Summary columns can use any aggregate function
from base R or dplyr (see the dplyr
cheatsheet for a list of functions that are designed to use with
summarise()
).
Summarise an entire tibble
Compute the mean of several columns (using the
na.rm = TRUE
to ignore NA values):
penguins %>%
summarise(bill_len_mean = mean(bill_length_mm, na.rm = TRUE),
bill_depth_mean = mean(bill_depth_mm, na.rm = TRUE),
body_mass_mean = mean(body_mass_g, na.rm = TRUE))
Summarise groups of rows
Often you want to create summaries for groups of rows (i.e., a
different unit of analysis).
If we wanted to compute summary stats for each species, you would
first group the rows by species with group_by()
, and then
use summarise()
.
penguins %>%
group_by(species) %>%
summarise(bill_len_mean = mean(bill_length_mm, na.rm = TRUE),
bill_depth_mean = mean(bill_depth_mm, na.rm = TRUE),
body_mass_mean = mean(body_mass_g, na.rm = TRUE))
How many rows per island?
For this question, we can use the n()
function in
summarise()
:
penguins %>%
group_by(island) %>%
summarise(num_records = n())
CHALLENGE #4: Compute the number of observations per
year
penguins %>%
group_by(year) %>%
summarise(num_records = n())
CHALLENGE #5: Compute the number of observations per
year and species?
Hint: group_by()
can take more than one column
penguins %>%
group_by(year, species) %>%
summarise(num_records = n())
`summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
LS0tDQp0aXRsZTogIkV4ZXJjaXNlIDU6IERhdGEgV3JhbmdsaW5nIFB0IDEiDQpvdXRwdXQ6IA0KICBodG1sX25vdGVib29rOg0KICAgIHRvYzogeWVzDQogICAgdG9jX2Zsb2F0OiB5ZXMNCi0tLQ0KDQojIEltcG9ydGluZyBNZXNzeSBFeGNlbCBEYXRhDQoNCkFsdGhvdWdoIHlvdXIgZGF0YSBtYXkgYmUgaW4gYSBzcHJlYWRzaGVldCBvciBDU1YgZmlsZSwgaXQgbWF5IG5vdCBiZSBpbiBhbiBpZGVhbCBmb3JtYXQuIFNlZSB0aGlzIHJlZ2lzdHJhdGlvbiBsaXN0Og0KDQohW10oLi9pbWFnZXMvcmVnX2xpc3RfZXhjZWxfMTUwMHg2MDAucG5nKQ0KDQpcDQoNCkhlcmUgd2Ugc2VlIGEgYnVuY2ggb2Ygc3R1ZmYgYmVmb3JlIHRoZSBkYXRhLCB0aGUgY29sdW1uICJuYW1lcyIgYXJlIHVud2llbGR5LCB0aGVyZSBhcmUgc2V2ZXJhbCBleHRyYSBjb2x1bW5zIHdlIGRvbid0IG5lZWQsIG1pc3NpbmcgdmFsdWVzLCBldGMuDQoNCllvdSBjYW4gdGFja2xlIG1hbnkgb2YgdGhlc2UgaXNzdWVzIHdoZW4geW91IGltcG9ydCB0aGUgZGF0YS4gRmlyc3QsIHdlIGNvbnN0cnVjdCB0aGUgZmlsZSBuYW1lIGFuZCB2ZXJpZnkgaXQgZXhpc3RzOg0KDQpgYGB7ciBjaHVuazAxfQ0KcmVnX2xpc3RfZm4gPSAiLi9kYXRhL1N1cnZleVJlc3VsdHNfMzYxMDAueGxzeCINCmZpbGUuZXhpc3RzKHJlZ19saXN0X2ZuKQ0KYGBgDQoNClwNCg0KVG8gaW1wb3J0IEV4Y2VsIGZpbGVzLCB3ZSdsbCBnbyB0byBgcmVhZF94bHN4KClgIGZyb20gdGhlICpyZWFkeGwqIHBhY2thZ2UuIFRoaXMgZnVuY3Rpb24gaGFzIGEgbG90IG9mIG9wdGlvbnMsIHNvIGl0J3Mgd29ydGggcmVhZGluZyB0aGUgaGVscCBwYWdlOg0KDQpgYGB7ciBjaHVuazAyfQ0KbGlicmFyeShyZWFkeGwpDQpyZWdfbGlzdF90YmwgPSByZWFkX3hsc3gocGF0aCA9IHJlZ19saXN0X2ZuLCANCiAgICAgICAgICAgICAgICAgICAgICAgICBzaGVldCA9ICJTdXJ2ZXkgT3V0cHV0IiwgDQogICAgICAgICAgICAgICAgICAgICAgICAgc2tpcCA9IDMsIA0KICAgICAgICAgICAgICAgICAgICAgICAgIGNvbF9uYW1lcyA9IGMoInVzZXIiLCAicmVnX2RhdGUiLCAiZm5hbWUiLCAibG5hbWUiLCANCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICJlbWFpbCIsICJ0aXRsZSIsICJvcmciLCANCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICJjb3Nwb25zb3IiLCAiYWxsZXJnaWVzIiksDQogICAgICAgICAgICAgICAgICAgICAgICAgY29sX3R5cGVzID0gYygidGV4dCIsICJ0ZXh0IiwgInRleHQiLCAidGV4dCIsICJ0ZXh0IiwgInRleHQiLA0KICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIm51bWVyaWMiLCAidGV4dCIsICJ0ZXh0IikpDQoNCnJlZ19saXN0X3RibA0KYGBgDQoNClwNCg0KTGlrZXdpc2UsIHRoZSBbcmVhZHJdKGh0dHBzOi8vcmVhZHIudGlkeXZlcnNlLm9yZy8pIHBhY2thZ2UgcHJvdmlkZXMgc3VwZXItY2hhcmdlZCBmdW5jdGlvbnMgZm9yIGltcG9ydGluZyBhbmQgZXhwb3J0aW5nIHJlY3Rhbmd1bGFyIGRhdGEgaW4gdGV4dCBmb3JtYXRzLg0KDQpcDQoNCioqVGFrZSBIb21lIE1lc3NhZ3MqKg0KDQotIERhdGEgd3JhbmdsaW5nIHN0YXJ0cyB3aGVuIHlvdSBpbXBvcnQNCg0KLSBZb3UncmUgbm90IGxpbWl0ZWQgdG8ganVzdCB1c2luZyBgcmVhZC5jc3YoKWANCg0KXA0KDQojIGBkcGx5cmAgbWV0aG9kcw0KDQpgYGB7ciBjaHVuazAzfQ0KbGlicmFyeShkcGx5cikNCmBgYA0KDQpcDQoNCldlIGJlZ2luIGJ5IGxvb2tpbmcgYXQgdGhlIFBhbG1lciBQZW5ndWlucyBkYXRhOg0KDQpgYGB7ciBjaHVuazA0fQ0KbGlicmFyeShwYWxtZXJwZW5ndWlucykNCmhlYWQocGVuZ3VpbnMpDQpgYGANCg0KXA0KDQpkcGx5ciBwcm92aWRlcyBgZ2xpbXBzZSgpYCB3aGljaCBpcyBhbiBhbHRlcm5hdGl2ZSB3YXkgcHJldmlldyBhIHRpYmJsZToNCg0KYGBge3IgY2h1bmswNX0NCmdsaW1wc2UocGVuZ3VpbnMpDQpgYGANCg0KXA0KDQojIyBTdWJzZXQgQ29sdW1ucw0KDQpTdXBwb3NlIHdlIHdhbnQgdG8gY29tcGFyZSBob3cgYmlsbCBsZW5ndGggYW5kIGZsaXBwZXIgbGVuZ3RoIGxvb2sgZm9yIGVhY2ggc3BlY2llcy4gV2UgY291bGQgc3RhcnQgYnkgc2VsZWN0aW5nIGp1c3QgdGhvc2UgY29sdW1uczoNCg0KYGBge3IgY2h1bmswNn0NCmxpYnJhcnkoZHBseXIpDQpwZW5ndWlucyAlPiUgDQogIHNlbGVjdChzcGVjaWVzLCBiaWxsX2xlbmd0aF9tbSwgZmxpcHBlcl9sZW5ndGhfbW0pICU+JSANCiAgaGVhZCgpDQpgYGANCg0KXA0KDQpZb3UgY2FuIGFsc28gKipyZW5hbWUqKiBjb2x1bW5zIHdpdGggYHNlbGVjdCgpYCAoZHBseXIgYWxzbyBoYXMgYSBgcmVuYW1lKClgIGZ1bmN0aW9uKToNCg0KYGBge3IgY2h1bmswN30NCnBlbmd1aW5zICU+JSANCiAgc2VsZWN0KHNwZWNpZXMsIGJpbGwgPSBiaWxsX2xlbmd0aF9tbSwgZmxpcHBlciA9IGZsaXBwZXJfbGVuZ3RoX21tKSAlPiUgDQogIGhlYWQoKQ0KYGBgDQoNClwNCg0KSWYgeW91IHdhbnQgdG8gZXh0cmFjdCB0aGUgdmFsdWVzIGZyb20gYSBzaW5nbGUgY29sdW1uIGFuZCBnZXQgdGhlIHJlc3VsdHMgYmFjayBhcyBhIHZlY3RvciwgDQp1c2UgYHB1bGwoKWAgKHNpbWlsYXIgdG8gdGhlIGAkYCBvcGVyYXRvcik6DQoNCmBgYHtyIGNodW5rMDh9DQpwZW5ndWlucyAlPiUgcHVsbChzcGVjaWVzKSAlPiUgdGFibGUoKQ0KYGBgDQoNClwNCg0KIyMgQWRkaW5nL01vZGlmeWluZyBDb2x1bW5zDQoNClRvICoqYWRkIGEgY29sdW1uKiosIHdlIGNhbiB1c2UgYG11dGF0ZSgpYC4gTGV0J3MgY3JlYXRlIGEgbmV3IGNvbHVtbiB3aXRoIHRoZSByYXRpbyBvZiB0aGUgYmlsbCB0byBmbGlwcGVyIGxlbmd0aDoNCg0KYGBge3IgY2h1bmswOX0NCnBlbmdfYmlsbGZsaXBfdGJsIDwtIHBlbmd1aW5zICU+JSANCiAgc2VsZWN0KHNwZWNpZXMsIGJpbGxfbGVuZ3RoX21tLCBmbGlwcGVyX2xlbmd0aF9tbSkgJT4lDQogIG11dGF0ZShiaWxsX2ZsaXBfcmF0aW8gPSBiaWxsX2xlbmd0aF9tbSAvIGZsaXBwZXJfbGVuZ3RoX21tKQ0KICANCmhlYWQocGVuZ19iaWxsZmxpcF90YmwpDQpgYGANCg0KXA0KDQpgbXV0YXRlKClgIGNhbiBhbHNvIGJlIHVzZWQgdG8gbW9kaWZ5IGV4aXN0aW5nIGNvbHVtbnMsIGZvciBleGFtcGxlIHdlIGNvdWxkIHJvdW5kIGJpbGxfZmxpcF9yYXRpbyB0byAzIGRlY2ltYWwgcGxhY2VzLg0KDQpgYGB7ciBjaHVuazEwfQ0KcGVuZ19iaWxsZmxpcF90YmwgPC0gcGVuZ19iaWxsZmxpcF90YmwgJT4lIA0KICBtdXRhdGUoYmlsbF9mbGlwX3JhdGlvID0gcm91bmQoYmlsbF9mbGlwX3JhdGlvLCAzKSkNCg0KaGVhZChwZW5nX2JpbGxmbGlwX3RibCkNCmBgYA0KDQpcDQoNCioqQ0hBTExFTkdFICMxKio6IEFkZCBhIGNvbHVtbiBjYWxsZWQgYGJpbGxfZmxpcF9zdW1gIHdoaWNoIHJlcHJlc2VudHMgdGhlIHN1bSBvZiB0aGUgYmlsbCBhbmQgZmxpcHBlciBsZW5ndGguDQoNCkhpbnQ6IHVzZSBgbXV0YXRlKClgDQoNCmBgYHtyIGNodW5rMTF9DQojIyBZb3VyIGFuc3dlciBoZXJlDQoNCnBlbmd1aW5zICU+JSANCiAgbXV0YXRlKGJpbGxfZmxpcF9zdW0gPSBiaWxsX2xlbmd0aF9tbSArIGZsaXBwZXJfbGVuZ3RoX21tKSAlPiUgDQogIGhlYWQoKQ0KYGBgDQoNClwNCg0KIyBNYW5hZ2luZyBSb3dzDQoNCiMjIFNvcnRpbmcNCg0KWW91IGNhbiAqKnNvcnQqKiBhIHRhYmxlIHVzaW5nIGBhcnJhbmdlKClgLCBwYXNzaW5nIHRoZSBjb2x1bW4ocykgeW91IHdhbnQgdG8gc29ydCBieS4gVGhlIGRlZmF1bHQgc29ydCBvcmRlciBpcyBhc2NlbmRpbmcsIGJ1dCB5b3UgY2FuIGNoYW5nZSB0aGF0IHRvIGRlc2NlbmRpbmcgYnkgd3JhcHBpbmcgdGhlIGNvbHVtbiBuYW1lIGluIGBkZXNjKClgLg0KDQpTb3J0IGJ5IGZsaXBwZXIgc2l6ZSAoc21hbGxlc3QgdG8gbGFyZ2VzdCk6DQoNCmBgYHtyIGNodW5rMTJ9DQpwZW5nX2JpbGxmbGlwX3RibCAlPiUgDQogIGFycmFuZ2UoZmxpcHBlcl9sZW5ndGhfbW0pICU+JSANCiAgaGVhZCgpDQpgYGANCg0KXA0KDQpMYXJnZXN0IHRvIHNtYWxsZXN0Og0KDQpgYGB7ciBjaHVuazEzfQ0KcGVuZ19iaWxsZmxpcF90YmwgJT4lIA0KICBhcnJhbmdlKGRlc2MoZmxpcHBlcl9sZW5ndGhfbW0pKSAlPiUgDQogIGhlYWQoKQ0KYGBgDQoNClwNCg0KIyMgU3Vic2V0dGluZw0KDQpUaGUgdHdvIGZ1bmN0aW9ucyBtb3N0IGNvbW1vbmx5IHVzZWQgdG8gc3Vic2V0IHJvd3MgaW5jbHVkZSBgZmlsdGVyKClgIGFuZCBgc2xpY2UoKWAuDQoNCiAgLSB1c2UgYHNsaWNlKClgIHRvIHN1YnNldCBiYXNlZCBvbiBpbmRpY2VzDQoNCiAgLSB1c2UgYGZpbHRlcigpYCB0byBzdWJzZXQgYmFzZWQgb24gYSBsb2dpY2FsIGV4cHJlc3Npb25zDQoNClwNCg0KTG9naWNhbCBleHByZXNzaW9ucyB5b3UgY2FuIHVzZSBpbiBgZmlsdGVyKClgIGluY2x1ZGU6DQoNCiFbXSguL2ltYWdlcy9sb2dpY2FsLWV4cHJlc3Npb25zLTc0MHgyODB4MjU2LnBuZykNCg0KXA0KDQpTdXBwb3NlIHdlIG9ubHkgd2FudGVkIHRvIHdvcmsgd2l0aCBvYnNlcnZhdGlvbnMgb2YgKkdlbnRvbyogcGVuZ3VpbnM6DQoNCmBgYHtyIGNodW5rMTR9DQpnZW50b29fYmlsbGZsaXBfdGJsIDwtIHBlbmdfYmlsbGZsaXBfdGJsICU+JSANCiAgZmlsdGVyKHNwZWNpZXMgPT0gIkdlbnRvbyIpIA0KDQpnbGltcHNlKGdlbnRvb19iaWxsZmxpcF90YmwpDQpgYGANCg0KXA0KDQpJZiB5b3Ugd2FudCB0byBmaWx0ZXIgb24gdHdvIGNyaXRlcmlhLCBhZGQgYSBzZWNvbmQgZXhwcmVzc2lvbiB0byBgZmlsdGVyKClgOg0KDQpgYGB7ciBjaHVuazE1fQ0KcGVuZ19iaWxsZmxpcF90YmwgJT4lIA0KICBmaWx0ZXIoc3BlY2llcyA9PSAiR2VudG9vIiwgZmxpcHBlcl9sZW5ndGhfbW0gPiAyMjApICU+JSANCiAgc2VsZWN0KHNwZWNpZXMsIGZsaXBwZXJfbGVuZ3RoX21tKSAlPiUgDQogIGdsaW1wc2UoKQ0KYGBgDQoNClwNCg0KU3VwcG9zZSB3ZSB3YW50ZWQgdG8gaWRlbnRpZnkgdGhlIHRvcCA1IEdlbnRvbyBvYnNlcnZhdGlvbnMgd2l0aCB0aGUgbGFyZ2VzdCBiaWxsX2ZsaXBfcmF0aW8gdmFsdWUuIFdlIGNhbiBkbyB0aGF0IHVzaW5nIGEgY29tYmluYXRpb24gb2YgYGFycmFuZ2UoKWAgYW5kIGBzbGljZSgpYDoNCg0KYGBge3IgY2h1bmsxNn0NCmdlbnRvb19iaWxsZmxpcF90YmwgJT4lIA0KICBhcnJhbmdlKGRlc2MoYmlsbF9mbGlwX3JhdGlvKSkgJT4lIA0KICBzbGljZSgxOjUpIA0KYGBgDQoNClwNCg0KV2UgY291bGQgYWxzbyB1c2UgYHRvcF9uKClgOg0KDQpgYGB7ciBjaHVuazE3fQ0KZ2VudG9vX2JpbGxmbGlwX3RibCAlPiUgDQogIHRvcF9uKDUsIGJpbGxfZmxpcF9yYXRpbykgJT4lIA0KICBhcnJhbmdlKGRlc2MoYmlsbF9mbGlwX3JhdGlvKSkNCmBgYA0KDQpcDQoNCiMjIENoYWxsZW5nZSBRdWVzdGlvbnMNCg0KTWl4IGFuZCBtYXRjaCB0aGUgZm9sbG93aW5nIGZ1bmN0aW9ucyBmb3IgdGhlIGNoYWxsZW5nZSBxdWVzdGlvbnMuDQoNCmBzZWxlY3QoKWAsIGBtdXRhdGUoKWAsIGBwdWxsKClgLCBgYXJyYW5nZSgpYCwgYGZpbHRlcigpYCwgYHNsaWNlKClgDQoNClwNCg0KKipDSEFMTEVOR0UgIzIqKjogV3JpdGUgYW4gZXhwcmVzc2lvbiB0aGF0IHB1bGxzIG91dCBqdXN0IHRoZSBtYWxlcyBvbiBUb3JnZXJzZW4gSXNsYW5kLiBIT3cgbWFueSBhcmUgdGhlcmU/DQoNCmBgYHtyIGNodW5rMTh9DQpwZW5ndWlucyAlPiUgDQogIGZpbHRlcihpc2xhbmQgPT0gIlRvcmdlcnNlbiIsIHNleCA9PSAibWFsZSIpICU+JSANCiAgaGVhZCgpDQpgYGANCg0KQW5zLiAyMw0KDQpcDQoNCioqQ0hBTExFTkdFICMzKio6IENPbXB1dGUgdGhlIHJhdGlvIG9mIGJvZHkgbWFzcyB0byBiaWxsIGxlbmd0aCBhbmQgYWRkIGl0IGFzIGEgbmV3IGNvbHVtbi4gUmV0dXJuIGp1c3QgdGhlIHNwZWNpZXMsIHNleCwgYW5kIHRoaXMgcmF0aW8uDQoNCmBgYHtyIGNodW5rMTl9DQpwZW5ndWlucyAlPiUgDQogIG11dGF0ZShibV9iaWxsX2xlbl9yYXRpbyA9IGJvZHlfbWFzc19nIC8gYmlsbF9sZW5ndGhfbW0pICU+JSANCiAgc2VsZWN0KHNwZWNpZXMsIHNleCwgYm1fYmlsbF9sZW5fcmF0aW8pICU+JSANCiAgaGVhZCgpDQpgYGANCg0KXA0KDQojIE1pc3NpbmcgVmFsdWVzDQoNCkluIGdlbmVyYWwgeW91IGhhdmUgdGhyZWUgb3B0aW9ucyBmb3IgbWlzc2luZyB2YWx1ZXM6DQoNCjEuIFJlbW92ZSBhbnkgcm93IHRoYXQgaGFzIG1pc3NpbmcgdmFsdWVzIGZyb20gdGhlIGRhdGENCjIuIExlYXZlIGFsbCByb3dzIGJ1dCBpZ25vcmUgTkEgdmFsdWVzIHdoZW4geW91IGNvbXB1dGUgc3VtbWFyaWVzIChsaWtlIG1lYW4pDQozLiBFc3RpbWF0ZSBvciBpbXB1dGUgdGhlbQ0KDQpcDQoNClN1cHBvc2Ugd2Ugd2FudGVkIHRvIGtlZXAgb25seSBvYnNlcnZhdGlvbnMgdGhhdCBoYXZlIG5vIG1pc3NpbmcgdmFsdWVzIGZvciBmbGlwcGVyX2xlbmd0aF9tbSwgYm9keV9tYXNzX2csIGFuZCBzZXguDQoNCmBgYHtyIGNodW5rMjB9DQpwZW5ndWlucyAlPiUgDQogIHNlbGVjdChzcGVjaWVzLCBmbGlwcGVyX2xlbmd0aF9tbSwgYm9keV9tYXNzX2csIHNleCkgJT4lIA0KICBmaWx0ZXIoIWlzLm5hKGZsaXBwZXJfbGVuZ3RoX21tKSwgIWlzLm5hKGJvZHlfbWFzc19nKSwgIWlzLm5hKHNleCkpICU+JSANCiAgaGVhZCgpDQpgYGANCg0KXA0KDQojIFRhYmxlIFN1bW1hcmllcw0KDQpUaGUgbWFpbiBkcGx5ciBmdW5jdGlvbiBmb3IgY3JlYXRpbmcgc3VtbWFyaWVzIG9mIGRhdGEgaXMgYHN1bW1hcmlzZSgpYC4gVGhlIHN5bnRheCBpcyBzaW1pbGFyIHRvIGBtdXRhdGUoKWAuIFN1bW1hcnkgY29sdW1ucyBjYW4gdXNlIGFueSBhZ2dyZWdhdGUgZnVuY3Rpb24gZnJvbSBiYXNlIFIgb3IgZHBseXIgKHNlZSB0aGUgW2RwbHlyIGNoZWF0c2hlZXRdKGh0dHBzOi8vcmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbS9yc3R1ZGlvL2NoZWF0c2hlZXRzL21haW4vZGF0YS10cmFuc2Zvcm1hdGlvbi5wZGYpIGZvciBhIGxpc3Qgb2YgZnVuY3Rpb25zIHRoYXQgYXJlIGRlc2lnbmVkIHRvIHVzZSB3aXRoIGBzdW1tYXJpc2UoKWApLg0KDQpcDQoNCiMjIFN1bW1hcmlzZSBhbiBlbnRpcmUgdGliYmxlDQoNCkNvbXB1dGUgdGhlIG1lYW4gb2Ygc2V2ZXJhbCBjb2x1bW5zICh1c2luZyB0aGUgYG5hLnJtID0gVFJVRWAgdG8gaWdub3JlIE5BIHZhbHVlcyk6DQoNCmBgYHtyIGNodW5rMjF9DQpwZW5ndWlucyAlPiUgDQogIHN1bW1hcmlzZShiaWxsX2xlbl9tZWFuID0gbWVhbihiaWxsX2xlbmd0aF9tbSwgbmEucm0gPSBUUlVFKSwgDQogICAgICAgICAgICBiaWxsX2RlcHRoX21lYW4gPSBtZWFuKGJpbGxfZGVwdGhfbW0sIG5hLnJtID0gVFJVRSksIA0KICAgICAgICAgICAgYm9keV9tYXNzX21lYW4gPSBtZWFuKGJvZHlfbWFzc19nLCBuYS5ybSA9IFRSVUUpKQ0KYGBgDQoNClwNCg0KIyMgU3VtbWFyaXNlIGdyb3VwcyBvZiByb3dzDQoNCk9mdGVuIHlvdSB3YW50IHRvIGNyZWF0ZSBzdW1tYXJpZXMgZm9yIGdyb3VwcyBvZiByb3dzIChpLmUuLCBhIGRpZmZlcmVudCB1bml0IG9mIGFuYWx5c2lzKS4gDQoNCklmIHdlIHdhbnRlZCB0byBjb21wdXRlIHN1bW1hcnkgc3RhdHMgZm9yIGVhY2ggc3BlY2llcywgeW91IHdvdWxkIGZpcnN0IGdyb3VwIHRoZSByb3dzIGJ5IHNwZWNpZXMgd2l0aCBgZ3JvdXBfYnkoKWAsIGFuZCB0aGVuIHVzZSBgc3VtbWFyaXNlKClgLg0KDQpgYGB7ciBjaHVuazIyfQ0KcGVuZ3VpbnMgJT4lIA0KICBncm91cF9ieShzcGVjaWVzKSAlPiUgDQogIHN1bW1hcmlzZShiaWxsX2xlbl9tZWFuID0gbWVhbihiaWxsX2xlbmd0aF9tbSwgbmEucm0gPSBUUlVFKSwgDQogICAgICAgICAgICBiaWxsX2RlcHRoX21lYW4gPSBtZWFuKGJpbGxfZGVwdGhfbW0sIG5hLnJtID0gVFJVRSksIA0KICAgICAgICAgICAgYm9keV9tYXNzX21lYW4gPSBtZWFuKGJvZHlfbWFzc19nLCBuYS5ybSA9IFRSVUUpKQ0KYGBgDQoNClwNCg0KKipIb3cgbWFueSByb3dzIHBlciBpc2xhbmQ/KioNCg0KRm9yIHRoaXMgcXVlc3Rpb24sIHdlIGNhbiB1c2UgdGhlIGBuKClgIGZ1bmN0aW9uIGluIGBzdW1tYXJpc2UoKWA6DQoNCmBgYHtyIGNodW5rMjN9DQpwZW5ndWlucyAlPiUgDQogIGdyb3VwX2J5KGlzbGFuZCkgJT4lIA0KICBzdW1tYXJpc2UobnVtX3JlY29yZHMgPSBuKCkpDQpgYGANCg0KXA0KDQoqKkNIQUxMRU5HRSAjNCoqOiBDb21wdXRlIHRoZSBudW1iZXIgb2Ygb2JzZXJ2YXRpb25zIHBlciB5ZWFyDQoNCmBgYHtyIGNodW5rMjR9DQpwZW5ndWlucyAlPiUgDQogIGdyb3VwX2J5KHllYXIpICU+JSANCiAgc3VtbWFyaXNlKG51bV9yZWNvcmRzID0gbigpKQ0KYGBgDQoNClwNCg0KKipDSEFMTEVOR0UgIzUqKjogQ29tcHV0ZSB0aGUgbnVtYmVyIG9mIG9ic2VydmF0aW9ucyBwZXIgeWVhciBhbmQgc3BlY2llcz8NCg0KSGludDogYGdyb3VwX2J5KClgIGNhbiB0YWtlIG1vcmUgdGhhbiBvbmUgY29sdW1uDQoNCmBgYHtyIGNodW5rMjV9DQpwZW5ndWlucyAlPiUgDQogIGdyb3VwX2J5KHllYXIsIHNwZWNpZXMpICU+JSANCiAgc3VtbWFyaXNlKG51bV9yZWNvcmRzID0gbigpKQ0KYGBgDQoNClwNCg0KIyBFTkQNCg0KVGhhdCdzIGl0IQ0KDQo=