Andy Lyons
September 27, 2023
Move in the direction of becoming functional with R!!
1) Understand foundational terms and concepts
2) Hands-on practice
3) Discover RStudio’s bells and whistles
4) Learn how to get help
1) Watch now, practice later
2) Review what we cover within 24 hours
Date | Session |
---|---|
Sep. 27, 2023 10:00a - 12:00p | Part 1. Getting Started |
Oct. 4, 2023 10:00a - 12:00p | Part 2. Packages, Functions, and Importing Data |
Oct. 11, 2023 10:00a - 12:00p | Part 3. Data Wrangling |
Oct. 18, 2023 10:00a - 12:00p | Part 4. Automation and ggplot |
See also Getting Started with R resources.
Why is R So Popular?
It’s free!
Huge user community (especially academics)
Thousands of add-ons (packages) that extend its capabilities
Particularly strong in plotting and reporting
Once you get over the initial hump, can work very efficiently
Makes it easy to get your code “out there”
Solid overall programming language
Exercise 1 Topics
RStudio Cloud project for this workshop:
https://posit.cloud/content/6638058
After it opens:
Key vocabulary terms are in italic.
When you enter an expression at the console, R will evaluate the expression, and print the results at the console.
If you enter an incomplete expression, R will prompt you to finish it by showing a ‘+’ symbol at the console.
You can save the result of an expression to an object (variable) using an assignment operator: = or <-
R objects can be named almost anything (but no spaces or hyphens please)
R is case sensitive about everything
Once defined, R objects can be used in subsequent expressions
R objects can be updated (assigned a new value)
R objects are only saved in memory, and will disappear when you close RStudio
Comparison operators return TRUE or FALSE (aka Logical values)
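The points above can be tried at the console with a minimal sketch like the one below. The object names (x, y, X) are hypothetical and chosen only for illustration.

```r
# Evaluate an expression; at the console R prints the result
3 + 4

# Save a result to an object with an assignment operator
x <- 3 + 4      # x = 3 + 4 would also work here
y <- x * 2      # objects can be used in later expressions

# Update an object by assigning it a new value
x <- x + 1

# R is case sensitive: x and X are two different objects
X <- 100

# Comparison operators return logical values (TRUE or FALSE)
x_is_bigger <- x > y
x_equals_8 <- x == 8
```

Typing an object's name on its own line (for example x) prints its current value.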
The rules for naming objects are pretty flexible. You can use numbers, letters, and most special characters.
A few rules to take note of:
There are a handful of popular naming styles. Pick one that you like, and be consistent!
Style | Example |
---|---|
alllowercase | adjustcolor |
period.separated | shoe.size |
underscore_separated (aka snake case) | numeric_version |
lowerCamelCase | addTaskCallback |
UpperCamelCase | SignatureMethod |
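As a rough illustration of the naming styles in the table, the sketch below creates one object per style. All object names are hypothetical. The commented-out lines show names that R would reject, and make.names() (a base R function) shows how R itself would repair them.

```r
# Valid names, one per naming style (hypothetical objects)
shoesize  <- 9      # alllowercase
shoe.size <- 9      # period.separated
shoe_size <- 9      # underscore_separated (snake case)
shoeSize  <- 9      # lowerCamelCase
ShoeSize  <- 9      # UpperCamelCase

# These would fail if uncommented: names cannot contain spaces or
# hyphens, and cannot start with a number
# shoe size <- 9
# shoe-size <- 9
# 2shoes <- 9

# make.names() converts invalid names into syntactically valid ones
fixed_names <- make.names(c("shoe size", "shoe-size", "2shoes"))
fixed_names
```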
All variables have a class or data type, which you can view using class().
Other common data types:
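A small sketch of class() applied to a few common data types follows. The values and object names are made up for illustration.

```r
# class() reports the data type of an object
num_class  <- class(42)           # "numeric"
chr_class  <- class("oak")        # "character"
lgl_class  <- class(TRUE)         # "logical"
int_class  <- class(42L)          # "integer" (the L suffix marks an integer)
date_class <- class(Sys.Date())   # "Date"

# Print the results together
c(num_class, chr_class, lgl_class, int_class, date_class)
```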
Vectors are R objects that contain multiple values of the same class.
Example:
More examples:
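The original example code is not shown here, so the following is only a sketch of what a vector looks like in practice. The names tree_heights and species are hypothetical.

```r
# A numeric vector with four values
tree_heights <- c(12.5, 8.2, 15.1, 9.7)
n_heights <- length(tree_heights)       # number of elements
heights_class <- class(tree_heights)    # one class shared by all elements

# A character vector
species <- c("oak", "pine", "fir")
species_class <- class(species)
```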
In general, you need to use a function or operator to create a vector.
Sequence of numbers with the : operator:
Repeat function:
## [1] "Quercus lobata" "Quercus lobata" "Quercus lobata" "Quercus lobata"
## [5] "Quercus lobata"
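A sketch of the sequence operator and rep() is shown below. The rep() call is a guess at what produced the output above (it yields the same five values), and seq() is included as a related base R function. Object names are hypothetical.

```r
# The : operator creates a sequence of whole numbers
one_to_ten <- 1:10
countdown  <- 5:1          # sequences can also count down

# seq() gives more control over the step size
quarters <- seq(0, 1, by = 0.25)

# rep() repeats a value a given number of times
trees <- rep("Quercus lobata", 5)
trees
```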
Combine elements of the same class with c():
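A minimal sketch of c() follows, with hypothetical object names. The last lines illustrate why the elements should share a class: when classes are mixed, R converts everything to a single common class.

```r
# Combine individual values into a vector
ages <- c(34, 27, 51)

# c() can also combine existing vectors with new values
more_ages <- c(ages, 19, 42)

# Mixing classes forces R to coerce every element to one class
mixed <- c(1, "a", TRUE)
mixed_class <- class(mixed)   # "character"
```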
Some built-in constants are also vectors:
## [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
## [20] "T" "U" "V" "W" "X" "Y" "Z"
## [1] "AL" "AK" "AZ" "AR" "CA" "CO" "CT" "DE" "FL" "GA" "HI" "ID" "IL" "IN" "IA"
## [16] "KS" "KY" "LA" "ME" "MD" "MA" "MI" "MN" "MS" "MO" "MT" "NE" "NV" "NH" "NJ"
## [31] "NM" "NY" "NC" "ND" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN" "TX" "UT" "VT"
## [46] "VA" "WA" "WV" "WI" "WY"
## [1] "January" "February" "March" "April" "May" "June"
## [7] "July" "August" "September" "October" "November" "December"
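The three outputs above match the built-in constants LETTERS, state.abb, and month.name. The sketch below prints them and pulls out a few values; the object names first_five and n_states are hypothetical.

```r
LETTERS       # uppercase letters A to Z
state.abb     # two-letter US state abbreviations
month.name    # full month names

# Built-in constants behave like any other vector
first_five <- LETTERS[1:5]
n_states   <- length(state.abb)
```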
Random number functions:
## [1] 0.17782436 0.64642744 0.89121214 0.70146090 0.79845666 0.26657743
## [7] 0.63596431 0.19446843 0.40912918 0.69756948 0.24045534 0.87716139
## [13] 0.67787150 0.91604563 0.79698219 0.53838195 0.64656791 0.19914492
## [19] 0.09933631 0.99757726
## [1] 0.314045995 -2.168952965 -1.035106500 2.050484172 -0.893925389
## [6] -0.004147209 0.794782381 0.919572292 1.473193529 1.972762083
## [11] 0.142524334 -0.451737784 0.767048058 -1.034922964 -0.210531267
## [16] -0.528755024 0.016250052 1.439573960 -0.156178807 -0.359327657
## [1] "Aug" "Apr" "Feb"
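The calls that produced the output above are not shown, but it is consistent with runif(), rnorm(), and sample(). The sketch below assumes those three functions; set.seed() is an optional addition that makes the random results repeatable, and the seed value is arbitrary.

```r
set.seed(42)   # optional: same random numbers on every run

# 20 random numbers drawn uniformly between 0 and 1
u <- runif(20)

# 20 random numbers from a normal distribution (mean 0, sd 1)
z <- rnorm(20)

# 3 month abbreviations picked at random, without repeats
picks <- sample(month.abb, 3)
picks
```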
Many R functions and math operators are vectorized (i.e., operate on each individual element).
First we create two numeric vectors:
Are sin() & cos() vectorized?
## [1] 0.0000000 0.8414710 0.9092974 0.1411200 -0.7568025
## [1] 1.0000000 0.5403023 -0.4161468 -0.9899925 -0.6536436
Addition (like the other math operators) is vectorized:
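A sketch of vectorized functions and operators is below. The vector x <- 0:4 reproduces the sin() and cos() output shown above; y and the other object names are hypothetical.

```r
# Two numeric vectors of the same length
x <- 0:4
y <- c(10, 20, 30, 40, 50)

# sin() and cos() are applied to each element in turn
sin_x <- sin(x)
cos_x <- cos(x)

# Arithmetic operators work element by element
total   <- x + y    # first with first, second with second, ...
doubled <- x * 2    # a single value is recycled across the vector
```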
Functions that accept a vector and return a single value are called aggregate functions.
## [1] 0.26211876 0.12979274 0.71034325 0.71292080 0.13439383 0.57130562
## [7] 0.33741735 0.41426845 0.92729550 0.53577323 0.77293588 0.63427758
## [13] 0.02293783 0.38646294 0.31318852 0.99781761 0.36043713 0.75702638
## [19] 0.73384852 0.06540885
Most descriptive stats functions are aggregate:
Other aggregate functions:
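The specific functions used in the workshop are not shown, so the sketch below simply picks a few common aggregate functions from base R. The seed and object names are hypothetical.

```r
set.seed(1)       # optional: repeatable random numbers
x <- runif(20)    # 20 random values between 0 and 1

# Descriptive statistics: each returns a single value
x_mean   <- mean(x)
x_median <- median(x)
x_sd     <- sd(x)

# Other aggregate functions
x_sum <- sum(x)
x_min <- min(x)
x_max <- max(x)
x_n   <- length(x)
```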
To extract a single element from a vector, use square bracket notation. Inside the square brackets, put the index of the element(s) you want.
To return multiple elements, pass a vector of indices.
You can also use square brackets to extract elements in a different order.
You can also put a vector of logical values (TRUE/FALSE) in the brackets. R will return the elements whose positions correspond to TRUE.
## [1] "A" "B" "C" "D" "E" "F" "G" "H"
Better still, use an expression that returns a vector of logical values:
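The indexing techniques above can be sketched as follows. LETTERS is a built-in constant; the vector x and the result names are hypothetical.

```r
# A single element, by position (R counts from 1)
third <- LETTERS[3]

# Multiple elements: pass a vector of indices
several <- LETTERS[c(1, 5, 9)]

# Elements in a different order
reversed <- LETTERS[3:1]

# A logical vector keeps the elements that line up with TRUE
x <- c(5, 12, 8, 20)
kept <- x[c(TRUE, FALSE, TRUE, FALSE)]

# An expression that returns logical values does the same job
big <- x[x > 10]
```

The last form is usually the most practical, because the logical vector is computed from the data rather than typed by hand.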
Base R has simple plotting functions you can use to view the distribution of data.
To make a histogram, use hist():
Prefer a box plot?
The versatile plot() can be used to make a simple scatter plot:
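A sketch of the three plotting functions is below, using simulated data because the workshop's data is not shown. The variables heights and weights, and the numbers used to generate them, are invented for illustration.

```r
set.seed(7)   # optional: repeatable random numbers

# Simulated data (hypothetical values)
heights <- rnorm(100, mean = 170, sd = 10)
weights <- 0.5 * heights + rnorm(100, sd = 5)

# Histogram of one numeric vector
h <- hist(heights)

# Box plot of the same data
boxplot(heights)

# Scatter plot of two numeric vectors of equal length
plot(heights, weights)
```

In RStudio the plots appear in the Plots pane, one replacing the other.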
Exercise 2 Topics
Vectors are R objects that contain multiple values of the same class
Some functions that return vectors: c(), seq(), rnorm(), sample()
You can build vectors from scratch using: c()
Functions and operators that operate on each element of a vector and return another vector are said to be vectorized: round(), abs(), + - * /
Functions that take multiple elements of a vector and spit out a single value are said to be aggregate functions: sum(), min(), mean(), max()
You can plot the distribution of numeric data using plotting functions like: hist(), boxplot(), plot()
Top five advantages of using scripts over the console: