Chapter 6 Loading Data
As we learned before, you can see the data, variables, and other objects you’ve loaded and stored in `R` by checking your `Environment` tab. Right now, yours is most likely empty, and that’s okay! We’ve learned a lot, but haven’t done much with it yet. We will soon, but first we need to learn how to get data into `R` so that we can start exploring it.
6.1 Data Files
Most of the time, you’ll want to work with something called a CSV file. CSV stands for Comma-Separated Values, which means that the values within each observation (row) are separated by commas. Here’s an example CSV file that you can download. To see the raw file, you can open it in a plain-text editor like TextEdit on a Mac or Notepad on a PC. By default, a CSV file will usually open in a spreadsheet program like Excel, since the commas make it easy to parse (read in and break up) into columns.
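To make the idea of "values separated by commas" concrete, here is a minimal sketch that writes a tiny CSV and reads back its raw text. The names and values are made up for illustration and are not from the survey file.

```r
# Write a tiny, made-up CSV to a temporary file (hypothetical data).
tiny_csv <- tempfile(fileext = ".csv")
writeLines(c("name,height,year",
             "Ana,66,Sophomore",
             "Ben,70,Freshman"),
           tiny_csv)

# Read the raw text back, one string per line, to see the commas.
raw_lines <- readLines(tiny_csv)
print(raw_lines)

# Splitting one line on its commas gives the individual values.
second_row <- strsplit(raw_lines[2], ",")[[1]]
print(second_row)
```

Each line of the file is one observation, and the first line holds the column names.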
6.2 Reading in CSV Files
To read a CSV into `R`, there’s a great function called `read.csv()`. It will read in your file as a data frame. To read in the file, you just need to put the name of the file as a character string (remember your data types?) in between the parentheses. This is the only required argument (an argument is an input to a function), but there are a few others worth noting:
- The `stringsAsFactors` argument can be either `TRUE` or `FALSE`. When `TRUE`, it takes any character strings in the data and coerces them to be factors. `TRUE` was this argument’s default value before R 4.0.0; since R 4.0.0 the default is `FALSE`. Sometimes factors may be okay, sometimes not, so it’s safest to set this argument explicitly.
- The `header` argument indicates whether or not a header row is present in the CSV file, which would contain names for all of the columns. It defaults to `TRUE`, but it’s a good idea to double-check your data and make sure that one’s present. You can check the names and rename them if you wish by using the `names()` function discussed in chapter 3.
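The effect of these two arguments can be sketched with a small inline example. The data below are made up, and the `text` argument is used here only so the example runs without a file on disk; with a real file you would pass its name instead.

```r
# A tiny, made-up CSV held in a string (hypothetical data).
csv_text <- "name,height,year
Ana,66,Sophomore
Ben,70,Freshman"

# Same data, read with and without converting strings to factors.
as_chr <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)
as_fct <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE)

class(as_chr$year)  # "character"
class(as_fct$year)  # "factor"

# Check the column names, then rename the second one.
names(as_chr)
names(as_chr)[2] <- "heightIn"

# With header = FALSE, the first line is treated as data,
# and R supplies default column names (V1, V2, V3).
no_header <- read.csv(text = csv_text, header = FALSE)
names(no_header)
nrow(no_header)
```

Notice that reading with `header = FALSE` gives one extra row, because the line of column names is counted as an observation.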
Directories
One common issue is knowing where on your computer the CSV is located. If you just type the name of the file as the argument for `read.csv()`, `R` will look for it in your working directory, which is its default file lookup location. If your file isn’t in the working directory, you’ll get an error message. If this is the case, you have a few options:
- You can supply the full filepath to the data as the string. Watch this video to learn how to find the full filepath on a Mac, or this one for a PC. While it may be more work to find, it removes any doubt about which file you’re importing.
- You can change your working directory with the `setwd()` function, supplying the path to the directory as a `character` string argument. This still requires knowing where your file is located, but if you plan to work with multiple files, this isn’t a bad option. You can check your current working directory with the `getwd()` function, which takes no arguments.
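Both options can be tried out safely in a temporary folder. The sketch below assumes a hypothetical project folder named `my_project` containing a `data` subdirectory and a made-up file `example.csv`; none of these names come from the book’s data.

```r
# Where is R currently looking for files?
getwd()

# Set up a hypothetical project folder with a `data` subdirectory.
project_dir <- file.path(tempdir(), "my_project")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(x = 1:3),
          file.path(project_dir, "data", "example.csv"),
          row.names = FALSE)

# Option 1: supply the full filepath.
full_path <- file.path(project_dir, "data", "example.csv")
file.exists(full_path)  # check before reading
from_full <- read.csv(full_path)

# Option 2: change the working directory, then use a relative path.
old_wd <- setwd(project_dir)  # setwd() returns the previous directory
from_relative <- read.csv("data/example.csv")
setwd(old_wd)                 # restore the original working directory
```

Calling `file.exists()` first is a quick way to find out whether a path is wrong before `read.csv()` reports an error.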
Loading Example
Now let’s import that example CSV file. It holds the combined results from Survey 1 of STAT 100 and STAT 200. Remember, we should assign the new data frame a name so that we can look at it and refer back to it. We’ll use the CSV file `'Combined Fall 2017 Survey 1.csv'`, which has a header row and is located in a subdirectory of our working directory called `data` (this is the directory that holds all of our data files for the book). We’ll keep `character` variables as characters, and when we import it, we’ll call it `survey1`.
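The import described above would look roughly like the following. This is a sketch that assumes the file sits in a `data` folder inside your working directory; the `file.exists()` guard and the `survey_path` variable are additions for illustration, so the code does not stop with an error if the file is somewhere else.

```r
# Relative path: the `data` subdirectory of the working directory.
survey_path <- file.path("data", "Combined Fall 2017 Survey 1.csv")

if (file.exists(survey_path)) {
  # The file has a header row, and we keep character variables as characters.
  survey1 <- read.csv(survey_path, header = TRUE, stringsAsFactors = FALSE)
} else {
  message("File not found. Check getwd() and the location of the CSV.")
}
```

Once the import succeeds, `survey1` should appear in your `Environment` tab.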
To see the first few observations, you can use the `head()` function, passing the data frame’s name (in our case, `survey1`). This will display the observations in your `Console`, and will look like this:
## gender genderID height weight shoeSize schoolYear studyHr GPA ACT pets
## 1 Male Male 66 200 10.5 Sophomore 2.5 3.5 24 1
## 2 Female Female 66 142 7.5 Freshman 3.0 3.8 26 1
## 3 Male Male 65 160 10.5 Freshman 3.0 3.9 33 6
## 4 Female Female 68 118 7.5 Sophomore 3.0 3.9 28 0
## 5 Female Female 61 173 9.0 Sophomore 0.5 2.8 21 1
## 6 Female Female 66 125 8.0 Sophomore 2.5 2.3 20 0
## siblings speed cash sleep shoeNums ageMother ageFather random love
## 1 2 25 5 7.0 3 47 49 6 few
## 2 1 80 11 4.5 21 37 47 4 few
## 3 0 0 45 7.5 4 43 46 8 few
## 4 1 105 9 7.5 25 54 53 8 few
## 5 3 90 7 8.0 13 35 60 7 one
## 6 0 50 3 6.0 20 40 41 3 dozens
## charity movie favTV1 favTV2
## 1 60 Life is Beautiful Narcos Daredevil
## 2 60 The Giant Como dice el dicho Drake and Josh
## 3 10 Interstellar none none
## 4 25 La La Land Friends Parks and Recreation
## 5 0 Get Out Attack on Titan Rick and Morty
## 6 50 Hidden Figures The Cosby Show A Different World
## section
## 1 Stat100_L1
## 2 Stat100_L1
## 3 Stat100_L1
## 4 Stat100_L1
## 5 Stat100_L1
## 6 Stat100_L1
For an easier-to-read display, use the `View()` function, again passing the name of the data frame as the argument. The data will display much more cleanly and clearly in your `Source` pane, looking more like this:
gender | genderID | height | weight | shoeSize | schoolYear | studyHr | GPA | ACT | pets | siblings | speed | cash | sleep | shoeNums | ageMother | ageFather | random | love | charity | movie | favTV1 | favTV2 | section |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Male | Male | 66 | 200 | 10.5 | Sophomore | 2.5 | 3.5 | 24 | 1 | 2 | 25 | 5 | 7.0 | 3 | 47 | 49 | 6 | few | 60 | Life is Beautiful | Narcos | Daredevil | Stat100_L1 |
Female | Female | 66 | 142 | 7.5 | Freshman | 3.0 | 3.8 | 26 | 1 | 1 | 80 | 11 | 4.5 | 21 | 37 | 47 | 4 | few | 60 | The Giant | Como dice el dicho | Drake and Josh | Stat100_L1 |
Male | Male | 65 | 160 | 10.5 | Freshman | 3.0 | 3.9 | 33 | 6 | 0 | 0 | 45 | 7.5 | 4 | 43 | 46 | 8 | few | 10 | Interstellar | none | none | Stat100_L1 |
Female | Female | 68 | 118 | 7.5 | Sophomore | 3.0 | 3.9 | 28 | 0 | 1 | 105 | 9 | 7.5 | 25 | 54 | 53 | 8 | few | 25 | La La Land | Friends | Parks and Recreation | Stat100_L1 |
Female | Female | 61 | 173 | 9.0 | Sophomore | 0.5 | 2.8 | 21 | 1 | 3 | 90 | 7 | 8.0 | 13 | 35 | 60 | 7 | one | 0 | Get Out | Attack on Titan | Rick and Morty | Stat100_L1 |
Female | Female | 66 | 125 | 8.0 | Sophomore | 2.5 | 2.3 | 20 | 0 | 0 | 50 | 3 | 6.0 | 20 | 40 | 41 | 3 | dozens | 50 | Hidden Figures | The Cosby Show | A Different World | Stat100_L1 |
Much better.
6.3 Writing Files
After you finish with your analysis, you may wish to save the data frame(s) that you’ve created. Similar to the `read.csv()` function that allows you to import a CSV, the `write.csv()` function will allow you to write your own CSV files to your computer to save and send as needed.
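As a minimal sketch, here is one way to write a data frame to a CSV and read it back. The data frame and the file name `results.csv` are made up for illustration, and the file is written to a temporary folder so nothing on your computer is overwritten.

```r
# A small, made-up data frame to save (hypothetical data).
results <- data.frame(name = c("Ana", "Ben"),
                      height = c(66, 70))

# Write it to a temporary folder. row.names = FALSE leaves out the
# column of row numbers that write.csv() would otherwise add.
out_path <- file.path(tempdir(), "results.csv")
write.csv(results, out_path, row.names = FALSE)

# Look at the raw file, then read it back in as a data frame.
readLines(out_path)
round_trip <- read.csv(out_path, stringsAsFactors = FALSE)
round_trip
```

To save the file somewhere permanent, replace `out_path` with a file name in your working directory or a full filepath.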