Chapter 6 Loading Data

As we learned before, you can see the data, variables, and other objects you’ve loaded and stored in R by checking your Environment tab. Right now, yours is most likely empty, and that’s okay! We’ve learned a lot, but we haven’t started doing much with data yet. We will soon, but first we need to learn how to get data into R so that we can start exploring it.

6.1 Data Files

Most of the time, you’ll want to work with something called a CSV file. CSV stands for Comma Separated Values, which means that the values within each observation (row) are separated by commas. Here’s an example CSV file that you can download. To see the raw file, you can open it in a plain-text editor like TextEdit on a Mac or Notepad on a PC. Usually, a CSV file will open by default in a spreadsheet program like Excel, because the commas make it easy to parse (read in and break up) into columns.
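To see why the commas matter, here’s a minimal sketch that parses a few lines of CSV text directly instead of a file. The column names and values are borrowed from the survey data used later in this chapter; the text argument (passed through to read.table()) lets read.csv() read from a string.

```r
# A tiny CSV written as plain text: the first line is the header row,
# and each following line is one observation with values separated by commas
csv_text <- "gender,height,schoolYear
Male,66,Sophomore
Female,66,Freshman"

# read.csv() splits each line at the commas to build the columns
students <- read.csv(text = csv_text)
students
```

Each comma marks a column boundary, so R ends up with three columns and two rows.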

6.2 Reading in CSV Files

To read a CSV file into R, there’s a great function called read.csv(). It reads your file in as a data frame. To use it, put the name of the file as a character string (remember your data types?) between the parentheses. This is the only required argument (an input to a function) that you need to supply, but there are a few others worth noting:

  • The stringsAsFactors argument takes a TRUE or FALSE value. When TRUE, any columns of character strings in the data are coerced to factors. Sometimes this is what you want, sometimes not. Its default depends on your version of R: it was TRUE before R 4.0.0 and is FALSE from R 4.0.0 onward, so it’s safest to set it explicitly.

  • The header argument indicates whether the CSV file has a header row containing names for all of the columns. It defaults to TRUE, but it’s a good idea to check your data and make sure one is actually present. You can check the column names, and rename them if you wish, with the names() function discussed in chapter 3.
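Here’s a small sketch of how these two arguments change what you get back. It again uses a short CSV string (with values borrowed from the survey data) in place of a real file.

```r
csv_text <- "gender,schoolYear,GPA
Male,Sophomore,3.5
Female,Freshman,3.8"

# Keep text columns as plain character vectors
as_chars <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)
class(as_chars$schoolYear)    # returns "character"

# Convert text columns to factors instead
as_factors <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE)
levels(as_factors$schoolYear) # returns "Freshman" "Sophomore"

# With header = FALSE, the header line is treated as data,
# and R makes up generic column names
no_header <- read.csv(text = csv_text, header = FALSE)
names(no_header)              # returns "V1" "V2" "V3"
nrow(no_header)               # returns 3 (the header line became a row)
```

Setting stringsAsFactors explicitly, as above, means your code behaves the same no matter which version of R runs it.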

Directories

One common issue is where the CSV file is located on your computer. If you give just the file name as the argument to read.csv(), R will look for it in your working directory, the default location where R looks for files. If your file isn’t in the working directory, you’ll get an error because R can’t find the file. If this is the case, you have a few options:

  1. You can supply the full filepath to the file as the string. Watch this video to learn how to find the full filepath on a Mac, or this one for a PC. While it may take more work to find, it ensures that R looks for the file in exactly the right place.

  2. You can change your working directory with the setwd() function, supplying the path to the directory as a character string argument. This still requires knowing where your file is located, but if you plan to work with multiple files in the same directory, this isn’t a bad option. You can check your current working directory by calling getwd() with no arguments.
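Here’s a minimal sketch of both options. So that it runs on any computer, it first builds a hypothetical project folder (my_project, holding data/example.csv) inside R’s temporary directory; with your own data you would skip that setup and use your real paths. It also borrows write.csv(), covered at the end of this chapter, to create the example file.

```r
# Where is R currently looking for files?
getwd()

# Setup: a hypothetical project folder with a data/ subdirectory
project_dir <- file.path(tempdir(), "my_project")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(project_dir, "data", "example.csv"),
          row.names = FALSE)

# Option 1: give read.csv() the full path to the file
full_path <- file.path(project_dir, "data", "example.csv")
file.exists(full_path)        # TRUE, so read.csv() will find it
df1 <- read.csv(full_path)

# Option 2: change the working directory, then use a path relative to it
old_wd <- setwd(project_dir)  # setwd() invisibly returns the previous directory
df2 <- read.csv("data/example.csv")
setwd(old_wd)                 # restore the original working directory
```

Saving the value returned by setwd() makes it easy to switch back when you’re done.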

Loading Example

Now let’s import that example CSV file. It contains the combined results from Survey 1 of STAT 100 and STAT 200. Remember to give the new data frame a name so that we can look at it and refer back to it. We’ll use the CSV file 'Combined Fall 2017 Survey 1.csv', which has a header row and is located in a subdirectory of our working directory called data (the directory that holds all of the data files for this book). We’ll keep character variables as characters, and we’ll call the new data frame survey1.
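A call matching that description might look like the sketch below. Because the real survey file isn’t bundled here, the sketch first creates a hypothetical stand-in: a temporary project folder whose data subdirectory holds a small file with the same name, containing only a few columns and rows copied from the head() output shown next.

```r
# Setup: a hypothetical stand-in for the book's data directory
project_dir <- file.path(tempdir(), "survey_project")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)
writeLines(c(
  "gender,genderID,height,weight,schoolYear,GPA",
  "Male,Male,66,200,Sophomore,3.5",
  "Female,Female,66,142,Freshman,3.8",
  "Male,Male,65,160,Freshman,3.9"
), file.path(project_dir, "data", "Combined Fall 2017 Survey 1.csv"))
old_wd <- setwd(project_dir)

# The import itself: header row present, character columns kept as characters
survey1 <- read.csv("data/Combined Fall 2017 Survey 1.csv",
                    header = TRUE, stringsAsFactors = FALSE)

setwd(old_wd)  # go back to the original working directory
```

With the real file in your own data directory, only the read.csv() line is needed.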

To see the first few observations (six, by default), use the head() function, passing the data frame’s name (in our case, survey1). This displays the observations in your Console, like this:

##   gender genderID height weight shoeSize schoolYear studyHr GPA ACT pets
## 1   Male     Male     66    200     10.5  Sophomore     2.5 3.5  24    1
## 2 Female   Female     66    142      7.5   Freshman     3.0 3.8  26    1
## 3   Male     Male     65    160     10.5   Freshman     3.0 3.9  33    6
## 4 Female   Female     68    118      7.5  Sophomore     3.0 3.9  28    0
## 5 Female   Female     61    173      9.0  Sophomore     0.5 2.8  21    1
## 6 Female   Female     66    125      8.0  Sophomore     2.5 2.3  20    0
##   siblings speed cash sleep shoeNums ageMother ageFather random   love
## 1        2    25    5   7.0        3        47        49      6    few
## 2        1    80   11   4.5       21        37        47      4    few
## 3        0     0   45   7.5        4        43        46      8    few
## 4        1   105    9   7.5       25        54        53      8    few
## 5        3    90    7   8.0       13        35        60      7    one
## 6        0    50    3   6.0       20        40        41      3 dozens
##   charity             movie             favTV1               favTV2
## 1      60 Life is Beautiful             Narcos            Daredevil
## 2      60         The Giant Como dice el dicho       Drake and Josh
## 3      10      Interstellar               none                 none
## 4      25        La La Land            Friends Parks and Recreation
## 5       0           Get Out    Attack on Titan       Rick and Morty
## 6      50    Hidden Figures     The Cosby Show    A Different World
##      section
## 1 Stat100_L1
## 2 Stat100_L1
## 3 Stat100_L1
## 4 Stat100_L1
## 5 Stat100_L1
## 6 Stat100_L1

For a cleaner view, use the View() function, again passing the name of the data frame as the argument. It opens the data in a spreadsheet-style viewer in your Source pane, which looks more like this (first six rows shown):

| gender | genderID | height | weight | shoeSize | schoolYear | studyHr | GPA | ACT | pets | siblings | speed | cash | sleep | shoeNums | ageMother | ageFather | random | love | charity | movie | favTV1 | favTV2 | section |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Male | Male | 66 | 200 | 10.5 | Sophomore | 2.5 | 3.5 | 24 | 1 | 2 | 25 | 5 | 7.0 | 3 | 47 | 49 | 6 | few | 60 | Life is Beautiful | Narcos | Daredevil | Stat100_L1 |
| Female | Female | 66 | 142 | 7.5 | Freshman | 3.0 | 3.8 | 26 | 1 | 1 | 80 | 11 | 4.5 | 21 | 37 | 47 | 4 | few | 60 | The Giant | Como dice el dicho | Drake and Josh | Stat100_L1 |
| Male | Male | 65 | 160 | 10.5 | Freshman | 3.0 | 3.9 | 33 | 6 | 0 | 0 | 45 | 7.5 | 4 | 43 | 46 | 8 | few | 10 | Interstellar | none | none | Stat100_L1 |
| Female | Female | 68 | 118 | 7.5 | Sophomore | 3.0 | 3.9 | 28 | 0 | 1 | 105 | 9 | 7.5 | 25 | 54 | 53 | 8 | few | 25 | La La Land | Friends | Parks and Recreation | Stat100_L1 |
| Female | Female | 61 | 173 | 9.0 | Sophomore | 0.5 | 2.8 | 21 | 1 | 3 | 90 | 7 | 8.0 | 13 | 35 | 60 | 7 | one | 0 | Get Out | Attack on Titan | Rick and Morty | Stat100_L1 |
| Female | Female | 66 | 125 | 8.0 | Sophomore | 2.5 | 2.3 | 20 | 0 | 0 | 50 | 3 | 6.0 | 20 | 40 | 41 | 3 | dozens | 50 | Hidden Figures | The Cosby Show | A Different World | Stat100_L1 |

Much better.

6.3 Writing Files

After you finish your analysis, you may wish to save the data frame(s) that you’ve created. Just as the read.csv() function lets you import a CSV file, the write.csv() function lets you write a data frame to a CSV file on your computer, so you can save it and share it as needed.
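Here’s a minimal sketch of a write-and-read-back round trip. The data frame and file names (results, results.csv) are hypothetical, and the files go into R’s temporary directory; in practice you would pass your own file path. It also shows why row.names = FALSE is a common choice: by default, write.csv() adds a column of row numbers, which comes back as an extra column named X when the file is read in again.

```r
# A small data frame to save (values borrowed from the survey output above)
results <- data.frame(
  schoolYear = c("Sophomore", "Freshman", "Freshman"),
  GPA = c(3.5, 3.8, 3.9)
)

# Write it without the extra row-number column
out_file <- file.path(tempdir(), "results.csv")
write.csv(results, out_file, row.names = FALSE)

# Read it back to confirm the round trip
check <- read.csv(out_file, stringsAsFactors = FALSE)
names(check)           # "schoolYear" "GPA"

# For comparison: the default (row.names = TRUE) writes row numbers,
# which read.csv() turns into a column named "X"
out_file2 <- file.path(tempdir(), "results_with_rownames.csv")
write.csv(results, out_file2)
with_rownames <- read.csv(out_file2)
names(with_rownames)   # "X" "schoolYear" "GPA"
```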