Chapter 16 for Loops

In order to clean up our datasets, we’ll have to make use of control structures. This sounds super technical, but really what they help us do is navigate our way through the data in an organized, controlled fashion. They can do anything from helping us perform a series of operations on each observation to doing different operations based on criteria that we determine. There are two main components to control structures: for loops and conditional statements.

for loops allow us to iterate, or cycle through, our dataset. Inside a for loop, we can create variables, do calculations, or even put other for loops (this is called nesting, and we’ll come back to it in a little bit).

We’ll do a few simple examples of for loops just to demonstrate their usefulness. Let’s take a vector of numbers called v1 and use a for loop to do something simple: add 1 to each of the numbers.

16.1 Algorithms

Now it’s time to think like a computer and devise an algorithm, or set of strict steps. “Legal” operations are anything you’d be able to do by hand, such as:

  • “Getting” a number, which just means seeing what the number is

  • “Saving” a number, since you could write it down on a piece of paper to remember it

  • “Repeating” any step, since this is what for loops allow us to do

  • Any mathematical operation

  • Checking a condition (but more on that later)

Our algorithm for this process looks like this:

  1. Get the first number in the vector v1
  2. Add 1 to this number
  3. Save this new number in the same place as the original number
  4. Repeat for all remaining numbers

Note: Vectorization (here, simply writing v1 + 1) is the faster, more elegant way to do this same task, but it is still possible (and easy to understand) in a for loop.

##  [1]  1  8 15 22 29 36 43 50 57 64 71 78 85 92 99
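The printed vector above has 15 values running from 1 to 99 in steps of 7. A minimal sketch of one way to build such a vector follows. The seq() call and its upper bound of 100 are assumptions, since the original construction is not shown here; any call that produces 1, 8, ..., 99 would do. The last lines show the vectorized alternative mentioned in the note.

```r
# Assumption: v1 is built with seq(); the exact original call is not shown
v1 <- seq(1, 100, by = 7)
v1

# Vectorized alternative: add 1 to every element without a loop
# (v1_plus_one is a hypothetical name used only for this illustration)
v1_plus_one <- v1 + 1
v1_plus_one
```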

16.2 Syntax

To declare or initialize (make use of) a for loop, start the line with the word for, followed immediately by parentheses (()).

Inside of the parentheses, we need to make use of an iterable index. You should be familiar with indexes from before, but as a refresher, the index is just the location in the vector/data frame of the point that we’re talking about. We’ve been using indexes that we knew we wanted (e.g., for the 3rd location we used [3]), but now we want this iterable index to be “generally specific.” That is, we want it to be specific enough to know what location we’re at, but general enough so we don’t have to write the same line of code for each different index. Essentially, this iterable index is just a variable, and each time we go through the for loop, the variable will increase by 1. We’ll call this iterable index i.

The other thing we need to supply are the (inclusive) boundaries we want to iterate over, which follow the index and the keyword in. If we want to go over all of the elements of v1, for example, we’d specify that by writing i in 1:15 inside the parentheses, since there are 15 elements in v1.

Pro tip: it’s a better idea to fill in the boundaries as 1:length(v1) in case we need to change the number of elements in v1 (maybe we meant to put 10 or 1000 instead of 100?). Writing the boundaries this way keeps the code flexible while still producing the same output, so that’s what we’ll do.
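To see why 1:length() is the more flexible choice, here is a small sketch. The vectors short_v, long_v, and empty_v are hypothetical and exist only for this illustration. It also shows seq_along(), a base R function that produces the same indexes and behaves more sensibly if the vector ever turns out to be empty.

```r
short_v <- seq(1, 10, by = 7)    # hypothetical: 2 values
long_v  <- seq(1, 1000, by = 7)  # hypothetical: 143 values

# The boundaries adjust automatically to however long the vector is
1:length(short_v)
length(1:length(long_v))

# seq_along() gives the same indexes as 1:length()
identical(seq_along(short_v), 1:length(short_v))

# Edge case: for an empty vector, 1:length() counts down from 1 to 0,
# while seq_along() returns no indexes at all
empty_v <- numeric(0)
1:length(empty_v)
seq_along(empty_v)
```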

To finish off this line and specify the exact operations we’d like to do to v1, we need to enclose the operations in curly braces ({}). Each time R hits that last, closing brace, it will increase i by 1 and go back to the top. As long as i stays within the boundaries, R will keep running the code inside the loop. All together, we’ve got this as our for loop declaration:
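Put together, the declaration described above might look like the following sketch. The body is left empty on purpose, and v1 is rebuilt with an assumed seq() call only so that the snippet runs on its own.

```r
v1 <- seq(1, 100, by = 7)  # assumed construction of v1

for (i in 1:length(v1)) {
  # operations on v1[i] will go here
}

i  # once the loop has finished, i holds the last index visited
```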

Inside of the curly braces is where the metaphorical magic happens. We’ll put our operations in there, and let R handle the rest. We want to add 1 to each element of v1 and save it in its same location. Step-by-step, we’re saying to replace the ith element of v1 with 1 + the ith element of v1.

##  [1]   2   9  16  23  30  37  44  51  58  65  72  79  86  93 100
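A compact, one-line loop that would produce the output above could look like this sketch. As before, v1 is rebuilt first with an assumed seq() call so the snippet runs on its own.

```r
v1 <- seq(1, 100, by = 7)  # assumed construction of v1
for(i in 1:length(v1)){v1[i]<-1+v1[i]}
v1
```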

This is gross-looking, so let’s practice good coding style like we talked about earlier and show good spacing and indentation. Since we just acted on the initial v1 and made our replacements in-place, we need to recreate the original v1 vector.

##  [1]   2   9  16  23  30  37  44  51  58  65  72  79  86  93 100
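With spacing and indentation applied, the same loop might be written as in this sketch. The vector is recreated first because the earlier loop modified it in place; the seq() call is an assumption about how v1 was built.

```r
# Recreate the original vector (assumed construction)
v1 <- seq(1, 100, by = 7)

# Add 1 to each element, saving the result in the same position
for (i in 1:length(v1)) {
  v1[i] <- v1[i] + 1
}

v1
```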

Here’s another quick example: let’s reverse the order of a set of numbers. We’ll create a vector called numbers, which will hold the numbers 1 through 10. Think like a computer and see if you agree with our algorithm:

  1. Get the first number from numbers
  2. Get the last number from numbers (the value we want to swap the first value with) and save it in a variable called temp (if we overwrote the last position right away, we’d lose the value it originally held)
  3. Replace the last item with the first item
  4. Put the value from temp into the current location (where the first item was)
  5. Repeat steps 1-4, moving one position inward from each end each time, until we get to the halfway point of numbers (going past it would start switching everything back)

All together, this is the resulting for loop. Match it up to make sure you agree!

##  [1]  1  2  3  4  5  6  7  8  9 10
##  [1] 10  9  8  7  6  5  4  3  2  1
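A loop matching the algorithm and the output above can be sketched as follows. Using seq_len() with integer division for the halfway point is a choice made for this sketch, not necessarily how the loop was originally written; it also handles vectors with an odd number of elements, where the middle value stays put.

```r
numbers <- 1:10
numbers

# Only go to the halfway point; going further would swap everything back
for (i in seq_len(length(numbers) %/% 2)) {
  # Step 2: save the value from the far end before overwriting it
  temp <- numbers[length(numbers) - i + 1]
  # Step 3: copy the current value to the far end
  numbers[length(numbers) - i + 1] <- numbers[i]
  # Step 4: put the saved value in the current location
  numbers[i] <- temp
}

numbers
```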

Note: We used the index length(numbers) - i + 1 since the first element we wanted to replace was the last element, or at position length(numbers). Since i started at 1, and we needed it to change each time, we subtracted i and then added the 1 back in to take care of the first case. You can trace (follow) it if you write out the values that i takes in each iteration.
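To make the trace concrete, this small sketch prints the pair of positions swapped in each iteration for a vector of length 10. The names partner and partners are hypothetical and used only for this illustration.

```r
numbers <- 1:10
partners <- numeric(5)  # hypothetical: records the index paired with each i

for (i in 1:5) {
  partner <- length(numbers) - i + 1  # position swapped with position i
  partners[i] <- partner
  cat("i =", i, "swaps with position", partner, "\n")
}

partners
```

When i is 1 the partner index is 10, when i is 2 it is 9, and so on down to 5 and 6, which is why the loop can stop at the halfway point.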

16.3 Nesting

The last point we’ll make about loops is that you can nest them, or put a for loop inside of another for loop. Just be sure that your indexes are different! Some examples of when you may want to nest two for loops would be if you wanted to count the number of matching values in two different vectors, or if you wanted to go through each feature of each observation (i.e. go through each row, and within each row, go through each value) of a data frame. Again, be careful with your indentation. Check out this example, where we go through the rows and columns of a data frame and add the row number plus the column number to each value. That is, the data in the ith row and the jth column will have i + j added to it.

First, we need to create all of the vectors and merge them together as a data frame:

Multiplication Table for Numbers 1 through 12

   |   1   2   3   4   5   6   7   8   9  10  11  12
 1 |   1   2   3   4   5   6   7   8   9  10  11  12
 2 |   2   4   6   8  10  12  14  16  18  20  22  24
 3 |   3   6   9  12  15  18  21  24  27  30  33  36
 4 |   4   8  12  16  20  24  28  32  36  40  44  48
 5 |   5  10  15  20  25  30  35  40  45  50  55  60
 6 |   6  12  18  24  30  36  42  48  54  60  66  72
 7 |   7  14  21  28  35  42  49  56  63  70  77  84
 8 |   8  16  24  32  40  48  56  64  72  80  88  96
 9 |   9  18  27  36  45  54  63  72  81  90  99 108
10 |  10  20  30  40  50  60  70  80  90 100 110 120
11 |  11  22  33  44  55  66  77  88  99 110 121 132
12 |  12  24  36  48  60  72  84  96 108 120 132 144
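One way to build a table like the one above is sketched here. Note that this swaps in outer() as a shortcut rather than creating twelve separate vectors and merging them, and the name mult_table is hypothetical.

```r
# outer() multiplies every value of the first vector by every value of the second
mult_table <- as.data.frame(outer(1:12, 1:12))

# Label the columns 1 through 12 (row names already default to 1 through 12)
names(mult_table) <- as.character(1:12)

mult_table
```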

Now, our nested loops:

After Nested for Loops

   |   1   2   3   4   5   6   7   8   9  10  11  12
 1 |   3   5   7   9  11  13  15  17  19  21  23  25
 2 |   5   8  11  14  17  20  23  26  29  32  35  38
 3 |   7  11  15  19  23  27  31  35  39  43  47  51
 4 |   9  14  19  24  29  34  39  44  49  54  59  64
 5 |  11  17  23  29  35  41  47  53  59  65  71  77
 6 |  13  20  27  34  41  48  55  62  69  76  83  90
 7 |  15  23  31  39  47  55  63  71  79  87  95 103
 8 |  17  26  35  44  53  62  71  80  89  98 107 116
 9 |  19  29  39  49  59  69  79  89  99 109 119 129
10 |  21  32  43  54  65  76  87  98 109 120 131 142
11 |  23  35  47  59  71  83  95 107 119 131 143 155
12 |  25  38  51  64  77  90 103 116 129 142 155 168
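Nested loops that produce the table above can be sketched as follows. The data frame is rebuilt at the top (using outer() and the hypothetical name mult_table) so the snippet runs on its own; the outer loop uses i for rows and the inner loop uses j for columns.

```r
# Rebuild the multiplication table (assumed construction)
mult_table <- as.data.frame(outer(1:12, 1:12))
names(mult_table) <- as.character(1:12)

for (i in 1:nrow(mult_table)) {      # outer loop: one pass per row
  for (j in 1:ncol(mult_table)) {    # inner loop: one pass per column
    # Add the row number plus the column number to the current value
    mult_table[i, j] <- mult_table[i, j] + i + j
  }
}

mult_table

# Spot check: subtracting the row and column numbers recovers 2 * 3
mult_table[2, 3] - 2 - 3
```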

Great, everything worked! To check, just take any number in the table and subtract the row and column numbers; you should get back the matching entry of the original multiplication table. This may seem like a silly example, and it admittedly is, but nested for loops work even better when we want to only operate on data that meet certain conditions. You’ll learn about this in the next chapter.