Chapter 16 for Loops
In order to clean up our datasets, we’ll have to make use of control structures. This sounds super technical, but really what they help us do is navigate our way through the data in an organized, controlled fashion. They can do anything from helping us perform a series of operations on each observation to doing different operations based on criteria that we determine. There are two main components to control structures: for loops and conditional statements.
for loops allow us to iterate over, or cycle through, our data set. Inside of a for loop, we can create variables, do calculations, or even put other for loops (this is called nesting, and we’ll come back to it in a little bit).
We’ll do a few simple examples of for loops just to demonstrate their usefulness. Let’s take a vector of numbers called v1 and use a for loop to do something simple: add 1 to each of the numbers.
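One way to build a vector like v1, assuming it holds the 15 values printed in this section (1, 8, 15, ..., 99), is to count by 7s with seq(). This particular definition is an assumption chosen to match those values:

```r
# Assumed definition of v1: start at 1 and count by 7s up to (at most) 100,
# which gives the 15 values 1, 8, 15, ..., 99
v1 = seq(1, 100, by = 7)
v1
length(v1)
```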
16.1 Algorithms
Now it’s time to think like a computer and devise an algorithm, or set of strict steps. “Legal” operations are anything you’d be able to do by hand, such as:
“Getting” a number, which just means seeing what the number is
“Saving” a number, since you could write it down on a piece of paper to remember it
“Repeating” any step, since this is what for loops allow us to do
Any mathematical operation
Checking a condition (but more on that later)
Our algorithm for this process looks like this:
- Get the first number in the vector v1
- Add 1 to this number
- Save this new number in the same place as the original number
- Repeat for all remaining numbers
Note: While vectorization (simply writing v1 + 1) is the faster, more elegant way to do this same task, it’s still possible (and easy to understand) to do it with a for loop.
Here’s v1 before we change it:
## [1] 1 8 15 22 29 36 43 50 57 64 71 78 85 92 99
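For comparison, here is a minimal sketch of the vectorized approach from the note. It assumes v1 is created with seq(1, 100, by = 7), a definition chosen to reproduce the printed values, and adds 1 to every element in one step with no loop:

```r
v1 = seq(1, 100, by = 7)  # assumed definition, matches the printed v1

# Vectorized: R applies "+ 1" to every element of v1 at once
v1_plus_one = v1 + 1
v1_plus_one
```

The for loop we build below produces the same 15 numbers; it just spells out each step.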
16.2 Syntax
To declare or initialize (make use of) a for loop, start the line with the word for, followed immediately by parentheses, ().
Inside of the parentheses, we need to make use of an iterable index. You should be familiar with indexes from before, but as a refresher, the index is just the location in the vector/data frame of the point that we’re talking about. We’ve been using indices that we knew we wanted (e.g., for the 3rd location we used [3]), but now we want this iterable index to be “generally specific.” That is, we want it to be specific enough to know what location we’re at, but general enough so we don’t have to write the same line of code for each different index. Essentially, this iterable index is just a variable, and each time we go through the for loop, the variable will increase by 1. We’ll call this iterable index i.
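To see the iterable index in action, here is a small sketch that records every value i takes. The range 1:5 and the vector name visited are illustrative choices, not part of the chapter’s example:

```r
# Record every value the iterable index i takes, in order
visited = c()
for(i in 1:5){
  visited = c(visited, i)
}
visited
```

Printing visited shows 1 2 3 4 5: on each pass through the loop, i holds the next value in 1:5.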
The other thing we need to supply is the (inclusive) range of values we want to iterate over, written after the keyword in. If we want to go over all of the elements of v1, for example, we’d specify that by writing i in 1:15, since there are 15 elements in v1.
Pro tip: it’s a better idea to fill in the boundaries as 1:length(v1) in case we later change the number of elements in v1 (say, to 10 or 1000). Writing the boundaries this way is more flexible while still producing the same output, so that’s what we’ll do.
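To illustrate why 1:length(v1) is more flexible, here is a sketch with a hypothetical helper, count_iterations(), that counts how many times the loop body runs. The same loop adapts automatically to vectors of different lengths:

```r
# Hypothetical helper: counts how many times the loop body runs
count_iterations = function(v){
  n = 0
  for(i in 1:length(v)){
    n = n + 1
  }
  n
}

count_iterations(seq(1, 100, by = 7))   # 15 elements, 15 iterations
count_iterations(seq(1, 1000, by = 7))  # 143 elements, 143 iterations
```

One caveat worth knowing: if v is empty, 1:length(v) is 1:0, which is the two-element sequence c(1, 0), so the loop still runs twice. Base R’s seq_along(v) returns an empty sequence in that case, so the loop body is skipped.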
To finish off this line and specify the exact operations we’d like to do to v1, we need to enclose the operations in curly braces ({}). Each time R hits that last, closing brace, it moves i to the next value in 1:length(v1) (here, that means increasing i by 1) and goes back to the top. As long as there are values of i left in that range, R will keep running the code inside the loop. All together, our for loop declaration is for(i in 1:length(v1)), followed by the curly braces that will hold our operations.
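As a runnable sketch, here is the declaration with an empty body. It assumes v1 is created with seq(1, 100, by = 7), chosen to match the printed values, so the block runs on its own:

```r
v1 = seq(1, 100, by = 7)  # assumed definition, matches the printed v1

for(i in 1:length(v1)){
  # the operations on v1[i] will go here
}
```

Even with nothing inside the braces, R still steps i through 1, 2, ..., 15; after the loop finishes, i keeps its last value, 15.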
Inside of the curly braces is where the metaphorical magic happens. We’ll put our operations in there and let R handle the rest. We want to add 1 to each element of v1 and save it in its same location. Step-by-step, we’re saying to replace the ith element of v1 with 1 + the ith element of v1, which in R is v1[i] = v1[i] + 1.
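A minimal sketch of this loop written compactly on a single line, assuming v1 is created with seq(1, 100, by = 7) to match the printed values. It then prints v1:

```r
v1 = seq(1, 100, by = 7)  # assumed definition, matches the printed v1
# Replace each element of v1 with itself plus 1, all on one line
for(i in 1:length(v1)){v1[i] = v1[i] + 1}
v1
```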
## [1] 2 9 16 23 30 37 44 51 58 65 72 79 86 93 100
This is gross-looking, so let’s practice good coding style like we talked about earlier, with good spacing and indentation. Since we just acted on the initial v1 and made our replacements in place, we first need to recreate the original v1 vector.
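Here is a sketch of the same loop with one statement per line and the body indented. As before, the definition of v1 is an assumption chosen to reproduce the printed values:

```r
# Recreate the original v1 (assumed definition, matches the printed values)
v1 = seq(1, 100, by = 7)

for(i in 1:length(v1)){
  # Replace the ith element with itself plus 1
  v1[i] = v1[i] + 1
}

v1
```

The behavior is identical to the one-line version. The indentation simply makes it obvious which lines run on every pass through the loop.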
## [1] 2 9 16 23 30 37 44 51 58 65 72 79 86 93 100
Here’s another quick example: let’s reverse the order of a set of numbers. We’ll create a vector called numbers, which will have the numbers 1 through 10. Our goal is to reverse the order. Think like a computer and see if you agree with our algorithm:
- Get the current number from numbers (starting with the first)
- Get the matching number from the end of numbers (since this is the value we want to switch the current value with) and save it in a variable called temp (since if we just put the current value in its place, we’d lose the value it originally had)
- Replace that item at the end with the current item
- Put the value from temp into the current location
- Repeat steps 1-4 for the next pair (the second and second-to-last numbers, and so on) until we get to the halfway point of numbers (since after that we’d start switching everything back)
All together, this is the resulting for loop. Match it up to make sure you agree!
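The loop works on numbers, which the text describes as holding 1 through 10. A minimal way to create and print it before reversing:

```r
# The vector to reverse: the whole numbers 1 through 10
numbers = 1:10
numbers
```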
## [1] 1 2 3 4 5 6 7 8 9 10
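for(i in 1:(length(numbers) / 2)){
  # Save the value at the mirrored position before overwriting it
  temp = numbers[length(numbers) - i + 1]
  # Move the ith value to the mirrored position
  numbers[length(numbers) - i + 1] = numbers[i]
  # Put the saved value into the ith position
  numbers[i] = temp
}
numbers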
## [1] 10 9 8 7 6 5 4 3 2 1
Note: We used the index length(numbers) - i + 1 since the first element we wanted to replace was the last element, at position length(numbers). Since i starts at 1 and changes each iteration, we subtracted i and then added 1 back in so that the first iteration (i = 1) lands exactly on position length(numbers). You can trace (follow) it by writing out the values that i takes in each iteration.
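Here is that trace done in R as a sketch. It computes, for each value of i the loop uses, the position that gets swapped with position i. The names i_values and mirrored are just illustrative:

```r
numbers = 1:10

# The values i takes in the reversal loop: 1 through 5
i_values = 1:(length(numbers) / 2)
# The position each i is swapped with
mirrored = length(numbers) - i_values + 1

data.frame(i = i_values, swapped_with = mirrored)
```

Each row pairs i with its partner: 1 with 10, 2 with 9, and so on down to 5 with 6, which is exactly where the loop stops.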
16.3 Nesting
The last point we’ll make about loops is that you can nest them, or put a for loop inside of another for loop. Just be sure that your indexes are different! Some examples of when you may want to nest two for loops would be if you wanted to count the number of matching values in two different vectors, or if you wanted to go through each feature of each observation (i.e., go through each row, and within each row, go through each value) of a data frame. Again, be careful with your indentation. Check out this example, where we go through the rows and columns of a data frame and add the row number plus the column number to each value. That is, the data in the ith row and the jth column will have i + j added to it.
First, we need to create all of the vectors and combine them into a data frame:
# Each column holds the first 12 multiples of one number (1 through 12)
multiplication_table = data.frame(
  ones = seq(1, 12, by = 1),
  twos = seq(2, 24, by = 2),
  threes = seq(3, 36, by = 3),
  fours = seq(4, 48, by = 4),
  fives = seq(5, 60, by = 5),
  sixes = seq(6, 72, by = 6),
  sevens = seq(7, 84, by = 7),
  eights = seq(8, 96, by = 8),
  nines = seq(9, 108, by = 9),
  tens = seq(10, 120, by = 10),
  elevens = seq(11, 132, by = 11),
  twelves = seq(12, 144, by = 12)
)
row | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
2 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 18 | 20 | 22 | 24 |
3 | 3 | 6 | 9 | 12 | 15 | 18 | 21 | 24 | 27 | 30 | 33 | 36 |
4 | 4 | 8 | 12 | 16 | 20 | 24 | 28 | 32 | 36 | 40 | 44 | 48 |
5 | 5 | 10 | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 50 | 55 | 60 |
6 | 6 | 12 | 18 | 24 | 30 | 36 | 42 | 48 | 54 | 60 | 66 | 72 |
7 | 7 | 14 | 21 | 28 | 35 | 42 | 49 | 56 | 63 | 70 | 77 | 84 |
8 | 8 | 16 | 24 | 32 | 40 | 48 | 56 | 64 | 72 | 80 | 88 | 96 |
9 | 9 | 18 | 27 | 36 | 45 | 54 | 63 | 72 | 81 | 90 | 99 | 108 |
10 | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 | 110 | 120 |
11 | 11 | 22 | 33 | 44 | 55 | 66 | 77 | 88 | 99 | 110 | 121 | 132 |
12 | 12 | 24 | 36 | 48 | 60 | 72 | 84 | 96 | 108 | 120 | 132 | 144 |
Now, our nested loops:
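# i walks through the rows; j walks through the columns of row i
for(i in 1:nrow(multiplication_table)){
  for(j in 1:ncol(multiplication_table)){
    # Add the row number plus the column number to the entry at [i, j]
    multiplication_table[i, j] = multiplication_table[i, j] + i + j
  }
}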
row | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 3 | 5 | 7 | 9 | 11 | 13 | 15 | 17 | 19 | 21 | 23 | 25 |
2 | 5 | 8 | 11 | 14 | 17 | 20 | 23 | 26 | 29 | 32 | 35 | 38 |
3 | 7 | 11 | 15 | 19 | 23 | 27 | 31 | 35 | 39 | 43 | 47 | 51 |
4 | 9 | 14 | 19 | 24 | 29 | 34 | 39 | 44 | 49 | 54 | 59 | 64 |
5 | 11 | 17 | 23 | 29 | 35 | 41 | 47 | 53 | 59 | 65 | 71 | 77 |
6 | 13 | 20 | 27 | 34 | 41 | 48 | 55 | 62 | 69 | 76 | 83 | 90 |
7 | 15 | 23 | 31 | 39 | 47 | 55 | 63 | 71 | 79 | 87 | 95 | 103 |
8 | 17 | 26 | 35 | 44 | 53 | 62 | 71 | 80 | 89 | 98 | 107 | 116 |
9 | 19 | 29 | 39 | 49 | 59 | 69 | 79 | 89 | 99 | 109 | 119 | 129 |
10 | 21 | 32 | 43 | 54 | 65 | 76 | 87 | 98 | 109 | 120 | 131 | 142 |
11 | 23 | 35 | 47 | 59 | 71 | 83 | 95 | 107 | 119 | 131 | 143 | 155 |
12 | 25 | 38 | 51 | 64 | 77 | 90 | 103 | 116 | 129 | 142 | 155 | 168 |
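To confirm the result in code rather than by hand, here is a sketch that rebuilds the starting table with outer() (a base R function not otherwise used in this chapter), runs the same nested loops, and compares every entry against i * j + i + j:

```r
# Rebuild the original table: outer(1:12, 1:12) gives a matrix whose
# [i, j] entry is i * j (column names differ from the chapter's version)
multiplication_table = as.data.frame(outer(1:12, 1:12))

# Same nested loops as above
for(i in 1:nrow(multiplication_table)){
  for(j in 1:ncol(multiplication_table)){
    multiplication_table[i, j] = multiplication_table[i, j] + i + j
  }
}

# Every entry should now equal i * j + i + j
expected = outer(1:12, 1:12, function(i, j) i * j + i + j)
all(as.matrix(multiplication_table) == expected)
```

This should print TRUE, meaning all 144 entries match.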
Great, everything worked! To check, take any number in the table and subtract off its row and column numbers; you should get back the original product (for example, row 3, column 4: 19 - 3 - 4 = 12 = 3 × 4). This may seem like a silly example, and it admittedly is, but nested for loops are even more useful when we want to operate only on data that meet certain conditions. You’ll learn about this in the next chapter.
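Earlier we mentioned counting matching values in two different vectors as a use for nesting. Here is a minimal sketch of that idea with two made-up vectors, a and b. It counts every (i, j) pair where the values are equal, so a value that repeats in both vectors is counted once per pair:

```r
# Two hypothetical vectors to compare
a = c(3, 7, 7, 10)
b = c(7, 1, 10, 3, 7)

matches = 0
for(i in 1:length(a)){
  for(j in 1:length(b)){
    # a[i] == b[j] is TRUE (counted as 1) or FALSE (counted as 0)
    matches = matches + (a[i] == b[j])
  }
}
matches
```

The outer loop picks one value from a, and the inner loop compares it against every value in b. Here that gives 6 matching pairs: 3 matches once, each 7 in a matches two 7s in b, and 10 matches once.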