Chapter 16 `for` Loops

In order to clean up our datasets, we’ll have to make use of control structures. This sounds super technical, but really what they help us do is navigate our way through the data in an organized, controlled fashion. They can do anything from helping us perform a series of operations on each observation to doing different operations based on criteria that we determine. There are two main components to control structures: for loops and conditional statements.

for loops allow us to iterate, or cycle through, our dataset. Inside a for loop, we can create variables, do calculations, or even put other for loops (this is called nesting, and we’ll come back to it in a little bit).

We’ll do a few simple examples of for loops just to demonstrate their usefulness. Let’s take a vector of numbers called v1 and use a for loop to do something simple: add 1 to each of the numbers.

16.1 Algorithms

Now it’s time to think like a computer and devise an algorithm, or set of strict steps. “Legal” operations are anything you’d be able to do by hand, such as:

“Getting” a number, which just means seeing what the number is
“Saving” a number, since you could write it down on a piece of paper to remember it
“Repeating” any step, since this is what for loops allow us to do
Any mathematical operation
Checking a condition (but more on that later)

Our algorithm for this process looks like this:

1. Get the first number in the vector v1
2. Add 1 to this number
3. Save this new number in the same place as the original number
4. Repeat for all remaining numbers

Note: While using vectorization is the faster, more elegant way to do this same task, this is still possible (and easy to understand) in a for loop.

(v1 = seq(1, 100, by = 7))

##  [1]  1  8 15 22 29 36 43 50 57 64 71 78 85 92 99
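
The note above mentions vectorization. As a minimal sketch of what that means, R can add 1 to every element in a single expression, with no loop at all. The name v1_plus_one is ours, chosen only for illustration so that v1 itself is left unchanged for the loop that follows.

```r
# Recreate the same vector used in this section
v1 = seq(1, 100, by = 7)

# Vectorized addition: R applies "+ 1" to every element at once.
# v1_plus_one is a hypothetical name; v1 itself is not modified.
v1_plus_one = v1 + 1

v1_plus_one
```

The result has the same 15 elements as v1, each one larger by 1.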

16.2 Syntax

To declare or initialize (make use of) a for loop, start the line with the word for, followed immediately by parentheses (()).

for()

Inside of the parentheses, we need to make use of an iterable index. You should be familiar with indexes from before, but as a refresher, the index is just the location in the vector/data frame of the point that we’re talking about. We’ve been using indexes that we knew we wanted (e.g. for the 3rd location we used [3]), but now we want this iterable index to be “generally specific.” That is, we want it to be specific enough to know what location we’re at, but general enough so we don’t have to write the same line of code for each different index. Essentially, this iterable index is just a variable, and each time we go through the for loop, the variable will increase by 1. We’ll call this iterable index i.
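
To see the iterable index in action, here is a small sketch that does nothing but print i on each pass. The range 1:3 is an arbitrary choice for illustration and is not tied to v1.

```r
# Print the value of the index i on each pass through the loop.
# The range 1:3 is arbitrary, picked only to keep the output short.
for(i in 1:3){
  print(i)
}
```

Running this prints 1, then 2, then 3, one value per pass.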

The other thing we need to supply is the range we want to iterate over, with both boundaries included. The index and the range are joined by the keyword in. If we want to go over all of the elements of v1, for example, we’d specify that by writing i in 1:15, since there are 15 elements in v1.

Pro tip: it’s a better idea to fill in the boundaries as 1:length(v1) in case we need to change the number of elements in v1 (maybe we meant to put 10 or 1000 instead of 100?). Writing the boundaries this way makes the code more flexible while still producing the same output, so that’s what we’ll do.
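
One caveat with 1:length(v1), offered as an aside rather than something this chapter relies on: if the vector is ever empty, 1:0 counts down and yields the two values 1 and 0, so the loop body runs twice instead of not at all. Base R’s seq_along() returns an empty sequence in that case. The names empty_vec, runs_colon, and runs_seq_along are hypothetical and used only for this sketch.

```r
# A hypothetical empty vector, used only to show the edge case
empty_vec = numeric(0)

# 1:length(empty_vec) is 1:0, which is the two values 1 and 0
runs_colon = 0
for(i in 1:length(empty_vec)){
  runs_colon = runs_colon + 1
}

# seq_along(empty_vec) is empty, so the loop body never runs
runs_seq_along = 0
for(i in seq_along(empty_vec)){
  runs_seq_along = runs_seq_along + 1
}

runs_colon      # 2
runs_seq_along  # 0
```

For a non-empty vector such as v1, 1:length(v1) and seq_along(v1) produce the same indexes, so the rest of this chapter works either way.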

To finish off this line and specify the exact operations we’d like to do to v1, we need to enclose the operations in curly braces ({}). Each time R hits that last, closing brace, it will increase i by 1 and go back to the top. As long as i stays within the boundaries, R will keep running the code inside the loop. All together, we’ve got this as our for loop declaration:

for(i in 1:length(v1)){}

Inside of the curly braces is where the metaphorical magic happens. We’ll put our operations in there, and let R handle the rest. We want to add 1 to each element of v1 and save it in its same location. Step-by-step, we’re saying to replace the i-th element of v1 with 1 + the i-th element of v1.

for(i in 1:length(v1)){v1[i] = v1[i] + 1}
v1

##  [1]   2   9  16  23  30  37  44  51  58  65  72  79  86  93 100

This is gross-looking, so let’s practice good coding style like we talked about earlier and show good spacing and indentation. Since we just acted on the initial v1 and made our replacements in-place, we need to recreate the original v1 vector.

v1 = seq(1, 100, by = 7)

for(i in 1:length(v1)){
  v1[i] = v1[i] + 1
}

v1

##  [1]   2   9  16  23  30  37  44  51  58  65  72  79  86  93 100

Here’s another quick example: let’s reverse the order of a set of numbers. We’ll create a vector called numbers, which will have the numbers 1 through 10. Think like a computer and see if you agree with our algorithm:

1. Get the first number from numbers
2. Get the last number from numbers (since this is the value we want to switch the first value with) and save it in a variable called temp (since if we just overwrote the last position with the first value, we’d lose the value it originally had)
3. Replace the last item with the first item
4. Put the value from temp into the first item’s location
5. Repeat steps 1-4, moving one position inward from each end each time, until we get to the halfway point of numbers (since after that we’d start switching everything back)

All together, this is the resulting for loop. Match it up to make sure you agree!

(numbers = 1:10)

##  [1]  1  2  3  4  5  6  7  8  9 10

for(i in 1:(length(numbers) / 2)){
  temp = numbers[length(numbers) - i + 1]
  numbers[length(numbers) - i + 1] = numbers[i]
  numbers[i] = temp
}
numbers

##  [1] 10  9  8  7  6  5  4  3  2  1

Note: We used the index length(numbers) - i + 1 since the first element we wanted to replace was the last element, or at position length(numbers). Since i started at 1, and we needed it to change each time, we subtracted i and then added the 1 back in to take care of the first case. You can trace (follow) it if you write out the values that i takes in each iteration.
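
The trace described in the note can also be printed by R. This sketch recreates the original numbers vector and only reports which two positions are paired on each pass, without swapping anything. The helper name partner and the use of cat() for printing are our choices, not something the chapter has introduced.

```r
numbers = 1:10   # the original, un-reversed vector

# For each pass, show the front position i and the back position it pairs with
for(i in 1:(length(numbers) / 2)){
  partner = length(numbers) - i + 1   # hypothetical helper name
  cat("i =", i, "swaps with position", partner, "\n")
}
```

The loop makes five passes, pairing position 1 with 10, 2 with 9, 3 with 8, 4 with 7, and 5 with 6.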

16.3 Nesting

The last point we’ll make about loops is that you can nest them, or put a for loop inside of another for loop. Just be sure that your indexes are different! Some examples of when you may want to nest two for loops would be if you wanted to count the number of matching values in two different vectors, or if you wanted to go through each feature of each observation (i.e. go through each row, and within each row, go through each value) of a data frame. Again, be careful with your indentation. Check out this example, where we go through the rows and columns of a data frame and add the row number plus the column number to each value. That is, the data in the i-th row and the j-th column will have i + j added to it.
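
Before the data frame example, here is a tiny sketch of how two nested indexes advance. The ranges 1:2 and 1:3 are arbitrary, and cat() is used only to print each pair. The inner loop runs through all of its values of j for every single value of i.

```r
# Outer loop: i takes the values 1 and 2
for(i in 1:2){
  # Inner loop: for each i, j takes the values 1, 2, and 3
  for(j in 1:3){
    cat("i =", i, " j =", j, "\n")
  }
}
```

This prints six lines in total, visiting the pairs in the order (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3).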

First, we need to create all of the vectors and merge them together as a data frame:

multiplication_table = data.frame(
  ones = seq(1, 12, by = 1),
  twos = seq(2, 24, by = 2),
  threes = seq(3, 36, by = 3),
  fours = seq(4, 48, by = 4),
  fives = seq(5, 60, by = 5),
  sixes = seq(6, 72, by = 6),
  sevens = seq(7, 84, by = 7),
  eights = seq(8, 96, by = 8),
  nines = seq(9, 108, by = 9),
  tens = seq(10, 120, by = 10),
  elevens = seq(11, 132, by = 11),
  twelves = seq(12, 144, by = 12)
)

Multiplication Table for Numbers 1 through 12

	1	2	3	4	5	6	7	8	9	10	11	12
1	1	2	3	4	5	6	7	8	9	10	11	12
2	2	4	6	8	10	12	14	16	18	20	22	24
3	3	6	9	12	15	18	21	24	27	30	33	36
4	4	8	12	16	20	24	28	32	36	40	44	48
5	5	10	15	20	25	30	35	40	45	50	55	60
6	6	12	18	24	30	36	42	48	54	60	66	72
7	7	14	21	28	35	42	49	56	63	70	77	84
8	8	16	24	32	40	48	56	64	72	80	88	96
9	9	18	27	36	45	54	63	72	81	90	99	108
10	10	20	30	40	50	60	70	80	90	100	110	120
11	11	22	33	44	55	66	77	88	99	110	121	132
12	12	24	36	48	60	72	84	96	108	120	132	144

Now, our nested loops:

for(i in 1:nrow(multiplication_table)){
  for(j in 1:ncol(multiplication_table)){
    multiplication_table[i, j] = multiplication_table[i, j] + i + j
  }
}

After Nested for Loops

	1	2	3	4	5	6	7	8	9	10	11	12
1	3	5	7	9	11	13	15	17	19	21	23	25
2	5	8	11	14	17	20	23	26	29	32	35	38
3	7	11	15	19	23	27	31	35	39	43	47	51
4	9	14	19	24	29	34	39	44	49	54	59	64
5	11	17	23	29	35	41	47	53	59	65	71	77
6	13	20	27	34	41	48	55	62	69	76	83	90
7	15	23	31	39	47	55	63	71	79	87	95	103
8	17	26	35	44	53	62	71	80	89	98	107	116
9	19	29	39	49	59	69	79	89	99	109	119	129
10	21	32	43	54	65	76	87	98	109	120	131	142
11	23	35	47	59	71	83	95	107	119	131	143	155
12	25	38	51	64	77	90	103	116	129	142	155	168

Great, everything worked! To check that it did, just take any number in the new table and subtract off its row and column numbers; you should get back the original value from the multiplication table. For example, the 19 in row 3, column 4 gives 19 - 3 - 4 = 12. This may seem like a silly example, and it admittedly is, but nested for loops become much more useful when we only want to operate on data that meet certain conditions. You’ll learn about this in the next chapter.
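
The check described above can itself be written as a pair of nested loops. This is a sketch on a smaller three-column table so that it stays short; the names small_table, updated, and restored are hypothetical and are not part of the chapter’s code.

```r
# A smaller stand-in for multiplication_table (hypothetical, for illustration)
small_table = data.frame(
  ones = seq(1, 12, by = 1),
  twos = seq(2, 24, by = 2),
  threes = seq(3, 36, by = 3)
)

# Apply the same update as in the chapter: add i + j to each cell
updated = small_table
for(i in 1:nrow(updated)){
  for(j in 1:ncol(updated)){
    updated[i, j] = updated[i, j] + i + j
  }
}

# Undo it: subtract the row and column numbers from each cell
restored = updated
for(i in 1:nrow(restored)){
  for(j in 1:ncol(restored)){
    restored[i, j] = restored[i, j] - i - j
  }
}

# TRUE if every cell is back to its original value
all(restored == small_table)
```

If the last line returns TRUE, every cell was changed by exactly its row number plus its column number.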