# 6. Simple random sampling in the field

The most common sampling design in vegetation science is simple random sampling. Simple random sampling is a type of probability sampling where each sampling location is equally likely to be selected, and the selection of one location does not influence which is selected next. In statistical terms, the sampling locations are independent and identically distributed.

Consider an example of simple random sampling (SRS) of canopy forest trees. You have determined that there are 24 canopy trees in the sampling universe of interest, and you want to take measurements from a subset of this group of 24, using simple random sampling. One way to do this is to number each tree (1-24), put numbers in a hat, and pick one. The tree corresponding to the number is now part of your sampling subset. Each number (that is, each tree) is equally likely to get picked and picking one number doesn't change the probability that another number will get picked next time.

There are two versions of random sampling: sampling with replacement and sampling without replacement. In the example of tree numbers in a hat, if you return the selected number to the hat, the corresponding tree has another chance to get selected. (And if selected, you repeat your measurements on the tree.) That is sampling with replacement. If instead you discard a number once it is selected—sampling without replacement—a tree can be selected only once. In vegetation science, SRS without replacement is much more common than SRS with replacement.
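The hat procedure can be mimicked in a few lines of Python. This is only a sketch: the sample size of 6 and the fixed seed are assumptions made for illustration, not part of the course example.

```python
import random

rng = random.Random(42)          # fixed seed only so the example is repeatable
tree_ids = list(range(1, 25))    # trees numbered 1-24, as in the example

# Sampling without replacement: a drawn number is discarded,
# so each tree can be selected at most once.
without_replacement = rng.sample(tree_ids, k=6)

# Sampling with replacement: the number goes back in the hat,
# so the same tree may be selected more than once.
with_replacement = rng.choices(tree_ids, k=6)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```

`random.sample` never repeats a tree, while `random.choices` may return the same tree more than once.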

Picking numbers out of a hat is perfectly valid, if done correctly, but there are better ways to select random numbers. Even if you are familiar with using random number tables and random number generators in calculators, review the section of the course called How to use random number tables and generators.

## The general sequence for conducting simple random sampling studies in the field

The general procedures for any simple random sampling study in vegetation science are about the same. First, as has been emphasized in the course, you must determine your ecological objectives. For example, you might wish to know the stand basal area of a community. Then decide on the sampling scheme, such as sampling by individual or sampling by area, as with quadrats. Then pick individuals or locations at random (this is the simple random sampling part), and take your measurements. Finally, you use the data you collected to make inferences about the whole sampling universe, coming up with statements like "the stand basal area is 49 m²/ha."

In this section of the course you will learn the procedures for locating random samples in the field and the formulas for analyzing data collected from simple random samples.

## Sampling by individual

The first step is to number all the individuals in your sampling universe. In simple random sampling, each of these individuals has an equal chance of being selected. This step is a lot trickier than it might seem. For one thing, you must use an unambiguous definition of what constitutes an individual. Plants that spread vegetatively are notoriously difficult to separate into individuals. You must also enumerate all individuals in your sampling universe; if you don't, you violate the "equal chance of being selected" tenet of simple random sampling. Perhaps the most common use of sampling by individuals is with mature trees, where separate trunks define individuals and it is feasible to number all individuals. Sampling rhizomatous grasses, mosses, and much of the rest of the plant world by individual is usually not feasible.

Once your individuals are numbered, the next step is to select among those numbers at random, using a random number table or random number generator. You will make your measurements on the group of selected individuals.

It can be inefficient to pick a random number, take measurements on that individual, pick another random number, take measurements on that individual, and so forth. Much better is to pick the numbers for all the individuals to be measured ahead of time. Then you can plot a short path that visits each selected individual, and save yourself a lot of time.

 The figure on the right shows how this works. You have selected four trees at random. Don't go to the first tree you selected (marked as 1), make measurements, then traipse to the second tree. Rather, pick an efficient path, as from tree 3 to tree 1 to tree 4 to tree 2.

## The wrong way to pick random individuals

As mentioned earlier, the process of selecting random individuals requires an enumeration of all individuals. This enumeration can be an exhausting task. It might be tempting to use other techniques for the random selection of individuals. One of the most tempting shortcuts is to use the coordinate system to find a random location, then select the nearest individual for measurement. Although this sounds good, it is technically invalid and can produce bad data.

Look at the diagram to see how this approach can go wrong. The illustration uses trees, but the principle holds for most any kind of plant. X marks the spot of a point selected at random; tree A is the nearest tree to this point (see left diagram). The problem with this approach is that plants are seldom uniformly distributed throughout vegetation. In the illustration, the trees are distributed in clumps. Look at the diagram on the right. The irregular polygons show all the points that are closest to the enclosed tree. That is, any random point that lands within the polygon results in that tree being selected. The polygon for tree A is much larger than the polygon for tree B, meaning that tree A is more likely to get selected than tree B. This violates the basic assumption of simple random sampling! Whenever plants are distributed in a non-uniform pattern, isolated individuals are more likely to be selected.

In the illustration, using this flawed technique for selecting trees would produce misleading data. Because of crowding, trees within the clumps tend to be stunted and trees on the edge of clumps larger. In the illustration, taking measurements from trees that were selected because they are closest to random points will strongly overestimate tree abundance, because you are more likely to select trees on the edges of clumps.
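A small simulation can make the bias concrete. The sketch below uses a made-up layout (one isolated tree A and a tight clump B-E in a 100 m by 60 m area); the coordinates, names, and number of random points are all hypothetical.

```python
import math
import random

def nearest_tree_counts(trees, width, height, n_points, seed=0):
    """Count how often each tree is the nearest tree to a random point."""
    rng = random.Random(seed)
    counts = {name: 0 for name in trees}
    for _ in range(n_points):
        point = (rng.uniform(0, width), rng.uniform(0, height))
        nearest = min(trees, key=lambda name: math.dist(point, trees[name]))
        counts[nearest] += 1
    return counts

# Hypothetical layout: tree A stands alone, trees B-E form a clump.
trees = {
    "A": (20.0, 30.0),
    "B": (80.0, 30.0),
    "C": (82.0, 31.0),
    "D": (81.0, 28.0),
    "E": (83.0, 29.0),
}

counts = nearest_tree_counts(trees, width=100, height=60, n_points=10_000)
for name, count in counts.items():
    print(name, count)
```

Under true simple random sampling each of the five trees would be chosen about equally often. With the nearest-tree shortcut, the isolated tree A should be chosen far more often than any tree in the clump, because its polygon covers much more ground.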

## Locating quadrats using the coordinate system

The coordinate system is easier to explain if we assume that your study area (your sampling universe) is a rectangular tract of vegetation. Later, you'll learn how to relax this requirement. So let's say your study area is 100 m by 60 m, and you want to sample with quadrats selected at random from this area.

Every point in this 100-m by 60-m rectangle corresponds to a pair of Cartesian coordinates. Call the 100-m side the X axis, and the 60-m side the Y axis. By picking a pair of random numbers, one between 0 and 100 and the other between 0 and 60, you are picking a random location within your study area. The figure shows where your quadrat would be located if you picked as your random pair of numbers X = 60.7 and Y = 36.2.

OK, but finding your quadrat in the field is not as easy as finding it on a diagram. The most efficient process is to create one axis of this coordinate system by placing a meter tape along one side of the study area, with the zero end of the tape at one corner. To locate your plot, go to the point on this axis corresponding to the first number in your random number pair. Then run a second tape out at right angles for a distance corresponding to the second number in your random number pair. The accompanying animation shows this process in action. (The coordinates have been rounded in the animation; do not round in the field.)

Repeat this process for each quadrat location. As usual, it is more efficient to select the series of random numbers first, even in the lab well before going to the field. That way you can rearrange the sequence of quadrats into an efficient order.
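Selecting all the coordinate pairs ahead of time might look like the sketch below. The function name and the choice to sort by X (so you walk the baseline tape only once) are assumptions; any efficient visiting order is equally valid.

```python
import random

def random_quadrat_locations(n, x_max=100.0, y_max=60.0, seed=None):
    """Pick n random (X, Y) pairs in an x_max by y_max rectangle.

    The pairs are returned sorted by X, one simple way to visit them
    in order along the tape that forms the X axis.
    """
    rng = random.Random(seed)
    points = [(rng.uniform(0, x_max), rng.uniform(0, y_max)) for _ in range(n)]
    return sorted(points)

locations = random_quadrat_locations(10, seed=1)
for x, y in locations:
    print(f"X = {x:.1f} m, Y = {y:.1f} m")
```

Sorting changes only the order in which you visit the quadrats, not which locations were selected, so the sample is still random.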

Once you have your random location for the quadrat, you need a system for actually placing the quadrat on the ground. You want a system that doesn't harm the vegetation and a system that is statistically valid. See the section on 'Hints for dealing with reality' for my advice.

## Locating quadrats using the grid system

 In the grid system, you divide up your study area into non-overlapping quadrat-sized rectangles. See the figure for what this looks like. These rectangles make up a grid for your study area. Do this on paper, not on the ground! Each rectangle segment of the resulting grid is a potential location for a quadrat. Number all the grid rectangles. Pick your quadrat locations by selecting from their numbers at random.

To actually find these quadrat locations in the field, use the procedure described for the coordinate system.
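The grid system can be sketched the same way. The function below is illustrative only: it assumes the study area divides evenly into quadrat-sized rectangles, numbers the rectangles from 0 row by row, and reports the lower-left corner of each selected rectangle so it can be found with the coordinate procedure.

```python
import random

def grid_quadrat_cells(x_len, y_len, quad_w, quad_h, n, seed=None):
    """Select n grid rectangles at random, without replacement.

    Returns (cell_number, x_corner, y_corner) tuples, where the corner
    is the lower-left corner of the selected rectangle.
    """
    n_cols = int(x_len // quad_w)
    n_rows = int(y_len // quad_h)
    rng = random.Random(seed)
    # Each grid rectangle has one number; sample() never repeats a number.
    chosen = rng.sample(range(n_cols * n_rows), k=n)
    cells = []
    for cell in sorted(chosen):
        col, row = cell % n_cols, cell // n_cols
        cells.append((cell, col * quad_w, row * quad_h))
    return cells

# Example: 2 m by 2 m quadrats in the 100 m by 60 m study area.
cells = grid_quadrat_cells(100, 60, 2, 2, n=10, seed=3)
for cell, x, y in cells:
    print(f"cell {cell}: corner at X = {x} m, Y = {y} m")
```

Because each rectangle can be drawn only once and the rectangles do not overlap, quadrats chosen this way never overlap each other.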

Now is a good chance to visit How to use random number tables and generators, if you haven't already. This section explains some nuances about using random numbers in the coordinate and grid systems.

## Simple random sampling by area for non-rectangular study areas

 Many studies in vegetation science do not have the luxury of rectangular study areas. You can still use the coordinate system, but there is some extra work involved. Basically, you pick random coordinates as before but discard any locations that fall outside your sampling universe. This process is a lot easier if you have a map of the area boundary so you can select random locations in the lab.
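If your boundary map is available as a list of corner coordinates, the pick-and-discard procedure can be sketched as below. The polygon, the function names, and the use of a standard ray-casting test for "inside" are assumptions made for illustration.

```python
import random

def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon (list of vertices)?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def random_points_in_polygon(polygon, n, seed=None):
    """Pick random points in the bounding rectangle, discarding any outside."""
    rng = random.Random(seed)
    xs = [vx for vx, _ in polygon]
    ys = [vy for _, vy in polygon]
    points = []
    while len(points) < n:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            points.append((x, y))
    return points

# Hypothetical L-shaped study area (coordinates in meters).
study_area = [(0, 0), (100, 0), (100, 30), (50, 30), (50, 60), (0, 60)]
points = random_points_in_polygon(study_area, n=10, seed=2)
```

Discarded points are simply redrawn, so every location inside the boundary remains equally likely to be selected.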

The grid system for selecting sample locations does not work well for non-rectangular study areas because the study area usually cannot be broken up into equal-sized rectangles.

## Using GPS to locate quadrats

The Global Positioning System (GPS), coupled with Geographical Information Systems (GIS), provides an efficient way to locate points in the field. Modern, affordable GPS units can take you to a defined location within 2-5 meters. For locating sites or for locating large sampling plots, GPS can save a lot of effort. For intensive sampling with quadrats less than 200 m² in area, GPS is usually too coarse and you need to stick with measuring tapes. A good procedure is to use a GPS unit to establish the boundaries of your study area, then use tape and stakes to locate sampling quadrats.

Tips on using GPS in vegetation science:

• Use a GPS unit with a high precision (at least within 5 m).
• Be wary of GPS units that drift rapidly, that is, the readings change before your eyes. Drift occurs because the unit has not locked onto enough satellites or does not have good enough software.
• If you have any drift, create a rule for knowing when you have reached your location, otherwise your subjective judgment will creep in and the sampling will no longer be random. A good rule is to stop the first time the GPS unit says that you have reached your destination. That is, do not wait for the unit to "settle down."
• Once you have stopped, have a rule for locating the plot itself. I like to use the mid-point between the toes of my boots.
• If you use your GPS unit to enter the boundary of your study area as a polygon, you can use some GIS systems to help in sampling. For example, many GIS systems will select random coordinates from within a polygon. This automatic process is much faster than the coordinate method when your study area is highly irregular in shape.

## Hints for dealing with reality

Sometimes the application of a procedure that sounds straightforward gets tricky in the application. This section presents some hints on dealing with the details of locating your samples.

### Avoid self-inflicted damage

It is unavoidable.  You have to walk through your study area as you establish its boundaries, as you find your sampling locations, and as you shift from side to side as you collect data.  If a plot ends up where your boots have ripped up the vegetation, what do you do?  (See the previous paragraph.)  Best minimize the damage that you and your crew-mates inflict on your study area.  Walk on animal trails when you can.  Know where your future plots will be, so you can avoid walking through those locations.  Eat your lunch outside the study area.

### Warnings and technicalities about the coordinate system

When using the coordinate system, you need to decide if the selected coordinates designate the center of the quadrat or one of its corners. You also need to pick a plot orientation. Just pick a system (like "put the plot center at the selected coordinate and orient the long dimension of the quadrat north to south") and stick with it. The point of the system is to eliminate any subconscious bias in placing the quadrat frame. For example, in my experience, folks tend to move the frame away from poison oak but toward pretty flowers! Having a system protects your data from your subconscious biases.

Sometimes the selection of random locations leads to quadrats that overlap each other. This is statistically acceptable and goes by the technical name of "sampling with replacement." But overlapping quadrats are hardly ever used in vegetation science. For one thing, the vegetation around the previous quadrat is usually disturbed by the process of sampling. The second, overlapping quadrat would then be damaged and give false data. (See above.) The standard procedure in vegetation science is to drop any random locations that would produce an overlap with a previous quadrat.
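Dropping overlapping locations is easy to do while you select coordinates in the lab. The sketch below assumes the system "selected coordinate = quadrat center, sides parallel to the axes" and same-sized quadrats; the function name and the `max_tries` safety limit are hypothetical.

```python
import random

def non_overlapping_locations(n, x_max, y_max, quad_w, quad_h,
                              seed=None, max_tries=10_000):
    """Pick random quadrat centers, dropping any that overlap an earlier one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == n:
            break
        x, y = rng.uniform(0, x_max), rng.uniform(0, y_max)
        # Two same-sized, axis-aligned quadrats overlap when their centers
        # are closer than one quadrat width in X and one height in Y.
        overlaps = any(abs(x - ax) < quad_w and abs(y - ay) < quad_h
                       for ax, ay in accepted)
        if not overlaps:
            accepted.append((x, y))
    return accepted  # may hold fewer than n if max_tries runs out

centers = non_overlapping_locations(20, 100, 60, quad_w=1, quad_h=1, seed=5)
```

The sketch does not deal with quadrats that hang over the edge of the study area.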

### Knowing what is important

An important purpose of these guidelines for locating samples is to take the process out of our subjective hands and into an objective set of procedures. So it is important to follow the objective procedure precisely. But it is also important to recognize which parts of your procedures are crucial for maintaining objective, representative, and independent observations, and which parts are not. Imagine yourself at the end of a hard morning of sampling, when you discover that all your quadrat locations are off by half a meter because the tape establishing one Cartesian axis wasn't pulled quite tight enough. Do you throw away your data from the morning and start over? Not if you're on my crew! As long as the mistake didn't push a location outside your study boundary, everything is OK. The mistake was unintentional, so it couldn't impose a subjective choice on the location of quadrats. The locations are still random and independent of each other. Therefore the data collected from those locations are completely valid. Note the corrected locations, and get ready for the afternoon.

## Locating lines for line-intercept measurements of cover

The process of locating lines involves selecting a starting point and a direction. The coordinate system described for locating quadrats also works for locating random starting points. You can then pick a random direction by, for example, picking a random number between 1 and 360 and going in that compass direction. This system has two problems with the boundaries of the study area that are similar to the problems with locating quadrats. The issue is more severe, though, because lines are long and are more likely to extend beyond the edge of the study area.

If you have a rectangular study area, there is a better way to locate lines. Say you have a 50 m by 100 m study area, and you want to locate 8 lines that are 25 m long. Picking a random starting point along the 100-m axis of the study area, and then picking left or right at random (as by flipping a coin), is an efficient way to find line locations. This is valid simple random sampling, because every part of the study area is equally likely to be sampled and the location of one line does not affect the location of any other line.
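The starting points and coin flips for the 8 lines can be drawn ahead of time, as in this sketch. The function name is hypothetical, and what "left" and "right" mean on the ground depends on where you lay the 100-m axis, which the code does not decide for you.

```python
import random

def random_line_locations(n_lines=8, axis_len=100.0, seed=None):
    """Pick a random start along the axis and a random side for each line."""
    rng = random.Random(seed)
    lines = []
    for _ in range(n_lines):
        start = rng.uniform(0, axis_len)        # position along the 100-m axis
        side = rng.choice(["left", "right"])    # the coin flip
        lines.append((start, side))
    return sorted(lines)                        # visit in order along the axis

lines = random_line_locations(seed=7)
for start, side in lines:
    print(f"start at {start:.1f} m, run the 25-m line to the {side}")
```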

## What to do with your measurements? Calculations for SRS

### Central tendency and variability

With simple random sampling without replacement, the best estimate of the population mean ($\mu$) is usually the sample mean, the mean of your n measurements:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

The best estimate of the population variability is usually the standard deviation of your data:

$$s = \sqrt{\frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}{n - 1}}.$$

There are separate formulas for $\bar{x}$ and $s^2$ for other sampling designs, like stratified random sampling and cluster sampling. Refer to the course references for details.
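For simple random sampling, the sample mean and standard deviation can be computed with Python's standard library. The ten measurements below are invented for illustration; `statistics.stdev` uses the n - 1 denominator of the usual sample standard deviation.

```python
import math
import statistics

# Hypothetical measurements from 10 randomly located samples.
measurements = [42.0, 55.5, 38.2, 61.0, 47.3, 50.1, 44.8, 58.6, 49.9, 52.6]

n = len(measurements)
sample_mean = statistics.mean(measurements)   # estimate of the population mean
sample_sd = statistics.stdev(measurements)    # sample standard deviation (n - 1)

# The same standard deviation written out from its definition.
sum_sq = sum((x - sample_mean) ** 2 for x in measurements)
sd_by_hand = math.sqrt(sum_sq / (n - 1))

print(f"n = {n}, mean = {sample_mean:.2f}, s = {sample_sd:.2f}")
```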

Be sure to keep in mind your scientific objective: You want to make statements about the population mean and about your confidence in that mean. That is, you need to know the variability of your estimate of the mean, not the variability of the data. Lucky for us, statistical theory provides a way to convert from describing data to describing the behavior of your estimates of the mean:

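$$s_{\bar{x}} = \frac{s}{\sqrt{n}}\sqrt{1 - \frac{n}{N}},$$

where s is the standard deviation of your data, n is the size of your sample, N is the size of the entire population, and $s_{\bar{x}}$ is the amount you expect your estimates of the mean to vary. $s_{\bar{x}}$ is often called the standard error.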

But what about the factor on the right in the equation? This factor is called the finite population correction, or fpc. It is necessary because statistical distributions describe infinite populations, but sampling is from a carefully delimited (finite) population. (Reminder: The step of defining your study area / statistical population / sampling universe is the step that makes the sampling population finite.) You can see the effect of sample size on fpc at the extremes. When N is very large and n very small, fpc approaches 1 and the formula reduces to that of the familiar standard error. When n = N, fpc = 0, which makes the estimate of variability = 0! But this makes sense because you have measured every member of the population and you now have a census, not a sample. Because you know the whole population, you know the mean exactly and there is no sampling error.

Most studies in vegetation science ignore the finite population correction. Although technically incorrect, in practice it has little effect because sampling intensity in vegetation science is typically very low. For example, the sampling intensity of a study using 20 1-m² quadrats per hectare is only 20/10000, so the fpc is

$$\sqrt{1 - \frac{20}{10000}} = \sqrt{0.998} \approx 0.999,$$

which is very close to 1.0. For the rest of the course, we will follow this grand tradition and usually not bother with the finite population correction factor unless sampling intensity goes above 10%.
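To see how little the correction matters at low sampling intensity, you can compute it directly. This sketch assumes the correction takes the common form sqrt(1 - n/N); the function names are hypothetical.

```python
import math

def finite_population_correction(n, N):
    """fpc, assuming the common form sqrt(1 - n/N)."""
    return math.sqrt(1 - n / N)

def standard_error(s, n, N=None):
    """Standard error of the mean; applies the fpc only when N is given."""
    se = s / math.sqrt(n)
    if N is not None:
        se *= finite_population_correction(n, N)
    return se

# 20 one-square-meter quadrats in one hectare (10,000 possible quadrats).
fpc_example = finite_population_correction(20, 10_000)
print(f"fpc = {fpc_example:.4f}")

# At the other extreme, a census (n = N) leaves no sampling error.
print(standard_error(s=10.0, n=25, N=25))
```

With N left out, `standard_error` returns the familiar s divided by the square root of n.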

### Confidence and confidence interval

The next step is to convert your estimates of the population mean and its variability into confidence intervals.

The statistical formula for the confidence interval with simple random sampling is the same as the standard formula (see the Statistical Background chapter and the Confidence Interval primer):

$\bar{x} - t\,s_{\bar{x}}$ to $\bar{x} + t\,s_{\bar{x}}$.

As before, $\bar{x}$ is usually the best estimate of the population mean; t, the t-statistic, reflects both the number of samples and the level of confidence you have set (like 90%); and $s_{\bar{x}}$, the standard error, reflects the variability in the data.
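A confidence interval then takes only a few lines. In this sketch the mean, standard deviation, and sample size are invented, the finite population correction is ignored, and the t value is supplied by hand from a t table (1.833 is the two-sided 90% value for 9 degrees of freedom).

```python
import math

def confidence_interval(mean, s, n, t):
    """Return (lower, upper) limits: mean -/+ t times the standard error."""
    se = s / math.sqrt(n)        # standard error, ignoring the fpc
    half_width = t * se
    return mean - half_width, mean + half_width

# Hypothetical sample: n = 10, mean = 50.0, s = 7.2; t from a table (df = 9, 90%).
low, high = confidence_interval(mean=50.0, s=7.2, n=10, t=1.833)
print(f"90% confidence interval: {low:.1f} to {high:.1f}")
```

For these made-up numbers the interval runs from about 45.8 to 54.2.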

## More on precision

Before you use your carefully calculated values of central tendency and variability, pause a while to reflect on what contributes to the variability you measure. If your technique of vegetation measurement varied from one time to another (and you know it did: 22%? 28%?), then this measurement variability contributes to overall variability. If the vegetation itself varied from one sample location to another (and it always does), then this spatial heterogeneity or sampling variability contributes to overall variability. Here's the important part. You can reduce the effect of sampling variability just by collecting more statistically valid samples. But the only way to reduce measurement variability is to get better at conducting the measurements themselves. That is what a lot of Chapter 3 was about, and what Chapter 9 will state again.

## Putting your knowledge to the test

At this point, go to Assignments in Blackboard and select the quiz called "Using random numbers," if you haven't already. Then test your understanding of locating simple random samples with the exercise Locating quadrats.

© 2005 Mark V. Wilson and Oregon State University