
5. Sampling designs

Field work is essential in vegetation science. But thick undergrowth, poisonous plants, steep slopes, and other delights of the field can be a challenge, even more so after a month of sampling! This section of the course covers where to locate your samples, a key aspect of a study's sampling design. I first want to convince you that two types of sampling designs, although convenient to apply in the field, are inadequate for gathering sound data. To see this, you need to apply the concepts of replication, independence, randomness, representativeness, and interspersion. Probability sampling is the solution, a topic you'll learn more about in the next section of the course.

Convenient but flawed sampling designs

Replication, independence, randomness, representativeness, and interspersion

Probability sampling is the solution

Know the sampling universe

Randomness and interspersion (for 540 students)

Preview: Sampling designs vs. experimental designs

Putting your knowledge to the test


Convenient but flawed sampling designs

[Figure: forest diagram, with parking turnouts marked by asterisks]
One of these convenient but flawed sampling designs is preferential sampling. Look first at an example, a study trying to estimate biomass in a large tract of forest. Imagine you are the leader of the field crew. You drive into the forest and, for convenience, you place your plots next to parking turnouts (marked with asterisks). This was not an unusual practice, especially in the early days of vegetation science. After all, the forest next to the turnout is part of what you want to measure. Why not take advantage of its convenience?

This is called preferential sampling because you place your samples where you prefer. The advantage is that it is quick and easy, important considerations in field work. The disadvantage is that it is unfair to draw inferences to the parts of the forest not next to turnouts. What if there is an effect of being next to the road? Compared to intact forest, the areas next to roads will have higher light, experience greater daily changes in temperature, get more weed seeds, and so forth. These special conditions can influence what you are trying to measure, in this case forest biomass.

In this hypothetical example, the data collected through preferential sampling next to turnouts would severely underestimate the true biomass of the forest. In statistical terms, because forest next to turnouts is not representative of the sampling universe as a whole (the entire forest tract), your estimate will be biased.
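A small simulation can make this bias concrete. The sketch below is hypothetical throughout: the forest is a list of 1,000 plots, the 50 plots nearest the road are assumed to carry less biomass, and the numbers (means of 200 and 300 units) are invented for illustration, not taken from any real forest.

```python
import random
import statistics

def make_forest(n_plots=1000, n_roadside=50, seed=1):
    """Build a hypothetical forest: a list of (is_roadside, biomass) plots."""
    rng = random.Random(seed)
    forest = []
    for i in range(n_plots):
        roadside = i < n_roadside
        # Assumed for illustration: roadside plots carry less biomass.
        mean_biomass = 200.0 if roadside else 300.0
        forest.append((roadside, rng.gauss(mean_biomass, 30.0)))
    return forest

def preferential_estimate(forest, n, rng):
    """Mean biomass from n plots chosen only among roadside plots."""
    roadside = [biomass for is_road, biomass in forest if is_road]
    return statistics.mean(rng.sample(roadside, n))

def random_estimate(forest, n, rng):
    """Mean biomass from n plots chosen by simple random sampling."""
    return statistics.mean(biomass for _, biomass in rng.sample(forest, n))

forest = make_forest()
rng = random.Random(2)
true_mean = statistics.mean(biomass for _, biomass in forest)
pref = preferential_estimate(forest, 20, rng)
srs = random_estimate(forest, 20, rng)
print(f"true mean {true_mean:.1f}, preferential {pref:.1f}, random {srs:.1f}")
```

Under these assumptions the roadside estimate falls well below the true mean, while the random estimate lands much closer to it.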

In real-life examples, you can never tell whether samples selected preferentially are biased. But the chance that preferential sampling produces wrong answers is high. The solution is simple: Don't sample preferentially.

A better but still flawed design is systematic sampling. Consider another hypothetical example, a study of succession in an abandoned agricultural field. You decide to sample from a grid, with quadrats 3 m apart. The advantages of this systematic design are that quadrat locations can be found quickly and easily and that, unlike in the previous example of preferential sampling, the quadrats are spread across the sampling universe.

Little did you know, however, that your grid corresponds to the planting pattern of corn in the old agricultural field! The data you collect will reflect what grows up after corn and will under-represent the plants that grow up in the weedy spots between rows. You might have thought that your grid of quadrats was giving you a sample that was representative because it was interspersed across the field. It is true that there was spatial interspersion across the field. But there was no interspersion across the kinds of vegetation in the field.

 

[Figures: sampling grid; corn planting pattern]

You might feel that as a savvy vegetation scientist you would never superimpose a sampling grid on pre-existing vegetation patterns, as was done in the corn example. The problem is that pre-existing patterns might not be obvious. The only solution is to abandon systematic sampling and embrace sampling designs that have replication and independence.
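The corn example can be sketched in a few lines. The row spacing matches the 3 m grid from the example, but the row width, field width, and function names below are assumptions made only for illustration.

```python
import random

ROW_SPACING_M = 3.0   # assumed spacing of the old corn rows
ROW_WIDTH_M = 1.0     # assumed width of the strip influenced by each row

def in_old_corn_row(x):
    """True if position x (metres across the rows) falls in a former corn row."""
    return (x % ROW_SPACING_M) < ROW_WIDTH_M

def fraction_in_rows(positions):
    """Fraction of sample positions that land in former corn rows."""
    return sum(in_old_corn_row(x) for x in positions) / len(positions)

field_width = 48.0
# Systematic design: quadrats every 3 m, starting at the field edge.
grid = [i * 3.0 for i in range(16)]
# Simple random design: 16 positions drawn uniformly across the field.
rng = random.Random(0)
srs = [rng.uniform(0.0, field_width) for _ in range(16)]

print("grid:", fraction_in_rows(grid))    # every quadrat lands in a corn row
print("random:", fraction_in_rows(srs))   # mixes rows and weedy strips
```

Because the grid spacing equals the row spacing, every grid quadrat falls in a former corn row and the weedy strips between rows are never sampled.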

If you think these two examples were bad, consider the belt transect. The belt refers to the strip of quadrats arrayed along a transect line. In this approach, sampling is concentrated along one (or sometimes a few) transects. Typically, the location of the transect is picked subjectively. The belt transect approach combines the sins of both preferential sampling and systematic sampling. Moreover, the concentration of quadrats along a single line gives very low interspersion. I can imagine no circumstance where the belt transect approach should be used to gather information about a plant community.

Replication, independence, randomness, representativeness, and interspersion

In this section you will look more carefully at replication, independence, randomness, representativeness, and interspersion. These related concepts are important in helping you evaluate sampling approaches.

Replication is the repetition of equivalent measurements. Replication is an essential element of a good field design because, by allowing you to estimate variability, it tells you how confident you should be in your conclusions. Without replication, it is impossible to tell how representative a sample is. It is true that if a single quadrat were placed in a perfectly typical location, then measurements from the quadrat would represent the entire community. But the only way to know if a location is typical is to have multiple measurements, throughout the community! Repeating measurements across the community allows you to sample the range of conditions present in the vegetation. You determine from these replicated measurements what is "typical" or average.
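As a minimal sketch of how replication lets you estimate variability, the code below computes the mean, the sample standard deviation, and the standard error of the mean from replicate quadrats. The percent-cover values and the function name `summarize` are hypothetical.

```python
import math
import statistics

def summarize(measurements):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    n = len(measurements)
    if n < 2:
        # A single quadrat gives a value but no estimate of variability.
        raise ValueError("variability cannot be estimated from one measurement")
    mean = statistics.mean(measurements)
    sd = statistics.stdev(measurements)
    se = sd / math.sqrt(n)
    return mean, sd, se

# Hypothetical percent-cover values from eight replicate quadrats.
cover = [12.0, 18.0, 15.0, 22.0, 9.0, 17.0, 14.0, 21.0]
mean, sd, se = summarize(cover)
print(f"mean = {mean:.1f}, sd = {sd:.1f}, se = {se:.1f}")
```

With only one quadrat the function raises an error, which mirrors the point above: one measurement cannot tell you how typical it is.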

Replicated measurements will generally be more representative if they are independent of each other and interspersed across the community. Independence means that the location of one sample does not affect the location of another. The best way to ensure independence is to locate your samples at random. (There are many versions of field sampling that include randomness, including simple random sampling, stratified random sampling, cluster sampling, and ratio sampling. In this course, we will concentrate on simple random sampling and consider cluster sampling.)

Randomness and independence among sample points are key assumptions of standard statistical analysis. Without randomness and independence, standard statistical inference and statements about confidence intervals and confidence level are invalid. Let me say this again, another way. Data collected using sampling designs that lack randomness and independence are essentially useless.

Interspersion, the distribution of samples across the community, is another way to make your measurements representative. Unfortunately, not all random samples have adequate interspersion, just as it is possible to flip five tails in a row with a fair coin. (Students enrolled for graduate credit will revisit this issue of randomness and interspersion.)

Now consider preferential and systematic sampling in light of these statistical concepts. Preferential sampling, almost by definition, is not representative of the community. That is, results from preferential sampling will be biased. The repeated measurements in preferential sampling are not independent of each other, because all samples are in "preferred" locations. As a result, the variability from one observation to another does not represent the variability within the statistical universe, the whole community under study. The lack of independence and lack of representativeness means that you cannot make valid statistical inferences. This is bad!

Systematic sampling, in an important sense, is unreplicated. In the example of sampling across a grid presented earlier, once the location and orientation of the first quadrat are determined, the positions of all the remaining 15 quadrats are determined. The individual quadrats are not independent. Sample size is 1! (This is completely analogous to a single quadrat, where once one corner of the quadrat frame is in place, the locations of the other corners and of the whole quadrat are determined.) Completely systematic designs are not replicated, sample points are not randomly located and independent, and the design can be unrepresentative of the whole community. This too is bad!
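The sketch below illustrates why the grid behaves like a single sample. It assumes a 4 x 4 grid of 16 quadrats at 3 m spacing; the function name and parameters are hypothetical. The only free choices are the first quadrat's position and the grid's orientation, and everything else follows from them.

```python
import math

def grid_from_first_quadrat(x0, y0, angle_deg, spacing=3.0, n_side=4):
    """Return all grid positions implied by the first quadrat and an orientation."""
    angle = math.radians(angle_deg)
    ux, uy = math.cos(angle), math.sin(angle)  # direction along the grid rows
    vx, vy = -uy, ux                           # perpendicular direction
    return [
        (x0 + i * spacing * ux + j * spacing * vx,
         y0 + i * spacing * uy + j * spacing * vy)
        for i in range(n_side)
        for j in range(n_side)
    ]

grid = grid_from_first_quadrat(1.0, 2.0, angle_deg=0.0)
print(len(grid), "quadrats, all fixed by one starting point and one angle")
```

Calling the function again with the same three inputs returns exactly the same 16 positions; none of the other 15 quadrats adds an independent choice.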

Probability sampling is the solution

The problems with preferential and systematic sampling are solved by using the approach called probability sampling. In probability sampling, the probability of sampling any individual (plant, quadrat, etc.) is known. The simplest form of probability sampling is simple random sampling, where each potential sample location is equally likely ("probable") to be selected.
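A minimal sketch of simple random sampling follows. It assumes the study area has been divided into a grid of candidate quadrats; the grid dimensions, sample size, and function name are hypothetical.

```python
import random

def simple_random_sample(n_rows, n_cols, n_samples, seed=None):
    """Pick n_samples distinct quadrats from an n_rows x n_cols grid of candidates."""
    rng = random.Random(seed)
    candidates = [(row, col) for row in range(n_rows) for col in range(n_cols)]
    # sample() draws without replacement; every candidate is equally likely.
    return rng.sample(candidates, n_samples)

quadrats = simple_random_sample(n_rows=10, n_cols=10, n_samples=16, seed=42)
print(quadrats)
```

Fixing the seed makes the draw reproducible, which helps when documenting field methods; leaving it as `None` gives a fresh draw each time.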

Probability sampling has some great advantages. First, randomization coupled with replication allows you to determine the reliability of the measurements. As a result, you can use standard statistical tests. Second, the randomization step increases the chance of representativeness and removes selection bias. Third, randomization and independence increase the chance that samples will be interspersed across the community.

You will learn much more about simple random sampling in the next chapter.

Know the sampling universe

A key step in defining the objectives of a study was identifying the sampling universe. Identifying the sampling universe, your statistical population, lets you decide what inferences are valid. Consider an example from the exercise. It makes a big difference whether conclusions pertain to all similar agricultural fields, or to just the experimental farm, or to just the 20,000 quadrats that could have been sampled, or to just the 160 quadrats that were sampled. If you make inferences that are beyond your sampling universe, your conclusions are invalid. If you make inferences that are too narrow, you are underselling your results. Knowing this relationship between observations and sampling universe will help you in every field of science.

Randomness and interspersion
(for students enrolled in 540)

By definition, every possible simple random selection of samples is equally likely. For example, the selection of quadrats on the left is just as likely as the selection on the right. But in the left example, interspersion is nil, and the data collected from this design are likely to provide an inaccurate representation of the entire study area. You want your samples to have both randomization (to fit statistical assumptions) and interspersion (to represent the entire study area). What to do?!

[Figure: interspersion diagram, showing two random selections of quadrats]

Consider some alternatives. A bad approach is to throw randomization out the window and subjectively select your sampling locations to ensure interspersion. Another bad approach is to make your random selection, but discard it if it has poor interspersion. The problem here is that subjective bias can creep into the process when you decide whether to discard a scheme. A good version of this approach would be to decide ahead of time on an objective way to cull out sample arrangements with unacceptably poor interspersion. In the example above, a good rule might be to accept only those arrangements with quadrats in all four quarters of the study area.
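The objective culling rule can be sketched as follows. The rule itself (at least one quadrat in each quarter) comes from the example above; the rectangular study area, the coordinates, and the function names are assumptions made for illustration.

```python
import random

def covers_all_quarters(points, width, height):
    """Objective rule, fixed in advance: at least one point in each quarter."""
    quarters = {(x >= width / 2, y >= height / 2) for x, y in points}
    return len(quarters) == 4

def random_arrangement_with_interspersion(n, width, height, seed=None, max_tries=10000):
    """Draw random arrangements until one satisfies the quarter rule."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        points = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]
        if covers_all_quarters(points, width, height):
            return points
    raise RuntimeError("no acceptable arrangement found; the rule may be too strict")

points = random_arrangement_with_interspersion(16, 100.0, 100.0, seed=3)
print(len(points), "quadrats, with every quarter represented")
```

The key design choice is that the acceptance rule is coded before any arrangement is drawn, so no subjective judgment enters when an arrangement is discarded.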

Another acceptable approach to increase interspersion (at least to me) is to draw up all the random arrangements that have acceptable interspersion, and select among these arrangements at random. As long as the number of acceptable arrangements is a large proportion of all possible arrangements, the statistical assumptions of simple random sampling should still hold.

A tempting but invalid way to increase interspersion is to enforce some minimum distance between quadrats. For example, you might want to throw out a random quadrat location if it touches another. But if you follow this procedure, you can end up with a biased estimate of variability. If what you are measuring is spatially autocorrelated across your study area, then throwing out adjacent quadrats will lead to overestimated variance. And you wouldn't want that, would you?

A formal way to ensure interspersion is stratified random sampling (StRS). In StRS, you first divide the study area into strata. Then you decide how many sampling locations per stratum you will observe. Then, with stratified simple random sampling, you use simple random sampling to place sampling locations within each stratum. With StRS you know you will have locations throughout your study area. A later chapter covers stratified random sampling in greater detail.
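The steps of StRS can be sketched as below. The choice of four quarters as strata, the 20 x 20 grid of candidate quadrats, and the function names are hypothetical; in practice strata are defined by whatever division of the study area suits the study.

```python
import random

def quarter_strata(size):
    """Divide a size x size grid of candidate quadrats into four strata."""
    half = size // 2
    strata = {"NW": [], "NE": [], "SW": [], "SE": []}
    for row in range(size):
        for col in range(size):
            name = ("N" if row < half else "S") + ("W" if col < half else "E")
            strata[name].append((row, col))
    return strata

def stratified_random_sample(strata, n_per_stratum, seed=None):
    """Use simple random sampling separately within each stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(candidates, n_per_stratum)
        for name, candidates in strata.items()
    }

strata = quarter_strata(20)
sample = stratified_random_sample(strata, n_per_stratum=4, seed=7)
for name, locations in sample.items():
    print(name, locations)
```

Every stratum contributes exactly four locations, so interspersion across the strata is guaranteed rather than left to chance.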

Preview: Sampling designs vs. experimental designs

Sampling is a way to collect information from a subset of a study area or statistical population and make inferences to the whole population; it seeks to describe. Experimentation is a way to prod nature into revealing secrets about cause and effect; it seeks to understand. As different as sampling and experimentation sound, they share many similarities and can even be confused. One basic connection between sampling and experimentation is that an experiment is almost always run on only a portion of the larger group. For example, a forest ecologist fertilizes only some forest stands with nitrogen and leaves others unfertilized; a population biologist selects a subset of flowers in a population and self-pollinates one half and cross-pollinates the other half. How these plots are selected from all the plots in the study area, and how these flowers are selected from all the flowers in the study area, are questions of sampling.

A second connection between sampling designs and experimental designs, especially in vegetation science, is that the treatment areas are often too large to census and must instead be sampled. In the forest fertilization example, data would be collected not from entire fertilized or unfertilized stands, but perhaps from quadrats selected at random from within each treated area. In fact, the forest fertilization example shows a cascade of connections between sampling and experimentation. How are stands selected to participate in the experiment? By using the principles of sampling. How are data collected from the stands? By selecting quadrats and taking measurements using the principles of field sampling methods.

Experimental designs are covered in more detail in the chapter Making comparisons.

Putting your knowledge to the test

Now that you've learned of the horrors of preferential and systematic sampling and the joys of probability sampling, it is time to try things out. See if you can apply your knowledge to some ecological situations and to published articles in vegetation science. If you are a BOT 440 student, select the exercise Interpreting field methods. If you are a BOT 540 student, select the exercise Identifying sampling designs.


© 2007 Mark V. Wilson and Oregon State University