Subsetting Data
The IF and WHERE statements can be used to subset a dataset. This is useful when the original
dataset is large and you only need to work with a part of it. SAS operates more efficiently
with smaller datasets.
Syntax
IF statement: IF expression ;
WHERE statement: WHERE expression;
The expression or condition part of the statement is similar to that of the IF-THEN-ELSE statement.
When no THEN or THEN-ELSE follows the expression, the dataset only keeps observations satisfying
the condition of the expression.
For example,
IF 20 < age < 40;
This IF statement would only keep observations where the value of the variable Age is between
20 and 40.
If the goal was not to keep specific observations, but to remove them, a IF-THEN statement can
be used with the command DELETE to remove them.
Example,
IF 20 < age < 40 THEN DELETE;
Now the datasets will remove any observations where the value of Age is between 20 and 40.
Obviously, this can also be accomplished by using: IF NOT (20 < age < 40); .
Often one is more useful in a particular situations.
The WHERE statement is used just like the IF statement, with one exception. The IF statement
is processed AFTER SAS has already read in the entire dataset and the WHERE statement
is processed WHILE SAS is reading in the dataset. So, if the dataset being subsetted
is large, the WHERE statement only requires that SAS read in the observations that satisfy the
expression of the WHERE statement. The IF statement would first read in all the observations
of the large dataset and then subset it to satisfy the IF statement. Since the WHERE statement
is processed as SAS is reading in data, it can only be used when reading in a dataset.
Example:
DATA subdata;
SET perm.bigdata;
WHERE Sex = "Female";
RUN;
If the datasets perm.bigdata contained data from both male and female subjects, the new dataset
subdata would only read in the observations with Sex="Female".
Only one WHERE statement can be used in a dataset--if there are more than one, then the last
one is used. Sometimes it is convenient to use separate statements for a complicated conditional
expression. The WHERE statement has a companion statement called the WHERE ALSO statement, which
allows additional conditions to be processed with the WHERE statement. It is used just like the
WHERE statement, with only the additional keyword ALSO used. More than one WHERE ALSO statements
can be used in a data step.
Example:
DATA subdata;
SET perm.bigdata;
WHERE Sex = "Female";
WHERE ALSO 20 < age < 40;
RUN;
Now, the only observations satisfying both Female between the ages of 20 and 40 will be read in
to the new dataset subdata.
One additional feature of the WHERE statement is that it can be used both for data and proc steps.
So if you want to run the data through a procedure, but only want a subset of the data to be
processed, you can use a WHERE statement.
Example,
PROC PRINT data=perm.bigdata;
WHERE Sex = "Female";
TITLE "Female subjects only";
RUN;
Only the observations with Sex = "Female" will be printed.