Merging SAS datasets (Sections 5.4 & 5.5)
When you have 2 or more datasets containing data on the same subjects, you can combine the datasets by merging them together. Merging is different than stacking or interleaving because (1) usually the datasets have only one or two variables in common and (2) the observations match, or correspond to the same subject.
Syntax
DATA merged_dataset;
MERGE data1 data2 ... ;
BY byvar1 ...;
RUN;
Example 1:1-1 Match Merge
data1:
ID Class Name
1 A John
2 C Jill
3 B Paul
4 C Andrea
data2:
ID Test1 Test2
1 93 85
2 86 75
3 74 88
4 85 93
DATA merged;
MERGE data1 data2;
BY id;
RUN;
merged:
ID Class Name Test1 Test2
1 A John 93 85
2 C Jill 86 75
3 B Paul 74 88
4 C Andrea 85 93
Note that the variable ID identifies which observations on each of the datasets should be merged together. For this example, there is only one observation from each dataset to be joined. This is called a 1-1 match merge. Each of the datasets MUST be sorted on the BYVAR before the MERGE can work.
Example 2: 1-many Match Merge
data1:
ID Class Name
1 A John
2 C Jill
3 B Paul
4 C Andrea
data2:
ID Test Score
1 1 93
1 2 85
2 1 86
3 2 74
3 1 82
3 3 81
4 2 85
4 1 93
DATA merged;
MERGE data1 data2;
BY id;
RUN;
merged:
ID Class Name Test Score
1 A John 1 93
1 A John 2 85
2 C Jill 1 86
3 B Paul 2 74
3 B Paul 1 82
3 B Paul 3 81
4 C Andrea 2 85
4 C Andrea 1 93
Once again, the ID variable found in both datasets identifies which observations to merge together. For this example, data1 has only 1 observation per unique ID, but data2 has between 1 and 3. This is called a 1-many match merge. SAS first reads an observation from data1 and then looks for a matching observation from data2. SAS continues to ready observations from data2 until it finds the first non-matching observation. SAS then reads a new observation from data1. Again, the datasets must be sorted on the BYVAR.
If there are more than one observation per ID in first dataset, SAS first does a 1-1 match merge and then uses the last observation in data1 to match to the remaining observations in data2. Or vice-versa if SAS encounters the last observation in data2 first. This is usually NOT what you want to do, so you may have to specify additional BYVARs or check your data for errors before merging.
Example 3:1-1 Match Merge with duplicate variables
data1:
ID Class Name
1 A John
2 C Jill
3 B Paul
4 C Andrea
data2:
ID Class Test1 Test2
1 lab1 93 58
2 lab3 86 92
3 lab2 74 88
4 lab3 85 68
DATA merged;
MERGE data1 data2;
BY id;
RUN;
merged:
ID Class Name Test1 Test2
1 lab1 John 93 85
2 lab3 Jill 86 75
3 lab2 Paul 74 88
4 lab3 Andrea 85 93
Note that the variable Class appears both in data1 and data2. When a variable appears in more than one dataset and is not the BYVAR, SAS keeps the value found in the last dataset. This is probably something you don't want to do, so you should rename non-BYVARs that are found in more than one dataset so that this situation does not occur.
Example 4:Match Merge with multiple BYVARs
data1:
ID Class Name
1 A John
1 C Sandy
2 A Carol
2 B Gail
2 C Jill
3 B Paul
4 B Ryan
4 C Andrea
data2:
ID Class Test1 Test2
1 A 93 85
1 c 85 93
2 A 86 92
2 B 75 83
2 C 94 95
3 B 74 88
4 B 89 98
4 C 85 68
DATA merged;
MERGE data1 data2;
BY id class;
RUN;
merged:
ID Class Name Test1 Test2
1 A John 93 85
1 C Sandy . .
1 c 85 93
2 C Jill 86 92
2 C Jill 75 83
2 C Jill 94 95
3 B Paul 74 88
4 C Andrea 89 98
4 C Andrea 85 68
When merging with multiple BYVARs, SAS simply finds unique combinations of the BYVARS and then does a 1-1 or 1-many match merge. Note that the BYVARs must match exactly. This is especially important when dealing with character variables. Note that in data2, the 2nd observations has the value of class as a lower case 'c' instead of an upper case 'C' as found in data1. Even though these two observations should probably match, they cannot because 'c' is not equal to 'C'.
Again, the observations in each dataset must be sorted by all the BYVARS in the order listed.