|
|
by Chong Ho (Alex) Yu, Ph.D. |
Problem
It is very common for a survey to contain questions that prompt respondents to check all options that apply. For example, in the following hypothetical survey developed for statistics professors, the first question is: "Which universities did you teach in the last ten years?" The given options are "Harvard," "Yale," "MIT," and "Other." The second question is: "Which statistical packages do you use on a daily basis?" The given choices are "SAS," "SPSS," "Splus," and "Other." In some survey engines, such as SurveyMonkey, item responses for "Check all that apply" are divided into separate fields, as shown in the following table. How can I present responses of A1-A4 in one single bar chart and that of A5-A8 in another chart?
| ID | Gender | A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8 |
| Subject 1 | F | Harvard | Yale | MIT | SAS | SPSS | |||
| Subject 2 | F | Yale | |||||||
| Subject 3 | M | Harvard | Yale | MIT | SAS | SPSS | |||
| Subject 4 | M | Harvard | Splus | Other | |||||
| Subject 5 | M | MIT | Other | SPSS | |||||
| Subject 6 | M | Harvard | MIT | SAS | |||||
| Subject 7 | M | Yale | Other | SAS | SPSS | Splus | Other | ||
| Subject 8 | F | Yale | SAS | SPSS | Splus | ||||
| Subject 9 | F | Harvard | MIT | Other | SPSS | ||||
| Subject 10 | F | Yale | SAS | SPSS | Other | ||||
| Subject 11 | M | Harvard | MIT | Other | SPSS | ||||
| Subject 12 | F | Yale | SAS | Splus | |||||
| Subject 13 | F | SPSS | Splus | ||||||
| Subject 14 | M | Harvard | Yale | MIT | SAS | Other | |||
| Subject 15 | M | Harvard | SPSS | Splus | |||||
| Subject 16 | M | Yale | SAS | SPSS | |||||
| Subject 17 | M | Harvard | Yale | MIT | Other | SPSS | |||
| Subject 18 | M | Harvard | MIT | SAS | |||||
| Subject 19 | M | Harvard | MIT | SAS | |||||
| Subject 20 | F | Harvard | MIT | SAS | Splus | Other |
Solution
If there are only two questions using "check all that apply" in the survey, you can simply count the occurrence of the item responses in each column, write down the total numbers, and then create the bar chart using those numbers (or ask your graduate assistant to do it). But if you have 200 questions like this, automation is definitely a smarter way to go.
The automated process can be performed by three macros, which will be explained momentarily. An experienced SAS programmer can go even further to collapse the three steps into one. Nonetheless, I recommend making a logical break between tasks, because when something goes wrong, it is easier to find out where the bug is in partitioned code modules.
The first macros function is "count_check." As shown below, the comments in green should be self-explanatory. It is assumed that the survey data set has been loaded into a SAS dataset named "survey."
|
/* Create a macro function to count how many people check each option The function has two arguments: start item number, end item number */ %macro count_check(start, end); %DO i = &start %to &end; data temp; set survey; if A&i NE " " then count = 1; /* PROC SUMMARY gives the overall count and also the count by gender */ proc summary; class gender; var count; id A&i; output out=temp&i sum = Freq; /* To identify the option, create a new variable called "selection" To push the option "Other" to the bottom, add a space in front of the option values */ data temp&i; set temp&i; if A&i NE "Other" then selection = (" "||A&i); else selection = A&i; drop A&i; run; %END;
%mend
count_check; /* Invoke the function by starting from item A1 and end at item A4 */ %count_check(1, 4); |
The above procedure generates four temp files. In the first field entitled "Gender" the value for the first row is blank, because the first row is the frequency count of all subjects regardless of the gender. In PROC SUMMARY this type of summary is called "Type 0," which is indicated in the field "_TYPE_." The field "_FREQ_" shows the sample size whereas "Freq" indicates the number of respondents who checked the option.
The next two rows, which are "Type 1 summary," show the numbers partitioned by gender. It is important to note that in these tables percentage is omitted for a good reason. In this survey study there are totally 20 subjects, which are composed of 12 males and 8 females. It makes sense to say that 60% of the respondents are men whereas 40% are women. However, it is misleading to say X% taught at Harvard and Y% taught at Yale. It is because when subjects are allowed to check all that apply, responses are not mutually exclusive, of course, and thus the percentage will be over 100%.
|
Temp 1 |
Temp 2 |
||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||
|
Temp 3 |
Temp 4 |
||||||||||||||||||||||||||||||||||||||||
|