Graphics for "Check all that apply"

by Chong Ho (Alex) Yu, Ph.D.

Problem

It is very common for a survey to contain questions that prompt respondents to check all options that apply. For example, in the following hypothetical survey developed for statistics professors, the first question is: "Which universities did you teach in the last ten years?" The given options are "Harvard," "Yale," "MIT," and "Other." The second question is: "Which statistical packages do you use on a daily basis?" The given choices are "SAS," "SPSS," "Splus," and "Other." In some survey engines, such as SurveyMonkey, item responses for "Check all that apply" are divided into separate fields, as shown in the following table. How can I present responses of A1-A4 in one single bar chart and that of A5-A8 in another chart?

ID	Gender	A1	A2	A3	A4	A5	A6	A7	A8
Subject 1	F	Harvard	Yale	MIT		SAS	SPSS
Subject 2	F		Yale
Subject 3	M	Harvard	Yale	MIT		SAS	SPSS
Subject 4	M	Harvard						Splus	Other
Subject 5	M			MIT	Other		SPSS
Subject 6	M	Harvard		MIT		SAS
Subject 7	M		Yale		Other	SAS	SPSS	Splus	Other
Subject 8	F		Yale			SAS	SPSS	Splus
Subject 9	F	Harvard		MIT	Other		SPSS
Subject 10	F		Yale			SAS	SPSS		Other
Subject 11	M	Harvard		MIT	Other		SPSS
Subject 12	F		Yale			SAS		Splus
Subject 13	F						SPSS	Splus
Subject 14	M	Harvard	Yale	MIT		SAS			Other
Subject 15	M	Harvard					SPSS	Splus
Subject 16	M		Yale			SAS	SPSS
Subject 17	M	Harvard	Yale	MIT	Other		SPSS
Subject 18	M	Harvard		MIT		SAS
Subject 19	M	Harvard		MIT		SAS
Subject 20	F	Harvard		MIT		SAS		Splus	Other

Solution

If there are only two questions using "check all that apply" in the survey, you can simply count the occurrence of the item responses in each column, write down the total numbers, and then create the bar chart using those numbers (or ask your graduate assistant to do it). But if you have 200 questions like this, automation is definitely a smarter way to go.
The automated process can be performed by three macros, which will be explained momentarily. An experienced SAS programmer can go even further to collapse the three steps into one. Nonetheless, I recommend making a logical break between tasks, because when something goes wrong, it is easier to find out where the bug is in partitioned code modules.
The first macros function is "count_check." As shown below, the comments in green should be self-explanatory. It is assumed that the survey data set has been loaded into a SAS dataset named "survey."

/* Create a macro function to count how many people check each option

The function has two arguments: start item number, end item number */

%macro count_check(start, end);

%DO i = &start %to &end;

data temp; set survey;

if A&i NE " " then count = 1;

/* PROC SUMMARY gives the overall count and also the count by gender */

proc summary; class gender; var count; id A&i; output out=temp&i sum = Freq;

/* To identify the option, create a new variable called "selection"

To push the option "Other" to the bottom, add a space in front of the option values */

data temp&i; set temp&i;

if A&i NE "Other" then selection = (" "||A&i);

else selection = A&i;

drop A&i; run;

%END;

%mend count_check;

/* Invoke the function by starting from item A1 and end at item A4 */

%count_check(1, 4);

The above procedure generates four temp files. In the first field entitled "Gender" the value for the first row is blank, because the first row is the frequency count of all subjects regardless of the gender. In PROC SUMMARY this type of summary is called "Type 0," which is indicated in the field "_TYPE_." The field "_FREQ_" shows the sample size whereas "Freq" indicates the number of respondents who checked the option.
The next two rows, which are "Type 1 summary," show the numbers partitioned by gender. It is important to note that in these tables percentage is omitted for a good reason. In this survey study there are totally 20 subjects, which are composed of 12 males and 8 females. It makes sense to say that 60% of the respondents are men whereas 40% are women. However, it is misleading to say X% taught at Harvard and Y% taught at Yale. It is because when subjects are allowed to check all that apply, responses are not mutually exclusive, of course, and thus the percentage will be over 100%.

Temp 1

Temp 2

Gender	_TYPE_	_FREQ_	Freq	selection
	0	20	12	Harvard
F	1	8	3	Harvard
M	1	12	9	Harvard

Gender	_TYPE_	_FREQ_	Freq	selection
	0	20	10	Yale
F	1	8	5	Yale
M	1	12	5	Yale

Temp 3

Temp 4

Gender	_TYPE_	_FREQ_	Freq	selection
	0	20	11	MIT
F	1	8	3	MIT
M	1	12	8	MIT

Gender	_TYPE_	_FREQ_	Freq	selection
	0	20	5	Other
F	1	8	1	Other
M	1	12	4	Other

The second macros function is called merge_temp. Again, the comments in green should be self-explanatory.

/* Merge separate temp files of A1 to A4 into one file */

%macro merge_temp(start, end);

data all; set

      %do i = &start %to &end;

      temp&i

      %end;

      ;run;

%mend merge_temp;

/* Invoke the function by starting from Temp1 and ending at Temp4 */

%merge_temp (1, 4);

The above procedure simply appends all temp files into one file. Now you have a data set ready for SAS graphing.

Gender	_TYPE_	_FREQ_	Freq	selection
	0	20	12	Harvard
F	1	8	3	Harvard
M	1	12	9	Harvard
	0	20	10	Yale
F	1	8	5	Yale
M	1	12	5	Yale
	0	20	11	MIT
F	1	8	3	MIT
M	1	12	8	MIT
	0	20	5	Other
F	1	8	1	Other
M	1	12	4	Other

The last macros function is for graphing. The focus of this write-up is about automation rather than graphing. If you are unfamiliar with SAS/GRAPH and ODS, please consult a SAS manual.

/* Create a macros function to plot the graphics

This function takes four arguments:

1. device e.g. activex, actximg, PNG

2. fileformat e.g. RTF, HTML

3. filename: Output file name e.g. A1_A4

4. title: Title in the graph, usually it is the question text */

%macro plot_select(device, fileformat, filename, title);

goptions reset=all device=&device;

ods &fileformat file="&filename..&fileformat" path="&path"(URL=none);

/* Plot the overall count of each option regardless of gender */

data all2; set all;

if _TYPE_ = 0; run;

title "&title";

proc gchart data=all2;

hbar selection /sumvar=freq;

/* Plot the frequency count of each option by gender */

data all2; set all;

if _TYPE_ = 1; run;

proc gchart;

hbar selection /sumvar=freq group=gender subgroup=gender; run;

ods &fileformat close; quit;

%mend plot_select;

/* Invoke the macros function, provide four agruments */

%plot_select

(PNG, html, A1_A4, Which universities did you teach in the last ten years?);

The preceding procedure generates the following graphs:

It looks very complicated, doesn't it? How about the next question and all the rest in the survey data set? Will I go through the above steps over and over? No, I am very lazy. To plot item response from A5 to A8, I simply invoke the three macros functions as follows:

%count_check(5, 8);

%merge_temp (5, 8);

%plot_select (ACTXIMG, RTF, A5_A8, Which statistical packages do you use on a daily basis?);

Please note that for A5-A8 I changed the device type from PNG to ActiveX Image (ACTXIMG), and changed the output file type from HTML to RTF. I can easily change these options without touching the original source code. Using this approach I can create 200 graphs in a short period of time, because copying, pasting, and editing are no longer necessary.

Graphics for "Check all that apply"

Navigation

Index

Simplified Navigation

Table of Contents

Search Engine

Contact