Export data to statistical packages

Chong-ho (Alex) Yu, Ph.D., MCSE, CNE

Problem:

I want to output data to a statistical software such as SAS and SPSS. How should I set up the database and export the data?

Solution:

There are three issues regarding exporting data to statistical packages: multiple values, missing values, and non-numeric values

Multiple values

In a database it is acceptable to store multiple values into a single field. You can either store them as a continuous string in a regular text field or as separate entries in a repeating field. The data are usually input by checkbox:

Which type of equipment do you have? (check all that apply)

a. Printer
b. Scanner
c. Digital camera
d. Modem

In statistical analysis each observation should carry one value per variable. Problems arise when you export these data to a statistical package. To avoid massive cleanup, you should treat each value in the above question as a variable. Each variable carries binary datum and the input format should be radio button instead:
Do you have the following equipment?
a. Printer YesNo
b. ScannerYesNo
c. Digital cameraYesNo
d. ModemYesNo

Missing values

When you output data from FMP to SAS and SPSS, missing data such as the following may bring you headache.

4 5 4 5 3 4 3 4
4 4 3 3   1 2   
3   2 3 2 4 2 2

Although in SAS and SPSS you can use read ASCII data as fixed columns to avoid the problem of missing data, you have to define the column position for each variable. It is easier to assign a "period" as a non-response beforehand (SAS and SPSS see a period as a missing value) When there is no input, the field carries a period, otherwise, the user input overwrites the period. If there is no gap between data, importing data into SAS and SPSS will be much easier.

4 5 4 5 3 4 3 4
4 4 3 3 . 1 2 . 
3 . 2 3 2 4 2 2

There are two ways to auto-enter missing values as default. You can do it in FMP by specifing the Auto Enter Option in Defining Field:

Another way is to specific a default value into the input tag in HTML file:


Question 1 <input type=textfield name=A1 value=".">   

The following is related to the cosmetic aspect rather than the statistical aspect. Nonethless, by pre-assigning a symbol for missing data, you can output a prettier table. In a HTML table, the table cell has no borders when there is no value:

A1

A2

A3

A4

A5

A6

A7

A8

4

5

4

5

3

4

3

4

4

4

3

3

1

2

3

2

3

2

4

2

2

However, pre-assigned symbols can "complete" the table as shown in the following:

A1

A2

A3

A4

A5

A6

A7

A8

4

5

4

5

3

4

3

4

4

4

3

3

.

1

2

.

3

.

2

3

2

4

2

2

Notes: If you do not want to use a period for missing data, you can also produce a prettier HTML table by inserting a "non-breaking space paragraph" (nbsp) tag or a "break" (br) tag into the table cell.


      <TD>
         <P>[FMP-FIELD: A1]&nbsp;</P>
      </TD>
      <TD>
         <P>[FMP-FIELD: A2]<br></P>
      </TD>

 

Non-numeric values

Last but not least, you should not import string-based data into a statistical package except the variables that will be used in the analysis (e.g. "gender" can be used as a class variable and "age" can be a blocking factor). It is very common that people tend to import all fields into SAS or SPSS without planning the analysis. Basically, SAS and SPSS are for number-crunching rather than textual analsyis. Importing text-based data into a stat package does more harm than good. For example, SPSS may be frozen when tones of essay-type data are dumped into SPSS.

Export data

Exporting data from FMP to different statistical packages requires different strategies. The following illustration uses SPSS as an example. A direct transfer of data from FMP to SPSS is difficult. SPSS tends to ignore the end of line in each case and thus all data are read into one row. To rectify this situation, one can use Excel as an intermediate path.


Navigation

FMP Tips Contents

Other computer tips

Search Engine

Credit/Copyright ©

Simplified Navigation

Table of Contents

Contact Me