Export data to statistical packages |
|
Problem:I want to output data to a statistical software such as SAS and SPSS. How should I set up the database and export the data?
Solution:
There are three issues regarding exporting data to statistical packages: multiple values, missing values, and non-numeric valuesMultiple values
In a database it is acceptable to store multiple values into a single field. You can either store them as a continuous string in a regular text field or as separate entries in a repeating field. The data are usually input by checkbox:
Which type of equipment do you have? (check all that apply)
|
In statistical analysis each observation should carry one value per variable. Problems arise when you export these data to a statistical package. To avoid massive cleanup, you should treat each value in the above question as a variable. Each variable carries binary datum and the input format should be radio button instead:
Do you have the following equipment? |
Missing values
When you output data from FMP to SAS and SPSS, missing data such as the following may bring you headache.
4 5 4 5 3 4 3 4 4 4 3 3 1 2 3 2 3 2 4 2 2 |
Although in SAS and SPSS you can use read ASCII data as fixed columns to avoid the problem of missing data, you have to define the column position for each variable. It is easier to assign a "period" as a non-response beforehand (SAS and SPSS see a period as a missing value) When there is no input, the field carries a period, otherwise, the user input overwrites the period. If there is no gap between data, importing data into SAS and SPSS will be much easier.
4 5 4 5 3 4 3 4 4 4 3 3 . 1 2 . 3 . 2 3 2 4 2 2 |
There are two ways to auto-enter missing values as default. You can do it in FMP by specifing the Auto Enter Option in Defining Field:
Another way is to specific a default value into the input tag in HTML file:
|
The following is related to the cosmetic aspect rather than the statistical aspect. Nonethless, by pre-assigning a symbol for missing data, you can output a prettier table. In a HTML table, the table cell has no borders when there is no value:
A1 |
A2 |
A3 |
A4 |
A5 |
A6 |
A7 |
A8 |
4 |
5 |
4 |
5 |
3 |
4 |
3 |
4 |
4 |
4 |
3 |
3 |
1 |
2 |
||
3 |
2 |
3 |
2 |
4 |
2 |
2 |
However, pre-assigned symbols can "complete" the table as shown in the following:
A1 |
A2 |
A3 |
A4 |
A5 |
A6 |
A7 |
A8 |
4 |
5 |
4 |
5 |
3 |
4 |
3 |
4 |
4 |
4 |
3 |
3 |
. |
1 |
2 |
. |
3 |
. |
2 |
3 |
2 |
4 |
2 |
2 |
Notes: If you do not want to use a period for missing data, you can also produce a prettier HTML table by inserting a "non-breaking space paragraph" (nbsp) tag or a "break" (br) tag into the table cell.
|
Non-numeric values
Last but not least, you should not import string-based data into a statistical package except the variables that will be used in the analysis (e.g. "gender" can be used as a class variable and "age" can be a blocking factor). It is very common that people tend to import all fields into SAS or SPSS without planning the analysis. Basically, SAS and SPSS are for number-crunching rather than textual analsyis. Importing text-based data into a stat package does more harm than good. For example, SPSS may be frozen when tones of essay-type data are dumped into SPSS.
Export data
Exporting data from FMP to different statistical packages requires different strategies. The following illustration uses SPSS as an example. A direct transfer of data from FMP to SPSS is difficult. SPSS tends to ignore the end of line in each case and thus all data are read into one row. To rectify this situation, one can use Excel as an intermediate path.
- Export the data from FMP as a tab delimited file.
- Import the tab delimited file into Excel. The default data type in Excel is delimited rather than fixed width. Also, Excel is able to detect whether it is tab, comma, or space delimited.
- The default data format in Excel is general, which is proper for numeric data. If the dataset has data formats other than number (e.g. text, date), highlight the column and choose the appropriate data format.
- After the data have been imported into Excel, save the file in Excel 4 format. Write down the range of cells. For example, if the data spans across 11 columns, the range should be A:K. Afterwards, close the Excel file. If the Excel spreadsheet is not closed, SPSS will not be able to access it.
- In SPSS, open the Excel 4 file and specify the range. The data should be imported seamlessly.
FMP Tips Contents
|
|