Parametric tests are frequently applied by researchers, but some researchers may neither understand the theoretical framework behind parametric tests, nor hold beliefs that are consistent with that framework. The problem is partly due to the fact that the framework consists of components from different sources (schools of thought) but textbook writers and instructors tend to de-emphasize the incompatibility, and as a result, numerous misconceptions arise. This article synthesizes historical and contemporary perspectives to the parametric test framework, and proposes that teaching the parametric test framework should occur in a coherent manner.
Parametric tests are frequently applied by researchers, but some researchers may neither understand the theoretical framework behind parametric tests, nor hold beliefs that are consistent with that framework. The parametric test framework is defined by the relationships among sample, sampling distributions, and population. This article points out several omissions in statistics textbooks and common misconceptions concerning these relationships. It is proposed that these relationships should be taught in a coherent fashion. To provide support for this claim, we (a) reviewed 55 statistics textbooks for various majors such as social sciences and engineering, and (b) administered an online survey specific to the concepts of parametric tests to 34 graduate students who have taken an average of 4.9 undergraduate and graduate statistics courses.
Parametric test framework
The absence of foundational concepts causes subsequent misconceptions in the interpretation and application of parametric tests. Sixty-two percent of the respondents to the survey did not know what a parametric test was, let alone the assumptions of parametric tests and the criteria of choosing among parametric tests, non-parametric tests, and other data analytical strategies. This lack of awareness of foundational concepts may be traced back to statistics texts. In the textbook review, it was found that only 20 percent of these books explained the term "parametric tests." Only one book illustrated a road map of choosing between parametric and non-parametric tests (Sharp, 1979). To address this problem, the following illustration is presented.
Figure 1 shows an illustration of the basic components of the parametric test framework. Unlike the resampling framework, which is entirely empirical-based (Good, 1994; Edgington, 1995), the parametric test framework consists of a theoretical world and an empirical world. Although statistical testing appears to be empirical, the foundation of inference is indeed non-empirical. On the theoretical side, there are an infinite population and a sampling distribution, which are the target and the foundation of probabilistic inference, respectively. Probabilistic inference, which leads to a codification of uncertainty (chance fluctuation) by confidence intervals and hypothesis testing, is considered the classical paradigm for parametric tests (Cleveland, 1993). This inference rests on the foundation of sampling distributions and the central limit theorem (CLT), and for this reason, the theorem is regarded as the most important theorem in statistics (Berk & Carey, 1998; Abell, Braselton, & Rafter, 1999). In order to generate data for the inference, power analysis and a sampling method are needed on the empirical side. Each component of the framework will be explained in detail.
Figure 1. Parametric test framework
A parametric test, as its name implies, is a test using the sample statistic to estimate the population parameter. An initial misconception arises from the meaning of "parameter." According to Webster's New World Dictionary (Simon and Schuster, 1991), the term "parameter" denotes a constant with variable values. Another example can be found in computer programming: when a function passes a parameter to another function, the parameter carries an exact value. However, this is not always the case in statistics.
Hypothetical and theoretical world
Infinite size. Generally speaking, in parametric tests a parameter is introduced as a fixed number that describes the population, but a parameter is viewed by Bayesians as a random variable (Schield, 1997). Indeed, the first statement is correct if the population refers to the accessible population. However, parametric tests start with a hypothetical infinite population and thus it is controversial whether a parameter is a fixed constant. For instance, let us assume that we can measure the height of every American male aged 18 or over. We draw the conclusion that the mean height of these men is 1.51 meters. This mean height is not a fixed constant. Its value will change a second later, since every second thousands of American men die and thousands of American males reach their 18th birthday.
Even if the population size is finite, the population parameter is still not a fixed constant. Thurstone (1937) observed the existence of distributions both between people and within people. Since people are different, this between-subject variability forms a distribution. However, the same person also has different task performance levels and attitudes toward an issue at different times. This variability within the same person could also form a distribution. Following this framework, even if the population has a fixed number of members, it could still yield a changing parameter.
Frick (1998) used an example of "the planet of Forty" to illustrate the application of inferences to a finite population. This example could be stretched to illustrate the concept of distribution within. Imagine that in the planet of Forty, there are only 40 residents who can live forever but cannot reproduce offspring. Imagine that their memory can be erased so that a treatment effect will not carry over to the next one. When they are split into two groups and are exposed to two different treatments, are the two mean scores considered fixed parameters? The answer is "no." A month later when the researcher wipes out what they have learned in the first experiment and asks them to start the experiment over, the scores will vary. This is one of the reasons why statistical tests are still useful even if the researcher has full knowledge of the population. Since there is variability within and between subjects, the researcher needs to know whether the difference is due to chance fluctuations regardless of whether the source of fluctuation is between or within.
A real-life example can be found in the debate on university faculty salary equity studies. Haignere, Lin, Eisenberg, and McCarthy (1996) suggested that the use of statistical significance is improper when the complete population of faculty members is studied. To counter this argument, Dizinno (1999) stated that the current faculty are only a sample that reflects ongoing, and possible future, salary-setting policies, and thus they are a sample of the population, not the complete population. Cohen (1999) supported the preceding argument by applying the concept of "infinite population."
The preceding problem is associated with the confusion between people and observations. Researchers are concerned with the observations, which may be test scores or some form of measurement, rather than the subjects. In this view, even if the number of people is finite, the number of observations is infinite because the same people can generate infinite observations.
The failure to see a hypothetical population as an infinite population leads to another common misconception: Sample size determination is viewed as being based upon the ratio between the sample and the population. Burrill (1999) argued that the sample size does not depend on the population size being sampled, unless the population is so small that the sample size is a considerable fraction. The same sample size would be required for a population of 8 million as for 125,000. Most people perceive that a given level of precision would require a larger sample from the 8 million than from the 125,000. Actually, the precision of the result is a function of the quantity of information one has in hand, not of the quantity of information in the population of interest. Further, Warwick and Lininger (1975) made it clear that the most important factor in reducing the standard error is the absolute size of the sample rather than the proportionate size (sample-population ratio, see Figure 2). In this view, the size of the population is irrelevant to sample size determination.
Figure 2. Absolute sample size
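The point that absolute sample size, not the sample-population ratio, drives precision can be checked numerically. The following is a minimal sketch, assuming simple random sampling without replacement and the usual finite population correction; the standard deviation of 10 and the sample size of 1,000 are hypothetical values chosen only for illustration, and the function name is not from any cited source.

```python
import math

def standard_error_of_mean(sd, n, population_size=None):
    """Standard error of a sample mean, optionally with a finite population correction."""
    se = sd / math.sqrt(n)
    if population_size is not None:
        # Correction for simple random sampling without replacement from a finite population
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se

SD = 10.0        # hypothetical population standard deviation
N_SAMPLE = 1000  # hypothetical sample size

se_small_pop = standard_error_of_mean(SD, N_SAMPLE, 125_000)
se_large_pop = standard_error_of_mean(SD, N_SAMPLE, 8_000_000)
se_infinite = standard_error_of_mean(SD, N_SAMPLE)       # no correction at all
se_bigger_sample = standard_error_of_mean(SD, 4 * N_SAMPLE)

print(f"population   125,000: SE = {se_small_pop:.4f}")
print(f"population 8,000,000: SE = {se_large_pop:.4f}")
print(f"infinite population : SE = {se_infinite:.4f}")
print(f"infinite, n = 4,000 : SE = {se_bigger_sample:.4f}")
```

Under these assumptions the three standard errors for n = 1,000 differ by well under one percent, whereas quadrupling the sample halves the standard error regardless of the population size.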
In a similar vein, Fisher (2008) asserted that, contrary to popular belief, response rates do not determine the validity or reliability of survey data. Consider this scenario: two surveys each yielded a sample size of 1,000, but one has a 100% response rate and the other has a 1% response rate. In both cases, the sample of 1,000 provides a margin of error of, at worst, 3.1 percentage points at 95% confidence for a dichotomous proportion. If the relevant demographics of the respondents are proportional to those in the population, and there is no self-selection bias, then the 1% response rate is as good as the 100% response rate. In short, proper sample size should be determined by power analysis rather than sample-population ratio. Power analysis will be discussed in a later section.
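The 3.1% figure can be reproduced from the normal approximation to a proportion. This is a sketch only, assuming the worst case p = 0.5 and a two-sided 95% level; the function name is illustrative.

```python
import math
from statistics import NormalDist

def margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of the normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return z * math.sqrt(p * (1 - p) / n)

worst_case = margin_of_error(1000)  # p = 0.5 maximizes p * (1 - p)
print(f"worst-case margin of error for n = 1000: {worst_case:.3f}")  # about 0.031
```

Note that the response rate never enters the calculation; only the number of responses in hand does.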
Glass and Hopkins (1996) stated that the fact that the population is not actually infinite is of little concern, because unless the ratio between the sample and the population is .05 or greater, the techniques for making inferences to finite populations and those for infinite populations give essentially the same results. Procedurally speaking, the notion that "populations are infinite" is unimportant. Conceptually speaking, the misperception of population as finite and population parameter as a fixed constant has negative consequences, as it leads researchers to seek out an objective, true and final answer that does not exist. The confusion here is related to the nature of quantitative methods, fueling misguided debates concerning qualitative and quantitative methods (e.g. Langenbach et al., 1994; Erlandson et al., 1993).
Unfortunately, the difference between infinite and finite populations is not emphasized in most statistics textbooks. Out of fifty-five reviewed books, only sixteen of them (29%) explained the difference.
Unknown distribution. Not only is this hypothetical population infinite in size and fluctuating within, it is also unknown in distribution. Contrary to popular belief, the population distribution is not necessarily normal. This leads to another problem. It is a common belief that a random sample represents the target population and therefore random sampling is required for a parametric test. However, when the population is infinite, fluctuating, and unknown, there is no way of knowing whether the sample reflects the population even if the sample is random (Frick, 1998). The noted physicist Jaynes (1995) identified the perceived equivalence between the random sample and the unknown population as the "mind projection fallacy."
Regardless of the uncertainty of the population, one must start with an accessible population from which random samples are drawn and to which these samples are compared. Problems arise when one regards the known and accessible population as the target population to which the inference is ultimately made. This accessible population is chosen for practical convenience only and is by no means the target population for theoretical inferences. For example, if a researcher defines the population as all current college students at a university and an inference regarding the effectiveness of web-based instruction is made to this population, does the inference also apply to future students? If the inference is localized to a particular time and space, then the findings of the experiment cannot be used to construct a theory, since theory by definition is predictive in nature (Kerlinger, 1986).
Summary. In summary, accepting the notion of a finite population will lead to five troublesome consequences:
- When the entire population is accessible, it is believed that there will be no need to conduct statistical tests.
- The population parameter will be regarded as a fixed constant and the mission of statistics will be seen as the search for one true answer.
- Proper sample size will be viewed as a high ratio between the sample and the population.
- A random sample will be believed to be representative of a population (the "mind projection fallacy").
- The generalizability of the inference will be limited to the accessible population and the construction of a universal theory will be crippled.
Survey results. The survey results confirm our suspicion that the preceding concepts are widely misunderstood. Sixty-one percent of respondents realized that a hypothetical population is infinite in size, but only twenty-three percent were aware that the distribution is unknown. Only thirty-eight percent of the respondents correctly believed that even when a researcher has full access to the entire population, there is still a need to perform a statistical test.
Theoretical sampling distributions
Central limit theorem. Obtaining a true and empirical random sample from an infinite, fluctuating, and unknown population is not possible. Under the CLT, limited cases are used to construct a sampling distribution to approximate the center of the population. This theoretical sampling distribution serves as a bridge between an empirical sample and a hypothetical population. The theorem itself is used to justify making inferences from the sample to the population.
Statistical tests are said to be positivist and empirical in nature (Suen, 1992). However, sampling distributions exist in theory only, and therefore questions arise. If some things exist in theory, do they really exist? In theory, a normal distribution is based on infinite cases. One can use a supercomputer to simulate a normal distribution, but of course the simulation cannot run forever. The debate regarding the existence of mathematical reality has a long history and remains inconclusive (Penrose, 1989; Russell & Whitehead, 1938; Tieszen, 1992; Gonzalez, 1991; Yu, 1998). While mathematics is theoretical in essence and thus sampling distributions seem natural to a mathematician, sampling distributions may not correspond to the practical reality which confronts the physician, the engineer, and the scientist (Good, 1994).
Nonetheless, the founder of statistical testing, Sir R. A. Fisher (1956) did not view distributions as outcomes of empirical replications that might actually be conducted. He asserted that theoretical sampling distributions, against which observed effects are tested, have no objective reality "being exclusively products of the statistician's imagination through the hypothesis, which he has decided to test." (p.81).
Non-normal population. The requirement of data normality in parametric tests is grounded in the CLT. However, some researchers mistakenly believe that non-normal data are undesirable for parametric tests, because the data do not resemble a normal population to which observed data are compared (e.g. Siala, 1999). Burrill (1999) pointed out two problems about the preceding notion: (a) Not every statistical test requires normally-distributed variables, and (b) No statistical tests require the scores to be compared to a normal population. Although this discussion concentrates on the second problem, one can see how one misconception could lead to another, and eventually the entire conceptual model could fall apart.
Questionable statements concerning the CLT and normal distribution can be found in statistics texts. For example, a statistical guide for medical researchers stated, "sample values should be compatible with the population (which they represent) having a normal distribution." (Altman & Bland, 1995, p.298).
In fact, the CLT does not assume the normality of the population distribution. The theorem states that the sampling distribution of the mean becomes closer to normality as the sample size increases, regardless of the shape of the population distribution (provided the population variance is finite). Because the shape of the population distribution is unknown and could be non-normal, in parametric tests data normality resembles the sampling distribution, not the population. In other words, a test statistic from the sample will be compared against the sampling distribution rather than against the population.
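A small simulation can make this concrete. The sketch below assumes an exponential population (strongly right-skewed, chosen only as a convenient non-normal example) and tracks the skewness of the simulated sampling distribution of the mean at three arbitrary sample sizes; the seed, the number of replications, and the function names are all illustrative choices.

```python
import random
import statistics

def skewness(values):
    """Population skewness (third standardized moment) of a list of numbers."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def simulate_sample_means(sample_size, replications, rng):
    """Draw many samples from an exponential population and return their means."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(replications)
    ]

rng = random.Random(2024)  # fixed seed so the sketch is repeatable
skew_by_n = {}
mean_by_n = {}
for n in (1, 5, 30):
    means = simulate_sample_means(n, 5000, rng)
    skew_by_n[n] = skewness(means)
    mean_by_n[n] = statistics.fmean(means)
    print(f"n={n:>2}  mean of means={mean_by_n[n]:.3f}  skewness={skew_by_n[n]:.3f}")
```

The population itself never becomes normal; only the distribution of sample means moves toward symmetry as n grows, which is the distribution a test statistic is actually compared against.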
Normality is a myth. The belief that most populations are normal is hardly an empirical fact. As early as 1900, Pearson was critical of normal curves, "We can only conclude from the investigations here considered that the normal curve possesses no special fitness for describing errors or deviations such as arise either in observing practice or in nature." French physicist Lippmann pointed out the circular logic of proving normality: "Everybody believes in the normal approximation, the experimenters because they think it is a mathematical theorem, the mathematicians because they think it is an experimental fact." (cited in Thompson, 1959, p.121). In a similar vein to Lippmann, Stigler (1986) criticized the "circular" logic employed by Gauss, who developed the Gaussian (normal) distribution. Gauss conceptualized the mean in terms of "least squares": the mean could be used to summarize a data set, because when more observations are closer to the mean and fewer observations are farther from the mean, the sum of squares of the deviation is minimal. The mean is only "most probable" if the errors (deviations) are normally distributed; and the supposition that errors are normally distributed leads back to least squares. In response to the lack of proof of universal normal distributions, Geary (1947) stated that normality could be viewed as a special case of many distributions rather than a universal property. However, since the school of R. A. Fisher became dominant, universal normality has been favored and interest in non-normality has retreated to the background. In conclusion, Geary suggested that future editions of all existing textbooks and new textbooks should include this warning: "Normality is a myth; there never was, and never will be, a normal distribution." (p.241). However, this warning has been ignored. None of the reviewed texts carry this warning.
Mathematical efficiency. The belief that observed data are compared to a normal distribution is a serious misunderstanding of the role of normal distribution in hypothesis testing. Normal distributions are used because a statistical test procedure should be "efficient" and "optimal," in the sense of a high probability of detecting the falseness of a hypothesis when it is indeed false (Kariya & Sinha, 1989). This probability is known as statistical power, which will be discussed in the next section. An optimal test can maximize its power when normal distribution is assumed as the underlying distribution. In addition, when normality is satisfied, only the first- and second- order moments (mean and variance) are needed to fully describe the distribution of the variables. The third- and fourth-order moments (skewness and kurtosis) are not necessary (West, Finch, & Curran, 1995). Thus, the requirement of normality is not due to an empirical fact, rather it is driven by mathematical efficiency or optimality.
Summary. A lack of knowledge of sampling distributions and the CLT will result in three problems:
- Sampling distributions serve as a foundation for making the leap from sample to population. Without this knowledge, inferences are believed to be made to the sample or there is no justification for the leap from the empirical world to the hypothetical world.
- Statistical testing is believed to be positivist and empirical. Actually, the foundation of statistical testing, which is sampling distribution, is theoretical and cannot be verified empirically.
- The sample normality requirement is not driven by empirical facts, but mathematical efficiency. However, normally distributed data are expected to show that the shape of the sample distribution can match that of the population. But indeed the population distribution is unknown and only the sampling distribution is normal.
For more information on misconceptions of the CLT, please consult Yu, Anthony, and Behrens (1995).
Survey results. The survey results are not surprising. Forty-one percent of respondents failed to identify the population as the target of inferences. Fifty-six percent mistakenly believed that the hypothetical population must be normal.
Fusion of null and alternate hypotheses. Sampling distributions provide the basis for power analysis. Power analysis, which is based upon the null sampling distributions and the alternate sampling distributions, is applied to determine the proper sample size for a research project, and thereby determines the efficiency of the test. Sampling methods are used to draw subjects from an accessible population. Hence, a finite sample is obtained and empirical data are computed.
However, in Fisherian statistical testing, the null hypothesis is zero effect. The only conclusion after achieving statistical significance is that "the effect is not nil." Following this strict Fisherian tradition, researchers would find no room for power analysis since statistical power depends on the unknown alternate distribution (Lehmann, 1993). Figure 3 indicates that the relationship between power and beta is defined in the alternate distribution though the alpha level is set at the null distribution. To rectify this shortcoming, an effect size, which is the standardized distance between the null and the alternate, must be pre-determined. By sketching a distance from the null, the position of the hypothetical alternate is "pinned down."
Figure 3. Power and beta are associated with the alternate distribution
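The dependence of power on a hypothesized alternate distribution can be sketched in a few lines. The example assumes the simplest possible setting, a two-sided one-sample z-test with known variance, so that the alternate sampling distribution of the test statistic is a normal curve shifted by the effect size times the square root of n. The effect size of 0.5 and the target power of .80 are hypothetical inputs, and the function names are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_two_sided_z(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z-test for a standardized effect size."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # cut-off set on the null distribution
    shift = effect_size * math.sqrt(n)          # center of the alternate distribution
    # Beta is the area of the alternate distribution that falls between the cut-offs
    beta = STD_NORMAL.cdf(z_crit - shift) - STD_NORMAL.cdf(-z_crit - shift)
    return 1 - beta

def required_n(effect_size, target_power=0.80, alpha=0.05):
    """Smallest n whose power reaches the target, found by simple search."""
    n = 2
    while power_two_sided_z(effect_size, n, alpha) < target_power:
        n += 1
    return n

n_needed = required_n(0.5)
print(f"n needed for d = 0.5, power = .80: {n_needed}")
print(f"power at that n: {power_two_sided_z(0.5, n_needed):.3f}")
```

Nothing in the calculation uses observed data: alpha is fixed on the null curve, and power and beta exist only once an effect size has pinned down where the alternate curve sits.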
Summary. Failure to recognize that power analysis is based on the alternate sampling distribution introduces two problems:
- Power analysis is perceived as an empirical-based procedure on the population and the sample.
- It is disconcerting that one looks for a clear-cut answer (reject/not reject the null hypothesis) while conducting a power analysis based on the alternate hypothesis, which is unknown in nature and is only hypothesized by an estimated effect size. A discussion on the logical problem of rejecting/not rejecting the null hypothesis can be found in Yu (1999).
For more detail on misconceptions of power analysis, please consult Yu and Behrens (1995).
Survey results. The survey results indicate that only thirty-two percent of participants correctly associated power analysis with sampling distributions.
Randomness as independence. As previously mentioned, it is impossible to obtain a true random sample from an infinite and unknown population and then empirically verify whether the sample could represent the population. Thus, random sampling emphasizes the properties of the sample derived from the sampling process. For example, one draws a series of values of independent and identically distributed random variables to form a random sample. The keyword of the preceding statement is "independence."
Many authors define random sampling as a sampling process in which each element within a set has equal chances to be drawn (e.g. Loether & McTavish, 1988; Myers, 1990; Moore & McCabe, 1993; Aczel, 1995). Equality is associated with fairness. This definition contributes to the myth that if the occurrence of a particular event is very frequent, the outcome is considered "unfair" and thus the sample may not be random. This belief also implies that a random sample should reflect the population when every type of member in the population is "fairly" represented.
In reality, complete fairness does not exist. If a psycho-killer fires randomly in a public area, children who have smaller bodies do not have equal chances to be shot as do taller adults. By the same token, one should not expect that in an urn of balls, small balls have equal probabilities to be sampled as large balls. Even if we put the same size balls in the urn, we cannot "equalize" all other factors that are relevant to the outcome. Jaynes (1995) fully explained this problem: The probability of drawing any particular ball now depends on details such as the exact size and shape of the urn, the size of balls, the exact way in which the first one was tossed back in, the elastic properties of balls and urn, the coefficients of friction between balls and between ball and urn, the exact way you reach in to draw the second ball, etc. (Randomization) is deliberately throwing away relevant information when it becomes too complicated for us to handle...For some, declaring a problem to be 'randomized' is an incantation with the same purpose and effect as those uttered by an exorcist to drive out evil spirits...The danger here is particularly great because mathematicians generally regard these limit theorems as the most important and sophisticated fruits of probability theory. (pp. 319-320)
Phenomena appear to occur according to equal chances, but indeed in those incidents there are many hidden biases and thus observers assume that chance alone would decide. Since authentic equality of opportunities and fairness of outcomes are not properties of randomness, a proper definition of random sampling should be a sampling process in which each member within a set has independent chances to be drawn. In other words, the probability of one being sampled is not related to that of others. Hassad (1999) made a very precise statement about the role of probability in sampling, "The probability in sampling takes care of selection bias only. It does not address representativeness."
At the early stage of the development of the concept "randomness," the essence of randomness was believed to be tied to independence rather than fair representation. It is important to note that when R. A. Fisher and his coworkers introduced randomization into experiments, their motive was not trying to obtain a representative sample. Instead they contended that the value of an experiment depended upon the valid estimation of error (Cowles, 1989). In other words, the errors must be independent rather than systematic.
The debate pertaining to probability can illuminate the issue of random sampling although they seem to be two separate issues. Positivists Reichenbach (1938, 1945) and von Mises (1964) proposed a frequentist theory of probability, which serves as a foundation of probabilistic inferences in hypothesis testing. Williams (1945) opposed the frequentist theory by insisting upon the classical Laplacean theory of probability. The classical theory is pure probability, which emphasizes mathematical configurations, while the frequentist theory, as applied probability, stresses the empirical aspect and views probability as a hypothesis of results of experiments (Goodstein, 1940). According to the principle of indifference in the classical probability theory, the occurrence of every member of the reference class is equi-probable. For example, if a bag contains 10 balls, in theory every ball has a 1/10 chance to be drawn. On the other hand, frequentists found that the empirical result does not come up exactly 1/10 even though the number of trials may be very large.
In the case of sampling without replacement and sampling with replacement, one can find that in practice neither sampling procedure guarantees that each member of the set is equi-probable. In the case of sampling without replacement, when there are 10 balls in a bag,
- The probability of the first ball being drawn is 1/10
- The probability of the second ball being drawn is 1/9
- The probability of the third ball being drawn is 1/8...etc.
In the case of sampling with replacement,
- The probability of the first ball being drawn is 1/10
- The probability of the same ball being drawn twice is 1/10 * 1/10 = 1/100
- The probability of the same ball being drawn three times is 1/10 * 1/10 * 1/10 = 1/1000...etc.
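The arithmetic in the two lists above can be reproduced exactly with rational numbers. In this sketch the without-replacement figures are read as the chance of picking one particular ball from those still in the bag at each draw, which is an interpretive assumption; the variable names are illustrative.

```python
from fractions import Fraction

BALLS = 10

# Without replacement: one fewer ball remains in the bag at each successive draw
without_replacement = [Fraction(1, BALLS - k) for k in range(3)]

# With replacement: chance that one particular ball comes up on every one of k draws
same_ball_repeated = [Fraction(1, BALLS) ** k for k in range(1, 4)]

print("without replacement:", [str(p) for p in without_replacement])
print("same ball repeated :", [str(p) for p in same_ball_repeated])
```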
Summary. The misconception of random sampling as achieving "fairness" and "representation" is tied to the belief that the target population is finite and known, and that one can therefore tell how representative a random sample is. Misunderstanding random sampling results in a false sense of security: the belief that the sample can represent the population and thus the inference is valid.
Survey results. Although the population to which the inference is made is hypothetical and unknown, the majority of the participants (32%) believed that a random sample could be more representative of the population, depending on the ratio between the sample size and the population size.
Inferences from empirical to theoretical world
As mentioned before, under the framework of parametric tests, the inference should be made to the population from the sample. In statistical testing, a test statistic is extracted out of a finite sample and compared against an infinite sampling distribution. The probability (p-value) indicates how likely a result at least as extreme as the observed one would be in the long run if the null hypothesis were true. In other words, the interpretation of statistical testing should be a probabilistic inference rather than the pursuit of one true answer.
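The comparison of a sample statistic against a theoretical sampling distribution can be sketched as follows. This is a simplified illustration that assumes the standard normal curve as the sampling distribution of the test statistic (a t distribution would be more usual for a sample this small); the scores and the null value of 50 are made up for the example.

```python
import math
from statistics import NormalDist, fmean, stdev

def two_sided_p_value(sample, null_mean):
    """p-value from comparing a sample mean against a normal sampling distribution."""
    standard_error = stdev(sample) / math.sqrt(len(sample))
    z = (fmean(sample) - null_mean) / standard_error  # test statistic from the finite sample
    # Area of the theoretical sampling distribution at least as extreme as z
    return 2 * (1 - NormalDist().cdf(abs(z)))

scores = [52, 48, 55, 51, 49, 53, 50, 54, 47, 56]  # hypothetical observations
p_near = two_sided_p_value(scores, 50)
p_far = two_sided_p_value(scores, 45)
print(f"p-value against a null mean of 50: {p_near:.3f}")
print(f"p-value against a null mean of 45: {p_far:.6f}")
```

The output is a statement about long-run frequencies under a hypothesized distribution, not a verdict on what the population mean really is.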
Niels Bohr's "Copenhagen interpretation" applies well to statistical inference even though Bohr was in a different discipline. Bohr asserted that one can answer questions of the form: "If the experiment is performed, what are the possible results and their probabilities?" According to Bohr, one should not answer any question in this form: "What is really happening when ...?" (cited in Jaynes, 1995, p.1012).
"Copenhagen interpretation" is derived from quantum mechanics. According to the uncertainty principle in quantum mechanics, one cannot know the velocity and the position of a subatomic particle at the same time; the knowledge of either one must be obtained through measurement that requires human interferences. In this view, one can only give a conditional prediction about what the result would be if a given measurement were to be performed (Maxwell, 1993).
Lindsey (1996) went even further to assert that "probability is appropriate for constructing models but much less so for making scientific inferences about them." (p. vi) A model is an ideal case and does not necessarily fit the data in the real world. Given the assumptions of the model, one can tell what it is likely to happen but cannot make a firm inference of what has happened.
Some writers created the unnecessary polarity of perspective seeking versus truth seeking (Langenbach et al., 1994; Erlandson et al., 1993). Langenbach et al. even wrote that quantitative researchers who accept "truth seeking ontology" contend that ultimately there exists one best answer. On the contrary, this is not the nature of probabilistic inference.
The impression that a statistical inference leads to one true answer is due to the subsequent action after the rejection or retention of the null hypothesis. When an experiment indicates that there is a significant difference between the mean scores of the control group and that of the treatment group, the policy maker adopts the treatment although there is no logical connection between the action and the inference. One could read a qualitative research report and take an action based upon the report. As Schield (1997) said, "Probability itself does not lead to action. Rather probability justifies confidence and confidence justifies action" (p.3). In a similar vein, Krantz (1996) asserted, "Probabilities (sometimes) mediate evidence judgments, but they are not an end in themselves."
As a matter of fact, it is impossible that every study on the same topic can produce the same result. If there is only one true answer, which one is true? On the other hand, the probabilistic nature of inference is compatible with the philosophy of science that research results are tentative and thereby inquiry is a self-correcting process in the long run. Under this premise, inconsistent results from different research studies do not create any logical dilemma or cognitive dissonance.
The failure to conceptualize a statistical inference as a probabilistic inference is tied to other misconceptions about sample, sampling distributions, power, and population. In addition, this failure not only gives researchers a false sense of certainty, but also leaves no room for harmonizing inconsistent research results.
According to the survey, misconceptions specific to inferences seem to be less serious. Only twenty-nine percent of participants misunderstood the meaning of the p-value and believed that quantitative research is, to some degree, truth-seeking in the sense of giving a definite answer.
The concept of the relationship among sample, population, and sampling distribution is the foundation of all subsequent statistical concepts and procedures. Misconceptions in different components of the framework are inter-related. Without a coherent theoretical framework, one may be able to perform procedures such as t-tests, ANOVA, and regression correctly, but fail to interpret the result and conceptualize the nature of the inference properly.
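The relationship among the three components can be made concrete with a short simulation. This is a minimal sketch under stated assumptions, not a procedure taken from the article: the hypothetical population is an exponential distribution with mean 1 and standard deviation 1 (deliberately non-normal), each sample has 50 observations, and the sampling distribution is approximated by 5,000 repeated samples.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)

POP_MEAN = 1.0  # assumed population: exponential with rate 1 (mean 1, SD 1)
POP_SD = 1.0
N = 50          # size of each sample (illustrative choice)
REPS = 5000     # number of hypothetical repeated samples (illustrative choice)

def draw_sample(n):
    """Draw one sample from the theoretical, infinite population."""
    return [rng.expovariate(1.0) for _ in range(n)]

# The empirical world: the single sample a researcher actually observes.
one_sample = draw_sample(N)

# The theoretical world: what the statistic would do over repeated sampling.
sample_means = [mean(draw_sample(N)) for _ in range(REPS)]

print(f"mean of the one observed sample: {mean(one_sample):.3f}")
print(f"mean of the sample means: {mean(sample_means):.3f} (population mean {POP_MEAN})")
print(f"SD of the sample means: {stdev(sample_means):.3f} (theory {POP_SD / N ** 0.5:.3f})")
```

The single observed sample belongs to the empirical world, whereas the population and the sampling distribution are theoretical constructs. The spread of the simulated sample means approximates the standard error that a parametric test relies on when it links the one observed sample back to the population.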
The following detrimental beliefs are some examples resulting from such an incoherent framework: "The population is finite and inferences are generalized to this population." "An inference is applied to the sample." "The data distribution is not normal and thus the data cannot represent the population." "Given the power as .85, it gives a definite answer: Reject the null hypothesis and thus the treatment is ineffective." "The result reveals one true answer of the population parameter." All these misconceptions boil down to a common thread: the failure to identify the difference between the theoretical world and the empirical world, and how the researcher can move back and forth between the two.
Teaching concepts in a piecemeal manner tends to increase the risk of forming an incoherent framework. Teaching statistical procedures without introducing a unified framework is even worse. It is recommended that statistics students learn a comprehensive and coherent parametric test framework, with each component thoroughly explained to ensure a smooth logical flow from one to the next.
References
Abell, M. L., Braselton, J. P., & Rafter, J. A. (1999). Statistics with Mathematica. San Diego, CA: Academic Press.
Aczel, A. D. (1995). Statistics: Concepts and applications. Chicago: Richard D. Irwin, Inc.
Altman, D. G., & Bland, J. M. (1995). The normal distribution. British Medical Journal, 310, 298.
Berk, K. & Carey, P. (1998). Data analysis with Microsoft Excel. Pacific Grove: Duxbury Press.
Burrill, D. (1999, April 8). Re: Normalization. Educational Statistics Discussion List (EDSTAT-L). [Online]. Available E-mail: email@example.com [1999, April 8].
Burrill, D. F. (1999, November 21). Re: Help out an English Major, please!. Educational Statistics Discussion List (EDSTAT-L). [Online]. Available E-mail: firstname.lastname@example.org [1999, November 21].
Cleveland, W. S. (1993). Visualizing data. Murray Hill, NJ: AT&T Bell Laboratories.
Cohen, M. P. (1999, March 30). Re: Population vs. sample: Implications for salary equity study. Educational Statistics Discussion List (EDSTAT-L). [Online]. Available E-mail: email@example.com [1999, March 30].
Cowles, M. (1989). Statistics in psychology: An historical perspective. Hillsdale, New Jersey: LEA.
Dizinno, G. (1999, March 29). Population vs. sample: Implications for salary equity study. Educational Statistics Discussion List (EDSTAT-L). [Online]. Available E-mail: firstname.lastname@example.org [1999, March 29].
Edgington, E. S. (1995). Randomization tests. New York: Marcel Dekker.
Erlandson, D. A., Harris, E. L., Skipper, B. L., & Allen, S. D. (1993). Doing naturalistic inquiry: A guide to methods. Newbury Park, CA: Sage Publications.
Fisher, R. A. (1956). Statistical methods and scientific inference. Edinburgh: Oliver and Boyd.
Frick, R. W. (1998). Interpreting statistical testing: Process and propensity, not population and random sampling. Behavior Research Methods, Instruments, & Computers, 30, 527-535.
Geary, R. C. (1947). Testing for normality. Biometrika, 34, 209-241.
Goodstein, R. L. (1940). On von Mises' theory of probability. Mind (New series), 49, 58-62.
Glass, G. V., & Hopkins, K. D. (1996). Statistical methods in education and psychology (3rd ed.). Boston, MA: Allyn and Bacon.
Good, P. (1994). Permutation tests: A practical guide to resampling methods for testing hypotheses. New York: Springer-Verlag.
Gonzalez, W. J. (1991). Intuitionist mathematics and Wittgenstein. History and Philosophy of Logic, 12, 167-183.
Haignere, L., Lin, Y. J., Eisenberg, B., & McCarthy, J. (1996). Pay checks: A guide to achieving salary equity in higher education. Albany, NY: United University Professors.
Hassad, R. (1999, March 1). Re: Question about Convenience Sampling. Educational Statistics Discussion List (EDSTAT-L). [Online]. Available E-mail: email@example.com [1999, March 1].
Jaynes, E. T. (1995). Probability theory: The logic of science. [On-line] Available URL: http://omega.math.albany.edu:8008/JaynesBook.html
Kariya, T. & Sinha, B. (1989). Robustness of statistical tests. Boston, MA: Academic Press, Inc.
Kerlinger, F. N. (1986). Foundations of behavioral science. New York: Holt, Rinehart, and Winston.
Krantz, D. (1996, February 28). Procedural and Bayesian probabilities. Statistical consulting newsgroup. [Online]. Available Newsgroup: sci.stat.consult [1996, February 28].
Langenbach, M., Vaughn, C., & Aagaard, L. (1994). Introduction to educational research. Boston, MA: Allyn and Bacon.
Lehmann, E. L. (1993). The Fisher, Neyman-Pearson theories of testing hypotheses: One theory or two? Journal of the American Statistical Association, 88, 1242-1249.
Lindsey, J. K. (1996). Parametric statistical inference. Oxford: Clarendon Press.
Loether, H. J., & McTavish, H. J. (1988). Descriptive and inferential statistics: An introduction. Boston, MA: Allyn and Bacon, Inc.
Maxwell, N. (1993). Does Orthodox Quantum Theory undermine, or support, Scientific Realism? Philosophical Quarterly, 43, 139-157.
Moore, D. S. & McCabe, G. P. (1993). Introduction to the practice of statistics. New York: W. H. Freeman and Company.
Myers, K. N. (1990). An exploratory study of the effectiveness of computer graphics and simulations in a computer-student interactive environment in illustrating random sampling and the central limit theorem. Unpublished doctoral dissertation, Florida State University.
Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can reasonably be supposed to have arisen from random sampling. Philosophical Magazine, 5, 157-175.
Penrose, R. (1989). The emperor's new mind: Concerning computers, minds, and the laws of physics. Oxford: Oxford University Press.
Reichenbach, H. (1938). Experience and prediction; an analysis of the foundations and the structure of knowledge. Chicago, Ill., The University of Chicago Press.
Reichenbach, H. (1945). Reply to Donald C. Williams' criticism of the frequency theory of probability. Philosophy and Phenomenological Research, 5, 508-512.
Russell, B. and Whitehead, K. (1938). Principles of mathematics. New York: W. W. Norton & Company, Inc.
Schield, M. (1997). Interpreting statistical confidence. Proceedings of 1997 American Statistical Association Convention. Alexandria, VA: ASA.
Sharp, V. F. (1979). Statistics for the social sciences. Canada: Little, Brown & Company.
Simon and Schuster. (1991). Webster's New World Dictionary. Cleveland, OH: The Author.
Siala, H. (1999, April 8). Re: Normalization. Educational Statistics Discussion List (EDSTAT-L). [Online]. Available E-mail: firstname.lastname@example.org [1999, April 8].
Stigler, S. M. (1986). The history of statistics: The measurement of uncertainty before 1900. Cambridge, MA: The Belknap Press of Harvard University Press.
Suen, H. K. (1992). Significance testing: Necessary but insufficient. Topics in Early Childhood Special Education, 12, 66-81.
Tieszen, R. (1992). Kurt Godel and phenomenology. Philosophy of Science, 59, 176-194.
Thompson, D. W. (1959). On growth and form. Cambridge: Cambridge University Press.
Thurstone, L. L. (1937). Psychology as a quantitative rational science. Science, 85, 227-232.
Von Mises, R. (1964). Mathematical theory of probability and statistics. New York: Academic Press.
Warwick, D. P., & Lininger, C. A. (1975). The sample survey: Theory and practice. New York: McGraw-Hill Book Company.
West, S. G., Finch, J. F., & Curran, P. J. (1995). Structural equation models with nonnormal variables. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 56-75). Thousand Oaks: Sage Publications.
Williams, D. (1945). The challenging situation in the philosophy of probability. Philosophy and Phenomenological Research, 6, 67-86.
Yu, C. H. (1998). Mathematical reality: Do theoretical distributions exist? [On-line] Available URL: http://www.creative-wisdom.com/computer/sas/math_reality.html
Yu, C. H. (1999). Probabilistic inferences or dichotomous answers? [On-line] Available URL: http://www.creative-wisdom.com/teaching/WBI/logic.html
Yu, C. H., Anthony, S., & Behrens, J. T. (1995, April). Identification of misconceptions in learning central limit theorem and evaluation of computer-based instruction as a remedial tool. Paper presented at the Annual Meeting of American Educational Research Association, San Francisco, CA.
Yu, C. H., & Behrens, J. T. (1995). Identification of misconceptions concerning statistical power with dynamic graphics as a remedial tool. Proceedings of 1994 American Statistical Association Convention. Alexandria, VA: ASA.
Appendix A Reviewed Statistics textbooks
Aczel, A. D. (1995). Statistics: Concepts and applications. Chicago, IL: Irwin.
Anderson, T.W., & Finn, J.D. (1996). The new statistical analysis of data. New York: Springer.
Caulcutt, R. (1991). Statistics in research and development (2nd ed.). New York: Chapman & Hall.
Chou, Y. L. (1989). Statistical analysis for business and economics. New York: Elsevier.
Daly, F., Hand, D.J., Jones, M.C., Lunn, A.D., & McConway, K.J. (1995). Elements of statistics. New York, NY: Addison-Wesley.
Darlington, R.B., & Carlson, P.M. (1987). Behavioral statistics: Logic and methods. New York: Macmillan, Inc.
Ferguson, G.A. (1976). Statistical analysis in psychology & education. New York: McGraw-Hill Co.
Finkelstein, M.O., & Levin, B. (1990). Statistics for lawyers. New York: Springer-Verlag.
Fisher, W. (2008). The cash value of reliability. Rasch measurement, 22, 1160-1163.
Fleming, M.C., & Nellis, J.G. (1994). Principles of applied statistics. New York: Routledge.
Freund, J. E. & Williams, F.J. (1977). Elementary business statistics: The modern approach (3rd ed.). Englewood Cliffs, NJ: Prentice-Hall, Inc.
Freund, J. E. (1976). Statistics: A first course. Englewood Cliffs, NJ: Prentice-Hall, Inc.
Glass, G. V. & Hopkins, K. D. (1996). Statistical methods in education and psychology (3rd ed.). Boston, MA: Allyn and Bacon.
Hamburg, M. (1974). Basic statistics: A modern approach. New York: Harcourt Brace Jovanovich, Inc.
Handel, J.D. (1978). Introductory statistics for sociology. Englewood Cliffs, NJ: Prentice-Hall, Inc.
Harshbarger, T.R. (1977). Introductory statistics: A decision map. New York: Macmillan Publishing Co.
Havlicek, L.L., & Crain, R.D. (1988). Practical statistics for the physical sciences. Washington, DC: American Chemical Society.
Hinkle, D. E., Wiersma, W. & Jurs, S. G. (1994). Applied statistics for the behavioral sciences. Geneva, IL: Houghton Mifflin Company.
Hoel, P.G. (1971). Elementary statistics (3rd ed.). New York: John Wiley & Sons, Inc.
Hoel, P.G., & Jessen, R.J. (1977). Basic statistics for business and economics (2nd ed.). Canada: John Wiley & Sons.
Hopkins, K.D., & Glass, G.V. (1978). Basic statistics for the behavioral sciences. Englewood Cliffs, NJ: Prentice Hall.
Huntsberger, D.V., & Billingsley, P. (1981). Elements of statistical inference (5th ed.). Boston, MA: Allyn and Bacon, Inc.
Jaeger, R.M. (1990). Statistics: A spectator sport (2nd ed.). Newbury Park, CA: Sage Publications.
Kelly, W.D., Ratliff, Jr., T.A., & Nenadic, C. (1992). Basic statistics for laboratories: A primer for laboratory workers. New York: Van Nostrand Reinhold.
Korin, B. P. (1975). Statistical concepts for the social sciences. Cambridge, MA: Winthrop Publishers.
L'Esperance, W.L. (1971). Modern statistics for business and economics. New York: Macmillan Company.
Langley, R. (1971). Practical statistics for non-mathematical people. New York: Drake Publishers.
Levin, J. (1977). Elementary statistics in social research (2nd ed.). New York: Harper & Row.
Lindsey, J. K. (1995). Introductory statistics: A modeling approach. Oxford: Clarendon Press.
Madsen, R. W. & Moeschberger, M. L. (1986). Statistical concepts with applications to business and economics (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall.
Malik, H.J., & Mullen, K. (1975). Applied statistics for business and economics. Reading, MA: Addison-Wesley Publishing Co.
Marascuilo, L.A. (1971). Statistical methods for behavioral science research. New York, NY: McGraw-Hill, Inc.
Marascuilo, L.A., & McSweeney, M. (1977). Nonparametric and distribution-free methods for the social sciences. Monterey, CA: Brooks/Cole Publishing Company.
Mason, R. (1978). Statistical techniques in business and economics (4th ed.). Homewood, IL: Richard D. Irwin, Inc.
McClave, J.T., & Dietrich II, F.H. (1983). A first course in statistics. San Francisco, CA: Dellen Publishing Co.
McElroy, E. E. (1979). Business statistics (2nd ed.). San Francisco, CA: Holden-Day.
McGee, V.E. (1971). Principles of statistics: Traditional and Bayesian. New York: Meredith Corporation.
Minium, E.W., & Clarke, R.B. (1982). Elements of statistical reasoning. Canada: John Wiley & Sons, Inc.
Moore, D. S. & McCabe, G. P. (1989). Introduction to the practice of statistics. New York: W. H. Freeman and Company.
Moore, D. (1985). Statistics: Concepts and controversies (2nd ed.). New York: W.H. Freeman and Co.
Mould, R.F. (1989). Introductory medical statistics (2nd ed.). Bristol, Philadelphia: Institute of Physics Pub.
Mueller, J. H., & Schuessler, K. F. (1977). Statistical reasoning in sociology (3rd ed.). Boston, MA: Houghton Mifflin.
Pfaffenberger, R. C. & Patterson, J. (1977). Statistical methods for business and economics. Homewood, IL: Richard D. Irwin, Inc.
Rustagi, J.S. (1984). Introduction to statistical methods. Totowa, NJ: Rowman & Allanheld.
Sharp, V.F. (1979). Statistics for the social sciences. Canada: Little, Brown & Company.
Simpson, I.S. (1975). Basic statistics for librarians. London: Library Association.
Spatz, C. (1992). Basic statistics: Tales of distributions. Belmont, CA: Brooks/Cole Publishing.
Snedecor, G.W., & Cochran, W.G. (1980). Statistical methods (7th ed.). Ames, IA: Iowa State University Press.
Weinberg, G.H., Schumaker, J.A., & Oltman, D. (1981). Statistics: An intuitive approach. Monterey, CA: Brooks/Cole Publishing Co.
Weinberg, S.L., & Goldberg, K.P. (1979). Basic statistics for education and the behavioral sciences. Boston, MA: Houghton Mifflin Co.
Welkowitz, J., Ewen, R. B., & Cohen, J. (1982). Introductory statistics for the behavioral sciences (3rd ed.). San Diego, CA: Harcourt Brace Jovanovich.
Wonnacott, R.J., & Wonnacott, T.H. (1982). Statistics: Discovering its power. New York: John Wiley & Sons, Inc.
Wynne, J.D. (1982). Learning statistics: A common-sense approach. New York: Macmillan Pub. Co., Inc.
Appendix B Survey questions
Q1: What is a parametric test?
Q2: When a researcher has full access to the entire population, there is no need to perform a statistical test.
c. It depends on the size of the population
d. It depends on the distribution shape of the population
Q3: In parametric tests, the size of a hypothetical population is _________.
Q4: In parametric tests, the distribution of a hypothetical population is _________.
Q5: A random sample is more representative of the population.
c. It depends on the sample size
d. It depends on the ratio between the sample size and the population size
Q6: When the researcher makes an inference, the inference is made to _______.
a. the sample
b. the sampling distribution
c. the population
d. b & c
e. All of the above
Q7: Which distribution is power analysis based on?
d. b & c
e. All of the above
Q8: When the analysis returns a p-value, what does it mean?
a. how likely the observed result will surface in the long run given the null hypothesis is true
b. how likely the observed result will surface in the long run given the alternate hypothesis is true
c. the true parameter of the population
d. the true parameter of the sampling distribution
Q9: The nature of quantitative research is truth-seeking--it gives a definite answer.
c. It depends on the statistical power
d. It depends on the confidence interval