Probabilistic inferences or Dichotomous answers?

What do psychologists conclude?
A discussion of the underlying philosophy of probabilistic inferences
and Dichotomous answers

Chong-ho Yu, Ph.Ds.

Hypothesis testing has been a controversial topic in psychological research (Mulaik & Steiger, 1997). One of the arguments against hypothesis testing is that it over-simplifies the complexity of reality by portraying solutions in a dichotomous manner. Schmidt and Hunter (1997) doubted the validity of making a dichotomous decision derived from a single study, and thus promoted the use of confidence intervals and meta-analysis¹. Although Wilkinson and the American Psychological Association's Task Force (1999) did not support banning hypothesis testing, they were also critical of dichotomous answers: "It is hard to imagine a situation in which a dichotomous accept/reject decision is better than reporting an actual p value." In a similar vein, Wang (1993) asserted, "the reporting of a p value is superior to the accept-reject method in the sense that it gives a useful measure of the credibility of Ho" (p.20). On the other hand, Huck and Cormier (1996) suggested that the terms "marginal significance," "borderline significance," "approach significance," and "trend toward significance" are appropriate. However, they realized that this approach to p value interpretation is subjective.
Contrary to popular belief, probabilistic inferences rarely occurred in the history of science (Glymour, 1980). Most well-known scientists, including Copernicus, Newton, Kepler, Maxwell, Dalton, Einstein, and Schrodinger, did not give probabilistic arguments. Glymour pointed out that although statistical procedures such as regression, correlation, and ANOVA are commonly applied in modern physical sciences, probability is a distinctly minor note in the history of scientific argument. Given this historical background, one may understand why statistics is pervasive in research across disciplines and yet many scholars interpret the results in a dichotomous rather than a probabilistic manner.

The notion of a dichotomous answer is also an area of contention between qualitative and quantitative researchers. While comparing the nature of quantitative and qualitative research, some writers created a polarity of perspective seeking versus truth seeking (Langenbach, Vaughn, & Aagaard, 1994). Langenbach et al. wrote that quantitative researchers who accept "truth seeking ontology" contend that ultimately there exists one best answer. To a certain extent, the notion of "seeking one best answer" results from the misconception that hypothesis testing leads to a dichotomous answer.

Let's look at the following scenario:

Dr. Who ran several t-tests and the p-values were ".06," ".051," ".05," and ".00001." He reported them as "near significant," "somewhat significant," "significant," and "very significant." Dr. No criticized, "What you wrote is nonsense. In hypothesis testing, one can either reject the null hypothesis or fail to reject it. The outcome is dichotomous rather than a continuum." Dr. Who argued, "What you just said is nonsense! The inference resulting from hypothesis testing is probabilistic. The p value indicates how likely the observed data will occur in the long run, and thus it is an uncertain answer." Who is right?²

Both of the above interpretations can be found in psychological research. These separate interpretations represent two approaches of logical reasoning (deduction vs. induction), two distinct philosophies of science (determinism vs. probabilistic worldview), and two discrete schools of thought in quantitative research (Fisher vs. Neyman/Pearson). This article will discuss each of these perspectives and illustrate why the view of probabilistic inference is more appropriate.

Deduction and Induction

The debate pertaining to dichotomous answers is strongly tied to the development of logical reasoning. Following deductive logic, an either/or outcome is inevitable. On the other hand, inductive logic favors a probabilistic inference.

Deduction

Deduction presupposes the existence of truth and falsity. Quine (1982) stated that the mission of logic is the pursuit of truth, which is the endeavor to distinguish true statements from false statements. Hoffmann (1997) further explicated this point by saying that the task of deductive logic is to define the validity of one truth as it leads to another truth. In deduction, the source of knowledge is ideas, and the reasoning process is based upon premises, which are synthesized from ideas. The following is a typical example:

The first premise is true

The second premise is true

Therefore, the conclusion is true.

In deductive reasoning, propositions have two attributes. First, according to Carnap (1962), deductive logic has the characteristic of certainty. For example, the premises and the conclusion should be like:

All humans eventually die.

Alex Yu is a man.

He will die.

It is important to note that in deduction, premises carry no quantifiers other than "all" and "none"; quantifiers such as "some" or "many" are excluded. If the premises carry quantifiers like "some" or "most," the logical inference is inductive rather than deductive. For example,

Most Europeans are Whites.

Josephine is from Europe.

It is likely that she is White.

The phrase "it is likely" weakens the certainty of the conclusion and thus the logic is no longer deductive. Since the statement "most Europeans are Whites" results from an empirical observation, the preceding logical reasoning is inductive, even though the syllogism appears to be deductive.

Furthermore, in deduction, both premises and conclusions must be either true or false. There are no half-true or half-false statements. Neither "many are true but some are false" nor "some are true but many are false" is acceptable. Using this approach, the conclusion must follow Boolean logic as shown in the truth table (see Table 1). Hence, researchers using deduction must make a binary (T/F) conclusion.

Table 1

Condition I    Condition II    Conclusion
T              T               T
T              F               F
F              T               F
F              F               F
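As a small illustration of the binary character of Table 1, the rows can be regenerated mechanically. The following is only a sketch; the function name `conjunction_table` is hypothetical and is not part of the original discussion.

```python
from itertools import product

def conjunction_table():
    """Return rows (condition_1, condition_2, conclusion) of Table 1.

    The conclusion is true only when both conditions are true,
    so every row ends in a strict True/False value.
    """
    return [(a, b, a and b) for a, b in product([True, False], repeat=2)]

for cond_1, cond_2, conclusion in conjunction_table():
    # Print each row using the T/F notation of Table 1.
    print(*("T" if value else "F" for value in (cond_1, cond_2, conclusion)))
```

No row admits an intermediate value, which is the sense in which a purely deductive conclusion is dichotomous.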

Based upon the necessity of logical flow, certainty is attributed to deduction. Some researchers went even further to expand the validity of deduction from certainty to infallibility. The "infallible" character of deduction can be traced back to the Cartesian tradition. Hersh (1997) concisely described the relationship between the Cartesian tradition and the use of mathematical modeling in research:

In every scientific problem, said Descartes, find an algebraic equation relating an unknown variable to a known one. Then solve the algebraic equation! With the development of calculus, Descartes' doctrine was essentially justified. Today we don't say "find an algebraic equation." We say "construct a mathematical model." This is only a technical generalization of Descartes' idea. Our scientific technology is an inheritance from Descartes (p.112).

Modeling is based upon deduction because "Descartes was embracing the Euclidean ideal: Start from self-evident axioms, proceed by infallible deduction" (p.112). Since statistical modeling and hypothesis testing in psychological research can be viewed as a specific form of mathematical modeling, it is not surprising that results yielded from statistical analysis are often viewed as a definite answer. However, there is a difference between mathematical deduction in the Cartesian tradition and logical deduction in statistical tests. In the former, the premises are considered self-evident and thus the deductive process is "infallible." In psychological research, premises are hardly self-evident; rather, they are unproven assumptions that provide a starting point for researchers to conceptualize the problem.

Induction

Induction is an inquiry approach introduced by Bacon (1620/1960). For induction, the source of knowledge is empirical observation. It is important to note that Bacon did not suggest one must observe every member in a set in order to reach a conclusion. Instead, for Bacon a conclusion is still legitimate even if the observation is performed within a finite sample. Because one cannot exhaust every case in the world, the conclusion is inevitably tentative (Hume, 1777/1912), and thus, probabilistic. Consider the following inductive process:

A₁, A₂, A₃, A₄...A₁₀₀₀ are red.

All A's are red.

However, if A₁₀₀₁ is blue, the previous conclusion is overthrown. In statistics the inductive approach is applied in the following manner:

100 A's have been observed. 30 of them are red.

The probability of obtaining a red A is 30/100.

If the 101st A is not red, then the probability of getting a red A becomes 30/101. If it is red, then the probability becomes 31/101. Thus, an empirical probability, which is based upon relative frequency in the long run, can hardly lead to a definitive answer.
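The arithmetic of this updating can be sketched in a few lines. The helper name `relative_frequency` is hypothetical; the counts are the ones used in the example above.

```python
from fractions import Fraction

def relative_frequency(red, total):
    """Empirical probability of red as an exact fraction."""
    return Fraction(red, total)

before = relative_frequency(30, 100)      # 100 A's observed, 30 red
if_not_red = relative_frequency(30, 101)  # the next A is not red
if_red = relative_frequency(31, 101)      # the next A is red

print(before, if_not_red, if_red)
```

Whatever the next observation turns out to be, the estimate moves, so the figure obtained from any finite run of observations remains provisional.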

In addition, when the population is unknown, an empirical-based inference can never be certain (Popper, 1962). In the previous example, the population is known and thus the probability can simply be expressed in terms of the ratio between the sample and the population. On the other hand, when one doesn't know how many balls are in the bag, one cannot absolutely affirm that most of the balls in the bag are red--even if seven red balls are drawn out of ten trials. As a matter of fact, in empirical research the size and distribution of most populations are unknown and thus theoretical sampling distributions are used as a bridge between the theoretical and empirical worlds. Nevertheless, the use of sampling distributions is a deductive rather than an inductive process. This point will be discussed in the following section.
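To see why seven red balls in ten draws cannot settle the matter, consider a sketch under two labeled assumptions that are not in the text above: the balls are drawn with replacement, and the bag is in fact exactly half red. The function name `prob_at_least` is hypothetical.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), i.e. draws with replacement."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Assumption for illustration only: exactly half of the balls are red.
p_seven_or_more = prob_at_least(7, 10, 0.5)
print(f"{p_seven_or_more:.3f}")
```

Under these assumptions the chance of seeing seven or more red balls is 176/1024, or roughly 0.17, so such a sample is far from rare even when red balls are not the majority.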

Deduction and induction in quantitative methods

Quantitative research is unfortunately misidentified as a "hypothetical-deductive method" (Glesne & Peshkin, 1992), when in fact hypothesis formulation is only one part of hypothesis testing. To be specific, both deduction and induction play significant roles in hypothesis testing, as indicated in the following:

Mathematical probability: Deductive

Hypothesis formulation: Abductive/Deductive

Data analysis and inferences: Inductive

Mathematical probability. According to Fisher (1935), statistical methods incorporate both deduction and induction. In statistics, inferences are made from the sample to the population. This process is congruent with the inductive process, in which inferences are made from the particular to the general. Since statistical inferences are inductive in nature, Fisher called them "uncertain inferences." Nevertheless, this uncertainty does not imply that the inferences are not rigorous. Actually, uncertain inferences can be rigorously expressed in terms of classical mathematical probability, which is deductive in character. Mathematical probability is said to be deductive because it is based upon logical reasoning instead of empirical induction. Statements of mathematical probability are about the behavior of individuals, or samples, or sequences of samples, drawn from populations that are fully known. As mentioned before, this assumption is unrealistic because we can never fully know the population. Indeed, Fisher (1956) frankly admitted that the theoretical distributions against which the test statistic is compared have no corresponding mapping to objective reality. In a similar vein, Lindsey (1996) asserted that "probability is appropriate for constructing models but much less so for making scientific inferences about them" (p. vi).

Hypothesis formulation. In quantitative research, hypothesis formulation can be viewed as a process of abduction and deduction (Yu, Behrens, & Ohlund, 2000). Based on theory, previous research (ideas), and common sense, the researcher proposes an untested hypothesis (e.g., Web-based instruction is more effective than classroom instruction for teaching biology). In the deductive mode, the goal of the researcher is to test whether the hypothesis is true or false with reference to mathematical probability. If quantitative research is strictly equated with hypothesis testing, the researcher must give a dichotomous answer because both mathematical probability and hypothesis formulation are deductive in character.

Data analysis and inferences. On the other hand, data collection, data analysis, and interpretation can be viewed as a process of induction. However, the use of deduction and the use of induction are often confused at the stage of interpretation. In the deductive mode, the interpretation aims at logical flow and consistency. In the inductive mode, the objective is to find the probability that an event occurs. It is a common malpractice for quantitative researchers to approach data analysis in a deductive manner, attempting to seek the single best answer.

To rectify this situation, some proposed that all models are wrong to some degree but some are useful (Bernardo & Smith, 1994; MacCallum, 1995). In this view, induction dominates the reasoning process. A statistical model is a mathematical equation that represents the relationships among variables in the real world. However, in practice no observed data can perfectly fit a model derived from human ideas. Some models are better fitting and thus are useful to describe the world, while others are not. Given the assumptions of the model, one can tell what is likely to happen but cannot make a firm inference about what has happened. This type of judgment, which is based upon a continuum, is inductive reasoning (Schield, 1999).

Deterministic and probabilistic philosophies

Laplace's Demon

The difference between a dichotomous answer and a probabilistic inference can also be viewed as the chasm between deterministic and probabilistic world-views. In scientific determinism, every outcome is a necessity; i.e., given the input, only one output can occur. This idea originated from the French mathematician Laplace. Based on Newtonian physics, Laplace claimed that everything is determined by physical laws. If a powerful intellect (called Laplace's demon) fully comprehended the Newtonian laws and knew the position and momentum of every particle in the universe, it could predict every event in the history of the universe. Laplace's determinism was applied to the realm of extended, spatial, material substance. Later determinism was expanded to the realm of psychological events. Under determinism, there is only one definite answer. Determinists asserted that scientific laws could only be founded on certainty and on an absolute determinism, not on a probability (Hacking, 1992).

Quantum mechanics and Copenhagen interpretation

Philosophers and scientists who hold the probabilistic view believe that quantum mechanics has disproved determinism. According to some interpretations of quantum mechanics, there are infinitely many possible universes. Physicists found that in the subatomic world, events are not the inevitable and unique solution to single-valued differential equations, but are the random expression of a probability distribution. The present state limits the probability of future outcomes, but does not determine a definite fixed result (Weatherford, 1991).

If the reader finds this difficult to follow, watch "Star Trek: The Next Generation." In the episode entitled "Parallels," Lieutenant Worf was confused by the fact that events around him changed rapidly. He was surprised to find that Counselor Deanna Troi had become his wife! Was it caused by a disturbance in the temporal continuum of his own universe? No, actually he was shifting across many different quantum universes where events took a different path.

Niels Bohr was one of several physicists who championed quantum mechanics. According to Bohr's "Copenhagen interpretation," one could answer questions of the form: "If the experiment is performed, what are the possible results and their probabilities?" However, one should not answer any question of the form: "What is really happening when ...?" (cited in Jaynes, 1995, p.1012). The Copenhagen interpretation is derived from quantum mechanics. According to Heisenberg's (1958) uncertainty principle in quantum mechanics, one cannot know the velocity and the position of a subatomic particle at the same time; the knowledge of either one must be obtained through measurement that requires human interference. However, this interference alters the observed outcome. In this view, one can only give a conditional prediction about what the result would be if a given measurement were to be performed (Maxwell, 1993).
The implication of this concept is widely discussed in psychological research methodology. As a matter of fact, observed results in psychological research are more subject to human interference than observed results in physical science. However, the Copenhagen interpretation is rarely discussed in the context of statistical testing, even though the two bear a high degree of resemblance. The Copenhagen interpretation and quantum mechanics are regarded as instrumentalistic because in this framework scientific models do not necessarily reflect the ultimate reality; rather, they result from instrumentation. Likewise, in statistical testing, both Fisherians and Bayesians believe that statistical modeling is a convenient method of description and prediction, but the model may not be an exact representation of reality. The debate between realism and anti-realism has been controversial since the beginning of the philosophy of science. It is not the intention of this article to settle this issue. Nevertheless, if the underlying principles of quantum mechanics lead to the Copenhagen interpretation, statistical testing, which is founded on a similar philosophy, may follow a similar route. Statisticians should consider the following notion: One can only give a conditional prediction about what the result would be if such and such measurement and testing were to be performed.

Departure from ontology

As mentioned before, the discussion concerning deterministic and probabilistic worldviews may involve the unresolved question of realism versus anti-realism, which tends to reach a dead end. Interestingly enough, some opponents and proponents of determinism attempted to shift the focus away from realism. For example, Good (1988) argued that even if determinism is true, we would not be able to determine whether the world is deterministic or probabilistic. Therefore, it is legitimate to assume indeterminism though it is only a convenient fiction. In his view, an infinite amount of information would be required to make an accurate prediction of a closed system, but there must be some degree of accuracy that is physically impossible to measure. In this sense classical statistical mechanics provides indeterminism out of determinism (Good, 1983).

On the other hand, several philosophers believed that certainty could be achieved by translating metaphysical doctrines into formal logics. This approach is called the linguistic approach. In this school, scholars treat physical laws as linguistic entities. These entities must follow the law of logical flow; hence, a theory, which is composed of linguistic entities, must be deterministic in essence (Earman, 1971). In other words, both sides transformed the ontological problem to an epistemological and methodological one.

Ginzburg (1934) pointed out that scientists are dissatisfied with the absence of ontology because without ontology people would be skeptical of scientific truths and view them as subjective illusions. In practice, however, statisticians who accept the frequentist view of probability have moved away from ontology. The frequentist view of probability, which was introduced by von Mises and Reichenbach, aims at championing the empirical mode of procedure in measuring probability rather than at exploring its ontological status.

In statistics, there are supporters of both world-views. Some statisticians regard a regression model as a deterministic model, because one can exactly predict the Y value given that the X values are correct and the exact parameters of the model are known. For example, in economics the Neo-classical Keynesian school asserted that the future is a shadow of the past; given the correct data, the future is absolutely predictable. On the other hand, some econometricians such as the Post-Keynesians support the notion that econometrics cannot entirely fit the dynamic and complex world; statistical modeling does not necessarily lead to an exact prediction (Davidson, 1988). As a matter of fact, modern econometricians found that the future is not as predictable as Keynesians thought. The same trend could also be found in psychometrics and edumetrics.

Fisher and Neyman/Pearson

The origin of the debate concerning dichotomous answers can also be traced back to the early history of statistics. The current form of hypothesis testing is a fusion of two schools of thought: Fisher and Neyman/Pearson (Lehmann, 1993). When Fisher introduced his methodology, there was only one hypothesis: the null (i.e., there is no difference between the control group and the treatment group). Following this strategy, the only possible options are to reject the null hypothesis or not to reject it. Put simply, the conclusion is an either/or answer. Later Neyman and Pearson introduced the concept of the alternative hypothesis (i.e., there is a difference between the control group and the treatment group). However, the alternative distribution is unknown and thus could be anything (e.g., a very huge difference, a huge difference, a medium difference, a small difference, etc.). With the presence of the alternatives, the conclusion is no longer dichotomous. Further differences between the two schools can be found in the use of a cut-off alpha level. While Fisher advocated .05 as the standard cut-off alpha level, Neyman and Pearson (1933) did not recommend a standard level but suggested that researchers should look for a balance between Type I and Type II errors.
Nevertheless, it is important to note that the Fisherian approach is not "mechanical" as some critics thought. First, in his later career Fisher disapproved of the use of any standard level, though he had once supported it. Second, Fisher was opposed to handing over the judgment about whether or not to accept a hypothesis to an automated test procedure. On the contrary, Fisher viewed the conclusion derived from a particular study as provisional (cited in Mulaik, Raju, & Harshman, 1997, pp.78-79). Further, Fisher (1956) emphasized that the purpose of research is to gain a better understanding of the experimental material, and of the problem it presents. In a similar tone, Pearson (1955) admitted that the terms "acceptance" and "rejection," which carry a connotation of absolute certainty, were unfortunately chosen.

Rao's (1992) assessment of Fisher's work is helpful to clarify several misconceptions of dichotomous decisions in statistical testing:

The decision (reject/not reject the null) is based on the logical disjunction … Such a prescription was, perhaps, necessary at a time when statistical concepts were not fully understood and the exact level of significance attained by a test statistic could not be calculated due to the lack of computational power…Fisher gives a limited role to tests of significance in statistical inference, only useful in situations where alternative hypotheses are not specified…Fisher's emphasis on testing of null hypotheses in his earlier writings has probably misled the statistical practitioners in the interpretation of significance tests in research work (p.46)

The preceding comments help us to understand the limitations of classical hypothesis testing in historical perspective. However, even today many psychological researchers still misperceive hypothesis testing as dichotomous in character. By reviewing the frameworks of Fisher and Neyman/Pearson, Lehmann (1993) gave researchers several practical suggestions:

Should this (the reporting of the conclusions of the analysis) consist merely of a statement of significance or nonsignificance at a given level, or should a p value be reported? The original reason for fixed, standardized levels--unavailability of more detailed tables--no longer applies, and in any case reporting the p value provides more information. On the other hand, definite decisions or conclusions are often required. Additionally, in view of the enormously widespread use of testing at many different levels of sophistication, some statisticians (and journal editors) see an advantage in standardization; fortunately, this is a case where you can have your cake and eat it too. One should routinely report the p value and, where desired, combine this with a statement on significance at any stated level (p.1247).
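Lehmann's suggestion can be sketched as a small reporting routine. The function name `report` is hypothetical, the rule "reject when p is at or below alpha" is an assumed convention rather than something stated in the quotation, and the p values are those from the Dr. Who scenario.

```python
def report(p_value, alpha=0.05):
    """Report the exact p value together with a decision at a stated level.

    Assumed convention: the null hypothesis is rejected when p <= alpha.
    """
    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    return f"p = {p_value:g} ({decision} at alpha = {alpha:g})"

# p values from the Dr. Who scenario earlier in the article.
for p in (0.06, 0.051, 0.05, 0.00001):
    print(report(p))
```

The exact p value keeps the graded information, while the stated level supplies the definite decision for readers who require one.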

Conclusion

A conclusion resulting from hypothesis testing should be viewed as a probabilistic inference rather than a dichotomous answer. The infallible character of deduction in the Cartesian tradition cannot be applied to psychological research due to the absence of self-evident axioms in psychology. Quantitative method is not merely hypothetical-deductive as some critics thought. Instead, quantitative method is composed of mathematical probability, hypothesis formulation, and uncertain inference, in which abductive, deductive, and inductive logics are all employed. The inductive character of the interpretation stage yields a probabilistic inference rather than a dichotomous answer. Further, determinism has been seriously challenged by quantum mechanics. Although Bohr worked in a different discipline, the Copenhagen interpretation can be applied to statistical tests because quantum mechanics and statistical tests share a high degree of resemblance in their probabilistic worldviews. Even though the question of realism is unsettled at the level of ontology, probabilistic inference is still valid at the level of epistemology. Last, the hybrid model of Fisher-Neyman/Pearson relies on infinite theoretical distributions, which is compatible with the inexhaustible character of induction. Hypothesis testing is grounded on theoretical sampling distributions in the long run. In theory, the "long run" is "infinity." But no one can live forever to examine every case. Due to this inconclusive nature, hypothesis testing can be viewed as a probabilistic inference instead of a clear-cut answer. Peirce (1900/1960) contended that inquiry is a self-correcting process across the intellectual community in the long run. Even though the probabilistic interpretation takes away the certainty of the dichotomous interpretation, individual studies are still meaningful when convergence of findings occurs.

Notes:

1. Schmidt and Hunter (1997) doubted the validity of making a dichotomous decision derived from a single study. Since no single study contains sufficient information to support a final conclusion about the truth or value of a hypothesis, two remedies are proposed:

First, point estimates and confidence intervals are better alternatives because they allow researchers to see that findings in a single study are tentative and preliminary. In this approach, the population parameter is first estimated from the sample statistic. Then an "error band" is applied to bracket the point estimate. This bracket, the confidence interval, tells the researcher a possible range where the true mean may be.

Second, to overcome the uncertainty in a single study, combining confidence intervals across multiple studies using meta-analysis can help researchers to reach dependable results.
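A minimal sketch of both remedies follows. It rests on assumptions that are not taken from Schmidt and Hunter: the interval uses a normal approximation, the pooling is a simple fixed-effect (inverse-variance) average rather than their own meta-analytic procedure, and the function names and all data values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_ci(sample, level=0.95):
    """Point estimate, standard error, and normal-approximation interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    estimate = mean(sample)
    std_error = stdev(sample) / sqrt(len(sample))
    return estimate, std_error, (estimate - z * std_error, estimate + z * std_error)

def pooled_estimate(estimates, std_errors, level=0.95):
    """Fixed-effect pooling: weight each study by 1 / (standard error squared)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    weights = [1 / se**2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    return pooled, pooled_se, (pooled - z * pooled_se, pooled + z * pooled_se)

# Hypothetical scores from three small studies (made up for illustration).
studies = [
    [4.1, 5.0, 5.6, 4.8, 5.3, 4.6],
    [5.2, 4.9, 5.8, 5.1, 4.4, 5.5],
    [4.7, 5.4, 5.0, 4.3, 5.9, 5.2],
]
results = [mean_ci(s) for s in studies]
pooled, pooled_se, pooled_interval = pooled_estimate(
    [r[0] for r in results], [r[1] for r in results]
)
print(results)
print(pooled, pooled_interval)
```

Under these assumptions the pooled standard error is smaller than that of any single study, which is the sense in which combining studies can yield a more dependable estimate than any one of them.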

2. This is a real-life example: Liu, Goetze, and Glynn (1992) conducted a study on how knowledge of other languages affects learning of object-oriented programming. They reported p-values associated with different variables as "most significant," "weakly significant," "high enough for significant," and "barely significant" (p.81). They even rank-ordered those variables by significance:

Language     P value
C            .0000
plx          .0031
Cobol        .0095
Assembler    .0368
Basic        .0384
PL/1         .0422

The use of descriptors such as "highly," "weakly," and "barely" is not recommended. The reason is that there are no specified ranges such as "from .04-.05 is barely significant, from .06-.07 is almost significant." Without clear criteria, this kind of inference is subjective.

References

Bacon, F. (1620/1960). The new organon, and related writings. New York: Liberal Arts Press.
Bernardo, J. M., & Smith, A. F. M. (1994). Bayesian theory. Chichester, New York: John Wiley & Sons.

Carnap, R. (1962). Logical foundations of probability. Chicago, IL: The University of Chicago Press.

Davidson, P. (1988). Struggle over the Keynesian heritage. Nashville, TN: Carmichael and Carmichael, Inc.

Earman, J. (1971). Laplacian determinism, or is this any way to run a universe? The Journal of Philosophy, 68, 729-744.

Fisher, R. A. (1935). The logic of inductive inference. Journal of the Royal Statistical Society, 98, 39-82.

Fisher, R. A. (1956). Statistical methods and scientific inference. London: Collins Macmillan.

Ginzburg, B. (1934). Probability and the philosophical foundations of scientific knowledge. The Philosophical Review, 43, 258-278.

Glesne, C., & Peshkin, A. (1992). Becoming qualitative researchers: An introduction. New York: Longman.

Glymour, C. (1980). Theory and evidence. Princeton, NJ: Princeton University Press.

Good, I. J. (1983). Good thinking: The foundation of probability and its applications. Minneapolis, MN: University of Minnesota Press.

Good, I. J. (1988). The interface between statistics and philosophy of science. Statistical Science, 3, 386-397.

Hacking, I. (1992). The taming of chance. Cambridge, UK: Cambridge University Press.

Heisenberg, W. (1958). Physics and philosophy: The revolution in modern science. New York: Harper.

Hersh, R. (1997). What is mathematics, really? New York: Oxford University Press.

Hoffmann, M. (1997). Is there a logic of abduction? Paper presented at the 6th congress of the International Association for Semiotic Studies, Guadalajara, Mexico.

Huck, S. W., & Cormier, W. H. (1996). Reading statistics and research (2nd ed.). HarperCollins.

Hume, D. (1777/1912). An enquiry concerning human understanding, and selections from a treatise of human nature. Chicago: Open Court Pub. Co.

Jaynes, E. T. (1995). Probability theory: The logic of science. [On-line] Available URL: http://omega.math.albany.edu:8008/JaynesBook.html

Langenbach, M., Vaughn, C., & Aagaard, L. (1994). Introduction to educational research. Boston, MA: Allyn and Bacon.

Lehmann, E. L. (1993). The Fisher, Neyman-Pearson theories of testing hypotheses: One theory or two? Journal of the American Statistical Association, 88, 1242-1249.

Lindsey, J. K. (1996). Parametric statistical inference. Oxford: Clarendon Press.

MacCallum, R. C. (1995). Model specification: Procedures, strategies, and related issues. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 16-36). Thousand Oaks: Sage Publications.

Maxwell, N. (1993). Does Orthodox Quantum Theory undermine, or support, Scientific Realism? Philosophical Quarterly, 43, 139-157.

Mulaik, S. A., Raju, N. S., & Harshman, R. A. (1997). There is a time and a place for significance testing. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 65-115). Mahwah, NJ: Lawrence Erlbaum Associates, Publishers.

Neyman, J., & Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London, Series A, 231, 289-337.

Pearson, E. S. (1955). Statistical concepts in their relation to reality. Journal of the Royal Statistical Society, Series B, 17, 204-207.

Peirce, C. S. (1900/1960). Collected papers of Charles Sanders Peirce. Cambridge: Harvard University Press.

Popper, K. (1962). Conjectures and refutations: The growth of scientific knowledge. New York: Basic Books.

Quine, W. V. (1982). Methods of logic. Cambridge, Mass.: Harvard University Press.

Rao, C. R. (1992). R. A. Fisher: The founder of modern statistics. Statistical Science, 7, 34-48.

Schield, M. (1999, June 23). Re: A model can never be wrong. Educational Statistics Discussion List (EDSTAT-L). [Online]. Available E-mail: edstat-l@jse.stat.ncsu.edu [1999, June 23].

Schmidt, F. L., & Hunter, J. (1997). Eight common but false objections to the discontinuation of significance testing in the analysis of research data. In L. L. Harlow, S. A. Mulaik, & J. H. Steiger (Eds.), What if there were no significance tests? (pp. 37-64). Mahwah, NJ: Lawrence Erlbaum Associates, Publishers.

Wang, C. (1993). Sense and nonsense of statistical inference: Controversy, misuse and subtlety. New York: Marcel Dekker, Inc.

Weatherford, R. (1991). The implications of determinism. New York: Routledge.

Wilkinson, L., & Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604. [On-line] Available URL: http://www.apa.org/journals/amp/amp548594.html

Yu, C. H., Behrens, J., & Ohlund, B. (2000). Abduction, deduction, and induction: Their applications in quantitative methods. Manuscript submitted for publication.