of A Judge's Deskbook on the Basic Philosophies and Methods of Science, Data Analysis: An Introduction to Statistics
Statistics is the science and art of gaining information from data: collecting, organizing, and interpreting numerical facts. The field of statistics includes the methods and procedures used to summarize, analyze, and draw inferences from data.
Statistics and Measurement Scales
Measurement is essentially the assignment of numbers to observations according to certain rules. The way in which the numbers are assigned to observations determines the scale of measurement being used. The choice of which statistical test can legitimately be used for data analysis rests largely on which scale of measurement has been employed. Further, the inferences drawn from a study cannot, or at least should not, outrun the data being used.
Learning Objectives for Chapter 9
Upon completion of this chapter, the reader should be able to:
Measures of Central Tendency and the Effects of the Scale of Measurement Used
Interval and Ratio Data: Because interval and ratio scales have equal distances between adjacent scale points, interval and ratio data allow calculation of the mean, median, and mode.
Ordinal Data: Because ordinal data provide no information about the distance between scale points, calculating an ordinal mean is inappropriate and misleading. For ordinal data the median should be calculated instead: ordinal data can be ranked, and the median is the middle score.
Nominal Data: With nominal data, neither the mean nor the median can be used, since each of these measures implies comparisons of greater than and less than. The only measure of central tendency permissible for nominal data is the mode, the most frequently occurring score.
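A minimal Python sketch of this rule, using only the standard `statistics` module. The `central_tendency` helper and the sample ratings and blood types are hypothetical illustrations, not from the original text; the point is that the helper never reports a mean for ordinal data or a median for nominal data.

```python
import statistics

# Hypothetical ordinal data: satisfaction ratings coded 1 (lowest) to 5 (highest).
# The codes preserve order but not distance, so only the median is reported.
ratings = [1, 2, 2, 3, 4, 5, 5]

# Hypothetical nominal data: categories with no order at all, so only the mode applies.
blood_types = ["O", "A", "O", "B", "AB", "O", "A"]

def central_tendency(values, scale):
    """Return only the measures of central tendency permitted for the given scale."""
    if scale in ("interval", "ratio"):
        return {"mean": statistics.mean(values),
                "median": statistics.median(values),
                "mode": statistics.mode(values)}
    if scale == "ordinal":
        return {"median": statistics.median(values)}
    if scale == "nominal":
        return {"mode": statistics.mode(values)}
    raise ValueError(f"unknown scale: {scale}")

print(central_tendency(ratings, "ordinal"))      # {'median': 3}
print(central_tendency(blood_types, "nominal"))  # {'mode': 'O'}
```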
Part I. Descriptive Statistics
Descriptive statistics refers to a set of procedures used to describe and summarize samples of data.
Graphing Data
The process of graphing data usually begins with the creation of a distribution. To extract some meaning from the original data, the researcher first brings order to the data by forming a distribution of scores. A distribution is the arrangement of any set of scores in order of magnitude. Frequency distributions allow the researcher to see general trends more readily than an ordered set of raw data does. A frequency distribution is a listing, in order of magnitude, of each score achieved, together with the number of times that score occurred. Frequency distributions can be presented in both tabular and graphic form (e.g., bar graphs or line graphs).
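The steps above can be sketched in a few lines of Python. This is an illustration under assumed data: `raw_scores` and `frequency_distribution` are hypothetical, and the row of asterisks is only a crude text stand-in for a bar graph.

```python
from collections import Counter

# Hypothetical raw scores, in the order they were collected.
raw_scores = [7, 5, 9, 7, 6, 7, 5, 8, 9, 7]

def frequency_distribution(scores):
    """List each distinct score in order of magnitude with the number of times it occurred."""
    counts = Counter(scores)
    return sorted(counts.items())  # sort by score, lowest to highest

for score, freq in frequency_distribution(raw_scores):
    # A crude text "bar graph": one asterisk per occurrence.
    print(f"{score:>3} | {'*' * freq} ({freq})")
```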
There are two main branches of statistical methods:
Descriptive Statistics: statistics that summarize, describe, and make understandable the numbers generated by a research study
Inferential Statistics: statistics used to draw conclusions and inferences which are based upon, but go beyond, the numbers generated by a research study
Measures of Central Tendency
Measures of central tendency are designed to give information about the average, or typical, score of a large number of scores; that is, which single score best represents an entire set of scores. There are three methods for obtaining a measure of central tendency:
The mean is the arithmetic average of all the scores. It is calculated by adding all the scores together and then dividing by the total number of scores. In some cases the mean can give a very distorted picture of the typical value of a distribution: when there are extreme scores (called outliers), the mean is pulled toward them and no longer represents most of the scores.
The median is the exact midpoint of any distribution. To calculate the median, the scores must first be arranged in order of magnitude (e.g., from lowest to highest); the middle score is the median. When the number of scores is even, the median is the average of the two middle scores. In certain cases, the median is better than the mean as a typical or representative value for a group of scores. This happens when there are a few extreme scores (outliers) that would strongly affect the mean but would not affect the median.
The mode is the most frequently occurring score in the distribution; in a perfectly symmetrical unimodal distribution, the mode is the same as the mean. However, when the mode differs from the mean, the mode is not really a good representative value of the distribution. A distribution having a single mode is called a unimodal distribution. A distribution having two modes is called a bimodal distribution; one with more than two modes is called multimodal.
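A short Python illustration of the outlier effect described above, assuming hypothetical income figures in thousands of dollars. It uses the standard `statistics` module; `statistics.multimode` (Python 3.8+) is included to show how a bimodal distribution reports both of its modes.

```python
import statistics

# Hypothetical annual incomes in thousands of dollars.
incomes = [30, 32, 35, 38, 40]
with_outlier = incomes + [400]  # one extreme score (an outlier)

for label, data in (("without outlier", incomes), ("with outlier", with_outlier)):
    print(label,
          "mean =", round(statistics.mean(data), 2),
          "median =", statistics.median(data))
# without outlier mean = 35 median = 35
# with outlier mean = 95.83 median = 36.5

# multimode returns every most-frequent value, so a bimodal list yields two modes.
print(statistics.multimode([1, 2, 2, 3, 4, 4, 5]))  # [2, 4]
```

The single outlier nearly triples the mean but moves the median only from 35 to 36.5.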
Statistics: A Practical History of Craps and Beer
During the seventeenth century, the birth of statistics finally took place. It happened one night in France. The scene was a gambling table, and the main character was the Chevalier de Mere, a noted gambler of his time. He had been having a disastrous run of losing throws. To find out whether his losses were indeed the product of bad luck or simply of unrealistic expectations, he sought the advice of the great French mathematician and philosopher Blaise Pascal (1623-1662). Pascal worked out the probabilities for the various dice throws, and the Chevalier de Mere discovered that he had been making some very bad bets indeed. Thus, the father of probability theory was Pascal. Another milestone for statistics occurred at the turn of the century in Ireland at the famous Guinness brewery, now known worldwide for the record books of the same name. In 1906, to produce the best beverage possible, the Guinness Company decided to select a sample of people from Dublin to do a little beer tasting. Since there turned out to be no shortage of individuals willing to participate in this taste test, the question of just how large a sample would be required became financially crucial to the brewery. They turned the problem over to the mathematician William Sealy Gosset. In 1908, under the pen name "Student," Gosset produced the formula for specifying how large a sample must be to generalize the results to the entire beer-drinking population. So that's the history - craps and beer.... The point is that the hallmark of statistics is the very practicality that gave rise to its existence in the first place. The field is not an area of mysticism or sterile speculations. It is a no-nonsense area of here-and-now pragmatism. -- Richard C. Sprinthall, Basic Statistical Analysis, 5th Edition. Allyn and Bacon (1997), pg. 13.
Measures of Variability
A measure of central tendency (i.e., mean, median, or mode) is a single number that describes a hypothetical, typical person. A statistic that describes the extent to which scores differ from one another in a distribution, and the extent to which they differ from the mean, is called a measure of variability. Just as measures of central tendency give information about similarity among scores, measures of variability give information about how scores differ or vary. There are three major measures of variability:
1. The Range
2. The Standard Deviation
3. The Variance
The range is the measurement of the width or spread of an entire distribution and is found simply by calculating the difference between the highest and lowest scores. The range is a limited measure of variability because it depends only on the two most extreme scores. For example, distributions can have identical means and ranges and yet vary widely in terms of other important measures of variability.
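To make the limitation concrete, the following sketch builds two hypothetical distributions with the same mean and the same range but different spread. `value_range` is a hypothetical helper; `statistics.pstdev` computes the standard deviation (dividing by the number of scores).

```python
import statistics

# Two hypothetical distributions with identical means and ranges.
a = [0, 5, 5, 5, 10]
b = [0, 0, 5, 10, 10]

def value_range(scores):
    """Range: highest score minus lowest score."""
    return max(scores) - min(scores)

for name, data in (("a", a), ("b", b)):
    print(name,
          "mean =", statistics.mean(data),
          "range =", value_range(data),
          "SD =", round(statistics.pstdev(data), 3))
# a mean = 5 range = 10 SD = 3.162
# b mean = 5 range = 10 SD = 4.472
```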
The standard deviation is one of the most important measures of variability, and it takes into account all scores in a distribution. The standard deviation indicates how far the scores in the distribution typically deviate from the mean. Since the standard deviation is always calculated with reference to the mean, its calculation requires interval or ratio data. The larger the value of the standard deviation, the more the scores are spread out around the mean; the smaller the value, the less the scores are spread out. That is, a distribution with a small standard deviation indicates that the group being measured is homogeneous: their scores are clustered very close to the mean. A distribution with a large standard deviation indicates that the group is heterogeneous: their scores are more widely dispersed from the mean.
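A minimal sketch of the calculation, assuming the population formula that divides by the number of scores N. The text does not give a formula, and many sample-based analyses divide by N - 1 instead. `standard_deviation` is a hypothetical helper; the intermediate `variance` is the square of the result.

```python
import math
import statistics

def standard_deviation(scores):
    """Population SD: square root of the average squared deviation from the mean."""
    mean = sum(scores) / len(scores)
    squared_deviations = [(x - mean) ** 2 for x in scores]
    variance = sum(squared_deviations) / len(scores)  # the variance is SD squared
    return math.sqrt(variance)

homogeneous = [48, 49, 50, 51, 52]    # scores clustered near the mean
heterogeneous = [30, 40, 50, 60, 70]  # scores widely dispersed

print(standard_deviation(homogeneous))    # about 1.414
print(standard_deviation(heterogeneous))  # about 14.142

# Cross-check against the standard library; statistics.stdev would instead
# divide by N - 1, the usual estimate when the scores are only a sample.
assert math.isclose(standard_deviation(heterogeneous), statistics.pstdev(heterogeneous))
```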
Normal Curve: a theoretical distribution; a unimodal frequency distribution with scores plotted on the X axis (the horizontal axis) and frequency plotted on the Y axis (the vertical axis); most of the scores cluster around the middle of the distribution; the curve is symmetrical, and all three measures of central tendency (mean, median, mode) fall precisely at the middle of the distribution.
Positively Skewed Distribution: a distribution in which scores are concentrated near the bottom of the distribution; the tail of the distribution points to the top or positive end.
Negatively Skewed Distribution: a distribution in which scores are concentrated near the top of the distribution; the tail of the distribution points to the low or negative end.
Inferential Statistics: statistical procedures used to draw conclusions and inferences which are based upon, but go beyond, the numbers generated by a research study
The variance of a distribution is the square of the standard deviation. It is a useful quantity because the variability among people on one characteristic (e.g., income) can be divided into a portion that can be explained by knowing where they stand on another characteristic (e.g., education) and a portion that cannot.
The Normal Curve and Z-Scores
The normal curve is a theoretical distribution. However, many distributions of people-related measurements come close to approximating the normal curve, and thus it is of crucial significance for describing data. The normal curve is a unimodal frequency distribution with scores plotted on the X axis (the horizontal axis) and frequency plotted on the Y axis (the vertical axis). In a normal curve, most of the scores cluster around the middle of the distribution (where the curve is at its highest). As the distance from the middle increases, in either direction, there are fewer and fewer scores. The normal curve is symmetrical (each side is a mirror image of the other), and all three measures of central tendency (the mean, median, and mode) fall precisely at the same point, the exact middle of the distribution.
In a skewed distribution, scores tend to pile up at one end or the other. The direction of skewness is indicated by the "tail" of the curve. The curve is positively skewed when most of the scores pile up near the bottom (the tail points toward the high or positive end). The curve is negatively skewed when most of the scores pile up near the top (the tail points toward the low or negative end).
The normal curve has a constant relationship with the standard deviation. When the normal curve is marked off in units of standard deviation, a series of constant percentages under the normal curve is formed: about 68% of the scores fall within one standard deviation of the mean, about 95% within two, and about 99.7% within three. Once the curve is plotted according to standard deviation units, it is called the standard normal curve, or z-distribution.
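The constant percentages can be checked with Python's standard-library `statistics.NormalDist` (Python 3.8+), which models the standard normal curve. This sketch computes the area under the curve within 1, 2, and 3 standard deviations of the mean.

```python
from statistics import NormalDist

z = NormalDist()  # the standard normal curve: mean 0, standard deviation 1

for k in (1, 2, 3):
    share = z.cdf(k) - z.cdf(-k)  # area between -k and +k standard deviations
    print(f"within +/-{k} SD: {share:.1%}")
# within +/-1 SD: 68.3%
# within +/-2 SD: 95.4%
# within +/-3 SD: 99.7%
```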
A z-distribution is a normally distributed set of specially scaled scores whose mean is always equal to zero and whose standard deviation always equals 1.00. A z-score is the number of standard deviations an observed value lies from the mean: z = (X - mean) / standard deviation, so scores above the mean have positive z-scores and scores below it have negative ones. Because z-scores take into account both the mean of the distribution and the amount of variability (the standard deviation), they can be used to assess an individual's performance relative to the performance of the entire group being measured.
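A sketch of the z-score calculation in Python, assuming a hypothetical test with a group mean of 100 and a standard deviation of 15. The `z_score` helper is illustrative; converting z to a percentile with `NormalDist().cdf` is only appropriate when the scores are approximately normal.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations that the score x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical test: group mean 100, standard deviation 15.
z = z_score(130, mean=100, sd=15)
print(z)  # 2.0

# If the scores are approximately normal, the z-score converts to the
# proportion of the group scoring below x.
print(round(NormalDist().cdf(z), 3))  # 0.977
```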
Part II. Inferential Statistics
The primary goal of inferential statistics is to measure a few and generalize to many. That is, observations are made of a small segment of the group, and then, from these observations, the characteristics of the entire group are inferred. Inferential statistics are procedures used to reach conclusions (generalizations) about larger populations from a small sample of data with a minimal degree of error. There are usually two issues to be explored:
1. Does the mean of a sample actually reflect the mean of the larger population of interest?
2. Is a difference found between two means (e.g., between an experimental group and a control group) a real and important difference, or is it merely the result of chance?
Measures of Relationship: Correlation
Measures of central tendency and variability are basic descriptive statistics that tell us something about the distribution of a variable. Measures of relationship provide information about how the variable is related to other variables. The association between one variable and another is described as a correlation. If two variables have a perfect correlation (their data points fall along a straight line), then r = 1.0 (Fig 1) or r = -1.0 (Fig 2), where r is the correlation coefficient. The positive and negative signs simply show the direction of the relationship. When two variables are positively correlated, as one increases, the other also increases. When they are negatively correlated, as one increases, the other decreases. Two variables with less than a perfect correlation will have an r value between 0 and 1.0 or between 0 and -1.0. If no linear relationship exists between two variables, r = 0.
Figure 1 depicts a positive correlation between Variable X and Variable Y. That is, as Variable X increases, Variable Y also increases. Figure 2 depicts a negative correlation between the two variables. That is, as Variable X increases, Variable Y decreases.
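A minimal sketch of Pearson's r written out from its definition (Python 3.10+ also offers `statistics.correlation`). The `pearson_r` helper and the small data sets are hypothetical; they reproduce a perfect positive, a perfect negative, and a partial correlation.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists of interval/ratio scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0  (perfect positive)
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0 (perfect negative)
print(pearson_r(x, [2, 1, 4, 3, 5]))   # 0.8  (positive but less than perfect)
```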
Measures of Relationship: Regression
Regression analysis estimates how well the value of one variable can be predicted from the values of one or more other variables. A linear regression predicts the magnitude of the expected change in variable Y given a change in variable X. A simple linear regression is designed to determine whether there is a linear relationship between a response variable and a single possible predictor variable. A multiple linear regression is designed to examine the relationship between a response variable and several possible predictor variables. Nonlinear regression is designed to describe the relationship between a response variable and one or more explanatory variables in a nonlinear fashion.
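A sketch of simple linear regression fitted by ordinary least squares, one common fitting method (assumed here; the text does not name one). The helper name and the data are hypothetical.

```python
def simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical predictor (x) and response (y) values.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
slope, intercept = simple_linear_regression(x, y)
print(slope, intercept)  # 2.0 1.0

# The slope is the expected change in y for a one-unit change in x.
print(intercept + slope * 6)  # 13.0, the predicted y when x = 6
```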
Key Concepts of Inferential Statistics
Population (or universe): an entire group of persons, things, or events having at least one trait in common
Sample: a smaller number of observations taken from the total number making up the population; in typical applications of inferential statistics, the sample size is small relative to the population size
To make accurate predictions, the sample should be representative of the population. In a sense, a good representative sample provides the researcher with a miniature mirror with which to view the entire population. Recall that you have seen these concepts before in the chapter on surveys.
Correlation: an association between two variables; can be positive or negative. Correlation does not equal causation.
Correlation Coefficient: a number between -1 and 1 that measures the degree to which two variables are linearly related. If there is a perfect positive linear relationship, r = 1 (i.e., an increase (or decrease) in one variable is associated with an increase (or decrease) in the other variable); if there is a perfect negative linear relationship, r = -1 (i.e., an increase (decrease) in one variable is associated with a decrease (increase) in the other variable); if r = 0, there is no linear relationship between the variables.
Pearson's Product Moment Correlation Coefficient: Pearson's product moment correlation, usually denoted by r, is one example of a correlation coefficient; a measure of the linear association between two variables that have been measured on interval or ratio scales (e.g., the relationship between height in inches and weight in pounds)
Sampling Revisited
Sampling techniques were briefly discussed in the chapter on survey methodology. They are briefly revisited here.
Random sampling requires that each member of the entire population have an equal chance of being included and that no member of the population be systematically excluded. It is important to note that randomness describes the selection process (i.e., the procedures by which the sample is selected), not the particular pattern of observations in the sample.
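A minimal sketch of simple random sampling with Python's `random.sample`, which draws without replacement so that every member has the same chance of selection. The population of ID numbers is hypothetical, and the fixed seed only makes the example repeatable.

```python
import random

# Hypothetical population: ID numbers 1 through 10,000.
population = list(range(1, 10_001))

rng = random.Random(42)  # fixed seed only so the example is repeatable
sample = rng.sample(population, k=100)  # each member equally likely; no repeats

print(len(sample), len(set(sample)))  # 100 100
```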
To obtain a sample that mirrors the population, the researcher must know beforehand what some of the major population characteristics are and then deliberately select a sample that shares these same characteristics in the same proportions. Whenever the sample differs systematically from the population of interest, a bias has occurred. Bias is a constant difference, in one direction, between the mean of the sample and the mean of the population. Bias occurs when most of the sampling error loads up on one side, so that the sample means consistently either overestimate or underestimate the population mean.
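The procedure described above is commonly called proportional stratified sampling; that name, the strata, and the helper below are assumptions for illustration. The sketch splits a hypothetical population into known groups and samples each group in proportion to its share.

```python
import random

# Hypothetical population whose make-up is known in advance:
# 600 urban, 300 suburban, and 100 rural residents.
population = ([("urban", i) for i in range(600)]
              + [("suburban", i) for i in range(300)]
              + [("rural", i) for i in range(100)])

def proportional_stratified_sample(pop, size, rng):
    """Sample each stratum in proportion to its share of the population."""
    strata = {}
    for person in pop:
        strata.setdefault(person[0], []).append(person)
    sample = []
    for members in strata.values():
        # Allocation rounds to the nearest whole person; with these numbers it is exact.
        k = round(size * len(members) / len(pop))
        sample.extend(rng.sample(members, k))
    return sample

sample = proportional_stratified_sample(population, 50, random.Random(0))
counts = {g: sum(1 for p in sample if p[0] == g) for g in ("urban", "suburban", "rural")}
print(counts)  # {'urban': 30, 'suburban': 15, 'rural': 5}
```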
Each distribution discussed so far has been a distribution of individual scores: each point in the distribution represents a measure of a characteristic or performance of an individual. In sampling distributions, each point represents a measure of a characteristic or performance of a sample of individuals. The mean income of a sample of U.S. adults is an example; it would be one data point in the sampling distribution of mean income. Sampling distributions are important in testing hypotheses.
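A simulation sketch of a sampling distribution, assuming a hypothetical right-skewed population of incomes. Each of the 2,000 sample means is one data point in the sampling distribution; the figures depend on the seed and are not real data.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed for reproducibility

# Hypothetical population of 100,000 incomes (thousands of dollars), right-skewed.
population = [rng.lognormvariate(3.8, 0.5) for _ in range(100_000)]
pop_mean = statistics.fmean(population)

# Each data point in the sampling distribution is the mean of one sample of 50 people.
sample_means = [statistics.fmean(rng.sample(population, 50)) for _ in range(2_000)]

print(round(pop_mean, 1), round(statistics.fmean(sample_means), 1))
print(round(statistics.pstdev(population), 1), round(statistics.stdev(sample_means), 1))
# The sample means center on the population mean but vary far less than individual incomes.
```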
Part III. Parameter Estimates and Hypothesis-Testing
Criminal suspects are presumed innocent until proven guilty. Under hypothesis-testing procedures, the null hypothesis is presumed to be true until proven false. Once all the evidence has been considered, a verdict is reached, and the null hypothesis is either retained (failure to reject) or rejected.
Evidence for testing a hypothesis about a sample statistic is based on the relationship between the observed sample statistic and the sampling distribution of that statistic. For example, if a researcher predicts that the mean weight of rats in an experimental group is greater than the mean weight in a control group, then the statistic at issue is the difference between the two means. The experimental or research hypothesis is that the two means represent different populations and that the difference between them is dependable. The null hypothesis is that the two means come from the same population and that the difference between them would not hold up under repeated replications of the experiment.
The difference between the means is compared to the sampling distribution of such differences, the mean of which is usually zero (no difference). If a difference as large as or larger than the obtained difference is very unlikely for groups coming from the same population, then the difference will be judged an improbable outcome under the null hypothesis of no dependable difference, and the null hypothesis will be rejected. On the other hand, if the observed difference is not so large as to be highly improbable, the null hypothesis will not be rejected (sometimes described as accepting the null hypothesis). An observed sample statistic will qualify as a probable outcome if the difference between its value and the hypothesized population value is small enough to be attributed to chance.
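The text does not name a particular test. One way to build "the sampling distribution of such differences" directly is an exact permutation test, sketched here with hypothetical rat weights; this is a swapped-in illustration, not the deskbook's method.

```python
from itertools import combinations

# Hypothetical rat weights in grams (not data from the original text).
control = [250, 255, 260, 248, 252, 258]
experimental = [270, 275, 268, 280, 272, 277]

def mean(xs):
    return sum(xs) / len(xs)

observed = mean(experimental) - mean(control)

# Under the null hypothesis both groups come from one population, so every way of
# splitting the 12 pooled weights into two groups of 6 was equally likely.
pooled = control + experimental
k = len(experimental)
null_diffs = []
for chosen in combinations(range(len(pooled)), k):
    in_group = set(chosen)
    group_e = [pooled[i] for i in chosen]
    group_c = [pooled[i] for i in range(len(pooled)) if i not in in_group]
    null_diffs.append(mean(group_e) - mean(group_c))

# One-sided p-value: the share of splits whose difference is at least as large as
# the observed one (the small tolerance guards against floating-point ties).
p_value = sum(d >= observed - 1e-9 for d in null_diffs) / len(null_diffs)
print(round(observed, 2), len(null_diffs), round(p_value, 5))  # 19.83 924 0.00108
```

Only the actual grouping produces a difference this large, so the observed result is an improbable outcome under the null hypothesis, which would be rejected at a conventional .05 level.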
For example, a sample mean will qualify as a probable outcome if the difference between its value and that of the hypothesized population mean is small enough to be attributed to chance. Under these circumstances, because there is no compelling reason to reject the hypothesis, the null hypothesis is tentatively accepted. An observed sample statistic will qualify as an improbable outcome if the difference between its value and the hypothesized value is too large to be attributed to chance. That is, a sample mean will qualify as an improbable outcome if it deviates too far from the hypothesized mean and appears to emerge from the sparse concentration of possible sample means in either "tail" of the sampling distribution. Under these circumstances, because there are grounds for doubting the null hypothesis, it is rejected.
The decision to reject the null hypothesis involves a degree of risk. Having rejected a null hypothesis, we can never be absolutely certain whether the decision is correct or incorrect, unless, of course, the entire population was surveyed. Even if the null hypothesis is true, there is a slight possibility that, just by chance, the one observed sample mean really originates from the rejection regions (the tails) of the hypothesized sampling distribution, causing the true null hypothesis to be erroneously rejected.
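A simulation sketch of that risk, under assumed conditions: the null hypothesis is true, the population standard deviation is known, and each study uses a two-tailed z-test at the .05 level. Roughly 5% of the simulated studies still reject the null hypothesis.

```python
import math
import random

rng = random.Random(7)  # fixed seed so the run is reproducible

# Assumed setting: the null hypothesis is TRUE. Scores come from a population with
# mean 100 and a known standard deviation of 15, and each study samples 25 scores.
MU0, SIGMA, N = 100, 15, 25
CRITICAL_Z = 1.96  # two-tailed rejection regions for the .05 level

trials = 20_000
false_rejections = 0
for _ in range(trials):
    sample_mean = sum(rng.gauss(MU0, SIGMA) for _ in range(N)) / N
    z = (sample_mean - MU0) / (SIGMA / math.sqrt(N))
    if abs(z) > CRITICAL_Z:  # the sample mean landed in one of the tails
        false_rejections += 1

type_i_rate = false_rejections / trials
print(type_i_rate)  # close to 0.05
```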
Regression: a procedure for estimating how well the value of one variable can be predicted from the values of one or more other variables
Linear Regression: predicts the magnitude of the expected change in variable Y given a change in variable X
Simple Linear Regression: designed to determine whether there is a linear relationship between a response variable and a possible predictor variable
Multiple Linear Regression: designed to examine the relationship between a response variable and several possible predictor variables
Nonlinear Regression: designed to describe the relationship between a response variable and one or more explanatory variables in a nonlinear fashion
Bias: a constant difference, in one direction, between the mean of the sample and the mean of the population; occurs when most of the sampling error loads up on one side, so that the sample means are constantly either over- or under-estimating the population mean
Sampling Error: Whenever a sample is selected, it must be assumed that the sample measures will not precisely match those that would be obtained if the entire population were measured. The sampling error reflects, or is an index of, the difference between the sample value and the population value. Sampling error is not a mistake. Any sample mean should be expected to deviate from the mean of the whole population, but the deviation will hopefully be random and should not be large.
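A simulation sketch contrasting sampling error with bias, using a hypothetical population. Random samples miss the population mean in both directions (sampling error), while a hypothetical procedure that can never select low scorers misses it in the same direction nearly every time (bias).

```python
import random
import statistics

rng = random.Random(3)  # fixed seed for reproducibility

# Hypothetical population of 10,000 scores with a mean close to 50.
population = [rng.gauss(50, 10) for _ in range(10_000)]
true_mean = statistics.fmean(population)

# Random sampling: errors fall on both sides of the population mean.
random_errors = [statistics.fmean(rng.sample(population, 40)) - true_mean
                 for _ in range(1_000)]

# A biased procedure (hypothetical): members scoring below 45 can never be selected.
reachable = [x for x in population if x >= 45]
biased_errors = [statistics.fmean(rng.sample(reachable, 40)) - true_mean
                 for _ in range(1_000)]

for label, errors in (("random", random_errors), ("biased", biased_errors)):
    share_high = sum(e > 0 for e in errors) / len(errors)
    print(label, "average error", round(statistics.fmean(errors), 2),
          "| share overestimating", share_high)
```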
Part IV. Error Rates
In determining the admissibility of expert opinion regarding a particular scientific technique, the court ordinarily should consider known or potential rates of error, and the existence and maintenance of standards controlling the technique's operation.(1) To assess known or potential rates of error, the judiciary must be prepared to carefully and critically evaluate the methodology and underlying assumptions of proffered scientific evidence. Such an evaluation would entail examining whether the research hypothesis was appropriately articulated and tested, whether appropriate controls were used, whether threats to validity were controlled for, or at least severely minimized, and so forth.
The likelihood with which a measurement device or a technological procedure leads to an incorrect classification is the error rate. Whereas formal testing of hypotheses usually relies on theoretical sampling distributions for estimating the likelihood that the decision based on the data is erroneous (especially Type I error), the likelihood of an incorrect classification is usually assessed in terms of error rates. Several rates should be taken into account, typically termed "true positive," "true negative," "false positive," and "false negative" rates. For example, if a laboratory claims that a particular test reliably identifies the existence of a serious disease, it is necessary to consider the proportion of people with the disease who were correctly identified as having it (true positive) and the proportion of people without the disease who were correctly identified as not having it (true negative). It is also important to consider the proportion of individuals without the disease who were incorrectly identified as having it (false positive) and the proportion of individuals with the disease who were incorrectly identified as not having it (false negative).
False positives could lead to unnecessary further expense and painful medical interventions; false negatives could lead to further and perhaps fatal progression of the disease. It usually is essential to examine both types of erroneous classification rates; if proffered evidence does not include both error rates, it is likely to be of little value. Error rates are generally stated as percentages or proportions. In the above case, for example, the data might have been drawn from people who visited their physicians because of certain bothersome symptoms, and when the physicians conducted the diagnostic test, the results for 104 patients might have been:
Test result      Disease present   Disease absent   Total
Positive                      90               10     100
Negative                       2                2       4
Total                         92               12     104
The true positive rate is .98 (90/92), with only two diseased patients misdiagnosed (2/92, a false negative rate of .02). There were 12 patients without the disease, 10 of whom were misdiagnosed as having the disease, for a false positive rate of .83 (10/12). This example illustrates two points. First, the rate of correct classifications has to be compared to the rates of both false positive and false negative classifications. The relative importance of the two types of errors will depend on what they lead to: false security, expensive or painful further intervention, and so on. Second, although proportions and percentages are very useful modes of presenting data, sometimes the raw numbers underlying the percentages are equally important. In the example above, only 12 of the 104 patients were actually free of the disease, and that base of 12 is too small to draw firm conclusions about the false positive rate. We would be much more confident if the number of disease-free patients who were tested was larger. In general, if we were told that 50% of people held a certain opinion, we would want to know if the reference was 50% of 2 people or 50% of 2,000.
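The rates can be recomputed from the four counts in Python. The second part goes beyond the text: it uses a Wilson score interval (one common choice for a proportion, assumed here) to show how much less certain a .83 false positive rate is when it rests on 12 patients rather than a hypothetical 1,200.

```python
import math

# Counts from the hypothetical diagnostic example (104 patients).
TP, FN = 90, 2  # patients with the disease: correctly / incorrectly classified
FP, TN = 10, 2  # patients without the disease: incorrectly / correctly classified

rates = {
    "true positive rate": TP / (TP + FN),
    "false negative rate": FN / (TP + FN),
    "false positive rate": FP / (FP + TN),
    "true negative rate": TN / (FP + TN),
}
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
# true positive rate: 0.98
# false negative rate: 0.02
# false positive rate: 0.83
# true negative rate: 0.17

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

# The same .83 false positive rate, resting on 12 versus 1,200 disease-free patients.
print([round(b, 2) for b in wilson_interval(10, 12)])      # [0.55, 0.95]
print([round(b, 2) for b in wilson_interval(1000, 1200)])  # [0.81, 0.85]
```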
Hypothesis-Testing in Statistical Terms
The purpose of a hypothesis test is to determine the likelihood that a particular sample could have originated from a population with a hypothesized characteristic. The null hypothesis supplies the value about which the hypothesized sampling distribution is centered. It always makes a statement about a characteristic of the population, never about a characteristic of the sample. The null hypothesis always makes the claim about a single numerical value, never a range of values. The experimental hypothesis asserts the opposite of the null hypothesis. A decision to accept the null hypothesis (or a failure to reject the null hypothesis) implies a lack of support for the experimental or research hypothesis, and a decision to reject the null hypothesis implies support for the experimental or research hypothesis. A decision rule specifies precisely when the null hypothesis should be rejected.
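A minimal sketch of a decision rule, assuming a two-tailed one-sample z-test with a known population standard deviation; the helper name and the study numbers are hypothetical. The same result leads to rejection at the .05 level but not at the .01 level.

```python
import math
from statistics import NormalDist

def z_test_decision(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed decision rule for a one-sample z-test (population SD assumed known)."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = .05
    decision = "reject H0" if abs(z) > critical else "fail to reject H0"
    return z, critical, decision

# Hypothetical study: the null hypothesis says the population mean is 100 (SD 15);
# a sample of 25 produces a mean of 106.
print(z_test_decision(106, 100, 15, 25, alpha=0.05))  # z = 2.0 exceeds about 1.96: reject
print(z_test_decision(106, 100, 15, 25, alpha=0.01))  # z = 2.0 is below about 2.576: fail to reject
```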
Error Rate: the likelihood with which a measurement device or a technological procedure leads to an incorrect classification
True Positive: correctly classifying someone as possessing a particular characteristic or falling into a particular category (e.g., person has disease and is classified as having disease)
True Negative: correctly classifying someone who does not possess a particular characteristic or who does not fall into a particular category (e.g., person does not have disease and is classified as not having the disease)
False Positive Error: incorrectly classifying someone without a particular characteristic as possessing that characteristic (e.g., person does not have disease, but is incorrectly classified as having disease)
False Negative Error: incorrectly classifying someone who has a particular characteristic as someone who does not possess that characteristic (e.g., person has disease, but is incorrectly identified as not having it)
Type I and Type II Errors
The decision to reject the null hypothesis is based on probabilities rather than on certainties. The decision is made without direct knowledge of the true state of affairs in the population. There are two possible decisions: (1) reject the null hypothesis, or (2) fail to reject (accept) the null hypothesis. There are also two possibilities that may be true in the population: (1) the null hypothesis is true, or (2) the experimental hypothesis is true. Thus, there are two kinds of correct decisions and two kinds of errors. Most scientists begin with the assumption that the phenomenon they are studying does not cause the effect they expect: the null hypothesis. In other words, the standard method of science is to presume 'innocence' and only with strong proof reject that assumption. Scientific conventions have developed regarding the strength of this presumption; that is, how much evidence is needed before rejecting the null hypothesis and accepting an alternative hypothesis that the experimental manipulation caused the observed effect (this will be discussed further in this chapter). It is important to realize, however, that for a given sample size, an attempt to decrease one type of error results in an increased likelihood of making the other type of error.
Type I Error: when the researcher rejects the null hypothesis but the null hypothesis is actually true (e.g., the researcher claims that there is a causal relationship between variable A and variable B when, in fact, there is not)
Type II Error: when the researcher fails to reject the null hypothesis (i.e., accepts the null hypothesis) when in actuality the experimental hypothesis is true (e.g., the researcher claims there is no causal relationship between variable A and variable B when, in fact, there is one)
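A sketch of the trade-off between the two error types, assuming a two-tailed one-sample z-test, a hypothetical true effect of half a standard deviation, and 25 subjects. Holding the sample size fixed, each stricter Type I rate (alpha) produces a larger Type II rate (beta). Ignoring the far rejection tail in `type_ii_error` is a simplifying assumption.

```python
import math
from statistics import NormalDist

Z = NormalDist()

def type_ii_error(alpha, effect_size, n):
    """Approximate beta for a two-tailed one-sample z-test when the true mean lies
    effect_size standard deviations above the null value (the far tail is ignored)."""
    critical = Z.inv_cdf(1 - alpha / 2)
    shift = effect_size * math.sqrt(n)  # where the test statistic centers if H0 is false
    return Z.cdf(critical - shift)

# Hypothetical study: true effect of half a standard deviation, 25 subjects.
for alpha in (0.10, 0.05, 0.01, 0.001):
    beta = type_ii_error(alpha, 0.5, 25)
    print(f"Type I rate {alpha:<6} -> Type II rate {beta:.2f}")
# As the Type I rate falls, the Type II rate rises.
```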
Consider the decision made by a juror in a criminal trial. As is the case with statistics, a decision must be made on the basis of evidence: Is the defendant innocent or guilty? However, the decision is the juror's and does not necessarily reflect the true state of affairs, that is, whether the person really is innocent or guilty. Assume the null hypothesis is that the defendant is innocent. Rejecting the null hypothesis means deciding, based upon the evidence, that the defendant is guilty. Failing to reject (accepting) the null hypothesis means deciding, based upon the evidence, that the defendant is innocent. Convicting an innocent defendant is therefore the analogue of a Type I error, and acquitting a guilty defendant is the analogue of a Type II error.
Confidence Level
An alternative indicator of the probability of a Type I error is the confidence level. The confidence level (e.g., 95%) is the complement of the Type I error rate (e.g., 5%), and the associated confidence interval specifies the range of values around the empirically obtained result within which the "true" or population value is likely to lie. Confidence intervals are frequently reported in sample surveys. For example, it might be reported that the 95% confidence interval for an obtained percentage of 20% is 20% plus or minus 3%. The higher the confidence level, the lower the probability of a Type I error, but the broader the range of values within which the "true" or population value might actually lie.
Both types of error are important. For example, in a toxic tort case, a Type I error could mean that the frequency of occurrence of a symptom among workers could be accepted as indicating that the symptom was caused by a toxic substance found in the plant environment, whereas in fact that frequency of occurrence was just at the outer extreme of random fluctuation and was not a reflection of a causal link. The firm might improperly be held accountable. But with a Type II error, the frequency of occurrence of the symptom would be taken as well within the normal range of fluctuation, and no causal link between substance and symptom frequency would be inferred, even though one exists. The firm could be erroneously exonerated. Other things (such as sample size) being equal, the less likely a Type I error, the more likely a Type II error.
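A sketch of how the interval widens as the confidence level rises, using the normal approximation for a proportion. The sample size of 683 is a hypothetical figure, chosen because it yields roughly plus or minus 3% at the 95% level for a result of 20%; `margin_of_error` is an illustrative helper.

```python
import math
from statistics import NormalDist

def margin_of_error(p, n, confidence):
    """Normal-approximation margin of error for a sample proportion p from n respondents."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical survey: 20% of 683 respondents give a particular answer.
p, n = 0.20, 683
for level in (0.90, 0.95, 0.99):
    moe = margin_of_error(p, n, level)
    print(f"{level:.0%} interval: {p:.0%} +/- {moe:.1%}")
# 90% interval: 20% +/- 2.5%
# 95% interval: 20% +/- 3.0%
# 99% interval: 20% +/- 3.9%
```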
To what extent do judges around the country find the concept of error rate a useful criterion for critically evaluating scientific evidence?
All judges in the survey sample (N = 400), even those not in FRE/Daubert states, were asked how useful they thought the concept of error rate is for admissibility decision-making. The majority (91%) indicated that a consideration of error rate was useful when determining the admissibility of scientific evidence, with 54% of those judges rating error rate as very useful. Focusing just on responses from judges in states that follow the FRE/Daubert standards, the vast majority of judges rated error rate as a useful guideline for evaluating the admissibility of scientific evidence. Even though the vast majority of judges rated error rate as a useful guide, the results of the survey indicate that judges do not fully understand the scientific meaning of error rates and that, as a result, they are unsure how to use the concept as a guideline for determining admissibility. When asked how they would apply the concept of error rate, the majority of judges expressed some hesitancy or uncertainty. For a response to be coded as "judge understands concept," it had to refer to an evaluation of the various sources of error, or to the number or percentage of instances in which the classification procedure misclassified cases. From the answers provided, the researchers could infer a true understanding of the concept in only 4% of the responses (N = 400). In 86% of the responses, the judge's understanding of the concept was questionable. In 10% of the responses, the judge relied solely upon a low-error vs. high-error heuristic (or rule of thumb) when explaining how the concept of error rate is applied to admissibility (i.e., if there is a high rate of error, the judge is more likely to exclude the evidence than if there is a low rate of error).
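The coding criterion above refers to the number or percentage of instances in which a classification procedure misclassifies cases. A minimal Python sketch of that arithmetic follows, using the false positive and false negative terms defined in the glossary. The counts are hypothetical, and splitting the overall error rate into these two components is one common reporting convention rather than the only one.

```python
def classification_rates(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Summarize how often a classification procedure is wrong.

    tp: true positives  (has the characteristic, classified as having it)
    fp: false positives (lacks it, classified as having it)
    tn: true negatives  (lacks it, classified as lacking it)
    fn: false negatives (has it, classified as lacking it)
    """
    total = tp + fp + tn + fn
    return {
        "error_rate": (fp + fn) / total,        # share of all cases misclassified
        "false_positive_rate": fp / (fp + tn),  # among cases lacking the characteristic
        "false_negative_rate": fn / (fn + tp),  # among cases having the characteristic
    }


# Hypothetical validation study of 1,000 cases.
rates = classification_rates(tp=180, fp=40, tn=760, fn=20)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")
```

In this hypothetical, a single overall error rate of 6% hides a false positive rate of 5% and a false negative rate of 10%, so it can matter which error rate an expert is reporting.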
Significance and P-Values
In order to decide whether an observed result differs significantly from what the null hypothesis predicts, a standard or criterion for deciding whether to reject or retain the null hypothesis must be established. Statisticians typically use two levels of significance, .05 and .01; these levels have been established by convention. When a significance level of p < .05 is chosen, the decision rule is that the null hypothesis will be rejected if data as extreme as those observed would occur by chance less than 5 times out of 100 when the null hypothesis is true. If a significance level of p < .01 is chosen, the null hypothesis will be rejected only if the probability of such a result occurring by chance is less than 1 in 100. The probability of making a Type I error (rejecting the null hypothesis when it is true) is equal to the value chosen for the significance level. That is, if a researcher has chosen a significance level of .05, the probability of a Type I error is .05: when the null hypothesis is true, the researcher will wrongly reject it about 5 times out of 100 (5%). In those 5 cases out of 100, the extreme differences are due to chance and not to some experimental manipulation. Can the probability of making a Type I error be reduced by choosing a more extreme significance level (e.g., p < .01)? Yes, but there is a trade-off: an increased likelihood of making a Type II error (failing to reject the null hypothesis when it is false), in which the researcher concludes that the results could be due to chance when they were in fact caused by the experimental manipulation.
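A small simulation can make the link between the significance level and the Type I error rate concrete. The Python sketch below assumes a two-sided one-sample z-test with a known population standard deviation, normally distributed data, and hypothetical values (mean 100, standard deviation 15, samples of 25); the function names are illustrative. It repeatedly draws data for which the null hypothesis is true and counts how often the test rejects it, then computes the exact Type II error rate for one hypothetical true difference of 6 points.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def z_test_rejects(sample: list[float], mu0: float, sigma: float, alpha: float) -> bool:
    """Two-sided one-sample z-test, assuming the population sigma is known."""
    z = (sum(sample) / len(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    p_value = 2 * STD_NORMAL.cdf(-abs(z))
    return p_value < alpha


def simulated_type_i_rate(alpha: float, trials: int = 20_000, n: int = 25, seed: int = 1) -> float:
    """Share of simulated experiments that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Data drawn under the null hypothesis: the true mean really is 100.
        sample = [rng.gauss(100, 15) for _ in range(n)]
        if z_test_rejects(sample, mu0=100, sigma=15, alpha=alpha):
            rejections += 1
    return rejections / trials


def type_ii_rate(alpha: float, true_shift: float, sigma: float, n: int) -> float:
    """Exact probability of failing to reject H0 when the true mean is mu0 + true_shift."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    delta = true_shift / (sigma / math.sqrt(n))  # true shift in standard-error units
    return STD_NORMAL.cdf(z_crit - delta) - STD_NORMAL.cdf(-z_crit - delta)


type_i = {alpha: simulated_type_i_rate(alpha) for alpha in (0.05, 0.01)}
for alpha, rate in type_i.items():
    beta = type_ii_rate(alpha, true_shift=6, sigma=15, n=25)
    print(f"alpha = {alpha}: simulated Type I rate = {rate:.3f}, Type II rate = {beta:.3f}")
```

The simulated rejection rates should land near .05 and .01, while the Type II error rate for the same hypothetical 6-point difference rises from about .48 at the .05 level to about .72 at the .01 level, which is the trade-off described in the text.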
Statistical Significance and Legal Significance
The scientist's concept of statistical error does not translate directly into the judge's concept of legal error. It cannot be said, therefore, that if judges admit a study that is statistically significant at the .05 level, they will make only 5 errors (Type I errors) out of 100. There is no true correspondence between statistical confidence levels and legal burdens of proof.
Statistical Significance and Importance
Nor does the significance of a finding (the probability of a Type I error) have a clear relationship to the importance of the finding. A small difference, or a small correlation, can still be highly significant statistically if the sample is large enough. A finding of small magnitude would be reliably replicated on repeated investigations if large portions of the population were included in each investigation; but although dependable, the finding might have no practical or theoretical importance.
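The effect of sample size can be seen in a short calculation. The Python sketch below assumes a two-sample z-test with a known, shared population standard deviation; the 0.5-point difference on a scale with standard deviation 15, the sample sizes, and the function name are all hypothetical. The same tiny difference (one-thirtieth of a standard deviation) moves from clearly non-significant to highly significant as the groups grow.

```python
import math
from statistics import NormalDist


def two_sample_z_p_value(mean_diff: float, sigma: float, n_per_group: int) -> float:
    """Two-sided p-value for a difference between two group means.

    Assumes both groups share a known population standard deviation sigma.
    """
    standard_error = sigma * math.sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    return 2 * NormalDist().cdf(-abs(z))


# Hypothetical: a 0.5-point difference on a scale with sigma = 15.
for n in (100, 10_000, 100_000):
    print(f"n per group = {n:>7,}: p = {two_sample_z_p_value(0.5, 15, n):.3g}")
```

The difference itself never changes; only the precision of the estimate does, which is why statistical significance alone says little about practical importance.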
Limitations of moving from statistical significance to legal significance:
Confidence Level: Specifies a range of values around the empirically obtained result, within which the "true" or population value is likely to lie.
Endnote: 1. Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579, 113 S. Ct. 2786, at 508.
Glossary
bias: a constant difference, in one direction, between the mean of the sample and the mean of the population; occurs when most of the sampling error loads up on one side, so that the sample means consistently either over- or underestimate the population mean
bimodal distribution: a distribution of scores with two modal scores (two commonly occurring scores)
confidence level: specifies a range of values (the confidence interval) around the empirically obtained result within which the "true" or population value is likely to lie
correlation: an association between two variables; can be positive or negative; correlation does not equal causation
correlation coefficient: a number between -1 and 1 that measures the degree to which two variables are linearly related; if there is a perfect positive linear relationship, r = 1 (i.e., an increase (decrease) in one variable is associated with an increase (decrease) in the other variable); if there is a perfect negative linear relationship, r = -1 (i.e., an increase (decrease) in one variable is associated with a decrease (increase) in the other variable); if r = 0, there is no linear relationship between the variables
decision rule: specifies precisely when the null hypothesis should be rejected
descriptive statistics: statistics that summarize, describe, and make understandable the numbers generated in a research study
distribution: the arrangement of any set of scores or values in order of magnitude
error rate: the rate at which a measurement device or a technological procedure leads to an incorrect classification
false negative error: incorrectly classifying someone who has a particular characteristic as someone who does not possess that characteristic (e.g., a person has a disease but is incorrectly identified as not having it)
false positive error: incorrectly classifying someone without a particular characteristic as possessing that characteristic (e.g., a person does not have a disease but is incorrectly classified as having it)
frequency distribution: a listing, in order of magnitude, of each score and how many times that score occurred
inferential statistics: statistics used to draw conclusions and inferences that are based upon, but go beyond, the numbers generated by a research study
interval scale: a unit of measurement characterized by equal intervals; measures differences in amount (e.g., I.Q. score)
linear regression: predicts the magnitude of the expected change in variable Y given a change in variable X
mean: the arithmetic average of all the scores; calculated by adding all the scores together and then dividing by the total number of scores
measures of central tendency: measures that provide information about the average, or typical, score of a large number of scores; that is, which single score (mean, median, or mode) best represents an entire set of scores
measures of variability: procedures used to describe the extent to which scores in a distribution differ from one another (e.g., the range, standard deviation, and variance)
median: the exact midpoint of any distribution; often a better representation of central tendency than the mean when the distribution is skewed; to calculate the median, the scores must first be arranged in order of magnitude (e.g., from lowest to highest), and the middle score is the median (with an even number of scores, the median is the average of the two middle scores)
mode: a measure of central tendency; the most common single number in the distribution; in a perfectly symmetrical unimodal distribution, the mode is the same as the mean; when it is not the same, the mode is not really a good representative value of the distribution
multiple linear regression: designed to examine the relationship between a response variable and several possible predictor variables
negatively skewed distribution: a distribution in which scores are concentrated near the top of the distribution; the tail of the distribution points to the low or negative end
nominal scale: a unit of measurement based on classification; measures differences in kind (e.g., ethnicity)
nonlinear regression: designed to describe the relationship between a response variable and one or more explanatory variables in a nonlinear fashion
normal curve: a theoretical distribution; a unimodal frequency distribution with scores plotted on the X axis (the horizontal axis) and frequency plotted on the Y axis (the vertical axis); most of the scores cluster around the middle of the distribution; the curve is symmetrical, and all three measures of central tendency (mean, median, mode) fall precisely at the middle of the distribution
ordinal scale: a unit of measurement characterized by order and classification; measures differences in degree (e.g., attitudes)
Pearson's product moment correlation coefficient: a measure of the linear association between two variables that have been measured on interval or ratio scales (e.g., the relationship between height in inches and weight in pounds); usually denoted by r; an example of a correlation coefficient
population: an entire group of persons, things, or events having at least one trait in common; the larger group of all people of interest from which the sample is selected
positively skewed distribution: a distribution in which scores are concentrated near the bottom of the distribution; the tail of the distribution points to the top or positive end
range: a measure of variability; the width or spread of an entire distribution; found simply by calculating the difference between the highest and lowest scores
ratio scale: a unit of measurement characterized by a true zero and equal intervals; measures differences in total amount (e.g., income)
regression: predicts the value of one or more variables from knowledge of the values of other variables
sample: a smaller number of observations taken from the total number making up the population; in typical applications of inferential statistics, the sample size is small relative to the population size
simple linear regression: designed to determine whether there is a linear relationship between a response variable and a possible predictor variable
skewed distribution: a distribution of scores in which the majority of scores bunch up at one end of the distribution
standard deviation: a measure of variability that indicates by how much the scores in the distribution typically deviate from the mean
standard normal curve: the normal curve marked off in units of standard deviation; a normally distributed set of scaled scores whose mean is always equal to zero and whose standard deviation equals 1.00
true positive: correctly classifying someone as possessing a particular characteristic or falling into a particular category (e.g., a person has a disease and is classified as having it)
true negative: correctly classifying someone who does not possess a particular characteristic or who does not fall into a particular category (e.g., a person does not have a disease and is classified as not having it)
Type I error: when the researcher rejects the null hypothesis when the null hypothesis is true
Type II error: when the researcher fails to reject the null hypothesis when the null hypothesis is false
unimodal distribution: a distribution of scores with a single modal score
variance: a measure of variability; the average of the squared deviations of the scores from the mean (the square of the standard deviation); the related idea of "variance explained" measures how much of the variance between people on one characteristic can be explained by where they stand on another characteristic
Common Problems with the Use of Statistical Evidence in Court
Statistics in court are not presented in their natural form
Statistics presented in court are rarely presented in a single, complete presentation (i.e., one side presents statistical evidence that is challenged on cross-examination, and then, at a later point in the trial, the other side proffers opposing statistical conclusions). In court, statistics are often presented in graphic form, and there is rarely a detailed discussion of the statistical techniques and models used or of their assumptions and shortcomings.
Improper inferences are drawn
When statistics are presented in court, improper inferences are often drawn about what the data mean and what conclusions they support. This problem typically occurs in three ways: (1) by extrapolating the results of a statistical analysis to a population that is different from the population defined in the study; (2) by inferring, within the correct population, something beyond what is statistically justified by the available data and analysis; and (3) by misinterpreting the relationship between statistical significance and the burden of proof.
Improper methodologies are used
Methodological problems that undermine the scientific validity or relevance of statistical results occur at many stages of the research: study design, data collection, and data analysis.
Iancu, C. A., and Steitz, P. W. (1997). Guide to Statistics,
Questions to consider when evaluating scientific evidence...
Suggested Readings:
Barnes, D.W. (1983). Statistics as Proof: Fundamentals of Quantitative Evidence. Boston: Little, Brown and Company.
DeVore, J., and Peck, R. (1997). Statistics: The Exploration and Analysis of Data, 3rd Edition. San Francisco, CA: Duxbury.
Hogg, R.V., and Craig, A.T. (1995). Introduction to Mathematical Statistics, 5th Edition. Englewood Cliffs, NJ: Prentice Hall.
Iancu, C.A., and Steitz, P.W. (1997). "Guide to Statistics." In Expert Evidence: A Practitioner's Guide to Law, Science, and the FJC Manual. Washington, D.C.: Government Printing Office, pp. 298-310.
Kaye, D.H., and Freedman, D.A. (1994). "Reference Guide to Statistics." In the Federal Judicial Center's Reference Manual on Scientific Evidence. St. Paul, MN: West, pp. 331-414.
Saville, D.J., and Wood, G.R. (1996). Statistical Methods: A Primer. New York: Springer.