

Chi squared test for independence in a contingency table




Two properties are associated if having one property affects the probability of having the other. For example, the probability of passing an exam is increased by hard work. It is also generally agreed, as a matter of sociological observation, that the probability of having an above-average income is associated with your father's occupational background.

The opposite of association is independence. Two properties are independent if having one property does not alter the probability of having the other. For example, sex and intelligence are thought to be independent. Being male does not increase the chance of being stupid!

Sometimes it is not known whether two properties are associated or not. What is required is a test of association or, equivalently, a test of independence. The $\chi^2$ distribution can be used as a test of independence. We illustrate this with an example.

Example (1)

A psychologist conducted a survey into the relationship between the way in which a calculator was held and the speed with which 10 arithmetical operations were performed. The calculator could be either placed on a table or held in the hand; the sums could be performed in less than 2 minutes, between 2 and 3 minutes, or more than 3 minutes. The following results were obtained for a sample of 150 children between 12 and 13 years old.

[Table goes here - download the original to see it.]

Test at the 5% significance level whether the mode of operation and speed of computation are associated.

This kind of table is called a contingency table. Since there are three rows and two columns it is a $3 \times 2$ contingency table.

$H_0$: Speed and mode are independent.
$H_1$: Speed and mode are associated.
In order to determine whether the two variables are associated, it is necessary to calculate what the frequencies would be if there were no connection at all between them. In other words, we seek the expected frequencies. We can then compare these expected frequencies with the observed frequencies by means of a $\chi^2$ test.

To determine these expected frequencies, observe that the probability of being in a given row of the contingency table would be

$$P(\text{row } i) = \frac{R_i}{n}$$

where $R_i$ is the total of the $i$th row and $n$ is the sample size. Likewise, the probability of being in a given column would be

$$P(\text{column } j) = \frac{C_j}{n}$$

where $C_j$ is the total of the $j$th column. Thus the expected probability for a cell that is in the $i$th row and the $j$th column, assuming that the row and column classifications are independent, is

$$P(\text{cell } ij) = \frac{R_i}{n} \times \frac{C_j}{n}$$

We calculate expected frequencies by multiplying the probability of being in a given cell by the total sample size, $n$. Thus, the expected frequency for the cell in the $i$th row and the $j$th column is:

$$E_{ij} = n \times \frac{R_i}{n} \times \frac{C_j}{n} = \frac{R_i C_j}{n}$$

Example continued

For our example we find first the row and column totals:

[Table goes here - download the original to see it.]

The expected frequencies are:

[Table goes here - download the original to see it.]

Now we have to use the expected and observed frequencies to calculate a test statistic. The $\chi^2$ test statistic is

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where $O$ is the observed frequency and $E$ the expected frequency of a cell. Let us make a table of the value of $\frac{(O - E)^2}{E}$ for each of the entries in the above contingency table. We call each entry a contribution. The contributions made to the test statistic are, therefore:

[Table goes here - download the original to see it.]

The $\sum$ symbol indicates that we make a total of all these entries.
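The calculation of expected frequencies and contributions can be sketched in a few lines of Python. This is only an illustration: the survey table is not reproduced in this copy, so the counts below are hypothetical and are not the psychologist's data. The function names `expected_frequencies` and `contributions` are likewise chosen here for illustration.

```python
def expected_frequencies(observed):
    """Expected cell counts under independence: E_ij = R_i * C_j / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def contributions(observed, expected):
    """Per-cell contributions (O - E)^2 / E to the chi-squared statistic."""
    return [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]


# Hypothetical 3 x 2 table (illustrative counts, NOT the survey data).
observed = [
    [20, 30],
    [25, 15],
    [5, 5],
]

expected = expected_frequencies(observed)
cells = contributions(observed, expected)

# The test statistic is the total of all the contributions.
chi_squared = sum(sum(row) for row in cells)

print("expected:", expected)
print("contributions:", cells)
print("chi-squared:", chi_squared)
```

For these made-up counts the row totals are 50, 40 and 10 and both column totals are 50, so each expected frequency is simply half of its row total.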
For the example, the total of the contributions, which is the value of the test statistic, is:

[Equation goes here - download the original to see it.]

In order to compare this with a critical value, we need to know the degrees of freedom of the statistic. Within any row, one degree of freedom is lost, because once the row total is fixed the last frequency is determined by the other entries in the row; the free entries in a row therefore number the columns less one. Likewise, the free entries in a column number the rows less one. Thus, in general,

$$\nu = (\text{number of rows} - 1) \times (\text{number of columns} - 1)$$

Hence, in this example,

$$\nu = (3 - 1) \times (2 - 1) = 2$$

The critical value of $\chi^2$ at the 5% level with 2 degrees of freedom is 5.991, and the test statistic exceeds it. Therefore, we reject the null hypothesis $H_0$ (independence) and accept the alternative $H_1$ (association). The result is significant at the 0.05 or 5% level. This means that, if speed and mode were really independent, a difference this large between observed and expected frequencies would arise by chance with probability less than 5 in 100. According to these results the way you use your calculator does affect the speed with which you do a calculation.

In the case of $2 \times 2$ contingency tables you are advised to use Yates' correction, i.e. to use the test statistic

$$\sum \frac{(|O - E| - 0.5)^2}{E}$$

instead of

$$\sum \frac{(O - E)^2}{E}$$

where $O$ and $E$ are the observed and expected frequencies. This is especially important with small samples when some or all expected frequencies are less than 5. For large samples, corrected and uncorrected chi-square values may be practically the same, and as a consequence the correction may be ignored.

[Example goes here - download the original to see it.]
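As a rough check on the last two steps, the sketch below computes the degrees of freedom, the 5% critical value, and a Yates-corrected statistic. It rests on several assumptions: the $2 \times 2$ counts are hypothetical, the function names are invented for this illustration, and the critical-value shortcut is valid only for 2 degrees of freedom, where the upper-tail probability of $\chi^2$ is $e^{-x/2}$. For other degrees of freedom a statistical table or library would be needed.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """(number of rows - 1) * (number of columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def critical_value_two_df(alpha):
    """Upper-tail critical value, valid for 2 degrees of freedom ONLY.

    With 2 degrees of freedom P(X > x) = exp(-x / 2), so x = -2 ln(alpha).
    """
    return -2.0 * math.log(alpha)


def chi_squared_2x2(observed, yates=True):
    """Chi-squared statistic for a 2 x 2 table.

    With yates=True each |O - E| is reduced by 0.5 before squaring
    (the simple textbook form of Yates' correction).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    shift = 0.5 if yates else 0.0
    total = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n  # expected frequency under independence
            total += (abs(observed[i][j] - e) - shift) ** 2 / e
    return total


# The 3 x 2 table of the example has (3 - 1) * (2 - 1) = 2 degrees of freedom.
df = degrees_of_freedom(3, 2)
critical = critical_value_two_df(0.05)

# Hypothetical 2 x 2 table (illustrative counts, not from the article).
table = [
    [12, 8],
    [6, 14],
]
uncorrected = chi_squared_2x2(table, yates=False)
corrected = chi_squared_2x2(table, yates=True)

print("degrees of freedom:", df)
print("5% critical value (2 d.f.):", round(critical, 3))
print("uncorrected:", round(uncorrected, 3))
print("Yates-corrected:", round(corrected, 3))
```

The corrected value is always smaller than the uncorrected one when every $|O - E|$ exceeds 0.5, which makes the corrected test more conservative for small samples.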

Related articles: (1) Hypothesis testing, (2) The Chi squared distribution