Student's t-test: automatic calculation online. Determining the significance of differences using Student's t-test

In most cases, Student's t-test is used to compare the means of two independent samples (p. 91). Because Student's test is parametric, it can be applied only if the results of the study are measurements on a ratio scale (p. 90).

Student's criterion is denoted t and is calculated by the formula*:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{m_1^2 + m_2^2}}$$

In cases where the number of observations (n) is more than 500, the significance level at p = 0.05 is reached at t = 1.96, the significance levels at p = 0.01 or p = 0.001, respectively, are achieved at t = 2.59 and t = 3.29.

If the number of observations is less than 500, the required value of t for different significance levels is determined from Table 10.

Before turning to the table, it is necessary to determine the number of degrees of freedom (f). This term refers to the number of independent quantities involved in forming a given parameter. The rules for determining the degrees of freedom are presented in various manuals on mathematical statistics (Yu.K. Demyanenko, 1968). When Student's t is calculated, the total number of degrees of freedom (f) is equal to n1 + n2 - 2.

For example, when the results shown by skiers of the experimental and control groups over a control distance were compared, the following data were obtained: in the experimental group (n = 12) the mean was x = 34.6 s with an error of the mean m = 0.47 s; in the control group (n = 14) the corresponding values were x = 37.3 s and m = 0.49 s.

Substituting the values into the formula, we get the value of t:

$$t = \frac{37.3 - 34.6}{\sqrt{0.49^2 + 0.47^2}} = \frac{2.7}{0.68} = 3.97$$

After determining the number of degrees of freedom (f = 12 + 14 - 2 = 24), we find the tabulated value of t. The calculated value of 3.97 exceeds the tabulated value for the 99% confidence level. Hence, the differences between the results of the two compared groups are significant at the level p < 0.01.
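The skier calculation can be checked with a few lines of Python. This is a minimal sketch: the function name `t_from_means` is illustrative and not part of the source, and the critical value 2.80 is the 99% entry of Table 10 for f = 24.

```python
import math


def t_from_means(mean_1, mean_2, error_1, error_2):
    """Student's t for two independent samples, from means and errors of the means."""
    return abs(mean_1 - mean_2) / math.sqrt(error_1 ** 2 + error_2 ** 2)


# Skier example from the text: control group first, experimental group second.
t_value = t_from_means(37.3, 34.6, 0.49, 0.47)
degrees_of_freedom = 12 + 14 - 2
critical_99 = 2.80  # Table 10, f = 24, 99% confidence level

print(round(t_value, 2), degrees_of_freedom, t_value > critical_99)
```

Without intermediate rounding the result is about 3.98; the value 3.97 in the text comes from rounding the denominator to 0.68 first. The conclusion is the same either way.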



With relatively large numbers of measurements, the differences are conventionally considered significant if the difference between the arithmetic means is equal to or greater than three times its error. In this case, the significance of the differences is determined by the following inequality:

$$\bar{x}_E - \bar{x}_K \geq 3\sqrt{m_E^2 + m_K^2}$$
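The three-errors rule is easy to express as a yes/no check. The sketch below assumes the rule is applied to the absolute difference of the means; the function name is illustrative, and the input values are the skier data quoted earlier in the text.

```python
import math


def differs_by_three_errors(mean_e, mean_k, error_e, error_k):
    """True if the means differ by at least three errors of the difference."""
    error_of_difference = math.sqrt(error_e ** 2 + error_k ** 2)
    return abs(mean_e - mean_k) >= 3 * error_of_difference


# Experimental vs. control group of skiers (values quoted in the text).
is_significant = differs_by_three_errors(34.6, 37.3, 0.47, 0.49)
print(is_significant)
```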

In the example above, the results of people from different groups, that is, independent samples, were compared. When results obtained at the beginning and at the end of an experiment in the same group are compared, that is, when the samples are dependent, Student's t must not be calculated by the usual formula. In this case it should be calculated by the formula:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{m_1^2 + m_2^2 - 2 r m_1 m_2}}$$

where r is the correlation coefficient between the initial and final results for the studied trait.
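A short sketch shows how the correlation term changes the result. It assumes the denominator is the square root of m1² + m2² - 2·r·m1·m2, the usual form for dependent samples; the function name and the numbers are hypothetical and chosen only for illustration.

```python
import math


def t_dependent(mean_1, mean_2, error_1, error_2, r):
    """Student's t for dependent samples, from means, errors of the means and r."""
    variance_of_difference = error_1 ** 2 + error_2 ** 2 - 2 * r * error_1 * error_2
    return abs(mean_1 - mean_2) / math.sqrt(variance_of_difference)


# Hypothetical values: the same difference of means with and without correlation.
t_uncorrelated = t_dependent(10.0, 9.0, 0.5, 0.5, r=0.0)
t_correlated = t_dependent(10.0, 9.0, 0.5, 0.5, r=0.5)
print(round(t_uncorrelated, 3), round(t_correlated, 3))
```

With r = 0 the expression reduces to the formula for independent samples; a positive r shrinks the denominator and raises t, which is why the usual formula understates the significance of before/after differences.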

Table 10

Critical values of t (Student's criterion)

f       Confidence level (P)
        95%      99%      99.9%
1       12.71    63.66    636.62
2       4.30     9.93     31.60
3       3.18     5.84     12.94
4       2.78     4.60     8.61
5       2.57     4.03     6.86
6       2.45     3.71     5.96
7       2.37     3.50     5.41
8       2.31     3.36     5.04
9       2.26     3.25     4.78
10      2.23     3.17     4.59
11      2.20     3.11     4.44
12      2.18     3.06     4.32
13      2.16     3.01     4.22
14      2.15     2.98     4.14
15      2.13     2.95     4.07
16      2.12     2.92     4.02
17      2.11     2.90     3.97
18      2.10     2.88     3.92
19      2.09     2.86     3.88
20      2.09     2.85     3.85
21      2.08     2.83     3.82
22      2.07     2.82     3.79
23      2.07     2.81     3.77
24      2.06     2.80     3.75
25      2.06     2.79     3.73
26      2.06     2.78     3.71
27      2.05     2.77     3.69
28      2.05     2.76     3.67
29      2.04     2.76     3.66
30      2.04     2.75     3.65
40      2.02     2.70     3.55
50      2.01     2.68     3.50
60      2.00     2.66     3.46
80      1.99     2.64     3.42
100     1.98     2.63     3.39
120     1.98     2.62     3.37
200     1.97     2.60     3.34
500     1.96     2.59     3.31
∞       1.96     2.59     3.29
        Significance level (p)
        0.05     0.01     0.001
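Looking up a critical value can be automated once the table is stored as data. The sketch below holds only three rows, on the assumption that the rows of Table 10 run f = 1, 2, 3, ... from the top (so that the tenth row is f = 10 and the twenty-fourth is f = 24). The names `CRITICAL_T` and `strictest_level` are illustrative.

```python
# Selected rows of Table 10: f -> critical t at p = 0.05, 0.01 and 0.001.
# The f labels assume the table rows are numbered 1, 2, 3, ... from the top.
CRITICAL_T = {
    5: (2.57, 4.03, 6.86),
    10: (2.23, 3.17, 4.59),
    24: (2.06, 2.80, 3.75),
}
P_LEVELS = (0.05, 0.01, 0.001)


def strictest_level(t_value, f):
    """Return the smallest tabulated p whose critical value t reaches, or None."""
    reached = None
    for p, critical in zip(P_LEVELS, CRITICAL_T[f]):
        if abs(t_value) >= critical:
            reached = p
    return reached


print(strictest_level(2.5, 10))  # reaches 2.23 but not 3.17
print(strictest_level(1.0, 10))  # below every critical value
```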

Formulation of conclusions

(conclusion)

At the end of the work, conclusions are drawn. The formulation of conclusions, along with the formulation of the introduction, is one of the most difficult and critical stages in the design of any term paper.

The conclusions should reflect the most significant results of the study.

There are several common mistakes in drawing conclusions. Often a student constructs a sentence in such a way that it sounds like a declaration of the results of the work he has done (“studied”, “developed”, etc.). For example:

“In the course of the study, the main provisions of the experimental methodology were determined ...” or “Indicators were identified that allow evaluating the communicative skills of students of pedagogical specialties in the implementation of physical education and health work with schoolchildren ...”.

For these statements to be conclusions, the phrases should have been constructed something like this: “The provisions of the experimental methodology formulated by us allow ...” and, accordingly, “Of the selected indicators, the most informative for assessing the level of communication skills of students of pedagogical specialties are ...”.

Another common mistake is to state in a conclusion something obvious that requires no special research. For example:

“In physical exercises with schoolchildren, it is necessary to take into account the developmental features of a teenager of this age.”

Sometimes the conclusion turns out to be completely meaningless. This is usually the first conclusion that a student makes on the basis of literature analysis. For example:

"The analysis of scientific and methodological literature showed that in the theory of physical education the question of the use of simulators in the sports training of swimmers has not yet been fully disclosed."

The conclusions should informatively reflect the work done by the student, but should not be verbose.


REQUIREMENTS FOR THE FORMATTING OF TERM PAPERS

The following structural components should be presented in the final qualifying work:

· title page;

· introduction;

· main text (chapter 1, chapter 2);

· conclusions (conclusion);

· bibliography;

· appendices (if needed).

The optimal length of a term paper is 40-50 pages of typewritten text at 1.5 line spacing (including figures, tables, graphs, bibliography and appendices).

Font size 14 Times New Roman.

The work is drawn up in a computer or handwritten form (the second option is less desirable).

In the computer version, the text is printed with one-and-a-half line spacing on one side of a standard A4 sheet (210x297 mm). The page margins should be: left 30 mm, right 10 mm, top 20 mm, bottom 25 mm.

Tables, figures, drawings, diagrams, graphs must be made on standard A4 sheets (210x297 mm). Signatures and explanations must be on the front side.

All pages of the final attestation work, including illustrations and appendices, are numbered consecutively from the title page to the last page, without omissions or repetitions. The title page counts as the first page, but the number "1" is not printed on it; the number "2" is printed on the next page, and so on. The page number is placed in the middle of the bottom margin.

All material of final attestation works in accordance with the table of contents (plan) is divided into paragraphs. The title of paragraphs should correspond to the content and be printed as a heading in lowercase letters without underlining.

Standard, generally accepted abbreviations such as "etc.", "see" and "p." are used in the work.

A sample of the design of tables and illustrations is given in Appendix 3.

Title page

The title page carries information about the work. It indicates the name of the institution where the work was performed; the surname, name and patronymic of the author; the title; the surname, name, patronymic, academic degree and academic title of the supervisor (consultant); and the city and year. A sample title page of a final attestation work is shown in Figure 1.

Federal State Autonomous Educational Institution

Higher education

"Nizhny Novgorod State University. N.I. Lobachevsky"

Arzamas branch

Faculty of Natural Geography

Department of Physical Culture

Coursework by discipline

"Theory and methods of physical culture"

on the topic:

"Methodological features of physical culture and recreation classes

with preschool children

Completed:

Ivanov A.V.,

student of direction 034300 (49.03.01)

Physical Culture

profile "Management in the field of

physical culture"

form of education - part-time

(full term of study /

accelerated learning program)

1 (2) course of study, group 11(12)

Supervisor:

PhD, Associate Professor Sidorova T.V.

Arzamas

Fig. 1. Sample title page of a term paper

Graduation papers use the word "table of contents", not "content". Table of contents is an index of headings (chapters) of a single work, while content is an index of the titles of the various works included in the publication. From the point of view of the culture of reading, the table of contents is placed at the beginning of the work: it is from the table of contents that the reader begins his acquaintance with the study.

When designing a table of contents, each subordinate heading should be indented to the right of the previous main heading to which it refers, placing the first digit under the capital letter of the heading to which it directly refers. All headings of equal grade should start from the same vertical line. Such a construction of the plan allows you to clearly see the subordination of all the material. For example:

Introduction
The problem of forming students' knowledge in order to increase their motivation for physical exercises
1. Physical culture of students at the present stage
   1.1. Change of priorities in physical exercises for students in the 20-90s
   1.2. Direction of modern education of students in the field of physical culture
2. Formation of students' motivation for physical exercises
   2.1. The attitude of students to physical exercises
Conclusion
Bibliography
Appendices

In order to make the indents in the table of contents the same and align the page numbers, it is advisable to use the table format, the lines of which are set invisible in the parameters.

In the final attestation works, the rubrication of the text is of great importance. Headings reveal the structure of the text, show the connection and interdependence of sections and subsections.

Headings of paragraphs should accurately reflect the content of the text related to them. They should not reduce or expand the amount of semantic information that they contain.

The headings of paragraphs and subparagraphs are located in the middle of a separate line and are printed in bold roman type, in lowercase letters, except for the first capital letter (Fig. 2).

1.1. The concept of posture

Fig. 2. Sample paragraph heading

The heading is separated from the text following it by one interval (one non-printable character), and from the preceding text by two intervals (two non-printable characters standing one under the other). The title cannot be the last line on the page.

The paragraph indent is set through the options "Format" → "Paragraph" → "Indents and spacing" → "First line" → "Indent" → 1.25 cm (1.27 cm). The paragraph indent must not be set by pressing keys!

Highlights

The subordination of content within a paragraph and the delimitation of parts and elements of the text by significance are shown by font emphasis (a different weight, italics, or letter spacing).

In scientific works, it is customary to use the subordination of fonts (Table 11).

The paired Student's t-test is one of the modifications of Student's method, used to determine the statistical significance of differences in paired (repeated) measurements.

1. History of the development of the t-test

The t-test was developed by William Gosset to assess the quality of beer at Guinness. Because of obligations to the company not to disclose trade secrets, Gosset's article was published in 1908 in the journal Biometrika under the pseudonym "Student".

2. What is the paired Student's t-test used for?

Paired Student's t-test is used to compare two dependent (paired) samples. Dependent are measurements taken in the same patients, but at different times, for example, blood pressure in hypertensive patients before and after taking an antihypertensive drug. The null hypothesis states that there are no differences between the compared samples, while the alternative hypothesis states that there are statistically significant differences.

3. When can paired Student's t-test be used?

The main condition is that the samples are dependent, that is, the compared values must be obtained by repeated measurements of one parameter.

As with the comparison of independent samples, the paired t-test requires that the original data be normally distributed. If this condition is not met, methods of nonparametric statistics should be used to compare the samples, such as the sign test (G-test) and the Wilcoxon T-test.

Paired t-test can only be used when comparing two samples. If you need to compare three or more repeated measurements, use one-way analysis of variance for repeated measures.

4. How to calculate paired Student's t-test?

The paired Student's t-test is calculated using the following formula:

$$t = \frac{M_d}{\sigma_d / \sqrt{n}}$$

where M_d is the arithmetic mean of the differences between the values measured before and after, σ_d is the standard deviation of the differences, and n is the number of subjects.

5. How to interpret the value of Student's t-test?

The interpretation of the obtained value of the paired Student's t-test does not differ from the evaluation of the t-test for unrelated populations. First of all, it is necessary to find the number of degrees of freedom f according to the following formula:

f = n - 1

After that, we determine the critical value of Student's t-test for the required significance level (for example, p < 0.05) and the given number of degrees of freedom f from the table (see below).

We compare the critical and calculated values of the criterion:

  • If the calculated value of the paired Student's t-test is equal to or greater than the critical value found in the table, we conclude that the differences between the compared values are statistically significant.
  • If the calculated value of the paired Student's t-test is smaller than the tabulated value, the differences between the compared values are not statistically significant.

6. An example of calculating the Student's t-test

To evaluate the effectiveness of a new hypoglycemic agent, blood glucose levels were measured in patients with diabetes mellitus before and after taking the drug. As a result, the following data were obtained:

Solution:

1. Calculate the difference of each pair of values (d):

Patient   Blood glucose level, mmol/l                      Difference (d)
No.       before taking the drug   after taking the drug
1         9.6                      5.7                     3.9
2         8.1                      5.4                     2.7
3         8.8                      6.4                     2.4
4         7.9                      5.5                     2.4
5         9.2                      5.3                     3.9
6         8.0                      5.2                     2.8
7         8.4                      5.1                     3.3
8         10.1                     6.9                     3.2
9         7.8                      7.5                     0.3
10        8.1                      5.0                     3.1

2. Find the arithmetic mean of the differences using the formula:

$$M_d = \frac{\sum d}{n} = \frac{28.0}{10} = 2.8$$

3. Find the standard deviation of the differences from the mean by the formula:

$$\sigma_d = \sqrt{\frac{\sum (d - M_d)^2}{n - 1}} = \sqrt{\frac{9.50}{9}} \approx 1.03$$

4. Calculate the paired Student's t-test:

$$t = \frac{M_d}{\sigma_d / \sqrt{n}} = \frac{2.8}{1.03 / \sqrt{10}} \approx 8.6$$

5. Compare the obtained value of Student's t-test, 8.6, with the tabulated value, which for f = 10 - 1 = 9 degrees of freedom and the significance level p = 0.05 is 2.262. Since the obtained value is greater than the critical one, we conclude that there are statistically significant differences in blood glucose levels before and after taking the new drug.
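The glucose example can be reproduced with the Python standard library. In this sketch the differences are recomputed from the before and after columns rather than copied from the table, and `statistics.stdev` is used because it divides by n - 1, as the paired test requires. The variable names are illustrative.

```python
import math
import statistics

# Blood glucose, mmol/l, for the ten patients in the example.
before = [9.6, 8.1, 8.8, 7.9, 9.2, 8.0, 8.4, 10.1, 7.8, 8.1]
after = [5.7, 5.4, 6.4, 5.5, 5.3, 5.2, 5.1, 6.9, 7.5, 5.0]

differences = [b - a for b, a in zip(before, after)]
n = len(differences)
mean_d = statistics.mean(differences)
sd_d = statistics.stdev(differences)  # sample standard deviation (divisor n - 1)
t_value = mean_d / (sd_d / math.sqrt(n))

critical_value = 2.262  # tabulated value for f = 9, p = 0.05 (from the text)
print(round(mean_d, 2), round(sd_d, 2), round(t_value, 1), t_value > critical_value)
```

The computed t of about 8.6 matches the value quoted in step 5.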

In psychological research, the most common task is to identify differences between two or more groups of characteristics. Establishing such differences at the level of arithmetic means is part of the analysis of primary statistics. However, the question arises of how reliable these differences are and whether they can be extended (extrapolated) to the entire population. To answer it, the t-test (Student's criterion) is most often used, provided the distribution is normal or close to normal. It is designed to find out how significantly the indicators of one sample of subjects differ from those of another (for example, when the subjects of one group score higher on a test than the members of the other). This is a parametric criterion with two main forms:

1) the unpaired (independent) t-test, designed to find out whether there are differences between the scores obtained when the same test is given to two groups made up of different people. For example, it can be used to compare the level of intelligence, neuropsychic stability or anxiety of successful and unsuccessful students, or to compare students of different classes, ages, social levels, etc. on these characteristics. The samples may differ by sex or nationality, or may be subsamples of the studied samples selected by a certain attribute. The criterion is called "unpaired" because the compared groups are made up of different people;

2) the paired (dependent) t-test, used to compare the indicators of two groups between whose elements there is a specific relationship. This means that each element of the first group corresponds to an element of the second group that is similar to it in a certain parameter of interest to the researcher. Most often, the parameters of the same persons are compared before and after a certain event or intervention (for example, in a longitudinal study or a formative experiment). This criterion is therefore used to compare the performance of the same individuals before and after an examination, an experiment, or the passage of a certain time.

If the data are not normally distributed, nonparametric equivalents of the t-test are used: the Mann-Whitney test, equivalent to the unpaired t-test, and the Wilcoxon test for paired samples, equivalent to the paired t-test.

With t-tests and their nonparametric equivalents, only the results of two groups obtained with the same test can be compared. However, in some cases it is necessary to compare several groups or several kinds of assessment. This can be done in stages by dividing the task into several pairwise comparisons (for example, to compare groups A, B and C on the results of tests X and Y, first compare groups A and B on test X, then A and B on test Y, then A and C on test X, and so on). However, this is a very time-consuming method, so the more complex method of analysis of variance is used instead.
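How quickly the number of pairwise comparisons grows can be seen by listing them. The sketch assumes three groups labelled A, B and C and two tests labelled X and Y; the labels are placeholders.

```python
from itertools import combinations

groups = ["A", "B", "C"]  # placeholder group labels
tests = ["X", "Y"]        # placeholder test labels

# One t-test is needed for every pair of groups on every test.
comparisons = [
    (first, second, test)
    for test in tests
    for first, second in combinations(groups, 2)
]

for first, second, test in comparisons:
    print(f"compare {first} and {second} on test {test}")
print(len(comparisons))
```

Three groups and two tests already require six separate t-tests, which illustrates why analysis of variance is preferred for designs of this kind.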

The method of assessing the significance of differences between arithmetic means with the fairly effective parametric Student's test is designed to solve one of the problems most often encountered in data processing: establishing the significance of differences between two or more series of values. Such an assessment is often needed in the comparative analysis of polar groups, which are distinguished by the different severity of a certain target feature (characteristic) of the phenomenon under study. As a rule, the analysis begins with the calculation of the primary statistics of the selected groups; then the significance of the differences is assessed. Student's t-test is calculated by the formula:

The value of Student's test for three levels of confidence (statistical) significance (p) is given in reference books on mathematical statistics. The number of degrees of freedom is determined by the formula:

As sample sizes decrease (n < 10), Student's test becomes sensitive to the shape of the distribution of the studied trait in the general population. Therefore, in doubtful cases it is recommended to use nonparametric methods or to compare the obtained values with the critical values (Table 2.17) for the highest level of significance.

The differences are judged significant if the calculated value of t exceeds the tabulated value for the given number of degrees of freedom (f). Publications and scientific reports indicate the highest of the three significance levels reached: p < 0.05, p < 0.01 or p < 0.001.

Whatever the numerical value of the criterion of significance of the difference between means, it evaluates not the size of the difference (which is given by the difference between the means itself) but only its statistical significance, that is, the right to extend the conclusion that a difference exists, obtained by comparing samples, to the phenomenon (process) as a whole. A low calculated value of the criterion cannot serve as proof that there is no difference between two features (phenomena), because its value depends not only on the size of the difference between the means but also on the sizes of the compared samples. It indicates not the absence of a difference but the fact that, with such a sample size, the difference is statistically unreliable: there is a high chance that under these conditions it is random.
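The dependence of t on sample size can be shown numerically. This sketch assumes two independent groups of equal size n with the same standard deviation, and the standard relation m = sd / √n for the error of the mean, which the text does not state explicitly. The difference of means and the standard deviation are hypothetical.

```python
import math


def t_equal_groups(mean_difference, sd, n):
    """t for two independent groups of equal size n and equal standard deviation."""
    error_of_mean = sd / math.sqrt(n)  # m = sd / sqrt(n) for each group
    return abs(mean_difference) / math.sqrt(2 * error_of_mean ** 2)


# The same hypothetical difference of means, measured on small and large groups.
t_small = t_equal_groups(0.6, 1.5, 10)
t_large = t_equal_groups(0.6, 1.5, 100)
print(round(t_small, 3), round(t_large, 3))
```

The difference between the means is identical in both calls, yet t grows by a factor of √10 when each group is ten times larger, so a low t on small samples says little about whether a difference exists.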

Table 2.17. Confidence limits for Student's t-test (t-test) for f degrees of freedom

of the average task completion time in the second attempt (compared to the first trial) is not significant.

This statement is not equivalent to a statement about the statistical homogeneity of the two compared samples. In addition, applying Student's test to such unequal samples is not quite correct mathematically and, of course, affects the final conclusion that the difference between Xav = 9.1 and Xav = 8.5 is not significant. This criterion does not evaluate how close two averages are; it decides whether or not the difference between them can be attributed to chance (at a given level of significance).

where f is the degree of freedom, which is defined as

Example . Two groups of students were trained according to two different methods. At the end of the training, they were given a test throughout the course. It is necessary to assess how significant the differences in the acquired knowledge are. The test results are presented in table 4.

Table 4

Calculate the sample mean, variance and standard deviation:

Determine the value of t p by the formula t p = 0.45

According to table 1 (see Appendix), we find the critical value t k for the significance level p = 0.01

Conclusion: since the calculated value of the criterion is less than the critical value (0.45 < 2.88), the hypothesis H0 is confirmed: there are no significant differences between the teaching methods at the significance level of 0.01.

Algorithm for calculating Student's t-test for dependent samples of measurements

1. Determine the calculated value of the t-criterion using the formula

, where

2. Calculate the degree of freedom f

3. Determine the critical value of the t-test according to Table 1 of the Appendix.

4. Compare the calculated and critical values of the t-criterion. If the calculated value is greater than or equal to the critical value, the hypothesis H0 of equality of the means in the two samples of measurements is rejected. In all other cases, it is accepted at the given level of significance.

Mann-Whitney U test

Purpose of the criterion

The criterion is designed to assess differences between two samples in the level of any quantitatively measured trait. It can detect differences between small samples, when n < 30.

Description of the criterion

This method determines whether the zone of overlapping values between two series is small enough. The smaller this zone, the more likely it is that the differences are significant. The empirical value of the U criterion reflects how large the zone of overlap between the series is. Therefore, the smaller U is, the more likely it is that the differences are significant.

Hypotheses

H0: The level of the trait in group 2 is not lower than the level of the trait in group 1.

H1: The level of the trait in group 2 is lower than the level of the trait in group 1.

Algorithm for calculating the Mann-Whitney criterion (u)

    1. Transfer all the data of the subjects to individual cards.

    2. Mark the cards of the subjects of sample 1 with one color, say red, and all the cards from sample 2 with another, for example, blue.

    3. Lay out all the cards in a single row in ascending order of the trait, regardless of which sample they belong to, as if we were working with one large sample.


$$U = n_1 n_2 + \frac{n_x (n_x + 1)}{2} - T_x$$

where n_1 is the number of subjects in sample 1;

n_2 is the number of subjects in sample 2;

T_x is the larger of the two rank sums;

n_x is the number of subjects in the group with the larger sum of ranks.

9. Determine the critical values of U from Table 2 (see Appendix).

If U_emp > U_cr (p = 0.05), the hypothesis H0 is accepted. If U_emp ≤ U_cr, it is rejected. The smaller the value of U, the higher the reliability of the differences.
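The ranking and the calculation of U can be sketched in Python. The sketch assumes the formula U = n1·n2 + n_x(n_x + 1)/2 - T_x, which matches the symbols n1, n2, T_x and n_x defined above, and it gives tied values the mean of the ranks they occupy. The function names and the sample data are hypothetical.

```python
def average_ranks(values):
    """Rank values in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def mann_whitney_u(sample_1, sample_2):
    """Empirical U from the larger rank sum T_x and the size n_x of that group."""
    n1, n2 = len(sample_1), len(sample_2)
    ranks = average_ranks(list(sample_1) + list(sample_2))
    rank_sum_1, rank_sum_2 = sum(ranks[:n1]), sum(ranks[n1:])
    if rank_sum_1 >= rank_sum_2:
        t_x, n_x = rank_sum_1, n1
    else:
        t_x, n_x = rank_sum_2, n2
    return n1 * n2 + n_x * (n_x + 1) / 2 - t_x


# Hypothetical scores: completely separated groups and interleaved groups.
u_separated = mann_whitney_u([1, 2, 3], [4, 5, 6])
u_interleaved = mann_whitney_u([1, 3, 5], [2, 4, 6])
print(u_separated, u_interleaved)
```

Completely separated samples give U = 0, the smallest possible value, while interleaved samples give a larger U, in line with the rule that a smaller U means more reliable differences.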

Example. Compare the effectiveness of two teaching methods in two groups. The test results are presented in table 5.

Table 5

Let's transfer all the data to another table, highlighting the data of the second group with an underline and make the ranking of the total sample (see the ranking algorithm in the guidelines for task 3).

Values

Find the rank sums of the two samples and choose the larger of them: T_x = 113.

Calculate the empirical value of the criterion by formula 2: U_p = 30.

Determine the critical value of the criterion from Table 2 of the Appendix at the significance level p = 0.05: U_k = 19.

Conclusion: since the calculated value of the criterion U is greater than the critical value at the significance level p = 0.05 (30 > 19), the hypothesis of the equality of the means is accepted and the differences between the teaching methods are insignificant.

The method tests the hypothesis that the mean values of the two general populations from which the compared dependent samples are drawn differ from each other. The assumption of dependence most often means that the trait is measured twice in the same sample, for example, before and after an intervention. In the general case, each member of one sample is matched with a member of the other sample (they are combined in pairs) so that the two data series are positively correlated with each other. Weaker kinds of dependence between samples: sample 1 consists of husbands and sample 2 of their wives; sample 1 consists of one-year-old children and sample 2 of the twins of the children in sample 1, etc.

The statistical hypothesis tested, as in the previous case, is H0: M1 = M2 (the mean values in samples 1 and 2 are equal). When it is rejected, the alternative hypothesis that M1 is greater (or less) than M2 is accepted.

Initial Assumptions for statistical verification:

□ each representative of one sample (from one general population) is assigned a representative of another sample (from another general population);

□ the data of the two samples are positively correlated (paired);

□ the distribution of the trait under study in both samples corresponds to the normal law.

Initial data structure: there are two values ​​of the trait under study for each object (for each pair).

Restrictions: the distribution of the feature in both samples should not differ significantly from the normal one; the data of the two measurements corresponding to the one and the other sample are positively correlated.

Alternatives: the Wilcoxon T-test, if the distribution in at least one sample differs significantly from the normal one; Student's t-test for independent samples, if the data of the two samples are not positively correlated.

The formula for the empirical value of Student's t-test reflects the fact that the unit of analysis is the difference (shift) between the values of the feature in each pair of observations. Accordingly, for each of the N pairs of values the difference d_i = x_{1i} - x_{2i} is calculated first.

$$t_e = \frac{|M_d|}{\sigma_d / \sqrt{N}}, \quad df = N - 1 \qquad (3)$$

where M_d is the mean of the differences and σ_d is the standard deviation of the differences.

Calculation example:

Let's suppose that in the course of testing the effectiveness of the training, each of the 8 members of the group was asked the question "How often do your opinions coincide with the opinion of the group?" - twice, before and after the training. For answers, a 10-point scale was used: 1 - never, 5 - in half the cases, 10 - always. The hypothesis was tested that as a result of the training, the self-assessment of conformity (the desire to be like others in the group) of the participants will increase (α = 0.05). Let's make a table for intermediate calculations (Table 3).

Table 3

The arithmetic mean for the difference M d = (-6)/8= -0.75. Subtract this value from each d (the penultimate column of the table).

The formula for the standard deviation differs only in that d appears instead of X. We substitute all the necessary values, we get

σd = 0.886.

Step 1. Calculate the empirical value of the criterion using formula (3): mean difference M_d = -0.75; standard deviation σ_d = 0.886; t_e = 2.39; df = 7.

Step 2. Determine the p-level of significance from the table of critical values of Student's t-test. For df = 7, the empirical value lies between the critical values for p = 0.05 and p = 0.01. Therefore, p < 0.05.

df      p
        0.05     0.01     0.001
7       2.365    3.499    5.408

Step 3. Make a statistical decision and formulate a conclusion. The statistical hypothesis that the means are equal is rejected. Conclusion: the participants' self-assessed conformity increased statistically significantly after the training (at the significance level p < 0.05).
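The three steps can be retraced from the summary statistics alone. The sketch assumes that the empirical value is the absolute mean difference divided by σ_d / √N with df = N - 1, which reproduces the figures quoted in Step 1. The critical values are those listed for df = 7, and the variable names are illustrative.

```python
import math

mean_d = -0.75  # mean difference from the worked example
sd_d = 0.886    # standard deviation of the differences
n_pairs = 8     # number of participants measured twice

t_empirical = abs(mean_d) / (sd_d / math.sqrt(n_pairs))
df = n_pairs - 1

# Critical values of Student's t for df = 7, as listed in the text.
critical = {0.05: 2.365, 0.01: 3.499, 0.001: 5.408}
levels_reached = [p for p, value in critical.items() if t_empirical >= value]

print(round(t_empirical, 2), df, levels_reached)
```

Only the 0.05 level is reached, which agrees with the conclusion p < 0.05.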

Parametric methods also include the comparison of the variances of two samples using Fisher's F-test. Sometimes this method leads to valuable substantive conclusions, and when means are compared for independent samples, comparing the variances is a mandatory procedure.

To calculate F_emp, find the ratio of the variances of the two samples, with the larger variance in the numerator and the smaller in the denominator.

Comparison of variances. The method allows you to test the hypothesis that the variances of the two general populations from which the compared samples are extracted differ from each other. Tested statistical hypothesis H 0: σ 1 2 = σ 2 2 (variance in sample 1 is equal to the variance in sample 2). When it is rejected, an alternative hypothesis is accepted that one variance is greater than the other.

Initial Assumptions: two samples are drawn randomly from different general populations with a normal distribution of the trait under study.

Initial data structure: the trait being studied is measured in objects (subjects), each of which belongs to one of the two compared samples.

Restrictions: The distributions of the feature in both samples do not differ significantly from the normal one.

Alternative method: Levene's test, which does not require checking the assumption of normality (used in the SPSS program).

Formula for the empirical value of Fisher's F-test:

$$F_e = \frac{\sigma_1^2}{\sigma_2^2}, \quad df_{num} = N_1 - 1, \quad df_{den} = N_2 - 1 \qquad (4)$$

where σ_1² is the larger variance and σ_2² is the smaller variance. Since it is not known in advance which variance is greater, the table of critical values for non-directional alternatives is used to determine the p-level. If F_e > F_cr for the corresponding numbers of degrees of freedom, then p < 0.05 and the statistical hypothesis of equal variances can be rejected (for α = 0.05).
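The variance ratio is straightforward to compute. The sketch below assumes the empirical value is the larger sample variance divided by the smaller, with degrees of freedom equal to each sample size minus one. It uses `statistics.variance`, which divides by n - 1. The function name and the data are hypothetical.

```python
import statistics


def f_ratio(sample_1, sample_2):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    variance_1 = statistics.variance(sample_1)
    variance_2 = statistics.variance(sample_2)
    if variance_1 >= variance_2:
        return variance_1 / variance_2, len(sample_1) - 1, len(sample_2) - 1
    return variance_2 / variance_1, len(sample_2) - 1, len(sample_1) - 1


# Hypothetical samples: the second is twice as spread out as the first.
narrow = [1.0, 2.0, 3.0, 4.0, 5.0]
wide = [2.0, 4.0, 6.0, 8.0, 10.0]
f_value, df_numerator, df_denominator = f_ratio(narrow, wide)
print(f_value, df_numerator, df_denominator)
```

The resulting F is then compared with the critical value from a table for non-directional alternatives at the corresponding degrees of freedom.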

Calculation example:

The children were given the usual arithmetic tasks, after which one randomly selected half of the students were told that they had not passed the test, and the rest - the opposite. Then each child was asked how many seconds it would take him to solve a similar problem. The experimenter calculated the difference between the time called by the child and the result of the completed task (in seconds). It was expected that reporting failure would cause some inadequacy in the child's self-esteem. The tested hypothesis (at the level of α = 0.005) was that the variance of the population of self-assessments does not depend on reports of success or failure (Н 0: σ 1 2=σ 2 2).

The following data was received:


Step 1. Calculate the empirical value of the criterion and the number of degrees of freedom using formulas (4):

Step 2. From the table of critical values of Fisher's F-test for non-directional alternatives, we look up the critical value for $df_{num} = 11$; $df_{denom} = 11$. However, the table gives critical values only for $df_{num} = 10$ and $df_{denom} = 12$. A greater number of degrees of freedom cannot be taken, so we take the critical value for $df_{num} = 10$: for p = 0.05, $F_{cr} = 3.526$; for p = 0.01, $F_{cr} = 5.418$.

Step 3. Statistical decision and substantive conclusion. Since the empirical value exceeds the critical value for p = 0.01 (and even more so for p = 0.05), in this case p < 0.01 and the alternative hypothesis is accepted: the variance in group 1 exceeds the variance in group 2 (p < 0.01). Consequently, after a report of failure, the inadequacy of self-esteem is higher than after a report of success.
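The decision rule of Steps 2 and 3 can be written out as a small sketch. The critical values 3.526 and 5.418 are the ones quoted in Step 2; the function name `f_test_decision` and the empirical values passed to it are hypothetical, since the example's data table is not reproduced here.

```python
def f_test_decision(f_emp, critical_values):
    """Return the smallest p-level whose critical value f_emp exceeds, else None.

    critical_values maps a p-level to the tabulated critical F value.
    """
    passed = [p for p, f_cr in critical_values.items() if f_emp > f_cr]
    return min(passed) if passed else None


# Critical values from Step 2 (df_num = 10)
critical = {0.05: 3.526, 0.01: 5.418}

# Hypothetical empirical values, for illustration only
for f_emp in (2.0, 4.0, 6.0):
    print(f_emp, f_test_decision(f_emp, critical))
```

A result of `None` means the hypothesis of equal variances is not rejected at any of the tabulated levels.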


Values of Student's t-test at significance levels of 0.10, 0.05 and 0.01

ν is the number of degrees of freedom

Standard values of Student's t-test (table columns: number of degrees of freedom, significance levels)

Table XI

Standard values of the Fisher test used to assess the significance of differences between two samples (table columns: degrees of freedom, significance level)

Student's t-test

Student's t-test is the general name for a class of methods for the statistical testing of hypotheses (statistical tests) based on Student's distribution. The most common applications of the t-test involve checking the equality of the means of two samples.

The t-statistic is usually constructed according to the following general principle: the numerator is a random variable with zero mathematical expectation (when the null hypothesis holds), and the denominator is the sample standard deviation of this random variable, obtained as the square root of the unbiased variance estimate.

History

This criterion was developed by William Gosset to evaluate the quality of beer at Guinness. Because of his obligation to the company not to disclose trade secrets (the Guinness management regarded the use of statistical methods in its work as such a secret), Gosset's article was published in 1908 in the journal Biometrika under the pseudonym "Student".

Data Requirements

To apply this criterion, it is necessary that the original data have a normal distribution. In the case of applying a two-sample test for independent samples, it is also necessary to comply with the condition of equality of variances. There are, however, alternatives to Student's t-test for situations with unequal variances.

The requirement that the data be normally distributed is necessary for the exact t-test. However, the t-statistic can be used even with other data distributions. In many cases this statistic asymptotically has a standard normal distribution $N(0,1)$, so quantiles of that distribution can be used. Often, however, even in this case the quantiles are taken not from the standard normal distribution but from the corresponding Student's distribution, as in the exact t-test. The two are asymptotically equivalent, but on small samples the confidence intervals based on Student's distribution are wider and more reliable.

One-sample t-test

It is used to test the null hypothesis $H_0: E(X) = m$ that the expectation $E(X)$ equals some known value $m$.

Obviously, under the null hypothesis $E(\overline{X}) = m$. Given the assumed independence of the observations, $V(\overline{X}) = \sigma^2/n$. Using the unbiased variance estimate $s_X^2 = \sum_{t=1}^{n} (X_t - \overline{X})^2/(n-1)$ we get the following t-statistic:

$$t = \frac{\overline{X} - m}{s_X/\sqrt{n}}$$

Under the null hypothesis, the distribution of this statistic is $t(n-1)$. Therefore, if the absolute value of the statistic exceeds the critical value of this distribution (at a given level of significance), the null hypothesis is rejected.
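As a minimal sketch of the one-sample statistic, the snippet below computes t and its degrees of freedom with the standard library only. The function name `one_sample_t` and the data are hypothetical; the critical value still has to be looked up separately.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, m):
    """Return (t, df) for H0: E(X) = m."""
    n = len(sample)
    s_x = stdev(sample)  # square root of the unbiased variance estimate
    t = (mean(sample) - m) / (s_x / sqrt(n))
    return t, n - 1


# Hypothetical data for illustration only
data = [2.0, 4.0, 6.0, 8.0, 10.0]
t, df = one_sample_t(data, 4.0)
print(t, df)
```

The null hypothesis is rejected when the absolute value of `t` exceeds the critical value of t(df) at the chosen significance level.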

Two-sample t-test for independent samples

Let there be two independent samples of sizes $n_1$, $n_2$ of normally distributed random variables $X_1$, $X_2$. It is necessary to test the null hypothesis of equality of the mathematical expectations of these random variables, $H_0: M_1 = M_2$, using sample data.

Consider the difference of the sample means $\Delta = \overline{X}_1 - \overline{X}_2$. Obviously, if the null hypothesis holds, $E(\Delta) = M_1 - M_2 = 0$. Because the samples are independent, the variance of this difference is $V(\Delta) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$. Then, using the unbiased variance estimate $s^2 = \frac{\sum_{t=1}^{n} (X_t - \overline{X})^2}{n-1}$, we obtain an unbiased estimate of the variance of the difference between the sample means: $s_\Delta^2 = \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}$. Therefore, the t-statistic for testing the null hypothesis is

$$t = \frac{\overline{X}_1 - \overline{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

Under the null hypothesis, this statistic has the distribution $t(df)$, where

$$df = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{(s_1^2/n_1)^2/(n_1-1) + (s_2^2/n_2)^2/(n_2-1)}$$
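The statistic and the degrees-of-freedom expression for unequal variances can be sketched as follows. This is an illustration with the standard library only; the name `unequal_variance_t` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def unequal_variance_t(x1, x2):
    """Return (t, df) for H0: M1 = M2 without assuming equal variances."""
    n1, n2 = len(x1), len(x2)
    v1 = variance(x1) / n1  # s_1^2 / n_1
    v2 = variance(x2) / n2  # s_2^2 / n_2
    t = (mean(x1) - mean(x2)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical data for illustration only
x1 = [2.0, 4.0, 6.0, 8.0, 10.0]
x2 = [1.0, 2.0, 3.0, 4.0, 5.0]
t, df = unequal_variance_t(x1, x2)
print(t, df)
```

Note that `df` is generally not an integer; it lies between min(n1, n2) - 1 and n1 + n2 - 2.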

Same variance case

If the variances of the two populations are assumed to be the same, then

$$V(\Delta) = \sigma^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)$$

Then the t-statistic is:

$$t = \frac{\overline{X}_1 - \overline{X}_2}{s_X \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_X = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}$$

This statistic has the distribution $t(n_1+n_2-2)$.
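A short sketch of the equal-variance case, again with the standard library only. The name `pooled_t` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x1, x2):
    """Return (t, df) for H0: M1 = M2 assuming a common variance."""
    n1, n2 = len(x1), len(x2)
    df = n1 + n2 - 2
    # Pooled standard deviation s_X
    s_x = sqrt(((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / df)
    t = (mean(x1) - mean(x2)) / (s_x * sqrt(1 / n1 + 1 / n2))
    return t, df


# Hypothetical data for illustration only
x1 = [2.0, 4.0, 6.0, 8.0, 10.0]
x2 = [1.0, 2.0, 3.0, 4.0, 5.0]
t, df = pooled_t(x1, x2)
print(t, df)
```

With equal sample sizes, as in this illustration, the value of t coincides with the unequal-variance statistic; only the degrees of freedom differ.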

Two-sample t-test for dependent samples

To calculate the empirical value of the t-criterion when testing a hypothesis about the differences between two dependent samples (for example, two measurements of the same test separated by a time interval), the following formula is used:

$$t = \frac{M_d}{s_d/\sqrt{n}}$$

where $M_d$ is the mean difference of the values, $s_d$ is the standard deviation of the differences, and $n$ is the number of observations.

This statistic has the distribution $t(n-1)$.
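The dependent-samples formula reduces to a one-sample statistic on the pairwise differences, which the sketch below makes explicit. The name `paired_t`, the data, and the choice to take differences as second measurement minus first are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, df) for paired measurements of equal length."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    d = [b - a for a, b in zip(first, second)]  # differences, second minus first
    n = len(d)
    return mean(d) / (stdev(d) / sqrt(n)), n - 1


# Hypothetical data for illustration only
before = [10.0, 12.0, 9.0, 11.0, 13.0]
after = [12.0, 15.0, 10.0, 14.0, 14.0]
t, df = paired_t(before, after)
print(t, df)
```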

Testing a Linear Constraint on Linear Regression Parameters

The t-test can also test an arbitrary (single) linear constraint on the parameters of a linear regression estimated by ordinary least squares. Let it be necessary to test the hypothesis $H_0: c^T b = a$. Obviously, under the null hypothesis $E(c^T \hat{b} - a) = c^T E(\hat{b}) - a = 0$. Here we use the unbiasedness of the least squares estimates of the model parameters, $E(\hat{b}) = b$. In addition, $V(c^T \hat{b} - a) = c^T V(\hat{b}) c = \sigma^2 c^T (X^T X)^{-1} c$. Replacing the unknown variance with its unbiased estimate $s^2 = ESS/(n-k)$, we get the following t-statistic:

$$t = \frac{c^T \hat{b} - a}{s \sqrt{c^T (X^T X)^{-1} c}}$$

Under the null hypothesis, this statistic has the distribution $t(n-k)$, so if the absolute value of the statistic exceeds the critical value, the null hypothesis of the linear constraint is rejected.
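To make the matrix expression concrete, here is a minimal sketch for the smallest non-trivial case, a model with an intercept and one regressor (k = 2), where $(X^T X)^{-1}$ has a closed form. The function name `constraint_t` and the data are hypothetical; a general implementation would use a linear algebra library instead.

```python
from math import sqrt


def constraint_t(x, y, c, a):
    """Return (t, df) for H0: c[0]*b0 + c[1]*b1 = a in the model y = b0 + b1*x."""
    n, k = len(x), 2
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(u * v for u, v in zip(x, y))
    det = n * sxx - sx * sx
    # Closed-form inverse of X^T X = [[n, sx], [sx, sxx]]
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    # Least squares estimates b = (X^T X)^{-1} X^T y, with X^T y = (sy, sxy)
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy
    # Sum of squared residuals (written ESS in the text)
    ess = sum((v - b0 - b1 * u) ** 2 for u, v in zip(x, y))
    s = sqrt(ess / (n - k))
    # Quadratic form c^T (X^T X)^{-1} c
    quad = (c[0] * (inv[0][0] * c[0] + inv[0][1] * c[1])
            + c[1] * (inv[1][0] * c[0] + inv[1][1] * c[1]))
    return (c[0] * b0 + c[1] * b1 - a) / (s * sqrt(quad)), n - k


# Hypothetical data for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
print(constraint_t(x, y, (0.0, 1.0), 0.0))  # H0: slope equals 0
print(constraint_t(x, y, (1.0, 3.0), 2.0))  # H0: b0 + 3*b1 equals 2
```

Choosing c = (0, 1) singles out one coefficient, so the same function also covers a test on an individual regression coefficient.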

Testing hypotheses about the coefficient of linear regression

A special case of a linear constraint is testing the hypothesis that the regression coefficient $b_j$ is equal to some value $a$. In this case, the corresponding t-statistic is:

$$t = \frac{\hat{b}_j - a}{s_{\hat{b}_j}}$$

where $s_{\hat{b}_j}$ is the standard error of the coefficient estimate, that is, the square root of the corresponding diagonal element of the covariance matrix of the coefficient estimates.

Under the null hypothesis, the distribution of this statistic is $t(n-k)$. If the absolute value of the statistic is higher than the critical value, then the difference of the coefficient from $a$ is statistically significant (non-random); otherwise it is insignificant (random, that is, the true coefficient is probably equal to or very close to the hypothesized value $a$).

Comment

The one-sample test for mathematical expectations can be reduced to testing a linear constraint on the linear regression parameters. In a one-sample test, this is a "regression" on a constant. Therefore, $s^2$ of the regression is a sample estimate of the variance of the random variable under study, the matrix $X^T X$ is equal to $n$, and the estimate of the "coefficient" of the model is the sample mean. From this we obtain the expression for the t-statistic given above for the general case.

Similarly, it can be shown that a two-sample test with equal variances also reduces to testing linear constraints. In a two-sample test, this is a "regression" on a constant and a dummy variable that identifies the subsample by its value (0 or 1): $y = a + bD$. The hypothesis of equality of the mathematical expectations of the samples can be formulated as the hypothesis that the coefficient $b$ of this model equals zero. It can be shown that the corresponding t-statistic for testing this hypothesis is equal to the t-statistic given for the two-sample test.
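The equivalence claimed for the equal-variance case can be checked numerically. The sketch below computes the pooled two-sample statistic and the t-statistic of the coefficient b in the regression of y on a constant and a dummy variable, and compares them. The function names, the data, and the coding D = 1 for the first sample (so that b estimates the first mean minus the second) are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x1, x2):
    """Two-sample t-statistic assuming a common variance."""
    n1, n2 = len(x1), len(x2)
    s = sqrt(((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2))
    return (mean(x1) - mean(x2)) / (s * sqrt(1 / n1 + 1 / n2))


def dummy_regression_t(x1, x2):
    """t-statistic of b in y = a + b*D, with D = 1 for x1 and D = 0 for x2."""
    y = list(x1) + list(x2)
    d = [1.0] * len(x1) + [0.0] * len(x2)
    n = len(y)
    d_bar, y_bar = mean(d), mean(y)
    sdd = sum((u - d_bar) ** 2 for u in d)
    b = sum((u - d_bar) * (v - y_bar) for u, v in zip(d, y)) / sdd
    a = y_bar - b * d_bar
    ess = sum((v - a - b * u) ** 2 for u, v in zip(d, y))  # squared residuals
    s2 = ess / (n - 2)
    return b / sqrt(s2 / sdd)  # standard error of b is sqrt(s2 / sdd)


# Hypothetical data for illustration only
x1 = [2.0, 4.0, 6.0, 8.0, 10.0]
x2 = [1.0, 2.0, 3.0, 4.0]
print(pooled_t(x1, x2), dummy_regression_t(x1, x2))
```

The two printed values agree up to floating-point rounding, including when the sample sizes differ.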

It can also be reduced to checking the linear constraint in the case of different variances. In this case, the variance of model errors takes two values. From this, one can also obtain a t-statistic similar to that given for the two-sample test.

Nonparametric analogs

An analogue of the two-sample test for independent samples is the Mann-Whitney U-test. For dependent samples, the analogues are the sign test and the Wilcoxon T-test.

Literature

Student. The probable error of a mean. // Biometrika. 1908. No. 6 (1). P. 1-25.

Links

On the criteria for testing hypotheses about the homogeneity of means on the website of the Novosibirsk State Technical University