Confidence interval. ABC of medical statistics

Let's build a confidence interval in MS EXCEL for estimating the mean of a distribution when the variance is known.

Of course, the choice of confidence level depends entirely on the task at hand. Thus, an air passenger's degree of confidence in the reliability of the aircraft should certainly be higher than a buyer's degree of confidence in the reliability of a light bulb.

Task Formulation

Assume that a sample of size n has been drawn from a population whose standard deviation σ is known. Based on this sample, we need to estimate the unknown mean of the distribution (μ) and construct the corresponding two-sided confidence interval.

Point Estimation

As is known from statistics, the sample mean (let's denote it X̄) is an unbiased estimate of the mean of this population and has the distribution N(μ; σ²/n).

Note: What if you need to build a confidence interval when the distribution is not normal? In this case the Central Limit Theorem (CLT) comes to the rescue: it says that for a sufficiently large sample of size n from a non-normal distribution, the sampling distribution of the statistic X̄ will approximately follow the normal distribution with parameters N(μ; σ²/n).

So, the point estimate of the mean of the distribution that we have is the sample mean X̄. Now let's turn to the confidence interval itself.

Building a confidence interval

Usually, knowing the distribution and its parameters, we can calculate the probability that a random variable will take a value from an interval we specify. Now let's do the opposite: find the interval into which the random variable falls with a given probability. For example, from the properties of the normal distribution it is known that, with a probability of 95%, a normally distributed random variable falls within approximately ±2 standard deviations of its mean value. This interval will serve as our prototype for the confidence interval.

Now let's see: do we know enough about the distribution to calculate this interval? To answer, we must specify the form of the distribution and its parameters.

We know the form of the distribution: it is the normal distribution (remember that we are talking about the sampling distribution of the statistic X̄).

The parameter μ is unknown to us (it is exactly what we need to estimate with the confidence interval), but we have its estimate X̄, calculated from the sample, which can be used.

The second parameter, the standard deviation of the sample mean, we will consider known: it equals σ/√n.

Because we do not know μ, we will build the interval of ±2 standard deviations not around the mean value, but around its known estimate X̄. That is, when calculating the confidence interval we will NOT assert that X̄ falls within ±2 standard deviations of μ with a probability of 95%; instead we will assert that the interval of ±2 standard deviations around X̄ covers μ, the mean of the general population from which the sample was drawn, with a probability of 95%. These two statements are equivalent, but the second one allows us to construct the confidence interval.

In addition, let's refine the interval: a random variable distributed according to the normal law falls, with a probability of 95%, within ±1.960 standard deviations, not ±2. This value can be calculated using the formula =NORM.S.INV((1+0.95)/2); see the example file.

Now we can formulate the probabilistic statement that will serve us in forming the confidence interval:
"The probability that the population mean lies no further than 1.960 standard deviations of the sample mean (σ/√n) from the sample average is 95%."

The probability value mentioned in the statement has a special name, the confidence level, which is related to the significance level α (alpha) by the simple expression confidence level = 1 − α. In our case the significance level is α = 1 − 0.95 = 0.05.

Now, based on this probabilistic statement, we write the expression for calculating the confidence interval:

X̄ ± Zα/2 · σ/√n

where Zα/2 is the upper α/2-quantile of the standard normal distribution (the value of a random variable z for which P(z ≥ Zα/2) = α/2).

Note: the upper α/2-quantile defines the half-width of the confidence interval in standard deviations of the sample mean. The upper α/2-quantile of the standard normal distribution is always greater than 0, which is very convenient.

In our case, for α = 0.05, the upper α/2-quantile equals 1.960. For other significance levels α (10%, 1%) the upper α/2-quantile Zα/2 can be calculated using the formula =NORM.S.INV(1-α/2) or, if the confidence level is known, =NORM.S.INV((1+confidence level)/2).

Usually, when constructing confidence intervals for the mean, only the upper α/2-quantile is used, not the lower one. This is possible because the standard normal distribution is symmetric about zero (its density is symmetric about its mean, i.e. 0). Therefore there is no need to calculate the lower α/2-quantile (it is simply called the α/2-quantile): it equals the upper α/2-quantile with a minus sign.
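As a quick cross-check outside Excel, the same quantiles can be computed with SciPy in Python (a sketch; the article itself works only with Excel formulas):

# Upper alpha/2-quantiles of the standard normal distribution,
# the Python counterpart of =NORM.S.INV(1-alpha/2).
from scipy.stats import norm

for conf in (0.90, 0.95, 0.99):
    alpha = 1 - conf
    z = norm.ppf(1 - alpha / 2)
    print(f"confidence level {conf:.0%}: Z_alpha/2 = {z:.3f}")
# prints approximately 1.645, 1.960 and 2.576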

Recall that, regardless of the shape of the distribution of x, the corresponding random variable X̄ is distributed approximately normally, N(μ; σ²/n), by the CLT. Therefore, in general, the above expression for the confidence interval is only approximate. If x itself is distributed according to the normal law N(μ; σ²), then X̄ is distributed exactly as N(μ; σ²/n) and the expression for the confidence interval is exact.

Calculation of confidence interval in MS EXCEL

Let's solve the problem.
The response time of an electronic component to an input signal is an important characteristic of the device. An engineer wants to construct a confidence interval for the average response time at a confidence level of 95%. From previous experience the engineer knows that the standard deviation of the response time is 8 ms. To estimate the response time the engineer made 25 measurements; the average value was 78 ms.

Solution: The engineer wants to know the response time of the electronic device, but he understands that the response time is not a fixed number but a random variable with its own distribution. So the best he can hope for is to determine the parameters and shape of this distribution.

Unfortunately, the problem statement does not tell us the form of the response-time distribution (it does not have to be normal), and its mean is also unknown. Only the standard deviation σ = 8 ms is known. Therefore, for now, we cannot calculate probabilities or construct a confidence interval.

However, although we do not know the distribution of an individual response time, we know from the CLT that the sampling distribution of the average response time is approximately normal (we will assume that the conditions of the CLT are met, because the sample size is large enough, n = 25).

Furthermore, the mean of this sampling distribution equals the mean of the distribution of a single response, i.e. μ, and its standard deviation (σ/√n) can be calculated using the formula =8/SQRT(25).

It is also known that the engineer obtained a point estimate of the parameter μ equal to 78 ms (X̄). Therefore, we can now calculate probabilities, because we know the form of the distribution (normal) and its parameters (X̄ and σ/√n).

The engineer wants to know the expected value μ of the response-time distribution. As stated above, this μ equals the expectation of the sampling distribution of the average response time. If we use the normal distribution N(X̄; σ/√n), then the desired μ will lie in the range X̄ ± 2·σ/√n with a probability of approximately 95%.

The significance level equals 1 − 0.95 = 0.05.

Finally, let's find the left and right boundaries of the confidence interval.
Left boundary: =78-NORM.S.INV(1-0.05/2)*8/SQRT(25) = 74.864
Right boundary: =78+NORM.S.INV(1-0.05/2)*8/SQRT(25) = 81.136

Equivalently, the boundaries can be computed directly with NORM.INV:
Left boundary: =NORM.INV(0.05/2, 78, 8/SQRT(25))
Right boundary: =NORM.INV(1-0.05/2, 78, 8/SQRT(25))

Answer: the confidence interval at the 95% confidence level with σ = 8 ms equals 78 ± 3.136 ms.
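The same calculation can be reproduced in Python as a sanity check (a sketch using SciPy; the function names are not part of the original Excel workbook):

from math import sqrt
from scipy.stats import norm

x_bar = 78.0   # sample mean response time, ms
sigma = 8.0    # known population standard deviation, ms
n = 25         # number of measurements
alpha = 0.05   # significance level for a 95% interval

z = norm.ppf(1 - alpha / 2)            # about 1.960, same as =NORM.S.INV(1-0.05/2)
half_width = z * sigma / sqrt(n)
print(x_bar - half_width, x_bar + half_width)   # about 74.864 and 81.136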

In the example file, on the sheet "Sigma known", a form has been created for calculating and plotting a two-sided confidence interval for an arbitrary sample with a given σ and significance level.

CONFIDENCE.NORM() function

If the sample values are in the range B20:B79 and the significance level equals 0.05, then the MS EXCEL formula:
=AVERAGE(B20:B79)-CONFIDENCE.NORM(0.05, σ, COUNT(B20:B79))
will return the left boundary of the confidence interval.

The same boundary can be calculated using the formula:
=AVERAGE(B20:B79)-NORM.S.INV(1-0.05/2)*σ/SQRT(COUNT(B20:B79))

Note: The CONFIDENCE.NORM() function appeared in MS EXCEL 2010. Earlier versions of MS EXCEL used the CONFIDENCE() function.
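For readers working outside Excel, a minimal Python counterpart of CONFIDENCE.NORM (an assumption for illustration, not part of the original workbook) looks like this:

from math import sqrt
from scipy.stats import norm

def confidence_norm(alpha: float, sigma: float, n: int) -> float:
    """Half-width of the (1-alpha) confidence interval for the mean with known sigma."""
    return norm.ppf(1 - alpha / 2) * sigma / sqrt(n)

# Example: with alpha = 0.05, sigma = 8 and n = 25 the half-width is about 3.136,
# so the interval is the sample mean plus or minus 3.136.
print(confidence_norm(0.05, 8, 25))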

A confidence interval is one of the types of interval estimates used in statistics; it is calculated for a given significance level. It allows us to state that the true value of an unknown statistical parameter of the general population lies in the obtained range of values with a probability determined by the chosen level of statistical significance.

Normal distribution

When the variance (σ²) of the population is known, a z-score can be used to calculate the confidence limits (the boundary points of the confidence interval). Compared with the t-distribution, the z-score gives a narrower confidence interval, because it relies on the normal distribution and on the known value of σ rather than on its sample estimate.

Formula

To determine the boundary points of the confidence interval, provided that the standard deviation of the population of data is known, the following formula is used

L = X̄ ± Zα/2 · σ/√n

Example

Assume that the sample size is 25 observations, the sample mean is 15, and the population standard deviation is 8. For a significance level of α=5%, the Z-score is Z α/2 =1.96. In this case, the lower and upper limits of the confidence interval will be

Lower limit: L = 15 − 1.96 · 8/√25 = 11.864
Upper limit: L = 15 + 1.96 · 8/√25 = 18.136

Thus, we can state that with a probability of 95% the mathematical expectation of the general population will fall in the range from 11.864 to 18.136.

Methods for narrowing the confidence interval

Let's say this range is too wide for the purposes of our study. There are two ways to narrow the confidence interval:

  1. Lower the confidence level (i.e., increase the significance level α).
  2. Increase the sample size.

Lowering the confidence level to 90% (significance level α = 10%), we get a Z-score of Zα/2 = 1.64. In this case the lower and upper limits of the interval will be

Lower limit: L = 15 − 1.64 · 8/√25 = 12.376
Upper limit: L = 15 + 1.64 · 8/√25 = 17.624

And the confidence interval itself can be written as (12.376; 17.624).

In this case we can state that with a probability of 90% the mathematical expectation of the general population falls within this range.

If we want to keep the confidence level unchanged, then the only alternative is to increase the sample size. Increasing it to 144 observations, we obtain the following values of the confidence limits

Lower limit: L = 15 − 1.96 · 8/√144 = 13.693
Upper limit: L = 15 + 1.96 · 8/√144 = 16.307

The confidence interval itself will look like this: (13.693; 16.307).

Thus, narrowing the confidence interval without lowering the confidence level is possible only by increasing the sample size. If it is not possible to increase the sample size, then the interval can be narrowed solely by lowering the confidence level (increasing the significance level α).
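A short Python sketch illustrates both effects for the example above (the numbers are those of the example: mean 15, σ = 8; SciPy replaces the table lookup of 1.96 and 1.64):

from math import sqrt
from scipy.stats import norm

def z_interval(x_bar, sigma, n, conf):
    z = norm.ppf((1 + conf) / 2)        # 1.960 for 95%, about 1.645 for 90%
    h = z * sigma / sqrt(n)
    return x_bar - h, x_bar + h

print(z_interval(15, 8, 25, 0.95))    # about (11.86, 18.14)
print(z_interval(15, 8, 25, 0.90))    # narrower, because the confidence level is lower
print(z_interval(15, 8, 144, 0.95))   # narrower, because the sample is larger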

Building a confidence interval for a non-normal distribution

If the standard deviation of the population is not known or the distribution is non-normal, the t-distribution is used to construct a confidence interval. This technique is more conservative, which is expressed in wider confidence intervals, compared to the technique based on the Z-score.

Formula

The following formula is used to calculate the lower and upper limits of the confidence interval based on the t-distribution:

L = X̄ ± tα · s/√n

where s is the sample standard deviation.

Student's distribution (the t-distribution) depends on only one parameter: the number of degrees of freedom, which equals the number of observations in the sample minus one (n − 1). The value of Student's t for a given number of degrees of freedom and level of statistical significance α can be found in lookup tables.

Example

Assume that the sample size is 25 individual values, the mean of the sample is 50, and the standard deviation of the sample is 28. You need to construct a confidence interval for the level of statistical significance α=5%.

In our case, the number of degrees of freedom is 24 (25-1), therefore, the corresponding tabular value of Student's t-test for the level of statistical significance α=5% is 2.064. Therefore, the lower and upper bounds of the confidence interval will be

Lower limit: L = 50 − 2.064 · 28/√25 = 38.442
Upper limit: L = 50 + 2.064 · 28/√25 = 61.558

And the interval itself can be written as (38.442; 61.558).

Thus, we can state that with a probability of 95% the mathematical expectation of the general population lies in this range.
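The same t-based interval can be checked in Python (a sketch; SciPy replaces the lookup table):

from math import sqrt
from scipy.stats import t

x_bar, s, n = 50.0, 28.0, 25
alpha = 0.05

t_crit = t.ppf(1 - alpha / 2, df=n - 1)          # about 2.064 for 24 degrees of freedom
half_width = t_crit * s / sqrt(n)
print(x_bar - half_width, x_bar + half_width)    # about 38.44 and 61.56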

As with the z-based interval, when using the t-distribution you can narrow the confidence interval either by lowering the confidence level or by increasing the sample size.

Lowering the confidence level from 95% to 90% under the conditions of our example, we get the corresponding tabular value of Student's t of 1.711.

Lower limit: L = 50 − 1.711 · 28/√25 = 40.418
Upper limit: L = 50 + 1.711 · 28/√25 = 59.582

In this case, we can say that with a probability of 90% the mathematical expectation of the general population lies in the range (40.418; 59.582).

If we do not want to lower the confidence level, then the only alternative is to increase the sample size. Let's say it is 64 individual observations, and not 25 as in the initial condition of the example. The tabular value of Student's t for 63 degrees of freedom (64 − 1) and the level of statistical significance α = 5% is 1.998.

Lower limit: L = 50 − 1.998 · 28/√64 = 43.007
Upper limit: L = 50 + 1.998 · 28/√64 = 56.993

This gives us the opportunity to assert that with a probability of 95% the mathematical expectation of the general population lies in the range (43.007; 56.993).

Large Samples

Large samples are samples from a population with more than 100 individual observations. For such samples, by the central limit theorem, the sampling distribution of the mean is approximately normal even if the distribution of the population is not. In addition, for such samples the z-score and the t-distribution give approximately the same results when constructing confidence intervals. Thus, for large samples it is acceptable to use the z-score of the normal distribution instead of the t-distribution.
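The convergence of the t-quantile to the z-quantile can be seen directly (a quick SciPy sketch, not part of the original article):

from scipy.stats import norm, t

z = norm.ppf(0.975)                          # about 1.960
for df in (10, 30, 100, 1000):
    print(df, round(t.ppf(0.975, df), 4))    # 2.2281, 2.0423, 1.984, 1.9623
# For a hundred or more observations the two quantiles practically coincide.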

Summing up

A confidence interval (CI) obtained in a study on a sample gives a measure of the accuracy (or uncertainty) of the study's results, allowing conclusions to be drawn about the population of all such patients (the general population). The correct definition of a 95% CI can be formulated as follows: 95% of such intervals will contain the true value in the population. A somewhat less precise interpretation is this: the CI is the range of values within which you can be 95% sure that the true value lies. When a CI is used, the emphasis is on quantifying the effect, in contrast to the P value, which is obtained as a result of testing for statistical significance. The P value does not estimate any quantity; rather, it serves as a measure of the strength of the evidence against the null hypothesis of "no effect". A P value by itself tells us nothing about the magnitude of the difference, or even about its direction. Therefore, isolated P values in articles or abstracts are uninformative. In contrast, the CI indicates both the size of the effect of immediate interest, such as the usefulness of a treatment, and the strength of the evidence. The CI is therefore directly relevant to the practice of evidence-based medicine.

The estimation approach to statistical analysis, illustrated by the CI, aims to measure the magnitude of the effect of interest (the sensitivity of a diagnostic test, the predicted incidence, the relative risk reduction with treatment, and so on) and to measure the uncertainty in that effect. Most often, the CI is the range of values on either side of the estimate within which the true value is likely to lie, and you can be 95% sure of it. The convention of using 95% probability is arbitrary, as is the value P < 0.05 for assessing statistical significance, and authors sometimes use 90% or 99% CIs. Note that the word "interval" denotes a range of values and is therefore singular. The two values that bound the interval are called "confidence limits".

The CI is based on the idea that the same study performed on different sets of patients would not produce identical results, but that their results would be distributed around the true but unknown value. In other words, the CI describes the sampling variability. The CI does not reflect additional uncertainty due to other causes; in particular, it does not include the effects of selective loss of patients to follow-up, poor compliance or inaccurate outcome measurement, lack of blinding, and so on. The CI therefore always underestimates the total amount of uncertainty.

Confidence Interval Calculation

Table A1.1. Standard errors and confidence intervals for some clinical measurements

Typically, a CI is calculated from an observed estimate of a quantitative measure, such as the difference (d) between two proportions, and the standard error (SE) of the estimate of that difference. The approximate 95% CI thus obtained is d ± 1.96 SE. The formula changes according to the nature of the outcome measure and the coverage of the CI. For example, in a randomized placebo-controlled trial of an acellular pertussis vaccine, whooping cough developed in 72 of 1670 (4.3%) infants who received the vaccine and in 240 of 1665 (14.4%) infants in the control group. The difference in percentages, known as the absolute risk reduction, is 10.1%. The SE of this difference is 0.99%. Accordingly, the 95% CI is 10.1% ± 1.96 × 0.99%, i.e. from 8.2% to 12.0%.
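These figures can be reproduced with a standard unpooled standard-error formula for the difference of two independent proportions (a Python sketch; the choice of formula is an assumption, since the trial report itself is not reproduced here):

from math import sqrt
from scipy.stats import norm

x_control, n_control = 240, 1665     # whooping cough cases in the control group
x_vaccine, n_vaccine = 72, 1670      # whooping cough cases in the vaccine group

p1, p2 = x_control / n_control, x_vaccine / n_vaccine
d = p1 - p2                                                        # about 0.101
se = sqrt(p1 * (1 - p1) / n_control + p2 * (1 - p2) / n_vaccine)   # about 0.0099
z = norm.ppf(0.975)
print(d - z * se, d + z * se)    # roughly 0.082 to 0.120, i.e. 8.2% to 12.0%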

Despite different philosophical approaches, CIs and tests for statistical significance are closely related mathematically.

Thus, a "significant" P value, i.e. P < 0.05, corresponds to a 95% CI that excludes the value of the effect indicating no difference. For example, for the difference between two means or proportions this value is zero, while for a relative risk or odds ratio it is one. Under some circumstances the two approaches may not be entirely equivalent. The prevailing view is that estimation with a CI is the preferred approach to summarizing the results of a study, but CIs and P values are complementary, and many articles use both ways of presenting results.

The uncertainty (inaccuracy) of the estimate, expressed in CI, is largely related to the square root of the sample size. Small samples provide less information than large samples, and CIs are correspondingly wider in smaller samples. For example, an article comparing the performance of three tests used to diagnose Helicobacter pylori infection reported a urea breath test sensitivity of 95.8% (95% CI 75-100). While the figure of 95.8% looks impressive, the small sample size of 24 adult H. pylori patients means that there is significant uncertainty in this estimate, as shown by the wide CI. Indeed, the lower limit of 75% is much lower than the 95.8% estimate. If the same sensitivity were observed in a sample of 240 people, then the 95% CI would be 92.5-98.0, giving more assurance that the test is highly sensitive.

In randomized controlled trials (RCTs), non-significant results (i.e. those with P > 0.05) are particularly susceptible to misinterpretation. The CI is particularly useful here, as it indicates how compatible the results are with a clinically useful true effect. For example, in an RCT comparing sutured versus stapled anastomosis of the colon, wound infection developed in 10.9% and 13.5% of patients, respectively (P = 0.30). The 95% CI for this difference is 2.6% (−2 to +8). Even in this study, which included 652 patients, it remains plausible that there is a modest difference in the incidence of infection between the two procedures. The smaller the study, the greater the uncertainty. Sung et al. performed an RCT comparing octreotide infusion with emergency sclerotherapy for acute variceal bleeding in 100 patients. In the octreotide group the bleeding-arrest rate was 84%, in the sclerotherapy group 90%, giving P = 0.56. Note that the rates of continued bleeding are similar to those of wound infection in the study just mentioned. In this case, however, the 95% CI for the difference between the interventions is 6% (−7 to +19). This range is quite wide compared with the 5% difference that would be of clinical interest. Clearly, the study does not rule out a substantial difference in efficacy. Therefore, the authors' conclusion that "octreotide infusion and sclerotherapy are equally effective in the treatment of variceal bleeding" is definitely not valid. In cases like this, where the 95% CI for the absolute risk reduction (ARR) includes zero, the CI for the NNT (number needed to treat) is rather difficult to interpret. The NNT and its CI are obtained from the reciprocals of the ARR (multiplied by 100 if the values are given as percentages). Here we get NNT = 100/6 = 16.6, with a 95% CI of −14.3 to 5.3. As can be seen from footnote "d" in Table A1.1, this CI includes values of NNT(benefit) from 5.3 to infinity and NNT(harm) from 14.3 to infinity.

CIs can be constructed for most commonly used statistical estimates or comparisons. For RCTs this includes differences between means or proportions, relative risks, odds ratios and NNTs. Similarly, CIs can be obtained for all the main estimates made in studies of diagnostic test accuracy: sensitivity, specificity, positive predictive value (all of which are simple proportions) and likelihood ratios, as well as for estimates obtained in meta-analyses and case-control studies. A personal computer program covering many of these uses of CIs is available with the second edition of Statistics with Confidence. Macros for calculating CIs for proportions are freely available for Excel and for the statistical programs SPSS and Minitab at http://www.uwcm.ac.uk/study/medicine/epidemiology_statistics/research/statistics/proportions.htm.

Multiple evaluations of treatment effect

While the construction of CIs is desirable for the primary outcomes of a study, they are not required for all outcomes. The CI should concern clinically important comparisons. For example, when comparing two groups, the correct CI is the one built for the difference between the groups, as in the examples above, and not a CI built for the estimate in each group separately. Not only is it useless to give separate CIs for the estimates in each group, this presentation can be misleading. Similarly, the correct approach when comparing treatment efficacy in different subgroups is to compare the two (or more) subgroups directly. It is incorrect to conclude that treatment is effective only in one subgroup because its CI excludes the value corresponding to no effect while the others do not. CIs are also useful when comparing results across several subgroups. Fig. A1.1 shows the relative risk of eclampsia in women with pre-eclampsia, by subgroup, from a placebo-controlled RCT of magnesium sulfate.

Fig. A1.2. The forest plot shows the results of 11 randomized clinical trials of bovine rotavirus vaccine for the prevention of diarrhoea versus placebo. The 95% confidence interval was used to estimate the relative risk of diarrhoea. The size of the black square is proportional to the amount of information. In addition, a summary estimate of treatment efficacy and its 95% confidence interval (indicated by a diamond) are shown; the meta-analysis used a random-effects model.

We have already discussed the fallacy of taking the absence of statistical significance as an indication that two treatments are equally effective. It is equally important not to equate statistical significance with clinical importance. Clinical importance can be assumed when the result is statistically significant and the magnitude of the treatment response exceeds some pre-established minimum clinically important value; for example, this could be the effect size used in calculating the sample size. Under a more stringent criterion, the entire range of the CI must show a benefit exceeding the predetermined minimum.

Studies can show which results are statistically significant and which are clinically important, and which are not. Fig. A1.2 shows the results of four trials for which the entire CI is below 1, i.e. their results are statistically significant at P < 0.05. On the assumption that a clinically important difference would be a 20% reduction in the risk of diarrhoea (RR = 0.8), all of these trials showed a clinically significant estimate of risk reduction, and only in the Treanor study was the entire 95% CI below this value. Two other RCTs showed clinically important results that were not statistically significant. Note that in three trials the point estimates of treatment efficacy were almost identical, but the widths of the CIs differed (reflecting sample size). Thus, taken individually, the evidential strength of these RCTs differs.

The sample mean, the sample variance and other sample statistics are all estimates of their theoretical counterparts, which could be obtained if we had not a sample but the entire general population. But, alas, surveying the general population is very expensive and often impossible.

The concept of interval estimation

Any sample estimate has some scatter, because it is a random variable that depends on the values in the particular sample. Therefore, for more reliable statistical inference one should know not only the point estimate but also an interval that, with a high probability γ (gamma), covers the estimated parameter θ (theta).

Formally, these are two statistics T1(X) and T2(X) with T1 < T2 for which, at a given probability level γ, the following condition is met:

P(T1(X) ≤ θ ≤ T2(X)) ≥ γ

In short, with probability γ or more the true value lies between the points T1(X) and T2(X), which are called the lower and upper bounds of the confidence interval.

One of the requirements for a confidence interval is that it be as narrow as possible, i.e. as short as possible. The desire is quite natural: the researcher tries to localize the parameter of interest as precisely as possible.

It follows that the confidence interval should cover the region of maximum probability of the distribution, with the estimate itself at its center.

That is, the probability of deviation (of the true indicator from the estimate) upwards is equal to the probability of deviation downwards. It should also be noted that for skewed distributions, the interval on the right is not equal to the interval on the left.

The figure above clearly shows that the greater the confidence level, the wider the interval - a direct relationship.

This was a small introduction to the theory of interval estimation of unknown parameters. Let's move on to finding confidence limits for the mathematical expectation.

Confidence interval for mathematical expectation

If the original data are normally distributed, then the mean will also be a normal random variable. This follows from the rule that a linear combination of normal values also has a normal distribution. Therefore, to calculate probabilities we could use the mathematical apparatus of the normal distribution law.

However, this requires knowing two parameters, the expected value and the variance, which are usually unknown. One can, of course, use estimates instead of the parameters (the arithmetic mean and the sample variance), but then the distribution of the mean will not be quite normal: it will be somewhat flatter, with heavier tails. William Gosset, working in Ireland, adroitly noted this fact when he published his discovery in the March 1908 issue of Biometrika. For reasons of secrecy, Gosset signed as Student. This is how Student's t-distribution appeared.

However, the normal distribution of data, used by K. Gauss in analyzing the errors of astronomical observations, is extremely rare in real life, and it is quite difficult to establish (for high accuracy, about 2 thousand observations are needed). Therefore, it is best to drop the normality assumption and use methods that do not depend on the distribution of the original data.

The question arises: what is the distribution of the arithmetic mean if it is calculated from data with an unknown distribution? The answer is given by the central limit theorem (CLT), well known in probability theory. There are several versions of it in mathematics (the formulations have been refined over the years), but, roughly speaking, they all come down to the statement that the sum of a large number of independent random variables obeys the normal distribution law.

When the arithmetic mean is calculated, a sum of random variables is used. From this it follows that the arithmetic mean has an approximately normal distribution, in which the expected value is the expected value of the original data and the variance is the variance of the original data divided by n (σ²/n).

Smart people know how to prove the CLT, but we will verify it with an experiment in Excel. Let's simulate a sample of 50 uniformly distributed random variables (using the Excel function RANDBETWEEN). Then we will make 1000 such samples and calculate the arithmetic mean of each. Let's look at their distribution.

It can be seen that the distribution of the average is close to the normal law. If the volume of samples and their number are made even larger, then the similarity will be even better.
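The same experiment is easy to repeat in Python (a sketch with NumPy; the sample sizes mirror the Excel experiment above):

import numpy as np

rng = np.random.default_rng(0)

# 1000 samples of 50 uniform random integers, like RANDBETWEEN in Excel
samples = rng.integers(0, 101, size=(1000, 50))
means = samples.mean(axis=1)

# The individual values are uniform, but the sample means cluster around 50
# and their histogram is roughly bell-shaped.
print(means.mean(), means.std())
counts, edges = np.histogram(means, bins=15)
print(counts)   # counts rise towards the middle and fall off at the tails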

Now that we have seen for ourselves that the CLT holds, we can, using the normal distribution, calculate confidence intervals for the arithmetic mean that cover the true mean (the mathematical expectation) with a given probability.

To establish the upper and lower bounds, the parameters of the normal distribution must be known. As a rule, they are not, so estimates are used: the arithmetic mean and the sample variance. Again, this method gives a good approximation only for large samples. For small samples it is often recommended to use Student's distribution. Don't believe it! Student's distribution for the mean arises only when the original data are normally distributed, that is, almost never. Therefore, it is better to set a minimum bar for the amount of required data right away and use asymptotically correct methods. They say 30 observations are enough. Take 50 and you can't go wrong.

The confidence limits are calculated as

T1,2 = X̄ ± c_γ · s0/√n

where:

T1, T2 – the lower and upper bounds of the confidence interval;

X̄ – the sample arithmetic mean;

s0 – the sample standard deviation (unbiased);

n – the sample size;

γ – the confidence level (usually 0.9, 0.95 or 0.99);

c_γ = Φ⁻¹((1+γ)/2) – the inverse of the standard normal distribution function. In simple terms, it is the number of standard errors from the arithmetic mean to the lower or upper bound (for the three probabilities above it equals 1.64, 1.96 and 2.58, respectively).

The essence of the formula is that the arithmetic mean is taken and a certain number (c_γ) of standard errors (s0/√n) is set aside on either side of it. Everything is known: just take it and count.
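This universal recipe is straightforward to put into code (a Python sketch; the simulated data are purely illustrative):

import numpy as np
from scipy.stats import norm

def mean_confidence_interval(data, gamma=0.95):
    """Asymptotic confidence interval for the mean, following the formula above."""
    data = np.asarray(data, dtype=float)
    n = data.size
    x_bar = data.mean()
    s0 = data.std(ddof=1)             # unbiased sample standard deviation
    c = norm.ppf((1 + gamma) / 2)     # c_gamma: about 1.64, 1.96 or 2.58
    half = c * s0 / np.sqrt(n)
    return x_bar - half, x_bar + half

# Example with 50 simulated (deliberately non-normal) observations
rng = np.random.default_rng(1)
print(mean_confidence_interval(rng.exponential(scale=10, size=50)))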

Before the mass adoption of PCs, tables were used to obtain the values of the normal distribution function and its inverse. They are still used, but it is more efficient to turn to ready-made Excel formulas. All the elements of the formula above (the mean, the standard deviation and the quantile c_γ) can easily be calculated in Excel. But there is also a ready-made function for calculating the confidence interval: CONFIDENCE.NORM. Its syntax is as follows.

CONFIDENCE.NORM(alpha, standard_dev, size)

alpha – the significance level, which in the notation above equals 1 − γ, i.e. the probability that the mathematical expectation will fall outside the confidence interval. With a confidence level of 0.95, alpha is 0.05, and so on.

standard_dev – the standard deviation of the sample data. You do not need to calculate the standard error yourself; Excel itself will divide by √n.

size – the sample size (n).

The result of the CONFIDENCE.NORM function is the second term of the formula for the confidence interval, i.e. the half-width. Accordingly, the lower and upper points are the mean minus and plus the value obtained.

Thus, it is possible to build a universal algorithm for calculating confidence intervals for the arithmetic mean, which does not depend on the distribution of the initial data. The price for universality is its asymptotic nature, i.e. the need to use relatively large samples. However, in the age of modern technology, collecting the right amount of data is usually not difficult.

Testing Statistical Hypotheses Using a Confidence Interval


One of the main problems solved in statistics is hypothesis testing. In a nutshell, its essence is this. An assumption is made, for example, that the expectation of the general population equals some value. Then the distribution of sample means that could be observed with the given expectation is constructed. Next, we look at where in this hypothetical distribution the real mean is located. If it goes beyond the allowable limits, then the appearance of such a mean is very unlikely, and with a single repetition of the experiment practically impossible; this contradicts the hypothesis put forward, which is therefore rejected. If the mean does not go beyond the critical level, then the hypothesis is not rejected (but not proven either!).

So, with the help of confidence intervals, in our case for the expectation, one can also test some hypotheses. It is very easy to do. Suppose the arithmetic mean of some sample is 100. The hypothesis being tested is that the expected value is, say, 90. That is, to put the question primitively: can it be that, with the true value of the mean equal to 90, the observed mean turned out to be 100?

To answer this question, additional information on standard deviation and sample size will be required. Let's say the standard deviation is 30, and the number of observations is 64 (to easily extract the root). Then the standard error of the mean is 30/8 or 3.75. To calculate the 95% confidence interval, you will need to set aside two standard errors on both sides of the mean (more precisely, 1.96). The confidence interval will be approximately 100 ± 7.5, or from 92.5 to 107.5.

Further reasoning is as follows. If the tested value falls within the confidence interval, then it does not contradict the hypothesis, since it fits within the limits of random fluctuations (with a probability of 95%). If the tested point falls outside the confidence interval, then the probability of such an event is very small, in any case below the acceptable level. Hence, the hypothesis is rejected as contradicting the observed data. In our case the hypothesized expectation lies outside the confidence interval (the tested value of 90 is not included in the interval 100 ± 7.5), so it should be rejected. Answering the primitive question above, one should say: no, it cannot; in any case, this happens extremely rarely. Often, a specific probability of erroneously rejecting the hypothesis (the p-value) is reported, rather than the preset level for which the confidence interval was built, but more on that another time.
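The check is easy to reproduce in Python (a sketch; the exact quantile is used instead of the rounded value of 2 standard errors):

from math import sqrt
from scipy.stats import norm

x_bar, s, n = 100.0, 30.0, 64
mu0 = 90.0                       # hypothesized expectation

se = s / sqrt(n)                 # 3.75
z = norm.ppf(0.975)              # about 1.96 (the text rounds to 2, giving +/- 7.5)
low, high = x_bar - z * se, x_bar + z * se
print(low, high)                 # about 92.65 to 107.35
print(low <= mu0 <= high)        # False: 90 is outside the interval, so the hypothesis is rejected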

As you can see, it is not difficult to build a confidence interval for the mean (or mathematical expectation). The main thing is to catch the essence, and then things will go. In practice, most use the 95% confidence interval, which is about two standard errors wide on either side of the mean.

That's all for now. All the best!

CONFIDENCE INTERVALS FOR FREQUENCIES AND PROPORTIONS

© 2008

National Institute of Public Health, Oslo, Norway

The article describes and discusses the calculation of confidence intervals for frequencies and proportions using the Wald, Wilson and Clopper-Pearson methods, the angular transformation, and the Wald method with the Agresti-Coull correction. The material presented gives general information about methods for calculating confidence intervals for frequencies and proportions and is intended to arouse the interest of the journal's readers not only in using confidence intervals when presenting the results of their own research, but also in reading the specialized literature before starting work on future publications.

Keywords: confidence interval, frequency, proportion

In one of the previous publications, the description of qualitative data was briefly mentioned, and it was noted that their interval estimation is preferable to a point estimate for describing the frequency of occurrence of the studied characteristic in the general population. Indeed, since studies are conducted on sample data, the projection of the results onto the general population must contain an element of sampling inaccuracy. The confidence interval is a measure of the accuracy of the estimated parameter. Interestingly, in some medical textbooks on basic statistics the topic of confidence intervals for frequencies is completely ignored. In this article we will consider several ways of calculating confidence intervals for frequencies, assuming such sample characteristics as sampling without replacement and representativeness, as well as independence of the observations from one another. In this article, frequency is understood not as an absolute number showing how many times a given value occurs in the population, but as a relative value: the proportion of study participants who have the trait under study.

In biomedical research, 95% confidence intervals are most commonly used. This confidence interval is the region within which the true proportion falls 95% of the time. In other words, it can be said with 95% certainty that the true value of the frequency of occurrence of a trait in the general population will be within the 95% confidence interval.

Most statistical textbooks for medical researchers report that the frequency error is calculated using the formula

s = √(p(1 − p)/N)

where p is the frequency of occurrence of the feature in the sample (a value from 0 to 1) and N is the sample size. Most domestic scientific articles report the frequency of the feature in the sample (p) together with its error (s) in the form p ± s. It is more appropriate, however, to present a 95% confidence interval for the frequency of the trait in the general population, which includes values from

p − 1.96·s to p + 1.96·s.

In some textbooks, it is recommended, for small samples, to replace the value of 1.96 with the value of t for N - 1 degrees of freedom, where N is the number of observations in the sample. The value of t is found in the tables for the t-distribution, which are available in almost all textbooks on statistics. The use of the distribution of t for the Wald method does not provide visible advantages over other methods discussed below, and therefore is not welcomed by some authors.

The above method of calculating confidence intervals for frequencies or proportions is named after Abraham Wald (1902–1950), since it came into wide use after the publication of Wald and Wolfowitz in 1939. However, the method itself was proposed by Pierre Simon Laplace (1749–1827) as early as 1812.

The Wald method is very popular, but its application is associated with significant problems. The method is not recommended for small sample sizes, nor in cases where the frequency of occurrence of a feature tends to 0 or 1 (0% or 100%), and it is simply not applicable for frequencies of 0 and 1. In addition, the normal approximation used when calculating the error "does not work" when n·p < 5 or n·(1 − p) < 5. More conservative statisticians hold that n·p and n·(1 − p) should be at least 10. A more detailed examination of the Wald method has shown that the confidence intervals it produces are in most cases too narrow, that is, their use mistakenly creates too optimistic a picture, especially as the frequency of the trait moves away from 0.5, or 50%. Moreover, as the frequency approaches 0 or 1, the confidence interval can take negative values or exceed 1, which is absurd for frequencies. Many authors quite rightly recommend against using this method not only in the cases already mentioned, but also when the frequency of the trait is less than 25% or more than 75%. Thus, despite the simplicity of the calculations, the Wald method can be applied only in a very limited number of cases. Foreign researchers are more categorical in their conclusions and unambiguously recommend not using this method for small samples, and it is precisely such samples that medical researchers often have to deal with.
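A small Python sketch makes the problem visible (the sample sizes here are arbitrary illustrations, not data from the article):

from math import sqrt
from scipy.stats import norm

def wald_interval(x, n, conf=0.95):
    p = x / n
    z = norm.ppf((1 + conf) / 2)
    h = z * sqrt(p * (1 - p) / n)
    return p - h, p + h

print(wald_interval(450, 1000))   # reasonable: roughly (0.419, 0.481)
print(wald_interval(1, 20))       # lower bound is negative (about -0.046), absurd for a frequency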

An alternative approach uses the angular (arcsine) transformation of the frequency, φ = 2·arcsin(√p). Since the new variable φ is approximately normally distributed with standard error 1/√N, the lower and upper bounds of the 95% confidence interval for φ are φ − 1.96/√N and φ + 1.96/√N; these bounds are then converted back to the frequency scale.

For small samples it is recommended to substitute the value of t for N − 1 degrees of freedom instead of 1.96. This method does not give negative values and allows confidence intervals for frequencies to be estimated more accurately than the Wald method. In addition, it is described in many domestic reference books on medical statistics, which, however, has not led to its widespread use in medical research. Calculating confidence intervals using the angular transformation is not recommended for frequencies approaching 0 or 1.

This is where the description of methods for estimating confidence intervals in most books on the basics of statistics for medical researchers usually ends, and this problem is typical not only for domestic, but also for foreign literature. Both methods are based on the central limit theorem, which implies a large sample.

Taking into account the shortcomings of estimating confidence intervals using the above methods, Clopper and Pearson proposed in 1934 a method for calculating the so-called exact confidence interval, taking into account the binomial distribution of the studied trait. This method is available in many online calculators, but the confidence intervals obtained in this way are in most cases too wide. At the same time, the method is recommended for use when a conservative estimate is required. The degree of conservatism of the method increases as the sample size decreases, especially for N < 15. The literature describes the use of the binomial distribution function for analyzing qualitative data in MS Excel, including for determining confidence intervals; however, the calculation of the latter for frequencies is not "tabulated" in spreadsheets in a user-friendly form, which is probably why most researchers do not use it.

According to many statisticians, the optimal estimation of confidence intervals for frequencies is provided by the Wilson method, proposed back in 1927 but practically unused in domestic biomedical research. This method makes it possible to estimate confidence intervals both for very small and for very high frequencies, and is applicable even to a small number of observations. In general form the Wilson confidence interval runs from

( p + z²/(2N) − z·√( p(1 − p)/N + z²/(4N²) ) ) / (1 + z²/N)

to

( p + z²/(2N) + z·√( p(1 − p)/N + z²/(4N²) ) ) / (1 + z²/N)

where z takes the value 1.96 when calculating the 95% confidence interval, N is the number of observations, and p is the frequency of the feature in the sample. The method is available in online calculators, so its use poses no problems. Some authors do not recommend using it when n·p < 4 or n·(1 − p) < 4, because the approximation of the distribution of p by the normal distribution is too crude in that situation; foreign statisticians, however, consider the Wilson method applicable to small samples as well.

In addition to the Wilson method, the Wald method with the Agresti-Coull correction is also believed to provide an optimal estimate of the confidence interval for frequencies. The Agresti-Coull correction replaces, in the Wald formula, the frequency of occurrence of the trait in the sample (p) by p`, in whose calculation 2 is added to the numerator and 4 to the denominator, that is, p` = (X + 2)/(N + 4), where X is the number of study participants who have the trait under study and N is the sample size. This modification produces results very similar to those of the Wilson formula, except when the event rate approaches 0% or 100% and the sample is small. In addition to the above methods for calculating confidence intervals for frequencies, continuity corrections have been proposed for both the Wald method and the Wilson method for small samples, but studies have shown that their use is inappropriate.

Let us consider the application of the above methods using two examples. In the first case, we study a large sample of 1,000 randomly selected study participants, of whom 450 have the trait under study (be it a risk factor, an outcome or any other trait), giving a frequency of 0.45, or 45%. In the second case, the study is conducted on a small sample of, say, only 20 people, and only 1 study participant (5%) has the trait under study. Confidence intervals for the Wald method, the Wald method with the Agresti-Coull correction and the Wilson method were calculated using an online calculator developed by Jeff Sauro (http://www./wald.htm). Continuity-corrected Wilson confidence intervals were calculated using the calculator provided by VassarStats: Web Site for Statistical Computation (http://faculty.vassar.edu/lowry/prop1.html). Calculations using the Fisher angular transformation were performed "manually" using the critical value of t for 19 and 999 degrees of freedom, respectively. The results of the calculations for both examples are presented in the table.

Table. Confidence intervals calculated in six different ways for the two examples described in the text

Confidence interval calculation method | 95% CI for X=1, N=20 (P=0.0500, or 5%) | 95% CI for X=450, N=1000 (P=0.4500, or 45%)
Wald | –0.0455–0.2541 | –
Wald with Agresti-Coull correction | <0.0001–0.2541 | –
Wilson | – | –
Wilson with continuity correction | – | –
Clopper-Pearson "exact method" | – | –
Angular transformation | <0.0001–0.1967 | –
As can be seen from the table, for the small-sample example (P = 5%, N = 20) the confidence interval calculated by the "generally accepted" Wald method extends into the negative region, which cannot happen for frequencies. Unfortunately, such incidents are not uncommon in the Russian-language literature. The traditional way of presenting data as a frequency and its error partially masks the problem. For example, if the frequency of occurrence of a trait (in percent) is presented as 2.1 ± 1.4, this does not "irritate" as much as 2.1% (95% CI: –0.7; 4.9), although it means the same thing. The Wald method with the Agresti-Coull correction and the calculation using the angular transformation give a lower bound tending to zero. The Wilson method with continuity correction and the "exact method" give wider confidence intervals than the Wilson method. For the large-sample example (P = 45%, N = 1000), all methods give approximately the same confidence intervals (differences appear only in the thousandths), which is not surprising, since the frequency of the event in this example does not differ much from 50% and the sample size is quite large.
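A similar comparison can be run with the statsmodels library in Python (an assumption for illustration; the article itself used online calculators, so small numerical differences are possible):

from statsmodels.stats.proportion import proportion_confint

for x, n in ((1, 20), (450, 1000)):
    print(f"x={x}, n={n}")
    for method in ("normal", "wilson", "agresti_coull", "beta"):
        # 'normal' is the Wald interval, 'beta' is the exact Clopper-Pearson interval
        low, high = proportion_confint(x, n, alpha=0.05, method=method)
        print(f"  {method:14s} {low:.4f} - {high:.4f}")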

For readers interested in this problem, we can recommend the works of R. G. Newcombe and of Brown, Cai and DasGupta, which give the pros and cons of using 7 and 10 different methods for calculating confidence intervals, respectively. Among domestic manuals we can recommend the book in which, in addition to a detailed description of the theory, the Wald and Wilson methods are presented, as well as a method for calculating confidence intervals based on the binomial frequency distribution. In addition to the free online calculators (http://www./wald.htm and http://faculty.vassar.edu/lowry/prop1.html), confidence intervals for frequencies (and not only!) can be calculated using the CIA program (Confidence Interval Analysis), which can be downloaded from http://www.medschool.soton.ac.uk/cia/.

The next article will look at univariate ways to compare qualitative data.

Bibliography

1. Banerjee A. Medical statistics in plain language: an introductory course. Moscow: Practical Medicine, 2007. 287 p.
2. Medical statistics. Moscow: Medical Information Agency, 2007. 475 p.
3. Glantz S. Medico-biological statistics. Moscow: Praktika, 1998.
4. Data types, distribution verification and descriptive statistics // Human Ecology. 2008. No. 1. P. 52–58.
5. Zhizhin K. S. Medical statistics: textbook. Rostov-on-Don: Phoenix, 2007. 160 p.
6. Applied medical statistics. St. Petersburg: Folio, 2003. 428 p.
7. Lakin G. F. Biometrics. Moscow: Higher School, 1990. 350 p.
8. Medik V. A. Mathematical statistics in medicine. Moscow: Finance and Statistics, 2007. 798 p.
9. Mathematical statistics in clinical research. Moscow: GEOTAR-MED, 2001. 256 p.
10. Yunkerov V. I. Medico-statistical processing of medical research data. St. Petersburg: VMedA, 2002. 266 p.
11. Agresti A., Coull B. Approximate is better than "exact" for interval estimation of binomial proportions // American Statistician. 1998. N 52. P. 119–126.
12. Altman D., Machin D., Bryant T., Gardner M. J. Statistics with Confidence. London: BMJ Books, 2000. 240 p.
13. Brown L. D., Cai T. T., DasGupta A. Interval estimation for a binomial proportion // Statistical Science. 2001. N 2. P. 101–133.
14. Clopper C. J., Pearson E. S. The use of confidence or fiducial limits illustrated in the case of the binomial // Biometrika. 1934. N 26. P. 404–413.
15. Garcia-Perez M. A. On the confidence interval for the binomial parameter // Quality and Quantity. 2005. N 39. P. 467–481.
16. Motulsky H. Intuitive Biostatistics. Oxford: Oxford University Press, 1995. 386 p.
17. Newcombe R. G. Two-sided confidence intervals for the single proportion: comparison of seven methods // Statistics in Medicine. 1998. N 17. P. 857–872.
18. Sauro J., Lewis J. R. Estimating completion rates from small samples using binomial confidence intervals: comparisons and recommendations // Proceedings of the Human Factors and Ergonomics Society Annual Meeting. Orlando, FL, 2005.
19. Wald A., Wolfowitz J. Confidence limits for continuous distribution functions // Annals of Mathematical Statistics. 1939. N 10. P. 105–118.
20. Wilson E. B. Probable inference, the law of succession, and statistical inference // Journal of the American Statistical Association. 1927. N 22. P. 209–212.

CONFIDENCE INTERVALS FOR PROPORTIONS

A. M. Grjibovski

National Institute of Public Health, Oslo, Norway

The article presents several methods for calculating confidence intervals for binomial proportions, namely the Wald, Wilson, arcsine, Agresti-Coull and exact Clopper-Pearson methods. The paper gives only a general introduction to the problem of confidence interval estimation for a binomial proportion, and its aim is not only to encourage readers to use confidence intervals when presenting the results of their own empirical research, but also to encourage them to consult statistics books before analyzing their own data and preparing manuscripts.

Key words: confidence interval, proportion

Contact Information:

Senior Advisor, National Institute of Public Health, Oslo, Norway