What is the variance? Absolute variations

Variance (dispersion) in statistics is the mean of the squared deviations of the individual values of a characteristic from their arithmetic mean. Depending on the initial data, it is determined using the simple or the weighted variance formula:

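1. Simple variance (for ungrouped data) is calculated using the formula:

σ² = Σ(xᵢ − x̄)² / N

2. Weighted variance (for variation series):

σ² = Σ(xᵢ − x̄)²·nᵢ / Σnᵢ

where N is the number of observations, x̄ is the arithmetic mean, and n is the frequency (the number of times the value X occurs).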

An example of finding variance

This page walks through a standard example of finding variance.

Example 1. The following data are available for a group of 20 correspondence students. It is necessary to construct an interval distribution series of the characteristic, calculate its average value, and study its variance.

Let's build an interval grouping. We determine the width of the interval using the formula:

h = (Xmax − Xmin) / n

where Xmax is the maximum value of the grouping characteristic;
Xmin is the minimum value of the grouping characteristic;
n is the number of intervals.

We take n = 5. The interval width is h = (192 − 159) / 5 = 6.6.
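The interval boundaries that follow from this width can be sketched in a few lines of Python. This is only an illustration: it uses the minimum (159), maximum (192) and number of intervals (5) quoted above, and the variable names are assumptions made for the sketch.

```python
# Build equal-width class intervals from the values quoted in the example.
x_min, x_max, n_intervals = 159, 192, 5

h = (x_max - x_min) / n_intervals  # interval width

# Lower bound, upper bound and midpoint of each interval, rounded for display.
intervals = []
for k in range(n_intervals):
    lower = x_min + k * h
    upper = x_min + (k + 1) * h
    intervals.append((round(lower, 1), round(upper, 1), round((lower + upper) / 2, 1)))

for lower, upper, mid in intervals:
    print(f"{lower} - {upper}: midpoint {mid}")
```

The first interval comes out as 159 to 165.6 and the last one ends at the maximum value, 192.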

Let's create an interval grouping

For further calculations, we will build an auxiliary table:

X'i is the midpoint of the interval (for example, the midpoint of the interval 159 – 165.6 is 162.3).

We determine the average height of students using the weighted arithmetic average formula:

x̄ = ΣX'ᵢ·nᵢ / Σnᵢ

Let's determine the variance using the formula:

σ² = Σ(X'ᵢ − x̄)²·nᵢ / Σnᵢ

The variance formula can be transformed as follows:

σ² = ΣX'ᵢ²·nᵢ / Σnᵢ − x̄²

From this formula it follows that the variance is equal to the difference between the average of the squares of the variants and the square of the average.
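A quick numerical check shows that the definition (weighted mean of squared deviations) and the transformed form (mean of the squares minus the square of the mean) give the same result. The midpoints below follow from the interval width of 6.6; the frequencies are hypothetical, because the original data table is not reproduced here.

```python
# Interval midpoints for h = 6.6 starting at 159; frequencies are assumed for illustration.
midpoints = [162.3, 168.9, 175.5, 182.1, 188.7]
freqs = [2, 5, 8, 4, 1]  # hypothetical frequencies, 20 students in total

total = sum(freqs)
mean = sum(x * n for x, n in zip(midpoints, freqs)) / total

# Definition: weighted mean of the squared deviations from the mean.
var_def = sum((x - mean) ** 2 * n for x, n in zip(midpoints, freqs)) / total

# Transformed form: mean of the squares minus the square of the mean.
mean_sq = sum(x ** 2 * n for x, n in zip(midpoints, freqs)) / total
var_short = mean_sq - mean ** 2

print(round(mean, 2), round(var_def, 2), round(var_short, 2))
```

The two variance values agree up to floating-point rounding.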

In variation series with equal intervals, the variance can be calculated by the method of moments, which relies on the second property of variance (dividing all variants by the width of the interval). This calculation is less laborious:

σ² = i²·(m1² subtracted from m2) = i²·(m2 − m1²)

where i is the width of the interval;
A is a conventional zero, for which it is convenient to use the midpoint of the interval with the highest frequency;
m1² is the square of the first-order moment, m1 = Σ((x − A)/i)·n / Σn;
m2 is the second-order moment, m2 = Σ((x − A)/i)²·n / Σn.
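The method of moments can be sketched as follows, assuming the usual form σ² = i²·(m2 − m1²), where m1 and m2 are the first and second moments of the coded values (x − A)/i. The series and the function names are hypothetical; the direct calculation is included only to confirm that both routes agree.

```python
def variance_by_moments(midpoints, freqs, width, a):
    """Variance of an equal-interval series via conventional moments."""
    total = sum(freqs)
    # Coded values: distance from the conventional zero in units of the interval width.
    coded = [(x - a) / width for x in midpoints]
    m1 = sum(u * n for u, n in zip(coded, freqs)) / total
    m2 = sum(u ** 2 * n for u, n in zip(coded, freqs)) / total
    return width ** 2 * (m2 - m1 ** 2)


def variance_direct(midpoints, freqs):
    """Weighted variance computed straight from the definition."""
    total = sum(freqs)
    mean = sum(x * n for x, n in zip(midpoints, freqs)) / total
    return sum((x - mean) ** 2 * n for x, n in zip(midpoints, freqs)) / total


# Hypothetical series; A = 30 is the midpoint with the highest frequency.
midpoints = [10, 20, 30, 40, 50]
freqs = [3, 7, 12, 6, 2]

by_moments = variance_by_moments(midpoints, freqs, width=10, a=30)
direct = variance_direct(midpoints, freqs)
print(round(by_moments, 4), round(direct, 4))
```

Choosing A as the midpoint with the highest frequency keeps the coded values small, which is what made the method convenient for hand calculation.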

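The variance of an alternative characteristic (if in a statistical population a characteristic changes in such a way that there are only two mutually exclusive options, then such variability is called alternative) can be calculated using the formula:

σ² = pq

where p is the share of units that have the characteristic and q is the share of units that do not. Substituting q = 1 − p into this formula, we obtain:

σ² = p(1 − p)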
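For a characteristic coded as 1 (present) and 0 (absent), the product of the two shares, p·q, coincides with the ordinary population variance of the coded values. A small sketch with hypothetical data:

```python
from statistics import pvariance

# Hypothetical 0/1 data: 1 = the unit has the characteristic, 0 = it does not.
flags = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]

p = sum(flags) / len(flags)  # share of units with the characteristic
q = 1 - p                    # share of units without it

var_formula = p * q              # variance by the formula pq
var_direct = pvariance(flags)    # population variance of the 0/1 values

print(round(var_formula, 4), round(var_direct, 4))
```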

Types of variance

Total variance measures the variation of a characteristic across the entire population under the influence of all factors that cause this variation. It is equal to the mean square of the deviations of the individual values of the characteristic x from the overall mean x̄ and can be calculated as simple or weighted variance.

Within-group variance characterizes random variation, i.e. the part of the variation that is due to unaccounted factors and does not depend on the factor characteristic that forms the basis of the grouping. It is equal to the mean square of the deviations of the individual values of the characteristic within a group from the arithmetic mean of that group and can be calculated as simple or weighted variance.

Thus, within-group variance measures the variation of a characteristic within a group and is determined by the formula:

σᵢ² = Σ(x − x̄ᵢ)² / nᵢ

where x̄ᵢ is the group average;
nᵢ is the number of units in the group.

For example, when studying the influence of workers' qualifications on labor productivity in a workshop, the within-group variances show the variation in output in each group caused by all possible factors (technical condition of equipment, availability of tools and materials, age of workers, labor intensity, etc.) except differences in qualification category, since within a group all workers have the same qualifications.

The average of the within-group variances reflects random variation, i.e. the part of the variation that occurred under the influence of all factors other than the grouping factor. It is calculated using the formula:

σ̄² = Σσᵢ²·nᵢ / Σnᵢ

Between-group (intergroup) variance characterizes the systematic variation of the resulting characteristic, which is due to the influence of the factor characteristic that forms the basis of the grouping. It is equal to the mean square of the deviations of the group means from the overall mean. Between-group variance is calculated using the formula:

δ² = Σ(x̄ᵢ − x̄)²·nᵢ / Σnᵢ

The rule for adding variance in statistics

According to the rule of adding variances, the total variance is equal to the sum of the average of the within-group variances and the between-group variance:

σ² = σ̄² + δ²

where σ² is the total variance, σ̄² is the average of the within-group variances and δ² is the between-group variance.

The meaning of this rule is that the total variance, which arises under the influence of all factors, is equal to the sum of the variance that arises under the influence of all other factors and the variance that arises due to the grouping factor.

Using the formula for adding variances, you can determine the third unknown variance from two known variances, and also judge the strength of the influence of the grouping characteristic.
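The rule can be checked numerically on raw data. The sketch below uses two hypothetical groups of output figures (the numbers and group labels are invented for illustration), computes the average of the within-group variances and the between-group variance, and compares their sum with the total variance.

```python
from statistics import fmean, pvariance

# Hypothetical output figures for two groups of workers.
groups = {
    "trained": [92, 95, 97, 99, 102],
    "untrained": [70, 76, 80, 85, 94],
}

all_values = [x for values in groups.values() for x in values]
n_total = len(all_values)
overall_mean = fmean(all_values)

# Average of the within-group variances, weighted by group size.
within = sum(pvariance(v) * len(v) for v in groups.values()) / n_total

# Between-group variance: squared deviations of group means from the overall mean.
between = sum((fmean(v) - overall_mean) ** 2 * len(v) for v in groups.values()) / n_total

# Total variance computed directly from all values.
total = pvariance(all_values)

print(round(within, 2), round(between, 2), round(total, 2))
```

Because the three quantities are tied together, any one of them can be recovered from the other two, for example `between = total - within`.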

Dispersion properties

1. If all values of a characteristic are reduced (increased) by the same constant amount, the variance does not change.
2. If all values of a characteristic are reduced (increased) by a factor of n, the variance decreases (increases) by a factor of n².
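Both properties are easy to confirm with the standard library. The values, the shift `c` and the factor `k` below are hypothetical.

```python
from statistics import pvariance

values = [4, 7, 9, 10, 15]  # hypothetical data
c, k = 3, 2                 # assumed shift and scale factor

base = pvariance(values)
shifted = pvariance([x + c for x in values])  # property 1: adding a constant
scaled = pvariance([x * k for x in values])   # property 2: multiplying by a factor

print(base, shifted, scaled)
```

The shifted series has the same variance as the original, and the scaled series has a variance k² times larger.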

If the population is divided into groups according to the characteristic being studied, then the following types of variance can be calculated for this population: total, group (within-group), average of group (average of within-group), intergroup.

First, the empirical coefficient of determination is calculated. It shows what part of the total variation of the characteristic being studied is between-group variation, i.e. is due to the grouping characteristic:

η² = δ² / σ²

where δ² is the between-group variance and σ² is the total variance.

The empirical correlation ratio characterizes the closeness of the connection between the grouping (factor) characteristic and the resulting characteristic:

η = √(δ² / σ²)

The empirical correlation ratio can take values from 0 to 1.

To assess the closeness of the connection based on the empirical correlation ratio, you can use the Chaddock scale:

Example 4. The following data is available on the performance of work by design and survey organizations of various forms of ownership:

Determine:

1) total variance;

2) group variances;

3) the average of the group variances;

4) intergroup variance;

5) total variance based on the rule for adding variances;


6) coefficient of determination and empirical correlation ratio.

Draw conclusions.

Solution:

1. Let us determine the average volume of work performed by enterprises of two forms of ownership:

Let's calculate the total variance:

2. Determine the group averages (in million rubles):

Group variances:


3. Calculate the average of the group variances:

4. Let's determine the intergroup variance:

5. Calculate the total variance based on the rule for adding variances:

6. Let's determine the coefficient of determination:

Thus, 22% of the variation in the volume of work performed by design and survey organizations is explained by the form of ownership of the enterprises.

The empirical correlation ratio is calculated using the formula η = √η².

The value of the calculated indicator shows that the dependence of the volume of work on the form of ownership of the enterprise is small.

Example 5. As a result of a survey of the technological discipline of production areas, the following data were obtained:

Determine the coefficient of determination

Often in statistics, when analyzing a phenomenon or process, it is necessary to take into account not only the average levels of the indicators being studied, but also the scatter, or variation, in the values of individual units, which is an important characteristic of the population being studied.

Stock prices, supply and demand, and interest rates are especially subject to variation over different periods of time and in different places.

The main indicators characterizing variation are the range, the variance, the standard deviation and the coefficient of variation.

The range of variation is the difference between the maximum and minimum values of the characteristic: R = Xmax – Xmin. The disadvantage of this indicator is that it evaluates only the boundaries of variation of a characteristic and does not reflect its variability within these boundaries.

Variance does not have this shortcoming. It is calculated as the average of the squared deviations of the characteristic values from their mean:

σ² = Σ(x − x̄)² / n

A simplified way to calculate the variance uses the following formulas (simple and weighted):

σ² = Σx² / n − (Σx / n)²;   σ² = Σx²·f / Σf − (Σx·f / Σf)²

where f is the frequency. Examples of application of these formulas are presented in Problems 1 and 2.

A widely used indicator in practice is the standard deviation:

σ = √σ²

The standard deviation is defined as the square root of the variance and has the same units as the characteristic being studied.

The indicators considered so far give the absolute value of the variation, i.e. they evaluate it in the units of measurement of the characteristic being studied. Unlike them, the coefficient of variation measures variability in relative terms, relative to the average level, which in many cases is preferable.

Formula for calculating the coefficient of variation:

V = (σ / x̄) × 100%
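The four indicators (range, variance, standard deviation, coefficient of variation) can be computed together. The sketch below uses a small hypothetical sample and treats it as a whole population, so the variance divides by n rather than n − 1.

```python
from statistics import fmean, pstdev, pvariance

values = [12, 15, 11, 18, 14]  # hypothetical observations

value_range = max(values) - min(values)  # R = Xmax - Xmin
mean = fmean(values)
variance = pvariance(values)             # mean squared deviation from the mean
sigma = pstdev(values)                   # square root of the variance
cv = sigma / mean * 100                  # coefficient of variation, in percent

print(value_range, mean, variance, round(sigma, 2), round(cv, 2))
```

The range and the standard deviation are in the units of the data, while the coefficient of variation is unit-free, which is what makes it suitable for comparing populations with different average levels.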

Examples of solving problems on the topic “Indicators of variation in statistics”

Problem 1. When studying the influence of advertising on the size of the average monthly deposit in the banks of a region, two banks were examined. The following results were obtained:

Determine:
1) for each bank: a) the average monthly deposit; b) the variance of the deposit;
2) the average monthly deposit for the two banks together;
3) the variance of the deposit for the two banks that depends on advertising;
4) the variance of the deposit for the two banks that depends on all factors except advertising;
5) the total variance using the addition rule;
6) the coefficient of determination;
7) the empirical correlation ratio.

Solution

1) Let's create a calculation table for the bank with advertising. To determine the average monthly deposit, we find the midpoints of the intervals. The width of the open interval (the first) is conditionally set equal to the width of the interval adjacent to it (the second).

We find the average deposit size using the weighted arithmetic average formula:

x̄ = 29,000 / 50 = 580 rub.

We find the variance of the deposit using the formula:

σ² = 23,400 / 50 = 468

We perform similar calculations for the bank without advertising; they give an average deposit of 542.8 rub. and a variance of 636.16.

2) Let's find the average deposit size for the two banks together: x̄ = (580×50 + 542.8×50) / 100 = 561.4 rub.

3) We find the variance of the deposit for the two banks that depends on advertising using the formula σ² = pq (the formula for the variance of an alternative characteristic). Here p = 0.5 is the proportion of factors dependent on advertising and q = 1 − 0.5, so σ² = 0.5 × 0.5 = 0.25.

4) Since the share of other factors is also 0.5, the variance of the deposit for the two banks that depends on all factors except advertising is also 0.25.

5) Determine the total variance using the addition rule.

Average of the within-group variances: σ² rest = (468×50 + 636.16×50) / 100 = 552.08

Between-group variance: σ² fact = [(580 − 561.4)²×50 + (542.8 − 561.4)²×50] / 100 = 34,596 / 100 = 345.96

σ² = σ² fact + σ² rest = 345.96 + 552.08 = 898.04

6) Coefficient of determination: η² = σ² fact / σ² = 345.96 / 898.04 = 0.39 = 39%, i.e. 39% of the variation in deposit size is explained by advertising.

7) Empirical correlation ratio: η = √η² = √0.39 = 0.62, so the relationship is quite close.
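Steps 2 and 5 to 7 can be re-run from the group summaries quoted in the solution (50 observations per bank, means 580 and 542.8, variances 468 and 636.16). The dictionary layout and variable names are assumptions made for this sketch.

```python
from math import sqrt

# Group summaries quoted in the solution: (number of observations, mean deposit, variance).
banks = {
    "with advertising": (50, 580.0, 468.0),
    "without advertising": (50, 542.8, 636.16),
}

n_total = sum(n for n, _, _ in banks.values())
overall_mean = sum(n * m for n, m, _ in banks.values()) / n_total

# Average of the within-group variances (all factors except advertising).
within = sum(n * v for n, _, v in banks.values()) / n_total

# Between-group variance (variation attributable to advertising).
between = sum(n * (m - overall_mean) ** 2 for n, m, _ in banks.values()) / n_total

total = within + between          # addition rule
eta_sq = between / total          # coefficient of determination
eta = sqrt(eta_sq)                # empirical correlation ratio

print(round(overall_mean, 1), round(within, 2), round(between, 2), round(total, 2))
print(round(eta_sq, 2), round(eta, 2))
```

Rounded to two decimals, the results match the hand calculation: 552.08, 345.96, 898.04, 0.39 and 0.62.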

Problem 2. There is a grouping of enterprises according to the value of marketable products:

Determine: 1) the variance of the value of marketable products; 2) the standard deviation; 3) the coefficient of variation.

Solution

1) By condition, an interval distribution series is given. It must be converted to a discrete one, that is, we find the midpoint of each interval (x'). For closed intervals, the midpoint is the simple arithmetic mean of the bounds. For the group that has only an upper limit, it is the difference between this upper limit and half the width of the next interval (200 − (400 − 200)/2 = 100).

For the group that has only a lower limit, it is the sum of this lower limit and half the width of the previous interval (800 + (800 − 600)/2 = 900).

We calculate the average value of marketable products using the formula:

x̄ = k × (Σ((x' − a)/k × f) / Σf) + a. Here a = 500 is the value of the variant with the highest frequency and k = 600 − 400 = 200 is the width of the interval with the highest frequency. Let's put the result in the table:

So, the average value of marketable products for the period under study is x̄ = (−5/37) × 200 + 500 = 472.97 thousand rubles.

2) We find the variance using the following formula:

σ² = (33/37) × 200² − (472.97 − 500)² = 35,675.68 − 730.62 = 34,945.06

3) Standard deviation: σ = √σ² = √34,945.06 ≈ 186.94 thousand rubles.

4) Coefficient of variation: V = (σ / x̄) × 100 = (186.94 / 472.97) × 100 = 39.52%
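The arithmetic of this solution can be re-checked from the three sums that appear in it: Σf = 37, Σ((x' − a)/k)·f = −5 and Σ((x' − a)/k)²·f = 33. The calculation table itself is not reproduced here, so the sketch starts from those sums; the variable names are assumptions.

```python
from math import sqrt

# Sums taken from the worked solution.
sum_f = 37    # sum of frequencies
sum_uf = -5   # sum of ((x' - a) / k) * f
sum_u2f = 33  # sum of ((x' - a) / k) ** 2 * f
a, k = 500, 200  # conventional zero and interval width

m1 = sum_uf / sum_f   # first-order moment of the coded values
m2 = sum_u2f / sum_f  # second-order moment of the coded values

mean = k * m1 + a
variance = k ** 2 * (m2 - m1 ** 2)
sigma = sqrt(variance)
cv = sigma / mean * 100

print(round(mean, 2), round(variance, 2), round(sigma, 2), round(cv, 2))
```

Without intermediate rounding the variance comes out a fraction higher than in the hand calculation (about 34,945.2), while the mean, standard deviation and coefficient of variation agree to two decimals.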

Along with studying the variation of a characteristic across the population as a whole, it is often necessary to trace quantitative changes in the characteristic within the groups into which the population is divided, as well as between the groups. This is done by calculating and analyzing different types of variance.
There are total, between-group and within-group variances.
The total variance σ² measures the variation of a characteristic throughout the entire population under the influence of all factors that caused this variation.

The between-group variance (δ²) characterizes systematic variation, i.e. differences in the value of the studied characteristic that arise under the influence of the factor characteristic that forms the basis of the grouping. It is calculated using the formula:
δ² = Σ(x̄ᵢ − x̄)²·nᵢ / Σnᵢ,
where x̄ᵢ is the mean of group i, x̄ is the overall mean and nᵢ is the number of units in group i.

The within-group variance (σᵢ²) reflects random variation, i.e. the part of the variation that occurs under the influence of unaccounted factors and does not depend on the factor characteristic that forms the basis of the grouping. It is calculated by the formula:
σᵢ² = Σ(x − x̄ᵢ)² / nᵢ.

Average of within-group variances: σ̄² = Σσᵢ²·nᵢ / Σnᵢ.

There is a law connecting the three types of variance. The total variance is equal to the sum of the average of the within-group variances and the between-group variance: σ² = σ̄² + δ².
This relation is called the rule for adding variances.

A widely used indicator in analysis is the proportion of between-group variance in the total variance. It is called the empirical coefficient of determination (η²): η² = δ² / σ².
The square root of the empirical coefficient of determination is called the empirical correlation ratio (η):
η = √(δ² / σ²).
It characterizes the influence of the characteristic that forms the basis of the grouping on the variation of the resulting characteristic. The empirical correlation ratio ranges from 0 to 1.
Let us demonstrate its practical use with the following example (Table 1).

Example No. 1. Table 1 - Labor productivity of two groups of workers in one of the workshops of NPO "Cyclone"

Let's calculate the overall and group means and variances:




The initial data for calculating the average of the within-group variances and the between-group variance are presented in Table 2.

Table 2. Calculation of σ̄² and δ² for two groups of workers

Worker groups                                     Number of workers, people   Average, parts/shift   Variance
Completed technical training                      5                           95                     42.0
Those who have not completed technical training   5                           81                     231.2
All workers                                       10                          88                     185.6

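Let's calculate the indicators. Average of within-group variances:
σ̄² = (42.0×5 + 231.2×5) / 10 = 136.6.
Between-group variance:
δ² = [(95 − 88)²×5 + (81 − 88)²×5] / 10 = 49.
Total variance:
σ² = σ̄² + δ² = 136.6 + 49 = 185.6.
Thus, the empirical correlation ratio: η = √(49 / 185.6) ≈ 0.51.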

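Along with variation in quantitative characteristics, variation in qualitative characteristics can also be observed. It is studied by calculating the following types of variance of a share.

The within-group variance of the share is determined by the formula

σ²(pᵢ) = pᵢ(1 − pᵢ)

The average of the within-group variances:

σ̄²(p) = Σpᵢ(1 − pᵢ)·nᵢ / Σnᵢ

The between-group variance:

δ²(p) = Σ(pᵢ − p̄)²·nᵢ / Σnᵢ

where nᵢ is the number of units in the separate groups, pᵢ is the share of the characteristic in group i, and p̄ is the share of the studied characteristic in the entire population, which is determined by the formula:

p̄ = Σpᵢ·nᵢ / Σnᵢ

The total variance of the share is σ²(p) = p̄(1 − p̄). The three types of variance are related to each other as follows:

σ²(p) = σ̄²(p) + δ²(p)

This relation is called the theorem of addition of variances of the share of a characteristic.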

The main generalizing indicators of variation in statistics are the variance and the standard deviation.

Variance is the arithmetic mean of the squared deviations of each value of the characteristic from the overall mean. It is usually called the mean square of deviations and is denoted by σ². Depending on the source data, the variance can be calculated as a simple or a weighted arithmetic mean:

σ² = Σ(x − x̄)² / n (unweighted (simple) variance);

σ² = Σ(x − x̄)²·f / Σf (weighted variance).

The standard deviation is a generalizing characteristic of the absolute size of the variation of a characteristic in the population. It is expressed in the same units of measurement as the characteristic (meters, tons, percent, hectares, etc.).

The standard deviation is the square root of the variance and is denoted by σ:

σ = √(Σ(x − x̄)² / n) (unweighted standard deviation);

σ = √(Σ(x − x̄)²·f / Σf) (weighted standard deviation).

The standard deviation is a measure of the reliability of the mean. The smaller the standard deviation, the better the arithmetic mean represents the whole population.

The calculation of the standard deviation is preceded by the calculation of the variance.

The procedure for calculating the weighted variance is as follows:

1) determine the weighted arithmetic mean: x̄ = Σx·f / Σf;

2) calculate the deviations of the variants from the mean: x − x̄;

3) square the deviation of each variant from the mean: (x − x̄)²;

4) multiply the squared deviations by the weights (frequencies): (x − x̄)²·f;

5) sum the resulting products: Σ(x − x̄)²·f;

6) divide the resulting sum by the sum of the weights: σ² = Σ(x − x̄)²·f / Σf.
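The six steps translate almost line for line into code. The variants and frequencies below are hypothetical and stand in for a real discrete series.

```python
from math import sqrt

values = [2, 3, 4, 5, 6]    # hypothetical variants x
weights = [1, 4, 8, 5, 2]   # hypothetical frequencies f

# 1) weighted arithmetic mean
mean = sum(x * f for x, f in zip(values, weights)) / sum(weights)
# 2) deviations of the variants from the mean
deviations = [x - mean for x in values]
# 3) squared deviations
squared = [d ** 2 for d in deviations]
# 4) squared deviations multiplied by the weights
weighted_sq = [s * f for s, f in zip(squared, weights)]
# 5) sum of the products
total = sum(weighted_sq)
# 6) division by the sum of the weights gives the variance
variance = total / sum(weights)

sigma = sqrt(variance)  # standard deviation
print(round(mean, 2), round(variance, 4), round(sigma, 4))
```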

Example 2.1

Let's calculate the weighted arithmetic mean:

The values of the deviations from the mean and their squares are presented in the table. Let's determine the variance:

The standard deviation will be equal to:

If the source data are presented as an interval distribution series, you first need to determine the discrete value of the characteristic (the interval midpoint) and then apply the described method.

Example 2.2

Let us show the calculation of variance for an interval series using data on the distribution of the sown area of a collective farm by wheat yield.

The arithmetic mean is:

Let's calculate the variance:

6.3. Calculation of variance using a formula based on individual data

The technique for calculating variance is laborious, and with large values of the variants and frequencies it can be cumbersome. The calculations can be simplified using the properties of variance.

Variance has the following properties.

1. Reducing or increasing the weights (frequencies) of a varying characteristic by a certain factor does not change the variance.

2. Decreasing or increasing each value of the characteristic by the same constant amount A does not change the variance.

3. Decreasing or increasing each value of the characteristic by a factor of k respectively reduces or increases the variance by a factor of k², and the standard deviation σ by a factor of k.

4. The mean squared deviation of a characteristic from an arbitrary value A is greater than the variance about the arithmetic mean by the square of the difference between the mean and the arbitrary value:

σ²(A) = Σ(x − A)² / n = σ² + (x̄ − A)²

If A = 0, then we arrive at the following equality:

σ² = Σx² / n − x̄²

that is, the variance of the characteristic is equal to the difference between the mean square of the characteristic values and the square of the mean.

Each property can be used independently or in combination with others when calculating variance.
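The fourth property states that the mean squared deviation from an arbitrary value A exceeds the variance by (x̄ − A)², and that setting A = 0 gives the "mean of squares minus square of the mean" shortcut. It can be verified on a small hypothetical series; the reference value A = 6 is an arbitrary choice.

```python
from statistics import fmean, pvariance

values = [3, 5, 8, 10, 14]  # hypothetical data
a = 6                       # arbitrary reference value A

mean = fmean(values)
variance = pvariance(values)

# Mean squared deviation from the arbitrary value A.
msd_about_a = fmean([(x - a) ** 2 for x in values])

# Special case A = 0: mean of the squares minus the square of the mean.
shortcut = fmean([x ** 2 for x in values]) - mean ** 2

print(variance, msd_about_a, round(variance + (mean - a) ** 2, 6), round(shortcut, 6))
```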

The procedure for calculating the variance by this formula is simple:

1) determine the arithmetic mean: x̄ = Σx / n;

2) square the arithmetic mean: x̄²;

3) square each variant of the series: xᵢ²;

4) find the sum of the squares of the variants: Σxᵢ²;

5) divide the sum of the squares of the variants by their number, i.e. determine the mean square: Σxᵢ² / n;

6) determine the difference between the mean square of the characteristic and the square of the mean: σ² = Σxᵢ² / n − x̄².

Example 3.1 The following data is available on worker productivity:

Let's make the following calculations: