Three-dimensional least squares method. Approximation of experimental data

Which finds the widest application in various fields of science and practice. It can be physics, chemistry, biology, economics, sociology, psychology and so on and so forth. By the will of fate, I often have to deal with the economy, and therefore today I will arrange for you a ticket to an amazing country called Econometrics=) ...How can you not want it?! It’s very good there – you just need to make up your mind! …But what you probably definitely want is to learn how to solve problems least squares method. And especially diligent readers will learn to solve them not only accurately, but also VERY FAST ;-) But first general statement of the problem+ accompanying example:

Let indicators be studied in some subject area that have a quantitative expression. At the same time, there is every reason to believe that the indicator depends on the indicator. This assumption can be both a scientific hypothesis and based on elementary common sense. Let's leave science aside, however, and explore more appetizing areas - namely, grocery stores. Denote by:

– retail space of a grocery store, sq.m.,
- annual turnover of a grocery store, million rubles.

It is quite clear that the larger the area of the store, the greater its turnover in most cases.

Suppose that after conducting observations / experiments / calculations / dancing with a tambourine, we have at our disposal numerical data:

With grocery stores, I think everything is clear: - this is the area of the 1st store, - its annual turnover, - the area of the 2nd store, - its annual turnover, etc. By the way, it is not at all necessary to have access to classified materials - a fairly accurate assessment of the turnover can be obtained using mathematical statistics. However, do not be distracted, the course of commercial espionage is already paid =)

Tabular data can also be written in the form of points and depicted in the usual way for us. Cartesian system .

Let's answer an important question: how many points are needed for a qualitative study?

The bigger, the better. The minimum admissible set consists of 5-6 points. In addition, with a small amount of data, “abnormal” results should not be included in the sample. So, for example, a small elite store can earn orders of magnitude more than “its colleagues,” thereby distorting the general pattern that you need to find!

To put it very simply, we need to select a function, schedule which passes as close as possible to the points . Such a function is called approximating (approximation - approximation) or theoretical function . Generally speaking, an obvious “contender” immediately appears here - a high-degree polynomial, the graph of which passes through ALL points. But this option is complicated and often simply incorrect. (since the graph will “loop” all the time and poorly reflect the main trend).

Thus, the sought function must be quite simple and at the same time adequately reflect the dependence. As you might guess, one of the methods for finding such functions is called least squares method. First, let's look at its essence in general terms. Let some function approximate experimental data:

How to evaluate the accuracy of this approximation? Let us also calculate the differences (deviations) between the experimental and functional values (we study the drawing). The first thought that comes to mind is to estimate how large the sum is, but the problem is that the differences can be negative (For example, ) and deviations as a result of such summation will cancel each other out. Therefore, as an estimate of the accuracy of the approximation, it begs to take the sum modules deviations:

or in folded form: (in case anyone doesn’t know: – this is the sum icon, and – an auxiliary “counter” variable, which takes values from 1 to ).

By approximating experimental points with different functions, we will obtain different values, and obviously, where this sum is smaller, that function is more accurate.

Such a method exists and it is called least modulus method. However, in practice it has become much more widespread least square method, in which possible negative values are eliminated not by the module, but by squaring the deviations:

, after which efforts are aimed at selecting a function such that the sum of squared deviations was as small as possible. Actually, hence the name of the method.

And now we return to another important point: as noted above, the selected function should be quite simple - but there are also many such functions: linear , hyperbolic, exponential, logarithmic, quadratic etc. And, of course, here I would immediately like to "reduce the field of activity." What class of functions to choose for research? Primitive but effective technique:

- The easiest way to draw points on the drawing and analyze their location. If they tend to be in a straight line, then you should look for straight line equation with optimal values and . In other words, the task is to find SUCH coefficients - so that the sum of the squared deviations is the smallest.

If the points are located, for example, along hyperbole, then it is obviously clear that the linear function will give a poor approximation. In this case, we are looking for the most “favorable” coefficients for the hyperbola equation – those that give the minimum sum of squares .

Now note that in both cases we are talking about functions of two variables, whose arguments are searched dependency parameters:

And essentially we need to solve a standard problem - find minimum function of two variables.

Recall our example: suppose that the "shop" points tend to be located in a straight line and there is every reason to believe the presence linear dependence turnover from retail space. Let's find SUCH coefficients “a” and “be” such that the sum of squared deviations was the smallest. Everything is as usual - first 1st order partial derivatives. According to linearity rule You can differentiate right under the sum icon:

If you want to use this information for an essay or a term paper, I will be very grateful for the link in the list of sources, you will not find such detailed calculations anywhere:

Let's create a standard system:

We reduce each equation by “two” and, in addition, “break up” the sums:

Note : independently analyze why "a" and "be" can be taken out of the sum icon. By the way, formally this can be done with the sum

Let's rewrite the system in “applied” form:

after which the algorithm for solving our problem begins to emerge:

Do we know the coordinates of the points? We know. Amounts can we find it? Easily. Let's make the simplest system of two linear equations in two unknowns(“a” and “be”). We solve the system, for example, Cramer's method, as a result of which we obtain a stationary point. Checking sufficient condition for an extremum, we can verify that at this point the function reaches exactly minimum. Verification is associated with additional calculations and therefore we will leave it behind the scenes. (if necessary, the missing frame can be viewed). We draw the final conclusion:

Function the best way (at least compared to any other linear function) brings experimental points closer . Roughly speaking, its graph passes as close as possible to these points. In tradition econometrics the resulting approximating function is also called paired linear regression equation .

The problem under consideration is of great practical importance. In the situation with our example, the equation allows you to predict what kind of turnover ("Igrek") will be at the store with one or another value of the selling area (one or another meaning of "x"). Yes, the resulting forecast will be only a forecast, but in many cases it will turn out to be quite accurate.

I will analyze just one problem with "real" numbers, since there are no difficulties in it - all calculations are at the level of the school curriculum in grades 7-8. In 95 percent of cases, you will be asked to find just a linear function, but at the very end of the article I will show that it is no more difficult to find the equations for the optimal hyperbola, exponent, and some other functions.

In fact, it remains to distribute the promised goodies - so that you learn how to solve such examples not only accurately, but also quickly. We carefully study the standard:

Task

As a result of studying the relationship between two indicators, the following pairs of numbers were obtained:

Using the least squares method, find the linear function that best approximates the empirical (experienced) data. Make a drawing on which, in a Cartesian rectangular coordinate system, plot experimental points and a graph of the approximating function . Find the sum of squared deviations between empirical and theoretical values. Find out if the function is better (in terms of the least squares method) approximate experimental points.

Note that "x" values are natural values, and this has a characteristic meaningful meaning, which I will talk about a little later; but they, of course, can be fractional. In addition, depending on the content of a particular task, both "X" and "G" values can be fully or partially negative. Well, we have been given a “faceless” task, and we start it solution:

We find the coefficients of the optimal function as a solution to the system:

For the purposes of a more compact notation, the “counter” variable can be omitted, since it is already clear that the summation is carried out from 1 to .

It is more convenient to calculate the required amounts in a tabular form:

Calculations can be carried out on a microcalculator, but it is much better to use Excel - both faster and without errors; watch a short video:

Thus, we get the following system:

Here you can multiply the second equation by 3 and subtract the 2nd from the 1st equation term by term. But this is luck - in practice, systems are often not gifted, and in such cases it saves Cramer's method:
, which means the system has a unique solution.

Let's check. I understand that I don’t want to, but why skip mistakes where you can absolutely not miss them? Let us substitute the found solution into the left side of each equation of the system:

The right parts of the corresponding equations are obtained, which means that the system is solved correctly.

Thus, the desired approximating function: – from all linear functions It is she who best approximates the experimental data.

Unlike straight dependence of the store's turnover on its area, the found dependence is reverse (principle “the more, the less”), and this fact is immediately revealed by the negative angular coefficient. Function informs us that with an increase in a certain indicator by 1 unit, the value of the dependent indicator decreases average by 0.65 units. As they say, the higher the price of buckwheat, the less it is sold.

To plot the graph of the approximating function, we find its two values:

and execute the drawing:

The constructed straight line is called trend line (namely, a linear trend line, i.e. in the general case, a trend is not necessarily a straight line). Everyone is familiar with the expression "to be in trend", and I think that this term does not need additional comments.

Let's calculate the sum of squared deviations between empirical and theoretical values. Geometrically, this is the sum of the squares of the lengths of the “raspberry” segments (two of which are so small that they are not even visible).

Let's summarize the calculations in a table:

They can again be carried out manually, just in case I will give an example for the 1st point:

but it is much more effective to do it in the already known way:

Let's repeat: What is the meaning of the result obtained? From all linear functions y function the exponent is the smallest, that is, it is the best approximation in its family. And here, by the way, the final question of the problem is not accidental: what if the proposed exponential function would it be better to bring the experimental points closer?

Let's find the corresponding sum of squared deviations - to distinguish them, I will designate them with the letter "epsilon". The technique is exactly the same:

And again, just in case, the calculations for the 1st point:

In Excel we use the standard function EXP (syntax can be found in Excel Help).

Conclusion: , so the exponential function approximates the experimental points worse than the straight line .

But here it should be noted that “worse” is doesn't mean yet, what is wrong. Now I built a graph of this exponential function - and it also passes close to the points - so much so that without an analytical study it is difficult to say which function is more accurate.

This completes the solution, and I return to the question of the natural values of the argument. In various studies, as a rule, economic or sociological, months, years or other equal time intervals are numbered with natural "X". Consider, for example, the following problem.

Least square method

In the final lesson of the topic, we will get acquainted with the most famous application FNP, which finds the widest application in various fields of science and practice. It can be physics, chemistry, biology, economics, sociology, psychology and so on and so forth. By the will of fate, I often have to deal with the economy, and therefore today I will arrange for you a ticket to an amazing country called Econometrics=) ...How can you not want it?! It’s very good there – you just need to make up your mind! …But what you probably definitely want is to learn how to solve problems least squares method. And especially diligent readers will learn to solve them not only accurately, but also VERY FAST ;-) But first general statement of the problem+ accompanying example:

– retail space of a grocery store, sq.m.,
- annual turnover of a grocery store, million rubles.

It is quite clear that the larger the area of the store, the greater its turnover in most cases.

Tabular data can also be written in the form of points and depicted in the usual way for us. Cartesian system .

Let's answer an important question: how many points are needed for a qualitative study?

To put it very simply, we need to select a function, schedule which passes as close as possible to the points . Such a function is called approximating (approximation - approximation) or theoretical function . Generally speaking, an obvious “contender” immediately appears here - a high-degree polynomial, the graph of which passes through ALL points. But this option is complicated and often simply incorrect. (since the graph will “loop” all the time and poorly reflect the main trend).

Thus, the sought function must be quite simple and at the same time adequately reflect the dependence. As you might guess, one of the methods for finding such functions is called least squares method. First, let's look at its essence in general terms. Let some function approximate experimental data:

How to evaluate the accuracy of this approximation? Let us also calculate the differences (deviations) between the experimental and functional values (we study the drawing). The first thought that comes to mind is to estimate how large the sum is, but the problem is that the differences can be negative (For example, ) and deviations as a result of such summation will cancel each other out. Therefore, as an estimate of the accuracy of the approximation, it begs to take the sum modules deviations:

or in folded form: (for those who don't know: is the sum icon, and – an auxiliary “counter” variable, which takes values from 1 to ) .

Approximating the experimental points with different functions, we will get different values, and it is obvious where this sum is less - that function is more accurate.

, after which efforts are directed to the selection of such a function that the sum of the squared deviations was as small as possible. Actually, hence the name of the method.

And now we return to another important point: as noted above, the selected function should be quite simple - but there are also many such functions: linear , hyperbolic , exponential , logarithmic , quadratic etc. And, of course, here I would immediately like to "reduce the field of activity." What class of functions to choose for research? Primitive but effective technique:

Now note that in both cases we are talking about functions of two variables, whose arguments are searched dependency parameters:

And essentially we need to solve a standard problem - find minimum function of two variables.

If you want to use this information for an essay or a term paper, I will be very grateful for the link in the list of sources, you will not find such detailed calculations anywhere:

Let's create a standard system:

We reduce each equation by “two” and, in addition, “break up” the sums:

Note : independently analyze why "a" and "be" can be taken out of the sum icon. By the way, formally this can be done with the sum

Let's rewrite the system in “applied” form:

after which the algorithm for solving our problem begins to emerge:

In fact, it remains to distribute the promised goodies - so that you learn how to solve such examples not only accurately, but also quickly. We carefully study the standard:

Task

As a result of studying the relationship between two indicators, the following pairs of numbers were obtained:

Using the least squares method, find the linear function that best approximates the empirical (experienced) data. Make a drawing on which, in a Cartesian rectangular coordinate system, plot experimental points and a graph of the approximating function . Find the sum of squared deviations between empirical and theoretical values. Find out if the feature would be better (in terms of the least squares method) approximate experimental points.

We find the coefficients of the optimal function as a solution to the system:

For the purposes of a more compact notation, the “counter” variable can be omitted, since it is already clear that the summation is carried out from 1 to .

Thus, we get the following system:

Thus, the desired approximating function: – from all linear functions It is she who best approximates the experimental data.

Unlike straight dependence of the store's turnover on its area, the found dependence is reverse (principle “the more, the less”), and this fact is immediately revealed by the negative angular coefficient. The function tells us that with an increase in a certain indicator by 1 unit, the value of the dependent indicator decreases average by 0.65 units. As they say, the higher the price of buckwheat, the less it is sold.

To plot the graph of the approximating function, we find its two values:

and execute the drawing:

The constructed straight line is called trend line (namely, a linear trend line, i.e. in the general case, a trend is not necessarily a straight line). Everyone is familiar with the expression "to be in trend", and I think that this term does not need additional comments.

Let's repeat: What is the meaning of the result obtained? From all linear functions the function has the smallest exponent, that is, in its family, this is the best approximation. And here, by the way, the final question of the problem is not accidental: what if the proposed exponential function would it be better to bring the experimental points closer?

Conclusion: , which means that the exponential function approximates the experimental points worse than the straight line.

But here it should be noted that “worse” is doesn't mean yet, what is wrong. Now I have built a graph of this exponential function - and it also passes close to the points - so much so that without analytical research it is difficult to say which function is more accurate.

This concludes the solution, and I return to the question of the natural values of the argument. In various studies, usually economic or sociological, natural “X’s” are used to number months, years or other equal time intervals. Consider, for example, the following problem:

The following data is available on the store’s retail turnover for the first half of the year:

Using analytical straight line alignment, determine the volume of turnover for July.

Yes, no problem: we number the months 1, 2, 3, 4, 5, 6 and use the usual algorithm, as a result of which we get an equation - the only thing is that when it comes to time, they usually use the letter “te” (although it's not critical). The resulting equation shows that in the first half of the year trade turnover increased by an average of 27.74 units. per month. Get a forecast for July (month no. 7): d.e.

And there are countless tasks like this. Those who wish can use an additional service, namely my Excel calculator (demo version), which solves the analyzed problem almost instantly! Working version of the program is available in exchange or for symbolic payment.

At the end of the lesson, brief information about finding dependencies of some other types. Actually, there’s not much to tell, since the fundamental approach and solution algorithm remain the same.

Let us assume that the arrangement of the experimental points resembles a hyperbola. Then, to find the coefficients of the best hyperbola, you need to find the minimum of the function - anyone can carry out detailed calculations and arrive at a similar system:

From a formal technical point of view, it is obtained from a “linear” system (let's denote it with an asterisk) replacing "x" with . Well, the amounts calculate, after which to the optimal coefficients “a” and “be” close at hand.

If there is every reason to believe that the points are located along a logarithmic curve, then to find the optimal values we find the minimum of the function . Formally, in the system (*) should be replaced by:

When calculating in Excel, use the function LN. I confess that it will not be difficult for me to create calculators for each of the cases under consideration, but it will still be better if you "program" the calculations yourself. Video tutorials to help.

With exponential dependence, the situation is slightly more complicated. To reduce the matter to the linear case, we take the logarithm of the function and use properties of the logarithm:

Now, comparing the obtained function with the linear function , we come to the conclusion that in the system (*) must be replaced by , and – by . For convenience, let's denote:

Please note that the system is resolved with respect to and , and therefore, after finding the roots, you must not forget to find the coefficient itself.

To approximate experimental points optimal parabola , should be found minimum of a function of three variables . After performing standard actions, we get the following "working" system:

Yes, of course, there are more amounts here, but there are no difficulties at all when using your favorite application. And finally, I’ll tell you how to quickly check using Excel and build the desired trend line: create a scatter chart, select any of the points with the mouse and right click select option "Add trend line". Next, select the type of chart and on the tab "Options" activate the option "Show equation on chart". OK

As always, I want to end the article with some beautiful phrase, and I almost typed “Be in trend!”. But he changed his mind in time. And not because it's formulaic. I don't know how anyone, but I don't want to follow the promoted American and especially European trend at all =) Therefore, I wish each of you to stick to your own line!

http://www.grandars.ru/student/vysshaya-matematika/metod-naimenshih-kvadratov.html

The least squares method is one of the most common and most developed due to its simplicity and efficiency of methods for estimating the parameters of linear econometric models. At the same time, certain caution should be observed when using it, since the models built using it may not meet a number of requirements for the quality of their parameters and, as a result, not “well” reflect the patterns of process development.

Let us consider the procedure for estimating the parameters of a linear econometric model using the least squares method in more detail. Such a model in general form can be represented by equation (1.2):

y t = a 0 + a 1 x 1t +...+ a n x nt + ε t.

The initial data when estimating the parameters a 0 , a 1 ,..., a n is a vector of values of the dependent variable y= (y 1 , y 2 , ... , y T)" and the matrix of values of independent variables

in which the first column, consisting of ones, corresponds to the model coefficient.

The least squares method received its name based on the basic principle that the parameter estimates obtained on its basis must satisfy: the sum of squares of the model error should be minimal.

Examples of solving problems by the least squares method

Example 2.1. The trading enterprise has a network of 12 stores, information on the activities of which is presented in table. 2.1.

The management of the enterprise would like to know how the size of the annual turnover depends on the retail space of the store.

Table 2.1

Store number	Annual turnover, million rubles	Retail area, thousand m2
	19,76	0,24
	38,09	0,31
	40,95	0,55
	41,08	0,48
	56,29	0,78
	68,51	0,98
	75,01	0,94
	89,05	1,21
	91,13	1,29
	91,26	1,12
	99,84	1,29
	108,55	1,49

Least squares solution. Let us designate - the annual turnover of the -th store, million rubles; - selling area of the th store, thousand m 2.

Fig.2.1. Scatterplot for Example 2.1

To determine the form of the functional relationship between the variables and we will construct a scatter diagram (Fig. 2.1).

Based on the scatter diagram, we can conclude that annual turnover is positively dependent on retail space (i.e., y will increase with increasing ). The most appropriate form of functional connection is linear.

Information for further calculations is presented in Table. 2.2. Using the least squares method, we estimate the parameters of a linear one-factor econometric model

Table 2.2

t	y t	x 1t	y t 2	x1t2	x 1t y t

	19,76	0,24	390,4576	0,0576	4,7424
	38,09	0,31	1450,8481	0,0961	11,8079
	40,95	0,55	1676,9025	0,3025	22,5225
	41,08	0,48	1687,5664	0,2304	19,7184
	56,29	0,78	3168,5641	0,6084	43,9062
	68,51	0,98	4693,6201	0,9604	67,1398
	75,01	0,94	5626,5001	0,8836	70,5094
	89,05	1,21	7929,9025	1,4641	107,7505
	91,13	1,29	8304,6769	1,6641	117,5577
	91,26	1,12	8328,3876	1,2544	102,2112
	99,84	1,29	9968,0256	1,6641	128,7936
	108,55	1,49	11783,1025	2,2201	161,7395
S	819,52	10,68	65008,554	11,4058	858,3991
Average	68,29	0,89

Thus,

Therefore, with an increase in retail space by 1 thousand m2, other things being equal, the average annual turnover increases by 67.8871 million rubles.

Example 2.2. The company's management noticed that the annual turnover depends not only on the store's sales area (see example 2.1), but also on the average number of visitors. The relevant information is presented in table. 2.3.

Table 2.3

Solution. Denote - the average number of visitors to the -th store per day, thousand people.

To determine the form of the functional relationship between the variables and we will construct a scatter diagram (Fig. 2.2).

Based on the scatterplot, we can conclude that annual turnover is positively dependent on the average number of visitors per day (i.e., y will increase with increasing ). The form of functional dependence is linear.

Rice. 2.2. Scatterplot for example 2.2

Table 2.4

t	x 2t	x 2t 2	yt x 2t	x 1t x 2t

	8,25	68,0625	163,02	1,98
	10,24	104,8575	390,0416	3,1744
	9,31	86,6761	381,2445	5,1205
	11,01	121,2201	452,2908	5,2848
	8,54	72,9316	480,7166	6,6612
	7,51	56,4001	514,5101	7,3598
	12,36	152,7696	927,1236	11,6184
	10,81	116,8561	962,6305	13,0801
	9,89	97,8121	901,2757	12,7581
	13,72	188,2384	1252,0872	15,3664
	12,27	150,5529	1225,0368	15,8283
	13,92	193,7664	1511,016	20,7408
S	127,83	1410,44	9160,9934	118,9728
Average	10,65

In general, it is necessary to determine the parameters of a two-factor econometric model

y t = a 0 + a 1 x 1t + a 2 x 2t + ε t

The information required for further calculations is presented in Table. 2.4.

Let us estimate the parameters of a linear two-factor econometric model using the least squares method.

Thus,

Estimation of the coefficient =61.6583 shows that, other things being equal, with an increase in retail space by 1 thousand m 2, the annual turnover will increase by an average of 61.6583 million rubles.

The coefficient estimate = 2.2748 shows that, other things being equal, with an increase in the average number of visitors per 1 thousand people. per day, the annual turnover will increase by an average of 2.2748 million rubles.

Example 2.3. Using the information presented in table. 2.2 and 2.4, estimate the parameter of a single-factor econometric model

where is the centered value of the annual turnover of the -th store, million rubles; - centered value of the average daily number of visitors to the t-th store, thousand people. (see examples 2.1-2.2).

Solution. Additional information required for calculations is presented in Table. 2.5.

Table 2.5



	-48,53	-2,40	5,7720	116,6013
	-30,20	-0,41	0,1702	12,4589
	-27,34	-1,34	1,8023	36,7084
	-27,21	0,36	0,1278	-9,7288
	-12,00	-2,11	4,4627	25,3570
	0,22	-3,14	9,8753	-0,6809
	6,72	1,71	2,9156	11,4687
	20,76	0,16	0,0348	3,2992
	22,84	-0,76	0,5814	-17,413
	22,97	3,07	9,4096	70,4503
	31,55	1,62	2,6163	51,0267
	40,26	3,27	10,6766	131,5387
Amount			48,4344	431,0566

Using formula (2.35), we obtain

Thus,

http://www.cleverstudents.ru/articles/mnk.html

Example.

Experimental data on the values of variables X And at are given in the table.

As a result of their alignment, the function

Using least square method, approximate these data with a linear dependence y=ax+b(find parameters A And b). Find out which of the two lines is better (in the sense of the least squares method) aligns the experimental data. Make a drawing.

Solution.

In our example n=5. We fill in the table for the convenience of calculating the amounts that are included in the formulas of the required coefficients.

The values in the fourth row of the table are obtained by multiplying the values of the 2nd row by the values of the 3rd row for each number i.

The values in the fifth row of the table are obtained by squaring the values of the 2nd row for each number i.

The values of the last column of the table are the sums of the values across the rows.

We use the formulas of the least squares method to find the coefficients A And b. We substitute in them the corresponding values from the last column of the table:

Hence, y = 0.165x+2.184 is the desired approximating straight line.

It remains to find out which of the lines y = 0.165x+2.184 or better approximates the original data, that is, makes an estimate using the least squares method.

Proof.

So that when found A And b function takes the smallest value, it is necessary that at this point the matrix of the quadratic form of the second order differential for the function was positive definite. Let's show it.

The second order differential has the form:

That is

Therefore, the matrix of the quadratic form has the form

and the values of the elements do not depend on A And b.

Let us show that the matrix is positive definite. This requires that the angle minors be positive.

Angular minor of the first order . The inequality is strict, since the points

After alignment, we get a function of the following form: g (x) = x + 1 3 + 1 .

We can approximate this data using the linear relationship y = a x + b by calculating the corresponding parameters. To do this, we will need to apply the so-called least squares method. You will also need to make a drawing to check which line will best align the experimental data.

Yandex.RTB R-A-339285-1

What exactly is OLS (least squares method)

The main thing we need to do is to find such coefficients of linear dependence at which the value of the function of two variables F (a, b) = ∑ i = 1 n (y i - (a x i + b)) 2 will be the smallest. In other words, for certain values of a and b, the sum of the squared deviations of the presented data from the resulting straight line will have a minimum value. This is the meaning of the least squares method. All we need to do to solve the example is to find the extremum of the function of two variables.

How to derive formulas for calculating coefficients

In order to derive formulas for calculating coefficients, you need to create and solve a system of equations with two variables. To do this, we calculate the partial derivatives of the expression F (a, b) = ∑ i = 1 n (y i - (a x i + b)) 2 with respect to a and b and equate them to 0.

δ F (a , b) δ a = 0 δ F (a , b) δ b = 0 ⇔ - 2 ∑ i = 1 n (y i - (a x i + b)) x i = 0 - 2 ∑ i = 1 n ( y i - (a x i + b)) = 0 ⇔ a ∑ i = 1 n x i 2 + b ∑ i = 1 n x i = ∑ i = 1 n x i y i a ∑ i = 1 n x i + ∑ i = 1 n b = ∑ i = 1 n y i ⇔ a ∑ i = 1 n x i 2 + b ∑ i = 1 n x i = ∑ i = 1 n x i y i a ∑ i = 1 n x i + n b = ∑ i = 1 n y i

To solve a system of equations, you can use any methods, for example, substitution or Cramer's method. As a result, we should have formulas that can be used to calculate coefficients using the least squares method.

n ∑ i = 1 n x i y i - ∑ i = 1 n x i ∑ i = 1 n y i n ∑ i = 1 n - ∑ i = 1 n x i 2 b = ∑ i = 1 n y i - a ∑ i = 1 n x i n

We have calculated the values of the variables for which the function
F (a , b) = ∑ i = 1 n (y i - (a x i + b)) 2 will take the minimum value. In the third paragraph, we will prove why it is so.

This is the application of the least squares method in practice. Its formula, which is used to find the parameter a, includes ∑ i = 1 n x i, ∑ i = 1 n y i, ∑ i = 1 n x i y i, ∑ i = 1 n x i 2, as well as the parameter
n - it denotes the amount of experimental data. We advise you to calculate each amount separately. The coefficient value b is calculated immediately after a .

Let's go back to the original example.

Example 1

Here we have n equal to five. To make it more convenient to calculate the required amounts included in the coefficient formulas, let’s fill out the table.

	i = 1	i=2	i=3	i=4	i=5	∑ i = 1 5
x i	0	1	2	4	5	12
y i	2 , 1	2 , 4	2 , 6	2 , 8	3	12 , 9
x i y i	0	2 , 4	5 , 2	11 , 2	15	33 , 8
x i 2	0	1	4	16	25	46

Solution

The fourth row includes the data obtained by multiplying the values from the second row by the values of the third for each individual i. The fifth line contains the data from the second squared. The last column shows the sums of the values of the individual rows.

Let's use the least squares method to calculate the coefficients a and b we need. To do this, substitute the desired values from the last column and calculate the sums:

n ∑ i = 1 n x i y i - ∑ i = 1 n x i ∑ i = 1 n y i n ∑ i = 1 n - ∑ i = 1 n x i 2 b = ∑ i = 1 n y i - a ∑ i = 1 n x i n ⇒ a = 5 33, 8 - 12 12, 9 5 46 - 12 2 b = 12, 9 - a 12 5 ⇒ a ≈ 0, 165 b ≈ 2, 184

It turns out that the required approximating straight line will look like y = 0, 165 x + 2, 184. Now we need to determine which line will better approximate the data - g (x) = x + 1 3 + 1 or 0, 165 x + 2, 184. Let's make an estimate using the least squares method.

To calculate the error, we need to find the sum of squared deviations of the data from the straight lines σ 1 = ∑ i = 1 n (y i - (a x i + b i)) 2 and σ 2 = ∑ i = 1 n (y i - g (x i)) 2, the minimum value will correspond to a more suitable line.

σ 1 = ∑ i = 1 n (y i - (a x i + b i)) 2 = = ∑ i = 1 5 (y i - (0 , 165 x i + 2 , 184)) 2 ≈ 0 , 019 σ 2 = ∑ i = 1 n (y i - g (x i)) 2 = = ∑ i = 1 5 (y i - (x i + 1 3 + 1)) 2 ≈ 0 , 096

Answer: since σ 1< σ 2 , то прямой, наилучшим образом аппроксимирующей исходные данные, будет
y = 0 , 165 x + 2 , 184 .

The least squares method is clearly shown in the graphic illustration. The red line marks the straight line g (x) = x + 1 3 + 1, the blue line marks y = 0, 165 x + 2, 184. Raw data are marked with pink dots.

Let us explain why exactly approximations of this type are needed.

They can be used in tasks that require data smoothing, as well as in those where data must be interpolated or extrapolated. For example, in the problem discussed above, one could find the value of the observed quantity y at x = 3 or at x = 6. We have devoted a separate article to such examples.

Proof of the OLS method

In order for the function to take a minimum value when a and b are calculated, it is necessary that at a given point the matrix of the quadratic form of the differential of the function of the form F (a, b) = ∑ i = 1 n (y i - (a x i + b)) 2 is positive definite. Let's show you how it should look.

Example 2

We have a second-order differential of the following form:

d 2 F (a ; b) = δ 2 F (a ; b) δ a 2 d 2 a + 2 δ 2 F (a ; b) δ a δ b d a d b + δ 2 F (a ; b) δ b 2 d 2b

Solution

δ 2 F (a ; b) δ a 2 = δ δ F (a ; b) δ a δ a = = δ - 2 ∑ i = 1 n (y i - (a x i + b)) x i δ a = 2 ∑ i = 1 n (x i) 2 δ 2 F (a ; b) δ a δ b = δ δ F (a ; b) δ a δ b = = δ - 2 ∑ i = 1 n (y i - (a x i + b) ) x i δ b = 2 ∑ i = 1 n x i δ 2 F (a ; b) δ b 2 = δ δ F (a ; b) δ b δ b = δ - 2 ∑ i = 1 n (y i - (a x i + b)) δ b = 2 ∑ i = 1 n (1) = 2 n

In other words, we can write it like this: d 2 F (a ; b) = 2 ∑ i = 1 n (x i) 2 d 2 a + 2 2 ∑ x i i = 1 n d a d b + (2 n) d 2 b.

We obtained a matrix of the quadratic form M = 2 ∑ i = 1 n (x i) 2 2 ∑ i = 1 n x i 2 ∑ i = 1 n x i 2 n .

In this case, the values of individual elements will not change depending on a and b . Is this matrix positive definite? To answer this question, let's check whether its angular minors are positive.

We calculate the angular minor of the first order: 2 ∑ i = 1 n (x i) 2 > 0 . Since the points x i do not coincide, the inequality is strict. We will keep this in mind in further calculations.

We calculate the second-order angular minor:

d e t (M) = 2 ∑ i = 1 n (x i) 2 2 ∑ i = 1 n x i 2 ∑ i = 1 n x i 2 n = 4 n ∑ i = 1 n (x i) 2 - ∑ i = 1 n x i 2

After this, we proceed to prove the inequality n ∑ i = 1 n (x i) 2 - ∑ i = 1 n x i 2 > 0 using mathematical induction.

Let's check whether this inequality is valid for an arbitrary n. Let's take 2 and calculate:

2 ∑ i = 1 2 (x i) 2 - ∑ i = 1 2 x i 2 = 2 x 1 2 + x 2 2 - x 1 + x 2 2 = = x 1 2 - 2 x 1 x 2 + x 2 2 = x 1 + x 2 2 > 0

We got the correct equality (if the values x 1 and x 2 do not match).

Let's make the assumption that this inequality will be true for n , i.e. n ∑ i = 1 n (x i) 2 - ∑ i = 1 n x i 2 > 0 – true.
Now let's prove the validity for n + 1 , i.e. that (n + 1) ∑ i = 1 n + 1 (x i) 2 - ∑ i = 1 n + 1 x i 2 > 0 if n ∑ i = 1 n (x i) 2 - ∑ i = 1 n x i 2 > 0 .

We calculate:

(n + 1) ∑ i = 1 n + 1 (x i) 2 - ∑ i = 1 n + 1 x i 2 = = (n + 1) ∑ i = 1 n (x i) 2 + x n + 1 2 - ∑ i = 1 n x i + x n + 1 2 = = n ∑ i = 1 n (x i) 2 + n x n + 1 2 + ∑ i = 1 n (x i) 2 + x n + 1 2 - - ∑ i = 1 n x i 2 + 2 x n + 1 ∑ i = 1 n x i + x n + 1 2 = = ∑ i = 1 n (x i) 2 - ∑ i = 1 n x i 2 + n x n + 1 2 - x n + 1 ∑ i = 1 n x i + ∑ i = 1 n (x i) 2 = = ∑ i = 1 n (x i) 2 - ∑ i = 1 n x i 2 + x n + 1 2 - 2 x n + 1 x 1 + x 1 2 + + x n + 1 2 - 2 x n + 1 x 2 + x 2 2 + . . . + x n + 1 2 - 2 x n + 1 x 1 + x n 2 = = n ∑ i = 1 n (x i) 2 - ∑ i = 1 n x i 2 + + (x n + 1 - x 1) 2 + (x n + 1 - x 2) 2 + . . . + (x n - 1 - x n) 2 > 0

The expression enclosed in curly braces will be greater than 0 (based on what we assumed in step 2), and the remaining terms will be greater than 0, since they are all squares of numbers. We have proven the inequality.

Answer: the found a and b will correspond to the smallest value of the function F (a, b) = ∑ i = 1 n (y i - (a x i + b)) 2, which means that they are the required parameters of the least squares method (LSM).

If you notice an error in the text, please highlight it and press Ctrl+Enter

The method of least squares (OLS) allows you to estimate various quantities using the results of many measurements containing random errors.

Characteristics of MNEs

The main idea of this method is that the sum of squared errors is considered as a criterion for the accuracy of solving the problem, which they strive to minimize. When using this method, both numerical and analytical approaches can be used.

In particular, as a numerical implementation, the least squares method involves taking as many measurements as possible of an unknown random variable. Moreover, the more calculations, the more accurate the solution will be. Based on this set of calculations (initial data), another set of estimated solutions is obtained, from which the best one is then selected. If the set of solutions is parameterized, then the least squares method will be reduced to finding the optimal value of the parameters.

As an analytical approach to the implementation of LSM on a set of initial data (measurements) and an expected set of solutions, a certain one (functional) is determined, which can be expressed by a formula obtained as a certain hypothesis that requires confirmation. In this case, the least squares method comes down to finding the minimum of this functional on the set of squared errors of the original data.

Note that not the errors themselves, but the squares of the errors. Why? The fact is that often deviations of measurements from the exact value are both positive and negative. When determining the average, simple summation may lead to an incorrect conclusion about the quality of the estimate, since the cancellation of positive and negative values will reduce the power of sampling multiple measurements. And, consequently, the accuracy of the assessment.

To prevent this from happening, the squared deviations are summed up. Even moreover, in order to equalize the dimension of the measured value and the final estimate, the sum of the squared errors is extracted

Some MNC applications

MNC is widely used in various fields. For example, in probability theory and mathematical statistics, the method is used to determine such a characteristic of a random variable as the standard deviation, which determines the width of the range of values of the random variable.

tutorial

Introduction

I am a mathematician and programmer. The biggest leap I took in my career was when I learned to say: "I do not understand anything!" Now I am not ashamed to tell the luminary of science that he is giving me a lecture, that I do not understand what he, the luminary, is telling me. And it's very difficult. Yes, admitting your ignorance is difficult and embarrassing. Who likes to admit that he doesn’t know the basics of something? Due to my profession, I have to attend a large number of presentations and lectures, where, I admit, in the vast majority of cases I want to sleep because I do not understand anything. But I don’t understand because the huge problem of the current situation in science lies in mathematics. It assumes that all listeners are familiar with absolutely all areas of mathematics (which is absurd). Admitting that you don’t know what a derivative is (we’ll talk about what it is a little later) is shameful.

But I've learned to say that I don't know what multiplication is. Yes, I don't know what a subalgebra over a Lie algebra is. Yes, I don’t know why quadratic equations are needed in life. By the way, if you are sure that you know, then we have something to talk about! Mathematics is a series of tricks. Mathematicians try to confuse and intimidate the public; where there is no confusion, there is no reputation, no authority. Yes, it is prestigious to speak in as abstract a language as possible, which is complete nonsense.

Do you know what a derivative is? Most likely you will tell me about the limit of the difference ratio. In the first year of mathematics and mechanics at St. Petersburg State University, Viktor Petrovich Khavin told me determined derivative as the coefficient of the first term of the Taylor series of the function at a point (this was a separate gymnastics to determine the Taylor series without derivatives). I laughed at this definition for a long time until I finally understood what it was about. The derivative is nothing more than a simple measure of how similar the function we are differentiating is to the function y=x, y=x^2, y=x^3.

I now have the honor of lecturing to students who afraid mathematics. If you are afraid of mathematics, we are on the same path. As soon as you try to read some text and it seems to you that it is overly complicated, then know that it is poorly written. I assert that there is not a single area of mathematics that cannot be discussed “on the fingers” without losing accuracy.

Assignment for the near future: I assigned my students to understand what a linear quadratic regulator is. Don’t be shy, spend three minutes of your life and follow the link. If you don’t understand anything, then we are on the same path. I (a professional mathematician-programmer) didn’t understand anything either. And I assure you, you can figure this out “on your fingers.” At the moment I don't know what it is, but I assure you that we will be able to figure it out.

So, the first lecture that I am going to give to my students after they come running to me in horror and say that a linear-quadratic regulator is a terrible thing that you will never master in your life is least squares methods. Can you solve linear equations? If you are reading this text, then most likely not.

So, given two points (x0, y0), (x1, y1), for example, (1,1) and (3,2), the task is to find the equation of a straight line passing through these two points:

illustration

This line should have an equation like the following:

Here alpha and beta are unknown to us, but two points of this line are known:

We can write this equation in matrix form:

Here we should make a lyrical digression: what is a matrix? A matrix is nothing more than a two-dimensional array. This is a way of storing data; no further meanings should be attached to it. It depends on us exactly how to interpret a certain matrix. Periodically, I will interpret it as a linear mapping, periodically as a quadratic form, and sometimes simply as a set of vectors. This will all be clarified in context.

Let's replace concrete matrices with their symbolic representation:

Then (alpha, beta) can be easily found:

More specifically for our previous data:

Which leads to the following equation of the line passing through the points (1,1) and (3,2):

Okay, everything is clear here. Let's find the equation of the line passing through three points: (x0,y0), (x1,y1) and (x2,y2):

Oh-oh-oh, but we have three equations for two unknowns! A standard mathematician will say that there is no solution. What will the programmer say? And he will first rewrite the previous system of equations in the following form:

In our case, the vectors i, j, b are three-dimensional, therefore (in the general case) there is no solution to this system. Any vector (alpha\*i + beta\*j) lies in the plane spanned by the vectors (i, j). If b does not belong to this plane, then there is no solution (equality cannot be achieved in the equation). What to do? Let's look for a compromise. Let's denote by e(alpha, beta) exactly how far we have not achieved equality:

And we will try to minimize this error:

Why square?

We are looking not just for the minimum of the norm, but for the minimum of the square of the norm. Why? The minimum point itself coincides, and the square gives a smooth function (a quadratic function of the arguments (alpha, beta)), while simply the length gives a cone-shaped function, non-differentiable at the minimum point. Brr. A square is more convenient.

Obviously, the error is minimized when the vector e orthogonal to the plane spanned by the vectors i And j.

Illustration

In other words: we are looking for a straight line such that the sum of the squared lengths of the distances from all points to this straight line is minimal:

UPDATE: I have a problem here, the distance to the straight line should be measured vertically, and not by orthogonal projection. This commentator is right.

Illustration

In completely different words (carefully, poorly formalized, but it should be clear): we take all possible lines between all pairs of points and look for the average line between all:

Illustration

Another explanation is straightforward: we attach a spring between all data points (here we have three) and the straight line that we are looking for, and the straight line of the equilibrium state is exactly what we are looking for.

Quadratic form minimum

So, given this vector b and a plane spanned by the column vectors of the matrix A(in this case (x0,x1,x2) and (1,1,1)), we are looking for the vector e with a minimum square of length. Obviously, the minimum is achievable only for the vector e, orthogonal to the plane spanned by the column vectors of the matrix A:

In other words, we are looking for a vector x=(alpha, beta) such that:

I remind you that this vector x=(alpha, beta) is the minimum of the quadratic function ||e(alpha, beta)||^2:

Here it is useful to remember that the matrix can be interpreted as well as the quadratic form, for example, the identity matrix ((1,0),(0,1)) can be interpreted as a function of x^2 + y^2:

quadratic form

All this gymnastics is known under the name linear regression.

Laplace's equation with Dirichlet boundary condition

Now the simplest real problem: there is a certain triangulated surface, it is necessary to smooth it. For example, let's load a model of my face:

The original commit is available. To minimize external dependencies, I took the code of my software renderer, already on Habré. To solve the linear system, I use OpenNL , it's a great solver, but it's very difficult to install: you need to copy two files (.h + .c) to your project folder. All smoothing is done with the following code:

For (int d=0; d<3; d++) { nlNewContext(); nlSolverParameteri(NL_NB_VARIABLES, verts.size()); nlSolverParameteri(NL_LEAST_SQUARES, NL_TRUE); nlBegin(NL_SYSTEM); nlBegin(NL_MATRIX); for (int i=0; i<(int)verts.size(); i++) { nlBegin(NL_ROW); nlCoefficient(i, 1); nlRightHandSide(verts[i][d]); nlEnd(NL_ROW); } for (unsigned int i=0; i&face = faces[i]; for (int j=0; j<3; j++) { nlBegin(NL_ROW); nlCoefficient(face[ j ], 1); nlCoefficient(face[(j+1)%3], -1); nlEnd(NL_ROW); } } nlEnd(NL_MATRIX); nlEnd(NL_SYSTEM); nlSolve(); for (int i=0; i<(int)verts.size(); i++) { verts[i][d] = nlGetVariable(i); } }

X, Y and Z coordinates are separable, I smooth them separately. That is, I solve three systems of linear equations, each with the same number of variables as the number of vertices in my model. The first n rows of matrix A have only one 1 per row, and the first n rows of vector b have original model coordinates. That is, I spring-tie between the new vertex position and the old vertex position - the new ones shouldn't be too far away from the old ones.

All subsequent rows of matrix A (faces.size()*3 = the number of edges of all triangles in the grid) have one occurrence of 1 and one occurrence of -1, while the vector b has zero components opposite. This means I put a spring on each edge of our triangular mesh: all edges try to get the same vertex as their starting and ending points.

Once again: all vertices are variables, and they cannot deviate far from their original position, but at the same time they try to become similar to each other.

Here's the result:

Everything would be fine, the model is really smoothed, but it moved away from its original edge. Let's change the code a little:

For (int i=0; i<(int)verts.size(); i++) { float scale = border[i] ? 1000: 1; nlBegin(NL_ROW); nlCoefficient(i, scale); nlRightHandSide(scale*verts[i][d]); nlEnd(NL_ROW); }

In our matrix A, for the vertices that are on the edge, I add not a row from the category v_i = verts[i][d], but 1000*v_i = 1000*verts[i][d]. What does it change? And this changes our quadratic form of the error. Now a single deviation from the top at the edge will cost not one unit, as before, but 1000*1000 units. That is, we hung a stronger spring on the extreme vertices, the solution will prefer to stretch the others more strongly. Here's the result:

Let's double the strength of the springs between the vertices:
nlCoefficient(face[ j ], 2); nlCoefficient(face[(j+1)%3], -2);

It is logical that the surface has become smoother:

And now even a hundred times stronger:

What is this? Imagine that we have dipped a wire ring in soapy water. As a result, the resulting soap film will try to have the least curvature as possible, touching the border - our wire ring. This is exactly what we got by fixing the border and asking for a smooth surface inside. Congratulations, we have just solved Laplace's equation with Dirichlet boundary conditions. Sounds cool? But in fact, just one system of linear equations to solve.

Poisson's equation

Let's have another cool name.

Let's say I have an image like this:

Everyone is good, but I don't like the chair.

I'll cut the picture in half:

And I will select a chair with my hands:

Then I will pull everything that is white in the mask to the left side of the picture, and at the same time throughout the picture I will say that the difference between two neighboring pixels should be equal to the difference between two neighboring pixels of the right picture:

For (int i=0; i

Here's the result:

Code and pictures available