Let’s say I have

some matrix A. Let’s say it’s an n-by-k

matrix, and I have the equation Ax is equal to b. So in this case, x would have to

be a member of Rk, because we have k columns here, and

b is a member of Rn. Now, let’s say that it just so

happens that there is no solution to Ax is equal to b. What does that mean? Let’s just expand out A. I think you already know

what that means. If I write A like this, a1, a2,

if I just write it as its column vectors right there,

all the way through ak, and then I multiply it times x1,

x2, all the way through xk, this is the same thing as

that equation there. I just kind of wrote out

the two matrices. Now, this is the same thing as

x1 times a1 plus x2 times a2, all the way to plus xk times ak

is equal to the vector b. Now, if this has no solution,

then that means that there’s no set of weights here on the

column vectors of A, where we can get to b. Or another way to say it is, no

linear combination of the column vectors of A will

be equal to b. Or yet another way of saying

it is that b is not in the column space of A. No linear combination of these

guys can be equal to b. So let’s see if we can

visualize it a bit. So let me draw the column

space of A. So maybe the column space of

A looks something like this right here. I’ll just assume it’s

a plane in Rn. It doesn’t have to be a plane. Things can be very general, but

let’s say that this is the column space. This is the column space of A. Now, if that’s the column space

and b is not in the column space, maybe we

can draw b like this. Maybe b, let’s say this is the

origin right there, and b just pops out right there. So this is the 0 vector. This is my vector b, clearly

not in my column space, clearly not in this plane. Now, up until now, we would

get an equation like that. We would make an augmented

matrix, put it in reduced row echelon form, and get a line

that said 0 equals 1, and we’d say, no solution, nothing

we can do here. But what if we can do better? You know, we clearly can’t

find a solution to this. But what if we can find

a solution that gets us close to this? So what if I want to find some

x, I’ll call it x-star for now, where– so I want to find

some x-star, where A times x-star is– and this is

a vector– as close as possible– let me write this–

as close to b as possible. Or another way to view it, when

I say close, I’m talking about length, so I want to

minimize the length of– let me write this down. I want to minimize the length

of b minus A times x-star. Now, some of you all

might already know where this is going. But when you take the difference

between two vectors and then take its length, what

does that look like? Ax is going to be a member

of my column space. Let me just call that v. Ax is equal to v. You multiply any vector in Rk

times your matrix A, you’re going to get a member of

your column space. So any Ax is going to be

in your column space. And maybe that is the vector v

is equal to A times x-star. And we want this vector to get

as close as possible to this as long as it stays–

I mean, it has to be in my column space. But we want the distance between

this vector and this vector to be minimized. Now, I just want to show you

where the terminology for this will come from. I haven’t given it its

proper title yet. If you were to take this

vector– let me just call this vector v for simplicity– then

this is equivalent to the length of the vector b minus v. You take the difference between

each of the elements. So b1 minus v1, b2 minus v2,

all the way to bn minus vn. And if you take the length of

this vector, this is the same thing as this. This is going to be equal

to the square root. Let me take the length

squared, actually. The length squared of this is

just going to be b1 minus v1 squared plus b2 minus v2 squared

plus all the way to bn minus vn squared. And I want to minimize this. So I want to make this value the

least value that it can possibly be, or I want to get the

least squares estimate here. And that’s why, this last minute

or two when I was just explaining this, that was just

to give you the motivation for why this right here is called

the least squares estimate, or the least squares solution,

or the least squares approximation for the equation

Ax equals b. There is no solution to this,

but maybe we can find some x-star, where if I multiply A

times x-star, this is clearly going to be in my column space

and I want to get this vector to be as close to

b as possible. Now, we’ve already seen in

several videos, what is the closest vector in any

subspace to a vector that’s not in my subspace? Well, the closest vector to

it is the projection. The closest vector to b, that’s

in my subspace, is going to be the projection of

b onto my column space. That is the closest

vector there. So if I want to minimize this,

I want to figure out my x-star, where Ax-star is equal

to the projection of my vector b onto my subspace or onto

the column space of A. Remember what we’re

doing here. We said Ax equals b has no solution, but

maybe we can find some x that gets us as close

as possible. So I’m calling that my least

squares solution or my least squares approximation. And this guy right here is

clearly going to be in my column space, because you take

A times some vector x, that’s going to be a linear combination

of these column vectors, so it’s going to

be in the column space. And I want this guy to be as

close as possible to this guy. Well, the closest vector in my

column space to that guy is the projection. So Ax needs to be equal

to the projection of b on my column space. It needs to be equal to that. But this is still pretty

hard to find. You saw how, you know, you took

A times the inverse of A transpose A times A transpose. It’s hard to find that

transformation matrix. So let’s see if we can find an

easier way to figure out the least squares solution, or kind

of our best solution. It’s not THE solution. It’s our BEST solution

to this right here. That’s why we call it the least

squares solution or approximation. Let’s just subtract b from

both sides of this and we might get something

interesting. So what happens if we take Ax

minus the vector b on both sides of this equation? I’ll do it up here

on the right. On the left-hand side we

get A times x-star. It’s hard to write the x and

then the star because they’re very similar. And we subtract b from it. We subtract our vector b. That’s going to be equal to the

projection of b onto our column space minus b. All I did is I subtracted

b from both sides of this equation. Now, what is the projection

of b minus our vector b? If we draw it right here, it’s

going to be this vector right– let me do it in

this orange color. It’s going to be this

right here. It’s going to be that vector

right there, right? If I take the projection of b,

which is that, minus b, I’m going to get this vector. Or

we could say b plus this vector is equal to

my projection of b onto my subspace. So this vector right

here is orthogonal. It’s actually part of the

definition of a projection that this guy is going to be

orthogonal to my subspace or to my column space. And so this guy is orthogonal

to my column space. So I can write Ax-star minus

b, it’s orthogonal to my column space, or we could

say it’s a member of the orthogonal complement

of my column space. The orthogonal complement is

just the set of everything, all of the vectors that are

orthogonal to everything in your subspace, in your column

space right here. So this vector right here

that’s kind of pointing straight down onto my plane

is clearly a member of the orthogonal complement

of my column space. Now, this might look familiar

to you already. What is the orthogonal

complement of my column space? The orthogonal complement of

my column space is equal to the null space of A transpose,

or the left null space of A. We’ve done this in many,

many videos. So we can say that A times my

least squares estimate of the equation Ax is equal to

b– I wrote that. So x-star is my least squares

solution to Ax is equal to b. So A times that minus

b is a member of the null space of A transpose. Now, what does that mean? Well, that means that if I

multiply A transpose times this guy right here, times

Ax-star– and let me, no I don’t want to lose the vector

signs there on the x. This is a vector. I don’t want to forget that. Ax-star minus b. So if I multiply A transpose

times this right there, that is the same thing as that,

what am I going to get? Well, this is a member of the

null space of A transpose, so this times A transpose has

got to be equal to 0. It is a solution to A transpose

times something is equal to the 0 vector. Now. Let’s see if we can simplify

this a little bit. We get A transpose A times

x-star minus A transpose b is equal to 0, and then if we add

this term to both sides of the equation, we are left with A

transpose A times the least squares solution to Ax

equal to b is equal to A transpose b. That’s what we get. Now, why did we do

all of this work? Remember what we started with. We said we’re trying to find a

solution to Ax is equal to b, but there was no solution. So we said, well, let’s find

at least an x-star that minimizes

the distance between b and Ax-star. And we call this the least

squares solution. We call it the least squares

solution because, when you actually take the length, or

when you’re minimizing the length, you’re minimizing the

squares of the differences right there. So it’s the least squares

solution. Now, to find this, we know

that this has to be the closest vector in our

subspace to b. And we know that the closest

vector in our subspace to b is the projection of b onto our

subspace, onto our column space of A. And so, we know that A–

let me switch colors. We know that A times our least

squares solution should be equal to the projection of b

onto the column space of A. If we can find some x in Rk that

satisfies this, that is our least squares solution. But we’ve seen before that

finding the projection of b is easier said than done. You know, there’s a

lot of work to it. So maybe we can do

it a simpler way. And this is our simpler way. If we’re looking for this,

alternately, we can just find a solution to this equation. So you give me an Ax equal to

b, there is no solution. Well, what I’m going to do is

I’m just going to multiply both sides of this equation

times A transpose. If I multiply both sides of this

equation by A transpose, I get A transpose times Ax is

equal to A transpose– and I want to do that in the same

blue– A– no, that’s not the same blue– A transpose b. All I did is I multiplied

both sides of this. Now, the solution to this

new equation will not be the same as a solution to

the original equation, which has none. This right here will always

have a solution, and this right here is our least

squares solution. So this right here is our

least squares solution. And notice, this is some matrix,

and then this right here is some vector. This right here is

some vector. So long as we can find a

solution here, we’ve given our best shot at finding a solution

to Ax equal to b. We’ve minimized the error. We’re going to get Ax-star,

and the difference between Ax-star and b is going

to be minimized. It’s going to be our least

squares solution. It’s all a little bit abstract

right now in this video, but hopefully, in the next video,

we’ll realize that it’s actually a very, very

useful concept.
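The recipe from this video can be sketched in a few lines of code. This is only an illustration under stated assumptions, not something worked in the video: the helper names (`transpose`, `matmul`, `matvec`, `solve`, `least_squares`) and the small 3-by-2 example are made up, and the sketch assumes the columns of A are linearly independent, so that A transpose A is invertible. It solves A transpose A times x-star equals A transpose b, then checks that Ax-star minus b is orthogonal to the columns of A.

```python
from fractions import Fraction

def transpose(M):
    # Rows become columns.
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    # Matrix product: dot each row of X with each column of Y.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def matvec(M, v):
    # Matrix-vector product M v.
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def solve(M, rhs):
    # Gauss-Jordan elimination with exact fractions.
    # Assumption: M is square and invertible (raises StopIteration otherwise).
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def least_squares(A, b):
    # Solve the equation A^T A x* = A^T b derived in the video.
    At = transpose(A)
    return solve(matmul(At, A), matvec(At, b))

# Hypothetical example: x1 = 1, x2 = 1, x1 + x2 = 0 has no exact solution.
A = [[1, 0], [0, 1], [1, 1]]
b = [1, 1, 0]

x_star = least_squares(A, b)                                # [1/3, 1/3]
residual = [p - bi for p, bi in zip(matvec(A, x_star), b)]  # Ax* - b
print(x_star)
print(matvec(transpose(A), residual))                       # A^T (Ax* - b) is the zero vector
```

Exact fractions are used instead of floating point so that the orthogonality check comes out as exactly zero. For real data you would more likely reach for a numerical library.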

really helpful


Good video!!!! And nice work! Good luck with the KhanAcademy 🙂

Thanks a lot, very comprehensive ! great job!

Thanks

Excellent video.

Thanks much :))))))))

Vahag

I am the 60th guy liking it !! 😛 😀

Great vid, thank you. 🙂

can you teach me cubic expressions and cubic equations 🙂

eg. solve the equation x(3X3X3) – 2x(2X2) – x + 2 = 0

by using the factor theorem formula 🙂

I was just watching some stanford video lectures, and they were talking about some linear algebra concepts, and I thought to myself ' I don't remember linear algebra being so hard when I did it with Sal', and then I came back to these videos, and it turns out I was right. Linear Algebra is so so so intuitive the way Sal explains it, and I had no trouble just picking up from this video without watching all the previous ones. Stupid non-Khan teachers. make me feel retarded.

n1

I wish to know how to solve this: x has values of: -2 0 1 2 3 and y: 17 5 2 1 2, and I'm asked to use the least squares method, but I've been absent and I don't know exactly what my teacher meant by that or what that method consists of. Can anyone help me solve this?

what happens when AT*A is singular. How do we solve for the least square solution?

very helpful! Thanks a lot! you are doing great things! I also listened to your other videos, all very wonderful!

You are like a billion times better than my professor… and my professor isn't even bad. On the contrary he's my favorite! You're just even better at explaining things.

Plus it's impossible for me to lose focus with the pretty colors and your beautiful handwriting. lol

I have my Linear Algebra final tomorrow (technically today) and I owe the A that I'm sure to get to you and all your helpful videos!

Your videos are just great !!! The concepts with geometrical examples make very good sense !!! Thanks a lot

nice vid, but why did you take the length squared? i understand that the length of the vector would be sqrt(b1^2 + b2^2…bn^2) but why did you square even that?

Very useful! In my lecture slides I had this term Hx=z for the same problem and I couldn't make sense of how we could get to this as the best solution: x = (Ht*H)^-1 * Ht * z.

Now I understand:-)

Very useful man you are doing an amazing job this literally saved me hours of searching and reading can't thank you enough 🙂

Love the diagram. It always give the big picture.

love this guy

I have a question..

Does the least squares approximation always have a solution?

god dang it I knew I should have chosen other bachelor thesis..

It would be great having links when says "I explained (whatever) in a different video" to access that explanation. In this case I wanted to know why C(A)transpose=N(Atranspose).

Thanks!

It's good.

Super helpful! Thank you for uploading!

Thanks so much Khan…wonderful explanation in two videos that explains everything…great. You are wonderful

Comes handy while studying machine learning.

First go spit out your khaini (chewing tobacco), then come back.

how did you know that it was a projection to the Col(A) and not anything else like the Range(A)?

Best approach to the problem. No gradient, no multivariable calculus. you're master!

Super clarity……

2018? Im alone 🙁

Excellent explanation of a valuable technique.

This guy is good………..

Should have used n instead of k its usually mxn in R^n

great geometric intuition of linear regression

Thank you so much. You just simplified long boring hours of confusing lecture

Helpful exploration of least square properties

I have one question: is the LSS always consistent? If yes, how can I prove it? Please answer.

Why is this method chosen over the derivative method, which minimizes the error? Is this method more useful for multilinear or just multiple-variable problems?

Great video, thanks for your explanation. May I ask, is least-squares inversion the same thing as this under a different name?

I was doing an online machine learning course and got lost when the lecturer introduced the normal equation (which this is, with a different name). Needless to say, I'm finna binge-watch your linear algebra lectures now because I get insecure about using equations I don't understand. Thanks for the playlist, I really wanna put ML in my toolset so we're doing this!

"Some of you might already know where this is going.."

Me: Nope