Mod-01 Lec-23 Ill-conditioned Linear Systems

We are considering perturbed systems, and today we are going to study ill-conditioned matrices in more detail. Last time we first considered the perturbed system A(x + δx) = b + δb, that is, perturbation only in the right hand side; then we considered the system where there is perturbation only in the coefficient matrix. Today I am going to state the result for the perturbed system when there is perturbation in the coefficient matrix and also in the right hand side. I am just going to state the result, because the proof is similar to what we have done before, and after that we will take up the study of ill-conditioned matrices.

So, our exact system is Ax = b. If the perturbation is only in the right hand side, then we proved that the relative error satisfies ||δx||/||x|| ≤ κ(A) · ||δb||/||b||, where κ(A) = ||A|| · ||A⁻¹|| is the condition number. Our assumptions are that A is invertible and that the right hand side b is not the zero vector; that guarantees x ≠ 0, so you can divide by ||x||.

When you consider perturbation only in the coefficient matrix, you get ||δx||/||x|| ≤ κ(A) · (||δA||/||A||) / (1 − κ(A) · ||δA||/||A||). For this result you assume that ||δA||/||A|| < 1/κ(A); that guarantees that A + δA is invertible and that the denominator 1 − κ(A) · ||δA||/||A|| is a number bigger than 0.

Now look at the case where there is perturbation in the coefficient matrix and also in the right hand side. The assumption again is ||δA||/||A|| < 1/κ(A), and in this case the relative error satisfies

||δx||/||x|| ≤ κ(A) · (||δA||/||A|| + ||δb||/||b||) / (1 − κ(A) · ||δA||/||A||).

If δA = 0, that is, perturbation only in the right hand side, the numerator is κ(A) · ||δb||/||b|| and the denominator is 1, so we get back our earlier result. If δb = 0, that is, perturbation only in the coefficient matrix, you get κ(A) · ||δA||/||A|| divided by 1 − κ(A) · ||δA||/||A||. And if there is perturbation in both the coefficient matrix and the right hand side, then this is the result.

What you have to notice is that the perturbation is supposed to be small, so ||δA||/||A|| is going to be a small number. If the condition number is something reasonable, then the denominator is approximately equal to 1. In the numerator you have the condition number multiplied by the sum of the relative error in the coefficient matrix and the relative error in the right hand side. So the condition number plays a crucial role: your starting error gets multiplied by it. If the condition number is small, we say that the system is well conditioned; if it is big, we say that it is ill-conditioned.
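To see the combined bound numerically, here is a small NumPy sketch; the matrix, the perturbation sizes, and the use of the infinity norm are illustrative choices, not from the lecture:

```python
import numpy as np

# Check the combined perturbation bound numerically (a sketch; the matrix
# and the perturbation sizes are illustrative choices).
rng = np.random.default_rng(0)
A = np.array([[1000., 999.], [999., 998.]])
b = np.array([1999., 1997.])                 # exact solution is x = (1, 1)

dA = 1e-5 * rng.standard_normal((2, 2))      # small perturbation in A
db = 1e-5 * rng.standard_normal(2)           # small perturbation in b

x = np.linalg.solve(A, b)
x_pert = np.linalg.solve(A + dA, b + db)

kappa = np.linalg.cond(A, p=np.inf)                            # ||A||*||A^-1||
rel_A = np.linalg.norm(dA, np.inf) / np.linalg.norm(A, np.inf)
rel_b = np.linalg.norm(db, np.inf) / np.linalg.norm(b, np.inf)

assert kappa * rel_A < 1                     # hypothesis of the theorem
lhs = np.linalg.norm(x_pert - x, np.inf) / np.linalg.norm(x, np.inf)
rhs = kappa * (rel_A + rel_b) / (1 - kappa * rel_A)
print(lhs <= rhs, lhs, rhs)                  # the bound holds
```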

Last time we saw that the condition number is always bigger than or equal to 1, so the least possible condition number is 1. Now, when we say that a condition number is small or big, there is no precise boundary: it depends on your computer, it depends on your starting error, and it depends on how much accuracy you need.

Suppose ||δb||/||b|| is approximately 10⁻⁴ and your condition number is about 100; then ||δx||/||x|| ≤ 10⁻². So you start with an error of the order of 10⁻⁴, and after doing your computations the error in the computed solution is of the order of 10⁻². This can be acceptable; likewise, if your starting error is 10⁻⁶, the error in the computed solution will be of the order of 10⁻⁴, which can also be acceptable. But suppose your condition number is of the order of 10⁴ and your starting error is 10⁻⁴; then the relative error ||δx||/||x|| can be about 1. That means ||δx|| is about the same size as ||x||, so the error is of the same order as your solution, and generally this will not be acceptable. So if the condition number is between 1 and 100, the system is generally considered well conditioned; otherwise it is ill-conditioned.

Now let us see whether we can do something about the condition number. The condition number is ||A|| · ||A⁻¹||. Norms have the property that ||αA|| = |α| · ||A||, so if I multiply throughout by α — that is, instead of the exact system Ax = b I consider αAx = αb, where α is a non-zero number — then I do not alter my system, and I can make ||αA|| as small as I want. Unfortunately, multiplying the coefficient matrix A by a non-zero number does not change the condition number, because when you look at ||αA||, you also have to look at ||(αA)⁻¹||. Indeed, for α ≠ 0, κ(αA) = ||αA|| · ||(αA)⁻¹||; now ||αA|| = |α| · ||A||, and since (αA)⁻¹ = A⁻¹/α, we get ||(αA)⁻¹|| = (1/|α|) · ||A⁻¹||. The |α| and 1/|α| cancel, and you get exactly the same condition number. So this trick of multiplying the system by a non-zero number is not going to work.

Does the condition number have something to do with the determinant? We know that a matrix A is invertible provided det A ≠ 0. So does a small determinant mean an ill-conditioned system, that is, a big condition number? That is not the case: the condition number of a matrix has nothing to do with a small determinant.
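A quick sketch of this invariance (the matrix is an illustrative choice; any non-zero α behaves the same way):

```python
import numpy as np

# Scaling the whole matrix by a non-zero alpha leaves kappa unchanged:
# |alpha| cancels in ||alpha*A|| * ||(alpha*A)^-1||.
A = np.array([[1000., 999.], [999., 998.]])
for alpha in (1e-6, 1.0, 1e6):
    print(alpha, np.linalg.cond(alpha * A, p=np.inf))
# All three lines print the same condition number (about 4.0e6).
```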

We can look at a simple example of a diagonal matrix. Take the 2 by 2 diagonal matrix A_ε with both diagonal entries equal to ε, where ε > 0. In any induced matrix norm, ||A_ε|| = ε. The inverse of A_ε is the diagonal matrix with diagonal entries 1/ε, so ||A_ε⁻¹|| = 1/ε, and that gives κ(A_ε) = 1, the best condition number we can have. On the other hand, det A_ε = ε², which I can make as small as I wish by choosing ε appropriately. So as long as ε > 0 the condition number equals 1, while the determinant ε² can be made as small as I wish. There is no connection between a small determinant and ill conditioning.

Ill conditioning, however, can come from poor scaling. What I mean is: look at the system [1 0; 0 ε][x₁; x₂] = [1; ε], that is, x₁ = 1 and εx₂ = ε, where ε > 0 is very small compared to 1. Then A⁻¹ = [1 0; 0 1/ε]. Comparing 1 and ε, we get ||A|| = 1, whereas ||A⁻¹|| = 1/ε. So κ(A), with respect to either the 1 norm or the infinity norm, is 1/ε, which is going to be big, because you can choose ε small and then 1/ε becomes big. Now, if I multiply the second equation by 1/ε, what I get is the system [1 0; 0 1][x₁; x₂] = [1; 1]. This is a well conditioned system; in fact its condition number equals 1, the best possible.

So there is a difference between multiplying the whole coefficient matrix and the right hand side by α, and multiplying only one of the rows by α. You have a system of n linear equations in n unknowns. If I multiply one of the equations by a non-zero constant — the left hand side as well as the right hand side — my solution does not change. I have the same solution as before, but, as we just saw, the condition number of the matrix can change. So if you multiply one row, or several rows, by non-zero constants, and you multiply the corresponding right hand side entries by the same constants, the solution remains the same, but the condition number of your matrix can change.

What if I multiply the columns? If I take one of the columns and multiply it by a constant, then I am changing the system; but in a tutorial problem we will see that multiplying, say, the jth column by a non-zero constant c affects the solution only in the jth component, which gets multiplied by 1/c.
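Here is a sketch of the poor-scaling example and of the row scaling that repairs it (the value of ε is my own illustrative choice):

```python
import numpy as np

# A badly scaled system: x1 = 1 and eps*x2 = eps.
eps = 1e-8
A = np.array([[1.0, 0.0], [0.0, eps]])
b = np.array([1.0, eps])
print(np.linalg.cond(A, p=np.inf))           # 1/eps = 1e8: ill-conditioned

# Row scaling: multiply the second equation (matrix row AND right hand
# side entry) by 1/eps. The solution is unchanged; kappa drops to 1.
A2 = A.copy(); A2[1] /= eps
b2 = b.copy(); b2[1] /= eps
print(np.linalg.cond(A2, p=np.inf))          # 1: well conditioned
print(np.linalg.solve(A2, b2))               # (1, 1), same solution as before
```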

So let me repeat: you have the coefficient matrix A and the system Ax = b. I take the jth column and multiply it by a non-zero constant c. I get a different system, but the solution of the new system differs from the earlier one only in the jth component, which gets multiplied by 1/c; all other components stay the same. It is easy to show this, and we will do it as a tutorial problem. The point is that multiplying a column by a non-zero constant can also change the condition number of your matrix.

So you have a system Ax = b, and we are going to see that if your columns or rows are out of scale — that means you take some norm (we have decided to use the 1 norm, the 2 norm, or the infinity norm), and one column has a big norm while another column has a small norm — then your matrix is going to be ill-conditioned. This is what we are trying to understand: what a well conditioned matrix and an ill-conditioned matrix mean. It has nothing to do with a small determinant, but it has something to do with the norms of the columns or the norms of the rows. What one can then do is multiply rows and columns by suitable numbers so as to make the norms of all the columns and all the rows about the same. It may not always be possible, but that is one way of making your system well conditioned.

So, for scaling of Ax = b: multiplying an equation by c ≠ 0 gives the same solution, and this is known as row scaling; multiplying the jth column by c ≠ 0 is known as column scaling, and working out its effect is our exercise.

Now I define the maximum magnification of A as maxmag(A) = max{||Ax||/||x|| : x ≠ 0}; this is nothing but ||A||, the induced matrix norm. I define the minimum magnification of A as minmag(A) = min{||Ax||/||x|| : x ≠ 0}. We are going to show in the tutorial that κ(A) = maxmag(A)/minmag(A).

Now, what happens if minmag(A) = 0? For the maximum magnification there is no such issue: maxmag(A) is the norm of A, and if A is a non-zero matrix, then ||A|| has to be non-zero, because if the maximum of ||Ax||/||x|| over non-zero x were 0, it would mean ||Ax|| = 0 for every non-zero vector; you then look at the canonical vectors and conclude that A is the zero matrix.
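For the 2 norm these magnifications are simply the extreme singular values, which gives a quick way to check the identity κ(A) = maxmag(A)/minmag(A); a sketch with an illustrative matrix:

```python
import numpy as np

# In the 2-norm, maxmag(A) and minmag(A) are the largest and smallest
# singular values of A, so kappa_2(A) = sigma_max / sigma_min.
A = np.array([[1000., 999.], [999., 998.]])
sigma = np.linalg.svd(A, compute_uv=False)
maxmag, minmag = sigma[0], sigma[-1]

print(maxmag, np.linalg.norm(A, 2))                      # maxmag = ||A||_2
print(minmag, 1 / np.linalg.norm(np.linalg.inv(A), 2))   # minmag = 1/||A^-1||_2
print(maxmag / minmag, np.linalg.cond(A, 2))             # both equal kappa_2
```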

But the minimum is something different: we are looking at the minimum of ||Ax||/||x|| over non-zero x. What if this minimum is 0? It cannot be 0, because our matrix A is invertible; in fact we will show that minmag(A) = 1/||A⁻¹||, and that gives us κ(A) = maxmag(A)/minmag(A).

Using these ideas about maximum and minimum magnification, we are going to show that if the columns of the matrix are out of scale — one column has a big norm compared to some other column — then the matrix is necessarily ill-conditioned. Let us show this now. Choose the vector norm to be the 1 norm, the 2 norm, or the infinity norm. The condition number is κ(A) = ||A|| · ||A⁻¹||, and since ||A⁻¹|| = 1/minmag(A), we have κ(A) = ||A||/minmag(A). Since ||A|| is the maximum of the quotients ||Ax||/||x||, taking x = e_j, the canonical vector with 1 at the jth place and 0 elsewhere, gives ||A|| ≥ ||Ae_j|| = ||c_j||, the norm of the jth column. Similarly, since minmag(A) is the minimum of those quotients, minmag(A) ≤ ||Ae_i|| = ||c_i||.

Thus κ(A) ≥ ||c_j||/||c_i||, where j and i can be any indices; they denote the jth column and the ith column. If j = i, this tells us that κ(A) ≥ 1, a result we have already proved. But suppose ||c₁|| = 10³ and ||c₃|| = 10⁻³; then the condition number is bigger than or equal to 10³/10⁻³ = 10⁶. It is also bigger than or equal to 10⁻³/10³, but the first is the better bound. So if you have one column with a big norm compared to the norm of another column, your condition number is going to be big, and your system will be ill-conditioned.

Now I am going to go back to the example of an ill-conditioned matrix which we considered last time, and for that matrix we will find the direction of maximum magnification and the direction of minimum magnification. So A is the 2 by 2 matrix with rows (1000, 999) and (999, 998); its inverse has diagonal entries −998 and −1000 and off-diagonal entries 999 and 999. Both matrices are symmetric, and ||A||∞ = ||A⁻¹||∞ = 1999, which gave us a condition number of the order of 10⁶.
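A sketch of this column-norm bound; the matrix below is my own, constructed so that the column 1 norms are out of scale by a factor of 10⁶:

```python
import numpy as np

# Columns out of scale force a large condition number:
# kappa(A) >= ||c_j|| / ||c_i|| for any pair of columns.
A = np.array([[1e3,  1e-3],
              [1e3, -1e-3]])

col_norms = np.linalg.norm(A, ord=1, axis=0)   # 1-norms of the columns
ratio = col_norms.max() / col_norms.min()      # 2000 / 0.002 = 1e6
print(ratio, np.linalg.cond(A, p=1))           # kappa_1(A) is >= the ratio
```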

If you look at A(1, 1)ᵀ, that is (1999, 1997)ᵀ. So for x = (1, 1)ᵀ we have ||x||∞ = 1 and ||Ax||∞ = 1999 = ||A||∞; hence (1, 1)ᵀ is a direction of maximum magnification by A. Now look at A⁻¹(−1, 1)ᵀ; you get (1997, −1999)ᵀ. Its infinity norm, the maximum of the absolute values of the two entries, is 1999 = ||A⁻¹||∞, while ||(−1, 1)ᵀ||∞ = 1. So (−1, 1)ᵀ is a direction of maximum magnification by A⁻¹, and the same equation shows that (1997, −1999)ᵀ is a direction of minimum magnification by A: A⁻¹(−1, 1)ᵀ = (1997, −1999)ᵀ tells you that A maps (1997, −1999)ᵀ to (−1, 1)ᵀ, a vector of norm 1 coming from a vector of norm 1999.

So (1, 1)ᵀ is a direction of maximum magnification by A, and (1997, −1999)ᵀ is a direction of minimum magnification by A. In the example which we considered last time, the exact solution was (1, 1)ᵀ; you perturb the right hand side slightly and you get a solution which is completely different from the original one. That example was constructed to illustrate the effect of an ill-conditioned matrix: the solution (1, 1)ᵀ was chosen along the direction of maximum magnification by A, and the perturbation in the right hand side, (0.01, −0.01)ᵀ, was chosen along the direction of maximum magnification by A⁻¹. That is why we got such a spectacular result; it was done for the sake of illustration.

Now we are going to look at what ill conditioning means geometrically. We will consider a 2 by 2 system — two equations in two unknowns — and try to interpret the ill conditioning of the system. We will not take a specific example at first; we just take two general equations in two unknowns and see what ill conditioning means.

Suppose A is an ill-conditioned matrix whose rows and columns are not severely out of scale; that is, we rule out the possibility that the ill conditioning is coming from rows and columns which are out of scale. Normalize A so that ||A|| = 1; we have seen that the condition number does not change when you multiply a matrix by a non-zero constant, so without loss of generality ||A|| = 1. Now suppose the condition number is much bigger than 1. Since κ(A) = ||A|| · ||A⁻¹|| and we have normalized so that ||A|| = 1, we get κ(A) = ||A⁻¹||.
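Here is the lecture's perturbation experiment as a short sketch:

```python
import numpy as np

# The exact solution (1, 1) lies along the direction of maximum
# magnification by A; the perturbation (0.01, -0.01) lies along the
# direction of maximum magnification by A^-1.
A = np.array([[1000., 999.], [999., 998.]])
b = np.array([1999., 1997.])                # A @ (1, 1) = b
db = np.array([0.01, -0.01])

x = np.linalg.solve(A, b)
x_pert = np.linalg.solve(A, b + db)
print(x)        # [ 1.  1.]
print(x_pert)   # about [-18.97, 20.99]: completely different
```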

So ||A⁻¹|| is going to be a big number. But what is ||A⁻¹||? It is 1/minmag(A). So we have κ(A) = ||A⁻¹|| = 1/minmag(A), and this is much bigger than 1. Taking the reciprocal, minmag(A) is much smaller than 1, where the minimum magnification is the minimum of ||Ax||/||x|| over non-zero x.

If the minimum magnification is small, there exists a vector x such that ||Ax||/||x|| is small. It will not be zero, because our matrix is invertible, but it is small. I normalize the vector x so that ||x|| = 1. So minmag(A) ≪ 1 means there exists x ∈ ℝⁿ with ||x|| = 1 such that ||Ax|| ≪ 1; in other words, the vector Ax is nearly the zero vector.

The e_j's are our canonical vectors, so any x ∈ ℝⁿ can be written as x = x₁e₁ + ⋯ + xₙeₙ, where e_j has 0 everywhere except 1 at the jth place. Then Ax = x₁Ae₁ + ⋯ + xₙAeₙ. Now Ae₁ is the first column c₁, and Aeₙ is the nth column cₙ, so x₁c₁ + ⋯ + xₙcₙ ≈ 0 with ||x|| = 1. That means the columns c₁, c₂, …, cₙ are almost linearly dependent: linear dependence would mean that there exist scalars x₁, x₂, …, xₙ, not all zero, with x₁c₁ + x₂c₂ + ⋯ + xₙcₙ equal to the zero vector. Our matrix A is invertible, so the columns are linearly independent; but a big condition number means the columns are almost linearly dependent. So ill conditioning has nothing to do with a small determinant, but it does have to do with linear independence and linear dependence of the columns.
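This near dependence can be exhibited directly: in the 2 norm, the right singular vector belonging to the smallest singular value is a unit vector x with ||Ax|| tiny (a sketch, using our running example):

```python
import numpy as np

# After normalizing so ||A||_2 = 1, a large kappa means there is a unit
# vector x with A @ x nearly zero, i.e. x1*c1 + x2*c2 ~ 0.
A = np.array([[1000., 999.], [999., 998.]])
A = A / np.linalg.norm(A, 2)                 # now ||A||_2 = 1

_, s, Vt = np.linalg.svd(A)
x = Vt[-1]                                   # unit vector: ||x||_2 = 1
print(np.linalg.norm(A @ x))                 # about 2.5e-7, almost zero
print(1 / s[-1], np.linalg.cond(A, 2))       # 1/minmag(A) equals kappa(A)
```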

So now we will look at a 2 by 2 system: a₁₁x₁ + a₁₂x₂ = b₁ and a₂₁x₁ + a₂₂x₂ = b₂. Each equation by itself is the equation of a straight line. The line defined by the first equation is perpendicular to the vector (a₁₁, a₁₂) — think about why — and the second line, given by a₂₁x₁ + a₂₂x₂ = b₂, is perpendicular to (a₂₁, a₂₂). A point (x₁, x₂) satisfying both equations must lie on the first line and also on the second line, so the solution is given by the intersection of the two straight lines. Thus for two equations in two unknowns, the solution is nothing but the intersection of two lines, the first perpendicular to (a₁₁, a₁₂) and the second perpendicular to (a₂₁, a₂₂).

Now look at the straight lines given by 1000x₁ + 999x₂ = b₁ and 999x₁ + 998x₂ = b₂; this is the same example which we considered yesterday and again today for the maximum magnification. The slope of the first line is −1000/999 ≈ −1.001001, and the slope of the second is −999/998 ≈ −1.001002. The two straight lines are nearly parallel.

So picture the two lines drawn solid. When you perturb — if instead of b₁ I take b₁ + δb₁ — I am looking at a parallel straight line: the slope, determined by 1000 and 999, does not change; only the line shifts. The original solution was the intersection of the two solid lines; the perturbed solution is the intersection of the new dotted line, obtained by perturbing b₁ slightly, with the other line. Because the lines are almost parallel, the new intersection point is far from the original one. There is a lot of change, and that is the geometric interpretation of ill conditioning: the two straight lines are almost parallel, so a slight change in the right hand side changes the solution completely.

Looking again at our matrix with entries 1000, 999, 999, and 998: the 1 norms, or the infinity norms, of its columns do not differ much, and the same holds for the rows, since the matrix is symmetric.
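Before summarizing, here is a numerical look at just how nearly parallel those two lines are (the angle computation is my own addition):

```python
import numpy as np

# Slopes of the two lines from the example: they differ only in the
# sixth decimal place.
slope1 = -1000 / 999     # line 1000*x1 + 999*x2 = b1
slope2 = -999 / 998      # line  999*x1 + 998*x2 = b2
print(slope1, slope2)    # -1.001001... and -1.001002...

# Angle between the lines, in degrees: tiny, so shifting either line
# slides the intersection point a long way.
theta = abs(np.arctan(slope1) - np.arctan(slope2))
print(np.degrees(theta))  # about 2.9e-5 degrees
```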

So the ill conditioning is not coming from a scaling problem; it came from the fact that the two rows, or the two columns, are almost linearly dependent.

Let me summarize what we have learnt about ill conditioning today. The condition number does not change if I multiply the whole coefficient matrix by a number α; it can change if I multiply only some of the rows by non-zero constants, and similarly for the columns. If the rows or the columns are out of scale, that will make your system ill-conditioned. Ill conditioning has nothing to do with a small determinant, but it does have a relation with linear independence or linear dependence of the columns. So when you are given a system Ax = b, what you can try to do is make the columns and the rows have about the same norm; you do as much of this row scaling and column scaling as is possible. Beyond that, the system is given to you and you do not have much control over it; but you should at least know, when you solve a system of equations and obtain an approximate solution, how reliable that solution is — how near it is to the exact solution. We are interested in the exact solution; our limitation is that we must use a computer, so we end up solving a perturbed system. At the very least one should know the limitation of one's method; this much one can do.

The other thing in our hands is not to turn a system which was well conditioned into an ill-conditioned one in the process. For that we have to look at the Gauss elimination method. We take our coefficient matrix and introduce zeros in the first column below the diagonal, subtracting appropriate multiples of the first row from the second row, third row, and so on. In the next stage we work only on the (n − 1) by (n − 1) submatrix consisting of the second through nth rows and the second through nth columns. Your matrix A may be well conditioned, but you have to be careful not to make this new submatrix ill-conditioned. I am going to illustrate this by an example; a sketch of the elimination step itself follows.
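A minimal sketch of that first elimination stage, showing the trailing submatrix the next stage works on (the 3 by 3 system here is an illustrative choice, not the lecture's example):

```python
import numpy as np

# First stage of Gauss elimination: subtract multiples of row 0 from the
# rows below to zero out column 0; later stages work on A[1:, 1:].
def eliminate_first_column(A, b):
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    for i in range(1, A.shape[0]):
        m = A[i, 0] / A[0, 0]      # multiplier (assumes a nonzero pivot)
        A[i] -= m * A[0]
        b[i] -= m * b[0]
    return A, b

A = np.array([[2., 1., 1.], [4., 3., 3.], [8., 7., 9.]])
b = np.array([1., 2., 5.])
A1, b1 = eliminate_first_column(A, b)
print(A1)   # column 0 is now (2, 0, 0); A1[1:, 1:] is the new submatrix
```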

Here is the setting: a system of three equations in three unknowns. For a 3 by 3 matrix you can calculate the inverse and verify that the matrix is well conditioned. In the analysis we are going to do, we will assume that computations are done using four digits: no matter how powerful your computer is, the number of significant digits is limited. Consider the representation of any number: we write it as 0.α₁α₂…αₙ × 10^e, with some exponent e and with the first digit α₁ ≠ 0. I am writing to base 10, but the base can be 2 or 16. Every number is represented in this form, this n is fixed, and no matter how powerful your computer is, n is finite. So there is always going to be some error: some numbers can be represented exactly, but for most numbers the floating point representation introduces an error. For example, given the number 1/√2, you work with its approximation 0.α₁α₂…αₙ × 10^e. Moreover, in the process you are going to multiply, add, and subtract, and at every stage you will have to either chop or round off, so that your number is always of the form 0.d₁d₂…dₙ × 10^e.

For this example, in order to illustrate, we will assume that we have four digits available, and we will see that the entry 0.002 is a small pivot. It is non-zero, so I can very well carry out the Gauss elimination method, but this small pivot is going to create problems: the well conditioned matrix becomes ill-conditioned in the second step, and this is what we would like to avoid. We will consider this next time, together with some related issues, such as whether I can make this small pivot arbitrarily big by multiplying its row by a constant. We will see these things in the next lecture, and we will also consider backward analysis. Thank you.
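As a parting sketch of the small-pivot trouble previewed above: the lecture's 3 by 3 system is not reproduced in this transcript, so the classic 2 by 2 system below, solved in 4-significant-digit arithmetic, is an illustrative stand-in.

```python
import numpy as np

# Simulate 4-significant-digit decimal arithmetic with rounding.
def fl(x, digits=4):
    """Round x to `digits` significant decimal digits."""
    if x == 0.0:
        return 0.0
    e = int(np.floor(np.log10(abs(x)))) + 1
    return round(x / 10.0**e, digits) * 10.0**e

# System:  1e-5*x1 + x2 = 1
#          x1      + x2 = 2      (exact solution is close to x1 = x2 = 1)
m = fl(1.0 / 1e-5)            # multiplier 1e5: huge, because the pivot is tiny
a22 = fl(1.0 - m * 1.0)       # fl(-99999) = -1.000e5: the original 1 is lost
b2 = fl(2.0 - m * 1.0)        # fl(-99998) = -1.000e5
x2 = fl(b2 / a22)             # 1.000
x1 = fl(fl(1.0 - x2) / 1e-5)  # 0.0 -- completely wrong; exact x1 is about 1.00001
print(x1, x2)
```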