TICAM Report 97-05, April 1997
Krylov-Secant Methods for Solving Large Scale Systems of Coupled Nonlinear Parabolic Equations
Hector Klie
(Also issued as technical report TR96-28, September 1996.)

RICE UNIVERSITY

Krylov-secant methods for solving large scale systems of coupled nonlinear parabolic equations

by Hector Klie

A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Approved, thesis committee: Mary F. Wheeler, Chairman, Ernest and Virginia Cockrell Chair in Engineering, University of Texas at Austin; Danny C. Sorensen, Professor of Computational and Applied Mathematics; George Hirasaki, Professor of Chemical Engineering; Clint N. Dawson, Associate Professor of Aerospace Engineering and Engineering Mechanics, University of Texas at Austin; Marcelo Rame, CRPC Research Scientist.

Houston, Texas, September 1996

Abstract

Krylov-secant methods for solving large scale systems of coupled nonlinear parabolic equations

by Hector Klie

This dissertation centers on two major aspects dictating the computational time of applications based on the solution of systems of coupled nonlinear parabolic equations: nonlinear and linear iterations. The former aspect leads to the conception of a novel way of reusing the Krylov information generated by GMRES for solving linear systems arising within a Newton method. The approach stems from theory recently developed on a nonlinear version of the Eirola-Nevanlinna algorithm (originally conceived for solving non-symmetric linear systems), which is capable of converging twice as fast as Broyden's method. A secant update strategy of the Hessenberg matrix resulting from the Arnoldi process in GMRES amounts to reflecting a secant update of the current Jacobian with the rank-one term projected onto the generated Krylov subspace (Krylov-Broyden update). This allows the design of a new nonlinear Krylov-Eirola-Nevanlinna (KEN) algorithm and a higher-order version of Newton's method (HOKN) as well. The underlying development is also auspicious for replacing the use of GMRES by cheaper Richardson iterations while still fulfilling the inexact Newton condition. Hence, three algorithms derived from Newton's method, Broyden's method and the nonlinear Eirola-Nevanlinna algorithm are proposed as part of a new family of hybrid Krylov-secant (HKS) methods.

Acknowledgments

... with carefulness and diligent work; Carol San Soucie, for those never-ending worthy moments of developing solver code for RPARSIM and for the questions addressed in this dissertation; and Miguel Argaez, for sharing with me that Latin enthusiastic attitude towards research. Finally, I acknowledge Intevep S.A. for giving me the opportunity to come to Rice University and for its continuous financial support all these years.

Contents

Abstract
Acknowledgments
List of Illustrations
List of Tables

1 Introduction
1.1 Motivation
1.2 Structure of the thesis
1.3 Notation
1.4 Some preliminary results
1.4.1 Matrix analysis results
1.4.2 Fundamentals of iterative solution methods
1.4.3 Nonlinear convergence

2 Newton-Krylov methods
2.1 The inexact Newton framework
2.1.1 Algorithm
2.1.2 Forcing terms
2.1.3 Globalization
2.1.4 Why Krylov subspace methods
2.2 GMRES
2.2.1 The Arnoldi factorization
2.2.2 Minimization of residuals
2.2.3 Convergence
2.2.4 Algorithm
2.2.5 The role of GMRES in Newton's method
2.3 BiCGSTAB
2.3.1 General remarks
2.3.2 Algorithm

3 Secant methods
3.1 The family of rank-one solvers
3.1.1 Broyden's method
3.1.2 The family of EN-like methods
3.1.3 Inexactness in secant methods
3.2 Secant preconditioners
3.2.1 Secant preconditioners for inexact Newton methods
3.2.2 Preconditioners based on multiple secant updates
3.3 Exploiting Krylov basis information
3.3.1 Updating the Arnoldi factorization
3.3.2 On the Krylov-Broyden update
3.4 Nonlinear Krylov-EN methods
3.4.1 The nonlinear KEN algorithm
3.4.2 A higher-order Krylov-Newton algorithm

4 Hybrid Krylov-secant methods
4.1 Hybrid Krylov methods
4.1.1 Projection onto the Krylov subspace
4.1.2 Reducing the cost of solving the new Jacobian equation
4.1.3 Spectra vs. pseudospectra
4.1.4 Richardson iteration and Leja ordering
4.2 The family of HKS methods
4.2.1 The HKS algorithms
4.2.2 The role of preconditioning in Krylov-secant methods
4.2.3 Globalization
4.3 Computational considerations for HKS methods
4.3.1 Limited memory compact representations
4.3.2 Computational complexity

5 Preconditioning Krylov methods for systems of coupled nonlinear equations
5.1 Motivation
5.2 Description of the problem
5.2.1 Differential equations
5.2.2 Discretization
5.2.3 Newton and linear system formulation
5.3 The algebraic coupled linear system framework
5.3.1 Structure of the resulting linear system
5.3.2 An algebraic analysis of the coupled Jacobian matrix
5.4 Decoupling operators
5.4.1 Block decoupling
5.4.2 Properties of the partially decoupled blocks
5.5 Two-stage preconditioners
5.5.1 Background
5.5.2 Combinative two-stage preconditioner
5.5.3 Additive and multiplicative extensions
5.5.4 Consecutive block two-stage preconditioners
5.5.5 Relation between alternate and consecutive forms
5.5.6 Efficient implementation

6 Computational experiments
6.1 Evaluating Krylov-secant methods
6.1.1 Preliminary examples
6.1.2 The modified Bratu problem
6.1.3 Richards' equation
6.2 Evaluating preconditioners for coupled systems
6.3 Evaluating parallel Krylov-secant methods and two-stage preconditioners for systems of coupled nonlinear equations
6.3.1 Brief description of the model
6.3.2 Considerations for implementing the HOKN algorithm with the 2SGS preconditioner
6.3.3 Numerical results

7 Conclusions and further work

Bibliography

Glossary

Illustrations

2.1 The use of the forcing term criteria for dynamic control of linear tolerances. The solid line represents a standard inexact Newton implementation with fixed linear tolerance 0.1. The dotted line is the inexact Newton implementation with the forcing term criterion. Each symbol * indicates the start of a new Newton iteration.

3.1 Convergence comparison of Newton's method (dash-dotted line), Broyden's method (dotted line), the composite Newton's method (dashed line) and the NEN algorithm (solid line) in their exact versions.

3.2 Convergence comparison of Newton's method (dash-dotted line), Broyden's method (dotted line), the composite Newton's method (dashed line) and the NEN algorithm (solid line) in their inexact versions.
3.3 Convergence comparison between Broyden's method (dotted line) and the Krylov-Broyden method (solid line).

3.4 Convergence comparison between the nonlinear EN algorithm (dotted line) and the HOKN algorithm (solid line).

4.1 Surface and contour plot of the Rosenbrock Jacobian pseudospectra at the first and last nonlinear iteration.

4.2 Surface and contour plot of the Powell Jacobian pseudospectra at the first and last nonlinear iteration.

4.3 Surface and contour plot of the Chandrasekhar Jacobian pseudospectra at the first and last nonlinear iteration (easy case).

4.4 Surface and contour plot of the Chandrasekhar Jacobian pseudospectra at the first and last nonlinear iteration (hard case).

4.5 Convergence comparison between the HKS-Broyden (dash-dotted line) and the HKS-EN (solid line) algorithms.

4.6 Convergence of the HKS-N algorithm.

4.7 Pseudospectra of preconditioned Jacobian matrices for the extended Rosenbrock function. Upper left corner: $AM^{-1}$; upper right corner: $A^+M^{-1}$; lower left corner: $A^+(M^{-1})^+$ and, lower right corner: $(AM^{-1})^+$.

4.8 Pseudospectra of preconditioned Jacobian matrices for the Powell singular function. Upper left corner: $AM^{-1}$; upper right corner: $A^+M^{-1}$; lower left corner: $A^+(M^{-1})^+$ and, lower right corner: $(AM^{-1})^+$.

4.9 Pseudospectra of preconditioned Jacobian matrices for the easy case of the Chandrasekhar H-equation. Upper left corner: $AM^{-1}$; upper right corner: $A^+M^{-1}$; lower left corner: $A^+(M^{-1})^+$ and, lower right corner: $(AM^{-1})^+$.

4.10 Pseudospectra of preconditioned Jacobian matrices for the hard case of the Chandrasekhar H-equation. Upper left corner: $AM^{-1}$; upper right corner: $A^+M^{-1}$; lower left corner: $A^+(M^{-1})^+$ and, lower right corner: $(AM^{-1})^+$.

4.11 Convergence comparison between the HKS-Broyden (dash-dotted line) and the HKS-EN (solid line) algorithms with tridiagonal preconditioning.

4.12 Convergence of the HKS-N algorithm with tridiagonal preconditioning.

5.1 Matrix structure of linear systems in the Newton iteration.

5.2 Spectra of the blocks composing the sample Jacobian matrix. From top to bottom, they correspond to $J_{pp}$, $J_{pe}$, $J_{ep}$ and $J_{ee}$.

5.3 Spectrum of the sample Jacobian matrix to be used throughout the discussion on two-stage preconditioners.

5.4 Spectra of the partially decoupled forms of the sample Jacobian matrix. The one above corresponds to $D^{-1}J$, and the one below to $\widetilde{W}J$ (or equivalently, $WJ$).

5.5 Spectra of each block after decoupling with $D$. From top to bottom, they correspond to the (1,1), (1,2), (2,1) and (2,2) blocks.
5.6 Spectra of the Jacobian right-preconditioned by the exact version of the combinative two-stage operator.

5.7 Spectra of the Jacobian right-preconditioned by the exact version of the two-stage additive operator.

5.8 Spectra of the Jacobian right-preconditioned by the exact version of the two-stage multiplicative operator.

5.9 Spectra of the Jacobian right-preconditioned by the exact version of the two-stage block Jacobi operator.

5.10 Spectra of the Jacobian right-preconditioned by the exact version of the two-stage block Gauss-Seidel operator.

5.11 Spectra of the Jacobian right-preconditioned by the exact version of the two-stage block discrete projection operator.

6.1 Performance in millions of floating point operations of Newton's method (dash-dotted line), composite Newton's method (dashed line) and the HOKN algorithm (solid line) for solving the extended Rosenbrock function, the extended Powell function and two levels of difficulty of the Chandrasekhar H-equation.

6.2 Performance in millions of floating point operations of Broyden's method (dash-dotted line), the nonlinear Eirola-Nevanlinna algorithm (dashed line) and the nonlinear KEN algorithm (solid line) for solving the extended Rosenbrock function, the extended Powell function and two levels of difficulty of the Chandrasekhar H-equation.

6.3 Performance in millions of floating point operations of HKS-B (dash-dotted line), HKS-EN (dashed line) and HKS-N (solid line) for solving the extended Rosenbrock function, the extended Powell function and two levels of difficulty of the Chandrasekhar H-equation.

6.4 Nonlinear iterations vs. relative nonlinear residual norms (RNRN) of Newton-like methods for the modified Bratu problem on a 40 x 40 grid.

6.5 Nonlinear iterations vs. relative nonlinear residual norms (RNRN) of secant-like methods for the modified Bratu problem on a 40 x 40 grid.

6.6 Performance in millions of floating point operations of Newton's method, composite Newton's method, the HOKN algorithm and the HKS-N algorithm for solving the modified Bratu problem on a 40 x 40 grid.

6.7 Performance in millions of floating point operations of Broyden's method, the nonlinear Eirola-Nevanlinna algorithm, the nonlinear KEN algorithm, the HKS-B algorithm and the HKS-EN algorithm for solving the modified Bratu problem on a 40 x 40 grid.

6.8 Water content distribution at 1st and 1000th time steps.

6.9 Dispersivity at 1st and 1000th time steps.

6.10 Hydraulic conductivity coefficients at 1st and 1000th time steps.

6.11 Performance in accumulated millions of floating point operations of Newton's method, composite Newton's method, the HOKN algorithm and the HKS-N algorithm for solving Richards' equation.

6.12 Performance in accumulated millions of floating point operations of Broyden's method, the nonlinear Eirola-Nevanlinna algorithm, the nonlinear KEN algorithm, the HKS-B algorithm and the HKS-EN algorithm for solving Richards' equation.

6.13 Relative residual norms vs. iteration of GMRES for different preconditioners, organized in matrix form. Subplot (1,1): ILU (dot), Trid (dash), block Jacobi (solid). Subplot (1,2): two-stage combinative (dot), two-stage additive (dash), two-stage multiplicative (solid). Subplot (2,1): two-stage block Jacobi (dot), two-stage Gauss-Seidel (dash), two-stage discrete projection (solid). Subplot (2,2): block Jacobi (dot), two-stage multiplicative (dash), two-stage discrete projection (solid). Problem size: 4 x 8 x 8, Δt = 0.1.
6.14 Relative residual norms vs. iteration of BiCGSTAB for different preconditioners, organized in matrix form. Subplot (1,1): ILU (dot), Trid (dash), block Jacobi (solid). Subplot (1,2): two-stage combinative (dot), two-stage additive (dash), two-stage multiplicative (solid). Subplot (2,1): two-stage block Jacobi (dot), two-stage Gauss-Seidel (dash), two-stage discrete projection (solid). Subplot (2,2): block Jacobi (dot), two-stage multiplicative (dash), two-stage discrete projection (solid). Problem size: 4 x 8 x 8, Δt = 0.1.

6.15 Relative permeability of both phases (left) and capillary pressure function (right).

6.16 Speedup vs. number of processors for the two-phase problem using the HOKN/2SGS solver on an Intel Paragon after 20 time steps.

6.17 Speedup vs. number of processors for the two-phase problem using the HOKN/2SGS solver on an IBM SP2 after 20 time steps.

6.18 Log-log plot of the number of processors vs. execution time for the two-phase problem using the HOKN/2SGS solver on an Intel Paragon after 20 time steps.

6.19 Log-log plot of the number of processors vs. execution time for the two-phase problem using the HOKN/2SGS solver on an IBM SP2 after 20 time steps.

6.20 Number of accumulated GMRES iterations vs. relative nonlinear residual norms (NRNR) using the HOKN/2SGS, Newton/2SGS and Newton/2SComb solvers on 12 nodes of the IBM SP2 for a problem size of 16 x 48 x 48 at the third time step with Δt = .05 day.

6.21 CPU time vs. relative nonlinear residual norms (NRNR) using the HOKN/2SGS, Newton/2SGS and Newton/2SComb solvers on 12 nodes of the IBM SP2 for a problem size of 16 x 48 x 48 at the third time step with Δt = .05 day.

6.22 Performance in accumulated GMRES iterations of the HOKN/2SGS and Newton/2SComb solvers after 100 time steps of simulation with Δt = .05 of a 16 x 48 x 48 problem size on 16 SP2 nodes.

6.23 Performance in accumulated CPU time of the HOKN/2SGS and Newton/2SComb solvers after 100 time steps of simulation with Δt = .05 of a 16 x 48 x 48 problem size on 16 SP2 nodes.

6.24 Performance in accumulated nonlinear iterations of the HOKN/2SGS, Newton/2SGS and Newton/2SComb solvers after 100 time steps of simulation with Δt = .05 of a 16 x 48 x 48 problem size on 16 SP2 nodes.

Tables

6.3 Total of nonlinear iterations (NI), GMRES iterations (GI) and (when applicable) Richardson iterations (Rich) for inexact versions of several nonlinear solvers. The problem considered is of size 16 x 16 gridblocks after 100 time steps of simulation.

6.4 Results for GMRES preconditioned by the nine schemes tested in this work. N_it: number of outer iterations; T_s: elapsed time in seconds for the solver iteration; T_p: elapsed time in seconds to form the preconditioner; N_i,a: average number of inner iterations per outer iteration. Preconditioners shown are, from top to bottom: tridiagonal (Tridiag.),
incomplete LU factorization with no infill (ILU(0)), block Jacobi (BJ), two-stage combinative (2SComb.), two-stage additive (2SAdd.), two-stage multiplicative (2SMult.), two-stage block Jacobi (2SBJ), two-stage Gauss-Seidel (2SGS) and two-stage discrete projection (2SDP).

6.5 Results for BiCGSTAB preconditioned by the nine schemes tested in this work. N_it: number of outer iterations; T_s: elapsed time in seconds for the solver iteration; T_p: elapsed time in seconds to form the preconditioner; N_i,a: average number of inner iterations per outer iteration. Preconditioners shown are, from top to bottom: tridiagonal (Tridiag.), incomplete LU factorization with no infill (ILU(0)), block Jacobi (BJ), two-stage combinative (2SComb.), two-stage additive (2SAdd.), two-stage multiplicative (2SMult.), two-stage block Jacobi (2SBJ), two-stage Gauss-Seidel (2SGS) and two-stage discrete projection (2SDP).

6.6 Physical input data.

6.7 Summary of linear iterations (LI), nonlinear iterations (NI), number of backtracks (NB) and execution times of GMRES and BiCGSTAB with the use of the 2SComb and the 2SGS preconditioners. The simulation covers 20 time steps with Δt = .05 and Δt = .5 for a problem of size 8 x 24 x 24 grid blocks on a mesh of 4 x 4 nodes of the Intel Paragon. (*): Backtracking method failed after the 17th time step; (**): Δt was halved after the 16th time step.

6.8 Results for the HOKN/2SGS and Newton/2SComb solvers for different large problem sizes vs. different number of processors of the Intel Paragon. Execution figures are measured after 10 days of simulation with Δt = 1 day. CPU times (T) are measured in minutes; (E) indicates parallel efficiency. (*) Abnormal efficiency due to paging of the operating system.

6.9 CPU time measured in minutes for a simulation of a million and six hundred thousand unknowns on 16 nodes of the SP2 for 10 days of simulation with Δt = 1 day.

Chapter 1

Introduction

1.1 Motivation

For the last thirty years, reservoir simulation has played a major role in the petroleum industry and, as a by-product, in the computer industry as well. Numerical simulators have been instrumental in helping reservoir engineers locate oil reserves, design recovery strategies and optimize oil field management. These packages have constantly expanded the computing frontiers by driving a major part of the research in algorithm development of the last three decades and by creating a market for the development of vector and, more recently, parallel computing systems. The growing concern for underground pollution has made it possible to adapt much of the same software and hardware technology to the numerical simulation of underground contaminant migration and of contaminant clean-up strategies.

The advent of increasing computing power has been a driving force for solving larger scientific and engineering problems. Consequently, new numerical algorithms have been coming forth with this computer technology sophistication. Nowadays, the idea of solving partial differential equations (PDE's) involving millions of unknowns is becoming plausible and attractive to the numerical analyst and programmer. The need for solving problems with at least one million grid blocks,
and several unknowns per grid block, has become one of the main challenges in the reservoir community. Therefore, the conception of robust and efficient solvers plays an important role in oil industry research. Major challenges arise in connection with solving coupled sets of nonlinear equations, as obtained by a fully implicit discretization of multi-phase models. This dissertation is an immediate response to the aforementioned application and practical context.

We are primarily interested in enhancing those aspects determining the overall computing time of a simulation: linear and nonlinear iterations. To this end, we have divided the present work into two major points:

• Efficient and robust implementations of the inexact Newton method based on Krylov-subspace and secant iterations.
• Efficient preconditioners for coupled systems of linear equations.

In the following, we describe the role that both ideas play in this research. This shall serve as further motivation and define the scope of the present work.

Ideas on performing inexact Newton steps have been around for some time. Ortega and Rheinboldt had already suggested this type of computation in the seventies [104]. However, maturity of these ideas came as a result of the work of Dembo, Eisenstat and Steihaug [41]. Their work provided mechanisms for deciding when the relative residuals of the linear solver are sufficiently small to ensure an acceptable Newton step. Since then, the reliability and acceptance of inexact Newton methods have been continuously growing together with the sophistication of linear iterative solvers for large scale problems.

On the other hand, quasi-Newton methods have been a good alternative to deal with the high computational cost and difficulties associated with the evaluation of Jacobian operators. These methods rely upon low-rank updates that serve as correction terms for secant approximations to the Newton equation. The most widely used of these low rank updates is Broyden's method [26, 49]. In the context of large scale problems, the main difficulty resides in maintaining the convergence of the method without destroying the sparsity pattern of the Jacobian matrix. Hence, most of the advances aimed at overcoming this can be categorized as limited memory methods. Comprehensive discussion and pointers to the literature about using these methods for large scale problems can be found in [28, 49, 101].

This dissertation looks at both approaches in a complementary way: we perform inexactness through a Krylov iterative method (i.e., GMRES) and perform (or rather reflect) secant updates of the Jacobian matrix restricted to the generated Krylov subspace basis information. To make this possible, we exploit the information generated by the Krylov iterative method by solving a sequence of minimal residual approximation problems, or by propagating eigenvalue information of the current Jacobian matrix to the Jacobian matrix at the next new point. The former procedure leads to the generation of faster versions of Broyden's method and a higher order version of Newton's method. The latter allows one to replace the use of GMRES iterations by cheaper Richardson iterations. The reliability of both approaches depends upon how well the Broyden update restricted to the Krylov subspace resembles the one given by Broyden's update in the entire space. Since GMRES is based on the Arnoldi process to generate a basis for the Krylov subspace, we propose to update the Hessenberg matrix and preserve the basis for future Newton steps. Such updates are based on Broyden's method but restricted to the current Krylov subspace, giving rise to what we call Krylov-Broyden updates. The underlying mechanism allows one to reformulate the solutions of future linear Newton equations as a sequence of minimal residual approximation problems without reinvoking the GMRES method. Therefore, faster and higher-order nonlinear methods can be built, and their effectiveness relies upon how well the linear directions contained in the Krylov basis are able to generate a descent direction for the norm of the nonlinear function.

1.2 Structure of the thesis

The present work is organized as follows. The remainder of this chapter is devoted to an overview of some notational conventions and results of linear algebra. A brief review of linear iterative solution methods, with emphasis on the Richardson iteration and its connections to Krylov subspace methods, is in order to prepare the groundwork for further analysis of hybrid Krylov-secant (HKS) methods. We end the chapter with a review of types of nonlinear convergence and some additional notation.

In Chapter 2, we establish the framework that distinguishes our inexact Newton method. It comprises the use of local and global convergence criteria, as well as forcing term criteria to dynamically adjust linear tolerances for the linear solver. The discussion is followed by a brief description of GMRES and BiCGSTAB. The former iterative method is the cornerstone in developing the family of Krylov-secant methods; the latter serves for comparison purposes in eventual numerical experiments with two-stage preconditioners.

In Chapter 3, we review the fundamentals of secant methods through two different rank-one types of solvers: Broyden's method and the EN algorithm. Both have interpretations and viewpoints in the linear and the nonlinear world. We also review a couple of recent methods developed along the notion of complementarity between inexact and quasi-Newton methods; that is, the need to incorporate the preconditioner into the convergence of inexact Newton iterations. A very appealing enhancement to this original idea is to incorporate not only the preconditioner but also the Krylov information produced by GMRES. This requires looking more closely at the Arnoldi factorization and analyzing its modifications after secant updates. The main contribution of the thesis focuses on these updates under the figure of Krylov-Broyden updates, producing more efficient algorithms for both inexact and quasi-Newton methods.

In Chapter 4, we complete the second part of the Krylov-secant ideas introduced in the previous chapter. Besides updating the Arnoldi factorization without destroying the Krylov basis, we need to account for the changes of the nonlinear function after each Newton step (i.e., at every new point). This implies extending the approach to the change of right hand sides arising at subsequent Newton linear equations. We find that switches to the Richardson iteration are appropriate to take advantage of the underlying Krylov information. We propose and discuss two different versions of HKS algorithms according to the type of secant update employed: HKS-B (HKS with Broyden update) and HKS-EN (HKS with EN type of update). We also discuss a Newton's method variant called the HKS-N algorithm. A couple of sections are devoted to addressing the problem of preconditioning and the implementation of line-search globalization methods. We end the chapter with some special considerations for efficient large scale implementations. This leads us to revise limited memory compact representations for the implicit computation of accumulated updates through nonlinear iterations.
we propose to update the Hessenberg matrix and preserve the basis for fu- ture :\'"ewton steps. Such updates are based on Broyden's method but restricted current Krylov subspace giving rise to what we call Krylov-Broyden underlying mechanism allows one to reformulate Therefore. The solutions of future linear Newton equation as a sequence of minimal residual approximation ing the G:\IRES method. updates. to the problems without reinvok- faster and higher-order nonlinear methods can be built and their effectiveness relies upon how well the linear directions contained in the Krylov basis are able to generate a descent direction for the norm of the nonlinear function. 6 1.2 Structure of the thesis. The present work is organized as follows. The remainder of the chapter is devoted to an overview of some notational conventions and results of linear algebra. review of linear iterative solution methods with emphasis ation and its connections to I\.rylov subspace methods groundwork for further analysis of Hybrid Krylov-Secant the chapter wit h a review of types of nonlinear A brief on the Richardson iter- are in order to prepare the (HKS) methods. convergence \Ve end and some addi tional notation. In Chapter method. 2. we establish the framework that distinguishes It comprises the use of local and global convergence as well as forcing term criteria to dynamically adjust linear tolerances is followed \vith a brief description method our inexact Newton is the cornerstone for the linear solver. The discussion of G~IRES and BiCGSTAB. The former iterative in developing the family of I\rylov-secant latter serves for comparison purposes in eventual numerical methods. experiments the with two- stage preconditioners. In Chapt.er :3. we review the fundamentals I'nt rallk-one type of solvers: intprpretations \'iewpoints methods. Broyden's met.hod and the EN algorithm. Both ha\'e in t.he linear and nonlinear world. \Ve also re\·iew a couple of recent along the notion of complementarity That is. the need to incorporate inexact Newton iterations. incorporate of secant methods through two differ- between inexact and quasi-;\f'wton the preconditioner A very appealing enhancement not only the preconditioner into the convergence of to this original idea is to but also the I\rylov information produced by G~[RES. This requires looking more closely at the .-\rnoldi factorization method and its modifications analyzing after secant updates. The main contribution of the thesis focuses in these updates under the figure of Krylov-Broyden updates more efficient algorithms for both inexact and quasi-Newton methods. and producing In Chapter 4, we complete the second part of Krylov-secant the previous chapter. Besides updating ideas introduced the Arnoldi factorization in without destroying the Krylov basis, we need to account for the changes of the nonlinear function after each Newton step (i.e., at every new point). This implies extending the approach to the change of right hand sides arising at different Newton linear equations. find that switches to the Richardson t.he underlying Krylov information. HKS algorithms iteration are appropriate to take advantage of \Ve propose and discuss two different version of according to the type of secant updates with Broyden update) employed: are devoted to address the problem of preconditioning methods. subsequent We also dis- A couple of sections and implementation of line- We end the chapter with some special considerations for efficient large scale implementations. 
pact representations HKS-B (HKS and HKS-E:'-i (HKS with EN type of update). cuss a Newton's method variant called the HKS-N algorithm. search globalization We This leads to revise limited memory com- for the implicit computation of accumulated updates through nonlinear iterations. In Chapter .) we focus the attention plexity associated on the issue of preconditioning. to coupled, non-symmetric to study strategies of decoupling The com- and indefinite linear systems leads and their role in preconditioning liS the whole system. Since we are looking at a specific problem of two-phase flow. we devote some preliminary discussion to the numerical system. This aids to understand model. its discretization algebraic the convenience of an aggressive decoupling strategy for the typical coupled systems arising in these problems. clecoupled operators and associated A detailed discussion on is in order, followed by coverage of different two-stage precon- ditioners (those based in nested or inexact iterations). out some implementation issues. \Ve end the chapter pointing Chapter 6 covers extensive numerical experimentation. [n agreement two main points of the thesis, this chapter is divided into experiments implementations of the Krylov-secant with the for sequential methods and for the two-stage preconditioners. At t.he end of the chapter. both ideas are integrated and tested upon a parallel two- phase reservoir simulat.or. Chapter 7 summarizes the main results and conclusions of this dissertation. recommendations Further and directions of work are ranked in the order that the author con- siclers worthwhile within the setting of large scale implementations. 1.3 Notation For notational simplicity. all scalars, vectors and matrices are treated as being in the real vector space. We closely follow Householder notational ing most of the entities. :\Iatrices, spaces and functions are denoted by capital Roman or Greek letters and vectors by lower case Roman letters. lower case Greek letters. engineering conventions for represent- Only exception and physics notation. All scalars are denoted by to this rule. in order to respect customary applies to differential equation terms. The norm 11·11 refers to the Euclidean norm and induced matrix norms. The inner product of two vectors represents the transpose \Ve indicate by I\: (.-\.) = ll. v E R is indistinctly operation IIAll IIA-III denoted hy ( ... ) or Ih~. (Here. / of a "ector or a matrix in the real vector space.) the condition numher of an invertible operator A E lRnxn. The spectrum or set of all eigem'alues of a matrix A is denoted hy AC-\.) and it is a suhset of the complex space C. Given:: + bi E C, the real part is va + b2• \Ve denote hy Inxn the n x n identity matrix; if the dimension n is apparent from denoted by Re(=) we represent the context. = a and = a the imaginary part by 1m (=) = b. With Izi = 2 the modulus of z. we simply write I. The symhol a is used for the scalar zero and for 9 the zero matrix: in the latter case, the dimension is assumed to be evident from the context. \Ve represent with the vector of zeroes with value 1 at the ith position. ei Its length should be apparent from the context. We also use the following notation Pm, = {<PP) = f>l'i,\i I aj E m}, IR,O ~ i ~ 1=0 for the set of polynomials of degree at most m. For any vector u E IRn and any matrix A. E IRnxn, we use Km (A, u) = span {v, Av, A2v, ... Am-Iv}, to indicate the mth Krylov subspace of lIe, generated Linear iterates are denoted by subscripts (1.1) by A and v. 
(usually i and j) and nonlinear iterates k enclosed between brackets. For the remainder of the chapter ancl by the superscript in some forthcoming sections of the thesis, we refer to the solution of a linear system of the form A.r = b, where A = (ai.j) non-symmetric .fO E nxn !R. , and x,b E in general. !R.n. ( 1.2) \Ve assume the matrix A is nonsingular In order to indicate inst.ances of linear iterations, as the initial guess for (1.2), Xi, as the ith iterate and ri = b- AXi, and we use as the ith linear residual. 1.4 Some preliminary results In this section a subset of linear and n'onlinear iteration reference in the forthcoming discussion. results are established as 10 1.4.1 Matrix analysis results The following definitions iterati"e methods properties are found in the standard (e.g .. [:3. 12]). \Ve state them as self-reference 1.4.1 The class znxnof znxn Definition 1.4.2 nxn \I-matrices = {A E ~nxn = when we discllss when a given iterative is given by i : ai,) ~ 0; # j}. of M-matrices {A. E znxn : (A-1) .. ~ is given by O}. I,) play an important in\'f'rse of a \I-rnatrix Z-matrices The class of Mnxn .u t hat of matrix theory or of coupled linear systems. Definition termining literature role in matrix analysis. method is convergent. is lIonnegatil'f. They are the basis for de- Definition lA.1 says that the and therefore monotone. 'We remark. however. a matrix could be monotone even though it is not a M-matrix. Other important class of matrices are those that have all eigenvalues in the right half of the complex plane. Definition 1.4.3 The class of of po"itirf. pnxll stable (or positive real) matrices is given by pnxn = {A E lRnxn : all t.igftll'aluu. The class of positive stable matrices gO\'erned by systems of coupled ordinary of A. hal'(; positive appears frequently differential real part}. in dynamical equations. systems Their occurrence implies the stabilit.y of the numerical model solution. The following result established in [:3] characterizes the relation between \I-mat.rices and positive stable matrices. II Theorem 1.4.1 Some additional Let A E then A E znxn, if and only if A E pnxn. results are in order to estimate the location of eigenvalues. we point out that a matrix is irreducible if there is no permutation such that PA.P! is a hlock upper triangular Definition JInxn 1.4.4 First. matrix P E llexn• matrix. The matrix A is diagonally dominant if n laijl > Llaijl,j = 1,2, ... n. )=1 #i and irreducibly least one ro\V diagonally dominant if the strict inequality holds for at and n L laij!.j laijl ~ = 1, 2, ... n. )=1 ):¢:.i Theorem 1.4.2 (Gershgorin) A (A) is enclosed in the union of the discs and in the union of the discs For implementation Woodbury formula original matrix. where B E of single or multiple secant updates, the Sherman-Alorrison- is useful to relate inverses of rank-one updates to inverses of the That is jRmxm and C \I·t E jRnxm. Here, we have implicitly expressions between brackets are invertible. assumed that the 1:2 1.4.2 Fundamentals Iterative methods on whether general T-I.\[ (T-I initial guess methods; The scheme .ro. The that most .\1 is assumed to be invertible non-stationary value iteration T. scheme known. the optimal relaxation parameter. is callecl the /'ela.ration Topt, (see e.g .. [:3. 7·L 11:3]). Here. '\max and ).min i. as conjugate by (1.4). 
applied iteration to the preconis the simplest stated parameter positive Topt character by choosing of the iteration by some a priori spectral is determined is given by (such It is simply when A is a symmetric and can for all values of methods described of as the Richardson (1.4) The stationary to a given constant Ti in convergence of A. For instance. = definite L .. : L._ are the largest matrix. thell for a stationary and minimum of A. respectin:{v. eigenvalue In the positive parameters interval. i = 0,1, ... , of the iteration. known parameter. information A = ;\1.,. - ,'iT = + T;JI-lrj, .f; operator iterative its effectiveness In a C. Hence, the following .\/-1A.e = .\I-lb. In fact. the Richardson .\[ = l. The damping the iterations. splitting E T see e.g .. [S]) can not be exclusively and lion-stationary and rl'gularly parameter (b - A..r;) = can be thought system :-;tatiollary iteration by t.he following damped here as the preconditioner however. during depending results + T;.\[-l 1'; or non-stationary of their parameters by fixing the parameters We remark. ditioned scheme .r;+1 = be interpreted is induced are changes as stationary JI - .4). for a given nonzero iterative for a gi\'en classified this can be realized non-stationary gradient are generally or not there format, - of iterative solution methods reduces which is customary definite case. to finding is given to assume the computation the best by a Chebyshev that of optimal approximation polynomial the symmetric part Richardson polynomial [3]. of the on a continuous In more matrix relaxation general is positive cases. definite it (i.e., the matrix across is positive stable) or that the eigenvalues each side of the complex are absent in the original these requirements. special plane matrix, (the indefinite then adequate In the broad literature, well kno\ ....n stationary .\1.,. is defined: e.g., the damped preconditioning methods of A. When part .Jacobi and Gauss Ti iterative Jacobi (A.) and the SOR iterative triangular \\ihen in two ellipses these conditions should iteration try to meet· is considered a (see e.g., [72, 74, 145] for further . Note that T-Idiag case). the Richardson case of the large class of Chebyshev discussion) are clustered = methods iterative method when = 0,1, ... 1, Vi derive method their arises AIr is defined as name when from how taking T-1 times we get the non-damped AIr = the lower versions of Seidel iterations. From (1.4) it follows that successive non-stationary Richardson iterations yields the ith residual (1..5 ) where Oi ().) nomial 1- = rIJ=o (1 - and it is monic .\I.'i-I (,\) = 1 - ,\ Tj).) (i.e., Pi. This polynomial E 4>(0) = 1). Therefore, L~~~"·O).j, is known as the residual it can be rewritten poly- as 4>;().) with U' E Pi-I' Furthermore. ,ri = .ro + l!'i-I (.-\)"0 (1.6 ) i-I = .ro + 2: Tjrj. j=O This implies that the solution at the ith iteration is determined by the affine subspace Xi The relation becomes evident. between E Xo the Richardson The Richardson + Ki (A, iteration iteration ( 1.7) ro) . and Krylov subspace delivers elements in Krylov methods subspaces now of 1-1 increasing dimension. the same I\rylov To generate defining subspace polynomial and with V represent.ing Tj'S (j)j iterations elements in set Q enclosing>' (.4), parameters 1. However. assumptions T, these are made. Let the eigenvalues of .-\. If we have some knowledge eigenvectors. then we can reformulate the problem of finding [6.5]to 116dA) roll :S I\: (V) min As said above. polynomials the origin. 
~early optimal obtained in the general expensive to compute On the other subspace [:3] j polynomial when all eigenvalues (i.e., asymptotically case of non-symmetric (1.T) implies that (.4., ro) without and Arnoldi processes). obtain than of the required relaxation there on a Krylov • ~[inimal shifted Q are real and and scaled does polynomials not include can be only [62, S6]. Incidentally, systems are two types subspace residual iterates )iote that we can readily Basically, through optimal) any prior and from there. reciprocals is computable they are numerically. hand. K (1.8) .\eQ ,p(0)=1 the optimal Chebyshev maxl<i>d>')lllroli. 4lEPj 0(0)= I defined < IIct>j(A)II unless some suitable the corresponding ·>E", Lanczos other we seek relaxation in such a way that ().) available IIrjll = min Krylo\' we produce in the form A = V-1 DV, with D containing A be diagonalizah1e the Richardson are not readily of a compact after each iteration as (1.6) suggests. convergent the residual parameters :\'[oreover, information from a basis of the on A (.-\.) (as happens such basis allows us to compute Pl. The Tj for solving prior information Choose for the Richardson Zm E the linear of the spectrum Km (.-\.. ro) in 1;'j_1 (,\) of d>j (>') are nothing roots parameters of approaches without approximation: OJ can be obtained more iteration. system (1.2) of .4.: and solve (e.g., :\lI~RES. G:\IRES) min :EKm(.-\.ro) lib - A (xo + z)11 = min :EKm(.-\.ro) lira - Azli. (1.9 ) 1.) or a, • Petrov-Galerkin approximation: Choose Zrn E Km (A. ro) so that (1.10) .~' The Petrov-Galerkin approximation reduces to the Galerkin approximation Sm == Km (A. ro) (e.g .. CG). For solving non-symmetric based on Petrov-Galerkin wElle. approximations These algorithms use 8m problems, == Km (.4t• when most methods w). for some initial vector are built over Lanczos bi-orthogonalization procedures: e.g, BiCG, CGS and BiCGSTAB. As regards to (1.6), both approaches solution by set.ting Xm The non-stationary tation. In Chapter practical standpoint = IO (1.9) and (1.10) obtain an approximate + =m· Richardson iteration plays an important -I:. we discuss in more detail this method in connection role in this disser- from a theoretical to near optimal residual polynomials and and Krylov subspaces. \Ve say that the splitting of A is a (weak regular splitting) is monotone and the term (AI;l NT) NT is nonnegative. block splitting for M-matrices regular splitting if .\/,. Convergence of point. line or A and .\1. following the scheme (1.-1:)is guaranteed a~ the following theorem reveals. Theorem splitting Proof 1.4.3 If A = .\1 - N is a weak regular splitting, then the is convergent if and only if A is monotone. o See e.g., [:3;p. 213]. ~ote that (l.-l:) is a form of the classical Picard's method of successive approxi- mations converging to a unique fixed point when the mapping is a contraction [1 V5]. l6 From that point of view. ~evanlinna for stat.ionary and non-stationary [99] makes a rigorous analysis of convergence (including the GMRES iterative solver) methods. Finally, as regards to the following 2 x 2 block partitioning -\.r. . - All ( All we define the Schu.. complement ,412) u ) ( An v ( f --b 9 of A with respect A21i"li} A12 (5'22 = All - .412.4.221 A21) ) of (1.2) -. to All (.4.22) as 511 = An - The Schur complement arises when pivoting with the diagonal blocks to generate a block LG decomposition. We use it to generate an efficient t.wo-stage preconditioner coup:ted equations. 
1.4.3 Nonlinear convergence

For the convergence theory of inexact Newton methods, we need the following definitions (see e.g., [49, 105] for further details).

Definition 1.4.5 A function $F : \Omega \subset \mathbb{R}^n \to \mathbb{R}^n$ is Lipschitz continuous if it satisfies

$$\|F(x) - F(y)\| \le \gamma \|x - y\|$$

for some $\gamma > 0$ and all $x, y \in \Omega$. The condition is summarized by denoting $F \in L_\gamma(\Omega)$.

Definition 1.4.6 Let $x^*, x^{(k)} \in \mathbb{R}^n$, $k = 0, 1, \ldots$. Then the sequence $\left\{x^{(k)}\right\}_{k=0}^{\infty}$ converges to $x^*$

• q-linearly if there is a constant $c \in (0, 1)$ and an integer $\hat k \ge 0$ such that for all $k \ge \hat k$,

$$\left\|x^{(k+1)} - x^*\right\| \le c \left\|x^{(k)} - x^*\right\|;$$

• q-superlinearly if there exist an integer $\hat k \ge 0$ and a sequence $\{c_k\}$ converging to 0 such that for all $k \ge \hat k$,

$$\left\|x^{(k+1)} - x^*\right\| \le c_k \left\|x^{(k)} - x^*\right\|;$$

• q-quadratically if there is a constant $c > 0$ and an integer $\hat k \ge 0$ such that for all $k \ge \hat k$,

$$\left\|x^{(k+1)} - x^*\right\| \le c \left\|x^{(k)} - x^*\right\|^2;$$

• n-step q-quadratically if there is a constant $c > 0$ and an integer $\hat k \ge 0$ such that for all $k \ge \hat k$,

$$\left\|x^{(k+n)} - x^*\right\| \le c \left\|x^{(k)} - x^*\right\|^2.$$

Assumption 1.4.1 (Standard assumptions). Consider a nonlinear function $F : \Omega \to \mathbb{R}^n$ for which we seek to solve the equation

$$F(u) = 0.$$

• The equation above has a solution $u^*$.
• $F' : \Omega \to \mathbb{R}^{n \times n}$ and $F' \in L_\gamma(\Omega)$.
• $F'(u^*)$ is invertible.

Chapter 2

Newton-Krylov methods

2.1 The inexact Newton framework

Interest in using Newton's method combined with a Krylov subspace method to solve large scale nonlinear problems dates from the middle 1980's [143]. At that time, these methods were rapidly evolving together with their applicability to algebraic problems arising from systems of nonlinear ordinary differential equations (see e.g., [19, 22, 36] and references therein). In the context of partial differential equations, their suitability for solving large nonlinear systems was finally established through the work of Brown and Saad [20]. In their paper, Brown and Saad include extensions of globalization techniques, scaling and preconditioning. They also discuss application to several types of partial differential equations. Currently, intensive investigation is still going on from both the theoretical and the practical standpoint; see e.g., [23, 32, 55, 70].

This chapter discusses the globally convergent inexact Newton method used in the present work and formalized by Eisenstat and Walker [55, 56]. The basic components of this method are a Krylov subspace iterative procedure for solving the Jacobian equation, forcing term selection criteria for dynamically adjusting linear tolerances, and a line-search backtracking method to provide global convergence under suitable conditions.

The discussion starts from the presentation of the general algorithm and proceeds to describe each of these components. This comprises §1. In §2, the discussion is devoted to the GMRES Krylov-subspace iterative method, with particular attention to the Arnoldi factorization on which it is based. Some words are devoted to the method and its implementation.
let of the function == F (u(k») F(k) and its derivative at the Algorithm 2.1.1 describes an inexact Newton method applied to equation (2.1). Algorithm 1. Let 2.1.1 11(0) (Inexact :\ewton method) be an initial guess. 2. For /..~= O. 1. 2.... 2.1 Choose until convergence do E [0, 1). 'l(k) 2.2 {'sing some Krylov iterative method, compute a vector S(k) sat- isfying (2.2) The residual solution. r(k), represents the amount by which the solution, by the I\.rylov iterati\'e solver (namely, GMRES or BiCGSTAB satisfy the .Vewton Equation (or Jacobian equation) s(k). given in this work) fails to 21 (2.:3 ) The use of an iterative solver stems from the high cost associated with solving (2.:3) exactly. Obviously, if Ol't is far from u· then iterating ll(k) to a low tolerance may produce /'soluing of the Ne\\,ton equation since there may be a considerable between the nonlinear disagreement function F and its local linear model. The residual norm in the linear solver must be reduced strictly for a locally valid Newton step. The step length is computed ,\(k) ensures a decrease of f(ll) = ~F(ll)t to be a descent direction for f(u(k»). using a line-search backtracking F(ll). method which The step given by (2.2) should force s(k) That is, (2.4) In this case. we can assure that there is a (0 such that 0 < ( < (0. 'V \ote that in view of (2.2), (F(k)r that the required condition is sufficient for s(k) = -IIF( + (F{k)r inequality argument tells us that being a descent direction. linear solver must be reduced strictly which means r(k), Ilr(k) II = IIF(kl + J(kls(k) II < Thus, the residual norm in the for a locally valid Newton step. this condition is achieved by setting a linear tolerance, residuals generated 2 klI1 (2.4) is achieved whenever .\ simple Cauchy-Schwarz II Fik) II J(k)s{k) say 7](k), In practice, to be met by relative in the linear solver. That is, o < 7] (k) < 7]max < 1, (2 ..j) where m indicates act ~olution s(k) the number of linear iterations is acceptable employed. condition. condition The predefined linear tolerance term of (2.,5) (i.e., the term that forces such condition Dembo, Eisenstat inexact )l'ewton iterates 'Ilk) :::; 'Imax TJ(k) < L and if Tl(k)'S uniformly bounded is known as the forcin[J to be satisfied). is nonsingular. 1.4.1, for u(k) suf- below one, that the sequence of If the sequence converges to zero with converge q-linearly, Jl·) This or more generally. the and Steihaug [.tI] showed under Assumption ficiently close to u· and the lOex- for the Newton step whenever (2.5) is satisfied. condition is known as the Dembo-Eisenstat-Steihaug ine.mcf Newton Therefore, then the iterates generated If 2.1.1 converge to the solution superlinearly. TJ(k) = 0 (IIF(k)II), by Algorithm then the sequence converges quadratically. 2.1.2 Forcing terms Empirical criteria t.o select forcing terms can be seen, for instance, Although the choices proposed in [20. :3.5.10:3]. therein are somehow simple. cheap to compute and. some of them promise fast local convergence. their ad hoc nature preclude them from a broader applicability. small ,/(k) In some situations. these choices may generate a prematurely for large values of /...with iterates causing oversoh'ing of the :\ewton equation. of 'Ilk) (close to 1) after se\'eral iterations convergence. Therefore. u(k) relati\'ely far from the solution u· Conversely. 
keeping quite high values may severely affect the nonlinear rate of the point is to select these forcing terms in a systematic way in order to ensure efficiency and rapid convergence at the same time. Criteria for choosing the forcing term, ,,(k). in (2 ..5) have been extensively stucliecl by Eisenstat and \Valker [.5.5,.56]. Although their choices still have a heuristic blend. they are designed together becoming arbitrarily with an efficient mechanism small or large. Besides providing for safeguarding them from robust and efficient choices :2:3 that prevent a significant amount of oversolving in (:2.:3) without sacrifying desirable rates of convergence. their results formalize much of the trial and error judgment specifying dynamic linear tolerances \Ve have incorporated mentation of this thesis. choice for 7](k) Eisenstat in an inexact Newton method. and Walker ideas in the Newton-Krylov imple- In fact, it has been observed that the following particular works \vell in practice [-1:0, .56]: (2.6) \\·here The choice of 17(k) -(k) _IIIF(k)II-IIF(k-l) '1 - + .j(k-l)s(k-l)11I I\F(k-l)1\ given above reflects the agreement model at the previous iteration. between F and its linear Thus, the linear solver tolerance ~e\Vton step is less likely to be productive (2.7) . is larger when the and smaller when the step is more likely to lead to a good approximation. Expression (2.6) is a safeguard that prevents (2.7) from becoming arbitrarily especially when the iterates are still far from the solution. there is coincidentally a good agreement smalL This may happen either if between F and its local linear model (e.g .. regions where the function behave almost linearly away from the solution) or if there is a \'Cry small :\'ewton step (i.e .. there is little progress as consequence of being far from the solution). and ,](0) In most practical cases. Eisenstat and vValker suggest = 0.5 as a fair initialization "max = 1 - 10--1 strategy. Eisenstat and \Valker suggest other choices for forcing term criteria and safeguards. Among alL the one proposed here has been the most robust combination observed in practice (\ve refer the reader to [.56]). The following result formalizes the local convergence predicted ing term criterion. by the above forc- Theorem 2.1.1 (Eisenstat ditions given by Assumption r and \Valker, [,56]) nder the standard 1.4.1, the sequence {1](kl} generated conby the forcing term criterion (2.7) produces a sequence of inexact Newton iterates {/t(k)} that converges as follows .\ couple of remarks are in order. Remark 2.1.1 The theorem immediately gence and t\vo-step q-quadratic Remark I.: 2.1.2 implies q-superlinear COl1\'er- convergence. Clearly, this safeguard ensures (1](k-l) f ~1](k) for all > O. One can t hen argue that expression (2.6) may prevent this con- \'ergence result (2.S) from happening. (:2.6) eventually r-quadratic becomes inactive. In practice. however, the safeguard Eisenstat and 'vValker [.36] show that convergence proceeds p\'en if the safeguard condition is active for all \'allles of k. The following example illustrates on a real application application problem the high le\'el of efficiency that can be achif>\'ed with this forcing term criterion. Incidentally. this represents one of the immediate targets to be pursued in this dissertation. 
Example 2.1.1 Fig 2.1 shows a loglo scale plot of accumulated of G~IRES iterations against IIPII for a moderate number problem size of 24 x :24x ,32 (composed of more than SOOOO unknowns) taken from a given time step of a :3-0 two-phase black oil reservoir simulation details). The staircase shape displayed (see [40] for further by an inexact Newton method :2.5 2 -2 -3 o 0,5 1 1.5 10910 of number of GMRES 2 iterations 3 Figure 2.1 The lise of the forcing term criteria for dynamic control of linear tolerances. The solid line represents a standard inexact Newton implementation with fixed linear tolerances 0.1. The dotted line is the inexact ~ewton implementation with the forcing term criterion. Each symbol * indicates the start of a new Newton iteration. with a fixed forcing term suggests the amount in generating decreasing Newton directions. of wasted computation In contrast, the criterion given by (2.6) and (2.7) avoids flat portions of curve resulting then in a significant overall saving of G~IRES iterations 2.1.3 (about -WO iterations). Globalization The condition (2.5) itself is not sufficient for converging to the root of the nonlinear function F if the inexact ~ewton iteration starts at any arbitrary we use the line-search backtracking this method, we need to find a step point. In this work method in order to remedy this situation. slk) For that not only satisfies (2 ..5) for a given '11k) but also a condition for ensuring a sufficient decrease for IIF(k)lI. can be also achieved by trust region methods Global convergence (e.g., [49, 124]). However, we restrict 26 our implementation to line-search flpxible t.o use. sufficient ble. and simpler Some developments methods reasons on the use of trust since they tend to be more for them region to be more methods within widely applica- inexact Newton can be found in [20. 2:3. :j.5. /7]. iterations [n any case. the key point greater model backtracking or equal to some fraction (i.e., the direction condition is to guarantee translates This inequality combined the act.ual reduction of the predicted reduction from solving the linear obtained to accepting that a new Newton needs to be given by the local linear Newton equation). step if with (2.5) yields (2.10) t E (0,1). The above expression This can also be seen as the result of combining the a-condition [-i9], of Goldstein-Armijo (:2.11) with (:2.:')). inequality Here. f = t IIF112. and (2 ..1). it readily f (u(kl + s(kl) 0 < (\ < 1. In fact. applying the Cauchy-Schwarz follows from (:2.11) that = IIF(k+1lll 2 ~ IIF(klll 2 + 20 2a(-IIF(klf = IIF(kl = [1 - 2a (1 - '1(k))] IIF(klf 2 + I1 (F(kl.J(kls(kl) + (F(kl, r(kl)) (2.12) . therefore. Since )1 - 2a(1square roots '1(kl) ~ 1- 0' (1 - '1(kl) in both sides of the above inequality for 0 < 2a we obtain (1-1J(k)) ~ 1. by taking (2.10) with a == t. Note that the coefficient procedure IIF(k)11 of as the iteration It is straightforward that in (2.10) is less than advances. to observe That that < 1}(k) is sm,all. ~'Ioreover. ;'sufficient reduction" condition (2.,5) alone II F(k)1I is not for this is that consequently. between met for a given computed of norms < 2(1- may not converge the nonlinear As said above, step decrease in in (2.12) implies the following and the restriction for any 0 TJ(k))' conditions 'l(k) is recommended in f. This t. ~ to shorten the current to replace < n(k) < 1]max it is not true < and IIF(k) II. See of step s(k) the a-condition [49] and it additionally conditions). 
for a given pair To overcome to avoid small relative by ,\(k)8(k) this fractional t this, it decreases for a suitable of Goldstein-Armijo follows that t, on 1. that simult.aneously. step in order the current It can be shown that holds in this case However, (2 ..5) and (2.10) are satisfied means [~.':\] C (0.1). 17(k) f condition ~ot.e that since t (= a) is less than unity (due to the Goldstein-Armijo this imposes The s(k). to a minimizer a sufficient for of this problem. 1 t requirements necessarily sources and the actual simultaneous agreement" to generate 1, which suggests the predicted method. 6] for two identified 0< < '7(k))] given by Newton's {u(k)} of the = O. model the sequence The non-negativity between and for a "sufficient it may not be possible [49. Chapter (1 - [1 - t (2.10) and (2..j) impose F and the local linear function reason of the convergence f (u(k)) is, limk_oo for a value of t close to one, the margin reduction 1, implying ,\(k) E (2.11) st.ill step leads t.o (2.1:3) Therefore. by ,\(k) ,\(k)s(k) and (2 ..5) and (2.10) are satisfied 1- ,\(k) from approaching grow arbitrarily larger at the same time with + ,\(k)17(k), respectively. zero which would make than 1. contradicting S(k) The only subtlety 17(k) (2.10). ~ 1. making and 1](k) replaced left is to prevent it possible for t to The discussion above provides almost all the necessary ingredients tracking globalization method. To generate adequate fractional steps with polynomial p (A) interpolating find the minimizer of a quadratic for the backwe ,\(k). the function (2.1-1) over a predefined practical = p(O) :2 (F(k). [~,X] interval implementations.) = p(l) ;:(1) The mechanics J(k)s(k)). fined as s(k) = ,\(kls(k) 2 + s(k))11 (u(k) is computed. and ,.,(k) 1_,\(k) ,.,(k)) in such a way that. and p'(O) in backtracking the ~ewton (1 - == [0.1,0.5], in most this interpolating is standard ,\(k) = = IIF [~,X] is constructed for computing defining the interval of interpolation (see e.g .. [-19]). Once (Initially, This polynomial = IIF(k)112, ;:(0) C (0.1). = ;:'(1) polynomial = and implementations step and forcing term are redeuntil condition (2.10) is eventually met. Furt.her theoretical results on globalization procedures for inexact Newton meth- ods can be seen in [23 .. 5.5]. 2.1.4 Why Krylov subspace methods [~rylo\' subspace iterative methods are basically based on a set of computational nels such as inner products. makes them part.icularly performance method. attractive This featurp In the context of the Newton's methods are suitable for reducing the frequently unaffordable with the computation the computation in several respects: • The function F (ll) cost of an exact Newton step. Besides high cost reasons (i.e .. implied by a direct procedure). impractical multiplications. for soh'ing large sparse linear systems at high rates on vect.or and parallel machines. iterative associated AXPY's and matrix-\'ector ker- is non-differentiable. of the Jacobian matrix can be 29 • Even if the Jacobian analytically exists, it may be expensive or numerically . • Even if the storage and computation really ill-conditioned Fortunately, to store or compute either of the Jacobian is affordable, it may be a system. since in Krylov subspace iterative methods explicit knowledge of the Jacobian is not required, but rather its action on a vector in the form of matrix-vector multiplication, the .Jacobian the first two aforementioned J(k) on an arbitrary and preconditioning points can be overcome. 
vector v can be approximated The action of by finite differences can be even performed to overcome the third point above. instance, right preconditioning for a suitable small E via an operator For M can be realized as follows, > O. In problems modeled by PDE's, if the Jacobian is not available, the preconditioner can be constructed time-lagging last respect, by approximating some operators a lower order discretization or conceiving domain decomposition there has been a lot of activity of the Jacobian, strategies. in the areas of nonlinear In this and non- symmetric problems (see [1.5, 30, :32, 31, 78, 108) to mention just a few). Theoretical convergence results of inexact Newton methods based on finite differences have been analyzed in depth by Brown [17). In our particular scenario, we do not make a special consideration the choice to rely upon finite-differences Jacobian matrix, whatever representation turns out to·be applicable. or explicit computation We are primarily with the general mechanics of algorithms not necessarily supporting ity of the operators therein involved. Throughout if the user has this dissertation, of the concerned explicit availabilthe implicitness :30 of the Jacobian and its corresponding memory quasi-Newton 2.2 methods preconditioner is achieved by use of limited to draw the maximum savings in computation. GMRES Much has been written about the GMRES algorithm in 1986 [114]. Currently, proposed by Saad and Schultz it is considered the most robust Krylov iterative scientific and engineering applications (see e.g., [32, 39, 59, 73, 146]). This feature has driven further research towards the generation efficient iterative solvers, several preconditioning of other equally robust but more strategies and, the issue of inexact- ness in Newton's method as analyzed here. Instructive new iterative solvers in connection One of the strongest monotonically findings have come about from with the GMRES algorithm arguments method in [50, 81,133,144]. for using GMRES is its capability of producing decreasing residual norms. For a problem size n, convergence is guar- anteed within n iterations in the absence of roundoff errors. However, m iterations of GMRES requires 0 (m2n) making the procedure floating point operations infeasible for large values of m . Restarting steps (with m ~ 11) has alleviated longer guarant.eed. and 0 (mn) of memory storage. GMRES after m the problem but, in this case, convergence is no However. the restarted version of GMRES (i.e., GMRES(m)) been observed to work well in practice. Remark 2.2.1 Research on generalized minimal residual algorithms had started several years before the launching of GMRES. In 1969, Kuznetsov [8.5] proposed the minimization lower dimensional subspaces. in fact to the minimization of weighted residual norms over suitable His work on m-step residual methods leads of residuals Pointers and discussion of particular subject to a Krylov subspace. cases on this approach can be found has 31 in [88, 116]. Later related advances and unifying ideas are shown in [45, 53]. In this section, we discuss those details relevant to the situation thesis. Further details on GMRES implementations analyzed in this are given at length in (113, 136, 137]. 2.2.1 The Arnoldi factorization The GMRES algorithm generates a basis for the Krylov space through process. The Arnoldi process constructs through the Gram-Schmidt algorithm. 
process creates a decomposition an orthogonal the Arnoldi basis for the Krylov subspace Given the notation in § 1.3, the output of such of the following form (2.15 ) which can be equivalently expressed as (2.16) where The matrix FmH is orthogonal and its columns represent The matrix Hm is upper Hessenberg. the initial vector l'l = ro/ IIroll. VI Considering a basis for Km (A, t'l)' (1.2) as the system to be solved, is defined as the normalized initial residual fO = b - Axo, i.e., Note that it immediately follows that Hm = V~AVm' Clearly, the upper Hessenberg matrix Hm is nothing but the matrix representation A onto Km (A,VI) with respect to the orthogonal of the projection of basis V. The main idea of GMRES (and in general, of Krylov subspace methods) is to project the original problem into Km (A, vd and to apply any of the two approaches described by (1.9) or (1.10). 32 2.2.2 Minimization of residuals In view of (2.16) the minimal residual approximation min :EX:m(A.ro) where i3 = IIrall = yEJR miItn ILBel - lira - Azil and el E JRm+l, AVmyll (1.9) can be reformulated = yEJR millll,Be1 due to the orthogonality Ym minimizes (2.17) the solution of (1.2) is obtained In practice, the minimization Givens rotations. - H myll ' as (2.17) of the columns of Vm+l. If by means of of H m via problem is solved by a QR factorization To achieve higher efficiency, the QR factorization is performed the Arnoldi process advances. This allows to compute rm without explicit use of More precisely, consider that m Givens rotations, reduce it to an augmented where Q E matrix. JR(m+l)x(m+l) upper triangular Gi, have been applied to H m to Rm E JRmxm is an upper triangular Therefore. we can rewrite (2.17) as the following minimization Furthermore, Ym = ;3R:;;/Q~el. an easy calculation Xm. form is a unitary matrix and \vhose solution is given by as with Qm = (Qm,qm+l) ,Om problem E reveals that the residual of the minimization lR(m+lxm). problem reduces to (2.18) Hence. the mth residual depends on the size of the initial residual and the resulting entry (1. m + 1) of the unitary matrix Qm intervening in the QR factorization. It is shown in [19] that this entry corresponds transformations, namely (i, to the accumulated product of the sine involved in every Givens rotation Gi. In other words. one can further say that (2.19) Therefore, one does not need to explicitly compute rm = b - AXm. scalar product per iteration the QR factorization retrieves the norm of the current residual. Instead, one Note that can be carried out efficiently since it is applied to an upper Hessenberg ma.trix. The cost incurred in this factorization is of only 0 (m) floating point operations. 2.2.3 Convergence If A is diagonalizable, case Eisenstat, then the basic result (1.8) follows. However, for a particular Elman a.nd Schultz [.53]established Theorem 2.2.1 If the symmetric the following result part As = AI At of A is positive defi- nite, then the norm of the mth residual produced by GMRES is bounded by (2.20) where Amin (As) and '\max (At A) denote the smallest eigenvalue of As and the la.rgest eigenvalue of At A. respectively. 2.2.4 Algorithm In this work, we use right preconditioning; i.e., we solve and solve lUX = x, :34 to carry out the action of the preconditioner .M. It is well known that this form is preferable over left preconditioning different preconditioners iterative since it makes relative residual nqrms measures within the solver invariant. 
of globalization This norm size invariance simplifies the implementation methods, described in § 2.1. reason for adopting right preconditioning. If a left such as the line-search There is a even more compelling preconditioner for comparing is inexact or unavailable norms accurately, or rather consistently, backtracking in closed form, there is no way to measure throughout the steps of an inexact Newton method. Interestingly preconditioning enough, there are no substantial when the preconditioner differences between right and left is well-conditioned. Both approaches gener- ate the solution in the same Krylov subspace but with the basic difference that the latter leads to minimization of the preconditioned residual norm I/M-I (b - Arm)11 instead of (2.19). Further discussion on this subject can be seen in [113]. Given a matrix A, an initial guess, .ro, right hand side vector, b, right preconditionel' AI, restart algorithm parameter of m and stopping is GMRES(A .. rQ,b,l\I,m,e) Algorithm 2.2.1 1. Solve My = XQ. 2. ro = b - A.y. :3. VI = roj Ilroll· 4. For j = t, 2.... , m do 4.1 Solve iVly = 4.2 tv = Ay Vj. tolerance c, the restarted GMRES 4.3 For i = 1,2, ... , m do 4.3.1 hi•j = (w, Vi). 4.3.2 w = w - hi,jVi. 4.4 hj+l,j = 4.5 Vj+~;' 5. Xm = Xo Ilwll· lV / hj+l,j. + VmYm,where Ym 6. Solve AJy =.,fm• 7. r = b - Ay. If 8. Xo = Xm m minimizes (2.17). Ilr II < s m return. and goto 1. The loop in step 4 defines the Arnoldi process which is based here on the modified Gram-Schmidt orthogonalization. We should point out that when hm+1.m = -!.4 then it is not possible to generate a at step the next Arnoldi vector. This implies that the residual vector is zero and the algorithm delivers the exact solution in this step (i.e., a happy breakdown). Conversely, if the algorithm hm+1.m = O. Theoretically, stops at step j with rj = 0, then this implies the delivery of an exact Newton step instead of an inexact Newton step in Algorithm 2.1.1. 2.2.5 The role of GMRES in Newton's method The GMRES algorithm offers the opportunity to compute Newton's method in a more efficient way. For instance, term criterion (2.7) can be computed several entities within in light of (2.18) the forcing as IIIF(k)ll- {3lq~+lelll IIF(k-l) II (2.21 ) :38 2.3.1 General remarks In the Bi-CGSTAB algorithm residual fi is orthogonal way.ri is orthogonal the iterates to {rd~-l 1'0, (Bi-orthogonality Wjx), condition). The i-th residual can where Pi is a monic polynomial of degree less than or equal to i. The "shadow" residualsrj rt=l (1- in such a way that the with respect to a sequence of vectors {fd~-l and in the same be expressed as ri = Pi (A) Qj (;r) = are constructed where the are implicitly computed Wj as ri = Qi(At)ra. Here, are chosen so that i # j. This last condition can be enforced without explicitly referring to At. BiCGSTAB has small storage requirements, and produces a solution smoother E Xa + K2k residual norm behavior guaranteed 2.3.2 Xk requires two matrix-vector (A, ra) . Typically, products per iteration this method produces much than CGS, but the residual norms still are not to decrease from one iteration to the next. Algorithm Gi\'ell a mat.rix .4.. an initial guess, ,fa. right hand side vector, b. preconditioner and stopping tolerance s. the BICGSTAB algorithm Algorithm 1. fa 2.3.1 .ro, b, AI, e) BiCGSTAB(A, = b - Axa. 2. Choose ro so that rfJ'Fo =/: :3. Po = Q = ",",'a = i = 1; Va O. = Po' = O. -!. While (Ilrill > s) do I "1'. 1 PI. - rafi-l,}J -t . I:J _ -. (-2L) ( -:-. 
P,_I <> "'1_1 ) is AI 39 4.2 = Ti-l + {3 (Pi-l Pi Mp = Pi. 4.3 Solve Ap. 4.4 Vi = 4 ..5 Q = pi! 4.6 S = (~Vi)' Ti-l - QVi. 4.7 Solve ills = s. 4.8 t = As. 4.9 Wi=WS)j(ttt). + QP + WiS. 4.10 Xi = Xi-l 4.11 Ti =s- 4.12 i = i Wjt. + 1. j-d· - Wi-lV 41 Chapter 3 Secant methods We now introduce secant methods as iterative procedures linear equations. for solving linear and non- They are better known for solving the latter type of problems (e.g., Broyden's method, DFP, BFGS, and so forth; see e.g., [49, 79,98, 101]) but, recently, there has been important activity formulating [50, 52, 135, 144]. Recent developments secant methods and other established In our particular context, ways to solving nonlinear has been one of the most effective when the computation Traditionally, as a linear solver and consequently, the iterative algorithms literature. almost forgotten (the EN algorithm, Their algorithm of the Jacobian matrix this method has been con- Eirola and Nevanlinna since they developed a new algorithm based on rank-one updates. between methods. Broyden's method equations linear solvers have shed light on new connections is highly expensive or infeasible to obtain. sidered impractical them as non-symmetric throughout [52] revitalized the subject as it is currently known) is an extension of the linear version of Broyden's method and it is mathematically equivalent a special selection of the type of update. Van der Vorst and Vuik proposed efficient ways to implement sion of GMRES (RGMRES) to the GMRES algorithm the former EN algorithm [133, 134, 135]. Motivated by these results, Deufihard, . Freund and Walter [.50]revisited the family of secant updates in combination with a In this way they were able to improve the effectivity of Broyden's method as a linear iterative with the GMRES algorithm. more which led to a recursive ver- . suitable line-search strategy. for method up to the level of being competitive A unifying algorithmic advances was recently introduced approach of all these previous by Yang in her Doctoral thesis [144]. It is impor- 42 tant to remark that her work provides a nonlinear version of the EN algorithm plays an important role in the present dissertation. In t.he scenario of nonlinear exploiting the conjunction pOl·tant attempt equations some isolated activity has been devoted to of inexact Newton methods and secant methods. rate of convergence by absorbing treated secant approximations to reuse information .Jacobian equation. of low-rank updates regard Broyden's left or required by the linear iterative Approaches local q-superlinear of the Jacobian matrix into Other efforts on the sources of inaccuracy in [48, 47]. Most recent approaches method Newton as a vehicle like these were proposed by Martinez [89] and Brown, for I\rylov iterative methods such as GMRES. In the optimization ically in truncated methods, similar attempts preconditioners arena, more specif- were previously reported Nash [9.5, 96]. An approach based on the combination truncated is reported by Byrd, Nocedal and Zhu [29]. Newton methods to justify and describe a couple of new nonlinear algorithms Instead of building preconditioners to propagate that subspace. Krylov subspace information order versions of this algorithm through Broyden's it can be recurrently and of Newton's method. 
can afford the use of any desired preconditioner overhead in computing provides the for large scale out of secant updates, we rather suggest The idea allows the efficient implementation (NEN) algorithm and furthermore, by of limited memory BFGS and This chapter reviews part of above the ideas but more importantly, problems. are method for solving the Saad a.nd Walker [21]. Their work was primarily meant to generate foundation One im- dates back to the work Eisenstat and Steihaug in extending inexact- ness issues to Broyden's method [54]. Basically, they characterize linear residuals. which updates restricted of the nonlinear extended to EN to develop higher- As a result, these algorithms to the current problem and avoid the secant updates and Jacobian evaluations simultaneously. 43 The discussion sets out with Broyden's method and the EN method. the linear version in order to contrast both methods the nonlinear case. This encompasses updates as a prec9fr?itioning tical scope, they certainly strategy. and motivate the arguments §3.1. Thereafter, Martinez and Brown, Saad and Walker enhanced We present we highlight some ideas of by the possibility of using secant Although, they may turn out of limited prac- drive the need of using the secant method as a device to alleviate the high cost involved in solving the Jacobian system. This describes the contents of § :3.2. In § :3.3. we suggest a way to perform rank-one updates Hessenberg matrix resulting from the Arnoldi factorization. previous development (potentially of the In § 3.4, we find that the allow us to reuse the Krylov information version of the NE~ algorithm The overall procedure for and devise an efficient faster than the inexact Newton's method). is named the nonlinear KEN algorithm and its cost is about the same as that of a GMRES solution plus the almost negligible cost of updating Hessenberg matrix and an extra minimal residual solution such as (2.17). higher-order method is also introduced. 3.1 version of this algorithm for Newton's a An even The family of rank-one solvers Rank-one updates for solving nonlinear equation are sometimes referred as secant or since no "true" Newton equation qua.5i-.\'fwton methods all iterations. These updates new Jacobian approximation. one originally introduced obey a secant condition is ever formed throughout that has to be satisfied by the The best of these methods still seems to be the first by Broyden [24, 2.5] *. 'Throughout this dissertation, we restrict the attention to the "good" versions of Broyden 's method and the NEN method since they have been commonly observed to be the most effective in practice and do not introduce a loss of generality to our discussion. 44 Under the same philosophy, rank-one updates for solving linear systems progressively updates an a.pproximation to the desired solution. conditioners approaching this variability to the matrix (or its inverse) in order to converge These approximations are nothing more than variable pre- or acting as the original operator in the preconditioner discussed in §§1.4.2.) Unfortunately, of the system. (Note that implies the non-stationary iterative method, as this type of methods has had a bad reputation in solving linear systems for a long time. Recent efforts such as [50, 52] have contributed to a reconsideration of this position. for Yang in yielding a nonlinear Remarkably enough, these works were valuable interpretation of the EN algorithm [144]. Our goal in this section is to introduce Broyden's method and the NEN algorithm. 
The evolutionary path leading to the current NEN algorithm the developments in the linear case be covered first. further motivation algorithm, of the ideas in this chapter. which incidentally requires that some of However, this should serve as We emphasize the essence of the EN presents a close affinity to higher-order methods derived from Newton's method and already known for around thirty years. The key result is that inexactness can be introduced into these rank-one met.hods without losing much of their local rapid convergence. 3.1.1 Broyden's method Given u ~ u- and lvl ~ J (u) , we can find an approximate new Newton step, ll+, by (:3.1 ) Broyden's method computes a new 1\1+ by means of the following rank-one update iU+ = 1\-'/ + [p+ (u) - P (u) - 1\-'/ s] gt gts ' (:3.2) 45 which, whenever the approximated Jacobian equation is solved exactly, i.e. lYf s = -F (u), it reduces to ~'l+ _ lV. for gts # u - lVl + F+ gt(u) s gt , O. The vector 9 can be chosen in several ways. For instance, we obtain the "good Broyden's update" the "bad Broyden's update" and when 9 = Mt when 9 s, [F+ (u) - F (u)], we have (see [49]). Applying the Sherman-Morrison- Woodbury formula (1.3) we obtain the corre- sponding inverse form of (3.2) (3.3) where y = F+ (u) - F (tl), In particular, P = l i- o. AJ-1 and provided that py if F (u) = Ax - b = 0 is a linear function, then it is not hard to see that (3.1) represents the instance of a stationary lvI. In such case, we have the·following iterative method with preconditioner formula to update M-I at the ith iteration (3.4 ) with qi = ri+l - ri, nqi # O. Here, ri, denotes the ith residual of the linear iteration. As in t.he nonlinear case. there are several possible choices for hensive list of choices for which we refer the interested she concludes that ji where Po, PI, ... ,Pi-I = (I - AA'!i-1 )t(I - .4AJi-I)qi form an orthogonal initial guess ia = Xa conj-ugate + AJal Xa. residual reader to [144]. In summary, and ji = AI-t(pi - L~::b(pj, (GCR) method were mutually Pi)Pj), by Pi,i The former is mathematically and to GMRES orthogonal. Both options with the of updates make Broyden's = equiva- The second option coincides with the projected dates studied by Gay and Schnabel [68] and implies the computation if all directions cites a compre- basis for the space spanned 0,1, ... ,j - 1, are two of the best options. lent to the generalized Ii- Yang upas method -!6 to converge in at most n steps. line-search strategy Deuflhard, Freund and Walter [50] incorporate to refine the proper step length, Cl'i, for updating a intermediate residuals and solutions, that is (:3.5 ) This makes the method also competitive with GMRES even for simple choices of Ii such as Ii = l\ifi-tpi (i.e., the "good Broyden's update"). All the above features were absent in the former Broyden's was established to terminate version of Broyden's definition of algorithm within at most 2n steps [67]. Therefore, [25] which an improved method for the linear case looks as follows (we leave open the Id: Algorithm 3.1.1 (Linear Broyden iterative solver) 1. Give an initial guess 2. Compute TO Xa 1\-10'1. and inverse preconditioner = b - Axo. :3. For i = O. 1, ... until convergence do 3.2 qi = Api. "1,-1 (Pi-,'-'ti-1q, )1,1 3.3 •.\/-1 i+l = lv.. j + II q, I 3.4 Cl'i = II{TI. Provided 1 q, that Ifqj . P 'd d h rOVl e t at -'- i qj T a. # o. Note that except for the update in step :3.3 and defining this algorithm ft is a general form of a descent method Ii = qj, for all i for linear systems. = 0,1. .... Eisenstat. 
47 to derive th~. GCR method and other Elman and Schultz [53] use this presentation three closely related methods. Broyden's method for the nonlinear case relies on equations (3.1) and (3.2) above. One of the major virtues of the method consist of finding the minimal solution Frobenius norm ( i.e., IIM+ - in MIIF) over all matrices satisfying the secant equation M+ S =Y = p+ - p. (3.6) For simplicity, we omit the line-search step length determination in the next al- gorithm. Algorithm 3.1.2 (Nonlinear Broyden) 1. Give an initial guess uta) and Jacobian approximation Mo. 2. For k = 0, 1, . , . until convergence do 2.2 Update solution 2.3 2.4 q(k) = F(k+l) JI(k+1) _ = i\lI(k) u(k+1) = ll(k) + s(k). p(k). + (q(kl_M(klsP'l)( slI'l )' It can be shown (see e.g., [49, 79]) that Broyden's superlinearly u(k) # u· to P* = P (u·) = 0 under Assumption method iterates converge q- 1.4.1 and limk_-x, u(k) = lL*, if and only if (:3.7) Condition cornerstone optimization (3.7) is better known as the Dennis-More in proving local q-superlinear (see e.g., [43, 49]). characterization and it is convergence for general secant updates in 48 3.1.2 The family of EN-like methods The family of EN-like methods Eirola-Nevanlinna algorithm in the way the step length, update are specified. efficient algorithms outperform refers to a generalization formerly proposed Qi, in [52]. The generalization and the components Depending given by Yang [144] of the intervening resides within the rank-one on how they are selected, Yang shows that more than Broyden's other well established method are obtained which consequently, may methods such as the GCR algorithm and GMRES. In general terms, the EN algorithm computes directions based on Mi+l rather than on :Hi as depicted in Algorithm 3.1.1. This implies that the algorithm is looking one step ahead compared to Broyden's complexity the EN algorithm approximately method. restarts, truncation EN algorithm is about twice as fast than the other two. in terms of memory management and implicit updates) and computation give additionally The linear version can be described as follows: 3.1.3 (Linear EN iterative solver) 1. Give an initial guess 2. Compute Xa and inverse preconditioner 1\10'1. ra = b - Axa. :3. For i = 0, 1, ... until convergence do 3.2 qi = Api. '33 "1-1 '.' .l~ i+1 :3.4 Pi = -. 1 ,\Ji A1i+\ri. + (p,-lv[,-Iq, f', q,. (through slight advantages [144]. Algorithm of doubles both Broyden's and the GMRES algorithm [.50, 144]. However, the EN algorithm Careful implementations Hence, the computational )1,'. PrOVl 'd e d t hat ft i qi ...i. I a. to the 49 3.6 Qi = This algorithm ri jji: Provided that fttqi • i q. is equivalent Moreover, as the composition of two Broyden's = O. to Algorithm steps 3.4 and 3.5. Q # one iteration 3.1.1 when Pi = Pi and qi = qi in of the EN algorithm iterations: can be regarded one with unity step length (i.e., 1) followed by another one with the optimal choice (3.5) without updating lvli-1 to A-I. approximation In light of the similarities the in the linear case, one can expect that the nonlinear version performs two Jacobian system solutions as suggested by steps 3.1 and :3.4 above. Indeed, Algorithm 3.1.4 (Nonlinear 1. Give an initial guess uta) EN) and Jacobian approximation 1\-'/0. 2. For k = 0,1, ... until convergence do 2.2 q(k) = JI(k+1) F(k+l) = _ plk). AIlk) + (qlkl_M1kls!k))(s!kI)1 (s(k)f Update solution Notice that the direction combination of the direction U(k+l) = computed s!k) u(k) + S(k). 
by the nonlinear delivered by Broyden's EN algorithm method is a linear and an extra direction coming from step 2.4. In fact, it can be shown after some algebraic manipulation u(k+l) = u(k) that + s(k) (3.8) .jO where S(k) = _ (NI(k))-l S(k) = _ (NI(k») -1 p(k), + S(k)) p (U(k) , and provided that U(k+l) = u(k) (S(k)) t S(k) _ (NI(k») =/: O. -1 Furthermore, + elk) [P(k) p (u(k) _ (M(k)) -1 P(k))], for k = 0,1, ... (3.9) The last expression clearly exhibits that the updated solution is formed by combining a Bl'Oyden's step and a damped chord method step. The chord step is defined by fixing the Jacobian (its approximation Kelley presents an updated in this case) for .some iterations. analysis of this method in [79]. Since the angle between the two directions r and magnitude, If l"1lk) = jlk) is defined by II' as 1 = lI';tk) 1- This clearly shows that for mutually chord step is performed. (k) elk) can be reformulated u ,5<k) ,5<k) s cos l' = Ilslk) IIl1 lI(k) and s(k) (s(k) the damping parameter Incidentally, II ~COs1' Ils{klll orthogonal directions s(k) and s(k), a full On the other hand, if both entities are identical in direction then the chord step contribution vanishes. and elk) = 1 for k = 0,1, ... then (3.9) becomes 51 This recurrence represents a higher-order modification generated by (3.10) converge q-superlinearly studied by Shamanskii of Newton's method. Iterates with q-order 3 [105]. These methods were [118] and Traub [129]. They pointed out that even higher- order methods can be built out of a longer sequence of chord steps alternated regular Newton steps. In a more recent treatment, Shamanskii and compares the particular with Kelley names those methods after case (3.10) numerically against Newton's method [79]. Here, we rather adopt the term composite Newton's method when referring to recurrence (3.10). Along the lines of Gay's local convergence analysis for Broyden's was able to show that the NEN algorithm dimensional problems [144]. Therefore, verges twice faster than Broyden's important converges n-step method, Yang q-quadratically for n- as in the linear case, the NEN method con- method. The following theorem summarizes this result. Theorem 3.1.1 Let the standard assumptions Let for any x, yEn Then there exist t, in Assumption = f~ J [y + t (x that Ilu(O) - u*11 ::; ~ Rn F (x) - F (y) 8, CB, CEN > 8 and, for which Broyden's a such 1.4.1 hold. - y)] (x - y) dt. t and IIA1(0) - 1*11 ::; method converges as follows and the NEN method converges as follows Hence, the NEN algorithm the one-dimensional converges q-quadratically as does Newton's method in case. Note that the method reduces to a forward finite difference method in 1-D (144] which is sometimes referred to as Steffensen's method [10.5]. In .52 such case, the above equations give rise to the recurrence (k) S a(k+l) . p(k) =--a(k) = p , + S(k)) (U(k) - p(k) S(k) U(k+l) =U(k) _ p(k) a(k+1) . The first equality provides a systematic way to adjust the step length within the forward finite difference scheme as the iteration progresses. the shorter the step s(k) The steeper the slope a(k) and, vice versa. Moreover, current derivatives are estimated in terms of the previous derivative rather than two consecutive function values as it occurs with the secant method. dimensional It has been proven that the secant method for one- problem converges 2-step q-quadratically we can easily determine that the EN-algorithm and two extra floating point operations A key point can be made. NE~ method is to composite [67]. 
In terms of complexity. requires one extra function evaluation compared to Broyden's method. Broyden's Newton's method is to Newton's method. method what the Hence, it is possible (in fact. not rare in practice) that the NEN method produces faster converging iterates than those of :\ewton's method. especially, when The following example corroborates 1\1(0) and u(a) are sufficiently good. t.he previous observation. The cases shown there will be frequently brought up as the ideas are developed throughout and next chapter. We momentarily look at convergence in terms of nonlinear itera- tions and leave the discussion on computational operations) the present cost (i.e .. in terms of floating point to Chapter 6. Example 3.1.1 vVe consider the extended versions of the Rosenbrock function and Powell function described in Appendix guesses u(O) = (0,1,0.1, ... ,0, l)t and u(a) B of [49] with initial = (0, -1,0, 1,... ,0, -1,0, l)t, .53 Powell Rosenbrock 10° -2 ~ 10 -. ~ -8 10 10° 10-2 ~ 10-· z _. ex: 10 8' 10 - ~ 10-· o ~ 10-· -10 10-10 10 -'2 10 10-12 0 15 10 0 5 10 Iteration Iteration Chandrasekhar ~ z Chandrasekhar [c-.9] 10° 10° 10-2 10-2 ~ 10~ ~ 10-· 10-10 10-12 4 '- \ 6 8 '- \ 10-12 ! 2 [c~.999999] 10~ ~10-· [ 10-10 \ o 20 ~ 10-· o , , , ~10-· 15 \ \ 0 5 Iteration \ 10 15 Iteration Figure 3.1 Convergence comparison of Newton's method (dash-dotted line), Broyden's method (dotted line), the composite Newton's method (dashed line) and the NEN algorithm (solid line) in their exact versions. respectively. We a.lso consider two variants of a more physically sound problem which arises in radiative heat transfer applications by the so-called Chandrasekhar P(u) = H(u) H-equation - 1- and modeled (see [38, 79]): 1 £ r1 tJ.H(~)d~ 2 Ja tJ.+~ = 0, with u E [0. 1] . There are two solutions known for acE one, the problem (0, 1) and, as this value approaches. becomes harder to solve. specifications given in [79]; that is, H (u) the composite midpoint of the H-equation Here, we closely follow the = u, uta) = (0,0,0, rule to discretize the integral. are determined by setting ... ,0, O)t and The two variants c = .9 and c = .999999. For all four different cases we specify 100 unknown variables. Figure 3.1 shows the relative nonlinear residual norms (NRNR) against the number of 54 nonlinear iterations for Broyden's method (dotted line), Newton's method (dash-dotted line), the composite Newton's method (dashed line) and the EN method (solid line). For the first and last method. the initial Jacobian approximation = 1\;[(0) J(a) was defined. The backtracking line-search method was utilized in all methods. In the case of the Rosenbrock function, both Broyden's EN method were unable to generate a descent direction method and the II PI\ for at the first few steps of the process. In such case it was required to reevaluate the .Jacobian by the finite difference approximation. can see that the NEN method employed by Broyden's takes roughly half number the iterations method. This reduction reduction in iterations Newton's method. showed by the composite Newton's method except in the Rosenbrock The NEN method linear residual norms. converging superlinearly However, in all cases we appears surpasses Newton's in .50% the method over to converge faster than case at relative small non- In the remaining cases, the NEN method appears with a q-order between 2 and 3. Again, this trend breaks down in the Rosenbrock case, where also Broyden's method has seriolls difficulties and seems to have a q-order close to unity. 
For small and moderate problem sizes, the exact \'ersion of the composite Newt.on·s and the NEN methods can be efficiently implemented factorization to solve two linear systems with different right hand sides. plies significant savings in pivoting operations evaluations methods. by reusing the underlying and rank-one updates .Jacobian is computationally This im- whereas the total number of functions are reduced due to the faster convergence of both Kelley observes that alternation posite Newt.on is potentially LU attractive of chord steps and Newton's steps in comfor large scale problems where building expensive compared to function evaluations the [79]. Chord 5.5 steps may be a plausible and effective option in the setting of algebraic systems arising from transient problems (i.e., implicit formulation of parabolic equations), tial Newton iterates may be close to the root, particularly where ini- in simulations approaching the steady state (see e.g., [.59]). However, in large scale implementations where linear iterative tually a must, the high efficiency promised by the composite methods are VIr- Newton's and NEN methods fades away on account of the fact that two Jacobian systems must be solved from scratc h. The pitfall is that most iterative Krylov subspace methods) do not offer reusable information in the linear system coefficients. NEN algorithm methods as computationally Consequently, (including some popular in the event of changes this makes an inexact step of the expensive as two steps of an inexact Broyden's method. Fortunately, as we saw in §§2.2 the GMRES algorithm preserves Krylov infor- mation delivered by its intrinsic Arnoldi factorization. However, until now, this in- formation in subsequent has been restricted to build preconditioners GMRES within the inexact Newton's method. performed upon the current underlying utilizations of We show that chord steps can be still Krylov basis. preserve much of the integrity of an inexact nonlinear In this way, we are able to EN algorithm and recover the efficiency that it promises compared to Newton's and Broyden's method. 3.1.3 Inexactness in secant methods The issue of inexactness in quasi-Newton Reference [54] is of particular and Steihaug in [41]. has been examined in [54, 126]. interest since it is shown there that local q-superlinear rate of convergence is still attained results are a generalization methods for the inexact Broyden's method. of the work previously In fact, those developed by Dembo, Eisenstat ·56 Since the same conditions stated in [54] can be also imposed upon the inexact NEN algorithm, iterates. it is straightforward to show that it produces q-superlinearly convergent These conditions are given by . IIAJ(k)s(k) II IIAf(k)s(k) + F(k) k--00 and . + F(k)11 lIm lIm k-oo n/L\11 -, II - F(k+l) " "", = O. Rosenbrock 10° 10-' 10 4 ~ 10-4 0 ~ 10-0 :z ~ 10- c:> - 0 8' 10 ·0 ~ 10- - 10-10 10-12 _10 10 o 5 10 15 10 -,.0 Chandrasekhar [c=.9] 10 Iteration Chandrasekhar 15 20 [c=.999999] 10° 10-' 10-' ~ 10-' ~ 10-4 :z 5 Iteration 10° ~ Powell -. 10° ~ 10- a - :z 10-0 ~ :z. 0 ~ 10- 10-10 ...... , 10-" , 0 , , 10. 10-'° \ 10-12 o 10·" 2 4 Iteration 6 8 o 15 5 Iteration Figure 3.2 Convergenge comparison of Newton's method (dash-dotted line), Broyden's method (dotted line). the composite Newton's method (dashed line) and the NEN algorithm (solid line) in their inexact versions. Clearly, the first condition follows if the forcing terms converge to zero as k -+ ()(). 
The second one suggests that the residual should look like the value of the function at the new point with a discrepancy direction produced for k ~ 00. these conditions hold and u(k) a q-superlinear way. -+ size converging faster to zero than the size of the Eisensat and Steihaug show that whenever both of u·. it follows that the sequence {u(k)} converges m .')/ Rather illuminating than going over the lengthy details of this proof. we consider it more to present the convergence results for the cases exposed in Example :U.1 with G:\[RES solving the .Jacobian equations. Figure 3.2 presents the convergence history when G:\lRES Example 3.1.2 was used as inexact solver of the .Jacobian equation. tracking line-search strategy and the forcing term selection discussed in Chapter 2, The G~[R ES restart parameter and no preconditioning We follow the back- was chosen to be :30, 1]rna.x = .1 \"'as specified. As Figure :3.2 shows there is no ap- parent change in the convergence of the composite Newton's method and \"ewton's method. The secant methods in the number of iterations but without t.hat both ha\'e between each other. or ree\"aluat ion t instead, show a slight increase altering the convergence margin Rarely enough, the inexactness and he .Jacobian were more beneficial to the :'JE~ algorithm in achieving bettf'r convergence rates than \"ewton's method itself for the Rosenbrock fUllction. Table :3.1 and Table :3.2 complement by illustrating the number of G~[RES iterations the iterations for the particular these results and \'allles of case of the C'handrasekhar 1](k) along H-equation wi the = .~)999g9. 3.2 Secant preconditioners Secant procedures nonlinear equations ha\'e been traditionally at a lower cost. conceived as an alternative As happens way to solve with inexact Newton solvers they have been studied to tackle large scale problems. Recently, both methods began to be l'f'garcled more as complf'mentary procedures. than competing position has been mainly driven by ~[artinez Theory support.ing this [89. 90]. Brown. Saad and Walker [21] Table 3.1 Comparison of Broyden's methocl and ~ewton's method soh'ing the the C'handrasekhar H-equation with c = .999999. Broyden R:\R k 1 '2 :3 -! .1 I 2.24e-O 1 8 .-!:'S("-0'2 :3.-17e-0:2 1.l:Se-O:2 :L96e-O:3 1.:")1e-0:3 fi -I 6.2:")e-0·.t. L2le-04 8 1.1Oe-0-! 9 10 :3.1:3e-O.j II 9.05e-06 12 :3.2.Se-06 1:3 9.9Se-07 1-! 1.:").)e-07 1:j .1. 12e-09 16 :2.:j:3e-1O 17 ~.:3ge-12 Table for ~ewt.on '1( k) 1.00e-01 1.00e-0 1 1.00e-01 1.00e-01 1.00e-Ol 1.00e-0 1 LOOe-Ol 1.00e-Ol 1.00e-Ol 1.00e-01 1.00e-01 1.00e-0 1 1.00e-01 1.00e-01 2.88e-02 2.7:Je-O:2 ILl 2 1 1 2 :2 2 2 2 :J :3 :3 :3 :3 :3 :3 :3 R~R I 2.2-!e-Ol 4.98e-0'2 1.44e-02 :3AOe-03 8.:2.5e-04 2.28e-04 .SAge-05 1.2ge-0.5 2.53e-06 2.65e-07 -!.74e-09 -!..54e-l1 1.2ge-l.j TJ{k) 1.00e-01 l.00e-01 1.00e-01 1.00e-Ol 1.00e-0 1 1.00e-01 1.00e-01 1.0Oe-01 1.00e-Ol 1.00e-01 1.6.5e-02 2.72e-03 ILl 2 1 2 2 2 2 :3 :3 :3 :3 3 -! 3.2 Comparison of the ~E:\ and the composite :\"ewton's method for soh'ing t.he the C'handrasekhar H-equat.ion with c = .999999. :\E:\" /,: 1 :2 :~ Il :) 6 I S 9 R:\R 9.16e-02 1.60e-02 2.66e-O:~ -l,O~e-04 ~).oge-a.) :3.l7e-06 6.68e-07 2.8:2e-OS 6.2-!e-10 12.2:3e-11 2.90e-l-! I 10 11 I TIl Compo J.:) 1.00e-0 1 8.S0e-02 l.OOe-Ol 1.00e-O 1 l.OOe-01 1.00e-01 2.66e-0:2 -!.01e-02 1.-!.Se-02 :3..j7e-02 ILI .) :2 2 :2 :3 :3 :3 :3 -! :3 R~R 1.0-!e-0 1 lA6e-02 :2.0:3e-0:3 :3.l-le-O-! 
4.:36e-0.j .j.20e-06 2.·50e-07 2.23e-1O 1.0.5e-l.j I ~e\\'ton q(k) 1.OOe-0 1 1.00e-01 1.00e-01 1.00e-O 1 1.00e-0 1 1.00e-0 1 l.OOe-01 1.93e-02 ILl 1 2 :2 2 :3 :3 :3 -! ·')9 propose multiple secant updates of a given Arnoldi factorization good preconelitioners for G~IRES in subsequent to eventually generate nonlinear iterations. Additionally. a few scattered efforts aim at the possibility of combining secant updates with sparsity preserving methods (i.e .. structured least-change secant updates) for solving .Jacobian systems (e.g .. [..19] and references therein). ~evertheless. integration of secant updates ods is barely in its beginnings. preconditioners into other traditional iterative meth- In our reality. we consider Martinez's theory of secant and Brown. Saad and \Valker ideas on multiple-secant-update ditioners to be the most representative precon- works .. -\t this point. we momentarily digress and revie\\I these two works. The discussion highlights some of the practicallimitations to be taken into consideration in the development that follows in the present and next chapter. 3.2.1 Secant preconditioners One of the most instructive inexact and quasi-\ewton relies on C[uasi-\ewton for inexact Newton points in \lartlnez's methods. methods \\lork is his complementary For the purpose of complying with (2.5) \Iartlnez updates of some approximation to the .Jacobian matrix. t his condition is not fulfilled for a fixed linear tolerance ". t.hen the condition to be satisfied by a standard inexact )iewton iteration step. eventually. steps should predominate and dictate the quasi-Newton view of \[artlnez When is forced argues that the convergence of the method. Consider sol ving (:2.l ) and let .\I(k+l l be the precondi tioner of the .J acobian matrix. J(k+ll JI(k+l) corresponding to the (I.: + l)th ~ewton is required to satisfy the secant equation iteration. The secant preconditioner 60 where .,,(k) = u(k+l) _ u(k). Hence. the secant preconditioner for the inexact ~ewton method can be algorith- mically described as follO\\!s: Algorithm 3.2.1 (\"ewton with secant preconditioner) Let 0 < '1 < 1 nnd Illk) E (0. 'I) such that limk_x, 1. Gi\'e an initial guess 1/(0) and preconditioner ,,(k) = O. then J/(O). 2. For k = O. 1. , . , . until convergence do . 2.1 If (,\I(k) is nonsingular) 2.1.1 Solve JI(k)s(k) = then -F(k), 2.1.2.2 goto step 2.:3. .).) Find s(k) such that IIJ(kls(k) + F(k)11 < 'l(k) IIF(k)11 by some itera- tive method. ',!,l f'pdate preconditionf'r :\ote that this algorithm of the preconditioner. of ./(0) may be quite expensive if no care is taken in the select.ion To make this algorithm efficient step 2.1.2 should be preferably satisfied in ll10st cases. approximation .\1(1.;) by The initial preconditioner and com'eniently expressed .\[(0) is to be chosen as a fair in some factor form (e.g .. block 61 Jacobi. ILU). \Ve remark that step 2.-! should be performed so that the 100\'-rank update strategy can take advantage of the underlying data structure. limited memory compact representations To that purpose could be a reasonable choice [28]. The main drawback of this approach is to carry .Jacobian evaluations updates simultaneously. decrease directions for \Vhen secant updates II F(kl II and secant are doing a good job in generating then an overhead is incurred in evaluating .Ilk) in order to check the condition in step 2.1.2. Conversely. if the secant updates are performing poorly. this overhead is added indefectibly 2.2. One may argue that for the inexact iteration. 
generate ill-conditioned the iterative solver. the operator to the inexact iteration AI(k) suggested by step can be still useful as preconditioner However, there are cases where the secant updates systems and hence they may be inadequate can for accelerating Besides, there is no reason to believe that the updates should eventually generate an operat.or close to t.he true .Jacobian (see [4-9]for discussion on this). In summary. it is hard to find a case when this algorithm \'ersions of both Broyden's In order to guarantee ~Iore characterization ators. outperforms inexact method and Newton's method. J/lk+ll has to satisfy the Dennis11.\!1k+1lll.II(.\!lk+ll) -III should be bounded oper- fast local convergence. (:3.7) and ("nder these assumptions. the potentialities of the abO\"e algorithm can be formal ized as follows: Theorem 3.2.1 Let the assumptions Let ;\ssumpt ion 1.-!.1 also hold. Slk) Proof = s(k) and the convergence above hold for all k Then there exists a k > k = 0.1. .... such that is q-superlinear. o See [89]. The theorem states that there should be an eventual good approximation .Jacobian (or its action onto a descent direction) that insures the condition to the at step and Hm. :'-iotice that this block form (:3.18) implies t.he application of m consecuti\,{' Broyd(,[l'S updates. In terms of floating point operations pensiH' than other standard disa{!\'antage. the method may turn out to be more ex- preconditioners such as block .Jacobi or ILU. The other is that using the identity matrix as an initial approximation be enough to get significant improvement may not on the rate of convergence of G \IRES as shown by Brown. Saad and \Yalker for a rather simple Bratu problem in i-D. In light of (:3.18). something a prohibiti\'e owrhead. lated C\IRES method. iterations more elaborated induces For that problem they observed a three-fold lesser accumucompared with no preconditioning ~Ioreo\"er. the preconditioncr for the entire ~ewton's is built in a least-change where t he secant update preconditioner Laplacian. than this initial approximation secant update format. is combined with a fixed part given by a 1-0 Brown. Saad and \Valker also in\"estigate a "bad" Broyden's update and a hybrid \'ersion leading to similar conclusions. ,\Ithough tlwir pxperiences are limited we consider them inspiring in the sense of ho\\" ~ecant updatf'~ may be used as a de\'ice to exploit underlying Krylov information. TIlt:' k('~' pOint "hall become more e\'ident in the remainder 3.3 of the chapter. Exploiting Krylov basis information The previous discussion associated Illoti\'ates us to take ad\"antage of the Krylov information with .Jtkl or its approximation preconditioners. we restrict the generation in a different way. Rather than building of successive descent directions for IIFII to the current Krylov basis. This implies to perform rank-one updates in the Hessenberg matrix resulting npproximation from the .\rnoldi factorization (2.16) and implicitly of Broyden's llpdate of the .Jacobian matrix. here is to minimize the direct manipulation reproduce an Hence. the main objectiH' of the .Jacobian matrix and the use of G\lRES as much as possible in the process of converging t.o the root of F, );otp that in contrast to \[artinez's approach. we do not perform Jacobian evaluations and spcant updates at the same time. Consider A as an approximation to the current Jacobian with ,.{+ s = F+ - F restricted ested in looking at a minimum change to .4. consistent to the underlying Krylov subspace. matrix J. 
We are inter- A basis for this subspace ing an iterative linear soh-er sllch as G\IRES arises as result of us- for solving the approximated .Jacobian system with A. \Ve quote howeH'r. that the present development algorithm. The Full Orthogonali::ation is not only valid for the G~IRES .\Iethod (FOAl) also known as the Arnoldi iter- ati\'e method [11:3] can be employed for the purposes underlined here. It is important to remark. however. that the GMRES algorithm is still more robust and efficient than this approach [18]. 3.3.1 Updating the Arnoldi factorization In § :2.:2 we discllssed the role that the Arnoldi process plays in G~IRES. t h(' \'f'hide to express t he minimal residual approximation way. The.\ discarded rnoldi factorization at all e\'ery time a G\IRES reflect secant updates hasis. provides \'aluahle on the .Jacobian matrix without For the sake of simplicity. We now show how to altering the current Krylov induced by to converge at a predefined forcing term value). Consider the solution to the following approximated nonlinear iteration over. that should not 1)(' let llS omit the sources of inexactness the use of G\[RES whose relative residuals are supposed tolerance (i.e .. to a prescribed (1.9) in a more manageable information solution starts It is basically Jacobian equation at the kth ()() = A(k) ..,(k) G\IRES algorithm. with m steps of the beclcled in an inexact tained. \ow. method. Krylov subspace we wish to use the information an approximation s~) = s&k) + V(k)y(k) for this problem during can be regarded as be the solution is given by K~) the solution ('111- oh- /'bk)) (:ilk). . of (:3.19) to provide to the system I(k+l) with corresponding t.hat Let gathered .'""\ guarantee ( :3. 1!) ) , This linear solution Broyden's The associated _F(k) Krylov K:!~+l)( basis A(k+1), Arnoldi the h:rylo\' basis. is. That __ - K~)(.-.t(k+l). r~k+l») onto the corresponding ,(k+l) .::; = K!;)( factorization F(k+l) , (:3. 20) Clearly, r~k+1)). .-.t(k) , l'~k)). in general However, we can not rank-one of (:3.19) can be done without updates destroying or f>f(ui\"alently. for allY vectors:;. tv rather than dated by a rank-one A(k). E IRm. Expression :';ote that Before proceeding. In terms of a solution introduce an implicit matrix the current whose range it would .Jacobian equation to express on the Krylov in terms of a clearer way to update approximation lies on K~) (A be cOll\'enient lying strictly secant (:J.2:,n suggests (k), to be up- k )). the secant subspace. A(k+l). rb appears H,(,~') equation Othenvise. To remove this would the shift from the oj origin. we reformulate (:3.19) as 4.(k) S(k) , F(k) __ _ - Ilk) .,lk) "'0 /1. _ - r.lk) 0 , and redefine the final solution as s~l = ~.·(k)y(k), that is, as if the initial guess werf' zero. Obviously. the associated Krylov basis is the same depicted above. Thereforf'. the secant equation (:3.2:3) becomes for s(kl = \Iultiplying \.·lk)y(kl. (~r(k)r both sides by it readily follows that H~k+ll should satisfy the follO\\!ing secant equation H(k+l) m where.3 = Ilr6k)ll. = /Ilk) .~ )! (V(kl + 'Be 10 F(k+l) (:3.26) Hence. the Krylov subspace projected version of the secant equation (:3.:2:n can be written as ((V'(k)r F(k+l) + 3el - (y(kl)t Remark 3.3.1 of par/inl poft (y(k)r y1k) The form (:3.21) has been pre\'iously used in the context fl.5."igIlTnfllf problems ill control theory. 
place a few eigenvalues conforming t he spectrum set of eigenvalues representing technique H~)y(kl) The idea is to re- of a matrix A by another more stable modes within the system. This is applied once the Arnoldi process have deli\'ered Km (A, l.:) as a small invariant subspace under .-\ for a given vector v. Further details and pointers to this problem can be seen in [111]. The following theorem Broyden's update for A(kl: states that update (:3.27) yields a modified version of 68 3.3.1 Theorem corresponding update = _llk+l) Ilk) , = Let (:J.n) be the rank-one ,"'1 + .4(k) + of [P(k) according A(k) + /,(k) F(k+l) update to (:3.21) is given _ (P(k) A(k) P(k)) ,(k)] U ,0; 1 (s(k)) [F(k+l) _ F(k) of H~), then the by (,(k))t .s s(k) _ A(k)s(kl] (s(k))1 + (:3.28) (s(ld)ts(k) [(1- P(k)) + .4.(kl.s(kl) (F(k+l) - A(k)s~k)] (s(k)r (s(k))ts(k) Proof for notational superscripts convenience. ~, + l hy the symbol -- -- I/IF+ ~ let us drop +. Thus. +. J tl the in view of (:3.21) choose H mY -- I:'!F+ + .'3el ~ - k and replace the superscripts - H m Vt S, and I'lt t.1 u:=-= .'/.11 = A + \-.:u..I\·1 .l+ = .1 .1 \.... .' .IJ q ,t I..' ,'\ ' .Il + (\.\'. -- [-'"1- .0; v ... t S + 1 /'1) - \ 'Hm \/ts) 8 .o;t.o; Arnoldi factorization we substitute Hm = Vt.-tV into the above expression. Thus A+ = A + (VV1F+ +"0 - V\/tAVVt.s)st s/s which .~(ld E can be split A.~m (A(k). r6 up in the desired k )) . form (:3.28). ~otice that p(k)S(k) = s(k). since 0 69 We refer to the update (:3.28) as the I\'rylov-Broyden ator ;'\ote that the oper- rb k projector onto the I\:rylov subspace Km (A(k). is an orthogonal p(k) update. )) . That IS • • (P(k)) • (p(k)f 2 = (Idempotency). p(k) (Symmetry). = plk) The update of H~~) reflects an update of larger the value of update. In the closer both updates The following observation Remark 3.3.2 ,4(k) If (k) $0 = on a lower dimensional space. The (:3.28) and (3.30) are to Broyden's provides us with further insights. a then furthermore. \\"11('1'('.·t~+l) is the .Jacobian operator resulting from Broyden's update. This stems from the fact that the third term of (:3.28) is orthogonal ""m'(k) 1 (.I{k) .'1. .ro(k)) . .\ little algebra leads to the following alternative .-\.(k+l) = ..t(k) [(I - + .(Flk+ l) _ F(k) _ .4. (k) (s(k)) P(k)) F(k+l) (s(k)) s(k)) I. form of (:3.28) t . $(") + .-\.(k)$~k) _ h~L.mt·m+l ls(k))1 s(k) (v~s(k))l (s(k))1 to ,\ssliming -"6k) = 0, the above expression tells liS that the departure from Broyden's update does not only depend on the acute angle between the underlying I\rylov subspace but. also on how nearly the columns of of (:L:Hl) F(k+l) \.'(k) ilnd span an invariant subspace of ,11k). Clearly. H!,~+I) is not necessarily an upper Hessenberg matrix. that expression of H~) (:3.:21) can be efficiently performed by updating However. we qllote a given QR form (see e.g .. [.l9. T1]). This form is not readily available, instead most standard implementations of G\IRES progressi"ely compute a QR factorization of~) new column enters the Arnoldi process (recall discussion there are efficient ways to perform the QR factorization the last row of H~,:')already point operations factorization factorized in QR form. in §§ 2.2.2). as en:'ry Fortunately. of H!:) by just deleting This requires 0 (m2) floating (see [T1. pp .. )96-.197]) .. -\n even more efficient way to obtain this consists of keeping an immediate copy of the QR factorization before applying all previous Givens rotations to the new entering column. of H~) In other .words. 
if -(k) Hm-l is the QR factorization and ,.1 = (;=t = Qm-l R'n-I of the augmented -;1 /2)' = (/rn-l ( Rm-l ) 0 Hessenberg at the (m - l)th G\lRES ~tt'p with r E IRrn-1 is the entering column. then the QR factor- ization of H!nk) at the mth GMRES step is given by ( :3.:31 ) In both cases. it IS necessary to use 0 (m2) memory locations for storing the factor Q to keep update 0.21) within a cost of 0 (m2) floating point operations. ,I 3.3.2 On the Krylov-Broyden (:3.28) is the solution Expression liB - min 1!l)n.n A(k)11 F update to the problem subject s(k) = p(k) to (p(k) BP(k)) + r(k)0 p(k+l) . BEJ1'l.. In fact. IIA.lk+l) _ Alk)11 = II [(P(k)BPlkl) (P(k),4(klP(kl) s(k) - F (.s(kl)t (s(k1r s(k)] s(k) II F (:L32) due to the consistency 12-norm lution ( p(k) property of an orthogonal projector follows from t.he convexity B P(k)) ..,(k) = p(k) On the otlwr of the F(k+ 1) Frobenius is bounded norm and to the by 1. Uniqueness above liB - A(kIII of the functional fact over all F that the of the so- B satisfying + r&k) . hane\. it similarly follows that exprpssion (:3,27) is the solution to tilt-' problem GE1R~)(m :3.:3.1 establishes Theorem lems. IIG - FI~k)IIF However. set of matrix other of .1J = F(k+l) .r. the That is. set of matrices generating Gy = (\.'(k)r between - F(q by .:; I = F(k) + Jel. these two minimization (:3.28) can be stated Q = {B E IRnxn and to the equi\'alence \,iew of updat.e quotients subject lI(k+11 as follows. - u(k) Consider defined prob- Q. the by Bs = !J}. the same Krylov subspace Km = Km (A(k). r&k)) . -.) 1- The resulting among these matrix in (:3.:28) can be thought .-t1k+l) the set of nearest matrices two sets is not empt.y, ronst ruet ion of least-change standard secant (t->.g.. sparsity The tion, condition pattern. \'ectors the solution to Q. then Alk+11 E secant updates and other Furthermore, n .l' propert.y to (:3.19) lies on the = ..,(kl/((s(kl(s(kl). equation minimization if the intersection Q. This observation consistent the (see [-!-!. -!6]). subspace On the other (:3.2.1) and having satisfying by a given affine subspace in IRm. However, I\:rylov of is key in the with operators prescribed in IRnxn posit.ive definiteness) with thp secant following .r :; and u: in (:3.21) are arbitrary Fylkl/(ylkl)tylk)) sistent in to Alk'l in .1.' as the nearest we could hand. A(kl since by assump- in pick z finding .t' amounts = = V(k)u' con- V(kl:; to solving the problem (:3.:3:") ) Since the solution implying the n('iH'est .-\.lk+ll to intprpretation In" Dt'llnis is nothing same convergence de\'t>loped is "harmless" behavior for cOn\'ergence tat ions can be pxtended few adaptations. The holds as a consequence Dennis-\lore = - .4.(kls), (\/(kl)t(y then the update in .1' to Q is given by (:3.28). This of all matrices case of the general result established [-!-l]. in pxperimentation of (:3,:W) ,4.(k) by:;. more than a particular and Schnabel Exltausti\'p equality of (:3.:3.5) is gi\'en re\"eals tltat the last in the sense that as Broyden's of Broyden's to show q-superlinear bounded deterioration of the bounded charactt->rization this method. method in its exact convergence property on the right update [n fact. deterioration can be \'erified. IeI'm side of tilt' produces theoretical almost tools already and inexact of update for update of (:3.27). the implemen- (:3.28) with a (:3.:30) or (:3.28) In the same way the ~::":r====:J1 Powell Rosenbrock = ~ ~ 10·· :§> 10·· 1 iJ 5 10 I 10·'0 10-"[o 15 5 10 10 Chandrasekhar J Chandrasekhar [c~. 9} 10 .• 10' 10 Z a: 10 .• o ~10·· ,at g. 
10·· f - _10 10 -1210 J 2 4 -'2 10 8 6 0 5 Figure 3.3 (dotted Example Illethod 3.3.1 (dotted Convergence comparison between Broyden's method line) and the Krylov-Broyden method (solid line). In this example. the com"ergence line) and of the Krylov-Broyden C-OlllP"l't'd for t he ~ame four cases presented llllJl"t' IluTin-'able differences ;t1r1l1l1l!!,hthe I\rylo\'-Broyden it starts delin>rillg !"\"entllally surpasses (not at relati\'e nonlinear residual casp of the Chandrasekhar shown) does not happen duces iterates faster more difficult \'ersion than gets the performance the crossing but again Broyden's of the problem. .. \mong STuck within norms approaching !wrformance (solid line) are all of them. in the case of the Rosenbrock lIlore rapidly equation. method of Broyden's for c = .'L In The former ('qllation nwthod behavior pre\'iollsly are detected t"llllcti"lI "lid tIll-' ('handrasekhar at -;ome point 15 10 \leration iteration rllt' [c=.999999] ~ 10·· ~ 10'· Z 20 ·2 10 a: , 15 \leration Iteration con\'erging 1.0 x 10-10. between both and method In the easy both methods method does at some points. cur\'es region. iterates of Broyden's the Krylo\'-Broyden method a gi\'en one. proIn the look alike but with sev- I'ral crossing points. The case of the extended a case \\"t're the [\:rylov-Broyden identically. rolumns In t his situation. of \ -Iq after was reached). approach that \ ot t' that H~~) implyin~ .Jacobian ,\s it can he obsen·ecl. is no major t he last that operators \evertheless. difference term and Broyden's an in\'ariant four G~[RES may be better. there method Powell function subspace iterations nothing after remains constant. Therefore. of ~-(.\:) span an invariant not only to the current .Jacobian the eige!1\"alues of subsequent implicit obsen'at.ion shall become 3.4 and its Ilsefulness .Jacobians which indicates in general. secant the eigenvalues (i.e .. the clo~er the columns the approximation about several the smaller by the breakdown experimentation both approaches size in approximating perform were generated can be asserted of (:3.21) is preserved t he error method (i.e .. a happy broader between illustra.tes updates of of corresponding the term Ilh~~l.mt'm+lll for A.(k») the better subspace but also the approximation with this approach. more evident in Chapter This to is a key -!. Nonlinear Krylov-EN methods III ,hi ... ""',·riIJn we pl'f~sent two algorithms ~t'IlI'r(\tPd rclati\'e ,ia l ;\IRES nonlinear for the inexact st:'\'f'ral consecuti\'e a descent The first algorithm case and it is based algorithm \\,hethpr as a de,"ice to ~('rlPrate residuals. second is a high order residual for IIFII on only one make lise of the [\:rylov accpptable minimization or a maximum method in Rmxm is exhausted prespecified informatioll for decl'f'i"lsing of the \E\ G~IRES solution problems by G\IRES directions is an extension \·ersion of \ewton's the [\:rylo\' basis produced direction that algorithm per iteration. and amounts (with to soh'ing m ~ and unable The n) until to generatf' user value is exceeded. I .) 3.4.1 The KEN algorithm nonlinear \\'1" are no\\' in a po~ition to describe that exploits the information t he nonlinear an inexact left behind by the G~[RES 3.4.1 (:\onlinear 1. Gi\'e an initial :2. For I, .) 1 -' guess rl~(k) '. ("~lk) '.1 • '2.:2 'Ilk) = (~.;~k) r and .Jacobian con\'prgence h(k) ·m+l.m· H(k)m ,vY'(k) m F(k+I) + _'+ruin ",,~,,) 1'1 Dt'note Je t+ its solution - .. ).- -t~·) _-' mY· Some comments = II (k) + .s( k) . are in order. 
Algorithm 3.4.1 (Nonlinear Krylov-EN)
1. Give an initial guess u^{(0)}.
2. For k = 0, 1, ..., until convergence do
   2.1 [s^{(k)}, y^{(k)}, \tilde{H}_m^{(k)}, V_m^{(k)}, h_{m+1,m}^{(k)}, ||r_m^{(k)}||] = GMRES(A^{(k)}, -F^{(k)}, s^{(k)}).
   2.2 u^{(k+1)} = u^{(k)} + s^{(k)}.
   2.3 Compute F^{(k+1)} and q^{(k+1)} = (V_m^{(k)})^t F^{(k+1)} + beta e_1.
   2.4 \tilde{H}_m^{(k+1)} = \tilde{H}_m^{(k)} + (q^{(k+1)} - \tilde{H}_m^{(k)} y^{(k)}) (y^{(k)})^t / ((y^{(k)})^t y^{(k)}).
   2.5 Solve min_{y in R^m} ||q^{(k+1)} - \tilde{H}_m^{(k+1)} y||; denote its solution by y^{(k+1)}.
   2.6 Perform the Krylov-Broyden update (3.28) of A^{(k)}, with P^{(k)} = V^{(k)} (V^{(k)})^t.
   2.7 s^{(k+1)} = V_m^{(k)} y^{(k+1)}.
3. EndFor

Some comments are in order.

* The Jacobian is not required in explicit form by GMRES. Consequently, the Jacobian approximation A^{(k)} could be addressed by finite differences and its rank-one updates kept in implicit form; no extra storage beyond the Arnoldi factorization is required (the limited memory machinery for these updates is the subject of the next chapter).

* In order to carry out step 2.5 efficiently, the QR factorization of the Hessenberg matrix is updated in the form (3.31), so that each residual minimization delivers the direction for the next GMRES call at an O(m^2) cost. Further discussion on this topic for the particular context of rank-one updates can be found in [49].

* Note also that the update in step 2.4 may give rise to an ill-conditioned Hessenberg matrix, or may not generate a sufficient amount of decrease in the relative nonlinear residual ||F||. Basically, we suggest handling these situations by flushing out the current operators and restarting the process with a new Jacobian, such as in the context of the inexact Newton method.

3.4.2 A higher-order Krylov-Newton method

We now extend the residual minimization idea as long as possible within each nonlinear iteration. In this opportunity, we abandon the simultaneous updates of \tilde{H}_m^{(k)} and A^{(k)} and check instead the inexact Newton condition (2.10) for accepting a higher-order nonlinear update. This presentation also allows us to illustrate a faster version of the nonlinear KEN algorithm that uses the Krylov-Broyden update with less overhead. The point is that the latter requires simultaneous updates of \tilde{H}_m^{(k)} and A^{(k)}, which may readily increase the total number of updates. Of course, this may be a desirable situation in terms of rapid convergence, but it may turn out to be expensive in terms of computer memory use. The algorithm can be outlined as follows.

Algorithm 3.4.2 (Higher-Order Krylov-Newton)
1. Give an initial guess u^{(0)} and define l_max.
2. For k = 0, 1, ..., until convergence do
   2.1 [s^{(k)}, y^{(k)}, \tilde{H}_m^{(k)}, V_m^{(k)}, h_{m+1,m}^{(k)}, ||r_m^{(k)}||] = GMRES(J^{(k)}, -F^{(k)}, s^{(k)}).
   2.2 l = 0.
   2.3 Repeat
       2.3.1 q^{(k+l+1)} = (V_m^{(k)})^t F^{(k+l+1)} + beta e_1.
       2.3.2 \tilde{H}_m^{(k+l+1)} = \tilde{H}_m^{(k+l)} + (q^{(k+l+1)} - \tilde{H}_m^{(k+l)} y^{(k+l)}) (y^{(k+l)})^t / ((y^{(k+l)})^t y^{(k+l)}).
       2.3.3 Solve min_{y in R^m} ||q^{(k+l+1)} - \tilde{H}_m^{(k+l+1)} y||; denote its solution by y^{(k+l+1)}.
       2.3.4 l = l + 1.
   2.4 Until (l = l_max) OR (s^{(k+l)} = V_m^{(k)} y^{(k+l)} is not a decreasing step for ||F^{(k+l)}||).
   2.5 If s^{(k+l)} is a decreasing step for ||F^{(k+l)}|| then
       2.5.1 u^{(k+1)} = u^{(k)} + s^{(k+l)};
   2.6 else
       2.6.1 u^{(k+1)} = u^{(k)} + s^{(k+l-1)}.
3. EndFor

This algorithm can be devised as a variant of the composite Newton's method that seeks chord directions belonging to the underlying Krylov subspace.
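The inner minimization shared by step 2.5 of Algorithm 3.4.1 and step 2.3.3 of Algorithm 3.4.2 is an (m+1) x m least squares problem, sketched below in Python/NumPy. The quantities Hbar, V and beta are assumed to come from a GMRES/Arnoldi routine as above, and the sign conventions follow our reconstruction of the algorithms.

```python
import numpy as np

def krylov_direction(Hbar, V, F_new, beta):
    """Solve min_y || beta*e1 + V^t F_new - Hbar y || and lift the
    minimizer back to R^n as a tentative step s = V[:, :m] y.
    Hbar is the (m+1) x m augmented Hessenberg, V is n x (m+1)."""
    m = Hbar.shape[1]
    q = V.T @ F_new
    q[0] += beta                      # q = V^t F_new + beta*e1
    y, *_ = np.linalg.lstsq(Hbar, q, rcond=None)
    return V[:, :m] @ y, y
```

In practice the least squares solve would reuse the updated QR factors of Hbar, as discussed above, rather than calling a dense routine.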
The faster version of the nonlinear KEN algorithm can be easily stated from the above presentation by just including the Krylov-Broyden update of A^{(k)} within the repeat loop 2.3. This version should be appealing in situations where Broyden's method or the nonlinear EN are effective compared to Newton's method.

Verifying that s^{(k+l)} represents a sufficient decrease for ||F^{(k+l)}|| implies one extra evaluation of F. However, this computation can be reused by a line-search backtracking method following the end of the repeat loop. In general, the failure of this sufficient decrease can be corrected by shortening the step afterwards or by accepting the previous acceptable step, as suggested in step 2.6.1.

The following example illustrates the performance of the last two algorithms presented.

Example 3.4.1 In this particular example, we use the line-search backtracking method with the same parameter specifications used previously. Figure 3.4 shows the relative nonlinear residual history of the nonlinear KEN algorithm (dotted line) and the higher-order Krylov-Newton implementation (solid line). Table 3.3 supports part of the convergence behavior of both approaches. In all cases, GMRES was able to converge within a prespecified restart value of m = 20, a zero initial guess vector and no preconditioning. For the higher-order version of the Krylov-Newton algorithm, we set l_max = 10.

Figure 3.4 Convergence comparison between the nonlinear KEN algorithm (dotted line) and the HOKN algorithm (solid line) for the Rosenbrock, Powell and the two Chandrasekhar cases (relative nonlinear residual norm versus iteration).

Table 3.3 Number of successful Hessenberg updates (NHU) and GMRES iterations (LI) in the HOKN algorithm.

        Rosenbrock      Powell       Chand. (c=.9)   Chand. (c=.999999)
  k     NHU    LI       NHU   LI     NHU    LI       NHU    LI
  1      0     10        10    4      3      3        3      2
  2      0     12        10    4      2      3        3      4
  3      1     12        10    4      2      4       10      6
  4      1     10        10    4                      2      4
  5      1      8        10    4                      4      4
  6      1      6        10    4
  7      2      4
  8      2      4

As was observed before, the Rosenbrock function represents the hardest case. Hence, the algorithms do not show important improvements compared to their Newton's method and Broyden's method counterparts. The plateau exhibited by the KEN algorithm at the first iterations obeys to the difficulties encountered by the Krylov-Broyden update for the same case (see Figure 3.2). Not surprisingly, Table 3.3 confirms the lack of success of the Krylov-Broyden update for the higher-order version of the Krylov-Newton algorithm (the zeros denote the occurrence of backtracking steps at the first two nonlinear cycles).

The Powell function introduces an opposite situation. The solution to the minimal residual problem resulting from every Hessenberg update was always able to generate a decreasing step for ||F||. Consequently, the nonlinear KEN algorithm reproduces almost exactly the behavior of the NEN algorithm, and the higher-order Krylov-Newton (HOKN) algorithm dramatically outperforms the composite Newton's method (with only one GMRES call per nonlinear cycle). It is important to remark that GMRES generates an invariant Krylov subspace under the Jacobian after 4 iterations, to the level of double precision roundoff errors (i.e.,
the residual term in (3.21) was of order 1.0 x 10^{-16}). Note, however, that this does not necessarily imply that the value of the function at the new point belongs to that invariant subspace, as it seems to be the case here.

An intermediate behavior is shown by the Chandrasekhar equation, with a more favorable tendency as the difficulty of the problem increases, though. In the easy case, the higher-order Krylov-Newton method is competitive with the composite Newton's method. The nonlinear KEN algorithm outperforms Broyden's method but is slightly worse than the NEN algorithm. The difficult case delivers similar conclusions to the Powell function case. The reader can verify that each convergence history is qualitatively close to that observed in Figure 3.2. As a final comment, Table 3.3 clearly illustrates that a larger dimension of the Krylov subspace does not mean a longer chain of decreasing directions for ||F|| in step 2.3 of Algorithm 3.4.2.

Before concluding, a couple of points need to be addressed. First, how does one perform line-search globalization strategies and forcing term selection criteria in this context of Krylov-Broyden updates? We have implicitly commented on their practical use throughout the examples without much detailing their implementation. Secondly, what are the effects, if any, on both algorithms due to the use of preconditioners for the Jacobian or its approximations? The reader may have already suspected some implications due to preconditioning, since the update (3.27) does not produce a new unpreconditioned Jacobian to be used in the next nonlinear iteration. Both questions are to be discussed in Chapter 4 in conjunction with three new algorithms based upon the same Krylov-Broyden update philosophy.

Chapter 4

Hybrid Krylov-secant methods

4.1 Hybrid Krylov methods

As an attempt to reduce the amount of work of GMRES and other affine methods, a plethora of hybrid Krylov subspace methods has been proposed. The principle of these methods is to start with an iterative solver that requires no a priori information about the matrix but which itself can produce useful information (such as the spectrum of the matrix). The computation is then switched to a more economical method that requires such information. Research on this subject has been going on for some years [87, 94, 117, 124]. Algorithms along these lines are primarily distinguished by the manner in which the spectrum of the matrix is estimated and passed over to Richardson or Chebyshev iterations. A historical overview of all these algorithms is reported by Nachtigal et al. [94]. This last work is of particular importance since it is the first one to avoid the explicit computation of eigenvalue estimates of the matrix spectrum. Instead, Nachtigal et al. suggest the construction of the GMRES polynomial, whose roots happen to be the reciprocals of relaxation parameters for Richardson iteration. In fact, as we shall see later, these roots are rather approximate pseudo-Ritz values. (The reader may recall the discussion of Chapter 1 and realize that these polynomial roots approximate eigenvalues [65].)

On the other hand, there has been an increasing interest in adapting Krylov iterative methods to handle linear systems with several right hand sides. These problems arise frequently in engineering and scientific applications (see e.g. [33, 61, 123]), and there seems to be no evident way to overcome the need to start the Krylov iterative method from scratch every time a new right hand side becomes available.
Moreover, the Krylov subspace definition (1.1) expresses how tight its dependence is on a particular right hand side (i.e., the initial linear residual). Among several efforts, perhaps the most effective is the one proposed by Simoncini and Gallopoulos [120, 121]. Their work is a direct consequence of that of Nachtigal et al. [94] and therefore does not require explicit estimation of matrix eigenvalues. An appealing point of their approach compared to previous ones (e.g., [33, 104, 119]) is that right hand sides may be either simultaneously or sequentially available during the processing.

The intention of bringing in experiences on multiple right hand sides is not merely casual here. We are interested in reducing the computational cost due to several GMRES iterations. More precisely, GMRES can be replaced by a cheaper Richardson iteration to the end of generating descent directions for ||F||, and at every new nonlinear cycle (i.e., when a new nonlinear residual comes out) we use the underlying spectrum (or pseudospectrum) information to handle the new Jacobian equation. The framework previously developed in § 3.3 fits entirely here: the Krylov-Broyden updates do not destroy the current Krylov basis, and the modifications to the Hessenberg matrix can march along with the modifications to the Jacobian matrix, making the required spectrum estimation available at the next nonlinear iteration.

The present chapter is organized as follows. We complete the current section with some additional motivation and insights about the main ingredients toward the generation of a new family of Hybrid Krylov-Secant (HKS) methods: problems with projecting onto the current Krylov basis, how to handle efficiently the new Jacobian linear system, the spectra versus pseudospectra concept, and the Richardson iteration with Leja ordering as a mechanism to apply effectively its relaxation parameters. In § 2 we describe three new algorithms: the HKS-B, based on Broyden's method; the HKS-N algorithm, based on Newton's method (hence superior to the HKS-B algorithm); and the HKS-EN, based on the nonlinear Eirola-Nevanlinna algorithm. In § 3 and § 4 we address the two questions left pending at the end of Chapter 3, namely, preconditioning and globalization of Krylov-secant methods; the discussion in these two sections therefore revises the KEN and the HOKN algorithms together with the new HKS algorithms. § 5 is devoted to some computational issues in all HKS algorithms: concerns on operation savings via limited memory quasi-Newton representations and complexity analysis are detailed. Before getting into further details, we remark that HKS methods in general provide an infinite menu of alternatives oriented to reduce the computational cost of forthcoming GMRES iterations. In response to this reality, future advances on hybrid methods may be fitted in our framework.

4.1.1 Projection onto the Krylov subspace

In the previous chapter we were able to generate and solve several minimal residual approximation problems out of the Krylov basis generated by GMRES. At that moment, we were looking at the norm of the same nonlinear residual function but with updated versions of the Hessenberg matrix reflecting Krylov-Broyden updates of the current Jacobian matrix. Right now,
it would be desirable to carry out the same procedure even in forthcoming nonlinear cycles and account for the changes of nonlinear residuals, that is, changes of the right hand sides in the linear Newton equation as well. Given that GMRES was used at the kth nonlinear iteration, one would be naturally tempted to solve the following minimal residual approximation problem associated to (3.20):

    min_{y in K_m(A^{(k)}, -F^{(k)})} || (V_m^{(k)})^t F^{(k+1)} + \tilde{H}_m^{(k+1)} y ||,   (4.1)

and take

    s^{(k+1)} = V^{(k)} \bar{y}^{(k+1)},   (4.2)

where \bar{y}^{(k+1)} is the minimizer of (4.1). Here, it is assumed that the initial guess for that problem was s_0^{(k+1)} = 0 and that no restart has taken place. Of course, \tilde{H}_m^{(k+1)} is the augmented Hessenberg matrix. In terms of a normal equation, F^{(k+1)} is projected onto the current Krylov basis by means of the operator

    P^{(k)} = V^{(k)} (V^{(k)})^t.   (4.3)

This approach presents a serious limitation: the quality of the projection depends on how close F^{(k+1)} is to the current Krylov subspace. Ideally, there is nothing to do if F^{(k+1)} already lies there and shares no components with r_0^{(k)} (i.e., F^{(k+1)} in K_m(A^{(k)}, r_0^{(k)})). On the other extreme, if F^{(k+1)} is orthogonal to the current Krylov basis, then s^{(k+1)} = s_0^{(k+1)} = 0, implying that the solution of (4.1) is totally useless. In general, we expect to be between these two extremes, which does not necessarily lead to satisfactory linear residuals; as Saad [110] points out in the context of Lanczos iterations, the projection may be far from sufficient. (A numerical check of this projection quality is sketched at the end of this subsection.) Note that even in the ideal extreme situation, expressions (4.1) and (4.2) indicate that we may need to obtain an even better linear solution in order to guarantee rapid local convergence at the linear tolerance specified for the previous GMRES solution (recall the discussion on forcing terms in § 2.1.2).

Since F is a nonlinear function, the chances of being close to the current Krylov subspace (i.e., having a small angle with respect to that subspace) are minimal. Chances may increase in any of the following situations:

* There is no reasonable progress toward the solution (i.e., we may be far from the region of rapid local convergence),
* The function F is slightly nonlinear (i.e., it is virtually linear or constant), or
* K_m(A^{(k)}, r_0^{(k)}) is (almost) invariant under A^{(k)}.

Certainly, these conditions are unrealistic in practice and, therefore, it is necessary to refine the solution obtained for (4.1) in order to satisfy a predefined linear tolerance.

Further improvements can be drawn if we try to increase the dimension of K_m(A^{(k)}, -F^{(k)}) by incorporating F^{(k+1)}. This implies reorthogonalizing the new residual against all vectors already present in the basis (i.e., all columns of V^{(k)}) and having extra storage to accommodate the new vector. This approach has been suggested by Carnoy and Geradin [34], Parlett [107] and Saad [110] in the context of symmetric problems, but could be equally applicable to nonsymmetric problems under the light of the Arnoldi instead of the Lanczos algorithm. However, we do not follow this direction due to the following reasons:

* It steadily increases storage requirements and floating point operations at the same pace as does GMRES with a higher restart value,
* Reorthogonalization to prevent loss of orthogonality may be a costly issue and,
* There is not much gain in terms of parallel implementations of the method.
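As a concrete form of the projection test mentioned above, the following sketch (Python/NumPy; names ours) measures the fraction of the new right hand side captured by the current Krylov basis, assuming V has orthonormal columns:

```python
import numpy as np

def projection_quality(V, F_new):
    """cos of the angle between F_new and range(V).
    Near 0: the least squares problem (4.1) is useless;
    near 1: the projected solution may be worth refining."""
    c = V.T @ F_new                  # coordinates of the projection
    return np.linalg.norm(c) / np.linalg.norm(F_new)
```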
In summary, the remedy to the problem of efficiently handling the changing values of the nonlinear function F relies on advances along the solution of systems with several right hand sides by Krylov subspace methods such as GMRES. In the following section, we discuss how hybrid Krylov subspace methods could be an appealing solution to this problem.

4.1.2 Reducing the cost of solving the new Jacobian equation

We have seen that projection and exhaustive reorthogonalization are not enough to meet the solution requirements of a new Jacobian equation. The objective now is to refine or replace (by inexpensive means) the solution of (4.1) without affecting the underlying Krylov subspace information. To that end, we suggest using Richardson iteration instead of GMRES and extracting suitable relaxation parameters for it out of the recently updated Arnoldi factorization.

The work of Simoncini and Gallopoulos offers a satisfactory answer to such an objective [120, 121]. This work was motivated by the need to solve iteratively nonsymmetric linear systems with several right hand sides. They propose a version of GMRES (the MHGMRES algorithm) that combines the Arnoldi process with Richardson iteration. The resulting algorithm turns out to be more flexible and efficient than other methods such as block BiCG and BGMRES (see discussions on these two algorithms in [113, 122, 120]), since there is no restriction in the order and availability of the right hand side vectors and the cost of the Arnoldi process is somehow alleviated by cheaper Richardson iterations.

The MHGMRES algorithm has six basic components: the Arnoldi process, projections of linear residuals onto the generated Krylov subspace, computation of generalized eigenvalues (or pseudo-eigenvalues), Leja ordering of relaxation parameters, Richardson iteration and a seed system procedure. The Arnoldi process serves to create a basis for the Krylov subspace associated to the linear system for a given seeded residual. Hence, a solution is provided to the current residual together with the remaining residuals previously projected onto the Krylov subspace. (Projection and corresponding solution of each non-seeded residual have the same flavor depicted by the steps (4.1) and (4.2).)

In order to define a meaningful set of relaxation parameters for the Richardson iteration procedure, the following generalized eigenvalue problem is solved:

    \tilde{H}_m^t \tilde{H}_m z = lambda H_m^t z.   (4.5)

The eigenvalues of (4.5) are roots of the GMRES residual polynomial which, incidentally, have been useful in determining a suitable compact set on which to minimize residual polynomials. To be more precise, level curves (i.e., lemniscates) associated to the GMRES residual polynomials bound regions (not necessarily convex) that potentially exclude the origin, thus removing the possibility of generating nonmonic residual polynomials (zero can not be a root of a residual polynomial, as was discussed in Chapter 1). In the next subsection we elaborate more on this point. Related literature on this topic can be seen in [64, 65, 94].

The Leja ordering provides a stable way to apply the reciprocals of the GMRES residual polynomial roots as the relaxation parameters for Richardson iteration. Basically, these parameters are fed to Richardson as if they were points equally distributed (in the potential sense) in the region of interest. Therefore, Richardson iteration with Leja ordering produces residual polynomials that tend to decrease in norm (i.e., convergence is not as erratic as with other known types of parameter orderings).

The seed system is a heuristic way to provide an effective processing order of the linear residuals. Simoncini and Gallopoulos suggest that residuals should be processed in norm decreasing order. Their choice is motivated by the minimization of those residuals having the maximum distance in norm to the underlying Krylov subspace. However, this distance is not known beforehand, since the projection onto that subspace depends on residuals that have not yet undergone the Arnoldi process. In
points iteration with Leja ordering produces residual polynomials that tend to decrease in norm (i.e .. convergence is not as erratic as in other known types of parameter The seed system is a heuristic the linear residuals. Simoncini processed in norm decreasing orderings). way to provide an effective processing and Gallopoulos suggest that order. Their choice is motivated residuals order of should be by the minimization of t hose residuals having the maximum subspace. distance in norm to the underlying However. this distance is not known beforehand since the projection that subspace depends on residuals that. have not undergone the absence of better information. 1\: rylo\' onto the Arnoldi process. In they assume that all residuals are orthogonal to t he current I~rylo\" subspace in order to come up with the norm decreasing selection criteria. In our particular context. we are mainly interested eralized eigem'alues. the implementation ing. The Arnoldi factorization of Richardson in the computation iteration of gen- and its Leja order- is readily available from the last GMRES solution. \[oreO\'er, it has been modified due to the Broyden update of Hm. Therefore. solution of (.l ..l) in terms of the updated ation parameters approximation projecting associated Hessenberg matrix should provide the suitable relax- to the Krylov-Broyden A(lo;). [n t.he last subsection. t.he \'alue of the nonlinear update we exhibited of the current Jacobian some.of the deficiencies in function at the new point (i.e., the new right. hand side) onto t he current Krylov basis. Although in ideal conditions \'ide a small initial residual norm for Richardson tlw minimal numerical advantage iteration. of this projecting this may pro- \ve argue further below on step .. -\ guess ..,(Hl) = a works nnl:' in practice and it amounts to assuming that the new right hand side is ortllOgonal to the current Krylov basis (i.e .. the hypothetical concerned about a seed mechanism. and Gallopoulos fits naturally worst case). We need not be [n fact. the seed selection heuristics of Simoncini in our case. since the initial guess to the linear system is zero a.nd nonlinear residuals are sUPP9sed to decrease monotonically is approached. as the solution 91 4.1.3 Spectra vs. Pseudospectra In yiew of (L.8). we can realize hO\\" the Richardson upon the shape and size of the region include the origin. otherwise timating or underestimating norms). while the that minimal latter situation residual Q is computed rates method the). mi. numerical satisfying difficulties. (i.e .. slow reduction is a critical In general. polynomial of the method has been subject overes- The former situation may be left out of the set). issue that heavilv the normalization of the residual may lead to divergence polynomial depends (.4) . This region can not than unity on such region. Q introduces may lead to slow conyergence desired polynomial. Q (0) = 1 can ever be smaller condition C enclosing Q C no residual iteration (i.e .. the Therefore. of careful the \vay treatment in the literature. There extreme are several approaches and undesirable be roughly divided in pSPlldospectrum described The division is defined 4.1.1 Q and many above. computing obeys [1:30] and later justified Definition .\( (A). situations in two: approaches the pseudospectrum. by Trefethen to obtaining of them However. the spectrum to the ideas for hybrid the theory on this can and. those computing introduced Krylov may lead to the on pseudospectra methods in [94]. 
Definition 4.1.1 For epsilon >= 0 and A in C^{n x n}, the pseudospectrum Lambda_epsilon(A) of A is given by

    Lambda_epsilon(A) = { z in C : z is an eigenvalue of A + E for some E with ||E||_2 <= epsilon }.   (4.6)

In terms of perturbation of eigenvalues, this definition can be restated as the set of all complex numbers that are eigenvalues of A + E for some perturbation E bounded in l2-norm by epsilon. In practice, the value of epsilon is intended to be relatively smaller than the norm of A but larger than roundoff error.

The introduction of the pseudospectrum framework responds primarily to the high sensitivity associated to spectrum computations, especially when the matrix A is highly non-normal. In this respect, spectrum computations may lead to misleading definitions of Omega; in fact, expression (4.4) does not follow in such a case. On the other hand, it has been found in practice that rough eigenvalue estimations may be more reliable as reciprocals of relaxation parameters than even exact eigenvalues. It turns out that these estimations somehow approach the pseudospectrum of the matrix. It is worth adding, however, that when A is normal the pseudospectrum is the union of balls of radius epsilon around the eigenvalues of A. Hence, lim_{epsilon -> 0} Lambda_epsilon(A) = Lambda(A).

Figure 4.1 Surface and contour plot of the Rosenbrock Jacobian pseudospectra at the first and last nonlinear iteration.

Strategies to compute Omega usually lead to the inclusion of the origin. For instance, the field of values of A (i.e., the set of all Rayleigh quotients {z* A z : z in C^n, ||z|| = 1}) is a convex set at least as large as the convex hull of Lambda(A). Consequently, eigenvalues distributed on either side of the real or imaginary axis may easily define regions that enclose the origin. In this respect, approximations to the pseudospectrum can avoid this situation due to its non-convex defining region feature.

Figure 4.2 Surface and contour plot of the Powell Jacobian pseudospectra at the first and last nonlinear iteration.

Figure 4.3 Surface and contour plot of the Chandrasekhar Jacobian pseudospectra at the first and last nonlinear iteration (easy case).

Figure 4.4 Surface and contour plot of the Chandrasekhar Jacobian pseudospectra at the first and last nonlinear iteration (hard case).

Nachtigal et al. [94] found out that the lemniscate of the GMRES residual polynomial q_m at the level of the attained residual reduction,

    Omega = { z in C : |q_m(z)| <= ||r_m|| / ||r_0|| },   (4.7)

with r_m standing for the mth linear residual of GMRES, effectively excludes the origin.

Computing the pseudospectrum can be computationally demanding. However, in a GMRES environment the previous observation gives clues for defining a fair approximation to it. To that end, we can explicitly compute the GMRES polynomial and take reciprocals of its roots as suitable relaxation parameters of Richardson iteration. This view was given by Nachtigal et al. [94]. A more efficient approach is to consider the GMRES polynomial as a kernel polynomial [64]. The idea is to solve the generalized eigenvalue problem depicted in (4.5) (i.e., obtain the pseudo-Ritz values), where each eigenvalue represents a root of such a kernel polynomial. The generalized eigenvalue problem can be solved via the QZ method [71]. However, direct application of the method may not be that advisable due to the potential ill-conditioning of \tilde{H}_m^t \tilde{H}_m. A more efficient approach is to work with the QR factorization of \tilde{H}_m. Thus, having \tilde{H}_m = QR with Q in R^{(m+1) x m},
we can transform (4.5) into the following equivalent problem

    R z = lambda Q_1^t z,   (4.8)

where Q_1 is the m x m leading principal submatrix of Q. The QZ method can now be applied more efficiently due to the presence of the upper triangular matrix R in (4.8). Besides, this is a much better conditioned problem than (4.5), since there is no need to form the normal equation associated to \tilde{H}_m. Unfortunately, the complexity of the QZ algorithm is of O(m^3) floating point operations, even though roughly half of the operations are saved due to the upper triangular form of R. This complexity is still affordable, however, since m << n and m is small.

Example 4.1.1 Figures 4.1-4.4 show pseudospectra plots of the four cases used for testing the nonlinear solver algorithms. Different values of epsilon (indicated in log10 scale) are associated to the contours and give an idea of the sensitivity of the Jacobian eigenvalues at the first and last nonlinear iteration of each problem. The number of contour levels presented is taken in order to reflect changes in the difficulty of each particular case; three-dimensional surface plots are also presented.

4.1.4 Richardson iteration with Leja ordering

The simplest nonstationary iterative method is the Richardson iteration. Given a set of relaxation parameters tau_i, i = 0, 1, ..., m - 1, associated to a particular linear system Ax = b, it can be stated as follows.

Algorithm 4.1.1 (Richardson iteration)
1. Give an initial guess x_0.
2. For i = 0, 1, ... until convergence do
   2.1 Compute the residual r_i = b - A x_i.
   2.2 Compute the relaxation parameter tau_i.
   2.3 Update the solution x_{i+1} = x_i + tau_i r_i.

The parameters tau_j are applied in a cyclic fashion: tau_j = tau_{j mod m}, where m is the rank of H (each eigenvalue taken into account according to its multiplicity). This cycling is a way of restarting the method. If tau_j is complex, as is to be expected for most nonsymmetric problems, the iteration is carried out in real arithmetic by pairing tau_j with its complex conjugate in two consecutive iterations; that is, the combined step

    x_{j+2} = x_j + (2 Re(tau_j) I - |tau_j|^2 A) r_j

is applied whenever Im(tau_j) != 0.

The Richardson method is a particular case of the more general family of Chebyshev methods. These methods are really attractive when compared to traditional Krylov subspace methods since they are much cheaper computationally: they do not try to expand the Krylov basis with new residual vectors (as we already discussed at the end of §§4.1.1), and they are inner-product free, which makes them even more attractive for parallel implementations [8]. Therefore, whenever Richardson iteration is a suitable choice, the major savings reside in carrying out neither the Gram-Schmidt process nor the storage associated to building up the Krylov basis. Table 4.1 compares the operation counts of both methods in terms of inner products (DOTs), vector updates (AXPYs), matrix-vector products (MVPs) and preconditioner applications (PRECs).

Table 4.1 Operation count comparison between GMRES and Richardson iteration. The value i indicates the number of iterations.

  Algorithm     DOT        AXPY       MVP   PREC
  GMRES         i(i+1)     i(i+1)     i     i
  Richardson    i          2i         i     i
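Algorithm 4.1.1, together with the real-arithmetic pairing of complex parameters, can be sketched as follows (Python/NumPy; the Leja-ordered parameters tau are assumed given, and the stopping test is simplified to a relative residual norm):

```python
import numpy as np

def richardson(A_apply, b, x0, tau, tol, maxit):
    """Richardson iteration x <- x + tau_j r with cyclically reused,
    Leja-ordered parameters. A complex tau_j and its conjugate (assumed
    adjacent in tau) are applied together in real arithmetic:
    x <- x + (2 Re(tau_j) I - |tau_j|^2 A) r."""
    x, m, j = x0.copy(), len(tau), 0
    for _ in range(maxit):
        r = b - A_apply(x)
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, True
        t = tau[j % m]
        if abs(t.imag) > 0:
            x = x + 2 * t.real * r - abs(t) ** 2 * A_apply(r)
            j += 2                   # the conjugate step is implicit
        else:
            x = x + t.real * r
            j += 1
    return x, False
```

The paired step follows from composing the two residual polynomial factors (1 - tau z)(1 - conj(tau) z), so the iterate stays real even for complex parameters.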
This AXPY i"'(i+1 ) !\[VP and Richardson employed by PREC .)x· - I iteration depends is fully discussed heavily upon the order in which by Reichel [109] who proposed the Leja ordering as a robust way to define the parameter sequence for Richarrlsoll. This ordering is also suggested in [9-!] and successfully' used by Simoncini and Callopolllo!' [ 1:20]. The Leja ordering for different \'alues of III1 _....!.. T' J-\ !=o ;-\ It II J = max Tj In algorithmical Algorithm I Jslsm i=Ll Tj is defined by the recurrence T' - ....:1· j = L.~ ..... m-l. (L [()) TI terms. the above recurrence can be expressed as follows. 4.1. 2 (Leja ordering) [nput: ,-\n array p with entries Pi E Co fori = l.~ .... n. Output: l. ql An array q \'lith entries qi E C, fori = 1. 2.... n. = p(~'Iax~[od(lpil. 1 ::; i ::; n)). 2. Complu= TReE. :L For 1= 1. 2 ..... :U [1' (Comple;r n - 1 do = TRVE) or (im (pd # 0) then :L1.2 Cumplc.r=FALSE. :~.~ else :3.2.2 Comple.r=TRI-E. :L:3 end if -!. end for The function ~[ax~[od returns the maximum element index of the set specified in its argument. Similarly as in the Richardson iteration. the procedure takes care of the ')8 complex conjugate number modulus This updatt>~ order of :\ote that and its relative algorithm the same 4.2 pairs. requires only of operations depends distance with respect () (ml) floating required the complex to the rest of the points. point to perform on both operations. In fact. the QR factorization this l~ of rank-one Hm. The family of HKS methods In our context matrix problem. has been \'alue of the Hessenberg obtained after each nonlinear already function matrix updated at the to the .Jacobian information the use of a fair set of relaxation us to switch method Broyden current or any other update. nonlinear is. the next alternates ~ewton GMRES nonlinear Once step. \vith the linear flow of computation the equation and (:3.28). system Richardson In this iteration i~ switched iteration to this way. \ve can use to solve and make possihle iteration. deli\'ered method. the entire to converge Broyden's by a Krylov- and back to the original Therefore. This forces method, has com'erged in order the one would ha\'e to the one governed is soh'eel by G:\IRES. Richardson that for the Richardson method) with associated (e.g .. given by Ne\vton's solution the computation information approximation update parameters a Hessenbt>rg to be used in conjunction Spectrum of the h:rylov-Broyden eigem'alue step has been completed and is ready new point. corresponds by means consistent 1If'\\' the Leja ordering tht' Tltat procedurf' to the nonlinear solution. 4.2.1 The HKS algorithms One of the main lowing algorithm: results on H vbrid Krylov-Secant methods is synthetized in the fol- Algorithm 1. Set 4.2.1 (HKS-B) FALSE. ok = FALSE and k = O. .:·;J,:ipgmrf8 = '2. Gi\'e an init.ial guess a function IL(O). mat ion A. (0) and a tolerance value F(O), a .Jacobian approxi- ,.,(0). :30 Do until con\'ergence = :3.1 If (ok TReE) .)..1 1 C",ompute then F(k) 'J . :3.1.2 If (sJ.-i pgmres A.(k-l) = FALSE) then Perform a Broyden update of Endif of . :3.L.:3 Else Perform a Krylov-Broyden update A(k-1); :302 Else o/.· = TReE: Endif :L:~ [f (..,!.-ipgmrf." :>)..). .J L r(kl l'~ = FALSE) then ..tJ (k) • H(k)m' . p( Iate \'(I~) Ttl • (' :30:3.:3 ·~!.-iP!Jmre8 = TReE: 'J.) .J] -C"'[RES( I." ilk) :1., _F(k) H(k) .)..).,) II·m+l. (k) m··) m' Endif. :L-t Else :3o·Ll [0] = Eig(H). :3.L2 [T] = Leja(f!). :~.4.:3 [..;;(k).ok] = Richardson :3.4...1 ..,J.-ipgmres = (.4.(k). _F(k) .. .,(k).T}(k). T). F.-\LSE: Endif. 
4.2 The family of HKS methods

In our context, the value of the Hessenberg matrix obtained after each nonlinear step has already been updated by means of (3.27) and (3.28), and is therefore consistent with the Krylov-Broyden update of the current Jacobian approximation. The spectrum information delivered by the corresponding generalized eigenvalue problem can thus be used in conjunction with the Leja ordering procedure, making possible the use of a fair set of relaxation parameters for the Richardson iteration. In this way, once a nonlinear step has been completed and a new nonlinear residual is ready at the new point, we can use Richardson iteration to solve the new Jacobian equation. That is, the linear flow of computation is switched from GMRES to the cheaper Richardson iteration and back: one nonlinear iteration solves the Newton equation with GMRES, and the next one solves the system delivered by the Krylov-Broyden update with Richardson iteration. Therefore, the entire nonlinear method alternates between the original method (e.g., Newton's method, Broyden's method or any other) and the one given by the Krylov-Broyden update.

4.2.1 The HKS algorithms

One of the main results on hybrid Krylov-secant methods is synthetized in the following algorithm:

Algorithm 4.2.1 (HKS-B)
1. Set skipgmres = FALSE, ok = FALSE and k = 0.
2. Give an initial guess u^{(0)}, a function F^{(0)}, a Jacobian approximation A^{(0)} and a tolerance value eta^{(0)}.
3. Do until convergence
   3.1 If (ok = TRUE) then
       3.1.1 Compute F^{(k)}.
       3.1.2 If (skipgmres = FALSE) then
             Perform a Broyden update of A^{(k-1)};
           Else
             Perform a Krylov-Broyden update of A^{(k-1)};
           Endif.
   3.2 Else ok = TRUE; Endif.
   3.3 If (skipgmres = FALSE) then
       3.3.1 [s^{(k)}, y^{(k)}, H_m^{(k)}, V_m^{(k)}, h_{m+1,m}^{(k)}, ||r_m^{(k)}||] = GMRES(A^{(k)}, -F^{(k)}, s^{(k)}).
       3.3.2 Update P^{(k)} = V_m^{(k)} (V_m^{(k)})^t.
       3.3.3 skipgmres = TRUE;
   3.4 Else
       3.4.1 [Omega] = Eig(H).
       3.4.2 [tau] = Leja(Omega).
       3.4.3 [s^{(k)}, ok] = Richardson(A^{(k)}, -F^{(k)}, s^{(k)}, eta^{(k)}, tau).
       3.4.4 skipgmres = FALSE;
       Endif.
   3.5 If (ok = TRUE) then
       3.5.1 [lambda^{(k)}, eta^{(k+1)}] = Backtracking(A^{(k)}, F^{(k)}, s^{(k)}, u^{(k)}, eta^{(k)}).
       3.5.2 u^{(k+1)} = u^{(k)} + lambda^{(k)} s^{(k)}.
       3.5.3 k = k + 1;
       Endif.
4. Enddo

The above algorithm is written in terms of Broyden's method. That is, an approximation to the Jacobian is provided by means of Broyden updates (see step 3.1.2). With some slight modifications, other instructive and important nonlinear solvers can be obtained as well. From now on, we shall refer to the above algorithm as the HKS-B algorithm.

Several comments are in order. The boolean variables skipgmres and ok are flags to control the use of GMRES and the Richardson iteration. The variable k is the nonlinear iteration counter. Step 2 leaves open the option to approximate the Jacobian by the most suitable procedure available (e.g., to evaluate it exactly or by finite differences, if the former is either computationally expensive or unavailable). At that step we also define an initial linear tolerance eta^{(0)}, which is dynamically adjusted (in accordance to (2.6) and (2.7)) by the backtracking procedure in step 3.5.1.

The nonlinear loop starts with the GMRES solution of the Newton equation. Then, all elements belonging to the Arnoldi factorization and the least squares solution of the minimal approximation problem are retrieved to provide the relaxation parameters for Richardson iteration. Since the rank-one update is performed only once for a given Hessenberg matrix, we do not consider checking its possible singularity or ill-conditioning. Since the variable ok is always true after GMRES (and after a successful Richardson iteration), the backtracking procedure proceeds to verify and correct the step length. This globalization procedure also computes a new linear tolerance eta^{(k+1)} according to the agreement between the nonlinear function F and its linear model. If backtracking steps take place, a fraction of the nonlinear step is taken; otherwise, the full step is accepted to improve the solution so far.

The value of skipgmres in step 3.1.2 allows the selection between the two types of updates. Clearly, the Krylov-Broyden update follows the Broyden update of the Hessenberg matrix in the previous iteration. In §§4.3.1 we shall devote some discussion to compactly collecting these updates for the sake of efficiency.

Steps 3.4.1 and 3.4.2 summarize the discussion given in §§4.1.4. The variable Omega holds the list of eigenvalues associated to (4.5) (i.e., the roots of the GMRES polynomial). These eigenvalues are then reordered and assigned to the list tau by the Leja ordering described in Algorithm 4.1.2. If Richardson iteration fails, we recommend here to reuse the operators constructed so far and resume the computation with GMRES. Note that the variable ok determines whether to avoid the backtracking procedure, the recomputation of the Jacobian matrix and the evaluation of the function at the current point. Another possibility (which, in fact, is used in our implementations) is to flush out the current Jacobian matrix and reevaluate it. If this is not expensive compared to the rest of the computation, it may be advantageous to take a better solution step than that provided by a Krylov-Broyden update.
We remark, however, that a failure of Richardson iteration does not necessarily mean that a poor Krylov-Broyden update has occurred. The Richardson method can fail due to a poor definition of the set where the residual polynomial is minimized. We already know that indefiniteness (eigenvalues located on both sides of the complex plane) or eigenvalues close to the origin cause severe problems to the method. In this regard, a good preconditioning strategy should remove these undesirable situations (even for an efficient utilization of GMRES).

Figure 4.6 Convergence of the HKS-N algorithm for the Rosenbrock, Powell and the two Chandrasekhar cases (relative nonlinear residual norm versus iteration).

Richardson iteration fails in the Powell singular function case, although this is the most favorable case for the KEN and the HOKN algorithms. In the absence of preconditioning, eigenvalues located on either side of the complex plane make the Richardson iteration diverge at all nonlinear steps (indicated by the absence of circles on the curves).

The Chandrasekhar H-equation problem is an illustrative example of how the distribution of eigenvalues plays a decisive role in the success of Richardson iteration (see Figures 4.3 and 4.4). All Jacobian matrices are positive stable for both the easy and the hard case. However, as the nonlinear solution is approached, the eigenvalues tend to spread out and to be closer to the origin. This situation is more pronounced in the hard case, which explains the success of HKS methods only in the first half of the total number of nonlinear iterations. The major clustering of eigenvalues in the easier case was certainly beneficial to Richardson iteration.

The other instructive point is the deterioration in convergence of all methods (particularly for the HKS-B algorithm) due to the switching with the Krylov-Broyden update. Note that switching between Newton's method and Broyden's method is implicitly depicted in Algorithm 3.2.1. The difference here is that we systematically force a Krylov-Broyden update to take place in every even nonlinear step (i.e., after the method has computed the new nonlinear direction at the odd step by a GMRES solution). Moreover, the already observed quality of the Krylov-Broyden update, and its direct application to the current Jacobian rather than to an approximation of it, provides reasons to believe that this approach is by far more appealing than the one suggested by Algorithm 3.2.1. In other words, there are more chances to succeed with a Krylov-Broyden step overtaking an inexact Newton's method step than with a Broyden's step (recall the discussion in §§3.2.1).

Unfortunately, there is no guarantee that the solution obtained by the Richardson iteration lies on the current Krylov basis K_m. This truncates the formal possibility of creating a long chain of repeated Richardson solutions before returning (if ever required) to GMRES. The basic question is how to update the Hessenberg matrix and keep the update consistent with the Arnoldi decomposition. One way to retain the use of Richardson iteration is by simply taking its solution vector s and computing the vector y that solves min_{q in R^m} ||V q - s||, neglecting, of course, any shift from the Krylov subspace that one has in hand. The vector y is then used to update the Hessenberg matrix, and relaxation parameters are computed again. This vector is nothing but the least squares solution of the stated problem and, thus, Vy represents the closest vector in K_m to the current step s. As an approximation, we fall into the same issues discussed in §§4.1.1 and, therefore, this approach may turn out to be insufficient in many situations. However, it is worth mentioning that this approach was reported in [83] with reasonably good results in practice.
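In the orthonormal basis delivered by Arnoldi, the least squares problem min_{q in R^m} ||V q - s|| has the closed form q = V^t s, so the projection just described is a couple of thin products (sketch; names ours):

```python
import numpy as np

def project_back(V, s):
    """Closest vector in K_m = range(V) to the Richardson solution s,
    plus the discarded off-subspace component."""
    y = V.T @ s        # least squares solution of min ||V y - s||
    s_in = V @ y       # component of s inside the Krylov subspace
    return y, s_in, s - s_in
```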
One way to rf'tain is by simply taking its solution vector s and doing of course, any shift from the I\:rylov subspace Then the vector y is used to update of Hessenberg matrix that one has in and relaxation parameters are computed again. This vector is nothing but the least squares solution of llIinqelR"' IIVy - ·~iland step $. thus. Vy represents the closest vector in [\m to the current As an approximation in §§ ·U.1. and therefore. to t he problem \ve fall into the same issues discussed this approach may result insufficient in many situat ions. Howewr. it is worth to mention that this approach was reported in [8:3] with reason- IOf) ably good results in practice. [n this dissertation. however. this idea is not. pursued any further. Of course. a refinement to the ahove approach can be achieved by projecting the Richardson solution vect.or and increasing the dimension of the Krylov basis hy including it. However. t his introduces reorthogonalization. additional which were also discussed in §§ 4.1.1. The good news is that in many nonlinear matrices may not change dramatically linear solution. parameters. difficulties such as those due to the spectrum of .Jacobian during the course of converging to the non- This opens the opportunity Hence. the nonlinear problems, to further reuse the current relaxation method may rely on Richardson iterations until its convergence det erioril.tes significantly (or breaks down) in which case. G~[RES is invoked once again. This idea deserves a separate and exhaustive treatment and goes heyond t he scope of the present work. 4.2.2 The role of preconditioning So far. our algorithms ('lIIplo.\'('(1 in Krylov-secant methods have been described under the assumption a preconditioner that we have not in G~IRES. It is not difficult to realize that.. if a precondi- tioner is used. the update (:3.'27) rather than reAecting a secant update of the .Jacohi<lll. reHects a secant update of the .Jacobian times t.he in\'Prse of its preconditioner assuming right preconditioning). [n other words. given .\I(kl as the preconditioner. (:3. '28) \Vould become (._\.\l_l)(k+lJ = .-t(k) (.\f-l)(k) + p(k) k (F(k+l) + r6 ) - (S'(kl)t where (i.t' .. l .-t(k)s(k ) .S'(k) (J(k)r 10 'j This means parameters fectively that the spectrum information are based corresponds .-\Igorithm to the form above. 4.2.1 we should with its preconditioner on which ensure are somehow the Richardson consistent in order t.o apply ef- Therefore. that the .Jacobian relaxat.ion operator with the associated toget.her J(k+l) Richardson relax- at ion parameters. There are three form update iteration possible ways to overcome (4.11) and carry out t he matrix in terms of (.-U/-1 )(k+l) consistent. with the preconditioned to soh'ing the following this vector This certainly . Richardson preconditioned problem. products makes iteration. .Jacobian system Firstly. within we may perthe Richardson the relaxation parameters This approach is equi\'alent by Richardson iteration (4.l2) Clearly. in order preconditioning to obtain effect embedded a meaningful in the operator is no explicit form of the preconditioner One possible approximat.ion (,\/-1 )(k) (k+l) = (.\I-l) to t.he problem (k) + step (.4..\/-1 we need to remove Unfortunately. )(k+l). leading to t.he right llnpreconditioned by means of the Sherman-:\lorrison- (,\I-I) nonlinear [s(1.-) - is to perform Woodbury there solution. the [,rylov-Sroyden formula. (,\[-1 )(k) 'Ilk)] (8(kl)/ (s(kl)/ (.\[-1 }(kl q(kl that t.he update is. 
    (M^{-1})^{(k+1)} = (M^{-1})^{(k)} + [ s^{(k)} - (M^{-1})^{(k)} q^{(k)} ] (s^{(k)})^t (M^{-1})^{(k)} / ( (s^{(k)})^t (M^{-1})^{(k)} q^{(k)} ).   (4.13)

The second possibility is thus to compute (M^{-1})^{(k+1)} and apply it to the solution delivered at convergence by the preconditioned Richardson iteration. This amounts to the solution of the linear system

    M^{(k+1)} s^{(k+1)} = z^{(k+1)}.   (4.14)

Note that the solution of this linear system does not necessarily represent a solution obtained from a Krylov-Broyden update. Moreover, the operator (A M^{-1})^{(k+1)} may introduce a significant overhead in the implementation of the future updates and in the computation of a globalization strategy, where the Jacobian (or an approximation of it) is required. Consequently, its manipulation may cause misleading situations where even rapidly convergent Richardson iterants for solving (4.14) lead to poor nonlinear steps (i.e., insufficient in producing a descent direction for ||F||).

The following theorem provides an upper bound for this approximation with respect to the Krylov-Broyden update of the Jacobian. For notational simplicity (as in the proof of Theorem 3.3.1), we drop the superscript k and adopt the conventional + sign to indicate the operators updated by the Krylov-Broyden update.

Theorem 4.2.1 Let the Krylov-Broyden update of A M^{-1} be given by (4.11). Also, let the Krylov-Broyden update of both A and M be given by the formula (3.28). Then

    ||(A M^{-1})^+ M^+ - A^+|| <= (||I - A M^{-1}|| / ||s||) (||q - As|| + kappa(M) ||A|| ||s||) + (||q - As|| / ||s||) (1 + ||q||) kappa(M),   (4.15)

where q = F^+ + r_0, kappa(M) = ||M|| ||M^{-1}||, and provided that ||s|| != 0.

Proof A simple algebraic manipulation yields the following expression:

    (A M^{-1})^+ M^+ = A + A M^{-1} P (q - Ms) s^t / (s^t s) + P (q - As) (Ms)^t M / ((Ms)^t Ms) + P (q - As) (Ms)^t P (q - Ms) s^t / ( (Ms)^t Ms  s^t s ).

In this development we have used the fact that (Ms)^t P = (Ms)^t (i.e., Ms belongs to the Krylov subspace) in order to split the product of the two rank-one terms appearing at the right hand side of the above expression. Subtracting A^+ = A + P (q - As) s^t / (s^t s), taking norms on both sides, and noting that ||P|| <= 1 and ||Ms|| >= ||s|| / ||M^{-1}||, the bound (4.15) follows. #

Remark 4.2.1 Note that the first term at the right hand side of (4.15) vanishes if M = A, leaving us with the sharp relative bound

    ||M^+ A^+ - A^+|| <= (||q - As|| / ||s||) (1 + ||q||) kappa(M).

Moreover, a relative error bound can be obtained directly by working with the Krylov-Broyden update I^+ of the identity matrix,

    ||I^+ A^+ - A^+|| / ||A^+|| <= ||q - As|| / (||A^+|| ||s||),

since I^+ is a rank-one perturbation of the identity matrix which, by itself, perturbs the Krylov-Broyden update of A.

The last alternative is simply to keep the preconditioner fixed, so that the preconditioner does not evolve in agreement with the undergoing Krylov-Broyden updates of each Jacobian matrix.
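Before moving on, note that the inverse update (4.13) used by the second alternative is a single Sherman-Morrison correction. A minimal sketch (Python/NumPy; a dense inverse is kept purely for illustration, in practice (M^{-1}) would be an operator):

```python
import numpy as np

def smw_inverse_update(Minv, s, q):
    """Broyden update of M expressed through its inverse:
    M+ = M + (q - M s) s^t / (s^t s)   <=>
    Minv+ = Minv + (s - Minv q) (s^t Minv) / (s^t Minv q)."""
    w = Minv @ q
    return Minv + np.outer(s - w, s @ Minv) / (s @ w)
```

The denominator s^t M^{-1} q is the Sherman-Morrison pivot; if it is close to zero the update should be skipped, in line with the usual safeguards for rank-one secant corrections.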
In view of this approximation and of the particular case of the nonlinear KEN and the HOKN algorithms, the incorporation of preconditioning forces us to work with (A M^{-1})^{(k+1)} and M^{(k)}. It is clear that the solution of the minimal residual approximation problems (3.36) and (3.37) is then referred to the linear system (4.12), with the exception that the value of the nonlinear function is taken at the kth step. Although this may imply the adoption of some of the potential difficulties already discussed for the first approach, fortunately this does not occur here. In order to perform globalization strategies we do not need the explicit form of the Jacobian matrix, nor is there any purpose in updating the Jacobian by a Krylov-Broyden update. There is an even much stronger reason: the failure of the Krylov-Broyden step (i.e., the one delivered by the least squares solution of the preconditioned problems (3.36) and (3.37)) compromises much less possibly unwasted computation than in the HKS-B and the HKS-EN algorithms. The key observation stems from the fact that the KEN and HOKN algorithms do not imply recomputation of the Jacobian and its preconditioner if the improved step fails. Thus, fixing the preconditioner is a suitable approach for a computation that does not require direct manipulation of the Jacobian matrix; obviously, any attempt to update the preconditioner introduces a relatively high overhead to such a computation. Furthermore, the following theorem shows that the best approach in all Krylov-secant algorithms is to keep the preconditioner fixed.

Theorem 4.2.2 Let the Krylov-Broyden update of A M^{-1} be given by (4.11). Also, let M be a preconditioner for A and let the Krylov-Broyden update of A be given by the formula (3.28). Then

    ||(A M^{-1})^+ M - A^+|| <= (||q - As|| / ||s||) kappa(M),   (4.18)

where q = F^+ + r_0, kappa(M) = ||M|| ||M^{-1}||, and provided that ||s|| != 0.
Proof It easily follows that

    (A M^{-1})^+ M - A^+ = P (q - As) [ (Ms)^t M / ((Ms)^t Ms) - s^t / (s^t s) ].   (4.19)

Taking the l2-norm on both sides, it results

    ||(A M^{-1})^+ M - A^+|| <= ||P|| ||q - As|| ||M|| ||M^{-1}|| / ||s|| <= (||q - As|| / ||s||) kappa(M). #

The result above applies directly to the nonlinear KEN and the HOKN algorithms. To characterize the difference between the new preconditioned Jacobian matrix and that implicitly associated to the Richardson relaxation parameters in the HKS-B and HKS-EN algorithms, we again have that

    ||(A M^{-1})^+ - A^+ M^{-1}|| <= ||(A M^{-1})^+ M - A^+|| ||M^{-1}|| <= (||q - As|| / ||s||) kappa(M) ||M^{-1}||.

The entire discussion boils down to realizing that, rather than updating the preconditioner (or enhancing its quality), we should better let the preconditioned operator change in close correspondence to (3.28). Therefore, the performance of preconditioned Krylov-secant methods is dictated by how close the combined updated operator (A M^{-1})^{(k+1)} is to the identity matrix. Obviously, maintaining these operators fixed does not prevent the use of the best preconditioning strategy each time the GMRES iterative solver is required. The following example illustrates much more clearly all the above discussion.

Example 4.2.2 Figures 4.7-4.10 show pseudospectra plots for the extended Rosenbrock function, the extended Powell function and the two cases of the Chandrasekhar H-equation. In every figure, subplots associated to A M^{-1}, A^+ M^{-1}, A^+ (M^{-1})^+ and (A M^{-1})^+ are presented. All plots are generated in terms of the first and second nonlinear iteration. A tridiagonal preconditioner was employed in all cases. The reader can observe the close pseudospectra similarity shown by the operators A^+ M^{-1} and (A M^{-1})^+ in all problem cases, which confirms the result established in Theorem 4.2.2. The Rosenbrock function case perfectly illustrates how the Krylov-Broyden updates may cause a certain quality deterioration of the preconditioned Jacobian. In this particular case, A^+ (M^{-1})^+ presents a slightly better condition number than both A^+ M^{-1} and (A M^{-1})^+. However, this situation does not always hold, as the Powell singular function subplots indicate. Note that one conjugate eigenvalue pair of A^+ (M^{-1})^+ would be out of the convex hull (i.e., the GMRES lemniscate) associated to (A M^{-1})^+. This may negatively affect Richardson's rate of convergence, as was discussed in §§4.1.3. The Chandrasekhar H-equation shows that one may obtain a better conditioned matrix A^+ (M^{-1})^+ than A M^{-1}. This trend is emphasized from the easiest to the most difficult case of this nonlinear integral equation.

Figure 4.7 Pseudospectra of preconditioned Jacobian matrices for the extended Rosenbrock function. Upper left corner: A M^{-1}; upper right corner: A^+ M^{-1}; lower left corner: A^+ (M^{-1})^+ and, lower right corner: (A M^{-1})^+.

Figure 4.8 Pseudospectra of preconditioned Jacobian matrices for the Powell singular function. Upper left corner: A M^{-1}; upper right corner: A^+ M^{-1}; lower left corner: A^+ (M^{-1})^+ and, lower right corner: (A M^{-1})^+.

Figure 4.9 Pseudospectra of preconditioned Jacobian matrices for the easy case of the Chandrasekhar H-equation. Upper left corner: A M^{-1}; upper right corner: A^+ M^{-1}; lower left corner: A^+ (M^{-1})^+ and, lower right corner: (A M^{-1})^+.

Figure 4.10 Pseudospectra of preconditioned Jacobian matrices for the hard case of the Chandrasekhar H-equation. Upper left corner: A M^{-1}; upper right corner: A^+ M^{-1}; lower left corner: A^+ (M^{-1})^+ and, lower right corner: (A M^{-1})^+.

Figure 4.11 Convergence comparison between the HKS-Broyden (dashed-dotted line) and the HKS-EN (solid line) algorithms with tridiagonal preconditioning.

Figure 4.12 Convergence of the HKS-N algorithm with tridiagonal preconditioning.
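Plots of this kind can be reproduced from the basic grid evaluation implied by Definition 4.1.1, using the standard equivalent characterization Lambda_epsilon(A) = { z : sigma_min(z I - A) <= epsilon }. A sketch, adequate only for small dense matrices:

```python
import numpy as np

def pseudospectrum_levels(A, re, im):
    """sigma_min(z I - A) on a grid of complex points; a contour at
    level eps bounds the eps-pseudospectrum (log10 levels as in the
    figures above)."""
    n = A.shape[0]
    sig = np.empty((len(im), len(re)))
    for i, y in enumerate(im):
        for j, x in enumerate(re):
            z = (x + 1j * y) * np.eye(n) - A
            sig[i, j] = np.linalg.svd(z, compute_uv=False)[-1]
    return np.log10(sig)
```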
to- °L 1 10 Iteration 10" ~ 10" ~'0"1 - 5 Chandrasekhar [c=.999999] 100 , 10 ~ Powell 100 10° 0 10-10 10-12 2 4 Iteration 6 B 0 5 10 Ileration Figure 4.12 Convergence of the HKS-N algorithm with tridiagonal preconditioning. 15 Since the procedure to introduce preconditioning established. in the HKS algorithms has Iwen it is now convenient to show its performance Example 4.2.3 and 4.2.2. This example complements Clearly. the introduction the success of the Richardson in practice. results of Examples -l.2.1 of preconditioning iteration helps notably in in all HKS algorithms (indicated by circles on?r the curves depicted in Figures -l.ll and -l.12). The Powell function is a particularly instructive case. The previous inclusion of the origin in the domain of minimal residual polynomials by preconditioning here but also A+.H-1 does a good job of approximat- ing both 04.\1-1 and (AJI-1)+ succeeds in every turn. in generating nonlinear . Consequently. the Richardson ::\ote that the more Richardson a descent. direction for iterations is not only removed IIFII. iteration succeeds the greater the total number of is. This reflects the fact that Krylov-Broyden are not as good as ::\ewton's steps. \\'e shall later address in §§6.1.1 The few Richardson Rosenbrock case and the C'handrasekhar iIlcrease of "'.(.\!) at those !lonl i l1t'ar steps \Ve stress this does not necessarily mean an increase (of the same order) in the overall computing 4.2.3 iteration cost as failures (as in the easy case) is due to a noticeahle sl f'pS. Globalization We ha\'e already discussed ill §§2.1.:3 t hat a globalization strategy is necessary t.o pre\"ent possible mo\'(-'ments away from the solution or even divergence of t.he nonlinear procedure. This is a consequence of the poor approximation of eq1\at ions produced by a linearization the solution. In that opportunity. of the nonlinear system by Taylor series whenever a point is far from we argued that manipulation of .Jacobians at the it y current point are required to both carry out parabolic line-searches and computation of t he forcing terms. [n our particular context. the Krylov-Broyden ~ociated quasi-:-.rewton directions. that is to be a direction of decrease IIF!I. for $(1.:) update delivers systems whose as- (.4.(k») = - the f'xact .Jacobian matrix at the current point. customary globalization Since our secant methods eration to implement. Eisenstat tolerances dynamically rarely occurs and therefore. it IS as if they were exact for the purpose are inherently inexact. we also adopt the last consid- and \Valker ideas [.56]. That is. we compute as if we were dealing with an inexact linear Newton method. Of to the .Jacobian are sufficiently good to avoid possible breakdowns of the line-search backtracking additional explicit knowledge of strategies (see [79]). course. we assume that approximations The incorporation without On This is a common problem of any In pract ice, howe\·er. t his situation to handle .Jacobian approximations of implementing are not guaranteed In fact. they may fail to satisfy (2 ..)). top of that. there is no \vay to verify this situation Sf'cant method. -1 F(I.:), strategy. of line-search in the HKS-B and the HKS-~ does not bring any considerations to the discussion in §§2.t.:3 since a .Jacobian approximation is ['t'i1dily a\'ailable via the E:rylov-Broyden update. IIowt.'\·er. an important. 
However, an important note can be brought up in relation to the directions obtained from the least squares solution of the minimal residual approximation (3.36) in the HKS-EN, nonlinear KEN and HOKN algorithms. The point is that the least squares solution of (3.36) does not require the explicit knowledge of the approximated Jacobian matrix. In order to develop an efficient line-search backtracking and forcing term criterion selection, we should try to avoid its explicit use as well. Not surprisingly, the underlying Arnoldi factorization, which was presented in §§2.2.5, provides this possibility in advance.

Hence, in view of the solution to (3.36), we can infer from §§2.2.5 that only the following two quantities involving $A^{+}$ are required: $\|A^{+}s\|$ and $(F, A^{+}s)$. Both remain available through the Arnoldi factorization: since the GMRES residual is $r_{m} = V_{m+1}\left(\beta e_{1} - \bar{H}_{m}y_{m}\right)$ and $A^{+}s = -F - r_{m}$, it follows that

$$(F, A^{+}s) = -\|F\|^{2} - \left(F,\; V_{m+1}\left(\beta e_{1} - \bar{H}_{m}y_{m}\right)\right), \qquad \|A^{+}s\|^{2} = \|F\|^{2} + 2\,(F, r_{m}) + \|r_{m}\|^{2}.$$

The complexity for computing the inner product $\left(F, V_{m+1}\left(\beta e_{1} - \bar{H}_{m}y_{m}\right)\right)$ is $O(n + nm + m^{2})$, which, for small values of $m$, may be preferable to computing $A^{+}s$ directly. These expressions are also valid for any number of consecutive updates in the HOKN algorithm. Since we are using right preconditioning, the expressions remain the same. In particular, if we are luckily able to converge within the GMRES restart window with zero initial guess (recall Remark 2.2.2), we have the corresponding simplified expressions

$$(F, A^{+}s) = \|r_{m}\|^{2} - \|F\|^{2} \qquad \text{and} \qquad \|A^{+}s\|^{2} = \|F\|^{2} - \|r_{m}\|^{2}.$$

Note that this simplification implies additional computing advantages, because these quantities have been computed previously and are therefore readily available. On the contrary, $(F, As)$ demands, whatever the data distribution layout of $A$, $F$ and $s$, the computation of an inner product among all processors; this not only introduces a communication overhead but also a synchronization point for all local computations. This special case can be fully exploited for parallel implementations in any of the Krylov-secant algorithms.
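The two byproducts can be read off the Arnoldi data at negligible cost. The sketch below assumes GMRES was run on $A^{+}s = -F$ with zero initial guess (so $\beta = \|F\|$) and uses the relation $A^{+}s = -F - r_{m}$; the variable names ($V$, $Hbar$, $y$) mirror the notation of §§2.2 and are assumptions about the caller's data layout.

```python
import numpy as np

def gmres_byproducts(F, V, Hbar, y):
    """Return (F, A+ s) and ||A+ s|| from Arnoldi quantities alone.

    F    : current nonlinear residual (right-hand side is -F)
    V    : n x (m+1) Arnoldi basis, first column = -F/||F||
    Hbar : (m+1) x m upper Hessenberg matrix
    y    : m-vector minimizing ||beta*e1 - Hbar y||
    """
    beta = np.linalg.norm(F)
    t = beta * np.eye(Hbar.shape[0], 1)[:, 0] - Hbar @ y  # beta*e1 - Hbar*y
    r = V @ t                    # GMRES residual r_m = V_{m+1} t
    As = -F - r                  # A+ s, never touching A itself
    # if GMRES converged within the restart window, these reduce to
    # (F, A+s) = ||r||^2 - ||F||^2 and ||A+s||^2 = ||F||^2 - ||r||^2
    return F @ As, np.linalg.norm(As)
```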
4.3 Computational considerations for HKS methods

We can further exploit the secant update (3.28) to save more computation in the HKS-B and the HKS-EN algorithms. We already know that one of the most important features of Krylov subspace methods is that they do not require explicit knowledge of the matrix and the preconditioner; they only require their action on a vector. This feature motivates us to perform matrix-vector multiplications without recomputing the Jacobian, for any starting approximation of the Jacobian matrix. In this way, we are able to avoid problematic issues such as matrix fill-in due to secant rank-one updates. There are also least-change secant updates that maintain the Jacobian sparsity structure [49]. However, since they keep that structure fixed throughout the iterations, some valuable information may be left out and spoil rapid convergence properties. On the other hand, in large scale settings and cases of ill-conditioned systems, preconditioners tend to be expensive, in response to the limitations of Krylov iterative solvers in tackling these problems. We can expect that solving large nonlinear problems not only demands costly function and derivative computations but also significant computer time in setting up these preconditioners. In fact, preconditioners may not be available in an explicit form, as occurs in two-stage or inner-outer types of iterations (see e.g., [4], [72]). Hence, it is important to employ implicit forms for updating Jacobians and preconditioners (see Chapter 5). We point out, according to our discussion in §§4.2.2, that we need not update the preconditioner for the purpose of using the Richardson iteration. However, it may be desirable to update the preconditioner (via Broyden's update) to accelerate the convergence rate of GMRES. Let us recall that Krylov-Broyden and Broyden updates occur in an interleaved fashion in the HKS-B and the HKS-EN algorithms. In this section we propose limited memory compact representations to that purpose. The final part of the section is devoted to studying the computational impact of these representations in terms of their floating point operation complexity.

4.3.1 Limited memory compact representations

We can adapt either multiple secant updates or limited memory quasi-Newton compact representations to carry out implicit and efficient secant updates. They are particularly useful when analytical derivatives are not available or are costly to compute. The former approach is widely known and was formerly suggested by Barnes [7] and Gay and Schnabel [68]. Multiple secant updates enforce a set of secant conditions to hold, but are numerically unstable if the directions are nearly linearly dependent. The latter one, conversely, does not present these numerical difficulties, but the secant equation is only guaranteed to hold for the previous update. This type of scheme was recently proposed by Byrd, Nocedal and Schnabel [28]. They claim that there is not a clear distinction as to which one of the two is the best.

In this work, we limit our attention to the compact limited memory representations analyzed by Byrd, Nocedal and Schnabel. (In fact, these representations differ from those obtained by multiple secant updates in that the matrix $N^{(k)}$ defined below is triangular instead of a full dense matrix.) The following lemma is key in the derivation of such representations.

Lemma 4.3.1 Let $\left\{s^{(k)}\right\}_{k\ge 0}$ and $\left\{y^{(k)}\right\}_{k\ge 0}$ be sequences of vectors in $\mathbb{R}^{n}$ and

$$p^{(k)} = \left[\left(s^{(k)}\right)^{t}s^{(k)}\right]^{-1} \quad \text{(provided that } s^{(k)} \ne 0\text{)}, \quad \forall k = 0, 1, \ldots,$$

defining the following matrix recurrence

$$\Phi^{(k+1)} = \Phi^{(k)} + p^{(k)}\left(y^{(k)} - \Phi^{(k)}s^{(k)}\right)\left(s^{(k)}\right)^{t}, \qquad \Phi^{(0)} = 0.$$

Then

$$\Phi^{(k+1)} = Y^{(k)}\left(N^{(k)}\right)^{-1}\left(S^{(k)}\right)^{t}, \quad k = 0, 1, \ldots,$$

where $S^{(k)} = \left(s^{(0)}, \ldots, s^{(k)}\right)$, $Y^{(k)} = \left(y^{(0)}, \ldots, y^{(k)}\right)$ and

$$\left(N^{(k)}\right)_{i,j} = \begin{cases} \left(s^{(i-1)}\right)^{t}s^{(j-1)} & \text{if } i \le j, \\ 0 & \text{otherwise.} \end{cases}$$

Proof This is proved by induction as part of Theorem 6.1 in [28]. $\Box$

The following theorem provides a compact representation for the alternating Krylov-Broyden and Broyden updates of the Jacobian matrix. This form should be applicable in both the Richardson iteration and GMRES. Since the Krylov-Broyden update is followed by a Broyden update, the formula is similar to the compact representation of Broyden's method, with the exception that a projection onto a Krylov subspace occurs at every even nonlinear step.

Theorem 4.3.1 Let $A^{(0)}$ be a nonsingular matrix and let $A^{(k)}$ be obtained by updating $A^{(0)}$ $\lceil k/2 \rceil$ times with formula (3.28) and $\lfloor k/2 \rfloor$ times with Broyden's update, for $l = 0, 1, \ldots, k - 1$. Then

$$A^{(k)} = A^{(0)} + Q^{(k)}\left(N^{(k)}\right)^{-1}\left(S^{(k)}\right)^{t}, \qquad (4.20)$$

where $S^{(k)}$ and $N^{(k)}$ are defined as in Lemma 4.3.1 and $Q^{(k)} = \left(q^{(0)}, q^{(1)}, \ldots, q^{(k-1)}\right)$, with $q^{(l)}$ the secant residual of the $l$-th update: projected (a Krylov-Broyden correction) for $l$ even and plain (a Broyden correction) for $l$ odd.

Proof We express $A^{(k)}$ as $A^{(k)} = A^{(0)} + B^{(k)}$, where $A^{(0)}$ is a constant term and the recurrence $B^{(k)}$ is defined by $B^{(0)} = 0$ and

$$B^{(k)} = B^{(k-1)}\left[I - p^{(k)}s^{(k)}\left(s^{(k)}\right)^{t}\right] + p^{(k)}q^{(k)}\left(s^{(k)}\right)^{t}, \quad \forall k = 0, 1, \ldots.$$

Hence, using Lemma 4.3.1 we obtain (4.20) in a straightforward way. $\Box$
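A sketch of the compact representation (4.20) in action. Columns are appended to $S$, $Q$ and the small triangular matrix $N$ one step at a time, and the product $A^{(k)}v$ is formed as $A^{(0)}v + Q\,N^{-1}\left(S^{t}v\right)$, so the dense rank-one corrections are never assembled. The class below illustrates the bookkeeping under these assumptions; it is not the thesis code.

```python
import numpy as np
from scipy.linalg import solve_triangular

class CompactSecant:
    """Limited-memory compact form A_k = A0 + Q N^{-1} S^t (cf. (4.20))."""
    def __init__(self, A0_matvec, n):
        self.A0 = A0_matvec
        self.S = np.zeros((n, 0))
        self.Q = np.zeros((n, 0))
        self.N = np.zeros((0, 0))

    def update(self, s, q):
        """Append a secant pair; q is the (possibly projected) residual
        y - A_k s computed by the caller."""
        # N grows by one column (S^t s) and one diagonal entry (s^t s);
        # it stays upper triangular by construction (Lemma 4.3.1).
        k = self.N.shape[0]
        N = np.zeros((k + 1, k + 1))
        N[:k, :k] = self.N
        N[:k, k] = self.S.T @ s
        N[k, k] = s @ s
        self.N = N
        self.S = np.column_stack([self.S, s])
        self.Q = np.column_stack([self.Q, q])

    def matvec(self, v):
        if self.N.shape[0] == 0:
            return self.A0(v)
        w = solve_triangular(self.N, self.S.T @ v)  # N^{-1} (S^t v)
        return self.A0(v) + self.Q @ w
```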
The result is a very simple extension of the limited memory compact representation of Broyden's update. For the purpose of applying the preconditioner, the compact representation of Broyden's update is the most convenient. We only have to be aware that the formula is used after a Richardson iteration solution has been successfully obtained. Applying the Sherman-Morrison-Woodbury formula, we can easily obtain the expression

$$\left(M^{(l)}\right)^{-1} = \left(M^{(0)}\right)^{-1} - \left(M^{(0)}\right)^{-1}Q'^{(l)}\left[N'^{(l)} + \left(S'^{(l)}\right)^{t}\left(M^{(0)}\right)^{-1}Q'^{(l)}\right]^{-1}\left(S'^{(l)}\right)^{t}\left(M^{(0)}\right)^{-1}, \qquad (4.21)$$

where $S'^{(l)}$, $Q'^{(l)}$ and $N'^{(l)}$ collect the preconditioner update factors. Note that $Y'^{(l)}$ is defined similarly to $Y^{(k)}$ in Lemma 4.3.1 with $y^{(k)} = F^{(k+1)} - F^{(k)}$, $k = 1, 2, \ldots$, that is, $Y'^{(l)} = \left(F^{(2)} - F^{(1)},\, F^{(3)} - F^{(2)},\, \ldots,\, F^{(k+1)} - F^{(k)}\right)$, and that $\left(S'^{(l)}\right)^{t}S'^{(l)} - N'^{(l)}$ is a strictly lower triangular matrix.

4.3.2 Computational complexity

Since $S^{(k)}$, $Q^{(k)}$, $N^{(k)}$, $S'^{(l)}$, $Q'^{(l)}$ and $N'^{(l)}$ increase by one column at a time, some computation and bookkeeping can be performed prior to the application of the Jacobian and its preconditioner in both the Richardson iteration and GMRES. In addition, note that there are some common operations between (4.20) and (4.21). Starting the analysis with Jacobian products, we observe that the following operations can be prepared beforehand:

• Formation of $\left(N^{(k)}\right)^{-1}$. Note that

$$N^{(k)} = \begin{pmatrix} N^{(k-1)} & \left(S^{(k-1)}\right)^{t}s^{(k)} \\ 0 & \left(p^{(k)}\right)^{-1} \end{pmatrix};$$

thus, for every incoming step $k$ we only require one matrix-vector product, one dot product and one backward substitution. This yields roughly $O\left(n + kn + k^{2}\right)$ floating point operations.

• Formation of one column of $Q^{(k)}$. This only requires one matrix-vector product (with the original Jacobian $A^{(0)}$) and two AXPY operations. If $k$ is even, we additionally have the computation associated with the orthogonal projector $P$. Overall, we have $O(n + mn) + O(\mathrm{MatVec})$ floating point operations, where MatVec indicates the typical cost of performing a matrix-vector multiplication in a particular implementation.

Given a vector $v \in \mathbb{R}^{n}$, the product $A^{(k)}v$ can then be carried out from (4.20) going from right to left. It is not hard to show that the complexity of this operation is $O\left(kn + k^{2}\right) + O(\mathrm{MatVec})$. Thus, the complexity grows quadratically with respect to $k$. Nevertheless, for small values of $k$, the whole operation should be governed by the matrix-vector product with the initial Jacobian approximation. In a similar fashion, we can determine the complexity associated to the preconditioner:

• Formation of $K^{(l)} \equiv \left(M^{(0)}\right)^{-1}Q'^{(l)}$ and of $\left(S'^{(l)}\right)^{t}K^{(l)}$. The only operations involved here are one application of the preconditioner and the multiplication of $S'^{(l)}$ against the new column, implying $O(nl) + O(\mathrm{Prec})$ floating point operations, where Prec is the cost of applying $\left(M^{(0)}\right)^{-1}$.

• Application of (4.21) onto a given vector $v \in \mathbb{R}^{n}$, performed from right to left. From there it is easy to verify that the overall cost of this operation is $O\left(n + nl + l^{2}\right) + O(\mathrm{Prec})$ floating point operations. Note that $l$ is approximately half of the total number of nonlinear iterations.
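Correspondingly, the preconditioner can be applied in the same implicit manner. The sketch below applies $\left(M^{(l)}\right)^{-1}$ via the Sherman-Morrison-Woodbury formula, given the action of $\left(M^{(0)}\right)^{-1}$ and the stored update factors; the names and factor layout are illustrative assumptions, and in practice $K = \left(M^{(0)}\right)^{-1}Q$ would be cached between calls.

```python
import numpy as np

def apply_updated_prec_inv(M0inv, Q, S, N, r):
    """v = (M0 + Q N^{-1} S^t)^{-1} r via Sherman-Morrison-Woodbury.

    M0inv : callable applying (M^{(0)})^{-1}; assumed to accept a
            matrix of right-hand sides as well as a vector
    Q, S  : n x l update factors, N : l x l triangular matrix
    """
    z = M0inv(r)
    if Q.shape[1] == 0:
        return z
    K = M0inv(Q)                  # n x l; cacheable between calls
    C = N + S.T @ K               # small l x l capacitance matrix
    return z - K @ np.linalg.solve(C, S.T @ z)
```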
The addition of a new column to $Q'^{(l)}$ has already been performed in the formation of the preconditioner factors, so we do not need to count it twice. Overall, this analysis shows that the cost of the Richardson iteration and GMRES increases as the method advances to the solution. As noted above, $k$ (i.e., the number of Newton steps since the last actual Jacobian recomputation) should be small compared to the problem size in practice. However, this linear growth in the operation count may be an important concern when Jacobian and preconditioner assembly costs are overtaken by the cost of these linear solvers. Storage requirements are also a delicate matter for significant values of $k$. Nevertheless, one of the main advantages of the Byrd, Nocedal and Schnabel limited memory compact representations is that they are well defined for any number of columns of $S^{(k)}$ (or $Y^{(k)}$). Consequently, we can prefix a maximum value for $k$, say 7 or 8, and start replacing the oldest columns in each of the above defined operators. This can be done without affecting the process considerably.

We finally remark that this implicit manner of performing low-rank updates sacrifices part of the parallelism. Specifically, some new synchronization points are introduced in the matrix-vector multiplication and in the application of the preconditioner. However, on the other hand, the block structure of compact representations contains a good degree of coarse grain parallelism that grows as more nonlinear iterations proceed. This is certainly a trade-off that deserves particular attention.

Chapter 5

Preconditioning Krylov methods for systems of coupled nonlinear equations

5.1 Motivation

In this chapter we focus our attention on two-stage procedures, which are also known in the literature as nested or inner-outer procedures; see e.g., [3, 4, 16, 41, 57, 72, 112]. We address their use as preconditioners for the several large sparse linear systems arising from the cell-centered finite difference discretization (or, equivalently, the lowest-order mixed finite element discretization with an appropriate quadrature rule; see [141]) and the subsequent Newton linearization of the coupled algebraic system of nonlinear equations. These linear systems (i.e., instances of Newton equations) are highly non-symmetric and indefinite. Not surprisingly, specific preconditioners for these types of problems are not frequent in the literature, due in part to the complexity suggested by the contrasting physical behavior of the variables involved: pressures (elliptic or parabolic component) and saturations (hyperbolic or convection-dominated component).

Despite the difficulty of these linear systems, there are certainly some "nice" properties associated to the coefficient blocks that affect each type of variable. Under mild conditions, which are regularly met at a modest time step size, each of these blocks is irreducible and diagonally dominant. Moreover, the strict diagonal dominance in some of these blocks leads to the M-matrix property. These block algebraic properties can be exploited so that better conditioning can be achieved in the entire coupled system. Moreover, devices leading to this desirable situation also aid in weakening the coupling of the discretized nonlinear partial differential equations represented by the off-diagonal blocks.
We call these devices decoupling operators and use them as a preprocessing step to facilitate the effectiveness of two-stage preconditioners. We remark, though, that different decoupling operators can be used as intermediate steps within these two-stage preconditioners, leading to a possible inexact multi-stage generalization. We do not pursue this idea further here. We rather center our attention on those two-stage strategies that arise naturally in block types of preconditioning: block Jacobi, block Gauss-Seidel and Schur complement based. We include in our analysis a combinative type of preconditioning: the combinative method relies primarily upon the solution of a reduced pressure-based system, as originally proposed in [11] and later restated in [138, 139]. In fact, the idea can be generalized [27]. In order to strengthen its robustness, we propose an additive and a multiplicative extension of this combinative preconditioner involving the solution of a reduced concentration (i.e., density times saturation of a given phase, in our particular case) system of residuals. We also aim at adding efficiency and robustness to these preconditioners by means of two well known Krylov-subspace iterative methods: GMRES and BiCGSTAB.

It is worth mentioning that ideas to sequentialize the solution of equations, or to remove part of the full implicitness by means of operator splitting or time-lagging (i.e., sequential in time) methods, have played an important role not only in the time discretization of multi-phase flow and transport formulations in porous media simulation, but also in the solution of the Navier-Stokes equations governing fluid dynamics. These can be regarded as strategies to decouple the system in terms of some of the variables present in the physical model. Along this trend, we have the well known IMPES (IMplicit Pressures-Explicit Saturations) formulation in reservoir simulation (see, e.g., [5]) and, for Navier-Stokes problems, the segregated methods in CFD [75, 76]. Such strategies can certainly be inspiring to generate preconditioners for coupled linear systems arising from the fully implicit scheme. This general idea motivates our discussion here.

This chapter is organized as follows. We begin § 2 with a presentation of the equations governing multi-phase flow in porous media. We then describe their discretization and the linearization by the Newton method. In § 3, we analyze the structure of the linear system to be solved at every Newton step. § 4 focuses the discussion on two different decoupling operators and their implications in clustering the eigenvalues of the original coupled system. § 5 is devoted to discussing the philosophy behind the family of two-stage procedures and to describing those preconditioners that the author considers most appropriate for the type of modeling problem addressed in this dissertation.

5.2 Description of the Problem

This research concentrates on the analysis of the equations for black-oil simulation, which constitute the simplest way to realistically model multi-phase flow and transport in porous underground formations. To further simplify the presentation, we only look at the two-phase model. Extensions to multiple unknowns per grid block are readily evident.

5.2.1 Differential Equations

The basic equations for black-oil reservoir simulation consist of conservation equations for oil, gas and water. However, for simplicity, we limit the presentation to a wetting (i.e., water) and a non-wetting (i.e., oil) phase, denoted by subscripts $w$ and $n$, respectively.
A more thorough description of the model can be found in [91]. The mass conservation of each phase is given by

$$\frac{\partial\left(\phi\,\rho_{l}S_{l}\right)}{\partial t} = -\nabla\cdot\left(\rho_{l}u_{l}\right) + q_{l}, \qquad (5.1)$$

$$u_{l} = -\frac{K\,k_{rl}}{\mu_{l}}\left(\nabla p_{l} - \rho_{l}\,g\,\nabla Z\right), \qquad (5.2)$$

where $\rho_{l}$ is the density, $S_{l}$ is the saturation, $\phi$ is the porosity, $q_{l}$ is the source term denoting production/injection rates, and $t$ is time. The subscript $l$ can be either $w$ for the wetting phase or $n$ for the non-wetting phase. The Darcy velocity $u_{l}$ is expressed through (5.2), where $k_{rl}$ is the relative permeability, $\mu_{l}$ the viscosity, $p_{l}$ the pressure, $K$ the absolute permeability tensor, $g$ the gravity constant and $Z$ the depth. These equations are coupled through the following extra relations:

• Wetting and non-wetting saturations add up to one: $S_{w} + S_{n} = 1$.

• Capillary pressure: $p_{c}\left(S_{w}\right) = p_{n} - p_{w}$.

• Relative permeabilities depend on saturation: $k_{rl} = k_{rl}\left(S_{w}\right)$.

The model also allows for slight compressibility, i.e., $\rho_{l} = \rho_{l}\left(p_{l}\right)$. The absolute permeability tensor depends only upon location, while porosity and viscosity are given constants depending on both location and depth. The simulator used in the experiments presented in this work can accommodate problems from both the petroleum engineering and environmental disciplines; it can specify general boundary conditions for both phases, given by

$$a\,u_{l}\cdot\nu + v\,p_{l} = h_{l}, \qquad l = w, n, \qquad (5.3)\text{--}(5.4)$$

where $a$ and $v$ are spatially varying coefficients, $\nu$ is the outward unit normal vector and $h_{l}$ is a spatially varying function. Initially, $p_{n}$ and $S_{w}$ are specified. A gravity equilibrium condition is then used to solve for an initial value of $S_{n}$. In reservoir engineering, the typical boundary conditions are of Neumann type for both the saturation and pressure unknowns. (The resulting, possibly rank deficient, linear system is solved by choosing the bottom hole pressure at a given reservoir location.)

Frequently, the primary unknowns in the preceding system of parabolic equations are pressures and saturations of one phase or of two different phases (see the discussion in [5] about other possible formulations). The primary unknowns in our simulator are $p_{n}$ and $c_{n}$, standing for pressure and concentration of the non-wetting phase, respectively. All other variables can then be computed explicitly based on these two. In the case of slight compressibility, it can be shown that the system is of mixed parabolic-hyperbolic character, with one nonlinear parabolic equation in terms of pressure and one nonlinear convection-diffusion equation in terms of concentration [91]. In this model, there are weak nonlinearities related to those variables that depend upon the pressure of one phase (e.g., densities), and their effect depends on the degree of pressure change. In contrast, strong nonlinearities are present in variables that basically depend on saturations, such as relative permeability and capillary pressure. Note that a combined nonlinearity effect is present for concentrations, since densities depend upon pressure. The pressure equation degenerates into an elliptic equation for incompressibility of both phases (i.e., $c_{n} = c_{w} = 0$). On the other hand, the diffusive term in the latter equation vanishes in the absence of capillary pressure, giving rise to a first order quasi-linear hyperbolic equation.
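As a concrete, entirely generic illustration of the closure relations above, the sketch below evaluates Brooks-Corey-type relative permeabilities and a power-law capillary pressure from the wetting saturation. The functional forms and parameter values are common textbook choices, not the ones used by the simulator in this work.

```python
import numpy as np

def two_phase_closures(Sw, Swr=0.2, Snr=0.15, pe=5.0e3, lam=2.0):
    """Brooks-Corey style closures for a wetting/non-wetting pair.

    Sw  : wetting saturation (array); Sn = 1 - Sw by construction
    Swr : residual wetting saturation, Snr : residual non-wetting one
    pe  : entry pressure [Pa], lam : pore-size distribution index
    """
    Se = np.clip((Sw - Swr) / (1.0 - Swr - Snr), 1e-6, 1.0)  # effective sat.
    krw = Se ** ((2.0 + 3.0 * lam) / lam)          # wetting rel. perm.
    krn = (1.0 - Se) ** 2 * (1.0 - Se ** ((2.0 + lam) / lam))
    pc = pe * Se ** (-1.0 / lam)                   # pn - pw = pc(Sw)
    return krw, krn, pc

krw, krn, pc = two_phase_closures(np.linspace(0.25, 0.8, 4))
```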
5.2.2 Discretization

Nowadays, reservoir simulators rely on a variety of discretization formulations, ranging from the IMPES to the fully implicit schemes (see [5] and [91] for detailed discussions). In between these two extremes, several semi-implicit and adaptive implicit discretization alternatives have been proposed [63, 102]. The fully implicit formulation offers the highest possible robustness among these discretization methods in the solution of long term reservoir simulation. The main drawback of fully implicit discretizations resides in the solution of a large nonlinear system of equations at each time level. If the Newton method is employed, then several non-symmetric and indefinite linear systems need to be solved at each time step.

In the context of the two-phase problem being discussed in this work, both pressure and concentration unknowns (degrees of freedom) occupy the centers of the grid blocks, and velocities are approximated on the edges or faces of the grid blocks. The flow coefficients between two grid elements, i.e., the transmissibilities or mass mobilities, are defined as

$$T^{\nu+1,\tau+1}_{i+1/2,j,k} = \left(\frac{\rho\,k_{r}}{\mu}\right)^{\nu+1,\tau+1}_{i+1/2,j,k}K_{i+1/2,j,k},$$

where the superscript $\tau + 1$ denotes a value at the $(\tau+1)$-th time level, $\nu + 1$ indicates the $(\nu+1)$-th Newton approximation, and the subscripts $i$, $j$ and $k$ indicate the grid block location. The first factor on the right hand side is approximated in the upstream direction of the flow, and the permeability is weighted harmonically to account for variations in grid block sizes. Discretization of the model equations (5.1)-(5.2) is performed by block-centered finite differences (or, equivalently, lowest-order mixed finite elements) for pressures and concentrations of both phases, obeying a seven point stencil associated with a given internal grid point, thus giving rise in general to 28 different coefficients per grid point location.

This discretization leads to a system of nonlinear algebraic equations given, along the $x$ direction, by difference quotients of the form (5.5), plus similar terms for the $y$ and $z$ directions, where $\Delta x_{i+1/2} = \left(x_{i+1} - x_{i}\right)/2$, i.e., the cell midpoint along the $x$ direction. In a similar way, $\Delta y_{j+1/2}$ and $\Delta z_{k+1/2}$ are defined.

A higher degree of discretization has been considered in the context of IMPES formulations [127]. Dawson et al. [40] consider a 19-point stencil in space within a fully implicit parallel reservoir simulator. They use a full permeability tensor implementation to handle underground heterogeneities, together with general boundary condition specifications.

The extra relations mentioned in the previous subsection, and their corresponding partial derivatives with respect to the primary unknowns, are used in obtaining the Newton linearization of the nonlinear conservation equations. Small compressibility allows for some simplifications without affecting the validity of the numerical approximation. The above procedure follows the description by Wheeler and Smith [142] on developing a parallel black-oil simulator. Further insights about the discretization of these equations can be found in [5] and [51].
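A sketch of the transmissibility evaluation just described, for one face in the $x$ direction: the permeability is averaged harmonically between the two cells and the mobility $\rho k_{r}/\mu$ is taken from the upstream cell, decided here by the sign of the pressure drop. The names and the exact weighting are schematic; the simulator's actual formula includes geometry factors and gravity.

```python
def face_transmissibility(K1, K2, dx1, dx2, mob1, mob2, p1, p2, area):
    """Upstream-weighted transmissibility on the face between cells 1, 2.

    K*   : absolute permeabilities, dx* : cell sizes along x
    mob* : phase mobilities rho*kr/mu at the two cell centers
    p*   : phase pressures; flow goes from high to low pressure
    """
    # distance-weighted harmonic average of the permeability
    K_face = (dx1 + dx2) / (dx1 / K1 + dx2 / K2)
    mob_up = mob1 if p1 >= p2 else mob2      # upstream weighting
    return area * K_face * mob_up / (0.5 * (dx1 + dx2))

T = face_transmissibility(1e-13, 5e-14, 10.0, 10.0, 800.0, 900.0,
                          2.1e7, 2.0e7, 100.0)
```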
5.2.3 The Newton and linear system formulation

The fully implicit formulation for the numerical solution of systems of nonlinear parabolic equations leads to solving the following nonlinear problem at each time step:

$$F(u) = 0, \qquad F: \mathbb{R}^{n} \to \mathbb{R}^{n},$$

where $u$ collects the pressure and concentration unknowns. Newton methods have been common practice in reservoir engineering applications for a number of years (see [91] for a general overview). The large-scale systems arising in the simulation, and the lack of robustness of direct solvers in dealing with some physical conditions, rule out the use of direct methods. Consequently, inexact Newton methods (among some others) are of current interest. Iterative algorithms such as SIP, SOR, Krylov-subspace methods like BiCGSTAB, GMRES and ORTHOMIN, and Chebyshev iterations have been employed as inner solvers for inexact Newton methods (see e.g., [73] and the references cited therein). Although direct solvers have lost popularity over time on account of the simulation sizes encountered, they are still relatively common in reservoir engineering [12]. Lately, multigrid methods have also been investigated [41], but their effectiveness has only been shown for moderate rock heterogeneity.

5.3 The coupled linear system

We now provide a general description and an algebraic analysis of the partitioned linear system (i.e., the Newton equation in step 2.2 of Algorithm 2.1.1) arising at each Newton iteration. We identify the properties associated with the blocks and establish some moderate assumptions on which the development of the preconditioners is based. These assumptions are not intended to give a definitive characterization of real life simulation matrices, but are met when the time step is short enough to ensure convergence of the Newton method itself and, therefore, provide a framework for evaluating the latest advances in preconditioning coupled linear systems in reservoir engineering.

5.3.1 Structure of the Resulting Linear System

Each linear system associated with the two-phase model depicted in (5.1)-(5.2) can be partitioned in the following $2 \times 2$ block form:

$$\begin{pmatrix} J_{pp} & J_{pc} \\ J_{cp} & J_{cc} \end{pmatrix}\begin{pmatrix} p \\ c \end{pmatrix} = \begin{pmatrix} f_{n} \\ f_{w} \end{pmatrix}. \qquad (5.6)$$

Each block $J_{i,j}$, $i, j = c, p$, is of size $nb \times nb$, where $nb$ is the number of grid blocks, and $f_{n}$ ($f_{w}$) is the residual vector corresponding to the non-wetting (wetting) phase coefficients. Each group of unknowns is numbered in a sequential lexicographic fashion: pressure unknowns are numbered from one through the total number of grid blocks ($nb$), and the concentrations are numbered from $nb + 1$ through $2nb$.

The block $J_{pp}$, containing the non-wetting phase pressure coefficients, has the structure of a purely elliptic problem in the non-wetting phase pressures. The block $J_{pc}$ of the Jacobian matrix has a structure similar to that of a discretized first-order hyperbolic problem in the non-wetting phase concentrations. The block $J_{cp}$ has the coefficients of a convection-free parabolic problem in the non-wetting phase pressure and, finally, $J_{cc}$ represents a parabolic (convective-diffusive) problem in the oil concentrations. The position of the nonzero entries of a given Jacobian matrix is shown in Figure 5.1.

[Figure 5.1: Nonzero structure of a sample Jacobian matrix of the linear systems in the Newton iteration; nz = 5504.]

In this particular example, we can observe the effect of the upstream weighting within the block $J_{pc}$: the moving front is one block behind, giving the only nonzero coefficients in the lower part of the block. However, the absent values in the upper part are added positively to the main diagonal of that block.
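With the sequential lexicographic numbering just described, the four blocks of (5.6) are contiguous submatrices of the assembled Jacobian. A small sketch of the partitioning, assuming a SciPy sparse matrix $J$ of order $2\,nb$:

```python
import scipy.sparse as sp

def partition_jacobian(J, nb):
    """Split J (2nb x 2nb, pressures first, then concentrations)
    into the four blocks of (5.6)."""
    J = J.tocsr()
    Jpp = J[:nb, :nb]   # elliptic pressure block
    Jpc = J[:nb, nb:]   # hyperbolic-like coupling block
    Jcp = J[nb:, :nb]   # convection-free parabolic coupling
    Jcc = J[nb:, nb:]   # convective-diffusive concentration block
    return Jpp, Jpc, Jcp, Jcc
```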
5.3.2 An algebraic analysis of the coupled Jacobian matrix

The presence of slight compressibility ensures invertibility of the Jacobian matrix in system (5.6) (further discussion about this issue is given in [5]). In general, the block coefficients $J_{pp}$, $J_{pc}$ and $J_{cp}$ share the following properties (see e.g., [2] for mathematical definitions and related theoretical results):

• Diagonal dominance.

• Positive diagonal entries and negative off-diagonal entries (i.e., they are Z-matrices).

• Irreducibility.

Strict diagonal dominance in all rows is only present in $J_{pp}$, as a result of the compressibility and pore volume term contributions to the main diagonal of this block. In consequence, this block is nonsingular, positive stable and an M-matrix. Strict diagonal dominance for some of the rows of $J_{pc}$ and $J_{cp}$ can be achieved by the contribution of bottom hole pressures specified as part of the boundary conditions. In this case, the block is an irreducibly diagonally dominant matrix. In addition, under small changes of formation volume factors¹ and flow rates between adjacent grid blocks, we can expect both blocks $J_{pp}$ and $J_{cp}$ to be nearly symmetric.

¹Formation volume factors of each phase are defined as the ratio of the volume occupied by the phase at reservoir conditions to the volume occupied at stock-tank or atmospheric conditions.

The concentration coefficient block $J_{cc}$ presents algebraic properties similar to those of the other blocks. It has a convection-diffusion behavior characterized by capillary pressure derivative terms (the diffusive part) and wetting phase relative permeability derivative terms (the convective part). The diffusive part becomes dominant over the convective part when capillary pressure gradients are higher than the relative permeability gradients of the wetting phase. It is likely that this occurs at the beginning and the end of the simulation, when the capillary pressure curve tends to be steeper. During intermediate time steps of the simulation, the wetting-phase pressure gradients are less pronounced and relative permeability gradients with respect to wetting saturations affect negatively the magnitude of the convective part. However, under the same trend, the capillary pressure derivatives with respect to wetting saturations are less prominent, affecting negatively the amount of dispersivity.

Desirable diagonal dominance in $J_{cc}$ can indeed be achieved by shortening the time step. We have observed that the conditioning of this block has an immediate …

In physical terms, the decoupling operator tends to act as if concentration derivatives were neglected in the transmissibility computation, or as if some transmissibilities were evaluated explicitly. Hence, this is like a time-lagging operation. We prefer the form $D^{-1}J$ over $JD^{-1}$, since the latter may spoil the inherent diagonal dominance of $J$. Other implications of this choice will be discussed in the next subsection.

The above decoupling operator clearly admits an alternate representation in which we associate smaller blocks with individual grid block coefficients. This alternate (ABF) permutation means to permute matrix rows and columns in order to number pressure and concentration unknowns in an interleaved fashion, a pressure unknown followed by the concentration unknown at the same grid block, repeating this for every grid block within the mesh. Let $P$ be the matrix representing such a permutation and define

$$\tilde{J} = PJP^{t} = \left(\tilde{J}_{i,j}\right), \qquad \tilde{J}_{i,j} = \begin{pmatrix} \left(J_{pp}\right)_{i,j} & \left(J_{pc}\right)_{i,j} \\ \left(J_{cp}\right)_{i,j} & \left(J_{cc}\right)_{i,j} \end{pmatrix},$$

where each $\tilde{J}_{i,j}$ is the $2 \times 2$ matrix representing the coupling between grid blocks $i$ and $j$ of the mesh.
It follows, for an invertible $D$, that every grid block performs such a local decoupling. Hence, $D^{-1}$ is a block diagonal matrix whose blocks are the inverses of the Jacobian blocks associated to a local problem at each grid block. That is,

$$\tilde{D}^{-1} = \begin{pmatrix} \tilde{J}_{1,1}^{-1} & 0 & \cdots & 0 \\ 0 & \tilde{J}_{2,2}^{-1} & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \tilde{J}_{nb,nb}^{-1} \end{pmatrix}. \qquad (5.9)$$

To follow the underlying notation, let us define the alternate decoupled system as $\widetilde{D^{-1}J} = P\,D^{-1}J\,P^{t}$.

This idea appears rather natural. In fact, Behie and Vinsome [11] comment about the possibility of decoupling more equations with respect to pressure coefficients. They did not foresee the positive effect, as we shall note below, that a full decoupling of the grid block has in conditioning the system.

The core of the combinative approach is the effective solution of pressure-based systems. In this situation, there is no need to go further in the decoupling process as expressed in (5.9). The coefficients introducing the coupling with pressures are zeroed out within the grid block by Gaussian elimination, so that the corresponding coefficients at neighboring grid blocks are expected to become small. To be more precise, let

$$\tilde{W}_{p} = \begin{pmatrix} \left(\tilde{W}_{p}\right)_{1} & 0 & \cdots & 0 \\ 0 & \left(\tilde{W}_{p}\right)_{2} & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \left(\tilde{W}_{p}\right)_{nb} \end{pmatrix}, \qquad (5.10)$$

where $\epsilon_{1} = (1, 0)^{t}$. Therefore, $\tilde{W}_{p}$ is a block diagonal matrix that removes, in each $2 \times 2$ diagonal block, the coupling with respect to the pressure unknown. Similarly, we could define an operator $\tilde{W}_{c}$ with the canonical vector $\epsilon_{2} = (0, 1)^{t}$. The operator $W_{p}$ was introduced by Wallis in his IMPES two-stage preconditioner [139]. The consecutive counterpart $W_{p}$ of the alternate operator $\tilde{W}_{p}$ gives rise to the system

$$J^{W_{p}} \equiv W_{p}J, \qquad (5.11)$$

whose pressure block is decoupled from concentrations, while the lower blocks as well as the main diagonal of the resulting pressure block are unmodified (i.e., $J^{W_{p}}_{cp} = J_{cp}$ and $J^{W_{p}}_{cc} = J_{cc}$).

In order to reduce the already decoupled system to one involving, say, the block of coefficients associated to pressure unknowns, the operator $\tilde{R}_{p} \in \mathbb{R}^{nb \times 2nb}$ is defined by

$$\left(\tilde{R}_{p}\right)_{k,l} = \begin{cases} 1 & \text{if } l = 1 + 2\,(k - 1), \\ 0 & \text{otherwise}, \end{cases} \qquad k = 1, 2, \ldots, nb.$$

In this particular alternate interleaved ordering of unknowns, we could also define $\tilde{R}_{c}$, with $l = 2 + 2\,(k - 1)$, in order to obtain the corresponding concentration coefficients; for the consecutive lexicographic ordering of unknowns, the corresponding $R_{p}$ and $R_{c}$ simply select the first and the last $nb$ unknowns, respectively.

Finally, we stress that this presentation can be easily extended to more unknowns sharing a given grid point (e.g., three phases and multi-component systems).
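The grid-block decoupling admits a very direct implementation. The sketch below forms, for each grid block, the local $2 \times 2$ matrix from the diagonal entries of the four coupled blocks and applies its inverse to the corresponding residual pair, which is the action of $D^{-1}$ on a residual; restricting to pressures (the action of $\tilde{R}_{p}$) then amounts to keeping the first output. The array names are illustrative.

```python
import numpy as np

def decouple_residual(Dpp, Dpc, Dcp, Dcc, rp, rc):
    """Apply D^{-1}: invert the local 2x2 grid-block matrices.

    D** : 1-D arrays holding the diagonals of Jpp, Jpc, Jcp, Jcc
    rp, rc : pressure and concentration residuals (length nb each)
    """
    det = Dpp * Dcc - Dpc * Dcp           # determinants of the 2x2 blocks
    vp = ( Dcc * rp - Dpc * rc) / det     # pressure rows of D^{-1} r
    vc = (-Dcp * rp + Dpp * rc) / det     # concentration rows
    return vp, vc

nb = 4
rng = np.random.default_rng(1)
Dpp, Dcc = 2 + rng.random(nb), 2 + rng.random(nb)
Dpc, Dcp = -rng.random(nb), -rng.random(nb)
vp, vc = decouple_residual(Dpp, Dpc, Dcp, Dcc, rng.random(nb), rng.random(nb))
```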
5.4.2 Properties of the partially decoupled blocks

In general, it is a difficult task (and, in fact, an open problem in many related fields [1, 6, 11, 58, 93]) to characterize properties associated with the entire coupled Jacobian matrix, and even more so if it has been affected by some of the operators described above. This is one of the reasons that the theory concerning the existence and applicability of different linear solvers or preconditioners is based on some specific assumptions on the matrix $J$.

For the class of matrices that we obtain, there is not yet an easy-to-check theoretical result that determines when a matrix is positive stable and, moreover, when the symmetric part of a matrix could have only positive eigenvalues, although the matrix has some blocks that are M-matrices and present diagonal dominance.

In the application of iterative solvers it is fundamental to have an idea of the spectrum of the operators on which they are applied. Specially, one would like to know if the eigenvalues are located on the right half of the complex plane, to guarantee theoretical convergence of the iterative method. Also important is to detect a possible clustering of the eigenvalues, since this may increase the rate of convergence. In this section, we briefly present two immediate results related to the individual diagonal blocks of the Jacobian matrix already partially decoupled through the action of $D$.

Consider the decoupled matrix with a block-partitioned representation as showed in (5.6).

Theorem 5.4.1 Let $J_{pp}$ and $J_{cc}$ be irreducibly diagonally dominant Z-matrices and let $J_{pc}$ and $J_{cp}$ be M-matrices in $\mathbb{R}^{nb \times nb}$. Then $J^{D}_{pp}$ and $J^{D}_{cc}$ are M-matrices.
t.he original structure strategies that intent to presen'e as much as possible of the matrix perform very poorly as preconditioners (see the great resemblance between the spectrum of \tV J and J.) Several experiments like this one ha\'e indicated that the best strategy is to break as much as possible the coupling between equations (or unknowns) of the individual blocks. than trying to preserve some desirable properties L.iO .... )( 10-;) or 21 -2 -- ! a 0.2 0.4 1 )( 10-3 _~t .. : n -0.4 -0.3 0.6 .. t_ 0.6 : -'" 1-6 1.4 '-6 .... : : ::..: .. -___ - ~I~ 1-: I: 1 : ~ -0.1 -0.2 , .2 1 I: -- ..- -- J. I .... --- 0 0.1 0.2 - U.;" 0.4 ::J 0.6 ...... .. • ;:.. :-- .: ~ !!!.!!!,"" 1 'o.g '-, 0.03 . . --- j 1.2 1.3 1-4 1.5 Figure 5.5 Spectra of each block after decoupling with D. From top to hottom. they correspond to the (1.1). (1.2). (2.1) and (2.2) blocks 5.5 Two-stage preconditioners We Iwgin by gi\'ing a brief background order of t he following presentation obeys roughly led to the formulat ion of the \'arious detail \\'alli< ildditl\'E' (\lId t\\'o-~tilge preconditioner 11I1lItiplicati\'E' form. (·()w.;i~till~ of thl' combination Background Efforts to den~lop Behie equations and Vinsome preconclitioners minor change format Pilei for the forthcoming a chronology preconditioners ideas. of how the ideas that arose. \Ve discuss [1:39]. and a couple of extensions thi~ section with a more efficient opprator such as block .J acobi. The in to it in approach D-I and the inexact solutioll block Gauss Seidel and Schur based. 5.5.1 parabolic \\"e two-stage of the decoupling \)1' ~t a ndard block preconditioners complt-'ment as motivation general and efficient soh'ers for systems ha\-e started to emerge [11] to be the first researchers appear as a form of c1ecotlpled to the idea but seeking strongly elliptic in the last few years. preconditioners to incorporate of coupled to consider in reservoir saturation Howe\-er. combinnfit'f: engineering. information and .-\ was later 1.11 proposed by Behie and Forsyth [10] and 'Wallis [138]. with the introduction of the constrained Wallis generalizes residuals preconditioning the idea which allows the freedom to choose those residuals to be driven exactly (or almost exactly) to zero. In particular, if this residual corresponds to pressure variables, the method coincides with that proposed by Behie and Vinsome. However, Wallis later suggests that the iterat.ive solution of reduced pressures systems are one of the most appealing tackle larger reservoir simulation concatenation or combination problems [139]. ~Ieanwhile. developments of inexact preconditioning for general symmetric and non-symmetric of domain decomposition approaches to on the stages have been proposed problems [138] but specially in the context [13, 1.5,80] for flow in porous media. These works, hO\vever. do not address the topic of specialized preconditioners for coupled equations. In CFD the idea of using decoupled matrix blocks for the construction ditioners for iterative methods and for the implementation of precon- of solvers has been around longer. Segregated algorithms have been successfully applied to solving Navier-Stokes equations alternate (see e.g .. [76. 1:31] and references therein). These methods solution of pressures and velocities or on the exhaustive them to get a good overall solution of the problem. 
oped in sequential formulations solution of one of Similar ideas have been den'l- at the level of time discretization of linear solvers or preconditioners rely upon the rather at the len,1 for fully implicit formulations. The use of two-stage methods is not new (see e.g .. [LOa] and references cited there). In fact, these methods are known under different names and are scattered throughout the literature. [41, 57. 72]. They are also known as inner-outer In the context of preconditioning or inner-outer preconditioners parallel computing or inexact iterations they have been referred to as nonlinear, [3, 4, 112]. They have been also subject settings (e.g. see [27] and further references therein.) variable of study in However. in t he context of large-scale systems of coupled equations t hey strangely seem to han' been owr\ooked. The renewed interest in using two-stage methods obeys primarily to the tional cost associated with solving large inner linear systems. KrylO\'-subspace methods have also contributed For example. the t"zawa algorithm Recent dewlopments in to the renewed interest in t his area. formalized the inexact version of this algorithm In same fashion. intensive work has been recently devoted to extending cur- rE'nt non-symmetric iterat ive soh'ers to be able to accommodate \'ariability of the preconditioner from iteration to iteration: .\s said in Chapter 2. we use right preconditioning step for the consecuti\'e ordering of unknowns. cheap and that its proper application That is. if r = (I'fl' or through this work. \Ve make an D-l as a preprocessing In view that this operator is fixed. introduces the desirable diagonal dominance of tlw lIlain diagonal blocks of the coupled system (.).6). sidf'. Ilencf'. t1H' application the inexactness e.g. [3.112.1:3:3. l:H) . exceptIon to this rule when we include the decoupling operator norms. (1- has been around for more than :3.5years and it was only recently that some researchers [16. :Yi]. COmp1\I Wf' consider its use on the If'ft of D-' implies the use of weighted norms for all \'I'ctor ,)1 is a gi\'f~n residual (\\'hich concatenates ,.,... hlJlh lite wetting and the non-wetting residuals of phases) whose norm needs to be complltl'l\. t 11f' n Clearly, t his mentation. dot's not int roduce any major complication or o\-erhead into the implf" \Ioreovpr. this step can be also regarded as a scaling step for the couplcd \'ariables of the nonlinear -function in a given ~ewton step. This incidentally improws the robustness of the whole :'\ewton method. \ewton method can be seen in [22. -19]. Further discussion on scaling within the 5.5.2 Combinative two-stage preconditioner Consider the two-stage preconditioner ."1 expressed as the solution of the precondi- tioned residual equation ;\-Ipv = r. Also, denote preconditioner JWp == vVpJ. Then the action of the Alp is described by the following steps, Algorithm 5.5.1 (Two-stage Combinative) 1. Solve the reduced pressure system (R~Jwp Rp) p note its solution by = R~ VVpT and de- p. 2. Obtain expanded solution p = Rpp. :3. Compute new residual 4. Precondition r = r - J p. and correct v = .\/-1,. + p. The action of the whole preconditioner can be compactly written as (5.12) The preconditioner ation. .il is to be preferably computed This means that :~l should be easily factored. once for each Newton iterThe system R~IFpr is solved iteratively giving rise to a nested procedure. (R~.JwpRp) p = We finally remark that .lIp is an exact left inverse of .J on the subspace spanned by the columns of Rp. 
This is the preconditioner as stated by Wallis [139]. In contrast to the combinative method of Behie and Vinsome [11], he proposes to solve the pressure system iteratively and formalizes the form of the operators $W_{p}$ and $R_{p}$. Although we consider the term two-stage combinative preconditioner (2SComb) more appropriate, Wallis refers to it as a two-step IMPES preconditioner, according to a more accepted terminology for convergent nested inexact procedures and to the former designation employed by Behie and Vinsome.

[Figure 5.6: Spectra of the Jacobian right-preconditioned by the exact version of the combinative operator.]

Figure 5.6 shows the spectrum of the Jacobian right-preconditioned with the exact solution of the reduced pressure system and with $D^{-1}$ employed as decoupling operator. In this particular example, $M$ was taken to be the tridiagonal part of $J$.

5.5.3 Additive and multiplicative extensions

We propose two different ways to improve the quality of the combinative two-stage preconditioner by incorporating the solution to a reduced concentration system: additively and multiplicatively. In the following we present both procedures for computing the preconditioned residual; the additive two-stage preconditioner (2SAdd) is given by $v = M_{2SAdd}^{-1}r$ and the multiplicative one (2SMult) by $v = M_{2SMult}^{-1}r$, as follows.
Note that the l11ultiplicati\'e two-stage d usteri ng of t he real part s -of the eigem'alues ('ven though the resulting preconditioned system preconditioner around producf>s thf> uni ty among has a negative t hf> t h !'C'('. eigenvalue. ~.06. ... I I lI. .. I ~06' -0.2 Figure 5.5.4 In the block way that form. \\.f>can express Howe\·er. coupled operator system. the correcting decouplil1g the o\"erhead ),6 0.8 1.2' 4 1.6 In other \·ia additive il1trorluced have intprpretation described we pn'st-'Ilt them a "good" words. -'I preconditioners operators the preconditiollers performs step its corresponding },4 two-stage in this opportunity decoupling ,)2 5.8 Spectra of the .Jacobian right-preconditioned by the exact n~rsion of the two-stage multiplicative operator. Consecutive :0;(1111(' I J in a ~impler form. the spectrum we apply the block \·ersions and multiplicative block abo\-e in cOllsecuti\-f> block job in clustering as it is depicted in alternate directly in the combinative extensions. by this opf>ration is difficult gin:'n that to compensate tilt' of the original to JD and omit preconditioner The reason fOl'lll. and for this is that for by its own l·,)8 preconditioning action effecti\"eness. to reinforce At the end of this section = the analysis of its this vie\ ..... For the ease of presentation If] we extend (I"""' we consider I;;' (1m -I o ) a so that , 0 \.1) ';"0 (r' ( -I = 0D) ( (Sf] Illryx nb '- _ the factored form of the block-partitiolled ~"b:7b DO), .fcc (.fcc) .fcp 0 - (.fc~rl.fl~(80)-1 (.fc~rl !5.16) lnbxnb ) x (.1.17) ( I",,", -I;;' (fEr' ) , o \rlwrp S'o = rllf'rf'forE'. .fg - J/~ (.fc~) if ,.0 -1 = (,.~. l'lbxnb .fL~ is the Schul' complement ,.~r is a given residual. the inexact t jOI\l'r1 hlocks <ls~()ciated to (.1.1 i) can be described Algorithm 5.5.4 (Block .) n' as follows. solving) l. Sol\'{-' J~q = ,.~and denote its solution by ({. = ,.~ - .1;:/1. :3. SO!\'e .S'Dp = Ii' and denote ;). Solve Jc~c = !J and denote 6. Return (fi.c). its solution its solution of JD with respect by by p. c. action to /~. of the parti- 1.")9 [f steps 1. :3 and ·1 are solved iterati\'ely il two-stage hea\'ily method, Obviously, Ilpon the con\'ergence preconditioner, and satisfied its dfirit-'ncy instead of via a direct the COI1\'ergence of the whole of each indi\'idual is dictated for every new outer inner H~ Jl, I .. , • ' ........ ' . ", "-' ' , ". l' ' assuming the blocks and I U '05 ";S ioner like this is cost Iy to implement linear with .S'D probably systems it is straightforward approximations neglecting . j 5.9 Spectra of the .Jacobian right-preconditioned exact \'{'rsion of the two-stage block .Jacobi operator. different to de\-ise the steps to (JD )-1, Discarding Jpc- and :2SGS to the solution /-p by the in our context. dense. However. for carrying we ohtain under lhi~ out the action the first step of Algorithm to be zero matrices It demands l)f .1..1.-1 and the two-stage block Gauss-Seidel (25G5) results from Jp.;- This reduces step :3 of .-\lgorithm .j.,j,,! for both 2SB.J preconditioner only the block are chusf'1l • _,~I9 the :'1)lutioll of three as a 1 """' ". -v 6~ .Jacobi r:2SB.J) tolerances this I • .•.It pr<'s('ntation Regarding I + :..... : ('It-'arly. a precondit dept>nds I .oZ>I" • .. Figure procedme I . ! 1_6,: lZI~' • we obtain iteration. I I solve, by the way in which :9, I method. whereas the two-stage of o .J ppp D = rn • Il:iO A more robust preconditioner of the Schur complement. order to maintain customary in Figurf's approximation .).1)-;).11. 
around \"'ot surprisingly. under reasonable that one eigem'alue relationship point these of the block subsystems an' two-stage 1)1' job of clustering from the rest as shown on Figure preconditioner preconditioners clustering by 2SB.J preconditioning. bet\'v'een the action on t he left half of the complex between it is does an even better separated resemblance [n costs. (1. 0) produced except for ol1e that appears of the multiplicati\'e are in\"ol\'ec!, :').9. we can obser\"e the significant the complex is also a certain matrix computational for exact solution [n Figure approxil11iltion to (Jfc)-l. the 2SGS block preconditioner the eigem"alues ;').10, There by means of a better all blocks of the original of tlwse pr('conditioners the f'igen\"alues and 8. where to prO\'ide a ~imple approximation Spf'ctrum shown this can be obtained plane. of this preconditioner although This which shall the latter fact illustrates become lea\"es the close more evident in t h,' next st'ction. Strategies soln>r this \·ariants. In the Schur complement C'FD problems. (·oncept .. -\ classical (>olllpi('Illt'nt COllT im'oh'ing with example respp.ct ril~t to How ill porous assemhled sol\'ing simultaneously not all) of the primary precol1ditioner to soh'e f'([uations .,addle inspired point nwHlcients media applil'i1t (see e.g .. [76]) ranging for incompressible by the Richardson the global disCl'etized of freedom Among formulations. departs solves the Schur III itp.ratioll. equation \lany is 11"\'('" \'ariat il)[\S we construct proposed from the discretization Ro\\'. The algorithm work unclf'r to with one or some (but \'ariants, method linear for each nodalunknowll associated the several projection arising algorithms which from solving separately by the discrete formulations lOllS. in several algorithm for fully implicit for all the degrees unknowns. segregated-type i~ the t'zawa to \'elocity and soh'ec! in its entirety are possible many have been employed a third by Turek [l:~q of ~a\'ier-Stokes from an approximation to Lo I the Schur complement component That given with respect to pressures hy the \'elocities (role represented is. to obtain the preconditiol1ed residual Algorithm 5.5.5 Discrete -, (-.fccD) - 1, Set 'J '. -' So!\e I and solves iteratively (Two-stage (1\. the hypf'rblilic by concentrations ('.,)1 we perform in our ca~('). the following ,I ('!>'; Projection) D _ 1, ~ (Ix) [.fppD - D .fpc (r',l' i'-ell• -1 ( J-D ec ) ] D. II' _ I. P - D rn -.lpc ( .Icc- D)-1 D D' ." r'L' Iteratl\el~. . Obtalll L'p. -1. Return i,e .. the preconditioned residual corresponding to (/'~. /'~). The idea behil1d this prE'conditioner some approximation to concentration on the Schur compll-'Il\f'nt tl) tht' [:\IPES philosophy is to give a sharper coefficients. with respect solution \Ye propose to concentrations than the Schur complement to pressures this algorithm to pressUl'f'S \\-hich would resoh-e l1lore accurately the coneent rat ion components. Throughout \\'(~n-'fer to Ihis preconditioner as discrete two-stage precondit to a\'oid an additional The first :itep in ,\Igorithm it(3ration J-:; I is introduced to solve .f,_~(1 = /'~, as suggestt-'d is chosen to \)f' computationally in\'erse of the diagonal to the identity in Figure ,1.'),·) -1.11. matrix part of .fe,- iu step ('Iwap. 1 of Algorithm Turek spectrum this \\-\)rk. ioner US D pi, and co"t h' ·1.-1,--1:. The operator [l:n] suggests (i.(' .. 
.Jacobi preconclitioner) in our case, The eigenvalue based since it is more closely rdat('d with respect projection giw'n that /-:::1 which clearly thus generated he t Iw rcc\1\C(-,~ is shown 1. ) I .~~ •• J I, J~~ I :01t I i ~~(~~ ~a~ ~d " ~~l I ~~l ;2 " 13 U Figure 5.10 Spectra of the .Jacobian right-preconditioned by the exact \'ersion of the two-stage block Gauss-Seidel operator. )0' 303, , , 'le2' 0' 0 jCl· 0 0 o· ~ -vOIo ,, ~C3' I~S '.' 115 12 125 . 1.J lJ5 1£ Figure 5.11 Spectra of the .Jacobian right-preconditioned by the exact \'ersioll of the two-stage block discrete projection operator. 5.5.5 Relation If _\[ indicates between alternate any of the preconditioners it is it is desirable • Continuity described predict when tilt' :-;ymmetric 1- the follo\\!ing error bound part of JJf-1 the preconditioner \\'a.\- tlll'\' Iwlp the con\-prgpnce h'idently. pl'l'COllditioner there -, matrix./. (1 + 17)2 the smaller 17 is expected al1d hetween lIlt'thod. machine the multiplicative "hara('terizatioll is absent (recall factor) Theorem • to be. In this Sf'nse. it is fll1d in what For the silkp of ~il11plicit.\·. \\'t' precision) the additiw t-'\'ery inner rpdllCt'd and block alld Gauss-Seidel at the first step of Algorithm a similar definite d<:'\-eloped herE' are related Ito between (convergence -117 ) of til(-' itt'riltin> is a relation is positive (1+17)" we are able to sol\"(> eXiletl.\' t ionf'1'. By looking '1'Ilforlullatl'ly. (1-17( to ob~erve how the preconditioners ClSSlllllt-'that for the .Jacobian or bOlllldlless: :!.1.1 )'1: importallt abo\'e two conditions These two abo\'e propertif's The better forms that which implies the following fur (;\IRES and consecutive Jacobi two-stage t\,,'o-stage precondi- ;).,j.l we can see that for BiCGST..-\B. the solution l()b for the two-stage additive .J. then job preconditioning a comparable effect. original .Jacobian 1m\' error frequencies matrix less compact than justified algf'braic instead a bettt'r in higher t he behavior propertit"s of taking Although all Consequently. the decoupling global or at least. operation 1f'<\\'t'S .\I for preconditioner expected to eliminate two-stage those iterations. :'\ot preconditioner art" provide the desired method, M seems to be it still has to capture from a linear (recall effectiH'lless spectrum pictures easier rt'c\ucecl concentration problem part of the whose block in Figures problem .).2--).G). obtain{·r\ 1)\- strategy. The last poil1t can be made Eadd .fI has a better the use of the operator in concentrations it difficult doE's a good c counterparts. combinati\'e contained +t and concentration ,\[ may e\'entually cost. ath-ant age of decoupling that an efficient from pressure consecutiw original make expect plots for the alternate a more elaborated but at a significal1t hyperbolic their If tp operator. which is additionally remaining the spectrum better we should task of finding surprisingly'. Of course. propagation :'\ote t hat the omission one with the difficult t he error more precise by looking (I - I\[-I) [I - ](/" + fJ] = ( I-./.\I---1)[ I-.J -0- 1/;>+1".1- ,(- at the computation of = by taking norms ./-./ -D) (tp+lc.lJ' 1 we obtail1 (,').10\ where I = III- I\[-III: 'J = II (.J a - .J D) = IIEs.!11 = (t p il/- + t c) II : .ID.\Islll. IG, Equation (.1.20) shows mate ilS il .\ddit ionally. 
Equation (5.20) shows that eta alone is unlikely to estimate the final error well as a result of leaving off the preconditioning factor: the term gamma, introduced by the global preconditioning and decreasing as M improves, has to compensate for it. However, the effect of the decoupling, contained in the decoupled hyperbolic part, is intended to be smaller than the overall error propagation of the global problem. The use of penalty terms for enhancing the concentration residuals seems to be justified only in special cases: under simple representations it demands lower computational cost, but reliable information can be lost. An acceptable form of correction could be a coarse version of the original discretization, as in a line correction method; unfortunately, reliable coarse meshes for hyperbolic problems are not easy to obtain, and within sequential and parallel implementations the operator should be designed with more relaxed demands. We believe that better bounds can be obtained by incorporating more information into the decoupled blocks and by improving the solution of each subsystem. Finally, we remark that Fan et al. [60] have just recently arrived at similar observations in the context of microelectronic device simulation: they experimentally observed that further preconditioning after a decoupling stage with the ABF transformation proposed by Bank et al. results in significant improvement compared to block preconditioning alone.

5.5.6 Efficient implementation

In this section we propose several strategies to enhance the computational efficiency of the two-stage preconditioners. In order to decrease their computational requirements we can use the old but still effective method of line correction, applied to the decoupled Jacobian. We can certainly precondition these systems with Krylov-subspace methods and with a variety of domain decomposition algorithms, both overlapping and non-overlapping, e.g., additive and multiplicative Schwarz, in particular with a block-type (e.g., block Jacobi) setting involving a coarse-grid component (see [37] for a detailed treatment of the algebraic convergence theory on the subject). Results formulated for M-matrices, which are diagonally dominant, are applicable to the non-symmetric or indefinite convection-diffusion systems arising here. Additionally, overlapping domain decomposition methods can be used to precondition the 2-D problems arising from the line correction and can be implemented in parallel; very robust non-overlapping preconditioners are also appealing for systems in 3-D, although, contrary to overlapping schemes, their success for non-symmetric and highly convective problems is uncertain.

Chapter 6

Computational experiments

This chapter encompasses the numerical experimentation with both the Krylov-secant methods and the preconditioners for coupled linear systems. The first two sections are devoted to analyzing the performance of each one separately. The last section of the chapter introduces ideas combining these two approaches in a parallel black-oil reservoir simulator described by Wheeler and Smith in [142] and later improved by Dawson et al.
in [10].

6.1 Evaluating Krylov-secant methods

In this section we present numerical experiments to illustrate the effectiveness of the Krylov-secant algorithms devised in this thesis, namely, the nonlinear KEN algorithm, the higher-order Krylov-Newton (HOKN) algorithm, and the HKS-B, HKS-EN and HKS-N algorithms. The discussion begins by reviewing the extended Rosenbrock function and Powell's singular function of Chapter 3, together with two levels of difficulty of the Chandrasekhar H-equation. Two additional problems serve for these tests. The first of them is a nonlinear steady-state temperature distribution model in two space dimensions known as the (modified) Bratu problem; it is included here because it has been used repeatedly as a test bed for inexact Newton methods [20, 56, 108]. The second example involves a simplification of Richards' equation, which is used to model groundwater transport in the unsaturated zone. This time-dependent model in two space dimensions serves as a window to observe the Krylov-secant algorithms in action for underground simulation, and should prepare the ground for the parallel two-phase reservoir simulator discussed later. We believe that this (or a similar) type of algorithm should benefit reservoir simulators in use by the petroleum and environmental industries.

6.1.1 Preliminary examples

We present experiments with Newton's method, the composite Newton method, Broyden's method and the nonlinear Eirola-Nevanlinna algorithm, compared throughout this section with the Krylov-secant algorithms proposed in this thesis. All of them were employed in their inexact versions, in conjunction with the line-search backtracking globalization and the forcing term criteria described in Chapter 3. More specifically, the Jacobian equation is solved by GMRES each time (replaced by Richardson iterations in the HKS algorithms), and a tridiagonal preconditioner was employed at every step to accelerate the rate of convergence of GMRES. All numerical experiments in this section were run in Matlab 4.2c on a Sun SPARC workstation 10. According to their order of appearance, the nine methods are categorized as Newton-like type (those that evaluate the Jacobian at each nonlinear iteration): Newton's method, the composite Newton method, the HOKN algorithm and the HKS-N algorithm; and secant type (those providing an approximation to the Jacobian): Broyden's method, the nonlinear Eirola-Nevanlinna (NEN) algorithm, the nonlinear KEN algorithm, the HKS-B algorithm and the HKS-EN algorithm. We rather measure work in millions of accumulated floating point operations than in iterations; the reasons will become clear shortly.
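The Chandrasekhar H-equation used as one of the test problems above is standard in this literature (see, e.g., [108]). A minimal sketch of its midpoint-rule discretization is given below, which one could hand to any of the nonlinear solvers being compared; N = 100 and the fixed-point demonstration are illustrative choices, not the thesis setup.

```python
import numpy as np

def h_equation_residual(H, c):
    """Residual of the Chandrasekhar H-equation discretized by the
    midpoint rule on N points:
        F(H)_i = H_i - (1 - (c/2N) * sum_j mu_i H_j / (mu_i + mu_j))^{-1}.
    Difficulty grows as c -> 1 (c = .9 and c = .999999 in the tests)."""
    N = H.size
    mu = (np.arange(1, N + 1) - 0.5) / N           # midpoint nodes
    K = mu[:, None] / (mu[:, None] + mu[None, :])  # kernel mu_i/(mu_i+mu_j)
    return H - 1.0 / (1.0 - (c / (2 * N)) * K @ H)

# A few fixed-point sweeps from H = 1 show the residual norm dropping
H = np.ones(100)
for _ in range(50):
    H = H - h_equation_residual(H, c=0.9)
print(np.linalg.norm(h_equation_residual(H, c=0.9)))
```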
Figures 6.1, 6.2 and 6.3 show the computational work, in millions of floating point operations against relative nonlinear residual norms, for each one of the methods discussed in this subsection.

[Figure 6.1: Performance in millions of floating point operations of Newton's method (dash-dotted line), composite Newton's method (dashed line) and the HOKN algorithm (solid line) for solving the extended Rosenbrock function, the extended Powell function and two levels of difficulty of the Chandrasekhar H-equation.]

Comparison of the set of Newton-like methods is given in Figure 6.1. The extended Rosenbrock function is definitely the most difficult case: it requires several backtracking steps before entering the region of rapid convergence. Here the clear winner is the composite Newton's method, which is incidentally the one that converges in the fewest number of nonlinear iterations; the HOKN algorithm spends about the same effort as Newton's method, due to the poor Krylov-Broyden steps performed by the latter. The reader can confirm the same trend in Figures 3.1 and 3.4. The extended Powell case reveals the great potential of the HOKN algorithm: the four consecutive Krylov-Broyden steps drive the nonlinear residual norms to zero much faster than even the composite Newton's method, which is theoretically q-cubic locally convergent. Note that the composite Newton's method reflects the same trend seen before, even though it requires two GMRES solutions per nonlinear iteration. In the Chandrasekhar H-equation case, increasing the difficulty of the problem favors the HOKN algorithm; this underlines a certain superiority of the algorithm in handling harder situations, a favorable circumstance that comes to light also in Figures 6.2 and 6.3 whenever a Krylov-Broyden step is performed.

[Figure 6.2: Performance in millions of floating point operations of Broyden's method (dash-dotted line), the nonlinear Eirola-Nevanlinna algorithm (dashed line) and the nonlinear KEN algorithm (solid line) for solving the extended Rosenbrock function, the extended Powell function and two levels of difficulty of the Chandrasekhar H-equation.]

Figure 6.2 shows the performance of the second group of methods, the secant-like type of approaches. Once again, the extended Rosenbrock function is handled poorly: the Krylov-Broyden update is slightly less effective than Broyden's update, which explains in part the wasted effort displayed by the nonlinear KEN algorithm compared with Broyden's method and the NEN algorithm. In general, the nonlinear KEN algorithm is very effective for small nonlinear tolerances and in dealing with the extended
in tllf' nonlinear about step towards than the ~EN at small tolerances the nonlinear system The additional corresponds Ext. Powell 10-2 10-2 Z 10-4 ~ ~ 6 j _ I' 8'10'· ~ ~10-· \1; J2 l' II 10-12 \1 o 2 I 4 MFIOp Chandrasekhar >0'1 z 10" a: ~ 10-'2 o 2 4 Chandrasekhar I I' ... ... 6 1.5 [c=999999] 1~ >0-' Z , 0-' a: I I:::: I " \ \ ... 1 MFlop 10-12 [c=,9] ',10-'0 0.5 I \ MFlop \ ',',> , 10.,ot ',' 0 - _...- ... I::: I '~ 10-10 6 I ir 10- I" 10-'0 1~ I 8 ~ 10. 10·" 2 0 1 2 MFlop at suggests 1~ Z 10-· (lnd algorithm solution. Ext. Rosenbrock 1~ lIol .Jacobian In this case. every linear of work. is method c = .9999999. the same amount KE:\ algorithm I1WI hods of Broyden's case with as the best choice. implies of these in solving the associated better The Iw otlwr t cost. portion perform outperforms behavior .9). The plateau methods nonlinear l\rylov-:\ewton computational is due to the difficulty both Illethod h:EN algorithm 3 Figure 6.3 P(~rr()nnance in millions of floating point operations of HKS-B idash-dotted line). HKS-E:\ (dashed line) and the HKS-:\ (solid line) for so!\'il1g the extended Rosenbrock's function. the extended Powell's function and t\\'o lew'ls of difficulty of the C'handrasekhar H-equation. to i:)' 1,6 The last set of methods. depicted in figure dle the extended algorithms. i.e .. those alternatively using Richardson 6.:3. The failure of Broyclen and Krylov-Broyden Rosenbrock function produces no clear distinction However. the lise of a cheaper Richardson sa\'ing in million of Hoating point operations primarily based on G\IRES. iteration in comparison iteration. art' steps to han- among t.he t href' explains the slight to a :'\iewton's method The interleaved action of Krylov-Broyden updates pro- duces a stairway shapf' in the convergence rates of all methods for the extended PO\\'f'1l function. This indicates a loss of convergence rate each time such update is executed, Despite this. the HKS-~ algorithm :jOCX of the computational and HKS-E:'\i algorithms is able to outperform Newton's method in almost effort .. -\ similar observation can be made of the HKS-B with respect to Broyden's method and the NEN algorithm. Ho\\·e\'er. the HI\S-E\' algorithm does not take advantage of the Krylov-Broyden step in the same way that thl-' Ilonlinear KE\' algorithm. The success algorithms l)f the Richardson illt roduces additional iteration at the first steps of HKS-B and HKS-E\' savings with respect to the corresponding parts 1-3royden's lllethod and the \E\' still I he algorithm. COlll1tf'r- Howe\'er. the KE:'\i algorithm most et-ficit>ntamong all. for the t'handrasekhar H-equation. i~ this grollp of HI~S methods I)('rformed modestly \\·ell. TIlt' reader call obserH' that the HKS-\' algorithm is hardly more efficient than \'t'\\'tl)l\'S method in the easy case. Additiollalh·. the HKS-E0i is competiti\'e with Broyden's method. especially in the hardest However. the performance of the HKS- B is disappointing of iterations iteration. in soh'ing the linear systems case. due to an excessive number with both G:\IRES and the Richardson [-;-7 6.1.2 The modified The moclified Bratu Bratu problem problem is gi\'en by 11 V 2 all II II This problem tor design and phenomena. respect to threshold solution =a plays an important processing and In the absence and hence II u on represents :S ,\ •. an. a simplified of the convection it always n. 10 role in combustion term, has a solution \'alue .\. for which the equation for .\ . 
This problem plays an important role in combustion modeling and in semiconductor design and processing, and represents a simplified model for nonlinear diffusion phenomena. In the absence of the convection term the operator is monotone: the equation always has a solution for lambda < 0, and when lambda > 0 there is a threshold value lambda* such that the equation has no solution for lambda > lambda* and at least one solution for lambda <= lambda*. For more details we refer the reader to [70, 92] and pointers therein. (The actual Bratu, or Gelfand, problem was considered in detail by Glowinski, Keller and Reinhart; see [92].)

We solve this problem in the unit square Omega with homogeneous Dirichlet boundary conditions. The problem is discretized by a block-centered finite-difference scheme, and no upwinding was used for the convective term, as suggested in [108]. The Newton step becomes harder to solve as alpha and lambda grow; the coefficient values used here follow [108], except where indicated. A block Jacobi preconditioner (with blocks of approximately equal size) was used for the Richardson iteration, a Newton tolerance of 1 x 10^-12 was used, and the linear solution tolerances were computed by means of equations (2.6) and (2.7). A sketch of the discrete residual appears below.
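The following is a minimal sketch of the discretized residual of the modified Bratu operator on the unit square, using centered differences with no upwinding, as described above. The specific values of alpha and lambda in the example are illustrative only, and f = 0 is assumed.

```python
import numpy as np

def bratu_residual(U, alpha, lam, h):
    """Centered 5-point residual of
        nabla^2 u + alpha*u_x + lam*e^u = 0
    on the unit square with homogeneous Dirichlet data (f = 0) and
    no upwinding of the convective term."""
    Up = np.pad(U, 1)                              # embed the zero boundary
    lap = (Up[:-2, 1:-1] + Up[2:, 1:-1] +
           Up[1:-1, :-2] + Up[1:-1, 2:] - 4 * U) / h**2
    ux = (Up[2:, 1:-1] - Up[:-2, 1:-1]) / (2 * h)  # centered first difference
    return lap + alpha * ux + lam * np.exp(U)

n = 40                                             # interior grid blocks
h = 1.0 / (n + 1)
U = np.zeros((n, n))                               # initial guess u = 0
print(np.linalg.norm(bratu_residual(U, alpha=20.0, lam=1.0, h=h)))
```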
gence C[lIalit.\· of Bronlf'n's the HKS methods method promise to approximate with the added savings st em1l1 i II!.!, from t he fact that u pc\at t-'S are performed the ('<)lI\'l'r- in Hoating point operations on a mat rix of considera hly lo\\"pr order. all the other 1)\' the HOI\)i I ion. ilS hancl. \\"e can obser\"E' the savings algorithm compared it has been obsen'ed ~teps than the composite Ilumlwr of nonlinear \e\\'ton's nonlil1ear residual deli\'ered by the composite terms :\'e\\'toIl's in the HOI\::\' to remark that 1Ilg; in sonw situations. \lot !1('c('ssarily \)pl'rilli()n~ (;\IHES method grows quadratically l"sually. "olul ion is approached residuals this number due to the point norms delin'red by Sroyden's t crill-'ria operatiolls by the KE\ below. algorithm [,-I!)]. iterations method (in it is may be decei\'- G:\IRES iterations does of floating point taken in a particlllClr is higher as the nOl1liIlt'iH tolerances (i.e .. decrease of 'lk! This fact shall be important of the HOI~\ Finally. The difficulty On this matter of iterations i!.!,htt-'ning;of linear the one and the nonlinear \\"e remark are also smaller that to I~E\ the nonlinear than those producl"d met hod. The quadratic more by G\IRES). of lillf'ar iterations than :'-rewton's work since the number for the COI1\'ergf>I1Cl" analysis in tp.rms of floating than of G\IRES the number hy the Eisf>l1stat and \\'alker take into account 6.4). \\"ords, more accumulated with the norm of the final (see e.g .. Figure employed hettt"r spend the same smaller higher In addi- to generate both basically that iterations method, of magnitude \\"as slightly the o\'erall\lllmber In other ~ewton's to remark orders impl,\' more computational ,",olulion, p\'t'snilwd Although, is se\'eral G~IRES has the potential it is important of thl" 111lllllwr of linear iterations important I'omes the HOK~ method. in the HOI,~ of the linf'ar systems with the composite before, iterations. in accumulated gro\\'t h of the number pronoullced as the problem of floating point operations size increases. This in G\IRES implies that lw- Sil\"IIlgs 1,--0 in operations relatin:, also grow quadratically number of linear even though itE'rations among the table shows almost tilt-' ~C1llle all methods. Modifi6d Bralu Problem - - Newton - Camp" Newton -HOKN HKS_N \ \ \ \ \ , , , , , \ , \ , , , \ 0.5 Figure 6.4 (R\RY) '_5 3 2 Iteration of nonlinear to COII\'l'rge to the solution .. -\5 in the example as t!1l' best ill terms of total \\"c can obsen'e that the composite highf'st nonlinear method and that to the nonlinear iteration by the number of nonlinear \cwtOI1's iterations nOl1linear of the :\E); KE:\ will make the HKS-E:'\ \orms on a for all method~ methods appf'ar ... \e\\"ton's I1wthod iterations. method. algorithm \ote taken higher-order iterations The that than HKS-\ amol1g all. [n a similar algorithm. KE:'\ algorithm. iterations cases. HOI~\ takes less nonlinear htlt more than CUl'\'e described 4 \onlinE'at' iterations \"s. Relative \onlinear Residuals of \ewtoll-like methods for the modified Bratu problem -1:0 x lO grid, Fi!!;1Il'f's h. f and 6.:') show the number numlwr 3_5 algorithm fashion. takes the the com"ergC'n('(> falls between that of Broyden's the HKS-EN performs similarly so that one may expect the use of the Richardson algorithm whene\'er more efficient Richardson suc- 1';;1 ceeels at every attempt. this last observation The applies HKS-B mimics the behavior of Broy'den's met.hod. ~o as well. 
Modified Bratu Problem - - Broyden - - NEN -KEN HKS_B HKS_EN 2 3 6 4 Iteration Figure 6.5 (R\R\') \'onlinear iterations \·s. Relative ~onlinear Residuals ;'\J'orms of secant-like methoels for the modified Bratu problem on a -to x 10 ~rid. Figure 6,.) calibrates once more the quality pared to tlw well known Broyden algorithm and Broyden's dates are restricted the intermediate rithm and faster beha\'ior KrylO\'-Broyden performs versions method that Kr,\'ll)\' hasis. of the nonlinear algorithm. updates. the method. {'nder \'E\ performs updates. mill- Broyden's Ill'- the sam!' light. we can explain I\:E:'-i algorithm KE\ upelate of cur\'f'S betweE"n the HKS-I~ not much is lost when ThE" nonlinear o-nly l\:rylov-Broyclen of Broyclen's The closeness suggt'sts to the current and the HKS-E\ HKS-E\ npdatt'. of the KrylO\'-Broyelen hetween algorithm only Broyden the \E.\ alternates updates All three share the feature "I!!;l)- Broyrlf'n and. tlue> of being .\S Iwfore. measuring nonlinear iterations floating point operations instead of number of numher of provides more conclusi\'e insights. t he computational Figures 6.6 and 6.7 illustratl' efficiency of the new methods. Modified Bratu Problem 10° 10·' 10-2 3 10- ~---,- ! - ,-, r 10-4 ~ , ~ F , , ~ 10·a E 10.0 10.7 10.0 U - . Newton - - ::,JI0 Comp_ NeWlon :~;~N 2 4 6 8 10 MFlop 14 12 16 20 18 Figure 6.6 Performance in millions of floating point operations of ~e\vton's method. composite .\ewton·s method. the HOK.\ algorithm and the HKS·.\ algorithm for soh'ing the modified Bratn prohlem on a ,-1:0 x -1:0 grid. Figure (-j.Gshows how the HOK.\ algorithm outpprforllls method. The penalty introduced \('\\·toll·~ in soh-ing two linear ~ystems with G~£RES with the latter method spoils the nice capabilities provides higher conn~rgence the composite rates without suggested incurring in Figure 6.-1. The HOI\:'\ in such penalty. Although. it may not he as effecti\'e as the composite :'-iewton's method in driving the nonlinear residual norms down. it saves a sensible amount of computation l)f the underlying Krylov information. the Kryiov-Broyden step deteriorates In this particular by taking advantage case. however. the quality of as the solution is approached. making :\ewtoll'~ method more efficient for nonlinear tolerances in the order of LO x 10-7 which ma.\· be considered fairly small final I\rylov-Broyden Richardson iteration in most practical steps explains situations. the poor results was always able to converge good as those deli\'ered The lack of success of the HKS-E); but the nonlinear of algorithm. tllP The steps were not il~ by G\IRES. Mooilied Sratu PrOtllem r I - , Sroyden 1- - NEN '-KEN II . - HKS S - HKS_EN 10 5 20 15 MFlop 25 30 Figure 6.7 Performance in millions of floating point, operations of Broyden's method, the nonlinear Eirola-:\'e\'anlinna algorithm, the nonlinear I\E\" algorithm. the HKS-B algorithm and the HKS-E\" algorithm for soh'ing the modified Bratu problem on a -lQ x 10 grid. Figllre 6. j' ~hows a much closer t hey were less effect i \'e than linear step in more than l)elS This fact stems primarily. of C:\IRES to become t hose met hods evaluati ng the .J acobian ,jO% of computing based on Krylov- Broyden met hod tends methods more efficient from the increasing from the increasing iterations rc-'st'l11blaIlCf-'ilmol1g all Sf-'('ill1t methods. Secondly, yield the desired at small deterioration difficulty (see Table work. relative provides s norms. update. 
The significant a more consistent meth- Broyden' residual of the KrylO\'-Broyden llOll- secant pay-off although of the linear systems. 6.1 above) at, e\'ery the faster nonlinear Firstly, but savings behavior of I ~-l computational effort against the HKS-B algorithms In general. ~E~ compared the contrasting algorithm con\'erging is amazing picture nonlinear :\ewton's on the modified makes appropriate an analysis ilnd. tridiagonal IlXI()) Both ii.p. incomplete block-.Jacobi linear iterations G:\IRES U' and Richardson In this sense, the new the computational we present matrix system some indefiniteness.) cost of the -l we de\'oted methods of the associated preconditioners: appear point proposed .Jacobian .Jacobi preconditioners to achien' (in the HI,S algorithms) precol1ditioned for how the precondi- for all the Krylov-secant They also produce not positi\'e rate rna,\" /Jot linear of this kind here. quitt' poor in this casf:'. due to the strong inherits convergence (i.e .. diag;onill with ;3 and \\'ith no infill inside the matrix operator the the most expt-~lIsi\'I' (see Table 6.2). In Chapter block-.Jacobi preconditioner and t.hat make them attractive problem. of difficulty preconditioners for all methods. theoretical exceeding Bratu of standard preconditioner. method 6.-1:-6.7. From being the methods a balance of all methods high degree a family and method. The We consider ",;(·itlill~). without to the use of preconditioning in this dissertatiol1. :\"ewton's implementation. maintain rates lionel' affects the conw'rgence systems of the composite of a computer and Broyden's To end the analysis of the HKS-E:'\ steps t.hey go to being almost algorithms conw'rgf:'nce norms KEN algorithm. whel1 one looks at Figures in terms family of Krylo\'-secant a discussion residual two ext remes show how a rapid sOllnd as promising traditional nonlinear to the nonlinear in the fewest nonlinear to rese, These faster relative hlocb bandwidth). the lowest number the lowest accumulated iterations. --I: ~ote that of total number of the ILt'(O) is cOI1\'ecti\'e part that makes the inverse of till-' stable. (Consequently. lie on the left side of the complex some eigenvalues of [he plane and the preconditioned Table 6.2 Summary of the total number of linear iterations shown with se\'eral preconditioners for all nonlil1ear methods. The quantities in pilrel1theses indicate the number of Richardson iterations employed by tht:' HKS methods. The problem considered is on a -1:0 x 4:0 grid. I \I('thod \ewton Comp. \ewton HOI\:\ HKS-:'\ .Jacobi :120 ,1:07 :3:)9 :t21(:3~)) I I Broyden I ~E~ ' Ul -t:31 :ri.) I\:E~ HI\:S-B HI\:S-E:\ Tridiagonal 98 1:31 ~:3 64:( .12) 1.')6 . preconditioners. Recall hetween that rf'sldt ill railure pn'cuIHliriollf'l'. of nonlinear I j 28( 19) 2.1( 16) 1:38 17:3 16:3 9.1(68) n6 260 269 linear iterations ·n 66 74, 87 4,:3(4,1) 4:5(:32 ) and preconditioned effectiw'ly are applied ! 1 I 1 i i II I for different to the precondi- II FII system . 
.\ large inconsistf'lIcy (recall then' are sllllllllarized \Verf' process according 110 differences in Table in §§-t.2.2) ma\' discussion as the nonlinear reduc<' consistently to add that (which i 127( 108) 1:30( 8:3) of the methods updates I and that therE' is no \-'v'ayto reflect (at least in terms iterations It is worth ;')8 78 90 4:8(32) :39( :3:3) al1d the fixed preconditioner in reducing tablp shu\\'" lhat I -lQ I .J KryIO\'-Broyden cost) the llpdated this system [LUIO) 6.2 is to show the stability tiolled ~ystf'm soh'ed by G\[RES of cOlllputational B.Jacobi( 4:) 178 186 80(66) 86(,:)7) 207(169) 220( 1;')2) The main point of Table B.Jacobi(8) H 62 .')2 ach·al1ces. to the quality TII(, or lilt' in the totallll\llllwl' 6.1 for this problem size of -to x -to.) 6.1.3 Richards' This example III a \-ertical equation problem models cross-section. . region betwf'en the ground the infiltration of the near-surface This is a rase of unsaturated surface and the water table. flow that underground takes i.e .. the so called wile place in tilt' t'adOSf -:;OIlf. \\"Iwrp f! a nel parE' t he ground water capilla ry head and densi ty. respect ively. different functional cients the subsurface Oil and hydraulic forms are often used to describe \vater content. conducti\'ity the dependence For this example, HOWf'H'r. of both codfi- our choices of dispersi\"ity are. respectin>ly. I [{(c) = [{oel. and wil h .- C - Co Cs - Co (' \\"here Co is the irreducible dependent coefficient underground whose nonzero [\'o(i.j) = ~ 1- This choice of [\"0. and. represents although a narrow The hydraulic 1110\'('. I,d,(' distances ('1/ = Figure in underground n.];), domain The solution I I shows for +1 is sometimes at saturation formatiol1s. for tlw found in underground rock where the moisture is proportional of magnitude \-alue of the irreducible for a mesh of 16 x 16 at the 151 the effect of the heterogeneity content formations is allowed to to the rock permeabilwithin [n our computational lIlUistmE' to 1 ~ i,j ~5. o\"er a few orders which reprE'sE'nts a typical 6.S slll)WS the solution dimensional J chanl1el of pernwable conducti\·ity 1\0 is a position and the tensor \"alues have been chosen according contri\·ed. ity. which has been shown to change ~hort water content relati\"(~ly experiments watf:'r content. distribution the O\'et' W(' St'(' 1\\"0- and 1000th time steps of simulation. in the resulting subsurface water was chosen small content. "-\ constant t'!}()11gh ilnd time step to allow the inexact was used for these simulations. which ~ ewton met hod to can verge \\"ithin gi \'en by ~t = ~h2 16 . -to n0l11inear iterat ions This small time step was required [net as an acceptable Figures initial in order guess for the nonlinear 6.9 show the dispel'sivity for the same discretization ,;teps. Figure 6.10 sho\\'s the distribution geneity the interval scaling of having in t he scaling time o\'er the two-dimensional £10- 6.8. at the pi and of the transport of the model. (0 ..j) as a result is hidden mesh as in Figure to give the reader and nonlinearities of the previous iteration. D(c). coefficient. malll. Bot h figures are intended to use the solution The coefficients scaled of the spatial and instf'ad. effect of hetero- are shown l\(c) both l\(c). coefficient. a feel for the combined 1000th timt' to vary within D(c) by [(O.max. This coordinates. 
Richards' equation 500 ,,'" 450 II 400 " ," ~" 350 II " ," ~" 300 0- d:: ::;; " 250 " 200 I- - 150 I- - 100 Newton Comp_ Newton -HOKN 50 HKS_N I..;" o'-=0 10 20 30 40 50 TIme step 60 70 80 90 100 Figure 6.11 Performance in accumulated millions of floating point operations of :'\ewton's method. composite \'ewton's method. the HOI\:\' algorithm and the HKS-~ algorithm for sol\'ing Richards' equation. Figure 6.11 and for all the nonlinear 6.12 show methods the accumulated (million) as t he simulation progresses floating point operations up to 100 time steps. fOf I Jl) f Richards' equation 500 450 400 350 300 , CL £ 250 :::;; ! 200 - - Broyden 150 - - NEN -KEN 100 HKS_B 50 HKS_EN -+- 0 0 10 20 40 50 Time step 60 70 80 100 90 Figure 6.12 Pf'rformance in accumulated millions of floating point operations of Broyden's method. the 110nlinear Eirola-~evanlinna algorithm. the nonlinear [\:E:\ algorithm. the HKS-B algorithm and the HKS-E~ algorithm for soh-ing Richards' equation. a c1iscrl'tization ('xhibils lIlt> mesh of :32 x :3:2. computational TIlt' 1I0h~:\algorithm methods. (see Fi~lIrl' Ii. I I ). TIlt' incr('ilsing ach'ances operations. 110111illParproblems The CUf\'e clearly shows a signific'i1llt ~i\\'ing in computational as simulation of HO(lting point was Ilsed. cost trel1d of allllol1linear to ('nd of t his ~hort simulation lil1t'ar problems :\0 preconditioning This producps growth cost from ~ta('l difficulty a slIpt'r1il1ear growth or tilt' in the nllllllwl' is not only due to the complexity but also to that of the linear problem, 1101\- This is an example of t lw wllf'l"l" t he region of rapid com'ergence is far from the initial guess given at e\'ery time step. causing to \"ewton's before can unexpf'cted be obsern'd toward difficulties in Figure the solution. making 6.12 that them secant method methods more preferable than reaching produce :\ewton that region. more efficient [t steps type of approaches. The HOKN algorithm ~teps per nonlinear E\ delivers between one and two successful Krylov- Sro~'dpl1 for this problem case. :'iote. ho\.... ever. that the HI\S- algorit hm is more efficient than this algorithm during the first :30 time steps. Throughout iteration t.he whole simulation. the HKS-EN algorithms efficient than \ewtol1's method and the composite substitution by Richardson of G\IRES iterations. turns out to be mon~ ~ewton's method owing to till" However. in the absence of that beneficial secant step it shows a similar order cost to that of the other t.wo nonlinear methods. Figure 6.12 shows again that the nonlinear KE\ close competitors. perhaps algorithm \....ith a marginal advantage and the HKS-E\ for the latter. In this case. the HI\:S-S performs badly as result of a sequence of poor Krylov-Sroyden somehow are correctt>d ill the HKS-E~ algorithm. a clear winner betwef'n Sroyc\en and the \E~ \ewton's are steps that Also. there does not seem to he algorithm (as it also occurs betweell method and the composite :\"ewton's method) but. both the nonlinear KE\ and the HKS-E\ algorithms Table 6.3 summarizes perform hetter yet. convergence of the previous plots. The table confirms the excessi\'e work (in terms of nonlinear tlw composite \ewton's iterations) method and the HKS-\ algorithm and all secant methods. bel' of nonlinear iterations number of linear iterations. carried out by :\ewton's algorithms The composite \ewton's of Sewton's method. 
compared to the HO!\:\ lllf'thoc\ hah'es the Ulllll- method but both spend ahout the same totid The figures for the HOK\ algorithm perfectly justify what is obsen'ed in Figure 6.11. It ['ec\uces in half t he number of nonlinear iterat iOlls taken by the composite \ewton's method and. besides. it reduces by an almost -l:-fold the total number of linear iterations with respect to this higher order method. the HOI\::\" algorithm not only tackles efficiently the nonlinearities much easier linear problems that arise at the neighborhood Hence. but also leac\s to of the solution. Table 6.3 Tot.al of nonlinear iterations (~r). G:\1RES iterations (GI) al1d (\\'hen applicable) Richardson iterat ions (R ich) for inexact versions of se\'f~ral nonlinear solvers. The problem size considered is of size 16 x 16 gridblocks after 100 time steps of simulation. ! \"1 :\lethod \ewton Camp. ~ewton HOK:'-i HKS-:\ Sroyclen :'\E:"i KEN Hr~S-B HKS-E:'-i The \EN algorithm Sroyden's method putational efficiency nonlinpar between The obtalOed iterations HKS-B algorithm iterations. compared o\"erwork of Richardson C:\lRES iterations. iterations corroborate displayed 0 0 0 0 0 0 0 0 0 the HOD number of linear of nonlinear iterations KEN algorithm than \ote to Broyden's accounts 1\:E:\ algorithm method: the relative more nonlinear that did not alle\"iatt' those of last section in that the combined conH'\'!!;t'~ high cost of the iterations number all of G:\1R ES results of linear and relyil1g on in the number These lilt, work inducl'd the cost of merely algorithms. by the situation the additional exhibits of but n'(luc('~ in In this particular compensate and HKS-E~ is marked by com- number that the HI\:S-EN algorithm of this table is the reduction by the HKS-:\ shown for the similar but its efficiency iterations. The table dearly iterations takes an intermediate the nonlinear iterations itpratiolls One of the other highlights 422 2'j;j 7 :3078 19:39 0 0 0 11:309 0 0 0 2909 1774 of (;:\IRES via Richardson by ext ra nonlinear 11890 1:2186 2673 6091 4046 displaid" i'Oll2;hly 129(· the total number savings 162/ 8:35 :391 1622 6:31 :J47 these two algorithms ft'\\'f'r Ilumlwr of linear iterations in a few more nonlinear Sacks .. '6·j but t he number iterations Rich. . 499 also hah"es of both" G1 appear to iterations of HKS algorithms :\ewton's 6.2 is approximately and Bro.n[('I\·s of C\[RES iterations in the methods. Evaluating preconditioners for coupled systems [n this section we discuss t he results and 6 ..1. which were designed previously covered The matrices and right hand 1.1 times grid spacing The data higher than for the tests :3 \f'wton comhinations iteratiol1s 1)2,:') \[Hz shown in Tahles 6.--1 for coupled systems that problems to the description and one injection The permeability in the vertical discretization sizes: 8x 8x given \'ertical is uniform direction. was downloaded in \vells in the areal We use non-uniform --1 and within from the simulation the cmrent time lew\. 16 x 16 x --1. \Ve ran and preconditioner tested These nodes g;i\'e a peak after 1 time step and The code including was written were run on a single node of and 18\[ clock). were generated ~t = 0.1. 1.0 days. of linear soh'er and all of the tests according of one production of the reservoir. and two different for our test ParSim corners hoth cases with time steps wit h a side \'ectors black oil simulator at opposite sel1se and experiments to test the ideas on preconditioners -1. Our test model consists Chapter located of the numerical in this work. 
hy the two phase after equal to the number in FORTR.-\:\ SP1(RS6000. performance all the 77 model :370. of 12.1 \[F\ops al\d have 128 \18 of R:\\I. The tests included with runs made with hoth G\1RES each of the schellles conditioners tridiagonal. Table analyzed in this work and. of COlllmOIl use in reservoir 1Le(O) (i.e .. incomplete 6.--1shows the results Table 6 ..) shO\vs the corresponding and BiCGSTAB simulation LV factorization additionally. (particularly with three the last two), prei.e .. with no infill) and block .Jacobi. for all the preconditioners results preconditioned for BiCGSTAB applied to G\[RES preconditioned and with each I! Hi Bi(,(SL\B. This owes to the fact that per iteration instead BiCGSTAB of the single one needed gence of BiCGST-\B is {'rratic. has two matrix-vector multipli('~ by G~IRES. Additionally, as is well known the COlm'r- and call be appreciated in Figun' 6.1-t, Comparison a greater betwf'f'11 t he results numbn exceptions) all of the two-stage number these results stage the of Ol\tf'r iterations of outer partial for the short{'r jtt'ral gence history that (with a few preconditioners time step. operator after the combil1ative The increased difficulty of outer gm' implemented the system. for the two- We believe that full decoupling is less for t Ilf' is more effecti\"f' as preconditioner. which only uses iterat ions for ~t = 1.0 than of the problem in the a\"erage number iI The key in interprf'ting one and the preconditioner number 1)1' inner iterations :\otice aw'rage only soh'es of il1ner iterations of the per Sit'!' soh-ers. except of each case. :2SComb domlllance blocks shows a greater ~hows the accumulated number notice for both iterative \\·hereas 1.0 show~· that time stt'P i~ with a longer of inner iterations pt>r II11it 1011. rlw nUlllbt>r results .Jacobian in all caSt-'S by the growth rdlf'l"tl'd IJllll'r ~f. for the 2SComb) for the longer for the shorter To this point. decoupling (except of the full-decoupling ight of t he off-diagonal result. = for ~t those and its own power to precondition longer time step than il 0.1 and for the first four preconditioners iterations is in the action preconditioners II't = for the long!:'r time step. However. smaller for ~t itt'ralion for minor differenn's is comparable due to particular of both the pressure for pressure blocks and concentration components and therefore Jacobian S,-" components show a lower in the time step size damages of the decoupled in II\(' conn'r- that in the case of the last five preconditioners .. -\n increase main-diagonal uf the outer the diagonal thus producing 1l)7 Table 6.5 Results for BiCGST.-\B preconditioned by the nine schemes tested in this \\'ork .. Vit: number of outer iterations: Ts: elapsed time in seconds for the solver iteration: Tp: elapsed t.ime in seconds to form the preconditioner: Si.,,: average number of inner iterations per unit outer iteration. Preconditioners shown are from top to bottom: tridiagonal (Tridiag.). incomplete LU factorization with no infill (ILU(O)), block .Jacobi (B.1). two-stage Combinative (2SComb.), two-stage Additive (2SAdd.). two-st.age ~-lultiplicative (2S~Iult.). two-stage block Jacobi (2SBJ), two-stage Gauss-Seidel (2SGS) and two-stage Discrete Projection (2SDP). I Time Step Size Prob. Size -lx 8 x 8 I Time Proh. Size 16 x 16 x -1 ~t =.1 --+ r Precond. Tridiag. rLl.'(O) BJ 2SC'omb. 2SAdd. 2S~[ult. 2SBJ 2SGS 2SDP Step Size Precond. Tridiag. rLl'(O) B.l 2SComb. 2SAdd. 2S~'Iult. 
2SBJ 2SGS 2SDP 1')_I 2:39 80 L06 2-lI~J 2:3 11 10 I :3.-1-2 .57.90 2.98 LL:3.7·1 88.7·1 61.:38 2-l-.97 11.9 L 20.0-l- Ni.a 0.26 0.:37 0,17 0 ..50 1.00 1.00 0.02 0.02 0.0:3 Sit 176 > 1000 ,57 170 68 -1-4 -11 17 1:3 n,:~7 .1..S-l- - ~t - 118 ..5:3 -l5.99 9.7.1 292 ..50 :361.7.1 1:3.:38 2:38.88 1:3.:38 8:3.21 0.09 :3.5.82 0.09 .56.72 0.12 1-l.10 -l-9 -1-9 ·50 61 > LOOO '\';"1 0')._1 0.17 0 ..50 0.7.1 0.7·5 0.02 0.02 0.0:3 Tp - .\"i , .'1 - :~81.22 :31.-1--1 :37.-l-6 In.6-l690.00 12.00 -l-90.6:3 1:3.18 2·5.5.25 1:3.:38 68.91 0.09 :3:3.81 0.09 11 0.12 10 ·58.62 -l-2-l69 180 61 :32 23 81 18:3 188 17:3 177 179 = 1. T, .Vit I . Tp , .V;.~ 1. Ts Nit , Tp Ts = 6.20 227 >1000 2.83 75 -1-0 125 2.53.50 :3-1: 1·58.38 118 11.) 67.:38 14 -l-6.91 116 24 115 12 24.15 118 1-1- H.01 ~t =.1 ! I Tp Ts .\iit --+ ~t ! ! i 2-1 , I 1 'J 76 76 82 It (l rdt'r i nller sol \"{~s.as rpflpcted by t he res III t s on bot h tables. a groWl h ill t hE' size of t he linear system .-\5 for the quest ion of efficiency. and discrete t illws to converge the linear systems typical mpnt ioned al though t he problem conspcuti\'f' gl't'iiter preconditioners size. The comhinati\'e for tl1t'~p rathpr .\ccOl'ding preconditioner friendly on a\-prage than \n'IIIH'ltcl\'pd and cOl1centration timings. is not robust .. \s t Iw enough of e\'en blocks In all experilllent~. \\'f' preconditioners, TIlt' fOl'lIlt'r family with the addition of the global in the consecuti\'e type). in the application of the global as \\'as mentioned abo\·e . .\! should decoupled preconditiol1er pressure be at least and is that process, I) to them an' to apprlJxi- to concentriltiolls. to the latter schemes. incurred :\Ioreo\'er. as a preconditioner blocks. but is absent to the high overhead of the alternate concentration ;).-r. \vith the consecllti\'e Nlui\-alent as effective the conditioned step gi\'en by ,\[ (this step times testify more Theorem matrix with respect is approximately The total elapsed performs or approximation of the altprnate preconditioning (recall use the identity of the Schill' complement to the comparison 2SDP from the decoupling blocks are :\[-matrices in this rasp. this appreciation is more poorly resulting systems ,\ !inal word is de\'otf'd that as the best 2SGS. The reason to concentrations Schur complement till' indi\'idual simulation for problems However, ~ote that lnatf' ./,~ in the construction Me black-oil t.he 2SDP appears results). its closer competitor ilnd cOl1centration is no guarantee similar at overall than then' the best elapsed robustness for example. by ollter iterations, mat rix with respect pressure display i.e.. tIlt' here are only modest. t.o ha\'e t.he required Schur complenwnt .-\It hough. projection. sizes presented :2SGS achieves when looking t he pressure precondi t ioners. problems. (although. iterations su rprisi lIgly. in e\'ery case. in fully implicit preconditioner. to the trend suggested can be misleading inner appear a the consecu ti ve- type two-st age block .Jacobi. Gauss-Seidel above. .vi. decreased Somehow However. as J! is a 109 preconditioner for the full .Jacobian. or worse. We are now looking the c1ecoupled blocks, and therefore. It should with so that These experiments be mentiol1ed the coupling -l grid blocks numbering scheme .ff results .\1 was blocks). in the number in the algebraic that the properties we require step sizes. i.e .. clearly threshold value. expected to deteriorate was ahvays which In spite of linear analysis \Ve also remarked 11ll' I ill1l' increase hod bu t dops not. 
110ticeably these results guarantee we believe there the performance that increasing of greater all in the robustness of this elapsed matrix times blocks (see combinative because The results the performance a n.\' of t he at llf'r is still considerable method § .3.:3) blocks are met for reasonable will not be all valid for ~t beyond deteriorates CO[l\'ergence of the .\ewton was chosPI1 (notice rapidly of the .Jacobian that the t\\-'o-stage il ffecl of bands of the assumed as the time step size increases step of J iterations. from the individual our assumptions of soh·er. factorization retained is most llents (which it is based on) are no longer dominant. 11lf'1 number its main effect seems to be the posting T~ with no reduction "how that time by the iterative The layers in the z-direction. has to beat the action as an incomplete of 19. of the path. that this is a losing proposition in wasted chosen a bandwidth of the grid global preconditioner. for J that show clearly of nearest-neighbor cases have We ment.ioned of t.hat infill inside us back to beginning for a preconditioner the application complete which throws I room of any of the five new two-stage a given preconditioner the pressure was compo- of these experiments of the first t\\'o-st age wo-~l age schemes. ill choosing itself without time a ~t substantially preconclitioners In\' if'\\' of which will damaging proposed in this work. Figure 6.1:3 summarizes size of 8 x 8 x --l and ~t the three standard the convergence = 0.1. On the upper preconditioners. behavior of G yIRES for the discretizat left corner. The plot on the upper the plot shows the results ion for right shows the convergence 200 1.. \ I en E 0.8~ \ en ... " ~ 0.6~. enQ) i, ~0.4 ~en 0'6~l \ ~ 0.4 \ l "" I Q) a: o.a E I \ 0.2 Q): a: " ,- \ \ 0.2 , \ o o 50 100 150 200 250 Iter. " 20 10 30 40 50 Iter. 0.1 r I I en o.oaf en E... g 0.06 g 0.6 Ul Q) ~ 1\ Q) I ~O.4t \ ~ 0.04~\ a: o.a E... :\, ,\ a: O .02~" i \~ i j O~ 10 5 Iter. 15 0.2 i \ \ 1 " I j Or--.. 5 15 10 Iter. Figure 6.13 Relatin> residual norms VS, iteration of G:\[RES for different preconditiollers. The performance with different preconditioners are ur!!;f\llized ill matrix form. Subplot (1.1): [Ll' (dot). Trid(dash). block .Jacobi I ~olid). Subplot ( 1. 2): two-stage combinative (dot). t\vo-stage additi\'e (dash). two-stage multiplicati\'e (solid). Subplot (2.1): two-stage block .Jacobi (dot). two-stage Gauss-Seidel (dash). two-stage discrete projection (solid). Subplot (2.2): block Jacobi (dot). two-stage multiplicative (clash). two-stage discrete projection (solid). Problem Size: -! x 8 x 8. ~t 0.1. = 201 Figure 6.14 Relative residual norms \'s. iteration of BiCGST...\B for different preconditioners. The performance with different preconditioners are nrp;anizt>d in matrix form, Subplot (1.1): IU' (dot). Trid(dash). block .Jacobi holid). Subplot (1. :2): two-stage combinati\"e (dot). two-stage additive (dash). two-stage multiplicative (solid). Subplot (:2.1): two-stage block .Jacobi (dot). two-stage Gauss-Seidel (dash). two-stage discrete projection (solid), Subplot (2.2): block .Jacobi (dot). two-stage multiplicative (dash). two-stage discrete projection (solid). Problem Size: -l x 8 x 8. ~t = 0.1. conditions. This implies illld concent ration the manipulation coefficient of .Jacobian '. " , H- with 6--1:prf'sstl\'f.' arrays. , " matrices l ~9- .,\ ~~- .,, .3.5 .. ,) .a~ JJ;-)2" .:;'p .~-J " ~i ~.2 )3 "".""'9 oJ" p". .. g. I JS U,lul"'ll(ln. , ".2 ') J 0.4 05 Welting ph ... Sw Q.6 UlurallO". 
0.7 Sw l) 8 'H Figure 6.15 (LEFT) Relatin:, permeability of both phases and capillary pressure function (RIGHT). Table 6.6 Physical input Inirial non\\'etting phase presstlre at --lYft Illili,,! \\,(>tting saturation at -il) ft \llll\\'Ptl in~ phase density \llll\\'l't I if!!.!; phase (olllpressibilit~, \\'('Itillg phase compressibility \llllwetting phase \'iscosity \\'t'tting phase \"iscosity .\real permeability Permeahilityalong bt and 2nd half of \'ertical _\dditionally. t the data are decomposed he ~ilnw origillalnlllllber fact that than in most the horizontal resen'oir plane data. :300p.si .,j ~8Ib/ft3 1.:2 x LO -~ p,~i-I '3') , ..) ", 10-'; p...-'--I 1.6c[J O.:2:3cp l·jOmd gridblocks in an areal sense (i.e .. each processors of grid blocks along t he dept h direction). domains where lOme! and :30md the \'ertical the phases direction flow. The This is due to t\\\~ is relatively effective holds much smaller manipulation of a 20:') full permeability tensor IOllcf'ntrations pressures induces of the linearized ilnd a I-point pquat ion (t his gives unknown), densities matrix-vector Table implementation. and satmatiolls a 19-point of the linearized arrays data alld stencil non-wetting accompanying involves neighbors block of pressures for the pressures for phas(' each gridhlock communication of each (see to [--10]for further det ails I. comprises and concentrations the G\[RES (i.e., the product phase) .. -\ tridiagonal preconditioner of is uSf'd thf' COIl\'prgence rate of this inner GMRES. "hows the associated lIlodel consist the physical relati\'e of a water at the coordinate 6.3.2 and. the 2SGS preconditioner of a particular 6.6 summarizes "p('cified) equat.ion products and four corner of each indi\'idual to accelerate phase rise to the 6--1coefficient Thereforf'. In our particular wetting :;tencil for concentrations nodf' wit h its four lat('ral solution a 19-point. st.encil discretization parameters permeability injection for this problem, and capillary well (with bottomhole (l. I) of the plane and. a production at t lIP opposite Considerations corner for pressure and Figure functions pressure used. specified) 6.1.) The located well (\'v'ith bottomhole pressure of the plane. implementing the HOK~ algorithm with the 2GSS preconditiorier B(-'fun-' pr('~(-'lltillg the numerical cOIl:,ideratiolls the HOK~ arising nOlllinear linf'ar system. of t he form rt'slllh. it is important from the joiut illlplp.mentation solver. the 2SGS the secant Since equation to t'~tahlish some e~"'1)('('i<l\ of the 2SC;S preconditiont:'r demands previous on which the Krylov-Broyden decoupling update flU" of tlw is based. is ~06 for a given !.:th nonlinear Here . .\Ilk) reprf'sents decoupled matrix a similar the inexact (D(kl) presentation -1 to t he .Jacobian system one determines that with; = preconditioner matrix acting \'(k'I!/lkl. UpOIl t l]f' as 2 x :2 blocb expressed has the secant factorization equation update for the Hessenberg of the Hessenberg the \'alue of the function ill~ prlJjpctt>d \)I\to the underlying and cOllsequently'. terms of the decoupled .\11 efficient ('ratioll block Gauss-Seidel .~g" + = s(k) matrix is given by is given by matrix !!(Olklrl /,~k)ll. 1\('11(1'. update solution in (.j.I). .-\rnoldi Broyden's and a G\.[RES .-t(kl. This decoupled depicted {":-;ing t he associated Therefore. iteration ill place over all arrays lIals f'ntries of each block same :-;tandard coefficients. Euclidean HOK\ befort-' Ilt'- poillt neptis to be decoupled Technically. implementation. 
the Krylo\'- Bro.\'flt'll can be carrit,d \)l1t ill system . implementation .Jacobian 1If'\\' Krylo\' ~ubspace. the entire linear the original at the is accomplished holding the matrix fi\'e arrays and the \'ector by carrying coefficients. are employed entries norm in the line-search out the decoupling of ~. This backtracking In order to rf'store to hold the main diago- allows to maintain strategy. op- forcing tl\l' tprrn 207 selection and in the nonlinear after all Krylov-Broyden As explained Krylov-Broyden stopping criteria. steps in the HOK:'-J' algorithm in Chapter update there --1-. is no need for the implementation tiol1s can be done in terms of the updated \';n+l and the minimal preconditioner \Ve remark inexact Broyden 6.3.3 residual ~ewtol1 nonlil1ear that the method HOK~ Hessenberg Numerical All opera- the orthogonal matrix E IRm. A.dditionally. y(k) update. to retrieve tlw the 8~). algorithm can be easily changed to the standard the computation of the Krylov- results BiCGSTAB for two different of the 2S('omb for a problem of modest difficulty plo.\'" alIllost hair of till' total Bi(,(;ST.-\B call" made donbks and the :2SGS preconditioning values of ~l. The table shows that both G~1RES number at each linear The cost associated of iterations iteration ils t he simulation problems :\lgorithm for ~l = 2.2.1 and ,.j reveals. pmhand. .-\lgoritlllll and the application times comparable Bi-CGSTAB latter BiCGST.-\B alld Plw'onditiulI"r multiplication the and similarly hilt on the other Illllitiplications problems. perform that makes t he performance In simple in more complex algorithms .0,:)). \'otice (d. on GylRES in Table 6.1. of (;\IRES to the matrix-\'ector tween these two linear sol\"ers. and efficient = (i.e .. for ~l the IIl1mher of l1liltrix-\"ector by G\IRES whereas This is shown al1d BiC'GST.-\B of any of the two-st age preconditioners robust the .Jacobian·s steps. the dff'ct G:\IRES. form algorithm. matrix, solution with a single flag inhibiting \Ye compare 2.:3.1), to explicitly after each Krylov-Broyden direction. values an' restored have been completed. of the HOK~ approximation _\[(1.) is kept fixed, un preconditioned The coefficients method tends be- to outperform lends to be more Also remarkable ditioner in relation is the performance to the 2SComb :2SGS preconditioner tions. Since the number t he computer This reduces times result matrices extracted linear preconditioner. soh'ers by almost iterations the obsen'ations from this physical is practically I. recall discussion 10 times made with the 2SGS precon- For this particular by more than a 10-fold the total of nonlinear corroborates of both in the problem. number of linear the itera- unchanged. we impro\"(;> on cost of both schemes). previous section fOf samplf' model. Table 6.7 Summar~' of linear iterations (LI). nonlinear iterations (:\1). number of backtracks (NB) and execution times of G\IRES and Bi-CGST.-\B with the use of the :2SC'omb al1d the :2SGS preconditioners. The simulation cO\'ers :20 time steps with ,,:).t = .O.j and ,,:).t = ..j for a problem of size 8 x 2--1:x :2t gridblocks on a mesh o1'-!- x --I: nodes of the Intel Paragon. ("'): Backtracking met.hod failed after the 1,th time step; (**): flt was halved after the 16th time step. Linear solverjPrec. G \IR ES j2SComb .O.j G\[RESj2SGS Bi-CGST:\Bj2SComb ! Bi-CGST.-\.Bj2SGS ,,:).t ! I G\IRESj2SComb ! ,,-i I ! Bi-CGST.-\.Bj2SGS tiOller forces a l'f'duction concf'lltrations within to regulate pressures of material balance Time(Hrs.) 
n 102 -!-9 8.10 --I:.j 0 0 0 (i6 1.10 0.11 L.19 0.07 -1--1: 0 6.:3'1 100 0 107 0 O.·jl l!JO --1:1 ;').62 10:2 12 0.70 I 12808 (Co') for different I -!-~n reasons. For,,:).t = .,j. the 2SGS of the time ~tep clue to the high changes the time the next and saturations \B I G\IRESj2SGS !l3i-CGSTA B/2SComb(' fails twice \iI I 6.-!.j :'):38 I BiCGST-\.B tomary I LI l--l:·jO step, In many resen'oir time step according within the current due to the deterioration time simulation to a maximum step. or eventual precondi- of pressures codes allowable This prevents and it. is ellschange possible failure of the nonlinear or lo~s solu- 209 tion. Shortening the time step increases the chances of convergence for the nonlinear met hod. The failure with the ~SC'omb preconditioner because the linear soh-er was unable (0.1. in our case). IIFil, for decreasing high number Figures Paragon 011 --I:. 8 and t This was enough that prohlem this execution direction had undergone the issue of parallel problem sizes, to capture the simulator is mainly t an acceptable allO\ved a scalability \ve compare the efficiency in timil1gs trend of the machines. scales better on the IBNI SP2 than on due to the low latency has compared size the larger tolerance failt-'c\ steps. 6.17 summarize This can obserw machil1e breakdown. For four different soh"er on both Iw smallest before The line-search at the maximum could not provide and nonlinear IB:\I SP2. t he Intel Paragon. problem that. 12 processors. The reader to converge BiCGSTAB 6.16. 6.19 and and HOK\j2SGS the former \"ote of backtracks 6.18. the Therefore. is more seri~us. with the latter he efficiencies size is practically obtained. go\"erned and high handwidth ol1e. As expected. ~ote that the larger how the computing hy the communication the time for overhead 111 hot h machines. TllC' major hulk of parallelism TIlt' hllKk tridiag;onal and sufficient preconditioner to meet the in the inl1er and outer operations (i.e ... -\XPY·s is chosen in the construction (Howe\"er. orthogonality and O\"er the -of the the G:\lRES linear G:\IRES inner Krylo\' tolerances. C\IRES without contemplates is totally t'nfortunately. the classical to exploit sacrificing iterative most iOlH'r. parall('l of til(-' at the level of BLAS-l In this regard. Gram-Schmidt basis of the precondit are parallelizable products). modified implementation if required.) in t he computation used in the innermost required operations Schmidt resides further stability refinement Gram- parallelism requirements, to presen"p :lIO The Krylov-Broyden step allelism since the Hessenberg .\rnoldi process. among with \'ectors The Hessenberg on that basis. encouraging time sizes than shown those display interprocessor relatin"ly by the SP2. needed of the SP2 make length. arises thus for difff'rf'nt results sizes. gain for different increasing not illlpl~' 1Ilolor reductions of !IFII is explained time (ll"(~ decay problem "iize by the fact because of the The much shortf'r more linearly of computation a high penalty ill ncl' the problem of zero length. dependent to communication it is important steps aIIO\\'('([ ill the HOI\:\ the IIIlIIIlwr of h:rdo\'-secant basis \"f'ctors of performing is latency-bound cases. the shown in both figures This transfer III both after of code with a small degree that on the Paragon the ratio of par- show a more rapid speedup on the Paragon. 
The Krylov-Broyden step adds only a small degree of extra parallelism to watch for, since the Hessenberg matrix resulting from the Arnoldi process is kept replicated in each processor, and the updates performed on it and on the basis vectors involve dot products of the same nature as those in GMRES. Hence, the HOKN method inherits the parallel profile of the inner solver: a significant part of the efficiency is lost in setting up messages of small (or even zero) length, and this fine grain parallelism means that message latencies may have a greater impact on the Paragon than on the SP2. Keeping in mind the above discussion, the cost of partitioning and distributing the problem, with a constant proportion of the matrix replicated in all processors, somehow ends up being paid in terms of parallel efficiency.

The speedup curves in Figures 6.16 and 6.17 flatten more rapidly on the Paragon than on the SP2 as processors are added, and both figures display the gain obtained for increasing problem sizes. The much shorter times delivered by the SP2 grow more nearly linearly with problem size, whereas on the Paragon the ratio of computation to communication decays faster and a high penalty is incurred whenever the code is latency-bound; the number of Krylov-secant steps allowed in the HOKN and the number of basis vectors kept have to be weighed against this cost. In these experiments the simulator was not designed to work with less than 4 processors, so all timings are compared relative to this number. The log-log plots in Figures 6.18 and 6.19 (with the ideal behavior indicated by slope -1) show the deviation of timings from ideal speedup on both the Paragon and the SP2. All problem size cases present on the SP2 are less sensitive to degradation as more processors are added, and the SP2 runs from 50% to 100% faster than the Paragon. In theory this margin is expected to be larger, but the author suspects that memory hierarchy effects may be deteriorating the performance of the SP2.

Figure 6.16 Speedup vs. number of processors for the two-phase problem using the HOKN/2SGS solver on an Intel Paragon after 20 time steps (problem sizes 4 x 12 x 12, 8 x 24 x 24, 12 x 36 x 36 and 16 x 48 x 48).

Figure 6.17 Speedup vs. number of processors for the two-phase problem using the HOKN/2SGS solver on an IBM SP2 after 20 time steps (same problem sizes as in Figure 6.16).

Figure 6.18 Log-log plot of the number of processors vs. execution time for the two-phase problem using the HOKN/2SGS solver on an Intel Paragon after 20 time steps.

Figure 6.19 Log-log plot of the number of processors vs. execution time for the two-phase problem using the HOKN/2SGS solver on an IBM SP2 after 20 time steps.
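For reference, the speedup and efficiency figures quoted here and in Table 6.8 follow the standard definitions; since the simulator does not run on fewer than 4 processors, we assume the baseline is the 4-processor timing T(4) (our rendering of the convention used to normalize the curves):

```latex
S(p) = \frac{T(4)}{T(p)}, \qquad
E(p) = \frac{4\,S(p)}{p} = \frac{4\,T(4)}{p\,T(p)},
\qquad\text{with ideal values } S(p) = \tfrac{p}{4},\; E(p) = 1 .
```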
Figure 6.20 illustrates the relatively rapid decay of the nonlinear residual that the HOKN/2SGS solver exhibits in the simulation compared with the Newton/2SGS and Newton/2SComb solvers. For a moderate problem size, GMRES with the 2SComb preconditioning requires fewer linear iterations than with the 2SGS preconditioning, but the nonlinear convergence under 2SGS is accomplished with more, and cheaper, linear iterations; the Krylov-Broyden steps of the HOKN reduce the number of accumulated GMRES iterations even further. Figure 6.21 expresses the same comparison in terms of computer time, where the difference between the two preconditioners is less prominent. The line correction was introduced in the 2SComb preconditioner in order to solve the elliptic pressure system with the highest possible robustness, and it accounts for a strong part of the effort per iteration; it was not introduced in the 2SGS preconditioner, which contributes to reducing the cost of that method.

Figure 6.20 Number of accumulated GMRES iterations vs. relative nonlinear residual norms (NRNR) using the HOKN/2SGS, Newton/2SGS and Newton/2SComb solvers on 12 nodes of the IBM SP2 for a problem size of 16 x 48 x 48 at the third time step with Δt = .05 day.

Figure 6.21 CPU time vs. relative nonlinear residual norms (NRNR) using the HOKN/2SGS, Newton/2SGS and Newton/2SComb solvers on 12 nodes of the IBM SP2 for a problem size of 16 x 48 x 48 at the third time step with Δt = .05 day.

The 2SGS preconditioning presents some difficulties at large time steps due to the lack of diagonal dominance of the pressure block delivered by the decoupling, a situation that violates the conditions of Theorem 5.4.1. This loss of diagonal dominance is observed where the capillary pressure gradients are large compared to the permeability; it does not happen when the capillary gradients are relatively small, in which case the pressure system is really easy to solve. The line correction method allows to reinforce the elliptic properties of the pressure block, and hence the robustness of the 2SComb approach, making it able to take larger time steps. Since the line-search backtracking acts on the nonlinear solution through guesses of the pressure, the partial decoupling of the 2SGS preconditioner still works fine for small time steps, where the coefficients are well behaved, but the 2SComb approach is preferred when larger time steps must be taken.
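The property at stake is the usual row diagonal dominance of the decoupled pressure block; the precise hypothesis of Theorem 5.4.1 is stated in Chapter 5, and the inequality below is only the standard definition we allude to:

```latex
% Row diagonal dominance of the pressure block A_p = (a_{ij}):
|a_{ii}| \;\ge\; \sum_{j \neq i} |a_{ij}| \qquad \text{for every row } i .
```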
Figure 6.22 Performance in accumulated GMRES iterations of the HOKN/2SGS and Newton/2SComb solvers after 100 time steps of simulation with Δt = .05 of a 16 x 48 x 48 problem size on 16 SP2 nodes.

Figure 6.23 Performance in accumulated CPU time of the HOKN/2SGS and Newton/2SComb solvers after 100 time steps of simulation with Δt = .05 of a 16 x 48 x 48 problem size on 16 SP2 nodes.

Figure 6.24 Performance in accumulated nonlinear iterations of the HOKN/2SGS, Newton/2SGS and Newton/2SComb solvers after 100 time steps of simulation with Δt = .05 of a 16 x 48 x 48 problem size on 16 SP2 nodes.

Despite this, for a moderately long simulation Figure 6.22 shows that the new solver spends a considerably smaller amount of GMRES iterations than the Newton/2SComb solver; for this particular problem size and Δt, the savings amount to almost a three-fold reduction. As before, the previous analysis explains clearly why the 2SComb preconditioner was more effective in linear iterations with this problem: each GMRES iteration is more expensive with the 2SComb preconditioner than with the 2SGS preconditioner, so Figure 6.23 exhibits a fairer reality in terms of timings. The HOKN/2SGS solver still outperforms the Newton/2SComb solver, with the timings of the simulations reduced by more than a third, and the margin tends to increase as the simulation time gets longer.

Figure 6.24 shows that not only linear iterations but also nonlinear iterations are reduced. This figure illustrates the effect of using only one Krylov-Broyden step per nonlinear iteration. Although the HOKN does not imply a noticeable speedup over the inexact Newton method (due mostly to its limited parallel capabilities and the ill-conditioning of the Jacobian matrix), its use is still advisable for achieving better material balance: in most cases, relative nonlinear residuals are driven closer to the solution than in those cases where the Krylov-Broyden step was disabled. We also believe that the HOKN effectiveness is attenuated due to the decoupling operation that acts as a left preconditioner in the approximation of the system. This introduces further concerns in the Krylov-Broyden update for simultaneous left and right preconditioning of the Jacobian matrix.
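To see why the decoupling complicates the update, note (in standard left-preconditioned GMRES terms; the formulation below is ours) that the Arnoldi process is driven by the preconditioned operator, so the secant correction is absorbed by the product rather than by the Jacobian itself:

```latex
% Left-preconditioned GMRES builds a Krylov space for M^{-1}J, hence the
% Krylov-Broyden update approximates the product, not J alone:
(M^{-1} J)_{+}\, s = M^{-1} \Delta F
\quad\Longrightarrow\quad
(M^{-1} J)_{+} = M^{-1} J
  + \frac{\left(M^{-1}\Delta F - M^{-1} J\, s\right) s^{T}}{s^{T} s} .
```

If a right preconditioner is also present, the update lives in yet another transformed space, which is precisely the simultaneous left and right preconditioning concern raised above.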
Table 6.8 Results for the HOKN/2SGS and Newton/2SComb solvers for different large problem sizes (30 x 100 x 100 and 50 x 100 x 100 gridblocks) vs. different number of processors of the Intel Paragon. Execution figures are measured after 10 days of simulation with Δt = 1 day. CPU times (T) are measured in minutes; (E) indicates parallel efficiency. (*) Abnormal efficiency due to paging of the operating system.

In order to show the capabilities of the HOKN/2SGS solver at large scale, we ran tests representing six hundred thousand and one million unknowns (adding pressure and concentration unknowns) on the Intel Paragon; the results are compiled in Table 6.8. These problem sizes are quite challenging to solve in a fully implicit formulation, even for the quasi-homogeneous physical situation (i.e., moderate changes of permeabilities) modeled here, and a Δt = 1 day was specified in order to increase the difficulty. The HOKN/2SGS solver was able to solve both cases. An average of roughly 70 linear iterations per time step was needed for this short simulation, and the largest case takes approximately 10 minutes per time step on the largest number of processors employed. The robustness of the HOKN/2SGS solver is determined by its execution times relative to the Newton/2SComb solver: on 36 processors it is 1.72 and 5.03 times faster for the six hundred thousand and the one million unknown cases, respectively, and it accumulates 15-20 times less linear iterations. The table also exhibits that the Newton/2SComb timings increase more rapidly for increasing problem sizes; this trend comes from the deterioration of the line correction step, along with the high GMRES restart value (of 40) that has to be specified for the outermost iteration. In the largest cases this restart value had to be reduced to 12 due to a memory paging situation, which also explains the anomalous efficiency marked in the table; the value of 12 was nevertheless sufficient to maintain acceptable linear convergence using the 2SGS preconditioner.

Although the deterioration of line correction affects negatively the performance of GMRES with the 2SComb preconditioner, its use is still justified in these cases due to the significant savings of floating point operations introduced.

Table 6.9 CPU time measured in minutes of a million and six hundred thousand unknowns on 16 nodes of the SP2 for 10 days of simulation with Δt = 1 day.

Problem size        HOKN/2SGS    Newton/2SComb
30 x 100 x 100      50.49        156.26
50 x 100 x 100      78.24        435.75

The two largest cases were also executed on 16 nodes of the IBM SP2 (Table 6.9). Timings obtained on both machines show that the HOKN/2SGS and the Newton/2SComb solvers perform similarly on 36 nodes of the Intel Paragon and on 16 nodes of the IBM SP2. The relatively slight time reduction on the latter machine is due to the higher bandwidth and lower latency of the SP2 (in addition to the fact that a smaller communication overhead is incurred on a smaller number of processors); the reduction in execution time is about 1.25-1.5 fold on 16 nodes of the SP2 compared to 36 nodes of the Paragon. Note that a similar amount of degradation is observed in the line correction method as the problem size increases, which denotes once again that the HOKN/2SGS solver is more robust than the Newton/2SComb solver for solving large scale problems.

Chapter 7

Conclusions and further work

In this research we have proposed a novel way to solve coupled systems of nonlinear equations at a lower cost. To achieve this objective, we have concentrated the effort on propagating useful Krylov subspace information between two consecutive nonlinear steps of an inexact Newton or inexact quasi-Newton method. We have found that Krylov-Broyden updates (or Broyden updates restricted to the Krylov subspace) are a reasonable vehicle to propagate this information in the form of efficient steps toward the solution of the nonlinear problem.

Five algorithms were proposed to solve large scale nonlinear problems: a higher order version of Newton's method (HOKN algorithm); a faster version of Broyden's method (nonlinear KEN algorithm), which appears as a more efficient variant of the recent nonlinear Eirola-Nevanlinna method (NEN); a hybrid Krylov-secant version of Newton's method (HKS-N); a hybrid Krylov-secant version of Broyden's method (HKS-B); and a hybrid Krylov-secant version of the nonlinear KEN (HKS-EN).
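All five algorithms are organized around the same pair of updates (written here in our notation, consistent with the construction sketched earlier): the classical Broyden update of the Jacobian and its projection onto the current Krylov subspace,

```latex
J_{+} \;=\; J \;+\; \frac{(\Delta F - J s)\, s^{T}}{s^{T} s}
\quad\text{(Broyden)},
\qquad
\bar{H}_{m}^{+} \;=\; \bar{H}_{m}
  \;+\; \frac{\left(V_{m+1}^{T}\Delta F - \bar{H}_{m}\, y\right) y^{T}}{y^{T} y}
\quad\text{(Krylov--Broyden)},
```

where s = V_m y is the inexact Newton step returned by GMRES.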
The first two algorithms lead to the least squares solution of two or more minimal residual approximation problems (in a lower dimensional space) for every GMRES call. The last three algorithms are rather characterized by the alternative use of cheaper Richardson iterations instead of the more expensive GMRES iterative linear solver in every nonlinear cycle, for the sake of fulfilling the inexact Newton condition. Among all, only the HKS-EN algorithm combines effectively these two approaches.

We have observed that explicit knowledge of the Jacobian is not required for the implementation of some of the above algorithms. This introduces further savings in floating point operations. Additionally, the methods can effectively accommodate the use of any desired preconditioner, whose effect turns out to be hidden (but not trivially separable) after a given Krylov-Broyden update. In general, the Krylov-secant algorithms proposed seem to adapt well to the efficient globalized inexact Newton methods developed lately in the literature. Computational experiments have exhibited the potential of Krylov-secant algorithms in saving a large amount of operations, a feature that makes them attractive for large scale implementations, and we strongly encourage further experimentation in this direction.

We have found, however, that some aspects of Krylov-secant methods need to be further explored:

• The viability of Krylov-secant updates in the arena of unconstrained optimization and nonlinear programming. It is possible that new formulations of quasi-Newton methods may be derived, since some aspects of the HKS algorithms can be reinterpreted in terms of BFGS updates. Eigenvalue redistributions after low-rank updates tend to be more manageable for the symmetric systems arising in most optimization problems, where many results of eigenvalue interlacing theory can be applied.

• Detailed analysis of the adequacy of prolongating the life of useful relaxation parameters in the Richardson iterations of the HKS algorithms; that is, keep solving future linear systems in terms of Richardson iterations until the strategy fails (see the sketch after this list). This argument is heuristically sound but falls short of a formalization, and it is strongly tied to the problem of predicting the eigenvalue redistribution of a matrix after low-rank updates, a challenging point when the linear systems are non-symmetric, as they usually are in our applications. Ideas based on the Arnoldi process and on eigenvalue theory (see [44] for the symmetric case) are reported in [83] and briefly discussed there; recent inexact Newton experiences [29] also express interest in exploiting this possibility.

• Future theory on Krylov-Broyden updates should be in order. In this dissertation we have just given a preliminary motivation around this idea in order to develop the algorithms. However, it is necessary to characterize their convergence and identify situations where the update may work badly or well. This will help to determine the scope of the Krylov-secant algorithms and come up with possible enhancements to them.

• Extend the ideas above to other linear iterative solvers. We have used GMRES as a framework to develop all Krylov-secant algorithms. However, the linear Eirola-Nevanlinna algorithm keeps track of search directions generated during the process which may be also re-utilized or propagated as we did here. The situation seems to be less clear in those algorithms whose functionality depends upon Petrov-Galerkin approximations. However, any positive advance in that direction capable of handling systems with multiple right hand sides or combining several low-rank updates may result in important extensions to the present work.
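The sketch referred to in the second item above is a hypothetical illustration (the function name, parameter cycling and stopping rule are ours, not a prescription from this work) of reusing a set of relaxation parameters on subsequent linear systems until they stop contracting the residual.

```python
import numpy as np

def recycled_richardson(A, b, thetas, x0, max_sweeps=20, tol=1e-6):
    """Cyclic Richardson iteration x <- x + theta_k (b - A x) that reuses
    relaxation parameters 'thetas' harvested from an earlier solve (e.g.,
    from spectral information gathered by a previous Arnoldi/GMRES call).
    Returns the approximate solution and whether the recycled parameters
    were still effective; on failure the caller would fall back to a
    fresh GMRES call."""
    x = x0.copy()
    r = b - A @ x
    for k in range(max_sweeps):
        x = x + thetas[k % len(thetas)] * r
        r = b - A @ x
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, True       # parameters are still "alive"
    return x, False              # declare failure
```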
On the other hand, the effort of this work has been complemented with a careful analysis of the physical and corresponding algebraic properties of the linear systems associated to the Jacobian matrix (specifically, those coming from coupled multi-phase flow problems). This study leads to the conception of a new family of preconditioners which are basically inexact extensions (i.e., with blocks solved inexactly) of the frequently used block preconditioners, but with the peculiarity of relying on a strong but simple decoupling strategy for the entire system. It was established that the decoupling is a good preconditioner by itself, and this, combined with the block solution of the decoupled system, gives rise to an efficient two-stage preconditioner for the original linear system.

Therefore, these two-stage preconditioners stand on a simple and general basis: they are easy to implement, and they can afford the use of several efficient iterations already developed for individual inner linear systems. This simplicity and generality mean that the theory developed here could be fitted into several scenarios; in our particular case, the consideration of the physics behind the problem leads to especial enhancements. The author believes that the full decoupling provides a satisfactory vehicle to extend information on the physics behind coupled parabolic PDE's, translated here into concentrating the algebraic convection-diffusion properties of the underlying PDE's in two main diagonal blocks, producing linear systems amenable to efficient inner iterative solvers. Although the ideas presented here have been developed under special considerations, further algebraic interpretations may lead to enhancements in more general scenarios.
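The following sketch illustrates the structure just described for a 2 x 2 block system (pressure p, concentration c). It is an illustration of the idea only, not the dissertation's implementation: the decoupling stage is assumed already applied to the residual, and the inner solver, names and parameters are placeholders.

```python
import numpy as np

def inexact_solve(A, b, n_sweeps=5, omega=0.5):
    # Stand-in for any cheap inner solver (here: damped Richardson sweeps).
    x = np.zeros_like(b)
    for _ in range(n_sweeps):
        x += omega * (b - A @ x)
    return x

def two_stage_gs_apply(App, Acp, Acc, rp, rc):
    """Schematic application of a two-stage block Gauss-Seidel (2SGS-style)
    preconditioner to a decoupled residual (rp, rc): an inexact solve with
    the pressure block App, followed by an inexact solve with the
    concentration block Acc after moving the coupling Acp to the
    right-hand side."""
    zp = inexact_solve(App, rp)             # stage 1: pressure block
    zc = inexact_solve(Acc, rc - Acp @ zp)  # stage 2: concentration block
    return zp, zc
```

The design point is that each stage only requires a solver for a single decoupled block, so any efficient scalar (convection-diffusion type) solver can be reused inside the coupled preconditioner.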
Our numerical results show that the proposed two-stage preconditioners outperform a few traditional approaches addressed in the literature of reservoir engineering: block Jacobi and banded preconditioners, ILU(0), and an inexact version of the combinative preconditioner (originally developed for preconditioning the entire system). The following issues on two-stage preconditioners need to be addressed in the future:

• Dynamic characterization of the tolerances controlling the inner component solvers of the preconditioner.

• Theoretical analysis and extension of the preconditioners to several unknowns per grid block. It is important to know what choice of primary variables and what properties can be exploited when solving large coupled systems of nonlinear equations. We have observed that some decoupled blocks are easier to solve than others, so this may determine the type of linear solvers to be used within the preconditioner.

In general, further computational experiences are required. Our results are promising but need to be evaluated on more stringent reservoir situations and at a larger scale. Among the typical target applications we consider thermal and compositional reservoir models, where the nonlinearities coming from the coupling between the pressure equation and the saturation (or concentration) equations are sufficiently strong to give good reasons to investigate the ideas displayed in this dissertation under more demanding conditions.

In summary, the conception of efficient solvers for coupled systems of equations is a difficult though extremely important task, and the reservoir model is not only challenging but also useful as a driving application. A full understanding of the coupling between the physical flow driving forces and the resulting linear systems is required, and only a few experiences on preconditioners for coupled systems of equations have been reported so far; we definitely encourage research in this direction. In contrast to the two-stage preconditioners studied here, the Krylov-secant algorithms stand by themselves as general solvers for nonlinear systems of equations, but more stringent experiences in varied application scenarios are desirable to calibrate their great potential for the solution of large scale problems.

Bibliography

[1] J. Aarden and K. Karlsson, Preconditioned CG-type methods for solving the coupled system of fundamental semiconductor equations, BIT, 29 (1989), pp. 916-937.

[2] M. Allen, G. Behie, and J. Trangenstein, Multiphase Flow in Porous Media, Lecture Notes in Engineering, Springer-Verlag, Berlin, 1988.

[3] O. Axelsson, Iterative Solution Methods, Cambridge University Press, 1994.

[4] O. Axelsson and P. Vassilevski, A black box generalized conjugate gradient solver with inner iterations and variable-step preconditioning, SIAM J. Matrix Anal. Appl., 12 (1991), pp. 625-644.

[5] K. Aziz and A. Settari, Petroleum Reservoir Simulation, Applied Science Publishers, 1979.

[6] R. Bank, T. Chan, W. Coughran, Jr., and R. Smith, The alternate-block-factorization procedure for systems of partial differential equations, BIT, 29 (1989), pp. 938-954.

[7] J. Barnes, An algorithm for solving nonlinear equations based on the secant method, Computer Journal, 8 (1965), pp. 66-67.

[8] R. Barrett, M. Berry, T. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, SIAM, Philadelphia, 1994.

[9] J. Bear, Dynamics of Fluids in Porous Media, Dover Publications, Inc., 1972.

[10] G. Behie and P. Forsyth, Incomplete factorization methods for fully implicit simulation of enhanced oil recovery, SIAM J. Sci. Statist. Comput., 5 (1984), pp. 543-561.

[11] G. Behie and P. Vinsome, Block iterative methods for fully implicit reservoir simulation, Soc. Pet. Eng. J., 22 (1982), pp. 658-668.

[12] A. Berman and R. Plemmons, Nonnegative Matrices in the Mathematical Sciences, Classics in Applied Mathematics, SIAM, Philadelphia, 1994.

[13] R. Bhogeswara and J. E. Killough, Parallel linear solvers for fully implicit flow simulation in porous media on distributed memory processors, Journal of Scientific Computing, 1993.
[14] P. Bjørstad, W. Coughran, Jr., and E. Grosse, Parallel domain decomposition applied to coupled transport equations, in Seventh International Conference on Domain Decomposition Methods in Scientific and Engineering Computing, D. Keyes and J. Xu, eds., American Mathematical Society, 1994.

[15] P. Bjørstad and T. Kårstad, Domain decomposition, parallel computing and petroleum engineering, in Domain-Based Parallelism and Problem Decomposition Methods in Computational Science and Engineering, D. Keyes, Y. Saad, and D. Truhlar, eds., SIAM, Philadelphia, 1995, pp. 39-56.

[16] J. Bramble, J. Pasciak, and A. Vassilev, Analysis of the inexact Uzawa algorithm for saddle point problems, in Copper Mountain Conference on Multigrid Methods, 1995.

[17] P. Brown, A local convergence theory for combined inexact-Newton/finite-difference projection methods, SIAM J. Numer. Anal., 24 (1987), pp. 407-434.

[18] P. Brown, A theoretical comparison of the Arnoldi and GMRES algorithms, SIAM J. Sci. Statist. Comput., 12 (1991), pp. 58-78.

[19] P. Brown and A. Hindmarsh, Matrix-free methods for stiff systems of ODE's, SIAM J. Numer. Anal., 23 (1986), pp. 610-638.

[20] P. Brown, A. Hindmarsh, and L. Petzold, Using Krylov methods in the solution of large-scale differential-algebraic systems, SIAM J. Sci. Comput., 15 (1994), pp. 1467-1488.

[21] P. Brown, private communication, 1995.

[22] P. Brown and Y. Saad, Hybrid Krylov methods for nonlinear systems of equations, SIAM J. Sci. Statist. Comput., 11 (1990), pp. 450-481.

[23] P. Brown and Y. Saad, Convergence theory of nonlinear Newton-Krylov algorithms, SIAM J. Optim., 4 (1994), pp. 297-330.

[24] P. Brown and H. Walker, Preconditioning with low rank updates, manuscript, 1995.

[25] C. Broyden, A class of methods for solving nonlinear simultaneous equations, Mathematics of Computation, 19 (1965), pp. 577-593.

[26] C. Broyden, A new method of solving nonlinear simultaneous equations, Computer Journal, 12 (1969), pp. 94-99.

[27] C. Broyden, The convergence of single-rank quasi-Newton methods, Mathematics of Computation, 24 (1970), pp. 365-382.
ELS:"iER A:'\D systems H. EIl.:"i. V. terms ('omp1\t.. l,) P. F.\:'\. fa/' itfmti,.f [6l] in an inexact Inuact GaLl'S. S1:\\l.1, inexact Newton Newton Rin~ m ffh ad .... S1.-\\I .J. Sci. method. and preconditioned :'-rumer. Anal..:Jl Convergence in the numerical U::awa algorithms (1994). pp. L64.5-1661. of block iterative solution of Euler methods equations. for :"J'umer. pp. ;')-tl-,j.59. GIO\'A\GIGLI. lif/tar (l996). Globally convergent V. \IEHR\IA:"i:'-l, llrising !Jol'ilhmic [60] Q. Sciences. -t (L99-t). pp. :39:3-422. the forcing \Iath .. ;')9 (l991). [591 A. qltasi-.\'f1L'fOf/ l7 (l996). pp. l6-:J2. EL\IA:"i linear TR 82-7. Dept. of \lathematical \VALKER. for sllddle point problems, [.581 L. of ine,ract L982. l"niversity. [;').5] T. STEIHAFG. Local analysis S. EISE:'lSTAT AND system D. KE'{ES, A:"iD \1. D. ..W/t'fI'8 for I/onlinear S\IOOKE. dliptic Towards problems. pol!}nl- SIA\1 .J. Sci. (1~)9-t). pp. 68L-70:L FORSYTH .. J. \Ic\L\CKE:'>i. 801L'fI's in duiCf . .;i/lli/lation. A\D \V. S1:\\[ T.-\\G, Performallc( .1. Sci. Statist. I,~,";I/(,'; Comput.. I, pp. lOO-lL7, C. FARHAT. L. CRIVELLI. liVE ....olvEI's to multiple Raux. £,l'tfnding load and repeated analyses. Center for Space Structure Colorado. A:'>iDF. substructure Tech. Rep. CU-CSSC-9:3-L 7. and Controls. College of Enginnering, Boulder. Colorado . .1uly 199:3, based itfrll- ljniversity of :.!ll [fil] B. FISHER .-\:\0 ,I. Appro\:. [ti:3] p, R, Theory. FORSYTH G;,)(L99l). iturztions [6;')] R, simulation Qua..-i-I.,t1'l/d FREr:\O. FREl"\O. G. WI' systEm..-. GOlXB Practical . considerations and their . .-\\0 for adaptit'f ill/plicit ~I. :\, Ilse in non-Hermitian olllt,.i,/, Math .. -!~ (1992), pp. l:3,=)-1.18. and Applied \umerica. in ,\da I/ot allL'([y,,;; optill/lIl. (1I'f: .1. of Compo Physics. 62 (1986), pp, 26;')-281. polynomials .1. Computational . polYllomill18 pp. 161-172. P. SA~I\IO:\'. .-\:\0 /TlF.thods in rr,,:;ul"oir [6-t] R. Chfby.,dH'I' FREl":\O. Iterative :-;ACHTIGAL. Cambridge [niversity 80lution of IiI/- Press, ~ew York. L!)ql. pp. Yi -lOa. [Go] R. FREC:\O .-\\0 I.ahorilt\)rip~, [h,] D. G.-\y. :1),'''1] ,)omf (,()lIl·f.l"gence Lh IIIJI~»). II ',"i 11/' R. thofl/cith for GILL. indtji'llih Il992). [,0] \\", SCH\.-\BEL. rep .. AT&T proputif." interior-point :\umerical algorithm ,\nalysis of Broydfn',,;; Soltillg pmjfcftflllprla/f,o.;. for :\Ianuscript. SIA\I .1. mf:fhod. :':;Y8ttm8 of 1I0l/linUlr in \onlinear D. PO:\CELEO:\, ."i.IJ."tU1/,';([rising in optimi::rztioll. tf[llrztiOIlS Programming:L Robinson, I'd~.. ,\cademic ~IrRR.-\Y. 80ft.·- Bell :\lllller. by !3/'O.'I- O. \Ian2;Cl~ar- Pr('~!'. \.Y .. L!),8, pp. 2-!,,),:'!."L. .-\\D ~I. SI.\\[ S.-\l'\OERS, .J, \Iatrix rfconditiollu',"i Anal. .-\ppl.. L:~ pp,l!J:!-:nL. H. R, GLO\\T\Sh:1. ,fitllt Q.\IR-based pp. 62:3-6:30. idn. R. \[eyf'l'. and S, [()!)] P. A Hill. \.1. l'l!.);), \[Ilrray 0, G,\y ,\:\D ,h ,J.-\RRE. porofJrrzm..-. tech. illY !il/u/r \nal.. F. mtfho,ff; 1,l1h. Siam for KELLER tht ltrzst .. -\\D ,':iq/Ill/·f.'; .J Sci. St at. ('omput.. L. REI~HART. ,"iollltion Continuation-coT/jugatt of nOlllillwr ~ ( L!J8,i). pp. ,9:3-8:3:3. bOllndary /'flillf gl'n.pm/)- [71] G. GOLUB A~D C. V. LOA.~. ,\Iatri.r Computations, John Hopkins t'niversily Press, 1989. [72] G. GOLrB Richardson (l988). [7:3] ~I. :\\D The convergence OVERTO~. itf./'{/tiL'f mflhods fOl' sol ring linear of ine.ract systems, Cheb!Jshu lind :\Iath .. .j:l Numer. pp. ;j7l-;,)9:3. .J. S. GO~l EZ A~ D G.\IRES and ORTHO'\l1,V \lathematics Performance ~IORALES. 
of Chebyshev on a set of oil reservoir for Large Scale Computing, iterative simulation method. problems. in In J.e, Diaz, New York, Basel, 1989. pp, :26.)-:295. [7t] \\'. Itt-ratite HAcKBrsH. plied ~lathematical [7,,)] R. HA~B'\r". D. I'tgatf:ll itFmtil't of Large Sparse Science. Springer-Verlag, SILVESTER, A:'-iD .J. 8ollLtion ttchniquu5 TWH-:216, t'nin'rsity [7b] \'. Soll/tion A\D 111 fTl I ,,/gol'ithll1,-:; for the Tlllllluicf/1 sollLtion [Jroblt f1/8, [77] ~I. I. .1. for :'\umer. HE1\KE\SCHLOSS and Applied [78] ~I. HOLST. L. VICE\TE. "PQ algorithm8. intErior-point \Iathematics, ,\ flin!] ("qllation8. of coupled and 8fg- swirling flow, Tech. Rep. 1. H:\SBA\I. Segrf!]alfd finitF of l"I'gt- ...cah incompl'fs8iblt t!f- .llolt' Analysis of ine.ract trust-rf.gioll Tech. Rep. TR9,5-18. Dept. of Computational Rice l"niversity. /'Obl/st and e.ljicifllt L!l9..t:. A comparison ill Fluids .. L7 (l~)9:3), pp. :l2:3-:3..t:~. \leth. A:'-iD ,-\p- 199-t. E\GEUI.\\. HAHOI"TL"\IA\. of Equation8. 1994. for incompressible of \Ianchester. ~I. CHEW. Systems numerical 19!)'=). method for nonlinear proteill /Hod- ['jl)] C. ill ,\pplied [80] .T. Ituntit'f KELLEY. methods \Iathematics. I-:ILLOl"GH A:\D il/l'f:8ligatioll for linear SIA\L Philadelphia, \1. \VHEELER. Parallel of dO/lll/il/ decomposition SPE Symposium on Reser\"oir eqliation,.;;. in Frolltiers and nonlinear 199.5. iteratil'f linea/' equation solvers fo/' /'fst/'I'oir Simulation, SPE paper SO!vt/'8: ,-1/1 ,;;imlliation. in \inth no. L60:21. San .\ntonio. Texas. 198'j. [8L] K ..JBILOC H. A:\D .,;;ol1-ing systnl18 [82] H. KLlE. of iiI/wI' 1. R.-\\IE: -,.'/"1011";;of /lonlil/wr cllld ,\pplied [~I] -. \[ocleling .. -\\D "llel Compl\tation. Group Rice ("ni\'f~rsity, [S.} -. method..;; for A:\D mllltiphasE .. \ugust 19!H. for 8011'in.'7 L99,5. Russia. Tchebychu Center ill /TIll of Research It i-ph prorfdllrE itUfltio/l. \umer. for 1/ ""t on P,1I'- L')96. SOlliE problem..;; ill the thwr!) Thf 111 dll or/.", and applications of itemti/.'f.: II/( th- L969. ituatioll fo/' I/o/lsymmtfl'ic !inEn/' "'Y8tfTl1,";. .. :28 \ 11)11). pp, :3O'j-:32/. Adaptit't Tchtbychu DAWSO~. .-\ffiliates \leeting Tech. Rep. C'RPC-TR%6-11. Rice Cniwrsity. \IA\TEl"FFEL. \ tuner. \Iath .\"fleton-I\'rylol' Industrial C. So l'C'IE , S. \1. \\"HEELER. l\'rylov- . ;;ecant mtlhods ods. PhD thesis . .\O\·osibirsk, [So) T, fo/' \Iath .. (199,5), pp. 73-89. C. RA\IE, metllOr!. ..' /01' extrapolation p ,.u:o 1/ dit io 1/ f ,.." fo,. iI/ f ,ract Sf letO II /'t,.,;t/,/,oi,. . ,;imlliatioll. ETSOV. )i umer. rector f:qllatio/l";;. Tech. Rep. TR9,5-:27. Dept. of Computatiollal \Iathematics. T I/'o-,.,f agf [8,5] Y. Krz:\ \1. P/'fconditionfl's flolL' .... ill Suhsurfacl':' [8:3] H, KLlE. \1. eqliations. P.-\\'ARI\,O. \1. "'HEELER. Allnl!}si..;; of some SADOh:. t81il1lflting parameters ~Iath .. :31 (L978). pp, for the l8:3-208. nonsymmf/I'I(' Methods [88] G, MARCHUK, ics, Springer-Verlag, [89] .J. of Numerical Applications of \-Iathemat- 197.5. Thwry )'IARTi~EZ. Jlathematics, of 8ecrlT/I preconditioners. ~lath. of Computation, bO (199:3). pp. 699-,18. SOR-8ec(Jllt [90] -. [9l] C, \IATTAX Series, [92] .J, methods. A:'iD R. Richardson. ),IORE, S1:-\)'I.1. ~umer. Reservoir DALTON. Ana\.,:31 Simulation, pp, 21,-226. (l994), vol. 1:3, SPE-Monograph TX, 1990. .-l colltction of nonlinHl/' problems, ics, Vol. 26. E. Allgower and K. Georg. in Lectures eds .. American in Applied ~lathe\1lat- ~'lathematical Socif'ty. L990. pp. ,:2:3-,62. [9:3] R. ear ~ABBE:-; . Algebra. 
.-t new application for generali:ed proceedings L. Reichel. Computing. of t he conference A, Ruttan. and JI-matrices, in Linear in Numerical Algebra and R. Varga, eds .. Walter Lin- Scient ific de Gruyter. L~H):3. (!l-tJ :'\. :'\,\CHTIGAL. L, REICHEL rithlll for non8ymmtfric ,\\0 L. TREFETHE:\ linea/' ,"!J8tflll.... SL-\~I.J. . ~Iatrix .-l hybrid G.\IRES ,--\nal. .--\ppl.. 1:3 Ill.tjo- ([<)<)'2), pp. ,96--8:2,), [9.3] S. ~ASH. Newton-type Anal.. 21 (198·l). [96] -. minimi:ation ria the Lanc:os method, S1A\1 .1. \um. pp. ,70-778. Preconditioning of truncated-Sell·ton put.. 6 (l98,3). pp. ;399-616. methods. SIA~l .1. Sci, Stat. ('om- [9T] S. :\'ASH IIItlhod ,1. .\~D :\"OCEOAL. alld Ilu: trlll/cated Optim .. 1 (L99l\. [98] S. :\.\SH A:\D A nUT1Hrical stlldy .\"fll·ton mffhod for of Ihe limited BFGS' memory large 8cale problem..,. 51.-\:\1 .J, pro :3;');3-:3T2, _-\. SOFER. [illwr alld .vontinear programming, :\lcGra\\"- Hill. L996. [99] 0, \Iathematics. [LOO] :\. .T, Birkhallser fqllalion8. ThfOry :\OCEOAL. .T. :\OLE\ .-\:\0 ....tmi-i/TIplirif [l1):3] D. \'er!ag, Basel. 199:3. ~;):3- :!()(j, B. :\Ot"R-O\IID. for ,...Olilfioll of tlco-stage of algorithm8 itr:ratit'e for processes for solrill!} pp. -t60--l:69 . uncon.strained optimi::ation, in Acta University Press. ~ew York. 1991, pp. 199-2-I:2. BERRY, n .... u/'Oi,. pp. in Lectures ill for tineal' equations. .}, \lImer .. -\nal.. lO (l97:1), 51.-\\1 \ umerica, Cambridge [lO:2] of iterations Oil Iflr: COIINrgf/lcr: :\ICHOLS. linwr [LOL1 COll/'ugellrf :\EVA:\LI:\:\A. nst ,..imulalioll B. Itchlliquf8. PARLETT. of 1I01llill.U/I' }illilf on the 8lability ,\\0 (IUlIf R. /II and time-8tep Trans. TAYLOH . fquatioll,';;. 8fn8itirity SPE of .-\C\IE. :2,j:3 (L9T:3). .-l .\"elL'lon-Lallc::o., Computers of /TIt//IOt! and Strllct 111'1'''';, If) (L~)8:3). pp. 2-tl-~,,)2, [LO-l:j D. The bloc/..' conjugale O·LEARY. ear .-\lg, alld ,-\ppl.. [lO.s] .T, 111 [l06] D. ORTEGA Sutl'al .\\0 (UJ8L). \\". \ (/I'/(/b!f8. Ol"ELLETTE. :2~) algorithm and relatEd mEthdo.", Lill- pp. 29:3-:322. RHEI:\BOLOT. ltf:l'atiL'e Solution of .vonlinwr Equatio/l ... :\cademic Press. ~ew '{ork. 1970. Schul' pp, LST-29.S. (1980). gradient complEments and statistics. Linear .-\lg. and .-\ppl.. :3-1 [LOi] B. [lOS] PARLETT. A new look at the Lanc::os algorithm for solving symmetric tuns of linear equations. Linear Alg. and App\.. 29 (l980). :vI. A.'{DH. PER.'{ICE. L. tial differential ZHOU. equations Ctah Supercomputing [109] L. REICHEL. The application [110J Y. SAAD, sHeral [lll] - methods, mffhod of Computation, iteration linear systems Operator ods. \L Kaashoek . .1. \'am Schuppen. u'ith -t8 (1987), pp, 6.j1-662. . .-tn ot'erview of Krylov subspace methods with applications Scattering, and poly- (1991), pp. :389-4l-1:. for solving symmetric \[athematics lems. in Signal Processing, par- L994. of Leja points to Richardson On the Lanc::os of nonlinear Tech. Rep. TR-l8:....9,L Linear Alg. and Appl.. rJ-t-l56 right-hand-,.,ides. pp, :323-:J..l6. Parallei solution using ine.ract Newton Institute. Ilomial preconditioning, \VALKER. ."!I:;;- to control prob- Theory and Numerical :v[eth- and A. Ran. eds .. Birkhauser. 1990. pp. -Wl--ll0. [l1:2] -. A ./ie.l'ible inner-outer CompuL [11:3] -. preconditioned G.\fRES algorithm. SIA:vI .1. Sci. 14 (199:3). pp, -l61--t69. !tuatire ,\ltfhods for Sparse Linear Sy . ,tems, P\VS Publishing Company. L996. [ll-t] Y. SAAD A:\D:VI. SCHCLTZ. for soluing nonsymmtf,.ic GJlRES: A gellf:l'ali::fd minimal linear systems, residual algorithm S1:\:\I.1. Sci. Stat. 
[115] T. Saaty, Modern Nonlinear Equations, Dover, 1981.

[116] A. A. Samarskii and E. Nikolaev, Numerical Methods for Grid Equations, vol. II: Iterative Methods, Birkhauser Verlag, 1989.

[117] P. Saylor and D. Smolarski, Implementation of an adaptive algorithm for Richardson's method, Linear Alg. and its Appl., 154-156 (1991), pp. 615-646.

[118] V. Shamanskii, A modification of Newton's method, Ukran. Mat. Zh., 19 (1967), pp. 133-138. (In Russian.)

[119] H. Simon and A. Yeremin, A new approach to construction of efficient iterative schemes for massively parallel algorithms: variable block CG and BiCG methods and variable block Arnoldi procedure, in Sixth SIAM Conf. on Parallel Processing for Scientific Computing, R. Sincovec, D. Keyes, M. Leuze, L. Petzold, and D. Reed, eds., SIAM, 1993, pp. 57-69.

[120] V. Simoncini and E. Gallopoulos, A memory-conserving hybrid method for solving linear systems with multiple right-hand sides, Tech. Rep. CSRD-1103, Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, Feb. 1992.

[121] V. Simoncini and E. Gallopoulos, An iterative method for nonsymmetric systems with multiple right-hand sides, SIAM J. Sci. Comput., 16 (1995), pp. 917-933.

[122] V. Simoncini and E. Gallopoulos, Convergence properties of block GMRES for solving systems with multiple right-hand sides, Tech. Rep. CSRD-1316, Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, Oct. 1993.

[123] C. Smith, A. Peterson, and R. Mittra, A conjugate gradient algorithm for the treatment of multiple incident electromagnetic fields, IEEE Trans. Antennas and Prop., 37 (1989), pp. 1490-1493.

[124] D. C. Sorensen, Newton's method with a model trust region modification, SIAM J. Numer. Anal., 19 (1982), pp. 409-426.
\YALLIS. nf'ser\'Oir Incomplete (,'.\IRES gl'flditTlt SiIlllllation. \YEISF.R ....for ,\~D SPE pap<-,r 9 (l988). method. using householder f/'{/II,";- pp. L')2-16:3. Computer Phys\cs Commnllilli- Technil'id 1l0erS, S P [ paper [Ln] L, \\'IGTO\. dYlIlIlT/ie,o; flO, SPE Symposium L2:W-i. San Francisco. linf Communication. ,..ucCfsi/'e fIlUt!/j8i, ... Soc. of Pet. Eng .. probltm:;. ,\~D ,\nnual as a preconditioning in Seventh Pri\'ate \1. \YIIEEI.F.R. flliptie .1. \YHEELEIl jil/id mtlhod f!iminafion aCCf!tmfion. Two· .... 11 p !J/'Il'ol/'{itiollil/g, f, I'Inl'l G.\IRES Computing. gaussian !J/'Ob!f1I1,"; II thforftical [l-l-lj of fhf .1. "",\TTS . .-1 method of improl'il/fJ _-\, ITIffh- ·'):3i 19~9). pp. :311-:320, i::fd conjugaff. ~Il t; of some G;\IRES-like L60 ([fJ92). pp. 1:H-L62. of Scientific of flif A compari.'wn VORST. ImplffllfntatioT/s Implu1Ifntafiol/s :I:~!)] - 1992. VA:'-1 DER .... .Jollrnal cations. [110] H. Tech. Rep. TR!J2--L:2. TI~chll()II)!!;- with C,\IRESR, .\Ig, and its ,-\ppl.. \YALh:ER. formation [l:~~] .1. of Delft. .-\~D orf,,,. Linear f,l'ptrienn.,,; R. S~"TH. Conference no. L9801. Sail ,\ntonio. D, Yl'. A~D :\. YOl'~G, f'Of!t,~, in Proceedings 199:3, in ani8ofmpi,. J.. (L07:3). pp. LO,~-118, of b!ocJ.:-Cfllfufd Simulation and Exhibition G.\IRES ./inih IliF- pp. :3;')1:\7·), on a hypercube. of the Society Texas. 19tn. Texas. ,I. \Ull1er. ,\nal.. :(') (Lt)~~). Rf:-;fl'l'oil' gfn(:/'{/I- on :\umerical orerrda.ration 011 ("Ol/I'ugfnf'( ~I.\:\l for of Petroleum in ()-~th Engi- 1089. flcct/eration L!)b;'),-\1.-\.-\ Conference, of complltfltio/lol Del1\"er. CO. Iq',-l. [l-1-t] of preconditioned iterative PhD thesis, Dept. of Computer Science, 1]. YANG, tUTl8. Champaign. [l-t,)] D, YOC:'\G. [l-t6] S. ZE:-iG. A family "olvas r niversity fteratilJe Solution C. VeIn, AND equations of Large Linear Systems. P. \VESSELING, Solution in generaL coordinates grid method8. Tech. Rep. TR~):3-6-t, Technological Z. ZORA:'-'; linear S.1/8- of Illinois. ("rbana- 199.5. ,Yarler-Stokf8 [l-li] for sparse AND X. SHEN. Academic Press, 1971. of the incomprE88iblf by I{ryLov subspace university and fT/lllti- of Delft, 199:3. Pnrallel aLgorithms for optimaL controL of large scale linea,. sY8tem8. Springer- Verlag, 1993. Glossary 2SAdd Two-stage addi t i\·e. 2SBJ Two-stage block .Jacobi, 2SComb Two-stage Combinative. 2SDP Two-stage discrete projection. 2SGS Two-stage block Gauss Seidel. 2SMuit Two-stage multiplicative. ABF .-\lternate EN Eirola-:\ evanlinna. HKS Hybrid Krylov-secant. HKS-B Hybrid Krylov-secant based on Broyden's method. HKS-EN Hybrid Krylov-secant based on the Eirola-Nevanlinna HKS-N Hybrid Krylov-secant based on :'-iewton's method, HOKN Higher-order Il\IPES Implicit press1Il'es-explicit saturations. KEN I\:rylov- Eirola- \ f'\'anlinna. MHGwIRES :\[odified hybrid G:\IRES. NEN \onlinear block factorization. 1-': ry 10\'- \ ewt on . Eirola-~e\'anlinna. algorithm.