
Secant Method

  Comparing the iterative formula of the Steepest Descent (4.5) with (4.22) shows that the Steepest Descent method is identical to the Newton-Raphson method when \(\mathbf{A} = \mathbf{I}\), \(\mathbf{I}\) being the unit matrix. This is the basis for the Secant method, also known as the Variable Metric method.
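
For reference, the two iterations may be written side by side; the forms below are restated from the earlier sections and are a reconstruction of (4.5) and (4.22) rather than a verbatim copy:

  \mathbf{x}_{k+1} = \mathbf{x}_k - \alpha_k \nabla f(\mathbf{x}_k)                 (Steepest Descent)
  \mathbf{x}_{k+1} = \mathbf{x}_k - \alpha_k \mathbf{A}^{-1} \nabla f(\mathbf{x}_k)   (Newton-Raphson)

so the two steps coincide precisely when \(\mathbf{A} = \mathbf{I}\).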

As seen in the previous subsection, the Hessian matrix needed in the Newton-Raphson method can be laborious both to calculate and to invert for systems with a large number of dimensions. The idea of the Secant method is not to use the Hessian matrix directly, but rather to start the procedure with an approximation to the matrix which, as one gets closer to the solution, gradually approaches the Hessian.

The earliest Secant methods were formulated in terms of constructing a sequence of matrices \(\mathbf{H}_k\) which builds up the inverse Hessian; that is,

  \lim_{k \to \infty} \mathbf{H}_k = \mathbf{A}^{-1} .                                   (4.23)


It is even better if the limit is achieved after n iterations rather than infinitely many. The iterative formula is then

  \mathbf{x}_{k+1} = \mathbf{x}_k - \alpha_k \mathbf{H}_k \mathbf{g}_k ,                  (4.24)


where \(\mathbf{g}_k = \nabla f(\mathbf{x}_k)\) is the gradient. Later, methods that construct the ordinary Hessian instead were presented. The inverse Hessian approximation might seem to offer an advantage in terms of the number of arithmetic operations required per iteration, since the Hessian update requires the matrix to be inverted at each step. This apparent defect was eliminated by modifying the Cholesky factors of the Hessian at each iteration rather than the Hessian itself [GMW]. The number of arithmetic operations is then the same for the two methods. Both methods are in use today, the choice being a matter of convenience and taste. The inverse Hessian update will be used in this thesis, and therefore only this method will be presented in the following. For details on the Hessian update, see [GMW, DS].
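
A small sketch (plain NumPy, not code from the thesis) contrasts the Newton-Raphson step, which must solve a linear system with the full Hessian at every iteration, with the secant step (4.24), which only needs a matrix-vector product with the current inverse-Hessian approximation:

```python
import numpy as np

def newton_step(x, g, A, alpha=1.0):
    """One Newton-Raphson step: requires solving A d = g at each iteration."""
    return x - alpha * np.linalg.solve(A, g)

def secant_step(x, g, H, alpha=1.0):
    """One secant step, equation (4.24): x_{k+1} = x_k - alpha_k H_k g_k."""
    return x - alpha * H @ g
```

When \(\mathbf{H}_k\) equals the exact inverse Hessian the two steps coincide; the updates discussed below make \(\mathbf{H}_k\) approach \(\mathbf{A}^{-1}\) without ever forming or inverting \(\mathbf{A}\).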

For an n-dimensional quadratic function, the inverse Hessian will be obtained after at most n iterations, and the solution will then, as we saw in Subsection 4.3.3, be located in only one iteration. Hence, the solution is reached after at most n iterations. In other words, this method resembles the method of Steepest Descent near the starting point of the procedure, while near the optimal point it metamorphoses into the Newton-Raphson method.

The Secant method is in several respects quite similar to the Conjugate Gradients method. Both converge after n iterations for an n-dimensional quadratic function, both accumulate information from successive iterations, and both require the calculation of only the first derivative. (The latter makes the Secant method part of category (2) introduced earlier.) The main difference between the two methods is the way they store and update this information: the Secant method requires an \(n \times n\) matrix, while the Conjugate Gradients method only needs an n-dimensional vector. Usually, the storage of the matrix in the Secant method is no disadvantage, at least not for a moderate number of variables.

So how is \(\mathbf{H}_k\) updated at each iteration? There are two main schemes [GMW], namely Davidon-Fletcher-Powell (DFP) and Broyden-Fletcher-Goldfarb-Shanno (BFGS). Let us first consider the DFP algorithm. At a point \(\mathbf{x}_k\), the approximated inverse Hessian at the subsequent point is given by [Press]

  \mathbf{H}_{k+1} = \mathbf{H}_k
                   + \frac{\boldsymbol{\delta}_k \boldsymbol{\delta}_k^T}{\boldsymbol{\delta}_k^T \boldsymbol{\gamma}_k}
                   - \frac{\mathbf{H}_k \boldsymbol{\gamma}_k \boldsymbol{\gamma}_k^T \mathbf{H}_k}{\boldsymbol{\gamma}_k^T \mathbf{H}_k \boldsymbol{\gamma}_k} ,        (4.25)


where \(\boldsymbol{\delta}_k = \mathbf{x}_{k+1} - \mathbf{x}_k\) and \(\boldsymbol{\gamma}_k = \mathbf{g}_{k+1} - \mathbf{g}_k\), \(\mathbf{g}\) being the gradient. This expression is constructed by forcing the \(\boldsymbol{\delta}_k\)-s to be mutually \(\mathbf{A}\)-conjugate. Even though the derivation of equation (4.25) assumes a quadratic function, it applies in essence to all kinds of functions. Another property of (4.25) is that if \(\mathbf{H}_0\) is positive definite, it will remain so for each k; i.e. \(\mathbf{H}_k\) is a sequence of positive definite matrices. For proofs and a detailed derivation of (4.25), see [Walsh].
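
A sketch of the DFP update (4.25), assuming NumPy arrays; here delta is \(\boldsymbol{\delta}_k\) and gamma is \(\boldsymbol{\gamma}_k\) as defined in the text:

```python
import numpy as np

def dfp_update(H, delta, gamma):
    """Return H_{k+1} from H_k via the Davidon-Fletcher-Powell formula (4.25)."""
    Hg = H @ gamma                                       # H_k gamma_k
    return (H
            + np.outer(delta, delta) / (delta @ gamma)   # delta delta^T / (delta^T gamma)
            - np.outer(Hg, Hg) / (gamma @ Hg))           # H gamma gamma^T H / (gamma^T H gamma)
```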

  [Algorithm 4.4: The Secant method with the BFGS update and backtracking.]

The BFGS algorithm was later introduced to improve some unwanted features of the DFP formula, concerning, for example, roundoff errors and convergence tolerances. The BFGS updating formula is exactly the same, apart from one additional term [Press]:

  \mathbf{H}_{k+1} = \mathbf{H}_k
                   + \frac{\boldsymbol{\delta}_k \boldsymbol{\delta}_k^T}{\boldsymbol{\delta}_k^T \boldsymbol{\gamma}_k}
                   - \frac{\mathbf{H}_k \boldsymbol{\gamma}_k \boldsymbol{\gamma}_k^T \mathbf{H}_k}{\boldsymbol{\gamma}_k^T \mathbf{H}_k \boldsymbol{\gamma}_k}
                   + \left( \boldsymbol{\gamma}_k^T \mathbf{H}_k \boldsymbol{\gamma}_k \right) \mathbf{u}_k \mathbf{u}_k^T ,        (4.26)


where \(\mathbf{u}_k\) is defined by

  \mathbf{u}_k = \frac{\boldsymbol{\delta}_k}{\boldsymbol{\delta}_k^T \boldsymbol{\gamma}_k}
               - \frac{\mathbf{H}_k \boldsymbol{\gamma}_k}{\boldsymbol{\gamma}_k^T \mathbf{H}_k \boldsymbol{\gamma}_k} .        (4.27)


The BFGS formula is usually to be preferred [GMW].
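
A corresponding sketch of the BFGS update (4.26) with the vector \(\mathbf{u}_k\) of (4.27), again assuming NumPy; it is the DFP formula plus the additional term \((\boldsymbol{\gamma}_k^T \mathbf{H}_k \boldsymbol{\gamma}_k)\,\mathbf{u}_k \mathbf{u}_k^T\):

```python
import numpy as np

def bfgs_update(H, delta, gamma):
    """Return H_{k+1} from H_k via the Broyden-Fletcher-Goldfarb-Shanno formula."""
    Hg = H @ gamma
    dg = delta @ gamma            # delta_k^T gamma_k
    gHg = gamma @ Hg              # gamma_k^T H_k gamma_k
    u = delta / dg - Hg / gHg     # the vector u_k of equation (4.27)
    return (H
            + np.outer(delta, delta) / dg
            - np.outer(Hg, Hg) / gHg
            + gHg * np.outer(u, u))
```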

The step length \(\alpha_k\) in equation (4.24) is determined as for the Newton-Raphson method; i.e. either by a line search, by backtracking, or by choosing a small initial value for \(\alpha_k\) and gradually changing it to unity as the solution is approached. The algorithm for the BFGS update with backtracking is given in Algorithm 4.4.
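
The sketch below shows one secant iteration with backtracking on the step length \(\alpha_k\) in (4.24). The sufficient-decrease test and the constants rho and c are illustrative assumptions; this is not a verbatim transcription of Algorithm 4.4.

```python
import numpy as np

def backtracking_step(f, grad, x, H, alpha0=1.0, rho=0.5, c=1e-4):
    """Shrink alpha until f decreases sufficiently along the secant direction."""
    g = grad(x)
    p = -H @ g                     # search direction of equation (4.24)
    alpha = alpha0
    while f(x + alpha * p) > f(x) + c * alpha * (g @ p):
        alpha *= rho               # backtrack: reduce the step length
    return x + alpha * p           # p is a descent direction since H is positive definite
```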

The Secant method has become very popular for optimization: it converges fast (at best superlinearly), it is stable, and it spends relatively modest computing time on each iteration. The method is often preferred over the Conjugate Gradients method, but the latter has an advantage for systems with a large number of dimensions (e.g. 10,000). Since both the DFP and the BFGS updating formulas are guaranteed to preserve the positive definiteness of the initial approximation of the inverse Hessian (assuming that no roundoff errors result in a loss of positive definiteness), the Secant method will only be able to locate minimum points.

