# Methods to Find Global Minima

Unless an analytical solution exists, which is generally not the case for experimentally measured data with any noise, the only method guaranteed to find the global minimum of an optimisation problem is a complete search of the parameter space. Since many of the parameters may be continuously varying this may appear impossible, but it is almost always possible to define a vector of sufficiently small, but non-zero, smallest discrete parameter value changes such that adequately sampled grid search may be performed. (More sophisticated methods may vary the grid spacing throughout the parameter space or perform multiple searches with differing grid spacings.) An example of this type of parameter space search is given in Wess, Hammersley, Wess & Miller (1998) for the problem of determining the molecular packing of type 1 collagen from limited experimental data.

With the recent and continuing phenomenal increase in computing power, the size of problems which may be tackled in this brute force manner has vastly increased, but there remain many problems where the size of the parameter space to be searched is so enormous that such an approach is and will remain impractical.

Whenever possible the best manner to overcome such problems by setting good starting values (Ghosh, 1997). If the model can be initialised in the region of the global optimum, then any of a variety of simple ``descent'' algorithms will find the global minimum, given enough iterations. The primary difference in the various algorithms is the rate of ``descent'' under differing conditions e.g. ``saddle points''. Only when good initialisation is not possible, do more complicated and time consuming algorithms such as simulated annealing and genetic algorithms serve a useful purpose, and still they cannot guarantee to find the global minimum.

To de-mystify these terms ``good initialisation'', ``local minima'', and ``global minimum'' a simple 1-D example will be given. Imagine there is a row of peaks of the same peak shape with a regular spacing between them. The ``objective function'' i.e. the function to be minimised, will be some function of the difference between the observed data values and the predicted model of the regular peaks. This could be as simple as the sum of the absolute differences, or the sum of the squared differences, or may be a sum of differences weighted according to some assumed error model (this is discussed below).

If the initialisation starts with approximately the right peak shape, but with double the spacing, and roughly in phase, we could expect a ``descent'' algorithm to obtain good estimates of the values of the peak shape parameters, but to converge on a value for the spacing which is twice the correct value. Every other peak will have been missed so the value for the objective function will be greater than it could be. The algorithm has fallen into a ``local minima'' and not the ``global minimum'' where the sum of differences would be much smaller. We could hope that a simulated annealing algorithm might randomly be able to jump out of this local minima and find the global minimum. However, this is not guaranteed, and must take much longer. With rudimentary graphical input of the initialisation model and with graphical display of the model at convergence this problem can clearly be overcome.

Even with much more complicated models, graphical initialisation and display is often a much better method for obtaining global minimums than more complicated algorithms. Nevertheless there is a class of problems where this is not possible and algorithms such as simulated annealing and genetic algorithms can be relevant.

User graphical input with immediate feed-back is used to initialise most of the fitting models, although sometimes automatic algorithms are used to obtain good initial estimates for simple well defined problems such as fitting a direct beam mark. The results of all model fitting are displayed graphically. The model values are available after fitting for further analysis e.g. in the case of 2-D data the model can be subtracted from the experimental data to investigate structure in the residuals.

Next: Parameter Scaling Up: Model Fitting Within FIT2D Previous: Model Fitting Within FIT2D

Andy Hammersley
6/11/1998