USF

NEWS FROM THE UPPSALA SOFTWARE FACTORY - 6

Making the most of your search model

Gerard J. Kleywegt
Department of Molecular Biology
Biomedical Centre, Uppsala University
Uppsala - Sweden

In cases with high sequence homology, Molecular Replacement (MR) is usually a trivial exercise. But, as the structural homology gets lower and lower, solving the structure by means of MR becomes increasingly difficult. Conventional wisdom holds that an RMSD of about 1.5Å between the search model and the actual structure constitutes the limit at which MR can still be used. Fortunately, the MR programs get better, and crystallographers think of new and clever tricks, thereby pushing the limits of the method further and further. On the program end, Axel Brünger has introduced PC-refinement into X-PLOR [1], which is a useful tool to sort the rotation function boys from the rotation function men, and to improve the orientation of the entire molecule or of individual domains prior to the translation function. More recently, he has introduced a direct rotation function [2] which may help in cases where other approaches fail. Liang Tong has developed a locked rotation function [3], which uses knowledge of the rotational relationships between NCS-related molecules to increase the signal-to-noise ratio of the rotation function. More recently, he has also implemented a locked translation function [4]. Jorge Navaza, finally, has carefully redone the mathematics of the fast rotation and translation functions, and his package AMoRe [5] has solved numerous problems which could not be solved by other programs.
In this article we will address some of the tricks that the crystallographer can use in order to produce a more suitable search model in cases where the standard approach is not immediately successful. We shall also discuss some software tools that can be of help in this process.

* Which parts of the model are likely to be conserved?
MR is based on a comparison of the set of intra-molecular vectors (rotation function) and inter-molecular vectors (translation function) calculated from the search model on the one hand, and the Patterson function (or peaks) calculated from the data pertaining to the unknown structure on the other. Therefore, every atom (or rather: atom pair) in the search model that is roughly in the correct position will contribute to the signal, whereas every atom in a wrong position will, at best, contribute to the noise and, at worst, contribute to false signals. If the MR is not immediately successful, a critical appraisal of which parts of the search model are likely to be conserved is in order. The crystallographer can use a large set of heuristics and tools to aid in this process:

Parts of the model which are not likely to be conserved at all can be removed; if only the sidechain conformation is uncertain, residues can be cut back to the CB or CG/OG/SG atom. Temperature factors can either be set to a uniform value, or they can be retained. The latter is probably to be preferred in most cases, since it automatically downweights the contribution of regions with high temperature factors. An exception can be made if inspection of the temperature factors shows them to be completely unreliable (there are structures in the PDB which have been refined without temperature-factor restraints at resolutions as low as 3Å [6], the result being amusing to some, but not particularly useful). Another option is to multiply all temperature factors by a constant in order to reproduce the overall B-factor obtained from a Wilson plot, while retaining the pattern of low and high temperature factors within the molecule.
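The two edits described above (cutting sidechains back and rescaling temperature factors) can be sketched in a few lines of Python. This is only an illustration, not the way any of the programs mentioned in this article do it: the function names, the `atom_line` helper and the Wilson B value in the example are assumptions, and the sketch only understands plain fixed-column PDB ATOM records.

```python
BACKBONE = {"N", "CA", "C", "O"}
GAMMA = {"CG", "OG", "SG"}  # the "CB or CG/OG/SG" rule; branched CG1/OG1 are not handled


def atom_line(serial, name, resname, resseq, x, y, z, b):
    """Format a minimal fixed-column PDB ATOM record (hypothetical helper)."""
    return (f"ATOM  {serial:5d}  {name:<3s} {resname:>3s} A{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{b:6.2f}")


def trim_sidechains(pdb_lines, keep_gamma=False):
    """Keep only ATOM records for backbone, CB and (optionally) CG/OG/SG atoms.

    Anything that is not an ATOM record (HETATM, waters, headers) is dropped.
    """
    keep = BACKBONE | {"CB"} | (GAMMA if keep_gamma else set())
    return [ln for ln in pdb_lines
            if ln.startswith("ATOM") and ln[12:16].strip() in keep]


def scale_b_factors(atom_lines, wilson_b):
    """Multiply all B-factors by one constant so that their mean equals wilson_b.

    Assumes every line is an ATOM record with the B-factor in columns 61-66.
    The pattern of low and high values within the molecule is retained.
    """
    bs = [float(ln[60:66]) for ln in atom_lines]
    factor = wilson_b / (sum(bs) / len(bs))
    return [ln[:60] + f"{float(ln[60:66]) * factor:6.2f}" + ln[66:]
            for ln in atom_lines]


# Example: one serine plus a stray phenylalanine ring atom.
model = [
    atom_line(1, "N", "SER", 1, 0.0, 0.0, 0.0, 10.0),
    atom_line(2, "CA", "SER", 1, 1.5, 0.0, 0.0, 10.0),
    atom_line(3, "C", "SER", 1, 2.0, 1.4, 0.0, 10.0),
    atom_line(4, "O", "SER", 1, 1.3, 2.4, 0.0, 10.0),
    atom_line(5, "CB", "SER", 1, 2.0, -0.8, 1.2, 20.0),
    atom_line(6, "OG", "SER", 1, 3.4, -0.9, 1.1, 30.0),
    atom_line(7, "CD1", "PHE", 2, 5.0, 0.0, 0.0, 40.0),
]
search_model = scale_b_factors(trim_sidechains(model), wilson_b=30.0)
```

Scaling by a single constant is deliberately the simplest choice here; it changes the overall B-factor without flattening the differences between well-ordered and poorly ordered regions.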

* When the going gets tough ...
In the case of non-crystallographic symmetry (NCS), MR becomes more difficult because a monomeric search model constitutes less and less of the scattering matter in the asymmetric unit as there are more and more molecules in it. If one is lucky enough to have a tetrameric search model, and the data indicate that the unknown might form similar tetramers, the MR can of course be carried out with the intact tetramer. If the rotation function fails to give solutions, PC refinement of the individual monomers may be of help. If that fails too, a dimer can be used, and only if that fails to give a solution as well should a single monomer be tried. If the search model is monomeric, but the unknown is not, the locked rotation and translation functions may be of use.
In the case of multiple-domain structures, MR calculations can be carried out with the intact molecule, with different subsets of the domains, and with individual domains. PC refinement may be necessary in these cases before attempting to solve the translation function.
Nowadays, different crystal forms of a protein are often obtained. Clearly, it is worthwhile to try and solve each of these separately. Usually, the one with the best data and the smallest degree of NCS can be solved first. Since the starting phases after MR are often very poor, it is worthwhile to solve one or more of the other crystal forms as well, so that multiple-crystal electron-density averaging can be used to improve the maps.

* Multiple models
One "trick" which we have found to work very well in cases where all attempts to solve the structure had failed is to use multiple, superimposed search models (see also [9]). Such models are often available:

This approach works surprisingly often, probably because it implicitly weights the well-conserved parts of the models more heavily than the more variable (or more poorly determined) parts. Nevertheless, in practice it is often necessary to combine the use of multiple models with the editing out of parts which show a large conformational spread. The approach that has worked for us on several occasions is to superimpose all available models, remove any obvious outliers (in particular for NMR ensembles), remove regions of large variability, and cut back all sidechains to CB or CG. Also, we have always used uniform temperature factors since (a) the use of individual temperature factors would be tautological for multiple models, (b) Bs from different X-ray structures are not well comparable, and (c) NMR models don't have Bs associated with them at all.
When using multiple models, the contrast between correct solutions and incorrect ones is usually very low, so that it is very important to ascertain the correctness of solutions. Also, the correct solutions may be far down the list: for ACBP, rotation function solution 41, and translation function solution 21 turned out to be the correct ones ...
It is rather puzzling why it should be so difficult to solve MR problems when a 100% homologous NMR structure is available. In the case of ACBP the problem turned out to be a rather large RMSD of ~2.2Å on CA atoms. The differences, however, were not random: the helices had all undergone rigid-body translations along their axes compared to the NMR model.
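To see how quickly such non-random differences add up, note that a rigid shift of a helix by d Å along its axis moves every atom by exactly d Å, so the RMSD for that helix is d. The Python sketch below checks this on an idealised helix (CA radius 2.3 Å, rise 1.5 Å and rotation 100 degrees per residue). The 2.0 Å shift is an arbitrary illustration, not a value measured for ACBP, and the function names are assumptions.

```python
import math


def ideal_helix_ca(n_residues, radius=2.3, rise=1.5, turn_deg=100.0):
    """CA positions of an idealised alpha-helix whose axis runs along z."""
    coords = []
    for i in range(n_residues):
        angle = math.radians(i * turn_deg)
        coords.append((radius * math.cos(angle),
                       radius * math.sin(angle),
                       i * rise))
    return coords


def shift_along(coords, axis, shift):
    """Translate all points rigidly by `shift` along the unit vector `axis`."""
    return [tuple(c + shift * a for c, a in zip(p, axis)) for p in coords]


def rmsd(coords_a, coords_b):
    """RMSD between two equally long coordinate lists, without refitting."""
    sq = [math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b)]
    return math.sqrt(sum(sq) / len(sq))


helix = ideal_helix_ca(12)
shifted = shift_along(helix, axis=(0.0, 0.0, 1.0), shift=2.0)
deviation = rmsd(helix, shifted)  # equals the shift, up to rounding
```

The helix itself is unchanged internally, which is why its individual geometry looks fine even though its atoms no longer coincide with those of the reference model.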

* Tools
Various programs from Uppsala can be used in the process of producing search models:

(1) O [10]:
(2) MOLEMAN2 [11], a trivial but useful MOLEcule MANipulating program:
(3) LSQMAN [6], a superpositioning and (NCS-) analysis program:
(4) SEAMAN [11], a program specifically written for SEArch-model MANipulation which handles multiple models as well as single models:

Other Uppsala tools that may be of use for MR work in general include:

* AVAILABILITY
All programs other than O are available to academic users free of charge (from the O ftp server). For more information about these programs, contact GJK. For more information about O, contact T. A. Jones (alwyn@xray.bmc.uu.se).

* REFERENCES


USF Latest update at 12 February, 1998.