USF

NEWS FROM THE UPPSALA SOFTWARE FACTORY - 3

OOPS-a-daisy

Gerard J. Kleywegt & T. Alwyn Jones
Department of Molecular Biology
Biomedical Centre, Uppsala University
Uppsala - Sweden

Rebuilding a protein structure into an electron density map is a tedious chore. Even assuming that the chain has been traced correctly, there remains the danger of smaller, local errors in the structure [1], such as a poor fit of the model to the data and poor stereochemistry. When rebuilding a model, for example with O [2], one has to consider for each residue in turn whether it fits the density, whether it has reasonable side-chain geometry, whether it has favourable phi/psi angles, whether it has atoms with unusually high temperature factors, whether the peptide is close to planar and has its oxygen atom pointing in the right direction, and so on.
Collecting and integrating all the necessary information is time-consuming and prone to oversights, especially if one has to rebuild several hundred residues. In addition, once the model is reasonably well refined, most residues will be okay. This means that, in most rebuilding rounds, the time is better spent inspecting only a small fraction of the residues, namely the ten percent (or so) that are bad or suspect. In order to speed up and facilitate the rebuilding process, we have written a program, called OOPS, which automates most of the work involved in managing all quality-related information. To prevent re-invention of a number of wheels, OOPS performs only a few error checks itself; most of the quality indicators it checks should be calculated with O first. In addition, it is fairly simple to include one's own criteria. The most interesting part of the program's output is a set of O macros which take the user on a journey along all bad or suspect residues. For each such residue, the macros tell the user what is wrong with it. In this fashion, rebuilding can be sped up by a factor of two to five, depending on the quality of the current model.

DATABLOCKS
The input to OOPS consists of a number of O datablocks plus some input from the user. For some checks, a PDB file is required as well. In addition, output on a per-residue basis from other programs can be used (if it comes in the form of an O datablock).
So, what's an O datablock? O stores all its data in the internal database in the form of datablocks [3]. These are one-dimensional arrays (vectors) of integer, real, character (*6) or text values. For example, the residue names of a molecule are stored in a character datablock:


M9A_RESIDUE_NAME          C        159 (1x,5a)
 A5    A6    A7    A8    A9   
...
 A160  A161  A162  A301  

O contains several utilities for assessing protein quality on a per-residue basis (see below). For instance, the "RSC_fit" (Rotamer-Side-Chain fit) command calculates for every residue (except Gly and Ala) the RMS distance between its side-chain atoms and the corresponding atoms of the rotamer that is most similar to it [3,4]. If this number is greater than ~1.5 Å, the residue has a side-chain conformation which differs significantly from all of the known rotamers (derived from a database of well-refined, high-resolution structures) for that residue type. This means that the residue merits closer scrutiny [4]: either its conformation is "real", or it is in error (e.g. due to a rebuilding error or, more seriously, over-fitting of the data at low resolution). The results of the calculations are stored in a datablock, one real number per residue:


M9A_RESIDUE_RSC           R        159 (9(x,f7.4))
  1.7662  0.8072  0.0000  0.4455  0.8154  0.1031  0.4413  0.0000  0.5567
  0.1751  0.3660  0.0000  0.3071  0.4685  1.7608  0.0000  1.0607  0.3184
  0.1695  0.3179  0.1571  0.3836  0.0000  0.2108  1.5435  1.5919  0.7658
...
  0.8385  0.0000  0.2209  0.1567  0.8266  1.3797  0.7248  0.1913  0.2502
  1.3568  0.4881  1.4166  0.9104  0.0000  0.0000
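
For readers who want to inspect such datablocks outside O, the layout shown above (a header line with name, type, number of values and a Fortran format, followed by the values) can be read with a few lines of Python. This is only a sketch: it assumes that the values are separated by whitespace, as they are in the two examples above, and the type code "I" for integer datablocks is an assumption (only "C" and "R" appear in this text). The function name parse_datablock is hypothetical.

```python
def parse_datablock(text):
    """Parse one O datablock given as text; return (name, type_code, values).

    Assumes whitespace-separated values; the Fortran format in the
    header is kept out of the parsing and simply ignored.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    name, type_code, count, _fmt = lines[0].split(None, 3)
    tokens = " ".join(lines[1:]).split()
    if type_code == "R":          # real values, as in the RSC example
        values = [float(t) for t in tokens]
    elif type_code == "I":        # assumed code for integer datablocks
        values = [int(t) for t in tokens]
    else:                         # character data: keep the strings
        values = tokens
    if len(values) != int(count):
        raise ValueError(f"{name}: expected {count} values, got {len(values)}")
    return name, type_code, values


# A shortened, illustrative datablock (only four values, not the full 159).
example = """M9A_RESIDUE_RSC           R          4 (9(x,f7.4))
  1.7662  0.8072  0.0000  0.4455
"""
name, type_code, values = parse_datablock(example)
print(name, type_code, values)
```

The check on the number of values is a cheap safeguard against truncated files.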

OOPS can handle up to ten user-defined criteria. The only requirements are that the data is in the form of a datablock with one (integer or real) number per residue, and that the distinction between good and bad residues is of the type: "the residue is bad, if the value of X is less (or greater) than some cut-off". Another utility program available to O users (called ODBMAN) can be used to extract O datablocks from the output of other programs such as X-PLOR [5] and PROCHECK [6]. Such datablocks can then be used in conjunction with OOPS. Examples are the number of bad contacts which can be extracted from PROCHECK output, and the "conformational energy" of the residues, which can be calculated with X-PLOR. One could also use OOPS in NMR work, for example by providing it with the number of constraint/restraint violations per residue, or even simply the number of NOEs per residue (if this number is low, the residue is not well defined by the data; this information could then be used to decide to replace "chimeric" side-chain conformations by rotamers).
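
The cut-off rule for user-defined criteria ("the residue is bad, if the value of X is less (or greater) than some cut-off") amounts to a simple comparison per residue. The sketch below illustrates the idea only; it is not the OOPS source code, the function and parameter names are hypothetical, and the pairing of residue names with RSC values is for illustration.

```python
def flag_residues(names, values, cutoff, bad_if="greater"):
    """Return (name, value) pairs for residues that violate the cut-off.

    bad_if="greater": a residue is bad if its value exceeds the cut-off;
    bad_if="less":    a residue is bad if its value is below the cut-off.
    """
    if len(names) != len(values):
        raise ValueError("need exactly one value per residue")
    if bad_if not in ("greater", "less"):
        raise ValueError("bad_if must be 'greater' or 'less'")
    flagged = []
    for name, value in zip(names, values):
        is_bad = value > cutoff if bad_if == "greater" else value < cutoff
        if is_bad:
            flagged.append((name, value))
    return flagged


# Illustrative data: RSC-like values, flagged above the ~1.5 A cut-off.
residue_names = ["A5", "A6", "A7", "A8"]
rsc_values = [1.7662, 0.8072, 0.0000, 0.4455]
bad_rsc = flag_residues(residue_names, rsc_values, cutoff=1.5)
print(bad_rsc)
```

A criterion such as the number of NOEs per residue would use bad_if="less" instead, since there a low value marks the residue as poorly defined.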

QUALITY CHECKS
At present, OOPS can check the following quality indicators (in addition to the ten user-definable criteria; note that some of the criteria are specific for and limited to amino-acid residues):

(1) Bad pep-flips [2] (a measure of the distance between a peptide oxygen orientation and those encountered in the database). Typically, residues with values exceeding ~2.5 Å should be inspected more closely (either they are wrong, or they have an unusual peptide orientation for a reason)

(2) Bad RS-fit values [1,2] (the correlation between calculated and 2Fo-Fc density for any or all atoms in a residue); values lower than ~0.6 indicate poor density. RS-fit values may be checked for all atoms, main-chain atoms alone and side-chain atoms alone

(3) Bad RSC values (see above)

(4) Mask errors (i.e., if one uses real-space averaging, this checks which atoms are not covered by the current mask, given a certain radius)

(5) Too high and too low temperature factors and occupancies

(6) Bad phi/psi angle combinations [7]

(7) Poor peptide planarity (by calculating the improper twist angle C(i) - Ca(i) - N(i+1) - O(i), which is 0.0 for planar peptide groups)

(8) Poor Ca chirality [6] (by calculating the improper twist angle Ca(i) - N(i) - C(i) - CB(i) which should be ~33.9° for non-Gly, non-Pro residues)

(9) Bad QualWat values [8] (water molecules only). This is a combined measure of quality, incorporating the occupancy (Q) and temperature factor (B) of the water oxygen atoms and the resolution of the data (D): QualWat = 100 * Q * EXP(-B/(4D^2)). This quantity will be zero for absent, and 100 for perfect water molecules

(10) Bad contacts (requires output from PROCHECK [6])
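
Two of the checks in the list above are easy to reproduce by hand: the QualWat formula of item (9) and the improper twist angles of items (7) and (8). The following is a minimal sketch, not the OOPS implementation. It assumes that Q is the occupancy, B the temperature factor and D the resolution, and it uses a common torsion-angle formula whose sign convention may differ from the one used by OOPS. All function names and the test coordinates are hypothetical.

```python
import math


def qualwat(occupancy, b_factor, resolution):
    """QualWat = 100 * Q * exp(-B / (4 * D^2))."""
    return 100.0 * occupancy * math.exp(-b_factor / (4.0 * resolution ** 2))


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def twist_angle_deg(p1, p2, p3, p4):
    """Torsion angle p1-p2-p3-p4 in degrees, in the range (-180, 180]."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n2 = _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(_cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x))


# A perfect water (full occupancy, B = 0) scores 100; an absent one scores 0.
print(qualwat(1.0, 0.0, 2.0), qualwat(0.0, 20.0, 2.0))

# Idealised, exactly planar peptide group (made-up coordinates):
# C(i) at the origin with Ca(i), N(i+1) and O(i) around it in the xy-plane.
c, ca = (0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)
n_next, o = (0.5, -0.866, 0.0), (0.5, 0.866, 0.0)
planarity = twist_angle_deg(c, ca, n_next, o)
print(planarity)
```

With this atom order a planar peptide group gives a twist angle of zero, in line with item (7); deviations from zero measure the non-planarity.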

OUTPUT
The output of OOPS consists of:

(1) statistics for most of the used quality indicators, e.g.:


 ***************************************************************************
  Analysis of Pep-flip values (>0)
 ***************************************************************************
 Number of values ....................                  154
 Average value .......................                0.837
 Standard deviation ..................                0.584
 Minimum value observed ..............                0.166
 Maximum value observed ..............                3.033

 Nr <        0.0000                    :        0 (  0.00 %; Cum   0.00 %)
 Nr >=       0.0000 and <       0.5000 :       47 ( 30.52 %; Cum  30.52 %)
 Nr >=       0.5000 and <       1.0000 :       74 ( 48.05 %; Cum  78.57 %)
 Nr >=       1.0000 and <       1.5000 :       14 (  9.09 %; Cum  87.66 %)
 Nr >=       1.5000 and <       2.0000 :        8 (  5.19 %; Cum  92.86 %)
 Nr >=       2.0000 and <       2.5000 :        7 (  4.55 %; Cum  97.40 %)
 Nr >=       2.5000 and <       3.0000 :        3 (  1.95 %; Cum  99.35 %)
 Nr >=       3.0000 and <       3.5000 :        1 (  0.65 %; Cum 100.00 %)
 Nr >=       3.5000                    :        0 (  0.00 %; Cum 100.00 %)

(2) plot files for some of the criteria (as a function of residue number)

(3) a list of potentially bad residues, plus their faults, e.g.:


 OOPS - (GLU A5) 
 Bad RS-fit (all atoms) =   0.4076500    
 Bad RS-fit (main chain) =   0.5228300    
 Bad RS-fit (side chain) =   0.2270200    
 Mask too tight
 Too high temperature factor =    140.1200    
 Bad contact(s); count =            1

(4) a list of the violation counts for each criterion, e.g.:


 Bad pep-flip             : (          4) 
 Bad RS-fit (all atoms)   : (          8) 
 Bad RS-fit (main chain)  : (          8) 
 Bad RS-fit (side chain)  : (         11) 
 Bad RSC                  : (         17) 

(5) a set of O macros. These can be generated in several ways:
- one for each residue, or only for each suspect residue
- chained together, or not (i.e., if they are chained, one macro will automatically add a command to the O menu to execute the macro for the next residue, if any)
Each macro contains a command which will put the residue at the centre of the display, commands to print some information (name and type of the residue, criteria that were violated), and a user-defined set of commands (e.g., to execute another macro which draws the residue and its closest neighbours and perhaps a map). Chained macros also manipulate the O menu. An example of such a macro:


 centre_zone M6A A5 ;
 print Residue GLU A5
 print Bad RS-fit (all atoms) = 0.408
 print Bad RS-fit (main chain) = 0.523
 print Bad RS-fit (side chain) = 0.227
 print Mask too tight
 print Too high temperature factor = 140.12
 print Bad contact(s); count = 1
 @avemap obj sph sphere 10 end bell
 print Hit or type "@oops/a6" for next baddy
 menu @oops/a6 on on_off
 menu @oops/a5 off on_off

By using chained macros for only the suspect residues, the crystallographer is guided quickly through the trouble spots and the interesting bits of the structure. If, on the other hand, one generates a macro for every residue, the macros become a convenient source of information. To obtain information about the quality of residue Trp A412, for example, all one has to do is execute the OOPS macro pertaining to this residue (which may have the same name as the residue), and up come the bad aspects (if any) of that tryptophan.
Using OOPS requires some O datablocks to be prepared in advance (however, there is an O macro available to do most of that work for the user as well). Running the program takes only a few minutes. The resulting speed-up of the rebuilding process is well worth this small effort. Also, OOPS makes it less likely that residues with serious errors are overlooked, and may therefore help improve the quality of the structure.
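
The chaining of macros described in this section can be illustrated with a short script that writes macro text modelled on the example listing. This is a sketch under assumptions, not the OOPS code: the function name, the "oops/<residue>" naming scheme (taken from the example) and the behaviour for the last residue in the chain (no "next" entry is added) are all guesses for illustration.

```python
def build_chained_macros(molecule, suspects, user_command, macro_dir="oops"):
    """Return {macro_name: macro_text} for a list of suspect residues.

    suspects: list of (residue_type, residue_name, [violation messages]).
    Each macro centres on its residue, prints the violations, runs the
    user-defined command and puts the macro for the next residue on the menu.
    """
    macros = {}
    for i, (res_type, res_name, messages) in enumerate(suspects):
        this_macro = f"{macro_dir}/{res_name.lower()}"
        lines = [f"centre_zone {molecule} {res_name} ;",
                 f"print Residue {res_type} {res_name}"]
        lines += [f"print {msg}" for msg in messages]
        lines.append(user_command)
        if i + 1 < len(suspects):   # chain to the next suspect residue, if any
            next_macro = f"{macro_dir}/{suspects[i + 1][1].lower()}"
            lines.append(f'print Hit or type "@{next_macro}" for next baddy')
            lines.append(f"menu @{next_macro} on on_off")
        lines.append(f"menu @{this_macro} off on_off")
        macros[this_macro] = "\n".join(lines) + "\n"
    return macros


# Illustrative input; the second residue and its message are made up.
suspects = [
    ("GLU", "A5", ["Bad RS-fit (all atoms) = 0.408", "Mask too tight"]),
    ("LYS", "A6", ["Bad RSC = 1.766"]),
]
macros = build_chained_macros("M6A", suspects,
                              "@avemap obj sph sphere 10 end bell")
print(macros["oops/a5"])
```

Passing every residue instead of only the suspect ones gives the "one macro per residue" variant, in which a macro for a good residue simply prints no violations.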

AVAILABILITY
OOPS is one in a series of "O-dalisques", i.e. programs that work in conjunction with O. OOPS runs on SGI, ESV and DEC ALPHA/OSF1 workstations. For more information, contact GJK (E-mail: "gerard@xray.bmc.uu.se").

REFERENCES
[1] C.I. Brändén & T.A. Jones, Nature 343 (1990), 687-689.
[2] T.A. Jones, J.Y. Zou, S.W. Cowan & M. Kjeldgaard, Acta Cryst. A47 (1991), 110-119.
[3] T.A. Jones & M. Kjeldgaard, "O - the manual", Uppsala (1993).
[4] J.Y. Zou & S.L. Mowbray, "An evaluation of the use of databases in protein structure refinement", submitted.
[5] A.T. Brünger, "X-PLOR. A system for crystallography and NMR", New Haven (1992).
[6] R.A. Laskowski, M.W. MacArthur, D.S. Moss & J.M. Thornton, J. Appl. Cryst. 26 (1993), 283-291.
[7] C. Ramakrishnan & G.N. Ramachandran, Biophys. J. 5 (1965), 909-933.
[8] E. Arnold & M.G. Rossmann, J. Mol. Biol. 211 (1990), 763-801.


USF Latest update at 12 February, 1998.