Uppsala Software Factory

Uppsala Software Factory - QDB Manual

1 QDB - GENERAL INFORMATION
2 REFERENCES
3 VERSION HISTORY
4 INTRODUCTION
5 STARTUP
6 USER GUIDE

6.1 Prompt

6.2 Command types

6.3 Protein properties in the QDB database

6.4 Listing information about a protein

6.5 Statistics

6.6 Correlation of two properties

6.7 All correlations for one property

6.8 Histogram of property values

6.9 Sorting by property

6.10 Selecting a subset of proteins

6.11 Gerard's overall quality score

6.12 Significance of correlations
7 KNOWN BUGS

1 QDB - GENERAL INFORMATION

Program : QDB
Version : 950621
Author : Gerard J. Kleywegt, Dept. of Cell and Molecular Biology, Uppsala University, Biomedical Centre, Box 590, SE-751 24 Uppsala, SWEDEN
E-mail : gerard@xray.bmc.uu.se
Purpose : quality analysis of 476 PDB entries
Package : stand-alone

2 REFERENCES

* 1 * G.J. Kleywegt (1996). Use of non-crystallographic symmetry in protein structure refinement. Acta Cryst D52, 842-857. [http://www.iucr.ac.uk/journals/acta/tocs/actad/1996/actad5204.html]

3 VERSION HISTORY

9409XX - 0.0 - initial version (code lost in disk crash)
941017 - 0.1 - started reprogramming
941018 - 0.2 - continued reprogramming
941020 - 0.3 - finished reprogramming
941231 - 0.4 - added EXTRA1 records and properties
950102 - 1.0 - added EXTRA2 records and properties; added SCORE; fixed bug (COMPND wasn't read from the database)
950118 - 1.1 - sensitive to environment variable GKLIB
950621 - 1.2 - calculate significance for correlations

4 INTRODUCTION

QDB is a simple program for analysing a small database containing various statistics and quality-indicator values for protein structures solved by X-ray crystallography.

The present database file is called "quality.lib" and contains data pertaining to 476 proteins, solved at resolutions between 1.5 and 3.5 A. It was generated by G.J. Kleywegt from Brookhaven PDB entries, using PROCHECK, LSQMAN and several 'jiffy' programs.

NOTE: This program is sensitive to the environment variable GKLIB. If GKLIB is set, its value (a directory name) is prepended to the default name of the library file that the program needs. For example, in Uppsala, put the following line in your .login or .cshrc file:

 setenv GKLIB /nfs/public/lib
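The lookup described in the note can be sketched in Python as follows. This is an illustration only: the helper name library_path is hypothetical, "quality.lib" is taken from the default shown at startup, and the sketch assumes a path separator is inserted between the GKLIB directory and the file name, which QDB itself may handle differently.

```python
import os

def library_path(default_name="quality.lib", env=None):
    """Return the library file name, prefixed with $GKLIB when that is set.

    Hypothetical helper: assumes GKLIB names a directory and that a path
    separator belongs between it and the file name.
    """
    env = os.environ if env is None else env
    gklib = env.get("GKLIB", "").strip()
    if not gklib:
        return default_name  # GKLIB unset or empty: use the default name as-is
    return os.path.join(gklib, default_name)
```

With GKLIB set to /nfs/public/lib, library_path() would return /nfs/public/lib/quality.lib; without GKLIB it returns quality.lib unchanged.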

5 STARTUP

When you start the program, it prints a header and the current dimensioning, and then asks for the name of your quality-database file (default: quality.lib). After reading the database and calculating the quality scores, it prints a list of the available commands:

 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 % 52 gerard onyx 20:17:33 progs/qdb > QDB

 *** QDB *** QDB *** QDB *** QDB *** QDB *** QDB *** QDB *** QDB *** QDB ***

 Version - 950102/1.0
 (C) 1993-5 Gerard J. Kleywegt, Dept. Mol. Biology, BMC, Uppsala (S)
 User I/O - routines courtesy of Rolf Boelens, Univ. of Utrecht (NL)
 Others - T.A. Jones, G. Bricogne, Rams, W.A. Hendrickson
 Others - W. Kabsch, CCP4, PROTEIN, E. Dodson, etc. etc.

 Started - Mon Jan 2 21:41:05 1995
 User - gerard
 Mode - interactive
 Host - onyx
 ProcID - 24473
 Tty - /dev/ttyq1

 *** QDB *** QDB *** QDB *** QDB *** QDB *** QDB *** QDB *** QDB *** QDB ***

 Max nr of proteins in database : ( 600)
 Max nr of criteria in database : ( 75)
 Max nr of comments per protein : ( 5)
 Max nr of selection save sets : ( 10)
 Max nr of histogram bins : ( 50)
 Initialising properties ...
 Initialising database ...
 Name of database file ? (quality.lib)
 Reading database ...
 Nr of lines read : ( 7183)
 Nr of proteins : ( 476)
 Calculating scores ...
 Quality indicators used in scoring:
         BAD  |   POOR  |  FAIR   |  OKAY   |  GOOD
 Name    -2   |    -1   |    0    |   +1    |   +2  Weight
 RESOL       2.20      2.00      1.50      1.20      5.00
 RFAC        0.25      0.20      0.15      0.10      1.00
 ...
 %DIH10     15.00     12.00      8.00      5.00      1.00
 %ANG5      15.00     12.00      8.00      5.00      1.00

 QUit
 ? (list commands)
 $ shell_command
 ! (comment)
 SHow [prop]
 LIst pdbid
 STats prop
 COrr prop1 prop2 [plotfile]
 SOrt prop [nlist xp1 xp2 xp3]
 ALl_corr prop [cut-off]
 HIsto prop [bins min max plotfile]
 SElect ALl
 SElect NOne
 SElect ANd VAlid prop
 SElect OR VAlid prop
 SElect ANd IF prop operator value
 SElect OR IF prop operator value
 SElect SAve saveset comment
 SElect REstore saveset
 SElect ?
 SElect INvert

 CPU total/user/sys : 7.1 7.0 0.1
 QDB [476/476] >
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

6 USER GUIDE

6.1 Prompt

The prompt (QDB [476/476] >) shows how many of the proteins in the database are currently selected (here, all 476 out of 476). Only the selected proteins are used by subsequent commands.

6.2 Command types

There are three types of command:

- general commands (quit, list, etc.)
- selection commands
- analysis commands

6.3 Protein properties in the QDB database

Proteins in the database have 75 numeric properties (at present) and some text attributes. The numeric properties are referred to by their name. The SHow command (without a parameter, or with * as the parameter) lists all properties with their type (Real or Integer), their default value (used if the property is unobserved), the range of valid values (a value outside this range is considered unobserved), and a short description:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 QDB [476/476] > sh
 Nr  Name  T    Default    Minimum    Maximum Description
  1 YEAR   I         -1         50         99 Year of deposition at PDB
  2 Z      I         -1          1         96 Nr of asymmetric units per unit cell
  3 NATOMS I         -1        100    1000000 Nr of atoms in the PDB file
  4 NHET   I         -1          0    1000000 Nr of HETERO atoms in the PDB file
...
 72 %DIH10 R      -1.00       0.00     100.00 % Residues |delta-CA-CA*-CA-CA(NCS)| > 10
 73 AADANG R      -1.00       0.00     180.00 Average |delta-CA-CA*-CA(NCS)|
 74 %ANG5  R      -1.00       0.00     100.00 % Residues |delta-CA-CA*-CA(NCS)| > 5
 75 SCORE  R       0.00    -999.00     999.00 Grand quality score
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
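The validity rule can be modelled with a small Python class. The class and method names are hypothetical, and the sketch assumes both range limits are inclusive, which the listing does not state. The YEAR and POGFAC values are copied from the SHow and STats output in this manual.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Property:
    """One numeric property as listed by SHow (hypothetical Python model)."""
    name: str
    kind: str        # "I" (integer) or "R" (real)
    default: float   # value stored when the property is unobserved
    minimum: float   # lowest valid value (assumed inclusive)
    maximum: float   # highest valid value (assumed inclusive)
    description: str

    def is_observed(self, value):
        """A value outside [minimum, maximum] counts as unobserved."""
        return self.minimum <= value <= self.maximum

YEAR = Property("YEAR", "I", -1, 50, 99, "Year of deposition at PDB")
POGFAC = Property("POGFAC", "R", -99.0, -50.0, 50.0, "Overall G-factor")
```

Because each default lies outside its valid range, a protein that still carries the default value is automatically treated as unobserved.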

6.4 Listing information about a protein

Proteins are referred to by their four-character PDB identifier. Use it with the LIst command to retrieve all defined properties of a structure:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 QDB [476/476] > li 1guh
 List   : (1GUH)
 Number : (     311)
 Date   : (24-FEB-93)
 Spcgrp : (C2)
 Jrnl   : (J.MOL.BIOL.)
 Header : (TRANSFERASE(GLUTATHIONE))
 Compnd : (????)
 Author : (I.SINNING,G.J.KLEYWEGT,T.A.JONES)
 Remark : (GROUPED BS (NOTE: SHOULD DIVIDE NR OF ATOMS BY 2 !))
 Remark : (I.SINNING,G.J.KLEYWEGT,T.A.JONES)
 Remark : (-)
 Remark : (-)
 Remark : (-)
 YEAR           93 Year of deposition at PDB
 Z               4 Nr of asymmetric units per unit cell
 NATOMS       3646 Nr of atoms in the PDB file
 NHET           54 Nr of HETERO atoms in the PDB file
...
 %PSI10       0.00 % Residues with |delta-Psi(NCS)| > 10
 AADDIH       0.00 Average |delta-CA-CA*-CA-CA(NCS)|
 %DIH10       0.00 % Residues |delta-CA-CA*-CA-CA(NCS)| > 10
 AADANG       0.00 Average |delta-CA-CA*-CA(NCS)|
 %ANG5        0.00 % Residues |delta-CA-CA*-CA(NCS)| > 5
 SCORE       36.00 Grand quality score
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

6.5 Statistics

The simplest analysis option is STats. For example, to find the highest and lowest overall G-factor in the database, use:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 QDB [476/476] > st pogfac
 Nr of selected proteins : (     468)
 Nr  Name  T    Default    Minimum    Maximum Description
 59 POGFAC R     -99.00     -50.00      50.00 Overall G-factor
 Average value : ( -4.524E-01)
 St. deviation : (  5.541E-01)
 Minimum value : ( -7.700E+00)
 Maximum value : (  4.000E-01)
 Sum of values : ( -2.117E+02)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

Note that only structures for which the G-factor has actually been calculated are included (i.e., 468 out of 476 structures).
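A sketch of the STats summary in Python, assuming that "unobserved" means outside the valid range listed by SHow. Whether QDB divides by N or N-1 for the standard deviation is not documented; this sketch uses N. The function name describe is hypothetical.

```python
import statistics

def describe(values, minimum, maximum):
    """STats-style summary of the observed values of one property."""
    # Values outside [minimum, maximum] are unobserved and are skipped.
    obs = [v for v in values if minimum <= v <= maximum]
    if not obs:
        return {"n": 0}
    return {
        "n": len(obs),
        "average": statistics.fmean(obs),
        # Population standard deviation (divides by N); an assumption, see above.
        "st_deviation": statistics.pstdev(obs),
        "minimum": min(obs),
        "maximum": max(obs),
        "sum": sum(obs),
    }
```

For POGFAC the valid range is -50.00 to 50.00, so proteins still holding the default of -99.00 are skipped, which is consistent with only 468 of the 476 proteins being counted.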

6.6 Correlation of two properties

The COrr command calculates the correlation coefficient between two properties (again, using only those selected proteins for which both properties are defined):

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 QDB [476/476] > sh rmsdbb
 Nr  Name  T    Default    Minimum    Maximum Description
 35 RMSDBB R      -1.00       0.00     999.00 RMS delta-B bonded atoms (A**2)
 QDB [476/476] > sh rmsdbn
 Nr  Name  T    Default    Minimum    Maximum Description
 45 RMSDBN R      -1.00       0.00     999.00 RMS delta-B of NCSCAI CAs improved LSQ (
 QDB [476/476] > cor rmsdbb rmsdbn
 Nr of selected proteins : (     324)
 Nr  Name  T    Default    Minimum    Maximum Description
 35 RMSDBB R      -1.00       0.00     999.00 RMS delta-B bonded atoms (A**2)
 Average value : (  4.244E+00)
 St. deviation : (  4.085E+00)
 Minimum value : (  9.000E-02)
 Maximum value : (  3.219E+01)
 Sum of values : (  1.375E+03)
 Nr  Name  T    Default    Minimum    Maximum Description
 45 RMSDBN R      -1.00       0.00     999.00 RMS delta-B of NCSCAI CAs improved LSQ (
 Average value : (  8.212E+00)
 St. deviation : (  5.119E+00)
 Minimum value : (  0.000E+00)
 Maximum value : (  3.560E+01)
 Sum of values : (  2.661E+03)
 Nr of values : (     324)
 Corr. coeff. : (   0.487)
 Plot file : (rmsdbb_rmsdbn.plt)
 Plot file written
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

In this case, the correlation between "RMS delta-B bonded atoms" and "RMS delta-B of the NCSCAI CA atoms" is investigated. The moderate positive correlation (r = 0.487 over 324 proteins) suggests that people who use strong/weak/no restraints for the B-factors of bonded atoms also tend to use strong/weak/no restraints for the temperature factors of NCS-related atoms.
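A sketch of the coefficient COrr reports, assuming it is the ordinary linear (Pearson) correlation coefficient computed over the proteins for which both properties are observed. The function name correlate and the (minimum, maximum) range tuples are hypothetical.

```python
import math

def correlate(xs, ys, x_range, y_range):
    """Pearson correlation over proteins where both values are observed.

    x_range and y_range are (minimum, maximum) tuples as listed by SHow.
    Returns (number of pairs used, r); r is None when it is undefined.
    """
    pairs = [(x, y) for x, y in zip(xs, ys)
             if x_range[0] <= x <= x_range[1] and y_range[0] <= y <= y_range[1]]
    n = len(pairs)
    if n < 2:
        return n, None
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return n, None  # a constant property has no defined correlation
    return n, sxy / math.sqrt(sxx * syy)
```

The number of pairs returned corresponds to the "Nr of values" line in the COrr output; it can be smaller than the number of selected proteins when one of the properties is unobserved.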

The command also writes an O2D plot file. If you enter "none" as the file name, no plot file is created; if you do not provide a file name at all, a default name is generated (here, rmsdbb_rmsdbn.plt). The file contains a scatter plot, so display it with the SC command in O2D (or add "sc" if you use the OMAC/o2dps script).

The plot file may look as follows:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
REMARK Created by QDB V. 941020/0.3 at Thu Oct 20 16:47:57 1994 for user gerard
REMARK Plot file rmsdbb_rmsdbn.plt
REMARK Scatter plot of RMSDBB = RMS delta-B bonded atoms (A**2)
REMARK And RMSDBN = RMS delta-B of NCSCAI CAs improved LSQ (
REMARK Number of selected proteins = 324
REMARK Correlation coefficient = 0.4873894
REMARK Minimum of RMSDBB = 9.0000004E-02 ... Maximum = 32.19000
REMARK Average of RMSDBB = 4.244442 ... Standard deviation = 4.085301
REMARK Minimum of RMSDBN = 0.0000000E+00 ... Maximum = 35.60000
REMARK Average of RMSDBN = 8.212036 ... Standard deviation = 5.118954
XLABEL RMSDBB
YLABEL RMSDBN
COLOUR 4
NPOINT 324
XYVIEW -0.8729999 33.15300 -1.068000 36.66800
XVALUE *
   5.6600E+00   8.7800E+00   9.8700E+00   2.6700E+00   1.5700E+00   5.6800E+00
   3.3000E+00   3.4900E+00   3.1700E+00   1.0000E+00   2.1000E+00   5.0000E-01
...
   1.2800E+00   3.0600E+00   2.4100E+00   2.4800E+00   1.3400E+00   3.9000E+00
YVALUE *
   4.0000E+00   1.0900E+01   1.2100E+01   6.5000E+00   2.9000E+00   6.6000E+00
...
   6.0000E+00   1.3900E+01   1.1000E+01   3.2000E+00   3.3000E+00   3.6000E+00
END
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
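If you want to produce a file in the same layout from your own data, the following Python sketch reconstructs it from the example above. The keywords and the six-values-per-line E-format columns are copied from the example. The XYVIEW window is computed as the data range padded by 3% on each side; this reproduces the XYVIEW lines of both example plot files in this manual, but it is an inference rather than documented QDB behaviour. The function name o2d_scatter is hypothetical.

```python
def o2d_scatter(xs, ys, xlabel, ylabel, remarks=(), colour=4):
    """Return the text of an O2D scatter-plot file in the layout shown above.

    Layout inferred from the QDB example; the 3% XYVIEW margin is an assumption.
    """
    def view(vals):
        # Data range padded by 3% of its width on each side.
        lo, hi = min(vals), max(vals)
        pad = 0.03 * (hi - lo)
        return lo - pad, hi + pad

    def block(vals):
        # Six values per line, each in a 13-character E-format field.
        return ["".join(f"{v:13.4E}" for v in vals[i:i + 6])
                for i in range(0, len(vals), 6)]

    x0, x1 = view(xs)
    y0, y1 = view(ys)
    lines = [f"REMARK {r}" for r in remarks]
    lines += [f"XLABEL {xlabel}", f"YLABEL {ylabel}", f"COLOUR {colour}",
              f"NPOINT {len(xs)}", f"XYVIEW {x0:g} {x1:g} {y0:g} {y1:g}",
              "XVALUE *", *block(xs), "YVALUE *", *block(ys), "END"]
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file such as rmsdbb_rmsdbn.plt should give a file that O2D can display with its SC command.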

6.7 All correlations for one property

The ALl_corr command calculates the correlation coefficient between one property and each of the other properties (no plot files are produced). For every property whose coefficient exceeds the cut-off in absolute value (default 0.0), the property name, the number of proteins used and the coefficient are printed:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 QDB [476/476] > al rmsdbb 0.4
 Property to correlate : (RMSDBB)
   
 Property       : (BMODE)
 Nr of proteins : (     434)
 Corr. coeff.   : (   1.000)
   
 Property       : (BSDV)
 Nr of proteins : (     434)
 Corr. coeff.   : (   0.550)
   
 Property       : (CORRBB)
 Nr of proteins : (     434)
 Corr. coeff.   : (  -0.836)
   
 Property       : (RMSDBN)
 Nr of proteins : (     324)
 Corr. coeff.   : (   0.487)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

6.8 Histogram of property values

If you want to look at the distribution of values for a property, use the HIsto command. Provide the name of the property and, optionally:

- either MINUS the number of bins (e.g., -20 for 20 bins), or the width of the bins (e.g., 0.1)
- the minimum value to consider
- the maximum value to consider
- the name of the histogram plot file

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 QDB [476/476] > hi resol 0.1 1.4 3.6
 Nr of selected proteins : (     476)
 Nr  Name  T    Default    Minimum    Maximum Description
 14 RESOL  R      -1.00       0.10       5.00 Nominal resolution of the data (A)
 Average value : (  2.403E+00)
 St. deviation : (  4.141E-01)
 Minimum value : (  1.500E+00)
 Maximum value : (  3.500E+00)
 Sum of values : (  1.144E+03)
 Nr of bins : (      22)
 Bin width  : (   0.100)
 Bin   2 [  1.5000E+00,   1.6000E+00] Nr =    3 (  0.63 %) Cumul    3
 Bin   3 [  1.6000E+00,   1.7000E+00] Nr =    9 (  1.89 %) Cumul   12
 Bin   4 [  1.7000E+00,   1.8000E+00] Nr =   42 (  8.82 %) Cumul   54
 Bin   5 [  1.8000E+00,   1.9000E+00] Nr =    2 (  0.42 %) Cumul   56
 Bin   6 [  1.9000E+00,   2.0000E+00] Nr =   35 (  7.35 %) Cumul   91
 Bin   7 [  2.0000E+00,   2.1000E+00] Nr =   55 ( 11.55 %) Cumul  146
 Bin   9 [  2.2000E+00,   2.3000E+00] Nr =   30 (  6.30 %) Cumul  176
 Bin  10 [  2.3000E+00,   2.4000E+00] Nr =   27 (  5.67 %) Cumul  203
 Bin  11 [  2.4000E+00,   2.5000E+00] Nr =   17 (  3.57 %) Cumul  220
 Bin  12 [  2.5000E+00,   2.6000E+00] Nr =  103 ( 21.64 %) Cumul  323
 Bin  13 [  2.6000E+00,   2.7000E+00] Nr =    2 (  0.42 %) Cumul  325
 Bin  14 [  2.7000E+00,   2.8000E+00] Nr =   28 (  5.88 %) Cumul  353
 Bin  15 [  2.8000E+00,   2.9000E+00] Nr =   62 ( 13.03 %) Cumul  415
 Bin  16 [  2.9000E+00,   3.0000E+00] Nr =   18 (  3.78 %) Cumul  433
 Bin  17 [  3.0000E+00,   3.1000E+00] Nr =   33 (  6.93 %) Cumul  466
 Bin  19 [  3.2000E+00,   3.3000E+00] Nr =    8 (  1.68 %) Cumul  474
 Bin  21 [  3.4000E+00,   3.5000E+00] Nr =    2 (  0.42 %) Cumul  476
 Plot file : (resol_histo.plt)
 Plot file written
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
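The binning rules listed above can be sketched in Python. The sketch assumes that bins include their lower edge, that the maximum itself is counted in the last bin, and that a bin width which does not divide the range exactly gets one extra (partial) bin. QDB's exact edge handling is not documented, so counts for values lying exactly on a bin edge may differ from QDB's. The function names are hypothetical.

```python
import math

def resolve_bins(spec, vmin, vmax):
    """Interpret the HIsto bin argument.

    Negative spec: -spec is the number of bins. Positive spec: the bin width.
    Returns (number_of_bins, bin_width).
    """
    if spec == 0:
        raise ValueError("bin argument must be non-zero")
    if spec < 0:
        nbins = max(1, int(round(-spec)))
        return nbins, (vmax - vmin) / nbins
    # The small tolerance stops floating-point round-off from adding a spurious bin.
    nbins = max(1, math.ceil((vmax - vmin) / spec - 1e-9))
    return nbins, spec

def histogram(values, spec, vmin, vmax):
    """Count the values in [vmin, vmax] per bin; returns (counts, bin_width)."""
    nbins, width = resolve_bins(spec, vmin, vmax)
    counts = [0] * nbins
    for v in values:
        if vmin <= v <= vmax:
            # Lower bin edge inclusive; vmax itself goes into the last bin.
            counts[min(int((v - vmin) / width), nbins - 1)] += 1
    return counts, width
```

With a bin width of 0.1 between 1.4 and 3.6, resolve_bins gives 22 bins, matching the "Nr of bins" line in the example.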

The plot file should be plotted/converted with the HI or PI command in O2D (or OMAC/o2dps). It may look as follows:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
REMARK Created by QDB V. 941020/0.3 at Thu Oct 20 16:57:09 1994 for user gerard
REMARK Plot file resol_histo.plt
REMARK Histogram plot of RESOL = Nominal resolution of the data (A)
REMARK Number of selected proteins = 476
REMARK Minimum of RESOL = 1.500000 ... Maximum = 3.500000
REMARK Average of RESOL = 2.403317 ... Standard deviation = 0.4140590
XLABEL RESOL
YLABEL Nr of proteins in bin
COLOUR 4
NPOINT 23
XYVIEW 1.334000 3.666000 -3.090000 106.0900
XVALUE *
   1.4000E+00   1.5000E+00   1.6000E+00   1.7000E+00   1.8000E+00   1.9000E+00
   2.0000E+00   2.1000E+00   2.2000E+00   2.3000E+00   2.4000E+00   2.5000E+00
   2.6000E+00   2.7000E+00   2.8000E+00   2.9000E+00   3.0000E+00   3.1000E+00
   3.2000E+00   3.3000E+00   3.4000E+00   3.5000E+00   3.6000E+00
YVALUE *
   0.0000E+00   3.0000E+00   9.0000E+00   4.2000E+01   2.0000E+00   3.5000E+01
   5.5000E+01   0.0000E+00   3.0000E+01   2.7000E+01   1.7000E+01   1.0300E+02
   2.0000E+00   2.8000E+01   6.2000E+01   1.8000E+01   3.3000E+01   0.0000E+00
   8.0000E+00   0.0000E+00   2.0000E+00   0.0000E+00   0.0000E+00
END
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

6.9 Sorting by property

If you want to find out who solved the best and worst structures according to a single criterion, use the SOrt command. Provide the name of the property to sort on and, optionally:

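- the number of proteins to list: 0 lists all selected proteins, a positive number N lists the first N entries, and a negative number -N lists both the first and the last N entries (proteins are listed in ascending order of the property, and only those for which it is defined are included)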

- the names of up to three additional numeric properties which should also be listed

For example, to sort by overall G-factor, use:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 QDB [476/476] > so pogfac -10 resol rmsimp year
 Sorted by : (POGFAC)
 Also list : (RESOL)
 Also list : (RMSIMP)
 Also list : (YEAR)
 List : (     -10)
 Nr of selected proteins : (     468)
   Nr Indx  ID        POGFAC       RESOL        RMSIMP       YEAR   Authors
    1   24 4HHB       -7.700        1.740        0.364           84 G.FERMI,M.F.PERUTZ
    2   21 4DFR       -3.400        1.700        0.484           82 D.J.FILMAN,D.A.MATTHEWS,J.T.BOLIN,J.KRAU
    3  469 1HCY       -2.500        3.200       -1.000           91 A.VOLBEDA,W.G.J.HOL
    4  125 2SOD       -2.400        2.000        0.882           80 J.A.TAINER,E.D.GETZOFF,J.S.RICHARDSON,D.
    5  468 1HC1       -2.100        3.200       -1.000           91 A.VOLBEDA,W.G.J.HOL
    6  302 1RBA       -2.000        2.600        0.925           91 G.SCHNEIDER,E.SODERLIND
    7  403 3HVP       -2.000        2.800       -1.000           89 A.WLODAWER,M.JASKOLSKI,M.MILLER
    8  164 1FXI       -1.900        2.200        0.539           90 T.TSUKIHARA
    9  374 1CID       -1.800        2.800       -1.000           93 R.L.BRADY,E.J.DODSON,G.LANGE
   10  391 2AAT       -1.600        2.800       -1.000           89 D.SMITH,S.ALMO,M.TONEY,D.RINGE
 ==============
  459   39 1DXU        0.200        1.800        0.253           92 J.S.KAVANAUGH,A.ARNONE
  460   34 3MDS        0.200        1.800        0.159           93 M.L.LUDWIG,A.L.METZGER,K.A.PATTRIDGE,W.C
  461   40 1DXV        0.200        1.800        0.271           92 J.S.KAVANAUGH,A.ARNONE
  462    5 4SDH        0.200        1.600        0.246           93 W.E.ROYERJUNIOR
  463  475 1HNB        0.200        3.500        0.781           93 S.RAGHUNATHAN,R.J.CHANDROSS,R.H.KRETSING
  464  439 1HNC        0.300        3.000        0.923           93 S.RAGHUNATHAN,R.J.CHANDROSS,R.H.KRETSING
  465  114 1FIA        0.300        2.000        0.333           91 D.KOSTREWA,J.GRANZIN,H.-W.CHOE,J.LABAHN,
  466   20 2MSB        0.300        1.700        0.265           92 W.I.WEIS,K.DRICKAMER,W.A.HENDRICKSON
  467   97 1DSB        0.300        2.000        0.633           93 J.L.MARTIN,J.C.A.BARDWELL,J.KURIYAN
  468  110 2NCK        0.400        2.000        0.253           93 R.L.WILLIAMS,D.A.OREN,E.ARNOLD
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
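The listing rules of SOrt can be sketched as follows. The function name and the (pdb_id, value) input layout are hypothetical, the input is assumed to contain only proteins with observed values, and the order of tied values in QDB is not documented (Python's sort keeps ties in input order).

```python
def sort_listing(entries, nlist=0):
    """SOrt-style listing of (pdb_id, value) pairs with observed values.

    Sorted in ascending order of value. nlist == 0: all entries;
    nlist > 0: the first nlist; nlist < 0: the first and last |nlist|.
    Returns (rank, pdb_id, value) tuples with 1-based ranks.
    """
    ordered = sorted(entries, key=lambda e: e[1])
    ranked = [(i + 1, pid, v) for i, (pid, v) in enumerate(ordered)]
    if nlist == 0:
        return ranked
    n = abs(nlist)
    if nlist > 0:
        return ranked[:n]
    if 2 * n >= len(ranked):
        return ranked  # first and last blocks overlap: list everything once
    return ranked[:n] + ranked[-n:]
```

With nlist = -10 and 468 proteins, this returns ranks 1 to 10 and 459 to 468, as in the example.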

6.10 Selecting a subset of proteins

If you want statistics for only a subset of the proteins (e.g., all structures with NCS, solved after 1988 at resolutions worse than 2.5 A), use the SElect command with one of these options:

- ALl selects all proteins
- NOne selects none of the proteins
- INvert selects the complement of the current selection
- ANd VAlid keeps only those selected proteins for which a given property is defined
- OR VAlid adds all proteins for which a given property is defined
- ANd IF keeps only those selected proteins for which a defined property satisfies a comparison (<, = or >) with a cut-off value
- OR IF adds all proteins for which a defined property satisfies a comparison (<, = or >) with a cut-off value
- SAve stores the current selection in a numbered saveset, together with a comment
- REstore restores a previously stored saveset
- ? prints a listing of the savesets

From version 1.0 onward, you can also select by some of the text attributes, namely:

- AUthor
- COmpound
- JOurnal

Each of these three types of selection works as an AND on the current set of selected proteins.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 QDB [476/476] > sel all
 All proteins selected
 QDB [476/476] > sel and valid numncs
 Nr of selected proteins : (     345)
 QDB [345/476] > sel and if year > 88
 Nr of selected proteins : (     331)
 QDB [331/476] > sel and if resol > 2.5
 Nr of selected proteins : (      90)
 QDB [ 90/476] > se sav 1 "ncs after 1988 worse than 2.5 A resol"
 Saveset  1 [  90/ 476 ] = ncs after 1988 worse than 2.5 A resol
 QDB [ 90/476] > se ?
 Saveset  1 [  90/ 476 ] = ncs after 1988 worse than 2.5 A resol
 Saveset  2 [   0/ 476 ] = NO proteins selected
 Saveset  3 [   0/ 476 ] = NO proteins selected
 Saveset  4 [   0/ 476 ] = NO proteins selected
 Saveset  5 [   0/ 476 ] = NO proteins selected
 Saveset  6 [   0/ 476 ] = NO proteins selected
 Saveset  7 [   0/ 476 ] = NO proteins selected
 Saveset  8 [   0/ 476 ] = NO proteins selected
 Saveset  9 [   0/ 476 ] = NO proteins selected
 Saveset 10 [   0/ 476 ] = NO proteins selected
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 QDB [476/476] > se jo nature
 Nr of selected proteins : (      20)
 QDB [ 20/476] > se all
 All proteins selected
 QDB [476/476] > se au w.g.j.hol
 Nr of selected proteins : (      13)
 QDB [ 13/476] > se all
 All proteins selected
 QDB [476/476] > se comp dismutase
 Nr of selected proteins : (      10)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
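The numeric selection options can be modelled as set operations on PDB identifiers. This Python sketch is a hypothetical model, not QDB's implementation: the class and method names, the data layout, and the rule that IF tests only defined values (inferred from the wording above) are assumptions. The AUthor, COmpound and JOurnal selections are not modelled.

```python
class Selection:
    """Hypothetical model of the SElect options as operations on a set of ids."""

    def __init__(self, database, ranges):
        self.db = database          # {pdb_id: {property: value}}
        self.ranges = ranges        # {property: (minimum, maximum)}
        self.selected = set(database)
        self.savesets = {}          # {number: (frozenset of ids, comment)}

    def _defined(self, pid, prop):
        """True if the protein has an observed (in-range) value for prop."""
        lo, hi = self.ranges[prop]
        value = self.db[pid].get(prop)
        return value is not None and lo <= value <= hi

    def _test(self, pid, prop, op, cutoff):
        """IF test: only defined values can satisfy the comparison."""
        if not self._defined(pid, prop):
            return False
        value = self.db[pid][prop]
        if op == "<":
            return value < cutoff
        if op == "=":
            return value == cutoff
        if op == ">":
            return value > cutoff
        raise ValueError(f"unknown operator {op!r}")

    def select_all(self):
        self.selected = set(self.db)

    def select_none(self):
        self.selected = set()

    def invert(self):
        self.selected = set(self.db) - self.selected

    def and_valid(self, prop):
        self.selected = {p for p in self.selected if self._defined(p, prop)}

    def or_valid(self, prop):
        self.selected |= {p for p in self.db if self._defined(p, prop)}

    def and_if(self, prop, op, cutoff):
        self.selected = {p for p in self.selected if self._test(p, prop, op, cutoff)}

    def or_if(self, prop, op, cutoff):
        self.selected |= {p for p in self.db if self._test(p, prop, op, cutoff)}

    def save(self, number, comment):
        self.savesets[number] = (frozenset(self.selected), comment)

    def restore(self, number):
        self.selected = set(self.savesets[number][0])
```

In this model, the first example session above corresponds to and_valid("NUMNCS"), and_if("YEAR", ">", 88), and_if("RESOL", ">", 2.5) and save(1, ...).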

6.11 Gerard's overall quality score

From version 1.0 onward, an overall score is calculated for each protein. This is done by taking a number of quality indicators and classifying each of them as BAD, POOR, FAIR, OKAY or GOOD. Each of these indicators has a weight W. All proteins start with a score of zero.
If a protein scores BAD for a criterion, -2 * W is added to its score; POOR adds -W, FAIR adds nothing, OKAY adds +W and GOOD adds +2 * W.
The scoring formula is highly subjective, of course, but I tend to find that it agrees well with my own impression of the quality and reliability of protein models.
The computed value is stored in a property called "SCORE" and can be used for selecting, sorting, etc.

The following properties are used in the formula:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Quality indicators used in scoring:
         BAD  |   POOR  |  FAIR   |  OKAY   |  GOOD
 Name    -2   |    -1   |    0    |   +1    |   +2  Weight
 RESOL       2.20      2.00      1.50      1.20      5.00
 RFAC        0.25      0.20      0.15      0.10      1.00
 RHO         0.50      1.00      1.50      3.00      1.00
 BWAVE      60.00     50.00     40.00     30.00      2.00
 BAVE       40.00     30.00     20.00     10.00      2.00
 RMSDBB     10.00      7.50      5.00      2.50      5.00
 RMSIMP      0.50      0.30      0.20      0.10      3.00
 RMSDBN     10.00      7.50      5.00      2.50      3.00
 PRMFRP     70.00     80.00     85.00     90.00      3.00
 PRDARP      2.00      1.50      1.00      0.50      5.00
 POGFAC     -2.00     -1.00     -0.50      0.00      3.00
 DACA       -2.00     -1.00     -0.50      0.00      5.00
 FLIP        4.00      3.00      2.00      1.00      3.00
 BADRSC     20.00     15.00     10.00      5.00      3.00
 %PHI10     15.00     12.00      8.00      5.00      1.00
 %PSI10     15.00     12.00      8.00      5.00      1.00
 %DIH10     15.00     12.00      8.00      5.00      1.00
 %ANG5      15.00     12.00      8.00      5.00      1.00
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
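A sketch of the scoring rule in Python, using four rows copied from the table above. It rests on assumptions that the table does not spell out: that the four numbers in each row are the boundaries between consecutive classes (worse than the first is BAD, at least as good as the fourth is GOOD), that a value exactly on a boundary falls into the better class, and that unobserved properties add nothing. The names CRITERIA, classify and quality_score are hypothetical.

```python
from bisect import bisect_right

# Class boundaries (BAD/POOR, POOR/FAIR, FAIR/OKAY, OKAY/GOOD) and weight,
# copied from four rows of the table; the boundary reading is an assumption.
CRITERIA = {
    "RESOL":  ((2.20, 2.00, 1.50, 1.20), 5.0),
    "RFAC":   ((0.25, 0.20, 0.15, 0.10), 1.0),
    "PRMFRP": ((70.0, 80.0, 85.0, 90.0), 3.0),
    "POGFAC": ((-2.00, -1.00, -0.50, 0.00), 3.0),
}

def classify(value, bounds):
    """Return -2 (BAD), -1 (POOR), 0 (FAIR), +1 (OKAY) or +2 (GOOD)."""
    # If the boundaries decrease (e.g. RESOL), smaller values are better:
    # flip signs so that larger is better and the boundaries increase.
    sign = 1.0 if bounds[0] < bounds[-1] else -1.0
    edges = [sign * b for b in bounds]
    return bisect_right(edges, sign * value) - 2

def quality_score(protein, criteria=CRITERIA):
    """Sum of class * weight; properties missing from `protein` add nothing."""
    return sum(classify(protein[name], bounds) * weight
               for name, (bounds, weight) in criteria.items() if name in protein)
```

For a hypothetical protein with RESOL 2.5, RFAC 0.18, PRMFRP 88.0 and POGFAC 0.1, this gives -10 (BAD resolution) + 0 (FAIR R-factor) + 3 (OKAY PRMFRP) + 6 (GOOD G-factor) = -1.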

6.12 Significance of correlations

From version 1.2 onward, the significance of the correlations found with COrr and ALl_corr is calculated (see chapter 13.7 of "Numerical Recipes"). For uncorrelated data, the standard deviation of the correlation coefficient, r, is roughly 1/SQRT(N), and the significance of a correlation is erfc(|r| * SQRT(N/2)), where erfc is the complementary error function. A *small* value for this number indicates that the two distributions are significantly correlated. (Given the large values for N, this is almost always the case here ;-)

However, note that this formula assumes that the tails of the individual distributions die off rapidly, which is not true here (they are more Poisson-like), so the significance of the significance has to be taken with a large rock of salt. In fact, it is probably impossible to decide whether a correlation is significant in the cases considered here (I couldn't find any appropriate tests, nor could the assembled statisticians on Usenet; sci.stats.help, I think it was).

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 QDB [476/476] > al resol
 Property to correlate : (RESOL)
 ...
 Property       : (PNBC)
 Nr of proteins : (     468)
 Corr. coeff.   : (   0.004)
 Significance   : (  3.440E+00)
 ...
 Property       : (PRMFRP)
 Nr of proteins : (     468)
 Corr. coeff.   : (  -0.480)
 Significance   : (  2.793E-25)
 ...
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
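The significance formula quoted above translates directly into Python using the standard library's math.erfc. The function name correlation_significance is hypothetical.

```python
import math

def correlation_significance(r, n):
    """Significance of a correlation coefficient r over n points.

    Computes erfc(|r| * sqrt(n / 2)). Small values indicate a significant
    correlation; since erfc(0) == 1, an r of exactly zero gives 1.
    """
    return math.erfc(abs(r) * math.sqrt(n / 2.0))
```

For r = -0.480 and N = 468 this gives about 2.9E-25, close to the 2.793E-25 printed for PRMFRP above (the printed coefficient is rounded to three decimals).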

7 KNOWN BUGS

None, at present.

Created at Fri Dec 18 19:42:23 1998 by MAN2HTML version 971024/1.6