Uppsala Software Factory

Uppsala Software Factory - ZPROF Manual


1 ZPROF - GENERAL INFORMATION

Program : ZPROF
Version : 971103
Author : Gerard J. Kleywegt, Dept. of Cell and Molecular Biology, Uppsala University, Biomedical Centre, Box 590, SE-751 24 Uppsala, SWEDEN
E-mail : gerard@xray.bmc.uu.se
Purpose : calculate Z-scores for profile/database scan results
Package : SBIN


2 REFERENCES

Reference(s) for this program:

* 1 * G.J. Kleywegt & T.A. Jones (1998). Databases in protein crystallography. Acta Cryst D54, 1119-1131. [http://alpha2.bmc.uu.se/gerard/papers/databases.html] [http://www.iucr.org/iucr-top/journals/acta/tocs/actad/1998/actad5406_1.html]

* 2 * G.J. Kleywegt & T.A. Jones (1999 ?). Chapter 25.2.6. O and associated programs. Int. Tables for Crystallography, Volume F. To be published.


3 VERSION HISTORY

970813 - 0.1 - first version
971103 - 1.0 - cleaned up code and manual


4 DESCRIPTION

ZPROF is a simple non-interactive program that reads a *sorted* list of profile/sequence scores (calculated with the pftools program "pfsearch"; highest score first) and calculates Z-scores.
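Since everything downstream depends on the input being sorted, it can be worth checking the list before running ZPROF. The following is only a minimal sketch, not part of ZPROF: it assumes that every line of the sorted file starts with the numeric raw score (which is what makes "sort -nr" work), and the helper names read_scores and is_sorted_descending are hypothetical.

```python
def read_scores(lines):
    """Parse (raw_score, rest_of_line) pairs.

    Assumption: each useful line starts with a numeric raw score.
    Blank lines and lines that do not start with a number are skipped.
    """
    entries = []
    for line in lines:
        fields = line.split(None, 1)
        if not fields:
            continue  # blank line
        try:
            score = float(fields[0])
        except ValueError:
            continue  # not a score line
        label = fields[1].strip() if len(fields) > 1 else ""
        entries.append((score, label))
    return entries


def is_sorted_descending(entries):
    """True if the raw scores never increase from one entry to the next."""
    return all(a[0] >= b[0] for a, b in zip(entries, entries[1:]))
```

To use it, pass an open file (for instance the sorted pfsearch list) to read_scores and test the result with is_sorted_descending.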

Usage: ZPROF [Z-score cut-off] < sorted_list > log_file

The value for the Z-score cut-off is optional (defaults to 4.0).

Typical example:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
pfsearch -a aligned.prf /nfs/scr_uu5/gerard/sprot34.dat |& tee pfsearch_all.log
sort -nr pfsearch_all.log > pfsearch_all.sorted
ZPROF 3.5 < pfsearch_all.sorted > zprof.top
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

The output may look as follows:

(1) The Z-score cut-off value is set to the default; if a command-line argument is found which can be interpreted as a real or integer number, that cut-off is used instead:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 *** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF ***

 Version  - 971103/1.0
 (C) 1992-97 Gerard J. Kleywegt, Dept. Mol. Biology, BMC, Uppsala (S)
 User I/O - routines courtesy of Rolf Boelens, Univ. of Utrecht (NL)
 Others   - T.A. Jones, G. Bricogne, Rams, W.A. Hendrickson
 Others   - W. Kabsch, CCP4, PROTEIN, E. Dodson, etc. etc.

 Started  - Mon Nov 3 19:21:31 1997
 User     - gerard
 Mode     - interactive
 Host     - sarek
 ProcID   - 29454
 Not using a tty as input device

*** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF ***

Reference(s) for this program:

* 1 * G.J. Kleywegt, Uppsala University, Uppsala, Sweden, Unpublished program.

For manuals and complete references, check: http://alpha2.bmc.uu.se/usf/

*** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF ***

 Z-score cut-off : (   3.500)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
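The cut-off selection in step (1) could be sketched as follows. This is an illustration of the rule described above, not ZPROF's own code; the function name parse_cutoff is hypothetical, and it is an assumption that the first argument which parses as a number is the one that is used.

```python
DEFAULT_Z_CUTOFF = 4.0  # default stated in this manual


def parse_cutoff(argv, default=DEFAULT_Z_CUTOFF):
    """Return the first argument that can be read as a real or integer
    number; fall back to the default if there is none."""
    for arg in argv:
        try:
            return float(arg)
        except ValueError:
            continue  # not a number, try the next argument
    return default
```

In a script this would typically be called as parse_cutoff(sys.argv[1:]).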

(2) The sorted score file is read and statistics are calculated and printed. Then an iterative process starts (max. 10 cycles) to determine the average score and its standard deviation for the sequences whose score is less than Average + Z_cut-off * St_deviation. When the number of such sequences no longer changes from one cycle to the next, the calculations have converged, and the average and standard deviation of that cycle are used.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Working ...

 Nr of sequences scored : (      59021)
 Average                : ( 178.673)
 St.dev.                : (  44.654)
 Minimum                : (  26.000)
 Maximum                : ( 594.000)

 Remove "outliers" and re-calc ...
 Nr of sequences left   : (      58944)
 Average                : ( 178.324)
 St.dev.                : (  43.513)
 Minimum                : (  26.000)
 Maximum                : ( 334.000)

 Remove "outliers" and re-calc ...
 Nr of sequences left   : (      58939)
 Average                : ( 178.311)
 St.dev.                : (  43.491)
 Minimum                : (  26.000)
 Maximum                : ( 330.000)

 Remove "outliers" and re-calc ...
 Nr of sequences left   : (      58939)
 Average                : ( 178.311)
 St.dev.                : (  43.491)
 Minimum                : (  26.000)
 Maximum                : ( 330.000)

 Converged !
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
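The iterative procedure of step (2) can be sketched roughly as below. This is an approximation written from the description above, not the program's source: the function name converge_statistics is hypothetical, the use of the population standard deviation is an assumption (for tens of thousands of scores the choice hardly matters), and so is the choice to filter the scores kept in the previous cycle rather than the full list. The score list is assumed to be non-empty.

```python
import statistics


def converge_statistics(scores, z_cutoff=4.0, max_cycles=10):
    """Iteratively drop scores >= mean + z_cutoff * sd and recompute.

    Returns (mean, sd, number_of_scores_kept) for the last cycle.
    """
    kept = list(scores)
    mean, sd = statistics.fmean(kept), statistics.pstdev(kept)
    for _ in range(max_cycles):
        limit = mean + z_cutoff * sd
        remaining = [s for s in kept if s < limit]
        if not remaining or len(remaining) == len(kept):
            break  # count unchanged (or nothing would be left): stop here
        kept = remaining
        mean, sd = statistics.fmean(kept), statistics.pstdev(kept)
    return mean, sd, len(kept)
```

If the loop runs out of cycles without the count stabilising, the sketch simply returns the values of the last cycle.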

(3) A bit of "profile code" is printed which can be cut and pasted into the profile file:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
MA   /NORMALIZATION: MODE=1; FUNCTION=LINEAR;
MA     R1=    -4.09993267; R2=     0.02299311; TEXT ='Z-score';
MA   /CUT_OFF: LEVEL=0; SCORE=     331; N_SCORE=     3.50000000; MODE=1;
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
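The printed R1 and R2 values are consistent with a linear normalisation of the form Z = R1 + R2 * raw_score, with R2 = 1 / St_deviation and R1 = -Average / St_deviation. The sketch below shows that arithmetic; it is an interpretation of the example output, not code taken from ZPROF, and the function name normalization_coefficients is hypothetical.

```python
def normalization_coefficients(mean, sd):
    """Return (R1, R2) such that R1 + R2 * raw == (raw - mean) / sd."""
    r2 = 1.0 / sd
    r1 = -mean / sd
    return r1, r2


# Converged values from the example run above.
r1, r2 = normalization_coefficients(178.311, 43.491)
```

With the rounded average and standard deviation printed in the log, the result agrees with the R1 and R2 in the example only to about four significant figures; presumably the program uses the unrounded values.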
   

(4) In case you want to use a different cut-off, a number of Z-score cut-off values and the corresponding raw score values are printed:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Z-score of  0.0 requires raw score      178
 Z-score of  0.5 requires raw score      200
 Z-score of  1.0 requires raw score      222
 Z-score of  1.5 requires raw score      244
 Z-score of  2.0 requires raw score      265
 Z-score of  2.5 requires raw score      287
 Z-score of  3.0 requires raw score      309
 Z-score of  3.5 requires raw score      331
 Z-score of  4.0 requires raw score      352
 Z-score of  4.5 requires raw score      374
 Z-score of  5.0 requires raw score      396
 Z-score of  5.5 requires raw score      418
 Z-score of  6.0 requires raw score      439
 Z-score of  6.5 requires raw score      461
 Z-score of  7.0 requires raw score      483
 Z-score of  7.5 requires raw score      504
 Z-score of  8.0 requires raw score      526
 Z-score of  8.5 requires raw score      548
 Z-score of  9.0 requires raw score      570
 Z-score of  9.5 requires raw score      591
 Z-score of 10.0 requires raw score      613
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
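The table above can be reproduced from the converged average and standard deviation, assuming the raw score is Average + Z * St_deviation rounded to the nearest integer. The rounding rule is inferred from the printed numbers rather than documented, and the function name raw_score_for_z is hypothetical.

```python
import math


def raw_score_for_z(z, mean, sd):
    """Raw score that corresponds to Z-score z, rounded to nearest integer."""
    return math.floor(mean + z * sd + 0.5)


# Converged values from the example run above.
for half_steps in range(0, 21):
    z = half_steps / 2.0
    print(f" Z-score of {z:4.1f} requires raw score {raw_score_for_z(z, 178.311, 43.491):8d}")
```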
   

(5) The top-scoring entries are listed with their rank, Z-score and raw score. (Note that this assumes that the input file was already sorted!) The listing ends after the first entry that scores below the Z-score cut-off has been listed, so the last line shown is not a hit.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
        1     9.56    594 P29257|LEC2_CYTSC 2-ACETAMIDO-2-DEOXY-D-GALACTOSE-BINDING SEED LECTIN II
        2     9.24    580 P45797|GUB_BACPO BETA-GLUCANASE PRECURSOR (EC 3.2.1.73) (ENDO-BETA-1,3-1,
        3     9.03    571 P27051|GUB_BACLI BETA-GLUCANASE PRECURSOR (EC 3.2.1.73) (ENDO-BETA-1,3-1,

[...]

       80     3.56    333 P36851|HEX_ADE07 HEXON PROTEIN (LATE PROTEIN 2).
       81     3.56    333 P36849|HEX_ADE03 HEXON PROTEIN (LATE PROTEIN 2).
       82     3.53    332 P32491|MKK2_YEAST PROTEIN KINASE MKK2/SSP33 (EC 2.7.1.-).
       83     3.49    330 P38419|LOXC_ORYSA LIPOXYGENASE, CHLOROPLAST PRECURSOR (EC 1.13.11.12).
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
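The listing logic of step (5) might look like the sketch below. It assumes the entries are (raw_score, label) pairs already sorted with the highest score first, and that Z = (raw_score - Average) / St_deviation; the function name list_hits is hypothetical and the code is not taken from ZPROF.

```python
def list_hits(entries, mean, sd, z_cutoff):
    """Yield (rank, z, raw, label) for the top entries.

    The first entry below the cut-off is still yielded, then the
    listing stops, mirroring the behaviour described above.
    """
    for rank, (raw, label) in enumerate(entries, start=1):
        z = (raw - mean) / sd
        yield rank, z, raw, label
        if z < z_cutoff:
            break


# A few raw scores from the example, with the converged statistics.
sample = [(594, "P29257|LEC2_CYTSC"), (332, "P32491|MKK2_YEAST"),
          (330, "P38419|LOXC_ORYSA"), (300, "not listed")]
for rank, z, raw, label in list_hits(sample, 178.311, 43.491, 3.5):
    print(f"{rank:9d}{z:9.2f}{raw:7d} {label}")
```

Because the loop stops at the first entry below the cut-off, an unsorted input would silently truncate the listing, which is why the manual insists on sorting.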

(6) Finally, a brief summary is printed:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Z-score cut-off : (   3.500)
 Nr of "hits"    : (         82)
 % of database   : (   0.139)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
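The percentage in the summary appears to be the number of hits divided by the total number of sequences scored (59021 in the example run), times 100. A small sketch of that arithmetic, with the hypothetical helper name percent_of_database:

```python
def percent_of_database(n_hits, n_total):
    """Hits as a percentage of all scored sequences."""
    return 100.0 * n_hits / n_total


# Numbers from the example run: 82 hits out of 59021 scored sequences.
print(f" % of database   : ({percent_of_database(82, 59021):8.3f})")
```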
   


5 KNOWN BUGS

None, at present ("peppar, peppar", i.e. touch wood).


6 UNKNOWN BUGS

Does not compute.


Uppsala Software Factory Created at Fri Dec 18 19:42:31 1998 by MAN2HTML version 971024/1.6