Program : ZPROF
Version : 971103
Author : Gerard J. Kleywegt, Dept. of Cell and Molecular Biology,
Uppsala University, Biomedical Centre, Box 590,
SE-751 24 Uppsala, SWEDEN
E-mail : gerard@xray.bmc.uu.se
Purpose : calculate Z-scores for profile/database scan results
Package : SBIN
Reference(s) for this program:
* 1 * G.J. Kleywegt & T.A. Jones (1998). Databases in protein crystallography. Acta Cryst D54, 1119-1131. [http://alpha2.bmc.uu.se/gerard/papers/databases.html] [http://www.iucr.org/iucr-top/journals/acta/tocs/actad/1998/actad5406_1.html]
* 2 * G.J. Kleywegt & T.A. Jones (1999 ?). Chapter 25.2.6. O and associated programs. Int. Tables for Crystallography, Volume F. To be published.
970813 - 0.1 - first version
971103 - 1.0 - cleaned up code and manual
ZPROF is a simple non-interactive program that reads a *sorted* list of profile/sequence scores (calculated with the pftools program "pfsearch"; highest score first) and calculates Z-scores.
Usage: ZPROF [Z-score cut-off] < sorted_list > log_file
The value for the Z-score cut-off is optional (defaults to 4.0).
Typical example:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
pfsearch -a aligned.prf /nfs/scr_uu5/gerard/sprot34.dat | & tee pfsearch_all.log
sort -nr pfsearch_all.log > pfsearch_all.sorted
ZPROF 3.5 < pfsearch_all.sorted > zprof.top
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
The output may look as follows:
(1) The Z-score cut-off value is set to the default; if a command-line argument is found which can be interpreted as a real or integer number, that cut-off is used instead:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
*** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF ***

Version  - 971103/1.0
(C) 1992-97 Gerard J. Kleywegt, Dept. Mol. Biology, BMC, Uppsala (S)
User I/O - routines courtesy of Rolf Boelens, Univ. of Utrecht (NL)
Others   - T.A. Jones, G. Bricogne, Rams, W.A. Hendrickson
Others   - W. Kabsch, CCP4, PROTEIN, E. Dodson, etc. etc.

Started  - Mon Nov 3 19:21:31 1997
User     - gerard
Mode     - interactive
Host     - sarek
ProcID   - 29454
Not using a tty as input device

*** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF ***

Reference(s) for this program:

* 1 * G.J. Kleywegt, Uppsala University, Uppsala, Sweden, Unpublished program.

For manuals and complete references, check: http://alpha2.bmc.uu.se/usf/

*** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF *** ZPROF ***

Z-score cut-off : ( 3.500)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
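The cut-off handling described in (1) can be sketched in a few lines of Python. This is only an illustration, not the program's own code: the function name parse_cutoff is hypothetical, and it is an assumption that only the first argument is inspected and that a non-numeric argument is silently ignored.

```python
DEFAULT_Z_CUTOFF = 4.0  # default value stated in this manual


def parse_cutoff(argv, default=DEFAULT_Z_CUTOFF):
    """Return the first command-line argument as a number, else the default.

    Assumption: only the first argument is looked at, and anything that
    cannot be read as a real or integer number leaves the default in place.
    """
    if len(argv) > 1:
        try:
            return float(argv[1])
        except ValueError:
            pass
    return default
```

Called as parse_cutoff(sys.argv) in a script, "ZPROF 3.5" would give 3.5 and a bare "ZPROF" would give 4.0.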
(2) The sorted score file is read, and statistics are calculated and printed. An iterative process (max. 10 cycles) then determines the average score and its standard deviation for the sequences whose score is less than Average + Z_cut-off * St_deviation. When the number of such sequences no longer changes from one cycle to the next, the calculation has converged, and the average and standard deviation of that cycle are used.
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Working ...

Nr of sequences scored : ( 59021)
Average                : ( 178.673)
St.dev.                : ( 44.654)
Minimum                : ( 26.000)
Maximum                : ( 594.000)

Remove "outliers" and re-calc ...
Nr of sequences left   : ( 58944)
Average                : ( 178.324)
St.dev.                : ( 43.513)
Minimum                : ( 26.000)
Maximum                : ( 334.000)

Remove "outliers" and re-calc ...
Nr of sequences left   : ( 58939)
Average                : ( 178.311)
St.dev.                : ( 43.491)
Minimum                : ( 26.000)
Maximum                : ( 330.000)

Remove "outliers" and re-calc ...
Nr of sequences left   : ( 58939)
Average                : ( 178.311)
St.dev.                : ( 43.491)
Minimum                : ( 26.000)
Maximum                : ( 330.000)

Converged !
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
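The iteration in (2) can be sketched as follows in Python. This is a minimal illustration under stated assumptions, not the program's source: the function names are hypothetical, the standard deviation is taken with divisor N (the manual does not say whether N or N-1 is used), and each cycle re-selects from the full list of scores.

```python
import math

MAX_CYCLES = 10  # maximum number of cycles stated in this manual


def mean_and_sd(scores):
    """Mean and standard deviation; the divisor N is an assumption."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / n
    return mean, math.sqrt(variance)


def trimmed_statistics(scores, z_cutoff=4.0, max_cycles=MAX_CYCLES):
    """Repeat: keep scores < mean + z_cutoff * sd, then recompute mean and sd.

    Stops when the number of kept scores is unchanged between two cycles.
    Returns (mean, sd, n_kept, converged).
    """
    mean, sd = mean_and_sd(scores)
    n_kept = len(scores)
    for _ in range(max_cycles):
        limit = mean + z_cutoff * sd
        kept = [s for s in scores if s < limit]
        if not kept:
            break  # degenerate input (e.g. all scores identical)
        mean, sd = mean_and_sd(kept)
        if len(kept) == n_kept:
            return mean, sd, n_kept, True
        n_kept = len(kept)
    return mean, sd, n_kept, False
```

In the example output above, the first cycle gives a limit of 178.673 + 3.5 * 44.654 = 334.96, which is why the maximum among the remaining sequences is 334.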
(3) A bit of "profile code" is printed which can be cut and pasted into the profile file:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
MA   /NORMALIZATION: MODE=1; FUNCTION=LINEAR;
MA   R1= -4.09993267; R2= 0.02299311; TEXT ='Z-score';
MA   /CUT_OFF: LEVEL=0; SCORE= 331; N_SCORE= 3.50000000; MODE=1;
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
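The printed R1 and R2 are consistent with writing the Z-score as a linear function of the raw score, Z = (score - Average) / St_deviation = R1 + R2 * score, so that R1 = -Average / St_deviation and R2 = 1 / St_deviation. The Python sketch below (hypothetical function name) checks this against the example; because it starts from the rounded average and standard deviation printed in (2), it reproduces the printed coefficients only to about four significant figures.

```python
def linear_normalization(mean, sd):
    """Coefficients (r1, r2) such that z = r1 + r2 * raw_score."""
    r2 = 1.0 / sd
    r1 = -mean / sd
    return r1, r2


# Converged values from the example output (rounded to three decimals there).
r1, r2 = linear_normalization(178.311, 43.491)
print(f"R1= {r1:.4f}; R2= {r2:.6f};")
```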
(4) In case you want to use a different cut-off, a number of Z-score cut-off values and the corresponding raw score values are printed:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Z-score of  0.0 requires raw score 178
Z-score of  0.5 requires raw score 200
Z-score of  1.0 requires raw score 222
Z-score of  1.5 requires raw score 244
Z-score of  2.0 requires raw score 265
Z-score of  2.5 requires raw score 287
Z-score of  3.0 requires raw score 309
Z-score of  3.5 requires raw score 331
Z-score of  4.0 requires raw score 352
Z-score of  4.5 requires raw score 374
Z-score of  5.0 requires raw score 396
Z-score of  5.5 requires raw score 418
Z-score of  6.0 requires raw score 439
Z-score of  6.5 requires raw score 461
Z-score of  7.0 requires raw score 483
Z-score of  7.5 requires raw score 504
Z-score of  8.0 requires raw score 526
Z-score of  8.5 requires raw score 548
Z-score of  9.0 requires raw score 570
Z-score of  9.5 requires raw score 591
Z-score of 10.0 requires raw score 613
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
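The table in (4) follows from raw score = Average + Z * St_deviation. A short Python sketch (hypothetical function name) that regenerates it from the converged values of the example is given below; rounding to the nearest integer is an assumption, chosen because it matches the printed raw scores.

```python
def raw_score_for_z(z, mean, sd):
    """Raw score corresponding to a Z-score, rounded to the nearest integer.

    Rounding to nearest is an assumption; int(x + 0.5) is only valid for
    the positive scores used here.
    """
    return int(mean + z * sd + 0.5)


# Converged average and standard deviation from the example output.
MEAN, SD = 178.311, 43.491

for i in range(21):
    z = 0.5 * i
    print(f"Z-score of {z:4.1f} requires raw score {raw_score_for_z(z, MEAN, SD)}")
```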
(5) The top-scoring entries are listed with their rank, Z-score and raw score. (Note that this assumes that the input file was already sorted !!!) After the first entry which scores below the Z-score cut-off has been listed, the listing ends.
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
  1  9.56  594 P29257|LEC2_CYTSC 2-ACETAMIDO-2-DEOXY-D-GALACTOSE-BINDING SEED LECTIN II
  2  9.24  580 P45797|GUB_BACPO BETA-GLUCANASE PRECURSOR (EC 3.2.1.73) (ENDO-BETA-1,3-1,
  3  9.03  571 P27051|GUB_BACLI BETA-GLUCANASE PRECURSOR (EC 3.2.1.73) (ENDO-BETA-1,3-1,
[...]
 80  3.56  333 P36851|HEX_ADE07 HEXON PROTEIN (LATE PROTEIN 2).
 81  3.56  333 P36849|HEX_ADE03 HEXON PROTEIN (LATE PROTEIN 2).
 82  3.53  332 P32491|MKK2_YEAST PROTEIN KINASE MKK2/SSP33 (EC 2.7.1.-).
 83  3.49  330 P38419|LOXC_ORYSA LIPOXYGENASE, CHLOROPLAST PRECURSOR (EC 1.13.11.12).
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
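The listing logic in (5) can be sketched as below in Python. The function name is hypothetical, and the input layout is an assumption: each line is taken to start with the raw score, followed by the entry identifier and description, which is what "sort -nr" on the pfsearch output would require.

```python
def list_hits(sorted_lines, mean, sd, z_cutoff):
    """Return (rank, z_score, raw_score, text) rows for a descending-sorted list.

    The first entry that falls below the cut-off is still included,
    after which the listing stops.
    """
    rows = []
    rank = 0
    for line in sorted_lines:
        fields = line.split(None, 1)
        if not fields:
            continue  # skip blank lines
        rank += 1
        raw = float(fields[0])  # assumed: raw score is the first field
        text = fields[1].rstrip() if len(fields) > 1 else ""
        z = (raw - mean) / sd
        rows.append((rank, z, raw, text))
        if z < z_cutoff:
            break
    return rows
```

With the converged values of the example (178.311 and 43.491), a raw score of 594 gives Z = 9.56 and a raw score of 330 gives Z = 3.49, in agreement with entries 1 and 83 above.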
(6) Finally, a brief summary is printed:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Z-score cut-off : ( 3.500)
Nr of "hits"    : ( 82)
% of database   : ( 0.139)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
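Note that 83 entries are listed in (5) but only 82 count as "hits": the last listed entry is the first one below the cut-off. The percentage can be checked with the Python lines below (hypothetical function name). Taking the total number of sequences scored (59021) as the denominator is an assumption; the number left after outlier removal (58939) gives the same value to three decimals.

```python
def percent_of_database(n_hits, n_sequences):
    """Hits as a percentage of the sequences in the database scan."""
    return 100.0 * n_hits / n_sequences


pct = percent_of_database(82, 59021)
print(f"% of database : ( {pct:.3f})")
```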
Known bugs: none, at present ("peppar, peppar", Swedish for "touch wood").
Does not compute.