Program : DEJAVU
Version : 981127
Author : Gerard J. Kleywegt, Dept. of Cell and Molecular Biology,
Uppsala University, Biomedical Centre, Box 590,
SE-751 24 Uppsala, SWEDEN
E-mail : gerard@xray.bmc.uu.se
Purpose : detecting similarities/motifs in protein structures
using a large database
Package : DEJAVU
Reference(s) for this program:
* 1 * G.J. Kleywegt & T.A. Jones (1994). Halloween ... Masks and Bones. In "From First Map to Final Model", edited by S. Bailey, R. Hubbard and D. Waller. SERC Daresbury Laboratory, Warrington, pp. 59-66.
* 2 * G.J. Kleywegt & T.A. Jones (1997). Taking the fun out of map interpretation. CCP4/ESF-EACBM Newsletter on Protein Crystallography 33, January 1997, pp. 19-21. [http://alpha2.bmc.uu.se/usf/factory_7.html]
* 3 * G.J. Kleywegt & T.A. Jones (1997). Detecting folding motifs and similarities in protein structures. Methods in Enzymology 277, 525-545.
* 4 * G.J. Kleywegt & T.A. Jones (1999 ?). Chapter 25.2.6. O and associated programs. Int. Tables for Crystallography, Volume F. To be published.
921022 - 0.1 - Started programming; called program "AnalSecS"
for "ANALyse SECondary Structure" ...
921029 - 1.0 - First working version released in-house;
first version of the manual
921030 - 1.1 - Minor changes; continued manual; cro analysis
921031 - 1.2 - Minor changes to lsq-macro and output;
corrected non-conservation of directionality;
introduced weights in the score calculation
921103 - 1.3 - Changed LIst option; added STatistics option
930105 - 1.4 - Changed name to DEJAVU (at last); updated manual
930125 - 1.5 - Implemented distance options I and A; implemented
incremental search for maximum common motif;
option to try to avoid multiple chain hits
930126 - 1.6 - Removed some minor bugs
930222 - 1.7 - new SELECT option; avoid hits with multiple copies
of the "same" protein
930302 - 1.8 - TOPOLOGY option (crummy !!!)
930713 - 2.0 - cleaned up for export; added notes on installing
and running the software to this manual file
930826 - 2.1 - more info when errors occur during database read;
increased array dimensions for new databases
930921 - 2.1.1 minor bug fix in SElect (needed for DEC Alphas)
930923 - - added jiffy program POST to analyse O log file
930924 - 3.0 - altered SElect command to continue cycling until
you actually choose option 0 (=back to main menu);
BONES search option (part of INcr); works for P2 !
930927 - 3.1 - if BONES search, check that there are > 2 SSEs;
if NO directionality, use |cos| for the score;
option to skip all proteins whose PDB file does
not exist (actually: can not be read by the user);
only include factors in score whose weight > 0.01;
include centroid-LSQ-RMSD as a factor contributing
to the score; new option to do either an lsq_explicit
inside O, or an lsq_centroid inside DEJAVU; make
lsq_improve with both complete molecules the default
for the FInd option as well
931206 - 4.0 - interface with LSQMAN (through input file)
941101 - 4.1 - increased dimensioning to 2500 structures
950118 - 4.2 - sensitive to environment variable GKLIB
950718 - 4.3 - replaced "mismatch nr of residues" by two separate
cut-offs for "too short" and "too long" SSEs
970102 - 5.0 - better suggested defaults for BONES searches; sort
the hits (by nr of SSEs -> RMSD -> Score); reduced
the amount of output generated by the program; add
PDB identifier to PRINT statements in O macros to
facilitate grep-ing results for a particular entry
(e.g.: "grep ^print lsq.omac | grep 1ack")
970115 - - added DEJANA to sort O macros produced by DEJAVU or
LSQMAN; added quick starter guide to manual and a
brief description of DEJANA
970131 - 5.1 - moved a few search parameters which are rarely used
to a separate PArameter command
970729 - 5.2 - LSQMAN will now also write the aligned hits to
PDB files (can be switched off) - this is useful
for non-O users
981020 -5.2.1- minor bug fix (RMSD not always printed in list of hits)
981127 - 5.3 - new SElect options to (de)select multiple entries;
list total number of mismatched residues for every
hit; list total number of gap-length differences
(between neighbouring SSEs) for every hit; implemented
symbolic searching where spatial arrangements of
SSEs are not used, only their type and length (in
terms of residues) - can be used if you get no
hits at all, or if you have a very reliable secondary
structure prediction
In the "good old days" protein scientists made it a sport to become walking databanks of secondary structure motifs; upon seeing a particular fold, for example during a seminar, they would say: "Oh, but that fold also occurs in XXX", and, boy, did you feel stupid for having failed to notice this. Well, your worries might be coming to an end soon, thanks to DEJAVU.
DEJAVU will take a description of the secondary structure elements that occur in your particular protein and compare it to a huge database of secondary structure elements that occur in protein structures that have been published as PDB files.
What's the basic idea ? A MOTIF of secondary structure elements
(henceforth abbreviated "SSEs") consists of N SSEs. Each SSE i
comprises M(i) residues and has a length of L(i) Angstrom
(measured from the first residue's Calpha to that of the last
residue). The motif as a whole is characterised by a matrix D(i,j),
which contains the distances between SSEs i and j (for example,
centre-to-centre), and by another matrix C(i,j), which contains the
cosines of the angles between the direction vectors of SSEs i and j
(the direction vector goes FROM the N-terminal Calpha TO the
C-terminal one). Finding a motif in the database that is SIMILAR to
the one in your protein then comes down to finding suitable
collections of N SSEs in the structures of other proteins which
have approximately the same numbers of residues, the same lengths
and comparable mutual distances and direction-vector cosines.
And that is ALL there is to it !
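As an illustration of the description above, here is a minimal Python sketch that builds the D(i,j) and C(i,j) matrices from the terminal Calpha coordinates of each SSE. It is not DEJAVU's own code: the function names and the representation of an SSE as a (start, end) pair of coordinate triples are assumptions made for this example, and D is taken to be the centre-to-centre distance.

```python
import math

def sse_length(sse):
    """Length L(i): distance from the first to the last Calpha."""
    start, end = sse
    return math.dist(start, end)

def centre(sse):
    """Midpoint between the two terminal Calpha atoms."""
    start, end = sse
    return tuple((s + e) / 2.0 for s, e in zip(start, end))

def direction(sse):
    """Unit vector FROM the N-terminal Calpha TO the C-terminal one."""
    start, end = sse
    length = sse_length(sse)
    return tuple((e - s) / length for s, e in zip(start, end))

def motif_matrices(sses):
    """Return (D, C): centre-to-centre distances and direction cosines."""
    centres = [centre(s) for s in sses]
    dirs = [direction(s) for s in sses]
    n = len(sses)
    D = [[math.dist(centres[i], centres[j]) for j in range(n)] for i in range(n)]
    C = [[sum(a * b for a, b in zip(dirs[i], dirs[j])) for j in range(n)]
         for i in range(n)]
    return D, C

# Helices A1 and A2 of the "crab" example file shown later in this manual
a1 = ((22.385, 17.198, 17.681), (13.432, 21.938, 13.569))
a2 = ((22.755, 24.444, 6.742), (27.253, 23.416, 21.312))
D, C = motif_matrices([a1, a2])
```

Two motifs can then be compared element by element: the differences in M(i), L(i), D(i,j) and C(i,j) are what the mismatch cut-offs discussed later in this manual are applied to.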
NOTE: unless you have compelling reasons to do otherwise, you are strongly advised to use the INcremental search option rather than the FInd option, since the former is much less sensitive to small differences between similar structures.
NOTE: you can also use this program with "SSEs" based on a skeleton
(Bones). Simply create an SSE file with dummy residue names,
find the terminal CA positions by clicking on the appropriate
Bones atoms & guess the number of residues as:
- N->C distance (A) divided by 1.6 A/residue for a helix
- N->C distance (A) divided by 3.4 A/residue for a strand
For more details, see: G.J. Kleywegt & T.A. Jones,
"Halloween ... Masks and Bones", in "From First Map to Final Model"
(S. Bailey, R. Hubbard & D. Waller, Eds.), SERC Daresbury Laboratory,
Warrington (1994), pp. 59-66.
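The residue-count guess for Bones-based "SSEs" can be sketched as follows. The two divisors are the ones quoted in the note above; the function name, the rounding to the nearest integer and the minimum of one residue are assumptions made for this example.

```python
import math

# Angstrom per residue, as quoted in the note above
RISE_PER_RESIDUE = {"ALPHA": 1.6, "BETA": 3.4}

def guess_nres(sse_type, first_ca, last_ca):
    """Guess the number of residues from the N->C distance of a Bones SSE."""
    distance = math.dist(first_ca, last_ca)
    # Rounding to the nearest integer is an assumption of this sketch
    return max(1, round(distance / RISE_PER_RESIDUE[sse_type]))

# A 16 A stretch of Bones interpreted as a helix
helix_nres = guess_nres("ALPHA", (0.0, 0.0, 0.0), (16.0, 0.0, 0.0))
```

Since the guess is rough, it makes sense to combine it with generous "too long" cut-offs in the search, as discussed later in this manual.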
NOTE: This program is sensitive to the environment variable GKLIB. If set, the name of this directory will be prepended to the default name for the library file needed by this program. For example, in Uppsala, put the following line in your .login or .cshrc file:
setenv GKLIB /nfs/public/lib
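The effect of GKLIB on the suggested library file name can be pictured with the small sketch below. It only mimics the behaviour described in the note; the function name is hypothetical, and the default file name "dejavu.lib" is taken from the start-up prompt shown in the quick-start example of this manual.

```python
import os
import posixpath

def default_library_name(default_name="dejavu.lib", env=None):
    """Prepend the GKLIB directory (if set) to the default library name."""
    env = os.environ if env is None else env
    gklib = env.get("GKLIB")
    if gklib:
        return posixpath.join(gklib, default_name)
    return default_name

# With GKLIB set as in the Uppsala example
suggested = default_library_name(env={"GKLIB": "/nfs/public/lib"})
```

With the Uppsala setting this gives /nfs/public/lib/dejavu.lib, which is the default offered by the "DEJAVU SSE library file ?" prompt in the example session.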
This section briefly goes through the necessary steps of running DEJAVU - it is NOT a substitute for reading the manual.
* set up the programs and database as described elsewhere in this document
* run the "make_sse" script to generate an SSE file (the latest version of this script can be found in the OMAC directory)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
% 264 gerard sarek 17:07:00 gerard/junk > make_sse
make_sse - generate an SSE file for DEJAVU - gerard kleywegt

NOTE: this script will ONLY work if you have the run alias set up correctly .... if you do not know what this is, ask Gerard .....................

Enter a 4 character PDB identifier for your structure > crab
crab
Enter the COMPLETE path and file name of your PDB file > /nfs/pdb/full/1cbs.pdb
/nfs/pdb/full/1cbs.pdb
Enter a comment string about your structure > crabp 2
crabp 2
Enter the name of an O database file that you own > gen.o6
gen.o6
... running PRO1 ...
[...]
Removing temporary files ...
SSE file crab.sse created
!
! === crab
!
MOL crab
NOTE crabp 2
PDB /nfs/pdb/full/1cbs.pdb
!
BETA 'B1' '6' '8' 3 28.796 22.676 37.211 30.802 19.424 31.450
BETA 'B2' '11' '13' 3 29.015 12.742 24.851 24.586 12.258 19.768
ALPHA 'A1' '15' '22' 8 22.385 17.198 17.681 13.432 21.938 13.569
ALPHA 'A2' '26' '36' 11 22.755 24.444 6.742 27.253 23.416 21.312
BETA 'B3' '39' '46' 8 28.851 24.201 26.747 16.441 23.718 46.062
BETA 'B4' '49' '56' 8 12.870 24.551 42.348 27.978 28.831 24.684
BETA 'B5' '59' '65' 7 23.783 31.655 22.882 11.058 26.641 37.132
BETA 'B6' '70' '72' 3 3.912 26.102 34.059 7.826 30.063 29.644
BETA 'B7' '82' '86' 5 4.988 27.197 27.492 6.946 16.621 35.945
BETA 'B8' '92' '99' 8 14.016 11.835 35.981 4.357 26.721 21.041
BETA 'B9' '106' '113' 8 6.255 18.453 18.868 20.634 11.503 37.091
BETA 'B10' '118' '125' 8 26.242 11.413 35.458 11.634 16.962 17.923
BETA 'B11' '128' '135' 8 14.286 12.486 16.407 29.697 14.762 34.394
ENDMOL
Finished with exit status 0
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
* start DEJAVU
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
% 265 gerard sarek 17:07:00 gerard/junk > run dejavu
[...]
DEJAVU SSE library file ? (/nfs/public/lib/dejavu.lib)
List contents of SSE library (Y/N) ? (N)
Skip non-existent PDB files (Y/N) ? (N)
[...]
===> Option ? (READ)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
* read your new SSE file
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
===> Option ? (READ) read
User DEJAVU file ? (user.sse) crab.sse
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
* start an INcremental search; tweak the input parameters until you get more hits than you would hope to find (we'll get rid of the poor ones later; better to find a few poor hits now, than to miss correct ones)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
===> Option ? (READ) in
********** NEW QUERY **********
Elements : ( B1 B2 A1 A2 B3 B4 B5 B6 B7 B8 B9 B10 B11)
Nr of SSEs : ( 13)
Min nr of residues for SSEs ? ( 4)
Nr of SSEs : ( 10)
Remaining SSEs : ( A1 A2 B3 B4 B5 B7 B8 B9 B10 B11)
Min nr of elements to match (0 = abort) ? ( 4) 6
Is this a BONES search ? (N)
Do lsq_explicit inside O ? (N)
Define how much the nr of residues in SSEs may differ
by defining how many residues shorter or longer SSEs
in the database may be compared to those in your protein.
Max nr of residues "too short" ? ( 2)
Max nr of residues "too long" ? ( 4)
Mismatch element length ? ( 10.000)
Mismatch distances ? ( 8.000)
Mismatch cosines ? ( 0.400)
Weights for nr res, length, dist, cos, rmsd
Weights for scoring ? ( 0.001 0.001 0.100 0.100 0.500)
Normalised weights : ( 0.014 0.014 0.139 0.139 0.694)
Possible distance criteria:
C => centre-to-centre
H => MIN head-tail and tail-head (anti-parallel)
T => MIN head-head and tail-tail (parallel)
I => MIN of all these distances
A => MAX of all these distances
Which distances (C/H/T/I/A) ? (C)
Extensive output ? (N)
Conserve directionality ? (Y)
Conserve absolute motif ? (Y)
Conserve neighbours ? (N)
Attempt to avoid multi-chain hits ? (N)
Attempt to avoid identical proteins ? (N)
Create O macro file ? (Y)
O macro file ? (lsq.omac)
Create LSQMAN input file ? (Y)
LSQMAN input file ? (lsqman.inp)
[...]
Sorting hits ...
  Nr Entry  PDB  SSE  RMSD SCORE Compound
==== ===== ==== ==== ===== ===== ========
   1   152 1cbs   10  0.00  0.00 cellular retinoic-acid-binding protein type ii co - human (homo sapie
   2   149 1cbi   10  1.73  1.50 mol_id: 1; - mol_id: 1;
   3   490 1hmt    9  1.31  1.15 fatty acid binding protein (human muscle, m-fabp) - organism: homo sa
   4   619 1lid    9  1.45  1.27 adipocyte lipid-binding protein complexed with ol - mouse (mus muscul
   5   759 1opb    9  1.94  1.66 cellular retinol binding protein ii (holo form) - rat (rattus rattus
   6   219 1crb    9  2.64  2.31 cellular retinol binding protein (crbp) complexed - rat (rattus rattu
   7   825 1pmp    8  1.13  1.03 p2 myelin protein (p2) - bovine (bos taurus
   8   380 1ftp    8  1.73  1.50 fatty-acid-binding protein - desert locust (sch
   9   663 1mdc    8  2.43  2.08 fatty acid binding protein (manduca sexta) (mfb2) - tobacco hornworm
  10   197 1cly    7  3.94  3.64 mol_id: 1; -
  11   715 1ncb    7  6.02  5.43 n9 neuraminidase-nc41 (e.c.3.2.1.18) mutant with - influenza virus a/
[...]
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
* when you're happy, quit the program
* it is strongly recommended to now run LSQMAN to separate the men from the boys
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
% 266 gerard sarek 17:07:00 gerard/junk > run lsqman < lsqman.inp > lsqman.out
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
* now run DEJANA to sort out the hits you're really interested in, let it write them to a new O macro, and execute this macro from within O. The use of DEJANA is described elsewhere in this manual
* set up the programs and database as described elsewhere in this document
* you will have to create an SSE file. Usually, this means you have at least a set of Bones in which you can identify SSEs. Perhaps you have used ESSENS and SOLEX to get an SSE file (see the SOLEX manual for more details), for example:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
! Created by SOLEX V. 961228/1.0 at Sat Dec 28 23:36:51 1996 for user gerard
!
MOL bone
NOTE auto-generated by SOLEX
PDB btrace.pdb
!
BETA 'B1' ' 1' ' 12' 12 61.43 60.73 47.76 33.97 55.75 27.06
BETA 'B2' ' 13' ' 21' 9 44.24 63.08 16.44 37.40 64.56 41.58
BETA 'B3' ' 22' ' 29' 8 56.31 63.65 17.51 44.11 72.87 32.13
BETA 'B4' ' 30' ' 37' 8 49.36 51.47 27.01 61.21 66.47 37.90
BETA 'B5' ' 38' ' 45' 8 57.25 53.27 22.42 59.65 74.87 31.87
BETA 'B6' ' 46' ' 52' 7 45.76 52.50 31.42 59.24 63.58 40.97
BETA 'B7' ' 53' ' 59' 7 62.51 73.28 34.79 52.24 58.42 26.17
BETA 'B8' ' 60' ' 65' 6 47.19 65.18 19.62 39.41 67.92 33.35
ENDMOL
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
* start DEJAVU and read in your SSE file
* start an INcremental search, and answer Yes when asked whether this is a Bones search. Tweak the input parameters until you get more hits than you would ever want (we'll sort out the good and the bad later)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
===> Option ? (READ) in
********** NEW QUERY **********
Elements : ( B1 B2 B3 B4 B5 B6 B7 B8)
Nr of SSEs : ( 8)
Min nr of residues for SSEs ? ( 4)
Nr of SSEs : ( 8)
Remaining SSEs : ( B1 B2 B3 B4 B5 B6 B7 B8)
Min nr of elements to match (0 = abort) ? ( 4) 6
Is this a BONES search ? (N) yes
BONES search mode
BONES search; will do lsq_centroid
Define how much the nr of residues in SSEs may differ
by defining how many residues shorter or longer SSEs
in the database may be compared to those in your protein.
BONES suggested value: 1 or 2
Max nr of residues "too short" ? ( 2)
BONES suggested value: 4 to 6
Max nr of residues "too long" ? ( 4)
BONES suggested value: ~10
Mismatch element length ? ( 10.000)
BONES suggested value: ~6
Mismatch distances ? ( 8.000) 6
BONES suggested value: 0.2 to 0.4
Mismatch cosines ? ( 0.400) 0.2
Weights for nr res, length, dist, cos, rmsd
BONES suggested values: 0 0 1 1 5
Weights for scoring ? ( 0.001 0.001 0.100 0.100 0.500) 0 0 1 1 5
Normalised weights : ( 0.001 0.001 0.142 0.142 0.712)
Possible distance criteria:
C => centre-to-centre
H => MIN head-tail and tail-head (anti-parallel)
T => MIN head-head and tail-tail (parallel)
I => MIN of all these distances
A => MAX of all these distances
BONES suggested value: C !!!
Which distances (C/H/T/I/A) ? (C)
Extensive output ? (N)
BONES suggested value: NO !!!
Conserve directionality ? (Y) no
BONES suggested value: Y
Conserve absolute motif ? (Y)
BONES suggested value: NO !!!
Conserve neighbours ? (N) no
Attempt to avoid multi-chain hits ? (N)
Attempt to avoid identical proteins ? (N)
Create O macro file ? (Y)
O macro file ? (lsq.omac)
[...]
Nr of database entries : ( 1381)
Nr of selected entries : ( 1381)
Nr of matching entries : ( 54)
Nr of hits (total) : ( 376)
Sorting hits ...
  Nr Entry  PDB  SSE  RMSD SCORE Compound
==== ===== ==== ==== ===== ===== ========
   1   380 1ftp    7  2.71  2.26 fatty-acid-binding protein - desert locust (sch
   2   825 1pmp    6  2.20  1.92 p2 myelin protein (p2) - bovine (bos taurus
   3   152 1cbs    6  2.53  2.05 cellular retinoic-acid-binding protein type ii co - human (homo sapie
   4   547 1igc    6  2.74  2.42 igg1 fab fragment complexed with protein g (domai - molecule: igg1 fa
   5   338 1fbi    6  2.86  2.52 fab fragment of the monoclonal antibody f9.13.7 ( - immunoglobulin f9
   6   619 1lid    6  2.88  2.39 adipocyte lipid-binding protein complexed with ol - mouse (mus muscul
   7   663 1mdc    6  2.93  2.57 fatty acid binding protein (manduca sexta) (mfb2) - tobacco hornworm
   8   490 1hmt    6  2.94  2.41 fatty acid binding protein (human muscle, m-fabp) - organism: homo sa
   9  1150 2cgr    6  3.01  2.61 igg2b (kappa) fab fragment complexed with antigen - mouse (mus muscul
  10   219 1crb    6  3.01  2.62 cellular retinol binding protein (crbp) complexed - rat (rattus rattu
[...]
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
* when you're happy, quit the program
* now run DEJANA to sort out the hits you're really interested in, let it write them to a new O macro, and execute this macro from within O. The use of DEJANA is described elsewhere in this manual
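Both example sessions above print a line of "Normalised weights" after the scoring weights have been entered. The printed numbers are consistent with each weight first being raised to a floor of 0.01 and the set then being divided by its sum. This is an inference from the two transcripts only, not documented behaviour, and the function name and the `floor` parameter below are hypothetical:

```python
def normalise_weights(weights, floor=0.01):
    """Raise every weight to at least `floor`, then scale the sum to 1.

    The floor of 0.01 is inferred from the example transcripts; it is
    an assumption of this sketch, not a documented DEJAVU rule.
    """
    floored = [max(w, floor) for w in weights]
    total = sum(floored)
    return [w / total for w in floored]

# Default weights (first session) and the BONES suggestion (second session)
defaults = [round(w, 3) for w in normalise_weights([0.001, 0.001, 0.1, 0.1, 0.5])]
bones = [round(w, 3) for w in normalise_weights([0, 0, 1, 1, 5])]
```

Under this assumption the sketch reproduces the values 0.014 0.014 0.139 0.139 0.694 and 0.001 0.001 0.142 0.142 0.712 printed in the two sessions.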
In order to run DEJAVU you need a database file (which we provide) and a file which describes the SSEs of your protein. Here, we describe how you can make such a file yourself; later, we show how this process can be carried out completely automatically.
An (ASCII) input file consists of records which are all read in the format (A6,A) and which are supposed to contain (keyword, value) combinations. The only exception is the comment card, which has an exclamation mark ("!") in column 1 and may contain any text you like in the other columns. Comment cards are ignored when DEJAVU reads your file.
Keywords consist of 6 characters, but only the first THREE are really needed.
The important keywords are:
REMark - followed by any text; the text is printed when DEJAVU reads the file; may occur anywhere; note the difference with "!" cards
MOLecl - an identifier for the molecule, typically the PDB name which consists of four characters (we suggest you use four characters for your own proteins as well, although the name may be up to ten characters long); this record MUST precede all of the following records !!
NOTe - a description of your protein, its source, possibly model number etc.; this record is optional
PDBfil - the name of the PDB file (please use COMPLETE path names); optional
ENDmol - another optional card to flag the end of the description of your molecule; it will force DEJAVU to print a brief summary of what it has just read from your file; if you omit this record, no such information is printed
In between the PDBfil and the ENDmol cards come the records which describe your protein's SSEs, one card per SSE. Such a card must contain the TYPE of secondary structure as the keyword. Valid type names are defined at the start of the database. Now (and in the foreseeable future), the only allowed types are 'ALPHA ' and 'BETA ' (note the trailing spaces !). The rest of the line must contain (in FREE format) in the following order:
- the NAME of the SSE (e.g., 'A3' for the third alpha helix)
- the NAME of the first residue (e.g., 'B234' for residue nr 234
in chain B of your protein); these must be O-names if you want
to use O for the least-squares analysis and the graphics
- the NAME of the last residue
- the NUMBER of residues
- the X,Y,Z coordinates of the Calpha atom of the first residue
- the X,Y,Z coordinates of the Calpha atom of the last residue
The following example input file demonstrates the rules described above:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
! Fil cro1.secs
! Dat Tue Oct 27 16:10:38 1992
! Mol 1cro
!
MOL 1cro
NOTE cro repressor - bacteriophage (lamb
PDB /nfs/public/pdb/cro1.pdb
!
BETA 'B1 ' 'O2' 'O5' 4 -14.281 -31.313 -18.167 -23.175 -35.450 -16.637
ALPHA 'A1 ' 'O7' 'O13' 7 -29.257 -34.194 -18.097 -28.845 -32.180 -7.967
ALPHA 'A2 ' 'O16' 'O23' 8 -34.771 -27.785 -12.919 -28.824 -24.039 -20.669
ALPHA 'A3 ' 'O27' 'O36' 10 -37.998 -24.961 -17.921 -38.897 -38.362 -23.129
BETA 'B2 ' 'O39' 'O45' 7 -29.786 -38.963 -24.270 -15.878 -26.755 -18.342
BETA 'B3 ' 'O49' 'O56' 8 -19.552 -22.759 -18.208 -26.812 -40.941 -30.956
BETA 'B4 ' 'A2' 'A5' 4 -13.971 -31.869 -27.393 -5.357 -36.922 -28.490
ALPHA 'A4 ' 'A7' 'A13' 7 0.890 -35.709 -26.997 0.486 -34.944 -37.172
ALPHA 'A5 ' 'A16' 'A23' 8 7.112 -30.676 -32.685 0.941 -25.214 -25.866
ALPHA 'A6 ' 'A27' 'A36' 10 10.231 -27.335 -28.000 10.343 -40.059 -21.413
BETA 'B5 ' 'A39' 'A45' 7 1.183 -39.887 -20.169 -11.744 -27.270 -27.497
BETA 'B6 ' 'A49' 'A56' 8 -7.815 -23.996 -28.506 -2.038 -40.811 -13.598
BETA 'B7 ' 'A61' 'A64' 4 -0.515 -49.077 -6.661 7.429 -51.625 -0.395
BETA 'B8 ' 'B2' 'B5' 4 -9.695 -42.362 -23.899 -11.331 -37.554 -32.556
ALPHA 'A7 ' 'B7' 'B13' 7 -14.598 -38.849 -38.128 -5.003 -39.984 -40.092
ALPHA 'A8 ' 'B16' 'B23' 8 -11.330 -44.668 -45.288 -16.314 -48.999 -37.181
ALPHA 'A9 ' 'B27' 'B36' 10 -16.401 -47.176 -46.990 -22.870 -34.583 -45.529
BETA 'B9 ' 'B39' 'B45' 7 -20.900 -34.390 -36.358 -10.488 -46.927 -25.771
BETA 'B10 ' 'B49' 'B56' 8 -11.541 -50.660 -29.488 -25.975 -32.563 -31.906
BETA 'B11 ' 'C2' 'C5' 4 -19.072 -41.841 -20.389 -17.236 -36.377 -12.462
ALPHA 'A10 ' 'C7' 'C13' 7 -14.059 -37.036 -6.711 -23.682 -37.697 -4.432
ALPHA 'A11 ' 'C16' 'C23' 8 -17.641 -41.442 1.004 -12.536 -47.247 -6.179
ALPHA 'A12 ' 'C27' 'C36' 10 -12.708 -44.384 3.140 -5.894 -32.347 0.006
BETA 'B12 ' 'C39' 'C45' 7 -7.596 -33.295 -8.952 -18.764 -46.131 -18.226
BETA 'B13 ' 'C49' 'C56' 8 -18.195 -49.385 -14.312 -2.019 -32.415 -13.482
ENDMOL
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
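For readers who want to generate or check such files with their own scripts, the record rules above can be sketched as a small Python reader. This is not DEJAVU's parser: the function and field names are hypothetical, the keyword is split off at the first run of whitespace rather than read from fixed (A6,A) columns, and only the two SSE types named above are accepted.

```python
import shlex

SSE_TYPES = {"ALP": "ALPHA", "BET": "BETA"}  # keyed on the first three letters

def parse_sse_text(text):
    """Parse the text of a user SSE file into a list of molecule dicts."""
    molecules, current = [], None
    for line in text.splitlines():
        if not line.strip() or line.startswith("!"):
            continue  # blank line or comment card ("!" in column 1)
        fields = line.split(None, 1)
        key = fields[0][:3].upper()  # only the first THREE characters count
        value = fields[1].strip() if len(fields) > 1 else ""
        if key == "REM":
            continue  # remark text is skipped in this sketch
        if key == "MOL":
            current = {"mol": value, "note": "", "pdb": "", "sses": []}
            molecules.append(current)
        elif current is None:
            raise ValueError("record before MOL card: %r" % line)
        elif key == "NOT":
            current["note"] = value
        elif key == "PDB":
            current["pdb"] = value
        elif key == "END":
            current = None
        elif key in SSE_TYPES:
            items = shlex.split(value)  # free format; names are quoted
            if len(items) != 10:
                raise ValueError("bad SSE card: %r" % line)
            xyz = [float(v) for v in items[4:]]
            current["sses"].append({
                "type": SSE_TYPES[key],
                "name": items[0].strip(),
                "first": items[1].strip(),
                "last": items[2].strip(),
                "nres": int(items[3]),
                "start": tuple(xyz[:3]),
                "end": tuple(xyz[3:]),
            })
        else:
            raise ValueError("unknown keyword: %r" % line)
    return molecules

sample = """! a two-element excerpt
MOL 2gn5
PDB /nfs/public/pdb/gn52.pdb
ALPHA 'A1 ' '11' '13' 3 9.884 15.253 22.042 8.967 11.131 19.406
BETA 'B1 ' '15' '19' 5 13.747 7.764 18.560 14.306 -3.922 13.856
ENDMOL
"""
mols = parse_sse_text(sample)
```

Each SSE card ends up as a dictionary holding the type, the names, the residue count and the two terminal Calpha positions, which is all the information a motif comparison needs.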
The assignment of the SSEs, i.e., determining where helices and strands begin and end, can either be done by you, or within O (with the YASSPA option).
The above file, by the way, was extracted from the database by DEJAVU. It is used in some of the examples that are shown below, so if you want to rework the examples, you may want to extract this file as well (use the EXtract option in DEJAVU, then ask for molecule 1cro).
The database file (for those interested) consists of a number of 'TYPE ' cards, which define the secondary structure types that are used, a number of entries a la the user DEJAVU file and (optionally) a 'CHAIN ' card which points to another database file (in this way you may chain your private database to your local database and from there on to the general PDB-derived database). Note that all records FOLLOWING a CHAIN card are IGNORED (i.e., it is NOT an INCLUDE statement !!!).
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
REMARK
REMARK Secondary structure database
REMARK
(...)
REMARK Version 0.7 - Gerard Kleywegt @ 921103 - first Uppsala structures included
REMARK
REMARK === list of secondary structure types that are used in this database
REMARK
TYPE 'ALPHA' 'alpha helix'
TYPE 'BETA' 'beta strand'
REMARK
REMARK === PRIVATE STRUCTURES
(...)
REMARK
REMARK === GSTA; sec structure according to ALWYN !!! NOT YASSPA !!!
REMARK
MOL GSTA
NOTE human class alpha glutathione S-transferase model M10A
REMARK
BETA 'B1' 'A4' 'A7' 4 83.556 32.658 -4.327 85.981 34.524 4.814
ALPHA 'A1' 'A16' 'A25' 10 88.040 22.978 5.128 83.811 20.525 -8.112
(...)
BETA 'B5' 'A203' 'A205' 3 94.355 22.919 1.194 97.646 21.706 7.281
ALPHA 'A9' 'A209' 'A218' 10 100.424 25.314 18.933 90.509 36.091 17.098
ENDMOL
(...)
REMARK
REMARK === CHAIN TO NEXT FILE
REMARK
CHAIN /home/gerard/progs/secs/libs/uppsala.secs
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
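The difference between a CHAIN card and an INCLUDE statement is easiest to see in code. The sketch below walks a chain of database files and yields their records; everything after a CHAIN card in a given file is dropped. The function name and the `open_lines` callback are hypothetical, and the check against circular chains is an addition of this sketch, not something the manual describes.

```python
def iter_database_records(path, open_lines):
    """Yield the records of a chained set of database files.

    `open_lines(path)` must return the lines of the named file.
    """
    seen = set()
    while path is not None:
        if path in seen:
            raise ValueError("circular CHAIN: %s" % path)
        seen.add(path)
        next_path = None
        for line in open_lines(path):
            fields = line.split(None, 1)
            if not fields:
                continue
            if fields[0][:3].upper() == "CHA":
                # Jump to the next file; the rest of this file is IGNORED
                next_path = fields[1].strip() if len(fields) > 1 else None
                break
            yield line.rstrip("\n")
        path = next_path

# In-memory stand-ins for a private and a local database file
files = {
    "private.secs": "TYPE 'ALPHA' 'alpha helix'\nMOL GSTA\nENDMOL\n"
                    "CHAIN local.secs\nMOL IGNORED\n",
    "local.secs": "MOL 1cro\nENDMOL\n",
}
records = list(iter_database_records("private.secs",
                                     lambda p: files[p].splitlines()))
```

The record "MOL IGNORED" never shows up in `records`, because it follows the CHAIN card in its file.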
When you start the program, you will see something like this:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
< % 151 gerard rigel 21:42:26 progs/secs> DEJAVU
*** DEJAVU *** DEJAVU *** DEJAVU *** DEJAVU *** DEJAVU *** DEJAVU *** DEJAVU ***
Version - 921029/0.06
By - Gerard J. Kleywegt, Dept. Mol. Biology, BMC, Uppsala (S)
User I/O - routines courtesy of Rolf Boelens, Univ. of Utrecht (NL)
Started - Thu Oct 29 21:57:05 1992
User - gerard
Mode - interactive
Tty - /dev/ttyq3
*** DEJAVU *** DEJAVU *** DEJAVU *** DEJAVU *** DEJAVU *** DEJAVU *** DEJAVU ***
Max nr of database entries : ( 1000)
Max nr of sec-struc elements per entry : ( 150)
Max nr of sec-struc types : ( 10)
DEJAVU database file ? (secs.lib)
List contents of database (Y/N) ? (N)
TYPE > ALPHA alpha helix
TYPE > BETA beta strand
Nr of lines read : ( 94)
Nr of entries now : ( 3)
CHAIN > /home/gerard/progs/secs/libs/pdb.secs
Nr of lines read : ( 20356)
Nr of entries : ( 605)
+----------------------------------------------------------+
| OPTIONS:                                                 |
|                                                          |
| REad user DEJAVU file        FInd user motif in database |
| LIst a database entry        EXtract a database entry    |
| CHeck database integrity     STatistics                  |
| QUit from DEJAVU             INcremental comparison      |
| SElect certain entries       TOpological analysis        |
| ! (comment; no action)       ? (list options)            |
+----------------------------------------------------------+
===> Option ? (READ)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
You are asked to supply the name of the database file and whether or not you want a listing of the contents of the database (reply "NO" to this unless you want to see 20 kilolines of output running over your screen ...). The database(s) are then loaded and the number of entries (in this case, 605) is printed. You are then presented with a menu of options:
! = any input beginning with "!" is ignored (this allows you to
include comments in input files or scripts)
? = will result in a renewed listing of the available options
QU = will stop the program
CH = not usually needed by end-users; it checks all entries to
see if there are duplicate molecule identifiers or PDB
file names (this takes some time !)
LI = lists all entries which contain a certain string in their
molecule identifier, note or PDB file name; you are asked
which of these to search and for the search string
EX = extracts an entry from the database in a suitable format
so that this file can be used as a user input file to DEJAVU
RE = read a user DEJAVU file (must be done before one uses FI)
FI = searches for secondary structure motifs; this option is
discussed in detail in the following section
IN = incremental search ("find as many common SSEs as possible");
experience has shown that this is the method of choice !!!
An example of the use and output of the LIst option in which all entries which have the word "dna" in their note are listed:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
===> Option ? (READ) li
Search on Name, Comment or Filename ? (N) com
Search string ? (p2) dna
MOL > 1dpi
NOTE > /dna$ polymerase i (klenow fragment) (e.c.2.7.7.7 - (escherichia $col
PDB > /nfs/public/pdb/dpi1.pdb
Nr of elements : ( 37)
====== > Nr Type  Name From To  Nres
====== >  1 ALPHA A1   336  348 13
====== >  2 BETA  B1   351  358  8
====== >  3 BETA  B2   370  375  6
====== >  4 BETA  B3   380  385  6
[...]
====== > 35 ALPHA A20  890  905 16
====== > 36 BETA  B16  913  921  9
====== > 37 ALPHA A21  924  927  4
MOL > 2gn5
NOTE > gene 5 /dna$ binding protein - filamentous bacteri
PDB > /nfs/public/pdb/gn52.pdb
Nr of elements : ( 7)
====== > Nr Type  Name From To  Nres
====== >  1 ALPHA A1   11   13   3
====== >  2 BETA  B1   15   19   5
====== >  3 BETA  B2   22   24   3
====== >  4 BETA  B3   26   38  13
====== >  5 BETA  B4   42   48   7
====== >  6 BETA  B5   60   62   3
====== >  7 BETA  B6   81   84   4
===> Option ? (LI)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Note that the "notes" for the PDB-derived entries were extracted by a dumb csh-script from the COMPND and SOURCE records of the corresponding PDB files; they have not been checked by hand and may therefore be rather incomplete !
An example of the use of the EXtract option:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
===> Option ? (LI) extr
Molecule name ? (dna) 2gn5
MOL > 2gn5
NOTE > gene 5 /dna$ binding protein - filamentous bacteri
PDB > /nfs/public/pdb/gn52.pdb
Nr of elements : ( 7)
Filename ? (out.secs) 2gn5.secs
===> Option ? (EXTR)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Note that ALL entries which contain the string that you enter in
their molecule identifier are written to files !
To show that this option really works, we show the resulting file:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
< % 182 gerard rigel 19:04:41 progs/secs> cat 2gn5.secs
! Fil 2gn5.secs
! Dat Thu Oct 29 22:10:29 1992
! Mol 2gn5
!
MOL 2gn5
NOTE gene 5 /dna$ binding protein - filamentous bacteri
PDB /nfs/public/pdb/gn52.pdb
!
ALPHA 'A1 ' '11' '13' 3 9.884 15.253 22.042 8.967 11.131 19.406
BETA 'B1 ' '15' '19' 5 13.747 7.764 18.560 14.306 -3.922 13.856
BETA 'B2 ' '22' '24' 3 23.228 -7.564 9.436 22.766 -10.808 3.610
BETA 'B3 ' '26' '38' 13 18.044 -11.177 3.277 -3.221 15.221 11.399
BETA 'B4 ' '42' '48' 7 -3.554 14.308 15.412 10.385 3.316 9.016
BETA 'B5 ' '60' '62' 3 6.488 19.768 11.732 5.599 17.379 5.353
BETA 'B6 ' '81' '84' 4 7.108 8.400 4.546 10.457 17.825 5.205
ENDMOL
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
An example of the use of the REad option:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
===> Option ? (LIST) read
User DEJAVU file ? (user.secs) cro1.secs
MOL > 1cro
NOTE > cro repressor - bacteriophage (lamb
PDB > /nfs/public/pdb/cro1.pdb
ENDMOL > 1cro
Nr of elements : ( 25)
====== >  1 BETA  B1  O2  O5   4
====== >  2 ALPHA A1  O7  O13  7
====== >  3 ALPHA A2  O16 O23  8
====== >  4 ALPHA A3  O27 O36 10
====== >  5 BETA  B2  O39 O45  7
====== >  6 BETA  B3  O49 O56  8
====== >  7 BETA  B4  A2  A5   4
====== >  8 ALPHA A4  A7  A13  7
====== >  9 ALPHA A5  A16 A23  8
====== > 10 ALPHA A6  A27 A36 10
====== > 11 BETA  B5  A39 A45  7
====== > 12 BETA  B6  A49 A56  8
====== > 13 BETA  B7  A61 A64  4
====== > 14 BETA  B8  B2  B5   4
====== > 15 ALPHA A7  B7  B13  7
====== > 16 ALPHA A8  B16 B23  8
====== > 17 ALPHA A9  B27 B36 10
====== > 18 BETA  B9  B39 B45  7
====== > 19 BETA  B10 B49 B56  8
====== > 20 BETA  B11 C2  C5   4
====== > 21 ALPHA A10 C7  C13  7
====== > 22 ALPHA A11 C16 C23  8
====== > 23 ALPHA A12 C27 C36 10
====== > 24 BETA  B12 C39 C45  7
====== > 25 BETA  B13 C49 C56  8
Nr of lines read : ( 34)
Nr of elements : ( 25)
===> Option ? (READ)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Looking for a secondary structure motif is easy. Let's take the example we used above pertaining to lambda cro repressor. We will look for a very simple "motif" consisting only of the helix-(turn)-helix of the DNA-binding domain. Actually, since we can only look for alpha helices (and beta strands, of course) we will ignore the turn, but we will impose that any "hit" in the database must consist of two helices which are quite close together (i.e., the C-terminus of helix A2 must be close to the N-terminus of helix A3).
The output looks something like this (broken into small pieces and annotated):
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
===> Option ? (LI) fi
********** NEW QUERY **********
Elements : ( B1 A1 A2 A3 B2 B3 B4 A4 A5 A6 B5 B6 B7 B8 A7 A8 A9 B9 B10 B11 A10 A11 A12 B12 B13)
Nr of elements to match (0 = abort) ? ( 2) 2
Query element 1 ? ( A4) A2
Query element 2 ? ( A5) A3
................... ( A2 A3)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
DEJAVU prints a list of the SSEs in your protein and wants to know how many SSEs make up your query motif. Next, you enter their names one by one (names are case-sensitive; spaces are removed by the program).
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Mismatch nr of residues ? ( 3) 2
Mismatch element length ? ( 10.000) 6
Mismatch distances ? ( 5.000) 3
Mismatch cosines ? ( 0.150) .1
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Subsequently, the mismatch criteria must be entered. The first two are used for finding possible matching SSEs in database structures, the latter two for finding motifs of SSEs that have similar mutual distances and direction-vector cosines.
NOTE: from version 4.3 onward, the "mismatch nr of residues" has been replaced by *two* separate criteria, one which tells how many residues SSEs in the database proteins may be too short, and another which tells how many residues SSEs in the database proteins may be too long. This is especially useful when you use SSEs based on Bones; e.g., you found 6 residues in a helix but cannot exclude that the helix might be longer. In that case, use a "too short" cut-off of 1 or 2 residues, but a "too long" cut-off of 4 or even more residues.
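The two cut-offs can be pictured as an asymmetric window around the number of residues in the query SSE. The following Python sketch is only an illustration and not DEJAVU's own code; the function name is made up, and it is an assumption that both limits are inclusive.

```python
def residue_count_ok(query_nres, db_nres, max_too_short, max_too_long):
    """Return True if a database SSE has an acceptable number of residues.

    Assumption (not stated in the manual): both limits are inclusive.
    """
    shortest = query_nres - max_too_short
    longest = query_nres + max_too_long
    return shortest <= db_nres <= longest


# Bones helix of 6 residues: at most 2 too short, at most 4 too long.
accepted = [n for n in range(1, 15) if residue_count_ok(6, n, 2, 4)]
print(accepted)  # [4, 5, 6, 7, 8, 9, 10]
```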
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Possible distance criteria:
C => centre-to-centre
H => MIN head-tail and tail-head (anti-parallel)
T => MIN head-head and tail-tail (parallel)
Which distances (C/H/T) ? (H)
Extensive output ? (N) no
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
You must decide what type of distance criterion to use. If you
have a purely anti-parallel motif, you may use option "H", which
compares C-term-to-N-term distances; if you have a purely parallel
motif, you are better off with option "T" (the shorter of
the N-term-to-N-term and the C-term-to-C-term distances is used).
If you have a mixed motif or all SSEs are criss-cross, then it's
safest to use option "C" (centre-to-centre).
In addition, you may request extensive output, but you must be
suicidal if you reply "YES" to this question !!
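To make the three distance types concrete, here is a small Python sketch. It is not taken from the DEJAVU source: it assumes that each SSE is given as a (head, tail) pair of 3D points, with "head" meaning the N-terminal end and "tail" the C-terminal end, and the function name is invented for this illustration.

```python
import math


def sse_distance(sse_a, sse_b, kind="C"):
    """Distance between two SSEs, each given as (head_xyz, tail_xyz).

    kind "C": centre-to-centre
    kind "H": shorter of head-tail and tail-head (anti-parallel motifs)
    kind "T": shorter of head-head and tail-tail (parallel motifs)
    """
    (a_head, a_tail), (b_head, b_tail) = sse_a, sse_b
    if kind == "C":
        a_mid = tuple((h + t) / 2.0 for h, t in zip(a_head, a_tail))
        b_mid = tuple((h + t) / 2.0 for h, t in zip(b_head, b_tail))
        return math.dist(a_mid, b_mid)
    if kind == "H":
        return min(math.dist(a_head, b_tail), math.dist(a_tail, b_head))
    if kind == "T":
        return min(math.dist(a_head, b_head), math.dist(a_tail, b_tail))
    raise ValueError("kind must be 'C', 'H' or 'T'")


# Two anti-parallel strands, 3 A apart (made-up coordinates).
strand_1 = ((0.0, 0.0, 0.0), (10.0, 0.0, 0.0))
strand_2 = ((10.0, 3.0, 0.0), (0.0, 3.0, 0.0))
print(sse_distance(strand_1, strand_2, "H"))  # 3.0
print(sse_distance(strand_1, strand_2, "C"))  # 3.0
```

For this anti-parallel pair the "T" distance is much larger (about 10.4 A) than the "H" distance, which is why the choice of criterion matters.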
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Conserve directionality ? (Y)
Conserve absolute motif ? (Y)
Conserve neighbours ? (Y)
Create "O" macro file ? (Y)
"O" macro file ? (lsq.omac) cro_lsq.omac
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
The last four input items pertain to:
(1) conservation of directionality: what this boils down to is that if you say "YES" you make sure that all elements are similarly oriented. What the program does is to sort the query elements from N-term to C-term and to make sure that the matching elements of a "hit" are also ordered from N-term to C-term. In addition, the actual cosines -rather than their absolute values- are checked. If you don't use this option, you might, for example, also find that helices A3 and A2 (in THAT order) of 1cro match your query, which is fine except that they run in the wrong direction (namely, from C-term to N-term)
(2) conservation of the absolute motif, or merely of the relative one: if you say "YES", then ALL the inter-SSE distances and cosines must satisfy the corresponding mismatch criteria; if you say "NO", then they must only hold for SUBSEQUENT SSEs (i.e., the distance from SSE nr 3 to nr 2 must be okay, but that from 3 to 1 doesn't matter, etc.). For example, if you are looking for a large beta-sheet, but you are interested in beta-barrels made up of similar strands as those in your protein as well, then don't impose the absolute motif
(3) conservation of neighbours: if you say "YES" here, it merely means that if two elements are neighbours in your structure, then they must also be neighbours in the database structures. This is a rather strict criterion, and it's probably the first you want to relax if you don't find any (or enough) hits
(4) if you want, you can get an O macro file which will do some amazing tricks for you (see later) !!
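The difference between the absolute and the relative motif, and the effect of the directionality switch on the cosine test, can be sketched as follows. This is an illustration in Python under our own naming, not the program's actual implementation; in particular, treating the mismatch limit as inclusive is an assumption.

```python
from itertools import combinations


def pairs_to_check(n_elements, absolute_motif):
    """Index pairs of query SSEs whose distances and cosines are compared."""
    if absolute_motif:
        # ALL pairs: (0,1), (0,2), (1,2), ...
        return list(combinations(range(n_elements), 2))
    # Only subsequent SSEs: (0,1), (1,2), ...
    return [(i, i + 1) for i in range(n_elements - 1)]


def cosine_ok(query_cos, hit_cos, max_mismatch, conserve_directionality):
    """Compare direction-vector cosines of a query pair and a hit pair.

    With directionality conserved the signed cosines are compared;
    otherwise only their absolute values.
    """
    if not conserve_directionality:
        query_cos, hit_cos = abs(query_cos), abs(hit_cos)
    return abs(query_cos - hit_cos) <= max_mismatch


print(pairs_to_check(3, True))   # [(0, 1), (0, 2), (1, 2)]
print(pairs_to_check(3, False))  # [(0, 1), (1, 2)]
print(cosine_ok(0.9, -0.9, 0.1, True))   # False
print(cosine_ok(0.9, -0.9, 0.1, False))  # True
```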
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Nr of elements recognised in query : ( 2)
Indices : ( 3 4)
Nr of elements of each type : ( 2 0)
********** 1cro **********
[cro repressor - bacteriophage (lamb ]
[/nfs/public/pdb/cro1.pdb ]
QUERY : ( 3 4)
Elements : A2 A3
Lengths : ( 10.462 14.405)
Residues : ( 8 10)
MATCH : ( 3 4)
Elements : A2 A3
Lengths : ( 10.462 14.405)
Residues : ( 8 10)
Length ... rmsd = 0.000 ... match = 1.000
Residues ... rmsd = 0.000 ... match = 1.000
Distance ... rmsd = 0.000 ... match = 1.000
Cosines ... rmsd = 0.000 ... match = 1.000
SCORE : ( 0.000)
MATCH : ( 9 10)
Elements : A5 A6
Lengths : ( 10.696 14.328)
Residues : ( 8 10)
Length ... rmsd = 0.174 ... match = 1.000
Residues ... rmsd = 0.000 ... match = 1.000
Distance ... rmsd = 0.144 ... match = 1.000
Cosines ... rmsd = 0.064 ... match = 1.000
SCORE : ( 0.383)
MATCH : ( 16 17)
Elements : A8 A9
Lengths : ( 10.456 14.233)
Residues : ( 8 10)
Length ... rmsd = 0.122 ... match = 1.000
Residues ... rmsd = 0.000 ... match = 1.000
Distance ... rmsd = 0.356 ... match = 1.000
Cosines ... rmsd = 0.030 ... match = 1.000
SCORE : ( 0.509)
MATCH : ( 22 23)
Elements : A11 A12
Lengths : ( 10.552 14.182)
Residues : ( 8 10)
Length ... rmsd = 0.170 ... match = 1.000
Residues ... rmsd = 0.000 ... match = 1.000
Distance ... rmsd = 0.129 ... match = 1.000
Cosines ... rmsd = 0.017 ... match = 1.000
SCORE : ( 0.316)
Nr of best match : ( 1)
Best score : ( 0.000)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
The program prints the SSEs it's going to look for and starts scanning the database. For each entry in the database, DEJAVU does the following:
(1) are there enough SSEs ?
(2) are there enough SSEs of each type (alpha, beta) ?
(3) find all possibly matching SSEs in the database structure for ALL of the elements in the query; if there aren't any for even one of the query elements, the database structure is skipped. Matching occurs by comparing type, number of residues and length of the SSEs
(4) ALL possible combinations of matching SSEs in the query and the database entry are generated which completely satisfy ALL criteria outlined earlier (distances, cosines, absolute or relative motif, directionality and neighbours)
(5) all the hits are printed and compared with the query; the matching SSEs are listed and some RMS-deviations are computed (don't worry about the match factors in the output); these are all combined into a final score; the score is 0.0 for a perfect match (see A2-A3 above which is identical to the query); the higher the score, the poorer the match
(6) for each protein which produced hits, the one with the lowest score is used to create some O instructions in the O macro file; in the example above, 1cro itself produced 4 very good hits because there are four monomers in the PDB file; note that the motif we are looking for scores 0.00
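Steps (1) to (4) can be sketched in a few lines of Python. This is a simplified illustration, not the DEJAVU algorithm itself: SSEs are reduced to (type, number of residues) pairs, the length comparison of step (3) is left out, a single symmetric residue mismatch is used, and the pairwise geometry test is passed in as a hypothetical function "pair_ok". All names are our own.

```python
from itertools import product


def candidates(query, entry, max_res_mismatch):
    """Step 3: per query SSE, indices of database SSEs of the same type
    whose residue count differs by at most max_res_mismatch."""
    return [
        [j for j, (sse_type, nres) in enumerate(entry)
         if sse_type == q_type and abs(nres - q_nres) <= max_res_mismatch]
        for q_type, q_nres in query
    ]


def scan_entry(query, entry, max_res_mismatch, pair_ok):
    """Return all combinations of database SSEs that pass steps 1 to 4."""
    # Step 1: are there enough SSEs at all?
    if len(entry) < len(query):
        return []
    # Step 2: are there enough SSEs of each type?
    for sse_type in ("ALPHA", "BETA"):
        needed = sum(1 for t, _ in query if t == sse_type)
        available = sum(1 for t, _ in entry if t == sse_type)
        if available < needed:
            return []
    # Step 3: every query element needs at least one candidate.
    per_element = candidates(query, entry, max_res_mismatch)
    if any(not c for c in per_element):
        return []
    # Step 4: enumerate combinations and keep those where all pairs pass.
    hits = []
    for combo in product(*per_element):
        if len(set(combo)) < len(combo):
            continue  # the same database SSE may not be used twice
        n = len(combo)
        if all(pair_ok(combo[i], combo[j])
               for i in range(n) for j in range(i + 1, n)):
            hits.append(combo)
    return hits


# First monomer of 1cro as (type, nr of residues), from the listing above.
cro_monomer = [("BETA", 4), ("ALPHA", 7), ("ALPHA", 8),
               ("ALPHA", 10), ("BETA", 7), ("BETA", 8)]
query = [("ALPHA", 8), ("ALPHA", 10)]  # helices A2 and A3


def adjacent(a, b):
    """Placeholder pair test: the two SSEs must be sequence neighbours."""
    return b == a + 1


print(scan_entry(query, cro_monomer, 0, adjacent))  # [(2, 3)]
print(scan_entry(query, cro_monomer, 2, adjacent))  # [(1, 2), (2, 3)]
```

Relaxing the residue mismatch from 0 to 2 brings in the A1-A2 pair as a second combination, which shows how quickly the number of hits grows when the criteria are loosened.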
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
********** 1lap **********
[leucine aminopeptidase (e.c.3.4.11.1) - bovine (bos $taurus ]
[/nfs/public/pdb/lap1.pdb ]
QUERY : ( 3 4)
Elements : A2 A3
Lengths : ( 10.462 14.405)
Residues : ( 8 10)
MATCH : ( 31 32)
Elements : A16 A17
Lengths : ( 9.916 17.758)
Residues : ( 7 12)
Length ... rmsd = 2.402 ... match = 0.993
Residues ... rmsd = 1.581 ... match = 0.989
Distance ... rmsd = 0.797 ... match = 1.000
Cosines ... rmsd = 0.033 ... match = 1.000
SCORE : ( 4.864)
Nr of best match : ( 1)
Best score : ( 4.864)
********** 1trc **********
[calmodulin (/tr=2=c$ fragment comprising residues - bull (bos $taurus]
[/nfs/public/pdb/trc1.pdb ]
QUERY : ( 3 4)
Elements : A2 A3
Lengths : ( 10.462 14.405)
Residues : ( 8 10)
MATCH : ( 4 5)
Elements : A3 A4
Lengths : ( 9.351 14.741)
Residues : ( 8 10)
Length ... rmsd = 0.821 ... match = 0.998
Residues ... rmsd = 0.000 ... match = 1.000
Distance ... rmsd = 0.187 ... match = 1.000
Cosines ... rmsd = 0.005 ... match = 1.000
SCORE : ( 1.016)
Nr of best match : ( 1)
Best score : ( 1.016)
===> Option ? (FI)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
So, we found "hits" with three different proteins. In this case, we used rather strict criteria in order to restrict the output a bit; if you relax the criteria somewhat, you get many more hits.
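The "Length" and "Residues" rmsd values in the output above can be reproduced by hand as plain root-mean-square deviations between the query values and the matched values. The Python sketch below checks this for the 1lap and 1trc hits; the function name is our own, and how DEJAVU weights the four rmsd values into the final score is not shown here because the listing alone does not reveal it.

```python
import math


def rmsd(query_values, match_values):
    """Root-mean-square deviation between two equally long value lists."""
    squares = [(q - m) ** 2 for q, m in zip(query_values, match_values)]
    return math.sqrt(sum(squares) / len(squares))


# Query A2 A3 of 1cro against A16 A17 of 1lap (numbers from the listing).
print(round(rmsd([10.462, 14.405], [9.916, 17.758]), 3))  # 2.402
print(round(rmsd([8, 10], [7, 12]), 3))                   # 1.581

# Query A2 A3 of 1cro against A3 A4 of 1trc.
print(round(rmsd([10.462, 14.405], [9.351, 14.741]), 3))  # 0.821
```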
If you have coordinates for your search model (at least CA atoms), and if you have the PDB files of the hits on a local disk, you are strongly advised to run LSQMAN first, and to use DEJANA to screen the O macro produced by LSQMAN.
Otherwise, you can use DEJANA directly on the O macro produced by DEJAVU. DEJANA reads a DEJAVU or LSQMAN O macro, and allows you to apply cut-offs to get rid of unwanted (poor) hits.
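The screening that DEJANA performs amounts to a filter followed by a sort. The Python sketch below is an illustration only: the tuple layout and function name are our own, the cut-offs are assumed to be inclusive, and the sort order (most matched residues first, then lowest RMSD) is inferred from the example listings in this manual rather than taken from the program.

```python
def screen_hits(hits, min_nres, max_rmsd):
    """Keep hits with enough matched residues/SSEs and a low enough RMSD.

    hits: list of (pdb_id, nres, rmsd) tuples.
    Assumptions: inclusive cut-offs; sorted by nres (descending), then rmsd.
    """
    kept = [h for h in hits if h[1] >= min_nres and h[2] <= max_rmsd]
    return sorted(kept, key=lambda h: (-h[1], h[2]))


# A few hits with values as they appear in the Bones example.
hits = [
    ("1acy", 6, 4.08),
    ("1for", 6, 5.90),
    ("1cbs", 6, 2.53),
    ("1pmp", 6, 2.20),
    ("1ftp", 7, 2.71),
]
for pdb_id, nres, value in screen_hits(hits, 6, 3.5):
    print(pdb_id, nres, value)
# 1ftp 7 2.71
# 1pmp 6 2.2
# 1cbs 6 2.53
```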
For example, in case of a Bones search, the program can be used directly on the O macro produced by DEJAVU:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
% 274 gerard sarek 18:14:59 gerard/junk > run dejana
[...]
Name of O macro (from DEJAVU or LSQMAN) ? (lsqman.omac) lsq.omac
Reading hits ...
# 1 ID 1acy Nres 6 RMSD 4.08 A
# 2 ID 1baf Nres 6 RMSD 4.10 A
[...]
# 54 ID 7tim Nres 6 RMSD 3.67 A
Nr of hits (> 0 residues/SSEs) : ( 54)
------------------------------------------
Min nr of matched residues/SSEs ? ( 1)
Max RMSD of matched residues/SSEs ? ( 999.990)
Sorting hits ...
Nr of hits left : ( 54)
# 1 ID 1ftp Nres 7 RMSD 2.71 A
# 2 ID 1pmp Nres 6 RMSD 2.20 A
# 3 ID 1cbs Nres 6 RMSD 2.53 A
# 4 ID 1igc Nres 6 RMSD 2.74 A
# 5 ID 1fbi Nres 6 RMSD 2.86 A
[...]
# 54 ID 1for Nres 6 RMSD 5.90 A
Select one of the following options:
0 = re-enter criteria and re-sort
1 = write new O macro with current hits
2 = quit program without writing new O macro
Option (0, 1, 2) ? ( 0)
------------------------------------------
Min nr of matched residues/SSEs ? ( 1) 6
Max RMSD of matched residues/SSEs ? ( 999.990) 3.5
Sorting hits ...
Nr of hits left : ( 19)
# 1 ID 1ftp Nres 7 RMSD 2.71 A
# 2 ID 1pmp Nres 6 RMSD 2.20 A
# 3 ID 1cbs Nres 6 RMSD 2.53 A
# 4 ID 1igc Nres 6 RMSD 2.74 A
# 5 ID 1fbi Nres 6 RMSD 2.86 A
# 6 ID 1lid Nres 6 RMSD 2.88 A
# 7 ID 1mdc Nres 6 RMSD 2.93 A
# 8 ID 1hmt Nres 6 RMSD 2.94 A
# 9 ID 2cgr Nres 6 RMSD 3.01 A
# 10 ID 1crb Nres 6 RMSD 3.01 A
# 11 ID 1iai Nres 6 RMSD 3.03 A
# 12 ID 1rmf Nres 6 RMSD 3.03 A
# 13 ID 1svb Nres 6 RMSD 3.05 A
# 14 ID 1bbj Nres 6 RMSD 3.11 A
# 15 ID 1opb Nres 6 RMSD 3.14 A
# 16 ID 1eap Nres 6 RMSD 3.21 A
# 17 ID 1mcp Nres 6 RMSD 3.23 A
# 18 ID 1tet Nres 6 RMSD 3.31 A
# 19 ID 1dbb Nres 6 RMSD 3.45 A
Select one of the following options:
0 = re-enter criteria and re-sort
1 = write new O macro with current hits
2 = quit program without writing new O macro
Option (0, 1, 2) ? ( 0) 1
New O macro file ? (dejana.omac) dejana_bones.omac
Writing hits ...
Processing PDB code : (1ftp)
Processing PDB code : (1pmp)
[...]
New O macro written ...
[...]
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Example of a case where coordinates were used:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
% 274 gerard sarek 18:14:59 gerard/junk > run dejana
[...]
Maximum number of hits : ( 2500)
Name of O macro (from DEJAVU or LSQMAN) ? (lsqman.omac) lsq_crab.omac
Reading hits ...
# 1 ID 1ACY Nres 26 RMSD 1.99 A
# 2 ID 1AMP Nres 16 RMSD 3.45 A
[...]
# 52 ID 8FAB Nres 16 RMSD 2.14 A
Nr of hits (> 0 residues/SSEs) : ( 52)
------------------------------------------
Min nr of matched residues/SSEs ? ( 1)
Max RMSD of matched residues/SSEs ? ( 999.990)
Sorting hits ...
Nr of hits left : ( 52)
# 1 ID 1CBS Nres 137 RMSD 0.00 A
# 2 ID 1CBI Nres 130 RMSD 0.86 A
# 3 ID 1OPB Nres 123 RMSD 1.35 A
# 4 ID 1CRB Nres 123 RMSD 1.36 A
# 5 ID 1HMT Nres 121 RMSD 1.36 A
# 6 ID 1LID Nres 120 RMSD 1.44 A
# 7 ID 1FTP Nres 120 RMSD 1.69 A
# 8 ID 1PMP Nres 119 RMSD 1.37 A
# 9 ID 1MDC Nres 105 RMSD 2.06 A
# 10 ID 1EPA Nres 66 RMSD 1.97 A
# 11 ID 1NSN Nres 43 RMSD 2.64 A
[...]
# 51 ID 1NMB Nres 8 RMSD 1.79 A
# 52 ID 7FAB Nres 5 RMSD 0.44 A
Select one of the following options:
0 = re-enter criteria and re-sort
1 = write new O macro with current hits
2 = quit program without writing new O macro
Option (0, 1, 2) ? ( 0) 0
------------------------------------------
Min nr of matched residues/SSEs ? ( 1) 100
Max RMSD of matched residues/SSEs ? ( 999.990) 3
Sorting hits ...
Nr of hits left : ( 9)
# 1 ID 1CBS Nres 137 RMSD 0.00 A
# 2 ID 1CBI Nres 130 RMSD 0.86 A
# 3 ID 1OPB Nres 123 RMSD 1.35 A
# 4 ID 1CRB Nres 123 RMSD 1.36 A
# 5 ID 1HMT Nres 121 RMSD 1.36 A
# 6 ID 1LID Nres 120 RMSD 1.44 A
# 7 ID 1FTP Nres 120 RMSD 1.69 A
# 8 ID 1PMP Nres 119 RMSD 1.37 A
# 9 ID 1MDC Nres 105 RMSD 2.06 A
Select one of the following options:
0 = re-enter criteria and re-sort
1 = write new O macro with current hits
2 = quit program without writing new O macro
Option (0, 1, 2) ? ( 0) 1
New O macro file ? (dejana.omac) dejana_crab.omac
Writing hits ...
Processing PDB code : (1CBS)
Processing PDB code : (1CBI)
Processing PDB code : (1OPB)
Processing PDB code : (1CRB)
Processing PDB code : (1HMT)
Processing PDB code : (1LID)
Processing PDB code : (1FTP)
Processing PDB code : (1PMP)
Processing PDB code : (1MDC)
New O macro written ...
[...]
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
NOTE: from version 5.0 onwards, one would use the accompanying program DEJANA to sort out the hits, and save only the most promising ones to a new O macro.
Analysing and evaluating the "hits" is best done in O. The previous example resulted in the following O macro:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- < % 187 gerard rigel 19:04:41 progs/secs> cat cro_lsq.omac ! "O" macro cro_lsq.omac ! created by DEJAVU at Thu Oct 29 22:27:18 1992 ! print ... analysing 1cro print cro repressor - bacteriophage (lamb print ... query A2 A3 print ... allowed mismatches 2 6.000 3.000 0.100 print ... distance type H print ... directionality Y print ... absolute motif Y print ... neighbours Y ! s_a_i /nfs/public/pdb/cro1.pdb 1cro mol 1cro obj c1cro pai_zo 1cro ; yellow pai_zo 1cro O16 O23 green pai_zo 1cro O27 O36 green ca ; end cent_id term_id 1cro O16 CA ; ! db_set_dat .lsq_integer 1 1 50 db_set_dat .lsq_integer 2 4 4 db_set_dat .lsq_integer 3 3 16999999 ! o_setup off off on ! ! print ... comparing 1cro print cro repressor - bacteriophage (lamb print ... score = 0.0000000E+00 ! s_a_i /nfs/public/pdb/cro1.pdb 1cro pdb ! lsq_expl 1cro 1cro O16 O23 CA O16 O27 O36 CA O27 ; 1cro_to_1cro ! lsq_impr 1cro_to_1cro 1cro ; 1cro ; CA 1cro_to_1cro ! lsq_mol 1cro_to_1cro 1cro ; ! mol 1cro obj c1cro pai_zo 1cro ; blue pai_zo 1cro O16 O23 red pai_zo 1cro O27 O36 red ca ; end ! ! print ... comparing 1lap print leucine aminopeptidase (e.c.3.4.11.1) - bovine (bos $taurus print ... score = 4.864332 ! s_a_i /nfs/public/pdb/lap1.pdb 1lap pdb ! lsq_expl 1cro 1lap O16 O23 CA 404 O27 O36 CA 428 ; 1lap_to_1cro ! lsq_impr 1lap_to_1cro 1cro ; 1lap ; CA 1lap_to_1cro ! lsq_mol 1lap_to_1cro 1lap ; ! mol 1lap obj c1lap pai_zo 1lap ; blue pai_zo 1lap 404 410 red pai_zo 1lap 428 439 red ca ; end ! ! print ... comparing 1trc print calmodulin (/tr=2=c$ fragment comprising residues - bull (bos $taurus print ... score = 1.016416 ! s_a_i /nfs/public/pdb/trc1.pdb 1trc pdb ! lsq_expl 1cro 1trc O16 O23 CA A103 O27 O36 CA A118 ; 1trc_to_1cro ! lsq_impr 1trc_to_1cro 1cro ; 1trc ; CA 1trc_to_1cro ! lsq_mol 1trc_to_1cro 1trc ; ! mol 1trc obj c1trc pai_zo 1trc ; blue pai_zo 1trc A103 A110 red pai_zo 1trc A118 A127 red ca ; end ! 
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Let's run O and execute this macro (the output of the fitting of 1cro onto itself has been omitted):
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- < % 190 gerard rigel 23:08:50 secs/database> 4d_ono general.o O > Use of this program implies acceptance of conditions O > described in Appendix 10 of the O manual O > O version 5.8, Sat Sep 26 13:59:06 MET 1992 O > Loading general.o O > Maximum inter-residue link distance = 6.00 O > There were 23 residues. O > 113 atoms. O > Do you want to use the display? [Yes]: O > Graphics board GL4DXG-4.0 O > O > trackball on (F7KEY) O > trackball off (F7KEY) @cro_lsq.omac O > Macro in computer file-system. As4> ... analysing 1cro O > As4> cro repressor - bacteriophage (lamb O > As4> ... query A2 A3 O > As4> ... allowed mismatches 2 6.000 3.000 0.100 O > As4> ... distance type H O > As4> ... directionality Y O > As4> ... absolute motif Y O > As4> ... neighbours Y O > O > Sam> File type is PDB Sam> Database compressed. Sam> Molecule 1CRO contained 264 residues and 264 atoms O > O > O > O > O > O > O > O > O > O > O > O > O > O > As4> ... comparing 1cro [...] O > O > O > O > O > O > O > O > O > O > O > As4> ... comparing 1lap O > As4> leucine aminopeptidase (e.c.3.4.11.1) - bovine (bos $taurus O > As4> ... score = 4.864332 O > O > Sam> File type is PDB Sam> Database compressed. Sam> Molecule 1LAP contained 483 residues and 4491 atoms O > PDB is not a visible command. O > O > Lsq > Now define what atoms in A [=1CRO] are to be matched to B [=1LAP] Lsq > Defining 3 names in 1CRO implies a zone and an atom name. Lsq > Defining 2 names in 1CRO implies a zone and CA atoms. Lsq > Defining 1 name in 1CRO implies the CA of that residue. Lsq > Molecule 1LAP just requires the start residue and atom name. Lsq > A blank line terminates input. 
Lsq > Define atoms from 1CRO (the not rotated molecule): Lsq > Define atoms from 1LAP (the rotated molecule): Lsq > Define atoms from 1CRO (the not rotated molecule): Lsq > Define atoms from 1LAP (the rotated molecule): Lsq > Define atoms from 1CRO (the not rotated molecule): Lsq > The 18 atoms have an r.m.s. fit of 5.768 Lsq > xyz(1) = 0.9571*x+ 0.1367*y+ -0.2555*z+ -112.0573 Lsq > xyz(2) = 0.2552*x+ 0.0197*y+ 0.9667*z+ -70.0792 Lsq > xyz(3) = 0.1371*x+ -0.9904*y+ -0.0160*z+ 33.9509 Lsq > The transformation can be stored in O. Lsq > A blank is taken to mean do not store anything Lsq > The transformation will be stored in .LSQ_RT_ O > O > Lsq > Least squares match by Semi Automatic Alignment. Lsq > What is the name of molecule B [1LAP ]? Lsq > Number of atoms in A/B to look for alignment 264 481 Lsq > 0Search for connected fragments. Lsq > A fragment of 8 residues located. Lsq > Loop = 1 ,r.m.s. fit = 0.346 with 8 atoms Lsq > x(1) = 0.9335*x+ -0.2296*y+ 0.2756*z+ -97.8013 Lsq > x(2) = -0.3366*x+ -0.2957*y+ 0.8940*z+ -6.6633 Lsq > x(3) = -0.1238*x+ -0.9273*y+ -0.3533*z+ 54.2608 Lsq > 0Search for connected fragments. Lsq > A fragment of 14 residues located. Lsq > Loop = 2 ,r.m.s. fit = 2.143 with 14 atoms Lsq > x(1) = 0.1328*x+ -0.9509*y+ -0.2794*z+ 18.4068 Lsq > x(2) = -0.2737*x+ -0.3061*y+ 0.9118*z+ -9.3083 Lsq > x(3) = -0.9526*x+ -0.0446*y+ -0.3009*z+ 58.7248 Lsq > 0Search for connected fragments. Lsq > A fragment of 15 residues located. Lsq > A fragment of 6 residues located. Lsq > Loop = 3 ,r.m.s. fit = 2.612 with 21 atoms Lsq > x(1) = 0.0871*x+ -0.9605*y+ -0.2645*z+ 22.0105 Lsq > x(2) = -0.2722*x+ -0.2783*y+ 0.9211*z+ -11.2710 Lsq > x(3) = -0.9583*x+ -0.0082*y+ -0.2857*z+ 56.8081 Lsq > 0Search for connected fragments. Lsq > A fragment of 15 residues located. Lsq > A fragment of 6 residues located. Lsq > Loop = 4 ,r.m.s. 
fit = 2.612 with 21 atoms Lsq > x(1) = 0.0871*x+ -0.9605*y+ -0.2645*z+ 22.0105 Lsq > x(2) = -0.2722*x+ -0.2783*y+ 0.9211*z+ -11.2710 Lsq > x(3) = -0.9583*x+ -0.0082*y+ -0.2857*z+ 56.8081 Lsq > The transformation can be stored in O. Lsq > A blank is taken to mean do not store anything Lsq > The transformation will be stored in .LSQ_RT_ Lsq > Here are the fragments used in the alignment Lsq > 0 O23 LGVYQSAINKAIHAG O37 Lsq > 425 RSAGACTAAAFLKEF 439 Lsq > 0 O39 KIFLTI O44 Lsq > 326 IQVDNT 331 O > O > O > O > O > O > O > O > O > O > O > As4> ... comparing 1trc O > As4> calmodulin (/tr=2=c$ fragment comprising residues - bull (bos $tau O > As4> ... score = 1.016416 O > O > Sam> File type is PDB Sam> Database compressed. Sam> Molecule 1TRC contained 140 residues and 1089 atoms O > PDB is not a visible command. O > O > Lsq > Now define what atoms in A [=1CRO] are to be matched to B [=1TRC] Lsq > Defining 3 names in 1CRO implies a zone and an atom name. Lsq > Defining 2 names in 1CRO implies a zone and CA atoms. Lsq > Defining 1 name in 1CRO implies the CA of that residue. Lsq > Molecule 1TRC just requires the start residue and atom name. Lsq > A blank line terminates input. Lsq > Define atoms from 1CRO (the not rotated molecule): Lsq > Define atoms from 1TRC (the rotated molecule): Lsq > Define atoms from 1CRO (the not rotated molecule): Lsq > Define atoms from 1TRC (the rotated molecule): Lsq > Define atoms from 1CRO (the not rotated molecule): Lsq > The 18 atoms have an r.m.s. fit of 2.956 Lsq > xyz(1) = 0.0832*x+ -0.6134*y+ -0.7854*z+ 62.0348 Lsq > xyz(2) = 0.5658*x+ 0.6778*y+ -0.4695*z+ -22.2287 Lsq > xyz(3) = 0.8204*x+ -0.4053*y+ 0.4034*z+ -91.4498 Lsq > The transformation can be stored in O. Lsq > A blank is taken to mean do not store anything Lsq > The transformation will be stored in .LSQ_RT_ O > O > Lsq > Least squares match by Semi Automatic Alignment. Lsq > What is the name of molecule B [1TRC ]? 
Lsq > Number of atoms in A/B to look for alignment 264 140 Lsq > 0Search for connected fragments. Lsq > A fragment of 15 residues located. Lsq > A fragment of 10 residues located. Lsq > Loop = 1 ,r.m.s. fit = 2.363 with 25 atoms Lsq > x(1) = 0.1272*x+ -0.5979*y+ -0.7914*z+ 60.8691 Lsq > x(2) = 0.6057*x+ 0.6787*y+ -0.4153*z+ -29.7156 Lsq > x(3) = 0.7854*x+ -0.4266*y+ 0.4485*z+ -93.8586 Lsq > 0Search for connected fragments. Lsq > A fragment of 15 residues located. Lsq > A fragment of 10 residues located. Lsq > Loop = 2 ,r.m.s. fit = 2.363 with 25 atoms Lsq > x(1) = 0.1272*x+ -0.5979*y+ -0.7914*z+ 60.8691 Lsq > x(2) = 0.6057*x+ 0.6787*y+ -0.4153*z+ -29.7156 Lsq > x(3) = 0.7854*x+ -0.4266*y+ 0.4485*z+ -93.8586 Lsq > The transformation can be stored in O. Lsq > A blank is taken to mean do not store anything Lsq > The transformation will be stored in .LSQ_RT_ Lsq > Here are the fragments used in the alignment Lsq > 0 O13 RFGQTKTAKD O22 Lsq > A99 YISAAELRHV A108 Lsq > 0 O23 LGVYQSAINKAIHAG O37 Lsq > A114 EKLTDEEVDEMIREA A128 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
If we now check the displayed objects, we notice that the fit with calmodulin
is quite reasonable (rms = 2.4 A for 25 atoms; helix E of the calcium-
binding EF-hand has been matched with helix A3 of lambda cro repressor).
However, for leucine aminopeptidase the fit is not so good. In this case, only
one helix overlaps with one of cro. This is an example where the lsq_improve
option in O actually makes things worse (for our purposes, at least). If we
re-do the lsq_explicit from the macro and redraw the chain, the visual fit is
improved. The fit is still relatively poor, but the MOTIF is really there: a
helix, a long loop and another helix with roughly the same orientation as
that of the helices in cro. And this is of course the crux of DEJAVU: even though
the sequence homology may be zero and the rms-fit of the Calpha-atoms may be
high, you still get to see motifs which are "spatially similar" !!! So, the
extremely simplistic description of SSEs (basically, through six coordinates)
works to the advantage of the performance of the program !
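As an illustration of how little information such a six-coordinate description needs, the Python sketch below derives the length of an SSE and the direction-vector cosine between two SSEs from nothing but their two end points. The (head, tail) tuple layout and the function names are assumptions made for this example, not DEJAVU's internal data structure.

```python
import math


def direction(sse):
    """Vector from the head (first) to the tail (second) end point."""
    head, tail = sse
    return tuple(t - h for h, t in zip(head, tail))


def length(sse):
    """End-to-end length of the SSE."""
    head, tail = sse
    return math.dist(head, tail)


def cosine(sse_a, sse_b):
    """Cosine of the angle between the direction vectors of two SSEs."""
    va, vb = direction(sse_a), direction(sse_b)
    dot = sum(x * y for x, y in zip(va, vb))
    return dot / (length(sse_a) * length(sse_b))


# Made-up end points: two parallel SSEs, one reversed, one perpendicular.
a = ((0.0, 0.0, 0.0), (10.0, 0.0, 0.0))
b = ((0.0, 5.0, 0.0), (10.0, 5.0, 0.0))
b_reversed = ((10.0, 5.0, 0.0), (0.0, 5.0, 0.0))
c = ((0.0, 0.0, 0.0), (0.0, 0.0, 4.0))

print(cosine(a, b))           # 1.0
print(cosine(a, b_reversed))  # -1.0
print(cosine(a, c))           # 0.0
```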
Again, we used very strict criteria in this example and therefore we only got two hits. If you relax them a bit you get dozens of potential (DNA-binding ???) helix-whatever-helix motifs. If you do this and you plot all of the "hits" you typically get a nice clustering of red SSEs on your screen (the colour of the matched SSEs) from a collection of widely different proteins.
Let's do some more serious work. We have reasons to believe that the B1-A1-B2 plus the B3-B4-A3 motifs of human class alpha glutathione S-transferase might constitute a glutathione-binding domain. Are there similar motifs in the database, preferably of proteins that bind glutathione ? Well, let's find out:
First, we create and read our DEJAVU file for GSTA:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
===> Option ? (READ)
User DEJAVU file ? (user.secs) gsta.secs
REMARK > === GSTA; sec structure according to ALWYN !!! NOT YASSPA !!!
MOL > gsta
NOTE > human class alpha glutathione s-transferase model m10a
ENDMOL > gsta
Nr of elements : ( 14)
====== > 1 BETA B1 A4 A7 4
====== > 2 ALPHA A1 A16 A25 10
====== > 3 BETA B2 A27 A35 9
====== > 4 ALPHA A2 A37 A46 10
====== > 5 BETA B3 A56 A58 3
====== > 6 BETA B4 A62 A65 4
====== > 7 ALPHA A3 A67 A78 12
====== > 8 ALPHA A4 A85 A110 26
====== > 9 ALPHA A5 A113 A141 29
====== > 10 ALPHA A6 A154 A169 16
====== > 11 ALPHA A7 A178 A189 12
====== > 12 ALPHA A8 A191 A197 7
====== > 13 BETA B5 A203 A205 3
====== > 14 ALPHA A9 A209 A218 10
Nr of lines read : ( 21)
Nr of elements : ( 14)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Then we enter the search parameters:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
********** NEW QUERY **********
Elements : ( B1 A1 B2 A2 B3 B4 A3 A4 A5 A6 A7 A8 B5 A9)
Nr of elements to match (0 = abort) ? ( 0) 6
Query element 1 ? () B1
Query element 2 ? () A1
Query element 3 ? () B2
Query element 4 ? () B3
Query element 5 ? () B4
Query element 6 ? () A3
................... ( B1 A1 B2 B3 B4 A3)
Mismatch nr of residues ? ( 3) 4
Mismatch element length ? ( 10.000) 13
Mismatch distances ? ( 5.000) 10
Mismatch cosines ? ( 0.150) 0.4
Possible distance criteria:
C => centre-to-centre
H => MIN head-tail and tail-head (anti-parallel)
T => MIN head-head and tail-tail (parallel)
Which distances (C/H/T) ? (C) c
Extensive output ? (N)
Conserve directionality ? (Y)
Conserve absolute motif ? (Y)
Conserve neighbours ? (Y) n
Create "O" macro file ? (Y)
"O" macro file ? (lsq.omac) gsta_lsq.omac
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
And then we watch the results (the "trivial hit", namely GSTA itself, has been omitted from the output):
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Nr of elements recognised in query : ( 6)
Indices : ( 1 2 3 5 6 7)
Nr of elements of each type : ( 2 4)
********** 1gp1 **********
[glutathione peroxidase (e.c.1.11.1.9) - bovine (bos $taurus ]
[/nfs/public/pdb/gp11.pdb ]
QUERY : ( 1 2 3 5 6 7)
Elements : B1 A1 B2 B3 B4 A3
Lengths : ( 9.640 14.114 24.862 6.844 9.271 16.715)
Residues : ( 4 10 9 3 4 12)
MATCH : ( 4 5 7 14 15 17)
Elements : B3 A2 B4 B9 B10 A7
Lengths : ( 22.528 20.107 22.531 19.264 18.742 10.189)
Residues : ( 8 14 8 7 7 8)
Length ... rmsd = 9.074 ... match = 0.892
Residues ... rmsd = 3.512 ... match = 0.922
Distance ... rmsd = 2.407 ... match = 0.978
Cosines ... rmsd = 0.148 ... match = 0.985
SCORE : ( 16.672)
MATCH : ( 20 21 23 29 30 32)
Elements : B13 A8 B14 B18 B19 A13
Lengths : ( 22.630 19.887 22.532 16.943 10.320 10.139)
Residues : ( 8 14 8 6 4 8)
Length ... rmsd = 7.680 ... match = 0.906
Residues ... rmsd = 3.109 ... match = 0.932
Distance ... rmsd = 2.432 ... match = 0.980
Cosines ... rmsd = 0.155 ... match = 0.984
SCORE : ( 14.560)
Nr of best match : ( 2)
Best score : ( 14.560)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
And, voila, the only hit (other than GSTA itself) is glutathione peroxidase !!! In fact, there are two possible matches ! Since the O macro only contains instructions for the one with the lowest score, but we want to look at both, we LIst this entry in order to edit the macro a bit and produce both matches on the screen:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
===> Option ? (FI) li
Search on Name, Comment or Filename ? (N) n
Search string ? (p2) 1gp1
MOL > 1gp1
NOTE > glutathione peroxidase (e.c.1.11.1.9) - bovine (bos $taurus
PDB > /nfs/public/pdb/gp11.pdb
Nr of elements : ( 32)
====== > Nr Type Name From To Nres
====== > 1 BETA B1 A15 A17 3
====== > 2 BETA B2 A25 A27 3
====== > 3 ALPHA A1 A29 A31 3
====== > 4 BETA B3 A35 A42 8
====== > 5 ALPHA A2 A48 A61 14
====== > 6 ALPHA A3 A63 A65 3
====== > 7 BETA B4 A67 A74 8
====== > 8 ALPHA A4 A85 A93 9
====== > 9 BETA B5 A100 A102 3
====== > 10 BETA B6 A106 A108 3
====== > 11 BETA B7 A111 A113 3
====== > 12 ALPHA A5 A120 A128 9
====== > 13 BETA B8 A150 A152 3
====== > 14 BETA B9 A160 A166 7
====== > 15 BETA B10 A170 A176 7
====== > 16 ALPHA A6 A181 A183 3
====== > 17 ALPHA A7 A185 A192 8
====== > 18 BETA B11 B15 B18 4
====== > 19 BETA B12 B25 B27 3
====== > 20 BETA B13 B35 B42 8
====== > 21 ALPHA A8 B48 B61 14
====== > 22 ALPHA A9 B63 B65 3
====== > 23 BETA B14 B67 B74 8
====== > 24 ALPHA A10 B85 B93 9
====== > 25 BETA B15 B100 B104 5
====== > 26 BETA B16 B106 B108 3
====== > 27 ALPHA A11 B120 B128 9
====== > 28 BETA B17 B150 B152 3
====== > 29 BETA B18 B161 B166 6
====== > 30 BETA B19 B173 B176 4
====== > 31 ALPHA A12 B181 B183 3
====== > 32 ALPHA A13 B185 B192 8
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Of course, the two matches occur with each of the two monomers in the dimer, but since the assignments of the SSEs are slightly different, we still produce both matches.
The resulting O macro looks like this:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- < % 194 gerard rigel 23:08:50 secs/database> cat gsta_lsq.omac ! "O" macro gsta_lsq.omac ! created by DEJAVU at Thu Oct 29 23:46:17 1992 ! print ... analysing gsta print human class alpha glutathione s-transferase model m10a print ... query B1 A1 B2 B3 B4 A3 print ... allowed mismatches 4 13.000 10.000 0.400 print ... distance type C print ... directionality Y print ... absolute motif Y print ... neighbours N ! mol gsta obj xgsta pai_zo gsta ; yellow pai_zo gsta A4 A7 green pai_zo gsta A16 A25 green pai_zo gsta A27 A35 green pai_zo gsta A56 A58 green pai_zo gsta A62 A65 green pai_zo gsta A67 A78 green ca ; end cent_id term_id gsta A4 CA ; ! db_set_dat .lsq_integer 1 1 50 db_set_dat .lsq_integer 2 4 4 db_set_dat .lsq_integer 3 3 16999999 ! o_setup off off on ! ! print ... comparing 1gp1 print glutathione peroxidase (e.c.1.11.1.9) - bovine (bos $taurus print ... score = 14.55962 ! s_a_i /nfs/public/pdb/gp11.pdb 1gp1 pdb ! lsq_expl gsta 1gp1 A4 A7 CA B35 A16 A25 CA B48 A27 A35 CA B67 A56 A58 CA B161 A62 A65 CA B173 A67 A78 CA B185 ; 1gp1_to_gsta ! lsq_impr 1gp1_to_gsta gsta ; 1gp1 ; CA 1gp1_to_gsta ! lsq_mol 1gp1_to_gsta 1gp1 ; ! mol 1gp1 obj c1gp1 pai_zo 1gp1 ; blue pai_zo 1gp1 B35 B42 red pai_zo 1gp1 B48 B61 red pai_zo 1gp1 B67 B74 red pai_zo 1gp1 B161 B166 red pai_zo 1gp1 B173 B176 red pai_zo 1gp1 B185 B192 red ca ; end ! ! s_a_i /nfs/public/pdb/gp11.pdb xgp1 pdb ! lsq_expl gsta xgp1 A4 A7 CA A35 A16 A25 CA A48 A27 A35 CA A67 A56 A58 CA A160 A62 A65 CA A170 A67 A78 CA A185 ; xgp1_to_gsta ! lsq_impr xgp1_to_gsta gsta ; xgp1 ; CA xgp1_to_gsta ! lsq_mol xgp1_to_gsta xgp1 ; ! mol 1gp1 obj cxgp1 pai_zo xgp1 ; blue pai_zo xgp1 A35 A42 red pai_zo xgp1 A48 A61 red pai_zo xgp1 A67 A74 red pai_zo xgp1 A160 A166 red pai_zo xgp1 A170 A176 red pai_zo xgp1 A185 A192 red ca ; end ! ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Executing this macro gives the following output (edited):
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- < % 196 gerard rigel 23:08:50 secs/database> 4d_ono general.o O > Use of this program implies acceptance of conditions O > described in Appendix 10 of the O manual O > O version 5.8, Sat Sep 26 13:59:06 MET 1992 [...] @gsta_lsq.omac O > Macro in computer file-system. As4> ... analysing gsta O > As4> human class alpha glutathione s-transferase model m10a O > As4> ... query B1 A1 B2 B3 B4 A3 O > As4> ... allowed mismatches 4 13.000 10.000 0.400 O > As4> ... distance type C O > As4> ... directionality Y O > As4> ... absolute motif Y O > As4> ... neighbours N O > O > O > O > O > O > O > O > O > O > O > O > O > O > O > O > O > O > O > As4> ... comparing 1gp1 O > As4> glutathione peroxidase (e.c.1.11.1.9) - bovine (bos $taurus O > As4> ... score = 14.55962 [...] Lsq > The 30 atoms have an r.m.s. fit of 3.645 Lsq > xyz(1) = -0.7311*x+ 0.6446*y+ 0.2236*z+ 83.3897 Lsq > xyz(2) = 0.1075*x+ -0.2147*y+ 0.9707*z+ -7.7601 Lsq > xyz(3) = 0.6737*x+ 0.7338*y+ 0.0877*z+ -33.9970 [...] Lsq > 0Search for connected fragments. Lsq > A fragment of 26 residues located. Lsq > A fragment of 14 residues located. Lsq > A fragment of 9 residues located. Lsq > A fragment of 9 residues located. Lsq > Loop = 10 ,r.m.s. fit = 2.529 with 58 atoms Lsq > x(1) = -0.7038*x+ 0.7023*y+ 0.1070*z+ 85.7188 Lsq > x(2) = 0.0950*x+ -0.0562*y+ 0.9939*z+ -10.9052 Lsq > x(3) = 0.7040*x+ 0.7097*y+ -0.0272*z+ -29.9750 Lsq > 0Search for connected fragments. Lsq > A fragment of 24 residues located. Lsq > A fragment of 16 residues located. Lsq > A fragment of 9 residues located. Lsq > A fragment of 9 residues located. Lsq > Loop = 11 ,r.m.s. fit = 3.361 with 58 atoms Lsq > x(1) = -0.6967*x+ 0.7093*y+ 0.1072*z+ 85.3970 Lsq > x(2) = 0.0397*x+ -0.1111*y+ 0.9930*z+ -8.9049 Lsq > x(3) = 0.7162*x+ 0.6961*y+ 0.0493*z+ -33.0698 Lsq > The transformation can be stored in O. 
Lsq > A blank is taken to mean do not store anything Lsq > The transformation will be stored in .LSQ_RT_ Lsq > Here are the fragments used in the alignment Lsq > 0 A4 PKLHYFNARGRMESTRWLLAAAGV A27 Lsq > B36 LLIENVASL GTTVRDYTQMNDLQ B59 Lsq > 0 A28 EFEEKFIKS A36 Lsq > B68 VVLGFPCNQ B76 Lsq > 0 A52 QQVPMVEID A60 Lsq > B157 SWNFEKFLV B165 Lsq > 0 A61 GMKLVQTRAILNYIAS A76 Lsq > B171 PVRRYSRRFLTIDIEP B186 [...] Sam> Molecule XGP1 contained 555 residues and 3111 atoms [...] Lsq > The 30 atoms have an r.m.s. fit of 4.841 Lsq > xyz(1) = -0.1827*x+ -0.7881*y+ -0.5879*z+ 157.7386 Lsq > xyz(2) = 0.8678*x+ 0.1518*y+ -0.4732*z+ 15.9964 Lsq > xyz(3) = 0.4621*x+ -0.5966*y+ 0.6561*z+ -2.8169 Lsq > The transformation can be stored in O. [...] Lsq > 0Search for connected fragments. Lsq > A fragment of 24 residues located. Lsq > A fragment of 14 residues located. Lsq > A fragment of 9 residues located. Lsq > A fragment of 9 residues located. Lsq > A fragment of 5 residues located. Lsq > Loop = 9 ,r.m.s. fit = 3.248 with 61 atoms Lsq > x(1) = -0.1430*x+ -0.6702*y+ -0.7282*z+ 154.9774 Lsq > x(2) = 0.9470*x+ 0.1212*y+ -0.2975*z+ 9.6677 Lsq > x(3) = 0.2877*x+ -0.7322*y+ 0.6174*z+ 9.8883 Lsq > The transformation can be stored in O. Lsq > A blank is taken to mean do not store anything Lsq > The transformation will be stored in .LSQ_RT_ Lsq > Here are the fragments used in the alignment Lsq > 0 A4 PKLHYFNARGRMESTRWLLAAAGV A27 Lsq > A36 LLIENVASL GTTVRDYTQMNDLQ A59 Lsq > 0 A28 EFEEKFIKS A36 Lsq > A68 VVLGFPCNQ A76 Lsq > 0 A45 NDGYL A49 Lsq > A153 RNDVS A157 Lsq > 0 A52 QQVPMVEID A60 Lsq > A157 SWNFEKFLV A165 Lsq > 0 A61 GMKLVQTRAILNYI A74 Lsq > A172 VRRYSRRFLTIDIE A185 [...] ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Again, the sequence similarity is negligible, the rms-value of the fit
is not too impressive, but if you look on the screen you see a very
reasonable fit (except for the last helix) !!!
One also notes that the two monomers overlap exactly, which implies that
the differences in SSE-assignments must be due to round-off errors in
YASSPA.
By the way, the "o_setup" instruction in the macro ensures that you get
a log file from O; this will be called o_log.lst. Print it and stick
it right into your laboratory notebook !!!
If you are too lazy to make your own DEJAVU input files, you can do it partially or even completely automatically. To this end there are two companion pre-processing programs, PRO1 and PRO2, as well as a csh-script called "makedb".
PRO1 requires a simple ASCII input file. It produces a macro that runs YASSPA in O and creates several intermediate files, as well as a csh-script for deleting all intermediate files afterwards. Running PRO1 gives the following output:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
< % 113 gerard sirius 20:03:14 secs/database> cat pro1.log
*** PRO1 *** PRO1 *** PRO1 *** PRO1 *** PRO1 *** PRO1 *** PRO1 *** PRO1 ***
Version - 921027/0.03 By - Gerard J. Kleywegt, Dept. Mol. Biology, BMC, Uppsala (S) User I/O - routines courtesy of Rolf Boelens, Univ. of Utrecht (NL)
Started - Fri Oct 30 20:01:03 1992 User - Mode - batch Not using a tty as input device
*** PRO1 *** PRO1 *** PRO1 *** PRO1 *** PRO1 *** PRO1 *** PRO1 *** PRO1 ***
PRO1 input file ? (pro1.inp) u.inp
"O" script file ? (pro1.omac) u.omac
csh script file ? (pro1.csh) u.csh
... processing 1UBQ ... processing 1UTG ... processing 2UTG
Nr of lines read : ( 6) Nr of proteins read : ( 3) Nr of proteins processed : ( 3)
*** PRO1 *** PRO1 *** PRO1 *** PRO1 *** PRO1 *** PRO1 *** PRO1 *** PRO1 ***
Version - 921027/0.03 Started - Fri Oct 30 20:01:03 1992 Stopped - Fri Oct 30 20:01:03 1992
CPU-time taken : User - 0.0 Sys - 0.1 Total - 0.1
*** PRO1 *** PRO1 *** PRO1 *** PRO1 *** PRO1 *** PRO1 *** PRO1 *** PRO1 *** ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
The input file must look something like this:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
! u.inp
! Input file for PRO1/PRO2 created by script makedb
! File created at Fri Oct 30 20:01:01 MET 1992
'1UBQ' '/nfs/public/pdb/ubq1.pdb' 'UBIQUITIN - HUMAN (HOMO $SAPIEN'
'1UTG' '/nfs/public/pdb/utg1.pdb' 'UTEROGLOBIN (OXIDIZED) - RABBIT (ORYCTOLAGUS'
'2UTG' '/nfs/public/pdb/utg2.pdb' 'UTEROGLOBIN - UTERINE SECRETIONS '
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Every line beginning with an exclamation mark in column 1 is ignored.
The others must have the following items on one line:
- protein identifier
- PDB-file name (absolute pathnames, please !)
- comment or notes
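The input format described above is simple enough to check before running PRO1. The following is a minimal Python sketch, not part of the DEJAVU package: the function name `parse_pro_input` is hypothetical, and it assumes that every item is enclosed in single quotes as in the example (skipping blank lines is also an assumption).

```python
import re

def parse_pro_input(text):
    """Return a list of (identifier, pdb_path, note) tuples.

    Lines with '!' in column 1 are ignored, as the manual states.
    Blank lines are skipped too (an assumption, not documented behaviour).
    """
    entries = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("!"):
            continue
        # Each item is assumed to be wrapped in single quotes.
        items = re.findall(r"'([^']*)'", line)
        if len(items) != 3:
            raise ValueError(f"line {number}: expected 3 quoted items")
        identifier, pdb_path, note = items
        if not pdb_path.startswith("/"):
            raise ValueError(f"line {number}: PDB path is not absolute")
        entries.append((identifier, pdb_path, note))
    return entries
```

Calling `parse_pro_input(open("u.inp").read())` on the example file would give one tuple per protein; a relative path or a missing item raises `ValueError` with the offending line number.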
The csh-script will look like this:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
# 1UBQ
/bin/rm 1UBQ.name
/bin/rm 1UBQ.struc
/bin/rm 1UBQ.ca
# 1UTG
/bin/rm 1UTG.name
/bin/rm 1UTG.struc
/bin/rm 1UTG.ca
# 2UTG
/bin/rm 2UTG.name
/bin/rm 2UTG.struc
/bin/rm 2UTG.ca
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
The O macro may look as follows:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- ! ! 1UBQ s_a_i /nfs/public/pdb/ubq1.pdb 1UBQ mol 1UBQ yasspa 1UBQ alpha 0.5 yasspa 1UBQ beta 0.8 wr 1UBQ_residue_name 1UBQ.name (1x,5a6) wr 1UBQ_residue_2ry_struc 1UBQ.struc (1x,5a6) sel_on 1UBQ ; sel_prop atom_name ^= ca off s_a_o 1UBQ.ca pdb 1UBQ ;; yes ;; db_kill 1UBQ* ! ! 1UTG s_a_i /nfs/public/pdb/utg1.pdb 1UTG mol 1UTG yasspa 1UTG alpha 0.5 yasspa 1UTG beta 0.8 wr 1UTG_residue_name 1UTG.name (1x,5a6) wr 1UTG_residue_2ry_struc 1UTG.struc (1x,5a6) sel_on 1UTG ; sel_prop atom_name ^= ca off s_a_o 1UTG.ca pdb 1UTG ;; yes ;; db_kill 1UTG* ! ! 2UTG s_a_i /nfs/public/pdb/utg2.pdb 2UTG mol 2UTG yasspa 2UTG alpha 0.5 yasspa 2UTG beta 0.8 wr 2UTG_residue_name 2UTG.name (1x,5a6) wr 2UTG_residue_2ry_struc 2UTG.struc (1x,5a6) sel_on 2UTG ; sel_prop atom_name ^= ca off s_a_o 2UTG.ca pdb 2UTG ;; yes ;; db_kill 2UTG* ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
After running O with this macro, we have the following files:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   2 -rw-r--r--  1 gerard    918 Oct 30 19:58 1UBQ.struc
   2 -rw-r--r--  1 gerard    918 Oct 30 19:58 1UBQ.name
  10 -rw-r--r--  1 gerard   5092 Oct 30 19:58 1UBQ.ca
   3 -rw-r--r--  1 gerard   1040 Oct 30 19:59 1UTG.name
   3 -rw-r--r--  1 gerard   1040 Oct 30 19:59 1UTG.struc
  10 -rw-r--r--  1 gerard   4690 Oct 30 19:59 1UTG.ca
   4 -rw-r--r--  1 gerard   2012 Oct 30 19:59 2UTG.name
   4 -rw-r--r--  1 gerard   2012 Oct 30 19:59 2UTG.struc
  19 -rw-r--r--  1 gerard   9380 Oct 30 19:59 2UTG.ca
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
The "XXXX.struc" files contain the YASSPA datablocks, the "XXXX.name" files the residue identifiers and the "XXXX.ca" files are PDB coordinate files for only the Calpha atoms.
Now you are ready to run PRO2. It uses the same input file that PRO1 read earlier, as well as all the files created by O. The output looks like this:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
< % 118 gerard sirius 20:03:14 secs/database> cat pro2.log
*** PRO2 *** PRO2 *** PRO2 *** PRO2 *** PRO2 *** PRO2 *** PRO2 *** PRO2 ***
Version - 921026/0.02 By - Gerard J. Kleywegt, Dept. Mol. Biology, BMC, Uppsala (S) User I/O - routines courtesy of Rolf Boelens, Univ. of Utrecht (NL)
Started - Fri Oct 30 20:02:18 1992 User - Mode - batch Not using a tty as input device
*** PRO2 *** PRO2 *** PRO2 *** PRO2 *** PRO2 *** PRO2 *** PRO2 *** PRO2 ***
Max nr of residues per protein : ( 5000) Max nr of sec structure types : ( 10)
PRO2 input file ? (pro1.inp) u.inp
DEJAVU database file ? (secs.newlib) u.secs
How many sec struc types ? ( 2)
Enter data for sec structure type : ( 1) Name (6 characters) ? (ALPHA)
Abbreviation (1 ch) ? (A)
Enter data for sec structure type : ( 2) Name (6 characters) ? (BETA)
Abbreviation (1 ch) ? (B)
Names : ( ALPHA BETA) Abbreviations : ( A B)
... processing 1UBQ ... UBIQUITIN - HUMAN (HOMO $SAPIEN ... /nfs/public/pdb/ubq1.pdb Nr of residue names : ( 134) Nr of YASSPA residues : ( 134) Nr of res + sec struc : ( 50) Nr of CA coordinates : ( 76) Nr of am ac + sec str : ( 50) Nr of sec struc elems : ( 9) Types : ( 3 6)
... processing 1UTG ... UTEROGLOBIN (OXIDIZED) - RABBIT (ORYCTOLAGUS ... /nfs/public/pdb/utg1.pdb Nr of residue names : ( 153) Nr of YASSPA residues : ( 153) Nr of res + sec struc : ( 56) Nr of CA coordinates : ( 70) Nr of am ac + sec str : ( 56) Nr of sec struc elems : ( 5) Types : ( 5 0)
... processing 2UTG ... UTEROGLOBIN - UTERINE SECRETIONS ... /nfs/public/pdb/utg2.pdb Nr of residue names : ( 305) Nr of YASSPA residues : ( 305) Nr of res + sec struc : ( 109) Nr of CA coordinates : ( 140) Nr of am ac + sec str : ( 109) Nr of sec struc elems : ( 9) Types : ( 9 0)
Nr of lines read : ( 6) Nr of proteins read : ( 3) Nr of proteins processed : ( 3)
*** PRO2 *** PRO2 *** PRO2 *** PRO2 *** PRO2 *** PRO2 *** PRO2 *** PRO2 ***
Version - 921026/0.02 Started - Fri Oct 30 20:02:18 1992 Stopped - Fri Oct 30 20:02:21 1992
CPU-time taken : User - 0.6 Sys - 0.2 Total - 0.8
*** PRO2 *** PRO2 *** PRO2 *** PRO2 *** PRO2 *** PRO2 *** PRO2 *** PRO2 ***
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
The result of this is a file which can be read by DEJAVU:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
!
! === 1UBQ
!
MOL 1UBQ
NOTE UBIQUITIN - HUMAN (HOMO $SAPIEN
PDB /nfs/public/pdb/ubq1.pdb
!
BETA 'B1' '2' '7' 6 26.849 29.021 3.898 30.224 38.643 16.662
BETA 'B2' '11' '16' 6 31.190 42.012 12.331 31.219 27.341 4.275
ALPHA 'A1' '23' '33' 11 31.287 22.201 16.417 39.807 32.994 9.233
ALPHA 'A2' '38' '40' 3 38.816 28.019 19.889 37.737 31.636 23.712
BETA 'B3' '41' '45' 5 34.737 30.874 21.473 22.125 29.062 18.183
BETA 'B4' '49' '51' 3 25.348 26.871 23.643 29.014 21.656 22.288
ALPHA 'A3' '57' '59' 3 22.923 18.583 12.025 21.078 21.149 16.251
BETA 'B5' '60' '62' 3 19.064 21.352 12.999 20.080 24.773 8.033
BETA 'B6' '65' '74' 10 21.418 30.253 9.620 40.871 33.801 30.253
ENDMOL
!
! === 1UTG
!
MOL 1UTG
NOTE UTEROGLOBIN (OXIDIZED) - RABBIT (ORYCTOLAGUS
PDB /nfs/public/pdb/utg1.pdb
!
ALPHA 'A1' '4' '14' 11 30.857 26.132 29.178 27.175 11.022 27.802
ALPHA 'A2' '18' '28' 11 36.402 7.816 28.131 39.520 21.056 36.843
ALPHA 'A3' '32' '46' 15 43.004 10.865 42.182 28.542 3.621 28.934
ALPHA 'A4' '50' '65' 16 23.195 5.333 22.534 17.765 26.475 28.934
ALPHA 'A5' '67' '69' 3 13.143 29.843 32.238 17.682 28.430 34.983
ENDMOL
!
! === 2UTG
!
MOL 2UTG
NOTE UTEROGLOBIN - UTERINE SECRETIONS
PDB /nfs/public/pdb/utg2.pdb
!
ALPHA 'A1' 'A4' 'A14' 11 27.389 27.997 -3.826 33.154 25.609 10.405
ALPHA 'A2' 'A18' 'A28' 11 26.224 30.108 15.661 16.176 26.975 3.151
ALPHA 'A3' 'A32' 'A46' 15 12.301 22.187 13.161 32.443 24.577 18.017
ALPHA 'A4' 'A50' 'A65' 16 40.262 26.962 15.045 38.390 21.010 -6.594
ALPHA 'A5' 'B4' 'B14' 11 29.114 11.709 -6.099 27.611 11.044 9.247
ALPHA 'A6' 'B18' 'B28' 11 35.706 5.945 11.488 41.514 12.506 -2.421
ALPHA 'A7' 'B32' 'B46' 15 48.793 14.249 7.397 29.683 10.438 16.285
ALPHA 'A8' 'B50' 'B65' 16 22.172 8.174 15.475 18.958 18.636 -4.118
ALPHA 'A9' 'B67' 'B69' 3 15.847 24.030 -6.553 21.323 23.866 -6.168
ENDMOL
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
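For readers who want to inspect such a database file outside DEJAVU, here is a small Python sketch of a reader. It is an illustration only: the function name `parse_sse_library` is hypothetical, and it assumes that each record (MOL, NOTE, PDB, one SSE, ENDMOL) occupies its own line and that an SSE record holds the type, quoted name, quoted first and last residue, the number of residues, and the coordinates of the first and last Calpha atoms.

```python
import math

def parse_sse_library(text):
    """Parse a DEJAVU-style SSE file into {molecule: {...}} (sketch only)."""
    molecules = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # comment or empty line
        keyword, _, rest = line.partition(" ")
        rest = rest.strip()
        if keyword == "MOL":
            current = {"note": "", "pdb": "", "sses": []}
            molecules[rest] = current
        elif keyword == "ENDMOL":
            current = None
        elif current is None:
            raise ValueError(f"record outside MOL/ENDMOL: {line}")
        elif keyword == "NOTE":
            current["note"] = rest
        elif keyword == "PDB":
            current["pdb"] = rest
        else:
            fields = rest.split()
            name, first, last = (f.strip("'") for f in fields[:3])
            start = tuple(float(v) for v in fields[4:7])
            end = tuple(float(v) for v in fields[7:10])
            current["sses"].append({
                "type": keyword, "name": name,
                "first": first, "last": last,
                "nres": int(fields[3]),
                "start": start, "end": end,
                # End-to-end Calpha distance; whether DEJAVU's "length"
                # is defined exactly this way is an assumption.
                "length": math.dist(start, end),
            })
    return molecules
```

With the example above, `parse_sse_library(text)["1UBQ"]["sses"][0]` would describe strand B1 (residues 2 to 7).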
PRO2 is clever enough not to be bothered by the strange way in which O creates and writes residue identifiers; it knows when it is dealing with DNA or polysaccharide molecules, and it won't generate structural elements which comprise residues from two different chains. The only remaining problems are with integer chain id's in the PDB file and with multiple NMR structures in one PDB file.
PRO2 generates SSEs by simply looking for continuous stretches
of ALPHA or BETA, retrieving the corresponding residue id's
and the coordinates of the Calpha atoms of the first and last
residues.
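The run-finding step just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not PRO2's actual code: the function name `find_sses`, the input layout (one tuple per residue, in chain order) and the naming scheme (A1, A2, ... and B1, B2, ... counted per type) are assumptions modelled on the example output, and the chain-boundary check that PRO2 performs is left out.

```python
from itertools import groupby

# Abbreviations as entered in the PRO2 dialogue above.
ABBREVIATIONS = {"ALPHA": "A", "BETA": "B"}

def find_sses(residues):
    """Group consecutive residues with the same YASSPA state into SSEs.

    residues: list of (residue_id, state, ca_xyz) tuples in chain order.
    Returns (type, name, first_id, last_id, nres, first_ca, last_ca) tuples.
    """
    sses = []
    counters = {}
    for state, run in groupby(residues, key=lambda r: r[1]):
        run = list(run)
        if state not in ABBREVIATIONS:
            continue  # residue without ALPHA/BETA assignment
        counters[state] = counters.get(state, 0) + 1
        name = f"{ABBREVIATIONS[state]}{counters[state]}"
        first, last = run[0], run[-1]
        sses.append((state, name, first[0], last[0], len(run),
                     first[2], last[2]))
    return sses
```

Each returned tuple carries the same information as one ALPHA or BETA record in the database file: residue identifiers of the first and last residue, the number of residues, and the two Calpha positions.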
When PRO2 has finished, you may execute the csh-script to get
rid of intermediate files.
If you want to automate the process completely, or
if you want to create your own database of SSEs,
then you may use the csh-script called "makedb".
This script processes one or several PDB files, creates the
necessary input files for PRO1, O and PRO2, runs these programs
and deletes all intermediate files.
To use this script, copy it to your own directory, edit it
appropriately and type: source makedb. The output looks as
follows (unfortunately, O output cannot be redirected; if you
try to do it anyway, the program gets into an endless "empty
input line"-loop):
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- < % 177 gerard sirius 19:57:47 secs/database> source makedb ... scanning PDB files ... ... /nfs/public/pdb/ubq1.pdb ... ... /nfs/public/pdb/utg1.pdb ... ... /nfs/public/pdb/utg2.pdb ... ... running PRO1 ... ... running O ... O > Use of this program implies acceptance of conditions O > described in Appendix 10 of the O manual O > O version 5.8, Sat Sep 26 13:59:06 MET 1992 O > Loading /home/gerard/progs/secs/database/general.o O > Maximum inter-residue link distance = 6.00 O > There were 23 residues. O > 113 atoms. O > Do you want to use the display? [Yes]: O > Error in INST, object DISP_BONDS O > O > Macro in computer file-system. Sam> File type is PDB Sam> Nothing marked for deletion, so no compression. Sam> Molecule 1UBQ contained 134 residues and 660 atoms O > + O > Current molecule 1ZNA has not been loaded. O > Util> Template size : 5 residues. Util> There were 17 Util> Prompt: O > Util> Template size : 5 residues. Util> There were 33 Util> Prompt: O > O > O > O > O > Sam> I can't recognise file type from the file name Sam> What IS the file type? [PDB]: Sam> 76 atoms written out. O > Heap> Deleted 1UBQ_ATOM_XYZ Heap> Deleted 1UBQ_ATOM_B Heap> Deleted 1UBQ_ATOM_WT Heap> Deleted 1UBQ_ATOM_Z Heap> Deleted 1UBQ_ATOM_NAME Heap> Deleted 1UBQ_ATOM_VISIBLE Heap> Deleted 1UBQ_ATOM_SELECT Heap> Deleted 1UBQ_RESIDUE_NAME Heap> Deleted 1UBQ_RESIDUE_TYPE Heap> Deleted 1UBQ_RESIDUE_POINTERS Heap> Deleted 1UBQ_RESIDUE_CG Heap> Deleted 1UBQ_PDB_HEADER Heap> Deleted 1UBQ_PDB_COMPND Heap> Deleted 1UBQ_PDB_SOURCE Heap> Deleted 1UBQ_PDB_CRYST1 Heap> Deleted 1UBQ_PDB_SCALE Heap> Deleted 1UBQ_MOLECULE_TYPE Heap> Deleted 1UBQ_MOLECULE_CA Heap> Deleted 1UBQ_MOLECULE_CA_MXDST Heap> Deleted 1UBQ_RESIDUE_2RY_STRUC O > O > O > Sam> File type is PDB [...] Heap> Deleted 2UTG_MOLECULE_CA_MXDST Heap> Deleted 2UTG_RESIDUE_2RY_STRUC O > O > As1> Saved 44.5u 3.2s 1:13 65% ... running PRO2 ... ... 
removing intermediate files ... ... started at Fri Oct 30 20:01:00 MET 1992 ... ... stopped at Fri Oct 30 20:02:22 MET 1992 ... ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
The contents of the makedb script:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
# makedb - csh script to generate an input file or (partial)
# database for DEJAVU
#
# edit this script, then do : source makedb
#
# Gerard Kleywegt @ 921023,24,26,27,30
#
# remove the FOLLOWING lines after copying this to your own directory
echo
echo "makedb - sorry, you have to copy me to your own directory"
echo " first, then edit me and THEN you may execute me"
echo
exit -1
# remove the ABOVE lines after copying this to your own directory

# uncomment the following line to get all commands echoed to the screen
# set echo

# edit the FOLLOWING lines before executing the script ***********************

# an identifier (will be used to generate filenames)
set id=u

# which PDB files are to be processed ? (you may use wildcards)
set sour=/nfs/public/pdb/u*.pdb
# set sour=/nfs/public/pdb/utg2.pdb
# set sour="/nfs/public/pdb/u*.pdb /nfs/public/pdb/x*.pdb"

# where are the executables of PRO1 and PRO2 ?
set prog=/nfs/public/IRIX/bin

# the directory and name of the O executable for your machine
set oexe=/nfs/taj/alwyn/o/bin/4d_ono

# any O database file of your own
set ofil=/home/gerard/progs/secs/database/general.o

# the scratch directory where all intermediate files are kept
set scrat=/nfs/scratch/gerard

# edit the ABOVE lines before executing the script ***********************

# derive other file names automagically
set prof=$id.inp
set omac=$id.omac
set scsh=$id.csh
set secs=$id.secs

# set some variables and redefine 'grep'
set savedir=$cwd
set started=`date`
alias grep 'grep -i'

# go to the work directory
cd $scrat

# *** make input file for PRO1/PRO2 ***

echo ... scanning PDB files ...

# write message to output file
echo ! $prof > $prof
echo ! Input file for PRO1/PRO2 created by script "makedb" >> $prof
echo ! File created at `date` >> $prof

# loop over the PDB files

foreach file ($sour)

# show the user that you're actually doing something
echo ... $file ...

# grab the molecule name from the HEADER record
set molnam="`head -10 $file | grep 'header ' | cut -c63-66`"

# grab the compound name from the FIRST COMPND record
set compnd="`head -10 $file | grep 'compnd ' | cut -c11-59`"

# get the source from the SOURCE record
set source="`head -10 $file | grep 'source ' | cut -c11-29`"

# add the appropriate line to the PRO1/PRO2 input file
echo "'""$molnam""' '"$file"' '""$compnd - $source""'" >> $p

end

# *** run PRO1 ***

# create input file
echo $prof > temp1
echo $omac >> temp1
echo $scsh >> temp1

echo ... running PRO1 ...

$prog/4d_pro1 -batch < temp1 >& pro1.log

# check if there were errors
grep error pro1.log

# make the csh-script executable
chmod +x $scsh

# *** run O ***

# create input file
echo no > tempo
echo "@$omac" >> tempo
echo stop >> tempo

echo ... running O ...

$oexe $ofil < tempo

# *** run PRO2 ***

# create input file
echo $prof > temp2
echo $secs >> temp2
echo 2 >> temp2
echo ALPHA >> temp2
echo A >> temp2
echo BETA >> temp2
echo B >> temp2

echo ... running PRO2 ...

$prog/4d_pro2 -batch < temp2 >& pro2.log

# check if there were errors
grep error pro2.log

# *** clean up ***

echo ... removing intermediate files ...

$scsh
\rm temp1 tempo temp2 pro1.log pro2.log $scsh $prof $omac

# go back to original directory
cd $savedir

echo ... started at $started ...
echo ... stopped at `date` ...

# unset echo

exit 0
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Note that the script assumes that there is a HEADER, at least one COMPND
and at least one SOURCE card among the first ten cards in each PDB file.
If this is NOT the case, you must edit the input file that is created
($prof) and you may want to temporarily remove the statements at the end
that remove all intermediate files.
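Whether a given PDB file satisfies this assumption can be checked beforehand. The Python sketch below mirrors the column ranges used by the script (`cut -c63-66`, `-c11-59` and `-c11-29`); the function name `pdb_header_fields` is hypothetical, and unlike the script it matches record names only at the start of a card and strips surrounding blanks.

```python
def pdb_header_fields(lines, max_cards=10):
    """Extract (molecule name, compound, source) from the first cards.

    Raises ValueError if HEADER, COMPND or SOURCE is not found among the
    first `max_cards` lines, which is the case makedb cannot handle.
    """
    found = {"HEADER": None, "COMPND": None, "SOURCE": None}
    for line in lines[:max_cards]:
        record = line[:6].strip().upper()
        # Keep only the FIRST card of each type, as the script intends.
        if record in found and found[record] is None:
            found[record] = line.rstrip("\n")
    missing = [name for name, card in found.items() if card is None]
    if missing:
        raise ValueError("missing card(s): " + ", ".join(missing))
    molnam = found["HEADER"][62:66]          # columns 63-66
    compnd = found["COMPND"][10:59].strip()  # columns 11-59
    source = found["SOURCE"][10:29].strip()  # columns 11-29
    return molnam, compnd, source
```

A file for which this raises `ValueError` is one whose line in the PRO1/PRO2 input file has to be edited by hand.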
We mentioned before that relaxing the criteria in the search for
the DNA-binding helix-(turn)-helix motif of lambda cro repressor
would yield many more hits than the two we obtained in the
example.
If we actually do this, we may get the following hits:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
< % 110 gerard rose 15:24:13 progs/secs> grep s_a_i cro_relax.omac
s_a_i /nfs/public/pdb/cro1.pdb 1cro
s_a_i /nfs/public/pdb/acn5.pdb 5acn pdb
s_a_i /nfs/public/pdb/acn6.pdb 6acn pdb
s_a_i /nfs/public/pdb/api7.pdb 7api pdb
s_a_i /nfs/public/pdb/api8.pdb 8api pdb
s_a_i /nfs/public/pdb/api9.pdb 9api pdb
s_a_i /nfs/public/pdb/cat7.pdb 7cat pdb
s_a_i /nfs/public/pdb/cat8.pdb 8cat pdb
s_a_i /nfs/public/pdb/ccp1.pdb 1ccp pdb
s_a_i /nfs/public/pdb/ccp2.pdb 2ccp pdb
s_a_i /nfs/public/pdb/ccp3.pdb 3ccp pdb
s_a_i /nfs/public/pdb/ccp4.pdb 4ccp pdb
s_a_i /nfs/public/pdb/cro1.pdb 1cro pdb
s_a_i /nfs/public/pdb/csc1.pdb 1csc pdb
s_a_i /nfs/public/pdb/csc2.pdb 2csc pdb
s_a_i /nfs/public/pdb/csc3.pdb 3csc pdb
s_a_i /nfs/public/pdb/csc4.pdb 4csc pdb
s_a_i /nfs/public/pdb/csc5.pdb 5csc pdb
s_a_i /nfs/public/pdb/cts1.pdb 1cts pdb
s_a_i /nfs/public/pdb/cts2.pdb 2cts pdb
s_a_i /nfs/public/pdb/cts3.pdb 3cts pdb
s_a_i /nfs/public/pdb/cts5.pdb 5cts pdb
s_a_i /nfs/public/pdb/cts6.pdb 6cts pdb
s_a_i /nfs/public/pdb/cyp2.pdb 2cyp pdb
s_a_i /nfs/public/pdb/cro3.pdb 3cro pdb
s_a_i /nfs/public/pdb/hco1.pdb 1hco pdb
s_a_i /nfs/public/pdb/icd3.pdb 3icd pdb
s_a_i /nfs/public/pdb/icd4.pdb 4icd pdb
s_a_i /nfs/public/pdb/icd5.pdb 5icd pdb
s_a_i /nfs/public/pdb/icd6.pdb 6icd pdb
s_a_i /nfs/public/pdb/icd7.pdb 7icd pdb
s_a_i /nfs/public/pdb/icd8.pdb 8icd pdb
s_a_i /nfs/public/pdb/icd9.pdb 9icd pdb
s_a_i /nfs/public/pdb/lap1.pdb 1lap pdb
s_a_i /nfs/public/pdb/lrd1.pdb 1lrd pdb
s_a_i /nfs/public/pdb/lzm2.pdb 2lzm pdb
s_a_i /nfs/public/pdb/lzm3.pdb 3lzm pdb
s_a_i /nfs/public/pdb/or12.pdb 2or1 pdb
s_a_i /nfs/public/pdb/phs1.pdb 1phs pdb
s_a_i /nfs/public/pdb/sic1.pdb 1sic pdb
s_a_i /nfs/public/pdb/trc1.pdb 1trc pdb
s_a_i /nfs/public/pdb/ts13.pdb 3ts1 pdb
s_a_i /nfs/public/pdb/ts14.pdb 4ts1 pdb
s_a_i /nfs/public/pdb/xia1.pdb 1xia pdb
s_a_i /nfs/public/pdb/xia4.pdb 4xia pdb
s_a_i /nfs/public/pdb/xia5.pdb 5xia pdb
s_a_i /nfs/public/pdb/xia6.pdb 6xia pdb
s_a_i /nfs/public/pdb/xia7.pdb 7xia pdb
s_a_i /nfs/public/pdb/xia8.pdb 8xia pdb
s_a_i /nfs/public/pdb/xia9.pdb 9xia pdb
s_a_i /nfs/public/pdb/55c1.pdb 155c pdb
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
In fact, we used the following parameters:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- < % 107 gerard sirius 15:24:58 secs/database> more cro_relax.omac ! "O" macro cro_relax.omac ! created by DEJAVU at Fri Oct 30 15:26:41 1992 ! o_setup off off on ! print ... analysing 1cro print cro repressor - bacteriophage (lamb print ... query A2 A3 print ... allowed mismatches 2 6.000 5.000 0.250 print ... distance type H print ... directionality Y print ... absolute motif Y print ... neighbours Y ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
We have processed a representative selection of these hits with O (i.e., using only the best scoring protein of a set of related ones, such as the seven xia, d-xylose isomerase). The results are summarised in the following table.
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- ========================================================================================= 15 20 25 30 35 | | | | | 1cro score rmsX NI rmsI O11 AMRFGQTKTAKDLGVYQSAINKAIHAGR O38 lambda cro repressor XXXXXXXX XXXXXXXXXX (the two helices) ========================================================================================= 5acn 9.36 2.67 22 3.40 733 ETQIEWFRAGSALNRMKELQQK 754 aconitase 8api 5.01 4.77 21 3.11 A264 ENELTHDIITKFLEN A278 alpha-1-antitrypsin 8cat 6.63 4.63 18 2.78 B252 LAHEDPDYGLRDLFNAIA B269 catalase 2ccp 6.09 5.45 14 1.79 240 QDPKYLSIVKEYAN 253 cytochrome-c peroxidase 2cts 7.11 3.04 27 2.92 66 FRGFSIPECQKLLPK 80 citrate synthase 87 PLPEGLFWLLVT 98 2cyp 6.39 5.47 32 2.93 202..NE 209 cytochrome-c peroxidase 241 DPKYLSIVKEY 251 91..KE 98 (with cro A-chain) 15 SYEDF 19 (with cro B-chain) 3cro 9.83 5.53 31 2.90 R56..QYG R62 434 cro repressor R40 KRPRFLF R46 L41 RPRFLFEIAMALNC.. L57 1hco 6.41 4.84 17 3.09 B42 FESFGD B47 haemoglobin B57 NPKVKAHGKKV B67 5icd 6.85 5.33 29 3.20 85 PAETLDLIREYR 96 isocitrate dehydrogenase 353..GSII 357 (with cro C-chain) 386 AKTVTY 391 (with cro C-chain) 1lap 4.86 5.77 21 2.61 425 RSAGACTAAAFLKEF 439 leucine aminopeptidase 1lrd 2.84 0.60 25 3.70 ! 329 LGLSQESVADKMGMGQSGVGALFNG 353 lambda repressor 3lzm 7.68 3.95 29 2.87 95..ALIN 101 lysozyme 113 GFTNSLRMLQQKR.. 127 2or1 3.12 0.67 32 3.40 ! 
L5..RI L11 434 repressor L13 LGLNQAELAQKVGTTQQSIEQLENG L37 1phs 3.69 2.09 52 3.06 340..RALDGKDVLGLTFSGSGDEVMKLINKQ 372 phaseolin 39..QQSK 44 (with cro A-chain) 13..YFNSD 19 (with cro B-chain) 1sic 6.47 3.55 24 2.59 E229 GAAALILS E236 subtilisin E238 HPNWTNTQVRSSLQNT E253 1trc 1.02 2.96 25 2.36 A99 YISAAELRHV A108 calmodulin A114 EKLTDEEVDEMIREA A128 4ts1 6.28 4.72 24 3.24 A144 SVNYM A148 Tyr-tr-RNA synthase A152 ESVQSRIETG..A165 B35 CGFDP B39 (with cro C-chain) 6xia 5.98 2.98 29 2.49 215 PEVGHEQMAGLNFPHGIAQALWA 237 d-xylose isomerase 155c 6.00 4.15 18 2.86 73 ANLIEY 78 cytochrome-c550 80 TDPKPLVKKMTD 91 ========================================================================================= XXXXXXXX XXXXXXXXXX (the two helices) 1cro score rmsX NI rmsI O11 AMRFGQTKTAKDLGVYQSAINKAIHAGR O38 lambda cro repressor | | | | | 15 20 25 30 35 ========================================================================================= ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Legend: the first column contains the PDB identifier, which is followed by the score according to DEJAVU (score), the rms fit of the Calpha atoms using the lsq_explicit option in O (rmsX), the number of matched residues as determined by the lsq_improve option in O (NI) and the rms fit of the Calpha atoms of these residues (rmsI). The right-hand part of the table shows (some of) the structural alignments found by lsq_improve insofar as they pertain to residues in and around the helix-turn-helix motif of 1cro.
NOTE: since lsq_improve does a global optimisation for the alignment of two proteins, the resulting picture is sometimes worse than after a simple lsq_explicit (e.g., for 1lrd and 2or1). Also, this option is sometimes unstable, alternating between two solutions and not always ending up with the best one.
NOTE: there doesn't seem to be a simple correlation between the DEJAVU scores and the rms-fit values, so be careful when throwing away hits with a high DEJAVU score (e.g., 5acn and 2cts) !
NOTE: observe how widely different amino-acid sequences may yield similar spatial motifs !!!
NOTE: the best hits are those for which both helices are part of a long matching sequence of residues (i.e., 5acn, 2cts, 1lrd, 2or1, 1phs, 1sic, 1trc, 6xia and 155c).
If you want to compare your structure with a subset of the PDB structures, you can use the select option:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
===> Option ? (READ) sele
Options :
(1) Select ALL entries
(2) Select NONE of the entries
(3) Select ON for one or more entries
(4) Select OFF for one or more entries
(5) Read a select macro file
Option (1-5) ? ( 1) 1
Selected ALL entries
Nr of selected entries now : ( 607)
2 CPU total/user/sys : 0.0 0.0 0.0
===> Option ? (SELE)
Options :
(1) Select ALL entries
(2) Select NONE of the entries
(3) Select ON for one or more entries
(4) Select OFF for one or more entries
(5) Read a select macro file
Option (1-5) ? ( 1) 5
Select macro file ? (user.sel) cici.select
Selected NONE of the entries
Select ON 1alc
Select ON 2apr
Select ON 5apr
Select ON 1bp2
Select ON 3bp2
Select ON 4bp2
ERROR --- Invalid entry code: 2c4s
Select ON 1cdp
Select ON 3cln
Select ON 2cna
Select ON 3cna
Select ON 4cpv
Select ON 5cpv
...
Select ON 1trc
Select ON 1trm
Select ON 2trm
Nr of selected entries now : ( 87)
2 CPU total/user/sys : 0.3 0.3 0.1
===> Option ? (SELE) ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
A select file may contain comments (any line beginning with "!") and
select records; possible types:
- select all
- select none
- select on pdb_code
- select off pdb_code
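The effect of these four record types can be illustrated with a short Python sketch. It is not DEJAVU's own code: the function name `apply_select_file` is hypothetical, and case-insensitive matching of keywords and entry codes is an assumption based on the example output (the file says "Select on 1ALC", the program echoes "Select ON 1alc").

```python
def apply_select_file(text, known_codes):
    """Apply select records in order; return (selected_set, error_lines)."""
    known = {code.lower() for code in known_codes}
    selected = set()
    errors = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("!"):
            continue  # comment line
        words = stripped.lower().split()
        if len(words) == 2 and words == ["select", "all"]:
            selected = set(known)
        elif len(words) == 2 and words == ["select", "none"]:
            selected = set()
        elif len(words) == 3 and words[:2] == ["select", "on"]:
            if words[2] in known:
                selected.add(words[2])
            else:
                errors.append(line)  # e.g. an invalid entry code
        elif len(words) == 3 and words[:2] == ["select", "off"]:
            if words[2] in known:
                selected.discard(words[2])
            else:
                errors.append(line)
        else:
            errors.append(line)
    return selected, errors
```

Records are applied from top to bottom, so a file that starts with "Select none" and then lists "Select on" records leaves exactly the listed entries selected.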
A select file may look as follows:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
< % 147 gerard sirius 23:09:47 secs/cbh1> cat cici.select
! Select file for DEJAVU
! Created by select.csh
! At Thu Feb 18 22:45:45 MET 1993
! Keywords calcium
!
Select none
Select on 1ALC
Select on 2APR
Select on 5APR
...
Select on 1TRC
Select on 1TRM
Select on 2TRM
!
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Use the following C-shell script (or an adaptation) to generate select files automatically by scanning for one or more keywords in all PDB files:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
#!/bin/csh -f
# select.csh - Gerard Kleywegt 1993
if ($#argv < 1) then
  echo
  echo "usage: $0 keyword1 [keyword2 ...]"
  echo
  exit 1
endif
#
set pdbdir=/nfs/public/pdb
#
set alfabet='a b c d e f g h i j k l m n o p q r s t u v w x y z'
set out=$argv[1].select
#
echo Looking for $argv[1-$#argv]
echo Select file $out
#
echo "! Select file for DEJAVU " > $out
echo "! Created by $0" >> $out
echo "! At `date`" >> $out
echo "! Keywords $argv[1-$#argv]" >> $out
#
echo "! " >> $out
echo "Select none" >> $out
# loop over all letters in the alphabet
foreach letter ($alfabet)
  set files=`echo $pdbdir/$letter"*.pdb"`
  echo
  echo There are $#files PDB files beginning with the letter $letter
  # loop over all files beginning with this letter
  foreach pdb ($files)
    # loop over all keywords
    foreach key ($argv)
      # count the nr of times this keyword occurs in the file
      set hits=`grep -c -i $key $pdb`
      if ($hits == 0) then
        goto failure
      endif
    end
    # if here, the file contains all keywords
    set molnam="`head -10 $pdb | grep -i 'header ' | cut -c63-66`"
    set compnd="`head -10 $pdb | grep -i 'compnd ' | cut -c11-59`"
    echo Protein $molnam in file $pdb
    echo Possible name "$compnd"
    echo "Select on $molnam" >> $out
    # in case of failure, you come here immediately
failure:
  end
end
#
echo "! " >> $out
echo Done ...
exit 0
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
The following is an example of an incremental search:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
===> Option ? (READ) in
********** NEW QUERY **********
Elements : ( B1 B2 B3 B4 A1 B5 A2 A3 B6 B7 B8 B9 B10 B11 B12 B13 A4 A5 B14 B15 B16 B17 B18 B19 A6 B20 B21 B22 B23 B24 A7 A8 A9 B25 A10 A11 B26) Min nr of residues for SSEs ? ( 5) 6 ................... ( B3 B4 A3 B8 B9 B11 B16 B17 B21 B22 A7 A9 B25 A11 B26) Min nr of elements to match (0 = abort) ? ( 4) 5
Mismatch nr of residues ? ( 3)
Mismatch element length ? ( 10.000)
Mismatch distances ? ( 8.000)
Mismatch cosines ? ( 0.400)
Weights for scoring ? ( 0.250 0.250 0.250 0.250) 1 1 10 5 Normalised weights : ( 0.059 0.059 0.588 0.294)
Possible distance criteria: C => centre-to-centre H => MIN head-tail and tail-head (anti-parallel) T => MIN head-head and tail-tail (parallel) I => MIN of all these distances A => MAX of all these distances Which distances (C/H/T/I/A) ? (C)
Extensive output ? (N)
Conserve directionality ? (Y)
Conserve absolute motif ? (Y)
Conserve neighbours ? (N)
Attempt to avoid multi-chain hits ? (Y)
Attempt to avoid identical proteins ? (Y)
Create "O" macro file ? (Y)
"O" macro file ? (lsq.omac)
Nr of elements recognised in query : ( 15) Indices : ( 3 4 8 11 12 14 21 22 27 28 31 33 34 36 37) Nr of elements of each type : ( 4 11)
********** 2cna ********** 108 ********** [concanavalin a - jack bean (canavali ] [/nfs/public/pdb/cna2.pdb ] QUERY : ( 3 4 8 11 12 14 21 22 27 28 31 33 34 36 37) Elements : B3 B4 A3 B8 B9 B11 B16 B17 B21 B22
A7 A9 B25 A11 B26 Lengths : ( 26.477 31.328 10.053 22.441 24.508 23.564 23.091 25.716 26.247 23.934 13.939 11.969 19.554 9.769 27.656) Residues : ( 9 11 7 9 9 8 9 9 9 8 10 9 7 7 10) Nr of common SSEs : ( 5)
MATCH : ( 0 7 0 9 10 12 0 0 20 0 0 0 0 0 0) Elements : -X- B6 -X- B8 B9 B10 -X- -X- B18 -X- -X- -X- -X- -X- -X- Lengths : ( 23.720 23.278 23.972 31.742 17.850) Residues : ( 9 8 8 11 6) Length ... rmsd = 6.265 ... match = 0.970 Residues ... rmsd = 2.191 ... match = 0.973 Distance ... rmsd = 4.260 ... match = 0.970 Cosines ... rmsd = 0.146 ... match = 0.981 SCORE : ( 3.163)
Nr of hits : ( 1) Nr of common SSEs : ( 5) Nr of best match : ( 1) Best score : ( 3.163)
Nr of matching entries : ( 1) Nr of hits (total) : ( 1)
Entry 108 = 2cna = concanavalin a - jack bean (canavali
2 CPU total/user/sys : 3.2 3.0 0.3
===> Option ? (IN) ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
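Two details of the dialogue above can be reproduced by hand: the normalisation of the scoring weights (1 1 10 5 becomes 0.059 0.059 0.588 0.294, i.e. each weight divided by their sum, 17) and the five distance criteria. The Python sketch below illustrates both. The function names are hypothetical, and two points are assumptions rather than documented behaviour: the centre of an SSE is taken as the midpoint of its two end points, and criteria I and A are taken over the four end-point distances only.

```python
import math

def normalise_weights(weights):
    """Scale the scoring weights so that they sum to one."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]

def sse_distance(sse_a, sse_b, kind="C"):
    """Distance between two SSEs, each given as (head_xyz, tail_xyz)."""
    (a_head, a_tail), (b_head, b_tail) = sse_a, sse_b
    head_head = math.dist(a_head, b_head)
    tail_tail = math.dist(a_tail, b_tail)
    head_tail = math.dist(a_head, b_tail)
    tail_head = math.dist(a_tail, b_head)
    if kind == "C":  # centre-to-centre (midpoints: an assumption)
        centre_a = [(h + t) / 2 for h, t in zip(a_head, a_tail)]
        centre_b = [(h + t) / 2 for h, t in zip(b_head, b_tail)]
        return math.dist(centre_a, centre_b)
    if kind == "H":  # anti-parallel: MIN of head-tail and tail-head
        return min(head_tail, tail_head)
    if kind == "T":  # parallel: MIN of head-head and tail-tail
        return min(head_head, tail_tail)
    if kind == "I":  # MIN of all end-point distances
        return min(head_head, tail_tail, head_tail, tail_head)
    if kind == "A":  # MAX of all end-point distances
        return max(head_head, tail_tail, head_tail, tail_head)
    raise ValueError(f"unknown distance type: {kind}")
```

Rounding `normalise_weights([1, 1, 10, 5])` to three decimals reproduces the "Normalised weights" line printed by the program.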
The TOPOLOGY option, although rather crummy, may help you in fathoming the topology of your protein. You enter a cosine cut-off and a distance cut-off, which determine whether two SSEs are parallel (cosine >= cut-off) or anti-parallel (cosine <= -cut-off) and whether they are spatial neighbours (distance <= cut-off). A matrix is printed which contains +2 for parallel neighbours, +1 for parallel non-neighbours, -1 for anti-parallel non-neighbours and -2 for anti-parallel neighbours.
Of the two numbers printed after each SSE name, the first is the sum of the absolute values of the off-diagonal matrix entries for that SSE (if high, the SSE is central in a motif) and the second is its number of spatial neighbours. You should choose your cut-offs such that no SSE has more than 2 spatial neighbours.
DEJAVU produces a file which can be plotted (and converted into PostScript) with O2D: use "open 2 topo 0 1" to open a 2D window, then type "topo mytopo.file mytopo.ps" and voilà.
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
COSine cut-off ? ( 0.800)
DIStance cut-off ? ( 8.000)
O2D topology file ? (cbh6a.topo)
A1     5   1  11  -1   0   0   0   0   0   0   0   1  -1   2   0
A2     6   0  -1  11   0  -1   0   0   0   0   0  -1   1  -1   1
B1     3   1   0   0  11   0  -2   1   0   0   0   0   0   0   0
B2     4   1   0  -1   0  11   0   0   0   0   0   0   0   1  -2
B3     6   2   0   0  -2   0  11  -2   1  -1   0   0   0   0   0
B4     6   2   0   0   1   0  -2  11  -2   1   0   0   0   0   0
B5     5   2   0   0   0   0   1  -2  11  -2   0   0   0   0   0
B6     4   1   0   0   0   0  -1   1  -2  11   0   0   0   0   0
B7     2   1   0   0   0   0   0   0   0   0  11  -2   0   0   0
B8     7   2   1  -1   0   0   0   0   0   0  -2  11  -2   1   0
B9     7   2  -1   1   0   0   0   0   0   0   0  -2  11  -2   1
B10    9   3   2  -1   0   1   0   0   0   0   0   1  -2  11  -2
B11    6   2   0   1   0  -2   0   0   0   0   0   0   1  -2  11
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
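The classification described for the topology option can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program's own code: the function names, the input layout (square lists of pairwise cosines and distances) and the three-SSE test data are all hypothetical, the diagonal is simply left at 0, and the neighbour count is taken to be the number of +2/-2 entries in a row, which agrees with the example output but is not stated explicitly in the text.

```python
def topology_matrix(cosines, distances, cos_cut=0.8, dist_cut=8.0):
    """Classify every SSE pair as +2, +1, 0, -1 or -2.

    cosines[i][j]   : cosine of the angle between SSEs i and j
    distances[i][j] : distance between SSEs i and j (definition assumed)
    """
    n = len(cosines)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # diagonal left at 0 in this sketch
            near = distances[i][j] <= dist_cut
            if cosines[i][j] >= cos_cut:        # parallel
                matrix[i][j] = 2 if near else 1
            elif cosines[i][j] <= -cos_cut:     # anti-parallel
                matrix[i][j] = -2 if near else -1
    return matrix


def row_summary(matrix):
    """Per SSE: (sum of absolute entries, number of +2/-2 entries)."""
    return [
        (sum(abs(v) for v in row), sum(1 for v in row if abs(v) == 2))
        for row in matrix
    ]


# Hypothetical data for three SSEs (not taken from any real structure).
cosines = [[1.0, -0.95, 0.9], [-0.95, 1.0, -0.85], [0.9, -0.85, 1.0]]
distances = [[0.0, 4.8, 12.0], [4.8, 0.0, 5.1], [12.0, 5.1, 0.0]]

matrix = topology_matrix(cosines, distances)
for row, (total, neighbours) in zip(matrix, row_summary(matrix)):
    print(total, neighbours, row)
```

Re-running such a sketch with a tighter distance cut-off shows quickly how many SSEs still have more than two neighbours.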
The system manager will have to do the following:
* put the appropriate executables in directories which are accessible by local DEJAVU users
* change the "make_sse" script (site-specific executables)
* copy the big PDB-derived libraries to an accessible directory
* change the file names of ALL PDB files mentioned in the
big PDB-derived libraries so that they point to the disk
etc. where you keep your local copies of the uncompressed
PDB files. In Uppsala, all PDB files are in a directory
called /nfs/pdb/full. If you keep your
PDB files in a directory called /usr/mnt/people/pdb, change
the big library file accordingly, e.g., using a (stream) editor,
OR make a soft link at the path named in the library, as follows:
ln -s /usr/mnt/people/pdb /nfs/pdb/full
If you create a soft link, you do NOT have to edit the big
library file !
Example of changing the libraries with "sed":
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
echo "s%/nfs/pdb/full%/y/database/brookhaven/pdb%g" > q.sed
sed -f q.sed full_pdb.lib > q ; mv q full_pdb.lib
echo "s%/nfs/pdb/pre%/y/database/brookhaven/pdb%g" > q.sed
sed -f q.sed pre_pdb.lib > q ; mv q pre_pdb.lib
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
* provide users with a minimalist DEJAVU library file which should AT LEAST contain the following lines:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
TYPE   'ALPHA' 'alpha helix'
TYPE   'BETA' 'beta strand'
CHAIN  your_local_big_pdb-derived_dejavu_library_file
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
In between the TYPE and the CHAIN commands, the user may
insert SSE records of his/her own structures (see the
example dejavu_user.lib file). NOTE that keywords should
be left-justified, uppercase strings of SIX characters
(i.e., add trailing spaces if necessary).
NOTE that you may "chain" an unlimited number of SSE files;
I like to have my personal file first, then a file with
structures solved in Uppsala but not yet in the PDB and
finally the big PDB-derived library.
As of version 5.3, DEJAVU is capable of "symbolic matching". In
this case, the spatial information regarding the SSEs is
completely ignored, and only their type and length (nr of
residues) are used (as well as the number of residues in
gaps between neighbouring SSEs).
This option can be useful if you get no hits at all; for example,
a domain rearrangement may screw up coordinate-based searches,
but symbolic matching may still work.
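To make the idea concrete, here is a rough Python sketch of a coordinate-free comparison. It is an illustration only: the manual does not spell out how DEJAVU pairs up elements or computes its symbolic score, so the function names, the (type, first residue, last residue) tuples and the way mismatches are summed are all assumptions. The sketch only checks two lists that have already been paired one-to-one, using the same kind of "too short" / "too long" window that the program prompts for.

```python
def n_res(sse):
    """Number of residues in an SSE given as (type, first, last)."""
    return sse[2] - sse[1] + 1


def gaps(sses):
    """Residues between each pair of consecutive SSEs."""
    return [b[1] - a[2] - 1 for a, b in zip(sses, sses[1:])]


def symbolic_mismatch(query, target, too_short=3, too_long=3):
    """Compare two already-paired SSE lists without using coordinates.

    Returns (residue mismatch, gap mismatch), or None when a pair differs
    in type or its length falls outside the allowed window.
    """
    if len(query) != len(target):
        return None
    res_mismatch = 0
    for q, t in zip(query, target):
        diff = n_res(t) - n_res(q)
        if q[0] != t[0] or diff < -too_short or diff > too_long:
            return None
        res_mismatch += abs(diff)
    gap_mismatch = sum(abs(g - h) for g, h in zip(gaps(query), gaps(target)))
    return res_mismatch, gap_mismatch


# Hypothetical query and database entry (residue numbers invented).
query = [("ALPHA", 16, 23), ("ALPHA", 27, 35), ("BETA", 37, 45)]
target = [("ALPHA", 15, 23), ("ALPHA", 26, 35), ("BETA", 39, 46)]
print(symbolic_mismatch(query, target))
```

Because nothing here depends on coordinates, a rigid-body movement of one domain relative to another leaves the result unchanged.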
Another application is when you have a very reliable secondary
structure prediction, but no structure (yet). Make an SSE file
and use dummy coordinates, e.g.:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
MOL    P2
NOTE   P2 myelin protein for testing symbolic matching
BETA   'B1' 'A7' 'A9' 3 0.0 0.0 0.0 1.0 1.0 1.0
BETA   'B2' 'A12' 'A14' 3 0.0 0.0 0.0 1.0 1.0 1.0
ALPHA  'A1' 'A16' 'A23' 8 0.0 0.0 0.0 1.0 1.0 1.0
ALPHA  'A2' 'A27' 'A35' 9 0.0 0.0 0.0 1.0 1.0 1.0
BETA   'B3' 'A37' 'A45' 9 0.0 0.0 0.0 1.0 1.0 1.0
BETA   'B4' 'A48' 'A55' 8 0.0 0.0 0.0 1.0 1.0 1.0
BETA   'B5' 'A58' 'A64' 7 0.0 0.0 0.0 1.0 1.0 1.0
BETA   'B6' 'A68' 'A74' 7 0.0 0.0 0.0 1.0 1.0 1.0
BETA   'B7' 'A78' 'A87' 10 0.0 0.0 0.0 1.0 1.0 1.0
BETA   'B8' 'A90' 'A97' 8 0.0 0.0 0.0 1.0 1.0 1.0
BETA   'B9' 'A100' 'A109' 10 0.0 0.0 0.0 1.0 1.0 1.0
BETA   'B10' 'A112' 'A119' 8 0.0 0.0 0.0 1.0 1.0 1.0
BETA   'B11' 'A122' 'A129' 8 0.0 0.0 0.0 1.0 1.0 1.0
ENDMOL
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
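If the prediction is available as a list of segments, records like those above can be generated rather than typed by hand. The Python sketch below is a hypothetical helper, not part of DEJAVU: the function name, the segment tuples and the exact spacing after the six-character keyword are assumptions, so compare the output with the example dejavu_user.lib file before using it.

```python
def sse_records(mol, note, segments, chain="A"):
    """Build SSE-file lines with dummy coordinates.

    segments : list of (kind, first, last), kind being 'ALPHA' or 'BETA'
    chain    : chain identifier prefixed to the residue numbers
    """
    lines = [f"{'MOL':<6} {mol}", f"{'NOTE':<6} {note}"]
    counters = {"ALPHA": 0, "BETA": 0}
    for kind, first, last in segments:
        counters[kind] += 1
        name = f"{kind[0]}{counters[kind]}"   # A1, A2, ... / B1, B2, ...
        nres = last - first + 1               # residues in this element
        lines.append(
            f"{kind:<6} '{name}' '{chain}{first}' '{chain}{last}' "
            f"{nres} 0.0 0.0 0.0 1.0 1.0 1.0"
        )
    lines.append("ENDMOL")
    return lines


# First few predicted elements of the P2 example.
segments = [("BETA", 7, 9), ("BETA", 12, 14), ("ALPHA", 16, 23)]
records = sse_records("P2", "P2 myelin protein for testing symbolic matching",
                      segments)
print("\n".join(records))
```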
Now run DEJAVU (see below). Note that 11 of the first 12 hits are proteins that belong to the same family (and have the same fold) as P2 myelin protein.
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
********** NEW QUERY **********
Elements : ( B1 B2 A1 A2 B3 B4 B5 B6 B7 B8 B9 B10 B11)
Nr of SSEs : ( 13)
Min nr of residues for SSEs ? ( 4)
Nr of SSEs : ( 11)
Remaining SSEs : ( A1 A2 B3 B4 B5 B6 B7 B8 B9 B10 B11)
Min nr of elements to match (0 = abort) ? ( 9)
Is this a BONES search ? (N)
Is this a SYMBOLIC search ? (Y)
SYMBOLIC search; no LSQ done
Define how much the nr of residues in SSEs may differ
by defining how many residues shorter or longer SSEs
in the database may be compared to those in your protein.
Max nr of residues "too short" ? ( 3)
Max nr of residues "too long" ? ( 3)
[...]
********** 1opb ********** 1243 **********
[cellular retinol binding protein ii (holo form) (holo-crbpii - rat (r ]
[/nfs/pdb/full/1opb.pdb ]
Elements : A1 A2 B3 B4 B5 B6 B7 B8 B9 B10 B11
Nr of common SSEs : ( 10)
Elements : A1 A2 B3 B4 B5 B6 -X- B7 B8 B9 B10
Total mismatched residues : ( 9)
Total gaps mismatch : ( 7)
Elements : A1 A2 B3 B4 B5 B6 -X- B8 B9 B10 B11
Total mismatched residues : ( 6)
Total gaps mismatch : ( 5)
Elements : A1 A2 B3 B4 B5 -X- B6 B7 B8 B9 B10
Total mismatched residues : ( 10)
Total gaps mismatch : ( 12)
Elements : A1 A2 B3 B4 -X- B5 B6 B7 B8 B9 B10
Total mismatched residues : ( 10)
Total gaps mismatch : ( 12)
Elements : A1 A2 B3 -X- B4 B5 B6 B7 B8 B9 B10
Total mismatched residues : ( 11)
Total gaps mismatch : ( 13)
Elements : A1 A2 -X- B3 B4 B5 B6 B7 B8 B9 B10
Total mismatched residues : ( 12)
Total gaps mismatch : ( 12)
Nr of hits : ( 6)
Nr of common SSEs : ( 10)
Nr of best match : ( 2)
Best score : ( 6.000)
Best gap mismatch : ( 5.000)
[...]
Nr of database entries : ( 2182)
Nr of selected entries : ( 2182)
Nr of matching entries : ( 39)
Nr of hits (total) : ( 639)
Sorting hits ...
  Nr Entry PDB   SSE  GAPS SCORE Compound
==== ===== ==== ==== ===== ===== ========
   1  1327 1pmp   11     0     0 p2 myelin protein (p2) - bovine (bos taurus) caudal spinal root myeli
   2   675 1ftp   11     3     2 fatty-acid-binding protein - desert locust (schistocerca gregaria)
   3   545 1eal   11     5    10 nmr study of ileal lipid binding protein - organism_scientific: sus s
   4   440 1crb   11    11     9 cellular retinol binding protein (crbp) complexed with all-t - rat (r
   5   823 1hmt   10     1     1 fatty acid binding protein (human muscle, m-fabp) complexed - organis
   6  1036 1lid   10     1     1 adipocyte lipid-binding protein complexed with oleic acid - mouse (mu
   7  1029 1lfo   10     1     4 liver fatty acid binding protein - oleate complex - organism_scientif
   8  1243 1opb   10     5     6 cellular retinol binding protein ii (holo form) (holo-crbpii - rat (r
   9   635 1fie   10    23    12 recombinant human coagulation factor xiii - organism_scientific: homo
  10   353 1cbi    9     4     5 apo-cellular retinoic acid binding protein i - organism_scientific: m
  11   355 1cbs    9     5     5 cellular retinoic-acid-binding protein type ii complexed wit - human
  12  1105 1mdc    9     5     7 fatty acid binding protein (manduca sexta) (mfb2) - tobacco hornworm
  13  1193 1nir    9     7     7 oxydized nitrite reductase from pseudomonas aeruginosa - organism_sci
  14  2018 2tbv    9     7    13 tomato bushy stunt virus - tomato bushy stunt virus
[...]
  37   592 1esf    9    28    17 staphylococcal enterotoxin a - organism_scientific: staphylococcus au
  38   934 1ivd    9    45    12 influenza a subtype n2 neuraminidase (sialidase) (e.c.3.2.1. - influe
  39  1831 2bpa    9  1823    14 bacteriophage phix174 capsid proteins gpf, gpg, gpj and four - bacter
2 CPU total/user/sys : 6.9 6.7 0.2
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
* 930125 - new distance options I (= min of all other types of distances) and A (= max of ditto)
* 930125 - names of SSEs are now all converted to upper case, i.e., no longer case-sensitive
* 930125 - implemented incremental search, i.e. a search for the maximum common motif of your protein and all of the database proteins; the input is the same as for the FIND option, except that you don't provide a set of SSEs but only the minimum number of SSEs that must be matched. This type of search may take a while if your protein contains many SSEs ! Note that you may also specify a minimum length (in residues), which affects the choice of the query elements and of those from the database structures. Set the minimum length to 5 residues, for example, in order to ignore hits involving tiny SSEs
* 930125 - implemented option to tell DEJAVU to try and avoid multiple chain hits by using only SSEs which have the same chain identifier for their first residue (in the range 'a' - 'z' or 'A' to 'Z') as the first SSE of each database protein
* 930222 - SELECT option (see above); option to try and avoid hits with multiple copies of the same protein (i.e., if you found a hit with 1LYZ, DEJAVU will skip 2LYZ etc.). It compares the last three characters of the PDB code with those of all proteins that already yielded hits; if they are identical, the protein is skipped (this is not 100 % fail-proof and you might miss interesting hits !!!)
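The "same protein" heuristic from the last item is easy to picture in code. The Python sketch below is illustrative only (the function name and the list of codes are hypothetical, and it assumes the codes are given in the order in which the entries produced hits): an entry is kept only if the last three characters of its PDB code have not been seen before.

```python
def skip_same_protein(pdb_codes):
    """Keep the first hit for each distinct three-character code suffix."""
    seen = set()
    kept = []
    for code in pdb_codes:
        suffix = code[-3:].upper()   # e.g. '1lyz' and '2LYZ' both give 'LYZ'
        if suffix in seen:
            continue                 # treated as another copy of the same protein
        seen.add(suffix)
        kept.append(code)
    return kept


hits = ["1lyz", "2lyz", "1pmp", "3LYZ"]
print(skip_same_protein(hits))
```

Two unrelated proteins whose codes happen to share their last three characters would also be collapsed into one, which is exactly the kind of missed hit the item warns about.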
None, at present.