Program : SPASM
Version : 990301
Author : Gerard J. Kleywegt, Dept. of Cell and Molecular Biology,
Uppsala University, Biomedical Centre, Box 590,
SE-751 24 Uppsala, SWEDEN
E-mail : gerard@xray.bmc.uu.se
Purpose : detection of main and side chain motifs
Package : SPASM
Reference(s) for this program:
* 1 * M. Harel, G.J. Kleywegt, R.B.G. Ravelli, I. Silman & J.L. Sussman (1995). Crystal structure of an acetylcholinesterase-fasciculin complex: interaction of a three-fingered toxin from snake venom with its target. Structure 3, 1355-1366. [http://www.ncbi.nlm.nih.gov/htbin-post/Entrez/query?uid=8747462&form=6&db=m&Dopt=r]
* 2 * G.J. Kleywegt (1998). Deja-vu all over again. CCP4/ESF-EACBM Newsletter on Protein Crystallography 35, July 1998, pp. 10-12. [http://alpha2.bmc.uu.se/usf/factory_9.html]
* 3 * G.J. Kleywegt & T.A. Jones (1998). Databases in protein crystallography. Acta Cryst D54, 1119-1131. [http://alpha2.bmc.uu.se/~gerard/papers/databases.html] [http://www.iucr.org/iucr-top/journals/acta/tocs/actad/1998/actad5406_1.html]
* 4 * G.J. Kleywegt (1998 ?). Recognition of spatial motifs in protein structures. J Mol Biol, in press.
* 5 * G.J. Kleywegt & T.A. Jones (1999 ?). Chapter 25.2.6. O and associated programs. Int. Tables for Crystallography, Volume F. To be published.
950109 - 0.1 - trial quick-n-dirty program -> works well !
950110 - 0.2 - more bells and whistles; select MC and/or SC to
use for superpositioning
950111 - 0.3 - continue; interface to O
950112 - 1.0 - first version for the "general public" (Uppsala only)
950118 - 1.1 - sensitive to environment variable GKLIB; debugged
option to use generic residue type "XXX" (main-chain
only !); added sketch instructions for the search
pattern object to the O macro
950119 - 1.2 - introduced option to conserve neighbours
950120 - 1.3 - minor changes
950126 - - added MKSPAZ (v. 1.0) to generate new library entries
950421 - 2.0 - redid database to use CAs instead of main-chain
centre-of-gravity to get better results with NHANCE
when generating coordinates for beta-strands; changes
propagated through all programs
951005 - 2.1 - new database (Hobohm & Sander 95% list); redimensioned
for 1200 residues
970124 - 2.2 - optional generation of LSQMAN input file (to detect
more global similarities)
971127 - 3.0 - implemented use of BLOSUM-45 substitution matrix to
decide which substitutions are allowed; optional
generation of multiple sequence alignment file for
use with MSEQPRO for profile analysis
980210 -3.0.1- increased dimensioning so the program can handle 1OCC
980318 -3.0.2- minor bug fix (first residue of "Your sequence" was
usually the wrong one ;-)
980909 - 3.1 - added a new substitution option in which the user
can define which residue-type substitutions will
be allowed (e.g., HIS<->HIS, but GLN<->GLU,ASP,GLN)
981007 - X - the jiffy program DEJANA (part of the DEJAVU package)
has been changed so it can also be used with O macros
produced by SPASM and RIGOR !
990301 - 3.2 - separate distance cut-offs for CA/CA and side-chain/side-chain (SC/SC)
mismatches
SPASM stands for "SPatial Arrangements of Side chains and Main
chains". It is a complementary program to DEJAVU: DEJAVU can be
used to find similar arrangements of helices and strands, and
SPASM can be used to find similar arrangements of side chains
and main chains (e.g., loops, turns, active sites, metal-binding
sites, etc.).
The program is based on an idea of Artymiuk et al.; reference:
P. Artymiuk, "Fold Recognition", in "Making the Most of Your
Model" (J.N. Thornton & W.N. Hunter, Eds.), Daresbury Laboratory,
pp. XXX-YYY (1995).
The algorithm is basically the same as that used by DEJAVU (i.e.,
an exhaustive, recursive, depth-first search with early pruning
of the search tree), but the input and the database differ, of
course.
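The idea of an exhaustive, recursive, depth-first search with early pruning can be illustrated with the Python sketch below. It is not SPASM's own code, only a minimal illustration under our assumptions: each pattern residue is reduced to a single point, `candidates[i]` (a hypothetical name) lists the database residues that are allowed to match pattern residue i, and a branch is abandoned as soon as one new inter-residue distance differs from the corresponding pattern distance by more than `max_mismatch`. The final RMSD test after superpositioning is left out.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def search(pattern: Sequence[Point],
           database: Dict[int, Point],
           candidates: Sequence[Sequence[int]],
           max_mismatch: float) -> List[Tuple[int, ...]]:
    """Exhaustive depth-first search for database residues whose mutual
    distances reproduce those of the pattern within max_mismatch."""
    n = len(pattern)
    hits: List[Tuple[int, ...]] = []

    def extend(partial: List[int]) -> None:
        k = len(partial)                  # index of the next pattern residue
        if k == n:                        # all pattern residues matched: record it
            hits.append(tuple(partial))
            return
        for cand in candidates[k]:
            if cand in partial:           # each database residue is used only once
                continue
            # Early pruning: abandon this branch as soon as one new distance
            # differs too much from the corresponding pattern distance.
            if all(abs(math.dist(pattern[k], pattern[j])
                       - math.dist(database[cand], database[partial[j]])) <= max_mismatch
                   for j in range(k)):
                partial.append(cand)
                extend(partial)           # go one level deeper
                partial.pop()             # backtrack

    extend([])
    return hits


if __name__ == "__main__":
    pattern = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
    database = {1: (10.0, 0.0, 0.0), 2: (13.8, 0.0, 0.0),
                3: (13.8, 3.8, 0.0), 4: (30.0, 0.0, 0.0)}
    print(search(pattern, database, [[1, 2, 3, 4]] * 3, max_mismatch=0.5))
    # prints [(1, 2, 3), (3, 2, 1)]
```

Because most partial matches fail after only one or two residues, very few branches of the search tree are ever explored in full, which is what makes this kind of search fast in practice.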
The program is surprisingly fast, and surprisingly little noise
tends to be generated (unless you use very relaxed criteria). SPASM
is interfaced with O: it can produce an O macro to read,
draw and align all hits to your search pattern.
The accompanying program MKSPAZ can be used to generate new
library entries from standard PDB files. This can be used to
include new proteins in the library, or to generate small
libraries which contain only those proteins you want to compare
your structure with.
NOTE: This program is sensitive to the environment variable GKLIB. If set, the name of this directory will be prepended to the default name for the library file needed by this program. For example, in Uppsala, put the following line in your .login or .cshrc file: setenv GKLIB /nfs/public/lib
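With GKLIB set as above, the default database name offered by SPASM becomes /nfs/public/lib/spasm.lib (the default shown in the example session below). The rule amounts to the following Python sketch; the function name is hypothetical, and inserting a "/" between the directory and the file name is our assumption.

```python
import os


def default_library(name: str = "spasm.lib") -> str:
    """Default library file name, with $GKLIB prepended when it is set."""
    gklib = os.environ.get("GKLIB", "").strip()
    if not gklib:
        return name                       # GKLIB not set: plain default name
    return gklib.rstrip("/") + "/" + name


if __name__ == "__main__":
    os.environ["GKLIB"] = "/nfs/public/lib"
    print(default_library())              # prints /nfs/public/lib/spasm.lib
```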
The database (spasm.lib) contains information about the CA atoms and
the centres-of-gravity of the side-chain atoms of > 200,000
residues from ~950 proteins (Hobohm & Sander 95% homology list
of August 1995).
The format of the database is simple, and you can easily add
more proteins to it (with MKSPAZ). An entry for a protein may
look as follows:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
! Created by MAKEDB V. 950421/2.0 at Fri Oct 6 00:30:06 1995 for user gerard
!
PRO 125D
PDB /nfs/pdb/full/125d.pdb
RES 99.99
CMP CD2-GAL4 (65-RESIDUE DNA-BINDING DOMAIN) (YEAST) (NMR, 22 STRUCTURES)
MET 1 2.427 -14.350 -17.374 -0.570 -14.374 -18.601
LYS 2 4.409 -11.142 -17.694 3.077 -8.545 -19.102
...
PRO 42 3.389 -0.744 -14.643 1.531 -0.759 -14.821
LYS 43 3.879 -0.178 -18.355 2.235 0.052 -21.732
END
!
PRO 135L
PDB /nfs/pdb/full/135l.pdb
RES 1.30
CMP LYSOZYME (E.C.3.2.1.17)
LYS 1 25.408 20.195 26.922 27.543 17.709 27.832
VAL 2 23.949 17.686 24.540 23.390 18.223 22.658
...
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Any line starting with an exclamation mark ("!") is ignored as being
a comment line. A protein entry starts with "PRO XXXX", where "XXXX"
is an identifier (e.g., PDB code). The physical location of the
protein's PDB file is stored in the PDB record, the resolution (in A;
99.99 for NMR structures and other files without a resolution remark)
on the RES record, and the name of the protein in the CMP record.
These first four records *must* appear in this order, but comment lines
may be interspersed.
For each residue there is one line which contains the residue
identifier in columns 1-10 (columns 18-27 of a regular PDB record),
followed by the coordinates of the CA atom and the centre-of-gravity of
the side-chain atoms in the order CaX, CaY, CaZ, SX, SY, SZ (can be
read in free format). The "END" record signals the end of the
residue list.
In principle, you can add as many proteins to the database as you
wish, since the file is rewound and then read and processed one protein
at a time. This means that the database residues are never all stored in
memory at once. There
is a hard-wired limitation on the maximum number of residues in any
given individual database protein, though (at present, 1200 residues).
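A reader for this format might look like the generator below. It is a sketch based only on the description above, and the names `read_library`, `Protein` and `Residue` are hypothetical. Because it yields one protein at a time, memory use does not grow with the size of the library. It splits residue lines on whitespace and takes the last six numbers as coordinates instead of relying on exact columns. Note that a residue line may itself start with "PRO" (proline), so the reader tracks whether it is inside a residue list rather than looking at the first word alone.

```python
from typing import Iterable, Iterator, List, NamedTuple, Tuple

Vec = Tuple[float, float, float]


class Residue(NamedTuple):
    ident: str        # residue identifier, e.g. "MET A 1"
    ca: Vec           # CA atom
    sc: Vec           # centre of gravity of the side-chain atoms


class Protein(NamedTuple):
    code: str
    pdb_file: str
    resolution: float
    compound: str
    residues: List[Residue]


def read_library(lines: Iterable[str]) -> Iterator[Protein]:
    """Yield the proteins of a SPASM-style library one at a time."""
    header: dict = {}
    residues: List[Residue] = []
    in_residues = False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("!"):
            continue                                  # comment line
        words = line.split()
        if in_residues:
            if words[0] == "END":                     # end of this protein
                yield Protein(header["PRO"], header["PDB"], float(header["RES"]),
                              header.get("CMP", ""), residues)
                header, residues, in_residues = {}, [], False
            else:                                     # identifier + 6 coordinates
                xyz = [float(w) for w in words[-6:]]
                residues.append(Residue(" ".join(words[:-6]),
                                        (xyz[0], xyz[1], xyz[2]),
                                        (xyz[3], xyz[4], xyz[5])))
        else:
            key, value = words[0], line[3:].strip()
            header[key] = value
            if key == "CMP":                          # PRO, PDB, RES, CMP come first
                in_residues = True
```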
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
...
Name of new SPASM library to create ? (spasm.new) spasm.custom
Name of next PDB file to add ? ( ) /nfs/pdb/full/1aaz.pdb
4-Character ID ? (/nfs) 1aaz
Processing : (/nfs/pdb/full/1aaz.pdb)
Nr of atoms : ( 862)
Nr of residues found : ( 87)
Nr of residues written : ( 87)
Name of next PDB file to add ? ( ) /nfs/pdb/full/3cbh.pdb
4-Character ID ? (/nfs) 3cbh
Processing : (/nfs/pdb/full/3cbh.pdb)
Nr of atoms : ( 365)
Nr of residues found : ( 365)
Nr of residues written : ( 0)
Name of next PDB file to add ? ( ) chra.pdb
4-Character ID ? (chra)
Processing : (chra.pdb)
Nr of atoms : ( 2794)
Resolution (A; 99.99 for NMR) ? ( 99.990) 3.0
Nr of residues found : ( 370)
Nr of residues written : ( 370)
Name of next PDB file to add ? ( )
Nr of residues total : ( 457)
Nr of proteins used : ( 2)
...
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
You can add as many PDB files in one run of the program as you like.
Note that 3CBH contains only CA coordinates and therefore cannot
be used.
The new database may look as follows:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
! Created by MKSPAZ V. 950421/2.0 at Fri Apr 21 23:04:43 1995 for user gerard
!
PRO 1AAZ
PDB /nfs/pdb/full/1aaz.pdb
RES 2.00
CMP GLUTAREDOXIN
MET A 1 19.791 29.971 -7.982 20.633 31.166 -7.396
PHE A 2 21.192 26.459 -8.288 19.118 24.939 -9.912
...
LYS A 87 11.780 15.553 -16.595 13.488 14.683 -19.464
END
!
PRO 3CBH
PDB /nfs/pdb/full/3cbh.pdb
RES 2.00
CMP CELLOBIOHYDROLASE /II$ CORE PROTEIN (E.C.3.2.1.91) (/CBHII$)
END
! NO residues
!
PRO CHRA
PDB chra.pdb
RES 3.00
CMP
MET A 1 12.446 27.113 66.765 12.791 23.902 65.990
LYS A 2 10.581 29.825 64.962 7.858 30.460 62.815
ILE A 3 9.320 32.550 67.058 10.543 34.217 67.524
...
VAL A 369 6.275 51.912 83.789 5.932 53.474 82.387
SER A 370 9.658 52.747 85.454 9.533 54.706 84.938
END
!
! total residues 457
! total proteins 2
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
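MKSPAZ writes these entries for you. If you ever need to produce entries with your own tools, the layout can be mimicked roughly as in the sketch below. The helper names are hypothetical, and the 8.3f coordinate fields are our assumption (the coordinates may be read in free format); the residue identifier, however, is padded to the ten columns described earlier, laid out like columns 18-27 of a PDB ATOM record.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def residue_line(resname: str, chain: str, resseq: int, icode: str,
                 ca: Vec, sc: Vec) -> str:
    # Identifier as in PDB columns 18-27: residue type (3), blank,
    # chain (1), residue number (4), insertion code (1).
    ident = f"{resname:<3} {chain:1}{resseq:>4}{icode:1}"
    # CA coordinates followed by the side-chain centre of gravity
    return ident + "".join(f"{v:8.3f}" for v in (*ca, *sc))


def format_entry(code: str, pdb_file: str, resolution: float, compound: str,
                 residue_lines: Sequence[str]) -> str:
    # PRO, PDB, RES and CMP must come first, in this order
    lines: List[str] = [f"PRO {code}", f"PDB {pdb_file}",
                        f"RES {resolution:.2f}", f"CMP {compound}".rstrip()]
    lines += residue_lines
    lines.append("END")
    if not residue_lines:
        lines.append("! NO residues")   # e.g. a CA-only file such as 3CBH
    lines.append("!")
    return "\n".join(lines)
```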
Defining a search pattern is very simple: just make a PDB file
and remove everything *except* all atoms of the residues of interest.
For example, if your catalytic residues are Asp 123, Glu 219 and
Asp 382, simply make a PDB file which only contains the atoms of each
of these three residues. SPASM *implicitly* assumes that you present
the residues in the order in which they appear in the sequence ! This
is sometimes important, but the program does *not* check if this is
actually the case !
As an example, the following file contains the residues of the
catalytic triad of Candida antarctica lipase B (PDB code 1TCA):
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
ATOM    768  N   SER   105      -8.424  22.313  13.475  1.00  3.99      1TCA 906
ATOM    769  CA  SER   105      -8.136  21.862  14.832  1.00  5.02      1TCA 907
ATOM    770  C   SER   105      -9.268  20.966  15.394  1.00  3.52      1TCA 908
ATOM    771  O   SER   105      -9.669  20.008  14.737  1.00  3.42      1TCA 909
ATOM    772  CB  SER   105      -7.904  23.111  15.702  1.00  8.11      1TCA 910
ATOM    773  OG  SER   105      -7.320  22.766  16.938  1.00 13.88      1TCA 911
ATOM   1369  N   ASP   187       3.721  21.285  13.689  1.00  8.01      1TCA1507
ATOM   1370  CA  ASP   187       2.590  21.835  14.434  1.00  6.80      1TCA1508
ATOM   1371  C   ASP   187       3.008  21.995  15.906  1.00  6.78      1TCA1509
ATOM   1372  O   ASP   187       3.491  21.052  16.516  1.00  7.57      1TCA1510
ATOM   1373  CB  ASP   187       1.399  20.880  14.322  1.00  5.92      1TCA1511
ATOM   1374  CG  ASP   187       0.083  21.509  14.737  1.00  7.68      1TCA1512
ATOM   1375  OD1 ASP   187       0.020  22.124  15.816  1.00  6.59      1TCA1513
ATOM   1376  OD2 ASP   187      -0.895  21.386  13.979  1.00  7.36      1TCA1514
ATOM   1649  N   HIS   224       0.477  25.559  13.397  1.00  6.46      1TCA1787
ATOM   1650  CA  HIS   224      -0.921  25.162  13.569  1.00  6.60      1TCA1788
ATOM   1651  C   HIS   224      -1.880  26.075  12.788  1.00  6.98      1TCA1789
ATOM   1652  O   HIS   224      -2.807  25.591  12.123  1.00  7.10      1TCA1790
ATOM   1653  CB  HIS   224      -1.273  25.180  15.058  1.00  6.87      1TCA1791
ATOM   1654  CG  HIS   224      -2.570  24.513  15.386  1.00  7.40      1TCA1792
ATOM   1655  ND1 HIS   224      -2.851  23.212  15.037  1.00  7.13      1TCA1793
ATOM   1656  CD2 HIS   224      -3.666  24.948  16.056  1.00  7.87      1TCA1794
ATOM   1657  CE1 HIS   224      -4.041  22.847  15.458  1.00  8.67      1TCA1795
ATOM   1658  NE2 HIS   224      -4.541  23.888  16.073  1.00  8.28      1TCA1796
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
The maximum number of atoms and residues in the search pattern is printed upon start-up. Any lines which do not start with "ATOM" are skipped, and hydrogen atoms are ignored. Amino-acid residues are recognised by the fact that they have more than three main-chain atoms (N, CA, C, O, OTX, OT1, OT2). This means that modified amino acids are accepted as well; for example, the database contains a handful of pyroglutamate residues (type PCA). If you use a residue type "XXX", the residue will be matched against *any* residue type in the database.
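How a pattern file might be reduced to one CA position and one side-chain centre of gravity per residue is sketched below in Python. The rules follow the paragraph above; the hydrogen test (an atom name that starts with "H", possibly after a digit) is our own heuristic, and using the CA position for residues without side-chain atoms mirrors the glycines in the CRA5 listing later in this manual. Applied to the Ser 105 atoms of the 1TCA pattern, this reproduces, to within rounding, the side-chain centre of gravity (-7.612, 22.938, 16.320) that SPASM prints. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]

# Main-chain atom names as listed in this manual
MAIN_CHAIN = {"N", "CA", "C", "O", "OTX", "OT1", "OT2"}


def centroid(points: List[Vec]) -> Vec:
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def read_pattern(pdb_lines) -> List[Tuple[str, Vec, Vec]]:
    """Return (identifier, CA, side-chain centre of gravity) per residue."""
    residues: Dict[str, Dict[str, Vec]] = {}      # keeps file order
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue                              # only ATOM records are used
        name = line[12:16].strip()
        if name.lstrip("0123456789").startswith("H"):
            continue                              # skip hydrogens (heuristic)
        ident = line[17:27]                       # PDB columns 18-27
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues.setdefault(ident, {})[name] = xyz

    result = []
    for ident, atoms in residues.items():
        if sum(a in MAIN_CHAIN for a in atoms) <= 3 or "CA" not in atoms:
            continue                              # not recognised as an amino acid
        side = [xyz for a, xyz in atoms.items() if a not in MAIN_CHAIN]
        sc = centroid(side) if side else atoms["CA"]   # no side chain: use CA
        result.append((" ".join(ident.split()), atoms["CA"], sc))
    return result
```

A CA-only residue has just one main-chain atom and is therefore skipped, which is also why no residues could be written for 3CBH in the MKSPAZ example.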
SPASM is easy to use. One thing you need to know in advance is the
location of the database file on your local computer system. In
Uppsala, this is: /nfs/public/lib/spasm.lib
Once you start the program, just answer the questions (most of the
defaults make sense), and let SPASM do the hard work. To explain the
input etc., we shall work through an example using the pattern file
shown above (Ser-Asp-His):
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
*** SPASM *** SPASM *** SPASM *** SPASM *** SPASM *** SPASM *** SPASM ***
Version - 950421/2.0
(C) 1993-5 Gerard J. Kleywegt, Dept. Mol. Biology, BMC, Uppsala (S)
User I/O - routines courtesy of Rolf Boelens, Univ. of Utrecht (NL)
Others - T.A. Jones, G. Bricogne, Rams, W.A. Hendrickson
Others - W. Kabsch, CCP4, PROTEIN, E. Dodson, etc. etc.
Started - Fri Apr 21 23:07:22 1995
User - gerard
Mode - interactive
Host - jupiter
ProcID - 19870
Tty - /dev/ttyq8
*** SPASM *** SPASM *** SPASM *** SPASM *** SPASM *** SPASM *** SPASM ***
Max nr of atoms in pattern file : ( 500)
Max nr of residues in ,, ,, : ( 50)
Ditto, in database proteins : ( 1024)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
SPASM database file ? (/nfs/public/lib/spasm.lib) ../spasm.lib
CPU total/user/sys : 0.0 0.0 0.0
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Provide the name of the database file on your local computer system.
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Which PDB file ? (0xyz.pdb) 1tca.pdb
Nr of atoms : ( 24)
1TCA SER 105 -8.136 21.862 14.832 -7.612 22.938 16.320
1TCA ASP 187 2.590 21.835 14.434 0.152 21.475 14.714
1TCA HIS 224 -0.921 25.162 13.569 -3.157 24.098 15.511
Nr of residues found : ( 3)
Nr of residues okay : ( 3)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Provide the name of the PDB file that contains your search pattern.
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Four-character ID for this run ? (1TCA)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Provide a 4-character ID for this run (used in the O macro, for instance).
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Enter the max RMSD for "good" hits. If you use only a few residues (3-5), an RMSD < 1 A tends to be obtained for similar arrangements of residues.
Max superpositioning RMSD ? ( 1.500)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
The maximum allowable value of the RMSD of database residues with your search pattern. It is always good to start with a low value (e.g., 1 A) to see if there are any *very* similar patterns in the database. If not, you can relax the value to 1.5 or 2 A and repeat the search.
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
To speed up the search, any match in which at one of the residue-residue distances differs by more than a certain value are not pursued further. Reasonable values are 1 - 2 A.
Max distance mismatch ? ( 2.000)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
This number should be a little bit larger than the maximum RMSD; it can be relaxed further if no hits are found with more restrictive values.
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
You may opt to use only structures solved at high resolution by supplying a resolution cut-off. Note: NMR structures have a resolution of 99.99 A, so use a cut-off > 100 if you want to include these.
Resolution cut-off (A) ? ( 999.900) 2.5
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Supply a resolution cut-off (or a number >= 100 if you don't want to use such a cut-off).
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
You may opt to allow substitutions of certain residue types. You have the following options:
(1) Do not allow substitutions
(2) Only allow D/E, N/Q, L/I, F/Y and R/K
(3) Use BLOSUM-45 to decide
(4) User-defined substitutions
Substitution option ? ( 4)
Enter allowed substitutions in 3-letter code:
Which types to allow for ARG ? (ARG)
Which types to allow for GLN ? (GLN) gln asp glu
Which types to allow for HIS ? (HIS)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
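You can choose between four substitution options: no substitutions, a small hard-wired set of conservative pairs, substitutions judged acceptable by the BLOSUM-45 matrix, or your own list of allowed types for each residue type in your pattern. In the example above, GLN may be matched by GLN, ASP or GLU, whereas ARG and HIS may only be matched by themselves.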
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
If you want to conserve the order in which your residues occur in the sequence, use this constraint.
Conserve sequence directionality ? (N) y
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
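This constraint, when used, speeds up the search considerably, but you may not always want to use it: similar motifs in different proteins need not occur in the same sequence order (for example, the residues of the Ser-His-Asp catalytic triad occur in a different sequence order in chymotrypsin than in subtilisin).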
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
If you want to conserve neighbouring residues, use this constraint.
Conserve neighbouring residues ? (N) y
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
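This constraint is useful when your pattern consists of several stretches of consecutive residues (e.g., parts of different loops, helices and/or strands) separated by gaps: residues that are sequence neighbours in your pattern must then also be neighbours in the hit, but the gaps between the stretches may differ.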
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
If you want to conserve the sizes of the sequence gaps between the residues in your search pattern, use this constraint.
Conserve sequence gaps ? (N)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
This constraint or the previous one *must* be switched on when you do main-chain searches involving sequential residues (loops, turns, etc.). Also, if you have a pattern like "GxxTxN" you may want to conserve the gaps.
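One plausible reading of the three sequence constraints, written as checks on a candidate match, is sketched below in Python. This is our interpretation, not code taken from SPASM; it assumes plain integer residue numbers within a single chain and ignores insertion codes. Each list holds the residue numbers of the pattern or of the hit, in pattern order.

```python
from typing import List, Sequence, Tuple


def _pairs(numbers: Sequence[int]) -> List[Tuple[int, int]]:
    """Consecutive pairs (n0, n1), (n1, n2), ..."""
    return list(zip(numbers, numbers[1:]))


def directionality_kept(hit: Sequence[int]) -> bool:
    """Hit residues occur in the same sequence order as the pattern residues."""
    return all(a < b for a, b in _pairs(hit))


def neighbours_kept(pattern: Sequence[int], hit: Sequence[int]) -> bool:
    """Pattern residues that are sequence neighbours must match neighbours."""
    return all(h2 - h1 == 1
               for (p1, p2), (h1, h2) in zip(_pairs(pattern), _pairs(hit))
               if p2 - p1 == 1)


def gaps_kept(pattern: Sequence[int], hit: Sequence[int]) -> bool:
    """Every sequence gap in the pattern is reproduced exactly in the hit."""
    return all(p2 - p1 == h2 - h1
               for (p1, p2), (h1, h2) in zip(_pairs(pattern), _pairs(hit)))
```

For a pattern made of residues 10-12 and 20-21, a hit at 50-52 and 70-71 satisfies `neighbours_kept` but not `gaps_kept`, which is the difference between the two options as read here.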
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
You may want to see the MC/MC and/or SC/SC distance matrices of your search pattern and that of any hits found in the database, to help decide if the hit is good enough for your purposes. Matrices are *only* printed if you search pattern contains 10 or fewer residues.
Print distance matrices ? (N) y
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
The distance matrices enable you to see how good the fit is, and if there are residues which "fan" more than the others.
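The matrices are the pairwise distances between the CA atoms (MC) or between the side-chain centres of gravity (SC). The Python sketch below (function names hypothetical) recomputes them from the three residues that SPASM listed for the 1TCA pattern; rounded to one decimal, they agree with the target matrices in the hit report further on. The largest difference between corresponding target and hit distances is, as we read the program's prompt, the quantity that the maximum distance mismatch limits.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def distance_matrix(points: Sequence[Vec]) -> List[List[float]]:
    """All pairwise distances (A) between the given points."""
    return [[math.dist(p, q) for q in points] for p in points]


def largest_mismatch(a: Sequence[Sequence[float]],
                     b: Sequence[Sequence[float]]) -> float:
    """Largest absolute difference between corresponding off-diagonal distances."""
    n = len(a)
    return max((abs(a[i][j] - b[i][j]) for i in range(n) for j in range(i + 1, n)),
               default=0.0)


def print_matrix(title: str, labels: Sequence[str],
                 matrix: Sequence[Sequence[float]]) -> None:
    print(title)
    for label, row in zip(labels, matrix):
        print(f"{label:<8}" + "".join(f"{d:6.1f}" for d in row))


if __name__ == "__main__":
    labels = ["SER 105", "ASP 187", "HIS 224"]
    ca = [(-8.136, 21.862, 14.832), (2.590, 21.835, 14.434), (-0.921, 25.162, 13.569)]
    sc = [(-7.612, 22.938, 16.320), (0.152, 21.475, 14.714), (-3.157, 24.098, 15.511)]
    print_matrix("Target MC distance matrix", labels, distance_matrix(ca))
    print_matrix("Target SC distance matrix", labels, distance_matrix(sc))
```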
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
If you are not an O user, you may want the best superpositioning operator to be printed.
Print operators ? (N)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
If you create O files, you don't need to see the operators.
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
For debugging purposes, you may request extensive output, listing *all* database proteins which are tried.
Extensive output ? (N)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Not normally used.
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
You may opt to use the centres-of-gravity of the side-chain and/or main-chain atoms. If you have few residues in your search pattern (e.g., 3), it is best to use both. If you only use main-chains, the residue *types* are ignored.
0=SC, 1=MC+SC, 2=MC ? ( 1) 1
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
If you look for 3-5 active site residues etc., use MC+SC to get
a better alignment. If you look for loops etc., use MC only.
If you look for only two residues, MC and SC atoms are always
used. Note that you will get a lot of noise (false hits) if
you look for only two residues, so use low values for the maximum
RMSD and maximum distance difference (e.g., 0.5 and 1.0 A) !
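In terms of pseudo-atoms, the three choices amount to the following Python sketch. The function name and the order of the pseudo-atoms are our assumptions; MC is represented by the CA atom, as in the database. Each residue contributes one or two pseudo-atoms, which is why the Ser-Asp-His search with MC+SC reports a match "for 6 pseudo-atoms", and the nine-residue main-chain search later in this manual reports 9.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def pseudo_atoms(residues: Sequence[Tuple[Vec, Vec]], mode: int) -> List[Vec]:
    """residues holds (CA, SC) per residue; mode 0 = SC, 1 = MC+SC, 2 = MC."""
    if len(residues) == 2:
        mode = 1                  # with only two residues, MC and SC are always used
    if mode == 0:
        return [sc for ca, sc in residues]
    if mode == 1:
        return [p for ca, sc in residues for p in (ca, sc)]
    if mode == 2:
        return [ca for ca, sc in residues]
    raise ValueError("mode must be 0, 1 or 2")
```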
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
You may want an O macro plus LSQ operator file for easy inspection of the hits. Use this only once you have found a proper set of search parameters.
O macro and operator file ? (N) y
O macro file ? (1tca.omac)
O operator file ? (1tca.odb)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
If you have found a promising set of hits, you may want to look at
them on the display. In that case, use the option to generate an
O macro to do all the hard work for you. This will also generate
an O datablock file which contains all the relevant LSQ operators.
The jiffy program DEJANA (part of the DEJAVU package) can be used
to sort the hits in the O macro produced by SPASM !
From version 2.2 onwards, you may opt to get an input file for the least-squares superpositioning program LSQMAN. This input file will read the coordinates, apply the operator found by SPASM, and attempt to extend the superpositioning between your model and each of the hits. This may enable you to detect more global (or: less local) similarities to other structures. In order for this to work, LSQMAN will need the PDB file which contains your complete model (i.e., not just the motif you ran through SPASM).
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
You may want LSQMAN input file to see if the superpositioning of the putative hits extends beyond the motif you have defined.
LSQMAN input file ? (Y)
LSQMAN input file ? (cra2.lsqman)
PDB file of your entire model ? (cra2.pdb) /nfs/pdb/full/1cbs.pdb
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Now SPASM takes off. It reads the database proteins one at a time. It then checks to which residues each of the residues in your pattern can be matched. Normally, an "ASP" only matches an "ASP", but there are the following exceptions:
- if you use only main-chain atoms, the sequence is completely ignored, thereby enabling you to find loops etc. (this is not unlike the Lego_loop command in O);
- if you name a residue in your search pattern "XXX" (i.e., instead of ALA, ASN, etc.), it can be matched to *any* residue type;
- if you enabled the substitution option, some residue types can be matched with types other than their own (e.g., Asp-Glu).
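The matching rules above, together with substitution options 1, 2 and 4, can be summarised in a small predicate. This is a hypothetical Python sketch: BLOSUM-45 scoring (option 3) is not reproduced, and we assume that a residue type may always be matched by itself.

```python
from typing import Dict, Optional, Set

# Substitution option 2: the hard-wired pairs offered by the program
HARD_WIRED = {frozenset(p) for p in [("ASP", "GLU"), ("ASN", "GLN"), ("LEU", "ILE"),
                                      ("PHE", "TYR"), ("ARG", "LYS")]}


def may_match(pattern_type: str, db_type: str, *, main_chain_only: bool = False,
              option: int = 1, allowed: Optional[Dict[str, Set[str]]] = None) -> bool:
    """Can a pattern residue of pattern_type be matched to a db_type residue?"""
    if main_chain_only or pattern_type == "XXX":
        return True                       # types ignored, or wildcard residue
    if pattern_type == db_type:
        return True                       # identical types always match (assumed)
    if option == 1:
        return False                      # no substitutions
    if option == 2:
        return frozenset((pattern_type, db_type)) in HARD_WIRED
    if option == 4:                       # user lists, e.g. GLN -> GLN ASP GLU
        return db_type in (allowed or {}).get(pattern_type, set())
    raise NotImplementedError("option 3 (BLOSUM-45) is not sketched here")
```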
When a "hit" is found, SPASM informs you:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
... Searching ...
==> HIT : (1CUS)
Compound : (CUTINASE (E.C.3.1.1.-))
File : (/nfs/pdb/full/1cus.pdb)
Residues : ( 197)
Resol (A): ( 1.250)
MATCH with RMSD 0.48 A for 6 pseudo-atoms
SER 105 <---> SER 120 *
ASP 187 <---> ASP 175 *
HIS 224 <---> HIS 188 *
Target SC distance matrix
SER 105   0.0   8.1   4.7
ASP 187   8.1   0.0   4.3
HIS 224   4.7   4.3   0.0
Hit SC distance matrix
SER 120   0.0   8.6   5.0
ASP 175   8.6   0.0   4.7
HIS 188   5.0   4.7   0.0
Target MC distance matrix
SER 105   0.0  10.7   8.0
ASP 187  10.7   0.0   4.9
HIS 224   8.0   4.9   0.0
Hit MC distance matrix
SER 120   0.0  10.5   8.2
ASP 175  10.5   0.0   5.0
HIS 188   8.2   5.0   0.0
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
It shows the ID, file and compound name of the hit and the number
of residues in it. Then it lists how many possible matches there
exist for each of the individual residues in your search pattern.
Subsequently, it prints the successful match, including the RMSD.
An asterisk ("*") after a matched residue means that the residue
type is conserved. If you requested this, the MC/MC and/or SC/SC
distance matrices in your search pattern and in the hit are shown.
Finally, if you asked for it, the operator which superimposes the
database protein on your search pattern is shown (if you don't
generate an O macro, you can still use this to quickly superimpose
the database protein and your own).
In this example, we find two more hits, both of which make sense (actually, with a bit more relaxed criteria, we find 6 reasonable hits):
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
==> HIT : (1HPL)
Compound : (LIPASE (E.C.3.1.1.3) (TRIACYLGLYCEROL HYDROLASE))
File : (/nfs/pdb/full/1hpl.pdb)
Residues : ( 449)
Resol (A): ( 2.300)
MATCH with RMSD 1.10 A for 6 pseudo-atoms
SER 105 <---> SER A 152 *
ASP 187 <---> ASP A 205 *
HIS 224 <---> HIS A 263 *
...
==> HIT : (1TCA)
Compound : (LIPASE (E.C.3.1.1.3) (TRIACYLGLYCEROL HYDROLASE))
File : (/nfs/pdb/full/1tca.pdb)
Residues : ( 317)
Resol (A): ( 1.550)
MATCH with RMSD 0.00 A for 6 pseudo-atoms
SER 105 <---> SER 105 *
ASP 187 <---> ASP 187 *
HIS 224 <---> HIS 224 *
Nr of proteins found : ( 3)
Nr of proteins tried : ( 472)
Total number of hits : ( 3)
CPU total/user/sys : 23.6 23.1 0.5
Run again ? (Y) n
*** SPASM *** SPASM *** SPASM *** SPASM *** SPASM *** SPASM *** SPASM ***
Version - 950421/2.0
Started - Fri Apr 21 23:07:22 1995
Stopped - Fri Apr 21 23:12:55 1995
CPU-time taken :
User - 23.1
Sys - 0.6
Total - 23.7
*** SPASM *** SPASM *** SPASM *** SPASM *** SPASM *** SPASM *** SPASM ***
>>> This program (C) 1993-95, GJ Kleywegt & TA Jones <<<
E-mail: "gerard@xray.bmc.uu.se" or "alwyn@xray.bmc.uu.se"
*** SPASM *** SPASM *** SPASM *** SPASM *** SPASM *** SPASM *** SPASM ***
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
If you want to run the program again, simply reply "Y(es)", and off you go again. By the way: note that the search took less than half a minute (on an SGI XZ; using an older database) !
As an example of the use of SPASM in locating loops, turns, etc. which
are similar to those in your protein, we use residues Lys 98 - Lys 106
of holo-CRABP II (PDB code 1CBS).
The input could be as follows:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Which PDB file ? (0xyz.pdb) cra5.pdb
Nr of atoms : ( 67)
CRA5 LYS 98 3.696 23.952 23.582 0.544 25.481 24.450
CRA5 LEU 99 4.357 26.721 21.041 5.674 26.130 18.846
CRA5 LEU 100 2.879 30.070 22.051 4.009 31.225 24.034
CRA5 LYS 101 2.282 30.961 18.413 3.437 33.815 19.258
CRA5 GLY 102 2.423 28.993 15.184 2.423 28.993 15.184
CRA5 GLU 103 2.988 25.348 14.377 0.150 25.311 13.383
CRA5 GLY 104 5.967 23.024 14.231 5.967 23.024 14.231
CRA5 PRO 105 7.338 19.640 15.418 8.813 20.317 14.469
CRA5 LYS 106 6.255 18.453 18.868 4.927 17.885 21.479
Nr of residues found : ( 9)
Nr of residues okay : ( 9)
Four-character ID for this run ? (CRA5)
Enter the max RMSD for "good" hits. If you use only a few residues (3-5), an RMSD < 1 A tends to be obtained for similar arrangements of residues. Max superpositioning RMSD ? ( 1.500) 1.0
To speed up the search, any match in which at one of the residue-residue distances differs by more than a certain value are not pursued further. Reasonable values are 1 - 2 A. Max distance mismatch ? ( 2.000)
You may opt to use only structures solved at high resolution by supplying a resolution cut-off. Note: NMR structures have a resolution of 99.99 A, so use a cut-off > 100 if you want to include these. Resolution cut-off (A) ? ( 999.900)
You may opt to allow substitutions of certain residue types. At present, the following are hard-wired: ASP/GLU, ASN/GLN, LEU/ILE, PHE/TYR and LYS/ARG. Allow for these substitutions ? (N)
If you want to conserve the order in which your residues occur in the sequence, use this constraint. Conserve sequence directionality ? (N) y
If you want to conserve neighbouring residues, use this constraint. Conserve neighbouring residues ? (N) y
If you want to conserve the sizes of the sequence gaps between the residues in your search pattern, use this constraint. Conserve sequence gaps ? (N) y
You may want to see the MC/MC and/or SC/SC distance matrices of your search pattern and that of any hits found in the database, to help decide if the hit is good enough for your purposes. Matrices are *only* printed if you search pattern contains 10 or fewer residues. Print distance matrices ? (N) y
If you are not an O user, you may want the best superpositioning operator to be printed. Print operators ? (N)
For debugging purposes, you may request extensive output, listing *all* database proteins which are tried. Extensive output ? (N)
You may opt to use the centres-of-gravity of the side-chain and/or main-chain atoms. If you have few residues in your search pattern (e.g., 3), it is best to use both. If you only use main-chains, the residue *types* are ignored. 0=SC, 1=MC+SC, 2=MC ? ( 1) 2
You may want an O macro plus LSQ operator file for easy inspection of the hits. Use this only once you have found a proper set of search parameters. O macro and operator file ? (N) y
O macro file ? (cra5.omac)
O operator file ? (cra5.odb)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
The only hit (with the parameters used above):
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
... Searching ...
==> HIT : (1HNA)
Compound : (GLUTATHIONE S-TRANSFERASE (HUMAN, CLASS MU) (GSTM2-2) FORM A (E.C.2.5.1.18)
File : (/nfs/pdb/full/1hna.pdb)
Residues : ( 217)
Resol (A): ( 1.850)
MATCH with RMSD 0.92 A for 9 pseudo-atoms
LYS 98 <---> PHE 147
LEU 99 <---> LEU 148 *
LEU 100 <---> GLY 149
LYS 101 <---> ASP 150
GLY 102 <---> LYS 151
GLU 103 <---> ILE 152
GLY 104 <---> THR 153
PRO 105 <---> PHE 154
LYS 106 <---> VAL 155
Target MC distance matrix
LYS  98   0.0   3.8   6.4   8.8   9.9   9.3   9.7   9.9   7.7
LEU  99   3.8   0.0   3.8   5.4   6.6   6.9   7.9   9.5   8.8
LEU 100   6.4   3.8   0.0   3.8   7.0   9.0  11.0  13.1  12.5
LYS 101   8.8   5.4   3.8   0.0   3.8   6.9   9.7  12.8  13.1
GLY 102   9.9   6.6   7.0   3.8   0.0   3.8   7.0  10.6  11.8
GLU 103   9.3   6.9   9.0   6.9   3.8   0.0   3.8   7.3   8.9
GLY 104   9.7   7.9  11.0   9.7   7.0   3.8   0.0   3.8   6.5
PRO 105   9.9   9.5  13.1  12.8  10.6   7.3   3.8   0.0   3.8
LYS 106   7.7   8.8  12.5  13.1  11.8   8.9   6.5   3.8   0.0
Hit MC distance matrix
PHE 147   0.0   3.8   5.6   8.3   8.4   7.7   8.0   9.6   7.2
LEU 148   3.8   0.0   3.8   6.9   6.9   7.4   7.3   9.7   8.1
GLY 149   5.6   3.8   0.0   3.8   5.7   7.7   9.2  12.3  11.3
ASP 150   8.3   6.9   3.8   0.0   3.8   7.0   9.7  13.2  13.1
LYS 151   8.4   6.9   5.7   3.8   0.0   3.8   6.4  10.2  10.9
ILE 152   7.7   7.4   7.7   7.0   3.8   0.0   3.8   7.0   8.1
THR 153   8.0   7.3   9.2   9.7   6.4   3.8   0.0   3.8   5.6
PHE 154   9.6   9.7  12.3  13.2  10.2   7.0   3.8   0.0   3.8
VAL 155   7.2   8.1  11.3  13.1  10.9   8.1   5.6   3.8   0.0
Nr of proteins found : ( 1)
Nr of proteins tried : ( 580)
Total number of hits : ( 1)
CPU total/user/sys : 124.0 123.6 0.4
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
The O macro generated by the second example above looks as follows:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
! Created by SPASM V. 950421/2.0 at Fri Apr 14 23:23:15 1995 for user gerard
! Search pattern from cra5.pdb
! LYS 98
! LEU 99
! LEU 100
! LYS 101
! GLY 102
! GLU 103
! GLY 104
! PRO 105
! LYS 106
read cra5.odb
sam_atom_in cra5.pdb CRA5 PDB
mol CRA5
pa_case atom_z 4 6 7 8 16 green cyan magenta yellow
object CRA5 ca ; end
centre_xyz 4.12 25.52 18.25
!
sketch_setup stick smooth 0.1 8
sketch_setup sphere smooth 0
db_create .cpk_radii 110 r
db_set_dat .cpk_radii ; 0.2
sketch_stick CRA5
sketch_cpk CRA5
!
! HIT 1HNA
! GLUTATHIONE S-TRANSFERASE (HUMAN, CLASS MU) (GSTM2-2) FORM A (E.C.2.5.1.18
sam_atom_in /nfs/pdb/full/1hna.pdb 1HNA PDB
! Hit nr 1
! RMSD 0.92
! LYS 98 <---> PHE 147
! LEU 99 <---> LEU 148 *
! LEU 100 <---> GLY 149
! LYS 101 <---> ASP 150
! GLY 102 <---> LYS 151
! GLU 103 <---> ILE 152
! GLY 104 <---> THR 153
! PRO 105 <---> PHE 154
! LYS 106 <---> VAL 155
mol 1HNA
delete 1HNA1 ;
object 1HNA1 ca ; end
lsq_obj 1HNA1_to_CRA5 1HNA1
on_off
bell message Done
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
The file with the LSQ operators looks as follows:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
! Created by SPASM V. 950421/2.0 at Fri Apr 14 23:23:15 1995 for user gerard
.lsq_rt_1HNA1_to_CRA5 r 12 (6f12.6)
    0.894604   -0.413836   -0.168596   -0.296522   -0.832013    0.468859
   -0.334304   -0.369451   -0.867033   25.347286   31.042042   30.984348
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
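If you are not an O user, you can still apply such an operator with your own tools. The datablock holds 12 numbers in the Fortran format (6f12.6): nine rotation-matrix elements followed by a translation vector, applied as x' = R x + t to move the hit onto your pattern. The Python sketch below reads the block with fixed 12-character fields and applies it. The function names are hypothetical, and the order of the nine rotation elements (here assumed to be column by column, Fortran style) is an assumption you should check before relying on it.

```python
from typing import List, Tuple

Vec = Tuple[float, float, float]


def read_operator(text: str) -> Tuple[str, List[float]]:
    """Read one O datablock such as '.lsq_rt_1HNA1_to_CRA5 r 12 (6f12.6)'."""
    lines = [l for l in text.splitlines() if l.strip() and not l.startswith("!")]
    name, _type, count, _fmt = lines[0].split()
    values: List[float] = []
    for line in lines[1:]:
        line = line.rstrip()
        # (6f12.6): up to six fields of 12 characters per line
        values += [float(line[i:i + 12]) for i in range(0, len(line), 12)]
    if len(values) != int(count):
        raise ValueError(f"expected {count} values, found {len(values)}")
    return name, values


def apply_operator(values: List[float], xyz: Vec) -> Vec:
    """x' = R x + t.  ASSUMPTION: R is stored column by column in
    values[0:9]; the translation t is values[9:12]."""
    r = [[values[row + 3 * col] for col in range(3)] for row in range(3)]
    t = values[9:12]
    return (sum(r[0][k] * xyz[k] for k in range(3)) + t[0],
            sum(r[1][k] * xyz[k] for k in range(3)) + t[1],
            sum(r[2][k] * xyz[k] for k in range(3)) + t[2])
```

Reading fixed-width fields rather than splitting on blanks matters because Fortran F-format numbers may touch when a value fills its whole field.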
Now start O, type "@cra5.omac", go and have a cup of coffee if you have many hits, and admire the result ...
NOTE: the macro will *only* work if the PDB file names in the SPASM database file actually point to the corresponding PDB files on your local file system. To this end, your local system manager should do something like this:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
unix> sed -e 's%/nfs/pdb/full/%/your/pdb/directory/%' spasm.lib > local.lib
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
By running LSQMAN with the input file produced by SPASM (optional; only available in version 2.2 and later), you may detect similarities between your model and any of the hits. LSQMAN will read your complete model, and then for each of the hits apply the operator found by SPASM, and subsequently try to find more residues which are superimposed through that operator (or an "improved" version). In addition, LSQMAN will produce a new O macro with the (improved) operators for the hits.
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
unix> run lsqman < cra6.lsqman >& cra6_lsqman.out
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
After this, you can of course use the DEJAVU companion program DEJANA to sort the hits and remove those you're not impressed by. See the DEJAVU manual for details.
Some labs mirror the PDB directory structure (i.e., the one where entry 1CBS goes into a subdirectory "cb", and where the file is called "pdb1cbs.ent" instead of "1cbs.pdb"). Morten Kjeldgaard contributed the following to help you generate a SPASM database with MKSPAZ.
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Hiya Gerard,

I had a problem generating a 'spasm.lib' file because we are mirroring the PDB directory structure, (e.g. pdb1aak.ent is in aa/pdb1aak.ent) so I could not do a simple sed job on spasm.lib. PDB supplies a script hash.pl to generate a dbm database relating entry code to filename.
Therefore I had to re-run mkspaz with the list of 1563 pdb ident codes, but I found that it was not straightforward to create the input file as most entries are in different directories, and because the mkspaz program wants extra input when the file contains an NMR structure. So I created this little perl script to generate the mkspaz input file from a list of pdb idents. Run it by
rsdb < names
The script is useful for people mirroring the PDB directory structure. I thought you might wanna include it in yer spasm manual.
Cheers from MOK!
PS: Now I have my very personalized spasm.lib file! Whooy!
PPS: This is my first (input file generator (input file generator)) program ;-)
----8<-- *snip* -----
#!/usr/bin/env perl
#
# Make an input file for mkspaz from a list
# of PDB ident codes. mok 980409.
#
# define the location of the pdb index files...
$dir = "/pdb/index";

print "spazzzzzm.lib\n";            # name of the new library
dbmopen(%loc, "$dir/loc", 0644) or die "Cannot open $dir/loc: $!\n";

while (<>) {
    $id = $_;
    chomp($id);
    $filename = $loc{$id};

    # only do something if the file exists...
    if (defined($filename) && -s $filename) {
        open(FILE, $filename) or die "Cannot open $filename: $!\n";

        print "$filename\n";
        print "$id\n";

        # check if this is an nmr structure (look at the first EXPDTA
        # or ATOM record, and stop at end-of-file)...
        $nmr = 0;
        while ($line = <FILE>) {
            if ($line =~ /^(EXPDTA|ATOM)/) {
                $nmr = ($line =~ /NMR/);
                last;
            }
        }
        close(FILE);
        if ($nmr) { print "100.0\n"; }
    } else {
        print STDERR "$id: $filename does not exist...\n";
    }
}
dbmclose(%loc);
--
Morten Kjeldgaard | e-mail: mok@imsb.au.dk
Institute of Molecular and Structural Biology | Phone : +45 89 42 50 26
Aarhus University | Fax : +45 86 20 12 22
Gustav Wieds Vej 10, DK-8000 Aarhus C, Denmark | Home : +45 86 18 81 80
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
None, at present.