Uppsala Software Factory

Uppsala Software Factory - STRUPRO Manual


1 STRUPRO - GENERAL INFORMATION

Program : STRUPRO
Version : 980211
Author : Gerard J. Kleywegt, Dept. of Cell and Molecular Biology, Uppsala University, Biomedical Centre, Box 590, SE-751 24 Uppsala, SWEDEN
E-mail : gerard@xray.bmc.uu.se
Purpose : generate PROSITE profiles from aligned 3D protein structures
Package : SBIN


2 REFERENCES

Reference(s) for this program:

* 1 * G.J. Kleywegt & T.A. Jones (1998). Databases in protein crystallography. Acta Cryst D54, 1119-1131. [http://alpha2.bmc.uu.se/gerard/papers/databases.html] [http://www.iucr.org/iucr-top/journals/acta/tocs/actad/1998/actad5406_1.html]

* 2 * G.J. Kleywegt & T.A. Jones (1999 ?). Chapter 25.2.6. O and associated programs. Int. Tables for Crystallography, Volume F. To be published.


3 VERSION HISTORY

970728 - 0.1 - first version
970804 - 0.4 - first documented version
970805 - 0.5 - try to extend alignments backwards as well; minor changes
971103 - 1.0 - cleaned up code and manual
980206 - 1.1 - minor changes
980211 - 1.2 - bug fix; slightly changed mult. seq. alignment output file format for easier conversion to ALSCRIPT


4 INTRODUCTION

This program generates PROSITE profiles from a set of aligned three-dimensional protein structures in PDB format.

A profile is a matrix in which every residue position has a row of numbers associated with it, indicating how well each of the twenty residue types "fits in" at that particular position in the sequence. For instance:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
            Gly Ala Ser ... Phe Tyr Trp ...
 ...
 Ala 263      2   5   3      -2  -2  -4
 Phe 264     -4  -3  -4      10   9   7
 ...
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

A profile can be aligned with every sequence in a database. A sequence which is "compatible" with the profile will receive a high score. For instance, a sequence containing the dipeptide Ala-Tyr, aligned with positions 263-264 of the profile above, would obtain a score of 5 + 9 = 14; Tyr-Ala, on the other hand, would only score (-2) + (-3) = -5.
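The arithmetic in the dipeptide example can be written out as a short Python sketch. It is ungapped, uses one-letter codes, and includes only the six columns shown in the example table; `PROFILE` and `window_score` are illustrative names, not part of STRUPRO.

```python
# Two profile rows copied from the example table (positions Ala 263 and
# Phe 264); only the six columns shown there are included.
PROFILE = [
    {"G": 2, "A": 5, "S": 3, "F": -2, "Y": -2, "W": -4},   # Ala 263
    {"G": -4, "A": -3, "S": -4, "F": 10, "Y": 9, "W": 7},  # Phe 264
]

def window_score(profile, window):
    """Sum the profile entry of each residue at its aligned position (no gaps)."""
    if len(window) != len(profile):
        raise ValueError("window length must equal profile length")
    return sum(row[res] for row, res in zip(profile, window))

print(window_score(PROFILE, "AY"))  # Ala-Tyr
print(window_score(PROFILE, "YA"))  # Tyr-Ala
```

A real profile also has gap opening and extension columns; the sketch leaves them out because it never inserts gaps.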

Note that, in addition to the twenty values for the common amino-acid residue types, the matrix also contains two columns with a score (or penalty) for the opening and extension of a gap in the alignment of a database sequence to the profile.

Whereas patterns are very strict (if a single conserved residue differs in one sequence, that sequence will not match the pattern, even if it satisfies the rest of the pattern), profiles are more tolerant. For instance, if a residue is a tyrosine in all known sequences, a related sequence which happens to have a phenylalanine at that position may still obtain a high score.

Traditionally, profiles have been generated from multiple aligned sequences. The actual values in the profile matrix depend on three factors:
- the variety of residues observed in each position in the aligned sequences (e.g., a strictly conserved Trp will lead to a high value for the Trp-entry in that row of the matrix)
- knowledge about the likelihood of residue substitutions (e.g., Phe and Tyr are closely related residues, so a strictly conserved Phe will also give a fairly high value for a Tyr in that position). This knowledge is encoded in residue substitution tables (e.g., PAM and BLOSUM matrices)
- weights assigned to the individual sequences in the alignment to reduce the effect of sample bias. For instance, if three sequences AAAA, AAAA, and GGGG are used to generate a profile, the first two are redundant and should receive a weight of 1/4 each, whereas the third should be weighted by 1/2.

The program STRUPRO takes a slightly different approach. It takes as input a set of superimposed *structures*, and generates a profile only for stretches of residues that are in structurally equivalent positions. Inside such stretches, insertions are strongly penalised; between them, insertions are "cost-neutral". The rationale is that, since structure is generally better conserved than sequence, a profile based only on the structurally conserved core of a set of proteins stands a better chance of picking up other proteins with a similar structure from the database.

The profile can then be scanned against SWISS-PROT to reveal more proteins that could belong to the same class (structurally, functionally, evolutionarily).

In order to scan sequence profiles against SWISS-PROT, you will also need:

(1) the "pftools" suite of programs, written by Philipp Bucher ( mailto:pbucher@isrec-sun1.unil.ch ) and available from http://ulrec3.unil.ch:80/ftp-server/pftools/ (the suite should compile on most Unix machines).

(2) the SWISS-PROT database of protein sequences ( http://www.expasy.ch/sprot/sprot-top.html ), which can be downloaded by ftp from ftp://ftp.expasy.ch/databases/swiss-prot/ (at the time of writing, the file "compressed/sprot35.dat.Z").


5 INPUT TO THE PROGRAM


5.1 Start-up

When you start the program, it prints some information:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 *** STRUPRO *** STRUPRO *** STRUPRO *** STRUPRO *** STRUPRO *** STRUPRO ***

 Version  - 971023/0.7
 (C) 1992-97 Gerard J. Kleywegt, Dept. Mol. Biology, BMC, Uppsala (S)
 User I/O - routines courtesy of Rolf Boelens, Univ. of Utrecht (NL)
 Others   - T.A. Jones, G. Bricogne, Rams, W.A. Hendrickson
 Others   - W. Kabsch, CCP4, PROTEIN, E. Dodson, etc. etc.

 Started  - Mon Nov 3 15:50:12 1997
 User     - gerard
 Mode     - interactive
 Host     - sarek
 ProcID   - 28822
 Tty      - /dev/ttyq14

 *** STRUPRO *** STRUPRO *** STRUPRO *** STRUPRO *** STRUPRO *** STRUPRO ***

 Reference(s) for this program:

 * 1 * G.J. Kleywegt, Uppsala University, Uppsala, Sweden, Unpublished program.

 For manuals and complete references, check: http://alpha2.bmc.uu.se/usf/

 *** STRUPRO *** STRUPRO *** STRUPRO *** STRUPRO *** STRUPRO *** STRUPRO ***

 Max nr of atoms/residues       : (   100000)
 Max nr of molecules            : (      100)
 Max nr of residues in sequence : (     1000)
 Nr of amino-acid types         : (       20)
 Random sequence length         : (  2000000)
 One-letter codes               : ( A R N D C E Q G H I L K M F P S T W Y V)
 Three-letter codes             : ( ALA ARG ASN ASP CYS GLU GLN GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----


5.2 Random-number seed

The first input is an integer seed for the random-number generator. It is used to generate a random amino-acid sequence, and to generate random sequences when calculating the weight of each structure/sequence. If you repeat a run of the program on the same machine with the same seed, you should get identical results.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Random-number seed ? (  123456) 620605
 Random-number seed : (  620605)
 => Random number generator initialised with seed :     620605
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


5.3 Random sequence

The program now generates a random amino-acid sequence of (at present) 2,000,000 residues, with an amino-acid distribution similar to that found in proteins in the PDB (GJK, unpublished results). This sequence is used later to score each profile fragment against random sequence, which gives you some idea of the signal-to-noise ratio.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Generating random sequence ...
 Target composition    : (   0.081    0.044    0.046    0.058    0.019
  0.058    0.037    0.080    0.022    0.053    0.081    0.059    0.020
  0.040    0.047    0.068    0.063    0.016    0.038    0.071)
 Working ...
 Actual composition    : (   0.081    0.044    0.047    0.058    0.019
  0.058    0.037    0.080    0.022    0.053    0.082    0.059    0.019
  0.040    0.047    0.068    0.063    0.016    0.038    0.070)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
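As a rough illustration of the seed and the random sequence together, the sketch below draws a random sequence whose composition follows the target values printed above, using a generator that is fully determined by the seed. It uses Python's `random` module rather than STRUPRO's own generator, so the sequences will not match the program's, and it uses 200,000 rather than 2,000,000 residues to keep it quick; the function names are hypothetical.

```python
import random
from collections import Counter

# One-letter codes in the order STRUPRO prints them, with the target
# composition copied from the example output above.
CODES = "ARNDCEQGHILKMFPSTWYV"
TARGET = [0.081, 0.044, 0.046, 0.058, 0.019, 0.058, 0.037, 0.080, 0.022, 0.053,
          0.081, 0.059, 0.020, 0.040, 0.047, 0.068, 0.063, 0.016, 0.038, 0.071]

def random_sequence(length, seed):
    """Draw residues independently, with probabilities proportional to TARGET."""
    rng = random.Random(seed)  # private generator: the seed fixes the whole sequence
    return "".join(rng.choices(CODES, weights=TARGET, k=length))

def composition(seq):
    """Fraction of each residue type in seq, in CODES order."""
    counts = Counter(seq)
    return [counts[c] / len(seq) for c in CODES]

seq = random_sequence(200_000, seed=620605)
print("Actual composition :", " ".join(f"{f:.3f}" for f in composition(seq)))
```

Calling `random_sequence` twice with the same seed returns the same sequence, which is the reproducibility property described for the random-number seed.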
   


5.4 Substitution matrix

Next, you need to provide the name of a file which contains the matrix to be used in the construction of the profiles. A number of matrices are available; others can be made by the user.

Note: if you have defined the environment variable GKLIB so that it points to the directory where you keep your collection of these matrix files (in Uppsala: /nfs/public/lib), the program will use this to generate the default file name.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Library file with matrix ? (/nfs/public/lib/sbin_blosum45.lib) ../sbin_blosum45.lib
 Library file with matrix : (../sbin_blosum45.lib)
 Comment : (! BLOSUM 45 matrix made from BLOCKS v. 5.0 and scaled in half-
  bits.)
 Comment : (! ARNDCQEGHILKMFPSTWYVBZX)
 Comment : (! integer matrix)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

Such a matrix file may look as follows (for a matrix of real rather than integer numbers, replace MATI by MATR and give an appropriate real-number format):

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
!
! PAM 250 matrix recommended by Gonnet, Cohen & Benner
! Science June 5, 1992.
! Values rounded to nearest integer
!
TYPE 22 (30(2x,a1))
  C  S  T  P  A  G  N  D  E  Q  H  R  K  M  I  L  V  F  Y  W  X  *
!
MATI (30i3)
 12  0  0 -3  0 -2 -2 -3 -3 -2 -1 -2 -3 -1 -1 -2  0 -1  0 -1 -3 -8
  0  2  2  0  1  0  1  0  0  0  0  0  0 -1 -2 -2 -1 -3 -2 -3  0 -8
  0  2  2  0  1 -1  0  0  0  0  0  0  0 -1 -1 -1  0 -2 -2 -4  0 -8
 -3  0  0  8  0 -2 -1 -1  0  0 -1 -1 -1 -2 -3 -2 -2 -4 -3 -5 -1 -8
  0  1  1  0  2  0  0  0  0  0 -1 -1  0 -1 -1 -1  0 -2 -2 -4  0 -8
 -2  0 -1 -2  0  7  0  0 -1 -1 -1 -1 -1 -4 -4 -4 -3 -5 -4 -4 -1 -8
 -2  1  0 -1  0  0  4  2  1  1  1  0  1 -2 -3 -3 -2 -3 -1 -4  0 -8
 -3  0  0 -1  0  0  2  5  3  1  0  0  0 -3 -4 -4 -3 -4 -3 -5 -1 -8
 -3  0  0  0  0 -1  1  3  4  2  0  0  1 -2 -3 -3 -2 -4 -3 -4 -1 -8
 -2  0  0  0  0 -1  1  1  2  3  1  2  2 -1 -2 -2 -2 -3 -2 -3 -1 -8
 -1  0  0 -1 -1 -1  1  0  0  1  6  1  1 -1 -2 -2 -2  0  2 -1 -1 -8
 -2  0  0 -1 -1 -1  0  0  0  2  1  5  3 -2 -2 -2 -2 -3 -2 -2 -1 -8
 -3  0  0 -1  0 -1  1  0  1  2  1  3  3 -1 -2 -2 -2 -3 -2 -4 -1 -8
 -1 -1 -1 -2 -1 -4 -2 -3 -2 -1 -1 -2 -1  4  2  3  2  2  0 -1 -1 -8
 -1 -2 -1 -3 -1 -4 -3 -4 -3 -2 -2 -2 -2  2  4  3  3  1 -1 -2 -1 -8
 -2 -2 -1 -2 -1 -4 -3 -4 -3 -2 -2 -2 -2  3  3  4  2  2  0 -1 -1 -8
  0 -1  0 -2  0 -3 -2 -3 -2 -2 -2 -2 -2  2  3  2  3  0 -1 -3 -1 -8
 -1 -3 -2 -4 -2 -5 -3 -4 -4 -3  0 -3 -3  2  1  2  0  7  5  4 -2 -8
  0 -2 -2 -3 -2 -4 -1 -3 -3 -2  2 -2 -2  0 -1  0 -1  5  8  4 -2 -8
 -1 -3 -4 -5 -4 -4 -4 -5 -4 -3 -1 -2 -4 -1 -2 -1 -3  4  4 14 -4 -8
 -3  0  0 -1  0 -1  0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2 -2 -4 -1 -8
 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8
!
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

Note that the matrix may contain entries for residue types not used by STRUPRO (e.g., "X", "B", "Z", "*"); the program will ignore these.
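A minimal Python sketch of reading such a file into a lookup table, keeping only the twenty residue types STRUPRO uses. It assumes whitespace-separated numbers and a TYPE record that precedes the MATI/MATR record; it does not interpret the Fortran format strings, which the program itself may rely on. `parse_matrix` is a hypothetical name.

```python
STRUPRO_CODES = set("ARNDCEQGHILKMFPSTWYV")

def parse_matrix(text):
    """Read a TYPE + MATI/MATR library file into {(a, b): score},
    keeping only the 20 residue types that STRUPRO uses."""
    # Skip blank lines and comment lines starting with "!".
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.lstrip().startswith("!")]
    types, convert, tokens = None, None, []
    i = 0
    while i < len(lines):
        fields = lines[i].split()
        keyword = fields[0].upper()
        if keyword == "TYPE":
            # "TYPE n (format)" is followed by one line with the n residue codes.
            types = lines[i + 1].split()[: int(fields[1])]
            i += 2
        elif keyword in ("MATI", "MATR"):
            if types is None:
                raise ValueError("TYPE record must precede the matrix")
            convert = int if keyword == "MATI" else float
            i += 1
            # Collect n*n whitespace-separated numbers, row by row.
            while i < len(lines) and len(tokens) < len(types) ** 2:
                tokens.extend(lines[i].split())
                i += 1
        else:
            i += 1
    if convert is None or len(tokens) < len(types) ** 2:
        raise ValueError("incomplete matrix file")
    n = len(types)
    return {(a, b): convert(tokens[r * n + c])
            for r, a in enumerate(types) for c, b in enumerate(types)
            if a in STRUPRO_CODES and b in STRUPRO_CODES}
```

Applied to the PAM 250 file above, the "X" and "*" rows and columns would simply be dropped.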


5.5 Cut-off distance and frameshifts

You must provide a cut-off distance (in Å) within which CA atoms of different molecules are considered to be equivalent. If this distance is very large, frameshifts may occur in the structural alignments, although the program can be instructed to try to correct for these. A second cut-off distance (the extension distance) is used when stretches of equivalent residues are extended at their ends.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Equivalent CA distance ? (   5.000)
 Equivalent CA distance : (   5.000)

 Extension CA distance ? (   8.000)
 Extension CA distance : (   8.000)

 Try to correct frame-shifts (Y/N) ? (Y)
 Try to correct frame-shifts (Y/N) : (Y)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----


5.6 Minimum fragment length

Only structurally conserved, contiguous stretches of at least a certain minimum length will be used in the profile (this minimum must be at least 3 residues for the RMSD calculations to work).

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Min fragment length ? (       5)
 Min fragment length : (       5)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


5.7 Sequence weighting

Appropriate weighting of the various structures/sequences is important to minimise bias in the profile (e.g., five different structures of the same human protein and only one of an insect form of the protein will bias the profile towards human sequences). The following weights can be used:

- uniform weights, i.e. all weights equal; this is not advisable

- rms(rmsd) weights, i.e. based on the structural variation; this is probably not very useful since it may be determined to a certain extent by practices of the crystallographer ;-)

- sequence distance weights, as defined by Sibbald and Argos; this is probably the most sensible choice (in this implementation, the number of "Monte Carlo" cycles executed lies between 100,000 and 1,000,000, or fewer if the weights converge to within 1%)

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Sequences may be weighted:
 U = uniform weights
 R = rms(rmsd) weights
 S = sequence distance weights
 Weighting scheme ? (S)
 Weighting scheme : (S)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
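The sketch below illustrates Voronoi-style sequence weights in the spirit of Sibbald and Argos. The details are assumptions, not STRUPRO's code: each random probe sequence is built by picking, at every position, one of the residue types observed there with equal probability; the probe credits the aligned sequence(s) with the most identities (ties share the credit); and sampling stops when no weight changes by more than 1% between batches, or after `max_cycles` probes. The function name and parameters are hypothetical.

```python
import random

def sibbald_argos_weights(seqs, max_cycles=1_000_000, batch=10_000,
                          tol=0.01, seed=620605):
    """Voronoi-style sequence weights (a sketch after Sibbald & Argos).

    seqs: equal-length aligned sequences. Returns (weights, cycles_used).
    """
    rng = random.Random(seed)
    n = len(seqs)
    # Residue types observed at each alignment position.
    columns = [sorted({s[p] for s in seqs}) for p in range(len(seqs[0]))]
    credit = [0.0] * n
    weights, previous, cycles = [1.0 / n] * n, None, 0
    while cycles < max_cycles:
        for _ in range(batch):
            probe = [rng.choice(col) for col in columns]
            ident = [sum(a == b for a, b in zip(probe, s)) for s in seqs]
            best = max(ident)
            winners = [k for k in range(n) if ident[k] == best]
            for k in winners:  # ties share the credit equally
                credit[k] += 1.0 / len(winners)
        cycles += batch
        total = sum(credit)
        weights = [c / total for c in credit]
        # Converged when no weight moved by more than tol (relative).
        if previous is not None and all(
                abs(w - p) <= tol * p for w, p in zip(weights, previous)):
            break
        previous = weights
    return weights, cycles
```

For the toy alignment AAAA, AAAA, GGGG from the Introduction, this scheme gives exact weights of 9/32, 9/32 and 7/16 rather than the idealised 1/4, 1/4 and 1/2: the two identical sequences always share their credit, so the distinct sequence still ends up with the largest weight.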
   


5.8 PDB and profile files

Provide the name of the PDB file which contains ALL molecules. Note that the molecules must have been superimposed beforehand (e.g., with O or LSQMAN; LSQMAN contains a BRute_force command to find structural alignments "ab initio"). Any two consecutive molecules in the file must have different chain identifiers. However, the identifiers do not all have to be unique (a requirement that would otherwise limit you to a maximum of 26 molecules); e.g., you could alternate chain identifiers A and B. Note that the program *ONLY* reads the CA atoms, so you can make your files considerably smaller by only including these (e.g.: grep '^ATOM' myfile.pdb | grep ' CA ' > new.pdb).

You must also provide the name of the (output) profile file (these customarily have an extension ".prf").

In addition, the program writes a structure-based multiple-sequence alignment to a new file.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Name of PDB file ? (aligned.pdb)
 Name of PDB file : (aligned.pdb)

 Name of profile file ? (aligned.prf)
 Name of profile file : (aligned.prf)

 Name of sequence alignment file ? (aligned.seq)
 Name of sequence alignment file : (aligned.seq)

 Remark : (REMARK 1ayh.pdb CHAIN B)
 Remark : (REMARK 1lte.pdb CHAIN C)
 Remark : (REMARK 1eg1.pdb CHAIN D)
 Remark : (REMARK hieg.pdb CHAIN E)
 Nr of CA atoms  : (      1656)
 Nr of molecules : (         5)

 Mol #    1 Atoms     1 to   434
 Mol #    2 Atoms   435 to   648
 Mol #    3 Atoms   649 to   887
 Mol #    4 Atoms   888 to  1258
 Mol #    5 Atoms  1259 to  1656

 ----------------------------------------------------------------------
 Unknown amino-acid type : (PCA) REPLACED BY ALANINE !!!
 Unknown amino-acid type : (PYR) REPLACED BY ALANINE !!!
 Unknown amino-acid type : (PCA) REPLACED BY ALANINE !!!
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
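A sketch of how a CA-only PDB file can be split into molecules under the chain-identifier rule above, assuming that a new molecule starts whenever the chain identifier changes between consecutive CA atoms. It reads only ATOM records (the ones the `grep` command above keeps), uses the standard fixed PDB columns, and, like the example output, replaces unknown residue types by alanine. `read_ca_molecules` is a hypothetical name.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLU": "E", "GLN": "Q", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def read_ca_molecules(lines):
    """Return a list of molecules, each a list of (one_letter, resnum, (x, y, z))."""
    molecules, current, last_chain = [], [], None
    for line in lines:
        # Keep only CA atoms of ATOM records (atom name in columns 13-16).
        if not line.startswith("ATOM") or line[12:16] != " CA ":
            continue
        chain = line[21]  # column 22
        if current and chain != last_chain:
            molecules.append(current)  # chain changed: start a new molecule
            current = []
        last_chain = chain
        resname = line[17:20]  # columns 18-20
        if resname not in THREE_TO_ONE:
            print(f"Unknown amino-acid type : ({resname}) REPLACED BY ALANINE !!!")
        one = THREE_TO_ONE.get(resname, "A")
        xyz = tuple(float(line[k:k + 8]) for k in (30, 38, 46))  # columns 31-54
        current.append((one, int(line[22:26]), xyz))
    if current:
        molecules.append(current)
    return molecules
```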


6 OUTPUT

STRUPRO now starts looking for residues that are structurally equivalent in all aligned structures (i.e., a residue in the first protein has a partner within the cut-off distance in each of the other structures). When it encounters such a residue, it checks whether neighbouring residues (on either side) also have partners in all the other structures (now using the extension distance cut-off).
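The search for equivalent residues described above can be sketched as follows. The details are assumptions: a residue's partner in another molecule is taken to be the nearest CA atom, and extension compares sequential neighbours (residue i+d of the first molecule with residue j+d of each other molecule, for d = ±1, ±2, ...). The function names are hypothetical, and each molecule is assumed to be a list of (x, y, z) CA coordinates that are already superimposed.

```python
import math

def nearest(point, coords):
    """Index of the CA in coords closest to point, and its distance."""
    j = min(range(len(coords)), key=lambda k: math.dist(point, coords[k]))
    return j, math.dist(point, coords[j])

def seed_partners(mols, i, equiv_cut):
    """Partners of residue i of molecule 0 in every other molecule,
    or None if any molecule has no CA within equiv_cut."""
    partners = [i]
    for coords in mols[1:]:
        j, d = nearest(mols[0][i], coords)
        if d > equiv_cut:
            return None
        partners.append(j)
    return partners

def extend(mols, partners, ext_cut):
    """Grow a seed in both directions while the sequential neighbours in
    every molecule stay within ext_cut of the neighbour in molecule 0.
    Returns the first residue of the stretch in each molecule and its length."""
    def fits(step):
        idx = [p + step for p in partners]
        if any(not 0 <= q < len(m) for q, m in zip(idx, mols)):
            return False
        return all(math.dist(mols[0][idx[0]], mols[k][idx[k]]) <= ext_cut
                   for k in range(1, len(mols)))
    first = 0
    while fits(first - 1):
        first -= 1
    last = 0
    while fits(last + 1):
        last += 1
    return [p + first for p in partners], last - first + 1
```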

In this way, a set of residues is equivalenced between all structures. However, the structural superposition may not always be optimal, so the program tries to detect and fix frameshift errors. It does this simply by checking, for each structure, whether shifting its alignment to the first structure by one residue forward or backward would lower the RMSD of the superposition. If so, the equivalenced residues are altered accordingly, and the frameshift test is repeated until no more frameshifts occur.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 ----------------------------------------------------------------------
 Shift mol   2 by -1 (RMSD -1/0/+1 :    4.3   6.4   8.9 A)
 Shift mol   3 by -1 (RMSD -1/0/+1 :    1.8   3.7   6.3 A)
 Shift mol   4 by -1 (RMSD -1/0/+1 :    4.2   6.3   8.8 A)
 Shift mol   7 by -1 (RMSD -1/0/+1 :    3.9   6.0   8.3 A)

 ----------------------------------------------------------------------
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
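A sketch of the frameshift test, assuming the coordinates are already superimposed and computing the RMSD directly on them without refitting (the program itself may refit). A shift is accepted only if it strictly lowers the RMSD, which guarantees that the loop terminates. All names are hypothetical.

```python
import math

def rmsd(a, b):
    """RMSD between two equally long lists of (x, y, z) tuples, no refitting."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))

def best_shift(ref, mol, ref_idx, mol_idx):
    """Try shifting mol_idx by -1, 0 and +1; return (best_shift, rmsd_by_shift)."""
    results = {}
    for s in (-1, 0, 1):
        shifted = [j + s for j in mol_idx]
        if all(0 <= j < len(mol) for j in shifted):
            results[s] = rmsd([ref[i] for i in ref_idx], [mol[j] for j in shifted])
    return min(results, key=results.get), results

def correct_frameshifts(mols, equiv):
    """equiv[k] lists the residues of molecule k paired with equiv[0].
    Keep shifting molecules until no shift lowers the RMSD any further."""
    changed = True
    while changed:
        changed = False
        for k in range(1, len(mols)):
            s, r = best_shift(mols[0], mols[k], equiv[0], equiv[k])
            if s != 0 and r[s] < r[0]:
                equiv[k] = [j + s for j in equiv[k]]
                changed = True
    return equiv
```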

Next, the program again tries to extend the alignments in both directions using the extension distance cut-off. If the resulting set of conserved residues contains at least the minimum number of residues defined by the user, a potential profile fragment has been found.

For every structurally conserved stretch of residues that the program encounters, the output includes:

- a listing of the first residue of the stretch of structurally conserved residues in every molecule

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 New structurally conserved stretch !
 Starts at residue ALA -  372
   molecule    2 @ GLY -  188
   molecule    3 @ PHE -  131
   molecule    4 @ GLN -  325
   molecule    5 @ GLY -  352
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

- information about the structural variation. For every molecule, the RMS(RMSD) of its comparisons with all other Nmol-1 structures is printed, as well as the RMS(RMSD) over all pairs.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Number of residues       : (       7)
 RMS (RMSD) all pairs (A) : (   2.121)
 RMS(RMSD) (A): (   1.899    2.220    2.688    1.850    1.821)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
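The two kinds of RMS(RMSD) can be computed as below, assuming the per-molecule value is the root-mean-square of its Nmol-1 pairwise RMSDs and the overall value is the root-mean-square over all pairs. These definitions are consistent with the example output: the RMS of 1.899, 2.220, 2.688, 1.850 and 1.821 is 2.121. The sketch computes the RMSDs on the given coordinates without refitting; `rms_rmsd` is a hypothetical name.

```python
import math
from itertools import combinations

def rmsd(a, b):
    """RMSD between two equally long lists of (x, y, z) tuples, no refitting."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))

def rms_rmsd(fragments):
    """fragments[k] holds the CA coordinates of the stretch in molecule k.
    Returns (overall RMS over all pairs, list of per-molecule RMS values)."""
    n = len(fragments)
    pair = {}
    for a, b in combinations(range(n), 2):
        pair[a, b] = pair[b, a] = rmsd(fragments[a], fragments[b])
    per_mol = [math.sqrt(sum(pair[a, b] ** 2 for b in range(n) if b != a) / (n - 1))
               for a in range(n)]
    overall = math.sqrt(sum(pair[a, b] ** 2 for a, b in combinations(range(n), 2))
                        / (n * (n - 1) // 2))
    return overall, per_mol
```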
   

- weights are calculated and printed. In the case of sequence distance weights, this may take a little while, since thousands of random sequences need to be generated and statistics accumulated.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Calculating sequence distances ...
 Weights converged : (      40000)
 Largest shift (%) : (   0.852)
 Weights      : (   0.161    0.228    0.288    0.182    0.141)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

- for every residue position, the residue type in each molecule (one-letter codes between vertical bars) and the resulting profile matrix entries are listed

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 AA-TYPE :  ALA ARG ASN ASP CYS GLU GLN GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP TYR VAL
 |AGFQG|
 PROFILE :    0 -16  -7 -15 -24 -17  -7  14 -16 -22 -13 -16 -13   3 -18  -4 -12 -13  -9 -17
 |NVSYN|
 PROFILE :   -4  -9  11  -4 -17  -9 -10 -12  -3  -5 -15  -9 -10  -6 -21   9   4 -25   0  -2
 |MDNMM|
 PROFILE :  -12  -7  12   7 -22   0  -5 -12   3  -5  -6  -5  16 -15 -18  -7  -7 -30 -10 -11
 |LDPNE|
 PROFILE :  -12 -10   5  12 -29   2   4 -16  -6 -18 -16  -6 -13 -25  14  -6  -8 -31 -18 -24
 |WWWWW|
 PROFILE :  -20 -20 -40 -40 -50 -20 -30 -20 -30 -20 -20 -20 -20  10 -30 -40 -30 150  30 -30
 |LLDLL|
 PROFILE :  -13 -17 -16  -1 -23 -14  -8 -24 -14   3  27 -21   6  -4 -24 -21 -10 -26  -6  -2
 |DGPDD|
 PROFILE :  -13 -15   4  29 -33  -7   5   5 -10 -34 -30  -7 -25 -35  17  -3 -12 -33 -25 -30
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
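Each profile row appears to be a weighted average of substitution-matrix rows. The scale factor of 10 and the rounding in this sketch are inferred from the example rather than documented: the |WWWWW| row is ten times the BLOSUM45 row for Trp (W-W = 15 gives 150), and for |LLDLL|, with the weights printed earlier (0.161, 0.228, 0.288, 0.182, 0.141; only molecule 3 has Asp), the Leu entry is 10 * (0.712 * 5 + 0.288 * (-3)), which rounds to 27. The consensus symbol is taken to be the highest-scoring residue type, which matches the SY fields in the profile file. `profile_row` is a hypothetical name.

```python
def profile_row(residues, weights, matrix,
                alphabet="ARNDCEQGHILKMFPSTWYV", scale=10):
    """One profile row: scaled, rounded weighted average of the matrix rows
    of the residues observed at this position.

    residues: one residue per molecule; weights: one weight per molecule;
    matrix[(a, b)]: substitution score for residue a against residue b.
    Returns (row of integer scores in alphabet order, consensus residue).
    """
    row = [round(scale * sum(w * matrix[(a, b)] for a, w in zip(residues, weights)))
           for b in alphabet]
    consensus = alphabet[max(range(len(alphabet)), key=lambda k: row[k])]
    return row, consensus
```

With a matrix dictionary in the `(a, b)` form used here, the whole profile is one call per structurally conserved position.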
   

- next, the program slides the profile fragment along the entire random amino-acid sequence, calculates score statistics, and reports the raw scores and Z-scores of the input molecules:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Random sequence tests :      1999994
 Average, St.dev.      :        -71.8        39.4
 Minimum, Maximum      :       -196.0       225.0
 Z-min, Z-max          :        -3.15        7.53

 Mol #    1 Raw score =    217 Z-score =   7.33
 Mol #    2 Raw score =    213 Z-score =   7.22
 Mol #    3 Raw score =    204 Z-score =   7.00
 Mol #    4 Raw score =    220 Z-score =   7.40
 Mol #    5 Raw score =    249 Z-score =   8.14
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
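The random-sequence statistics can be sketched as an ungapped sliding-window score. The number of windows is the sequence length minus the fragment length plus one (2,000,000 - 7 + 1 = 1,999,994 for the 7-residue fragment above), and the Z-score is (raw - mean) / standard deviation, e.g. (217 - (-71.8)) / 39.4, about 7.33, for molecule 1. Using the population standard deviation is an assumption; the names are hypothetical.

```python
import statistics

def random_statistics(profile, sequence):
    """Score every ungapped window of sequence against profile
    (a list of {residue: score} rows) and summarise the scores."""
    n = len(profile)
    scores = [sum(row[res] for row, res in zip(profile, sequence[s:s + n]))
              for s in range(len(sequence) - n + 1)]
    return {"tests": len(scores),
            "mean": statistics.fmean(scores),
            "sd": statistics.pstdev(scores),
            "min": min(scores), "max": max(scores)}

def z_score(raw, stats):
    """Number of standard deviations a raw score lies above the random mean."""
    return (raw - stats["mean"]) / stats["sd"]
```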


7 RESULTS

When the program has finished, it will print a summary:

- the pairwise sequence identity matrix (in %), *ONLY* counting the residues that ended up being in the profile:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Nr of residues in profile : (        183)

 Sequence identity for these residues only:
 % Seq id mol #   1 ->  100.0  11.5   7.7  50.8  46.4
 % Seq id mol #   2 ->   11.5 100.0   9.8  10.9  12.0
 % Seq id mol #   3 ->    7.7   9.8 100.0   8.2   9.8
 % Seq id mol #   4 ->   50.8  10.9   8.2 100.0  54.6
 % Seq id mol #   5 ->   46.4  12.0   9.8  54.6 100.0

 Average sequence identity (%) : (    22.186)
 St. dev.                       : (    18.759)
 Minimum                        : (     7.650)
 Maximum                        : (    54.645)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
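The identity summary can be reproduced with the sketch below, which counts identities only over the profile residues and summarises the off-diagonal pairs. The standard deviation in the example (18.759) matches the population formula over the ten pairs; the sample formula would give about 19.8. `identity_summary` is a hypothetical name.

```python
import statistics
from itertools import combinations

def identity_summary(aligned):
    """aligned: equal-length strings holding only the residues in the profile.
    Returns (percent identity matrix, summary over the off-diagonal pairs)."""
    n, length = len(aligned), len(aligned[0])
    ident = [[100.0 * sum(a == b for a, b in zip(aligned[i], aligned[j])) / length
              for j in range(n)] for i in range(n)]
    pairs = [ident[i][j] for i, j in combinations(range(n), 2)]
    return ident, {"average": statistics.fmean(pairs),
                   "sd": statistics.pstdev(pairs),
                   "min": min(pairs), "max": max(pairs)}
```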

- some results pertaining to the random sequence

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Sum of maximum random scores : (       3260)
 Sum AVE+3SIGMA random scores : (       1048)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

- the accumulated raw scores of the input structures/sequences.

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Score for molecule   1 =       3806
 Score for molecule   2 =       2862
 Score for molecule   3 =       2536
 Score for molecule   4 =       3866
 Score for molecule   5 =       3933
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

- a suggestion is made for the minimum raw score to be used in searches against the (SWISS-PROT) sequence database (note that it is better to scan the whole sequence database to get realistic statistics)

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 Minimum raw score : (       2100)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


8 PROFILE FILE

For the example above, the following profile file is generated:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
ID   STRUPRO; MATRIX.
AC   PS99999;
DT   JAN-1900 (CREATED);
DE   Created by STRUPRO V. 971103/1.0 at Mon Nov 3 16:05:41 1997 for user gerard
CC
CC   Substitution matrix file : ../sbin_blosum45.lib
CC   Nr of structures used : 5
CC   Equivalent CA distance (A) : 5.000000
CC   Extension CA distance (A) : 8.000000
CC   Frameshift correction used
CC   Min fragment length : 5
CC   Weighting scheme : S
CC
MA   /GENERAL_SPEC: ALPHABET='ARNDCEQGHILKMFPSTWYV'; LENGTH=183;
MA   TOPOLOGY=LINEAR;
MA   /DISJOINT: DEFINITION=PROTECT; N1=1; N2=183;
MA   /CUT_OFF: LEVEL=0; SCORE= 2100;
MA   /DEFAULT: MI=-100; I=-10; IM=0 ; MD=-100; D=-3; DM=0;
MA   /M: SY='D'; M=-8,-12,8,23,-23,-8,0,5,-12,-29,-23,-8,-20,-27,-12,7,9,-32,-19,-19;
MA   /M: SY='N'; M=-4,-7,26,3,-18,-7,-7,5,-7,-20,-22,-7,-16,-18,-16,12,15,-33,-18,-19;
MA   /M: SY='W'; M=-7,-17,-17,-24,-24,-17,-17,-18,-19,-6,0,-21,-6,12,-23,-7,-5,18,6,-8;
MA   /M: SY='V'; M=-4,-12,-14,-20,-18,-2,-13,-27,-18,12,0,-12,4,-12,-18,-2,9,-26,-8,15;
MA   /M: SY='V'; M=-5,-8,-23,-23,-17,-18,-18,-28,-23,14,11,-6,8,-4,-25,-15,-5,-25,-7,23;
MA   /M: SY='L'; M=4,-17,-16,-17,-22,-7,1,-22,-17,6,13,-15,3,-9,-16,-11,-8,-23,-10,1;
MA   /M: SY='D'; M=-15,-13,15,49,-30,-5,10,11,-5,-40,-30,-5,-27,-37,-13,0,-13,-35,-23,-30;
MA     /I: MI=0; I=-1; MD=0; /M: SY='X'; M=0; D=-1;
MA   /M: SY='N'; M=8,-10,10,-6,-18,-6,-8,-8,1,-12,-16,-7,-12,-6,-18,6,0,-15,7,-13;
MA   /M: SY='G'; M=0,-11,20,1,-21,-9,-9,30,-9,-29,-30,-11,-20,-24,-17,14,-3,-31,-24,-24;
MA   /M: SY='V'; M=-3,-20,-30,-30,-13,-27,-27,-30,-27,27,22,-23,13,3,-30,-16,-3,-27,-7,38;
MA   /M: SY='T'; M=-6,-13,11,-10,-19,-10,-13,-20,-14,4,-7,-13,-4,-10,-16,6,18,-30,-10,0;
MA   /M: SY='T'; M=-6,-13,-6,-19,-13,-19,-16,-23,-20,-7,-4,-16,-7,16,-16,9,33,-19,1,0;
MA   /M: SY='S'; M=-2,-6,12,6,-17,3,14,-11,-7,-20,-21,-3,-17,-20,-9,18,17,-34,-17,-16;
MA     /I: MI=0; I=-1; MD=0; /M: SY='X'; M=0; D=-1;
MA   /M: SY='S'; M=4,-4,0,-6,-14,-4,-4,-10,-14,-12,-22,1,-12,-18,-14,20,10,-34,-16,0;
MA   /M: SY='L'; M=-10,-20,-30,-30,-20,-20,-20,-30,-20,20,50,-30,20,10,-30,-30,-10,-20,0,10;
MA   /M: SY='R'; M=-6,15,2,0,-22,6,12,-16,-8,-24,-22,14,-14,-22,-10,8,9,-28,-14,-16;
MA   /M: SY='L'; M=-10,-20,-26,-32,-22,-16,-22,-30,-18,26,38,-26,28,6,-26,-26,-10,-20,0,14;
MA   /M: SY='G'; M=-4,-10,13,2,-24,-7,0,21,-10,-28,-24,-8,-18,-24,-14,6,0,-28,-22,-24;
MA   /M: SY='H'; M=-14,-2,-9,-10,-26,-4,4,-24,8,-13,-1,-2,-2,2,-18,-14,-12,-18,3,-14;
MA   /M: SY='I'; M=-8,-20,-18,-27,-22,-18,-24,-32,-17,23,7,-20,7,4,-22,-10,5,-14,13,19;
MA   /M: SY='T'; M=-4,-10,4,-10,-16,-6,-10,-14,-8,-2,-1,-12,6,-8,-18,5,10,-30,-10,-4;
MA   /M: SY='P'; M=-6,-10,-6,4,-30,4,24,-16,-10,-24,-26,-2,-20,-28,33,5,-4,-32,-24,-26;
MA     /I: MI=0; I=-1; MD=0; /M: SY='X'; M=0; D=-1;
MA   /M: SY='N'; M=-10,-13,7,5,-19,-16,-8,-15,-10,-11,-13,-13,-12,5,-20,1,-2,-25,-5,-6;
MA   /M: SY='D'; M=-5,-15,-5,7,-20,-11,-4,-16,-16,-9,-17,-11,-13,-21,1,5,1,-35,-18,1;
MA   /M: SY='C'; M=1,-17,-2,-12,20,-14,-14,0,-19,-24,-23,-17,-18,-19,-20,15,12,-37,-22,-11;
MA   /M: SY='A'; M=19,-20,-11,-14,-24,-12,-9,11,-20,-20,-21,-12,-16,-26,19,1,-8,-23,-26,-17;
MA   /M: SY='R'; M=-17,54,0,-7,-30,23,5,-20,3,-27,-20,25,-7,-25,-17,-7,-10,-20,-10,-23;
MA   /M: SY='Y'; M=-8,-15,-20,-23,-18,-17,-19,-28,-11,9,15,-18,6,9,-25,-12,6,-11,17,9;
MA   /M: SY='Y'; M=-18,9,-17,-20,-28,-7,-15,-27,5,-3,7,-4,2,12,-27,-20,-10,5,37,-8;
MA   /M: SY='L'; M=-7,-15,-17,-20,-20,-12,-15,-22,-8,5,17,-20,5,7,-25,-9,-2,-13,14,0;
MA   /M: SY='L'; M=7,-16,-15,-22,-15,-11,-15,-18,-16,5,14,-16,12,-4,-18,-5,8,-23,-7,5;
MA   /M: SY='D'; M=-2,1,20,24,-24,1,7,-8,-3,-27,-26,10,-19,-29,-13,2,-6,-32,-18,-22;
MA     /I: MI=0; I=-1; MD=0; /M: SY='X'; M=0; D=-1;
MA   /M: SY='Y'; M=1,-12,-13,-18,-20,-10,-15,-21,1,-5,-5,-10,-5,9,-20,-3,7,4,35,-5;
MA   /M: SY='Q'; M=-8,5,10,4,-24,10,12,-17,-3,-16,-20,12,-9,-24,-14,-2,-6,-29,-14,-14;
MA   /M: SY='M'; M=-10,-13,-16,-22,-29,5,-10,-25,-10,10,0,-11,20,-15,8,-14,-10,-22,-9,-2;
MA   /M: SY='F'; M=-12,-18,-26,-30,-20,-24,-24,-30,-13,14,20,-23,9,26,-30,-21,-8,-4,23,13;
MA   /M: SY='H'; M=-9,-2,4,-4,-26,-1,-5,2,28,-28,-22,-2,-9,-22,-16,-2,-4,-26,-3,-22;
MA   /M: SY='L'; M=-12,-20,-25,-30,-25,-18,-22,-32,-13,22,31,-25,15,13,-28,-25,-10,-8,19,10;
MA   /M: SY='W'; M=-8,-15,-5,-16,-28,-15,-17,3,-17,-15,-9,-17,-11,-8,-23,-11,-6,15,-5,-17;
MA     /I: MI=0; I=-1; MD=0; /M: SY='X'; M=0; D=-1;
MA   /M: SY='G'; M=-4,-11,9,-3,-22,-9,0,11,-10,-24,-22,-11,-17,-9,-16,9,-4,-25,-15,-20;
MA   /M: SY='E'; M=-13,-6,-13,-18,-25,13,-4,-25,-6,-4,7,-11,5,4,-21,-13,-10,-12,3,-12;
MA   /M: SY='Y'; M=-16,-10,-17,-23,-25,-3,-13,-27,-2,-2,6,-14,3,25,-25,-17,-10,3,29,-9;
MA   /M: SY='T'; M=0,-5,3,-4,-15,12,1,-14,-9,-16,-19,-5,-10,-21,-10,21,24,-30,-13,-11;
MA   /M: SY='F'; M=-15,3,-18,-30,-20,-25,-22,-27,-18,1,2,-12,0,33,-27,-15,-7,-8,9,9;
MA   /M: SY='D'; M=-12,-13,7,23,-22,-10,2,-12,-8,-24,-20,-10,-20,-4,-15,6,-2,-27,-7,-17;
MA   /M: SY='V'; M=0,-15,-17,-22,-13,-14,-20,-20,-17,15,3,-15,16,-5,-22,0,2,-30,-10,24;
MA   /M: SY='D'; M=-17,-2,4,23,-27,-7,5,-18,-8,-27,-20,6,-17,-7,-15,-8,-10,-22,-5,-20;
MA   /M: SY='P'; M=6,-17,-14,-15,-20,-12,-9,-15,-19,-4,-8,-15,-7,-15,11,3,1,-29,-18,-1;
MA   /M: SY='A'; M=14,-18,-5,-17,-15,-9,-12,-13,-19,2,-9,-15,-6,-13,-13,13,12,-28,-13,4;
MA   /M: SY='K'; M=0,12,-2,0,-27,21,17,-17,-5,-24,-23,24,-9,-31,-8,-3,-8,-22,-13,-21;
MA   /M: SY='L'; M=-10,-9,2,-4,-23,-4,6,-19,-7,-4,10,-12,-1,-8,-20,-12,-7,-28,-11,-11;
MA   /M: SY='P'; M=-7,-17,-15,-10,-32,-10,-3,-20,-20,-17,-25,-10,-17,-25,63,-2,6,-30,-25,-22;
MA     /I: MI=0; I=-1; MD=0; /M: SY='X'; M=0; D=-1;
MA   /M: SY='C'; M=-8,-23,-15,-20,46,-20,-17,-25,-25,-23,-20,-20,-18,-20,3,-3,4,-40,-25,-13;
MA   /M: SY='G'; M=14,-20,-3,-13,-24,-17,-17,51,-20,-32,-24,-17,-17,-27,-17,3,-14,-20,-27,-22;
MA   /M: SY='M'; M=-13,-13,-8,0,-27,1,-5,-24,-9,4,5,-12,9,-15,-17,-13,-10,-26,-7,-3;
MA   /M: SY='N'; M=-5,-10,22,0,-20,-13,-13,12,-8,-14,-21,-10,-13,-18,-22,3,-6,-32,-20,-11;
MA   /M: SY='G'; M=1,-16,-4,-12,-19,-12,-12,15,-16,-16,-8,-19,-9,-15,-19,7,-2,-28,-18,-11;
MA   /M: SY='A'; M=21,-16,-8,-15,-10,-12,-12,-8,-19,-3,-12,-13,-8,-14,-16,16,8,-31,-17,10;
MA   /M: SY='F'; M=-15,-20,-25,-35,-20,-30,-25,-30,-20,10,30,-30,10,46,-30,-25,-10,-5,15,5;
MA   /M: SY='F'; M=-20,-15,-20,-30,-25,-25,-25,-30,0,0,5,-20,0,55,-30,-20,-10,20,55,-5;
MA   /M: SY='L'; M=-9,-15,-18,-27,-18,-16,-19,-25,-14,9,21,-20,20,15,-23,-14,4,-17,3,6;
MA   /M: SY='G'; M=-2,-15,-7,-13,-20,-13,-15,7,-9,-12,-16,-15,-10,-8,-21,6,-2,-16,2,-5;
MA   /M: SY='P'; M=-4,-8,-4,-2,-25,8,10,-16,-10,-19,-22,-4,-14,-25,19,9,9,-30,-19,-19;
MA   /M: SY='M'; M=-5,-12,-10,-20,-20,-7,-17,1,-10,-2,0,-12,22,-10,-17,-4,4,-23,-10,-2;
MA     /I: MI=0; I=-1; MD=0; /M: SY='X'; M=0; D=-1;
MA   /M: SY='C'; M=-4,-17,-6,-9,44,-9,1,-18,-17,-27,-23,-14,-20,-22,-21,8,-1,-42,-24,-15;
MA   /M: SY='C'; M=-16,-21,-25,-29,33,-21,-27,-28,-14,-18,-14,-21,-14,3,-34,-20,-15,23,19,-15;
MA   /M: SY='D'; M=-9,-4,24,29,-24,6,24,-9,1,-28,-27,2,-22,-28,-10,9,-2,-37,-20,-27;
MA   /M: SY='E'; M=-7,4,0,-3,-24,38,11,-20,1,-17,-17,4,-3,-31,-10,6,9,-23,-10,-21;
MA   /M: SY='M'; M=-10,-18,-23,-32,-22,-11,-22,-28,-13,27,29,-21,38,3,-23,-23,-10,-20,0,15;
MA   /M: SY='D'; M=-14,-13,14,45,-30,-6,7,15,-6,-40,-30,-6,-27,-37,-13,0,-13,-34,-23,-30;
MA   /M: SY='I'; M=-7,-27,-23,-37,-24,-23,-30,-37,-30,44,17,-27,17,0,-23,-17,-7,-23,-3,36;
MA   /M: SY='W'; M=-13,-3,-16,-16,-34,24,-1,-22,-6,-14,-9,-5,-2,-18,-19,-16,-16,29,3,-24;
MA   /M: SY='F'; M=-16,-7,-11,-22,-24,5,-8,-26,-7,-9,-3,-12,0,26,-21,-11,-10,-3,12,-13;
MA   /M: SY='D'; M=-1,-17,-6,6,-22,-13,-7,-3,-15,-12,-3,-15,-8,-18,-19,-7,-9,-28,-16,-5;
MA   /M: SY='N'; M=-4,-8,27,3,-19,-8,-8,10,-7,-22,-24,-8,-17,-19,-17,11,11,-32,-19,-21;
MA     /I: MI=0; I=-1; MD=0; /M: SY='X'; M=0; D=-1;
MA   /M: SY='R'; M=-11,11,-8,-15,-28,-5,-8,-23,-15,-9,-13,0,-7,-16,9,-4,5,-25,-13,-9;
MA   /M: SY='H'; M=11,-1,1,-7,-20,2,-1,-10,14,-22,-21,5,-9,-22,-12,5,-4,-26,-8,-14;
MA   /M: SY='I'; M=-4,-12,-2,-14,-20,-2,-11,-21,-13,9,-6,-12,1,-14,-19,2,0,-29,-11,8;
MA   /M: SY='A'; M=10,-11,-1,-4,-24,-1,7,8,8,-26,-19,-8,-13,-25,-12,1,-11,-24,-15,-20;
MA   /M: SY='L'; M=-11,-22,-25,-35,-21,-27,-27,-32,-24,24,25,-28,13,23,-28,-21,-8,-15,5,20;
MA   /M: SY='N'; M=-7,-9,14,13,-17,-9,-4,-15,-10,-12,-16,-7,-14,-18,-16,7,14,-35,-15,-5;
MA   /M: SY='P'; M=-10,-18,-22,-17,-30,-15,-12,-25,-13,-3,-13,-12,-8,-8,31,-12,-8,-15,2,-5;
MA   /M: SY='H'; M=-18,-2,15,0,-28,3,-5,-18,58,-20,-18,-8,-5,-8,-22,-8,-13,-18,25,-25;
MA     /I: MI=0; I=-1; MD=0; /M: SY='X'; M=0; D=-1;
MA   /M: SY='D'; M=-12,-10,26,37,-27,-5,5,12,-2,-35,-30,-5,-25,-32,-15,3,-10,-35,-22,-30;
MA   /M: SY='S'; M=1,-8,-6,-9,-18,-7,-6,-14,-17,-9,-20,-3,-11,-18,2,11,5,-33,-18,0;
MA   /M: SY='N'; M=0,-18,6,-1,-25,-12,-11,5,-14,-8,-14,-14,-9,-20,-17,-2,-9,-27,-17,-9;
MA   /M: SY='G'; M=-5,4,0,-10,-30,-12,-15,46,-15,-37,-27,-7,-17,-27,-20,-3,-17,-20,-25,-27;
MA   /M: SY='C'; M=-7,-17,-5,-15,49,-12,-15,-20,7,-27,-23,-20,-15,-20,-27,3,-5,-42,-15,-15;
MA   /M: SY='G'; M=-8,-14,-2,-2,-30,3,-6,6,-12,-11,-14,-12,-6,-26,-16,-5,-13,-23,-16,-14;
MA   /M: SY='W'; M=-15,5,-16,-20,-35,-8,-10,-22,-18,-22,-20,14,-12,1,-20,-22,-17,41,10,-20;
MA   /M: SY='N'; M=-5,-8,22,0,-15,-10,-10,-13,-8,-5,-15,-8,-10,-12,-20,8,13,-35,-15,-2;
MA   /M: SY='P'; M=-10,-14,-2,-4,-32,-3,9,-22,-12,-5,-15,-8,-10,-21,21,-7,-8,-29,-18,-15;
MA   /M: SY='Y'; M=-10,-13,-13,-12,-28,-8,-10,-20,2,-10,-15,-10,-10,2,7,-3,-3,-3,26,-15;
MA     /I: MI=0; I=-1; MD=0; /M: SY='X'; M=0; D=-1;
MA   /M: SY='Y'; M=-10,-15,-13,-16,-26,-13,-11,-20,-6,-11,-14,-14,-11,10,9,-3,-3,-7,16,-14;
MA   /M: SY='Y'; M=-18,-15,-22,-28,-25,-21,-23,-30,-1,5,15,-21,5,40,-30,-22,-10,12,47,-2;
MA   /M: SY='G'; M=-3,-14,0,-1,-30,-8,3,44,-14,-37,-27,-11,-20,-30,-14,0,-17,-23,-27,-30;
MA   /M: SY='F'; M=-14,-5,-19,-23,-28,-17,-13,-25,-17,-5,4,-14,-2,9,8,-18,-10,-16,-4,-10;
MA   /M: SY='D'; M=-11,-15,11,32,-30,-9,1,28,-9,-40,-30,-9,-25,-35,-15,0,-15,-31,-25,-30;
MA     /I: MI=0; I=-1; MD=0; /M: SY='X'; M=0; D=-1;
MA   /M: SY='L'; M=-9,-17,-17,-4,-19,-18,-13,-25,-18,8,15,-19,4,-6,-25,-15,-7,-29,-9,14;
MA   /M: SY='D'; M=0,-11,19,38,-23,-3,9,-6,-4,-29,-25,-3,-23,-32,-12,4,-6,-35,-20,-22;
MA   /M: SY='T'; M=0,-8,17,0,-12,-5,-5,-10,-10,-15,-20,-8,-15,-15,-12,23,30,-35,-15,-10;
MA   /M: SY='G'; M=-2,-4,-3,-9,-22,-7,-7,3,-16,-19,-14,0,-9,-18,-16,3,2,-25,-15,-13;
MA   /M: SY='K'; M=-7,10,0,2,-30,5,14,3,-10,-33,-28,22,-15,-30,-10,-5,-13,-22,-18,-25;
MA   /M: SY='F'; M=-8,-10,-16,-21,-21,-19,-15,-25,-21,-3,-6,-8,-4,9,-3,-7,3,-18,-3,4;
MA   /M: SY='F'; M=-2,-15,-11,-23,-20,-17,-16,-20,11,-7,4,-20,1,19,-23,-12,-10,-12,11,-6;
MA   /M: SY='T'; M=-2,-8,14,-3,-12,-8,-8,-15,-13,-12,-15,-8,-12,-12,-12,18,38,-32,-12,-7;
MA   /M: SY='V'; M=-7,-19,-26,-29,-19,-23,-27,-32,-17,26,9,-19,9,8,-28,-14,-4,-13,15,31;
MA   /M: SY='V'; M=11,-22,-23,-29,-14,-23,-25,-24,-27,23,7,-19,7,-5,-23,-7,-2,-26,-11,33;
MA   /M: SY='T'; M=-8,-17,-10,-25,-17,-20,-20,-27,-22,7,2,-20,0,16,-18,0,20,-17,3,7;
MA   /M: SY='Q'; M=-13,5,5,28,-30,12,38,-17,-2,-33,-25,17,-20,-33,-5,-2,-10,-30,-18,-28;
MA   /M: SY='W'; M=-20,-18,-25,-35,-30,-28,-28,-27,-13,-5,0,-23,-5,50,-30,-25,-15,51,42,-10;
MA   /M: SY='D'; M=-12,-4,11,21,-27,11,20,-16,0,-21,-13,-1,-13,-26,-13,-3,-8,-31,-15,-23;
MA   /M: SY='A'; M=18,-17,-9,-14,-18,-10,-7,-12,-20,-13,-15,-10,-13,-19,16,8,14,-26,-19,-8;
MA   /M: SY='N'; M=-1,-11,17,13,-20,-5,-2,16,-7,-29,-30,-9,-22,-26,-14,17,1,-35,-23,-22;
MA     /I: MI=0; I=-1; MD=0; /M: SY='X'; M=0; D=-1;
MA   /M: SY='K'; M=1,5,-4,-8,-23,0,-2,-15,-5,-18,-20,16,-10,-13,-14,1,-3,-13,6,-13;
MA   /M: SY='I'; M=-10,-27,-23,-37,-27,-20,-27,-37,-27,42,28,-30,20,3,-23,-23,-10,-20,0,25;
MA   /M: SY='H'; M=-10,-1,3,-7,-22,-3,-5,-19,7,-11,-3,-1,-2,-12,-19,-6,2,-27,-4,-12;
MA   /M: SY='R'; M=-20,32,-6,-14,-34,3,-7,-20,20,-28,-20,9,-10,-13,-22,-17,-17,14,7,-25;
MA   /M: SY='Y'; M=0,-7,-13,-19,-23,-11,-13,-20,-3,-8,-7,-2,-5,13,-21,-10,-7,4,28,-8;
MA   /M: SY='Y'; M=-10,-15,-25,-25,-20,-20,-25,-30,-4,14,5,-15,5,16,-30,-15,-5,1,37,19;
MA   /M: SY='V'; M=-9,-14,-13,1,-21,-11,1,-24,-15,1,6,-12,-2,-12,-20,-11,-7,-30,-11,7;
MA     /I: MI=0; I=-1; MD=0; /M: SY='X'; M=0; D=-1;
MA   /M: SY='V'; M=-5,-18,-6,-6,-22,-16,-15,-4,-20,-2,-8,-16,-6,-16,-18,-1,3,-27,-14,4;
MA   /M: SY='I'; M=-12,-22,-22,-34,-23,-25,-28,-34,-18,24,12,-24,9,24,-26,-18,-8,-6,20,20;
MA   /M: SY='Q'; M=-8,-8,-10,-6,-25,7,10,-22,-10,-11,-4,-6,-5,-19,5,-4,2,-26,-13,-15;
MA   /M: SY='Q'; M=-3,-3,0,-4,-23,3,9,-17,-12,-10,-18,4,-10,-20,-10,8,1,-29,-14,-9;
MA   /M: SY='A'; M=8,-16,-10,-17,-23,-11,-9,-14,7,-15,-13,-14,-8,-4,6,-4,-9,-19,-4,-13;
MA     /I: MI=0; I=-1; MD=0; /M: SY='X'; M=0; D=-1;
MA   /M: SY='N'; M=-14,-9,24,12,-26,-2,-3,-15,18,-10,-17,-9,-9,-19,-18,-3,-8,-33,-7,-17;
MA   /M: SY='T'; M=7,-8,-6,-14,-16,6,-6,-16,-11,-7,-7,-6,4,-16,-12,7,16,-24,-10,-4;
MA   /M: SY='I'; M=2,-17,-15,-25,-24,0,-13,-25,-17,16,11,-17,9,-11,-18,-11,-8,-20,-6,6;
MA   /M: SY='N'; M=-6,-13,13,5,-20,-7,-5,-15,-11,-6,-15,-11,-11,-17,-14,9,10,-33,-13,-6;
MA   /M: SY='D'; M=5,-14,1,15,-17,-9,-1,-10,-13,-14,-17,-8,-15,-23,-14,8,0,-34,-18,-2;
MA     /I: MI=0; I=-1; MD=0; /M: SY='X'; M=0; D=-1;
MA   /M: SY='L'; M=-4,-16,-18,-22,-14,-18,-18,-26,-22,10,17,-20,6,0,-22,-5,16,-26,-6,15;
MA   /M: SY='A'; M=12,-12,1,10,-18,-2,10,-10,-12,-21,-17,-4,-17,-24,-8,8,6,-29,-18,-13;
MA   /M: SY='N'; M=-4,-10,6,-2,-20,-8,0,-1,-12,-13,-15,-8,-11,-18,-17,3,2,-30,-18,-7;
MA   /M: SY='M'; M=-12,-7,-15,-27,-25,-10,-17,-26,-13,11,6,-5,22,10,-20,-18,-10,-14,4,5;
MA   /M: SY='G'; M=-7,-5,-4,-1,-32,-2,9,11,-13,-31,-28,4,-18,-30,11,-4,-13,-25,-23,-28;
MA   /M: SY='K'; M=-3,6,-5,-7,-21,8,0,-18,-11,-12,-19,13,-6,-23,-15,3,-1,-27,-12,-3;
MA   /M: SY='A'; M=25,-18,-12,-20,-12,-12,-12,-11,-20,-3,4,-15,-3,-11,-15,3,9,-22,-13,2;
MA   /M: SY='P'; M=-8,-18,-19,-17,-27,-13,-9,-23,-20,-4,1,-17,-4,-12,28,-10,3,-27,-16,-10;
MA     /I: MI=0; I=-1; MD=0; /M: SY='X'; M=0; D=-1;
MA   /M: SY='E'; M=-4,7,2,-6,-22,12,0,-3,-7,-22,-21,0,-10,-24,-13,10,8,-26,-15,-18;
MA   /M: SY='W'; M=-8,-20,-16,-18,-38,-17,-18,23,-23,-30,-27,-17,-20,-19,5,-13,-20,24,-14,-30;
MA   /M: SY='M'; M=-5,-15,-18,-25,-20,-13,-23,0,-13,8,5,-15,26,-8,-23,-12,-10,-23,-10,11;
MA   /M: SY='V'; M=-8,-5,-9,5,-20,-12,-6,-22,-17,-4,-11,3,-6,-18,-20,-7,-5,-30,-13,11;
MA   /M: SY='L'; M=-7,-23,-27,-33,-20,-23,-25,-33,-25,30,32,-27,17,5,-27,-22,-7,-23,-3,26;
MA   /M: SY='M'; M=5,-17,-16,-23,-18,-17,-22,4,-18,2,-2,-16,11,-11,-22,-7,-8,-23,-15,10;
MA   /M: SY='M'; M=-12,-14,-23,-32,-20,-12,-22,-24,-9,17,26,-19,39,16,-24,-23,-10,-15,5,8;
MA   /M: SY='S'; M=5,-7,23,5,-13,0,0,0,-5,-20,-30,-7,-20,-20,-13,32,15,-40,-20,-15;
MA   /M: SY='L'; M=-7,-23,-19,-28,-26,-20,-23,-6,-23,13,19,-27,9,-4,-24,-19,-13,-20,-8,6;
MA   /M: SY='W'; M=-1,-20,-32,-35,-39,-17,-25,-15,-27,-17,-17,-17,-17,2,-25,-26,-22,104,16,-22;
MA   /M: SY='N'; M=-10,-7,22,10,-24,-6,-4,-10,-6,-20,-23,-6,-19,-16,-17,3,7,-7,-9,-22;
MA   /M: SY='G'; M=-10,-15,10,28,-30,-10,-1,32,-10,-40,-30,-10,-25,-35,-15,0,-15,-30,-25,-30;
MA   /M: SY='T'; M=7,-9,4,-5,-18,-4,1,-13,-6,-13,-13,-5,-12,-11,-13,6,9,-20,-2,-11;
MA     /I: MI=0; I=-1; MD=0; /M: SY='X'; M=0; D=-1;
MA   /M: SY='G'; M=0,-16,-7,-15,-24,-17,-7,14,-16,-22,-13,-16,-13,3,-18,-4,-12,-13,-9,-17;
MA   /M: SY='N'; M=-4,-9,11,-4,-17,-9,-10,-12,-3,-5,-15,-9,-10,-6,-21,9,4,-25,0,-2;
MA   /M: SY='M'; M=-12,-7,12,7,-22,0,-5,-12,3,-5,-6,-5,16,-15,-18,-7,-7,-30,-10,-11;
MA   /M: SY='P'; M=-12,-10,5,12,-29,2,4,-16,-6,-18,-16,-6,-13,-25,14,-6,-8,-31,-18,-24;
MA   /M: SY='W'; M=-20,-20,-40,-40,-50,-20,-30,-20,-30,-20,-20,-20,-20,10,-30,-40,-30,150,30,-30;
MA   /M: SY='L'; M=-13,-17,-16,-1,-23,-14,-8,-24,-14,3,27,-21,6,-4,-24,-21,-10,-26,-6,-2;
MA   /M: SY='D'; M=-13,-15,4,29,-33,-7,5,5,-10,-34,-30,-7,-25,-35,17,-3,-12,-33,-25,-30;
MA     /I: MI=0; I=-1; MD=0; /M: SY='X'; M=0; D=-1;
MA   /M: SY='G'; M=0,-20,0,-10,-30,-20,-20,70,-20,-40,-30,-20,-20,-30,-20,0,-20,-20,-30,-30;
MA   /M: SY='A'; M=19,-14,2,-11,-12,-10,-10,-7,-15,-5,-14,-10,-10,-16,-16,12,5,-30,-18,4;
MA   /M: SY='P'; M=-12,-13,-16,-6,-36,-3,9,-22,-6,-18,-21,-6,-15,-16,43,-10,-10,-16,-2,-25;
MA   /M: SY='N'; M=8,8,15,-2,-17,0,-2,-4,-5,-20,-23,1,-16,-20,-14,15,4,-31,-18,-15;
MA   /M: SY='N'; M=-10,-7,24,24,-27,0,13,10,-1,-31,-28,-2,-22,-29,-13,3,-9,-33,-22,-30;
MA   /M: SY='V'; M=19,-23,-18,-29,-17,-18,-21,-20,-26,19,4,-19,4,-9,-18,-4,-3,-22,-11,21;
MA   /M: SY='A'; M=5,-9,2,-10,-19,3,-6,-14,-8,-3,-3,-8,-2,-16,-19,-2,-4,-26,-13,-4;
MA     /I: MI=0; I=-1; MD=0; /M: SY='X'; M=0; D=-1;
MA   /M: SY='N'; M=2,10,14,3,-21,4,10,-8,-3,-22,-23,5,-16,-22,-13,9,0,-30,-18,-19;
MA   /M: SY='D'; M=-8,-11,8,14,-24,-7,5,5,-9,-21,-20,-7,-16,-25,-16,0,-9,-31,-20,-14;
MA   /M: SY='A'; M=21,-9,7,-6,-16,7,-1,-4,-7,-16,-19,-4,-12,-24,-12,14,2,-27,-18,-13;
MA   /M: SY='P'; M=3,-15,1,-5,-29,-8,-2,-11,-13,-18,-26,-8,-18,-26,43,-1,-6,-30,-26,-24;
MA   /M: SY='N'; M=-12,-6,13,-3,-27,3,-2,-15,-3,-16,-20,-6,-12,-7,6,-3,-6,-24,-10,-24;
MA   /M: SY='T'; M=5,-16,-13,-17,-18,-12,-10,-19,-20,-5,0,-15,-5,-11,5,1,15,-26,-13,-4;
MA   /M: SY='H'; M=-16,5,-1,-5,-30,15,1,-22,43,-21,-17,5,-2,-14,-18,-10,-14,-12,22,-23;
MA   /M: SY='V'; M=7,-18,-15,-6,-14,-19,-15,-19,-21,6,-3,-13,-3,-13,-21,-3,-2,-30,-14,21;
MA   /M: SY='T'; M=-2,-9,-11,-15,-15,-1,-11,-24,-17,3,-5,-9,0,-13,-18,4,17,-28,-10,12;
MA   /M: SY='Y'; M=-20,-14,-20,-27,-26,-21,-24,-30,6,0,4,-17,0,48,-30,-20,-10,23,62,-6;
MA   /M: SY='S'; M=1,-10,10,14,-15,-2,3,-6,-10,-23,-26,-8,-20,-23,-10,27,19,-38,-18,-13;
MA   /M: SY='W'; M=-14,-9,15,-7,-33,-9,-13,-9,-8,-20,-26,-9,-20,-7,-24,-12,-13,45,2,-30;
MA   /M: SY='I'; M=-3,-21,-18,-27,-19,-18,-22,-27,-24,24,13,-23,9,-2,-22,-7,-1,-27,-7,22;
MA   /M: SY='R'; M=-16,34,-4,-12,-28,-1,-2,-22,-9,-24,-18,26,-8,-3,-18,-12,-10,-14,-1,-16;
MA   /M: SY='W'; M=-18,-13,-23,-22,-35,-13,-8,-24,-10,-14,-10,-13,-12,19,-24,-23,-17,57,31,-20;
MA   /M: SY='G'; M=11,-18,-2,-12,-21,-16,-16,34,-20,-27,-21,-16,-16,-23,-16,7,1,-22,-23,-17;
MA   /M: SY='S'; M=-3,-8,4,10,-23,10,7,-9,-6,-23,-28,-4,-18,-29,9,16,3,-34,-20,-21;
//
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
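Each `/M:` line in the profile above gives twenty match scores for one profile position, one score per residue type. The Python sketch below parses those lines and computes the ungapped score of a peptide placed on consecutive positions, in the spirit of the Ala-Tyr example in the introduction. It assumes that the scores follow the residue order ARNDCQEGHILKMFPSTWYV. That order is inferred from the example (the SY='G' rows peak in the eighth column, the SY='W' rows in the eighteenth) and is not taken from the profile header. The helper names are hypothetical.

```python
import re

# Assumed order of the twenty M= scores (inferred from the example profile).
ALPHABET = "ARNDCQEGHILKMFPSTWYV"
MATCH_RE = re.compile(r"/M: SY='(\w)'; M=([-\d,]+);")


def parse_match_rows(lines):
    """Return (consensus symbol, {residue: score}) for each 20-score /M: line."""
    rows = []
    for line in lines:
        match = MATCH_RE.search(line)
        if not match:
            continue
        scores = [int(value) for value in match.group(2).split(",")]
        if len(scores) != len(ALPHABET):
            continue  # e.g. the SY='X' gap positions, which carry only M=0
        rows.append((match.group(1), dict(zip(ALPHABET, scores))))
    return rows


def ungapped_score(rows, start, peptide):
    """Sum the scores of `peptide` placed on rows start, start + 1, ..."""
    return sum(rows[start + i][1][residue] for i, residue in enumerate(peptide))


# Three consecutive positions (G, A, P) copied from the example profile.
example = [
    "MA   /M: SY='G'; M=0,-20,0,-10,-30,-20,-20,70,-20,-40,-30,-20,-20,-30,-20,0,-20,-20,-30,-30;",
    "MA   /M: SY='A'; M=19,-14,2,-11,-12,-10,-10,-7,-15,-5,-14,-10,-10,-16,-16,12,5,-30,-18,4;",
    "MA   /M: SY='P'; M=-12,-13,-16,-6,-36,-3,9,-22,-6,-18,-21,-6,-15,-16,43,-10,-10,-16,-2,-25;",
]
rows = parse_match_rows(example)
print(ungapped_score(rows, 0, "GAP"))  # 70 + 19 + 43 = 132
print(ungapped_score(rows, 0, "AAA"))  # 0 + 19 - 12 = 7
```

The gap positions are skipped, so the row indices in this sketch count match positions only. A real search of a sequence against the profile would also use the insertion and deletion scores in a dynamic-programming alignment.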
   

Note that insertions and deletions within the structurally conserved stretches are severely penalised (since there were none in the set of aligned structures!), whereas they may occur anywhere in between such stretches, i.e. at the positions with SY='X', I=-1 and D=-1 (since they occur there in (almost) all of the input structures!).
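The division into conserved stretches can be read directly from the profile, because the cheap-gap `/I:` lines sit between the stretches. Below is a minimal Python sketch that recovers the consensus sequence of each conserved stretch. It assumes that every MA line containing `/I:` marks such a cheap-gap position and that all other `/M:` lines are positions inside a stretch. The function name `conserved_stretches` is hypothetical.

```python
import re

SYMBOL_RE = re.compile(r"/M: SY='(\w)'")


def conserved_stretches(lines):
    """Return the consensus sequence of each run of positions between /I: lines."""
    stretches, current = [], []
    for line in lines:
        if "/I:" in line:
            # A cheap-gap position closes the stretch collected so far.
            if current:
                stretches.append("".join(current))
            current = []
            continue
        match = SYMBOL_RE.search(line)
        if match:
            current.append(match.group(1))
    if current:
        stretches.append("".join(current))
    return stretches


# Positions near the end of the example profile (score lists elided here).
example = [
    "MA   /M: SY='W'; M=...;",
    "MA   /M: SY='L'; M=...;",
    "MA   /M: SY='D'; M=...;",
    "MA     /I: MI=0; I=-1; MD=0; /M: SY='X'; M=0; D=-1;",
    "MA   /M: SY='G'; M=...;",
    "MA   /M: SY='A'; M=...;",
    "MA   /M: SY='P'; M=...;",
    "MA   /M: SY='N'; M=...;",
    "MA   /M: SY='N'; M=...;",
    "MA   /M: SY='V'; M=...;",
    "MA   /M: SY='A'; M=...;",
    "MA     /I: MI=0; I=-1; MD=0; /M: SY='X'; M=0; D=-1;",
]
print(conserved_stretches(example))  # ['WLD', 'GAPNNVA']
```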


9 SEQUENCE ALIGNMENT FILE

STRUPRO also produces a file which contains the structure-based multiple-sequence alignment of the input models. Additional protein sequences can be added to this file and aligned, after which the extended alignment can be used with the program MSEQPRO to generate a profile based on this multiple-sequence alignment.

The format of this file is simple:
- lines beginning with an exclamation mark ("!") are ignored
- other lines represent one sequence each
- an empty line signals a break, and will reset the program's sequence counter to 1

Part of such a file may look as follows:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
!
! Sequence alignment file
! Created by STRUPRO V. 971103/1.0 at Mon Nov 3 16:03:29 1997 for user gerard
!
! REMARK 1ayh.pdb CHAIN B
! REMARK 1lte.pdb CHAIN C
! REMARK 1eg1.pdb CHAIN D
! REMARK hieg.pdb CHAIN E

! NOT ALIGNED MOL 1 FROM PCA- 1 TO GLN- 28
! ASACTLQSETHPPLTWQKCSSGGTCTQQ
! NOT ALIGNED MOL 2 FROM GLN- 1 TO SER- 15
! QTGGSFFEPFNSYNS
! NOT ALIGNED MOL 3 FROM VAL- 1 TO ASN- 15
! VETISFSFSEFEPGN
! NOT ALIGNED MOL 4 FROM PYR- 1 TO GLN- 28
! AQPGTSTPEVHPKLTTYKCTKSGGCVAQ
! NOT ALIGNED MOL 5 FROM PCA- 1 TO ALA- 27
! AKPGETKEVHPQLTTFRCTKRGGCKPA
! ALIGNED MOL 1 FROM THR- 29 TO ASP- 35
TGSVVID-
! ALIGNED MOL 2 FROM GLY- 16 TO ASP- 22
GTWEKAD-
! ALIGNED MOL 3 FROM ASP- 16 TO GLY- 22
DNLTLQG-
! ALIGNED MOL 4 FROM ASP- 29 TO ASP- 35
DTSVVLD-
! ALIGNED MOL 5 FROM THR- 28 TO ASP- 34
TNFIVLD-

! NOT ALIGNED MOL 1 FROM ALA- 36 TO THR- 81
! ANWRWTHATNSSTNCYDGNTWSSTLCPDNETCAKNCCLDGAAYAST
! NOT ALIGNED MOL 2 FROM GLY- 23 TO ALA- 36
! GYSNGGVFNCTWRA
! NOT ALIGNED MOL 3 FROM ALA- 23 TO GLY- 22
!
! NOT ALIGNED MOL 4 FROM TRP- 36 TO ALA- 79
! WNYRWMHDANYNSCTVNGGVNTTLCPDEATCGKNCFIEGVDYAA
! NOT ALIGNED MOL 5 FROM SER- 35 TO GLN- 83
! SLSHPIHRAEGLGPGGCGDWGNPPPKDVCPDVESCAKNCIMEGIPDYSQ
! ALIGNED MOL 1 FROM TYR- 82 TO SER- 87
YGVTTS-
! ALIGNED MOL 2 FROM ASN- 37 TO THR- 42
NNVNFT-
! ALIGNED MOL 3 FROM ALA- 23 TO GLN- 28
ASLITQ-
! ALIGNED MOL 4 FROM SER- 80 TO SER- 85
SGVTTS-
! ALIGNED MOL 5 FROM TYR- 84 TO ASN- 89
YGVTTN-

[...]
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
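A file in this format can be read back with a few lines of code. The Python sketch below follows the three rules listed above. It additionally assumes that segments carrying the same sequence number are simply concatenated, which is an interpretation of the format rather than documented behaviour of STRUPRO or MSEQPRO. The function name `read_alignment` is hypothetical.

```python
def read_alignment(text):
    """Rebuild aligned sequences from a STRUPRO-style alignment file (sketch).

    Rules: '!' lines are comments; any other non-empty line is the next segment
    of sequence 1, 2, 3, ...; an empty line resets the counter to 1.
    Assumption: segments with the same sequence number are concatenated.
    """
    sequences = []
    counter = 0  # 0-based number of the sequence that receives the next segment
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("!"):
            continue
        if not line:
            counter = 0  # an empty line starts a new block
            continue
        if counter == len(sequences):
            sequences.append("")
        sequences[counter] += line
        counter += 1
    return sequences


# Abridged from the example file above (two molecules, two blocks).
example = """!
! Sequence alignment file
!

! ALIGNED MOL 1 FROM THR- 29 TO ASP- 35
TGSVVID-
! ALIGNED MOL 2 FROM GLY- 16 TO ASP- 22
GTWEKAD-

! ALIGNED MOL 1 FROM TYR- 82 TO SER- 87
YGVTTS-
! ALIGNED MOL 2 FROM ASN- 37 TO THR- 42
NNVNFT-
"""
print(read_alignment(example))  # ['TGSVVID-YGVTTS-', 'GTWEKAD-NNVNFT-']
```

Because the unaligned stretches are written as "!" comment lines, they are ignored by this reader, and only the structurally aligned segments end up in the rebuilt sequences.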


10 KNOWN BUGS

None, at present (touch wood).


11 UNKNOWN BUGS

Does not compute.


Uppsala Software Factory Created at Fri Dec 18 19:42:30 1998 by MAN2HTML version 971024/1.6