Uppsala Software Factory

Uppsala Software Factory - CT2HET Manual


1 CT2HET - GENERAL INFORMATION

Program : CT2HET
Version : 970725
Author : Gerard J. Kleywegt, Dept. of Cell and Molecular Biology, Uppsala University, Biomedical Centre, Box 590, SE-751 24 Uppsala, SWEDEN
E-mail : gerard@xray.bmc.uu.se
Purpose : Finding hetero compounds in OMAC/hetero.pdb
Package : X-UTIL


2 REFERENCES

Reference(s) for this program:

* 1 * G.J. Kleywegt (1992-1999). Uppsala University, Uppsala, Sweden. Unpublished program.

* 2 * G.J. Kleywegt & T.A. Jones (1999 ?). Chapter 25.2.6. O and associated programs. Int. Tables for Crystallography, Volume F. To be published.


3 VERSION HISTORY

960513 - 0.1 - first version
960517 - 0.2 - minor changes
960629 - 0.3 - allow "**" to mean *any* type of atom; allow definition of dihedrals and impropers to distinguish cis/trans, R/S chiral centres, flat/puckered rings/groups, etc.
970725 - 0.4 - minor bug fix


4 DESCRIPTION

This is a little jiffy program that may be of help when you are looking for all PDB entries that contain a certain type of hetero compound, or when you want to see if there is any hetero compound in the PDB that is a superstructure of your particular one (for instance, if you want to generate O or X-PLOR dictionaries).

The program requires two input files:
- a description of a hetero compound in connection table format (either generated by something like ChemDraw, or created by editing a file);
- the collection of hetero compounds available from Uppsala (OMAC/hetero.pdb).

The nice thing is that you only have to specify atom types and bonds in your connection table file. For example, to define a six-membered ring in which one atom is an oxygen:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
C5O1 six-ring
  6  6
   -0.7114    0.4317    0.0000 C
   -1.2114   -0.4339    0.0000 C
   -0.2511   -0.1608    0.0000 C
    0.7137   -0.4339    0.0000 O
    1.2137    0.4317    0.0000 C
    0.2533    0.1586    0.0000 C
  1  2  1  1
  2  3  1  1
  3  4  1  1
  4  5  1  1
  5  6  1  1
  1  6  1  1
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

The format of this file for CT2HET is as follows:
- line 1: title
- line 2: number of atoms and number of bonds (free format)
- for each atom, a line which may contain anything, but the last two characters of which must be the chemical element symbol of the atom ("CU" for copper, " C" for carbon, etc.)
- for each bond, the numbers of the two atoms between which the bond exists

Hence, a minimalist version of the above file would be:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
C5O1 six-ring
 6 6
 C
 C
 C
 O
 C
 C
 1 2
 2 3
 3 4
 4 5
 5 6
 1 6
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
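
The format rules listed above can be illustrated with a short Python sketch. It is not part of CT2HET; the function name `parse_ct` is hypothetical, and the sketch assumes that only the first two numbers on each bond line matter and that atom lines carry no trailing blanks after the element symbol.

```python
def parse_ct(text):
    """Parse a CT2HET-style connection table into (title, elements, bonds).

    Illustrative sketch only; impropers (version 0.3 onward) are ignored.
    """
    lines = text.splitlines()
    title = lines[0].strip()

    # Line 2: number of atoms and number of bonds, free format.
    counts = lines[1].split()
    n_atoms, n_bonds = int(counts[0]), int(counts[1])

    # Atom lines may contain anything; only the last two characters
    # (the element symbol, e.g. " C" or "CU") are used.
    elements = [line[-2:].strip().upper() for line in lines[2:2 + n_atoms]]

    # Bond lines: the first two numbers are the (1-based) atom numbers.
    bonds = []
    for line in lines[2 + n_atoms:2 + n_atoms + n_bonds]:
        fields = line.split()
        bonds.append((int(fields[0]), int(fields[1])))
    return title, elements, bonds


example = """C5O1 six-ring
 6 6
 C
 C
 C
 O
 C
 C
 1 2
 2 3
 3 4
 4 5
 5 6
 1 6
"""
title, elements, bonds = parse_ct(example)
print(title, elements, bonds)
```

Because only the tail of each atom line and the head of each bond line are read, the same sketch accepts both the full ChemDraw-style file and the minimalist version.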
   

To look for amino-acid-like compounds, use:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
alanine
5 4
 N
 C
 C
 C
 O
1 2
2 3
2 4
4 5
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

To look for compounds which contain a peptide link, use:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
dipeptide
8 7
 N
 C
 C
 O
 N
 C
 C
 O
1 2
2 3
3 4
3 5
5 6
6 7
7 8
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

To look for epoxides (with a bonded-distance cut-off of 1.6 A), use:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
EPOXIDE
3 3
 C
 C
 O
1 2
2 3
3 1
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

From version 0.3 onward, you may use the wildcard "**" or " *" to indicate *any* type of atom. For example, to look for six-membered rings C6, C5O1, C5S1, C5N1, ... use:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
C5X1 six-ring
 6 6
 C
 C
 C
**
 C
 C
 1 2
 2 3
 3 4
 4 5
 5 6
 1 6
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
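
This manual does not describe how CT2HET searches the library internally. As a rough illustration of what "a hetero compound that is a superstructure of your fragment" means, here is a minimal Python sketch of a backtracking substructure search with wildcard support. The function name `find_match` and the data layout (element lists plus 1-based bond pairs) are assumptions made for this example only.

```python
def find_match(q_elems, q_bonds, t_elems, t_bonds):
    """Map every query atom onto a distinct target atom, or return None.

    Bonds are (i, j) pairs with 1-based atom numbers; the returned list
    holds 0-based target indices, one per query atom.
    """
    def adjacency(n, bonds):
        adj = [set() for _ in range(n)]
        for a, b in bonds:
            adj[a - 1].add(b - 1)
            adj[b - 1].add(a - 1)
        return adj

    q_adj = adjacency(len(q_elems), q_bonds)
    t_adj = adjacency(len(t_elems), t_bonds)

    def compatible(q, t):
        # "**" (or "*") in the query matches any element.
        return q_elems[q] in ("*", "**") or q_elems[q] == t_elems[t]

    mapping = []  # mapping[i] = target atom assigned to query atom i

    def extend():
        q = len(mapping)
        if q == len(q_elems):
            return True
        for t in range(len(t_elems)):
            if t in mapping or not compatible(q, t):
                continue
            # Every query bond to an already placed atom must also
            # exist in the target; extra target bonds are allowed.
            if all(mapping[p] in t_adj[t] for p in q_adj[q] if p < q):
                mapping.append(t)
                if extend():
                    return True
                mapping.pop()
        return False

    return list(mapping) if extend() else None


ring = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (1, 6)]
query = ["C", "C", "C", "**", "C", "C"]

# Six-ring with one oxygen plus an exocyclic oxygen on atom 1.
pyran_like = find_match(query, ring, ["C", "C", "C", "O", "C", "C", "O"],
                        ring + [(1, 7)])
# Ring with two oxygens: only four carbons, so C5X1 cannot match.
dioxane_like = find_match(query, ring, ["O", "C", "C", "O", "C", "C"], ring)
print(pyran_like, dioxane_like)
```

The search only requires that the query's bonds are present in the target, so larger compounds that contain the fragment are reported as hits.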
   

From version 0.3 onward, you may also add any number of dihedrals and/or impropers (treated equivalently; dihedrals are simply "proper impropers"). This can be used to distinguish cis/trans isomers, R/S chiral centres and to constrain the search to find only flat rings/groups, etc.
The number of impropers is specified on the second line of the input file, after the number of atoms and bonds. The impropers themselves follow after the bonds, one per line. Each line should contain the numbers of the four atoms defining the improper, the target value for the improper in degrees (taken modulo 360), and the tolerance in degrees.
For example, to find all compounds which contain a benzene ring with a sulfur atom attached to it, use:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

        C6---C5
       /       \
 S7--C1         C4
       \       /
        C2---C3

 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
S-benzyl moiety
 7 7 7
 C
 C
 C
 C
 C
 C
 S
 1 2
 2 3
 3 4
 4 5
 5 6
 1 6
 1 7
 1 2 3 4 0 10 flatness dihedral
 2 3 4 5 0 10 ,,
 3 4 5 6 0 10 ,,
 4 5 6 1 0 10 ,,
 5 6 1 2 0 10 ,,
 6 1 2 3 0 10 ,,
 1 2 6 7 0 10 flatness improper
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
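
To make the "target value (modulo 360) plus tolerance" test concrete, here is a small Python sketch that computes a torsion angle from four points and compares it with a target. It is not taken from CT2HET: the function names `dihedral` and `within_tolerance` are hypothetical, and the sign convention (the usual IUPAC one) is an assumption that may differ from the program's.

```python
import math


def dihedral(p1, p2, p3, p4):
    """Torsion angle in degrees (-180..180) defined by four 3D points."""
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of the two planes
    y = dot(b2, cross(n1, n2))
    x = math.sqrt(dot(b2, b2)) * dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def within_tolerance(value, target, tolerance):
    """True if value and target differ by at most tolerance, modulo 360."""
    diff = (value - target + 180.0) % 360.0 - 180.0
    return abs(diff) <= tolerance


# Four atoms in one plane (cis arrangement) give a torsion of 0 degrees,
# which satisfies a "0 10" flatness restraint.
flat = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
print(flat, within_tolerance(flat, 0, 10))
```

The modulo step means that a measured 355 degrees satisfies a target of 0 with a tolerance of 10, and that targets of -35 and 325 are equivalent.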
   

To find all compounds which contain a D-amino acid residue moiety, use:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
generic D amino acid
5 4 1
1 N
2 CA C
3 CB C
4 C
5 O
1 2
2 3
2 4
4 5
2 1 4 3 -35 15  only D form (+35 for L form)
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

One of the hits will be:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 -*- HIT -*- HIT -*- HIT -*- HIT -*-
 COMPND : ( D-VALINE)
REMARK  Extracted from PDB file 173d.pdb
REMARK  Formula C5 H11 N1 O2
REMARK  Nr of non-hydrogen atoms 7
REMARK  Residue type DVA
REMARK  Residue name 545
REMARK  Original residue name (for O) $A19
REMARK    2 RESOLUTION. 3.0  ANGSTROMS.                                  173D  17
REMARK  Compound also present in : 209D 2D55
 Matches :
   1  N  N   DVA   545
   2  C  CA  DVA   545
   3  C  CB  DVA   545
   4  C  C   DVA   545
   5  O  O   DVA   545
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   


5 EXAMPLE

If you do have coordinates for a compound, you can quickly obtain a connection table file by running PDB2CT:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
<254 capo.bmc.uu.se progs/utils> PDB2CT  < leo.pdb > leo.ct
All done
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

You have to edit the resulting file to fix the atom names.

When you have prepared a connection table file, simply run the program and answer the questions. In the example, we shall look for retinol-like compounds. The connection table file defines all atoms of retinol, except the hydroxyl group:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
PDB2CT
   20   20
   0.000   0.000   0.000 C
   1.546   0.000   0.000 C
   2.184   1.352   0.000 C
   1.185   2.458   0.000 C
  -0.265   1.962   0.000 C
  -0.592   0.633   0.000 C
  -1.975   0.108   0.000 C
  -3.125   0.793   0.000 C
  -4.457   0.230   0.000 C
  -5.503   1.081   0.000 C
  -6.890   0.658   0.000 C
  -7.844   1.594   0.000 C
  -9.255   1.274   0.000 C
 -10.155   2.288   0.000 C
 -11.622   2.188   0.000 C
   0.252  -1.761   0.000 C
   1.107   0.611   0.000 C
  -1.228   3.152   0.000 C
  -4.654  -1.291   0.000 C
  -9.677  -0.204   0.000 C
    1    2 1 1
    1    6 1 1
    1   16 1 1
    1   17 1 1
    2    3 1 1
    3    4 1 1
    4    5 1 1
    5    6 1 1
    5   18 1 1
    6    7 1 1
    7    8 2 1
    8    9 1 1
    9   10 2 1
    9   19 1 1
   10   11 1 1
   11   12 2 1
   12   13 1 1
   13   14 1 1
   13   20 1 1
   14   15 1 1
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
   

Using this as input yields:

      
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
 *** CT2HET *** CT2HET *** CT2HET *** CT2HET *** CT2HET *** CT2HET ***

 Version - 960517/0.2
 (C) 1993-6 Gerard J. Kleywegt, Dept. Mol. Biology, BMC, Uppsala (S)
 User I/O - routines courtesy of Rolf Boelens, Univ. of Utrecht (NL)
 Others - T.A. Jones, G. Bricogne, Rams, W.A. Hendrickson
 Others - W. Kabsch, CCP4, PROTEIN, E. Dodson, etc. etc.

 Started - Fri May 17 17:42:44 1996
 User - gerard
 Mode - interactive
 Host - ALPHA/OSF1
 ProcID - 12679
 Tty - /dev/ttyp4

 *** CT2HET *** CT2HET *** CT2HET *** CT2HET *** CT2HET *** CT2HET ***

 Reference(s) for this program:

 * 1 * G.J. Kleywegt, Uppsala University, Uppsala, Sweden, Unpublished program.

 *** CT2HET *** CT2HET *** CT2HET *** CT2HET *** CT2HET *** CT2HET ***

 Max nr of atoms : ( 150)
 Max nr of bonds : ( 150)

 To decide if two atoms are bonded, the program uses two distance cut-offs: one if both atoms are light elements (hydrogen through to neon), the other in all other cases (e.g., C-I bonds).
 Max bonded distance H-Ne ? ( 1.700)
 Max bonded distance others ? ( 2.200)

 Sometimes the first two characters of an atom name are not equal to the chemical element symbol. Usually, an extra character has been prefixed (e.g., AP or XC). The program can use this heuristic.
 Use atom name heuristic ? (Y)

 If you want, the HETATMs of the hits can be listed, ready for cutting and pasting into a PDB file.
 List atoms of hits ? (N)

 If you want, the matching atom names can be listed.
 List matching atom pairs ? (N)

 Connectivity table file ? (het.ct) rta.ct
 Title : (PDB2CT)
 Nr of atoms : ( 20)
 Nr of bonds : ( 20)
 1 2 1 6 1 16 1 17 2 3 3 4 4 5 5 6 5 18 6 7 7 8 8 9 9 10 9 19 10 11 11 12 12 13 13 14 13 20 14 15
 Connectivity table read
 Atom types : ( C C C C C C C C C C C C C C C C C C C C)

 Hetero compound library ? (/nfs/public/lib/hetero.pdb)
 ...
 -*- HIT -*- HIT -*- HIT -*- HIT -*-
 COMPND : ( CARBENOXOLONE)
 REMARK Extracted from PDB file 1hdc.pdb
 REMARK Formula C34 H50 O7
 REMARK Nr of non-hydrogen atoms 41
 REMARK Residue type CBO
 REMARK Residue name 385
 REMARK Original residue name (for O) $A301
 REMARK 2 RESOLUTION. 2.20 ANGSTROMS. 1HDC 25
 ...
 -*- HIT -*- HIT -*- HIT -*- HIT -*-
 COMPND : ( N-ETHYL RETINAMIDE)
 REMARK Extracted from PDB file 1erb.pdb
 REMARK Formula C22 H33 N1 O1
 REMARK Nr of non-hydrogen atoms 24
 REMARK Residue type ETR
 REMARK Residue name 576
 REMARK Original residue name (for O) $176
 REMARK 2 RESOLUTION. 1.9 ANGSTROMS. 1ERB 38
 ...
 COMPND : ( N-(4-HYDROXYPHENYL)ALL-TRANS RETINAMIDE)
 ...
 COMPND : ( RETINOIC ACID)
 ...
 COMPND : ( RETINOIC ACID (ALL-TRANS))
 ...
 COMPND : ( RETINAL)
 ...
 COMPND : ( RETINOL)
 ...

 Library scan done
 Nr of compounds checked : ( 1357)
 Nr of hits found : ( 7)

 *** CT2HET *** CT2HET *** CT2HET *** CT2HET *** CT2HET *** CT2HET ***

 Version - 960517/0.2
 Started - Fri May 17 17:42:44 1996
 Stopped - Fri May 17 17:43:48 1996

 CPU-time taken : User - 1.7 Sys - 0.3 Total - 2.0

 *** CT2HET *** CT2HET *** CT2HET *** CT2HET *** CT2HET *** CT2HET ***

 >>> This program (C) 1993-96, GJ Kleywegt & TA Jones <<<
 E-mail: "gerard@xray.bmc.uu.se" or "alwyn@xray.bmc.uu.se"

 *** CT2HET *** CT2HET *** CT2HET *** CT2HET *** CT2HET *** CT2HET ***
 ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----

All hits seem to make sense.
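
The two bonded-distance cut-offs that the program asks for in the run above can be sketched in Python as follows. This is an illustration of the rule as stated in the prompt text, not the program's actual code: the function name `is_bonded` is hypothetical, and whether CT2HET uses "less than" or "less than or equal" at the cut-off is an assumption.

```python
import math

# Hydrogen through neon, as upper-case element symbols.
LIGHT_ELEMENTS = {"H", "HE", "LI", "BE", "B", "C", "N", "O", "F", "NE"}


def is_bonded(elem_a, xyz_a, elem_b, xyz_b, max_light=1.7, max_other=2.2):
    """Decide whether two atoms are bonded from their distance (in A).

    The defaults are the values offered by the program prompts:
    1.7 A if both atoms are light elements, 2.2 A in all other cases.
    """
    both_light = (elem_a.upper() in LIGHT_ELEMENTS
                  and elem_b.upper() in LIGHT_ELEMENTS)
    cutoff = max_light if both_light else max_other
    return math.dist(xyz_a, xyz_b) <= cutoff


# A C-C pair at 1.54 A is bonded; a C-I pair at 2.1 A is accepted
# only because iodine triggers the larger cut-off.
print(is_bonded("C", (0, 0, 0), "C", (1.54, 0, 0)))
print(is_bonded("C", (0, 0, 0), "I", (2.1, 0, 0)))
```

Lowering `max_light` to 1.6, as suggested for the epoxide search, makes the light-element test stricter without affecting bonds to heavier atoms.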


6 KNOWN BUGS

None, at present ("peppar, peppar", i.e. touch wood).


Uppsala Software Factory Created at Fri Dec 18 19:42:07 1998 by MAN2HTML version 971024/1.6