NUCPLOT manual
N U C P L O T - v.1.0
Operating Manual
A program to generate schematic diagrams of protein-nucleic acid interactions
Written by Nicholas M Luscombe & Roman A Laskowski
Contents
1. Introduction
2. Installation of programs
3. How to run NUCPLOT
   a. running nucplot
   b. running nuconly
4. NUCPLOT outputs
5. Altering the appearance of the plot
   a. print options
   b. plot parameters
   c. numerical parameters
   d. colour parameters
   e. colour definitions
   f. non-standard bases
6. Editing the PostScript file
7. Missing bonds
   a. miscalculation of H-bonds by HBPLUS
   b. expected bond falls outside the criteria
   c. missing water contacts
   d. using the *.bond file
Reference
Appendix A - PDB file format
Appendix B - The *.bond file format
Appendix C - The HBADD program
   a. pre-processor for HBPLUS
   b. the algorithm
   c. problems
Last modified 15 Jan 1998
1. Introduction
NUCPLOT is a program that automatically generates schematic 2D representations of protein-nucleic acid interactions. The input is a standard PDB file and the output is a colour or black-and-white PostScript file which gives a simple, at-a-glance representation of the hydrogen bonds and hydrophobic contacts between proteins and nucleic acids.
By default, NUCPLOT expects the hydrogen bonds and non-bonded contacts to be calculated by the HBPLUS program (McDonald & Thornton, 1994) and reads the files output by that program. Data computed by other means can also be used, provided they are supplied in the format described in Appendix B.
HBPLUS was primarily designed for calculating interactions within large molecules found in the PDB. This means that it is unable to recognize the majority of small ligands and as a result may miss certain hydrogen bonds between the protein, nucleic acid and ligand. This would result in plots lacking the required small molecule-nucleic acid interactions.
HBPLUS does allow the user to define ligands in a separate input file as described in the HBPLUS Operating Manual. The file describes the nature of hydrogen bonds each atom in the ligand is able to make and therefore allows HBPLUS to correctly calculate the interactions to it.
A program called HBADD is supplied which aims to reduce the work involved in creating this input file for HBPLUS. The program looks for any HETATM records in a given PDB file and searches for their structures in the Het Group Dictionary. The input file is then created. The Het Group Dictionary is available directly from the PDB at:
ftp://pdb.pdb.bnl.gov/pub/resources/hetgroups/het_dictionary.txt
Or check for PDB mirror sites at:
http://www.pdb.bnl.gov/mirror_sites.html
The script file provided in the package automatically runs HBADD, HBPLUS and NUCPLOT. We recommend the use of this script for producing your plots.
2. Installation of programs
a) Decompress and untar the HBPLUS and NUCPLOT programs
Type the following commands:
- gunzip hbplus.tar.Z
- tar -xvf hbplus.tar
- gunzip nucplot.tar.Z
- tar -xvf nucplot.tar
b) Compiling the programs.
i) HBPLUS:
Change directories to hbplus/ and type the following command:
- make
ii) NUCPLOT:
Change directories to nucplot/ and type the following commands:
- cc -o nucplot nucplot.c -lm
- cc -o hbadd hbadd.c -lm
The -lm option links the standard maths library.
iii) Het Group Dictionary:
Place the Het Group Dictionary in the nucplot/ directory.
c) Assignment of aliases/logical names
Under UNIX, the following aliases need to be defined to run NUCPLOT and HBPLUS. They can be placed in the user's .cshrc file.
** nucplot-directory gives the full path to where the NUCPLOT program files are held. Please change accordingly.
After modifying the .cshrc file, type the following command to set up the aliases:
- source .cshrc
3. How to run NUCPLOT
a) Running NUCPLOT
We recommend running NUCPLOT outside the nucplot/ and hbplus/ directories.
To run NUCPLOT on a structure for the first time, simply type:
- nucplot [pdb filename]
- eg. nucplot /data/pdb/pdb1zaa.ent
This runs a script which automatically executes the HBADD, HBPLUS and NUCPLOT programs in turn to produce the plot.
b) Running NUCONLY
For subsequent runs of NUCPLOT on the same structure (eg. after changing the parameters to alter the appearance of the plot - see below), simply type:
- nuconly [pdb filename]
This runs the NUCPLOT program only, saving the processing time otherwise spent running HBADD and HBPLUS. The *.hb2 and *.nb2 files for the structure must be in the directory in which you are running NUCPLOT.
This script may also be used if you wish to supply your own list of hydrogen bonds and non-bonded contacts.
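The choice between the two scripts can be sketched in Python. This is only an illustration: the helper name choose_command is hypothetical, and it assumes the HBPLUS output files are named after the PDB file with its extension replaced (pdb1zaa.ent giving pdb1zaa.hb2 and pdb1zaa.nb2).

```python
from pathlib import Path


def choose_command(pdb_file, workdir="."):
    """Suggest 'nuconly' when both HBPLUS output files already exist, else 'nucplot'.

    Assumption: the .hb2 and .nb2 files share the PDB file's base name.
    """
    stem = Path(pdb_file).stem  # e.g. 'pdb1zaa' from '/data/pdb/pdb1zaa.ent'
    work = Path(workdir)
    have_hb2 = (work / (stem + ".hb2")).is_file()
    have_nb2 = (work / (stem + ".nb2")).is_file()
    return "nuconly" if have_hb2 and have_nb2 else "nucplot"


if __name__ == "__main__":
    print(choose_command("/data/pdb/pdb1zaa.ent"))
```

The function only suggests which script to type; it does not run anything itself.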
4. NUCPLOT outputs
- nucplot.ps:
- A PostScript file showing the schematic diagram. This may be viewed with GHOSTSCRIPT or GHOSTVIEW.
- *.bond:
- A text file containing a list of all interactions to the nucleic acid. A new file is only created if a .bond file doesn't already exist for the structure.
- hbplus.rc:
- The input file to HBPLUS created by HBADD.
- *.hb2, *.nb2:
- Output files from HBPLUS containing all hydrogen bond and non bonded contact information for the structure.
5. Altering the appearance of the plot
The resulting plots can be altered by changing the plot parameters in the parameter file "nucplot.par", which can be found in the current directory. When NUCPLOT is run in a new directory, the default parameter file is copied from the nucplot directory. To return to the default parameters, simply remove the parameter file from the working directory and re-run the NUCPLOT script. The file can be edited with any text editor.
a) Print options:
- Produce a colour PostScript file (Y/N)? - Defines whether the diagram is drawn in colour or black and white.
- Orientation of plot: (P)ortrait or (L)andscape? - Defines whether the orientation of the plot is to be portrait or landscape.
- Plot title (Y/N)? - Defines whether the PDB filename is shown on the plot as a title.
- Show key to plot (Y/N)? - Defines whether the key to the symbols on the plot is shown.
- Output filename = "nucplot.ps" (Y/N)? - Defines whether the output filename is nucplot.ps. The alternative format is pdbname_X.ps, where pdbname is the name of the input file and X is the chain ID of the first DNA chain on the plot.
b) Plot parameters:
- Read bonding information from .bond file (Y/N)? - Defines whether the protein-DNA, water-DNA, and protein-water-DNA interaction information is read from a file in .bond format. A file listing all the interactions shown on the plot is automatically produced by NUCPLOT. This can be edited and used as an input if you wish to display certain interactions only.
- Include protein residue H-bonds (Y/N)? - Defines whether protein-DNA hydrogen bonds should be included in the plot.
- Include non-bridging water H-bonds (Y/N)? - Defines whether non-bridging water-DNA hydrogen bonds should be included in the plot.
- Include bridging water H-bonds (Y/N)? - Defines whether protein-bridging water-DNA hydrogen bonds should be included in the plot.
- Include non-bonded contacts (Y/N)? - Defines whether van der Waals interactions should be included in the plot.
- Use one-letter amino acid codes for protein residue names (Y/N)? - Defines whether the residue names shown on the plot are represented by the single-letter or three-letter code.
- Residues only, rather than all atoms, for non-bonded contacts (Y/N)? - Defines whether only residue names, rather than individual atom names, are shown for non-bonded contacts. Showing only residue names reduces crowding on the plot.
c) Numerical parameters:
- Number of base pairs on each page - Defines the number of base pairs to be placed on each page.
- Distance cut off for hydrogen bonds - Defines the cut-off distance for hydrogen bonds. Bonds must be below this length to be displayed on the plot. The maximum distance allowed is 3.35Å (see Section 7b).
- Distance cut off for non-bonded contacts - Defines the cut-off distance for non-bonded (van der Waals) contacts. The maximum distance allowed is 3.90Å (see Section 7b).
d) Colour parameters:
The colours of the background and various symbols of the plot can be defined. The definitions of the different colours are described below. The colour name must be one that is defined in the list of colour definitions.
e) Colour definitions:
The colour definitions table allows you to modify any of the default colour definitions. You can also set up new colours of your own (up to 20 different colours are allowed).
Each entry contains three numbers, each between 0.0 and 1.0, giving the ratios of red, green and blue, respectively, making up the given colour.
Each colour has a name, given within single quotes, which may be altered. These names are the ones referred to when assigning colours in the "Colour parameters" section above.
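If you are used to colours given as 0-255 values, the ratios can be worked out as in the following Python sketch. The function name rgb255_to_ratios is hypothetical and is not part of NUCPLOT; rounding to three decimal places is an arbitrary choice.

```python
def rgb255_to_ratios(red, green, blue):
    """Convert 0-255 colour components to ratios between 0.0 and 1.0."""
    for name, value in (("red", red), ("green", green), ("blue", blue)):
        if not 0 <= value <= 255:
            raise ValueError(f"{name} component {value} is outside 0-255")
    return tuple(round(value / 255, 3) for value in (red, green, blue))


if __name__ == "__main__":
    # An orange: full red, half green, no blue.
    print(rgb255_to_ratios(255, 128, 0))
```

The three numbers returned are in the order red, green, blue, as expected by each entry in the colour definitions table.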
f) Non-standard bases:
The list allows you to include non-standard bases in the plot. New base types may be added by appending the name to the list as it appears in the PDB.
6. Editing the PostScript file
If you are familiar with PostScript files, you can make simple amendments to the plot by editing the nucplot.ps file.
The file is an ASCII text file, and so can be modified using any text editor. The sorts of amendments you can make are: changes to labels (in terms of size, colour and text), addition of other text, changes to colours, sizes, etc.
Some changes, of course, are easier by altering the nucplot.par parameter file and re-running NUCPLOT.
7. Missing bonds
There are several possible explanations:
a) Miscalculation of H-bond by HBPLUS
As mentioned in the introduction, NUCPLOT by default uses a list of bonds supplied by the HBPLUS program. HBPLUS sometimes gives incorrect results: when it encounters a ligand, residue or base that it does not recognize, it may be unable to calculate all the hydrogen bonds that entity makes.
HBPLUS does allow the user to define ligands in a separate input file as described in section 2.6 of the HBPLUS Operating Manual. The file describes the nature of hydrogen bonds each atom in the ligand is able to make and therefore allows HBPLUS to correctly calculate the interactions to it.
A program called HBADD (see Appendix C) is supplied which aims to reduce the work involved in creating this input file for HBPLUS. The program looks for any HETATM records in a given PDB file and searches for their structures in the Het Group Dictionary. HBADD relies on the atom names and connectivities of the ligand atoms in the PDB file matching those in the dictionary.
If there is a mismatch, you may have to edit the PDB file or create your own HBPLUS input file.
b) Expected bond falls outside the criteria
HBPLUS calculates hydrogen bonds as follows. All possible hydrogen atom (H) positions are calculated for each donor atom (D), and a hydrogen bond is recorded where the geometry with a nearby acceptor atom (A) satisfies the following criteria: the H-A distance is < 2.7Å, the D-A distance is < 3.35Å, the D-H-A angle is > 90° and the H-A-AA angle is > 90°, where AA is the atom attached to the acceptor.
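The four criteria can be written out as a short Python check. This is a sketch of the stated thresholds only, not HBPLUS's own code: the function names are hypothetical, coordinates are assumed to be (x, y, z) tuples in Å, and the placing of hydrogen atoms is not covered.

```python
import math


def angle_deg(a, b, c):
    """Angle at point b, in degrees, between the directions b->a and b->c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to guard against rounding just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def meets_hbond_criteria(donor, hydrogen, acceptor, acceptor_antecedent):
    """Apply the distance and angle thresholds quoted in the text."""
    return (
        math.dist(hydrogen, acceptor) < 2.7
        and math.dist(donor, acceptor) < 3.35
        and angle_deg(donor, hydrogen, acceptor) > 90.0
        and angle_deg(hydrogen, acceptor, acceptor_antecedent) > 90.0
    )


if __name__ == "__main__":
    # A perfectly linear D-H...A-AA arrangement with a D-A distance of 2.9 Å.
    print(meets_hbond_criteria((0, 0, 0), (1, 0, 0), (2.9, 0, 0), (4.1, 0, 0)))
```

A bond that fails any one of the four tests is not reported, which is why an interaction that looks plausible by eye may be absent from the plot.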
For non-bonded contacts, all atoms within 3.9Å of each other are considered to be interacting by HBPLUS.
NUCPLOT applies an additional distance cut-off filter, which is specified in the nucplot.par parameter file (see Section 5c). This is easier to alter than the input for HBPLUS, and we recommend this method as long as the distance required is less than the cut-offs used by HBPLUS.
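The effect of the two filters acting in sequence can be illustrated in Python. The names below are hypothetical and the logic is a simplified model: a cut-off set higher than the HBPLUS limit has no extra effect, because HBPLUS never lists the longer bonds in the first place.

```python
HBPLUS_HBOND_LIMIT = 3.35      # D-A distance limit quoted for HBPLUS, in Å
HBPLUS_NONBONDED_LIMIT = 3.90  # non-bonded contact limit quoted for HBPLUS, in Å


def apply_cutoff(distances, cutoff, hbplus_limit):
    """Keep distances below the smaller of the user cut-off and the HBPLUS limit."""
    effective = min(cutoff, hbplus_limit)
    return [d for d in distances if d < effective]


if __name__ == "__main__":
    hbond_distances = [2.60, 2.87, 2.99, 3.30]
    print(apply_cutoff(hbond_distances, 3.0, HBPLUS_HBOND_LIMIT))
```

In this model, tightening the cut-off in nucplot.par removes bonds from the plot, while loosening it beyond the HBPLUS limit changes nothing.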
c) Missing water contacts
NUCPLOT only considers hydrogen bonds between water molecules and nucleic acids. Non-bonded contacts to water are not included, to prevent overcrowding in the diagram.
d) Using the *.bond file
NUCPLOT can be instructed to read the interaction information from the *.bond file. The simplest way to make alterations to this file is to run NUCPLOT, see which bonds are missing, and add these manually to the file before re-running the program using NUCONLY. The file can also be edited to remove any unwanted interactions. See Section 5 for details on the nucplot.par file and Appendix B for an explanation of the *.bond file format.
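When adding a bond by hand, the columns must line up with the ranges listed in Appendix B. The Python sketch below builds one such line. The function name is hypothetical, and the justification within each field (residue names and numbers to the right, atom names to the left) is an assumption; compare the result with a line from a NUCPLOT-generated file before relying on it.

```python
def format_bond_line(donor_res, donor_chain, donor_num, donor_atom,
                     acc_res, acc_chain, acc_num, acc_atom, distance):
    """Build a 45-character line using the Appendix B column ranges.

    Assumption: names and numbers are right-justified, atom names left-justified.
    """
    return (
        f"{donor_res:>3} {donor_chain:1}  {donor_num:>3}   {donor_atom:<4}    "
        f"{acc_res:>3} {acc_chain:1}  {acc_num:>3}   {acc_atom:<4}   {distance:4.2f}"
    )


if __name__ == "__main__":
    print(format_bond_line("ARG", "C", 70, "NH2", "G", "A", 2, "O1P", 2.87))
```

Pass an empty string as the chain ID for records that have none, such as the water molecule in the Appendix B example.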
Reference
Luscombe N M, Laskowski R A, Thornton J M. (1997). NUCPLOT: a program to generate schematic diagrams of protein-nucleic acid interactions. Nucleic Acids Research, 25, 4940-4945.
Appendix A - PDB file format
NUCPLOT reads the atomic coordinates from standard PDB format files only.
In particular, NUCPLOT may not produce coherent outputs if the following are not observed:
- Chain ID
- all protein and nucleic acid chains must be labelled with a chain ID.
- Protein residue and nucleic acid base names
- protein residues must be named by their three-letter codes and bases by their one-letter codes. Some crystallographic programs generate PDB files with three-letter base names but unfortunately some of these codes are already taken up as HETATM codes (eg. GUA = glutaric acid).
- Water names
- water should be labelled HOH.
- Atom names
- all atoms should be named according to the standard PDB format eg. sugar atoms in the nucleic acid backbone should be named C5*, C4* etc not C5', C4'.
For complete details of the PDB format please see:
http://www.biochem.ucl.ac.uk/bsm/pdb/Contents_Guide_2.html
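Where a PDB file uses primes in its sugar atom names, they can be converted with a small script such as the Python sketch below. The function name is hypothetical. It relies on the standard PDB layout (atom name in columns 13-16, residue name in columns 18-20) and, as a precaution, only touches residues with the one-letter base names A, C, G, T or U, so that ligand atom names are left alone.

```python
NUCLEOTIDE_NAMES = {"A", "C", "G", "T", "U"}  # assumed set of one-letter base names


def star_sugar_atoms(pdb_line):
    """Replace ' with * in the atom name of nucleotide ATOM/HETATM records."""
    if not pdb_line.startswith(("ATOM", "HETATM")):
        return pdb_line
    if pdb_line[17:20].strip() not in NUCLEOTIDE_NAMES:
        return pdb_line
    atom_name = pdb_line[12:16].replace("'", "*")  # columns 13-16
    return pdb_line[:12] + atom_name + pdb_line[16:]


if __name__ == "__main__":
    record = "ATOM      1  C5'   G A   2      10.000  20.000  30.000"
    print(star_sugar_atoms(record))
```

Applying the function to every line of a file and writing the results to a new file leaves all other records unchanged.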
Appendix B - The *.bond file format
The table below shows the NUCPLOT file format for the *.bond file containing the list of hydrogen bonds, non bonded contacts and covalent bonds to be plotted. The file is automatically produced when the HBPLUS files are used as inputs.
Example file:
                  10        20        30        40
          12345678901234567890123456789012345678901234567
          ===============================================
Line 1:   NUCPLOT v.1.0 - Bond file (pdb1zaa.bond)
Line 2:   -----------------------------------------------
   |
Line 4:   **** Hydrogen Bonds ***************************
   |      Donor                Acceptor          Distance
   |      ARG C   70   NH2       G A    2   O1P    2.87
   |      ARG C   80   NH2       G A    2   N7     2.86
   |      ARG C   80   NH1       G A    2   O6     2.99
Line X:   HOH    319   O         G B    4   O6     2.60
   |
   |
Line X+3: **** Non Bonded Contacts **********************
   |      protein              DNA               Distance
   |      THR C   56   CG2       C A    3   P      3.63
   |      THR C   56   CG2       C A    3   O1P    3.82
Line Y:   THR C   56   CB        C A    3   O2P    3.38
   |
   |
Line Y+3: **** Covalent Bonds ***************************
   |      protein              DNA               Distance
Explanation of columns:
------------------------------------------------------------
Field | Column  | Description
No.   | range   |
------------------------------------------------------------
1.    |  1 -  3 | Donor residue 3-letter code
-     |  4 -  4 | Blank
2.    |  5 -  5 | Donor chain ID
-     |  6 -  7 | Blank
3.    |  8 - 10 | Donor residue number
-     | 11 - 13 | Blank
4.    | 14 - 17 | Donor residue atom name
-     | 18 - 21 | Blank
5.    | 22 - 24 | Acceptor residue 3-letter code
-     | 25 - 25 | Blank
6.    | 26 - 26 | Acceptor chain ID
-     | 27 - 28 | Blank
7.    | 29 - 31 | Acceptor residue number
-     | 32 - 34 | Blank
8.    | 35 - 38 | Acceptor residue atom name
-     | 39 - 41 | Blank
9.    | 42 - 45 | Distance (Å)
------------------------------------------------------------
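Because the format is column-based, a line can be read by slicing at the column ranges in the table above. The following Python sketch shows one way to do so; the BondRecord type and the function name are hypothetical, and section headings or blank lines in the file would need to be skipped before calling it.

```python
from typing import NamedTuple


class BondRecord(NamedTuple):
    donor_res: str
    donor_chain: str
    donor_num: int
    donor_atom: str
    acc_res: str
    acc_chain: str
    acc_num: int
    acc_atom: str
    distance: float


def parse_bond_line(line):
    """Slice one data line at the column ranges listed in the table (1-based)."""
    line = line.rstrip("\n").ljust(45)
    return BondRecord(
        donor_res=line[0:3].strip(),     # columns 1-3
        donor_chain=line[4:5].strip(),   # column 5
        donor_num=int(line[7:10]),       # columns 8-10
        donor_atom=line[13:17].strip(),  # columns 14-17
        acc_res=line[21:24].strip(),     # columns 22-24
        acc_chain=line[25:26].strip(),   # column 26
        acc_num=int(line[28:31]),        # columns 29-31
        acc_atom=line[34:38].strip(),    # columns 35-38
        distance=float(line[41:45]),     # columns 42-45
    )
```

Slicing by position rather than splitting on whitespace matters here, because fields such as the chain ID may be blank (as for the water molecule in the example file).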
Appendix C - The HBADD program
a) Pre-processor for HBPLUS
Written by Nicholas M Luscombe & Roman A Laskowski
This program identifies all Het groups in the input PDB file and searches for them in the Het Group Dictionary, available from the PDB. Any matches are used to generate a definition of the residue type in HBPLUS format. Each residue's definition gives the atom connectivities and, for the polar atoms, defines which are hydrogen-bond donors/acceptors and how many hydrogens they can donate/accept.
The output file, containing the residue definitions, is called hbplus.rc. The information allows the HBPLUS program to calculate all potential H-bonds between ligand and protein correctly.
b) The algorithm
The program reads in the given PDB file and identifies all the HET groups involved. It computes their connectivities using CONECT records and the distances between atoms. It then locates each HET group in the HET Group Dictionary and records the connectivities given there. Atoms are matched by name and then by connectivity. Where the HET group in the PDB file successfully matches the dictionary definition on both these criteria, the relevant bond angles are computed where necessary.
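The two-stage matching can be pictured with the simplified Python sketch below. It is an illustration of the idea only and is not HBADD's actual algorithm: the function name is hypothetical, and each group is assumed to be given as a mapping from atom name to the set of atom names bonded to it.

```python
def compare_het_group(pdb_bonds, dict_bonds):
    """Compare a HET group from a PDB file with its dictionary definition.

    Returns (atom names missing from the dictionary,
             atom names whose bonded neighbours differ).
    """
    pdb_atoms = set(pdb_bonds)
    # Stage 1: match by atom name.
    missing = sorted(a for a in pdb_atoms if a not in dict_bonds)
    # Stage 2: match by connectivity, ignoring dictionary atoms (such as
    # hydrogens) that are absent from the PDB file.
    differing = sorted(
        a for a in pdb_atoms
        if a in dict_bonds and pdb_bonds[a] != (dict_bonds[a] & pdb_atoms)
    )
    return missing, differing


if __name__ == "__main__":
    pdb_group = {"C1": {"O1"}, "O1": {"C1"}}
    dictionary = {"C1": {"O1", "H1"}, "O1": {"C1"}, "H1": {"C1"}}
    print(compare_het_group(pdb_group, dictionary))
```

Two empty lists indicate a match on both criteria; any entries point to the atoms that would need renaming or a hand-written definition.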
The rules for H-bond formation, applied to all O, N and S atoms, are as follows (John Mitchell, personal communication):
H-bond donors:
- Any O, S or N potentially donates the number of Hs bound to it.
H-bond acceptors:
- O/S:
  - sp3 - can accept up to 2
  - sp2 - can accept up to 2
- N:
  - planar - no acceptance
  - sp3 - can accept 1
  - sp2 - can accept 1
  - aromatic - can accept (3 - no. of bonds)
  - amide - no acceptance
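These rules translate directly into a lookup, sketched here in Python. The function names and the string labels for the atom types are hypothetical, and the floor of zero applied to the aromatic nitrogen case is an added safeguard rather than part of the stated rules.

```python
def max_hbonds_donated(n_hydrogens):
    """An O, S or N atom can donate as many H-bonds as it has hydrogens bound."""
    return n_hydrogens


def max_hbonds_accepted(element, atom_type, n_bonds=0):
    """Maximum number of H-bonds an O, S or N atom can accept under the rules above."""
    element = element.upper()
    if element in ("O", "S") and atom_type in ("sp3", "sp2"):
        return 2
    if element == "N":
        if atom_type in ("planar", "amide"):
            return 0
        if atom_type in ("sp3", "sp2"):
            return 1
        if atom_type == "aromatic":
            return max(0, 3 - n_bonds)  # floor at zero is an assumption
    raise ValueError(f"no rule for element {element!r} with type {atom_type!r}")


if __name__ == "__main__":
    print(max_hbonds_accepted("N", "aromatic", n_bonds=2))
```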
c) Problems
Problems occur when:
- The HET group in the PDB file is not present in the HET Group Dictionary, or
- The atom naming and connectivities in the PDB file do not match those given in the HET Group Dictionary.
In the first case, the residue definition might still need to be manually prepared for HBPLUS (as described in section 2.6 of that program's documentation).
In the second case, a simple solution might be to rename the relevant atoms in the PDB file so that they correspond to the atom-naming in the HET Group Dictionary.
Known bug
In some cases, the graph-matching algorithm of HBADD fails and some of the atom-name mappings between the PDB file and the HET Group Dictionary are missed. This flaw will be rectified soon.