
Automatic Data Processing

last modified 12-04-2011 09:16

Automatic processing of data on MX beamlines

Overview

As of March 2010, a system for the automatic processing of data with XDS is available on all ID MX beamlines. The general program flow is as follows:

  1. The user selects the 'Process & Analyze Data' checkbox in the Collect->Parameters tab and specifies whether anomalous signal is expected and the number of residues in the ASU.
  2. As soon as data collection is initiated, a fast autoprocessing run is started, which should keep up with the images as they are collected. The results are written to PROCESSED_DATA/your_image_directory_name/xds_your_run/xds_fastproc
  3. Once data collection has stopped, a full autoprocessing run is performed in all reasonable Bravais lattices. POINTLESS is run in parallel to determine the space group. An initial integration run is performed, and the refined parameters from this run are used for a second integration run. Resolution ranges are adjusted to obtain an I/sigma of 2 in the outer resolution shell. Data are then converted to F's and written in CCP4 MTZ format as well as merged and unmerged Scalepack formats.
  4. Data in which significant anomalous signal is detected are submitted automatically to the AutoRickshaw server. Please read the related terms and conditions.
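
The resolution adjustment in step 3 can be pictured as keeping the highest-resolution shell whose mean I/sigma is still at least 2. The following is a minimal sketch of that idea only, not the pipeline's actual code: the function name, the shell table layout and the numbers are all hypothetical.

```python
def resolution_cutoff(shells, min_i_over_sigma=2.0):
    """Return the high-resolution limit (in Angstrom) of the last shell,
    walking from low to high resolution, whose mean I/sigma is still at
    least min_i_over_sigma. Returns None if even the first shell fails.

    shells: list of (high_resolution_limit, mean_i_over_sigma) tuples
            (a hypothetical layout chosen for this illustration).
    """
    # Sort from low resolution (large d-spacing) to high resolution (small).
    ordered = sorted(shells, key=lambda s: s[0], reverse=True)
    cutoff = None
    for limit, i_over_sigma in ordered:
        if i_over_sigma < min_i_over_sigma:
            break  # stop at the first shell that is too weak
        cutoff = limit
    return cutoff


# Made-up shell statistics, for illustration only.
example_shells = [(6.0, 25.1), (4.0, 14.3), (3.5, 6.2), (3.2, 3.4), (3.05, 1.0)]
print(resolution_cutoff(example_shells))
```

With these made-up shells, the 3.05 A shell falls below the threshold, so the cut is placed at the previous shell.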

Where are my results and Data?

The directory structure is as follows:

PROCESSED_DATA/your_image_directory_name/xds_your_run/xds_parallelproc: top-level directory for full autoprocessing. XDS files from the indexing run are kept here.

PROCESSED_DATA/your_image_directory_name/xds_your_run/xds_parallelproc/SGnumber_unit_cell_dimensions: directory for a given space group. There are usually many of these under the top-level directory.

 

For example:

/mntdirect/_data_visitor/mx555/id23eh2/20100316/PROCESSED_DATA/John/fooprotein/xds_168_run2_1/xds_parallelproc/1_88.1_104.1_107.8_90.0_90.1_90.2

would hold the results for processing in P1, unit cell dimensions = 88.1, 104.1, 107.8, 90.0, 90.1, 90.2
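
If you want to sort or filter these directories in a script, the name can be split back into its parts. This is a small sketch assuming the directory name always has the form SGnumber_a_b_c_alpha_beta_gamma as in the example above; `LatticeDir` and `parse_lattice_dir` are hypothetical names, not part of the autoprocessing software.

```python
from typing import NamedTuple, Tuple


class LatticeDir(NamedTuple):
    space_group_number: int
    unit_cell: Tuple[float, ...]  # a, b, c, alpha, beta, gamma


def parse_lattice_dir(path: str) -> LatticeDir:
    """Split the last path component, 'SGnumber_a_b_c_alpha_beta_gamma'."""
    name = path.rstrip("/").split("/")[-1]
    fields = name.split("_")
    if len(fields) != 7:
        raise ValueError(f"expected 7 underscore-separated fields in {name!r}")
    return LatticeDir(int(fields[0]), tuple(float(f) for f in fields[1:]))


parsed = parse_lattice_dir("1_88.1_104.1_107.8_90.0_90.1_90.2")
print(parsed.space_group_number, parsed.unit_cell)
```

Only the last path component is parsed, so underscores earlier in the path (for example in xds_parallelproc) do not interfere.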

Files within each space group/unit cell dimensions directory include:

XSCALE.LP: final integration statistics

ccp4.mtz:  Data in CCP4 mtz format

unmerged.mtz:  Data in multi record MTZ format, suitable for SCALA scaling

xscaled_merged.sca:  Scalepack format, merged

xscaled_unmerged.sca: Scalepack format, unmerged

scala.log: Statistics from SCALA


You can also monitor your results through ISPyB (LIMS). See this webpage for more information.

More details of what is really happening:

 

Integration is done in XDS, and the resolution is cut at each round. Multiple rounds are used in order to refine the beam X and Y position, the distance, the unit cell parameters, etc.

 

After XDS integration, XSCALE is run four times:

merging, anom ON

merging, anom OFF

no merging, anom ON

no merging, anom OFF
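
The four XSCALE runs are simply every combination of the two switches. A short sketch of that enumeration follows; the dictionary keys and the function name are hypothetical, and the .sca names are the ones used in the file list on this page.

```python
from itertools import product


def xscale_runs():
    """Enumerate the four merge/anomalous combinations."""
    runs = []
    for merge, anom in product((True, False), (True, False)):
        label = ("merged" if merge else "unmerged") + "_" + ("anom" if anom else "noanom")
        runs.append({"merge": merge, "anomalous": anom, "sca_file": label + ".sca"})
    return runs


for run in xscale_runs():
    print(run)
```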

 

In the merged cases, data are converted to MTZ format in two ways:

1) The output of XSCALE is converted to a Scalepack file, then passed through SCALEPACK2MTZ, TRUNCATE and UNIQUEIFY, generating these files: merged_anom_sc2mtz.mtz and merged_noanom_sc2mtz.mtz

2a) POINTLESS is used to convert the XDS_ASCII file from XDS (i.e. pre-XSCALE, pre-conversion to SCA) to a multirecord MTZ file suitable for SCALA.

2b) SCALA is run on these data, which generates the log file.

 

TRUNCATE and downstream programs are not run after SCALA at the moment, but this may change.

 

 Files generated:

merged_anom_pointless_multirecord.mtz

merged_anom_pointless.mtz

merged_noanom_pointless_multirecord.mtz

merged_noanom_pointless.mtz

 

In the POINTLESS version, one round of resolution estimation is missing, so the .sca files usually give a better estimate of the resolution of the data.

If you find this not to be the case, please let Max know. I would recommend checking the *XSCALE.LP log files to see whether the resolution has been cut to your standards.

 

Check the results carefully. The comparison below shows a typical case for the .sca files and the MTZ file. The SCA files are truncated to a more realistic resolution than the POINTLESS files (as expected).

pointless:  Outer shell res, shell I/sig, Rsym = 3.05, 1.0,  87% (!!)

scalepack: Outer shell res, shell I/sig, Rsym  = 3.22, 3.36, 41.4%

 

 

 

Expert user debugging

Is there a problem?  Here is what you should expect to see upon collecting a dataset of >3 images:

  • You should see debugging output in the MxCube "Information Messages" window, with messages such as "In processDataScripts".  NOTE: in remote data collection situations, the MxCube on the console will not show this debugging output.  The remote user should, however, see it.
  • As data are being collected, an xds_fastproc directory should be created automatically in the top level directory, i.e.
PROCESSED_DATA/your_image_directory_name/xds_your_run/xds_fastproc

 After a short pause (depending on cluster load), you should see XDS begin to index and integrate normally

  • After the data collection has been completed, an xds_parallelproc directory should be created in the top level directory, i.e.
PROCESSED_DATA/your_image_directory_name/xds_your_run/xds_parallelproc

  • Assuming that your data are indexable by the autoprocessing software, you should then see subdirectories appear for each possible Bravais lattice (most will be the lowest symmetry space group of the Bravais lattice, with the exception of the POINTLESS choice).
  • Integration should proceed until the files described earlier (ccp4.mtz, xscaled_*, etc) are generated.
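
The checklist above can be partly automated. The sketch below only looks for the directories and two of the output files named on this page; `check_autoprocessing` and `EXPECTED_FILES` are hypothetical names, and the path passed in is a placeholder for your own xds_your_run directory.

```python
from pathlib import Path

# Two of the per-space-group output files listed earlier on this page.
EXPECTED_FILES = ("XSCALE.LP", "ccp4.mtz")


def check_autoprocessing(run_dir):
    """Report which expected outputs exist under an xds_your_run directory."""
    run_dir = Path(run_dir)
    fastproc = run_dir / "xds_fastproc"
    parallelproc = run_dir / "xds_parallelproc"
    report = {
        "fastproc_exists": fastproc.is_dir(),
        "parallelproc_exists": parallelproc.is_dir(),
        "lattice_dirs": {},  # subdirectory name -> list of missing files
    }
    if parallelproc.is_dir():
        for sub in sorted(p for p in parallelproc.iterdir() if p.is_dir()):
            missing = [f for f in EXPECTED_FILES if not (sub / f).is_file()]
            report["lattice_dirs"][sub.name] = missing
    return report


# Placeholder path: substitute your own directory names.
print(check_autoprocessing("PROCESSED_DATA/your_image_directory_name/xds_your_run"))
```

An empty list for a subdirectory means both files were found there. A missing xds_fastproc or xds_parallelproc directory points to the remedies described below.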

Remedies to problems found above

These tasks are normally done by the beamline scientist, but an experienced user may also find them useful.

Is the MxProcessing server running?

bliss_dserver status MxProcessing.py

If it is not running, start it:

bliss_dserver start MxProcessing.py

IMPORTANT: the MxProcessing.py server MUST be run as the operator (e.g. opid14, opid23) and NOT blissadm

Is the ISPyB server running?

Check in MXstartup or with blissadm status.  The ISPyB server needs to be running for results to be stored in the database.  Autoprocessing cannot proceed normally without it: it will generate errors and will not upload any processing information to ISPyB.  One clue that the ISPyB server is not running is an error in the Information Messages tab that reads like:

Error starting processing, is the autoprocessing server correctly configured?  TypeError('cannot marshal None unless allow_none is enabled')

This is sometimes caused by the user logging into MxCube when the ISPyB server is not running.  In that case no SessionID is set, so no DataCollectionIDs are sent, which throws an exception.  Run bliss_dserver status and look for the ISpyBServer.
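
The "cannot marshal None" wording matches the error raised by Python's standard XML-RPC library when a None value is passed as an argument. Assuming the request to the autoprocessing server travels over XML-RPC (an inference from the message, not something this page states), the failure can be reproduced in isolation. `try_marshal` is a hypothetical helper and 12345 is a made-up ID.

```python
import xmlrpc.client


def try_marshal(params, allow_none=False):
    """Try to serialise XML-RPC parameters and report what happens."""
    try:
        xmlrpc.client.dumps(params, allow_none=allow_none)
        return "ok"
    except TypeError as exc:
        return f"TypeError: {exc}"


# An unset ID (None) cannot be serialised with the default settings.
print(try_marshal((None,)))
# A real integer ID serialises without problems.
print(try_marshal((12345,)))
# None is accepted only when allow_none is switched on.
print(try_marshal((None,), allow_none=True))
```

This illustrates why an unset SessionID or DataCollectionID stops the request before it ever reaches the processing server.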

 

 

Do you see debugging in the MxCube "Information Messages"?

You should see items such as "In processDataScripts".  If not, you are probably running the wrong version of MxCube.

Known problems

 

  • Results can be mediocre if there are few spots or with very low resolution (worse than 5 Å) data
  • In "fast" mode, the programs can have a difficult time keeping up with short (<0.5 second/frame) exposures
  • Autoprocessing is running and creating output files but nothing is displayed in ISPyB. ** PLEASE CONTACT YOUR LC (Max, Stephanie or Elspeth) ** It is likely that the server which uploads the results to ISPyB has crashed and needs to be restarted.

 

Updates

  • February 2011: Improved indexing reliability by including more images in XDS spot picking, with a failover to LABELIT
  • October 6, 2010: Fixed a bug which was causing the P1 integration to take a long time (and causing the subsequent POINTLESS run to be slow as well)
  • October 5, 2010: Added many new files to be attached to ISPyB:

 

merged_anom_sc2mtz.mtz: Merged data, anomalous pairs treated separately, imported via Scalepack2MTZ

merged_anom_pointless_multirecord.mtz: Merged data, anomalous pairs treated separately, imported via pointless, in multirecord MTZ format (for import into SCALA, for example)

merged_anom_pointless.mtz: Merged data, anomalous pairs treated separately, imported via pointless, in MTZ format after SCALA

merged_noanom_sc2mtz.mtz: Merged data, anomalous pairs treated as equivalent, imported via Scalepack2MTZ

merged_noanom_pointless_multirecord.mtz: Merged data, anomalous pairs treated as equivalent, imported via pointless, in multirecord MTZ format (for import into SCALA, for example)

merged_noanom_pointless.mtz: Merged data, anomalous pairs treated as equivalent, imported via pointless, in MTZ format after SCALA

merged_anom.sca: Merged data, anomalous pairs treated separately, SCALEPACK format

merged_noanom.sca: Merged data, anomalous pairs treated as equivalent, SCALEPACK format

unmerged_anom.sca: Unmerged data, anomalous pairs treated separately, SCALEPACK format

unmerged_noanom.sca: Unmerged data, anomalous pairs treated as equivalent, SCALEPACK format

merged_anom_pointless.mtz_scala.log: Log file for "Merged data, anomalous pairs treated separately, imported via pointless, in MTZ format after SCALA"

merged_noanom_pointless.mtz_scala.log: Log file for "Merged data, anomalous pairs treated as equivalent, imported via pointless, in MTZ format after SCALA"

merged_anom_XSCALE.LP: XSCALE log file, anomalous pairs treated separately

merged_noanom_XSCALE.LP: XSCALE log file, anomalous pairs treated as equivalent

  • September 2010: Modified code to avoid deadlocking the CONDOR cluster

 


European Synchrotron Radiation Facility