Automatic Data Processing
Automatic processing of data on MX beamlines
Overview
As of March 2010, a system for the automatic processing of data with XDS is available on all ID MX beamlines. The general program flow is as follows:
- User selects the 'Process & Analyze Data' Checkbox in the Collect->Parameters tab and specifies whether there is expected anomalous signal and the number of residues in the ASU
- As soon as data collection is initiated, a fast auto processing run is started which should keep up with the images as they are being collected. The directory containing these results can be found in PROCESSED_DATA/your_image_directory_name/xds_your_run/xds_fastproc
- Once data collection has stopped, a full autoprocessing run is performed in all reasonable Bravais lattices. POINTLESS is also used in parallel to determine the space group. An initial integration run is performed, and the refined parameters from this run are used for a second integration run. Resolution ranges are adjusted in order to obtain an I/sigma of 2 in the outer resolution shell. Data are then converted to structure factor amplitudes (F's) and written in CCP4 MTZ format as well as in merged and unmerged Scalepack formats.
- Data, where significant anomalous signal is detected, are submitted automatically to the AutoRickshaw server. Please read related terms and conditions.
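The resolution adjustment described above (cutting so that the outer shell keeps an I/sigma of about 2) can be illustrated with a minimal sketch. This is not the beamline's actual code: the function name, the shell representation and the example numbers are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical representation of one resolution shell:
# (high-resolution limit in Angstrom, mean I/sigma in that shell).
Shell = Tuple[float, float]


def choose_resolution_cutoff(
    shells: List[Shell], min_i_over_sigma: float = 2.0
) -> Optional[float]:
    """Return the high-resolution limit of the last shell that still
    meets the I/sigma threshold, walking from low to high resolution.

    Returns None if even the lowest-resolution shell is below threshold.
    """
    cutoff = None
    # Sort from low resolution (large d) to high resolution (small d).
    for d_min, i_over_sigma in sorted(shells, key=lambda s: s[0], reverse=True):
        if i_over_sigma < min_i_over_sigma:
            break  # stop at the first shell that is too weak
        cutoff = d_min
    return cutoff


# Made-up shell statistics, for illustration only.
example_shells = [(6.0, 25.1), (4.0, 12.3), (3.5, 6.0), (3.2, 3.4), (3.05, 1.0)]
print(choose_resolution_cutoff(example_shells))  # prints 3.2
```

The real pipeline re-integrates after each cut, so the shell statistics change between rounds; the sketch only shows the selection rule for a single round.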
Where are my results and Data?
The directory structure is as follows:
PROCESSED_DATA/your_image_directory_name/xds_your_run/xds_parallelproc: top level directory for full autoprocessing. XDS files from the indexing run are kept here.
PROCESSED_DATA/your_image_directory_name/xds_your_run/xds_parallelproc/SGnumber_unit_cell_dimensions: directory for a given space group. There are usually many of these off of the top level directory.
for example:
/mntdirect/_data_visitor/mx555/id23eh2/20100316/PROCESSED_DATA/John/fooprotein/xds_168_run2_1/xds_parallelproc/1_88.1_104.1_107.8_90.0_90.1_90.2
would hold the results for processing in P1, unit cell dimensions = 88.1, 104.1, 107.8, 90.0, 90.1, 90.2
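Because the directory name encodes the space group number followed by the six unit cell parameters, it can be split back into those values. The sketch below assumes the underscore-separated layout shown in the example; `LatticeDir` and `parse_lattice_dir` are hypothetical names, not part of the autoprocessing software.

```python
from typing import NamedTuple, Tuple


class LatticeDir(NamedTuple):
    space_group_number: int
    unit_cell: Tuple[float, ...]  # a, b, c, alpha, beta, gamma


def parse_lattice_dir(path: str) -> LatticeDir:
    """Parse 'SGnumber_a_b_c_alpha_beta_gamma' from the last path component."""
    name = path.rstrip("/").split("/")[-1]
    parts = name.split("_")
    if len(parts) != 7:
        raise ValueError(f"expected 7 underscore-separated fields, got {name!r}")
    return LatticeDir(int(parts[0]), tuple(float(p) for p in parts[1:]))


result = parse_lattice_dir("xds_parallelproc/1_88.1_104.1_107.8_90.0_90.1_90.2")
print(result.space_group_number)  # prints 1
print(result.unit_cell)
```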
Files within each space group/unit cell dimensions directory include:
XSCALE.LP: final integration statistics
ccp4.mtz: Data in CCP4 mtz format
unmerged.mtz: Data in multi record MTZ format, suitable for SCALA scaling
xscaled_merged.sca: Scalepack format, merged
xscaled_unmerged.sca: Scalepack format, unmerged
scala.log: Statistics from SCALA
More details of what is really happening:
Integration is done in XDS and the resolution is cut at each round (multiple rounds are used in the first place in order to refine beam X, Y, distance, unit cell parameters etc).
After XDS integration, XSCALEing is done four times:
merging, anom ON
merging, anom OFF
no merging, anom ON
no merging, anom OFF
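The four XSCALE runs are simply every combination of the two switches (merging on/off, anomalous on/off). A minimal sketch of enumerating them is given here; the label strings follow the prefixes of the .sca files listed in the Updates section of this page, while the function and dictionary keys are illustrative assumptions.

```python
from itertools import product


def xscale_variants():
    """Enumerate the merge/anomalous combinations in the order listed above."""
    variants = []
    for merged, anomalous in product((True, False), repeat=2):
        label = "{}_{}".format(
            "merged" if merged else "unmerged",
            "anom" if anomalous else "noanom",
        )
        variants.append({"merge": merged, "anomalous": anomalous, "label": label})
    return variants


for variant in xscale_variants():
    print(variant["label"])
```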
In the merged cases, data are converted to MTZ format in two ways:
1) The result of XSCALE is converted to a Scalepack file, which is then passed through SCALEPACK2MTZ, TRUNCATE and UNIQUEIFY, generating these files: merged_anom_sc2mtz.mtz and merged_noanom_sc2mtz.mtz
2a) POINTLESS is used to convert the XDS_ASCII file from XDS (i.e. pre XSCALE, pre conversion to SCA) to a multirecord MTZ file, suitable for SCALA
2b) SCALA is run on these data, which generates the log file.
Truncate and downstream programs are not run at the moment - but this may change !
Files generated:
merged_anom_pointless_multirecord.mtz
merged_anom_pointless.mtz
merged_noanom_pointless_multirecord.mtz
merged_noanom_pointless.mtz
In the POINTLESS version, there is one round of resolution estimation missing, so the .sca files are usually a better estimate of the resolution of the data.
If you find this not to be the case, please let Max know. I would recommend checking the *XSCALE.LP log files to see if the resolution has been cut to your standards.
Check the results carefully. A typical comparison of the .sca files and the MTZ files is shown below. The SCA files appear to be truncated to a more realistic resolution than the POINTLESS files (as expected).
pointless: Outer shell res, shell I/sig, Rsym = 3.05, 1.0, 87% (!!)
scalepack: Outer shell res, shell I/sig, Rsym = 3.22, 3.36, 41.4%
Expert user debugging
Is there a problem? Here is what you should expect to see upon collecting a dataset of >3 images:
- You should see debugging in the MxCube window "Information Messages" with messages such as "In processDataScripts". NOTE: in remote data collection situations, the MxCube on the console will not output this debugging. The remote user should, however, see it.
- As data are being collected, an xds_fastproc directory should be created automatically in the top level directory, i.e.
After a short pause (depending on cluster load), you should see XDS begin to index and integrate normally
- After the data collection has been completed, an xds_parallelproc directory should be created in the top level directory. i.e.
- Assuming that your data are indexable by the autoprocessing software, you should then see subdirectories appear for each possible Bravais lattice (most will be the lowest symmetry space group of the Bravais lattice, with the exception of the POINTLESS choice)
- Integration should proceed until the files described earlier (ccp4.mtz, xscaled_*, etc) are generated.
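To confirm that a given space group directory has run to completion, one can check for the output files listed earlier on this page. The following is a small sketch under that assumption; `missing_outputs` is a hypothetical helper, and the file list is copied from the "Files within each space group/unit cell dimensions directory" section.

```python
from pathlib import Path
from typing import List

# File names as listed earlier on this page.
EXPECTED_FILES = [
    "XSCALE.LP",
    "ccp4.mtz",
    "unmerged.mtz",
    "xscaled_merged.sca",
    "xscaled_unmerged.sca",
    "scala.log",
]


def missing_outputs(lattice_dir: str) -> List[str]:
    """Return the expected output files that are not (yet) present."""
    directory = Path(lattice_dir)
    return [name for name in EXPECTED_FILES if not (directory / name).is_file()]


if __name__ == "__main__":
    import sys

    # Usage: python check_outputs.py <space_group_directory>
    for target in sys.argv[1:]:
        missing = missing_outputs(target)
        print(target, "complete" if not missing else "missing: " + ", ".join(missing))
```

An empty result means all listed files are present; it says nothing about the quality of the processing, so the log files should still be inspected.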
Remedies to problems found above
These tasks are normally done by the beamline scientist, but an experienced user may also find them useful.
Is the mxprocessing server running?
bliss_dserver status MxProcessing.py
if it is not running, start it:
bliss_dserver start MxProcessing.py
IMPORTANT: the MxProcessing.py server MUST be run as the operator (e.g. opid14, opid23) and NOT blissadm
Is the ISPyB server running?
Check in MXstartup or with blissadm status. It needs to be running for results to be stored in the database. Autoprocessing CAN NOT proceed without this, but will generate errors and not upload any processing information to ISPyB. One clue that the ISPYBserver is not running is the error in the Information Messages tab that reads like:
Error starting processing, is the autoprocessing server correctly configured? TypeError('cannot marshal None unless allow_none is enabled')
This is sometimes caused by the user logging into MxCube when an ISPyB server is not running. This can cause no SessionID to be set, which means no DataCollectionIDs are sent, which throws an exception. Do a bliss_dserver status and look for the ISPyB server.
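The wording of this error matches the one raised by Python's standard XML-RPC marshaller when it is asked to send a None value. The sketch below reproduces that behaviour with the standard library only (`xmlrpc.client` in Python 3); it is an assumption that the beamline software hits the same code path when the DataCollectionID is unset, and `try_marshal` is a hypothetical helper.

```python
import xmlrpc.client


def try_marshal(value):
    """Return the marshalling error message for value, or None on success."""
    try:
        xmlrpc.client.dumps((value,))
    except TypeError as exc:
        return str(exc)
    return None


# A real ID marshals fine; a missing (None) ID does not.
print(try_marshal(12345))
print(try_marshal(None))

# Passing allow_none=True lets None through, but an unset ID would then
# reach the server silently instead of failing early.
payload = xmlrpc.client.dumps((None,), allow_none=True)
```

Seeing this message therefore points to a missing value upstream (here, the session or data collection ID) rather than to a fault in the processing itself.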
Do you see debugging in the MxCube "Information Messages"?
You should see items such as "In processDataScripts". If not, you are probably running the wrong version of MxCube.
Known problems
- Results can be mediocre if there are few spots or with very low resolution (worse than 5 Å) data
- In "fast" mode, the programs can have a difficult time keeping up with short (<0.5 second/frame) exposures
- Autoprocessing is running and creating output files but there is nothing displayed in ISPYB. ** PLEASE CONTACT YOUR LC (Max, Stephanie or Elspeth)** It is likely that the server which uploads the results to ISPYB has crashed and needs to be restarted.
Updates
- February, 2011 Improved indexing reliability by including more images in XDS spot picking, with a failover to LABELIT
- October 6, 2010 Fixed a bug which was causing the P1 integration to take a long time (and cause subsequent POINTLESS run to be slow as well)
- October 5, 2010 Added many new files to be attached to ispyb:
merged_anom_sc2mtz.mtz: Merged data, anomalous pairs treated separately, imported via Scalepack2MTZ
merged_anom_pointless_multirecord.mtz: Merged data, anomalous pairs treated separately, imported via pointless, in multirecord MTZ format (for import into SCALA, for example)
merged_anom_pointless.mtz: Merged data, anomalous pairs treated separately, imported via pointless, in MTZ format after SCALA
merged_noanom_sc2mtz.mtz: Merged data, anomalous pairs treated as equivalent, imported via Scalepack2MTZ
merged_noanom_pointless_multirecord.mtz: Merged data, anomalous pairs treated as equivalent, imported via pointless, in multirecord MTZ format (for import into SCALA, for example)
merged_noanom_pointless.mtz: Merged data, anomalous pairs treated as equivalent, imported via pointless, in MTZ format after SCALA
merged_anom.sca: Merged data, anomalous pairs treated separately, SCALEPACK format
merged_noanom.sca: Merged data, anomalous pairs treated as equivalent, SCALEPACK format
unmerged_anom.sca: Unmerged data, anomalous pairs treated separately, SCALEPACK format
unmerged_noanom.sca: Unmerged data, anomalous pairs treated as equivalent, SCALEPACK format
merged_anom_pointless.mtz_scala.log: log file for "Merged data, anomalous pairs treated separately, imported via pointless, in MTZ format after SCALA"
merged_noanom_pointless.mtz_scala.log: log file for "Merged data, anomalous pairs treated as equivalent, imported via pointless, in MTZ format after SCALA"
merged_anom_XSCALE.LP: XSCALE log file, anomalous pairs treated separately
merged_noanom_XSCALE.LP: XSCALE log file, anomalous pairs treated as equivalent
- Sept 2010 modified code to avoid deadlocking the CONDOR cluster