Materials science: keeping up with the data explosion
Computers serving the ESRF’s materials-science beamlines will soon have to handle terabytes of data per day, all of which must be analysed and delivered to users within minutes.
“We are constantly using bigger and more detectors,” explains Gavin Vaughan (right), scientist in charge at ID11 (Photo credit: C. Argoud).
When the ESRF switched on in the early 1990s, its materials-science beamlines were equipped with just a single workstation. Back then, diffraction data were recorded on image plates that were read out offline. But the arrival of CCD cameras unleashed a data explosion. The first CCD detectors produced images of a couple of megabytes each, whereas today’s produce up to 24 MB per image. The new detectors also read out much faster: a readout that once took eight seconds now takes less than one, and will soon be quicker still. With multiple detectors often in use and frames being recorded every few seconds during an experiment, ID11’s computing resources are under heavy demand.
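To see how quickly this adds up, the rough arithmetic can be sketched in Python, one of the two languages used for the beamline’s rewritten analysis code. The sketch assumes, purely for illustration, one 24 MB frame every two seconds from a single detector running round the clock; the function and parameter names are hypothetical.

```python
def daily_volume_tb(frame_mb, seconds_per_frame, detectors=1, duty_cycle=1.0):
    """Estimate raw data written per day, in terabytes (1 TB = 10**6 MB).

    frame_mb:          size of one detector image in megabytes
    seconds_per_frame: interval between frames on one detector
    detectors:         number of detectors recording simultaneously
    duty_cycle:        fraction of the day actually spent collecting
    """
    seconds_per_day = 24 * 60 * 60
    frames_per_detector = duty_cycle * seconds_per_day / seconds_per_frame
    return frames_per_detector * detectors * frame_mb / 1e6


# Hypothetical case: one 24 MB frame every 2 s, one detector, all day.
print(round(daily_volume_tb(24, 2), 2))  # → 1.04
```

Under those assumptions a single detector writes about 1 TB a day, the same order as the daily volume ID11 experiments currently produce; each extra detector scales the figure linearly.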
“Since we moved to taking data with 2D detectors, we’ve struggled to keep up,” says Gavin Vaughan, scientist in charge at ID11. “We’re taking data faster and faster.”
Storing those data is less of a problem. In the past 10 years the ESRF has moved from tape storage, which held a few gigabytes, to portable hard drives that can carry a terabyte or so, and the hardware has kept pace thanks to Moore’s law (strictly, the 40-year-old trend for the number of transistors on a chip to double every two years or so, which hard-drive capacities have broadly matched). “Our detector expansion is slower than Moore’s law, so things are getting easier for storage,” says beamline operation manager Jonathan Wright. “A big helper has been popular demand for large-capacity hard drives for the home PC market, which has made them very cheap.” Wright also says that parallel programming will be needed if the group is to benefit from Moore’s law and meet future challenges. “Many existing algorithms might easily be parallelised, but there is still a significant cost to do that. Some GPU codes already exist, but these are not yet in a user-friendly state.”
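Many beamline reductions are “embarrassingly parallel”: each detector frame can be processed independently of the others. A minimal sketch of that data-parallel pattern with Python’s standard `concurrent.futures` module follows; the per-frame reduction (subtract the frame minimum, sum the rest) is a hypothetical stand-in, not ID11’s actual processing.

```python
from concurrent.futures import ThreadPoolExecutor


def reduce_frame(frame):
    """Hypothetical per-frame reduction: subtract the frame's minimum value
    as a crude background and sum the remaining intensity."""
    background = min(min(row) for row in frame)
    return sum(value - background for row in frame for value in row)


def reduce_all(frames, workers=4):
    """Apply reduce_frame to every frame using a pool of workers.
    executor.map preserves input order, so results line up with frames."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(reduce_frame, frames))


frames = [[[1, 2], [3, 4]], [[5, 5], [5, 9]]]
print(reduce_all(frames))  # → [6, 4]
```

Threads keep the sketch simple, but in CPython they only speed up work that releases the interpreter lock (for example, large array operations in compiled libraries). Swapping in `ProcessPoolExecutor`, which has the same interface, gives true multi-core parallelism for pure-Python work, at the cost of copying frames between processes and needing an `if __name__ == "__main__":` guard on some platforms.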
X-ray diffraction methods at ID11 give users access to the structure of materials over a wide range of length scales and timescales. The basic technique analyses the diffraction spots produced by grains in the sample, which give scientists the information they need to deduce a material’s properties or to study chemical reactions. As at most ESRF beamlines, users need to perform as much analysis online as possible. But the detectors produce a lot of raw data, and the algorithms that transform these data into images users can evaluate are computationally intensive. The original grain-mapping software, which was tied to a proprietary system that prevented optimisation, took months to process a data set. “Now it takes a couple of hours,” says Vaughan.
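A first step in analysing diffraction spots is finding them. The sketch below shows one generic way to do it, offered as an illustration rather than the algorithm in ID11’s own software: threshold the image, group touching bright pixels into spots by flood fill, and report each spot’s total intensity and intensity-weighted centroid. The function name and the toy image are hypothetical.

```python
from collections import deque


def find_spots(image, threshold):
    """Label connected groups of pixels above `threshold` (4-connectivity)
    and return (intensity, row_centroid, col_centroid) for each spot.
    Assumes threshold >= 0, so every spot has positive total intensity."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    spots = []
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or image[r0][c0] <= threshold:
                continue
            # Breadth-first flood fill collects the pixels of one spot.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            total = sum_r = sum_c = 0.0
            while queue:
                r, c = queue.popleft()
                v = image[r][c]
                total += v
                sum_r += v * r
                sum_c += v * c
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and image[nr][nc] > threshold):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            # Intensity-weighted centroid gives a sub-pixel spot position.
            spots.append((total, sum_r / total, sum_c / total))
    return spots


image = [
    [0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 4],
]
print(find_spots(image, threshold=1))  # → [(18.0, 1.0, 1.5), (4.0, 3.0, 4.0)]
```

Spot positions found this way are what later stages would use to work out the orientation of the grains that produced them.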
The improvement was thanks to a five-year EU funding injection, which enabled postdocs at the ESRF and at other institutes to work on algorithm development and implementation until last year. The project, called TotalCryst, was set up to get the most out of ID11’s 3D X-ray diffraction microscope, which allows non-destructive characterisation of individual grains and sub-grains inside bulk materials. In practice, TotalCryst provided algorithms that quickly reveal the multiscale dynamics of individual embedded grains. All the code has been translated into modular programs in C or Python, and is now being developed further in an ongoing project called Fable by staff at ID11 and the Technical University of Denmark.
Bigger, better, faster
Diffraction contrast tomography, a recent technique related to the 3D microscope, combines microtomography with diffraction to produce terabytes of raw data mapping the 3D grain shape and orientation in polycrystalline materials (see image, below). All these data have to be corrected and then back-transformed from reciprocal space to real 3D space, so that users can investigate the growth of grains during annealing and study how fatigue cracks interact with the microstructure. It’s another area where GPUs are likely to come to the rescue.
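In diffraction contrast tomography, each grain’s diffraction spots act as projections of that grain, so recovering its 3D shape is a tomographic reconstruction. The simplest such reconstruction is unfiltered back-projection, sketched below in 2D under stated assumptions (parallel beam, square grid, nearest-bin sampling). It illustrates the principle only; production codes use more elaborate reconstruction algorithms, and all names here are hypothetical.

```python
import math


def _bin(x, y, centre, cos_a, sin_a):
    """Detector bin hit by pixel (x, y) for a parallel beam at this angle."""
    t = (x - centre) * cos_a + (y - centre) * sin_a
    return int(round(t + centre))


def project(image, angle):
    """Forward model: sum the pixels of a square image onto a 1D detector."""
    n = len(image)
    centre = (n - 1) / 2
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    proj = [0.0] * n
    for y in range(n):
        for x in range(n):
            b = _bin(x, y, centre, cos_a, sin_a)
            if 0 <= b < n:
                proj[b] += image[y][x]
    return proj


def back_project(projections, angles, n):
    """Unfiltered back-projection: smear each 1D projection back across
    the n x n grid along the direction it was recorded from."""
    centre = (n - 1) / 2
    recon = [[0.0] * n for _ in range(n)]
    for proj, angle in zip(projections, angles):
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        for y in range(n):
            for x in range(n):
                b = _bin(x, y, centre, cos_a, sin_a)
                if 0 <= b < n:
                    recon[y][x] += proj[b]
    return recon


# A single bright point, viewed from 8 angles, is recovered at the right place.
n = 9
phantom = [[0.0] * n for _ in range(n)]
phantom[2][6] = 1.0
angles = [k * math.pi / 8 for k in range(8)]
recon = back_project([project(phantom, a) for a in angles], angles, n)
print(max((recon[y][x], y, x) for y in range(n) for x in range(n)))  # → (8.0, 2, 6)
```

Every pixel’s update in the inner loops is independent of every other pixel’s, which is exactly the kind of work GPUs execute well in parallel.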
X-ray diffraction contrast tomography reveals the 3D grain shapes in a beta-titanium alloy, a material used in aircraft and engines. Image credit: W. Ludwig et al.
“In terms of challenges, there are always new ideas coming up,” says Wright. One is the 3D X-ray detector, commissioned in 2009, which consists of three semi-transparent screens that record three X-ray images at different distances from the sample. This should allow both the spatial and the angular distribution of the diffraction to be recovered, so that heavily deformed materials can be studied. The Grain Tracking group at ID19, which is migrating many of its experiments to ID11, is developing a large body of code to derive 3D grain maps on the ESRF computing cluster, but mapping plastically deformed samples remains a big challenge. Other challenges, says Wright, are to run more analyses online so that users get more feedback during experiments, and to maintain software packages developed by postdocs or scientists on short-term contracts after they have left the ESRF.
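Why three screens help can be shown with simple geometry. If the same diffracted beam is seen on screens at several known distances, a straight line fitted through its spot positions gives both where the beam left the sample (spatial information) and its inclination (angular information). The least-squares sketch below assumes, hypothetically, screens perpendicular to the beam axis at known distances and a spot already matched across screens; it handles one transverse coordinate, and the other is treated the same way.

```python
import math


def trace_ray(distances, positions):
    """Least-squares straight-line fit x(z) = x0 + slope * z through spot
    positions recorded on screens at distances z from the sample.

    Returns (x0, angle): x0 is where the ray crosses the sample plane
    (z = 0) and angle = atan(slope) is its inclination in radians.
    """
    n = len(distances)
    if n < 2 or len(positions) != n:
        raise ValueError("need matching positions on at least two screens")
    mean_z = sum(distances) / n
    mean_x = sum(positions) / n
    szz = sum((z - mean_z) ** 2 for z in distances)
    szx = sum((z - mean_z) * (x - mean_x) for z, x in zip(distances, positions))
    slope = szx / szz
    x0 = mean_x - slope * mean_z
    return x0, math.atan(slope)


# Hypothetical screens at 10, 20 and 30 mm; the ray leaves the sample at
# x = 0.5 mm with slope 0.1.
x0, angle = trace_ray([10.0, 20.0, 30.0], [1.5, 2.5, 3.5])
print(round(x0, 6), round(math.degrees(angle), 3))  # → 0.5 5.711
```

Two screens are the minimum for fixing a line; a third adds redundancy, so the fit can absorb noise in the spot positions.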
“We are constantly using bigger and more detectors, and improvements in the beamlines mean data come in faster,” explains Gavin Vaughan. “Experiments currently produce about 1 TB per day, but this will be 16 TB in the future and we currently have no way to deal with that!”
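Two pieces of back-of-envelope arithmetic put these figures in context: the average rate at which data must be written and processed, and how long hardware doubling every two years or so would take to absorb a 16-fold increase on its own. The sketch assumes data arrive evenly over 24 hours (real experiments are burstier, so peak rates are higher) and takes the doubling time from the “two years or so” quoted above; the function names are hypothetical.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_mb_per_s(tb_per_day):
    """Average rate needed to keep up, in MB/s (1 TB = 10**6 MB),
    assuming data arrive uniformly over the day."""
    return tb_per_day * 1e6 / SECONDS_PER_DAY


def years_to_absorb(growth_factor, doubling_years=2.0):
    """Years for capacity that doubles every `doubling_years` to grow
    by `growth_factor`."""
    return doubling_years * math.log2(growth_factor)


print(round(sustained_rate_mb_per_s(1), 1))   # → 11.6
print(round(sustained_rate_mb_per_s(16), 1))  # → 185.2
print(years_to_absorb(16))                    # → 8.0
```

A 16-fold increase is four doublings, so at that pace hardware alone would need about eight years to catch up, which is why faster algorithms and parallel code matter as much as new disks.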
Matthew Chalmers
This article appeared in ESRFnews, March 2011.
