Diamond Annual Review 2019/20

Scientific Software, Controls and Computation
Mark Heron, Head of Scientific Software, Controls and Computation

Having been created in 2018, the Scientific Software, Controls and Computation (SSCC) department is now established as part of the management structure of Diamond Light Source. The department's overall function is to provide the best possible support for delivering Diamond's scientific output, through outstanding and evolving software, computing and control systems. It is organised into eight groups: Scientific Computing, Data Analysis, Data Acquisition, Beamline Controls, Accelerator Controls, Electronic Systems, Business Applications, and Cyber Security. The Business Applications and Cyber Security groups were both introduced over the course of the past year. The Business Applications group was introduced in recognition of the increased importance of information management and business software in supporting both the scientific and corporate functions of Diamond. The creation of a Cyber Security group recognises that Diamond must deliver a level of cyber security maturity, across a large and complex IT infrastructure, that is consistent with best practice and addresses continually developing and changing cyber security threats.

As many beamlines introduce new detectors to enhance their data collection capabilities, the department is responding to the up-and-coming Big Data challenges. As part of the planning for the proposed Diamond-II upgrade, SSCC are developing exciting new capabilities in these areas.

To ensure that forthcoming challenges are met, a medium-term strategy for SSCC was developed during the year. This involved extensive consultation within Diamond to explore and understand future science needs and drivers.
This then informed a set of critical objectives that define future direction, with their realisation considered through a series of roadmaps. Following an internal review at the end of 2019, the strategy will be reviewed externally during 2020, before being presented to the Science Advisory Committee.

The following are examples of some of the interesting developments across the broad range of activities that have taken place within the SSCC department during the past year.

Automatic Reconstruction of Diffraction Tomography Data

The Microfocus Spectroscopy beamline (I18) was the first spectroscopy beamline built at Diamond and can explore the elemental makeup of a sample under investigation with very high resolution: around two microns (about 50 times smaller than the width of a human hair). Over the past three years there have been major improvements to the hardware and software on I18 to enable significantly faster continuous scanning measurements, facilitating live data processing and visualisation. These improvements were made as part of the Mapping project and resulted in a step change in the data collection performance of the beamline.

During the past year a new detector, called an Excalibur 3M, was installed on I18, running with new Odin data acquisition software for fast readout and support of the Mapping project capabilities. This allowed data from diffraction experiments to be collected at exposure times similar to those of the traditionally faster fluorescence measurement, enabling more efficient multi-modal measurements to be made. As well as enabling faster and more informative scans, the development made it easier to connect the scan information to external systems, e.g. information management and automatic processing applications.
At the beginning of December 2019, all the software pieces were in place to enable live data processing for the new Excalibur detector and automatic tomographic reconstruction of fluorescence and diffraction data, using Diamond’s data processing framework, SAVU. All the processing applications were managed by a new micro-services framework, which triggers the automatic processing and records its status. The resulting processed data files were registered against the original data collected in the experiment information management database, ISPyB, and so were viewable by users on the ISPyB webpage. The first user group to use this new system were very happy with its performance, stating, “As an experienced Diamond user, I have to say this is one of the most enjoyable beamtimes I’ve had so far. I am very impressed by the upgrade of the experimental set-up (detector), data acquisition and almost real time data analysis.”

Fastest Detectors in Diamond

Over the past year Diamond has installed several best-in-class X-ray detector systems, ensuring provision of world-leading beamlines. For example, two of the Macromolecular Crystallography (MX) beamlines, I03 and I04, each received a new Eiger 2 XE detector, with 16 million pixels running at up to 560 frames per second. These modern detectors present extensive challenges to the software and IT support groups because of the high rates at which data can be acquired, which make transporting, processing and storing the data difficult. Odin data acquisition software, a collaborative development between Diamond and STFC, was developed specifically to control and collect data from high data-rate detectors; it is highly scalable and can be configured for specific detectors and applications. The Eiger 2 XE, and several other types of detector around Diamond, successfully use the Odin software to acquire data at rates of multiple gigabytes per second.
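
To give a feel for the scale involved, the detector figures quoted above (16 million pixels, up to 560 frames per second) can be turned into a rough upper bound on the raw data rate. The sketch below is illustrative only: the function name is hypothetical, the pixel count is the rounded figure from the text, and the value of 2 bytes per pixel is an assumption, not a specification of the Eiger 2 XE. Any compression applied before transport would lower the real figure.

```python
def raw_data_rate_gb_per_s(pixels: int, frames_per_second: float,
                           bytes_per_pixel: int = 2) -> float:
    """Uncompressed detector output in gigabytes (10**9 bytes) per second.

    bytes_per_pixel defaults to 2 (16-bit counts); this is an assumption
    made for illustration, not a figure taken from the source text.
    """
    return pixels * bytes_per_pixel * frames_per_second / 1e9


# Figures quoted in the text: "16 million pixels" at "up to 560 frames per second".
EIGER_PIXELS = 16_000_000
MAX_FRAME_RATE = 560

rate = raw_data_rate_gb_per_s(EIGER_PIXELS, MAX_FRAME_RATE)
print(f"Uncompressed upper bound: {rate:.1f} GB/s")
```

Under these assumptions the uncompressed bound comes out in the tens of gigabytes per second, which illustrates why transport, processing and storage all need to scale together.
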
For the Eiger 2 XE models on the MX beamlines, Odin routinely enables data acquisition of up to 28,800 frames in 58 seconds (roughly 500 frames per second). On MX beamlines, radiation damage is the single biggest limiting factor for data quality and must be balanced against the signal-to-noise ratio of the data in order to collect an optimal data set. One option is to perform stepped transmission measurements, where the same scan is repeated a number of times with increasing photon flux: at the lowest flux the data collected are too weak, at the highest flux the sample is damaged, and the optimum lies in between. While this is a reliable and conservative approach to data collection, it was expensive in time with the previous generation of detectors: each scan would take two minutes or more, so a complete run of four acquisitions took more than 10 minutes. With the Eiger 2 XE detectors able to run at 500 Hz, the full four-scan acquisition can be performed in less than a minute, half the time of a single scan with the old detector. With the automated analysis in place, a user can easily identify the best available data sets from the selection.

Scientific Computing in the Cloud

Collecting, processing and analysing large volumes of experimental data requires an ever-increasing suite of computing resources if users of Diamond are to do so as effectively and efficiently as possible. Cloud computing offers a flexible way to realise this. For Diamond, the big data challenge is the need to transfer many tens of terabytes (TB) of data around the network daily (1 TB is roughly equivalent to 250,000 photos on a phone with a 12 MP camera). The consequence is that it is not feasible to move all parts of the data acquisition and analysis pipeline off-site to the public cloud, because of the amount of data that would need to be exchanged.
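
The scale of the transfer problem can be illustrated by converting a daily data volume into the sustained network bandwidth needed to move it. This is a minimal sketch under stated assumptions: the function name is hypothetical, the daily volumes of 10, 50 and 100 TB are illustrative stand-ins for "many tens of terabytes", terabytes are taken as 10**12 bytes, and the transfer is assumed to be spread evenly over 24 hours. It says nothing about the link capacity actually available to Diamond.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_gbit_per_s(terabytes_per_day: float) -> float:
    """Average bandwidth (gigabits per second) needed to move a daily volume.

    Assumes decimal terabytes (10**12 bytes) and a perfectly even transfer
    over 24 hours, with no protocol overhead or retransmission.
    """
    bits_per_day = terabytes_per_day * 1e12 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e9


# Illustrative daily volumes only; the text says "many tens of terabytes".
for tb in (10, 50, 100):
    print(f"{tb:>4} TB/day needs about {sustained_gbit_per_s(tb):.2f} Gbit/s sustained")
```

Because real data collection is bursty and not evenly spread, peak requirements would be higher than these averages.
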
To address this data-transfer constraint, Diamond is developing a hybrid cloud model; this requires determining which computational problems can be moved to the off-site cloud providers, and which computing challenges need to be kept on-premise.

Figure: Eiger 2 XE Detector on beamline I03.

Figure: Example of I18 auto processing. One of forty thousand diffraction images collected during the tomography scan (left), used to generate a sinogram of the scan in which each pixel corresponds to a single diffraction image (centre), and a sample cross-section of the volume reconstructed from the sinogram (right). Data for images courtesy of Dr Tan Sui’s research group at the University of Surrey.

Figure: On-premise Cloud Computing Infrastructure.