Data Analysis Software Group
The work of the Data Analysis Software group has continued unabated over the past year, with renewed effort on everything from delivering new functionality with Data Acquisition colleagues to helping increase the general value and quality of scientific software on an international stage.
Development of the DAWN software has accelerated this year, with new scientific staff joining the team and contributing important functionality for specific scientific areas. The modular nature of DAWN and the technology it depends on has allowed components of the package to be used in new ways. As well as bringing new features into GDA, the data acquisition software at Diamond, components have been reused by the controls group at Diamond as part of their long-term interfaces to EPICS. This has also led central facilities, including neutron spallation sources as well as synchrotron sources, to incorporate new functionality and find new uses for the software [1].
The last year has seen the continuation of the cross-beamline project to streamline calibration and data reduction of powder diffraction data from 2D detectors. With the collaboration of beamline colleagues, the calibration aspects of the project were completed at the end of last year and this year the focus has been on data reduction.
The goal has been to produce a modular, generic data processing framework and user interface, allowing custom data processing pipelines to be built and applied over multiple datasets. The resulting framework is now available as the processing perspective in DAWN. It contains many independent mathematical operations specific to 2D powder diffraction data (including azimuthal/radial integration), as well as many that are applicable to other kinds of data processing. Making the system completely generic has led to it being adopted for processing data from other experiments at Diamond (for example Angle-Resolved Photoemission Spectroscopy (ARPES) data reduction), taking a step towards giving users a consistent data analysis experience, not just across beamlines, but also across techniques.
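The idea of chaining independent operations and applying the same chain to several datasets can be illustrated with a minimal Python sketch. This is not DAWN code and does not use the DAWN API: the function names (`subtract_background`, `azimuthal_integration`, `run_pipeline`), the constant background model, the beam centre and the synthetic detector frames are all assumptions made purely for illustration.

```python
from functools import partial

import numpy as np


def subtract_background(image, level=0.0):
    """Subtract a constant background level (hypothetical model), clipping at zero."""
    return np.clip(image - level, 0.0, None)


def azimuthal_integration(image, centre, n_bins=50):
    """Average a 2D image over azimuth: mean intensity in each radial bin."""
    rows, cols = np.indices(image.shape)
    radius = np.hypot(rows - centre[0], cols - centre[1])
    edges = np.linspace(0.0, radius.max(), n_bins + 1)
    # digitize gives 1-based bin numbers; shift to 0-based and keep the
    # outermost pixel (radius == edges[-1]) inside the last bin.
    index = np.clip(np.digitize(radius.ravel(), edges) - 1, 0, n_bins - 1)
    total = np.bincount(index, weights=image.ravel(), minlength=n_bins)
    count = np.bincount(index, minlength=n_bins)
    # Empty bins are reported as zero rather than dividing by zero.
    profile = np.divide(total, count, out=np.zeros(n_bins), where=count > 0)
    bin_centres = 0.5 * (edges[:-1] + edges[1:])
    return bin_centres, profile


def run_pipeline(datasets, operations):
    """Apply the same ordered list of operations to every dataset."""
    results = []
    for data in datasets:
        for operation in operations:
            data = operation(data)
        results.append(data)
    return results


# Assumed pipeline: background subtraction followed by azimuthal integration.
pipeline = [
    partial(subtract_background, level=1.0),
    partial(azimuthal_integration, centre=(32, 32), n_bins=20),
]

# Two synthetic, uniform detector frames standing in for real 2D data.
frames = [np.full((64, 64), 3.0), np.full((64, 64), 5.0)]
results = run_pipeline(frames, pipeline)
```

Because each operation is an ordinary callable taking the output of the previous one, steps can be reordered, removed or replaced without changing the others, which is the property that lets a generic framework be reused across techniques.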
Since 2013, Diamond software and beamline groups have focused on creating a clean, robust and simple user experience, from tomography data collection through to the processing of raw data into volumetric data. The efforts and achievements of the groups involved so far can be seen in the dramatic rise in the volume of tomography data being collected, and in a threefold increase in usage of the Diamond computing cluster for tomographic reconstruction.
As well as helping to coordinate these activities, the data analysis group have also been working closely with other Diamond staff to develop and enable new techniques to make the best use of the large cluster resources available and improve the experiment flow. This tomography processing pipeline is compatible with similar work being undertaken inside the DAWN data processing framework, and is being considered for adoption by the wider community, including CCPi (Collaborative Computational Project in Tomographic Imaging) and the IMAT station at the ISIS neutron source.
Crystallography experiments at Diamond range from materials science, through chemistry, to structural biology, each with its own challenges. One ambition beginning to be realised is to take the strengths of one domain and transfer them to another. As developments in data processing and structure solution such as DIALS (led by Gwyndaf Evans) and CCP4 continue to deliver improvements to the automated processes in macromolecular crystallography (MX), the software developed is now being used to help process data from chemical and materials crystallography.
DIALS is a collaboration including Diamond, CCP4 and the Lawrence Berkeley National Laboratory (LBNL); last year good progress was reported on the software being developed. The project itself has continued to go from strength to strength, with many of the components now being routinely used in MX pipelines at Diamond and also in novel and challenging problems in processing data from new detectors, including the curved detector on the Long-Wavelength MX beamline (I23).
As the Eclipse software framework greatly benefits key Diamond software such as DAWN and GDA, Diamond have joined the steering committee of the Eclipse Science Working Group, along with other major software players such as IBM and Oak Ridge National Laboratory.
Diamond participated in the Science day at EclipseCon, the annual international conference in California, which highlighted scientific projects and identified areas where open source projects can work together. The initiative has led to software being designed to share data description, plotting, parallel tools and high-performance computing (HPC) job running.
This year the team has had the opportunity to work closely with the Software Sustainability Institute (SSI) as Mark Basham was selected as one of the 2014 SSI Fellows. This Fellowship scheme provides funding and access to a network of software developers and scientists, all of whom are trying to make their code more sustainable. The funding provided can be used by the fellows to run events which promote the writing of sustainable software, and Mark used his to run a series of Python Community of Practice events. The events have been well received and have brought together developers and scientists from across the Rutherford Appleton Laboratory (RAL) site and from the Culham Centre for Fusion Energy to participate in events such as Cyber Dojos and Hackathons.
The data produced at Diamond continues to grow at pace, reflecting the investment and developments made in everything from detector technology and beamline stability to automation and improved software. In 2013/2014, the total amount of data archived from Diamond was reported to be 1 PB. By March 2015 this had grown to an impressive 3 PB of data, equating to 800 million files, catalogued onto tape.
1. Basham M. et al. Data Analysis WorkbeNch (DAWN). J. Synchrotron Rad. 22 (2015).
Alun Ashton, Diamond Light Source, firstname.lastname@example.org