Diamond Annual Review 2021/22

Scientific Software, Controls and Computation

Mark Heron, Head of Scientific Software, Controls, and Computation

The Scientific Software, Controls and Computation (SSCC) department manages all software, computing and control systems to facilitate and support the science programme of Diamond. Over the last year there has been an increasing emphasis on planning for Diamond-II. The department functions as eight groups: Scientific Computing, Data Analysis, Data Acquisition, Beamline Controls, Accelerator Controls, Electronic Systems, Scientific Information Management Systems and Cyber Security. The overall structure and function of these areas recognises the importance of, and is optimised to provide, the best possible delivery and support for software, computing and control systems.

There have been a number of changes in SSCC over the last year. A Scientific Information Management Systems (SIMS) group was formed, and an interim group leader appointed; a new Head for Data Analysis was appointed; and a new role as Head of the Integrated Software Programme for Diamond-II was created and an appointment made.

Throughout most of the last year the majority of SSCC staff worked remotely, whilst always ensuring adequate on-site cover. Maintaining effective and timely communication was seen as imperative to be able to support operations effectively.

The last year has seen an increasing emphasis on planning and making preparations for Diamond-II. A series of work packages for software, computing and controls has been developed, and estimates have been made of the resources required to deliver the plan. Further details of Diamond-II work relating to the following areas are reported below: Development of the New Software Architecture; Diamond-II Software and Computing; Design of New Electron Beam Position Monitor.

In the previous Annual Review, it was reported that a Strategy for Scientific Software, Controls and Computation had been published; this presented a vision for how the department will meet forthcoming challenges in software, controls and computing, and a set of SSCC goals was specified. The goals are listed below, together with a progress report:

Selected highlights from exciting SSCC developments across a broad range of activities during the past year are presented:

A New Data Centre

Diamond currently operates two data centres with a combined capacity of 500 kW. These house the required IT resource of approximately 18,000 compute cores and 20 PB of high-performance storage needed today for first-level data capture and processing of data from the photon beamlines and electron microscopes. In addition, subsequent processing utilises shared cloud-based resource, primarily operated by STFC as the IRIS service to the science community.

Today the first data centre is 100% full and, having been constructed in 2011, its cooling infrastructure is nearing end-of-life; the second data centre is 60% full. As a result, there is limited capacity for future IT resource to support the anticipated growth in data volumes and processing needs. Racks in the current data centres support IT loads of up to 10 kW per rack, while modern IT equipment can reach IT loads of 80 kW per rack and higher. This combination of a need for greater capacity and changing technology has led to the specification for a new data centre.

During the past year, the technical design for the new data centre has been developed in conjunction with a company that specialises in the design and construction of data centres. The resulting design provides for an operating power envelope of up to 1 MW of IT load. The design provides for 40 racks to house the IT equipment.
These are specified as: 20 low-power racks at 25 kW per rack, 15 high-power racks at 60 kW per rack and 5 network racks at 5 kW per rack. The low-power racks will use rear-door cooling and the high-power racks a combination of rear-door cooling and water-to-chip cooling.

The design affords a high level of resilience and is fault tolerant, to minimise the downtime of the IT services. This is achieved through two separate electrical supplies, with one fed through uninterruptible power supplies to protect against short mains outages. The cooling system also includes redundancy and extracts the heat load through a mixture of chilled water to rear-door coolers for the low-power racks, and a combination of rear-door coolers and direct water-to-chip cooling from dry air coolers for the high-power racks. This combined cooling approach provides flexibility to accommodate the mixed IT load required, supports the move to new IT technologies with higher power densities, and gives good operational efficiency in terms of data centre cooling.

As of spring 2022, the technical design for the data centre has been reviewed; a model of it is shown in Fig. 1. By the summer of 2022, the detailed design work should be finished and the construction project under way, with completion and handover scheduled for summer 2024.

Figure 1: Model of Diamond’s New Data Centre (Courtesy Oper8 Global Ltd).

Planning for Diamond-II Software and Computing

Diamond-II will deliver photon beams with substantially increased spectral brightness. This major enhancement will enable significant new science. Experiments will become more complex and will be conducted with higher spatial and temporal resolutions. To facilitate these there must be greater automation of experiments, faster detectors, rapid data processing and reduction, and the introduction and development of new data processing techniques.
The latter will exploit recent developments in Artificial Intelligence and Machine Learning toolkits, automating data reduction and analysis. Science will benefit too from a more open software environment; an open environment will facilitate a greater level of collaboration between software and scientist, profiting also from a lowered threshold of entry for conducting experiments. All these changes will be made possible by the significant enhancement of Diamond’s software and computing capabilities delivered as part of the Diamond-II upgrade. These developments will make it possible to handle the major increase in data and to transform that data into scientific knowledge.

During the summer of 2021, key core software, controls and computational development areas were identified through a series of internal workshops. These identified areas are:

• Science Specific Data Analysis Software Developments
• Detector Readout
• Data Archiving
• Post-visit Data Analysis Services
• High Performance Sample Stages
• User Administration and Information Management
• Modernisation of Data Acquisition Software Framework

Science Specific Data Analysis Software Developments

Developing data analysis software will be key to enabling new science. Developments will address limitations in the software for delivering computationally intensive science. Accelerating these applications will benefit the Ptychography, Tomography and MX science domains. Speedier and more robust algorithms will benefit Ptychography experiments. The Tomography domain will also welcome speed, alongside improvements to usability and the introduction of packaged services. Macromolecular crystallography (MX) will see the necessary step change in performance realised via a migration to an in-memory architecture and the use of GPU/FPGA acceleration. Real-time analysis will be important to all, with automatic processing of data through all the analysis pipelines.

Detector Readout

Detector data rates are being seen to double every seven and a half months.
Diamond currently has data rates of up to 4 GB/s from a single detector. Data is transferred from all the detectors to central storage, and the existing 50-80 GB/s high-performance file systems can handle the demands from multiple beamlines. Integrating detectors with high-speed file systems reliably is, however, becoming increasingly challenging, and will only become more so as data rates continue to increase. Developments in this area will address the need to stream data to data consumers and for detectors to write directly to memory data processors, bypassing the file system.

Data Archiving

Diamond’s current data archiving solution hosts over 20 PB of data, but it will not scale effectively to cope with the order-of-magnitude change in data volumes expected with Diamond-II. For Diamond-II, the total volume of data will increase significantly, into the hundreds of PB. To support the larger, more complex data, the usability of the archive will be enhanced to provide greater functionality in how data is selected and how it is moved out of the archive and processed. This will be a key enabler of better data processing services and open data.

Post-visit Data Analysis Services

Transferring data from Diamond to users’ home institutions for processing presents Diamond’s existing users with significant challenges due to the size of the data and network limitations (see Fig. 2). When the users get the data back to their home institute, the computing resources and software to process it may not be readily available. Furthermore, new users expect to see results and not the raw data alone. Whilst Diamond provides some post-visit data analysis services today, for the computationally intensive science domains, MX and

SSCC Goal and Report on Progress

Goal: Deliver high quality software, computing and controls provision and support in order to enable Diamond’s science programme: from experiment approval to publication.
Progress:
• As reported, examples of related work:
  • Development of a new software architecture;
  • Design for a new data centre;
  • Design of new electron beam position monitor;
  • Automation of Electron Microscopy Single Particle Analysis Data Processing.

Goal: Establish strong partnerships, internally and externally, with the Science and Technology Facilities Council (STFC), other facilities, universities, etc., to deliver best-in-class software and computing in a collaborative, effective and sustainable way.

Progress:
• Development with Birmingham University of services to run on the Baskerville Tier 2 HPC system.
• Support for STFC in the development of an Outline Business Case for the campus Research Computing Centre.
• A strong and long-running collaboration on the Odin Data Acquisition and Controls software framework with the STFC Detector Systems Software Group.
• Well-established collaborations with SOLEIL, and more recently MAX-IV, on the PanDaBox for low-latency experiment triggering and synchronisation.
• Collaborations with NSLS-II and other EPICS sites to adopt and progress BlueSky developments.

Goal: Deliver a step change in experiment capabilities and efficiencies, across the science programme, through: intuitive, usable applications; automated analysis of data; remote access capabilities; automation of instrument and sample management; high-speed readout of detectors; and near real-time data processing for visualisation feedback.

Progress:
• Preparations for a step change in experimental capabilities and efficiencies as part of the Diamond-II planning.

Goal: Widen experimental data exploitation through delivering FAIR and Open Data, and through new capabilities in data science (Machine Learning (ML) and Artificial Intelligence (AI)), modelling and analysis.

Progress:
• Significant developments in delivering FAIR and Open Data, with changes proposed to the Diamond Data Policy, existing data embargoes extended, and a comprehensive user consultation planned for 2022.

Goal: Realise a hybrid computing model, which delivers on-site computing services to those applications for which low latency and high bandwidth are essential, with all other services delivered through cloud computing.

Progress:
• Development in containerisation to facilitate greater use of external computing resources.
• Substantial increase in external computing resource, largely through the use of the IRIS compute cluster provided by STFC.
• Close work with STFC to support the planning for the Research Computing Centre.
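
As a rough illustration of the detector data-rate trend quoted under Detector Readout (rates doubling every seven and a half months, with up to 4 GB/s from a single detector today), the sketch below projects a single-detector rate forward in time. It assumes, purely for illustration, that the doubling trend continues unchanged and applies to the 4 GB/s figure; the function names and the comparison against the 50 GB/s lower bound of the file-system range are hypothetical choices, not part of Diamond's planning.

```python
import math

# Figures taken from the review text.
DOUBLING_PERIOD_MONTHS = 7.5   # detector data rates double every 7.5 months
CURRENT_RATE_GB_S = 4.0        # up to 4 GB/s from a single detector today


def projected_rate(months_ahead: float,
                   start_rate: float = CURRENT_RATE_GB_S,
                   doubling_period: float = DOUBLING_PERIOD_MONTHS) -> float:
    """Rate in GB/s after `months_ahead` months.

    Assumption (not from the review): growth stays exponential with a
    constant doubling period.
    """
    return start_rate * 2.0 ** (months_ahead / doubling_period)


def months_until(target_rate: float,
                 start_rate: float = CURRENT_RATE_GB_S,
                 doubling_period: float = DOUBLING_PERIOD_MONTHS) -> float:
    """Months needed for the projected rate to reach `target_rate` GB/s."""
    return doubling_period * math.log2(target_rate / start_rate)


if __name__ == "__main__":
    for months in (0, 7.5, 15, 30):
        print(f"{months:>5} months: {projected_rate(months):6.1f} GB/s")
    # Hypothetical comparison: one detector against the 50 GB/s lower
    # bound of the existing file systems, which serve many beamlines.
    print(f"Months to reach 50 GB/s: {months_until(50.0):.1f}")
```

Under these assumptions a single detector would reach 50 GB/s in a little over two years, which gives a sense of why streaming data and writing directly to memory are being considered.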