Diamond Annual Review 2023/24
Update on Diamond-II software and computing

The Diamond-II Core Software, Controls and Computing project, the third pillar of the Diamond-II project, seeks to deliver core developments across software, controls and computing to fully exploit the opportunities afforded by the Diamond-II machine upgrade and its new beamlines. It is a significant body of work underpinned by the development, deployment and exploitation of a modern beamline software architecture. This will enable science that is impossible today by closely integrating data analysis, high performance computing, the control system, data acquisition (DAQ) and beamline instrumentation. Data analysis throughput will be improved by reducing latency in time-critical steps and by substantially better management of data, including facility-wide access to rich metadata catalogues. Whilst the ultimate ambition of the project is to harness the brightness of the new Diamond machine and enable flagship capabilities, the project will also deliver a continuous stream of incremental benefits: steadily reducing technical debt, addressing critical obsolescence, and unlocking new capabilities deployed with greater flexibility and extensibility. This supports Diamond’s objective of implementing exemplar services within the new software architecture, chosen on the basis of science opportunity, to enhance Diamond and de-risk Diamond-II. The project is driven by an extensive plan: a bounded waterfall plan provides a clear strategy for delivering interconnected developments across disciplines with finite time and resources, while agile loops within it help meet key milestones (see Figure 2a) and allow for smaller increments of work, early deployments (portfolio opportunities) and regular feedback.
First reviewed externally in Summer 2022, the beamline software architecture has matured substantially this year: a new Architecture Report has been written, and the project looks forward to external review at the inaugural meeting of the Scientific Software Advisory Committee in June. The architecture encapsulates units of functionality into discrete services. The core supporting infrastructure for this is Kubernetes, a Cloud Native Computing Foundation project, which will run both in-house software and upstream, community-supplied projects. Substantial progress has been made over the year in establishing and provisioning this infrastructure and in the required containerisation of software and services. At the centre, Diamond’s DAQ Software Framework is the primary interface for Diamond’s users, responsible for orchestrating experiments and managing data collection. GDA, Diamond’s current platform, will be replaced by Athena, a service-based DAQ framework. Closely coupled to this is the replacement of the existing middle-layer framework, Malcolm, and support for more extensive fly-scanning. These intertwined developments, which span two substantial project work streams, will leverage NSLS-II’s Bluesky library for data collection and its Ophyd library for device abstraction. Members of Diamond’s SSCC software groups have collaborated closely and effectively with their NSLS-II counterparts throughout the year. In November, the first releases of the core Athena services and Ophyd-async were deployed on the new Kubernetes infrastructure at the I22 beamline, alongside the current GDA platform. They were used successfully in a user experiment to synchronise hardware control and data collection. Within MX, Bluesky has been used with great success to support Unattended Data Collection.
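Bluesky separates what an experiment does (a plan, written as a Python generator that yields messages) from how it is executed (the RunEngine, which dispatches those messages to Ophyd devices). The toy below illustrates that pattern in plain Python, without Bluesky installed. `Msg`, `FakeMotor`, `FakeDetector`, `line_scan` and `toy_run_engine` are hypothetical simplifications written for this sketch; they are not the Bluesky or Ophyd API, and the simulated detector signal is invented.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Iterator


@dataclass
class Msg:
    """A single instruction emitted by a plan (hypothetical, simplified)."""
    command: str        # "set", "create", "read" or "save"
    obj: Any = None     # device the command targets, if any
    args: tuple = ()    # positional arguments, e.g. a target position


@dataclass
class FakeMotor:
    name: str
    position: float = 0.0

    def set(self, value: float) -> None:
        self.position = value

    def read(self) -> dict:
        return {self.name: self.position}


@dataclass
class FakeDetector:
    name: str
    motor: FakeMotor  # the simulated signal depends on the motor position

    def read(self) -> dict:
        # Hypothetical signal that peaks when the motor is at 2.0
        return {self.name: 10.0 - (self.motor.position - 2.0) ** 2}


def line_scan(det, motor, points: Iterable[float]) -> Iterator[Msg]:
    """A plan: it describes the experiment but performs no I/O itself."""
    for p in points:
        yield Msg("set", motor, (p,))  # move the motor
        yield Msg("create")            # open a new event bundle
        yield Msg("read", motor)       # record the motor position
        yield Msg("read", det)         # record the detector signal
        yield Msg("save")              # close the bundle as one event


def toy_run_engine(plan: Iterator[Msg]) -> list:
    """Execute a plan's messages against devices and return the events."""
    events, bundle = [], None
    for msg in plan:
        if msg.command == "set":
            msg.obj.set(*msg.args)
        elif msg.command == "create":
            bundle = {}
        elif msg.command == "read":
            bundle.update(msg.obj.read())
        elif msg.command == "save":
            events.append(bundle)
            bundle = None
        else:
            raise ValueError(f"unknown command: {msg.command!r}")
    return events


if __name__ == "__main__":
    motor = FakeMotor("x")
    det = FakeDetector("signal", motor)
    events = toy_run_engine(line_scan(det, motor, [0.0, 1.0, 2.0, 3.0]))
    print(max(events, key=lambda e: e["signal"]))  # {'x': 2.0, 'signal': 10.0}
```

Because the plan only yields messages, the same plan could in principle be run against simulated devices for testing or real hardware on a beamline; in Bluesky proper, this separation is also what lets the RunEngine handle concerns such as pausing and publishing data without changing the plan.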
The development of Bluesky plans and Ophyd devices will be at the heart of delivering new science capabilities; combined with the considerable developments across the rest of the project, they sit at the epicentre of a new integrated architecture focused on effective collaboration between the software groups and science (see Figure 2b) to deliver flexible and extensible software solutions. The project has also advanced across the data analysis and information management domains. This year has seen notable success in exploiting novel hardware and architectures for performance gains, for example GPU-based spot-finding in MX. Diamond-II funding has been used to recruit key positions in the second half of the project, enabling the beamline software architecture to mature across the full stack. A new Universal Laboratory Information Management Systems team has been formed to drive forward the much-needed provision of information management systems for the Physical Sciences and to tackle the challenge of providing access to rich metadata catalogues. A new Shipping Service has been delivered, and Diamond’s SciCat has been deployed to B24. A new Data Analysis Platform team has also been established. The project teams have been able to explore several key cross-cutting themes: streaming (e.g. by prototyping ptychography developments on I08), user interfaces based on web technologies, and, crucially, authorisation and authentication to ensure a robust, secure, scalable and extensible architecture. Despite this progress, the project teams are very mindful of the challenges ahead as they support new deployments and capabilities of the new architecture alongside the existing software.
Spot-finding with GPUs for macromolecular crystallography

A critical part of automated macromolecular crystallography data collection is aligning the sample with the X-ray beam. This is done with an automated optical step followed by a raster scan of the sample with the X-ray beam. Basic data analysis, or spot-finding, is performed on the resulting images, and the position with the strongest diffraction (identified at orthogonal angles) is taken as the optimal location for data collection. Until recently, this involved writing the data to HDF5 (Hierarchical Data Format version 5, an open-source file format that supports large, complex, heterogeneous data) files, then performing the analysis from these files once acquisition was complete. This led to a latency of a few seconds between the end of the scan and the availability of results, or longer if there was heavy contention for the resources that beamlines share to run this system. With the state-of-the-art Eiger 2XE 16M detectors on beamlines I03 and I04, which routinely operate at 500 Hz, this latency is longer than the subsequent data collection itself, an unacceptable delay. Addressing it will improve the throughput of automated data collection services.

Section 3: The first image shows pixel classification, performed on the GPU, identifying which pixels belong to “spots” and which are part of the background.

Section 2: The second image shows the analysis in SynchWeb, with the diffraction maxima highlighted as a heat map.
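The spot-finding step described above can be illustrated with a deliberately simplified, CPU-only sketch in plain Python. It assumes a local-background threshold (a pixel counts as a spot pixel if it exceeds the mean of its neighbourhood by several Poisson standard deviations), groups spot pixels into connected spots, and picks the raster position whose image contains the most spots. The window size, threshold, the `synthetic_image` helper and all function names are hypothetical. This is not Diamond’s GPU implementation, and it handles a single grid rather than the scans at orthogonal angles mentioned above.

```python
import math
from collections import deque


def synthetic_image(size, spot_pixels, background=1, peak=50):
    """Hypothetical test image: flat background with bright spot pixels."""
    image = [[background] * size for _ in range(size)]
    for r, c in spot_pixels:
        image[r][c] = peak
    return image


def classify_pixels(image, half_width=2, n_sigma=3.0):
    """Mark a pixel as 'spot' if it is well above its local background.

    Local background = mean of the surrounding (2*half_width+1)^2 window,
    excluding the pixel itself. The noise estimate assumes Poisson
    statistics (std ~ sqrt(mean)), floored at 1 count.
    Assumes the image has at least two pixels.
    """
    rows, cols = len(image), len(image[0])
    mask = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total, n = 0, 0
            for rr in range(max(0, r - half_width), min(rows, r + half_width + 1)):
                for cc in range(max(0, c - half_width), min(cols, c + half_width + 1)):
                    if (rr, cc) != (r, c):
                        total += image[rr][cc]
                        n += 1
            mean = total / n
            threshold = mean + n_sigma * math.sqrt(max(mean, 1.0))
            mask[r][c] = image[r][c] > threshold
    return mask


def count_spots(mask):
    """Count 4-connected groups of spot pixels using breadth-first search."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    spots = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            spots += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return spots


def best_grid_position(images_by_position):
    """Return (best position, spot counts) for one raster-scan grid."""
    counts = {pos: count_spots(classify_pixels(img))
              for pos, img in images_by_position.items()}
    return max(counts, key=counts.get), counts


if __name__ == "__main__":
    grid = {
        (0, 0): synthetic_image(8, [(1, 1), (5, 5), (5, 6)]),  # 2 spots
        (0, 1): synthetic_image(8, [(3, 3)]),                  # 1 spot
        (1, 0): synthetic_image(8, [(1, 1), (1, 5), (5, 1)]),  # 3 spots
        (1, 1): synthetic_image(8, []),                        # background only
    }
    best, counts = best_grid_position(grid)
    print(best, counts)  # (1, 0) {(0, 0): 2, (0, 1): 1, (1, 0): 3, (1, 1): 0}
```

Each pixel’s classification depends only on its own neighbourhood, which is why this kind of step maps naturally onto a GPU (for example, one thread per pixel). It also shows why analysing images as they arrive, rather than after the whole scan has been written to file, removes the end-of-scan latency described above.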