Crystallographic fragment screening experiments on beamlines like I04-1 and VMXi generate vast datasets of rich structural data for drug discovery. Unlike in other methods of ligand screening, however, establishing ligand binding requires a complete structural model of that binding to be constructed. This can be very challenging: ligands in fragment screens typically bind at low occupancy, which can make binding sites difficult to identify and model in conventional crystallographic maps.
The Pan-Dataset Density Analysis (PanDDA) algorithm was developed to address this issue. By leveraging the many isomorphous datasets found in fragment screening, PanDDA can characterize the “expected” electron density – or “ground state” – and therefore identify outliers – or “events” – and generate synthetic “background corrected” maps in which modelling ligands is straightforward. The user experience is further eased by a convenient GUI for navigating and building these events.
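The idea of a ground state, outlier detection and background correction can be illustrated with a toy calculation. The sketch below is not the PanDDA code: it treats each map as a flat list of voxel values, and it assumes a simple linear mixture in which a test map is a fraction `bdc` of the ground state plus a fraction `1 - bdc` of the bound state. The function names and the `bdc` parameter are illustrative choices.

```python
from statistics import mean, stdev


def ground_state(reference_maps):
    """Per-voxel mean and sample standard deviation across reference maps."""
    voxels = list(zip(*reference_maps))
    return [mean(v) for v in voxels], [stdev(v) for v in voxels]


def z_map(test_map, mu, sigma, eps=1e-6):
    """Per-voxel Z-score of a test map against the ground state."""
    return [(x - m) / max(s, eps) for x, m, s in zip(test_map, mu, sigma)]


def background_corrected(test_map, mu, bdc):
    """Remove a fraction `bdc` of the mean map and rescale the remainder.

    Assumes test_map = bdc * ground_state + (1 - bdc) * bound_state.
    """
    if not 0.0 <= bdc < 1.0:
        raise ValueError("bdc must lie in [0, 1)")
    return [(x - bdc * m) / (1.0 - bdc) for x, m in zip(test_map, mu)]
```

Under this mixture assumption, a voxel that is flat in the ground state but carries weak ligand density stands out in the Z-map, and subtracting the scaled mean recovers the bound-state density at full scale.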
PanDDA 2 was developed to bring objectivity to the characterization of the ground state and to the identification and modelling of ligand binding events. PanDDA 2 does this through several core innovations: automated identification of multiple candidate ground states through dataset clustering, assessment of event quality by machine learning (ML), automated ligand building, and assessment of ligand model quality by ML.

Together, these improvements allow PanDDA 2 to as much as double the number of fragments discovered that are relevant to medicinal chemistry. Hits found only by PanDDA 2 have already proven important in pushing lead series to nanomolar potency. PanDDA 2 has also considerably expanded the number of interactions explored in target active sites across screening experiments.
Previous work such as Cluster4x has shown that correct selection of the characterization datasets for PanDDA can reveal otherwise undetected ligand binding. However, because heterogeneity is ubiquitous in fragment screening data, it can be difficult to decide which datasets belong together in a ground state. PanDDA 2 fully automates this process, and so makes it objective, by choosing a comprehensive selection of non-overlapping sets of X-ray datasets and assessing each dataset for the presence of ligand binding against these candidate ground states.
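As a rough illustration of splitting datasets into non-overlapping candidate ground states, the sketch below uses a simple greedy "leader" clustering on per-dataset feature vectors. This is a stand-in technique chosen for brevity, not the clustering that PanDDA 2 actually uses; the distance threshold, the `min_size` cut-off and all function names are assumptions.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def partition_datasets(maps, threshold):
    """Greedy leader clustering into non-overlapping sets of dataset names.

    Each dataset joins the first cluster whose leader lies within
    `threshold`; otherwise it becomes the leader of a new cluster.
    """
    leaders, clusters = [], []
    for name, density in maps.items():
        for i, leader in enumerate(leaders):
            if distance(density, leader) <= threshold:
                clusters[i].append(name)
                break
        else:
            leaders.append(density)
            clusters.append([name])
    return clusters


def candidate_ground_states(maps, threshold, min_size=2):
    """Keep only clusters with enough members to estimate statistics from."""
    return [c for c in partition_datasets(maps, threshold) if len(c) >= min_size]
```

Each dataset can then be compared against every candidate ground state in turn, rather than against a single hand-picked reference set.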

PanDDA 2 generates ligand builds for events by using the differential evolution algorithm to place candidate conformations generated in RDKit. These “rough” builds are only accurate to about 1 Å RMSD; however, this is sufficient to provide a good initial guess from which to perform automatic or manual refinement.
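To show how differential evolution can place a rigid conformer, here is a minimal, self-contained sketch. It makes several simplifying assumptions that are not taken from PanDDA 2: the conformer is a hard-coded list of coordinates rather than an RDKit conformer, the pose has only four parameters (a translation plus a rotation about the z axis), and the score is a squared distance to known target positions rather than a fit to electron density.

```python
import math
import random


def transform(coords, params):
    """Rotate about z by `angle`, then translate by (tx, ty, tz)."""
    tx, ty, tz, angle = params
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz) for x, y, z in coords]


def misfit(coords, target):
    """Sum of squared coordinate differences (stand-in for a density score)."""
    return sum((a - b) ** 2 for p, q in zip(coords, target) for a, b in zip(p, q))


def differential_evolution(cost, bounds, pop_size=30, generations=200,
                           f=0.7, cr=0.9, seed=0):
    """Basic DE/rand/1/bin minimiser with bound clipping."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [cost(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct population members other than i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one mutated parameter
            trial = []
            for k in range(dim):
                if k == forced or rng.random() < cr:
                    lo, hi = bounds[k]
                    value = pop[a][k] + f * (pop[b][k] - pop[c][k])
                    trial.append(min(max(value, lo), hi))
                else:
                    trial.append(pop[i][k])
            trial_score = cost(trial)
            if trial_score <= scores[i]:  # greedy selection
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def place_conformer(conformer, target, seed=0):
    """Search for the pose of `conformer` that best matches `target`."""
    bounds = [(-5.0, 5.0)] * 3 + [(-math.pi, math.pi)]
    return differential_evolution(
        lambda p: misfit(transform(conformer, p), target), bounds, seed=seed)
```

Differential evolution needs no gradients, only a score for each trial pose, which makes this kind of search a natural fit for a coarse first placement that later refinement can tighten.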
Using the historical XChem dataset of over 100,000 unique datasets in over 100 different protein systems, it was possible to train ML models to identify electron density that resembles ligand binding and to recognise credible models of ligand poses. These can be used to identify which events from which candidate ground states are likely to correspond to ligand binding, and which ligand models proposed by differential evolution are accurate. Potential binding events and their ligand-bound models can therefore be ranked and presented to users in the conventional PanDDA GUI.
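The ranking step can be pictured as follows. This is only a sketch under stated assumptions: the `Candidate` fields, the score ranges, the `min_event_score` cut-off and the rule of keeping one candidate per dataset and site are all hypothetical, and the article does not say how PanDDA 2 combines its two ML scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    dataset: str
    site: int            # hypothetical binding-site index
    ground_state: int    # which candidate ground state produced the event
    event_score: float   # hypothetical event classifier output in [0, 1]
    build_score: float   # hypothetical ligand-model classifier output in [0, 1]


def rank_candidates(candidates, min_event_score=0.5):
    """Keep the best candidate per (dataset, site), then sort best first."""
    best = {}
    for cand in candidates:
        if cand.event_score < min_event_score:
            continue
        key = (cand.dataset, cand.site)
        current = best.get(key)
        if current is None or (cand.event_score, cand.build_score) > (
                current.event_score, current.build_score):
            best[key] = cand
    return sorted(best.values(),
                  key=lambda c: (c.event_score, c.build_score),
                  reverse=True)
```

With a scheme like this, the same site seen against several candidate ground states appears only once in the list shown to the user, represented by its most convincing event and build.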
PanDDA 2 is available at DLS through XChem, and now runs automatically on all datasets from the I04-1 beamline. Other facilities can install PanDDA 2 from GitHub: https://github.com/xchem/PanDDA2. PanDDA 2 will also be available soon through the CCP4 package.