Diamond's Big Data

A microscopic scan of a bee’s eye has pushed Diamond past 3 petabytes - but how to cope with all that data?

The dataset that brought us past the 3PB mark: a coloured reconstruction of a bee’s compound eye, produced from studies on I13. Courtesy of Gavin Taylor, Emily Baird, Andrew J Bodey & Andreas Enstrom (University of Lund)


Around 3,000 scientists visit Diamond each year to study, scan, and scrutinise samples. They could be investigating anything from a virus or bacterium to smart materials like graphene or samples from dinosaur skeletons. All of this scientific research produces some pretty big data: up to 20 terabytes a day, equivalent to all the storage on about 40 standard laptops. And Diamond’s data is only getting bigger.

Diamond recently passed a big data milestone: three petabytes have now been processed at the facility. For those of us who don’t know our gigabytes from our terabytes, three petabytes is roughly the storage of 187,500 modern iPhones*, or 6,000 years of MP3 music; it’s more data than a human being could ever process in a lifetime.
And the amount of data being processed at Diamond is increasing rapidly: it has gone up by 35% in the last 12 months alone. New beamlines, improved capabilities, and growing numbers of users mean that Diamond is running at a higher level than ever before. Diamond’s data is expected to increase even more rapidly in the coming years. Eight new beamlines will become operational by 2018, and new integrated facilities like the eBIC centre for biological research and the new materials characterisation facility will mean that Diamond is soon producing more data than almost any other facility in the UK.
So how do you cope with that much information? Well, Diamond has an entire team of computing specialists dedicated to processing the raw information generated by scientific experiments. Their job is to transform this data into a form, such as a graph or an image, that the scientists carrying out the experiment can interpret. These people are the unsung heroes of science: with advanced software and techniques, the computing team ensure that scientists’ experiments generate clear and meaningful information.
The work that pushed Diamond over the three petabyte mark is a perfect example of how scientific experimentation and computing techniques work hand in hand to advance research. A team from the University of Lund in Sweden visited Diamond’s I13 beamline to study the eyes of orchid bees in an attempt to determine how they navigate dense tropical forests. Orchid bees have a brain the size of a sesame seed, but their five eyes, two of which have thousands of lenses, make up for this lack of thinking power. This research could help support advances in automated navigation technology.
The team used micro-CT scanning on I13, Diamond’s longest beamline, which supports a diverse range of research, from biology and materials to archaeology and engineering. Their images of the tropical bee’s compound eye demonstrate the intricacy of the data that research at Diamond produces. The delicacy of the task, unravelling raw information to create an image like this, is astonishing, but it is one of the most important elements of modern science, and a field in which Diamond excels.
There is no doubt that data drives Diamond and many other large-scale modern science facilities. Whilst their work is carried out behind the scenes, computing specialists are integral to scientific exploration, and their skills and expertise ensure that big machines keep generating big data. This collaboration between scientists and computing specialists, researchers and translators, underpins modern science and helps to advance our understanding of the world, one byte at a time.
* Assuming a storage capacity of 16GB each.
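For readers who want to check the comparisons above, here is a rough sketch of the arithmetic. It assumes decimal units (1 GB = 10^9 bytes) and, for the music figure, an MP3 bitrate of 128 kbit/s; the bitrate is our assumption for illustration and is not stated in the article.

```python
# Back-of-the-envelope check of the storage comparisons in the article.
# Assumption: decimal (SI) units, i.e. 1 GB = 10**9 bytes.
GB = 10**9
TB = 10**12
PB = 10**15

total_bytes = 3 * PB  # the three petabyte milestone

# iPhones: the footnote gives 16 GB per phone.
iphones = total_bytes // (16 * GB)

# Laptops: 20 TB a day spread over about 40 laptops implies this much each.
laptop_capacity_gb = (20 * TB) / 40 / GB

# MP3 music: assumed bitrate of 128 kbit/s (hypothetical, not from the article).
mp3_bytes_per_second = 128_000 / 8
seconds_per_year = 365.25 * 24 * 60 * 60
mp3_years = total_bytes / mp3_bytes_per_second / seconds_per_year

print(f"iPhones at 16 GB each: {iphones:,}")
print(f"Implied laptop capacity: {laptop_capacity_gb:.0f} GB")
print(f"Years of MP3 audio at 128 kbit/s: {mp3_years:,.0f}")
```

Under these assumptions the iPhone figure comes out at exactly 187,500, each "standard laptop" holds 500 GB, and the music figure lands a little under 6,000 years.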


