March 27, 2008
In the article, the main thing that caught my eye was this:
The Large Hadron Collider (LHC) will produce close to one per cent of all digital data generated on the planet. The amount stored every year by each of the big experiments would fill about 100,000 DVDs.
That’s 15 PetaBytes per year according to Dirk Duellmann’s presentation linked to below.
Wow, that’s data. And a volume of data that would be useless without applications to process, crunch, filter and make sense of it.
Anyway, I read that article and went away to find out a little more about how they’re doing what they’re doing.
Oracle is a major component of their architecture with something like one hundred server nodes across fifteen clusters serving two or three million database sessions per week.
I don’t know any of them, but the guys from CERN are regular presenters at Oracle user groups and they’re very open about what they do and how they do it. There’s plenty of interesting information on their Physics Database Wiki.
RAC and Streams are the cornerstones of their database architecture. They’re using Asynchronous AutoLog CDC, which was an interesting and timely coincidence for me because I had been chatting recently to someone at a big bank who had been struggling with this Streams setup and was getting frustrated with the tens of patches, some conflicting, needed to get it up and running.
Dirk Duellmann made some interesting points about their architecture and Streams observations at OOW, among them the impact of Streams rules at the capture or propagation stage, the translation of bulk operations on the source into many non-bulk operations at the destinations, and the impact of messages spilling over from memory to disk.
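To make the bulk operations point a bit more concrete for myself, here is a toy Python sketch. It has nothing to do with the real Streams API: the RowChange class, the function names and the "events" table are all made up for illustration, and it only assumes that a replicated change arrives at the destination as one record per affected row.

```python
from dataclasses import dataclass


@dataclass
class RowChange:
    """Toy stand-in for a row-level change record (hypothetical, not Oracle's format)."""
    table: str
    key: int
    column: str
    new_value: object


def capture_bulk_update(table, rows, column, new_value, predicate):
    """Run one bulk update on the source and emit one change record per affected row."""
    changes = []
    for key, row in rows.items():
        if predicate(row):
            row[column] = new_value
            changes.append(RowChange(table, key, column, new_value))
    return changes


def apply_changes(dest_rows, changes):
    """Apply each change record at the destination as its own single-row operation."""
    operations = 0
    for change in changes:
        dest_rows[change.key][change.column] = change.new_value
        operations += 1
    return operations


# Source and destination start out identical.
source = {i: {"status": "new"} for i in range(1000)}
dest = {i: {"status": "new"} for i in range(1000)}

# One statement at the source touches every row...
changes = capture_bulk_update(
    "events", source, "status", "done", lambda r: r["status"] == "new"
)
# ...but the destination has to do the work row by row.
ops = apply_changes(dest, changes)
print(f"1 statement at source -> {ops} row operations at destination")
```

In this toy model, a single statement over 1,000 rows turns into 1,000 separate operations on the other side, which is the kind of amplification worth bearing in mind before running a big batch job on a replicated table.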
All fascinating. Some of the detail makes my head ache, and that’s ignoring all the rocket science physics stuff.