OraStory

Entries from March 2008

You want data? This is data – Large Hadron Collider (LHC)

March 27, 2008

The other day, on my commute, I was reading a report in the Telegraph about the Large Hadron Collider at CERN (I also see Doug was there a-visiting not so long ago).

In the article, the main thing that caught my eye was this:

The Large Hadron Collider (LHC) will produce close to one per cent of all digital data generated on the planet. The amount stored every year by each of the big experiments would fill about 100,000 DVDs.

That’s 15 petabytes per year, according to the Dirk Duellmann presentation mentioned below.

Wow, that’s data. And it’s a volume of data that would be useless without applications to process, crunch and filter it, and to make sense of it.

Anyway, I read that article and went away to find out a little more about how they’re doing what they’re doing.

Oracle is a major component of their architecture, with something like one hundred server nodes across fifteen clusters serving two or three million database sessions per week.

I don’t know any of them, but the guys from CERN are regular presenters at Oracle user groups and they’re very open about what they do and how they do it. There’s plenty of interesting information on their Physics Database Wiki.

RAC and Streams are the cornerstones of their database architecture. They’re using Asynchronous AutoLog CDC, which was an interesting and timely coincidence for me, because I had recently been chatting to someone at a big bank who had been struggling with this Streams-based setup and was getting frustrated with the tens of patches, some of them conflicting, needed to get it up and running.

Dirk Duellmann made some interesting points about their architecture and their Streams observations at OOW (Oracle OpenWorld), among them: the impact of Streams rules at the capture or propagation stage, the translation of bulk operations on the source into many non-bulk operations at the destinations, and the impact of messages spilling over from memory to disk.

All fascinating. Some of the detail makes my head ache, and that’s ignoring all the rocket science physics stuff ;-)

Categories: large hadron collider · lhc · oracle · streams

How to prove it?

March 27, 2008

It seems common sense to most database specialists that, for data-centric applications that do not need to be database-independent*, logic that is close to your data (data logic) and data-intensive operations are likely to be better off in the database: more efficient, quicker, cheaper (by dint of the database having already been purchased, so the cost is really just a proxy for development time), less complex, faster to write, fewer lines of code, and just as reusable**.

But how to prove it?

Many of us have no doubt written plenty of demos and proofs of concept that validate one particular approach/solution over another in specific circumstances.

But is there any mileage in a set of benchmarks, similar perhaps to the TPC benchmarks used to compare hardware and database vendors? Maybe this already exists. Maybe the TPC benchmarks could also be used for exactly this. But I’d be interested in a focus broader than just speed, including complexity, lines of code, testability, etc.

Ignoring my particular database-centric interest, how useful would it be to have a set of benchmarks that could be used to compare languages or new features in a language?

It would certainly make for an interesting challenge for those looking to fight the corner for their particular favourite. It would be useful information for a new project trying to justify which technology / features / architecture to go with, and it would make a change from just “let’s use feature X because it’s new and exciting”.


*I’m thinking here about vendor applications that need to be database-independent to support N different databases, rather than the misguided notion that it’s general best practice, or that you’re more likely to change from Oracle to SQL Server than to rip out your C# and go all Groovy.
**When written by someone who knows what they’re doing, obviously.

Categories: oracle · performance

The dea(r)th of Oracle RDBMS and contracting?

March 19, 2008

I’m feeling flat today and apologise for the sensationalist headline.

I also apologise for another one of these useless opinion-based posts. There are far too many opinions on this site, not to mention far too many posts on things requiring further investigation that never gets done, and not nearly enough concise, factual posts of any genuine interest to anyone. One day this will change… but not today; it’s cathartic to let it out.

So, I tend to contract these days, specialising in pretty much anything Oracle RDBMS-wise, coming primarily from a development/application angle – i.e. development, design, performance tuning, architecture, development DBA.

I have two observations about the current market for Oracle contracts in the UK.

Firstly, an increasing number of clients have HR-imposed limits under which they will engage a contractor for a maximum of one year. For a database expert, this presents a bit of a problem. So much of what we do is about the data. I find it takes at least six months to get a really good handle on just a fraction of the business and its data. So, after one year you’ve not long started ramping up the value that you can deliver. Maybe this behaviour is more common in banks and other such finance companies but, apart from my current client, I’ve talked to two prospective clients who have the same policy. This sort of policy only makes sense if you think you can swap any development resource in and out and put no value on their having knowledge specific to your business.

And now to Oracle. I feel like the war has been lost and there are only a few pockets of resistance left now, resistance that will sooner or later be squashed. I mean the religious war between the camp arguing for sensible, pragmatic use of databases and database code, for doing data-related work in the database, for data quality being enforced in the database, etc., and the database-should-be-a-bit-bucket camp.

Over the last few years I’ve worked on some pretty decent databases, some of which I created. They all took a pragmatic approach to the database. Even in n-tier Java environments, we put the logic wherever it made sense, wherever there was a strategic or performance advantage, whichever tier that was.

Even at my current client, I was pleasantly surprised to find a database-centric application – you might even say excessively so. But not for much longer. The database is under attack. A newly created hierarchy have decreed that databases are indeed bad.

And today I was speaking to a friend at a previous employer, a major media / entertainment company. They are planning to abandon their pragmatic approach to Oracle and switch wholly to open source databases, ORM tools and the like.

I just don’t understand why. Actually I do understand some of the why. 

  • Databases don’t perform well when SQL is written by people who neither like nor understand SQL, people who don’t appreciate that they need to abandon their iterative approaches and need to think in sets, people who struggle to string together a couple of tables.
  • SQL doesn’t always perform well when written by people who neither like nor understand access paths and indexes.
  • It’s difficult to write good SQL against poor table design.
  • When databases don’t perform well, people don’t want to wait for someone to tune or redesign; they want to buy some more memory or CPU and have it installed later that day.
  • People of influence with a decent database background are becoming few and far between.
  • People are reluctant to use built-in features like RLS (row-level security), Oracle Audit Vault / FGA (fine-grained auditing), etc., and prefer to write their own frameworks from scratch.
  • And, not insignificantly, database testing tools are way behind the curve. Managers are getting used to full-featured testing reports, code coverage and code metrics, and rightly see the database as backward in this area.
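
The point in the first bullet about abandoning iterative approaches and thinking in sets can be illustrated with a minimal sketch. It again assumes Python’s built-in sqlite3 module as a stand-in for Oracle, and the products table and function names are hypothetical. Both functions apply the same price change; the row-by-row version fetches the matching keys and then issues one UPDATE per row, while the set-based version describes the whole change in a single statement.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a hypothetical products table used only for this illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT, price INTEGER)")
    conn.executemany(
        "INSERT INTO products (category, price) VALUES (?, ?)",
        [("books" if i % 2 else "music", 100 + i) for i in range(1000)],
    )
    conn.commit()
    return conn


def reprice_row_by_row(conn: sqlite3.Connection, category: str, delta: int) -> int:
    """Iterative style: fetch keys, then one UPDATE per row.

    Returns the number of statements issued (1 SELECT + 1 UPDATE per row).
    """
    statements = 1
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM products WHERE category = ?", (category,))]
    for product_id in ids:
        conn.execute("UPDATE products SET price = price + ? WHERE id = ?",
                     (delta, product_id))
        statements += 1
    conn.commit()
    return statements


def reprice_set_based(conn: sqlite3.Connection, category: str, delta: int) -> int:
    """Set-based style: the whole change as a single UPDATE statement."""
    conn.execute("UPDATE products SET price = price + ? WHERE category = ?",
                 (delta, category))
    conn.commit()
    return 1


def snapshot(conn: sqlite3.Connection) -> list:
    """Full table contents in a stable order, for comparing outcomes."""
    return conn.execute(
        "SELECT id, category, price FROM products ORDER BY id").fetchall()


if __name__ == "__main__":
    a, b = make_db(), make_db()
    print(reprice_row_by_row(a, "books", 5))  # 501
    print(reprice_set_based(b, "books", 5))   # 1
    print(snapshot(a) == snapshot(b))         # True
```

The end state is identical; what differs is the number of statements the database is asked to run. Against a client/server database such as Oracle, each of those extra statements would typically also mean an extra round trip, which is where the iterative version tends to fall over as volumes grow.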

And if you’re going to have a bit bucket, well, you might as well have a free bit bucket. And then you might as well stick ORM on top of it.

Categories: Witterings · contracting · oracle