As part of the daily barrage, Wired ran a great piece on Google's Dremel project. The article was posted to a technical news discussion site, and the top-ranked comment on it was so sensible: it is exactly the kind of thing we have to explain to clients every day, especially when they are under relentless pressure from all sides about "Big Data".
I am re-posting the comment for people to think about:
A small note: It's great to see so many great tools coming up to solve problems that were previously difficult or impossible to solve. However, please check your big data use cases many times before reaching for big data tools, because frankly "big data" has become a cool must-use label these days, regardless of the use case people actually have. I've seen data sets as small as 10 MB being considered for big data treatment, and often they get subjected to a monstrously complex architecture for no good reason.
Most of these cases can be addressed and solved with a tool as simple as SQLite! All you generally need is something like Perl with SQLite and the ability to write simple SQL queries.
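To make the commenter's point concrete, here is a minimal sketch of that "small tools" approach. It uses Python's built-in sqlite3 module (the comment suggests Perl, but any scripting language with SQLite bindings works the same way); the events.csv file and its columns are hypothetical, purely for illustration.

```python
import csv
import sqlite3

# Hypothetical dataset: a modest CSV of "events", loaded into a local SQLite file.
conn = sqlite3.connect("events.db")
conn.execute("CREATE TABLE IF NOT EXISTS events (user TEXT, amount REAL, ts TEXT)")

with open("events.csv", newline="") as f:
    rows = ((r["user"], float(r["amount"]), r["ts"]) for r in csv.DictReader(f))
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
conn.commit()

# A simple SQL query answers the question directly -- no cluster required.
query = """
    SELECT user, SUM(amount) AS total
    FROM events
    GROUP BY user
    ORDER BY total DESC
    LIMIT 10
"""
for user, total in conn.execute(query):
    print(user, total)

conn.close()
```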
People are deceived very easily: when they look at GB-scale XML files, they assume that is what big data is. Yet most of that data goes easily into a traditional RDBMS, and the performance is generally within acceptable limits. Markup eats a lot of space; when the data is converted to flat file structures like CSVs or TSVs and then imported into an RDBMS, the sizes are far smaller. I've sometimes seen on the order of a 10x difference.
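The markup-to-flat-file conversion described above is also a one-script job. Below is a minimal sketch, assuming a hypothetical records.xml whose repeated <record> elements carry <user>, <amount>, and <ts> children; iterparse streams the document, so even a multi-GB XML file never has to fit in memory, and the resulting CSV can then be loaded into an RDBMS as in the previous example.

```python
import csv
import xml.etree.ElementTree as ET

# Hypothetical layout: repeated <record> elements with <user>, <amount>, <ts>
# children. Element and column names are illustrative only.
with open("records.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["user", "amount", "ts"])
    # iterparse walks the file element by element instead of building
    # the whole tree in memory.
    for _, elem in ET.iterparse("records.xml", events=("end",)):
        if elem.tag == "record":
            writer.writerow([
                elem.findtext("user"),
                elem.findtext("amount"),
                elem.findtext("ts"),
            ])
            elem.clear()  # drop the element once its row has been written
```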
Another annoying thing is the abuse of NoSQL databases. Perfectly relational data gets denormalized and force-fed into NoSQL databases, and the data-access interfaces are generally bad, buggy partial reimplementations of SQL.
It's almost as if people who don't understand SQL are condemned to implement it badly.