yuvi kochar on tech

Saturday, March 26, 2011

CTO Roundtable: Big Data
Interesting discussion led by Michael Brown, CTO at comScore, at our CTO Roundtable breakfast meeting.

Michael and his team manage and analyze one of the largest datasets processed by a local company. It is fascinating how thinking has to evolve when the volume reaches billions of rows of data per day. Some of the key takeaways for me were:

Leverage sort before processing
Shard the dataset to create smaller, more manageable files
Parallel processing improves turnaround and allows for scalability
Open source tools have matured enough to be worth considering as solutions
Most of the processing is now accomplished on smaller commodity machines
Machine memory is more of a limiting resource than hard disk
Solid-state disks can be selectively used to improve performance for IO intensive processes
Security is best managed by selectively separating and focusing on securing sensitive data
SQL is still king as it is very easily grasped by non-technical folks. Therefore, it is easier to find and develop talent with SQL skills.
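
To make the first three takeaways concrete for myself, here is a minimal Python sketch of the shard, sort, and parallel-process pattern. It is my own illustration, not anything comScore showed: the function names, the (key, value) record layout, and the shard count are all assumptions, and a thread pool on one machine stands in for what would really be many processes or commodity machines.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from zlib import crc32


def shard(records, num_shards):
    """Split (key, value) records into num_shards lists by a stable hash of the key."""
    shards = [[] for _ in range(num_shards)]
    for key, value in records:
        # crc32 is stable across runs, so a given key always lands in the same shard.
        shards[crc32(key.encode("utf-8")) % num_shards].append((key, value))
    return shards


def aggregate_shard(records):
    """Sort one shard by key, then sum the values per key in one sequential pass."""
    ordered = sorted(records, key=lambda r: r[0])
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(ordered, key=lambda r: r[0])
    }


def aggregate(records, num_shards=4):
    """Shard the records, aggregate the shards concurrently, and merge the results."""
    totals = {}
    # Hypothetical stand-in: a thread pool here, separate workers in a real system.
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        for partial in pool.map(aggregate_shard, shard(records, num_shards)):
            # A key never spans two shards, so the partial results cannot collide.
            totals.update(partial)
    return totals
```

Calling aggregate([("a.com", 1), ("b.com", 2), ("a.com", 3)]) sums the values per key. Sorting first means each shard is summarized in one pass over adjacent rows, and sharding by key means the partial results can be merged without reconciling duplicates.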

Michael has clearly lived through the evolution of technology for managing Big Data over the past 10 years. I enjoyed the event a lot.