Saturday, March 26, 2011

CTO Roundtable: Big Data
We had an interesting discussion led by Michael Brown, CTO at comScore, at our CTO Roundtable breakfast meeting.

Michael and his team manage and analyze one of the largest datasets processed by a local company. It is fascinating how thinking has to evolve when the volume reaches billions of rows of data per day. Some of the key takeaways for me were:
  • Sort the data before processing it
  • Shard the dataset to create smaller, more manageable files
  • Parallel processing improves turnaround and allows for scalability
  • Open source tools have matured enough to be serious options to consider
  • Most of the processing is now accomplished on smaller commodity machines
  • Machine memory is more of a limiting resource than hard disk
  • Solid-state disks can be selectively used to improve performance for IO intensive processes
  • Security is best managed by selectively separating and focusing on securing sensitive data
  • SQL is still king as it is very easily grasped by non-technical folks. Therefore, it is easier to find and develop talent with SQL skills.
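
To make the first three takeaways concrete for myself, here is a minimal Python sketch of the shard, sort, then process-in-parallel pattern. It is my own illustration, not anything shown at the roundtable: the function names (`shard_records`, `aggregate_shard`, `aggregate`), the `(key, value)` record shape, and the choice of CRC32 for sharding are all assumptions. It uses threads to keep the example short; for CPU-heavy work in Python a process pool would likely be the better fit.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from operator import itemgetter


def shard_records(records, num_shards):
    """Split (key, value) records into shards so that a key always lands in the same shard."""
    shards = [[] for _ in range(num_shards)]
    for key, value in records:
        # CRC32 is stable across runs, unlike Python's built-in hash() for strings.
        index = zlib.crc32(key.encode("utf-8")) % num_shards
        shards[index].append((key, value))
    return shards


def aggregate_shard(shard):
    """Sort one shard by key, then sum the values of each key in a single pass."""
    ordered = sorted(shard, key=itemgetter(0))
    # groupby only merges adjacent equal keys, which is why the sort comes first.
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(ordered, key=itemgetter(0))
    }


def aggregate(records, num_shards=4):
    """Shard the records, aggregate the shards concurrently, and merge the results."""
    shards = shard_records(records, num_shards)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        partials = list(pool.map(aggregate_shard, shards))
    merged = {}
    for partial in partials:
        # Keys never span shards, so a plain update cannot overwrite a total.
        merged.update(partial)
    return merged


if __name__ == "__main__":
    # Hypothetical sample data: (site, page views).
    sample = [("a.com", 1), ("b.com", 2), ("a.com", 3), ("c.com", 4), ("b.com", 5)]
    print(aggregate(sample, num_shards=3))
```

Because each key lives in exactly one shard, the shards can be processed independently and the final merge is trivial, which is what makes it easy to add more workers or machines.
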
Michael has clearly lived through the evolution of technology for managing Big Data over the past 10 years. I enjoyed the event a lot.
