Big Data and Moneyball

Like many baseball fans, I was spellbound by Moneyball, the 2003 Michael Lewis book that told the story of how Oakland Athletics general manager Billy Beane and his staff leveled the playing field between baseball’s biggest teams and his small-market club by finding overlooked value in the “big data” of baseball statistics. While I was fascinated by the fresh analysis of the game, I must admit that I underestimated how far beyond the baseball diamond those lessons could travel. More recently, Hyoun Park’s excellent blog post Misunderstanding Moneyball and Nate Silver’s The Signal and the Noise helped me realize that the lessons of Moneyball can also be applied to today’s big data landscape.

How is big data like Moneyball? For starters, both attempt to predict outcomes in complex and fluid environments. And neither has any shortage of data to analyze: there seems to be no aspect of a baseball game or a website visit that is left unrecorded these days. However, the data collected is not always used effectively. For example, as Moneyball recounts, baseball scouts had spent decades evaluating talent by metrics of individual performance, like batting average and home runs, rather than metrics that better predict a team’s ability to score runs and win games, like on-base percentage (OBP) and slugging percentage.
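For readers who want to see the two metrics concretely, here is a minimal sketch using the standard formulas: OBP = (H + BB + HBP) / (AB + BB + HBP + SF), and slugging percentage = total bases / at-bats. The function names and the two players' stat lines are hypothetical, made up purely for illustration.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    denominator = at_bats + walks + hit_by_pitch + sacrifice_flies
    if denominator == 0:
        return 0.0  # no qualifying plate appearances yet
    return (hits + walks + hit_by_pitch) / denominator


def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """SLG = total bases / at-bats."""
    if at_bats == 0:
        return 0.0
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


# Hypothetical stat lines (invented for illustration, not real players).
# Player A hits for a higher batting average; Player B draws far more walks.
player_a_avg = 150 / 500  # batting average = hits / at-bats
player_b_avg = 130 / 500

player_a_obp = on_base_percentage(hits=150, walks=20, hit_by_pitch=0,
                                  at_bats=500, sacrifice_flies=5)
player_b_obp = on_base_percentage(hits=130, walks=90, hit_by_pitch=5,
                                  at_bats=500, sacrifice_flies=5)

print(f"Player A: AVG {player_a_avg:.3f}, OBP {player_a_obp:.3f}")
print(f"Player B: AVG {player_b_avg:.3f}, OBP {player_b_obp:.3f}")
```

In this made-up comparison, Player B trails on batting average but reaches base more often, which is the kind of overlooked value the book describes.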

Are organizations poised to make similar mistakes with big data? Certainly lots of technology budget dollars are being spent in pursuit of a business OBP, but as Silver points out, “The litmus test for whether you are a competent forecaster is if more information makes your predictions better.” Are the big data analytics at your organization meeting this litmus test? Since large volumes of data typically cost more to analyze, that is a question worth asking early and often. While the New York Yankees can spend more money on a single star player than some small-market clubs spend on their entire opening day lineup, most businesses cannot afford to outspend their competitors in the same proportions.

Agile data management tools and techniques can help organizations level the playing field with bigger competitors in the same way that playing Moneyball helped Billy Beane. Just as Billy Beane and his staff used spreadsheets and a wealth of available baseball statistics to replace the old standard of a five-tool player, organizations can use these five agile tools to maximize their big data investments:

  • Methodology – Encourage business and IT to collaborate seamlessly.
  • Modeling – Visualize the data so it can be used effectively to answer business questions.
  • Automation – Stay focused on business questions by using tools that automate data management tasks and scale with big data volumes.
  • Flexible Presentation – Support the full range of data presentation tools users need.
  • Governance – Give the business control of its data. Users shouldn’t have to wait to create groupings or reorganize hierarchies, and they should be able to fix any data issues they find.

Put another way by master prognosticator Nate Silver, “The key is to develop tools and habits so that you are more often looking for ideas and information in the right places and honing the skills required to harness them into W’s & L’s once you’ve found them.”
