Are you about to select a big data platform? Keep in mind that SQL isn’t the best use of Hadoop.
Our recent 16 Top Big Data Analytics Platforms collection has generated a great deal of interest and lots of comments and questions. To answer the latter, we jumped at the chance to do a Google+ Hangout with the editors of sister UBM website AllAnalytics so we could go deeper on the topic.
The discussion (see video interview below) covered many of the most commonly asked questions posted below our slide show (21 comments and counting at this writing). How do you define big data analytics platforms or “big data” for that matter? What makes these “top” vendors, and are any of these platforms accessible to midmarket companies?
[ Watch InformationWeek’s Doug Henschen discuss 16 Top Big Data Analytics Platforms with the editors of AllAnalytics (below). ]
I elaborated on these topics, but some new questions from AllAnalytics editors Beth Schultz and Michael Steiner sparked good conversation and three points worth highlighting:
SQL won’t reap the benefits of Hadoop
Many of the 16 top platform providers offer SQL-on-Hadoop options. Keep in mind that SQL analysis against data on or from Hadoop does not deliver the highest value you can gain from the platform. Organizations are embracing Hadoop to exploit data they couldn’t afford to retain before and, more importantly, to capture complex and variable data (clickstreams, log files, mobile data, social data, and more) that is not easily managed in relational database management systems (DBMSs).
You might be able to boil structured data out of a vast collection on Hadoop for SQL analysis. But the higher value is likely to be found with machine learning, time-series analysis, and other approaches that let you correlate this new data with the highly structured information you have been analyzing for years.
“We’re seeing over and over, at almost every company that we work with, that the capabilities that BI and SQL give them are fine, but the kinds of data and the kinds of questions that they inevitably want to get to go far beyond that,” noted Platfora CEO Ben Werther in a recent interview on this topic. “In the old world, you’d look at sales by store and so on, but in the new world you need to look at things like clickstream behavior and how it relates to physical store activity. [It’s about] connecting the dots across the old traditional data sources and adding this new world of digital clicks, ads, and mobile and social data.”
Hadoop distributors aren’t all eager promoters
Data-management incumbents Oracle, IBM, and Teradata all sell and support Hadoop, but you get the impression their hearts aren’t in helping you make the most of the platform. Hadoop is on their product checklist because they know their customers are interested in it. However, I’ve talked to executives at all three companies who dismiss it as an immature, hard-to-manage platform that isn’t nearly as capable as their incumbent databases.
The maturity and management points are accurate (and aren’t a surprise, given Hadoop’s youth compared with 30-year-old RDBMSs). As far as its capabilities are concerned, setting Hadoop against RDBMSs is not an apples-to-apples comparison. (See the point above about Hadoop’s purpose and highest value not being SQL querying.) IBM gets this and has gone to the trouble of creating its own (InfoSphere BigInsights) Hadoop distribution. Teradata gets it, too, but puts forward Teradata Aster as a platform for MapReduce, Graph Analysis, and more. Yet how many platforms do you want to manage and maintain?
The bottom line is that all three of these biggies seem to accept Hadoop grudgingly as a high-scale, low-cost data lake, if the customer insists, but they want to channel the analysis activity into their own platforms and products. (I get a different, more enthusiastic feel about Hadoop from Microsoft, perhaps because it doesn’t have nearly as many high-scale data warehousing customers as do Oracle, IBM, and Teradata, and could benefit from Hadoop market disruption.) If you like the idea of getting everything from one vendor, that’s fine, but keep your eyes wide open about which vendors are eager to help you use Hadoop as more than a storage platform.
Analytics: the real prize
Perhaps the most important point made during our discussion was that all 16 of these platform vendors realize that managing data isn’t enough. That’s why DBMS vendors are packing on the in-database analytics capabilities. It’s why giants like IBM, Oracle, and SAP have acquired numerous analytics vendors. Yes, they still make tons of money on database licenses, ETL, and so on. But customers aren’t likely to be attracted to platforms unless they can help them make sense of the data in order to get to predictive and prescriptive analytics.
I also make the point that businesses don’t live by analytics alone. They have to cover the fundamentals of BI, operational reporting, and other needs. That’s why we distinguished between platform providers (those offering open, multi-purpose environments) and dedicated analytics vendors. That line is beginning to blur as companies like SAS, Alpine Data Labs, and others support memory-intensive clustered-server environments and Hadoop. Are these vendors prepared to support everything else you’ll want to do with a data platform?
These key points and questions are all worth considering as you explore the options detailed in the 16 Top Big Data Analytics Platforms collection. We hope it is a helpful guide that will lead to fruitful technology choices.
Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of …