Big Data Reaches Inflection Point

Enterprises are waking up to big data opportunities. It's only a matter of time before mainstream data-management environments evolve.

What's the status of the big data revolution? Fresh clues emerged this week, with Hadoop vendor Cloudera scoring a $160 million round of venture capital funding, big data analytics company Platfora getting a $38 million capital infusion, and Allied Market Research issuing an estimate that the $2 billion Hadoop ecosystem (as measured in 2013) will grow to $50 billion by 2020.

Citing that heady $50 billion figure, Rob Bearden, CEO of Cloudera rival Hortonworks, said he expects to see "60%, 70%, 80%" of enterprise data moving into Hadoop in the coming years. Speaking at this week's GigaOM Structure Data big data event in New York, Bearden said Hadoop changes the economics of managing data, giving companies a sought-after "single platform that manages all data types and structures."

Structure keynoter Paul Maritz, CEO of EMC spinoff Pivotal, said his company is focused on making Hadoop enterprise-ready so "mere mortals can do what the Internet giants have done with big data." Businesses are "beginning to wake up to the opportunity," he said, citing General Electric as a case in point. GE CEO Jeffrey Immelt has redirected the industrial giant to seize the opportunity in the Internet of things, which galvanized its "industrial Internet" strategy, marked by connected turbines, locomotives, aircraft engines, and more. The need for big data tooling was one motivation behind GE's $105 million investment in Pivotal in 2013.

[Want more on Pivotal’s latest moves? Read Pivotal Brings In-Memory Analysis To Hadoop.]

Cutting-edge giants like GE aren't the only ones investing in big data. "We're beginning to see companies reconceive themselves as data companies," Maritz observed. "When all the consumers on the planet got connected to the web, it enabled radical change. As billions of devices get connected, that, too, will enable radical change, so we need to embrace it."

Hortonworks CEO Rob Bearden.

Big data spending won't go solely to Hadoop vendors. Exhibitors at the Structure event represented a cross-section of technologies:

  • Alpine Data Labs announced support for the open source Spark technology for in-memory analysis on top of Hadoop. Spark developer and support provider Databricks has certified Alpine’s implementation of the technology for machine learning and analytics.
  • HP Vertica partners with multiple Hadoop vendors (most recently MapR), but with its recent Vertica 7 release it introduced Flex Zone, which looks like a lightweight alternative to Hadoop. Flex Zone is built on commodity hardware. Its nodes can store structured or semi-structured data. It supports schema-on-read analysis, meaning you simply load data without having to create a schema upfront or use ETL. Flex Zone is deployed and managed with the same tools used for Vertica, and it's queried with SQL (or in-database R or Java-based algorithms); a hedged sketch of the schema-on-read workflow appears after this list. Flex Zone doesn't support unstructured data (like images or audio files) or MapReduce processing as Hadoop does. But you won't have to learn Pig or MapReduce, and its storage costs are said to be roughly in line with Hadoop's.
  • MetaScale, the big data consulting and services firm spun out of Sears, highlighted a new managed-services program in which it takes over the management and administration of Hadoop clusters and other big data infrastructure that's already in use, relying on remote-monitoring capabilities. For companies that have yet to deploy big data infrastructure, MetaScale offers Hadoop and NoSQL appliances that are prewired for its remote-management services. The idea is to get around the big data talent shortage and speed deployments by tapping MetaScale's experience in big data deployments and its economies of scale in managing infrastructure.
  • New Relic, a Web- and mobile-application monitoring company, this week announced new Insights analytics capabilities within its platform. The premise is to move beyond monitoring app performance and to begin collecting and analyzing application data, such as customer names, ages, subscription levels, product selections, and other attributes that can be used for up-selling, cross-selling, and customer segmentation. Think Splunk-meets-application-monitoring, but in this case the audience is developers, who can use the tools to build more intelligence into their Web and mobile apps; a sketch of attribute tagging follows this list.
  • Paxata won a Best Analytics Startup award at Strata for its Adaptive Data Preparation platform, which runs on Hadoop or in the cloud. Geared to business analysts, the platform supports merging, cleaning, enriching, and otherwise shaping raw data sets into information that's ready for business intelligence and analytics. The data-preparation tools bridge the gap between information-management professionals and data scientists (the people who do the heavy-duty coding and data-management work) and the business users who demand novel combinations of data and new reports. The analysts in between have lacked tools for working efficiently with data, according to Paxata.
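
To make the Flex Zone bullet above concrete, here is a minimal sketch of what schema-on-read loading and querying might look like. It assumes the open-source vertica-python driver and a reachable Vertica 7 cluster; the connection details, table name, file path, and column names are all hypothetical, and the exact Flex Table syntax may vary by release.

```python
# Hypothetical schema-on-read sketch against Vertica 7 Flex Zone.
# Assumes the vertica-python driver; host, credentials, table, and file are made up.
import vertica_python

conn_info = {
    'host': 'vertica.example.com',  # hypothetical cluster
    'port': 5433,
    'user': 'dbadmin',
    'password': 'changeme',
    'database': 'analytics',
}

conn = vertica_python.connect(**conn_info)
cur = conn.cursor()

# Create a flex table with no predefined columns: no upfront schema, no ETL step.
cur.execute("CREATE FLEX TABLE clickstream();")

# Load semi-structured JSON as-is; keys are resolved at query time (schema on read).
cur.execute("COPY clickstream FROM '/data/clicks.json' PARSER fjsonparser();")

# Query with plain SQL, projecting whatever keys the JSON happens to contain.
cur.execute(
    "SELECT user_id, page, COUNT(*) AS views "
    "FROM clickstream GROUP BY user_id, page LIMIT 10;"
)
for row in cur.fetchall():
    print(row)

conn.close()
```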
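
And as a rough illustration of the New Relic item, the sketch below shows how a developer might attach business attributes to a monitored transaction. It assumes the New Relic Python agent and a Flask app; the route, attribute names, and values are hypothetical, and the agent's custom-attribute API varies by version, so treat the call names as an assumption rather than a definitive reference.

```python
# Hypothetical sketch: attaching business attributes to a monitored transaction
# with the New Relic Python agent. Route, attribute names, and values are made up.
import newrelic.agent
from flask import Flask

app = Flask(__name__)

@app.route('/checkout')
def checkout():
    # Tag the current transaction with customer attributes so they can later be
    # sliced for up-sell, cross-sell, and segmentation analysis.
    newrelic.agent.add_custom_parameter('subscription_level', 'premium')
    newrelic.agent.add_custom_parameter('customer_age_band', '25-34')
    newrelic.agent.add_custom_parameter('product_selected', 'SKU-1234')
    return 'order placed'

if __name__ == '__main__':
    app.run()
```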

You might think of some or all of these vendors as disruptors, but Shaun Connolly, Hortonworks VP of corporate strategy, says data is what's disrupting the datacenter, not Hadoop, NoSQL databases, or any other technology or group of vendors. It's the masses of data generated by new devices, applications, digital services, sensors, interaction modes, and more. New technologies and platforms weren't invented just because new vendors wanted a piece of old IT budgets. They were invented to solve new problems that weren't well addressed by the old tools.

Solid state alone can't solve your volume and performance problem. Think scale-out, virtualization, and cloud. Find out more about the 2014 State of Enterprise Storage Survey results in the new issue of InformationWeek Tech Digest.

Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data, and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of …
