Pivotal Brings In-Memory Analysis To Hadoop

Pivotal takes on Cloudera and Hortonworks with GemFire XD, enhanced SQL querying, and new machine-learning options in Pivotal HD 2.0.

Pivotal, the EMC spin-off company pursuing modern application development within the context of cloud computing and big data analysis, on Monday released Pivotal HD 2.0, an update of its Hadoop distribution incorporating an in-memory database and a battery of new analysis capabilities.

Pivotal HD 2.0 is the vendor’s first distribution based on Apache Hadoop 2.2, the newest release of the open source platform incorporating YARN system resource management controls. The release also integrates and supports Apache GraphLab, an open source framework for derivatives monitoring, recommendations, and graph analytics.

The big news, however, is the addition of GemFire XD, an in-memory database designed to execute algorithms and analytics on data in real time. Blending elements of Pivotal’s GemFire (in-memory object grid) and SQLFire (in-memory database), GemFire XD puts a SQL-compliant, in-memory database on top of the Hadoop Distributed File System (HDFS), from which it can read data or to which it can write data with ultra-low latency.

GemFire XD could be used by a mobile network provider, for example, to determine the identity, location, device, and network of an incoming call within an instant and then apply complex algorithms or in-memory analytics to determine how to route the call, making the best use of available capacity. The database can also handle data-transformation tasks before writing the data to HDFS, circumventing the need for processing that would otherwise be required by way of ETL routines.
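
As a rough illustration of that call-routing pattern, the sketch below looks up a caller in memory, picks the route with the most spare capacity, and shapes the record before it would be persisted. It is plain Python, not GemFire XD's API: the dictionaries stand in for in-memory tables, and every name and number in it (`SUBSCRIBERS`, `ROUTES`, `pick_route`, `to_hdfs_record`, the capacities) is a hypothetical chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Subscriber:
    caller_id: str
    device: str
    cell: str  # location, expressed as the serving cell


# Hypothetical in-memory tables; stand-ins for data held in the database.
SUBSCRIBERS = {
    "555-0100": Subscriber("555-0100", "handset-a", "cell-1"),
}

# Routes reachable from each cell as (current_load, capacity); made-up numbers.
ROUTES = {
    "cell-1": {"route-x": (80, 100), "route-y": (20, 100)},
}


def pick_route(caller_id: str) -> str:
    """Look up the caller in memory, then pick the route with the most spare capacity."""
    sub = SUBSCRIBERS[caller_id]
    candidates = ROUTES[sub.cell]
    return max(candidates, key=lambda r: candidates[r][1] - candidates[r][0])


def to_hdfs_record(sub: Subscriber, route: str) -> str:
    """Shape the event in memory before persisting it, in place of a later ETL pass."""
    return ",".join([sub.caller_id, sub.device, sub.cell, route])
```

The point of the sketch is only that lookup, decision, and transformation all happen against data already in memory, so nothing waits on disk until the finished record is written out.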

The Hadoop community has lately been looking to Apache Spark as an open source option for in-memory and stream-processing capabilities, but Pivotal says the commercial GemFire XD has many advantages over that technology.

“We’re interested in Spark and may support it, but it’s generally used for [data] ingest or caching,” said Michael Cucchi, Pivotal’s senior director of product marketing, in an interview with InformationWeek. “GemFire XD is an ANSI-compliant SQL database with high-availability features, and it can run over wide-area networks, so you could have an instance in Europe and another in North America with replication.”

In another database-derived advance in Pivotal HD 2.0, the company has enhanced its HAWQ SQL-on-Hadoop query engine, which is based on the Greenplum database. HAWQ can now apply the more than 50 in-database algorithms in the MADlib Machine Learning Library. What’s more, the engine now supports automatic translation of R, Python, and Java-based queries and applications so HAWQ can handle business logic and procedures not well handled in SQL.

Pivotal competitors such as Cloudera and Hortonworks slam HAWQ’s commercial roots, but here, too, the vendor says its proprietary technology has advantages over Hive, Impala, and other open source SQL-on-Hadoop options.

“HAWQ takes advantage of Greenplum’s 10 years of history as a massively parallel processing analytical query engine, so it’s 100% SQL compliant, has broad support, and it’s very high performance compared to [Hive, Impala,] and other options,” said Cucchi.

Working to defuse another criticism of HAWQ, Pivotal announced that HD 2.0 introduces beta support for reading and writing Parquet files from HAWQ. This means the engine will soon support an open file type instead of the Greenplum-specific formatting currently used by the database.

Matching Cloudera’s “enterprise data hub” concept, Pivotal has developed a Business Data Lake architecture with HD 2.0 at the center of enterprise data management. But the company is still catching up in some regards in that its proprietary HAWQ and GemFire XD components can’t, as yet, be managed by YARN. That’s something Pivotal is working on, according to Cucchi, but for now companies must use a combination of Pivotal Command Center, Virtual Resource Planner tools, and YARN to separately manage the resources and workloads within a data lake environment.

Pivotal sees its biggest advantage as being its larger Pivotal One Platform, which combines its Spring Source application-development framework and Cloud Foundry platform-as-a-service capabilities in addition to the company’s data-management capabilities.

“We’ve got hooks from our data-services capabilities so Spring Source developers can make calls from within their environment to make the data products react,” Cucchi explained. “Developers can spin up hundreds of nodes of Hadoop [on our cloud platform] within minutes, and then with one click, they can attach data services to their applications.”

That’s a much broader play than Pivotal’s key Hadoop-distributor competitors attempt to address, but the question is whether Pivotal can win in all three of the markets in which it competes: application development, cloud infrastructure, and high-scale data management. On that last front, Pivotal now has more than 100 customers running on its Hadoop distribution, with most using HAWQ, according to Cucchi, but he declined to cite recent customer wins.

Cloudera and Hortonworks are generally seen as the leaders of the fast-growing Hadoop market, with Pivotal ranking somewhere after MapR and in the same league as IBM (with BigInsights) in bringing the platform to enterprise customers.

Doug Henschen is Executive Editor of InformationWeek, where he covers the intersection of enterprise applications with information management, business intelligence, big data and analytics. He previously served as editor in chief of Intelligent Enterprise, editor in chief of …
