Paytronix manages only tens of terabytes, but it offers a good example of why we need more than relational databases.
Big data is just one reason data-driven companies are considering new platforms. As Paytronix can attest, data variety is an even more compelling reason to consider NoSQL databases and Hadoop.
It’s easy to grasp Paytronix’s needs, since it specializes in managing marketing and loyalty programs for the restaurant sector, a business we’re all familiar with. Paytronix collects data from more than 8,000 restaurants, mostly locations of chains such as Panera, Papa Gino’s, and Outback Steakhouse. The data is used to optimize marketing campaigns and boost sales across chains and at specific locations.
Until last year, Paytronix’s center of analysis was a Microsoft SQL Server data warehouse containing only tens of terabytes. But the warehouse couldn’t handle the variety of point-of-sale and loyalty card data available, because each chain has its own data model.
“We’ve held daylong meetings dealing with these different data structures, asking, ‘Can we put all of this in a relational database?’” Andrew Robbins, Paytronix’s president and founder, told us. “But for every field of data, there seem to be exceptions and problems.” Proposed solutions always seemed to come back to expensive changes in the data model and ETL routines.
Because of the differences from chain to chain, Paytronix aggregated data by category: appetizer, pasta, dessert, and so on. As a result, you couldn’t drill down to see details, such as the popularity of specific menu items by store or across chains. You also couldn’t see text modifiers, such as “soup instead of salad” or “substitute potato with rice.”
Lured by the promise of being able to load any data and apply the schema on read, Robbins said, Paytronix started experimenting with MongoDB (a NoSQL database) and Hadoop in June 2012. Microsoft SQL Server is still used to run Paytronix’s transactional systems and the data warehouse, but MongoDB now manages digital creative assets (advertisements, brand logos, signage, and other images), while Hadoop is used for exploratory analytics.
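The article doesn’t show Paytronix’s pipeline. As a rough illustration of what “schema on read” means, here is a minimal Python sketch, assuming check records are stored as raw JSON lines exactly as each chain sends them; the chain codes and field names (`opened`, `time_open`, `lines`, and so on) are hypothetical. Hive applies the same principle at scale when a table definition is laid over files already sitting in HDFS.

```python
import json

# Hypothetical raw check records from two chains, stored exactly as received.
# Each chain uses its own field names; nothing is normalized at load time.
RAW_CHECKS = [
    '{"chain": "A", "opened": "17:05", "items": [{"name": "Kids Mac", "qty": 1}]}',
    '{"chain": "B", "time_open": "17:20", "lines": [{"desc": "Milk", "count": 2}]}',
]


def read_chain_a(rec):
    """Project a chain-A record onto the common analysis view."""
    return {"time": rec["opened"],
            "items": [(item["name"], item["qty"]) for item in rec["items"]]}


def read_chain_b(rec):
    """Project a chain-B record onto the same view, using B's field names."""
    return {"time": rec["time_open"],
            "items": [(line_item["desc"], line_item["count"])
                      for line_item in rec["lines"]]}


# The "schema" lives here, in the readers, not in the stored data.
READERS = {"A": read_chain_a, "B": read_chain_b}


def read_checks(raw_lines):
    """Parse raw JSON lines and apply a per-chain schema at read time."""
    for line in raw_lines:
        rec = json.loads(line)
        yield READERS[rec["chain"]](rec)
```

With this design, onboarding a new chain or absorbing a menu change means adding or editing a reader function; the data already stored never has to be migrated.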
With Hadoop, Paytronix stores check-level detail from every restaurant, yet it doesn’t have to worry about variations from chain to chain or about changing the data model when menus change. Using a combination of R-based data modeling, MapReduce processing, and Hive queries, the company is spotting previously unseen patterns in customer behavior. For example, children often figure in the decision to dine out. But parents don’t always tell you that they’re parents, even when asked on a loyalty program enrollment form. And then there are the grandparents, aunts, and uncles who frequently take children out to dinner but have no kids at home.
Using Hadoop, Paytronix is spotting loyalty club members who dine early and order items such as kids’ entrees and milk as a beverage, telltale signs that children are among the guests. These customers can then be targeted with child-related promotions and discounts that could give restaurants a big boost in business.
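Paytronix’s actual model isn’t described beyond these signals. A minimal Python sketch of the heuristic might look like the following, assuming normalized item names, a 6 p.m. cutoff for “early,” and a minimum of two matching visits per member; all three are hypothetical choices, not details from the article.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import time


@dataclass
class Check:
    member_id: str
    opened: time        # time the check was opened
    items: list[str]    # menu item names as they appear on the check


KIDS_SIGNALS = {"kids entree", "milk"}   # hypothetical normalized item names
EARLY_CUTOFF = time(18, 0)               # assumed definition of "dining early"


def looks_like_family_visit(check: Check) -> bool:
    """True if the check was opened early and includes a kid-oriented item."""
    names = {name.lower() for name in check.items}
    return check.opened < EARLY_CUTOFF and not names.isdisjoint(KIDS_SIGNALS)


def family_candidates(checks, min_visits=2):
    """Return member IDs with at least `min_visits` family-looking checks."""
    counts = Counter(c.member_id for c in checks if looks_like_family_visit(c))
    return {member for member, n in counts.items() if n >= min_visits}


checks = [
    Check("m1", time(17, 15), ["Kids Entree", "Pasta"]),
    Check("m1", time(17, 40), ["Milk", "Salad"]),
    Check("m2", time(20, 30), ["Kids Entree"]),  # late dinner: not counted
    Check("m3", time(17, 0), ["Steak"]),         # early, but no kid signal
]
print(family_candidates(checks))  # {'m1'}
```

Requiring repeated visits rather than a single check is one way to avoid tagging a member as a parent on the strength of one outing with a niece or nephew.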
Panera is one of the restaurant chains that Paytronix supports.
Paytronix has also used Hadoop to spot coupon fraud tied to specific waiters and waitresses. It is also working on identifying millennial customers, whom restaurants should be attracting now that many baby boomers aren’t dining out as often. It looks for patterns such as large groups coming in on weekdays after work hours and ordering lots of drinks and appetizers. Many restaurants are coming up with social promotions that encourage you to send gifts to friends or give to charities by logging in through Facebook.
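The article doesn’t say how the coupon fraud was detected. One simple technique that fits the description, offered here as an assumption rather than Paytronix’s method, is to compare each server’s coupon redemption rate with the rate of all other servers and flag large outliers. The `multiplier` and `min_checks` thresholds below are hypothetical.

```python
from collections import defaultdict


def flag_servers(checks, multiplier=3.0, min_checks=20):
    """Flag servers whose coupon rate far exceeds that of everyone else.

    checks: iterable of (server_id, used_coupon) pairs, one per check.
    A server is flagged when it has at least `min_checks` checks and its
    coupon rate is more than `multiplier` times the rate of all other servers.
    """
    totals = defaultdict(int)
    coupons = defaultdict(int)
    for server, used in checks:
        totals[server] += 1
        coupons[server] += int(used)

    all_checks = sum(totals.values())
    all_coupons = sum(coupons.values())
    flagged = set()
    for server, n in totals.items():
        others_checks = all_checks - n
        if n < min_checks or others_checks == 0:
            continue  # too little data, or no peers to compare against
        rate = coupons[server] / n
        # Leave-one-out baseline so an outlier doesn't inflate its own benchmark.
        baseline = (all_coupons - coupons[server]) / others_checks
        if rate > multiplier * baseline:
            flagged.add(server)
    return flagged
```

The leave-one-out baseline matters: if a single server redeems most of the coupons, including that server in the overall average would raise the bar it is measured against and could hide exactly the pattern being looked for.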
“If we have their Facebook account, we can see what they like, and it turns out [that] the things people like tell you how old they are,” Robbins said. For example, tastes in music and movies are reliable indicators of age.
Hadoop is the right platform for analyzing social data, and if Paytronix finds something of value, it can move boiled-down datasets from Hadoop into the data warehouse, where Pentaho BI is used for reporting, ad hoc queries, and analysis. This midsized marketing firm got started with Hadoop through a Cloudera deployment running in Amazon’s cloud, but now that the platform is proven, it’s deploying a Hadoop cluster on its own premises.
The Paytronix example shows why information management is moving beyond relational databases. It isn’t that databases are going away, but where social data, clickstreams, and sensor data are in use, or where plain data inconsistency is a reality, new platforms like Hadoop and NoSQL databases are gaining adoption.
More details on the Paytronix deployment are featured in our 2014 Analytics, BI, and Information Management Survey Report (registration required). This free report is based on interviews with 248 information management professionals and includes 22 informative charts and graphs.
You can use distributed databases without putting your company’s crown jewels at risk. Here’s how. Also in the Data Scatter issue of InformationWeek: a wild-card team member with an extra skill set can provide an outside perspective that may turn big data into business innovation (free registration required).