How One Amazon Cloud Customer Gains Operations Visibility

Wattpad uses a mix of three monitoring systems to get the insight it needs to keep its busy, AWS-powered website humming.

Wattpad is a website that brings together authors and readers. First-time authors, or for that matter, many-time authors, can post their stories to the site and gather critical feedback that will help in the revision process. Readers can follow favorite authors and access more than 10 million stories at no charge using a web browser or mobile device.

Wattpad has been endorsed by noted Canadian author Margaret Atwood and has grown to 10 million visitors a month. Each second it fields 6,000 requests for content or requests to post comments. That means there can be few kinks in the way the site operates. Wattpad’s owners want visitors to find what they’re looking for (many have favorite authors) and get their requested stories to their devices quickly.

Wattpad, from its inception, has run on Amazon Web Services infrastructure, but its operators have found it hard to achieve as much visibility as they’d like into the website’s applications. Amazon’s CloudWatch provides basic feedback, which can be amped up by adding for-a-fee reporting metrics. Wattpad has used CloudWatch, but has added to it three independent monitoring systems: New Relic, Datadog, and Boundary.


In the second half of 2013, Charles Chan, head of engineering at Wattpad, wanted to know what was wrong when the site’s search engine, Elasticsearch, showed signs of slowing down. Wattpad visitors may be absent for a few days or even weeks, and when they return they want a quick compilation of any new postings by their favorite authors. Elasticsearch is a young search engine based on Lucene open-source code, designed to handle frequently updated documents. Its first stable release, 1.0, came out on February 12, 2014, so Chan carefully monitored the earlier versions used through 2013.

In the latter half of 2013, Wattpad’s monitoring system showed many user requests flowing into Elasticsearch, but significantly less information than expected coming out. The search engine slowdown would have been hard to detect with AWS’s CloudWatch monitoring service, which lacks a metric that reflects Elasticsearch’s operation. But independent monitoring service Boundary, which detects traffic levels between the nodes of a system running on Amazon, helped Chan identify the problem.

Boundary’s dashboard alerted Chan that Elasticsearch output was slowing down because it could spot the disproportionate amount of data coming out of the Elasticsearch node compared with the traffic going in. Chan corrected the problem by reconfiguring Elasticsearch to better fit Wattpad’s usage patterns.
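The kind of check described here, comparing what flows out of a node with what flows in, can be illustrated with a minimal Python sketch. It is not Boundary's implementation or API; the `FlowSample` type, the `disproportionate` function, the baseline ratio, and the sample numbers are all hypothetical, chosen only to show the idea of flagging an interval whose out/in ratio strays from what is normally seen.

```python
from dataclasses import dataclass


@dataclass
class FlowSample:
    """Bytes seen entering and leaving one node during one interval (hypothetical)."""
    bytes_in: int
    bytes_out: int


def out_in_ratio(sample: FlowSample) -> float:
    # An idle interval has no inbound traffic; report 0.0 rather than divide by zero.
    return sample.bytes_out / sample.bytes_in if sample.bytes_in else 0.0


def disproportionate(samples, baseline_ratio, tolerance=0.5):
    """Return indexes of samples whose out/in ratio differs from the
    baseline by more than `tolerance`, expressed as a fraction of the baseline."""
    flagged = []
    for i, sample in enumerate(samples):
        if abs(out_in_ratio(sample) - baseline_ratio) > tolerance * baseline_ratio:
            flagged.append(i)
    return flagged


# Assumed numbers: the node normally returns about 8 bytes for every byte requested.
samples = [FlowSample(1000, 8000), FlowSample(1200, 9500), FlowSample(5000, 6000)]
flagged = disproportionate(samples, baseline_ratio=8.0)
print(flagged)
```

In this made-up data only the third interval is flagged: requests have jumped while output has not kept pace, which is the sort of pattern a per-node CPU or memory metric would not reveal.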

Chan said Wattpad has used Amazon’s EC2 from its start, but it added Boundary six months ago to its New Relic application performance monitoring and its Datadog system, which compiles monitoring data into a unified display. He doesn’t rely on Amazon’s CloudWatch much anymore. “It doesn’t give the same level of insights” as Boundary and New Relic, he said.

New Relic APM is helpful in spotting application slowdowns, such as a hung application waiting for results from a remote database system, and other potential trouble points. Boundary brings something else to the party: an ability to see what network traffic is feeding into those applications and what traffic is coming out. Unlike traditional systems management, which tells you whether your servers and network switches are operating normally, Boundary watches the network segments between the nodes.

“Not all issues can be manifested at the application level,” said Chan in an interview. He uses Boundary’s overview of network bandwidth use and network traffic to identify potential trouble points. Wattpad conducted testing in the lead-up to its busy holiday season by firing off artificial demand against its website and observing, through Boundary, where traffic flowed smoothly and where it began to back up.

One trouble spot was the open-source Memcache caching system supplying frequently used data to servers. When fielding a request from an application, Memcache was returning more data than the application could use, chewing up network bandwidth. “Unnecessary data was being sent” because it had been configured to overdo the response side of its operation compared with the data coming in, said Chan. His staff was able to correct the problem before it resulted in any peak-traffic slowdowns.
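A rough way to picture the cache problem is to compare how many bytes a cache returns for a key with how many the application actually consumes. The Python sketch below assumes such per-key measurements are available; the key names, byte counts, the `wasted_fraction` helper, and the 50% threshold are hypothetical and do not come from Wattpad's or Boundary's tooling.

```python
def wasted_fraction(bytes_returned: int, bytes_used: int) -> float:
    """Share of a cache response that the caller never used."""
    if bytes_returned <= 0:
        return 0.0
    return max(bytes_returned - bytes_used, 0) / bytes_returned


# Hypothetical measurements: cache key -> (bytes returned, bytes the app used).
observations = {
    "story:meta": (4096, 512),
    "user:profile": (1024, 1000),
}

# Flag keys where more than half of each response is unnecessary.
oversized = sorted(
    key
    for key, (returned, used) in observations.items()
    if wasted_fraction(returned, used) > 0.5
)
print(oversized)
```

Keys flagged this way would be candidates for trimming the cached value or changing what the application asks for, so less bandwidth is spent on data nobody reads.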

Chan said Boundary was “easy to set up in a matter of hours.” It places its own sensing agent on hardware devices, which automates the reporting of traffic to the central Boundary system. Any new agent reporting in prompts Boundary to add another device to its network topology map. The lines on the map illustrate which node is talking to which.
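The agent-and-map arrangement can be sketched as a small data structure: each report adds the reporting node to the map if it is new and records a line to the peer it exchanged traffic with. The Python below is only an illustration under those assumptions; the `TopologyMap` class, its methods, and the node names are hypothetical and not Boundary's actual design.

```python
from collections import defaultdict


class TopologyMap:
    """Toy topology map built from agent traffic reports (hypothetical)."""

    def __init__(self):
        self.nodes = set()
        # (source, destination) -> total bytes reported on that line of the map.
        self.edges = defaultdict(int)

    def report(self, agent: str, peer: str, byte_count: int) -> None:
        # A first report from an agent adds its device to the map.
        # Assumption: the peer is added too, since traffic to it was observed.
        self.nodes.add(agent)
        self.nodes.add(peer)
        self.edges[(agent, peer)] += byte_count

    def talking_to(self, node: str):
        """List the peers a node has been seen sending traffic to."""
        return sorted(dst for (src, dst) in self.edges if src == node)


topo = TopologyMap()
topo.report("web-1", "cache-1", 2048)
topo.report("web-1", "search-1", 512)
topo.report("web-1", "cache-1", 1024)
print(topo.talking_to("web-1"))
```

Keeping byte totals on each edge is what would let a map like this show not just which node talks to which, but how heavily.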

Chan has no complaints about Amazon as a cloud provider and said CloudWatch served Wattpad’s purpose initially. But now that Wattpad has reached 10 million visitors a month, he needs a more complete view of what is happening in his cloud infrastructure.

Asked where he’d be without his monitoring combination of New Relic, Datadog, and Boundary, he said, “We’d have to do a little network sniffing on our own” to try to determine network traffic. “We wouldn’t have much visibility without the network traffic element. With those three, they’ll be able to carry us quite far.”


Charles Babcock is an editor-at-large for InformationWeek, having joined the publication in 2003. He’s the former editor-in-chief of Digital News, former software editor of Computerworld and former technology editor of Interactive Week. He’s a graduate of Syracuse …
