Diffbot Uses Robots To Extract Data From E-commerce Sites

Diffbot announced today that it is releasing a new API that uses robots to identify and extract data from e-commerce sites.

The robotics company, which uses computer vision, machine learning and artificial intelligence to analyze and extract data from web content, appeared at the Bing-sponsored LAUNCH event last year, where it laid out its plans to make the whole web machine-readable.

The new API uses computer vision to turn any e-commerce site into a product database, the company says in an email.

“Software developers can use the API to extract various data from the page, including product image, SKU code, price, shipping cost, discount price, MSRP, etc.,” a spokesperson for Diffbot tells WebProNews. “The API can identify and structure information regardless of a site’s design, layout, markup or language.”
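To make the quoted field list concrete, here is a minimal Python sketch of what one structured product record could look like. The class name, field names, helper method and sample values are all illustrative assumptions; they are not taken from Diffbot's actual response format.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class ProductRecord:
    """Hypothetical container for the fields named in the quote above."""

    image_url: str
    sku: str
    price: Decimal
    shipping_cost: Optional[Decimal] = None
    discount_price: Optional[Decimal] = None
    msrp: Optional[Decimal] = None

    def effective_price(self) -> Decimal:
        """Return the discount price when one is present, else the regular price."""
        if self.discount_price is not None:
            return self.discount_price
        return self.price


# Made-up sample values, for illustration only.
record = ProductRecord(
    image_url="https://example.com/widget.jpg",
    sku="WIDGET-001",
    price=Decimal("19.99"),
    discount_price=Decimal("14.99"),
)
```

Optional fields default to `None` in this sketch because not every product page would be expected to list a shipping cost, discount price or MSRP.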

Additionally, Diffbot has developed a spider technology that can analyze a whole site, skipping non-product pages and extracting just the data from relevant page types.

“Think about Target.com or Wal-Mart.com, and being able to extract all the product data from each of the product pages,” the spokesperson says.
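The site-wide crawl described above, which skips non-product pages and extracts only from relevant ones, can be pictured as a simple filter step. The sketch below is a rough Python illustration under stated assumptions: the function names are hypothetical, the URL check is a naive placeholder rather than the vision-based page classification the company describes, and the extractor is a stub.

```python
from typing import Callable, Dict, Iterable, List


def looks_like_product(url: str) -> bool:
    """Placeholder classifier: a naive URL check, NOT a visual page analysis."""
    return "/product/" in url


def fake_extract(url: str) -> Dict[str, str]:
    """Stand-in extractor that only records which page was processed."""
    return {"url": url}


def extract_products(
    urls: Iterable[str],
    is_product: Callable[[str], bool],
    extract: Callable[[str], Dict[str, str]],
) -> List[Dict[str, str]]:
    """Skip non-product pages and run extraction only on product pages."""
    return [extract(url) for url in urls if is_product(url)]


# Made-up site map, for illustration only.
site = [
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/product/123",
    "https://example.com/product/456",
]
results = extract_products(site, looks_like_product, fake_extract)
```

Passing the classifier and extractor in as arguments keeps the filtering logic separate from however page types are actually detected.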

“E-commerce is one of the top activities on the web. With 28% of US internet users shopping daily, we figured we should teach our robot how to understand products,” said CEO Mike Tung. “The Product API represents our latest advances in pushing the capabilities of automated page extraction. We’re one step closer to the goal of making the entire web machine-readable.”

Diffbot believes the entire web can be broken down into about twenty page types, including home pages, article pages, product pages, location pages and social network pages, and says it will continue to roll out APIs for new page types until it has tools to index the entire Internet. It already has APIs for home pages, article pages and image pages.

The company is backed by EarthLink founder Sky Dayton, who sits on its board.