Georgia Solves Campaign Finance Data Challenge Via OCR

In a project to make financial disclosure and campaign contribution data public, Georgia turns to Captricity’s hybrid system, which mixes machine learning and human intelligence to digitize handwritten forms.


Faced with a deadline for electronically processing financial disclosure information and making it publicly available, officials at the Georgia Government Transparency and Campaign Finance Commission had to deploy a system for digitizing a wide variety of forms, some of them handwritten in crayon. They turned to a new data-capture system from Captricity that mixes human intelligence and machine learning.

The commission is responsible for making public all of the financial disclosure and campaign contribution forms filed by elected officials and candidates for office. It collects the reports and makes them available online in a searchable database.

In 2013, the Georgia legislature mandated that disclosure forms be transmitted to the commission from local offices by electronic filing or fax, and that the new processing system be in place by the beginning of the year. Facing major budget constraints, commission officials decided that e-faxing was the best and most cost-effective option to implement.

But they needed a way to extract the information from the faxed forms and convert it into structured output that could be integrated with their existing online filing system and database. The most formidable obstacle was the haphazard way people filled out the forms.


“Some people use Adobe and print the forms. Some people handwrite them. We’ve even received forms written in crayon,” said Joel Perkins, CEO of Inserv360, an Atlanta-based firm that manages the commission’s IT infrastructure with another Georgia company, Jaxified LLC.

The IT team tested several optical character recognition (OCR) systems, but they weren’t nearly accurate enough, particularly with handwritten forms. That’s when they found Captricity of Berkeley, Calif. The data-capture specialist uses crowdsourcing to turn difficult-to-read paper documents into actionable data within hours. When the Georgia team tested the system, the structured data Captricity returned was 99% accurate, even for handwritten forms.

Captricity’s cloud-based system combines OCR scanning technology with manual data-entry workers from Amazon Mechanical Turk (AMT), a crowdsourcing Internet marketplace that lets “requesters” such as Captricity coordinate the use of human intelligence to perform tasks that computers currently cannot do. The system isolates a form’s individual fields, or “shreds,” into separate images. AMT workers extract the content from the shreds and use OCR algorithms that teach the computer to “read” the information. The output of the OCR engines becomes steadily more accurate.

“The more they do, the better those engines get over time,” Kuang Chen, CEO and co-founder of Captricity, told InformationWeek Government. “Our customers get almost perfect data from the get-go because we also use humans to verify the output of those predictions and make sure that every single piece is brought up to high-level accuracy,” he said. “The verification is also crowdsourced.”
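The article describes the workflow only at a high level: fields are cut into shreds, OCR makes a guess, crowdsourced humans transcribe or verify, and the verified results make the OCR engines better. The Python sketch below is a hypothetical model of that loop, not Captricity’s implementation or API. The names (`shred_form`, `digitize`), the 0.95 confidence threshold, the single-verifier check for confident guesses, and the majority vote for uncertain ones are all illustrative assumptions. Shreds carry only a random ID, so a worker cannot tell which form a field came from.

```python
import uuid
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces: an OCR engine returns (text, confidence in 0..1);
# a human worker returns a transcription of the image they are shown.
OcrFn = Callable[[bytes], Tuple[str, float]]
WorkerFn = Callable[[bytes], str]


@dataclass(frozen=True)
class Shred:
    """One field cut from a form. It holds no form ID, so a worker who
    sees it cannot tell which document it came from."""
    shred_id: str
    image: bytes


def shred_form(form_id: str, fields: Dict[str, bytes]
               ) -> Tuple[List[Shred], Dict[str, Tuple[str, str]]]:
    """Split a form into anonymous shreds plus a private index for reassembly."""
    shreds: List[Shred] = []
    index: Dict[str, Tuple[str, str]] = {}
    for field_name, image in fields.items():
        sid = uuid.uuid4().hex  # random ID reveals nothing about the form
        shreds.append(Shred(sid, image))
        index[sid] = (form_id, field_name)  # kept server-side, never shown to workers
    return shreds, index


def consensus(answers: List[str]) -> str:
    """Most common answer wins; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def digitize(shreds: List[Shred], index: Dict[str, Tuple[str, str]],
             ocr: OcrFn, workers: List[WorkerFn], threshold: float = 0.95
             ) -> Tuple[Dict[str, Dict[str, str]], List[Tuple[bytes, str]]]:
    """Return {form_id: {field_name: text}} plus verified (image, text)
    pairs that could be used to retrain the OCR model."""
    results: Dict[str, Dict[str, str]] = {}
    training: List[Tuple[bytes, str]] = []
    for shred in shreds:
        guess, confidence = ocr(shred.image)
        if confidence >= threshold and workers[0](shred.image) == guess:
            # Confident machine guess, confirmed by one human verifier.
            text = guess
        else:
            # Low confidence or disagreement: every worker transcribes,
            # and the majority answer is accepted.
            text = consensus([w(shred.image) for w in workers])
        training.append((shred.image, text))  # human-verified label
        form_id, field_name = index[shred.shred_id]
        results.setdefault(form_id, {})[field_name] = text
    return results, training
```

In this sketch the privacy property Chen describes later falls out of the data model: only `index`, held by the system, links a shred back to its form, while workers receive nothing but the field image.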

Since the start of the year, the Georgia commission has received about 7,000 e-faxes, many of them 10 or more pages long. Perkins estimates that the commission will process about 40,000 pages a month during the seven annual filing periods this year. All the forms, even those filled out in crayon, flow from fax, to Captricity, to the commission’s e-filing system in a single smooth pass, he said.

Last November, the Food and Drug Administration announced a contract with Captricity to digitize handwritten HIPAA complaint forms using the OCR and Amazon Mechanical Turk process. About 10% of the tens of thousands of HIPAA reports are submitted on paper. Previously, the forms were digitized manually by data-entry staff at FDA, a process that created an enormous paperwork backlog. Unlike the Georgia campaign forms, which are all a matter of public record, FDA documents involve security and privacy issues.

However, Captricity’s shredding method of processing documents also protects the privacy and security of a document’s content, Chen said. Crowdsourced workers see only a fragment of the whole document. “No single one of these verifiers gets to see anything outside the context of that one little shred,” he said. “They don’t know who it’s from, who it’s for, or what it’s about. The same trick that makes it go fast makes it secure as well.”


Richard W. Walker is a freelance writer based in the Washington, D.C., area who has been covering issues and trends in government technology for more than 15 years.
