Build strong predictive coding with the right PDF and OCR foundation

Predictive coding offers strong benefits to the legal industry, but if care isn’t taken in selecting the right document conversion solutions, agencies risk falling victim to GIGO.

At LegalTech this week, many talks again focused on electronic discovery, but this year there was more discussion of predictive coding.

For those unfamiliar with predictive coding, it is a machine-learning technology that utilises a combination of keyword search, natural language processing, data filtering and sampling techniques to automate portions of the document review process in legal discovery. The goal of predictive coding is to reduce the number of electronic documents that must be reviewed by humans to evaluate their relevance to a legal case. It is particularly important within the legal industry given the pace at which the volume of electronically stored information has grown and continues to grow.

One comment from a panellist that stood out in a stream of tweets was, “95% of cases are too small for predictive coding.”

This quote, taken out of context, can be interpreted in more than one way. For example, it could be read in the same vein as the old adage that warns against killing an ant with a sledgehammer. Or it could simply mean that predictive coding technology hasn’t yet matured to the point where it’s suitable for the majority of discovery requests.

Whatever the intent of the comment, I think it underscores an important point: technology needs to serve the business, not drive it. Frequently, in our zeal to achieve greater productivity and efficiency through technology, we overlook the obvious and forget to stop and consider whether a particular solution makes solid business sense.

A related question, and one that also bears on electronic discovery, is whether some of the core document processing solutions that have been in place for some time are failing the business by introducing problems into the discovery process that undermine productivity or drive unseen costs.

For example, PDF and OCR are both critical to the processing of ESI (electronically stored information) and significantly affect the success of newer technologies like predictive coding, which are sensitive to the quality of the data contained in the documents. They are the critical first steps in converting images and paper to searchable content, and if they aren’t built for the legal environment, they could introduce errors into the coding, making it worthless. If the technologies you use as the foundation for new solutions that promise to transform your business aren’t capable of supporting the needs of that solution and its users, you’re wasting resources at best, and damaging the productivity, effectiveness and reputation of your organisation at worst.

Are you evaluating e-discovery solutions?  If so, I suggest it’s also time to evaluate your current PDF and OCR tools to ensure your efforts don’t become a victim of GIGO – garbage in, garbage out.

