Valora Technology Assisted Review FAQ

Frequently Asked Questions (FAQs)

  1. How long on average does it take to complete a TAR when implementing Valora’s workflow?
  2. How many reviewers do I need to involve when using Valora’s technology to perform our review?
  3. What is the underlying technology that Valora uses to support their Technology Assisted Review workflow?
  4. Can I use Valora for Early Case Assessment (ECA) or when I don’t have much knowledge about my case or the data that has been collected?
  5. In your presentations, you discuss a process of “tagging” documents when you ingest the data into PowerHouse. What does that mean?
  6. What is the value of tagging to the TAR process?
  7. I am not 100% comfortable with producing without some level of “eyes on” review taking place. How can I use Valora in this situation?
  8. What happens if I have a new issue or more data for a case? Can Valora handle changes in the workflow without adding costs?
  9. How does Valora ensure accuracy of their automated process?
  10. Is Valora’s process defensible?
  11. Can Valora AutoReview foreign-language documents?
  12. What are some of the biggest differences between Valora’s Rules Based Approach and the common Predictive Coding model?

Q: How long on average does it take to complete a TAR when implementing Valora’s workflow?     

A: The amount of time is highly contingent upon the availability of the case’s subject-matter experts (SMEs), often outside counsel attorneys, who provide input and feedback to Valora’s project managers (PMs). Examples of SME input and feedback include details of what they are looking to find and rules for categorizing (review-labeling) documents in the collection. Another factor is the number of times the rules must be modified (otherwise known as “iterations”) before the attorneys have confidence in the process. A typical project breaks down as follows:

  • 2-3 days to configure and run the rules. Simultaneously, Valora is ingesting the data and running it through our proprietary process to extract and tag attributes about the individual documents.

  • 1-2 days of run time for applying and performing quality control on the rules. This is the step that often iterates as SMEs discover new elements of the documents or the case.

  • Final QC using law firm or contract attorneys is optional. If elected, the percent of documents reviewed will drive the estimated time to complete. With Valora’s TAR process, typically <5% of the documents are reviewed manually.


Q: How many reviewers do I need to involve when using Valora’s technology to perform our review?     

A: Valora recommends that 1 Sr. Associate or 1 Partner participate in the process. This person should be someone who has an intimate knowledge of the case and pertinent issues driving discovery review. This attorney should be available to provide feedback to their designated PM regarding how documents are being automatically coded. The success of the review is highly contingent upon collaboration between Valora and the attorney.

  • Some of Valora’s clients also utilize contract attorneys to perform quality control after the final set of rules has been applied and the work product has been delivered back to the client. The number of attorneys needed is driven by the deadline as well as the % or # of documents your team believes should be reviewed by humans. With Valora’s TAR process, typically <5% of the documents are reviewed manually.


Q: What is the underlying technology that Valora uses to support their Technology Assisted Review workflow?      

A: Since our inception, Valora has been providing services powered by our proprietary pattern-matching technology. Valora uses Hierarchical Probabilistic Context-free Grammars, a type of Natural Language Processing. Our PowerHouse™ system (Valora’s proprietary processing platform and administrative dashboard) uses a voting engine to identify syntactic and semantic elements of a document (e.g., a phone number or a person's name) and combines these elements into progressively larger hierarchical grammar units. It uses the probabilities associated with each level in the hierarchy to select the matching recognition patterns.
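
To make the idea of combining small recognized elements into larger grammar units more concrete, here is a minimal, purely illustrative sketch of probabilistic chart parsing (a CYK-style algorithm) in Python. It is not Valora's implementation and does not model the voting engine. The symbols (NAMEPART, NAME, PHONE, CONTACT, WORD, TEXT), the regular expressions, and all weights are invented for the example and are not normalized the way a real grammar's probabilities would be.

```python
import re

# Hypothetical leaf recognizers; the patterns and weights are invented.
PHONE_RE = re.compile(r"^\d{3}-\d{4}$")
CAP_RE = re.compile(r"^[A-Z][a-z]+$")


def leaf_candidates(token):
    """Return {symbol: weight} for the element types one token might be."""
    if PHONE_RE.match(token):
        return {"PHONE": 0.9, "WORD": 0.1}
    if CAP_RE.match(token):
        return {"NAMEPART": 0.6, "WORD": 0.4}
    return {"WORD": 1.0}


# Binary grammar rules: (left child, right child) -> [(parent, rule weight)].
RULES = {
    ("NAMEPART", "NAMEPART"): [("NAME", 0.8)],
    ("NAME", "PHONE"): [("CONTACT", 0.9)],
    ("WORD", "WORD"): [("TEXT", 0.5)],
    ("TEXT", "WORD"): [("TEXT", 0.5)],
}


def best_parse(tokens):
    """Return (symbol, score) of the highest-scoring unit spanning all tokens."""
    n = len(tokens)
    # chart[i][j] maps symbol -> best score found for tokens[i:j].
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, tok in enumerate(tokens):
        chart[i][i + 1] = dict(leaf_candidates(tok))
    # Build progressively larger units from adjacent smaller ones.
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            cell = chart[i][j]
            for k in range(i + 1, j):
                for left, lp in chart[i][k].items():
                    for right, rp in chart[k][j].items():
                        for parent, rule_p in RULES.get((left, right), []):
                            score = lp * rp * rule_p
                            if score > cell.get(parent, 0.0):
                                cell[parent] = score
    full = chart[0][n]
    if not full:
        return None
    symbol = max(full, key=full.get)
    return symbol, full[symbol]


if __name__ == "__main__":
    print(best_parse(["Jane", "Doe", "555-0100"]))
```

In this toy grammar, two capitalized tokens are most plausibly a NAME, and a NAME followed by a PHONE is most plausibly a CONTACT, so the three-token input is recognized as a single CONTACT unit rather than as generic TEXT.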


Q: Can I use Valora for Early Case Assessment (ECA) or when I don’t have much knowledge about my case or the data that has been collected?      

A: Absolutely! Valora has a visualization tool, BlackCat™, which allows users to drill into their dataset to uncover recurring themes, relevant issues and other pertinent information. During the visual review in BlackCat, Valora’s PM will work with the SME attorney to build an appropriate Ruleset based on what was uncovered in their analysis of the data. From there, our standard process for TAR commences.


Q: In your presentations, you discuss a process of “tagging” documents when you ingest the data into PowerHouse. What does that mean?     

A: Our process of tagging documents is what makes Valora so unique. Our technology analyzes each document as it comes over the PowerHouse threshold, looking for unique and distinctive attributes about each of those documents that can be used at a later point to search for relevant or privileged content. Examples of the 90+ attributes that we extract and tag are:

  • Document Type (memo, inventory report, patent application, etc.)

  • Tone of a document (neutral, hostile, informal, formal, personal, etc.)

  • Actual dates and authors (not relying on metadata)

By tagging documents in this way, Valora essentially obtains a DNA sequence for each document that can later be used for analysis and assessment. Valora’s tag list is a result of more than a decade of document-tagging experience. As a general rule of thumb, if there is some element of information present on a document, Valora’s system can identify and tag it.

Interestingly, Valora’s tagging approach works on both scanned paper documents and ESI documents with limited or no metadata, such as attachments and native files. Valora tags all documents in a collection.


Q: What is the value of tagging to the TAR process?     

A: Primarily, the value is in being able to utilize the tags to find and classify what you are looking for in the dataset. With Valora, you have access to information that you wouldn’t normally be able to obtain from a manual or “predictive coding” style document review. While Valora’s Ruleset is utilizing both content and tags, each of the tags collected can be made available to you in your database, if desired. This would provide you with useful ancillary information, such as who communicates with whom and which highly unusual concepts are present in a document, etc.

Ultimately, an automated TAR process replaces almost all of the doc-by-doc manual labor, which reduces the cost and time associated with document analysis. Furthermore, since Valora is primarily searching the tags when applying client rules, the speed of each iteration isn’t dependent on the size of the actual dataset. Typical results of each iteration are provided back for attorney comment within hours, not days!


Q: I am not 100% comfortable with producing without some level of “eyes on” review taking place. How can I use Valora in this situation?     

A: Valora is not suggesting that the machine fully take over and perform the work of an attorney. Instead, our process is designed to reduce the amount of low-level manual labor required for document review, and leave the high-level analysis in the hands of SMEs. The result is reduced cost and time and increased, measurable accuracy.

Many of our clients use Valora to perform first pass review to determine relevancy and issues as well as to identify potentially privileged documents. Once Valora has delivered the output to your review platform, prioritization of “eyes on” review can be established. There are many ways to do this. Valora can work with your team to design a plan that gets you to your comfort level. Additionally, we can assist you with sampling the dataset using statistically sound and highly defensible methods for testing quality.


Q: What happens if I have a new issue or more data for a case? Can Valora handle changes in the workflow without adding costs?     

A: After more than a decade of coding and reviewing documents, at Valora we understand that nothing in litigation review is static, except for maybe the deadline! We have designed our process to accommodate changes at any stage in the process. When new issues arise, the SME attorney working with Valora explains what they need to the assigned Valora PM. The PM modifies the RuleSet to reflect the changes, and begins a process iteration accordingly. All iterations are saved and available for review at any time. Iterations proceed until the SME attorney is satisfied and formally signs off on the results.

The best part is that issue- and accuracy-oriented iterations are free! Valora encourages clients to hypothesize about how they would organize and classify their collection in different ways, and to specifically test out different variables per iteration. There is no additional charge for this type of dataset exploration, nor for the addition or removal of issues and rules over time.

Regarding adding new data, this is easy for Valora. New data installments are incorporated into the existing workflow and RuleSet. Once the new data is ingested into PowerHouse and tagged (see above), the most recent set of rules is applied to the dataset and the results are delivered to BlackCat and/or to the client. In cases where the new data installment has an effect on the output of the prior installments (as can happen for detection and labeling of NearDuplicates, for example), Valora will supply a fresh overlay of the results for the entire dataset, incorporating the results of the new installment. Valora’s BlackCat interface always reflects the current state of the dataset, and astute observers can actually see the new data installment being incorporated in real time! New data installments do have an additional per-document charge for ingestion and processing, applied only to the new data added to the collection.


Q: How does Valora ensure accuracy of their automated process?      

A: Valora uses a stringent method of statistical sampling to ensure accuracy at all stages [1]. The specific method used is called Stratified Sampling. Stratified Sampling is a method where documents are randomly selected from specific segments of the population. An example of a segment is Document Type (Chart, Cash Flow Statement, Lab Results, etc.). So instead of pulling random samples from the entire population, Valora pulls a representative sample from each document type as identified by Valora’s tagging technology at ingestion. We are essentially creating a “mini me” sample set from each document type, date range, issue concentration, and so on. (Remember the 90+ attributes we tag for? Those ultimately become the basis for the sampling strata.)

Each document selected from the strata is then manually reviewed (audited, really) for accuracy by either Valora’s personnel or the client’s. If any errors are found, they are recorded and fed back into the system for re-iteration. Valora will modify the rules to rectify the error and kick off an iteration. This process repeats until no errors are found in the sample set. It typically takes 3-4 tries, which are often combined with iterations for new issues or new data installments.

As with any statistical sample, the number of documents pulled for each stratified sample is mainly contingent on the confidence level desired regarding accuracy of the coding as well as the acceptable margin of error. To some degree the original size of the dataset is factored in, but its impact on the sample size is minimal. For example, in a document population of 100,000 documents, if you want to be 99% confident in your accuracy, with a 2% margin of error, your sample size would be roughly 4,000 documents.
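
The arithmetic behind that example can be sketched in a few lines of Python. This is only an illustration of the standard sample-size formula with the finite population correction described in footnote [1], not Valora's own tooling. The function name `sample_size` and the worst-case default of 0.5 for the a priori probability are assumptions made for the sketch.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence=0.95, margin=0.02, p=0.5):
    """Sample size for estimating a proportion, with finite population correction.

    p=0.5 is an assumed worst-case a priori probability; the source does not
    state which value is used in practice.
    """
    # Number of standard deviations for a two-sided confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Uncorrected sample size (as if the population were infinite).
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    # Finite population correction.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


if __name__ == "__main__":
    # The example from the text: 100,000 documents, 99% confidence, 2% margin.
    print(sample_size(100_000, confidence=0.99, margin=0.02))
    # A tenfold larger collection needs only a slightly larger sample.
    print(sample_size(1_000_000, confidence=0.99, margin=0.02))
```

Under these assumptions the first call returns a value just under 4,000, and the second call shows how little the required sample grows when the collection is ten times larger.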

For many, especially those not well-versed in statistical sampling, this can seem like an impossibly small sample size, since it is less than 5% of the collection. Valora recognizes the gulf between statistical sampling and “comfort sampling.” For those who have concerns, Valora suggests that they review the percentage that makes them the most comfortable. Typically, this is 10-20% of the population.


[1] Valora uses the standard formula for the computation of sample size: $n = \frac{z^2 \, p(1-p)}{c^2}$, where $n$ is the sample size, $z$ is the number of standard deviations required for the desired confidence (1.96 for 95% confidence), $p$ is the a priori assumed probability, and $c$ is the desired confidence interval (e.g., ±2%, entered as 0.02). Valora also applies the finite population correction: $n' = \frac{n}{1 + \frac{n-1}{N}}$, where $n'$ is the corrected sample size, $n$ is the sample size from the previous formula, and $N$ is the size of the whole collection.

Valora uses proportionate stratification for audit sampling:  the collection is divided into strata and samples are randomly selected from within each one such that the sample size from each stratum is proportionate to the size of that stratum relative to the whole collection.  Sample sizes are adjusted to ensure that all strata are sampled.
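
As a rough illustration of proportionate stratification, the Python sketch below allocates an audit sample across strata in proportion to their sizes and then draws documents at random within each stratum. The function names are hypothetical, and the "at least one document per non-empty stratum" rule is only one assumed way of adjusting sample sizes so that every stratum is sampled; the source does not say how Valora performs that adjustment.

```python
import random


def proportionate_allocation(strata_sizes, total_sample):
    """Split total_sample across strata in proportion to each stratum's size."""
    total = sum(strata_sizes.values())
    allocation = {}
    for name, size in strata_sizes.items():
        share = round(total_sample * size / total)
        # Assumed adjustment: every non-empty stratum gets at least one
        # document, and no stratum is asked for more than it contains.
        allocation[name] = min(size, max(1, share))
    return allocation


def draw_stratified_sample(docs_by_stratum, total_sample, seed=0):
    """Randomly select documents within each stratum per the allocation."""
    rng = random.Random(seed)  # fixed seed so the audit sample is repeatable
    sizes = {name: len(docs) for name, docs in docs_by_stratum.items()}
    allocation = proportionate_allocation(sizes, total_sample)
    return {
        name: rng.sample(docs, allocation[name])
        for name, docs in docs_by_stratum.items()
    }


if __name__ == "__main__":
    # Hypothetical collection stratified by the Document Type tag.
    collection = {
        "Chart": [f"chart-{i}" for i in range(700)],
        "Lab Results": [f"lab-{i}" for i in range(290)],
        "Cash Flow Statement": [f"cfs-{i}" for i in range(10)],
    }
    sample = draw_stratified_sample(collection, total_sample=100)
    print({name: len(docs) for name, docs in sample.items()})
```

Because of the minimum-of-one adjustment, the total drawn can slightly exceed the requested sample size when a collection contains very small strata.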


Q: Is Valora’s process defensible?     

A: Yes, it is defensible! Because the process is “Rules Based,” the decision-making for document disposition is completely transparent. Every decision that is or can be made by the system is verified by the SME attorney and then documented in the Valora RuleSet. This is not “black box technology,” nor is Valora depending on the quality of a seed set to perform the review. Because the rules are explicitly defined, written down, and producible, the process is perfectly repeatable by any party and consistent over time, which are the very definitions of defensibility. If the rules stay the same, the results stay the same. Replicability and Transparency lead to Defensibility.

When the TAR process is complete, a final set of rules is provided as evidence to the court or opposing parties about how and why each document was coded as it was. The final RuleSet includes both the rules used in the particular matter, as well as a description of Valora’s process and technology.
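
The repeatability argument can be illustrated with a small Python sketch of a rules-based classifier that works on document tags and records which rule produced each decision. Everything here is hypothetical: the tag names, the two example rules, the labels, and the first-match-wins ordering are assumptions for illustration and do not describe Valora's actual RuleSet format.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str                        # identifier recorded in the audit trail
    label: str                       # coding applied when the rule matches
    matches: Callable[[dict], bool]  # predicate over a document's tags


def is_counsel_memo(tags):
    return tags["doc_type"] == "memo" and "counsel" in tags["author"].lower()


def is_hostile(tags):
    return tags["tone"] == "hostile"


# Hypothetical rule set; rules are evaluated in order (an assumption).
RULES = [
    Rule("R1-counsel-memo", "Potentially Privileged", is_counsel_memo),
    Rule("R2-hostile-tone", "Relevant", is_hostile),
]


def apply_rules(tags, rules, default="Not Relevant"):
    """Return (label, rule name) for the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(tags):
            return rule.label, rule.name
    return default, None


if __name__ == "__main__":
    doc = {"doc_type": "memo", "tone": "formal", "author": "General Counsel"}
    # The same tags and the same rules always yield the same decision,
    # and the rule name explains why the document was coded as it was.
    print(apply_rules(doc, RULES))
```

Because the rules are plain data and the evaluation has no random or learned component, re-running the same rule set over the same tags reproduces the same coding, along with the name of the rule responsible for each decision.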


Q: Can Valora AutoReview foreign-language documents?

A: Yes, Valora can process and AutoReview some foreign languages. If the data is ASCII text, then we can handle it in our system. If the data is Unicode, we can identify that the languages exist, but we cannot AutoReview the documents. Examples of languages that require Unicode are Chinese, Japanese and Korean.


Q: What are some of the biggest differences between Valora’s Rules Based Approach and the common Predictive Coding model?     

A: This is a great question! There are several differences but the ones that have the most impact are as follows:

  • Valora does not create a random seed set using algorithms like most PC models do. Valora’s “seed set” is essentially the entire document population! That is important because there is no reliance on the quality of the seed set in Valora’s model. Additionally, there are no conflicts within the coding, as you often see when human beings make determinations about seed set documents.

  • Valora’s process is very flexible as new issues develop or existing issues change over time. When new issues arise, the rules are modified to reflect the changes. The iterative process resumes until the attorney is satisfied and signs off on the results. There is no additional fee from Valora and minimal attorney time required to scope out the rules modification and review the results. With PC workflows, it is often confusing how to manage changes to the issues. There isn’t a push-button solution. When there are changes, the attorneys must re-review the seed set for the updated issues, which are then propagated across the dataset. Depending on the number of changes, your attorney may be reviewing the same seed set multiple times, which will obviously increase the cost and time necessary to complete the review. It can also increase complexity, as documents may appear to “arbitrarily” change their status over time.

  • Valora’s process is fully transparent. There is a full record of every rule that was applied, accompanied by the results of each rule’s application. There is never a question about how or why a decision was made when using Valora’s process. Additionally, the Rules-Based process is replicable, meaning that if the dataset doesn’t change and the rules don’t change, you will get the same result every time. With PC, it is often unclear why a decision was made on a document originally, or why other documents followed suit (or didn’t). By design, these systems are “Black Box” in that they do not disclose the particulars by which document categorizations were actually made, only that the documents were “similar” to previous decisions. Furthermore, it is extremely unlikely that two attorneys reviewing the same seed set will produce the same results. That is just the human decision-making process at work. Thus, replicability is lower with PC than it is with Rules-Based systems.

  • With Valora, the savings are in the cost of hourly labor. There is no need for a high-priced SME attorney to sit and review an arbitrary sampling of documents for relevancy until a seed set is complete. Valora involves the attorneys in the process by showing them emblematic documents and asking for their feedback. The effort is minimal in comparison to optimizing a seed set (or to hiring legions of contract attorneys). The impact is a huge savings in hourly fees. Depending on the size of the seed set and the complexity of the case, PC models could actually generate more attorney fees from seed set review and QC than it would cost to use a team of contract attorneys for first pass review and senior attorneys for QC!