Valora Technology Assisted Review FAQ
Frequently Asked Questions (FAQs)
- How long on average does it take to complete a TAR when implementing Valora’s workflow?
- How many reviewers do I need to involve when using Valora’s technology to perform our review?
- What is the underlying technology that Valora uses to support their Technology Assisted Review workflow?
- Can I use Valora for Early Case Assessment (ECA) or when I don’t have much knowledge about my case or the data that has been collected?
- In your presentations, you discuss a process of “tagging” documents when you ingest the data into PowerHouse. What does that mean?
- What is the value of tagging to the TAR process?
- I am not 100% comfortable with producing without some level of “eyes on” review taking place. How can I use Valora in this situation?
- What happens if I have a new issue or more data for a case? Can Valora handle changes in the workflow without adding costs?
- How does Valora ensure accuracy of their automated process?
- Is Valora’s process defensible?
- Can Valora AutoReview foreign language?
- What are some of the biggest differences between Valora’s Rules Based Approach and the common Predictive Coding model?
Q: How long on average does it take to complete a TAR when implementing Valora’s workflow? ↑
A: The amount of time is highly
contingent upon the availability of the case’s
subject-matter-experts (SMEs), often outside counsel attorneys who
will provide input and feedback to Valora’s project managers.
Examples of SME input and feedback include details of what they are
looking to find, or rules for categorizing (review-labeling) documents
in the collection. Another factor to consider is the number of times
the rules are modified (otherwise known as “iterations”)
before the attorneys have confidence in the process. A typical
project breaks down as follows:
- 2-3 days to configure and run the rules. Simultaneously, Valora is
  ingesting the data and running it through our proprietary process to
  extract and tag attributes about the individual documents.
- 1-2 days of run time for applying and performing quality control on
  the rules. This is the step that often iterates as SMEs discover new
  elements of the documents or the case.
- Final QC using law firm or contract attorneys is optional. If elected,
  the percent of documents reviewed will drive the estimated time to
  complete. With Valora’s TAR process, typically <5% of the documents
  are reviewed manually.
Q: How many reviewers do I need to involve when using Valora’s technology to perform our review? ↑
A: Valora recommends that 1 Sr.
Associate or 1 Partner participate in the process. This person should
be someone who has an intimate knowledge of the case and pertinent
issues driving discovery review. This attorney should be available to
provide feedback to their designated PM regarding how documents are
being automatically coded. The success of the review is highly
contingent upon collaboration between Valora and the attorney.
Some of Valora’s clients
also utilize contract attorneys to perform quality control after the
final set of rules has been applied and the work product has been
delivered back to the client. The number of attorneys needed is
driven by the deadline as well as the percentage or number of documents your team
believes should be reviewed by humans. With Valora’s TAR
process, typically <5% of the documents are reviewed manually.
Q: What is the underlying technology that Valora uses to support their Technology Assisted Review workflow? ↑
A: Since our inception, Valora has been providing services powered by our proprietary pattern-matching
technology. Valora uses Hierarchical Probabilistic
Context-free Grammars, a type of Natural Language Processing. Our
PowerHouse™ system (Valora’s proprietary processing
platform and administrative dashboard) uses a voting engine to
identify syntactic and semantic elements of a document (e.g.,
phone number or person's name) and combines these elements into
progressively larger hierarchical grammar units. It uses
probabilities associated with each level in the hierarchy to select
the matching recognition patterns.
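To make the idea of probabilistic grammar units more concrete, here is a minimal sketch of a toy probabilistic context-free grammar parsed with a standard Viterbi CKY chart. It is not Valora's implementation: the grammar, the labels (FIRST, LAST, NAME, PHONE, CONTACT), the probabilities and the function name best_parse are all assumptions chosen for illustration, and the voting engine is not modeled.

```python
from collections import defaultdict

# Hypothetical toy grammar (illustrative only, not Valora's patterns).
# Lexical rules: probability that a low-level unit produces a given token.
LEXICAL = {
    "FIRST": {"jane": 0.5, "john": 0.5},
    "LAST": {"doe": 1.0},
    "DIGITS3": {"555": 1.0},
    "DIGITS4": {"0100": 1.0},
}

# Binary rules (parent, left, right) -> probability. Each rule combines two
# smaller units into a larger one; probabilities per parent sum to 1.
BINARY = {
    ("NAME", "FIRST", "LAST"): 0.9,
    ("NAME", "LAST", "FIRST"): 0.1,
    ("PHONE", "DIGITS3", "DIGITS4"): 1.0,
    ("CONTACT", "NAME", "PHONE"): 0.7,
    ("CONTACT", "PHONE", "NAME"): 0.3,
}

def best_parse(tokens):
    """Return {label: best probability} for units spanning all tokens."""
    n = len(tokens)
    # chart[(i, j)][label] = best probability that label covers tokens[i:j]
    chart = defaultdict(dict)
    for i, token in enumerate(tokens):
        for label, words in LEXICAL.items():
            if token in words:
                chart[(i, i + 1)][label] = words[token]
    # Build progressively larger units from smaller adjacent ones.
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (parent, left, right), prob in BINARY.items():
                    if left in chart[(i, k)] and right in chart[(k, j)]:
                        score = prob * chart[(i, k)][left] * chart[(k, j)][right]
                        if score > chart[(i, j)].get(parent, 0.0):
                            chart[(i, j)][parent] = score
    return chart[(0, n)]

result = best_parse(["jane", "doe", "555", "0100"])
```

In this sketch, a first name and a last name combine into a NAME unit, the digit groups combine into a PHONE unit, and the two combine into a CONTACT unit; the probability attached to each level is multiplied through to score the overall match.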
Q: Can I use Valora for Early Case Assessment (ECA) or when I don’t have much knowledge about my case or the data that has been collected? ↑
A: Absolutely! Valora has a
visualization tool, BlackCat™, which allows users to
drill into their dataset to uncover recurring themes, relevant issues
and other pertinent information. During the visual review in
BlackCat, Valora’s PM will work with the SME attorney to build
an appropriate Ruleset based on what was uncovered in their analysis
of the data. From there, our standard process for TAR commences.
Q: In your presentations, you discuss a process of “tagging” documents when you ingest the data into PowerHouse. What does that mean? ↑
A: Our process of tagging documents
is what makes Valora so unique. Our technology analyzes each document
as it comes over the PowerHouse threshold, looking for unique and
distinctive attributes about each of those documents that can be used
at a later point to search for relevant or privileged content.
Examples of the 90+ attributes that we extract and tag are:
- Document Type (memo, inventory report, patent application, etc.)
- Tone of a document (neutral, hostile, informal, formal, personal, etc.)
- Actual dates and authors (not relying on metadata)
By tagging documents in this way,
Valora essentially obtains a DNA sequence for each document that can
later be used for analysis and assessment. Valora’s tag list
is a result of more than a decade of document-tagging experience. As
a general rule of thumb, if there is some element of information
present on a document, Valora’s system can identify and tag it.
Our tagging approach works on both scanned paper documents and ESI
documents with limited or no metadata, such as attachments and native
files. Valora tags all documents in a collection.
Q: What is the value of tagging to the TAR process? ↑
A: Primarily, the value is in being
able to utilize the tags to find and classify what you are looking
for in the dataset. With Valora, you have access to information that
you wouldn’t normally be able to obtain from a manual or
“predictive coding” style document review. While
Valora’s Ruleset is utilizing both content and tags, each of
the tags collected can be made available to you in your database, if
desired. This would provide you with useful ancillary information,
such as who communicates with whom and which highly unusual concepts
are present in a document, etc.
Ultimately, an automated TAR process
replaces almost all of the doc-by-doc manual labor, which reduces the
cost and time associated with document analysis. Furthermore, since
Valora is primarily searching the tags when applying client rules,
the speed of each iteration isn’t dependent on the size of the
actual dataset. Typical results of each iteration are provided back
for attorney comment within hours, not days!
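As a rough illustration of what applying rules to tags can look like, the sketch below runs an ordered ruleset over a handful of tag records. The field names (doc_type, tone, author), the labels and the function apply_ruleset are hypothetical; Valora's actual RuleSet format is not described here.

```python
# Hypothetical tag records; field names and values are illustrative only.
DOCUMENTS = [
    {"id": "DOC-001", "doc_type": "memo", "tone": "hostile", "author": "J. Smith"},
    {"id": "DOC-002", "doc_type": "inventory report", "tone": "neutral", "author": "A. Lee"},
    {"id": "DOC-003", "doc_type": "memo", "tone": "formal", "author": "Outside Counsel"},
]

# An ordered ruleset: the first rule whose condition matches assigns the label.
RULESET = [
    ("Potentially Privileged", lambda tags: tags["author"] == "Outside Counsel"),
    ("Relevant", lambda tags: tags["doc_type"] == "memo" and tags["tone"] == "hostile"),
]

def apply_ruleset(documents, ruleset, default="Not Relevant"):
    """Label every document by evaluating the rules against its tags."""
    results = {}
    for tags in documents:
        label = default
        for rule_label, condition in ruleset:
            if condition(tags):
                label = rule_label
                break
        results[tags["id"]] = label
    return results

coding = apply_ruleset(DOCUMENTS, RULESET)
```

Because the rules only read the extracted tags and are evaluated deterministically, running the same ruleset over the same tags gives the same coding each time, and changing a rule only requires re-evaluating the tags rather than re-reading the documents.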
Q: I am not 100% comfortable with producing without some level of “eyes on” review taking place. How can I use Valora in this situation? ↑
A: Valora is not suggesting that the
machine fully take over and perform the work of an attorney. Instead,
our process is designed to reduce the amount of low-level manual
labor required for document review, and leave the high-level analysis
in the hands of SMEs. Consequently, the result is reduced cost and
time and increased, measurable accuracy.
Many of our clients use Valora to
perform first pass review to determine relevancy and issues as well
as to identify potentially privileged documents. Once Valora has
delivered the output to your review platform, prioritization of “eyes
on” review can be established. There are many ways to do this.
Valora can work with your team to design a plan that gets you to your
comfort level. Additionally, we can assist you with sampling the
dataset using statistically sound and highly defensible methods for
Q: What happens if I have a new issue or more data for a case? Can Valora handle changes in the workflow without adding costs? ↑
A: After more than a decade of
coding and reviewing documents, at Valora we understand that nothing
in litigation review is static, except for maybe the deadline! We
have designed our process to accommodate changes at any stage in the
process. When new issues arise, the SME attorney working with Valora
explains what they need to the assigned Valora PM. The PM modifies
the RuleSet to reflect the changes, and begins a process iteration
accordingly. All iterations are saved and available for review at
any time. Iterations proceed until the SME attorney is satisfied and
formally signs off on the results.
The best part is that issue- and
accuracy-oriented iterations are free! Valora encourages clients to
hypothesize about how they would organize and classify their
collection in different ways, and to specifically test out different
variables per iteration. There is no additional charge for this type
of dataset exploration, nor for the addition or removal of issues and
rules over time.
Regarding adding new data, this is
easy for Valora. New data installments are incorporated into the
existing workflow and RuleSet. Once the new data is ingested and
tagged (see above) into PowerHouse, the most recent set of rules is
applied to the dataset and the results are delivered to BlackCat
and/or to the client. In cases where the new data installment has an
effect on the output of the prior installments (as can happen
for detection and labeling of NearDuplicates, for example), Valora
will supply a fresh overlay of the results for the entire dataset,
incorporating the results of the new installment. Valora’s
BlackCat interface always reflects the current state of the dataset,
and astute observers can actually see the new data installment being
incorporated in real-time! New data installments do have an
additional per document charge for ingestion and processing, applied
only to the new data added to the collection.
Q: How does Valora ensure accuracy of their automated process?
A: Valora uses a stringent method of
statistical sampling to ensure accuracy at all stages. The specific
method used is called Stratified Sampling, a method where documents are
randomly selected from specific segments of the population. An
example of a segment is Document Type (Chart, Cash Flow Statement,
Lab Results, etc.). So instead of pulling random samples from the
entire population, Valora pulls a representative sample from each
document type as identified by Valora’s tagging technology
at ingestion. We are essentially creating a “mini me”
sample set from each document type, date range, issue concentration,
and so on. (Remember the 90+ attributes we tag for? Those
ultimately become the basis for the sampling strata.)
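A minimal sketch of stratified sampling by a single tag follows. It assumes each document is a dictionary of tags and that the same fraction is drawn from every stratum; the function name stratified_sample and its parameters are illustrative and do not describe Valora's actual sampling code.

```python
import random
from collections import defaultdict

def stratified_sample(documents, stratum_key, fraction, seed=0):
    """Randomly draw the same fraction of documents from every stratum."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    strata = defaultdict(list)
    for doc in documents:
        strata[doc[stratum_key]].append(doc)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        # Take at least one document so that no stratum goes unaudited.
        count = min(len(members), max(1, round(len(members) * fraction)))
        sample.extend(rng.sample(members, count))
    return sample
```

With 100 memos, 50 charts and 10 lab results, a 10% fraction would pull 10, 5 and 1 documents respectively, so every document type is represented in the audit set even when it is rare in the collection.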
Each of the documents in the selected
strata is then manually reviewed (audited, really) for accuracy by
either Valora’s personnel or the client’s. If any errors
are found, they are recorded and fed back into the system for
re-iteration. Valora will modify the rules to rectify the error and
kick off an iteration. This process repeats until no errors are found
in the sample set. It typically takes 3-4 tries, which are often
combined with iterations for new issues or new data installments.
As with all statistical samples, the
number of documents pulled for each stratified sample is mainly
contingent on the confidence level desired regarding accuracy of the
coding as well as the acceptable margin of error. To some degree the
original size of the dataset is factored in, but the impact on the sample
size is minimal. For example, in a document population of 100,000
documents, if you want to be 99% confident in your accuracy, with a
2% margin of error, your sample size would be roughly 4,000 documents.
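The arithmetic behind a figure like this can be sketched with the standard sample-size formula for a proportion (Cochran's formula) plus a finite population correction. That Valora uses exactly this formula is an assumption; the function name sample_size and the worst-case proportion p = 0.5 are illustrative choices.

```python
import math
from statistics import NormalDist

def sample_size(population, confidence, margin, p=0.5):
    """Sample size for estimating a proportion in a finite population."""
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Cochran's formula for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    # Finite population correction shrinks the sample slightly.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

example = sample_size(100_000, confidence=0.99, margin=0.02)
```

For 100,000 documents at 99% confidence and a 2% margin of error, this comes out just under 4,000, in line with the figure above. Raising the population to 10,000,000 increases the result only slightly, which is why the original size of the dataset has so little influence.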
For many, especially those not well-versed in statistical sampling,
this can seem like an impossibly small number, since it is less than
5% of the collection. Valora recognizes the gulf between statistical
sampling and “comfort sampling.” For those who have
concerns, Valora suggests that they review the percentage that makes them the
most comfortable. Typically, this is 10-20% of the population.
Q: Is Valora’s process defensible? ↑
A: Yes, it is defensible! Because
the process is “Rules Based,” the decision-making for
document disposition is completely transparent. Every decision that
is or can be made by the system is verified by the SME attorney and
then documented in the Valora RuleSet. This is not “black box
technology,” nor is Valora depending on the quality of a seed
set to perform the review. Because the rules are explicitly defined,
written down, and producible, the process is perfectly repeatable by
any party and consistent over time - the very definitions of
defensibility. If the rules stay the same, the results stay the same.
Replicability and Transparency lead to Defensibility.
When the TAR process is complete, a
final set of rules is provided as evidence to the court or opposing
parties about how and why each document was coded as it was. The
final RuleSet includes both the rules used in the particular matter,
as well as a description of Valora’s process and technology.
Q: Can Valora AutoReview foreign language? ↑
A: Yes, Valora can process and
AutoReview some foreign languages. If the text can be represented in
ASCII, then we can handle it in our system. If the text requires
Unicode characters beyond ASCII, we would be able
to identify that the languages exist but we cannot AutoReview the
documents. Examples of languages that require Unicode are Chinese, Japanese and Korean.
Q: What are some of the biggest differences between Valora’s Rules Based Approach and the common Predictive Coding model? ↑
A: This is a great question! There
are several differences, but the ones that have the most impact are as
follows:
Valora does not create a random
seed set using algorithms, as most PC models do. Valora’s
“seed set” is essentially the entire document
population! That is important because there is no reliance on the
quality of the seed set in Valora’s model. Additionally, there
are no conflicts within the coding, as you often see when human
beings make determinations about seed set documents.
Valora’s process is very
flexible as new issues develop or existing issues change over time.
When new issues arise, the rules are modified to reflect the
changes. The iterative process resumes until the attorney is
satisfied and signs off on the results. There is no additional fee
from Valora and minimal attorney time required to scope out the
rules modification and review the results. With PC workflows, it is
often confusing how to manage changes to the issues. There isn’t
a push-button solution. When there are changes, the attorneys must
re-review the seed set for the updated issues, which are then
propagated across the dataset. Depending on the number of changes,
your attorney may be reviewing the same seed set multiple times,
which will obviously increase costs and time necessary to complete
the review. It can also increase complexity as documents may appear
to “arbitrarily” change their status over time.
Valora’s process is fully
transparent. There is a full record of every rule that was applied,
accompanied by the results of each rule’s application. There
is never a question about how or why a decision was made when using
Valora’s process. Additionally, the Rules-Based process is
replicable, meaning that if the dataset doesn’t change and the
rules don’t change, you will get the same result every time.
With PC, it is often unclear why a decision was made on a document
originally, or why other documents followed suit (or didn’t).
By design, these systems are “Black Box” in that they do
not disclose the particulars by which document categorizations were
actually made, only that they were “similar” to previous
decisions. Furthermore, it is extremely unlikely to get the same
results from 2 attorneys reviewing the same seed set. That is just
the human decision-making process at work. Thus, replicability is
lower with PC than it is with Rules-Based systems.
With Valora, the savings is in
the cost of hourly labor. There is no need for a high-priced SME
attorney to sit and review an arbitrary sampling of documents for
relevancy until a seed set is complete. Valora involves the
attorneys in the process by showing them emblematic documents and
asking for their feedback. The effort is minimal in comparison to
optimizing a seed set (or to hiring legions of contract attorneys).
The impact is a huge savings in hourly fees. Depending on the size
of the seed set and the complexity of the case, PC models could
actually generate more in attorney fees from seed set review and
QC than it would cost to use a team of contract attorneys for first
pass review and senior attorneys for QC!