TRANSCRIPTION ACCURACY
Expectations & Limitations
Restricted Access Information page
Overview
The overriding limitation on the quality of output data is the quality of the source.
Traditional key punching (from printed documents) can guarantee
99.9% accuracy using a two-pass punch-and-verify process. OCR can, within more
strictly defined source limitations, produce similar results.
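As a rough illustration of the punch-and-verify idea, the sketch below compares two independent keyings of the same batch and flags the records where they disagree, so that only those need to be re-checked against the source. The function name and the sample records are hypothetical and do not describe any particular supplier's system.

```python
def find_disagreements(first_pass, second_pass):
    """Return the positions of records where two independent keyings differ.

    Hypothetical helper: each pass is a list of keyed values in the same order.
    """
    if len(first_pass) != len(second_pass):
        raise ValueError("both passes must cover the same records")
    return [
        index
        for index, (a, b) in enumerate(zip(first_pass, second_pass))
        if a != b
    ]


# Illustrative data only: the second operator keyed record 1 differently.
pass_one = ["Smith", "Jones", "Brown"]
pass_two = ["Smith", "Jonas", "Brown"]
to_recheck = find_disagreements(pass_one, pass_two)
```

Records that match on both passes are accepted; the flagged ones go back to the printed source for a decision.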
Transcription processes are less precise and are as much an
interpretive skill as a science. As a consequence, accuracy can and does vary
enormously from supplier to supplier, a fact that any prospective user of
the data should carefully consider. Our testing of data from various suppliers
indicates that price is not a general guide to quality.
Limitations
The recording medium does not normally contribute to any deterioration of the available voice data.
In the real world, however:
- Respondents get flustered and fail to provide all the information requested.
- Respondents provide all the information but in a rushed or garbled manner.
- Respondents have poor phone technique, reducing sound quality by talking across or away from the mouthpiece or by making the call from a noisy environment.
- Respondents speak with accents or regional dialects that amount to poor pronunciation.
Significantly, we are dealing with "Name" data, which is
not bound by general spelling rules.
QA
Quality assurance is maintained by:
- utilising quality platforms that ensure quality audio reproduction.
- employing only experienced data entry operators with proven speed and accuracy skills and an extensive history in name and address processing.
- restricting operators to those with English as a first language and extensive local knowledge of Australian cities and remote environs.
Software Tools
DART has designed a proprietary suite of tools to deliver a streamlined play and
capture service. At the core is the AMAS database, providing help and fill
assistance for precise address location, and a name list that runs into the tens
of thousands.
Data Components and Validation
Family names
Theoretically, family names do not follow any particular spelling
rules and are of infinite variety. Unless the respondent spells the name out, there is no absolute
guarantee of accuracy. That said, our capture software contains smart search
access to tens of thousands of names based on the white pages directories. In
selecting a name where some doubt as to spelling exists, we are not
guessing but rather matching a pronunciation. We are, in effect, trying to turn
an infinite list of possibilities into a finite range. There are limitations:
the white pages directory lists do not contain all names or spelling variations,
and copyright and privacy legislation prevents a name and address link, which
would serve to cross check possibilities.
Our observation is that 2-5% of family names are either not
present on the voice file or so indistinct as to make identification
impossible. A further 10% will be clearly pronounced but not spelt correctly due
to interpretive errors, e.g. Newman recorded where Numan was meant, or Schmidt
where Schmitt was meant.
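Matching a pronunciation against a finite name list can be sketched with a phonetic key. The example below uses classic Soundex purely as a stand-in: the source does not say which algorithm the capture software uses, and the function names and the short name list are assumptions made for illustration.

```python
from collections import defaultdict

# Standard Soundex consonant groups.
_CODES = {}
for _letters, _digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the four-character Soundex key for a name ("" if no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    previous = _CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate repeated codes
        code = _CODES.get(ch, "")
        if code and code != previous:
            digits.append(code)
        previous = code  # vowels reset the run, so a repeat after one counts
    return (first + "".join(digits) + "000")[:4]


def build_index(names):
    """Group a name list (hypothetical sample data) by phonetic key."""
    index = defaultdict(list)
    for name in names:
        index[soundex(name)].append(name)
    return index


def candidates(heard, index):
    """Return the listed spellings that share a key with what was heard."""
    return index.get(soundex(heard), [])


index = build_index(["Newman", "Numan", "Schmidt", "Schmitt", "Jones"])
options = candidates("Newman", index)  # both Newman and Numan share a key
```

The lookup narrows the choice to a short list of spellings that sound alike, but it cannot say which one the respondent meant, which is consistent with the residual error rate described above.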
First names
First names are better defined than family names. Respondents will often
supply only an initial or skip this request. The fallback for an operator when a
name is unfamiliar to them or imprecise is to record only the initial. Where requested,
this field fails to be recorded for an estimated 5-10% of respondents.
Addresses
Address information (street numbers, street names and suburb) is the component
most responsive to validation and checking. DART uses an AMAS-based tool to verify
that the street (and/or apartment number) exists within the suburb and postcode
quoted. Respondents often misname roads and streets or provide incorrect
postcode details (approx. 10%); however, the majority of these can be resolved with
certainty. As this data is auto-filled, spelling errors are avoided. It should be
noted that not all apartments or all parts of Australia are currently
listed on the AMAS database; however, there is comprehensive coverage of major
cities. All verified addresses will automatically have the Australia
Post DPID barcode included within the record as standard output. Respondents
typically fail to give sufficient detail for the address to be resolved in 3-5% of cases, and a
further 3-5% of records cannot be validated.
Email
Email records are prone to error to a similar degree as family names. Although the
format is defined to some extent (@, .com etc.), if respondents fail to spell the
unique component there is no defined list with which to cross check.
Telephone Numbers
Error rates are typically low: allow 1-2%.
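The address check described under Addresses can be pictured as a lookup against a reference table. The sketch below is an assumption-laden illustration: the real AMAS data layout and DART's tool are not described in the source, so the reference set, the function name and the three status labels are all hypothetical.

```python
# Hypothetical reference data: (street, suburb, postcode) triples.
REFERENCE = {
    ("GEORGE ST", "SYDNEY", "2000"),
    ("PITT ST", "SYDNEY", "2000"),
    ("HIGH ST", "EXAMPLETON", "9999"),
}


def check_address(street, suburb, postcode):
    """Classify a quoted address as verified, corrected or unresolved."""
    key = (street.strip().upper(), suburb.strip().upper(), postcode.strip())
    if key in REFERENCE:
        return "verified", key
    # A misquoted postcode can be resolved with certainty when the street and
    # suburb together match exactly one reference entry.
    matches = [ref for ref in REFERENCE if ref[:2] == key[:2]]
    if len(matches) == 1:
        return "corrected", matches[0]
    return "unresolved", None


status, resolved = check_address("george st", "Sydney", "2001")
```

Because the output is taken from the reference entry and not from what the operator typed, a verified or corrected record carries the reference spelling.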
Summary
Where personalised addressing is the required output from a standard
55 cent competition call window, we would estimate errors appearing in
approximately 20% of output records, with family names, first names and emails
carrying the majority.
Of the 20% of records that contain inaccuracies, most address errors would not
prevent delivery and would be of a minor nature or the result of missing data at source.
Test and Check
- Don't just accept quoted accuracy levels.
- A supplier should be willing to provide test data
and be capable of maintaining a consistent standard.
- Verify the results and spot check ongoing projects.
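One way to run a spot check is to draw a random sample of delivered records, verify each against the audio, and put a confidence interval around the observed error rate. The sketch below uses a Wilson score interval. The function names, the sample size and the 95% level (z = 1.96) are illustrative assumptions, not part of any supplier's stated process.

```python
import math
import random


def draw_spot_check(record_ids, sample_size, seed=0):
    """Pick a reproducible random sample of record IDs to verify by hand."""
    ids = list(record_ids)
    rng = random.Random(seed)
    return rng.sample(ids, min(sample_size, len(ids)))


def wilson_interval(errors, checked, z=1.96):
    """Wilson score interval for the true error rate, given a checked sample."""
    if checked <= 0:
        raise ValueError("at least one record must be checked")
    p = errors / checked
    denom = 1 + z * z / checked
    centre = (p + z * z / (2 * checked)) / denom
    half = z * math.sqrt(p * (1 - p) / checked + z * z / (4 * checked ** 2)) / denom
    return centre - half, centre + half


# Illustrative figures: 20 faulty records found in a sample of 100.
sample = draw_spot_check(range(1000), 100)
low, high = wilson_interval(errors=20, checked=100)
```

With a sample of 100 the interval is wide, so a supplier quoting a much lower error rate than the sample shows can only be confirmed or challenged with a larger check.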