TRANSCRIPTION ACCURACY  

Expectations & Limitations

Restricted Access Information page


Overview

The overriding limitation on the quality of output data is the quality of the source.

Traditional key punching (from printed documents) can guarantee 99.9% accuracy using a two-pass punch-and-verify process. OCR can, within more strictly defined source limitations, produce similar results.
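
As a rough illustration of the two-pass idea, the sketch below compares two independent keyings of the same records and flags every field where they disagree, so that only those fields need a further check. The record layout and the function name `find_mismatches` are assumptions made for this example, not a description of any particular key-punch system.

```python
def find_mismatches(first_pass, second_pass):
    """Return (record_index, field) pairs where two keyings disagree."""
    if len(first_pass) != len(second_pass):
        raise ValueError("both passes must cover the same records")
    mismatches = []
    for index, (a, b) in enumerate(zip(first_pass, second_pass)):
        # Compare every field that appears in either keying of the record
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                mismatches.append((index, field))
    return mismatches


# Illustrative data only: the second operator keyed one name differently
first = [{"name": "Schmidt", "postcode": "3000"},
         {"name": "Newman", "postcode": "2000"}]
second = [{"name": "Schmitt", "postcode": "3000"},
          {"name": "Newman", "postcode": "2000"}]
print(find_mismatches(first, second))
```

Only the disagreeing field is flagged; fields where both passes agree are accepted without further review.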

Transcription processes are less precise and are as much an interpretive skill as a science. As a consequence, accuracy can and does vary enormously from supplier to supplier, a fact that any prospective user of the data should carefully consider. Our testing of data from various suppliers indicates that price is not a general guide to quality.

Limitations

The recording medium does not normally contribute to any deterioration of the available voice data. However, in the real world:

  • Respondents get flustered and fail to provide all the information requested.

  • Respondents provide all the information but in a rushed or garbled manner.

  • Respondents have poor phone technique, reducing sound quality by talking across or away from the mouthpiece or by making the call from a noisy environment.

  • Respondents speak with accents or regional dialects that make their pronunciation difficult to interpret.

Significantly, we are dealing with "name" data, which is not bound by general spelling rules.

QA

 

Quality assurance is maintained by:

  • utilising quality platforms that ensure high-quality audio reproduction

  • employing only experienced data entry operators with proven speed and accuracy skills and an extensive history in name and address processing

  • restricting operators to those with English as a first language and extensive local knowledge of Australian cities and remote environs.

Software Tools

 

DART has designed a proprietary suite of tools to deliver a streamlined play-and-capture service. At the core is the AMAS database, which provides help and auto-fill assistance for precise address location, together with a name list that runs into the tens of thousands.

Data Components and Validation

Family names

In theory, family names follow no particular spelling rules and are of infinite variety. Unless the name is spelt out, there is no absolute guarantee of accuracy. That said, our capture software provides smart search access to tens of thousands of names drawn from the White Pages directories. When selecting a name whose spelling is in doubt, we are not guessing but matching a pronunciation. In effect, we are trying to reduce an infinite list of possibilities to a finite range. There are limitations: the White Pages directory lists do not contain every name or spelling variation, and copyright and privacy legislation prevents the name-and-address link that would serve to cross-check possibilities.

We observe that 2-5% of family names are either absent from the voice file or so indistinct that they cannot be determined. A further 10% will be clearly pronounced but spelt incorrectly because of interpretive errors, e.g. "Newman" recorded where "Numan" was meant, or "Schmidt" where "Schmitt" was meant.
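
To show what matching a pronunciation against a finite name list can look like, here is a minimal sketch using the classic Soundex code. Soundex is a stand-in chosen for illustration; the matching method used in DART's capture software is not described here, and the function names and the short name list are hypothetical.

```python
# Soundex digit for each consonant group; vowels, Y, H and W carry no digit
SOUNDEX_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        SOUNDEX_CODES[letter] = digit


def soundex(name):
    """Return the four-character Soundex key for a name ('' if no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    previous = SOUNDEX_CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "HW":
            continue  # H and W do not separate letters with the same digit
        digit = SOUNDEX_CODES.get(letter, "")
        if digit and digit != previous:
            code += digit
        previous = digit  # a vowel resets this, so a repeated digit counts again
    return (code + "000")[:4]


def candidates(heard, name_list):
    """Names from the list that share a Soundex key with what was heard."""
    key = soundex(heard)
    return [name for name in name_list if soundex(name) == key]


names = ["Newman", "Numan", "Schmidt", "Schmitt", "Nguyen"]  # illustrative list
print(candidates("Newman", names))
print(candidates("Schmidt", names))
```

Both example pairs from the paragraph above share a key, so a search of this kind would return the alternatives side by side. It narrows the choice; it cannot tell the operator which spelling the respondent actually uses.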

First names

First names are better defined than family names. Respondents will often supply only an initial or skip this request. When a name is unfamiliar to the operator or imprecise, the fallback is to record only the initial. When requested, this field fails to be recorded for an estimated 5-10% of respondents.

Addresses

Address information (street number, street name and suburb) is the component most responsive to validation and checking. DART uses an AMAS-based tool to verify that the street (and/or apartment number) exists within the suburb and postcode quoted. Respondents often misname roads and streets or provide incorrect postcode details (approximately 10% of cases); however, the majority of these can be resolved with certainty. As this data is auto-filled, spelling errors are avoided. Note that not all apartments or all parts of Australia are currently listed on the AMAS database, although there is comprehensive coverage of major cities. All verified addresses automatically have the Australia Post DPID barcode included within the record as standard output. Respondents typically fail to give sufficient detail, leaving the address unresolvable, in 3-5% of cases, and a further 3-5% of records cannot be validated.
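
The verification step described above can be sketched as a lookup against a reference table. The table below is a hypothetical stand-in with made-up rows; it does not reflect the actual AMAS data format or DART's tool, and `validate_address` is an illustrative name.

```python
# Hypothetical reference rows: (street, suburb, postcode). Not real AMAS data.
REFERENCE = {
    ("HIGH ST", "KEW", "3101"),
    ("HIGH ST", "PRAHRAN", "3181"),
    ("STATION RD", "KEW", "3101"),
}


def validate_address(street, suburb, postcode, reference=REFERENCE):
    """Return (status, record); status is 'verified', 'corrected' or 'unresolved'."""
    street = street.strip().upper()
    suburb = suburb.strip().upper()
    postcode = postcode.strip()
    if (street, suburb, postcode) in reference:
        return "verified", (street, suburb, postcode)
    # If street and suburb match exactly one row, treat the postcode as misquoted
    matches = [row for row in reference if row[0] == street and row[1] == suburb]
    if len(matches) == 1:
        return "corrected", matches[0]
    return "unresolved", (street, suburb, postcode)


print(validate_address("High St", "Kew", "3101"))
print(validate_address("high st", "prahran", "3000"))
print(validate_address("Low St", "Kew", "3101"))
```

Because the output record is taken from the reference row rather than from what was keyed, a corrected record carries the reference spelling and postcode, which is the sense in which auto-fill avoids spelling errors.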

Email

Email records are prone to error to a similar degree as family names. Although the format is defined to some extent (@, .com etc.), if respondents fail to spell out the unique component there is no defined list against which to cross-check.
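
The point can be illustrated with a simple shape check. The pattern below is a deliberately loose assumption written for this example (it is not a full address validator): it can reject a structurally broken address, but it has no way of knowing whether the unique component was heard correctly.

```python
import re

# Loose shape only: something@something.tld, with no spaces or extra "@"
EMAIL_SHAPE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s.]+")


def has_email_shape(text):
    """True if the text looks like an email address; says nothing about spelling."""
    return EMAIL_SHAPE.fullmatch(text.strip()) is not None


print(has_email_shape("j.smith@example.com"))  # well formed
print(has_email_shape("jsmith@example"))       # missing the ".com" style ending
print(has_email_shape("jsmyth@example.com"))   # well formed, even if "jsmith" was meant
```

The last case is the one that matters: a misheard name passes the format check, so only the respondent spelling it out protects the record.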

Telephone Numbers

Error rates are typically low: allow 1-2%.

Summary

Where personalised addressing is the required output from a standard 55-cent competition call window, we would estimate that errors appear in approximately 20% of output records, with family names, first names and emails accounting for the majority.

Of the 20% of records that contain inaccuracies, most address errors would not prevent delivery and would be minor in nature or the result of missing data at source.

Test and Check
  • Don't just accept quoted accuracy levels. 
  • A supplier should be willing to provide test data and be capable of maintaining a consistent standard.  
  • Verify the results and spot check ongoing projects.
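
One way to turn a spot check into a number is to review a random sample of records by hand and put a confidence interval around the observed error rate. The sketch below uses the Wilson score interval at roughly 95% confidence (z = 1.96); the choice of interval, the function name and the sample figures are assumptions for illustration only.

```python
import math


def wilson_interval(errors, sample_size, z=1.96):
    """Wilson score interval (low, high) for the true error rate."""
    if sample_size <= 0 or not 0 <= errors <= sample_size:
        raise ValueError("need 0 <= errors <= sample_size and sample_size > 0")
    p = errors / sample_size
    denominator = 1 + z * z / sample_size
    centre = (p + z * z / (2 * sample_size)) / denominator
    half_width = z * math.sqrt(
        p * (1 - p) / sample_size + z * z / (4 * sample_size ** 2)
    ) / denominator
    return centre - half_width, centre + half_width


# Illustrative figures: 40 records with at least one error in a sample of 200
low, high = wilson_interval(40, 200)
print(f"observed 20.0%, plausible range {low:.1%} to {high:.1%}")
```

With a sample of 200 the plausible range is still wide, roughly 15% to 26% for these figures, which is a reason to keep spot checking across a project rather than relying on a single test batch.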