Expectations & Limitations
The overriding limitation on the quality of output data is the quality of the source.
Traditional key punching (from printed documents) can guarantee 99.9% accuracy using a two-pass punch-and-verify process. Within more strictly defined source limitations, OCR can produce similar results.
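The comparison at the heart of a two-pass punch-and-verify process can be sketched as follows, assuming each pass yields the same ordered list of fields for a record. `verify_pass` is a hypothetical name and the sample values are invented; this illustrates the general idea only, not any particular supplier's system.

```python
def verify_pass(first: list[str], second: list[str]) -> list[int]:
    """Return the indices of fields where two independent keyings disagree.

    Flagged fields would be re-checked against the source document
    before the record is accepted.
    """
    if len(first) != len(second):
        raise ValueError("both passes must key the same number of fields")
    return [i for i, (a, b) in enumerate(zip(first, second)) if a != b]


# Invented sample record keyed twice; the street number was transposed once.
first_pass = ["SMITH", "J", "12 HIGH ST"]
second_pass = ["SMITH", "J", "21 HIGH ST"]
print(verify_pass(first_pass, second_pass))  # [2]
```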
Transcription processes are less precise and are as much an interpretive skill as a science. As a consequence, accuracy can and does vary enormously from supplier to supplier, a fact that any prospective user of the data should consider carefully. Our testing of data from various suppliers indicates that price is not a reliable guide to quality.
The recording medium does not normally degrade the available voice data; however, in the real world,
Significantly, we are dealing with name data, which is not bound by general spelling rules.
Quality assurance is maintained by
DART has designed a proprietary suite of tools to deliver a streamlined play-and-capture service. At its core are the AMAS database, which provides help-and-fill assistance for precise address location, and a name list that runs into the tens of thousands.
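How the help-and-fill assistance works internally is not described here. One common way to offer type-ahead over a large sorted name list is a binary search for the typed prefix, sketched below with Python's standard `bisect` module. `prefix_matches` and the sample names are illustrative assumptions, and names are assumed to be stored in upper case.

```python
import bisect


def prefix_matches(sorted_names: list[str], typed: str, limit: int = 5) -> list[str]:
    """Return up to `limit` names from an upper-case sorted list that start with `typed`."""
    typed = typed.upper()
    # Binary search finds the first position where the prefix could appear.
    i = bisect.bisect_left(sorted_names, typed)
    matches = []
    # Sorted order guarantees all matches are contiguous from position i.
    while i < len(sorted_names) and len(matches) < limit and sorted_names[i].startswith(typed):
        matches.append(sorted_names[i])
        i += 1
    return matches


names = sorted(["NEWMAN", "NUMAN", "NORMAN", "NEWTON", "SCHMIDT"])
print(prefix_matches(names, "new"))  # ['NEWMAN', 'NEWTON']
```

Because the list is searched rather than scanned, lookups stay fast even when the list runs into the tens of thousands.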
Data Components and Validation
Theoretically, family names follow no particular spelling rules and are of infinite variety. Unless the respondent spells the name out, there is no absolute guarantee of accuracy. That said, our capture software provides smart search access to tens of thousands of names drawn from the White Pages directories. When selecting a name where there is some doubt about the spelling, we are not guessing but matching a pronunciation; in effect, we are trying to turn an infinite list of possibilities into a finite range. There are limitations: the White Pages lists do not contain every name or spelling variation, and copyright and privacy legislation prevents the name-and-address link that would allow possibilities to be cross-checked.
Our observation is that 2-5% of family names are either absent from the voice file or so indistinct that they cannot be identified. A further 10% are clearly pronounced but misspelt because of interpretive errors, e.g. Newman instead of Numan, or Schmidt instead of Schmitt.
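How DART's smart search matches pronunciations is not specified. As an illustration only, the sketch below swaps in American Soundex, a well-known phonetic code; `soundex` and `phonetic_candidates` are hypothetical names. It shows both halves of the argument above: a phonetic key narrows an open-ended set of spellings to a finite candidate list, but Newman and Numan (like Schmidt and Schmitt) share a key, so the operator must still choose between them, which is where interpretive errors arise.

```python
# American Soundex letter codes; vowels, Y, H and W have no code.
_CODES = {
    letter: digit
    for digit, letters in {
        "1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
        "4": "L", "5": "MN", "6": "R",
    }.items()
    for letter in letters
}


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit  # a vowel resets prev, so a repeated code counts again
    return (code + "000")[:4]


def phonetic_candidates(heard: str, name_list: list[str]) -> list[str]:
    """Return every listed name that sounds like `heard` under Soundex."""
    key = soundex(heard)
    return [n for n in name_list if soundex(n) == key]


names = ["Newman", "Numan", "Norman", "Schmidt", "Schmitt"]
print(phonetic_candidates("Newman", names))  # ['Newman', 'Numan']
```

The candidate list is finite, but it still contains two valid spellings; without the respondent spelling the name, nothing in the audio decides between them.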
First names are better defined than family names, but respondents often supply only an initial or skip the request. When a name is unfamiliar to the operator or unclear, the fallback is to record only the initial. Where a first name is requested, this field goes unrecorded for an estimated 5-10% of respondents.
Address information (street number, street name and suburb) is the component most amenable to validation and checking. DART uses an AMAS-based tool to verify that the street (and/or apartment number) exists within the suburb and postcode quoted. Respondents often misname roads and streets or give incorrect postcode details (approx. 10%); however, the majority of these can be resolved with certainty. Because this data is auto-filled, spelling errors are avoided. Note that not all apartments or all parts of Australia are currently listed on the AMAS database, although coverage of the major cities is comprehensive. All verified addresses automatically include the Australia Post DPID barcode in the record as standard output. Respondents typically give insufficient (unresolvable) detail in 3-5% of cases, and a further 3-5% of records cannot be validated.
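The validation step described above can be sketched as a lookup against reference data keyed by suburb and postcode. The AMAS tool's actual interface is not described here, so `REFERENCE`, `check_address` and the status names are hypothetical, and every reference entry is an invented placeholder. The sketch handles only a wrong postcode; resolving misnamed streets would need fuzzy matching, which it omits.

```python
from typing import Optional

# Hypothetical stand-in for an address reference database:
# (suburb, postcode) -> street names. All entries are invented placeholders.
REFERENCE = {
    ("EXAMPLEVILLE", "9991"): {"HIGH ST", "STATION RD"},
    ("SAMPLE HILL", "9992"): {"HIGH ST", "PARK AVE"},
}


def check_address(street: str, suburb: str, postcode: str) -> tuple[str, Optional[str]]:
    """Return (status, postcode_to_use) for a captured address."""
    street, suburb = street.strip().upper(), suburb.strip().upper()
    if street in REFERENCE.get((suburb, postcode), set()):
        return "verified", postcode
    # The street may exist in the named suburb under a different postcode,
    # e.g. when the respondent quoted the postcode incorrectly.
    options = [pc for (sub, pc), streets in REFERENCE.items()
               if sub == suburb and street in streets]
    if len(options) == 1:
        return "postcode_corrected", options[0]
    # No match, or more than one possible match: flag for manual follow-up.
    return "unvalidated", None


print(check_address("Park Ave", "Sample Hill", "9991"))  # ('postcode_corrected', '9992')
```

Only a single unambiguous alternative is accepted as a correction, which mirrors the idea of resolving errors "with certainty" rather than guessing.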
Email addresses are about as error-prone as family names. Although the format is partly defined (@, .com etc.), if respondents fail to spell out the unique component there is no defined list against which to cross-check it.
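A sketch of the point above: the structure of an email address can be checked, and its domain compared against a list, but nothing can confirm the mailbox name before the "@". The regular expression is deliberately loose and is not a full RFC 5322 validator; `KNOWN_DOMAINS` and `assess_email` are hypothetical, and the domain list is illustrative only.

```python
import re

# Loose structural check: one "@", a non-empty local part and a dotted domain.
_EMAIL_SHAPE = re.compile(r"^[^@\s]+@([^@\s]+\.[^@\s]+)$")

# Hypothetical list of domains an operator might cross-check against.
KNOWN_DOMAINS = {"gmail.com", "hotmail.com", "bigpond.com"}


def assess_email(address: str) -> str:
    """Classify an email address by how much of it can actually be checked."""
    m = _EMAIL_SHAPE.match(address.strip().lower())
    if not m:
        return "malformed"
    if m.group(1) in KNOWN_DOMAINS:
        # The domain can be checked, but the local part (before "@") cannot:
        # there is no reference list of mailbox names.
        return "domain_known_local_unverified"
    return "unverified"


print(assess_email("Jo.Numan@gmail.com"))  # domain_known_local_unverified
```

Even the best case leaves the unique component unverified, which is why email error rates track those of family names.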
Error rates are typically low; allow 1-2%.
Where personalised addressing is the required output from a standard 55-cent competition call window, we estimate that approximately 20% of output records will contain errors, with family names, first names and email addresses accounting for the majority.
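The 20% figure above is the authors' estimate. To relate per-field error rates to a record-level rate, the sketch below computes three figures: the largest single field rate (a lower bound, reached only if every other error falls in records that already contain that one), the rate implied if field errors are independent, 1 − ∏(1 − p_i), and the sum of rates capped at 1 (an upper bound, by the union bound). `record_error_bounds` is a hypothetical helper, and the rates in the usage line are placeholders, not measured figures.

```python
from math import prod


def record_error_bounds(field_rates: dict[str, float]) -> tuple[float, float, float]:
    """Return (lower, independent, upper) estimates of the share of records
    with at least one field error, given non-empty per-field error rates."""
    rates = list(field_rates.values())
    lower = max(rates)                            # errors fully overlap
    independent = 1 - prod(1 - p for p in rates)  # errors occur independently
    upper = min(1.0, sum(rates))                  # errors never overlap
    return lower, independent, upper


# Placeholder rates for three fields, for illustration only.
estimate = record_error_bounds({"field_a": 0.10, "field_b": 0.05, "field_c": 0.05})
print([round(x, 3) for x in estimate])  # [0.1, 0.188, 0.2]
```

A measured record-level rate can then be compared with these bounds to see whether errors tend to cluster in the same records.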
Of the 20% of records that contain inaccuracies, most address errors would be minor and would not prevent delivery, or would result from data missing at source.
Test and Check