Is accessibility conformance an elusive property? A study of validity and reliability of WCAG 2.0 - Citegraph

Paper Info

Title
Is accessibility conformance an elusive property? A study of validity and reliability of WCAG 2.0

Abstract
The Web Content Accessibility Guidelines (WCAG) 2.0 separate testing into both “Machine” and “Human” audits, and further classify “Human Testability” into “Reliably Human Testable” and “Not Reliably Testable”; it is human testability that is the focus of this paper. We wanted to investigate the likelihood that “at least 80% of knowledgeable human evaluators would agree on the conclusion” of an accessibility audit, and therefore understand the percentage of success criteria that could be described as reliably human testable, and those that could not. In this case, we recruited twenty-five experienced evaluators to audit four pages for WCAG 2.0 conformance. These pages were chosen to differ in layout, complexity, and accessibility support, thereby creating a small but variable sample. We found that an 80% agreement between experienced evaluators almost never occurred and that the average agreement was at the 70-75% mark, while the error rate was around 29%. Further, trained but novice evaluators performing the same audits exhibited the same agreement as our more experienced ones, but a reduction in validity of 6-13%; the validity that an untrained user would attain can only be a conjecture. Expertise appears to improve (by 19%) the ability to avoid false positives. Finally, pooling the results of two independent experienced evaluators would be the best option, capturing at most 76% of the true problems and producing only 24% of false positives. Any other independent combination of audits would achieve worse results.
This means that an 80% target for agreement, when audits are conducted without communication between evaluators, is not attainable, even with experienced evaluators, when working on pages similar to the ones used in this experiment; that the error rate even for experienced evaluators is relatively high; and further, that untrained accessibility auditors, be they developers or quality testers from other domains, would do much worse than this.
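
The quantities named in the abstract (agreement among evaluators, and the share of true problems and false positives in a pooled audit) can be illustrated with a minimal Python sketch. The definitions below are assumptions made for illustration only, not the measures as defined in the paper: agreement is taken as the share of evaluators giving the most common verdict, and pooling as the union of the problems each evaluator reports. The function names and all data are hypothetical.

```python
def agreement(verdicts):
    """Share of evaluators who gave the most common verdict (assumed definition)."""
    counts = {}
    for verdict in verdicts:
        counts[verdict] = counts.get(verdict, 0) + 1
    return max(counts.values()) / len(verdicts)


def pooled_scores(reports, true_problems):
    """Pool several evaluators' reports by set union, then score the pool.

    Returns (share of true problems captured, share of pooled reports
    that are false positives). Both definitions are assumptions.
    """
    pooled = set().union(*reports)
    if not pooled:
        return 0.0, 0.0
    captured = len(pooled & true_problems) / len(true_problems)
    false_positive_share = len(pooled - true_problems) / len(pooled)
    return captured, false_positive_share


# Hypothetical verdicts of five evaluators on one success criterion.
verdicts = ["fail", "fail", "fail", "pass", "fail"]
print(agreement(verdicts))  # 4 of 5 agree, so exactly at the 80% threshold

# Hypothetical reference set of violated criteria and two independent reports.
true_problems = {"1.1.1", "1.3.1", "2.4.4", "3.3.2"}
evaluator_a = {"1.1.1", "1.3.1", "2.4.6"}
evaluator_b = {"1.1.1", "2.4.4"}
print(pooled_scores([evaluator_a, evaluator_b], true_problems))
```

With this made-up data the pooled report covers three of the four true problems and contains one false positive among four reported items.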

Year	2012
DOI	10.1145/2141943.2141946
Venue	TACCESS
Keywords	human testable, false positive, elusive property, error rate, experienced evaluator, average agreement, knowledgeable human evaluator, independent experienced evaluator, accessibility audit, human testability, twenty-five experienced evaluator
DocType	Journal
Volume	4
Issue	2
Citations	14
PageRank	1.11
References	17
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Giorgio Brajnik	1	645	62.79
Yeliz Yesilada	2	566	74.67
Simon Harper	3	1105	140.48
