Testing Artificial Intelligence
4.5 Test Data
What test data to use, and whether it can be created, found or manipulated, depends on the context and on the availability of data from production. Creating or manipulating data (as in the case of image recognition) is hard to do and sometimes useless or even counterproductive. Using tools to manipulate or create images introduces an extra variable that might create bias of its own! How representative of real-world pictures is the test data? If the algorithm identifies aspects in created data that can only be found in test data, the value of the tests is compromised.
AI testers create a test dataset from real-life data and strictly separate it from the training data. Because the AI system is dynamic, and the world it is used in is dynamic too, test data will have to be refreshed regularly.
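One way to keep test data strictly separate from training data, even when the data set is refreshed, is to assign each item to a partition by hashing a stable identifier instead of shuffling. The sketch below assumes every item has such an identifier (for example a content hash of the image or the ID of the source record); the function names and the 20% test fraction are hypothetical choices, not taken from the chapter.

```python
import hashlib


def partition(item_id: str, test_fraction: float = 0.2) -> str:
    """Deterministically assign an item to 'train' or 'test' from its ID.

    Hypothetical helper: the same ID always lands in the same partition,
    so refreshing the data set never moves a test item into training.
    """
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"


def split(item_ids, test_fraction: float = 0.2):
    """Split IDs into disjoint train and test sets (hypothetical helper)."""
    train, test = set(), set()
    for item_id in item_ids:
        target = test if partition(item_id, test_fraction) == "test" else train
        target.add(item_id)
    # Strict separation: no item may appear in both sets.
    assert train.isdisjoint(test)
    return train, test
```

When new production data arrives, calling `split` on the old and new IDs together keeps every previously assigned item in its original partition. Note that this only separates exact identifiers; near-duplicate pictures with different IDs would need a separate check.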
4.6 Metrics
The output of an AI system is not Boolean: it consists of calculated scores for all possible outcomes (labels). To determine the performance of the system, it is not enough to determine which label has the highest score. Metrics will be necessary.
Take, for example, image recognition: we want to know whether a picture of a cat will be recognised as a cat. In practice this means that the label “cat” gets a higher score than “dog”. If “cat” scores 0.43 and “dog” scores 0.41, cat wins. But the small difference between the scores might indicate a high probability of error: the classification could easily have gone the other way.
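The cat/dog example can be turned into a test that checks not only which label wins but also by how much. This is a minimal sketch, assuming the system returns a score per label; the function name, the third label “rabbit” and the 0.1 minimum margin are hypothetical and would have to be chosen per context.

```python
def classify_with_margin(scores: dict[str, float], min_margin: float = 0.1):
    """Return the winning label, its margin over the runner-up, and whether
    that margin meets a (hypothetical) minimum considered trustworthy."""
    if len(scores) < 2:
        raise ValueError("need scores for at least two labels")
    # Sort labels from highest to lowest score.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (top_label, top_score), (_, second_score) = ranked[0], ranked[1]
    margin = top_score - second_score
    return top_label, margin, margin >= min_margin


label, margin, confident = classify_with_margin(
    {"cat": 0.43, "dog": 0.41, "rabbit": 0.16}
)
```

Here `label` is `"cat"`, so a test that only looks at the winning label would pass, but the margin of about 0.02 is below the threshold and `confident` is `False`, flagging the result for closer inspection.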
In a search engine, we want to know whether the top result is the one the user expects first. If that expected result appears at position 2 instead, the outcome is wrong, but it is still better than if it appeared at position 3. We also want to know whether all relevant results are in the top 10 (strictly speaking this is recall; precision is the share of the top 10 that is relevant) and whether the top 10 is free of offensive results.
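The search-engine questions map onto standard information-retrieval metrics. The sketch below uses reciprocal rank (a common metric the chapter does not name, usually averaged over many queries as mean reciprocal rank) to score position 2 above position 3, plus precision and recall at k and an offensive-content check. Whether all relevant results appear in the top 10 corresponds to recall at 10, while the share of the top 10 that is relevant is precision at 10. Document IDs and relevance judgements are assumed to come from the testers; all names here are illustrative.

```python
def reciprocal_rank(results: list[str], expected: str) -> float:
    """1 / position (1-based) of the expected result, or 0.0 if it is absent.
    Position 2 scores 0.5 and position 3 about 0.33, so rank 2 beats rank 3."""
    for position, doc in enumerate(results, start=1):
        if doc == expected:
            return 1.0 / position
    return 0.0


def precision_at_k(results: list[str], relevant: set[str], k: int = 10) -> float:
    """Share of the top k results that are relevant (divides by k by convention)."""
    return sum(1 for doc in results[:k] if doc in relevant) / k


def recall_at_k(results: list[str], relevant: set[str], k: int = 10) -> float:
    """Share of all relevant results that appear in the top k."""
    if not relevant:
        raise ValueError("recall is undefined without relevant results")
    return len(set(results[:k]) & relevant) / len(relevant)


def has_offensive_in_top_k(results: list[str], offensive: set[str], k: int = 10) -> bool:
    """True if any result flagged as offensive appears in the top k."""
    return any(doc in offensive for doc in results[:k])
```

A test would typically compute these per query and compare them, or their averages, against thresholds agreed for the context.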
Depending on the context, we need metrics that process the output of the AI system into an evaluation of its performance. Testers need the skills to determine relevant metrics and incorporate them into the tests.
4.7 Weighing and Contracts
The overall evaluation of the AI system also has to incorporate relative importance. As with any testing, some results are more important than others. Think of results with a high moral impact, such as racial bias. As part of designing test cases, their weight in the overall evaluation should be determined based on risks and on importance to users. Testers need sensitivity to these kinds of risks: they must be able to identify them and translate them into test cases and metrics. They will need an understanding of the context in which the system is used and of the psychology of its users. AI testers need empathy and world awareness.
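Weighting test cases by risk can be expressed as a weighted average of per-case scores. This is a minimal sketch, assuming each test case yields a score normalised to [0, 1] and carries a weight chosen by the testers; the class, the case names and the weights 1.0 and 5.0 are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    name: str
    score: float   # metric outcome normalised to [0, 1]; 1 means fully correct
    weight: float  # relative importance, set from risk and importance to users


def weighted_score(results: list[CaseResult]) -> float:
    """Weighted average of case scores; heavier cases dominate the outcome."""
    total_weight = sum(r.weight for r in results)
    if total_weight <= 0:
        raise ValueError("at least one test case needs a positive weight")
    return sum(r.score * r.weight for r in results) / total_weight


results = [
    CaseResult("cat vs dog accuracy", score=0.95, weight=1.0),
    # A high-moral-impact case gets a much larger (hypothetical) weight.
    CaseResult("no racial bias in face labels", score=0.60, weight=5.0),
]
overall = weighted_score(results)
```

The unweighted mean of the two scores would be 0.775, whereas the weighted score is about 0.66, so the poor result on the high-risk case pulls the overall evaluation down accordingly.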
The Future of Software Quality Assurance
- Title: The Future of Software Quality Assurance
- Author: Stephan Goericke
- Publisher: Springer Nature Switzerland AG
- Place: Cham
- Date: 2020
- Language: English
- License: CC BY 4.0
- ISBN: 978-3-030-29509-7
- Dimensions: 15.5 x 24.1 cm
- Pages: 276
- Category: Computer Science