Page - 133 - in The Future of Software Quality Assurance

Testing Artificial Intelligence

4.5 Test Data

What test data to use, and whether it can be created, found or manipulated, depends on the context and on the availability of data from production. Creating or manipulating data (as in the case of image recognition) is hard to do and sometimes useless or even counter-productive. Using tools to manipulate or create images brings in an extra variable, which might create a bias of its own! How representative of real-world pictures is the test data? If the algorithm identifies aspects of created data that can only be found in the test data, the value of the tests is compromised. AI testers therefore create a test dataset from real-life data and strictly separate it from the training data. Because the AI system is dynamic, and so is the world it is used in, test data will have to be refreshed regularly.

4.6 Metrics

The output of an AI system is not Boolean: it consists of calculated scores for all possible outcomes (labels). To determine the performance of the system, it is not enough to determine which label has the highest score; metrics will be necessary.

Take image recognition, for example: we want to know whether a picture of a cat will be recognised as a cat. In practice, this means that the label "cat" gets a higher score than "dog". If the score for cat is 0.43 and dog gets 0.41, cat wins, but the small difference between the scores may indicate a real chance of misclassification.

In a search engine, we want to know whether the top result is the one the user expected first. If that result appears as number 2 on the list, that sounds wrong, but it is still better than if it were number 3. We want to know whether all relevant results are in the top 10 (this is measured by recall), and whether the top 10 is free of offensive results. Depending on the context, we need metrics that turn the output of the AI system into an evaluation of its performance. Testers need the skills to determine relevant metrics and incorporate them in the tests.

4.7 Weighing and Contracts

The overall evaluation of the AI system also has to take relative importance into account. As in any testing, some results are more important than others. Think of results with a high moral impact, such as racial bias. As part of designing test cases, their weight in the overall evaluation should be determined based on risks and importance to users. Testers need sensitivity to these kinds of risks: they must be able to identify them and translate them into test cases and metrics. They will need an understanding of the context in which the system is used and of the psychology of its users. AI testers need empathy and world awareness.
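
Section 4.5 asks for test data that is strictly separated from training data. One simple check is sketched here. It assumes each example is available as raw bytes (for instance, the contents of an image file). It hashes every example and reports test examples that also occur in the training set. The helper name `find_leaked_examples` is hypothetical. Exact hashing only catches byte-identical duplicates, not near-duplicates such as resized or re-encoded images.

```python
import hashlib
from typing import Iterable


def _digest(example: bytes) -> str:
    """Return a SHA-256 fingerprint of one example's raw bytes."""
    return hashlib.sha256(example).hexdigest()


def find_leaked_examples(train: Iterable[bytes], test: Iterable[bytes]) -> list[int]:
    """Return indices of test examples that also occur, byte for byte, in the training data.

    Hypothetical helper: it detects exact duplicates only.
    """
    train_digests = {_digest(x) for x in train}
    return [i for i, x in enumerate(test) if _digest(x) in train_digests]


# Toy stand-ins for file contents.
train_set = [b"cat-001", b"cat-002", b"dog-001"]
test_set = [b"cat-101", b"dog-001", b"dog-102"]
print(find_leaked_examples(train_set, test_set))  # [1]
```

An empty result is a necessary but not sufficient sign of separation. It says nothing about how representative the test data is of the real world.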
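
The cat-versus-dog example in section 4.6 can be turned into a simple check. Besides asking which label wins, it flags predictions whose margin over the runner-up is small. The function name `top_two_margin` and the threshold of 0.05 are assumptions for illustration, not values from the text. A suitable threshold depends on the model and on the risk involved.

```python
def top_two_margin(scores: dict[str, float]) -> tuple[str, float]:
    """Return the winning label and its score margin over the second-best label."""
    if len(scores) < 2:
        raise ValueError("need scores for at least two labels")
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best_label, best), (_, second) = ranked[0], ranked[1]
    return best_label, best - second


MARGIN_THRESHOLD = 0.05  # assumed value; tune per system and risk

label, margin = top_two_margin({"cat": 0.43, "dog": 0.41, "other": 0.16})
uncertain = margin < MARGIN_THRESHOLD
print(label, round(margin, 2), uncertain)  # cat 0.02 True
```

Whether a flagged prediction counts as a failure or only as a warning is a decision for the tester.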
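
The search-engine example in section 4.6 can be expressed with a few standard ranking measures. Reciprocal rank captures "number 2 is wrong but better than number 3". Recall at k asks whether all relevant results are in the top k. A simple filter checks the top k for offensive results. Precision at k is included for contrast: it asks what share of the top k is relevant, which is a different question. The sketch assumes results are identified by string IDs and that the tester has labelled which results are relevant and which are offensive. All function names are illustrative.

```python
def reciprocal_rank(ranking: list[str], expected: str) -> float:
    """1.0 if the expected result is first, 0.5 if second, 1/3 if third; 0.0 if absent."""
    if expected not in ranking:
        return 0.0
    return 1.0 / (ranking.index(expected) + 1)


def precision_at_k(ranking: list[str], relevant: set[str], k: int) -> float:
    """Share of the top-k positions filled by relevant results (always divides by k)."""
    hits = sum(1 for r in ranking[:k] if r in relevant)
    return hits / k


def recall_at_k(ranking: list[str], relevant: set[str], k: int) -> float:
    """Share of all relevant results that appear somewhere in the top k."""
    if not relevant:
        return 1.0  # assumed convention: nothing relevant means nothing was missed
    return len(relevant & set(ranking[:k])) / len(relevant)


def offensive_in_top_k(ranking: list[str], offensive: set[str], k: int) -> list[str]:
    """List any top-k results that the tester has labelled offensive."""
    return [r for r in ranking[:k] if r in offensive]


ranking = ["b", "a", "c", "x", "d"]  # system output, best first
relevant = {"a", "b", "c", "d"}      # tester's relevance labels
offensive = {"x"}                    # tester's offensive labels

print(reciprocal_rank(ranking, "a"))               # 0.5 (expected result is second)
print(precision_at_k(ranking, relevant, 5))        # 0.8
print(recall_at_k(ranking, relevant, 3))           # 0.75
print(offensive_in_top_k(ranking, offensive, 10))  # ['x']
```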
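
Section 4.7 suggests weighting test cases by risk and importance to users. A minimal sketch assumes each test case carries a tester-assigned weight and a pass/fail outcome. The class, function, case names and weights below are invented for illustration. It computes a weighted pass rate, so failing a high-risk case such as a racial-bias check costs more than failing a minor one.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    name: str
    weight: float  # assigned by the tester from risk and importance to users
    passed: bool


def weighted_pass_rate(results: list[CaseResult]) -> float:
    """Sum of weights of passing cases divided by the total weight."""
    total = sum(r.weight for r in results)
    if total <= 0:
        raise ValueError("total weight must be positive")
    return sum(r.weight for r in results if r.passed) / total


results = [
    CaseResult("cat recognised as cat", weight=1.0, passed=True),
    CaseResult("no offensive results in top 10", weight=5.0, passed=True),
    CaseResult("no racial bias in face matching", weight=10.0, passed=False),
]
print(weighted_pass_rate(results))  # 0.375
```

Unweighted, two of the three cases pass (about 67%). Weighted, the single failing high-risk case pulls the score down to 0.375.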

The Future of Software Quality Assurance

Title: The Future of Software Quality Assurance
Author: Stephan Goericke
Publisher: Springer Nature Switzerland AG
Location: Cham
Date: 2020
Language: English
License: CC BY 4.0
ISBN: 978-3-030-29509-7
Size: 15.5 x 24.1 cm
Pages: 276
Category: Computer Science (Informatik)