Summary: Automating Test Selection and Defect Prevention (arxiv.org)
7,430-word PDF document
One Line
SUPERNOVA is an automated system that reduces testing hours, improves stability, and prevents defects in AAA video games through risk assessment, machine learning, and a SaaS model.
Key Points
- SUPERNOVA is a system that automates test selection and defect prevention in AAA video games using data analysis, machine learning, and deep learning.
- It reduces testing hours by 55% or more and improves stability during the production cycle.
- SUPERNOVA uses risk-based testing to select tests and provides a detailed breakdown of the probability of a change-list being bug inducing.
- The system also incorporates defect prevention by predicting whether code is bug inducing and presenting developers with feature-based insights.
- SUPERNOVA is an end-to-end automated solution that selects tests based on risk assessment and extracts features from code data, software system hierarchy, and developer details.
Summaries
109-word summary
SUPERNOVA is an automated system for test selection and defect prevention in AAA video games. It reduces testing hours by 55% and improves stability. The system selects tests based on risk assessment and provides insights to developers. It is relevant for the games industry and incorporates machine learning to prevent bugs. SUPERNOVA has a SaaS model and offers a user interface for managing data sources and training models. It is used by EA for data science tasks. Defect prevention is achieved through machine learning. Future work involves improving algorithms and adding semantic code properties. SUPERNOVA aims to automate test selection and prevent defects, providing improved forecasting, efficiencies, and reliability.
214-word summary
The authors present SUPERNOVA, an automated system that uses data analysis, machine learning, and deep learning to automate test selection and defect prevention in AAA video games. The system reduces testing hours by 55% and improves stability during production cycles. It selects tests based on risk assessment and provides insights to developers. SUPERNOVA is particularly relevant for the games industry due to its tight release cycles and high uncertainty. It incorporates machine learning and code analysis to prevent bugs from entering production code. The system has a software as a service (SaaS) model and offers a user interface for managing data sources, designing rules, training machine learning models, and monitoring performance. EA uses SUPERNOVA for data science tasks in QA and game testing. The system allows for efficient data collection, configuration, and model construction. It can be adapted to changing game development cycles and offers different model options. Defect prevention is achieved through machine learning by predicting risky code commits. The adoption of SUPERNOVA led to a decrease in required testing hours and a higher mean daily fix rate. Future work involves improving the system's algorithms and supplementing the existing machine learning algorithm with semantic code properties. SUPERNOVA aims to automate test selection and prevent defects, providing improved forecasting, increased efficiencies, and better reliability.
498-word summary
Automating test selection and defect prevention in AAA video games is a challenge due to the complexity and size of software systems. To address this, the authors present SUPERNOVA, a system that uses data analysis, machine learning, and deep learning to automate test selection and defect prevention. SUPERNOVA reduces testing hours by 55% or more and improves stability during the production cycle. It uses risk-based testing to select tests and provides a detailed breakdown of the probability of a change-list introducing bugs. The system also predicts whether code is bug-inducing and provides feature-based insights to developers. SUPERNOVA is an end-to-end automated solution that selects tests based on risk assessment and extracts features from code data, software system hierarchy, and developer details. It is particularly relevant for the games industry, which has tight release cycles and high uncertainty. The system incorporates machine learning and code analysis to prevent bugs from entering production code. SUPERNOVA is an end-to-end automation tool with a software as a service (SaaS) model. It provides a user interface for managing data sources, designing rules, creating mathematical functions, training machine learning models, and monitoring performance. The system allows for efficient data collection, configuration, and model construction. EA uses SUPERNOVA for data science tasks in QA and game testing. The system uses probability, impact, and time criteria to assess the risk of failure and make automated test selections. It also incorporates deep learning and allows users to create and train neural networks. Currently, SUPERNOVA is used for automating test selection and defect prevention using risk-based testing and machine learning. It offers flexibility and efficiency in model training and can be adapted to changing game development cycles. The system allows for training models directly on semantic code data without the need for complex features. 
It offers different model options to cater to different skill levels. Defect prevention is achieved through machine learning by predicting risky code commits using a semi-supervised learning approach.
Automating test selection and defect prevention can significantly improve testing processes and reduce testing hours. The SZZ algorithm, while effective, has limitations: it falsely categorizes some bug reports and produces false positives. Various factors were considered in the testing metrics to predict the likelihood of a commit causing an error. The adoption of SUPERNOVA led to a decrease in required testing hours and a higher mean daily fix rate. Deep learning models are proposed as a future approach to automated test selection. The defect prevention model achieved good performance, with high precision, recall, and F1 score. Future work involves improving the SZZ algorithm and supplementing the existing machine learning algorithm with semantic code properties. The document references research papers and studies on test selection, defect prevention, software evolution, code metrics, and bug localization. SUPERNOVA aims to automate test selection and prevent defects, providing benefits such as improved forecasting, increased efficiencies, and better reliability. The document also references studies on game engine architecture and web application testing with tools such as Katalon Studio, Selenium, and TestComplete.
1,438-word summary
Automating test selection and defect prevention in AAA video games is challenging due to the increasing complexity and size of software systems. Traditional manual testing methods are labor-intensive and cost-prohibitive, while script-based automation is ineffective in non-deterministic environments. To address these issues, the authors present SUPERNOVA, a system that automates test selection and defect prevention using data analysis, machine learning, and deep learning. SUPERNOVA reduces testing hours by 55% or more and improves stability during the production cycle. It uses risk-based testing to select tests and provides a detailed breakdown of the probability of a change-list being bug inducing. The system also incorporates defect prevention by predicting whether code is bug inducing and presenting developers with feature-based insights to make informed decisions. The authors propose an end-to-end automated solution that selects tests based on risk assessment and extracts features from code data, software system hierarchy, and developer details. This approach is particularly relevant for the games industry, which faces tight release cycles and a high level of uncertainty.
Solutions that work for other tech companies may not be applicable to the games industry. However, software tools such as Facebook's Infer and Ubisoft's Clever-Commit are used as preventative measures for testing; these tools incorporate machine learning and code analysis to detect errors and prevent bugs from entering production code. The concept of RBT (risk-based testing) has also been previously researched and served as inspiration for the test selection approach used in SUPERNOVA. While there are automated software testing systems available for consumer use, they may not be effective for proprietary software environments. SUPERNOVA is an end-to-end automation tool for data science-based testing in AAA games, designed with a software as a service (SaaS) model. In complex systems like game engines, manual test cases are necessary. None of the existing test-automation products can select predefined tests based on a mathematical model's output. SUPERNOVA provides a user interface for managing data sources, designing rules, creating mathematical functions, training machine learning models, and monitoring performance. The workflow for SUPERNOVA is outlined in Figure 2.
SUPERNOVA provides an interface for retrieving data from various sources such as Jira, TestRail, Git, and more. It streamlines the process of data collection by providing building blocks for efficient retrieval. After data collection, users can configure the data by linking fields and constructing nodes. This allows for efficient filtering, grouping, and searching. Once the desired data fields have been selected, they can be modified using SUPERNOVA's visual programming interface to create metrics that describe useful behaviors. Mathematical formulas and machine learning methods can be used to construct models. The mathematical formulas use risk exposure calculations based on probability, impact, and time factors. The machine learning method uses JSON schema and allows for the selection of algorithms supported by scikit-learn, with hyperparameter tuning. The models can be trained to reproduce certain outcomes on unseen data. Overall, SUPERNOVA provides flexibility and efficiency in data collection, configuration, and model construction for a wide variety of tasks.
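The summary does not reproduce SUPERNOVA's actual risk-exposure formula or its hand-tuned weights, so the sketch below is an assumption: one plausible way to combine the probability, impact, and time factors mentioned above into a single score and use it to select tests under an hour budget. The weights, the time-decay shape, and the field names are all illustrative.

```python
# Hypothetical risk-exposure score in the spirit of SUPERNOVA's
# probability / impact / time criteria. The exact formula and the
# hand-tuned weights are not given in the paper summary, so the
# weighting scheme here is an assumption for illustration only.

def risk_exposure(probability, impact, days_since_last_test,
                  w_p=0.5, w_i=0.3, w_t=0.2):
    """Combine failure probability, failure impact, and test staleness
    into a single risk score in [0, 1]."""
    # Map elapsed time onto [0, 1): the longer an area goes untested,
    # the closer its time factor gets to 1.
    time_factor = days_since_last_test / (days_since_last_test + 7)
    return w_p * probability + w_i * impact + w_t * time_factor

def select_tests(candidates, budget):
    """Greedily pick the highest-risk test areas that fit the hour budget."""
    ranked = sorted(candidates, key=lambda c: -risk_exposure(
        c["probability"], c["impact"], c["days_since_last_test"]))
    selected, hours = [], 0.0
    for c in ranked:
        if hours + c["hours"] <= budget:
            selected.append(c["name"])
            hours += c["hours"]
    return selected
```

A greedy budget-constrained selection is only one possible output action; the paper describes the selection being driven by the model's risk output without specifying the allocation strategy.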
EA uses an automated system called SUPERNOVA to perform data science tasks for QA and game testing. The system uses probability, impact, and time criteria to assess the risk of failure and make automated test selections. The weights for these criteria are hand-tuned by data scientists. The system also incorporates deep learning, allowing users to create and train neural networks through a drag-and-drop interface. Model training can be monitored using TensorBoard, and pre-processing techniques can be applied to the data. The system is adaptable to changing game development cycles. Currently, the system is used for automating test selection and defect prevention: risk-based testing (RBT) is used for test case selection, while machine learning is used for defect prevention. RBT enables data-driven decisions in testing, while machine learning enables feature extraction from feature matrices. Deep learning initiatives are still in development.
The SUPERNOVA system allows for training models directly on semantic code data without the need for complex features. It offers users the option to use probabilistic, machine learning, or deep learning models, catering to different skill levels. Internal studies and experimentation are conducted to identify metrics that describe risky test areas, which are then configured through a visual programming interface. These metrics are fed into an RBT model, which updates test case selection and the historical database of risk outputs. Pipelines connect all components together, indicating inputs, models used, and output actions. Defect prevention in game development can be achieved through machine learning by predicting risky code commits. Semi-supervised learning is used with the SZZ algorithm as a heuristic to label data. A binary classifier is built based on prior bug inducing commits. The risk score prediction is based on a semi-supervised machine learning approach using a tree-based gradient boosting machine. The SZZ algorithm identifies bug-inducing code commits based on historical data from bug tracking systems and version control systems.
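The defect-prevention pipeline described above, a tree-based gradient boosting machine trained on SZZ-labeled commits, can be sketched roughly as follows. The feature names, the synthetic training data, and the risk-acceptance threshold are all assumptions for illustration; the paper's actual feature set (code, commit-action, and developer properties) and tuning are not reproduced here.

```python
# Illustrative sketch of the semi-supervised defect-prevention model:
# a tree-based gradient boosting classifier trained on commit features,
# with labels that would in practice come from the SZZ heuristic.
# Feature names, data, and threshold are assumptions, not the paper's.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for code / commit-action / developer properties.
n = 400
X = np.column_stack([
    rng.integers(1, 500, n),   # lines changed in the commit
    rng.integers(1, 20, n),    # files touched
    rng.random(n),             # author's historical bug rate
])
# Simulated SZZ-style labels: 1 = bug inducing.
y = ((X[:, 0] > 250) & (X[:, 2] > 0.5)).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

RISK_ACCEPTANCE = 0.5  # assumed risk acceptance level

def review_commit(features):
    """Return the predicted risk score and whether to alert the developer."""
    risk = model.predict_proba([features])[0, 1]
    return risk, risk > RISK_ACCEPTANCE
```

Presenting the per-feature contribution behind such a score (e.g. via tree feature importances) is what lets the system give developers the feature-based insights the summary mentions.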
In a study on automating test selection and defect prevention, it was found that for every correctly labeled defective file, there is one incorrectly labeled defective file and two missed defective files. The base SZZ algorithm had a recall of 69%, a precision of 42%, and an F1 score of 53%, while the best-performing variant, R-SZZ, had a precision of 57%, a recall of 73%, and an F1 score of 64%. However, SZZ has shortcomings, such as falsely categorizing bug reports and producing false positives: approximately 64% of the changes made in bug-fixing commits are unrelated to bug fixing, resulting in false positives, and approximately 33% of bug issue reports are incorrectly categorized. The testing metrics considered factors such as the probability of open unaddressed defects, the probability of addressed change requests, the defect-to-change ratio probability, and the script failure rate probability. The study also used code properties, commit action properties, and developer properties to predict the likelihood of a commit causing an error. The model alerts developers if the prediction exceeds a risk acceptance level. Overall, the study has delivered improvements to EA's testing processes.
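The core of the SZZ heuristic discussed above can be shown in a minimal sketch: lines removed by a bug-fixing commit are traced back, via blame data, to the commits that last wrote them, and those commits are labeled bug inducing. The data structures here are simplified stand-ins for real version-control history, not an interface from the paper.

```python
# Minimal illustration of the SZZ labeling heuristic. In practice the
# blame data comes from a version control system (e.g. git blame) and
# bug-fixing commits are identified from the bug tracker; both are
# simplified to plain dictionaries here.

def szz_label(bug_fix_commits, blame):
    """blame maps (file, line_no) -> id of the commit that last wrote
    that line. Returns the set of commit ids labeled bug inducing."""
    inducing = set()
    for fix in bug_fix_commits:
        # Every line a fix deletes or rewrites is presumed buggy, so
        # the commit that introduced it gets the bug-inducing label.
        for path, line_no in fix["deleted_lines"]:
            culprit = blame.get((path, line_no))
            if culprit is not None:
                inducing.add(culprit)
    return inducing
```

The false-positive figures quoted above follow directly from this design: any fix that also deletes unrelated lines (refactoring, formatting) blames innocent commits, which is what variants such as R-SZZ try to mitigate.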
Automating test selection with SUPERNOVA led to a significant decrease in required testing hours, with automation increasing from 29% to 58% over the course of a year. Total testing hours decreased by 55%, resulting in approximately 6000 fewer hours spent testing per year. The adoption of SUPERNOVA in sports game 2020 (SG20) led to a higher mean daily fix rate of 67%, compared to only 25% for sports game 2019 (SG19). The features used in the defect prevention model included file properties, code properties, commit action properties, and developer properties. There was no observable correlation between an individual game tester's experience and overall savings in testing hours. Test planning efforts dropped by 97.5% in SG20 compared to SG19. Deep learning models were proposed as a future approach for automated test selection. Despite finding more bugs, SG20 had a higher bug fix rate and reduced variance compared to SG19. The defect prevention model achieved a macro average performance with 71% precision, 77% recall, and an F1 score of 74%. Future work involves improving the SZZ algorithm and supplementing the existing machine learning algorithm with semantic code properties. Automating test selection with SUPERNOVA resulted in significant gains in staff hours, cost, efficiency, and consistency.
This text excerpt is a list of references to various research papers and studies related to test selection, defect prevention, software evolution, code metrics, just-in-time fault prevention, and bug localization. The references include papers from conferences such as the International Symposium on Software Testing and Analysis (ISSTA), Foundations of Software Engineering (FSE), International Conference on Software Engineering (ICSE), and Mining Software Repositories (MSR). The papers cover topics such as dynamic file dependencies, static regression test selection, hybrid regression test selection, scaling static analyses, clone detection for fault prevention, context metrics on defect prediction, cross-project learning in defect prediction, dormant bugs, risk estimation in risk-based testing approaches, integration of manual and automatic risk assessment, version control systems, and bug localization.
The document discusses the challenges and costs associated with game and software testing, particularly in the context of Electronic Arts (EA). The authors propose a solution called SUPERNOVA, which aims to automate test selection and prevent defects from entering the system. SUPERNOVA is designed to work with complex test cases and fit into existing pipelines. It allows QA testers to reduce the time spent on planning testing programs and allocate resources to more productive tasks. Additionally, it improves forecasting, increases efficiencies, and offers better reliability compared to traditional methods. The document references several studies and papers related to game engine architecture, optimized test case selection, and web application testing using tools like Katalon Studio, Selenium, and TestComplete.