Bibliographic record number: 882497

Critical Level of Data Imbalance For Machine Learning Algorithms In Software Defect Prediction


Mauša, Goran; Dalbelo Bašić, Bojana; Galinac Grbac, Tihana
Critical Level of Data Imbalance For Machine Learning Algorithms In Software Defect Prediction // Proceedings of IWDS 2016 / Lončarić, Sven ; Šmuc, Tomislav (eds.).
Zagreb: Centre of Research Excellence for Data Science and Cooperative Systems, 2016. p. 36 (poster, international peer review, scientific)


Title
Critical Level of Data Imbalance For Machine Learning Algorithms In Software Defect Prediction

Authors
Mauša, Goran ; Dalbelo Bašić, Bojana ; Galinac Grbac, Tihana

Type, subtype and category of work
Conference abstracts, abstract, scientific

Source
Proceedings of IWDS 2016 / Lončarić, Sven ; Šmuc, Tomislav - Zagreb : Centre of Research Excellence for Data Science and Cooperative Systems, 2016, 36-36

Conference
First International Workshop on Data Science

Place and date
Zagreb, Croatia, 30 November 2016

Keywords
Critical level ; machine learning ; software defect prediction

Abstract
The increasing complexity of software systems is extending verification activities and increasing development cost. Software defect prediction aims to improve the allocation of verification resources. A major problem in this field is that software defects are unequally distributed within a software system: the majority of defects are located in a small part of the system. This is known as the data imbalance problem, and it is inherent to this domain. High levels of data imbalance are known to degrade the performance of machine learning algorithms. This paper proposes a method for establishing the critical level of data imbalance for machine learning algorithms. The proposed method is based on the Arrow-Pratt metric, and it enables us to determine the level of data imbalance above which certain machine learning algorithms become incapable of performing their defect prediction task. The method gives practitioners guidelines for choosing the most appropriate machine learning method for the level of imbalance they face in practice, and helps improve verification and development strategies for complex software systems.
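
The abstract does not say how the Arrow-Pratt metric is applied, so the following is only an illustrative sketch and not the authors' method. It shows the textbook Arrow-Pratt coefficient, -u''(x)/u'(x), estimated by central differences, together with a simple measure of imbalance (the share of the minority class). The function names `arrow_pratt` and `minority_share`, the step size `h`, and the example curve are assumptions introduced here for illustration.

```python
import math


def minority_share(labels):
    """Share of the minority class in a list of binary labels (hypothetical helper).

    A value near 0.5 means balanced data; a value near 0 means strong imbalance.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    return min(positives, negatives) / len(labels)


def arrow_pratt(u, x, h=1e-3):
    """Textbook Arrow-Pratt coefficient -u''(x) / u'(x), via central differences.

    `u` is any smooth curve; how the paper defines its curve is not stated
    in the abstract, so the choice of `u` is left to the caller.
    """
    first = (u(x + h) - u(x - h)) / (2 * h)            # approximates u'(x)
    second = (u(x + h) - 2 * u(x) + u(x - h)) / (h * h)  # approximates u''(x)
    return -second / first


if __name__ == "__main__":
    # Example: 2 defective modules out of 20.
    print(minority_share([1, 1] + [0] * 18))
    # For u(x) = ln(x) the coefficient is analytically 1/x.
    print(arrow_pratt(math.log, 2.0))
```

For u(x) = ln(x) the coefficient equals 1/x analytically, which gives a convenient check on the numerical estimate.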

Type of participation
Poster

Type of review
International peer review

Original language
English

Scientific fields
Computing