Author: Çatal, Çağatay
Date accessioned: 2017-10-16
Date available: 2017-10-16
Date issued: 2012
ISSN: 1785-8860
URI: http://hdl.handle.net/11413/1662
Abstract: Experimental studies have confirmed that only a small portion of software modules cause faults in software systems. Consequently, the majority of software modules are labeled non-faulty and only the rest are labeled faulty during the modeling phase. Datasets of this kind are called imbalanced, and different performance metrics exist to evaluate the performance of proposed fault prediction techniques. In this study, we investigate 85 fault prediction papers with respect to their performance evaluation metrics and categorize these metrics into two main groups. Evaluation methods such as cross-validation and stratified sampling are outside the scope of this paper; only evaluation metrics are examined. This study shows that researchers have so far used differing evaluation parameters for software fault prediction, and that more studies on performance evaluation metrics for imbalanced datasets should be conducted.
Language: en-US
Keywords: Performance Evaluation; Software Fault Prediction; Machine Learning; Oriented Design Metrics; Models
Title: Performance Evaluation Metrics for Software Fault Prediction Studies
Type: Article
WOS ID: 309696900013
Scopus ID: 2-s2.0-84866117799
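The evaluation metrics the abstract refers to are commonly derived from the confusion matrix. The sketch below (an illustrative assumption, not taken from the paper; the dataset and classifier output are hypothetical) computes accuracy, probability of detection (recall), probability of false alarm, and F-measure for an imbalanced test set, showing why overall accuracy alone is misleading when faulty modules are rare.

# Minimal sketch, assuming hypothetical data: confusion-matrix metrics
# for an imbalanced software fault prediction test set.

def confusion(actual, predicted):
    """Count outcomes; 1 = faulty module, 0 = non-faulty module."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return tp, fn, fp, tn

# Hypothetical imbalanced data: 100 modules, only 10 of them faulty.
actual    = [1] * 10 + [0] * 90
predicted = [1] * 2 + [0] * 8 + [0] * 90  # classifier detects only 2 faults

tp, fn, fp, tn = confusion(actual, predicted)

accuracy  = (tp + tn) / (tp + tn + fp + fn)   # dominated by the majority class
pd        = tp / (tp + fn)                    # probability of detection (recall)
pf        = fp / (fp + tn)                    # probability of false alarm
precision = tp / (tp + fp) if (tp + fp) else 0.0
f_measure = (2 * precision * pd / (precision + pd)) if (precision + pd) else 0.0

print(f"accuracy={accuracy:.2f} PD={pd:.2f} PF={pf:.2f} F={f_measure:.2f}")
# accuracy=0.92 yet PD=0.20: the model misses 80% of faulty modules,
# which is exactly the imbalance problem the abstract highlights.

The numbers illustrate the core point: on a dataset where 90% of modules are non-faulty, a classifier can reach 92% accuracy while detecting only 20% of the faults, which is why metrics such as PD, PF, and F-measure are preferred for imbalanced fault prediction datasets.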