Incorrect results in software engineering experiments: How to improve research practices

Abstract

Context

The trustworthiness of research results is a growing concern in many empirical disciplines.

Aim

The goals of this paper are to assess how much the trustworthiness of results reported in software engineering experiments is affected by researcher and publication bias, given typical statistical power and significance levels, and to suggest improved research practices.

Method

First, we conducted a small-scale survey to document the presence of researcher and publication biases in software engineering experiments. Then, we built a model that estimates the proportion of correct results for different levels of researcher and publication bias. A review of 150 randomly selected software engineering experiments published in the period 2002–2013 was conducted to provide input to the model.
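As a rough illustration of the kind of calculation such a model involves, the sketch below uses the standard positive-predictive-value formulation with a single bias term. It is not the authors' model. The function names, the `bias` parameter, and all numeric inputs are hypothetical and chosen only for illustration; none are taken from the paper.

```python
def expected_significant(prior_true, power, alpha):
    """Expected share of tests that are statistically significant
    when there is no researcher or publication bias."""
    return power * prior_true + alpha * (1.0 - prior_true)


def proportion_correct_significant(prior_true, power, alpha, bias=0.0):
    """Share of reported significant results that reflect a true effect.

    prior_true: assumed proportion of tested hypotheses with a true effect.
    power:      assumed statistical power of the tests.
    alpha:      significance level.
    bias:       hypothetical fraction of otherwise non-significant analyses
                that end up reported as significant (0.0 means no bias).
    """
    true_pos = prior_true * (power + bias * (1.0 - power))
    false_pos = (1.0 - prior_true) * (alpha + bias * (1.0 - alpha))
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Illustrative inputs only; these are assumptions, not values from the paper.
    for bias in (0.0, 0.2, 0.4):
        share = proportion_correct_significant(0.5, 0.8, 0.05, bias)
        print(f"bias={bias:.1f}: proportion correct = {share:.3f}")
    print("expected significant share, no bias:",
          expected_significant(0.5, 0.8, 0.05))
```

In this simplified form, lowering `power` or raising `bias` reduces the proportion of significant results that are correct, which is the qualitative relationship the abstract describes.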

Results

The survey indicates that researcher and publication biases are quite common. This finding is supported by the observation that the proportion of statistically significant results reported in the reviewed papers was about twice as high as would be expected in the absence of researcher and publication bias. Our models suggest a high proportion of incorrect results even under quite conservative assumptions.

Conclusion

Research practices must improve to increase the trustworthiness of software engineering experiments. A key to this improvement is to avoid conducting studies with unsatisfactorily low statistical power.

Category

Academic article

Client

  • Research Council of Norway (RCN) / 231679/F20
  • Research Council of Norway (RCN) / 231679

Language

English

Author(s)

  • Magne Jørgensen
  • Tore Dybå
  • Knut Liestøl
  • Dag Sjøberg

Affiliation

  • University of Oslo
  • Simula Research Laboratory
  • SINTEF Digital / Software Engineering, Safety and Security

Year

2016

Published in

Journal of Systems and Software

ISSN

0164-1212

Publisher

Elsevier

Volume

116

Page(s)

133–145
