Preventing Duplicate Bug Reports by Continuously Querying Bug Reports

Abram Hindle, Curtis Onuczko

2018/07/20

Authors

Abram Hindle, Curtis Onuczko

Venue

Empirical Software Engineering

Abstract

Bug deduplication, or duplicate bug report detection, is a hot topic in software engineering information retrieval research, but it is often not deployed. Typically, to de-duplicate bug reports, developers rely upon the search capabilities of the bug report software they employ, such as Bugzilla, Jira, or GitHub Issues. These search capabilities range from simple SQL string search to IR-based word indexing methods employed by search engines. Yet too often these searches do very little to stop the creation of duplicate bug reports. Some bug trackers have more than 10% of their bug reports marked as duplicate. Perhaps these bug tracker search engines are not enough? In this paper we propose a method of attempting to prevent duplicate bug reports before they start: continuously querying. That is, as the bug reporter types in their bug report, their text is used to query the bug database to find duplicate or related bug reports. Continuously querying bug reports allows the reporter to be alerted to duplicate bug reports as they report the bug, rather than having to formulate queries to find the duplicate bug report. Thus this work ushers in a new way of evaluating bug report deduplication techniques, as well as a new kind of bug deduplication task. We show that simple IR measures can address this problem, but also that further research is needed to refine this novel process, which can be integrated into modern bug report systems.
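
The continuous-querying idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes TF-IDF weighting with cosine similarity as one example of a "simple IR measure", keeps the index in memory, and simulates typing by re-querying each time a few more words are added. The names `BugIndex`, `continuous_query`, and the sample bug ids are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BugIndex:
    """Tiny in-memory TF-IDF index over existing bug reports (illustrative only)."""

    def __init__(self, reports):
        # reports: dict mapping a bug id to the report text
        counts = {bug_id: Counter(tokenize(text)) for bug_id, text in reports.items()}
        doc_freq = Counter()
        for term_counts in counts.values():
            doc_freq.update(term_counts.keys())
        n = len(reports)
        # Smoothed inverse document frequency for every indexed term
        self.idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
        self.vectors = {bug_id: self._weight(c) for bug_id, c in counts.items()}

    def _weight(self, term_counts):
        # Build a unit-length TF-IDF vector; terms unknown to the index are dropped
        vec = {t: c * self.idf[t] for t, c in term_counts.items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def query(self, text, top_k=3):
        """Return up to top_k bug ids ranked by cosine similarity to the text."""
        q = self._weight(Counter(tokenize(text)))
        scored = []
        for bug_id, vec in self.vectors.items():
            score = sum(w * vec.get(t, 0.0) for t, w in q.items())
            if score > 0:
                scored.append((score, bug_id))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [bug_id for _, bug_id in scored[:top_k]]


def continuous_query(index, draft, step=2, top_k=3):
    """Simulate typing: re-query the index every `step` words of the draft."""
    words = draft.split()
    for end in range(step, len(words) + step, step):
        prefix = " ".join(words[:end])
        yield min(end, len(words)), index.query(prefix, top_k)


if __name__ == "__main__":
    # Hypothetical existing reports; a real deployment would query the tracker's database
    existing = {
        "BUG-1": "Application crashes when opening a large PDF file",
        "BUG-2": "Dark theme colours are wrong in settings dialog",
        "BUG-3": "Crash on startup after update",
    }
    index = BugIndex(existing)
    for words_typed, candidates in continuous_query(index, "crashes when I open a large pdf"):
        print(words_typed, candidates)
```

In a real bug tracker the query would presumably be triggered by keystrokes or pauses in the report form and answered by the tracker's own search backend; the word-prefix loop above only stands in for that interaction.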

Bibtex

@article{hindle2018EMSE-Continuously-Querying,
 abstract = {Bug deduplication, or duplicate bug report detection, is a hot topic in software engineering information retrieval research, but it is often not deployed. Typically, to de-duplicate bug reports, developers rely upon the search capabilities of the bug report software they employ, such as Bugzilla, Jira, or GitHub Issues. These search capabilities range from simple SQL string search to IR-based word indexing methods employed by search engines. Yet too often these searches do very little to stop the creation of duplicate bug reports. Some bug trackers have more than 10\% of their bug reports marked as duplicate. Perhaps these bug tracker search engines are not enough? In this paper we propose a method of attempting to prevent duplicate bug reports before they start: continuously querying. That is, as the bug reporter types in their bug report, their text is used to query the bug database to find duplicate or related bug reports. Continuously querying bug reports allows the reporter to be alerted to duplicate bug reports as they report the bug, rather than having to formulate queries to find the duplicate bug report. Thus this work ushers in a new way of evaluating bug report deduplication techniques, as well as a new kind of bug deduplication task. We show that simple IR measures can address this problem, but also that further research is needed to refine this novel process, which can be integrated into modern bug report systems.},
 accepted = {2018-07-20},
 author = {Abram Hindle and Curtis Onuczko},
 authors = {Abram Hindle, Curtis Onuczko},
 code = {hindle2018EMSE-Continuously-Querying},
 day = {20},
 funding = {MITACS Accelerate and NSERC Discovery},
 journal = {Empirical Software Engineering},
 journalid = {EMSE-D-17-00233R2},
 month = {July},
 pagerange = {1--38},
 pages = {1--38},
 published = {2018-07-20},
 role = {Researcher / Co-author},
 title = {Preventing Duplicate Bug Reports by Continuously Querying Bug Reports},
 type = {article},
 url = {http://softwareprocess.ca/pubs/hindle2018EMSE-Continuously-Querying.pdf},
 venue = {Empirical Software Engineering},
 year = {2018}
}