Background: Researchers in software engineering have increasingly added gray literature (GL) to primary and, especially, secondary studies, for several reasons, such as capturing practitioners' views on the topic under study. However, the use of GL in research poses several challenges, such as the volume and unstructured nature of the data. The lack of automated tools and approaches to aid this task creates a bottleneck in selecting documents for inclusion. Aims: We investigate how summaries generated by PositionRank, an unsupervised text summarization approach, could support the inclusion analysis of documents in a GL study. Method: We evaluated the use of PositionRank to summarize documents analyzed in an ongoing study on software engineering. We compared the ratings of two raters in a cross-over setup, using summaries and full-text documents. We calculated their agreement, as well as the precision and miss rate of ratings based on summaries against those based on the full text. The raters also discussed the documents on which they gave conflicting answers and derived categories of reasons that explain the disagreements. Results: The results indicate that some inclusion criteria, which may be satisfied by only a few sentences of a document, are susceptible to misclassification when summaries are used. Conclusions: Our study presents an analysis of the use of automatic summarization to support the inclusion assessment in gray literature studies, discussing when this solution is viable. Our results could guide further studies in this direction.
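The measures named in the Method can be illustrated with a minimal sketch. It treats the full-text rating as the reference and the summary-based rating as the prediction. The abstract does not say which agreement statistic was used, so Cohen's kappa is an assumption here; the function names and the six example ratings are hypothetical and are not data from the study.

```python
def confusion_counts(summary_labels, fulltext_labels):
    """Count outcomes, treating the full-text rating as the reference."""
    tp = fp = fn = tn = 0
    for s, f in zip(summary_labels, fulltext_labels):
        if s and f:
            tp += 1      # included with summary and with full text
        elif s and not f:
            fp += 1      # included with summary only
        elif not s and f:
            fn += 1      # missed when rating from the summary
        else:
            tn += 1      # excluded in both cases
    return tp, fp, fn, tn


def precision(tp, fp):
    """Share of summary-based inclusions confirmed by the full text."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def miss_rate(tp, fn):
    """Share of full-text inclusions that the summary-based rating missed."""
    return fn / (fn + tp) if (fn + tp) else 0.0


def cohen_kappa(a, b):
    """Cohen's kappa for two binary ratings (assumed agreement statistic)."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a, p_b = sum(a) / n, sum(b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (observed - expected) / (1 - expected) if expected != 1 else 1.0


# Made-up ratings for six documents (True = include).
summary_ratings = [True, True, False, False, True, False]
fulltext_ratings = [True, False, True, False, True, False]

tp, fp, fn, tn = confusion_counts(summary_ratings, fulltext_ratings)
print("precision:", round(precision(tp, fp), 3))
print("miss rate:", round(miss_rate(tp, fn), 3))
print("kappa:", round(cohen_kappa(summary_ratings, fulltext_ratings), 3))
```

In a screening context the miss rate is the measure to watch, since a document wrongly excluded on the basis of its summary is never read in full.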
- Generated abstracts: Evaluating automatic text summarization for blog posts in gray literature studies
- J Melegati - Free University of Bozen-Bolzano; E Guerra - Free University of Bozen-Bolzano; Igor Wiese - Universidade Tecnológica Federal do Paraná; Xiaofeng Wang - Free University of Bozen-Bolzano
- ACM International Conference Proceeding Series, pp.282-287
- International Conference on Evaluation and Assessment in Software Engineering (Gothenburg, 13/06/2022–15/06/2022)
- Association for Computing Machinery
- (UNIBZ)66641886
- 991006491097301241 - 2-s2.0-85132389007
- Faculty of Computer Science
- English
- Conference proceeding
- Melegati J, Guerra E, Wiese IS, Wang X