Investigating the Vulnerability Fixing Process in OSS Projects: Peculiarities and Challenges by Gerardo Canfora, Andrea Di Sorbo, Sara Forootani, Antonio Pirozzi, Corrado Aaron Visaggio

posted Sep 27, 2020, 11:16 by Gerardo Canfora

Although vulnerabilities can be considered and treated as bugs, they present numerous peculiarities compared to other types of bugs (canonical bugs in the remainder of the paper). A vulnerability adds functionality to a system, as it allows an adversary to misuse or abuse the system, while a canonical bug is an incomplete or incorrect implementation of a requirement, and thus degrades the functionality of the system. This difference can affect the fixing process of vulnerabilities. By mining the repositories of 6 open source projects, we characterize the differences in the fixing process between vulnerabilities and canonical bugs, highlighting critical issues which could represent challenges for future research. Results of our study demonstrate that: (i) more re-assignments (than the ones observed in canonical bugs) are required for finding the developers able to handle vulnerability-related bugs, (ii) developers' security-related skills should be profiled, to improve the efficiency of the security bug assignment tasks and, consequently, reduce the re-assignments, and (iii) vulnerabilities require more effort, contributors and time for the fixing strategy but less time to fix than canonical bugs.
Keywords: Security Bugs, Process Improvement, Software Maintenance and Evolution, Bug Management, Empirical Study
Computers & Security (to appear)

Demystifying the Adoption of Behavior-Driven Development in Open Source Projects by Fiorella Zampetti, Andrea Di Sorbo, Corrado Aaron Visaggio, Gerardo Canfora, Massimiliano Di Penta

posted Apr 22, 2020, 13:00 by Gerardo Canfora

Context: Behavior-Driven Development (BDD) features the capability, through appropriate domain-specific languages, of specifying acceptance test cases and making them executable. The availability of frameworks such as Cucumber or RSpec makes the application of BDD possible in practice. However, it is unclear to what extent developers use such frameworks, and whether they use them for actually performing BDD, or, instead, for other purposes such as unit testing.
Objective: In this paper, we conduct an empirical investigation about the use of BDD tools in open source, and how, when a BDD tool is in place, BDD specifications co-evolve with source code.
Method: Our investigation includes three different phases: (i) a large-scale analysis to understand the extent to which BDD frameworks are used in 50,000 popular open-source projects written in five programming languages; (ii) a study on the co-evolution of scenarios, fixtures and production code in a sample of 20 Ruby projects, through the Granger causality test; and (iii) a survey with 31 developers to understand how they use BDD frameworks.
Results: Results of the study indicate that ≃ 27% of the sampled projects use BDD frameworks, with a prevalence in Ruby projects (68%). In about 37% of the cases, we found a co-evolution between scenarios/fixtures and production code. Specifically, changes to scenarios and fixtures often happen together or after changes to source code. Moreover, survey respondents indicate that, while they understand the intended purpose of BDD frameworks, most of them write tests while/after coding rather than strictly applying BDD.
Conclusions: Even if the usage of BDD frameworks is widespread among open source projects, in many cases they are used for different purposes such as unit testing activities. This mainly happens because developers feel that BDD remains quite effort-prone, and that its application goes beyond the simple adoption of a BDD framework.
Information and Software Technology (to appear)

An Empirical Characterization of Bad Practices in Continuous Integration by Fiorella Zampetti, Carmine Vassallo, Sebastiano Panichella, Gerardo Canfora, Harald Gall, Massimiliano Di Penta

posted Nov 1, 2019, 07:24 by Gerardo Canfora

Continuous Integration (CI) has been claimed to introduce several benefits in software development, including high software quality and reliability. However, recent work pointed out challenges, barriers and bad practices characterizing its adoption. This paper empirically investigates the bad practices experienced by developers applying CI. The investigation has been conducted by leveraging semi-structured interviews of 13 experts and mining more than 2,300 Stack Overflow posts. As a result, we compiled a catalog of 79 CI bad smells belonging to 7 categories related to different dimensions of a CI pipeline management and process. We have also investigated the perceived importance of the identified bad smells through a survey involving 26 professional developers, and discussed how the results of our study relate to existing knowledge about CI bad practices. Whilst some results, such as the poor usage of branches, confirm existing literature, the study also highlights uncovered bad practices, e.g., related to static analysis tools or the abuse of shell scripts, and contradicts knowledge from existing literature, e.g., about avoiding nightly builds. We discuss the implications of our catalog of CI bad smells for (i) practitioners, e.g., favor specific, portable tools over hacking, and do not ignore nor hide build failures, (ii) educators, e.g., teach CI culture, not just technology, and teach CI by providing examples of what not to do, and (iii) researchers, e.g., developing support for failure analysis, as well as automated CI bad smell detectors.
Empirical Software Engineering (to appear)

Exploiting Natural Language Structures in Software Informal Documentation by Andrea Di Sorbo, Sebastiano Panichella, Corrado A. Visaggio, Massimiliano Di Penta, Gerardo Canfora, Harald C. Gall

posted Sep 21, 2019, 06:16 by Gerardo Canfora

Communication means, such as issue trackers, mailing lists, Q&A forums, and app reviews, are premier means of collaboration among developers, and between developers and end-users. Analyzing such sources of information is crucial to build recommenders for developers, for example suggesting experts, re-documenting source code, or transforming user feedback into maintenance and evolution strategies for developers. To ease this analysis, in previous work we proposed DECA (Development Emails Content Analyzer), a tool based on Natural Language Parsing that classifies with high precision development emails’ fragments according to their purpose. However, DECA has to be trained through a manual tagging of relevant patterns, which is often effort-intensive, error-prone and requires specific expertise in natural language parsing. In this paper, we first show, with a study involving Master’s and Ph.D. students, the extent to which producing rules for identifying such patterns requires effort, depending on the nature and complexity of patterns. Then, we propose an approach, named NEON (Nlp-based softwarE dOcumentation aNalyzer), that automatically mines such rules, minimizing the manual effort. We assess the performance of NEON in the analysis and classification of mobile app reviews, developer discussions, and issues. NEON simplifies the patterns’ identification and rules’ definition processes, saving more than 70% of the time otherwise spent on performing such activities manually. Results also show that NEON-generated rules are close to the manually identified ones, achieving comparable recall.
IEEE Transactions on Software Engineering (TSE) - to appear.
[IEEE Xplore]

Summarizing Vulnerabilities’ Descriptions to Support Experts during Vulnerability Assessment Activities by Ernesto Rosario Russo, Andrea Di Sorbo, Corrado A. Visaggio, Gerardo Canfora

posted Sep 21, 2019, 06:11 by Gerardo Canfora

Vulnerabilities affecting software and systems have to be promptly fixed, to prevent violations of the integrity, availability and confidentiality policies of targeted organizations. Once a vulnerability is discovered, it is published on the Common Vulnerabilities and Exposures (CVE) database, freely available on the web. However, vulnerabilities are described using natural language, which makes them hard for machines to interpret automatically. As a consequence, vulnerability assessment activities tend to be time-consuming and imprecise, as the assessors must manually read the majority of the vulnerabilities concerning the perimeter to be protected, to decide which vulnerabilities have the highest priority for patching. In this paper we present CVErizer, an approach able to automatically generate summaries of daily posted vulnerabilities and categorize them according to a taxonomy modeled for industry. We empirically assess the classification capabilities of the approach on a set of 3369 pre-labeled CVE records and perform an end-to-end evaluation of CVErizer summaries involving 15 cybersecurity master students and 4 professional security experts. Our study demonstrates the high performance of the proposed approach in correctly extracting and classifying information from CVE descriptions. Summaries are also considered highly useful for helping analysts during the vulnerability assessment processes.
Journal of Systems and Software, Vol. 156, pages 84 - 99, 2019

A Study on the Interplay between Pull Request Review and Continuous Integration Builds by Fiorella Zampetti, Gabriele Bavota, Gerardo Canfora and Massimiliano Di Penta

posted Feb 11, 2019, 12:38 by Gerardo Canfora

Modern code review (MCR) is nowadays well adopted in industrial and open source projects. Recent studies have investigated how developers perceive its ability to foster code quality, developers' code ownership, and team building. MCR is often used with automated quality checks through static analysis tools, testing or, ultimately, through automated builds on a Continuous Integration (CI) infrastructure. With the aim of understanding how developers use the outcome of CI builds during code review and, more specifically, during the discussion of pull requests, this paper empirically investigates the interplay between pull request discussion and the use of CI by means of 64,865 pull request discussions belonging to 69 open source projects. After having analyzed to what extent a build outcome influences the pull request merger, we qualitatively analyze the content of 857 pull request discussions. Also, we complement such an analysis with a survey involving 13 developers. While pull requests with a passed build have a higher chance of being merged than failed ones, and while survey participants confirmed this quantitative finding, other process-related factors play a more important role in the pull request merge decision. Also, the survey participants point out cases where a pull request can be merged in the presence of a CI failure, e.g., when a new pull request is opened to cope with the failure, or when the failure is due to minor static analysis warnings. The study also indicates that CI introduces extra complexity, as in many pull requests developers have to solve non-trivial CI configuration issues.
Proc. of 26th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER'19) Zhejiang University, Hangzhou, China - February 24-27, 2019

A Nlp-based Solution to Prevent from Privacy Leaks in Social Network Posts by G. Canfora, A. Di Sorbo, E. Emanuele, S. Forootani, C.A. Visaggio

posted Sep 7, 2018, 14:04 by Gerardo Canfora

Private and sensitive information is often revealed in posts appearing in Social Networks (SN). This is due to the users’ willingness to increase their interactions within specific social groups, but also to poor knowledge about the risks for privacy. We argue that technologies able to evaluate the sensitivity of information while it is being published could enhance privacy protection by warning the user about the risks deriving from the disclosure of certain information. To this aim, we propose a method, and an accompanying tool, to automatically intercept the sensitive information which is delivered in a social network post, through the exploitation of recurrent natural language patterns that are often used by users to disclose private data. A comparison with several machine learning techniques reveals that our method outperforms them, since it is more precise, accurate and not dependent on (i) a specific training set, or (ii) the selection of particular features.
Proc. of 13th International ARES Conference on Availability, Reliability and Security (ARES 2018) - Hamburg, Germany, August 27-30, 2018

LEILA: formaL tool for idEntifying mobIle maLicious behAviour by Gerardo Canfora, Fabio Martinelli, Francesco Mercaldo, Vittoria Nardone, Antonella Santone, Corrado Aaron Visaggio

posted May 4, 2018, 05:35 by Gerardo Canfora

With the increasing diffusion of mobile technologies, nowadays mobile devices represent an irreplaceable tool to perform several operations, from posting a status on a social network to transferring money between bank accounts. As a consequence, mobile devices store a huge amount of private and sensitive information and this is the reason why attackers are developing very sophisticated techniques to extort data and money from our devices. This paper presents the design and the implementation of LEILA (formaL tool for idEntifying mobIle maLicious behAviour), a tool targeted at Android malware families detection. LEILA is based on a novel approach that exploits model checking to analyse and verify the Java Bytecode that is produced when the source code is compiled. After a thorough description of the method used for Android malware families detection, we report the experiments we have conducted using LEILA. The experiments demonstrated that the tool is effective in detecting malicious behaviour and, especially, in localizing the payload within the code: we evaluated real-world malware belonging to several widespread families, obtaining an accuracy ranging between 0.97 and 1.
IEEE Transactions on Software Engineering (accepted)

The Relation between Developers’ Communication and Fix-Inducing Changes: An Empirical Study by Mario Luca Bernardi, Gerardo Canfora, Giuseppe A. Di Lucca, Massimiliano Di Penta, Damiano Distante

posted Mar 1, 2018, 06:29 by Gerardo Canfora

Background: Many open source and industrial projects involve several developers spread around the world and working in different timezones. Such developers usually communicate through mailing lists, issue tracking systems or chats. Lack of adequate communication can create misunderstanding and could possibly cause the introduction of bugs.
Aim: This paper aims at investigating the relation between the bug inducing and fixing phenomenon and the lack of written communication between committers in open source projects.
Method: We performed an empirical study that involved four open source projects, namely Apache httpd, GNU GCC, Mozilla Firefox, and Xorg Xserver. For each project, change history data, issue tracker comments, mailing list messages, and chat logs were analyzed in order to answer four research questions about the relation between the social importance and communication level of committers and their proneness to induce bug fixes.
Results and implications: Results indicate that the majority of bugs are fixed by committers who did not induce them, a smaller but substantial percentage of bugs is fixed by committers that induced them, and very few bugs are fixed by committers that were not directly involved in previous changes on the same files of the fix. More importantly, committers inducing fixes tend to have a lower level of communication between each other than that of other committers. This last finding suggests that increasing the level of communication between fix-inducing committers could reduce the number of fixes induced in a software project.
Journal of Systems and Software (to appear)

Android Apps and User Feedback: A Dataset for Software Evolution and Quality Improvement by Giovanni Grano, Andrea Di Sorbo, Francesco Mercaldo, Corrado A. Visaggio, Sebastiano Panichella, Gerardo Canfora

posted Jul 24, 2017, 10:24 by Gerardo Canfora [updated Jul 24, 2017, 10:24]

Nowadays, Android represents the most popular mobile platform with a market share of around 80%. Previous research showed that data contained in user reviews and code change history of mobile apps represent a rich source of information for reducing software maintenance and development effort, and for increasing customers’ satisfaction. Stemming from this observation, we present in this paper a large dataset of Android applications belonging to 23 different app categories, which provides an overview of the types of feedback users report on the apps and documents the evolution of the related code metrics. The dataset contains about 395 applications of the F-Droid repository, including around 600 versions, 280,000 user reviews and more than 450,000 user feedback items (extracted with specific text mining approaches). Furthermore, for each app version in our dataset, we employed the Paprika tool and developed several Python scripts to detect 8 different code smells and compute 22 code quality indicators. The paper discusses the potential usefulness of the dataset for future research in the field.
Dataset URL:
2nd International Workshop on App Market Analytics, WAMA 2017 (in conjunction with ESEC/FSE 2017)
