Gerardo Canfora, Francesco Mercaldo, Antonella Santone. A Novel Classification Technique based on Formal Methods. ACM Transactions on Knowledge Discovery from Data. 2023 (To Appear) - preprint

In recent years, we have witnessed a growing interest in the application of supervised machine learning techniques in the most disparate fields. One winning factor of machine learning is its ability to easily create models, as it does not require prior knowledge about the application domain. Complementary to machine learning are formal methods, which intrinsically offer safety checks and mechanisms for reasoning about failures. Considering the weaknesses of machine learning, a new challenge could be represented by the use of formal methods. However, formal methods require domain expertise, knowledge of a modeling language and its semantics, and mathematical rigour to specify properties. In this paper, we propose a novel learning technique based on the adoption of formal methods for classification, thanks to the automatic generation of both the formula and the model. In this way, the proposed method does not require any human intervention and thus can also be applied to complex/large datasets. This leads to less effort in using formal methods and to better explainability and reasoning about the obtained results. Through a set of case studies from different real-world domains (i.e., driver detection, SCADA attack identification, arrhythmia characterization, mobile malware detection, and radiomics for lung cancer analysis), we demonstrate the usefulness of the proposed method by showing that it can outperform widespread classification algorithms.

Fiorella Zampetti, Damian A. Tamburri, Sebastiano Panichella, Annibale Panichella, Gerardo Canfora, Massimiliano Di Penta. Continuous Integration and Delivery Practices for Cyber-Physical Systems: An Interview-Based Study. ACM Transactions on Software Engineering and Methodology. Vol. 32, Issue: 3, 2023 - preprint

Continuous Integration and Delivery (CI/CD) practices have shown several benefits for software development and operations, e.g., faster release cycles and early discovery of defects. For Cyber-Physical System (CPS) development, CI/CD can help achieve required goals, such as high dependability, yet it may be challenging to apply. This paper empirically investigates challenges, barriers, and their mitigation occurring when applying CI/CD practices to develop CPSs in 10 organizations working in 8 different domains. The study has been conducted through semi-structured interviews, by applying an open card sorting procedure together with a member-checking survey within the same organizations, and by validating the results through a further survey involving 55 professional developers. The study reveals several peculiarities in the application of CI/CD to CPSs. These include the need for (i) combining continuous and periodic builds, while balancing the use of Hardware-in-the-Loop (HiL) and simulators; (ii) coping with difficulties in software deployment; (iii) accounting for simulators and HiL differing in their behavior; and (iv) combining hardware/software expertise in the development team. Our findings open the road towards recommenders aimed at supporting the setting and evolution of CI/CD pipelines, as well as university curricula requiring interdisciplinarity, such as knowledge about hardware, software, and their interplay.

Arnaldo Sgueglia, Andrea Di Sorbo, Corrado Aaron Visaggio, Gerardo Canfora. A Systematic Literature Review of IoT Time Series Anomaly Detection Solutions. Future Generation Computer Systems, Vol. 134, September 2022, Pages 170-186 - preprint

The rapid spread of Internet of Things (IoT) devices has prompted many people and companies to adopt the IoT paradigm, as this paradigm allows the automation of several processes related to data collection and monitoring. In this context, the sensors (or other devices) generate huge amounts of data while monitoring physical spaces and objects. Therefore, the problem of managing and analyzing these huge amounts of data has stimulated researchers and practitioners to adopt anomaly detection techniques, which are automated solutions to enable the recognition of abnormal behaviors occurring in complex systems. In particular, in IoT environments, anomaly detection very often involves the analysis of time series data, and this analysis should be accomplished under specific time or resource constraints. In this systematic literature review, we focus on the IoT time series anomaly detection problem by analyzing 62 articles written from 2014 to 2021. Specifically, we explore the methods and techniques adopted by researchers to deal with the issues related to dimensionality reduction, anomaly localization, and real-time monitoring, also discussing the datasets used and the real-case scenarios tested. For each of these topics, we highlight potential limitations and open issues that need to be addressed in future work.

Andrea Di Sorbo, Sonia Laudanna, Anna Vacca, Corrado A. Visaggio, Gerardo Canfora. Profiling Gas Consumption in Solidity Smart Contracts, Journal of Systems and Software, Vol. 186, April 2022 - preprint

Nowadays, more and more applications, namely dApps, are developed to run on a distributed ledger technology. The business logic of dApps is usually implemented within smart contracts developed through Solidity, a programming language for writing smart contracts on different blockchain platforms, including the popular Ethereum. In Ethereum, the smart contracts run on the machines of miners, and the gas corresponds to the execution fee compensating such computing resources. However, the deployment and execution costs of a smart contract depend on the implementation choices made by developers. Inappropriate design choices could lead to higher gas consumption than necessary. In this paper, we (i) identify a set of 19 Solidity code smells affecting the deployment and transaction costs of a smart contract, and (ii) assess the relevance of such smells through a survey involving 34 participants. On top of these smells, we propose GasMet, a suite of metrics for statically evaluating the code quality of a smart contract from the gas consumption perspective. An experiment involving 2,186 smart contracts demonstrates that the proposed metrics have direct associations with deployment costs. The metrics in our suite can be used to more easily identify source code segments that need optimization.

Gerardo Canfora, Andrea Di Sorbo, Sara Forootani, Matias Martinez, Corrado A. Visaggio. Patchworking: Exploring the Code Changes induced by Vulnerability Fixing Activities, Information and Software Technology, vol. 142, February 2022 - preprint

Context: Identifying and repairing vulnerable code is a critical software maintenance task. Change impact analysis plays an important role during software maintenance, as it helps software maintainers to figure out the potential effects of a change before it is applied. However, while the software engineering community has extensively studied techniques and tools for performing impact analysis of change requests, there are no approaches for estimating the impact when the change involves the resolution of a vulnerability bug. Objective: We hypothesize that similar vulnerabilities may present similar strategies for patching. More specifically, our work aims at understanding whether the class of the vulnerability to fix may determine the type of impact on the system to repair. Method: To verify our conjecture, in this paper, we examine 524 security patches applied to vulnerabilities belonging to ten different weakness categories and extracted from 98 different open-source projects written in Java. Results: We obtain empirical evidence that vulnerabilities of the same types are often resolved by applying similar code transformations, and, thus, produce almost the same impact on the codebase. Conclusion: On the one hand, our findings open the way to better management of software maintenance activities when dealing with software vulnerabilities. Indeed, vulnerability class information could be exploited to better predict how much code will be affected by the fixing, how the structural properties of the code (i.e., complexity, coupling, cohesion, size) will change, and the effort required for the fix. On the other hand, our results can be leveraged for improving automated strategies supporting developers when they have to deal with security flaws.

Sebastiano Panichella, Gerardo Canfora, Andrea Di Sorbo. “Won’t We Fix this Issue?” Qualitative Characterization and Automated Identification of Wontfix Issues on GitHub. Information and Software Technology, Volume 139, November 2021 - preprint

Context: Addressing user requests in the form of bug reports and GitHub issues represents a crucial task of any successful software project. However, user-submitted issue reports tend to differ widely in their quality, and developers spend a considerable amount of time handling them.

Objective: By collecting a dataset of around 6,000 issues of 279 GitHub projects, we observe that developers take significant time (i.e., about five months, on average) before labeling an issue as a wontfix. For this reason, in this paper, we empirically investigate the nature of wontfix issues and methods to facilitate the issue management process.

Method: We first manually analyze a sample of 667 wontfix issues, extracted from heterogeneous projects, investigating the common reasons behind a “wontfix decision”, the main characteristics of wontfix issues and the potential factors that could be connected with the time to close them. Furthermore, we experiment with approaches enabling the prediction of wontfix issues by analyzing the titles and descriptions of reported issues when submitted.

Results and conclusion: Our investigation sheds some light on the wontfix issues’ characteristics, as well as the potential factors that may affect the time required to make a “wontfix decision”. Our results also demonstrate that it is possible to perform prediction of wontfix issues with high average values of precision, recall, and F-measure (90%–93%).

Andrea Di Sorbo, Sebastiano Panichella, Corrado A. Visaggio, Massimiliano Di Penta, Gerardo Canfora, Harald C. Gall. Exploiting Natural Language Structures in Software Informal Documentation. IEEE Transactions on Software Engineering, Volume: 47, Issue: 8, 2021 - preprint


Communication means, such as issue trackers, mailing lists, Q&A forums, and app reviews, are premier means of collaboration among developers, and between developers and end-users. Analyzing such sources of information is crucial to build recommenders for developers, for example suggesting experts, re-documenting source code, or transforming user feedback into maintenance and evolution strategies for developers. To ease this analysis, in previous work we proposed DECA (Development Emails Content Analyzer), a tool based on Natural Language Parsing that classifies with high precision development emails’ fragments according to their purpose. However, DECA has to be trained through a manual tagging of relevant patterns, which is often effort-intensive, error-prone and requires specific expertise in natural language parsing. In this paper, we first show, with a study involving Master’s and Ph.D. students, the extent to which producing rules for identifying such patterns requires effort, depending on the nature and complexity of patterns. Then, we propose an approach, named NEON (Nlp based softwarE dOcumentation aNalyzer), that automatically mines such rules, minimizing the manual effort. We assess the performance of NEON in the analysis and classification of mobile app reviews, developers’ discussions, and issues. NEON simplifies the patterns’ identification and rules’ definition processes, allowing savings of more than 70% of the time otherwise spent on performing such activities manually. Results also show that NEON-generated rules are close to the manually identified ones, achieving comparable recall.

Anna Vacca, Andrea Di Sorbo, Corrado A. Visaggio, Gerardo Canfora. A systematic literature review of blockchain and smart contract development: Techniques, tools, and open challenges. Journal of Systems and Software, Volume 174, April 2021 - preprint

Blockchain platforms and languages for writing smart contracts are becoming increasingly popular. However, smart contracts and blockchain applications are developed through non-standard software life-cycles, in which, for instance, delivered applications can hardly be updated or bugs resolved by releasing a new version of the software. Therefore, this systematic literature review oriented to software engineering aims at highlighting current problems and possible solutions concerning smart contracts and blockchain applications development. In this paper, we analyze 96 articles (written from 2016 to 2020) presenting solutions to tackle software engineering-specific challenges related to the development, test, and security assessment of blockchain-oriented software. In particular, we review papers (that appeared in international journals and conferences) relating to six specific topics: smart contract testing, smart contract code analysis, smart contract metrics, smart contract security, Dapp performance, and blockchain applications. Beyond the systematic review of the techniques, tools, and approaches that have been proposed in the literature to address the issues posed by the development of blockchain-based software, for each of the six aforementioned topics, we identify open challenges that require further research.