Replication package for the paper *Code Cloning in Smart Contracts on the Ethereum Platform: An Extended Replication Study*.
This paper is an extended replication of the paper *Code cloning in smart contracts: a case study on verified contracts from the Ethereum blockchain platform* by M. Kondo, G. Oliva, Z.M. Jiang, A. Hassan, and O. Mizuno. For the replication package of the original study, please visit https://github.com/SAILResearch/suppmaterial-18-masanari-smart_contract_cloning. To obtain the corpus of 33,034 smart contracts, please contact the authors of the original study.
- `/01_data/clonedata` – Results of the clone analysis by the NiCad extension developed for this study.
  - `/raw` – Raw results from the analysis.
  - `/duplicates` – Cleaned data.
  - `openzeppelin.zip` – OpenZeppelin data. Requires unzipping into folder `openzeppelin`.
- `/01_data/metadata` – Metadata about the authors, creation dates, and transactions of the contracts in the corpus.
- `/01_data/prepared` – Prepared pickle files for data analysis.
- `/02_prepare` – Scripts for preparing the data in `/01_data/prepared`. Contains potentially long-running scripts; in such cases, the approximate execution times are reported in the source files.
- `/03_analysis` – Analysis scripts for the automated analysis of data.
- `/04_results` – Results of the analyses, including charts and numeric results. Some of these results are discussed in the paper in great detail. Every analysis result corresponds to a particular observation in the paper, clearly identified in the name of the generated observation file.
The following describes four reproduction scenarios. Any of the scenarios can be executed independently of the others.
- Reproduction of the analyses: reproduces the analysis results in `/04_results`, including charts and numeric results. The scripts use the prepared data contained in the `/01_data/prepared` folder.
- Reproduction of the prepared data: reproduces the prepared data in `/01_data/prepared` by (i) merging author, transaction, and file length metadata into the clone data; and (ii) pre-processing data for analysis and persisting the pre-processed data into pickle files. Some of the pre-processing steps are potentially time-consuming. In such cases, the approximate execution times are reported in the source file.
- Reproduction of the cleaned data: reproduces the cleaned data in `/01_data/clonedata/duplicates` from the raw data in `/01_data/clonedata/raw` by bringing the contents of the `.xml` files into a consolidated form.
- Reproduction of the raw data: reproduces the raw data in `/01_data/clonedata/raw` by running the NiCad extension developed for this study.
NOTE: The following steps have been tested with Python `>=3.7` and `<3.10`.
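Scripts that depend on the tested interpreter range can fail fast with a simple guard. This is an illustrative sketch, not part of the package:

```python
import sys

def supported(version=sys.version_info):
    """Return True if `version` falls in the tested range: >=3.7, <3.10."""
    return (3, 7) <= (version[0], version[1]) < (3, 10)

# Example: abort early on an untested interpreter.
# if not supported():
#     sys.exit("Tested with Python >=3.7 and <3.10 only.")
```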
Follow the steps below to reproduce the analysis results in `/04_results`, including charts and numeric results. The scripts use the prepared data contained in the `/01_data/prepared` folder.
- Clone this repository.
- Install dependencies by running `pip install -r requirements.txt` in the root folder.
- Extract `/01_data/clonedata/openzeppelin.zip` into folder `/01_data/clonedata/openzeppelin`, or run `python 01_unzip.py` in the `02_prepare` folder.
- Run `python analysis.py` in the `/03_analysis` folder.
  - Run `python analysis.py -o [observationId]` to run the analysis of a specific observation.
  - Use the `-s` flag to stash the folder of the previous analyses.
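The `-o` and `-s` options suggest a command-line interface along these lines. This is a hypothetical reconstruction for orientation, not the actual `analysis.py`:

```python
import argparse

def build_parser():
    # Hypothetical sketch of the analysis.py command-line interface.
    parser = argparse.ArgumentParser(description="Run the clone analyses.")
    parser.add_argument("-o", "--observation", default=None,
                        help="Run only the analysis for this observation ID.")
    parser.add_argument("-s", "--stash", action="store_true",
                        help="Stash the folder of the previous analyses first.")
    return parser

# Example invocation: analyze a single (hypothetical) observation, stashing old results.
args = build_parser().parse_args(["-o", "O12", "-s"])
```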
Follow the steps below to reproduce the prepared data in `/01_data/prepared` by (i) merging author, transaction, and file length metadata into the clone data; and (ii) pre-processing data for analysis and persisting the pre-processed data into pickle files. Some of the pre-processing steps are potentially time-consuming. In such cases, the approximate execution times are reported in the source file.
- Run `python 03_mergeMetadata.py` in the `/02_prepare` folder.
- Run `python 04_prepareAnalysisData.py` in the `/02_prepare` folder.
  - Run `python 04_prepareAnalysisData.py -p [RQ or observation ID]` to prepare data for a specific RQ or observation.
Some preparation steps can take hours to complete. The benchmarked execution times are noted in comments in the source code.
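Conceptually, the preparation step joins per-contract metadata onto the clone records and persists the result as a pickle. The sketch below illustrates the idea with pandas; the column names and schema are assumptions for illustration, not the package's actual data layout:

```python
import pandas as pd

# Toy clone records and per-contract metadata (hypothetical schema).
clones = pd.DataFrame({
    "contract": ["0xA", "0xB"],
    "clone_class": [1, 1],
})
metadata = pd.DataFrame({
    "contract": ["0xA", "0xB"],
    "author": ["alice", "bob"],
    "tx_count": [10, 3],
})

# (i) Merge author/transaction metadata into the clone data.
prepared = clones.merge(metadata, on="contract", how="left")

# (ii) Persist the pre-processed data as a pickle for the analysis scripts.
prepared.to_pickle("prepared_example.pkl")
```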
Follow the steps below to reproduce the cleaned data in `/01_data/clonedata/duplicates` from the raw data in `/01_data/clonedata/raw` by bringing the contents of the `.xml` files into a consolidated form.
The cleaned data is used in the data preparation scripts. The cleaned data is included in this replication package in folder `/01_data/clonedata/duplicates`, but it can be reproduced from the raw data by following the steps below.
- Run `python 02_cleanup.py` in the `/02_prepare` folder.
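To give a feel for this kind of consolidation, the sketch below flattens clone-pair entries from an XML report into plain records. The element and attribute names are assumptions for illustration and do not reproduce NiCad's exact output schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical NiCad-style report: clone pairs with source fragments.
RAW_XML = """
<clones>
  <clone nlines="42" similarity="98">
    <source file="a.sol" startline="1" endline="42"/>
    <source file="b.sol" startline="10" endline="51"/>
  </clone>
</clones>
"""

def consolidate(xml_text):
    """Flatten <clone>/<source> entries into one dict per clone pair."""
    root = ET.fromstring(xml_text)
    pairs = []
    for clone in root.iter("clone"):
        files = [source.attrib["file"] for source in clone.iter("source")]
        pairs.append({
            "similarity": int(clone.attrib["similarity"]),
            "files": files,
        })
    return pairs
```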
Follow the steps below to reproduce the raw clone data in `/01_data/clonedata/raw` by running the NiCad extension developed for this study.
To obtain the corpus of 33,034 smart contracts, please contact the authors of the original study.
A Docker image is maintained on Docker Hub and can be obtained by running `docker pull faizank/nicad6:TSE`.
The following process assumes that Docker is installed and working correctly and that the image has been pulled. You can verify the image by issuing `docker images` from the terminal and checking that an image named `faizank/nicad6` appears in the list.
NOTE: The following steps have been tested with Docker Engine `20.10.17` (build `100c701`).
- Create a new folder `/systems/source-code` and move the corpus into this folder.
- Create a new folder `/output` to store the results of the clone analysis.
- Execute the analysis by issuing the following command: `docker run --platform linux/x86_64 -v output:/nicad6/01_data -v systems:/nicad6/systems faizank/nicad6`. This will generate the output artefacts inside the `output` folder.
- Move the contents of the `/output` folder to `/01_data` and use the Python scripts discussed above for the rest of the replication.
Should you prefer to build the image from scratch, please refer to the repository of the NiCad extension developed for this study.
To experiment with the tool interactively, issue `docker run --platform linux/x86_64 -v output:/nicad6/01_data -v systems:/nicad6/systems -it faizank/nicad6 bash`.