Flow Chart Generation-Based Source Code Similarity Detection Using Process Mining

Joint Authors

Li, Lulu
Zeng, Qingtian
Liu, Cong
Zhang, Feng

Source

Scientific Programming

Issue

Vol. 2020, Issue 2020 (31 Dec. 2020), pp. 1-15, 15 p.

Publisher

Hindawi Publishing Corporation

Publication Date

2020-07-07

Country of Publication

Egypt

No. of Pages

15

Main Subjects

Mathematics

Abstract EN

Source code similarity detection has extensive applications in computer programming teaching and software intellectual property protection.

In computer programming courses, students may use complex source code obfuscation techniques, e.g., opaque predicates, loop unrolling, and function inlining and outlining, to reduce the similarity between code fragments and evade plagiarism detection.
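
As an illustration of two of the techniques named above, the following is a minimal sketch, not taken from the paper. The function names and the choice of opaque predicate are assumptions made for the example. Both functions compute the same result, but their text differs enough to confuse a purely textual comparison.

```python
def sum_squares(values):
    """Original version: one loop, one accumulator."""
    total = 0
    for v in values:
        total += v * v
    return total


def sum_squares_obfuscated(values):
    """Same behaviour after loop unrolling and an opaque predicate."""
    total = 0
    i = 0
    n = len(values)
    # Loop unrolling: handle two elements per iteration.
    while i + 1 < n:
        total += values[i] * values[i]
        total += values[i + 1] * values[i + 1]
        i += 2
    # Opaque predicate: i * (i + 1) is a product of consecutive integers,
    # so it is always even and this branch is always taken.
    if (i * (i + 1)) % 2 == 0:
        if i < n:  # leftover element when the length is odd
            total += values[i] * values[i]
    else:
        total = -1  # dead code that never runs
    return total


print(sum_squares([1, 2, 3]), sum_squares_obfuscated([1, 2, 3]))  # 14 14
```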

Existing source code similarity detection approaches only consider static features of source code, making it difficult to cope with more complex code obfuscation techniques.

In this paper, we propose a novel source code similarity detection approach that uses process mining to capture the dynamic, runtime features of source code.

More specifically, given two pieces of source code, their running logs are obtained by source code instrumentation and execution.
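
The abstract does not say how the instrumentation is done. As a rough sketch under that caveat, one could insert hand-written probe calls at points of interest and collect the labels they emit during a run. The names `run_with_log`, `probe`, and `gcd_instrumented` are hypothetical and only serve the example.

```python
def run_with_log(func, *args):
    """Execute an instrumented function and return (result, trace)."""
    trace = []
    # The function receives trace.append as its probe callback.
    result = func(trace.append, *args)
    return result, trace


def gcd_instrumented(probe, a, b):
    """Euclid's algorithm with probe calls inserted by hand."""
    probe("enter")
    while b != 0:
        probe("loop_body")
        a, b = b, a % b
    probe("return")
    return a


result, trace = run_with_log(gcd_instrumented, 12, 18)
print(result)  # 6
print(trace)   # ['enter', 'loop_body', 'loop_body', 'loop_body', 'return']
```

Each run yields one trace; several runs with different inputs would give a small log for that piece of code.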

Next, process mining is used to obtain the flow charts of the two pieces of source code by analyzing their collected running logs.
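
The abstract does not name the discovery algorithm used. A directly-follows graph is one of the simplest process mining abstractions and is shown here only as a stand-in: it counts how often one logged event is immediately followed by another. The function name and the sample log are assumptions for illustration.

```python
from collections import Counter


def directly_follows(traces):
    """Count (source, target) pairs of consecutive events across all traces."""
    edges = Counter()
    for trace in traces:
        for src, dst in zip(trace, trace[1:]):
            edges[(src, dst)] += 1
    return edges


# Hypothetical running log: two executions of the same program.
log = [
    ["enter", "loop_body", "loop_body", "return"],
    ["enter", "return"],
]
graph = directly_follows(log)
for (src, dst), count in sorted(graph.items()):
    print(f"{src} -> {dst}: {count}")
```

The resulting edge set can be read as a coarse flow chart: nodes are logged events and edges are observed control transfers.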

Finally, similarity of the two pieces of source code is measured by computing the similarity of these two flow charts.
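
The abstract does not specify the flow chart similarity measure. One simple possibility, given here purely as an assumption, is the Jaccard index over the two charts' edge sets: the number of shared edges divided by the number of distinct edges overall.

```python
def edge_jaccard(edges_a, edges_b):
    """Jaccard similarity of two edge collections, in the range [0, 1]."""
    a, b = set(edges_a), set(edges_b)
    if not a and not b:
        return 1.0  # two empty charts are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical flow charts given as sets of (source, target) edges.
chart_1 = {("A", "B"), ("B", "C")}
chart_2 = {("A", "B"), ("B", "D")}
print(edge_jaccard(chart_1, chart_2))  # one shared edge out of three distinct edges
```

A measure like this ignores node labels that differ only by renaming, so a real comparison would likely need a label-matching step first.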

Experimental results show that the proposed approach can handle more complex obfuscation techniques, including opaque predicates, loop unrolling, and function inlining and outlining, which existing work cannot handle properly.

Therefore, we argue that our approach can defeat commonly used code obfuscation techniques more effectively for source code similarity detection than the existing state-of-the-art approaches.

American Psychological Association (APA)

Zhang, Feng & Li, Lulu & Liu, Cong & Zeng, Qingtian. 2020. Flow Chart Generation-Based Source Code Similarity Detection Using Process Mining. Scientific Programming, Vol. 2020, no. 2020, pp. 1-15.
https://search.emarefa.net/detail/BIM-1209263

Modern Language Association (MLA)

Zhang, Feng, et al. Flow Chart Generation-Based Source Code Similarity Detection Using Process Mining. Scientific Programming No. 2020 (2020), pp. 1-15.
https://search.emarefa.net/detail/BIM-1209263

American Medical Association (AMA)

Zhang, Feng & Li, Lulu & Liu, Cong & Zeng, Qingtian. Flow Chart Generation-Based Source Code Similarity Detection Using Process Mining. Scientific Programming. 2020. Vol. 2020, no. 2020, pp. 1-15.
https://search.emarefa.net/detail/BIM-1209263

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references

Record ID

BIM-1209263