Computation of Program Source Code Similarity by Composition of Parse Tree and Call Graph

Joint Authors

Song, Hyun-Je
Park, Seong-Bae
Park, Se Young

Source

Mathematical Problems in Engineering

Issue

Vol. 2015, Issue 2015 (31 Dec. 2015), pp.1-12, 12 p.

Publisher

Hindawi Publishing Corporation

Publication Date

2015-04-16

Country of Publication

Egypt

No. of Pages

12

Main Subjects

Civil Engineering

Abstract EN

This paper proposes a novel method for computing the similarity between the source code of two programs.

Because program source code has a structured representation, the proposed method adopts convolution kernel functions as its similarity measure.

Program source code carries two kinds of structural information.

One is syntactic information; the other is the dependencies among the function calls in the program.

Since the syntactic information of a program is expressed as its parse tree, the syntactic similarity between two programs is computed by a parse tree kernel.
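
To make the idea of a parse tree kernel concrete, here is a minimal sketch, not the kernel defined in the paper. It assumes Python programs, uses the standard ast module as the parser, and counts complete subtrees that two parse trees share (a plain subtree kernel with cosine normalization). The function names subtree_signatures and tree_kernel are hypothetical.

```python
import ast
from collections import Counter
from math import sqrt


def subtree_signatures(source: str) -> Counter:
    """Count every complete subtree of the parse tree by its structural signature.

    Only node types are used, so identifier names and literal values are ignored.
    """
    counts = Counter()

    def visit(node: ast.AST) -> str:
        children = [visit(child) for child in ast.iter_child_nodes(node)]
        signature = type(node).__name__ + "(" + ",".join(children) + ")"
        counts[signature] += 1
        return signature

    visit(ast.parse(source))
    return counts


def tree_kernel(source_a: str, source_b: str) -> float:
    """Normalized count of shared subtrees; an assumed, simplified kernel."""
    a, b = subtree_signatures(source_a), subtree_signatures(source_b)
    dot = sum(a[s] * b[s] for s in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

Because only node types enter the signature, renaming variables or changing constants leaves the score unchanged, which is the kind of disguise a syntactic measure is meant to see through.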

The function calls within a program capture its global structure and can be represented as a graph.

Therefore, the similarity of function calls is computed with a graph kernel.
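
As an illustration only, the sketch below builds a call graph for a Python program and compares two such graphs with a simple walk-count kernel. The abstract does not say which graph kernel the paper uses, so the walk-count features, the decay factor, and the names call_graph, walk_features, and graph_kernel are all assumptions.

```python
import ast
from math import sqrt


def call_graph(source: str) -> dict:
    """Map each top-level function to the locally defined functions it calls."""
    tree = ast.parse(source)
    funcs = [n for n in tree.body if isinstance(n, ast.FunctionDef)]
    names = {f.name for f in funcs}
    graph = {}
    for f in funcs:
        callees = set()
        for node in ast.walk(f):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id in names):
                callees.add(node.func.id)
        graph[f.name] = callees
    return graph


def walk_features(graph: dict, max_len: int = 3, decay: float = 0.5) -> list:
    """Decayed number of walks of length 0..max_len in the call graph."""
    walks = {v: 1 for v in graph}  # one walk of length 0 starts at every node
    features = []
    for k in range(max_len + 1):
        features.append((decay ** k) * sum(walks.values()))
        # A walk of length k+1 from v is a call edge v -> u followed by
        # a walk of length k from u.
        walks = {v: sum(walks[u] for u in graph[v]) for v in graph}
    return features


def graph_kernel(source_a: str, source_b: str) -> float:
    """Cosine similarity of walk-count features; an assumed, label-free kernel."""
    fa = walk_features(call_graph(source_a))
    fb = walk_features(call_graph(source_b))
    dot = sum(x * y for x, y in zip(fa, fb))
    norm = sqrt(sum(x * x for x in fa) * sum(y * y for y in fb))
    return dot / norm if norm else 0.0
```

Function names are used only to resolve call edges, never as features, so two programs with the same calling structure score the same even if every function is renamed.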

Both structural similarities are then taken into account together when comparing programs, by composing the parse tree kernel and the graph kernel based on cyclomatic complexity.

In experiments on a real data set for program plagiarism detection, the proposed method is shown to be effective in capturing the similarity between programs.
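
The abstract does not give the composition formula, so the following is only a hypothetical sketch of how a cyclomatic complexity could steer the mix of two kernel values. It approximates McCabe's complexity for Python code by counting decision points, and it assumes a convex combination in which the call graph kernel gains weight as the programs become more complex. The weighting rule and the names cyclomatic_complexity and composite_similarity are illustrative, not taken from the paper.

```python
import ast

# Node types treated as decision points (an approximation of McCabe's measure).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity: number of decision points plus one."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra branches.
            complexity += len(node.values) - 1
    return complexity


def composite_similarity(k_tree: float, k_graph: float,
                         source_a: str, source_b: str) -> float:
    """Blend two kernel values; the weighting rule is an assumption.

    With complexity 1 (straight-line code) only the parse tree kernel counts;
    the call graph kernel's weight approaches 1 as complexity grows.
    """
    mean_complexity = (cyclomatic_complexity(source_a)
                       + cyclomatic_complexity(source_b)) / 2
    w_graph = 1.0 - 1.0 / mean_complexity
    return (1.0 - w_graph) * k_tree + w_graph * k_graph
```

The kernel values k_tree and k_graph are passed in as plain numbers, so any parse tree kernel and graph kernel can be plugged in.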

The experiments show that the plagiarized pairs of programs are found correctly and thoroughly by the proposed method.

American Psychological Association (APA)

Song, Hyun-Je & Park, Seong-Bae & Park, Se Young. 2015. Computation of Program Source Code Similarity by Composition of Parse Tree and Call Graph. Mathematical Problems in Engineering, Vol. 2015, no. 2015, pp.1-12.
https://search.emarefa.net/detail/BIM-1073823

Modern Language Association (MLA)

Song, Hyun-Je…[et al.]. Computation of Program Source Code Similarity by Composition of Parse Tree and Call Graph. Mathematical Problems in Engineering No. 2015 (2015), pp.1-12.
https://search.emarefa.net/detail/BIM-1073823

American Medical Association (AMA)

Song, Hyun-Je & Park, Seong-Bae & Park, Se Young. Computation of Program Source Code Similarity by Composition of Parse Tree and Call Graph. Mathematical Problems in Engineering. 2015. Vol. 2015, no. 2015, pp.1-12.
https://search.emarefa.net/detail/BIM-1073823

Data Type

Journal Articles

Language

English

Notes

Includes bibliographical references

Record ID

BIM-1073823