Source Code Plagiarism Detection SCPDet A
Tapan P Gondaliya
Hiren Joshi
RK University
Hardik Joshi
Associate Professor
School of Computer Science
Dr. Babasaheb Ambedkar Open University
Ahmadabad, Gujarat, India
Assistant Professor
Department of Computer Science
Gujarat University
Ahmadabad, Gujarat, India
ABSTRACT
The Internet stores a large amount of data, information [30], and
source code. Within this volume of material, finding similarity or
plagiarism in source code and in academic research publications is a
difficult and time-consuming task. [1] In this paper we describe
some of the techniques and algorithms used to detect plagiarism in
source code, so that large organizations and academic institutes can
more easily find plagiarism in source code and research publications.
We also compare these techniques to show how each one differs from
the others.
General Terms
Source Code Plagiarism Detection in Student Assignments
Keywords
Plagiarism, Source code, Source code reuse, Plagiarism
Detection System
1. INTRODUCTION
Digital documents can easily be copied from one place to another.
[2] Plagiarism occurs when the work of others is reproduced without
acknowledging the source. [7] Work that can be plagiarized takes
many forms, including words, computer programs, computer software,
graphics or drawings, electronic material, and more. [7] Plagiarism
is a growing global problem experienced by publishers, researchers,
and educational institutions, and is generally defined as literary
theft. The most frequent cases probably appear in academic
institutions, where students copy material from books, journals, the
Internet, or their peers without citing references.
JPlag
SIM
MOSS
Plaggie
2.1 JPlag
JPlag is a plagiarism detection tool that finds similarities among
multiple sets of source code files. [5] JPlag does not compare bytes
or raw text; instead it is aware of program syntax and program
structure. JPlag supports natural language text and several
programming languages, such as Java, C, C++, and C#. [5] JPlag
generally works in two phases. In the first phase, the programs to be
compared are parsed and converted into token strings. [6] In the
second phase, these token strings are compared in pairs to determine
the similarity of each pair. In this comparison, JPlag attempts to
cover one token stream with substrings (tiles) taken from the other
as well as possible. [6] Here we give a demonstration image of the
JPlag simulator to show how this technique works.
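The tiling step in the second phase can be sketched in a few lines of Python. This is a minimal, unoptimized version of Greedy String Tiling under stated assumptions, not JPlag's actual implementation: the token names, the `min_match` parameter, and the similarity formula (twice the covered tokens divided by the combined length of both streams) are illustrative choices that do not come from the text above.

```python
def greedy_string_tiling(a, b, min_match=2):
    """Cover token list `a` with tiles (common substrings) taken from `b`.

    Returns a list of (start_in_a, start_in_b, length) tuples.
    Plain quadratic search; no optimization is applied.
    """
    marked_a = [False] * len(a)
    marked_b = [False] * len(b)
    tiles = []
    while True:
        max_len = min_match
        matches = []
        # Find the longest common runs of tokens that are still unmarked.
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k > max_len:
                    matches = [(i, j, k)]
                    max_len = k
                elif k == max_len:
                    matches.append((i, j, k))
        if not matches:
            break
        # Turn each maximal match into a tile unless it overlaps a tile
        # that was already placed in this round.
        for i, j, k in matches:
            if any(marked_a[i:i + k]) or any(marked_b[j:j + k]):
                continue
            for offset in range(k):
                marked_a[i + offset] = True
                marked_b[j + offset] = True
            tiles.append((i, j, k))
    return tiles


def similarity(a, b, min_match=2):
    """Assumed measure: share of both token streams covered by tiles."""
    if not a and not b:
        return 0.0
    covered = sum(length for _, _, length in greedy_string_tiling(a, b, min_match))
    return 2 * covered / (len(a) + len(b))
```

For example, with the hypothetical streams `["VAR", "ASSIGN", "LOOP", "CALL", "END"]` and `["LOOP", "CALL", "END", "VAR", "ASSIGN"]`, the sketch finds one tile of length 3 and one of length 2, so a program whose blocks were merely reordered still scores a similarity of 1.0.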
2.2 SIM
SIM is a software tool developed in 1989 by Dick Grune. [10] SIM is
an important tool for finding similarity in source code written in
different languages, such as Pascal, C, Java, and more. [8] SIM is
based on tokenization. To find similarity, SIM first tokenizes the
source code and then builds a forward reference table, which is used
to detect the best matches between the new files and the text they
are compared against. [8] In this technique, programs are first
parsed by the flex lexical analyzer to produce a sequence of
integers, that is, tokens. The tokens for symbols, keywords, and
comments are predefined, while the tokens for identifiers are
assigned dynamically and saved in a shared symbol table; whitespace
is discarded. [8] SIM detects similarities between two pieces of
source code by evaluating their correctness, style, and uniqueness.
SIM is also used for DNA string matching. [9]
Here we give a demonstration image of the SIM tool to illustrate the
tool and its process.
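The tokenization step described above can be illustrated with a short Python sketch. It is only an approximation under stated assumptions: SIM itself uses a flex-generated lexer, whereas this sketch uses a regular expression, and the keyword set, symbol set, and integer codes (`KEYWORDS`, `SYMBOLS`, `NUMBER_CODE`, `FIRST_IDENTIFIER_CODE`) are hypothetical values chosen for illustration.

```python
import re

# Hypothetical fixed codes for keywords and symbols.
KEYWORDS = {"if": 1, "else": 2, "while": 3, "for": 4, "return": 5, "int": 6}
SYMBOLS = {"(": 20, ")": 21, "{": 22, "}": 23, ";": 24, "=": 25, "+": 26, "<": 27}
UNKNOWN_CODE = 0
NUMBER_CODE = 99
FIRST_IDENTIFIER_CODE = 100

# Identifiers/keywords, integer literals, or any single non-space character.
# Whitespace never matches, so it is dropped from the token stream.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source, symbol_table):
    """Convert source text into a list of integer tokens.

    `symbol_table` is shared across files: each new identifier receives
    the next free code, and repeated identifiers reuse their code.
    """
    tokens = []
    for lexeme in TOKEN_RE.findall(source):
        if lexeme in KEYWORDS:
            tokens.append(KEYWORDS[lexeme])
        elif lexeme in SYMBOLS:
            tokens.append(SYMBOLS[lexeme])
        elif lexeme.isdigit():
            tokens.append(NUMBER_CODE)
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            if lexeme not in symbol_table:
                symbol_table[lexeme] = FIRST_IDENTIFIER_CODE + len(symbol_table)
            tokens.append(symbol_table[lexeme])
        else:
            tokens.append(UNKNOWN_CODE)
    return tokens
```

Calling `tokenize("int x = 1;", table)` and then `tokenize("int   y=2 ;", table)` with the same initially empty `table` yields `[6, 100, 25, 99, 24]` and `[6, 101, 25, 99, 24]`: layout differences disappear, and the two identifiers receive consecutive codes from the shared table.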
2.3 MOSS
MOSS stands for Measure of Software Similarity. [8] It is a
plagiarism detection system created by Alex Aiken at UC Berkeley.
[3] MOSS was first developed in 1994, and it is the system that
finds out the source code
2.4 Plaggie
Plaggie is an engine that finds source code similarity in programs.
[8] Plaggie is a stand-alone source code plagiarism detection
engine. [15] Plaggie's functionality and graphical user interface
are very similar to JPlag's, but Plaggie is open source and is
installed on the local system. [15] Plaggie was developed by
Ahtiainen et al. in 2002. Plaggie only checks programs written in
Java; that is, it supports only one language. The basic algorithm
Plaggie uses to compare source code is tokenization followed by
Greedy String Tiling, without the optimization. [8]
Above we discussed four important source code plagiarism detection
tools, their functionality, and their graphical user interfaces.
Here we compare the four tools by their characteristics.
Table 1. Comparison of four source code plagiarism detection tools by characteristics, function, and technique

Tools                     JPlag               SIM                 MOSS                Plaggie
Open Source Tools/Paid    No                  Yes                 No                  Yes
Local/online tool         Web                 Local               Web                 Local
Code Submit/File          Submit Code         Submit File         Submit Code         Submit Code
Lang. Support             [only the value 23 survives; its column is unclear]
Expandability             No                  Yes                 No                  No
Founded in Year           1996                1989                1994                2002
Founded By                Guido Malpohl       Dick Grune          Aiken et al.        Ahtiainen et al.
Technique                 Greedy String       Flex lexical        Winnowing           Greedy String
                          Tiling,             analyzer            technique           Tiling and
                          optimization, and                                           tokenization
                          tokenization
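Table 1 names winnowing as the technique behind MOSS. As a rough illustration of the general winnowing idea (not of MOSS's internal code, which is not described here), the Python sketch below hashes every k-gram of a normalized text and keeps the rightmost minimum hash of each sliding window as a fingerprint. The parameter values `k=5` and `w=4`, the CRC32 hash, and the whitespace and case normalization are assumptions made for this example.

```python
import zlib


def kgram_hashes(text, k):
    """Hash every k-character substring of the whitespace-free, lowercased text."""
    normalized = "".join(text.split()).lower()
    return [zlib.crc32(normalized[i:i + k].encode("utf-8"))
            for i in range(len(normalized) - k + 1)]


def winnow(hashes, w):
    """Select (hash, position) fingerprints: the rightmost minimum of each window."""
    selected = []
    last_position = -1
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        minimum = min(window)
        # Rightmost occurrence of the minimum inside this window.
        position = start + (w - 1 - window[::-1].index(minimum))
        # Record a position only once, even if many windows select it.
        if position != last_position:
            selected.append((minimum, position))
            last_position = position
    return selected


def fingerprints(text, k=5, w=4):
    """Set of winnowed hash values used to compare two documents."""
    return {h for h, _ in winnow(kgram_hashes(text, k), w)}
```

With the hash sequence `[77, 74, 42, 17, 98, 50, 17, 98, 8, 88, 67, 39, 77, 74, 42, 17, 98]` and `w=4`, `winnow` returns `[(17, 3), (17, 6), (8, 8), (39, 11), (17, 15)]`. Two documents can then be compared by the overlap of their `fingerprints` sets, which is far smaller than the full list of k-gram hashes.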
3. LITERATURE REVIEW
In this section we describe some of the research papers on source
code plagiarism detection, and then compare those papers and their
algorithms in tabular form.
Table 2. Summary and comparative study of the different research papers and their characteristics and functionality
Sr.No
Name of Authors
1.
Enrique Flores,
Alberto Barron-Cedeno,
Paolo Rosso,
Lidia Moreno
Year:- 2011
2.
DeSoCoRe: Detecting
Source Code Re-Use
across Programming
Languages
[15]
Enrique Flores,
Alberto Barron-Cedeno,
Paolo Rosso,
Lidia Moreno
3.
Software Plagiarism
Detection A Graph-based
Approach
[16]
Dong-Kyu Chae,
Jiwoon Ha,
Sang-Wook Kim,
BooJoong Kang,
Eul Gyu Im
4.
5.
Yingnong Dang,
Song Ge,
Ray Huang and Dongmei
Zhang
Ameera Jadalla & Ashraf
Elnagar
Plagiarism in
Programming
Assignments
[19]
Mike Joy,
Michael Luck
Year:- 1999
7.
8.
A. Bugarn, M. Carreira, M.
Lama, X.M. Pardo
Xin Chen,
Brent Francia,
Ming Li,
Brian McKinnon,
Amit Seker
Method Used: -
Kolmogorov
Complexity.
9.
Zoran Djuric,
Dragan Gasevic
10.
11.
Enrique Flores,
Alberto Barron-Cedeno,
Paolo Rosso
and Lidia Moreno
12.
Georgina Cosma,
Mike Joy
Method Used:-
Latent Semantic
Analysis
Tools :PlaGate, Novel Tools
14.
Akhil Gupta,
Dr. Sukhvir Singh
Methods:Lexical Analysis
Upul Bandara,
and
Gamini Wijayarathna
Method:Machine Learning
Technique
15.
C Code Plagiarism
detection System
[28]
N.Haritha ,
M.Bhavani,
K.Thammi Reddy
16.
4. CONCLUSION
In this paper the authors first describe the meaning of source code
plagiarism, then describe the different source code plagiarism
detection tools and compare their functions, characteristics, and
techniques. In the last phase the authors discuss different research
papers and compare them in tabular form by technique, method,
characteristics,
5. ACKNOWLEDGMENTS
I would like to thank Dr. Tushar Deshai, Head of Doctoral Study, and
Vimal Bhatt, Administrator of Doctoral Study at RK University, for
their technical support at all times. I wish to
6. REFERENCES
[1]
IJCATM : www.ijcaonline.org