Plagiarism Detection System
Supervisor
Eng. Hani Salah
June 2010
Acknowledgment
The team members extend their deep thanks to their dear supervisor
Eng. Hani Salah
Project team
Dedication
To our parents for their support and encouragement all the time
To the supervisor of the project Eng. Hani Salah
To all our teachers and friends.
Project team
Abstract (translated from the Arabic)
Plagiarism in student assignments is a widespread and growing problem at all academic
levels. Detecting plagiarism manually is slow and unreliable. This project therefore builds
a web-based computerized system for detecting plagiarism cases, so that university teachers
can check for plagiarism faster, more accurately, and more reliably.
Many plagiarism-detection algorithms exist. These algorithms were studied, and one of them
was chosen as the basis for designing and implementing this system, which compares the
assignments that students submit on the e-learning site used at Palestine Polytechnic
University and produces reports indicating cases of plagiarism. The System Development
Life Cycle (SDLC) was adopted as the methodology for building the system.
The system will be deployed on the e-learning site of Palestine Polytechnic University
(Moodle), using PHP, the language in which Moodle is programmed.
Other systems perform the same task as the system in this project, but those systems
(software) are neither free nor open source, unlike the system presented in this report,
which will be distributed as open source.
Abstract
A number of plagiarism-detection algorithms currently exist. Some of them have been
studied, analyzed, and compared. Based on this analysis and comparison, the most suitable
one for our system (namely, AC) has been chosen for the implementation phase. Our
system reuses the selected algorithm and integrates it into the e-learning platform that is
used at PPU (namely, Moodle). The algorithm has been reimplemented in PHP (the
language in which Moodle is written). In this project, we use the System Development
Life Cycle (SDLC) as the methodology for implementing the system.
Some systems (software) for plagiarism detection already exist, but they are neither free
nor open source. In contrast, the system presented in this report is free to use, can be
integrated with Moodle, and is open source.
TABLE OF CONTENTS
ACKNOWLEDGMENT
DEDICATION
DECLARATION
ABSTRACT (ARABIC)
ABSTRACT
CHAPTER ONE INTRODUCTION
1.1 PROBLEM STATEMENT
1.2 PROJECT OBJECTIVE
1.3 PROJECT GOALS
1.5.1 Functional Requirements
1.5.2 Non-Functional Requirements
1.6 TIME SCHEDULE / GANTT CHART
1.7 THE RISKS
CHAPTER TWO BACKGROUND
2.1 PHP
2.1.1 Advantages of PHP
2.1.2 Why PHP
2.2 MOODLE
2.2.1 Moodle Usage
2.2.2 Moodle Management
2.2.3 Assignments in Moodle
2.3 EXISTING CHEATING DETECTION ALGORITHMS AND OPEN SOURCE SOFTWARE
2.3.1 Attribute-Counting Systems
2.3.2 Structure-metric Systems
2.4 CONCLUSIONS
CHAPTER THREE SYSTEM REQUIREMENTS
3.1 SYSTEM REQUIREMENTS
3.1.1 Functional Requirements
3.1.2 Non-Functional Requirements
3.2 FEASIBILITY STUDY
3.2.1 Development Requirements
3.2.2 Operational Requirement
3.2.3 Total cost
CHAPTER FOUR SYSTEM SPECIFICATION
4.1 USE CASE
4.2 CLASS DIAGRAM
4.3 OBJECT DIAGRAM
4.4 SEQUENCE DIAGRAM
4.5 COLLABORATION DIAGRAM
4.6 STATE DIAGRAM
4.7 ACTIVITY DIAGRAM
CHAPTER FIVE DESIGN
5.1 DATABASE DESIGN
5.2 INTERFACE DESIGN
5.3 FLOWCHARTS
CHAPTER SIX IMPLEMENTATION
6.1 INSTALLATION ENVIRONMENT
6.2 SERVER INFORMATION AND CONFIGURATION
6.3 DEPLOYMENT TIERS
CHAPTER SEVEN TESTING
7.1 SYSTEM UNIT AND MODULE TESTING
7.2 INTEGRATION TESTING
7.3 SYSTEM TESTING
7.4 ACCEPTANCE TESTING
7.5 INTERFACE TESTING
CHAPTER EIGHT CONCLUSIONS
8.1 CONCLUSIONS
8.2 DIRECTIONS FOR FUTURE WORK
REFERENCES
FIGURE REFERENCES
APPENDICES
LIST OF FIGURES
Figure (1): SIM Example-Student Program Pairs
Figure (2): SID Phases
Figure (3): Parsing Steps
Figure (4): Use Case
Figure (5): Class Diagram
Figure (6): Object Diagram
Figure (7): Teacher Sequence Diagram
Figure (8): Student Sequence Diagram
Figure (9): Teacher Collaboration Diagram
Figure (10): Student Collaboration Diagram
Figure (11): Teacher State Diagram
Figure (12): Student State Diagram
Figure (13): Teacher Activity Diagram
Figure (14): Student Activity Diagram
Figure (15): Login Form
Figure (16): Select Course
Figure (17): Select Assignment
Figure (18): Submit Assignment
Figure (19): View Submitted Assignment
Figure (20): Submitted Assignment
Figure (21): Plagiarism Detection Button
Figure (22): Comparison Result
Figure (23): Student Operation
Figure (24): Teacher Operation
Figure (25): Start Moodle
Figure (26): Choose Language
Figure (27): Checking PHP Setting
Figure (28): Confirm the Location of Moodle
Figure (29): Confirm Database Setting
Figure (30): Server Checks
Figure (31): Language Pack
Figure (32): GPL License
Figure (33): Setting the Database
Figure (34): Setup Administrator Account
Figure (35): Three-Tier System
Figure (36): Three-Tier Class Diagram
Figure (37): Login Form
Figure (38): Login to the Site
Figure (39): Main Page
Figure (40): Select the Course From the Main Page
Figure (41): Select the Assignment
Figure (42): The Main Page of the Assignment
Figure (43): Submission Page
Figure (44): Similarity Page
Figure (45): Visual Report
Figure (46): Save as Excel Sheet or Word Document
LIST OF TABLES
Table (1): Gantt Chart 1 (First Semester: October 2009 - December 2009)
Table (2): Gantt Chart 2 (Second Semester: February 2010 - May 2010)
Table (3): Expected Risks
Table (4): Effects and Responsibilities
Table (5): Algorithms Comparison
Table (6): Result Display
Table (7): Hardware Development Resources and Costs
Table (8): Software Development Resources and Costs
Table (9): Operational Hardware Resources and Costs
Table (10): Operational Software Resources and Costs
Table (11): Total Cost
Table (12): Assignment Table
Table (13): Context Table
Table (14): User Table
Table (15): Problems Tested on the Teacher Object
Chapter One
Introduction
2. Picking one of the analyzed algorithms to be used in this system, or designing a
new one from scratch. This goal includes coding the algorithm in PHP (the
programming language used in Moodle).
3. Integrating the plagiarism-detection code into the existing PPU e-learning platform
(Moodle).
Based on the goals stated above, we proceed to break them down into
concrete activities, functions, and deliverables as follows:
1.4 Methodology
The project team will use the traditional software engineering method, known as
the System Development Life Cycle (SDLC), for the analysis and
development of the system.
We will start by studying and analyzing existing plagiarism-detection systems and
algorithms, and comparing them in order to choose the best and most suitable one
to be used in the project.
1.5 System Requirements
This section lists both functional and non-functional requirements of the system.
Further details about each requirement will be discussed in Chapter 3.
Table (1): Gantt Chart 1 (First Semester: October 2009 - December 2009)
Key Activities | October | November | December
Analysis
Requirement specification
Design (Draft)
Documentation
Important dates and milestones for 1st semester:
October 1st: Project plan.
October 20th: Project outline.
November 30th: literature review and Background.
December 10th: Requirement specification.
December 20th: Design.
December 24th: Final documentation.
Programming
Installation
Testing
Documentation
1.7 The Risks
During the different phases of this project, some risks may appear and cause
delays, threaten progress, or even affect the outcomes of the project. Table (3)
summarizes the most important expected risk events, the assessed degree of probability (P)
that each risk event happens (on a scale of 1-5, where 5 is the most likely), the assessed
degree of effect (E) on the project when it happens (on a scale of 1-5, where 5 is
the worst), and the risk index, which equals the product (P × E).
No. | Risk event | P | E | Risk index (P × E)
2 | Lack of time / schedule conflict between members | 5 | 3 | 15
3 | Equipment malfunction | 2 | 5 | 10
4 | Loss of data | 2 | 5 | 10
5 | Loss of equipment | 2 | 5 | 10
6 | Unexpected results | 2 | 3 | 6
7 | Communications problems between members | 2 | 2 | 4
8 | Installation problems | 3 | 5 | 15
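The risk index can be illustrated with a short calculation. The following Python sketch recomputes P × E for the risk events listed above and ranks them from highest to lowest index. The function name `risk_index` and the ranking step are assumptions made for illustration; they are not part of the project's stated procedure.

```python
# Risk events listed above: (name, probability P, effect E), both on a 1-5 scale.
risks = [
    ("Lack of time / schedule conflict between members", 5, 3),
    ("Equipment malfunction", 2, 5),
    ("Loss of data", 2, 5),
    ("Loss of equipment", 2, 5),
    ("Unexpected results", 2, 3),
    ("Communications problems between members", 2, 2),
    ("Installation problems", 3, 5),
]


def risk_index(probability: int, effect: int) -> int:
    """Return the risk index, defined in the text as the product P x E."""
    return probability * effect


# Rank the events from the highest risk index to the lowest (illustrative only).
ranked = sorted(risks, key=lambda r: risk_index(r[1], r[2]), reverse=True)
for name, p, e in ranked:
    print(f"{name}: P={p}, E={e}, index={risk_index(p, e)}")
```

Under this ranking, the two events with index 15 (lack of time and installation problems) come first, which suggests where mitigation effort would matter most.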
Table (4) summarizes the expected effect of each event on the project, the action(s)
that could be taken for each event, and who is responsible for each action.
Table (4): Cont.
7 | Communications problems between members | Delaying | Perform timely meetings, good communication | ALL
Chapter Two
Background
2.1 PHP
2.2 Moodle
2.3 Existing Cheating Detection Algorithms and Open
Source Software
2.3.1 Attribute-Counting Systems
2.3.2 Structure-metric Systems
2.3.2.1 SIM
2.3.2.2 MOSS
2.3.2.3 JPlag
2.3.2.4 SID
2.3.2.5 AC
2.3.2.6 CodeMatch
2.4 Conclusions
This chapter presents background on PHP, Moodle, and existing cheating-detection
algorithms and open-source software.
2.1 PHP
Hypertext Preprocessor (PHP) "is a widely-used general-purpose scripting
language that is especially suited for web development and can be embedded into
html" [3].
2.2 Moodle
Moodle is "a software package for producing internet-based courses and web
sites. It is an ongoing development project designed to support a social constructionist
framework of education" [4].
module can be added to Moodle installation. Plug-in language packs allow full
localization to any language. These can be edited using a built-in web based editor.
Currently there are language packs for over 70 languages.
2.3.2 Structure-metric Systems
This type of plagiarism-detection algorithm introduces a much larger number of
metrics and notions of similarity for the resulting feature vector in order to improve
performance (based on structure and metric comparison).
These algorithms are usually based on converting the program into a stream of
tokens (thus ignoring easily changeable information such as spaces, line breaks,
comments, etc.) and then comparing these token streams to find similarities among
them. The most advanced systems in this category (in terms of plagiarism-detection
performance) are SIM, MOSS, JPlag, SID, AC, and CodeMatch. The following is a brief
description of these systems.
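As an illustration of the token-stream idea, the following Python sketch converts a small C-like program into tokens while discarding spaces, line breaks, and comments. The token categories, the keyword list, and the `tokenize` function are assumptions made for this example; none of the systems described here is claimed to tokenize in exactly this way.

```python
import re

# Assumed keyword list for a small C-like language (illustration only).
KEYWORDS = {"int", "float", "if", "else", "for", "while", "return"}

TOKEN_PATTERN = re.compile(
    r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)   # line and block comments, discarded
  | (?P<space>\s+)                    # spaces and line breaks, discarded
  | (?P<number>\d+(?:\.\d+)?)         # numeric literals
  | (?P<word>[A-Za-z_]\w*)            # keywords and identifiers
  | (?P<symbol>.)                     # any other single character
    """,
    re.VERBOSE | re.DOTALL,
)


def tokenize(source: str) -> list[str]:
    """Convert source text into a token stream that ignores layout and names."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(source):
        kind = match.lastgroup
        text = match.group()
        if kind in ("comment", "space"):
            continue  # easily changeable information is dropped
        if kind == "number":
            tokens.append("NUM")
        elif kind == "word":
            # Keywords are kept; every other name collapses to a generic ID token.
            tokens.append(text if text in KEYWORDS else "ID")
        else:
            tokens.append(text)
    return tokens


original = "int total = 0; // running sum\nfor (int i = 0; i < 10; i++) total += i;"
disguised = "int s=0;\nfor(int k=0;k<10;k++)\n    s+=k; /* loop */"
print(tokenize(original) == tokenize(disguised))
```

Because identifiers collapse to a single `ID` token and layout is ignored, renaming variables or reformatting the code leaves the token stream unchanged, which is why token-based comparison resists these simple disguises.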
2.3.2.1 SIM
The Software Similarity Tester (SIM) plagiarism detection system was developed in
1999 by Gitchell and Tran as a system for measuring the similarity between texts written
in C, Java, Pascal, and natural language.
In SIM, each program is first parsed by a lexical analyzer to produce a
sequence of integers (tokens); the token sequences are then compared using a dynamic
programming string alignment technique. This technique first assigns a score to each pair
of characters in an alignment. For example, a match scores 1 and a mismatch scores -1.
The score between two sequences is then defined to be the maximum score among all
alignments. With this definition, a similarity measure between two sequences s and t is
defined as follows:
$$\mathrm{sim}(s, t) = \frac{2 \cdot \mathrm{score}(s, t)}{\mathrm{score}(s, s) + \mathrm{score}(t, t)}$$
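The alignment score and the similarity measure above can be sketched in Python as follows. This is a minimal illustration, not SIM's actual code: the text specifies only the match (1) and mismatch (-1) scores, so the gap penalty of -2 is an assumption of this sketch, and the similarity is computed as twice score(s, t) divided by score(s, s) + score(t, t), which is how we read the definition.

```python
def alignment_score(s, t, match=1, mismatch=-1, gap=-2):
    """Maximum global alignment score of sequences s and t (dynamic programming)."""
    # prev[j] is the best score for aligning the processed prefix of s with t[:j].
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [i * gap]
        for j in range(1, len(t) + 1):
            pair = match if s[i - 1] == t[j - 1] else mismatch
            curr.append(max(prev[j - 1] + pair,   # align s[i-1] with t[j-1]
                            prev[j] + gap,        # align s[i-1] with a gap
                            curr[j - 1] + gap))   # align t[j-1] with a gap
        prev = curr
    return prev[len(t)]


def similarity(s, t):
    """Similarity = 2 * score(s, t) / (score(s, s) + score(t, t))."""
    denominator = alignment_score(s, s) + alignment_score(t, t)
    return 2 * alignment_score(s, t) / denominator if denominator else 0.0


first = ["int", "ID", "=", "NUM", ";"]
second = ["int", "ID", "=", "ID", ";"]
print(similarity(first, first), similarity(first, second))
```

Identical sequences give a similarity of 1, and each mismatch or gap lowers the numerator while the denominator stays fixed by the two self-alignments.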
Figure (1): SIM Example-Student Program Pairs (Source: [1])
2.3.2.2 MOSS
Measure of Software Similarity (MOSS) was developed in 1994 by Alex Aiken at
Berkeley as a system for measuring the similarity of source code written in C, C++,
Java, or Pascal. MOSS works on the actual source files: it parses the source code,
tokenizes it, and applies its comparison algorithm to the tokenized form of the
code, comparing it with the source code in other files.
2.3.2.3 JPlag
Very little information about JPlag is available. JPlag does not
compare submissions against the internet. It is designed to find similarities among
student assignments, which is usually sufficient for computer programs. Its main
function is to convert the programs into token strings and compare these strings.
The official JPlag website [8] summarizes its work as follows: "Similarities of 0% or
5% can be represented by the similarity value alone this clearly is no plagiarism.
Likewise, similarities of 100% can also be represented by the similarity value alone —
this clearly is a plagiarism. But what if the similarity is 40%? Such cases should usually
be investigated by a human being for final judgment."
2.3.2.4 SID
Shared Information Distance or Software Integrity Detection (SID) detects
similarity between programs (source code) by computing the shared information
between them.
SID is easy-to-use software for detecting plagiarism within source code and is reported
to be the most effective at catching cheaters. SID currently supports Java and C++
source code. To compare two programs, SID computes the amount of information
shared between them. The shared information distance between two
programs x and y is defined as:
$$D(x, y) = 1 - \frac{K(x) - K(x \mid y)}{K(xy)}$$
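In the SID literature, K denotes Kolmogorov complexity, which cannot be computed exactly; in practice it is approximated by the size of a compressed file. The following Python sketch approximates the distance with the standard `zlib` compressor. The choice of `zlib`, the approximation of K(x|y) by K(yx) - K(y), and the function names are assumptions of this sketch and do not describe SID's own compressor.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Compressed size in bytes, used here as a rough stand-in for K(.)."""
    return len(zlib.compress(data, 9))


def shared_information_distance(x: bytes, y: bytes) -> float:
    """Approximate D(x, y) = 1 - (K(x) - K(x|y)) / K(xy) with a compressor."""
    k_x = compressed_size(x)
    k_xy = compressed_size(x + y)
    # Assumption: K(x|y) is approximated by K(yx) - K(y).
    k_x_given_y = compressed_size(y + x) - compressed_size(y)
    return 1.0 - (k_x - k_x_given_y) / k_xy


program_a = b"int main() { int total = 0; for (int i = 0; i < 10; i++) total += i; return total; }"
program_b = b"def greet(name):\n    print('hello, ' + name)\n\ngreet('world')\n"

print(shared_information_distance(program_a, program_a))  # close to 0 for a copy
print(shared_information_distance(program_a, program_b))  # close to 1 for unrelated code
```

A small distance means that knowing one program leaves little new information in the other, which is the signal of a likely copy.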
Figure (2): SID Phases (surviving labels: File 1, Token Seq 1)
2.3.2.5 AC
AC provides a website for detecting similarity between assignments or programs, and
anyone can use it free of charge. It offers statistical analysis and several
graphical visualizations that aid in interpreting the analysis results. The AC tools are
available for research and development at: http://tangow.ii.uam.es/ac.
1. Distance integration
This stage arranges the characters in sequence and converts them into a sequence of
tokens after removing comments and spaces from the source file.
This stage counts the tokens between two assignments using a parser (a compiler-style
component that compares the similarities between the two sequences) and gives the
percentage of similarity.
5. Semantic parsing performs the actual parsing by comparing the sequences and
produces the output.
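The token-counting stage described above, which reports a percentage of similarity between two assignments, can be sketched as a multiset overlap of tokens. This is a simplified illustration only: the function name and the overlap formula are assumptions, and this is not claimed to be the measure that AC itself computes.

```python
from collections import Counter


def token_similarity_percent(tokens_a, tokens_b):
    """Percentage of tokens shared by two token sequences (multiset overlap)."""
    count_a, count_b = Counter(tokens_a), Counter(tokens_b)
    # The intersection keeps, for each token, the smaller of the two counts.
    shared = sum((count_a & count_b).values())
    total = len(tokens_a) + len(tokens_b)
    return 100.0 * 2 * shared / total if total else 0.0


assignment_a = ["int", "ID", "=", "NUM", ";"]
assignment_b = ["int", "ID", "=", "ID", ";"]
print(token_similarity_percent(assignment_a, assignment_b))
```

Unlike an alignment, this count ignores token order, so it is cheap to compute but can overestimate similarity between programs that merely use the same vocabulary.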
Figure (3): Parsing Steps. Source string → lexical analysis (create tokens) → tokens → syntactic analysis (parse) → compiler, interpreter, or translator → output.
2.3.2.6 CodeMatch
CodeMatch compares every file in one directory with every file in another
directory, including all subdirectories if requested. CodeMatch produces a database that
can then be exported to an HTML basic report that lists the most highly correlated pairs
of files. You can click on any particular pair listed in the HTML basic report to see an
HTML detailed report that shows the specific items in the files (statements, comments,
identifiers, or instruction sequences) that caused the high correlation.
The Algorithms
Statement Matching
Comment Matching
Identifier Matching
For each file pair, the CodeMatch Identifier Matching algorithm counts the
number of matching identifiers that are not programming language keywords. In order
to determine whether an identifier is a programming language keyword, comparison is
done with a list of programming language keywords. Only non-keywords are compared
in order to find matching function names, variable names, and other identifiers.
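The identifier-matching idea can be sketched in a few lines of Python. The keyword list below is a small assumed sample for a C-like language, and the function names are hypothetical; CodeMatch's real implementation and keyword lists are not reproduced here.

```python
import re

# Assumed sample keyword list for a C-like language; a real list is per language.
KEYWORDS = {"int", "float", "char", "void", "if", "else", "for", "while", "return"}


def identifiers(source: str) -> set[str]:
    """Return the set of non-keyword identifiers found in the source text."""
    words = re.findall(r"[A-Za-z_]\w*", source)
    return {word for word in words if word not in KEYWORDS}


def matching_identifier_count(source_a: str, source_b: str) -> int:
    """Count identifiers (excluding keywords) that appear in both files."""
    return len(identifiers(source_a) & identifiers(source_b))


file_a = "int total = 0; for (int i = 0; i < n; i++) total += price[i];"
file_b = "float total = 0; while (k < n) total += cost[k];"
print(matching_identifier_count(file_a, file_b))
```

Here the two files share the identifiers `total` and `n`, while `int`, `float`, `for`, and `while` are excluded as keywords. This naive sketch also picks up words inside comments and string literals, which a real tool would treat separately.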
Correlation Score
Finally, a single correlation score is given for the similarity of the file pairs. If a
file pair has a higher score, it implies that these files are more similar and may be
plagiarized from each other. CodeMatch reduces the effort needed by the expert by
allowing him to narrow his focus from hundreds of thousands of lines of code in
hundreds of files to dozens of lines of code in dozens of files.
2.4 Conclusions
Based on the information presented in section 2.3, AC was selected for
the implementation of the project. The reasons for this selection are as
follows:
1. AC is free and open source.
2. AC supports both programming (source-code) files and natural-language files.
These points, in addition to the AC features stated in section 2.3.2.5, make it a good
candidate for this project. In this project we cannot use the AC code as is because it is
difficult to integrate with the Moodle code, so we analyzed the code and reimplemented
its main algorithms in PHP for use in the plagiarism detection system.
Table (5): Algorithms Comparison
Language (s) Java, C#, C, C, C++, Java, Java, C++. BASIC, C, C, Java, Pascal C, Java,
Supported C++, C#, C++, and natural natural
Scheme and Python, Visual C#, Delphi, language. language.
natural Basic, Flash
language Javascript, ActionScript,
text. FORTRAN, Java,
ML, Haskell, JavaScript,
Lisp, MASM,
Scheme, Pascal,
Pascal, Perl, PHP,
Modula2, Perl, PowerBuilder,
TCL, Ruby, SQL,
Matlab, Verilog,
VHDL, VHDL.
Verilog,
Spice, MIPS
Assembly
8086,
HCL2.
Cost Free but user Free but user Free and open Commercial Free and open Free and open
must create must sourced tool, free on sourced sourced
an account create an any code
account where
the total of all
files being
examined is
less than 1
megabyte
Requirements Web browser, A submission JDK 1.4 or JDK 1.4 or Java runtime
Java Runtime script later later environment
Environment for either
(JRE), Java UNIX or
1.5 or higher Windows
Security User id and User id and Runs locally Runs locally Runs locally
e-mail e-mail
needed needed
Table (5): Cont.
Speed Fast Fast Large sets of Large sets of Fast Fast
files will files will
require long require
amounts of long amounts
time to of
analyze time to analyze
Algorithms Greedy String Winnowing Token based String Greedy String Token based
Tiling Algorithm matching matching Tiling matching
algorithm for algorithm for
source-code, source-code,
string string
matching for matching for
natural natural
language language
texts texts
Overview Histogram, Ordered list Matching pair Ordered list Ordered list Ordered list
results display statistics Tree
method about
the files that
were
analyzed
Display of Powerful Powerful Powerful Powerful Results Powerful
Results graphical graphical graphical graphical displayed in a graphical
interface for interface for interface for interface for dialogue interface for
presenting presenting presenting presenting presenting
results results results results results
Table (6): Cont.
Visualization of Cross linked 2/4 scroll Table List of results existence
Results listings, bars, showing showing the graphical and
scroll simultaneous the similar detected code chart tool.
bars scrolling lines fragments
of code between file
detected pairs.
among
file pairs.
Visual display of Yes Yes Yes No No Yes
matched file
pairs
Metrics Percentage Percentage Percentage Percentage Token matches, Percentage
Produced similarity, similarity, similarity, similarity, lines matched similarity, token
token token token lines matches.
matches matches, matches matched
lines
matched
Chapter Three
System
Requirements
2. Ease of use
Teachers will interact with the system to generate plagiarism reports
through a user-friendly graphical user interface. Furthermore, the
generated reports will contain both textual and visual (bars, charts, etc.)
representations of the results.
3.2.1 Development requirements
1. Development hardware resources
Table (7) below shows the hardware resources that are needed during the
development phase along with their costs.
Total 1609$
Table (8) below shows the software resources that are needed during the
development phase along with their costs.
1
(http://www.amazon.com, accessed: 10/10/09)
3. Development human resources
The cost of development is zero because the developers of the system are
the project group team and the system development is part of their graduation
requirements.
Table (10) below shows the software resources that are needed during the
implementation phase along with their costs.
2
(http://www.php.net/, accessed: 10/10/09)
3. Operational human resources
The human operational cost is zero, because the e-learning supervisor and
network supervisor are employees at Palestine Polytechnic University and they
are already paid salaries for their jobs.
Chapter Four
System
Specification
4.2 Class Diagram
The Class Diagram is used to describe the main classes and their roles in the
system. This diagram is shown in Figure (5).
4.3 Object Diagram
The Object Diagram is used to describe the main objects in the system, as shown in
Figure (6).
Figure (7): Teacher Sequence Diagram
Figure (8): Student Sequence Diagram
4.5.2 Student collaboration diagram
Figure (10) shows the data that flows between objects during the student operation.
4.6.2 Student state diagram
Figure (12) shows the different states of the objects, depending on their attributes,
during the student operation.
4.7 Activity Diagram
4.7.1 Teacher activity diagram
Figure (13) presents the different activities of the teacher.
4.7.2 Student activity diagram
Figure (14) presents the different activities of the student.
Chapter Five
Design
Context table
Table (13): Context Table
Context
id: INT
contextlevel: INT
instance: INT
path: VARCHAR(255)
depth: INT
User table
Table (14): User Table
User
id: INT
name: VARCHAR(255)
major: VARCHAR(255)
address: VARCHAR(255)
phone: INT
3
(These tables are a subset of the complete Moodle database; a subset of the schema
can be found in Appendix (B).)
5.2 Interface Design
Login form
This form is used to authenticate users. The user (teacher or student) enters
his / her credentials in the login form (see Figure (15)). Based on successful
authentication, the user will be redirected to the course page.
Select course
Through this form, the user selects a course from the list of courses he /
she is registered in (see Figure (16)).
Select assignment
Through this form, the user selects an assignment (see Figure (17)).
Submit Assignment
Finally, the student submits his / her solution on the upload form as
shown in Figure (18).
Submitted Assignment
After the teacher clicks the link above, the submitted assignments will
be displayed as shown in Figure (20).
Figure (20): Submitted Assignments
Figure (21): Plagiarism Detection Button
Comparison Result
Figure (22) shows sample cheating (comparison) results.
5.3 Flowcharts
Student operation
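Figure (23): Student Operation. Flow: Start → Correct input? (Yes) → Student page → Select course → Select assignment → Submit assignment → End. The chart also shows a No branch at the input check.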
Teacher operation
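Figure (24): Teacher Operation. Flow: Start → Correct input? (Yes) → Teacher page → Select course → Select assignment → Show submitted assignment → Compare submitted assignment → Show result → End. The chart also shows a No branch at the input check.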
Chapter Six
Implementation
Moodle requirements
1. Hardware
1. Disk space: 160MB free (min). To store the teaching materials it requires
more free space.
2. Memory: 256MB (min), 1GB (recommended). Moodle can support 50
concurrent users for every 1GB of RAM, but this will vary depending on
the specific hardware and software combination.
3. Hardware capacity can limit the number of users that a Moodle site can handle.
2. Software
1. Web server software: Moodle can work fine under any web server that
supports PHP, such as IIS on Windows platforms.
2. PHP scripting language. There are currently two versions of PHP
available: PHP4 and PHP5.
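The memory guideline in the hardware list above (50 concurrent users for every 1GB of RAM) can be turned into a quick estimate. The Python sketch below is only a rule-of-thumb calculation; the function name is hypothetical, and, as the guideline itself notes, real capacity varies with the specific hardware and software combination.

```python
def estimated_concurrent_users(ram_gb: float, users_per_gb: int = 50) -> int:
    """Rough estimate from the guideline of 50 concurrent users per 1 GB of RAM."""
    return int(ram_gb * users_per_gb)


for ram_gb in (0.25, 1, 4):
    print(f"{ram_gb} GB RAM: about {estimated_concurrent_users(ram_gb)} concurrent users")
```

For example, the recommended 1GB of memory corresponds to roughly 50 concurrent users under this guideline, and a 4GB server to roughly 200.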
To install Moodle:
Click the Start Moodle.exe button. A command prompt (CMD) window then
appears; it displays the Moodle version and other
information, and it offers two choices: 0 to exit and 1 to refresh. To start, enter
1, and the XAMPP server will start. The screen is shown in Figure (25).
Figure (25): Start Moodle
After the XAMPP server is running, enter http://localhost in the browser address bar;
the Moodle language selection page will then appear as shown in Figure (26).
After selecting the language, you will be redirected to the PHP settings screen
as shown in Figure (27).
The next screen (see Figure (28)) will ask the administrator to fill in the
following information:
1. Web address: specify the full web address where Moodle will be
accessed.
2. Moodle directory: specify the full directory path to this installation.
3. Data directory: a place where Moodle can save uploaded files.
Figure (28): Confirm Moodle Location
The next step is to configure the Moodle database settings. As shown in Figure (29),
the installer creates this database automatically.
The next step (Figure (30)) is to check whether the various components of the system
meet the system requirements.
If you want to change the installation language, you need to download the language
pack as shown in Figure (31).
Next, the GPL license will appear as can be seen in Figure (32).
Then configure an account for the main administrator, who has complete
control over the site. Give this account a secure username and password
(see Figure (34)).
their operations or business rules from both the user interface, from the database, and
from any other legacy applications that might be used.
4
The Computer Center in PPU
Figure (36): Three-Tier Class Diagram
Chapter Seven
Testing
The unit and module testing has been performed using the black box testing
method; we suggest some possible problems and test the system performance against
these problems.
Table (15): Problems Tested on the Teacher Object
Inputs | Expected values | Actual values | Notes
Valid teacher username and invalid password | Error message | Error message | Match
Table (15): Cont.
Select a course that the teacher is registered in | Open course main page | Open course main page | Match
Select the assignment | Open assignment main page | Open assignment main page | Match
Select "No Submitted Assignments" | Error message (No Submitted Assignments) | Error message (No Submitted Assignments) | Match
Select "# Submitted Assignments" | Open the list of names of students who submitted the assignment | Open the list of names of students who submitted the assignment | Match
Check the plagiarism detection | View plagiarism report | View plagiarism report | Match
7.5 Interface Testing
This section simulates (tests) how a teacher uses all the system screens.
The sequence and displayed results show that all main system screens work as
expected. First, the teacher opens the Moodle website (http://elearning.ppu.edu/) and
sees the main Moodle page, which contains interactive objects such as the
organization name (PPU in our case), the course names, and the website description. To
enter the system, the teacher clicks the Login button or enters his / her username
and password in the login form directly (see Figure (37)).
If he / she clicks the button, he / she will be redirected to the login page; on
this page, the teacher should enter his / her username and password correctly.
Figure (38): Login to the Site
If the entered username and password are correct, Moodle will redirect the
teacher to the main page, which contains the names of the courses that he / she is
teaching, the course descriptions, and the name(s) of the teacher(s) teaching each
course, as shown in Figure (39).
When the teacher clicks on a course name, he / she will be redirected to the main
page of the selected course. This page contains the course name, course participants,
course assignments, and other activities. The user can then click on the name of any
available assignment (if any), as shown in Figure (41).
Figure (41): Select an Assignment
After clicking on the link, the teacher is redirected to the submission page,
which lists the submitted assignments together with the names of the students who
submitted them and some other information.
The teacher can view these assignments, add comments, add the assignment
grade, and update the assignment. By clicking on the corresponding link, the teacher
can view the plagiarism report, which compares the assignments with one another.
When the teacher clicks on the view similarity table link, he / she is redirected
to the plagiarism form, which shows the results of comparing the submitted
assignments, as shown in Figure (44).
Figure (44): Similarity Page
The teacher can view the visual report by clicking on the view chart link,
as shown in Figure (45).
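The similarity table and the chart described above can be pictured with a small sketch. It is written in Python purely for illustration (the project itself is implemented in PHP), and the function names, student names, scores, and text layout are all assumptions rather than the system's actual output format.

```python
from typing import Dict, List, Tuple

# Hypothetical result type: (student A, student B) -> similarity in [0, 1].
Scores = Dict[Tuple[str, str], float]


def similarity_table(scores: Scores) -> List[str]:
    """Return a text table with one row per pair, highest similarity first."""
    rows = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    lines = [f"{'Student A':<12}{'Student B':<12}{'Similarity':>10}"]
    for (a, b), value in rows:
        lines.append(f"{a:<12}{b:<12}{value:>10.0%}")
    return lines


def bar_chart(scores: Scores, width: int = 20) -> List[str]:
    """Return one text bar per pair; a 100% match fills `width` characters."""
    rows = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    lines = []
    for (a, b), value in rows:
        bar = "#" * round(value * width)
        lines.append(f"{a} / {b}: {bar} {value:.0%}")
    return lines


if __name__ == "__main__":
    # Made-up sample data, not results from the report.
    sample = {("Ali", "Sara"): 0.8, ("Ali", "Omar"): 0.2}
    print("\n".join(similarity_table(sample)))
    print("\n".join(bar_chart(sample)))
```

Sorting both views by descending similarity puts the pairs most likely to need a teacher's attention at the top.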
The teacher can export the results of the cheating report to an Excel sheet or a Word
document by clicking on the Excel sheet or Word document link, as shown in Figure (46).
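As a rough illustration of the export step, the sketch below writes pairwise results to CSV, a format that Excel can open. The report does not state which file format the system actually produces, so the CSV choice, the column names, and the function name are assumptions; the example is in Python, whereas the project itself is implemented in PHP.

```python
import csv
import io
from typing import Dict, Tuple


def export_csv(scores: Dict[Tuple[str, str], float]) -> str:
    """Serialize pairwise similarity scores as CSV text (hypothetical layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Student A", "Student B", "Similarity (%)"])
    for (a, b), value in sorted(scores.items()):
        # Scores are assumed to be fractions in [0, 1]; export as percentages.
        writer.writerow([a, b, f"{value * 100:.1f}"])
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up sample data, not results from the report.
    print(export_csv({("Ali", "Sara"): 0.8, ("Ali", "Omar"): 0.2}))
```

Using the standard `csv` module means that names containing commas or quotes are escaped correctly without extra code.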
Chapter Eight
Conclusions
8.1 Conclusions
8.2 Future Work
This system aims to extend Moodle (the well-known open-source e-learning
system). The presented system allows teachers to detect plagiarism among
students' submitted assignments. This functionality has been successfully embedded
into Moodle, and it is now ready for use by university teachers.
8.1 Conclusions
1. The system extends Moodle and is open source.
2. The plagiarism detection functionality of the system is based on a well-known
algorithm in this field called AC.
3. The system compares the students' submitted assignments with one another and
reports the similarity rate between them.
4. The system generates cheating reports in both tabular and bar chart formats. It
also enables the teacher to export cheating results to Excel spreadsheets or Word
documents.
5. The system currently supports a limited set of file types: for natural-language
assignments, it supports .text, .doc, and .docx files; for source code assignments,
it supports .php, .C, .C++, and .java files.
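The pairwise comparison summarized in the conclusions can be sketched as follows. This section does not spell out how AC scores a pair of files, so the sketch substitutes a normalized compression distance computed with zlib. That choice, the function names, and the sample inputs are assumptions made for illustration only, and the example is in Python although the project itself is written in PHP.

```python
import zlib
from itertools import combinations
from typing import Dict, Tuple


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed form of `data`."""
    return len(zlib.compress(data, 9))


def similarity(a: bytes, b: bytes) -> float:
    """Compression-based similarity in [0, 1]; higher means more alike.

    Assumed stand-in measure: 1 minus the normalized compression distance.
    If `b` repeats `a`, compressing them together costs little more than
    compressing `a` alone, so the distance is small.
    """
    size_a, size_b = compressed_size(a), compressed_size(b)
    size_ab = compressed_size(a + b)
    distance = (size_ab - min(size_a, size_b)) / max(size_a, size_b)
    return max(0.0, min(1.0, 1.0 - distance))


def compare_all(submissions: Dict[str, bytes]) -> Dict[Tuple[str, str], float]:
    """Score every unordered pair of submissions, keyed by sorted student id."""
    return {
        (first, second): similarity(submissions[first], submissions[second])
        for first, second in combinations(sorted(submissions), 2)
    }


if __name__ == "__main__":
    # Made-up submissions; "s1" and "s2" are identical on purpose.
    essay = b"Plagiarism is the use of another author's work as one's own. " * 5
    other = b"Moodle is an open-source platform for building online courses. " * 5
    for pair, score in compare_all({"s1": essay, "s2": essay, "s3": other}).items():
        print(pair, f"{score:.2f}")
```

A compression-based measure needs no language-specific parser, which is one reason it can be applied to both text and source code files. Very short files compress poorly, so scores for them should be read with caution.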
References
[1] Sanjay Goel, Deepak Rao et al.: Plagiarism and its Detection in Programming
Languages, December 15, 2005.
[2] Sanjay Goel, Deepak Rao et al.: Plagiarism and its Detection in Programming
Languages, December 15, 2005.
[5] Manuel Freire, Manuel Cebrian and Emilio del Rosal: An Integrated Source Code
Plagiarism Detection Environment, Escuela Politecnica Superior, Universidad
Autonoma de Madrid, 28049 Madrid, Spain.
[6] Xin Chen, Brent Francia, Ming Li, Brian Mckinnon, and Amit Seker: Shared
Information and Program Plagiarism Detection, University of California, Santa
Barbara, December 13, 2003.
Figure References
Figure (1):
http://www.cis.famu.edu/~ejones/papers/stephens-plagiarism-ccscsc.pdf,
(accessed: 23/11/2009)
Figure(2):
Figure (3):
Xin Chen, Brent Francia, Ming Li, Brian Mckinnon, and Amit Seker: Shared
Information and Program Plagiarism Detection, University of California,
Santa Barbara, December 13, 2003.
Appendices
User Manual
User Sub-Manual
For Using Plagiarism Detection
System*
June 2010
This user sub-manual is under the GNU General Public License (GPL).
* This user sub-manual is for our project only; we aim to integrate it into the
original Moodle user manual once our system is integrated with Moodle.
1.1 Introduction
Plagiarism can be defined as the use or close imitation of the language and
thoughts of another author and the representation of them as one's own original work.
In this project, the system is applied to Moodle, the e-learning platform used at
Palestine Polytechnic University (PPU). Moodle has become very popular
among educators around the world as a tool for creating dynamic online web sites for
their students. To work, it needs to be installed on a web server, either on
one of your own computers or at a web hosting company. It is an open-source
software package for producing Internet-based courses and web sites, and a global
development project designed to support a social constructionist framework of
education.
Moodle is provided freely as Open Source software under the GNU General
Public License (GPL) Version 2, June 1991. Basically this means Moodle is
copyrighted, but that you have additional freedoms. You are allowed to copy, use, and
modify Moodle.
1.2 System Features
The system is open-source software that integrates with an existing
open-source e-learning platform (namely: Moodle). The user does not need to know
how this system works; he / she only needs to know how to use Moodle as a teacher.
1. When you open the Moodle website (http://elearning.ppu.edu/), you will see the
main page of Moodle, which contains interactive objects such as the name of the
organization that uses Moodle, the course names, and the website description. To
enter the system, please click on the button, or enter your username and password
directly in the login form, as shown in Figure (1).
Figure (1): Login Form
2. If you click on the button, you will see the login page. Please enter your username
and password in the designated fields, as shown in Figure (2).
3. After you have entered your username and password correctly, Moodle will
redirect you to the main page, which contains the names of the courses that you are
teaching, the course descriptions, and the names of the teacher(s) teaching each
course, as shown in Figure (3).
4. Please click on the name of the course that you want; this will redirect you to the
main page for this course, as shown in Figure (4).
Figure (4): Select the Course
5. After clicking on the name of the selected course, you will see its main page.
It contains the course name, the course participants, the course assignments, and
other activities. Please click on the name of the assignment you want,
as shown in Figure (5).
6. The main page of the assignment is displayed. This page contains the assignment
description, the number of students who submitted the assignment, and a button for
viewing the submitted assignments; please click on it, as shown in Figure (6).
7. After clicking on the link, the teacher is redirected to the submission page,
which lists the submitted assignments together with the names of the students who
submitted them and some other information.
The teacher can view these assignments, add comments, add the assignment
grade, and update the assignment. By clicking on the corresponding link, the teacher
can view the plagiarism report, which compares the assignments with one another.
Figure (7): Submission Page
8. When the teacher clicks on the view similarity table link, he / she is redirected to the
plagiarism form, which shows the results of comparing the submitted
assignments, as shown in Figure (8).
Figure (8): Similarity Page
9. The teacher can view the visual report by clicking on the view chart link, as shown in
Figure (9).
10. The teacher can export the results of the cheating report to an Excel sheet or a Word document
by clicking on the Excel sheet or Word document link, as shown in Figure (10).
1.4 Glossary
1. Plagiarism Detection System: is open-source software used to locate
instances of plagiarism within documents, whether text or code, and to
report the plagiarism cases; it is integrated with Moodle.
2. Course Management System (CMS): is a tool that allows instructors to post
course-related information on the web, such as course chapters, videos,
audio, assignments, and so on; it facilitates course management by the
instructors.
3. Learning Management System (LMS): is a software application for the
administration, documentation, tracking, and reporting of training programs,
classroom and online events, e-learning programs, and training content.
4. Virtual Learning Environment (VLE): is a software system designed to
support teaching and learning in an educational setting, as distinct from a
Managed Learning Environment (MLE), where the focus is on management.
5. Open-Source: describes practices in production and development that promote
access to the product’s source materials. Before the term open source became
widely adopted, developers and producers used a variety of phrases to describe
the concept; open source gained hold with the rise of the Internet, and the
attendant need for massive retooling of the computing source code.
6. Modular Object-Oriented Dynamic Learning Environment (Moodle): is an
Open Source CMS, also known as an LMS or a VLE. It has become a popular
e-learning platform in higher education for creating dynamic online web sites
for students (http://www.moodle.org/about/).
Appendix (B)
Moodle Database
Schema