
Singapore Management University
Institutional Knowledge at Singapore Management University
Research Collection School Of Computing and Information Systems
School of Computing and Information Systems

4-2019

DeepReview: Automatic code review using deep multi-instance learning
Hengyi LI
Nanjing University

Shuting SHI
Nanjing University

Ferdian THUNG
Singapore Management University, [email protected]

Xuan HUO
Nanjing University

Bowen XU
Singapore Management University, [email protected]

Ming LI
Nanjing University

David LO
Singapore Management University



Citation
LI, Hengyi; SHI, Shuting; THUNG, Ferdian; HUO, Xuan; XU, Bowen; LI, Ming; and LO, David. DeepReview:
Automatic code review using deep multi-instance learning. (2019). Advances in knowledge discovery and
data mining: 23rd Pacific-Asia Conference, PAKDD 2019, Macau, China, April 14-17: Proceedings. 11440,
318-330.
Available at: https://ink.library.smu.edu.sg/sis_research/4346

DeepReview: Automatic Code Review
Using Deep Multi-instance Learning

Heng-Yi Li¹, Shu-Ting Shi¹, Ferdian Thung², Xuan Huo¹, Bowen Xu²,
Ming Li¹, and David Lo²

¹ National Key Laboratory for Novel Software Technology, Nanjing University,
Nanjing 210023, China
{lihy,shist,huox,lim}@lamda.nju.edu.cn
² School of Information Systems, Singapore Management University,
Singapore, Singapore
{ferdiant.2013,bowenxu.2017}@phdis.smu.edu.sg, [email protected]

Abstract. Code review, an inspection of code changes carried out in order to identify and fix defects before integration, is essential in Software Quality Assurance (SQA). Code review is a time-consuming task, since reviewers need to understand, analyze, and comment on the changes manually. To alleviate the burden on reviewers, automatic code review is needed; however, this task has not been well studied before. To bridge this research gap, in this paper we formalize automatic code review as a multi-instance learning task in which each change, consisting of multiple hunks, is regarded as a bag, and each hunk is described as an instance. We propose a novel deep learning model named DeepReview based on Convolutional Neural Networks (CNN), an end-to-end model that learns feature representations to predict whether a change is approved or rejected. Experimental results on open source projects show that DeepReview is effective in automatic code review tasks. In terms of F1 score and AUC, DeepReview outperforms the traditional single-instance based model TFIDF-SVM and the state-of-the-art deep feature based model Deeper.

Keywords: Software mining · Machine learning · Multi-instance learning · Automatic code review

1 Introduction

Software Quality Assurance (SQA) is essential in software development. Software code review [16] is an important inspection of code changes by an independent third-party developer, carried out to identify and fix defects before integration. Effective code review can largely improve software quality. However, code review is a very time-consuming task: the reviewer needs to spend much time understanding, analyzing, and commenting on each code review request. Additionally, with the rapid growth of software, the number of review requests keeps growing, which leads to a heavier burden on code reviewers. Therefore, automatic code review is important to alleviate this burden.

Fig. 1. An example of the rejected change JdbcRepository.java of review request 26657 from Apache. This change contains four hunks and only one hunk is rejected.

Recently, some studies have been proposed to improve the effectiveness of code review [1,16]. Thongtanunam et al. [16] revealed that 4%–30% of reviews have code-reviewer assignment problems. They proposed a code-reviewer recommendation approach named REVFINDER that solves this problem by leveraging file location information. Ebert et al. [1] proposed to identify the factors that confuse reviewers and to understand how confusion impacts the efficiency of code reviewers. However, the task of automatic code review itself has not been well studied previously.
Considering the above issues, an automated approach is needed that can help reviewers review the code submitted by developers. Usually, a review request submitted by a developer contains some source code changes in the form of diff files, together with a textual description indicating the intention of the change. Notice that each change may contain multiple change hunks, where each hunk corresponds to a set of continuous lines of code. For example, Fig. 1 shows the change in the file JdbcRepository.java of review request 26657 from the Apache project. This change contains four hunks. One of the most common ways to analyze such a change is to combine all hunks together and generate a unified feature to represent the change. However, this method may lead to two problems. First, the hunks appearing in a change may be discontinuous and unrelated to one another; directly combining them may generate misleading feature representations, leading to poor prediction performance. Second, when a change is rejected, not every hunk in the change is rejected: some hunks have no issues and can be approved by reviewers, so the approved hunks and the rejected hunks should not be processed together for feature extraction. Therefore, separately generating features from each individual hunk is needed in automatic code review. If the label (approved or rejected) of each hunk were available, we could directly build classification models on hunk data. However, in code review tasks, the label of each hunk is hard to obtain, while the label of each change can be extracted. A question arises here: can we build a model that generates hunk-level feature representations for automatic code review based only on change-level labels?
To solve this problem, we formulate automatic code review as a binary classification task in the multi-instance learning setting. Instead of regarding each change as an individual instance, as in traditional machine learning methods, multi-instance learning regards each change as a bag of instances, where each hunk of the change is described as an instance. The basic assumption in multi-instance learning is that if one instance is positive then the bag is also positive, which is consistent with the code review task: if one hunk is rejected then the whole change is rejected. In this paper, we propose a deep learning model named DeepReview based on Convolutional Neural Networks (CNN) via multi-instance learning, which is able to automatically learn semantic features from each hunk and predict whether a change is approved or rejected. Additionally, in order to obtain features that capture the difference introduced by a code change, DeepReview first recovers the code before the change (old source code) and after the change (new source code) according to the diff markers. These snippets are then fed into the deep model to generate feature representations, based on which the label of each change is predicted. We conduct experiments on large datasets collected from open source Apache projects for evaluation. The results in terms of the widely used metrics AUC and F1 score indicate that DeepReview is effective in automatic code review and outperforms state-of-the-art feature representation methods previously used for related software engineering tasks.
The contributions of our work are twofold:

– We are the first to study the automatic code review task as a multi-instance learning task. A change always contains multiple hunks, where each hunk is described as an instance and the change is represented as a bag of instances. Experimental results on five large datasets show that the proposed multi-instance model is effective in automatic code review tasks.
– We propose a novel deep learning model named DeepReview based on Convolutional Neural Networks (CNN), which learns semantic feature representations from source code changes and change descriptions to predict whether a change is approved or rejected.

2 The DeepReview Approach

In this section, we introduce the details of applying DeepReview to automatic code review. The goal of this task is to predict whether a code change in a review request submitted by developers is approved or rejected. The general process of automatic code review based on a machine learning model is illustrated in Fig. 2.

Fig. 2. The general automatic code review process based on a machine learning model: (1) collecting and processing data; (2) extracting features to generate training instances; (3) building a prediction model; (4) predicting whether a new change is rejected or approved.

The automatic code review prediction process mainly contains several parts:
– Collecting data from code review systems and processing data.
– Generating feature representations of the input data.
– Training a classifier based on the generated features and labels.
– Predicting if a new change is approved or rejected.
In the following subsections, we first introduce the general framework of DeepReview in Subsect. 2.1; the data processing is described in Subsect. 2.2, and the core parts of DeepReview are elaborated in Subsects. 2.3 and 2.4.

2.1 The Framework of DeepReview


We first introduce some notation for our framework. Let $C^o = \{c_1^o, c_2^o, \ldots, c_N^o\}$ and $C^n = \{c_1^n, c_2^n, \ldots, c_N^n\}$ denote the collections of old code and new code, and let $D = \{d_1, d_2, \ldots, d_N\}$ denote the collection of change descriptions, where $N$ is the number of changes. In this paper, we formalize code review as a learning task that attempts to learn a prediction function $f: \mathcal{X} \rightarrow \mathcal{Y}$. Each change is denoted by $x_i = (c_i^o, c_i^n, d_i) \in \mathcal{X}$, where $c_i^o$ and $c_i^n$ denote the $i$-th old code (before the change) and new code (after the change), respectively. Here $c_i^o = \{h_{i1}^o, h_{i2}^o, \ldots, h_{im}^o\}$ and $c_i^n = \{h_{i1}^n, h_{i2}^n, \ldots, h_{im}^n\}$ contain multiple hunks, where $m$ is the number of hunks, and $d_i$ denotes the text description of the $i$-th change. The label $y_i \in \mathcal{Y} = \{1, 0\}$ indicates whether the change is approved or rejected.
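To make the bag-of-instances formulation concrete, the following minimal Python sketch (our illustration; the class and field names are hypothetical, not from the paper) shows how a change $x_i = (c_i^o, c_i^n, d_i)$ carries a change-level label but no hunk-level labels:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Hunk:
    old_code: List[str]      # tokens of the code before the change (h_ij^o)
    new_code: List[str]      # tokens of the code after the change (h_ij^n)

@dataclass
class Change:
    hunks: List[Hunk]        # the bag: one instance per hunk
    description: List[str]   # tokens of the change description d_i
    rejected: bool           # change-level label y_i; hunk-level labels are unavailable
```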
We instantiate the code review prediction model by constructing a multi-instance learning based deep neural network named DeepReview. The general framework of DeepReview is illustrated in Fig. 3. The DeepReview model contains three parts: input layers, instance feature generation layers, and multi-instance based prediction layers.
In the DeepReview model, each hunk of a source code change is regarded as an instance. In the input layers, the source code and text description of each instance are encoded as feature vectors and then fed into the neural network for processing. The details of data processing in the input layers are discussed in Subsect. 2.2. The encoded data of each instance is then fed into the instance feature generation layers.

Fig. 3. The general framework of DeepReview for automatic code review prediction. The DeepReview model contains three parts: input layer, instance feature generation layer, and multi-instance based prediction layer.

In these layers, DeepReview utilizes different convolutional neural networks (CNN) to extract features from the source code input and the textual description input. The convolutional neural network for programming language processing (called PCNN) is carefully designed with respect to the characteristics of source code and is similar to the network structure in [4]. The convolutional neural network for textual description processing (called NCNN) follows the standard architecture in [6]. The generated middle-level features of the old code, new code, and textual description of each instance are then fused to learn a unified feature representation via fully-connected layers. Finally, after generating the unified feature representations, the DeepReview model makes a prediction for each change in the multi-instance manner in the multi-instance based prediction layers.
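The paper does not include an implementation; the following PyTorch sketch (our reconstruction, with placeholder hidden sizes) illustrates how the three parts could fit together for a single change, using the convolution window sizes 2 and 3 with 100 feature maps each reported in Sect. 3.1:

```python
import torch
import torch.nn as nn

class DeepReviewSketch(nn.Module):
    """Illustrative three-part DeepReview architecture (layer sizes are placeholders)."""
    def __init__(self, emb_dim=300, n_filters=100, hidden=128):
        super().__init__()
        # PCNN: one network whose weights are shared across hunks and across old/new code
        self.pcnn = nn.ModuleList([nn.Conv1d(emb_dim, n_filters, k) for k in (2, 3)])
        # NCNN: a standard sentence CNN [6] for the change description
        self.ncnn = nn.ModuleList([nn.Conv1d(emb_dim, n_filters, k) for k in (2, 3)])
        self.fuse = nn.Sequential(nn.Linear(3 * 2 * n_filters, hidden), nn.ReLU())
        self.score = nn.Linear(hidden, 1)     # per-hunk logit

    def _conv_pool(self, convs, x):           # x: (batch, seq_len, emb_dim)
        x = x.transpose(1, 2)                 # Conv1d expects (batch, emb_dim, seq_len)
        return torch.cat([torch.relu(c(x)).max(dim=2).values for c in convs], dim=1)

    def forward(self, old_hunks, new_hunks, desc):
        # old_hunks, new_hunks: (m, seq_len, emb_dim) for one change; desc: (1, seq_len, emb_dim)
        z_o = self._conv_pool(self.pcnn, old_hunks)   # shared PCNN over the m old hunks
        z_n = self._conv_pool(self.pcnn, new_hunks)   # the same PCNN over the m new hunks
        z_t = self._conv_pool(self.ncnn, desc)        # description feature, shared by all hunks
        z_h = torch.cat([z_o, z_n, z_t.expand(z_o.size(0), -1)], dim=1)   # Eq. (1) below
        p = torch.sigmoid(self.score(self.fuse(z_h))).squeeze(1)          # per-hunk predictions
        return p.max()                                # bag-level prediction via max pooling
```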

2.2 Data Processing

The data used for automatic code review are the changed source code submitted by developers, which always appears in the form of diffs and contains both source code and diff markers (e.g., + stands for adding a line, - stands for deleting a line). The main features in code changes are the differences between the code before and after the change.

Fig. 4. Automatic code review by DeepReview. When a change is processed for prediction, its three parts (old code, new code, and text description) are first encoded as feature vectors and fed into the deep model. Convolutional neural networks then extract semantic features from the source code and the text description separately, after which a fully-connected network produces a fused feature for each hunk. Finally, another fully-connected network and a max-pooling layer generate a prediction indicating whether the change is approved or rejected.

In the data preprocessing shown in the left part of Fig. 4, we extract both the old code (before the change) and the new code (after the change) from the diffs as input. We also use the change descriptions, since they state the goal of the change and help improve prediction performance.
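A minimal sketch of this recovery step, based only on the +/- markers described above (a simplification that ignores @@ hunk headers and file-level metadata):

```python
def split_hunk(diff_lines):
    """Recover the code before (old) and after (new) the change from one diff hunk."""
    old_code, new_code = [], []
    for line in diff_lines:
        if line.startswith('+'):        # added line: present only in the new code
            new_code.append(line[1:])
        elif line.startswith('-'):      # deleted line: present only in the old code
            old_code.append(line[1:])
        else:                           # context line: shared by both versions
            stripped = line[1:] if line.startswith(' ') else line
            old_code.append(stripped)
            new_code.append(stripped)
    return old_code, new_code
```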
After splitting the diff files into old code, new code, and text description, a pre-trained word2vec [10] model is used to encode every token as a vector representation (e.g., a 300-dimensional vector); this technique has been shown effective on textual data and is widely used in text processing tasks [6,10]. In a similar way, we split the descriptions into words and encode them as vector representations too. All these vector representations are fed into the deep neural network to learn semantic features.
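A sketch of this encoding step, assuming the gensim implementation of word2vec (the paper does not name a library) and 300-dimensional vectors:

```python
import numpy as np
from gensim.models import Word2Vec

# corpus: token sequences from old code, new code, and change descriptions
corpus = [["public", "void", "insert"], ["fix", "jdbc", "repository", "retry"]]
w2v = Word2Vec(sentences=corpus, vector_size=300, min_count=1)   # pre-training

def encode(tokens, model, dim=300):
    """Map each token to its word2vec vector; unseen tokens fall back to zeros."""
    return np.stack([model.wv[t] if t in model.wv else np.zeros(dim)
                     for t in tokens])

x = encode(["public", "void"], w2v)   # shape (2, 300), ready for the CNN input layer
```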

2.3 Instance Feature Generation Layer

DeepReview takes the old source code (before the change) and the new source code (after the change), along with the text description, as inputs. Since source code and text descriptions have different structures, we use the PCNN network for code and the NCNN network for text to extract features, respectively.
As mentioned above, each change may contain multiple hunks, and different hunks are individual instances; therefore the instance features should be extracted separately by the same neural network. In other words, the weights of the PCNN are shared across all code hunks. In this way, we obtain unbiased feature representations for each hunk from both the old code and the new code.

Suppose one change contains $m$ modified hunks. Let $(z_{i1}^o, z_{i2}^o, \ldots, z_{im}^o)$ denote the middle-level vectors of the old source code $c_i^o$, let $(z_{i1}^n, z_{i2}^n, \ldots, z_{im}^n)$ denote the middle-level vectors of the new source code $c_i^n$, and let $z_i^t$ denote the middle-level vector of the text description $d_i$. In the instance feature generation layers, DeepReview first concatenates these three parts for each instance as follows:

$$z_{ij}^h = z_{ij}^o \oplus z_{ij}^n \oplus z_i^t \qquad (1)$$

where $\oplus$ is the concatenation operation and the resulting $z_{ij}^h$ represents the features of the $j$-th hunk of the $i$-th change (i.e., one instance).
To capture the difference between the new code and the old code, as well as the relation between the code change and the change description, these concatenated features are then fed into fully-connected networks for feature fusion.

2.4 Multi-instance Based Prediction Layer

In the prediction layers, we first make a prediction for each hunk (i.e., each instance) using fully-connected networks followed by a sigmoid layer, based on the generated hunk representations. As before, the fully-connected networks share their weights across hunks so that the generated predictions are unbiased. This yields the per-hunk predictions $p_i = (p_{i1}, p_{i2}, \ldots, p_{im})$.
In the multi-instance setting, if any instance is positive (rejected), the bag is also positive (rejected). So the maximum of the hunk predictions is used to predict the label of each change: a max-pooling layer is employed to obtain the final prediction for the change, that is, $\hat{p}_i = \max\{p_{i1}, p_{i2}, \ldots, p_{im}\}$.
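As a small numeric illustration of this rule (hypothetical values):

```python
import torch

# per-hunk rejection probabilities p_i = (p_i1, ..., p_im) for one change
p = torch.tensor([0.10, 0.83, 0.25, 0.07])
p_hat = p.max()                  # bag-level prediction via max pooling
print(p_hat.item())              # 0.83: one bad hunk is enough to reject the change
```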
Specifically, the parameters of the convolutional layers can be denoted as $\Theta = \{\theta_1, \theta_2, \ldots, \theta_l\}$ and the parameters of the fully-connected layers as $W = \{w_1, w_2, \ldots, w_k\}$. The loss function implied in DeepReview is:

$$L(\Theta, W) = -\sum_{i=1}^{N} \left( c_a\, y_i \log \hat{p}_i + c_r\, (1 - y_i) \log(1 - \hat{p}_i) \right) + \lambda\, \Omega(f) \qquad (2)$$

where $L$ is a cost-sensitive cross-entropy loss, $\Omega(f)$ is the regularization term imposing regularization (e.g., L2 regularization) on the weights of the model, and $\lambda$ is the trade-off parameter balancing the two terms. $c_a$ denotes the cost of incorrectly predicting a rejected change as approved, and $c_r$ denotes the cost of incorrectly predicting an approved change as rejected. This objective function can be effectively optimized by the stochastic gradient descent (SGD) algorithm.
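A minimal sketch of this objective, assuming PyTorch, treating $y_i = 1$ as rejected, and using L2 regularization over all parameters; the costs c_a, c_r and the trade-off λ are hyperparameters the paper leaves unspecified:

```python
import torch

def deepreview_loss(p_hat, y, model, c_a=2.0, c_r=1.0, lam=1e-4):
    """Cost-sensitive cross-entropy with L2 regularization, as in Eq. (2).

    p_hat: (N,) bag-level predictions in (0, 1); y: (N,) labels with 1 = rejected.
    c_a penalizes predicting a rejected change as approved; c_r the reverse.
    """
    eps = 1e-7   # numerical guard for log
    ce = -(c_a * y * torch.log(p_hat + eps)
           + c_r * (1 - y) * torch.log(1 - p_hat + eps)).sum()
    l2 = sum((w ** 2).sum() for w in model.parameters())   # Omega(f)
    return ce + lam * l2
```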

3 Experiments

To evaluate the effectiveness of DeepReview, we conduct experiments on thousands of code reviews from open source software projects and compare with several state-of-the-art code review methods.

3.1 Experiment Settings


The datasets used in our experiments are from the Apache Code Review Board (https://reviews.apache.org/r/), which has also been analyzed by prior studies on code review [13,14]. We downloaded all reviews in October 2017 and selected only the code reviews in which the reviewers highlighted the line numbers they had issues with, for a total of 1,011 code reviews. We further extracted the five repositories with the largest number of involved files in the collected code reviews; the datasets and their statistics are shown in Table 1. Each repository has more than 1,000 involved files and at least 3,500 hunks.

Table 1. Statistics of our datasets.

Datasets        #changes  #hunks  #rejected
cloudstack-git  1,682     6,171   128
aurora          1,161     6,762   168
drill-git       1,015     3,575   43
accumulo        1,011     5,798   152
hbase-git       1,009     6,702   140

As Table 1 indicates, rejected hunks make up only a small fraction of all hunks, so the datasets are highly imbalanced. We therefore use F1 to evaluate performance; F1 has been widely used in imbalanced learning settings. Additionally, we report AUC, a non-parametric measure of model performance that is unaffected by class imbalance. These evaluation metrics have been adopted to evaluate many approaches that automate various software engineering tasks [4,5,9,12,17].
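Both metrics are available in scikit-learn; a minimal usage sketch with toy values (not from the experiments):

```python
from sklearn.metrics import f1_score, roc_auc_score

y_true = [1, 0, 0, 1, 0]              # 1 = rejected
scores = [0.9, 0.2, 0.6, 0.7, 0.1]    # model outputs
y_pred = [int(s > 0.5) for s in scores]

print(f1_score(y_true, y_pred))       # threshold-dependent, suited to imbalanced data
print(roc_auc_score(y_true, scores))  # threshold-free, unaffected by class imbalance
```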
We compare the proposed model DeepReview with the following baseline methods and several of its own variants:
– TFIDF-LR [2], which uses Term Frequency-Inverse Document Frequency (TFIDF) features to represent source code changes and Logistic Regression (LR) for classification.
– TFIDF-SVM, which uses TFIDF features to represent source code changes and a Support Vector Machine (SVM) for classification (see the sketch after this list).
– Deeper [20], one of the state-of-the-art deep learning models in software engineering, which extracts deep features from changes with DBN models and then applies Logistic Regression (LR) for classification.
– Deeper-SVM, a slight variant of Deeper, which uses the DBN model for feature extraction and then applies a Support Vector Machine for classification.
– DeepReview-SingleInstance, a variant of DeepReview which does not consider the multi-instance setting and instead concatenates all hunks together as one instance for input.

– DeepReview-diff, a variant of DeepReview which does not separate the code change and instead takes the diff markers and diff code directly as input.
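For reference, a minimal scikit-learn sketch of the TFIDF-SVM baseline mentioned above (our reconstruction; the paper gives no implementation details, and the documents below are toy placeholders):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# single-instance view: each change is flattened into one text document
changes = ["public void insert jdbc repository", "fix retry timeout handling"]
labels = [0, 1]   # 1 = rejected

tfidf_svm = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
tfidf_svm.fit(changes, labels)
print(tfidf_svm.predict(["jdbc insert repository"]))
```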

The settings of DeepReview and its variants are as follows: in the convolution layers, we use the activation function σ(x) = max(x, 0) (ReLU), and we set the sizes of the convolution windows to 2 and 3, with 100 feature maps each.

3.2 Experiment Results

For each dataset, 10-fold cross validation is repeated 5 times, and we report the average value for all compared methods in order to reduce evaluation bias. We also apply a statistical test to evaluate the significance of DeepReview's results: a pairwise t-test at the 95% confidence level.
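A sketch of this protocol, assuming scikit-learn for the cross-validation splits and scipy for the paired t-test (the score function and values are placeholders):

```python
import numpy as np
from scipy import stats
from sklearn.model_selection import StratifiedKFold

def repeated_cv_scores(X, y, fit_and_score, repeats=5, folds=10):
    """10-fold cross validation repeated 5 times; returns one score per fold."""
    scores = []
    for seed in range(repeats):
        cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed)
        for train_idx, test_idx in cv.split(X, y):
            scores.append(fit_and_score(train_idx, test_idx))
    return np.array(scores)

# pairwise t-test at the 95% confidence level on two methods' fold scores
a, b = np.random.rand(50), np.random.rand(50)   # placeholder score vectors
t_stat, p_value = stats.ttest_rel(a, b)
print(p_value < 0.05)   # True -> the difference is statistically significant
```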
We first compare our proposed model DeepReview with several traditional non-multi-instance models. One of the most common approaches is to employ the Vector Space Model (VSM) to represent the changes. In addition, we compare DeepReview with the latest deep learning based model for software engineering, Deeper [20], which applies a Deep Belief Network for semantic feature extraction. The results are shown in Tables 2 and 3, with the highest result for each repository highlighted in bold. Compared methods that are significantly inferior to our approach are marked with "◦" and those significantly better with "•".

Table 2. The performance comparison in terms of F1 on all methods.

Datasets        TFIDF-LR  TFIDF-SVM  Deeper  Deeper-SVM  DeepReview
accumulo        0.219◦    0.231◦     0.208◦  0.199◦      0.444
aurora          0.202◦    0.214◦     0.352◦  0.298◦      0.436
cloudstack-git  0.252◦    0.276◦     0.392◦  0.257◦      0.497
drill-git       0.213◦    0.235◦     0.277◦  0.226◦      0.414
hbase-git       0.235◦    0.257◦     0.182◦  0.142◦      0.463
Avg.            0.224◦    0.243◦     0.282◦  0.224◦      0.451

Table 3. The performance comparison in terms of AUC on all methods.

Datasets        TFIDF-LR  TFIDF-SVM  Deeper  Deeper-SVM  DeepReview
accumulo        0.635◦    0.678◦     0.697◦  0.705◦      0.746
aurora          0.577◦    0.629◦     0.687◦  0.566◦      0.758
cloudstack-git  0.755◦    0.827◦     0.825◦  0.637◦      0.870
drill-git       0.676◦    0.725◦     0.639◦  0.571◦      0.761
hbase-git       0.685◦    0.751      0.597◦  0.547◦      0.758
Avg.            0.666◦    0.722◦     0.689◦  0.605◦      0.779

Fig. 5. F1 (a) and AUC (b) of DeepReview and DeepReview-SingleInstance on the five datasets.

Fig. 6. F1 (a) and AUC (b) of DeepReview and DeepReview-diff on the five datasets.

As shown in Tables 2 and 3, DeepReview achieves the best performance on all datasets in terms of F1 score. On average, DeepReview reaches an AUC of 0.779, significantly better than the values achieved by TFIDF-LR (0.666) and TFIDF-SVM (0.722). Compared with Deeper and its variant Deeper-SVM, DeepReview again achieves the best F1 score and AUC, and on average its superiority over these deep feature based methods is statistically significant. In conclusion, the proposed DeepReview is effective in automatic code review prediction, which indicates that it learns better features than traditional hand-crafted features or previous deep learning based features.
To evaluate the effectiveness of the multi-instance learning strategy for code review, we compare our model with its traditional single-instance counterpart, DeepReview-SingleInstance. Figure 5a and b show the performance comparison. DeepReview achieves a higher AUC and F1 score than DeepReview-SingleInstance on all datasets, indicating that the multi-instance learning approach is effective for the code review task.
To evaluate the effectiveness of using the source code both before and after a change to model the difference features, we compare against another variant, DeepReview-diff, which uses the same network structure to extract features from the code in the diffs directly and fuses them with the features of the corresponding change description as the final representation. Figure 6a and b show the performance comparison of DeepReview and DeepReview-diff. DeepReview clearly outperforms DeepReview-diff, improving F1 by 4.2% and AUC by 4.7% on average.

4 Related Work

Many empirical studies aim to help researchers and practitioners understand code review practice from different perspectives [7,13,15]. To characterize and understand the differences between a diverse set of software projects, Rigby et al. [13] found that many characteristics of code review have independently converged to similar values, which indicates general principles of code review; e.g., reviewers prefer discussion and fixing code over reporting defects, and the number of involved developers can vary. Kononenko et al. [7] investigated a set of factors that might affect the quality of code review in the large open-source project Mozilla, focusing on the relationship between human factors (e.g., personal characteristics of developers, team participation and involvement) and code review quality. Tao et al. [15] investigated the reasons behind 300 rejected Eclipse and Mozilla patches by surveying 246 developers; they concluded that patches are rejected mainly because of the poor quality of the solution, the large size of the patch or the involvement of unnecessary changes, the ambiguous documentation of a patch, and inefficient communication. Moreover, Thongtanunam et al. [16] revealed that 4%–30% of reviews have code-reviewer assignment problems and proposed a code-reviewer recommendation approach, REVFINDER, that solves the problem by leveraging file location information; the intuition is that files located in similar file paths would be managed and reviewed by experienced code-reviewers. Zanjani et al. [21] also studied the code reviewer recommendation problem and proposed an approach, cHRev, that leverages specific information in previously completed reviews (i.e., the quantification of review comments and their recency).
Recently, deep learning has been applied in software engineering. For example, Yang et al. [20] applied a Deep Belief Network (DBN) to learn higher-level features from a set of basic features extracted from commits (e.g., lines of code added, lines of code deleted) to predict buggy commits. Xu et al. [19] applied word embedding and a convolutional neural network (CNN) to predict semantic links between knowledge units in Stack Overflow (i.e., questions and answers) to help developers better navigate and search the popular knowledge base. Lee et al. [8] applied word embedding and a CNN to identify the developers who should be assigned to fix a bug report. Mou et al. [11] applied a tree-based CNN over abstract syntax trees to detect code snippets of certain patterns. Huo et al. [3,4] learned unified semantic features from bug reports in natural language and source code in programming language for bug localization tasks. Wei et al. [18] proposed a deep feature learning framework using an AST-based LSTM network for functional clone detection, which exploits lexical and syntactical information.

5 Conclusion
In this paper, we are the first to formulate code review as a multi-instance learning task. We propose a novel deep learning model named DeepReview for automatic code review, which takes as input the raw data of a code change containing multiple hunks, along with its textual description, and predicts whether the change is approved or rejected. Experimental results on five open source datasets show that DeepReview is effective and outperforms state-of-the-art models previously proposed for other automated software engineering tasks.

Acknowledgment. This research was supported by the National Key Research and Development Program (2017YFB1001903) and NSFC (61751306).

References
1. Ebert, F., Castor, F., Novielli, N., Serebrenik, A.: Confusion detection in code reviews. In: ICSME, pp. 549–553 (2017)
2. Gay, G., Haiduc, S., Marcus, A., Menzies, T.: On the use of relevance feedback in IR-based concept location. In: ICSM, pp. 351–360 (2009)
3. Huo, X., Li, M.: Enhancing the unified features to locate buggy files by exploiting the sequential nature of source code. In: IJCAI, pp. 1909–1915 (2017)
4. Huo, X., Li, M., Zhou, Z.H.: Learning unified features from natural and programming languages for locating buggy source code. In: IJCAI, pp. 1606–1612 (2016)
5. Jiang, T., Tan, L., Kim, S.: Personalized defect prediction. In: ASE, pp. 279–289 (2013)
6. Kim, Y.: Convolutional neural networks for sentence classification. In: EMNLP, pp. 1746–1751 (2014)
7. Kononenko, O., Baysal, O., Guerrouj, L., Cao, Y., Godfrey, M.W.: Investigating code review quality: do people and participation matter? In: ICSME, pp. 111–120 (2015)
8. Lee, S., Heo, M., Lee, C., Kim, M., Jeong, G.: Applying deep learning based automatic bug triager to industrial projects. In: ESEC/FSE, pp. 926–931 (2017)
9. Menzies, T., Greenwald, J., Frank, A.: Data mining static code attributes to learn defect predictors. IEEE TSE 33(1), 2–13 (2007)
10. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS, pp. 3111–3119 (2013)
11. Mou, L., Li, G., Zhang, L., Wang, T., Jin, Z.: Convolutional neural networks over tree structures for programming language processing. In: AAAI, pp. 1287–1293 (2016)
12. Nam, J., Pan, S.J., Kim, S.: Transfer defect learning. In: ICSE, pp. 382–391 (2013)
13. Rigby, P.C., Bird, C.: Convergent contemporary software peer review practices. In: FSE, pp. 202–212 (2013)
14. Rigby, P.C., German, D.M., Storey, M.A.: Open source software peer review practices: a case study of the Apache server. In: ICSE, pp. 541–550 (2008)
15. Tao, Y., Han, D., Kim, S.: Writing acceptable patches: an empirical study of open source project patches. In: ICSME, pp. 271–280 (2014)
16. Thongtanunam, P., Tantithamthavorn, C., Kula, R.G., Yoshida, N., Iida, H., Matsumoto, K.I.: Who should review my code? A file location-based code-reviewer recommendation approach for modern code review. In: SANER, pp. 141–150 (2015)
17. Wang, S., Liu, T., Tan, L.: Automatically learning semantic features for defect prediction. In: ICSE, pp. 297–308 (2016)
18. Wei, H.H., Li, M.: Supervised deep features for software functional clone detection by exploiting lexical and syntactical information in source code. In: IJCAI, pp. 3034–3040 (2017)
19. Xu, B., Ye, D., Xing, Z., Xia, X., Chen, G., Li, S.: Predicting semantically linkable knowledge in developer online forums via convolutional neural network. In: ASE, pp. 51–62 (2016)
20. Yang, X., Lo, D., Xia, X., Zhang, Y., Sun, J.: Deep learning for just-in-time defect prediction. In: QRS, pp. 17–26 (2015)
21. Zanjani, M.B., Kagdi, H., Bird, C.: Automatically recommending peer reviewers in modern code review. IEEE TSE 42(6), 530–543 (2016)
