DeepReview: Automatic code review using deep multi-instance learning
Research Collection School of Computing and Information Systems
4-2019
Citation
LI, Hengyi; SHI, Shuting; THUNG, Ferdian; HUO, Xuan; XU, Bowen; LI, Ming; and LO, David. DeepReview:
Automatic code review using deep multi-instance learning. (2019). Advances in knowledge discovery and
data mining: 23rd Pacific-Asia Conference, PAKDD 2019, Macau, China, April 14-17: Proceedings. 11440,
318-330.
Available at: https://ink.library.smu.edu.sg/sis_research/4346
DeepReview: Automatic Code Review
Using Deep Multi-instance Learning
Heng-Yi Li1 , Shu-Ting Shi1 , Ferdian Thung2 , Xuan Huo1 , Bowen Xu2 ,
Ming Li1(B) , and David Lo2
1 National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
{lihy,shist,huox,lim}@lamda.nju.edu.cn
2 School of Information Systems, Singapore Management University, Singapore
{ferdiant.2013,bowenxu.2017}@phdis.smu.edu.sg, [email protected]
1 Introduction
review requests are growing, which places a heavier burden on code reviewers. Automatic code review is therefore important to alleviate this burden.
Recently, several studies have sought to improve the effectiveness of code review [1,16]. Thongtanunam et al. [16] revealed that 4%–30% of reviews have code-reviewer assignment problems. They proposed a code-reviewer recommendation approach named REVFINDER that addresses this problem by leveraging file location information. Ebert et al. [1] proposed to identify the factors that confuse reviewers and to understand how confusion impacts the efficiency of code reviewers. However, the task of automatic code review itself has not been well studied.
Considering the above issues, an automated approach is needed that can help reviewers review the code submitted by developers. Usually, a review request submitted by developers contains changes to the source code in the form of diff files, together with textual descriptions indicating the intention of the change. Note that each change may contain multiple change hunks, where each hunk corresponds to a set of contiguous lines of code. For example, Fig. 1 shows the change in the file JdbcRepository.java of review request 26657 from the Apache project. This change contains four hunks. One of the most common ways to analyze such a change is to combine all hunks together and generate a unified feature representation for the change. However, this method may lead to two problems. First, the hunks appearing in a change may be discontinuous and unrelated to one another. Directly combining the hunks may generate misleading feature representations, leading to poor prediction performance. Second, when a change is rejected, not every hunk in the change is rejected; some hunks have no issues and can be approved by reviewers. The approved hunks and the rejected hunks should therefore not be processed together for feature extraction. Hence, separately generating features from each individual hunk is preferable in automatic code review.
– We are the first to study the automatic code review task as a multi-instance learning task. A change typically contains multiple hunks, where each hunk is described as an instance and the change is represented by a set of instances. Experimental results on five large datasets show that the proposed multi-instance model is effective for automatic code review.
– We propose a novel deep learning model named DeepReview based on Convolutional Neural Networks (CNN), which learns semantic feature representations from source code changes and change descriptions to predict whether a change is approved or rejected.
Fig. 2. The general automatic code review process based on machine learning model.
The automatic code review prediction process mainly consists of the following steps:
– Collecting data from code review systems and processing data.
– Generating feature representations of the input data.
– Training a classifier based on the generated features and labels.
– Predicting if a new change is approved or rejected.
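As an illustration only (not the authors' implementation), the four steps above can be sketched with a toy bag-of-words featurizer and a nearest-centroid classifier; all function names and the choice of classifier here are our own simplifications:

```python
from collections import Counter

def preprocess(diff_text):
    # Step 1 (simplified): keep only changed lines, stripping the +/- markers.
    return [l[1:].strip() for l in diff_text.splitlines()
            if l.startswith(("+", "-"))]

def featurize(lines, vocab):
    # Step 2 (simplified): bag-of-words count vector over a fixed vocabulary.
    counts = Counter(tok for l in lines for tok in l.split())
    return [counts[w] for w in vocab]

def train_centroids(X, y):
    # Step 3 (simplified): one mean vector per class (0 = approved, 1 = rejected).
    cents = {}
    for label in (0, 1):
        rows = [x for x, yy in zip(X, y) if yy == label]
        cents[label] = [sum(c) / len(rows) for c in zip(*rows)]
    return cents

def predict(x, cents):
    # Step 4 (simplified): assign the label of the nearest class centroid.
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(cents, key=lambda lbl: dist(x, cents[lbl]))
```

DeepReview replaces the hand-crafted featurizer and shallow classifier in this sketch with learned deep features and a multi-instance prediction layer.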
In the following subsections, we first introduce the general framework of DeepReview in Subsect. 2.1; the data processing is reported in Subsect. 2.2. The core parts of DeepReview are elaborated in Subsects. 2.3 and 2.4.
Fig. 3. The general framework of DeepReview for automatic code review prediction.
The DeepReview model contains three parts: an input layer, an instance feature generation layer, and a multi-instance based prediction layer.
The data used for automatic code review are the changed source code submitted by developers, which appears in the form of diffs and contains both source code and diff markers (e.g., + stands for adding a line, - stands for deleting a line). The main features in code changes are the differences between the code before and after the change. So, in the data preprocessing shown in
Fig. 4. Automatic code review by DeepReview. When a change is processed for prediction, the three parts of the change (old code, new code, and text description) are first encoded as feature vectors and fed into the deep model. Three convolutional neural networks then extract semantic features for the source code and the text description separately. After that, a fully-connected network is used to obtain fused features for the hunks. Finally, another fully-connected network and a max-pooling layer are applied to generate a prediction indicating whether the change is approved or rejected.
the left part of Fig. 4, we extract both the old code (before the change) and the new code (after the change) from the diffs as input. We also use the change descriptions, since they state the goal of the change and help improve prediction performance.
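The extraction of old and new code from a diff can be sketched as follows; this is a minimal illustration of the preprocessing idea, not the paper's actual tooling, and it assumes each hunk is given as a list of lines with leading `+`/`-`/space markers:

```python
def split_diff_hunk(hunk_lines):
    """Split one diff hunk into old code (before change) and new code (after)."""
    old_code, new_code = [], []
    for line in hunk_lines:
        if line.startswith("-"):        # deleted: present only in the old version
            old_code.append(line[1:])
        elif line.startswith("+"):      # added: present only in the new version
            new_code.append(line[1:])
        else:                           # context line: present in both versions
            old_code.append(line)
            new_code.append(line)
    return old_code, new_code
```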
After splitting the diff files into old code, new code, and text description, a pre-trained word2vec [10] model is used to encode every token as a vector representation (e.g., a 300-dimensional vector); this technique has been shown to be effective on textual data and is widely used in text processing tasks [6,10]. In a similar way, we split the descriptions into words and encode them as vector representations. All these vector representations are fed into the deep neural network to learn semantic features.
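A minimal sketch of this token-to-vector step, using a toy random embedding table as a stand-in for a pre-trained word2vec model (the helper names and the zero-vector handling of unknown tokens are our assumptions):

```python
import random

def build_toy_embeddings(vocab, dim=300, seed=0):
    # Stand-in for a pre-trained word2vec model: one fixed vector per token.
    rng = random.Random(seed)
    return {w: [rng.uniform(-1, 1) for _ in range(dim)] for w in vocab}

def encode_tokens(tokens, table, dim=300):
    # Map each token to its vector; unknown tokens get a zero vector.
    zero = [0.0] * dim
    return [table.get(t, zero) for t in tokens]
```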
DeepReview takes the old source code (before the change) and the new source code (after the change), along with the text description, as inputs. Since source code and text descriptions have different structures, we use the PCNN network for code and the NCNN network for text to extract features.
As mentioned above, each change may contain multiple hunks, and different hunks are individual instances; therefore, the instance features should be extracted separately by the same neural network. In other words, the weights of the PCNN are shared across all code hunks. In this way, we obtain unbiased feature representations for each hunk from both the old code and the new code.
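The shared-weight idea can be illustrated with a toy one-filter convolution with ReLU and max-over-time pooling, applied identically to every hunk. This is a heavily simplified stand-in for the paper's PCNN, with hypothetical helper names:

```python
def conv1d_max(seq, filt):
    """One convolution filter over a token-vector sequence: ReLU + max-over-time."""
    width = len(filt)
    scores = []
    for i in range(len(seq) - width + 1):
        # Dot product of the filter with one window of token vectors.
        s = sum(sum(a * b for a, b in zip(seq[i + j], filt[j]))
                for j in range(width))
        scores.append(max(s, 0.0))      # ReLU, matching sigma(x) = max(x, 0)
    return max(scores) if scores else 0.0

def encode_hunks(hunks, filt):
    # The SAME filter (shared weights) is applied to every hunk, so each
    # instance is embedded by an identical feature extractor.
    return [conv1d_max(h, filt) for h in hunks]
```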
Suppose one change contains $m$ modified hunks. Let $(z^o_{i1}, z^o_{i2}, \ldots, z^o_{im})$ denote the middle-level vectors of the old source code $c^o_i$, let $(z^n_{i1}, z^n_{i2}, \ldots, z^n_{im})$ denote the middle-level vectors of the new source code $c^n_i$, and let $z^t_i$ denote the middle-level vector of the text description $d_i$. In the instance feature generation layers, DeepReview first concatenates these three parts for each instance as follows:

$$z^h_{ij} = z^o_{ij} \oplus z^n_{ij} \oplus z^t_i \tag{1}$$

where $\oplus$ is the concatenation operation and the generated $z^h_{ij}$ represents the features of the $j$-th hunk of the $i$-th change (i.e., one instance).
To capture the difference between the new and old code, as well as the relation between the code change and the change description, these concatenated features are then fed into fully-connected networks for feature fusion.
In the prediction layers, we first make a prediction for each hunk (i.e., each instance) using fully-connected networks followed by a sigmoid layer, based on the generated hunk representations. As before, all the fully-connected networks share weights across hunks so that the generated predictions are unbiased. This produces the output predictions $p_i = (p_{i1}, p_{i2}, \ldots, p_{im})$ for the hunks of the $i$-th change.
In the multi-instance setting, if any instance is positive (rejected), the bag is also positive (rejected). The maximum of the hunk-level predictions is therefore used to predict the label of the change: a max-pooling layer produces the final prediction $\hat{p}_i = \max_j \{p_{ij}\}$.
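A sketch of this per-hunk sigmoid scoring followed by max-pooling over instances (the 0.5 decision threshold is our assumption; the paper does not state one):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_change(hunk_scores, threshold=0.5):
    # Per-instance probabilities via sigmoid, then max-pooling over hunks:
    # p_hat = max_j p_ij, so the change is rejected if ANY hunk looks rejected.
    probs = [sigmoid(s) for s in hunk_scores]
    p_hat = max(probs)
    return p_hat, int(p_hat >= threshold)
```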
Specifically, the parameters of the convolutional layers are denoted as $\Theta = \{\theta_1, \theta_2, \ldots, \theta_l\}$ and the parameters of the fully-connected layers as $W = \{w_1, w_2, \ldots\}$. The loss function of DeepReview is:

$$L(\Theta, W) = -\sum_{i=1}^{N} \big( c_a\, y_i \log \hat{p}_i + c_r\,(1 - y_i) \log(1 - \hat{p}_i) \big) + \lambda\, \Omega(f) \tag{2}$$
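The data term of Eq. (2) can be sketched as a class-weighted cross-entropy; here `omega` stands in for the regularization term Ω(f), and the default weights are our assumption:

```python
import math

def weighted_ce(y, p, c_a=1.0, c_r=1.0, lam=0.0, omega=0.0):
    """Eq. (2) sketch: class-weighted cross-entropy plus a regularization term.

    y: ground-truth labels (1 = rejected, 0 = approved)
    p: predicted probabilities p_hat for each change
    """
    loss = -sum(c_a * yi * math.log(pi) + c_r * (1 - yi) * math.log(1 - pi)
                for yi, pi in zip(y, p))
    return loss + lam * omega
```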
3 Experiments
The settings of DeepReview and its variants are as follows: in the convolution layers, we use the ReLU activation function σ(x) = max(x, 0), and we set the convolution window sizes to 2 and 3, with 100 feature maps each.
For each dataset, 10-fold cross-validation is repeated 5 times, and we report the average value for all compared methods in order to reduce evaluation bias. We also apply a statistical test to assess the significance of DeepReview's results: a pairwise t-test at the 95% confidence level.
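The evaluation protocol (10-fold cross-validation repeated 5 times) can be sketched by generating the fold indices; the seeding and shuffling details here are our assumptions:

```python
import random

def repeated_kfold_indices(n, k=10, repeats=5, seed=0):
    """Yield (train_idx, test_idx) splits for k-fold CV repeated `repeats` times."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)                       # fresh shuffle per repetition
        folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
        for i in range(k):
            test = folds[i]
            train = [j for f in folds[:i] + folds[i + 1:] for j in f]
            yield train, test
```

In practice one would average the metric over all 50 resulting splits and run the paired t-test on the per-split scores of the two compared models.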
We first compare our proposed model DeepReview with several traditional non-multi-instance models. One of the most common baselines employs the Vector Space Model (VSM) to represent the changes. In addition, we compare DeepReview with Deeper [20], a recent deep-learning-based model in software engineering that applies a Deep Belief Network for semantic feature extraction. The results are shown in Tables 2 and 3. The highest result for each repository is highlighted in bold. Compared methods that are significantly inferior to our approach are marked with “◦”, and those significantly better are marked with “•”.
[Figure: per-dataset comparison of results on the accumulo, aurora, cloudstack-git, drill-git, and hbase-git datasets.]
4 Related Work
5 Conclusion
In this paper, we are the first to formulate code review as a multi-instance learning task. We propose a novel deep learning model named DeepReview for automatic code review, which takes as input the raw data of a code change containing multiple hunks, along with its textual description, and predicts whether the change is approved or rejected. Experimental results on five open source datasets show that DeepReview is effective and outperforms state-of-the-art models previously proposed for other automated software engineering tasks.
References
1. Ebert, F., Castor, F., Novielli, N., Serebrenik, A.: Confusion detection in code
reviews. In: ICSME, pp. 549–553 (2017)
2. Gay, G., Haiduc, S., Marcus, A., Menzies, T.: On the use of relevance feedback in
IR-based concept location. In: ICSM, pp. 351–360 (2009)
3. Huo, X., Li, M.: Enhancing the unified features to locate buggy files by exploiting
the sequential nature of source code. In: IJCAI, pp. 1909–1915 (2017)
4. Huo, X., Li, M., Zhou, Z.H.: Learning unified features from natural and program-
ming languages for locating buggy source code. In: IJCAI, pp. 1606–1612 (2016)
5. Jiang, T., Tan, L., Kim, S.: Personalized defect prediction. In: ASE, pp. 279–289
(2013)
6. Kim, Y.: Convolutional neural networks for sentence classification. In: EMNLP,
pp. 1746–1751 (2014)
7. Kononenko, O., Baysal, O., Guerrouj, L., Cao, Y., Godfrey, M.W.: Investigating
code review quality: do people and participation matter? In: ICSME, pp. 111–120
(2015)
8. Lee, S., Heo, M., Lee, C., Kim, M., Jeong, G.: Applying deep learning based auto-
matic bug triager to industrial projects. In: ESEC/FSE, pp. 926–931 (2017)
9. Menzies, T., Greenwald, J., Frank, A.: Data mining static code attributes to learn
defect predictors. IEEE TSE 33(1), 2–13 (2007)
10. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed represen-
tations of words and phrases and their compositionality. In: NIPS, pp. 3111–3119
(2013)
11. Mou, L., Li, G., Zhang, L., Wang, T., Jin, Z.: Convolutional neural networks over
tree structures for programming language processing. In: AAAI, pp. 1287–1293
(2016)
12. Nam, J., Pan, S.J., Kim, S.: Transfer defect learning. In: ICSE, pp. 382–391 (2013)
13. Rigby, P.C., Bird, C.: Convergent contemporary software peer review practices. In:
FSE, pp. 202–212 (2013)
14. Rigby, P.C., German, D.M., Storey, M.A.: Open source software peer review prac-
tices: a case study of the apache server. In: ICSE, pp. 541–550 (2008)
15. Tao, Y., Han, D., Kim, S.: Writing acceptable patches: an empirical study of open
source project patches. In: ICSME, pp. 271–280 (2014)
16. Thongtanunam, P., Tantithamthavorn, C., Kula, R.G., Yoshida, N., Iida, H., Mat-
sumoto, K.I.: Who should review my code? A file location-based code-reviewer rec-
ommendation approach for modern code review. In: SANER, pp. 141–150 (2015)
17. Wang, S., Liu, T., Tan, L.: Automatically learning semantic features for defect
prediction. In: ICSE, pp. 297–308 (2016)
18. Wei, H.H., Li, M.: Supervised deep features for software functional clone detection
by exploiting lexical and syntactical information in source code. In: IJCAI, pp.
3034–3040 (2017)
19. Xu, B., Ye, D., Xing, Z., Xia, X., Chen, G., Li, S.: Predicting semantically linkable
knowledge in developer online forums via convolutional neural network. In: ASE,
pp. 51–62 (2016)
20. Yang, X., Lo, D., Xia, X., Zhang, Y., Sun, J.: Deep learning for just-in-time defect
prediction. In: QRS, pp. 17–26 (2015)
21. Zanjani, M.B., Kagdi, H., Bird, C.: Automatically recommending peer reviewers
in modern code review. IEEE TSE 42(6), 530–543 (2016)