A novel kNN Algorithm with Data-driven k Parameter Computation
PII: S0167-8655(17)30356-2
DOI: 10.1016/j.patrec.2017.09.036
Reference: PATREC 6948
Please cite this article as: Shichao Zhang, Debo Cheng, Zhenyun Deng, Ming Zong, Xuelian Deng, A
novel kNN Algorithm with Data-driven k Parameter Computation, Pattern Recognition Letters (2017),
doi: 10.1016/j.patrec.2017.09.036
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service
to our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and
all legal disclaimers that apply to the journal pertain.
Shichao Zhang a, Debo Cheng a,b, Zhenyun Deng a, Ming Zong c, Xuelian Deng d
a Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, Guangxi, China
b Information Technology and Mathematical Sciences, University of South Australia, Adelaide, Australia
c Institute of Natural and Mathematical Sciences, Massey University, Auckland, New Zealand
d College of Public Health and Management, Guangxi University of Chinese Medicine, Nanning, Guangxi, China
ABSTRACT
This paper studies an example-driven k-parameter computation that identifies different k values for different test samples in kNN prediction applications, such as classification, regression and missing data imputation. This is carried out by reconstructing a sparse coefficient matrix between the test samples and the training data. In the reconstruction process, an ℓ1-norm regularization is employed to generate an element-wise sparse coefficient matrix, and an LPP (Locality Preserving Projection) regularization is adopted to preserve the local structures of the data and improve the reconstruction. With the learnt k value, the kNN approach is then applied to classification, regression and missing data imputation. We experimentally evaluate the proposed approach on 20 real datasets and show that our algorithm outperforms previous kNN algorithms on data mining tasks such as classification, regression and missing value imputation.

© 2017 Elsevier Ltd. All rights reserved.
In Figure 1, there are two test samples that are predicted to the '+' class according to the kNN rule, and the left test sample is incorrectly predicted. When setting k=1, both test samples are incorrectly predicted. From the training examples, it is reasonable to take k=3 and k=7 for the left test sample and the right one, respectively. For a similar scenario of missing value imputation in Figure 2, the right and left test samples should also be assigned different k, i.e., k=3 and k=5 respectively. These scenarios indicate that different test samples should take different numbers of nearest neighbors in real kNN prediction applications. That is to say, setting a fixed constant k for all test samples often leads to low prediction rates in real classification applications.

Motivated by the above facts, this paper proposes a k-parameter computation for kNN approximate prediction based on sparse learning, called S-kNN¹ (Cheng et al., 2014). The k-parameter computation can identify different k values for predicting different test samples with the kNN algorithm. This is carried out by reconstructing a sparse coefficient matrix between test samples and training data (Zhu et al., 2017d; Zhu et al., 2016b). With this matrix, an optimal k value can be obtained for each test sample individually. In the reconstruction, a least square loss function is applied to achieve the minimal reconstruction error, and an ℓ1-norm regularization is utilized to produce element-wise sparsity (i.e., sparse codes appear in the elements of the coefficient matrix) for generating various k values for different test samples (Zhu et al., 2017c; Zhu et al., 2017b). We also employ the Locality Preserving Projection (LPP) regularization to preserve the local structures of the data during the reconstruction process, aiming to further improve the reconstruction performance (He and Niyogi, 2003; Hu et al., 2017). The proposed S-kNN algorithm is experimentally evaluated on data mining tasks such as classification, regression and missing value imputation. Compared with previous kNN algorithms, the main contributions are as follows.

• Existing kNN approximate prediction algorithms use a fixed k value for the whole problem space. The S-kNN algorithm identifies an optimal k value for each test sample, i.e., the parameter k can be different for different test samples.

Finally, the paper is concluded in Section 5.

¹ In this manuscript, we rewrote Sections 1 and 4 and added Sections 2.1, 2.2, 2.3, 3.3 and 3.4, compared to our former conference version.

2. Related Work

The study of the kNN method has been a hot research topic in data mining and machine learning since the algorithm was proposed in 1967 (Cover and Hart, 1967). In this section, we briefly review the applications of the kNN algorithm in data mining tasks, such as classification, regression and missing value imputation.

2.1. Classification

The kNN classification algorithm first selects the k closest samples (i.e., the k nearest neighbors) for a test sample from all the training samples, and then predicts the test sample with a simple classifier, e.g., the majority classification rule. Liu et al. designed a new anomaly removal algorithm under the framework of kNN classification (Liu et al., 2010), which adopts mutual nearest neighbors; its advantage is that pseudo nearest neighbors can be identified instead of the k nearest neighbors to determine the class labels of unknown samples. Weinberger et al. used semidefinite programming to learn a Mahalanobis distance metric for kNN classification, optimizing the metric so that the k nearest neighbors always belong to the same class while samples from different classes are separated by a large margin (Weinberger and Saul, 2009). Moreover, Goldberger et al. proposed a non-parametric kNN classification method, called neighbourhood components analysis (NCA), that learns a quadratic distance metric (Goldberger et al., 2004); this method encourages the learned distance to be low-rank, so as to save storage and search costs. Jamshidi and Kaburlasos proposed an effective synergy of the Intervals' Number k-nearest neighbor classifier and the gravitational search algorithm (GSA) for stochastic search and optimization (Jamshidi and Kaburlasos, 2014). Saini et al. presented an application of the k-Nearest Neighbor (kNN) algorithm as a classifier for detection of the QRS-complex in ECG signals (Saini et al., 2013); this algorithm uses a digital band-pass filter to reduce interference and false detections in the ECG signal. To avoid the influence of the k value, Varmuza et al. used repeated double cross validation to search for an optimum k for k nearest neighbor classification (Varmuza et al., 2014).
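To make the baseline concrete, the following minimal sketch implements the standard majority-rule kNN classification described at the start of this subsection. It is illustrative only; the function and variable names (knn_classify, X_train, y_train, x_test) are ours rather than the paper's, and rows of X_train are assumed to be samples.

import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_test, k=5):
    # Euclidean distance from the test sample to every training sample
    dist = np.linalg.norm(X_train - x_test, axis=1)
    # indices of the k nearest neighbours
    nn = np.argsort(dist)[:k]
    # majority vote over their labels
    return Counter(y_train[nn]).most_common(1)[0][0]

The fixed k used here for every test sample is exactly what the proposed S-kNN method replaces with a per-sample, data-driven value.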
2.2. Regression

The kNN regression has been widely used and studied for many years in pattern recognition and data mining. In regression analysis, Burba et al. utilized a kernel estimator based on some asymptotic properties of kNN to improve the performance of kNN regression (Burba et al., 2009); their work used a locally adaptive bandwidth to study the non-parametric kNN algorithm. Ferraty and Vieu utilized the functional version of the Nadaraya-Watson kernel-type estimator to construct the non-parametric characteristics of the kNN algorithm for estimation, classification and discrimination on high-dimensional data (Ferraty and Vieu, 2006). On the theory of the kNN algorithm, Mack studied the L2 convergence and the asymptotic distribution (Mack, 1981), and Devroye proved the strong consistency and the uniform convergence of the kNN algorithm (Devroye et al., 1981). Hu et al. proposed a data-driven method for battery capacity estimation, using a non-linear kernel regression model based on kNN to capture the dependency of the capacity on the features; this work also utilizes an adaptation of particle swarm optimization to find the feature weights for the kNN regression model (Hu et al., 2014). Goyal et al. took the interrelatedness of software metrics into account and statistically established the extent to which they improve the explanatory power of multiple linear regression; they then conducted stepwise regression to identify influential metrics and avoid over-fitting, and proposed the suitability of kNN regression in the development of fault prediction models (Goyal et al., 2014). Predicting the cycle time of wafer lots in a semiconductor fab is a critical task; therefore, Ni et al. combined particle swarm optimization with a Gaussian mutation operator and a simulated weighting of the features for kNN regression, and then used it to predict the cycle time of a wafer fab (Ni et al., 2012). Zhou proposed semi-supervised regression with co-training (Zhou and Li, 2005).

2.3. Missing value imputation

The kNN algorithm has also been widely applied to missing value imputation. For instance, Zhang et al. utilized a grey-based distance measure to replace the Euclidean distance in the conventional kNN algorithm, which can improve the performance of kNN imputation; the method is referred to as Grey-Based kNN Iteration Imputation (GBKII) (Zhang et al., 2007) and is an instance-based, non-parametric imputation algorithm. Chen and Shao showed that the naive variance estimator that treats imputed values as observed data produces serious underestimation under nearest-neighbor imputation, and proposed a jackknife variance estimation technique that is nonparametric, asymptotically unbiased and consistent for the sample means (Chen and Shao, 2001). Meesad and Hengpraprohm first proposed a methodology to impute missing values in microarray data, which combines KNN-based feature selection and KNN-based imputation, and then estimated the missing values by the kNN algorithm (Meesad and Hengpraprohm, 2008). Recently, Zhang developed a kernel-based missing value imputation algorithm that takes the attribute correlations within the data into account, so as to obtain optimal statistical parameters (such as the mean and the distribution function) after the missing data are imputed; the method thus improves the nearest-neighbor performance of the previous kNN algorithm (Zhang et al., 2006). Hoef et al. proposed two methods, i.e., the spatial linear model and k nearest neighbor, for mapping and estimating totals; in order to enhance prediction and understanding, they employed a Bayesian approach to account for the covariance parameters (Hoef and Hailemariam, 2013). For adjusting the estimated missing values to the overall size of the compositional parts of the neighbors, Hron et al. proposed a k nearest neighbor procedure based on the Aitchison distance and used an iterative model-based imputation technique to refine its result (Hron et al., 2010).

From the above subsections, existing kNN methods use a fixed k value for the whole problem space and often lead to poor predictions.

3. Proposed Method

In this section, we first introduce some basic concepts used in our proposed method, then describe the S-kNN method, and finally improve the S-kNN method by optimizing the objective function.

3.1. Notation

Throughout the paper, we denote matrices as boldface uppercase letters, vectors as boldface lowercase letters and scalars as plain letters; n, m and d stand for the numbers of training samples, test samples and feature dimensions, respectively. To obtain the reconstruction coefficient matrix W ∈ R^{n×m}, we reconstruct the test samples with the training samples. Thus, the objective function is defined as follows (Yager and Petry, 2014):

arg min_W Σ_i ‖W^T x_i − y_i‖    (1)

where w_{i,j} is utilized to measure the correlation between y_i and the training sample x_j. The larger the value of w_{i,j} is, the more relevant the ith test sample and the jth training sample are. In particular, the case w_{i,j} = 0 denotes that y_i and x_j are uncorrelated.
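As an illustration of the reconstruction idea behind Eq. (1), the sketch below computes a coefficient matrix W by plain least squares with a small ridge term for numerical stability. The convention (columns of X are the d-dimensional training samples, columns of Y the test samples) and the names reconstruct_tests and delta are our own assumptions; the actual S-kNN objective adds the ℓ1 and LPP terms introduced in the following subsections.

import numpy as np

def reconstruct_tests(X, Y, delta=1e-3):
    # X: d x n matrix whose columns are training samples
    # Y: d x m matrix whose columns are test samples
    # Returns W (n x m) such that X @ W approximates Y;
    # a small ridge term delta stabilises the linear solve.
    n = X.shape[1]
    W = np.linalg.solve(X.T @ X + delta * np.eye(n), X.T @ Y)
    return W

Each column of the returned W couples one test sample with all training samples; without a sparsity-inducing term, however, its entries are generally all nonzero.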
To carry out the reconstruction, LPP (Locality Preserving Projection) (He and Niyogi, 2003) is applied to obtain an optimal linear transformation W. The LPP technique can preserve the local structure of the original data in the new space, i.e., W converts the high-dimensional data X into the low-dimensional data Y with the following definition:

y_i = W^T x_i,  i = 1, 2, · · · , n    (2)

To this end, the objective function of LPP can be defined as follows:

min_W Σ_{i,j} (W^T x_i − W^T x_j)^2 s_{i,j}    (3)

where S is the weight matrix and each element of S is defined by a heat kernel s_{i,j} = exp(−‖x_i − x_j‖² / σ), in which σ is a tuning parameter. Without loss of generality, we set σ = 1 in our experiments; the justification for this choice of weights can be traced back to (Belkin and Niyogi, 2001).

By plugging Eq. (2) into Eq. (3) and applying some algebraic transformations, we obtain:

(1/2) Σ_{i,j} (W^T x_i − W^T x_j)^2 s_{i,j}
  = Σ_i (W^T x_i d_{i,i} x_i^T W) − Σ_{i,j} (W^T x_i s_{i,j} x_j^T W)
  = tr(W^T X D X^T W) − tr(W^T X S X^T W)
  = tr(W^T X L X^T W)    (4)

where D is a diagonal matrix whose ith diagonal element is defined as d_{i,i} = Σ_j s_{i,j}. Hence, L = D − S is the Laplacian matrix.

In Eq. (6), δ is a tuning parameter. The optimal solution of Eq. (6) can be written in closed form as W = (XX^T + δI)^{−1} XY, where I ∈ R^{n×n} is an identity matrix.

The ℓ1-norm regularization term has been proved to generate zero elements in a matrix, i.e., to lead to sparsity (Zhu et al., 2017a), while many studies have shown that the ℓ2-norm does not necessarily generate sparse results. In this paper, the element w_{i,j} in W indicates the correlation between the ith test sample and the jth training sample. We expect that each test sample is represented by only part of the training samples, i.e., that there are many zero elements in each column of W. Therefore, it makes sense to use an ℓ1-norm term in place of the ℓ2-norm. Meanwhile, we also employ LPP to preserve the local structures of the data during the reconstruction process. Thus, we define the objective function of the proposed S-kNN method as follows:

arg min_W (1/2) ‖W^T X − Y‖_F^2 + ρ1 tr(W^T X L X^T W) + ρ2 ‖W‖_1    (7)

where ρ1 is a tuning parameter designed to balance the magnitude between tr(W^T X L X^T W) and ‖W^T X − Y‖_F^2. Moreover, the larger the value of ρ1 is, the larger the contribution of LPP in Eq. (7) will be. In particular, Eq. (7) reduces to LASSO when setting ρ1 = 0.

Different from LASSO, the proposed S-kNN takes the preservation of the local structures of the data into consideration via LPP. Moreover, S-kNN is utilized to learn the k value for the kNN algorithm, whereas conventional kNN algorithms often use a fixed k value for all test samples, or learn the k value for each test sample without regard to the correlation among test samples. The proposed S-kNN algorithm learns an optimal k value for each test sample with the above reconstruction process. During the reconstruction process, the proposed method considers the correlation between the test samples and the training samples.
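The optimization of Eq. (7) is detailed in Section 3.4. As a simple illustration of its ingredients (the heat-kernel Laplacian of Eqs. (3)-(4), the ℓ1 sparsity term, and a proximal-gradient update), the sketch below solves a simplified, dimensionally self-contained variant of the objective, min_W (1/2)‖XW − Y‖_F² + ρ1 tr(WᵀLW) + ρ2‖W‖₁, where the columns of X (d×n) are training samples and the columns of Y (d×m) are test samples. The simplified graph regularizer tr(WᵀLW), the function names and the stopping rule are our own assumptions and not the paper's exact formulation.

import numpy as np

def heat_kernel_laplacian(X, sigma=1.0):
    # Graph Laplacian L = D - S over the n training samples (columns of X),
    # with heat-kernel weights s_ij = exp(-||x_i - x_j||^2 / sigma), cf. Eqs. (3)-(4).
    sq = np.sum(X ** 2, axis=0)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)   # pairwise squared distances
    S = np.exp(-dist2 / sigma)
    D = np.diag(S.sum(axis=1))
    return D - S

def soft_threshold(A, tau):
    # proximal operator of tau * ||.||_1 (element-wise soft-thresholding)
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def sknn_weights(X, Y, rho1=0.1, rho2=0.1, n_iter=500):
    d, n = X.shape
    L = heat_kernel_laplacian(X)
    W = np.zeros((n, Y.shape[1]))
    # step size from an upper bound on the Lipschitz constant of the smooth part
    lip = np.linalg.norm(X, 2) ** 2 + 2.0 * rho1 * np.linalg.norm(L, 2)
    step = 1.0 / lip
    for _ in range(n_iter):
        grad = X.T @ (X @ W - Y) + 2.0 * rho1 * (L @ W)   # gradient of the smooth part
        W = soft_threshold(W - step * grad, step * rho2)  # proximal step on the l1 term
    return W

The number of nonzero entries in each column of the resulting W is what determines the per-test-sample k, as discussed next.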
The more nonzero elements a column of W contains, the more correlated the corresponding test sample and the training samples are. The second column has one nonzero element, which means that the test sample has only one nearest neighbor, i.e., the value k=1. With the same rule, the third test sample and the fourth test sample have two nearest neighbors (i.e., the value k=2) and four nearest neighbors (i.e., the value k=4), respectively. Note that the sparsity (i.e., the many zero elements in W) is generated by the introduction of the ℓ1-norm in Eq. (7). This leads our algorithm to output different k values for different test samples. Moreover, LPP is introduced to further improve the performance of the reconstruction process in Eq. (7). In contrast, almost all conventional kNN algorithms employ a fixed k value, decided by users or experts, for all test samples.

3.4. Optimization

Note that Eq. (7) is a convex but non-smooth function. In this subsection, we address this by designing a new accelerated proximal gradient method (Zhu et al., 2014). We first conduct the proximal gradient method on Eq. (7) by letting:

f(W) = (1/2) ‖W^T X − Y‖_F^2 + ρ1 tr(W^T X L X^T W)    (8)
ϑ(W) = f(W) + ρ2 ‖W‖_1    (9)

where G_{η(t)}(W, W(t)) = f(W(t)) + ⟨∇f(W(t)), W − W(t)⟩ + (η(t)/2) ‖W − W(t)‖_F^2 + ρ2 ‖W‖_1, ∇f(W(t)) = (XX^T + ρ1 X L X^T) W(t) − XY, ⟨·,·⟩ is an inner product operator, η(t) determines the step size of the tth iteration, W(t) is the value of W obtained at the tth iteration, and ρ2 is a tuning parameter. The proximal step can be computed as a Euclidean projection of W(t) onto the convex set defined by η(t). Taking into account the separability of W(t + 1) over its rows, i.e., w^i(t + 1), we update the weights of each row individually.

Meanwhile, in order to accelerate the proximal gradient method in Eq. (8), we introduce an auxiliary variable V(t + 1) as follows:

V(t + 1) = W(t) + ((α(t) − 1) / α(t + 1)) (W(t + 1) − W(t))    (14)

where the coefficient α(t + 1) is usually set as α(t + 1) = (1 + √(1 + 4α(t)²)) / 2.

Finally, we present the pseudo code of our proposed optimization method in Algorithm 1 and its convergence in Theorem 1.

Algorithm 1: Pseudo code of solving Eq. (7).
Input: η(0) = 0.01, α(1) = 1, γ = 0.002, ρ1, ρ2;
Output: W;
1 Initialize t = 1;
2 Initialize W(1) as a random diagonal matrix;
3 repeat
4   while L(W(t)) > G_{η(t−1)}(π_{η(t−1)}(W(t)), W(t)) do
5     Set η(t − 1) = γ η(t − 1);
6   end
7   Set η(t) = η(t − 1);
8   Compute W(t + 1) = arg min_W G_{η(t)}(W, V(t));
9   Compute α(t + 1) = (1 + √(1 + 4α(t)²)) / 2;
10  Compute Eq. (14);
11 until Eq. (7) converges;

In Theorem 1, γ > 0 is a predefined constant, L is the Lipschitz constant of the gradient of f(W) in Eq. (8), and W* = arg min_W ϑ(W). Theorem 1 shows that the convergence rate of the proposed accelerated proximal gradient method is O(1/t²), where t is the number of iterations in Algorithm 1.

For regression and missing value imputation tasks, the bigger the correlation between a test sample and its nearest neighbors is, the larger the contribution of the nearest neighbors to the test sample is. Therefore, we propose to employ a weighted prediction method, given in Eq. (16), in which n is the number of training samples and y_train(i) stands for the true value of the ith training sample.

For classification applications, the proposed S-kNN algorithm uses the k nearest neighbors of each test sample to predict its class label with the majority rule.
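As a concrete illustration of this prediction step, the sketch below derives each test sample's k from the nonzero pattern of its column in W (Section 3.3) and then applies the majority rule for classification and a weighted average for regression and imputation. Since Eq. (16) is not fully recoverable from this manuscript, weighting by the magnitudes of the reconstruction coefficients is our own stand-in; all function names (k_per_test_sample, sknn_classify, sknn_predict) are illustrative.

import numpy as np

def k_per_test_sample(W, tol=1e-8):
    # learnt k of each test sample = number of nonzero coefficients in its column of W
    return (np.abs(W) > tol).sum(axis=0)

def sknn_classify(W, y_train, tol=1e-8):
    # majority vote over the training samples selected by each column of W
    labels = []
    for j in range(W.shape[1]):
        idx = np.flatnonzero(np.abs(W[:, j]) > tol)
        if idx.size == 0:                    # degenerate column: fall back to all samples
            idx = np.arange(W.shape[0])
        values, counts = np.unique(y_train[idx], return_counts=True)
        labels.append(values[np.argmax(counts)])
    return np.array(labels)

def sknn_predict(W, y_train, tol=1e-8):
    # weighted average of the neighbours' targets, weighted by |coefficient|
    preds = []
    for j in range(W.shape[1]):
        idx = np.flatnonzero(np.abs(W[:, j]) > tol)
        if idx.size == 0:
            preds.append(float(np.mean(y_train)))
            continue
        w = np.abs(W[idx, j])
        preds.append(float(np.dot(w, y_train[idx]) / w.sum()))
    return np.array(preds)

The full procedure, including normalization and cross-validation, is summarized in Algorithm 2 below.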
We describe these computations with Algorithm 2 as follows. In Algorithm 2, the input data is first normalized. Then, the dataset is divided into a set of training samples and a set of test samples for 10-fold cross validation. Thirdly, the coefficient matrix W between training samples and test samples is computed with Eq. (7), and the optimal solution W is obtained with the proposed optimization process. Consequently, we generate the most correlated training samples of a test sample, i.e., the top k candidates of nearest neighbors (training samples) of the test sample. Finally, we use the k nearest training samples to predict the test sample. This means that different test samples can be predicted with different k values.

Algorithm 2: Pseudo code of the S-kNN algorithm.
1 Normalizing X and Y (when Y contains class labels, without normalization);
2 Optimizing Eq. (7) to obtain the optimal solution W;
3 Obtaining the optimal k value for each test sample based on W;
4 switch task do
5   case 1
6     Obtaining class labels via the majority rule;
7   end
8   case 2
9     Obtaining prediction values via Eq. (16);
10  end
11  case 3
12    Obtaining imputation values via Eq. (16);
13  end
14 endsw

Table 1. Datasets used in the experiments (excerpt).
Dataset                       #Samples   #Features   Task
Abalone                       4177       8           imputation    No
Eunite2001                    336        16          imputation    No
Housing                       506        13          imputation    No
Pyrim                         74         28          imputation    No
YachtHydrodynamics (Yacht)    74         7           imputation    No

4.1. Experimental setting

In our experiments, the standard kNN algorithm was regarded as the first comparison algorithm, where the k value is set to 5. The second compared algorithm is Eq. (7) with the setting ρ1 = 0, i.e., using LASSO to learn different k values for the test samples. We call this algorithm L-kNN, and use it to show the importance of preserving the local structures of the data (Kang and Cho, 2008; Tibshirani, 1996).

There are 20 datasets involved to validate the proposed algorithm, which were downloaded from UCI (Bache and Lichman, 2013), LIBSVM (Chang and Lin, 2011) and the literature. These datasets are detailed in Table 1. We conducted experiments with the 10-fold cross-validation method, and repeated the whole process 10 times to avoid possible bias.

The classification accuracy is employed to measure the classification performance, and is defined as follows:

Accuracy = n_correct / n    (17)

where n is the number of all samples and n_correct is the number of correctly classified samples. The higher the accuracy, the better the classification performance of the algorithm.
The Root Mean Square Error (RMSE) (Zhu et al., 2013b) and the correlation coefficient are employed to evaluate the performance of regression and missing value imputation; RMSE measures the deviation between predictions and observations. The correlation coefficient is between +1 and −1, where 1 is perfect positive correlation, 0 is no correlation, and −1 is totally negative correlation. Generally, the larger the correlation coefficient is, the more accurate the prediction is.

We report the results on the 20 datasets in terms of three data mining tasks: classification, regression and missing value imputation. We evaluate regression and missing value imputation in the same subsection because they have the same prediction model.

4.2.1. Data classification

We summarize the classification accuracies of all algorithms in Table 2, where each entry is the average over the repeated runs. In terms of classification accuracy, the proposed S-kNN algorithm on average improves on the L-kNN algorithm and the standard kNN method by 4.47% and 22.38%, respectively. In addition, Figures 3-12 show that the S-kNN algorithm had the highest accuracy in each iteration. On the Satimage dataset (a multi-class dataset), the S-kNN on average improved accuracy by 5.47% over the L-kNN, and on the Heart dataset the S-kNN algorithm improved accuracy by 11.48% over the L-kNN method.

Both the S-kNN and the L-kNN outperformed the kNN algorithm. The reason is that both methods use different k values in the kNN algorithm, which leads to better classification performance.

Table 2. Classification performances in terms of accuracy (mean±std) (excerpt).
Dataset      kNN              L-kNN            S-kNN
Australian   0.5319±0.0036    0.6493±0.0013    0.6826±0.0021
Cleveland    0.7048±0.0029    0.7619±0.0030    0.8048±0.0038
Derm         0.6057±0.0061    0.9343±0.0007    0.9714±0.0009
Ionosphere   0.6743±0.0031    0.8286±0.0025    0.8571±0.0025
Heart        0.5381±0.0031    0.6741±0.0033    0.7889±0.0043
Sonar        0.6450±0.0047    0.7200±0.0051    0.7850±0.0034
Satimage     0.2871±0.0029    0.6758±0.0026    0.7424±0.0019
Seeds        0.8048±0.0033    0.8714±0.0015    0.9238±0.0011
MEAN         0.5972±0.0200    0.7608±0.0088    0.8155±0.0070

We also depict the RMSE results of the 10-times-repeated 10-fold cross-validation for all algorithms in Figures 13-22, and the correlation coefficient results of all algorithms in each iteration in Figures 23-32.

From Tables 3 and 4, in terms of RMSE, we can find that the proposed S-kNN outperforms both the L-kNN and the standard kNN. Figures 13-22 also show that the S-kNN algorithm had higher prediction performance than the two compared algorithms in each iteration. For the correlation coefficient, Figures 23-32 demonstrate results similar to those in Figures 13-22.

Table 3. Regression performances in terms of RMSE (mean±std) (excerpt).
Dataset         kNN                L-kNN              S-kNN
Bodyfat         2.1e-05±3.8e-09    1.4e-05±3.3e-09    1.3e-05±3.9e-09
Concreteslump   0.0176±4.1e-04     0.0151±3.2e-04     0.0139±3.1e-04
Wine-white      0.0071±1.6e-06     0.0064±1.0e-06     0.0061±9.3e-07

Table 4. Imputation performances in terms of RMSE (mean±std).
Dataset      kNN               L-kNN             S-kNN
Abalone      2.3894±0.1019     2.1261±0.1360     2.0850±0.1207
Eunite2001   31.2345±12.691    28.8972±12.253    26.0969±17.403
Housing      5.2228±1.2315     3.8666±0.2108     3.7948±0.2175
Pyrim        0.0673±0.0002     0.0492±0.0003     0.0484±0.0003
Yacht        10.5379±3.0492    10.1437±2.6463    9.4637±3.2310

Figs. 13-22. RMSE versus iterations for kNN, L-kNN and S-kNN on Bodyfat, Concreteslump, Mpg, Triazines, Wine-white, Abalone, Eunite2001, Housing, Pyrim and Yacht.

As we have seen, in the evaluation of RMSE, the proposed S-kNN has on average reduced the prediction error by 0.0734 and 0.0269 compared with the L-kNN and the standard kNN, respectively. In particular, the S-kNN algorithm made the largest improvement on the Mpg dataset, i.e., reducing the error by 0.1216 and 0.3387 compared with the L-kNN and the standard kNN. There is a similar performance for S-kNN missing value imputation in terms of RMSE.

In terms of the correlation coefficient on the five datasets, the proposed S-kNN on average increases by 3.66% over the L-kNN, and by 10.42% over the standard kNN. Moreover, the proposed method achieved the maximal increment on the Triazines dataset, i.e., 8.8% over the L-kNN and 22.85% over the standard kNN.

Like the above experiments evaluating S-kNN classification, the evaluations of S-kNN regression and missing value imputation also demonstrate two improvements, as follows.

• The proposed S-kNN outperformed the L-kNN because the preservation of the local structures of the data is well considered in the S-kNN algorithm.

• Both the S-kNN and the L-kNN outperformed the standard kNN because they learn different optimal k values for different test samples.
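For reference, the three measures used throughout Tables 2-6, namely the accuracy of Eq. (17), the RMSE, and the (Pearson) correlation coefficient between predictions and observations, can be computed as in the following minimal sketch; the function names are ours.

import numpy as np

def accuracy(y_true, y_pred):
    # Eq. (17): fraction of correctly classified samples
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

def rmse(y_true, y_pred):
    # root mean square error between observations and predictions
    e = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean(e ** 2))

def correlation(y_true, y_pred):
    # Pearson correlation coefficient, ranging from -1 to +1 as described in Section 4.1
    return np.corrcoef(np.asarray(y_true, dtype=float),
                       np.asarray(y_pred, dtype=float))[0, 1]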
Table 5. Regression performances in terms of correlation coefficient (mean±std).
Dataset         kNN               L-kNN             S-kNN
Bodyfat         0.9846±8.1e-05    0.9918±4.4e-05    0.9930±4.3e-05
Concreteslump   0.6606±0.0312     0.7719±0.0249     0.8194±0.0195
Mpg             0.8865±0.0019     0.8978±0.0014     0.9076±0.0011
Triazines       0.4256±0.0148     0.5661±0.0123     0.6541±0.0106
Wine-white      0.9041±4.9e-04    0.9260±2.7e-04    0.9307±1.9e-04

Table 6. Imputation performances in terms of correlation coefficient (mean±std).
Dataset      kNN              L-kNN            S-kNN
Abalone      0.6499±0.0008    0.7282±0.0024    0.7376±0.0021
Eunite2001   0.8239±0.0048    0.8493±0.0036    0.8741±0.0013
Housing      0.8285±0.0004    0.9142±0.0004    0.9197±0.0003
Pyrim        0.8283±0.0079    0.9140±0.0025    0.9201±0.0022
Yacht        0.7200±0.0044    0.7948±0.0039    0.8162±0.0034

Figs. 23-32. Correlation coefficient versus iterations for kNN, L-kNN and S-kNN on Bodyfat, Concreteslump, Mpg, Triazines, Wine-white, Abalone, Eunite2001, Housing, Pyrim and Yacht.

In a word, according to the results on the three learning tasks, we can make the following conclusion: first, it might be reasonable ...

5. Conclusion

In this paper, we have proposed a novel kNN method, S-kNN, that predicts test samples by learning different k values for different test samples according to the distribution of the data. It is an example-driven k-parameter computation. The key is the reconstruction of a sparse coefficient matrix between the test samples and the training data. Experiments on 20 real datasets have demonstrated that the proposed S-kNN method is efficient and promising compared with state-of-the-art kNN algorithms.

6. Acknowledgements

This work was supported in part by the China Key Research Program (Grant No: 2016YFB1000905), the China 973 Program (Grant No: 2013CB329404), the China 1000-Plan National Distinguished Professorship, and the National Natural Science Foundation of China (Grants No: 61573270, 61672177, ...).
References

Arefi, M., Taheri, S.M., 2015. Least-squares regression based on Atanassov's intuitionistic fuzzy inputs-outputs and Atanassov's intuitionistic fuzzy parameters. IEEE Transactions on Fuzzy Systems 23, 1142-1154.
Bache, K., Lichman, M., 2013. UCI machine learning repository. URL: http://archive.ics.uci.edu/ml.
Belkin, M., Niyogi, P., 2001. Laplacian eigenmaps and spectral techniques for embedding and clustering, in: International Conference on Neural Information Processing Systems: Natural and Synthetic, pp. 585-591.
Burba, F., Ferraty, F., Vieu, P., 2009. k-nearest neighbour method in functional nonparametric regression. Journal of Nonparametric Statistics 21, 453-469.
Chang, C.C., Lin, C.J., 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27:1-27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Chen, J., Shao, J., 2001. Jackknife variance estimation for nearest-neighbor imputation. Journal of the American Statistical Association 96, 260-269.
Cheng, D., Zhang, S., Deng, Z., Zhu, Y., Zong, M., 2014. kNN algorithm with data-driven k value, in: ADMA, pp. 499-512.
Cover, T., Hart, P., 1967. Nearest neighbor pattern classification. IEEE Transactions on Information Theory 13, 21-27.
Devroye, L., et al., 1981. On the almost everywhere convergence of nonparametric regression function estimates. The Annals of Statistics 9, 1310-1319.
Ferraty, F., Vieu, P., 2006. Nonparametric functional data analysis: theory and practice. Springer Series in Statistics, Springer.
Ghosh, A.K., 2006. On optimum choice of k in nearest neighbor classification. Computational Statistics & Data Analysis 50, 3113-3123.
Goldberger, J., Roweis, S.T., Hinton, G.E., Salakhutdinov, R., 2004. Neighbourhood components analysis, in: NIPS, pp. 513-520.
Goyal, R., Chandra, P., Singh, Y., 2014. Suitability of kNN regression in the development of interaction based software fault prediction models. IERI Procedia 6, 15-21.
He, X., Niyogi, P., 2003. Locality preserving projections, in: NIPS, pp. 153-160.
Hoef, J.M.V., Hailemariam, T., 2013. A comparison of the spatial linear model to nearest neighbor (k-nn) methods for forestry applications. PLoS One 8, e59129.
Hron, K., Templ, M., Filzmoser, P., 2010. Imputation of missing values for compositional data using classical and robust methods. Computational Statistics & Data Analysis 54, 3095-3107.
Hu, C., Jain, G., Zhang, P., Schmidt, C., Gomadam, P., Gorka, T., 2014. Data-driven method based on particle swarm optimization and k-nearest neighbor regression for estimating capacity of lithium-ion battery. Applied Energy 129, 49-55.
Hu, R., Zhu, X., Cheng, D., He, W., Yan, Y., Song, J., Zhang, S., 2017. Graph self-representation method for unsupervised feature selection. Neurocomputing 220, 130-137.
Jamshidi, Y., Kaburlasos, V.G., 2014. gsaINknn: A GSA optimized, lattice computing ...
Lall, U., Sharma, A., 1996. A nearest neighbor bootstrap for resampling hydrologic time series. Water Resources Research 32, 679-693.
Liu, H., Zhang, S., Zhao, J., Zhao, X., Mo, Y., 2010. A new classification algorithm using mutual nearest neighbors, in: GCC, pp. 52-57.
Mack, Y.P., 1981. Local properties of k-nn regression estimates. SIAM Journal ...
Varmuza, K., Filzmoser, P., Hilchenbach, M., Krüger, H., Silén, J., 2014. KNN classification evaluated by repeated double cross validation: Recognition of minerals relevant for comet dust. Chemometrics & Intelligent Laboratory Systems 138, 64-71.
Weinberger, K.Q., Saul, L.K., 2009. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research 10, 207-244.
Wu, X., Kumar, V., et al., 2008. Top 10 algorithms in data mining. Knowledge and Information Systems 14, 1-37.
Yager, R.R., Petry, F.E., 2014. Hypermatching: Similarity matching with extreme values. IEEE Transactions on Fuzzy Systems 22, 949-957.
Zhang, C., Zhu, X., et al., 2007. GBKII: an imputation method for missing values, in: PAKDD, pp. 1080-1087.
Zhang, S., 2011. Shell-neighbor method and its application in missing data imputation. Applied Intelligence 35, 123-133.
Zhang, S., Jin, Z., Zhu, X., 2011. Missing data imputation by utilizing information within incomplete instances. Journal of Systems and Software 84, 452-459.
Zhang, S., Li, X., Zong, M., Zhu, X., Cheng, D., 2017a. Learning k for kNN classification. ACM Transactions on Intelligent Systems and Technology 8, 43.
Zhang, S., Li, X., Zong, M., Zhu, X., Wang, R., 2017b. Efficient kNN classification with different numbers of nearest neighbors. IEEE Transactions on Neural Networks and Learning Systems, 10.1109/TNNLS.2017.2673241.
Zhang, S., Qin, Y., Zhu, X., Zhang, J., Zhang, C., 2006. Optimized parameters for missing data imputation, in: PRICAI 2006: Trends in Artificial Intelligence. Springer, pp. 1010-1016.
Zhang, S., Wu, X., Zhu, M., 2010. Efficient missing data imputation for supervised learning, in: ICCI, pp. 672-679.
Zhou, Z.H., Li, M., 2005. Semi-supervised regression with co-training, in: IJCAI, pp. 908-916.
Zhu, X., Huang, Z., Shen, H.T., Zhao, X., 2013a. Linear cross-modal hashing for efficient multimedia search, in: ACM Multimedia, pp. 143-152.
Zhu, X., Huang, Z., et al., 2013b. Video-to-shot tag propagation by graph sparse group lasso. IEEE Transactions on Multimedia 15, 633-646.
Zhu, X., Li, X., Zhang, S., 2016a. Block-row sparse multiview multilabel learning for image classification. IEEE Transactions on Cybernetics 46, 450-461.
Zhu, X., Li, X., Zhang, S., Ju, C., Wu, X., 2017a. Robust joint graph sparse coding for unsupervised spectral feature selection. IEEE Transactions on Neural Networks and Learning Systems 28, 1263-1275.
Zhu, X., Li, X., Zhang, S., Xu, Z., Yu, L., Wang, C., 2017b. Graph PCA hashing for similarity search. IEEE Transactions on Multimedia.
Zhu, X., Suk, H., Wang, L., Lee, S., Shen, D., 2017c. A novel relational regularization feature selection method for joint regression and classification in AD diagnosis. Medical Image Analysis 38, 205-214.
Zhu, X., Suk, H.I., Huang, H., Shen, D., 2017d. Low-rank graph-regularized structured sparse regression for identifying genetic biomarkers. IEEE Transactions ...
Zhu, X., Zhang, L., Huang, Z., 2014. A sparse embedding and least variance encoding approach to hashing. IEEE Transactions on Image Processing 23, 3737-3750.