
Accepted Manuscript

A novel kNN Algorithm with Data-driven k Parameter Computation

Shichao Zhang, Debo Cheng, Zhenyun Deng, Ming Zong, Xuelian Deng

PII: S0167-8655(17)30356-2
DOI: 10.1016/j.patrec.2017.09.036
Reference: PATREC 6948

To appear in: Pattern Recognition Letters

Received date: 1 June 2017


Revised date: 1 September 2017
Accepted date: 25 September 2017

Please cite this article as: Shichao Zhang, Debo Cheng, Zhenyun Deng, Ming Zong, Xuelian Deng, A
novel kNN Algorithm with Data-driven k Parameter Computation, Pattern Recognition Letters (2017),
doi: 10.1016/j.patrec.2017.09.036

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service
to our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and
all legal disclaimers that apply to the journal pertain.

Highlights

• Existing kNN approximate prediction algorithms use a fixed k value for the whole problem space. The S-kNN algorithm identifies an optimal k value for each test sample, i.e., the parameter k can be different for different test samples.

• Different from the conventional Least Absolute Shrinkage and Selection Operator (LASSO), our approach takes the local structures of samples into account.

• This paper proposes a novel optimization method to solve the designed objective function.


Pattern Recognition Letters
journal homepage: www.elsevier.com

A novel kNN Algorithm with Data-driven k Parameter Computation

Shichao Zhang(a), Debo Cheng(a,b), Zhenyun Deng(a), Ming Zong(c), Xuelian Deng(d)

a Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, Guangxi, China
b Information Technology and Mathematical Sciences, University of South Australia, Adelaide, Australia
c Institute of Natural and Mathematical Sciences, Massey University, Auckland, New Zealand
d College of Public Health and Management, Guangxi University of Chinese Medicine, Nanning, Guangxi, China

ABSTRACT

This paper studies an example-driven k-parameter computation that identifies different k values for different test samples in kNN prediction applications, such as classification, regression and missing data imputation. This is carried out by reconstructing a sparse coefficient matrix between test samples and training data. In the reconstruction process, an ℓ1-norm regularization is employed to generate an element-wise sparse coefficient matrix, and an LPP (Locality Preserving Projection) regularization is adopted to keep the local structures of the data for achieving efficiency. Further, with the learnt k value, the kNN approach is applied to classification, regression and missing data imputation. We experimentally evaluate the proposed approach with 20 real datasets, and show that our algorithm is much better than previous kNN algorithms on data mining tasks such as classification, regression and missing value imputation.

© 2017 Elsevier Ltd. All rights reserved.

1. Introduction

The kNN (k Nearest Neighbors) algorithm is a non-parametric, instance-based, lazy method, and has been regarded as one of the simplest methods in data mining and machine learning (Qin et al., 2013)(Zhang et al., 2017a)(Zhang et al., 2017b). The principle of the kNN algorithm is that the most similar samples belong to the same class with high probability. Generally, the kNN algorithm first finds the k nearest neighbors of a query in the training dataset, and then predicts the query with the majority class among those k nearest neighbors. Therefore, it has recently been selected as one of the top 10 algorithms in data mining (Wu et al., 2008).

As is well known, the kNN algorithm is often sensitive to the selection of the k value. Although efforts have been focused on this topic for a long time, setting the k value is still very challenging in the kNN algorithm (Zhang et al., 2010). Lall and Sharma mentioned that a suitable k should satisfy $k = \sqrt{n}$ for training datasets with sample size larger than 100 (Lall and Sharma, 1996). Ghosh investigated a Bayesian method mainly for guiding the selection of k (Ghosh, 2006). Mitra et al. thought that there is no theory to guarantee that $k = \sqrt{n}$ is suitable for each test sample. Liu et al. pointed out that it has been proved that a fixed k value is not suitable for many test samples in a given training dataset (Liu et al., 2010).

We now illustrate the above limitations of the kNN algorithm with a fixed k value in Figures 1 and 2. Figure 1 is an example of a binary classification task, where the training samples of the two classes are marked as '+' and '-' respectively, and the labels of test samples are marked with the symbol '?'. Figure 2 is an example of missing data imputation, where the symbol '?' stands for data with missing values.

Fig. 1. Training examples for kNN classification.

∗∗ Corresponding author: Shichao Zhang, e-mail: [email protected] (Shichao Zhang)
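For reference in the examples that follow, the fixed-k prediction rule sketched above can be written in a few lines. The snippet below is our own illustrative Python/NumPy sketch (the function and variable names are not from the paper), assuming Euclidean distance and a simple majority vote:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5):
    """Standard fixed-k kNN: majority vote among the k nearest training samples."""
    # Euclidean distance from the query to every training sample
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training samples
    nn_idx = np.argsort(dists)[:k]
    # Predict the majority class among those neighbors
    return Counter(y_train[nn_idx].tolist()).most_common(1)[0][0]
```

S-kNN keeps this prediction step but computes k per test sample, as developed in Section 3.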

Fig. 2. Training examples for kNN regression / missing value imputation.

In Figure 1, when setting k=5 for the kNN algorithm, both test samples are predicted to the '+' class according to the kNN rule, and the left test sample is incorrectly predicted. When setting k=1, both test samples are incorrectly predicted. From the training examples, it is reasonable to take k=3 and k=7 for the left test sample and the right one, respectively.

For a similar scenario of missing value imputation in Figure 2, the right and left test samples should be assigned different k, i.e., k=3 and k=5 respectively. This scenario also indicates that different test samples should take different numbers of nearest neighbors in real kNN prediction applications. That is to say, setting a fixed constant for all test samples may often lead to low prediction rates in real classification applications.

Motivated by the above facts, this paper proposes a k-parameter computation for kNN approximate prediction based on sparse learning, called S-kNN¹ (Cheng et al., 2014). The k-parameter computation can identify different k values for predicting different test samples with the kNN algorithm. This is carried out by reconstructing a sparse coefficient matrix between test samples and training data (Zhu et al., 2017d)(Zhu et al., 2016b). With the matrix, an optimal k value can be obtained for each test sample one by one. In the reconstruction, a least square loss function is applied to achieve the minimal reconstruction error, and an ℓ1-norm regularization is utilized to produce element-wise sparsity (i.e., the sparse codes appear in the elements of the coefficient matrix) for generating various k values for different test samples (Zhu et al., 2017c)(Zhu et al., 2017b). We also employ the Locality Preserving Projection (LPP) regularization to preserve the local structures of data during the reconstruction process, aiming to further improve the reconstruction performance (He and Niyogi, 2003)(Hu et al., 2017). The proposed S-kNN algorithm is experimentally evaluated on data mining tasks such as classification, regression and missing value imputation. Compared with previous kNN algorithms, the main contributions are as follows.

• Existing kNN approximate prediction algorithms use a fixed k value for the whole problem space. The S-kNN algorithm identifies an optimal k value for each test sample, i.e., the parameter k can be different for different test samples.

• Different from the conventional Least Absolute Shrinkage and Selection Operator (LASSO) (Kang and Cho, 2008; Tibshirani, 1996), our approach takes the local structures of samples into account.

• This paper proposes a novel optimization method to solve the designed objective function.

The remainder of the paper is organized as follows. Section 2 briefly reviews related kNN methods for classification, regression and missing value imputation. Section 3 is the main body of our S-kNN method. The proposed method is experimentally evaluated with real datasets in Section 4. Finally, this research is concluded in Section 5.

¹ In this manuscript, we rewrote Section 1 and Section 4 and added Sections 2.1, 2.2, 2.3, 3.3, and 3.4, compared to our former conference version.

2. Related Work

The study of the kNN method has been a hot research topic in data mining and machine learning since the algorithm was proposed in 1967 (Cover and Hart, 1967). In this section, we briefly review the applications of the kNN algorithm in data mining tasks such as classification, regression and missing value imputation.

2.1. Classification

The kNN classification algorithm first selects the k closest samples (i.e., k nearest neighbors) for a test sample from all the training samples, and then predicts the test sample with a simple classifier, e.g., the majority classification rule. Liu et al. designed a new anomaly removal algorithm under the framework of kNN classification (Liu et al., 2010), which adopts mutual nearest neighbors; its advantage is that pseudo nearest neighbors can be identified instead of k nearest neighbors to determine the class labels of unknown samples. Weinberger et al. used semidefinite programming to learn a Mahalanobis distance metric for kNN classification, and adopted the target that the k nearest neighbors always belong to the same class to optimize the metric, so that samples from different classes are separated by a large margin (Weinberger and Saul, 2009). Moreover, Goldberger et al. proposed a novel non-parametric kNN classification method, called neighbourhood component analysis (NCA), that learns a new quadratic distance metric (Goldberger et al., 2004). This method encourages the learned distance to be low-rank, so as to save storage and search costs. Jamshidi and Kaburlasos proposed an effective synergy of the Intervals' Number k-nearest neighbor classifier and the gravitational search algorithm (GSA) for stochastic search and optimization (Jamshidi and Kaburlasos, 2014). Saini et al. presented an application of the k-Nearest Neighbor (kNN) algorithm as a classifier for detection of the QRS-complex in ECG (Saini et al., 2013). This algorithm uses a digital band-pass filter to reduce the interference present in the ECG signal and avoid false detections. To avoid the influence of the k value, Varmuza et al. used the repeated double cross validation method to search an optimal k for k nearest neighbor classification (Varmuza et al., 2014).

2.2. Regression

kNN regression has been widely used and studied for many years in pattern recognition and data mining. In regression analysis, Burba et al. utilized a kernel estimator based on some asymptotic properties of kNN to improve the performance of kNN regression (Burba et al., 2009). Moreover, their work utilized a local adaptive bandwidth to study the non-parametric kNN algorithm. Ferraty and Vieu utilized the functional version of the Nadaraya-Watson kernel type estimator to construct the non-parametric characteristics of the kNN algorithm for estimation, classification and discrimination on high dimensional data (Ferraty and Vieu, 2006). In the theory of the kNN algorithm, Mack studied the L2 convergence and the asymptotic distribution (Mack, 1981), and Devroye proved the strong consistency and the uniform convergence of the kNN algorithm (Devroye et al., 1981). Hu et al. proposed a data-driven method for battery capacity estimation, and used a non-linear kernel regression model based on kNN to capture the dependency of the capacity on the features. This work also utilizes an adaptation of particle swarm optimization to find the feature weights for the kNN regression model (Hu et al., 2014). Goyal et al. took the interrelatedness of software metrics into account and statistically established the extent to which they improve the explanatory power of multiple linear regression. They then conducted stepwise regression to identify influential metrics to avoid over-fitting of data, and proposed the suitability of kNN regression in the development of fault prediction models (Goyal et al., 2014). Predicting the cycle time of wafer lots in a semiconductor fab is a critical task; therefore, Ni et al. combined particle swarm optimization with a Gaussian mutation operator and a simulated weighting of the features for kNN regression, and then used it to predict the cycle time of a wafer fab (Ni et al., 2012). Zhou proposed semi-supervised regression with co-training (Zhou and Li, 2005), which employed two kNN regressors with different distance metrics, each of which labeled the unlabeled data for the other during the learning process.

2.3. Missing value imputation

In real data mining applications, missing data is often inevitable. There are many techniques to deal with missing data, which can mainly be divided into two categories, missing instance deletion and missing value imputation (Qin et al., 2007)(Zhang et al., 2011)(Zhang, 2011), in which kNN imputation is an important approximate solution in real applications. For instance, Zhang et al. utilized a grey-based distance measure to replace the Euclidean distance in the conventional kNN algorithm, which can improve the performance of the kNN imputation algorithm, referred to as Grey-Based kNN Iteration Imputation (GBKII) (Zhang et al., 2007). It is an instance-based and non-parametric imputation algorithm. Chen and Shao showed that the naive jackknife variance estimator that treats imputed values as observed data produces serious underestimation under nearest-neighbor imputation, and proposed a nonparametric variance estimation technique that is asymptotically unbiased and consistent for the sample means (Chen and Shao, 2001). Meesad and Hengpraprohm first proposed a methodology to impute missing values in microarray data, which combines kNN-based feature selection and kNN-based imputation, and then estimated the missing values by the kNN algorithm (Meesad and Hengpraprohm, 2008). Recently, Zhang et al. developed a kernel-based missing value imputation algorithm that takes the attribute correlations within data into account, so as to obtain optimal statistical parameters (mean and distribution function) after the missing data are imputed; thus, the method improves the performance of previous kNN algorithms on the nearest neighbors of missing data (Zhang et al., 2006). Hoef et al. proposed two methods, i.e., the spatial linear model and k nearest neighbor, for mapping and estimating totals. In order to enhance prediction and understanding, they employed a Bayesian approach to account for the covariance parameters (Hoef and Hailemariam, 2013). To adjust the estimated missing values to the overall size of the compositional parts of the neighbors, Hron et al. proposed a k nearest neighbor procedure based on the Aitchison distance, and used an iterative model-based imputation technique to refine the result of the proposed k-nearest neighbor procedure (Hron et al., 2010).

From the above subsections, existing kNN methods use a fixed k value for the whole problem space and often lead to poor predictions.

3. Proposed Method

In this section, we introduce some basic concepts used in our proposed method, then describe the S-kNN method, and finally improve the S-kNN method by optimizing the objective function.

3.1. Notation

Throughout the paper, we denote matrices as boldface uppercase letters, vectors as boldface lowercase letters, and scalars as normal italic letters. For a matrix $\mathbf{X} = [x_{ij}]$, its ith row and jth column are denoted as $\mathbf{x}^i$ and $\mathbf{x}_j$, respectively. The Frobenius norm, the ℓ2-norm and the ℓ1-norm are represented as $\|\mathbf{X}\|_F = \sqrt{\sum_i \|\mathbf{x}^i\|_2^2} = \sqrt{\sum_j \|\mathbf{x}_j\|_2^2}$, $\|\mathbf{x}_j\|_2 = \sqrt{\sum_i x_{i,j}^2}$, and $\|\mathbf{X}\|_1 = \sum_i \sum_j |x_{i,j}|$, respectively. The transpose operator, trace operator and inverse of a matrix X are expressed as $\mathbf{X}^T$, $tr(\mathbf{X})$, and $\mathbf{X}^{-1}$, respectively.

3.2. Reconstruction and LPP (locality preserving projection)

Given $\mathbf{X} = \{\mathbf{x}_i\}_{i=1}^{n} \in \mathbb{R}^{n \times d}$ and $\mathbf{Y} = \{\mathbf{y}_i\}_{i=1}^{m} \in \mathbb{R}^{m \times d}$, where n, m, and d stand for the numbers of training samples, test samples, and feature dimensions, respectively. To obtain the reconstruction coefficient matrix $\mathbf{W} \in \mathbb{R}^{n \times m}$, we reconstruct the test samples with the training samples. Thus, the objective function is defined as follows (Yager and Petry, 2014):

$\arg\min_{\mathbf{W}} \sum_i \left\| \mathbf{w}_i^T \mathbf{X} - \mathbf{y}_i \right\|$   (1)

where $w_{i,j}$ is utilized to measure the correlation between $\mathbf{y}_i$ and the training samples $\mathbf{x}_j$. The larger the value of $w_{i,j}$ is, the more relevant the ith test sample and the jth training sample are. In particular, $w_{i,j} = 0$ denotes that there is no correlation between $\mathbf{y}_i$ and $\mathbf{x}_j$.
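The norms of Section 3.1 appear throughout the objective functions below. As a quick numerical illustration (our own NumPy sketch, not code from the paper):

```python
import numpy as np

X = np.random.randn(5, 3)                    # any matrix X = [x_{ij}]

fro = np.sqrt((X ** 2).sum())                # ||X||_F : sqrt of the sum of all squared entries
l1 = np.abs(X).sum()                         # ||X||_1 : sum of absolute values of all entries
col_l2 = np.sqrt((X ** 2).sum(axis=0))       # ||x_j||_2 for each column j

# Cross-check against NumPy's built-in norms
assert np.isclose(fro, np.linalg.norm(X, 'fro'))
assert np.isclose(l1, np.linalg.norm(X.ravel(), 1))
assert np.allclose(col_l2, np.linalg.norm(X, axis=0))
```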

To carry out the reconstruction, LPP (locality preserving projection) (He and Niyogi, 2003) is applied to obtain an optimal linear transformation W. The LPP technique can preserve the local structure of the original data in the new space, i.e., W converts the high-dimensional data X into the low-dimensional data Y with the following definition:

$\mathbf{y}_i = \mathbf{W}^T \mathbf{x}_i, \quad i = 1, 2, \cdots, n$   (2)

To this end, the objective function of LPP can be defined as follows:

$\min_{\mathbf{W}} \sum_{i,j} \left( \mathbf{W}^T \mathbf{x}_i - \mathbf{W}^T \mathbf{x}_j \right)^2 s_{i,j}$   (3)

where S is the weight matrix and each element of S is defined by a heat kernel $s_{i,j} = \exp\left( \frac{-\|\mathbf{x}_i - \mathbf{x}_j\|^2}{\sigma} \right)$, where σ is a tuning parameter. Without loss of generality, we set σ = 1 in our experiments; the justification for this choice of weights can be traced back to (Belkin and Niyogi, 2001).

By plugging Eq.(2) into Eq.(3) and performing some algebraic transformations, we obtain:

$\frac{1}{2} \sum_{i,j} \left( \mathbf{W}^T \mathbf{x}_i - \mathbf{W}^T \mathbf{x}_j \right)^2 s_{i,j} = \sum_i \mathbf{W}^T \mathbf{x}_i d_{i,i} \mathbf{x}_i^T \mathbf{W} - \sum_{i,j} \mathbf{W}^T \mathbf{x}_i s_{i,j} \mathbf{x}_j^T \mathbf{W} = tr(\mathbf{W}^T \mathbf{X} \mathbf{D} \mathbf{X}^T \mathbf{W}) - tr(\mathbf{W}^T \mathbf{X} \mathbf{S} \mathbf{X}^T \mathbf{W}) = tr(\mathbf{W}^T \mathbf{X} \mathbf{L} \mathbf{X}^T \mathbf{W})$   (4)

where D is a diagonal matrix whose ith diagonal element is defined as $d_{i,i} = \sum_j s_{i,j}$. Hence, $\mathbf{L} = \mathbf{D} - \mathbf{S}$ is the Laplacian matrix.

3.3. Approach

When we reconstruct the test samples Y with the training samples X to obtain the linear transformation matrix W, we expect to map X into the space of Y via W and make the distance between Y and $\mathbf{W}^T\mathbf{X}$ as small as possible. Accordingly, we employ the least square loss function to control the reconstruction error (Arefi and Taheri, 2015)(Zhu et al., 2016a):

$\left\| \mathbf{W}^T \mathbf{X} - \mathbf{Y} \right\|_F^2 = \left\| \hat{\mathbf{Y}} - \mathbf{Y} \right\|_F^2 = \sum_{i=1}^{n} \sum_{j=1}^{m} \left( \hat{y}_{i,j} - y_{i,j} \right)^2$   (5)

where $\hat{\mathbf{Y}}$ is the new representation of X in the space of Y, i.e., $\hat{\mathbf{Y}} = \mathbf{W}^T \mathbf{X}$, and $\|\mathbf{W}^T\mathbf{X} - \mathbf{Y}\|_F^2$ denotes the reconstruction error. Since Eq.(5) is convex, we can easily obtain its global solution $\mathbf{W} = (\mathbf{X}\mathbf{X}^T)^{-1}\mathbf{X}\mathbf{Y}$. Because $\mathbf{X}\mathbf{X}^T$ is not always invertible in real applications, an ℓ2-norm is added to remove the invertibility issue. Thus, the objective function changes to ridge regression:

$\arg\min_{\mathbf{W}} \left\| \mathbf{W}^T \mathbf{X} - \mathbf{Y} \right\|_F^2 + \delta \|\mathbf{W}\|_2^2$   (6)

where δ is a tuning parameter. The optimal solution of Eq.(6) can be written in closed form as $\mathbf{W} = (\mathbf{X}\mathbf{X}^T + \delta\mathbf{I})^{-1}\mathbf{X}\mathbf{Y}$, where $\mathbf{I} \in \mathbb{R}^{n \times n}$ is an identity matrix.
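The two ingredients introduced so far, the heat-kernel Laplacian of Eq. (4) and the ridge solution of Eq. (6), can be sketched directly from their definitions. The following is an illustrative NumPy sketch under our reading of the paper's notation (X holds training samples in rows, Y holds test samples in rows, and the closed form is written here as X Yᵀ so that W has shape n × m); it is not the authors' implementation:

```python
import numpy as np

def heat_kernel_laplacian(X, sigma=1.0):
    """Weight matrix S and Laplacian L = D - S over the training samples (Eqs. 3-4)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # pairwise ||x_i - x_j||^2
    S = np.exp(-sq / sigma)                                   # heat-kernel weights s_{i,j}
    D = np.diag(S.sum(axis=1))                                # d_{i,i} = sum_j s_{i,j}
    return S, D - S

def ridge_reconstruction(X, Y, delta=1e-2):
    """Closed-form solution of Eq. (6): W = (X X^T + delta I)^{-1} X Y^T, of shape n x m."""
    n = X.shape[0]
    return np.linalg.solve(X @ X.T + delta * np.eye(n), X @ Y.T)
```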

The ℓ1-norm regularization term has been proved to generate zero elements in a matrix, i.e., to lead to sparsity (Zhu et al., 2017a), while many studies have shown that the ℓ2-norm does not necessarily generate sparse results. In this paper, the element $w_{i,j}$ in W indicates the correlation between the ith test sample and the jth training sample. We expect each test sample to be represented by only part of the training samples, i.e., many zero elements in each column of W. Therefore, it makes sense for us to use an ℓ1-norm term to replace the ℓ2-norm. Meanwhile, we also employ the LPP to preserve the local structures of data after the reconstruction process. Thus, we define the objective function of the proposed S-kNN method as follows:

$\arg\min_{\mathbf{W}} \frac{1}{2} \left\| \mathbf{W}^T \mathbf{X} - \mathbf{Y} \right\|_F^2 + \rho_1 tr(\mathbf{W}^T \mathbf{X} \mathbf{L} \mathbf{X}^T \mathbf{W}) + \rho_2 \|\mathbf{W}\|_1$   (7)

where ρ1 is a tuning parameter designed to balance the magnitude between $tr(\mathbf{W}^T\mathbf{X}\mathbf{L}\mathbf{X}^T\mathbf{W})$ and $\|\mathbf{W}^T\mathbf{X} - \mathbf{Y}\|_F^2$. Moreover, the larger the value of ρ1 is, the larger the contribution of LPP in Eq.(7) will be. In particular, Eq.(7) shrinks to LASSO when setting ρ1 = 0.

Different from LASSO, the proposed S-kNN takes the preservation of the local structures of data into consideration via LPP. Moreover, S-kNN is utilized to learn the k value for the kNN algorithm, whereas conventional kNN algorithms often use a fixed k value for all test samples, or learn the k value for each test sample without respect to the correlation among test samples. The proposed S-kNN algorithm learns an optimal k value for each test sample with the above reconstruction process. During the reconstruction process, the proposed method considers the correlation between test samples and training samples. It first considers the correlations of test samples through generating the k values for all test samples. Then, the correlations of training samples are taken into account by adding the LPP regularization term into the reconstruction process. Consequently, the proposed S-kNN method is a data-driven method for selecting the optimal k values.

In Eq.(7), each element $w_{i,j}$ of the matrix W can be understood as the correlation between the ith test sample and the jth training sample. If $w_{i,j} > 0$, their correlation is positive; if $w_{i,j} < 0$, the correlation is negative; and if $w_{i,j} = 0$, the ith test sample is unrelated to the jth training sample. To understand the optimization of Eq.(7), assume we have the following optimal W:

$\mathbf{W} = \begin{bmatrix} 0 & 0 & 0.7 & 0.3 \\ 0.1 & 0 & 0.3 & 0 \\ 0.6 & 0 & 0 & 0 \\ 0 & 0.7 & 0 & 0.1 \\ 0 & 0 & 0 & 0.5 \\ 0.5 & 0 & 0 & 0.7 \end{bmatrix}$

In this example, there are six training samples and four test samples. According to W, we know that the first column has three nonzero elements. That is, the first test sample has three nearest neighbors, i.e., the value k=3. Moreover, the greater the correlation value is, the closer the correlation between the test sample and the training samples is. The second column has one nonzero element. It means that the second test sample has only one nearest neighbor, i.e., the value k=1. With the same rule, the third test sample and the fourth test sample have two nearest neighbors (i.e., the value k=2) and four nearest neighbors (i.e., the value k=4), respectively. Note that the sparsity (i.e., the many zero elements in W) is generated by the introduction of the ℓ1-norm in Eq.(7). This leads our algorithm to output different k values for different test samples. Moreover, the LPP is introduced to further improve the performance of the reconstruction process in Eq.(7). In contrast, almost all conventional kNN algorithms employ a fixed k value decided by users or experts for all test samples.

3.4. Optimization

Note that Eq.(7) is a convex but non-smooth function. In this subsection, we address this by designing a new accelerated proximal gradient method (Zhu et al., 2014). We first conduct the proximal gradient method on Eq.(7) by letting:

$f(\mathbf{W}) = \frac{1}{2} \left\| \mathbf{W}^T \mathbf{X} - \mathbf{Y} \right\|_F^2 + \rho_1 tr(\mathbf{W}^T \mathbf{X} \mathbf{L} \mathbf{X}^T \mathbf{W})$   (8)

$\vartheta(\mathbf{W}) = f(\mathbf{W}) + \rho_2 \|\mathbf{W}\|_1$   (9)

We know that f(W) is convex and differentiable. Thus, we use the proximal gradient method to optimize W, and iteratively update it by means of the following optimization rule:

$\mathbf{W}(t+1) = \arg\min_{\mathbf{W}} G_{\eta(t)}(\mathbf{W}, \mathbf{W}(t))$   (10)

where $G_{\eta(t)}(\mathbf{W}, \mathbf{W}(t)) = f(\mathbf{W}(t)) + \langle \nabla f(\mathbf{W}(t)), \mathbf{W} - \mathbf{W}(t) \rangle + \frac{\eta(t)}{2} \|\mathbf{W} - \mathbf{W}(t)\|_F^2 + \rho_2 \|\mathbf{W}\|_1$, $\nabla f(\mathbf{W}(t)) = (\mathbf{X}\mathbf{X}^T + \rho_1 \mathbf{X}\mathbf{L}\mathbf{X}^T)\mathbf{W}(t) - \mathbf{X}\hat{\mathbf{Y}}^T$, $\langle\cdot,\cdot\rangle$ is an inner product operator, η(t) determines the step size of the tth iteration, W(t) is the value of W obtained at the tth iteration, and ρ2 is a tuning parameter.

By ignoring the terms independent of W in Eq.(10), we can rewrite it as follows:

$\mathbf{W}(t+1) = \pi_{\eta(t)}(\mathbf{W}(t)) = \arg\min_{\mathbf{W}} \frac{1}{2} \|\mathbf{W} - \mathbf{U}(t)\|_2^2 + \frac{\rho_2}{\eta(t)} \|\mathbf{W}\|_1$   (11)

where $\mathbf{U}(t) = \mathbf{W}(t) - \frac{1}{\eta(t)} \nabla f(\mathbf{W}(t))$ and $\pi_{\eta(t)}(\mathbf{W}(t))$ is the Euclidean projection of W(t) onto the convex set η(t). Taking into account the separability of W(t+1) on each row, i.e., $\mathbf{w}^i(t+1)$, we update the weights for each row individually as follows:

$\mathbf{w}^i(t+1) = \arg\min_{\mathbf{w}^i} \frac{1}{2} \left\| \mathbf{w}^i - \mathbf{u}^i(t) \right\|_2^2 + \frac{\rho_2}{\eta(t)} \left\| \mathbf{w}^i \right\|_2$   (12)

where $\mathbf{u}^i(t) = \mathbf{w}^i(t) - \frac{1}{\eta(t)} \nabla f(\mathbf{w}^i(t))$, and $\mathbf{u}^i(t)$ and $\mathbf{w}^i(t)$ are the ith rows of U(t) and W(t), respectively. According to Eq.(12), $\mathbf{w}^i(t+1)$ takes a closed-form solution as follows:

$\mathbf{w}^{i*} = \max\left\{ \left\|\mathbf{w}^i\right\| - \rho_2, 0 \right\} \cdot \mathrm{sgn}(\mathbf{w}^i)$   (13)

Meanwhile, in order to accelerate the proximal gradient method in Eq.(8), we introduce an auxiliary variable V(t+1) as follows:

$\mathbf{V}(t+1) = \mathbf{W}(t) + \frac{\alpha(t) - 1}{\alpha(t+1)} \left( \mathbf{W}(t+1) - \mathbf{W}(t) \right)$   (14)

where the coefficient α(t+1) is usually set as $\alpha(t+1) = \frac{1 + \sqrt{1 + 4\alpha(t)^2}}{2}$.

Finally, we present the pseudo code of our proposed optimization method in Algorithm 1 and its convergence in Theorem 1.

Algorithm 1: Pseudo code for solving Eq.(7).
Input: η(0) = 0.01, α(1) = 1, γ = 0.002, ρ1, ρ2;
Output: W;
1  Initialize t = 1;
2  Initialize W(1) as a random diagonal matrix;
3  repeat
4      while $L(\mathbf{W}(t)) > G_{\eta(t-1)}(\pi_{\eta(t-1)}(\mathbf{W}(t)), \mathbf{W}(t))$ do
5          Set η(t − 1) = γη(t − 1);
6      end
7      Set η(t) = η(t − 1);
8      Compute $\mathbf{W}(t+1) = \arg\min_{\mathbf{W}} G_{\eta(t)}(\mathbf{W}, \mathbf{V}(t))$;
9      Compute $\alpha(t+1) = \frac{1 + \sqrt{1 + 4\alpha(t)^2}}{2}$;
10     Compute Eq.(14);
11 until Eq.(7) converges;

Theorem 1. Let {W(t)} be the sequence generated by Algorithm 1; then for all t ≥ 1, the following formula holds:

$\vartheta(\mathbf{W}(t)) - \vartheta(\mathbf{W}^*) \leq \frac{2 \gamma L \|\mathbf{W}(1) - \mathbf{W}^*\|_F^2}{(t+1)^2}$   (15)

where γ > 0 is a predefined constant, L is the Lipschitz constant of the gradient of f(W) in Eq.(8), and $\mathbf{W}^* = \arg\min_{\mathbf{W}} \vartheta(\mathbf{W})$.

Theorem 1 shows that the convergence rate of the proposed accelerated proximal gradient method is $O(\frac{1}{t^2})$, where t is the number of iterations in Algorithm 1.

3.5. S-kNN Algorithm

In our proposed algorithm, we first optimize Eq.(7) to obtain the correlation coefficient matrix W, so as to obtain an optimal k value for each test sample. Then, we use the selected k to conduct the kNN algorithm for different data mining tasks, such as classification, regression, and missing value imputation.

For regression and missing value imputation tasks, the bigger the correlation between a test sample and its nearest neighbors is, the larger the contribution of the nearest neighbors to the test sample is. Therefore, we propose to employ a weighted method for both the regression task and the missing value imputation task (Zhu et al., 2013a). We define the weighted predictive value of the jth test sample as follows:

$predictvalue\_weight = \sum_{i=1}^{n} \left( \frac{w_{i,j}}{\sum_{i=1}^{n} w_{i,j}} \times ytrain(i) \right)$   (16)

where n is the number of training samples, and ytrain(i) stands for the true value of the ith training sample.

For classification applications, the proposed S-kNN algorithm uses the k nearest neighbors of each test sample to predict its class label with the majority rule.
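To make the use of W concrete, the per-test-sample k of Section 3.5 and the weighted prediction of Eq. (16) can be sketched as follows. This is our own NumPy illustration (a column summing to zero would need special handling in practice):

```python
import numpy as np

def k_per_test_sample(W, tol=1e-8):
    """Data-driven k: the number of nonzero coefficients in each column of W (one column per test sample)."""
    return (np.abs(W) > tol).sum(axis=0)

def weighted_prediction(W, y_train, tol=1e-8):
    """Eq. (16): predict each test sample as a normalized weighted sum of training targets."""
    preds = np.empty(W.shape[1])
    for j in range(W.shape[1]):
        w = np.where(np.abs(W[:, j]) > tol, W[:, j], 0.0)  # keep only the selected neighbors
        preds[j] = (w / w.sum()) @ y_train                 # sum_i (w_{i,j} / sum_i w_{i,j}) * ytrain(i)
    return preds
```

Applied to the 6 x 4 example matrix above, k_per_test_sample gives k = 3, 1, 2, 4 for the four test samples, matching the values read off in the text.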
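For completeness, a simplified, non-accelerated version of the proximal-gradient update of Section 3.4 (plain ISTA with a fixed step size, i.e., without the line search and the acceleration step of Eq. (14) used in Algorithm 1) can be sketched as below. The gradient is written in the standard matrix-calculus form for Eq. (8); the sketch illustrates the technique and is not the authors' code:

```python
import numpy as np

def soft_threshold(U, tau):
    """Element-wise soft-thresholding: the proximal operator of tau * ||.||_1 (cf. Eq. 13)."""
    return np.sign(U) * np.maximum(np.abs(U) - tau, 0.0)

def sknn_weights_ista(X, Y, L, rho1=0.1, rho2=0.01, n_iter=200):
    """Approximately minimize Eq. (7): 0.5||W^T X - Y||_F^2 + rho1 tr(W^T X L X^T W) + rho2 ||W||_1."""
    n, m = X.shape[0], Y.shape[0]
    A = X @ X.T + 2.0 * rho1 * (X @ L @ X.T)   # Hessian of the smooth part f(W)
    eta = np.linalg.norm(A, 2)                 # any eta >= Lipschitz constant of grad f
    W = np.zeros((n, m))
    XYt = X @ Y.T
    for _ in range(n_iter):
        grad = A @ W - XYt                     # gradient of f at W
        U = W - grad / eta                     # gradient step (U(t) in Eq. 11)
        W = soft_threshold(U, rho2 / eta)      # proximal step for the l1 penalty
    return W
```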

Algorithm 2: Pseudo code of the S-kNN algorithm.
Input: X, Y;
Output:
    switch task do
        case 1: Class labels;
        case 2: Predicted value;
        case 3: Imputation value;
    endsw
1  Normalize X and Y (when Y contains class labels, it is not normalized);
2  Optimize Eq.(7) to obtain the optimal solution W;
3  Obtain the optimal k value for each test sample based on W;
4  switch task do
5      case 1
6          Obtain the class labels via the majority rule;
7      end
8      case 2
9          Obtain the predicted value via Eq.(16);
10     end
11     case 3
12         Obtain the imputation value via Eq.(16);
13     end
14 endsw

Therefore, our model can be easily applied to data mining tasks, such as regression, missing value imputation and classification. We describe these computations with Algorithm 2.

In Algorithm 2, the input data is first normalized. Then, the dataset is divided into a set of training samples and a set of test samples for 10-fold cross validation. Thirdly, the correlation coefficient matrix W between training samples and test samples is computed with Eq.(7), and the optimal solution W is obtained with the proposed optimization process. Consequently, we generate the most correlative training samples of a test sample, i.e., the top k candidates of nearest neighbors (training samples) of the test sample. Finally, we use the k nearest training samples to predict the test sample. This means that different test samples are predicted with different numbers of nearest neighbors. Note that the regression and missing value imputation tasks are carried out with Eq.(16), and data classification is carried out with the majority rule.

4. Experimental analysis

The proposed S-kNN method was evaluated on data mining tasks, such as classification, regression and missing value imputation, by comparison with state-of-the-art kNN algorithms. Note that the classification task includes binary classification and multi-class classification.

Table 1. Benchmark datasets
Dataset  Instances  Features  Type  Classes
Adult  1605  113  classification  2
Arcene  100  9920  classification  2
Australian  690  14  classification  2
Cleveland  214  13  classification  2
Derm  358  34  classification  2
Heart  270  13  classification  2
Ionosphere  350  34  classification  2
Sonar  208  60  classification  2
Satimage  620  36  classification  6
Seeds  210  7  classification  3
Bodyfat  252  14  regression  No
Concreteslump  103  10  regression  No
Mpg  398  8  regression  No
Triazines  186  60  regression  No
Wine-white  4898  11  regression  No
Abalone  4177  8  imputation  No
Eunite2001  336  16  imputation  No
Housing  506  13  imputation  No
Pyrim  74  28  imputation  No
YachtHydrodynamics (Yacht)  74  7  imputation  No

4.1. Experimental setting

In our experiments, the standard kNN algorithm was regarded as the first comparison algorithm, where the k value is set to 5. The second compared algorithm is Eq.(7) with the setting ρ1 = 0, i.e., using LASSO to learn different k values for the test samples. We call this algorithm L-kNN, with which we would like to show the importance of preserving the local structures of data (Kang and Cho, 2008; Tibshirani, 1996).

There are 20 datasets involved to validate the proposed algorithm, which were downloaded from UCI (Bache and Lichman, 2013), LIBSVM (Chang and Lin, 2011) and the literature³. These datasets are detailed in Table 1. We conduct experiments on 10 datasets for classification, 5 datasets for regression, and 5 datasets for missing value imputation, respectively. We coded all algorithms with MATLAB 7.1 on the Windows 7 system. We conducted experiments with the 10-fold cross-validation method, and repeated the whole process 10 times to avoid possible bias.

³ http://www.cc.gatech.edu/~lsong/code.html

The classification accuracy is employed to measure the classification performance, which is defined as follows:

$Accuracy = \frac{n_{correct}}{n}$   (17)

where n is the number of all samples, and $n_{correct}$ is the number of correctly classified samples. The higher the accuracy of an algorithm is, the better its classification performance is.

The Root Mean Square Error (RMSE) (Zhu et al., 2013b) and the correlation coefficient are employed to evaluate the performance of both regression analysis and missing value imputation. Note that there are no missing values in the original datasets; we randomly selected some independent values to be missing, following the literature on missing value imputation.

The RMSE is defined as the square root of the mean squared difference between the predicted values and the ground truth. The formal formula is as follows:

$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$   (18)

where $y_i$ indicates the ground truth and $\hat{y}_i$ indicates the predicted value. Obviously, the smaller the RMSE is, the better the prediction performance is.

The correlation coefficient indicates the correlation between predictions and observations. The correlation coefficient is between +1 and -1, where 1 is perfect positive correlation, 0 is no correlation, and -1 is totally negative correlation. Generally, the larger the correlation coefficient is, the more accurate the prediction is.
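The three evaluation measures just defined can be computed directly from their formulas. The sketch below is our own illustration (we assume the Pearson form of the correlation coefficient, which the paper does not spell out):

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Eq. (17): fraction of correctly classified samples."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def rmse(y_true, y_pred):
    """Eq. (18): root mean square error between ground truth and predictions."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def correlation_coefficient(y_true, y_pred):
    """Correlation between predictions and observations: +1 perfect, 0 none, -1 negative."""
    return float(np.corrcoef(np.asarray(y_true, dtype=float),
                             np.asarray(y_pred, dtype=float))[0, 1])
```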
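The evaluation protocol of Section 4.1 (10-fold cross-validation repeated 10 times) can be sketched as an index generator. The experiments in the paper were coded in MATLAB, so the following NumPy version only illustrates the protocol and is not the original code:

```python
import numpy as np

def repeated_kfold_indices(n_samples, n_folds=10, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for n_repeats runs of n_folds-fold cross-validation."""
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        perm = rng.permutation(n_samples)        # reshuffle before each repetition
        folds = np.array_split(perm, n_folds)    # split into (nearly) equal folds
        for i in range(n_folds):
            test_idx = folds[i]
            train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
            yield train_idx, test_idx
```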

4.2. Experimental Results

In this section, we evaluate the performance of the proposed S-kNN algorithm by comparing it with two algorithms on real datasets, in terms of three data mining tasks: classification, regression and missing value imputation. We evaluate regression and missing value imputation in the same subsection because they have the same prediction model.

4.2.1. Data classification

We summarize the classification accuracies of all algorithms in Table 2. We list the per-repetition average results of the 10-fold cross-validation process, repeated 10 times, for all algorithms in Figures 3-12.

As shown in Table 2, we found that the proposed S-kNN algorithm outperformed the comparison algorithms, L-kNN and standard kNN. Specifically, in terms of classification accuracy, the proposed S-kNN algorithm on average improves accuracy by 4.47% and 22.38% over the L-kNN algorithm and the standard kNN method, respectively. In addition, Figures 3-12 show that the S-kNN algorithm had the highest accuracy in each repetition. From the results on Satimage (a multi-class dataset), the performance of the standard kNN algorithm is poor, whereas the proposed algorithm performs steadily in multi-class classification applications.

As demonstrated above, the S-kNN approach performs better than the L-kNN algorithm. This is because we utilize the LPP regularization term in the S-kNN method to preserve the local structure of data. In particular, the S-kNN method has on average improved accuracy by 5.47% over L-kNN, and on the Heart dataset the S-kNN algorithm improved accuracy by 11.48% over the L-kNN method.

Both S-kNN and L-kNN outperformed the kNN algorithm. The reason is that both methods use different k values in the kNN algorithm, which leads to better classification performance than the standard kNN method with a fixed k value for all test samples.

Fig. 3. Adult. Fig. 4. Arcene. Fig. 5. Australian. Fig. 6. Cleveland. Fig. 7. Derm. Fig. 8. Ionosphere. Fig. 9. Heart. Fig. 10. Sonar. Fig. 11. Satimage. Fig. 12. Seeds. (Each figure plots classification accuracy against iterations for kNN, L-kNN and S-kNN.)

4.2.2. Regression results & Imputation results

We summarize the RMSE results of regression and missing value imputation in Tables 3 and 4 respectively, and the correlation coefficients in Tables 5 and 6, respectively. We also depict the RMSE results of the 10-fold cross-validation process repeated 10 times for all algorithms in Figures 13-22, and the correlation coefficient results of all algorithms in each iteration in Figures 23-32.

From Tables 3 and 4, in terms of RMSE, we can find that the proposed S-kNN achieved the best performance for regression and missing value imputation, followed by L-kNN and kNN.
Figures 13-22 also show that the S-kNN algorithm had higher prediction performance than the two compared algorithms in each repetition. For the correlation coefficient, Figures 23-32 demonstrate results similar to those in Figures 13-22.

As we have seen, in the evaluation of RMSE, the proposed S-kNN has on average reduced the error by 0.0734 and 0.0269 compared with the L-kNN and standard kNN, respectively. In particular, the S-kNN algorithm made the most improvement on the Mpg dataset, i.e., a reduction of 0.1216 and 0.3387 relative to the L-kNN and standard kNN. There is a similar performance in the RMSE evaluation of S-kNN missing value imputation.

In terms of the correlation coefficient on five datasets, the proposed S-kNN on average increases by 3.66% more than L-kNN, and by 10.42% more than standard kNN. Moreover, the proposed method achieved the maximal increment on the Triazines dataset, i.e., 8.8% more than the L-kNN and 22.85% more than standard kNN.

Table 2. Comparison of classification accuracy (mean±std)
Dataset  kNN  L-kNN  S-kNN
Adult  0.6700±0.0004  0.7828±0.0012  0.7994±0.0016
Arcene  0.5100±0.0077  0.7100±0.0143  0.8000±0.0156
Australian  0.5319±0.0036  0.6493±0.0013  0.6826±0.0021
Cleveland  0.7048±0.0029  0.7619±0.0030  0.8048±0.0038
Derm  0.6057±0.0061  0.9343±0.0007  0.9714±0.0009
Ionosphere  0.6743±0.0031  0.8286±0.0025  0.8571±0.0025
Heart  0.5381±0.0031  0.6741±0.0033  0.7889±0.0043
Sonar  0.6450±0.0047  0.7200±0.0051  0.7850±0.0034
Satimage  0.2871±0.0029  0.6758±0.0026  0.7424±0.0019
Seeds  0.8048±0.0033  0.8714±0.0015  0.9238±0.0011
MEAN  0.5972±0.0200  0.7608±0.0088  0.8155±0.0070

Table 3. Regression performances in terms of RMSE (mean±std)
Dataset  kNN  L-kNN  S-kNN
Bodyfat  2.1e-05±3.8e-09  1.4e-05±3.3e-09  1.3e-05±3.9e-09
Concreteslump  0.0176±4.1e-04  0.0151±3.2e-04  0.0139±3.1e-04
Mpg  3.8080±0.4417  3.5909±0.3662  3.4693±0.3158
Triazines  0.1477±0.0017  0.1357±0.0016  0.1242±0.0013
Wine-white  0.0071±1.6e-06  0.0064±1.0e-06  0.0061±9.3e-07

Table 4. Imputation performances in terms of RMSE (mean±std)
Dataset  kNN  L-kNN  S-kNN
Abalone  2.3894±0.1019  2.1261±0.1360  2.0850±0.1207
Eunite2001  31.2345±12.691  28.8972±12.253  26.0969±17.403
Housing  5.2228±1.2315  3.8666±0.2108  3.7948±0.2175
Pyrim  0.0673±0.0002  0.0492±0.0003  0.0484±0.0003
Yacht  10.5379±3.0492  10.1437±2.6463  9.4637±3.2310

Fig. 13. Bodyfat. Fig. 14. Concreteslump. Fig. 15. Mpg. Fig. 16. Triazines. Fig. 17. Wine-white. Fig. 18. Abalone. Fig. 19. Eunite2001. Fig. 20. Housing. Fig. 21. Pyrim. Fig. 22. Yacht. (Each figure plots RMSE against iterations for kNN, L-kNN and S-kNN.)

Like the above experiments evaluating S-kNN classification, the evaluations of S-kNN regression and missing value imputation also demonstrate two improvements as follows.

• The proposed S-kNN outperformed L-kNN due to the fact that the preservation of the local structures of data is well considered in the S-kNN algorithm.

• Both S-kNN and L-kNN outperformed standard kNN because they learn different optimal k values for different test samples.

In a word, according to the results on the three learning tasks, we can draw the following conclusions. First, it might be reasonable to use varied k values in the kNN algorithm in real applications. Second, the optimal k values should be learnt from the data, i.e., a data-driven k value in the kNN algorithm.

Table 5. Regression performances in terms of correlation coefficient (mean±std)
Dataset  kNN  L-kNN  S-kNN
Bodyfat  0.9846±8.1e-05  0.9918±4.4e-05  0.9930±4.3e-05
Concreteslump  0.6606±0.0312  0.7719±0.0249  0.8194±0.0195
Mpg  0.8865±0.0019  0.8978±0.0014  0.9076±0.0011
Triazines  0.4256±0.0148  0.5661±0.0123  0.6541±0.0106
Wine-white  0.9041±4.9e-04  0.9260±2.7e-04  0.9307±1.9e-04

Table 6. Imputation performances in terms of correlation coefficient (mean±std)
Dataset  kNN  L-kNN  S-kNN
Abalone  0.6499±0.0008  0.7282±0.0024  0.7376±0.0021
Eunite2001  0.8239±0.0048  0.8493±0.0036  0.8741±0.0013
Housing  0.8285±0.0004  0.9142±0.0004  0.9197±0.0003
Pyrim  0.8283±0.0079  0.9140±0.0025  0.9201±0.0022
Yacht  0.7200±0.0044  0.7948±0.0039  0.8162±0.0034

Fig. 23. Bodyfat. Fig. 24. Concreteslump. Fig. 25. Mpg. Fig. 26. Triazines. Fig. 27. Wine-white. Fig. 28. Abalone. Fig. 29. Eunite2001. Fig. 30. Housing. Fig. 31. Pyrim. Fig. 32. Yacht. (Each figure plots the correlation coefficient against iterations for kNN, L-kNN and S-kNN.)

5. Conclusion

In this work, we have proposed a novel kNN algorithm, called the S-kNN approach, by replacing the fixed k value for all test samples with different k values learnt for different test samples according to the distribution of the data. It is an example-driven k-parameter computation. The key is the reconstruction of the correlation between test samples and training samples, which yields a sparse coefficient matrix. With this sparse correlation, we can obtain the optimal k value for a test sample. For efficiency, the LPP regularization is adopted to keep the local structures of the data. The experiments on 20 real datasets have demonstrated that the proposed S-kNN method is efficient and promising, compared with the state-of-the-art kNN methods.

6. Acknowledgements

This work was supported in part by the China Key Research Program (Grant No: 2016YFB1000905), the China 973 Program (Grant No: 2013CB329404), the China 1000-Plan National Distinguished Professorship, the National Natural Science Foundation of China (Grants No: 61573270, 61672177, and 61363009), National Association of public funds, the Guangxi Natural Science Foundation (Grant No: 2015GXNSFCB139011), the Guangxi High Institutions Program of Introducing 100 High-Level Overseas Talents, the Guangxi Collaborative Innovation Center of Multi-Source Information Integration and Intelligent Processing, the Research Fund of Guangxi Key Lab of MIMS (16-A-01-01 and 16-A-01-02), and the Guangxi Bagui Teams for Innovation and Research.

References

Arefi, M., Taheri, S.M., 2015. Least-squares regression based on Atanassov's intuitionistic fuzzy inputs–outputs and Atanassov's intuitionistic fuzzy parameters. IEEE Transactions on Fuzzy Systems 23, 1142–1154.
Bache, K., Lichman, M., 2013. UCI machine learning repository. URL: http://archive.ics.uci.edu/ml.
Belkin, M., Niyogi, P., 2001. Laplacian eigenmaps and spectral techniques for embedding and clustering, in: International Conference on Neural Information Processing Systems: Natural and Synthetic, pp. 585–591.
Burba, F., Ferraty, F., Vieu, P., 2009. k-nearest neighbour method in functional nonparametric regression. Journal of Nonparametric Statistics 21, 453–469.
Chang, C.C., Lin, C.J., 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27:1–27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Chen, J., Shao, J., 2001. Jackknife variance estimation for nearest-neighbor imputation. Journal of the American Statistical Association 96, 260–269.
Cheng, D., Zhang, S., Deng, Z., Zhu, Y., Zong, M., 2014. kNN algorithm with data-driven k value, in: ADMA, pp. 499–512.
Cover, T., Hart, P., 1967. Nearest neighbor pattern classification. IEEE Transactions on Information Theory 13, 21–27.
Devroye, L., et al., 1981. On the almost everywhere convergence of nonparametric regression function estimates. The Annals of Statistics 9, 1310–1319.
Ferraty, F., Vieu, P., 2006. Nonparametric functional data analysis: theory and practice. Springer Series in Statistics, Springer.
Ghosh, A.K., 2006. On optimum choice of k in nearest neighbor classification. Computational Statistics & Data Analysis 50, 3113–3123.
Goldberger, J., Roweis, S.T., Hinton, G.E., Salakhutdinov, R., 2004. Neighbourhood components analysis, in: NIPS, pp. 513–520.
Goyal, R., Chandra, P., Singh, Y., 2014. Suitability of kNN regression in the development of interaction based software fault prediction models. IERI Procedia 6, 15–21.
He, X., Niyogi, P., 2003. Locality preserving projections, in: NIPS, pp. 153–160.
Hoef, J.M.V., Hailemariam, T., 2013. A comparison of the spatial linear model to nearest neighbor (k-NN) methods for forestry applications. PLoS One 8, e59129.
Hron, K., Templ, M., Filzmoser, P., 2010. Imputation of missing values for compositional data using classical and robust methods. Computational Statistics & Data Analysis 54, 3095–3107.
Hu, C., Jain, G., Zhang, P., Schmidt, C., Gomadam, P., Gorka, T., 2014. Data-driven method based on particle swarm optimization and k-nearest neighbor regression for estimating capacity of lithium-ion battery. Applied Energy 129, 49–55.
Hu, R., Zhu, X., Cheng, D., He, W., Yan, Y., Song, J., Zhang, S., 2017. Graph self-representation method for unsupervised feature selection. Neurocomputing 220, 130–137.
Jamshidi, Y., Kaburlasos, V.G., 2014. gsaINknn: A GSA optimized, lattice computing kNN classifier. Engineering Applications of Artificial Intelligence 35, 277–285.
Kang, P., Cho, S., 2008. Locally linear reconstruction for instance-based learning. Pattern Recognition 41, 3507–3518.
Lall, U., Sharma, A., 1996. A nearest neighbor bootstrap for resampling hydrologic time series. Water Resources Research 32, 679–693.
Liu, H., Zhang, S., Zhao, J., Zhao, X., Mo, Y., 2010. A new classification algorithm using mutual nearest neighbors, in: GCC, pp. 52–57.
Mack, Y.P., 1981. Local properties of k-NN regression estimates. SIAM Journal on Algebraic Discrete Methods 2, 311–323.
Meesad, P., Hengpraprohm, K., 2008. Combination of kNN-based feature selection and kNN-based missing-value imputation of microarray data, in: ICICIC, pp. 341–341.
Ni, J., Qiao, F., Li, L., Di Wu, Q., 2012. A memetic PSO based kNN regression method for cycle time prediction in a wafer fab, in: WCICA, pp. 474–478.
Qin, Y., Zhang, S., et al., 2007. Semi-parametric optimization for missing data imputation. Applied Intelligence 27, 79–88.
Qin, Z., Wang, A.T., Zhang, C., Zhang, S., 2013. Cost-sensitive classification with k-nearest neighbors, in: International Conference on Knowledge Science, Engineering and Management, Springer, pp. 112–131.
Saini, I., Singh, D., Khosla, A., 2013. QRS detection using k-nearest neighbor algorithm (kNN) and evaluation on standard ECG databases. Journal of Advanced Research 4, 331–344.
Tibshirani, R., 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 267–288.
Varmuza, K., Filzmoser, P., Hilchenbach, M., Krüger, H., Silén, J., 2014. kNN classification evaluated by repeated double cross validation: Recognition of minerals relevant for comet dust. Chemometrics & Intelligent Laboratory Systems 138, 64–71.
Weinberger, K.Q., Saul, L.K., 2009. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research 10, 207–244.
Wu, X., Kumar, V., et al., 2008. Top 10 algorithms in data mining. Knowledge and Information Systems 14, 1–37.
Yager, R.R., Petry, F.E., 2014. Hypermatching: Similarity matching with extreme values. IEEE Transactions on Fuzzy Systems 22, 949–957.
Zhang, C., Zhu, X., et al., 2007. GBKII: an imputation method for missing values, in: PAKDD, pp. 1080–1087.
Zhang, S., 2011. Shell-neighbor method and its application in missing data imputation. Applied Intelligence 35, 123–133.
Zhang, S., Jin, Z., Zhu, X., 2011. Missing data imputation by utilizing information within incomplete instances. Journal of Systems and Software 84, 452–459.
Zhang, S., Li, X., Zong, M., Zhu, X., Cheng, D., 2017a. Learning k for kNN classification. ACM Transactions on Intelligent Systems and Technology 8, 43.
Zhang, S., Li, X., Zong, M., Zhu, X., Wang, R., 2017b. Efficient kNN classification with different numbers of nearest neighbors. IEEE Transactions on Neural Networks and Learning Systems, 10.1109/TNNLS.2017.2673241.
Zhang, S., Qin, Y., Zhu, X., Zhang, J., Zhang, C., 2006. Optimized parameters for missing data imputation, in: PRICAI 2006: Trends in Artificial Intelligence. Springer, pp. 1010–1016.
Zhang, S., Wu, X., Zhu, M., 2010. Efficient missing data imputation for supervised learning, in: ICCI, pp. 672–679.
Zhou, Z.H., Li, M., 2005. Semi-supervised regression with co-training, in: IJCAI, pp. 908–916.
Zhu, X., Huang, Z., Shen, H.T., Zhao, X., 2013a. Linear cross-modal hashing for efficient multimedia search, in: ACM Multimedia, pp. 143–152.
Zhu, X., Huang, Z., et al., 2013b. Video-to-shot tag propagation by graph sparse group lasso. IEEE Transactions on Multimedia 15, 633–646.
Zhu, X., Li, X., Zhang, S., 2016a. Block-row sparse multiview multilabel learning for image classification. IEEE Transactions on Cybernetics 46, 450–461.
Zhu, X., Li, X., Zhang, S., Ju, C., Wu, X., 2017a. Robust joint graph sparse coding for unsupervised spectral feature selection. IEEE Transactions on Neural Networks and Learning Systems 28, 1263–1275.
Zhu, X., Li, X., Zhang, S., Xu, Z., Yu, L., Wang, C., 2017b. Graph PCA hashing for similarity search. IEEE Transactions on Multimedia.
Zhu, X., Suk, H., Wang, L., Lee, S., Shen, D., 2017c. A novel relational regularization feature selection method for joint regression and classification in AD diagnosis. Medical Image Analysis 38, 205–214.
Zhu, X., Suk, H.I., Huang, H., Shen, D., 2017d. Low-rank graph-regularized structured sparse regression for identifying genetic biomarkers. IEEE Transactions on Big Data, 10.1109/TBDATA.2017.2735991.
Zhu, X., Suk, H.I., Lee, S.W., Shen, D., 2016b. Subspace regularized sparse multitask learning for multiclass neurodegenerative disease identification. IEEE Transactions on Biomedical Engineering 63, 607–618.
Zhu, X., Zhang, L., Huang, Z., 2014. A sparse embedding and least variance encoding approach to hashing. IEEE Transactions on Image Processing 23, 3737–3750.
