

A Bayesian Approach with Type-2 Student-t Membership Function for T-S Model Identification

Vikas Singh, Homanga Bharadhwaj and Nishchal K Verma

arXiv:2009.00822v1 [cs.AI] 2 Sep 2020

Abstract—Clustering techniques have proved highly successful for Takagi-Sugeno (T-S) fuzzy model identification. In particular, fuzzy c-regression clustering based on type-2 fuzzy sets has shown remarkable results on non-sparse data, but its performance degrades on sparse data. In this paper, an innovative architecture for the fuzzy c-regression model is presented, and a novel Student-t distribution based membership function is designed for sparse data modelling. To avoid overfitting, we adopt a Bayesian approach that places a Gaussian prior on the regression coefficients. An additional novelty of our approach lies in type reduction, where the final output is computed using the Karnik-Mendel algorithm and the consequent parameters of the model are optimized by stochastic gradient descent. Detailed experimentation shows that the proposed approach outperforms various state-of-the-art methods on standard datasets.

Index Terms—TSK Model, Fuzzy c-Regression, Student-t distribution

I. INTRODUCTION

There has been extensive research on modeling non-linear systems through their input-output mapping. In particular, fuzzy logic based approaches have been very successful in modeling non-linear dynamics in the presence of uncertainties [1]. Type-1 fuzzy logic enables system identification and modeling by virtue of numerous linguistic rules. Although this approach performs well, the limitation of crisp membership values restricts its potential to handle uncertainty in the data. Therefore, in order to successfully model data uncertainty, type-2 fuzzy logic was proposed, in which the membership values of the data points are themselves fuzzy. Type-2 fuzzy logic has been remarkably successful owing to its robustness in the presence of imprecise and noisy data [2], [3].

The basic steps of a fuzzy inference system are structure and parameter identification of the model. Structure identification concerns selecting the number of rules, the input features and the partition of the input-output space, while parameter identification computes the antecedent and consequent parameters of the model. In the literature, fuzzy clustering has been widely used for fuzzy-space partitioning: since a T-S fuzzy model is composed of several locally weighted linear regression models, many of them hyperplane-based, hyperplane-based clustering has proved very effective for structure identification, and hyperplane-shaped fuzzy c-regression clustering in particular has become popular [4], [5]. The architecture of these algorithms is robust in partitioning the data space, inferring output estimates from the inputs and determining an optimum fit for the regression model. Previously proposed techniques such as the fuzzy c-regression model (FCRM) and fuzzy c-means (FCM) have been extended to the type-2 fuzzy logic framework, where upper and lower membership values are determined by simultaneously optimizing two objective functions. Interval type-2 (IT2) FCRM, recently presented for the T-S regression framework, has shown significantly better performance than type-1 fuzzy logic in terms of error minimization and robustness [4]–[6].

In this paper, we combine Gaussian and Student-t density type membership functions in an IT2 FCRM framework. The result is a hyperplane-shaped membership function with two differently weighted terms. The Student-t density part is weighted more when the data being modeled is sparse, since the Student-t distribution is a popular prior for sparse data modelling in Bayesian inference. The stochastic gradient descent (SGD) technique is used to optimize the consequent parameters, and the Karnik-Mendel (KM) algorithm is applied for type reduction of the estimated output. We use L2 regularization of the regression coefficients in the IT2 fuzzy c-means clustering for identification of the antecedent parameters. As demonstrated in the results section, the regularization guards the model against overfitting the training data and improves generalization on unseen data. In addition, an innovative scheme for optimizing the consequent parameters is presented, wherein we do not perform type reduction of the type-1 weight sets before output estimation. Instead, we use the KM algorithm on the output to infer an optimal interval type-1 fuzzy set, and the set boundaries are optimized by SGD.

The rest of the paper is organized as follows: In Section II, we discuss the TSK fuzzy model, IT2-FCM and IT2-FCR. In Section III, we describe the proposed approach. In Section IV, we present the efficacy of the proposed approach through experimentation. Finally, Section V concludes the paper.

Vikas Singh, Homanga Bharadhwaj, and Nishchal K Verma are with the Department of Electrical Engineering, IIT Kanpur, India (e-mail: [email protected], [email protected], [email protected]).

II. PRELIMINARIES

A. TSK Fuzzy Model

The TSK fuzzy model provides a rule-based structure for modeling a complex non-linear system. Let G(x, y) be the system to be identified, where x ∈ R^n is the input vector and y ∈ R is the output. Then the ith rule is written as

Rule i : IF x_1 is A_{i1} and · · · and x_m is A_{im} THEN
    y_i = \theta_0^i + \theta_1^i x_1 + · · · + \theta_m^i x_m    (1)
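For concreteness, the first-order TSK inference that rules of the form (1) feed into — per-rule linear consequents combined by a firing-strength weighted average — can be sketched as below. The Gaussian memberships, centers and widths here are illustrative placeholders only; the paper's actual antecedents are derived from hyperplane distances in Section III.

```python
import numpy as np

def gaussian_mf(x, center, sigma):
    """Type-1 Gaussian membership value, evaluated elementwise."""
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

def tsk_predict(x, centers, sigmas, theta):
    """Evaluate a first-order TSK model.

    x       : (m,) input vector
    centers : (c, m) membership centers (hypothetical antecedents)
    sigmas  : (c, m) membership widths (hypothetical)
    theta   : (c, m+1) consequent coefficients [theta_0, theta_1, ..., theta_m]
    """
    c = centers.shape[0]
    # Rule firing strength w_i = prod_j mu_Aij(x_j)
    w = np.array([np.prod(gaussian_mf(x, centers[i], sigmas[i])) for i in range(c)])
    # Per-rule linear consequent y_i = theta_0 + theta_1 x_1 + ... + theta_m x_m
    y_rule = theta[:, 0] + theta[:, 1:] @ x
    # Defuzzified output: normalized weighted average of rule consequents
    return float(np.sum(w * y_rule) / np.sum(w))
```

With identical consequents across rules, the weighted average returns that consequent's value regardless of the firing strengths, which makes for a quick sanity check.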
where i = 1, · · · , c indexes the fuzzy rules and y_i is the output of the ith rule. Using these rules, the final model output is inferred as

y = \frac{\sum_{i=1}^{c} w^i y_i}{\sum_{i=1}^{c} w^i}, \quad w^i = \prod_{j=1}^{m} \mu_{A_{ij}}(x_j)    (2)

where w^i denotes the overall firing strength of the ith rule.

B. Interval Type-2 FCM (IT2-FCM)

In interval type-2 FCM, two objective functions that differ in their degree of fuzziness are optimized simultaneously using Lagrange multipliers to obtain the upper and lower membership functions [3]. Let m_1 and m_2 be the two degrees of fuzziness; the two objective functions are

Q_{m_1}(U, v) = \sum_{k=1}^{N} \sum_{i=1}^{c} \mu_i(x_k)^{m_1} E_{ik}(\zeta_i)^2
Q_{m_2}(U, v) = \sum_{k=1}^{N} \sum_{i=1}^{c} \mu_i(x_k)^{m_2} E_{ik}(\zeta_i)^2    (3)

C. Interval Type-2 Fuzzy c-Regression Algorithm (IT2-FCR)

The main goal of the interval type-2 fuzzy c-regression algorithm is to partition a set of n data points (x_k, y_k), k = 1, · · · , n, into c clusters. The data points in every cluster i can be described by a regression model

\hat{y}_k = g_i(x_k, \zeta_i) = b_{i1} x_{k1} + · · · + b_{im} x_{km} + b_{i0} = [x_k \; 1]\, \zeta_i^T    (4)

where x_k = [x_{k1}, · · · , x_{km}] is the kth input vector, j = 1, · · · , m indexes the features, i = 1, · · · , c indexes the clusters and \zeta_i = [b_{i1}, · · · , b_{im}, b_{i0}] is the coefficient vector of the ith cluster. In [4], the coefficient vectors are optimized by the weighted least squares method, whereas in our approach we use SGD. The primary motivation for using SGD is to keep the algorithm robust even in cases where [x^T P_i x] becomes singular [6].

III. PROPOSED METHODOLOGY

In this paper we present a new framework for FCRM with an innovative Student-t distribution based membership function (MF) for sparse data modelling [6]–[8]. The approach is described in the following subsections.

A. Fuzzy-Space Partitioning

First, we formulate the task of fuzzy-space partitioning as a maximum a posteriori (MAP) estimate over a squared error function [8]. Exploiting Bayes' rule, the MAP estimator is defined as

\phi(y) = \arg\max_{x \in R^n} p(x|y) = \arg\max_{x \in R^n} p(y|x)\, p(x)    (5)

where p(x|y) is the posterior, p(y|x) the likelihood and p(x) the prior distribution. Using the above equation, the MAP estimator is expressed in terms of the regression problem as

E_{ik}(\zeta_i) = (y_k - g_i(x_k, \zeta_i))^2 + \lambda \sum_{p=0}^{m} (b_{ip})^2    (6)

where E_{ik}(\zeta_i) is the MAP objective, (y_k - g_i(x_k, \zeta_i))^2 corresponds to the likelihood, \sum_{p=0}^{m} (b_{ip})^2 is the prior term — a regularizer, equivalent to the Bayesian notion of placing a prior on the regression weights b_i of each cluster i — and \lambda is the regularization control parameter. The regularizer reduces overfitting in the cluster assignment by constraining the regression weights to be small.

In the proposed approach, we first fix the two degrees of fuzziness m_1 and m_2, the number of clusters c and a termination threshold \varepsilon. We also initialize the parameters \overline{\zeta}_i and \underline{\zeta}_i, the upper and lower regression coefficient vectors of the ith cluster. Equation (6) is then written in terms of upper and lower error functions as

\overline{E}_{ik}(\overline{\zeta}_i) = (y_k - g_i(x_k, \overline{\zeta}_i))^2 + \lambda \sum_{p=0}^{m} (\overline{b}_{ip})^2
\underline{E}_{ik}(\underline{\zeta}_i) = (y_k - g_i(x_k, \underline{\zeta}_i))^2 + \lambda \sum_{p=0}^{m} (\underline{b}_{ip})^2    (7)

To reduce the complexity of the system, a weighted-average type reduction is applied to obtain E_{ik}(\zeta_i) as

E_{ik}(\zeta_i) = \frac{\overline{E}_{ik}(\overline{\zeta}_i) + \underline{E}_{ik}(\underline{\zeta}_i)}{2}    (8)

Through a MAP estimate on the posterior of the defuzzified error function E_{ik}(\zeta_i), the upper and lower membership values of every data point in each cluster are obtained similarly to [3]:

\overline{u}_{ik} =
\begin{cases}
\left[\sum_{r=1}^{c} \left(\frac{E_{ik}(\zeta_i)}{E_{rk}(\zeta_r)}\right)^{\frac{2}{m_1-1}}\right]^{-1}, & \text{if } \left[\sum_{r=1}^{c} \frac{E_{ik}(\zeta_i)}{E_{rk}(\zeta_r)}\right]^{-1} < \frac{1}{c} \\[1ex]
\left[\sum_{r=1}^{c} \left(\frac{E_{ik}(\zeta_i)}{E_{rk}(\zeta_r)}\right)^{\frac{2}{m_2-1}}\right]^{-1}, & \text{otherwise}
\end{cases}    (9)

\underline{u}_{ik} =
\begin{cases}
\left[\sum_{r=1}^{c} \left(\frac{E_{ik}(\zeta_i)}{E_{rk}(\zeta_r)}\right)^{\frac{2}{m_1-1}}\right]^{-1}, & \text{if } \left[\sum_{r=1}^{c} \frac{E_{ik}(\zeta_i)}{E_{rk}(\zeta_r)}\right]^{-1} \ge \frac{1}{c} \\[1ex]
\left[\sum_{r=1}^{c} \left(\frac{E_{ik}(\zeta_i)}{E_{rk}(\zeta_r)}\right)^{\frac{2}{m_2-1}}\right]^{-1}, & \text{otherwise}
\end{cases}

The above can be interpreted as the solution of the MAP problem formulated through (8). To estimate the parameters \overline{\zeta}_i and \underline{\zeta}_i, we formulate a locally weighted linear regression with the objective function

J(\zeta_i) = \frac{1}{2} \sum_{k=1}^{n} u_{ik}\,([x_k \; 1]\,\zeta_i^T - y_k)^2    (10)

Here u_{ik} denotes the membership value of the kth data point in the ith cluster. The parameters \overline{\zeta}_i and \underline{\zeta}_i are estimated by SGD on this objective, using the appropriate \overline{u}_{ik} and \underline{u}_{ik}. The regression coefficients \zeta_i are then obtained by type reduction as

\zeta_i = \frac{\overline{\zeta}_i + \underline{\zeta}_i}{2}    (11)

The steps in this subsection are run, and the parameters updated, until ||\zeta_i^{current} - \zeta_i^{previous}|| < \varepsilon, yielding the optimal regression coefficients, as summarized in Algorithm 1.
B. Identification of Antecedent Parameters

The MF developed in [5] is hyperplane-shaped and cannot adequately capture the relevant information about the data distribution within each cluster. To overcome this issue, we propose a modified Gaussian MF combined with a Student-t density function. The Student-t distribution is widely used as a prior in Bayesian inference for sparse data modelling [7]. Here we weight the Gaussian and Student-t parts by a hyper-parameter \alpha. If the data being modeled is very sparse, \alpha should be set very low so as to give more weight to the Student-t density membership value.

\overline{\mu}_{A_i}(x_k) = \alpha \exp\left(-\eta\, \frac{(d_{ik}(\overline{\zeta}_i) - v_i(\overline{\zeta}_i))^2}{\sigma_i^2(\overline{\zeta}_i)}\right) + (1-\alpha)\left(1 + \frac{d_{ik}^2(\overline{\zeta}_i)}{r}\right)^{-\frac{r+1}{2}}    (12)

\underline{\mu}_{A_i}(x_k) = \alpha \exp\left(-\eta\, \frac{(d_{ik}(\underline{\zeta}_i) - v_i(\underline{\zeta}_i))^2}{\sigma_i^2(\underline{\zeta}_i)}\right) + (1-\alpha)\left(1 + \frac{d_{ik}^2(\underline{\zeta}_i)}{r}\right)^{-\frac{r+1}{2}}    (13)

In the above, d_{ik} is the distance of the kth input vector from the ith cluster hyperplane:

d_{ik}(\overline{\zeta}_i) = \frac{|x_k \cdot \overline{\zeta}_i|}{||\overline{\zeta}_i||}; \quad d_{ik}(\underline{\zeta}_i) = \frac{|x_k \cdot \underline{\zeta}_i|}{||\underline{\zeta}_i||}    (14)

where r = \max\{d_{ik}(\zeta_i), i = 1 · · · c\} is the maximum distance of the kth input vector over the clusters, and v_i and \sigma_i denote the average distance and the variance of the cluster's data points from its hyperplane, respectively:

v_i(\zeta_i) = \frac{1}{n}\sum_{k=1}^{n} d_{ik}(\zeta_i); \quad \sigma_i(\zeta_i) = \frac{1}{n}\sum_{k=1}^{n} (d_{ik}(\zeta_i) - v_i(\zeta_i))^2    (15)

The lower MF \underline{\mu}_{A_i}(x_k) and upper MF \overline{\mu}_{A_i}(x_k) serve as the weights of the TSK fuzzy model for the kth input belonging to the ith cluster.

Algorithm 1 The Proposed Approach
1: Begin
2: for i = 1 to c do
3:   Calculate the upper and lower regression vectors \overline{\zeta}_i and \underline{\zeta}_i using (10)
4:   Calculate the errors \overline{E}_{ik}(\overline{\zeta}_i), \underline{E}_{ik}(\underline{\zeta}_i) using (7)
5:   Calculate the upper and lower MFs \overline{u}_{ik}, \underline{u}_{ik} using (9)
6: end
7: The above identifies optimal \overline{\zeta}_i and \underline{\zeta}_i ∀ i ∈ [1, c]
8: for i = 1 to c do
9:   Compute the input MFs using (12) and (13)
10:  Compute the interval type-2 output y_k using (18)
11: end
12: End

C. Identification of Consequent Parameters

In most of the literature, defuzzification of the weights is computed before determining the model output \hat{y}_k. The problem with these approaches is that they do not consider the effect on the model output, which affects the overall performance of the model. To overcome this, we evaluate \overline{y}_k and \underline{y}_k corresponding to \overline{\mu}_{A_i}(x_k) and \underline{\mu}_{A_i}(x_k) using the KM algorithm [2]. The values \overline{y}_k and \underline{y}_k are optimized in parallel until convergence. A further advantage of this approach is that it is more robust to noise and provides a confidence interval for every output data point. The model outputs \overline{y}_k and \underline{y}_k corresponding to the weights \overline{\mu}_{A_i}(x_k) and \underline{\mu}_{A_i}(x_k) are calculated using (1) and (2) as follows:

\overline{y}_k = \frac{\sum_{i=1}^{p} \overline{\mu}_{A_i}(x_k)\,(\theta_0^i + \theta_1^i x_{k1} + · · · + \theta_M^i x_{kM}) + \sum_{i=p+1}^{c} \underline{\mu}_{A_i}(x_k)\,(\theta_0^i + \theta_1^i x_{k1} + · · · + \theta_M^i x_{kM})}{\sum_{i=1}^{p} \overline{\mu}_{A_i}(x_k) + \sum_{i=p+1}^{c} \underline{\mu}_{A_i}(x_k)}    (16)

\underline{y}_k = \frac{\sum_{i=1}^{q} \underline{\mu}_{A_i}(x_k)\,(\theta_0^i + \theta_1^i x_{k1} + · · · + \theta_M^i x_{kM}) + \sum_{i=q+1}^{c} \overline{\mu}_{A_i}(x_k)\,(\theta_0^i + \theta_1^i x_{k1} + · · · + \theta_M^i x_{kM})}{\sum_{i=1}^{q} \underline{\mu}_{A_i}(x_k) + \sum_{i=q+1}^{c} \overline{\mu}_{A_i}(x_k)}    (17)

where p and q are switching points computed by the KM algorithm. We run the above steps until \overline{y}_k and \underline{y}_k converge. Finally, the model output is determined by applying a type reduction as

y_k = \frac{\overline{y}_k + \underline{y}_k}{2}    (18)

IV. RESULTS & DISCUSSION

A. House Prices Dataset

The house prices dataset (https://www.kaggle.com/lespin/house-prices-dataset) is used to predict the sale price of a property. Through experimentation, we demonstrate the robustness of the proposed method on this sparse data. The dataset is divided into training (70%) and testing (30%) sets, and five-fold cross-validation is used during training. The hyper-parameters of the model are initially set as c = 3, m_1 = 1.6, m_2 = 4.7, \lambda = 0.3, \alpha = 0.15 and \eta = 3.7. It should be noted that \alpha = 0.15 is small because the dataset is sparse: the contribution of the Student-t part of the MF should be high, which is ensured by a small \alpha, i.e., a large 1 − \alpha, as defined in (12) and (13).
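To illustrate the role of \alpha, a minimal sketch of the blended membership value of (12)/(13) follows; the function and argument names are ours, and \eta defaults to the 3.7 used for this dataset.

```python
import numpy as np

def blended_membership(d, v, sigma2, r, alpha=0.15, eta=3.7):
    """Blended MF of (12)/(13): a Gaussian term on the centred hyperplane
    distance plus a Student-t density-shaped term.

    d      : distance of the input from the cluster hyperplane
    v      : mean distance of the cluster's points from the hyperplane
    sigma2 : variance of those distances
    r      : maximum-distance scale acting as degrees of freedom
    alpha  : mixing weight; a small alpha emphasises the Student-t part
    """
    gauss = np.exp(-eta * (d - v) ** 2 / sigma2)
    student_t = (1.0 + d ** 2 / r) ** (-(r + 1) / 2.0)
    return alpha * gauss + (1 - alpha) * student_t
```

At \alpha = 0.15 the heavy-tailed Student-t term dominates, so points far from a cluster hyperplane retain non-negligible membership — the behaviour the approach relies on for sparse data.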
The mean squared error (MSE) on the test data is 0.008, lower than the state-of-the-art methods, as shown in Table I. The absolute error, shown in Fig. 2, is also small compared to the absolute house prices shown in Fig. 1. We postulate that this is due to the Student-t MF used in our model, which helps in robustly quantifying the effects of sparse data. The higher test accuracy is also due to the greater generalization afforded by the L2 regularizer in our model. The coefficient of determination, the ratio of explained variance to total variance, is very high (0.85). This suggests that our model captures the variation in the data robustly and is not susceptible to faulty performance in the presence of outliers.

Table I: COMPARISON OF PERFORMANCE ON HOUSE PRICES DATASET

Model                          LR     RR     RBFNN   ITFRCM [9]   TIFNN [9]   RIT2FC [9]   Proposed
MSE                            0.06   0.06   0.049   0.019        0.045       0.035        0.008
Coefficient of Determination   0.68   0.69   0.67    0.73         0.77        0.79         0.85
Median Absolute Error          0.71   0.73   0.73    0.75         0.80        0.81         0.85

LR: Logistic Regression, RR: Ridge Regression, RBFNN: Radial Basis Function Neural Network, ITFRCM: Interval Type-2 Fuzzy c-Means, TIFNN: Type-1 Set-Based Fuzzy Neural Network, RIT2FC: Reinforced Interval Type-2 FCM-Based Fuzzy Classifier

Fig. 1: Performance comparison of model output with actual output
Fig. 2: Plot of test error for house prices dataset

B. Non-Linear Plant Modeling

The second-order non-linear difference equation given in (19) is used to draw a comparison with the other benchmark models listed in Table II.

z(k) = \frac{(z(k-1) + 2.5)\, z(k-1)\, z(k-2)}{1 + z^2(k-1) + z^2(k-2)} + v(k)    (19)

where v(k) = \sin(2k/25) is the input used for model validation, z(k) is the model output, and z(k−1), z(k−2) and v(k) are the model inputs.

The hyper-parameters are tuned by grid search and finally set as c = 4, m_1 = 1.5, m_2 = 7 and \eta = 3.14. The MSE of the model on 500 test data points is 7.2 × 10^{-5} using only four rules, which is much smaller than that of the other models. Through simulations, we show that the proposed model outperforms the other state-of-the-art models. Fig. 3 shows that our model output closely tracks the actual output at every time step. As observed in Fig. 4, the error fluctuates from point to point, but the absolute error is consistently less than 0.1, with no rapid surge at stationary points of the time series. This is a crucial requirement for a stable system. We therefore conclude that our algorithm yields a dynamically stable model.

Table II: PERFORMANCE ON NON-LINEAR TIME SERIES PROBLEM

State-of-the-Art     Rules   MSE
Li et al. [1]        4       1.49 × 10^{-2}
Fazel Zarandi [4]    4       5.4 × 10^{-3}
Li et al. [5]        4       1.02 × 10^{-2}
MIT2 FCRM [6]        4       1.02 × 10^{-4}
Proposed             4       7.2 × 10^{-5}

Fig. 3: Performance comparison of model output and actual output
Fig. 4: Plot of test error on time series data

C. A sinc Function in One Dimension

In this subsection a non-linear sinc function is used to demonstrate the effectiveness of the proposed model:

y = \frac{\sin(x)}{x}    (20)

where x ∈ [−40, 0) ∪ (0, 40]. We sampled 121 data points uniformly for this one-dimensional function. As in the previous case study, the number of rules is taken as four. The hyper-parameters are tuned through grid search and finally fixed as m_1 = 1.5, m_2 = 7 and \eta = 3.14. The MSE of the proposed model on the 121 test samples is 2.4 × 10^{-3}, lower than the 7.7 × 10^{-3} of the modified interval type-2 FCRM (MIT2-FCRM) [6]. Table III provides a detailed comparison with state-of-the-art methods.

Table III: PERFORMANCE ON sinc FUNCTION

State-of-the-Art     Rules   MSE
SCM [10]             2       4.47 × 10^{-2}
EUM [10]             2       4.50 × 10^{-2}
EFCM [10]            2       8.9 × 10^{-3}
Fazel Zarandi [4]    4       2.385 × 10^{-2}
MIT2 FCRM [6]        4       7.7 × 10^{-3}
Proposed             4       2.4 × 10^{-3}
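The sinc test set of this case study can be regenerated approximately as follows. The text states only that 121 points are sampled uniformly on [−40, 0) ∪ (0, 40]; the particular grid below, chosen so that x = 0 never appears, is our assumption.

```python
import numpy as np

def make_sinc_dataset(n=121):
    """Sample n points of y = sin(x)/x on [-40, 40], avoiding x = 0."""
    # n + 1 = 122 points gives step 80/121, so x = 0 is not on the grid
    x = np.linspace(-40.0, 40.0, n + 1)
    # drop the single point nearest the origin to get exactly n samples
    x = np.delete(x, np.argmin(np.abs(x)))
    y = np.sin(x) / x
    return x, y
```

Any grid that excludes the singular point x = 0 would serve equally well here.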
V. CONCLUSION

In this paper, we have illustrated the efficacy of the proposed Bayesian type-2 fuzzy regression approach using a Student-t distribution based MF. The proposed MF is useful for fuzzy c-means regression models, as demonstrated in Section IV. When the number of features is small compared to the number of samples, clustering of the input-output space proves very effective for identifying the rules of the fuzzy system. In addition, we have also demonstrated that, instead of direct defuzzification of the weights before computing the final output, a continuous defuzzification and optimization gives better results.

REFERENCES

[1] C. Li, et al., "T-S fuzzy model identification based on a novel fuzzy c-regression model clustering algorithm," Engineering Applications of Artificial Intelligence, vol. 22, no. 4-5, pp. 646–653, 2009.
[2] J. Mendel, "On KM algorithms for solving type-2 fuzzy set problems," IEEE Trans. on Fuzzy Syst., vol. 21, no. 3, pp. 426–446, 2013.
[3] C. Hwang and F. C. H. Rhee, "Uncertain fuzzy clustering: Interval type-2 fuzzy approach to c-means," IEEE Trans. on Fuzzy Syst., vol. 15, no. 1, pp. 107–120, 2007.
[4] M. H. F. Zarandi, R. Gamasaee, and I. B. Turksen, "A type-2 fuzzy c-regression clustering algorithm for Takagi-Sugeno system identification and its application in the steel industry," Information Sciences, vol. 187, pp. 179–203, 2012.
[5] C. Li, et al., "T-S fuzzy model identification with a gravitational search-based hyperplane clustering algorithm," IEEE Transactions on Fuzzy Systems, vol. 20, no. 2, pp. 305–317, 2012.
[6] W. Zou, C. Li, and N. Zhang, "A T-S fuzzy model identification approach based on a modified inter type-2 FRCM algorithm," IEEE Trans. on Fuzzy Syst., vol. 26, no. 3, pp. 1104–1113, 2017.
[7] V. E. Bening and V. Y. Korolev, "On an application of the Student distribution in the theory of probability and mathematical statistics," Theory of Probability & Its Apls., vol. 49, no. 3, pp. 377–391, 2005.
[8] R. Gribonval, "Should penalized least squares regression be interpreted as maximum a posteriori estimation?" IEEE Trans. on Signal Process., vol. 59, no. 5, pp. 2405–2410, 2011.
[9] E. H. Kim, S. K. Oh, and W. Pedrycz, "Design of reinforced interval type-2 fuzzy c-means-based fuzzy classifier," IEEE Trans. on Fuzzy Syst., vol. 26, no. 5, pp. 3054–3068, 2017.
[10] M. S. Chen and S. W. Wang, "Fuzzy clustering analysis for optimizing fuzzy membership functions," Fuzzy Sets and Systems, vol. 103, no. 2, pp. 239–254, 1999.