A Bayesian Approach With Type-2 Student-T Membership Function For T-S Model Identification
Abstract—Clustering techniques have proved highly successful for Takagi-Sugeno (T-S) fuzzy model identification. In particular, fuzzy c-regression clustering based on type-2 fuzzy

hyperplane-shaped clustering becomes more popular [4], [5]. The architecture of these algorithms is robust in partitioning the data space, inferring estimates of outputs with the inputs
where i = 1, · · · , c indexes the fuzzy rules and yi is the output of the ith rule. Using these rules, we can infer the final model output as follows:

    y = ∑_{i=1}^{c} wi yi / ∑_{i=1}^{c} wi ,    wi = ∏_{j=1}^{m} µ_{Aij}(xj)        (2)

where wi denotes the overall firing strength of the ith rule.

B. Interval Type-2 FCM (IT2-FCM)

In the interval type-2 FCM, two objective functions that differ in their degrees of fuzziness are optimized simultaneously using Lagrange multipliers to obtain the upper and lower membership functions [3]. Let m1 and m2 be the two degrees of fuzziness; then the two objective functions are described as

    Q_{m1}(U, v) = ∑_{k=1}^{N} ∑_{i=1}^{c} µi(xk)^{m1} Eik(ζi)²
    Q_{m2}(U, v) = ∑_{k=1}^{N} ∑_{i=1}^{c} µi(xk)^{m2} Eik(ζi)²        (3)

C. Interval Type-2 Fuzzy c-Regression Algorithm (IT2-FCR)

The main motivation of the interval type-2 fuzzy c-regression algorithm is to partition the set of n data points (xk, yk) (k = 1, · · · , n) into c clusters. The data points in every cluster i can be described by a regression model:

    ŷk = gi(xk, ζi) = bi1 xk1 + · · · + bim xkm + bi0 = [xk 1] ζiᵀ        (4)

where xk = [xk1, · · · , xkm] is the kth input vector.

III. PROPOSED METHODOLOGY

In this paper we present a new framework for FCRM with a student-t distribution based membership function (MF) for sparse data modelling [6]–[8]. The presented approach is described in the following subsections.

A. Fuzzy-Space Partitioning

Firstly, we formulate the task of fuzzy-space partitioning as a maximum a posteriori (MAP) estimation over a squared error function [8]. Exploiting Bayes' rule, the MAP estimator is defined as

    φ(y) = arg max_{x∈Rⁿ} p(x|y) = arg max_{x∈Rⁿ} p(y|x) p(x)        (5)

which leads to the per-cluster error function

    Eik(ζi) = (yk − gi(xk, ζi))² + λ ∑_{p=0}^{m} (bip)²        (6)

where Eik(ζi) is the MAP estimator, (yk − gi(xk, ζi))² the likelihood, and ∑_{p=0}^{m} (bip)² the prior, also called a regularizer, which is equivalent to the Bayesian notion of placing a prior on the regression weights bi for each cluster i; λ is the regularizer control parameter. The regularizer reduces overfitting in cluster assignment by constraining the regression weights to be small.

In the proposed approach, we first define two degrees of fuzziness m1 and m2, and initialize the number of clusters c and a termination threshold ε. We also initialize the parameters ζ̄i and ζ̲i, which are the upper and lower regression coefficient vectors of the ith cluster. Then the error function (6) is written in terms of upper and lower error-function MAP estimators as follows:

    Ēik(ζ̄i) = (yk − gi(xk, ζ̄i))² + λ ∑_{p=0}^{m} (b̄ip)²
    E̲ik(ζ̲i) = (yk − gi(xk, ζ̲i))² + λ ∑_{p=0}^{m} (b̲ip)²        (7)

To reduce the complexity of the system, a weighted-average type-reduction technique is applied to obtain Eik(ζi) as

    Eik(ζi) = (Ēik(ζ̄i) + E̲ik(ζ̲i)) / 2        (8)

Through a MAP estimate on the posterior of the defuzzified error function Eik(ζi), the upper and lower membership functions of every data point in each cluster are obtained similarly to [3] and are given as follows:

    µ̄i(xk) = 1 / ∑_{r=1}^{c} (Eik(ζi)/Erk(ζr))^{2/(m1−1)},   if 1/∑_{r=1}^{c} (Eik(ζi)/Erk(ζr)) < 1/c
    µ̄i(xk) = 1 / ∑_{r=1}^{c} (Eik(ζi)/Erk(ζr))^{2/(m2−1)},   otherwise        (9)

with the lower membership function µ̲i(xk) obtained analogously, with the roles of m1 and m2 interchanged.

The above equations can be interpreted as those of the MAP FCRM problem formulated in (8). To estimate the parameters ζ̄i and ζ̲i, we formulate the problem as a locally weighted linear regression with the objective function

    J(ζi) = (1/2) ∑_{k=1}^{n} uik ([xk 1] ζiᵀ − yk)²        (10)

Here, uik denotes the membership value of the kth data point in the ith cluster. The parameters ζ̄i and ζ̲i are estimated by SGD on the above objective function, using the appropriate memberships ūik and u̲ik. Then the regression coefficients ζi are obtained by a type-reduction technique as follows:
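A minimal numerical sketch of the steps above may help: it evaluates the regularized error functions (7), the type reduction (8), FCM-style memberships in the spirit of (9), and the T-S weighted-average inference (2) on toy data. All numerical values, the max/min construction of the interval memberships, and the averaged coefficients used for inference are illustrative assumptions, not the paper's tuned algorithm:

```python
import numpy as np

# Toy setup: c = 2 clusters/rules, one input feature, n = 5 data points.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(5, 1))
y = 2.0 * X[:, 0] + 0.5

# Assumed upper/lower regression coefficient vectors per cluster: [b_i1, b_i0]
zeta_up = np.array([[2.1, 0.6], [1.8, 0.3]])
zeta_lo = np.array([[1.9, 0.4], [1.6, 0.1]])
lam = 0.01                                       # regularizer control parameter

Xa = np.hstack([X, np.ones((X.shape[0], 1))])    # augmented inputs [x_k 1]

def err(zeta):
    # E_ik = (y_k - g_i(x_k, zeta_i))^2 + lambda * sum_p b_ip^2, as in eq. (7)
    resid = y[None, :] - zeta @ Xa.T             # shape (c, n)
    return resid ** 2 + lam * (zeta ** 2).sum(axis=1, keepdims=True)

E = 0.5 * (err(zeta_up) + err(zeta_lo))          # type reduction, eq. (8)

def membership(E, m):
    # Closed form of 1 / sum_r (E_ik / E_rk)^(2/(m-1)); columns sum to 1.
    w = E ** (-2.0 / (m - 1.0))
    return w / w.sum(axis=0)

m1, m2 = 1.5, 7.0
u_up = np.maximum(membership(E, m1), membership(E, m2))  # upper MF (eq. (9) style)
u_lo = np.minimum(membership(E, m1), membership(E, m2))  # lower MF

# T-S inference, eq. (2): weighted average of per-rule linear outputs, using
# midpoint memberships and coefficients (an illustrative reduction choice).
u = 0.5 * (u_up + u_lo)
y_rules = 0.5 * (zeta_up + zeta_lo) @ Xa.T
y_hat = (u * y_rules).sum(axis=0) / u.sum(axis=0)
print(y_hat.round(3))
```

Note that with the FCM exponent 2/(m−1), the memberships of each data point sum to one across clusters automatically, which is what makes the closed form above equivalent to the ratio sum in (9).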
Fig. 1: Performance comparison of model output with actual output
Fig. 2: Plot of test error for house prices dataset

B. Non-Linear Plant Modeling

The second-order non-linear difference equation given in (19) is used in order to draw a comparison with other benchmark models, as given in Table II.

    z(k) = (z(k−1) + 2.5) z(k−1) z(k−2) / (1 + z²(k−1) + z²(k−2)) + v(k)        (19)

where v(k) = sin(2k/25) is the input for validation of the model, z(k) is the model output, and z(k−1), z(k−2) and u(k) are the model inputs. The hyper-parameters are tuned by grid search and finally set as: c = 4, m1 = 1.5, m2 = 7 and η = 3.14. The obtained MSE of the model on 500 test data points is 7.2 × 10⁻⁵ using only four rules, which is much smaller than that of other models. Through simulations, we have shown that the proposed model outperforms other state-of-the-art models. Fig. 3 shows that our model output closely tracks the actual output at every time-step. As observed in Fig. 4, the error fluctuates from data point to data point, but the absolute error is consistently less than 0.1, with no rapid surge at stationary points of the time series data. This is a crucial

Fig. 3: Performance comparison of model output and actual output
Fig. 4: Plot of test error on time series data

C. A sinc Function in One Dimension

In this subsection a non-linear sinc function is used to present the effectiveness of the proposed model:

    y = sin(x) / x        (20)

where x ∈ [−40, 0) ∪ (0, 40]. We have sampled 121 data points uniformly for this one-dimensional function. As in the previous case study, the number of rules is taken as four. The hyper-parameters are tuned through grid search and finally fixed as: m1 = 1.5, m2 = 7 and η = 3.14. The MSE of the proposed model is 2.4 × 10⁻³ on the test data of 121 samples, which is lower than that of the modified interval type-2 FCRM (MIT2-FCRM) [6], 7.7 × 10⁻³. Table III provides a detailed comparison of performance with state-of-the-art methods.

Table III: PERFORMANCE ON sinc FUNCTION
State-of-the-Art   | Rules | MSE
SCM [10]           | 2     | 4.47 × 10⁻²
EUM [10]           | 2     | 4.50 × 10⁻²
EFCM [10]          | 2     | 8.9 × 10⁻³
Fazel Zarandi [4]  | 4     | 2.385 × 10⁻²
MIT2 FCRM [6]      | 4     | 7.7 × 10⁻³
Proposed           | 4     | 2.4 × 10⁻³
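The two benchmark signals used in these case studies are easy to regenerate. The sketch below produces only the data (the fuzzy model fitting is not re-implemented); zero initial conditions for the plant and a uniform grid with the singular sample at x = 0 dropped are assumptions on our part, since the paper does not specify them here:

```python
import numpy as np

# Second-order non-linear plant, eq. (19), driven by v(k) = sin(2k/25),
# with z(0) = z(1) = 0 assumed as initial conditions.
n = 500
z = np.zeros(n)
for k in range(2, n):
    v = np.sin(2.0 * k / 25.0)
    z[k] = ((z[k - 1] + 2.5) * z[k - 1] * z[k - 2]) / (
        1.0 + z[k - 1] ** 2 + z[k - 2] ** 2) + v

# sinc benchmark, eq. (20), on [-40, 0) U (0, 40]: take a uniform grid on
# [-40, 40] and drop the center sample at x = 0, where sin(x)/x is undefined.
x = np.delete(np.linspace(-40.0, 40.0, 121), 60)
y_sinc = np.sin(x) / x
```

A quick sanity check on (19): since |z(k−1) z(k−2)| / (1 + z²(k−1) + z²(k−2)) ≤ 1/2 and |v(k)| ≤ 1, one gets |z(k)| ≤ (|z(k−1)| + 2.5)/2 + 1, so the trajectory stays bounded by 4.5, consistent with the stable behavior reported for this benchmark.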