Architectural Exploration for Energy-Efficient Fixed-Point Kalman Filter VLSI Design
Authorized licensed use limited to: Carleton University. Downloaded on June 04,2021 at 14:16:29 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
TABLE I
SUMMARY OF RELATED WORK ABOUT KF DESIGN REALIZATIONS
Fig. 3. General architectures of: (a) INV block, (b) Cofactor block, (c) C block, (d) Det block, and (e) B2 block.
Fig. 5. Prior system error covariance (P⁻(k)) architectures in: (a) sequential form, (b) semiparallel form, and (c) parallel form.
synthesis tool. All architectures use a fixed-point representation in which, for an n-bit word, (n/2) − 1 bits represent the integer part, (n/2) bits the fractional part, and one bit the sign.
1) Prior State Vector (x̂⁻(k)): Fig. 4 presents the architectures developed for the prior state vector equation in sequential [Fig. 4(a)], semiparallel [Fig. 4(b)], and parallel [Fig. 4(c)] forms. Processing in the sequential form begins with the simultaneous multiplication of the components of the matrix operation, followed by their sum. This process continues until all parts of the first step have been calculated and is then repeated for the second step. Lastly, the third step only sums the results of steps one and two. The semiparallel form performs the same processing but, instead of calculating a single component of the step at a time, it calculates two components. The parallel arrangement calculates all components of the step in just one cycle.
2) Prior System Error Covariance (P⁻(k)): Fig. 5 shows the hardware architectures for the prior system error covariance, (5), in sequential [Fig. 5(a)], semiparallel [Fig. 5(b)], and parallel [Fig. 5(c)] forms. The processing is similar to that of the prior state vector. The primary differences are the number of input signals and the register feedback. In this process, the transposed matrix Aᵀ is realized by reusing the matrix A and repositioning its coefficients at the multiplexer inputs. For the prior system error covariance calculation in steps after the first, one uses a pipeline, with previously calculated outputs (T and So) to
Fig. 6. KG hardware architecture in: (a) sequential form, (b) semiparallel form, and (c) parallel form.
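The fixed-point word format described earlier in this section (one sign bit, (n/2) − 1 integer bits, and n/2 fractional bits) can be illustrated with a minimal Python sketch. Two's-complement encoding, round-to-nearest quantization, and saturation on overflow are assumptions made here for illustration only; the text fixes just the bit split. The helper names `fixed_point_format` and `quantize` are hypothetical.

```python
def fixed_point_format(n):
    """Return (sign_bits, integer_bits, fractional_bits) for an n-bit word."""
    if n < 4 or n % 2 != 0:
        raise ValueError("n must be an even word width of at least 4 bits")
    return 1, n // 2 - 1, n // 2


def quantize(value, n):
    """Map a real value to the nearest representable n-bit fixed-point value.

    Assumptions (not stated in the text): two's-complement codes,
    round-to-nearest (Python's round, ties to even), and saturation
    instead of wrap-around on overflow.
    """
    _, int_bits, frac_bits = fixed_point_format(n)
    scale = 1 << frac_bits                         # 2**frac_bits steps per unit
    max_code = (1 << (int_bits + frac_bits)) - 1   # largest positive code
    min_code = -(1 << (int_bits + frac_bits))      # most negative code
    code = round(value * scale)                    # integer code of the sample
    code = max(min_code, min(max_code, code))      # saturate to the word range
    return code / scale


if __name__ == "__main__":
    n = 20  # word width reported as the best choice in the conclusion
    print(fixed_point_format(n))   # (1, 9, 10)
    print(quantize(1 / 3, n))      # 0.3330078125
    print(quantize(1000.0, n))     # 511.9990234375 (saturated)
```

With n = 20, this split gives 9 integer bits and 10 fractional bits, i.e., a resolution of 2⁻¹⁰ and, under the two's-complement assumption, a representable range of roughly ±512.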
Fig. 9. Posterior state vector (x̂(k)) architectures in: (a) sequential form, (b) semiparallel form, and (c) parallel form.
Fig. 10. Posterior system error covariance (P(k)) architectures in: (a) sequential form, (b) semiparallel form, and (c) parallel form.
Fig. 12. Measurement noise covariance (R(k)) architectures in: (a) sequential form and (b) parallel form.
TABLE III
EQUATIONS SYNTHESIS RESULTS 1 @ RUNNING AT 100 MHz
TABLE VI
SYNTHESIS RESULTS: THOROUGH KF ARCHITECTURE BALANCED FOR HIGHER ENERGY-EFFICIENCY. RUNNING AT 100 MHz IN 65-nm CMOS
Fig. 15. System identification validation. (a) Estimated output Ẑ RMSE. (b) Estimated state vector x̂ RMSE. (c) Observed output versus estimated output.
(d) Real state vector versus estimated state vector.
Fig. 16. Noise elimination validation. (a) Original EEG signal and (b) its frequency response. (c) Corrupted EEG signal and (d) its frequency response.
(e) Filtered EEG signal and (f) its frequency response.
Fig. 18. System estimation validation in an RLC circuit. (a) Current in the inductor versus estimated current. (b) Voltage in the capacitor versus estimated voltage. (c) Voltage in the resistor versus estimated voltage. (d) Current in the resistor versus estimated current.
The work in [33] is the only one with an ASIC implementation, using a 0.5-μm CMOS technology with 24-bit resolution. Its authors quote the power dissipation at a 20-MHz operating frequency but do not mention the supply voltage. This lack of information precludes parameter scaling for a fair power comparison with our 65-nm CMOS ASIC. Therefore, we directly compare key design parameters of five solutions in Table VII, considering two state variables (i.e., with one control input and one observed signal). In this comparison, since operating frequency is highly dependent on technology parameters and supply voltage, the number of clock cycles is a better metric for the characteristics of the architecture. Notably, although it implements a more extensive system with two state variables, two control inputs, and two observed signals, our balanced KF architecture has a latency of just 34 clock cycles against the 113 clock cycles of [33], a reduction of 3.3 times (113/34 ≈ 3.3). Besides, the 0.5-μm CMOS solution presented in [33] dissipates 55.3 mW at just 20 MHz, i.e., 5× slower than the operating frequency of our VLSI KF, whereas our balanced KF architecture dissipates just 1.30 mW at that 5× higher frequency. Our architecture thus shows a power reduction of 42.5 times (55.3 mW/1.30 mW ≈ 42.5) in direct comparison with the best result in prior literature for a VLSI implementation. The most relevant FoM for comparison, i.e., the energy expended per operation, shows that our proposed balanced architecture reduces this FoM by 710× compared with that VLSI solution.
V. CONCLUSION
This work presented dedicated architectures implementing the entire KF process for DSP applications. Eight different architectures were developed, one for each equation of the filter, configured in fully sequential, semiparallel, and fully parallel forms. Such a wide design exploration enabled us to determine which configuration leads to the best compromise between circuit area, power dissipation, and processing speed. Our KG proposal uses the iterative Goldschmidt divider for the matrix inversions. Simulations determined 20 bits as the best fixed-point word width for implementing the architectures with the best system error. Our results pointed to the sequential and semiparallel architectures as the best tradeoff between reduced circuit area and power dissipation and high processing speed for the KF equations. This best-balanced VLSI implementation of the entire KF architecture is a comprehensive and new solution for VLSI DSP applications.
Finally, we explored the VLSI KF circuit in application scenarios of system identification, noise cancellation, and system estimation to show its end-user performance. The results showed reduced RMSE values for the estimated outputs and estimated state vectors, confirming and validating the accuracy, precision, and reliability of the new dedicated KF VLSI architectures proposed in this work. Comparisons with previous literature showed that our new balanced KF achieves the best results concerning the number of arithmetic operators, power dissipation, energy per operation, and processing speed.
In future work, we aim to investigate system-order scalability at design time and its impacts by using other case studies. Runtime-scalable KF architectures will also be addressed in future research.
REFERENCES
[1] S. Vangal et al., “Near-threshold voltage design techniques for heterogenous manycore system-on-chips,” J. Low Power Electron. Appl., vol. 10, no. 2, p. 16, May 2020.
[2] J. Myers, A. Savanth, R. Gaddh, D. Howard, P. Prabhat, and D. Flynn, “A subthreshold ARM Cortex-M0+ subsystem in 65 nm CMOS for WSN applications with 14 power domains, 10T SRAM, and integrated voltage regulator,” IEEE J. Solid-State Circuits, vol. 51, no. 1, pp. 31–44, Jan. 2016.
[3] J. Han and M. Orshansky, “Approximate computing: An emerging paradigm for energy-efficient design,” in Proc. 18th IEEE Eur. Test Symp. (ETS), May 2013, pp. 1–6.
[4] A. Suleiman, Z. Zhang, L. Carlone, S. Karaman, and V. Sze, “Navion: A 2-mW fully integrated real-time visual-inertial odometry accelerator for autonomous navigation of nano drones,” IEEE J. Solid-State Circuits, vol. 54, no. 4, pp. 1106–1119, Apr. 2019.
[5] A. Bellar and M. A. S. Mohammed, “Satellite inertia parameters estimation based on extended Kalman filter,” J. Aerosp. Technol. Manage., vol. 11, pp. 1–11, Mar. 2019.
[6] W. Zhou, J. Hou, L. Liu, T. Sun, and J. Liu, “Design and simulation of the integrated navigation system based on extended Kalman filter,” Open Phys., vol. 15, no. 1, pp. 182–187, Apr. 2017.
[7] M. Oskoei, “Adaptive Kalman filter applied to vision based head gesture tracking for playing video games,” Robotics, vol. 6, no. 4, p. 33, 2017.
[8] H. Wang et al., “Kalman filter slope measurement method based on improved genetic algorithm-back propagation,” in Proc. WCX SAE World Congr. Exper., 2020, pp. 1–10.
[9] S. Acharya et al., “Ensemble learning approach via Kalman filtering for a passive wearable respiratory monitor,” IEEE J. Biomed. Health Informat., vol. 23, no. 3, pp. 1022–1031, May 2019.
[10] C.-S.-A. Gong et al., “Design and implementation of acoustic sensing system for online early fault detection in industrial fans,” J. Sensors, vol. 2018, Jun. 2018, Art. no. 4105208.
[11] S.-A. Li and C. Li, “FPGA implementation of adaptive Kalman filter for industrial ultrasonic applications,” Microsyst. Technol., pp. 1–8, May 2019, doi: 10.1007/s00542-019-04456-6.
[12] X. Lai, T. Yang, Z. Wang, and P. Chen, “IoT implementation of Kalman filter to improve accuracy of air quality monitoring and prediction,” Appl. Sci., vol. 9, no. 9, p. 1831, 2019.
[13] L. Torres, J. Jiménez-Cabas, O. González, L. Molina, L. Estrada, and R. Francisco, “Kalman filters for leak diagnosis in pipelines: Brief history and future research,” J. Mar. Sci. Eng., vol. 8, no. 3, p. 173, 2020.
[14] R. E. Kalman, “A new approach to linear filtering and prediction problems,” J. Basic Eng., vol. 82, no. 1, pp. 35–45, Mar. 1960.
[15] G. Paim, P. Marques, E. Costa, S. Almeida, and S. Bampi, “Improved Goldschmidt algorithm for fast and energy-efficient fixed-point divider,” in Proc. 24th IEEE Int. Conf. Electron., Circuits Syst. (ICECS), Dec. 2017, pp. 74–77.
[16] S. F. Obermann and M. J. Flynn, “Division algorithms and implementations,” IEEE Trans. Comput., vol. 46, no. 8, pp. 833–854, Aug. 1997.
[17] R. E. Goldschmidt, “Applications of division by convergence,” Ph.D. dissertation, Massachusetts Inst. Technol., Cambridge, MA, USA, 1964.
[18] C. Paleologu, J. Benesty, and S. Ciochina, “Study of the general Kalman filter for echo cancellation,” IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 8, pp. 1539–1549, Aug. 2013.
[19] J. Baliyan, A. Aggarwal, and A. Kumar, “Implementation of Kalman filter using VHDL,” Int. J. Sci. Eng. Technol. Res., vol. 3, no. 8, pp. 1569–1575, 2014.
[20] R. Inan, M. Barut, and F. Karakaya, “FPGA implementation of extended Kalman filter for speed-sensorless control of induction motors,” in Proc. 7th IET Int. Conf. Power Electron., Mach. Drives (PEMD), 2014, pp. 1–6.
[21] A. A. Q. Al Rababah, “Embedded architecture for object tracking using Kalman filter,” J. Comput. Sci., vol. 12, no. 5, pp. 241–245, 2016.
[22] M. Terra, R. Montanari, and V. Guizilini, “FPGA implementation of robust array Kalman filter based on Givens rotation,” in Proc. XIII Intell. Automat. Brazilian Symp. (SBIA), Oct. 2017, pp. 1844–1849.
[23] N. Noordin, Z. Ibrahim, M. Xie, R. Samad, and N. Hasan, “FPGA implementation of simulated Kalman filter optimization algorithm,” J. Telecommun., Electron. Comput. Eng., vol. 10, nos. 1–3, pp. 21–24, 2018.
[24] A. Jarrah, “Optimized parallel architecture of Kalman filter for radar tracking applications,” Jordan J. Electr. Eng., vol. 2, no. 3, pp. 215–230, May 2016.
[25] A. Valade, P. Acco, P. Grabolosa, and J.-Y. Fourniols, “A study about Kalman filters applied to embedded sensors,” Sensors, vol. 17, no. 12, p. 2810, Dec. 2017.
[26] S. Nejad, D. T. Gladwin, and D. A. Stone, “On-chip implementation of extended Kalman filter for adaptive battery states monitoring,” in Proc. 42nd Annu. Conf. IEEE Ind. Electron. Soc. (IECON), Oct. 2016, pp. 5513–5518.
[27] L.-C. Zai, C. L. DeMarco, and T. A. Lipo, “An extended Kalman filter approach to rotor time constant measurement in PWM induction motor drives,” IEEE Trans. Ind. Appl., vol. 28, no. 1, pp. 96–104, Jan./Feb. 1992.
[28] S. Y. Chen, “Kalman filter for robot vision: A survey,” IEEE Trans. Ind. Electron., vol. 59, no. 11, pp. 4409–4420, Nov. 2012.
[29] F. Sandhu, H. Selamat, S. E. Alavi, and V. B. S. Mahalleh, “FPGA-based implementation of Kalman filter for real-time estimation of tire velocity and acceleration,” IEEE Sensors J., vol. 17, no. 17, pp. 5749–5758, Sep. 2017.
[30] M. Ricco, P. Manganiello, E. Monmasson, G. Petrone, and G. Spagnuolo, “FPGA-based implementation of dual Kalman filter for PV MPPT applications,” IEEE Trans. Ind. Informat., vol. 13, no. 1, pp. 176–185, Feb. 2017.
[31] J. Soh and X. Wu, “A scalable, FPGA-based implementation of the unscented Kalman filter,” in Introduction and Implementations of the Kalman Filter. Rijeka, Croatia: InTechOpen, 2018.
[32] J. Liao et al., “FPGA implementation of a Kalman-based motion estimator for levitated nanoparticles,” IEEE Trans. Instrum. Meas., vol. 68, no. 7, pp. 2374–2386, Jul. 2019.
[33] R. Chávez-Bracamontes, M. A. Gurrola-Navarro, H. J. Jiménez-Flores, and M. Bandala-Sánchez, “VLSI architecture of a Kalman filter optimized for real-time applications,” IEICE Electron. Exp., pp. 1–11, Feb. 2016, Art. no. 20160043.
[34] C. Wang, E. D. Burnham-Fay, and J. D. Ellis, “Real-time FPGA-based Kalman filter for constant and non-constant velocity periodic error correction,” Precis. Eng., vol. 48, pp. 133–143, Apr. 2017.
[35] P. T. L. Pereira, G. Paim, P. Ucker, E. Costa, S. Almeida, and S. Bampi, “Exploring architectural solutions for an energy-efficient Kalman filter gain realization,” in Proc. 26th IEEE Int. Conf. Electron., Circuits Syst. (ICECS), Nov. 2019, pp. 650–653.
[36] A. U. Irtürk, “GUSTO: General architecture design utility and synthesis tool for optimization,” Ph.D. dissertation, Univ. California, San Diego, CA, USA, 2009.
[37] R. Muller, H.-J. Pfleiderer, and K.-U. Stein, “Energy per logic operation–A figure of merit for IC’s,” in Proc. 2nd Eur. Solid State Circuits Conf., Sep. 1976, pp. 50–51.
Pedro Tauã Lopes Pereira (Student Member, IEEE) received the engineering degree in control and automation engineering from the Federal University of Pelotas (UFPEL), Pelotas, Brazil, in 2018, and the M.Sc. degree in electronic engineering and computing from the Catholic University of Pelotas, Pelotas, Brazil, in 2019. He is currently working toward the Ph.D. degree at the Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Brazil. His research interests are digital signal processing architecture, adaptive filters, and approximate computing.
Patrícia Ücker Leleu da Costa received the engineering degree in electronics engineering from the Federal University of Pelotas, Pelotas, Brazil, in 2018, and the M.Sc. degree in electronic engineering and computing from the Catholic University of Pelotas, Pelotas, in 2020. Her research interests are low-power VLSI architectures, arithmetic operators, digital signal processing architecture, and approximate computing.
Eduardo Antonio César da Costa (Member, IEEE) received the five-year engineering degree in electrical engineering from the University of Pernambuco, Recife, Brazil, in 1988, the M.Sc. degree in electrical engineering from the Federal University of Paraíba, Campina Grande, Brazil, in 1991, and the Ph.D. degree in computer science from the Federal University of Rio Grande do Sul, Porto Alegre, Brazil, in 2002. Part of his doctoral work was developed at the INESC-ID, Lisbon, Portugal. He is currently a Full Professor with the Catholic University of Pelotas (UCPel), Pelotas, Brazil. He is a Co-Founder and a Coordinator of the Graduate Program on Electronic Engineering and Computing at UCPel. His research interests are VLSI architectures and low-power design.
Sérgio José Melo de Almeida (Member, IEEE) received the B.E.E. degree from the Federal University of Pernambuco (UFPE), Recife, Brazil, in 1988, the M.Sc. degree in electrical engineering from the Federal University of Paraíba (UFPB), João Pessoa, Brazil, in 1991, and the Ph.D. degree in electrical engineering from the Federal University of Santa Catarina (UFSC), Florianópolis, Brazil, in 2004. He held a postdoctoral position with the Department of Electrical and Electronic Engineering, Federal University of Santa Catarina, from 2009 to 2010. He is currently a Professor of Electrical Engineering and Computer Science with the Catholic University of Pelotas, Pelotas, Brazil. His research interests are in digital signal processing, including statistical signal processing, adaptive algorithms, hyperspectral image processing, and dedicated hardware for signal processing.