FFT Matrix Factorization Techniques

The document discusses efficient implementations of the discrete Fourier transform (DFT) using matrix factorizations. It describes how the DFT matrix can be factored into sparse matrices involving block diagonal matrices and permutation matrices. This factorization leads to an algorithm for computing the DFT in O(n log n) operations using a divide-and-conquer approach, by recursively breaking the problem into smaller DFT subproblems. The Cooley-Tukey algorithm is presented as an efficient non-recursive implementation of this approach.

Uploaded by

Olimpiu Stoicuta

The FFT

Via Matrix Factorizations

A Key to Designing High Performance Implementations

Charles Van Loan

Department of Computer Science
Cornell University
A High Level Perspective...
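Blocking For Performance

\[
A = \begin{bmatrix}
A_{11} & A_{12} & \cdots & A_{1q} \\
A_{21} & A_{22} & \cdots & A_{2q} \\
\vdots & \vdots & \ddots & \vdots \\
A_{p1} & A_{p2} & \cdots & A_{pq}
\end{bmatrix}
\]

with the block rows and block columns labelled $n_1, n_2, \ldots, n_q$.

A well-known strategy for high-performance $Ax = b$ and $Ax = \lambda x$ solvers.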
Factoring for Performance

One way to execute a matrix-vector product

\[ y = F_n x \]

when $F_n = A_t \cdots A_2 A_1$ is as follows:

    y = x
    for k = 1:t
        y = A_k y
    end

A different factorization $F_n = \tilde{A}_{\tilde{t}} \cdots \tilde{A}_1$ would yield a different algorithm.
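The loop above can be sketched in Python as follows. This is a minimal illustration, not code from the slides: the helper names (`matvec`, `apply_factors`, `dft_matrix`) are hypothetical, the matrices are stored densely for clarity, and the n = 4 factors are written out by hand from the radix-2 structure described later in the deck.

```python
import cmath

def matvec(A, x):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def apply_factors(factors, x):
    """Apply factors[0] first, then factors[1], and so on.

    For F = A_t ... A_2 A_1 pass [A_1, A_2, ..., A_t]."""
    y = list(x)
    for A in factors:
        y = matvec(A, y)
    return y

def dft_matrix(n):
    """Dense DFT matrix with entries exp(-2*pi*i*p*q/n)."""
    return [[cmath.exp(-2j * cmath.pi * p * q / n) for q in range(n)]
            for p in range(n)]

# Hand-written n = 4 example (an assumption for illustration): F_4 = A2 A1 P4.
P4 = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]      # swaps entries 1 and 2
A1 = [[1, 1, 0, 0], [1, -1, 0, 0], [0, 0, 1, 1], [0, 0, 1, -1]]    # I_2 kron [[1, 1], [1, -1]]
A2 = [[1, 0, 1, 0], [0, 1, 0, -1j], [1, 0, -1, 0], [0, 1, 0, 1j]]  # [[I, W], [I, -W]], W = diag(1, -i)

x = [1.0, 2.0, 3.0, 4.0]
y_factored = apply_factors([P4, A1, A2], x)
y_direct = matvec(dft_matrix(4), x)
max_err = max(abs(a - b) for a, b in zip(y_factored, y_direct))
print(max_err)
```

Dense storage hides the point of the factorization; a practical code would exploit the sparsity of each factor rather than form it.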
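The Discrete Fourier Transform (n = 8)

\[
y = F_8 x =
\begin{bmatrix}
\omega_8^{0} & \omega_8^{0} & \omega_8^{0} & \omega_8^{0} & \omega_8^{0} & \omega_8^{0} & \omega_8^{0} & \omega_8^{0} \\
\omega_8^{0} & \omega_8^{1} & \omega_8^{2} & \omega_8^{3} & \omega_8^{4} & \omega_8^{5} & \omega_8^{6} & \omega_8^{7} \\
\omega_8^{0} & \omega_8^{2} & \omega_8^{4} & \omega_8^{6} & \omega_8^{8} & \omega_8^{10} & \omega_8^{12} & \omega_8^{14} \\
\omega_8^{0} & \omega_8^{3} & \omega_8^{6} & \omega_8^{9} & \omega_8^{12} & \omega_8^{15} & \omega_8^{18} & \omega_8^{21} \\
\omega_8^{0} & \omega_8^{4} & \omega_8^{8} & \omega_8^{12} & \omega_8^{16} & \omega_8^{20} & \omega_8^{24} & \omega_8^{28} \\
\omega_8^{0} & \omega_8^{5} & \omega_8^{10} & \omega_8^{15} & \omega_8^{20} & \omega_8^{25} & \omega_8^{30} & \omega_8^{35} \\
\omega_8^{0} & \omega_8^{6} & \omega_8^{12} & \omega_8^{18} & \omega_8^{24} & \omega_8^{30} & \omega_8^{36} & \omega_8^{42} \\
\omega_8^{0} & \omega_8^{7} & \omega_8^{14} & \omega_8^{21} & \omega_8^{28} & \omega_8^{35} & \omega_8^{42} & \omega_8^{49}
\end{bmatrix} x
\]

\[ \omega_8 = \cos(2\pi/8) - i \sin(2\pi/8) \]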
The DFT Matrix In General...

If $\omega_n = \cos(2\pi/n) - i \sin(2\pi/n)$ then

\[
[F_n]_{pq} = \omega_n^{pq}
= \left( \cos(2\pi/n) - i \sin(2\pi/n) \right)^{pq}
= \cos(2pq\pi/n) - i \sin(2pq\pi/n)
\]

Fact:

\[ F_n^H F_n = n I_n \]

Thus, $F_n / \sqrt{n}$ is unitary.
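As a quick numerical check of the definition and the fact above, the following sketch builds F_8 entry by entry and measures how far F^H F is from n I. The function names are illustrative only.

```python
import cmath

def dft_matrix(n):
    """[F_n]_{pq} = omega_n^{pq} with omega_n = exp(-2*pi*i/n), indices from 0."""
    return [[cmath.exp(-2j * cmath.pi * p * q / n) for q in range(n)]
            for p in range(n)]

def gram(F):
    """Return F^H F for a square matrix stored as a list of rows."""
    n = len(F)
    return [[sum(F[k][i].conjugate() * F[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

n = 8
F = dft_matrix(n)
G = gram(F)
# Largest deviation of F^H F from n * I; should be at rounding-error level.
max_dev = max(abs(G[i][j] - (n if i == j else 0))
              for i in range(n) for j in range(n))
print(max_dev)
```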
Data Sparse Matrices

An n-by-n matrix A is data sparse if it can be represented with many fewer than $n^2$ numbers.

Example 1.
A has lots of zeros. (“Traditional Sparse”)

Example 2.
A is Toeplitz...

\[
A = \begin{bmatrix}
a & b & c & d \\
e & a & b & c \\
f & e & a & b \\
g & f & e & a
\end{bmatrix}
\]
More Examples of Data Sparse Matrices

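A is a Kronecker Product $B \otimes C$, e.g.,

\[
A = \begin{bmatrix} b_{11}C & b_{12}C \\ b_{21}C & b_{22}C \end{bmatrix}
\]

If $B \in \mathbb{R}^{m_1 \times m_1}$ and $C \in \mathbb{R}^{m_2 \times m_2}$ then $A = B \otimes C$ has $m_1^2 m_2^2$ entries but is parameterized by just $m_1^2 + m_2^2$ numbers.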
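The entry count versus parameter count can be illustrated with a small Kronecker product routine. This is a sketch; `kron` is a hypothetical helper written for this example, and the matrices are arbitrary.

```python
def kron(B, C):
    """Kronecker product of two matrices stored as lists of rows.

    Block (i1, j1) of the result is B[i1][j1] * C."""
    return [[b * c for b in brow for c in crow] for brow in B for crow in C]

B = [[1, 2], [3, 4]]
C = [[0, 1], [1, 0]]
A = kron(B, C)
for row in A:
    print(row)

# Storage comparison for m1 = 2, m2 = 3 (C3 is an arbitrary 3-by-3 matrix).
C3 = [[1, 0, 2], [0, 1, 0], [3, 0, 1]]
A3 = kron(B, C3)
entries = len(A3) * len(A3[0])                           # m1^2 * m2^2
parameters = len(B) * len(B[0]) + len(C3) * len(C3[0])   # m1^2 + m2^2
print(entries, parameters)
```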
Extreme Data Sparsity

\[
A = \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} \sum_{\ell=1}^{n} S(i, j, k, \ell) \cdot
\underbrace{(2\text{-by-}2) \otimes \cdots \otimes (2\text{-by-}2)}_{d \text{ times}}
\]

A is $2^d$-by-$2^d$ but is parameterized by $O(dn^4)$ numbers.

Factorization of Fn

The DFT matrix can be factored into a short product of sparse matrices, e.g.,

\[ F_{1024} = A_{10} \cdots A_2 A_1 P_{1024} \]

where each A-matrix has 2 nonzeros per row and $P_{1024}$ is a permutation.
From Factorization to Algorithm

If $n = 2^{10}$ and

\[ F_n = A_{10} \cdots A_2 A_1 P_n \]

then

    y = P_n x
    for k = 1:10
        y = A_k y      ← 2n flops
    end

computes $y = F_n x$ and requires $O(n \log n)$ flops.

Recursive Block Structure

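\[
F_8(:, [\,0\ 2\ 4\ 6\ 1\ 3\ 5\ 7\,]) =
\begin{bmatrix}
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & \omega_8 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & \omega_8^2 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & \omega_8^3 \\
1 & 0 & 0 & 0 & -1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & -\omega_8 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & -\omega_8^2 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & -\omega_8^3
\end{bmatrix}
\begin{bmatrix} F_4 & 0 \\ 0 & F_4 \end{bmatrix}
\]

$F_{n/2}$ “shows up” when you permute the columns of $F_n$ so that the even-indexed columns (0, 2, 4, ...) come first.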
Recursion...

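We build an 8-point DFT from two 4-point DFTs...

\[
F_8 x =
\begin{bmatrix}
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & \omega_8 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & \omega_8^2 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & \omega_8^3 \\
1 & 0 & 0 & 0 & -1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & -\omega_8 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & -\omega_8^2 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & -\omega_8^3
\end{bmatrix}
\begin{bmatrix} F_4 \, x(0{:}2{:}7) \\ F_4 \, x(1{:}2{:}7) \end{bmatrix}
\]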
Radix-2 FFT: Recursive Implementation

function y = fft(x, n)
    if n = 1
        y = x
    else
        m = n/2; ω = exp(−2πi/n)
        Ω = diag(1, ω, ..., ω^(m−1))
        zT = fft(x(0:2:n−1), m)
        zB = Ω · fft(x(1:2:n−1), m)
        y = [ I_m  I_m ; I_m  −I_m ] [ zT ; zB ]
    end

Overall: 5n log n flops.
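A direct Python transcription of the recursive radix-2 scheme might look like the sketch below. It assumes the input length is a power of two, uses 0-based slicing for the even and odd subsequences, and checks itself against a naive O(n^2) DFT; the names `fft_recursive` and `dft_naive` are illustrative.

```python
import cmath

def dft_naive(x):
    """Reference O(n^2) DFT: y[k] = sum_j x[j] * exp(-2*pi*i*j*k/n)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def fft_recursive(x):
    """Radix-2 recursive FFT; len(x) is assumed to be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    m = n // 2
    z_top = fft_recursive(x[0::2])            # DFT of even-indexed entries
    z_bot = fft_recursive(x[1::2])            # DFT of odd-indexed entries
    # Scale the odd half by Omega = diag(1, w, ..., w^(m-1)), w = exp(-2*pi*i/n).
    z_bot = [cmath.exp(-2j * cmath.pi * k / n) * z_bot[k] for k in range(m)]
    # y = [[I, I], [I, -I]] applied to [z_top; z_bot].
    return ([z_top[k] + z_bot[k] for k in range(m)] +
            [z_top[k] - z_bot[k] for k in range(m)])

x = [1.0, 2.0, -1.0, 0.5, 3.0, 0.0, -2.0, 4.0]
y = fft_recursive(x)
max_err = max(abs(a - b) for a, b in zip(y, dft_naive(x)))
print(max_err)
```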
The Divide-and-Conquer Picture

The Radix-2 Factorization...

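If $n = 2m$ and

\[ \Omega_m = \mathrm{diag}(1, \omega_n, \ldots, \omega_n^{m-1}), \]

then

\[
F_n \Pi_n =
\begin{bmatrix} F_m & \Omega_m F_m \\ F_m & -\Omega_m F_m \end{bmatrix}
=
\begin{bmatrix} I_m & \Omega_m \\ I_m & -\Omega_m \end{bmatrix} (I_2 \otimes F_m),
\]

where $\Pi_n = I_n(:, [\,0{:}2{:}n-1 \;\; 1{:}2{:}n-1\,])$.

Note:

\[ I_2 \otimes F_m = \begin{bmatrix} F_m & 0 \\ 0 & F_m \end{bmatrix}. \]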
The Cooley-Tukey Factorization

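\[ n = 2^t \]

\[ F_n = A_t \cdots A_1 P_n \]

$P_n$ = the n-by-n “bit reversal” permutation matrix

\[
A_q = I_r \otimes \begin{bmatrix} I_{L/2} & \Omega_{L/2} \\ I_{L/2} & -\Omega_{L/2} \end{bmatrix},
\qquad L = 2^q, \quad r = n/L
\]

\[
\Omega_{L/2} = \mathrm{diag}(1, \omega_L, \ldots, \omega_L^{L/2-1}),
\qquad \omega_L = \exp(-2\pi i / L)
\]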
The Bit Reversal Permutation

[Figure: binary tree of index sets for n = 16. Each node splits into its even-position and odd-position members; the levels below are reconstructed from the leaf order.]

Level 0: (0:1:15)
Level 1: (0:2:15) (1:2:15)
Level 2: (0:4:15) (2:4:15) (1:4:15) (3:4:15)
Level 3: (0:8:15) (4:8:15) (2:8:15) (6:8:15) (1:8:15) (5:8:15) (3:8:15) (7:8:15)
Leaves:  [0] [8] [4] [12] [2] [10] [6] [14] [1] [9] [5] [13] [3] [11] [7] [15]
Bit Reversal

x(0)  = x(0000)  →  x(0000) = x(0)
x(1)  = x(0001)  →  x(1000) = x(8)
x(2)  = x(0010)  →  x(0100) = x(4)
x(3)  = x(0011)  →  x(1100) = x(12)
x(4)  = x(0100)  →  x(0010) = x(2)
x(5)  = x(0101)  →  x(1010) = x(10)
x(6)  = x(0110)  →  x(0110) = x(6)
x(7)  = x(0111)  →  x(1110) = x(14)
x(8)  = x(1000)  →  x(0001) = x(1)
x(9)  = x(1001)  →  x(1001) = x(9)
x(10) = x(1010)  →  x(0101) = x(5)
x(11) = x(1011)  →  x(1101) = x(13)
x(12) = x(1100)  →  x(0011) = x(3)
x(13) = x(1101)  →  x(1011) = x(11)
x(14) = x(1110)  →  x(0111) = x(7)
x(15) = x(1111)  →  x(1111) = x(15)
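The table above can be reproduced with a few lines of Python. This is a small sketch; `bit_reverse` is an illustrative helper that reverses the low t bits of an index.

```python
def bit_reverse(k, t):
    """Reverse the t-bit binary representation of k."""
    r = 0
    for _ in range(t):
        r = 2 * r + (k % 2)   # append the lowest remaining bit of k to r
        k //= 2
    return r

t = 4                          # n = 2^t = 16
perm = [bit_reverse(k, t) for k in range(2 ** t)]
print(perm)

# Applying the permutation to a vector: entry k of P_n x is x[perm[k]].
x = list(range(16))
px = [x[perm[k]] for k in range(16)]
```

Bit reversal is its own inverse, so applying the permutation twice restores the original order.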
Butterfly Operations
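This matrix is block diagonal...

\[
A_q = I_r \otimes \begin{bmatrix} I_{L/2} & \Omega_{L/2} \\ I_{L/2} & -\Omega_{L/2} \end{bmatrix},
\qquad L = 2^q, \quad r = n/L
\]

r copies of things like this:

\[
\begin{bmatrix}
1 & & & & \times & & & \\
& 1 & & & & \times & & \\
& & 1 & & & & \times & \\
& & & 1 & & & & \times \\
1 & & & & \times & & & \\
& 1 & & & & \times & & \\
& & 1 & & & & \times & \\
& & & 1 & & & & \times
\end{bmatrix}
\]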
At the Scalar Level...

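[Figure: a single butterfly with twiddle factor ω.]

\[
\begin{bmatrix} a \\ b \end{bmatrix}
\;\longrightarrow\;
\begin{bmatrix} a + \omega b \\ a - \omega b \end{bmatrix}
\]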
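Putting the bit-reversal permutation and the scalar butterfly together gives a non-recursive radix-2 FFT. The sketch below is one way to realize y = A_t ... A_1 P_n x; it assumes n is a power of two, recomputes each twiddle factor for clarity rather than speed, and uses illustrative function names.

```python
import cmath

def bit_reverse(k, t):
    """Reverse the t-bit binary representation of k."""
    r = 0
    for _ in range(t):
        r = 2 * r + (k % 2)
        k //= 2
    return r

def fft_cooley_tukey(x):
    """Iterative radix-2 FFT: bit reversal, then t butterfly passes."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    t = n.bit_length() - 1
    y = [x[bit_reverse(k, t)] for k in range(n)]          # y = P_n x
    for q in range(1, t + 1):                             # y = A_q y
        L = 2 ** q
        half = L // 2
        for start in range(0, n, L):                      # r = n/L diagonal blocks
            for j in range(half):
                w = cmath.exp(-2j * cmath.pi * j / L)     # omega_L^j
                a = y[start + j]
                b = w * y[start + j + half]
                y[start + j] = a + b                      # a + omega*b
                y[start + j + half] = a - b               # a - omega*b
    return y

def dft_naive(x):
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

x = [0.5, -1.0, 2.0, 3.0, 0.0, 1.5, -2.5, 4.0]
y = fft_cooley_tukey(x)
max_err = max(abs(a - b) for a, b in zip(y, dft_naive(x)))
print(max_err)
```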
Signal Flow Graph (n = 8)

[Figure: signal flow graph for the n = 8 radix-2 FFT. The inputs enter in bit-reversed order x0, x4, x2, x6, x1, x5, x3, x7 and the outputs y0, ..., y7 leave in natural order. There are three butterfly stages: the first uses the twiddle factor ω8^0, the second uses ω8^0 and ω8^2, and the third uses ω8^0, ω8^1, ω8^2 and ω8^3.]
The Transposed Stockham Factorization

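If $n = 2^t$, then

\[ F_n = S_t \cdots S_2 S_1, \]

where for q = 1:t the factor $S_q = A_q \Gamma_{q-1}$ is defined by

\[ A_q = I_r \otimes B_L, \qquad L = 2^q, \quad r = n/L, \]

\[ \Gamma_{q-1} = \Pi_{r_*} \otimes I_{L_*}, \qquad L_* = L/2, \quad r_* = 2r, \]

\[ B_L = \begin{bmatrix} I_{L_*} & \Omega_{L_*} \\ I_{L_*} & -\Omega_{L_*} \end{bmatrix}, \]

\[ \Omega_{L_*} = \mathrm{diag}(1, \omega_L, \ldots, \omega_L^{L_*-1}). \]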

Perfect Shuffle

\[
(\Pi_4 \otimes I_2)
\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
=
\begin{bmatrix} x_0 \\ x_1 \\ x_4 \\ x_5 \\ x_2 \\ x_3 \\ x_6 \\ x_7 \end{bmatrix}
\]
Cooley-Tukey Array Interpretation

Step q:

[Figure: columns 2k and 2k+1 of an array with $L_* = 2^{q-1}$ rows and $r_* = n/L_*$ columns are combined into column k of an array with $L = 2^q$ rows and $r = n/L$ columns.]
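Reshaping

\[
x = \begin{bmatrix} \times \\ \times \\ \times \\ \times \\ \times \\ \times \\ \times \\ \times \end{bmatrix}
\;\longrightarrow\;
x_{2 \times 4} = \begin{bmatrix} \times & \times & \times & \times \\ \times & \times & \times & \times \end{bmatrix}
\]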
Transposed Stockham Array Interp

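[Figure: before step q, columns k and k+r are highlighted in an array with $L_* = 2^{q-1}$ rows and $r_* = n/L_*$ columns.]

\[ x^{(q-1)}_{L_* \times r_*} = F_{L_*} \, x^T_{r_* \times L_*} \]

\[ x^{(q)} = S_q x^{(q-1)} \]

[Figure: after step q, column k is highlighted in an array with $L = 2^q$ rows and $r = n/L$ columns.]

\[ x^{(q)}_{L \times r} = F_L \, x^T_{r \times L} \]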
2 × 2 × 2 Basic Radix-2 Versions

Store intermediate DFTs by row or column

Intermediate DFTs adjacent or not.

How the two butterfly loops are ordered.

\[
x = \left( I_r \otimes \begin{bmatrix} I_{L/2} & \Omega_{L/2} \\ I_{L/2} & -\Omega_{L/2} \end{bmatrix} \right) x,
\qquad L = 2^q, \quad r = n/L
\]
The Gentleman-Sande Idea

It can be shown that $F_n^T = F_n$ and so if

\[ F_n = A_t \cdots A_1 P_n^T \]

then

\[ F_n = F_n^T = P_n A_1^T \cdots A_t^T \]

and we can compute $y = F_n x$ as follows...

    y = x
    for k = t:−1:1
        y = A_k^T y
    end
    y = P_n y
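The transposed butterfly sends the pair (a, b) to (a + b, ω(a − b)), and the bit reversal moves to the end. A minimal Python sketch of this ordering follows, assuming n is a power of two; the function names are hypothetical.

```python
import cmath

def bit_reverse(k, t):
    """Reverse the t-bit binary representation of k."""
    r = 0
    for _ in range(t):
        r = 2 * r + (k % 2)
        k //= 2
    return r

def fft_gentleman_sande(x):
    """Apply A_t^T first, ..., A_1^T last, then the bit-reversal permutation."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    t = n.bit_length() - 1
    y = list(x)
    for q in range(t, 0, -1):                            # k = t:-1:1
        L = 2 ** q
        half = L // 2
        for start in range(0, n, L):
            for j in range(half):
                w = cmath.exp(-2j * cmath.pi * j / L)    # omega_L^j
                a = y[start + j]
                b = y[start + j + half]
                y[start + j] = a + b                     # top row of [I I; W -W]
                y[start + j + half] = w * (a - b)        # bottom row
    return [y[bit_reverse(k, t)] for k in range(n)]      # y = P_n y

def dft_naive(x):
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

x = [1.0, 2.0, 3.0, 4.0, -1.0, 0.0, 2.5, -3.0]
y = fft_gentleman_sande(x)
max_err = max(abs(a - b) for a, b in zip(y, dft_naive(x)))
print(max_err)
```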
Convolution and Other Apps

From “problem space” to “DFT space” via

    for k = t:−1:1
        x = A_k^T x
    end
    x = P_n x

Do your thing in DFT space. Then inverse transform back to problem space via

    x = P_n^T x
    for k = 1:t
        x = conj(A_k) x
    end
    x = x/n

(The inverse DFT matrix is conj(F_n)/n, so the return pass uses the complex conjugates of the butterfly factors.)

Can avoid the P_n ops by working in “scrambled” DFT space.
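The scrambled-space idea can be sketched for circular convolution: the forward pass skips the final bit reversal, the pointwise product is formed in scrambled order, and the inverse pass skips the initial bit reversal. This is an illustrative sketch assuming power-of-two lengths; the inverse uses conjugated twiddle factors and a final division by n, and all names are hypothetical.

```python
import cmath

def to_scrambled_dft(x):
    """Transposed butterflies only (no bit reversal): returns P_n F_n x."""
    y = list(x)
    n = len(y)
    L = n
    while L >= 2:
        half = L // 2
        for s in range(0, n, L):
            for j in range(half):
                w = cmath.exp(-2j * cmath.pi * j / L)
                a, b = y[s + j], y[s + j + half]
                y[s + j], y[s + j + half] = a + b, w * (a - b)
        L //= 2
    return y

def from_scrambled_dft(z):
    """Conjugated butterflies on bit-reversed input, then scale by 1/n."""
    y = list(z)
    n = len(y)
    L = 2
    while L <= n:
        half = L // 2
        for s in range(0, n, L):
            for j in range(half):
                w = cmath.exp(2j * cmath.pi * j / L)   # conjugate twiddle
                a, b = y[s + j], w * y[s + j + half]
                y[s + j], y[s + j + half] = a + b, a - b
        L *= 2
    return [v / n for v in y]

def circular_convolve(x, h):
    X = to_scrambled_dft(x)
    H = to_scrambled_dft(h)
    # Both spectra are scrambled the same way, so the pointwise product is valid.
    return from_scrambled_dft([a * b for a, b in zip(X, H)])

def circular_convolve_direct(x, h):
    n = len(x)
    return [sum(x[j] * h[(k - j) % n] for j in range(n)) for k in range(n)]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
h = [1.0, -1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 3.0]
fast = circular_convolve(x, h)
slow = circular_convolve_direct(x, h)
max_err = max(abs(a - b) for a, b in zip(fast, slow))
print(max_err)
```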

Radix-4

Can combine four quarter-length DFTs to produce a single full-length DFT:

\[
v =
\begin{bmatrix}
I & I & I & I \\
I & -iI & -I & iI \\
I & -I & I & -I \\
I & iI & -I & -iI
\end{bmatrix}
\begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix}
=
\begin{bmatrix}
(a + c) + (b + d) \\
(a - c) - i(b - d) \\
(a + c) - (b + d) \\
(a - c) + i(b - d)
\end{bmatrix}
\]

The radix-4 butterfly.

Better re-use of data.
Fewer flops. Radix-4 FFT is 4.25n log n (instead of 5n log n).
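The grouping on the right-hand side is where the savings come from: the four sums and differences are formed once and reused, and multiplication by ±i costs no real multiplications. A scalar sketch (each of a, b, c, d stands for one entry of the corresponding block) with a hypothetical helper name:

```python
def radix4_butterfly(a, b, c, d):
    """Return the four outputs of the radix-4 butterfly for scalars a, b, c, d."""
    s0 = a + c
    s1 = a - c
    s2 = b + d
    s3 = b - d
    return (s0 + s2, s1 - 1j * s3, s0 - s2, s1 + 1j * s3)

# Check against the explicit 4-by-4 matrix from the slide.
M = [[1, 1, 1, 1],
     [1, -1j, -1, 1j],
     [1, -1, 1, -1],
     [1, 1j, -1, -1j]]
vals = (1 + 2j, -0.5j, 3.0, 2 - 1j)
expected = [sum(m * v for m, v in zip(row, vals)) for row in M]
got = radix4_butterfly(*vals)
max_err = max(abs(g - e) for g, e in zip(got, expected))
print(max_err)
```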
Mixed Radix

[Figure: mixed-radix splitting tree. A 96-point DFT splits into four 24-point DFTs, and each 24-point DFT splits into three 8-point DFTs.]
Multiple DFTs

Given: n1-by-n2 matrix X.

Multicolumn DFT Problem...

X ← Fn1 X

Multirow DFT Problem...

X ← XFn2
Blocked Multiple DFTs

$X \leftarrow F_{n_1} X$ becomes

\[
[\, X_1 \mid X_2 \mid \cdots \mid X_p \,] \leftarrow [\, F_{n_1} X_1 \mid F_{n_1} X_2 \mid \cdots \mid F_{n_1} X_p \,]
\]
The 4-Step Framework

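A matrix reshaping of the $x \leftarrow F_n x$ operation when $n = n_1 n_2$:

$x_{n_1 \times n_2} \leftarrow x_{n_1 \times n_2} F_{n_2}$   (multiple row DFT)

$x_{n_1 \times n_2} \leftarrow F_n(0{:}n_1 - 1,\, 0{:}n_2 - 1) \mathbin{.*} x_{n_1 \times n_2}$   (pointwise multiply)

$x_{n_2 \times n_1} \leftarrow x^T_{n_1 \times n_2}$   (transpose)

$x_{n_2 \times n_1} \leftarrow x_{n_2 \times n_1} F_{n_1}$   (multiple row DFT)

Can be arranged so communication is concentrated in the transpose step.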
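The four steps can be traced in a short Python sketch. It assumes column-major reshaping (entry (a, b) of the n1-by-n2 array is x[a + n1*b]) and uses a naive DFT for the row transforms so that the structure stays visible; `four_step_dft` and `dft_naive` are illustrative names.

```python
import cmath

def dft_naive(x):
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def four_step_dft(x, n1, n2):
    n = n1 * n2
    assert len(x) == n
    # Reshape x as an n1-by-n2 array, column-major.
    X = [[x[a + n1 * b] for b in range(n2)] for a in range(n1)]
    # Step 1: multiple row DFT, X <- X F_{n2}.
    X = [dft_naive(row) for row in X]
    # Step 2: pointwise multiply by the leading n1-by-n2 block of F_n.
    X = [[cmath.exp(-2j * cmath.pi * a * k / n) * X[a][k] for k in range(n2)]
         for a in range(n1)]
    # Step 3: transpose to an n2-by-n1 array.
    Z = [[X[a][k] for a in range(n1)] for k in range(n2)]
    # Step 4: multiple row DFT, Z <- Z F_{n1}.
    Z = [dft_naive(row) for row in Z]
    # Read the n2-by-n1 array back out in column-major order.
    y = [0j] * n
    for k2 in range(n2):
        for k1 in range(n1):
            y[k2 + n2 * k1] = Z[k2][k1]
    return y

x = [float(v) for v in range(1, 13)]          # n = 12 = 3 * 4
y = four_step_dft(x, 3, 4)
max_err = max(abs(a - b) for a, b in zip(y, dft_naive(x)))
print(max_err)
```

In a distributed setting the row DFTs and the pointwise scaling would be local to each processor, leaving the transpose as the only step that moves data between them.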
Distributed Transpose: Example

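Initial:

\[
X = \begin{bmatrix}
X_{00} & X_{01} & X_{02} & X_{03} \\
X_{10} & X_{11} & X_{12} & X_{13} \\
X_{20} & X_{21} & X_{22} & X_{23} \\
X_{30} & X_{31} & X_{32} & X_{33}
\end{bmatrix}.
\]

Transpose each block:

\[
X \leftarrow \begin{bmatrix}
X_{00}^T & X_{01}^T & X_{02}^T & X_{03}^T \\
X_{10}^T & X_{11}^T & X_{12}^T & X_{13}^T \\
X_{20}^T & X_{21}^T & X_{22}^T & X_{23}^T \\
X_{30}^T & X_{31}^T & X_{32}^T & X_{33}^T
\end{bmatrix}.
\]

Now regard as 2-by-2 and block transpose each block:

\[
X \leftarrow \left[ \begin{array}{cc|cc}
X_{00}^T & X_{10}^T & X_{02}^T & X_{12}^T \\
X_{01}^T & X_{11}^T & X_{03}^T & X_{13}^T \\ \hline
X_{20}^T & X_{30}^T & X_{22}^T & X_{32}^T \\
X_{21}^T & X_{31}^T & X_{23}^T & X_{33}^T
\end{array} \right].
\]

Now do a 2-by-2 block transpose:

\[
X \leftarrow \left[ \begin{array}{cc|cc}
X_{00}^T & X_{10}^T & X_{20}^T & X_{30}^T \\
X_{01}^T & X_{11}^T & X_{21}^T & X_{31}^T \\ \hline
X_{02}^T & X_{12}^T & X_{22}^T & X_{32}^T \\
X_{03}^T & X_{13}^T & X_{23}^T & X_{33}^T
\end{array} \right].
\]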
Factorization and Transpose

\[ x_{n \times m} \leftarrow x^T_{m \times n} \]

corresponds to

\[ x \leftarrow P(m, n)\, x \]

where $P(m, n)$ is a perfect shuffle permutation, e.g.,

\[ P(3, 4) = I_{12}(:, [\,0\ 3\ 6\ 9\ 1\ 4\ 7\ 10\ 2\ 5\ 8\ 11\,]) \]

Different multi-pass transposition algorithms correspond to different factorizations of $P(m, n)$.
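The index vector in the P(3, 4) example has a simple closed form, sketched below. One reading, assuming column-major storage, is that gathering x through this vector transposes a 3-by-4 array; whether that gather corresponds to P(m, n) or to its transpose depends on the indexing convention, which the slide does not spell out. The name `shuffle_indices` is hypothetical.

```python
def shuffle_indices(m, n):
    """Index vector [0, m, 2m, ..., 1, 1+m, ...] of length m*n."""
    return [(i % n) * m + i // n for i in range(m * n)]

p = shuffle_indices(3, 4)
print(p)

# Gather through p: transpose of a 3-by-4 array stored column-major.
m, n = 3, 4
x = list(range(m * n))                       # X(a, b) = x[a + m*b]
y = [x[p[i]] for i in range(m * n)]          # Y is n-by-m, Y(b, a) = y[b + n*a]
is_transpose = all(y[b + n * a] == x[a + m * b]
                   for a in range(m) for b in range(n))
print(is_transpose)
```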
Two-Dimensional FFTs

If X is an n1-by-n2 matrix then its 2D DFT is

\[ X \leftarrow F_{n_1} X F_{n_2} \]

Option 1.

    X ← F_{n1} X
    X ← X F_{n2}

Option 2. Assume $n_1 = n_2$ and $F_{n_1} = A_t \cdots A_1$.

    for q = 1:t
        X ← A_q X A_q^T
    end

Intermingling the column and row butterfly computations can result in better locality.
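Option 1 can be sketched as column DFTs followed by row DFTs and compared against the defining double sum. The sketch uses a naive 1D DFT as the building block; an FFT would be substituted in practice, and the function names are illustrative.

```python
import cmath

def dft_naive(x):
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def dft2_row_column(X):
    """Option 1: X <- F_{n1} X (column DFTs), then X <- X F_{n2} (row DFTs)."""
    n1, n2 = len(X), len(X[0])
    cols = [dft_naive([X[a][b] for a in range(n1)]) for b in range(n2)]
    Y = [[cols[b][k1] for b in range(n2)] for k1 in range(n1)]   # F_{n1} X
    return [dft_naive(row) for row in Y]                         # (F_{n1} X) F_{n2}

def dft2_direct(X):
    """Defining double sum, used only as a reference."""
    n1, n2 = len(X), len(X[0])
    return [[sum(X[a][b]
                 * cmath.exp(-2j * cmath.pi * a * k1 / n1)
                 * cmath.exp(-2j * cmath.pi * b * k2 / n2)
                 for a in range(n1) for b in range(n2))
             for k2 in range(n2)] for k1 in range(n1)]

X = [[1.0, 2.0, 0.0, -1.0],
     [0.5, 3.0, 1.0, 2.0]]
Y = dft2_row_column(X)
R = dft2_direct(X)
max_err = max(abs(Y[i][j] - R[i][j]) for i in range(2) for j in range(4))
print(max_err)
```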
3-Dimensional DFTs

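Given $X(1{:}n_1, 1{:}n_2, 1{:}n_3)$, apply DFT in each of the three dimensions.

If

\[ x = \mathrm{reshape}(X(1{:}n_1, 1{:}n_2, 1{:}n_3),\ n_1 n_2 n_3,\ 1) \]

then the problem is to compute

\[ x \leftarrow (F_{n_3} \otimes F_{n_2} \otimes F_{n_1})\, x \]

i.e.,

\[ x \leftarrow (I_{n_3} \otimes I_{n_2} \otimes F_{n_1})\, x \]
\[ x \leftarrow (I_{n_3} \otimes F_{n_2} \otimes I_{n_1})\, x \]
\[ x \leftarrow (F_{n_3} \otimes I_{n_2} \otimes I_{n_1})\, x \]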
d-Dimensional DFTs

Sample for d = 5:

    µ    array to transform          DFT       transposition    array after transposition
    1    X(α1, α2, α3, α4, α5)       F_{n1}    Π^T_{n1,n}       X(α2, α3, α4, α5, α1)
    2    X(α2, α3, α4, α5, α1)       F_{n2}    Π^T_{n2,n}       X(α3, α4, α5, α1, α2)
    3    X(α3, α4, α5, α1, α2)       F_{n3}    Π^T_{n3,n}       X(α4, α5, α1, α2, α3)
    4    X(α4, α5, α1, α2, α3)       F_{n4}    Π^T_{n4,n}       X(α5, α1, α2, α3, α4)
    5    X(α5, α1, α2, α3, α4)       F_{n5}    Π^T_{n5,n}       X(α1, α2, α3, α4, α5)

Intermingling of component DFTs and tensor transpositions.

References

FFTW: http:[Link]

C. Van Loan (1992). Computational Frameworks for the Fast Fourier Transform, SIAM Publications, Philadelphia, PA.
