
4 Giantomassi GWR Abinit

The document describes the GW implementation in the ABINIT code. The conventional implementation is limited for large systems by its quartic scaling with the number of atoms. A new implementation called GWR works on the imaginary axis with supercells to achieve cubic scaling with the number of atoms and linear scaling with the number of k-points. GWR computes the self-energy on the imaginary axis using FFTs and stores G and W in memory. Self-consistent GW calculations can be performed in a single run.


Low-scaling GW in ABINIT

M. Giantomassi
Université Catholique de Louvain
Louvain-la-Neuve, Belgium

The GW implementation of ABINIT (quartic scaling version)

• Formalism in G- and ω-space (real-axis)
• Norm-conserving pseudos (recommended) and PAW
• Different approximations for the self-energy: HF, COHSEX, GW
• Different integration techniques for Σ:
- four different plasmon-pole models: FAST but APPROXIMATE
- contour-deformation (CD): ACCURATE but SLOW
- Analytic continuation + Padé
• Different levels of self-consistency: G0W0, energy-only, qp-GW, full GW
• MPI-algorithm with distributed wavefunctions
• OpenMP threads for low-level loops, BLAS and FFTs
• Steps are connected via files and input variables: getwfk or getwfk_filepath, getscr_filepath …
• gwcalctyp defines the approximation for Σ and the self-consistency mode
Limitations of the standard GW code
• Quartic scaling in natom
• Quadratic in the number of k-points in the BZ (if symmetries are not exploited)
• Memory for W(g, g′, q, ω) does not scale with MPI procs (big limitation when computing Σ)
• Most calculations are still performed at the G0W0 level with the plasmon-pole approximation
• Only two MPI levels (nband and nsppol). Decent scalability but not exascale-ready:
https://gitlab.pop-coe.eu/documents/reports/-/raw/master/POP2-AR-157-Abinit.pdf

GW in imaginary time with supercells
ABINIT GWR code (not yet released)
GW with supercells and imaginary-axis
PHYSICAL REVIEW B 90, 054115 (2014)

In brief:
• Work with the analytic continuation of Hedin’s equations on the imaginary axis: (t → iτ, ω → iω)
• Avoid convolutions by working in the most natural space e.g.:

- χ(r, R′, iτ) = G(r, R′, iτ) G*(R′, r, −iτ)

- W_k(g, g′, iω) = v_k(g, g′) ϵ_k^{−1}(g, g′, iω)

• Use FFTs to go from the R = r + L supercell to G = k + g and vice versa
• Sample the imaginary axis with minimax meshes {ω_k}, {τ_j} that minimize the maximum error in the MP2 energy for given N and R = Δ_max / Δ_gap
• Use inhomogeneous sine/cosine transforms for iω_k ↔ iτ_j, with precomputed weights γ_kj:

$$\chi_k(g, g', i\omega_k) = \sum_{j=1}^{N} \gamma_{kj} \cos(\omega_k \tau_j)\, \chi_k(g, g', i\tau_j)$$
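The cosine transform above is a small dense matrix product over the time index. The following is a minimal NumPy sketch, assuming the weights γ_kj are already available. The function name `cosine_transform` is hypothetical, and the Gauss-Laguerre nodes and weights are only a stand-in for the minimax meshes (which in GWR come from the GreenX library); they are used here because the even test function e^{−Δ|τ|} has the known transform 2Δ/(Δ² + ω²).

```python
import numpy as np

def cosine_transform(f_tau, tau_mesh, omega_mesh, weights):
    """Return F(i w_k) = sum_j weights[k, j] * cos(w_k * tau_j) * f_tau[j, ...].

    f_tau may carry extra trailing axes (e.g. the g, g' indices of chi).
    """
    kernel = weights * np.cos(np.outer(omega_mesh, tau_mesh))  # (n_omega, n_tau)
    return np.tensordot(kernel, f_tau, axes=(1, 0))

# Stand-in quadrature (NOT a minimax mesh): Gauss-Laguerre nodes x_j, weights w_j.
gap = 1.0
x, w = np.polynomial.laguerre.laggauss(20)
tau_mesh = x / gap
omega_mesh = np.array([0.0, 0.5, 1.0])

# Assumed weights: gamma_kj = 2 w_j exp(x_j) / gap, identical for every k.
# Real minimax weights depend on both k and j.
weights = np.tile(2.0 * w * np.exp(x) / gap, (omega_mesh.size, 1))

f_tau = np.exp(-gap * tau_mesh)           # f(tau) = exp(-gap |tau|) on tau > 0
f_omega = cosine_transform(f_tau, tau_mesh, omega_mesh, weights)
exact = 2.0 * gap / (gap**2 + omega_mesh**2)
```

In this toy setting `f_omega` should agree closely with `exact` as long as ω/Δ stays at or below one; the point of the minimax meshes is to reach comparable accuracy over a wide range of transition energies with far fewer points.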








GWR code in a nutshell
• optdriver 6 to activate the GWR driver
• gwr_task specifies the task to perform:
‣ “HDIAGO” for direct diagonalization with scalapack followed by WFK output
‣ “G0W0” for one-shot method
‣ "EGEW", "EGW0", “G0EW" for eigenvalue-only self-consistency
‣ “RPA_ENERGY” for Ec energy with automatic extrapolation for npweps → ∞

• External files required:


1) DEN file with GS density (required for all tasks)
2) WFK file with empty states (only for GW/RPA tasks)
• Scalapack required in all gwr_tasks
• G and W are computed in the same run and stored in memory (no getscr* variables)
• Self-consistent iterations are performed in the same dataset (no getqps variable)
• Automatic parallelization:
‣ HDIAGO uses MPI pools to distribute k/spins and Hgg′ matrix with scalapack
‣ The other gwr_tasks employ a 4D MPI grid (g/r, k-points, minimax mesh, spin)

GWR input variables
New input variables specific to GWR:
• gwr_task instead of gwcalctyp
• gwr_ntau: number of points in the minimax mesh (GreenX library)
• gwr_boxcutmin: defines the FFT mesh for G from ecut (crucial for performance and memory)
• gwr_nstep, gwr_tolqpe: stopping criteria for GW self-consistency
• gwr_np_kgts: to specify the MPI grid (optional)
• gwr_sigma_algo: 1 for supercell version, 2 for convolutions in BZ with symmetries
• gwr_max_hwtene: Max. transition energy included in the computation of the head/wings of χgg′(q → 0)
• ….

Variables in common with the legacy GW code:


• ecuteps: cutoff energy for χ, W
• ecutsigx: cutoff energy for Σx
• gw_qprange or (nkptgw, kptgw, bdgw) to define states in Σnk
• inclvkb: for the treatment of the q → 0 limit in χ
• gw_icutcoul, vcutgeo: treatment of q → 0 divergence and Coulomb cutoff for isolated systems

Input file for G0W0 with the GWR code
optdriver 6                  # Activate GWR code
gwr_task "G0W0"              # One-shot calculation

getden_filepath "GS_DEN"     # Read GS density
getwfk_filepath "GREEN_WFK"  # Read WFK file with empty states

nband 1000                   # Bands in Green’s function
gwr_ntau 8                   # Number of minimax points
gwr_boxcutmin 1.1            # Ratio between FFT box and G-sphere. Default: 1.

ecuteps 8.0                  # Cut-off energy for dielectric matrix.
ecutsigx 12.0                # Dimension of the G sum in Sigma_x

nkptgw 2                     # Number of k-points where GW corrections are computed
                             # Set it to 0 to automatically select the fundamental and the direct gap
kptgw                        # k-points in reduced coordinates
   0.0 0.0 0.0
   0.5 0.0 0.0
bdgw                         # Calculate GW corrections for bands from 4 to 5
   4 5
   4 5

Input file for energy-only self-consistent GW

optdriver 6                  # Activate GWR code
gwr_task "EGW0"              # Energy-only self-consistency in G.

getden_filepath "GS_DEN"     # Read GS density
getwfk_filepath "GREEN_WFK"  # Read WFK file with empty states

nband 1000                   # Bands in Green’s function
gwr_ntau 8                   # Number of minimax points
gwr_boxcutmin 1.1            # Ratio between FFT box and G-sphere. Default: 1.

ecuteps 8.0                  # Cut-off energy for dielectric matrix.
ecutsigx 12.0                # Dimension of the G sum in Sigma_x

gw_qprange +8                # Compute Σ for all occ states + 8 empty bands

gwr_nstep 4                  # Max number of iterations (default 50, so one might omit it)
gwr_tolqpe 0.02 eV           # Will stop if all abs differences between QP energies computed at two
                             # consecutive iteration steps are smaller than this value

Green’s function: real ω vs iτ space

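G in ω-space (real axis)

$$G(\mathbf{r}, \mathbf{r}', \omega) = \sum_n \frac{\psi_n(\mathbf{r})\,\psi_n^*(\mathbf{r}')}{\omega - \varepsilon_n + i\delta^{+}\,\mathrm{sign}(\varepsilon_n)}$$

• Branch cuts and poles → ω-integration is tricky
• Analytic expression for RPA χ̃(ω)
• Direct connection with QP energies and spectral function A(ω)

G in imaginary time iτ

$$G(\mathbf{r}, \mathbf{r}', i\tau) = \Theta(\tau)\,\overline{G}(\mathbf{r}, \mathbf{r}', i\tau) + \Theta(-\tau)\,\underline{G}(\mathbf{r}, \mathbf{r}', i\tau)$$

$$\overline{G}(\mathbf{r}, \mathbf{r}', i\tau) = -\sum_n^{\mathrm{unocc}} \psi_n(\mathbf{r})\,\psi_n^*(\mathbf{r}')\,e^{-\varepsilon_n \tau} \quad (\tau > 0)$$

$$\underline{G}(\mathbf{r}, \mathbf{r}', i\tau) = \sum_n^{\mathrm{occ}} \psi_n(\mathbf{r})\,\psi_n^*(\mathbf{r}')\,e^{-\varepsilon_n \tau} \quad (\tau < 0)$$

Both branches contain only bounded exponentials.

• Smooth behaviour in iτ/iω → integration is “easier”
• Requires iτ ⇒ iω transforms
• Requires analytic continuation to go back to the real-ω axis before computing QP energies and A(ω)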
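The two branches of the imaginary-time Green's function can be illustrated with a toy model. This is a sketch under stated assumptions, not the GWR implementation: `green_imag_time` is a hypothetical helper, energies are measured from a chemical potential `mu`, states with ε ≤ μ are treated as occupied, and τ = 0 is assigned to the occupied branch by convention.

```python
import numpy as np

def green_imag_time(eigvals, eigvecs, tau, mu=0.0):
    """Toy G(i tau) built from eigenpairs; eigvecs holds one state per column."""
    e = np.asarray(eigvals, dtype=float) - mu
    if tau > 0:
        sel, sign = e > 0, -1.0    # unoccupied states, overall minus sign
    else:
        sel, sign = e <= 0, +1.0   # occupied states
    phi = np.asarray(eigvecs)[:, sel]
    decay = np.exp(-e[sel] * tau)  # never larger than 1 in either branch
    return sign * (phi * decay) @ phi.conj().T

# Two-level example: one occupied state at -1, one empty state at +2.
eigvals = np.array([-1.0, 2.0])
eigvecs = np.eye(2)
g_plus = green_imag_time(eigvals, eigvecs, tau=0.5)
g_minus = green_imag_time(eigvals, eigvecs, tau=-0.5)
```

Because ε_n τ is non-negative in both branches, every exponential decays, which is why the imaginary-time representation is smooth and easy to sample.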









Plane-wave expansion of two-point functions
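• Infinite system simulated with Born-von-Karman (BvK) periodic boundary conditions i.e. (N1, N2, N3) supercell of volume V = NΩ with N = N1N2N3 and Ω the unit cell volume

• G, χ̃, W are defined in the BvK supercell

• G, χ̃, W are invariant if we translate both r1 and r2 by R i.e.:

G(r1, r2) = G(r1 + R, r2 + R)

• This implies the Fourier expansion:

$$f(\mathbf{r}_1, \mathbf{r}_2) = \frac{1}{V} \sum_{\mathbf{q}} \sum_{\mathbf{G}_1 \mathbf{G}_2} e^{i(\mathbf{q}+\mathbf{G}_1)\cdot\mathbf{r}_1}\, f_{\mathbf{G}_1\mathbf{G}_2}(\mathbf{q})\, e^{-i(\mathbf{q}+\mathbf{G}_2)\cdot\mathbf{r}_2}$$

$$f_{\mathbf{G}_1\mathbf{G}_2}(\mathbf{q}) = \frac{1}{V} \iint_V e^{-i(\mathbf{q}+\mathbf{G}_1)\cdot\mathbf{r}_1}\, f(\mathbf{r}_1, \mathbf{r}_2)\, e^{i(\mathbf{q}+\mathbf{G}_2)\cdot\mathbf{r}_2}\, d\mathbf{r}_1\, d\mathbf{r}_2$$

where the q-points belong to the BZ mesh dual to the BvK supercell: (1/N1, 1/N2, 1/N3)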
GWR algorithm
P. Liu et al. PhysRevB. 94 165109 (2016)
WFK generation with direct diagonalization
optdriver 6                  # enter GWR code
gwr_task "HDIAGO"            # direct diago
getden_filepath "GS_DEN"     # read GS density to build H
nband 1200                   # occ + empty states

Scalapack diago vs iterative eigensolvers:

• Iterative solvers are efficient provided nband << npw
• High-energy states are difficult to converge with iterative methods
• Direct diago. easily outperforms iterative solvers (e.g. lobpcg) if many bands are needed
• In ZnO, for instance, ~3000 bands are needed to converge…

ZnO with 4 nodes on Lumi (512 cores, 2 Gb per core)
Parameters: ecut 40.0, mpw 3909, ngkpt 8 8 5, nbdbuf 10% nband, tolwfr 1.0d-18

wall-time (s)
nband    slk_diago    lobpcg
1000     11           105
2000     21           306
3000     35           FAIL

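Spatial symmetries in GW

• KS wavefunctions in the full BZ can be reconstructed from the IBZ. With ℛ the rotation matrix and t the fractional translation:

$$[H, \{\mathcal{R}, \mathbf{t}\}] = 0 \quad\Rightarrow\quad \epsilon_{\mathcal{R}\mathbf{k}} = \epsilon_{\mathbf{k}}$$

$$u_{\mathcal{R}\mathbf{k}}(\mathbf{r}) = e^{-i\mathcal{R}\mathbf{k}\cdot\mathbf{t}}\, u_{\mathbf{k}}(\mathcal{R}^{-1}(\mathbf{r} - \mathbf{t}))$$

$$u_{\mathcal{R}\mathbf{k}}(\mathbf{G}) = e^{-i(\mathcal{R}\mathbf{k}+\mathbf{G})\cdot\mathbf{t}}\, u_{\mathbf{k}}(\mathcal{R}^{-1}\mathbf{G})$$

• Spatial symmetry for the polarizability:

$$\chi^0(\mathbf{r}_1, \mathbf{r}_2) = \chi^0(\mathcal{R}^{-1}(\mathbf{r}_1 - \mathbf{t}), \mathcal{R}^{-1}(\mathbf{r}_2 - \mathbf{t}))$$

$$\chi^0_{\mathbf{G}_1\mathbf{G}_2}(\mathcal{R}\mathbf{q}) = e^{i\mathbf{t}\cdot(\mathbf{G}_2 - \mathbf{G}_1)}\, \chi^0_{\mathcal{R}^{-1}\mathbf{G}_1\, \mathcal{R}^{-1}\mathbf{G}_2}(\mathbf{q})$$

Take-home message:
‣ Bloch states are computed in the IBZ and then reconstructed in the BZ at runtime
‣ G(k) and χ⁰(q) are computed and stored only for k/q in the IBZ
‣ BZ integrals depending on an external q can be restricted to the IBZq defined by the little-group of q
‣ Significant speedup and memory saving in high-symmetry systems. Time-reversal can be easily included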
MPI distribution of G, χ, W in GWR
• 4D MPI grid to distribute memory and operations over G_k^σ(g, g′, ±iτ), stored as PBLAS matrices:

- collinear spins inside spin_comm (trivial algo.)
- IBZ k-points inside kpt_comm
- g′ components inside g_comm
- iτ/iω points inside tau_comm (almost trivial algo.)

• spin_comm and tau_comm levels are very efficient (few MPI communications)
• kpt_comm and g_comm are network intensive but crucial to keep memory at bay
• To go to the supercell, indeed, we need to pre-compute and store in memory, for each k ∈ BZ:

$$G_k(\mathbf{r}, g') = \sum_{g} e^{i(\mathbf{k}+\mathbf{g})\cdot\mathbf{r}}\, G_k(g, g'), \qquad \text{memory} \propto \frac{N_{BZ}}{np_k} \times \frac{n_{fft}}{np_g} \times npw$$

• For optimal performance, the number of MPI procs should be a multiple of gwr_ntau x nsppol but mind the memory for G!
• Matrices are stored in single precision by default (--enable-gw-dpc="yes" to use double precision)
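The memory estimate above can be turned into a quick back-of-the-envelope check before choosing the MPI grid. This is only a sketch: the slide gives a proportionality, so the prefactor used here (one complex matrix per k-point, 8 bytes per element in single precision and 16 in double) is an assumption, and `green_workspace_bytes` is a hypothetical helper, not part of ABINIT.

```python
import math

def green_workspace_bytes(nbz, nfft, npw, np_k, np_g, double_precision=False):
    """Rough per-process memory for G_k(r, g') for one tau point and one spin.

    Implements (N_BZ / np_k) x (n_fft / np_g) x npw, rounding the local
    shares up, times the assumed size of one complex number.
    """
    bytes_per_complex = 16 if double_precision else 8
    local_k = math.ceil(nbz / np_k)     # k-points held by this process
    local_r = math.ceil(nfft / np_g)    # FFT points held by this process
    return local_k * local_r * npw * bytes_per_complex

# Example with made-up sizes: 64 k-points, 1000 FFT points, 100 plane waves.
single = green_workspace_bytes(nbz=64, nfft=1000, npw=100, np_k=4, np_g=10)
double = green_workspace_bytes(nbz=64, nfft=1000, npw=100, np_k=4, np_g=10,
                               double_precision=True)
```

The estimate makes the trade-off explicit: increasing np_k or np_g reduces the local workspace, at the price of more communication inside kpt_comm and g_comm.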



From G to χ in iτ space (step 1)
NB: the loop over τ-points is external. At each iteration, we have to consider ±τ_i (not always shown in the equations).

For each k in the BZ, do:

➡ Use symmetries to build G_k in the BZ:

$$G_{k_{IBZ}}(g, g') \xrightarrow{\ \mathrm{symm}\ } G_k(g, g')$$

➡ FFT along the g index (r local to each MPI proc, g′ MPI-distributed inside g_comm):

$$G_k(\mathbf{r}, g') = \sum_{g} e^{i(\mathbf{k}+\mathbf{g})\cdot\mathbf{r}}\, G_k(g, g')$$

➡ MPI-transpose (PTRANS) to have g′ local on each proc (g′ local, r MPI-distributed):

$$G_k(\mathbf{r}, g') \rightarrow \tilde{G}_k(g', \mathbf{r})$$

Cons:
‣ Workspace memory ∝ (N_BZ / np_k) × (n_fft / np_g) × npw
‣ Lots of calls to PTRANS: 2 N_BZ / np_k
‣ Memory increases with N_r (ecut and gwr_boxcutmin)

Pros:
‣ Linear scaling in N_BZ
‣ Scales well with np_k (fewer PTRANS calls)
From G to χ in iτ space (step 2)
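Step 1. For each r in the unit cell, use G̃ to compute:

$$G(\mathbf{r}, \mathbf{R}') = \sum_{\mathbf{k} g'} G(\mathbf{r}, \mathbf{k}+\mathbf{g}')\, e^{-i(\mathbf{k}+\mathbf{g}')\cdot\mathbf{R}'}$$

Need all k in the BZ for the FFT! k-parallelism is really low-level!

$$\chi(\mathbf{r}, \mathbf{R}', i\tau) = G(\mathbf{r}, \mathbf{R}', i\tau)\, G^*(\mathbf{R}', \mathbf{r}, -i\tau)$$

Only χ_r(R′) is stored at fixed r.

$$\chi(\mathbf{r}, \mathbf{G}') = \sum_{\mathbf{R}' \in S} \chi(\mathbf{r}, \mathbf{R}')\, e^{i\mathbf{G}'\cdot\mathbf{R}'}$$

Transform immediately to G′-space (k + g′) and store results in temp. PBLAS matrix χ̃.

Step 2. Once all r have been computed, MPI-transpose χ and perform FFT along the r-axis:

$$\chi_k(g, g') = \sum_{\mathbf{r} \in C} e^{-i(\mathbf{k}+\mathbf{g})\cdot\mathbf{r}}\, \chi(\mathbf{r}, \mathbf{k}+\mathbf{g}')$$

Only k-points in the IBZ are stored. Matrices are PBLAS-distributed.

Cons:
‣ k-parallelism requires nfft communications
‣ We lose part of the speedup gained in step 1

Pros:
‣ Tons of FFTs in batch mode (blocking over r)
‣ Ideal scenario for OpenMP/GPUs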

















Computing W from χ
Step 1. Cosine transform (iτ → iω). Requires communication inside tau_comm:

$$\chi_k(g, g', i\omega_k) = \sum_{j=1}^{N} \gamma_{kj} \cos(\omega_k \tau_j)\, \chi_k(g, g', i\tau_j)$$

Step 2. Compute symmetrized dielectric matrix (vcutgeo selects the expression for v):

$$\epsilon_k(g, g', i\omega) = \delta_{gg'} - v_k(g, g')\, \chi_k(g, g', i\omega), \qquad v_k(g, g') = \frac{4\pi}{|\mathbf{k}+\mathbf{g}|\,|\mathbf{k}+\mathbf{g}'|}$$

Step 3. Compute correlated screened Coulomb interaction W̃. Matrix inversion with Scalapack/ELPA:

$$W_k(g, g', i\omega) = v_k(g, g')\, \epsilon_k^{-1}(g, g', i\omega), \qquad \tilde{W}_k(g, g', i\omega) = W_k(g, g', i\omega) - v_k(g, g')$$

Step 4. Inverse cosine transform (iω → iτ). Requires communication inside tau_comm:

$$\tilde{W}_k(g, g', i\tau_k) = \sum_{j=1}^{N} \xi_{kj} \cos(\tau_k \omega_j)\, \tilde{W}_k(g, g', i\omega_j)$$
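Steps 2 and 3 amount to a few dense matrix operations for each k-point and imaginary frequency. The sketch below uses NumPy in place of Scalapack/ELPA and makes several assumptions that are not stated on the slide: Hartree atomic units with a 4π numerator in v_k, element-wise products between v_k and the matrices, |k + g| ≠ 0 for every g (the q → 0 case needs the special treatment of heads and wings), and a bare Coulomb term that is subtracted only on the diagonal, i.e. W̃ = v_k ⊙ (ε⁻¹ − 1). The function name `screened_coulomb` is hypothetical.

```python
import numpy as np

def screened_coulomb(chi, kpg_norms):
    """Return (eps, W, W_corr) for one k-point and one imaginary frequency.

    chi       : (npw, npw) irreducible polarizability chi_k(g, g', i w)
    kpg_norms : (npw,) array with |k + g|, all non-zero
    """
    kpg_norms = np.asarray(kpg_norms, dtype=float)
    identity = np.eye(kpg_norms.size)
    vk = 4.0 * np.pi / np.outer(kpg_norms, kpg_norms)   # v_k(g, g')
    eps = identity - vk * chi                            # symmetrized dielectric matrix
    eps_inv = np.linalg.inv(eps)                         # dense inversion
    w = vk * eps_inv                                     # screened interaction
    w_corr = vk * (eps_inv - identity)                   # correlation part, vanishes if chi = 0
    return eps, w, w_corr

# One-plane-wave example with |k + g| = 1 and chi = -0.1.
eps1, w1, wc1 = screened_coulomb(np.array([[-0.1]]), [1.0])

# Without polarization the screening disappears.
eps0, w0, wc0 = screened_coulomb(np.zeros((3, 3)), [1.0, 2.0, 3.0])
```

Subtracting the bare term before transforming back to imaginary time keeps W̃ smooth and decaying, so that it can be sampled on the same minimax mesh; the exchange part is then added separately.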
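Computing Σnq(ω)
Step 1. FFTs in the unit cell:

$$G_k(g, g', i\tau) \xRightarrow{\ \mathrm{FFT}\ } G_k(\mathbf{r}, g', i\tau), \qquad \tilde{W}_k(g, g', i\tau) \xRightarrow{\ \mathrm{FFT}\ } \tilde{W}_k(\mathbf{r}, g', i\tau)$$

Step 2. For each r in C do:

$$G(\mathbf{r}, \mathbf{R}', i\tau) = \sum_{\mathbf{k} g'} G(\mathbf{r}, \mathbf{k}+\mathbf{g}', i\tau)\, e^{-i(\mathbf{k}+\mathbf{g}')\cdot\mathbf{R}'}$$

$$W(\mathbf{r}, \mathbf{R}', i\tau) = \sum_{\mathbf{k} g'} W(\mathbf{r}, \mathbf{k}+\mathbf{g}', i\tau)\, e^{-i(\mathbf{k}+\mathbf{g}')\cdot\mathbf{R}'}$$

$$\Sigma(\mathbf{r}, \mathbf{R}', i\tau) = -G(\mathbf{r}, \mathbf{R}', i\tau)\, W(\mathbf{r}, \mathbf{R}', i\tau)$$

Avoid storing the full Σ(r, R′) in memory: compute the partial contribution to Σnq and accumulate:

$$\Sigma_{nq}(i\tau) = \Sigma_{nq}(i\tau) + \sum_{\mathbf{R}' \in S} \psi^*_{nq}(\mathbf{r})\, \Sigma(\mathbf{r}, \mathbf{R}', i\tau)\, \psi_{nq}(\mathbf{R}')$$

Step 3. Sine/cosine transforms (CT+ST) to go to iω space, followed by analytic continuation (AC) to the real-ω axis:

$$\Sigma_{nq}(i\tau) = \Sigma^{C}_{nq}(i\tau) + \Sigma^{S}_{nq}(i\tau) \xRightarrow{\ \mathrm{CT+ST}\ } \Sigma_{nq}(i\omega) \xRightarrow{\ \mathrm{AC}\ } \Sigma_{nq}(\omega)$$

Step 4. Add exchange part (sum over occ states directly). Finally, solve the linearized QP equation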


















Validation: χ(ω) with GWR and Adler-Wiser
• Silicon with 4x4x4 Γ-centered k-mesh
• gwr_ntau = 12
• nband = 100 and inclvkb 2 to compute head and wings
QP direct gaps with GWR and quartic GW
• 4x4x4 Γ-centered k-mesh
• nband = 100 × nocc, ecuteps = 14 Ha
• gwr_ntau = 20 in GWR, nfreqre = 50, freqremax=1.5 Ha, nfreqim 10 for CD

• Overall, good agreement. CD is our reference


• Largest difference between GWR and quartic code for LiF at Γ (~0.2 eV)
• In GaAs, GWR and CD agree with each other, PPM overestimates CD/GWR by ~0.2 eV
Is GWR faster than the legacy code?
Well, it depends:
‣ In small symmetric systems, the quartic code is still competitive but W is not MPI-distributed!
‣ GWR is superior if:
- low-symmetry systems with dense k-meshes
- large ecuteps or nband
- G0W0 without PPM
- off-diagonal matrix elements of Σ are needed for self-consistency

[Figure: wall-time spent in the GWR routines for nband 1000]

Benchmark results for ZnO:
‣ 8 nodes on Lumi, 2 Gb per core
‣ ecut 40.0
‣ ecuteps 12
‣ ngkpt 8 8 5
‣ nomega/gwr_ntau = 12
‣ npτ = 2 in GWR

wall-time (s)
nband    Quartic GW    GWR
1000     3023          1947
2000     MEM_FAIL      2145
3000     MEM_FAIL      2432
• Most of the wall-time spent to build χ and Σ in the supercell (build_chi and build_sigmac)
• Σ is as expensive as χ, unlike the quartic-code
Scaling of GWR algo. with the k-mesh size
‣ Linear scaling with the BZ size but computing Σ in the SC is more expensive than χ (cpu and memory)
‣ If one-shot QPs are needed only at the CBM/VBM, convolution + symmetries for Σnk is faster:

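$$\Sigma(\mathbf{r}, \mathbf{R}) \approx G(\mathbf{r}, \mathbf{R})\, W(\mathbf{r}, \mathbf{R})$$

self-energy in the supercell

$$\Sigma_k(\mathbf{r}, \mathbf{r}') \approx \sum_{q}^{\mathcal{L}_k} G_{k+q}(\mathbf{r}, \mathbf{r}')\, W_q(\mathbf{r}, \mathbf{r}')$$

self-energy at k via convolutions and symmetries (gwr_sigma_algo = 2)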

‣ Computing Σ in the supercell is the recommended approach if one needs Σnk for all k in the IBZ, e.g.:
- band structure interpolation of G0W0 results
- self-consistency (requires off-diagonal matrix elements for which symmetries are not easy to exploit)



Pros and cons of GWR code
Pros:
‣ Cubic scaling in natom
‣ Linear scaling with Nk in the full BZ
‣ Fast convergence with minimax mesh (~20 points)
‣ GW beyond PPA: Σ(ω) and A(ω) at reasonable cost
‣ Computing off-diagonal Σ^k_mn for all k-points in the IBZ is not as expensive as in legacy code

Cons:
‣ Symmetries are more difficult to exploit, especially in the supercell
‣ Requires Padé to go back to the real axis: Σ(iω) → Σ(ω)
‣ Much more memory-demanding than conventional GW algorithm
‣ Requires different MPI levels and PBLAS distribution of G, χ, W to make memory scale
‣ Needs precomputed minimax meshes (solved thanks to Green-X library)
Supplemental material
Why do we need GW?

• LDA/GGA systematically underestimate exp. band gaps

• Hedin’s equations provide a rigorous approach to study excitation energies and band gaps

• Hedin’s equations are hard to solve thus we usually employ a simplified version e.g. GW in the one-shot version (G0W0)

• G0W0 improves band gaps and dispersions by including screening and exchange effects beyond Kohn-Sham (KS) theory

• G0W0 still undershoots gaps: lot of discussions on starting point, vertex-corrections, self-consistency, e-ph interaction, etc.

• For accurate optical properties, we need to go beyond GW and include e-h interaction via e.g. the Bethe-Salpeter equation (BSE)

[Figure: calculated gap (eV) vs experimental gap (eV) for a set of semiconductors and insulators, LDA and GW(LDA)]
[adapted from van Schilfgaarde et al., PRL 96,
GW equations in real-space and time

Hedin’s equations          The GW approximation

(1) ≡ (r1, t1)

δ(12) = δ(r1 − r2) δ(t1 − t2)

• The GW equations are much “easier” to solve as we completely bypass the vertex equation
• Several technical aspects to be considered:
- representation: r-space vs G-space, frequency-space vs (imaginary) time, etc.
- basis set expansion, integration techniques
- self-consistency: one-shot, partial/full GW consistency or effective QP Hamiltonian
Spatial dependence: unit cell and BvK supercell
Unit cell C with lattice ℒ_C and BZ C*
• k: wave-vector in C*
• g: vector of the reciprocal lattice ℒ*_C

Supercell S with lattice ℒ_S and BZ S*
• Any G of the (dense) reciprocal lattice ℒ*_S can be expressed as G = k + g

PHYSICAL REVIEW B 90, 054115 (2014)

‣ Fourier series for functions fulfilling BvK conditions:

$$F(\mathbf{r}, \mathbf{r}') = \sum_{\mathbf{G}\mathbf{G}'} e^{i\mathbf{G}\cdot\mathbf{r}}\, F(\mathbf{G}, \mathbf{G}')\, e^{-i\mathbf{G}'\cdot\mathbf{r}'}$$

‣ If F(r, r′) = F(r + a, r′ + a) ∀ a ∈ ℒ_C then F(k + g, k′ + g′) = δ_kk′ F_k(g, g′) (block diagonal matrix), so that:

$$F(\mathbf{r}, \mathbf{r}') = \sum_{\mathbf{k}} \sum_{gg'} e^{i(\mathbf{k}+\mathbf{g})\cdot\mathbf{r}}\, F_k(g, g')\, e^{-i(\mathbf{k}+\mathbf{g}')\cdot\mathbf{r}'}$$

Much more efficient than a brute force 6D FFT in (G, G′) space
Exchange part
• Fock operator in real-space:

$$\Sigma_x(\mathbf{r}_1, \mathbf{r}_2) = -\sum_{\mathbf{k}}^{BZ} \sum_{\nu}^{occ} \Psi_{\nu\mathbf{k}}(\mathbf{r}_1)\, \Psi^*_{\nu\mathbf{k}}(\mathbf{r}_2)\, v(\mathbf{r}_1, \mathbf{r}_2)$$

• Σ matrix elements in G-space:

• Relatively fast (only occupied states involved)
• Need ecutsigx >> ecut due to long range bare Coulomb interaction
• Slow convergence with q-mesh due to singularity for |q| → 0. Different techniques available:
- Auxiliary function integration for 3D systems (Gygi1986 or Carrier2007)
- Truncated Coulomb interaction (sphere, cylinder, surface geometry, …)
- Monte Carlo integration of 1/vc(q + G) in the BZ microzone
gw_icutcoul

• gw_icutcoul selects the technique to integrate the Coulomb divergence

• Spherical integration (gw_icutcoul 3)
- approximate the mini-box around Γ with a sphere
- perform the integration analytically (gw_icutcoul 3)

• Default is the auxiliary function by [Carrier2007] (gw_icutcoul 6)
- We add and remove a function f(q) with the same asymptotic behaviour as v(q) for q → 0
- The integral over the BZ of f(q) has an analytical expression
- The integrand Σx(q, …) − f(q) is smooth and can be integrated with a coarse q-mesh
- Better than the spherical integration but can be improved…

• Note that the Monte-Carlo (MC) approach (gw_icutcoul 16) converges faster than the auxiliary function integration technique [Rangel2020]

[Figure: Σx (eV) for bands 1, 4 and 6 vs number of q-points in the full BZ (log scale), comparing spherical integration, Carrier's auxiliary function and spherical cutoff in vc(r)]
Long wavelength limit
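• In semiconductors, the head and the wings of χ̃_{G,G′} go to zero for |q| → 0:

$$\lim_{q\to 0} \tilde\chi_{00}(\mathbf{q}, \omega) = 0 \ \text{(head)}, \qquad \lim_{q\to 0} \tilde\chi_{\mathbf{G}0}(\mathbf{q}, \omega) = 0, \quad \lim_{q\to 0} \tilde\chi_{0\mathbf{G}}(\mathbf{q}, \omega) = 0 \ \text{(wings)}$$

• At the level of ε, this leads to a 0/0 form for |q| → 0
• The limit is finite but the value depends on the direction q̂

• To compute the limit we need (the position operator r is ill defined in periodic systems):

$$\langle b_1, \mathbf{k}-\mathbf{q} | e^{-i\mathbf{q}\cdot\mathbf{r}} | b_2, \mathbf{k} \rangle = \delta_{b_1 b_2} - i\mathbf{q}\cdot\langle b_1, \mathbf{k} | \mathbf{r} | b_2, \mathbf{k} \rangle + \mathcal{O}(q^2)$$

• The matrix elements of r are expressed in terms of the [H, r] commutator:

$$\langle b_1, \mathbf{k} | \mathbf{r} | b_2, \mathbf{k} \rangle = \frac{\langle b_1, \mathbf{k} | [H, \mathbf{r}] | b_2, \mathbf{k} \rangle}{\varepsilon_{b_1\mathbf{k}} - \varepsilon_{b_2\mathbf{k}}} \quad \text{for } b_1 \neq b_2$$

with H = T + V_Hxc(r) + V_loc(r) + V_NL(r, r′), where V_loc + V_NL is the pseudopotential part

• The final expression reads (the [V_NL, r] term is CPU demanding):

$$\langle b_1, \mathbf{k}-\mathbf{q} | e^{-i\mathbf{q}\cdot\mathbf{r}} | b_2, \mathbf{k} \rangle \underset{q\to 0}{\approx} -i\mathbf{q}\cdot\frac{\langle b_1, \mathbf{k} | \nabla - [V_{NL}, \mathbf{r}] | b_2, \mathbf{k} \rangle}{\varepsilon_{b_2\mathbf{k}} - \varepsilon_{b_1\mathbf{k}}} \quad \text{for } b_1 \neq b_2$$

Take-home message:
• The commutator [V_NL, r] is important for optical properties or for GW calculations in large cells. Less critical for GW in bulk systems
• This term is included by default (inclvkb 2). Use 0 to ignore it
• Heads and wings converge fast with nband and slowly with the number of k-points
• Randomly shifted k-meshes are usually used to converge the macroscopic dielectric function with/without local-field (LF) effects:

$$\varepsilon_M^{LF}(\omega) = \lim_{q\to 0} \frac{1}{\varepsilon^{-1}_{00}(\mathbf{q}, \omega)}, \qquad \varepsilon_M^{NLF}(\omega) = \lim_{q\to 0} \varepsilon_{00}(\mathbf{q}, \omega)$$

• In GW, one usually uses the same high-symmetry k-mesh both in screening and sigma to reduce nkpt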
GW method in Fourier space
in a nutshell
Plane-wave expansion of Bloch orbitals

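• For periodic systems with lattice vectors R, Bloch's theorem states:

$$\psi_{n\mathbf{k}}(\mathbf{r}) = e^{i\mathbf{k}\cdot\mathbf{r}}\, u_{n\mathbf{k}}(\mathbf{r}) \quad \text{with} \quad u_{n\mathbf{k}}(\mathbf{r} + \mathbf{R}) = u_{n\mathbf{k}}(\mathbf{r})$$

• If we define the reciprocal lattice with lattice vectors G, such that:

$$e^{i\mathbf{G}\cdot\mathbf{R}} = 1$$

the periodic part of the Bloch function can be written:

$$u_{n\mathbf{k}}(\mathbf{r}) = \sum_{\mathbf{G}} u_{n\mathbf{k}}(\mathbf{G})\, e^{i\mathbf{G}\cdot\mathbf{r}}$$

where the coefficients u_nk(G) are obtained by a Fourier transform over the unit cell:

$$u_{n\mathbf{k}}(\mathbf{G}) = \frac{1}{\Omega} \int_{\Omega} u_{n\mathbf{k}}(\mathbf{r})\, e^{-i\mathbf{G}\cdot\mathbf{r}}\, d\mathbf{r}$$

• The basis set is truncated such that |k + G|²/2 < Ecut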
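The cutoff condition fixes the number of plane waves at each k-point, which in turn controls the size of every matrix in the GW workflow. A minimal sketch of the counting, assuming Hartree atomic units and reciprocal lattice vectors given as rows in Bohr⁻¹ (already including the 2π factor); `count_plane_waves` is a hypothetical helper, not an ABINIT routine.

```python
import numpy as np

def count_plane_waves(recip_lattice, kpt_frac, ecut):
    """Count G vectors with |k + G|^2 / 2 < ecut (ecut in Hartree)."""
    b = np.asarray(recip_lattice, dtype=float)      # rows b_1, b_2, b_3
    kpt_frac = np.asarray(kpt_frac, dtype=float)
    a = 2.0 * np.pi * np.linalg.inv(b).T            # real-space vectors, a_i . b_j = 2 pi delta_ij

    # |G| <= sqrt(2 ecut) + |k| and |n_i| <= |a_i| |G| / (2 pi) bound the search box.
    gmax = np.sqrt(2.0 * ecut) + np.linalg.norm(kpt_frac @ b)
    nmax = np.ceil(gmax * np.linalg.norm(a, axis=1) / (2.0 * np.pi)).astype(int)

    ranges = [np.arange(-n, n + 1) for n in nmax]
    n1, n2, n3 = np.meshgrid(*ranges, indexing="ij")
    ints = np.stack([n1.ravel(), n2.ravel(), n3.ravel()], axis=1)

    kpg = (ints + kpt_frac) @ b                     # Cartesian k + G
    kinetic = 0.5 * np.sum(kpg**2, axis=1)
    return int(np.count_nonzero(kinetic < ecut))

# Simple cubic cell with reciprocal vectors of unit length, at Gamma.
npw_small = count_plane_waves(np.eye(3), [0.0, 0.0, 0.0], ecut=0.6)
npw_large = count_plane_waves(np.eye(3), [0.0, 0.0, 0.0], ecut=1.1)
```

For this toy lattice the first call keeps G = 0 plus the six shortest vectors, and the second also includes the twelve vectors with |G|² = 2; in a real calculation the count grows as Ecut^{3/2}.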
RPA polarizability in the ω-domain
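• Use the Lehmann representation of the time-ordered G:

$$G(\mathbf{r}_1, \mathbf{r}_2; \omega) = \sum_i \frac{\Psi_i(\mathbf{r}_1)\, \Psi_i^{\dagger}(\mathbf{r}_2)}{\omega - \epsilon_i + i\eta\, \mathrm{sign}(\epsilon_i - \mu)}, \qquad \eta \to 0^+$$

to evaluate the frequency convolution:

$$\tilde\chi(\mathbf{r}_1, \mathbf{r}_2, \omega) = \frac{-i}{2\pi} \int G(\mathbf{r}_1, \mathbf{r}_2, \omega - \omega')\, G(\mathbf{r}_2, \mathbf{r}_1, \omega)\, e^{-i\eta\omega'}\, d\omega'$$

and obtain the Adler-Wiser expression (sum over transitions):

$$\tilde\chi^0(\mathbf{r}_1, \mathbf{r}_2, \omega) = \sum_{n\mathbf{k}, m\mathbf{k}'} (f_{n\mathbf{k}} - f_{m\mathbf{k}'})\, \frac{\psi_{n\mathbf{k}}(\mathbf{r}_1)\, \psi^*_{m\mathbf{k}'}(\mathbf{r}_1)\, \psi^*_{n\mathbf{k}}(\mathbf{r}_2)\, \psi_{m\mathbf{k}'}(\mathbf{r}_2)}{\omega - \varepsilon_{n\mathbf{k}} + \varepsilon_{m\mathbf{k}'} - i\eta\, \mathrm{sgn}(\varepsilon_{n\mathbf{k}} - \varepsilon_{m\mathbf{k}'})}$$

• Branch cuts → ω-integration is tricky
• Certain ω-integrals can be computed analytically
• Direct connection with observables

• Finally, we transform to reciprocal-space …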











RPA polarizability in the G-ω domain

with the oscillator matrix element M given by:

computed with FFT

• Main bottleneck of our GW workflow
• Slow convergence wrt nband (NB: nband and ecuteps are coupled)
• In principle N_BZ² scaling but, thanks to spatial and TR symmetries, it is somewhere between N_IBZ² and N_IBZ N_BZ
• Number of ω-points: 2 if PPM (fast algo as data fits into cache), ~50-100 if CD (slow due to main memory access)
GW self-energy in ω-domain
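• In ω-space, the GW self-energy is given by the convolution:

$$\Sigma(\mathbf{r}_1, \mathbf{r}_2; \omega) = \frac{i}{2\pi} \int G(\mathbf{r}_1, \mathbf{r}_2; \omega + \omega')\, W(\mathbf{r}_1, \mathbf{r}_2; \omega')\, e^{i\omega'\delta^+}\, d\omega'$$

with W_k(g, g′, ω′) = v_k(g, g′) ϵ_k^{−1}(g, g′, ω′)

• Using W = v + (ε^{−1} − 1)v, we rewrite Σ as exchange (x) + correlation (c):

$$\Sigma(\mathbf{r}_1, \mathbf{r}_2; \omega) \equiv \Sigma_x(\mathbf{r}_1, \mathbf{r}_2) + \Sigma_c(\mathbf{r}_1, \mathbf{r}_2; \omega)$$

Σx is long range, Σc is short range

• ABINIT computes Σ^k_nm in the KS basis. G0W0 energies are then obtained via the linearized QP equation:

$$\epsilon^{QP} = \epsilon^{KS} + Z\, \langle \Psi^{KS} | \Sigma(\epsilon^{KS}) - v_{xc} | \Psi^{KS} \rangle$$

$$Z \equiv \left[ 1 - \langle \Psi^{KS} | \frac{\partial \Sigma(\epsilon)}{\partial \epsilon^{KS}} | \Psi^{KS} \rangle \right]^{-1}$$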
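The linearized QP equation is a one-line formula once the diagonal matrix elements are known. A minimal sketch with made-up numbers follows; `linearized_qp_energy` is a hypothetical helper, and its arguments are assumed to be the expectation values ⟨Σ(ε_KS)⟩, ⟨∂Σ/∂ε⟩ at ε_KS and ⟨v_xc⟩, all in the same energy unit.

```python
def linearized_qp_energy(e_ks, sigma_at_e_ks, dsigma_de, vxc):
    """Return (e_qp, Z) from the linearized quasiparticle equation.

    e_qp = e_ks + Z * (Sigma(e_ks) - vxc),  Z = 1 / (1 - dSigma/de at e_ks)
    """
    z = 1.0 / (1.0 - dsigma_de)
    e_qp = e_ks + z * (sigma_at_e_ks - vxc)
    return e_qp, z

# Made-up matrix elements in eV, for illustration only.
e_qp, z = linearized_qp_energy(e_ks=1.0, sigma_at_e_ks=-10.0,
                               dsigma_de=-0.25, vxc=-10.5)
```

The renormalization factor Z is the result of a first-order Taylor expansion of Σ(ε) around ε_KS, so the formula is exact whenever the self-energy is linear in that energy window; with the imaginary-axis approach, Σ(ω) near ε_KS comes from the analytic continuation.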









Correlated part
• The matrix elements of Σc(ω) are given by:

Frequency dependent term (convolution)

where the expression for J depends on the integration technique (CD, PPM, AC)

• Correlation is short-ranged (can use ecuteps << 4 ecut)
• The most CPU demanding part of the sigma code: double sum over G + computation of J
• Slow convergence wrt nband, especially for absolute QP energies. Gaps are easier to converge
• J(ω) involves a complicated numerical integration. Possible approaches:
1. plasmon-pole models (fast but approximate)
2. numerical integration techniques (demanding but accurate)
Plasmon-pole models
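• Main idea: approximate the imaginary part of ε^{−1}(ω) with a delta-peak (plasmon resonance), with A the amplitude of the peak and ω̃ the plasmon frequency:

$$\mathrm{Im}\, \epsilon^{-1}_{\mathbf{G}_1\mathbf{G}_2}(\mathbf{q}, \omega) = A_{\mathbf{G}_1\mathbf{G}_2}(\mathbf{q}) \left[ \delta(\omega - \tilde\omega_{\mathbf{G}_1\mathbf{G}_2}(\mathbf{q})) - \delta(\omega + \tilde\omega_{\mathbf{G}_1\mathbf{G}_2}(\mathbf{q})) \right]$$

which gives, via Kramers-Kronig:

$$\mathrm{Re}\, \epsilon^{-1}_{\mathbf{G}_1\mathbf{G}_2}(\mathbf{q}, \omega) = \delta_{\mathbf{G}_1\mathbf{G}_2} + \frac{\Omega^2_{\mathbf{G}_1\mathbf{G}_2}(\mathbf{q})}{\omega^2 - \tilde\omega^2_{\mathbf{G}_1\mathbf{G}_2}(\mathbf{q})}$$

• The two parameters are fitted so as to reproduce ab-initio results at selected frequencies:

Most commonly used models

ppmodel 1: Reproduce ε^{−1}(ω = 0) and ε^{−1}(iω_plasma) (Godby-Needs, default)

ppmodel 2: Reproduce ε^{−1}(ω = 0) and f-sum rule (Hybertsen-Louie)

Other models

ppmodel 3: Spectral decomposition of ε^{−1}, PRB 47, 15931 (1993)

ppmodel 4: Approximation to the reducible polarizability, PRB 34, 5390 (1986)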
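For the two-frequency fit of ppmodel 1, the model form above can be inverted in closed form for each (G1, G2, q) element. The sketch below is derived only from the expression Re ε⁻¹(ω) = δ + Ω²/(ω² − ω̃²) evaluated at ω = 0 and ω = iω_p; it is not the ABINIT routine, and the function names are hypothetical.

```python
def godby_needs_fit(epsm1_static, epsm1_imag, omega_p, delta=0.0):
    """Fit (Omega^2, omega_tilde^2) from eps^-1(0) and eps^-1(i omega_p).

    delta is 1 for diagonal elements (G1 == G2) and 0 otherwise.
    """
    a0 = epsm1_static - delta          # equals -Omega^2 / omega_tilde^2
    ap = epsm1_imag - delta            # equals -Omega^2 / (omega_p^2 + omega_tilde^2)
    omega_tilde_sq = omega_p**2 * ap / (a0 - ap)
    big_omega_sq = -a0 * omega_tilde_sq
    return big_omega_sq, omega_tilde_sq

def ppm_epsm1(omega, big_omega_sq, omega_tilde_sq, delta=0.0):
    """Evaluate the plasmon-pole model at a real or purely imaginary frequency."""
    return delta + big_omega_sq / (omega**2 - omega_tilde_sq)

# Synthetic input generated from Omega^2 = 2 and omega_tilde^2 = 4 (diagonal element).
eps0 = ppm_epsm1(0.0, 2.0, 4.0, delta=1.0)           # static value
epsp = ppm_epsm1(1.5j, 2.0, 4.0, delta=1.0).real     # value at i * omega_p
fit_omega_sq, fit_tilde_sq = godby_needs_fit(eps0, epsp, omega_p=1.5, delta=1.0)
```

Only two evaluations of ε⁻¹ are needed per q-point, which explains the small number of ω-points and the low memory footprint of PPM calculations compared with contour deformation.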


Contour deformation technique
Use Cauchy's residue theorem to rewrite the $\int_{-\infty}^{+\infty} f(\omega + \omega')\, d\omega'$ integral as:

$$\Sigma_c(\omega) = \frac{i}{2\pi} \left\{ 2\pi i \sum_{z_p}^{\mathcal{C}} \lim_{z \to z_p} G(z)\, W^c(z)\, (z - z_p) - \int_{-\infty}^{+\infty} G(\omega + i\omega')\, W^c(i\omega')\, d(i\omega') \right\}$$

First term: contribution from the poles located inside the contour. Usually ~50 ω-points. Need to interpolate W(ω′).
Second term: integration along the imaginary axis (smooth integrand, usually ~10 points are enough).

• Accurate but expensive, especially at the level of memory since G vectors and ω′-points are not MPI-distributed
• CD is required for more advanced treatments e.g. e-e lifetimes τ_nk, spectral function A_nk(ω) + cumulant expansion
• In principle, the CD can be replaced by the AC of Σ_nk(iω) to the real axis provided we have robust tools to do the AC without losing accuracy

[Figure: ℜΣ(ω) of Al computed with CD and PPM; Re Σ (eV) vs ω (eV), with and without PPM; strong oscillations far from the Fermi level]
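Plasmon-pole models

$$\Sigma_c(\mathbf{r}_1, \mathbf{r}_2; \omega) = \frac{i}{2\pi} \int G(\mathbf{r}_1, \mathbf{r}_2; \omega + \omega')\, W^c(\mathbf{r}_1, \mathbf{r}_2; \omega')\, e^{i\omega'\delta^+}\, d\omega'$$

W^c is replaced by an analytic model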
Plasmon-pole model: pros and cons
Pros:
• Accurate QP energies close to the gap (error ≈ 0.1 − 0.2 eV)
• Ideal tool for initial convergence studies
• Very efficient both in terms of CPU and memory as the convolution integral has an analytical expression (ppmodel=1,2):

$$J^{s}_{\mathbf{G}_1\mathbf{G}_2}(\mathbf{q}, \omega) = \Omega^2_{\mathbf{G}_1\mathbf{G}_2}(\mathbf{q}) \int \frac{e^{i\omega'\delta}\, d\omega'}{\left[ \omega + \omega' - \epsilon_s + i\eta\, \mathrm{sign}(\epsilon_s - \mu) \right] \left[ \omega'^2 - (\tilde\omega_{\mathbf{G}_1\mathbf{G}_2}(\mathbf{q}) - i\eta)^2 \right]}$$

Cons:
• Questionable in systems with d- or f-electrons as one usually finds multiple peaks in ℑε^{−1}
• Im Σ(ω) is a sum of delta peaks. No satellites in spectral function or electron lifetimes
• QP corrections for low- and high-energy states are qualitatively wrong…
Plasmon-pole model breakdown
ReΣ(ω) of Aluminum calculated with and without PPM
[Figure: Re Σ (eV) vs ω (eV), with PPM and without PPM; strong oscillations far from the Fermi level]

WARNING
Avoid PPMs for computing band widths in metals
Avoid PPMs for self-consistent GW calculations, especially when updating the wavefunctions
GW band structures
https://docs.abinit.org/tutorial/gw1/#7-how-to-compute-gw-band-structures

• GW corrections can be computed only for the k-points in the WFK file (k-mesh)
• GW band structures (k-path) require some sort of interpolation technique
• Three methods available:
1. Energy-dependent scissors operator
- fit QP corrections as a function of the KS eigenvalues: ϵ^QP = ϵ^KS + Δ(ϵ^KS)
- Easy to implement but rather crude, see this AbiPy example
2. Wannier interpolation
- Accurate but much more complex (requires wannier90 and maximally-localized Wannier functions)
- See tests/wannier90/t03.in and Phys. Rev. B 79, 045109 (2009)
3. Star-function interpolation
- Less accurate than Wannier but much easier to use (same method as in BoltzTraP)
- Possible instabilities in the presence of band crossings
• In all methods, QP corrections for all k-points in the IBZ and all the relevant bands are needed.
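The first method in the list, the energy-dependent scissors operator, can be sketched in a few lines of NumPy. This is an illustration under stated assumptions and not the AbiPy implementation: `fit_scissors` is a hypothetical helper, the correction Δ(ϵ_KS) is modelled with one polynomial for states at or below a reference energy `e_fermi` and another one above it, and the numbers are synthetic.

```python
import numpy as np

def fit_scissors(e_ks, e_qp, e_fermi, deg=1):
    """Fit Delta(e_ks) = e_qp - e_ks separately for occupied and empty states.

    Returns a function mapping KS energies to estimated QP energies.
    """
    e_ks = np.asarray(e_ks, dtype=float)
    corr = np.asarray(e_qp, dtype=float) - e_ks
    occ = e_ks <= e_fermi
    p_val = np.polyfit(e_ks[occ], corr[occ], deg)      # valence branch
    p_cond = np.polyfit(e_ks[~occ], corr[~occ], deg)   # conduction branch

    def apply(energies):
        energies = np.asarray(energies, dtype=float)
        delta = np.where(energies <= e_fermi,
                         np.polyval(p_val, energies),
                         np.polyval(p_cond, energies))
        return energies + delta

    return apply

# Synthetic data: linear corrections on each side of the gap (eV).
e_ks = np.array([-5.0, -3.0, -1.0, 2.0, 4.0, 6.0])
e_qp = np.array([-4.7, -2.9, -1.1, 2.9, 5.3, 7.7])
scissors = fit_scissors(e_ks, e_qp, e_fermi=0.0)
e_qp_path = scissors([-2.0, 3.0])   # apply to KS energies along a k-path
```

Since the fit depends only on the KS energy and not on the k-point or band character, it cannot change the shape of individual bands beyond a smooth energy-dependent shift, which is why the slide calls it rather crude.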
Star-function interpolation with AbiPy
• Good compromise between accuracy and ease of use
• Can interpolate either QP energies or QP corrections (recommended). Corrections are smooth hence easier to interpolate
• In brief:
1. Compute KS band structure along a high-symmetry k-path
2. Run GW for all k-points in the IBZ and the relevant bands
3. Use SIGRES.nc and GSR.nc to interpolate QP corrections

[Figure: EQP − EKS (eV) vs EKS (eV), PPMODEL 1 and Contour Deformation, with the LDA gap marked]

• Python example:

import abipy.data as abidata
from abipy.abilab import abiopen

# Get quasiparticle results from the SIGRES.nc database.
sigres = abiopen(abidata.ref_file("si_g0w0ppm_nband30_SIGRES.nc"))

# Read the KS band energies computed on the k-path
with abiopen(abidata.ref_file("si_nscf_GSR.nc")) as gsr_nscf:
    ks_ebands_kpath = gsr_nscf.ebands

# Interpolate the QP corrections.
# The QP energies are returned in r.qp_ebands_kpath
r = sigres.interpolate(lpratio=5, ks_ebands_kpath=ks_ebands_kpath)

r.qp_ebands_kpath.plot(with_gaps=True)
GW with pseudopotentials
Pseudopotentials in a nutshell
• A pseudopotential (PP) mimics the interaction seen by valence electrons due to the core electrons and
the nucleus
• By construction, the PP reproduces the atomic energies and the valence wavefunctions of the all-electron
(AE) atom outside a certain radius
• AE valence wavefunctions are replaced by pseudized orbitals that are easier to describe in Fourier space
• Advantages of PPs:
- Much smaller cutoff energy
- Fewer electrons involved in the calculation (frozen-core approximation)

• Drawbacks of PPs:
- Cannot reproduce the nodal shape of AE orbitals
- Cannot account for core relaxation effects

• In GW codes based on PPs, we only compute the valence part of the self-energy. Many-body effects
due to core electrons are treated at the KS level and imported from the atomic environment
GW with pseudopotentials
Important things to know when using pseudos for GW:

• The matrix elements of Σx are sensitive to the nodal shape of the orbitals

• Pseudos for GW calculations have to be generated carefully:


- pseudized orbitals should be close to the all-electron ones (not always possible)
- multiple projectors per l-channel are needed to have good scattering properties in the energy region of the empty states (no ghost states)
- shallow semi-cores should be included, possibly the full n-shell (e.g. 3spd in Ga instead of 3d only)

[Figure: Ga-3d pseudo (OK for GS, bad for GW) versus Ga-3spd pseudo (OK for GW). Annotations: spatial overlap among 3s, 3p and 3d states; bad logarithmic derivative at high energy]
The PseudoDojo project
http://www.pseudo-dojo.org/

• NC and PAW pseudos


• LDA, PBE, PBEsol
• Scalar-relativistic and fully relativistic pseudos (with SOC)
• Two tables:
1. standard for GS applications
2. stringent for GW or more accurate GS studies

• Stringent pseudos have:
- more electrons in valence
- smaller core radii
- a higher cost in terms of ecut

• For some elements, e.g. Si, there is no difference between the standard and the stringent version
• All pseudos validated by comparing with AE results
• Hints for ecut are provided (low/normal/high)
Plane-waves, symmetries and BZ
Plane-wave expansion of orbitals
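• For periodic systems with lattice vectors R, Bloch's theorem states:
$\psi_{n\mathbf{k}}(\mathbf{r}) = e^{i\mathbf{k}\cdot\mathbf{r}}\, u_{n\mathbf{k}}(\mathbf{r})$ with $u_{n\mathbf{k}}(\mathbf{r}+\mathbf{R}) = u_{n\mathbf{k}}(\mathbf{r})$
• If we define the reciprocal lattice with lattice vectors G such that $e^{i\mathbf{G}\cdot\mathbf{R}} = 1$, the periodic part of the Bloch function can be written:
$u_{n\mathbf{k}}(\mathbf{r}) = \sum_{\mathbf{G}} u_{n\mathbf{k}}(\mathbf{G})\, e^{i\mathbf{G}\cdot\mathbf{r}}$
where the coefficients $u_{n\mathbf{k}}(\mathbf{G})$ are obtained by a Fourier transform over the unit cell of volume Ω:
$u_{n\mathbf{k}}(\mathbf{G}) = \frac{1}{\Omega} \int_{\Omega} u_{n\mathbf{k}}(\mathbf{r})\, e^{-i\mathbf{G}\cdot\mathbf{r}}\, d\mathbf{r}$
• The basis set is truncated such that $\frac{|\mathbf{k}+\mathbf{G}|^2}{2} < E_{cut}$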
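
To make the truncation condition concrete, the following sketch counts the plane waves with |k+G|²/2 < Ecut for a simple cubic cell in atomic units. The function name `count_planewaves` and the lattice parameter are assumptions chosen for illustration; this is not how ABINIT builds its G-sphere internally.

```python
import itertools
import numpy as np

def count_planewaves(a, kpt_frac, ecut):
    """Number of G-vectors with |k+G|^2/2 < ecut for a simple cubic cell.

    a: lattice parameter in Bohr; kpt_frac: k-point in reduced coordinates;
    ecut: cutoff energy in Hartree (atomic units, hbar = m_e = 1).
    """
    b = 2.0 * np.pi / a                      # reciprocal lattice spacing
    kpt = np.asarray(kpt_frac, dtype=float)
    # |k+G| < sqrt(2*ecut) bounds the integer components of G.
    nmax = int(np.ceil(np.sqrt(2.0 * ecut) / b)) + 1
    count = 0
    for g in itertools.product(range(-nmax, nmax + 1), repeat=3):
        kg = b * (kpt + np.array(g))
        if 0.5 * kg.dot(kg) < ecut:
            count += 1
    return count

a = 10.0  # Bohr, illustrative value
npw_small = count_planewaves(a, (0.0, 0.0, 0.0), ecut=2.0)
npw_large = count_planewaves(a, (0.0, 0.0, 0.0), ecut=8.0)
# The number of plane waves grows roughly as ecut**1.5 (volume of the G-sphere).
ratio = npw_large / npw_small
```

Multiplying ecut by 4 doubles the radius of the G-sphere, so the number of plane waves increases by a factor close to 8.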
Convolution theorem
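• Density associated with one eigenfunction: $n_{b\mathbf{k}}(\mathbf{r}) = u^{*}_{b\mathbf{k}}(\mathbf{r})\, u_{b\mathbf{k}}(\mathbf{r})$
• In Fourier space (convolution):
$n_{b\mathbf{k}}(\mathbf{r}) = \sum_{\mathbf{G}} u^{*}_{b\mathbf{k}}(\mathbf{G})\, e^{-i\mathbf{G}\cdot\mathbf{r}} \sum_{\mathbf{G}'} u_{b\mathbf{k}}(\mathbf{G}')\, e^{i\mathbf{G}'\cdot\mathbf{r}} = \sum_{\mathbf{G}\mathbf{G}'} \left[ u^{*}_{b\mathbf{k}}(\mathbf{G})\, u_{b\mathbf{k}}(\mathbf{G}') \right] e^{i(\mathbf{G}'-\mathbf{G})\cdot\mathbf{r}}$
• The radius of the G-sphere for n(G) is twice the radius of the G-sphere used for the wavefunctions
[Figure: FFT box enclosing the G-sphere for n]
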
Take-home message:
- Product in r-space —> convolution in G-space
- The FFT mesh should enclose the sphere of radius 2 * Gmax to treat the convolution exactly
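
The take-home message can be checked numerically in one dimension. The sketch below uses integer G-values and random coefficients (both assumptions made for the example): it builds u(r) from its Fourier components, forms the density in r-space, and compares its FFT with the explicit convolution in G-space.

```python
import numpy as np

gmax = 4                       # wavefunction components for |G| <= gmax
nfft = 32                      # FFT mesh, large enough to hold |G| <= 2*gmax
rng = np.random.default_rng(0)

# Integer G-values in FFT order: 0, 1, ..., 15, -16, ..., -1.
gvals = np.fft.fftfreq(nfft, d=1.0 / nfft).astype(int)
u_g = np.zeros(nfft, dtype=complex)
inside = np.abs(gvals) <= gmax
u_g[inside] = rng.normal(size=inside.sum()) + 1j * rng.normal(size=inside.sum())

# u(r) = sum_G u(G) exp(iGr): inverse FFT without the 1/nfft factor.
u_r = np.fft.ifft(u_g) * nfft
n_r = np.conj(u_r) * u_r                # product in r-space
n_g = np.fft.fft(n_r) / nfft            # n(G) obtained with the FFT

# Direct convolution in G-space: n(G) = sum_G' u*(G') u(G' + G).
n_g_conv = np.zeros(nfft, dtype=complex)
for i, g in enumerate(gvals):
    for gp in range(-gmax, gmax + 1):
        if abs(gp + g) <= gmax:
            n_g_conv[i] += np.conj(u_g[gp % nfft]) * u_g[(gp + g) % nfft]

# Largest component of n(G) outside the sphere of radius 2*gmax.
max_outside = np.abs(n_g[np.abs(gvals) > 2 * gmax]).max()
```

The two results agree and n(G) vanishes for |G| > 2*gmax. With a mesh too small to hold 2*gmax, the high-G components would be folded back (aliasing) and the FFT result would no longer match the exact convolution.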
Plane-wave expansion of two-point functions
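• Infinite system simulated with Born-von-Karman (BvK) periodic boundary conditions, i.e. an
(N1, N2, N3) supercell of volume V = NΩ with N = N1N2N3

• All our MBPT functions are defined in the BvK supercell

• G, χ̃, W are invariant if we translate both r1 and r2 by R, that is: G(r1, r2) = G(r1 + R, r2 + R)

• This implies the following Fourier expansion:

$f(\mathbf{r}_1, \mathbf{r}_2) = \frac{1}{V} \sum_{\mathbf{q}} \sum_{\mathbf{G}_1 \mathbf{G}_2} e^{i(\mathbf{q}+\mathbf{G}_1)\cdot\mathbf{r}_1}\, f_{\mathbf{G}_1\mathbf{G}_2}(\mathbf{q})\, e^{-i(\mathbf{q}+\mathbf{G}_2)\cdot\mathbf{r}_2}$

$f_{\mathbf{G}_1\mathbf{G}_2}(\mathbf{q}) = \frac{1}{V} \iint_V e^{-i(\mathbf{q}+\mathbf{G}_1)\cdot\mathbf{r}_1}\, f(\mathbf{r}_1, \mathbf{r}_2)\, e^{i(\mathbf{q}+\mathbf{G}_2)\cdot\mathbf{r}_2}\, d\mathbf{r}_1\, d\mathbf{r}_2$

where the q-points belong to the BZ mesh that is dual to the BvK supercell, with spacing (1/N1, 1/N2, 1/N3) in reduced coordinates

• If G is expanded with cutoff energy ecut, then χ̃ = GG will have components up to 4 ecut!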
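
The duality between the BvK supercell and the q-mesh can be verified with a few lines of NumPy. The helper name `bvk_qmesh` and the (2, 3, 4) supercell are assumptions for illustration; the point is only that exp(iq·R) = 1 for every q of the mesh and every lattice vector R of the supercell.

```python
import itertools
import numpy as np

def bvk_qmesh(n1, n2, n3):
    """q-points (reduced coordinates) of the BZ mesh dual to an (n1, n2, n3) supercell."""
    return np.array([(i / n1, j / n2, k / n3)
                     for i, j, k in itertools.product(range(n1), range(n2), range(n3))])

n1, n2, n3 = 2, 3, 4
qmesh = bvk_qmesh(n1, n2, n3)

# Supercell lattice vectors in units of the primitive ones: (n1,0,0), (0,n2,0), (0,0,n3).
r_super = np.diag([n1, n2, n3])

# In reduced coordinates q.R = 2*pi * sum_i q_i R_i, so exp(i q.R_super) must equal 1:
# any function expanded on these q-points is periodic in the BvK supercell.
phases = np.exp(2j * np.pi * qmesh @ r_super.T)
```

The mesh contains N = N1N2N3 points, the same number as the primitive cells in the supercell, so that V = NΩ is consistent with the 1/V prefactor of the expansion.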
Crystal symmetries and k-points
• Wavefunctions and eigenvalues in the full Brillouin zone (BZ)
can be reconstructed from the irreducible wedge (IBZ)
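• S = rotation, t = fractional translation
• Spatial symmetries:
$\epsilon_{S\mathbf{k}} = \epsilon_{\mathbf{k}}$
$u_{S\mathbf{k}}(\mathbf{r}) = e^{-iS\mathbf{k}\cdot\mathbf{t}}\, u_{\mathbf{k}}\left(S^{-1}(\mathbf{r}-\mathbf{t})\right)$
$u_{S\mathbf{k}}(\mathbf{G}) = e^{-i(S\mathbf{k}+\mathbf{G})\cdot\mathbf{t}}\, u_{\mathbf{k}}(S^{-1}\mathbf{G})$
• Time-reversal symmetry:
$\epsilon_{n\mathbf{k}} = \epsilon_{n-\mathbf{k}}$
$u_{n\mathbf{k}}(\mathbf{r}) = u^{\dagger}_{n-\mathbf{k}}(\mathbf{r})$
$u_{n\mathbf{k}}(\mathbf{G}) = u^{\dagger}_{n-\mathbf{k}}(-\mathbf{G})$
[Figure: irreducible wedge of the Brillouin zone]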

Take-home message:
- unk(G) are computed and stored only for nkpt k-points in the IBZ
- The higher the number of symmetries nsym, the faster the calculation
- Space group is automatically detected, all symmetries are used by default (kptopt 1)
- ABINIT may find fewer symmetries than expected if lattice vectors and atomic positions are not given with enough digits
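
The reduction of a k-mesh to the IBZ can be illustrated with a toy example. The sketch below is not ABINIT's algorithm: it assumes a simple cubic lattice with the full cubic point group (48 operations, written as signed permutation matrices), a Gamma-centered n x n x n mesh, and hypothetical helper names.

```python
import itertools
import numpy as np

def cubic_point_group():
    """The 48 operations of O_h as signed permutation matrices (integer 3x3)."""
    ops = []
    for perm in itertools.permutations(range(3)):
        for signs in itertools.product((1, -1), repeat=3):
            mat = np.zeros((3, 3), dtype=int)
            for row, (col, sgn) in enumerate(zip(perm, signs)):
                mat[row, col] = sgn
            ops.append(mat)
    return ops

def reduce_to_ibz(n, symops):
    """Map a Gamma-centered n x n x n mesh onto symmetry-irreducible points.

    k-points are stored as integer triples (i, j, k), meaning k = (i, j, k) / n
    in reduced coordinates. Returns {irreducible point: weight}.
    """
    full_mesh = list(itertools.product(range(n), repeat=3))
    seen = set()
    weights = {}
    for kpt in full_mesh:
        if kpt in seen:
            continue
        # Star of kpt: all images S k, folded back into the mesh (modulo G).
        star = {tuple(int(x) % n for x in op @ np.array(kpt)) for op in symops}
        seen.update(star)
        weights[kpt] = len(star) / len(full_mesh)
    return weights

symops = cubic_point_group()
ibz = reduce_to_ibz(4, symops)
nkpt_ibz = len(ibz)
total_weight = sum(ibz.values())
```

With all 48 operations, the 64 points of the 4 x 4 x 4 mesh reduce to a much smaller irreducible set. Passing only a subset of the operations (as happens when the geometry is given with too few digits) increases the number of irreducible points and hence the cost.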
Exchange + correlated self-energy
• ABINIT computes Σnmk(ω) = ⟨nk | Σ(ω) | mk⟩ for a subset of KS states
• We never compute Σ(r1, r2; ω)
• Diagonal terms Σnk(ω) are enough for G0W0
• Computing GW band structures is not an easy task!

The k-point must be in the WFK file

Two mutually exclusive methods to specify the states:

Explicit:
- nkptgw: number of k-points in Σnk
- kptgw: k-point list
- bdgw: range of n indices for each k-point

Automatic with gw_qprange = num:
- 0 → QP corrections only for the fundamental and the optical gap
- +num → QP corrections for all the k-points in the IBZ. Include num bands above and below ϵF
- -num → QP corrections for all the k-points in the IBZ. Include all occupied states and num empty states.
NSCF run to generate the WFK file
Generation of the WFK file with empty bands
• Expensive step so we recommend to:
- start immediately with reasonably large nband (>> 10 nband_occ)
- perform initial convergence studies
- generate new WFK if nband is not enough
• Use nbdbuf to save a lot of time as high-energy states converge slowly
• The WFK defines the BZ sampling and the list of k-points where QP corrections can be computed

Example input:
getden_filepath "prefix_DEN"
iscf -2       # NSCF
tolwfr 1e-18
nband 1200    # occ + empty
nbdbuf 120    # ~10% of nband
nstep 100     # default is too small

NB: only the first nband - nbdbuf states are converged within tolwfr

Best practices:
• First compute the KS bands to locate the CBM/VBM then select the k-mesh (ngkpt, nshiftk, shiftk) accordingly
• Don’t use datasets to run GW in a single run. Split everything and optimize the number of MPI procs for each step
• The default eigensolver (conjugate gradient, CG) cannot use more than nkpt * nsppol MPI cores
• Use paral_kgb 1 (LOBPCG solver) if ncores > nkpt * nsppol …
LOBPCG eigensolver
• More scalable than CG: parallelized over k-points, bands, FFT, spins
• More difficult to configure (npkpt, npband, npfft, bandpp)
• Good news: memory scales with all MPI levels
• Use autoparal 1 to let ABINIT find a good configuration for a given number of MPI cores

Example input:
getden_filepath "prefix_DEN"
iscf -2       # NSCF
tolwfr 1e-18
nband 1200    # occ + empty
nbdbuf 120    # ~10% of nband
nstep 100     # default is too small
paral_kgb 1
autoparal 1   # Only for GS!

https://docs.abinit.org/tutorial/paral_gspw/

Best practices:
• Use an even number of MPI processors when nsppol 2
• Ideally, the number of MPI cores should be proportional to nkpt * nsppol
• Avoid prime numbers for nband as npband must divide nband
• npband and bandpp can affect the SCF convergence. Increasing bandpp usually makes the algorithm more stable
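
The constraint that npband must divide nband is easy to check before submitting a job. The helpers below are hypothetical (they are not part of ABINIT or AbiPy) and only encode the divisibility rule stated above; npband = 32 is an arbitrary illustrative value.

```python
def divisors(n):
    """All positive divisors of n, in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]

def next_nband(nband_min, npband):
    """Smallest nband >= nband_min that is a multiple of npband."""
    return -(-nband_min // npband) * npband

# A prime nband leaves only the trivial band distributions...
npband_choices_prime = divisors(1201)     # 1201 is prime
# ...whereas a highly composite value offers many options.
npband_choices_good = divisors(1200)

# Adjust a target number of bands to a hypothetical npband = 32.
nband = next_nband(1201, npband=32)
```

Rounding nband up rather than down keeps at least the requested number of states, at the price of a few extra bands.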
Self-consistency methods
Self-consistency in the GW approximation
• If the initial DFT band structure is not adequate, one should update QP energies or wavefunctions
in the self-consistent cycle

• Different kinds of self-consistency are possible: G0W, GW0, GW

• Fully self-consistent GW calculations do not improve over G0W0. Moreover, Σ is not Hermitian and is
energy dependent

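• The Quasiparticle Self-consistent GW (QSGW) method overcomes these problems:

$\langle \phi_i | \Sigma | \phi_j \rangle = \frac{1}{2}\, \Re \left[ \langle \phi_i | \Sigma(\epsilon_i) | \phi_j \rangle + \langle \phi_i | \Sigma(\epsilon_j) | \phi_j \rangle \right]$

where ℜ means that one only retains the Hermitian part of the matrix.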
S. V. Faleev, M. van Schilfgaarde, and T. Kotani, Phys. Rev. Lett. 93, 126406 (2004)
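
The QSGW prescription can be written down in a few lines. The sketch below is a toy illustration, not ABINIT code: `toy_sigma` is an invented energy-dependent, non-Hermitian matrix standing in for the real self-energy, and `qsgw_sigma` is a hypothetical helper that evaluates Σ_ij at ϵ_i and ϵ_j, averages, and keeps the Hermitian part.

```python
import numpy as np

def qsgw_sigma(energies, sigma_func):
    """Static Hermitian self-energy: 1/2 Herm[Sigma_ij(e_i) + Sigma_ij(e_j)]."""
    nb = len(energies)
    # sig_at[i] is the full matrix Sigma(omega = e_i).
    sig_at = [np.asarray(sigma_func(e)) for e in energies]
    m = np.empty((nb, nb), dtype=complex)
    for i in range(nb):
        for j in range(nb):
            m[i, j] = 0.5 * (sig_at[i][i, j] + sig_at[j][i, j])
    # Hermitian part of a matrix: (M + M^dagger) / 2.
    return 0.5 * (m + m.conj().T)

rng = np.random.default_rng(0)
nb = 4
energies = np.linspace(-1.0, 2.0, nb)
a_mat = rng.normal(size=(nb, nb)) + 1j * rng.normal(size=(nb, nb))
b_mat = rng.normal(size=(nb, nb)) + 1j * rng.normal(size=(nb, nb))

def toy_sigma(omega):
    """Toy self-energy, linear in omega and non-Hermitian (assumption)."""
    return a_mat + omega * b_mat

sigma_static = qsgw_sigma(energies, toy_sigma)
```

The result is Hermitian and energy independent by construction, so it can be added to a one-particle Hamiltonian and diagonalized with a standard Hermitian eigensolver. Note that all off-diagonal elements are required, which is the main source of the extra cost with respect to one-shot GW.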
Self-consistency in the GW approximation
• The QSGW band gap is slightly bigger than the experimental one.

[Figure: calculated gap (eV) versus experimental gap (eV) for a set of semiconductors and insulators; adapted from van Schilfgaarde et al., PRL 96, 226402 (2006)]
Quasi-particle SCGW (II)

Left: LDA and G0W0 results for the band gap. Right: QPSGW results

Much more CPU demanding than one-shot GW due to the off-diagonal terms ⟨i|Σ|j⟩
The QP corrections must be calculated for all k-points and all occupied states
Check whether the chosen KS basis set is flexible enough
Quasi-particle SCGW (I)
PRL 96 226402 (2006)

Faleev’s approach to the self-consistency issue:

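$\hat{h}^{QPGW} = T + v_H[n](\mathbf{r}_1) + v_{ext}(\mathbf{r}_1) + \tilde{\Sigma}(\mathbf{r}_1, \mathbf{r}_2)$

where Σ̃ is a static and Hermitian approximation to the true GW self-energy:

$\tilde{\Sigma}^{\mathbf{k}}_{ij} \equiv \frac{1}{2}\, \mathrm{Herm} \left[ \Sigma^{\mathbf{k}}_{ij}(\epsilon_{i\mathbf{k}}) + \Sigma^{\mathbf{k}}_{ij}(\epsilon_{j\mathbf{k}}) \right]$

Equations are solved self-consistently: $\hat{h}^{QPGW} \rightarrow \{\Psi^{QP}, E^{QP}\} \rightarrow \{n^{QP}, \tilde{\Sigma}\}$

QP states are expanded in terms of KS states (QPS file): $|\Psi^{QP}_{m\mathbf{k}}\rangle = \sum_{n} U^{\mathbf{k}}_{nm}\, |\Psi^{KS}_{n\mathbf{k}}\rangle$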
n

Important variables: getqps, gwcalctyp
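
The expansion of the QP states in the KS basis amounts to diagonalizing the QP Hamiltonian in that basis. The sketch below uses random toy matrices and assumes a single k-point and a fixed Hartree potential, so it only illustrates one step of the self-consistent cycle; the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
nb = 6
e_ks = np.sort(rng.uniform(-5.0, 10.0, size=nb))      # toy KS eigenvalues (eV)

# Toy Hermitian matrix standing for <n|Sigma_tilde - v_xc|m> at one k-point.
x = rng.normal(size=(nb, nb)) + 1j * rng.normal(size=(nb, nb))
dsigma = 0.1 * (x + x.conj().T)

# In the KS basis the KS Hamiltonian is diagonal, so the QP Hamiltonian reads
# H_nm = e_KS_n delta_nm + <n|Sigma_tilde - v_xc|m>.
h_qp = np.diag(e_ks) + dsigma

# Column m of u_mat holds the coefficients U_nm of the m-th QP state.
e_qp, u_mat = np.linalg.eigh(h_qp)
```

Because Σ̃ is Hermitian, the matrix U is unitary and the QP energies are real. The quality of the expansion depends on the number of KS states included, which is why the flexibility of the KS basis set must be checked.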


Exchange + correlated self-energy
• ABINIT computes Σnmk(ω) = ⟨nk | Σ(ω) | mk⟩ for a
subset of KS states
• We never compute Σ(r1, r2; iω)
• Diagonal terms Σnk(ω) are enough for G0W0
• Computing GW band structures is not an easy task!
Some kind of interpolation is required

IMPORTANT: The k-point must be in the WFK file

Two mutually exclusive methods to specify the states:

Explicit:
- nkptgw: number of k-points in Σnk
- kptgw: k-point list
- bdgw: range of n indices for each k-point

Automatic with gw_qprange = num:
- 0 → QP corrections only for the fundamental and the optical gap
- +num → QP corrections for all the k-points in the IBZ. Include num bands above and below ϵF
- -num → QP corrections for all the k-points in the IBZ. Include all occupied states and num empty states.
