dgamore.memory_estimator
Pure, side-effect-free estimator of the peak host-memory of the memory-sensitive DGAmore operations. Each
save_memory_* switch in dgamore.config.MemoryConfig selects between a fast (flag off) and a lean
(flag on) code path; this module estimates the peak bytes of the dominant arrays of both paths so the driver can set
the flags automatically. Apart from the global storage precision dgamore.n_point_base.DTYPE (the single
source of truth for the per-element size), it pulls in no run-state from the package – no MPI, no psutil, no
config singleton: every input is passed as an argument, which keeps the formulas unit-testable in isolation.
All heavy quantities are backed by a single DTYPE array, and q-points are distributed
across MPI ranks, so per-rank arrays scale with the per-rank q-count rather than the total. Only the dominant large
arrays of each branch are modeled; a single global OVERHEAD_FACTOR accounts for the un-modeled transients.
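As a rough illustration of the per-rank scaling described above, the sketch below estimates the bytes of one q-distributed array. It is not the module's own code: the helper names, the even ceil-based split of q-points over ranks, the 8-byte element size, and the 1.1 overhead value are all assumptions made for this example.

```python
import math

# Assumed stand-ins; the real values come from dgamore.n_point_base.DTYPE
# and the module's OVERHEAD_FACTOR.
ASSUMED_ITEMSIZE_BYTES = 8      # e.g. a single-precision complex element
ASSUMED_OVERHEAD_FACTOR = 1.1   # multiplicative pad for un-modeled transients


def per_rank_q_count(nq_total: int, n_ranks: int) -> int:
    """Largest number of q-points one rank holds, assuming an even split."""
    return math.ceil(nq_total / n_ranks)


def array_bytes(shape, itemsize=ASSUMED_ITEMSIZE_BYTES,
                overhead=ASSUMED_OVERHEAD_FACTOR) -> float:
    """Bytes of one dense array of the given shape, padded by the overhead."""
    return overhead * math.prod(shape) * itemsize


# A hypothetical per-rank block with one leading q axis and two 4-wide axes.
nq_rank = per_rank_q_count(nq_total=100, n_ranks=8)
block_bytes = array_bytes((nq_rank, 4, 4))
```

The point of the split is that only the leading q axis shrinks with the number of ranks; replicated arrays keep their full size on every rank.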
Functions
| estimate_peaks | Estimates the per-rank transient peak host-memory (in bytes) of the fast and lean code path of each memory-sensitive branch, split by whether each transient is distributed across the ranks of a node or built on a single rank, together with the per-rank persistent baseline. |
- class dgamore.memory_estimator.BranchPeak(off_distributed: float, off_single: float, on_distributed: float, on_single: float)
Bases: object
Per-rank transient peak bytes of one memory-sensitive branch, split by how the peak is distributed across the MPI ranks of a node (the persistent baseline is reported separately and is not included here).
For the node-total budget, the memory on a node with r ranks at this branch’s peak is r * (baseline + distributed) + single: a distributed transient is held by every rank simultaneously (so it scales with r), while a single-rank transient is built on one rank while the others idle (so it is counted once). Both the fast (off) and lean (on) code paths are described.
- Variables:
off_distributed – Per-rank transient bytes held by every rank in the fast (flag-off) path.
off_single – Transient bytes held by a single rank in the fast (flag-off) path.
on_distributed – Per-rank transient bytes held by every rank in the lean (flag-on) path.
on_single – Transient bytes held by a single rank in the lean (flag-on) path.
- Parameters:
off_distributed (float)
off_single (float)
on_distributed (float)
on_single (float)
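The node-total formula r * (baseline + distributed) + single can be written out as a small helper. This is a minimal sketch: BranchPeakSketch is a hypothetical stand-in that only copies the four documented field names, and node_total_bytes is an illustrative function, not part of dgamore.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchPeakSketch:
    """Hypothetical stand-in carrying the four documented BranchPeak fields."""
    off_distributed: float
    off_single: float
    on_distributed: float
    on_single: float


def node_total_bytes(peak, baseline: float, ranks_per_node: int,
                     lean: bool) -> float:
    """Node memory at the branch peak: r * (baseline + distributed) + single."""
    distributed = peak.on_distributed if lean else peak.off_distributed
    single = peak.on_single if lean else peak.off_single
    # Every rank holds the baseline and the distributed transient;
    # the single-rank transient is counted once per node.
    return ranks_per_node * (baseline + distributed) + single


# Made-up numbers, for illustration only.
peak = BranchPeakSketch(off_distributed=4.0, off_single=2.0,
                        on_distributed=1.0, on_single=0.5)
fast_total = node_total_bytes(peak, baseline=1.0, ranks_per_node=4, lean=False)
lean_total = node_total_bytes(peak, baseline=1.0, ranks_per_node=4, lean=True)
```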
- dgamore.memory_estimator.estimate_peaks(*, n_bands: int, nk_tot: int, nk_irr: int, niw_core: int, niv_core: int, niv_full: int, niv_cut: int, niv_pp: int, n_ranks: int, with_eliashberg: bool, save_fq: bool = False, construct_fq_cheap: bool = False, overhead: float = 1.1) tuple[float, dict[str, BranchPeak]]
Estimates the per-rank transient peak host-memory (in bytes) of the fast and lean code path of each memory-sensitive branch, split by whether each transient is distributed across the ranks of a node or built on a single rank, together with the per-rank persistent baseline.
The returned dict maps a branch key to a BranchPeak; the branch keys mirror the save_memory_for_* switches: "chi0q", "chiq_aux", "sde" are always present; "fq" and "lanczos" are added only when with_eliashberg is True. The first tuple element is the per-rank persistent baseline (the replicated full-grid Green’s function and self-energies that stay live throughout the non-local routine); the caller adds it to the node total (every rank holds it). For a node with r ranks the memory at a branch’s peak is r * (baseline + distributed) + single.
- Parameters:
n_bands (int) – Number of bands \(B\).
nk_tot (int) – Total number of momentum points (full BZ).
nk_irr (int) – Number of momentum points in the irreducible BZ.
niw_core (int) – Number of positive bosonic core frequencies.
niv_core (int) – Number of positive fermionic core frequencies.
niv_full (int) – Number of positive fermionic full-region frequencies.
niv_cut (int) – Number of positive fermionic frequencies the full-grid giwk_full is kept at through the kernel/SDE section (min(niw_core + niv_full + 10, niv_dmft) in dgamore.nonlocal_sde.calculate_self_energy_q()); the SDE self-energy contraction needs the shell window, so giwk is not shrunk to the core box here.
niv_pp (int) – Number of positive fermionic frequencies of the pp (Eliashberg) box.
n_ranks (int) – Number of MPI ranks the q-points are distributed over.
with_eliashberg (bool) – Whether the Eliashberg step runs (adds the "fq" and "lanczos" branches).
save_fq (bool) – Whether the full ladder vertex is kept in the full ph box (config.eliashberg.save_fq); when True the per-rank fq accumulator spans the full [wn, vc, vc] block instead of the small pp box.
construct_fq_cheap (bool) – Whether the fq per-q blocks are built on the smaller pp frequency box (config.eliashberg.construct_fq_cheap), shrinking every per-q two-fermion block from vc to vpp.
overhead (float) – Global multiplicative factor accounting for un-modeled transient arrays.
- Returns:
A tuple (baseline_bytes, peaks) of the per-rank baseline and a dict mapping each branch key to its BranchPeak.
- Return type:
tuple[float, dict[str, BranchPeak]]
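One way a driver might consume the returned (baseline_bytes, peaks) tuple is sketched below. The decision rule (switch a branch to the lean path when its fast-path node total exceeds a budget), the choose_lean_flags helper, the Peak stand-in, and all numbers are assumptions for illustration; the actual driver logic is not shown in this page.

```python
from collections import namedtuple

# Hypothetical stand-in with the same four field names as BranchPeak.
Peak = namedtuple("Peak", "off_distributed off_single on_distributed on_single")


def choose_lean_flags(baseline, peaks, ranks_per_node, budget_bytes):
    """Return {branch_key: True} where the fast path would exceed the budget."""
    flags = {}
    for key, peak in peaks.items():
        # Fast-path node total: r * (baseline + distributed) + single.
        fast_total = (ranks_per_node * (baseline + peak.off_distributed)
                      + peak.off_single)
        flags[key] = fast_total > budget_bytes
    return flags


# Made-up values shaped like the documented return value.
baseline_bytes = 2e9
peaks = {
    "chi0q": Peak(6e9, 0.0, 1e9, 0.0),
    "sde": Peak(1e9, 2e9, 5e8, 1e9),
}
flags = choose_lean_flags(baseline_bytes, peaks,
                          ranks_per_node=8, budget_bytes=60e9)
```

Because "fq" and "lanczos" are present only when with_eliashberg is True, iterating over the dict rather than a fixed key list avoids a missing-key check.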