dgamore.memory_estimator

Pure, side-effect-free estimator of the peak host memory used by the memory-sensitive DGAmore operations. Each save_memory_* switch in dgamore.config.MemoryConfig selects between a fast (flag off) and a lean (flag on) code path; this module estimates the peak bytes of the dominant arrays of both paths so the driver can set the flags automatically. Apart from the global storage precision dgamore.n_point_base.DTYPE (the single source of truth for the per-element size), it pulls in no run-state from the package: no MPI, no psutil, no config singleton. Every input is passed as an argument, which keeps the formulas unit-testable in isolation.

Each heavy quantity is backed by a single DTYPE array, and q-points are distributed across MPI ranks, so per-rank arrays scale with the per-rank q-count rather than the total. Only the dominant large arrays of each branch are modeled; a single global OVERHEAD_FACTOR accounts for the un-modeled transients.
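The scaling described above can be illustrated with a minimal sketch. It is not the module's code: the 16-byte element size (a complex128-like DTYPE), the even ceil-division split of q-points, and the helper names per_rank_q_count and array_bytes are all assumptions made for illustration. Only the 1.1 factor is taken from the documented default of the overhead argument.

```python
import math

# Assumption: DTYPE is complex128-like, i.e. 16 bytes per element. The real
# per-element size comes from dgamore.n_point_base.DTYPE and may differ.
ITEMSIZE_BYTES = 16
OVERHEAD_FACTOR = 1.1  # same value as the documented default of `overhead`


def per_rank_q_count(nq_total: int, n_ranks: int) -> int:
    """Largest q-count held by any rank, assuming an even split (hypothetical)."""
    return math.ceil(nq_total / n_ranks)


def array_bytes(shape, itemsize=ITEMSIZE_BYTES, overhead=OVERHEAD_FACTOR) -> float:
    """Bytes of one dense array of the given shape, scaled by the overhead factor."""
    return math.prod(shape) * itemsize * overhead


# A per-rank array whose leading axis is the per-rank q-count, not the total.
nq_rank = per_rank_q_count(nq_total=100, n_ranks=8)
estimate = array_bytes((nq_rank, 4))
```

The point of the sketch is that the leading dimension entering the byte count is the per-rank q-count, so adding ranks shrinks the per-rank estimate of distributed arrays.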

Functions

estimate_peaks(*, n_bands, nk_tot, nk_irr, ...)

Estimates the per-rank transient peak host memory (in bytes) of the fast and lean code paths of each memory-sensitive branch, split by whether each transient is distributed across the ranks of a node or built on a single rank, together with the per-rank persistent baseline.

class dgamore.memory_estimator.BranchPeak(off_distributed: float, off_single: float, on_distributed: float, on_single: float)

Bases: object

Per-rank transient peak bytes of one memory-sensitive branch, split by how the peak is distributed across the MPI ranks of a node (the persistent baseline is reported separately and is not included here).

For the node-total budget, the memory on a node with r ranks at this branch’s peak is r * (baseline + distributed) + single: a distributed transient is held by every rank simultaneously (so it scales with r), while a single-rank transient is built on one rank while the others idle (so it is counted once). Both the fast (off) and lean (on) code paths are described.

Variables:

off_distributed – Per-rank transient bytes held by every rank in the fast (flag-off) path.
off_single – Transient bytes held by a single rank in the fast (flag-off) path.
on_distributed – Per-rank transient bytes held by every rank in the lean (flag-on) path.
on_single – Transient bytes held by a single rank in the lean (flag-on) path.

Parameters:

off_distributed (float)
off_single (float)
on_distributed (float)
on_single (float)
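The node-total formula r * (baseline + distributed) + single can be written out as a short sketch. BranchPeakSketch is a stand-in with the same four fields as BranchPeak, not the real class, and node_total_bytes is a hypothetical helper; the byte values are made up for the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchPeakSketch:
    """Stand-in mirroring the four documented BranchPeak fields (illustrative)."""
    off_distributed: float
    off_single: float
    on_distributed: float
    on_single: float


def node_total_bytes(baseline: float, peak: BranchPeakSketch, r: int, lean: bool) -> float:
    """Node memory at this branch's peak: r * (baseline + distributed) + single."""
    distributed = peak.on_distributed if lean else peak.off_distributed
    single = peak.on_single if lean else peak.off_single
    # Distributed transients and the baseline are held by all r ranks;
    # the single-rank transient is counted once.
    return r * (baseline + distributed) + single


peak = BranchPeakSketch(off_distributed=4e9, off_single=8e9,
                        on_distributed=1e9, on_single=2e9)
fast_total = node_total_bytes(2e9, peak, r=4, lean=False)  # 4 * (2e9 + 4e9) + 8e9
lean_total = node_total_bytes(2e9, peak, r=4, lean=True)   # 4 * (2e9 + 1e9) + 2e9
```

With these illustrative numbers the fast path needs 32e9 bytes on the node and the lean path 14e9 bytes.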

dgamore.memory_estimator.estimate_peaks(*, n_bands: int, nk_tot: int, nk_irr: int, niw_core: int, niv_core: int, niv_full: int, niv_cut: int, niv_pp: int, n_ranks: int, with_eliashberg: bool, save_fq: bool = False, construct_fq_cheap: bool = False, overhead: float = 1.1) → tuple[float, dict[str, BranchPeak]]

The returned dict maps a branch key to a BranchPeak; the branch keys mirror the save_memory_for_* switches: "chi0q", "chiq_aux", "sde" are always present; "fq" and "lanczos" are added only when with_eliashberg is True. The first tuple element is the per-rank persistent baseline (the replicated full-grid Green’s function and self-energies that stay live throughout the non-local routine); the caller adds it to the node total (every rank holds it). For a node with r ranks the memory at a branch’s peak is r * (baseline + distributed) + single.

Parameters:

n_bands (int) – Number of bands \(B\).
nk_tot (int) – Total number of momentum points (full BZ).
nk_irr (int) – Number of momentum points in the irreducible BZ.
niw_core (int) – Number of positive bosonic core frequencies.
niv_core (int) – Number of positive fermionic core frequencies.
niv_full (int) – Number of positive fermionic full-region frequencies.
niv_cut (int) – Number of positive fermionic frequencies at which the full-grid giwk_full is kept through the kernel/SDE section, i.e. min(niw_core + niv_full + 10, niv_dmft) in dgamore.nonlocal_sde.calculate_self_energy_q(). The SDE self-energy contraction needs the shell window, so giwk is not shrunk to the core box here.
niv_pp (int) – Number of positive fermionic frequencies of the pp (Eliashberg) box.
n_ranks (int) – Number of MPI ranks the q-points are distributed over.
with_eliashberg (bool) – Whether the Eliashberg step runs (adds the "fq" and "lanczos" branches).
save_fq (bool) – Whether the full ladder vertex is kept in the full ph box (config.eliashberg.save_fq); when True the per-rank fq accumulator spans the full [wn, vc, vc] block instead of the small pp box.
construct_fq_cheap (bool) – Whether the fq per-q blocks are built on the smaller pp frequency box (config.eliashberg.construct_fq_cheap), shrinking every per-q two-fermion block from vc to vpp.
overhead (float) – Global multiplicative factor accounting for un-modeled transient arrays.
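The expression quoted for niv_cut can be evaluated before calling the estimator. A minimal sketch follows; the helper name niv_cut_from is hypothetical, and niv_dmft is assumed to be the number of positive fermionic frequencies of the DMFT input.

```python
def niv_cut_from(niw_core: int, niv_full: int, niv_dmft: int) -> int:
    """Evaluate min(niw_core + niv_full + 10, niv_dmft), as quoted for niv_cut."""
    return min(niw_core + niv_full + 10, niv_dmft)


# The shell window fits inside the DMFT grid, so the first term wins.
wide_grid = niv_cut_from(niw_core=30, niv_full=40, niv_dmft=200)
# The DMFT grid is the limiting size here.
narrow_grid = niv_cut_from(niw_core=30, niv_full=40, niv_dmft=60)
```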

Returns:

A tuple (baseline_bytes, peaks) of the per-rank baseline and a dict mapping each branch key to its BranchPeak.

Return type:

tuple[float, dict[str, BranchPeak]]
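One way a caller might consume the returned (baseline_bytes, peaks) tuple is sketched below. The policy shown (switch a branch to the lean path only when its fast path would exceed a node budget) is an assumption, not the driver's documented behaviour. PeakSketch, choose_lean_flags, ranks_per_node and node_budget_bytes are hypothetical names, and the numbers are invented rather than produced by estimate_peaks.

```python
from typing import NamedTuple


class PeakSketch(NamedTuple):
    """Stand-in with the four documented BranchPeak fields (illustrative only)."""
    off_distributed: float
    off_single: float
    on_distributed: float
    on_single: float


def choose_lean_flags(baseline, peaks, ranks_per_node, node_budget_bytes):
    """Return {branch key: True if the lean path should be used}."""
    flags = {}
    for key, p in peaks.items():
        # Node memory of the fast path: r * (baseline + distributed) + single.
        fast_total = ranks_per_node * (baseline + p.off_distributed) + p.off_single
        flags[key] = fast_total > node_budget_bytes
    return flags


# Invented values in the documented return shape.
baseline = 1e9
peaks = {
    "chi0q": PeakSketch(2e9, 0.0, 1e9, 0.0),
    "chiq_aux": PeakSketch(6e9, 4e9, 2e9, 1e9),
    "sde": PeakSketch(1e9, 0.0, 5e8, 0.0),
}
flags = choose_lean_flags(baseline, peaks, ranks_per_node=4, node_budget_bytes=16e9)
```

Here only "chiq_aux" exceeds the 16e9-byte budget on its fast path (4 * 7e9 + 4e9 = 32e9), so it is the only branch flagged for the lean path.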