Why even bother with code-based lineage?

Most lineage solutions are runtime-centric: they capture lineage during pipeline execution, whether that means tracking Spark transformations, Airflow DAGs, or warehouse SQL queries. This gives excellent visibility into active pipelines, but it also creates blind spots, particularly for rarely executed or complex code paths. For example:

* Restore jobs triggered only during incidents, or computations that run once a year
* Full refreshes of dbt models
* Dashboards that are infrequently accessed

In all of these cases, runtime lineage cannot provide the full picture, because those pipelines run too rarely to emit runtime lineage on a regular basis.

To fill this gap, lineage extracted directly from code becomes critical: it gives organizations a comprehensive view of their data lineage regardless of how often a pipeline executes.

Interested in checking out Foundational’s code-based lineage? DM me or drop your details here: https://lnkd.in/d5FpfvRa
hi! i think “prospective lineage” vs “retrospective lineage” is also a useful framing. code-based is good too but for different purposes. anyway, love it, I’ll dm you