Barak Fargoun’s Post

Why even bother with code-based lineage?

Most lineage solutions are runtime-centric: they capture lineage during pipeline execution, whether by tracking Spark transformations, Airflow DAGs, or warehouse SQL queries. While this provides excellent visibility into active pipelines, it also creates blind spots, particularly for rarely executed or complex code paths. For example:

* Restore jobs triggered only during incidents or annual computations
* Full refreshes of dbt models
* Dashboards that are infrequently accessed

In all these cases, runtime lineage cannot provide the full picture, because those pipelines rarely run and therefore rarely emit runtime lineage. To fill this gap, lineage extracted directly from code becomes critical: it creates a comprehensive view that keeps organizations fully informed of their data lineage regardless of how often a pipeline executes.

Interested in checking out Foundational’s code-based lineage? DM me or drop your details here: https://lnkd.in/d5FpfvRa

Michael W.

extremely honest dwh expert soon to be obviated by LLMs

3w

hi! i think “prospective lineage” vs “retrospective lineage” is also a useful framing. code-based is good too but for different purposes. anyway, love it, I’ll dm you
