A directed acyclic graph (DAG) is a directed graph with no directed cycles: starting from any node and following edges in their direction, you can never return to where you began. That single constraint — acyclicity — is what makes DAGs one of the most useful structures in computing. It guarantees a consistent ordering of nodes and lets recursive definitions over the graph terminate.

DAGs generalise both sequences (a chain is a DAG where every node has at most one predecessor and one successor) and trees (a tree is a DAG where every node has at most one parent). A DAG relaxes the tree’s single-parent rule: a node may be reachable along several distinct paths, as long as none of those paths loops back.

Formal Definition#

A DAG is a pair G = (V, E) where V is a set of vertices and E ⊆ V × V is a set of directed edges, such that there is no sequence of edges v₀ → v₁ → … → vₖ with v₀ = vₖ and k ≥ 1.

Two derived notions recur constantly:

  • A source is a vertex with no incoming edges.
  • A sink is a vertex with no outgoing edges.

Every non-empty DAG has at least one source and at least one sink — a fact that follows directly from acyclicity and is the basis of most DAG algorithms.

Why Acyclicity Matters#

The defining property is not cosmetic. It buys three things at once:

  • A topological order exists. A graph admits a topological ordering — a linear arrangement of vertices where every edge points “forward” — if and only if it is acyclic. This is the formal sense in which a DAG encodes a set of “X must come before Y” constraints.
  • Recursion over the graph terminates. Any function defined as “compute something for a node in terms of its neighbours” is well-defined on a DAG. The same recurrence on a cyclic graph either fails to terminate or has no fixed point. The height/depth recurrence is the canonical example.
  • Dynamic programming applies. Because there are no cycles, each subproblem is computed once and memoised, giving linear-time solutions to problems (longest path, reachability) that are NP-hard on general graphs.

A practical corollary: running the height/depth recurrence and watching for non-termination (a node that depends on itself transitively) is a standard way to detect an illegal cycle in what was supposed to be a DAG.

Uniqueness of the Ordering#

“Does a DAG have a single, unambiguous ordering of its nodes?” splits into two separate questions.

As a partial order, yes — there is exactly one. A DAG defines a single partial order: its reachability relation (the transitive closure). Whenever a path runs from u to w, “u before w” is fixed. That structure is intrinsic and unambiguous. What it does not do is order nodes with no path between them — they are incomparable.

As a linear order, there is always at least one, and usually many. A topological sort flattens the partial order into a sequence that respects every edge:

  • Existence is guaranteed (and equivalent to acyclicity).
  • Uniqueness holds if and only if the DAG contains a directed Hamiltonian path — one path threading through every node. Equivalently, the partial order is already total (a single chain) with no incomparable pairs left to reorder.
  • Otherwise the ordering is not unique. Independent nodes can be sequenced in any relative order, so there are multiple — often exponentially many — valid topological orders. Counting them (the linear extensions of the partial order) is #P-complete in general.

A direct test falls out of Kahn’s algorithm: the order is unique exactly when the set of in-degree-zero (“ready”) nodes has size one at every step. The first time two nodes are simultaneously ready, you have a real choice — and therefore more than one valid ordering.

This is the same incomparability that surfaces in the height/depth recurrence: the per-node level is unique, so the layering is unambiguous, but nodes sharing a level are incomparable and the order chosen among them is arbitrary. The layering is canonical; flattening it into one sequence generally is not.

Common Operations#

Operation What it computes Typical use
Topological sort A valid linear order respecting all edges Build/task scheduling
Longest path The critical path; linear-time on a DAG Project scheduling (PERT/CPM)
Height / depth Per-node distance to the nearest sink / source Layering, scheduling slack
Transitive reduction Smallest edge set with the same reachability Minimal dependency graphs
Transitive closure All reachable pairs Reachability queries

Where DAGs Show Up#

DAGs are the implicit data model behind a surprising amount of infrastructure:

  • Build systems — Make, Bazel, and Ninja model targets and their prerequisites as a DAG, then topologically order the work.
  • Task and dataflow schedulers — Airflow, Spark, and TensorFlow graphs are DAGs of operations.
  • Version control — a Git commit history is a DAG (merges give a commit two parents); git log --graph renders it.
  • Content addressing — Merkle DAGs underpin IPFS and Git’s object store; a blockchain is the degenerate linear case.
  • Distributed ledgers — IOTA’s Tangle replaced the linear chain with a transaction DAG; modern BFT consensus protocols such as Mysticeti are also DAG-structured.
  • Compilers — expression DAGs share common subexpressions; SSA form and dependency analysis are DAG problems.
  • Probabilistic models — a Bayesian network is a DAG of conditional dependencies.

Wiki Pages#

Height and Depth in a DAG