A directed acyclic graph (DAG) is a directed graph with no directed cycles: starting from any node and following edges in their direction, you can never return to where you began. That single constraint — acyclicity — is what makes DAGs one of the most useful structures in computing. It guarantees a consistent ordering of nodes and lets recursive definitions over the graph terminate.
DAGs generalise both sequences (a chain is a DAG where every node has at most one predecessor and one successor) and trees (a tree is a DAG where every node has at most one parent). A DAG relaxes the tree’s single-parent rule: a node may be reachable along several distinct paths, as long as none of those paths loops back.
Formal Definition#
A DAG is a pair G = (V, E) where V is a set of vertices and E ⊆ V × V is a set of directed edges, such that there is no sequence of edges v₀ → v₁ → … → vₖ with v₀ = vₖ and k ≥ 1.
Two derived notions recur constantly:
- A source is a vertex with no incoming edges.
- A sink is a vertex with no outgoing edges.
Every non-empty DAG has at least one source and at least one sink — a fact that follows directly from acyclicity and is the basis of most DAG algorithms.
Why Acyclicity Matters#
The defining property is not cosmetic. It buys three things at once:
- A topological order exists. A graph admits a topological ordering — a linear arrangement of vertices where every edge points “forward” — if and only if it is acyclic. This is the formal sense in which a DAG encodes a set of “X must come before Y” constraints.
- Recursion over the graph terminates. Any function defined as “compute something for a node in terms of its neighbours” is well-defined on a DAG. The same recurrence on a cyclic graph either fails to terminate or has no fixed point. The height/depth recurrence is the canonical example.
- Dynamic programming applies. Because there are no cycles, each subproblem is computed once and memoised, giving linear-time solutions to problems (longest path, reachability) that are NP-hard on general graphs.
A practical corollary: running the height/depth recurrence and watching for non-termination (a node that depends on itself transitively) is a standard way to detect an illegal cycle in what was supposed to be a DAG.
Uniqueness of the Ordering#
“Does a DAG have a single, unambiguous ordering of its nodes?” splits into two separate questions.
As a partial order, yes — there is exactly one. A DAG defines a single partial order: its reachability relation (the transitive closure). Whenever a path runs from u to w, “u before w” is fixed. That structure is intrinsic and unambiguous. What it does not do is order nodes with no path between them — they are incomparable.
As a linear order, there is always at least one, and usually many. A topological sort flattens the partial order into a sequence that respects every edge:
- Existence is guaranteed (and equivalent to acyclicity).
- Uniqueness holds if and only if the DAG contains a directed Hamiltonian path — one path threading through every node. Equivalently, the partial order is already total (a single chain) with no incomparable pairs left to reorder.
- Otherwise the ordering is not unique. Independent nodes can be sequenced in any relative order, so there are multiple — often exponentially many — valid topological orders. Counting them (the linear extensions of the partial order) is #P-complete in general.
A direct test falls out of Kahn’s algorithm: the order is unique exactly when the set of in-degree-zero (“ready”) nodes has size one at every step. The first time two nodes are simultaneously ready, you have a real choice — and therefore more than one valid ordering.
This is the same incomparability that surfaces in the height/depth recurrence: the per-node level is unique, so the layering is unambiguous, but nodes sharing a level are incomparable and the order chosen among them is arbitrary. The layering is canonical; flattening it into one sequence generally is not.
Common Operations#
| Operation | What it computes | Typical use |
|---|---|---|
| Topological sort | A valid linear order respecting all edges | Build/task scheduling |
| Longest path | The critical path; linear-time on a DAG | Project scheduling (PERT/CPM) |
| Height / depth | Per-node distance to the nearest sink / source | Layering, scheduling slack |
| Transitive reduction | Smallest edge set with the same reachability | Minimal dependency graphs |
| Transitive closure | All reachable pairs | Reachability queries |
Where DAGs Show Up#
DAGs are the implicit data model behind a surprising amount of infrastructure:
- Build systems — Make, Bazel, and Ninja model targets and their prerequisites as a DAG, then topologically order the work.
- Task and dataflow schedulers — Airflow, Spark, and TensorFlow graphs are DAGs of operations.
- Version control — a Git commit history is a DAG (merges give a commit two parents);
git log --graphrenders it. - Content addressing — Merkle DAGs underpin IPFS and Git’s object store; a blockchain is the degenerate linear case.
- Distributed ledgers — IOTA’s Tangle replaced the linear chain with a transaction DAG; modern BFT consensus protocols such as Mysticeti are also DAG-structured.
- Compilers — expression DAGs share common subexpressions; SSA form and dependency analysis are DAG problems.
- Probabilistic models — a Bayesian network is a DAG of conditional dependencies.
External Links#
- Wikipedia: Directed acyclic graph
- Wikipedia: Topological sorting
- CLRS, Introduction to Algorithms — DAG-shortest-paths and topological sort (Ch. 22, 24)