Weighted edit distance computation: Strings, trees, and Dyck

D Das, J Gilbert, MT Hajiaghayi, T Kociumaka… - Proceedings of the 55th …, 2023 - dl.acm.org
Proceedings of the 55th Annual ACM Symposium on Theory of Computing, 2023dl.acm.org
Given two strings of length n over alphabet Σ, and an upper bound k on their edit distance,
the algorithm of Myers (Algorithmica'86) and Landau and Vishkin (JCSS'88) from almost
forty years back computes the unweighted string edit distance in O (n+ k 2) time. To date, it
remains the fastest algorithm for exact edit distance computation, and it is optimal under the
Strong Exponential Hypothesis (Backurs and Indyk; STOC'15). Over the years, this result has
inspired many developments, including fast approximation algorithms for string edit distance …
Given two strings of length n over alphabet Σ, and an upper bound k on their edit distance, the algorithm of Myers (Algorithmica’86) and Landau and Vishkin (JCSS’88) from almost forty years back computes the unweighted string edit distance in O(n+k2) time. To date, it remains the fastest algorithm for exact edit distance computation, and it is optimal under the Strong Exponential Hypothesis (Backurs and Indyk; STOC’15). Over the years, this result has inspired many developments, including fast approximation algorithms for string edit distance as well as similar Õ(n+poly(k))-time algorithms for generalizations to tree and Dyck edit distances. Surprisingly, all these results hold only for unweighted instances.
While unweighted edit distance is theoretically fundamental, almost all real-world applications require weighted edit distance, where different weights are assigned to different edit operations (insertions, deletions, and substitutions), and the weights may vary with the characters being edited. Given a weight function w : Σ∪{ε} × Σ∪{ε} → ℝ≥ 0 (such that w(a,a) = 0 and w(a,b) ≥ 1 for all a, b∈ Σ∪{ε} with ab), the goal is to find an alignment that minimizes the total weight of edits. Except for the vanilla O(n2)-time dynamic-programming algorithm and its almost trivial O(nk)-time implementation (k being an upper bound on the sought total weight), none of the aforementioned developments on the unweighted edit distance applies to the weighted variant.
In this paper, we propose the first O(n+poly(k))-time algorithm that computes the weighted string edit distance exactly, thus bridging a fundamental decades-old gap between our understanding of unweighted and weighted edit distance. We then generalize this result to the weighted tree and Dyck edit distances, bringing in several new techniques, which lead to a deterministic algorithm that improves upon the previous work even for unweighted tree edit distance. Given how fundamental weighted edit distance is, we believe our O(n+poly(k))-time algorithm will be instrumental for further significant developments in the area.
ACM Digital Library