<!doctype html><title>Approaches to Compiler Pattern Matching</title><meta charset=utf-8><meta content="width=device-width,initial-scale=1" name=viewport><link href=res/favicon.png rel=icon sizes=512x512><link href=res/favicon.png rel=image_src type=image/png><link title="alexs168's blog" href=atom.xml rel=alternate type=application/atom+xml><body><style>@font-face{font-family:DejaVu Sans Mono;src:local(DejaVu Sans Mono),url(res/DejaVuSansMono.woff2)format("woff2"),local(Courier New),local(Courier),local(monospace);font-weight:400;font-style:normal;font-display:swap}body{font-family:DejaVu Sans Mono;font-size:10pt}td{vertical-align:top;width:100%;display:inline}h1,h2,h3,h4{margin-top:1%;margin-bottom:.75%}p{margin-top:.75%;margin-bottom:.75%}ul{margin-top:0%}.current{font-weight:700}pre{margin-top:0;margin-bottom:0;display:inline}a,a:visited{color:#3f51b5;text-decoration:none}</style><div><p><br><h1>Approaches to pattern matching in compilers</h1><p><span style=font-size:9pt><p>Git revision <a href=https://github.com/alex-s168/website/tree/34fd6adb3c89bef3f2d18b06b24533b52641bf4a>#34fd6adb</a><p><br>Modified at 19. August 2025 09:55<p>Written by <a href=https://alex.vxcc.dev>alex_s168</a></span></div><div><p><br><span style=text-decoration:underline><h2>Introduction</h2></span> Compilers often have to deal with pattern matching and rewriting (find-and-replace) inside the compiler IR (intermediate representation).<p>Common use cases for pattern matching in compilers:<ul><li>“peephole optimizations”: the most common kind of optimization in compilers. They find a short sequence of code and replace it with some other code. 
For example, replacing <span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><code style=white-space:pre-wrap>x <span style=color:#d73a49>&</span> (<span style=color:#b60157>1</span> <span style=color:#d73a49><<</span> b)</code></span> with a bit test operation.<li>finding a sequence of operations for complex optimization passes to operate on: advanced compilers have complex optimizations that can’t really be performed with simple IR operation replacements, and instead require complex logic. Patterns are used here to find operation sequences where those optimizations are applicable, and also to extract details inside that sequence.<li>code generation: converting the IR to machine code / VM bytecode. A compiler needs to find operations (or sequences of operations) inside the IR, and “replace” them with machine code.</ul></div><div><p><br><span style=text-decoration:underline><h2>Simplest Approach</h2></span> Currently, most compilers do this inside their source code.
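<p>To illustrate how verbose this gets, here is a minimal Python sketch of hand-written matching for the bit-test pattern from above. The tuple/dataclass IR and the function names are made up for illustration, not any real compiler’s API:

```python
from dataclasses import dataclass

# A made-up, minimal IR: operations are trees of Op nodes and Const leaves.
@dataclass
class Const:
    value: int

@dataclass
class Op:
    name: str        # e.g. "and", "shl", "bit_test"
    operands: tuple  # nested Op / Const nodes

def rewrite_bit_test(node):
    """Hand-written match for `x & (1 << b)` -> `bit_test(x, b)`."""
    if isinstance(node, Op) and node.name == "and":
        lhs, rhs = node.operands
        # `and` is commutative, so both operand orders must be checked --
        # exactly the kind of detail that is easy to forget in manual code
        for x, shift in ((lhs, rhs), (rhs, lhs)):
            if (isinstance(shift, Op) and shift.name == "shl"
                    and isinstance(shift.operands[0], Const)
                    and shift.operands[0].value == 1):
                return Op("bit_test", (x, shift.operands[1]))
    return node  # no match: leave the node unchanged

ir = Op("and", (Op("var", ("x",)), Op("shl", (Const(1), Op("var", ("b",))))))
print(rewrite_bit_test(ir).name)  # "bit_test"
```

<p>Even this tiny pattern needs several nested checks, and details like the commuted operand order are easy to get wrong, which is exactly why this approach doesn’t scale.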
For example, in MLIR, most (but not all) pattern matches are performed in C++ code.<p>The only advantage of this approach is that it doesn’t require a complex pattern matching system.<p>I only recommend doing this for small toy compiler projects.</div><div><p><br><span style=text-decoration:underline><h3>Disadvantages</h3></span> Doing pattern matching this way has many disadvantages.<p><br>Some (but not all):<ul><li>debugging pattern match rules can be hard<li>IR rewrites need to be tracked manually (for debugging)<li>source locations and debug information also need to be tracked manually, which often isn’t implemented very well.<li>verbose and barely readable pattern matching code<li>overall error-prone</ul><p>I myself did pattern matching this way in my old compiler backend, and I speak from experience when I say that this approach <strong>sucks</strong> (in most cases).</div><div><p><br><span style=text-decoration:underline><h2>Pattern Matching DSLs</h2></span> A custom language for describing IR patterns and IR transformations (aka rewrites).<p>I will put this into the category of “structured pattern matching”.</div><div><p><br>An example is Cranelift’s ISLE DSL:<div style=margin-top:4pt><span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><pre><code><span style=color:#8a8a8a>;;</span><span style=color:#8a8a8a> x ^ x == 0.</span><br>(<span style=color:#4b69c6>rule</span> (<span style=color:#4b69c6>simplify</span> (<span style=color:#4b69c6>bxor</span> (<span style=color:#4b69c6>ty_int</span> ty) x x))<br> (<span style=color:#4b69c6>subsume</span> (<span style=color:#4b69c6>iconst_u</span> ty <span style=color:#b60157>0</span>)))</code></pre></span></div></div><div><p><br>Another example is tinygrad’s pattern system:<div style=margin-top:4pt><span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><pre><code>(<span style=color:#4b69c6>UPat</span>(Ops.AND, src<span 
style=color:#d73a49>=</span>(<br> UPat.<span style=color:#4b69c6>var</span>(<span style=color:#298e0d>"</span><span style=color:#298e0d>x</span><span style=color:#298e0d>"</span>),<br> <span style=color:#4b69c6>UPat</span>(Ops.SHL, src<span style=color:#d73a49>=</span>(<br> UPat.<span style=color:#4b69c6>const</span>(<span style=color:#b60157>1</span>),<br> UPat.<span style=color:#4b69c6>var</span>(<span style=color:#298e0d>"</span><span style=color:#298e0d>b</span><span style=color:#298e0d>"</span>)))),<br> <span style=color:#d73a49>lambda</span> x,b: <span style=color:#4b69c6>UOp</span>(Ops.BIT_TEST, src<span style=color:#d73a49>=</span>(x, b)))</code></pre></span></div><p>Fun fact: tinygrad actually decompiles the Python code inside the second element of the pair, and runs multiple optimization passes on that.</div><div><br>This approach is used by many popular compilers such as LLVM, GCC, and Cranelift for peephole optimizations and code generation.</div><div><p><br><span style=text-decoration:underline><h3>Advantages</h3></span><ul><li><strong>debugging and tracking of rewrites, source locations, and debug information can be done properly</strong><li>patterns themselves can be inspected and modified programmatically.<li>they are easier to use and read than manual pattern matching in the source code.</ul><p><br>There is, however, an even better alternative:</div><div><p><br><span style=text-decoration:underline><h2>Pattern Matching Dialects</h2></span> I will also put this method into the category of “structured pattern matching”.<p><br>The main example of this is MLIR, with the <span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><code style=white-space:pre-wrap>pdl</code></span> and the <span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><code style=white-space:pre-wrap>transform</code></span> dialects. 
Sadly, few projects use these dialects, and instead do pattern matching in C++ code, probably because the dialects aren’t documented very well.</div><div><p><br><span style=text-decoration:underline><h3>What are compiler dialects?</h3></span> Modern compilers, especially multi-level compilers such as MLIR, have their operations grouped into “dialects”.<p>Each dialect either represents specific kinds of operations, like arithmetic operations, or a specific backend’s/frontend’s operations, such as the <span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><code style=white-space:pre-wrap>llvm</code></span>, <span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><code style=white-space:pre-wrap>emitc</code></span>, and the <span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><code style=white-space:pre-wrap>spirv</code></span> dialects in MLIR.<p>Dialects commonly contain operations and data types, as well as optimization and dialect conversion passes.</div><div><br><span style=text-decoration:underline><h3>Core Concept</h3></span> The IR patterns and transformations are represented using the compiler’s IR. 
This is mostly done in a separate dialect, with dedicated operations for operating on IR.</div><div><p><br><span style=text-decoration:underline><h3>Examples</h3></span> MLIR’s <span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><code style=white-space:pre-wrap>pdl</code></span> dialect can be used to replace <span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><code style=white-space:pre-wrap>arith.addi</code></span> with <span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><code style=white-space:pre-wrap>my.add</code></span> like this:<div style=margin-top:4pt><span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><pre><code>pdl.pattern <span style=color:#d73a49>@replace_addi_with_my_add</span> : benefit(<span style=color:#b60157>1</span>) {<br> <span style=color:#d73a49>%arg0</span> = pdl.operand<br> <span style=color:#d73a49>%arg1</span> = pdl.operand<br> <span style=color:#d73a49>%op</span> = pdl.operation <span style=color:#298e0d>"arith.addi"</span>(<span style=color:#d73a49>%arg0</span>, <span style=color:#d73a49>%arg1</span>)<br><br> pdl.rewrite <span style=color:#d73a49>%op</span> {<br> <span style=color:#d73a49>%new_op</span> = pdl.operation <span style=color:#298e0d>"my.add"</span>(<span style=color:#d73a49>%arg0</span>, <span style=color:#d73a49>%arg1</span>) -> (<span style=color:#d73a49>%op</span>)<br> pdl.replace <span style=color:#d73a49>%op</span> with <span style=color:#d73a49>%new_op</span><br> }<br>}</code></pre></span></div></div><div><p><br><span style=text-decoration:underline><h3>Advantages</h3></span><ul><li>the pattern matching infrastructure can optimize its own patterns: the compiler can operate on patterns and rewrite rules as if they were normal operations. 
This removes the need for special infrastructure regarding pattern matching DSLs.<li>the compiler could AOT compile patterns<li>the compiler could optimize, analyze, and combine patterns to reduce compile time.<li>IR (de-)serialization infrastructure in the compiler can also be used to exchange peephole optimizations.<li>bragging rights: your compiler represents its patterns in its own IR</ul></div><div><p><br><span style=text-decoration:underline><h3>Combining with a DSL</h3></span> I recommend having a pattern matching / rewrite DSL that transpiles to pattern matching / rewrite dialect operations.<p>The advantage of this over just having a rewrite dialect is that it makes patterns even more readable (and maintainable!).</div><div><p><br><span style=text-decoration:underline><h3>E-Graphs</h3></span> <span style=white-space:nowrap><p><a href=https://en.wikipedia.org/wiki/E-graph>E-Graphs</a></span> are magical data structures that can be used to efficiently encode all possible transformations, and then select the best one.<p>An example implementation is <a href=https://egraphs-good.github.io/>egg</a>.<p>Even though E-Graphs solve most of these problems, I still recommend using a pattern matching dialect, especially in multi-level compilers: it is more flexible and more future-proof, and it still works if you decide to match some complex patterns manually.</div><div><p><br><span style=text-decoration:underline><h2>More Advantages of Structured Pattern Matching</h2></span><p><span style=text-decoration:underline><h3>Smart Pattern Matchers</h3></span> Instead of brute-forcing all peephole optimizations (of which there can be a LOT in advanced compilers), the compiler can organize all the patterns to provide more efficient matching. I haven’t investigated how to do this yet. 
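<p>One simple starting point (my own sketch, not something a specific compiler is known to do) is to index the patterns by the root operation they match, so each IR node is only tried against plausible candidates instead of every registered pattern:

```python
from collections import defaultdict

# Sketch: bucket rewrite rules by the root operation they can match.
# The tuple-based IR and the example rules are made up for illustration.
class PatternIndex:
    def __init__(self):
        self.rules = defaultdict(list)

    def add(self, root_op, rewrite):
        """`rewrite` returns a new node, or None if it doesn't apply."""
        self.rules[root_op].append(rewrite)

    def apply(self, node):
        # node is a tuple like ("add", "x", 1); node[0] is the root op,
        # so only rules registered for that op are even attempted
        for rewrite in self.rules[node[0]]:
            result = rewrite(node)
            if result is not None:
                return result
        return node

index = PatternIndex()
index.add("add", lambda n: ("inc", n[1]) if n[2] == 1 else None)  # (add x 1) => (inc x)
index.add("mul", lambda n: n[1] if n[2] == 1 else None)           # (mul x 1) => x

print(index.apply(("add", "x", 1)))  # ('inc', 'x')
```

<p>A real compiler could index on deeper structure too (operand kinds, constants), turning the pattern set into something like a decision tree.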
If you have any ideas regarding this, please <a href=https://alex.vxcc.dev>contact me.</a><p>There are other ways to speed up the pattern matching and rewrite process using this too.</div><div><br><span style=text-decoration:underline><h3>Reversible Transformations</h3></span> I don’t think any compiler currently does this. If you do know one, again, please <a href=https://alex.vxcc.dev>contact me.</a></div><div><br>Optimizing compilers typically deal with code (mostly written by people) that is on a lower level than the compiler theoretically supports. For example, humans tend to write code like this for extracting a bit: <span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><code style=white-space:pre-wrap>x <span style=color:#d73a49>&</span> (<span style=color:#b60157>1</span> <span style=color:#d73a49><<</span> b)</code></span>, but compilers tend to have a high-level bit test operation (with exceptions). One reason for having higher-level primitives is that they allow the compiler to do more high-level optimizations; another is that some target architectures have a dedicated bit test instruction, which is more optimal.</div><div><p><br>This is not just the case for “low-level” things like bit tests, but also for high-level concepts, like a reduction over an array, or even the implementation of a whole algorithm. 
For example, LLVM can now detect implementations of <a href=https://en.wikipedia.org/wiki/Cyclic_redundancy_check>CRC.</a></div><div><br>LLVM actually doesn’t have many dedicated operations like a bit-test operation, and instead canonicalizes all bit-test patterns to <span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><code style=white-space:pre-wrap>x <span style=color:#d73a49>&</span> (<span style=color:#b60157>1</span> <span style=color:#d73a49><<</span> b) <span style=color:#d73a49>!=</span> <span style=color:#b60157>0</span></code></span>, and matches that pattern in compiler passes that expect bit test operations.</div><div><br>Now let’s go back to the <span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><code style=white-space:pre-wrap>x <span style=color:#d73a49>&</span> (<span style=color:#b60157>1</span> <span style=color:#d73a49><<</span> b)</code></span> (bit test) example. Optimizing compilers should be able to detect that, and other bit test patterns (like <span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><code style=white-space:pre-wrap>x <span style=color:#d73a49>&</span> (<span style=color:#b60157>1</span> <span style=color:#d73a49><<</span> b) <span style=color:#d73a49>></span> <span style=color:#b60157>0</span></code></span>), and then replace those with a bit-test operation. But they also have to be able to convert bit-test operations back to their implementation for compilation targets that don’t have a bit-test instruction. 
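<p>The two directions of this bit-test equivalence can be sketched as a pair of rewrite rules (Python over a made-up tuple IR; both rule names are hypothetical):

```python
# The same equivalence written as two separately maintained rules,
# over a made-up tuple IR where `x & (1 << b)` is ("and", x, ("shl", 1, b)).

def raise_bit_test(node):
    """(and x (shl 1 b))  =>  (bit_test x b)  -- canonicalization."""
    if (node[0] == "and" and isinstance(node[2], tuple)
            and node[2][0] == "shl" and node[2][1] == 1):
        return ("bit_test", node[1], node[2][2])
    return node

def lower_bit_test(node):
    """(bit_test x b)  =>  (and x (shl 1 b))  -- for targets without bit test."""
    if node[0] == "bit_test":
        return ("and", node[1], ("shl", 1, node[2]))
    return node

expr = ("and", "x", ("shl", 1, "b"))
raised = raise_bit_test(expr)           # ('bit_test', 'x', 'b')
# the rules happen to be inverses of each other, but nothing enforces that:
assert lower_bit_test(raised) == expr
```

<p>Nothing in this setup guarantees that the two hand-written directions stay in sync as the compiler evolves.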
Currently, compiler backends do this by having separate patterns for converting bit-test to its dedicated operation, and back.</div><div><p><br>A better solution is to associate a set of implementations with the bit test operation, and make the compiler <strong>automatically reverse</strong> those to generate the best implementation (in the instruction selector, for example).<p>Implementing pattern/transformation reversal can be challenging, but it provides many benefits, and all “big” compilers should definitely do this, in my opinion.</div><div><p><br><span style=text-decoration:underline><h3>Runtime Library</h3></span> Compilers typically come with a runtime library that implements more complex operations that aren’t supported by most processors or architectures.<p>The implementation of those functions should also use that pattern matching dialect. This allows your backend to detect user code whose implementation is similar to the one in the runtime library, giving you some additional optimizations for free.<p>I don’t think any compiler currently does this either.</div><div><p><br><span style=text-decoration:underline><h2>Problems with Pattern Matching</h2></span> The main problem is ordering the patterns.<p>As an example, consider these three patterns:<div style=margin-top:4pt><span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><pre><code><span style=color:#8a8a8a>;;</span><span style=color:#8a8a8a> A</span><br>(<span style=color:#4b69c6>add</span> x:Const y) => (<span style=color:#4b69c6>add</span> y x)<br><br><span style=color:#8a8a8a>;;</span><span style=color:#8a8a8a> B</span><br>(<span style=color:#4b69c6>sub</span> (<span style=color:#4b69c6>add</span> x y:Const) z:Const) => (<span style=color:#4b69c6>lea</span> x y (<span style=color:#4b69c6>const_neg</span> z))<br><br><span style=color:#8a8a8a>;;</span><span style=color:#8a8a8a> C</span><br>(<span style=color:#4b69c6>add</span> x <span 
style=color:#b60157>1</span>) => (<span style=color:#4b69c6>inc</span> x)</code></pre></span></div></div><div><p><br>Now what should the compiler do when it sees this:<div style=margin-top:4pt><span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><pre><code>(<span style=color:#4b69c6>sub</span> (<span style=color:#4b69c6>add</span> <span style=color:#b60157>5</span> <span style=color:#b60157>1</span>) <span style=color:#b60157>2</span>)</code></pre></span></div></div><div><p><br>All three patterns would match:<div style=margin-top:4pt><span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><pre><code><span style=color:#8a8a8a>;;</span><span style=color:#8a8a8a> apply A</span><br>(<span style=color:#4b69c6>sub</span> (<span style=color:#4b69c6>add</span> <span style=color:#b60157>5</span> <span style=color:#b60157>1</span>) <span style=color:#b60157>2</span>) => (<span style=color:#4b69c6>sub</span> (<span style=color:#4b69c6>add</span> <span style=color:#b60157>1</span> <span style=color:#b60157>5</span>) <span style=color:#b60157>2</span>)<br><span style=color:#8a8a8a>;;</span><span style=color:#8a8a8a> only B applies now</span><br>(<span style=color:#4b69c6>sub</span> (<span style=color:#4b69c6>add</span> <span style=color:#b60157>1</span> <span style=color:#b60157>5</span>) <span style=color:#b60157>2</span>) => (<span style=color:#4b69c6>lea</span> <span style=color:#b60157>1</span> <span style=color:#b60157>5</span> (<span style=color:#4b69c6>const_neg</span> <span style=color:#b60157>2</span>))<br><span style=color:#8a8a8a>;;</span><span style=color:#8a8a8a> nothing applies anymore</span><br><br><span style=color:#8a8a8a>;;</span><span style=color:#8a8a8a> alternatively apply B</span><br>(<span style=color:#4b69c6>sub</span> (<span style=color:#4b69c6>add</span> <span style=color:#b60157>5</span> <span style=color:#b60157>1</span>) <span style=color:#b60157>2</span>) => (<span 
style=color:#4b69c6>lea</span> <span style=color:#b60157>5</span> <span style=color:#b60157>1</span> (<span style=color:#4b69c6>const_neg</span> <span style=color:#b60157>2</span>))<br><span style=color:#8a8a8a>;;</span><span style=color:#8a8a8a> nothing applies anymore</span><br><br><span style=color:#8a8a8a>;;</span><span style=color:#8a8a8a> alternatively apply C</span><br>(<span style=color:#4b69c6>sub</span> (<span style=color:#4b69c6>add</span> <span style=color:#b60157>5</span> <span style=color:#b60157>1</span>) <span style=color:#b60157>2</span>) => (<span style=color:#4b69c6>sub</span> (<span style=color:#4b69c6>inc</span> <span style=color:#b60157>5</span>) <span style=color:#b60157>2</span>)<br><span style=color:#8a8a8a>;;</span><span style=color:#8a8a8a> nothing applies anymore</span></code></pre></span></div></div><div><p><br>Now which of those transformations should be performed?<p>This is not as easy to solve as it seems, especially in the context of instruction selection (specifically scheduling), where performance depends on the whole sequence of instructions, not just a single instruction.</div><div><p><br><span style=text-decoration:underline><h3>Superscalar CPUs</h3></span> Modern processor architecture features like superscalar execution make this even more complicated.<p><br>As a simple, <strong>unrealistic</strong> example, let’s imagine a CPU (core) that has one bit operations execution unit, and two ALU execution units / ports.<br>This means that the CPU can execute two instructions in the ALU units and one instruction in the bit ops unit at the same time.</div><div><p><br>One might think that always optimizing <span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><code style=white-space:pre-wrap>a & (1 << b)</code></span> to a bit test operation is good for performance. 
But in this example, that is not the case.<p>If we have a function that does a lot of bitwise operations next to each other, and the compiler replaces all bit tests with bit test operations, suddenly all operations depend on the bit ops unit, which means that instead of executing three instructions at a time (ignoring pipelining), the CPU can only execute one instruction at a time.</div><div><p><br>This shows that we won’t know if an optimization is actually good until we are at a late point in the compilation process, where we can simulate the CPU’s instruction scheduling.<p>This applies not only to instruction selection, but also to higher-level optimizations, such as loop and control flow related optimizations.</div><div><p><br><span style=text-decoration:underline><h2>Conclusion</h2></span> One can see how pattern matching dialects are the best option to approach pattern matching.<p><br>Someone wanted me to insert a takeaway here, but I won’t.<p><br>PS: I’ll hunt down everyone who still decides to do pattern matching in their compiler source after reading this article.</div><script>var gotoVariant=(a=>{window.location.href=window.location.href.replace(/\.\w+.html/g,a)});window.addEventListener(`beforeprint`,a=>{gotoVariant(`.min.pdf`)})</script><script async src=coffee.js></script>