2025-07-04 18:57:01 +02:00
parent a454722901
commit 683ac5fe6a
3 changed files with 59 additions and 27 deletions

View File

@@ -1,6 +1,5 @@
#let alex_contact_url = "https://alex.vxcc.dev"
#let to-bool(str) = {
  if str == "true" {
    return true

View File

@@ -1,8 +1,8 @@
#import "common.typ": *
// PDFs need smaller text
#let small-font-size = if is-web { 14pt } else { 7pt }
#let default-font-size = if is-web { 17pt } else { 9pt }
#let core-page-style(content) = {[

View File

@@ -20,15 +20,15 @@
#section[
= Introduction
Compilers often have to deal with find-and-replace (pattern matching and rewriting) inside the compiler IR (intermediate representation).
Common use cases for pattern matching in compilers:
- "peephole optimizations": the most common kind of optimization in compilers.
They find a short sequence of code and replace it with some other code.
For example, replacing ```c x & (1 << b)``` with a bit test operation (see the sketch after this list).
- finding a sequence of operations for complex optimization passes to operate on:
advanced compilers have complex optimizations that can't really be performed with
simple IR operation replacements, and instead require complex logic.
Patterns are used here to find operation sequences where those optimizations
are applicable, and also to extract details inside that sequence.
- code generation: converting the IR to machine code / VM bytecode.
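\
To make the peephole case concrete, here is a minimal before/after sketch in MLIR's `arith` dialect, where `my.bt` is a made-up high-level bit test operation (not a real dialect op):
#context html-frame[```llvm
// Before: the low-level form a frontend emits for C's  x & (1 << b)
// (%x is the value, %b is the bit index)
%one  = arith.constant 1 : i32
%mask = arith.shli %one, %b : i32
%and  = arith.andi %x, %mask : i32

// After: a peephole rewrite can canonicalize this to a single high-level op
// (a real rewrite would also have to match the comparison around it)
%set = "my.bt"(%x, %b) : (i32, i32) -> i1
```]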
@@ -41,15 +41,12 @@
Currently, most compilers do this inside the compiler's source code.
For example, in MLIR, *most* pattern matches are performed in C++ code.
The only advantage to this approach is that it doesn't require a complex pattern matching system.
]
#section[
== Disadvantages
Doing pattern matching that way has many disadvantages.
\
Some (but not all) disadvantages:
@@ -59,7 +56,7 @@
- overall error-prone
I myself did pattern matching this way in my old compiler backend,
and I speak from experience when I say that this approach *sucks* (in most cases).
]
#section[
@@ -71,7 +68,7 @@
#section[
An example is Cranelift's ISLE:
#context html-frame[```lisp
;; x ^ x == 0.
(rule (simplify (bxor (ty_int ty) x x))
      (subsume (iconst_u ty 0)))
```]
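As far as I understand, `subsume` makes the rewrite replace the matched value outright in Cranelift's e-graph based mid-end, instead of just recording it as another equivalent form.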
@@ -112,15 +109,15 @@
#section[
= Pattern Matching Dialects
This section also applies to compilers that don't use dialects, but do pattern matching this way.
For example, GHC has the `RULES` pragma, which does something like this. However, I don't know what it is actually used for...
\
I will also put this method into the category of "structured pattern matching".
\
The main example of this is MLIR, with the `pdl` and the `transform` dialects.
Sadly, few projects/people use these dialects, and instead write C++ pattern matching code.
I think that is because the dialects aren't documented very well.
]
#section[
@@ -128,8 +125,8 @@
Modern compilers, especially multi-level compilers, such as MLIR,
have their operations grouped in "dialects".
Each dialect represents either a specific kind of operation, like arithmetic operations,
or a specific compilation target/backend's operations, such as the `llvm` dialect in MLIR.
Dialects commonly contain operations and data types, as well as optimization and dialect conversion passes.
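\
For illustration, here are operations from two different MLIR dialects side by side (a minimal sketch; types chosen arbitrarily):
#context html-frame[```llvm
// arith dialect: target-independent arithmetic
%sum = arith.addi %a, %b : i32
// llvm dialect: operations that mirror a specific backend, LLVM IR
%ptr = llvm.alloca %n x i32 : (i64) -> !llvm.ptr
```]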
]
@@ -137,8 +134,25 @@
#section[
== Core Concept
Instead of, or in addition to, having a separate language for pattern matching and rewrites,
the patterns and rewrites are represented in the compiler IR itself.
This is mostly done in a separate dialect, with dedicated operations for operating on compiler IR.
]
#section[
== Examples
MLIR's `pdl` dialect can be used to replace `arith.addi` with `my.add` like this:
#context html-frame[```llvm
pdl.pattern @replace_addi_with_my_add : benefit(1) {
  // match any `arith.addi`: two operands and a result type
  %type = pdl.type
  %arg0 = pdl.operand
  %arg1 = pdl.operand
  %op = pdl.operation "arith.addi"(%arg0, %arg1 : !pdl.value, !pdl.value) -> (%type : !pdl.type)
  pdl.rewrite %op {
    // create a `my.add` with the same operands and result type, then swap it in
    %new_op = pdl.operation "my.add"(%arg0, %arg1 : !pdl.value, !pdl.value) -> (%type : !pdl.type)
    pdl.replace %op with %new_op
  }
}
```]
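To actually apply such a pattern, MLIR lowers the `pdl` ops into the `pdl_interp` dialect (e.g. via the `-convert-pdl-to-pdl-interp` pass), which the pattern driver then interprets.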
]
#section[
@@ -152,6 +166,14 @@
- bragging rights: your compiler represents its own patterns in its own IR
]
#section[
== Combining with a DSL
The best way to do pattern matching is to have a pattern matching / rewrite DSL
that transpiles to pattern matching / rewrite dialect operations.
The advantage of this over just having a rewrite dialect is that it should make patterns even more readable.
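\
MLIR already ships such a DSL: PDLL, which compiles down to `pdl` dialect operations like the ones shown above.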
]
#section[
= More Advantages of Structured Pattern Matching
@@ -173,10 +195,18 @@
#section[
Optimizing compilers typically deal with code (mostly written by people)
that is on a lower level than the compiler theoretically supports.
For example, humans tend to write code like this for testing a bit: ```c x & (1 << b)```,
but compilers tend to have a high-level bit test operation (with exceptions).
A reason for having higher-level primitives is that it allows the compiler to do more high-level optimizations,
but also some target architectures have a bit test instruction that is faster.
]
// TODO! DEBUG INFORMATION
#section[
LLVM actually doesn't have many dedicated operations like a bit test operation;
instead, it canonicalizes all bit-test patterns to ```c x & (1 << b) != 0```,
and matches that form in passes that expect bit test operations.
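\
Sketched in MLIR's `arith` dialect for consistency with the earlier examples (LLVM's actual canonical form is of course LLVM IR):
#context html-frame[```llvm
// x & (1 << b) != 0, the canonical bit test shape
%one  = arith.constant 1 : i32
%mask = arith.shli %one, %b : i32
%and  = arith.andi %x, %mask : i32
%zero = arith.constant 0 : i32
%set  = arith.cmpi ne, %and, %zero : i32
```]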
]
#section[
@@ -186,7 +216,7 @@
]
#section[
Now let's go back to the ```c x & (1 << b)``` (bit test) example.
Optimizing compilers should be able to detect that pattern, and also other bit test patterns (like ```c x & (1 << b) > 0```),
and then replace those with a bit test operation.
But they also have to be able to convert bit test operations back to their implementation for targets that don't have a bit test operation.
@@ -213,6 +243,9 @@
= Conclusion
One can see how pattern matching dialects are the best option by far.
\
Someone wanted me to insert a takeaway here, but I won't.
\
PS: I'll hunt down everyone who still decides to do pattern matching in their compiler source after reading this article.
]