2025-07-04 18:57:01 +02:00
parent a454722901
commit 683ac5fe6a
3 changed files with 59 additions and 27 deletions


@@ -1,6 +1,5 @@
 #let alex_contact_url = "https://alex.vxcc.dev"
 #let to-bool(str) = {
   if str == "true" {
     return true


@@ -1,8 +1,8 @@
 #import "common.typ": *
 // pdfs need to be smaller text
-#let small-font-size = if is-web { 14pt } else { 10pt }
-#let default-font-size = if is-web { 17pt } else { 12pt }
+#let small-font-size = if is-web { 14pt } else { 7pt }
+#let default-font-size = if is-web { 17pt } else { 9pt }
 #let core-page-style(content) = {[


@@ -20,15 +20,15 @@
 #section[
 = Introduction
-Compilers often have to deal with find-and-replace inside the compiler IR (intermediate representation).
+Compilers often have to deal with find-and-replace (pattern matching and rewriting) inside the compiler IR (intermediate representation).
 Common use cases for pattern matching in compilers:
 - "peephole optimizations": the most common kind of optimization in compilers.
-  They find a short sequence of code and replace that with some other code,
-  for example replacing ```c x & 1 << b``` with a bit test.
+  They find a short sequence of code and replace it with some other code.
+  For example, replacing ```c x & (1 << b)``` with a bit test operation.
 - finding a sequence of operations for complex optimization passes to operate on:
-  advanced compilers have complex operations that can't really be performed with
-  simple IR operation replacements, and instead requires complex logic.
+  advanced compilers have complex optimizations that can't really be performed with
+  simple IR operation replacements, and instead require complex logic.
   Patterns are used here to find operation sequences where those optimizations
   are applicable, and also to extract details inside that sequence.
 - code generation: converting the IR to machine code / VM bytecode.
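To make the peephole case concrete, here is a minimal sketch of such a rewrite written by hand against a toy expression IR (every type and name below is invented for illustration, not any real compiler's internals):

```c
#include <stddef.h>

/* Toy expression IR -- all names here are invented for illustration. */
typedef enum { OP_CONST, OP_VAR, OP_SHL, OP_AND, OP_BITTEST } OpKind;

typedef struct Node {
    OpKind kind;
    long value;              /* constant value, only used by OP_CONST */
    struct Node *lhs, *rhs;
} Node;

/* Hand-written peephole: rewrite `x & (1 << b)` into a dedicated bit-test op.
 * This is the kind of ad-hoc matching code that ends up in compiler source
 * when no pattern matching system is available. */
Node *peephole_bittest(Node *n) {
    if (n->kind == OP_AND &&
        n->rhs != NULL && n->rhs->kind == OP_SHL &&
        n->rhs->lhs != NULL && n->rhs->lhs->kind == OP_CONST &&
        n->rhs->lhs->value == 1) {
        Node *b = n->rhs->rhs;   /* the shift amount */
        n->kind = OP_BITTEST;    /* n becomes bittest(x, b) */
        n->rhs = b;
    }
    return n;
}
```

Note how much of the function is null and shape checking; that part grows quickly (and gets error-prone) as patterns get bigger.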
@@ -41,15 +41,12 @@
 Currently, most compilers mostly do this inside the compiler's source code.
 For example, in MLIR, *most* pattern matches are performed in C++ code.
-The only advantage to this approach is that it will reduce compiler development time
-if the compiler only needs to match a few patterns.
+The only advantage to this approach is that it doesn't require a complex pattern matching system.
 ]
 #section[
 == Disadvantages
-Doing pattern matching inside the compiler's source code has many disadvantages.
-I strongly advertise against doing pattern matching this way.
+Doing pattern matching that way has many disadvantages.
 \
 Some (but not all) disadvantages:
@@ -59,7 +56,7 @@
 - overall error-prone
 I myself did pattern matching this way in my old compiler backend,
-and I speak from experience when I say that this approach sucks.
+and I speak from experience when I say that this approach *sucks* (in most cases).
 ]
 #section[
@@ -71,7 +68,7 @@
 #section[
 An example is Cranelift's ISLE:
-#context html-frame[```isle
+#context html-frame[```lisp
 ;; x ^ x == 0.
 (rule (simplify (bxor (ty_int ty) x x))
       (subsume (iconst_u ty 0)))
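For contrast, here is a hedged sketch of what that one-line `x ^ x -> 0` rule looks like when written by hand against a toy IR (invented names, not Cranelift's actual data structures):

```c
#include <stddef.h>

/* Toy IR node -- invented for illustration. */
typedef enum { IR_CONST, IR_VAR, IR_XOR } IrKind;

typedef struct IrNode {
    IrKind kind;
    long value;                 /* only used by IR_CONST */
    struct IrNode *lhs, *rhs;
} IrNode;

/* Hand-written equivalent of the ISLE rule above: fold `x ^ x` into the
 * constant 0 when both operands are the same value. */
void simplify_xor(IrNode *n) {
    if (n->kind == IR_XOR && n->lhs == n->rhs) {
        n->kind = IR_CONST;
        n->value = 0;
        n->lhs = n->rhs = NULL;
    }
}
```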
@@ -112,15 +109,15 @@
 #section[
 = Pattern Matching Dialects
 This section also applies to compilers that don't use dialects, but do pattern matching this way.
-For example GHC has the `RULES` pragma, which does something like this. I don't know where that is used, or if anyone even uses that...
+For example, GHC has the `RULES` pragma, which does something like this. However, I don't know what it is actually used for...
 \
-I will also put this into the category of "structured pattern matching".
+I will also put this method into the category of "structured pattern matching".
 \
 The main example of this is MLIR, with the `pdl` and the `transform` dialects.
-Sadly few projects use these dialects, and instead have C++ pattern matching code.
-One reason for this could be that they aren't documented very well.
+Sadly, few projects/people use these dialects and instead use C++ pattern matching code.
+I think that is because the dialects aren't documented very well.
 ]
 #section[
@@ -128,7 +125,7 @@
 Modern compilers, especially multi-level compilers, such as MLIR,
 have their operations grouped in "dialects".
 Each dialect represents either a specific kind of operation, like arithmetic operations,
 or a specific compilation target/backend's operations, such as the `llvm` dialect in MLIR.
 Dialects commonly contain operations, data types, as well as optimization and dialect conversion passes.
@@ -137,8 +134,25 @@
 #section[
 == Core Concept
 Instead of, or in addition to, having a separate language for pattern matching and rewrites,
-the patterns and rewrites are represented in the compiler IR.
-This is mostly done in a separate dialect.
+the IR patterns and rewrites are represented in the compiler IR itself.
+This is mostly done in a separate dialect, with dedicated operations for operating on compiler IR.
+]
+#section[
+== Examples
+MLIR's `pdl` dialect can be used to replace `arith.addi` with `my.add` like this:
+#context html-frame[```llvm
+pdl.pattern @replace_addi_with_my_add : benefit(1) {
+  %arg0 = pdl.operand
+  %arg1 = pdl.operand
+  %op = pdl.operation "arith.addi"(%arg0, %arg1)
+  pdl.rewrite %op {
+    %new_op = pdl.operation "my.add"(%arg0, %arg1) -> (%op)
+    pdl.replace %op with %new_op
+  }
+}
+```]
 ]
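Stripped of all MLIR specifics, the core idea — patterns as data that one generic driver interprets, rather than per-pattern code compiled into the compiler — can be sketched like this (all names invented for illustration):

```c
#include <string.h>

/* Toy version of the pattern-as-data idea: a rewrite pattern is a plain data
 * record, and a single generic driver applies every pattern -- there is no
 * per-pattern matching code. Names are invented for illustration. */
typedef struct {
    const char *match_name;    /* operation name to look for   */
    const char *replace_name;  /* operation name to rewrite to */
} RewritePattern;

typedef struct { const char *name; } Op;

/* Generic driver: rewrite `op` with the first pattern that matches it. */
int apply_patterns(Op *op, const RewritePattern *pats, int npats) {
    for (int i = 0; i < npats; i++) {
        if (strcmp(op->name, pats[i].match_name) == 0) {
            op->name = pats[i].replace_name;
            return 1;   /* rewritten */
        }
    }
    return 0;           /* no pattern matched */
}
```

A real system like `pdl` of course also matches operands, types, and nested structure, but the division of labor is the same: patterns are data, and one driver does the work.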
#section[ #section[
@@ -152,6 +166,14 @@
 - bragging rights: your compiler represents its own patterns in its own IR
 ]
+#section[
+== Combining with a DSL
+The best way to do pattern matching is to have a pattern matching / rewrite DSL
+that transpiles to pattern matching / rewrite dialect operations.
+The advantage of this over just having a rewrite dialect is that it should make patterns even more readable.
+]
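One way to picture that pipeline, as a hedged sketch: a one-line textual rule is parsed into a pattern record, which a transpiler would then emit as rewrite-dialect operations (the rule syntax and all names here are made up):

```c
#include <stdio.h>

/* Toy DSL front end: parse a rule like "arith.addi => my.add" into a data
 * record. A real transpiler would emit rewrite-dialect IR from this record;
 * the syntax is invented for illustration. */
typedef struct {
    char match_name[64];
    char replace_name[64];
} ParsedRule;

/* Returns 1 on success, 0 if the rule doesn't have the `lhs => rhs` shape. */
int parse_rule(const char *rule, ParsedRule *out) {
    return sscanf(rule, "%63s => %63s", out->match_name, out->replace_name) == 2;
}
```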
 #section[
 = More Advantages of Structured Pattern Matching
@@ -173,10 +195,18 @@
 #section[
 Optimizing compilers typically deal with code (mostly written by people)
 that is on a lower level than the compiler theoretically supports.
-For example, humans tend to write code like this for testing for a bit: ```c x & 1 << b```,
-but compilers tend to have a high-level bit test primitive.
+For example, humans tend to write code like this to test for a bit: ```c x & (1 << b)```,
+but compilers tend to have a high-level bit test operation (with exceptions).
 A reason for having higher-level primitives is that it allows the compiler to do more high-level optimizations,
-but also some target architectures have a bit test operation that is faster.
+but also some target architectures have a bit test operation that is more efficient.
+]
+// TODO! DEBUG INFORMATION
+#section[
+LLVM actually doesn't have many dedicated operations like a bit test operation;
+instead it canonicalizes all bit test patterns to ```c (x & (1 << b)) != 0```,
+and matches for that in passes that expect bit test operations.
 ]
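The equivalence this kind of canonicalization relies on is easy to check exhaustively over small inputs; the helper names below are made up for illustration:

```c
/* Three common ways people spell "test bit b of x". A canonicalizing compiler
 * rewrites all of them into one canonical shape, so later passes only need to
 * match that single form. Helper names are invented for illustration. */
int bittest_and_shl(unsigned x, unsigned b) { return (x & (1u << b)) != 0; }
int bittest_shr(unsigned x, unsigned b)     { return (x >> b) & 1u; }
int bittest_gt(unsigned x, unsigned b)      { return (x & (1u << b)) > 0; }
```

Note the extra parentheses around the `&`: in C, `!=` and `>` bind tighter than `&`, so the unparenthesized spelling would compute something else entirely.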
 #section[
@@ -186,7 +216,7 @@
 ]
 #section[
-Now let's go back to the ```c x & 1 << b``` (bit test) example.
+Now let's go back to the ```c x & (1 << b)``` (bit test) example.
 Optimizing compilers should be able to detect that pattern, and also other bit test patterns (like ```c (x & (1 << b)) > 0```),
 and then replace those with a bit test operation.
 But they also have to be able to convert bit test operations back to their implementation for targets that don't have a bit test operation.
@@ -213,6 +243,9 @@
 = Conclusion
 One can see how pattern matching dialects are the best option by far.
+\
+Someone wanted me to insert a takeaway here, but I won't.
+\
 PS: I'll hunt down everyone who still decides to do pattern matching in their compiler source after reading this article.
 ]