// mirror of https://github.com/alex-s168/website.git (synced 2025-09-10 09:05:08 +02:00)
// …
#section[
= Introduction
Compilers often have to deal with find-and-replace (pattern matching and rewriting) inside the compiler IR (intermediate representation).

Common use cases for pattern matching in compilers:
- "peephole optimizations": the most common kind of optimization in compilers.
  They find a short sequence of code and replace it with some other code.
  For example, replacing ```c x & (1 << b)``` with a bit test operation.
- finding a sequence of operations for complex optimization passes to operate on:
  advanced compilers have complex optimizations that can't really be performed with
  simple IR operation replacements, and instead require complex logic.
  Patterns are used here to find operation sequences where those optimizations
  are applicable, and also to extract details inside that sequence.
- code generation: converting the IR to machine code / VM bytecode.
// …
Currently, most compilers do this directly in the compiler's source code.
For example, in MLIR, *most* pattern matches are performed in C++ code.

The only advantage to this approach is that it doesn't require a complex pattern matching system.
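
To make "pattern matching inside the compiler's source code" concrete, here is a minimal sketch in C of a hand-written matcher for the bit test pattern from the introduction. The IR (`Node`, `OpKind`) and the functions `match_bittest` and `rewrite_bittest` are made up for illustration and are not taken from any real compiler. To keep the rewrite value-preserving, it matches the comparison form `(x & (1 << b)) != 0` and turns it into a hypothetical `OP_BITTEST` node.

#context html-frame[
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal expression IR, not taken from any real compiler. */
typedef enum { OP_VAR, OP_CONST, OP_AND, OP_SHL, OP_NE, OP_BITTEST } OpKind;

typedef struct Node {
    OpKind kind;
    long value;               /* constant for OP_CONST, variable id for OP_VAR */
    struct Node *lhs, *rhs;   /* operands; NULL for leaves */
} Node;

static int is_const(const Node *n, long v) {
    return n != NULL && n->kind == OP_CONST && n->value == v;
}

/* Hand-written matcher for `(x & (1 << b)) != 0`. Every structural check,
   including both operand orders of the commutative `&`, is spelled out in C. */
static int match_bittest(const Node *n, Node **x, Node **b) {
    assert(n != NULL);
    if (n->kind != OP_NE || !is_const(n->rhs, 0)) return 0;
    const Node *masked = n->lhs;
    if (masked == NULL || masked->kind != OP_AND) return 0;
    for (int swap = 0; swap < 2; swap++) {
        Node *val = swap ? masked->rhs : masked->lhs;
        Node *shl = swap ? masked->lhs : masked->rhs;
        if (shl->kind == OP_SHL && is_const(shl->lhs, 1)) {
            *x = val;
            *b = shl->rhs;
            return 1;
        }
    }
    return 0;
}

/* Rewrites a matching node in place into BITTEST(x, b). Returns 1 on success. */
static int rewrite_bittest(Node *n) {
    Node *x, *b;
    if (!match_bittest(n, &x, &b)) return 0;
    n->kind = OP_BITTEST;
    n->lhs = x;
    n->rhs = b;
    return 1;
}
```
]

Even this single rule has to spell out every structural check, including both operand orders of `&`, and every further bit test spelling would need another function like this one.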
]

#section[
== Disadvantages
Doing pattern matching inside the compiler's source code has many disadvantages.
I strongly advise against doing pattern matching this way.

\
Some (but not all) disadvantages:
// …
- overall error-prone

I myself did pattern matching this way in my old compiler backend,
and I speak from experience when I say that this approach *sucks* (in most cases).
]

#section[
// …

#section[
An example is Cranelift's ISLE:
#context html-frame[```lisp
;; x ^ x == 0.
(rule (simplify (bxor (ty_int ty) x x))
      (subsume (iconst_u ty 0)))
;; …
#section[
= Pattern Matching Dialects
This section also applies to compilers that don't use dialects, but do pattern matching this way.
For example, GHC has the `RULES` pragma, which does something like this. However, I don't know what it is actually used for in practice.

\
I will also put this method into the category of "structured pattern matching".

\
The main example of this is MLIR, with the `pdl` and the `transform` dialects.
Sadly, few projects/people use these dialects, and instead write their pattern matching code in C++.
I think that is because the dialects aren't documented very well.
]

#section[
// …
Modern compilers, especially multi-level compilers such as MLIR,
have their operations grouped in "dialects".

Each dialect represents either a specific kind of operation, like arithmetic operations,
or a specific compilation target/backend's operations, such as the `llvm` dialect in MLIR.

Dialects commonly contain operations and data types, as well as optimization and dialect conversion passes.
]
// …
#section[
== Core Concept
Instead of, or in addition to, having a separate language for pattern matching and rewrites,
the patterns and rewrites are themselves represented in the compiler IR.
This is mostly done in a separate dialect, with dedicated operations for operating on compiler IR.
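
A heavily simplified sketch in C of the core idea: instead of writing one matcher function per pattern, patterns are stored as data using the same node type as the IR, and a single generic `match` function interprets any of them. `PAT_CAPTURE` leaves bind sub-expressions to numbered slots, and a capture used twice must bind the same node (like the `x x` in the ISLE rule). All names are hypothetical; in a real compiler the patterns would be operations of a dedicated dialect, not C structs.

#context html-frame[
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tiny IR. Patterns reuse the same node type plus PAT_CAPTURE. */
typedef enum { OP_VAR, OP_CONST, OP_AND, OP_SHL, PAT_CAPTURE } OpKind;

typedef struct Node {
    OpKind kind;
    long value;   /* constant value, variable id, or capture slot */
    const struct Node *lhs, *rhs;
} Node;

#define MAX_CAPTURES 4

/* Generic matcher: walks a pattern tree and an IR tree together.
   Nothing in here is specific to any one pattern.
   On failure, `caps` may be partially filled, so reset it before reuse. */
static int match(const Node *pat, const Node *ir, const Node *caps[MAX_CAPTURES]) {
    if (pat == NULL || ir == NULL) return pat == ir;
    if (pat->kind == PAT_CAPTURE) {
        assert(pat->value >= 0 && pat->value < MAX_CAPTURES);
        const Node **slot = &caps[pat->value];
        /* A capture used twice (like `x x`) must bind the same node. */
        if (*slot != NULL && *slot != ir) return 0;
        *slot = ir;
        return 1;
    }
    if (pat->kind != ir->kind) return 0;
    if ((pat->kind == OP_CONST || pat->kind == OP_VAR) && pat->value != ir->value)
        return 0;
    return match(pat->lhs, ir->lhs, caps) && match(pat->rhs, ir->rhs, caps);
}
```
]

Adding a new pattern here means adding data, not another hand-written C function.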
]

#section[
== Examples
MLIR's `pdl` dialect can be used to replace `arith.addi` with `my.add` like this:
#context html-frame[```llvm
pdl.pattern @replace_addi_with_my_add : benefit(1) {
  // Match: %op = arith.addi %arg0, %arg1, with one result of any type %type.
  %arg0 = pdl.operand
  %arg1 = pdl.operand
  %type = pdl.type
  %op = pdl.operation "arith.addi"(%arg0, %arg1 : !pdl.value, !pdl.value) -> (%type : !pdl.type)

  pdl.rewrite %op {
    // Rewrite: create my.add with the same operands and result type, then replace %op.
    %new_op = pdl.operation "my.add"(%arg0, %arg1 : !pdl.value, !pdl.value) -> (%type : !pdl.type)
    pdl.replace %op with %new_op
  }
}
```]
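
The `benefit(1)` in the pattern above is a priority: when several patterns could apply to the same operation, MLIR prefers the ones with a higher benefit. The following minimal sketch in C shows only that ordering idea. Its rules merely rename an operation, and the names `Rule` and `best_rewrite` are hypothetical; MLIR's actual pattern drivers are considerably more involved.

#context html-frame[
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical rewrite rule: matches an operation by name and renames it. */
typedef struct {
    const char *match_op;   /* e.g. "arith.addi" */
    const char *new_op;     /* e.g. "my.add" */
    int benefit;            /* higher benefit = tried first */
} Rule;

/* qsort comparator: sorts rules by descending benefit. */
static int by_benefit_desc(const void *a, const void *b) {
    const Rule *ra = a, *rb = b;
    return (rb->benefit > ra->benefit) - (rb->benefit < ra->benefit);
}

/* Returns the replacement name from the highest-benefit rule matching
   `op_name`, or NULL if no rule matches. Sorts `rules` in place. */
static const char *best_rewrite(Rule *rules, size_t n, const char *op_name) {
    assert(rules != NULL && n > 0);
    qsort(rules, n, sizeof rules[0], by_benefit_desc);
    for (size_t i = 0; i < n; i++)
        if (strcmp(rules[i].match_op, op_name) == 0)
            return rules[i].new_op;
    return NULL;
}
```
]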
]

#section[
// …
- bragging rights: your compiler represents its own patterns in its own IR
]

#section[
== Combining with a DSL
The best way to do pattern matching is to have a pattern matching / rewrite DSL
that transpiles to pattern matching / rewrite dialect operations.

The advantage of this over just having a rewrite dialect is that it should make patterns even more readable.
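
A minimal sketch in C of the transpiling step, assuming a deliberately tiny hypothetical DSL where every rule just replaces one two-operand, single-result operation with another (like the `arith.addi` to `my.add` rule in the `pdl` example). Parsing is skipped: `DslRule` stands for an already parsed rule, and `emit_pdl` prints the corresponding `pdl` operations. The emitted text mirrors the `pdl` example and is meant as an illustration, not as checked MLIR output.

#context html-frame[
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One already parsed rule of a hypothetical rewrite DSL,
   e.g. `(arith.addi a b) => (my.add a b)`. */
typedef struct {
    const char *name;
    const char *from_op;
    const char *to_op;
    int benefit;
} DslRule;

/* Emits the rule as `pdl` dialect text into `out`.
   Returns the number of characters written, or -1 if `out` was too small. */
static int emit_pdl(const DslRule *r, char *out, size_t size) {
    assert(r != NULL && out != NULL);
    int n = snprintf(out, size,
        "pdl.pattern @%s : benefit(%d) {\n"
        "  %%a = pdl.operand\n"
        "  %%b = pdl.operand\n"
        "  %%t = pdl.type\n"
        "  %%op = pdl.operation \"%s\"(%%a, %%b : !pdl.value, !pdl.value) -> (%%t : !pdl.type)\n"
        "  pdl.rewrite %%op {\n"
        "    %%new = pdl.operation \"%s\"(%%a, %%b : !pdl.value, !pdl.value) -> (%%t : !pdl.type)\n"
        "    pdl.replace %%op with %%new\n"
        "  }\n"
        "}\n",
        r->name, r->benefit, r->from_op, r->to_op);
    if (n < 0 || (size_t)n >= size) return -1;
    return n;
}
```
]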
]

#section[
= More Advantages of Structured Pattern Matching

// …
#section[
Optimizing compilers typically deal with code (mostly written by people)
that is on a lower level than what the compiler theoretically supports.
For example, humans tend to write code like ```c x & (1 << b)``` to test a bit,
but compilers tend to have a high-level bit test operation (with exceptions).
One reason for having higher-level operations is that they allow the compiler to do more high-level optimizations;
another is that some target architectures have a dedicated bit test instruction (such as x86's `bt`) that can be faster.
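
Here are some of the ways to spell a bit test in C, side by side. This minimal sketch checks that they all agree with a stand-in `bit_test` function (hypothetical, representing a dedicated bit test operation) for every bit position of a few values. Note that `!=` and `>` bind tighter than `&` in C, so the comparison forms need their own parentheses. The sketch also uses unsigned values, because `1 << 31` on a 32-bit `int` is undefined behavior.

#context html-frame[
```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a dedicated bit test operation (hypothetical). */
static int bit_test(uint32_t x, unsigned b) {
    assert(b < 32);
    return (int)((x >> b) & 1u);
}

/* Different source-level spellings that a compiler would want to
   recognize as the same bit test. */
static int spelling_and_shift(uint32_t x, unsigned b) { return (x & (1u << b)) != 0; }
static int spelling_shift_and(uint32_t x, unsigned b) { return (int)((x >> b) & 1u); }
static int spelling_gt_zero(uint32_t x, unsigned b)   { return (x & (1u << b)) > 0; }

/* Returns 1 if all spellings agree with bit_test for every bit of x. */
static int spellings_agree(uint32_t x) {
    for (unsigned b = 0; b < 32; b++) {
        int expected = bit_test(x, b);
        if (spelling_and_shift(x, b) != expected) return 0;
        if (spelling_shift_and(x, b) != expected) return 0;
        if (spelling_gt_zero(x, b) != expected) return 0;
    }
    return 1;
}
```
]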
]

// TODO! DEBUG INFORMATION

#section[
LLVM IR doesn't have many dedicated operations like a bit test operation.
Instead, LLVM canonicalizes bit test patterns to ```c (x & (1 << b)) != 0```,
and matches that form in passes that need bit test operations.
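
A minimal sketch of this canonicalize-then-match approach in C, on a hypothetical toy IR (this is not how LLVM implements it): `canonicalize_bittest` rewrites `(x >> b) & 1` into `(x & (1 << b)) != 0`, so later passes only have to match one form. It assumes constants were already moved to the right-hand operand. In this IR, as in C, `!=` yields 0 or 1, so the rewrite preserves the value; `eval` is only there to check that.

#context html-frame[
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy IR. As in C, OP_NE yields 0 or 1. OP_LSHR is a logical shift. */
typedef enum { OP_VAR, OP_CONST, OP_AND, OP_SHL, OP_LSHR, OP_NE } OpKind;

typedef struct Node {
    OpKind kind;
    long value;               /* constant for OP_CONST, variable id for OP_VAR */
    struct Node *lhs, *rhs;
} Node;

#define ARENA_SIZE 64
static Node arena[ARENA_SIZE];   /* storage for nodes created by rewrites */
static size_t arena_used = 0;

static Node *new_node(OpKind kind, long value, Node *lhs, Node *rhs) {
    assert(arena_used < ARENA_SIZE);
    Node *n = &arena[arena_used++];
    n->kind = kind;
    n->value = value;
    n->lhs = lhs;
    n->rhs = rhs;
    return n;
}

static int is_const(const Node *n, long v) {
    return n->kind == OP_CONST && n->value == v;
}

/* Canonicalizes `(x >> b) & 1` into `(x & (1 << b)) != 0` in place.
   Assumes constants were already moved to the right-hand operand. */
static int canonicalize_bittest(Node *n) {
    if (n->kind != OP_AND || !is_const(n->rhs, 1) || n->lhs->kind != OP_LSHR)
        return 0;
    Node *x = n->lhs->lhs, *b = n->lhs->rhs;
    Node *mask = new_node(OP_SHL, 0, new_node(OP_CONST, 1, NULL, NULL), b);
    n->kind = OP_NE;
    n->lhs = new_node(OP_AND, 0, x, mask);
    n->rhs = new_node(OP_CONST, 0, NULL, NULL);
    return 1;
}

/* Reference evaluator, used to check that the rewrite preserves values.
   Shift amounts must be smaller than the bit width of unsigned long. */
static unsigned long eval(const Node *n, const unsigned long *vars) {
    switch (n->kind) {
    case OP_VAR:   return vars[n->value];
    case OP_CONST: return (unsigned long)n->value;
    case OP_AND:   return eval(n->lhs, vars) & eval(n->rhs, vars);
    case OP_SHL:   return eval(n->lhs, vars) << eval(n->rhs, vars);
    case OP_LSHR:  return eval(n->lhs, vars) >> eval(n->rhs, vars);
    case OP_NE:    return eval(n->lhs, vars) != eval(n->rhs, vars);
    }
    return 0;
}
```
]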
]

#section[
// …
]

#section[
Now let's go back to the ```c x & (1 << b)``` (bit test) example.
Optimizing compilers should be able to detect that pattern, and also other bit test patterns (like ```c (x & (1 << b)) > 0```),
and then replace those with a bit test operation.
But they also have to be able to convert bit test operations back to their implementation for targets that don't have a bit test operation.
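
The reverse direction, as a minimal sketch in C on a hypothetical toy IR: if a (hypothetical) target description says there is no bit test instruction, `lower_bittest` expands `BITTEST(x, b)` back into `(x >> b) & 1`. The names `Target`, `has_bittest` and `OP_BITTEST` are made up for illustration.

#context html-frame[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR with a dedicated bit test operation. */
typedef enum { OP_VAR, OP_CONST, OP_AND, OP_LSHR, OP_BITTEST } OpKind;

typedef struct Node {
    OpKind kind;
    long value;               /* constant for OP_CONST, variable id for OP_VAR */
    struct Node *lhs, *rhs;
} Node;

/* Hypothetical target description. */
typedef struct {
    bool has_bittest;
} Target;

#define ARENA_SIZE 64
static Node arena[ARENA_SIZE];   /* storage for nodes created by lowering */
static size_t arena_used = 0;

static Node *new_node(OpKind kind, long value, Node *lhs, Node *rhs) {
    assert(arena_used < ARENA_SIZE);
    Node *n = &arena[arena_used++];
    n->kind = kind;
    n->value = value;
    n->lhs = lhs;
    n->rhs = rhs;
    return n;
}

/* Expands BITTEST(x, b) into `(x >> b) & 1` when the target lacks a
   native bit test instruction. Returns 1 if the node was rewritten. */
static int lower_bittest(Node *n, const Target *t) {
    if (n->kind != OP_BITTEST || t->has_bittest) return 0;
    Node *shifted = new_node(OP_LSHR, 0, n->lhs, n->rhs);
    n->kind = OP_AND;
    n->lhs = shifted;
    n->rhs = new_node(OP_CONST, 1, NULL, NULL);
    return 1;
}

/* Reference evaluator; shift amounts must be below the bit width. */
static unsigned long eval(const Node *n, const unsigned long *vars) {
    switch (n->kind) {
    case OP_VAR:     return vars[n->value];
    case OP_CONST:   return (unsigned long)n->value;
    case OP_AND:     return eval(n->lhs, vars) & eval(n->rhs, vars);
    case OP_LSHR:    return eval(n->lhs, vars) >> eval(n->rhs, vars);
    case OP_BITTEST: return (eval(n->lhs, vars) >> eval(n->rhs, vars)) & 1ul;
    }
    return 0;
}
```
]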
// …
= Conclusion
One can see that pattern matching dialects are the best option by far.

\
Someone wanted me to insert a takeaway here, but I won't.

\
PS: I'll hunt down everyone who still decides to do pattern matching in their compiler source after reading this article.
]