extend pattern matching article

2025-07-05 21:00:30 +02:00
parent 0e55be8ee0
commit 6c4616eb4c
2 changed files with 34 additions and 2 deletions

@@ -317,6 +317,38 @@
where the performance on processors depends on a sequence of instructions, instead of just a single instruction.
]
#section[
== Superscalar CPUs
Modern processor architecture features like superscalar execution make this even more complicated.
\
As a simple, *unrealistic* example, let's imagine a CPU (core) that has one execution unit for bit operations
and two ALU execution units:
#context html-frame[```
+---------+-----+-----+
| Bit Ops | ALU | ALU |
+---------+-----+-----+
```]
This means that, in the same cycle, the CPU can execute one instruction in the bit ops unit and one instruction in each of the two ALUs.
]
#section[
One might think that always optimizing `a & (1 << b)` to a bit test operation is good for performance.
But in this example, that is not the case.
If a function performs many bitwise operations next to each other,
and the compiler lowers every bit test to a bit test instruction, all of those operations suddenly compete for the single bit ops unit.
Instead of executing up to three instructions at a time (ignoring pipelining), the CPU can then only execute one instruction at a time.
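\
The effect can be estimated with a rough sketch. The following Python snippet is only an illustration for the toy CPU above: the unit names, the `LOWERINGS` table and the assumption that shift and and both run on an ALU are made up for this example, and the model ignores data dependencies, latencies and pipelining.

```python
from collections import Counter
from math import ceil

# Toy CPU from the text: one bit-ops unit, two ALUs (units available per cycle).
UNITS = {"bitops": 1, "alu": 2}

# Assumed lowerings of `a & (1 << b)`; the unit names and the
# instruction-to-unit mapping are illustrative, not a real ISA.
LOWERINGS = {
    "bit_test": ["bitops"],       # one dedicated bit test instruction
    "shift_and": ["alu", "alu"],  # a shift followed by an and
}


def cycles(choices):
    """Lower bound on cycles: the busiest execution unit decides.

    Ignores data dependencies, latencies and pipelining.
    """
    load = Counter(unit for c in choices for unit in LOWERINGS[c])
    return max(ceil(load[u] / width) for u, width in UNITS.items())


n = 8  # eight independent bit tests next to each other
all_bit_test = cycles(["bit_test"] * n)
all_shift_and = cycles(["shift_and"] * n)
mixed = cycles(["bit_test"] * (n // 2) + ["shift_and"] * (n // 2))

print(all_bit_test, all_shift_and, mixed)  # 8 8 4
```

Under these assumptions, neither always picking the bit test instruction nor never picking it is best: only the mixed lowering keeps all three execution units busy.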
]
#section[
This shows that we cannot know whether an optimization is actually beneficial
until we reach a late point in the compilation process, where we can simulate the CPU's instruction scheduling.
This applies not only to instruction selection, but also to higher-level optimizations,
such as loop and control-flow optimizations.
]
#section[
= Conclusion
One can see how pattern matching dialects are the best option by far.