extend pattern matching article

2025-09-10 01:05:07 +02:00 · 2025-07-05 21:00:30 +02:00
parent 0e55be8ee0
commit 6c4616eb4c
2 changed files with 34 additions and 2 deletions
--- a/pages/article-favicon.typ
+++ b/pages/article-favicon.typ
@@ -63,7 +63,7 @@
  \
  Then I used modern technology (a copier) to copy that piece of paper multiple times,
-  and also scale it up (for higher coloring details and image resolution).
+  and also scale it up (to allow for more details).
 ]
 #section[
@@ -78,7 +78,7 @@
 ]
 #section[
-  = Step 4: Outsourcing the logo
+  = Step 4: Outsourcing the coloring
  After some time, I just gave up, and decided to ask my sister for help...
  #context wimage(res-path()+"article-favicon/step4_0.png", width:70%)
--- a/pages/compiler-pattern-matching.typ
+++ b/pages/compiler-pattern-matching.typ
@@ -317,6 +317,38 @@
  where the performance on processors depends on a sequence of instructions, instead of just a single instruction.
 ]
 #section[
  == Superscalar CPUs
  Modern processor architecture features like superscalar execution make this even more complicated.
  \
  As a simple, *unrealistic* example, let's imagine a CPU (core) that has one execution unit for bit operations
  and two ALU execution units:
  #context html-frame[```
  +---------+-----+-----+
  | Bit Ops | ALU | ALU |
  +---------+-----+-----+
  ```]
  This means that the CPU can execute two ALU instructions and one bit operation instruction at the same time.
 ]
 #section[
  One might think that always optimizing `a & (1 << b)` to a bit test operation is good for performance.
  But in this example, that is not the case.
  If we have a function that does a lot of bitwise operations next to each other,
  and the compiler replaces every bit test pattern with a bit test instruction, suddenly all of those instructions compete for the single bit ops unit,
  which means that instead of executing up to 3 instructions at a time (ignoring pipelining), the CPU can only execute one instruction at a time.
 ]
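 #section[
  To make this concrete, here is a rough sketch that counts cycles on the imaginary CPU above.
  Everything in it is an assumption made for illustration: the instruction names `bt`, `shl` and `and` are hypothetical,
  `shl` and `and` are assumed to run on the ALU units, every instruction takes one cycle,
  a result can be used in the cycle after it was computed,
  and the scheduler greedily issues ready instructions in program order.
```python
from dataclasses import dataclass, field

# Toy core from the text: one bit ops unit, two ALU units (not a real CPU).
UNITS = {"bitops": 1, "alu": 2}


@dataclass
class Instr:
    name: str
    unit: str  # kind of execution unit this instruction needs
    deps: list = field(default_factory=list)  # indices of instructions it waits for


def lower_bit_test(program, use_bit_test):
    """Append one lowering of `a & (1 << b)` to `program`."""
    if use_bit_test:
        # Assumed single bit test instruction on the bit ops unit.
        program.append(Instr("bt", "bitops"))
    else:
        # Assumed two-instruction lowering on the ALU units; `and` needs the shift result.
        shift = len(program)
        program.append(Instr("shl", "alu"))
        program.append(Instr("and", "alu", deps=[shift]))


def build(choices):
    """Build a program of independent bit tests, one lowering choice per test."""
    program = []
    for use_bit_test in choices:
        lower_bit_test(program, use_bit_test)
    return program


def cycles(program):
    """Greedy scheduling: each cycle, issue ready instructions in program order."""
    issued_at = {}  # instruction index -> cycle in which it was issued
    cycle = 0
    while len(issued_at) < len(program):
        cycle += 1
        free = dict(UNITS)  # free units of each kind in this cycle
        for i, ins in enumerate(program):
            if i in issued_at or free[ins.unit] == 0:
                continue
            # Ready only if every dependency was issued in an earlier cycle.
            if all(issued_at.get(d, cycle) < cycle for d in ins.deps):
                issued_at[i] = cycle
                free[ins.unit] -= 1
    return cycle


all_bt = cycles(build([True] * 6))        # always use the bit test instruction
no_bt = cycles(build([False] * 6))        # never use it
mixed = cycles(build([True, False] * 3))  # alternate between both lowerings
print(all_bt, no_bt, mixed)
```
  Under these assumptions, six independent bit tests take 6 cycles when all of them use the bit test instruction,
  6 cycles when none of them do, and 4 cycles when the two lowerings are mixed.
  Neither "always" nor "never" wins here, and the answer depends on how the instructions end up being scheduled.
 ]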
 #section[
  This shows that we won't know whether an optimization is actually good
  until we are at a late point in the compilation process, where we can simulate the CPU's instruction scheduling.
  This applies not only to instruction selection, but also to higher-level optimizations,
  such as loop and control flow related optimizations.
 ]
 #section[
  = Conclusion
  One can see how pattern matching dialects are the best option by far.