diff --git a/pages/article-favicon.typ b/pages/article-favicon.typ
index ed13b59..98119bf 100644
--- a/pages/article-favicon.typ
+++ b/pages/article-favicon.typ
@@ -63,7 +63,7 @@

     \
     Then I used modern technology (a copier) to copy that piece of paper multiple times,
-    and also scale it up (for higher coloring details and image resolution).
+    and also scale it up (to allow for more details).
 ]

 #section[
@@ -78,7 +78,7 @@
 ]

 #section[
-    = Step 4: Outsourcing the logo
+    = Step 4: Outsourcing the coloring
     After some time, I just gave up, and decided to ask my sister for help...

     #context wimage(res-path()+"article-favicon/step4_0.png", width:70%)
diff --git a/pages/compiler-pattern-matching.typ b/pages/compiler-pattern-matching.typ
index 613a823..32b9d20 100644
--- a/pages/compiler-pattern-matching.typ
+++ b/pages/compiler-pattern-matching.typ
@@ -317,6 +317,38 @@ where the performance on processors depends on a sequence of instructions,
     instead of just a single instruction.
 ]

+#section[
+    == Superscalar CPUs
+    Modern processor features such as superscalar execution make this even more complicated.
+
+    \
+    As a simple, *unrealistic* example, let's imagine a CPU core that has one execution unit for bit operations
+    and two ALU execution units:
+    #context html-frame[```
+      +---------+-----+-----+
+      | Bit Ops | ALU | ALU |
+      +---------+-----+-----+
+    ```]
+    This means that the CPU can execute up to two ALU instructions and one bit ops instruction at the same time.
+]
+
+#section[
+    One might think that always lowering `a & (1 << b)` to a bit test instruction is good for performance.
+    But in this example, that is not the case.
+
+    If a function does many independent bitwise operations next to each other,
+    and the compiler turns every `a & (1 << b)` into a bit test instruction, all of them now have to go through the single bit ops unit,
+    so instead of executing three instructions at a time (ignoring pipelining), the CPU can only execute one at a time while both ALUs sit idle.
+]
+
+#section[
+    This shows that we can't know whether an optimization actually helps
+    until a late point in the compilation process, where we can simulate the CPU's instruction scheduling.
+
+    This applies not only to instruction selection, but also to higher-level optimizations,
+    such as loop and control flow optimizations.
+]
+
 #section[
     = Conclusion
     One can see how pattern matching dialects are the best option by far.
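The scheduling argument in the new "Superscalar CPUs" sections can be checked with a small model. The following Python sketch is hypothetical and not taken from any real compiler. It assumes the toy CPU from the text (one bit ops unit, two ALUs), a latency of exactly one cycle for every instruction, that the fallback lowering of `a & (1 << b)` is a shift followed by a dependent AND (both on ALUs), and a simple greedy list scheduler. The names `Instr`, `lower_bit_test`, `build` and `schedule` are made up for illustration.

```python
from dataclasses import dataclass, field

# Toy machine from the text: one bit ops unit and two ALUs.
# Assumption: every instruction takes exactly one cycle (no pipelining).
UNITS = {"bitops": 1, "alu": 2}


@dataclass
class Instr:
    name: str
    unit: str  # execution unit this instruction needs
    deps: list[str] = field(default_factory=list)  # instructions it waits for


def lower_bit_test(i: int, use_bt: bool) -> list[Instr]:
    """Hypothetical lowering of the i-th independent `a & (1 << b)`."""
    if use_bt:
        # One bit test instruction on the single bit ops unit.
        return [Instr(f"bt{i}", "bitops")]
    # Fallback (assumed): a shift, then an AND that depends on the shift.
    return [Instr(f"shl{i}", "alu"), Instr(f"and{i}", "alu", [f"shl{i}"])]


def build(choices: list[bool]) -> list[Instr]:
    """Lower one bit test per entry; True means 'use the bit test instruction'."""
    program: list[Instr] = []
    for i, use_bt in enumerate(choices):
        program += lower_bit_test(i, use_bt)
    return program


def schedule(program: list[Instr]) -> list[list[str]]:
    """Greedy list scheduler: each cycle, scan the remaining instructions in
    program order and issue every ready one whose unit still has capacity.
    Later ready instructions may issue before earlier stalled ones."""
    done: set[str] = set()
    pending = list(program)
    cycles: list[list[str]] = []
    while pending:
        free = dict(UNITS)  # unit capacity left in this cycle
        issued = []
        for ins in pending:
            ready = all(d in done for d in ins.deps)
            if ready and free[ins.unit] > 0:
                free[ins.unit] -= 1
                issued.append(ins)
        if not issued:
            raise RuntimeError("deadlock: unsatisfiable dependency")
        for ins in issued:
            pending.remove(ins)
        # Results only become visible to dependents in the next cycle.
        done.update(ins.name for ins in issued)
        cycles.append([ins.name for ins in issued])
    return cycles


if __name__ == "__main__":
    n = 6
    for label, choices in [
        ("all bit tests", [True] * n),
        ("no bit tests", [False] * n),
        ("mixed", [True, True, False, False, False, False]),
    ]:
        cycles = schedule(build(choices))
        print(f"{label}: {len(cycles)} cycles")
        for c, issued in enumerate(cycles, 1):
            print(f"  cycle {c}: {', '.join(issued)}")
```

Under these assumptions, lowering six independent bit tests entirely to bit test instructions and entirely to shift/AND pairs both take 6 cycles, while using the bit test instruction for only two of them takes 4 cycles, with three instructions issued in each of the first two cycles. Which mix wins depends on which units the surrounding code already keeps busy.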