diff --git a/pages/article-favicon.typ b/pages/article-favicon.typ
index ed13b59..98119bf 100644
--- a/pages/article-favicon.typ
+++ b/pages/article-favicon.typ
@@ -63,7 +63,7 @@
 
   \
   Then I used modern technology (a copier) to copy that piece of paper multiple times,
-  and also scale it up (for higher coloring details and image resolution).
+  and also scale it up (to allow for more detail).
 ]
 
 #section[
@@ -78,7 +78,7 @@
 ]
 
 #section[
-  = Step 4: Outsourcing the logo
+  = Step 4: Outsourcing the coloring
   After some time, I just gave up, and decided to ask my sister for help...
 
   #context wimage(res-path()+"article-favicon/step4_0.png", width:70%)
diff --git a/pages/compiler-pattern-matching.typ b/pages/compiler-pattern-matching.typ
index 613a823..32b9d20 100644
--- a/pages/compiler-pattern-matching.typ
+++ b/pages/compiler-pattern-matching.typ
@@ -317,6 +317,38 @@
   where the performance on processors depends on a sequence of instructions, instead of just a single instruction.
 ]
 
+#section[
+  == Superscalar CPUs
+  Modern processor architecture features like superscalar execution make this even more complicated.
+
+  \
+  As a simple, *unrealistic* example, let's imagine a CPU (core) that has one execution unit for bit operations,
+  and two ALU execution units:
+  #context html-frame[```
+  +---------+-----+-----+
+  | Bit Ops | ALU | ALU |
+  +---------+-----+-----+
+  ```]
+  This means that the CPU can execute two instructions in the ALU units and one instruction in the bit ops unit at the same time.
+]
+
+#section[
+  One might think that always optimizing `a & (1 << b)` to a bit test operation is good for performance.
+  But in this example, that is not the case.
+
+  If we have a function that does a lot of bitwise operations next to each other,
+  and the compiler replaces every `a & (1 << b)` with a bit test operation, suddenly all of those operations depend on the single bit ops unit,
+  which means that instead of executing up to three instructions at a time (ignoring pipelining), the CPU can only execute one of them at a time.
+]
+
+#section[
+  This shows that we can't know whether an optimization is actually good
+  until we reach a late point in the compilation process, where we can simulate the CPU's instruction scheduling.
+
+  This applies not only to instruction selection, but also to higher-level optimizations,
+  such as loop and control flow related optimizations.
+]
+
 #section[
   = Conclusion
   One can see how pattern matching dialects are the best option by far.