<!DOCTYPE html>
<html>
<head>
<title>Automatically inlining functions is not easy</title>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="icon" sizes="512x512" href="res/favicon.png">
<link rel="image_src" type="image/png" href="res/favicon.png">
<link type="application/atom+xml" rel="alternate" title="alexs168's blog" href="atom.xml">
</head>
<body>
<style>
@font-face {
font-family: 'DejaVu Sans Mono';
src:local('DejaVu Sans Mono'),
url('res/DejaVuSansMono.woff2') format('woff2'),
local('Courier New'),
local(Courier),
local(monospace);
font-weight: normal;
font-style: normal;
font-display: swap;
}
/*
@font-face {
font-family: 'DejaVu Sans Mono';
src:local('DejaVu Sans Mono'),
url('res/DejaVuSansMono-Bold.woff2') format('woff2'),
local('Courier New'),
local(Courier),
local(monospace);
font-weight: bold;
font-style: normal;
font-display: swap;
}
@font-face {
font-family: 'DejaVu Sans';
src:local('DejaVu Sans'),
url('res/DejaVuSans-Bold.woff2') format('woff2'),
local('Courier New');
font-weight: bold;
font-style: normal;
font-display: swap;
}
@font-face {
font-family: 'DejaVu Sans';
src:local('DejaVu Sans'),
url('res/DejaVuSans.woff2') format('woff2'),
local('Courier New');
font-weight: normal;
font-style: normal;
font-display: swap;
}*/
body {
font-family: DejaVu Sans Mono;
font-size: 17pt;
}
td {
width: 100%;
display: inline;
vertical-align: top;
}
h1,h2,h3,h4 {
margin-top: 1%;
margin-bottom: 0.75%;
margin-left: -0.75%;
}
p {
margin-top: 0.75%;
margin-bottom: 0.75%;
}
ul {
margin-top: 0%;
}
.current {
font-weight: bold;
}
pre {
margin-top: 0px;
margin-bottom: 0px;
display: inline;
}
a {
color: #3f51b5;
text-decoration: none;
}
a:visited {
color: #3f51b5;
text-decoration: none;
}
</style>
<table>
<tr>
<td><span class="sidebar" style><span class="column-fixed" style="display: inline-flex; position: fixed; justify-content: center; width: 25%"><table><tr><td><span class="table-of-contents" style><div style="
border:1.2pt solid black;
border-radius:2pt;
padding:3%;"><p><span style="text-decoration: underline">Table of contents</span><br></p><p style="line-height:1.1"><span class style="display:flex; text-indent:0pt;"><span class style="margin-right: 11pt;">├─</span> <span class style="flex:1;"><span class="headingr" id="headingr-0"><a href="#loc-1">Introduction</a></span></span></span></p><p style="line-height:1.1"><span class style="display:flex; text-indent:0pt;"><span class style="margin-right: 11pt;">├─</span> <span class style="flex:1;"><span class="headingr" id="headingr-1"><a href="#loc-2">Greedy inlining with heuristics</a></span></span></span></p><p style="line-height:1.1"><span class style="display:flex; text-indent:0pt;"><span class style="margin-right: 11pt;">|  ├─</span> <span class style="flex:1;"><span class="headingr" id="headingr-2"><a href="#loc-3">Issue 1: (sometimes) multiple options</a></span></span></span></p><p style="line-height:1.1"><span class style="display:flex; text-indent:0pt;"><span class style="margin-right: 11pt;">|  ├─</span> <span class style="flex:1;"><span class="headingr" id="headingr-3"><a href="#loc-4">Issue 2: ABI requirements on argument passing</a></span></span></span></p><p style="line-height:1.1"><span class style="display:flex; text-indent:0pt;"><span class style="margin-right: 11pt;">|  └─</span> <span class style="flex:1;"><span class="headingr" id="headingr-4"><a href="#loc-5">Issue 3: (sometimes) prevents optimizations</a></span></span></span></p><p style="line-height:1.1"><span class style="display:flex; text-indent:0pt;"><span class style="margin-right: 11pt;">├─</span> <span class style="flex:1;"><span class="headingr" id="headingr-5"><a href="#loc-6">Function outlining</a></span></span></span></p><p style="line-height:1.1"><span class style="display:flex; text-indent:0pt;"><span class style="margin-right: 11pt;">├─</span> <span class style="flex:1;"><span class="headingr" id="headingr-6"><a href="#loc-7">A better approach</a></span></span></span></p><p 
style="line-height:1.1"><span class style="display:flex; text-indent:0pt;"><span class style="margin-right: 11pt;">|  ├─</span> <span class style="flex:1;"><span class="headingr" id="headingr-7"><a href="#loc-8">Step 1: inlining</a></span></span></span></p><p style="line-height:1.1"><span class style="display:flex; text-indent:0pt;"><span class style="margin-right: 11pt;">|  ├─</span> <span class style="flex:1;"><span class="headingr" id="headingr-8"><a href="#loc-9">Step 2: detect duplicate code</a></span></span></span></p><p style="line-height:1.1"><span class style="display:flex; text-indent:0pt;"><span class style="margin-right: 11pt;">|  ├─</span> <span class style="flex:1;"><span class="headingr" id="headingr-9"><a href="#loc-10">Step 3: slightly reduce size of outlinable section</a></span></span></span></p><p style="line-height:1.1"><span class style="display:flex; text-indent:0pt;"><span class style="margin-right: 11pt;">|  ├─</span> <span class style="flex:1;"><span class="headingr" id="headingr-10"><a href="#loc-11">Step 4: perform outlining</a></span></span></span></p><p style="line-height:1.1"><span class style="display:flex; text-indent:0pt;"><span class style="margin-right: 11pt;">|  └─</span> <span class style="flex:1;"><span class="headingr" id="headingr-11"><a href="#loc-12">Issue 1: high compile-time memory usage</a></span></span></span></p><p style="line-height:1.1"><span class style="display:flex; text-indent:0pt;"><span class style="margin-right: 11pt;">└─</span> <span class style="flex:1;"><span class="headingr" id="headingr-12"><a href="#loc-13">Conclusion</a></span></span></span></p></div></span><span class="table-of-contents" style><script>document.addEventListener('DOMContentLoaded', function() {
let tags = ['h2', 'h3', 'h4'].flatMap(x => Array.from(document.getElementsByTagName(x))).sort((a, b) => a.getBoundingClientRect().top - b.getBoundingClientRect().top);
let pageHeight = document.documentElement.scrollHeight-window.innerHeight;
document.addEventListener('scroll', (event) => {
let progress = -(document.documentElement.getBoundingClientRect().y / pageHeight);
let delta = progress * window.innerHeight;
let idx = tags.map(x => 0 > x.getBoundingClientRect().top - delta).lastIndexOf(true);
Array.from(document.getElementsByClassName('headingr')).map(x => x.classList.remove('current'));
if (idx != -1) {
document.getElementById('headingr-' + idx).classList.add('current');
}
}
);
})</script><style>
.table-of-contents > p > span { width: 100%; }
</style></span></td></tr><tr><td><br>
<a href="index.html"><b>Website Home</b></a> <br>
</td></tr><tr><td><p>Renderings of this page:</p><ul><li><a href="#" onclick="gotoVariant(&quot;.min.pdf&quot;);">minimal PDF (printable)</a></li><li><a href="#" onclick="gotoVariant(&quot;.nano.html&quot;);">minimal HTML</a></li></ul></td></tr><tr><td><a href="atom.xml">Atom feed</a> <br>
</td></tr><tr><td><style>
@media only screen and (max-width: 1200px) {
.sidebar {
display: none !important;
}
.column-fixed {
width: 0% !important;
}
.body-column {
left: 3% !important;
}
}
@media only screen and (max-width: 1800px) {
.body-column > span {
width: 75% !important;
}
}
@media only screen and (max-width: 1200px) {
.body-column {
width: 97% !important;
}
.body-column > span {
width: 100% !important;
}
}
.hide { display: inline; background: black; transition: background 0.3s linear; }
.hide:hover, .hide:focus { background: transparent; }
</style></td></tr></table><style>
.column-fixed > table > tbody > tr > td > * { width: 100%; }
</style></span></span></td>
<td><span class="body-column" style="position: absolute; left: 28%; width: 72%"><span style="
width:50%;
display:inline-block"><div style="
"><p><br></p><h1>Automatically inlining functions is not easy</h1><p><span style="font-size: 14pt"><p>Git revision <a href="https://github.com/alex-s168/website/tree/9c2913af189b62c028f6f773370f50f9e6c13307">#9c2913af</a><br>Modified at 11. August 2025 16:38</p><p>Written by <a href="https://alex.vxcc.dev">alex_s168</a></p></span></p></div><div style="
"><br>Note that the <a href="#" onclick="gotoVariant(&quot;.min.pdf&quot;);">PDF Version</a> of this page might look a bit better styling-wise.</div><div style="
"><br><span id="loc-1" style="text-decoration: underline"><h2>Introduction</h2></span> Function calls have some overhead, and they can sometimes get in the way of other optimizations. Because of that, compiler backends (should) inline function calls. There are, however, many issues with just greedily inlining calls…</div><div style="
"><p><br><span id="loc-2" style="text-decoration: underline"><h2>Greedy inlining with heuristics</h2></span> This is the most obvious approach. We can just inline all functions with only one call, and then inline calls where the inlined function does not have many instructions.</p><p>Example:</p><div class style="margin-top:4pt;"><span style="
border:1pt solid black;
border-radius:2pt;
padding:1.6pt;display:inline-block"><pre><code>function f32 $square(f32 %x) {<br>@entry:<br> // this is stupid, but I couldn't come up with a better example<br> f32 %e = add %x, 0<br> f32 %out = add %e, %x<br> ret %out<br>}<br><br>function f32 $hypot(f32 %a, f32 %b) {<br>@entry:<br> f32 %as = call $square(%a)<br> f32 %bs = call $square(%b)<br> f32 %sum = add %as, %bs<br> f32 %o = sqrt %sum<br> ret %o<br>}<br><br>function f32 $tri_hypot({f32, f32} %x) {<br> f32 %a = extract %x, 0<br> f32 %b = extract %x, 1<br> f32 %o = call $hypot(%a, %b) // this is a "tail call"<br> ret %o<br>}<br><br>// let's assume that $hypot is used someplace else in the code too</code></pre></span></div></div><div style="
"><br>Let's assume our inlining threshold is 5 operations. Then we would get… wait, there are multiple options…</div><div style="
"><p><br><span id="loc-3" style="text-decoration: underline"><h3>Issue 1: (sometimes) multiple options</h3></span> If we inline the <span style="
border:1pt solid black;
border-radius:2pt;
padding:1.6pt;display:inline-block"><code style="white-space: pre-wrap">$square</code></span> calls, then <span style="
border:1pt solid black;
border-radius:2pt;
padding:1.6pt;display:inline-block"><code style="white-space: pre-wrap">$hypot</code></span> will have too many instructions to be inlined into <span style="
border:1pt solid black;
border-radius:2pt;
padding:1.6pt;display:inline-block"><code style="white-space: pre-wrap">$tri_hypot</code></span>:</p><div class style="margin-top:4pt;"><span style="
border:1pt solid black;
border-radius:2pt;
padding:1.6pt;display:inline-block"><pre><code>...<br>function f32 $hypot(f32 %a, f32 %b) {<br>@entry:<br> // more instructions than our inlining threshold:<br> f32 %ase = add %a, 0<br> f32 %as = add %ase, %a<br> f32 %bse = add %b, 0<br> f32 %bs = add %bse, %b<br> f32 %sum = add %as, %bs<br> f32 %o = sqrt %sum<br> ret %o<br>}<br>...</code></pre></span></div></div><div style="
"><p><br>The second option is to inline the <span style="
border:1pt solid black;
border-radius:2pt;
padding:1.6pt;display:inline-block"><code style="white-space: pre-wrap">$hypot</code></span> call into <span style="
border:1pt solid black;
border-radius:2pt;
padding:1.6pt;display:inline-block"><code style="white-space: pre-wrap">$tri_hypot</code></span>. (There are also some other options.)</p><p>Now in this case, it seems obvious to prefer inlining <span style="
border:1pt solid black;
border-radius:2pt;
padding:1.6pt;display:inline-block"><code style="white-space: pre-wrap">$square</code></span> into <span style="
border:1pt solid black;
border-radius:2pt;
padding:1.6pt;display:inline-block"><code style="white-space: pre-wrap">$hypot</code></span>.</p></div><div style="
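"><p><br>The order dependence can be made concrete with a small Python sketch. It assumes a made-up size model: every instruction (including <code style="white-space: pre-wrap">ret</code>) counts as one, and inlining replaces a call with the callee's body minus its <code style="white-space: pre-wrap">ret</code>. The tables <code style="white-space: pre-wrap">SIZE</code> and <code style="white-space: pre-wrap">CALLS</code> and both helper functions are hypothetical and only mirror the example above; they are not taken from a real compiler.</p><div class style="margin-top:4pt;"><span style="border:1pt solid black; border-radius:2pt; padding:1.6pt; display:inline-block"><pre><code>
```python
THRESHOLD = 5  # inlining threshold from the text, in instructions

# Hypothetical model of the example: instruction counts (including `ret`)
# and the direct calls made by each function.
SIZE = {"square": 3, "hypot": 5, "tri_hypot": 4}
CALLS = {"square": [], "hypot": ["square", "square"], "tri_hypot": ["hypot"]}

def inline(size, calls, caller, callee):
    """Inline every call to `callee` in `caller`; returns updated copies."""
    size = dict(size)
    calls = {name: list(targets) for name, targets in calls.items()}
    count = calls[caller].count(callee)
    # each call instruction is replaced by the callee body minus its `ret`
    size[caller] += count * (size[callee] - 2)
    remaining = [c for c in calls[caller] if c != callee]
    calls[caller] = remaining + count * calls[callee]
    return size, calls

def small_enough(size, callee):
    """The greedy rule: only inline callees at or below the threshold."""
    return THRESHOLD >= size[callee]

# Option 1: inline $square into $hypot first.
size1, calls1 = inline(SIZE, CALLS, "hypot", "square")

# Option 2: inline $hypot into $tri_hypot first, then $square.
size2, calls2 = inline(SIZE, CALLS, "tri_hypot", "hypot")
size2, calls2 = inline(size2, calls2, "tri_hypot", "square")
```
</code></pre></span></div><p>In this model, option 1 leaves <code style="white-space: pre-wrap">$hypot</code> at 7 instructions, which is above the threshold, so the call in <code style="white-space: pre-wrap">$tri_hypot</code> stays. Option 2 removes every call from <code style="white-space: pre-wrap">$tri_hypot</code>, but grows it to 9 instructions.</p></div><div style="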
"><p><br><span id="loc-4" style="text-decoration: underline"><h3>Issue 2: ABI requirements on argument passing</h3></span> If we assume the target ABI only has one f32 register for passing arguments, then we would have to generate additional instructions for passing the second argument of <span style="
border:1pt solid black;
border-radius:2pt;
padding:1.6pt;display:inline-block"><code style="white-space: pre-wrap">$hypot</code></span>, and then it might actually be more efficient to inline <span style="
border:1pt solid black;
border-radius:2pt;
padding:1.6pt;display:inline-block"><code style="white-space: pre-wrap">$hypot</code></span> instead of <span style="
border:1pt solid black;
border-radius:2pt;
padding:1.6pt;display:inline-block"><code style="white-space: pre-wrap">$square</code></span>.</p><p>This example is not realistic, but this issue actually occurs when compiling lots of code.</p><p>Another related issue is that having more arguments arranged in a fixed way requires moving a lot of data around at the call site.</p><p>A solution to this is to make the heuristic depend not just on the code size, but also on the number of arguments and outputs passed to the function.</p></div><div style="
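"><p><br>One way such an argument-aware heuristic could look is sketched below in Python. The cost numbers (one instruction per call, two extra instructions per argument that does not fit into a register) and the names <code style="white-space: pre-wrap">call_cost</code> and <code style="white-space: pre-wrap">should_inline</code> are assumptions made up for this illustration, and outputs are ignored to keep it short.</p><div class style="margin-top:4pt;"><span style="border:1pt solid black; border-radius:2pt; padding:1.6pt; display:inline-block"><pre><code>
```python
def call_cost(num_args, arg_regs, spill_cost=2):
    """Assumed cost of one call: the call instruction itself, plus
    `spill_cost` extra instructions per argument that does not fit
    into one of the `arg_regs` argument registers."""
    spilled = max(0, num_args - arg_regs)
    return 1 + spill_cost * spilled

def should_inline(body_size, num_args, arg_regs, threshold=5):
    # Compare the growth caused by inlining (body size minus the call
    # sequence it replaces) against the threshold, not the raw body size.
    growth = body_size - call_cost(num_args, arg_regs)
    return threshold >= growth

# $hypot after inlining $square: 7 instructions, 2 arguments
two_regs = should_inline(7, num_args=2, arg_regs=2)
one_reg = should_inline(7, num_args=2, arg_regs=1)
```
</code></pre></span></div><p>With two argument registers the modelled growth is 6 and the call is kept; with only one register the growth drops to 4 and <code style="white-space: pre-wrap">$hypot</code> gets inlined.</p></div><div style="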
"><p><br><span id="loc-5" style="text-decoration: underline"><h3>Issue 3: (sometimes) prevents optimizations</h3></span></p><div class style="margin-top:4pt;"><span style="
border:1pt solid black;
border-radius:2pt;
padding:1.6pt;display:inline-block"><pre><code>function f32 $myfunc(f32 %a, f32 %b) {<br>@entry:<br> f32 %sum = add %a, %b<br> f32 %sq = sqrt %sum<br> ...<br>}<br><br>function $callsite(f32 %a, f32 %b) {<br>@entry:<br> f32 %as = add %a, %a<br> f32 %bs = add %b, %b<br> f32 %x = call $myfunc(%as, %bs)<br> ...<br>}</code></pre></span></div><p>If the target has an efficient <span style="
border:1pt solid black;
border-radius:2pt;
padding:1.6pt;display:inline-block"><code style="white-space: pre-wrap">hypot</code></span> operation, then that operation will only be used if we inline <span style="
border:1pt solid black;
border-radius:2pt;
padding:1.6pt;display:inline-block"><code style="white-space: pre-wrap">$myfunc</code></span> into <span style="
border:1pt solid black;
border-radius:2pt;
padding:1.6pt;display:inline-block"><code style="white-space: pre-wrap">$callsite</code></span>.</p><p>This means that inlining is now dependent on… instruction selection??</p><p>This is not the only optimization prevented by not inlining the call. If <span style="
border:1pt solid black;
border-radius:2pt;
padding:1.6pt;display:inline-block"><code style="white-space: pre-wrap">$callsite</code></span> were to be called in a loop, then not inlining would prevent vectorization.</p></div><div style="
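"><p><br>A toy instruction selector in Python shows why the pattern is only found after inlining. Expressions are modelled as nested tuples, and, like the examples in this article, <code style="white-space: pre-wrap">add %x, %x</code> stands in for squaring. The functions <code style="white-space: pre-wrap">is_square</code> and <code style="white-space: pre-wrap">select</code> are hypothetical and far simpler than a real selector.</p><div class style="margin-top:4pt;"><span style="border:1pt solid black; border-radius:2pt; padding:1.6pt; display:inline-block"><pre><code>
```python
def is_square(expr):
    # the article's examples use `add %x, %x` as a stand-in for squaring
    return isinstance(expr, tuple) and expr[0] == "add" and expr[1] == expr[2]

def select(expr):
    """Rewrite sqrt(add(square(a), square(b))) into one hypot node."""
    if not isinstance(expr, tuple):
        return expr  # a plain value or function name
    op, *args = expr
    args = [select(arg) for arg in args]
    if op == "sqrt" and isinstance(args[0], tuple) and args[0][0] == "add":
        lhs, rhs = args[0][1], args[0][2]
        if is_square(lhs) and is_square(rhs):
            return ("hypot", lhs[1], rhs[1])
    return (op, *args)

# Without inlining, the selector sees the two functions separately.
myfunc_body = ("sqrt", ("add", "%a", "%b"))
not_inlined = ("call", "$myfunc", ("add", "%a", "%a"), ("add", "%b", "%b"))

# With $myfunc inlined into $callsite, the whole pattern is visible.
inlined = ("sqrt", ("add", ("add", "%a", "%a"), ("add", "%b", "%b")))
```
</code></pre></span></div><p>Here <code style="white-space: pre-wrap">select</code> leaves <code style="white-space: pre-wrap">myfunc_body</code> and <code style="white-space: pre-wrap">not_inlined</code> unchanged, and only turns <code style="white-space: pre-wrap">inlined</code> into a single <code style="white-space: pre-wrap">hypot</code> node.</p></div><div style="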
"><p><br><span id="loc-6" style="text-decoration: underline"><h2>Function outlining</h2></span> A related optimization is “outlining”. Its the opposite to inlining. It moves duplicate code into a function, to reduce code size, and sometimes increase performance (because of instruction caching)</p><p>If we do inlining seperately from outlining, we often get unoptimal code.</p></div><div style="
"><br><span id="loc-7" style="text-decoration: underline"><h2>A better approach</h2></span> We can instead first inline <strong>all</strong> inlinable calls, and <strong>then</strong> perform more agressive outlining.</div><div style="
"><p><br><span id="loc-8" style="text-decoration: underline"><h3>Step 1: inlining</h3></span> We inline <strong>all</strong> function calls, except for:</p><ul><li>self recursion (obviously)</li><li>functions explicitly marked as no-inline by the user</li></ul></div><div style="
"><p><br><span id="loc-9" style="text-decoration: underline"><h3>Step 2: detect duplicate code</h3></span> There are many algorithms for doing this.</p><p>The goal of this step is to both:</p><ul><li>maximize size of outlinable section</li><li>minimize size of code</li></ul></div><div style="
"><p><br><span id="loc-10" style="text-decoration: underline"><h3>Step 3: slightly reduce size of outlinable section</h3></span> The goal is to reduce size of outlinable sections, to make the code more optimal.</p><p>This should be ABI and instruction depended, and have the goal of:</p><ul><li>reducing argument shuffles required at all call sites</li><li>reducing register preassure</li><li>not preventing good isel choices and optimizations.</li></ul><p>this is also dependent on the targetted code size.</p></div><div style="
"><br><span id="loc-11" style="text-decoration: underline"><h3>Step 4: perform outlining</h3></span> This is obvious.</div><div style="
"><p><br><span id="loc-12" style="text-decoration: underline"><h3>Issue 1: high compile-time memory usage</h3></span> Inlining <strong>all</strong> function calls first will increase the memory usage during compilation by A LOT</p><p>Im sure that there is a smarter way to implement this method, without actually performing the inlining…</p></div><div style="
"><p><br><span id="loc-13" style="text-decoration: underline"><h2>Conclusion</h2></span> Function inlining is much more complex than one might think.</p><p>PS: No idea how to implement this…</p><p>Subscribe to the <a href="atom.xml">Atom feed</a> to get notified about futre compiler-related articles.</p></div></span></span></td>
<td></td>
</tr>
</table>
<script>
function gotoVariant(variant) {
window.location.href = window.location.href.replace(/\.\w+\.html/g, variant);
}
window.addEventListener('beforeprint', (event) => {
gotoVariant('.min.pdf');
});
</script>
<script src="coffee.js" async>
</script>
</body>
</html>