<!DOCTYPE html>
<html>
<head>
<title>Automatically inlining functions is not easy</title>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="icon" sizes="512x512" href="res/favicon.png">
<link rel="image_src" type="image/png" href="res/favicon.png">
<link type="application/atom+xml" rel="alternate" title="alexs168's blog" href="atom.xml">
</head>
<body>
<style>

@font-face {
font-family: 'DejaVu Sans Mono';
src:local('DejaVu Sans Mono'),
url('res/DejaVuSansMono.woff2') format('woff2'),
local('Courier New'),
local(Courier),
local(monospace);
font-weight: normal;
font-style: normal;
font-display: swap;
}

/*
@font-face {
font-family: 'DejaVu Sans Mono';
src:local('DejaVu Sans Mono'),
url('res/DejaVuSansMono-Bold.woff2') format('woff2'),
local('Courier New'),
local(Courier),
local(monospace);
font-weight: bold;
font-style: normal;
font-display: swap;
}

@font-face {
font-family: 'DejaVu Sans';
src:local('DejaVu Sans'),
url('res/DejaVuSans-Bold.woff2') format('woff2'),
local('Courier New');
font-weight: bold;
font-style: normal;
font-display: swap;
}

@font-face {
font-family: 'DejaVu Sans';
src:local('DejaVu Sans'),
url('res/DejaVuSans.woff2') format('woff2'),
local('Courier New');
font-weight: normal;
font-style: normal;
font-display: swap;
}*/

body {
font-family: DejaVu Sans Mono;
font-size: 17pt;
}

td {
width: 100%;
display: inline;
vertical-align: top;
}

h1,h2,h3,h4 {
margin-top: 1%;
margin-bottom: 0.75%;
margin-left: -0.75%;
}

p {
margin-top: 0.75%;
margin-bottom: 0.75%;
}

ul {
margin-top: 0%;
}

.current {
font-weight: bold;
}

pre {
margin-top: 0px;
margin-bottom: 0px;
display: inline;
}

a {
color: #3f51b5;
text-decoration: none;
}

a:visited {
color: #3f51b5;
text-decoration: none;
}
</style>
<table>
<tr>
<td><span class="sidebar" style><span class="column-fixed" style="display: inline-flex; position: fixed; justify-content: center; width: 25%"><table><tr><td><span class="table-of-contents" style><div style="
border:1.2pt solid black;
border-radius:2pt;
padding:3%;"><p><span style="text-decoration: underline">Table of contents</span><br></p><p style="line-height:1.1"><span class style="display:flex; text-indent:0pt;"><span class style="margin-right: 11pt;">├─</span> <span class style="flex:1;"><span class="headingr" id="headingr-0"><a href="#loc-1">Introduction</a></span></span></span></p><p style="line-height:1.1"><span class style="display:flex; text-indent:0pt;"><span class style="margin-right: 11pt;">├─</span> <span class style="flex:1;"><span class="headingr" id="headingr-1"><a href="#loc-2">Greedy inlining with heuristics</a></span></span></span></p><p style="line-height:1.1"><span class style="display:flex; text-indent:0pt;"><span class style="margin-right: 11pt;">| ├─</span> <span class style="flex:1;"><span class="headingr" id="headingr-2"><a href="#loc-3">Issue 1: (sometimes) multiple options</a></span></span></span></p><p style="line-height:1.1"><span class style="display:flex; text-indent:0pt;"><span class style="margin-right: 11pt;">| ├─</span> <span class style="flex:1;"><span class="headingr" id="headingr-3"><a href="#loc-4">Issue 2: ABI requirements on argument passing</a></span></span></span></p><p style="line-height:1.1"><span class style="display:flex; text-indent:0pt;"><span class style="margin-right: 11pt;">| └─</span> <span class style="flex:1;"><span class="headingr" id="headingr-4"><a href="#loc-5">Issue 3: (sometimes) prevents optimizations</a></span></span></span></p><p style="line-height:1.1"><span class style="display:flex; text-indent:0pt;"><span class style="margin-right: 11pt;">├─</span> <span class style="flex:1;"><span class="headingr" id="headingr-5"><a href="#loc-6">Function outlining</a></span></span></span></p><p style="line-height:1.1"><span class style="display:flex; text-indent:0pt;"><span class style="margin-right: 11pt;">├─</span> <span class style="flex:1;"><span class="headingr" id="headingr-6"><a href="#loc-7">A better approach</a></span></span></span></p><p 
style="line-height:1.1"><span class style="display:flex; text-indent:0pt;"><span class style="margin-right: 11pt;">| ├─</span> <span class style="flex:1;"><span class="headingr" id="headingr-7"><a href="#loc-8">Step 1: inlining</a></span></span></span></p><p style="line-height:1.1"><span class style="display:flex; text-indent:0pt;"><span class style="margin-right: 11pt;">| ├─</span> <span class style="flex:1;"><span class="headingr" id="headingr-8"><a href="#loc-9">Step 2: detect duplicate code</a></span></span></span></p><p style="line-height:1.1"><span class style="display:flex; text-indent:0pt;"><span class style="margin-right: 11pt;">| ├─</span> <span class style="flex:1;"><span class="headingr" id="headingr-9"><a href="#loc-10">Step 3: slightly reduce size of outlinable section</a></span></span></span></p><p style="line-height:1.1"><span class style="display:flex; text-indent:0pt;"><span class style="margin-right: 11pt;">| ├─</span> <span class style="flex:1;"><span class="headingr" id="headingr-10"><a href="#loc-11">Step 4: perform outlining</a></span></span></span></p><p style="line-height:1.1"><span class style="display:flex; text-indent:0pt;"><span class style="margin-right: 11pt;">| └─</span> <span class style="flex:1;"><span class="headingr" id="headingr-11"><a href="#loc-12">Issue 1: high compile-time memory usage</a></span></span></span></p><p style="line-height:1.1"><span class style="display:flex; text-indent:0pt;"><span class style="margin-right: 11pt;">└─</span> <span class style="flex:1;"><span class="headingr" id="headingr-12"><a href="#loc-13">Conclusion</a></span></span></span></p></div></span><span class="table-of-contents" style><script>document.addEventListener('DOMContentLoaded', function() {
let tags = ['h2', 'h3', 'h4'].flatMap(x => Array.from(document.getElementsByTagName(x))).sort((a, b) => a.getBoundingClientRect().top - b.getBoundingClientRect().top);
let pageHeight = document.documentElement.scrollHeight-window.innerHeight;
document.addEventListener('scroll', (event) => {
let progress = -(document.documentElement.getBoundingClientRect().y / pageHeight);
let delta = progress * window.innerHeight;
let idx = tags.map(x => 0 > x.getBoundingClientRect().top - delta).lastIndexOf(true);
Array.from(document.getElementsByClassName('headingr')).map(x => x.classList.remove('current'));
if (idx != -1) {
document.getElementById('headingr-' + idx).classList.add('current');
}
}
);
})</script><style>
.table-of-contents > p > span { width: 100%; }
</style></span></td></tr><tr><td><br>
<a href="index.html"><b>Website Home</b></a> <br>
</td></tr><tr><td><p>Renderings of this page:</p><ul><li><a href="#" onclick="gotoVariant('.min.pdf');">minimal PDF (printable)</a></li><li><a href="#" onclick="gotoVariant('.nano.html');">minimal HTML</a></li></ul></td></tr><tr><td><a href="atom.xml">Atom feed</a> <br>
</td></tr><tr><td><style>
@media only screen and (max-width: 1200px) {
.sidebar {
display: none !important;
}

.column-fixed {
width: 0% !important;
}

.body-column {
left: 3% !important;
}
}

@media only screen and (max-width: 1800px) {
.body-column > span {
width: 75% !important;
}
}

@media only screen and (max-width: 1200px) {
.body-column {
width: 97% !important;
}
.body-column > span {
width: 100% !important;
}
}

.hide { display: inline; background: black; transition: background 0.3s linear; }
.hide:hover, .hide:focus { background: transparent; }
</style></td></tr></table><style>
.column-fixed > table > tbody > tr > td > * { width: 100%; }
</style></span></span></td>
<td><span class="body-column" style="position: absolute; left: 28%; width: 72%"><span style="
width:50%;
display:inline-block"><div style="
"><p><br></p><h1>Automatically inlining functions is not easy</h1><p><span style="font-size: 14pt"><p>Git revision <a href="https://github.com/alex-s168/website/tree/9c2913af189b62c028f6f773370f50f9e6c13307">#9c2913af</a><br>Modified at 11. August 2025 16:38</p><p>Written by <a href="https://alex.vxcc.dev">alex_s168</a></p></span></p></div><div style="
"><br>Note that the <a href="#" onclick="gotoVariant('.min.pdf');">PDF version</a> of this page may look a bit better stylistically.</div><div style="
"><br><span id="loc-1" style="text-decoration: underline"><h2>Introduction</h2></span> Function calls have some overhead, and they can sometimes also be a big obstacle for other optimizations. Because of that, compiler backends (should) inline function calls. However, there are many issues with just greedily inlining calls…</div><div style="
"><p><br><span id="loc-2" style="text-decoration: underline"><h2>Greedy inlining with heuristics</h2></span> This is the most obvious approach: we inline every function that has only one call site, and then inline calls to functions that do not have many instructions.</p><p>Example:</p><div class style="margin-top:4pt;"><span style="border:1pt solid black;border-radius:2pt;padding:1.6pt;display:inline-block"><pre><code>function f32 $square(f32 %x) {<br>@entry:<br> // this is stupid, but I couldn't come up with a better example<br> f32 %e = add %x, 0<br> f32 %out = add %e, %x<br> ret %out<br>}<br><br>function f32 $hypot(f32 %a, f32 %b) {<br>@entry:<br> f32 %as = call $square(%a)<br> f32 %bs = call $square(%b)<br> f32 %sum = add %as, %bs<br> f32 %o = sqrt %sum<br> ret %o<br>}<br><br>function f32 $tri_hypot({f32, f32} %x) {<br>@entry:<br> f32 %a = extract %x, 0<br> f32 %b = extract %x, 1<br> f32 %o = call $hypot(%a, %b) // this is a "tail call"<br> ret %o<br>}<br><br>// let's assume that $hypot is also called somewhere else in the code</code></pre></span></div></div><div style="
"><br>Let’s assume our inlining threshold is 5 operations. Then we would get… wait, there are multiple options…</div><div style="
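"><p><br>To see why, here is a minimal Python sketch of the heuristic applied to the example above. The <code>Function</code> model, the threshold of 5 and the choice not to count <code>ret</code> are assumptions made for illustration; this is not how any particular compiler implements it:</p><div class style="margin-top:4pt;"><span style="border:1pt solid black;border-radius:2pt;padding:1.6pt;display:inline-block"><pre><code>
```python
from dataclasses import dataclass, field


@dataclass
class Function:
    name: str
    size: int  # instruction count, not counting `ret` (assumption)
    callees: list = field(default_factory=list)  # one entry per call site


def call_site_counts(funcs, external_calls=None):
    """Count the call sites that refer to each function."""
    counts = {name: 0 for name in funcs}
    for f in funcs.values():
        for callee in f.callees:
            counts[callee] += 1
    # calls coming from code outside of `funcs`
    for name, n in (external_calls or {}).items():
        counts[name] += n
    return counts


def greedy_candidates(funcs, threshold=5, external_calls=None):
    """(caller, callee) pairs the naive heuristic wants to inline: callees
    with exactly one call site, or callees that are small enough."""
    counts = call_site_counts(funcs, external_calls)
    picks = []
    for caller in funcs.values():
        for callee in caller.callees:
            if counts[callee] == 1 or funcs[callee].size <= threshold:
                picks.append((caller.name, callee))
    return picks


funcs = {
    "square": Function("square", 2),                      # add, add
    "hypot": Function("hypot", 4, ["square", "square"]),  # call, call, add, sqrt
    "tri_hypot": Function("tri_hypot", 3, ["hypot"]),     # extract, extract, call
}
# $hypot is also called somewhere else, as assumed in the example
print(greedy_candidates(funcs, external_calls={"hypot": 1}))
# [('hypot', 'square'), ('hypot', 'square'), ('tri_hypot', 'hypot')]
```
</code></pre></span></div><p>Each of the three calls passes the check on its own, but the decisions are not independent: inlining the <code>$square</code> calls makes <code>$hypot</code> bigger.</p></div><div style="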
"><p><br><span id="loc-3" style="text-decoration: underline"><h3>Issue 1: (sometimes) multiple options</h3></span> If we inline the <span style="border:1pt solid black;border-radius:2pt;padding:1.6pt;display:inline-block"><code style="white-space: pre-wrap">$square</code></span> calls, then <span style="border:1pt solid black;border-radius:2pt;padding:1.6pt;display:inline-block"><code style="white-space: pre-wrap">$hypot</code></span> will have too many instructions to be inlined into <span style="border:1pt solid black;border-radius:2pt;padding:1.6pt;display:inline-block"><code style="white-space: pre-wrap">$tri_hypot</code></span>:</p><div class style="margin-top:4pt;"><span style="border:1pt solid black;border-radius:2pt;padding:1.6pt;display:inline-block"><pre><code>...<br>function f32 $hypot(f32 %a, f32 %b) {<br>@entry:<br> // more instructions than our inlining threshold:<br> f32 %ase = add %a, 0<br> f32 %as = add %ase, %a<br> f32 %bse = add %b, 0<br> f32 %bs = add %bse, %b<br> f32 %sum = add %as, %bs<br> f32 %o = sqrt %sum<br> ret %o<br>}<br>...</code></pre></span></div></div><div style="
"><p><br>The second option is to inline the <span style="border:1pt solid black;border-radius:2pt;padding:1.6pt;display:inline-block"><code style="white-space: pre-wrap">$hypot</code></span> call into <span style="border:1pt solid black;border-radius:2pt;padding:1.6pt;display:inline-block"><code style="white-space: pre-wrap">$tri_hypot</code></span>. (There are also other options.)</p><p>In this case, it seems obvious to prefer inlining <span style="border:1pt solid black;border-radius:2pt;padding:1.6pt;display:inline-block"><code style="white-space: pre-wrap">$square</code></span> into <span style="border:1pt solid black;border-radius:2pt;padding:1.6pt;display:inline-block"><code style="white-space: pre-wrap">$hypot</code></span>.</p></div><div style="
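"><p><br>The order in which the candidates are processed decides the result. Below is a toy sketch in which a function body is just a list of instruction names (<code>ret</code> omitted), and a greedy pass visits call edges in a given order, inlining a callee only if its <em>current</em> size is within the threshold. The representation and the names are made up for illustration:</p><div class style="margin-top:4pt;"><span style="border:1pt solid black;border-radius:2pt;padding:1.6pt;display:inline-block"><pre><code>
```python
def inline_greedy(bodies, order, threshold=5):
    """Visit (caller, callee) pairs in `order` and inline every call to
    `callee` inside `caller`, but only if the callee's *current* body is
    within the threshold."""
    bodies = {name: list(body) for name, body in bodies.items()}  # don't mutate the input
    for caller, callee in order:
        if len(bodies[callee]) > threshold:
            continue  # the callee has grown too large in the meantime
        new_body = []
        for insn in bodies[caller]:
            if insn == "call " + callee:
                new_body.extend(bodies[callee])  # paste a copy of the callee
            else:
                new_body.append(insn)
        bodies[caller] = new_body
    return bodies


example = {
    "square": ["add", "add"],
    "hypot": ["call square", "call square", "add", "sqrt"],
    "tri_hypot": ["extract", "extract", "call hypot"],
}

# Option 1: inline $square into $hypot first
bottom_up = inline_greedy(example, [("hypot", "square"), ("tri_hypot", "hypot")])
# Option 2: inline $hypot into $tri_hypot first
top_down = inline_greedy(example, [("tri_hypot", "hypot"), ("hypot", "square")])

print(bottom_up["tri_hypot"])  # ['extract', 'extract', 'call hypot']
print(top_down["tri_hypot"])   # ['extract', 'extract', 'call square', 'call square', 'add', 'sqrt']
```
</code></pre></span></div><p>With the first order, <code>$hypot</code> grows to 6 instructions and is then too big to be inlined into <code>$tri_hypot</code>. With the second order, <code>$tri_hypot</code> absorbs <code>$hypot</code> but still calls <code>$square</code> twice.</p></div><div style="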
"><p><br><span id="loc-4" style="text-decoration: underline"><h3>Issue 2: ABI requirements on argument passing</h3></span> If we assume the target ABI has only one f32 register for passing arguments, then we would have to generate additional instructions to pass the second argument of <span style="border:1pt solid black;border-radius:2pt;padding:1.6pt;display:inline-block"><code style="white-space: pre-wrap">$hypot</code></span>, and it might actually be more efficient to inline <span style="border:1pt solid black;border-radius:2pt;padding:1.6pt;display:inline-block"><code style="white-space: pre-wrap">$hypot</code></span> instead of <span style="border:1pt solid black;border-radius:2pt;padding:1.6pt;display:inline-block"><code style="white-space: pre-wrap">$square</code></span>.</p><p>This example is not realistic, but the issue does occur when compiling lots of real code.</p><p>A related issue is that the ABI fixes where each argument has to be placed, so calls with many arguments can require a lot of data movement at the call site.</p><p>A solution to this is to make the heuristic depend not only on code size, but also on the number of arguments and results passed to and from the function.</p></div><div style="
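"><p><br>A minimal sketch of such a heuristic, assuming (as in the example above) an ABI with a single f32 argument register and a single return register, and charging one extra move for every argument or result that does not fit. The function names and the weights are hypothetical:</p><div class style="margin-top:4pt;"><span style="border:1pt solid black;border-radius:2pt;padding:1.6pt;display:inline-block"><pre><code>
```python
def call_overhead(n_args, n_results, arg_regs=1, ret_regs=1, call_cost=1):
    """Rough cost of a call: the call instruction itself, plus one extra move
    for every argument or result that does not fit into the ABI's registers."""
    extra_args = max(0, n_args - arg_regs)
    extra_results = max(0, n_results - ret_regs)
    return call_cost + extra_args + extra_results


def should_inline(callee_size, n_args, n_results, threshold=5, **abi):
    """Size heuristic, corrected by the call overhead that inlining removes."""
    return callee_size - call_overhead(n_args, n_results, **abi) <= threshold


# $hypot after inlining both $square calls: 6 instructions, 2 arguments, 1 result.
# A plain size check against the threshold of 5 rejects it, but with a single
# f32 argument register the second argument costs an extra move per call:
print(should_inline(6, 2, 1))  # True: 6 - 2 is within the threshold
```
</code></pre></span></div><p>The weights (one unit per call and per extra move) are arbitrary; a real heuristic would need target-specific costs.</p></div><div style="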
"><p><br><span id="loc-5" style="text-decoration: underline"><h3>Issue 3: (sometimes) prevents optimizations</h3></span></p><div class style="margin-top:4pt;"><span style="border:1pt solid black;border-radius:2pt;padding:1.6pt;display:inline-block"><pre><code>function f32 $myfunc(f32 %a, f32 %b) {<br>@entry:<br> f32 %sum = add %a, %b<br> f32 %sq = sqrt %sum<br> ...<br>}<br><br>function $callsite(f32 %a, f32 %b) {<br>@entry:<br> f32 %as = mul %a, %a<br> f32 %bs = mul %b, %b<br> f32 %x = call $myfunc(%as, %bs)<br> ...<br>}</code></pre></span></div><p>If the target has an efficient <span style="border:1pt solid black;border-radius:2pt;padding:1.6pt;display:inline-block"><code style="white-space: pre-wrap">hypot</code></span> operation (computing sqrt(a² + b²)), then that operation will only be used if we inline <span style="border:1pt solid black;border-radius:2pt;padding:1.6pt;display:inline-block"><code style="white-space: pre-wrap">$myfunc</code></span> into <span style="border:1pt solid black;border-radius:2pt;padding:1.6pt;display:inline-block"><code style="white-space: pre-wrap">$callsite</code></span>, because only then can instruction selection see the multiplications and the square root together.</p><p>This means that inlining now depends on… instruction selection??</p><p>This is not the only optimization prevented by not inlining the call. If <span style="border:1pt solid black;border-radius:2pt;padding:1.6pt;display:inline-block"><code style="white-space: pre-wrap">$callsite</code></span> were called in a loop, then not inlining would also prevent vectorization.</p></div><div style="
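"><p><br>A toy sketch of the instruction-selection side, assuming the target's <code>hypot</code> instruction computes sqrt(a*a + b*b) and representing expressions as nested Python tuples (a made-up representation, not a real IR). The pattern only matches once the multiplications from the call site are substituted into the callee's body, which is what inlining does:</p><div class style="margin-top:4pt;"><span style="border:1pt solid black;border-radius:2pt;padding:1.6pt;display:inline-block"><pre><code>
```python
# Expressions are nested tuples ("op", operand, ...); leaves are value names.

def match_hypot(expr):
    """Return ("hypot", a, b) if expr has the shape sqrt(a*a + b*b), else None."""
    if not (isinstance(expr, tuple) and expr[0] == "sqrt"):
        return None
    inner = expr[1]
    if not (isinstance(inner, tuple) and inner[0] == "add"):
        return None
    lhs, rhs = inner[1], inner[2]
    if all(isinstance(t, tuple) and t[0] == "mul" and t[1] == t[2] for t in (lhs, rhs)):
        return ("hypot", lhs[1], rhs[1])
    return None


def substitute(expr, env):
    """Replace parameter names by the argument expressions; this is what
    inlining does to the callee's body."""
    if isinstance(expr, tuple):
        return (expr[0],) + tuple(substitute(e, env) for e in expr[1:])
    return env.get(expr, expr)


# %sq inside $myfunc, seen in isolation: sqrt(%a + %b)
myfunc_sq = ("sqrt", ("add", "a", "b"))
# the arguments at the call site; x and y stand for $callsite's parameters
call_args = {"a": ("mul", "x", "x"), "b": ("mul", "y", "y")}

print(match_hypot(myfunc_sq))                         # None
print(match_hypot(substitute(myfunc_sq, call_args)))  # ('hypot', 'x', 'y')
```
</code></pre></span></div></div><div style="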
"><p><br><span id="loc-6" style="text-decoration: underline"><h2>Function outlining</h2></span> A related optimization is “outlining”. It is the opposite of inlining: it moves duplicated code into a separate function to reduce code size, which can sometimes also improve performance (because of instruction caching).</p><p>If we perform inlining separately from outlining, we often get suboptimal code.</p></div><div style="
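"><p><br>Whether outlining a repeated sequence actually saves space can be estimated with simple arithmetic. A sketch, assuming every instruction, call and <code>ret</code> costs one unit of code size and ignoring any argument shuffling:</p><div class style="margin-top:4pt;"><span style="border:1pt solid black;border-radius:2pt;padding:1.6pt;display:inline-block"><pre><code>
```python
def outlining_savings(seq_len, occurrences, call_size=1, ret_size=1):
    """Instructions saved by outlining a sequence of `seq_len` instructions
    that appears `occurrences` times.

    before: every occurrence is a full copy of the sequence
    after:  one copy plus a `ret` in the new function, and one call per occurrence
    """
    before = seq_len * occurrences
    after = seq_len + ret_size + call_size * occurrences
    return before - after


print(outlining_savings(4, 3))  # 4: worth outlining
print(outlining_savings(2, 2))  # -1: outlining would make the code bigger
```
</code></pre></span></div></div><div style="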
"><br><span id="loc-7" style="text-decoration: underline"><h2>A better approach</h2></span> We can instead first inline <strong>all</strong> inlinable calls, and <strong>then</strong> perform more aggressive outlining.</div><div style="
"><p><br><span id="loc-8" style="text-decoration: underline"><h3>Step 1: inlining</h3></span> We inline <strong>all</strong> function calls, except for:</p><ul><li>self recursion (obviously), and more generally calls that are part of a recursion cycle, since mutually recursive functions cannot be fully inlined into each other either</li><li>functions explicitly marked as no-inline by the user</li></ul></div><div style="
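"><p><br>A toy sketch of this step: function bodies are lists of instruction names, calls are written as <code>"call name"</code>, and a call is kept when the callee is marked no-inline, is unknown, or can reach the caller again (i.e. the call is part of a recursion cycle). The model and all names are hypothetical; a real IR would also need values to be renamed when a body is pasted in:</p><div class style="margin-top:4pt;"><span style="border:1pt solid black;border-radius:2pt;padding:1.6pt;display:inline-block"><pre><code>
```python
def reachable(graph, start):
    """All functions reachable from `start` through one or more calls."""
    seen, stack = set(), list(graph.get(start, []))
    while stack:
        f = stack.pop()
        if f not in seen:
            seen.add(f)
            stack.extend(graph.get(f, []))
    return seen


def inline_all(bodies, no_inline=frozenset()):
    """Fully inline every call except calls to no-inline or unknown functions
    and calls that are part of a recursion cycle (self or mutual recursion)."""
    graph = {name: [i[len("call "):] for i in body if i.startswith("call ")]
             for name, body in bodies.items()}
    done = {}

    def expand(name):
        if name not in done:
            out = []
            for insn in bodies[name]:
                callee = insn[len("call "):] if insn.startswith("call ") else None
                if (callee is None or callee not in bodies or callee in no_inline
                        or name in reachable(graph, callee)):
                    out.append(insn)  # plain instruction, or a call we must keep
                else:
                    out.extend(expand(callee))  # paste the fully inlined callee
            done[name] = out
        return done[name]

    return {name: expand(name) for name in bodies}


program = {
    "square": ["add", "add"],
    "hypot": ["call square", "call square", "add", "sqrt"],
    "fact": ["cmp", "call fact", "mul"],  # self-recursive
    "log": ["store"],                     # marked no-inline below
    "main": ["call hypot", "call fact", "call log"],
}
print(inline_all(program, no_inline={"log"})["main"])
```
</code></pre></span></div><p>The self-recursive call in <code>fact</code> and the no-inline call to <code>log</code> survive; everything else ends up pasted into <code>main</code>.</p></div><div style="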
"><p><br><span id="loc-9" style="text-decoration: underline"><h3>Step 2: detect duplicate code</h3></span> There are many algorithms for doing this.</p><p>The goal of this step is to both:</p><ul><li>maximize the size of outlinable sections</li><li>minimize the size of the code</li></ul></div><div style="
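"><p><br>As a rough illustration (not any particular one of those algorithms), here is a brute-force sketch that enumerates every repeated window of a single straight-line instruction list and scores it by the number of instructions saved, assuming a call and a <code>ret</code> each cost one instruction. Real outliners use much faster data structures; LLVM's MachineOutliner, for example, finds candidates with a suffix tree. All names below are illustrative:</p><div class style="margin-top:4pt;"><span style="border:1pt solid black;border-radius:2pt;padding:1.6pt;display:inline-block"><pre><code>
```python
from collections import defaultdict


def find_repeats(insns, min_len=2):
    """Brute force: record where every window of `min_len` or more
    instructions starts, and keep the windows that occur at least twice
    without overlapping."""
    positions = defaultdict(list)
    n = len(insns)
    for length in range(min_len, n // 2 + 1):
        for start in range(n - length + 1):
            positions[tuple(insns[start:start + length])].append(start)
    repeats = {}
    for window, starts in positions.items():
        chosen = []
        for s in starts:  # starts are ascending; skip overlapping occurrences
            if not chosen or s >= chosen[-1] + len(window):
                chosen.append(s)
        if len(chosen) >= 2:
            repeats[window] = chosen
    return repeats


def best_candidate(insns, min_len=2, call_size=1, ret_size=1):
    """Pick the repeated window whose outlining saves the most instructions."""
    best, best_saving = None, 0
    for window, starts in find_repeats(insns, min_len).items():
        k, length = len(starts), len(window)
        saving = length * k - (length + ret_size + call_size * k)
        if saving > best_saving:
            best, best_saving = (window, starts), saving
    return best, best_saving


code = ["ld", "add", "mul", "st", "cmp",
        "ld", "add", "mul", "st", "br",
        "ld", "add", "mul", "st"]
print(best_candidate(code))  # ((('ld', 'add', 'mul', 'st'), [0, 5, 10]), 4)
```
</code></pre></span></div></div><div style="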
"><p><br><span id="loc-10" style="text-decoration: underline"><h3>Step 3: slightly reduce size of outlinable section</h3></span> The goal is to slightly shrink the outlinable sections in order to produce better code.</p><p>This should depend on the ABI and the available instructions, and have the goal of:</p><ul><li>reducing the argument shuffles required at all call sites</li><li>reducing register pressure</li><li>not preventing good isel choices and optimizations</li></ul><p>This also depends on the targeted code size.</p></div><div style="
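"><p><br>A toy sketch of one such criterion, assuming instructions are <code>(dest, op, sources)</code> triples and using the number of live-in values (which become the outlined function's arguments) as a stand-in for the cost of argument shuffles. It ignores results, register pressure and instruction selection, and all names are illustrative:</p><div class style="margin-top:4pt;"><span style="border:1pt solid black;border-radius:2pt;padding:1.6pt;display:inline-block"><pre><code>
```python
def live_ins(seq):
    """Values the sequence reads before defining them itself; they would
    become arguments of the outlined function."""
    defined, needed = set(), []
    for dest, _op, srcs in seq:
        for s in srcs:
            if s not in defined and s not in needed:
                needed.append(s)
        defined.add(dest)
    return needed


def trim(seq, min_len=2):
    """Greedily drop the first or the last instruction as long as that lowers
    the number of arguments the outlined function would need."""
    seq = list(seq)
    while len(seq) > min_len:
        shorter = min([seq[1:], seq[:-1]], key=lambda s: len(live_ins(s)))
        if len(live_ins(shorter)) >= len(live_ins(seq)):
            break  # no single trim helps any more
        seq = shorter
    return seq


section = [
    ("t0", "load", ["p"]),
    ("t1", "load", ["q"]),
    ("t2", "mul", ["t0", "t0"]),
    ("t3", "mul", ["t1", "t1"]),
    ("t4", "add", ["t2", "t3"]),
    ("t5", "add", ["t4", "k"]),
]
print(live_ins(section))        # ['p', 'q', 'k']
print(live_ins(trim(section)))  # ['p', 'q']
```
</code></pre></span></div><p>Here the trailing <code>add</code> that reads the extra value <code>k</code> is dropped, so the outlined function needs two arguments instead of three, at the cost of one instruction that stays at every call site.</p></div><div style="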
"><br><span id="loc-11" style="text-decoration: underline"><h3>Step 4: perform outlining</h3></span> This is obvious.</div><div style="
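"><p><br>For a single straight-line instruction list, the transformation itself can be sketched as follows. The outlined function's name is a made-up placeholder, and a real compiler would additionally have to pass live-in values as arguments and return live-out values, which is where Step 3 comes in:</p><div class style="margin-top:4pt;"><span style="border:1pt solid black;border-radius:2pt;padding:1.6pt;display:inline-block"><pre><code>
```python
def outline(insns, pattern, name="outlined_0"):
    """Replace every non-overlapping occurrence of `pattern` in the
    straight-line list `insns` by a call to a new function `name`.
    Returns (new caller body, body of the new function)."""
    pattern = list(pattern)
    if not pattern:
        raise ValueError("empty pattern")
    k = len(pattern)
    out, i = [], 0
    while i < len(insns):
        if insns[i:i + k] == pattern:
            out.append("call " + name)  # one call replaces the whole sequence
            i += k
        else:
            out.append(insns[i])
            i += 1
    return out, pattern + ["ret"]


code = ["ld", "add", "mul", "st", "cmp",
        "ld", "add", "mul", "st", "br",
        "ld", "add", "mul", "st"]
caller, outlined = outline(code, ["ld", "add", "mul", "st"])
print(caller)    # ['call outlined_0', 'cmp', 'call outlined_0', 'br', 'call outlined_0']
print(outlined)  # ['ld', 'add', 'mul', 'st', 'ret']
```
</code></pre></span></div></div><div style="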
"><p><br><span id="loc-12" style="text-decoration: underline"><h3>Issue 1: high compile-time memory usage</h3></span> Inlining <strong>all</strong> function calls first increases memory usage during compilation by A LOT.</p><p>I’m sure that there is a smarter way to implement this method without actually performing the inlining…</p></div><div style="
"><p><br><span id="loc-13" style="text-decoration: underline"><h2>Conclusion</h2></span> Function inlining is much more complex than one might think.</p><p>PS: No idea how to implement this…</p><p>Subscribe to the <a href="atom.xml">Atom feed</a> to get notified about future compiler-related articles.</p></div></span></span></td>
<td></td>
</tr>
</table>
<script>

function gotoVariant(variant) {
window.location.href = window.location.href.replace(/\.\w+.html/g, variant);
}

window.addEventListener('beforeprint', (event) => {
gotoVariant('.min.pdf');
});

</script>
<script src="coffee.js" async>
</script>
</body>
</html>