1 line
15 KiB
HTML
1 line
15 KiB
HTML
<!doctype html><title>Automatically inlining functions is not easy</title><meta charset=utf-8><meta content="width=device-width,initial-scale=1" name=viewport><link href=res/favicon.png rel=icon sizes=512x512><link href=res/favicon.png rel=image_src type=image/png><link title="alexs168's blog" href=atom.xml rel=alternate type=application/atom+xml><body><style>@font-face{font-family:DejaVu Sans Mono;src:local(DejaVu Sans Mono),url(res/DejaVuSansMono.woff2)format("woff2"),local(Courier New),local(Courier),local(monospace);font-weight:400;font-style:normal;font-display:swap}body{font-family:DejaVu Sans Mono;font-size:17pt}td{vertical-align:top;width:100%;display:inline}h1,h2,h3,h4{margin-top:1%;margin-bottom:.75%;margin-left:-.75%}p{margin-top:.75%;margin-bottom:.75%}ul{margin-top:0%}.current{font-weight:700}pre{margin-top:0;margin-bottom:0;display:inline}a,a:visited{color:#3f51b5;text-decoration:none}</style><table><tr><td><span class=sidebar><span class=column-fixed style=justify-content:center;width:25%;display:inline-flex;position:fixed><table><tr><td><span class=table-of-contents><div style="border:1.2pt solid #000;border-radius:2pt;padding:3%"><p><span style=text-decoration:underline>Table of contents</span><br><p style=line-height:1.1><span style=text-indent:0;display:flex><span style=margin-right:11pt>├─</span> <span style=flex:1><span class=headingr id=headingr-0><a href=#loc-1>Introduction</a></span></span></span><p style=line-height:1.1><span style=text-indent:0;display:flex><span style=margin-right:11pt>├─</span> <span style=flex:1><span class=headingr id=headingr-1><a href=#loc-2>Greedy inlining with heuristics</a></span></span></span><p style=line-height:1.1><span style=text-indent:0;display:flex><span style=margin-right:11pt>| ├─</span> <span style=flex:1><span class=headingr id=headingr-2><a href=#loc-3>Issue 1: (sometimes) multiple options</a></span></span></span><p style=line-height:1.1><span style=text-indent:0;display:flex><span style=margin-right:11pt>| ├─</span> <span style=flex:1><span class=headingr id=headingr-3><a href=#loc-4>Issue 2: ABI requirements on argument passing</a></span></span></span><p style=line-height:1.1><span style=text-indent:0;display:flex><span style=margin-right:11pt>| └─</span> <span style=flex:1><span class=headingr id=headingr-4><a href=#loc-5>Issue 3: (sometimes) prevents optimizations</a></span></span></span><p style=line-height:1.1><span style=text-indent:0;display:flex><span style=margin-right:11pt>├─</span> <span style=flex:1><span class=headingr id=headingr-5><a href=#loc-6>Function outlining</a></span></span></span><p style=line-height:1.1><span style=text-indent:0;display:flex><span style=margin-right:11pt>├─</span> <span style=flex:1><span class=headingr id=headingr-6><a href=#loc-7>A better approach</a></span></span></span><p style=line-height:1.1><span style=text-indent:0;display:flex><span style=margin-right:11pt>| ├─</span> <span style=flex:1><span class=headingr id=headingr-7><a href=#loc-8>Step 1: inlining</a></span></span></span><p style=line-height:1.1><span style=text-indent:0;display:flex><span style=margin-right:11pt>| ├─</span> <span style=flex:1><span class=headingr id=headingr-8><a href=#loc-9>Step 2: detect duplicate code</a></span></span></span><p style=line-height:1.1><span style=text-indent:0;display:flex><span style=margin-right:11pt>| ├─</span> <span style=flex:1><span class=headingr id=headingr-9><a href=#loc-10>Step 3: slightly reduce size of outlinable section</a></span></span></span><p style=line-height:1.1><span style=text-indent:0;display:flex><span style=margin-right:11pt>| ├─</span> <span style=flex:1><span class=headingr id=headingr-10><a href=#loc-11>Step 4: perform outlining</a></span></span></span><p style=line-height:1.1><span style=text-indent:0;display:flex><span style=margin-right:11pt>| └─</span> <span style=flex:1><span class=headingr id=headingr-11><a href=#loc-12>Issue 1: high compile-time memory usage</a></span></span></span><p style=line-height:1.1><span style=text-indent:0;display:flex><span style=margin-right:11pt>└─</span> <span style=flex:1><span class=headingr id=headingr-12><a href=#loc-13>Conclusion</a></span></span></span></div></span><span class=table-of-contents><script>document.addEventListener(`DOMContentLoaded`,(()=>{let a=[`h2`,`h3`,`h4`].flatMap(a=>Array.from(document.getElementsByTagName(a))).sort((a,b)=>a.getBoundingClientRect().top- b.getBoundingClientRect().top);let b=document.documentElement.scrollHeight- window.innerHeight;document.addEventListener(`scroll`,c=>{let d=-(document.documentElement.getBoundingClientRect().y/b);let e=d*window.innerHeight;let f=a.map(a=>0>a.getBoundingClientRect().top- e).lastIndexOf(!0);Array.from(document.getElementsByClassName(`headingr`)).map(a=>a.classList.remove(`current`));f!=-1&&document.getElementById(`headingr-`+ f).classList.add(`current`)})}))</script><style>.table-of-contents>p>span{width:100%}</style></span><tr><td><br> <a href=index.html><b>Website Home</b></a> <br><tr><td><p>Renderings of this page:<ul><li><a onclick='gotoVariant(".min.pdf");' href=#>minimal PDF (printable)</a><li><a onclick='gotoVariant(".nano.html");' href=#>minimal HTML</a></ul><tr><td><a href=atom.xml>Atom feed</a> <br><tr><td><style>@media only screen and (width<=1200px){.sidebar{display:none!important}.column-fixed{width:0%!important}.body-column{left:3%!important}}@media only screen and (width<=1800px){.body-column>span{width:75%!important}}@media only screen and (width<=1200px){.body-column{width:97%!important}.body-column>span{width:100%!important}}.hide{background:#000;transition:background .3s linear;display:inline}.hide:hover,.hide:focus{background:0 0}</style></table><style>.column-fixed>table>tbody>tr>td>*{width:100%}</style></span></span><td><span class=body-column style=width:72%;position:absolute;left:28%><span style=width:50%;display:inline-block><div><p><br><h1>Automatically inlining functions is not easy</h1><p><span style=font-size:14pt><p>Git revision <a href=https://github.com/alex-s168/website/tree/9c2913af189b62c028f6f773370f50f9e6c13307>#9c2913af</a><br>Modified at 11. August 2025 16:38<p>Written by <a href=https://alex.vxcc.dev>alex_s168</a></span></div><div><br>Note that the <a onclick='gotoVariant(".min.pdf");' href=#>PDF Version</a> of this page might look a bit better styling wise.</div><div><br><span id=loc-1 style=text-decoration:underline><h2>Introduction</h2></span> Function calls have some overhead, which can sometimes be a big issue for other optimizations. Because of that, compiler backends (should) inline function calls. There are however many issues with just greedily inlining calls…</div><div><p><br><span id=loc-2 style=text-decoration:underline><h2>Greedy inlining with heuristics</h2></span> This is the most obvious approach. We can just inline all functions with only one call, and then inline calls where the inlined function does not have many instructions.<p>Example:<div style=margin-top:4pt><span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><pre><code>function f32 $square(f32 %x) {<br>@entry:<br> // this is stupid, but I couldn't come up with a better example<br> f32 %e = add %x, 0<br> f32 %out = add %e, %x<br> ret %out<br>}<br><br>function f32 $hypot(f32 %a, f32 %b) {<br>@entry:<br> f32 %as = call $square(%a)<br> f32 %bs = call $square(%b)<br> f32 %sum = add %as, %bs<br> f32 %o = sqrt %sum<br> ret %o<br>}<br><br>function f32 $tri_hypot({f32, f32} %x) {<br> f32 %a = extract %x, 0<br> f32 %b = extract %x, 1<br> f32 %o = call $hypot(%a, %b) // this is a "tail call"<br> ret %o<br>}<br><br>// let's assume that $hypot is used someplace else in the code too</code></pre></span></div></div><div><br>Let’s assume our inlining treshold is 5 operations. Then we would get – Waait there are multiple options…</div><div><p><br><span id=loc-3 style=text-decoration:underline><h3>Issue 1: (sometimes) multiple options</h3></span> If we inline the <span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><code style=white-space:pre-wrap>$square</code></span> calls, then <span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><code style=white-space:pre-wrap>$hypot</code></span> will have too many instructions to be inlined into <span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><code style=white-space:pre-wrap>$tri_hypot</code></span>:<div style=margin-top:4pt><span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><pre><code>...<br>function f32 $hypot(f32 %a, f32 %b) {<br>@entry:<br> // more instructions than our inlining treshold:<br> f32 %ase = add %a, 0<br> f32 %as = add %ase, %a<br> f32 %bse = add %b, 0<br> f32 %bs = add %bse, %b<br> f32 %sum = add %as, %bs<br> f32 %o = sqrt %sum<br> ret %o<br>}<br>...</code></pre></span></div></div><div><p><br>The second option is to inline the <span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><code style=white-space:pre-wrap>$hypot</code></span> call into <span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><code style=white-space:pre-wrap>$tri_hypot</code></span>. (There are also some other options)<p>Now in this case, it seems obvious to prefer inlining <span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><code style=white-space:pre-wrap>$square</code></span> into <span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><code style=white-space:pre-wrap>$hypot</code></span>.</div><div><p><br><span id=loc-4 style=text-decoration:underline><h3>Issue 2: ABI requirements on argument passing</h3></span> If we assume the target ABI only has one f32 register for passing arguments, then we would have to generate additional instructions for passing the second argument of <span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><code style=white-space:pre-wrap>$hypot</code></span>, and then it might actually be more efficient to inline <span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><code style=white-space:pre-wrap>$hypot</code></span> instead of <span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><code style=white-space:pre-wrap>$square</code></span>.<p>This example is not realistic, but this issue actually occurs when compiling lots of code.<p>Another related issue is that having more arguments arranged in a fixed way will require lots of moving data arround at the call site.<p>A solution to this is to make the heuristics not just output code size, but also make it depend on the number of arguments / outputs passed to the function.</div><div><p><br><span id=loc-5 style=text-decoration:underline><h3>Issue 3: (sometimes) prevents optimizations</h3></span><div style=margin-top:4pt><span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><pre><code>function f32 $myfunc(f32 %a, f32 %b) {<br>@entry:<br> f32 %sum = add %a, %b<br> f32 %sq = sqrt %sum<br> ...<br>}<br><br>function $callsite(f32 %a, f32 %b) {<br>@entry:<br> f32 %as = add %a, %a<br> f32 %bs = add %b, %b<br> f32 %x = call $myfunc(%as, %bs)<br> ...<br>}</code></pre></span></div><p>If the target has a efficient <span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><code style=white-space:pre-wrap>hypot</code></span> operation, then that operation will only be used if we inline <span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><code style=white-space:pre-wrap>$myfunc</code></span> into <span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><code style=white-space:pre-wrap>$callsite</code></span>.<p>This means that inlining is now depended on… instruction selection??<p>This is not the only optimization prevented by not inlining the call. If <span style="border:1pt solid #000;border-radius:2pt;padding:1.6pt;display:inline-block"><code style=white-space:pre-wrap>$callsite</code></span> were to be called in a loop, then not inlining would prevent vectorization.</div><div><p><br><span id=loc-6 style=text-decoration:underline><h2>Function outlining</h2></span> A related optimization is “outlining”. It’s the opposite to inlining. It moves duplicate code into a function, to reduce code size, and sometimes increase performance (because of instruction caching)<p>If we do inlining seperately from outlining, we often get unoptimal code.</div><div><br><span id=loc-7 style=text-decoration:underline><h2>A better approach</h2></span> We can instead first inline <strong>all</strong> inlinable calls, and <strong>then</strong> perform more agressive outlining.</div><div><p><br><span id=loc-8 style=text-decoration:underline><h3>Step 1: inlining</h3></span> We inline <strong>all</strong> function calls, except for:<ul><li>self recursion (obviously)<li>functions explicitly marked as no-inline by the user</ul></div><div><p><br><span id=loc-9 style=text-decoration:underline><h3>Step 2: detect duplicate code</h3></span> There are many algorithms for doing this.<p>The goal of this step is to both:<ul><li>maximize size of outlinable section<li>minimize size of code</ul></div><div><p><br><span id=loc-10 style=text-decoration:underline><h3>Step 3: slightly reduce size of outlinable section</h3></span> The goal is to reduce size of outlinable sections, to make the code more optimal.<p>This should be ABI and instruction depended, and have the goal of:<ul><li>reducing argument shuffles required at all call sites<li>reducing register preassure<li>not preventing good isel choices and optimizations.</ul><p>this is also dependent on the targetted code size.</div><div><br><span id=loc-11 style=text-decoration:underline><h3>Step 4: perform outlining</h3></span> This is obvious.</div><div><p><br><span id=loc-12 style=text-decoration:underline><h3>Issue 1: high compile-time memory usage</h3></span> Inlining <strong>all</strong> function calls first will increase the memory usage during compilation by A LOT<p>I’m sure that there is a smarter way to implement this method, without actually performing the inlining…</div><div><p><br><span id=loc-13 style=text-decoration:underline><h2>Conclusion</h2></span> Function inlining is much more complex than one might think.<p>PS: No idea how to implement this…<p>Subscribe to the <a href=atom.xml>Atom feed</a> to get notified about futre compiler-related articles.</div></span></span><td></table><script>var gotoVariant=(a=>{window.location.href=window.location.href.replace(/\.\w+.html/g,a)});window.addEventListener(`beforeprint`,a=>{gotoVariant(`.min.pdf`)})</script><script async src=coffee.js></script> |