Let's rethink unified building and LTCG

Discussions about the development and maturation of the platform code (UXP).
Warning: may contain highly-technical topics.

jobbautista9
Keeps coming back
Posts: 782
Joined: 2020-11-03, 06:47
Location: Philippines

Post by jobbautista9 » 2023-04-22, 03:43

So I had been following Issue #80 (UXP) for a while before it got closed. I have to admit that I was disappointed when the effort to de-unify everything had to be stopped, as I was excited to get the benefits of improved debuggability as well as link-time optimization. But I understand why it had to be done. Linkers are just not suited to handling such a large number of objects and functions, and it's been shown that optimizing at compile time instead of link time yields better trade-offs for a codebase like ours. Mozilla has been dogmatic about shoving as many optimizations as possible into their platform (even when those optimizations conflict with each other or are simply redundant). Continuing down our path in Issue #80 would have been dogmatic as well, just in the other direction.

However, I feel we're still being kinda dogmatic. Yes, we've struck a fine balance in how we use unified building, a lot better than how Mozilla does it, but I think we might have dismissed link-time code generation too early. I don't see LTCG being a thing for the whole platform in the near future (especially xul.dll, which despite the splitting out of gkmedias, JS, and ICU is still quite the libevil), but we can still use LTCG for some of the shared libraries we build. One example would be Spidermonkey. Since we've finally split the JS library out of libxul in Issue #62 (UXP), I think Spidermonkey would be a perfect LTO candidate for the following reasons:
  • It's fairly self-contained: nearly all of Spidermonkey's sources are listed in js/src/moz.build. Two other moz.build files also build into Spidermonkey, in config/external/ffi and modules/fdlibm/src, and adding CFLAGS and LDFLAGS to those shouldn't be a hassle. This is unlike gkmedias, whose sources are scattered across many different directories in the tree.
  • The deprot has long been resolved: we want to build all sources de-unified if we're using whole program optimization, since unified building is redundant (and inferior) when it comes to optimizing code across many separate but related files. It should be easy to build Spidermonkey de-unified again. Oh, and since we're no longer using unified building, we get a better debugging experience as a bonus!
  • The file size: Spidermonkey sits in a pretty sweet spot of 7.26 MB (most of the size is from code, not data), which is neither so big that it tortures the linker nor so small that LTO would be pointless.
  • We can probably revisit Issue #1676 (UXP): I admit that I'm still a bit bitter about this. :P I just prefer smaller mozbuilds and easier porting of Mozilla code. IIUC it got backed out because it interfered with the performance gains from building chunks of code unified. With LTCG this should no longer be an issue, as all source files are taken into consideration when optimizing the library, instead of the library being optimized chunk by chunk.
  • All of Spidermonkey is optimized fairly and evenly, instead of following the arbitrary chunk-by-chunk approach of unified building.
  • Spidermonkey is essentially its own program, so WPO actually makes sense here. :D
I've done a build with LTO enabled for Spidermonkey before writing this. The build time didn't change much, since the initial compilation stage is faster with LTO. Memory usage did increase significantly (as expected from LTO), but not excessively. It's working perfectly fine so far. mozjs.dll's file size went down from 7,436 KB to 7,221 KB, which is within expectations. I don't expect miraculous improvements in JavaScript crunching, but I'm not going to discount the possibility of performance gains either, since on paper LTO can optimize code better than compile-time optimization. Besides, performance isn't what I'm aiming for with this endeavor; rather, it's a saner and more balanced approach to code optimization. You can see my work branch here.
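
For scale, the drop in mozjs.dll's size quoted above works out to roughly 2.9%, using only the two figures reported in the post:

$$\frac{7436\,\mathrm{KB} - 7221\,\mathrm{KB}}{7436\,\mathrm{KB}} = \frac{215}{7436} \approx 0.029 \approx 2.9\%$$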

So that's it for LTO. I might try ICU next, as it is similar in complexity to Spidermonkey, but I will probably never do this for gkmedias, for two reasons: too many mozbuilds to account for, and too much code that is unrelated to each other. Speaking of which, our unified building of gkmedias sources is fine so far, but it could be better. What if, instead of thinking of unified building solely as a tool for faster from-clobber compiles, we thought of it as a sort of mini-WPO in gkmedias? I propose we sacrifice some build parallelism, debuggability, and incremental compile times in exchange for using larger chunks of related code for unified building. This has already been done in commit 3122de9547 from Issue #1862 (UXP), where all of the harfbuzz sources were compiled into a single chunk. We can probably do this for other third-party media libraries we have too. Obviously not every library should be compiled as a single chunk, but the idea is to limit the arbitrary number of unified chunks produced so we can optimize more fairly and evenly.

Let's look at libjxl as an example (even though we currently build it de-unified; libopus, on the other hand, uses unified sources, so I'm confused about what our preferred build mode for /media is). It's probably a bad idea to build every source there into a single chunk, so let's have more than one. One chunk could be for the JXL decoder, one for modular decoding, and another for decoding JPEG image data within a JPEG-XL file. It's pretty oversimplified, but hopefully you get the idea.
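
The difference between the two ways of forming unified chunks can be sketched in a few lines of Python. This is a minimal illustration, not the mozbuild implementation: fixed_size_chunks only approximates cutting a sorted source list into consecutive chunks of N files (the default of 16 here is an assumption), while component_chunks groups files by a caller-supplied rule. The file paths and the libjxl_component() rule are hypothetical, loosely following the decoder / modular / JPEG data split described above.

```python
from collections import defaultdict


def fixed_size_chunks(sources, files_per_chunk=16):
    """Roughly what unified building does by default: walk the sorted
    source list and cut it into consecutive chunks of at most
    files_per_chunk files, regardless of what the files contain."""
    ordered = sorted(sources)
    return [ordered[i:i + files_per_chunk]
            for i in range(0, len(ordered), files_per_chunk)]


def component_chunks(sources, component_of):
    """The proposed alternative: one chunk per logical component, so each
    unified translation unit only contains closely related code."""
    groups = defaultdict(list)
    for src in sorted(sources):
        groups[component_of(src)].append(src)
    return dict(groups)


# Hypothetical file list, loosely modelled on a JPEG-XL decoder layout.
SOURCES = [
    "lib/jxl/dec_frame.cc",
    "lib/jxl/dec_group.cc",
    "lib/jxl/dec_modular.cc",
    "lib/jxl/modular/encoding/dec_ma.cc",
    "lib/jxl/modular/transform/transform.cc",
    "lib/jxl/jpeg/dec_jpeg_data.cc",
    "lib/jxl/jpeg/dec_jpeg_data_writer.cc",
]


def libjxl_component(path):
    """Hypothetical routing rule: decoder core, modular, JPEG data."""
    if "/jpeg/" in path:
        return "jpeg_data"
    if "/modular/" in path or "dec_modular" in path:
        return "modular"
    return "decoder"


if __name__ == "__main__":
    # A small chunk size makes the arbitrary split visible: the middle
    # chunk mixes JPEG-data files with modular files.
    for chunk in fixed_size_chunks(SOURCES, files_per_chunk=3):
        print("fixed:", chunk)
    for name, files in component_chunks(SOURCES, libjxl_component).items():
        print(name + ":", files)
```

Running it with files_per_chunk=3 shows the problem described in the post: the second fixed-size chunk ends up with both JPEG-data and modular files purely because of alphabetical order, whereas component_chunks keeps each group together.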

So that pretty much sums up what I'd like to do with how we handle optimizations in the build system right now. I would really appreciate any thoughts and criticisms you may have on this post. For all I know I could be talking out of my ass this whole time; I don't have the years of experience Moonchild has dealing with code optimizations (it's his forte after all, and the reason Pale Moon exists in the first place: because we can optimize better than Mozilla does). But hopefully all the research and experimentation for this effort pays off in some way... :)

merry mimas

XUL add-ons developer. You can find a list of add-ons I manage at http://rw.rs/~job/software.html.

Mima avatar by 絵虎. Pixiv post: https://www.pixiv.net/en/artworks/15431817

Moonchild
Pale Moon guru
Posts: 35474
Joined: 2011-08-28, 17:27
Location: Motala, SE

Re: Let's rethink unified building and LTCG

Post by Moonchild » 2023-04-30, 14:00

Deprot is almost inevitable if upstream doesn't test for it and we build unified.
With JS split out into its own .dll, going back to de-unified there makes sense; it's part of the ever-necessary search for the right balance given our code size. The issue I have with LTCG, though, is not necessarily unified building, but rather the fact that pushing the linker so hard almost always (with our code size) results in less optimized code than a standard codegen-at-compile-time build. Unless there is a significant improvement to be had, I'd prefer not to use LTCG, as it makes troubleshooting performance issues much harder: the compile optimizations end up in a mostly unknown state and aren't reproducible from build to build.
"Sometimes, the best way to get what you want is to be a good person." -- Louis Rossmann
"Seek wisdom, not knowledge. Knowledge is of the past; wisdom is of the future." -- Native American proverb
"Linux makes everything difficult." -- Lyceus Anubite

Re: Let's rethink unified building and LTCG

Post by Moonchild » 2023-04-30, 22:59

Benchmarking in Dromaeo puts non-unified + LTCG behind unified without LTCG on average, in builds I just did back to back to keep the comparison as apples-to-apples as possible.
The average percentage difference isn't huge in most cases, but it is certainly noticeable.
Attachments
jslto1.png

q160765803
Apollo supporter
Posts: 32
Joined: 2023-04-13, 07:57

Re: Let's rethink unified building and LTCG

Post by q160765803 » 2023-04-30, 23:05

Moonchild wrote (2023-04-30, 22:59):
> Benchmarking in dromaeo puts non-unified and LTCG behind unified without LTCG on average in the builds I just did back to back for apples to apples as much as possible.
> The percentage average isn't huge in most cases, but certainly noticeable.

Is a lower value better, or the reverse?

Re: Let's rethink unified building and LTCG

Post by jobbautista9 » 2023-05-01, 03:06

Moonchild wrote (2023-04-30, 14:00):
> The issue I have with LTCG though is not necessarily unified building; but rather the fact that pushing the linker so hard almost always (with our code size) results in less optimized code than making a standard codegen-at-compile-time build and unless there is a significant improvement to be had I'd prefer to not use LTCG as it makes troubleshooting performance issues much harder due to the compile optimizations being in a mostly unknown state and not reproducible build-to-build.

I didn't realize the issue with LTCG ran that deep. The more you know, I guess! :D

Moonchild wrote (2023-04-30, 22:59):
> Benchmarking in dromaeo puts non-unified and LTCG behind unified without LTCG on average in the builds I just did back to back for apples to apples as much as possible.
> The percentage average isn't huge in most cases, but certainly noticeable.

I didn't expect that much regression, tbh. Raytracing being slower isn't a big deal to me, but arrays, code evaluation, string operations, and regex regressing (even by just a percent) is concerning... Could you test with all sources unified + LTCG? I can't seem to find a working online version of the Dromaeo benchmark, so I couldn't test it myself, and I'd like to see whether the regression comes from LTCG or from de-unifying the sources.

Re: Let's rethink unified building and LTCG

Post by jobbautista9 » 2023-05-01, 03:13

q160765803 wrote (2023-04-30, 23:05):
> Is value lower the better or reverse?

It's a mix of both; you should look at the colors instead. A green percentage means deunified sources + LTCG performed significantly better than the current unified build without LTCG, while red means the opposite. Green highlighting shows which build did better in each test, regardless of significance.

Re: Let's rethink unified building and LTCG

Post by Moonchild » 2023-05-01, 03:13

jobbautista9 wrote (2023-05-01, 03:06):
> I can't seem to see an working online verson of the dromaeo benchmark, so I couldn't test myself,

A copy of it is available (without result saving) at https://testserver.palemoon.org/dromaeo; Tobin made a rip of it to self-host.
I used "All javascript tests" for the run I posted.

Re: Let's rethink unified building and LTCG

Post by jobbautista9 » 2023-05-01, 03:14

Moonchild wrote (2023-05-01, 03:13):
> A copy of it is available (sans saving results) on https://testserver.palemoon.org/dromaeo -- Tobin made a rip of it to self-host.

Thanks! I will do the tests myself then. :)

Re: Let's rethink unified building and LTCG

Post by q160765803 » 2023-05-01, 03:16

jobbautista9 wrote (2023-05-01, 03:13):
> q160765803 wrote (2023-04-30, 23:05):
>> Is value lower the better or reverse?
>
> It's a mix of both; you should look at the colors instead. Green percentage means deunified sources + LTCG has performed significantly better than the current unified w/o LTCG, while red means the opposite. Green highlights shows which build did better in each test, regardless of significance.

So fields with a green background are better, right?

Re: Let's rethink unified building and LTCG

Post by jobbautista9 » 2023-05-01, 03:17

q160765803 wrote (2023-05-01, 03:16):
> so fields with green background is better, right?

Yes.

Re: Let's rethink unified building and LTCG

Post by jobbautista9 » 2023-05-01, 10:38

I've just done some tests. I'm not sure I can draw a clear conclusion from them; the results vary heavily.

Deunified + LTCG wins by 6% in total over the previous unified non-LTCG build, with big wins in RegEx, Primes, and Richards. It loses significantly in the DNA Count test.

Deunified + LTCG holds up well against its non-LTCG counterpart, with a 2.6% total win. Against unified + LTCG, it has a 3.5% total win.

If we're not going to use LTCG, deunified appears to perform better than unified, with a 3.2% total win, big wins in 3D raytrace, AES, RegEx, and Richards, and a significant loss in the DNA Count test.

But amid the wins, deunified + LTCG also has several minor losses. While they're small, they could add up.
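
The thread doesn't say how the total win percentages were calculated. One common way to combine per-test results, so that many small losses are weighed fairly against a few big wins, is the geometric mean of per-test ratios. The Python sketch below assumes that approach; the test names and scores are made up for illustration and are not taken from either benchmark run, and the higher_is_better flag is a hypothetical helper parameter for tests that report times instead of rates.

```python
from math import prod


def relative_score(baseline, candidate, higher_is_better=True):
    """Per-test ratio where a value above 1.0 means the candidate build won."""
    if higher_is_better:         # e.g. runs per second
        return candidate / baseline
    return baseline / candidate  # e.g. milliseconds per run


def total_change_percent(ratios):
    """Combine per-test ratios with a geometric mean and report the overall
    change in percent (positive: the candidate build wins in total)."""
    if not ratios:
        raise ValueError("need at least one test result")
    geomean = prod(ratios) ** (1.0 / len(ratios))
    return (geomean - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical scores (higher is better), not measured data: one big
    # win, one big loss and three one-percent losses.
    baseline = {"regex": 100.0, "dna_count": 100.0,
                "strings": 100.0, "arrays": 100.0, "eval": 100.0}
    candidate = {"regex": 112.0, "dna_count": 90.0,
                 "strings": 99.0, "arrays": 99.0, "eval": 99.0}
    ratios = [relative_score(baseline[t], candidate[t]) for t in baseline]
    print(f"total change: {total_change_percent(ratios):+.2f}%")
```

The geometric mean keeps a +100% result on one test and a -50% result on another from averaging out to a misleading +25%; they cancel to 0%. In the made-up data, the single big win still outweighs the big loss, but the three one-percent losses are enough to push the total slightly below zero.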

However, if you ask me whether going with deunified + LTCG is going to be a performance issue, I don't think we need to worry about it.

The hardware I used for this benchmark is an ASUS ROG Strix G15, which has 16 GB RAM and a Ryzen 7 4800H CPU.
Attachments
benchmark-result.pdf