However, I feel we're still being kinda dogmatic. Yes, we've struck a fine balance in how we use unified building, a lot better than how Mozilla does it, but I think we might've dismissed link-time code generation too early. I don't see LTCG being a thing for the whole platform in the near future (especially xul.dll, which despite the splitting of gkmedias, JS, and ICU is still quite the libevil), but we can still use LTCG for some of the shared libraries we build. One example would be Spidermonkey. Since we've finally split the JS library out of libxul in Issue #62 (UXP), I think Spidermonkey would be a perfect LTO candidate for the following reasons:
- It's fairly self-contained: Nearly all of Spidermonkey's sources are listed in js/src/moz.build. There are two other mozbuilds which build into Spidermonkey, residing in config/external/ffi and modules/fdlibm/src, and it shouldn't be a hassle to add CFLAGS and LDFLAGS to them. This is unlike gkmedias, where sources are scattered across many different directories in the tree.
- The deprot has long been resolved: We want to build all sources de-unified if we're using whole program optimization, since unified building is redundant (and inferior) in the field of optimizing code across many separate but related files. It should be easy to build Spidermonkey deunified again. Oh, and since we're no longer using unified building we get a better debugging experience as a bonus!
- The filesize: Spidermonkey sits in a pretty sweet spot of 7.26 MB (most of the size is from code, not data), which is not so big that it tortures the linker and not so small that LTO would be pointless.
- We can probably revisit Issue #1676 (UXP): I admit that I'm still a bit bitter about this. I just prefer smaller mozbuilds and easier porting of Mozilla code. IIUC it got backed out because it screwed with the performance gains from unified building of chunks of code. With LTCG this should no longer be an issue, as all source files are now taken into consideration when optimizing the library instead of optimizing by chunks.
- All of Spidermonkey is optimized fairly and evenly instead of the arbitrary chunk by chunk method of UB.
- Spidermonkey is essentially its own program, so WPO actually makes sense here.
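To make the "add CFLAGS and LDFLAGS" part a bit more concrete, here's a rough sketch in plain Python (not an actual moz.build patch) of which flags each of the three directories would need. The helper names `lto_flags` and `plan` and the compiler labels are made up for illustration; the flags themselves are the usual ones as far as I know (`-GL` plus `-LTCG` for MSVC, `-flto` for GCC/Clang), and the directory list is just the three mozbuilds mentioned above.

```python
def lto_flags(compiler):
    """Return (compile_flags, link_flags) that turn on whole-program optimization.

    `compiler` is a hypothetical label: "msvc", "gcc", or "clang".
    """
    if compiler == "msvc":
        # -GL defers code generation to link time; -LTCG does it in the linker.
        return ["-GL"], ["-LTCG"]
    if compiler in ("gcc", "clang"):
        # -flto has to be passed both when compiling and when linking.
        return ["-flto"], ["-flto"]
    raise ValueError("unknown compiler: %s" % compiler)


# The three directories whose mozbuilds build into Spidermonkey (from the list above).
SPIDERMONKEY_DIRS = ["js/src", "config/external/ffi", "modules/fdlibm/src"]


def plan(compiler):
    """Map each directory to the flag lists its moz.build would have to append."""
    cflags, ldflags = lto_flags(compiler)
    return {
        d: {
            "CFLAGS": list(cflags),
            "CXXFLAGS": list(cflags),
            "LDFLAGS": list(ldflags),
        }
        for d in SPIDERMONKEY_DIRS
    }


if __name__ == "__main__":
    for directory, flags in sorted(plan("msvc").items()):
        print(directory, flags)
```

In a real moz.build this would presumably boil down to a few `CFLAGS += [...]` style lines behind a compiler check, with the sources listed as de-unified; I haven't verified that against the tree yet.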
So that's it about LTO. I might try ICU next as that one is similar in complexity to Spidermonkey, but I will probably never do this for gkmedias for two reasons: too many mozbuilds to account for, and too much code that is unrelated to each other. Speaking of which, our unified building of gkmedias sources is fine so far, but can be better. What if, instead of thinking of unified building solely as a tool for faster from-clobber compiles, we think of it as some sort of mini-WPO in gkmedias? I propose we sacrifice some build parallelism, debuggability, and incremental compile times in exchange for using larger chunks of related code for unified building. This has already been done in commit 3122de9547 from Issue #1862 (UXP), where all of the harfbuzz sources have been compiled into a single chunk. We can probably do this in the other third-party media libraries we have too. Obviously not every library should be compiled as a single chunk, but the idea is to limit the arbitrary number of unified chunks produced so we can optimize more fairly and evenly.
Let's look at libjxl as an example (even though we currently build it de-unified, while libopus has unified sources; I'm not sure what our preferred mode of building is for /media). It's probably a bad idea to build every source there into a single chunk, so let's have more than one. One chunk could be for the JXL decoder, one could be for modular decoding, and another for the decoding of JPEG image data within a JPEG-XL file. It's pretty oversimplified, but hopefully you get the idea.
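Here's a toy Python sketch of the difference between the two chunking strategies. It is not how mozbuild actually implements unified building; I'm assuming sources are taken in sorted order and cut into fixed-size groups (16 files per chunk being the default as I understand it), and the file names below are made up rather than real libjxl paths. `fixed_size_chunks`, `related_chunks`, and `top_dir` are hypothetical helpers.

```python
def fixed_size_chunks(sources, files_per_chunk=16):
    """Cut the sorted source list into arbitrary fixed-size chunks."""
    ordered = sorted(sources)
    return [
        ordered[i:i + files_per_chunk]
        for i in range(0, len(ordered), files_per_chunk)
    ]


def related_chunks(sources, group_of):
    """Put every source that belongs to the same component into one chunk."""
    groups = {}
    for src in sorted(sources):
        groups.setdefault(group_of(src), []).append(src)
    return groups


def top_dir(path):
    """Use the first path component as the (assumed) component name."""
    return path.split("/", 1)[0]


# Made-up file names standing in for decoder, modular, and JPEG-in-JXL code.
SOURCES = [
    "decoder/frame.cc",
    "decoder/group.cc",
    "jpeg/huffman.cc",
    "jpeg/writer.cc",
    "modular/predict.cc",
    "modular/transform.cc",
]

if __name__ == "__main__":
    # A small chunk size so the mixing is visible with only six files.
    for chunk in fixed_size_chunks(SOURCES, files_per_chunk=3):
        print("arbitrary chunk:", chunk)
    for name, chunk in sorted(related_chunks(SOURCES, top_dir).items()):
        print("related chunk (%s):" % name, chunk)
```

With the fixed-size split, a chunk boundary can land in the middle of a component, so the compiler sees half of the JPEG code next to decoder code it has little to do with. Grouping by component keeps each chunk to code that actually calls into each other, which is the whole point of treating unified building as a mini-WPO.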
So that pretty much sums up what I'd like to do with how we handle optimizations in the build system right now. I would really appreciate any thoughts on and criticisms of this post. For all I know I could be talking out of my ass all this time; I don't have the years of experience Moonchild has dealing with code optimizations (it's his forte after all, and is the reason why Pale Moon exists in the first place: because we can do better than Mozilla at optimizing stuff). But hopefully all the research and experimentation for this effort pays off in some way...
