feature request: link time optimization for the linux tarball

brikler


Post by brikler » 2019-03-07, 19:37

salve,

hopefully this is the right section for this :)

is it possible to compile the palemoon binary for the linux tarball with lto (link-time optimization)?
palemoon will start faster and the binary will be a little smaller :)

i suppose GCC is used, but it should be similar with Clang, and it's really simple to get binaries with lto by adding only two or three options to CFLAGS/CXXFLAGS/CPPFLAGS

Code: Select all

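# <onlineCPU> is a placeholder for the number of parallel LTO jobs (e.g. the number of online CPUs)
CFLAGS="-flto=<onlineCPU> -fuse-linker-plugin"
CXXFLAGS="-flto=<onlineCPU> -fuse-linker-plugin"
CPPFLAGS="-flto=<onlineCPU> -fuse-linker-plugin"
# -flto has to be given at link time as well
LDFLAGS="-flto=<onlineCPU> -fuse-linker-plugin -fuse-ld=gold"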
better to use ld.gold because it is faster than the default linker
-flto[=n]

This option runs the standard link-time optimizer. When invoked with source code, it generates GIMPLE (one of GCC’s internal representations) and writes it to special ELF sections in the object file. When the object files are linked together, all the function bodies are read from these ELF sections and instantiated as if they had been part of the same translation unit.

To use the link-time optimizer, -flto and optimization options should be specified at compile time and during the final link. It is recommended that you compile all the files participating in the same link with the same options and also specify those options at link time.
-fuse-linker-plugin

Enables the use of a linker plugin during link-time optimization. This option relies on plugin support in the linker, which is available in gold or in GNU ld 2.21 or newer.

This option enables the extraction of object files with GIMPLE bytecode out of library archives. This improves the quality of optimization by exposing more code to the link-time optimizer. This information specifies what symbols can be accessed externally (by non-LTO object or during dynamic linking). Resulting code quality improvements on binaries (and shared libraries that use hidden visibility) are similar to -fwhole-program. See -flto for a description of the effect of this flag and how to use it.

This option is enabled by default when LTO support in GCC is enabled and GCC was configured for use with a linker supporting plugins (GNU ld 2.21 or newer or gold).
source for -flto and -fuse-linker-plugin: https://gcc.gnu.org/onlinedocs/gcc/Opti ... ze-Options
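
A minimal sketch of how the flags above could be set up, assuming GCC and a POSIX shell. Because the quoted documentation says -flto and the optimization options have to be present at link time too, this variant also puts them into LDFLAGS. The `-O2` level and the variable names `jobs` and `lto_flags` are assumptions for illustration, not taken from the Pale Moon build configuration.

```shell
# Number of parallel LTO jobs: use the online CPU count, fall back to 1.
jobs="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"

# Same optimization and LTO options for compiling and linking (assumed -O2).
lto_flags="-O2 -flto=${jobs} -fuse-linker-plugin"

export CFLAGS="${lto_flags}"
export CXXFLAGS="${lto_flags}"
# The final link needs the LTO options as well; gold is selected here.
export LDFLAGS="${lto_flags} -fuse-ld=gold"

echo "CFLAGS=${CFLAGS}"
echo "CXXFLAGS=${CXXFLAGS}"
echo "LDFLAGS=${LDFLAGS}"
```

How these variables reach the compiler depends on the build system, so treat this as a starting point only.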

what do you think about it?


brikler

Re: feature request: link time optimization for the linux tarball

Unread post by brikler » 2019-03-08, 09:19

it's a pity but there is massive performance improvement possible, see pages 31 to 38: http://www.ucw.cz/~hubicka/slides/opensuse2018-e.pdf

Moonchild
Pale Moon guru
Posts: 35627
Joined: 2011-08-28, 17:27
Location: Motala, SE

Re: feature request: link time optimization for the linux tarball

Post by Moonchild » 2019-03-08, 09:53

brikler wrote:it's a pity but there is massive performance improvement possible
On a program of the size and complexity of Pale Moon, this generally doesn't apply. There may be -some- improvement but in general the global scoping doesn't help past a certain size of program, and in fact may cause degradation due to the exponential complexity of linking that will push the linkers used to (and sometimes over) their limits.
"Sometimes, the best way to get what you want is to be a good person." -- Louis Rossmann
"Seek wisdom, not knowledge. Knowledge is of the past; wisdom is of the future." -- Native American proverb
"Linux makes everything difficult." -- Lyceus Anubite

brikler

Re: feature request: link time optimization for the linux tarball

Post by brikler » 2019-03-08, 15:04

Moonchild wrote:
brikler wrote:it's a pity but there is massive performance improvement possible
On a program of the size and complexity of Pale Moon, this generally doesn't apply.
… why does it work with firefox?
have you read the linked pdf file in my previous post? … opensuse has been using lto since 2012 (factory and leap) and they wouldn't if it didn't work as expected

Moonchild

Re: feature request: link time optimization for the linux tarball

Post by Moonchild » 2019-03-08, 15:46

Yes I looked at the slides. No, using profile-guided optimization when feeding it benchmarks in the profiling stage does not improve the overall performance of the resulting binary, quite the opposite considering benchmarks are not typical use of the program. Doing this on a large and complex program is doing nothing but trading off overall performance for a higher benchmark score. And that is not even taking into account known stability issues with applications built with profile feedback.

Are you familiar with how profiling works? How do you think profiling will make the compiler respond if you run tons of microbenches with tight loops? What will it prioritize for optimization? What will it de-optimize?

I also didn't say "it does not work"; I said "it generally doesn't apply", as in, it will not have an actual result in using the browser in normal circumstances. Sure you can prove PGO works and that the compiler is doing what you tell it to do by measuring what you set out to prove, but outside of that limited scope it will cause a negative effect.
Why do you think the binary size shrinks? I'll tell you why: because functions that would normally use e.g. unrolled loops will become size optimized (and therefore slower) for being considered "cold paths" because your profiling run doesn't actually exercise that code path.
Beyond a certain size/complexity threshold, profiling simply has no use and only serves to reduce optimization, not improve it.

brikler

Re: feature request: link time optimization for the linux tarball

Post by brikler » 2019-03-08, 19:09

so i am not the first to bring up optimization^^
but you are mistaken, i don't mean PGO, i mean link-time optimization (lto) :)

pgo is expensive because it needs two compilations: the first to find the code to optimize and the second to apply these optimizations.
for example, the optimization can eliminate or reduce overhead because the compiler can make the "better" decision

Code: Select all

# first compilation: instrument the build; profile data is written to </dir>
CFLAGS+=" -fprofile-generate -fprofile-dir=</dir>"
# second compilation: use the profile data collected in </dir>
CFLAGS+=" -fprofile-correction -fprofile-use -fprofile-dir=</dir>"

vannilla
Moon Magic practitioner
Posts: 2193
Joined: 2018-05-05, 13:29

Re: feature request: link time optimization for the linux tarball

Post by vannilla » 2019-03-08, 19:34

brikler wrote:so i am not the first to bring up optimization^^
but you are mistaken, i don't mean PGO, i mean link-time optimization (lto) :)

pgo is expensive because it needs two compilations: the first to find the code to optimize and the second to apply these optimizations.
for example, the optimization can eliminate or reduce overhead because the compiler can make the "better" decision

Code: Select all

# first compilation: instrument the build; profile data is written to </dir>
CFLAGS+=" -fprofile-generate -fprofile-dir=</dir>"
# second compilation: use the profile data collected in </dir>
CFLAGS+=" -fprofile-correction -fprofile-use -fprofile-dir=</dir>"
Link-time optimization causes instabilities (or some other problem) as per the link provided by yami_.
I'm not sure, but if those problems were solved, LTO could probably be used.
About profile-guided optimizations, even if Pale Moon wasn't as complex as it is, the problem with interactive applications is that every code path is potentially "better" than the other, simply because users interact with the application in different ways.
As such, even if developers find out that during their development a certain path can be optimized, it's also true that users can take a different path, essentially cancelling out the optimization, given that those other paths would be less optimized (or not optimized at all.)
Which is what Moonchild said, but it's better to say it again.

Moonchild

Re: feature request: link time optimization for the linux tarball

Post by Moonchild » 2019-03-08, 20:35

brikler wrote:but you are mistaken, i don't mean PGO, i mean link-time optimization (lto) :)
I know what you meant, but the slides you were pointing me to were about using profiling where the difference was really seen.

I'm all for using lto if the instabilities can be fixed, which is why we have an open github issue for it. I'm likely also going to use /GL with MSVC again once I've had time to do proper stability tests and unified building has been cut down on.

By the way, Mozilla, despite all of this knowledge about profiling, still uses it in Firefox on Windows -- they also use both unified building AND link-time code generation, which is stacking multiple potentially conflicting technologies on top of each other -- I'm relatively sure the only reason it's still somewhat stable is because of a lot of trial and error through automated builds and excluding certain files due to "compiler bugs" (actually build engineering bugs) over the years.

brikler

Re: feature request: link time optimization for the linux tarball

Post by brikler » 2019-03-09, 08:44

Moonchild wrote: I'm all for using lto if the instabilities can be fixed, which is why we have an open github issue for it.


excluding certain files due to "compiler bugs" (actually build engineering bugs) over the years.
i am glad :)

you mean the two gentoo users in this bug report https://github.com/MoonchildProductions/uxp/issues/104 ?
the problem was module ordering by the linker; they used gcc 7, but gcc has since moved on to 8 and ld.gold isn't necessary anymore…

brikler

Re: feature request: link time optimization for the linux tarball

Post by brikler » 2019-03-11, 09:18

Moonchild wrote: Are you familiar with how profiling works? How do you think profiling will make the compiler respond if you run tons of microbenches with tight loops? What will it prioritize for optimization? What will it de-optimize?
do you know the necessary steps for pgo?
1. compile the program
2.1 install and use it
2.2 you need a directory with read and write permission
2.3 while the program is in use it will be profiled to eliminate dead code and optimize code structure
3. recompile it with the new code profile
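
The steps above can be sketched roughly like this, assuming GCC. `PROFILE_DIR` and `pgo_flags` are hypothetical names used only for illustration; the actual build command is left out because it depends on the build system.

```shell
# Step 2.2: a directory the instrumented program can read and write.
PROFILE_DIR="${PROFILE_DIR:-$(mktemp -d)}"

# Print the compiler flags for one of the two PGO phases.
pgo_flags() {
    case "$1" in
        generate)
            # Steps 1 and 2: instrumented build that records a profile at run time.
            echo "-O2 -fprofile-generate -fprofile-dir=${PROFILE_DIR}"
            ;;
        use)
            # Step 3: rebuild using the profile recorded in PROFILE_DIR.
            echo "-O2 -fprofile-use -fprofile-correction -fprofile-dir=${PROFILE_DIR}"
            ;;
        *)
            echo "usage: pgo_flags generate|use" >&2
            return 1
            ;;
    esac
}

echo "first build:  CFLAGS=\"$(pgo_flags generate)\""
echo "second build: CFLAGS=\"$(pgo_flags use)\""
```

Between the two builds the instrumented program has to be run, and whatever is exercised during that run is what ends up in the profile.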

i think tight loops will not be touched, but for loop optimization you can use "graphite".
how it works: https://www.cs.utexas.edu/~pingali/CS38 ... aphite.pdf

Code: Select all

CFLAGS+=" -fgraphite-identity -floop-nest-optimize -ftree-loop-distribution -ftree-vectorize"
CXXFLAGS+=" -fgraphite-identity -floop-nest-optimize -ftree-loop-distribution -ftree-vectorize"
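
The Graphite options only work when GCC itself was built with isl support, so it may be worth checking before adding them. A small sketch, assuming GCC; `supports_flags` is a hypothetical helper, not part of any build system.

```shell
# Exit status: 0 = flags accepted, 127 = compiler not found,
# any other value = the compiler rejected the flags.
supports_flags() {
    command -v "${CC:-gcc}" >/dev/null 2>&1 || return 127
    # Compile a trivial C program from stdin and throw the assembly away.
    echo 'int main(void) { return 0; }' |
        "${CC:-gcc}" "$@" -x c -S - -o /dev/null 2>/dev/null
}

GRAPHITE_FLAGS="-fgraphite-identity -floop-nest-optimize -ftree-loop-distribution -ftree-vectorize"

# Word splitting of $GRAPHITE_FLAGS is intentional here.
if supports_flags -O2 $GRAPHITE_FLAGS; then
    echo "graphite flags accepted by ${CC:-gcc}"
else
    echo "graphite flags not usable with ${CC:-gcc}"
fi
```

Other compilers may only warn about unknown optimization flags instead of failing, so the result is meaningful for GCC only.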

brikler

Re: feature request: link time optimization for the linux tarball

Post by brikler » 2019-03-15, 14:32

i was able to build palemoon with lto and i would say: the bug is fixed with a quick linker change 8-)

Code: Select all

-rw-r--r-- 1 tom tom 36591088 15.03.2019 15:14 palemoon-28.4.0-1-x86_64.pkg.tar.xz
-rw-r--r-- 1 tom tom 41316836 23.02.2019 10:18 palemoon-bin-28.4.0-1-x86_64.pkg.tar.xz
ld.gold failed, but this isn't a problem because it's possible to change the linker to ld.bfd and restart the compilation, and nothing is lost… ld.bfd is probably more robust than ld.gold in general.
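
The linker switch could be automated along these lines. This is only a sketch: `link_with_fallback` is a hypothetical helper, and the object file names in the usage comment are made up.

```shell
# Run the given link command once per linker, appending -fuse-ld=<name>,
# and print the name of the first linker that succeeded.
link_with_fallback() {
    for linker in gold bfd; do
        # Send the command's own output to stderr so that stdout
        # carries nothing but the linker name.
        if "$@" "-fuse-ld=${linker}" >&2; then
            echo "${linker}"
            return 0
        fi
        echo "link with ld.${linker} failed, trying the next linker" >&2
    done
    return 1
}

# Usage with hypothetical object files:
#   link_with_fallback gcc -O2 -flto main.o util.o -o app
```

Only the final link is repeated, so the object files that were already compiled are reused.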

Code: Select all

[tom@frija palemoon-bin]$ pacman -Q gcc
gcc 8.2.0-2
