feature request: link time optimization for the linux tarball

brikler
Moongazer
Posts: 14
Joined: Fri, 25 Nov 2016, 18:41

Post by brikler » Thu, 07 Mar 2019, 19:37

salve,

hopefully this is the right section for this purpose :)

is it possible to compile the palemoon binary for the linux tarball with lto?
palemoon should start faster and the binary should be a little smaller :)

i suppose GCC is used, but it should be similar with Clang, and it's really simple to get binaries with lto by adding only two or three flags to CFLAGS/CXXFLAGS/CPPFLAGS

Code:

# <onlineCPU> is a placeholder for the number of parallel LTO jobs
CFLAGS="-flto=<onlineCPU> -fuse-linker-plugin"
CXXFLAGS="-flto=<onlineCPU> -fuse-linker-plugin"
CPPFLAGS="-flto=<onlineCPU> -fuse-linker-plugin"
LDFLAGS="-fuse-ld=gold"

it's better to use ld.gold because it is faster than the default linker

-flto[=n]

This option runs the standard link-time optimizer. When invoked with source code, it generates GIMPLE (one of GCC’s internal representations) and writes it to special ELF sections in the object file. When the object files are linked together, all the function bodies are read from these ELF sections and instantiated as if they had been part of the same translation unit.

To use the link-time optimizer, -flto and optimization options should be specified at compile time and during the final link. It is recommended that you compile all the files participating in the same link with the same options and also specify those options at link time.

-fuse-linker-plugin

Enables the use of a linker plugin during link-time optimization. This option relies on plugin support in the linker, which is available in gold or in GNU ld 2.21 or newer.

This option enables the extraction of object files with GIMPLE bytecode out of library archives. This improves the quality of optimization by exposing more code to the link-time optimizer. This information specifies what symbols can be accessed externally (by non-LTO object or during dynamic linking). Resulting code quality improvements on binaries (and shared libraries that use hidden visibility) are similar to -fwhole-program. See -flto for a description of the effect of this flag and how to use it.

This option is enabled by default when LTO support in GCC is enabled and GCC was configured for use with a linker supporting plugins (GNU ld 2.21 or newer or gold).

source for -flto and -fuse-linker-plugin: https://gcc.gnu.org/onlinedocs/gcc/Opti ... ze-Options
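
A minimal sketch of how the `<onlineCPU>` placeholder could be filled in, with the same flags also passed at link time as the quoted GCC text recommends. The `-O2` level, the variable names `jobs` and `lto_flags`, and the `nproc`/`getconf` fallback are assumptions for illustration; how the Pale Moon build picks up these variables is not covered here.

```shell
# Assumption: count online CPUs with nproc, fall back to getconf, then to 1
jobs=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# The lto flags from the post, with the placeholder filled in
lto_flags="-flto=${jobs} -fuse-linker-plugin"

# Assumption: -O2 as the optimization level, used for compiling and linking alike
export CFLAGS="-O2 ${lto_flags}"
export CXXFLAGS="-O2 ${lto_flags}"
export LDFLAGS="-O2 ${lto_flags} -fuse-ld=gold"

echo "CFLAGS=${CFLAGS}"
echo "LDFLAGS=${LDFLAGS}"
```

Passing the optimization and `-flto` options in `LDFLAGS` as well follows the "specify those options at link time" advice in the quoted text.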

what do you think about?

yami_
Lunatic
Posts: 404
Joined: Thu, 26 Apr 2018, 11:05

Post by yami_ » Thu, 07 Mar 2019, 19:57

cat came back from Berkeley waving flags -- rob pike

Post by brikler » Fri, 08 Mar 2019, 09:19

it's a pity but there is massive performance improvement possible, see pages 31 to 38: http://www.ucw.cz/~hubicka/slides/opensuse2018-e.pdf

Moonchild
Pale Moon guru
Posts: 23008
Joined: Sun, 28 Aug 2011, 17:27
Location: 58°2'16"N 14°58'31"E

Post by Moonchild » Fri, 08 Mar 2019, 09:53

brikler wrote:it's a pity but there is massive performance improvement possible

On a program of the size and complexity of Pale Moon, this generally doesn't apply. There may be -some- improvement but in general the global scoping doesn't help past a certain size of program, and in fact may cause degradation due to the exponential complexity of linking that will push the linkers used to (and sometimes over) their limits.
Improving Mozilla code: You know you're on the right track with code changes when you spend the majority of your time deleting code.

"If you want to build a better world for yourself, you have to be willing to build one for everybody." -- Coyote Osborne

Post by brikler » Fri, 08 Mar 2019, 15:04

Moonchild wrote:
brikler wrote:it's a pity but there is massive performance improvement possible

On a program of the size and complexity of Pale Moon, this generally doesn't apply.


… why does it work with firefox?
have you read the linked pdf file in my previous post? … opensuse has been using lto since 2012 (factory and leap), and they wouldn't if it didn't work as expected

Post by Moonchild » Fri, 08 Mar 2019, 15:46

Yes I looked at the slides. No, using profile-guided optimization when feeding it benchmarks in the profiling stage does not improve the overall performance of the resulting binary, quite the opposite considering benchmarks are not typical use of the program. Doing this on a large and complex program is doing nothing but trading off overall performance for a higher benchmark score. And that is not even taking into account known stability issues with applications built with profile feedback.

Are you familiar with how profiling works? How do you think profiling will make the compiler respond if you run tons of microbenches with tight loops? What will it prioritize for optimization? What will it de-optimize?

I also didn't say "it does not work"; I said "it generally doesn't apply", as in, it will not have an actual result in using the browser in normal circumstances. Sure you can prove PGO works and that the compiler is doing what you tell it to do by measuring what you set out to prove, but outside of that limited scope it will cause a negative effect.
Why do you think the binary size shrinks? I'll tell you why: because functions that would normally use e.g. unrolled loops will become size optimized (and therefore slower) for being considered "cold paths" because your profiling run doesn't actually exercise that code path.
Beyond a certain size/complexity threshold, profiling simply has no use and only serves to reduce optimization, not improve it.

Post by brikler » Fri, 08 Mar 2019, 19:09

so i am not the first to bring up optimization^^
but you are mistaken: i don't mean PGO, i mean link-time optimization (lto) :)

pgo is expensive because it needs two compilations: the first to find the code to optimize and the second to apply those optimizations.
for example, it can eliminate or reduce overhead because the compiler can make the "better" decision

Code:

#pgo can be done at the first compilation with
CFLAGS+=" -fprofile-generate -fprofile-dir=</dir>"
#and in the second compilation with
CFLAGS+=" -fprofile-correction -fprofile-use -fprofile-dir=</dir>"
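
The two passes could be wrapped in a small helper that prints the extra flags for each stage. This is only a sketch: `pgo_flags` and the `/tmp/pgo-data` directory are hypothetical names; the flags themselves are the ones from the post above.

```shell
# Hypothetical helper: print the extra compiler flags for one PGO stage.
#   $1 = stage ("generate" for the first build, "use" for the second)
#   $2 = directory where the profile data is stored
pgo_flags() {
    stage=$1
    dir=$2
    case "$stage" in
        generate) printf '%s\n' "-fprofile-generate -fprofile-dir=${dir}" ;;
        use)      printf '%s\n' "-fprofile-correction -fprofile-use -fprofile-dir=${dir}" ;;
        *)        echo "usage: pgo_flags generate|use <dir>" >&2; return 1 ;;
    esac
}

# First build (instrumented), then second build (optimized with the profile)
pgo_flags generate /tmp/pgo-data
pgo_flags use /tmp/pgo-data
```

In a real build the output would be appended to the existing flags, e.g. `CFLAGS="$CFLAGS $(pgo_flags generate /tmp/pgo-data)"`, and the program would be run between the two builds to collect the profile.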

vannilla
Lunatic
Posts: 344
Joined: Sat, 05 May 2018, 13:29

Post by vannilla » Fri, 08 Mar 2019, 19:34

brikler wrote:so i am not the first to bring up optimization^^
but you are mistaken: i don't mean PGO, i mean link-time optimization (lto) :)

pgo is expensive because it needs two compilations: the first to find the code to optimize and the second to apply those optimizations.
for example, it can eliminate or reduce overhead because the compiler can make the "better" decision

Code:

#pgo can be done at the first compilation with
CFLAGS+=" -fprofile-generate -fprofile-dir=</dir>"
#and in the second compilation with
CFLAGS+=" -fprofile-correction -fprofile-use -fprofile-dir=</dir>"

Link-time optimization causes instabilities (or some other problem) as per the link provided by yami_.
I'm not sure, but if those problems were solved, LTO could probably be used.
About profile-guided optimizations, even if Pale Moon wasn't as complex as it is, the problem with interactive applications is that every code path is potentially "better" than the other, simply because users interact with the application in different ways.
As such, even if developers find out that during their development a certain path can be optimized, it's also true that users can take a different path, essentially cancelling out the optimization, given that those other paths would be less optimized (or not optimized at all.)
Which is what Moonchild said, but it's better to say it again.

Post by Moonchild » Fri, 08 Mar 2019, 20:35

brikler wrote:but you are mistaken: i don't mean PGO, i mean link-time optimization (lto) :)

I know what you meant, but the slides you were pointing me to were about using profiling where the difference was really seen.

I'm all for using lto if the instabilities can be fixed, which is why we have an open github issue for it. I'm likely also going to use /GL with MSVC again once I've had time to do proper stability tests and unified building has been cut down on.

By the way, Mozilla, despite all of this knowledge about profiling, still uses it in Firefox on Windows -- they also use both unified building AND link-time code generation, which is stacking multiple potentially conflicting technologies on top of each other -- I'm relatively sure the only reason it's still somewhat stable is because of a lot of trial and error through automated builds and excluding certain files due to "compiler bugs" (actually build engineering bugs) over the years.

Post by brikler » Sat, 09 Mar 2019, 08:44

Moonchild wrote:I'm all for using lto if the instabilities can be fixed, which is why we have an open github issue for it.


excluding certain files due to "compiler bugs" (actually build engineering bugs) over the years.

i am glad :)

you mean the two gentoo users in this bug report https://github.com/MoonchildProductions/uxp/issues/104 ?
the problem was module ordering by the linker; they used gcc 7, but gcc has since moved on to 8 and ld.gold isn't necessary anymore…

Post by brikler » Mon, 11 Mar 2019, 09:18

Moonchild wrote:Are you familiar with how profiling works? How do you think profiling will make the compiler respond if you run tons of microbenches with tight loops? What will it prioritize for optimization? What will it de-optimize?

do you know the necessary steps for pgo?
1. compile the program
2.1 install and use it
2.2 you need a directory with read and write permission
2.3 while the program is in use it will be profiled to eliminate dead code and optimize code structure
3. recompile it with the new code profile
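
Step 2.2 could be checked with a few lines of shell before the instrumented build is used. A sketch under assumptions: the `profile_dir` path is hypothetical, and the check relies on GCC writing its profile data as `.gcda` files.

```shell
# Hypothetical profile directory (step 2.2: must be readable and writable)
profile_dir="${TMPDIR:-/tmp}/pgo-profile-demo"
mkdir -p "$profile_dir"

if [ -r "$profile_dir" ] && [ -w "$profile_dir" ]; then
    profile_dir_ok=yes
else
    profile_dir_ok=no
fi
echo "profile dir ${profile_dir} usable: ${profile_dir_ok}"

# After the instrumented program has been used (step 2.3), profile data
# should show up as *.gcda files; count them before recompiling (step 3).
gcda_count=$(( $(find "$profile_dir" -name '*.gcda' | wc -l) ))
echo "profile data files found: ${gcda_count}"
```

If the count is still zero after using the instrumented build, the second compilation has no profile to work with.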

i think tight loops will not be touched, but for loop optimization you can use "graphite".
how does it work: https://www.cs.utexas.edu/~pingali/CS38 ... aphite.pdf

Code:

CFLAGS+=" -fgraphite-identity -floop-nest-optimize -ftree-loop-distribution -ftree-vectorize"
CXXFLAGS+=" -fgraphite-identity -floop-nest-optimize -ftree-loop-distribution -ftree-vectorize"

Post by brikler » Fri, 15 Mar 2019, 14:32

i was able to build palemoon with lto and i would say: the bug is fixed with a quick linker change 8-)

Code:

-rw-r--r-- 1 tom tom 36591088 15.03.2019 15:14 palemoon-28.4.0-1-x86_64.pkg.tar.xz
-rw-r--r-- 1 tom tom 41316836 23.02.2019 10:18 palemoon-bin-28.4.0-1-x86_64.pkg.tar.xz
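
The size difference in the listing can be worked out with shell arithmetic. This assumes the first package is the lto build and the second is the regular palemoon-bin package; both are compressed archives, so this compares package sizes rather than the binaries themselves.

```shell
lto_size=36591088   # palemoon-28.4.0-1-x86_64.pkg.tar.xz (assumed: built with lto)
bin_size=41316836   # palemoon-bin-28.4.0-1-x86_64.pkg.tar.xz

saved=$((bin_size - lto_size))

# Integer arithmetic only: work in tenths of a percent
tenths=$((saved * 1000 / bin_size))

echo "saved ${saved} bytes (~$((tenths / 10)).$((tenths % 10))%)"
# prints: saved 4725748 bytes (~11.4%)
```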

ld.gold failed, but this isn't a problem because it's possible to change the linker to ld.bfd and restart the compilation, and nothing is lost… ld.bfd is probably more robust than ld.gold in general.
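
The linker change described above could be scripted as a simple fallback. A rough sketch: `try_linkers` is a hypothetical helper, the build command is whatever is passed to it, and `true` below is only a stand-in so the example runs on its own.

```shell
# Hypothetical helper: run the given build command with ld.gold first and
# fall back to ld.bfd if that attempt fails.
try_linkers() {
    for linker in gold bfd; do
        if LDFLAGS="-fuse-ld=${linker}" "$@"; then
            echo "build succeeded with ld.${linker}"
            return 0
        fi
        echo "ld.${linker} failed, trying the next linker" >&2
    done
    return 1
}

# Stand-in for the real build command; it always succeeds, so gold is reported
try_linkers true
# prints: build succeeded with ld.gold
```

With a real build, the command after `try_linkers` would be the actual build invocation, and the second attempt corresponds to the manual restart with ld.bfd.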

Code:

[tom@frija palemoon-bin]$ pacman -Q gcc
gcc 8.2.0-2

