Build flakiness, possibly related to high parallelism?

OPNA2608
Hobby Astronomer
Posts: 24
Joined: 2019-09-27, 09:30

Post by OPNA2608 » 2022-06-12, 14:07

On the NixOS community builders (which handle our pull request tests), Pale Moon usually struggles to compile. Most of the time it errors out very early with something like:

0:34.78 Traceback (most recent call last):
0:34.78   File "/build/source/platform/config/pythonpath.py", line 56, in <module>
0:34.78     main(sys.argv[1:])
0:34.78   File "/build/source/platform/config/pythonpath.py", line 48, in main
0:34.79     execfile(script, frozenglobals)
0:34.79   File "/build/source/platform/python/mozbuild/mozbuild/action/xpidl-process.py", line 20, in <module>
0:34.79     from xpidl.typelib import write_typelib
0:34.79   File "/build/source/platform/xpcom/idl-parser/xpidl/typelib.py", line 13, in <module>
0:34.79     import xpt
0:34.79 EOFError: EOF read where object expected
0:34.79 make[6]: *** [Makefile:45: ../../../dist/bin/components/cookie.xpt] Error 1
An example of a full log can be found here: https://logs.nix.ci/?attempt_id=6ebbd3b ... kgs.177295

Pull request reviewers also run into this error sometimes. Looking through the build requirements, I don't think this could be due to a lack of free memory since it's about an on-disk file?
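
For what it's worth, the `EOFError` above is raised on an `import`, and Python raises that error type when it unmarshals incomplete data. One plausible reading (an assumption on my part, not something the log confirms) is that one process imported a cached bytecode file while another was still writing it. A minimal sketch of the error itself, using only the standard `marshal` module and a hypothetical helper `try_load`:

```python
import marshal


def try_load(data: bytes) -> str:
    """Return 'ok' if data unmarshals cleanly, else the exception class name."""
    try:
        marshal.loads(data)
        return "ok"
    except (EOFError, ValueError, TypeError) as exc:
        return type(exc).__name__


# Marshal a small code object, similar in kind to what a bytecode cache holds.
complete = marshal.dumps(compile("x = 1", "<demo>", "exec"))

print(try_load(complete))  # complete data loads fine
print(try_load(b""))       # empty data, like a file that was created but not yet written
```

This only illustrates the failure mode; it does not show that the build system actually hits it.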

I do not know what other reviewers are running their tests with, but our community builder tries to power through builds with a very high degree of parallelism: MOZ_PARALLEL_BUILD=64. I suspect this could be the problem, since the highest I've ever tested Pale Moon builds with is 36-40, though not extensively.

Do you have a clue what could be going on here? If it's about the parallelism count, do you think this is a problem that could be fixed? And if not, do you have a maximum known-working thread count I could limit the build script to as a workaround?

Moonchild
Pale Moon guru
Posts: 35403
Joined: 2011-08-28, 17:27
Location: Motala, SE
Contact:

Post by Moonchild » 2022-06-12, 15:23

Parallelism shouldn't be an issue, but I have never tested with an excessive number of build threads like that; the most I ever build with is 24 jobs (releases are built with 20). I think it's simply not within the scope of the python/configure part of our build system. The build system does use temporary files for configuration.
Try powering through a little less excessively (e.g. 32 parallel jobs) so python configure has a chance to write out the temporary configuration files before they are used.
Another potential issue is the I/O load from many parallel tasks, especially early in the build process, when the build system reads and writes many files. If you want to run many parallel tasks, you must have very fast I/O as well.
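
As a rough sketch of that workaround, a wrapper script could clamp the job count before it sets MOZ_PARALLEL_BUILD. The helper name `capped_jobs` and the default cap of 32 are illustrative assumptions taken from the suggestion above, not part of the build system:

```python
import os


def capped_jobs(available: int, cap: int = 32) -> int:
    """Clamp the requested job count to the range [1, cap]."""
    return max(1, min(available, cap))


if __name__ == "__main__":
    # os.cpu_count() may return None, so fall back to a single job.
    available = os.cpu_count() or 1
    # Print the setting for a wrapper script to pick up.
    print(f"MOZ_PARALLEL_BUILD={capped_jobs(available)}")
```

On a 64-thread builder this would yield 32, while smaller machines keep their own count.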

athenian200
Contributing developer
Posts: 1481
Joined: 2018-10-28, 19:56
Location: Georgia

Post by athenian200 » 2022-06-12, 16:00

I wonder if this could cause problems? I remember running into errors like this when I tried to compile Pale Moon on a SPARC server. I kept getting a random EOF error no matter how many other things I fixed. But I remember I had an insane number of threads... 8 cores and 8 threads per core, which meant something like 64 threads. I never thought it could be a bad thing to use the processor to its fullest potential, or that there was such a thing as too much parallelism.

I don't know if limiting the number of threads will help, but I think it should work with 32 threads. Some people on Ryzen processors have 16 cores and 32 threads, and they seem to have no issues compiling the browser in record time. But 64 might be too much.

Nuck-TH
Project Contributor
Posts: 195
Joined: 2020-03-02, 16:04

Post by Nuck-TH » 2022-06-12, 18:42

Just to add a note: my experimentation shows that cores are fully utilized while compiling, so setting jobs to the number of SMT "cores" on my Ryzen 5600X doesn't yield any benefit; it just marginally increases build time. The fastest time is when I set -j6, which equals the number of physical cores.
Cores+1 jobs may be beneficial if I/O is not fast enough (that applies when the core count is low; a high core count needs fast I/O), but again, I have a fast SSD, so it doesn't yield any significant change for me.
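
That rule of thumb can be written down as a small calculation. The function `suggested_jobs` and its parameters are hypothetical names for illustration; the threads-per-core value has to be supplied by hand, since the Python standard library only reports logical CPUs:

```python
def suggested_jobs(logical_cpus: int, threads_per_core: int = 2,
                   slow_io: bool = False) -> int:
    """Jobs = physical cores, plus one extra job when I/O is the bottleneck."""
    physical = max(1, logical_cpus // threads_per_core)
    return physical + 1 if slow_io else physical


# Ryzen 5600X: 12 logical CPUs, 2 threads per core.
print(suggested_jobs(12))                # 6
print(suggested_jobs(12, slow_io=True))  # 7
```

Whether the extra job helps depends on the machine, so it is worth timing both settings.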
