MSVC vs. ICC & Waterfox

dark_moon

Re: MSVC vs. ICC & Waterfox

Unread post by dark_moon » 2012-06-24, 14:14

Hmm, I then tested the optimised x64 build against the x86 one.

Currently the x86 build wins against the x64 one, even with all plugins disabled (on a fresh profile, without any add-ons):
x86
Sunspider: 252.0ms +/- 6.6%
Kraken: 3732.9ms +/- 1.6%
V8: 6296

x64
Sunspider: 237.3ms +/- 0.8%
Kraken: 4162.8ms +/- 0.6%
V8: 5383

Moonchild
Pale Moon guru
Posts: 35650
Joined: 2011-08-28, 17:27
Location: Motala, SE

Unread post by Moonchild » 2012-06-24, 15:03

Like I said, before I would consider publishing it, I'd have to test if it's worth my while.

ad edit1: MSVC ;) with some code optimizations
ad edit2: Mostly, yes, but it'd be more akin to avoiding some Intel-specific optimizations than to spending my time coding assembler.
"Sometimes, the best way to get what you want is to be a good person." -- Louis Rossmann
"Seek wisdom, not knowledge. Knowledge is of the past; wisdom is of the future." -- Native American proverb
"Linux makes everything difficult." -- Lyceus Anubite


Unread post by Moonchild » 2012-06-24, 15:08

dark_moon wrote:Hmm, I then tested the optimised x64 build against the x86 one.

Currently the x86 build wins against the x64 one, even with all plugins disabled (on a fresh profile, without any add-ons):
x86
Sunspider: 252.0ms +/- 6.6%
Kraken: 3732.9ms +/- 1.6%
V8: 6296

x64
Sunspider: 237.3ms +/- 0.8%
Kraken: 4162.8ms +/- 0.6%
V8: 5383
Pure-JS benchmarks will give you the wrong picture. x64 has an overhead in variable sizes that hurts the results in these tests. In addition, these are tight loops. You can't compare the two architectures this way, especially not with SunSpider, Kraken and V8, which only test the JavaScript JIT compiler and not much else.
Read: http://forum.palemoon.org/viewtopic.php ... 650&p=3004
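As a rough illustration of the "overhead in variable sizes" point: pointers and pointer-sized integers double in width when moving from a 32-bit to a 64-bit build, so pointer-heavy data takes more memory and cache. The sketch below is only an assumption about one source of that overhead; `Node` and `footprint_bytes` are hypothetical names and not taken from any browser code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical pointer-heavy record, loosely like a node in an object graph.
struct Node {
    Node* next;
    Node* prev;
    void* payload;
};

// Bytes needed for `count` nodes on the architecture this is compiled for.
std::size_t footprint_bytes(std::size_t count) {
    assert(sizeof(Node) >= 3 * sizeof(void*));  // three pointers, plus any padding
    return count * sizeof(Node);
}

// Prints the sizes so a 32-bit and a 64-bit build can be compared side by side.
void print_sizes() {
    std::printf("sizeof(void*) = %zu, sizeof(Node) = %zu\n",
                sizeof(void*), sizeof(Node));
}
```

Compiling this once as a 32-bit and once as a 64-bit program and calling `print_sizes()` shows the difference; on common ABIs the 64-bit `Node` is typically twice as large.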

dark_moon

Unread post by dark_moon » 2012-06-24, 15:34

Oh, I didn't see this FAQ. Thanks for that.

So the only way to test whether one browser is faster than another is to check the rendering time of a website?


Unread post by Moonchild » 2012-06-24, 20:08

dark_moon wrote:Oh, I didn't see this FAQ. Thanks for that.

So the only way to test whether one browser is faster than another is to check the rendering time of a website?
No...
An overall browser test would test all aspects. Rendering time of a website is also hard to measure and results can be biased depending on a lot of factors - especially on how you measure the time.

Benchmarking a browser is simply very difficult to do properly, especially if you want to compare different architectures. This is also why I test against Firefox and nothing else on my website: it is the closest in the way it handles code, so the results are as comparable as they can get. Besides, it's the most interesting comparison for Pale Moon.
One could use overall page load/drawing times as a measurement, but you'd have to come up with representative pages for real-world browsing. That is a challenge all in itself.
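To make the "how you measure the time" point concrete, here is a minimal timing sketch, assuming the thing being measured can be wrapped in a callable. It repeats the workload and reports the median, which is less sensitive to a single slow run than the mean. `time_runs` and `median` are hypothetical helper names, and this says nothing about how to pick representative pages.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Median of a non-empty sample set.
double median(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t mid = samples.size() / 2;
    return samples.size() % 2 ? samples[mid]
                              : (samples[mid - 1] + samples[mid]) / 2.0;
}

// Runs `workload` `runs` times; returns each wall-clock duration in milliseconds.
std::vector<double> time_runs(const std::function<void()>& workload, int runs) {
    std::vector<double> ms;
    for (int i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        workload();
        const auto stop = std::chrono::steady_clock::now();
        ms.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return ms;
}
```

A caller would pass something like a page-load routine to `time_runs` and compare `median(...)` across builds, keeping the machine, profile and page set identical between runs.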

stravinsky

Unread post by stravinsky » 2012-06-25, 06:46

How about the top 40 Alexa sites, like Tom's Hardware uses for their "Web Browser Grand Prix"?


Unread post by Moonchild » 2012-06-25, 08:16

stravinsky wrote:How about the top 40 Alexa sites, like Tom's Hardware uses for their "Web Browser Grand Prix"?
The "highest ranked" websites aren't necessarily a good cross-section of the different types of website you find. If you want to do it based on Alexa ranking, then you'd need to use a lot more sites than just 40.

aphanic

Unread post by aphanic » 2012-08-20, 21:12

Hi, I'd like to add some things to this thread as I think they're relevant.

The main thing with Intel's compiler is which processor/instruction set optimization switches are used:
· /arch:target : This works the same as with Microsoft's compilers. It sets the minimum instruction set the processor must support and doesn't add a dispatcher for non-Intel processors. For example, /arch:SSE3 allows generation of SSE3 instructions, which are then used on non-Intel processors too.
· /Qaxtarget : This adds an alternate path of execution targeting Intel processors. The executable contains a dispatcher that selects each path (it's possible to add more than one) and falls back to a default path on non-Intel processors. For example, /QaxSSE4.1 would target the 45nm Core2 line, I think (up to SSE4.1 support anyway), while non-Intel processors would execute the unoptimized default path even if they support SSE4.1 and the underlying instruction sets. Useful for Intel processors, but not at all for other brands. This switch can be combined with the previous one to allow other instruction sets for the default path too.
· /Qxtarget : This makes the compiler generate code for a specific Intel processor line; the executables won't run at all on non-Intel CPUs. Useful when compiling for one line of Intel processors or when targeting the current machine (/QxHOST would do as well: on Intel processors it is equivalent to setting the /Qx switch, and on non-Intel ones the /arch: one).

This applies to the code you write yourself, so if Intel compilers are to be used to produce a product's executables, I'd say stay clear of /Qax and /Qx and use only /arch, to avoid adding a default non-optimized path to the executable.
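The difference between dispatching on CPU brand and dispatching on supported instruction sets can be sketched as follows. This is a simplified model, not Intel's actual dispatcher: the `CpuInfo` struct stands in for real CPUID queries, and all names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Stand-in for what CPUID would report (hypothetical, simplified).
struct CpuInfo {
    std::string vendor;  // e.g. "GenuineIntel", "AuthenticAMD"
    bool sse2;
    bool sse41;
    bool avx;
};

enum class Path { Baseline, SSE2, SSE41, AVX };

// Best code path the CPU can actually run, judged by features alone.
Path best_supported(const CpuInfo& cpu) {
    assert(!cpu.vendor.empty());  // a vendor string is always expected
    if (cpu.avx) return Path::AVX;
    if (cpu.sse41) return Path::SSE41;
    if (cpu.sse2) return Path::SSE2;
    return Path::Baseline;
}

// Brand check first: non-Intel CPUs get the default path regardless of features.
Path vendor_first_dispatch(const CpuInfo& cpu) {
    if (cpu.vendor != "GenuineIntel") return Path::Baseline;
    return best_supported(cpu);
}

// Feature check only: any CPU gets the best path it supports.
Path feature_first_dispatch(const CpuInfo& cpu) {
    return best_supported(cpu);
}
```

In this model an AMD processor with SSE4.1 ends up on `Path::Baseline` under `vendor_first_dispatch` but on `Path::SSE41` under `feature_first_dispatch`, which is the behaviour gap described for /Qax.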

Now, when using and linking to Intel libraries such as MKL (the math one), SVML (short vector math), etc., things are different, because they have versioned functions depending on the processor they run on and the instruction sets available. So even with only /arch:sth, I think the dispatcher gets added anyway, to tell Intel from non-Intel processors first and particular instruction sets second. But that's where the fun is: the short vector library, for example, is very efficient, and making use of XMM registers (SSE2) and YMM registers (AVX) in its computations really makes a difference.

There are ways of overriding this dispatcher, for example calling the optimized functions directly (but that's tricky and would be a pain in the ass in a big program). Another is to replace the dispatcher function with one of our own (i.e. writing a function with the same name), but for this to work the program has to be statically linked against the libraries (/MT switch in Microsoft and Intel compilers under Windows): our function then takes precedence over the function of the same name in the library.

Pale Moon is not statically linked against the CRT, and doing so would make the executable bigger, even though those DLLs (msvcr100.dll and msvcp100.dll) are supplied with the installation anyway. In my experience, Intel compilers tend to generate larger executables (but also faster ones). I also don't know whether /MT would be suitable for this code base anyway; I just wanted to point out some things about the Intel compiler that relate to the discussion.

Agner Fog has very good material on optimizing C++ and many other advanced topics that is definitely worth reading. Page 132 of Optimizing software in C++ has some insight into the Intel compiler's dispatcher, as well as ready-to-use code to override the dispatchers. This thread in his blog is also very interesting, with some information on what kind of code some Intel compilers used to generate for non-Intel machines, and benchmarks too.

Anyway, to finish, because this got kind of big: both AMD and Intel have guides on which switches to use to optimize executables (in case you end up deciding on 2 separate builds): AMD (for various compilers including Intel's), Intel (for its own, describing some things).

Kind regards.


Unread post by Moonchild » 2012-08-20, 21:56

Thanks for taking the time to write, but the base problem with ICC remains that it penalizes any architecture that is not "GenuineIntel", even if it supports /arch:SSE2, for example. In addition, there are plenty of reports that the Mozilla code base does not provide stable binaries when compiled with ICC.

Quoting from the very article you linked:
Unfortunately, the CPU detection mechanism in Intel compilers has several flaws:

• The best possible version of the code is chosen only when running on an Intel processor. The CPU dispatcher checks whether the processor is an Intel before it checks which instruction set it supports. An inferior version of the code is selected if the processor is not an Intel, even if the processor is compatible with a better version of the code. This can lead to a dramatic degradation of performance on AMD and VIA processors.

• Explicit CPU dispatching works only with Intel processors. A non-Intel processor makes the dispatcher signal an error simply by performing an illegal operation that crashes the program.
Also:
If a dispatched function calls another dispatched function then the dispatch branch of the latter is executed even though the CPU-type is already known at this place
...which is extremely inefficient, considering the Mozilla code base uses an enormous amount of nested calls. Inlining isn't possible because many calls are diverted to shared modules. Adding compiler directives in-code for explicit dispatching (which would be the only alternative) will fail because of the dispatcher not accepting non-Intel (point 2 in the first quote).
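The cost of nested dispatching can be modelled with a counter: every call through a dispatched function repeats the CPU check, whereas resolving the target once and reusing it does not. This is only a sketch of the general idea with hypothetical names; it does not reproduce the code an Intel compiler actually emits.

```cpp
#include <cassert>

int dispatch_checks = 0;  // counts how often the "which CPU?" branch runs

// Stand-in for a CPU detection branch (hypothetical; always picks the fast path).
bool cpu_has_fast_path() {
    ++dispatch_checks;
    return true;
}

int square_generic(int x) { return x * x; }
int square_fast(int x) { return x * x; }  // would use wider instructions

// Dispatched callee: repeats the check on every single call.
int square_dispatched(int x) {
    return cpu_has_fast_path() ? square_fast(x) : square_generic(x);
}

// Caller that goes through the dispatched callee each iteration.
int sum_squares_nested(int n) {
    assert(n >= 0);
    int total = 0;
    for (int i = 1; i <= n; ++i) total += square_dispatched(i);
    return total;
}

// Caller that resolves the target once, then calls it directly.
int sum_squares_resolved(int n) {
    assert(n >= 0);
    int (*square)(int) = cpu_has_fast_path() ? square_fast : square_generic;
    int total = 0;
    for (int i = 1; i <= n; ++i) total += square(i);
    return total;
}
```

With `n` calls, `sum_squares_nested` runs the check `n` times and `sum_squares_resolved` runs it once. In a code base with deep call chains across shared modules, the first pattern is the one that adds up.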

Aside from that, creating 2 separate builds for Intel and AMD using 2 different compilers that need code patches is not part of Pale Moon's goals, and would at most be accepted as a contributed build. I also have to see the practical result of such a specialized separation in terms of noticeable speed for average users to see if it's even worth considering.

You're also mistaken when you say that Pale Moon isn't linked against the MS CRT. It is and has been ever since the switch to VS2010 was made - this is because a custom CRT was no longer possible (Microsoft no longer supplies the necessary components to build your own CRT, unlike in VS2005). Removing the CRT files supplied with the browser will prevent the browser from starting (unless you have other copies installed on your system, e.g. in %windir%\system32) or may cause issues if there are version differences between the supplied DLL files and the ones present on your system.

True static linking, by the way, has not been properly supported for a long time in Firefox, and at the very least prevents packaging; potentially overriding the dispatcher code with your own is therefore not possible.

aphanic

Unread post by aphanic » 2012-08-20, 23:43

Yes, the main problem with the compiler is the dispatchers; if only they checked for supported instruction sets and not the CPU brand... For small projects, statically linking and bypassing the dispatcher function so other processors benefit from the optimized paths as well may be enough, but it would still rely on a hack that makes things harder to maintain. And according to Agner, the only dispatcher they have that wouldn't require "patching" is the one from the IPP library, so even using /arch alone wouldn't suffice because of the independent dispatchers in their libraries (MKL and co.).

I wrote up the info because I've worked with the compiler, to give some insight into how it behaves, more or less (since people seem to talk about Waterfox using it). For example, enabling /Qipo (the Intel compiler's equivalent of LTCG across all modules) is very good but _really_ memory intensive, and knowing that Firefox is already hitting the limit in VS2010, I can only imagine how hard it would be to use it in a build with the Intel compiler. Personally, I'd say stick with Microsoft's compilers and don't even do separate versions: it'd be quite a lot of work to maintain 2 separate versions, and even more if they use different compilers and/or libraries (plus the work of setting up a build environment for it). With SSE2 already enabled and the builds being PGO builds, the increase in performance is substantial enough.

I noticed Pale Moon is linked to Microsoft's CRT (hence the libraries being shipped); I said it's not statically linked to it, but dynamically. And that's actually better if you ask me. I mean, one can name some pros of statically linking against a C runtime, but in case of security issues with it there's no way for the program to use an updated runtime without recompiling. Also, a statically linked program can crash if code in one DLL calls free() on a pointer allocated by malloc() in another DLL, for example, so programs that work perfectly when dynamically linked against the C runtime can start failing when switched to static linking, even though they compile cleanly.
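One common way to sidestep the malloc()/free() mismatch, whatever the CRT linkage, is to have the module that allocates also export the matching release function, so memory always returns to the heap it came from. The sketch below models that in a single file. The function names are hypothetical, and in a real project `module_alloc` and `module_free` would be exported from the same DLL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

// --- would live inside one DLL (hypothetical names) ---
int live_buffers = 0;  // bookkeeping for this sketch only

char* module_alloc(std::size_t size) {
    char* p = static_cast<char*>(std::malloc(size));
    if (p) ++live_buffers;
    return p;
}

void module_free(char* p) {
    if (!p) return;
    assert(live_buffers > 0);  // every free must match an earlier alloc
    --live_buffers;
    std::free(p);              // same module, hence same heap as the malloc
}

// --- caller side: never calls free() itself ---
struct ModuleDeleter {
    void operator()(char* p) const { module_free(p); }
};
using ModuleBuffer = std::unique_ptr<char, ModuleDeleter>;

ModuleBuffer make_buffer(std::size_t size) {
    return ModuleBuffer(module_alloc(size));
}
```

A caller holds a `ModuleBuffer` and lets it go out of scope; the deleter routes the release back through `module_free`, so the caller's own CRT heap is never involved.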