irc://irc.freenode.net/ffmpeg-devel wrote:
<nevcairiel> both ssse3 and avx optimizations, and most of those are only available on 64-bit on top, so if you run on 32-bit or on a CPU without ssse3, it'll be quite a bit slower
<jamrial> the optimizations up to sse2 are all for both x86 and x64. it's only ssse3 and above that (most) require x64
UPDATE 2: The IRC logs are up:
http://lists.ffmpeg.org/pipermail/ffmpe ... 02201.html
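To make the 32-bit vs 64-bit point from the IRC log a bit more concrete, here's a toy Python model of runtime dispatch, where a decoder picks the fastest routine that both the CPU and the build allow. It's only a sketch, not FFmpeg's actual code. It assumes, as a simplification of jamrial's "most", that every SSSE3/AVX routine requires a 64-bit build, and the names `IMPLEMENTATIONS` and `pick_implementation` are hypothetical.

```python
# Toy model of runtime SIMD dispatch in a video decoder (hypothetical names,
# not FFmpeg's actual code). Simplifying assumption: every SSSE3/AVX routine
# needs a 64-bit build, while SSE2 works in both 32-bit and 64-bit builds.

# Ordered fastest first: (routine, required CPU flag, needs 64-bit build)
IMPLEMENTATIONS = [
    ("avx", "avx", True),
    ("ssse3", "ssse3", True),
    ("sse2", "sse2", False),
    ("c", None, False),  # plain C fallback, always usable
]


def pick_implementation(cpu_flags, is_64bit):
    """Return the fastest routine allowed by the CPU flags and build type."""
    for name, flag, needs_64bit in IMPLEMENTATIONS:
        if flag is not None and flag not in cpu_flags:
            continue  # CPU lacks this instruction set
        if needs_64bit and not is_64bit:
            continue  # routine only compiled into 64-bit builds
        return name
    return "c"


if __name__ == "__main__":
    # CPU without SSSE3 (e.g. an Athlon 64 X2): 32-bit and 64-bit builds match
    print(pick_implementation({"sse2", "sse3"}, False))  # sse2
    print(pick_implementation({"sse2", "sse3"}, True))   # sse2
    # SSSE3-capable CPU: only the 64-bit build gets the SSSE3 path
    print(pick_implementation({"sse2", "sse3", "ssse3"}, False))  # sse2
    print(pick_implementation({"sse2", "sse3", "ssse3"}, True))   # ssse3
```

Under this model, a CPU without SSSE3 gains nothing from a 64-bit build, while an SSSE3-capable CPU only gets the faster path in 64-bit. That would be consistent with the 32-bit vs 64-bit MPC-HC results below, though the model is a simplification.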
----------------------------------------------------------------
From testing I've done over at doom9 (more details at the end of this post), VP9 decoding seems to benefit greatly from SSSE3: I see around a 3x decoding speedup from SSSE3 alone. Without SSSE3 optimizations in the decoder, even my friend's semi-modern 2.4GHz i5-520M cannot decode YouTube's new 1920x1080 60fps VP9 footage without noticeable frame drops and the CPU fan turning into a jumbo jet taking off.
The thing is, neither Firefox nor Chrome seems to use SSSE3 in its VP9 decoder (UPDATE: at the time of this post, 64-bit Chrome didn't exist). By contrast, MPC-HC's VP9 decoder does seem to use SSSE3.
Considering that one of Pale Moon's selling points is that it actually takes advantage of semi-modern instruction sets like SSE3, it doesn't seem like much of a stretch to bring those SSSE3 optimizations into a future Pale Moon VP9 decoder, especially given how demanding 60fps 1080p VP9 is.
===== SSSE3 Testing =====
I have a 2.5GHz Athlon 64 X2 4800+ (Brisbane G2) and a 2GHz Core 2 Duo T5800 (Merom). Most real-world CPU tests I've run show that these two CPUs perform nearly identically on a level playing field (i.e. with no SSSE3 optimizations). The following CPU review also lends credence to their similar performance: LINK. Apart from the review's Windsor Athlon having twice the L2 cache (which Tom's Hardware shows only helps K8 a little: LINK, source), the review's 4800+ and E6300 are essentially slightly lower-clocked versions of my two CPUs.
On the Brisbane, MPC-HC's VP9 decoding performance is very similar to what I get with Chrome's VP9 decoder: around 60-70% CPU utilization for 1280x720 30fps. On the Merom, however, decoding the exact same VP9 video in MPC-HC takes much less, only around 20-30% CPU utilization. And yet, when I play back the very same 1280x720 30fps VP9 video in Chrome on the Merom, I get 60-70% CPU utilization. In fact, my Brisbane cannot play 1920x1080 30fps VP9 at all without major frame drops, and playing the same VP9 video in Chrome and/or Firefox 31 beta on the Merom also results in tons of dropped frames. In MPC-HC, by contrast, the Merom sits at only around 50% CPU usage for 1920x1080 30fps VP9; heck, the Merom can even just barely manage to play back 1920x1080 60fps VP9 in MPC-HC.
Now consider that the Brisbane PC has a Radeon HD4200 iGP while the Merom PC only has an Intel X3100 iGP. The only thing the Merom system really has over the Brisbane system is SSSE3, which, according to another doom9 user, has been very useful lately (LINK). (Technically the Merom also has twice the L2 cache, but it lacks the Brisbane's integrated memory controller.)
To clarify, MPC-HC v1.7.5's video decoding performance was not noticeably different between the 32-bit and 64-bit builds on the Brisbane, though admittedly I did not test 32-bit MPC-HC on the Merom (which I really should do). Also, I used the exact same copies of the portable versions of Chrome, Firefox 31 beta, and MPC-HC on both machines.
UPDATE: I just ran a new test on an AMD E-350 (1.6GHz). Even with just a 1280x720 30fps VP9 video, 32-bit MPC-HC dropped frames at ~100% CPU utilization, but 64-bit MPC-HC played the video just fine at around 55-75% CPU utilization.