Soft Lockup on SMP systems - 100% of one CPU on kill

Post by PhilSalkie » 2014-11-16, 00:26

About once a day, Pale Moon suddenly goes unresponsive and stops updating its window. When I then try to kill the window, it starts consuming an entire CPU, and the kernel logs an error like this:

[113242.202826] BUG: soft lockup - CPU#2 stuck for 23s! [mozStorage #5:3182]

This necessitates a powerdown (system won't reboot, it hangs trying to kill the Pale Moon process) and a restart. Anybody else seen anything like this? I'm using 25.0, will try 25.1 and see if that fixes things.
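
For anyone decoding a message like the one quoted above: in the kernel watchdog's soft-lockup report, the bracketed part at the end is the name of the task that was running on the stuck CPU, followed by its PID. Here that is a thread named "mozStorage #5". A minimal Python sketch for pulling the fields out of such a line follows; the class and function names are made up for illustration, and the pattern only covers the message shape seen in this thread.

```python
import re
from typing import NamedTuple, Optional


class SoftLockup(NamedTuple):
    uptime_s: float  # seconds since boot, from the leading timestamp
    cpu: int         # number of the CPU reported as stuck
    stuck_s: int     # how long the watchdog says it has been stuck
    task: str        # name of the task running on that CPU
    pid: int         # PID of that task


# Whitespace is optional so the pattern also matches lines whose
# spaces were lost when the message was copied.
_PATTERN = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s*BUG:\s*soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s!\s*\[(?P<task>.+):(?P<pid>\d+)\]"
)


def parse_soft_lockup(line: str) -> Optional[SoftLockup]:
    """Return the parsed fields, or None if the line is not a soft-lockup report."""
    m = _PATTERN.search(line)
    if m is None:
        return None
    return SoftLockup(
        uptime_s=float(m["ts"]),
        cpu=int(m["cpu"]),
        stuck_s=int(m["secs"]),
        task=m["task"],
        pid=int(m["pid"]),
    )
```

Feeding it lines from the kernel log and grouping the results by task name is one way to see whether the same thread is involved every time.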

Post by x-15a2 » 2014-11-16, 02:45

This can be caused by ill-behaved add-ons and extensions. Try running PM in safe mode and see if that helps. If it does, you'll need to identify the offending add-on or extension.

I've also read from Moonchild that outdated video drivers for some GPUs can cause this type of issue. Search the forums for more information on this.

Post by PhilSalkie » 2014-11-19, 18:28

Happens in 25.1, as well.

I'm using eight extensions - Adblock Plus 2.6.6, CookieSafe 3.0.5, DoNotTrackMe 3.2.1127, DownloadHelper 4.9.24, Image Zoom 0.6.3, NoScript 2.6.9.4, Tab Scope 1.5, and YouTube ALL HTML5 2.1.3 - I'll try disabling them and see what I find (not going to be quick to determine, since the problem only happens once a day or so.)

I'm also using a fairly recent NVidia driver, but it's driving a 4K monitor - not your average use case, I suspect - so it's possible there's a connection there.

More info as it becomes available.

Thanks!

Post by Night Wing » 2014-11-19, 19:01

The laptop I'm typing on right now uses an Intel 4000 graphics chip.

I'm using 64-bit Linux Pale Moon 25.1.0 on 64-bit Linux Mint 17, and I have never seen Pale Moon become unresponsive, even when I let the laptop go to sleep and then wake it again.

Post by PhilSalkie » 2014-11-21, 19:13

Tried removing Tab Scope, still occurs. It seems to relate to clicking the mouse in either a tab or the URL bar - I'll have to pay more attention to what I'm clicking when it locks up.
I've now removed DownloadHelper, we'll see if it still happens...

(Suspend/Resume works fine for me on the desktop, with the exception that Pale Moon forgets all history on resuming, so I have to close the browser before suspending. I suspect this is because my home directory is an NFS4 share - suspend/resume on a laptop doesn't show the same problem.)

Update - a bit more information: I noticed a flurry of entries from systemd-hostnamed after the lockup (complaining about nss-myhostname - the message itself is apparently a bug, as nss-myhostname is not supposed to be installed by default on Ubuntu). Since I'm happy to blame all Linux problems on systemd (or at least all that can't reasonably be blamed on pulseaudio), could this lockup be some interaction between Pale Moon and systemd-hostnamed?

Post by Moonchild » 2014-11-21, 20:33

PhilSalkie wrote: Update - a bit more information: noticed a flurry of entries from systemd-hostnamed (complaining about nss-myhostname - the message itself is apparently a bug, as nss-myhostname is not supposed to be default installed on ubuntu) after the lockup. Since I'm happy to blame all Linux problems on systemd (or at least all that can't reasonably be blamed on pulseaudio) could this lockup be some interaction between palemoon and systemd-hostnamed?
Hmm.. on Linux you can either use the system-installed NSS or the Pale Moon-supplied NSS. It's possible there is a discrepancy there between the versions and what they expect.
I'm not familiar enough with the guts of Linux (especially not across the plethora of distros that all work slightly differently) or with how these interactions work at a system level (I'm very much a Windows man), so I'm afraid I can't really provide more insight.

Post by PhilSalkie » 2014-11-22, 16:18

Moonchild wrote: Hmm.. on Linux you can either use the system-installed NSS or the Pale Moon-supplied NSS. It's possible there is a discrepancy there in different versions and what they expect.
"you can either use" - Is that something I can select in preferences here somehow, to see if one or the other stops the lockups?

Post by Moonchild » 2014-11-23, 08:46

PhilSalkie wrote:
Moonchild wrote: Hmm.. on Linux you can either use the system-installed NSS or the Pale Moon-supplied NSS. It's possible there is a discrepancy there in different versions and what they expect.
"you can either use" - Is that something I can select in preferences here somehow, to see if one or the other stops the lockups?
Only if you build the browser from source, I'm afraid.
By default it will provide its own, but you can select --with-system-nss at build time (in .mozconfig) to tell it to use a system-installed version of the lib.

I'm not sure if you can symlink to a system-installed version to test it or not. Maybe some of the Linux folks can provide a few things to try out?

Then again, if you build the browser from source anyway, using flags specific to your system, that alone may already alleviate your trouble.

Post by PhilSalkie » 2014-11-23, 19:15

Got rid of the NSS messages from systemd-hostnamed, but PaleMoon still locks up. Next thing I'll try is the latest NVidia driver - the one I'm using is from July, but there have been some updates since then.

Post by PhilSalkie » 2014-11-24, 02:09

Ah, apparently the key to search on was "MozStorage" - should have done that first. This is a known Firefox bug having to do with the interface to SQLite: https://bugzilla.mozilla.org/show_bug.cgi?id=676064
Sadly, nobody's been assigned to fix it...

Post by Moonchild » 2014-11-24, 20:57

No, I'm afraid that's not it:
  • The cause for that bug was fixed in bug #673470 - the SQLite issue that caused this was completely removed.
  • Even more importantly, that bug relates to SQL access by SafeBrowsing. Pale Moon does not include the SafeBrowsing "feature" because it is a volume-charged Google API service these days, is considered by Google to be experimental (with no signs of being fully released), and really doesn't do anything but provide a false sense of security in most cases.
So, this really must have a different cause.

Post by PhilSalkie » 2014-11-27, 01:18

Oh, well, it seemed like a good lead. Maybe it's related to SQLite operating on a database that lives on a network share - I'll try moving the .moonchild\ productions directory to the boot drive and symlinking to it from my network share, to see if that helps. I got another message about nss in my system log right around the time of the last lockup, so I might try recompiling Pale Moon with the other NSS switch if this keeps happening.
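
A quick way to confirm whether a profile directory really sits on a network mount is to look it up in the mount table. The sketch below assumes the Linux /proc/mounts layout (device, mount point, filesystem type, then options). The helper names and the set of filesystem types treated as "network" are illustrative choices, not anything taken from Pale Moon.

```python
import os
from typing import Optional

# Filesystem types treated as "network" here: an illustrative, incomplete set.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smb3", "fuse.sshfs"}


def filesystem_type(path: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is the content of /proc/mounts. Mount points that contain
    spaces are written there in an octal-escaped form, which this sketch
    does not decode.
    """
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # Compare whole path components so /home/phil does not match /home/philip.
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest mount point; on a tie the later entry wins,
            # since a later mount shadows an earlier one at the same place.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


def is_on_network_fs(path: str, mounts_text: str) -> bool:
    """True if `path` lives on one of the filesystem types listed above."""
    return filesystem_type(path, mounts_text) in NETWORK_FS_TYPES
```

On a real system the second argument would be the text read from /proc/mounts, and the path should first be passed through os.path.realpath so that a symlinked profile is checked at its real location.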

Post by PhilSalkie » 2014-12-11, 09:26

The network share seems to have been the issue. I keep my home directory on an NFS4 mount to a Synology DiskStation, and SQLite seems to have issues with that (the DiskStation is set to spin the hard drives down after some time, so maybe it's the delay waiting for the platters to spin up that causes the lockup - pretty hard to tell, really). In any event, since moving the .moonchild\ productions directory to the boot drive and making a symlink in my home directory, Pale Moon hasn't frozen once. I think I've found my workaround. Thanks for the hints!
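
The workaround described here (move the profile to a local disk, leave a symlink behind at the old location) can be sketched in Python as follows. The function name and the local target directory are hypothetical, and the browser should be closed before anything like this is run.

```python
import os
import shutil
from pathlib import Path


def relocate_profile(profile_dir: Path, local_root: Path) -> Path:
    """Move `profile_dir` under `local_root` and symlink the old path to it.

    Returns the new, real location of the profile directory.
    """
    if profile_dir.is_symlink():
        raise ValueError(f"{profile_dir} is already a symlink")
    if not profile_dir.is_dir():
        raise FileNotFoundError(profile_dir)

    local_root.mkdir(parents=True, exist_ok=True)
    # Use an absolute target so the symlink does not depend on the
    # directory the script happened to be started from.
    target = local_root.resolve() / profile_dir.name
    if target.exists():
        raise FileExistsError(target)

    # shutil.move falls back to copy-then-delete when source and destination
    # are on different filesystems, which is the case for NFS to local disk.
    shutil.move(str(profile_dir), str(target))
    os.symlink(target, profile_dir, target_is_directory=True)
    return target
```

A call might look like relocate_profile(Path.home() / ".moonchild productions", Path("/var/local/palemoon")), where the second path is only a placeholder for some directory on the boot drive.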

Post by Moonchild » 2014-12-11, 15:32

You probably found the cause there - the SQLite back-end isn't exactly designed for a situation where it has file access (the profile is available) but no data access (the I/O fails to complete), and you're probably hitting a race condition or infinite loop as a result, leading to the core lockup. A spin-up delay would likely cause a db access timeout, but it would be extremely difficult to pinpoint exactly which call gets into trouble as a result. It may be in the front-end, the back-end, the SQLite db driver, or something filesystem-specific. That's too much guesswork to have a jab at without a simple testcase that can be used to track down the problem.