SlskQt (Linux): occasional freezes, 100% CPU load

Hey again, thought I'd hold off posting this until the weekend because I suspected you were unusually busy, Nir.
Good. Let's get to the point:
This is the most unpleasant issue you can imagine on SlskQt.
You can't PREDICT this beast. It will happen, or will not, it might work up to midnight, then you go to sleep, leave slsk idle, then you wake up in the morning - CPU = 100%. No way to change any menu tabs.

A somewhat ridiculous workaround for that is to quit & restart SlskQt multiple times, and if you're lucky, CPU is back to normal again.
However, on rare occasions CPU was still at 100% even after 10 quit/restart cycles.

Currently, I'm trying to always keep a recent (!!) backup of my data file. However, this is not as easy as it sounds since, as far as I can see, SlskQt does NOT continuously write to the file while running, but only when quitting the app (?? please correct me if this is wrong). A couple of times I restored a .Soulseek file and found lots of very old downloads (finished a long time ago) and obsolete settings in it.
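
For anyone who wants to automate the backup part, here is a minimal Python sketch. It assumes what I suspect above (the data file is only up to date after a clean quit), so it is meant to be run after quitting the client. The function name, the paths and the "keep the newest 5" rule are my own assumptions, not anything SlskQt provides.

```python
import shutil
import time
from pathlib import Path


def backup_data_file(data_file, backup_dir, keep=5):
    """Copy data_file into backup_dir under a timestamped name.

    Only the newest `keep` backups are retained. Paths are hypothetical;
    point them at wherever your client stores its data file.
    """
    data_file = Path(data_file)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # A sortable timestamp, so a lexicographic sort is also chronological.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{data_file.name}.{stamp}"
    shutil.copy2(data_file, target)  # copy2 also preserves modification times

    # Delete everything except the newest `keep` copies.
    backups = sorted(backup_dir.glob(data_file.name + ".*"))
    for old in backups[:-keep]:
        old.unlink()
    return target
```

Call it as backup_data_file("/path/to/data/file", "/path/to/backups") once the client has exited, so the copy reflects the state written on quit.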

Ah, it might be important to say that in the CPU = 100% state, Slsk *MUST NOT* be quit with the 'X' on its window. If you do it anyway, the process will still remain in the system.
This is why I always 'kill' or 'killall' the process to avoid leaving behind a "SlskQt zombie".
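
If you'd rather check for leftovers before restarting, this is a rough Python sketch of what 'killall' does, using the Linux /proc filesystem. The process name you pass in (for example "SoulseekQt") is an assumption on my part; check the real name with ps on your system first.

```python
import os
import signal


def find_pids(process_name, proc_root="/proc"):
    """Return PIDs whose command name equals process_name (Linux only).

    Note: the kernel truncates the name in /proc/<pid>/comm to 15 characters.
    """
    if not os.path.isdir(proc_root):
        return []
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "cpuinfo"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as handle:
                if handle.read().strip() == process_name:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited while we were scanning
    return sorted(pids)


def kill_leftovers(process_name):
    """Send SIGKILL to every matching process and return the PIDs found."""
    pids = find_pids(process_name)
    for pid in pids:
        try:
            os.kill(pid, signal.SIGKILL)
        except (ProcessLookupError, PermissionError):
            pass  # already gone, or not ours to kill
    return pids
```

An empty list from find_pids means no process with that name is left behind and it is safe to start the client again.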

Wonder if anyone but me got the same strange thing on his Linux distro. Probably not. :)

I have a Linux home server with roughly 10TB of SATA storage onboard, running Soulseek, SSH, and SFTP 24/7, and I haven't encountered any issues on Ubuntu x64 in a long while.
What are your exact specs for the Linux machine?

What distro are you on?
My system monitor shows a max usage of like 1.5GB (and that's after weeks of running SS 24/7)...fwiw.

The server is a humble tower build, 16GB RAM, i7 860, nothing too fancy.

Good luck!

Hi,

I'm on Debian (3.3.7 x86 Qt v4.8.1).
This issue must have been due to using an old (and continuously altered) configuration file. I bet the new data files contain something that the new clients need, and which an old configuration file might not supply, or might supply in the wrong way. When Nir finally repaired the ugly color setting window issue at the beginning of 2013, I went ahead and remade my WHOLE data shite including userlist (!!!), user groups, share folder configuration (it's complex, I ain't sharing /media/sdb5/mp3 ;)), tons of custom options, and color settings - bit by bit.

7 hours of work (comparing, setting, comparing, taking screenshots, noting down things --- sheesh!)

Since then, this issue has not reappeared, and the tabs were always operable.

But the technical geek in me would still have wanted to know WHAT caused this behavior with said hangs.

Oh wow, that's awesome. I really couldn't say why this isn't happening anymore, the only thing that's changed is that configuration files are now written to completely new filenames, without overwriting or moving any existing files. It almost sounds like there was some kind of file access conflict causing the freezes/CPU consumption. If so, I'm glad to have it inadvertently resolved.

Update time! Unfortunately, the issue is back again. SoulseekQt had crashed for some unknown reason, and from that point on, the data file must have been damaged again, so the issue is back in almost the same old way.
The only way out will probably be to create a fresh data file, re-configure shares, re-make all user groups, set colors again from scratch, etc. Phew.
Please, PLEASE give us a way to work with several small files again. At this (quasi-)beta stage of SoulseekQt, most of us would even accept a data file corruption, provided that settings like colors, shared folders and user groups did not have to be remade from scratch every time.

The issue is especially problematic when you chat with a user and "tell him to queue" something for upload. You can almost play the clairvoyant now, predicting what will happen in the next minute: directly on the upload attempt, CPU will go up to 100%, the "chat cursor" will disappear, and all you can do is restart the app.

BTW, the profiler build you sent me by PM only rarely gets to the point where it writes out a gmon.out file. The program may well freeze long before it ever gets the chance to reach the file-creation routine.

You're on Debian 3.3 still? Why such an old release? Perhaps that's contributing to your issues with SoulseekQt?

http://en.wikipedia.org/wiki/Debian

The latest stable is v6 from what I can tell.

I've not been able to reproduce any of your symptoms on my Debian-based machines, sorry. Hope you can figure this out soon.

My two cents: I love the .dat file configuration - everything in one nice neat package has been light years better than the old cfg files, in my experience. Importing/backing up has been a cinch since the switch here.

Scarlet.

Huh? "Debian 3.3", where did you get that from? =D
Ah I see. But no: it's a Debian Wheezy with a kernel 3.3.7. Whenever we Linuxers say "we're on Linux 3.3" it means that we're on some unspecified Linux distro with a 3.3 kernel.

(That's why it always makes sense in IT world to read version specifications up to the very last digit, not stopping anywhere before ;))
BTW, Debian has always preferred using human-readable names like 'Lenny' or 'Wheezy' to ugly numbers that rather suggest engine types ('v6').

I'm sorry you're having problems with SoulseekQt, but I can't reproduce your errors on any of my Debian machines.
Best of luck,
Scarlet

OK, YET one step further in tracking down this problem!
(Which still is an issue even in this year's April build)

Nir, your marvellous Peer Messenger MIGHT finally have given the correct insight.
(You have to be quick, though, otherwise you can't even access that tab anymore)

It might be due to establishing a connection to a certain user! This user is known not to be online all the time, so whenever he's offline, things work fine. Once he's online *AND* queuing something from me, the freezes occur.

How?

Well, with that certain user it says something about a "Messaging connection established with user XY" although the user did not send me a single chat message.
(Maybe I should not take the "messaging" too literally, though.) The peer messenger stopped *exactly* after trying to establish that connection, so you could bet your life that it had frozen again. And it had.

And here it comes.

Since the files the user wanted were userlist-only, I tried REMOVING this user from user list and ... hey presto!
Automagically the freezes were gone, and connections to the users coming after him in peer messenger list worked fine.
It might just be a coincidence, but I think not, since I've quit and restarted SoulseekQt 5 times, and nothing ever froze after removing that user. Whoa.

I guess the only way to get this problem nailed is to have a debug build whose Peer Messenger has a deeper debug level.
This appears to be related to building up connections of a certain type to certain users, obviously.

Well it sounds like the user is repeatedly browsing your files... if that's the case, you should see repeated "Shared file list requested by user [user]" messages under Diagnostics->Logs->Shared, let me know so I can figure out some kind of preventative measure.

Thanks, Nir

Well, hang on please. I can't say that for sure, since if the user had browsed me repeatedly, this information would be in *another* tab under Diagnostics, and would have less to do with the Peer Messenger.
Plus, I noticed that you display some of the Diagnostics tabs in a "just-in-time" fashion, so if nobody has browsed me so far, there is no such tab either.

Before I shoot too quick here, there is another shot in the dark I have. I read that you've updated the miniupnp library in the latest build(s).

But I always get this error (previously reported by others years ago):
Initializing UPnP
No device list found during discovery. miniupnp error code: 0
Found own address to be xx.xxx.xx.xxx
Port 2236 mapping failure, miniupnp error: -3
Found own address to be xx.xxx.xx.xxx
Port 2237 mapping failure, miniupnp error: -3

I'm not connected to a router, but directly to my ADSL modem.
Should I turn off UPnP altogether?
However, I wonder whether my connection on 2237 would still remain obfuscated.

Whatever, this does look suspicious. Works anyway most of the time, but these error messages should not occur.

Following up on this.

The freezes definitely happen on a PM_QUEUE_UPLOAD from certain (not all) users. When one of these users comes online AND wants a file from me, the freezes happen. So yes, you read that right: when the requesting user in question hasn't been around for 2 days, the SoulseekQt client works fine for those 2 days too.

To be absolutely sure, I went so far as to use a HEX EDITOR and remove the "problematic" username from the User List section of the .Soulseek data file!! Though the user got a "file not shared" and was not amused, the client ALWAYS worked afterwards (100% of the cases I tested, and I can't count them anymore).

I could even trigger the freeze myself by initiating an upload to this user ("Upload To User" functionality).

Here's a sample log from last week when this happened again:
(I felt like removing user names for privacy reasons, sorry)

[ Peer Messenger.log ]

[Fri May 24 23:19:16 2013] Send PM_TRANSFER_REQUEST message to user {removed 1}
[Fri May 24 23:19:16 2013] Found one existing messaging connection to user
[Fri May 24 23:19:16 2013] Error writing message to socket: Unknown error
[Fri May 24 23:19:46 2013] Peer connection to user {removed 1} expired, closing.
[Fri May 24 23:19:46 2013] Send PM_UPLOAD_FAILED message to user {removed 1}
[Fri May 24 23:19:46 2013] Found one existing messaging connection to user
[Fri May 24 23:19:46 2013] Error writing message to socket: Unknown error
[Fri May 24 23:19:46 2013] Send PM_TRANSFER_REQUEST message to user {removed 1}
[Fri May 24 23:19:46 2013] Found one existing messaging connection to user
[Fri May 24 23:19:46 2013] Error writing message to socket: Unknown error
[Fri May 24 23:20:06 2013] Accepted messaging connection to user {removed 2}
[Fri May 24 23:20:06 2013] Peer connection of type P opened from user {removed 2}
[Fri May 24 23:20:07 2013] Received PM_QUEUE_UPLOAD message from user {removed 2}
** PROGRAM FROZEN, CPU = 100% **

[ Transfer Queue.log ]

[Fri May 24 23:17:43 2013] Upload of {Removed 0} to {removed} aborted, sending upload failed notification.
[Fri May 24 23:18:16 2013] Upload of {Removed 0} to {removed} timed out in requesting phase, aborting
[Fri May 24 23:18:16 2013] Upload of {Removed 0} to {removed} aborted, sending upload failed notification.
[Fri May 24 23:20:04 2013] User {removed 2} went online
[Fri May 24 23:20:07 2013] Queue upload requested by {removed 2} for file {Removed 1}
** PROGRAM FROZEN, CPU = 100% **
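
In case it helps with comparing the two logs, here is a small Python sketch that merges them by timestamp and pulls out the PM_* message type, so the last entries before a freeze are easy to spot. It assumes every real log line looks like the samples above ("[Fri May 24 23:19:16 2013] text") and that the system uses English day and month names; all function names are made up by me for illustration.

```python
import re
from datetime import datetime

# Assumed line layout, taken from the samples above: "[timestamp] message"
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<msg>.*)$")
TS_FORMAT = "%a %b %d %H:%M:%S %Y"
TYPE_RE = re.compile(r"\b(PM_[A-Z_]+)\b")


def parse_line(line):
    """Return (timestamp, message), or None for lines that are not log entries."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    try:
        stamp = datetime.strptime(match.group("ts"), TS_FORMAT)
    except ValueError:
        return None  # e.g. a "[ Peer Messenger.log ]" header line
    return stamp, match.group("msg")


def message_type(message):
    """Return the PM_* identifier mentioned in a message, or None."""
    found = TYPE_RE.search(message)
    return found.group(1) if found else None


def merge_logs(*logs):
    """Merge (name, lines) pairs into one list of (timestamp, name, message).

    The sort is stable, so entries with equal timestamps keep input order.
    """
    entries = []
    for name, lines in logs:
        for line in lines:
            parsed = parse_line(line)
            if parsed is not None:
                entries.append((parsed[0], name, parsed[1]))
    return sorted(entries, key=lambda entry: entry[0])
```

Feeding it the lines of Peer Messenger.log and Transfer Queue.log and looking at the last few merged entries shows what both logs recorded immediately before the client stopped writing.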

Nir, I'm in need of a debug build with a real deep debug level! Otherwise this problem may never be found. This debug level is way too superficial, to say the least.