BOINC "hangs" when network unavailable
Author | Message |
---|---|
Joined: 9 Apr 06 Posts: 302 |
When a host runs with a broken network connection (for example, with the network cable unplugged), boinc.exe (run as a service) begins to consume CPU (about 2 hours of CPU time over about 10 hours without network). In this situation BOINC Manager can't connect to the BOINC service and hangs (it doesn't just show a message like "Can't connect to core client"; it actually hangs). |
Joined: 20 Dec 07 Posts: 1069 |
"What BOINC version and OS?" All versions I can remember. I'm sure about 5.8.16 and 5.10.45. I've also read posts about it somewhere on the forums. Can't search now, Grey's Anatomy continues :-) Regards, Gundolf. Computers aren't everything in life. (Just a little joke) |
Joined: 9 Apr 06 Posts: 302 |
"What BOINC version and OS?" In my case it's BOINC 5.10.45. The OSes are Vista Business edition, Windows Server 2003 x64 and Win98 (I have seen this behavior a few times already, and when it hit even my new quad with Vista I realised that this bug should be reported). ADDENDUM: The science app (observed with the Einstein@home project app) restarted continually because it was not receiving heartbeats from the core client. |
Joined: 9 Apr 06 Posts: 302 |
10h is just an approximate value; don't rely on it. The network cable was unplugged in the second half of the day, and the next morning I noticed the situation described (in the latest case, on the quad with Vista). In the cases of the Core2 Duo under Win2003 x64 and the P-II under Win98, the network cable had possibly been unplugged for a few days. So this probably occurs after a long network outage. Of course, if the network setting was set to "Network activity suspended" before the cable was unplugged, everything works just fine. |
Joined: 20 Dec 07 Posts: 1069 |
"In your first post you mention 10 hours. Are you saying the network must be broken for about 10 hours to cause the hang? I am wondering because I unplugged the cable between the host and router on 4 of my hosts and saw no problems. But I left them unplugged for only 1 to 2 hours, not 10. One of the hosts is WinXP and BOINC 5.10.45. Two are Linux and BOINC 6.2.15. The other is Linux and BOINC 5.10.45. None are service installs." The hang occurs as soon as BOINC tries a network connection without a cable plugged in. Regards, Gundolf. Computers aren't everything in life. (Just a little joke) |
Joined: 3 Apr 06 Posts: 547 |
"In your first post you mention 10 hours. Are you saying the network must be broken for about 10 hours to cause the hang? I am wondering because I unplugged the cable between the host and router on 4 of my hosts and saw no problems. But I left them unplugged for only 1 to 2 hours, not 10." Might it be possible that these 'about 10 hours' correlate with the time when the machine's IP lease expires? (I've occasionally had this problem too over the past 2 years or so (and reported it a few times), but not in recent months.) Peter |
Joined: 9 Apr 06 Posts: 302 |
Well, maybe, although the Win98 host has a manually assigned IP as I recall. The other hosts have the default IP lease expiry time; I don't know how long it is for these OSes. |
Joined: 3 Apr 06 Posts: 547 |
"Other hosts have default IP expire time, don't know how long it is for these OSes." The lease duration is defined by the DHCP server, not on the OS side. Peter |
Joined: 20 Dec 07 Posts: 1069 |
@Gundolf, Quite possible. I only have Windows machines. The one I'm referring to runs NT4 :-) I'm quite sure, though, that it's not the IP lease. I do have issues with that too, but never concurrently with BOINC Manager hangs (as far as I remember :-) Regards, Gundolf. Computers aren't everything in life. (Just a little joke) |
Joined: 9 Apr 06 Posts: 302 |
"Other hosts have default IP expire time, don't know how long it is for these OSes." Do you really think the DHCP service built into Windows is not part of the OS? Maybe after that I should believe that Windows is a true microkernel and modular OS? Even an RTOS maybe? LoL (off-topic, of course, but... sorry :) ) |
Joined: 9 Apr 06 Posts: 302 |
"Raistmer, are you sure the manager is truly hanging? I unplugged the cable on a Win XP host running BOINC 6.2.18 and waited until it attempted to upload a result. It tries the upload 3 or 4 times then gives an application modal popup saying 'BOINC couldn't do Internet communication, and no default connection is selected.' The manager appears to hang because clicking on it does nothing. However, if I close the popup then the manager regains control." I didn't receive that popup. When I kill the BOINC Manager process and restart it, it hangs again. The solution in that case was to stop the BOINC service, then start BOINC Manager, then disable network access, then restart the BOINC service and the manager. So the "hang" occurs in the BOINC service IMHO, not in BOINC Manager itself. This is supported by the Einstein@home app's behavior: it perpetually restarts with a message like "no heartbeat for 30 sec". Apparently the science app can't communicate with the BOINC service either in that situation. And don't forget the increased CPU consumption by the boinc.exe process. It looks like the service retries its communication attempts too often to be able to do anything else. I will try to reproduce this situation in a more controlled environment. |
Joined: 29 Aug 05 Posts: 15625 |
"I will try to reproduce this situation in more controlled environment." Try to reproduce it with 6.2.18. I say this because 5.10 is no longer in development; no newer versions of 5.10 will be released. |
Joined: 3 Apr 06 Posts: 547 |
"Other hosts have default IP expire time, don't know how long it is for these OSes." No, I do not, but please note that I wrote about the IP lease from the DHCP server (the server machine defines for how long the client machine (Windows in our case) is allowed to use this IP), not about the client machine's own DHCP service (acting as the DHCP client). "Maybe I should believe after that Win is true microkernel and modular OS? Even RTOS maybe? LoL (offtopic, of course, but... sorry :) )" I have no exact idea what to compare Windows to, but certainly not an RTOS :-) (joke taken). Peter |
Joined: 9 Apr 06 Posts: 302 |
"I will try to reproduce this situation in more controlled environment." Sorry, I still don't like the idea of using 6.x versions, at least on production hosts. But I guess that a good share of the 5.x codebase was used in the 6.x version, so this bug could easily have carried over to the new version too. |
Joined: 9 Apr 06 Posts: 302 |
"IP lease from DHCP server (the server machine defines, for which period of time is the client machine (Windows in our case) allowed to use this IP), not about the client machine's OS' DHCP service (acting as the DHCP client)." I mean that the host is on a local network, so it has a local IP, and this local IP was assigned by a server. The "server" role is played by Win2003 x64 in one case and WinXP x86 in another. Both the client (with the DHCP client service) and the server (with the DHCP server service embedded in the system) are Windows :) That's what I mean :) So all affected hosts are connected to the Internet via Windows' embedded NAT services and have no external IP. I have no idea whether this sheds any light on the issue under investigation, but ... "Maybe I should believe after that Win is true microkernel and modular OS? Even RTOS maybe? LoL (offtopic, of course, but... sorry :) ) I've no exact idea, what to compare Windows to, but sure no RTOS :-) (joke taken)." :) |
Joined: 3 Apr 06 Posts: 547 |
"I mean that host in local network so it has local IP and this local IP was assigned by server...in "server" role plays Win2003 x64 in one case and WinXP x86 in another case. Both client (with DHCP client service) and server (with embedded into system DHCP server service) are windows :) That's what I mean :)" OK, this way you can at least rule out that the machines acting as DHCP server (which is what the NAT service is) were unavailable. So this is probably not a lost-IP problem (I mention it because I did have such BOINC issues in the past). Peter |
Joined: 9 Apr 06 Posts: 302 |
"I agree. It's the client (what you call the service) that hangs on your machine." I call the client a service because the BOINC core client runs as a Windows service on those hosts (except Win98, of course, where boinc.exe runs as a separate console app). The hosts are connected via cable. Unplugging the cable leads to this situation (a long unplug; how long needs more investigation). I usually don't run BOINC Manager; only boinc.exe runs, as a service. So after boinc.exe ran into trouble, BOINC Manager (which was launched later) encountered a wounded BOINC service. Why BOINC Manager doesn't use some timeout for communication with the service (core client) in this case, I don't know. If the service is stopped, the manager shows a popup like "can't connect to client". But while the service is running, the manager hangs. |
Joined: 9 Apr 06 Posts: 302 |
"BOINC has been running on my Win XP BOINC 5.10.45 system with the network cable unplugged from the hardware router for 25 hours. No hangs, no problems." There was some delay: AP should finish before further experiments. Will try with these options enabled. |
Joined: 9 Apr 06 Posts: 302 |
Well, after 28h with the cable unplugged, BOINC Manager can connect to the service. On the first try (after ~3h) it showed a popup box; on the second and third attempts it just opened the main window. A hard-to-reproduce (or non-reproducible) bug, it seems. But over 28h of running without network, boinc.exe took 1h14min of CPU. That is, ~4% of CPU goes to the BOINC service itself. Almost every second BOINC tries to reconnect for some of the results. There are ~100 results ready to upload already (SETI produces very short tasks now). These retries take too much CPU time, it seems. Maybe it would be worth checking whether the project is available once per minute (or less often), not each second? |
Joined: 29 Aug 05 Posts: 15625 |
You get the once-a-second retry if you have your reminder option set to zero. Only in 6.3 is a reminder setting of zero really off. |
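
The suggestion in the thread to retry once per minute (or less often) instead of every second can be sketched as a capped exponential backoff. This is only an illustration in Python, not BOINC's actual retry logic: the function names `retry_delays` and `cpu_share` and all default values are hypothetical. The second helper simply redoes the arithmetic behind the "~4% of CPU" figure (1h14min of CPU time over 28h of wall-clock time).

```python
def retry_delays(base=1.0, factor=2.0, cap=60.0, attempts=8):
    """Yield the wait in seconds before each retry, doubling up to a cap.

    All defaults are illustrative assumptions, not BOINC settings.
    """
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def cpu_share(cpu_seconds, wall_seconds):
    """Return CPU time as a fraction of elapsed wall-clock time."""
    return cpu_seconds / wall_seconds


if __name__ == "__main__":
    # Waits grow from 1 s and level off at the 60 s cap.
    print(list(retry_delays()))
    # 1h14min of CPU time over 28h without network.
    share = cpu_share(1 * 3600 + 14 * 60, 28 * 3600)
    print(f"{share:.1%}")
```

With a cap of 60 seconds, a long outage settles at one attempt per minute per pending upload rather than one per second, which is the behavior asked for above.
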
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.