Thread 'message timeout'

Cheryl
Joined: 1 Apr 07
Posts: 13
Message 9266 - Posted: 1 Apr 2007, 1:53:13 UTC

I am running Boinc 5.8.15 on Mac OS 10.4.9. I keep getting these messages:

Fri Mar 30 19:55:32 2007|SETI@home|Task 25ja04ab.7635.24577.554814.3.13_1 exited with zero status but no 'finished' file
Fri Mar 30 19:55:32 2007|SETI@home|If this happens repeatedly you may need to reset the project.
Fri Mar 30 19:55:32 2007|SETI@home|Restarting task 25ja04ab.7635.24577.554814.3.13_1 using setiathome_enhanced version 513
Sat Mar 31 06:22:27 2007||Restarting 25ja04ab.7635.24577.554814.3.13_1 - message timeout
Sat Mar 31 06:22:27 2007|SETI@home|Restarting task 25ja04ab.7635.24577.554814.3.13_1 using setiathome_enhanced version 513
Sat Mar 31 06:22:28 2007||[error] Process 27600 not found
Sat Mar 31 10:43:26 2007||Restarting 25ja04ab.7635.24577.554814.3.13_1 - message timeout
Sat Mar 31 10:43:27 2007|SETI@home|Restarting task 25ja04ab.7635.24577.554814.3.13_1 using setiathome_enhanced version 513

Trouble is, Boinc Manager stays on that task continuously and does not switch to the next project's task. I suspend Seti and then it will go to the next project, only to get another set of message timeouts.

Sat Mar 31 20:32:08 2007|rosetta@home|Starting CNTRL_ABRELAX_SAVE_ALL_OUT_-1bm8_-_filters_1615_7460_0
Sat Mar 31 20:32:09 2007|rosetta@home|Starting task CNTRL_ABRELAX_SAVE_ALL_OUT_-1bm8_-_filters_1615_7460_0 using rosetta version 554
Sat Mar 31 20:35:16 2007||Restarting CNTRL_ABRELAX_SAVE_ALL_OUT_-1bm8_-_filters_1615_7460_0 - message timeout
Sat Mar 31 20:35:17 2007||[error] Process 6854 not found
Sat Mar 31 20:38:23 2007||Restarting CNTRL_ABRELAX_SAVE_ALL_OUT_-1bm8_-_filters_1615_7460_0 - message timeout
Sat Mar 31 20:38:24 2007||[error] Process 6875 not found
Sat Mar 31 20:41:28 2007||Restarting CNTRL_ABRELAX_SAVE_ALL_OUT_-1bm8_-_filters_1615_7460_0 - message timeout
Sat Mar 31 20:41:29 2007||[error] Process 6896 not found
Sat Mar 31 20:44:33 2007||Restarting CNTRL_ABRELAX_SAVE_ALL_OUT_-1bm8_-_filters_1615_7460_0 - message timeout
Sat Mar 31 20:44:33 2007|rosetta@home|Restarting task CNTRL_ABRELAX_SAVE_ALL_OUT_-1bm8_-_filters_1615_7460_0 using rosetta version 554
Sat Mar 31 20:44:34 2007||[error] Process 6916 not found
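
For anyone who wants to tally these restarts from a saved copy of the Messages tab, here is a minimal sketch. It assumes every line has the `timestamp|project|message` shape seen in the excerpts above; the function name `count_message_timeouts` and the `SAMPLE_LOG` list are made up for illustration and are not part of BOINC.

```python
from collections import Counter

PREFIX = "Restarting "
SUFFIX = " - message timeout"


def count_message_timeouts(log_lines):
    """Count 'message timeout' restarts per task name.

    Assumes the 'timestamp|project|message' layout shown in the log
    excerpts; lines with a different shape are skipped.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.rstrip("\n").split("|", 2)
        if len(parts) != 3:
            continue
        message = parts[2]
        # "Restarting task ... using ..." lines do not end with SUFFIX,
        # so only the timeout restarts are counted.
        if message.startswith(PREFIX) and message.endswith(SUFFIX):
            counts[message[len(PREFIX):-len(SUFFIX)]] += 1
    return counts


# A few lines copied from the excerpt above.
SAMPLE_LOG = [
    "Sat Mar 31 20:35:16 2007||Restarting CNTRL_ABRELAX_SAVE_ALL_OUT_-1bm8_-_filters_1615_7460_0 - message timeout",
    "Sat Mar 31 20:35:17 2007||[error] Process 6854 not found",
    "Sat Mar 31 20:38:23 2007||Restarting CNTRL_ABRELAX_SAVE_ALL_OUT_-1bm8_-_filters_1615_7460_0 - message timeout",
    "Sat Mar 31 20:44:33 2007|rosetta@home|Restarting task CNTRL_ABRELAX_SAVE_ALL_OUT_-1bm8_-_filters_1615_7460_0 using rosetta version 554",
]
```

Calling `count_message_timeouts(SAMPLE_LOG)` on those four lines counts two timeout restarts for the Rosetta task and ignores the ordinary "Restarting task" line.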

I posted this on the client's web site and the answer I got was to wait for the next stable release of Boinc Manager to correct the issue.

When will the next stable version be released? Or is this a different problem that I can fix on my end?

KSMarksPsych
Joined: 30 Oct 05
Posts: 1239
United States
Message 9267 - Posted: 1 Apr 2007, 2:53:00 UTC

I've sent an email to David, Rom and Charlie about this.

There's a similar report to yours on the Einstein boards.


At some point they might want to get some debugging logs from you. When they let me know which ones, I'll post the file you'll need to create.
Kathryn :o)

KSMarksPsych
Joined: 30 Oct 05
Posts: 1239
United States
Message 9273 - Posted: 1 Apr 2007, 4:41:13 UTC

A few things...


First... What kind of Mac is it? PPC or Intel?


Second... This is from Charlie, the Mac developer.

Are you running on an Intel or PowerPC Mac? A new version 5.18 of SETI@home client for Intel Macs, which should fix many of the crashes, has been ready for some time now but its release has been delayed for various reasons.

In my experience, the "process not found" message by itself is rarely a real problem on *NIX systems. The call waitpid(0, ...) is used to find processes which have exited. It returns the Process ID of any exited child process in the same process group as the BOINC client.

The client then searches its list of active tasks for one with that Process ID. If the Process ID is not found in BOINC's list of currently active tasks, it generates the "process xxxx not found" message, where xxxx is the Process ID.

Having said that, the "message timeout" errors are a different matter and may well be worth investigating. I'll leave that to David or Rom.

Cheers,
--Charlie
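
To make Charlie's description concrete, here is a rough Python sketch of the same pattern. It is not the BOINC client's code (the client is not written in Python); `describe_exit`, `reap_children`, and the `active_tasks` dictionary are hypothetical names, and the only real API used is `os.waitpid` with `os.WNOHANG`, which is available on Unix-like systems.

```python
import os


def describe_exit(pid, active_tasks):
    """Return a log line for an exited child PID.

    active_tasks is a hypothetical dict mapping PID -> task name,
    standing in for the client's list of active tasks.
    """
    task = active_tasks.pop(pid, None)
    if task is None:
        # The PID belongs to a child we are no longer tracking.
        return f"[error] Process {pid} not found"
    return f"Task {task} exited"


def reap_children(active_tasks):
    """Collect every child that has already exited, without blocking."""
    messages = []
    while True:
        try:
            # pid 0 means: any child in the caller's process group.
            pid, _status = os.waitpid(0, os.WNOHANG)
        except ChildProcessError:
            break  # there are no child processes at all
        if pid == 0:
            break  # children exist, but none has exited yet
        messages.append(describe_exit(pid, active_tasks))
    return messages
```

If a task has already been dropped from the list (for example because it was just restarted), the later exit of its old process would produce the "not found" line even though nothing is actually wrong, which fits Charlie's point that the message alone is rarely a problem.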



Third... I've gotten the message timeout message in the past. I emailed David about it a couple alpha versions back and his response was...

"Restarting XXX - message timeout" is what happens when
an app gets into a deadlocked state because of CPU throttling.
The app should be running again afterwards.
-- David


So my question would be are you using the CPU throttling feature of BOINC?
Kathryn :o)

Cheryl
Joined: 1 Apr 07
Posts: 13
Message 9279 - Posted: 1 Apr 2007, 11:28:44 UTC - in response to Message 9273.  
Last modified: 1 Apr 2007, 11:29:38 UTC

It is a PPC iMac and no I am not using the CPU throttling feature. I don't even know what that is, let alone where it is selected.

KSMarksPsych
Joined: 30 Oct 05
Posts: 1239
United States
Message 9280 - Posted: 1 Apr 2007, 12:33:42 UTC

Thanks Cheryl.

I've passed along that additional information to the developers.

And just for your future knowledge, CPU throttling lets you decrease the amount of CPU BOINC uses. Many use it to control temperatures when running on laptops. You can find the settings for it under "Your Account" in the "General Preferences" on most of the project websites (as long as their server code is recent enough to have the field on the website). You can also set it through the Simple View of BOINC Manager by clicking on the preferences button.
Kathryn :o)

Cheryl
Joined: 1 Apr 07
Posts: 13
Message 9284 - Posted: 1 Apr 2007, 13:24:18 UTC - in response to Message 9280.  

The CPU usage was set to 100%. Should this be set lower?

In checking my results on the client web site, the last two work units showed Client Error and Compute Error. I am assuming this means nothing is being done on these work units.

Where would I find the logs to Boinc Manager? I could not find any in my logs.

KSMarksPsych
Joined: 30 Oct 05
Posts: 1239
United States
Message 9285 - Posted: 1 Apr 2007, 14:58:36 UTC
Last modified: 1 Apr 2007, 15:17:10 UTC

Unless your computer is running hot, I wouldn't fool with that feature. I use it because my laptop runs *really* hot.

The debugging logs would be something that would be generated after one of us gives you instructions on how to set up a cc_config.xml file. These messages will be logged in the same place most of the other messages are, in the messages tab/window. The messages from your message tab/window (depending on which view of the manager you're looking at) can be found in a file called stdoutdae.txt.

Where you might find it is a different question... I know BOINC on Macs has stuff in slightly different places depending on the file. With Windows it's all in the same directory.
Kathryn :o)

Cheryl
Joined: 1 Apr 07
Posts: 13
Message 9289 - Posted: 1 Apr 2007, 19:28:30 UTC - in response to Message 9285.  
Last modified: 1 Apr 2007, 19:38:51 UTC

Here is an update and something to inquire further about.

I suspended Network activity and for the last 2 1/2 hours I have not been getting those 'message timeout' errors. I have also noticed that the workunits are working faster than before I suspended Network activity.

This is a home connection to the internet - DSL - and a Linksys Router connected to the modem in order to share printers with a Windows XP machine - who is not running Boinc (just the Mac is).

KSMarksPsych
Joined: 30 Oct 05
Posts: 1239
United States
Message 9290 - Posted: 1 Apr 2007, 20:48:04 UTC - in response to Message 9289.  

Here is an update and something to inquire further about.

I suspended Network activity and for the last 2 1/2 hours I have not been getting those 'message timeout' errors. I have also noticed that the workunits are working faster than before I suspended Network activity.

This is a home connection to the internet - DSL - and a Linksys Router connected to the modem in order to share printers with a Windows XP machine - who is not running Boinc (just the Mac is).


Interesting.

Try re-enabling the network and see if you start getting the messages again. If you do, disable it and see what happens.

Basically see if you can reproduce the behavior.

Kathryn :o)

MikeMarsUK
Joined: 16 Apr 06
Posts: 386
United Kingdom
Message 9291 - Posted: 1 Apr 2007, 23:31:33 UTC


Could it possibly be another manifestation of this bug which Nicolas found?


Cheryl
Joined: 1 Apr 07
Posts: 13
Message 9292 - Posted: 2 Apr 2007, 0:23:56 UTC - in response to Message 9290.  

I resumed Network Activity and after four hours there were no 'message timeout' errors, but ----

The manager did not switch clients after 60 minutes as per my preferences.

Cheryl
Joined: 1 Apr 07
Posts: 13
Message 9293 - Posted: 2 Apr 2007, 0:31:44 UTC - in response to Message 9291.  


Could it possibly be another manifestation of this bug which Nicolas found?


No, it is not from clock sync. Mine is set to manual sync and I rarely do that.

Nicolas
Joined: 19 Jan 07
Posts: 1179
Argentina
Message 9294 - Posted: 2 Apr 2007, 0:43:00 UTC - in response to Message 9293.  
Last modified: 2 Apr 2007, 0:44:38 UTC


Could it possibly be another manifestation of this bug which Nicolas found?


No, it is not from clock sync. Mine is set to manual sync and I rarely do that.

That post doesn't mention clock syncing. Maybe you read a different one? The bug I described was on "Message 8819".

KSMarksPsych
Joined: 30 Oct 05
Posts: 1239
United States
Message 9295 - Posted: 2 Apr 2007, 1:52:00 UTC - in response to Message 9292.  

I resumed Network Activity and after four hours there were no 'message timeout' errors, but ----

The manager did not switch clients after 60 minutes as per my preferences.


What project was it running?

5.8.x versions of BOINC won't switch tasks until a checkpoint has been reached. If it was Rosetta it's possible that the first model hadn't finished. If I'm remembering right, Rosetta only checkpoints at the end of a model.
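
As a purely illustrative sketch of that rule (not the client's real scheduler), the check could look like the following. `RunningTask`, `may_switch_away`, and `switch_interval_s` are hypothetical names; the only behavior taken from the description above is "do not switch until the time slice is used up and a checkpoint has been reached".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunningTask:
    name: str
    started_at: float                           # seconds, when this run began
    last_checkpoint_at: Optional[float] = None  # None = no checkpoint yet


def may_switch_away(task, now, switch_interval_s):
    """True only if the time slice is used up AND the task has
    checkpointed at some point during the current run."""
    time_slice_used = (now - task.started_at) >= switch_interval_s
    has_checkpointed = (
        task.last_checkpoint_at is not None
        and task.last_checkpoint_at >= task.started_at
    )
    return time_slice_used and has_checkpointed
```

Under these assumptions, a task that has run for its full 60 minutes but has not yet checkpointed keeps the CPU, while one that checkpointed a minute ago can be swapped out as soon as the interval expires.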
Kathryn :o)

Cheryl
Joined: 1 Apr 07
Posts: 13
Message 9296 - Posted: 2 Apr 2007, 2:11:37 UTC - in response to Message 9295.  



What project was it running?

5.8.x versions of BOINC won't switch tasks until a checkpoint has been reached. If it was Rosetta it's possible that the first model hadn't finished. If I'm remembering right, Rosetta only checkpoints at the end of a model.

Kathryn,

Seti was running at the time. I restarted Boinc Manager. I'll watch it and let you know what happens. I now have it set to Network Activity based on preferences rather than Always Available.

Nicolas,

My internet connection has always been a good one. I got the Message Timeout errors for several hours on end no matter what client it was working on. Turning off Network Activity, then turning it back on several hours later may have corrected those timeouts. I never had this problem with previous versions of the manager until the 5.8.11 and 5.8.15 versions.

KSMarksPsych
Joined: 30 Oct 05
Posts: 1239
United States
Message 9297 - Posted: 2 Apr 2007, 3:31:56 UTC - in response to Message 9296.  



What project was it running?

5.8.x versions of BOINC won't switch tasks until a checkpoint has been reached. If it was Rosetta it's possible that the first model hadn't finished. If I'm remembering right, Rosetta only checkpoints at the end of a model.

Kathryn,

Seti was running at the time. I restarted Boinc Manager. I'll watch it and let you know what happens. I now have it set to Network Activity based on preferences rather than Always Available.


Well... the checkpointing issue can't be it. Seti checkpoints pretty frequently. Definitely more than once an hour.

Let me create that cc_config.xml file for you. I'll set some flags that might shed some light on the problem. Disclaimer... I'm guessing at the best flags to set... I haven't heard back from David or Rom on this yet.

Kathryn :o)

KSMarksPsych
Joined: 30 Oct 05
Posts: 1239
United States
Message 9299 - Posted: 2 Apr 2007, 4:04:42 UTC
Last modified: 9 Apr 2007, 1:30:32 UTC

Exit out of BOINC.

Create the following file with TextEdit.

<cc_config>
   <log_flags>
       <task>1</task>
       <file_xfer>1</file_xfer>
       <sched_ops>1</sched_ops>
       <cpu_sched>1</cpu_sched>
       <cpu_sched_debug>0</cpu_sched_debug>
       <debt_debug>0</debt_debug>
       <state_debug>0</state_debug>
       <task_debug>1</task_debug>
       <file_xfer_debug>0</file_xfer_debug>
       <sched_op_debug>0</sched_op_debug>
       <http_debug>0</http_debug>
       <work_fetch_debug>0</work_fetch_debug>
       <proxy_debug>0</proxy_debug>
       <time_debug>0</time_debug>
       <http_xfer_debug>0</http_xfer_debug>
       <measurement_debug>0</measurement_debug>
       <poll_debug>0</poll_debug>
       <guirpc_debug>0</guirpc_debug>
       <scrsave_debug>0</scrsave_debug>
       <rr_simulation>0</rr_simulation>
       <app_msg_send>0</app_msg_send>
       <app_msg_receive>0</app_msg_receive>
       <unparsed_xml>0</unparsed_xml>
       <network_status_debug>0</network_status_debug>
   </log_flags>
</cc_config>



Save it as cc_config.xml and make sure it doesn't get an extension like .txt (I don't know if Macs put this on like Windows does).


Put this file in Macintosh HD --> Library --> Application Support --> BOINC Data.

If any of the flags that are turned off (<flag>0</flag>) need to be turned on, I'll let you know. You'll just need to change the 0 to a 1 (<flag>1</flag>) and then tell BOINC to re-read the file. But if this is needed, I'll give better directions for that.
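
Before restarting BOINC, it can help to confirm that the file is well-formed XML and to see which flags are actually switched on. The sketch below uses only Python's standard `xml.etree.ElementTree`; the function name `enabled_log_flags` and the shortened `EXAMPLE` text are made up for illustration, and the check says nothing about whether a given BOINC version recognizes a flag.

```python
import xml.etree.ElementTree as ET


def enabled_log_flags(xml_text):
    """Return the names of flags set to 1 inside <log_flags>.

    Raises xml.etree.ElementTree.ParseError if the text is not
    well-formed XML (for example, a missing closing tag).
    """
    root = ET.fromstring(xml_text)
    flags = root.find("log_flags")
    if flags is None:
        return []
    enabled = []
    for flag in flags:
        is_on = (flag.text or "").strip() == "1"
        if is_on and flag.tag not in enabled:  # ignore repeated tags
            enabled.append(flag.tag)
    return enabled


# Shortened version of the file above, for demonstration only.
EXAMPLE = """<cc_config>
   <log_flags>
       <task>1</task>
       <file_xfer>1</file_xfer>
       <sched_ops>1</sched_ops>
       <cpu_sched>1</cpu_sched>
       <task_debug>1</task_debug>
       <http_debug>0</http_debug>
   </log_flags>
</cc_config>"""
```

To check a real file, read its contents and pass the text in, e.g. `enabled_log_flags(open("cc_config.xml").read())`. For the shortened example, the result lists `task`, `file_xfer`, `sched_ops`, `cpu_sched`, and `task_debug`.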

Let BOINC run for a while. If you see those errors, exit out of BOINC. I'm guessing in that same directory should be a file called stdoutdae.txt. You can use Spotlight to search for it if it's not there. Then either post it here for us to look at or email it to me and I'll pass it along to the developers.


[edited to fix the cc_config.xml]
Kathryn :o)

Cheryl
Joined: 1 Apr 07
Posts: 13
Message 9305 - Posted: 2 Apr 2007, 13:33:53 UTC

Kathryn,

I followed your instructions and I am letting it run. I have two stdoutdae files now, one ending in .old and the other in .txt.

When I restarted Boinc I got:

Mon Apr 2 08:24:34 2007||Unrecognized tag in cc_config.xml: <measurement_debug>
Mon Apr 2 08:24:34 2007||Unexpected text 0 in cc_config.xml
Mon Apr 2 08:24:34 2007||Unrecognized tag in cc_config.xml: </measurement_debug>
Mon Apr 2 08:24:34 2007||Unrecognized tag in cc_config.xml: <checkpoint_debug>
Mon Apr 2 08:24:34 2007||Unexpected text 0 in cc_config.xml
Mon Apr 2 08:24:34 2007||Unrecognized tag in cc_config.xml: </checkpoint_debug>
Mon Apr 2 08:24:35 2007||Starting BOINC client version 5.8.15 for powerpc-apple-darwin
Mon Apr 2 08:24:35 2007||log flags: task, file_xfer, sched_ops, cpu_sched, task_debug
Mon Apr 2 08:24:35 2007||Libraries: libcurl/7.15.5 OpenSSL/0.9.7l zlib/1.2.3
Mon Apr 2 08:24:35 2007||Data directory: /Library/Application Support/BOINC Data
Mon Apr 2 08:24:37 2007||Processor: 1 Power Macintosh Power Macintosh [Power Macintosh Model PowerMac4,5] [AltiVec]


As it runs I get:
Mon Apr 2 08:24:37 2007||General prefs: no separate prefs for home; using your defaults
Mon Apr 2 08:24:37 2007|SETI@home|[cpu_sched] Starting 25ja04ab.7635.24577.554814.3.13_1(resume)
Mon Apr 2 08:24:37 2007||[task_debug] ACTIVE_TASK::start(): forked process: pid 20249
Mon Apr 2 08:24:37 2007|SETI@home|[task_debug] task_state=EXECUTING for 25ja04ab.7635.24577.554814.3.13_1 from start
Mon Apr 2 08:24:37 2007|SETI@home|Restarting task 25ja04ab.7635.24577.554814.3.13_1 using setiathome_enhanced version 513
Mon Apr 2 08:25:41 2007|SETI@home|[task_debug] result 25ja04ab.7635.24577.554814.3.13_1 checkpointed
Mon Apr 2 08:26:42 2007|SETI@home|[task_debug] result 25ja04ab.7635.24577.554814.3.13_1 checkpointed
Mon Apr 2 08:27:42 2007|SETI@home|[task_debug] result 25ja04ab.7635.24577.554814.3.13_1 checkpointed
Mon Apr 2 08:28:43 2007|SETI@home|[task_debug] result 25ja04ab.7635.24577.554814.3.13_1 checkpointed
Mon Apr 2 08:29:43 2007|SETI@home|[task_debug] result 25ja04ab.7635.24577.554814.3.13_1 checkpointed
Mon Apr 2 08:30:43 2007|SETI@home|[task_debug] result 25ja04ab.7635.24577.554814.3.13_1 checkpointed
Mon Apr 2 08:31:44 2007|SETI@home|[task_debug] result 25ja04ab.7635.24577.554814.3.13_1 checkpointed
Mon Apr 2 08:32:44 2007|SETI@home|[task_debug] result 25ja04ab.7635.24577.554814.3.13_1 checkpointed
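
To see how often the task is checkpointing, the gaps between those `[task_debug] ... checkpointed` lines can be computed with a short script. This is a sketch that assumes the `timestamp|project|message` layout shown above and an English locale for the day and month names; `checkpoint_intervals` and `SAMPLE` are illustrative names only.

```python
from datetime import datetime

# Matches timestamps such as "Mon Apr 2 08:25:41 2007".
# %a and %b are locale-dependent; English names are assumed.
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def checkpoint_intervals(log_lines):
    """Return the gaps, in whole seconds, between consecutive
    'checkpointed' lines."""
    times = []
    for line in log_lines:
        parts = line.split("|", 2)
        if len(parts) == 3 and parts[2].rstrip().endswith(" checkpointed"):
            times.append(datetime.strptime(parts[0], TIMESTAMP_FORMAT))
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


# Three lines copied from the log above.
SAMPLE = [
    "Mon Apr 2 08:25:41 2007|SETI@home|[task_debug] result 25ja04ab.7635.24577.554814.3.13_1 checkpointed",
    "Mon Apr 2 08:26:42 2007|SETI@home|[task_debug] result 25ja04ab.7635.24577.554814.3.13_1 checkpointed",
    "Mon Apr 2 08:27:42 2007|SETI@home|[task_debug] result 25ja04ab.7635.24577.554814.3.13_1 checkpointed",
]
```

For these three lines the gaps are 61 and 60 seconds, so the task is checkpointing about once a minute.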

I appreciate your help with this.

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15561
Netherlands
Message 9306 - Posted: 2 Apr 2007, 14:01:57 UTC - in response to Message 9305.  

Mon Apr 2 08:24:34 2007||Unrecognized tag in cc_config.xml: <measurement_debug>
Mon Apr 2 08:24:34 2007||Unexpected text 0 in cc_config.xml
Mon Apr 2 08:24:34 2007||Unrecognized tag in cc_config.xml: </measurement_debug>
Mon Apr 2 08:24:34 2007||Unrecognized tag in cc_config.xml: <checkpoint_debug>
Mon Apr 2 08:24:34 2007||Unexpected text 0 in cc_config.xml
Mon Apr 2 08:24:34 2007||Unrecognized tag in cc_config.xml: </checkpoint_debug>

<measurement_debug> is no longer in use.
<checkpoint_debug> isn't in use yet. It'll be available in 5.10+. In 5.8, checkpoints are logged with the <task_debug> flag.

So you can ignore those messages.

KSMarksPsych
Joined: 30 Oct 05
Posts: 1239
United States
Message 9307 - Posted: 2 Apr 2007, 14:21:40 UTC

Great!

Now I guess just let it run and see if it starts throwing the original error messages.

Sorry about those extra flags. I didn't check to make sure it was up to date.
Kathryn :o)

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.