Thread '3 days crunching lost - BOINC issue?'

Message boards : BOINC Manager : 3 days crunching lost - BOINC issue?

adrianxw
Joined: 2 Oct 05
Posts: 404
Denmark
Message 5070 - Posted: 20 Jul 2006, 12:58:33 UTC
Last modified: 20 Jul 2006, 12:59:22 UTC

This morning, I noticed one of my machines appeared to be idle. BOINC Manager looked fine and responded to tab actions and so on, but the time for the "Running" program was not changing, and the System Idle process was running.

Looking at the messages, it looks to have been idle for 3 days.

I have posted the messages for the period plus a bit more here.

Starting on the 16th at 19:07, you can see Predictor pre-empts and Rosetta continues.

LHC is then asking for work; there are loads of these requests, as the project has nothing to do at the time.

20:07 Rosetta pre-empts and SIMAP continues, completes and uploads, then continues with another wu.

21:12 SIMAP pre-empts and Rosetta continues and asks for more work.

23:34 Rosetta pre-empts and MCDN continues.

17th 00:34 MCDN pre-empts and Rosetta continues.

The point of leaving all that in the messages is to show that everything was running normally, nothing odd happening. Now it gets interesting!

02:34 Rosetta pre-empts and MCDN continues... except it doesn't!

17:35 15 hours later, work fetch gets suspended for overcommitment. There is then absolutely nothing in the messages until this morning.

20th 06:28 I'd noticed on another computer that LHC had produced some wu's, so I went to this computer and did an "Update" on LHC to see if it would get any wu's (it didn't). I did not notice the problem, hey it was early!

10:04 BOINC suspends MCDN, runs the benchmarks, then resumes MCDN, still nothing happening though.

11:57 I finally notice the problem, and stop and start BOINC.

11:58 The system goes into EDF because the MCDN result is now overdue. The MCDN wu continues - I let it run to see what happens.

12:08 MCDN finishes and Predictor starts.

It has been running normally since. The LTD and STD values are enormous. MCDN has -268316 LTD!

Machine is this one, wu is this one, result is this one. Machine has been running flawlessly for months.

So, what happened here guys? BOINC issue? MCDN issue?

I'll post this at MCDN as well in case they recognise anything.




Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 5070
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15563
Netherlands
Message 5072 - Posted: 20 Jul 2006, 13:48:05 UTC

MCDN issue.
ID: 5072


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.