Thread 'Statistics for computation errors?'

Message boards : Questions and problems : Statistics for computation errors?
Markus Elfring

Joined: 20 Dec 11
Posts: 36
Germany
Message 41720 - Posted: 20 Dec 2011, 19:42:58 UTC

I can see in the task list for my PC that a couple of computations have occasionally ended with the client state "Compute error". I can look into each of them through the results web display, but I find this interface less convenient for finding the corresponding error reasons than it could be.

The BOINC software has an infrastructure for generating some statistics. I am now looking for tools that can visualise the error distribution in a better way, to increase the chances of fixing the open issues involved.

Is any automatic analysis performed on the returned exit codes within work units?
Is an automatic categorisation performed for computation failures so that an efficient drill-down into interesting issues would be supported?
ID: 41720
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15628
Netherlands
Message 41722 - Posted: 20 Dec 2011, 20:46:45 UTC - in response to Message 41720.  

One would hope that the project keeps an eye on the errors being returned, as that way they can easily see whether or not their app is stable enough. Of course, when a project is being set up, in alpha or beta stages, they'll want you to post about those errors in their forums. No quicker way to get the facts back than from a user.

In the back-end, used by the project, one has a list of errors over the last 24 hours and more. For the user, however... See, most of the errors are due to application instability. Yep, it's very well possible that your system isn't in ship shape and that it is producing only errors, but you can check that against other users.

Other than that, we have the BOINC FAQs, where I put some errors with a description of what is causing them. But that again scratches only the surface. BOINC errors are easy for me, as I just ask the developer. But application errors, not so easy, especially not when the developer for the project hasn't specified that any special error number should be given. So then all your errors are of the -1 category. Good luck there.

Best ask either in their forums, or here. Although, when you do post here, you'll probably get the answer that you should ask at that project's forums first. Sorry for that, but we don't really know which project uses what error message, etc. ;-)
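
To illustrate the point above about unspecified application errors all landing in the -1 category, here is a minimal sketch of tallying exit codes. It is not BOINC code: the sample list and the names `tally_exit_codes` and `generic_share` are made up for illustration, and it assumes the exit codes have already been collected from the failed tasks by some other means.

```python
from collections import Counter

# Hypothetical sample of exit codes from failed tasks; not real project data.
exit_codes = [-1, -1, 3, -1, 11, 3, -1]


def tally_exit_codes(codes):
    """Count how often each exit code occurs, most frequent first."""
    return Counter(codes).most_common()


def generic_share(codes, generic_code=-1):
    """Return the fraction of failures that carry only the generic code."""
    if not codes:
        return 0.0
    return sum(1 for code in codes if code == generic_code) / len(codes)


tally = tally_exit_codes(exit_codes)
share = generic_share(exit_codes)

for code, count in tally:
    print(f"exit code {code}: {count} failure(s)")
print(f"{share:.0%} of failures carry only the generic code")
```

A large generic share would suggest that the application reports too few distinct error numbers for any statistics to be useful.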
ID: 41722
Markus Elfring

Joined: 20 Dec 11
Posts: 36
Germany
Message 41725 - Posted: 20 Dec 2011, 22:11:20 UTC - in response to Message 41722.  

Jord wrote: "In the back-end, used by the project, one has a list of errors over the last 24 hours and more."

I know that one project displays the information "Successes last 24h" besides the usual server status and credit statistics.

Jord wrote: "For the user, however... See, most of the errors are due to application instability."

I would like to try to reduce this instability through a detailed source code analysis.

Jord wrote: "BOINC errors are easy for me, as I just ask the developer. But application errors, not so easy, especially not when the developer for the project hasn't specified that any special error number should be given."

I hope that more constructive discussions can be achieved in this area. I am looking for a general improvement that can be applied to more software projects.

Jord wrote: "Best ask either in their forums, [...]"

I have tried that already. ;-)
ID: 41725
David Anderson
Volunteer moderator
Project administrator
Project developer
Joined: 10 Sep 05
Posts: 732
Message 41728 - Posted: 21 Dec 2011, 0:02:28 UTC

The project admin web interface (not normally visible to volunteers) has facilities for viewing errors broken down by OS and by application version, and for drilling down to individual failed jobs, including viewing the stderr output of those jobs.
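
As a rough illustration of that kind of breakdown (not the actual admin interface code), the sketch below groups failed jobs by OS and application version and then drills down to the stderr output of one group. The `FailedJob` record, its field names, and the sample data are all hypothetical; a real project would read these values from its own database.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FailedJob:
    """Hypothetical record for one failed job; field names are assumptions."""
    job_id: int
    os_name: str
    app_version: str
    exit_code: int
    stderr: str


# Made-up sample data for illustration only.
jobs = [
    FailedJob(101, "Windows", "1.02", -1, "unhandled exception"),
    FailedJob(102, "Windows", "1.02", -1, "unhandled exception"),
    FailedJob(103, "Linux", "1.02", 11, "segmentation fault"),
]


def break_down(failed_jobs):
    """Group failed jobs by (OS, application version)."""
    groups = defaultdict(list)
    for job in failed_jobs:
        groups[(job.os_name, job.app_version)].append(job)
    return dict(groups)


breakdown = break_down(jobs)

# Overview: groups with the most failures first.
for key, group in sorted(breakdown.items(), key=lambda item: -len(item[1])):
    print(key, len(group))

# Drill-down: stderr output of every failed job in one group.
for job in breakdown[("Windows", "1.02")]:
    print(job.job_id, job.exit_code, job.stderr)
```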
ID: 41728
Markus Elfring

Joined: 20 Dec 11
Posts: 36
Germany
Message 41733 - Posted: 21 Dec 2011, 15:34:34 UTC - in response to Message 41728.  

David Anderson wrote: "The project admin web interface (not normally visible to volunteers) has facilities for viewing errors broken down by OS and by application version, and for drilling down to individual failed jobs, including viewing the stderr output of those jobs."

Thanks for your information.

What do you think about enabling access to parts of this interface for more users?

Which project leaders and administrators would like to publish additional data about success or failure rates?

Can any workflows be optimised with corresponding analysis reports?
ID: 41733
Markus Elfring

Joined: 20 Dec 11
Posts: 36
Germany
Message 41979 - Posted: 9 Jan 2012, 12:52:53 UTC - in response to Message 41728.  

David Anderson wrote: "The project admin web interface (not normally visible to volunteers) has facilities for viewing errors broken down by OS and by application version, and for drilling down to individual failed jobs, including viewing the stderr output of those jobs."

Is any documentation available for the programming interfaces involved in collecting an error distribution from the running software applications?
ID: 41979
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15628
Netherlands
Message 41980 - Posted: 9 Jan 2012, 13:12:37 UTC - in response to Message 41979.  

http://boinc.berkeley.edu/trac/wiki/ProjectMain: Main Wiki index.
http://boinc.berkeley.edu/trac/wiki/HtmlOps: Administrative web interface.
http://boinc.berkeley.edu/trac/wiki/ProjectOptions: Project configuration options.

The administrative web interface is what project admins will see; it also shows the errors of the last 24 hours, by application version.
ID: 41980
Markus Elfring

Joined: 20 Dec 11
Posts: 36
Germany
Message 42094 - Posted: 15 Jan 2012, 15:25:28 UTC - in response to Message 41980.  

I am still missing clearer documentation on how unexpected application behaviour is analysed by the automatic exit code processing.

Could the mentioned error report eventually be made public in more detail?
ID: 42094
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15628
Netherlands
Message 42097 - Posted: 15 Jan 2012, 17:38:18 UTC - in response to Message 42094.  

Ask the project in question if they're willing to export all their errors to the user base. That's outside our jurisdiction, so to say.
ID: 42097
Markus Elfring

Joined: 20 Dec 11
Posts: 36
Germany
Message 42132 - Posted: 18 Jan 2012, 14:25:44 UTC - in response to Message 42097.  

I would like to clarify which BOINC APIs and tools can help to show, and eventually reduce, failure rates.
Which databases are completely project-specific for the result analysis?
ID: 42132


Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.