Message boards : BOINC Manager : New ResourceShare values are not "read" by some hosts
Author | Message |
---|---|
Send message Joined: 23 Dec 13 Posts: 45 |
I have 12 hosts. I use BAM as AcctMgr. All my projects were at ResourceShare=100 for the last 2 years. Yesterday, I assigned them values 1/5/10/25/50/100/200/500 in BAM, and BAM successfully set those on the actual project sites. Then I forced an UPDATE on all projects on each host. To my surprise, some hosts got the whole set of new values for the projects they run, and some hosts got them only for a few projects. I cannot understand what is going on... As an example, I will use GPUGRID. I set it to 200 in BAM, BAM relayed that info to the project site, and I see it at 200 there. I don't use any "computer location" stuff; all hosts are at the default location. Yet, some hosts received 200 as the new value, and some hosts are stuck at 100. For kicks, on such a host, I suspended & NNT'd everything except for GPUGRID, changed the value to 201 on the GPUGRID site (to take any issues with BAM or stale files or file timestamps off the table), ran an update on GPUGRID on that host, it received 8 tasks (4xTitanX, baby!), yet the resshare stayed at 100! I changed it back to 200 on the site, reran the update, still 100. I looked at all *gpugrid*.xml files under ProgramData/BOINC, resource_share entries are all 200; not a single instance of 100. And a good number of projects are in this state on some number of my hosts. What to do? How can it be XXX in all *projname*.xml files, yet be 100 for the projname in BOINCMgr?? If it were one project across all hosts, or all projects across one host, I could understand, and blame a project or host being stuck at something, but this? :( Yes, everything is the latest version. Thanks Tuna |
Send message Joined: 23 Dec 13 Posts: 45 |
So, I PAINSTAKINGLY went through all of my 12 hosts with the ~50 projects they are attached to, and made sure every project on each host has the correct resshare that I set under MyProjects in BAM, and I also made sure that each project's own site showed the same resshare in its ProjectPreferences under YourAccount. To do this, I had to identify, for each of my hosts, all the projects that for some strange reason wouldn't get the new resshare from the project's XML, drain each of any remaining tasks with NoNewTasks+AbortNotStartedWork+DelayedDetach, wait for the detach, and then reattach it using BAM. Somehow this initialized the project on that host with the correct resshare value coming from the project. So, now the default values under BAM's MyProjects page, the project sites themselves and my hosts attached to those projects are all in sync. But there clearly is a bug somewhere in BOINCMgr that sometimes prevents it from accepting the ResShare value from the XML file sent by the project. Working around it was very very very time consuming. Thanks Tuna |
Send message Joined: 29 Aug 05 Posts: 15560 |
I'll wait until the developers at BAM have said anything about this: https://boincstats.com/en/forum/18/11507,1 |
Send message Joined: 23 Dec 13 Posts: 45 |
"I'll wait until the developers at BAM have said anything about this: https://boincstats.com/en/forum/18/11507,1" Certainly. But note that my report here is about seemingly random hosts not respecting the resshare of seemingly random projects, even though the XML sent to the host by that project during an update contains the correct value (since the project site itself has the correct value under MyAccount-->ProjectSettings), unless I detach n' reattach. On the other hand, my report on BAM is that BAM doesn't seem to read back (or display) correctly the current resshare value of some projects on some hosts after an AccountMgrUpdate, regardless of whether that value on the host is correct based on what the project site has, even though the host sends an XML to BAM with the correct values. I acknowledge that they sound similar, but I'll be surprised if they are the same issue. The two ends and the direction of information flow seem to be different in each case... Tuna |
Send message Joined: 4 Jul 12 Posts: 321 |
The problem with issues like that is that they are hard to reproduce. If it is only happening on some hosts but not on others, I can't reproduce this on my end. What I could do, if I find the time, is check where the value the Manager shows to the user comes from. Normally this should come directly from the project-specific preferences XML. Do you still have hosts that show this discrepancy or did you clean them all? |
Send message Joined: 23 Dec 13 Posts: 45 |
Unfortunately, I cleaned them all up which took 2-3 days, after leaving them untouched for about a week to see if the problem would resolve itself on its own. |
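
For anyone who hits the same discrepancy and wants to capture evidence before detaching, the manual check described in the first post (looking at the resource_share entries in every *projname*.xml file in the BOINC data directory) can be sketched in Python roughly as below. This is only a sketch under assumptions: the helper names `find_resource_shares` and `mismatches` are made up for illustration, the files are assumed to contain plain `<resource_share>` elements as described in the thread, and a regex is used instead of an XML parser so the exact layout of the files does not matter.

```python
import re
from pathlib import Path

# Matches <resource_share>VALUE</resource_share> where VALUE is a plain number.
# Assumption: the value is stored as element text, as the thread describes.
RESOURCE_SHARE_RE = re.compile(
    r"<resource_share>\s*([-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)\s*</resource_share>"
)


def find_resource_shares(data_dir, name_fragment):
    """Return {file name: [resource_share values]} for every *.xml file in
    data_dir whose name contains name_fragment (hypothetical helper)."""
    results = {}
    for path in sorted(Path(data_dir).glob(f"*{name_fragment}*.xml")):
        text = path.read_text(encoding="utf-8", errors="replace")
        values = [float(v) for v in RESOURCE_SHARE_RE.findall(text)]
        if values:  # skip files that carry no resource_share entry at all
            results[path.name] = values
    return results


def mismatches(results, expected):
    """Keep only the files holding at least one value different from expected."""
    return {
        name: values
        for name, values in results.items()
        if any(v != expected for v in values)
    }
```

Something like `mismatches(find_resource_shares(r"C:\ProgramData\BOINC", "gpugrid"), 200)` would then list any file still holding an old value (the drive letter is an assumption; the thread only says ProgramData/BOINC). An empty result while the Manager still shows 100 would match the situation reported above, where the files and the displayed value disagree.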
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.