Thread 'Boinc Manager Suggestions'

Darmok

Joined: 15 Mar 18
Posts: 12
Canada
Message 85211 - Posted: 15 Mar 2018, 2:05:52 UTC
Last modified: 15 Mar 2018, 2:06:27 UTC

Some projects have multiple server sites for various tasks, and one site may be down for uploads while another is available. It would be great to be able to select which tasks to upload (and which not) via a command button on the Transfers page. As it stands, a lot of bandwidth is consumed, because all tasks try to upload to the downed server before the client reaches the tasks destined for the available servers. This wastes gigabytes of bandwidth. Thanks
ID: 85211
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5128
United Kingdom
Message 85222 - Posted: 15 Mar 2018, 10:51:09 UTC
Last modified: 15 Mar 2018, 11:10:24 UTC

Welcome Darmok, and thanks for your suggestion.

In fact, I don't think the situation is quite as bad as you suggest. We went through the 'multiple upload server' problem in some detail a few years ago, and in most cases, the current solution works well enough. Usually, when an upload server fails, is unreachable, or is shut down for maintenance, your BOINC client gets that message immediately at the very beginning of the transaction: a few hundred bytes are exchanged in header negotiation, or a few dozen seconds are lost waiting for a timeout, but no major damage is done. If an upload fails for any reason, BOINC waits before retrying. At first, the waits are short, so that transient glitches are resolved quickly. If the retries fail consistently, the wait between retries is increased, and in addition, the whole project is inhibited from uploading until some event triggers a project-wide retry.

To cope with the multiple-server situation, every new upload file generated by the project is tried once: maybe this one will be destined for a working server, and will go through at once. But if not, it joins the queue waiting to upload.

The advantage of this system is that if a new upload intended for the failed server does get through (i.e. the project administrators have fixed the server), BOINC releases the project-wide inhibition, and all uploads intended for the broken server are retried automatically, with no action required at your end.
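
As a rough illustration of the retry behaviour described above, here is a minimal Python sketch. It is not taken from the BOINC source: the class names, the doubling delay, the delay limits and the threshold of three failures before the project-wide inhibition are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Illustrative values only; these are NOT BOINC's real settings.
MIN_DELAY_S = 60
MAX_DELAY_S = 4 * 3600
FAILURES_BEFORE_PROJECT_BACKOFF = 3


@dataclass(eq=False)
class Upload:
    name: str
    failures: int = 0
    next_try: float = 0.0      # earliest time of the next attempt
    tried_once: bool = False   # every new file gets one attempt


@dataclass
class Project:
    uploads: list = field(default_factory=list)
    recent_failures: int = 0
    inhibited: bool = False    # project-wide upload inhibition

    def eligible(self, now):
        """Return the files that may be attempted at time `now`."""
        out = []
        for u in self.uploads:
            if not u.tried_once:
                out.append(u)  # new files are tried once regardless
            elif not self.inhibited and u.next_try <= now:
                out.append(u)
        return out

    def record(self, upload, ok, now):
        """Record the outcome of one upload attempt."""
        upload.tried_once = True
        if ok:
            # A success lifts the inhibition and releases the queue.
            self.uploads.remove(upload)
            self.recent_failures = 0
            self.inhibited = False
            for u in self.uploads:
                u.next_try = now
            return
        # A failure lengthens this file's own delay (doubling, capped).
        upload.failures += 1
        delay = min(MAX_DELAY_S, MIN_DELAY_S * 2 ** (upload.failures - 1))
        upload.next_try = now + delay
        self.recent_failures += 1
        if self.recent_failures >= FAILURES_BEFORE_PROJECT_BACKOFF:
            self.inhibited = True
```

In this sketch, three consecutive failures inhibit the whole project, a newly generated file is still offered once by `eligible()`, and a single success makes every queued file eligible again without any action from the user.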

There is, however, one situation where bandwidth is wasted as you describe - and it arises from a setup most commonly used by projects which have very large upload files to manage, thus compounding the problem. My understanding is that this happens when a project sets up a small 'gateway' upload server to handle the incoming files from users, but then transfers them to a much larger background data storage facility. If the background storage facility fills up or otherwise fails, the gateway server doesn't find out about it until the upload is complete and the gateway server attempts to transfer the entire file in one go.

My understanding is that one project - Climate Prediction (CPDN) - is suffering this sort of problem at the moment. Maybe this is what prompted your suggestion? The project administrators are aware of the problem and are investigating.

But it would be better if this case was handled by BOINC in the same way as I described above - with the gateway server rejecting the upload right at the beginning if the backing store is not ready to accept it. I think this would be better than adding a control to the manager for individual users: not everyone monitors their BOINC installation as closely as you obviously do, and almost everyone takes time off to sleep most days! More seriously, there's a risk that someone would turn off the uploads during a problem, and then forget to turn them back on again in a timely fashion after the problem is fixed.
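
To make the idea of rejecting at the beginning concrete, here is a minimal Python sketch of a check a gateway could run before accepting any payload. The function name, its parameters and the reason strings are hypothetical, and how a real gateway would learn the free space of its backing store is left open.

```python
def admit_upload(declared_size, backing_free_bytes, reserve_bytes=0):
    """Decide, before any payload is transferred, whether to accept a file.

    declared_size      -- size in bytes announced by the client
    backing_free_bytes -- free space reported by the backing store,
                          or None if the store could not be reached
    reserve_bytes      -- safety margin to keep free (an assumption)

    Returns a tuple (accepted, reason).
    """
    if backing_free_bytes is None:
        return False, "backing store unreachable"
    if declared_size + reserve_bytes > backing_free_bytes:
        return False, "backing store full"
    return True, "ok"
```

With a check like this, a refusal costs the client only the small initial exchange, so the existing backoff logic can take over instead of the whole file being sent and then discarded.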

I do understand that the problem you've observed is frustrating, and for some users with metered bandwidth may even incur unexpected expense. I'll pass on your concern both to the individual project I've identified, and to the wider BOINC development team, to see what can be done.

Edit - reported as GitHub issue #2411
ID: 85222
Jacob Klein
Volunteer tester
Help desk expert

Joined: 9 Nov 10
Posts: 63
United States
Message 85233 - Posted: 15 Mar 2018, 14:11:56 UTC
Last modified: 15 Mar 2018, 14:13:04 UTC

Thanks guys. I would appreciate something to mitigate this problem. I'm not on a metered connection, but I've got 4 PCs all uploading CPDN (ClimatePrediction.net) files over and over, and my local network gets badly congested when the upload pipe is saturated - sometimes hard to use webpages even.

I hope the change gets implemented to avoid the upload altogether, if the backing store is incapable of accepting.

Thanks,
Jacob Klein
ID: 85233
Richard Haselgrove
Volunteer tester
Help desk expert

Send message
Joined: 5 Oct 06
Posts: 5128
United Kingdom
Message 85234 - Posted: 15 Mar 2018, 14:59:12 UTC

The particular CPDN project failure which may have prompted this thread has been resolved for the time being, and uploads are being accepted again - although I don't know how quickly they're being accepted, or how big the queued backlog became before it was released. But it has happened to them before, and will likely happen to them again, so we still need to have some way of handling it automatically.
ID: 85234
Jord
Volunteer tester
Help desk expert

Joined: 29 Aug 05
Posts: 15560
Netherlands
Message 85237 - Posted: 15 Mar 2018, 15:42:12 UTC

I'm fairly sure that you can select tasks in the Transfers tab and press the Retry Now button for them. Only the selected tasks will then try the upload.
At least, that's how I got rid of the ones stuck in Seti's upload problem last week.

Not selecting any tasks in the list will retry them all.
ID: 85237
Darmok

Joined: 15 Mar 18
Posts: 12
Canada
Message 85285 - Posted: 17 Mar 2018, 8:54:23 UTC - in response to Message 85237.  
Last modified: 17 Mar 2018, 8:56:19 UTC

Thank you for your replies.
Yes, it is indeed CPDN, which I have been running 24/7 for close to ten years now. I am not the largest volunteer on this project but, now that this occurs every 2-3 months, it must be proportionately more annoying the more devices one has running the project. When each task has several uploads waiting, each between 35 and 70 MB, and they restart several times, that represents a lot of uploads into the void. The Retry button will retry all tasks for the project (unless at some point these are blocked at 100% Progress). And then it will upload a single file to a different server located somewhere else (Oregon in this case) than the downed main server(s). Personally, I think that choosing whether or not to upload each individual file is a minor responsibility for a volunteer to handle, and the feature would likely be used sparingly. But when needed, it would be great to have. Of course, it would be nicer if the project prevented the situation instead of reacting to it.
ID: 85285
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5128
United Kingdom
Message 85286 - Posted: 17 Mar 2018, 10:27:37 UTC - in response to Message 85237.  

I'm fairly sure that you can select tasks in the Transfers tab and press the Retry Now button for them. Only the selected tasks will then try the upload.
At least, that's how I got rid of the ones stuck in Seti's upload problem last week.

Not selecting any tasks in the list will retry them all.
Unfortunately, not quite.

There are two sorts of transfer backoff. One shows an individual delay for a specific file until the next retry; the other is a project-wide backoff which applies to all files from that project. Selecting a single file for retrying will indeed reset the individual delay for that file, but it will also clear the project-wide delay, and other files whose individual delay since the last attempt has expired may be retried first. This means that you can't cherrypick the smallest file for the attempt.

Also, the project-wide backoff isn't reapplied until three attempts have failed. Since the default setting is for two uploads to be tried concurrently, a single click on the 'retry' button will upload four files - up to 280 MB on Darmok's figures - under these conditions.
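
The figure of four files can be checked with a small Python sketch. It assumes, purely for illustration, that uploads are started in batches of two, that every attempt fails only after the whole file has been sent, and that the failure count is checked between batches. The function name and its parameters are hypothetical.

```python
def files_started_before_backoff(file_sizes_mb, concurrent=2, failures_needed=3):
    """Return the sizes of the files that get started before the
    project-wide backoff is reapplied, assuming every attempt fails."""
    started = []
    failures = 0
    queue = list(file_sizes_mb)
    while queue and failures < failures_needed:
        # Start the next batch of concurrent uploads.
        batch, queue = queue[:concurrent], queue[concurrent:]
        started.extend(batch)
        # Every upload in the batch fails after sending its payload.
        failures += len(batch)
    return started
```

Calling `files_started_before_backoff([70] * 6)` returns four entries totalling 280 MB: the first pair produces only two failures, so a second pair is started before the threshold of three is reached.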

This problem has plagued CPDN for many, many years. One of their project moderators - Thyme Lawn - developed code to apply 'suspend network activity' on a project-by-project basis, and took it as far as an experimental client/manager combination, which I tested. This was before the days of separate sub-projects uploading to different servers, so it might not be ideal now: but in any event, the concept was rejected by the core BOINC developers and abandoned.

Oh, and the 'retry now' button is disabled until at least one file has been selected.
ID: 85286
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5128
United Kingdom
Message 85287 - Posted: 17 Mar 2018, 10:42:21 UTC

Researching that old solution, I found http://boinc.berkeley.edu/trac/ticket/133: this problem shouldn't be occurring if it's the actual upload server which is out of disk space. That's why I phrased my upstream report in terms of transfers to a backing store.

The ticket I was actually looking for is http://boinc.berkeley.edu/trac/ticket/139
ID: 85287
Jord
Volunteer tester
Help desk expert

Joined: 29 Aug 05
Posts: 15560
Netherlands
Message 85288 - Posted: 17 Mar 2018, 11:46:04 UTC - in response to Message 85286.  

Oh, and the 'retry now' button is disabled until at least one file has been selected.
Ah OK, you're right.

But I also don't think that the client can be made to switch between upload servers if the storage server is at fault. This sounds more like a server solution where the storage should warn when it's got just 15-20% storage left, so that a human comes to look. Saw Andy's answer on the CPDN list, where he said he wasn't aware of any space issues? That means their storage doesn't warn when it's reaching capacity.
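
A minimal Python sketch of such a warning follows. It uses the standard `shutil.disk_usage` call; the 20% threshold is taken from the range suggested above, while the function names and the idea of checking a locally mounted path are assumptions, since a large network storage facility may need its own monitoring interface.

```python
import shutil


def free_fraction(path):
    """Return the fraction of the filesystem at `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def needs_attention(free_frac, warn_below=0.20):
    """True when free space has dropped below the warning threshold."""
    return free_frac < warn_below
```

Run periodically, for example as `needs_attention(free_fraction("/data"))` with a hypothetical mount point, this would flag the storage while there is still room to act.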
ID: 85288
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5128
United Kingdom
Message 85289 - Posted: 17 Mar 2018, 12:35:03 UTC - in response to Message 85288.  

But I also don't think that the client can be made to switch between upload servers if the storage server is at fault.
Interesting question. The client can certainly switch between *download* servers when the project specifies them in advance: Einstein does that, with

    <download_url>http://einstein2.aei.uni-hannover.de/download/hsgamma_FGRPB1G_1.20_windows_x86_64__FGRPopencl1K-nvidia.exe</download_url>
    <download_url>http://einstein-dl.syr.edu/download/hsgamma_FGRPB1G_1.20_windows_x86_64__FGRPopencl1K-nvidia.exe</download_url>
    <download_url>http://einstein-dl.phys.uwm.edu/download/hsgamma_FGRPB1G_1.20_windows_x86_64__FGRPopencl1K-nvidia.exe</download_url>
- but they only do it with application files which can be mirrored around the world and kept unchanged for a long time and multiple downloads. They don't do it for single-task data files, and I don't know if the client would support it for uploads.
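
For illustration, failover across a list of mirrors like the one above can be sketched in Python as follows. This is not how the BOINC client is implemented: trying the URLs in list order is an assumption, and `fetch` is a hypothetical callable supplied by the caller that returns the file contents or raises `OSError`.

```python
def fetch_from_mirrors(urls, fetch):
    """Try each mirror in turn and return (url, data) from the first
    one that answers. `fetch(url)` must raise OSError on failure."""
    errors = []
    for url in urls:
        try:
            return url, fetch(url)
        except OSError as exc:
            # Remember the failure and move on to the next mirror.
            errors.append((url, exc))
    raise OSError(f"all {len(urls)} mirrors failed: {errors}")
```

The same pattern would only help uploads if the project listed several upload URLs for a file and every one of them could accept it, which is exactly the open question here.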

CPDN does use different upload servers for different research types - Pacific North West study results go directly to Oregon, Australian study results go directly to Australia - but only to a single server in each region. It's not under user control.

This sounds more like a server solution where the storage should warn when it's got just 15-20% storage left, so that a human comes to look. Saw Andy's answer on the CPDN list, where he said he wasn't aware of any space issues? That means their storage doesn't warn when it's reaching capacity.
And did you see David's solution in c0c6cf7 - ah, those long-ago, far-away, innocent days when an upload server could run until it had 1 MB of free disk space? That brings timing into play as well - even if the server has 101 MB free at the start of your 100 MB upload, somebody else might have sneaked in and filled 35 MB of it while you were uploading.
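
One way to take the timing problem out of the picture is to reserve the space when the upload starts, not merely check it. The Python sketch below is only an illustration of that idea and is not what the commit mentioned above does; the class and method names are hypothetical.

```python
import threading


class SpaceLedger:
    """Tracks free space that has not yet been promised to an upload."""

    def __init__(self, free_bytes):
        self._free = free_bytes
        self._lock = threading.Lock()

    def reserve(self, size):
        """Atomically claim `size` bytes; return False if they don't fit."""
        with self._lock:
            if size > self._free:
                return False
            self._free -= size
            return True

    def release(self, size):
        """Give back a reservation, e.g. after a failed or aborted upload."""
        with self._lock:
            self._free += size
```

With the numbers above, a ledger holding 101 MB accepts the reservation for the 100 MB upload and then refuses the competing 35 MB one at the start, so neither transfer runs out of space halfway through.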

With regard to Andy's comment, Sarah came back the next day with "We have cleared off some space and intend to clear more in the coming days." - so it wasn't a false negative. In the meantime, a single user had uploaded 1.7 GB - the current 1 MB test is clearly inadequate in the modern era, and the projects which use huge network storage facilities need to implement some kind of early warning system. Perhaps we can invite Andy and Laurence to compare notes on Tuesday?
ID: 85289
Darmok

Joined: 15 Mar 18
Posts: 12
Canada
Message 85290 - Posted: 17 Mar 2018, 12:53:40 UTC - in response to Message 85286.  
Last modified: 17 Mar 2018, 12:59:07 UTC

"Also, the project-wide backoff isn't reapplied until three attempts have failed. Since the default setting is for two uploads to be tried concurrently, a single click on the 'retry' button will upload four files - up to 280 MB on Darmok's figures - under these conditions."
That is my understanding as well, although the upload was closer to 1 GB each time, with multiple files for each task. My network communication is suspended most of the time, so multiple files for each task accumulate before I connect to upload.

Given the attempt by Thyme Lawn, I'd say the issue has now evolved enough to perhaps develop my suggestion for a side command button, a sort of "Stop Transfer File". It is much less "final" than the Abort command, which deletes the data. It would allow one, if needed, to cherrypick as you say.

The PNW server takes uploads of less than 2 MB each, versus approx. 30-70 MB each for the current UK ones. I do not know why the client selects the large (UK) files first, then goes on to the PNW files.
ID: 85290


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.