Thread 'Can close-coupled workflow type application be deployed on a BOINC server?'

Wei Hao

Joined: 13 Dec 13
Posts: 21
United States
Message 52060 - Posted: 21 Jan 2014, 15:19:32 UTC

We have an earth science workflow that can be divided into around ten sections. The sections are interdependent: for instance, Section C's input is the output of Sections A and B. Each section takes ~10 to ~90 minutes to process on a regular workstation. Would this kind of project be efficient, or even possible, to deploy on a BOINC server? As I understand it, BOINC is most useful for loosely coupled computing tasks, i.e., the computing units should be largely independent of each other and each unit should be very lightweight.
ID: 52060
ChristianB
Volunteer developer
Volunteer tester

Joined: 4 Jul 12
Posts: 321
Germany
Message 52079 - Posted: 22 Jan 2014, 7:24:47 UTC

This is totally what BOINC can do. Each of your sections would be a separate BOINC application. You start by generating work for Sections A and B (which are independent) and then generate work for C using the output from A and B. For each application you have to write a work generator, a validator and an assimilator. The assimilators of Applications A and B would put the output data someplace where the work generator of Application C can pick it up.

A computer scientist from St. Jude Children's Research Hospital (Memphis, Tennessee) built a Python-based workflow tool for this use case. I don't have a contact, so you should ask on the boinc_projects mailing list whether he is still around.
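The ordering described in this post (A and B first, C once both have been assimilated) can be sketched as a small readiness check. This is only an illustration: the stage names, the `DEPENDENCIES` table and the `ready_stages` function are hypothetical and are not part of any BOINC API. A real work generator would create BOINC workunits at the point where this sketch just returns stage names.

```python
# Hypothetical dependency table: each stage lists the stages whose output it needs.
DEPENDENCIES = {
    "A": [],
    "B": [],
    "C": ["A", "B"],
}


def ready_stages(dependencies, finished, submitted):
    """Return the stages that can be submitted now.

    A stage is ready when all of its inputs are finished (assimilated)
    and it has not already been submitted.
    """
    return sorted(
        stage
        for stage, needs in dependencies.items()
        if stage not in submitted and all(n in finished for n in needs)
    )
```

With nothing finished, only A and B are ready. C becomes ready only after both A and B are in the finished set, which matches the scheme of assimilators handing output to the next stage's work generator.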
ID: 52079
Wei Hao

Joined: 13 Dec 13
Posts: 21
United States
Message 52085 - Posted: 22 Jan 2014, 14:19:34 UTC - in response to Message 52079.  

Cool. Thanks!
ID: 52085
Wei Hao

Joined: 13 Dec 13
Posts: 21
United States
Message 52086 - Posted: 22 Jan 2014, 14:35:01 UTC - in response to Message 52079.  

Hi Christian, I am pretty new to BOINC. Thank you for your feedback. In our case, we have large data sets for each section of the workflow. For instance, a single output file of Section A could be as large as 3 GB, and the processing time could be more than an hour. We cannot split each section into smaller pieces since we don't have the source code. Do you have any further suggestions for deploying the project more efficiently?
ID: 52086
ChristianB
Volunteer developer
Volunteer tester

Joined: 4 Jul 12
Posts: 321
Germany
Message 52088 - Posted: 22 Jan 2014, 15:22:55 UTC

What do you mean by "as large as"? Is it normally smaller? Is this compressed, or could you reduce the size with compression? A 3 GB result file is very bad for volunteer computing, as the volunteer might not have a good upload connection. So this seems to be unsuitable for volunteer computing. Maybe you have to compute this step on your own and only parallelize the other sections.

Volunteer computing is only efficient when you can split the work into small parts. Big input files can also be handled, but not big result files.
ID: 52088
Wei Hao

Joined: 13 Dec 13
Posts: 21
United States
Message 52089 - Posted: 22 Jan 2014, 15:44:39 UTC - in response to Message 52088.  

Thanks, Christian. By 'as large as', I mean that a typical output file is ~3 GB. As you said, that is not good for the average volunteer computing environment.
ID: 52089
Juha
Volunteer developer
Volunteer tester
Help desk expert

Joined: 20 Nov 12
Posts: 801
Finland
Message 52125 - Posted: 24 Jan 2014, 18:28:26 UTC

Do you need the output from each part or just the last one? If just the last one, couldn't you send one large task that consists of all ten parts? This way the large output files would be just temporary files that the host can remove after the task is completed.

If each part takes about an hour (or two or ten...), the total runtime would still be perfectly reasonable.
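A minimal sketch of this idea, assuming the parts can be driven from a single script: every part writes into one scratch directory, and only the final result is copied out before the scratch directory is deleted. The names `run_pipeline`, `stage_a`, `stage_b` and `stage_c` are hypothetical stand-ins. The real sections are closed-source programs, so in practice each stand-in would launch an external executable instead of writing a short string.

```python
import shutil
import tempfile
from pathlib import Path


def run_pipeline(stages, result_name, final_output):
    """Run all stages in order in a scratch directory and keep only one file."""
    with tempfile.TemporaryDirectory() as scratch:
        scratch_dir = Path(scratch)
        for stage in stages:
            # Each stage reads and writes files inside the scratch directory.
            stage(scratch_dir)
        # Only the final result leaves the scratch directory.
        shutil.copy(scratch_dir / result_name, final_output)
    # The scratch directory, including all intermediate files, is gone here.
    return Path(final_output)


# Hypothetical stand-ins for the real sections of the workflow.
def stage_a(workdir):
    (workdir / "a.out").write_text("A")


def stage_b(workdir):
    (workdir / "b.out").write_text("B")


def stage_c(workdir):
    # Section C consumes the output of Sections A and B.
    combined = (workdir / "a.out").read_text() + (workdir / "b.out").read_text()
    (workdir / "c.out").write_text(combined)
```

Calling `run_pipeline([stage_a, stage_b, stage_c], "c.out", "final.out")` leaves only `final.out` behind, so the large intermediate files never have to be uploaded.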
ID: 52125
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5129
United Kingdom
Message 52129 - Posted: 24 Jan 2014, 19:56:44 UTC

I think the biggest problem is those 3 GB upload files. This sort of project might work well in a closed community like a university campus, where you can rely on all users having high-speed (ideally gigabit) bi-directional Ethernet connections. Some city-states with high fibre-optic penetration might also be a possibility.

But the general volunteer community around the world would struggle with limited upload speeds. I suppose one extra question that needs to be asked is - roughly how many of these multipart jobs are you intending to process (and how quickly)? Do you think recruitment from a closed community pool could supply you with enough participants?
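To put rough numbers on the upload concern, the sketch below converts a file size and an upload speed into transfer time. The 1 Mbit/s home-connection figure is an illustrative assumption, not a measurement, and the calculation ignores protocol overhead and retries. `upload_hours` is a hypothetical helper name.

```python
def upload_hours(size_gb, upload_mbit_per_s):
    """Idealized upload time in hours (decimal units: 1 GB = 1e9 bytes)."""
    bits = size_gb * 1e9 * 8
    seconds = bits / (upload_mbit_per_s * 1e6)
    return seconds / 3600


# Assumed 1 Mbit/s home upload link: 24000 s, about 6.7 hours per 3 GB file.
home = upload_hours(3, 1)

# Gigabit campus link: 24 s per 3 GB file.
campus = upload_hours(3, 1000)
```

Under these assumptions a single 3 GB result ties up an assumed home connection for several hours, while the same file takes well under a minute on a gigabit LAN.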
ID: 52129
Wei Hao

Joined: 13 Dec 13
Posts: 21
United States
Message 52630 - Posted: 18 Feb 2014, 14:40:49 UTC - in response to Message 52129.  
Last modified: 18 Feb 2014, 14:41:06 UTC

If all the clients are on the same LAN, then it may not be such a big problem. Thanks for sharing your ideas!
ID: 52630


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.