Milkyway@Home - Massive Performance Issue - 1 Job Taking 8 CPUs vs 8 Jobs Taking 1 CPU Each

Babynetman

Joined: 11 Oct 22
Posts: 7
Message 110077 - Posted: 11 Oct 2022, 13:18:26 UTC
Last modified: 11 Oct 2022, 13:29:20 UTC

I noticed recently that my per-host average was dropping dramatically, from 8000 to less than 2000. I found that my task queue had Milkyway@Home tasks that were apparently configured to take 8 CPUs each instead of the normal 1 CPU per task, and that the 8 CPU task progressed no faster than a 1 CPU task. I aborted the task and let BOINC fetch a new task list. It downloaded a batch of 1 CPU tasks, and they all started working as normal.

A bit later, I checked on progress and saw a single 8 CPU task running again, with several 1 CPU tasks waiting to run; the 8 CPU task was progressing at the same rate as a 1 CPU task. This time, I suspended the 8 CPU task, let the batch of 1 CPU tasks run, and then resumed the 8 CPU task. They all progressed at the same rate, and all looked fine again. However, after a few minutes, the 8 CPU task was running alone, the 1 CPU tasks were again waiting to run, and the 8 CPU task was still no faster than a 1 CPU task.

If others are experiencing this same issue, processing throughput for Milkyway@Home will be massively impacted...
ID: 110077
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 110078 - Posted: 11 Oct 2022, 13:39:06 UTC - in response to Message 110077.  

If you don't want the multithreaded CPU tasks at Milkyway@Home, go to https://milkyway.cs.rpi.edu/milkyway/prefs.php?subset=project, edit these preferences, uncheck Milkyway@home N-Body Simulation and save preferences. BOINC will then only download and run single threaded tasks for this project.
ID: 110078
Babynetman

Joined: 11 Oct 22
Posts: 7
Message 110087 - Posted: 12 Oct 2022, 13:33:23 UTC - in response to Message 110078.  

Thanks. I made the preference change.

It's not that I don't want them. They are performing approximately 8 times worse than single-threaded tasks. For example, an 8 CPU task is 80.641% done with an estimated 01:34:50 remaining. In contrast, two single CPU tasks are both at 7% done with an estimated 01:38:53 remaining.

I'm using an i7-2600 @ 3.40GHz with 8GB of RAM.
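
For anyone wanting to sanity-check figures like these, a rough projection of total wall-clock runtime can be made from the percent done and the estimated time remaining. This is only a sketch: it assumes progress is linear in time (BOINC's estimates are not guaranteed to be), the function names are made up for illustration, and it compares wall-clock time only, not the amount of science done per task.

```python
def parse_hms(text: str) -> int:
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def projected_total_seconds(fraction_done: float, remaining: str) -> float:
    """Project total wall-clock runtime, ASSUMING progress is linear in time."""
    if not 0.0 <= fraction_done < 1.0:
        raise ValueError("fraction_done must be in [0, 1)")
    return parse_hms(remaining) / (1.0 - fraction_done)


# Figures quoted in the post above.
multi = projected_total_seconds(0.80641, "01:34:50")  # task labeled 8 CPU
single = projected_total_seconds(0.07, "01:38:53")    # one single-CPU task

print(f"multithreaded task: ~{multi / 3600:.1f} h wall-clock")
print(f"single-CPU task:    ~{single / 3600:.1f} h wall-clock")
```

Under that linearity assumption, the multithreaded task projects to roughly 8 hours of wall-clock time and each single-CPU task to a little under 2 hours.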
ID: 110087
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 110088 - Posted: 12 Oct 2022, 15:14:32 UTC - in response to Message 110087.  

It's not an 8 CPU task; the N-Body tasks are multithreaded, so they use as many cores as your CPU has, or as many as you allow BOINC to use.
So, on an AMD Ryzen Threadripper 3990X 64-Core, 128-Thread CPU, they'll take 128 threads if that's the amount you allow BOINC to use.

And despite both simulations being from the same project, I'm not sure you can compare their runtime or science done.
ID: 110088
Babynetman

Joined: 11 Oct 22
Posts: 7
Message 110112 - Posted: 15 Oct 2022, 14:36:12 UTC - in response to Message 110088.  

The task is labeled 8 CPU, but thanks for the clarification.

If the project is using the same number of cores, either 8 x 1 or 1 x 8, the throughput should have remained the same. Judging by elapsed time and estimated time to completion, the task that consumes all 8 threads at once appears to perform only slightly better than a task that takes a single thread. The statistics bear this out: they show a huge drop in processing since the 8 CPU tasks started appearing.

I agree, if the N-Body tasks are completely different from the standard simulation, comparing processing time could be apples to oranges, but the statistics should convey similar values, as the productivity should match. Since they don't, I am just suggesting someone take a look to avoid a huge loss of productivity, or at least the appearance of a loss of productivity.
ID: 110112
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 110114 - Posted: 15 Oct 2022, 15:08:13 UTC - in response to Message 110112.  

The statistics bear this out ...
Which statistic, exactly, are you quoting when making that statement? It may be that you've uncovered a blemish in BOINC's record-keeping.

My experience is that recorded CPU time (or, more accurately, recorded core time) does usually come out as slightly below 8 times elapsed (wall-clock) time - there's a slight loss of efficiency during synchronisation between threads. But until we know exactly which metric you're referring to, we can't check the others.
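
The relationship between core time and elapsed time described above can be expressed as a simple efficiency ratio. The sketch below is illustrative only: the function name and the sample numbers are hypothetical, not taken from any real task.

```python
def thread_efficiency(cpu_seconds: float, elapsed_seconds: float, threads: int) -> float:
    """Fraction of the available core time that a multithreaded task actually used.

    1.0 would mean every thread was busy for the whole elapsed time;
    synchronisation between threads usually pushes the value a little lower.
    """
    if elapsed_seconds <= 0 or threads <= 0:
        raise ValueError("elapsed_seconds and threads must be positive")
    return cpu_seconds / (elapsed_seconds * threads)


# Hypothetical example: 1 hour elapsed on 8 threads, 7.6 core-hours recorded.
efficiency = thread_efficiency(cpu_seconds=27360, elapsed_seconds=3600, threads=8)
print(f"efficiency: {efficiency:.0%}")
```

A value far below 1.0 would suggest the task is not keeping all of its threads busy; a value near 1.0 would suggest the cores are being used and any discrepancy lies elsewhere, such as in how credit is awarded.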
ID: 110114
Dr Who Fan
Joined: 10 May 07
Posts: 1329
United States
Message 110115 - Posted: 15 Oct 2022, 15:12:00 UTC - in response to Message 110112.  

... I am just suggesting someone take a look to avoid a huge loss of productivity, or at least the appearance of a loss of productivity.

Then take your complaint over to the Milkyway Number crunching forums since it is their project and program(s) you feel are not working correctly.

The BOINC program itself does not do any calculations.
ID: 110115
Babynetman

Joined: 11 Oct 22
Posts: 7
Message 110118 - Posted: 16 Oct 2022, 0:10:15 UTC - in response to Message 110114.  

The BOINC Statistics display for host average. The average on the host dropped from just over 7000 per day to well under 2000 in 3 weeks - the graph does not offer enough precision to estimate better than that. Since terminating the 8 CPU tasks the average has climbed over 4500 in just a few days. The only difference is the allocation of the 8 CPU tasks...
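
A gradual drop followed by a gradual recovery is what one would expect if the host average is an exponentially decaying average rather than a plain daily total. The sketch below is a simplified model, not BOINC's actual code: it assumes a one-week half-life and a once-per-day update, and the function name and daily credit values are hypothetical.

```python
HALF_LIFE_DAYS = 7.0  # ASSUMPTION: half-life of the averaging window, in days


def decayed_average(start_avg: float, daily_credit: float, days: int) -> float:
    """Step an exponentially decaying average forward one day at a time."""
    weight = 0.5 ** (1.0 / HALF_LIFE_DAYS)  # share of the old average kept each day
    avg = start_avg
    for _ in range(days):
        avg = avg * weight + daily_credit * (1.0 - weight)
    return avg


# Hypothetical: a host averaging 7000 starts earning only 1000 credits per day.
for day in (0, 7, 14, 21):
    print(day, round(decayed_average(7000, 1000, day)))
```

In this model a sudden change in daily credit takes several weeks to show up fully in the average, so after three weeks a host that dropped from 7000 to 1000 credits per day would still display roughly 1750.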
ID: 110118
Babynetman

Joined: 11 Oct 22
Posts: 7
Message 110119 - Posted: 16 Oct 2022, 0:13:10 UTC - in response to Message 110115.  

I thought this was the Milkyway@Home forum, not the BOINC program forum. Apologies if I'm mistaken.

If this issue is more suited for the other forum, I'm happy to move the concern there...
ID: 110119
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 110120 - Posted: 16 Oct 2022, 8:21:09 UTC - in response to Message 110118.  

The BOINC Statistics display for host average.
By which I assume you mean the graphical display in BOINC Manager.

Many years ago, as an undergraduate at one of the major physics labs in the UK, I was taught two lessons which have stayed with me for over fifty years (the actual physics, sadly, has not).

Lesson 1: Do every calculation twice. Once, using the best technology and the highest precision available (at the time, that was a slide rule). And again, with a pencil on the back of an envelope, to 'order of magnitude' - nearest power of 10 - precision only. That checks that you put the decimal point in the right place in the first answer.

Lesson 2: A number is meaningless unless you state the units it's measured in.

In this case, your numbers are 7000, 2000, 4500. Those are measured in "BOINC credits". They should be equivalent to "cobblestones", which is a defined number in terms of the number of calculations performed in reaching the scientific answer. Many of us wish that this was still the case, so that we can do the sort of comparisons that you are attempting.

But unfortunately, and disappointingly, the direct link between 'work done' and 'credits awarded' was broken over 10 years ago. Each project is free to choose its own credit reward rate, and as you have found, they don't all keep that consistent, even between the different task types within their own project.

As things stand at the moment, you have two options. Either take a deep breath, relax, and stop worrying about it. Or take it up with the research/administration team at Milkyway@Home: they are in control of their own credit rewards, and have the power to change it. But it may be low on their list of priorities.
ID: 110120
Bill Freauff
Joined: 26 Mar 11
Posts: 175
United States
Message 110166 - Posted: 21 Oct 2022, 22:07:52 UTC - in response to Message 110120.  

Very well said....
ID: 110166
Babynetman

Joined: 11 Oct 22
Posts: 7
Message 110167 - Posted: 21 Oct 2022, 22:19:29 UTC - in response to Message 110120.  

I'm not worrying about it. I simply thought something might be wrong and, if so, someone might care.

If there's nothing wrong, there's nothing wrong, and if no one cares, no one cares.

"No good deed goes unpunished..."
ID: 110167


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.