Message boards : BOINC client : Multi core tasks alongside single core tasks.
Author | Message |
---|---|
Joined: 28 Jun 10 Posts: 2691 |
Running one old resend from CPDN. I decided to allow some Amicable Numbers tasks to also run. I downloaded a few, one of which started on 7 cores. I increased the percentage of cores to be used to 50 to allow the CPDN task to keep going. It wouldn't start till I had paused all the AN tasks; then I was able to un-pause them and it continued to run. 1. Is this a known behaviour? 2. Is there an option to limit the number of cores multi core tasks will use? Edit: Even after I increased the number of cores available to BOINC to 60%, after a minute or two of the AN 7 core task running alongside the CPDN task, the CPDN task stops and shows as "Waiting to run." While waiting to hear views on this, I will nip over to GitHub and see if there is a bug filed for this. As the CPDN task is a priority for me, I have aborted the AN tasks, as they wouldn't restart before the deadline if paused to let the CPDN one complete. Edit 2: Issue opened on GitHub. #5254 |
Joined: 28 Jun 10 Posts: 2691 |
AenBleidd commented May 27, 2023 The bit I don't understand is that if I increase the cores available after the multicore task has started, the CPDN one still stops within two minutes after an Amicable Numbers one is restarted. |
Joined: 5 Oct 06 Posts: 5128 |
I got involved at GitHub because Vitalii's response includes a suggested resolution from me, and we need to do some proper testing on the latest code (Dave's running v7.23.0 self-build from master). I've joined Amicable, and found I could set the number of CPUs to use in Project Preferences, as well as the workaround I suggested. I chose 3, and got a task allocated with: <avg_ncpus>3.000000</avg_ncpus> <plan_class>mt</plan_class> <cmdline>--nthreads 3</cmdline> so that bit's working as documented. I'll monitor what it gets up to when running. But this is with a pretty well-tested v7.20.5 from the PPA. Looks like I might have to do some building, too. Edit: And now my 6-core machine is running 3 cores for AM, 1 single-core CPU task, and 2 GPU tasks with a full core assigned to each (ugh - OpenCL). That's what I expect and want, but it took a while for BOINC to respond and pause the other two single-core tasks, so for a while I was running 8 cores. |
Joined: 28 Jun 10 Posts: 2691 |
First test: if I cut down the number of CPUs BOINC can use, the CPDN single core task keeps running. When the downloaded tasks, which are now set at 4 threads, are finished, I will increase to 5 and try again. So this seems like another anomaly: the AN task runs using 4 cores and the CPDN one uses one, making 5 in total, so 5/16 rather than 1/4, which is what I set in BOINC. I have not currently got an app_config for any projects. |
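The arithmetic in this post can be checked with a short sketch. It assumes the client turns the "use at most N% of the CPUs" preference into a whole number of CPUs by rounding down, with a minimum of one. That rounding rule is an assumption (it is consistent with the 6 CPUs logged at 40% on this 16-thread host further down the thread), and `usable_cpus` is a made-up helper name, not a BOINC function.

```python
import math

def usable_cpus(total_cpus: int, max_pct: float) -> int:
    """Hypothetical helper: CPUs allowed by a 'use at most N% of CPUs' setting.

    Assumes round-down with a floor of one CPU (an assumption, not taken
    from the BOINC source).
    """
    return max(1, math.floor(total_cpus * max_pct / 100))

total = 16                         # logical CPUs on the host in this post
allowed = usable_cpus(total, 25)   # the 1/4 setting
in_use = 4 + 1                     # one 4-thread AN task plus one CPDN task

print(f"allowed={allowed}, in use={in_use}, oversubscribed={in_use > allowed}")
```

Under that assumption the 25% setting allows 4 CPUs, so the 4-thread task plus the single-core task is one CPU over the limit, which is the anomaly described.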
Joined: 5 Oct 06 Posts: 5128 |
OK, I've pulled down the artifacts for #5251 - 2 days old, so probably pretty close to yours. I'll take them for a spin tomorrow. |
Joined: 28 Jun 10 Posts: 2691 |
OK, I've pulled down the artifacts for #5251 - 2 days old, so probably pretty close to yours. I'll take them for a spin tomorrow. 14th May was the date for my master. |
Joined: 5 Oct 06 Posts: 5128 |
v7.22/23 is all about a whole new set of global preferences, separate ones for 'not in use'. #5251 was a late afterthought by David, who thought a late and undocumented change - which he couldn't remember why he put in - might lead to "Otherwise BOINC will stop computing after 60 minutes of idleness." That would be stopping everything, not just one project, but it's not a million miles from the question you're asking. That's why I want to go on poking and prodding until we understand exactly what's going on, and why. |
Joined: 28 Jun 10 Posts: 2691 |
That would be stopping everything, not just one project, but it's not a million miles from the question you're asking. That's why I want to go on poking and prodding until we understand exactly what's going on, and why. I shall try and get a more precise summary of the issue here. Most of the time I don't do the sort of thing that has alerted me to the issue, so it's not a big issue for me, but I can see it might be for some, especially if they want their system to crunch 24/7 without any intervention. I look at what is happening on this machine at least two or three times most days, so I tend to notice quite quickly if something is not behaving as expected. |
Joined: 28 Jun 10 Posts: 2691 |
May have something. Increased %cpus to 40. Downloaded a bunch of AN tasks. Got rid of the nvidia ones that always crash on my machine and that would run alongside the multicore tasks till they crashed. CPDN task didn't start. Increased cpu% to 50%, i.e. 8 cores. Still didn't start. Extract from event log. Sun 28 May 2023 07:53:29 BST | | [cpu_sched_debug] Request CPU reschedule: periodic CPU scheduling Sun 28 May 2023 07:53:29 BST | | [cpu_sched_debug] schedule_cpus(): start Sun 28 May 2023 07:53:29 BST | climateprediction.net | [cpu_sched_debug] add to run list: hadam4h_a05m_200011_5_932_012141214_1 (CPU, FIFO) (prio -0.175154) Sun 28 May 2023 07:53:29 BST | | [cpu_sched_debug] enforce_run_list(): start Sun 28 May 2023 07:53:29 BST | | [cpu_sched_debug] preliminary job list: Sun 28 May 2023 07:53:29 BST | climateprediction.net | [cpu_sched_debug] 0: hadam4h_a05m_200011_5_932_012141214_1 (MD: no; UTS: no) Sun 28 May 2023 07:53:29 BST | | [cpu_sched_debug] final job list: Sun 28 May 2023 07:53:29 BST | climateprediction.net | [cpu_sched_debug] 1: hadam4h_a05m_200011_5_932_012141214_1 (MD: no; UTS: no) Sun 28 May 2023 07:53:29 BST | climateprediction.net | [cpu_sched_debug] all CPUs used (6.00 >= 6), skipping hadam4h_a05m_200011_5_932_012141214_1 Sun 28 May 2023 07:53:29 BST | | [cpu_sched_debug] enforce_run_list: end It seems like the client hasn't increased the number of CPUs in use. Sun 28 May 2023 07:53:29 BST | climateprediction.net | [cpu_sched_debug] all CPUs used (6.00 >= 6), skipping hadam4h_a05m_200011_5_932_012141214_1 Is the line that may be a smoking gun? I aborted the AN tasks in the queue to see if it was to do with task priority, but again, having stopped the Amicable task, the CPDN one restarted but stopped again once the AN started. There is a risk that I might lose the CPDN task, but given it is almost a year old and so probably not going to be used, I was wondering about stopping the client and restarting it to see if that made any difference? 
But wondered if there was anything else I should look at first? |
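For anyone wanting to pull these lines out of a long event log, here is a minimal sketch. The regular expression is written against the exact message text quoted in this post; `find_skips` is an illustrative name, and the pattern is an assumption about the message format rather than anything taken from the client source.

```python
import re

# Matches the "[cpu_sched_debug] all CPUs used (X >= N), skipping TASK" message
# as it appears in the extract above.
SKIP_RE = re.compile(
    r"\[cpu_sched_debug\] all CPUs used \((?P<used>[\d.]+) >= (?P<limit>\d+)\), "
    r"skipping (?P<task>\S+)"
)

def find_skips(log_text: str):
    """Return (used, limit, task) for every 'all CPUs used' message found."""
    return [
        (float(m["used"]), int(m["limit"]), m["task"])
        for m in SKIP_RE.finditer(log_text)
    ]

sample = (
    "Sun 28 May 2023 07:53:29 BST | climateprediction.net | [cpu_sched_debug] "
    "all CPUs used (6.00 >= 6), skipping hadam4h_a05m_200011_5_932_012141214_1"
)

for used, limit, task in find_skips(sample):
    print(f"{task}: {used:.2f} CPUs committed against a limit of {limit}")
```

The limit in the second group is the number to watch: here it is 6, which corresponds to the 40% setting, not the 8 expected after raising the preference to 50%.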
Joined: 5 Oct 06 Posts: 5128 |
Can you show us the bit of the log that says what those 6 cpus are busy with, please? Is that your current Amicable ncpus setting? |
Joined: 28 Jun 10 Posts: 2691 |
Can you show us the bit of the log that says what those 6 cpus are busy with, please? Is that your current Amicable ncpus setting? Will scroll back and find it in a minute. First, Sun 28 May 2023 09:08:30 BST | | [cpu_sched_debug] final job list: Sun 28 May 2023 09:08:30 BST | climateprediction.net | [cpu_sched_debug] 0: hadam4h_a05m_200011_5_932_012141214_1 (MD: no; UTS: yes) Sun 28 May 2023 09:08:30 BST | climateprediction.net | [cpu_sched_debug] scheduling hadam4h_a05m_200011_5_932_012141214_1 Sun 28 May 2023 09:08:30 BST | | [cpu_sched_debug] using 1.00 out of 6 CPUs Sun 28 May 2023 09:08:30 BST | climateprediction.net | [css] running hadam4h_a05m_200011_5_932_012141214_1 ( ) Sun 28 May 2023 09:08:30 BST | | [cpu_sched_debug] enforce_run_list: end Now that the Amicable Numbers task is finished, as you can see, the event log is showing Sun 28 May 2023 09:08:30 BST | | [cpu_sched_debug] using 1.00 out of 6 CPUs despite the manager showing 8 as being in use. |
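The mismatch described here (the log says 6, the Manager says 8) can be spotted mechanically. This is only a sketch: the pattern is written against the "using X out of N CPUs" text quoted above, `limit_mismatches` is a hypothetical helper, and the expected value of 8 is simply the figure reported in the post.

```python
import re

# Matches the "[cpu_sched_debug] using X out of N CPUs" message quoted above.
USING_RE = re.compile(r"using ([\d.]+) out of (\d+) CPUs")

def limit_mismatches(log_text: str, expected_limit: int):
    """Return (used, logged_limit) pairs where the logged limit differs
    from the limit the user expects from their preferences."""
    found = []
    for used, limit in USING_RE.findall(log_text):
        if int(limit) != expected_limit:
            found.append((float(used), int(limit)))
    return found

sample = "Sun 28 May 2023 09:08:30 BST | | [cpu_sched_debug] using 1.00 out of 6 CPUs"
expected = 8  # what the Manager showed after the preference was raised to 50%

print(limit_mismatches(sample, expected))
```

An empty result would mean the scheduler and the preference agree; any entry is a scheduling pass that still ran with the old limit.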
Joined: 28 Jun 10 Posts: 2691 |
Sun 28 May 2023 08:54:17 BST | | [cpu_sched_debug] final job list: |
Joined: 5 Oct 06 Posts: 5128 |
That really does look like a problem and a smoking gun. I've pulled down a 2-week old artifact as well, and I'll deploy that when my caffeine levels have reached optimum - see if I can reproduce it. Then, it's probably off to the simulator - that will be David's first response. I did get a change made to MT handling (#4992) for CPDN/IFS, but that should have been for the server only - and it looks like it is (sched/sched_send.cpp is a server file). The conversation in that PR rather tails off, but I think we got it tested in the end, thanks to LHC. |
Joined: 28 Jun 10 Posts: 2691 |
I think I am going to need to go back to school, as it were, and learn some programming in whatever version of C BOINC uses. My only formal learning was in ALGOL, which I haven't seen any evidence of being used for at least 30 years. |
Joined: 5 Oct 06 Posts: 5128 |
Likewise! I learned Algol 60 at the back of my mother's classes in the school holidays, and we used Algol W as the main language on my diploma course. Algol W was a Stanford University product, so Berkeley probably doesn't recognise it. |
Joined: 28 Jun 10 Posts: 2691 |
Likewise! I learned Algol 60 at the back of my mother's classes in the school holidays, and we used Algol W as the main language on my diploma course. Algol W was a Stanford University product, so Berkeley probably doesn't recognise it. I can't even remember the differences between 60 and W now. W I spent a couple of terms doing while still at school, 60 I did some of at St. Andrew's Uni. before going off in a different direction and working in child and adolescent mental health nursing for 25 years. |
Joined: 5 Oct 06 Posts: 5128 |
OK, back to business - prepping up. This is a 6-core i5-9600KF, which I think is capable of hyperthreading to 12 cores, but I have that turned off in hardware. So, first prep job is to turn that down in preferences, so I can turn it back up later. Sun 28 May 2023 10:31:18 BST | | Number of usable CPUs has changed from 6 to 5. Sun 28 May 2023 10:31:18 BST | | max CPUs used: 5 Sun 28 May 2023 10:31:19 BST | NumberFields@home | [cpu_sched] Preempting wu_sf3_DS-16x271-21_Grp890573of1000000_0 (left in memory) Note that I have activity set to 'run always', not according to preferences, but that preference is acted on anyway. With the 2-week old artifact, Sun 28 May 2023 10:48:22 BST | | Starting BOINC client version 7.23.0 for x86_64-pc-linux-gnu Sun 28 May 2023 10:48:22 BST | | This a development version of BOINC and may not function properly Sun 28 May 2023 10:48:23 BST | | - max CPUs used: 5 Now off to find an MT task... Got a couple. Now waiting for something to finish so MT starts up. |
Joined: 5 Oct 06 Posts: 5128 |
I started my course in 1973, after this happened: 1969 Move to new building on an adjacent site. Titan airlifted by crane ('the computing service is suspended'). (Quotes from the official history. Upgrades, eh?) According to Wikipedia, Algol W was a key language on the IBM 360 range, and I presume 370 as well. That's the one I used. |
Joined: 5 Oct 06 Posts: 5128 |
Back to business. Sun 28 May 2023 11:09:50 BST | NumberFields@home | Computation for task wu_sf3_DS-16x271-21_Grp890573of1000000_0 finished Sun 28 May 2023 11:09:50 BST | Amicable Numbers | [cpu_sched] Starting task amicable_10_21_2426_1685252702.324398_984_1 using amicable_10_21 version 300 (mt) in slot 2 Sun 28 May 2023 11:10:50 BST | NumberFields@home | [cpu_sched] Preempting wu_sf3_DS-16x271-21_Grp883275of1000000_0 (left in memory) Sun 28 May 2023 11:10:50 BST | NumberFields@home | [cpu_sched] Preempting wu_sf3_DS-16x271-21_Grp881899of1000000_0 (left in memory) Amicable is running on 3 cores, as yesterday. Note the one-minute time delay before pausing the two other tasks. And this is definitely bonkers, and a bug. Sun 28 May 2023 11:20:24 BST | | Number of usable CPUs has changed from 5 to 6. Sun 28 May 2023 11:20:25 BST | Amicable Numbers | Starting task amicable_10_21_2426_1685252702.324398_998_1 Sun 28 May 2023 11:20:25 BST | Amicable Numbers | [cpu_sched] Starting task amicable_10_21_2426_1685252702.324398_998_1 using amicable_10_21 version 300 (mt) in slot 5 I'm now running on 8 cores - 2x3 for Amicable, 1 each supporting the 2 GPUs. cpu_sched_debug... And here's the evidence. 
Sun 28 May 2023 11:25:55 BST | NumberFields@home | [cpu_sched_debug] all CPUs used (8.00 >= 6), skipping wu_sf3_DS-16x271-21_Grp883275of1000000_0 Sun 28 May 2023 11:25:56 BST | | [cpu_sched_debug] Request CPU reschedule: application exited Sun 28 May 2023 11:25:56 BST | | [cpu_sched_debug] final job list: Sun 28 May 2023 11:25:56 BST | Einstein@Home | [cpu_sched_debug] 0: LATeah4021L07_940.0_0_0.0_34824363_2 (MD: no; UTS: yes) Sun 28 May 2023 11:25:56 BST | Einstein@Home | [cpu_sched_debug] 1: LATeah4021L08_1116.0_0_0.0_6637911_0 (MD: no; UTS: no) Sun 28 May 2023 11:25:56 BST | Amicable Numbers | [cpu_sched_debug] 2: amicable_10_21_2426_1685252702.324398_984_1 (MD: no; UTS: yes) Sun 28 May 2023 11:25:56 BST | Amicable Numbers | [cpu_sched_debug] 3: amicable_10_21_2426_1685252702.324398_998_1 (MD: no; UTS: yes) Sun 28 May 2023 11:25:56 BST | NumberFields@home | [cpu_sched_debug] 4: wu_sf3_DS-16x271-21_Grp883275of1000000_0 (MD: no; UTS: no) Sun 28 May 2023 11:25:56 BST | NumberFields@home | [cpu_sched_debug] 5: wu_sf3_DS-16x271-21_Grp881899of1000000_0 (MD: no; UTS: no) Sun 28 May 2023 11:25:56 BST | NumberFields@home | [cpu_sched_debug] 6: wu_sf3_DS-16x271-21_Grp890082of1000000_0 (MD: no; UTS: no) Sun 28 May 2023 11:25:56 BST | NumberFields@home | [cpu_sched_debug] 7: wu_sf3_DS-16x271-21_Grp889806of1000000_0 (MD: no; UTS: no) Sun 28 May 2023 11:25:56 BST | NumberFields@home | [cpu_sched_debug] 8: wu_sf3_DS-16x271-21_Grp890572of1000000_0 (MD: no; UTS: no) Sun 28 May 2023 11:25:56 BST | NumberFields@home | [cpu_sched_debug] 9: wu_sf3_DS-16x271-21_Grp898646of1000000_0 (MD: no; UTS: no) Sun 28 May 2023 11:25:56 BST | Einstein@Home | [cpu_sched_debug] scheduling LATeah4021L07_940.0_0_0.0_34824363_2 Sun 28 May 2023 11:25:56 BST | Einstein@Home | [cpu_sched_debug] scheduling LATeah4021L08_1116.0_0_0.0_6637911_0 Sun 28 May 2023 11:25:56 BST | Amicable Numbers | [cpu_sched_debug] scheduling amicable_10_21_2426_1685252702.324398_984_1 Sun 28 May 2023 11:25:56 BST | Amicable 
Numbers | [cpu_sched_debug] scheduling amicable_10_21_2426_1685252702.324398_998_1 Sun 28 May 2023 11:25:56 BST | NumberFields@home | [cpu_sched_debug] all CPUs used (8.00 >= 6), skipping wu_sf3_DS-16x271-21_Grp883275of1000000_0 Sun 28 May 2023 11:25:56 BST | Einstein@Home | Starting task LATeah4021L08_1116.0_0_0.0_6637911_0 Sun 28 May 2023 11:25:56 BST | Einstein@Home | [cpu_sched] Starting task LATeah4021L08_1116.0_0_0.0_6637911_0 using hsgamma_FGRPB1G version 128 (FGRPopencl2Pup-nvidia) in slot 1 Sun 28 May 2023 11:25:56 BST | | [cpu_sched_debug] enforce_run_list: end |
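The "8.00 >= 6" figure in this log can be reproduced by adding up the CPU commitments described in the post: two 3-thread Amicable tasks and two GPU tasks holding one core each. A minimal sketch follows. `RunningTask` and `committed_cpus` are illustrative names, and the per-task CPU figures come from the description in the post, not from the client itself.

```python
from dataclasses import dataclass

@dataclass
class RunningTask:
    name: str
    cpus: float  # CPUs the task is counted as using (assumed from the post)

def committed_cpus(tasks):
    """Total CPUs counted against the limit by the listed tasks."""
    return sum(t.cpus for t in tasks)

running = [
    RunningTask("LATeah4021L07_940.0_0_0.0_34824363_2", 1.0),         # GPU task, 1 core
    RunningTask("LATeah4021L08_1116.0_0_0.0_6637911_0", 1.0),         # GPU task, 1 core
    RunningTask("amicable_10_21_2426_1685252702.324398_984_1", 3.0),  # mt, 3 threads
    RunningTask("amicable_10_21_2426_1685252702.324398_998_1", 3.0),  # mt, 3 threads
]

limit = 6  # "Number of usable CPUs has changed from 5 to 6"
total = committed_cpus(running)
excess = total - limit

print(f"committed={total:.2f}, limit={limit}, over by {excess:.2f}")
```

The sum is two CPUs over the limit, which matches the second Amicable task having been started when only one CPU had been freed.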
Joined: 28 Jun 10 Posts: 2691 |
Elliott 22,000 here, though I couldn't find it on the Wikipedia page for Elliott computers. |
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.