Thread 'Anything and Everything to do with (WCG) World Community Grid'

Grumpy Swede
Joined: 30 Mar 20
Posts: 613
Sweden
Message 117072 - Posted: 13 Oct 2025, 14:37:31 UTC
Last modified: 13 Oct 2025, 15:11:18 UTC

Nothing will happen today. It's the Thanksgiving holiday in Canada this Monday.

Edit, added: Igor deleted the 4 annoying spam messages from "almogu" in the WCG forum, even though it's a holiday Monday in Canada.
ID: 117072 · Report as offensive     Reply Quote
PMH_UK

Joined: 24 Dec 10
Posts: 83
United Kingdom
Message 117075 - Posted: 13 Oct 2025, 21:57:48 UTC - in response to Message 117072.  

October 13, 2025

Happy Thanksgiving to our Canadian volunteers and partners.
Work on finishing deployment setup will resume tomorrow.
Paul.
ID: 117075 · Report as offensive     Reply Quote
Grumpy Swede
Joined: 30 Mar 20
Posts: 613
Sweden
Message 117077 - Posted: 14 Oct 2025, 0:36:24 UTC - in response to Message 117075.  
Last modified: 14 Oct 2025, 0:37:05 UTC

In reply to PMH_UK's message of 13 Oct 2025:
October 13, 2025

Happy Thanksgiving to our Canadian volunteers and partners.
Work on finishing deployment setup will resume tomorrow.
I didn't realize that was a new update from WCG, until I read their Operational Status page.
I thought it was something from you PMH_UK :-)
ID: 117077 · Report as offensive     Reply Quote
jay_e
Joined: 8 Mar 07
Posts: 125
United States
Message 117086 - Posted: 15 Oct 2025, 2:31:45 UTC
Last modified: 15 Oct 2025, 2:32:14 UTC

greetings!!
-- A status --
It's 10:26PM EDT on Tuesday, Oct 14.
I tried, but no WU available for AMD/Intel 64...
my BoincManager event log:
Tue 14 Oct 2025 10:16:59 PM EDT | World Community Grid | Scheduler request completed: got 0 new tasks
Tue 14 Oct 2025 10:16:59 PM EDT | World Community Grid | No tasks sent
Tue 14 Oct 2025 10:16:59 PM EDT | World Community Grid | No tasks are available for OpenPandemics - COVID 19
Tue 14 Oct 2025 10:16:59 PM EDT | World Community Grid | No tasks are available for OpenPandemics - COVID-19 - GPU
Tue 14 Oct 2025 10:16:59 PM EDT | World Community Grid | No tasks are available for Africa Rainfall Project
Tue 14 Oct 2025 10:16:59 PM EDT | World Community Grid | No tasks are available for Mapping Cancer Markers
Tue 14 Oct 2025 10:16:59 PM EDT | World Community Grid | Tasks for NVIDIA GPU are available, but your preferences are set to not accept them
Tue 14 Oct 2025 10:16:59 PM EDT | World Community Grid | Tasks for AMD/ATI GPU are available, but your preferences are set to not accept them
Tue 14 Oct 2025 10:16:59 PM EDT | World Community Grid | Tasks for Intel GPU are available, but your preferences are set to not accept them

Maybe other WU are/were available.
:-)
Jay
ID: 117086 · Report as offensive     Reply Quote
Dr Who Fan
Joined: 10 May 07
Posts: 1704
United States
Message 117087 - Posted: 15 Oct 2025, 5:27:18 UTC - in response to Message 117086.  

greetings!!
-- A status --
It's 10:26PM EDT on Tuesday, Oct 14.
I tried, but no WU available for AMD/Intel 64...
my BoincManager event log:
...
Maybe other WU are/were available.
:-)
Jay

Nothing is available yet, as the BOINC backend is still not up and running as it should be.

Maybe in the year 2525 they will have all the current set of bugs fixed.
ID: 117087 · Report as offensive     Reply Quote
Grumpy Swede
Joined: 30 Mar 20
Posts: 613
Sweden
Message 117091 - Posted: 16 Oct 2025, 1:56:13 UTC

New update from the WCG team:

October 15, 2025
Testing the validators right now, been a lot of iterations on these.

As soon as the validator works, we will deploy across the six partitions and clear the backlog.
Then we can check the transitioner interaction. If that is all good, we can finally start sending new work.

Going to finalize object storage for the archive - instead of previous tape backup.
ID: 117091 · Report as offensive     Reply Quote
Dave
Help desk expert

Joined: 28 Jun 10
Posts: 3061
United Kingdom
Message 117099 - Posted: 17 Oct 2025, 15:54:54 UTC

Weekend coming up. Guessing not this week.
ID: 117099 · Report as offensive     Reply Quote
Grumpy Swede
Joined: 30 Mar 20
Posts: 613
Sweden
Message 117101 - Posted: 17 Oct 2025, 21:43:12 UTC - in response to Message 117099.  

In reply to Dave's message of 17 Oct 2025:
Weekend coming up. Guessing not this week.
Probably not. Although, we do know that part of the team (Dylan and Igor) has been working even on weekends.
ID: 117101 · Report as offensive     Reply Quote
Grumpy Swede
Joined: 30 Mar 20
Posts: 613
Sweden
Message 117106 - Posted: 18 Oct 2025, 23:57:26 UTC
Last modified: 19 Oct 2025, 0:19:23 UTC

Something is slowing the WCG website down drastically right now. Let's wait and see what that means.

Edit: Going to "System Error" now. My guess is that they finally released the validator Kraken, on the "Pending Validation" tasks. We'll see....
ID: 117106 · Report as offensive     Reply Quote
Bryn Mawr
Help desk expert

Joined: 31 Dec 18
Posts: 331
United Kingdom
Message 117107 - Posted: 19 Oct 2025, 0:31:40 UTC - in response to Message 117106.  

Service Unavailable from this end.
ID: 117107 · Report as offensive     Reply Quote
Jean-David

Joined: 19 Dec 05
Posts: 118
United States
Message 117108 - Posted: 19 Oct 2025, 0:46:01 UTC - in response to Message 117107.  

Down for Everyone or Just Me
Is Worldcommunitygrid.org down?
It's not just you! worldcommunitygrid.org is down.

Last updated: Oct 18, 2025, 8:44 PM (1 second ago)

ID: 117108 · Report as offensive     Reply Quote
Grumpy Swede
Joined: 30 Mar 20
Posts: 613
Sweden
Message 117109 - Posted: 19 Oct 2025, 1:14:29 UTC
Last modified: 19 Oct 2025, 1:24:45 UTC

No, it's not down, just slower than molasses in the winter, and showing "System Error", and also "Service Unavailable", from time to time. However, with a ton of patience, it is possible to get to the site, and even post on the forum.

I just managed to edit and add to a post I made right when the slowness began. I can still get to the site; you just need patience :-)
ID: 117109 · Report as offensive     Reply Quote
Grumpy Swede
Joined: 30 Mar 20
Posts: 613
Sweden
Message 117110 - Posted: 19 Oct 2025, 4:33:53 UTC

New long update from WCG:

October 18, 2025
We are sending small batches of workunits out starting tonight, with batch IDs in the range 9999900+ for MCM1, to test the new distributed, partition-aware, batch-upserting, app-specific create_work daemons. The few volunteers who get these workunits, before we start releasing larger batches as we gain confidence that the new system is working as expected, may notice that these workunits have a much smaller number of signatures and run much faster than normal. These are still meaningful workunits, but key parameters such as the number of signatures to test per workunit were reduced so we could get feedback quickly.

Similar to ARP1, we have moved all workunit templating and preparation to WCG servers for MCM1. We did this for the MAM1 beta (beta30) already, but we were able to move the rendering of workunit templates per batch into the create_work daemon C++ code directly, where it consumes a protobuf schema from Kafka/Redpanda's schema registry that it then hydrates to produce all workunits for the batch according to the desired parameters it consumes from the "plan" topic via Kafka. Hence, "app-specific" above. Then, it updates the BOINC database in bulk instead of calling BOINC's create_work() function. Metadata is local, partitioned, and replicated in Kafka for durability; each batch writes files to that node's 1/6th of the buckets from the BOINC dir_hier fanout directory and commits 1/6th of the batch records to the database in non-overlapping ranges per 10k workunits per batch.
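
For readers trying to picture the "non-overlapping ranges" part, here is a rough Python sketch. It is not WCG's code (theirs lives in C++ inside the create_work daemon), and the names split_batch, bulk_rows and the workunit naming scheme are made up for illustration. The only figures taken from the update are the six partitions, the 10k workunits per batch and the 9999900 batch ID.

```python
from dataclasses import dataclass

NUM_PARTITIONS = 6    # "six partitions" per the WCG updates
BATCH_SIZE = 10_000   # "per 10k workunits per batch"


@dataclass(frozen=True)
class Slice:
    partition: int
    start: int  # inclusive offset within the batch
    stop: int   # exclusive offset within the batch


def split_batch(batch_size: int = BATCH_SIZE,
                partitions: int = NUM_PARTITIONS) -> list[Slice]:
    """Divide [0, batch_size) into contiguous, non-overlapping slices."""
    base, extra = divmod(batch_size, partitions)
    slices, start = [], 0
    for p in range(partitions):
        # Spread the remainder over the first `extra` partitions.
        size = base + (1 if p < extra else 0)
        slices.append(Slice(p, start, start + size))
        start += size
    return slices


def bulk_rows(batch_id: int, s: Slice) -> list[tuple[str, int]]:
    """Rows one node would commit in a single bulk write.

    The workunit name format is hypothetical, not WCG's real scheme.
    """
    return [(f"MCM1_{batch_id:07d}_{i:05d}", s.partition)
            for i in range(s.start, s.stop)]


if __name__ == "__main__":
    for s in split_batch():
        print(s.partition, s.start, s.stop, len(bulk_rows(9999900, s)))
```

Because every slice is disjoint, each node can write its share without coordinating with the other five, which is presumably the point of committing in ranges rather than one row at a time.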

The new validators are working and deployed. In our new distributed, partitioned approach, validators process workunits local to their host ONLY. Uploads are partitioned according to the fanout directory assigned by BOINC and routed to the correct backend node by HAProxy, corresponding to the BOINC fanout buckets. We split the buckets between nodes: instead of using them to fan out across the filesystem and avoid massive numbers of files in a single BOINC upload path, we fan out across the cluster and read/write these buckets in tmpfs, so Apache serves downloads and accepts uploads in-memory and validators read in-memory. Kafka/Redpanda gets a copy of uploads into a disk-persisted, replicated topic for durability, so if a node goes down and we lose the in-memory cache of downloads and uploads, we can replay and recover.
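
A toy model of that routing-plus-replay idea, in Python, may help. Everything here is an assumption for illustration: the fanout of 1024, MD5 as the bucket hash (BOINC's own dir_hier hash may differ in detail), the contiguous bucket-to-node mapping, and writing the durable copy before the in-memory one. A plain list stands in for the replicated Kafka/Redpanda topic and a dict stands in for tmpfs.

```python
import hashlib

FANOUT = 1024   # assumed number of fanout buckets
NUM_NODES = 6   # six backend nodes, per the update


def bucket_for(filename: str, fanout: int = FANOUT) -> int:
    """Map a filename to a fanout bucket with a stable hash (MD5 as a stand-in)."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return int(digest[:8], 16) % fanout


def node_for(bucket: int, fanout: int = FANOUT, nodes: int = NUM_NODES) -> int:
    """Give each node a contiguous block of roughly fanout / nodes buckets."""
    return bucket * nodes // fanout


class Node:
    """One backend node; `memory` is volatile, like files kept in tmpfs."""

    def __init__(self, node_id: int):
        self.node_id = node_id
        self.memory: dict[str, bytes] = {}


class Cluster:
    def __init__(self):
        self.nodes = [Node(i) for i in range(NUM_NODES)]
        # Stand-in for the disk-persisted, replicated topic.
        self.log: list[tuple[int, str, bytes]] = []

    def upload(self, filename: str, data: bytes) -> int:
        node_id = node_for(bucket_for(filename))       # the proxy's routing decision
        self.log.append((node_id, filename, data))     # durable copy
        self.nodes[node_id].memory[filename] = data    # in-memory copy
        return node_id

    def crash(self, node_id: int) -> None:
        self.nodes[node_id].memory.clear()             # in-memory cache is lost

    def replay(self, node_id: int) -> None:
        """Rebuild one node's in-memory files from the durable log."""
        for owner, filename, data in self.log:
            if owner == node_id:
                self.nodes[node_id].memory[filename] = data
```

The trade is speed for volatility: serving from memory is fast, and the log is what makes losing a node survivable.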

Each validator subscribes to a Kafka topic containing the count of uploads (a reduction on upload events emitted to Kafka topics from the new file_upload_handlers, for only the local buckets of that partition) and the file locations pertaining to a pair of workunits, and emits success or failure to another queue for downstream "assimilation". We have written and are testing a batch applier that collects successful validation events on each partition and batch-updates the BOINC database so that the transitioner and scheduler can work together to evaluate the state of those workunits. Once we are confident the batch updates work as expected from the applier, users should start seeing workunits pending validation clear to valid.
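
As a sketch of how that validator-to-applier flow could fit together, here is a small Python model. It is an interpretation, not WCG's implementation: the class names, the quorum of two, the byte-for-byte comparison and the applier's batch size are all assumptions, and plain lists stand in for the Kafka topics and the BOINC database.

```python
from collections import defaultdict

QUORUM = 2  # assumed: two uploaded results are compared against each other


class PartitionValidator:
    """Validates only uploads that land in this node's own buckets."""

    def __init__(self, local_buckets):
        self.local_buckets = set(local_buckets)
        self.uploads = defaultdict(list)  # workunit -> [(path, payload), ...]
        self.outbox = []                  # stand-in for the downstream event queue

    def on_upload(self, bucket, workunit, path, payload):
        if bucket not in self.local_buckets:
            return  # another partition owns this bucket
        files = self.uploads[workunit]
        files.append((path, payload))
        if len(files) == QUORUM:
            # Placeholder check; a real validator compares science results.
            ok = files[0][1] == files[1][1]
            self.outbox.append({"workunit": workunit, "valid": ok})
            del self.uploads[workunit]


class BatchApplier:
    """Collects successful validations and applies them in bulk."""

    def __init__(self, batch_size=100):
        self.batch_size = batch_size
        self.pending = []
        self.applied = []  # each entry models one bulk database update

    def on_event(self, event):
        if event["valid"]:
            self.pending.append(event["workunit"])
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.applied.append(self.pending)
            self.pending = []
```

Usage would look like feeding upload events into on_upload, then draining outbox into the applier and calling flush at the end of a cycle. Only after that bulk update would the transitioner and scheduler see the new state, which matches the "pending validation clear to valid" step described in the update.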

We are not running file_deleter or db_purge at the moment; they need to be rearchitected to match the new setup, or at minimum assessed to make sure it makes sense to start them unchanged. We have no concerns about running out of space in the database or on disk at the moment, only about making mistakes, so we will get around to assessing what, if anything, needs to change about file_deleter and db_purge soon, but not now. Likely, they will also take advantage of per-workunit event data from Redpanda/Kafka, instead of just talking to the BOINC database, and operate on local partitions across the cluster. But as we are producing events for every workunit's full lifecycle to Kafka topics, we have a level of visibility and control we were never able to achieve with the legacy system. We were able to set up Prometheus node_exporter, tap into docker stats endpoints per node across the cluster, and do likewise for Redpanda/Kafka with the helpful https://github.com/redpanda-data/observability repo, to get a Grafana dashboard going that will let us do many things, such as serve up server status pages and improve the stats pages.
ID: 117110 · Report as offensive     Reply Quote
unixchick

Joined: 28 Mar 18
Posts: 145
United States
Message 117111 - Posted: 19 Oct 2025, 5:00:08 UTC - in response to Message 117110.  
Last modified: 19 Oct 2025, 5:08:53 UTC

A status page?!? Amazing!

I can't get on to the WCG forum. My Boinc can connect without error. I haven't gotten any WUs though. I still have a bunch of backup projects WUs to do, so no worries.

Thank you, Grumpy, for posting the update here. It is a nice, long, fact-filled update.

Now I can get on the WCG forums.
ID: 117111 · Report as offensive     Reply Quote
Dave
Help desk expert

Joined: 28 Jun 10
Posts: 3061
United Kingdom
Message 117114 - Posted: 19 Oct 2025, 7:08:39 UTC - in response to Message 117111.  

Forum working fine for me. No tasks though either on Linux or Android. Haven't tried Windows as I am running one of the large memory tasks from CPDN in native Linux and one in a VM. They take ~26GB each and firing up a Windows VM would probably crash one of them.
ID: 117114 · Report as offensive     Reply Quote
Dave
Help desk expert

Joined: 28 Jun 10
Posts: 3061
United Kingdom
Message 117119 - Posted: 19 Oct 2025, 19:39:32 UTC

But Android still getting the png files failing to download messages.
ID: 117119 · Report as offensive     Reply Quote
Grumpy Swede
Joined: 30 Mar 20
Posts: 613
Sweden
Message 117124 - Posted: 20 Oct 2025, 8:14:39 UTC - in response to Message 117119.  

In reply to Dave's message of 19 Oct 2025:
But Android still getting the png files failing to download messages.
The PNG files issue is solved. However, they should at the same time have fixed the unnecessary downloads of PNG files for projects that finished a long time ago, such as mip1, hst1, and scc1. Then we wouldn't have to download 21 files (956,934 bytes) for no reason whatsoever.
ID: 117124 · Report as offensive     Reply Quote
MyrCu

Joined: 27 Aug 22
Posts: 40
Message 117131 - Posted: 21 Oct 2025, 13:10:33 UTC

I received a couple of MCM WUs on my Android Phone.
They took less than 10 Minutes to finish.
ID: 117131 · Report as offensive     Reply Quote
MyrCu

Joined: 27 Aug 22
Posts: 40
Message 117132 - Posted: 21 Oct 2025, 13:18:03 UTC - in response to Message 117131.  

For my Linux devices there are no WUs available.
ID: 117132 · Report as offensive     Reply Quote
bill

Joined: 11 Sep 15
Posts: 12
United States
Message 117133 - Posted: 21 Oct 2025, 13:21:01 UTC
Last modified: 21 Oct 2025, 13:31:49 UTC

It's a miracle! I got tasks!

Whoops...they seem to only run 3:30 minutes each.
ID: 117133 · Report as offensive     Reply Quote

Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.