Thread 'Anything and Everything to do with (WCG) World Community Grid'

Author	Message
Hadrian Joined: 5 Nov 11 Posts: 65	Message 117329 - Posted: 2 Nov 2025, 14:44:00 UTC - in response to Message 117325. Some of mine from yesterday will not upload, 24 hours so far.

MyrCu Joined: 27 Aug 22 Posts: 41	Message 117330 - Posted: 2 Nov 2025, 21:26:46 UTC - in response to Message 117329. Same here.

robsmith Volunteer tester Help desk expert Joined: 25 May 09 Posts: 1460	Message 117358 - Posted: 5 Nov 2025, 17:54:49 UTC While plenty of tasks are available the validation queue is getting longer and longer - someone needs to give it a prod/kick/enema to get it moving properly.

Jean-David Joined: 19 Dec 05 Posts: 124	Message 117359 - Posted: 5 Nov 2025, 20:26:34 UTC - in response to Message 117358. Not only that ...
Wed 05 Nov 2025 02:40:36 PM EST | World Community Grid | Sending scheduler request: To report completed tasks.
Wed 05 Nov 2025 02:40:36 PM EST | World Community Grid | Reporting 8 completed tasks
Wed 05 Nov 2025 02:40:36 PM EST | World Community Grid | Requesting new tasks for CPU
Wed 05 Nov 2025 02:40:41 PM EST | World Community Grid | Scheduler request failed: HTTP service unavailable
Wed 05 Nov 2025 02:57:58 PM EST | World Community Grid | Sending scheduler request: To report completed tasks.
Wed 05 Nov 2025 02:57:58 PM EST | World Community Grid | Reporting 8 completed tasks
Wed 05 Nov 2025 02:57:58 PM EST | World Community Grid | Requesting new tasks for CPU
Wed 05 Nov 2025 02:58:03 PM EST | World Community Grid | Scheduler request failed: HTTP service unavailable
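For anyone who wants to measure how often their client is being turned away, the log in the post above can be parsed with a few lines of Python. This is only a minimal sketch: it assumes the event log uses "timestamp | project | message" lines exactly as pasted there, and the names parse_line and failed_requests are made up for this example, not part of BOINC.

```python
from datetime import datetime

# Log lines copied from the post above.
LOG = """\
Wed 05 Nov 2025 02:40:36 PM EST | World Community Grid | Sending scheduler request: To report completed tasks.
Wed 05 Nov 2025 02:40:36 PM EST | World Community Grid | Reporting 8 completed tasks
Wed 05 Nov 2025 02:40:36 PM EST | World Community Grid | Requesting new tasks for CPU
Wed 05 Nov 2025 02:40:41 PM EST | World Community Grid | Scheduler request failed: HTTP service unavailable
Wed 05 Nov 2025 02:57:58 PM EST | World Community Grid | Sending scheduler request: To report completed tasks.
Wed 05 Nov 2025 02:57:58 PM EST | World Community Grid | Reporting 8 completed tasks
Wed 05 Nov 2025 02:57:58 PM EST | World Community Grid | Requesting new tasks for CPU
Wed 05 Nov 2025 02:58:03 PM EST | World Community Grid | Scheduler request failed: HTTP service unavailable
"""


def parse_line(line):
    """Split one 'timestamp | project | message' line into its three fields."""
    stamp, project, message = (part.strip() for part in line.split("|", 2))
    # Drop the trailing timezone abbreviation ("EST"); only the local clock
    # time is needed to measure gaps within a single log.
    local_time = stamp.rsplit(" ", 1)[0]
    when = datetime.strptime(local_time, "%a %d %b %Y %I:%M:%S %p")
    return when, project, message


def failed_requests(log_text):
    """Return the timestamps of every failed scheduler request."""
    failures = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        when, _project, message = parse_line(line)
        if message.startswith("Scheduler request failed"):
            failures.append(when)
    return failures


failures = failed_requests(LOG)
gap = failures[1] - failures[0]
print(len(failures), "failed requests,", int(gap.total_seconds()), "seconds apart")
```

On this excerpt the two failures are a little over 17 minutes apart, which looks like the client backing off between retries rather than hammering the server.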

Grumpy Swede Joined: 30 Mar 20 Posts: 741	Message 117360 - Posted: 5 Nov 2025, 20:55:35 UTC WCG website slow as molasses in the winter. Dylan writes:
Database crashed, was able to clear the write lock/disk sleeps causing a crash loop and try restarting the container which was stuck, but it seems there is an IO issue with the volume that the BOINC database runs from or some further cleanup I still need to do before I can get the database up and running again. I can r/w to the volume manually, so hopefully this is something I am able to handle without reaching out to hosting about the volume. It is a Ceph RBD and we store backups to a separate NFS mount point, so data loss is not expected, but I don't quite know how long this is going to take yet.

Grumpy Swede Joined: 30 Mar 20 Posts: 741	Message 117366 - Posted: 6 Nov 2025, 12:32:06 UTC Changes to the Device Profiles are now propagating to the BOINC client. Example:
World Community Grid 2025-11-06 13:17:52 General prefs: from World Community Grid (last modified 06-Nov-2025 13:17:20)
Thank you Dylan, for all your hard work (which BTW seems to go on during all hours, even in the middle of the Toronto nights).

Dr Who Fan Joined: 10 May 07 Posts: 1864	Message 117367 - Posted: 6 Nov 2025, 15:04:28 UTC - in response to Message 117366. Seems to be working at the moment. I woke up to a few tasks running on my Android phone this morning and it recently uploaded one task with no issues. Big thanks to Dylan and everyone else at Jurisica for making it this far!!!

Grumpy Swede Joined: 30 Mar 20 Posts: 741	Message 117368 - Posted: 6 Nov 2025, 17:48:28 UTC - in response to Message 117358. In reply to robsmith's message of 5 Nov 2025:
While plenty of tasks are available the validation queue is getting longer and longer - someone needs to give it a prod/kick/enema to get it moving properly.
Yeah, there's almost no validation going on at the moment. That's expected while Dylan is working on that part of the system. There are also tons of finished tasks to validate that were crunched before the migration (and cached tasks crunched during it), then uploaded and reported when the system came back. It will take a long time before everything is back to normal.

Jean-David Joined: 19 Dec 05 Posts: 124	Message 117374 - Posted: 7 Nov 2025, 18:41:10 UTC - in response to Message 117367. Does not seem to be working now. Lots of these:
Fri 07 Nov 2025 12:55:40 PM EST | World Community Grid | Sending scheduler request: To fetch work.
Fri 07 Nov 2025 12:55:40 PM EST | World Community Grid | Requesting new tasks for CPU
Fri 07 Nov 2025 12:55:42 PM EST | World Community Grid | Scheduler request completed: got 0 new tasks
Fri 07 Nov 2025 12:55:42 PM EST | World Community Grid | Server error: feeder not running

Grumpy Swede Joined: 30 Mar 20 Posts: 741	Message 117382 - Posted: 8 Nov 2025, 17:25:27 UTC The feeder is back, but no new work available yet.

Grumpy Swede Joined: 30 Mar 20 Posts: 741	Message 117395 - Posted: 10 Nov 2025, 19:19:37 UTC Last modified: 10 Nov 2025, 19:20:50 UTC New update from Dylan on the WCG forum. https://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,47541_offset,380#707377
It has been resends only over the weekend. The mcm1_create_work daemons lost their database connection during BOINC database maintenance, and I realized they needed some code changes to not skip batches when the database connection won't allow BOINC to receive the new workunits defined by the batch plan. A few other BOINC daemons, like the batch assimilator (all the daemons live in the same container I've stuffed all our legacy code into), also needed some work to set up the validation fix.
There will be a more comprehensive update posted shortly on the lab website "Operational Status" tab (https://www.cs.toronto.edu/~juris/jlab/wcg.html), but the TLDR is that today I plan to restart MCM1 batch production after I push a new build of the BOINC daemons and transitioner, and set up a Kafka broker on the BOINC database node to backfill the assimilators with resends and scheduler-reported tasks that didn't have the details needed to calc credit when the assimilator first received the upload pair from the validator. If all that goes well, then I am pretty sure I can finally piece together the full validation backlog from over the break and set the assimilators upon it.

Hans Sveen Joined: 3 Nov 20 Posts: 25	Message 117400 - Posted: 10 Nov 2025, 21:30:36 UTC Last modified: 10 Nov 2025, 21:34:22 UTC Here is the whole update: https://www.cs.toronto.edu/~juris/jlab/wcg.html
November 11, 2025
Database maintenance over Friday/Saturday completed without issue. We have resolved an issue with the backup scripts, effectively increased memory used to service database queries and added some new indices. We expect better performance from the BOINC database going forward. However, the disk remains slower than initial benchmarking when we stood up the database. We will monitor and reach out to hosting to see if the Ceph placement group expansion (that caused the stuck blocks of that particular disk when the placement group the result table lives on) got stuck in a "peering" state. We were informed that we should expect temporary, possibly intermittent slow IO during this Ceph maintenance window. If we can get faster disks for the BOINC database (which would require restoring the database to a new volume as we did to migrate) we will consider a maintenance window. Right now, we are optimistic the issues revealed in the new system by hanging database queries and database crashes can all be resolved with patches to the new BOINC daemons, and current performance will be sufficient.
As mentioned, this event identified several issues with the new BOINC daemons. MCM1 workunit creation proceeds in the Kafka topic even though the database is down: the mcm1_create_work daemon for its Kafka partition on science01...science06 tries to commit its part of the batch, the database isn't there, so it doesn't do anything, but it does commit its offset/pointer into the batch plan topic and moves on to consume the next batch plan. That means every 10-15 minutes while the database is down, a batch is effectively skipped. We were able to fix that, and have restarted MCM1 batch creation at roughly 5:00 p.m. EST, November 10th, 2025.
We believe we have finally architected a fix for the pending validation backlog issue. This requires some non-trivial plumbing in the MCM1 batch assimilator, a Kafka connector deployed on the BOINC database node, and transitioner code changes. Workunit supply may remain artificially lower while we roll out the new batch assimilator builds and monitor the transitioner -> Kafka event consumption and result table interaction.
We were able to resolve the issue with computing preferences not being updated from the website to BOINC client and vice versa. Generally, when the BOINC database goes down, so does the event listener that handles these messages on the webserver.
We are still working on resolving the validation backlog from over the break. With the result table bricked during the Ceph maintenance, we architected a "trust the filesystem" solution, and we are hopeful that this issue will be resolved this week.
MAM1 was initially planned to be resumed in beta30 last week, to see if 7.07 fairly schedules work and respects --nthreads, which is a blocking issue in promoting the beta application to production. Depending on the error rate and behaviour on BOINC clients, we would then consider the stable code paths for the first production batches. Given our increased control over batch parameters with the new Kafka topic that uses a protobuf schema to fill out the workunit and result table entries, we intend to run work in production on Linux as soon as the beta30 application is stable with an error rate lower than MCM1 excepting the GLIBC dependency, which is typically the only repeated error we see from clients on the current LibTorch code path. We will then rely on iterating the beta30 application to 7.08 and 7.09 to get GPU and Windows support, and Parquet IO for input and uploaded results.
Hans S.
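The skipped-batch problem described in the update comes down to when a consumer commits its offset. Here is a rough sketch of the "commit only after the database write succeeds" idea, using a plain Python list in place of the Kafka topic and a fake database class. Everything here (FakeDatabase, DatabaseDown, consume) is hypothetical and for illustration only; it is not the WCG daemon code, and it does not use a real Kafka client.

```python
class DatabaseDown(Exception):
    """Raised by the stand-in database when it cannot accept writes."""


class FakeDatabase:
    """Hypothetical stand-in for the BOINC database."""

    def __init__(self):
        self.up = True
        self.workunits = []

    def insert_batch(self, batch):
        if not self.up:
            raise DatabaseDown("database unavailable")
        self.workunits.append(batch)


def consume(batch_plans, offset, db):
    """Process batch plans starting at `offset`; return the committed offset.

    The offset only advances after the database write succeeds, so a plan
    that could not be written is retried on the next call instead of being
    skipped.
    """
    while offset < len(batch_plans):
        try:
            db.insert_batch(batch_plans[offset])
        except DatabaseDown:
            # Do not commit: leave the offset pointing at the unwritten plan.
            break
        offset += 1  # "commit" only after a successful write
    return offset


plans = ["batch-A", "batch-B", "batch-C"]
db = FakeDatabase()

db.up = False
offset = consume(plans, 0, db)       # database down: nothing is committed
db.up = True
offset = consume(plans, offset, db)  # database back: all three plans written
print(offset, db.workunits)
```

The behaviour described in the update corresponds to moving the `offset += 1` ahead of the write, or running it even when the write fails, which is how a batch could be lost for each plan consumed while the database was down.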

Grumpy Swede Joined: 30 Mar 20 Posts: 741	Message 117409 - Posted: 12 Nov 2025, 15:27:06 UTC New work is incoming, but there is a strange issue for quite a lot of my new tasks. Not all of them though, but many. Check the OS types, and versions you're paired with. My Windows 8.1 is paired with "T". That's a new OS and Version for me. Might be an AI reminder that I need to get myself another cup of Tea, maybe :-) A few examples, of many: https://www.worldcommunitygrid.org/contribution/workunit/772178042 https://www.worldcommunitygrid.org/contribution/workunit/772178045 https://www.worldcommunitygrid.org/contribution/workunit/772178049

Dave Help desk expert Joined: 28 Jun 10 Posts: 3392	Message 117411 - Posted: 12 Nov 2025, 15:49:51 UTC - in response to Message 117409. Check the OS types, and versions you're paired with. My Windows 8.1 is paired with "T". That's a new OS and Version for me. Might be an AI reminder that I need to get myself another cup of Tea, maybe :-)
No coffee reminder for me. (I only have Linux and Android tasks.) What I did notice is that Android is picking up _0 and _1 tasks. All my Linux ones are _2 or, occasionally, _3.

Dave Help desk expert Joined: 28 Jun 10 Posts: 3392	Message 117413 - Posted: 13 Nov 2025, 10:01:44 UTC Now getting freshly generated tasks on Linux as well as Android. Even running only one task at a time on each platform, I am producing results a lot faster than they are getting validated.

Grumpy Swede Joined: 30 Mar 20 Posts: 741	Message 117414 - Posted: 13 Nov 2025, 10:47:16 UTC - in response to Message 117413. In reply to Dave's message of 13 Nov 2025:
Now getting freshly generated tasks on Linux as well as Android. Even running only one task at a time on each platform, I am producing results a lot faster than they are getting validated.
Yeah, validation is more or less dead. Only a few tasks finished by both wingmen are validated per day now. Pending Validation tasks are building up fast. They still haven't solved that issue.

Dave Help desk expert Joined: 28 Jun 10 Posts: 3392	Message 117423 - Posted: 14 Nov 2025, 10:52:20 UTC - in response to Message 117414. Yeah, validation is more or less dead. Only a few tasks finished by both wingmen are validated per day now. Pending Validation are building up fast. They still haven't solved that issue. Interestingly, three tasks completed this morning have validated almost right away. No sign of the older ones getting done though.

Jean-David Joined: 19 Dec 05 Posts: 124	Message 117427 - Posted: 14 Nov 2025, 18:42:21 UTC - in response to Message 117411. In reply to Dave's message of 12 Nov 2025:
Check the OS types, and versions you're paired with. My Windows 8.1 is paired with "T". That's a new OS and Version for me. Might be an AI reminder that I need to get myself another cup of Tea, maybe :-) No coffee reminder for me. (I only have Linux and Android tasks.) What I did notice is that Android is picking up _0 and _1 tasks. All my Linux ones are _2 or occasionally, _3.
My Linux work is picking up mostly _0 and _1, as are my partners. All seem to be valid. Here is a typical result:
MCM1_0242073_7755
Project name: Mapping Cancer Markers
Created: Nov. 3, 2025 - 03:40 UTC
Name: MCM1_0242073_7755
Minimum Quorum: 2
Replication: 2
Result name: MCM1_0242073_7755_0
OS type: Linux EndeavourOS
OS version: EndeavourOS Linux [6.17.7-arch1-1|libc 2.42]
Status: Valid
Sent time: 2025-11-14 03:17:36 UTC
Time due: 2025-11-14 09:23:46 UTC
Return time: 2025-11-14 03:21:33 UTC
Cpu time: 2.36
Elapsed time: 2.87
Claimed credit: 22.8
Granted credit: 63
MCM1_0242073_7755_1 Linux Red Hat Enterprise Linux Red Hat Enterprise Linux 8.10 (Ootpa) [4.18.0-553.83.1.el8_10.x86_64|libc 2.28] Valid 2025-11-14 07:10:20 UTC 1.71 1.74 103.2 63
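One thing worth noticing in the result above: the granted credit of 63 matches the average of the two claimed credits, 22.8 and 103.2 (reading the second result's columns in the same order as the first). A quick check is sketched below. Whether the WCG validator really grants the mean of the claims is an assumption on my part; the numbers simply agree for this one workunit.

```python
from statistics import mean

# Claimed credit for results _0 and _1, taken from the post above.
claimed = [22.8, 103.2]
granted = 63

# Assumption for illustration only: granted credit = mean of the claims.
average_claim = mean(claimed)
matches = abs(average_claim - granted) < 1e-9

print("average claim:", round(average_claim, 1), "matches granted:", matches)
```

A single workunit is not enough to confirm the rule, so it would take a few more validated pairs to see whether the pattern holds.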

Sirius B Joined: 12 Jun 09 Posts: 2164	Message 117465 - Posted: 19 Nov 2025, 0:05:52 UTC Since the changeover, the most I've ever got was 2/3 days' worth. This latest batch was for 6 days; the last wu completed at 23:54 this evening, and no other tasks have downloaded. What annoys me the most is that the stats sites are unable to get stats. They can send "We miss you" e-mails but can't set up stats so we can see what we achieve as individuals/teams...

mmonnin Joined: 1 Jul 16 Posts: 225	Message 117466 - Posted: 19 Nov 2025, 0:13:47 UTC Last modified: 19 Nov 2025, 0:13:56 UTC WCG would need to update the stats. The last update was August 30th. https://download.worldcommunitygrid.org/boinc/stats/

Copyright © 2026 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.