Thread 'World Community Grid has announced an extended outage from Feb 14 to April 22, 2022'

Grumpy Swede
Joined: 30 Mar 20
Posts: 416
Sweden
Message 109984 - Posted: 4 Oct 2022, 17:18:40 UTC
Last modified: 4 Oct 2022, 18:18:27 UTC

Yes, the website came back, and at the same time I got 50 new OPNG tasks. The first 23 went down without a single http error, but now the http errors are back
again, just as bad as ever. I also have an update script (batch file), but set to 6 minutes. Otherwise, the chance of getting any OPNGs would be almost 0%.

Edit, added: After a lot of retries, I managed to download the rest of my 50 OPNG's + 20 more. I can't continue to babysit my computer, so I have set NNT, and will
shut it down after the cached OPNG tasks are crunched, uploaded, and reported. I will try again in a couple of days, and if things haven't improved then, I'll wait a
week or more before I try again.
ID: 109984 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5128
United Kingdom
Message 109985 - Posted: 4 Oct 2022, 19:14:03 UTC - in response to Message 109984.  

New error mode:

04/10/2022 20:11:33 | World Community Grid | [http] HTTP error: Error in the HTTP2 framing layer
drops the attempted connection immediately. I'm taking a break.
ID: 109985 · Report as offensive
Dr Who Fan
Joined: 10 May 07
Posts: 1442
United States
Message 109988 - Posted: 4 Oct 2022, 20:35:17 UTC - in response to Message 109985.  

New error mode:
04/10/2022 20:11:33 | World Community Grid | [http] HTTP error: Error in the HTTP2 framing layer
drops the attempted connection immediately. I'm taking a break.

Getting same message...

Found a closed GitHub issue from 2019 in the curl library code that mentions the primary error.

With all the burps and farts Krembil has given us in the past too many months, could they possibly be running outdated server software? Doubt it but anything is possible.

Most likely, it's the various real & virtual WCG server(s) that are completely overwhelmed with everyone trying to view the website, send/receive work, etc.
ID: 109988 · Report as offensive
Ian&Steve C.

Joined: 24 Dec 19
Posts: 229
United States
Message 109989 - Posted: 4 Oct 2022, 21:16:45 UTC - in response to Message 109988.  

WCG runs their own custom server software. They are not a normal BOINC project.

Wondering whether it is "outdated" or not isn't even applicable, given how different they are.
ID: 109989 · Report as offensive
robsmith
Volunteer tester
Help desk expert

Joined: 25 May 09
Posts: 1301
United Kingdom
Message 109990 - Posted: 4 Oct 2022, 21:29:35 UTC - in response to Message 109989.  

Knowing IBM, it was highly customised to run on their hardware, and is thus a grade-one pain in the chair polisher to get running properly on "normal" hardware.
ID: 109990 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5128
United Kingdom
Message 109991 - Posted: 4 Oct 2022, 21:35:33 UTC

Well, they backed out of that one pretty quickly (temporarily or permanently, time will tell) - I've been back to 'normal' delays for a while. I've given up trying for tonight.
ID: 109991 · Report as offensive
Grumpy Swede
Joined: 30 Mar 20
Posts: 416
Sweden
Message 109994 - Posted: 5 Oct 2022, 0:22:46 UTC
Last modified: 5 Oct 2022, 0:41:10 UTC

Running boinccmd.exe --network_available from a batch file on Windows every 20 seconds seems to keep my computer fed now, without manual intervention.
It will auto-retry any stalled download.

Simple example:

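rem Assumes the default BOINC install location; adjust if yours differs.
cd /d "C:\Program Files\BOINC"

:loop
rem Ask the client to retry deferred network activity (stalled transfers).
boinccmd.exe --network_available
rem Wait 20 seconds; /nobreak ignores key presses.
timeout /t 20 /nobreak
cls
goto loop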
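
For anyone who would rather not leave a console window looping forever, the same idea can be sketched in Python. This is only a rough equivalent under some assumptions: that boinccmd is on the PATH (the BOINCCMD constant is a placeholder, not something from the batch file above), and that a fixed number of rounds is acceptable. The function name retry_network and its parameters are made up for illustration.

```python
import subprocess
import time
from typing import Callable, List, Sequence

# Assumption: boinccmd is on PATH. On Windows you may need the full path,
# e.g. r"C:\Program Files\BOINC\boinccmd.exe".
BOINCCMD = "boinccmd"


def retry_network(
    rounds: int,
    interval_s: float = 20.0,
    run: Callable[[Sequence[str]], int] = lambda cmd: subprocess.run(cmd).returncode,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Call `boinccmd --network_available` `rounds` times, pausing in between.

    Returns the exit code of each call so failures can be inspected later.
    """
    codes = []
    for i in range(rounds):
        codes.append(run([BOINCCMD, "--network_available"]))
        if i < rounds - 1:  # no pointless wait after the final round
            sleep(interval_s)
    return codes
```

Calling retry_network(rounds=180) would poke the client every 20 seconds for roughly an hour and then stop, which avoids having to remember to kill the loop.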
ID: 109994 · Report as offensive
Grumpy Swede
Joined: 30 Mar 20
Posts: 416
Sweden
Message 109998 - Posted: 5 Oct 2022, 4:41:09 UTC

Sigh... The WCG website went down again. "503 Service Unavailable No server is available to handle this request", or "System error
World Community Grid is currently experiencing an unexpected error. Please check Facebook or Twitter for more information."
ID: 109998 · Report as offensive
Keith Myers
Volunteer tester
Help desk expert
Joined: 17 Nov 16
Posts: 890
United States
Message 109999 - Posted: 5 Oct 2022, 6:01:52 UTC - in response to Message 109998.  

I see that. Same issue as before. Same error messages. Guess their band-aid fix did not correct the real issue, i.e. they really don't know how to manage a project of the scope and size that IBM could manage with one hand tied behind their back.


ID: 109999 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5128
United Kingdom
Message 110001 - Posted: 5 Oct 2022, 8:00:07 UTC

As folks have said. I've got many OPNG transfers waiting from last night, and they're clearing very, very slowly under my OCD.

But I caught an interesting new error message when checking it was the same problem as before:

05/10/2022 08:53:34 | World Community Grid | [http] [ID#45505] Received header from server: HTTP/2 503
05/10/2022 08:53:34 | World Community Grid | [http] [ID#45505] Received header from server: <html><body><h1>503 Service Unavailable</h1>
05/10/2022 08:53:34 | World Community Grid | [http] [ID#45505] Received header from server: No server is available to handle this request.
05/10/2022 08:53:34 | World Community Grid | [http] [ID#45505] Info:  Connection cache is full, closing the oldest one.
I'll look into that one.
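
If you want to count how often each kind of error turns up rather than eyeballing the event log, lines like the ones above can be picked apart with a few lines of Python. This is a minimal sketch that assumes the "date time | project | message" layout and the day/month/year order seen in these excerpts; LogLine, parse_line and is_503 are names invented for the example, not anything the BOINC client provides.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class LogLine(NamedTuple):
    when: datetime
    project: str
    http_id: Optional[int]
    text: str


# Matches the "[http]" tag and the optional "[ID#12345]" that may follow it.
_HTTP_PREFIX = re.compile(r"^\[http\]\s+(?:\[ID#(\d+)\]\s+)?")


def parse_line(line: str) -> LogLine:
    """Split one event-log line of the form 'date time | project | message'."""
    stamp, project, message = (part.strip() for part in line.split("|", 2))
    # Assumption: day/month/year order, as in the excerpts quoted above.
    when = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")
    match = _HTTP_PREFIX.match(message)
    http_id = int(match.group(1)) if match and match.group(1) else None
    text = message[match.end():] if match else message
    return LogLine(when, project, http_id, text)


def is_503(entry: LogLine) -> bool:
    """True if the line reports an HTTP 503 status header."""
    return bool(re.search(r"HTTP/\S+ 503\b", entry.text))
```

Feeding every line of a saved log through parse_line and filtering with is_503 gives a quick tally of how many requests the server turned away.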
ID: 110001 · Report as offensive
Grumpy Swede
Joined: 30 Mar 20
Posts: 416
Sweden
Message 110002 - Posted: 5 Oct 2022, 8:37:14 UTC - in response to Message 110001.  

As folks have said. I've got many OPNG transfers waiting from last night, and they're clearing very, very slowly under my OCD.

But I caught an interesting new error message when checking it was the same problem as before:

05/10/2022 08:53:34 | World Community Grid | [http] [ID#45505] Received header from server: HTTP/2 503
05/10/2022 08:53:34 | World Community Grid | [http] [ID#45505] Received header from server: <html><body><h1>503 Service Unavailable</h1>
05/10/2022 08:53:34 | World Community Grid | [http] [ID#45505] Received header from server: No server is available to handle this request.
05/10/2022 08:53:34 | World Community Grid | [http] [ID#45505] Info:  Connection cache is full, closing the oldest one.
I'll look into that one.

That seems to be a curl thing.

CURLMOPT_MAXCONNECTS explained
When the cache is full, curl closes the oldest one in the cache to prevent the number of open connections from increasing.

https://curl.se/libcurl/c/CURLMOPT_MAXCONNECTS.html
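
To make the "closing the oldest one" behaviour concrete, here is a toy model in Python. It is not libcurl's code and says nothing about the limit the BOINC client actually uses; it simply assumes that "oldest" means least recently used, and the class name ConnectionCache and the host names in the usage note are hypothetical.

```python
from collections import OrderedDict
from typing import List, Optional


class ConnectionCache:
    """Toy model of a bounded connection cache that evicts the oldest entry."""

    def __init__(self, max_connects: int) -> None:
        if max_connects < 1:
            raise ValueError("max_connects must be at least 1")
        self.max_connects = max_connects
        self._open: "OrderedDict[str, None]" = OrderedDict()
        self.closed: List[str] = []

    def use(self, host: str) -> Optional[str]:
        """Open or reuse a connection to `host`; return the host evicted, if any."""
        if host in self._open:
            self._open.move_to_end(host)  # reuse: mark as most recently used
            return None
        evicted = None
        if len(self._open) >= self.max_connects:
            # Cache is full: close the oldest connection to make room.
            evicted, _ = self._open.popitem(last=False)
            self.closed.append(evicted)
        self._open[host] = None
        return evicted

    def open_hosts(self) -> List[str]:
        """Hosts with a cached connection, oldest first."""
        return list(self._open)
```

With a limit of 2, using hosts "a", "b", "a" and then "c" closes "b", because "a" was touched more recently. The point of the model is only that the message is about housekeeping on the client side, not an error reported by the server.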
ID: 110002 · Report as offensive
Dave
Help desk expert

Joined: 28 Jun 10
Posts: 2691
United Kingdom
Message 110003 - Posted: 5 Oct 2022, 9:20:30 UTC

As folks have said. I've got many OPNG transfers waiting from last night, and they're clearing very, very slowly under my OCD.
Same here but ARP, that being the only WCG project I run at the moment. At least I can see all the downloads again now without having to scroll!
ID: 110003 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5128
United Kingdom
Message 110004 - Posted: 5 Oct 2022, 9:45:40 UTC - in response to Message 110002.  

That seems to be a Curl thing.

CURLMOPT_MAXCONNECTS explained
When the cache is full, curl closes the oldest one in the cache to prevent the number of open connections from increasing.

https://curl.se/libcurl/c/CURLMOPT_MAXCONNECTS.html
I was assuming that, but thanks for confirming and the link.

But what's the limit in our clients, and is it of the order of that 45 thousand current ID count? If so, is it required/efficient in a client setting? I'll probably post that question in Git.
ID: 110004 · Report as offensive
Grumpy Swede
Joined: 30 Mar 20
Posts: 416
Sweden
Message 110005 - Posted: 5 Oct 2022, 9:53:11 UTC

Well, regarding the situation at the moment of WCG at Krembil/Jurisica Lab, all I have to say is: "Luke 23:34" :-)
ID: 110005 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5128
United Kingdom
Message 110006 - Posted: 5 Oct 2022, 12:20:06 UTC
Opened https://github.com/BOINC/boinc/issues/4952

ID: 110006 · Report as offensive
Dr Who Fan
Joined: 10 May 07
Posts: 1442
United States
Message 110008 - Posted: 5 Oct 2022, 14:16:12 UTC - in response to Message 110005.  

Well, regarding the situation at the moment of WCG at Krembil/Jurisica Lab, all I have to say is: "Luke 23:34" :-)

Yes, we forgive them. They supposedly "started working" on the physical hosting Feb 14 -- almost 8 months ago, but took virtual control a bit over a year ago.

Has anyone ever found out how many real full-time positions are actually working for WCG staffing at Krembil?

I'm more inclined to say they are deep (over their heads?) into the s**t.
ID: 110008 · Report as offensive
Jim1348

Joined: 8 Nov 10
Posts: 310
United States
Message 110011 - Posted: 5 Oct 2022, 14:41:02 UTC - in response to Message 110005.  

Well, regarding the situation at the moment of WCG at Krembil/Jurisica Lab, all I have to say is: "Luke 23:34" :-)

I thought maybe you were going to say "Physician, heal thyself".
ID: 110011 · Report as offensive
robsmith
Volunteer tester
Help desk expert

Joined: 25 May 09
Posts: 1301
United Kingdom
Message 110012 - Posted: 5 Oct 2022, 15:34:09 UTC - in response to Message 110008.  

Has anyone ever found out how many real full-time positions are actually working for WCG staffing at Krembil?

The number of FTEs employed on a task is all too often a totally irrelevant metric, as one may have a thousand employed on a task and none of them doing any productive work, or one may have one employed and working very effectively. Guess which set will get the task done first, and at a lower cost.
ID: 110012 · Report as offensive
robsmith
Volunteer tester
Help desk expert

Joined: 25 May 09
Posts: 1301
United Kingdom
Message 110014 - Posted: 5 Oct 2022, 15:40:24 UTC

One thing I would like to see is a comparison between the code handed over by IBM and a "stable, production" version of server-side BOINC of comparable age. I have a gut feeling that there would be a substantial difference between the two.
Someone said (some time ago) that it may well have been an easier task to take the data and load it onto "clean" servers running a standard version of BOINC. While this may have sounded like a simple task, it could well be an uphill struggle unless the data structure deployed by IBM was identical to (or readily converted to) that used by "standard" BOINC.
ID: 110014 · Report as offensive
Grumpy Swede
Joined: 30 Mar 20
Posts: 416
Sweden
Message 110015 - Posted: 5 Oct 2022, 15:41:38 UTC - in response to Message 110006.  
Last modified: 5 Oct 2022, 16:11:27 UTC

Opened https://github.com/BOINC/boinc/issues/4952
Thank you Richard !

Edit, added: The WCG admins just have to be totally out of this world, or just don't understand what they are (not) doing.

On their Facebook page, some 25 minutes ago, they posted:
"We are experiencing a brief system error with our website and database. We apologize and will notify you when it has been resolved."

"A brief system error"? No, it's not "brief" when it's going on for many hours, with the webpage unusable, and since the same issue happened just a couple of days ago, they obviously did not fix it then.
Sigh, I really don't know what to say........
ID: 110015 · Report as offensive

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.