Thread 'News on Project Outages'

Grumpy Swede
Joined: 30 Mar 20
Posts: 423
Sweden
Message 111192 - Posted: 6 Mar 2023, 19:00:06 UTC
Last modified: 6 Mar 2023, 19:02:15 UTC

New Update from WCG:

Update: Unfortunately, additional hardware problem on the storage server besides the RAID card
are preventing us from restarting. Working with the data center on the alternative solutions.


Comment: Incredible.........
ID: 111192
robsmith
Volunteer tester
Help desk expert
Joined: 25 May 09
Posts: 1302
United Kingdom
Message 111193 - Posted: 6 Mar 2023, 19:44:26 UTC - in response to Message 111192.  

As all too often: cure the obvious problem and there are at least two more problems lurking in the shadows, waiting to come out and bite one :-(
ID: 111193
Grumpy Swede
Joined: 30 Mar 20
Posts: 423
Sweden
Message 111194 - Posted: 6 Mar 2023, 20:59:57 UTC
Last modified: 6 Mar 2023, 21:11:39 UTC

@Rob: Yeah, it looks as if you're right about that.

New Update from WCG:

"Update #2: Unfortunately, the RAID controller was not the root cause of our storage system failure,
the PCI bus failed. Data center is in the process of moving the disks to an alternate system and we
will post updates as we progress. Once again, thank you for your patience."


Comment: Looking at their Facebook account, there's not much "patience" there, to say thank you for :-)
ID: 111194
Grumpy Swede
Joined: 30 Mar 20
Posts: 423
Sweden
Message 111197 - Posted: 7 Mar 2023, 12:07:59 UTC
Last modified: 7 Mar 2023, 12:17:10 UTC

Another day, another day to wait for new revelations of new issues that will stop WCG from restarting.
This is getting old by now, really really old. It's almost a week now, that WCG has been down.
SPOFs in such a big project are simply not acceptable, and utterly unprofessional.

I've had a lot of patience with this migration from IBM to Jurisica Lab/Krembil, but by now, my patience is wearing very thin.
ID: 111197
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2718
United Kingdom
Message 111198 - Posted: 7 Mar 2023, 12:57:15 UTC - in response to Message 111197.  

How did Krembil come to get it? Did they put in a bid when IBM wanted out, or what? It seems to me that they are grossly under-resourced for a project of this size. Were there any other options to take over from IBM? That is something I don't recall seeing any discussion of.
ID: 111198
Contact
Joined: 29 Aug 05
Posts: 80
Canada
Message 111199 - Posted: 7 Mar 2023, 13:34:45 UTC - in response to Message 111198.  

Krembil was already tied to World Community Grid and maybe researchers there saved it from oblivion. First save the patient from death and then treat the developing symptoms.

https://web.archive.org/web/20210913180104/https://www.worldcommunitygrid.org/about_us/viewNewsArticle.do?articleId=732 wrote:
Additionally, we’re transferring knowledge about World Community Grid's projects and practices to the team at Krembil Research Institute. Through the Mapping Cancer Markers and Help Conquer Cancer projects, Dr. Jurisica and his team are already familiar with the power of World Community Grid. They are excited to learn even more about its inner workings and to embrace the power of citizen science.

ID: 111199
Grumpy Swede
Joined: 30 Mar 20
Posts: 423
Sweden
Message 111200 - Posted: 7 Mar 2023, 17:23:51 UTC
Last modified: 7 Mar 2023, 18:22:07 UTC

Well, one of my computers first logged the WCG system as down on 1 Mar 2023, 06:42:56 UTC.
Dr Who Fan posted about it on 1 Mar 2023, 7:16:30 UTC.

So, the system has been down now for at least about 6 days, 10 hours, 38+ minutes or about 154 hours (rounded down), and still counting.
That must be a new record, to fix a relatively simple problem. And WCG isn't up yet.
The last update from the team, was more than 20 hours ago.....
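
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the elapsed-time calculation. It assumes the first logged failure (1 Mar 2023, 06:42:56 UTC) as the start and this post's timestamp (7 Mar 2023, 17:23:51 UTC) as the end; the function name `downtime_breakdown` is made up for illustration.

```python
from datetime import datetime, timezone


def downtime_breakdown(start, end):
    """Return (days, hours, minutes, total_whole_hours) between two datetimes."""
    total_seconds = int((end - start).total_seconds())
    days, remainder = divmod(total_seconds, 86400)  # 86400 seconds per day
    hours, remainder = divmod(remainder, 3600)      # 3600 seconds per hour
    minutes = remainder // 60                       # leftover seconds are dropped
    total_whole_hours = total_seconds // 3600       # rounded down
    return days, hours, minutes, total_whole_hours


# Assumed start: first failure logged by one of the poster's computers.
first_failure = datetime(2023, 3, 1, 6, 42, 56, tzinfo=timezone.utc)
# Assumed end: the timestamp of this post.
post_time = datetime(2023, 3, 7, 17, 23, 51, tzinfo=timezone.utc)

days, hours, minutes, total_hours = downtime_breakdown(first_failure, post_time)
print(f"{days} days, {hours} hours, {minutes} minutes (about {total_hours} hours)")
```

With these two timestamps the result is 6 days, 10 hours, 40 minutes, or 154 whole hours, in line with the figures above. The small difference in minutes just depends on exactly when the post was written.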

Edit, added: Luke 23:34
ID: 111200
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2718
United Kingdom
Message 111202 - Posted: 7 Mar 2023, 19:59:51 UTC - in response to Message 111199.  

Krembil was already tied to World Community Grid and maybe researchers there saved it from oblivion. First save the patient from death and then treat the developing symptoms.

https://web.archive.org/web/20210913180104/https://www.worldcommunitygrid.org/about_us/viewNewsArticle.do?articleId=732 wrote:
Additionally, we’re transferring knowledge about World Community Grid's projects and practices to the team at Krembil Research Institute. Through the Mapping Cancer Markers and Help Conquer Cancer projects, Dr. Jurisica and his team are already familiar with the power of World Community Grid. They are excited to learn even more about its inner workings and to embrace the power of citizen science.

Read the link. It still isn't clear, to me at least, whether Krembil took it on because it would have folded otherwise or whether there were any other potential takers. I shall have to do some searching and try to find out more, though if no one here knows anything I might not have much luck.
ID: 111202
Dr Who Fan
Joined: 10 May 07
Posts: 1450
United States
Message 111205 - Posted: 7 Mar 2023, 21:11:53 UTC - in response to Message 111202.  

... Read the link. It still isn't clear, to me at least, whether Krembil took it on because it would have folded otherwise or whether there were any other potential takers. I shall have to do some searching and try to find out more, though if no one here knows anything I might not have much luck.

I don't think information about other possible candidates for the takeover of WCG besides Krembil was ever made public by IBM. Big Blue sprang it on the BOINC public community after the deal was made.
ID: 111205
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2718
United Kingdom
Message 111208 - Posted: 7 Mar 2023, 22:22:52 UTC - in response to Message 111205.  

Thanks, I wondered if that were the case.
ID: 111208
Grumpy Swede
Joined: 30 Mar 20
Posts: 423
Sweden
Message 111209 - Posted: 8 Mar 2023, 0:42:01 UTC

It's 19:40 ish, in Toronto, and still no update for today.
No further comments needed, for now.
ID: 111209
Robokapp
Joined: 8 Mar 23
Posts: 11
Message 111213 - Posted: 8 Mar 2023, 5:18:45 UTC

I feel I have one more FUBAR'd unfortunate incident in me before it's time to walk... mostly because some of my machines are quite old and can't really do much of the more intensive work of other projects.

I wish there were a really good alternative for medical and overall people-benefiting research.
ID: 111213
PMH_UK
Joined: 24 Dec 10
Posts: 37
United Kingdom
Message 111214 - Posted: 8 Mar 2023, 7:53:53 UTC - in response to Message 111213.  

DENIS, Rosetta, SiDock and TNGrid all do medical research.

Paul.
ID: 111214
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2718
United Kingdom
Message 111215 - Posted: 8 Mar 2023, 8:09:30 UTC - in response to Message 111214.  

DENIS, Rosetta, SiDock and TNGrid all do medical research.

Paul.

DENIS is currently telling me it has no work available.
ID: 111215
Grumpy Swede
Joined: 30 Mar 20
Posts: 423
Sweden
Message 111217 - Posted: 8 Mar 2023, 12:24:43 UTC

1 day, 15 hours+ since the last update from the WCG team.
Very disturbing, very bad PR for WCG.
ID: 111217
jhseltzer
Joined: 8 Feb 17
Posts: 7
United States
Message 111220 - Posted: 8 Mar 2023, 14:43:39 UTC - in response to Message 111215.  

Rosetta has work available. I have 4 machines busy crunching files.
ID: 111220
[CSF] Aleksey Belkov
Joined: 3 Mar 23
Posts: 14
Russia
Message 111225 - Posted: 8 Mar 2023, 16:26:11 UTC

SiDock@home has ~10k WUs (long and short), so I don't see any problem with computers sitting idle due to the WCG outage.
ID: 111225
Grumpy Swede
Joined: 30 Mar 20
Posts: 423
Sweden
Message 111226 - Posted: 8 Mar 2023, 16:37:53 UTC
Last modified: 8 Mar 2023, 17:00:38 UTC

New update, 10 minutes ago:

"Update #3: As of this morning, the data center continues to work on booting the temporary replacement
DSS 7000 storage system. They are attempting multiple alternative strategies to resolve current failures."


Edit, added:
Specifications Dell DSS 7000:
Form Factor: 4U
Max capacity: 720TB (90 x 8TB HDDs)
2 x DSS server nodes (2S Intel E5 v3 series CPUs)
ID: 111226
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2718
United Kingdom
Message 111227 - Posted: 8 Mar 2023, 17:09:39 UTC

"Update #3: As of this morning, the data center continues to work on booting the temporary replacement..."

Would that be a size 10 boot?
ID: 111227
PMH_UK
Joined: 24 Dec 10
Posts: 37
United Kingdom
Message 111228 - Posted: 8 Mar 2023, 17:49:31 UTC - in response to Message 111215.  

DENIS is currently telling me it has no work available.

It has un-sent units currently (was 0 before but many in progress).

Paul.
ID: 111228

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.