Message boards : Projects : News on Project Outages
Author | Message |
---|---|
Joined: 28 Jun 10 Posts: 2718 |
Yep. Got loads now. Dennis currently telling me it has no work available. |
Joined: 30 Mar 20 Posts: 423 |
And another working day has ended in Toronto. No results this day either. WCG still dead as a doornail. It's as if they're trying to land on Mars or something, and not just restart/reconnect a storage system. |
Joined: 3 Mar 23 Posts: 14 |
"just restart/reconnect a storage system." This can be a completely different level of problem. It's time to calm down and just wait for the result. Blaming them every day can't help this situation at all :) |
Joined: 8 Mar 23 Posts: 11 |
I wonder, if a hospital's network goes down, whether it takes 8 days and counting to fix it... |
Joined: 28 Jun 10 Posts: 2718 |
"i wonder if a hospital's network goes down, if it takes 8 days and counting to fix it..." It did at the one I worked at once, after a Citrix "upgrade." |
Joined: 3 Mar 23 Posts: 10 |
"just restart/reconnect a storage system." Unfortunately, this is not the first issue since the restart (well into double digits now, I suspect, even ignoring the length of time the migration took). I believe there is a combination of issues at play (to name a few off the top of my head): - Jurisica lab not understanding the complexity of what they were taking on from IBM and biting off more than they can chew; - Krembil's IT having other priorities (it would be interesting to see their service level agreement for this!); - insufficient resources invested to avoid single-point-of-failure outages; - old/repurposed devices pressed into action as temporary fixes; - unanticipated consequences from moving off IBM technology to alternative software. In this particular instance I would like to understand better how they use RAID for storage back-up and mirroring, since the early reference to having tape back-ups was disconcerting: I would have assumed these were overnight and not real-time. If Jurisica Lab's communication were significantly better / faster, that might "calm down" everyone who, like me, has invested time and effort into supporting WCG over the years. Cheers Phillip |
Joined: 31 Dec 18 Posts: 301 |
"just restart/reconnect a storage system." Agreed - and this is not a problem to park at Krembil’s door; it’s totally the responsibility of the data centre. |
Joined: 3 Mar 23 Posts: 14 |
We can lament the current situation and problems of the project as much as we like, but in truth, this does not give any of us the right to demand or dictate anything to the project, regardless of how much we have invested in this project over the years of participation in it. This is still a voluntary computing project.
I do not want to absolve anyone of responsibility, but it cannot be ruled out that there was simply no other way out at that (or the current) moment. Anyway, we don't know the ins and outs of the agreement on the transition of WCG from IBM to Krembil, or which of them was interested in what (or was forced to do it). |
Joined: 30 Mar 20 Posts: 423 |
New update, 30 minutes ago: Update #4: The "new" system did recognize the data hardware RAIDs. All have been rebuilt, and the data center is attempting to repair the OS drives/RAID. |
Joined: 10 May 07 Posts: 1450 |
Not going to get my hopes up that WCG will be resuscitated until I see it. |
Joined: 30 Mar 20 Posts: 423 |
"Not going to get my hopes up that WCG will be resuscitated until I see it." Same here, and since I do not have any interest in any other project, I have shut down all my crunching computers. They really should concentrate on getting the BOINC part of the system up and running. The WCG website, bells and whistles, can wait until the BOINC part is running normally. |
Joined: 17 Nov 16 Posts: 891 |
Asteroids@home is offline due to SSL certificate expiration |
Joined: 8 Mar 23 Posts: 11 |
Tomorrow is Friday. If WCG doesn't fix it by tomorrow afternoon... next chance is Monday. |
Joined: 30 Mar 20 Posts: 423 |
"Tomorrow is Friday." That is true, because as we have learned by now, SHARCNET (Shared Hierarchical Academic Research Computing Network) does not help their customers (at least not WCG) during weekends, evenings, and nights. At least that's what we learned from one of the WCG updates, when they wrote "Unfortunately, data center staff will not be able to help us over the weekend." Very strange data center. They only seem to work during office hours. After office hours, customer systems (or only WCG) obviously will be allowed to crash and burn. Below is an interesting comment from a WCG user, on their FB account: Eric Pohlke writes: I'm lost here David. I've tried to reach out to Krembil several times with no answer. I get responses from their funding partners and Service Providers, but nothing from the WCG. A problem like this would have been solved and a solution applied within 2 days or I would have fired a few people. Building a top end Grid and Rack Server would have taken less time. Swap the controller box out and put a spare in. All data centres that the University Health Network uses have triple redundancy backup. They can't afford to lose their link and data stream with their clients (doctors, specialists, research engineers, etc.). The WCG volunteers should be given the same consideration and respect. There were once over 1,720,000 WCG volunteers, and way before any smartphone app or console. All on their PCs. IBM would share things, seek advice, etc. with their volunteers. Have Mr. J. Bains consider allocating much more resources to the World Community Grid Project. Donald Weaver, the former Director, invited the World Community Grid to Krembil and was so excited about its achievements and the energy its volunteers put forth. Joseph Bains needs to understand the importance and potential of this project. Back in the day, most PC users only had a single core, low yield CPU and Windows 98 to use. 
Today, the home gaming rig has the power of a mini supercomputer. And with commercial hardware, any decent computer tech can build one and knows where to get the parts without taking out a second mortgage. We need this energy back in Krembil. |
Joined: 11 Oct 10 Posts: 13 |
The SHARCNET website clearly states (look under Support) that they are a 0900 to 1700 EST (Canada), 5 day a week operation. No support on weekends. Subtract out lunch breaks, staff meetings, etc., and maybe there is an average of 6 hours of support per day. With this work schedule, it is no surprise it took forever to get WCG up and running. And no surprise it takes forever to get anything fixed. SHARCNET has other users. We (WCG users) have no idea where WCG lies on the priority list of having issues resolved. Maybe WCG is at the bottom of the list. All of this had to be known to Krembil from day 1. So, it is what it is. Just find other projects to use your computer time. No use complaining. Nothing is going to change. |
Joined: 30 Mar 20 Posts: 423 |
Yeah, no support after business hours. Incredible. To the question "How long should I expect to wait for support?" on this page: https://helpwiki.sharcnet.ca/wiki/FAQ, the answer is: "Unfortunately Compute Canada/SHARCNET does not have adequate funding to provide support 24 hours a day, 7 days a week. User support and system monitoring is limited to regular business hours: there is no official support on weekends or holidays, or outside 9:00 - 17:00 EST. Please note that this includes monitoring of our systems and operations, so typically when there are problems overnight or on weekends/holidays system notices will not be posted until the next business day." So, no wonder then that everything, including the migration from IBM, takes such a long time, compared to when WCG was run by IBM. That state of affairs is not going to work in the long run. If there's no support outside of business hours, WCG will slowly fade away. |
Joined: 30 Mar 20 Posts: 423 |
WCG New update, 15 minutes ago: "Update #5: The storage server was revived yesterday late afternoon. Both database filesystems mounted as before, but the science filesystem did not. It needs a repair; erasing the old log first." |
Joined: 3 Mar 23 Posts: 14 |
"Just find other projects to use your computer time. No use complaining. Nothing is going to change." "Came, offended, left." (= Perhaps, before whipping up further hysteria that "everything is lost", wait for this story to end and only THEN draw any conclusions (especially calls to abandon the project)? |
Joined: 29 Aug 05 Posts: 80 |
"as we have learned by now, SHARCNET (Shared Hierarchical Academic Research Computing Network)," SHARCNET has free access to Compute Canada for academic research. https://youtu.be/hWkWAaNBILs?t=146 Free makes sense. I don't see a flow of cash to the project. Limited service makes sense from a free service. It's actually amazing to have any service at all for no charge! After all, somebody (the Canadian taxpayer) is paying for replacement parts and labour and delivery etc... It also makes sense that this system is now overburdened by World Community Grid. It was not set up with the intention to host anything like a huge BOINC project. Good on these people for still trying to help us. They are relentless :) |
Joined: 17 Nov 16 Posts: 891 |
Asteroids@home is back online. |
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.