Thread 'CPDN project offline again'

Message boards : Projects : CPDN project offline again
Alan K

Joined: 16 Dec 16
Posts: 6
United Kingdom
Message 86598 - Posted: 18 Jun 2018, 9:03:09 UTC - in response to Message 86585.  

Looking at the IP addresses, the main CPDN web server and Oxford's own website server are on the same 129.67.242 subnet, whereas the CPDN project server is on the 129.67.195 subnet, so the main CPDN pages will be fine. It's the database server, which I guess is on the .195 subnet, that is the problem.
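
For anyone who wants to check the subnet reasoning themselves, here is a minimal Python sketch. It assumes /24 subnets (the three-octet prefixes quoted above suggest that, but the real netmask at Oxford may differ), and the final octets in the example are made up for illustration; they are not the servers' real addresses.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, prefix: int = 24) -> bool:
    """Return True if both addresses fall in the same subnet of the given prefix length."""
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{ip_a}/{prefix}", strict=False)
    return ipaddress.ip_address(ip_b) in network


# Hypothetical host octets (.10 and .20); only the first three octets come from the post.
print(same_subnet("129.67.242.10", "129.67.242.20"))  # both in 129.67.242.0/24
print(same_subnet("129.67.242.10", "129.67.195.10"))  # .242 versus .195
```

Two machines on different subnets can fail independently, which is consistent with the main pages staying up while the project itself is down.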
ID: 86598
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2682
United Kingdom
Message 86602 - Posted: 18 Jun 2018, 9:40:52 UTC - in response to Message 86598.  

As the project's own message board is currently offline, it would be nice to get some updates we could post here. (It was briefly back online on the 13th.) It would also be nice to have some clue as to the timescale before things come back online. I am sure those of us Andy contacts via an email list from time to time would post updates here. (Last time, in my haste, I posted one that had already been posted!) If there is nothing new from Andy by the end of the month I will give him a nudge. I don't really want to do this too often, as I am sure he has other emails to deal with about getting the project back online, and possibly lots of emails from other projects (not necessarily BOINC ones) that may be affected by the IT problems at Oxford.
ID: 86602
Les Bayliss
Help desk expert
Joined: 25 Nov 05
Posts: 1654
Australia
Message 86605 - Posted: 18 Jun 2018, 12:33:01 UTC

There's a message on the front page.
ID: 86605
mmonnin
Joined: 1 Jul 16
Posts: 146
United States
Message 86606 - Posted: 18 Jun 2018, 14:02:07 UTC - in response to Message 86573.  

There's been NO work for Linux for over a year, so unless you run Wine, you're wasting your time.
And the project is still off line.


There was some right before it went down. I ended up returning work during a period when the server was up, but credit was never issued. Hopefully the results around that time weren't lost.
ID: 86606
Jim1348
Joined: 8 Nov 10
Posts: 310
United States
Message 86607 - Posted: 18 Jun 2018, 14:38:13 UTC - in response to Message 86606.  

There was some right before it went down.

I think those were the ancient reissues, due more to their year-long timeout period, than any new work.
But maybe the outage will give them a chance to re-think Linux too.
ID: 86607
kevin
Joined: 28 Mar 17
Posts: 21
United Kingdom
Message 86608 - Posted: 18 Jun 2018, 15:00:24 UTC - in response to Message 86607.  


But maybe the outage will give them a chance to re-think Linux too.


I hope so, I've another machine that could contribute a few spare cores to it.
ID: 86608
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2682
United Kingdom
Message 86618 - Posted: 19 Jun 2018, 8:53:06 UTC

Thanks Les. I have my browser set up so that I don't see it if I don't scroll down slowly enough. If the project doesn't come back, the message doesn't get updated and nothing appears on the list, I will give it till halfway through July before nudging Andy, but hopefully something will appear before then. Even when there is no information, I struggle between the pull to provide information for the more aggressively vociferous posters on the one hand and, on the other, not wanting to take time away from fixing things or to irritate those who are doing the fixing.
ID: 86618
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15558
Netherlands
Message 86619 - Posted: 19 Jun 2018, 9:32:27 UTC

Andy will be mailing the moderators list later today with an exhaustive list of what was done and what's still to do. Things are looking up.
ID: 86619
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2682
United Kingdom
Message 86620 - Posted: 19 Jun 2018, 13:39:06 UTC - in response to Message 86619.  

And here it is.

Hi All,


Here is an update on the progress of the restoration of the climateprediction.net project. As I have written in a previous email, we had pretty much lost all the main servers of the project, so for the past month the project has been running from the project backup server. As the underlying infrastructure has been progressively restored, we have been able to start rebuilding the servers of the project. Recently we switched off the project on the backup site so that we could take a dump of the database on that machine in order to import it into a new database server.

This work is still ongoing; what I can do at this stage is report the progress made so far and the steps necessary to restore the project. So far we have restored the upload, download and application servers, all three of which are configured and ready to go. The work at present is focusing on the database server. Beyond that, prior to starting up the project again, the slave database will need to be restored.

At present I don't have a definite completion time for this work. My estimate is between Friday (at the earliest) and a point next week; this will all depend on how the work on these two steps goes.


Best regards,

Andy
ID: 86620
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15558
Netherlands
Message 86647 - Posted: 21 Jun 2018, 22:42:50 UTC - in response to Message 86641.  

It's not super secret that list. I'm not a moderator on CPDN yet I manage to receive the emails. Just saying. :)
But as far as I understood their servers aren't real hardware, they solely exist in a VM and that VM has had corruption problems. So how do you run a barebones server in the VM when the VM is corrupt?
ID: 86647
Les Bayliss
Help desk expert
Joined: 25 Nov 05
Posts: 1654
Australia
Message 86658 - Posted: 23 Jun 2018, 1:16:40 UTC

There have been several messages in the middle of the main page over the weeks.

The current one says:

The project is currently undergoing a full
system rebuild because of persistent issues
which includes a new project backup system,
as such the project will remain down to at
least until 15th June.

Several people have complained in recent times on the cpdn Message board about no BOINC message, so I asked Andy about it.
He said that the IT people weren't very familiar with the BOINC stuff, and he, at least, preferred to communicate via email with the moderators.

There are far worse things happening in the world.
ID: 86658
Les Bayliss
Help desk expert
Joined: 25 Nov 05
Posts: 1654
Australia
Message 86661 - Posted: 23 Jun 2018, 19:30:19 UTC

Hi All,

A brief update for you:

Gradually services are being restored. You will see that the www.cpdn.org/cpdnboinc site is now back and the message boards are also back. However, the project is still offline. I am currently working on a non-trivial issue with the Apache configuration of the CGI link; without this link working the project cannot be brought back online.

Best regards,

Andy
ID: 86661
Dr Who Fan
Joined: 10 May 07
Posts: 1440
United States
Message 86662 - Posted: 23 Jun 2018, 21:38:15 UTC - in response to Message 86661.  

Les, Thank you for the updated information!
ID: 86662
Les Bayliss
Help desk expert
Joined: 25 Nov 05
Posts: 1654
Australia
Message 86680 - Posted: 26 Jun 2018, 12:18:30 UTC

Time to move back to the cpdn Message board.
Thread is "Nearly there", in Number crunching.

Main page
ID: 86680
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15558
Netherlands
Message 86682 - Posted: 26 Jun 2018, 12:27:38 UTC
Last modified: 26 Jun 2018, 12:28:25 UTC

Andy Bowers, CPDN admin wrote:
Hi All,

Just to let you all know: the main project has now been restored. I have now rebuilt all the main project servers and services, and re-enabled the project. The project is now running from a new server: 'caerus.oerc'.

The project is currently only running from a master database, there is currently no slave server. The project has ordered a new machine to replace the slave database. Until the slave database machine arrives the project will be taken down on regular occasions to take a dump of the database.

Please let me know if you spot any issues.

Best regards,

Andy

Do know that the project is only available via HTTP for now, while Andy figures out how to re-implement SSL on the main server. Oh and there may be problems connecting to the server at this time.
ID: 86682
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2682
United Kingdom
Message 86685 - Posted: 26 Jun 2018, 13:03:40 UTC

As noted, there are problems connecting to the server at the moment. I thought I might be doing something stupid and missing something, but Les has confirmed that he is also in the "Connection to the internet OK, project servers may be down" camp. My guess is that it shouldn't be too long before things are up and running again, though. The server status page is showing just over 500 tasks ready to send, but unless there are more not showing yet, these will be gone in seconds once everyone can connect again.
ID: 86685
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5126
United Kingdom
Message 86689 - Posted: 26 Jun 2018, 13:29:22 UTC - in response to Message 86685.  

Investigation suggests that the scheduler is declared to require https access, but the scheduling server doesn't have an SSL certificate yet. Andy says he's working on that this afternoon - just one more push!
ID: 86689
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5126
United Kingdom
Message 86694 - Posted: 26 Jun 2018, 16:03:28 UTC

I'm now getting a new message:

26/06/2018 17:01:28 | climateprediction.net | Scheduler request failed: Peer certificate cannot be authenticated with given CA certificates

I'll report that one upstairs too.
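
As far as I know, that wording is what libcurl (which the BOINC client uses for its HTTPS requests) reports when it cannot validate the server's certificate chain against its CA bundle. A rough way to check from outside the client whether a host presents a certificate your own system trusts is sketched below in Python. Treat it as a sketch under assumptions: `check_tls` is a made-up helper name, and Python's default trust store is not necessarily the same CA bundle that BOINC ships, so the two can disagree.

```python
import socket
import ssl
from typing import Tuple


def check_tls(host: str, port: int = 443, timeout: float = 5.0) -> Tuple[bool, str]:
    """Try a TLS handshake and report whether the certificate chain verifies."""
    # The default context verifies the chain and the host name against the system CAs.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True, "certificate chain verified"
    except ssl.SSLCertVerificationError as exc:
        # Raised when the handshake completes far enough to see an untrusted certificate.
        return False, f"certificate not trusted: {exc.verify_message}"
    except (ssl.SSLError, OSError) as exc:
        # Covers DNS failures, refused connections, timeouts and other TLS errors.
        return False, f"connection failed: {exc}"
```

Calling `check_tls("example.org")` (a placeholder host, not the real scheduler address) returns a pair such as `(True, "certificate chain verified")`; a "certificate not trusted" result would be the analogue of the error in the event log above.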
ID: 86694
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15558
Netherlands
Message 86701 - Posted: 26 Jun 2018, 17:58:19 UTC

The problem is with the number of encryption registrations that CPDN has done: the error we now get is due to the main server being registered with an encryption that's outside the limit of the maximum number of encryptions they could do. So it's going to take at least a day for Andy to get this fixed.
ID: 86701
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2682
United Kingdom
Message 86717 - Posted: 27 Jun 2018, 7:55:05 UTC - in response to Message 86701.  

A bit more progress. I was able to attach to the project on my Linux version of BOINC this morning. Updating to try to get work still runs into authentication problems. I have sent the relevant parts of the event log to Andy, so I think we are getting there. There won't be any work for Linux at the moment, but I will switch back to WINE and the Windows version once things are ready, as I don't want to play about detaching the project when I have work to upload from the Windows one.
ID: 86717

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.