nanoHUB_at_home has failed every single task -- work time exceeded
Author | Message |
---|---|
Joined: 29 Dec 08 Posts: 14 |
Science United has recently decided to start throwing nanoHUB_at_home work units at my computer. Unfortunately, not a single one of these is processing. All have an estimated time of 3min 47sec, and are timing out after 1h 33min of crunching time. Each work unit has log files similar to the following:
2020-07-10 02:27:29 AM | nanoHUB_at_home | Aborting task 07839998_45_0: exceeded elapsed time limit 5576.75 (86400.00G/15.81G)
2020-07-10 02:27:31 AM | nanoHUB_at_home | Aborting task 07839998_37_0: exceeded elapsed time limit 5576.75 (86400.00G/15.81G)
2020-07-10 02:27:46 AM | nanoHUB_at_home | Computation for task 07839998_45_0 finished
2020-07-10 02:27:46 AM | nanoHUB_at_home | Output file 07839998_45_0_r1958071825_0 for task 07839998_45_0 absent
2020-07-10 02:27:46 AM | nanoHUB_at_home | Computation for task 07839998_37_0 finished
2020-07-10 02:27:46 AM | nanoHUB_at_home | Output file 07839998_37_0_r1960253673_0 for task 07839998_37_0 absent
https://imgur.com/a/fDPBmi4
In-use computing is only able to process about 4 WUs at a time, with the rest showing Waiting for Memory as per my preferences.
Specs listed below for reference:
2020-07-10 02:40:08 AM | | Starting BOINC client version 7.16.7 for windows_x86_64
2020-07-10 02:40:08 AM | | log flags: file_xfer, sched_ops, task
2020-07-10 02:40:08 AM | | Libraries: libcurl/7.47.1 OpenSSL/1.0.2s zlib/1.2.8
2020-07-10 02:40:09 AM | | CUDA: NVIDIA GPU 0: GeForce GTX 1650 (driver version 451.48, CUDA version 11.0, compute capability 7.5, 4096MB, 3327MB available, 3037 GFLOPS peak)
2020-07-10 02:40:09 AM | | OpenCL: NVIDIA GPU 0: GeForce GTX 1650 (driver version 451.48, device version OpenCL 1.2 CUDA, 4096MB, 3327MB available, 3037 GFLOPS peak)
2020-07-10 02:40:09 AM | | Windows processor group 0: 12 processors
2020-07-10 02:40:09 AM | | Processor: 12 AuthenticAMD AMD Ryzen 5 2600 Six-Core Processor [Family 23 Model 8 Stepping 2]
2020-07-10 02:40:09 AM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 svm sse4a osvw skinit wdt tce topx page1gb rdtscp fsgsbase bmi1 smep bmi2
2020-07-10 02:40:09 AM | | OS: Microsoft Windows 10: Core x64 Edition, (10.00.18363.00)
2020-07-10 02:40:09 AM | | Memory: 15.93 GB physical, 21.18 GB virtual
2020-07-10 02:40:09 AM | | Disk: 931.51 GB total, 587.71 GB free
2020-07-10 02:40:09 AM | | Local time is UTC -6 hours
2020-07-10 02:40:09 AM | | No WSL found.
2020-07-10 02:40:09 AM | | VirtualBox version: 6.0.22
2020-07-10 02:40:09 AM | | General prefs: from https://scienceunited.org/ (last modified 26-Jun-2020 02:52:09)
2020-07-10 02:40:09 AM | | Host location: none
2020-07-10 02:40:09 AM | | General prefs: using your defaults
2020-07-10 02:40:09 AM | | Reading preferences override file
2020-07-10 02:40:09 AM | | Preferences:
2020-07-10 02:40:09 AM | | max memory usage when active: 8974.23 MB
2020-07-10 02:40:09 AM | | max memory usage when idle: 14685.10 MB
2020-07-10 02:40:09 AM | | max disk usage: 592.37 GB
2020-07-10 02:40:09 AM | | max CPUs used: 10
2020-07-10 02:40:09 AM | | don't use GPU while active
2020-07-10 02:40:09 AM | | suspend work if non-BOINC CPU load exceeds 35%
2020-07-10 02:40:09 AM | | (to change preferences, visit a project web site or select Preferences in the Manager)
2020-07-10 02:40:09 AM | | Setting up project and slot directories
2020-07-10 02:40:09 AM | | Checking active tasks
2020-07-10 02:40:09 AM | | Using account manager Science United
Since I wrote the above post, I've had more WUs downloaded with the exact same issue. I'm manually aborting hundreds of these as they'll just waste processing time. |
Joined: 8 Nov 10 Posts: 310 |
The "Output file absent" error sometimes means that the disk drive is too slow to access that file. I don't see the problem on my SSDs. A disk write cache (PrimoCache) would also help. Sometimes the newer versions of VirtualBox have similar problems. I always use VBox 5.2.x. https://www.virtualbox.org/wiki/Download_Old_Builds_5_2 |
Joined: 14 Aug 19 Posts: 55 |
First, I suggest you dump Science United and use the standard BOINC manager. That way you will have full control over what's running on your machines. You could then detach from the project permanently or set the project to No New Tasks until the problem is solved. This sounds to me like the work is never actually running. If you are successfully running other VirtualBox work, then this problem needs to be reported to the project. I'd at least check their forums for reports of problems / solutions. If you aren't running other VB work, you should check the excellent LHC guide for VB work to make sure your machine is set up correctly. The info about specific VB versions is a bit dated but the information overall is good. I also like to enable the VB window when I have a problem; I can usually see what it is from the log that's shown there. To do that, go to your cc_config file and set vbox_window to 1: <vbox_window>1</vbox_window> |
Joined: 5 Oct 06 Posts: 5128 |
The "Output file absent" error simply means that the science application finished - either by failure, or by forced closure - before it had time to write out the scientific result. I've never seen the minuscule difference in timing between a classic hard disk and an SSD cause this error - either the scientific data is present, or it isn't. More likely, the nanohub application isn't really making any real progress. BOINC invents a 'pseudoprogress' to reassure the user that something is happening, when in reality the science has got stuck. That possibility should be checked, but (for me) not at this time of night. I'll take a look in the morning. |
Joined: 8 Nov 19 Posts: 718 |
Large-RAM WUs usually run via Docker or a VM. If your OS is Windows, it may have to run Linux through an emulation layer. If you don't have enough RAM and the system is reading from swap, the WU may process very slowly. |
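For anyone who wants to check the numbers in the first post, here is a minimal Python sketch that pulls the abort lines out of the log and converts the elapsed time limit into hours and minutes. The log text is copied from that post; the function names (`parse_aborts`, `format_hms`) and the regular expression are illustrative assumptions, not part of BOINC. It only confirms that the 5576.75 second limit is the "1h 33min" the poster saw, and compares it with the 3min 47sec estimate; it says nothing about why the tasks make no progress.

```python
import re

# Abort lines copied from the first post in this thread.
LOG = """\
2020-07-10 02:27:29 AM | nanoHUB_at_home | Aborting task 07839998_45_0: exceeded elapsed time limit 5576.75 (86400.00G/15.81G)
2020-07-10 02:27:31 AM | nanoHUB_at_home | Aborting task 07839998_37_0: exceeded elapsed time limit 5576.75 (86400.00G/15.81G)
"""

# Hypothetical pattern written for the lines above; other client versions
# may word the message differently.
ABORT_RE = re.compile(
    r"Aborting task (?P<task>[^\s:]+): "
    r"exceeded elapsed time limit (?P<limit>[\d.]+)"
)


def parse_aborts(log_text):
    """Return a list of (task_name, limit_in_seconds) for each abort line."""
    return [
        (m.group("task"), float(m.group("limit")))
        for m in ABORT_RE.finditer(log_text)
    ]


def format_hms(seconds):
    """Format a duration in seconds as 'Hh MMm SSs', dropping fractions."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


# Estimated run time reported in the first post: 3min 47sec.
ESTIMATE_SECONDS = 3 * 60 + 47

if __name__ == "__main__":
    for task, limit in parse_aborts(LOG):
        ratio = limit / ESTIMATE_SECONDS
        print(f"{task}: limit {format_hms(limit)}, "
              f"about {ratio:.1f}x the estimated run time")
```

A task that runs for roughly 25 times its estimate before being aborted fits the suggestion above that the work is never really progressing, rather than merely running a little slow.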
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.