Thread '"job cache full" due to indicated unrealistic high remaining runtimes of GPUGRID Python tasks'

Message boards : Questions and problems : "job cache full" due to indicated unrealistic high remaining runtimes of GPUGRID Python tasks

Erich56
Joined: 30 Dec 14
Posts: 95
Austria
Message 109980 - Posted: 4 Oct 2022, 15:07:33 UTC

For about a week, I have had the following problem when trying to crunch GPUGRID Python tasks on one of my hosts, which consists of:

2 CPUs Xeon E5 8-core / 16-HT each.
1 GPU Quadro P5000
128 GB Ramdisk
128 GB system memory

Until about 10 days ago, I ran 2 Pythons simultaneously (with a GPU usage setting of 0.5 in app_config.xml) without any problems.
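
For reference, a setup like the one described here is typically expressed in the project's app_config.xml roughly as follows. This is only a sketch: the app name PythonGPU and the cpu_usage value are assumptions that are not given in the post. The actual app name should be checked in client_state.xml or in the event log.

```xml
<!-- app_config.xml in the GPUGRID project directory (sketch, not the poster's actual file) -->
<app_config>
  <app>
    <!-- Assumed app name; verify against client_state.xml -->
    <name>PythonGPU</name>
    <gpu_versions>
      <!-- 0.5 GPU per task lets two tasks share one GPU -->
      <gpu_usage>0.5</gpu_usage>
      <!-- Assumed value; not stated in the post -->
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

After editing the file, the client has to re-read it (in BOINC Manager: Options, Read config files) before the change takes effect.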

Now, when only 1 Python is running and I push the update button in the BOINC manager to fetch another Python, the BOINC event log tells me that no Pythons are available. That is not the case, though: the server status page shows plenty of unsent tasks available for download, and I can download such tasks on another PC.

So I tried to download tasks from other GPU projects, and in all cases the event log says:
not requesting tasks: don't need (CPU; NVIDIA GPU: job cache full).
In the BOINC computing preferences, I then set "Store at least ... days of work" to the maximum possible value of 10 days, and "Store up to an additional ... days of work" to 10 days as well. However, this did not solve the problem.

There is about 94GB free space on the Ramdisk, and some 150GB free system RAM.

What I then noticed is that "job cache full" is evidently what prevents further downloads:
a running Python, due to its technical nature, shows remaining runtimes of 30 days, 60 days or even more, and this causes "job cache full" :-( This does not reflect reality, though, because a Python finishes after about 24 hours on this host.
Before, as on my other machines, the remaining runtime for Pythons was indicated as 1-2 days.
So at some point something unknown must have happened that made the estimated runtimes jump up so drastically on only this one of my several hosts (in fact, another host with a smaller GPU currently shows, after a runtime of several hours, a remaining runtime of about 24 hours).

I have already posted this problem in the GPUGRID forum; members confirmed that, due to the technical nature of the Python tasks, such unrealistic "remaining time" values are shown in the BOINC manager, so in some cases this is apparently normal for GPUGRID Python tasks.
One member even mentioned 157 remaining days (!) shown on his host (while the tasks in fact finish in well under 24 hours).

With the generous hardware resources of this host, I would like to crunch 2 Python tasks simultaneously in any case (as I am doing on another host with fewer hardware resources).
Can anyone help me get out of this problem? Is there any possibility to tweak the values or the like?
In the GPUGRID forum, I have not received any advice so far.
Ian&Steve C.
Joined: 24 Dec 19
Posts: 229
United States
Message 109986 - Posted: 4 Oct 2022, 19:15:28 UTC - in response to Message 109980.  

Change your cache settings to 1 day + 0.1 day (i.e. store at least 1 day of work, plus up to an additional 0.1 day).
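
The same two values can also be set locally instead of through the BOINC Manager dialog. A minimal sketch, assuming a global_prefs_override.xml placed in the BOINC data directory and containing only these two settings (any other local preferences would need to be kept alongside them):

```xml
<!-- global_prefs_override.xml (sketch): work buffer of 1 day + 0.1 day -->
<global_preferences>
  <!-- "Store at least ... days of work" -->
  <work_buf_min_days>1.0</work_buf_min_days>
  <!-- "Store up to an additional ... days of work" -->
  <work_buf_additional_days>0.1</work_buf_additional_days>
</global_preferences>
```

The client picks the file up after Options, Read local prefs file in BOINC Manager, or after running boinccmd --read_global_prefs_override.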

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.