Message boards : GPUs : Specifications for NVidia RTX 30x0 range?
Author | Message |
---|---|
Joined: 5 Oct 06 Posts: 5128 |
May I ask you hardware enthusiasts to double-check my thoughts on running BOINC on the new RTX 3070/3080/3090 range? I've been studying the "NVIDIA A100 Tensor Core GPU Architecture" and "NVIDIA Ampere GA102 GPU Architecture" whitepapers. BOINC uses the number of CUDA cores per SM, and a flops multiplier, to estimate the GPU's peak speed. My reading is that the GA102 (and above, but not the A100) benefits both from an increase from 64 to 128 cores per SM and from the ability to process two FP32 streams concurrently. So I think that the current v7.16.11 BOINC client will rate the new cards at one-quarter of the flops reported by other tools. Can anybody confirm that? If it's true, I'll code a patch for the next release of BOINC. |
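For reference, the per-SM difference between the Ampere variants can be sketched as a lookup keyed on CUDA compute capability. This is a minimal illustration, not BOINC's actual code: the table and helper names are hypothetical, the values are NVIDIA's published FP32 cores per SM (GA102/GA104 report cc 8.6, the A100 reports cc 8.0), and the SM counts in the usage lines are assumed from NVIDIA's specs.

```python
# Hypothetical sketch, not BOINC source: FP32 CUDA cores per SM, keyed by
# CUDA compute capability (major, minor), per NVIDIA's published figures.
FP32_CORES_PER_SM = {
    (7, 0): 64,   # Volta (V100, Titan V)
    (7, 5): 64,   # Turing (RTX 20x0)
    (8, 0): 64,   # Ampere GA100 (A100)
    (8, 6): 128,  # Ampere GA102/GA104 (RTX 3070/3080/3090)
}


def cores_per_sm(major: int, minor: int, fallback: int = 64) -> int:
    """FP32 cores per SM; devices missing from the table get `fallback`."""
    return FP32_CORES_PER_SM.get((major, minor), fallback)


def total_fp32_cores(major: int, minor: int, sm_count: int) -> int:
    """Total FP32 CUDA cores = cores per SM x number of SMs."""
    return cores_per_sm(major, minor) * sm_count


# Assumed SM counts from NVIDIA's specs: RTX 3080 = 68 SMs, A100 = 108 SMs.
print(total_fp32_cores(8, 6, 68))   # RTX 3080 -> 8704
print(total_fp32_cores(8, 0, 108))  # A100 -> 6912
```

A device missing from a table like this silently falls back to 64 cores per SM, which is one way a client could undercount a new architecture.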
Joined: 17 Nov 16 Posts: 890 |
I'll ask in the OCN forums whether anyone has actually managed to snag a 3080. Not many have. The ones that have are shipping them off to GPU block manufacturers, who measure them for new blocks in return for a free block. So all they did was verify that the card was not a dud and ship it off. No chance of asking whether anyone has actually run a BOINC project on one yet. |
Joined: 8 Nov 19 Posts: 718 |
I guess The Collatz Conjecture might make good use of these GPUs. Its tasks generally load the GPU (an RTX 2080 Ti) to 100% on my systems, and the 2080 Ti should be just under a 3080 in terms of performance. |
Joined: 5 Oct 06 Posts: 5128 |
The Collatz comparison doesn't really help to answer my question. Collatz will be especially well served by the previous Volta and Turing chipsets, because of their additional, independent, pathway for INT32 calculations. The Ampere chipset makes that extra pathway available for FP32 calculations too, which makes it more widely suitable for the type of research that BOINC is designed to support. |
Joined: 17 Nov 16 Posts: 890 |
I found an RTX 3080 running Einstein on the Gamma Ray and Gravity Wave GPU applications. https://einsteinathome.org/host/12850228/tasks/0/40 I've already sent the owner a PM asking for the Event Log startup messages with the GPU detection output. The host is also running the stock Windows apps, so my question about whether a new app would be needed got answered: it seems the apps developed for the Turing-Volta transition still work. The host ran the GR tasks in 360 seconds, compared to 460 seconds on my RTX 2080. So 27% faster on the GR tasks compared to Turing. |
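A "percent faster" figure depends on which way the ratio is taken. A small worked example using the run times quoted above (460 s on the RTX 2080, 360 s on the RTX 3080); the helper name is illustrative only:

```python
def speedup(old_seconds: float, new_seconds: float) -> tuple[float, float]:
    """Return (throughput gain, run-time reduction) as fractions."""
    throughput_gain = old_seconds / new_seconds - 1.0  # extra tasks per hour
    time_reduction = 1.0 - new_seconds / old_seconds   # shorter time per task
    return throughput_gain, time_reduction


gain, cut = speedup(460, 360)
print(f"{gain:.1%} more tasks per hour, {cut:.1%} less time per task")
# -> 27.8% more tasks per hour, 21.7% less time per task
```

So "27% faster" matches the throughput view; measured as time per task, the saving is closer to 22%.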
Joined: 5 Oct 06 Posts: 5128 |
Thanks for that - I've looked through some of the logs, and it all checks out (like the 10 GB VRAM shown in stderr.txt, against the 4 GB recognised by BOINC; that's for another day). Remember that this is an OpenCL app, and relies on the efficiency of the OpenCL translation layer in using the new CUDA functions. And it wastes something like 14 seconds at the end on pure CPU work, so we can't put too much reliance on the speed ratio. Also, the host only ran for one day, which looks like a burn-in test. Now, if we could just find a CUDA example at GPUGrid... |
Joined: 17 Nov 16 Posts: 890 |
Well, I fully expected the existing app to fail. It failed on the Pascal >> Turing transition, and that was also an OpenCL app: the OpenCL layer didn't handle the change in the number of CUDA cores per SM, and the compute capability (CC) was out of range. You still don't have the GPU detection output with the BOINC-calculated GFLOPS rating you need. Yes, they also ran for a day at Milkyway. Not so impressive there. Twice as fast at Primegrid compared to an RTX 2080 Ti running a CUDA app. https://www.primegrid.com/results.php?hostid=1023381 |
Joined: 5 Oct 06 Posts: 5128 |
"Twice as fast at Primegrid compared to an RTX 2080 Ti running a CUDA app." That's more surprising. I'd have expected the speed increase to be smaller, because the RTX 2080 Ti can use its INT32 pathway, which it (probably; I'm not fully knowledgeable on Einstein's maths) couldn't use at Einstein. |
Joined: 17 Nov 16 Posts: 890 |
I'm thinking it is because the Primegrid app is CUDA and not OpenCL so probably more optimized for the architecture. Really want to find one on GPUGrid. |
Joined: 17 Nov 16 Posts: 890 |
Also not a card family to deploy at Milkyway. Nvidia halved the FP64 rate again, from 1:32 to 1:64 of the FP32 rate. Still trying to push compute users toward their pro line of products ($$). |
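To put the 1:64 ratio in numbers for a double-precision project like Milkyway, a minimal sketch: FP64 peak is simply the FP32 peak divided by the ratio. The input is SIV's 29,768 GFLOPS FP32 figure for the RTX 3080 reported in this thread; the function name is illustrative.

```python
def fp64_peak_gflops(fp32_peak_gflops: float, fp32_to_fp64_ratio: int) -> float:
    """FP64 peak for a card whose FP64 rate is 1/ratio of its FP32 rate."""
    return fp32_peak_gflops / fp32_to_fp64_ratio


RTX3080_FP32_GFLOPS = 29_768  # SIV's FP32 peak figure quoted in this thread

print(fp64_peak_gflops(RTX3080_FP32_GFLOPS, 64))  # 1:64 -> 465.125
print(fp64_peak_gflops(RTX3080_FP32_GFLOPS, 32))  # had it kept 1:32 -> 930.25
```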
Joined: 5 Oct 06 Posts: 5128 |
I've received reliable reports that BOINC shows 14,884 GFLOPS peak for the RTX 3080, and SIV shows 29,768 - exactly double. Since we use different API calls for getting the shader count, that'll be the difference - SIV will be right, and us wrong. That leaves the question of the doubled FP32 pipeline unresolved. That may require direct experimentation on the hardware - I'm thinking possibly including running two tasks in parallel. |
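The "exactly double" result can be reproduced with the usual peak-FLOPS formula. This is a sketch, not BOINC's or SIV's code; it assumes NVIDIA's reference RTX 3080 figures (68 SMs, 1710 MHz boost clock) and 2 FLOPs per core per clock for a fused multiply-add. Counting 64 cores per SM gives about 14,884 GFLOPS, and counting 128 gives about 29,768, matching the two reported values.

```python
def peak_gflops(sm_count: int, cores_per_sm: int, clock_mhz: int,
                flops_per_core_per_clock: int = 2) -> float:
    """Peak single-precision GFLOPS: SMs x cores/SM x FLOPs/clock x clock."""
    # Each core can issue one fused multiply-add per clock, counted as 2 FLOPs.
    return sm_count * cores_per_sm * flops_per_core_per_clock * clock_mhz / 1000.0


SMS, BOOST_MHZ = 68, 1710  # assumed RTX 3080 reference specs

low = peak_gflops(SMS, 64, BOOST_MHZ)    # counting 64 cores per SM
high = peak_gflops(SMS, 128, BOOST_MHZ)  # counting 128 cores per SM
print(low, high, high / low)             # -> 14883.84 29767.68 2.0
```

Since every other factor is identical, the factor of two comes entirely from the cores-per-SM count.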
Joined: 17 Nov 16 Posts: 890 |
"I've received reliable reports that BOINC shows 14,884 GFLOPS peak for the RTX 3080, and SIV shows 29,768 - exactly double." How do you want to set that experiment up? What parameters are you looking for? |
Joined: 5 Oct 06 Posts: 5128 |
"How do you want to set that experiment up? What parameters are you looking for?" This is just for baseline BOINC users, not fancy optimisers. Ideally a single 30x0 card, in a host with plenty of power and cooling (so nothing gets throttled). Run a known - preferably CUDA - app for long enough to get a good idea of performance. Slap in an app_config.xml file with <gpu_usage>.5</gpu_usage>, and record what happens. |
Joined: 17 Nov 16 Posts: 890 |
"Slap in an app_config.xml file with <gpu_usage>.5</gpu_usage>, and record what happens." OK, I will ask Till to run his RTX 3080 at Primegrid with an app_config.xml set to 0.5 GPU usage. That is a CUDA application. |
Joined: 17 Nov 16 Posts: 890 |
Till set up an app_config.xml, but reports that it is still running tasks singly. This is his file: <app_config> <app> <name>pps_sr2sieve</name> <gpu_versions> <gpu_usage>0.5</gpu_usage> </gpu_versions> </app> </app_config> I don't see anything wrong with the syntax. He reports no errors on startup, and the app_config is read. It just doesn't run tasks as doubles. I found posts in Primegrid/Number Crunching that show times for running doubles with the PPSieve app, so it should work. |
Joined: 5 Oct 06 Posts: 5128 |
Might be wise to throw in a <cpu_usage> line for completeness. It's not marked as optional in the manual. |
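For completeness, here is a sketch that writes Till's file with the <cpu_usage> line added alongside <gpu_usage> inside <gpu_versions>. The app name and 0.5 GPU share come from the thread; the 1.0 CPU value and the helper name are assumptions for illustration. The output goes in the project's folder under the BOINC data directory as app_config.xml.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, gpu_usage: float, cpu_usage: float) -> str:
    """Return app_config.xml text with both usage lines under <gpu_versions>."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    # Fraction of a GPU each task claims: 0.5 lets two tasks share one card.
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    # CPUs (possibly fractional) each GPU task counts as using; 1.0 is assumed.
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


print(build_app_config("pps_sr2sieve", 0.5, 1.0))
```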
Joined: 17 Nov 16 Posts: 890 |
Till responds: "With the cpu_usage added it starts working immediately... BOINC is sometimes a strange piece of software." So it doesn't look like the app is using the second FP32 pipeline; it's responding like previous generations, where running two at once typically takes a little less than double the single-task time. So the card might be slightly more productive running doubles than singles, at the expense of using a lot more power. |
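The trade-off can be made concrete: running two tasks together only beats singles if each task in the pair finishes in less than twice the single-task time. A minimal sketch; the function name is illustrative, and the run times in the usage line are hypothetical, not Till's measurements.

```python
def concurrency_gain(single_seconds: float, concurrent_seconds: float,
                     tasks_at_once: int = 2) -> float:
    """Fractional throughput gain from running tasks_at_once tasks together.

    single_seconds: run time of one task running alone.
    concurrent_seconds: run time of each task when tasks_at_once share the GPU.
    """
    single_rate = 1.0 / single_seconds                    # tasks per second, alone
    concurrent_rate = tasks_at_once / concurrent_seconds  # tasks per second, together
    return concurrent_rate / single_rate - 1.0


# Hypothetical times: 100 s alone, 180 s each when run as a pair.
print(f"{concurrency_gain(100, 180):.1%}")  # -> 11.1%
```

Comparing that gain against the extra power draw gives the productivity-versus-power judgement made above.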
Joined: 5 Mar 08 Posts: 272 |
"...doesn't look like the app is using the second FP32 pipeline." Maybe it needs to be recompiled with the latest CUDA toolkit to take advantage of the additional pipeline. CUDA 11.1 was just released with support for RTX 30 series cards. Phoronix article here. MarkJ |
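As a rough illustration of why a rebuild might be needed: each compute capability first became a compilation target in a particular CUDA toolkit release. The table below is a hypothetical helper, assumed from NVIDIA's CUDA release history (sm_70 in CUDA 9.0, sm_75 in 10.0, sm_80 in 11.0, sm_86 in 11.1); it is not part of any project's code.

```python
# Assumed: first CUDA toolkit (major, minor) that can target each compute capability.
MIN_TOOLKIT_FOR_ARCH = {
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere GA100 (A100)
    (8, 6): (11, 1),  # Ampere GA102/GA104 (RTX 30x0)
}


def toolkit_knows_arch(toolkit: tuple[int, int], cc: tuple[int, int]) -> bool:
    """True if an app built with `toolkit` could compile for compute capability `cc`."""
    needed = MIN_TOOLKIT_FOR_ARCH.get(cc)
    if needed is None:
        return False  # not in this table: treat as unsupported
    return toolkit >= needed  # tuples compare element by element


print(toolkit_knows_arch((10, 2), (8, 0)))  # pre-Ampere build, A100 -> False
print(toolkit_knows_arch((11, 1), (8, 6)))  # CUDA 11.1, RTX 3080 -> True
```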
Joined: 17 Nov 16 Posts: 890 |
Looks like we will need new apps for Ampere at GPUGrid. Tasks fail on A100 cards with an NVRTC error about an unknown architecture. |
Joined: 5 Oct 06 Posts: 5128 |
Found the thread, and saw the error message in the results. Yup, that's a show-stopper, even though the A100 card is only cc 8.0. Meanwhile, I've submitted #4031 to deal with the flops display. |
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.