5.1.2012 | Multi-GPU CUDA stress test |
Update 16-03-2020: Versions 1.1 and up support tensor cores.
Update 30-11-2016: Versions 0.7 and up also report benchmark figures (Gflop/s).
I work with GPUs a lot and have seen them fail in a variety of ways: memory or cores overclocked too far at the factory, instability when hot, instability when cold (not kidding), partially unreliable memory, and so on. What's more, failing GPUs often fail silently: when they are just slightly unstable they produce incorrect results, and I have seen such GPUs consistently produce correct results in some applications and incorrect results in others.
What I needed in my toolbox was a stress test for multi-GPGPU setups that uses all of the GPUs' memory and checks the results while keeping the GPUs burning. There are not many tools that can do this, let alone for Linux, so I hacked together my own. It runs on Linux and uses the CUDA driver API.
My program forks one process for each GPU on the machine, one process for keeping track of the GPU temperatures where available (e.g. Fermi Teslas don't have temperature sensors), and one process for reporting progress. Each GPU process allocates 90% of the free GPU memory, initializes two random 2048*2048 matrices, and continuously runs efficient CUBLAS matrix-matrix multiplication routines on them, storing the results across the allocated memory. Both floats and doubles are supported. Correctness is checked by comparing each new result against a previous one -- on the GPU. This way the GPUs are 100% busy all the time while the CPUs stay idle. The number of erroneous calculations is brought back to the CPU and reported to the user along with the number of operations performed so far and the GPU temperatures.
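To make the above concrete, here is a heavily stripped-down, single-GPU sketch of the idea, assuming nothing beyond standard cuBLAS and CUDA runtime calls. The real tool uses the CUDA driver API, one process per GPU, and fills 90% of the memory with result slots; the matrix size comes from the description above, while the tolerance, iteration count, and names below are only illustrative:

#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

#define SIZE 2048

// Counts result elements that differ from the reference result on the GPU,
// so the host only has to read back a single integer.
__global__ void compareKernel(const float *ref, const float *res, int n, int *faulty) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && fabsf(ref[i] - res[i]) > 1e-3f * fabsf(ref[i]))
        atomicAdd(faulty, 1);
}

int main() {
    const size_t elems = (size_t)SIZE * SIZE;
    std::vector<float> hostA(elems), hostB(elems);
    for (size_t i = 0; i < elems; ++i) { hostA[i] = (float)drand48(); hostB[i] = (float)drand48(); }

    float *dA, *dB, *dRef, *dRes; int *dFaulty;
    cudaMalloc(&dA, elems * sizeof(float));
    cudaMalloc(&dB, elems * sizeof(float));
    cudaMalloc(&dRef, elems * sizeof(float));
    cudaMalloc(&dRes, elems * sizeof(float));
    cudaMalloc(&dFaulty, sizeof(int));
    cudaMemcpy(dA, hostA.data(), elems * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hostB.data(), elems * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dFaulty, 0, sizeof(int));

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;

    // Reference product, computed once; every later product should match it.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, SIZE, SIZE, SIZE,
                &alpha, dA, SIZE, dB, SIZE, &beta, dRef, SIZE);

    for (int iter = 0; iter < 1000; ++iter) {
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, SIZE, SIZE, SIZE,
                    &alpha, dA, SIZE, dB, SIZE, &beta, dRes, SIZE);
        compareKernel<<<(int)((elems + 255) / 256), 256>>>(dRef, dRes, (int)elems, dFaulty);
    }

    int errors = 0;
    cudaMemcpy(&errors, dFaulty, sizeof(int), cudaMemcpyDeviceToHost);
    printf("faulty elements across all iterations: %d\n", errors);

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dRef); cudaFree(dRes); cudaFree(dFaulty);
    return 0;
}

Something like "nvcc -o burn_sketch burn_sketch.cu -lcublas" compiles it; the point is only to show the busy-loop-plus-on-GPU-comparison structure, not to replace the tool.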
Real-time progress and summaries at roughly every 10% are printed as shown below. Matrices processed are cumulative, whereas errors are per summary interval. Per-GPU values are printed side by side, separated by dashes (slashes in older versions). The program exits with a conclusion after it has been running for the number of seconds given as the last command line parameter. If you want to burn using doubles instead, give the parameter "-d" before the burn duration. The example below was run on a machine with one working GPU and one faulty GPU (too much factory overclocking and thus slightly unstable; you wouldn't have noticed it during gaming):
% ./gpu_burn 120
GPU 0: GeForce GTX 1080 (UUID: GPU-f998a3ce-3aad-fa45-72e2-2898f9138c15)
GPU 1: GeForce GTX 1080 (UUID: GPU-0749d3d5-0206-b657-f0ba-1c4d30cc3ffd)
Initialized device 0 with 8110 MB of memory (7761 MB available, using 6985 MB of it), using FLOATS
Initialized device 1 with 8113 MB of memory (7982 MB available, using 7184 MB of it), using FLOATS
10.8%  proc'd: 3472 (4871 Gflop/s) - 3129 (4683 Gflop/s)   errors: 0 - 0   temps: 56 C - 56 C
    Summary at: Mon Oct 31 10:32:22 EET 2016
22.5%  proc'd: 6944 (4786 Gflop/s) - 7152 (4711 Gflop/s)   errors: 0 - 0   temps: 61 C - 60 C
    Summary at: Mon Oct 31 10:32:36 EET 2016
33.3%  proc'd: 10850 (4843 Gflop/s) - 10728 (4633 Gflop/s)   errors: 2264 (WARNING!) - 0   temps: 63 C - 61 C
    Summary at: Mon Oct 31 10:32:49 EET 2016
44.2%  proc'd: 14756 (4861 Gflop/s) - 13857 (4675 Gflop/s)   errors: 1703 (WARNING!) - 0   temps: 66 C - 63 C
    Summary at: Mon Oct 31 10:33:02 EET 2016
55.0%  proc'd: 18228 (4840 Gflop/s) - 17433 (4628 Gflop/s)   errors: 3399 (WARNING!) - 0   temps: 69 C - 65 C
    Summary at: Mon Oct 31 10:33:15 EET 2016
66.7%  proc'd: 22134 (4824 Gflop/s) - 21009 (4652 Gflop/s)   errors: 3419 (WARNING!) - 0   temps: 70 C - 65 C
    Summary at: Mon Oct 31 10:33:29 EET 2016
77.5%  proc'd: 25606 (4844 Gflop/s) - 25032 (4648 Gflop/s)   errors: 5715 (WARNING!) - 0   temps: 71 C - 66 C
    Summary at: Mon Oct 31 10:33:42 EET 2016
88.3%  proc'd: 29078 (4835 Gflop/s) - 28161 (4602 Gflop/s)   errors: 7428 (WARNING!) - 0   temps: 73 C - 67 C
    Summary at: Mon Oct 31 10:33:55 EET 2016
100.0%  proc'd: 33418 (4752 Gflop/s) - 32184 (4596 Gflop/s)   errors: 9183 (WARNING!) - 0   temps: 74 C - 68 C
Killing processes.. done
Tested 2 GPUs:
    GPU 0: FAULTY
    GPU 1: OK
With this tool I've been able to spot unstable GPUs that performed well under every other load they were subjected to. So far it has also never missed a GPU that was known to be unstable. *knocks on wood*
Grab it from GitHub https://github.com/wilicc/gpu-burn or below:
gpu_burn-0.4.tar.gz
gpu_burn-0.6.tar.gz (compatible with nvidia-smi and nvcc as of 04-12-2015)
gpu_burn-0.8.tar.gz (includes benchmark, Gflop/s)
gpu_burn-0.9.tar.gz (compute profile 30, compatible w/ CUDA 9)
gpu_burn-1.0.tar.gz (async compare, CPU no longer busy)
gpu_burn-1.1.tar.gz (tensor core support)
Build and burn with floats for an hour: make && ./gpu_burn 3600
If you're running a Tesla, burning with doubles instead stresses the card more (as was kindly pointed out to me in the comments by Rick W): make && ./gpu_burn -d 3600
If you have a Turing (or newer) architecture, you might want to benchmark the Tensor cores as well: ./gpu_burn -tc 3600 (Thanks Igor Moura!)
You might have to point the Makefile to your CUDA installation if it's not in the default path, and to a gcc version your nvcc can work with. The program expects to find nvidia-smi in your default path.
Hi, I was trying to use your tool to stress test one of our older CUDA Systems (Intel(R) Core(TM)2 Q9650, 8 GiB Ram, GTX 285 Cards). When I run the tool I get the following output: ./gpu_burn 1 GPU 0: GeForce GTX 285 (UUID: N/A) Initialized device 0 with 1023 MB of memory (967 MB available, using 871 MB of it) Couldn't init a GPU test: Error in "load module": CUDA_ERROR_NO_BINARY_FOR_GPU 100.0% proc'd: 0 errors: 164232 (WARNING!) temps: 46 C Summary at: Tue Mar 27 16:24:16 CEST 2012 100.0% proc'd: 0 errors: 354700 (WARNING!) temps: 46 C Killing processes.. done Tested 1 GPUs: 0: FAULTY I guess the card is not properly supported, it is at least weird that proc'd is always 0. Any hints on that?- Ulli Brennenstuhl
Well... Could have figured that out faster. Had to make a change in the makefile, as the GTX 285 cards only have compute capability 1.3 (-arch=compute_13).- Ulli Brennenstuhl
Hi, gpu_burn looks like exactly what I'm looking for! I want to run it remotely (over SSH) on a machine I've just booted off a text-mode-only Live CD (PLD Linux RescueCD 11.2). What exactly do I have to install on the remote system for it to run? A full-blown X installation? Or is it enough to copy over my NVIDIA driver kernel module file and a few libraries (perhaps to a LD_LIBRARY_PATH'ed dir)? I would like to install as little as possible on that remote machine... Thanks in advance for any hints.- durval
Hi! X is definitely not a prerequisite. I haven't got a clean Linux installation at hand right now so I'm unable to confirm this, but I think you need:
From my application:
gpu_burn (the binary)
compare.ptx (the compiled result-comparing kernel)
And then you need the nvidia kernel module loaded, which has to match the running kernel's version:
nvidia.ko
Finally, the gpu_burn binary is linked against these libraries from the CUDA toolkit, which should be found from LD_LIBRARY_PATH in your case:
libcublas.so
libcudart.so (dependency of libcublas.so)
And the CUDA library that is installed by the nvidia kernel driver installer:
libcuda.so
Hope this helps, and sorry for the late reply :-)- wili
How can I solve "Couldn't init a GPU test: Error in "load module": CUDA_ERROR_FILE_NOT_FOUND"? Although I specified the absolute path, this always shows me this error message. Can you tell me the reason? Run length not specified in the command line. Burning for 10 secs GPU 0: Quadro 400 (UUID: GPU-d2f766b0-8edd-13d6-710c-569d6e138412) GPU 1: GeForce GTX 690 (UUID: GPU-e6df228c-b7a7-fde5-3e08-d2cd3485aed7) GPU 2: GeForce GTX 690 (UUID: GPU-b70fd5b0-129f-4b39-f1ff-938cbad4ed26) Initialized device 0 with 2047 MB of memory (1985 MB available, using 1786 MB of it) Couldn't init a GPU test: Error in "load module": CUDA_ERROR_FILE_NOT_FOUND 0.0% proc'd: 4236484 / 0 / 0 errors: 0 / 0 / 0 temps: 63 C / 42 C / 40 C ... ... ... 100.0% proc'd: 1765137168 / -830702792 / 1557019224 errors: 0 / 0 / 0 temps: 62 C / 38 C / 36 C 100.0% proc'd: 1769373652 / -826466308 / 1561255708 errors: 0 / 0 / 0 temps: 62 C / 38 C / 36 C 100.0% proc'd: 1773610136 / -822229824 / 1565492192 errors: 0 / 0 / 0 temps: 62 C / 38 C / 36 C 100.0% proc'd: 1777846620 / -817993340 / 1569728676 errors: 0 / 0 / 0 temps: 62 C / 38 C / 36 C Killing processes.. done Tested 3 GPUs: 0: OK 1: OK 2: OK- Encheon Lim
Hi, Did you try to run the program in the compilation directory? The program searches for the file "compare.ptx" in the current work directory, and gives that error if it's not found. The file is generated during "make".- wili
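For reference, the lookup that fails here boils down to a plain cuModuleLoad() call, which resolves its argument relative to the process's working directory. A minimal driver-API sketch (only the CUDA calls and error names are real; the rest is illustrative):

#include <cuda.h>
#include <cstdio>

int main() {
    cuInit(0);
    CUdevice dev;
    CUcontext ctx;
    CUmodule mod;
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);

    // cuModuleLoad() takes a file name, so "compare.ptx" is looked up
    // relative to the directory the process was started from.
    CUresult res = cuModuleLoad(&mod, "compare.ptx");
    if (res == CUDA_ERROR_FILE_NOT_FOUND)
        fprintf(stderr, "compare.ptx not found in the current working directory\n");
    else if (res != CUDA_SUCCESS)
        fprintf(stderr, "load module failed with CUresult %d\n", (int)res);
    else
        printf("compare.ptx loaded\n");

    cuCtxDestroy(ctx);
    return 0;
}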
Hi wili, did You try Your program on K20 cards? I see errors very often, also on GPUs which are ok. I cross-checked with codes developed in our institute and all of our cards run fine, but gpu_burn gives errors (not in each run, but nearly). Do You have an idea? Regards, Henrik- Henrik
Hi Henrik, Right now I have only access to one K20. I've just ran multiple burns on it (the longest one 6 hours), and have not had a single error (CUDA 5.0.35, driver 310.19). I know this is not what you would like to hear, but given that I'm using proven-and-tested CUBLAS to perform the calculations, there should be no errors on fully stable cards. How hot are the cards running? The K20 that I have access to heats up to exactly 90'C. One thing I have also noticed is that some cards might work OK on some computers but not on other (otherwise stable) computers. So IF the errors are real, they might not be the K20s' fault entirely. Regardless, I've made a small adjustment to the program about how results are being compared (more tolerant with regards to rounding); you might want to test this version out. Best regards,- wili
Hi wili, Currently we have lots of GPU issues on "fallen off the bus". It is interesting to run this test. When I put in 10min, the job will continue run until it reachs 100%. Then it hang with "Killing processes.." though no errors been counted. From the /var/log/messages, GPU fallen off the bus again when the test ran for 5min. Here are the details: ->date ; ./gpu_burn 600 ; date Fri Mar 22 15:15:53 EST 2013 GPU 0: Tesla K10.G1.8GB (UUID: GPU-af90ada7-7ce4-ae5c-bd28-0ef1745a3ad0) GPU 1: Tesla K10.G1.8GB (UUID: GPU-a8b75d1d-a592-6f88-c781-65174986329c) Initialized device 0 with 4095 MB of memory (4028 MB available, using 3625 MB of it) Initialized device 1 with 4095 MB of memory (4028 MB available, using 3625 MB of it) 10.3% proc'd: 18984 / 18080 errors: 0 / 0 temps: 87 C / 66 C Summary at: Fri Mar 22 15:16:56 EST 2013 20.3% proc'd: 32544 / 37064 errors: 0 / 0 temps: 95 C / 76 C Summary at: Fri Mar 22 15:17:56 EST 2013 30.7% proc'd: 39776 / 56048 errors: 0 / 0 temps: 96 C / 81 C Summary at: Fri Mar 22 15:18:58 EST 2013 40.8% proc'd: 44296 / 74128 errors: 0 / 0 temps: 97 C / 85 C Summary at: Fri Mar 22 15:19:59 EST 2013 51.7% proc'd: 47912 / 91304 errors: 0 / 0 temps: 98 C / 88 C Summary at: Fri Mar 22 15:21:04 EST 2013 62.0% proc'd: 47912 / 91304 errors: 0 / 0 temps: 98 C / 88 C Summary at: Fri Mar 22 15:22:06 EST 2013 72.3% proc'd: 47912 / 91304 errors: 0 / 0 temps: 98 C / 88 C Summary at: Fri Mar 22 15:23:08 EST 2013 82.5% proc'd: 47912 / 91304 errors: 0 / 0 temps: 98 C / 88 C Summary at: Fri Mar 22 15:24:09 EST 2013 92.8% proc'd: 47912 / 91304 errors: 0 / 0 temps: 98 C / 88 C Summary at: Fri Mar 22 15:25:11 EST 2013 100.0% proc'd: 47912 / 91304 errors: 0 / 0 temps: 98 C / 88 C Killing processes.. ->grep fallen /var/log/messages Mar 22 15:20:53 sstar105 kernel: NVRM: GPU at 0000:06:00.0 has fallen off the bus. Mar 22 15:20:53 sstar105 kernel: NVRM: GPU at 0000:06:00.0 has fallen off the bus. Mar 22 15:20:53 sstar105 kernel: NVRM: GPU at 0000:05:00.0 has fallen off the bus. What does that mean? GPU stop function because the high temperatures? Please advise. Thanks.- runny
Hi runny, Yeah this is very likely the case. It looks like the cards stop crunching data halfway through the test, when the first one hits 98 C. This is a high temperature and Teslas should shut down automatically at roughly 100 C. Also, I've never seen "has fallen off the bus" being caused by other issues than overheating (have seen it due to overheating twice). Best regards,- wili
Hi Wili, Thanks for your reply. I changed the GPU clock and ran your program for 10 min, and it passed the test this time. The temperature reached 97 C. The node may survive a GPU crash but will sacrifice performance. Do you know if there is a way to avoid this happening from the programmer's side? Many thanks,- runny
Hello, based on the following output, can i say that my graphic card is with problems? Thanks ############################## alechand@pcsantos2:~/Downloads/GPU_BURN$ ./gpu_burn 3600 GPU 0: GeForce GTX 680 (UUID: GPU-b242223a-b6ca-bd7f-3afc-162cba21e710) Initialized device 0 with 2047 MB of memory (1808 MB available, using 1628 MB of it) Failure during compute: Error in "Read faultyelemdata": CUDA_ERROR_LAUNCH_FAILED 10.0% proc'd: 11468360 errors: 22936720 (WARNING!) temps: 32 C Summary at: Thu May 16 09:23:06 BRT 2013 20.1% proc'd: 22604124 errors: 22271528 (WARNING!) temps: 32 C Summary at: Thu May 16 09:29:07 BRT 2013 30.1% proc'd: 33668668 errors: 22129088 (WARNING!) temps: 32 C Summary at: Thu May 16 09:35:08 BRT 2013 40.1% proc'd: 44763812 errors: 22190288 (WARNING!) temps: 32 C Summary at: Thu May 16 09:41:09 BRT 2013 50.1% proc'd: 55869696 errors: 22211768 (WARNING!) temps: 32 C Summary at: Thu May 16 09:47:10 BRT 2013 60.2% proc'd: 67029916 errors: 22320440 (WARNING!) temps: 31 C Summary at: Thu May 16 09:53:11 BRT 2013 70.2% proc'd: 78271124 errors: 22482416 (WARNING!) temps: 31 C Summary at: Thu May 16 09:59:12 BRT 2013 80.2% proc'd: 89538144 errors: 22534040 (WARNING!) temps: 31 C Summary at: Thu May 16 10:05:13 BRT 2013 90.2% proc'd: 100684312 errors: 22292336 (WARNING!) temps: 31 C Summary at: Thu May 16 10:11:14 BRT 2013 100.0% proc'd: 111385148 errors: 21401672 (WARNING!) temps: 31 C Killing processes.. done Tested 1 GPUs: 0: FAULTY alechand@pcsantos2:~/Downloads/GPU_BURN$ ######################################- Alechand
@Alechand: I have the same problem. In my case changing USEMEM to #define USEMEM 0.75625 let my GPU burn ;) The error also occurs at "checkError(cuMemcpyDtoH(&faultyElems, d_faultyElemData, sizeof(int)), "Read faultyelemdata");", but I don't think that a simple (only 1 int value!) cudaMemcopy makes the GPU crash. After crashing (e.g. USEMEM 0.8) the allocated memory consumes 0% (run nvidia-smi). I also inserted a sleep(10) between "our->compute();" and "our->compare();". During this 'sleep' it turned out that even for the "USEMEM 0.9" case the amount of memory is successfully allocated (run nvidia-smi in another shell). Are there any ideas how to fix this in a more elegant way? Thanks in advance for any hints!- Chris
Hi, Thanks for noting this bug. Unfortunately I'm unable to reproduce the bug which makes it difficult for me to fix. Could either of you guys give me the nvidia driver and CUDA versions and the card model you're experiencing this problem with and I could try to fix me up a similar system? Thanks- wili
Hi, I'm using NVIDIA-SMI 4.310.23 and Cuda Driver Version: 310.23 (libcuda.so.310.23). The other libraries have version 5.0.43 (libcublas.so.5.0.43, etc. all from CUDA 5.0 SDK). GeForce GTX 680 Another problem I had is discussed here: http://troylee2008.blogspot.de/2012/05/cudagetdevicecount-returned-38.html Since I use the NVIDIA GPU only for CUDA I have to manually create these files and maybe my GPU is in a deep idle power state which leads to the "Read faultyelemdata"-Problem. I also tried to get the current power state using nvidia-smi. -> without success; i only get N/A. In some of the samples NVIDIA performs a 'warmup'. For example: stereoDisparity.cu // First run the warmup kernel (which we'll use to get the GPU in the correct max power state stereoDisparityKernel<<<numBlocks, numThreads>>>(d_img0, d_img1, d_odata, w, h, minDisp, maxDisp); It also turned out that after a reboot in some test cases gpu_burn crashes (USEMEM 0.7). But in 95% of all 'gpu_burn after reboot' cases everything went fine. I think there is a problem with my setup and not with your code. At the moment I use 4 executables with different USEMEM arguments to allocate the whole memory and avoid any problems: ./gpu_burn 3600 0.25 #1st ~500MB ./gpu_burn 3600 0.33 #2nd ~500MB ./gpu_burn 3600 0.5 #3rd ~500MB ./gpu_burn 3600 0.9 #4th ~500MB Thank you very much for the code! At the moment it does what it should (under the discussed assumptions).- Chris
Hi Chris, Wow, you have quite new CUDA SDK. The latest production version is 5.0.35 and even in the developer center they don't offer the 5.0.43, so I will not be able to get my hands on the same version. Power state "N/A" with GTX 680 is quite normal, nothing to get alarmed about. Also, even if the card was in a deep idle state (Teslas can go dormant such that they take dozens of seconds to wake up), this error should not occur. Also, the warmup in SDK samples is only for getting reliable benchmarks timings, not for avoiding issues like these. Given that the cuMemcpyDtoH is in fact right after the "compare" kernel launch and the returned error code is not something that cuMemcpyDtoH is allowed to return, I think this is likely a bug in the compare kernel (the error code is delayed until the next call) and therefore in my code. I have now made certain changes to the program and would appreciate if you took a shot with this version and reported back if the problem persists. I'm still unable to replicate the failure conditions myself so I'm working blind here :-) Best regards,- wili
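To illustrate the "delayed error" point: kernel launches are asynchronous, so a failure inside a kernel is often reported by whichever API call happens to come next (here the cuMemcpyDtoH). A hedged sketch of how to pin the error on the right call, using only standard driver-API functions (the surrounding names are illustrative):

#include <cuda.h>

// Launches a compare kernel and reads back the error counter, synchronizing in
// between so that a failure inside the kernel is not attributed to the memcpy.
CUresult launchAndCheck(CUfunction compareKernel, void **kernelArgs,
                        CUdeviceptr dFaultyElems, int *hostFaultyElems) {
    CUresult res = cuLaunchKernel(compareKernel,
                                  1024, 1, 1,   // grid dimensions
                                  256, 1, 1,    // block dimensions
                                  0, 0,         // shared memory, default stream
                                  kernelArgs, 0);
    if (res != CUDA_SUCCESS)
        return res;                // immediate launch failure

    res = cuCtxSynchronize();      // surfaces errors from the kernel execution itself
    if (res != CUDA_SUCCESS)
        return res;                // would otherwise pop up in the next call

    return cuMemcpyDtoH(hostFaultyElems, dFaultyElems, sizeof(int));
}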
Thanks for this helpful utility. I was using it to test a K20 but found the power consumption displayed by nvidia-smi during testing was only ~110/225W. I modified the test to do double precision instead of single and my power number went up to ~175W. Here is my patch: ftp://ftp.microway.com/for-customer/gpu-burn-single-to-double.diff- Rick W
Hi Rick! This is a very good observation! I found the difference in K20 power draw to be smaller than what you reported but, still, doubles burn the card harder. I have now added support for doubles (by specifying "-d" as a command line parameter). I implemented it a bit differently to your patch, by using templates. Best regards,- wili
Thank you, gpu_burn was very helpful. We use it for stress testing our Supermicro GPU servers.- Dmitry
Very useful utility. Thank you! Would be even better if initCuda() obeyed CUDA_VISIBLE_DEVICES, as we could use this utility to simulate more complex multiuser workloads. But from what I can tell it automatically runs on all GPUs in the system regardless of this setting.- pulsewidth
Can you please give instructions on the steps to install and use gpu_burn? We build many Tesla K10, K20 and C2075 systems but have no way of stress testing for errors and stability; also we are usually Windows based while our customer is CentOS based. Thank you for any help.- sokhac@chassisplans.com
Hey, there seems to be an overflow occuring in your proc'd and errors: variables. I am getting negative values and huge sudden changes. Testing it on a titan with Cuda 5.5 and driver 331.38 For example: ./gpu_burn 300 GPU 0: GeForce GTX TITAN (UUID: GPU-6f344d7d-5f0e-8974-047e-bfcf4a559f14) Initialized device 0 with 6143 MB of memory (5082 MB available, using 4573 MB of it), using FLOATS 11.0% proc'd: 11410 errors: 0 temps: 71 C Summary at: Tue Mar 18 14:06:17 EDT 2014 21.7% proc'd: 14833 errors: 0 temps: 70 C Summary at: Tue Mar 18 14:06:49 EDT 2014 33.3% proc'd: 18256 errors: 0 temps: 69 C Summary at: Tue Mar 18 14:07:24 EDT 2014 44.0% proc'd: 20538 errors: 0 temps: 72 C Summary at: Tue Mar 18 14:07:56 EDT 2014 55.0% proc'd: 21679 errors: 0 temps: 76 C Summary at: Tue Mar 18 14:08:29 EDT 2014 66.7% proc'd: 23961 errors: 0 temps: 78 C Summary at: Tue Mar 18 14:09:04 EDT 2014 77.7% proc'd: 26243 errors: 0 temps: 78 C Summary at: Tue Mar 18 14:09:37 EDT 2014 88.3% proc'd: 27384 errors: 0 temps: 78 C Summary at: Tue Mar 18 14:10:09 EDT 2014 98.7% proc'd: 4034576 errors: -1478908168 (WARNING!) temps: 73 C Summary at: Tue Mar 18 14:10:40 EDT 2014 100.0% proc'd: 35754376 errors: -938894720 (WARNING!) temps: 72 C Killing processes.. done Tested 1 GPUs: GPU 0: FAULTY- smth chntla
looking at my dmesg output, it looks like gpu_burn is segfaulting: [3788076.522693] gpu_burn[10677]: segfault at 7fff85734ff8 ip 00007f27e09796be sp 00007fff85735000 error 6 in libcuda.so.331.38[7f27e0715000+b6b000] [3789172.403516] gpu_burn[11794]: segfault at 7fff407f1ff0 ip 00007fefd9c366b8 sp 00007fff407f1ff0 error 6 in libcuda.so.331.38[7fefd99d2000+b6b000] [3789569.269295] gpu_burn[12303]: segfault at 7fff04de5ff8 ip 00007f8842346538 sp 00007fff04de5fe0 error 6 in libcuda.so.331.38[7f88420e2000+b6b000] [3789984.624949] gpu_burn[12659]: segfault at 7fff5814dff8 ip 00007f7ed89656be sp 00007fff5814e000 error 6 in libcuda.so.331.38[7f7ed8701000+b6b000]- smth chntla
Hi, thank you for your useful tool! I use your program for testing GPGPUs. I have a question: in the test results I see "proc'd". I don't know what it means; can you explain it?- alsub2
Hi, I have some observations to share regarding the stability of the output of this multigpu stress test. I have two systems 1. system1 has a Tesla K20c. launching this test for 10, 30 sec the result is OK, while launching it for 60 sec the test shows errors (temperature remains the same in either cases). On this same system trying to run cudamemtester (http://sourceforge.net/projects/cudagpumemtest/) no errors are registered. Trying even other benchmarks like the ones in the shoc package, everything is ok indicating that the gpu card is defects-free 2. system2 has a Quadro 4000 that I know for sure to be faulty (different tests like cudamemtester and amber show that it is faulty) while running the multigpu stress test for any duration (even days), everything seems to be ok!! how could you explain that?? moreover, trying to direct the output of this multigpustress test to a file for future analysis, I notice that whenever there are errors, the output file quickly become huge (for 60 sec duration about 800MB!!) can anyone explain what is going on here please?? many thanks- Wissam
Any idea what could be the problem?? The system is from Penguin Computing, GPUs are on their Altus 2850GTi, Dual AMD Opteron 6320, CUDA 6.5.14, gcc 4.4.7, CentOS6. Thanks -- Joe /local/jgillen/gpuburn> uname -a Linux n20 2.6.32-279.22.1.el6.637g0002.x86_64 #1 SMP Fri Feb 15 19:03:25 EST 2013 x86_64 x86_64 x86_64 GNU/Linux /local/jgillen/gpuburn> ./gpu_burn 100 GPU 0: Tesla K20m (UUID: GPU-360ca00b-275c-16ba-76e6-ed0b7f9690c2) GPU 1: Tesla K20m (UUID: GPU-cf91aacf-f174-1b30-b110-99c6f4e2a5cd) Initialized device 0 with 4799 MB of memory (4704 MB available, using 4234 MB of it), using FLOATS Initialized device 1 with 4799 MB of memory (4704 MB available, using 4234 MB of it), using FLOATS Failure during compute: Error in "SGEMM": CUBLAS_STATUS_EXECUTION_FAILED 1.0% proc'd: -1 / 0 errors: -281 (DIED!)/ 0 temps: -- / -- Failure during compute: Error in "SGEMM": CUBLAS_STATUS_EXECUTION_FAILED 1.0% proc'd: -1 / -1 errors: -284 (DIED!)/ -1 (DIED!) temps: -- / -- No clients are alive! Aborting /local/jgillen/gpuburn> ./gpu_burn -d 100 GPU 0: Tesla K20m (UUID: GPU-360ca00b-275c-16ba-76e6-ed0b7f9690c2) GPU 1: Tesla K20m (UUID: GPU-cf91aacf-f174-1b30-b110-99c6f4e2a5cd) Initialized device 0 with 4799 MB of memory (4704 MB available, using 4234 MB of it), using DOUBLES Initialized device 1 with 4799 MB of memory (4704 MB available, using 4234 MB of it), using DOUBLES Failure during compute: Error in "Read faultyelemdata": 2.0% proc'd: 0 / -1 errors: 0 / -1 (DIED!) temps: -- / -- Failure during compute: Error in "Read faultyelemdata": 2.0% proc'd: -1 / -1 errors: -1 (DIED!)/ -1 (DIED!) temps: -- / -- No clients are alive! Aborting /local/jgillen/gpuburn>- Joe
for me get segfault build with g++ 4.9 and cuda 6.5.14 linux 3.17.6 nvidia (beta) 346.22 [1126594.592699] gpu_burn[2038]: segfault at 7fff6e126000 ip 000000000040219a sp 00007fff6e120d80 error 4 in gpu_burn[400000+f000] [1126615.669239] gpu_burn[2083]: segfault at 7fff80faf000 ip 000000000040216a sp 00007fff80fa94c0 error 4 in gpu_burn[400000+f000] [1126644.488424] gpu_burn[2119]: segfault at 7fffa3840000 ip 000000000040216a sp 00007fffa383ad40 error 4 in gpu_burn[400000+f000] [1127041.656267] gpu_burn[2219]: segfault at 7fff981de000 ip 00000000004021f0 sp 00007fff981d92e0 error 4 in gpu_burn[400000+e000]- sl1pkn07
Great tool. Helped me identify some bizarre behavior on a GPU. Thanks- naveed
Hi: Great and useful job!!! can you guide me to understand how to pilot GPUs .. in other words how to drive a process on a specific GPU! Or is it the driver that does that by itself!?!?! MANY THANKS for your help and compliments fot your work! CIAOOOOOO Piero- Piero
Hi Piero! It is the application's responsibility to query the available devices and pick one from the enumeration. I'm not aware of a way to force a specific GPU via e.g. an environment variable. BR,- wili
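For anyone wondering what that enumerate-and-pick looks like with the driver API (which is what gpu_burn uses), here is a minimal sketch using only standard CUDA calls; the chosen ordinal is of course up to the caller:

#include <cuda.h>
#include <cstdio>

int main() {
    cuInit(0);

    int count = 0;
    cuDeviceGetCount(&count);
    for (int i = 0; i < count; ++i) {
        CUdevice dev;
        char name[256];
        cuDeviceGet(&dev, i);
        cuDeviceGetName(name, sizeof(name), dev);
        printf("GPU %d: %s\n", i, name);
    }

    // Create a context only on the device you want to work on, e.g. ordinal 0.
    int pick = 0;
    CUdevice dev;
    CUcontext ctx;
    cuDeviceGet(&dev, pick);
    cuCtxCreate(&ctx, 0, dev);
    // ... launch work in this context ...
    cuCtxDestroy(ctx);
    return 0;
}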
Hi, I just tested this tool on one machine with two K20x (RHEL 6.6) and it is working like a charm Thank you for doing a great job Best regards from Germany Mike- Mike
Doesn't work on my system: $ ./gpu_burn 3600 GPU 0: GeForce GTX 980 (UUID: GPU-23b4ffc9-5548-310e-0a67-e07e0d2a83ff) GPU 1: Tesla M2090 (UUID: GPU-96f2dc19-daa5-f147-7fa6-0bc86a1a1dd2) terminate called after throwing an instance of 'std::string' 0.0% proc'd: -1 errors: 0 (DIED!) temps: -- No clients are alive! Aborting $ uname -a Linux node00 3.10.0-229.el7.x86_64 #1 SMP Fri Mar 6 11:36:42 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux $ nvcc --version nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2015 NVIDIA Corporation Built on Mon_Feb_16_22:59:02_CST_2015 Cuda compilation tools, release 7.0, V7.0.27 $ gcc --version gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4) Copyright (C) 2015 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ lsb_release -a LSB Version: :core-4.1-amd64:core-4.1-noarch Distributor ID: CentOS Description: CentOS Linux release 7.2.1511 (Core) Release: 7.2.1511 Codename: Core- Richard
Great nifty little thing! Many thanks! P.S.1 (I have modified the makefile slightly for PASCAL) P.S.2 (Could you please spoil us by implementing more tests, and telling the performance and numerical accuracy of the card? I would be very curious to see how different SM architectures affect performance)- obm
Hi obm! Glad that you found the tool useful. I don't have a Pascal card around yet to see whether compute versions need to get changed. The point about printing out performance figures is a good one. I can't print out flops values since I don't know the internal implementation of CUBLAS and as it also might change over time, but I could certainly print out multiplication operations per second. I'll add that when I have the time! Numerical accuracy should be the same for all cards using the same CUBLAS library, since all cards nowadays conform to the IEEE floating point standard.- wili
Very nice tool, just used it to check my cuda setup on a quad titan x compute node.- Steve
Wili, Thank you for a very useful tool. I have been using gpu_burn since you first put it online. I have used your test to verify a 16 node, 96GPU cluster could run without throttling for thermal protection, and a I drove several brands of workstations to brown-out conditions with 3x K20 and K40 GPUs with verified inadequate current on the P/S 12v rails. Thank You!- Mike
Thanks for writing this, I found it useful to check a used GPU I bought off eBay. I had to change the Makefile architecture to compute_20 since CUDA 7.0+ doesn't support compute_13 any more. I also needed to update the Temperature matching. Have you considered hosting this on github instead of your blog? People would be able to more easily make contributions.- Alex
Hi Alex! That's a very good suggestion, I'll look into it once I have the time :-) Also please note that the newer version: "gpu_burn-0.6.tar.gz (compatible with nvidia-smi and nvcc as of 04-12-2015)" already uses compute_20 and has the temperature parsing matched with the newer nvidia-smi. The older 0.4 version (which I assume you used) is only there for compatibility with older setups.- wili
great utility! I confirm this is working on Ubuntu 16.04 w/ CUDA 8 beta. Just had to change Makefile (CUDAPATH=/usr/local/cuda-8.0) and comment out error line in /usr/local/cuda-8.0/include/host_config.h (because GCC was too new): #if __GNUC__ > 5 || (__GNUC__ == 5 && __GNUC_MINOR__ > 3) //#error -- unsupported GNU version! gcc versions later than 5.3 are not supported! #endif /* __GNUC__ > 5 || (__GNUC__ == 5 && __GNUC_MINOR__ > 1) */ GPU I tested was TitanX (Maxwell). Lastly, I agree with comment(s) above about turning this into a small benchmark to compare different systems' GPU performance (iterations per second)- Jason
Works great with different GPU cards until the new GTX 1080Ti. Parsing of nvidia-smi might be the main issue there- Leo
Any idean on this? ./gpu_burn 3600 GPU 0: Tesla P100-SXM2-16GB (UUID: GPU-a27460fa-1802-cff6-f0b7-f4f1fe935a67) GPU 1: Tesla P100-SXM2-16GB (UUID: GPU-4ab21a43-10cb-3df1-8a6e-c8c27ff1bf9f) GPU 2: Tesla P100-SXM2-16GB (UUID: GPU-daabb1f4-529e-08d6-3c58-25ca54b7dbbe) GPU 3: Tesla P100-SXM2-16GB (UUID: GPU-f378575d-7a45-7e81-9460-c5358f2a7957) Initialized device 0 with 16276 MB of memory (15963 MB available, using 14366 MB of it), using FLOATS Failure during compute: Error in "SGEMM": CUBLAS_STATUS_EXECUTION_FAILED 0.0% proc'd: -1 (0 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s) errors: -5277 (DIED!)- 0 - 0 - 0 temps: 35 C - 36 C - 38 C - 37 C Initialized device 1 with 16276 MB of memory (15963 MB available, using 14366 MB of it), using FLOATS 0.0% proc'd: -1 (0 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s) errors: -5489 (DIED!)- 0 - 0 - 0 temps: 35 C - 36 C - 38 C - 37 C Failure during compute: Error in "SGEMM": CUBLAS_STATUS_EXECUTION_FAILED 0.1% proc'd: -1 (0 Gflop/s) - -1 (0 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s) errors: -6667 (DIED!)- -919 (DIED!)- 0 - 0 temps: 35 C - 36 C - 38 C - 37 C Initialized device 3 with 16276 MB of memory (15963 MB available, using 14366 MB of it), using FLOATS 0.1% proc'd: -1 (0 Gflop/s) - -1 (0 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s) errors: -6928 (DIED!)- -1180 (DIED!)- 0 - 0 temps: 35 C - 36 C - 38 C - 37 C Failure during compute: Error in "SGEMM": CUBLAS_STATUS_EXECUTION_FAILED 0.1% proc'd: -1 (0 Gflop/s) - -1 (0 Gflop/s) - 0 (0 Gflop/s) - -1 (0 Gflop/s) errors: -6935 (DIED!)- -1187 (DIED!)- 0 - -1 (DIED!) temps: 35 C - 36 C - 38 C - 37 C Initialized device 2 with 16276 MB of memory (15963 MB available, using 14366 MB of it), using FLOATS 0.1% proc'd: -1 (0 Gflop/s) - -1 (0 Gflop/s) - 0 (0 Gflop/s) - -1 (0 Gflop/s) errors: -7121 (DIED!)- -1373 (DIED!)- 0 - -1 (DIED!) temps: 35 C - 36 C - 38 C - 37 C Failure during compute: Error in "SGEMM": CUBLAS_STATUS_EXECUTION_FAILED 0.1% proc'd: -1 (0 Gflop/s) - -1 (0 Gflop/s) - -1 (0 Gflop/s) - -1 (0 Gflop/s) errors: -7123 (DIED!)- -1375 (DIED!)- -1 (DIED!)- -1 (DIED!) temps: 35 C - 36 C - 38 C - 37 C No clients are alive! Aborting- Lexasoft
Hi Lexasoft, Looks bad, haven't seen this yet. I'll give it a go next week with updated tools and a pair of 1080Tis and see if this comes up. (I don't have access to P100s myself (congrats BTW ;-)) BR- wili
Version 0.7 seems to work fine with the default CUDA (7.5.18) that comes in Ubuntu 16.04.1 LTS on dual 1080Tis: GPU 0: Graphics Device (UUID: GPU-f2a70d44-7e37-fb35-91a3-09f49eb8be76) GPU 1: Graphics Device (UUID: GPU-62859d61-d08d-8769-e506-ee302442b0f0) Initialized device 0 with 11169 MB of memory (10876 MB available, using 9789 MB of it), using FLOATS Initialized device 1 with 11172 MB of memory (11003 MB available, using 9903 MB of it), using FLOATS 10.6% proc'd: 6699 (6324 Gflop/s) - 6160 (6411 Gflop/s) errors: 0 - 0 temps: 63 C - 54 C Summary at: Mon May 29 13:26:34 EEST 2017 21.7% proc'd: 14007 (6208 Gflop/s) - 13552 (6317 Gflop/s) errors: 0 - 0 temps: 75 C - 67 C Summary at: Mon May 29 13:26:54 EEST 2017 32.2% proc'd: 20706 (6223 Gflop/s) - 20328 (6238 Gflop/s) errors: 0 - 0 temps: 82 C - 75 C Summary at: Mon May 29 13:27:13 EEST 2017 42.8% proc'd: 27405 (5879 Gflop/s) - 27720 (6222 Gflop/s) errors: 0 - 0 temps: 85 C - 80 C Summary at: Mon May 29 13:27:32 EEST 2017 53.3% proc'd: 34104 (5877 Gflop/s) - 34496 (6179 Gflop/s) errors: 0 - 0 temps: 87 C - 82 C Summary at: Mon May 29 13:27:51 EEST 2017 63.9% proc'd: 40803 (5995 Gflop/s) - 41272 (6173 Gflop/s) errors: 0 - 0 temps: 86 C - 83 C Summary at: Mon May 29 13:28:10 EEST 2017 75.0% proc'd: 46893 (5989 Gflop/s) - 48664 (6092 Gflop/s) errors: 0 - 0 temps: 87 C - 84 C Summary at: Mon May 29 13:28:30 EEST 2017 85.6% proc'd: 53592 (5784 Gflop/s) - 55440 (6080 Gflop/s) errors: 0 - 0 temps: 86 C - 83 C Summary at: Mon May 29 13:28:49 EEST 2017 96.1% proc'd: 60291 (5969 Gflop/s) - 62216 (5912 Gflop/s) errors: 0 - 0 temps: 87 C - 84 C Summary at: Mon May 29 13:29:08 EEST 2017 100.0% proc'd: 63336 (5961 Gflop/s) - 64680 (5805 Gflop/s) errors: 0 - 0 temps: 86 C - 84 C Killing processes.. done Tested 2 GPUs: GPU 0: OK GPU 1: OK- wili
I'd just like to thank you for such an useful application. I'm testing a little big monster with 10 GTX 1080 Ti cards and it helped a lot. The complete system setup is: Supermicro 4028GR-TR2 2x Intel E5-2620V4 128 GB of memory 8x nvidia GTX 1080 Ti 1 SSD for the O.S. CentOS 7.3 x64 I installed latest Cuda developement kit "cuda_8.0.61_375.26" but the GPU cards were not recognized so I upgraded the driver to latest 381.22. After that it works like charm. I'll have acces to this server a little more so if you want me to test something on it, just contact me.- ibarco
This tool is awesome and works as advertised! We are using it to stress test our Supermicro GPU servers (GeForce GTX 1080 for now).- theonewolf
Hi, first, thanks for this awesome tool ! But, I may need some help to be sure to understand why I get. I run gpu_burn and get: GPU 0: Tesla P100-PCIE-16GB (UUID: GPU-d164c8c4-1bdb-a4ab-a9fe-3dffdd8ec75f) GPU 1: Tesla P100-PCIE-16GB (UUID: GPU-6e1a32bb-5291-4611-4d0c-78b5ff2766ce) GPU 2: Tesla P100-PCIE-16GB (UUID: GPU-c5d10fad-f861-7cc0-b37b-7cfbd9bceb67) GPU 3: Tesla P100-PCIE-16GB (UUID: GPU-df9c5c01-4fbc-3fb2-9f41-9c6900bb78c8) Initialized device 0 with 16276 MB of memory (15945 MB available, using 14350 MB of it), using FLOATS Initialized device 2 with 16276 MB of memory (15945 MB available, using 14350 MB of it), using FLOATS Initialized device 3 with 16276 MB of memory (15945 MB available, using 14350 MB of it), using FLOATS Initialized device 1 with 16276 MB of memory (15945 MB available, using 14350 MB of it), using FLOATS 40.0% proc'd: 894 (3984 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s) - 0 (0 Gflop/s) errors: 0 - 0 - 0 - 0 temps: 38 C - 45 C - 51 C - 38 C Summary at: Wed Aug 23 11:08:01 EDT 2017 60.0% proc'd: 1788 (7998 Gflop/s) - 894 (3736 Gflop/s) - 894 (3750 Gflop/s) - 894 (3740 Gflop/s) errors: 0 - 0 - 0 - 0 temps: 44 C - 53 C - 59 C - 45 C Summary at: Wed Aug 23 11:08:03 EDT 2017 80.0% proc'd: 2682 (8014 Gflop/s) - 1788 (8015 Gflop/s) - 1788 (8016 Gflop/s) - 1788 (8016 Gflop/s) errors: 0 - 0 - 0 - 0 temps: 44 C - 53 C - 59 C - 45 C Summary at: Wed Aug 23 11:08:05 EDT 2017 100.0% proc'd: 3576 (8019 Gflop/s) - 2682 (8015 Gflop/s) - 3576 (8015 Gflop/s) - 2682 (8015 Gflop/s) errors: 0 - 0 - 0 - 0 temps: 44 C - 53 C - 59 C - 45 C Summary at: Wed Aug 23 11:08:07 EDT 2017 100.0% proc'd: 4470 (8018 Gflop/s) - 3576 (8015 Gflop/s) - 3576 (8015 Gflop/s) - 3576 (8016 Gflop/s) errors: 0 - 0 - 0 - 0 temps: 47 C - 56 C - 62 C - 48 C Killing processes.. done Tested 4 GPUs: GPU 0: OK GPU 1: OK GPU 2: OK GPU 3: OK More precisely, with theses information: proc'd: 3576 (8019 Gflop/s) - 2682 (8015 Gflop/s) - 3576 (8015 Gflop/s) - 2682 (8015 Gflop/s) errors: 0 - 0 - 0 - 0 temps: 44 C - 53 C - 59 C - 45 C Is this information listed in the same order as the GPU list ? (proc'd: 3576 (8019 Gflop/s), 44 C => information for GPU 0 ? proc'd: 2682 (8015 Gflop/s), 53 C => information for GPU 1 ? proc'd: 3576 (8015 Gflop/s), 59 C => information for GPU 2 ? proc'd: 2682 (8015 Gflop/s), 45 C => information for GPU 3 ?) Thank you! Ced- Ced
Hi Ced! Yes, the processed/gflops, temps, and GPU lists should all have the same order. If you think they don't match, I can double check the code.- wili
Hi wili! Thank you As the devices were not initialiazed in the same order as the GPU list I was not sure. Otherwise, I don't have any clue to point the fact that the order could mismatch.- Ced
Hi Ced, Yes the initialization is multi-threaded and is reported in non-deterministic order. Otherwise the order should match. Also note that some devices take longer to init and start crunching data, that's why you may see some devices starting to report their progress late.- wili
Hi Wili, Indead, each tests I performed ensured me that the order perfectly match. Thank you again.- Ced
Hi Wili, it seems that CUDA 9.0 updated the minimum supported virtual architecture: nvcc fatal : Value 'compute_20' is not defined for option 'gpu-architecture' Replacing that with compute_30 is enough. Best, Alfredo- Alfredo
Hi Wili, Is it possible to run your tool on CPU only so as to compare CPU results to GPU results ? Best regards- Ced
Hi Ced, It runs on CUDA, and as such only on NVidia GPUs. Until NVidia releases CPU drivers for CUDA (don't hold your breath), I'm afraid the answer is no. BR,- wili
Thank you for this tool. Still super handy- CapnScabby
wili, why not distribute your tool through some more accessible platform, e.g. github? This seems to be the best torture test for CUDA GPUs, so people are surely interested. I myself have used it a few times, but it's just too much (mental) effort to check on a website whether there's an update instead of just doing a "git remote update; git tag" to see if there's a new release! Look, microway already took the code and uploaded it to github, so why not do it yourself, have people fork your repo and take (more of the) credit while making it easier to access? ;) Cheers, Szilard- pszilard
Hi Pszilard, You can now find it at GitHub, https://github.com/wilicc/gpu-burn I've known that it needs to go up somewhere easier to access for a while now, especially after there seemed to be outside interest. (It's just that I like to host all my hobby projects on my own servers.) I guess I have to budge here this time; thanks for giving the nudge. BR,- wili
hi, here is the message with quadro 6000 : GPU 0: Quadro 6000 (UUID: GPU-cfb253b7-0520-7498-dee2-c75060d9ed25) Initialized device 0 with 5296 MB of memory (4980 MB available, using 4482 MB of it), using FLOATS Couldn't init a GPU test: Error in "load module": 10.8% proc'd: -807182460 (87002046464 Gflop/s) errors: 0 temps: 71 C 1 C Summary at: lundi 4 décembre 2017, 16:01:29 (UTC+0100) 21.7% proc'd: -1334328390 (74880032768 Gflop/s) errors: 0 temps: 70 C C C Summary at: lundi 4 décembre 2017, 16:01:42 (UTC+0100) 23.3% proc'd: 1110985114 (599734133719040 Gflop/s) errors: 0 temps: 70 C ^C my configuration works with K5000- mulot
Hi Mulot, "compare.ptx" has to be found from your working directory. Between these runs, you probably invoked gpu-burn with different working directories. (PS. A more verbose "file not found" print added to github) BR,- wili
Hi, I hit the "Error in "load module":" error as well, in my case it was actually an unhandled CUDA error code 218, which says CUDA_ERROR_INVALID_PTX. For me this was caused by misconfigured nvcc path (where nvcc was from CUDA 7.5, but rest of the system used 9.1), once I was correctly using nvcc from cuda 9.1, this started working. Thanks,- av
Hi, I've tried the Titan V with your tool and it seems to run fine, but when I monitor the TDP with nvidia-smi, the wattage doesn't reach the 250W that is the card's spec. The problem doesn't occur when I use an older generation card like the GTX 1070 Ti with the same driver version 387.34. Does the GeForce Linux driver have a problem with the Titan V? Or does gpu_burn need modification for Volta GPUs? Many thanks!- Natsu
This is the best tool for maximizing GPU power consumption! I tried V100 PCIE x8 and it rocks!!! Thank you!!!- dhenzjhen
Hi, Great tool! I tried it with a 1080 Ti and a Titan Xp; maybe one missing feature is the choice of which GPU to stress. Also, is it normal to obtain temperature values that stay constant at 84/85 C on both GPUs while they are stressed?- piovrasca
Hi, A good feature suggestion! Maybe I’ll add that at some point. Temperatures sound about right, they are nowadays regulated after all.- wili
Hello, I had a problem where a multi-GPU server was hard-crashing under a load using CUDA for neural networks training. Geeks3d's GPUtest wasn't able to reproduce the problem. Then I've stumbled upon your blog. Your tool singled out the faulty GPU in 5 seconds! You, sir, have spared me a lot of headache. I cannot thank you enough!- Kurnass
How about make the output stream better suited for log-files (running it with SLURM). Seems like it uses back ticks extensively; thus the file grows to a couple of 100MiB for 30sec.- qnib
I tried your tool on a Jetson TX2 Tegra board, and on Tegra there is no nvidia-smi package available, so temps are not available. The problem I have is that I get a segmentation fault! Is that because of the missing nvidia-smi? Or should your stress test work on Tegra?- gernot
Hi gernot, I’ve never tried running this on a Tegra, so it might or might not work. It should print a better error message, granted, but as I don’t have a Tegra for testing you’re unfortunately on your own :-)- wili
I cant seem to get this to build i keep getting make PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/opt/cuda/bin:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:.:/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/opt/cuda/bin:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl nvcc -I/usr/local/cuda/include -arch=compute_30 -ptx compare.cu -o compare.ptx g++ -O3 -Wno-unused-result -I/usr/local/cuda/include -c gpu_burn-drv.cpp gpu_burn-drv.cpp:49:10: fatal error: cuda.h: No such file or directory #include <cuda.h> ^~~~~~~~ compilation terminated. make: *** [Makefile:11: drv] Error 1 Cuda.h can be found in include/linux i have tried updating the path in the make but it doesn't work it still says it cant find it. This is on the latest Arch linux (system just updated and rebooted before this) any ideas? this is exactly what i am looking for here at work it will speed up my GPU testing significantly so hopefully i can get it working- kitsuna
Spun up an Ubuntu system and it worked right away, so I'll just use this on the GPU test system. Great program, now I can stress 5 cards at once!- kitsuna
gernot, Same problem here. I figured out that the Tegra OS does not have nvidia-smi. I got the program working by commenting out anything regarding nvidia-smi or temperature reading. (You will have to find another tool if you want to read the temperatures.)- c
Hi Anybody got this working on Ubuntu 18.04 and Cuda 9.2?- Siroberts
Hi, I am trying the test suit with two cards: one gigabyte 1070 gamer together with a GB 1070Ti under Ubuntu 16.04. The system usually crashed in several minutes, and the power consumption of 1070 goes above 180W; the system does a reboot. When I test only one card (modifying the code), it runs without any trouble (1070 using ~180W, 1070Ti less). Removing one card the other runs without any pain. So, what I can see that the two card cannot work together, but they work smoothly alone. (BTW when I block additional process testing further cards the code gives the temperature of the other card.) Anyway, this is what I can see. Is there anybody who could help me to solve this issue? I guess the FE is overlocked to keep pace with the Ti. (The cards would be used for simple FP computations and rarely visualize the results on remote Linux clients.) - nt- Nantucket
Any ideas about these errors when running make on Fedora 28 and CUDA V9.1.85 ? /usr/include/c++/8/type_traits(1049): error: type name is not allowed /usr/include/c++/8/type_traits(1049): error: type name is not allowed /usr/include/c++/8/type_traits(1049): error: identifier "__is_assignable" is undefined 3 errors detected in the compilation of "/tmp/tmpxft_00001256_00000000-6_compare.cpp1.ii".- ss
SS, Your GCC is not working correctly. I can see it's including GCC version 8 headers, but CUDA (even 9.2) doesn't support GCC versions newer than 7. Maybe you've somehow managed to use an older GCC version with newer GCC headers.- wili
wili, thanks for spotting that! Yes, I have GCC version 8 installed by default with Fedora 28, but CUDA from negativo (https://negativo17.org/nvidia-driver/) installs "cuda-gcc" and "cuda-gcc-c++" version 6.4.0. I got gpu-burn working by modifying the Makefile to use: 1) the correct CUDAPATH and header directories 2) cuda-g++ 3) the -ccbin flag for nvcc. I'll include it below in case it helps others.
==========
CUDAPATH=/usr/include/cuda
# Have this point to an old enough gcc (for nvcc)
GCCPATH=/usr
HOST_COMPILER=cuda-g++
NVCC=nvcc
CCPATH=${GCCPATH}/bin
drv:
	PATH=${PATH}:.:${CCPATH}:${PATH} ${NVCC} -I${CUDAPATH} -ccbin=${HOST_COMPILER} -arch=compute_30 -ptx compare.cu -o compare.ptx
	${HOST_COMPILER} -O3 -Wno-unused-result -I${CUDAPATH} -c gpu_burn-drv.cpp
	${HOST_COMPILER} -o gpu_burn gpu_burn-drv.o -O3 -lcuda -L${CUDAPATH} -L${CUDAPATH} -Wl,-rpath=${CUDAPATH} -Wl,-rpath=${CUDAPATH} -lcublas -lcudart -o gpu_burn- ss
Is "gpu-burn -d" supposed to print incremental status updates like "gpu-burn" does? For DOUBLES, I'm not seeing those, and after running for several (10?) seconds, it freezes all graphical parts of my system until eventually terminating in the usual way with "Killing processes.. done Tested 1 GPUs: GPU 0: OK"- ss
It's a GT1030 (with GDDR5) and at least according to wikipedia it does support single, double, and half precision.- ss
I am running gpu_burn ver 0.9 on a GTX670 and this works perfectly. When I try and test a GTX580 it fails I think due to compute capabilty of the card only being 2.0. I tried to edit the makefile (-arch=compute_30) to (-arch=compute_20) and recompile but this fails to compile. Any ideas on how to get it to support the older card would be much appreciated.- id23raw
id23raw, Fermi architecture was deprecated in CUDA 8 and dropped from CUDA 9, so the blame is on NVidia. Your only choice is to downgrade to an older CUDA and then use compute_20. BR,- wili
Thank you for the prompt reply to my question Wili.- id23raw
Hi ship, The only way available to you right now is to run ./gpu_burn 120 1>> LOG_FILE 2>> LOG_FILE- wili
Thanks for the code
Is there any chance this works with the new 2080ti's, I need to stress test 4 2080ti's simultaneously.- JO
Hi JO, I haven't had the chance to test this myself, but according to external reports it works fine on Turing cards as well. So give it a spin and let me know how it works. One observation worth noting is that the "heat up" process has become longer than on previous generations (as has been the case with many Teslas, for example), so do run it with longer durations (not e.g. with just 10 secs). BR,- wili
I got the following error, ideas? ./gpu Run length not specified in the command line. Burning for 10 secs GPU 0: Tesla V100-SXM2-16GB (UUID: GPU-2f4ea2c2-8119-cda0-63a0-36dfca404b53) GPU 1: Tesla V100-SXM2-16GB (UUID: GPU-fc39fba9-e82e-3fbb-81e7-d455355ecdd1) GPU 2: Tesla V100-SXM2-16GB (UUID: GPU-0463eba2-0217-fbcc-1b86-02f4bf9b3f34) GPU 3: Tesla V100-SXM2-16GB (UUID: GPU-1f424260-52fb-f7ee-3887-e91ece7b7438) GPU 4: Tesla V100-SXM2-16GB (UUID: GPU-449c96b9-d7b4-1ed9-7d9f-e479a9b56100) GPU 5: Tesla V100-SXM2-16GB (UUID: GPU-f73428a8-e2ed-8854-cc90-cab903c695d0) GPU 6: Tesla V100-SXM2-16GB (UUID: GPU-bd886dab-500d-f38f-f93c-c0e53bbc6a4d) GPU 7: Tesla V100-SXM2-16GB (UUID: GPU-07e97e68-877d-03a9-ee5a-e85d315130bc) Couldn't init a GPU test: Error in "init": CUBLAS_STATUS_NOT_INITIALIZED- dave
ran ./gpu_burn 60 with my two 1080's and computer crashed within two seconds... thanks!- 1080 guy
Version 0.9 worked perfectly for me (CUDA 9.1 on two Tesla K20Xm and one GT 710).- wkernkamp
My gpu was under utilized (showed 30-50% utilization) and after running your code it shows >95% utilization. Your code has got the awakening power for sure. Thanks.- sedawk
I have one Tesla V100 GPU, CUDA 10.1, Debian 9. My GPU runs hot (90+ C) when I run for over 30 sec. I don't want to burn out my GPU; is there a way I can throttle it down a bit, or is this normal?- jdm
Thank you Wili for the test. Tried it on four 2080 Tis with CUDA 10. Works fine. In single precision one GPU has 5 errors and the rest are OK, but in double precision all GPUs are OK. Any idea?- andy
GPU 0: GeForce 940MX (UUID: GPU-bb6cb5d0-ca68-c456-1588-9c0bcb409409) Initialized device 0 with 4046 MB of memory (3611 MB available, using 3250 MB of it), using FLOATS Couldn't init a GPU test: Error in "load module": 11.7% proc'd: 0 (0 Gflop/s) errors: 0 temps: 50 C Summary at: Dom Jul 21 11:51:59 -03 2019 23.3% proc'd: 0 (0 Gflop/s) errors: 0 temps: 52 C Summary at: Dom Jul 21 11:52:06 -03 2019 35.0% proc'd: 0 (0 Gflop/s) errors: 0 temps: 55 C Summary at: Dom Jul 21 11:52:13 -03 2019 46.7% proc'd: 0 (0 Gflop/s) errors: 0 temps: 56 C Summary at: Dom Jul 21 11:52:20 -03 2019 58.3% proc'd: 0 (0 Gflop/s) errors: 0 temps: 57 C Summary at: Dom Jul 21 11:52:27 -03 2019 68.3% proc'd: 0 (0 Gflop/s) errors: 0 temps: 58 C Summary at: Dom Jul 21 11:52:33 -03 2019 80.0% proc'd: 0 (0 Gflop/s) errors: 0 temps: 58 C Summary at: Dom Jul 21 11:52:40 -03 2019 91.7% proc'd: 0 (0 Gflop/s) errors: 0 temps: 57 C Summary at: Dom Jul 21 11:52:47 -03 2019 100.0% proc'd: 0 (0 Gflop/s) errors: 0 temps: 56 C Killing processes.. done Tested 1 GPUs: GPU 0: OK i didnt understand the output, im not sure if the test was successfully done- dfvneto
gpu-burn doesn't compile. Make produces lots of deprecation errors. Using a Titan RTX, CUDA 10.1, and nvidia driver 418.67.- airyimbin
Odd one I'm seeing at the moment: GPU 0: GeForce TRX 2070 (UUID: GPU-9d0c1beb-ed60-283c-d7ac88404a8b) Initialized device 0 with 7981 MB of memory (7577 MB available, using 6819MB of it), using FLOATS Failure during compute: Error in "SGEMM": CUBLAS_STATUS_EXECUTION_FAILED 10.0% proc'd: -1 (0 Gflop/s) errors: -1 (DIED!) temps: 47 C No clients are alive! Aborting This is with nvidia driver 430, libcublas9.1.85 I've tested the same motherboard with multiple 2070's which all show the same issue yet it works fine with a 2080ti in that system. The same 2070's then work on other (typically) older systems. At a bit of a loss as to where to go from here - any advice appreciated- gardron
Just wanted to take a moment and thank you for this awesome tool. I had an unreliable graphics card that was rock stable on almost EVERY workload (gaming, benchmarks, etc), except some CUDA workloads (especially DNN training). The program calling CUDA would sometimes die with completely random errors (INVALID_VALUE, INVALID_ARGUMENT, etc.). I couldn't figure out what the issue was or how to reproduce, until I tested your tool and it consistently failed! I sent the card back for warranty, tested the replacement card and BOOM! It's been rock solid for the past 1.5 years. So, thanks so much!- Mehran
the commit 73d13cbba6cc959b3214343cb3cc83ec5354a9d2 is a right way. However, this change does not make any difference on the binary. It seems the CUDA just used free for cuMemFreeHost. Driver Version: 410.79 CUDA Version: 10.0- frank
Hi, I have to make a V100 card work at a specific load, not full load. For example, the V100's full load is 250 W; may I know some commands for ./gpu_burn to make the V100 work at 125 W, 150 W ... 200 W etc.? Thank you : )- Binhao
Hi Wili, may I ask if there is any command that can make ./gpu_burn print the running log more slowly?- Focus
Hi Focus, Not right now unless you go and modify the code (increment 10.0f on gpu_burn-drv.cpp:525). I'll make a note that I'll make this adjustable in the future (not sure when I have the time). Thanks for the suggestion!- wili
Hello and thanks for the program. When I see: 100.0% proc'd: 33418 (4752 Gflop/s) - 32184 (4596 Gflop/s) errors: 9183 (WARNING!) - 0 temps: 74 C - 68 C What does the "4596 Gflop/s" mean? Is it the Gflop/s for one GPU or for all the GPUs running?- Rozo
Hi Rozo, The first GPU churns 4752Gflop/s and the second 4596Gflop/s. The order is listed in the beginning of the run.- wili
Hello and many thanks for this program. I have always had a need to test Nvidia GPUs because I either buy them on eBay or I buy systems that have them installed, and I need to be absolutely sure they are 100% functional. I had used the OCCT program in Windows, but that only allowed me to test one card at a time for 1 hour. Also, OCCT only allows 2GB of GPU memory to be tested, whereas GPU-Burn uses 90% of available memory, so it is much more thorough. Now I am testing up to five at a time. Thanks- Steve
Hello again, I am reporting a bug: one of the five GPUs I am testing has its thread stop and hang at 1000 iterations. The other four GPUs continue increasing their iteration counts up to final counts of 32,250, 32,250, 32,250 and 32,200. All five of the GPUs are Quadro K600s. The program then tries to exit at 100% testing done but hangs on the GPU that was hung. The only way to exit the program is by doing a <CTRL>C. I also want to point out that the error counts for all five GPUs show 0, even the one that hung. If you have a chance to fix this, a solution may be to detect a hung GPU thread, stop it, continue testing the remaining GPUs, and then state in the final report that it is faulty. Thanks- Steve
Hi Steve, Happy to hear you were able to test 5 GPUs at once. Totally agree with you, a hung GPU (thread) should be detected and reported accordingly. That case didn't receive much testing from me because the faulty GPUs I've had kept on running but just produced incorrect results. I'll add this on the TODO list. Thanks!- wili
Hello and thanks for the program! I am doing tests on two GPUs: a Quadro RTX4000 and a Tesla K80. For both, the burn runs fine, but with nvidia-smi I can see almost maximal power consumption for the K80 (290/300W) while the RTX4000 only uses 48/125W. Is there a way to increase the power consumption during the burn on the RTX4000? Thanks Thomas- Thomas
Hi Wili, Hope you can help me out with the below problem : test1@test1-pc:~$ nvidia-smi Thu Jul 9 10:32:25 2020 ±----------------------------------------------------------------------------+ | NVIDIA-SMI 450.51.05 Driver Version: 450.51.05 CUDA Version: 11.0 | |-------------------------------±---------------------±---------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |===============================+======================+======================| | 0 GeForce GTX 1070 On | 00000000:01:00.0 On | N/A | | 0% 49C P8 11W / 151W | 222MiB / 8116MiB | 0% Default | | | | N/A | ±------------------------------±---------------------±---------------------+ ±----------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=============================================================================| | 0 N/A N/A 1006 G /usr/lib/xorg/Xorg 81MiB | | 0 N/A N/A 1150 G /usr/bin/gnome-shell 124MiB | | 0 N/A N/A 3387 G …mviewer/tv_bin/TeamViewer 11MiB | ±----------------------------------------------------------------------------+ test1@test1-pc:~$ nvcc --version nvcc: NVIDIA ® Cuda compiler driver Copyright © 2005-2017 NVIDIA Corporation Built on Fri_Nov__3_21:07:56_CDT_2017 Cuda compilation tools, release 9.1, V9.1.85 test1@test1-pc:~$ gcc --version gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 Copyright © 2017 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. test1@test1-pc:~$ cd gpu-burn test1@test1-pc:~/gpu-burn$ make PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:.:/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin /usr/local/cuda/bin/nvcc -I/usr/local/cuda/include -arch=compute_30 -ptx compare.cu -o compare.ptx nvcc fatal : Value ‘compute_30’ is not defined for option ‘gpu-architecture’ Makefile:9: recipe for target ‘drv’ failed make: *** [drv] Error 1 As i am new to ubuntu, could you point me to what i am missing out to cause this error? Thanks for your time.- robbocop
Hi there, it is a great tool for GPU stress testing. Can I ask whether it is compatible with the latest CUDA, like 10.x? Best regards, Jason- Jason
Another person trying to use this on a TX2 here; I had the same issue as the other commenter. A fix that works is to comment out the call (and the line before it) to the function that checks the temperatures. There are other ways to monitor temperature on the TX2, and the code works with that removed.- LovesTha
Hi Wili, I tried to run gpu-burn in the following environment but it failed with the error messages below: 1) OS SLES 15 SP2, 2) CUDA 11.1 installed and gpu-burn compiled, 3) NVIDIA-Linux-x86_64-450.80.02.

# gpu-burn
GPU 0: Quadro K5200 (UUID: GPU-684e262c-85a7-d822-c3a4-bb41918db340)
GPU 1: Tesla K40c (UUID: GPU-bb529e3e-aa59-3f8f-c6e6-ee3b1cba7bf5)
Initialized device 0 with 11441 MB of memory (11324 MB available, using 10191 MB of it), using DOUBLES
Initialized device 1 with 7616 MB of memory (7510 MB available, using 6759 MB of it), using DOUBLES
0.0% proc'd: 0 (0 Gflop/s) - 0 (0 Gflop/s) errors: 1605296569 (WARNING!)- 0 temps: 40 C - 42 C
0.0% proc'd: 0 (0 Gflop/s) - 0 (0 Gflop/s) errors: -2136859607 (WARNING!)- 0 temps: 40 C - 42 C
0.0% proc'd: 0 (0 Gflop/s) - 0 (0 Gflop/s) errors: -1584048487 (WARNING!)- 0 temps: 40 C - 42 C
...
100.0% proc'd: 0 (0 Gflop/s) - 0 (0 Gflop/s) errors: -1767488304 (WARNING!)- -1767488304 (WARNING!) temps: 41 C - 43 C
100.0% proc'd: 0 (0 Gflop/s) - 0 (0 Gflop/s) errors: -1214677184 (WARNING!)- -1214677184 (WARNING!) temps: 41 C - 43 C
100.0% proc'd: 0 (0 Gflop/s) - 0 (0 Gflop/s) errors: -661866064 (WARNING!)- -661866064 (WARNING!) temps: 41 C - 43 C
100.0% proc'd: 0 (0 Gflop/s) - 0 (0 Gflop/s) errors: -109054944 (WARNING!)- -109054944 (WARNING!) temps: 41 C - 43 C
100.0% proc'd: 0 (0 Gflop/s) - 0 (0 Gflop/s) errors: 443756176 (WARNING!)- 443756176 (WARNING!) temps: 41 C - 43 C
Killing processes.. done
Tested 2 GPUs:
GPU 0: FAULTY
GPU 1: FAULTY

I don't believe both GPUs are faulty... Is there a compatibility issue with the environment? - William- William
fae@intel:~/gpu_burn$ sudo make
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:.:/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin /usr/local/cuda/bin/nvcc -I/usr/local/cuda/include -arch=compute_30 -ptx compare.cu -o compare.ptx
nvcc fatal : Value 'compute_30' is not defined for option 'gpu-architecture'
Makefile:10: recipe for target 'drv' failed
make: *** [drv] Error 1

This symptom is caused by nvcc's --gpu-architecture (-arch) option no longer supporting compute_30 in CUDA v11.1, so we should edit the Makefile, which currently looks like this:

CUDAPATH=/usr/local/cuda
# Have this point to an old enough gcc (for nvcc)
GCCPATH=/usr
NVCC=${CUDAPATH}/bin/nvcc
CCPATH=${GCCPATH}/bin

drv:
	PATH=${PATH}:.:${CCPATH}:${PATH} ${NVCC} -I${CUDAPATH}/include -arch=compute_30 -ptx compare.cu -o compare.ptx
	g++ -O3 -Wno-unused-result -I${CUDAPATH}/include -c gpu_burn-drv.cpp
	g++ -o gpu_burn gpu_burn-drv.o -O3 -lcuda -L${CUDAPATH}/lib64 -L${CUDAPATH}/lib -Wl,-rpath=${CUDAPATH}/lib64 -Wl,-rpath=${CUDAPATH}/lib -lcublas -lcudart -o gpu_burn

Change -arch=compute_30 to compute_80, or refer to the Virtual Architecture Feature List and change the value accordingly.- Andy Yang
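If you are unsure which value fits your card, a small sketch like the one below prints the compute capability of every visible GPU using the CUDA runtime API (the file name print_cc.cu is just an example):

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // e.g. major=6, minor=1 on a GeForce GTX 1070 -> use -arch=compute_61
        printf("GPU %d: %s -> -arch=compute_%d%d\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}

Compile it with something like "nvcc print_cc.cu -o print_cc" and plug the printed value into the Makefile; for the GeForce GTX 1070 mentioned above that would be compute_61, for an A100 compute_80.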
Hello, While trying to run gpu-burn on K8s, I tried to spin up multiple pods of gpu-burn by enabling NVIDIA's time-slicing feature. However, let's say that out of 8 gpu-burn pods, 6 pods ran totally fine and returned the expected throughput, while 2 pods return FAULTY GPU with the error: "Couldn't init a GPU test: Error in "C alloc": CUDA_ERROR_INVALID_VALUE". Any idea how I should proceed with this?- ckuduvalli
Ahhh? This isn't even as good to use as a mining program, guys.- 大笨蛋
16.8% proc'd: 60072747 (113040 Gflop/s) - 61195827 (116874 Gflop/s) - 60717181 (114812 Gflop/s) - 60298700 (114959 Gflop/s) - 60377583 (114886 Gflop/s) - 60861577 (115925 Gflop/s) - 59667636 (112078 Gflop/s) - 58898861 (111560 Gflop/s) errors: 0 - 0 - 0 - 0 - 167742 (WARNING!)- 0 - 0 - 0 temps: 77 C - 75 C - 77 C - 70 C - 76 C - 73 C - 85 C - 69 C
16.8% proc'd: 60072747 (113040 Gflop/s) - 61195827 (116874 Gflop/s) - 60717181 (114812 Gflop/s) - 60298700 (114959 Gflop/s) - 60377583 (114886 Gflop/s) - 60861577 (115925 Gflop/s) - 59667636 (112078 Gflop/s) - 58900198 (111332 Gflop/s) errors: 0 - 0 - 0 - 0 - 167742 (WARNING!)- 0 - 0 - 0 temps: 77 C - 75 C - 77 C - 70 C - 76 C - 73 C - 85 C - 69 C
16.8% proc'd: 60072747 (113040 Gflop/s) - 61195827 (116874 Gflop/s) - 60717181 (114812 Gflop/s) - 60298700 (114959 Gflop/s) - 60377583 (114886 Gflop/s) - 60861577 (115925 Gflop/s) - 59668973 (114397 Gflop/s) - 58900198 (111332 Gflop/s) errors: 0 - 0 - 0 - 0 - 167742 (WARNING!)- 0 - 0 - 0 temps: 77 C - 75 C - 77 C - 70 C - 76 C - 73 C - 85 C - 69 C
16.8% proc'd: 60072747 (113040 Gflop/s) - 61197164 (116745 Gflop/s) - 60717181 (114812 Gflop/s) - 60298700 (114959 Gflop/s) - 60377583 (114886 Gflop/s) - 60861577 (115925 Gflop/s) - 59668973 (114397 Gflop/s) - 58900198 (111332 Gflop/s) errors: 0 - 0 - 0 - 0 - 167742 (WARNING!)- 0 - 0 - 0 temps: 77 C - 75 C - 77 C - 70 C - 76 C - 73 C - 85 C - 69 C
16.8% proc'd: 60074084 (115018 Gflop/s) - 61197164 (116745 Gflop/s) - 60717181 (114812 Gflop/s) - 60298700 (114959 Gflop/s) - 60377583 (114886 Gflop/s) - 60861577 (115925 Gflop/s) - 59668973 (114397 Gflop/s) - 58900198 (111332 Gflop/s) errors: 0 - 0 - 0 - 0 - 167742 (WARNING!)- 0 - 0 - 0 temps: 77 C - 75 C - 77 C - 70 C - 76 C - 73 C - 85 C - 69 C
16.8% proc'd: 60074084 (115018 Gflop/s) - 61197164 (116745 Gflop/s) - 60717181 (114812 Gflop/s) - 60300037 (114361 Gflop/s) - 60377583 (114886 Gflop/s) - 60861577 (115925 Gflop/s) - 59668973 (114397 Gflop/s) - 58900198 (111332 Gflop/s) errors: 0 - 0 - 0 - 0 - 167742 (WARNING!)- 0 - 0 - 0 temps: 77 C - 75 C - 77 C - 70 C - 76 C - 73 C - 85 C - 69 C
16.8% proc'd: 60074084 (115018 Gflop/s) - 61197164 (116745 Gflop/s) - 60717181 (114812 Gflop/s) - 60300037 (114361 Gflop/s) - 60377583 (114886 Gflop/s) - 60862914 (115543 Gflop/s) - 59668973 (114397 Gflop/s) - 58900198 (111332 Gflop/s) errors: 0 - 0 - 0 - 0 - 167742 (WARNING!)- 0 - 0 - 0 temps: 77 C - 75 C - 77 C - 70 C - 76 C - 73 C - 85 C - 69 C
16.8% proc'd: 60074084 (115018 Gflop/s) - 61197164 (116745 Gflop/s) - 60718518 (114107 Gflop/s) - 60300037 (114361 Gflop/s) - 60377583 (114886 Gflop/s) - 60862914 (115543 Gflop/s) - 59668973 (114397 Gflop/s) - 58900198 (111332 Gflop/s) errors: 0 - 0 - 0 - 0 - 167742 (WARNING!)- 0 - 0 - 0 temps: 77 C - 75 C - 77 C - 70 C - 76 C - 73 C - 85 C - 69 C
16.8% proc'd: 60074084 (115018 Gflop/s) - 61197164 (116745 Gflop/s) - 60718518 (114107 Gflop/s) - 60300037 (114361 Gflop/s) - 60378920 (114074 Gflop/s) - 60862914 (115543 Gflop/s) - 59668973 (114397 Gflop/s) - 58900198 (111332 Gflop/s) errors: 0 - 0 - 0 - 0 - 167742 (WARNING!)- 0 - 0 - 0 temps: 77 C - 75 C - 77 C - 70 C - 76 C - 73 C - 85 C - 69 C
16.8% proc'd: 60074084 (115018 Gflop/s) - 61197164 (116745 Gflop/s) - 60718518 (114107 Gflop/s) - 60300037 (114361 Gflop/s) - 60378920 (114074 Gflop/s) - 60862914 (115543 Gflop/s) - 59668973 (114397 Gflop/s) - 58901535 (110352 Gflop/s) errors: 0 - 0 - 0 - 0 - 167742 (WARNING!)- 0 - 0 - 0 temps: 77 C - 75 C - 77 C - 70 C - 76 C - 73 C - 85 C - 69 C
16.8% proc'd: 60074084 (115018 Gflop/s) - 61197164 (116745 Gflop/s) - 60718518 (114107 Gflop/s) - 60300037 (114361 Gflop/s) - 60378920 (114074 Gflop/s) - 60862914 (115543 Gflop/s) - 59670310 (112828 Gflop/s) - 58901535 (110352 Gflop/s) errors: 0 - 0 - 0 - 0 - 167742 (WARNING!)- 0 - 0 - 0 temps: 77 C - 75 C - 77 C - 70 C - 76 C - 73 C - 85 C - 69 C
The factory test found that the fifth GPU card was abnormal; the problem was resolved after the card was replaced.- wade