Nov 5 2009, 11:06 PM
Post
#1
|
|
|
Group: Moderators Posts: 2,619 Joined: 3-June 08 From: Santa Clara, CA Member No.: 106,363 Club SLI Member: No Org.: NVIDIA |
The CUDA Toolkit 3.0 Beta is now available.
Highlights for this release include:
Windows developers should be sure to sign up for the Nexus (codename) beta program and test drive the integrated support for GPU hardware debugging, profiling, and platform trace/analysis features at: www.nvidia.com/nexus
Please review the errata document for important notes about using this beta release.
Special Notice for MacOS X Developers
Downloads
Getting Started - Linux
Getting Started - OS X
Getting Started - Windows
XP32 195.39
XP64 195.39
Vista/Win7 32 195.39
Vista/Win7 64 195.39
Notebook XP32 195.39
Notebook XP64 195.39
Notebook Vista/Win7 32 195.39
Notebook Vista/Win7 64 195.39
Linux 32 195.17
Linux 64 195.17
3.0.0 for Non-GT200 Leopard
3.0.1 for GT200 Leopard and Snow Leopard
CUDA Toolkit for Fedora 10 32-bit
CUDA Toolkit for RHEL 4.8 32-bit
CUDA Toolkit for RHEL 5.3 32-bit
CUDA Toolkit for SLED 11.0 32-bit
CUDA Toolkit for SuSE 11.1 32-bit
CUDA Toolkit for Ubuntu 9.04 32-bit
CUDA Toolkit for Fedora 10 64-bit
CUDA Toolkit for RHEL 4.8 64-bit
CUDA Toolkit for RHEL 5.3 64-bit
CUDA Toolkit for SLED 11.0 64-bit
CUDA Toolkit for SuSE 11.1 64-bit
CUDA Toolkit for Ubuntu 9.04 64-bit
CUDA Toolkit for OS X
CUDA Toolkit for Windows 32-bit
CUDA Toolkit for Windows 64-bit
CUDA Profiler 3.0 Beta Readme
CUDA Profiler 3.0 Beta Release Notes for Linux
CUDA Profiler 3.0 Beta Release Notes for OS X
CUDA Profiler 3.0 Beta Release Notes for Windows
CUDA Toolkit EULA
CUDA-GDB Readme
CUDA-GDB User Manual
CUDA Reference Manual
CUDA Toolkit Release Notes for Linux
CUDA Toolkit Release Notes for OS X
CUDA Toolkit Release Notes for Windows
CUDA Programming Guide
CUDA Best Practices Guide
Online Documentation
GPU Computing SDK for Linux
GPU Computing SDK for OS X
GPU Computing SDK for Win32
GPU Computing SDK for Win64
CUDA SDK Release Notes
DirectCompute Release Notes
OpenCL Release Notes
GPU Computing EULA |
|
|
|
Nov 5 2009, 11:09 PM
Post
#2
|
|
|
Group: Moderators Posts: 2,619 Joined: 3-June 08 From: Santa Clara, CA Member No.: 106,363 Club SLI Member: No Org.: NVIDIA |
|
|
|
|
Nov 6 2009, 01:17 AM
Post
#3
|
|
Group: Members Posts: 17 Joined: 5-March 07 Member No.: 43,819 |
Is it possible to have the "CUDA 3 Beta Programming Guide" available for separate download?
This would provide a way for me to learn more about the release without downloading/installing the entire SDK. |
|
|
|
Nov 6 2009, 01:45 AM
Post
#4
|
|
Group: Members Posts: 90 Joined: 16-May 09 Member No.: 155,172 Org.: funcom |
It just bluescreened when I tried to run my code under the profiler. I have a minidump if anyone from NVIDIA would like to take a look.
|
|
|
|
Nov 6 2009, 03:41 AM
Post
#5
|
|
Group: Members Posts: 1,210 Joined: 13-June 08 From: California USA Member No.: 107,688 |
Is it possible to have the "CUDA 3 Beta Programming Guide" available for separate download? This would provide a way for me to learn more about the release without downloading/installing the entire SDK.
The toolkit beta teases you with new docs labelled "CUDA 3 Beta Programming Guide" but they are just the 2.3 docs. That's the first thing I wanted to look at! The best practices guide, nvcc docs, and PTX spec are also all 2.3 versions. I checked both Linux and Windows. There are new 3.0 CUBLAS and CUFFT beta docs though.
Tim, are we allowed to openly discuss the 3.0 toolkit beta here? The rules were relaxed for the 2.3 beta, and it was nice to be allowed forum discussion. There are some promising new features in 3.0 even ignoring the Fermi support! |
|
|
|
Nov 6 2009, 07:17 AM
Post
#6
|
|
|
Group: Moderators Posts: 2,619 Joined: 3-June 08 From: Santa Clara, CA Member No.: 106,363 Club SLI Member: No Org.: NVIDIA |
yeah, feel free to discuss as per usual.
I'll investigate the documentation packaging issue tomorrow... |
|
|
|
Nov 6 2009, 10:19 AM
Post
#7
|
|
Group: Members Posts: 3 Joined: 18-April 09 Member No.: 151,009 |
Nice! But my code now runs 3 times slower than with the 2.3 SDK! The generated PTX code is the same, so is it the driver? Is it the toolkit? I know this is a beta release.
|
|
|
|
Nov 6 2009, 01:22 PM
Post
#8
|
|
Group: Members Posts: 89 Joined: 6-September 07 From: Berlin, Germany Member No.: 68,914 |
I downloaded the driver, toolkit, and SDK last night (openSuSE 11.1, 64-bit) and just compiled it and ran first tests. nbody and my own test examples (sgemm/dgemm, others) run just as fast as before. The setup is 2x GTX260. I have one code that runs on dual GPUs using data partitioning and QThreads (from PyQt/Qt), and it does fine. What I did notice, since my dual-GPU code reports both the total time spent in the kernel and the elapsed time for the whole thread (data is already on the GPU), is that the CUDA runtime initialization part is much faster. It used to be that there was an extra 0.6 to 1.5 seconds(!) to be added to the pure kernel run time. They are now almost the same:
CODE
+++< GPU count: 2 GPUs >+++
All GPUs in parallel - one thread per GPU
-----------------------------------------
GPU-0 : GeForce GTX 260 (1.3) - sum : 0.689 sec [404.1 GFlops]
GPU-1 : GeForce GTX 260 (1.3) - sum : 0.692 sec [402.8 GFlops]
GPU-0 : GeForce GTX 260 (1.3) - run : 2.105 sec [132.3 GFlops]
GPU-1 : GeForce GTX 260 (1.3) - run : 2.108 sec [132.1 GFlops]
All GPUs - run : 2.185 sec [254.9 GFlops]
Last two elements GPU: 1.019e-03 3.220e-04
+++< CPU count: 2 CPUs >+++
All CPUs in parallel - one thread per CPU
-----------------------------------------
Thread-0 - sum : 708.240 sec [ 3.4 GFlops]
Thread-0 - run : 708.271 sec [ 3.4 GFlops]
Thread-1 - sum : 717.896 sec [ 3.4 GFlops]
Thread-1 - run : 717.946 sec [ 3.4 GFlops]
All CPU - run : 717.947 sec [ 6.8 GFlops]
Last two elements CPU: 1.019e-03 3.220e-04
Speedup: 328.5
max error at : 3929812 12 49 98 6.52e-01 6.53e-01 2.93e-04
L2 rel error : 2.5e-04
max rel error : 4.5e-04

The speedups are based on times, not flops values. Flops are defined differently on the CPU and GPU, as it involves trigs (one trig counted as 4 flops on the GPU and 140 flops on the CPU).
It is now (no CPU part, as I do not want to wait 12 minutes for the CPU):
CODE
+++< GPU count: 2 GPUs >+++
All GPUs in parallel - one thread per GPU
-----------------------------------------
GPU-0 : GeForce GTX 260 (1.3) - sum : 0.698 sec [399.2 GFlops]
GPU-0 : GeForce GTX 260 (1.3) - run : 0.841 sec [331.2 GFlops]
GPU-1 : GeForce GTX 260 (1.3) - sum : 0.700 sec [397.8 GFlops]
GPU-1 : GeForce GTX 260 (1.3) - run : 0.844 sec [329.9 GFlops]
All GPUs - run : 1.026 sec [543.1 GFlops]
Last two elements GPU: 1.019e-03 3.220e-04

As now OpenCL is included with the driver, I have an issue with that. Last week I wrote ctypes-based Python bindings for OpenCL, like I did for CUDA itself previously (python-cuda).
With driver 190.29 a device query gave me
CODE
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS : 3
CL_DEVICE_MAX_WORK_ITEM_SIZES : 512 512 64

I now get (I changed the format to hex to see what's going on):
CODE
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS : 0x3
CL_DEVICE_MAX_WORK_ITEM_SIZES : 0x200 0x7F6800000200 0x7F6800000040

CL_DEVICE_MAX_WORK_ITEM_SIZES is queried by passing a pointer to (size_t*3), i.e. a ctypes array of three 64-bit longs. The first element is correct; the second and third contain "garbage" (that looks like a device address?) in the high-order bits and the correct value in the low 32 bits. size_t is 64-bit; passing a pointer to an array of three 32-bit ints just gives all 0. So 64-bit, as per the OpenCL spec, should be correct. What happened to OpenCL between 190.29 and 195.17?
In addition I also get a double free error thrown by glibc on one machine and a segmentation fault on another, after the code ran successfully. Just to be clear: I am calling OpenCL from Python. The vector_add example and a simple bandwidth test work just fine, and all data returned in the device query are basically correct.
The segmentation fault was apparently caused by a bug in my code. After a rewrite it goes away. However, the MAX_WORK_ITEM_SIZES issue still exists. The C++ SDK sample gives the correct result; Python::OpenCL (I don't recall its exact name right now), based on Cython, also gives a correct answer. PyOpenCL, based on boost.python, also gets one value wrong (the last dimension). Despite cl_khr_fp64 being available, both my ctypes OpenCL bindings and PyOpenCL report the preferred vector width for double as 0, while the SDK reports 1.
As far as speed is concerned, I see no slow-down with 3.0b1, tested on a 9650M GT, dual GTX-260 and dual GTX-280. I do notice that X11 windows seem to pop up significantly faster than with the previous driver version.
This post has been edited by apaehler: Nov 7 2009, 06:56 PM |
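For anyone who wants to check the same symptom in their own ctypes bindings, here is a minimal sketch that only inspects the numbers quoted in the post above. It does not call OpenCL at all: the three values are copied from the hex output, the buffer type uses `ctypes.c_uint64` as a stand-in for the 64-bit `size_t` described, and the helper names `low32` and `high32` are made up for this illustration.

```python
import ctypes

# Stand-in for the (size_t * 3) buffer described in the post; c_uint64 is
# used so the sketch behaves the same on any platform (assumption: size_t
# is 64-bit on the poster's system, as stated).
WorkItemSizes = ctypes.c_uint64 * 3

# Values copied from the hex output reported with driver 195.17.
reported = WorkItemSizes(0x200, 0x7F6800000200, 0x7F6800000040)


def low32(values):
    """Return the low 32 bits of each element."""
    return [int(v) & 0xFFFFFFFF for v in values]


def high32(values):
    """Return the high 32 bits of each element."""
    return [int(v) >> 32 for v in values]


print("raw   :", [hex(v) for v in reported])
print("low32 :", low32(reported))   # [512, 512, 64]
print("high32:", [hex(v) for v in high32(reported)])
```

The low halves reproduce the 512 512 64 that driver 190.29 reported, which matches the observation that only the high-order bits of the second and third elements are wrong. Masking like this is a diagnostic aid only, not a fix.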
|
|
|
Nov 6 2009, 03:49 PM
Post
#9
|
|
Group: Members Posts: 77 Joined: 16-February 09 From: Germany Member No.: 141,107 |
I wonder why the 'sdk' folder within the CUDA-SDK is an exact copy of the SDK itself!? So every example (cuda and opencl), library, etc. exists twice.
Edit: I get 1403 "unused parameter X" warnings from nvcc when compiling my programs with "--compiler-options -Wall,-Wextra" in the following header files:
There are a lot of "declared 'static' but never defined" warnings, too. I'm using gcc 4.3.2 on a 64 bit linux machine. In spite of the warnings everything seems to work fine, but in fact a little bit slower than with 2.3. This post has been edited by Tobi_W: Nov 6 2009, 04:45 PM |
|
|
|
Nov 7 2009, 12:25 AM
Post
#10
|
|
|
Group: Moderators Posts: 2,619 Joined: 3-June 08 From: Santa Clara, CA Member No.: 106,363 Club SLI Member: No Org.: NVIDIA |
|
|
|
|
Nov 7 2009, 11:03 AM
Post
#11
|
|
Group: Members Posts: 3 Joined: 18-April 09 Member No.: 151,009 |
I've tried 190.42 and 195.17 beta drivers on Ubuntu 9.10 64 using CUDA SDK 2.3 and 3.0 beta and gcc 4.3
I'm using two GTX 285 devices, and my code is set to use both of them (SLI is OFF). I also use a third card (8400 GS) for display (not for CUDA).
190.42 + SDK 2.3 = 13 seconds
195.17 + SDK 2.3 || 195.17 + SDK 3.0 beta = 34 seconds!
I've checked that the 8400 is not being used at any time, so I suppose it's a driver problem (I really hope). However, the Nbody demo performs better with 195.17 + SDK 3.0 (up to 500 GFLOPs), but smokeparticles also shows worse performance :( |
|
|
|
Nov 9 2009, 09:34 AM
Post
#12
|
|
Group: Members Posts: 16 Joined: 8-December 08 Member No.: 129,480 Org.: Onera |
I have gcc-4.4.
With the 2.3 CUDA SDK/toolkit I use the '--compiler-bindir' option to choose gcc-4.3. With nvcc in the 3.0 beta, this option is probably parsed incorrectly: with '--compiler-bindir=/usr/bin/gcc-4.3' I get the error: unsuported compiler '/usr/bin/gcc-4'. My solution (a hack) is to unlink all /usr/bin/{gcc,g++,cpp, ... } that point to 4.4 and make links to 4.3. |
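A less invasive variant of the hack above might be to leave /usr/bin alone and point nvcc at a private directory of symlinks instead. The sketch below assumes that `--compiler-bindir` accepts a directory containing `gcc` and `g++`, and that link names without a dot sidestep the path truncation seen in the error message; neither assumption was tested against the 3.0 beta. The helper names `make_compiler_bindir` and `nvcc_command` and the file name `kernel.cu` are hypothetical.

```python
import os
import tempfile


def make_compiler_bindir(target_dir, compilers):
    """Create target_dir holding one symlink per entry in `compilers`.

    `compilers` maps a link name (e.g. "gcc") to the real binary path.
    """
    os.makedirs(target_dir, exist_ok=True)
    for name, real_path in compilers.items():
        link = os.path.join(target_dir, name)
        if os.path.lexists(link):
            os.remove(link)  # replace a stale link from an earlier run
        os.symlink(real_path, link)
    return target_dir


def nvcc_command(bindir, source):
    """Build (but do not run) an nvcc command line that uses bindir."""
    return ["nvcc", "--compiler-bindir", bindir, "-c", source]


# Paths are the ones mentioned in the post; adjust for your system.
bindir = make_compiler_bindir(
    tempfile.mkdtemp(prefix="nvcc-gcc43-"),
    {"gcc": "/usr/bin/gcc-4.3", "g++": "/usr/bin/g++-4.3"},
)
print(" ".join(nvcc_command(bindir, "kernel.cu")))
```

The system-wide gcc 4.4 links stay intact this way, so other builds are unaffected.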
|
|
|
Nov 10 2009, 03:50 PM
Post
#13
|
|
Group: Members Posts: 201 Joined: 19-December 08 Member No.: 131,297 |
How do I become a registered developer?
|
|
|
|
Nov 10 2009, 06:04 PM
Post
#14
|
|
Group: NVIDIA Employees Posts: 858 Joined: 17-November 04 From: London, England Member No.: 243 Org.: NVIDIA Developer Technologies |
Sign up as a "GPU Computing Developer" here:
http://developer.nvidia.com/page/registere...er_program.html |
|
|
|
Nov 11 2009, 12:51 PM
Post
#15
|
|
Group: Members Posts: 732 Joined: 14-August 08 From: Cambridge, United Kingdom Member No.: 115,518 Org.: TidePowerd, Ltd. |
As reported in this thread, I was having some problems with CUDA and Windows 7:
http://forums.nvidia.com/index.php?showtopic=101930
I installed the new 3.0-beta drivers, toolkit and SDK and tried running some of the examples, and I'm still having the same problem (kernels take several seconds before executing, and the entire system freezes during that time). I went into the Nvidia Control Panel and disabled my 2nd, 3rd, and 4th monitors and enabled the multi-GPU acceleration; now the examples run just fine, but when I run the deviceQueryDrv example, it only shows a single device. Since I'm not running displays on the other 3 GPUs (I have 2x GTX 295's), why don't they show up? Also, the device query on the device that does show up says that there is no time limit on kernel execution.
EDIT: Does anyone know if the PTX version will increase to version 1.5 for this release of the CUDA driver? The 3.0-beta toolkit includes the PTX 1.4 specification.
This post has been edited by profquail: Nov 11 2009, 12:57 PM |
|
|
|
Nov 11 2009, 01:16 PM
Post
#16
|
|
Group: Members Posts: 161 Joined: 14-February 08 From: Heidelberg Member No.: 92,485 Org.: Frankfurt Institute for Advanced Studies |
I get 1403 "unused parameter X" warnings from nvcc when compiling my programs with "--compiler-options -Wall,-Wextra" in the following header files:
In my experience those are caused by gcc when it is invoked from nvcc. You should be able to shut them up by telling gcc that those are system directories and that it should not warn you about errors within those files. (BTW, this is a lot easier if you use CMake to invoke the compilation.) |
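As a rough sketch of that suggestion: gcc treats directories given with `-isystem` as system header directories and suppresses most warnings that originate inside them. The snippet below only assembles a command line; it assumes the CUDA headers live in /usr/local/cuda/include (a common default, not confirmed in this thread) and that nvcc forwards the comma-separated `--compiler-options` list to gcc as separate arguments. Whether this silences the warnings with the 3.0 beta nvcc is untested here, and the helper name `nvcc_args` is made up.

```python
def nvcc_args(source, cuda_include="/usr/local/cuda/include"):
    """Build an nvcc command that marks the CUDA headers as system headers.

    cuda_include is an assumed install location; change it to match
    your toolkit installation.
    """
    host_flags = ["-Wall", "-Wextra", "-isystem", cuda_include]
    # nvcc takes host compiler flags as one comma-separated list.
    return ["nvcc", "--compiler-options", ",".join(host_flags), "-c", source]


print(" ".join(nvcc_args("kernel.cu")))
```

With CMake, marking the include directory as SYSTEM in `include_directories` has the same intent.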
|
|
|
Nov 11 2009, 01:24 PM
Post
#17
|
|
Group: Members Posts: 161 Joined: 14-February 08 From: Heidelberg Member No.: 92,485 Org.: Frankfurt Institute for Advanced Studies |
I noticed there now is a --multicore (and even --multicore-llvm) switch in the compiler, however the headers disable compilation if this switch is used for all compilers but MSVC. Is the multicore support on linux planned for 3.0 final?
|
|
|
|
Nov 12 2009, 10:31 AM
Post
#18
|
|
Group: Members Posts: 9 Joined: 6-November 09 From: Bremen, Germany Member No.: 244,381 Club SLI Member: No |
Are there any problems with gcc 4.4 (from Ubuntu 9.10) and CUDA SDK 3.0 beta?
If not, I would like to register as a developer. Can I do this as a hobby programmer just for simple tests? |
|
|
|
Nov 12 2009, 10:35 AM
Post
#19
|
|
Group: Members Posts: 982 Joined: 4-April 06 From: Munich, Germany Member No.: 18,632 Org.: Nomor Research GmbH |
Are there any problems with gcc 4.4 (from Ubuntu 9.10) and CUDA SDK 3.0 beta? If not, I would like to register as a developer. Can I do this as a hobby programmer just for simple tests?
They do ask for company size, job position, area of work and such things. You can always specify a company size of "1" ;)
Christian |
|
|
|
Nov 12 2009, 10:45 AM
Post
#20
|
|
Group: Members Posts: 9 Joined: 6-November 09 From: Bremen, Germany Member No.: 244,381 Club SLI Member: No |
this is no way to get beta feedback.
I do not want to report all these things just to test a beta version |
|
|
|
Copyright © 2008 NVIDIA® Corporation.