Full Version: CUDA Programming and Development
  1. Sticky: CUDA Occupancy Calculator (47 replies)
  2. Sticky: Links to CUDA development tools (5 replies)
  3. Sticky: Reporting a problem with CUDA (0 replies)
  4. What's wrong with this simple kernel call? (0 replies)
  5. Reduction & block dimension (6 replies)
  6. How to execute a CUDA/MPI program (2 replies)
  7. MersenneTwister configuration (6 replies)
  8. Passing an array of a user defined structure to a Kernel (1 reply)
  9. NVCC: Variable missing when compiling with nvcc? (1 reply)
  10. cudaMemcpy3D (1 reply)
  11. Do CUDA volatiles work? (0 replies)
  12. cublas problem: some blas 1 functions extremely slow! (1 reply)
  13. Multiple GPU speed problem (4 replies)
  14. correct syntax for operator overloading in CUDA (1 reply)
  15. How to reduce compile time for big kernel function? (3 replies)
  16. Cluster Algo Divide by Zero (0 replies)
  17. nvcc compiler bug (5 replies)
  18. CudaProf Exporting Graphs (0 replies)
  19. (C9999) max reg limit too low (1 reply)
  20. How to separate device function and kernel function? (2 replies)
  21. Newbie: __shared__, what am I doing wrong? (2 replies)
  22. How to reduce Local Memory Usage. (11 replies)
  23. CUDA&MATLAB (7 replies)
  24. cudaMemcpyToSymbol reversing order representation of byte array (2 replies)
  25. Pollard Rho Algorithm for solving Discrete Logarithm problem (0 replies)
  26. Optimization, high register usage with templates (16 replies)
  27. How to implement mean and standard deviation (0 replies)
  28. Getting cuerror Invalid device pointer (2 replies)
  29. CUDA Video Decoder problem (2 replies)
  30. Any good ideas for this special "reduction" ? (10 replies)
  31. ERROR : see declaration of _cudaFatcubinHandle redefined (1 reply)
  32. OpenGL Video Decoder H.264 (0 replies)
  33. Batched 2D FFT implementation (23 replies)
  34. Extending sobel example to 3D volume raster (0 replies)
  35. seg fault in kernel when attempting to += global memory (1 reply)
  36. Writing to global memory (5 replies)
  37. *** glibc detected *** runtime error (0 replies)
  38. CUBLAS question (3 replies)
  39. kernel function single variable (1 reply)
  40. slow runtime caused by cudaMemcpy() (5 replies)
  41. Strange Compiler Shared Memory Usage (5 replies)
  42. unspecified launch failure (3 replies)
  43. 2D array & unique indexation (3 replies)
  44. Is there an error in the cuda manual matrix multiplication example? (9 replies)
  45. enforcing dual-issue by mixing fp and integer arithmetic (1 reply)
  46. Matrix Reduction (7 replies)
  47. Finite element (1 reply)
  48. Multiple Reduction in a 2D array (6 replies)
  49. graph traversal (1 reply)
  50. Pointer to pointer (1 reply)
  51. CUBLAS library and kernel (4 replies)
  52. How can I get CUDA 3.0 (1 reply)
  53. Maximal allocatable memory block (4 replies)
  54. New to CUDA, anyone know a good basic tutorial/example? (2 replies)
  55. Copy data from device to another device (0 replies)
  56. Is it possible to overlap small IOs and computations? (2 replies)
  57. cuda-gdb get different result with cudaMemcpyHostToDevice function (2 replies)
  58. Multiple GPU memory address problem (6 replies)
  59. Applications with large CPU-GPU transfer time? (3 replies)
  60. texture : unsigned char and FilterModeLinear (4 replies)
  61. cudaThreadSynchronize() stalls application (10 replies)
  62. CUDA kernel timeout (10 replies)
  63. Enumeration using GPU (0 replies)
  64. round-to-double on GPUs? (10 replies)
  65. beginner question regarding shared memory (4 replies)
  66. my speedy FFT (113 replies)
  67. Building CUDA library (4 replies)
  68. cudaMallocPitch() and cudaMemcpy2D() (0 replies)
  69. When to use CUDA_SAFE_CALL() (2 replies)
  70. Occupancy calculator question (1 reply)
  71. enable double precision for SDK (8 replies)
  72. cudaMallocArray returning invalid error code? (0 replies)
  73. Initialize Array to Zero (3 replies)
  74. Reduction Operation to find the Minimum (2 replies)
  75. Writing to global memory failing at runtime (4 replies)
  76. Occupancy Calculation in check but still 'out of resource' error. (4 replies)
  77. Incorrect calculation results for thread block size equal to 512 (6 replies)
  78. Driver install problem, Win 7, GT 230M (1 reply)
  79. Async memory problems (5 replies)
  80. Question about NVIDIA CUDA Visual Profiler Version 2.2 (0 replies)
  81. BUG in the 64 Bit Driver of CUDA 2.3 (0 replies)
  82. hitting the grid size limitation (5 replies)
  83. Watchdog timeout error (0 replies)
  84. Problem with memory access or thread synchronization (2 replies)
  85. how to create a dynamic array in the device function? (4 replies)
  86. Load data on GPU, work on GPU and receive data from GPU. OPENGL or CUDA? (3 replies)
  87. Just give me an advice. (1 reply)
  88. Why is this channel descriptor invalid? (1 reply)
  89. cudaArray access data (2 replies)
  90. fatal error LNK1112: module machine type 'x64' conflicts with target machine type 'X86' (5 replies)
  91. NVCC command unknown (0 replies)
  92. Is my bandwidth calculation right? (3 replies)
  93. Multi-GPU Requirements ! (3 replies)
  94. cuFFT plans: memory and time (1 reply)
  95. limits on number of textures? (10 replies)
  96. cuda bug: cudaGLUnregisterBufferObject fails to reclaim memory (5 replies)
  97. Optimization (2 replies)
  98. Optimization of kernel (4 replies)
  99. CUFFT partial corruption problem - random effects (1 reply)
  100. cublasMalloc return success but set the pointer to 0... (0 replies)
  101. Contexts (1 reply)
  102. Cuda error: invalid device pointer. (5 replies)
  103. Bug report: Incorrect block scheduling (9 replies)
  104. -DNVCC flag? (0 replies)
  105. Mixed CUDA and MPI programming (7 replies)
  106. why did my first cudaMalloc() cost so much time? (1 reply)
  107. GPU Memory monitoring (1 reply)
  108. problem of cudaGetDevice for MultiGPUs (2 replies)
  109. copy C array of structs to GPU with driver API and use it (0 replies)
  110. Anyone have a simple k-means implementation for CUDA? (0 replies)
  111. Determining Thread vs Block (1 reply)
  112. Simple kernel problem (2 replies)
  113. problems with using streams when overlapping transfer and kernel execution (0 replies)
  114. Problems with __threadfence (2 replies)
  115. compiling c++ string failed (0 replies)
  116. Simple Driver API sample doesn't work (2 replies)
  117. Could someone compile simple example for me on the mobile card? (20 replies)
  118. Graphical Output Question (3 replies)
  119. CUDA SDK 2.3 Bug report: vectorAddDrv (1 reply)
  120. problem with zerocopy (1 reply)
  121. CUDA slower in Windows 7 than in Windows XP (21 replies)
  122. Gibbs Sampling on CUDA (1 reply)
  123. cudaMemcpy2D error (1 reply)
  124. Using Mersenne Twister (1 reply)
  125. CUBLAS question (4 replies)
  126. Compiler Bug ? Position of statement causes program to fail! (8 replies)
  127. cudaBindTexture2D incorrect documentation? (0 replies)
  128. driver API: JIT compiler error, CUDA_ERROR_NO_BINARY_FOR_GPU (1 reply)
  129. Is it possible to increment a variable by different threads at the same time ? (3 replies)
  130. Coalesced accesses on different arrays (2 replies)
  131. Global memory read interference? (2 replies)
  132. How to compile a simple example for the driver API? (1 reply)
  133. Atomic Operations: new value? (1 reply)
  134. Shared Memory Compilation Error (2 replies)
  135. Strange CUDA Image Processing behavior (1 reply)
  136. Loading structured data efficiently using CUDA (8 replies)
  137. thread / block allocation in function of data size (5 replies)
  138. Cuda Tree Structures (0 replies)
  139. Load balancing Cuda contexts (9 replies)
  140. alignment issue in passing arrays to the GPU in kernel parameters (3 replies)
  141. Possible cudaMalloc problem (1 reply)
  142. Generating cuda code at run time (2 replies)
  143. CUDA incomprehensible error (2 replies)
  144. CUDA visual profiler using mpi? (1 reply)
  145. the sample "fluidGL" in SDK (0 replies)
  146. Why shared memory is slower than global memory with gradient computation? (6 replies)
  147. Binding linear memory to 2D texture (3 replies)
  148. STREAMS (0 replies)
  149. copy a matrix in global to a vector in shared (2 replies)
  150. How good is constant propagation in nvopencc? (0 replies)