Jun 14 2008, 12:36 PM
Post #1
Group: Extranet Users Posts: 107 Joined: 6-October 07 From: Berkeley, CA Member No.: 72,970 Org.: UC Berkeley
my speedy FFT
Hi, I'd like to share an implementation of the FFT that achieves 160 Gflop/s on the GeForce 8800 GTX, about 3x faster than the 50 Gflop/s offered by CUFFT. It is designed for n = 512, which is hardcoded. The implementation also includes the cases n = 8 and n = 64, which work in a special data layout. Compile using CUDA 2.0 beta or later.

Vasily

Update (Sep 8, 2008): I attached a newer version that includes the n = 16, n = 256 and n = 1024 cases. Here is its performance in Gflop/s when using driver 177.11 (up to ~40% slower with the newer 177.84 version):

CODE
...n   batch 8600GTS 8800GTX 9800GTX  GTX280
...8 2097152      22      67      52     102
..16 1048576      28      89      68     125
..64  262144      35     126     104     231
.256   65536      39     143     137     221
.512   32768      41     162     154     298
1024   16384      41     159     166     260

Update (Jan 12, 2009): I attached a quickly patched version that supports forward and inverse FFTs for n = 8, 16, 64, 256, 512, 1024, 2048, 4096 and 8192. Here are results for CUDA 2.1 beta, driver 180.60:

CODE
Device: GeForce GTX 280, 1296 MHz clock, 1024 MB memory.
Compiled with CUDA 2010.
              --------CUFFT-------  ---This prototype---  ---two way---
...N   Batch  Gflop/s   GB/s error  Gflop/s   GB/s error  Gflop/s error
...8 1048576      8.8    9.4   1.8     83.0   88.6   1.6     82.8   2.1
..16  524288     19.7   15.8   2.1     91.9   73.5   1.5     92.5   1.8
..64  131072     55.8   29.8   2.3    185.6   99.0   2.4    185.4   3.0
.256   32768     97.1   38.8   2.2    160.1   64.1   2.0    160.1   3.0
.512   16384     65.2   23.2   2.9    240.9   85.6   2.5    240.9   3.7
1024    8192     86.7   27.8   2.7    211.4   67.7   2.5    211.9   3.9
2048    4096     50.6   14.7   3.7    160.0   46.6   3.0    159.6   4.5
4096    2048     48.9   13.0   4.0    171.0   45.6   3.3    170.2   4.9
8192    1024     47.4   11.7   4.4    185.9   45.8   3.4    184.1   5.2

Errors are given in ULPs and are supposed to be of order 1.

Some of the layouts used are not straightforward. If you need filtering based on element-wise multiplication by a vector in frequency space, I'd suggest using a similar layout for the vector.

This post has been edited by vvolkov: Jan 12 2009, 03:54 PM
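For readers who want to turn the Gflop/s figures above into transforms per second or compare them with the GB/s column, here is a minimal Python sketch. The post does not state how the numbers were computed, so two things are assumptions: the conventional nominal count of 5 n log2(n) flops per complex FFT, and a traffic model in which each transform reads and writes its n single-precision complex values (8 bytes each) exactly once. The ratios in the Jan 12, 2009 table appear consistent with both, but treat this as an illustration, not as the benchmark's actual code. All function names are made up for the example.

```python
import math


def fft_flops(n: int) -> float:
    """Nominal flops for one complex FFT of size n.

    Assumption: the common 5 * n * log2(n) convention, which the post
    does not state explicitly.
    """
    return 5.0 * n * math.log2(n)


def fft_bytes(n: int, bytes_per_element: int = 8) -> int:
    """Bytes moved per transform, assuming one read and one write of
    n single-precision complex values (8 bytes each)."""
    return 2 * n * bytes_per_element


def transforms_per_second(gflops: float, n: int) -> float:
    """Convert a reported Gflop/s figure into FFTs per second."""
    return gflops * 1e9 / fft_flops(n)


def implied_bandwidth_gbs(gflops: float, n: int) -> float:
    """Memory bandwidth (GB/s, 1 GB = 1e9 bytes) implied by a Gflop/s figure."""
    return transforms_per_second(gflops, n) * fft_bytes(n) / 1e9


if __name__ == "__main__":
    # (n, Gflop/s) pairs taken from the "This prototype" column above.
    for n, gflops in [(8, 83.0), (512, 240.9), (8192, 185.9)]:
        print(n,
              f"{transforms_per_second(gflops, n):.3e} FFT/s",
              f"{implied_bandwidth_gbs(gflops, n):.1f} GB/s")
```

Under these assumptions, 240.9 Gflop/s at n = 512 corresponds to roughly 10 million transforms per second, and the implied bandwidth lands within rounding of the 85.6 GB/s reported in the table.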
Attached File(s)
FFT_061408.zip ( 7.43K )
Number of downloads: 703
FFT_090808.zip ( 11.04K )
Number of downloads: 541
IFFT_011209.zip ( 19.55K )
Number of downloads: 644
vvolkov my speedy FFT Jun 14 2008, 12:36 PM
jimh Very cool. Thanks for posting this. I'm excite... Jun 15 2008, 05:02 AM
vvolkov QUOTE(jimh @ Jun 14 2008, 09:02 PM)I noticed ... Jun 15 2008, 05:26 AM
mfatica Using sincos instead of the sin and cos will be fa... Jun 15 2008, 11:26 AM
jimh Vasily, I was surprised to hear it didn't make... Jun 15 2008, 04:26 PM
oYo2k7 Hi,
I have a couple of questions, i'm kind of... Jun 17 2008, 02:43 PM
vvolkov QUOTE(oYo2k7 @ Jun 17 2008, 06:43 AM)1) Why d... Jun 18 2008, 01:17 AM
E.D. Riedijk QUOTE(vvolkov @ Jun 18 2008, 03:17 AM)Because... Jun 18 2008, 04:59 AM

vvolkov QUOTE(E.D. Riedijk @ Jun 17 2008, 08:59 PM)Bu... Jun 18 2008, 05:23 AM

E.D. Riedijk QUOTE(vvolkov @ Jun 18 2008, 07:23 AM)I don... Jun 18 2008, 12:29 PM

vvolkov QUOTE(E.D. Riedijk @ Jun 18 2008, 04:29 AM)An... Jun 18 2008, 01:52 PM
oYo2k7 Ok !
Sorry for the second question, i knew yo... Jun 18 2008, 07:02 AM
vvolkov QUOTE(oYo2k7 @ Jun 17 2008, 11:02 PM)One more... Jun 18 2008, 07:31 AM
E.D. Riedijk well, you can, but it will be slow memory compared... Jun 18 2008, 03:23 PM
cudacuda321 I looked through the code and it seems like comple... Sep 5 2008, 07:45 PM
vvolkov QUOTE(cudacuda321 @ Sep 5 2008, 11:45 AM)I lo... Sep 5 2008, 07:55 PM
cudacuda321 QUOTE(vvolkov @ Sep 5 2008, 12:55 PM)This is ... Sep 6 2008, 06:33 AM
vvolkov QUOTE(cudacuda321 @ Sep 5 2008, 10:33 PM)Sorr... Sep 6 2008, 07:04 AM
profquail Vasily, I've read some of your papers, and not... Sep 24 2008, 05:10 PM
vvolkov QUOTE(profquail @ Sep 24 2008, 09:10 AM)are y... Sep 26 2008, 08:20 AM
XFer Vasily,
thanks for this great contribution.
Does... Oct 5 2008, 09:22 PM
vvolkov QUOTE(XFer @ Oct 5 2008, 01:22 PM)Does it wor... Oct 6 2008, 07:09 AM
g000fy with "FFT_061408"
QUOTEDevice: GeForce ... Oct 6 2008, 03:37 AM
vvolkov QUOTE(g000fy @ Oct 5 2008, 07:37 PM)is there ... Oct 6 2008, 07:14 AM
g000fy QUOTE(vvolkov @ Oct 6 2008, 02:14 AM)It shoul... Oct 6 2008, 07:19 PM
RoofusGreen Would it be possible to run this code in CUDA 1.0?... Oct 7 2008, 04:48 PM

tmurray QUOTE(RoofusGreen @ Oct 7 2008, 09:48 AM)Woul... Oct 7 2008, 05:16 PM

RoofusGreen QUOTE(tmurray @ Oct 7 2008, 12:16 PM)Why are ... Oct 7 2008, 05:26 PM
vvolkov QUOTE(g000fy @ Oct 6 2008, 11:19 AM)thanks fo... Oct 7 2008, 07:28 PM
mfatica Cuda 2.0 is out for Mac.
The compiler in 1.1 will ... Oct 7 2008, 05:29 PM
shawkie I agree - great work!
Is there any chance of ... Nov 4 2008, 04:20 PM
hill_matthew Although this is now a bit old: http://www.science... Nov 10 2008, 03:01 PM
vvolkov QUOTE (hill_matthew @ Nov 10 2008, 07:01 ... Nov 10 2008, 03:24 PM
hill_matthew Very interesting paper. Did I miss the link to the... Nov 10 2008, 03:41 PM
vvolkov QUOTE (hill_matthew @ Nov 10 2008, 07:41 ... Nov 10 2008, 04:11 PM
_gl QUOTE (vvolkov @ Nov 10 2008, 04:11 PM) I... Nov 26 2008, 04:50 PM
shawkie QUOTE (_gl @ Nov 26 2008, 04:50 PM) I... Nov 27 2008, 01:51 PM
hill_matthew Not very surprising really, but encouraging that t... Nov 27 2008, 08:06 PM
doctor Using the 061408 and cuda 2.1 on 8800GT I got
D... Dec 18 2008, 04:03 AM
hill_matthew I've been thinking about the upcoming OpenCL a... Dec 20 2008, 12:14 AM
lzhfire To make a FFT testing with double precision in CUD... Dec 22 2008, 10:09 AM
dpephd Hey Everyone,
I got a BFG GeForce 260 GTX OC for ... Jan 5 2009, 05:33 AM
vvolkov QUOTE (dpephd @ Jan 4 2009, 09:33 PM) I a... Jan 5 2009, 12:36 PM
dpephd QUOTE (dpephd @ Jan 4 2009, 09:33 PM) I.e... Jan 5 2009, 07:43 PM
vvolkov QUOTE (dpephd @ Jan 5 2009, 11:43 AM) Obt... Jan 6 2009, 01:57 AM
dpephd QUOTE (vvolkov @ Jan 5 2009, 05:57 PM) ..... Jan 8 2009, 05:53 PM
mfatica Thanks for reporting this bug, we are working on i... Jan 6 2009, 01:49 AM
mfatica This compiler flag will fix the performance proble... Jan 11 2009, 06:36 PM
dpephd QUOTE (mfatica @ Jan 11 2009, 10:36 AM) T... Jan 12 2009, 05:27 AM

dpephd QUOTE (dpephd @ Jan 11 2009, 09:27 PM) Th... Jan 12 2009, 08:06 AM

vvolkov QUOTE (carcle85 @ Jan 12 2009, 01:21 AM) ... Jan 12 2009, 04:05 PM

carcle85 QUOTE (vvolkov @ Jan 12 2009, 09:05 AM) P... Jan 12 2009, 04:38 PM
dpephd QUOTE (mfatica @ Jan 11 2009, 10:36 AM) T... Jan 12 2009, 08:41 AM
dpephd some additional variable batch size results for 8 ... Jan 19 2009, 07:32 AM
carcle85 Hello,
I've tried this fft and I obtained grea... Jan 12 2009, 09:21 AM
kanishk Thanks vvolkov for your really nice and useful imp... Jan 17 2009, 09:10 AM
vvolkov QUOTE (kanishk @ Jan 17 2009, 01:10 AM) W... Jan 18 2009, 03:17 PM
dpephd QUOTE (kanishk @ Jan 17 2009, 01:10 AM) C... Jan 19 2009, 07:27 AM
CudaSpeak IFFT-011209 running unaltered under Windows XP 64 ... Jan 20 2009, 10:23 PM
Pimbolie1979 Do you use a Hanning Window in your FFT? Jan 24 2009, 11:56 AM
profquail QUOTE (Pimbolie1979 @ Jan 24 2009, 05:56 ... Jan 24 2009, 11:45 PM
seibert QUOTE (profquail @ Jan 24 2009, 06:45 PM)... Jan 25 2009, 03:17 AM
profquail QUOTE (seibert @ Jan 24 2009, 09:17 PM) T... Jan 25 2009, 05:19 PM
Pimbolie1979 N = the number of FFT points
What is the batch pa... Jan 24 2009, 12:25 PM
CudaSpeak QUOTE (Pimbolie1979 @ Jan 24 2009, 06:56 ... Jan 24 2009, 03:32 PM
Andy386 It's seems i am unable to run your FFT on Wind... Jan 27 2009, 01:46 PM
E.D. Riedijk QUOTE (Andy386 @ Jan 27 2009, 02:46 PM) I... Jan 27 2009, 02:24 PM
Andy386 Argh, i totally missed to get an mexfile out of th... Jan 27 2009, 03:17 PM
wanderine Is it possible to use for 2D and 3D? 4D ? Jan 27 2009, 10:43 PM
Andy386 QUOTE (wanderine @ Jan 28 2009, 12:43 AM)... Jan 28 2009, 12:36 PM
E.D. Riedijk QUOTE (Andy386 @ Jan 28 2009, 01:36 PM) I... Jan 28 2009, 12:46 PM
carcle85 My problem is that to obtain the output in the sam... Jan 31 2009, 11:40 AM
vvolkov QUOTE (carcle85 @ Jan 31 2009, 03:40 AM) ... Jan 31 2009, 11:48 AM

carcle85 QUOTE (vvolkov @ Jan 31 2009, 03:48 AM) B... Jan 31 2009, 01:19 PM

vvolkov QUOTE (carcle85 @ Jan 31 2009, 05:19 AM) ... Jan 31 2009, 01:24 PM
E.D. Riedijk QUOTE (carcle85 @ Jan 31 2009, 12:40 PM) ... Jan 31 2009, 01:24 PM
carcle85 QUOTE (vvolkov @ Jan 31 2009, 05:24 AM) S... Jan 31 2009, 01:47 PM
Pimbolie1979 Can you create a DLL from your prototype FFT? Jan 31 2009, 11:17 PM
saratoga Is it possible to estimate FFTs per second from th... Feb 2 2009, 03:46 PM
glenvidia QUOTE (vvolkov @ Jun 14 2008, 06:36 AM) U... Feb 17 2009, 09:10 PM
TOAOMatis Hi there, my first post in this topic. Started yes... Feb 18 2009, 10:08 AM
qqliudl Using 011209 and CUDA 2.1
CODEDevice: GeForce 880... Mar 2 2009, 08:16 PM
spacerat Using IFFT_011209 and CUDA 1.1, 8600GTS / 32 Strea... Mar 3 2009, 11:46 AM
vvolkov QUOTE (spacerat @ Mar 3 2009, 03:46 AM) U... Mar 3 2009, 10:27 PM
doctor Can anyone post code for the missing power of two... Mar 4 2009, 07:10 AM
spacerat Now using Cuda 2.1 - looks much better already
COD... Mar 5 2009, 09:00 AM
wanderine Has this been implemented in CUFFT yet? How big is... Mar 7 2009, 10:59 AM
wanderine If a 3D FFT is calculated with the use of 3 sequen... Mar 10 2009, 10:17 PM
vvolkov QUOTE (wanderine @ Mar 10 2009, 02:17 PM)... Mar 10 2009, 11:47 PM
wanderine QUOTE (vvolkov @ Mar 11 2009, 12:47 AM) P... Mar 25 2009, 07:16 AM
Pimbolie1979 What does error 3.0 mean? Apr 7 2009, 07:19 PM
vvolkov QUOTE (Pimbolie1979 @ Apr 7 2009, 11:19 A... Apr 8 2009, 03:56 PM
jsvarma Hi,
After using the speedy FFT code, we hav observ... Jun 9 2009, 10:44 AM
vvolkov QUOTE (jsvarma @ Jun 9 2009, 03:44 AM) Af... Jun 16 2009, 04:33 AM
wanderine How about support for sizes that are not a power o... Jun 9 2009, 11:01 AM
mcg Hey folks,
I attempted to integrate this code int... Jun 17 2009, 09:21 PM
NCC-1701D Hey
I had one basic doubt abt the performance mea... Jun 24 2009, 06:35 AM
vvolkov QUOTE (NCC-1701D @ Jun 23 2009, 11:3... Jun 24 2009, 07:12 AM
NCC-1701D QUOTE (vvolkov @ Jun 24 2009, 09:12 AM) G... Jun 24 2009, 07:40 AM
mfatica I really don't see the problem, as long as you... Jun 24 2009, 10:07 AM