Hey,
The only algorithms that benefit from a GPGPU implementation are those where the same sufficiently complex operation is applied to a very large set of data.
GPUs are very specialized for batch processing:
- memory bandwidth is huge for sequential access (an order of magnitude faster than CPU memory), but very low for random access: the memory controller is a "stupid", highly optimized piece of hardware.
- many operations can be executed simultaneously (up to 128 in modern GPUs), but a conditional branch has to take the same direction on all "threads". If it doesn't, both branches are executed and the unwanted results are discarded afterward. (This is a tricky idea to grasp: it happens because GPUs have a single instruction scheduling unit driving many execution units.)
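That branching behavior can be sketched with a toy cost model (illustrative only, not real GPU timing; `warp_branch_cost` and the cycle counts are made up for the example):

```python
# Toy model of lockstep ("SIMT") branch execution: if every thread in a
# group agrees on the branch, only that path is paid for; if they
# diverge, the hardware runs both paths back to back, masking off the
# inactive threads each time.

def warp_branch_cost(conditions, then_cost, else_cost):
    takes_then = any(conditions)        # at least one thread enters "then"
    takes_else = not all(conditions)    # at least one thread enters "else"
    cost = 0
    if takes_then:
        cost += then_cost
    if takes_else:
        cost += else_cost
    return cost

# Uniform branch: all 32 threads agree, only one path is executed.
print(warp_branch_cost([True] * 32, then_cost=10, else_cost=40))   # 10

# Divergent branch: both paths run, one after the other.
print(warp_branch_cost([i % 2 == 0 for i in range(32)], 10, 40))   # 50
```

So a branch that diverges inside a group costs the sum of both paths, not the maximum, which is why branchy code tends to perform badly on GPUs.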
Add to this the cost of transferring data from CPU to GPU and back again, and most of the time "optimizing" something by moving it to the GPU actually makes it slower.
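A quick back-of-the-envelope check makes the point; all the numbers below (bus bandwidth, timings, the `gpu_worth_it` helper) are assumptions for illustration, not measurements:

```python
# Does offloading to the GPU pay off once transfer time is counted?
# Assumed: a 4 GB/s CPU<->GPU bus, and a kernel the GPU runs 10x faster.

def gpu_worth_it(bytes_moved, bus_gbps, cpu_time_s, gpu_time_s):
    transfer_s = bytes_moved / (bus_gbps * 1e9)
    return transfer_s + gpu_time_s < cpu_time_s

# 100 MB in plus 100 MB out: the 0.05 s of transfer alone already
# matches the CPU's total time, so the 10x faster kernel loses.
data = 2 * 100e6
print(gpu_worth_it(data, 4.0, cpu_time_s=0.05, gpu_time_s=0.005))  # False
```

The transfer term is fixed per byte, so the GPU only wins when the computation per byte is heavy enough to dwarf it.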
More here if you're interested in the subject: http://gpgpu.org/
- Jeko