The GPU is very fast at running the same small, predefined calculation on many pieces of data in parallel.
FSB's use case is completely different: it requires consecutive calculations on different objects with a massive data flow.
Imagine you have to calculate an MA (moving average). For a simple MA, the formula is something like: MA[n] = MA[n-1] + (Price[n] - Price[n - period]) / period.
Each value depends on the previous one, so there is no straightforward way to make the calculations in parallel.
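A minimal Python sketch of the running MA update above shows the problem: every step reads the result of the previous step, so the loop cannot simply be split across GPU threads. The function name sma_recursive and the choice to seed the first value with a plain average of the first window are illustrative assumptions, not FSB's actual code.

```python
def sma_recursive(prices, period):
    """Simple moving average via the running update
    MA[n] = MA[n-1] + (Price[n] - Price[n - period]) / period.
    Hypothetical helper for illustration. Entries before the first
    full window are None."""
    if period <= 0:
        raise ValueError("period must be positive")
    ma = [None] * len(prices)
    if len(prices) < period:
        return ma
    # Seed: plain average of the first full window (an assumption).
    ma[period - 1] = sum(prices[:period]) / period
    # Each iteration needs ma[n - 1], so the steps must run in order.
    for n in range(period, len(prices)):
        ma[n] = ma[n - 1] + (prices[n] - prices[n - period]) / period
    return ma
```

For example, sma_recursive([1, 2, 3, 4, 5, 6], 3) returns [None, None, 2.0, 3.0, 4.0, 5.0].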
Probably the only case that may benefit from the GPU is the calculation of the upper and lower bands of "band" indicators. It works because once the middle MA has been calculated, each band value can be computed independently of the others, e.g. Up[n] = MA[n] + Delta[n].
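By contrast, the band step is a pure element-wise map: each output depends only on MA[n] and Delta[n], so the elements could be processed in any order or all at once. A small Python sketch, assuming Delta is already computed (for example from ATR) and assuming symmetric bands where Lower[n] = MA[n] - Delta[n]; the function name bands is hypothetical.

```python
def bands(ma, delta):
    """Hypothetical helper: upper/lower bands from a precomputed middle MA.
    Assumes symmetric bands: Up = MA + Delta, Low = MA - Delta."""
    if len(ma) != len(delta):
        raise ValueError("ma and delta must have the same length")
    # No element reads another element's result, so each index
    # could be handled by a separate GPU thread.
    upper = [m + d for m, d in zip(ma, delta)]
    lower = [m - d for m, d in zip(ma, delta)]
    return upper, lower
```

For example, bands([10.0, 11.0], [1.0, 2.0]) returns ([11.0, 13.0], [9.0, 9.0]).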
However! To perform even a simple '+' operation, we have to call the GPU API, and that call will probably be slower than just doing the addition on the CPU. The only solution is to upload the complete MA and ATR (or other) arrays to the GPU, calculate the new array there, and download it back.
I'm very pessimistic about the benefits of such implementations.