96 kHz.org
Advanced Audio Recording

Advantages of FPGAs in the field of Audio Processing

At first sight, audio processing appears to be an easy task in modern signal processing, since the sample rates involved are mostly low. A closer look reveals that the desired accuracy (16 bits and more) creates a significant demand for computing power in applications such as physical modeling. To understand the principles, a deeper look at the architecture of FPGAs is required:

Advantages of FPGAs in Signal Processing

FPGAs can perform basic operations such as multiplication, summation and decisions much faster than microcontrollers, and they can process many "tasks" in parallel in real time. For example, a current DSP operating at 60 MHz evaluates a 2nd order differential equation describing a sine oscillation in about 1 µs, using sequential processing, variable handling and RAM access. In an FPGA, each single step of such an equation can be processed in parallel. This leads to a so-called pipeline, in which every resource is free again for use in the next clock cycle, so many channels, voices or cases can be processed. The only cost is a latency of a fixed number of clock cycles. As long as the result of a calculation is not needed immediately to continue processing, tasks can thus be processed much more efficiently overall.

Although the basic system frequency may be higher with DSPs or CPUs, an FPGA can easily become much faster than, e.g., a DSP solution. FPGAs are best suited to applications that require parallel processing: the more channels are required, the better the utilization of the FPGA.
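To make the sine oscillation mentioned above concrete, here is a minimal Python sketch of one common discrete-time form, the second-order recurrence y[n] = 2*cos(w)*y[n-1] - y[n-2]. Whether the DSP timing quoted above refers to exactly this form is an assumption, and the function name `sine_oscillator` is purely illustrative. Each output sample costs one multiplication and one subtraction, which is the kind of step an FPGA can place in a pipeline.

```python
import math

def sine_oscillator(freq_hz, sample_rate_hz, num_samples):
    """Generate sin(n*w) with the recurrence y[n] = 2*cos(w)*y[n-1] - y[n-2]."""
    w = 2.0 * math.pi * freq_hz / sample_rate_hz
    coeff = 2.0 * math.cos(w)      # the only coefficient needed per sample
    y1 = -math.sin(w)              # y[-1], chosen so that y[0] = 0
    y2 = -math.sin(2.0 * w)        # y[-2]
    out = []
    for _ in range(num_samples):
        y0 = coeff * y1 - y2       # one multiply and one subtract per sample
        out.append(y0)
        y2, y1 = y1, y0            # shift the two state variables
    return out

# One period of a 1 kHz tone at a 48 kHz sample rate
samples = sine_oscillator(1000.0, 48000.0, 48)
```

In a sequential processor the multiply, the subtract and the state update happen one after another; in hardware they can be separate stages that each work on a different voice in the same clock cycle.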

The following example shows a timing comparison between an FPGA and a DSP for a 128-tap sequential filter (equalizer):

 

Comparison of FPGAs and DSP - equalizer example

 

Here the DSP (left) needs 13 clock cycles to process one sample and its corresponding coefficient. All taps of the filter are processed at 80 MHz, which is "exactly" the speed required to finish within the period given by the 48 kHz sampling frequency. More taps would require a higher DSP operating frequency. In contrast, the FPGA consists of combinatorial logic forming deciders and calculation modules, all of which can operate simultaneously. This leads only to a latency of 7 clock cycles, including RAM wait states. The final result of 23000 time steps appears worse at first sight and seems to suggest that 2 of these instances would be required to complete a full operation within the 48 kHz sample period, possibly processing the upper and lower regions of the samples separately. But since all actions are done in parallel, only 128 clock cycles plus latency are required.
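The cycle budget behind these figures can be checked with a few lines of Python. This is only an arithmetic sketch using the numbers quoted in the text (128 taps, 13 DSP cycles per tap, 80 MHz, 48 kHz, 7 cycles of FPGA latency); the variable and function names are illustrative.

```python
def cycles_per_sample(clock_hz, sample_rate_hz):
    """Whole clock cycles available within one sample period."""
    return clock_hz // sample_rate_hz

TAPS = 128
DSP_CYCLES_PER_TAP = 13
DSP_CLOCK_HZ = 80_000_000
SAMPLE_RATE_HZ = 48_000
FPGA_LATENCY_CYCLES = 7

dsp_budget = cycles_per_sample(DSP_CLOCK_HZ, SAMPLE_RATE_HZ)  # 1666 cycles per sample
dsp_needed = TAPS * DSP_CYCLES_PER_TAP                        # 1664 cycles for all taps
fpga_needed = TAPS + FPGA_LATENCY_CYCLES                      # 135 cycles, one tap per clock

print(dsp_budget, dsp_needed, fpga_needed)
```

The DSP needs 1664 of the 1666 cycles available per sample, which is why 80 MHz is "exactly" enough, while the pipelined structure needs 135 cycles for the same filter.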

A common issue with FPGAs is the latency of internal elements such as multipliers and adders, which might not be fast enough to complete their operation within one clock cycle. This can be solved by partial parallelization, as shown here for a multiplier structure:

Comparison of FPGAs and DSP - pipelining methods

Pipelining method to increase the throughput of a DSP system in an FPGA.
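The idea of splitting a slow multiplier into pipeline stages can be illustrated with a small cycle-by-cycle simulation in Python. This is a sketch under assumptions, not the structure from the figure: it assumes unsigned 16-bit operands, two stages, and a split of one operand into a low and a high byte. The class name `PipelinedMultiplier` is hypothetical. Stage 1 computes two narrow partial products, stage 2 adds them, and a new operand pair can enter on every clock.

```python
class PipelinedMultiplier:
    """Two-stage pipelined multiply for unsigned 16-bit operands (illustrative)."""

    def __init__(self):
        self.stage1 = None   # register holding the two partial products
        self.stage2 = None   # register holding the finished product

    def clock(self, operands=None):
        """Advance one clock cycle; return the product leaving the pipeline, or None."""
        result = self.stage2
        # Stage 2: combine the partial products registered in the previous cycle.
        if self.stage1 is not None:
            p_lo, p_hi = self.stage1
            self.stage2 = p_lo + (p_hi << 8)
        else:
            self.stage2 = None
        # Stage 1: two narrow multiplications that could run side by side in hardware.
        if operands is not None:
            a, b = operands
            self.stage1 = (a * (b & 0xFF), a * (b >> 8))
        else:
            self.stage1 = None
        return result


def run(pairs):
    """Feed one operand pair per clock, then flush the two pipeline stages."""
    mul = PipelinedMultiplier()
    products = []
    for item in list(pairs) + [None, None]:
        out = mul.clock(item)
        if out is not None:
            products.append(out)
    return products

print(run([(3, 5), (1000, 40000), (65535, 65535)]))
```

Each result appears two clocks after its operands entered, but one result is delivered per clock once the pipeline is filled, so throughput is set by the slowest stage rather than by the whole multiplication.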

 

Conclusion

FPGAs typically run at lower clock speeds than DSPs when the synthesis constraints aim for a balanced tradeoff between speed and area, so that not too many additional flip-flops (FFs) have to be added to reach the desired system frequency. Usually the clock is about 3 times lower. On the other hand, FPGAs perform many operations within a single step where DSPs need 2 or more, and thus come closer to DSPs again in final data throughput. Even so, a ratio of 1:2 may persist from this point of view.

But there is room for improvement: because operation is fully pipelined, any remaining clock cycle that is not needed to complete the operations of a channel within one sample period can be used to generate more channels in simple pipelined systems. For fully pipelined systems, the latency no longer affects the resulting number of channels. Only a further set of variables / signals is required for each additional channel, so the pipeline delay has to be balanced against the architecture width. Tweaking the internal architecture so that complex operations like filtering are done in parallel saves pipeline delay and latency and increases the used area only moderately, whereas doubling the number of voices in a DSP system requires up to twice the operating frequency.
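How spare clock cycles translate into additional channels can be estimated with a short Python sketch. It rests on assumptions that are not stated in the text: a hypothetical FPGA clock of 25 MHz, the 128-tap filter and 7-cycle latency from the equalizer example, and a simple model in which a "simple" pipeline pays the latency once per channel while a fully pipelined design overlaps it with the next channel. The function names are illustrative.

```python
def cycles_per_sample(clock_hz, sample_rate_hz):
    """Whole clock cycles available within one sample period."""
    return clock_hz // sample_rate_hz

def channels_simple_pipeline(clock_hz, sample_rate_hz, taps, latency):
    """Assumed model: every channel occupies its taps plus the pipeline latency."""
    return cycles_per_sample(clock_hz, sample_rate_hz) // (taps + latency)

def channels_full_pipeline(clock_hz, sample_rate_hz, taps):
    """Assumed model: latency overlaps with the next channel and costs no slots."""
    return cycles_per_sample(clock_hz, sample_rate_hz) // taps

FPGA_CLOCK_HZ = 25_000_000   # hypothetical value, not taken from the text
SAMPLE_RATE_HZ = 48_000

print(channels_simple_pipeline(FPGA_CLOCK_HZ, SAMPLE_RATE_HZ, 128, 7))  # 3
print(channels_full_pipeline(FPGA_CLOCK_HZ, SAMPLE_RATE_HZ, 128))       # 4
```

Under these assumptions the same filter hardware serves three or four channels at 48 kHz without raising the clock, whereas a sequential processor would need proportionally more cycles for each added channel.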

 

Read more about DSP-Systems for Audio Processing

 

© 2005 - Jürgen Schuhmacher