Design And Analysis Of Various VLSI Optimization Techniques For Correlators

Sivasankari B1* and Poongodi P2

1Department of Electronics & Communication Engineering, SNS College of Technology, Coimbatore, Tamil Nadu
2Department of Electronics & Communication Engineering, Karpagam College of Engineering, Coimbatore, India

*Corresponding author: E-Mail: sanksnsct@gmail.com

ABSTRACT

Background: This paper describes correlator architectures that use VLSI optimization techniques such as parallel processing and pipelining. Both techniques are used to reduce power consumption and to achieve high speed. Objective: To achieve high speed and low power consumption in correlators by applying parallel processing and pipelining. Results: Simulation is done using the Xilinx ISE 13.2 tool with Verilog HDL (a hardware description language), and the multiplier-less correlator and the pre-compute correlator have been synthesized on a Kintex-7 (xc7k70t-fbg676) FPGA. Compared to the parallel pre-compute correlator, the parallel pre-compute correlator with pipelining reduces the delay by 20.77%. Conclusion: Up to a certain limit, pipelining provides significant performance gains with little increase in chip area. It also reduces glitching in the circuit. Throughput beyond that achievable by pipelining can be attained by parallel architectures.

KEY WORDS: Optimization, Parallel Processing, Pipelining, Correlator.

1. INTRODUCTION

Nowadays, wireless equipment is part of everyday life. In every device, from mobile phones to computers, Broadband Wireless Access (BWA) makes the internet available without a wired modem connection. In a wireless system, data are transmitted through radio waves, which allows communication over extended ranges that are not practical with wires. Other forms of wireless access are Wi-Fi and WiMAX. The operation of Wi-Fi and WiMAX is similar, but WiMAX operates at high speed and serves a large number of users. WiMAX uses Orthogonal Frequency Division Multiplexing (OFDM), a multi-carrier modulation technique in which closely spaced subcarrier signals carry the data over the channel in parallel. These subcarriers are orthogonal to each other, so they do not interfere with one another even though they are closely spaced. For complete elimination of Inter Symbol Interference (ISI), there must be a guard interval between OFDM symbols. When a correlation scheme is used, the hardware complexity of MIMO-OFDM synchronization can be reduced by a time-multiplexing technique.

C. Visweswariah (1997) proposed a circuit optimization technique that maintains timing accuracy. G. Hazari et al. proposed a VLSI optimization flow suitable for compilers, design procedures and real-time allocation of on-chip modules. For buffer insertion, Uttraphan proposed a technique that uses dynamic programming to optimize the interconnect power and the delay of buffers. Fatemeh Kashfi (2011) presented a variety of analytical methods to optimize the power and delay of VLSI systems.

An FPGA architecture can be exploited for implementing wireless communication protocols. Precomputation-based correlators have also been presented for improving the throughput of wireless systems.

A correlator is a digital device that takes two Nyquist-sampled digital streams, representing the voltages present in one or more radio receivers, and computes their cross-correlation as a function of time lag. Correlation is a mathematical operation similar to convolution: it uses two signals to generate a third signal. Each correlation algorithm is based on the correlation of the received signal. The multiplier-less correlator is designed to avoid the use of DSP slices. It reduces area, delay and power consumption, and it can be used in any FPGA architecture. Because it is based on a cross-correlation algorithm, its performance is improved and it needs fewer hardware operators (components) than the classic correlator. In 2014, Anandh Leno et al. proposed the design of a resource-efficient low-power correlator for communication. The following optimization techniques are used to achieve higher throughput.
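The cross-correlation as a function of time lag can be illustrated with a minimal sketch. The function names, the three-sample reference and the test signal below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def cross_correlate(received, reference, lag):
    """Correlate `received` with `reference` at one time lag.

    Each received sample is multiplied by the complex conjugate of the
    matching reference sample and the products are accumulated.
    """
    total = 0j
    for k, ref in enumerate(reference):
        n = lag + k
        if 0 <= n < len(received):
            total += received[n] * ref.conjugate()
    return total


def correlation_function(received, reference):
    """Evaluate the correlation for every lag where the reference fits."""
    lags = range(len(received) - len(reference) + 1)
    return [cross_correlate(received, reference, lag) for lag in lags]


# Hypothetical example: the reference is embedded in the received stream
# starting at sample index 2.
reference = [1 + 1j, -1 - 1j, 1 - 1j]
received = [0j, 0j] + reference + [0j]

corr = correlation_function(received, reference)
magnitudes = [abs(value) for value in corr]
peak_lag = magnitudes.index(max(magnitudes))
print(peak_lag, magnitudes)
```

The magnitude peaks at the lag where the received stream lines up with the reference, which is the property a synchronization correlator relies on.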

Parallel Processing: In DSP, parallel processing is a technique that duplicates functional units so that different tasks (signals) are processed simultaneously. The same signal processing can therefore be applied to different signals using equivalent duplicated functional units. Because a parallel DSP design generates multiple outputs per clock cycle, it achieves higher throughput than a non-parallel design. The advantages of parallel processing are reduced power consumption, an increased sample rate without an increase in clock speed, and parallelism.
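The effect of duplicating functional units can be sketched in software by counting clock cycles. The model below is only an illustration under stated assumptions: the 3-tap filter, the sample values and the names `serial_design` and `parallel_design` are hypothetical, and each loop iteration stands for one clock cycle.

```python
def tap_output(x, taps, n):
    """One functional unit: y[n] = sum over k of taps[k] * x[n - k]."""
    return sum(t * x[n - k] for k, t in enumerate(taps) if n - k >= 0)


def serial_design(x, taps):
    """A single functional unit produces one output per clock cycle."""
    outputs, cycles = [], 0
    for n in range(len(x)):
        outputs.append(tap_output(x, taps, n))
        cycles += 1
    return outputs, cycles


def parallel_design(x, taps, level=2):
    """`level` duplicated units produce `level` outputs per clock cycle."""
    outputs, cycles = [], 0
    for base in range(0, len(x), level):
        last = min(base + level, len(x))
        outputs.extend(tap_output(x, taps, n) for n in range(base, last))
        cycles += 1
    return outputs, cycles


samples = [3, 1, 4, 1, 5, 9, 2, 6]
taps = [1, -1, 2]
serial_out, serial_cycles = serial_design(samples, taps)
parallel_out, parallel_cycles = parallel_design(samples, taps, level=2)
print(serial_cycles, parallel_cycles)
```

Both designs produce the same outputs, but the 2-parallel version needs half as many clock cycles, so the sample rate doubles at an unchanged clock speed.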

Pipelining: Pipelining is an important technique used in most digital applications, such as microcontrollers and DSP systems. The idea comes from a water pipeline, in which water is sent continuously without waiting for the water already in the pipe to come out. In most DSP systems, the speed of the critical path is improved by pipelining. For example, pipelining can either increase the clock frequency or reduce the power consumption at the same speed of the DSP system. DSP48E1-based correlators can attain higher clock speeds only through a detailed pipelined design.
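How pipelining shortens the critical path can be sketched with simple timing arithmetic. The block delays and the function name `clock_period` below are hypothetical values chosen for illustration, and register setup and clock-to-output delays are ignored.

```python
def clock_period(block_delays_ns, register_after):
    """Return (minimum clock period, latency in cycles) of a datapath.

    block_delays_ns: delays of combinational blocks connected in series.
    register_after:  indices at which a pipeline register cuts the chain
                     (a cut at index i is placed before block i).
    The clock period is set by the slowest register-to-register segment.
    """
    segments, start = [], 0
    for cut in sorted(register_after) + [len(block_delays_ns)]:
        segments.append(sum(block_delays_ns[start:cut]))
        start = cut
    return max(segments), len(segments)


# Hypothetical datapath of four combinational blocks (delays in ns).
blocks = [4, 6, 5, 5]

unpipelined_period, unpipelined_latency = clock_period(blocks, [])
pipelined_period, pipelined_latency = clock_period(blocks, [2])
print(unpipelined_period, unpipelined_latency)
print(pipelined_period, pipelined_latency)
```

In this example one pipeline register halves the critical path from 20 ns to 10 ns, at the cost of one extra cycle of latency and the area of the register.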

2. METHODS
Proposed VLSI Optimization Based Correlators:

Parallel Pre-compute Correlator: The architecture of the parallel pre-compute correlator is shown in Fig.1. The proposed parallel correlator is mainly used in OFDM systems for timing synchronization. The size of the correlator and the number of registers are based on the input samples. The correlator coefficients are selected from the preamble samples of the short OFDM signal; the preamble signal is transmitted for time synchronization. The parallel correlator is also based on a computation-sharing technique. The product of the received sample with the complex correlator coefficients is produced by the pre-compute and selector units: the pre-computed values are selected by multiplexers. Finally, the additions are done in parallel. This parallel processing reduces the delay at the cost of some area overhead.

![Figure 1. Architecture of Parallel Precompute Correlator](image1)
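The pre-compute and select idea can be sketched in software as follows. This is a minimal model under loudly stated assumptions, not the authors' design: the coefficient alphabet `CANDIDATES`, the transposed register chain, the 3-tap size and all names are hypothetical. The point it illustrates is computation sharing: the candidate products are computed once per received sample, and every tap only selects one of them.

```python
# Assumption: every correlator coefficient comes from this small alphabet.
CANDIDATES = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]


def precompute(sample):
    """Pre-compute unit: products of one sample with all candidates."""
    return [sample * c for c in CANDIDATES]


def precompute_correlator(samples, select):
    """Transposed-form correlator; select[k] picks the coefficient of tap k."""
    taps = len(select)
    regs = [0j] * (taps + 1)            # regs[taps] is a constant zero input
    outputs = []
    for x in samples:
        products = precompute(x)        # shared by all taps
        # Each tap uses a multiplexer (list index) instead of a multiplier.
        regs = [regs[k + 1] + products[select[k]] for k in range(taps)] + [0j]
        outputs.append(regs[0])
    return outputs


def multiplier_reference(samples, select):
    """Reference model with one explicit multiplication per tap."""
    coeffs = [CANDIDATES[s] for s in select]
    return [
        sum(coeffs[k] * samples[n - k] for k in range(len(coeffs)) if n - k >= 0)
        for n in range(len(samples))
    ]


samples = [1 + 2j, 3 - 1j, -2 + 0j, 0 + 1j, 2 + 2j]
select = [0, 3, 1]                      # hypothetical 3-tap configuration
shared = precompute_correlator(samples, select)
reference = multiplier_reference(samples, select)
print(all(abs(a - b) < 1e-9 for a, b in zip(shared, reference)))
```

With 64 taps the saving is larger: four products per sample replace 64 multiplications, and the remaining work is selection and addition.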

Parallel Precompute Correlator With Pipelining: The architecture of the parallel pre-compute correlator with pipelining is shown in Fig.2. The correlator is implemented using parallel processing, pipelining and computation sharing. These techniques increase the speed and reduce the power consumption.

Here R is a 64-bit register and pr[0], pr[1], …, pr[63] are the preamble symbols. The basic components of this architecture are multiplexers, adders, the pre-computation unit and registers. The pre-computed values are selected with the help of the preamble symbols. The pre-computation unit computes the product of the received sample with the complex correlator coefficients.

![Figure 2. Parallel pre-compute correlator with pipelining](image2)
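The interaction of parallel addition and pipelining can be sketched with a cycle-level model of an adder tree. The text does not state where the pipeline registers are placed, so the sketch assumes, purely for illustration, one register after every adder level; the function names and the 8-input width are also hypothetical.

```python
def pairwise_sum(values):
    """One adder level: add neighbouring pairs in parallel."""
    return [values[i] + values[i + 1] for i in range(0, len(values), 2)]


def pipelined_adder_tree(vectors, width):
    """Cycle-level model of an adder tree with a register after each level.

    vectors: one list of `width` values per clock cycle (width = power of 2).
    Returns the registered output seen after each clock edge (None while
    the pipeline is still filling).
    """
    levels = width.bit_length() - 1          # log2(width)
    regs = [None] * levels                   # regs[i]: output of level i
    results = []
    # Extra idle cycles flush the last inputs through the pipeline.
    for vec in list(vectors) + [None] * (levels - 1):
        # Update from the last stage backwards so each stage reads the
        # value its predecessor held before this clock edge.
        for i in range(levels - 1, 0, -1):
            regs[i] = None if regs[i - 1] is None else pairwise_sum(regs[i - 1])
        regs[0] = None if vec is None else pairwise_sum(vec)
        results.append(None if regs[-1] is None else regs[-1][0])
    return results


inputs = [[1, 2, 3, 4, 5, 6, 7, 8], [1, 1, 1, 1, 1, 1, 1, 1]]
sums = pipelined_adder_tree(inputs, width=8)
print(sums)
```

A new input vector is accepted every cycle, and each sum appears three cycles (one per adder level) after its inputs, while the clock period only has to cover a single adder.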

3. RESULTS AND DISCUSSIONS
Simulation is done using the Xilinx ISE 13.2 tool with Verilog HDL (a hardware description language), and the multiplier-less correlator and the pre-compute correlator have been synthesized on a Kintex-7 (xc7k70t-fbg676) FPGA.

Simulation Result of Parallel Pre-compute Correlator: The simulation result of the parallel precompute correlator is shown in Fig.3. Here x is the 32-bit input, h1, h2, …, h64 are the correlator coefficients, and y_re and y_img are the real and imaginary parts of the correlator output. Various inputs are applied and the associated outputs are analyzed.

Simulation Result of Parallel Precompute Correlator with Pipelining: The simulation result of the parallel precompute correlator with pipelining is shown in Fig.4. Here x is the 32-bit input, clk is the clock input, and h1, h2, …, h64 are the complex correlator coefficients. y_re and y_img are the real and imaginary parts of the correlator output.

Implementation Results: The precompute correlator architectures with VLSI optimization techniques such as parallel processing and pipelining are synthesized on a Kintex-7 FPGA using Xilinx ISE 13.2. These architectures are coded in Verilog HDL.

Table.1. Comparison of Area

<table>
<thead>
<tr>
<th>Technique</th>
<th>Number of Occupied Slices</th>
<th>Number of Slice LUTs</th>
</tr>
</thead>
<tbody>
<tr>
<td>Correlator using Array Multiplier</td>
<td>3704</td>
<td>12015</td>
</tr>
<tr>
<td>Parallel precompute correlator</td>
<td>2303</td>
<td>9074</td>
</tr>
<tr>
<td>Parallel precompute correlator with pipelining</td>
<td>2404</td>
<td>9180</td>
</tr>
</tbody>
</table>

The area overhead is analysed in terms of the number of occupied slices and the number of slice LUTs, as given in Table.1. The results show that the parallel precompute correlator reduces the area relative to the correlator using an array multiplier (2303 versus 3704 occupied slices), and that adding pipelining slightly increases the area (2404 occupied slices).

Table.2. Comparison of Delay

<table>
<thead>
<tr>
<th>Technique</th>
<th>Delay (ns)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Correlator using Array Multiplier</td>
<td>28.998</td>
</tr>
<tr>
<td>Parallel precompute correlator</td>
<td>19.588</td>
</tr>
<tr>
<td>Parallel precompute correlator with pipelining</td>
<td>16.219</td>
</tr>
</tbody>
</table>

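From Table.2, it is clear that the parallel precompute correlator with pipelining reduces the delay of the parallel precompute correlator from 19.588 ns to 16.219 ns. The saving of 3.369 ns corresponds to 20.77% when expressed relative to the pipelined delay, or about 17.2% when expressed relative to the delay of the parallel precompute correlator.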
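The percentages can be checked directly from the Table.2 values. The sketch below only restates that arithmetic; the dictionary keys and variable names are chosen here for illustration.

```python
# Delays in ns, copied from Table.2.
delay_ns = {
    "array_multiplier": 28.998,
    "parallel_precompute": 19.588,
    "parallel_precompute_pipelined": 16.219,
}

before = delay_ns["parallel_precompute"]
after = delay_ns["parallel_precompute_pipelined"]
saving_ns = before - after

# The same saving expressed against two different baselines.
percent_of_pipelined = 100.0 * saving_ns / after
percent_of_parallel = 100.0 * saving_ns / before

print(round(saving_ns, 3))
print(round(percent_of_pipelined, 2))
print(round(percent_of_parallel, 2))
```

The 20.77% figure is reproduced when the saving is divided by the pipelined delay; dividing by the original delay gives the smaller figure of about 17.2%.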

Table.3. Comparison of power

<table>
<thead>
<tr>
<th>Technique</th>
<th>Power (watts)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Correlator using Array Multiplier</td>
<td>5.219</td>
</tr>
<tr>
<td>Parallel precompute correlator</td>
<td>2.5</td>
</tr>
<tr>
<td>Parallel precompute correlator with pipelining</td>
<td>2.09</td>
</tr>
</tbody>
</table>

Table.3 shows that the parallel precompute correlator with pipelining reduces the power consumption of the parallel precompute correlator from 2.5 W to 2.09 W, a saving of 0.41 W (about 16.4%).
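The power savings can be recomputed directly from the Table.3 values in the same way. The helper name `percent_saving` and the dictionary keys below are hypothetical; the numbers are copied from the table.

```python
# Power in watts, copied from Table.3.
power_w = {
    "array_multiplier": 5.219,
    "parallel_precompute": 2.5,
    "parallel_precompute_pipelined": 2.09,
}


def percent_saving(baseline, improved):
    """Power saved by `improved`, as a percentage of `baseline`."""
    return 100.0 * (power_w[baseline] - power_w[improved]) / power_w[baseline]


pipelining_gain = percent_saving("parallel_precompute",
                                 "parallel_precompute_pipelined")
total_gain = percent_saving("array_multiplier",
                            "parallel_precompute_pipelined")

print(round(pipelining_gain, 1))
print(round(total_gain, 1))
```

From these table values, pipelining saves about 16.4% of the power of the parallel precompute correlator, and the pipelined design uses roughly 60% less power than the correlator using an array multiplier.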
4. CONCLUSION

The techniques of pipelining and parallel processing have been discussed. Which technique to employ in a specific design depends on factors such as functionality, chip area, power consumption and the complexity of the control logic. Up to a certain limit, pipelining provides significant performance gains with little increase in chip area. It also reduces glitching in the circuit. Throughput beyond that achievable by pipelining can be attained by parallel architectures, for which the throughput scales almost linearly with chip area. Compared to the parallel pre-compute correlator, the parallel pre-compute correlator with pipelining reduces the delay by 20.77%.

In future, recent optimization techniques such as particle swarm optimization and the cuckoo search algorithm can be adopted to further improve the performance of the correlators.

REFERENCES

Aifeng Ren, Qinye Yin, FPGA Implementation of a W-CDMA System Based on IP Functions, WSEAS Int. Conf. on Dynamical Systems and Control, Venice, Italy, 2005, 320-324.


