### DIGITAL SIGNAL PROCESSING

### INTELLECTUAL PROPERTIES

of

**Rice Electronics** 

and

## APPLICATION TO 6<sup>th</sup> GENERATION (6G) NETWORKS

Filename: Processing IP – 6G

© 2023 Greg Rice

### **OVERVIEW**

Rice Electronics is developing Intellectual Property (IP) for 6G network digital processing. The IP is discussed below under three headings:

- Architecture IP
- Modelling IP
- Numeric Building Block IP

The Company's IP enables unique processing Architectures for critical 6G functions such as:

- Beamforming
- Modulation / Demodulation
- Waveform Generation

The Architectures may be specialized, optimized and scaled to suit system requirements. This is done via structured building block approaches.

The Architectures can incorporate:

- Proprietary Numeric Building Blocks, such as the "Trigonometric Multipliers" (TM) discussed herein
- Multi-tier structures composed of such Blocks

These Architectures enable high-performance, low-complexity hardware implementations. They can achieve orders-of-magnitude improvements (in terms of complexity, cost and performance) over existing hardware entities such as:

- FPGA or ASIC Form of Specialized/Dedicated Processors
- General Purpose Programmable Processors
- Digital Signal Processors or Graphics Processing Units (DSPs or GPUs)

### **ARCHITECTURE IP**

The IP enables various specialized Architectures (e.g. FFTs, filters, phase shifters). With the IP approach described herein, IP Building Blocks can be used to construct Architectures of various complexities. These Architectures can be tailored for such critical 6G areas as:

- Advanced Modulation Techniques (e.g. OFDM)
- Beamforming
- Waveform Generation
- Passive Sensing

Currently, interested parties may contact the Company directly concerning the Modelling IP. The remainder of this document covers:

1. The Company's structured, hierarchical design approaches
2. Numeric Building Blocks (IP) supporting such approaches
3. A large FFT Architecture (IP) designed by the Company using said Blocks
4. Application of the IP for various 6G needs

### <u>OF NOTE:</u>

The Rice IP Building Blocks avoid much of the multiplication associated with DSP. This ultimately leads to real-time implementations supporting bandwidths of 100s of MHz, with extremely efficient circuits. The basis of this unique IP is distinctly different from CORDIC, residue, or other types of arithmetic systems.

#### <mark>Also,</mark>

The large FFT described below (or pieces thereof) reflects the fundamental operations required for OFDM, DFT-s-OFDMA (SC-FDMA) and variations thereof, as used in 5G networks and currently studied for future 6G applications.
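As a rough illustration of why the transform is central to OFDM, the sketch below modulates a block of subcarrier symbols with an inverse DFT and recovers them with a forward DFT. It is a minimal sketch under stated assumptions: the 8-subcarrier block, the QPSK symbol values, the ideal noiseless channel and the `dft` helper are all hypothetical, and the naive DFT merely stands in for whatever FFT hardware a real system would use. It does not represent the Rice Architecture.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) DFT (hypothetical helper); a real system would use an FFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        out.append(acc / n if inverse else acc)
    return out

# Assumed example data: 8 subcarriers, each carrying one QPSK symbol.
qpsk = [1+1j, 1-1j, -1+1j, -1-1j, 1+1j, -1-1j, 1-1j, -1+1j]

tx_time = dft(qpsk, inverse=True)   # OFDM modulation: subcarriers -> time samples
rx_freq = dft(tx_time)              # OFDM demodulation: time samples -> subcarriers

# Round away floating-point error to recover the transmitted symbols.
recovered = [complex(round(v.real), round(v.imag)) for v in rx_freq]
```

With an ideal channel, `recovered` equals the transmitted symbols. Practical systems add a cyclic prefix and channel equalization, which are omitted here.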

### **MODELLING IP**

# CONTACT COMPANY FOR FURTHER INFORMATION REGARDING MODELLING IP


### HIERARCHICAL STRUCTURES

The IP simplifies construction of high-performance processors through hierarchical building block approaches, as exemplified below.

Advanced Numeric Building Blocks are used to minimize physical structure. For example, proprietary "Trigonometric Multipliers" (TMs) can replace conventional multipliers (using 100s of gates, instead of 1000s) in construction of FFTs.

In the hierarchical approach, TMs enable highly efficient "small" DFTs, which in turn create optimal larger Architectures. That is, the TMs serve as "Building Blocks" for the larger scale "Architecture IP", as exemplified in Figure 1 below.



### HIERARCHICAL ARCHITECTURE APPROACH

### FIGURE 1

The hierarchical Architectures greatly reduce logic complexity, and simplify data movement and memory organization. Also, computational complexity is far less than in traditional implementations of challenging processing functions, with most multiplication operations being eliminated. As a point of comparison, for large Fourier transform functions:

Rice Architecture IP $\approx 2N\sqrt{N}$ addition operations for N-point real-valued FFT

Traditional approach $\approx (N/2)\log_2 N$ <u>complex multiplies</u> for N-point complex FFT
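To make the comparison concrete, the sketch below evaluates both expressions for the 4096-point case discussed later in this document. The function names are hypothetical; the formulas are the two approximations quoted above, with $(N/2)\log_2 N$ being the standard radix-2 multiply count.

```python
import math

def rice_additions(n):
    """Approximate addition count quoted for an N-point real-valued FFT: 2*N*sqrt(N)."""
    return 2 * n * math.isqrt(n)   # exact when N is a perfect square

def traditional_complex_multiplies(n):
    """Approximate complex-multiply count for an N-point complex FFT: (N/2)*log2(N)."""
    return (n // 2) * int(math.log2(n))

n = 4096
adds = rice_additions(n)                    # 2 * 4096 * 64
mults = traditional_complex_multiplies(n)   # 2048 * 12

print(f"N={n}: {adds} additions vs {mults} complex multiplies")
```

The two counts measure different operations, so they are not directly comparable as raw numbers. The trade-off depends on the relative hardware cost of an adder and a multiplier, which this document places at hundreds versus thousands of gates.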

Such Architectural hierarchy will be captured within the Modelling IP. Both structural and computational effects of the approach can be evaluated therein.

### TRIGONOMETRIC MULTIPLIERS (TM) IP

The Proprietary TM replaces traditional multiplier building blocks. The TM (requiring hundreds of gates versus thousands for conventional multipliers) drastically reduces logic complexity of many DSP Architectures.

The TM is specialized for arithmetic calculations associated with many transform operations. The basic TM calculation is illustrated in Figure 2 below.

### TRIGONOMETRIC MULTIPLIER CALCULATION

### FIGURE 2

The TMs may be implemented in hardware using only addition operators (i.e., without any conventional multiplication circuitry).

The TMs optimize implementation of transform functions such as DCTs and small, stand-alone DFTs. They also serve as highly efficient digital phase shifters and signal synthesizers. The discussion below presents an Architecture for large FFT operations, wherein the TM is a building block.

The Rice IP includes variations of the TM which may be instantiated in the Architectures to meet system requirements at minimal complexity. The Modelling IP can be used to support such hardware optimization.

The TM can generate trigonometric values internally, thus mitigating the need for external storage of such values in ROM or other memory.

The TM resembles a "modular" computational Building Block. It presents a straightforward interface as shown in Figure 3 below.



### TRIGONOMETRIC MULTIPLIER INTERFACES

### FIGURE 3

Its simple structure lends the TM to high clock rates. Clock periods are typically limited by the propagation delay through one or two adders and two multiplexers, as exemplified in Figure 4.



### TRIGONOMETRIC MULTIPLIER DELAY PATH - EXAMPLE

### FIGURE 4

### LARGE ARCHITECTURES (FFT EXAMPLE)

Structurally, the TMs:

- Are characterized by simple interface and operation
- Consist of basic library elements (e.g. adders, multiplexers, registers)
- Can trade complexity for numeric precision (fixed point, 8 to 24 bits)
- Are synchronous in operation (requiring one or two clock cycles to produce most products within the Architecture structure)

The TM may be used to build efficient, small transforms (e.g., an N-point DFT performed with $N^2/2$ to $N^2$ addition operations, and no multiplies). Within an Architecture, such DFTs can implement semi-independent "processing Sections". These Sections can then be used to construct large FFT Architecture(s). More (or fewer) Sections can be employed in parallel to scale the Architecture complexity and the related performance (execution times).

Large FFTs are critical for communications devices and infrastructure. In many OFDM schemes, FFT sizes might approach 4K (4096) points to support advanced signal modulation. The Architecture herein resolves fundamental problems in implementation of such FFTs.

Figures 5 and 6 below relate to implementation of the Architecture, with emphasis on a 4K-point FFT. The Figures reflect lower and higher complexity implementations, respectively. As shown, the Architecture scales such that execution time (column 3) is inversely proportional to complexity (column 2). The Architecture can ultimately be scaled such that execution times approach 4096 clock cycles (or $\approx$ 8 µs execution time at a 500 MHz clock speed).
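The conversion from clock cycles to execution time is simple arithmetic, sketched below for the cycle counts cited in this section. The function name is hypothetical; the cycle counts and the 500 MHz clock are the values quoted in the text and in Figures 5 and 6.

```python
def execution_time_us(clock_cycles, clock_hz):
    """Execution time in microseconds for a given cycle count and clock rate."""
    return clock_cycles / clock_hz * 1e6

CLOCK_HZ = 500e6  # 500 MHz clock quoted in the text

fully_scaled_us = execution_time_us(4096, CLOCK_HZ)       # scaled limit quoted above
low_complexity_us = execution_time_us(2**17, CLOCK_HZ)    # Figure 5 cycle count
high_complexity_us = execution_time_us(2**15, CLOCK_HZ)   # Figure 6 cycle count

print(f"{fully_scaled_us:.3f} us, {low_complexity_us:.3f} us, {high_complexity_us:.3f} us")
```

At this clock rate the three cases come to roughly 8.2 µs, 262 µs and 66 µs respectively.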

The use of modular Sections resolves certain issues inherent in high-performance transform design, as related to memory structure and data movement. These are alleviated by the Sections' use of small, self-sufficient local memories (referenced in column 2 of the Figures). Also of note, the size of "Global" memory (column 4) remains constant as the Architecture is scaled.

| Data Resolution         | Approximate Complexity  | Clock Cycles     | "Global" Memory: Size / Accesses |
|-------------------------|-------------------------|------------------|----------------------------------|
| 16 bit input data f(n)  | 2,000 gates             | $\approx 2^{17}$ | 4096 words RAM /                 |
| 20 bit output data F(j) | +1024 bytes "local" RAM | (131,072)        | $2^{17}$ accesses                |
|                         | +256 bytes "local" ROM  |                  |                                  |
| 12 bit input data f(n)  | 1,500 gates             | $\approx 2^{17}$ | 4096 words RAM /                 |
| 16 bit output data F(j) | +1024 bytes "local" RAM | (131,072)        | $2^{17}$ accesses                |
|                         | +256 bytes "local" ROM  |                  |                                  |

### Low-Complexity 4096 Point Real-Valued Transform

### FIGURE 5

| Data Resolution         | Approximate Complexity  | Clock Cycles     | "Global" Memory: Size / Accesses |
|-------------------------|-------------------------|------------------|----------------------------------|
| 16 bit input data f(n)  | 8,000 gates             | $\approx 2^{15}$ | 4096 words RAM /                 |
| 20 bit output data F(j) | +4096 bytes "local" RAM | (32,768)         | $2^{15}$ accesses                |
|                         | +1024 bytes "local" ROM |                  |                                  |
| 12 bit input data f(n)  | 6,000 gates             | $\approx 2^{15}$ | 4096 words RAM /                 |
| 16 bit output data F(j) | +4096 bytes "local" RAM | (32,768)         | $2^{15}$ accesses                |
|                         | +1024 bytes "local" ROM |                  |                                  |

### High-Complexity 4096 Point Real-Valued Transform

### FIGURE 6

The execution times seen in Figures 5 and 6 can be further reduced by modifying the Section structure itself. This accelerates processing at the expense of greater complexity. For example, increased complexity of  $\approx$  50% essentially halves execution time. This is achieved by increased complexity of individual Sections, as opposed to a greater number of Sections in parallel.

Still, the FFT execution time (column 3) scales with the number of parallel Sections employed in the Architecture. Execution times remain inversely proportional to overall complexity (column 2). The Architecture can be scaled such that execution times decrease to nearly 2048 clock cycles (or $\approx$ 4 µs execution time at a 500 MHz clock speed). This scaling relationship is expressed as follows:

 $T_E \propto 1/S$ 

Where,

T<sub>E</sub> = Execution Time

S = Number of Sections Employed

Advantages of shorter execution time are greater throughput (or bandwidth), and decreased latency of data vectors "streaming" through the transform. Decreased latency can be a critical system-level consideration, especially in 6G systems.
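The inverse relation between execution time and complexity can be checked directly against the values listed in Figures 5 and 6. The sketch below assumes, as a simplification, that gate count is a proxy for the number of parallel Sections; the dictionary layout and function name are hypothetical.

```python
# (approximate gate count, clock cycles) taken from Figures 5 and 6
FIGURE_5 = {"16-bit": (2_000, 2**17), "12-bit": (1_500, 2**17)}
FIGURE_6 = {"16-bit": (8_000, 2**15), "12-bit": (6_000, 2**15)}

def scaling_check(low, high):
    """Return (complexity ratio, speed-up) between two configurations."""
    low_gates, low_cycles = low
    high_gates, high_cycles = high
    return high_gates / low_gates, low_cycles / high_cycles

# Compare the low- and high-complexity variants at each data resolution.
results = {name: scaling_check(FIGURE_5[name], FIGURE_6[name]) for name in FIGURE_5}

for name, (complexity_ratio, speed_up) in results.items():
    print(f"{name}: {complexity_ratio:.1f}x gates -> {speed_up:.1f}x fewer cycles")
```

For both data resolutions a fourfold increase in gate count corresponds to a fourfold reduction in clock cycles, so the product of gates and cycles is unchanged. That is the behaviour expected if complexity grows linearly with the number of Sections.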

Figures 7 and 8 below reflect the impact of Section modifications upon the transforms described by Figures 5 and 6, respectively. Differences in the modified (accelerated) transforms include:

- Reduction in clock cycles by factor of 2 (column 3 of Figures 7 and 8)
- Increased complexity of  $\approx 50\%$  (column 2 of Figures 7 and 8)
- Reduction by  $\approx 2$  bits of "best data resolution" (column 1 of the Figures)
- "Global" Memory size unchanged, but requiring faster access via "split memory" organization (allowing simultaneous dual access to 2 Memory halves). "Local RAM" memory requiring either faster access (cycle) times, or increased complexity.

| Data Resolution         | Approximate Complexity  | Clock Cycles     | "Global" Memory: Size / Accesses |
|-------------------------|-------------------------|------------------|----------------------------------|
| 14 bit input data f(n)  | 3,000 gates             | $\approx 2^{16}$ | 4096 words RAM /                 |
| 18 bit output data F(j) | +1024 bytes "local" RAM | (65,536)         | $2^{17}$ accesses                |
|                         | +256 bytes "local" ROM  |                  |                                  |
| 12 bit input data f(n)  | 2,300 gates             | $\approx 2^{16}$ | 4096 words RAM /                 |
| 16 bit output data F(j) | +1024 bytes "local" RAM | (65,536)         | $2^{17}$ accesses                |
|                         | +256 bytes "local" ROM  |                  |                                  |

### 2X Speed, Low-Complexity 4K Real-Valued Transform

### FIGURE 7

| Data Resolution         | Approximate Complexity  | Clock Cycles     | "Global" Memory: Size / Accesses |
|-------------------------|-------------------------|------------------|----------------------------------|
| 14 bit input data f(n)  | 12,000 gates            | $\approx 2^{14}$ | 4096 words RAM /                 |
| 18 bit output data F(j) | +4096 bytes "local" RAM | (16,384)         | $2^{15}$ accesses                |
|                         | +1024 bytes "local" ROM |                  |                                  |
| 12 bit input data f(n)  | 9,000 gates             | $\approx 2^{14}$ | 4096 words RAM /                 |
| 16 bit output data F(j) | +4096 bytes "local" RAM | (16,384)         | $2^{15}$ accesses                |
|                         | +1024 bytes "local" ROM |                  |                                  |

### 2X Speed, High-Complexity 4K Real-Valued Transform

### FIGURE 8

### SPATIAL DOMAIN APPLICATIONS - BEAMFORMING

Beamforming will be a cornerstone of future 6G systems and platforms. Such beamforming will be dependent upon processing of multiple signal paths from large antenna arrays. Many approaches under consideration involve analog processing of these parallel signal paths. Digital processing is theoretically possible, but is often dismissed due to the circuit complexity and power consumption required at 6G bandwidths.

However, 6G beamforming requires precise phase matching across numerous signal paths. This is technically difficult and costly to achieve with analog processing. Digital processing would alleviate such issues.

Accordingly, different levels of the IP described above can be employed for phase-based beamforming (e.g., as employed in interferometers and phased arrays). At a basic level, a linear array of TMs can perform simple beamforming. At a higher level, DFT blocks may resolve numerous simultaneous beams. With parallel implementation of such DFTs, bandwidths of 100s of MHz may be supported.
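The idea that a DFT across the antenna elements resolves simultaneous beams can be illustrated with a generic textbook model. The sketch below is a minimal illustration under stated assumptions: an 8-element uniform linear array with half-wavelength spacing, a single narrowband plane wave arriving 30 degrees off broadside, and a naive DFT. All names and parameters are hypothetical, and the code does not model the TM hardware or any Rice Architecture.

```python
import cmath
import math

N_ANT = 8  # assumed array size (hypothetical)

def array_snapshot(theta_deg, n_ant=N_ANT):
    """One sample per antenna for a unit plane wave; half-wavelength spacing assumed."""
    phase_step = math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * phase_step * n) for n in range(n_ant)]

def dft_beams(snapshot):
    """Spatial DFT: output k applies a linear phase ramp that steers beam k."""
    n_ant = len(snapshot)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * n / n_ant) for n, x in enumerate(snapshot))
        for k in range(n_ant)
    ]

snapshot = array_snapshot(30.0)
beam_power = [abs(b) ** 2 for b in dft_beams(snapshot)]

# With half-wavelength spacing, beam k points where sin(theta) = 2k / N_ANT.
strongest_beam = max(range(N_ANT), key=lambda k: beam_power[k])
```

Here beam 2 corresponds to sin(theta) = 0.5, so a wave from 30 degrees lands in that output while the other outputs are close to zero. Each term of the sum is a pure phase shift of one antenna signal, which is the kind of operation this document assigns to the TM.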

Therefore, various IP implementations can perform parallel processing of wideband signals derived from antenna arrays of 6G platforms. This can be done with 10X less complexity relative to conventional digital technologies. Cost, size and power are all reduced. Further, the approaches can be scaled for system requirements such as:

- Number of antennas
- Signal bandwidths
- Accuracies (number of bits)
- Angular precision

The Modelling IP can allow trade-offs of the above parameters to be simulated and evaluated. This can greatly accelerate system development. Modelling IP can also simulate certain system delays and response times. This can be done in the context of various degrees of parallelism, providing additional key trade-off evaluations.

### PASSIVE SENSING APPLICATIONS

"Sensing" has been suggested as a future capability of 6G platforms. The IP described herein can also be applied in this area. However, this document does not postulate "active RF sensing", especially as concerns small platforms. Technical requirements with regard to power, stabilization and synchronization may be overwhelming in such an environment.

At the same time, various forms of "<u>passive</u> RF sensing" may be viable, and the IP discussed above would be instrumental in such regard.

Another type of "passive sensing" can involve imaging sensors (e.g. cameras).

The Architectures, techniques and methods above can also be applied to Image Processing on 6G platforms that are equipped with imaging sensor(s).

For example, the large FFTs may be used to efficiently implement 2-D convolutions in the frequency domain. This is fundamental to many image processing tasks, ranging from basic filtering to object detection. Also, smaller structures such as DCTs are essential to image compression.
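The frequency-domain route to 2-D convolution rests on the convolution theorem: transform the image and the kernel, multiply point by point, and transform back. The sketch below checks this on a small example. It is a minimal illustration with assumed data (a 4x4 image and a zero-padded 2x2 box kernel) and hypothetical helper names. It uses a naive DFT and circular (wrap-around) convolution, and does not represent the Rice FFT Architecture.

```python
import cmath

def dft1(x, inverse=False):
    """Naive 1-D DFT (hypothetical helper); the inverse includes the 1/N factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def dft2(a, inverse=False):
    """2-D DFT by transforming every row, then every column."""
    rows = [dft1(r, inverse) for r in a]
    cols = [dft1(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def circular_conv2_direct(img, ker):
    """Reference result: circular 2-D convolution computed in the spatial domain."""
    h, w = len(img), len(img[0])
    return [[sum(img[(i - p) % h][(j - q) % w] * ker[p][q]
                 for p in range(h) for q in range(w))
             for j in range(w)] for i in range(h)]

def circular_conv2_dft(img, ker):
    """Same convolution via pointwise multiplication in the frequency domain."""
    fi, fk = dft2(img), dft2(ker)
    prod = [[fi[i][j] * fk[i][j] for j in range(len(img[0]))] for i in range(len(img))]
    return [[v.real for v in row] for row in dft2(prod, inverse=True)]

image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]  # 2x2 box, zero-padded

direct = circular_conv2_direct(image, kernel)
via_dft = circular_conv2_dft(image, kernel)
max_error = max(abs(d - f) for dr, fr in zip(direct, via_dft) for d, f in zip(dr, fr))
```

The two results agree to within floating-point error. For linear (non-wrapping) convolution, both arrays would first be zero-padded to at least the sum of their sizes minus one in each dimension.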

With respect to object detection and recognition, "neural nets" are also an area of potential utility for the IP technology. The IP relevance is driven by the evolution of aspects of neural networks toward frequency domain implementations.

Rice Electronics continues to develop diversified Numeric Building Blocks, as may be applicable to Image Processing and other 6G challenges.

### SUMMARY

Rice Electronics has created Intellectual Property (IP) for specialized digital processing Architectures. The Architectures employ novel Building Blocks in a hierarchical approach, facilitating rapid hardware development and unique complexity/performance trade-offs. The Company's IP can eliminate most conventional multiplication from high-performance Architectures. This allows hardware implementations based primarily upon simple addition operations. The IP paradigms are independent of circuit technologies.

The Architectures may be specialized and scaled, to optimize for tasks such as transforms, convolutions and correlations.

The Company's IP addresses many fundamental challenges of 6G Network development. This includes extreme processing throughput required to support the wide-bandwidth signal paths of 6G antenna arrays. The IP also addresses size, cost and power constraints of small platforms.

The IP enables circuits comparable in performance to conventional solutions, at a small fraction of the complexity. For many aspects of 6G, the IP is far superior to DSP cores, FPGAs, and other logic entities.

As discussed herein, the Company's digital processing IP base includes:

- Architecture IP
- Numeric Building Block IP
- Modelling IP

### NOTES:

This document contains preliminary information.

Some Intellectual Properties referenced in this document may have patents pending.

Contact: Greg Rice

ricetronics@gmail.com

### ABBREVIATIONS

- ASIC Application Specific Integrated Circuit
- DCT Discrete Cosine Transform
- DFT Discrete Fourier Transform
- DSP Digital Signal Processing (or Processor)
- FDMA Frequency Division Multiple Access
- FFT Fast Fourier Transform
- 6G Sixth Generation
- GPU Graphics Processing Unit
- IP Intellectual Property
- FPGA Field Programmable Gate Array
- OFDM Orthogonal Frequency Division Multiplexing
- SC-FDMA Single Carrier Frequency Division Multiple Access
- SoC System-on-Chip
