# Amplifier-Enhanced Memristive Massive MIMO Linear Detector Circuit: An Ultra-Energy-Efficient and Robust-to-Conductance-Error Design

Jia-Hui Bi, Shaoshi Yang, Senior Member, IEEE, Ping Zhang, Fellow, IEEE, Sheng Chen, Life Fellow, IEEE

Abstract—The emerging analog matrix computing technology based on memristive crossbar array (MCA) constitutes a revolutionary new computational paradigm applicable to a wide range of domains. Despite the proven applicability of MCA for massive multiple-input multiple-output (MIMO) detection, existing schemes do not take into account the unique characteristics of massive MIMO channel matrix. This oversight makes their computational accuracy highly sensitive to conductance errors of memristive devices, which is unacceptable for massive MIMO receivers. In this paper, we propose an MCA-based circuit design for massive MIMO zero forcing and minimum mean-square error detectors. Unlike the existing MCA-based detectors, we decompose the channel matrix into the product of small-scale and large-scale fading coefficient matrices, thus employing an MCA-based matrix computing module and amplifier circuits to process the two matrices separately. We present two conductance mapping schemes which are crucial but have been overlooked in all prior studies on MCA-based detector circuits. The proposed detector circuit exhibits significantly superior performance to the conventional MCA-based detector circuit, while only incurring negligible additional power consumption. Our proposed detector circuit maintains its advantage in energy efficiency over traditional digital approach by tens to hundreds of times.

*Index Terms*—Massive MIMO, multi-user detection, receiver design, analog matrix computing, memristive crossbar array, in-memory computing.

# I. Introduction

Massive multiple-input multiple-output (MIMO) technology, whose core idea is to equip base stations (BSs) with a very large number of antennas to support multiuser transmissions, can significantly improve the network capacity and spectrum efficiency, and it has become a cornerstone technology for contemporary and future wireless communication systems. However, utilizing a large number of antennas results in high complexity of detection algorithms, posing a notable challenge to the realization of next-generation massive MIMO receivers that are expected to simultaneously achieve high performance, ultra-low latency and low energy consumption. Many detection algorithms have been proposed to reduce detection latency and energy consumption [1]. However, these low-complexity algorithms usually suffer from considerable performance loss, and therefore they do not achieve a good trade-off between high performance and low latency/low energy consumption. Another effective approach involves accelerating MIMO detection at the hardware level. However, with the gradual demise of Moore's law, enhancing the computational performance of traditional processors based on the complementary metal oxide semiconductor (CMOS) process is becoming increasingly challenging, which makes it difficult for CMOS-based digital processors to keep up with the demands of next-generation massive MIMO receivers.

J.-H. Bi, S. Yang and P. Zhang are with the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China (e-mails: {bijiahui, shaoshi.yang, pzhang}@bupt.edu.cn). *Corresponding author: S. Yang.*

Alternatively, emerging memristive devices can be integrated into crossbar arrays for analog matrix computing. Combined with operational amplifiers (OAs), the memristive crossbar array (MCA) enables high-dimensional matrix operations in an extremely short time, including matrix-vector multiplication (MVM) [2], inverse matrix computation [3] and pseudoinverse matrix computation [4]. As a form of in-memory computing, the analog matrix computing technology offers significant advantages in computational speed and energy efficiency compared to traditional digital approaches. Given that massive MIMO detectors primarily involve high-dimensional matrix operations, the MCA enables the realization of next-generation massive MIMO receivers with high performance, ultra-low latency and low energy consumption.

Although MCA has been applied successfully in realms such as deep neural networks, machine learning and image processing, its application in massive MIMO detection is still at a nascent stage. The work in [5] applied MCA to accelerate the MVM operations in the discrete Fourier transform and MIMO detection. However, this scheme relied on another processor to compute inverse matrices and so was a palliative approach. In [6], an MCA-based zero forcing (ZF) precoder circuit was proposed, whose core concept can be applied to develop an MCA-based ZF detector. In [7] and [8], two MCA-based detector circuits with similar structures were proposed, and both circuits can be used for the computation of linear detection algorithms, including the ZF, regularized ZF and minimum mean-square error (MMSE) algorithms. However, the works [6]–[8] did not consider the disparity in large-scale fading coefficients (LSFCs) associated with user terminals (UTs) distributed in different locations of a massive MIMO network. This disparity causes the elements of the matrices computed in MCA-based circuits to follow probability distributions with distinct variances and gives these matrices large condition numbers, which makes the detection performance susceptible to conductance errors.

S. Chen is with the School of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, U.K. (e-mail: sqc@ecs.soton.ac.uk).

To solve this problem, in this paper we propose a novel MCA-based circuit design for massive MIMO linear detectors. Our main contributions are summarized as follows.

- We propose an MCA-based circuit design for massive MIMO ZF and MMSE detectors. Differing from the existing MCA-based detector circuits, our circuit design decomposes the channel matrix into the product of the LSFC matrix and the small-scale fading coefficient (SSFC) matrix, and deals with them separately.
- We present two conductance mapping schemes for the MCA-based detector circuits, based on statistical channel state information (CSI) and instantaneous CSI, respectively. The conductance mapping scheme is crucial to MCA-based detector circuits but has been overlooked in all prior studies.
- We investigate the impacts of the mapping scheme and conductance errors on detection performance and demonstrate the performance advantage of the proposed detector circuit over the conventional MCA-based detector circuit. We also demonstrate the significant advantage of the proposed circuit over the traditional digital approach in terms of energy efficiency.

## II. SYSTEM MODEL AND BASIC ALGORITHMS

# A. System Model

We consider a massive MIMO system, in which the BS is equipped with $R$ antennas to support $K$ single-antenna UTs, with $R > K$. The uplink received signals are given by:

$$\tilde{\mathbf{y}} = \tilde{\mathbf{H}}\tilde{\mathbf{s}} + \tilde{\mathbf{n}},\tag{1}$$

where  $\tilde{\mathbf{y}} \in \mathbb{C}^{R \times 1}$  is the received signal vector,  $\tilde{\mathbf{s}} \in \mathbb{C}^{K \times 1}$  is the transmitted signal vector sent by the UTs,  $\tilde{\mathbf{H}} \in \mathbb{C}^{R \times K}$  is the channel matrix, and  $\tilde{\mathbf{n}} \in \mathbb{C}^{R \times 1}$  is a complex additive white Gaussian noise (AWGN) vector with variance  $\sigma_n^2$  per element, i.e.,  $\tilde{\mathbf{n}} \sim \mathcal{CN}(\mathbf{0}, \sigma_n^2 \mathbf{I})$  with  $\mathbf{0}$  and  $\mathbf{I}$  denoting the zero vector and the identity matrix of appropriate dimensions, respectively.

Let  $\lambda_1, \dots, \lambda_K$  be the LSFCs between the $K$ UTs and the BS. The channel matrix  $\tilde{\mathbf{H}}$  can be expressed as:

$$\tilde{\mathbf{H}} = \tilde{\mathbf{G}}\tilde{\boldsymbol{\Lambda}},\tag{2}$$

where the diagonal matrix  $\tilde{\boldsymbol{\Lambda}} = \mathrm{diag} \left( \sqrt{\lambda_1}, \cdots, \sqrt{\lambda_K} \right)$  represents the LSFC matrix and  $\tilde{\mathbf{G}} \in \mathbb{C}^{R \times K}$  is the SSFC matrix. We consider the typical Rayleigh fading channel model, which means that the elements of  $\tilde{\mathbf{G}}$  follow the zero-mean Gaussian distribution with variance  $\sigma_g^2$  per dimension, namely,

$$\tilde{g}_{i,j} \sim \mathcal{CN}(0, 2\sigma_g^2), \quad 1 \le i \le R, \ 1 \le j \le K. \tag{3}$$

The complex-valued system model of (1) can be expressed in an equivalent real-valued system model of

$$\mathbf{y} = \mathbf{H}\mathbf{s} + \mathbf{n},\tag{4}$$

where

$$\mathbf{y} = \begin{bmatrix} \Re(\tilde{\mathbf{y}}) \\ \Im(\tilde{\mathbf{y}}) \end{bmatrix}, \ \mathbf{s} = \begin{bmatrix} \Re(\tilde{\mathbf{s}}) \\ \Im(\tilde{\mathbf{s}}) \end{bmatrix}, \ \mathbf{n} = \begin{bmatrix} \Re(\tilde{\mathbf{n}}) \\ \Im(\tilde{\mathbf{n}}) \end{bmatrix},$$

$$\mathbf{H} = \begin{bmatrix} \Re(\tilde{\mathbf{H}}) & -\Im(\tilde{\mathbf{H}}) \\ \Im(\tilde{\mathbf{H}}) & \Re(\tilde{\mathbf{H}}) \end{bmatrix},$$

in which  $\Re(\cdot)$  and  $\Im(\cdot)$  denote the real and imaginary parts of the corresponding arguments, respectively. In particular, the real-valued channel matrix  $\mathbf{H} \in \mathbb{R}^{2R \times 2K}$  is given by

$$\mathbf{H} = \mathbf{G}\boldsymbol{\Lambda},\tag{5}$$

where  $\boldsymbol{\Lambda} = \mathrm{diag}(\sqrt{\lambda_1}, \cdots, \sqrt{\lambda_K}, \sqrt{\lambda_1}, \cdots, \sqrt{\lambda_K})$  and

$$\mathbf{G} = \left[ \begin{array}{cc} \Re(\tilde{\mathbf{G}}) & -\Im(\tilde{\mathbf{G}}) \\ \Im(\tilde{\mathbf{G}}) & \Re(\tilde{\mathbf{G}}) \end{array} \right].$$

The task of a massive MIMO detector is to estimate $\mathbf{s}$ from $\mathbf{y}$ given the CSI, which is assumed to be perfectly known in this paper.
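As a minimal numerical sketch of the model above, the following NumPy snippet builds (1)–(5) and checks that the real-valued model reproduces the complex-valued one. The values of `sigma_g`, `sigma_n` and `lam`, the QPSK-like symbols, and the helper name `real_valued_model` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def real_valued_model(H_c, s_c, n_c):
    """Stack real and imaginary parts as in (4); returns (H, s, n)."""
    H = np.block([[H_c.real, -H_c.imag],
                  [H_c.imag,  H_c.real]])
    s = np.concatenate([s_c.real, s_c.imag])
    n = np.concatenate([n_c.real, n_c.imag])
    return H, s, n

rng = np.random.default_rng(0)
R, K = 64, 4                              # antenna and UT counts used in Section V
sigma_g, sigma_n = 1.0, 0.1               # assumed values, for illustration only
lam = np.array([1.0, 0.5, 0.1, 0.01])     # assumed LSFCs, for illustration only

# Rayleigh SSFC matrix with variance sigma_g^2 per real dimension, as in (3)
G_c = sigma_g * (rng.standard_normal((R, K)) + 1j * rng.standard_normal((R, K)))
H_c = G_c @ np.diag(np.sqrt(lam))         # (2)

# QPSK-like symbols (assumed) and complex AWGN with total variance sigma_n^2
s_c = rng.choice([-1.0, 1.0], K) + 1j * rng.choice([-1.0, 1.0], K)
n_c = sigma_n / np.sqrt(2) * (rng.standard_normal(R) + 1j * rng.standard_normal(R))

y_c = H_c @ s_c + n_c                     # (1)
H, s, n = real_valued_model(H_c, s_c, n_c)
y = H @ s + n                             # (4)

# Real-valued SSFC and LSFC matrices of (5)
G, _, _ = real_valued_model(G_c, s_c, n_c)
Lam = np.diag(np.sqrt(np.concatenate([lam, lam])))

print(np.allclose(y, np.concatenate([y_c.real, y_c.imag])))  # real model matches complex model
print(np.allclose(H, G @ Lam))                               # factorization (5) holds
```

Both checks should print `True`, since the stacking in (4) is an exact rewriting of (1) and $\boldsymbol{\Lambda}$ is real-valued.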

#### B. Basic Detection Algorithms

We consider the following two basic linear detection algorithms.

1) ZF Algorithm: The ZF algorithm estimates signals by:

$$\hat{\mathbf{s}}_{\mathrm{ZF}} = (\mathbf{H}^{\mathrm{T}}\mathbf{H})^{-1}\mathbf{H}^{\mathrm{T}}\mathbf{y},\tag{6}$$

where  $(\cdot)^{\mathrm{T}}$  represents the transpose operator and  $(\cdot)^{-1}$  represents the inverse operator. Upon substituting (5) into (6) we obtain:

$$\hat{\mathbf{s}}_{\mathrm{ZF}} = \mathbf{\Lambda}^{-1} (\mathbf{G}^{\mathrm{T}} \mathbf{G})^{-1} \mathbf{G}^{\mathrm{T}} \mathbf{y}. \tag{7}$$

2) MMSE Algorithm: The MMSE algorithm estimates signals by:

$$\hat{\mathbf{s}}_{\text{MMSE}} = \left(\mathbf{H}^{\text{T}}\mathbf{H} + \rho \mathbf{I}\right)^{-1}\mathbf{H}^{\text{T}}\mathbf{y},\tag{8}$$

where  $\rho=\frac{\sigma_n^2}{p_s}$  and  $p_s$  is the average symbol energy of the transmitted signals. Upon substituting (5) into (8) we obtain:

$$\hat{\mathbf{s}}_{\text{MMSE}} = \mathbf{\Lambda}^{-1} (\mathbf{G}^{T} \mathbf{G} + \mathbf{P})^{-1} \mathbf{G}^{T} \mathbf{y}, \tag{9}$$

where  $\mathbf{P} = \operatorname{diag}\left(\frac{\rho}{\lambda_1}, \frac{\rho}{\lambda_2}, \cdots, \frac{\rho}{\lambda_K}, \frac{\rho}{\lambda_1}, \frac{\rho}{\lambda_2}, \cdots, \frac{\rho}{\lambda_K}\right)$.
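The equivalence between the direct forms (6), (8) and the factored forms (7), (9) can be checked numerically. The sketch below uses a random stand-in for $\mathbf{G}$ (without the block structure of (5), which the identity does not require) and assumed values for the LSFCs and $\rho$.

```python
import numpy as np

rng = np.random.default_rng(1)
R, K = 64, 4
lam = np.array([1.0, 0.3, 0.05, 0.01])    # assumed LSFCs, for illustration only
rho = 0.1                                  # assumed value of sigma_n^2 / p_s

G = rng.standard_normal((2 * R, 2 * K))    # stand-in for the real-valued SSFC matrix
lam2 = np.concatenate([lam, lam])          # diagonal of Lambda squared, as in (5)
H = G @ np.diag(np.sqrt(lam2))             # (5)
y = rng.standard_normal(2 * R)             # arbitrary received vector

# Direct forms (6) and (8)
s_zf = np.linalg.solve(H.T @ H, H.T @ y)
s_mmse = np.linalg.solve(H.T @ H + rho * np.eye(2 * K), H.T @ y)

# Factored forms (7) and (9): invert only G-dependent matrices, then scale by Lambda^{-1}
P = np.diag(rho / lam2)
s_zf_f = np.linalg.solve(G.T @ G, G.T @ y) / np.sqrt(lam2)
s_mmse_f = np.linalg.solve(G.T @ G + P, G.T @ y) / np.sqrt(lam2)

print(np.allclose(s_zf, s_zf_f), np.allclose(s_mmse, s_mmse_f))
```

The factored forms move the LSFC disparity out of the matrix to be inverted, which is the property the circuit design exploits.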

# III. PROPOSED MCA-BASED CIRCUIT DESIGN

The proposed detector circuit, illustrated in Fig. 1, combines an MCA-based computing module with $2K$ amplifier circuits. The MCA-based computing module comprises four  $2R \times 2K$  MCAs and two sets of OAs. The conductances of the feedback memristive devices of the first set of OAs are all  $\delta_0$ , and those of the second set of OAs are  $\delta_1, \delta_2, \cdots, \delta_{2K}$ .

Owing to the virtual ground property of OA networks, the voltages at the inverting-input nodes of the first set of OAs and the noninverting-input nodes of the second set of OAs are approximately zero. In addition, the currents flowing into these input nodes are approximately zero owing to the high input impedance of OAs. Let $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$ and $\mathbf{D}$ be the conductance matrices of the four MCAs, respectively,  $\mathbf{v}_1$  be the output voltages of the first set of OAs,  $\mathbf{v}_2$  be the output voltages of the second set of OAs, and  $\mathbf{i}_{\text{in}}$  be the input currents. Further denote $\mathbf{E} = \mathbf{A} - \mathbf{B}$, $\mathbf{F} = \mathbf{C} - \mathbf{D}$, and  $\mathbf{\Delta}_1 = \mathrm{diag}(\delta_1, \delta_2, \cdots, \delta_{2K})$ . According to Ohm's law and Kirchhoff's current law, we have

$$\mathbf{E}\mathbf{v}_2 + \mathbf{i}_{\text{in}} + \delta_0 \mathbf{v}_1 = \mathbf{0} \tag{10}$$

and

$$\mathbf{F}^{\mathrm{T}}\mathbf{v}_{1} - \mathbf{\Delta}_{1}\mathbf{v}_{2} = \mathbf{0}.\tag{11}$$

Fig. 1. The proposed MCA-based detector circuit.

Upon substituting (11) into (10) we obtain:

$$\mathbf{v}_2 = -(\mathbf{F}^{\mathrm{T}}\mathbf{E} + \mathbf{\Delta})^{-1}\mathbf{F}^{\mathrm{T}}\mathbf{i}_{\mathrm{in}},\tag{12}$$

where  $\mathbf{\Delta} = \delta_0 \mathbf{\Delta}_1 = \mathrm{diag}(\delta_0 \delta_1, \delta_0 \delta_2, \cdots, \delta_0 \delta_{2K})$.
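For completeness, the elimination of $\mathbf{v}_1$ that leads to (12) can be written out step by step. This is only a restatement of (10) and (11), assuming $\delta_0 \neq 0$ and that $\mathbf{F}^{\mathrm{T}}\mathbf{E} + \delta_0\mathbf{\Delta}_1$ is invertible:

$$\begin{aligned}
\mathbf{v}_1 &= -\frac{1}{\delta_0}\left(\mathbf{E}\mathbf{v}_2 + \mathbf{i}_{\text{in}}\right) && \text{from (10)},\\
-\frac{1}{\delta_0}\mathbf{F}^{\mathrm{T}}\left(\mathbf{E}\mathbf{v}_2 + \mathbf{i}_{\text{in}}\right) - \mathbf{\Delta}_1\mathbf{v}_2 &= \mathbf{0} && \text{inserted into (11)},\\
\left(\mathbf{F}^{\mathrm{T}}\mathbf{E} + \delta_0\mathbf{\Delta}_1\right)\mathbf{v}_2 &= -\mathbf{F}^{\mathrm{T}}\mathbf{i}_{\text{in}} && \text{after multiplying by } -\delta_0.
\end{aligned}$$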

For the amplifier circuits, let  $\theta_1, \theta_2, \cdots, \theta_{2K}$  be the conductances of the feedback memristive devices, and denote by  $\theta_0$  the conductance of the memristive devices connected to the output nodes of the second set of OAs. The magnification of the $k$th amplifier circuit is  $\frac{\theta_0}{\theta_k}$ . The output voltages of the amplifier circuits are:

$$\mathbf{v}_{\text{out}} = -\mathbf{\Theta}^{-1}\mathbf{v}_2,\tag{13}$$

where  $\mathbf{\Theta} = \operatorname{diag}\left(\frac{\theta_1}{\theta_0}, \frac{\theta_2}{\theta_0}, \cdots, \frac{\theta_{2K}}{\theta_0}\right)$ . Upon substituting (12) into (13) we obtain:

$$\mathbf{v}_{\text{out}} = \mathbf{\Theta}^{-1} (\mathbf{F}^{\text{T}} \mathbf{E} + \mathbf{\Delta})^{-1} \mathbf{F}^{\text{T}} \mathbf{i}_{\text{in}}. \tag{14}$$

A memristive device is a two-terminal device whose conductance can be changed by the charge or flux passing through it. Using a dedicated program [9], [10], the conductance of a memristive device can be set to any desired value within a specified range. By mapping $\mathbf{y}$ onto  $\mathbf{i}_{\text{in}}$ , mapping $\mathbf{G}$ onto $\mathbf{E}$ and $\mathbf{F}$, setting  $\mathbf{\Delta}$  to zero for the ZF detector or mapping $\mathbf{P}$ onto  $\mathbf{\Delta}$  for the MMSE detector, and mapping  $\mathbf{\Lambda}$  onto  $\mathbf{\Theta}$ , the result of (7) or (9), respectively, can be obtained by measuring  $\mathbf{v}_{\text{out}}$.
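The idealized circuit equations can be checked numerically. The sketch below solves (10) and (11) jointly for $\mathbf{v}_1$ and $\mathbf{v}_2$, applies the amplifier stage (13), and compares the output with (7) and (9). It is a dimensionless sanity check under ideal-OA assumptions: conductance ranges, scaling factors and device errors are ignored, and the function name `ideal_circuit_output` and all numerical values are hypothetical.

```python
import numpy as np

def ideal_circuit_output(E, F, delta0, delta, theta_ratio, i_in):
    """Solve (10)-(11) for v1 and v2, then apply the amplifier stage (13).

    delta       : feedback conductances delta_1..delta_2K of the second OA set
    theta_ratio : theta_k / theta_0 of each amplifier (the diagonal of Theta)
    """
    m, n = E.shape
    # Block system: delta0*v1 + E*v2 = -i_in  and  F^T*v1 - Delta_1*v2 = 0
    A = np.block([[delta0 * np.eye(m), E],
                  [F.T, -np.diag(delta)]])
    b = np.concatenate([-i_in, np.zeros(n)])
    v2 = np.linalg.solve(A, b)[m:]
    return -v2 / theta_ratio                 # (13)

rng = np.random.default_rng(3)
R, K = 64, 4
lam2 = np.tile([1.0, 0.3, 0.05, 0.01], 2)    # assumed LSFCs, repeated as in (5)
rho, delta0 = 0.1, 1.0                        # assumed values
G = rng.standard_normal((2 * R, 2 * K))       # stand-in for the SSFC matrix
y = rng.standard_normal(2 * R)
theta_ratio = np.sqrt(lam2)                   # Lambda mapped onto Theta

# ZF: Delta = 0. MMSE: Delta = delta0 * Delta_1 = P, so delta_k = (rho / lam_k) / delta0.
v_zf = ideal_circuit_output(G, G, delta0, np.zeros(2 * K), theta_ratio, y)
v_mmse = ideal_circuit_output(G, G, delta0, (rho / lam2) / delta0, theta_ratio, y)

# Reference results from (7) and (9)
s_zf = np.linalg.solve(G.T @ G, G.T @ y) / np.sqrt(lam2)
s_mmse = np.linalg.solve(G.T @ G + np.diag(rho / lam2), G.T @ y) / np.sqrt(lam2)

print(np.allclose(v_zf, s_zf), np.allclose(v_mmse, s_mmse))
```

Solving the block system directly, rather than evaluating (14), confirms that the closed form follows from the two Kirchhoff equations alone.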

The conventional MCA-based detector circuit does not decompose the channel matrix into the product of the LSFC matrix and the SSFC matrix. Clearly, the MCA-based computing module in Fig. 1 can be employed as a conventional MCA-based detector circuit to compute (6) or (8). Therefore, in the rest of this paper, we employ this module to represent the conventional MCA-based detector circuit for convenience of analysis.

#### IV. CONDUCTANCE MAPPING SCHEMES

The mapped matrix may contain both positive and negative elements, but the device conductance values must remain positive. We therefore map the matrix onto the difference between two positive conductance matrices, instead of a single conductance matrix. Let the conductance range of memristive devices be  $[\omega_{\min}, \ \omega_{\max}]$ . We define  $\omega = \omega_{\max} - \omega_{\min}$ . The scheme for mapping a matrix $\mathbf{U}$ onto the conductance matrix $\mathbf{X} - \mathbf{Z}$ is:

$$x_{i,j} = \begin{cases} \omega_{\max}, & u_{i,j} > 0\\ \omega_{\min}, & u_{i,j} \le 0 \end{cases} \tag{15}$$

and

$$z_{i,j} = x_{i,j} - \alpha u_{i,j},\tag{16}$$

where  $\alpha$  is the scaling factor. Any conductance value that falls outside the conductance range is clipped to the nearest endpoint.
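The mapping rule of (15) and (16), including clipping, can be sketched as follows. The function name `map_to_conductances` and the example matrix are hypothetical; the conductance range matches the one used in Section V (in $\mu$S).

```python
import numpy as np

def map_to_conductances(U, alpha, w_min, w_max):
    """Map U onto X - Z following (15) and (16), clipping Z to [w_min, w_max]."""
    X = np.where(U > 0, w_max, w_min)     # (15)
    Z = X - alpha * U                     # (16)
    Z = np.clip(Z, w_min, w_max)          # out-of-range conductances go to the endpoints
    return X, Z

w_min, w_max = 0.1, 30.0                  # conductance range in microsiemens
omega = w_max - w_min
U = np.array([[0.5, -1.0],
              [0.0,  2.0]])               # example matrix (assumed)

# With alpha = omega / 2, the largest |u| = 2 just fits, so X - Z = alpha * U.
X, Z = map_to_conductances(U, omega / 2, w_min, w_max)
eff_ok = (X - Z) / (omega / 2)

# With alpha = omega, any |u| > 1 is clipped: the element 2.0 is represented as 1.0.
X, Z = map_to_conductances(U, omega, w_min, w_max)
eff_clip = (X - Z) / omega

print(eff_ok)
print(eff_clip)
```

Dividing $\mathbf{X} - \mathbf{Z}$ by $\alpha$ recovers the matrix that the circuit effectively computes with, which shows that clipping saturates elements at $\pm\omega/\alpha$.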

Process variations and device limitations always lead to conductance errors of memristive devices. The conductance errors can be modeled as Gaussian random variables with mean 0 and variance  $\sigma_m^2$  [10], [11]. Therefore, the impact of conductance errors is equivalent to applying perturbations with variance  $\frac{2\sigma_m^2}{\alpha^2}$  to each element of the mapped matrix.
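The factor of two can be traced as follows, under the assumptions (implied by the stated variance but not spelled out above) that the errors of the two devices representing one element are independent and that no clipping occurs. With $e_x$ and $e_z$ denoting the conductance errors of $x_{i,j}$ and $z_{i,j}$, respectively, and using $x_{i,j} - z_{i,j} = \alpha u_{i,j}$ from (16):

$$\frac{(x_{i,j} + e_x) - (z_{i,j} + e_z)}{\alpha} = u_{i,j} + \frac{e_x - e_z}{\alpha}, \qquad \mathrm{Var}\left(\frac{e_x - e_z}{\alpha}\right) = \frac{\sigma_m^2 + \sigma_m^2}{\alpha^2} = \frac{2\sigma_m^2}{\alpha^2}.$$

Hence a smaller scaling factor $\alpha$ enlarges the effective perturbation of the mapped matrix.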

In this section, we propose two conductance mapping schemes, one termed the statistical CSI-based (SCB) scheme, the other termed the instantaneous CSI-based (ICB) scheme.

#### A. SCB Mapping Scheme

Our SCB scheme selects a fixed scaling factor based on the statistical CSI. Specifically, to map a matrix U onto conductance matrices, the SCB scheme calculates the scaling factor by:

$$\alpha = \frac{\omega}{\beta \sigma_u},\tag{17}$$

where  $\beta$  is the scaling parameter of the SCB scheme and  $\sigma_u$  is the standard deviation of the elements of the mapped matrix.

The proposed detector circuit maps $\mathbf{G}$ onto conductance matrices. For $\mathbf{G}$,  $\sigma_u = \sigma_g$ . The conventional detector circuit maps $\mathbf{H}$ onto conductance matrices. For $\mathbf{H}$,

$$\sigma_u = \sqrt{\frac{\sum\limits_{k=1}^K \lambda_k}{K}} \sigma_g. \tag{18}$$
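As an illustrative check of (17) and (18), the sketch below computes the SCB scaling factors for both circuits and compares (18) with the empirical standard deviation of a large random matrix. The helper names, the LSFC values and $\beta = 3$ are assumptions made for this example.

```python
import numpy as np

def scb_scaling_factor(omega, beta, sigma_u):
    """Scaling factor of the SCB scheme, following (17)."""
    return omega / (beta * sigma_u)

def sigma_u_conventional(lam, sigma_g):
    """Element standard deviation of H = G Lambda, following (18)."""
    return np.sqrt(np.mean(lam)) * sigma_g

rng = np.random.default_rng(4)
sigma_g = 1.0
lam = np.array([1.0, 0.3, 0.05, 0.01])     # assumed LSFCs, for illustration only
omega = 30.0 - 0.1                          # conductance span in microsiemens
beta = 3.0                                  # assumed scaling parameter

# Many rows are used only to make the empirical estimate tight.
n_rows = 100_000
G = sigma_g * rng.standard_normal((n_rows, 2 * lam.size))
H = G * np.sqrt(np.concatenate([lam, lam]))   # column-wise scaling, i.e. G @ Lambda

sigma_H_emp = H.std()
sigma_H_pred = sigma_u_conventional(lam, sigma_g)

alpha_prop = scb_scaling_factor(omega, beta, sigma_g)        # proposed: maps G
alpha_conv = scb_scaling_factor(omega, beta, sigma_H_pred)   # conventional: maps H

print(sigma_H_emp, sigma_H_pred)
print(alpha_prop, alpha_conv)
```

The empirical and predicted standard deviations of $\mathbf{H}$ should agree closely, because (18) is the root of the average per-column variance.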

# B. ICB Mapping Scheme

Our ICB scheme calculates the scaling factor according to:

$$\alpha = \frac{\omega}{\max\{|u_{i,j}|\}},\tag{19}$$

to map $\mathbf{U}$ onto conductance matrices. Clearly, with this scaling factor, no element of $\mathbf{U}$ will be clipped. Unlike the SCB scheme, the ICB scheme requires recalculating the scaling factor whenever the instantaneous channel matrix changes.
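A small sketch can contrast the two schemes in terms of clipping. It computes the ICB scaling factor of (19) and counts how many elements would fall outside the conductance range under (15) and (16), using an SCB factor with $\beta = 1$ as a deliberately aggressive counterexample. The helper names and the random test matrix are hypothetical.

```python
import numpy as np

def icb_scaling_factor(U, omega):
    """Scaling factor of the ICB scheme, following (19)."""
    return omega / np.max(np.abs(U))

def clipped_fraction(U, alpha, w_min, w_max):
    """Fraction of elements whose z value from (16) falls outside the range."""
    X = np.where(U > 0, w_max, w_min)      # (15)
    Z = X - alpha * U                       # (16), before clipping
    tol = 1e-9 * w_max                      # tolerance for floating-point rounding
    return np.mean((Z < w_min - tol) | (Z > w_max + tol))

rng = np.random.default_rng(5)
w_min, w_max = 0.1, 30.0                    # conductance range in microsiemens
omega = w_max - w_min
U = rng.standard_normal((128, 8))           # unit-variance test matrix (assumed)

alpha_icb = icb_scaling_factor(U, omega)
alpha_scb = omega / (1.0 * 1.0)             # (17) with beta = 1 and sigma_u = 1

frac_icb = clipped_fraction(U, alpha_icb, w_min, w_max)
frac_scb = clipped_fraction(U, alpha_scb, w_min, w_max)
print(frac_icb, frac_scb)
```

Under the SCB scheme an element is clipped whenever $|u_{i,j}| > \beta\sigma_u$, so for Gaussian elements and $\beta = 1$ roughly a third of them are affected, whereas the ICB scheme clips none by construction.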

#### C. Discussion

When dealing with a matrix whose elements follow different probability distributions, it becomes challenging to select an appropriate scaling parameter  $\beta$  for the SCB mapping scheme. A small scaling parameter leads to a large scaling factor  $\alpha$ , so the elements with larger variance are clipped with substantial probability. A large scaling parameter, in contrast, brings about significant perturbations caused by conductance errors, and these perturbations are particularly severe for the elements with smaller variance. As for the ICB mapping scheme, its scaling factor is usually determined by the elements with larger variance. Similarly, the perturbations caused by conductance errors are particularly severe for the elements with smaller variance. Evidently, the larger the variance disparity among the elements of the mapped matrix, the more significant these effects become, and the more severe the perturbations caused by conductance errors.

For the proposed detector circuit, the elements of the mapped matrix **G** follow the same distribution. In practical scenarios, UTs in a cell have different distances to the BS, leading to distinct LSFCs for different UTs. Thus the elements within different columns of the channel matrix **H** follow probability distributions with different variances. Clearly, the mapped matrix of the conventional MCA-based detector circuit exhibits a significant variance disparity among its elements, which results in severe perturbations caused by conductance errors. This is the reason why we decompose the channel matrix into the product of the LSFC matrix and the SSFC matrix and map them separately. It also indicates the superiority of the proposed detector circuit over the conventional MCA-based detector circuit.
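The argument above can be made concrete with a short closed-form calculation. The sketch below evaluates, for the SCB scheme, the ratio between the standard deviation of the conductance-error perturbation, $\sqrt{2}\sigma_m/\alpha$, and the standard deviation of the elements in each column, for both $\mathbf{G}$ and $\mathbf{H}$. The LSFC values, $\beta$ and the helper name `relative_perturbation` are assumptions chosen for illustration.

```python
import numpy as np

def relative_perturbation(col_std, sigma_u, beta, sigma_m, omega):
    """Perturbation std relative to the element std of each column, SCB scheme."""
    alpha = omega / (beta * sigma_u)              # (17)
    noise_std = np.sqrt(2.0) * sigma_m / alpha    # square root of 2*sigma_m^2 / alpha^2
    return noise_std / col_std

omega = 30.0 - 0.1                  # conductance span in microsiemens
sigma_m = 0.005 * omega             # 0.5% of omega, as in the setting of Fig. 6
sigma_g, beta = 1.0, 3.0            # assumed values
lam = np.array([1.0, 1e-1, 1e-2, 1e-3])   # assumed LSFCs spanning 30 dB

# Proposed circuit: G is mapped, and every column has standard deviation sigma_g.
r_G = relative_perturbation(np.full(lam.size, sigma_g), sigma_g, beta, sigma_m, omega)

# Conventional circuit: H is mapped, column k has std sqrt(lam_k)*sigma_g, sigma_u from (18).
sigma_u_H = np.sqrt(np.mean(lam)) * sigma_g
r_H = relative_perturbation(np.sqrt(lam) * sigma_g, sigma_u_H, beta, sigma_m, omega)

print(r_G)
print(r_H)
```

For $\mathbf{G}$ the ratio is identical for all columns, whereas for $\mathbf{H}$ it grows as $1/\sqrt{\lambda_k}$, so the columns of the weakest UTs are perturbed the most.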

#### V. SIMULATIONS

We consider a multi-user massive MIMO system with 4 UTs and 64 BS antennas, and 64-ary quadrature amplitude modulation (64-QAM) is used in the simulation. The conductance range of memristive devices is from  $0.1\,\mu\text{S}$  to  $30\,\mu\text{S}$ . The SPICE simulations in this paper are conducted using LTspice®.

# A. Computation Time

The computation time is an important performance metric for MCA-based detector circuits, and we measure it in terms of the convergence time of the circuit. The factor that most strongly influences the convergence time is the gain-bandwidth product (GBP) of the OAs [12]. The transient results of the output voltages of the proposed and conventional MCA-based detector circuits are illustrated in Fig. 2, where the OAs have a GBP of 500 MHz. The proposed circuit exhibits almost the same computation time as the conventional MCA-based detector circuit. The computation time of the proposed detector circuit is typically about 80 ns, and it can be further reduced by increasing the GBP of the OAs.

Fig. 2. Transient results of output voltages of (a) the proposed detector circuit, and (b) the conventional MCA-based detector circuit.

Fig. 3. BERs of the proposed detector circuit when  $\sigma_m=0$  and adopting the SCB scheme.

Fig. 4. BERs of the proposed detector circuit, given  $\sigma_m = 1\%\omega$ .

#### B. Detection Performance

Simulation results indicate that there is no significant difference between the detection performances of ZF and MMSE algorithms in the considered scenario. Therefore we do not distinguish between the ZF and MMSE in the figures.

In Fig. 3, we compare the bit error rate (BER) performances of the proposed detector circuit adopting the SCB scheme under different scaling parameters, given  $\sigma_m = 0$, using the digital approach as the benchmark. For the SCB mapping scheme, the larger the scaling parameter, the fewer elements are clipped and the lower the BER, i.e., the closer the performance of the proposed detector circuit is to that of the digital approach. Specifically, the scaling parameter needs to be at least 3.0 for the proposed detector circuit to ensure satisfactory performance. While the presence of clipped elements increases the BER of the proposed detector circuit, its impact is observable only at high signal-to-noise ratio (SNR). At low SNR, the AWGN remains the primary factor constraining detection performance. Even in the absence of AWGN, clipped elements still cause detection errors, so the BER may exhibit an error floor as the SNR increases.

Fig. 5. BERs of the proposed detector circuit as functions of the scaling parameter, under different conductance error levels with an SNR of  $15\,\mathrm{dB}$ .

Fig. 6. BERs of the proposed and conventional detector circuits as functions of the scaling parameter in a massive MIMO cell, given  $\sigma_m=0.5\%\omega$ .

Fig. 4 depicts the BER performances of the proposed detector circuit, given  $\sigma_m=1\%\omega$ . Simulation results indicate that a larger scaling parameter no longer implies a lower BER. This is because a larger scaling parameter implies more severe perturbations caused by conductance errors.

To gain further insight into the relationship between the BER and the scaling parameter  $\beta$ , we examine the BERs as functions of the scaling parameter for the proposed detector circuit in Fig. 5, given different conductance error levels with an SNR of 15 dB. Simulation results reveal that, in the presence of conductance errors, the BER of the proposed detector circuit adopting the SCB scheme first decreases and then increases as the scaling parameter increases, because the primary factor constraining detection performance shifts from the clipped elements to the perturbations caused by conductance errors as  $\beta$  increases. As expected, the higher the conductance error level, the higher the BER of the detector circuit, regardless of whether the SCB scheme or the ICB scheme is adopted. When the conductance error level is low, the BER of the detector circuit adopting the SCB scheme consistently remains higher than that of the ICB scheme. However, when the conductance error level is high, the BER of the detector circuit adopting the ICB scheme is higher than the achievable minimum BER of the SCB scheme.

Fig. 7. Power consumption results of the proposed and conventional MCA-based detector circuits, as well as the RAPC results of the proposed circuit, as functions of the number of UTs, K.

After investigating the impacts of the conductance mapping scheme and conductance errors on detection performance, we demonstrate the performance advantage of the proposed detector circuit over the conventional MCA-based detector circuit. We consider a massive MIMO cell with randomly distributed UTs. The radius of the cell is 150 m, the uplink carrier frequency is 2 GHz and the bandwidth is 25 MHz. The transmit power of a UT is 20 dBm. Fig. 6 compares the BER results of the proposed and conventional MCA-based detector circuits as functions of the scaling parameter, given  $\sigma_m = 0.5\%\omega$ . The results demonstrate that, compared to the conventional MCA-based detector circuit, the proposed detector circuit consistently exhibits a significantly lower BER under both the ICB and SCB schemes.

# C. Power Consumption, Computing Performance and Energy Efficiency

In this experiment, we consider the OA whose static power dissipation is  $12\,\mu\mathrm{W}$  and GBP is 500 MHz [13]. We use current-based digital-to-analog converters (DACs) of [14] to provide input currents for the MCA-based detector circuits. The analog-to-digital converters (ADCs) of [15] are used to measure the output voltages.

The proposed circuit incorporates additional 2K amplifier circuits compared to the conventional MCA-based detector circuit. Fig. 7 depicts the power consumption results of the proposed and conventional MCA-based detector circuits as functions of the number of UTs, K. Meanwhile, Fig. 7 depicts the relative additional power consumption (RAPC) of the proposed circuit compared to the conventional MCA-based detector circuit. The RAPC of the proposed circuit is less than 0.6%, which means the additional amplifier circuits of the proposed circuit do not result in significant additional power consumption.

Fig. 8. Computing performance and energy efficiency results of the proposed and conventional MCA-based detector circuits as functions of the number of UTs, K, using the commercial GPU NVIDIA QUADRO GV100 as the benchmark.

We use the ratio of the number of equivalent floating-point operations (FLOPs) to the computation time of an MCA-based detector circuit as a metric to gauge its computing performance, in which a FLOP is assumed to be either a real multiplication or a real summation. Besides, we use the ratio of the equivalent FLOP number of an MCA-based detector circuit to the energy consumed during its computation time as a metric to gauge its energy efficiency. The two metrics are measured in tera-FLOPs per second (TOPS) and TOPS/W, respectively.
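The two metrics can be written as simple functions. The sketch below is only a restatement of the definitions above; the function names are hypothetical, and the FLOP count and power figure in the usage lines are made-up placeholders rather than values from this paper (only the 80 ns computation time is taken from Section V-A).

```python
def computing_performance_tops(flops, time_s):
    """Equivalent FLOPs divided by computation time, in tera-FLOPs per second."""
    return flops / time_s / 1e12

def energy_efficiency_tops_per_w(flops, power_w, time_s):
    """Equivalent FLOPs divided by the energy consumed (power x time), in TOPS/W."""
    return flops / (power_w * time_s) / 1e12

# Placeholder inputs for illustration only (not results from the paper)
flops = 1e5          # assumed equivalent FLOP count of one detection
power_w = 0.01       # assumed circuit power in watts
time_s = 80e-9       # computation time of about 80 ns

perf = computing_performance_tops(flops, time_s)
eff = energy_efficiency_tops_per_w(flops, power_w, time_s)
print(perf, eff)
```

By construction, the energy efficiency equals the computing performance divided by the power, so a circuit with a fixed computation time gains efficiency whenever the equivalent FLOP count grows faster than its power consumption.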

Fig. 8 depicts the computing performance and energy efficiency results of the proposed and conventional MCA-based detector circuits, using the commercial graphics processing unit (GPU) NVIDIA QUADRO GV100 [16] as the benchmark. There is no significant difference in computing performance and energy efficiency between the proposed and conventional MCA-based detector circuits. As the number of UTs increases, the dimensions of the computed matrices grow, yet the computing performance and the energy efficiency of the MCA-based detector circuits also increase. The MCA-based detector circuits exhibit computing performance advantages over the commercial GPU only when K is relatively large, but their energy efficiency surpasses that of the GPU by several orders of magnitude.

## VI. CONCLUSIONS

We have proposed a novel MCA-based circuit design for massive MIMO ZF and MMSE detectors. The proposed detector circuit employs an MCA-based matrix computing module and OA-based amplifier circuits to separately deal with the SSFC matrix and the LSFC matrix, significantly reducing the perturbations caused by conductance errors. We have presented two conductance mapping schemes for the MCA-based detector circuits, one termed the SCB scheme and the other termed the ICB scheme. We have investigated the impacts of mapping scheme and conductance errors on detection performance of the proposed detector circuit and have demonstrated the significant performance advantage of our proposed detector circuit over the conventional MCA-based detector circuit. Although the proposed circuit incorporates additional amplifier circuits compared to the conventional MCA-based detector circuit, the additional amplifier circuits do not result in observable additional power consumption. The energy efficiency of the proposed circuit is tens to hundreds of times that of the commercial GPU NVIDIA QUADRO GV100.

#### REFERENCES

- [1] S. Yang and L. Hanzo, "Fifty years of MIMO detection: The road to large-scale MIMOs," *IEEE Commun. Surveys Tuts.*, vol. 17, no. 4, pp. 1941–1988, 4th Quart. 2015.
- [2] L. Xia, et al., "Technological exploration of RRAM crossbar array for matrix-vector multiplication," J. Comput. Sci. Technol., vol. 3, no. 1, pp. 3–19, Jan. 2016.
- [3] Z. Sun, et al., "Solving matrix equations in one step with cross-point resistive arrays," Proc. Nat. Acad. Sci., vol. 116, no. 10, pp. 4123–4128, Mar. 2019.
- [4] Z. Sun, G. Pedretti, A. Bricalli, and D. Ielmini, "One-step regression and classification with cross-point resistive memory arrays," Sci. Adv., vol. 6, no. 5, Jan. 2020, Art. no. eaay2378.
- [5] G. Yuan, et al., "Memristor crossbar-based ultra-efficient next-generation baseband processors," in Proc. IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, MA, USA, Aug. 6-9, 2017, pp. 1121–1124.
- [6] P. Zuo, Z. Sun, and R. Huang, "Extremely-fast, energy-efficient massive MIMO precoding with analog RRAM matrix computing," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 70, no. 7, pp. 2335–2339, Jul. 2023.
- [7] P. Mannocci, E. Melacarne, and D. Ielmini, "An analogue in-memory ridge regression circuit with application to massive MIMO acceleration," *IEEE J. Emerg. Sel. Topics Circuits Syst.*, vol. 12, no. 4, pp. 952–962, Dec. 2022.
- [8] Q. Zeng, et al., "Realizing in-memory baseband processing for ultrafast and energy-efficient 6G," *IEEE Internet Things J.*, vol. 11, no. 3, pp. 5169–5183, Feb. 2024.
- [9] P. Yao, et al., "Face classification using electronic synapses," Nature Commun., vol. 8, no. 1, May 2017, Art. no. 15199.
- [10] C. Li, et al., "Analogue signal and image processing with large memristor crossbars," Nature Electron., vol. 1, no. 1, pp. 52–59, Jan. 2018.
- [11] T. P. Xiao, et al., "On the accuracy of analog neural network inference accelerators," *IEEE Circuits Syst. Mag.*, vol. 22, no. 4, pp. 26–48, 4th Quart. 2022.
- [12] P. Mannocci, et al., "A universal, analog, in-memory computing primitive for linear algebra using memristors," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 68, no. 12, pp. 4889–4899, Dec. 2021.
- [13] B. Feinberg, et al., "An analog preconditioner for solving linear systems," in Proc. IEEE International Symposium on High-Performance Computer Architecture (HPCA), Seoul, Korea, Feb. 27-Mar. 3, 2021, pp. 761–774.
- [14] S. Ishraqul Huq, S. Islam, N. Saqib, and S. N. Biswas, "Design of low power 8-bit DAC using PTM-LP technology," in *Proc. International Conference on Recent Trends in Electrical, Electronics and Computing Technologies (ICRTEECT)*, Warangal, India, Jul. 30-31, 2017, pp. 64–69.
- [15] M. J. Marinella, et al., "Multiscale co-design analysis of energy, latency, area, and accuracy of a ReRAM analog neural training accelerator," IEEE J. Emerg. Sel. Topics Circuits Syst., vol. 8, no. 1, pp. 86–101, Mar. 2018.
- [16] "Data sheet: Quadro GV100," NVIDIA, 2022. [Online]. Available: https://www.nvidia.com/en-us/design-visualization/quadro/