Source author record

Ruoyu Sun

Ruoyu Sun appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Machine Learning, math.OC, eess.SP, Information Theory, math.IT, Computer Vision, Robotics

Catalog footprint

What is connected

18 works

7 topics

4 close collaborators

Preprint, 2026, arXiv

AI-Driven Spectrum Occupancy Prediction Using Real-World Spectrum Measurements

Spectrum occupancy prediction is a critical enabler for real-time and proactive dynamic spectrum sharing (DSS), as it can provide short-term channel availability information to support more efficient spectrum access decisions in wireless communication systems. Instead of relying on the open-source datasets or simulated data commonly used in the literature, this paper investigates short-horizon spectrum occupancy prediction using mid-band, 24/7 real-world spectrum measurement data collected in the United States. We construct a multi-band channel occupancy dataset by analyzing 61 days of empirical data and formulate a next-minute channel occupancy prediction task across all frequency channels. This study focuses on AI-driven prediction methods, including Random Forest, Extreme Gradient Boosting (XGBoost), and a Long Short-Term Memory (LSTM) network, and compares their performance against a conventional Markov chain-based statistical baseline. Numerical results show that learning-based methods outperform the statistical baseline on dynamic channels, particularly under fixed false-alarm constraints. These results demonstrate the effectiveness of AI-driven spectrum occupancy prediction, indicating that lightweight learning models can effectively support future deployment-oriented DSS systems.
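To make the "Markov chain-based statistical baseline" concrete, here is a minimal sketch of a first-order, two-state Markov predictor for next-step channel occupancy. The abstract does not specify the baseline's order, estimator, or decision rule, so the function names, the 0.5 fallback for unseen states, and the decision threshold are all illustrative assumptions, not the paper's method.

```python
from collections import Counter


def fit_markov_chain(occupancy):
    """Estimate P(next = 1 | current = s) for s in {0, 1} from a 0/1 sequence."""
    # Count consecutive (current, next) pairs in the history.
    counts = Counter(zip(occupancy[:-1], occupancy[1:]))
    probs = {}
    for state in (0, 1):
        total = counts[(state, 0)] + counts[(state, 1)]
        # Assumption: fall back to 0.5 when a state was never observed.
        probs[state] = counts[(state, 1)] / total if total else 0.5
    return probs


def predict_next(probs, current, threshold=0.5):
    """Predict occupied (1) when the estimated probability reaches the threshold."""
    return 1 if probs[current] >= threshold else 0


# Hypothetical per-minute occupancy history for one channel (1 = occupied).
history = [0, 0, 1, 1, 1, 0, 1, 1]
probs = fit_markov_chain(history)
prediction = predict_next(probs, history[-1])
```

Raising `threshold` makes the predictor declare "occupied" less often, which is one simple way such a baseline could be tuned toward a target false-alarm rate.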

Preprint, 2026, arXiv

Automated Spectrum Sensing and Analysis Framework

Spectrum sensing and analysis is crucial for a variety of reasons, including regulatory compliance, interference detection and mitigation, and spectrum resource planning and optimization. Effective, real-time spectrum analysis remains a challenge, stemming from the need to analyse an increasingly complex and dynamic environment with limited resources. The vast amount of data generated from sensing the spectrum at multiple sites requires sophisticated data analysis and processing techniques, which can be technically demanding and expensive. This paper presents a novel, holistic framework developed and deployed at multiple locations across the USA for spectrum analysis and describes the different parts of the end-to-end pipeline. The details of each of the modules of the pipeline, data collection and pre-processing at remote locations, transfer to a centralized location, post-processing analysis, visualization, and long-term storage, are reported. The motivation behind this work is to develop a robust spectrum analysis framework that can help gain greater insights into the spectrum usage across the country and augment additional use cases such as dynamic spectrum sharing.

Preprint, 2026, arXiv

LarS-Net: A Large-Scale Framework for Network-Level Spectrum Sensing

As the demand for wireless communication continues to rise, the radio spectrum (a finite resource) requires increasingly efficient utilization. This trend is driving the evolution from static, stand-alone spectrum allocation toward spectrum sharing and dynamic spectrum sharing. A critical element of this transition is spectrum sensing, which facilitates informed decision-making in shared environments. Previous studies on spectrum sensing and cognitive radio have been largely limited to individual sensors or small sensor groups. In this work, a large-scale spectrum sensing network (LarS-Net) is designed in a cost-effective manner. Spectrum sensors are either co-located with base stations (BSs) to share the tower, backhaul, and power infrastructure, or integrated directly into BSs as a new feature leveraging active BS antenna systems. As an example incumbent system, a fixed-service microwave link operating in the lower-7 GHz band is investigated. This band is a primary candidate for 6G, being considered by the WRC-23, ITU, and FCC. Based on Monte Carlo simulations, we determine the minimum subset of BSs equipped with sensing capability to guarantee a target incumbent detection probability. The simulations account for various sensor antenna configurations, propagation channel models, and duty cycles for both incumbent transmissions and sensing operations. Building on this framework, we introduce three network-level sensing performance metrics: Emission Detection Probability (EDP), Temporal Detection Probability (TDP), and Temporal Mis-detection Probability (TMP), which jointly capture spatial coverage, temporal detectability, and multi-node diversity effects. Using these metrics, we analyze the impact of LarS-Net inter-site distance, noise uncertainty, and sensing duty-cycle on large-scale sensing performance.

Preprint, 2023, arXiv

Adam Can Converge Without Any Modification On Update Rules

Ever since Reddi et al. 2018 pointed out the divergence issue of Adam, many new variants have been designed to obtain convergence. However, vanilla Adam remains exceptionally popular and it works well in practice. Why is there a gap between theory and practice? We point out there is a mismatch between the settings of theory and practice: Reddi et al. 2018 pick the problem after picking the hyperparameters of Adam, i.e., $(β_1, β_2)$; while practical applications often fix the problem first and then tune $(β_1, β_2)$. Due to this observation, we conjecture that the empirical convergence can be theoretically justified, only if we change the order of picking the problem and hyperparameter. In this work, we confirm this conjecture. We prove that, when $β_2$ is large and $β_1 < \sqrt{β_2}<1$, Adam converges to the neighborhood of critical points. The size of the neighborhood is proportional to the variance of stochastic gradients. Under an extra condition (strong growth condition), Adam converges to critical points. It is worth mentioning that our results cover a wide range of hyperparameters: as $β_2$ increases, our convergence result can cover any $β_1 \in [0,1)$ including $β_1=0.9$, which is the default setting in deep learning libraries. To our knowledge, this is the first result showing that Adam can converge without any modification on its update rules. Further, our analysis does not require assumptions of bounded gradients or bounded 2nd-order momentum. When $β_2$ is small, we further point out a large region of $(β_1,β_2)$ where Adam can diverge to infinity. Our divergence result considers the same setting as our convergence result, indicating a phase transition from divergence to convergence when increasing $β_2$. These positive and negative results can provide suggestions on how to tune Adam hyperparameters.
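The stated hyperparameter condition can be checked mechanically. The sketch below tests only the inequality $β_1 < \sqrt{β_2} < 1$ with $β_1 \in [0,1)$; the abstract additionally requires $β_2$ to be "large", and that problem-dependent threshold is not given here, so passing this check is not by itself a convergence guarantee. The function name is hypothetical.

```python
import math


def betas_satisfy_inequality(beta1: float, beta2: float) -> bool:
    """Return True when 0 <= beta1 < sqrt(beta2) < 1.

    This covers only the inequality from the abstract. It does not check
    the separate requirement that beta2 be large enough for the problem.
    """
    if not 0.0 <= beta2 < 1.0:
        return False
    return 0.0 <= beta1 < math.sqrt(beta2)


# Common library defaults (beta1 = 0.9, beta2 = 0.999) satisfy the inequality,
# because sqrt(0.999) is roughly 0.9995.
default_ok = betas_satisfy_inequality(0.9, 0.999)
# A small beta2 with the same beta1 violates it, because sqrt(0.5) is roughly 0.707.
small_beta2_ok = betas_satisfy_inequality(0.9, 0.5)
```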

Preprint, 2022, arXiv

Does Momentum Change the Implicit Regularization on Separable Data?

The momentum acceleration technique is widely adopted in many optimization algorithms. However, there is no theoretical answer on how the momentum affects the generalization performance of the optimization algorithms. This paper studies this problem by analyzing the implicit regularization of momentum-based optimization. We prove that on the linear classification problem with separable data and exponential-tailed loss, gradient descent with momentum (GDM) converges to the L2 max-margin solution, which is the same as vanilla gradient descent. That means gradient descent with momentum acceleration still converges to a low-complexity model, which guarantees their generalization. We then analyze the stochastic and adaptive variants of GDM (i.e., SGDM and deterministic Adam) and show they also converge to the L2 max-margin solution. Technically, to overcome the difficulty of the error accumulation in analyzing the momentum, we construct new potential functions to analyze the gap between the model parameter and the max-margin solution. Numerical experiments are conducted and support our theoretical results.

Preprint, 2022, arXiv

Global Convergence of MAML and Theory-Inspired Neural Architecture Search for Few-Shot Learning

Model-agnostic meta-learning (MAML) and its variants have become popular approaches for few-shot learning. However, due to the non-convexity of deep neural nets (DNNs) and the bi-level formulation of MAML, the theoretical properties of MAML with DNNs remain largely unknown. In this paper, we first prove that MAML with over-parameterized DNNs is guaranteed to converge to global optima at a linear rate. Our convergence analysis indicates that MAML with over-parameterized DNNs is equivalent to kernel regression with a novel class of kernels, which we name Meta Neural Tangent Kernels (MetaNTK). Then, we propose MetaNTK-NAS, a new training-free neural architecture search (NAS) method for few-shot learning that uses MetaNTK to rank and select architectures. Empirically, we compare our MetaNTK-NAS with previous NAS methods on two popular few-shot learning benchmarks, miniImageNet and tieredImageNet. We show that the performance of MetaNTK-NAS is comparable to or better than that of the state-of-the-art NAS method designed for few-shot learning while enjoying more than 100x speedup. We believe the efficiency of MetaNTK-NAS makes it more practical for many real-world tasks.

Preprint, 2022, arXiv

On the Landscape of One-hidden-layer Sparse Networks and Beyond

Sparse neural networks have received increasing interest due to their small size compared to dense networks. Nevertheless, most existing works on neural network theory have focused on dense neural networks, and the understanding of sparse networks is very limited. In this paper, we study the loss landscape of one-hidden-layer sparse networks. First, we consider sparse networks with a dense final layer. We show that linear networks can have no spurious valleys under special sparse structures, and non-linear networks could also admit no spurious valleys under a wide final layer. Second, we discover that spurious valleys and spurious minima can exist for wide sparse networks with a sparse final layer. This is different from wide dense networks which do not have spurious valleys under mild assumptions.

Preprint, 2021, arXiv

On a Faster $R$-Linear Convergence Rate of the Barzilai-Borwein Method

The Barzilai-Borwein (BB) method has demonstrated great empirical success in nonlinear optimization. However, the convergence speed of the BB method is not well understood, as the known convergence rate of the BB method for quadratic problems is much worse than that of the steepest descent (SD) method. Therefore, there is a large discrepancy between theory and practice. To shrink this gap, we prove that the BB method converges $R$-linearly at a rate of $1-1/κ$, where $κ$ is the condition number, for strongly convex quadratic problems. In addition, an example with the theoretical rate of convergence is constructed, indicating the tightness of our bound.
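For readers unfamiliar with the method, the following is a minimal sketch of a BB iteration on a strongly convex quadratic with a diagonal Hessian, $f(x) = \frac{1}{2}\sum_i d_i x_i^2$. It uses the "long" BB step $α = s^\top s / s^\top y$ with $s = x_k - x_{k-1}$ and $y = g_k - g_{k-1}$. The initial step $1/\max_i d_i$, the stopping tolerance, and the test problem are assumptions chosen for illustration; nothing here reproduces the paper's analysis or its tight example.

```python
def bb_quadratic(diag, x0, max_iters=10000, tol=1e-10):
    """Minimize 0.5 * sum(d_i * x_i**2) with Barzilai-Borwein step sizes.

    Returns the final iterate and the number of iterations performed.
    """
    x = list(x0)
    g = [d * xi for d, xi in zip(diag, x)]  # gradient of the quadratic
    step = 1.0 / max(diag)                  # assumed initial step size
    for k in range(max_iters):
        grad_norm = sum(gi * gi for gi in g) ** 0.5
        if grad_norm < tol:
            return x, k
        x_new = [xi - step * gi for xi, gi in zip(x, g)]
        g_new = [d * xi for d, xi in zip(diag, x_new)]
        s = [a - b for a, b in zip(x_new, x)]  # change in iterate
        y = [a - b for a, b in zip(g_new, g)]  # change in gradient
        sy = sum(si * yi for si, yi in zip(s, y))
        if sy > 0.0:
            # BB step: alpha = (s^T s) / (s^T y)
            step = sum(si * si for si in s) / sy
        x, g = x_new, g_new
    return x, max_iters


# Hypothetical test problem with condition number kappa = 100.
x_final, iters_used = bb_quadratic([1.0, 10.0, 100.0], [1.0, 1.0, 1.0])
```

Unlike steepest descent, the BB step needs no line search: each step size comes from the previous iterate and gradient differences alone.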

Preprint, 2021, arXiv

Understanding Limitation of Two Symmetrized Orders by Worst-case Complexity

Update order is one of the major design choices of block decomposition algorithms. There are at least two classes of deterministic update orders: nonsymmetric (e.g. cyclic order) and symmetric (e.g. Gaussian back substitution or symmetric Gauss-Seidel). Recently, Coordinate Descent (CD) with cyclic order was shown to be $O(n^2)$ times slower than randomized versions in the worst case. A natural question arises: can the symmetrized orders achieve faster convergence rates than the cyclic order, or even get close to the randomized versions? In this paper, we give a negative answer to this question. We show that both Gaussian back substitution (GBS) and symmetric Gauss-Seidel (sGS) suffer from the same slow convergence issue as the cyclic order in the worst case. In particular, we prove that for unconstrained problems, both GBS-CD and sGS-CD can be $O(n^2)$ times slower than R-CD. Besides unconstrained problems, we also study linearly constrained problems with a quadratic objective: we empirically demonstrate that the convergence speed of GBS-ADMM and sGS-ADMM can be roughly $O(n^2)$ times slower than that of randomly permuted ADMM.

Preprint, 2020, arXiv

DEED: A General Quantization Scheme for Communication Efficiency in Bits

In distributed optimization, a popular technique to reduce communication is quantization. In this paper, we provide a general analysis framework for inexact gradient descent that is applicable to quantization schemes. We also propose a quantization scheme, Double Encoding and Error Diminishing (DEED). DEED can achieve small communication complexity in three settings: frequent-communication large-memory, frequent-communication small-memory, and infrequent-communication (e.g. federated learning). More specifically, in the frequent-communication large-memory setting, DEED can be easily combined with Nesterov's method, so that the total number of bits required is $\tilde{O}(\sqrt{κ} \log(1/ε))$, where $\tilde{O}$ hides numerical constants and $\log κ$ factors. In the frequent-communication small-memory setting, DEED combined with SGD only requires $\tilde{O}(κ \log(1/ε))$ bits in the interpolation regime. In the infrequent-communication setting, DEED combined with Federated Averaging requires a smaller total number of bits than Federated Averaging. All these algorithms converge at the same rate as their non-quantized versions, while using a smaller number of bits.

Preprint, 2020, arXiv

Distilling Object Detectors with Task Adaptive Regularization

Current state-of-the-art object detectors come at the expense of high computational costs and are hard to deploy to low-end devices. Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization. In this paper, we investigate each module of a typical detector in depth, and propose a general distillation framework that adaptively transfers knowledge from teacher to student according to the task-specific priors. The intuition is that simply distilling all information from teacher to student is not advisable; instead, we should only borrow priors from the teacher model where the student cannot perform well. Towards this goal, we propose a region proposal sharing mechanism to interflow region responses between the teacher and student models. Based on this, we adaptively transfer knowledge at three levels, i.e., feature backbone, classification head, and bounding box regression head, according to which model performs more reasonably. Furthermore, considering that minimizing distillation loss and detection loss simultaneously would introduce an optimization dilemma, we propose a distillation decay strategy to help improve model generalization by gradually reducing the distillation penalty. Experiments on widely used detection benchmarks demonstrate the effectiveness of our method. In particular, using Faster R-CNN with FPN as an instantiation, we achieve an accuracy of $39.0\%$ with ResNet-50 on the COCO dataset, which surpasses the baseline $36.3\%$ by $2.7$ percentage points, and is even better than the teacher model with $38.5\%$ mAP.

Preprint, 2020, arXiv

Off-road Autonomous Vehicles Traversability Analysis and Trajectory Planning Based on Deep Inverse Reinforcement Learning

Terrain traversability analysis is a fundamental issue in achieving the autonomy of a robot in off-road environments. Geometry-based and appearance-based methods have been studied for decades, while behavior-based methods exploiting learning from demonstration (LfD) are a new trend. Behavior-based methods learn cost functions that guide trajectory planning in compliance with experts' demonstrations, which can be more scalable to various scenes and driving behaviors. This research proposes a method of off-road traversability analysis and trajectory planning using Deep Maximum Entropy Inverse Reinforcement Learning. To incorporate the vehicle's kinematics while solving the problem of exponential increase of state-space complexity, two convolutional neural networks, i.e., RL ConvNet and Svf ConvNet, are developed to encode kinematics into convolution kernels and achieve efficient forward reinforcement learning. We conduct experiments in off-road environments. Scene maps are generated using 3D LiDAR data, and expert demonstrations are either the vehicle's real driving trajectories at the scene or synthesized ones to represent specific behaviors such as crossing negative obstacles. Different cost functions for traversability analysis are learned and tested in various scenes for their capability to guide the trajectory planning of different behaviors. We also demonstrate the performance and computation efficiency of the proposed method.

Preprint, 2019, arXiv

Revisiting Landscape Analysis in Deep Neural Networks: Eliminating Decreasing Paths to Infinity

Traditional landscape analysis of deep neural networks aims to show that no sub-optimal local minima exist in some appropriate sense. From this, one may be tempted to conclude that descent algorithms which escape saddle points will reach a good local minimum. However, basic optimization theory tells us that it is also possible for a descent algorithm to diverge to infinity if there are paths leading to infinity along which the loss function decreases. It is not clear whether, for non-linear neural networks, there exists a setting in which the absence of bad local minima and of decreasing paths to infinity can be achieved simultaneously. In this paper, we give the first positive answer to this question. More specifically, for a large class of over-parameterized deep neural networks with appropriate regularizers, the loss function has no bad local minima and no decreasing paths to infinity. The key mathematical trick is to show that the set of regularizers which may be undesirable can be viewed as the image of a Lipschitz continuous mapping from a lower-dimensional Euclidean space to a higher-dimensional Euclidean space, and thus has zero measure.

Preprint, 2016, arXiv

Guaranteed Matrix Completion via Non-convex Factorization

Matrix factorization is a popular approach for large-scale matrix completion. The optimization formulation based on matrix factorization can be solved very efficiently by standard algorithms in practice. However, due to the non-convexity caused by the factorization model, there is a limited theoretical understanding of this formulation. In this paper, we establish a theoretical guarantee for the factorization formulation to correctly recover the underlying low-rank matrix. In particular, we show that under similar conditions to those in previous works, many standard optimization algorithms converge to the global optima of a factorization formulation, and recover the true low-rank matrix. We study the local geometry of a properly regularized factorization formulation and prove that any stationary point in a certain local region is globally optimal. A major difference of our work from the existing results is that we do not need resampling in either the algorithm or its analysis. Compared to other works on nonconvex optimization, one extra difficulty lies in analyzing nonconvex constrained optimization when the constraint (or the corresponding regularizer) is not "consistent" with the gradient direction. One technical contribution is the perturbation analysis for non-symmetric matrix factorization.

Preprint, 2015, arXiv

Globally Optimal Joint Uplink Base Station Association and Beamforming

The joint base station (BS) association and beamforming problem has been studied extensively in recent years, yet the computational complexity for even the simplest SISO case has not been fully characterized. In this paper, we consider the problems for an uplink SISO/SIMO cellular network under the max-min fairness criterion. We first prove that the problems for both the SISO and SIMO scenarios are polynomial time solvable. Secondly, we present a fixed point based binary search (BS-FP) algorithm for both SISO and SIMO scenarios whereby a QoS (Quality of Service) constrained subproblem is solved at each step by a fixed point method. Thirdly, we propose a normalized fixed point (NFP) iterative algorithm to directly solve the original problem and prove its geometric convergence to global optima. Although it is not known whether the NFP algorithm is a polynomial time algorithm, empirically it converges to the global optima orders of magnitude faster than the polynomial time algorithms, making it suitable for applications in huge-scale networks.

Preprint, 2015, arXiv

Improved Iteration Complexity Bounds of Cyclic Block Coordinate Descent for Convex Problems

The iteration complexity of block-coordinate descent (BCD) type algorithms has been under extensive investigation. It was recently shown that for convex problems the classical cyclic BCGD (block coordinate gradient descent) achieves an $\mathcal{O}(1/r)$ complexity ($r$ is the number of passes of all blocks). However, such bounds depend at least linearly on $K$ (the number of variable blocks), and are at least $K$ times worse than those of the gradient descent (GD) and proximal gradient (PG) methods. In this paper, we aim to close this theoretical performance gap between cyclic BCD and GD/PG. First, we show that for a family of quadratic nonsmooth problems, the complexity bounds for cyclic Block Coordinate Proximal Gradient (BCPG), a popular variant of BCD, can match those of GD/PG in terms of dependency on $K$ (up to a $\log^2(K)$ factor). For the same family of problems, we also improve the bounds of the classical BCD (with exact block minimization) by an order of $K$. Second, we establish an improved complexity bound of Coordinate Gradient Descent (CGD) for general convex problems which can match that of GD in certain scenarios. Our bounds are sharper than the known bounds, which are always at least $K$ times worse than those of GD. Our analyses do not depend on the update order of block variables inside each cycle, thus our results also apply to BCD methods with random permutation (random sampling without replacement, another popular variant).
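As a concrete picture of the cyclic update order discussed above, here is a minimal sketch of cyclic coordinate descent with exact minimization on a smooth strongly convex quadratic $f(x) = \frac{1}{2}x^\top A x - b^\top x$, using blocks of size one. This is the simplest instance only (it coincides with Gauss-Seidel iteration); the nonsmooth problems and block structure analyzed in the paper are not reproduced, and the matrix, vector, and number of passes are illustrative assumptions.

```python
def cyclic_cd_quadratic(A, b, x0, passes=100):
    """Minimize 0.5 * x^T A x - b^T x by exact coordinate-wise minimization.

    A is a symmetric positive definite matrix given as a list of rows.
    One pass updates every coordinate once, in the fixed order 0, 1, ..., n-1.
    """
    n = len(b)
    x = list(x0)
    for _ in range(passes):
        for i in range(n):
            # Setting the i-th partial derivative to zero gives
            # A[i][i] * x_i + sum_{j != i} A[i][j] * x_j = b_i.
            off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - off_diag) / A[i][i]
    return x


# Hypothetical 2x2 example; the exact minimizer solves A x = b, i.e. x = (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = cyclic_cd_quadratic(A, b, [0.0, 0.0])
```

Replacing the fixed loop order with a fresh random permutation in each pass gives the random-permutation variant mentioned at the end of the abstract.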

Preprint, 2014, arXiv

Interference alignment using finite and dependent channel extensions: the single beam case

Vector space interference alignment (IA) is known to achieve high degrees of freedom (DoF) with infinite independent channel extensions, but its performance is largely unknown for a finite number of possibly dependent channel extensions. In this paper, we consider a $K$-user $M_t \times M_r$ MIMO interference channel (IC) with arbitrary number of channel extensions $T$ and arbitrary channel diversity order $L$ (i.e., each channel matrix is a generic linear combination of $L$ fixed basis matrices). We study the maximum DoF achievable via vector space IA in the single beam case (i.e. each user sends one data stream). We prove that the total number of users $K$ that can communicate interference-free using linear transceivers is upper bounded by $NL+N^2/4$, where $N = \min\{M_tT, M_rT \}$. An immediate consequence of this upper bound is that for a SISO IC the DoF in the single beam case is no more than $\min\left\{\sqrt{ 5K/4}, L + T/4\right\}$. When the channel extensions are independent, i.e. $ L$ achieves the maximum $M_r M_t T $, we show that this maximum DoF lies in $[M_r+M_t-1, M_r+M_t]$ regardless of $T$. Unlike the well-studied constant MIMO IC case, the main difficulty is how to deal with a hybrid system of equations (zero-forcing condition) and inequalities (full rank condition). Our approach combines algebraic tools that deal with equations with an induction analysis that indirectly considers the inequalities.

Preprint, 2014, arXiv

Joint Downlink Base Station Association and Power Control for Max-Min Fairness: Computation and Complexity

In a heterogeneous network (HetNet) with a large number of low power base stations (BSs), proper user-BS association and power control is crucial to achieving desirable system performance. In this paper, we systematically study the joint BS association and power allocation problem for a downlink cellular network under the max-min fairness criterion. First, we show that this problem is NP-hard. Second, we show that the upper bound of the optimal value can be easily computed, and propose a two-stage algorithm to find a high-quality suboptimal solution. Simulation results show that the proposed algorithm is near-optimal in the high-SNR regime. Third, we show that the problem under some additional mild assumptions can be solved to global optima in polynomial time by a semi-distributed algorithm. This result is based on a transformation of the original problem to an assignment problem with gains $\log(g_{ij})$, where $\{g_{ij}\}$ are the channel gains.
