Researcher profile

Rong Chen

Rong Chen contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
17 works
0 followers
14 topics
4 close collaborators


Published work

17 published item(s)

preprint, 2026, arXiv

Towards Fully-fledged GPU Multitasking via Proactive Memory Scheduling

The limited HBM capacity has become the primary bottleneck for hosting an increasing number of larger-scale GPU tasks. While demand paging extends capacity via host DRAM, it incurs up to 78x slowdown due to the massive working sets and poor locality of GPU workloads. We observe, however, that GPU memory access patterns are inherently predictable via kernel launch arguments and their asynchronous execution nature. Leveraging this, we propose MSched, an OS-level scheduler that extends GPU context switching to include proactive working set preparation, thereby coalescing fragmented, eventual, and expensive page faults into a single efficient migration. MSched employs a template-based approach to predict working sets with near-perfect accuracy and proposes a co-design between task scheduler and memory manager to enforce a globally optimal page placement policy. Evaluation demonstrates that MSched outperforms demand paging by up to 11.05x for scientific and deep learning workloads, and 57.88x for LLM under memory oversubscription.

preprint, 2024, arXiv

Cross-target Stance Detection by Exploiting Target Analytical Perspectives

Cross-target stance detection (CTSD) is an important task, which infers the attitude of the destination target by utilizing annotated data derived from the source target. One important approach in CTSD is to extract domain-invariant features to bridge the knowledge gap between multiple targets. However, the analysis of informal and short text structure, and implicit expressions, complicate the extraction of domain-invariant knowledge. In this paper, we propose a Multi-Perspective Prompt-Tuning (MPPT) model for CTSD that uses the analysis perspective as a bridge to transfer knowledge. First, we develop a two-stage instruct-based chain-of-thought method (TsCoT) to elicit target analysis perspectives and provide natural language explanations (NLEs) from multiple viewpoints by formulating instructions based on a large language model (LLM). Second, we propose a multi-perspective prompt-tuning framework (MultiPLN) to fuse the NLEs into the stance predictor. Extensive experimental results demonstrate the superiority of MPPT against the state-of-the-art baseline methods.

preprint, 2023, arXiv

Graphs with girth $2\ell+1$ and without longer odd holes that contain an odd $K_4$-subdivision

We say that a graph $G$ has an {\em odd $K_4$-subdivision} if some subgraph of $G$ is isomorphic to a $K_4$-subdivision and whose faces are all odd holes of $G$. For a number $\ell\geq 2$, let $\mathcal{G}_{\ell}$ denote the family of graphs which have girth $2\ell+1$ and have no odd hole with length greater than $2\ell+1$. Wu, Xu and Xu conjectured that every graph in $\bigcup_{\ell\geq2}\mathcal{G}_{\ell}$ is 3-colorable. Recently, Chudnovsky et al. and Wu et al., respectively, proved that every graph in $\mathcal{G}_2$ and $\mathcal{G}_3$ is 3-colorable. In this paper, we prove that no $4$-vertex-critical graph in $\bigcup_{\ell\geq5}\mathcal{G}_{\ell}$ has an odd $K_4$-subdivision. Using this result, Chen proved that all graphs in $\bigcup_{\ell\geq5}\mathcal{G}_{\ell}$ are 3-colorable.

preprint, 2022, arXiv

Graphs with girth $2\ell+1$ and without longer odd holes are $3$-colorable

For a number $\ell\geq 2$, let $\mathcal{G}_{\ell}$ denote the family of graphs which have girth $2\ell+1$ and have no odd hole with length greater than $2\ell+1$. Plummer and Zha conjectured that every 3-connected and internally 4-connected graph in $\mathcal{G}_2$ is 3-colorable. Wu, Xu, and Xu conjectured that every graph in $\bigcup_{\ell\geq2}\mathcal{G}_{\ell}$ is 3-colorable. Chudnovsky et al. and Wu et al., respectively, proved that every graph in $\mathcal{G}_2$ and $\mathcal{G}_3$ is 3-colorable. In this paper, we prove that every graph in $\bigcup_{\ell\geq5}\mathcal{G}_{\ell}$ is 3-colorable.

preprint, 2022, arXiv

Improved iterative methods for solving risk parity portfolio

Risk parity, also known as equal risk contribution, has recently gained increasing attention as a portfolio allocation method. However, solving for the portfolio weights requires numerical methods because no analytic solution is available. This study improves two existing iterative methods: the cyclical coordinate descent (CCD) and Newton methods. We enhance the CCD method by simplifying the formulation using a correlation matrix and imposing an additional rescaling step. We also suggest an improved initial guess inspired by the CCD method for the Newton method. Numerical experiments show that the improved CCD method performs the best and is approximately three times faster than the original CCD method, saving more than 40% of the iterations.
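As a rough illustration of the kind of iteration this abstract refers to, the sketch below implements a basic cyclical coordinate descent for equal risk contribution weights. It is not the improved method of the paper: it omits the correlation-matrix reformulation and the rescaling step, and it assumes the common log-barrier formulation (minimize 0.5 x'Σx minus b times the sum of ln x_i, then normalize). The function names `risk_parity_ccd` and `risk_contributions` are hypothetical.

```python
import math


def risk_parity_ccd(cov, tol=1e-12, max_iter=10_000):
    """Basic CCD for equal-risk-contribution weights (illustrative sketch only).

    Assumes `cov` is a symmetric positive definite covariance matrix given as
    a list of lists. Each coordinate update solves the scalar quadratic
    cov[i][i] * x_i**2 + a * x_i - b = 0 for its positive root, where a is the
    off-diagonal part of row i of cov times x.
    """
    n = len(cov)
    b = 1.0 / n  # equal risk budget for every asset
    x = [1.0 / math.sqrt(cov[i][i]) for i in range(n)]  # inverse-volatility start
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(n):
            a = sum(cov[i][j] * x[j] for j in range(n) if j != i)
            new = (-a + math.sqrt(a * a + 4.0 * cov[i][i] * b)) / (2.0 * cov[i][i])
            max_change = max(max_change, abs(new - x[i]))
            x[i] = new
        if max_change < tol:
            break
    total = sum(x)
    return [xi / total for xi in x]  # normalize so the weights sum to one


def risk_contributions(cov, w):
    """Return w_i * (cov @ w)_i for each asset i."""
    n = len(w)
    marginal = [sum(cov[i][j] * w[j] for j in range(n)) for i in range(n)]
    return [w[i] * marginal[i] for i in range(n)]
```

For a diagonal covariance matrix the resulting weights are proportional to the inverse volatilities, which gives a quick sanity check; for a general matrix, `risk_contributions` should return approximately equal entries.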

preprint, 2022, arXiv

KRCORE: a microsecond-scale RDMA control plane for elastic computing

We present KRCORE, an RDMA library with a microsecond-scale control plane on commodity RDMA hardware for elastic computing. KRCORE can establish a full-fledged RDMA connection within 10μs (hundreds or thousands of times faster than verbs), while only maintaining a (small) fixed-sized connection metadata at each node, regardless of the cluster scale. The key ideas include virtualizing pre-initialized kernel-space RDMA connections instead of creating one from scratch, and retrofitting advanced RDMA dynamic connected transport with static transport for both low connection overhead and high networking speed. Under load spikes, KRCORE can shorten the worker bootstrap time of an existing disaggregated key-value store (namely RACE Hashing) by 83%. In serverless computing (namely Fn), KRCORE can also reduce the latency for transferring data through RDMA by 99%.

preprint, 2022, arXiv

Path Integral Quantum Monte Carlo Method for Light Nuclei

I describe the first continuous space nuclear path integral quantum Monte Carlo method, and calculate the ground state properties of light nuclei including Deuteron, Triton, Helium-3 and Helium-4, using both local chiral interaction up to next-to-next-to-leading-order and the Argonne $v_6'$ interaction. Compared with diffusion based quantum Monte Carlo methods such as Green's function Monte Carlo and auxiliary field diffusion Monte Carlo, path integral quantum Monte Carlo has the advantage that it can directly calculate the expectation value of operators without tradeoff, whether they commute with the Hamiltonian or not. For operators that commute with the Hamiltonian, e.g., the Hamiltonian itself, the path integral quantum Monte Carlo light-nuclei results agree with Green's function Monte Carlo and auxiliary field diffusion Monte Carlo results. For other operator expectations which are important to understand nuclear measurements but do not commute with the Hamiltonian and therefore cannot be accurately calculated by diffusion based quantum Monte Carlo methods without tradeoff, the path integral quantum Monte Carlo method gives reliable results. I show root-mean-square radii, one-particle number density distributions, and Euclidean response functions for single-nucleon couplings. I also systematically describe all the sampling algorithms used in this work, the strategies to make the computation efficient, the error estimations, and the details of the implementation of the code to perform calculations. This work can serve as a benchmark test for future calculations of larger nuclei or finite temperature nuclear matter using path integral quantum Monte Carlo.

preprint, 2022, arXiv

Rank Determination in Tensor Factor Model

The factor model is an appealing and effective analytic tool for high-dimensional time series, with a wide range of applications in economics, finance and statistics. This paper develops two criteria for the determination of the number of factors for tensor factor models where the signal part of an observed tensor time series assumes a Tucker decomposition with the core tensor as the factor tensor. The task is to determine the dimensions of the core tensor. One of the proposed criteria is similar to information based criteria of model selection, and the other is an extension of the approaches based on the ratios of consecutive eigenvalues often used in factor analysis for panel time series. Theoretical results, including sufficient conditions and convergence rates, are established. The results include the vector factor models as special cases, with additional convergence rates. Simulation studies provide promising finite sample performance for the two criteria.

preprint, 2022, arXiv

The $9$-connected Excluded Minors for the Class of Quasi-graphic Matroids

The class of quasi-graphic matroids, recently introduced by Geelen, Gerards, and Whittle, is minor closed and contains both the class of lifted-graphic matroids and the class of frame matroids, each of which generalises the class of graphic matroids. In this paper, we prove that the matroids $U_{3,7}$ and $U_{4,7}$ are the only $9$-connected excluded minors for the class of quasi-graphic matroids.

preprint, 2021, arXiv

Congruences modulo powers of 5 for the rank parity function

It is well known that Ramanujan conjectured congruences modulo powers of 5, 7 and 11 for the partition function. These were subsequently proved by Watson (1938) and Atkin (1967). In 2009 Choi, Kang, and Lovejoy proved congruences modulo powers of 5 for the crank parity function. The generating function for the rank parity function is f(q), which is the first example of a mock theta function that Ramanujan mentioned in his last letter to Hardy. We prove congruences modulo powers of 5 for the rank parity function.
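The classical partition-function congruences mentioned at the start of this abstract can be checked numerically. The sketch below covers only that background fact for powers of 5, namely that p(5n+4), p(25n+24) and p(125n+99) are divisible by 5, 25 and 125 respectively; it does not touch the rank parity function studied in the paper. The helper names are hypothetical.

```python
def partition_numbers(limit):
    """Return [p(0), p(1), ..., p(limit)], counting partitions one part size at a time."""
    p = [1] + [0] * limit
    for part in range(1, limit + 1):
        for n in range(part, limit + 1):
            p[n] += p[n - part]
    return p


def check_congruence(p, modulus, offset):
    """True if p(modulus * n + offset) is divisible by modulus for every index in range."""
    return all(p[n] % modulus == 0 for n in range(offset, len(p), modulus))


if __name__ == "__main__":
    p = partition_numbers(500)
    # The offsets 4, 24, 99 are the inverses of 24 modulo 5, 25, 125.
    for modulus, offset in [(5, 4), (25, 24), (125, 99)]:
        print(modulus, offset, check_congruence(p, modulus, offset))
```

A finite check like this is only a sanity test over the computed range, not a proof.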

preprint, 2020, arXiv

Factor Models for High-Dimensional Tensor Time Series

Large tensor (multi-dimensional array) data are now routinely collected in a wide range of applications, due to modern data collection capabilities. Often such observations are taken over time, forming tensor time series. In this paper we present a factor model approach for analyzing high-dimensional dynamic tensor time series and multi-category dynamic transport networks. Two estimation procedures along with their theoretical properties and simulation results are presented. Two applications are used to illustrate the model and its interpretations.

preprint, 2020, arXiv

KoPA: Automated Kronecker Product Approximation

We consider the problem of matrix approximation and denoising induced by the Kronecker product decomposition. Specifically, we propose to approximate a given matrix by the sum of a few Kronecker products of matrices, which we refer to as the Kronecker product approximation (KoPA). Because the Kronecker product is an extension of the outer product from vectors to matrices, KoPA extends the low rank matrix approximation, and includes it as a special case. Compared with the latter, KoPA also offers greater flexibility, since it allows the user to choose the configuration, that is, the dimensions of the two smaller matrices forming the Kronecker product. On the other hand, the configuration to be used is usually unknown, and needs to be determined from the data in order to achieve the optimal balance between accuracy and parsimony. We propose to use extended information criteria to select the configuration. Under the paradigm of high dimensional analysis, we show that the proposed procedure is able to select the true configuration with probability tending to one, under suitable conditions on the signal-to-noise ratio. We demonstrate the superiority of KoPA over the low rank approximations through numerical studies, and several benchmark image examples.
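To make the idea of a Kronecker product fit concrete, the sketch below approximates a matrix by a single Kronecker product B ⊗ C for one fixed configuration, using plain alternating least squares. This is only an assumption-laden illustration: it handles one term rather than a sum of several, it does not implement the information-criterion-based configuration selection that the abstract proposes, and the names `kron` and `kron_rank1_fit` are hypothetical.

```python
def kron(B, C):
    """Kronecker product of two matrices given as lists of lists."""
    m2, n2 = len(C), len(C[0])
    return [[B[i // m2][j // n2] * C[i % m2][j % n2]
             for j in range(len(B[0]) * n2)]
            for i in range(len(B) * m2)]


def kron_rank1_fit(A, m1, n1, m2, n2, iters=50):
    """Fit A ~ B (x) C with B of size m1 x n1 and C of size m2 x n2 (sketch).

    Alternates two least-squares updates in the Frobenius norm. Assumes A is
    a nonzero (m1*m2) x (n1*n2) matrix.
    """
    def block(i, j):
        # The (i, j) block of A, of size m2 x n2.
        return [[A[i * m2 + k][j * n2 + l] for l in range(n2)] for k in range(m2)]

    def sq_norm(M):
        return sum(v * v for row in M for v in row)

    # Initialize C with the block of largest Frobenius norm (nonzero by assumption).
    blocks = [[block(i, j) for j in range(n1)] for i in range(m1)]
    C = max((blocks[i][j] for i in range(m1) for j in range(n1)), key=sq_norm)
    B = [[0.0] * n1 for _ in range(m1)]
    for _ in range(iters):
        c2 = sq_norm(C)
        # Update B with C fixed: B[i][j] = <block(i, j), C> / ||C||^2.
        B = [[sum(blocks[i][j][k][l] * C[k][l]
                  for k in range(m2) for l in range(n2)) / c2
              for j in range(n1)] for i in range(m1)]
        b2 = sq_norm(B)
        # Update C with B fixed: C = sum_ij B[i][j] * block(i, j) / ||B||^2.
        C = [[sum(B[i][j] * blocks[i][j][k][l]
                  for i in range(m1) for j in range(n1)) / b2
              for l in range(n2)] for k in range(m2)]
    return B, C
```

When A is exactly a Kronecker product in the chosen configuration, `kron(B, C)` reproduces A up to rounding, although B and C themselves are only determined up to a reciprocal scaling.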

preprint, 2020, arXiv

Modeling Multivariate Spatial-Temporal Data with Latent Low-Dimensional Dynamics

High-dimensional multivariate spatial-temporal data arise frequently in a wide range of applications; however, there are relatively few statistical methods that can simultaneously deal with spatial, temporal and variable-wise dependencies in large data sets. In this paper, we propose a new approach to utilize the correlations in variable, space and time to achieve dimension reduction and to facilitate spatial/temporal predictions in the high-dimensional settings. The multivariate spatial-temporal process is represented as a linear transformation of a lower-dimensional latent factor process. The spatial dependence structure of the factor process is further represented non-parametrically in terms of latent empirical orthogonal functions. The low-dimensional structure is completely unknown in our setting and is learned entirely from data collected irregularly over space but regularly over time. We propose innovative estimation and prediction methods based on the latent low-rank structures. Asymptotic properties of the estimators and predictors are established. Extensive experiments on synthetic and real data sets show that, while the dimensions are reduced significantly, the spatial, temporal and variable-wise covariance structures are largely preserved. The efficacy of our method is further confirmed by the prediction performances on both synthetic and real data sets.

preprint, 2020, arXiv

Multi-Task Learning Enhanced Single Image De-Raining

Rain removal in images is an important task in the computer vision field and is attracting the attention of more and more people. In this paper, we address a non-trivial issue of removing the visual effect of rain streaks from a single image. Differing from existing work, our method combines various semantic constraint tasks in a proposed multi-task regression model for rain removal. These tasks reinforce the model's capabilities in terms of content, edge awareness, and local texture similarity, respectively. To further improve the performance of multi-task learning, we also present two simple but powerful dynamic weighting algorithms. The proposed multi-task enhanced network (MENET) is a powerful convolutional neural network based on U-Net for rain removal research, with a specific focus on utilizing multiple task constraints and exploiting the synergy among them to facilitate the model's rain removal capacity. It is noteworthy that the adaptive weighting scheme has further resulted in improved network capability. We conduct several experiments on synthetic and real rain images, and achieve superior rain removal performance over several selected state-of-the-art (SOTA) approaches. The overall effect of our method is impressive, even in the decomposition of heavy rain and rain streak accumulation. The source code and some results can be found at: https://github.com/SumiHui/MENET.

preprint, 2020, arXiv

On a class of elliptic functions associated with the even Dirichlet characters

We construct a class of companion elliptic functions associated with the even Dirichlet characters. Using the well-known properties of the classical Weierstrass elliptic function $\wp(z|τ)$ as the blueprint, we will derive their representations in terms of $q$-series and partial fractions, explore the significance of the coefficients of their power series expansions and establish the modular properties under the actions of the arithmetic groups $Γ_0(N)$ and $Γ_1(N)$.

preprint, 2020, arXiv

On Decomposition of $θ_2^{2n}(τ)$ as the Sum of Lambert Series and Cusp forms

Based on the values of the Weierstrass elliptic function $\wp(z|τ)$ at $z=πτ/2$, $(π+πτ)/{2}, (π+πτ)/{4},(π+2πτ)/{4}$ and the theory of modular forms on the arithmetic group $Γ_0(2)$, we decompose $θ_2^{2n}(τ)$ as the sum of Eisenstein series and a cusp form. Using the recurrence relation of $\wp^{(2n)}(z|τ)$, we provide an algorithm to determine the exact form of these cusp forms.