Source author record

Jiyuan Zhang

Jiyuan Zhang appears in the imported research catalog. Authorship, coauthor, and topic links are available, although the profile has not yet been claimed.

Researcher · Unclaimed source record

Computer Vision; math-ph; math.MP; Artificial Intelligence; astro-ph.EP; astro-ph.GA; astro-ph.IM; Computation and Language; Distributed, Parallel, and Cluster Computing; Machine Learning; math.NA; math.PR; math.ST; Numerical Analysis; Statistics Theory

Catalog footprint

What is connected

10 works

15 topics

4 close collaborators

preprint · 2026 · arXiv

MoEBlaze: Breaking the Memory Wall for Efficient MoE Training on Modern GPUs

The pervasive "memory wall" bottleneck is significantly amplified in modern large-scale Mixture-of-Experts (MoE) architectures. MoE's inherent architectural sparsity leads to sparse arithmetic compute, but it also introduces substantial activation memory overheads, driven by large token routing buffers and the need to materialize and buffer intermediate tensors. This memory pressure limits the maximum batch size and sequence length that can fit on GPUs, and it also causes excessive data movement that hinders performance and efficient model scaling. We present MoEBlaze, a memory-efficient MoE training framework that addresses these issues through a co-designed system approach: (i) an end-to-end token dispatch and MoE training method with optimized data structures that eliminate intermediate buffers and activation materialization, and (ii) co-designed kernels with smart activation checkpointing that reduce the memory footprint while simultaneously improving performance. We demonstrate that MoEBlaze can achieve over 4x speedups and over 50% memory savings compared to existing MoE frameworks.
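One data-structure idea consistent with the abstract's goal of eliminating intermediate token routing buffers is to describe dispatch as a permutation plus per-expert offsets (a CSR-like layout), so each expert reads a contiguous slice of token indices instead of a separately materialized, padded buffer. The sketch below is our own illustration under that assumption, not MoEBlaze's implementation; the function name `dispatch_by_expert` and the use of a counting sort are hypothetical choices.

```python
from collections import Counter


def dispatch_by_expert(expert_ids: list[int], num_experts: int) -> tuple[list[int], list[int]]:
    """Group token indices by routed expert without per-expert padded buffers.

    Hypothetical sketch: returns `order`, a permutation of token indices sorted
    by expert, and `offsets`, where expert e owns order[offsets[e]:offsets[e + 1]].
    """
    counts = Counter(expert_ids)
    # Exclusive prefix sum over the per-expert histogram.
    offsets = [0] * (num_experts + 1)
    for e in range(num_experts):
        offsets[e + 1] = offsets[e] + counts.get(e, 0)
    # Stable counting sort: scatter each token into its expert's next free slot.
    cursor = offsets[:-1]  # slicing makes a copy, so offsets is left intact
    order = [0] * len(expert_ids)
    for token, e in enumerate(expert_ids):
        order[cursor[e]] = token
        cursor[e] += 1
    return order, offsets
```

For example, `dispatch_by_expert([2, 0, 1, 0, 2, 2], 3)` returns `order = [1, 3, 2, 0, 4, 5]` and `offsets = [0, 2, 3, 6]`, so expert 2 processes tokens `order[3:6] == [0, 4, 5]`. With top-k routing, each (token, slot) pair would simply be one entry of `expert_ids`.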

preprint · 2022 · arXiv

KMT-2021-BLG-0171Lb and KMT-2021-BLG-1689Lb: Two Microlensing Planets in the KMTNet High-cadence Fields with Followup Observations

Follow-up observations of high-magnification gravitational microlensing events can fully exploit their intrinsic sensitivity to extrasolar planets, especially those with small mass ratios. To make follow-up more uniform and efficient, we develop a system, HighMagFinder, that uses real-time data from the Korean Microlensing Telescope Network (KMTNet) to automatically alert on possible ongoing high-magnification events. We started a new phase of follow-up observations with the help of HighMagFinder in 2021. Here we report the discovery of two planets in the high-magnification microlensing events KMT-2021-BLG-0171 and KMT-2021-BLG-1689, both of which were identified by HighMagFinder. We find that both events suffer from the "central-resonant" caustic degeneracy. The planet-host mass ratio is $q \sim 4.7 \times 10^{-5}$ or $q \sim 2.2 \times 10^{-5}$ for KMT-2021-BLG-0171, and $q \sim 2.5 \times 10^{-4}$ or $q \sim 1.8 \times 10^{-4}$ for KMT-2021-BLG-1689. Together with two events reported by Ryu et al. (2022), four cases that suffer from this degeneracy have been discovered in the 2021 season alone, indicating that degenerate solutions may have been missed in some previous studies. We also propose a new factor for weighting the probability of each solution based on phase space. Under this consideration, the resonant interpretations of both events are disfavored. This factor can be included in future statistical studies to weight degenerate solutions.
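High-magnification events are those in which the source passes very close to the lens, i.e. with a small impact parameter $u_0$ in units of the Einstein radius. The standard point-source, point-lens (Paczyński) magnification $A(u) = (u^2 + 2)/(u\sqrt{u^2 + 4})$ behaves like $1/u$ for small $u$, so the peak magnification is roughly $1/u_0$. The threshold-based alert rule below is a hypothetical stand-in: HighMagFinder's actual criteria are not described in the abstract, and the function names and the default threshold of 50 are our own assumptions.

```python
import math


def point_lens_magnification(u: float) -> float:
    """Paczynski magnification for lens-source separation u (in Einstein radii)."""
    return (u * u + 2.0) / (u * math.sqrt(u * u + 4.0))


def separation(t: float, t0: float, u0: float, t_e: float) -> float:
    """Separation for rectilinear relative motion with peak time t0 and Einstein timescale t_E."""
    tau = (t - t0) / t_e
    return math.hypot(u0, tau)


def is_high_magnification(u0: float, threshold: float = 50.0) -> bool:
    """Hypothetical alert rule: flag events whose predicted peak magnification exceeds a threshold."""
    return point_lens_magnification(u0) >= threshold
```

With these definitions, $u_0 = 0.01$ gives a peak magnification of about 100 and is flagged, while $u_0 = 0.1$ gives about 10 and is not.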

preprint · 2022 · arXiv

Mixture of Experts for Biomedical Question Answering

Biomedical Question Answering (BQA) has attracted increasing attention in recent years due to its promising application prospects. It is a challenging task because biomedical questions are highly specialized and vary widely. Existing question answering methods answer all questions with a single homogeneous model, so different types of questions compete for the shared parameters, which confuses the model's decisions for each individual question type. In this paper, to alleviate this parameter competition problem, we propose a Mixture-of-Experts (MoE) based question answering method called MoEBQA that decouples the computation for different types of questions through sparse routing. Specifically, we split a pretrained Transformer model into bottom and top blocks. The bottom blocks are shared by all examples and aim to capture general features. The top blocks are extended to an MoE version consisting of a series of independent experts, and each example is assigned to a few experts according to its underlying question type. MoEBQA learns the routing strategy automatically in an end-to-end manner, so that each expert tends to handle the question types it specializes in. We evaluate MoEBQA on three BQA datasets constructed from real examinations. The results show that our MoE extension significantly boosts the performance of question answering models and achieves new state-of-the-art performance. In addition, we analyze our MoE modules in detail to reveal how MoEBQA works, and find that it automatically groups the questions into human-readable clusters.
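Sparse routing in which each example is sent to a few experts is commonly implemented as top-$k$ gating: score every expert, keep the $k$ best, renormalize their softmax weights, and sum only the selected experts' outputs. This is a generic sketch under our own assumptions (router logits are given as input, experts are toy functions, $k = 2$); MoEBQA's exact gating function is not specified in the abstract, and the function names are hypothetical.

```python
import math


def top_k_gate(logits: list[float], k: int) -> dict[int, float]:
    """Keep the k highest-scoring experts and renormalize their softmax weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exp = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exp.values())
    return {i: w / total for i, w in exp.items()}


def moe_layer(x: list[float], experts, router_logits: list[float], k: int = 2) -> list[float]:
    """Run only the selected experts on x and mix their outputs by gate weight."""
    gates = top_k_gate(router_logits, k)
    out = [0.0] * len(x)
    for i, g in gates.items():
        y = experts[i](x)  # unselected experts are never evaluated
        out = [o + g * yi for o, yi in zip(out, y)]
    return out
```

Because unselected experts are skipped entirely, compute per example scales with $k$ rather than with the total number of experts.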

preprint · 2022 · arXiv

Optimal convergence rate of the explicit Euler method for convection-diffusion equations II: high dimensional cases

This is the second part of a study on the optimal convergence rate of the explicit Euler discretization in time for convection-diffusion equations [Appl. Math. Lett. 131 (2022) 108048]; it focuses on high-dimensional linear and nonlinear cases under Dirichlet or Neumann boundary conditions. Several new corrected difference schemes are proposed, based on the explicit Euler discretization of the temporal derivative and central difference discretization of the spatial derivatives. An a priori estimate for the corrected scheme with constant convection coefficients is derived in detail via the maximum principle, and optimal fourth-order convergence is proved when the step ratio along each direction equals $1/6$. Compared with the classical difference schemes, the corrected difference schemes have an essentially improved CFL condition and better numerical accuracy. Numerical examples involving two- and three-dimensional linear and nonlinear problems under Dirichlet/Neumann boundary conditions, such as the Fisher equation, the Chafee-Infante equation, Burgers' equation and classification, to name a few, substantiate the good properties claimed for the corrected difference scheme.
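The corrected schemes themselves are not given in the abstract, but the role of the step ratio $1/6$ can be seen in the classical one-dimensional model problem $u_t = u_{xx}$. There, explicit Euler in time with central differences in space has leading truncation error $(\Delta t/2)\,u_{tt} - (h^2/12)\,u_{xxxx} = h^2 (r/2 - 1/12)\,u_{xxxx}$ with $r = \Delta t / h^2$, which vanishes at $r = 1/6$. The sketch below covers pure diffusion only (no convection term and no correction, so it is not the paper's scheme); it measures the error for $u(x,0) = \sin(\pi x)$ on $[0,1]$ with homogeneous Dirichlet conditions, and the function name and test setup are our own assumptions.

```python
import math


def ftcs_heat_error(m: int, r: float, t_end: float = 0.1) -> float:
    """Max-norm error of explicit Euler + central differences for u_t = u_xx.

    Domain [0, 1], u = 0 at both ends, u(x, 0) = sin(pi x), exact solution
    exp(-pi^2 t) sin(pi x). m is the number of grid intervals and r = dt / h^2
    (the explicit scheme is stable for r <= 1/2).
    """
    h = 1.0 / m
    dt = r * h * h
    steps = round(t_end / dt)  # t_end is chosen so this is (nearly) an integer
    u = [math.sin(math.pi * j * h) for j in range(m + 1)]
    for _ in range(steps):
        # Explicit Euler step on interior nodes; Dirichlet boundaries stay at 0.
        u = (
            [0.0]
            + [u[j] + r * (u[j + 1] - 2.0 * u[j] + u[j - 1]) for j in range(1, m)]
            + [0.0]
        )
    t = steps * dt  # compare at the time actually reached
    decay = math.exp(-math.pi ** 2 * t)
    return max(abs(u[j] - decay * math.sin(math.pi * j * h)) for j in range(m + 1))
```

Halving $h$ should shrink the error by roughly a factor of 16 at $r = 1/6$, but only by roughly 4 at a generic ratio such as $r = 0.4$.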

preprint · 2022 · arXiv

Spatio-Temporal Recurrent Networks for Event-Based Optical Flow Estimation

Event cameras offer a promising alternative for visual perception, especially in high-speed and high-dynamic-range scenes. Recently, many deep learning methods have shown great success in providing solutions to event-based problems such as optical flow estimation. However, existing deep learning methods do not adequately address the importance of temporal information in their architecture design and cannot effectively extract spatio-temporal features. Another line of research, which uses Spiking Neural Networks, suffers from training issues for deeper architectures. To address these points, we propose a novel input representation that captures the temporal distribution of events for signal enhancement. Moreover, we introduce a spatio-temporal recurrent encoding-decoding neural network architecture for event-based optical flow estimation, which uses Convolutional Gated Recurrent Units to extract feature maps from a series of event images. In addition, our architecture allows some traditional frame-based core modules, such as the correlation layer and the iterative residual refinement scheme, to be incorporated. The network is trained end-to-end with self-supervised learning on the Multi-Vehicle Stereo Event Camera dataset. We show that it outperforms all existing state-of-the-art methods by a large margin. The code is available at https://github.com/ruizhao26/STE-FlowNet.
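A common way to feed asynchronous events to a convolutional or recurrent network is to split a time window into $B$ temporal bins and accumulate signed event counts per pixel in each bin, which preserves coarse timing information as a short sequence of event images. The sketch below shows only that generic representation; the paper proposes its own representation of the events' temporal distribution, whose details are not given in the abstract. The `(x, y, t, polarity)` event layout and the function name are assumptions.

```python
def events_to_time_bins(
    events: list[tuple[int, int, float, int]],
    width: int,
    height: int,
    num_bins: int,
    t_start: float,
    t_end: float,
) -> list[list[list[int]]]:
    """Accumulate (x, y, t, polarity) events into num_bins signed count images.

    Returns a nested list indexed as [bin][y][x]; positive polarity adds +1,
    negative polarity adds -1.
    """
    frames = [[[0] * width for _ in range(height)] for _ in range(num_bins)]
    span = t_end - t_start
    for x, y, t, p in events:
        if not (t_start <= t < t_end):
            continue  # ignore events outside the half-open window
        b = int((t - t_start) / span * num_bins)
        b = min(b, num_bins - 1)  # guard against floating-point round-up at the edge
        frames[b][y][x] += 1 if p > 0 else -1
    return frames
```

Each bin can then be treated as one frame in the "series of event images" that a recurrent encoder consumes.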

preprint · 2022 · arXiv

Uncertainty Guided Depth Fusion for Spike Camera

Depth estimation is essential for many important real-world applications such as autonomous driving. However, its performance degrades severely in high-velocity scenarios, since traditional cameras can only capture blurred images. To deal with this problem, the spike camera is designed to capture pixel-wise luminance intensity at a high frame rate. However, depth estimation with a spike camera remains very challenging for traditional monocular or stereo depth estimation algorithms, which rely on photometric consistency. In this paper, we propose a novel Uncertainty-Guided Depth Fusion (UGDF) framework that fuses the predictions of monocular and stereo depth estimation networks for the spike camera. Our framework is motivated by the observation that stereo spike depth estimation achieves better results at close range, while monocular spike depth estimation obtains better results at long range. We therefore introduce a dual-task depth estimation architecture with a joint training strategy and estimate the distributed uncertainty to fuse the monocular and stereo results. To demonstrate the advantage of spike depth estimation over traditional camera depth estimation, we contribute a spike-depth dataset named CitySpike20K, which contains 20K paired samples for spike depth estimation. UGDF achieves state-of-the-art results on CitySpike20K, surpassing all monocular and stereo spike depth estimation baselines. We conduct extensive experiments to evaluate the effectiveness and generalization of our method on CitySpike20K. To the best of our knowledge, our framework is the first dual-task fusion framework for spike camera depth estimation. Code and dataset will be released.
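One standard way to fuse two estimates using predicted uncertainty is inverse-variance weighting: per pixel, $d = (d_m/\sigma_m^2 + d_s/\sigma_s^2) / (1/\sigma_m^2 + 1/\sigma_s^2)$, so whichever network is more confident at that pixel dominates. This is an illustrative assumption, not necessarily UGDF's fusion rule; depths and variances are passed as flat per-pixel lists, and the function name is hypothetical.

```python
def fuse_depth(
    d_mono: list[float],
    var_mono: list[float],
    d_stereo: list[float],
    var_stereo: list[float],
) -> list[float]:
    """Per-pixel inverse-variance weighting of monocular and stereo depth."""
    fused = []
    for dm, vm, ds, vs in zip(d_mono, var_mono, d_stereo, var_stereo):
        wm, ws = 1.0 / vm, 1.0 / vs  # lower predicted variance -> larger weight
        fused.append((wm * dm + ws * ds) / (wm + ws))
    return fused
```

Under the abstract's observation, one would expect the stereo branch to report lower variance at close range and the monocular branch at long range, so the weights shift between them with distance.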

preprint · 2020 · arXiv

ViP: Virtual Pooling for Accelerating CNN-based Image Classification and Object Detection

In recent years, Convolutional Neural Networks (CNNs) have shown superior capability in visual learning tasks. While CNNs provide unprecedented accuracy, they are also known to be computationally intensive and energy-demanding on modern computer systems. In this paper, we propose Virtual Pooling (ViP), a model-level approach that improves the speed and reduces the energy consumption of CNN-based image classification and object detection, with a provable error bound. We show the efficacy of ViP through experiments on four CNN models, three representative datasets, both desktop and mobile platforms, and two visual learning tasks, namely image classification and object detection. For example, ViP delivers a 2.1x speedup with less than 1.5% accuracy degradation in ImageNet classification on VGG-16, and a 1.8x speedup with 0.025 mAP degradation in PASCAL VOC object detection with Faster-RCNN. ViP also reduces mobile GPU and CPU energy consumption by up to 55% and 70%, respectively. As a method complementary to existing acceleration approaches, ViP achieves a 1.9x speedup when applied to ThiNet, leading to a combined speedup of 5.23x on VGG-16. Furthermore, ViP provides a knob that lets machine learning practitioners generate a set of CNN models with varying trade-offs between system speed/energy consumption and accuracy, to better accommodate the requirements of their tasks. Code is available at https://github.com/cmu-enyac/VirtualPooling.
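The abstract does not describe how ViP saves computation. As an assumption for illustration, the sketch below shows one way a "virtual" layer could trade accuracy for speed: compute only every other output of a convolution and fill the skipped positions by linear interpolation, which roughly halves the multiply-adds. It is a one-dimensional toy, not the authors' implementation (their repository has that), and both function names are hypothetical.

```python
def conv1d_valid(x: list[float], w: list[float]) -> list[float]:
    """Reference 1-D correlation with 'valid' padding (every output computed)."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]


def conv1d_virtual(x: list[float], w: list[float]) -> list[float]:
    """Compute even-indexed outputs exactly and interpolate odd-indexed ones."""
    k = len(w)
    n_out = len(x) - k + 1
    out = [0.0] * n_out
    for i in range(0, n_out, 2):  # computed positions
        out[i] = sum(x[i + j] * w[j] for j in range(k))
    for i in range(1, n_out, 2):  # interpolated positions
        if i + 1 < n_out:
            out[i] = 0.5 * (out[i - 1] + out[i + 1])
        else:
            out[i] = out[i - 1]  # last position has one neighbour: copy it
    return out
```

On a linear ramp the interpolation is exact, while an isolated spike exposes the approximation error; that error is the accuracy side of the speed/accuracy trade-off the abstract describes.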

preprint · 2019 · arXiv

Co-rank 1 projections and the randomised Horn problem

Let $\hat{\boldsymbol x}$ be a normalised standard complex Gaussian vector, and project a Hermitian matrix $A$ onto the hyperplane orthogonal to $\hat{\boldsymbol x}$. In a recent paper, Faraut [Tunisian J. Math. 1 (2019), 585-606] observed that the corresponding eigenvalue PDF has an almost identical structure to the eigenvalue PDF for the rank 1 perturbation $A + b \hat{\boldsymbol x} \hat{\boldsymbol x}^\dagger$, and asked for an explanation. We provide one by way of a common derivation involving the secular equations and associated Jacobians. This also applies in related settings, for example when $\hat{\boldsymbol x}$ is a real Gaussian vector and $A$ is Hermitian, and in a multiplicative setting $A U B U^\dagger$, where $A$ and $B$ are fixed unitary matrices with $B$ a multiplicative rank 1 deviation from unity, and $U$ is a Haar-distributed unitary matrix. Specifically, in each case there is a dual eigenvalue problem giving rise to a PDF of almost identical structure.
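The two secular equations behind such a common derivation can be written down explicitly. Take, without loss of generality, $A = \mathrm{diag}(a_1, \dots, a_N)$ (the distribution of $\hat{\boldsymbol x}$ is unitarily invariant) and write $w_j = |\hat x_j|^2$, so that $\sum_j w_j = 1$. The matrix determinant lemma gives the first identity; the second is the characteristic polynomial of $A$ compressed to the hyperplane orthogonal to $\hat{\boldsymbol x}$. This is a standard sketch for orientation, not the paper's derivation of the Jacobians.

```latex
\det\big(\lambda - A - b\,\hat{\boldsymbol x}\hat{\boldsymbol x}^\dagger\big)
  = \prod_{k=1}^{N} (\lambda - a_k)\Big(1 - b\sum_{j=1}^{N} \frac{w_j}{\lambda - a_j}\Big),
\qquad
\det\nolimits_{\hat{\boldsymbol x}^\perp}\big(\mu - A\big)
  = \prod_{k=1}^{N} (\mu - a_k)\sum_{j=1}^{N} \frac{w_j}{\mu - a_j},

\text{so}\qquad
\sum_{j=1}^{N} \frac{w_j}{\lambda - a_j} = \frac{1}{b}
\quad\text{(rank 1 perturbation)},
\qquad
\sum_{j=1}^{N} \frac{w_j}{\mu - a_j} = 0
\quad\text{(co-rank 1 projection)}.
```

Letting $b \to \infty$ turns the first secular equation into the second, so $N-1$ of the perturbed eigenvalues tend to the $N-1$ eigenvalues of the projection, which is one way to see why the two eigenvalue PDFs look alike.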

preprint · 2018 · arXiv

Lyapunov exponents for some isotropic random matrix ensembles

A random matrix with rows distributed as a function of their length is said to be isotropic. When these distributions are Gaussian, beta type I, or beta type II, previous work has obtained the explicit form of the distribution of the determinant from the viewpoint of integral geometry. We use these results to evaluate the sum of the Lyapunov spectrum of the corresponding random matrix product, and we further give explicit expressions for the largest Lyapunov exponent. Generalisations to the case of complex or quaternion entries are also given. For standard Gaussian matrices $X$, the full Lyapunov spectrum for products of random matrices $I_N + {1 \over c} X$ is computed in terms of a generalised hypergeometric function in general, and in terms of a single integral involving a modified Bessel function for the largest Lyapunov exponent.
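For the simplest isotropic case, products of $N \times N$ standard real Gaussian matrices, the largest exponent has the classical closed form $\mu_1 = \tfrac12(\log 2 + \psi(N/2))$, where $\psi$ is the digamma function. This follows because $|Xv|^2 \sim \chi^2_N$ for any fixed unit vector $v$. A Monte Carlo estimate based on the definition $\mu_1 = \lim_{n\to\infty} n^{-1}\log\|X_n\cdots X_1 v\|$ can be checked against it. This sketch covers only $N = 2$ standard Gaussian matrices, not the $I_N + X/c$ products treated in the paper; the function name, step count and seed are arbitrary choices.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329
# Classical result for N x N standard real Gaussian matrices:
# mu_1 = (log 2 + psi(N/2)) / 2; for N = 2, psi(1) = -EULER_GAMMA.
EXACT_N2 = 0.5 * (math.log(2.0) - EULER_GAMMA)


def largest_lyapunov_gaussian_2x2(steps: int, seed: int = 0) -> float:
    """Monte Carlo estimate of mu_1 = lim (1/n) log ||X_n ... X_1 v||."""
    rng = random.Random(seed)
    v0, v1 = 1.0, 0.0
    log_growth = 0.0
    for _ in range(steps):
        a, b, c, d = (rng.gauss(0.0, 1.0) for _ in range(4))
        w0, w1 = a * v0 + b * v1, c * v0 + d * v1
        norm = math.hypot(w0, w1)
        log_growth += math.log(norm)
        v0, v1 = w0 / norm, w1 / norm  # renormalize to avoid overflow/underflow
    return log_growth / steps
```

Here `EXACT_N2` is about 0.058; with $10^5$ steps the Monte Carlo standard error is roughly 0.002, so the estimate should agree to about two decimal places.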

preprint · 2018 · arXiv

Parametrising correlation matrices

Correlation matrices are the sub-class of positive definite real matrices with all diagonal entries equal to unity. Earlier work has exhibited a parametrisation of the corresponding Cholesky factorisation in terms of partial correlations, and also in terms of hyperspherical co-ordinates. We show how the two are related, starting from the definition of the partial correlations in terms of the Schur complement. We extend this to the generalisation of correlation matrices to complex and quaternion entries. As in the real case, we show how the hyperspherical parametrisation leads naturally to a distribution on the space of correlation matrices $\{R\}$ with probability density function proportional to $(\det R)^a$. For certain $a$, a construction of random correlation matrices realising this distribution is given in terms of rectangular standard Gaussian matrices.
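A common form of the hyperspherical parametrisation writes $R = LL^T$ with $L$ lower triangular and every row of $L$ a unit vector expressed in spherical co-ordinates, so the diagonal of $R$ is automatically 1. With all angles in $(0, \pi)$ the diagonal of $L$ is positive, so $R$ is positive definite and $\det R = \prod_i L_{ii}^2$. The angle ordering below is an assumed standard convention that may differ from the paper's, and the function names are hypothetical. With this ordering, the first angle of each row directly gives a correlation: `R[i][0] == cos(theta[i][0])`.

```python
import math


def cholesky_from_angles(theta: list[list[float]]) -> list[list[float]]:
    """Lower-triangular L with unit-length rows from hyperspherical angles.

    theta[i] holds i angles in (0, pi) for row i (0-based); row 0 has none.
    Row i is (cos t1, sin t1 cos t2, ..., sin t1 ... sin t_{i-1} cos t_i,
    sin t1 ... sin t_i), whose squared entries sum to 1.
    """
    n = len(theta)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        s = 1.0  # running product of sines
        for j, t in enumerate(theta[i]):
            L[i][j] = s * math.cos(t)
            s *= math.sin(t)
        L[i][i] = s  # positive when every angle lies in (0, pi)
    return L


def correlation_from_angles(theta: list[list[float]]) -> list[list[float]]:
    """R = L L^T: symmetric with unit diagonal, positive definite."""
    L = cholesky_from_angles(theta)
    n = len(L)
    return [[sum(L[i][k] * L[j][k] for k in range(n)) for j in range(n)] for i in range(n)]
```

Sampling the angles from suitable densities then induces a distribution on $\{R\}$, which is the route the abstract describes toward densities proportional to $(\det R)^a$.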

10 published items

MoEBlaze: Breaking the Memory Wall for Efficient MoE Training on Modern GPUs

KMT-2021-BLG-0171Lb and KMT-2021-BLG-1689Lb: Two Microlensing Planets in the KMTNet High-cadence Fields with Followup Observations

Mixture of Experts for Biomedical Question Answering

Optimal convergence rate of the explicit Euler method for convection-diffusion equations II: high dimensional cases

Spatio-Temporal Recurrent Networks for Event-Based Optical Flow Estimation

Uncertainty Guided Depth Fusion for Spike Camera

ViP: Virtual Pooling for Accelerating CNN-based Image Classification and Object Detection

Co-rank 1 projections and the randomised Horn problem

Lyapunov exponents for some isotropic random matrix ensembles

Parametrising correlation matrices