Source author record

Yi Jiang

Yi Jiang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Catalog footprint

What is connected

42 works

25 topics

4 close collaborators


preprint · 2026 · arXiv

ArcAligner: Adaptive Recursive Aligner for Compressed Context Embeddings in RAG

Retrieval-Augmented Generation (RAG) helps LLMs stay accurate, but feeding long documents into a prompt makes the model slow and expensive. This has motivated context compression, ranging from token pruning and summarization to embedding-based compression. Researchers have tried "compressing" these documents into shorter summaries or into embeddings, but there is a catch: the more the data is compressed, the more the LLM struggles to understand it. To address this challenge, we propose ArcAligner (Adaptive Recursive Context Aligner), a lightweight module integrated into the language model layers that helps the model make better use of highly compressed context representations for downstream generation. It uses an adaptive "gating" mechanism that allocates extra computation only when the information is complex, keeping the system fast. Across knowledge-intensive QA benchmarks, ArcAligner consistently outperforms compression baselines at comparable compression rates, especially in multi-hop and long-tail settings. The source code is publicly available.

preprint · 2026 · arXiv

NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation

We present NextFlow, a unified decoder-only autoregressive transformer trained on 6 trillion interleaved text-image discrete tokens. By leveraging a unified vision representation within a unified autoregressive architecture, NextFlow natively activates multimodal understanding and generation capabilities, enabling image editing, interleaved content generation and video generation. Motivated by the distinct nature of the modalities (text is strictly sequential, whereas images are inherently hierarchical), we retain next-token prediction for text but adopt next-scale prediction for visual generation. This departs from traditional raster-scan methods and enables the generation of 1024x1024 images in just 5 seconds, orders of magnitude faster than comparable AR models. We address the instabilities of multi-scale generation through a robust training recipe. Furthermore, we introduce a prefix-tuning strategy for reinforcement learning. Experiments demonstrate that NextFlow achieves state-of-the-art performance among unified models and rivals specialized diffusion baselines in visual quality.
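The speed argument rests on how many sequential decoding passes each scheme needs. The sketch below counts passes for a hypothetical schedule of square token grids whose side doubles from 1 to 32; the schedule is an assumption for illustration, not NextFlow's actual configuration, and it counts sequential passes only, not FLOPs.

```python
# Compare sequential decoding passes: raster-scan next-token vs. next-scale prediction.
# The scale schedule used below is hypothetical, chosen only for illustration.


def raster_scan_passes(side: int) -> int:
    """Next-token prediction emits one token per pass over a side x side grid."""
    return side * side


def next_scale_passes(scales: list[int]) -> int:
    """Next-scale prediction emits a whole side x side grid per pass."""
    return len(scales)


def total_tokens(scales: list[int]) -> int:
    """Tokens produced across all scales (the coarse maps are generated too)."""
    return sum(s * s for s in scales)


if __name__ == "__main__":
    scales = [1, 2, 4, 8, 16, 32]  # hypothetical coarse-to-fine schedule
    print(raster_scan_passes(scales[-1]))  # 1024 passes
    print(next_scale_passes(scales))       # 6 passes
    print(total_tokens(scales))            # 1365 tokens in total
```

The trade-off is visible in the numbers: next-scale prediction produces more tokens overall (the coarse maps), but in far fewer sequential steps, because all tokens within a scale are predicted in parallel.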

preprint · 2026 · arXiv

OptiSet: Unified Optimizing Set Selection and Ranking for Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) improves generation quality by incorporating evidence retrieved from large external corpora. However, most existing methods rely on statically selecting top-k passages based on individual relevance, which fails to exploit combinatorial gains among passages and often introduces substantial redundancy. To address this limitation, we propose OptiSet, a set-centric framework that unifies set selection and set-level ranking for RAG. OptiSet adopts an "Expand-then-Refine" paradigm: it first expands a query into multiple perspectives to build a diverse candidate pool and then refines the pool via re-selection to form a compact evidence set. We then devise a self-synthesis strategy, requiring no strong LLM supervision, that derives preference labels from changes in the generator's set-conditional utility, thereby identifying complementary and redundant evidence. Finally, we introduce a set-listwise training strategy that jointly optimizes set selection and set-level ranking, enabling the model to favor compact, high-gain evidence sets. Extensive experiments demonstrate that OptiSet improves performance on complex combinatorial problems and makes generation more efficient. The source code is publicly available.
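The difference between top-k selection by individual relevance and set-level selection can be illustrated with a classic greedy marginal-gain heuristic. This is a swapped-in technique, not OptiSet's learned selector: each passage is modeled, hypothetically, as the set of facts it covers, and a set's utility is simply the number of distinct facts covered.

```python
# Greedy marginal-gain set selection vs. top-k by individual score.
# Passages are modeled (hypothetically) as the sets of facts they cover.


def coverage(passages: dict[str, set[str]], chosen: list[str]) -> int:
    """Set-level utility: number of distinct facts covered by the chosen passages."""
    covered: set[str] = set()
    for name in chosen:
        covered |= passages[name]
    return len(covered)


def top_k_individual(passages: dict[str, set[str]], k: int) -> list[str]:
    """Static selection: rank passages by their own size, ignoring overlap."""
    return sorted(passages, key=lambda name: len(passages[name]), reverse=True)[:k]


def greedy_set(passages: dict[str, set[str]], k: int) -> list[str]:
    """Add the passage with the largest marginal gain until k are chosen or no gain remains."""
    chosen: list[str] = []
    covered: set[str] = set()
    while len(chosen) < k:
        best, best_gain = None, 0
        for name, facts in passages.items():
            if name in chosen:
                continue
            gain = len(facts - covered)  # new facts this passage would add
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # every remaining passage is fully redundant
            break
        chosen.append(best)
        covered |= passages[best]
    return chosen


if __name__ == "__main__":
    passages = {"A": {"f1", "f2", "f3"}, "B": {"f1", "f2"}, "C": {"f4"}}
    static = top_k_individual(passages, 2)
    greedy = greedy_set(passages, 2)
    print(static, coverage(passages, static))  # ['A', 'B'] 3
    print(greedy, coverage(passages, greedy))  # ['A', 'C'] 4
```

Passage B scores well on its own but adds nothing once A is chosen, which is exactly the redundancy that individual top-k ranking cannot see.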

preprint · 2026 · arXiv

VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation

Visual generation is dominated by three paradigms: AutoRegressive (AR), diffusion, and Visual AutoRegressive (VAR) models. Unlike AR and diffusion, VARs operate on heterogeneous input structures across their generation steps, which creates severe asynchronous policy conflicts. This issue becomes particularly acute in reinforcement learning (RL) scenarios, leading to unstable training and suboptimal alignment. To resolve this, we propose a novel framework to enhance Group Relative Policy Optimization (GRPO) by explicitly managing these conflicts. Our method integrates three synergistic components: 1) a stabilizing intermediate reward to guide early-stage generation; 2) a dynamic time-step reweighting scheme for precise credit assignment; and 3) a novel mask propagation algorithm, derived from principles of Reward Feedback Learning (ReFL), designed to isolate optimization effects both spatially and temporally. Our approach demonstrates significant improvements in sample quality and objective alignment over the vanilla GRPO baseline, enabling robust and effective optimization for VAR models.
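For reference, the vanilla GRPO baseline scores each sample relative to the other samples drawn for the same prompt. A minimal sketch of that group-relative advantage, assuming the population standard deviation and a small epsilon for numerical safety (implementations differ on both); the paper's intermediate reward, time-step reweighting and mask propagation are not shown.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against its group's mean and standard deviation."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Rewards for four images sampled from the same prompt (made-up numbers).
    advantages = group_relative_advantages([1.0, 2.0, 3.0, 4.0])
    print([round(a, 3) for a in advantages])  # [-1.342, -0.447, 0.447, 1.342]
```

Because every generation step of a sample shares the same advantage here, this baseline has no notion of per-step credit, which is the gap the time-step reweighting in the abstract targets.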

preprint · 2023 · arXiv

Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling

We identify and overcome two key obstacles in extending the success of BERT-style pre-training, i.e., masked image modeling, to convolutional networks (convnets): (i) the convolution operation cannot handle irregular, randomly masked input images; (ii) the single-scale nature of BERT pre-training is inconsistent with the hierarchical structure of convnets. For (i), we treat unmasked pixels as sparse voxels of 3D point clouds and use sparse convolution to encode them. This is the first use of sparse convolution for 2D masked modeling. For (ii), we develop a hierarchical decoder to reconstruct images from multi-scale encoded features. Our method, called Sparse masKed modeling (SparK), is general: it can be used directly on any convolutional model without backbone modifications. We validate it on both classical (ResNet) and modern (ConvNeXt) models: on three downstream tasks, it surpasses both state-of-the-art contrastive learning and transformer-based masked modeling by similarly large margins (around +1.0%). Improvements on object detection and instance segmentation are more substantial (up to +3.5%), verifying the strong transferability of the learned features. We also observe favorable scaling behavior, with larger gains on larger models. All this evidence points to a promising future for generative pre-training on convnets. Code and models are released at https://github.com/keyu-tian/SparK.
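The masking bookkeeping behind "unmasked pixels as a sparse set, reused at every scale" can be sketched in plain Python. This assumes, hypothetically, a 4x4 patch grid, a 60% mask ratio and finer feature maps at 2x and 4x the patch-grid resolution; the sparse convolution and the hierarchical decoder themselves are not shown, and this is not the SparK code.

```python
import random


def random_patch_mask(h: int, w: int, mask_ratio: float, seed: int = 0) -> list[list[bool]]:
    """Mask a random subset of an h x w patch grid; True marks a visible patch."""
    rng = random.Random(seed)
    n_masked = round(h * w * mask_ratio)
    masked = set(rng.sample(range(h * w), n_masked))
    return [[(r * w + c) not in masked for c in range(w)] for r in range(h)]


def upsample_mask(mask: list[list[bool]], factor: int) -> list[list[bool]]:
    """Nearest-neighbour upsampling so the same mask applies to a finer feature map."""
    rows, cols = len(mask) * factor, len(mask[0]) * factor
    return [[mask[r // factor][c // factor] for c in range(cols)] for r in range(rows)]


def visible_coords(mask: list[list[bool]]) -> list[tuple[int, int]]:
    """Coordinates of visible positions: the 'sparse voxels' a sparse conv would process."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


if __name__ == "__main__":
    coarse = random_patch_mask(4, 4, mask_ratio=0.6)  # hypothetical 60% masking
    # The same mask expressed at three (hypothetical) feature-map resolutions.
    pyramid = [coarse, upsample_mask(coarse, 2), upsample_mask(coarse, 4)]
    print([len(visible_coords(m)) for m in pyramid])  # [6, 24, 96]
```

Sharing one mask across resolutions keeps the masked regions consistent at every level of the hierarchy, which is what lets a multi-scale decoder reconstruct from multi-scale features.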

preprint · 2022 · arXiv

A Direct Parallel-in-Time Quasi-Boundary Value Method for Inverse Space-Dependent Source Problems

Inverse source problems often arise in real-world applications, such as localizing unknown groundwater contaminant sources. As an alternative to Tikhonov regularization, the quasi-boundary value method has been proposed and analyzed as an effective way to regularize such inverse source problems, and it was shown to achieve an optimal-order convergence rate under suitable assumptions. However, fast direct or iterative solvers for the resulting all-at-once large-scale linear systems have rarely been studied in the literature. In this work, we first propose and analyze a modified quasi-boundary value method, and then develop a diagonalization-based parallel-in-time (PinT) direct solver, which achieves a dramatic speedup in CPU time compared with MATLAB's sparse direct solver. In particular, the time-discretization matrix $B$ is shown to be diagonalizable, and the condition number of its eigenvector matrix $V$ is proven to exhibit quadratic growth, which guarantees that the roundoff errors due to diagonalization are well controlled. Several 1D and 2D examples demonstrate the very promising computational efficiency of the proposed method; in the 2D cases, CPU times are reduced by three orders of magnitude.
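Assuming, for illustration, that the all-at-once system has the common Kronecker-product form below ($B$ the $n \times n$ time-discretization matrix, $A$ a spatial discretization matrix, $I_t$ and $I_x$ identities, $g_j$ and $w_j$ the $j$-th time blocks of $g$ and $w$), a diagonalization-based PinT solve reduces to three steps. This is a generic template, not the paper's exact matrices.

```latex
\begin{aligned}
&\left(B \otimes I_x + I_t \otimes A\right) u = f,
  \qquad B = V D V^{-1}, \quad D = \operatorname{diag}(\lambda_1, \dots, \lambda_n), \\
&\Longrightarrow\;
  (V \otimes I_x)\left(D \otimes I_x + I_t \otimes A\right)(V^{-1} \otimes I_x)\, u = f, \\
&\text{Step 1: } g = (V^{-1} \otimes I_x)\, f, \\
&\text{Step 2: } (\lambda_j I_x + A)\, w_j = g_j, \quad j = 1, \dots, n
  \quad \text{(independent, solvable in parallel)}, \\
&\text{Step 3: } u = (V \otimes I_x)\, w .
\end{aligned}
```

Only Steps 1 and 3 involve the eigenvector matrix, so the rounding error introduced by the diagonalization is governed by $\operatorname{cond}(V)$; this is why the bound on the condition number of $V$ matters.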

preprint · 2022 · arXiv

Automatic Parameter Selection for Electron Ptychography via Bayesian Optimization

Electron ptychography provides new opportunities to resolve atomic structures with deep sub-angstrom spatial resolution and to study electron-beam-sensitive materials with high dose efficiency. In practice, obtaining accurate ptychography images requires simultaneously optimizing multiple parameters that are often selected by trial and error, resulting in low-throughput experiments and preventing wider adoption. Here, we develop an automatic parameter selection framework that circumvents this problem using Bayesian optimization with Gaussian processes. With minimal prior knowledge, the workflow efficiently produces ptychographic reconstructions that are superior to those processed by experienced experts. The method also facilitates better experimental design by exploring optimized experimental parameters from simulated data.

preprint · 2022 · arXiv

ByteTrack: Multi-Object Tracking by Associating Every Detection Box

Multi-object tracking (MOT) aims to estimate the bounding boxes and identities of objects in videos. Most methods obtain identities by associating detection boxes whose scores are higher than a threshold. Objects with low detection scores, e.g., occluded objects, are simply thrown away, which results in non-negligible numbers of missed true objects and fragmented trajectories. To solve this problem, we present a simple, effective and generic association method that tracks by associating almost every detection box instead of only the high-score ones. For the low-score detection boxes, we use their similarities with tracklets to recover true objects and filter out background detections. When applied to 9 different state-of-the-art trackers, our method achieves consistent improvements in IDF1 score ranging from 1 to 10 points. To push forward the state-of-the-art performance of MOT, we design a simple and strong tracker named ByteTrack. For the first time, we achieve 80.3 MOTA, 77.3 IDF1 and 63.1 HOTA on the MOT17 test set at 30 FPS on a single V100 GPU. ByteTrack also achieves state-of-the-art performance on the MOT20, HiEve and BDD100K tracking benchmarks. The source code, pre-trained models with deployment versions, and tutorials for applying the method to other trackers are released at https://github.com/ifzhang/ByteTrack.
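The idea of associating almost every box can be sketched as a two-stage matcher: high-score detections are matched to tracks first, then the tracks left over try the low-score detections. This is a simplified sketch, not the released ByteTrack code: it matches against each track's last box rather than a motion-predicted box, uses greedy IoU matching instead of the Hungarian algorithm, and the thresholds `high`, `low` and `iou_thresh` are hypothetical.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks: dict[int, Box], dets: list[Box], iou_thresh: float) -> list[tuple[int, int]]:
    """Pair tracks and detections by descending IoU (a stand-in for Hungarian matching)."""
    pairs = sorted(
        ((iou(tb, db), tid, di) for tid, tb in tracks.items() for di, db in enumerate(dets)),
        reverse=True,
    )
    used_t: set[int] = set()
    used_d: set[int] = set()
    matches = []
    for score, tid, di in pairs:
        if score < iou_thresh:
            break  # remaining pairs overlap too little
        if tid in used_t or di in used_d:
            continue
        matches.append((tid, di))
        used_t.add(tid)
        used_d.add(di)
    return matches


def byte_associate(tracks: dict[int, Box], detections: list[tuple[Box, float]],
                   high: float = 0.6, low: float = 0.1, iou_thresh: float = 0.3):
    """Return (track_id -> matched box, unmatched high-score boxes that would start new tracks).

    Unmatched low-score boxes are dropped as likely background.
    """
    high_dets = [box for box, s in detections if s >= high]
    low_dets = [box for box, s in detections if low <= s < high]

    assigned: dict[int, Box] = {}
    first = greedy_match(tracks, high_dets, iou_thresh)  # stage 1: confident boxes
    for tid, di in first:
        assigned[tid] = high_dets[di]

    leftover = {tid: box for tid, box in tracks.items() if tid not in assigned}
    for tid, di in greedy_match(leftover, low_dets, iou_thresh):  # stage 2: low-score boxes
        assigned[tid] = low_dets[di]  # e.g. an occluded object recovered

    used_high = {di for _, di in first}
    new_tracks = [box for i, box in enumerate(high_dets) if i not in used_high]
    return assigned, new_tracks


if __name__ == "__main__":
    tracks = {1: (0, 0, 10, 10), 2: (20, 0, 30, 10)}
    detections = [
        ((0, 0, 10, 10), 0.9),        # confident detection of track 1
        ((21, 0, 31, 10), 0.3),       # low score, e.g. track 2 partly occluded
        ((50, 50, 60, 60), 0.8),      # confident detection with no track yet
        ((100, 100, 110, 110), 0.2),  # low-score box matching nothing
    ]
    assigned, new = byte_associate(tracks, detections)
    print(assigned)  # {1: (0, 0, 10, 10), 2: (21, 0, 31, 10)}
    print(new)       # [(50, 50, 60, 60)]
```

A threshold-only tracker would have lost track 2 in this example; the second stage keeps it alive while the unmatched low-score box is still discarded.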

preprint · 2022 · arXiv

DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion

A typical pipeline for multi-object tracking (MOT) uses a detector for object localization, followed by re-identification (re-ID) for object association. This pipeline is partially motivated by recent progress in both object detection and re-ID, and partially by biases in existing tracking datasets, where most objects tend to have distinguishing appearance and re-ID models are sufficient for establishing associations. In response to such bias, we re-emphasize that methods for multi-object tracking should also work when object appearance is not sufficiently discriminative. To this end, we propose a large-scale dataset for multi-human tracking, in which humans have similar appearance, diverse motion and extreme articulation. As the dataset contains mostly group dancing videos, we name it "DanceTrack". We expect DanceTrack to provide a better platform for developing MOT algorithms that rely less on visual discrimination and more on motion analysis. We benchmark several state-of-the-art trackers on our dataset and observe a significant performance drop on DanceTrack compared with existing benchmarks. The dataset, project code and competition server are released at https://github.com/DanceTrack.

preprint · 2022 · arXiv

$e^+e^-\gamma$ production at photon-photon colliders at complete electroweak NLO accuracy

We present the NLO electroweak radiative corrections to $e^+e^-\gamma$ production in $\gamma\gamma$ collisions, which is an ideal channel for calibrating the beam luminosity of a Photon Linear Collider. We analyze the dependence of the total cross section on the beam colliding energy, and then investigate the kinematic distributions of the final-state particles for various initial photon beam polarizations at EW NLO accuracy. The numerical results indicate that the EW relative corrections to the total cross section are non-negligible and become increasingly significant as the beam colliding energy increases. Such EW corrections are therefore important and should be taken into account in precision theoretical and experimental studies at high-energy $\gamma\gamma$ colliders.

preprint · 2022 · arXiv

In Defense of Online Models for Video Instance Segmentation

In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models have gradually attracted less attention, possibly due to their inferior performance. However, online methods have an inherent advantage in handling long and ongoing video sequences, where offline models fail due to limited computational resources. It would therefore be highly desirable for online models to achieve performance comparable to, or even better than, that of offline models. By dissecting current online and offline models, we demonstrate that the main cause of the performance gap is error-prone association between frames, caused by the similar appearance of different instances in the feature space. Based on this observation, we propose an online framework based on contrastive learning that learns more discriminative instance embeddings for association and fully exploits history information for stability. Despite its simplicity, our method outperforms all online and offline methods on three benchmarks. Specifically, we achieve 49.5 AP on YouTube-VIS 2019, an improvement of 13.2 AP and 2.1 AP over the prior online and offline art, respectively. Moreover, we achieve 30.2 AP on OVIS, a more challenging dataset with significant crowding and occlusions, surpassing the prior art by 14.8 AP. The proposed method won first place in the video instance segmentation track of the 4th Large-scale Video Object Segmentation Challenge (CVPR 2022). We hope the simplicity and effectiveness of our method, as well as our insight into current methods, will shed light on the exploration of VIS models.
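Contrastive learning of instance embeddings is commonly driven by an InfoNCE-style loss that pulls an instance's embedding toward the same instance in another frame and pushes it away from other instances. A minimal sketch, assuming cosine similarity, a single positive and a temperature of 0.1 (all illustrative choices; the paper's exact loss may differ).

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(anchor: list[float], positive: list[float],
             negatives: list[list[float]], temperature: float = 0.1) -> float:
    """Cross-entropy of picking the positive among the positive plus all negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


if __name__ == "__main__":
    anchor = [1.0, 0.0]                    # an instance in frame t (made-up embedding)
    same_next = [0.9, 0.1]                 # the same instance in frame t+1
    others = [[0.0, 1.0], [-1.0, 0.2]]     # other instances
    print(info_nce(anchor, same_next, others))              # small loss
    print(info_nce(anchor, others[0], [same_next, others[1]]))  # large loss
```

Minimizing this loss spreads different instances apart in embedding space, which is what makes the frame-to-frame association less error-prone when instances look alike.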

preprint · 2022 · arXiv

Language as Queries for Referring Video Object Segmentation

Referring video object segmentation (R-VOS) is an emerging cross-modal task that aims to segment the target object referred to by a language expression in all video frames. In this work, we propose a simple and unified framework built upon Transformer, termed ReferFormer. It views the language as queries and directly attends to the most relevant regions in the video frames. Concretely, we introduce a small set of object queries conditioned on the language as the input to the Transformer. In this manner, all the queries are obligated to find only the referred objects. They are eventually transformed into dynamic kernels that capture crucial object-level information and act as convolution filters to generate the segmentation masks from feature maps. Object tracking is achieved naturally by linking the corresponding queries across frames. This mechanism greatly simplifies the pipeline, and the resulting end-to-end framework differs significantly from previous methods. Extensive experiments on Ref-Youtube-VOS, Ref-DAVIS17, A2D-Sentences and JHMDB-Sentences show the effectiveness of ReferFormer. On Ref-Youtube-VOS, ReferFormer achieves 55.6 J&F with a ResNet-50 backbone without bells and whistles, exceeding the previous state-of-the-art performance by 8.4 points. In addition, with the strong Swin-Large backbone, ReferFormer achieves the best J&F of 64.2 among all existing methods. Moreover, we show impressive results of 55.0 mAP and 43.7 mAP on A2D-Sentences and JHMDB-Sentences, respectively, outperforming previous methods by a large margin. Code is publicly available at https://github.com/wjn922/ReferFormer.
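The step where queries become dynamic kernels acting as convolution filters can be sketched in its simplest form. This assumes a single 1x1 dynamic convolution followed by a sigmoid and a 0.5 threshold; in the model the kernel parameters are produced from each language-conditioned query by the Transformer (here they are simply given), and the real mask head may stack several layers.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dynamic_mask(kernel: list[float], feature_map: list[list[list[float]]],
                 threshold: float = 0.5) -> list[list[bool]]:
    """Apply a query-specific 1x1 'dynamic kernel' to every pixel feature.

    feature_map[y][x] is a C-dimensional feature vector; returns a boolean mask.
    """
    return [
        [sigmoid(sum(k * f for k, f in zip(kernel, pixel))) >= threshold for pixel in row]
        for row in feature_map
    ]


def track_masks(kernels_per_frame: list[list[float]],
                frames: list[list[list[list[float]]]]) -> list[list[list[bool]]]:
    """Kernels from the same query index in each frame yield that object's mask sequence."""
    return [dynamic_mask(k, f) for k, f in zip(kernels_per_frame, frames)]


if __name__ == "__main__":
    frame = [[[2.0, 0.0], [0.0, 2.0]], [[1.0, 0.0], [0.0, 1.0]]]  # made-up 2x2 map, C=2
    print(dynamic_mask([1.0, -1.0], frame))  # [[True, False], [True, False]]
```

Because each query carries its own kernel, no separate tracking module is needed: following one query index across frames already gives one object's masks over time.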

preprint · 2022 · arXiv

Master integrals for mixed QCD-QED corrections to charged-current Drell-Yan production of a massive charged lepton

The master integrals for the mixed QCD-QED two-loop virtual corrections to the charged-current Drell-Yan process $q\bar{q}^{\prime} \rightarrow \ell \nu$ are computed analytically using the differential equation method. A suitable choice of master integrals makes it possible to cast the differential equation system into canonical form. We keep the dependence on the charged lepton mass when building the differential equations and then expand the system in the ratio of the small charged lepton mass to the large $W$-boson mass. In this way, the final results contain large logarithms of the form $\log(m_{\ell}^2/m_W^2)$. Finally, all canonical master integrals are given as Taylor series around $d = 4$ spacetime dimensions up to order four, with coefficients expressed in terms of Goncharov polylogarithms up to weight four.

preprint · 2022 · arXiv

MetaFormer: A Unified Meta Framework for Fine-Grained Recognition

Fine-Grained Visual Classification (FGVC) is the task of recognizing objects belonging to multiple subordinate categories of a super-category. Recent state-of-the-art methods usually design sophisticated learning pipelines to tackle this task. However, visual information alone is often not sufficient to accurately differentiate between fine-grained visual categories. Nowadays, meta-information (e.g., spatio-temporal priors, attributes and text descriptions) usually appears along with the images. This inspires us to ask: is it possible to use a unified and simple framework to utilize various meta-information to assist fine-grained identification? To answer this question, we explore a unified and strong meta-framework (MetaFormer) for fine-grained visual classification. In practice, MetaFormer provides a simple yet effective approach to the joint learning of vision and various meta-information. Moreover, MetaFormer also provides a strong baseline for FGVC without bells and whistles. Extensive experiments demonstrate that MetaFormer can effectively use various meta-information to improve the performance of fine-grained recognition. In a fair comparison, MetaFormer outperforms the current SotA approaches with only vision information on the iNaturalist2017 and iNaturalist2018 datasets. With meta-information added, MetaFormer exceeds the current SotA approaches by 5.9% and 5.3%, respectively. Moreover, MetaFormer achieves 92.3% and 92.7% on CUB-200-2011 and NABirds, significantly outperforming the SotA approaches. The source code and pre-trained models are released at https://github.com/dqshuai/MetaFormer.

preprint · 2022 · arXiv

Model-Based Neural Network and Its Application to Line Spectral Estimation

This paper presents the concept of a "model-based neural network" (MNN), which is inspired by the classic artificial neural network (ANN) but serves a different purpose. Instead of being used as a data-driven classifier, an MNN serves as a modeling tool with carefully defined inputs, outputs and activation functions that have explicit physical meanings. Because it has the same layered form as an ANN, an MNN can also be optimized using the back-propagation (BP) algorithm. As an illustrative application, the classic problem of line spectral estimation can be modeled by an MNN. We propose to first initialize the MNN with fast Fourier transform (FFT) based spectral estimation and then optimize it with the BP algorithm, which automatically yields the maximum likelihood (ML) estimate of the frequency spectrum parameters. We also design a method for merging and pruning the hidden-layer nodes of the MNN, which can be used for model-order selection, i.e., to estimate the number of sinusoids. Numerical simulations verify the effectiveness of the proposed method.
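The initialize-then-refine strategy can be sketched for the simplest case of one complex sinusoid in white noise, where the ML frequency estimate is the peak of the periodogram. As assumptions: a zero-padded DFT grid (computed directly rather than with an FFT, for brevity) supplies the initial estimate, and a golden-section search on the periodogram stands in for the MNN's back-propagation step; model-order selection by merging and pruning nodes is not shown.

```python
import cmath
import math


def periodogram(x: list[complex], f: float) -> float:
    """|DTFT of x at normalized frequency f (cycles/sample)|^2."""
    s = sum(v * cmath.exp(-2j * math.pi * f * n) for n, v in enumerate(x))
    return abs(s) ** 2


def coarse_estimate(x: list[complex], pad: int = 4) -> float:
    """Peak of a zero-padded DFT grid (the role the FFT plays for initialization)."""
    m = len(x) * pad
    return max((k / m for k in range(m)), key=lambda f: periodogram(x, f))


def refine(x: list[complex], lo: float, hi: float, tol: float = 1e-10) -> float:
    """Golden-section search for the periodogram maximum inside [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    pc, pd = periodogram(x, c), periodogram(x, d)
    while hi - lo > tol:
        if pc > pd:  # maximum lies in [lo, d]
            hi, d, pd = d, c, pc
            c = hi - g * (hi - lo)
            pc = periodogram(x, c)
        else:  # maximum lies in [c, hi]
            lo, c, pc = c, d, pd
            d = lo + g * (hi - lo)
            pd = periodogram(x, d)
    return (lo + hi) / 2


def estimate_frequency(x: list[complex], pad: int = 4) -> float:
    """Coarse grid peak, then refinement within one grid bin on either side."""
    f0 = coarse_estimate(x, pad)
    step = 1 / (len(x) * pad)
    return refine(x, f0 - step, f0 + step) % 1.0


if __name__ == "__main__":
    true_f = 0.1234  # made-up test frequency (cycles/sample)
    x = [cmath.exp(1j * (2 * math.pi * true_f * n + 0.3)) for n in range(64)]
    print(coarse_estimate(x), estimate_frequency(x))
```

The coarse grid alone is limited to its bin spacing; the local refinement recovers the off-grid frequency, which mirrors why a good FFT initialization matters before gradient-based optimization.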

preprint · 2022 · arXiv

Multimodal Transformer with Variable-length Memory for Vision-and-Language Navigation

Vision-and-Language Navigation (VLN) is a task in which an agent is required to follow a language instruction to navigate to a goal position, relying on ongoing interactions with the environment while moving. Recent Transformer-based VLN methods have made great progress, benefiting from the direct connections between visual observations and the language instruction via the multimodal cross-attention mechanism. However, these methods usually represent temporal context as a fixed-length vector, either by using an LSTM decoder or by using manually designed hidden states to build a recurrent Transformer. Since a single fixed-length vector is often insufficient to capture long-term temporal context, in this paper we introduce the Multimodal Transformer with Variable-length Memory (MTVM) for visually grounded natural language navigation, which models the temporal context explicitly. Specifically, MTVM enables the agent to keep track of the navigation trajectory by directly storing previous activations in a memory bank. To further boost performance, we propose a memory-aware consistency loss to help learn a better joint representation of temporal context with randomly masked instructions. We evaluate MTVM on the popular R2R and CVDN datasets: our model improves Success Rate on the R2R unseen validation and test sets by 2% each, and reduces Goal Process by 1.6m on the CVDN test set.

preprint · 2022 · arXiv

Nonlinear Kalman Filter Using Cramer Rao Bound

This paper studies optimal state estimation for a dynamic system whose transfer function can be nonlinear and whose input noise can follow an arbitrary distribution. Our algorithm differs from the conventional extended Kalman filter (EKF) and the particle filter (PF) in that it estimates not only the state vector but also the Cramér-Rao bound (CRB), which serves as an accuracy indicator. Combining the state estimate, the CRB and the incoming new measurement, the algorithm updates the state estimate according to the maximum likelihood (ML) criterion. To illustrate the effectiveness of the proposed method for autonomous driving, we apply it to estimate the position and velocity of a vehicle from noisy measurements of distance and Doppler offset. Simulation results show that the proposed algorithm achieves significantly more accurate estimates than the standard EKF and PF.
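Carrying an accuracy indicator alongside the state estimate can be illustrated with the simplest possible case: a scalar random-walk state observed in Gaussian noise. In this linear-Gaussian setting the Kalman filter variance coincides with the posterior Cramér-Rao bound, so `p` below plays the role of the CRB. This is a textbook Kalman filter, not the paper's nonlinear ML-based update, and the noise variances `q` and `r` are hypothetical.

```python
def kalman_step(x: float, p: float, z: float, q: float, r: float) -> tuple[float, float]:
    """One predict/update cycle for x_k = x_{k-1} + w_k, z_k = x_k + v_k.

    w_k ~ N(0, q) and v_k ~ N(0, r). Returns the new estimate and its variance.
    """
    # Predict: a random walk keeps the mean and adds process noise to the variance.
    x_pred, p_pred = x, p + q
    # Update: blend prediction and measurement using the Kalman gain.
    k = p_pred / (p_pred + r)
    x_new = x_pred + k * (z - x_pred)
    p_new = (1 - k) * p_pred  # the accuracy indicator shrinks as evidence accumulates
    return x_new, p_new


if __name__ == "__main__":
    x, p = 0.0, 10.0  # vague prior
    for z in [5.0] * 20:  # made-up noiseless measurements for illustration
        x, p = kalman_step(x, p, z, q=0.01, r=1.0)
    print(round(x, 3), round(p, 3))
```

Reporting `p` together with `x` lets downstream logic weigh each estimate by its reliability, which is the role the CRB plays in the method described above.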

preprint · 2022 · arXiv

Objects in Semantic Topology

A more realistic object detection paradigm, Open-World Object Detection, has recently attracted increasing research interest in the community. A qualified open-world object detector can not only identify objects of known categories, but also discover unknown objects and incrementally learn to categorize them as their annotations progressively arrive. Previous works rely on independent modules to recognize unknown categories and to perform incremental learning, respectively. In this paper, we provide a unified perspective: Semantic Topology. During the lifelong learning of an open-world object detector, all object instances from the same category are assigned to their corresponding pre-defined node in the semantic topology, including the 'unknown' category. This constraint builds up discriminative feature representations and consistent relationships among objects, enabling the detector to distinguish unknown objects from the known categories and keeping the learned features of known objects undistorted when new categories are learned incrementally. Extensive experiments demonstrate that semantic topology, either randomly generated or derived from a well-trained language model, outperforms the current state-of-the-art open-world object detectors by a large margin; e.g., the absolute open-set error is reduced from 7832 to 2546, exhibiting the inherent superiority of semantic topology for open-world object detection.

preprint · 2022 · arXiv

Physics-Inspired Unsupervised Classification for Region of Interest in X-Ray Ptychography

X-ray ptychography allows large fields to be imaged at high resolution, at the cost of additional computational expense due to the large volume of data. Given limited information regarding the object, the acquired data often contain a large amount of information outside the region of interest (RoI). In this work, we propose a physics-inspired unsupervised learning algorithm that identifies the RoI of an object using only the diffraction patterns of a ptychography dataset, before computational resources are committed to reconstruction. Diffraction patterns automatically identified as lying outside the RoI are filtered out, allowing efficient reconstruction that focuses only on the important data within the RoI while preserving image quality.

preprint · 2022 · arXiv

Ptychopy: GPU framework for ptychographic data analysis

X-ray ptychography imaging at synchrotron facilities such as the Advanced Photon Source (APS) involves controlling instrument hardware to collect a set of diffraction patterns from overlapping coherent illumination spots on extended samples, managing data storage, reconstructing ptychographic images from the acquired diffraction patterns, and providing visualization of results and feedback. In addition to this complicated workflow, a ptychography instrument can produce up to several TB of data per second that must be processed in real time. This creates the need for a high-performance, robust and user-friendly software package for ptychographic data analysis. In this paper we present a software framework that provides visualization, workflow control and data reconstruction. To accelerate computation and the processing of large datasets, the data reconstruction part implements three algorithms, ePIE, DM and LSQML, in CUDA-C on the GPU.

preprint · 2022 · arXiv

Real-time 3D analysis during electron tomography using tomviz

The demand for high-throughput electron tomography is rapidly increasing in the biological and materials sciences. However, this 3D imaging technique is computationally bottlenecked by alignment and reconstruction, which can take from hours to days. We demonstrate real-time tomography with dynamic 3D tomographic visualization, enabling rapid interpretation of specimen structure as soon as data are collected on an electron microscope. Using geometrically complex chiral nanoparticles, we show that volumetric interpretation can begin in less than 10 minutes and that a high-quality tomogram is available within 30 minutes. Real-time tomography is integrated into tomviz, an open-source, cross-platform 3D analysis tool with intuitive graphical user interfaces (GUIs) that enable any scientist to characterize biological and material structure in 3D.

preprint · 2022 · arXiv

SeqFormer: Sequential Transformer for Video Instance Segmentation

In this work, we present SeqFormer for video instance segmentation. SeqFormer follows the principle of vision transformers in modeling instance relationships among video frames. Nevertheless, we observe that a stand-alone instance query suffices to capture a time sequence of instances in a video, but the attention mechanism should be applied to each frame independently. To achieve this, SeqFormer locates an instance in each frame and aggregates temporal information to learn a powerful representation of a video-level instance, which is used to dynamically predict the mask sequence on each frame. Instance tracking is achieved naturally without tracking branches or post-processing. On YouTube-VIS, SeqFormer achieves 47.4 AP with a ResNet-50 backbone and 49.0 AP with a ResNet-101 backbone without bells and whistles, exceeding the previous state-of-the-art performance by 4.6 and 4.4 AP, respectively. In addition, integrated with the recently proposed Swin transformer, SeqFormer achieves a much higher AP of 59.3. We hope SeqFormer can be a strong baseline that fosters future research in video instance segmentation and, in the meantime, advances this field with a more robust, accurate and neat model. The code is available at https://github.com/wjf5203/SeqFormer.

preprint · 2022 · arXiv

Subtype-Former: a deep learning approach for cancer subtype discovery with multi-omics data

Motivation: Cancer is heterogeneous, which complicates precise personalized treatment. Accurate subtyping can lead to better survival rates for cancer patients. High-throughput technologies provide multiple omics data for cancer subtyping. However, precise cancer subtyping remains challenging due to the large volume and high dimensionality of omics data. Results: This study proposes Subtype-Former, a deep learning method based on an MLP and Transformer blocks, to extract low-dimensional representations of multi-omics data. K-means and Consensus Clustering are also used to obtain accurate subtyping results. We compared Subtype-Former with other state-of-the-art subtyping methods across 10 TCGA cancer types and found that, according to survival analysis, Subtype-Former performs better on benchmark datasets comprising more than 5000 tumors. In addition, Subtype-Former achieved outstanding results in pan-cancer subtyping, which can help analyze commonalities and differences across cancer types at the molecular level. Finally, we applied Subtype-Former to the 10 TCGA cancer types and identified 50 essential biomarkers, which can be used to study targeted cancer drugs and promote the development of cancer treatments in the era of precision medicine.
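The clustering stage can be illustrated with plain k-means on low-dimensional embeddings. A minimal sketch, assuming the embeddings have already been produced (here they are made-up 2-D points) and using a deterministic farthest-point initialization for reproducibility; this is not Subtype-Former's code, and the Consensus Clustering step is not shown.

```python
Point = tuple[float, ...]


def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: list[Point], k: int, iters: int = 100) -> tuple[list[int], list[Point]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    # Farthest-point initialization: deterministic, so results are reproducible.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    # Made-up embeddings for six tumors forming two well-separated groups.
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    labels, _ = kmeans(pts, 2)
    print(labels)
```

Clustering in the learned low-dimensional space, rather than on raw high-dimensional omics features, is what makes such a simple algorithm workable for subtyping.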

preprint · 2022 · arXiv

Towards Grand Unification of Object Tracking

We present a unified method, termed Unicorn, that can simultaneously solve four tracking problems (SOT, MOT, VOS, MOTS) with a single network using the same model parameters. Due to the fragmented definitions of the object tracking problem itself, most existing trackers are developed to address a single task or a subset of tasks and overspecialize on the characteristics of specific tasks. By contrast, Unicorn provides a unified solution, adopting the same input, backbone, embedding, and head across all tracking tasks. For the first time, we accomplish the great unification of the tracking network architecture and learning paradigm. Unicorn performs on par with or better than its task-specific counterparts on 8 tracking datasets, including LaSOT, TrackingNet, MOT17, BDD100K, DAVIS16-17, MOTS20, and BDD100K MOTS. We believe that Unicorn will serve as a solid step towards a general vision model. Code is available at https://github.com/MasterBin-IIAU/Unicorn.

preprint · 2022 · arXiv

Two-dimensional Obstructed Atomic Insulators with Fractional Corner Charge in MA$_2$Z$_4$ Family

According to topological quantum chemistry, a class of electronic materials has been called obstructed atomic insulators (OAIs), in which a portion of the valence electrons necessarily have their centers located at empty $\textit{Wyckoff}$ positions not occupied by atoms in the lattice. The obstruction that prevents these electron centers from coinciding with their host atoms is nontrivial and results in metallic boundary states when the boundary is properly cut. Here, on the basis of first-principles calculations combined with topological quantum chemistry analysis, we propose that the two-dimensional MA$_2$Z$_4$ (M = Cr, Mo and W; A = Si and Ge; Z = N, P and As) monolayer family are all OAIs. A typical case is the recently synthesized MoSi$_2$N$_4$. Although it is a topologically trivial insulator whose occupied electronic states are an integer combination of elementary band representations, it has valence electrons centered at empty $\textit{Wyckoff}$ positions. It exhibits unique OAI-induced metallic edge states along the (1$\bar{1}$0) edge of the MoSi$_2$N$_4$ monolayer, as well as in-gap corner states at three vertices of certain hexagonal nanodisk samples respecting C$_3$ rotation symmetry. The readily synthesized MoSi$_2$N$_4$ is quite stable and has a large bulk band gap of 1.94 eV, which makes these edge and corner states promising targets for experimental identification.

preprint2021arXiv

Hybrid Interference Mitigation Using Analog Prewhitening

This paper proposes a novel scheme for mitigating strong interference that is applicable to various wireless scenarios, including full-duplex wireless communications and uncoordinated heterogeneous networks. Because strong interference can saturate the receiver's analog-to-digital converters (ADCs), it must be mitigated both before and after the ADCs, i.e., via hybrid processing. The key idea of the proposed scheme, Hybrid Interference Mitigation using Analog Prewhitening (HIMAP), is to insert an M-input M-output analog phase shifter network (PSN) between the receive antennas and the ADCs to spatially prewhiten the interference. This requires no signal information, only an estimate of the covariance matrix. After interference mitigation by the PSN prewhitener, the preamble can be synchronized and the signal channel response estimated, so that a minimum mean squared error (MMSE) beamformer can be applied in the digital domain to further mitigate the residual interference. Simulation results verify that HIMAP can suppress interference 80 dB stronger than the signal using off-the-shelf 6-bit phase shifters (PSs).
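To illustrate what spatial prewhitening does, here is a minimal NumPy sketch under idealized assumptions. The whitening matrix is taken as the inverse square root of the sample covariance and is applied digitally at full precision. It does not model HIMAP's analog phase shifter network, its 6-bit quantization, or the MMSE stage. The array geometry, interferer angles, power levels and function names are hypothetical.

```python
import numpy as np

def prewhitening_matrix(x: np.ndarray) -> np.ndarray:
    """Return W = R^{-1/2}, where R is the sample covariance of x (M antennas x N snapshots)."""
    n_snapshots = x.shape[1]
    r = x @ x.conj().T / n_snapshots              # Hermitian M x M covariance estimate
    eigvals, eigvecs = np.linalg.eigh(r)          # R = V diag(lambda) V^H
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.conj().T

rng = np.random.default_rng(0)
M, N = 4, 2000                                    # hypothetical array size and snapshot count
angles = np.array([0.3, -0.7])                    # hypothetical interferer directions (rad)
# Steering vectors of a half-wavelength uniform linear array (assumed geometry)
steering = np.exp(1j * np.pi * np.outer(np.arange(M), np.sin(angles)))

def complex_noise(rows: int, cols: int) -> np.ndarray:
    """Unit-power circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)

# Two interferers 80 dB (amplitude factor 1e4) above unit-power receiver noise
x = steering @ (1e4 * complex_noise(2, N)) + complex_noise(M, N)

W = prewhitening_matrix(x)
y = W @ x
whitened_cov = y @ y.conj().T / N                 # identity up to rounding error
```

In this idealized setting, the per-antenna power falls from roughly 2 × 10^8 to 1 after whitening. The residual spatial structure is what a subsequent digital beamformer would then work on.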

preprint2021arXiv

Robust Output Regulation and Reinforcement Learning-based Output Tracking Design for Unknown Linear Discrete-Time Systems

In this paper, we investigate the optimal output tracking problem for linear discrete-time systems with unknown dynamics using reinforcement learning (RL) and robust output regulation theory. This output tracking problem uses only the outputs of the reference system and the controlled system, rather than their states, which distinguishes it from most existing tracking results that depend on the system state. The optimal tracking problem is formulated as a linear quadratic regulation problem by proposing a family of dynamic discrete-time controllers. It is then shown that solving the output tracking problem is equivalent to solving the output regulation equations, whose solution, however, requires complete and accurate knowledge of the system dynamics. To remove this requirement, an off-policy RL algorithm is proposed that uses only the measured output data along the trajectories of the system and the reference output. By introducing a re-expression error and analyzing the rank condition of the parameterization matrix, we ensure the uniqueness of the proposed RL-based optimal control via output feedback.
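The abstract reduces output tracking to a linear quadratic regulation problem and then learns its solution from data. For orientation only, the sketch below solves the model-based counterpart when A and B are known. It finds the discrete-time algebraic Riccati equation solution by value (Riccati) iteration. This is the quantity a model-free scheme would aim to recover, not the paper's off-policy algorithm. The double-integrator plant, the weights and the function name are hypothetical.

```python
import numpy as np

def lqr_riccati_iteration(A, B, Q, R, max_iter=1000, tol=1e-12):
    """Iterate P <- Q + A^T P (A - B K) with K = (R + B^T P B)^{-1} B^T P A until P settles.

    Returns the Riccati solution P and the optimal state-feedback gain K (u = -K x).
    """
    P = Q.copy()
    for _ in range(max_iter):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # greedy gain for the current P
        P_next = Q + A.T @ P @ (A - B @ K)                  # one Riccati (value) update
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)       # gain consistent with the final P
    return P, K

# Hypothetical plant: double integrator discretized with step 0.1
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

P, K = lqr_riccati_iteration(A, B, Q, R)
closed_loop_radius = np.max(np.abs(np.linalg.eigvals(A - B @ K)))  # < 1 means stabilizing
```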

preprint2021arXiv

Topological classification and diagnosis in magnetically ordered electronic materials

We show that compositions of time-reversal and spatial symmetries, also known as magnetic-space-group symmetries, protect topological invariants as well as surface states that are distinct from those of all previously known topological states. We obtain, by explicit and exhaustive construction, the topological classification of magnetically ordered electronic band insulators for each of the 1421 magnetic space groups in three dimensions. We have also computed the symmetry-based indicators for each nontrivial class and thereby establish the complete mapping from symmetry representations to topological invariants.

preprint2020arXiv

$WWγ$ production at hadron colliders with NLO QCD+EW corrections and parton shower effects

$W^+W^-\gamma$ production in proton-proton collisions provides a window onto the mechanism of electroweak symmetry breaking and direct access to triple and quartic gauge couplings. Precision study of gauge boson self-interactions may also provide evidence of new physics beyond the Standard Model. In this paper, we study $W^+W^-\gamma$ production at the LHC and at future higher-energy proton-proton colliders at QCD+EW NLO accuracy, including parton shower effects. We find that the contributions from the photon-induced (i.e., $q\gamma$- and $\gamma\gamma$-initiated) channels are non-negligible, since the photon luminosity increases significantly with colliding energy. We also find that the large QCD and EW corrections from real jet emission can be sufficiently suppressed by applying a jet-veto event selection scheme. In addition, we investigate the theoretical errors arising from the PDF uncertainty and the factorization/renormalization scale dependence.

preprint2020arXiv

Deep Learning Methods for Lung Cancer Segmentation in Whole-slide Histopathology Images -- the ACDC@LungHP Challenge 2019

Accurate segmentation of lung cancer in pathology slides is a critical step in improving patient care. We proposed the ACDC@LungHP (Automatic Cancer Detection and Classification in Whole-slide Lung Histopathology) challenge to evaluate different computer-aided diagnosis (CAD) methods for the automatic diagnosis of lung cancer. ACDC@LungHP 2019 focused on segmentation (pixel-wise detection) of cancer tissue in whole-slide imaging (WSI), using an annotated dataset of 150 training images and 50 test images from 200 patients. This paper reviews the challenge and summarizes the top 10 submitted methods for lung cancer segmentation. All methods were evaluated using the false positive rate, false negative rate, and Dice coefficient (DC). The DC ranged from 0.7354$\pm$0.1149 to 0.8372$\pm$0.0858. The DC of the best method was close to the inter-observer agreement (0.8398$\pm$0.0890). All methods were based on deep learning and fall into two groups: multi-model methods and single-model methods. In general, multi-model methods were significantly better ($p < 0.01$) than single-model methods, with mean DCs of 0.7966 and 0.7544, respectively. Deep-learning-based methods could potentially help pathologists find suspicious regions for further analysis of lung cancer in WSI.
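For reference, the Dice coefficient used above is DC = 2|P ∩ T| / (|P| + |T|) for a predicted mask P and a ground-truth mask T. The following is a minimal NumPy sketch for binary pixel masks. Treating two empty masks as a perfect match is our assumption. The challenge's per-slide aggregation into mean ± standard deviation is not reproduced.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice coefficient 2|P & T| / (|P| + |T|) between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = int(pred.sum()) + int(truth.sum())
    if denom == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2.0 * int(np.logical_and(pred, truth).sum()) / denom

pred = np.array([[1, 1, 0], [0, 1, 0]])
truth = np.array([[1, 0, 0], [0, 1, 1]])
score = dice_coefficient(pred, truth)   # 2 * 2 / (3 + 3)
```

Here, the masks share 2 foreground pixels out of 3 in each, so the score is 2/3.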

preprint2020arXiv

Discontinuity-resolving shock-capturing schemes on unstructured grids

Solving compressible flows containing discontinuities remains a major challenge for numerical methods, especially on unstructured grids. In this work, we therefore develop a shock-capturing scheme for unstructured grids that aims to resolve discontinuities with low numerical dissipation. Unlike conventional shock-capturing schemes, which use only polynomials as interpolation functions on unstructured grids, the proposed scheme employs both a linear polynomial and a non-polynomial function as reconstruction candidates. For the linear polynomial, the second-order MUSCL scheme with the MLP (Multi-dimensional Limiting Process) slope limiter is adopted. The multi-dimensional THINC (Tangent of Hyperbola for INterface Capturing) function with quadratic surface representation and Gaussian quadrature, the so-called THINC/QQ, is used as the non-polynomial reconstruction candidate. Given these candidates, a multi-stage boundary variation diminishing (BVD) algorithm, designed on unstructured grids to minimize numerical dissipation, selects the final reconstruction function. The resulting shock-capturing scheme is named MUSCL-THINC/QQ-BVD. The performance of the proposed scheme is demonstrated by solving compressible single-phase and multi-phase problems in which discontinuities are the typical flow structures. The numerical results show that the proposed scheme captures sharp discontinuous profiles without numerical oscillations. It also resolves vortices associated with Kelvin-Helmholtz instabilities along shear layers and material interfaces. Compared with schemes relying only on high-order polynomials, the proposed scheme significantly improves resolution across discontinuities. Thus, this work provides an accurate and robust shock-capturing scheme for resolving discontinuities in compressible flows.
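The linear-polynomial candidate described above is a limited MUSCL reconstruction. As a much simplified illustration, the sketch below shows second-order MUSCL on a uniform one-dimensional grid. It swaps in the classic minmod limiter for the multi-dimensional MLP limiter the paper uses on unstructured grids. The THINC/QQ candidate and the BVD selection are not included, and the function names are ours.

```python
import numpy as np

def minmod(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Elementwise minmod: the smaller-magnitude argument if a and b share a sign, else 0."""
    return np.where(a * b > 0.0, np.sign(a) * np.minimum(np.abs(a), np.abs(b)), 0.0)

def muscl_minmod_faces(q: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Limited linear reconstruction for interior cells 1..n-2 of a uniform 1D grid.

    Returns the values at each cell's left face (x_{i-1/2}) and right face (x_{i+1/2}).
    """
    q = np.asarray(q, dtype=float)
    backward = q[1:-1] - q[:-2]          # q_i - q_{i-1}
    forward = q[2:] - q[1:-1]            # q_{i+1} - q_i
    slope = minmod(backward, forward)    # limited slope times cell width
    return q[1:-1] - 0.5 * slope, q[1:-1] + 0.5 * slope

# Linear data is reproduced exactly; a step gets zero slope, so no new extrema appear
left_lin, right_lin = muscl_minmod_faces(np.array([0.0, 1.0, 2.0, 3.0, 4.0]))
left_step, right_step = muscl_minmod_faces(np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0]))
```

The step case shows why a limiter is needed. Near a discontinuity, the reconstruction falls back to first order, so it avoids oscillations but smears the jump. This dissipation is what the non-polynomial THINC/QQ candidate and the BVD selection are meant to reduce.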

preprint2020arXiv

Learning to Segment the Tail

Real-world visual recognition requires handling the extreme sample imbalance in large-scale long-tailed data. We propose a "divide & conquer" strategy for the challenging LVIS task: divide the whole dataset into balanced parts and then apply incremental learning to conquer each one. This leads to a novel learning paradigm, class-incremental few-shot learning, which is especially effective for two challenges that evolve over time: 1) the class imbalance in reviewing old-class knowledge and 2) the few-shot data in new-class learning. We call our approach Learning to Segment the Tail (LST). In particular, we design an instance-level balanced replay scheme, a memory-efficient approximation that balances the instance-level samples from old-class images. We also propose a meta-module for new-class learning whose parameters are shared across incremental phases, so that learning-to-learn knowledge is gained incrementally, from the data-rich head to the data-poor tail. We empirically show that, at the cost of slight forgetting in head classes, we gain a significant 8.3% AP improvement for tail classes with fewer than 10 instances, achieving an overall 2.0% AP boost across all 1,230 classes.

preprint2020arXiv

Measuring the linear and nonlinear elastic properties of brain tissue with shear waves and inverse analysis

We use the supersonic shear wave imaging (SSI) technique to measure not only the linear but also the nonlinear elastic properties of brain matter. Here, we tested six porcine brains ex vivo. We measured the velocities of the plane shear waves induced by acoustic radiation force at different states of pre-deformation, created by pushing the ultrasonic probe into the soft tissue. To interpret the experimental data, we relied on an inverse method based on the theory governing the propagation of small-amplitude acoustic waves in deformed solids. We found that, depending on the subject, the initial shear modulus $\mu_0$ varies from 1.8 to 3.2 kPa and the stiffening parameter $b$ of the hyperelastic Demiray-Fung model from 0.13 to 0.73. The third-order ($A$) and fourth-order ($D$) constants of weakly nonlinear elasticity range from -1.3 to -20.6 kPa and from 3.1 to 8.7 kPa, respectively. A paired t-test performed on the experimental results of the left and right lobes of the brain showed no significant difference. These values are in line with those reported in the literature on brain tissue. This indicates that the SSI method, combined with the inverse analysis, is an efficient and powerful tool for the mechanical characterization of brain tissue, which is of great importance for computer simulation of traumatic brain injury and virtual neurosurgery.
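For intuition about the magnitudes, consider the undeformed, linearly elastic and incompressible limit. There, the shear-wave speed c and the initial shear modulus satisfy μ0 = ρc². The sketch below applies only that relation. The density of 1000 kg/m³ is an assumed round value, and the paper's acoustoelastic inverse analysis of pre-deformed tissue is not reproduced.

```python
import math

ASSUMED_DENSITY = 1000.0  # kg/m^3, hypothetical round value for soft tissue

def shear_modulus_from_speed(speed_m_s: float, density: float = ASSUMED_DENSITY) -> float:
    """Initial shear modulus mu0 = rho * c^2 in Pa (undeformed, linear-elastic limit)."""
    return density * speed_m_s ** 2

def speed_from_shear_modulus(mu_pa: float, density: float = ASSUMED_DENSITY) -> float:
    """Inverse relation c = sqrt(mu0 / rho) in m/s."""
    return math.sqrt(mu_pa / density)

# Shear-wave speeds implied by the reported 1.8-3.2 kPa range under the assumed density
speed_range = (speed_from_shear_modulus(1.8e3), speed_from_shear_modulus(3.2e3))
```

Under these assumptions, a measured speed of 1.5 m/s corresponds to μ0 = 2.25 kPa, inside the reported range. The reported range itself maps to speeds of about 1.34 to 1.79 m/s.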

preprint2020arXiv

Same data may bring conflict results: a caution to use the disruptive index

In the last two decades, scholars have designed various bibliometric indicators to identify breakthrough-class academic achievements. In this study, we take a further step and examine the properties of the promising disruptive index (DI). Our aim is to deepen understanding of this index and facilitate its judicious use in bibliometrics. Using publication records for Nobel laureates between 1900 and 2016, we calculate the DI of Nobel Prize-winning articles and their benchmark articles in each year. We use the median DI to denote the central tendency in each year and compare results across Medicine, Chemistry, and Physics. We find that conclusions based on the DI depend on the length of the citation time window, and different citation time windows may lead to different, even contradictory, results. Discipline and time also affect the appropriate length of the citation window when the DI is used to measure the innovativeness of a scientific work. Finally, not all articles with a DI equal to 1 were breakthrough-class achievements. In other words, the DI is theoretically sound, but it is shaped only by the number of citing articles and the number of times the references have been cited, and these data may vary from database to database.
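The disruptive index is commonly defined as DI = (n_i - n_j) / (n_i + n_j + n_k), counted over later papers:

- n_i cite the focal paper but none of its references.
- n_j cite both the focal paper and at least one of its references.
- n_k cite at least one of its references but not the focal paper.

The sketch below computes the DI within a citation window to make the window dependence described above concrete. The `Paper` record, the inclusive window convention and the toy corpus are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Paper:
    pid: str
    year: int
    refs: frozenset  # ids of the works this paper cites

def disruptive_index(focal: Paper, corpus: list, window: int) -> float:
    """DI = (n_i - n_j) / (n_i + n_j + n_k) over papers published within `window` years."""
    n_i = n_j = n_k = 0
    for p in corpus:
        # Assumed convention: count papers from the focal year up to focal.year + window
        if p.pid == focal.pid or not (focal.year <= p.year <= focal.year + window):
            continue
        cites_focal = focal.pid in p.refs
        cites_refs = bool(p.refs & focal.refs)
        if cites_focal and not cites_refs:
            n_i += 1          # cites the focal paper only
        elif cites_focal:
            n_j += 1          # cites the focal paper and some of its references
        elif cites_refs:
            n_k += 1          # cites the references but bypasses the focal paper
    total = n_i + n_j + n_k
    return (n_i - n_j) / total if total else float("nan")

focal = Paper("F", 2000, frozenset({"R1", "R2"}))
corpus = [
    Paper("A", 2001, frozenset({"F"})),          # n_i
    Paper("B", 2002, frozenset({"F", "R1"})),    # n_j
    Paper("C", 2003, frozenset({"R2"})),         # n_k
    Paper("D", 2010, frozenset({"F"})),          # n_i, but only inside a 10-year window
]
di_5 = disruptive_index(focal, corpus, window=5)     # (1 - 1) / 3 = 0.0
di_10 = disruptive_index(focal, corpus, window=10)   # (2 - 1) / 4 = 0.25
```

The same toy data gives a DI of 0.0 with a 5-year window and 0.25 with a 10-year window. This is the kind of window sensitivity the study reports.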

preprint2020arXiv

Worm-algorithm-type Simulation of Quantum Transverse-Field Ising Model

We apply a worm algorithm to simulate the quantum transverse-field Ising model in a path-integral representation whose expansion basis is the spin component along the external-field direction. In this representation, a configuration can be regarded as a set of non-intersecting loops constructed from "kinks" for pairwise interactions and spin-down (or spin-up) imaginary-time segments. The wrapping probability for spin-down loops, a dimensionless quantity characterizing the loop topology on a torus, is observed to exhibit small finite-size corrections. It yields a high-precision critical point in two dimensions (2D), $h_c \! =\! 3.044\, 330(6)$, significantly improving on existing results and nearly excluding the previous best estimate, $h_c \! =\! 3.044\, 38 (2)$. At criticality, the fractal dimensions of the loops are estimated as $d_{\ell \downarrow} (1{\rm D}) \! = \! 1.37(1) \! \approx \! 11/8 $ and $d_{\ell \downarrow} (2{\rm D}) \! = \! 1.75 (3)$, consistent with those for the classical 2D and 3D O(1) loop models, respectively. An interesting feature is that in one dimension (1D), both the spin-down and spin-up loops display critical behavior throughout the disordered phase ($ 0 \! \leq \! h \! < \! h_c$). Their fractal dimension, $d_{\ell} \! = \! 1.750 (7)$, is consistent with the hull dimension $d_{\rm H} \! = \! 7/4$ for critical 2D percolation clusters. The current worm algorithm can also be applied to simulate other quantum systems, such as hard-core boson models with pairing interactions.

preprint2019arXiv

Physics of Electrostatic Projection Revealed by High-Speed Video Imaging

Processes based on electrostatic projection are used extensively in industry, e.g., for mineral separation, electrophotography, and the manufacturing of coated abrasives such as sandpaper. Despite decades of engineering practice, questions remain unanswered. In this paper, we present a comprehensive experimental study of the projection of more than 1500 individual spherical alumina particles with a nominal size of 500 $\mu$m, captured by high-speed video imaging and digital image analysis. Based on the flight trajectories of approximately 1100 projected particles, we determined the acquired charge and dynamics as functions of relative humidity (RH) and electric field intensity and compared the results with classical theories. For RH levels of 50% and above, more than 85% of the disposed particles were projected, even at the minimum electric field intensity. This suggests that, beyond a critical electric field intensity, relative humidity plays a more important role in the projection process. We also observed that the charging time is reduced dramatically for RH levels of 50% and above. This is possibly due to the build-up of thin water films around the particles, which can facilitate charge transfer. In contrast, particles projected at 30% RH carried an excessive amount of electric charge, two to four times the saturation value, which might be attributed to triboelectric charging effects. Finally, the physics of electrostatic projection is compared and contrasted with that of induced-charge electrokinetic phenomena. These share a similar field-square scaling, because the applied field acts on its own induced charge to cause particle motion.

preprint2019arXiv

Precision study of $W^-W^+H$ production including parton shower effects at the CERN Large Hadron Collider

The precision study of $W^-W^+H$ production with subsequent $W^{\pm} \rightarrow l^{\pm} \overset{(-)}{\nu}_{l}$ and $H \rightarrow b\bar{b}$ decays at the LHC can help probe the Higgs gauge couplings and search for new physics beyond the SM. In this paper, we calculate the shower-matched NLO QCD correction and the EW corrections from the $q\bar{q}$ annihilation and photon-induced channels to $W^-W^+H$ production at the $14~{\rm TeV}$ LHC. We treat the subsequent decays of the Higgs and $W^{\pm}$ bosons using the MadSpin method. Both the integrated cross section and several kinematic distributions of $W^{\pm}$, $H$ and their decay products are provided. We find that the QCD correction enhances the LO differential cross section significantly. In contrast, the EW correction from the $q\bar{q}$ annihilation channel noticeably suppresses it, especially in the high-energy phase-space region, due to the Sudakov effect. The $q\gamma$- and $\gamma\gamma$-induced relative corrections are positive and insensitive to the transverse momenta of $W^{\pm}$, $H$ and their decay products. These photon-induced corrections compensate the negative $q\bar{q}$-initiated EW correction and become the dominant EW contribution as the $pp$ colliding energy increases. The parton shower (PS) effects on the kinematic distributions are non-negligible: the PS relative correction to the $b$-jet transverse momentum distribution can exceed $100\%$ in the high-$p_{T, b}$ region. We also investigate the scale and PDF uncertainties. We find that the theoretical error of the ${\rm QCD}+{\rm EW}+q\gamma+\gamma\gamma$ corrected integrated cross section mainly comes from the renormalization scale dependence of the QCD correction.

preprint2016arXiv

Integrative modeling of sprout formation in angiogenesis: coupling the VEGFA-Notch signaling in a dynamic stalk-tip cell selection

During angiogenesis, new blood vessels headed by a migrating endothelial tip cell sprout from pre-existing ones. This process is known to be regulated concurrently by two signaling pathways, vascular endothelial growth factor A (VEGFA) and Notch-Delta. Extracellular VEGFA activates the intracellular Notch-Delta pathway in nearby endothelial cells, which results in endothelial differentiation into stalk and tip cells. Retinal astrocytes appear to play a crucial role in polarizing new sprouts by secreting VEGFA. In vivo retinal angiogenesis experiments in neonatal mice generated quantitative data on daily cell counts. They also produced morphological data on the vascular network expanding over a fibronectin-rich matrix. Based on this dataset and others, we developed a cell-based, multiscale mathematical model using the cellular Potts model framework. The model investigates sprout evolution by integrating the VEGFA and Notch-Delta signaling pathways. It incorporates three levels of description: intracellular, intercellular, and extracellular. Starting with a single astrocyte embedded in a fibronectin-rich matrix, we use the model to assess different scenarios regarding VEGFA levels and their interaction with matrix proteins. Simulation results suggest that astrocyte-derived VEGFA gradients together with a heterogeneous ECM reproduce the sprouting morphology. The extension speed agrees with experimental data from the 7-day postnatal mouse retina. The results also reproduce empirical observations in sprouting angiogenesis, including anastomosis, dynamic tip cell competition, and sprout regression as a result of Notch blockade.

preprint2016arXiv

Nanomaterial datasets to advance tomography in scanning transmission electron microscopy

Electron tomography in materials science has flourished with the demand to characterize nanoscale materials in three dimensions (3D). Access to experimental data is vital for developing and validating reconstruction methods that improve resolution and reduce radiation dose requirements. This work presents five high-quality scanning transmission electron microscope (STEM) tomography datasets to address the critical need for open-access data in this field. The datasets represent the current limits of experimental technique and contain structurally complex materials. Included are:

- tomographic series of a hyperbranched Co2P nanocrystal;
- platinum nanoparticles on a carbon nanofibre imaged over the complete 180° tilt range;
- a platinum nanoparticle and a tungsten needle, both imaged at atomic resolution by equal slope tomography;
- a through-focal tilt series of PtCu nanoparticles.

A volumetric reconstruction from every dataset is provided for comparison and for the development of post-processing and visualization techniques. Researchers interested in creating novel data processing and reconstruction algorithms will now have access to state-of-the-art experimental test data.

preprint2015arXiv

Topology of certain symplectic conifold transitions of $CP^{1}$-bundles

In this paper, we prove the existence of certain symplectic conifold transitions on all $CP^{1}$-bundles over symplectic 4-manifolds. This generalizes Smith, Thomas and Yau's examples of symplectic conifold transitions on trivial $CP^{1}$-bundles over Kähler surfaces. Our main result determines the diffeomorphism types of such symplectic conifold transitions of $CP^{1}$-bundles. In particular, this implies that, in the case of trivial $CP^{1}$-bundles over projective surfaces, Smith, Thomas and Yau's examples of symplectic conifold transitions are diffeomorphic to Kähler 3-folds.

preprint2014arXiv

Breaking the Crowther Limit: Combining Depth-Sectioning and Tilt Tomography for High-Resolution, Wide-Field 3D Reconstructions

To date, high-resolution (< 1 nm) imaging of extended objects in three dimensions (3D) has not been possible. A restriction known as the Crowther criterion forces a tradeoff between object size and resolution for 3D reconstructions by tomography. Further, the sub-Angstrom resolution of aberration-corrected electron microscopes is accompanied by a greatly diminished depth of field, causing regions of larger specimens (> 6 nm) to appear blurred or missing. Here we demonstrate a three-dimensional imaging method that overcomes both of these limits. It combines through-focal depth sectioning and traditional tilt-series tomography to reconstruct extended objects with high resolution in all three dimensions. The large convergence angle in aberration-corrected instruments thus becomes a benefit rather than a hindrance to higher-quality reconstructions. A through-focal reconstruction covered a 390 nm 3D carbon support containing over one hundred dealloyed and nanoporous PtCu catalyst particles. It revealed, with sub-nanometer detail, the extensive and connected interior pore structure created by the dealloying instability.
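The Crowther criterion mentioned above is usually stated as d = πD/N. Here, d is the achievable resolution, D the object diameter, and N the number of equally spaced projections over a 180° tilt range. The sketch below evaluates that relation. Taking the 390 nm extent quoted above as D is purely illustrative, and the function names are ours.

```python
import math

def crowther_projections(diameter_nm: float, resolution_nm: float) -> int:
    """Minimum number N of equally spaced projections over 180 degrees: N = pi * D / d."""
    return math.ceil(math.pi * diameter_nm / resolution_nm)

def crowther_resolution(diameter_nm: float, n_projections: int) -> float:
    """Finest resolution d = pi * D / N reachable with N projections of an object of diameter D."""
    return math.pi * diameter_nm / n_projections

n_needed = crowther_projections(390.0, 1.0)   # projections for 1 nm resolution over 390 nm
```

With these inputs the criterion asks for 1226 projections. This illustrates why tilt tomography alone forces the size-resolution tradeoff described above.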

preprint2014arXiv

Multicomponent Gas Diffusion in Porous Electrodes

Multicomponent gas transport is investigated with unprecedented precision by AC impedance analysis of porous YSZ anode-supported solid oxide fuel cells. A fuel gas mixture of H2-H2O-N2 is fed to the anode. Impedance data are measured across a range of hydrogen partial pressures (10-100%) under open-circuit conditions at three temperatures (800 °C, 850 °C, and 900 °C), and with 300 mA applied current at 800 °C. For the first time, analytical formulae for the diffusion resistance ($R_b$) of three standard models of multicomponent gas transport (Fick, Stefan-Maxwell, and Dusty Gas) are derived and tested against the impedance data. The tortuosity is the only fitting parameter, since all the diffusion coefficients are known. Only the Dusty Gas model yields a remarkable data collapse for over twenty experimental conditions, using a constant tortuosity consistent with permeability measurements and the Bruggeman relation. These results establish the accuracy of the Dusty Gas model for multicomponent gas diffusion in porous media. They also confirm the efficacy of electrochemical impedance analysis for precisely determining transport mechanisms.


42 published items

ArcAligner: Adaptive Recursive Aligner for Compressed Context Embeddings in RAG

NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation

OptiSet: Unified Optimizing Set Selection and Ranking for Retrieval-Augmented Generation

VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation

Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling

A Direct Parallel-in-Time Quasi-Boundary Value Method for Inverse Space-Dependent Source Problems

Automatic Parameter Selection for Electron Ptychography via Bayesian Optimization

ByteTrack: Multi-Object Tracking by Associating Every Detection Box

DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion

eeγ production at photon-photon colliders at complete electroweak NLO accuracy

In Defense of Online Models for Video Instance Segmentation

Language as Queries for Referring Video Object Segmentation

Master integrals for mixed QCD-QED corrections to charged-current Drell-Yan production of a massive charged lepton

MetaFormer: A Unified Meta Framework for Fine-Grained Recognition

Model-Based Neural Network and Its Application to Line Spectral Estimation

Multimodal Transformer with Variable-length Memory for Vision-and-Language Navigation

Nonlinear Kalman Filter Using Cramer Rao Bound

Objects in Semantic Topology

Physics-Inspired Unsupervised Classification for Region of Interest in X-Ray Ptychography

Ptychopy: GPU framework for ptychographic data analysis

Real-time 3D analysis during electron tomography using tomviz

SeqFormer: Sequential Transformer for Video Instance Segmentation

Subtype-Former: a deep learning approach for cancer subtype discovery with multi-omics data

Towards Grand Unification of Object Tracking

Two-dimensional Obstructed Atomic Insulators with Fractional Corner Charge in MA$_2$Z$_4$ Family

Hybrid Interference Mitigation Using Analog Prewhitening

Robust Output Regulation and Reinforcement Learning-based Output Tracking Design for Unknown Linear Discrete-Time Systems

Topological classification and diagnosis in magnetically ordered electronic materials

$WWγ$ production at hadron colliders with NLO QCD+EW corrections and parton shower effects

Deep Learning Methods for Lung Cancer Segmentation in Whole-slide Histopathology Images -- the ACDC@LungHP Challenge 2019

Discontinuity-resolving shock-capturing schemes on unstructured grids

Learning to Segment the Tail

Measuring the linear and nonlinear elastic properties of brain tissue with shear waves and inverse analysis

Same data may bring conflict results: a caution to use the disruptive index

Worm-algorithm-type Simulation of Quantum Transverse-Field Ising Model

Physics of Electrostatic Projection Revealed by High-Speed Video Imaging

Precision study of $W^-W^+H$ production including parton shower effects at the CERN Large Hadron Collider

Integrative modeling of sprout formation in angiogenesis: coupling the VEGFA-Notch signaling in a dynamic stalk-tip cell selection

Nanomaterial datasets to advance tomography in scanning transmission electron microscopy

Topology of certain symplectic conifold transitions of $CP^{1}$-bundles

Breaking the Crowther Limit: Combining Depth-Sectioning and Tilt Tomography for High-Resolution, Wide-Field 3D Reconstructions

Multicomponent Gas Diffusion in Porous Electrodes