Source author record

Cheng Chen

Cheng Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

64 works

40 topics

4 close collaborators

preprint · 2026 · arXiv

A Near-optimal, Scalable and Parallelizable Framework for Stochastic Bandits Robust to Adversarial Corruptions and Beyond

We investigate various stochastic bandit problems in the presence of adversarial corruptions. A seminal work for this problem is the BARBAR algorithm [Gupta et al., 2019], which achieves both robustness and efficiency. However, it suffers from a regret of $O(KC)$, which does not match the lower bound of $\Omega(C)$, where $K$ denotes the number of arms and $C$ denotes the corruption level. In this paper, we first improve the BARBAR algorithm by proposing a novel framework called BARBAT, which eliminates the factor of $K$ to achieve an optimal regret bound up to a logarithmic factor. We also extend BARBAT to various settings, including multi-agent bandits, graph bandits, combinatorial semi-bandits and batched bandits. Compared with the Follow-the-Regularized-Leader framework, our methods are more amenable to parallelization, making them suitable for multi-agent and batched bandit settings, and they incur lower computational costs, particularly in semi-bandit problems. Numerical experiments verify the efficiency of the proposed methods.

preprint · 2026 · arXiv

Elimination Templates in Macaulay2

We introduce the package EliminationTemplates for the Macaulay2 computer algebra system, which provides tools for constructing automatic solvers for families of zero-dimensional radical ideals depending on algebraically independent parameters. This article provides a self-contained description of how elimination templates are constructed for such families and their specialization properties. Additionally, we describe the main functionality and datatypes provided by our package, and illustrate its usage on several examples, including applications in computer vision, where elimination templates originated.

preprint · 2026 · arXiv

Fourier-Jacobi models for real symplectic-metaplectic groups: the basic case

In this paper, we generalize the method of Gan-Ichino and Atobe in [GI16][A18] to the field of real numbers and prove the basic tempered case of the local Gan-Gross-Prasad conjecture for Fourier-Jacobi models of symplectic-metaplectic groups, based on the tempered case of the conjecture for Bessel models proved in [CL22] by Chen-Luo.

preprint · 2026 · arXiv

From One-to-One to Many-to-Many: Dynamic Cross-Layer Injection for Deep Vision-Language Fusion

Vision-Language Models (VLMs) create a severe visual feature bottleneck by using a crude, asymmetric connection that links only the output of the vision encoder to the input of the large language model (LLM). This static architecture fundamentally limits the ability of LLMs to achieve comprehensive alignment with hierarchical visual knowledge, compromising their capacity to accurately integrate local details with global semantics into coherent reasoning. To resolve this, we introduce Cross-Layer Injection (CLI), a novel and lightweight framework that forges a dynamic many-to-many bridge between the two modalities. CLI consists of two synergistic, parameter-efficient components: an Adaptive Multi-Projection (AMP) module that harmonizes features from diverse vision layers, and an Adaptive Gating Fusion (AGF) mechanism that empowers the LLM to selectively inject the most relevant visual information based on its real-time decoding context. We validate the effectiveness and versatility of CLI by integrating it into LLaVA-OneVision and LLaVA-1.5. Extensive experiments on 18 diverse benchmarks demonstrate significant performance improvements, establishing CLI as a scalable paradigm that unlocks deeper multimodal understanding by granting LLMs on-demand access to the full visual hierarchy.

preprint · 2026 · arXiv

GEM: Gaussian Evolution Model for Occupancy Forecasting and Motion Planning

Future 3D semantic occupancy forecasting and motion planning are central to autonomous driving, as they require models to reason about how surrounding scenes evolve and how the ego vehicle should act. Existing occupancy world models commonly discretize scenes into latent embeddings, volumetric features, or quantized tokens, and forecast future states through fixed-step autoregressive generation. This limits temporal flexibility, obscures scene evolution, accumulates errors over long horizons, and poorly matches the continuous-time dynamics of real driving scenes. We propose GEM, a Gaussian Evolution Model for non-autoregressive occupancy world modeling, where driving scenes are represented as explicit continuous 4D Gaussian primitives with learned dynamics. Instead of rolling out future occupancy states step by step, GEM directly queries the Gaussian world representation at arbitrary timestamps and splats the corresponding conditional 3D Gaussians into semantic occupancy volumes. This enables efficient forecasting over the full horizon while retaining a compact and interpretable scene representation. By decoupling spatial geometry, temporal support, and primitive motion, GEM makes the predicted world easier to inspect, as each primitive's evolution can be followed continuously over time. The same representation also supports motion planning by predicting future ego trajectories from the learned Gaussian world. Extensive experiments show that GEM achieves state-of-the-art future semantic occupancy forecasting and strong motion planning performance, while providing flexible temporal querying.

preprint · 2026 · arXiv

HTPO: Towards Exploration-Exploitation Balanced Policy Optimization via Hierarchical Token-level Objective Control

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a pivotal technique for enhancing the reasoning capabilities of Large Language Models (LLMs). However, the de facto practice of mainstream RL algorithms is to treat all tokens of one response equally and assign the same optimization objective to each token, failing to provide granular guidance for the reasoning process. In Chain-of-Thought (CoT) reasoning, however, different tokens usually play distinct roles. As a result, current RL algorithms lack an effective mechanism to dynamically balance the exploration-exploitation trade-off during learning. To this end, we propose Hierarchical Token-level Objective Control Policy Optimization (HTPO), a novel RL algorithm that takes a divide-and-conquer approach, hierarchically partitioning the response tokens into specific functional groups along three aspects (i.e., prompt difficulty, answer correctness, and token entropy). Within each group, according to the tokens' contributions to exploration or exploitation, we design specialized optimization objectives that help each token fulfill its expected function. In this way, HTPO can achieve a more balanced exploration-exploitation trade-off. Extensive experiments on challenging reasoning benchmarks validate the superiority of our HTPO algorithm, which significantly outperforms the strong DAPO baseline (e.g., +8.6% and +6.7% on AIME'24 and AIME'25, respectively). When scaling test-time compute, the HTPO-trained model maintains a consistent performance advantage over the DAPO baseline, and the gap widens as the sampling budget increases, validating that our adaptive token-level control method fosters effective exploration without sacrificing exploitation performance. Code will be available at https://github.com/xcyao00/HTPO.

preprint · 2026 · arXiv

Multi-Subspace Multi-Modal Modeling for Diffusion Models: Estimation, Convergence and Mixture of Experts

Recently, diffusion models have achieved strong performance with a small dataset of size $n$ and a fast optimization process. However, the estimation error of diffusion models suffers from the curse of dimensionality, scaling as $n^{-1/D}$ with the data dimension $D$. Since images are usually a union of low-dimensional manifolds, current works model the data as a union of linear subspaces with a Gaussian latent and achieve a $1/\sqrt{n}$ bound. Though this modeling reflects the multi-manifold property, the Gaussian latent cannot capture the multi-modal property of the latent manifold. To bridge this gap, we propose the mixture subspace of low-rank mixture of Gaussian (MoLR-MoG) modeling, which models the target data as a union of $K$ linear subspaces, where each subspace admits a mixture of Gaussian latent ($n_k$ modes with dimension $d_k$). With this modeling, the corresponding score function naturally has a mixture of experts (MoE) structure, captures the multi-modal information, and is nonlinear. We first conduct real-world experiments to show that the generation results of the MoE-latent MoG NN are much better than those of the MoE-latent Gaussian score. Furthermore, the MoE-latent MoG NN achieves performance comparable to that of the MoE-latent Unet, which has $10 \times$ the parameters. These results indicate that the MoLR-MoG modeling is reasonable and suitable for real-world data. After that, based on such a MoE-latent MoG score, we provide an estimation error of $R^4\sqrt{\sum_{k=1}^K n_k}\sqrt{\sum_{k=1}^K n_k d_k}/\sqrt{n}$, which escapes the curse of dimensionality by exploiting the data structure. Finally, we study the optimization process and prove a convergence guarantee under the MoLR-MoG modeling. Combining these results, under a setting close to real-world data, this work explains why diffusion models require only a small training sample and enjoy a fast optimization process to achieve strong performance.
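
To make the gap between the two rates in this abstract concrete, the following minimal Python sketch evaluates the curse-of-dimensionality rate $n^{-1/D}$ next to the stated MoLR-MoG bound. The function names and the example values for $R$, $n_k$, $d_k$, $D$ and $n$ are hypothetical, chosen only for illustration. Both expressions ignore constants and logarithmic factors, so the numbers show scaling behavior rather than actual errors.

```python
import math
from typing import Sequence


def curse_rate(n: int, D: int) -> float:
    """Curse-of-dimensionality rate n^(-1/D) for ambient dimension D."""
    return n ** (-1.0 / D)


def molr_mog_bound(R: float, n_ks: Sequence[int], d_ks: Sequence[int], n: int) -> float:
    """The stated bound R^4 * sqrt(sum n_k) * sqrt(sum n_k d_k) / sqrt(n).

    n_ks[k] is the number of Gaussian modes in subspace k and d_ks[k] its
    latent dimension. Constants hidden by the abstract are NOT modelled.
    """
    if len(n_ks) != len(d_ks):
        raise ValueError("n_ks and d_ks must have one entry per subspace")
    total_modes = sum(n_ks)
    weighted_dims = sum(nk * dk for nk, dk in zip(n_ks, d_ks))
    return R ** 4 * math.sqrt(total_modes) * math.sqrt(weighted_dims) / math.sqrt(n)


if __name__ == "__main__":
    # Hypothetical setting: 3 subspaces, a few modes each, ambient dimension 784.
    n_ks, d_ks, D = [2, 3, 2], [8, 8, 16], 784
    for n in (10**4, 10**6, 10**8):
        print(n, curse_rate(n, D), molr_mog_bound(1.0, n_ks, d_ks, n))
```

In this illustrative setting the first rate barely moves as $n$ grows, while the second shrinks by a factor of 10 for every 100-fold increase in $n$.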

preprint · 2026 · arXiv

On the orbital evolution of binaries with polar circumbinary disks

Binaries occur in many astrophysical systems, from young protostellar binaries in star forming regions to supermassive black hole binaries in galaxy centers. In many cases, a circumbinary disk of gas forms around the binary with an orbit that may be misaligned to the binary plane. Misaligned disks around nearly circular binaries evolve into disks that are either aligned or counteraligned with the binary orbit. However, if the binary is sufficiently eccentric, then it can be more likely that the disk ends up in a polar-aligned configuration in which the disk angular momentum vector aligns with the binary eccentricity vector. We use Smoothed Particle Hydrodynamics simulations, evolved to an approximate steady state under mass injection, to determine the orbital evolution of a binary with a polar-aligned disk for a range of binary-disk parameters. We find that, in all of the cases we have simulated, the binary shrinks with time. The decay rate is larger than for binaries surrounded by aligned or retrograde disks with matched disk parameters. The rate of shrinkage is largely unaltered by the size of the sink radii employed for the binary stars, but for small enough sink radii some of the models exhibit long-lived polar circumprimary disks, which are continually fed mass from the circumbinary disk. We discuss our results in the contexts of planet formation in young polar-aligned disks and merging supermassive black holes in galaxy centers.

preprint · 2026 · arXiv

RidgeWalker: Perfectly Pipelined Graph Random Walks on FPGAs

Graph Random Walks (GRWs) offer efficient approximations of key graph properties and have been widely adopted in many applications. However, GRW workloads are notoriously difficult to accelerate due to their strong data dependencies, irregular memory access patterns, and imbalanced execution behavior. While recent work explores FPGA-based accelerators for GRWs, existing solutions fall far short of hardware potential due to inefficient pipelining and static scheduling. This paper presents RidgeWalker, a high-performance GRW accelerator designed for datacenter FPGAs. The key insight behind RidgeWalker is that the Markov property of GRWs allows decomposition into stateless, fine-grained tasks that can be executed out-of-order without compromising correctness. Building on this, RidgeWalker introduces an asynchronous pipeline architecture with a feedback-driven scheduler grounded in queuing theory, enabling perfect pipelining and adaptive load balancing. We prototype RidgeWalker on datacenter FPGAs and evaluate it across a range of GRW algorithms and real-world graph datasets. Experimental results demonstrate that RidgeWalker achieves an average speedup of 7.0x over state-of-the-art FPGA solutions and 8.1x over GPU solutions, with peak speedups of up to 71.0x and 22.9x, respectively. The source code is publicly available at https://github.com/Xtra-Computing/RidgeWalker.
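
The Markov-property observation in this abstract can be illustrated in software. The sketch below is not RidgeWalker's hardware design or scheduler; it is a minimal Python illustration, with hypothetical names (`step`, `run_walks`), of how a first-order random walk decomposes into tasks of the form (walk id, current vertex, hops remaining) that carry no other history. Because each task depends only on the current vertex, pending tasks can be served in any order and every walk still comes out as a valid path.

```python
import random
from collections import deque
from typing import Dict, List, Optional, Tuple

Task = Tuple[int, int, int]  # (walk_id, current_vertex, hops_remaining)


def step(graph: Dict[int, List[int]], task: Task, rng: random.Random) -> Optional[Task]:
    """Advance one walk by one hop using only the current vertex (Markov property)."""
    walk_id, vertex, remaining = task
    neighbors = graph[vertex]
    if remaining == 0 or not neighbors:
        return None  # walk finished or stuck at a vertex with no out-edges
    return (walk_id, rng.choice(neighbors), remaining - 1)


def run_walks(graph: Dict[int, List[int]], starts: List[int],
              length: int, seed: int = 0) -> Dict[int, List[int]]:
    """Run several walks by serving pending tasks in an arbitrary order."""
    rng = random.Random(seed)
    paths = {i: [v] for i, v in enumerate(starts)}
    tasks = deque((i, v, length) for i, v in enumerate(starts))
    while tasks:
        # Pick any pending task: rotating brings a random one to the front.
        tasks.rotate(-rng.randrange(len(tasks)))
        nxt = step(graph, tasks.popleft(), rng)
        if nxt is not None:
            paths[nxt[0]].append(nxt[1])
            tasks.append(nxt)
    return paths


if __name__ == "__main__":
    triangle = {0: [1, 2], 1: [2, 0], 2: [0, 1]}
    for walk_id, path in run_walks(triangle, [0, 1, 2], length=5).items():
        print(walk_id, path)
```

The random task selection stands in for out-of-order execution; a real accelerator would choose the order for throughput rather than at random.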

preprint · 2026 · arXiv

SynerMedGen: Synergizing Medical Multimodal Understanding with Generation via Task Alignment

Unifying multimodal understanding and generation is a compelling frontier that is beginning to emerge in the medical field. However, the limited existing unified medical models typically treat understanding and generation as disjoint objectives, lacking a meaningful functional synergy. In this work, we identify and address a critical question in unified medical modeling: what form of understanding truly benefits generation. We present SynerMedGen, a unified framework built on the proposed principle of generation-aligned understanding, which synergizes understanding objectives with generation tasks via task alignment. SynerMedGen introduces three generation-aligned understanding tasks and a two-stage training strategy that transfers generation-beneficial representations learned during understanding training to medical image synthesis. Remarkably, even with understanding training alone, our SynerMedGen achieves strong zero-shot performance across 22 medical image synthesis tasks and demonstrates robust generalization to unseen datasets. When combined with generation training, SynerMedGen consistently outperforms state-of-the-art specialized medical image synthesis models as well as recent unified medical models. We also release a large-scale dataset named SynerMed consisting of 1M paired synthesis samples and 2M generation-derived understanding instances to support further research on understanding-generation synergy. Our project can be accessed at https://github.com/Mhilab/SynerMedGen.

preprint · 2026 · arXiv

TimeMM: Time-as-Operator Spectral Filtering for Dynamic Multimodal Recommendation

Multimodal recommendation improves user modeling by integrating collaborative signals with heterogeneous item content. In real applications, user interests evolve over time and exhibit nonstationary dynamics, where different preference factors change at different rates. This challenge is amplified in multimodal settings because visual and textual cues can dominate decisions under different temporal regimes. Despite strong progress, most multimodal recommenders still rely on static interaction graphs or coarse temporal heuristics, which limits their ability to model continuous preference evolution with fine-grained temporal adaptation. To address these limitations, we propose TimeMM, a time-conditioned spectral filtering framework for dynamic multimodal recommendation. TimeMM instantiates Time-as-Operator by mapping interaction recency to a family of parametric temporal kernels that reweight edges on the user-item graph, producing component-specific representations without explicit eigendecomposition. To capture nonstationary interests, we introduce Adaptive Spectral Filtering that mixes the operator bank according to temporal context, yielding prediction-specific effective spectral responses. To account for modality-specific temporal sensitivity, we further propose Spectral-Aware Modality Routing that calibrates visual and textual contributions conditioned on the same temporal context. Finally, a ranking-space Spectral Diversity Regularization encourages complementary expert behaviors and prevents filter-bank collapse. Extensive experiments on real-world benchmarks demonstrate that TimeMM consistently outperforms state-of-the-art multimodal recommenders while maintaining linear-time scalability.

preprint · 2023 · arXiv

Diffusion Model based Semi-supervised Learning on Brain Hemorrhage Images for Efficient Midline Shift Quantification

Brain midline shift (MLS) is one of the most critical factors to be considered for clinical diagnosis and treatment decision-making for intracranial hemorrhage. Existing computational methods on MLS quantification not only require intensive labeling in millimeter-level measurement but also suffer from poor performance due to their dependence on specific landmarks or simplified anatomical assumptions. In this paper, we propose a novel semi-supervised framework to accurately measure the scale of MLS from head CT scans. We formulate the MLS measurement task as a deformation estimation problem and solve it using a few MLS slices with sparse labels. Meanwhile, with the help of diffusion models, we are able to use a great number of unlabeled MLS data and 2793 non-MLS cases for representation learning and regularization. The extracted representation reflects how the image is different from a non-MLS image and regularization serves an important role in the sparse-to-dense refinement of the deformation field. Our experiment on a real clinical brain hemorrhage dataset has achieved state-of-the-art performance and can generate interpretable deformation fields.

preprint · 2022 · arXiv

3D-model ShapeNet Core Classification using Meta-Semantic Learning

Understanding 3D point cloud models for learning purposes has become an imperative challenge for real-world identification tasks such as those in autonomous driving systems. A wide variety of solutions using deep learning have been proposed for point cloud segmentation, object detection, and classification. These methods, however, often require a considerable number of model parameters and are computationally expensive. We study a semantic dimension of given 3D data points and propose an efficient method called Meta-Semantic Learning (Meta-SeL). Meta-SeL is an integrated framework that leverages two input 3D local points (input 3D models and part-segmentation labels), providing a time- and cost-efficient yet precise projection model for a number of 3D recognition tasks. The results indicate that Meta-SeL yields competitive performance in comparison with other, more complex state-of-the-art work. Moreover, being invariant to random shuffling, Meta-SeL is resilient to translation as well as jittering noise.

preprint · 2022 · arXiv

5 Gbps Optical Wireless Communication Using Commercial SPAD Array Receivers

Photon counting detectors such as single-photon avalanche diode (SPAD) arrays can be utilized to improve the sensitivity of optical wireless communication (OWC) systems. However, the achievable data rate of SPAD-based OWC systems is strongly limited by the nonlinearity induced by SPAD dead time. In this work, the performance of a SPAD-based OWC system with orthogonal frequency division multiplexing (OFDM) is investigated and compared with that of on-off keying (OOK). We employ nonlinear equalization, peak-to-average power ratio optimization by adjusting the OFDM clipping level, and adaptive bit and energy loading to achieve a record experimental data rate of 5 Gbps. The contrasting optimal regimes of operation of the two modulation schemes are also demonstrated.

preprint · 2022 · arXiv

A Unified Framework for Campaign Performance Forecasting in Online Display Advertising

Advertisers usually enjoy the flexibility to choose criteria like target audience, geographic area and bid price when planning a campaign for online display advertising, but they lack forecast information on campaign performance to optimize delivery strategies in advance, resulting in a waste of labour and budget on feedback adjustments. In this paper, we aim to forecast key performance indicators for new campaigns given any set of criteria. Interpretable and accurate results could enable advertisers to manage and optimize their campaign criteria. There are several challenges in this task. First, platforms usually offer advertisers various criteria when they plan an advertising campaign, and it is difficult to estimate campaign performance in a unified way because of the great differences among bidding types. Furthermore, complex strategies applied in the bidding system cause large fluctuations in campaign performance, making accurate estimation extremely difficult. To address the above challenges, we propose a novel Campaign Performance Forecasting framework, which first reproduces campaign performance on historical logs under various bidding types with a unified replay algorithm, in which essential auction processes like match and rank are replayed, ensuring the interpretability of the forecast results. Then, we introduce a multi-task learning method to calibrate the estimation deviation caused by hard-to-reproduce bidding strategies in replay. The method captures mixture calibration patterns among related forecast indicators to map the estimated results to the true ones, improving both accuracy and efficiency significantly. Experimental results on a dataset from Taobao.com demonstrate that the proposed framework outperforms other baselines by a large margin, and an online A/B test verifies its effectiveness in the real world.

preprint · 2022 · arXiv

A Unified Two-Stage Group Semantics Propagation and Contrastive Learning Network for Co-Saliency Detection

Co-saliency detection (CoSOD) aims at discovering the repetitive salient objects from multiple images. Two primary challenges are group semantics extraction and noise object suppression. In this paper, we present a unified Two-stage grOup semantics PropagatIon and Contrastive learning NETwork (TopicNet) for CoSOD. TopicNet can be decomposed into two substructures, including a two-stage group semantics propagation module (TGSP) to address the first challenge and a contrastive learning module (CLM) to address the second challenge. Concretely, for TGSP, we design an image-to-group propagation module (IGP) to capture the consensus representation of intra-group similar features and a group-to-pixel propagation module (GPP) to build the relevancy of consensus representation. For CLM, with the design of positive samples, the semantic consistency is enhanced. With the design of negative samples, the noise objects are suppressed. Experimental results on three prevailing benchmarks reveal that TopicNet outperforms other competitors in terms of various evaluation metrics.

preprint · 2022 · arXiv

Approaching a Minimal Topological Electronic Structure in Antiferromagnetic Topological Insulator MnBi2Te4 via Surface Modification

The topological electronic structure plays a central role in the non-trivial physical properties of topological quantum materials. A minimal, hydrogen-atom-like topological electronic structure is desirable for research. In this work, we demonstrate an effort towards the realization of such a system in the intrinsic magnetic topological insulator MnBi2Te4, by manipulating the topological surface state (TSS) via surface modification. Using high-resolution laser- and synchrotron-based angle-resolved photoemission spectroscopy (ARPES), we find that the TSS in MnBi2Te4 is heavily hybridized with a trivial Rashba-type surface state (RSS), which can be efficiently removed by in situ surface potassium (K) dosing. By employing multiple experimental methods to characterize the K-dosed surface, we attribute this modification to electrochemical reactions of K clusters on the surface. Our work not only gives a clear band assignment in MnBi2Te4, but also provides possible new routes for accentuating the topological behavior in magnetic topological quantum materials.

preprint · 2022 · arXiv

BLINK with Elasticsearch for Efficient Entity Linking in Business Conversations

An Entity Linking system aligns the textual mentions of entities in a text to their corresponding entries in a knowledge base. However, deploying a neural entity linking system for efficient real-time inference in production environments is a challenging task. In this work, we present a neural entity linking system that connects the product- and organization-type entities in business conversations to their corresponding Wikipedia and Wikidata entries. The proposed system leverages Elasticsearch to ensure inference efficiency when deployed on a resource-limited cloud machine, and obtains significant improvements in terms of inference speed and memory consumption while retaining high accuracy.

preprint · 2022 · arXiv

Contrastive Cross-Modal Knowledge Sharing Pre-training for Vision-Language Representation Learning and Retrieval

Recently, cross-modal pre-training has become a hotspot because of its wide application in various downstream research areas including retrieval, captioning, question answering and so on. However, existing methods adopt a one-stream pre-training model to explore a unified vision-language representation for cross-modal retrieval, which easily suffers from an explosion in computation. Moreover, although conventional double-stream structures are quite efficient, they still lack vital cross-modal interactions, resulting in low performance. Motivated by these challenges, we put forward Contrastive Cross-Modal Knowledge Sharing Pre-training (COOKIE) to learn joint text-image representations. Structurally, COOKIE adopts the traditional double-stream structure because of its acceptable time consumption. To overcome the inherent defects of the double-stream structure mentioned above, we design two effective modules. Concretely, the first module is a weight-sharing transformer built on top of the visual and textual encoders, aiming to semantically align text and image. This design enables the visual and textual paths to focus on the same semantics. The second consists of three specially designed contrastive learning objectives, aiming to share knowledge between different models. The shared cross-modal knowledge greatly improves unimodal representation learning, benefiting single-modal retrieval tasks. Extensive experimental results on multi-modal matching tasks, including cross-modal retrieval, text matching, and image retrieval, show the superiority of our pre-training model in computational efficiency and statistical indicators.

preprint · 2022 · arXiv

Developing a Production System for Purpose of Call Detection in Business Phone Conversations

For agents at a contact centre receiving calls, the most important piece of information is the reason for a given call. An agent cannot provide support on a call if they do not know why a customer is calling. In this paper we describe our implementation of a commercial system to detect Purpose of Call statements in English business call transcripts in real time. We present a detailed analysis of types of Purpose of Call statements and language patterns related to them, discuss an approach to collect rich training data by bootstrapping from a set of rules to a neural model, and describe a hybrid model which consists of a transformer-based classifier and a set of rules by leveraging insights from the analysis of call transcripts. The model achieved 88.6 F1 on average in various types of business calls when tested on real life data and has low inference time. We reflect on the challenges and design decisions when developing and deploying the system.

preprint · 2022 · arXiv

Direct Visualization and Manipulation of Tunable Quantum Well State in Semiconducting Nb2SiTe4

Quantum well states (QWSs) can form at the surfaces or interfaces of materials with a confinement potential. They have broad applications in electronic and optical devices such as high-mobility electron transistors, photodetectors and quantum well lasers. The properties of the QWSs are usually key factors in the performance of these devices. However, direct visualization and manipulation of such states are in general challenging. In this work, by using angle-resolved photoemission spectroscopy (ARPES) and scanning tunneling microscopy/spectroscopy (STM/STS), we directly probe the QWSs generated at the vacuum interface of a narrow-band-gap semiconductor, Nb2SiTe4. Interestingly, the position and splitting of the QWSs can be easily manipulated via potassium (K) dosage onto the sample surface. Our results suggest that Nb2SiTe4 is an intriguing semiconductor system in which to study and engineer QWSs, with great potential in device applications.

preprint · 2022 · arXiv

DLTTA: Dynamic Learning Rate for Test-time Adaptation on Cross-domain Medical Images

Test-time adaptation (TTA) has become an increasingly important topic for efficiently tackling the cross-domain distribution shift at test time for medical images from different institutions. Previous TTA methods share a common limitation: they use a fixed learning rate for all test samples. Such a practice is sub-optimal for TTA, because test data may arrive sequentially, so the scale of the distribution shift can change frequently. To address this problem, we propose a novel dynamic learning rate adjustment method for test-time adaptation, called DLTTA, which dynamically modulates the amount of weight update for each test image to account for the differences in their distribution shift. Specifically, our DLTTA is equipped with a memory-bank-based estimation scheme to effectively measure the discrepancy of a given test sample. Based on this estimated discrepancy, a dynamic learning rate adjustment strategy is then developed to achieve a suitable degree of adaptation for each test sample. The effectiveness and general applicability of our DLTTA are extensively demonstrated on three tasks: retinal optical coherence tomography (OCT) segmentation, histopathological image classification, and prostate 3D MRI segmentation. Our method achieves effective and fast test-time adaptation with consistent performance improvement over current state-of-the-art test-time adaptation methods. Code is available at: https://github.com/med-air/DLTTA.

preprint · 2022 · arXiv

Exploring Intra- and Inter-Video Relation for Surgical Semantic Scene Segmentation

Automatic surgical scene segmentation is fundamental for facilitating cognitive intelligence in the modern operating theatre. Previous works rely on conventional aggregation modules (e.g., dilated convolution, convolutional LSTM), which only make use of the local context. In this paper, we propose a novel framework, STswinCL, that explores the complementary intra- and inter-video relations to boost segmentation performance by progressively capturing the global context. We first develop a hierarchical Transformer to capture the intra-video relation, which includes richer spatial and temporal cues from neighboring pixels and previous frames. A joint space-time window shift scheme is proposed to efficiently aggregate these two cues into each pixel embedding. Then, we explore the inter-video relation via pixel-to-pixel contrastive learning, which gives the global embedding space a good structure. A multi-source contrast training objective is developed to group the pixel embeddings across videos under ground-truth guidance, which is crucial for learning the global properties of the whole dataset. We extensively validate our approach on two public surgical video benchmarks, the EndoVis18 Challenge and the CaDIS dataset. Experimental results demonstrate the promising performance of our method, which consistently exceeds previous state-of-the-art approaches. Code is available at https://github.com/YuemingJin/STswinCL.

preprint · 2022 · arXiv

Impression Allocation and Policy Search in Display Advertising

In online display advertising, guaranteed contracts and real-time bidding (RTB) are two major ways for a publisher to sell impressions. For large publishers, simultaneously selling impressions through both guaranteed contracts and in-house RTB has become a popular choice. Generally speaking, a publisher needs to derive an impression allocation strategy between guaranteed contracts and RTB to maximize its overall outcome (e.g., revenue and/or impression quality). However, deriving the optimal strategy is not a trivial task: the strategy should encourage incentive compatibility in RTB and tackle common challenges in real-world applications such as unstable traffic patterns (e.g., changes in impression volume and bid landscape). In this paper, we formulate impression allocation as an auction problem where each guaranteed contract submits virtual bids for individual impressions. With this formulation, we derive the optimal bidding functions for the guaranteed contracts, which result in the optimal impression allocation. To address the unstable traffic pattern challenge and achieve the optimal overall outcome, we propose a multi-agent reinforcement learning method to adjust the bids from each guaranteed contract; the method is simple, converges efficiently and is scalable. Experiments conducted on real-world datasets demonstrate the effectiveness of our method.

preprint2022arXiv

Inverse is Better! Fast and Accurate Prompt for Few-shot Slot Tagging

Prompting methods have recently achieved impressive success in few-shot learning. These methods modify input samples with prompt sentence pieces, and decode label tokens to map samples to corresponding labels. However, such a paradigm is very inefficient for the task of slot tagging. Since slot tagging samples are multiple consecutive words in a sentence, the prompting methods have to enumerate all n-gram token spans to find all the possible slots, which greatly slows down the prediction. To tackle this, we introduce an inverse paradigm for prompting. Unlike classic prompts, which map tokens to labels, we reversely predict slot values given slot types. Such inverse prompting only requires a one-turn prediction for each slot type and greatly speeds up the prediction. In addition, we propose a novel Iterative Prediction Strategy, from which the model learns to refine predictions by considering the relations between different slot types. We find, somewhat surprisingly, that the proposed method not only predicts faster but also significantly improves performance (by over 6.1 F1 points in the 10-shot setting) and achieves new state-of-the-art performance.

preprint2022arXiv

MuZero with Self-competition for Rate Control in VP9 Video Compression

Video streaming usage has seen a significant rise as entertainment, education, and business increasingly rely on online video. Optimizing video compression has the potential to increase access and quality of content to users, and reduce energy use and costs overall. In this paper, we present an application of the MuZero algorithm to the challenge of video compression. Specifically, we target the problem of learning a rate control policy to select the quantization parameters (QP) in the encoding process of libvpx, an open source VP9 video compression library widely used by popular video-on-demand (VOD) services. We treat this as a sequential decision making problem to maximize the video quality with an episodic constraint imposed by the target bitrate. Notably, we introduce a novel self-competition based reward mechanism to solve constrained RL with variable constraint satisfaction difficulty, which is challenging for existing constrained RL methods. We demonstrate that the MuZero-based rate control achieves an average 6.28% reduction in size of the compressed videos for the same delivered video quality level (measured as PSNR BD-rate) compared to libvpx's two-pass VBR rate control policy, while having better constraint satisfaction behavior.

preprint2022arXiv

Neural Network Based Pore Flow Field Prediction in Porous Media Using Super Resolution

Direct pore-scale simulations of fluid flow through porous media are computationally expensive to perform for realistic systems. Previous works have demonstrated that neural networks can use the geometry of the microstructure of porous media to predict the velocity fields therein. However, such trained neural networks do not perform well for unseen porous media with a large degree of heterogeneity. In this study we propose that incorporating a coarse velocity field in the input of neural networks is an effective way to improve the prediction performance. The coarse velocity field can be simulated at a low computational cost and provides global information to regularize the ill-posedness of the learning problem, which is usually caused by the use of local geometries due to computational resource constraints. We show that incorporating the coarse-mesh velocity field significantly improves the prediction accuracy of the fine-mesh velocity field compared with the prediction that relies on geometric information alone, especially for a porous medium with a large interior vuggy pore space. We also show the flexibility of training the network using coarse velocity fields with various resolutions. The results suggest that even when using a coarse velocity field with a very low resolution, the predictions are still enhanced and close to the ground truths. The feasibility of the method is further demonstrated by testing the trained network on real rocks. This study highlights the merits of incorporating a coarse-mesh velocity field into the input for neural networks, which provides global, physics-based information for the model, thereby improving the model's generalization capability.

preprint2022arXiv

Observation of Dimension-Crossover of a Tunable 1D Dirac Fermion in Topological Semimetal NbSi$_x$Te$_2$

Condensed matter systems in low dimensions exhibit emergent physics that does not exist in three dimensions. When electrons are confined to one dimension (1D), some significant electronic states appear, such as charge density waves, spin-charge separation and the Su-Schrieffer-Heeger (SSH) topological state. However, a clear understanding of how the 1D electronic properties connect with topology is currently lacking. Here we systematically investigated the characteristic 1D Dirac fermion electronic structure originating from the metallic NbTe$_2$ chains on the surface of the composition-tunable layered compound NbSi$_x$Te$_2$ ($x$ = 0.40 and 0.43) using angle-resolved photoemission spectroscopy. We found that the Dirac fermion forms a Dirac nodal line structure protected by the combined $\widetilde{\mathcal{M}}_{\rm y}$ and time-reversal symmetry T, which establishes the NbSi$_x$Te$_2$ system as a topological semimetal, consistent with the ab-initio calculations. As $x$ decreases, the interaction between adjacent NbTe$_2$ chains increases and the Dirac fermion goes through a dimension-crossover from 1D to 2D, as evidenced by the variation of its Fermi surface and Fermi velocity across the Brillouin zone, consistent with a Dirac SSH model. Our findings demonstrate a tunable 1D Dirac electron system, which offers a versatile platform for the exploration of intriguing 1D physics and device applications.

preprint2022arXiv

Online Active Regression

Active regression considers a linear regression problem where the learner receives a large number of data points but can only observe a small number of labels. Since online algorithms can deal with incremental training data and take advantage of low computational cost, we consider an online extension of the active regression problem: the learner receives data points one by one and immediately decides whether it should collect the corresponding labels. The goal is to efficiently maintain the regression of received data points with a small budget of label queries. We propose novel algorithms for this problem under $\ell_p$ loss where $p\in[1,2]$. To achieve a $(1+ε)$-approximate solution, our proposed algorithms only require $\tilde{\mathcal{O}}(ε^{-1} d \log(nκ))$ queries of labels, where $n$ is the number of data points and $κ$ is a quantity, called the condition number, of the data points. The numerical results verify our theoretical results and show that our methods have comparable performance with offline active regression algorithms.

preprint2022arXiv

Online Reflective Learning for Robust Medical Image Segmentation

Deep segmentation models often risk failure when the testing image presents unseen distributions. Improving model robustness against these risks is crucial for the large-scale clinical application of deep models. In this study, inspired by the human learning cycle, we propose a novel online reflective learning framework (RefSeg) to improve segmentation robustness. Based on the reflection-on-action concept, RefSeg first drives the deep model to take action to obtain a semantic segmentation. Then, RefSeg triggers the model to reflect on itself. Because making deep models realize their segmentation failures during testing is challenging, RefSeg synthesizes a realistic proxy image from the semantic mask to help deep models build intuitive and effective reflections. This proxy translates and emphasizes the segmentation flaws. By maximizing the structural similarity between the raw input and the proxy, the reflection-on-action loop is closed and segmentation robustness is improved. RefSeg runs in the testing phase and is general for segmentation models. Extensive validation on three medical image segmentation tasks with a public cardiac MR dataset and two large in-house ultrasound datasets shows that RefSeg remarkably improves model robustness and reports state-of-the-art performance over strong competitors.

preprint2022arXiv

Performance Analysis of SPAD-Based Optical Wireless Communication with OFDM

In recent years, there has been a growing interest in the use of single-photon avalanche diodes (SPADs) in optical wireless communication (OWC). A SPAD operates in the Geiger mode and can act as a photon-counting receiver, obviating the need for a transimpedance amplifier (TIA). Although a SPAD receiver can provide higher sensitivity than traditional linear photodetectors, it suffers from dead-time-induced nonlinearity. To improve the data rates of SPAD-based OWC systems, optical orthogonal frequency division multiplexing (OFDM) can be employed. This paper provides a comprehensive theoretical analysis of SPAD-based OWC systems using OFDM signalling, considering the effects of signal clipping, SPAD nonlinearity, and signal-dependent shot noise. An equivalent additive Gaussian noise channel model is proposed to describe the performance of the SPAD-based OFDM system. The statistics of the proposed channel model and the analytical expressions of the signal-to-noise ratio (SNR) and bit error rate (BER) are derived in closed form. By means of extensive numerical results, the impact of the unique receiver nonlinearity on the system performance is investigated. The results provide new insights into different optical power regimes of reliable operation for SPAD-based OFDM systems, even well beyond the SPAD saturation level.

preprint2022arXiv

Persistent exchange splitting in a chiral helimagnet Cr1/3NbS2

Using high-resolution angle-resolved photoemission spectroscopy (ARPES) and ab-initio calculations, we systematically investigate the electronic structure of the chiral helimagnet Cr1/3NbS2 and its temperature evolution. The comparison with NbS2 suggests that the electronic structure of Cr1/3NbS2 is strongly modified by the intercalation of Cr atoms. Our ab-initio calculations, consistent with the experimental results, suggest strong hybridization between Nb- and Cr-derived states near the Fermi level. In the chiral helimagnetic state (below the Curie temperature Tc), we observe exchange splitting of the energy bands crossing EF, which follows the temperature evolution of the magnetic moment, suggesting an important role of the conduction electrons in the long-range magnetic ordering. Interestingly, the exchange splitting persists far above Tc with negligible temperature dependence, in drastic contrast to the itinerant ferromagnetism described by the Stoner model, indicating the existence of short-range magnetic order. Our results provide important insights into the microscopic mechanism of the chiral helimagnetic ordering in Cr1/3NbS2.

preprint2022arXiv

SHREC 2021: Classification in cryo-electron tomograms

Cryo-electron tomography (cryo-ET) is an imaging technique that allows three-dimensional visualization of macro-molecular assemblies under near-native conditions. Cryo-ET comes with a number of challenges, mainly a low signal-to-noise ratio and the inability to obtain images from all angles. Computational methods are key to analyzing cryo-electron tomograms. To promote innovation in computational methods, we generate a novel simulated dataset to benchmark different methods of localization and classification of biological macromolecules in tomograms. Our publicly available dataset contains ten tomographic reconstructions of simulated cell-like volumes. Each volume contains twelve different types of complexes, varying in size, function and structure. In this paper, we have evaluated seven different methods of finding and classifying proteins. Seven research groups present results obtained with learning-based methods trained on the simulated dataset, as well as a baseline template matching (TM), a traditional method widely used in cryo-ET research. We show that learning-based approaches can achieve notably better localization and classification performance than TM. We also experimentally confirm that there is a negative relationship between particle size and performance for all methods.

preprint2022arXiv

Simultaneously Learning Stochastic and Adversarial Bandits under the Position-Based Model

Online learning to rank (OLTR) interactively learns to choose lists of items from a large collection based on certain click models that describe users' click behaviors. Most recent works for this problem focus on the stochastic environment where the item attractiveness is assumed to be invariant during the learning process. In many real-world scenarios, however, the environment could be dynamic or even arbitrarily changing. This work studies the OLTR problem in both stochastic and adversarial environments under the position-based model (PBM). We propose a method based on the follow-the-regularized-leader (FTRL) framework with Tsallis entropy and develop a new self-bounding constraint especially designed for PBM. We prove the proposed algorithm simultaneously achieves $O(\log{T})$ regret in the stochastic environment and $O(m\sqrt{nT})$ regret in the adversarial environment, where $T$ is the number of rounds, $n$ is the number of items and $m$ is the number of positions. We also provide a lower bound of order $Ω(m\sqrt{nT})$ for adversarial PBM, which matches our upper bound and improves over the state-of-the-art lower bound. The experiments show that our algorithm could simultaneously learn in both stochastic and adversarial environments and is competitive compared to existing methods that are designed for a single environment.

preprint2022arXiv

Single-domain Generalization in Medical Image Segmentation via Test-time Adaptation from Shape Dictionary

Domain generalization typically requires data from multiple source domains for model learning. However, such a strong assumption may not always hold in practice, especially in the medical field, where data sharing is a serious concern and sometimes prohibited due to privacy issues. This paper studies the important yet challenging single domain generalization problem, in which a model is learned under the worst-case scenario with only one source domain to directly generalize to different unseen target domains. We present a novel approach to address this problem in medical image segmentation, which extracts and integrates semantic shape prior information of segmentation that is invariant across domains and can be well captured even from single-domain data, to facilitate segmentation under distribution shifts. In addition, a test-time adaptation strategy with dual-consistency regularization is devised to promote dynamic incorporation of these shape priors under each unseen domain to improve model generalizability. Extensive experiments on two medical image segmentation tasks demonstrate the consistent improvements of our method across various unseen domains, as well as its superiority over state-of-the-art approaches in addressing domain generalization under the worst-case scenario.

preprint2022arXiv

Temperature effect on non-Darcian flow in low-permeability porous media

In low-permeability porous media, the velocity of a fluid flow exhibits a nonlinear dependence on the imposed pressure gradient. This non-Darcian flow behavior has important implications for geological disposal of nuclear waste, hydrocarbon extraction from shale, and flow and transport in clay-rich aquifers. Temperature has been postulated to affect the threshold pressure gradient of a non-Darcian flow; however, the supporting data is very limited. In this study we report, for the first time, a systematic measurement of the threshold pressure gradient under various permeabilities and temperatures. The results show that a higher temperature leads to a lower threshold pressure gradient under the same permeability and a faster reduction of the threshold pressure gradient with increasing permeability. The experimental data are fitted to a two-parameter model to determine the parameters, h0 and a, which characterize the interfacial fluid-solid interactions and the transition between the Darcy and non-Darcian regimes.

preprint2022arXiv

Test-time Adaptation with Calibration of Medical Image Classification Nets for Label Distribution Shift

Class distribution plays an important role in learning deep classifiers. When the proportion of each class in the test set differs from that in the training set, the performance of classification nets usually degrades. Such a label distribution shift problem is common in medical diagnosis, since the prevalence of disease varies over location and time. In this paper, we propose the first method to tackle label shift for medical image classification, which effectively adapts the model learned from a single training label distribution to an arbitrary unknown test label distribution. Our approach innovates distribution calibration to learn multiple representative classifiers, which are capable of handling different one-dominating-class distributions. When given a test image, the diverse classifiers are dynamically aggregated via consistency-driven test-time adaptation to deal with the unknown test label distribution. We validate our method on two important medical image classification tasks: liver fibrosis staging and COVID-19 severity prediction. Our experiments clearly show the decreased model performance under label shift. With our method, model performance significantly improves on all the test datasets with different label shifts for both medical image diagnosis tasks.

preprint2021arXiv

A concise review of Rydberg atom based quantum computation and quantum simulation

Quantum information processing based on Rydberg atoms emerged as a promising direction two decades ago. Recent experimental and theoretical progress has shed exciting light on this avenue. In this concise review, we briefly introduce the basics of Rydberg atoms and their recent applications in the associated areas of neutral atom quantum computation and simulation. We also include related discussions on quantum optics with Rydberg atomic ensembles, which are increasingly used to explore quantum computation and quantum simulation with photons.

preprint2021arXiv

A Technical Overview of AV1

The AV1 video compression format is developed by the Alliance for Open Media consortium. It achieves more than 30% reduction in bit-rate compared to its predecessor VP9 for the same decoded video quality. This paper provides a technical overview of the AV1 codec design that enables the compression performance gains with considerations for hardware feasibility.

preprint2021arXiv

Computation Resource Allocation Solution in Recommender Systems

Recommender systems rely heavily on increasing computation resources to improve their business goals. By deploying computation-intensive models and algorithms, these systems are able to infer user interests and exhibit certain ads or commodities from the candidate set to maximize their business goals. However, such systems face two challenges in achieving their goals. On the one hand, facing massive online requests, computation-intensive models and algorithms are pushing their computation resources to the limit. On the other hand, the response time of these systems is strictly limited to a short period, e.g. 300 milliseconds in our real system, which is also being exhausted by the increasingly complex models and algorithms. In this paper, we propose the computation resource allocation solution (CRAS), which maximizes the business goal with limited computation resources and response time. We comprehensively illustrate the problem and formulate it as an optimization problem with multiple constraints, which can be broken down into independent sub-problems. To solve the sub-problems, we propose a revenue function to facilitate the theoretical analysis, and obtain the optimal computation resource allocation strategy. To address the applicability issues, we devise a feedback control system to help our strategy constantly adapt to the changing online environment. The effectiveness of our method is verified by extensive experiments based on a real dataset from Taobao.com. We also deploy our method in the display advertising system of Alibaba. The online results show that our computation resource allocation solution achieves significant business goal improvement without any increase in computation cost, which demonstrates the efficacy of our method in real industrial practice.

preprint2021arXiv

MITNet: GAN Enhanced Magnetic Induction Tomography Based on Complex CNN

Magnetic induction tomography (MIT) is an efficient solution for long-term brain disease monitoring, which focuses on reconstructing the bio-impedance distribution inside the human brain using non-intrusive electromagnetic fields. However, high-quality brain image reconstruction remains challenging, since reconstructing images from the measured weak signals is a highly non-linear and ill-conditioned problem. In this work, we propose a generative adversarial network (GAN) enhanced MIT technique, named MITNet, based on a complex convolutional neural network (CNN). The experimental results on a real-world dataset validate the performance of our technique, which outperforms the state-of-the-art method by 25.27%.

preprint2020arXiv

Asteroid belt survival through stellar evolution: dependence on the stellar mass

Polluted white dwarfs are generally accreting terrestrial-like material that may originate from a debris belt like the asteroid belt in the solar system. The fraction of white dwarfs that are polluted drops off significantly for white dwarfs with masses $M_{\rm WD}\gtrsim 0.8\,\rm M_\odot$. This implies that asteroid belts and planetary systems around main-sequence stars with mass $M_{\rm MS}\gtrsim 3\,\rm M_\odot$ may not form because of the intense radiation from the star. This is in agreement with current debris disc and exoplanet observations. The fraction of white dwarfs that show pollution also drops off significantly for low mass white dwarfs $(M_{\rm WD}\lesssim 0.55\,\rm M_\odot)$. However, the low-mass white dwarfs that do show pollution are not currently accreting but have accreted in the past. We suggest that asteroid belts around main sequence stars with masses $M_{\rm MS}\lesssim 2\,\rm M_\odot$ are not likely to survive the stellar evolution process. The destruction likely occurs during the AGB phase and could be the result of interactions of the asteroids with the stellar wind, the high radiation or, for the lowest mass stars that have an unusually close-in asteroid belt, scattering during the tidal orbital decay of the inner planetary system.

preprint2020arXiv

Atomic line defects and zero-energy end states in monolayer Fe(Te,Se) high-temperature superconductors

Majorana zero-energy bound states (ZEBSs) have been proposed to exist at the ends of one-dimensional Rashba nanowires proximity-coupled to an s-wave superconductor in an external magnetic field induced Zeeman field. Such hybrid structures have been a central platform in the search for non-Abelian Majorana zero modes (MZMs) toward fault-tolerant topological quantum computing. Here we report the discovery of ZEBSs simultaneously appearing at each end of a one-dimensional atomic line defect in monolayer iron-based high-temperature superconductor FeTe0.5Se0.5 films grown on SrTiO3(001) substrates. The spectroscopic properties of the ZEBSs, including the temperature and tunneling barrier dependences, as well as their fusion induced by coupling on line defects of different lengths are found to be robust and consistent with those of the MZMs. These observations suggest a realization of topological Shockley defects at the ends of an atomic line defect in a two-dimensional s-wave superconductor that can host a Kramers pair of MZMs protected by time-reversal symmetry along the chain. Our findings reveal an unprecedented class of topological line defect excitations in two-dimensional superconductor FeTe0.5Se0.5 monolayer films and offer an advantageous platform for generating topological zero-energy excitations at higher operating temperatures, in a single material, and under zero external magnetic field.

preprint2020arXiv

Electronic structure of a Si-containing topological Dirac semimetal CaAl2Si2

There has been an upsurge in the discovery of topological quantum materials, where various topological insulators and semimetals have been theoretically predicted and experimentally observed. However, only very few of them contain silicon, the most widely used element in the electronics industry. Recently, the ternary compound CaAl2Si2 has been predicted to be a topological Dirac semimetal, hosting Lorentz-symmetry-violating quasiparticles with a strongly tilted conical band dispersion. In this work, using high-resolution angle-resolved photoemission spectroscopy (ARPES), we comprehensively investigated the electronic structure of CaAl2Si2. A pair of topological Dirac crossings is observed along the kz direction, in good agreement with the ab initio calculations, confirming the topological Dirac semimetal nature of the compound. Our study expands the topological material family to Si-containing compounds, which have great application potential in realizing low-cost, nontoxic electronic devices with topological quantum states.

preprint2020arXiv

Formation of the polar debris disc around 99 Herculis

We investigate the formation mechanism for the observed nearly polar aligned (perpendicular to the binary orbital plane) debris ring around the eccentric orbit binary 99 Herculis. An initially inclined nonpolar debris ring or disc will not remain flat and will not evolve to a polar configuration, due to the effects of differential nodal precession that alter its flat structure. However, a gas disc with embedded well coupled solids around the eccentric binary may evolve to a polar configuration as a result of pressure forces that maintain the disc flatness and as a result of viscous dissipation that allows the disc to increase its tilt. Once the gas disc disperses, the debris disc is in a polar aligned state in which there is little precession. We use three-dimensional hydrodynamical simulations, linear theory, and particle dynamics to study the evolution of a misaligned circumbinary gas disc and explore the effects of the initial disc tilt, mass, and size. We find that for a wide range of parameter space, the polar alignment timescale is shorter than the lifetime of the gas disc. Using the observed level of alignment of 3 deg. from polar, we place an upper limit on the mass of the gas disc of about 0.014 M_sun at the time of gas dispersal. We conclude that the polar debris disc around 99 Her can be explained as the result of an initially moderately inclined gas disc with embedded solids. Such a disc may provide an environment for the formation of polar planets.

preprint2020arXiv

Functional Linear Regression: Dependence and Error Contamination

Functional linear regression is an important topic in functional data analysis. It is commonly assumed that samples of the functional predictor are independent realizations of an underlying stochastic process, and are observed over a grid of points contaminated by i.i.d. measurement errors. In practice, however, the dynamical dependence across different curves may exist and the parametric assumption on the error covariance structure could be unrealistic. In this paper, we consider functional linear regression with serially dependent observations of the functional predictor, when the contamination of the predictor by the white noise is genuinely functional with fully nonparametric covariance structure. Inspired by the fact that the autocovariance function of observed functional predictors automatically filters out the impact from the unobservable noise term, we propose a novel autocovariance-based generalized method-of-moments estimate of the slope function. We also develop a nonparametric smoothing approach to handle the scenario of partially observed functional predictors. The asymptotic properties of the resulting estimators under different scenarios are established. Finally, we demonstrate that our proposed method significantly outperforms possible competing methods through an extensive set of simulations and an analysis of a public financial dataset.

preprint2020arXiv

High-resolution imaging of Rydberg atoms in optical lattices using an aspheric-lens objective in vacuum

We present a high-resolution, simple and versatile system for imaging ultracold Rydberg atoms in optical lattices. The imaging objective is a single aspheric lens (with a working distance of 20.6 mm and a numerical aperture (NA) of 0.51) placed inside the vacuum chamber. Adopting a large-working-distance lens leaves room for electrodes and electrostatic shields to control electric fields around Rydberg atoms. With this setup, we achieve a Rayleigh resolution of 1.10 $μ$m or $1.41λ$ ($λ=780$ nm), limited by the NA of the aspheric lens. For systems of highly excited Rydberg states with blockade radii greater than a few $μ$m, the resolution achieved is sufficient for studying many physical processes of interest.

preprint2020arXiv

Large-Scale Optimal Transport via Adversarial Training with Cycle-Consistency

Recent advances in large-scale optimal transport have greatly extended its application scenarios in machine learning. However, existing methods either do not explicitly learn the transport map or do not support general cost functions. In this paper, we propose an end-to-end approach for large-scale optimal transport, which directly solves for the transport map and is compatible with general cost functions. It models the transport map via stochastic neural networks and enforces the constraint on the marginal distributions via adversarial training. The proposed framework can be further extended towards learning a Monge map or an optimal bijection by adopting cycle-consistency constraint(s). We verify the effectiveness of the proposed method and demonstrate its superior performance against existing methods on large-scale real-world applications, including domain adaptation, image-to-image translation, and color transfer.

preprint2020arXiv

Learning Directional Feature Maps for Cardiac MRI Segmentation

Cardiac MRI segmentation plays a crucial role in clinical diagnosis for evaluating personalized cardiac performance parameters. Due to the indistinct boundaries and heterogeneous intensity distributions in the cardiac MRI, most existing methods still suffer from two aspects of challenges: inter-class indistinction and intra-class inconsistency. To tackle these two problems, we propose a novel method to exploit the directional feature maps, which can simultaneously strengthen the differences between classes and the similarities within classes. Specifically, we perform cardiac segmentation and learn a direction field pointing away from the nearest cardiac tissue boundary to each pixel via a direction field (DF) module. Based on the learned direction field, we then propose a feature rectification and fusion (FRF) module to improve the original segmentation features, and obtain the final segmentation. The proposed modules are simple yet effective and can be flexibly added to any existing segmentation network without excessively increasing time and space complexity. We evaluate the proposed method on the 2017 MICCAI Automated Cardiac Diagnosis Challenge (ACDC) dataset and a large-scale self-collected dataset, showing good segmentation performance and robust generalization ability of the proposed method.

preprint2020arXiv

Polar planets around highly eccentric binaries are the most stable

We study the orbital stability of a non-zero mass, close-in circular orbit planet around an eccentric orbit binary for various initial values of the binary eccentricity, binary mass fraction, planet mass, planet semi-major axis, and planet inclination by means of numerical simulations that cover $5 \times 10^4$ binary orbits. For small binary eccentricity, the stable orbits that extend closest to the binary (most stable orbits) are nearly retrograde and circulating. For high binary eccentricity, the most stable orbits are highly inclined and librate near the so-called generalised polar orbit, which is a stationary orbit that is fixed in the frame of the binary orbit. For more extreme mass ratio binaries, there is a greater variation in the size of the stability region (defined by initial orbital radius and inclination) with planet mass and initial inclination, especially for low binary eccentricity. For low binary eccentricity, inclined planet orbits may be unstable even at large orbital radii (separation $> 5 \,a_{\rm b}$). The escape time for an unstable planet is generally shorter around an equal mass binary compared with an unequal mass binary. Our results have implications for circumbinary planet formation and evolution and will be helpful for understanding future circumbinary planet observations.

preprint2020arXiv

Robust Multimodal Brain Tumor Segmentation via Feature Disentanglement and Gated Fusion

Accurate medical image segmentation commonly requires effective learning of the complementary information from multimodal data. However, in clinical practice, we often encounter the problem of missing imaging modalities. We tackle this challenge and propose a novel multimodal segmentation framework which is robust to the absence of imaging modalities. Our network uses feature disentanglement to decompose the input modalities into the modality-specific appearance code, which uniquely sticks to each modality, and the modality-invariant content code, which absorbs multimodal information for the segmentation task. With enhanced modality-invariance, the disentangled content code from each modality is fused into a shared representation which gains robustness to missing data. The fusion is achieved via a learning-based strategy to gate the contribution of different modalities at different locations. We validate our method on the important yet challenging multimodal brain tumor segmentation task with the BRATS challenge dataset. With competitive performance to the state-of-the-art approaches for full modality, our method achieves outstanding robustness under various missing modality(ies) situations, significantly exceeding the state-of-the-art method by over 16% on average for Dice on whole tumor segmentation.

preprint2020arXiv

Study of single-particle resonant states with Green's function method

The relativistic mean field theory with the Green's function method is used to study the single-particle resonant states. Different from our previous work [Phys. Rev. C 90, 054321 (2014)], the resonant states are identified by searching for the poles of Green's function or the extrema of the density of states. This new approach is very effective for all kinds of resonant states, whether broad or narrow. The dependence on the space size for the resonant energies, widths, and the density distributions in the coordinate space has been checked, and the results are found to be very stable. Taking $^{120}$Sn as an example, four new broad resonant states $2g_{7/2}$, $2g_{9/2}$, $2h_{11/2}$ and $1j_{13/2}$ are observed, and the accuracy for the width of the very narrow resonant state $1h_{9/2}$ is greatly improved to $1\times 10^{-8}$ MeV. Besides, our results are very close to those obtained by the complex momentum representation method and the complex scaling method.

preprint2020arXiv

Unsupervised Bidirectional Cross-Modality Adaptation via Deeply Synergistic Image and Feature Alignment for Medical Image Segmentation

Unsupervised domain adaptation has increasingly gained interest in medical image computing, aiming to tackle the performance degradation of deep neural networks when deployed to unseen data with heterogeneous characteristics. In this work, we present a novel unsupervised domain adaptation framework, named Synergistic Image and Feature Alignment (SIFA), to effectively adapt a segmentation network to an unlabeled target domain. Our proposed SIFA conducts synergistic alignment of domains from both image and feature perspectives. In particular, we simultaneously transform the appearance of images across domains and enhance domain-invariance of the extracted features by leveraging adversarial learning in multiple aspects and with a deeply supervised mechanism. The feature encoder is shared between both adaptive perspectives to leverage their mutual benefits via end-to-end learning. We have extensively evaluated our method with cardiac substructure segmentation and abdominal multi-organ segmentation for bidirectional cross-modality adaptation between MRI and CT images. Experimental results on two different tasks demonstrate that our SIFA method is effective in improving segmentation performance on unlabeled target images, and outperforms the state-of-the-art domain adaptation approaches by a large margin.

preprint2020arXiv

Zero-energy bound states in the high-temperature superconductors at the two-dimensional limit

Majorana zero modes (MZMs) that obey the non-Abelian statistics have been intensively investigated for potential applications in topological quantum computing. The prevailing signals in tunneling experiments "fingerprinting" the existence of MZMs are the zero-energy bound states (ZEBSs). However, nearly all of the previously reported ZEBSs showing signatures of the MZMs are observed in difficult-to-fabricate heterostructures at very low temperatures and additionally require applied magnetic field. Here, by using in-situ scanning tunneling spectroscopy, we detect the ZEBSs upon the interstitial Fe adatoms deposited on two different high-temperature superconducting one-unit-cell-thick iron chalcogenides on SrTiO3(001). The spectroscopic results resemble the phenomenological characteristics of the MZMs inside the vortex cores of topological superconductors. Our experimental findings may extend the MZM explorations in connate topological superconductors towards an applicable temperature regime and down to the two-dimensional limit. While a concrete understanding of the observations is lacking, possible explanations involving novel 2D superconducting states with spin-orbit coupling, spontaneous nucleation of anomalous vortices at the magnetic sites, and noncoplanar magnetic ordering may further stimulate theoretical understandings of the scarcely captured ZEBSs in strongly correlated systems with multiband Cooper pairing.

preprint2016arXiv

A Novel Method to Study Bottom-up Visual Saliency and its Neural Mechanism

In this study, we propose a novel method to measure bottom-up saliency maps of natural images. In order to eliminate the influence of top-down signals, backward masking is used to make stimuli (natural images) subjectively invisible to subjects; however, the bottom-up saliency can still orient the subjects' attention. To measure this orientation/attention effect, we adopt the cueing effect paradigm by deploying discrimination tasks at each location of an image, and measure the discrimination performance variation across the image as the attentional effect of the bottom-up saliency. Such attentional effects are combined to construct a final bottom-up saliency map. Based on the proposed method, we introduce a new bottom-up saliency map dataset of natural images to benchmark computational models. We compare several state-of-the-art saliency models on the dataset. Moreover, the proposed paradigm is applied to investigate the neural basis of the bottom-up visual saliency map by analyzing psychophysical and fMRI experimental results. Our findings suggest that the bottom-up saliency maps of natural images are constructed in V1. This provides strong scientific evidence toward resolving the long-standing dispute in neuroscience about where the bottom-up saliency map is constructed in the human brain.

preprint2016arXiv

Evaluating the Performance Impact of Multiple Streams on the MIC-based Heterogeneous Platform

Using multiple streams can improve the overall system performance by mitigating the data transfer overhead on heterogeneous systems. Prior work focuses largely on GPUs, but little is known about the performance impact on (Intel Xeon) Phi. In this work, we apply multiple streams to six real-world applications on Phi. We then systematically evaluate the performance benefits of using multiple streams. The evaluation work is performed at two levels: the microbenchmarking level and the real-world application level. Our experimental results at the microbenchmark level show that data transfers and kernel execution can be overlapped on Phi, while data transfers in both directions are performed in a serial manner. At the real-world application level, we show that both overlappable and non-overlappable applications can benefit from using multiple streams (with a performance improvement of up to 24%). We also quantify how task granularity and resource granularity impact the overall performance. Finally, we present a set of heuristics to reduce the search space when determining a proper task granularity and resource granularity. To conclude, our evaluation work provides many insights for runtime and architecture designers when using multiple streams on Phi.

preprint2016arXiv

System Design of Internet-of-Things for Residential Smart Grid

The Internet-of-Things (IoT) aims to integrate, coordinate, communicate with, and collaborate with real-world objects in order to perform daily tasks in a more intelligent and efficient manner. To comprehend this vision, this paper studies the design of a large scale IoT system for smart grid application, which comprises a large number of home users and has the requirement of fast response time. In particular, we focus on the messaging protocol of a universal IoT home gateway, where our cloud enabled system consists of a backend server, a unified home gateway (UHG) at the end users, and a user interface for mobile devices. We discuss the features of such an IoT system to support a large scale deployment with a UHG and real-time residential smart grid applications. Based on the requirements, we design an IoT system using the XMPP protocol, and implement it in a testbed for energy management applications. To show the effectiveness of the designed testbed, we present some results using the proposed IoT architecture.

preprint2016arXiv

The Impact of Unlicensed Access on Small-Cell Resource Allocation

Small cells deployed in licensed spectrum and unlicensed access via WiFi provide different ways of expanding wireless services to low mobility users. That reduces the demand for conventional macro-cellular networks, which are better suited for wide-area mobile coverage. The mix of these technologies seen in practice depends in part on the decisions made by wireless service providers that seek to maximize revenue, and allocations of licensed and unlicensed spectrum by regulators. To understand these interactions we present a model in which a service provider allocates available licensed spectrum across two separate bands, one for macro- and one for small-cells, in order to serve two types of users: mobile and fixed. We assume a service model in which the providers can charge a (different) price per unit rate for each type of service (macro- or small-cell); unlicensed access is free. With this setup we study how the addition of unlicensed spectrum affects prices and the optimal allocation of bandwidth across macro-/small-cells. We also characterize the optimal fraction of unlicensed spectrum when new bandwidth becomes available.

preprint2015arXiv

A Parallel algorithm for $\mathcal{X}$-Armed bandits

The goal of the $\mathcal{X}$-armed bandit problem is to find the global maximum of an unknown stochastic function $f$, given a finite budget of $n$ evaluations. Recently, $\mathcal{X}$-armed bandits have been widely used in many situations. Many of these applications need to deal with large-scale data sets. To deal with these large-scale data sets, we study a distributed setting of $\mathcal{X}$-armed bandits, where $m$ players collaborate to find the maximum of the unknown function. We develop a novel anytime distributed $\mathcal{X}$-armed bandit algorithm. Compared with prior work on $\mathcal{X}$-armed bandits, our algorithm uses a quite different searching strategy so as to fit distributed learning scenarios. Our theoretical analysis shows that our distributed algorithm is $m$ times faster than the classical single-player algorithm. Moreover, the number of communication rounds of our algorithm is only logarithmic in $mn$. The numerical results show that our method can make effective use of every player to minimize the loss. Thus, our distributed approach is attractive and useful.

preprint2015arXiv

Regret vs. Communication: Distributed Stochastic Multi-Armed Bandits and Beyond

In this paper, we consider the distributed stochastic multi-armed bandit problem, where a global arm set can be accessed by multiple players independently. The players are allowed to exchange their history of observations with each other at specific points in time. We study the relationship between regret and communication. When the time horizon is known, we propose the Over-Exploration strategy, which only requires one-round communication and whose regret does not scale with the number of players. When the time horizon is unknown, we measure the frequency of communication through a new notion called the density of the communication set, and give an exact characterization of the interplay between regret and communication. Specifically, a lower bound is established and stable strategies that match the lower bound are developed. The results and analyses in this paper are specific but can be translated into more general settings.

preprint2014arXiv

Buyer to Seller Recommendation under Constraints

The majority of recommender systems are designed to recommend items (such as movies and products) to users. We focus on the problem of recommending buyers to sellers which comes with new challenges: (1) constraints on the number of recommendations buyers are part of before they become overwhelmed, (2) constraints on the number of recommendations sellers receive within their budget, and (3) constraints on the set of buyers that sellers want to receive (e.g., no more than two people from the same household). We propose the following critical problems of recommending buyers to sellers: Constrained Recommendation (C-REC) capturing the first two challenges, and Conflict-Aware Constrained Recommendation (CAC-REC) capturing all three challenges at the same time. We show that C-REC can be modeled using linear programming and can be efficiently solved using modern solvers. On the other hand, we show that CAC-REC is NP-hard. We propose two approximate algorithms to solve CAC-REC and show that they achieve close to optimal solutions via comprehensive experiments using real-world datasets.

preprint2013arXiv

The Best Answers? Think Twice: Online Detection of Commercial Campaigns in the CQA Forums

In an emerging trend, more and more Internet users search for information from Community Question and Answer (CQA) websites, as interactive communication in such websites provides users with a rare feeling of trust. More often than not, end users look for instant help when they browse the CQA websites for the best answers. Hence, it is imperative that they should be warned of any potential commercial campaigns hidden behind the answers. However, existing research focuses more on the quality of answers and does not meet the above need. In this paper, we develop a system that automatically analyzes the hidden patterns of commercial spam and raises alarms instantaneously to end users whenever a potential commercial campaign is detected. Our detection method integrates semantic analysis and posters' track records and utilizes the special features of CQA websites, which differ largely from those in other types of forums such as microblogs or news reports. Our system is adaptive and accommodates new evidence uncovered by the detection algorithms over time. Validated with real-world trace data from a popular Chinese CQA website over a period of three months, our system shows great potential towards adaptive online detection of CQA spam.

preprint2011arXiv

An Active Margin System and its Application in Chinese Margin Lending Market

In order to protect brokers from customer defaults in a volatile market, an active margin system is proposed for the transactions of margin lending in China. The probability of negative return under the condition that collaterals are liquidated in a falling market is used to measure the risk associated with margin loans, and a recursive algorithm is proposed to calculate this probability under a Markov chain model. The optimal maintenance margin ratio can be given under the constraint of the proposed risk measurement for a specified amount of initial margin. An example of such a margin system is constructed and applied to 26,800 margin loans of 134 stocks traded on the Shanghai Stock Exchange. The empirical results indicate that the proposed method is an operational method for brokers to set up a margin system with a clearly specified target of risk control.

preprint2011arXiv

Battling the Internet Water Army: Detection of Hidden Paid Posters

We initiate a systematic study to help distinguish a special group of online users, called hidden paid posters, or termed "Internet water army" in China, from the legitimate ones. On the Internet, the paid posters represent a new type of online job opportunity. They get paid for posting comments and new threads or articles on different online communities and websites for some hidden purposes, e.g., to influence the opinion of other people towards certain social events or business markets. Though an interesting strategy in business marketing, paid posters may create a significant negative effect on the online communities, since the information from paid posters is usually not trustworthy. When two competitive companies hire paid posters to post fake news or negative comments about each other, normal online users may feel overwhelmed and find it difficult to put any trust in the information they acquire from the Internet. In this paper, we thoroughly investigate the behavioral pattern of online paid posters based on real-world trace data. We design and validate a new detection mechanism, using both non-semantic analysis and semantic analysis, to identify potential online paid posters. Our test results with real-world datasets show a very promising performance.

64 published item(s)

A Near-optimal, Scalable and Parallelizable Framework for Stochastic Bandits Robust to Adversarial Corruptions and Beyond

Elimination Templates in Macaulay2

Fourier-Jacobi models for real symplectic-metaplectic groups: the basic case

From One-to-One to Many-to-Many: Dynamic Cross-Layer Injection for Deep Vision-Language Fusion

GEM: Gaussian Evolution Model for Occupancy Forecasting and Motion Planning

HTPO: Towards Exploration-Exploitation Balanced Policy Optimization via Hierarchical Token-level Objective Control

Multi-Subspace Multi-Modal Modeling for Diffusion Models: Estimation, Convergence and Mixture of Experts

On the orbital evolution of binaries with polar circumbinary disks

RidgeWalker: Perfectly Pipelined Graph Random Walks on FPGAs

SynerMedGen: Synergizing Medical Multimodal Understanding with Generation via Task Alignment

TimeMM: Time-as-Operator Spectral Filtering for Dynamic Multimodal Recommendation

Diffusion Model based Semi-supervised Learning on Brain Hemorrhage Images for Efficient Midline Shift Quantification

3D-model ShapeNet Core Classification using Meta-Semantic Learning

5 Gbps Optical Wireless Communication Using Commercial SPAD Array Receivers

A Unified Framework for Campaign Performance Forecasting in Online Display Advertising

A Unified Two-Stage Group Semantics Propagation and Contrastive Learning Network for Co-Saliency Detection

Approaching a Minimal Topological Electronic Structure in Antiferromagnetic Topological Insulator MnBi2Te4 via Surface Modification

BLINK with Elasticsearch for Efficient Entity Linking in Business Conversations

Contrastive Cross-Modal Knowledge Sharing Pre-training for Vision-Language Representation Learning and Retrieval

Developing a Production System for Purpose of Call Detection in Business Phone Conversations

Direct Visualization and Manipulation of Tunable Quantum Well State in Semiconducting Nb2SiTe4

DLTTA: Dynamic Learning Rate for Test-time Adaptation on Cross-domain Medical Images

Exploring Intra- and Inter-Video Relation for Surgical Semantic Scene Segmentation

Impression Allocation and Policy Search in Display Advertising

Inverse is Better! Fast and Accurate Prompt for Few-shot Slot Tagging

MuZero with Self-competition for Rate Control in VP9 Video Compression

Neural Network Based Pore Flow Field Prediction in Porous Media Using Super Resolution

Observation of Dimension-Crossover of a Tunable 1D Dirac Fermion in Topological Semimetal NbSi$_x$Te$_2$

Online Active Regression

Online Reflective Learning for Robust Medical Image Segmentation

Performance Analysis of SPAD-Based Optical Wireless Communication with OFDM

Persistent exchange splitting in a chiral helimagnet Cr1/3NbS2

SHREC 2021: Classification in cryo-electron tomograms

Simultaneously Learning Stochastic and Adversarial Bandits under the Position-Based Model

Single-domain Generalization in Medical Image Segmentation via Test-time Adaptation from Shape Dictionary

Temperature effect on non-Darcian flow in low-permeability porous media

Test-time Adaptation with Calibration of Medical Image Classification Nets for Label Distribution Shift

A concise review of Rydberg atom based quantum computation and quantum simulation

A Technical Overview of AV1

Computation Resource Allocation Solution in Recommender Systems

MITNet: GAN Enhanced Magnetic Induction Tomography Based on Complex CNN

Asteroid belt survival through stellar evolution: dependence on the stellar mass

Atomic line defects and zero-energy end states in monolayer Fe(Te,Se) high-temperature superconductors

Electronic structure of a Si-containing topological Dirac semimetal CaAl2Si2

Formation of the polar debris disc around 99 Herculis

Functional Linear Regression: Dependence and Error Contamination

High-resolution imaging of Rydberg atoms in optical lattices using an aspheric-lens objective in vacuum

Large-Scale Optimal Transport via Adversarial Training with Cycle-Consistency

Learning Directional Feature Maps for Cardiac MRI Segmentation

Polar planets around highly eccentric binaries are the most stable

Robust Multimodal Brain Tumor Segmentation via Feature Disentanglement and Gated Fusion

Study of single-particle resonant states with Green's function method

Unsupervised Bidirectional Cross-Modality Adaptation via Deeply Synergistic Image and Feature Alignment for Medical Image Segmentation

Zero-energy bound states in the high-temperature superconductors at the two-dimensional limit

A Novel Method to Study Bottom-up Visual Saliency and its Neural Mechanism

Evaluating the Performance Impact of Multiple Streams on the MIC-based Heterogeneous Platform

System Design of Internet-of-Things for Residential Smart Grid

The Impact of Unlicensed Access on Small-Cell Resource Allocation

A Parallel algorithm for $\mathcal{X}$-Armed bandits

Regret vs. Communication: Distributed Stochastic Multi-Armed Bandits and Beyond

Buyer to Seller Recommendation under Constraints

The Best Answers? Think Twice: Online Detection of Commercial Campaigns in the CQA Forums

An Active Margin System and its Application in Chinese Margin Lending Market

Battling the Internet Water Army: Detection of Hidden Paid Posters