Trust snapshot

Quick read

Trust 21 - Emerging; Verification L1; Unclaimed author
45 works
0 followers
29 topics
4 close collaborators

Published work

45 published item(s)

preprint, 2026, arXiv

Addressing Performance Saturation for LLM RL via Precise Entropy Curve Control

Reinforcement learning (RL) has enabled complex reasoning abilities in large language models (LLMs). However, most RL algorithms suffer from performance saturation, preventing continued gains as RL training scales. This problem can be characterized by the collapse of entropy, a key diagnostic for exploration in RL. Existing attempts focus on preventing entropy collapse through regularization or clipping. However, their resulting entropy curves often exhibit instability in the long term, which hinders performance gains. In this paper, we introduce Entrocraft, a simple rejection-sampling approach that realizes user-customized entropy schedule by biasing the advantage distributions. Entrocraft requires no objective regularization and is advantage-estimator-agnostic. Theoretically, we relate per-step entropy change to the advantage distribution under minimal assumptions. This explains the behavior of existing RL and entropy-preserving methods. Entrocraft also enables a systematic study of entropy schedules, which reveals that linear annealing, which starts high and decays to a slightly lower target, performs best. Empirically, Entrocraft addresses performance saturation, significantly improving generalization, output diversity, and long-term training. It enables a 4B model to outperform an 8B baseline, sustains improvement for up to 4x longer before plateauing, and raises pass@K by 50% over the baseline.
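The abstract above mentions two quantities that a small sketch can make concrete: the entropy of a policy's next-token distribution, and a linearly annealed entropy target that starts high and decays to a slightly lower value. The following is a minimal illustration only. It does not implement Entrocraft's rejection sampling, and the function names and the `start`, `end`, and `total_steps` values are hypothetical, not taken from the paper.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def linear_entropy_target(step, total_steps, start=0.6, end=0.4):
    """Linearly anneal a target entropy from `start` down to `end`.

    The default values are illustrative assumptions, not values from the paper.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return start + frac * (end - start)


uniform = [0.25, 0.25, 0.25, 0.25]   # maximally exploratory over 4 tokens
peaked = [0.97, 0.01, 0.01, 0.01]    # nearly collapsed distribution
print(token_entropy(uniform), token_entropy(peaked))
print([round(linear_entropy_target(s, 100), 3) for s in (0, 50, 100)])
```

A training loop could compare the measured entropy against the target at each step; how the gap is then closed is the method's contribution and is not sketched here.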

preprint, 2026, arXiv

AR-MOT: Autoregressive Multi-object Tracking

As multi-object tracking (MOT) tasks continue to evolve toward more general and multi-modal scenarios, the rigid and task-specific architectures of existing MOT methods increasingly hinder their applicability across diverse tasks and limit flexibility in adapting to new tracking formulations. Most approaches rely on fixed output heads and bespoke tracking pipelines, making them difficult to extend to more complex or instruction-driven tasks. To address these limitations, we propose AR-MOT, a novel autoregressive paradigm that formulates MOT as a sequence generation task within a large language model (LLM) framework. This design enables the model to output structured results through flexible sequence construction, without requiring any task-specific heads. To enhance region-level visual perception, we introduce an Object Tokenizer based on a pretrained detector. To mitigate the misalignment between global and regional features, we propose a Region-Aware Alignment (RAA) module, and to support long-term tracking, we design a Temporal Memory Fusion (TMF) module that caches historical object tokens. AR-MOT offers strong potential for extensibility, as new modalities or instructions can be integrated by simply modifying the output sequence format without altering the model architecture. Extensive experiments on MOT17 and DanceTrack validate the feasibility of our approach, achieving performance comparable to state-of-the-art methods while laying the foundation for more general and flexible MOT systems.

preprint, 2026, arXiv

Autonomous Robotic Bone Micro-Milling System with Automatic Calibration and 3D Surface Fitting

Automating bone micro-milling using a robotic system presents challenges due to the uncertainties in both the external and internal features of bone tissue. For example, during mouse cranial window creation, a circular path with a radius of 2 to 4 mm needs to be milled on the mouse skull using a microdrill. The uneven surface and non-uniform thickness of the mouse skull make it difficult to fully automate this process, requiring the system to possess advanced perceptual and adaptive capabilities. In this study, we address this challenge by integrating a Microscopic Stereo Camera System (MSCS) into the robotic bone micro-milling system and proposing a novel online pre-measurement pipeline for the target surface. Starting from uncalibrated cameras, the pipeline enables automatic calibration and 3D surface fitting through convolutional neural network (CNN)-based keypoint detection. Combined with the existing feedback-based system, we develop the world's first autonomous robotic bone micro-milling system capable of rapidly perceiving and adapting in real time to surface unevenness and non-uniform thickness, thereby enabling an end-to-end autonomous cranial window creation workflow without human assistance. Validation experiments on euthanized mice demonstrate that the improved system achieves a success rate of 85.7% and an average milling time of 2.1 minutes, showing not only significant performance improvements over the previous system but also exceptional accuracy, speed, and stability compared to human operators.

preprint, 2026, arXiv

CompoSE: Compositional Synthesis and Editing of 3D Shapes via Part-Aware Control

Creating and editing high-quality 3D content remains a central challenge in computer graphics. We address this challenge by introducing CompoSE, a novel method for Compositional Synthesis and Editing of 3D shapes via part-aware control. Our method takes as input a set of coarse geometric primitives (e.g., bounding boxes) that represent distinct object parts arranged in a particular spatial configuration, and synthesizes as output part-separated 3D objects that support localized granular (i.e., compositional) editing of individual parts. The key insight that enables our method is our use of a diffusion transformer architecture that alternates between processing each part locally and aggregating contextual information across parts globally, and features a novel conditioning technique that ensures strong adherence to the user's input. Importantly, our method learns to infer part semantics and symmetries directly from the user's coarse layout guidance, and does not require part-level text prompts. We demonstrate that our method enables powerful part-level editing capabilities, including context-aware substitution, addition, deletion, and style-preserving resizing operations. We show through extensive experiments that our method significantly outperforms existing approaches on guided synthesis, as measured by objective metrics and LLM-based evaluations.

preprint, 2026, arXiv

Graph-Structured Driven Dual Adaptation for Mitigating Popularity Bias

Popularity bias is a common challenge in recommender systems. It often causes unbalanced item recommendation performance and intensifies the Matthew effect. Due to limited user-item interactions, unpopular items are frequently constrained to the embedding neighborhoods of only a few users, leading to representation collapse and weakening the model's generalization. Although existing supervised alignment and reweighting methods can help mitigate this problem, they still face two major limitations: (1) they overlook the inherent variability among different Graph Convolutional Networks (GCNs) layers, which can result in negative gains in deeper layers; (2) they rely heavily on fixed hyperparameters to balance popular and unpopular items, limiting adaptability to diverse data distributions and increasing model complexity. To address these challenges, we propose Graph-Structured Dual Adaptation Framework (GSDA), a dual adaptive framework for mitigating popularity bias in recommendation. Our theoretical analysis shows that supervised alignment in GCNs is hindered by the over-smoothing effect, where the distinction between popular and unpopular items diminishes as layers deepen, reducing the effectiveness of alignment at deeper levels. To overcome this limitation, GSDA integrates a hierarchical adaptive alignment mechanism that counteracts entropy decay across layers together with a distribution-aware contrastive weighting strategy based on the Gini coefficient, enabling the model to adapt its debiasing strength dynamically without relying on fixed hyperparameters. Extensive experiments on three benchmark datasets demonstrate that GSDA effectively alleviates popularity bias while consistently outperforming state-of-the-art methods in recommendation performance.
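The abstract above bases its weighting strategy on the Gini coefficient, a standard measure of how unevenly a quantity is distributed. As a minimal sketch, the coefficient can be computed over per-item interaction counts as below. The function name and the toy counts are illustrative assumptions, and how GSDA turns the coefficient into contrastive weights is not reproduced here.

```python
def gini(values):
    """Gini coefficient of non-negative values.

    Returns 0.0 for a perfectly even distribution and approaches 1.0 as the
    total concentrates in a single entry.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula for ascending-sorted data:
    # G = 2 * sum_i(i * x_i) / (n * sum(x)) - (n + 1) / n, with i starting at 1.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


even_counts = [10, 10, 10, 10]      # every item equally popular
skewed_counts = [0, 0, 0, 40]       # one item receives all interactions
print(gini(even_counts), gini(skewed_counts))
```

With n items, the largest attainable value is (n - 1) / n, which is why the fully skewed four-item example does not reach 1.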

preprint, 2026, arXiv

Higher Satisfaction, Lower Cost: A Technical Report on How LLMs Revolutionize Meituan's Intelligent Interaction Systems

Enhancing customer experience is essential for business success, particularly as service demands grow in scale and complexity. Generative artificial intelligence and Large Language Models (LLMs) have empowered intelligent interaction systems to deliver efficient, personalized, and 24/7 support. In practice, intelligent interaction systems encounter several challenges: (1) Constructing high-quality data for cold-start training is difficult, hindering self-evolution and raising labor costs. (2) Multi-turn dialogue performance remains suboptimal due to inadequate intent understanding, rule compliance, and solution extraction. (3) Frequent evolution of business rules affects system operability and transferability, constraining low-cost expansion and adaptability. (4) Reliance on a single LLM is insufficient in complex scenarios, where the absence of multi-agent frameworks and effective collaboration undermines process completeness and service quality. (5) The open-domain nature of multi-turn dialogues, lacking unified golden answers, hampers quantitative evaluation and continuous optimization. To address these challenges, we introduce WOWService, an intelligent interaction system tailored for industrial applications. With the integration of LLMs and multi-agent architectures, WOWService enables autonomous task management and collaborative problem-solving. Specifically, WOWService focuses on core modules including data construction, general capability enhancement, business scenario adaptation, multi-agent coordination, and automated evaluation. Currently, WOWService is deployed on the Meituan App, achieving significant gains in key metrics, e.g., User Satisfaction Metric 1 (USM 1) -27.53% and User Satisfaction Metric 2 (USM 2) +25.51%, demonstrating its effectiveness in capturing user needs and advancing personalized service.

preprint, 2026, arXiv

PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection

Visual instruction tuning adapts pre-trained Multimodal Large Language Models (MLLMs) to follow human instructions for real-world applications. However, the rapid growth of these datasets introduces significant redundancy, leading to increased computational costs. Existing methods for selecting instruction data aim to prune this redundancy, but predominantly rely on computationally demanding techniques such as proxy-based inference or training-based metrics. Consequently, the substantial computational costs incurred by these selection processes often exacerbate the very efficiency bottlenecks they are intended to resolve, posing a significant challenge to the scalable and effective tuning of MLLMs. To address this challenge, we first identify a critical, yet previously overlooked, factor: the anisotropy inherent in visual feature distributions. We find that this anisotropy induces a Global Semantic Drift, and overlooking this phenomenon is a key factor limiting the efficiency of current data selection methods. Motivated by this insight, we devise PRISM, the first training-free framework for efficient visual instruction selection. PRISM surgically removes the corrupting influence of global background features by modeling the intrinsic visual semantics via implicit re-centering. Empirically, PRISM reduces the end-to-end time for data selection and model tuning to just 30% of conventional pipelines. More remarkably, it achieves this efficiency while simultaneously enhancing performance, surpassing models fine-tuned on the full dataset across eight multimodal and three language understanding benchmarks, culminating in a 101.7% relative improvement over the baseline. The code is available at https://github.com/bibisbar/PRISM.

preprint, 2026, arXiv

Towards Sustainable Growth: A Multi-Value-Aware Retrieval Framework for E-Commerce Search

New item growth is critical for maintaining a healthy ecosystem in large-scale e-commerce platforms. However, existing systems tend to prioritize presenting users with already popular items, a phenomenon often referred to as the "Matthew effect". In the context of search retrieval, current cold-start models suffer from the misalignment between training objectives and online business metrics, and they lack effective mechanisms to measure an item's growth potential. In this paper, we propose a Multi-Value-Aware retrieval framework tailored for e-commerce search, designed to better align with the cascaded online values across different stages of the search system while balancing immediate conversion and long-term item growth. Our framework GrowthGR consists of two key components: an Item Long-term Transaction Value Prediction (ItemLTV) module and a Multi-Value-Aware Generative Retrieval (MultiGR) module. First, in the ItemLTV module, we employ counterfactual inference to quantify the long-term value increment attributable to a single user interaction. Second, in the MultiGR module, building upon a semantic-ID-based generative retrieval architecture, we leverage structured samples with the search cascade signals and adopt a Multi-Value-Aware Policy Optimization (MoPO) training paradigm to align with multi-stage online values, while explicitly balancing short-term transactional value and long-term growth potential estimated by ItemLTV. We successfully deployed GrowthGR on Taobao's production platform, achieving a substantial 5.3% lift in new item GMV while delivering a non-trivial 0.3% gain in overall search GMV. Extensive online analysis and A/B testing demonstrate its positive impact on the overall ecosystem value.

preprint, 2026, arXiv

UnAC: Adaptive Visual Prompting with Abstraction and Stepwise Checking for Complex Multimodal Reasoning

Although recent LMMs have become much stronger at visual perception, they remain unreliable on problems that require multi-step reasoning over visual evidence. In this paper, we present UnAC (Understanding, Abstracting, and Checking), a multimodal prompting method that strengthens reasoning for complex multimodal tasks in LMMs (e.g., GPT-4o, Gemini 1.5, and GPT-4V). To improve image understanding and capture fine details, we propose an adaptive visual prompting strategy that enables LMMs to focus on salient regions. We further design an image-abstraction prompt to effectively extract key information from images. In addition, we introduce a gradual self-checking scheme that improves reasoning by verifying each decomposed subquestion and its answer. Extensive experiments on three public benchmarks-MathVista, MM-Vet, and MMMU.

preprint, 2026, arXiv

Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video

Camera-controlled video generation has made substantial progress, enabling generated videos to follow prescribed viewpoint trajectories. However, existing methods usually learn camera-specific conditioning through camera encoders, control branches, or attention and positional-encoding modifications, which often require post-training on large-scale camera-annotated videos. Training-free alternatives avoid such post-training, but often shift the cost to test-time optimization or extra denoising-time guidance. We propose Warp-as-History, a simple interface that turns camera-induced warps into camera-warped pseudo-history with target-frame positional alignment and visible-token selection. Given a target camera trajectory, we construct camera-warped pseudo-history from past observations and feed it through the model's visual-history pathway. Crucially, we align its positional encoding with the target frames being denoised and remove warped-history tokens without valid source observations. Without any training, architectural modification, or test-time optimization, this interface reveals a non-trivial zero-shot capability of a frozen video generation model to follow camera trajectories. Moreover, lightweight offline LoRA finetuning on only one camera-annotated video further improves this capability and generalizes to unseen videos, improving camera adherence, visual quality, and motion dynamics without test-time optimization or target-video adaptation. Extensive experiments on diverse datasets confirm the effectiveness of our method.

preprint, 2025, arXiv

Flowing from Reasoning to Motion: Learning 3D Hand Trajectory Prediction from Egocentric Human Interaction Videos

Prior works on 3D hand trajectory prediction are constrained by datasets that decouple motion from semantic supervision and by models that weakly link reasoning and action. To address these, we first present the EgoMAN dataset, a large-scale egocentric dataset for interaction stage-aware 3D hand trajectory prediction with 219K 6DoF trajectories and 3M structured QA pairs for semantic, spatial, and motion reasoning. We then introduce the EgoMAN model, a reasoning-to-motion framework that links vision-language reasoning and motion generation via a trajectory-token interface. Trained progressively to align reasoning with motion dynamics, our approach yields accurate and stage-aware trajectories with generalization across real-world scenes.

preprint, 2025, arXiv

Laser, Vacuum, and Gas Reaction Chamber for Operando Measurements at NSLS-II's 28-ID-2

We present a laser reaction chamber that we developed for in-situ/operando X-ray diffraction measurements at the NSLS-II 28-ID-2 XPD (X-Ray Powder Diffraction) beamline. This chamber allows for rapid and dynamic sample heating under specialized gas environments, spanning ambient conditions down to vacuum pressures. We demonstrate the capabilities of this setup through two applications: laser-driven heating in polycrystalline iron oxide and in single crystal WTe2. Our measurements reveal the ability to resolve chemical reaction kinetics over minutes with 1-s time resolution. This setup advances opportunities for in-situ/operando XRD studies in both bulk and single crystal materials.

preprint, 2024, arXiv

Dark-Field X-ray Microscopy for 2D and 3D imaging of Microstructural Dynamics at the European X-ray Free Electron Laser

Dark-field X-ray microscopy (DFXM) can visualize microstructural distortions in bulk crystals. Using the femtosecond X-ray pulses generated by X-ray free-electron lasers (XFEL), DFXM can achieve sub-μm spatial resolution and <100 fs time resolution simultaneously. In this paper, we demonstrate ultrafast DFXM measurements at the European XFEL to visualize an optically-driven longitudinal strain wave propagating through a diamond single crystal. We also present two DFXM scanning modalities that are new to the XFEL sources: spatially 3D and 2D axial-strain scans with sub-μm spatial resolution. With this progress in XFEL-based DFXM, we discuss new opportunities to study multi-timescale spatio-temporal dynamics of microstructures.

preprint, 2024, arXiv

Measuring the Burgers Vector of Dislocations with Dark-Field X-ray Microscopy

The behavior of dislocations is essential to understand material properties, but their subsurface dynamics that are representative of bulk phenomena cannot be resolved by conventional transmission electron microscopy (TEM). Dark-field X-ray microscopy (DFXM) was recently demonstrated to image hierarchical structures of bulk dislocations by imaging lattice distortions along the transmitted X-ray diffracted beam using an objective lens. While today's DFXM can effectively map the line vector of dislocations, it still cannot quantify the Burgers vector required to understand dislocation interactions, structures, and energies. Our study formulates a theoretical model of how DFXM images collected along specific scans can be used to directly measure the Burgers vector of a dislocation. By revisiting the "invisibility criteria" from TEM theory, we re-solve this formalism for DFXM and extend it to the geometric-optics model developed for DFXM to evaluate how the images acquired from different scans about a single {hkl} diffraction peak encode the Burgers vector within them. We demonstrate this for edge, screw, and mixed dislocations and discuss the observed symmetries. This work advances our understanding of DFXM to establish its capabilities to connect bulk experiments to dislocation theory and mechanics.

preprint, 2024, arXiv

The Mood of the Sunlight: Visualization of the Sunlight Data for Public Art

The application of data visualization in public art attracts increasing attention. In this paper, we present the design and implementation of a visualization method for sunlight data collected over a long period of time with an industrial camera. The proposed method makes use of the saturation and value information of collected sunlight image data in Hue Saturation Value color model to show the variation of the mood of the sunlight. Specifically, we create visual patterns with a rotating planet gear, which has an intuitively consistent geometric meaning with HSV color model and the planetary motion. Due to the variation of the sunlight data over time, the generated visual pattern presents a periodic variation that corresponds to the changing mood of the sunlight. Furthermore, we also use the sunlight data to generate music as another form of data representation. Two public artworks have been created with the above visualization and auralization methods and displayed on an exhibition held at China Resources Tower, Shenzhen, China. This work is a typical practice of creating public installations with data visualization technology, giving a glimpse into the many ways science and art intersect.

preprint, 2023, arXiv

Automatic Generation of German Drama Texts Using Fine Tuned GPT-2 Models

This study is devoted to the automatic generation of German drama texts. We suggest an approach consisting of two key steps: fine-tuning a GPT-2 model (the outline model) to generate outlines of scenes based on keywords and fine-tuning a second model (the generation model) to generate scenes from the scene outline. The input for the neural model comprises two datasets: the German Drama Corpus (GerDraCor) and German Text Archive (Deutsches Textarchiv or DTA). In order to estimate the effectiveness of the proposed method, our models are compared with baseline GPT-2 models. Our models perform well according to automatic quantitative evaluation, but, conversely, manual qualitative analysis reveals a poor quality of generated texts. This may be due to the quality of the dataset or training inputs.

preprint, 2023, arXiv

Unsupervised Mandarin-Cantonese Machine Translation

Advancements in unsupervised machine translation have enabled the development of machine translation systems that can translate between languages for which there is not an abundance of parallel data available. We explored unsupervised machine translation between Mandarin Chinese and Cantonese. Despite the vast number of native speakers of Cantonese, there is still no large-scale corpus for the language, due to the fact that Cantonese is primarily used for oral communication. The key contributions of our project include: 1. The creation of a new corpus containing approximately 1 million Cantonese sentences, and 2. A large-scale comparison across different model architectures, tokenization schemes, and embedding structures. Our best model trained with character-based tokenization and a Transformer architecture achieved a character-level BLEU of 25.1 when translating from Mandarin to Cantonese and of 24.4 when translating from Cantonese to Mandarin. In this paper we discuss our research process, experiments, and results.

preprint, 2022, arXiv

3D Morphology of Open Clusters in the Solar Neighborhood with Gaia EDR3 II: Hierarchical Star Formation Revealed by Spatial and Kinematic Substructures

We identify members of 65 open clusters in the solar neighborhood using the machine-learning algorithm StarGO based on Gaia EDR3 data. After adding members of twenty clusters from previous studies (Pang et al. 2021a,b; Li et al. 2021) we obtain 85 clusters, and study their morphology and kinematics. We classify the substructures outside the tidal radius into four categories: filamentary (f1) and fractal (f2) for clusters $<100$ Myr, and halo (h) and tidal-tail (t) for clusters $>100$ Myr. The kinematical substructures of f1-type clusters are elongated; these resemble the disrupted cluster Group X. Kinematic tails are distinct in t-type clusters, especially Pleiades. We identify 29 hierarchical groups in four young regions (Alessi 20, IC 348, LP 2373, LP 2442); ten among these are new. The hierarchical groups form filament networks. Two regions (Alessi 20, LP 2373) exhibit global "orthogonal" expansion (stellar motion perpendicular to the filament), which might cause complete dispersal. Infalling-like flows (stellar motion along the filament) are found in UBC 31 and related hierarchical groups in the IC 348 region. Stellar groups in the LP 2442 region (LP 2442 gp 1-5) are spatially well-mixed but kinematically coherent. A merging process might be ongoing in the LP 2442 subgroups. For younger systems ($\lesssim30$ Myr), the mean axis ratio, cluster mass and half-mass radius tend to increase with age values. These correlations between structural parameters may imply two dynamical processes occurring in the hierarchical formation scenario in young stellar groups: (1) filament dissolution and (2) sub-group mergers.

preprint, 2022, arXiv

A Knowledge Distillation-Based Backdoor Attack in Federated Learning

Federated Learning (FL) is a novel framework for decentralized machine learning. Because of its decentralized nature, FL is vulnerable to adversarial attacks during training, e.g., backdoor attacks. A backdoor attack aims to inject a backdoor into the machine learning model such that the model behaves arbitrarily incorrectly on test samples that carry a specific backdoor trigger. Although a range of backdoor attack methods for FL has been introduced, there are also methods that defend against them. Many of these defenses exploit the abnormal characteristics of backdoored models or the difference between backdoored models and regular models. To bypass these defenses, we need to reduce the difference and the abnormal characteristics. We find that one source of such abnormality is that a backdoor attack directly flips the labels of data when poisoning it. However, current studies of backdoor attacks in FL do not mainly focus on reducing the difference between backdoored models and regular models. In this paper, we propose Adversarial Knowledge Distillation (ADVKD), a method that combines knowledge distillation with backdoor attacks in FL. With knowledge distillation, we can reduce the abnormal characteristics in the model that result from label flipping, so the model can bypass the defenses. Compared to current methods, we show that ADVKD not only reaches a higher attack success rate but also successfully bypasses the defenses when other methods fail. To further explore the performance of ADVKD, we test how its parameters affect performance under different scenarios. Based on the experimental results, we summarize how to adjust the parameters for better performance under different scenarios. We also use several methods to visualize the effect of different attacks and explain the effectiveness of ADVKD.

preprint, 2022, arXiv

A Survey on Efficient Processing of Similarity Queries over Neural Embeddings

Similarity queries are the family of queries based on similarity metrics. Unlike traditional database queries, which are mostly based on value equality, similarity queries aim to find targets "similar enough to" the given data objects, depending on some similarity metric, e.g., Euclidean distance, cosine similarity and so on. To measure the similarity between data objects, traditional methods normally work on low-level or syntactic features (e.g., basic visual features of images or bag-of-words features of text), which makes them weak at computing semantic similarities between objects. So, to measure data similarities semantically, neural embeddings are applied. Embedding techniques work by representing the raw data objects as vectors (so-called "embeddings" or "neural embeddings", since they are mostly generated by neural network models) that expose the hidden semantics of the raw data. On this basis, embeddings show outstanding effectiveness in capturing data similarities, making them one of the most widely used and studied techniques in state-of-the-art similarity query processing research. But there are still many open challenges in the efficiency of embedding-based similarity query processing, which is not as well studied as its effectiveness. In this survey, we first provide an overview of the "similarity query" and "similarity query processing" problems. Then we discuss recent approaches to designing indexes and operators for highly efficient similarity query processing on top of embeddings (or, more generally, high-dimensional data). Finally, we investigate specific solutions with and without embeddings in selected application domains of similarity queries, including entity resolution and information retrieval. By comparing the solutions, we show how neural embeddings benefit those applications.
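To make the notion of a similarity query over embeddings concrete, here is a minimal brute-force top-k sketch using cosine similarity, one of the metrics named in the abstract above. The toy vectors and the function names are hypothetical. The survey's subject is indexes and operators that avoid this exhaustive scan, and those are not shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query, embeddings, k=2):
    """Return the keys of the k embeddings most similar to `query`.

    Exhaustive scan: every stored vector is compared against the query.
    """
    scored = [(cosine_similarity(query, vec), key) for key, vec in embeddings.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]


# Toy 3-dimensional "embeddings"; real ones come from a neural model.
toy = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
print(top_k_similar([1.0, 0.0, 0.0], toy, k=2))
```

The scan costs one similarity computation per stored vector, which is the efficiency problem that motivates approximate indexes for high-dimensional data.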

preprint, 2022, arXiv

A Survey on the Fairness of Recommender Systems

Recommender systems are an essential tool to relieve the information overload challenge and play an important role in people's daily lives. Since recommendations involve allocations of social resources (e.g., job recommendation), an important issue is whether recommendations are fair. Unfair recommendations are not only unethical but also harm the long-term interests of the recommender system itself. As a result, fairness issues in recommender systems have recently attracted increasing attention. However, due to multiple complex resource allocation processes and various fairness definitions, the research on fairness in recommendation is scattered. To fill this gap, we review over 60 papers published in top conferences/journals, including TOIS, SIGIR, and WWW. First, we summarize fairness definitions in recommendation and provide several views to classify fairness issues. Then, we review recommendation datasets and measurements in fairness studies and provide an elaborate taxonomy of fairness methods in recommendation. Finally, we conclude this survey by outlining some promising future directions.

preprint, 2022, arXiv

Advances in Neural Rendering

Synthesizing photo-realistic images and videos is at the heart of computer graphics and has been the focus of decades of research. Traditionally, synthetic images of a scene are generated using rendering algorithms such as rasterization or ray tracing, which take specifically defined representations of geometry and material properties as input. Collectively, these inputs define the actual scene and what is rendered, and are referred to as the scene representation (where a scene consists of one or more objects). Example scene representations are triangle meshes with accompanied textures (e.g., created by an artist), point clouds (e.g., from a depth sensor), volumetric grids (e.g., from a CT scan), or implicit surface functions (e.g., truncated signed distance fields). The reconstruction of such a scene representation from observations using differentiable rendering losses is known as inverse graphics or inverse rendering. Neural rendering is closely related, and combines ideas from classical computer graphics and machine learning to create algorithms for synthesizing images from real-world observations. Neural rendering is a leap forward towards the goal of synthesizing photo-realistic image and video content. In recent years, we have seen immense progress in this field through hundreds of publications that show different ways to inject learnable components into the rendering pipeline. This state-of-the-art report on advances in neural rendering focuses on methods that combine classical rendering principles with learned 3D scene representations, often now referred to as neural scene representations. A key advantage of these methods is that they are 3D-consistent by design, enabling applications such as novel viewpoint synthesis of a captured scene. In addition to methods that handle static scenes, we cover neural scene representations for modeling non-rigidly deforming objects...

preprint 2022 arXiv

Back to Reality: Weakly-supervised 3D Object Detection with Shape-guided Label Enhancement

In this paper, we propose a weakly-supervised approach for 3D object detection, which makes it possible to train a strong 3D detector with position-level annotations (i.e. annotations of object centers). In order to remedy the information loss from box annotations to centers, our method, namely Back to Reality (BR), makes use of synthetic 3D shapes to convert the weak labels into fully-annotated virtual scenes as stronger supervision, and in turn utilizes the perfect virtual labels to complement and refine the real labels. Specifically, we first assemble 3D shapes into physically reasonable virtual scenes according to the coarse scene layout extracted from position-level annotations. Then we go back to reality by applying a virtual-to-real domain adaptation method, which refines the weak labels and additionally supervises the training of the detector with the virtual scenes. Furthermore, we propose a more challenging benchmark for indoor 3D object detection with more diversity in object sizes to better show the potential of BR. With less than 5% of the labeling labor, we achieve comparable detection performance with some popular fully-supervised approaches on the widely used ScanNet dataset. Code is available at: https://github.com/wyf-ACCEPT/BackToReality

preprint 2022 arXiv

Fine- and Coarse-Granularity Hybrid Self-Attention for Efficient BERT

Transformer-based pre-trained models, such as BERT, have shown extraordinary success in achieving state-of-the-art results in many natural language processing applications. However, deploying these models can be prohibitively costly, as the standard self-attention mechanism of the Transformer suffers from quadratic computational cost in the input sequence length. To confront this, we propose FCA, a fine- and coarse-granularity hybrid self-attention that reduces the computation cost through progressively shortening the computational sequence length in self-attention. Specifically, FCA conducts an attention-based scoring strategy to determine the informativeness of tokens at each layer. Then, the informative tokens serve as the fine-granularity computing units in self-attention and the uninformative tokens are replaced with one or several clusters as the coarse-granularity computing units in self-attention. Experiments on GLUE and RACE datasets show that BERT with FCA achieves 2x reduction in FLOPs over original BERT with <1% loss in accuracy. We show that FCA offers a significantly better trade-off between accuracy and FLOPs compared to prior methods.
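The token-shortening idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the scoring rule (total attention a token receives), the single mean-pooled cluster, and the names `shorten_sequence`, `hidden`, `attention`, and `keep` are all assumptions made for illustration.

```python
def shorten_sequence(hidden, attention, keep):
    """Keep the `keep` most-attended tokens and merge the rest into one vector.

    hidden:    list of token vectors (list of lists of floats)
    attention: square matrix, attention[q][k] = weight query q puts on key k
    keep:      number of tokens kept as fine-granularity units (assumed knob)
    """
    n = len(hidden)
    # Assumed informativeness score: total attention each token receives.
    scores = [sum(attention[q][k] for q in range(n)) for k in range(n)]
    ranked = sorted(range(n), key=lambda k: scores[k], reverse=True)
    fine_idx = sorted(ranked[:keep])      # preserve original token order
    coarse_idx = sorted(ranked[keep:])
    fine = [hidden[i] for i in fine_idx]
    if not coarse_idx:
        return fine
    dim = len(hidden[0])
    # Assumed coarse unit: a single mean-pooled vector of the remaining tokens.
    merged = [sum(hidden[i][d] for i in coarse_idx) / len(coarse_idx)
              for d in range(dim)]
    return fine + [merged]


hidden = [[1.0, 0.0], [0.0, 2.0], [4.0, 4.0], [2.0, 0.0]]
attention = [
    [0.1, 0.6, 0.2, 0.1],
    [0.1, 0.5, 0.3, 0.1],
    [0.2, 0.4, 0.3, 0.1],
    [0.1, 0.5, 0.3, 0.1],
]
shortened = shorten_sequence(hidden, attention, keep=2)
print(len(hidden), "->", len(shortened))
```

Applying such a step at each layer shortens the sequence that the next layer's self-attention has to process, which is where a reduction in quadratic cost would come from.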

preprint 2022 arXiv

Learnability of Competitive Threshold Models

Modeling the spread of social contagions is central to various applications in social computing. In this paper, we study the learnability of the competitive threshold model from a theoretical perspective. We demonstrate how competitive threshold models can be seamlessly simulated by artificial neural networks with finite VC dimensions, which enables analytical sample complexity and generalization bounds. Based on the proposed hypothesis space, we design efficient algorithms under the empirical risk minimization scheme. The theoretical insights are finally translated into practical and explainable modeling methods, the effectiveness of which is verified through a sanity check over a few synthetic and real datasets. The experimental results promisingly show that our method enjoys a decent performance without using excessive data points, outperforming off-the-shelf methods.

preprint 2022 arXiv

LUNA: Learning Slot-Turn Alignment for Dialogue State Tracking

Dialogue state tracking (DST) aims to predict the current dialogue state given the dialogue history. Existing methods generally exploit the utterances of all dialogue turns to assign a value to each slot. This could lead to suboptimal results due to the information introduced from irrelevant utterances in the dialogue history, which may be useless and can even cause confusion. To address this problem, we propose LUNA, a sLot-tUrN Alignment enhanced approach. It first explicitly aligns each slot with its most relevant utterance, then further predicts the corresponding value based on this aligned utterance instead of all dialogue utterances. Furthermore, we design a slot ranking auxiliary task to learn the temporal correlation among slots which could facilitate the alignment. Comprehensive experiments are conducted on multi-domain task-oriented dialogue datasets, i.e., MultiWOZ 2.0, MultiWOZ 2.1, and MultiWOZ 2.2. The results show that LUNA achieves new state-of-the-art results on these datasets.

preprint 2022 arXiv

Modeling and Predicting Citation Count via Recurrent Neural Network with Long Short-Term Memory

The rapid evolution of scientific research has been creating a huge volume of publications every year. Among the many quantification measures of scientific impact, citation count stands out for its frequent use in the research community. Although the peer review process is the main reliable way of predicting a paper's future impact, the ability to foresee lasting impact on the basis of citation records is increasingly important in scientific impact analysis in the era of big data. This paper focuses on long-term citation count prediction for individual publications, which has become an emerging and challenging applied research topic. Based on the four key phenomena confirmed independently in previous studies of long-term scientific impact quantification, namely the intrinsic quality of publications, the aging effect, the Matthew effect, and the recency effect, we unify the formulations of all these observations in this paper. Building on these formulations, we propose a long-term citation count prediction model for individual papers via a recurrent neural network with long short-term memory units. Extensive experiments on a large real citation data set demonstrate that the proposed model consistently outperforms existing methods and achieves a significant performance improvement.

preprint 2022 arXiv

Multiscale Convolutional Transformer with Center Mask Pretraining for Hyperspectral Image Classification

Hyperspectral images (HSI) not only have a broad macroscopic field of view but also contain rich spectral information, and the types of surface objects can be identified through spectral information, which is one of the main applications in hyperspectral image related research. In recent years, more and more deep learning methods have been proposed, among which convolutional neural networks (CNN) are the most influential. However, CNN-based methods struggle to capture long-range dependencies and also require a large amount of labeled data for model training. Besides, most of the self-supervised training methods in the field of HSI classification are based on the reconstruction of input samples, and it is difficult to achieve effective use of unlabeled samples. To address the shortcomings of CNN networks, we propose a novel multi-scale convolutional embedding module for HSI to realize effective extraction of spatial-spectral information, which can be better combined with a Transformer network. In order to make more efficient use of unlabeled data, we propose a new self-supervised pretask. Similar to the masked autoencoder, our pre-training method masks only the corresponding token of the central pixel in the encoder, and inputs the remaining tokens into the decoder to reconstruct the spectral information of the central pixel. Such a pretask can better model the relationship between the central feature and the domain feature, and obtain more stable training results.

preprint 2022 arXiv

MuSCLe: A Multi-Strategy Contrastive Learning Framework for Weakly Supervised Semantic Segmentation

Weakly supervised semantic segmentation (WSSS) has gained significant popularity since it relies only on weak labels such as image level annotations rather than pixel level annotations required by supervised semantic segmentation (SSS) methods. Despite drastically reduced annotation costs, typical feature representations learned from WSSS are only representative of some salient parts of objects and less reliable compared to SSS due to the weak guidance during training. In this paper, we propose a novel Multi-Strategy Contrastive Learning (MuSCLe) framework to obtain enhanced feature representations and improve WSSS performance by exploiting similarity and dissimilarity of contrastive sample pairs at image, region, pixel and object boundary levels. Extensive experiments demonstrate the effectiveness of our method and show that MuSCLe outperforms the current state-of-the-art on the widely used PASCAL VOC 2012 dataset.

preprint 2022 arXiv

OPERA: Operation-Pivoted Discrete Reasoning over Text

Machine reading comprehension (MRC) that requires discrete reasoning involving symbolic operations, e.g., addition, sorting, and counting, is a challenging task. Given this nature, semantic parsing-based methods predict interpretable but complex logical forms. However, logical form generation is nontrivial and even a small perturbation in a logical form will lead to wrong answers. To alleviate this issue, multi-predictor-based methods are proposed to directly predict different types of answers and achieve improvements. However, they ignore the utilization of symbolic operations and encounter a lack of reasoning ability and interpretability. To inherit the advantages of these two types of methods, we propose OPERA, an operation-pivoted discrete reasoning framework, where lightweight symbolic operations (compared with logical forms) as neural modules are utilized to facilitate the reasoning ability and interpretability. Specifically, operations are first selected and then softly executed to simulate the answer reasoning procedure. Extensive experiments on both DROP and RACENum datasets show the reasoning ability of OPERA. Moreover, further analysis verifies its interpretability.

preprint 2021 arXiv

A Machine Learning Approach to Optimal Inverse Discrete Cosine Transform (IDCT) Design

The design of the optimal inverse discrete cosine transform (IDCT) to compensate for the quantization error is proposed for effective lossy image compression in this work. The forward and inverse DCTs are designed as a pair in current image/video coding standards without taking the quantization effect into account. Yet, the distribution of quantized DCT coefficients deviates from that of the original DCT coefficients. This is particularly obvious when the quality factor of JPEG compressed images is small. To address this problem, we first use a set of training images to learn the compound effect of forward DCT, quantization and dequantization in cascade. Then, a new IDCT kernel is learned to reverse the effect of such a pipeline. Experiments are conducted to demonstrate the advantage of the new method, which has a gain of 0.11-0.30dB over standard JPEG over a wide range of quality factors.

preprint 2021 arXiv

Defect $a$-Theorem and $a$-Maximization

Conformal defects describe the universal behaviors of a conformal field theory (CFT) in the presence of a boundary or more general impurities. The coupled critical system is characterized by new conformal anomalies which are analogous to, and generalize those of standalone CFTs. Here we study the conformal $a$- and $c$-anomalies of four dimensional defects in CFTs of general spacetime dimensions greater than four. We prove that under unitary defect renormalization group (RG) flows, the defect $a$-anomaly must decrease, thus establishing the defect $a$-theorem. For conformal defects preserving minimal supersymmetry, the full defect symmetry contains a distinguished $U(1)_R$ subgroup. We derive the anomaly multiplet relations that express the defect $a$- and $c$-anomalies in terms of the defect (mixed) 't Hooft anomalies for this $U(1)_R$ symmetry. Once the $U(1)_R$ symmetry is identified using the defect $a$-maximization principle which we prove, this enables a non-perturbative pathway to the conformal anomalies of strongly coupled defects. We illustrate our methods by discussing a number of examples including boundaries in five dimensions and codimension-two defects in six dimensions. We also comment on chiral algebra sectors of defect operator algebras and potential conformal collider bounds on defect anomalies.

preprint 2020 arXiv

A Workload Adaptive Haptic Shared Control Scheme for Semi-Autonomous Driving

Haptic shared control is used to manage the control authority allocation between a human and an autonomous agent in semi-autonomous driving. Existing haptic shared control schemes, however, do not take full consideration of the human agent. To fill this research gap, this study presents a haptic shared control scheme that adapts to a human operator's workload, eyes on road and input torque in real-time. We conducted human-in-the-loop experiments with 24 participants. In the experiment, a human operator and an autonomy module for navigation shared the control of a simulated notional High Mobility Multipurpose Wheeled Vehicle (HMMWV) at a fixed speed. At the same time, the human operator performed a target detection task for surveillance. The autonomy could be either adaptive or non-adaptive to the above-mentioned human factors. Results indicate that the adaptive haptic control scheme resulted in significantly lower workload, higher trust in autonomy, better driving task performance and smaller control effort.

preprint 2020 arXiv

Attention: to Better Stand on the Shoulders of Giants

Science of science (SciSci) is an emerging discipline wherein science is used to study the structure and evolution of science itself using large data sets. The increasing availability of digital data on scholarly outcomes offers unprecedented opportunities to explore SciSci. In the progress of science, previously discovered knowledge principally inspires new scientific ideas, and citation is a reasonably good reflection of this cumulative nature of scientific research. Research that chooses potentially influential references will have a lead over the emerging publications. Although the peer review process is the main reliable way of predicting a paper's future impact, the ability to foresee the lasting impact based on citation records is increasingly essential in scientific impact analysis in the era of big data. This paper develops an attention mechanism for long-term scientific impact prediction and validates the method based on a real large-scale citation data set. The results break conventional thinking. Instead of accurately simulating the original power-law distribution, emphasizing the limited attention can better stand on the shoulders of giants.

preprint 2020 arXiv

Conforming nanoparticle sheets to surfaces with Gaussian curvature

Nanoparticle monolayer sheets are ultrathin inorganic-organic hybrid materials that combine highly controllable optical and electrical properties with mechanical flexibility and remarkable strength. Like other thin sheets, their low bending rigidity allows them to easily roll into or conform to cylindrical geometries. Nanoparticle monolayers not only can bend, but also cope with strain through local particle rearrangement and plastic deformation. This means that, unlike thin sheets such as paper or graphene, nanoparticle sheets can much more easily conform to surfaces with complex topography characterized by non-zero Gaussian curvature, like spherical caps or saddles. Here, we investigate the limits of nanoparticle monolayers' ability to conform to substrates with Gaussian curvature by stamping nanoparticle sheets onto lattices of larger polystyrene spheres. Tuning the local Gaussian curvature by increasing the size of the substrate spheres, we find that the stamped sheet morphology evolves through three characteristic stages: from full substrate coverage, where the sheet extends over the interstices in the lattice, to coverage in the form of caps that conform tightly to the top portion of each sphere and fracture at larger polar angles, to caps that exhibit radial folds. Through analysis of the nanoparticle positions, obtained from scanning electron micrographs, we extract the local strain tensor and track the onset of strain-induced dislocations in the particle arrangement. By considering the interplay of energies for elastic and plastic deformations and adhesion, we construct arguments that capture the observed changes in sheet morphology as Gaussian curvature is tuned over two orders of magnitude.

preprint 2020 arXiv

From $\mathcal{N}=4$ Super-Yang-Mills on $\mathbb{RP}^4$ to bosonic Yang-Mills on $\mathbb{RP}^2$

We study the four-dimensional $\mathcal{N}=4$ super-Yang-Mills (SYM) theory on the unorientable spacetime manifold $\mathbb{RP}^4$. Using supersymmetric localization, we find that for a large class of local and extended SYM observables preserving a common supercharge $\mathcal{Q}$, their expectation values are captured by an effective two-dimensional bosonic Yang-Mills (YM) theory on an $\mathbb{RP}^2$ submanifold. This paves the way for understanding $\mathcal{N}=4$ SYM on $\mathbb{RP}^4$ using known results of YM on $\mathbb{RP}^2$. As an illustration, we derive a matrix integral form of the SYM partition function on $\mathbb{RP}^4$ which, when decomposed into discrete holonomy sectors, contains subtle phase factors due to the nontrivial $η$-invariant of the Dirac operator on $\mathbb{RP}^4$. We also comment on potential applications of our setup for AGT correspondence, integrability and bulk-reconstruction in AdS/CFT that involve cross-cap states on the boundary.

preprint 2020 arXiv

Learning to Parse Wireframes in Images of Man-Made Environments

In this paper, we propose a learning-based approach to the task of automatically extracting a "wireframe" representation for images of cluttered man-made environments. The wireframe (see Fig. 1) contains all salient straight lines and their junctions of the scene that encode efficiently and accurately large-scale geometry and object shapes. To this end, we have built a very large new dataset of over 5,000 images with wireframes thoroughly labelled by humans. We have proposed two convolutional neural networks that are suitable for extracting junctions and lines with large spatial support, respectively. The networks trained on our dataset have achieved significantly better performance than state-of-the-art methods for junction detection and line segment detection, respectively. We have conducted extensive experiments to evaluate quantitatively and qualitatively the wireframes obtained by our method, and have convincingly shown that effectively and efficiently parsing wireframes for images of man-made environments is a feasible goal within reach. Such wireframes could benefit many important visual tasks such as feature correspondence, 3D reconstruction, vision-based mapping, localization, and navigation. The data and source code are available at https://github.com/huangkuns/wireframe.

preprint 2020 arXiv

Non-perturbative Defect One-Point Functions in Planar $\mathcal{N}=4$ Super-Yang-Mills

The four dimensional $\mathcal{N}=4$ super-Yang-Mills (SYM) theory exhibits rich dynamics in the presence of codimension-one conformal defects. The new structure constants of the extended operator algebra consist of one-point functions of local operators which are nonvanishing due to the defect insertion and carry nontrivial coupling dependence. We study an important class of half-BPS superconformal defects engineered by D5 branes that share three common directions with the D3 branes and involve Nahm pole configurations for the SYM fields on the D3 brane worldvolume. In the planar large $N$ limit, we obtain non-perturbative results in the 't Hooft coupling $λ$ for the defect one-point functions of both BPS and non-BPS operators, building upon recent progress in localization and integrability methods. For BPS operator insertions in the SYM with D5-brane type boundary or interface, we derive an effective two dimensional defect-Yang-Mills (dYM) theory from supersymmetric localization, which gives an efficient way to extract defect observables and generates a novel matrix model for the defect one-point function. By solving the matrix model in the large $N$ limit, we obtain exact results in $λ$ which interpolate between perturbative Feynman diagram contributions in the weak coupling limit and IIB string theory predictions on $AdS_5\times S^5$ in the strong coupling regime, providing a precision test of AdS/CFT with interface defects. For general non-BPS operators, we develop a non-perturbative bootstrap-type program for integrable boundary states on the worldsheet of the IIB string theory, corresponding to the interface defects in the planar SYM. Such integrable boundary states are constrained by a set of general consistency conditions for which we present explicit solutions that reproduce and extend the known results at weak coupling from integrable spin-chain methods.

preprint 2020 arXiv

Nonabelian Mirror Symmetry Beyond the Chiral Ring

Mirror symmetry is a type of infrared duality in 3D quantum field theory that relates the low-energy dynamics of two distinct ultraviolet descriptions. Though first discovered in the supersymmetric context, it has far-reaching implications for understanding nonperturbative physics in general 3D quantum field theories. We study mirror symmetry in 3D $\mathcal{N}=4$ supersymmetric field theories whose Higgs or Coulomb branches realize $D$- and $E$-type Kleinian singularities in the $ADE$ classification, generalizing previous work on the $A$-type case. Such theories include the $SU(2)$ gauge theory coupled to fundamental matter in the $D$-type case and non-Lagrangian generalizations thereof in the $E$-type case. In these cases, the mirror description is given by a quiver gauge theory of affine $D$- or $E$-type. We investigate the mirror map at the level of the recently identified 1D protected subsector described by topological quantum mechanics, which implements a deformation quantization of the corresponding $ADE$ singularity. We give an explicit dictionary between the monopole operators and their dual mesonic operators in the $D$-type case. Along the way, we extract various operator product expansion (OPE) coefficients for the quantized Higgs and Coulomb branches. We conclude by offering some perspectives on how the topological subsectors of the $E$-type quivers might shed light on their non-Lagrangian duals.

preprint 2020 arXiv

People as Scene Probes

By analyzing the motion of people and other objects in a scene, we demonstrate how to infer depth, occlusion, lighting, and shadow information from video taken from a single camera viewpoint. This information is then used to composite new objects into the same scene with a high degree of automation and realism. In particular, when a user places a new object (2D cut-out) in the image, it is automatically rescaled, relit, occluded properly, and casts realistic shadows in the correct direction relative to the sun, and which conform properly to scene geometry. We demonstrate results (best viewed in supplementary video) on a range of scenes and compare to alternative methods for depth estimation and shadow compositing.

preprint 2020 arXiv

PointHop++: A Lightweight Learning Model on Point Sets for 3D Classification

The PointHop method was recently proposed by Zhang et al. for 3D point cloud classification with unsupervised feature extraction. It has an extremely low training complexity while achieving state-of-the-art classification performance. In this work, we further improve the PointHop method in two aspects: 1) reducing its model complexity in terms of the model parameter number and 2) ordering discriminant features automatically based on the cross-entropy criterion. The resulting method is called PointHop++. The first improvement is essential for wearable and mobile computing while the second improvement bridges statistics-based and optimization-based machine learning methodologies. With experiments conducted on the ModelNet40 benchmark dataset, we show that the PointHop++ method performs on par with deep neural network (DNN) solutions and surpasses other unsupervised feature extraction methods.

preprint 2020 arXiv

VC-Net: Deep Volume-Composition Networks for Segmentation and Visualization of Highly Sparse and Noisy Image Data

The motivation of our work is to present a new visualization-guided computing paradigm to combine direct 3D volume processing and volume rendered clues for effective 3D exploration such as extracting and visualizing microstructures in-vivo. However, it is still challenging to extract and visualize high fidelity 3D vessel structure due to its high sparseness, noisiness, and complex topology variations. In this paper, we present an end-to-end deep learning method, VC-Net, for robust extraction of 3D microvasculature through embedding the image composition, generated by maximum intensity projection (MIP), into 3D volume image learning to enhance the performance. The core novelty is to automatically leverage the volume visualization technique (MIP) to enhance the 3D data exploration at deep learning level. The MIP embedding features can enhance the local vessel signal and are adaptive to the geometric variability and scalability of vessels, which is crucial in microvascular tracking. A multi-stream convolutional neural network is proposed to learn the 3D volume and 2D MIP features respectively and then explore their inter-dependencies in a joint volume-composition embedding space by unprojecting the MIP features into 3D volume embedding space. The proposed framework can better capture small / micro vessels and improve vessel connectivity. To our knowledge, this is the first deep learning framework to construct a joint convolutional embedding space, where the computed vessel probabilities from volume rendering based 2D projection and 3D volume can be explored and integrated synergistically. Experimental results are compared with the traditional 3D vessel segmentation methods and the deep learning state-of-the-art on public and real patient (micro-)cerebrovascular image datasets. Our method demonstrates the potential in a powerful MR arteriogram and venogram diagnosis of vascular diseases.
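Maximum intensity projection (MIP), the volume visualization technique this abstract builds on, can be sketched in a few lines. This is only the standard projection step, not the VC-Net network; the function name `max_intensity_projection` and the `[z][y][x]` nested-list layout are assumptions chosen for illustration.

```python
def max_intensity_projection(volume, axis=0):
    """Project a 3D volume to 2D by keeping the brightest voxel along one axis.

    volume: nested lists indexed as volume[z][y][x] (assumed layout)
    axis:   0 projects along z, 1 along y, 2 along x
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    if axis == 0:
        return [[max(volume[z][y][x] for z in range(depth))
                 for x in range(width)] for y in range(height)]
    if axis == 1:
        return [[max(volume[z][y][x] for y in range(height))
                 for x in range(width)] for z in range(depth)]
    if axis == 2:
        return [[max(row) for row in plane] for plane in volume]
    raise ValueError("axis must be 0, 1 or 2")


# A tiny 2x2x2 volume: two z-slices, each with two rows of two voxels.
volume = [
    [[0, 1], [2, 0]],
    [[5, 0], [0, 3]],
]
mip_z = max_intensity_projection(volume, axis=0)
print(mip_z)
```

Because each output pixel keeps only the brightest voxel along its ray, thin bright structures such as vessels stay visible in the 2D image even when they occupy very few voxels in the volume.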

preprint 2019 arXiv

$\mathcal{N}=4$ Super-Yang-Mills Correlators at Strong Coupling from String Theory and Localization

We compute $1/λ$ corrections to the four-point functions of half-BPS operators in $SU(N)$ $\mathcal{N}=4$ super-Yang-Mills theory at large $N$ and large 't Hooft coupling $λ=g_\text{YM}^2 N$ using two methods. Firstly, we relate integrals of these correlators to derivatives of the mass deformed $S^4$ free energy, which was computed at leading order in large $N$ and to all orders in $1/λ$ using supersymmetric localization. Secondly, we use AdS/CFT to relate these $1/λ$ corrections to higher derivative corrections to supergravity for scattering amplitudes of Kaluza-Klein scalars in IIB string theory on $AdS_5\times S^5$, which in the flat space limit are known from worldsheet calculations. These two methods match at the order corresponding to the tree level $R^4$ interaction in string theory, which provides a precise check of AdS/CFT beyond supergravity, and allow us to derive the holographic correlators to tree level $D^4R^4$ order. Combined with constraints from arXiv:1809.10670, our results can be used to derive CFT data to one-loop $D^4R^4$ order. Finally, we use AdS/CFT to fix these correlators in the limit where $N$ is taken to be large while $g_{\rm YM}$ is kept fixed. In this limit, we present a conjecture for the small mass limit of the $S^4$ partition function that includes all instanton corrections and is written in terms of the same Eisenstein series that appear in the study of string theory scattering amplitudes.

preprint 2019 arXiv

An exact quantization of Jackiw-Teitelboim gravity

We propose an exact quantization of two-dimensional Jackiw-Teitelboim (JT) gravity by formulating the JT gravity theory as a 2D gauge theory placed in the presence of a loop defect. The gauge group is a certain central extension of $PSL(2, \mathbb{R})$ by $\mathbb{R}$. We find that the exact partition function of our theory when placed on a Euclidean disk matches precisely the finite temperature partition function of the Schwarzian theory. We show that observables on both sides are also precisely matched: correlation functions of boundary-anchored Wilson lines in the bulk are given by those of bi-local operators in the Schwarzian theory. In the gravitational context, the Wilson lines are shown to be equivalent to the world-lines of massive particles in the metric formulation of JT gravity.