Trust snapshot

Quick read

Trust 21 (Emerging)
Verification L1
Unclaimed author
39 works
0 followers
22 topics
4 close collaborators

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Published work

39 published items

Preprint (arXiv, 2026)

Advantages and disadvantages of maximally entangled states in dilaton black hole background

We investigate quantum entanglement and coherence for four classes of Bell-like fermionic states in the vicinity of the event horizon of a Garfinkle-Horowitz-Strominger (GHS) dilaton black hole. Contrary to the common expectation that maximally entangled states always provide superior quantum resources, our results show that their entanglement can be lower than that of suitably chosen non-maximally entangled states in this curved spacetime background. This reveals that non-maximally entangled states may offer operational advantages for entanglement-based tasks under gravitational effects. In contrast, quantum coherence exhibits monotonic behavior: larger initial coherence leads to systematically enhanced robustness against dilaton-induced degradation. These results indicate that the optimal choice of initial quantum states depends sensitively on the specific quantum resource, either quantum entanglement or quantum coherence, required for quantum information processing near a dilaton black hole.

Preprint (arXiv, 2026)

Can LLMs See Without Pixels? Benchmarking Spatial Intelligence from Textual Descriptions

Recent advances in Spatial Intelligence (SI) have relied predominantly on Vision-Language Models (VLMs), yet a critical question remains: does spatial understanding originate in the visual encoder or in the underlying reasoning backbone? To address this question, we introduce SiT-Bench, a novel benchmark designed to evaluate the SI performance of Large Language Models (LLMs) without pixel-level input. It comprises over 3,800 expert-annotated items across five primary categories and 17 subtasks, ranging from egocentric navigation and perspective transformation to fine-grained robotic manipulation. By converting single- and multi-view scenes into high-fidelity, coordinate-aware textual descriptions, we challenge LLMs to perform symbolic textual reasoning rather than visual pattern matching. Evaluation of state-of-the-art (SOTA) LLMs reveals that while models are proficient at localized semantic tasks, a significant "spatial gap" remains in global consistency. Notably, we find that explicit spatial reasoning significantly boosts performance, suggesting that LLMs possess latent world-modeling potential. SiT-Bench serves as a foundational resource to foster the development of spatially grounded LLM backbones for future VLMs and embodied agents. Our code and benchmark will be released at https://github.com/binisalegend/SiT-Bench .

Preprint (arXiv, 2026)

D$^3$R-DETR: DETR with Dual-Domain Density Refinement for Tiny Object Detection in Aerial Images

Detecting tiny objects plays a vital role in the intelligent interpretation of remote sensing imagery, as these objects often carry critical information for downstream applications. However, due to extremely limited pixel information and significant variations in object density, mainstream Transformer-based detectors often suffer from slow convergence and inaccurate query-object matching. To address these challenges, we propose D$^3$R-DETR, a novel DETR-based detector with Dual-Domain Density Refinement. By fusing spatial- and frequency-domain information, our method refines low-level feature maps and uses their rich details to predict a more accurate object density map, thereby guiding the model to precisely localize tiny objects. Extensive experiments on the AI-TOD-v2 dataset demonstrate that D$^3$R-DETR outperforms existing state-of-the-art detectors for tiny object detection.

Preprint (arXiv, 2026)

Dressed-state relaxation in coupled qubits as a source of two-qubit gate errors

Understanding error mechanisms in two-qubit gate operations is essential for building high-fidelity quantum processors. While prior studies typically treat dephasing noise as either Markovian or dominated by low frequencies, realistic qubit environments exhibit structured, frequency-dependent spectra. Here we demonstrate that noise at frequencies matching the dressed-state energy splitting, which is set by the inter-qubit coupling strength $g$, induces a distinct relaxation channel that degrades gate performance. Through combined theoretical analysis and experimental verification on superconducting qubits with engineered noise spectra, we show that two-qubit gate errors scale predictably with the noise power spectral density at frequency $2g$, extending the concept of $T_{1\rho}$ relaxation to interacting systems. This frequency-selective relaxation mechanism, universal across platforms, enriches our understanding of decoherence pathways during gate operations. The same mechanism sets coherence limits for dual-rail or singlet-triplet encodings.
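
The $2g$ in this abstract is the gap between the dressed states $(|01\rangle \pm |10\rangle)/\sqrt{2}$ of two resonant qubits with exchange coupling $g$. A minimal check, assuming an idealized exchange Hamiltonian restricted to the single-excitation subspace $\{|01\rangle, |10\rangle\}$ with $\hbar = 1$ and an optional qubit-qubit detuning; the function name `dressed_splitting` is ours, not the paper's:

```python
import math


def dressed_splitting(g: float, detuning: float = 0.0) -> float:
    """Gap between the two dressed states of a coupled-qubit pair (hbar = 1).

    Assumes the single-excitation block of an exchange Hamiltonian,
    written in the {|01>, |10>} basis as
        [[ detuning / 2,  g            ],
         [ g,            -detuning / 2 ]].
    """
    a, b, d = detuning / 2, g, -detuning / 2
    # Eigenvalues of a real symmetric 2x2 matrix [[a, b], [b, d]] are
    # (a + d) / 2 +/- sqrt(((a - d) / 2) ** 2 + b ** 2), so the gap is twice the root.
    half_gap = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return 2 * half_gap


print(dressed_splitting(1.0))       # 2.0  (on resonance: 2g)
print(dressed_splitting(3.0, 8.0))  # 10.0 (detuned: sqrt(4g^2 + detuning^2))
```

On resonance the gap is exactly $2g$, which is the frequency at which, per the abstract, the noise spectral density sets the two-qubit gate error.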

Preprint (arXiv, 2026)

ELDOR: A Dataset and Benchmark for Illegal Gold Mining in the Amazon Rainforest

Illegal gold mining in the Amazon rainforest causes deforestation, water contamination, and long-term ecosystem disruption, yet remains difficult to monitor at fine spatial scales. Satellite imagery supports large-scale observation, but often misses small mining-related structures and subtle land-cover transitions, especially under frequent cloud cover. We introduce ELDOR, a large-scale UAV benchmark for monitoring environmental and landscape disturbance from illegal gold mining in the rainforest. ELDOR contains manually annotated orthomosaic imagery covering over 2,500 hectares, with pixel-level semantic labels for both mining-related activities and surrounding ecological structures. With this unified annotation source, we establish four benchmark tasks: semantic segmentation, segmentation-derived recognition, direct multi-label classification, and class-presence recognition with vision-language models. Across these tasks, we compare generic and remote-sensing-specific segmentation models, vision foundation model-related segmentation methods, direct multi-label classification methods, and vision-language models under a controlled closed-set protocol. Results show that current methods still struggle with rare small-scale mining structures and fine-grained recovery classes, suggesting the need for context-aware and multimodal modeling. To support domain analysis and practical use, we further build an interactive explorer for domain experts that provides a unified interface for data exploration and model inference.
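
The segmentation-derived recognition task turns a pixel-level mask into the set of classes present in an image. A minimal sketch, assuming an integer label mask, a hypothetical id-to-name mapping (ELDOR's actual label set is not listed in the abstract), and a hypothetical minimum pixel count for suppressing isolated noisy pixels:

```python
from collections import Counter
from typing import Dict, List, Set


def classes_present(
    mask: List[List[int]],
    id_to_name: Dict[int, str],
    min_pixels: int = 1,
) -> Set[str]:
    """Return the names of classes covering at least `min_pixels` pixels.

    Label ids missing from `id_to_name` (e.g. background) are ignored.
    """
    counts = Counter(label for row in mask for label in row)
    return {
        id_to_name[label]
        for label, n in counts.items()
        if label in id_to_name and n >= min_pixels
    }


# Hypothetical labels: 0 = background, 1 = mining pit, 2 = water.
mask = [[0, 0, 1],
        [0, 2, 2]]
names = {1: "mining pit", 2: "water"}
print(sorted(classes_present(mask, names)))                # ['mining pit', 'water']
print(sorted(classes_present(mask, names, min_pixels=2)))  # ['water']
```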

Preprint (arXiv, 2026)

GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

We present GLM-4.1V-Thinking, GLM-4.5V, and GLM-4.6V, a family of vision-language models (VLMs) designed to advance general-purpose multimodal understanding and reasoning. In this report, we share our key findings in the development of the reasoning-centric training framework. We first develop a capable vision foundation model with significant potential through large-scale pre-training, which arguably sets the upper bound for the final performance. We then propose Reinforcement Learning with Curriculum Sampling (RLCS) to unlock the full potential of the model, leading to comprehensive capability enhancement across a diverse range of tasks, including STEM problem solving, video understanding, content recognition, coding, grounding, GUI-based agents, and long document interpretation. In a comprehensive evaluation across 42 public benchmarks, GLM-4.5V achieves state-of-the-art performance on nearly all tasks among open-source models of similar size, and demonstrates competitive or even superior results compared to closed-source models such as Gemini-2.5-Flash on challenging tasks including Coding and GUI Agents. Meanwhile, the smaller GLM-4.1V-9B-Thinking remains highly competitive, achieving superior results to the much larger Qwen2.5-VL-72B on 29 benchmarks. We open-source both GLM-4.1V-9B-Thinking and GLM-4.5V. We further introduce the GLM-4.6V series, open-source multimodal models with native tool use and a 128K context window. A brief overview is available at https://z.ai/blog/glm-4.6v. Code, models and more information are released at https://github.com/zai-org/GLM-V.

Preprint (arXiv, 2026)

GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents

We present GLM-5V-Turbo, a step toward native foundation models for multimodal agents. As foundation models are increasingly deployed in real environments, agentic capability depends not only on language reasoning, but also on the ability to perceive, interpret, and act over heterogeneous contexts such as images, videos, webpages, documents, and GUIs. GLM-5V-Turbo is built around this objective: multimodal perception is integrated as a core component of reasoning, planning, tool use, and execution, rather than as an auxiliary interface to a language model. This report summarizes the main improvements behind GLM-5V-Turbo across model design, multimodal training, reinforcement learning, toolchain expansion, and integration with agent frameworks. These developments lead to strong performance in multimodal coding, visual tool use, and framework-based agentic tasks, while preserving competitive text-only coding capability. More importantly, our development process offers practical insights for building multimodal agents, highlighting the central role of multimodal perception, hierarchical optimization, and reliable end-to-end verification.

Preprint (arXiv, 2026)

Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling

The token serves as the fundamental unit of computation in modern autoregressive models, and generation length directly influences both inference cost and reasoning performance. Despite its importance, existing approaches lack fine-grained length modeling, operating primarily at the coarse-grained sequence level. We introduce the Length Value Model (LenVM), a token-level framework that models the remaining generation length. By formulating length modeling as a value estimation problem and assigning a constant negative reward to each generated token, LenVM predicts a bounded, discounted return that serves as a monotone proxy for the remaining generation horizon. This formulation yields supervision that is annotation-free, dense, unbiased, and scalable. Experiments on LLMs and VLMs demonstrate that LenVM provides a highly effective signal at inference time. On the LIFEBench exact length matching task, applying LenVM to a 7B model improves the length score from 30.9 to 64.8, significantly outperforming frontier closed-source models. Furthermore, LenVM enables continuous control over the trade-off between performance and efficiency. On GSM8K at a budget of 200 tokens, LenVM maintains 63% accuracy, compared with 6% for the token-budget baseline. It also accurately predicts the total generation length from the prompt boundary. Finally, LenVM's token-level values offer an interpretable view of generation dynamics, revealing how specific tokens shift reasoning toward shorter or longer regimes. These results demonstrate that LenVM supports a broad range of applications and that token length can be effectively modeled as a token-level value signal, highlighting the potential of LenVM as a general framework for length modeling and as a length-specific value signal that could support future RL training. Code is available at https://github.com/eric-ai-lab/Length-Value-Model.
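
With a constant per-token reward $r < 0$ and discount $\gamma \in (0, 1)$, the return from a position with $R$ tokens still to generate is $V = r\,(1-\gamma^{R})/(1-\gamma)$. It is bounded in $(r/(1-\gamma), 0]$ and strictly decreasing in $R$, so it can be inverted to recover $R$. A minimal sketch of these regression targets, assuming $r = -1$ and $\gamma = 0.99$ (both illustrative choices, not values from the paper) and our own convention for which tokens count as "remaining":

```python
import math
from typing import List

GAMMA = 0.99    # illustrative discount, not taken from the paper
REWARD = -1.0   # illustrative constant per-token reward


def length_value(remaining: int, gamma: float = GAMMA, reward: float = REWARD) -> float:
    """Discounted return of `remaining` future tokens, each earning `reward`."""
    return reward * (1 - gamma ** remaining) / (1 - gamma)


def value_targets(seq_len: int, gamma: float = GAMMA, reward: float = REWARD) -> List[float]:
    """Per-position targets for a sequence of `seq_len` generated tokens.

    Convention assumed here: position t still has seq_len - t tokens to generate.
    """
    return [length_value(seq_len - t, gamma, reward) for t in range(seq_len)]


def remaining_from_value(value: float, gamma: float = GAMMA, reward: float = REWARD) -> float:
    """Invert V = r (1 - gamma^R) / (1 - gamma) to recover R."""
    return math.log(1 - value * (1 - gamma) / reward) / math.log(gamma)


targets = value_targets(5)
print([round(v, 4) for v in targets])                    # rises toward 0 as the end nears
print(round(remaining_from_value(length_value(37)), 6))  # 37.0
```

Discounting is what keeps the target bounded, which is usually easier to regress than a raw token count.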

Preprint (arXiv, 2026)

Less is More: Improving LLM Reasoning with Minimal Test-Time Intervention

Recent progress in large language models (LLMs) has focused on test-time scaling to improve reasoning via increased inference computation, but often at the cost of efficiency. We revisit test-time behavior and uncover a simple yet underexplored phenomenon: reasoning uncertainty is highly localized, and only a small subset of high-entropy tokens dominantly affects output correctness. Motivated by this, we propose Minimal Test-Time Intervention (MTI), a training-free framework that enhances reasoning accuracy and stability with minimal overhead. MTI includes: (i) selective CFG intervention, which applies classifier-free guidance only at uncertain positions; and (ii) lightweight negative-prompt guidance, which reuses the main model's KV cache to approximate unconditional decoding efficiently. MTI yields consistent gains across general, coding, and STEM tasks (e.g., a +9.28% average improvement on six benchmarks for DeepSeek-R1-7B and +11.25% on AIME2024 using Ling-mini-2.0) while remaining highly efficient.
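
The selective-CFG idea can be sketched per decoding step: measure the entropy of the conditional next-token distribution and apply classifier-free guidance, $\ell = \ell_u + w(\ell_c - \ell_u)$, only when that entropy exceeds a threshold. This is a minimal sketch under stated assumptions: the threshold, the guidance scale $w$, and the function names are hypothetical, and the unconditional (negative-prompt) logits are supplied by the caller rather than computed from a reused KV cache as the abstract describes.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: List[float]) -> float:
    return -sum(p * math.log(p) for p in probs if p > 0)


def selective_cfg_step(
    cond_logits: List[float],
    uncond_logits: List[float],
    cfg_scale: float = 1.5,          # hypothetical guidance scale w
    entropy_threshold: float = 1.0,  # hypothetical threshold, in nats
) -> Tuple[List[float], bool]:
    """Return (logits to sample from, whether guidance was applied)."""
    if entropy(softmax(cond_logits)) < entropy_threshold:
        return cond_logits, False  # confident position: no intervention
    guided = [u + cfg_scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]
    return guided, True


print(selective_cfg_step([10.0, 0.0, 0.0], [0.0, 0.0, 0.0]))
# ([10.0, 0.0, 0.0], False)
print(selective_cfg_step([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], cfg_scale=2.0))
# ([-1.0, 0.0, 0.0], True)
```

In a real decoder the unconditional logits would only need to be computed once the threshold is crossed, which is where the efficiency of this scheme comes from.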

Preprint (arXiv, 2026)

Mitigating Measurement Crosstalk via Pulse Shaping

Quantum error correction protocols require rapid and repeated qubit measurements. While multiplexed readout in superconducting quantum systems improves efficiency, fast probe pulses introduce spectral broadening, leading to signal leakage into neighboring readout resonators. This crosstalk results in qubit dephasing and degraded readout fidelity. Here, we introduce a pulse shaping technique inspired by the derivative removal by adiabatic gate (DRAG) protocol to suppress measurement crosstalk during fast readout. By engineering a spectral notch at neighboring resonator frequencies, the method effectively mitigates spurious signal interference. Our approach integrates seamlessly with existing readout architectures, enabling fast, low-crosstalk multiplexed measurements without additional hardware overhead, a critical advancement for scalable quantum computing.
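
A DRAG-style notch can be illustrated with a single Gaussian envelope $g(t)$. Adding a quadrature derivative term gives $\tilde g(t) = g(t) + (i/\Delta)\,\dot g(t)$, whose Fourier transform (convention $\int e(t)\,e^{-i\omega t}\,dt$) is $G(\omega)(1-\omega/\Delta)$. The spectrum therefore vanishes at detuning $\omega = \Delta$ while the DC component is unchanged. The sketch below checks this numerically; the pulse center, width, and detuning are hypothetical values, and the paper's actual readout pulse may use a different shape or several notches.

```python
import cmath
import math
from typing import Callable

T0, SIGMA = 60.0, 10.0      # pulse center and width in ns (hypothetical)
DELTA = 2 * math.pi * 0.03  # neighbor detuning in rad/ns, i.e. 30 MHz (hypothetical)


def gaussian(t: float) -> float:
    return math.exp(-((t - T0) ** 2) / (2 * SIGMA ** 2))


def gaussian_dot(t: float) -> float:
    # Analytic time derivative of the Gaussian envelope.
    return -(t - T0) / SIGMA ** 2 * gaussian(t)


def notched(t: float) -> complex:
    # DRAG-style correction g(t) + (i / DELTA) g'(t); its spectrum is G(w) (1 - w / DELTA).
    return gaussian(t) + 1j / DELTA * gaussian_dot(t)


def spectrum_at(envelope: Callable[[float], complex], omega: float,
                t_max: float = 2 * T0, n: int = 4000) -> complex:
    """Riemann-sum approximation of the integral of envelope(t) exp(-i omega t) on [0, t_max]."""
    dt = t_max / n
    return dt * sum(envelope(k * dt) * cmath.exp(-1j * omega * k * dt) for k in range(n))


plain = abs(spectrum_at(gaussian, DELTA))
shaped = abs(spectrum_at(notched, DELTA))
print(f"|spectrum| at the neighbor: plain {plain:.3g}, shaped {shaped:.3g}")
```

Because the correction is built from the derivative of the existing envelope and applied on the same drive, this style of shaping is consistent with the abstract's claim of no additional hardware.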

Preprint (arXiv, 2026)

SpatialBench: Can Agents Analyze Real-World Spatial Biology Data?

Spatial transcriptomics assays are rapidly increasing in scale and complexity, making computational analysis a major bottleneck in biological discovery. Although frontier AI agents have improved dramatically at software engineering and general data analysis, it remains unclear whether they can extract biological insight from messy, real-world spatial datasets. We introduce SpatialBench, a benchmark of 146 verifiable problems derived from practical spatial analysis workflows spanning five spatial technologies and seven task categories. Each problem provides a snapshot of experimental data immediately prior to an analysis step and a deterministic grader that evaluates recovery of a key biological result. Benchmark data on frontier models shows that base model accuracy remains low (20-38% across model families), with strong model-task and model-platform interactions. Harness design has a large empirical effect on performance, indicating that tools, prompts, control flow, and execution environment should be evaluated and improved as first-class objects. SpatialBench serves both as a measurement tool and a diagnostic lens for developing agents that can interact with real spatial datasets faithfully, transparently, and reproducibly.

Preprint (arXiv, 2026)

Suppressing spurious transitions using spectrally balanced pulse

Achieving precise control over quantum systems presents a significant challenge, especially in many-body setups, where residual couplings and unintended transitions undermine the accuracy of quantum operations. In superconducting qubits, parasitic interactions (both between distant qubits and with spurious two-level systems) can severely limit the performance of quantum gates. In this work, we introduce a pulse-shaping technique that uses spectrally balanced microwave pulses to suppress undesired transitions. Experimental results demonstrate an order-of-magnitude reduction in spurious excitations between weakly detuned qubits, as well as a substantial decrease in single-qubit gate errors caused by a strongly coupled two-level defect over a broad frequency range. Our method provides a simple yet powerful solution to mitigate adverse effects from parasitic couplings, enhancing the fidelity of quantum operations and expanding feasible frequency allocations for large-scale quantum devices.

Preprint (arXiv, 2026)

TransLibEval: Demystify Large Language Models' Capability in Third-party Library-targeted Code Translation

In recent years, Large Language Models (LLMs) have been widely studied for code translation at the method, class, and even repository levels. However, most existing benchmarks are limited in the categories and scale of Third-Party Libraries (TPLs) they cover, making TPL-related errors hard to expose and hindering the development of targeted solutions. Given the high dependence (over 90%) on TPLs in practical programming, demystifying and analyzing LLMs' code translation performance involving various TPLs becomes imperative. To address this gap, we construct TransLibEval, the first benchmark dedicated to library-centric code translation. It consists of 200 real-world tasks across Python, Java, and C++, each explicitly involving TPLs from diverse categories such as data processing, machine learning, and web development, with comprehensive dependency coverage and high-coverage test suites. We evaluate seven recent LLMs of commercial, general, and code-specialized families under six translation strategies of three categories: Direct, IR-guided, and Retrieval-augmented. Experimental results show a dramatic performance drop compared with library-free settings (average CA decline over 60%), while diverse strategies demonstrate heterogeneous advantages. Furthermore, we analyze 4,831 failed cases from GPT-4o, one of the State-of-the-Art (SOTA) LLMs, revealing numerous third-party reference errors that had previously been obscured. These findings highlight the unique challenges of library-centric translation and provide practical guidance for improving TPL-aware code intelligence.

Preprint (arXiv, 2026)

TransMamba: A Sequence-Level Hybrid Transformer-Mamba Language Model

Transformers are the cornerstone of modern large language models, but their quadratic computational complexity limits efficiency in long-sequence processing. Recent advances in Mamba, a state space model (SSM) with linear complexity, offer promising efficiency gains but suffer from unstable contextual learning and limited multitask generalization. Some works adopt layer-level hybrid structures that combine Transformer and Mamba layers, aiming to exploit the advantages of both. This paper proposes TransMamba, a novel sequence-level hybrid framework that unifies Transformer and Mamba through shared parameter matrices (QKV and CBx), and can therefore dynamically switch between attention and SSM mechanisms at different token lengths and layers. We design the Memory Converter to bridge Transformer and Mamba by converting attention outputs into SSM-compatible states, ensuring seamless information flow at the TransPoints where the transformation happens. We also thoroughly explore TransPoint scheduling to balance effectiveness and efficiency. Extensive experiments demonstrate that TransMamba achieves superior training efficiency and performance compared to single and hybrid baselines, and validate the deeper consistency between Transformer and Mamba paradigms at the sequence level, offering a scalable solution for next-generation language modeling. Code and data are available at https://github.com/Yixing-Li/TransMamba

Preprint (arXiv, 2025)

PlotGen-Bench: Evaluating VLMs on Generating Visualization Code from Diverse Plots across Multiple Libraries

Recent advances in vision-language models (VLMs) have expanded their multimodal code generation capabilities, yet their ability to generate executable visualization code from plots, especially for complex 3D plots, animations, plot-to-plot transformations, or multi-library scenarios, remains underexplored. To address this gap, we introduce PlotGen-Bench, a comprehensive benchmark for evaluating plot-to-code generation under realistic and complex visualization scenarios. The benchmark spans 9 major categories, 30 subcategories, and 3 core tasks (plot replication, plot transformation, and multi-library generation), covering 2D, 3D, and animated plots across 5 widely used visualization libraries. Through systematic evaluation of state-of-the-art open- and closed-source VLMs, we find that open-source models still lag considerably behind in visual fidelity and semantic consistency, despite achieving comparable code executability. Moreover, all models exhibit substantial degradation on reasoning-intensive tasks such as chart type conversion and animation generation. PlotGen-Bench establishes a rigorous foundation for advancing research toward more capable and reliable VLMs for visualization authoring and code synthesis, with all data and code available at https://plotgen.github.io.

Preprint (arXiv, 2024)

KCES: A Workflow Containerization Scheduling Scheme Under Cloud-Edge Collaboration Framework

As more IoT applications gradually move towards the cloud-edge collaborative mode, the containerized scheduling of workflows extends from the cloud to the edge. However, given the high delay of the communication network, loose coupling of structure, and resource heterogeneity between cloud and edge, workflow containerization scheduling in the cloud-edge scenarios faces the difficulty of resource coordination and application collaboration management. To address these two issues, we propose a KubeEdge-Cloud-Edge-Scheduling scheme named KCES, a workflow containerization scheduling scheme for the KubeEdge cloud-edge framework. The KCES includes a cloud-edge workflow scheduling engine for KubeEdge and workflow scheduling strategies for task horizontal roaming and vertical offloading. Considering the scheduling optimization of cloud-edge workflows, this paper proposes a cloud-edge workflow scheduling model and cloud-edge node model and designs a cloud-edge workflow scheduling engine to maximize cloud-edge resource utilization under the constraint of workflow task delay. A cloud-edge resource hybrid management technology is used to design the cloud-edge resource evaluation and resource allocation algorithms to achieve cloud-edge resource collaboration. Based on the ideas of distributed functional roles and the hierarchical division of computing power, the horizontal roaming among the edges and vertical offloading strategies between the cloud and edges for workflow tasks are designed to realize the cloud-edge application collaboration. Through a customized IoT application workflow instance, experimental results show that KCES is superior to the baseline in total workflow duration, average workflow duration, and resource usage and has the capabilities of horizontal roaming and vertical offloading for workflow tasks.
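
The two collaboration strategies can be pictured with a toy placement rule: run a task on its local edge node if it fits, otherwise roam horizontally to the neighboring edge node with the most free CPU, and only then offload vertically to the cloud. This is a hypothetical greedy sketch (the `Node` type, resource units, and tie-breaking are ours); KCES's actual scheduling engine also models workflow task delay constraints, which are omitted here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    free_cpu: float  # cores
    free_mem: float  # GiB

    def fits(self, cpu: float, mem: float) -> bool:
        return self.free_cpu >= cpu and self.free_mem >= mem


def place_task(cpu: float, mem: float, local: Node,
               neighbors: List[Node], cloud: Node) -> Optional[str]:
    """Greedy placement: local edge, then horizontal roaming, then vertical offloading."""
    if local.fits(cpu, mem):
        target = local
    else:
        roam = [n for n in neighbors if n.fits(cpu, mem)]
        if roam:
            target = max(roam, key=lambda n: n.free_cpu)  # horizontal roaming
        elif cloud.fits(cpu, mem):
            target = cloud                                 # vertical offloading
        else:
            return None                                    # no capacity anywhere
    target.free_cpu -= cpu
    target.free_mem -= mem
    return target.name


local = Node("edge-1", free_cpu=1.0, free_mem=2.0)
neighbors = [Node("edge-2", 4.0, 8.0), Node("edge-3", 2.0, 4.0)]
cloud = Node("cloud", 64.0, 256.0)
print(place_task(0.5, 1.0, local, neighbors, cloud))  # edge-1
print(place_task(2.0, 1.0, local, neighbors, cloud))  # edge-2
print(place_task(3.0, 1.0, local, neighbors, cloud))  # cloud
```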

Preprint (arXiv, 2024)

TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models

Although Multi-modal Large Language Models (MM-LLMs) have made exciting strides recently, they still struggle to efficiently model the interactions among multi-modal inputs and to generate in non-textual modalities. In this work, we propose TEAL (Tokenize and Embed ALL), an approach that treats the input from any modality as a token sequence and learns a joint embedding space for all modalities. Specifically, for input from any modality, TEAL first discretizes it into a token sequence with an off-the-shelf tokenizer and embeds the token sequence into a joint embedding space with a learnable embedding matrix. MM-LLMs then only need to predict the multi-modal tokens autoregressively, as textual LLMs do. Finally, the corresponding de-tokenizer is applied to generate the output in each modality from the predicted token sequence. With the joint embedding space, TEAL enables frozen LLMs to perform both understanding and generation tasks involving non-textual modalities, such as image and audio. Thus, the textual LLM can serve simply as an interface and maintain its high performance in textual understanding and generation. Experiments show that TEAL achieves substantial improvements in multi-modal understanding and implements a simple scheme for multi-modal generation.
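
One simple way to picture a joint token space is to give each modality's tokenizer a disjoint id range, so that a single embedding matrix with one row per joint id can embed text, image, and audio tokens alike, and each predicted id can be routed back to the right de-tokenizer. This is a hypothetical layout for illustration only: the vocabulary sizes are invented, and TEAL's own embedding scheme may be organized differently.

```python
from typing import Dict, List, Tuple

# Hypothetical per-modality tokenizer vocabulary sizes (insertion order matters).
VOCAB_SIZES: Dict[str, int] = {"text": 32000, "image": 8192, "audio": 1024}


def build_offsets(sizes: Dict[str, int]) -> Dict[str, int]:
    """Assign each modality a contiguous, non-overlapping block of joint ids."""
    offsets, start = {}, 0
    for modality, size in sizes.items():
        offsets[modality] = start
        start += size
    return offsets


OFFSETS = build_offsets(VOCAB_SIZES)
JOINT_VOCAB = sum(VOCAB_SIZES.values())  # rows of the joint embedding matrix


def to_joint_ids(segments: List[Tuple[str, List[int]]]) -> List[int]:
    """Map (modality, local token ids) segments to one joint id sequence."""
    joint = []
    for modality, ids in segments:
        for t in ids:
            if not 0 <= t < VOCAB_SIZES[modality]:
                raise ValueError(f"{modality} token {t} out of range")
            joint.append(OFFSETS[modality] + t)
    return joint


def from_joint_id(joint_id: int) -> Tuple[str, int]:
    """Recover (modality, local id) so the matching de-tokenizer can be chosen."""
    for modality, size in VOCAB_SIZES.items():
        if OFFSETS[modality] <= joint_id < OFFSETS[modality] + size:
            return modality, joint_id - OFFSETS[modality]
    raise ValueError(f"joint id {joint_id} out of range")


print(to_joint_ids([("text", [5, 7]), ("image", [0, 3])]))  # [5, 7, 32000, 32003]
print(from_joint_id(32003))                                  # ('image', 3)
```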

Preprint (arXiv, 2023)

Sum-Rate Maximization in Active RIS-Assisted Multi-Antenna WPCN

In this paper, we propose an active reconfigurable intelligent surface (RIS) enabled hybrid relaying scheme for a multi-antenna wireless powered communication network (WPCN), where the active RIS is employed to assist both wireless energy transfer (WET) from the power station (PS) to energy-constrained users and wireless information transmission (WIT) from users to the receiving station (RS). For further performance enhancement, we propose to employ both transmit beamforming at the PS and receive beamforming at the RS. We formulate a sum-rate maximization problem by jointly optimizing the RIS phase shifts and amplitude reflection coefficients for both the WET and the WIT, the transmit and receive beamforming vectors, and the network resource allocation. To solve this non-convex problem, we propose an efficient alternating optimization algorithm based on the linear minimum mean squared error criterion, semi-definite relaxation (SDR), and successive convex approximation techniques. In particular, we prove that the SDR is tight. Simulation results demonstrate that our proposed scheme with 10 reflecting elements (REs) and 4 antennas can achieve 17.78% and 415.48% performance gains compared to the single-antenna scheme with 10 REs and the passive RIS scheme with 100 REs, respectively.
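
For intuition about the objective being maximized, the baseline harvest-then-transmit WPCN sum rate can be computed directly. During a fraction $\tau_0$ of the frame, user $k$ harvests $E_k = \eta P h_k \tau_0$; it then transmits in its own slot $\tau_k$ at power $E_k/\tau_k$, giving $R = \sum_k \tau_k \log_2\!\left(1 + E_k g_k / (\tau_k \sigma^2)\right)$. This is a minimal sketch of that textbook single-antenna model only; it deliberately omits the active RIS, the beamforming vectors, and the RIS amplification noise that the paper optimizes, and all numbers are hypothetical.

```python
import math
from typing import Sequence


def wpcn_sum_rate(tau0: float, taus: Sequence[float], h: Sequence[float],
                  g: Sequence[float], p_station: float, eta: float,
                  noise_power: float) -> float:
    """Sum rate (bit/s/Hz) of a TDMA harvest-then-transmit WPCN.

    h[k]: downlink power gain from the power station to user k.
    g[k]: uplink power gain from user k to the receiving station.
    """
    if tau0 < 0 or any(t <= 0 for t in taus) or tau0 + sum(taus) > 1 + 1e-12:
        raise ValueError("time fractions must be positive and sum to at most 1")
    total = 0.0
    for tau_k, h_k, g_k in zip(taus, h, g):
        energy = eta * p_station * h_k * tau0   # energy harvested during WET
        power = energy / tau_k                  # spend it all in the user's WIT slot
        total += tau_k * math.log2(1 + power * g_k / noise_power)
    return total


print(wpcn_sum_rate(0.5, [0.5], [1.0], [1.0], p_station=2.0, eta=0.5, noise_power=1.0))  # 0.5
```

Even in this simplified model the time split $\tau_0$ trades harvested energy against transmission time, which is one reason the paper optimizes resource allocation jointly with the RIS and beamforming variables.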

Preprint (arXiv, 2022)

Generating Authentic Adversarial Examples beyond Meaning-preserving with Doubly Round-trip Translation

Generating adversarial examples for Neural Machine Translation (NMT) with single Round-Trip Translation (RTT) has achieved promising results by relaxing the meaning-preserving restriction. However, a potential pitfall of this approach is that we cannot decide whether the generated examples are adversarial to the target NMT model or to the auxiliary backward one, as the reconstruction error through the RTT can be related to either. To remedy this problem, we propose a new criterion for NMT adversarial examples based on Doubly Round-Trip Translation (DRTT). Specifically, apart from the source-target-source RTT, we also consider the target-source-target one, which is used to pick out the authentic adversarial examples for the target NMT model. Additionally, to enhance the robustness of the NMT model, we introduce masked language models to construct bilingual adversarial pairs based on DRTT, which are used to train the NMT model directly. Extensive experiments on both clean and noisy test sets (including artificial and natural noise) show that our approach substantially improves the robustness of NMT models.

Preprint (arXiv, 2022)

Improving Stack Overflow question title generation with copying enhanced CodeBERT model and bi-modal information

Context: Stack Overflow is very helpful for software developers seeking answers to programming problems. Previous studies have shown that a growing number of questions are of low quality and thus attract less attention from potential answerers. Gao et al. proposed an LSTM-based model (i.e., BiLSTM-CC) to automatically generate question titles from code snippets to improve question quality. However, using only the code snippets in the question body cannot provide sufficient information for title generation, and LSTMs cannot capture the long-range dependencies between tokens. Objective: This paper proposes CCBERT, a novel deep-learning-based model that improves question title generation by making full use of the bi-modal information of the entire question body. Method: CCBERT follows the encoder-decoder paradigm and uses CodeBERT to encode the question body into hidden representations, a stacked Transformer decoder to generate predicted tokens, and an additional copy attention layer to refine the output distribution. Both the encoder and decoder perform multi-head self-attention to better capture long-range dependencies. This paper builds a dataset containing around 200,000 high-quality questions filtered from the data officially published by Stack Overflow to verify the effectiveness of the CCBERT model. Results: CCBERT outperforms all the baseline models on the dataset. Experiments on both code-only and low-resource datasets show the superiority of CCBERT, with less performance degradation. Human evaluation also shows the excellent performance of CCBERT on both readability and correlation criteria.
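
A copy attention layer can be illustrated with the standard pointer-generator-style mixture: the final probability of a word is $p_{gen}\,P_{vocab}(w)$ plus $(1-p_{gen})$ times the attention mass on source positions holding $w$. This lets the decoder emit identifiers from the question body even when they are out of vocabulary. This is the generic copy mechanism, offered as an assumption about how such a layer works rather than CCBERT's exact formulation; the tokens and probabilities are made up.

```python
from typing import Dict, List


def copy_mixture(p_vocab: Dict[str, float], attention: List[float],
                 source_tokens: List[str], p_gen: float) -> Dict[str, float]:
    """Mix generation and copy distributions: p_gen * P_vocab + (1 - p_gen) * copy."""
    mixed = {w: p_gen * p for w, p in p_vocab.items()}
    for weight, token in zip(attention, source_tokens):
        # Attention on a source position adds copy mass to the token found there.
        mixed[token] = mixed.get(token, 0.0) + (1 - p_gen) * weight
    return mixed


dist = copy_mixture({"how": 0.5, "to": 0.5}, attention=[0.75, 0.25],
                    source_tokens=["numpy", "to"], p_gen=0.8)
print({w: round(p, 4) for w, p in dist.items()})
# {'how': 0.4, 'to': 0.45, 'numpy': 0.15}
```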

Preprint (arXiv, 2022)

In-depth Analysis of Durations of Discretionary Lane Changes on Freeway Under Varying Traffic Conditions

This paper investigates the characteristics of the durations of discretionary lane changes (LCs) on freeways based on an enriched dataset. A comprehensive analysis of LC durations was conducted by vehicle type, LC direction, and navigation speed. Heavy vehicles were found to take longer to complete an LC maneuver. LC direction significantly influences the durations of passenger cars but has no significant influence on heavy vehicles. Navigation speed was found to have an important influence on LC durations; however, its impact differs by vehicle type and LC direction. Further analysis of LC durations at different stages showed that drivers of passenger cars might use different strategies when changing lanes in different directions, whereas drivers of heavy vehicles used less time to occupy the target lane in both directions. The results of this study can help explain the mechanism of the LC process and the influence of LCs on traffic flow.

Preprint (arXiv, 2022)

Laneformer: Object-aware Row-Column Transformers for Lane Detection

We present Laneformer, a conceptually simple yet powerful transformer-based architecture tailored for lane detection, a long-standing research topic in visual perception for autonomous driving. The dominant paradigms rely on purely CNN-based architectures, which often fail to incorporate relations among long-range lane points and the global context induced by surrounding objects (e.g., pedestrians, vehicles). Inspired by recent advances of the transformer encoder-decoder architecture in various vision tasks, we design a new end-to-end Laneformer architecture that adapts conventional transformers to better capture the shape and semantic characteristics of lanes, with minimal overhead in latency. First, coupled with deformable pixel-wise self-attention in the encoder, Laneformer introduces two new row and column self-attention operations to efficiently mine point context along the lane shapes. Second, motivated by the observation that objects in the scene affect lane-segment predictions, Laneformer further includes the detected object instances as extra inputs to the multi-head attention blocks in the encoder and decoder, facilitating lane point detection by sensing semantic context. Specifically, the bounding box locations of objects are added to the Key module to provide interaction with each pixel and query, while the ROI-aligned features are inserted into the Value module. Extensive experiments demonstrate that Laneformer achieves state-of-the-art performance on the CULane benchmark, with an F1 score of 77.1%. We hope our simple and effective Laneformer will serve as a strong baseline for future research on self-attention models for lane detection.

preprint · 2022 · arXiv

ONCE-3DLanes: Building Monocular 3D Lane Detection

We present ONCE-3DLanes, a real-world autonomous driving dataset with lane layout annotations in 3D space. Conventional 2D lane detection from a monocular image leads to poor performance in downstream planning and control tasks in autonomous driving when the road is uneven. Predicting the 3D lane layout is thus necessary for effective and safe driving. However, existing 3D lane detection datasets are either unpublished or synthesized in a simulated environment, severely hampering the development of this field. In this paper, we take steps toward addressing these issues. By exploiting the explicit relationship between point clouds and image pixels, we design a dataset annotation pipeline that automatically generates high-quality 3D lane locations from 2D lane annotations in 211K road scenes. In addition, we present an extrinsic-free, anchor-free method, called SALAD, that regresses the 3D coordinates of lanes in the image view without converting the feature map into the bird's-eye view (BEV). To facilitate future research on 3D lane detection, we benchmark the dataset, provide a novel evaluation metric, and perform extensive experiments on both existing approaches and our proposed method. The aim of our work is to revive interest in 3D lane detection in real-world scenarios. We believe our work can lead to both expected and unexpected innovations in academia and industry.

preprint · 2022 · arXiv

Optimal Fractional Fourier Filtering in Time-vertex Graphs signal processing

Graph signal processing (GSP) is an effective tool for dealing with data residing in irregular domains. In GSP, the optimal graph filter is one of the essential techniques, owing to its ability to recover the original signal from a distorted and noisy version. However, most current research focuses on static graph signals and the ordinary space/time or frequency domains. Time-varying graph signals are well suited to capturing the features of real-world data, and fractional domains can provide a more suitable space in which to separate signal from noise. In this paper, the optimal time-vertex graph filter and its Wiener-Hopf equation are developed using the product graph framework. Furthermore, the optimal time-vertex graph filter in fractional domains is developed using the graph fractional Laplacian operator and the graph fractional Fourier transform. Numerical simulations on real-world datasets demonstrate the superiority of the optimal time-vertex graph filter in fractional domains over the optimal time-vertex graph filter in ordinary domains and the optimal static graph filter in fractional domains.

preprint · 2022 · arXiv

Rethink about the Word-level Quality Estimation for Machine Translation from Human Judgement

Word-level Quality Estimation (QE) of Machine Translation (MT) aims to identify potential translation errors in the translated sentence without a reference. Conventional work on word-level QE is typically designed to predict translation quality in terms of post-editing effort, where the word labels ("OK" and "BAD") are generated automatically by comparing words between MT sentences and their post-edited sentences with a Translation Error Rate (TER) toolkit. While post-editing effort can measure translation quality to some extent, we find that it usually conflicts with human judgement on whether a word is well or poorly translated. To overcome this limitation, we first create a gold-standard benchmark dataset, namely HJQE (Human Judgement on Quality Estimation), in which expert translators directly annotate poorly translated words based on their own judgement. Additionally, to further make use of the parallel corpus, we propose self-supervised pre-training with two tag-correcting strategies, namely a tag refinement strategy and a tree-based annotation strategy, to bring the TER-based artificial QE corpus closer to HJQE. We conduct extensive experiments on the publicly available WMT En-De and En-Zh corpora. The results not only show that our proposed dataset is more consistent with human judgement but also confirm the effectiveness of the proposed tag-correcting strategies. The data can be found at https://github.com/ZhenYangIACAS/HJQE.
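As a rough illustration of how automatic OK/BAD word labels can be derived from a post-edit, the Python sketch below aligns an MT output with its post-edited version and tags every MT token that survives the alignment as OK. It is a simplified stand-in, not the TER toolkit: it uses the matching blocks of the standard-library difflib, which ignore the block shifts that TER allows, and the function name ok_bad_tags is hypothetical.

```python
import difflib


def ok_bad_tags(mt_tokens, pe_tokens):
    """Tag each MT token "OK" if it is aligned to the post-edit, else "BAD".

    Simplified stand-in for TER-based labeling: difflib finds matching
    blocks, but unlike TER it does not model block shifts.
    """
    tags = ["BAD"] * len(mt_tokens)
    matcher = difflib.SequenceMatcher(a=mt_tokens, b=pe_tokens, autojunk=False)
    for block in matcher.get_matching_blocks():
        # Each block states mt_tokens[a:a+size] == pe_tokens[b:b+size].
        for i in range(block.a, block.a + block.size):
            tags[i] = "OK"
    return tags


print(ok_bad_tags("he go home".split(), "he went home".split()))  # ['OK', 'BAD', 'OK']
```

Labels of this kind record only what an editor changed, not whether a translator would judge the word to be wrong.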

preprint · 2022 · arXiv

Universal Graph Filter Design based on Butterworth, Chebyshev and Elliptic Functions

Graph filters are crucial tools for processing the spectrum of graph signals. In this paper, we propose to design universal IIR graph filters with low computational complexity using three kinds of functions: Butterworth, Chebyshev, and elliptic functions. Specifically, inspired by the classical analog filter design method, we first derive the zeros and poles of the graph frequency responses. With these zeros and poles, we construct conjugate graph filters to design the Butterworth, Chebyshev, and elliptic high-pass graph filters. On this basis, we further construct desired low-pass, band-pass, and band-stop graph filters by mapping the parameters of the desired graph filter to those of an equivalent high-pass graph filter. Furthermore, we propose a way to set the graph filter order given the maximum passband attenuation and the minimum stopband attenuation. Our numerical results show that the proposed design methods realize the desired frequency response more accurately than the autoregressive moving average (ARMA) graph filter design method, the linear least-squares fitting (LLS) based graph filter design method, and the Chebyshev FIR polynomial graph filter design method.
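The order-selection step can be sketched with the classical analog Butterworth formula, applied here to graph frequencies λ (Laplacian eigenvalues). This Python sketch assumes a high-pass magnitude response |H(λ)|² = 1 / (1 + (λ_c/λ)^(2N)), a passband edge λ_p above a stopband edge λ_s, and attenuation specifications in dB. The paper's exact parametrization through zeros and poles may differ, and the function names are hypothetical.

```python
import math


def butterworth_highpass_order(lam_p, lam_s, a_p_db, a_s_db):
    """Smallest order N meeting high-pass specs for |H|^2 = 1 / (1 + (lam_c/lam)^(2N)).

    lam_p: passband edge (attenuation must be <= a_p_db for lam >= lam_p)
    lam_s: stopband edge (attenuation must be >= a_s_db for lam <= lam_s)
    Returns (N, lam_c), with lam_c chosen so the passband spec is met exactly.
    """
    if not 0 < lam_s < lam_p:
        raise ValueError("a high-pass design needs 0 < lam_s < lam_p")
    if not 0 < a_p_db < a_s_db:
        raise ValueError("need 0 < a_p_db < a_s_db")
    eps_p = 10 ** (a_p_db / 10) - 1
    eps_s = 10 ** (a_s_db / 10) - 1
    # Dividing the two edge constraints eliminates lam_c:
    # (lam_p / lam_s)^(2N) >= eps_s / eps_p
    n = math.ceil(math.log(eps_s / eps_p) / (2 * math.log(lam_p / lam_s)))
    lam_c = lam_p * eps_p ** (1 / (2 * n))
    return n, lam_c


def attenuation_db(lam, lam_c, n):
    """Attenuation in dB of the Butterworth-type high-pass response at graph frequency lam > 0."""
    return 10 * math.log10(1 + (lam_c / lam) ** (2 * n))


n, lam_c = butterworth_highpass_order(lam_p=2.0, lam_s=1.0, a_p_db=1.0, a_s_db=20.0)
print(n, attenuation_db(2.0, lam_c, n), attenuation_db(1.0, lam_c, n))
```

Because the attenuation is monotonic in λ, meeting the specifications at the two edges is enough to meet them across the whole passband and stopband.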

preprint · 2022 · arXiv

Weighted Mean and Median graph Filters with Attenuation Factor for Sensor Network

This paper proposes a weighted attenuation k-hop graph, which depicts the spatial neighbor nodes together with their hop distances from the central node. Based on this k-hop graph, we further propose a node selecting graph, which selects temporal neighbor nodes from multiple instances of the central node. With this node selecting graph, we propose a graph mean filter. In addition, we apply the proposed node selecting graph to the median filter. Finally, the experimental results show that the proposed mean filter performs better than the original median filter in denoising signals polluted by white noise, and that the median filter using the node selecting graph also performs better than the original median filter.
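A minimal sketch of the spatial part of this idea, assuming that every node within k hops of the central node is weighted by attenuation**hop (the abstract does not state the exact weighting) and leaving out the temporal node selecting graph. The Python below finds hop distances with a breadth-first search, then computes a weighted mean and a weighted median over the selected nodes; all names are hypothetical.

```python
from collections import deque


def hop_weights(adj, center, k, attenuation):
    """Weight attenuation**hop for every node within k hops of `center` (BFS)."""
    hops = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if hops[node] == k:  # do not expand beyond k hops
            continue
        for nb in adj[node]:
            if nb not in hops:
                hops[nb] = hops[node] + 1
                queue.append(nb)
    return {node: attenuation ** h for node, h in hops.items()}


def weighted_mean(values, weights):
    total = sum(weights.values())
    return sum(w * values[n] for n, w in weights.items()) / total


def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half of the total weight."""
    items = sorted((values[n], w) for n, w in weights.items())
    half = sum(w for _, w in items) / 2
    acc = 0.0
    for v, w in items:
        acc += w
        if acc >= half:
            return v
    return items[-1][0]  # guard against floating-point round-off


# Path graph 0-1-2-3; node 3 lies outside the 1-hop neighborhood of node 1.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
values = {0: 10.0, 1: 0.0, 2: 2.0, 3: 100.0}
w = hop_weights(adj, center=1, k=1, attenuation=0.5)
print(weighted_mean(values, w), weighted_median(values, w))  # 3.0 0.0
```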

preprint · 2021 · arXiv

Bi-GCN: Binary Graph Convolutional Network

Graph Neural Networks (GNNs) have achieved tremendous success in graph representation learning. Unfortunately, current GNNs usually rely on loading the entire attributed graph into the network for processing. This implicit assumption may not hold under limited memory resources, especially when the attributed graph is large. In this paper, we propose a pioneering Binary Graph Convolutional Network (Bi-GCN), which binarizes both the network parameters and the input node features. In addition, the original matrix multiplications are replaced with binary operations for acceleration. According to the theoretical analysis, Bi-GCN can reduce memory consumption by an average of ~30x for both the network parameters and the input data, and accelerate inference by an average of ~47x on the citation networks. We also design a new back-propagation method based on gradient approximation to train Bi-GCN effectively. Extensive experiments demonstrate that Bi-GCN achieves performance comparable to the full-precision baselines. Moreover, our binarization approach can easily be applied to other GNNs, as verified in the experiments.
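The speed-up from replacing matrix multiplications with binary operations rests on a standard trick: for vectors with entries in {-1, +1} packed as bits, the dot product equals n - 2 * popcount(a XOR b). The Python sketch below shows that trick, together with a per-vector scaling factor equal to the mean absolute value. That scaling choice follows XNOR-Net-style binarization and is an assumption here, not necessarily Bi-GCN's exact scheme; the function names are hypothetical.

```python
def binarize(values):
    """Pack the signs of `values` into an int and return a scaling factor.

    Bit i is set when values[i] >= 0 (read as +1) and clear otherwise (-1).
    alpha = mean(|values|) is an XNOR-Net-style scale (an assumption here).
    """
    bits = 0
    for i, v in enumerate(values):
        if v >= 0:
            bits |= 1 << i
    alpha = sum(abs(v) for v in values) / len(values)
    return bits, alpha


def binary_dot(a_bits, b_bits, n):
    """Dot product of two {-1, +1}^n vectors given their packed sign bits.

    Equal bits contribute +1 and different bits -1, so the result is
    n - 2 * popcount(a XOR b).
    """
    return n - 2 * bin(a_bits ^ b_bits).count("1")


def approx_dot(x, w):
    """Approximate x . w by alpha_x * alpha_w * (sign(x) . sign(w))."""
    x_bits, alpha_x = binarize(x)
    w_bits, alpha_w = binarize(w)
    return alpha_x * alpha_w * binary_dot(x_bits, w_bits, len(x))


print(approx_dot([1, 2, 3], [4, 5, 6]))  # 30.0 (the exact dot product is 32)
```

Storing one bit per entry instead of a 32-bit float is also where a memory reduction of roughly this order of magnitude comes from.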

preprint · 2021 · arXiv

Ferromagnetic Enhancement in LaMnO3 Films with Release and Flexure

A variety of novel phenomena and functionalities emerge from lowering the dimensionality of materials and enriching the degrees of freedom in modulation. In this work, it is found that the saturation magnetization of LaMnO3 (LMO) films is enhanced by 56% after release from a buffer layer of a new tetragonal strontium aluminate phase, and is increased by 92% when the films are bent to a curvature of 1 mm⁻¹ using a water-assisted direct-transferring method. Meanwhile, the Curie temperature of the LMO films is raised by 13 K. High-resolution spherical aberration-corrected scanning transmission electron microscopy and first-principles calculations unambiguously demonstrate that the enhanced ferromagnetism is attributable to strengthened Mn-O-Mn super-exchange interactions, which arise from the augmented characteristics of the unconventional P2₁/n structure caused by the out-of-plane lattice shrinking after strain release and by the increased flexure of the freestanding LMO films. This work paves the way toward large-scale, crack-free and wrinkle-free freestanding oxide films with greatly improved functionalities.

preprint · 2021 · arXiv

FlowMOT: 3D Multi-Object Tracking by Scene Flow Association

Most end-to-end Multi-Object Tracking (MOT) methods suffer from low accuracy and poor generalization. Although traditional filter-based methods can achieve better results, their optimal hyperparameters are difficult to determine and they often fail in varying scenarios. To alleviate these drawbacks, we propose a LiDAR-based 3D MOT framework named FlowMOT, which integrates point-wise motion information with the traditional matching algorithm, enhancing the robustness of motion prediction. We first use a scene flow estimation network to obtain implicit motion information between two adjacent frames and compute the predicted detection for each existing tracklet from the previous frame. We then use the Hungarian algorithm with an ID propagation strategy to generate optimal matching relations and complete the tracking task. Experiments on the KITTI MOT dataset show that our approach outperforms recent end-to-end methods and achieves competitive performance with the state-of-the-art filter-based method. In addition, our method works steadily in various-speed scenarios where filter-based methods may fail.
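The association step can be illustrated with a small optimal-assignment sketch in Python. It assumes that each existing tracklet's position has already been propagated into the current frame by scene flow, uses Euclidean distance as the matching cost, and discards matched pairs farther apart than a hypothetical max_dist gate. For brevity it searches all permutations instead of running the Hungarian algorithm; both return a minimum-cost assignment, but brute force is only practical for a handful of objects. The function name is hypothetical.

```python
import itertools
import math


def match_tracklets(predicted, detections, max_dist=2.0):
    """Minimum-total-distance assignment of predicted tracklet positions to detections.

    Brute force over permutations (a stand-in for the Hungarian algorithm,
    which finds the same optimum efficiently). Matched pairs farther apart
    than max_dist are dropped afterwards.
    Returns a sorted list of (tracklet_index, detection_index).
    """
    if len(predicted) > len(detections):
        # Solve the transposed problem and swap the index pairs back.
        return sorted((j, i) for i, j in match_tracklets(detections, predicted, max_dist))
    best, best_cost = (), math.inf
    for perm in itertools.permutations(range(len(detections)), len(predicted)):
        cost = sum(math.dist(predicted[i], detections[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return [(i, j) for i, j in enumerate(best)
            if math.dist(predicted[i], detections[j]) <= max_dist]


tracks = [(0.0, 0.0), (5.0, 5.0)]
dets = [(5.2, 5.0), (0.1, 0.0), (20.0, 20.0)]
print(match_tracklets(tracks, dets))  # [(0, 1), (1, 0)]
```

In a full tracker, matched detections would inherit the tracklet's ID and unmatched detections would start new tracks; that bookkeeping is omitted here.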

preprint · 2021 · arXiv

Optimal Scheduling of Integrated Demand Response-Enabled Community Integrated Energy Systems in Uncertain Environments

The community integrated energy system (CIES) is an essential energy internet carrier that has recently attracted much attention. A scheduling model based on chance-constrained programming is proposed for integrated demand response (IDR)-enabled CIES in uncertain environments to minimize system operating costs, where an IDR program is used to explore the potential interaction ability of electricity-gas-heat flexible loads and electric vehicles. Moreover, power to gas (P2G) and micro-gas turbines (MT), as links between multi-energy carriers, are adopted to strengthen the coupling of the different energy subsystems. Sequence operation theory (SOT) and linearization methods are employed to transform the original model into a solvable mixed-integer linear programming model. Simulation results on a practical CIES in North China demonstrate improved CIES operational economy through the coordination of IDR and renewable uncertainties, with P2G and MT enhancing system operational flexibility and users' comprehensive satisfaction. The CIES operation can achieve a trade-off between economy and system reliability by setting a suitable confidence level for the spinning reserve constraints. In addition, the proposed solution method outperforms the Hybrid Intelligent Algorithm in terms of both optimization results and computational efficiency.

preprint · 2020 · arXiv

An Iterative Graph Spectral Subtraction Method for Speech Enhancement

In this paper, we investigate the application of graph signal processing (GSP) theory to speech enhancement. We first propose a set of shift operators to construct graph speech signals, and then analyze their spectra in the graph Fourier domain. By leveraging the differences between the spectra of graph speech and graph noise signals, we further propose the graph spectral subtraction (GSS) method to suppress noise interference in noisy speech. Moreover, based on GSS, we propose the iterative graph spectral subtraction (IGSS) method to further improve speech enhancement performance. Our experimental results show that the proposed operators are suitable for graph speech signals, and that the proposed methods outperform the traditional basic spectral subtraction (BSS) and iterative basic spectral subtraction (IBSS) methods in terms of both signal-to-noise ratio (SNR) and mean Perceptual Evaluation of Speech Quality (PESQ).
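A minimal sketch of graph spectral subtraction in Python, under a simplifying assumption: the speech frame is modeled on a path graph (consecutive samples connected), whose Laplacian eigenvectors form the DCT-II basis, rather than on the shift operators proposed in the paper. Each graph Fourier coefficient keeps its sign while its magnitude is reduced by the magnitude of a noise estimate and floored at zero. The function names are hypothetical.

```python
import math


def path_graph_basis(n):
    """Orthonormal Laplacian eigenvectors of an n-node path graph (DCT-II basis).

    basis[k][i] is the k-th graph Fourier basis vector at node i; its graph
    frequency (Laplacian eigenvalue) is 2 - 2*cos(pi*k/n).
    """
    basis = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        basis.append([scale * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n)])
    return basis


def gft(signal, basis):
    """Graph Fourier transform: project the signal onto each basis vector."""
    return [sum(u[i] * signal[i] for i in range(len(signal))) for u in basis]


def igft(coeffs, basis):
    """Inverse graph Fourier transform (the basis is orthonormal)."""
    n = len(basis[0])
    return [sum(c * u[i] for c, u in zip(coeffs, basis)) for i in range(n)]


def graph_spectral_subtraction(noisy, noise_estimate, basis):
    """Subtract the noise magnitude spectrum in the graph Fourier domain."""
    y_hat = gft(noisy, basis)
    n_hat = gft(noise_estimate, basis)
    # Keep each coefficient's sign; shrink its magnitude and floor it at zero.
    x_hat = [math.copysign(max(abs(y) - abs(v), 0.0), y) for y, v in zip(y_hat, n_hat)]
    return igft(x_hat, basis)


basis = path_graph_basis(8)
clean = [3.0 * v for v in basis[1]]
noise = [0.5 * v for v in basis[5]]
noisy = [c + e for c, e in zip(clean, noise)]
denoised = graph_spectral_subtraction(noisy, noise, basis)
print(max(abs(a - b) for a, b in zip(denoised, clean)))  # close to 0
```

In practice the noise spectrum would be estimated, for example from speech-free frames, rather than known exactly as in this toy example.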

preprint · 2020 · arXiv

Code-switching pre-training for neural machine translation

This paper proposes a new pre-training method, called Code-Switching Pre-training (CSP for short), for Neural Machine Translation (NMT). Unlike traditional pre-training methods, which randomly mask some fragments of the input sentence, the proposed CSP randomly replaces some words in the source sentence with their translations in the target language. Specifically, we first perform lexicon induction with unsupervised word embedding mapping between the source and target languages, and then randomly replace some words in the input sentence with their translations according to the extracted translation lexicons. CSP adopts the encoder-decoder framework: its encoder takes the code-mixed sentence as input, and its decoder predicts the replaced fragment of the input sentence. In this way, CSP is able to pre-train the NMT model by explicitly making the most of the cross-lingual alignment information extracted from the source and target monolingual corpora. Additionally, CSP relieves the pretrain-finetune discrepancy caused by artificial symbols such as [mask]. To verify the effectiveness of the proposed method, we conduct extensive experiments on unsupervised and supervised NMT. Experimental results show that CSP achieves significant improvements over baselines without pre-training or with other pre-training methods.
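The input construction described in this abstract can be sketched in a few lines of Python. The toy lexicon dictionary stands in for the translation lexicon induced by unsupervised embedding mapping, and the replacement ratio and function name are hypothetical. The returned targets record the original source words at the replaced positions, which is what the decoder is trained to predict.

```python
import random


def code_switch(tokens, lexicon, ratio=0.3, rng=None):
    """Replace a random subset of source tokens with their lexicon translations.

    Only tokens present in `lexicon` can be replaced. Returns
    (mixed_tokens, targets), where targets maps each replaced position to
    the original source word.
    """
    rng = rng or random.Random()
    candidates = [i for i, tok in enumerate(tokens) if tok in lexicon]
    if not candidates:
        return list(tokens), {}
    k = min(len(candidates), max(1, round(ratio * len(tokens))))
    mixed = list(tokens)
    targets = {}
    for i in sorted(rng.sample(candidates, k)):
        targets[i] = tokens[i]
        mixed[i] = lexicon[tokens[i]]
    return mixed, targets


toy_lexicon = {"cat": "chat", "sat": "assis"}  # hypothetical English-French entries
print(code_switch(["the", "cat", "sat"], toy_lexicon, ratio=1.0, rng=random.Random(0)))
# (['the', 'chat', 'assis'], {1: 'cat', 2: 'sat'})
```

No artificial [mask] symbol appears in the encoder input, which is the property the abstract credits with reducing the pretrain-finetune discrepancy.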

preprint · 2020 · arXiv

Engine and Aftertreatment Co-Optimization of Connected HEVs via Multi-Range Vehicle Speed Planning and Prediction

Connected vehicles (CVs) have situational awareness that can be exploited for control and optimization of the powertrain system. While extensive studies have been carried out on improving the energy efficiency of CVs via eco-driving and planning, the implications of such technologies for the thermal responses of CVs have not been fully investigated. One of the key challenges in leveraging connectivity for optimization-based thermal management of CVs is the relatively slow thermal dynamics, which necessitate a long prediction horizon to achieve the best performance. Long-term prediction of CV speed, unlike V2V/V2I-based short-range prediction, is difficult and error-prone. The multiple timescales inherent to power and thermal systems call for a variable-timescale optimization framework with access to short- and long-term vehicle speed previews. To this end, this paper presents a model predictive controller (MPC) with a multi-range speed preview for integrated power and thermal management (iPTM) of connected hybrid electric vehicles (HEVs). The MPC is formulated to manage the power split between the engine and the battery while enforcing power and thermal (engine coolant and catalytic converter temperature) constraints. The MPC exploits prediction and optimization over a shorter receding horizon and a longer shrinking horizon. Over the longer shrinking horizon, the vehicle speed estimate is based on data collected from connected vehicles traveling on the same route as the ego vehicle. Simulation results of applying the MPC over real-world urban driving cycles in Ann Arbor, MI are presented to demonstrate the effectiveness and fuel-saving potential of the proposed iPTM strategy under the uncertainty associated with long-term predictions of the CV's speed.

preprint · 2020 · arXiv

Entropy Minimization vs. Diversity Maximization for Domain Adaptation

Entropy minimization has been widely used in unsupervised domain adaptation (UDA). However, existing works reveal that entropy minimization alone may result in collapsed, trivial solutions. In this paper, we propose to avoid trivial solutions by additionally introducing diversity maximization. To achieve the minimum possible target risk for UDA, we show that diversity maximization should be carefully balanced with entropy minimization, the degree of which can be finely controlled with deep embedded validation in an unsupervised manner. The proposed minimal-entropy diversity maximization (MEDM) can be directly implemented by stochastic gradient descent without the use of adversarial learning. Empirical evidence demonstrates that MEDM outperforms state-of-the-art methods on four popular domain adaptation datasets.
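One common way to write this balance, sketched in Python under stated assumptions: the entropy term is the average per-sample prediction entropy on a target batch, the diversity term is the entropy of the batch-mean prediction, and a fixed weight lam trades them off. In MEDM the balance is selected with deep embedded validation; here lam is simply a hypothetical hyperparameter, and the function names are illustrative.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def medm_style_loss(probs, lam=1.0):
    """Average per-sample entropy minus lam times the entropy of the mean prediction.

    probs: list of per-sample softmax outputs on target-domain data.
    Minimizing the first term makes predictions confident; maximizing the
    second term (hence the minus sign) keeps them spread across classes.
    """
    n = len(probs)
    mean_entropy = sum(entropy(p) for p in probs) / n
    num_classes = len(probs[0])
    mean_pred = [sum(p[c] for p in probs) / n for c in range(num_classes)]
    return mean_entropy - lam * entropy(mean_pred)


collapsed = [[1.0, 0.0], [1.0, 0.0]]
diverse = [[1.0, 0.0], [0.0, 1.0]]
print(medm_style_loss(collapsed), medm_style_loss(diverse))  # 0.0 and -ln 2
```

Both batches are equally confident, so entropy minimization alone cannot tell them apart; the diversity term gives the collapsed batch the higher loss.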

preprint · 2020 · arXiv

Metric-Learning-Assisted Domain Adaptation

Domain alignment (DA) has been widely used in unsupervised domain adaptation. Many existing DA methods assume that a low source risk, together with aligned source and target distributions, implies a low target risk. In this paper, we show that this does not always hold. We thus propose a novel metric-learning-assisted domain adaptation (MLA-DA) method, which employs a novel triplet loss to help achieve better feature alignment. We explore the relationship between the second largest probability of a target sample's prediction and its distance to the decision boundary. Based on this relationship, we propose a novel mechanism that adaptively adjusts the margin in the triplet loss according to target predictions. Experimental results show that the proposed triplet loss achieves clearly better results. We also demonstrate the performance improvement of MLA-DA on all four standard benchmarks compared with state-of-the-art unsupervised domain adaptation methods. Furthermore, MLA-DA shows stable performance in robustness experiments.

preprint · 2020 · arXiv

Understanding Negative Sampling in Graph Representation Learning

Graph representation learning has been extensively studied in recent years. Despite its potential for generating continuous embeddings for various networks, inferring high-quality representations for large sets of nodes both effectively and efficiently remains challenging. Sampling is critical to achieving these performance goals. Prior work usually focuses on sampling positive node pairs, while the strategy for negative sampling remains insufficiently explored. To bridge this gap, we systematically analyze the role of negative sampling from the perspectives of both objective and risk, theoretically demonstrating that negative sampling is as important as positive sampling in determining the optimization objective and the resulting variance. To the best of our knowledge, we are the first to derive the theory and quantify that the negative sampling distribution should be positively but sub-linearly correlated with the positive sampling distribution. Guided by this theory, we propose MCNS, which approximates the positive distribution with self-contrast approximation and accelerates negative sampling with Metropolis-Hastings. We evaluate our method on 5 datasets that cover extensive downstream graph learning tasks, including link prediction, node classification and personalized recommendation, in a total of 19 experimental settings. These relatively comprehensive experimental results demonstrate its robustness and superiority.
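The sub-linear relationship can be made concrete with a Metropolis-Hastings sampler in Python that targets q(v) ∝ p(v)^α with 0 < α < 1, where p is a positive sampling distribution. This is a simplified sketch: it uses a uniform proposal over all nodes and a default exponent chosen only for illustration, whereas MCNS relies on self-contrast approximation and its own proposal scheme. The function name and parameters are hypothetical.

```python
import random


def mh_negative_samples(pos_prob, num_samples, alpha=0.75, burn_in=100, rng=None):
    """Draw negatives from q(v) proportional to pos_prob[v] ** alpha via Metropolis-Hastings.

    pos_prob maps node -> positive sampling weight (need not be normalized).
    0 < alpha < 1 makes q positively but sub-linearly correlated with pos_prob.
    The proposal is uniform over nodes, hence symmetric, so a candidate is
    accepted with probability min(1, q(candidate) / q(current)).
    """
    if any(p <= 0 for p in pos_prob.values()):
        raise ValueError("positive sampling weights must be > 0")
    rng = rng or random.Random()
    nodes = list(pos_prob)
    current = rng.choice(nodes)
    samples = []
    steps = 0
    while len(samples) < num_samples:
        candidate = rng.choice(nodes)
        ratio = (pos_prob[candidate] / pos_prob[current]) ** alpha
        if rng.random() < ratio:
            current = candidate
        steps += 1
        if steps > burn_in:  # discard the first burn_in states of the chain
            samples.append(current)
    return samples


draws = mh_negative_samples({"a": 1.0, "b": 16.0}, 20000, alpha=0.5, rng=random.Random(0))
print(draws.count("b") / len(draws))
```

With p = {a: 1, b: 16} and α = 0.5, the target is proportional to 1 : 4, so about 80% of the returned negatives should be b, compared with about 94% if negatives were drawn in direct proportion to p.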

preprint · 2019 · arXiv

All-optical frequency resolved optical gating for isolated attosecond pulse reconstruction

We demonstrate an all-optical approach for the precise characterization of attosecond extreme ultraviolet pulses. An isolated attosecond pulse is produced from high-order harmonics using an intense driving pulse with a proper gating technique. When a weak field is synchronized with the driver, it perturbs the harmonic generation process by altering the accumulated phase of the electron trajectories. The perturbed harmonic spectrum can be formulated as a convolution of the unperturbed dipole and a phase gate, which implies that isolated attosecond pulses can be completely reconstructed with the conventional frequency-resolved optical gating method. This in situ measurement avoids the central momentum approximation assumed in the widely used attosecond streaking measurement, providing a simple and reliable metrology for isolated attosecond pulses.

preprint · 2019 · arXiv

Realization of Superadiabatic Two-qubit Gates Using Parametric Modulation in Superconducting Circuits

Fast, robust two-qubit gate operations with low susceptibility to crosstalk are key to scalable quantum information processing. Parametrically driven gates are inherently insensitive to crosstalk, while superadiabatic control can speed up a gate without losing accuracy. We propose and experimentally implement superadiabatic two-qubit gates using parametric modulation on superconducting quantum circuits. Our results demonstrate the preservation of adiabaticity at a gate speed close to the quantum limit, in addition to robustness against control instability. We demonstrate a CZ gate with an error rate of 5.8%, limited largely by qubit decoherence, which promises future improvement and scalable implementation.