Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
41 works
0 followers
19 topics
4 close collaborators

Published work

41 published items

preprint 2026 arXiv

70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float (DFloat11)

Large-scale AI models, such as Large Language Models (LLMs) and Diffusion Models (DMs), have grown rapidly in size, creating significant challenges for efficient deployment on resource-constrained hardware. In this paper, we introduce Dynamic-Length Float (DFloat11), a lossless compression framework that reduces LLM and DM size by 30% while preserving outputs that are bit-for-bit identical to the original model. DFloat11 is motivated by the low entropy in the BFloat16 weight representation of LLMs, which reveals significant inefficiency in the existing storage format. By applying entropy coding, DFloat11 assigns dynamic-length encodings to weights based on frequency, achieving near information-optimal compression without any loss of precision. To facilitate efficient inference with dynamic-length encodings, we develop a custom GPU kernel for fast online decompression. Our design incorporates the following: (i) compact, hierarchical lookup tables (LUTs) that fit within GPU SRAM for efficient decoding, (ii) a two-phase GPU kernel for coordinating thread read/write positions using lightweight auxiliary variables, and (iii) transformer-block-level decompression to minimize latency. Experiments on Llama 3.3, Qwen 3, Mistral 3, FLUX.1, and others validate our hypothesis that DFloat11 achieves around 30% model size reduction while preserving bit-for-bit identical outputs. Compared to a potential alternative of offloading parts of an uncompressed model to the CPU to meet memory constraints, DFloat11 achieves 2.3--46.2x higher throughput in token generation. With a fixed GPU memory budget, DFloat11 enables 5.7--14.9x longer generation lengths than uncompressed models. Notably, our method enables lossless inference of Llama 3.1 405B, an 810GB model, on a single node equipped with 8x80GB GPUs.
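The entropy-coding idea in the abstract above can be illustrated with a small sketch. It is not the DFloat11 format or its GPU kernel: it assumes, for illustration only, that the skew sits in the 8-bit exponent field, uses synthetic Gaussian values in place of real model weights, and builds a plain Huffman code with the standard library. The helper names (`exponent_field`, `huffman_code`, `encode`, `decode`) are hypothetical. The last lines check the memory arithmetic quoted in the abstract, counting weights only.

```python
import heapq
import math
import random
import struct
from collections import Counter


def exponent_field(x: float) -> int:
    """8-bit exponent of the float32 encoding of x (BFloat16 keeps this field when truncating)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return (bits >> 23) & 0xFF


def huffman_code(freqs):
    """Build a prefix code {symbol: bitstring} from symbol counts."""
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    # Heap entries are (count, tie_breaker, partial_code); the tie breaker is unique.
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)
        c2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in t1.items()}
        merged.update({s: "1" + b for s, b in t2.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(symbols, code):
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    """Greedy decoding works because a Huffman code is prefix-free."""
    inverse = {b: s for s, b in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]  # synthetic stand-in, not real weights
symbols = [exponent_field(w) for w in weights]
counts = Counter(symbols)
code = huffman_code(counts)
bits = encode(symbols, code)
decoded = decode(bits, code)

n = len(symbols)
entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
avg_bits = len(bits) / n
print(f"entropy {entropy:.2f} bits, Huffman {avg_bits:.2f} bits, fixed width 8 bits")

# Memory arithmetic from the abstract (weights only; ignores activations and KV cache).
compressed_gb = 810 * 0.70  # roughly 30% smaller than the 810GB model
node_gb = 8 * 80            # one node with 8x80GB GPUs
print(compressed_gb < node_gb)
```

The round trip `decoded == symbols` is what "lossless" means here: frequent exponent values get short codes and rare ones get long codes, yet every symbol is recovered exactly.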

preprint 2026 arXiv

Automating API Documentation from Crowdsourced Knowledge

API documentation is crucial for developers to learn and use APIs. However, it is known that many official API documents are obsolete and incomplete. To address this challenge, we propose a new approach called AutoDoc that generates API documents with API knowledge extracted from online discussions on Stack Overflow (SO). AutoDoc leverages a fine-tuned dense retrieval model to identify seven types of API knowledge from SO posts. Then, it uses GPT-4o to summarize the API knowledge in these posts into concise text. Meanwhile, we designed two specific components to handle LLM hallucination and redundancy in generated content. We evaluated AutoDoc against five comparison baselines on 48 APIs of different popularity levels. Our results indicate that the API documents generated by AutoDoc are up to 77.7% more accurate, 9.5% less duplicated, and contain 34.4% knowledge not covered by the official documents. We also measured the sensitivity of AutoDoc to the choice of different LLMs. We found that while larger LLMs produce higher-quality API documents, AutoDoc enables smaller open-source models (e.g., Mistral-7B-v0.3) to achieve comparable results. Finally, we conducted a user study to evaluate the usefulness of the API documents generated by AutoDoc. All participants found API documents generated by AutoDoc to be more comprehensive, concise, and helpful than the comparison baselines. This highlights the feasibility of utilizing LLMs for API documentation with careful design to counter LLM hallucination and information redundancy.

preprint 2026 arXiv

Caracal: Causal Architecture via Spectral Mixing

The scalability of Large Language Models to long sequences is hindered by the quadratic cost of attention and the limitations of positional encodings. To address these, we introduce Caracal, a novel architecture that replaces attention with a parameter-efficient, O(L log(L)) Multi-Head Fourier (MHF) module. Our contributions are threefold: (1) We leverage the Fast Fourier Transform (FFT) for sequence mixing, inherently addressing both bottlenecks mentioned above. (2) We apply a frequency-domain causal masking technique that enforces autoregressive capabilities via asymmetric padding and truncation, overcoming a critical barrier for Fourier-based generative models. (3) Unlike efficient models relying on hardware-specific implementations (e.g., Mamba), we use standard library operators. This ensures robust portability, eliminating common deployment barriers. Evaluations demonstrate that Caracal performs competitively with Transformer and SSM baselines, offering a scalable and simple pathway for efficient long-sequence modeling. Code is available in the Appendix.
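The role of padding and truncation in keeping FFT-based mixing causal can be shown with a minimal sketch. This is the textbook FFT causal convolution, not Caracal's MHF module: it assumes a single channel, a mixing kernel of the same length as the input, zero-padding on the right, and truncation back to the original length. The function names (`fft`, `causal_mix`, `causal_mix_direct`) and the sample values are hypothetical.

```python
import cmath


def fft(a, inverse=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. The inverse is left unnormalized."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def causal_mix(x, kernel):
    """y[t] = sum over s <= t of kernel[t - s] * x[s], computed in O(L log L)."""
    L = len(x)
    n = 1
    while n < 2 * L:  # pad to at least 2L so circular wrap-around cannot leak future tokens
        n *= 2
    X = fft(list(x) + [0.0] * (n - L))
    K = fft(list(kernel) + [0.0] * (n - len(kernel)))
    y = fft([a * b for a, b in zip(X, K)], inverse=True)
    return [(v / n).real for v in y[:L]]  # normalize, then truncate to the first L outputs


def causal_mix_direct(x, kernel):
    """O(L^2) reference used only to check the FFT version."""
    return [sum(kernel[t - s] * x[s] for s in range(t + 1)) for t in range(len(x))]


x = [1.0, 2.0, -1.0, 0.5, 3.0]
kernel = [0.5, 0.25, -0.1, 0.0, 0.2]
y = causal_mix(x, kernel)

# Changing the token at position 3 must leave outputs 0..2 untouched.
x_changed = x[:3] + [100.0] + x[4:]
y_changed = causal_mix(x_changed, kernel)
```

Without the padding, the product in the frequency domain corresponds to a circular convolution, so late positions would wrap around and influence early outputs.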

preprint 2026 arXiv

Deployability-Centric Infrastructure-as-Code Generation: Fail, Learn, Refine, and Succeed through LLM-Empowered DevOps Simulation

Infrastructure-as-Code (IaC) generation holds significant promise for automating cloud infrastructure provisioning. Recent advances in Large Language Models (LLMs) present a promising opportunity to democratize IaC development by generating deployable infrastructure templates from natural language descriptions. However, current evaluation focuses on syntactic correctness while ignoring deployability, the critical measure of the utility of IaC configuration files. Six state-of-the-art LLMs performed poorly on deployability, achieving only 20.8$\sim$30.2% deployment success rate on the first attempt. In this paper, we construct DPIaC-Eval, the first deployability-centric IaC template benchmark consisting of 153 real-world scenarios across 58 unique services. Also, we propose an LLM-based deployability-centric framework, dubbed IaCGen, that uses an iterative feedback mechanism encompassing format verification, syntax checking, and live deployment stages, thereby closely mirroring real DevOps workflows. Results show that IaCGen can make 54.6$\sim$91.6% of the generated IaC templates from all evaluated models deployable in the first 10 iterations. Additionally, human-in-the-loop feedback that provides direct guidance on the deployability errors can further boost the performance to over 90% passItr@25 on all evaluated LLMs. Furthermore, we explore the trustworthiness of the generated IaC templates on user intent alignment and security compliance. The poor performance (25.2% user requirement coverage and 8.4% security compliance rate) indicates a critical need for continued research in this domain.

preprint 2026 arXiv

FML-bench: A Controlled Study of AI Research Agent Strategies from the Perspective of Search Dynamics

AI research agents accelerate ML research by automating hypothesis generation, experimentation, and empirical refinement. Existing agent strategies range from greedy hill-climbing to tree search and evolutionary optimization, yet which strategy choices drive performance remains unclear. Answering this question requires a benchmark that separates agent strategy (e.g., search topology) from execution infrastructure (e.g., code editor), so that performance differences are attributable to strategy rather than infrastructure, and that provides process-level metrics beyond final scores to analyze exploration behaviors. Existing benchmarks offer limited support. We propose FML-Bench, a benchmark of 18 fundamental ML research tasks across 10 domains that separates agent strategy from execution infrastructure and defines 12 process-level behavioral metrics. Evaluating six representative agents, we find that: (1) strategy complexity alone does not guarantee strong performance: a simple greedy hill-climber nearly matches the best-performing tree-search agent, both well above the remaining agents; (2) our analysis suggests this pattern relates to improvement opportunity structure: greedy search tends to be more effective when opportunities are dense, while tree-search and evolutionary strategies tend to be more effective when opportunities are sparse; an adaptive agent built on this insight switches to broader exploration upon detecting improvement stagnation and outperforms the other six agents, lending initial support to this observation; and (3) process-level analysis reveals that early convergence and directionally focused exploration are significantly associated with final performance, while solution diversity and compute cost are not. Our benchmark is available at: https://github.com/qrzou/FML-bench.

preprint 2026 arXiv

Geometry-Aware State Space Model: A New Paradigm for Whole-Slide Image Representation

Accurate analysis of histopathological images is critical for disease diagnosis and treatment planning. Whole-slide images (WSIs), which digitize tissue specimens at gigapixel resolution, are fundamental to this process but require aggregating thousands of patches for slide-level predictions. Multiple Instance Learning (MIL) tackles this challenge with a two-stage paradigm, decoupling tile-level embedding and slide-level prediction. However, most existing methods implicitly embed patch representations in homogeneous Euclidean spaces, overlooking the hierarchical organization and regional heterogeneity of pathological tissues. This limits current models' ability to capture global tissue architecture and fine-grained cellular morphology. To address this limitation, we introduce a hybrid hyperbolic-Euclidean representation that embeds WSI features in dual geometric spaces, enabling complementary modeling of hierarchical tissue structures and local morphological details. Building on this formulation, we develop BatMIL, a WSI classification framework that leverages both geometric spaces. To model long-range dependencies among thousands of patches, we employ a structured state space sequence model (S4) backbone that encodes patch sequences with linear computational complexity. Furthermore, to account for regional heterogeneity, we introduce a chunk-level mixture-of-experts (MoE) module that groups patches into regions and dynamically routes them to specialized subnetworks, improving representational capacity while reducing redundant computation. Extensive experiments on seven WSI datasets spanning six cancer types demonstrate that BatMIL consistently outperforms state-of-the-art MIL approaches in slide-level classification tasks. These results indicate that geometry-aware representation learning offers a promising direction for next-generation computational pathology.

preprint 2026 arXiv

LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid

Large language models (LLMs) have shown immense potential across various domains, but their high memory requirements and inference costs remain critical challenges for deployment. Post-training quantization (PTQ) has emerged as a promising technique to reduce memory requirements and decoding latency. However, recent accurate quantization methods often depend on specialized computations or custom data formats to achieve better model quality, which limits their compatibility with popular frameworks, as they require dedicated inference kernels tailored to specific hardware and software platforms, hindering wider adoption. Furthermore, many competitive methods have high resource requirements and computational overhead for quantizing models, making it challenging to scale them to hundreds of billions of parameters. In response to these challenges, we propose LeanQuant (Loss-Error-Aware Network Quantization), a novel quantization method that is accurate, versatile, and scalable. In the existing popular iterative loss-error-based quantization framework, we identify a critical limitation in prior methods: the min-max affine quantization grid fails to preserve model quality due to outliers in inverse Hessian diagonals. To overcome this fundamental issue, we propose learning loss-error-aware grids, instead of using non-adaptive min-max affine grids. Our approach not only produces quantized models that are more accurate but also generalizes to a wider range of quantization types, including affine and non-uniform quantization, enhancing compatibility with more frameworks. Extensive experiments with recent LLMs demonstrate that LeanQuant is highly accurate, comparing favorably against competitive baselines in model quality, and scalable, achieving very accurate quantization of Llama-3.1 405B, one of the largest open-source LLMs to date, using two Quadro RTX 8000-48GB GPUs in 21 hours.
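For readers unfamiliar with the baseline the abstract above criticizes, here is a minimal sketch of a non-adaptive min-max affine grid with round-to-nearest quantization. It only shows what such a grid is; it does not model the loss-error weighting, the inverse Hessian diagonals, or LeanQuant's learned grids. The function names and the sample weights are hypothetical.

```python
def minmax_affine_grid(weights, bits):
    """Evenly spaced grid of 2**bits levels spanning [min(weights), max(weights)]."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** bits
    step = (hi - lo) / (levels - 1)
    return [lo + i * step for i in range(levels)]


def quantize(weights, grid):
    """Round each weight to its nearest grid point."""
    return [min(grid, key=lambda g: abs(g - w)) for w in weights]


weights = [-0.9, -0.2, 0.0, 0.1, 0.35, 1.0]  # made-up values for illustration
grid = minmax_affine_grid(weights, bits=2)
quantized = quantize(weights, grid)
step = grid[1] - grid[0]
max_error = max(abs(w - q) for w, q in zip(weights, quantized))
print(grid, quantized, max_error)
```

Because the grid is fixed by the two extreme values alone, every other weight is treated as equally important, which is the property a loss-error-aware grid is meant to relax.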

preprint 2026 arXiv

Sketch to Adapt: Fine-Tunable Sketches for Efficient LLM Adaptation

Adapting pre-trained large language models (LLMs) is crucial but challenging due to their enormous size. Parameter-efficient fine-tuning (PEFT) techniques typically employ additive adapters applied to frozen model weights. To further reduce memory usage, model weights are often compressed through quantization. However, existing PEFT methods often yield suboptimal model quality because they rely on restrictive assumptions, such as low-rank constraints on adapters to limit the number of trainable parameters. We find that sketching, a popular data compression technique, can serve as an efficient LLM adaptation strategy while avoiding the low-rank assumption. We introduce SketchTune, a compressive adaptation strategy that compresses LLM weights into compact fine-tunable sketches, integrating compression and adaptation into a unified framework. This integration eliminates the need for complex two-path computation in existing PEFT techniques, enabling faster and more memory-efficient training and inference. SketchTune is supported by mathematical insights into matrix classes that are better approximated using sketching rather than low-rank methods. Our extensive evaluations with Llama and Mistral models demonstrate that SketchTune outperforms leading PEFT methods across diverse tasks while using substantially smaller base models and comparable trainable parameters. As a highlight, SketchTune outperforms LoRA, DoRA, and S2FT on commonsense and math benchmarks using 2.6-3.5$\times$ smaller base models and exceeds LoftQ in accuracy by 14.48% on GSM8K with 7.3$\times$ fewer trainable parameters. Our code is available at https://github.com/LeanModels/SketchTune.

preprint 2026 arXiv

Training Computer Use Agents to Assess the Usability of Graphical User Interfaces

Usability testing with experts and potential users can assess the effectiveness, efficiency, and user satisfaction of graphical user interfaces (GUIs) but doing so remains a costly and time-intensive process. Prior work has used computer use agents (CUAs) and other generative agents that can simulate user interactions and preference, but we show that agents still struggle to provide accurate usability assessments. In this work, we present a novel machine learning method that operationalizes a computational definition of usability to train CUAs to assess GUI usability by i) prioritizing important interaction flows, ii) executing them through human-like interactions, and iii) predicting a learned numerical usability score. We train a computer use agent, uxCUA, with our algorithm on a large-scale dataset of fully interactive user interfaces (UIs) paired with usability labels and human preferences. We show that uxCUA outperforms larger models in accurate usability assessments and produces realistic critiques of both synthetic and real UIs. More broadly, our work aims to build a principled, data-driven foundation for automated usability assessment in HCI.

preprint 2026 arXiv

VPD-100K: Towards Generalizable and Fine-grained Visual Privacy Protection

Privacy protection has become a critical requirement in the era of ubiquitous visual data sharing, imposing higher demands on efficient and robust privacy detection algorithms. However, current robust detection models are severely hindered by the lack of comprehensive datasets. Existing privacy-oriented datasets often suffer from limited scale, coarse-grained annotations, and narrow domain coverage, failing to capture the intricate details of sensitive information in real-world environments. To bridge this gap, we present a large-scale, fine-grained Visual Privacy Dataset (VPD-100K), designed to facilitate generalized privacy detection. We establish a holistic taxonomy comprising four primary domains: Human Presence, On-Screen Personally Identifiable Information (PII), Physical Identifiers, and Location Indicators, containing 100,000 images annotated with 33 fine-grained classes and over 190,000 object instances. Statistical analysis reveals that our dataset features long-tailed distributions, small object scales, and high visual complexity. These characteristics make the dataset particularly valuable for demanding, unconstrained applications such as live streaming, where actors frequently face unintentional, real-time information leakage. Furthermore, we design an effective frequency-enhanced lightweight module, consisting of frequency-domain attention fusion and an adaptive spectral gating mechanism, that breaks the limitations of spatial pixel intensity to better capture the subtle details of sensitive information. Extensive experiments conducted on diverse image and streaming video benchmarks consistently demonstrate the effectiveness of our VPD-100K dataset and the well-curated frequency mechanism. The code and dataset are available at https://vpd-100k.github.io/.

preprint 2024 arXiv

AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback

Large language models (LLMs) such as ChatGPT have seen widespread adoption due to their strong instruction-following abilities. Developing these LLMs involves a complex yet poorly understood workflow requiring training with human feedback. Replicating and understanding this instruction-following requires tackling three major challenges: the high cost of data collection, the lack of trustworthy evaluation, and the absence of reference method implementations. We address these challenges with AlpacaFarm, a simulator that enables research and development for learning from feedback at a low cost. First, we design LLM prompts to simulate human feedback that are 50x cheaper than crowdworkers and display high agreement with humans. Second, we propose an automatic evaluation and validate it against human instructions obtained on real-world interactions. Third, we contribute reference implementations for several methods (PPO, DPO, best-of-n, expert iteration, and more) that learn from pairwise feedback. Finally, as an end-to-end validation of AlpacaFarm, we train and evaluate eleven models on 10k pairs of real human feedback and show that rankings of models trained in AlpacaFarm match rankings of models trained on human data. As a demonstration of the research possible in AlpacaFarm, we find that methods that use a reward model can substantially improve over supervised fine-tuning and that our reference PPO implementation leads to a +10% improvement in win-rate against Davinci003. We release all components of AlpacaFarm at https://github.com/tatsu-lab/alpaca_farm.

preprint 2024 arXiv

Interactive Text-to-SQL Generation via Editable Step-by-Step Explanations

Relational databases play an important role in business, science, and more. However, many users cannot fully unleash the analytical power of relational databases, because they are not familiar with database languages such as SQL. Many techniques have been proposed to automatically generate SQL from natural language, but they suffer from two issues: (1) they still make many mistakes, particularly for complex queries, and (2) they do not provide a flexible way for non-expert users to validate and refine incorrect queries. To address these issues, we introduce a new interaction mechanism that allows users to directly edit a step-by-step explanation of a query to fix errors. Our experiments on multiple datasets, as well as a user study with 24 participants, demonstrate that our approach can achieve better performance than multiple SOTA approaches. Our code and datasets are available at https://github.com/magic-YuanTian/STEPS.

preprint 2023 arXiv

DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision

We have witnessed significant progress in deep learning-based 3D vision, ranging from neural radiance field (NeRF) based 3D representation learning to applications in novel view synthesis (NVS). However, existing scene-level datasets for deep learning-based 3D vision, limited to either synthetic environments or a narrow selection of real-world scenes, are quite insufficient. This insufficiency not only hinders a comprehensive benchmark of existing methods but also caps what could be explored in deep learning-based 3D analysis. To address this critical gap, we present DL3DV-10K, a large-scale scene dataset, featuring 51.2 million frames from 10,510 videos captured from 65 types of point-of-interest (POI) locations, covering both bounded and unbounded scenes, with different levels of reflection, transparency, and lighting. We conducted a comprehensive benchmark of recent NVS methods on DL3DV-10K, which revealed valuable insights for future research in NVS. In addition, we have obtained encouraging results in a pilot study to learn generalizable NeRF from DL3DV-10K, which manifests the necessity of a large-scale scene-level dataset to forge a path toward a foundation model for learning 3D representation. Our DL3DV-10K dataset, benchmark results, and models will be publicly accessible at https://dl3dv-10k.github.io/DL3DV-10K/.

preprint 2023 arXiv

In-Plane Magnon Valve Effect in Magnetic Insulator/Heavy Metal/Magnetic Insulator Device

We propose an in-plane magnon valve (MV), a sandwich structure composed of ferromagnetic insulator/heavy metal/ferromagnetic insulator (MI/HM/MI). When the magnetizations of the two MI layers are parallel, the longitudinal conductance in the HM layer is greater than in the antiparallel state according to the magnetic proximity effect; we term this the in-plane magnon valve effect. We investigate the dependence of the MV ratio (MVR), which is the relative change in longitudinal conductance between the parallel and antiparallel MV states, on the difference in electronic structure between magnetized and non-magnetized metal atoms, revealing that the MVR can reach 100%. Additionally, the dependence of the MVR on the thickness of the metal layer is analyzed, revealing an exponential decrease with increasing thickness. We then investigate the dependence of the HM layer conductance on the relative angle between the magnetizations of the two MI layers, illustrating the potential of the MV as a magneto-sensitive magnonic sensor. We also investigate the effect of Joule heating on the measurement signal based on the spin Seebeck effect. Two configurations are proposed according to whether the electron current is parallel or perpendicular to the magnetization of the MI layer. In the parallel configuration, the transverse voltage differs between the parallel and antiparallel MV states, while in the perpendicular configuration the longitudinal resistance differs. Quantitative numerical results indicate the feasibility of detecting a voltage signal using the first configuration in experiments. Our work contributes valuable insights for the design, development and integration of magnon devices.

preprint 2023 arXiv

Multi-modality Affinity Inference for Weakly Supervised 3D Semantic Segmentation

3D point cloud semantic segmentation has a wide range of applications. Recently, weakly supervised point cloud segmentation methods have been proposed, aiming to alleviate the expensive and laborious manual annotation process by leveraging scene-level labels. However, these methods have not effectively exploited the rich geometric information (such as shape and scale) and appearance information (such as color and texture) present in RGB-D scans. Furthermore, current approaches fail to fully leverage the point affinity that can be inferred from the feature extraction network, which is crucial for learning from weak scene-level labels. Additionally, previous work overlooks the detrimental effects of the long-tailed distribution of point cloud data in weakly supervised 3D semantic segmentation. To this end, this paper proposes a simple yet effective scene-level weakly supervised point cloud segmentation method with a newly introduced multi-modality point affinity inference module. The point affinity proposed in this paper is characterized by features from multiple modalities (e.g., point cloud and RGB), and is further refined by normalizing the classifier weights to alleviate the detrimental effects of long-tailed distribution without the need of the prior of category distribution. Extensive experiments on the ScanNet and S3DIS benchmarks verify the effectiveness of our proposed method, which outperforms the state-of-the-art by ~4% to ~6% mIoU. Codes are released at https://github.com/Sunny599/AAAI24-3DWSSG-MMA.

preprint 2023 arXiv

Voltage-Controlled Magnon Transistor via Tunning Interfacial Exchange Coupling

Magnon transistors that can effectively regulate magnon transport by an electric field are desired for magnonics, which aims to provide a Joule-heating-free alternative to conventional electronics owing to the electric neutrality of magnons (the key carriers of spin angular momentum in magnonics). However, also because of their electric neutrality, magnons cannot interact directly with an electric field, and it is thus difficult to manipulate magnon transport straightforwardly by voltages. Here, we demonstrated that a gate voltage ($V_{\rm g}$) applied on a nonmagnetic metal/magnetic insulator (NM/MI) interface bent the energy band of the MI and modulated the probability for conduction electrons in the NM to tunnel into the MI, consequently enhancing or weakening the spin-magnon conversion efficiency at the interface. A voltage-controlled magnon transistor based on the magnon-mediated electric current drag (MECD) effect in a Pt/Y$_{\rm 3}$Fe$_{\rm 5}$O$_{\rm 12}$ (YIG)/Pt sandwich was then experimentally realized with $V_{\rm g}$ modulating the magnitude of the MECD signal. The obtained efficiency (the change ratio between the MECD voltage at $\pm V_{\rm g}$) reached 10%/(MV/cm) at 300 K. This prototype of magnon transistor offers an effective scheme to control magnon transport by a gate voltage.

preprint 2022 arXiv

Faster Cut-Equivalent Trees in Simple Graphs

Let $G = (V, E)$ be an undirected connected simple graph on $n$ vertices. A cut-equivalent tree of $G$ is an edge-weighted tree on the same vertex set $V$, such that for any pair of vertices $s, t\in V$, the minimum $(s, t)$-cut in the tree is also a minimum $(s, t)$-cut in $G$, and these two cuts have the same cut value. In a recent paper [Abboud, Krauthgamer and Trabelsi, 2021], the authors propose the first subcubic time algorithm for constructing a cut-equivalent tree. More specifically, their algorithm has $\widetilde{O}(n^{2.5})$ running time. In this paper, we improve the running time to $\hat{O}(n^2)$ if almost-linear time max-flow algorithms exist. Also, using the currently fastest max-flow algorithm by [van den Brand et al, 2021], our algorithm runs in time $\widetilde{O}(n^{17/8})$.

preprint 2022 arXiv

Faster Min-Plus Product for Monotone Instances

In this paper, we show that the time complexity of monotone min-plus product of two $n\times n$ matrices is $\tilde{O}(n^{(3+\omega)/2})=\tilde{O}(n^{2.687})$, where $\omega < 2.373$ is the fast matrix multiplication exponent [Alman and Vassilevska Williams 2021]. That is, when $A$ is an arbitrary integer matrix and $B$ is either row-monotone or column-monotone with integer elements bounded by $O(n)$, computing the min-plus product $C$ where $C_{i,j}=\min_k\{A_{i,k}+B_{k,j}\}$ takes $\tilde{O}(n^{(3+\omega)/2})$ time, which greatly improves the previous time bound of $\tilde{O}(n^{(12+\omega)/5})=\tilde{O}(n^{2.875})$ [Gu, Polak, Vassilevska Williams and Xu 2021]. Then by simple reductions, this means the following problems also have $\tilde{O}(n^{(3+\omega)/2})$ time algorithms: (1) $A$ and $B$ are both bounded-difference, that is, the difference between any two adjacent entries is a constant. The previous results give time complexities of $\tilde{O}(n^{2.824})$ [Bringmann, Grandoni, Saha and Vassilevska Williams 2016] and $\tilde{O}(n^{2.779})$ [Chi, Duan and Xie 2022]. (2) $A$ is arbitrary and the columns or rows of $B$ are bounded-difference. The previous result gives a time complexity of $\tilde{O}(n^{2.922})$ [Bringmann, Grandoni, Saha and Vassilevska Williams 2016]. (3) The problems reducible to these problems, such as language edit distance, RNA-folding, and the scored parsing problem on BD grammars [Bringmann, Grandoni, Saha and Vassilevska Williams 2016]. Finally, we also consider the problem of min-plus convolution between two integral sequences which are monotone and bounded by $O(n)$, and achieve a running time upper bound of $\tilde{O}(n^{1.5})$. Previously, this task required running time $\tilde{O}(n^{(9+\sqrt{177})/12}) = O(n^{1.859})$ [Chan and Lewenstein 2015].
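As a concrete reading of the definition $C_{i,j}=\min_k\{A_{i,k}+B_{k,j}\}$, the sketch below computes a min-plus product with the naive cubic loop and then evaluates the exponents quoted in the abstract. It is only the definition and the arithmetic, not the paper's faster algorithm; the function name `min_plus_product` and the 2x2 matrices are made up for illustration, and 2.373 is used as the upper bound on the matrix multiplication exponent stated above.

```python
import math


def min_plus_product(A, B):
    """Naive O(n^3) min-plus product: C[i][j] = min over k of A[i][k] + B[k][j]."""
    inner = len(B)
    cols = len(B[0])
    return [
        [min(A[i][k] + B[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(len(A))
    ]


A = [[0, 3], [2, 1]]  # arbitrary integer matrix
B = [[1, 2], [4, 5]]  # each row is non-decreasing, so B is row-monotone
C = min_plus_product(A, B)
print(C)  # [[1, 2], [3, 4]]

omega = 2.373  # upper bound on the matrix multiplication exponent quoted in the abstract
new_exponent = (3 + omega) / 2                  # about 2.6865, reported as 2.687
old_exponent = (12 + omega) / 5                 # about 2.8746, reported as 2.875
conv_exponent = (9 + math.sqrt(177)) / 12       # about 1.8587, reported as 1.859
print(new_exponent, old_exponent, conv_exponent)
```

Plugging in the bound on $\omega$ shows where the rounded figures in the abstract come from: the reported exponents are the computed values rounded up to three decimals.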

preprint 2022 arXiv

Learning Cross-Scale Visual Representations for Real-Time Image Geo-Localization

Robot localization remains a challenging task in GPS-denied environments. State estimation approaches based on local sensors, e.g. cameras or IMUs, are drift-prone for long-range missions as error accumulates. In this study, we aim to address this problem by localizing image observations in a 2D multi-modal geospatial map. We introduce the cross-scale dataset and a methodology to produce additional data from cross-modality sources. We propose a framework that learns cross-scale visual representations without supervision. Experiments are conducted on data from two different domains, underwater and aerial. In contrast to existing studies in cross-view image geo-localization, our approach a) performs better on smaller-scale multi-modal maps; b) is more computationally efficient for real-time applications; c) can serve directly in concert with state estimation pipelines.

preprint 2022 arXiv

Model-Based Neural Network and Its Application to Line Spectral Estimation

This paper presents the concept of "model-based neural network" (MNN), which is inspired by the classic artificial neural network (ANN) but for different usages. Instead of being used as a data-driven classifier, an MNN serves as a modeling tool with artfully defined inputs, outputs, and activation functions which have explicit physical meanings. Owing to the same layered form as an ANN, an MNN can also be optimized using the back-propagation (BP) algorithm. As an interesting application, the classic problem of line spectral estimation can be modeled by an MNN. We propose to first initialize the MNN by the fast Fourier transform (FFT) based spectral estimation, and then optimize the MNN by the BP algorithm, which automatically yields the maximum likelihood (ML) parameter estimation of the frequency spectrum. We also design a method of merging and pruning the hidden-layer nodes of the MNN, which can be used for model-order selection, i.e., to estimate the number of sinusoids. Numerical simulations verify the effectiveness of the proposed method.

preprint2022arXiv

On the Opportunities and Risks of Foundation Models

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.

preprint2022arXiv

Pancreatic Cancer ROSE Image Classification Based on Multiple Instance Learning with Shuffle Instances

The rapid on-site evaluation (ROSE) technique can significantly accelerate the diagnostic workflow of pancreatic cancer by immediately analyzing the fast-stained cytopathological images with on-site pathologists. Computer-aided diagnosis (CAD) using the deep learning method has the potential to solve the problem of insufficient pathology staffing. However, the cancerous patterns of ROSE images vary greatly between different samples, making the CAD task extremely challenging. Besides, due to different staining qualities and various types of acquisition devices, the ROSE images also have complicated perturbations in terms of color distribution, brightness, and contrast. To address these challenges, we proposed a novel multiple instance learning (MIL) approach using shuffle patches containing the instances, which adopts the patch-based learning strategy of Vision Transformers. With the re-grouped bags of shuffle instances and their bag-level soft labels, the approach utilizes a MIL head to make the model focus on the features from the pancreatic cancer cells, rather than those from various perturbations in ROSE images. Simultaneously, combined with a classification head, the model can effectively identify the general distributive patterns across different instances. The results demonstrate significant improvements in the classification accuracy with more accurate attention regions, indicating that the diverse patterns of ROSE images are effectively extracted, and the complicated perturbations of ROSE images are significantly eliminated. It also suggests that the MIL with shuffle instances has great potential in the analysis of cytopathological images.

preprint2022arXiv

Shuffle Instances-based Vision Transformer for Pancreatic Cancer ROSE Image Classification

The rapid on-site evaluation (ROSE) technique can significantly accelerate the diagnosis of pancreatic cancer by immediately analyzing the fast-stained cytopathological images. Computer-aided diagnosis (CAD) can potentially address the shortage of pathologists in ROSE. However, the cancerous patterns vary significantly between different samples, making the CAD task extremely challenging. Besides, the ROSE images have complicated perturbations regarding color distribution, brightness, and contrast due to different staining qualities and various acquisition device types. To address these challenges, we proposed a shuffle instances-based Vision Transformer (SI-ViT) approach, which can reduce the perturbations and enhance the modeling among the instances. With the regrouped bags of shuffle instances and their bag-level soft labels, the approach utilizes a regression head to make the model focus on the cells rather than various perturbations. Simultaneously, combined with a classification head, the model can effectively identify the general distributive patterns among different instances. The results demonstrate significant improvements in the classification accuracy with more accurate attention regions, indicating that the diverse patterns of ROSE images are effectively extracted, and the complicated perturbations are significantly reduced. It also suggests that the SI-ViT has excellent potential in analyzing cytopathological images. The code and experimental results are available at https://github.com/sagizty/MIL-SI.

preprint2022arXiv

TempLM: Distilling Language Models into Template-Based Generators

While pretrained language models (PLMs) have greatly improved text generation, they have also been known to produce unfaithful or inappropriate content. In contrast, classic template-based systems provide strong guarantees of faithfulness at the cost of fluency. We propose TempLM, which achieves the best of both worlds by distilling a PLM into a template-based generator. On the E2E and SynthBio data-to-text datasets, we show that TempLM is more faithful than the original PLM and is more fluent than prior template systems. Notably, on an out-of-domain evaluation, TempLM reduces a finetuned BART model's unfaithfulness rate from 83% to 0%. In a human study, we find that TempLM's templates substantially improve upon human-written ones in BERTScore.

preprint2022arXiv

When Cyber-Physical Systems Meet AI: A Benchmark, an Evaluation, and a Way Forward

Cyber-physical systems (CPS) have been broadly deployed in safety-critical domains, such as automotive systems, avionics, medical devices, etc. In recent years, Artificial Intelligence (AI) has been increasingly adopted to control CPS. Despite the popularity of AI-enabled CPS, few benchmarks are publicly available. There is also a lack of deep understanding on the performance and reliability of AI-enabled CPS across different industrial domains. To bridge this gap, we initiate to create a public benchmark of industry-level CPS in seven domains and build AI controllers for them via state-of-the-art deep reinforcement learning (DRL) methods. Based on that, we further perform a systematic evaluation of these AI-enabled systems with their traditional counterparts to identify the current challenges and explore future opportunities. Our key findings include (1) AI controllers do not always outperform traditional controllers, (2) existing CPS testing techniques (falsification, specifically) fall short of analyzing AI-enabled CPS, and (3) building a hybrid system that strategically combines and switches between AI controllers and traditional controllers can achieve better performance across different domains. Our results highlight the need for new testing techniques for AI-enabled CPS and the need for more investigations into hybrid CPS systems to achieve optimal performance and reliability.

preprint2021arXiv

All lines on a smooth cubic surface in terms of three skew lines

Jordan showed that the incidence variety of a smooth cubic surface containing 27 lines has solvable Galois group over the incidence variety of a smooth cubic surface containing 3 skew lines. As noted by Harris, it follows that for any smooth cubic surface, there exist formulas for all 27 lines in terms of any 3 skew lines. In response to a question of Farb, we compute these formulas explicitly. We also discuss how these formulas relate to Schläfli's count of lines on real smooth cubic surfaces.

preprint2021arXiv

Learning to Stop with Surprisingly Few Samples

We consider a discounted infinite horizon optimal stopping problem. If the underlying distribution is known a priori, the solution of this problem is obtained via dynamic programming (DP) and is given by a well known threshold rule. When information on this distribution is lacking, a natural (though naive) approach is "explore-then-exploit," whereby the unknown distribution or its parameters are estimated over an initial exploration phase, and this estimate is then used in the DP to determine actions over the residual exploitation phase. We show: (i) with proper tuning, this approach leads to performance comparable to the full information DP solution; and (ii) despite common wisdom on the sensitivity of such "plug in" approaches in DP due to propagation of estimation errors, a surprisingly "short" (logarithmic in the horizon) exploration horizon suffices to obtain said performance. In cases where the underlying distribution is heavy-tailed, these observations are even more pronounced: a single sample exploration phase suffices.
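As a rough illustration of the explore-then-exploit idea described in this abstract, the sketch below assumes a simple i.i.d. offer model in which the DP threshold v solves v = discount * E[max(X, v)]. That model, the plug-in estimate via the empirical mean, and the names empirical_threshold and explore_then_exploit are assumptions made for illustration; they are not taken from the paper.

```python
def empirical_threshold(samples, discount, tol=1e-12, max_iter=10_000):
    """Solve v = discount * mean(max(x, v)) over the observed samples.

    Assumed model (not from the abstract): offers are i.i.d. and the
    stopping threshold is the fixed point of this equation, with the
    unknown expectation replaced by the sample mean.
    """
    if not samples:
        raise ValueError("need at least one sample")
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    v = 0.0
    for _ in range(max_iter):
        # The map is a contraction with modulus `discount`, so this converges.
        new_v = discount * sum(max(x, v) for x in samples) / len(samples)
        if abs(new_v - v) < tol:
            return new_v
        v = new_v
    return v


def explore_then_exploit(offers, n_explore, discount):
    """Observe the first n_explore offers, then stop at the first later
    offer that meets the estimated threshold. Returns that index, or
    None if no later offer qualifies."""
    threshold = empirical_threshold(offers[:n_explore], discount)
    for t in range(n_explore, len(offers)):
        if offers[t] >= threshold:
            return t
    return None
```

With the two exploration samples 0.0 and 1.0 and a discount of 0.5, the fixed point is 1/3, so the rule stops at the first later offer of at least that value.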

preprint2020arXiv

A Scaling Algorithm for Weighted $f$-Factors in General Graphs

We study the maximum weight perfect $f$-factor problem on any general simple graph $G=(V,E,w)$ with positive integral edge weights $w$, and $n=|V|$, $m=|E|$. When we have a function $f:V\rightarrow \mathbb{N}_+$ on vertices, a perfect $f$-factor is a generalized matching so that every vertex $u$ is matched to $f(u)$ different edges. The previous best algorithms on this problem have running time $O(m f(V))$ [Gabow 2018] or $\tilde{O}(W(f(V))^{2.373})$ [Gabow and Sankowski 2013], where $W$ is the maximum edge weight, and $f(V)=\sum_{u\in V}f(u)$. In this paper, we present a scaling algorithm for this problem with running time $\tilde{O}(mn^{2/3}\log W)$. Previously this bound was only known for bipartite graphs [Gabow and Tarjan 1989]. The running time of our algorithm is independent of $f(V)$, and consequently it is the first to break the $\Omega(mn)$ barrier for large $f(V)$ even for the unweighted $f$-factor problem in general graphs.
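To make the definition of a perfect $f$-factor concrete, here is a minimal checker. It only verifies the degree condition stated in the abstract and is not the scaling algorithm itself. The function name and the edge-list representation are illustrative assumptions, and the check does not confirm that the chosen edges belong to a particular graph $G$.

```python
from collections import Counter


def is_perfect_f_factor(chosen_edges, f):
    """Return True if every vertex u is incident to exactly f[u] chosen edges.

    chosen_edges: iterable of (u, v) pairs from a simple graph.
    f: dict mapping each vertex to its required degree.
    """
    seen = set()
    degree = Counter()
    for u, v in chosen_edges:
        key = frozenset((u, v))
        if u == v or key in seen:
            return False  # simple graph: no self-loops, no repeated edges
        seen.add(key)
        degree[u] += 1
        degree[v] += 1
    if any(vertex not in f for vertex in degree):
        return False  # an edge touches a vertex outside the domain of f
    return all(degree[u] == f[u] for u in f)
```

For example, with $f(u)=2$ on every vertex of a 4-cycle, the cycle itself is a perfect $f$-factor; with $f(u)=1$ the definition reduces to an ordinary perfect matching.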

preprint2020arXiv

An Analysis of Adversarial Attacks and Defenses on Autonomous Driving Models

Nowadays, autonomous driving has attracted much attention from both industry and academia. Convolutional neural network (CNN) is a key component in autonomous driving, which is also increasingly adopted in pervasive computing such as smartphones, wearable devices, and IoT networks. Prior work shows CNN-based classification models are vulnerable to adversarial attacks. However, it is uncertain to what extent regression models such as driving models are vulnerable to adversarial attacks, the effectiveness of existing defense techniques, and the defense implications for system and middleware builders. This paper presents an in-depth analysis of five adversarial attacks and four defense methods on three driving models. Experiments show that, similar to classification models, these models are still highly vulnerable to adversarial attacks. This poses a big security threat to autonomous driving and thus should be taken into account in practice. While these defense methods can effectively defend against different attacks, none of them are able to provide adequate protection against all five attacks. We derive several implications for system and middleware builders: (1) when adding a defense component against adversarial attacks, it is important to deploy multiple defense methods in tandem to achieve a good coverage of various attacks, (2) a blackbox attack is much less effective compared with a white-box attack, implying that it is important to keep model details (e.g., model architecture, hyperparameters) confidential via model obfuscation, and (3) driving models with a complex architecture are preferred if computing resources permit as they are more resilient to adversarial attacks than simple models.

preprint2020arXiv

BERTScore: Evaluating Text Generation with BERT

We propose BERTScore, an automatic evaluation metric for text generation. Analogously to common metrics, BERTScore computes a similarity score for each token in the candidate sentence with each token in the reference sentence. However, instead of exact matches, we compute token similarity using contextual embeddings. We evaluate using the outputs of 363 machine translation and image captioning systems. BERTScore correlates better with human judgments and provides stronger model selection performance than existing metrics. Finally, we use an adversarial paraphrase detection task to show that BERTScore is more robust to challenging examples when compared to existing metrics.
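The token-level similarity idea can be sketched without any model. The example below assumes each token already has an embedding vector (toy vectors stand in for contextual BERT embeddings) and aggregates the pairwise cosine similarities by greedy matching into precision, recall, and F1. The function names and the toy inputs are illustrative assumptions, not the reference implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def greedy_match_scores(candidate, reference):
    """candidate, reference: non-empty lists of token embedding vectors.

    Each candidate token is matched to its most similar reference token
    (precision), and each reference token to its most similar candidate
    token (recall).
    """
    sim = [[cosine(c, r) for r in reference] for c in candidate]
    precision = sum(max(row) for row in sim) / len(candidate)
    recall = sum(
        max(sim[i][j] for i in range(len(candidate)))
        for j in range(len(reference))
    ) / len(reference)
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1
```

With candidate embeddings [[1, 0], [0, 1]] and a single reference embedding [[1, 0]], only one of the two candidate tokens has a match, so precision is 0.5 while recall is 1.0.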

preprint2020arXiv

Demystifying Orthogonal Monte Carlo and Beyond

Orthogonal Monte Carlo (OMC) is a very effective sampling algorithm imposing structural geometric conditions (orthogonality) on samples for variance reduction. Due to its simplicity and superior performance as compared to its Quasi Monte Carlo counterparts, OMC is used in a wide spectrum of challenging machine learning applications ranging from scalable kernel methods to predictive recurrent neural networks, generative models and reinforcement learning. However, theoretical understanding of the method remains very limited. In this paper we shed new light on the theoretical principles behind OMC, applying the theory of negatively dependent random variables to obtain several new concentration results. We also propose a novel extension of the method leveraging number theory techniques and particle algorithms, called Near-Orthogonal Monte Carlo (NOMC). We show that NOMC is the first algorithm consistently outperforming OMC in applications ranging from kernel methods to approximating distances in probabilistic metric spaces.

preprint2020arXiv

ICS-Assist: Intelligent Customer Inquiry Resolution Recommendation in Online Customer Service for Large E-Commerce Businesses

Efficient and appropriate online customer service is essential to large e-commerce businesses. Existing solution recommendation methods for online customer service are unable to determine the best solutions at runtime, leading to poor satisfaction of end customers. This paper proposes a novel intelligent framework, called ICS-Assist, to recommend suitable customer service solutions for service staff at runtime. Specifically, we develop a generalizable two-stage machine learning model to identify customer service scenarios and determine customer service solutions based on a scenario-solution mapping table. We implement ICS-Assist and evaluate it in a field study of over six months with Alibaba Group. In our experiment, over 12,000 customer service staff use ICS-Assist to serve over 230,000 cases per day on average. The experimental results show that ICS-Assist significantly outperforms the traditional manual method, and improves the solution acceptance rate, the solution coverage rate, the average service time, the customer satisfaction rate, and the business domain catering rate by up to 16%, 25%, 6%, 14% and 17% respectively, compared to the state-of-the-art methods.

preprint2020arXiv

Impact of 150keV and 590keV proton irradiation on monolayer MoS2

We present a comprehensive study on the effects of proton irradiation at different energies (150 and 590 keV) at a fluence of 1 x 10^12 protons/cm^2 on monolayer MoS2. This study not only improves our understanding of the influence of high-energy proton beams on MoS2 but also has implications for radiation-induced changes in device processing and engineering of devices from multilayer MoS2 starting material. Increasing defect density with decreasing proton irradiation energy was observed in the photoluminescence spectroscopy study. These defects are attributed to sulfur vacancies observed through x-ray photoelectron spectroscopy analysis and confirmed by transmission electron microscope imaging. Scanning electron microscopy images showed the creation of grain boundaries after proton irradiation. A higher degree of surface deformation was detected with lower irradiation energies through atomic force microscopy. The inter-defect distance increases with increasing proton irradiation energy, as estimated by transmission electron microscopy imaging. Raman spectroscopy reveals negligible structural changes in the crystal quality after the irradiation. These deformation damages due to proton irradiation are insignificant at the MoS2 layer. Based on the overall influence of low energy proton irradiation on the material characteristics, ML-MoS2 materials can be considered robust and reliable building blocks for 2D material based devices for space applications.

preprint2020arXiv

Mitigating Overfitting in Supervised Classification from Two Unlabeled Datasets: A Consistent Risk Correction Approach

The recently proposed unlabeled-unlabeled (UU) classification method allows us to train a binary classifier only from two unlabeled datasets with different class priors. Since this method is based on the empirical risk minimization, it works as if it is a supervised classification method, compatible with any model and optimizer. However, this method sometimes suffers from severe overfitting, which we would like to prevent in this paper. Our empirical finding in applying the original UU method is that overfitting often co-occurs with the empirical risk going negative, which is not legitimate. Therefore, we propose to wrap the terms that cause a negative empirical risk by certain correction functions. Then, we prove the consistency of the corrected risk estimator and derive an estimation error bound for the corrected risk minimizer. Experiments show that our proposal can successfully mitigate overfitting of the UU method and significantly improve the classification accuracy.

preprint2020arXiv

Monolayer Vanadium-doped Tungsten Disulfide: A Room-Temperature Dilute Magnetic Semiconductor

Dilute magnetic semiconductors, achieved through substitutional doping of spin-polarized transition metals into semiconducting systems, enable experimental modulation of spin dynamics in ways that hold great promise for novel magneto-electric or magneto-optical devices, especially for two-dimensional systems such as transition metal dichalcogenides that accentuate interactions and activate valley degrees of freedom. Practical applications of 2D magnetism will likely require room-temperature operation, air stability, and (for magnetic semiconductors) the ability to achieve optimal doping levels without dopant aggregation. Here we describe room-temperature ferromagnetic order obtained in semiconducting vanadium-doped tungsten disulfide monolayers produced by a reliable single-step film sulfidation method across an exceptionally wide range of vanadium concentrations, up to 12 at% with minimal dopant aggregation. These monolayers develop p-type transport as a function of vanadium incorporation and rapidly reach ambipolarity. Ferromagnetism peaks at an intermediate vanadium concentration of a few atomic percent and decreases for higher concentrations, which is consistent with quenching due to orbital hybridization at closer vanadium-vanadium spacings, as supported by transmission electron microscopy, magnetometry and first-principles calculations. Room-temperature two-dimensional dilute magnetic semiconductors provide a new component to expand the functional scope of van der Waals heterostructures and bring semiconducting magnetic 2D heterostructures into the realm of practical application.

preprint2020arXiv

Near-linear Time Algorithm for Approximate Minimum Degree Spanning Trees

Given a graph $G = (V, E)$, we wish to compute a spanning tree whose maximum vertex degree, i.e. tree degree, is as small as possible. Computing the exact optimal solution is known to be NP-hard, since it generalizes the Hamiltonian path problem. For the approximation version of this problem, a $\tilde{O}(mn)$ time algorithm that computes a spanning tree of degree at most $\Delta^* + 1$ was previously known [Fürer & Raghavachari 1994]; here $\Delta^*$ denotes the minimum tree degree of all the spanning trees. In this paper we give the first near-linear time approximation algorithm for this problem. Specifically, we propose an $\tilde{O}(\frac{1}{\epsilon^7}m)$ time algorithm that computes a spanning tree with tree degree $(1+\epsilon)\Delta^* + O(\frac{1}{\epsilon^2}\log n)$ for any constant $\epsilon \in (0,\frac{1}{6})$. Thus, when $\Delta^*=\omega(\log n)$, we can achieve approximate solutions with constant approximation ratio arbitrarily close to 1 in near-linear time.

preprint2020arXiv

Photo-degradation Protection in 2D In-Plane Heterostructures Revealed by Hyperspectral Nanoimaging: the Role of Nano-Interface 2D Alloys

Single-layer heterostructures exhibit striking quasiparticle properties and many-body interaction effects that hold promise for a range of applications. However, their properties can be altered by intrinsic and extrinsic defects, thus diminishing their applicability. Therefore, it is of paramount importance to identify defects and understand 2D materials' degradation over time using advanced multimodal imaging techniques as well as stabilize degradation via built-in interface protection. Here we implemented a liquid-phase precursor approach to synthesize 2D in-plane MoS2-WS2 heterostructures exhibiting nanoscale alloyed interfaces and map exotic interface effects during photo-degradation using a novel combination of hyperspectral tip-enhanced photoluminescence, Raman and near-field nanoscopy. Surprisingly, 2D alloyed regions exhibit remarkable thermal and photo-degradation stability providing protection against oxidation. Coupled with surface and interface strain, 2D alloy regions create localized potential wells that concentrate excitonic species via a charge carrier funneling effect. These results provide a clear understanding of the importance of 2D alloys as systems able to withstand degradation effects over time, and could be now used to stabilize optoelectronic devices based on 2D materials.

preprint2020arXiv

Stereo Endoscopic Image Super-Resolution Using Disparity-Constrained Parallel Attention

With the popularity of stereo cameras in computer assisted surgery techniques, a second viewpoint would provide additional information in surgery. However, how to effectively access and use stereo information for the super-resolution (SR) purpose is often a challenge. In this paper, we propose a disparity-constrained stereo super-resolution network (DCSSRnet) to simultaneously compute a super-resolved image in a stereo image pair. In particular, we incorporate a disparity-based constraint mechanism into the generation of SR images in a deep neural network framework with additional atrous parallax-attention modules. Experimental results on laparoscopic images demonstrate that the proposed framework outperforms current SR methods on both quantitative and qualitative evaluations. Our DCSSRnet provides a promising solution for enhancing the spatial resolution of stereo image pairs, which will be extremely beneficial for endoscopic surgery.

preprint2020arXiv

Supporting OpenMP 5.0 Tasks in hpxMP -- A study of an OpenMP implementation within Task Based Runtime Systems

OpenMP has been the de facto standard for single node parallelism for more than a decade. Recently, asynchronous many-task runtime (AMT) systems have increased in popularity as a new programming paradigm for high performance computing applications. One of the major challenges of this new paradigm is the incompatibility of the OpenMP thread model and other AMTs. Highly optimized OpenMP-based libraries do not perform well when coupled with AMTs because the threading of both libraries will compete for resources. This paper is a follow-up paper on the fundamental implementation of hpxMP, an implementation of the OpenMP standard which utilizes the C++ standard library for Parallelism and Concurrency (HPX) to schedule and manage tasks. In this paper, we present the implementation of task features, e.g. taskgroup, task depend, and task_reduction, of the OpenMP 5.0 standard and optimization of the #pragma omp parallel for pragma. We use the daxpy benchmark, the Barcelona OpenMP Tasks Suite, Parallel research kernels, and OpenBLAS benchmarks to compare the different OpenMP implementations: hpxMP, llvm-OpenMP, and GOMP.

preprint2020arXiv

Wafer-scale epitaxial growth of single orientation WS2 monolayers on sapphire

Realization of wafer-scale single-crystal films of transition metal dichalcogenides (TMDs) such as tungsten sulfide requires epitaxial growth and coalescence of oriented domains to form a continuous monolayer. The domains must be oriented in the same crystallographic direction on the substrate to avoid the formation of metallic inversion domain boundaries (IDBs) which are a common feature of layered chalcogenides. Here we demonstrate fully-coalesced single orientation tungsten sulfide monolayers on 2-inch diameter c-plane sapphire by metalorganic chemical vapor deposition using a multi-step growth process. High growth temperatures and sulfur/metal ratios were required to reduce domain misorientation and achieve epitaxial tungsten sulfide monolayers with low in-plane rotational twist (0.09 deg). Transmission electron microscopy analysis reveals that the tungsten sulfide monolayers lack IDBs but instead have translational boundaries that arise when tungsten sulfide domains with slightly off-set lattices merge together. By adjusting the monolayer growth rate, the density of translational boundaries and bilayer coverage were significantly reduced. The preferred orientation of domains is attributed to the presence of steps on the sapphire surface coupled with growth conditions that promote surface diffusion and oriented attachment. The transferred tungsten sulfide monolayers show neutral and charged exciton emission at 80 K with negligible defect-related luminescence. Back-gated tungsten sulfide field effect transistors exhibited mobility of 16 cm^2/Vs. The results demonstrate the potential of achieving wafer-scale TMD monolayers free of inversion domains with properties approaching those of exfoliated flakes.

preprint2019arXiv

Nonlinear dark-field imaging of 1D defects in monolayer dichalcogenides

Extended defects with one dimensionality smaller than that of the host, such as 2D grain boundaries in 3D materials or 1D grain boundaries in 2D materials, can be particularly damaging since they directly impede the transport of charge, spin or heat, and can introduce a metallic character into otherwise semiconducting systems. Unfortunately, a technique to rapidly and non-destructively image 1D defects in 2D materials is lacking. Scanning transmission electron microscopy (STEM), Raman, photoluminescence and nonlinear optical spectroscopies are all extremely valuable, but current implementations suffer from low throughput and a destructive nature (STEM) or limitations in their unambiguous sensitivity at the nanoscale. Here we demonstrate that dark-field second harmonic generation (SHG) microscopy can rapidly, efficiently, and non-destructively probe grain boundaries and edges in monolayer dichalcogenides (i.e. MoSe2, MoS2 and WS2). Dark-field SHG efficiently separates the spatial components of the emitted light and exploits interference effects from crystal domains of different orientations to localize grain boundaries and edges as very bright 1D patterns through a Cerenkov-type SHG emission. The frequency dependence of this emission in MoSe2 monolayers is explained in terms of plasmon-enhanced SHG related to the defects' metallic character. This new technique for nanometer-scale imaging of the grain structure, domain orientation and localized 1D plasmons in different 2D semiconductors thus enables more rapid progress towards both applications and fundamental materials discoveries.