Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
53 works
0 followers
30 topics
4 close collaborators

Published work

53 published item(s)

preprint · 2026 · arXiv

Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation

We present JoyAI-Image, a unified multimodal foundation model for visual understanding, text-to-image generation, and instruction-guided image editing. JoyAI-Image couples a spatially enhanced Multimodal Large Language Model (MLLM) with a Multimodal Diffusion Transformer (MMDiT), allowing perception and generation to interact through a shared multimodal interface. Around this architecture, we build a scalable training recipe that combines unified instruction tuning, long-text rendering supervision, spatially grounded data, and both general and spatial editing signals. This design gives the model broad multimodal capability while strengthening geometry-aware reasoning and controllable visual synthesis. Experiments across understanding, generation, long-text rendering, and editing benchmarks show that JoyAI-Image achieves state-of-the-art or highly competitive performance. More importantly, the bidirectional loop between enhanced understanding, controllable spatial editing, and novel-view-assisted reasoning enables the model to move beyond general visual competence toward stronger spatial intelligence. These results suggest a promising path for unified visual models in downstream applications such as vision-language-action systems and world models.

preprint · 2026 · arXiv

Before the Body Moves: Learning Anticipatory Joint Intent for Language-Conditioned Humanoid Control

Natural language is an intuitive interface for humanoid robots, yet streaming whole-body control requires control representations that are executable now and anticipatory of future physical transitions. Existing language-conditioned humanoid systems typically generate kinematic references that a low-level tracker must repair reactively, or use latent/action policies whose outputs do not explicitly encode upcoming contact changes, support transfers, and balance preparation. We propose DAJI (Dynamics-Aligned Joint Intent), a hierarchical framework that learns an anticipatory joint-intent interface between language generation and closed-loop control. DAJI-Act distills a future-aware teacher into a deployable diffusion action policy through student-driven rollouts, while DAJI-Flow autoregressively generates future intent chunks from language and intent history. Experiments show that DAJI achieves strong results in anticipatory latent learning, single-instruction generation, and streaming instruction following, reaching 94.42% rollout success on HumanML3D-style generation and 0.152 subsequence FID on BABEL.

preprint · 2026 · arXiv

First-Order Efficiency for Probabilistic Value Estimation via A Statistical Viewpoint

Probabilistic values, including Shapley values and semivalues, provide a model-agnostic framework to attribute the behavior of a black-box model to data points or features, with a wide range of applications including explainable artificial intelligence and data valuation. However, their exact computation requires utility evaluations over exponentially many coalitions, making Monte Carlo approximation essential in modern machine learning applications. Existing estimators are often developed through different identification strategies, including weighted averages, self-normalized weighting, regression adjustment, and weighted least squares. Our key observation is that these seemingly distinct constructions share a common first-order error structure, in which the leading term is an augmented inverse-probability weighted influence term determined by the sampling law and a working surrogate function. This first-order representation yields an explicit expression for the leading mean squared error (MSE), which characterizes how the sampling law and the surrogate jointly determine statistical efficiency. Guided by this criterion, we propose an Efficiency-Aware Surrogate-adjusted Estimator (EASE) that directly chooses the sampling law and surrogate to minimize the first-order MSE. We demonstrate that EASE consistently outperforms state-of-the-art estimators for various probabilistic values.
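
For orientation, the Monte Carlo approximation mentioned in this abstract can be sketched with plain permutation sampling of marginal contributions. This is a textbook baseline and not the EASE estimator; the function names, the toy utility, and the sample count are all hypothetical choices made for illustration.

```python
import random
from itertools import permutations


def shapley_exact(n, utility):
    """Exact Shapley values: average marginal contributions over all n! orders."""
    values = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        coalition = set()
        prev = utility(frozenset(coalition))
        for player in order:
            coalition.add(player)
            cur = utility(frozenset(coalition))
            values[player] += cur - prev
            prev = cur
    return [v / len(orders) for v in values]


def shapley_monte_carlo(n, utility, num_samples, seed=0):
    """Approximate Shapley values from uniformly sampled random orders."""
    rng = random.Random(seed)
    values = [0.0] * n
    players = list(range(n))
    for _ in range(num_samples):
        rng.shuffle(players)
        coalition = set()
        prev = utility(frozenset(coalition))
        for player in players:
            coalition.add(player)
            cur = utility(frozenset(coalition))
            values[player] += cur - prev
            prev = cur
    return [v / num_samples for v in values]


# Hypothetical toy utility: additive weights plus a bonus when players 0 and 1 cooperate.
WEIGHTS = [1.0, 2.0, 3.0]


def toy_utility(coalition):
    bonus = 1.0 if {0, 1} <= coalition else 0.0
    return sum(WEIGHTS[i] for i in coalition) + bonus


exact = shapley_exact(3, toy_utility)
approx = shapley_monte_carlo(3, toy_utility, num_samples=2000)
```

With only three players the exact values are cheap to enumerate, which makes the sampling error of the approximation easy to inspect; for realistic problem sizes only the sampled version is feasible.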

preprint · 2026 · arXiv

FlowAct-R1: Towards Interactive Humanoid Video Generation

Interactive humanoid video generation aims to synthesize lifelike visual agents that can engage with humans through continuous and responsive video. Despite recent advances in video synthesis, existing methods often grapple with the trade-off between high-fidelity synthesis and real-time interaction requirements. In this paper, we propose FlowAct-R1, a framework specifically designed for real-time interactive humanoid video generation. Built upon an MMDiT architecture, FlowAct-R1 enables the streaming synthesis of video with arbitrary durations while maintaining low-latency responsiveness. We introduce a chunkwise diffusion forcing strategy, complemented by a novel self-forcing variant, to alleviate error accumulation and ensure long-term temporal consistency during continuous interaction. By leveraging efficient distillation and system-level optimizations, our framework achieves a stable 25 fps at 480p resolution with a time-to-first-frame (TTFF) of only around 1.5 seconds. The proposed method provides holistic and fine-grained full-body control, enabling the agent to transition naturally between diverse behavioral states in interactive scenarios. Experimental results demonstrate that FlowAct-R1 achieves exceptional behavioral vividness and perceptual realism, while maintaining robust generalization across diverse character styles.

preprint · 2026 · arXiv

Generalized Priority-Aware Shapley Value

Shapley value and its priority-aware extensions are widely used for valuation in machine learning, but existing methods require pairwise priority to be binary and acyclic, a restriction spectacularly violated in real-data examples such as aggregated human preferences and multi-criterion comparisons. We introduce the generalized priority-aware Shapley value (GPASV), a random order value defined on arbitrary directed weighted priority graphs, in which pairwise edges penalize rather than forbid order violations. GPASV covers a range of classical models as boundary cases. We establish GPASV through an axiomatic characterization, develop the associated computational methods, and introduce a priority sweeping diagnostic extending PASV's. We apply GPASV to LLM ensemble valuation on the cyclic Chatbot Arena preference graph, illustrating that priority-aware valuation is not a one-button operation: different balances of pairwise graph priority versus individual soft priority produce substantively different valuations of the same data.

preprint · 2026 · arXiv

SEED: Targeted Data Selection by Weighted Independent Set

Data selection seeks to identify a compact yet informative subset from large-scale training corpora, balancing sample quality against collection diversity. We formulate this problem as a Weighted Independent Set (WIS) on a similarity graph, where nodes represent data samples weighted by influence, and edges connect semantically redundant pairs. This formulation naturally yields subsets that are simultaneously high-quality and diverse. However, two challenges arise in practice: naive node weights fail to distinguish informative signals from gradient noise, and edge construction under heterogeneous domain distributions produces structurally imbalanced graphs that bias selection toward sparse regions. To address these issues, we introduce two principled refinements from a unified graph perspective: (1) node value calibration that restricts influence estimation to the bilateral salient subspace to ground node importance in task-relevant signals rather than surface-level statistics; (2) local scale normalization that adapts edge thresholds to local neighborhood density, mitigating graph imbalance induced by cross-domain distribution shifts. Together, these components yield a robust and scalable data selection pipeline dubbed SEED. We further construct Honeybee-Remake-SEED-200K, a compact multimodal dataset curated by SEED. Extensive experiments show that SEED consistently outperforms state-of-the-art methods on instruction tuning, visual instruction tuning, and semantic segmentation across diverse model families.
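
The Weighted Independent Set formulation in this abstract can be illustrated with a simple greedy heuristic: connect pairs whose similarity exceeds a threshold, then pick nodes in descending weight order while skipping neighbours of already selected nodes. This is only a sketch of the basic formulation; it implements neither node value calibration nor local scale normalization, and the global cosine threshold, function names, and toy data are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_wis(embeddings, weights, threshold):
    """Greedy heuristic for a weighted independent set on a similarity graph."""
    n = len(embeddings)
    # Edges connect redundant pairs (similarity at or above the threshold).
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                neighbors[i].add(j)
                neighbors[j].add(i)
    selected, blocked = [], set()
    # Visit nodes from highest to lowest weight; skip anything adjacent to a pick.
    for i in sorted(range(n), key=lambda k: weights[k], reverse=True):
        if i not in blocked:
            selected.append(i)
            blocked |= neighbors[i]
    return sorted(selected)


# Hypothetical toy data: samples 0/1 are near-duplicates, as are samples 2/3.
embeddings = [(1.0, 0.0), (0.99, 0.1), (0.0, 1.0), (0.1, 0.99)]
weights = [0.9, 0.5, 0.4, 0.8]
subset = greedy_wis(embeddings, weights, threshold=0.95)
```

The greedy rule keeps the higher-weight member of each redundant pair, which is the quality-versus-diversity trade-off the formulation is meant to capture; it carries no optimality guarantee in general.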

preprint · 2026 · arXiv

Teacher-Feature Drifting: One-Step Diffusion Distillation with Pretrained Diffusion Representations

Sampling from pretrained diffusion and flow-matching models typically requires many forward passes to generate diverse and high-fidelity images. Existing distillation methods often rely on multiple auxiliary networks, carefully designed training stages, or complex optimization pipelines. In this work, we revisit the recently proposed Drifting Model objective and show that a single drifting loss can be directly used to simplify one-step distillation. A key observation is that the pretrained diffusion teacher itself already provides a strong representation space. Unlike the original Drifting Model, which relies on an additional pretrained feature extractor, we use intermediate hidden states of the pretrained teacher model as the feature representation. This removes the need for training or introducing an extra representation network while preserving a semantically meaningful feature geometry for drifting. Furthermore, we introduce a lightweight mode coverage loss to mitigate mode collapse during distillation and encourage the student generator to cover diverse teacher-supported regions. Extensive experiments on ImageNet and SDXL demonstrate that our method achieves efficient one-step generation with competitive image quality and diversity, achieving FID scores of 1.58 on ImageNet-64×64 and 18.4 on SDXL, while substantially simplifying the overall distillation framework.

preprint · 2026 · arXiv

TextLDM: Language Modeling with Continuous Latent Diffusion

Diffusion Transformers (DiT) trained with flow matching in a VAE latent space have unified visual generation across images and videos. A natural next step toward a single architecture for both generation (visual synthesis) and understanding (text generation) is to apply this framework to language modeling. We propose TextLDM, which transfers the visual latent diffusion recipe to text generation with minimal architectural modification. A Transformer-based VAE maps discrete tokens to continuous latents, enhanced by Representation Alignment (REPA) with a frozen pretrained language model to produce representations effective for conditional denoising. A standard DiT then performs flow matching in this latent space, identical in architecture to its visual counterpart. The central challenge we address is obtaining high-quality continuous text representations: we find that reconstruction fidelity alone is insufficient, and that aligning latent features with a pretrained language model via REPA is critical for downstream generation quality. Trained from scratch on OpenWebText2, TextLDM substantially outperforms prior diffusion language models and matches GPT-2 under the same settings. Our results establish that the visual DiT recipe transfers effectively to language, taking a concrete step toward unified diffusion architectures for multimodal generation and understanding.

preprint · 2026 · arXiv

Toward Natural and Companionable Virtual Agents via Cross-Temporal Emotional Modeling

Recent advances in foundation models have enabled conversational agents that aim for sustained companionship rather than mere task completion. Yet most remain unable to support natural, long-term companion-like interactions, resulting in experiences that feel episodic and inauthentic. We argue that current agents overlook cross-temporal modeling of agents' social behaviors and internal emotions: generated behaviors rarely influence an agent's emotional state, and emotional states seldom shape subsequent behaviors. We present Cross-Temporal Emotion Modeling (CTEM), a framework that links long-term behavioral history to moment-to-moment emotional expression. CTEM establishes a closed loop where past experiences update an evolving emotional state; this state conditions immediate interactions; and user feedback continually revises both memory and emotional state, enabling reflection and anticipation. We instantiate CTEM as Auri, a companion agent on an instant-messaging platform, and report a 21-day in-the-wild study showing that CTEM improves perceived naturalness, coherence, and emotional harmony.

preprint · 2026 · arXiv

Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising

We propose X-WAM, a Unified 4D World Model that unifies real-time robotic action execution and high-fidelity 4D world synthesis (video + 3D reconstruction) in a single framework, addressing the critical limitations of prior unified world models (e.g., UWM) that only model 2D pixel-space and fail to balance action efficiency and world modeling quality. To leverage the strong visual priors of pretrained video diffusion models, X-WAM imagines the future world by predicting multi-view RGB-D videos, and obtains spatial information efficiently through a lightweight structural adaptation: replicating the final few blocks of the pretrained Diffusion Transformer into a dedicated depth prediction branch for the reconstruction of future spatial information. Moreover, we propose Asynchronous Noise Sampling (ANS) to jointly optimize generation quality and action decoding efficiency. ANS applies a specialized asynchronous denoising schedule during inference, which rapidly decodes actions with fewer steps to enable efficient real-time execution, while dedicating the full sequence of steps to generate high-fidelity video. Rather than entirely decoupling the timesteps during training, ANS samples from their joint distribution to align with the inference distribution. Pretrained on over 5,800 hours of robotic data, X-WAM achieves 79.2% and 90.7% average success rate on RoboCasa and RoboTwin 2.0 benchmarks, while producing high-fidelity 4D reconstruction and generation surpassing existing methods in both visual and geometric metrics.

preprint · 2024 · arXiv

Optimal Nonparametric Inference on Network Effects with Dependent Edges

Testing network effects in weighted directed networks is a foundational problem in econometrics, sociology, and psychology. Yet, the prevalent edge dependency poses a significant methodological challenge. Most existing methods are model-based and come with stringent assumptions, limiting their applicability. In response, we introduce a novel, fully nonparametric framework that requires only minimal regularity assumptions. While inspired by recent developments in the $U$-statistic literature (arXiv:1712.00771, arXiv:2004.06615), our approach notably broadens their scope. Specifically, we identify and carefully address the challenge of indeterminate degeneracy in the test statistics, a problem that the aforementioned tools do not handle. We establish a Berry-Esseen type bound for the accuracy of type-I error rate control. Using original analysis, we also prove the minimax optimality of our test's power. Simulations underscore the superiority of our method in computation speed, accuracy, and numerical robustness compared to competing methods. We also apply our method to the U.S. faculty hiring network data and discover intriguing findings.

preprint · 2024 · arXiv

Theoretical Study on Superradiant Raman Scattering with Rubidium Atoms in An Optical Cavity

Superradiant Raman scattering of Rubidium atoms has been explored in the experiment [Nature 484, 78 (2012)] to prove the concept of the superradiant laser, which attracts significant attention in quantum metrology due to the expected ultra-narrow linewidth down to millihertz. To better understand the physics involved in this experiment, we have developed a quantum master equation theory by treating the Rubidium atoms as three-level systems, and coupling them with a dressed laser and an optical cavity. Our simulations show different superradiant Raman scattering pulses for the systems within the crossover and strong coupling regimes, and the shifted and broader spectrum of the steady-state Raman scattering. Thus, our studies provide a unified view on the superradiant Raman scattering pulses, and an alternative explanation for the broad spectrum of the steady-state Raman scattering, as observed in the experiment. In the future, our theory can be readily applied to study other interesting phenomena relying on superradiant Raman scattering, such as magnetic field sensing, real-time tracking of quantum phase, and the Dicke phase transition of non-equilibrium dynamics.

preprint · 2023 · arXiv

SHAQ: Incorporating Shapley Value Theory into Multi-Agent Q-Learning

Value factorisation is a useful technique for multi-agent reinforcement learning (MARL) in the global reward game; however, its underlying mechanism is not yet fully understood. This paper studies a theoretical framework for value factorisation with interpretability via Shapley value theory. We generalise the Shapley value to the Markov convex game, yielding the Markov Shapley value (MSV), and apply it as a value factorisation method in the global reward game, which is justified by the equivalence between the two games. Based on the properties of MSV, we derive the Shapley-Bellman optimality equation (SBOE) to evaluate the optimal MSV, which corresponds to an optimal joint deterministic policy. Furthermore, we propose the Shapley-Bellman operator (SBO), which is proved to solve SBOE. With a stochastic approximation and some transformations, a new MARL algorithm called Shapley Q-learning (SHAQ) is established, the implementation of which is guided by the theoretical results of SBO and MSV. We also discuss the relationship between SHAQ and relevant value factorisation methods. In the experiments, SHAQ exhibits not only superior performance on all tasks but also interpretability that agrees with the theoretical analysis. The implementation of this paper is available at https://github.com/hsvgbkhgbv/shapley-q-learning.

preprint · 2022 · arXiv

Asymptotic theory in network models with covariates and a growing number of node parameters

We propose a general model that jointly characterizes degree heterogeneity and homophily in weighted, undirected networks. We present a moment estimation method using node degrees and homophily statistics. We establish consistency and asymptotic normality of our estimator using novel analysis. We apply our general framework to three applications, including both exponential family and non-exponential family models. Comprehensive numerical studies and a data example also demonstrate the usefulness of our method.

preprint · 2022 · arXiv

Capacity Analysis of Holographic MIMO Channels with Practical Constraints

Holographic Multiple-Input and Multiple-Output (MIMO) is envisioned as a promising technology to realize unprecedented spectral efficiency by integrating a large number of antennas into a compact space. Most research on holographic MIMO is based on isotropic scattering environments, and the antenna gain is assumed to be unlimited by deployment space. However, the channel might not satisfy isotropic scattering because of generalized angle distributions, and the antenna gain is limited by the array aperture in reality. In this letter, we aim to analyze the holographic MIMO channel capacity under practical angle distribution and array aperture constraints. First, we calculate the spectral density for generalized angle distributions by introducing a wavenumber domain-based method. Then, the capacity under generalized angle distributions is analyzed and two different aperture schemes are considered. Finally, numerical results show that the capacity is clearly affected by the angle distribution at high signal-to-noise ratio (SNR) but hardly affected at low SNR, and that the capacity does not increase without bound with antenna density because of the array aperture constraint.

preprint · 2022 · arXiv

Cavity Quantum Electrodynamics Effects of Optically Cooled Nitrogen-Vacancy Centers Coupled to a High Frequency Microwave Resonator

Recent experiments demonstrated the cooling of a microwave mode of a high-quality dielectric resonator coupled to optically cooled nitrogen-vacancy (NV) spins in diamond. Our recent theoretical study [arXiv:2110.10950] pointed out that the cooled NV spins can be used to realize cavity quantum electrodynamics (CQED) effects at room temperature. In this article, we propose to modify the setup used in a recent diamond maser experiment [Nature 555, 493-496 (2018)], which features a higher spin transition frequency, a lower spin-dephasing rate and a stronger NV spins-resonator coupling, to realize better microwave mode cooling and the room-temperature CQED effects. To describe more precisely the optical spin cooling and the collective spin-resonator coupling, we extend the standard Jaynes-Cummings model to account for the rich electronic and spin levels of the NV centers. Our calculations show that for the proposed setup it is possible to cool the microwave mode from $293$ K (room temperature) to $116$ K, which is about $72$ K lower than the previous records, and to study the intriguing dynamics of the CQED effects under the weak-to-strong coupling transition by varying the laser power. With simple modifications, our model can be applied to, e.g., other solid-state spins or triplet spins of pentacene molecules, and to investigate other effects, such as the operations of pulsed and continuous-wave masing.

preprint · 2022 · arXiv

Controllable Semantic Parsing via Retrieval Augmentation

In practical applications of semantic parsing, we often want to rapidly change the behavior of the parser, such as enabling it to handle queries in a new domain, or changing its predictions on certain targeted queries. While we can introduce new training examples exhibiting the target behavior, a mechanism for enacting such behavior changes without expensive model re-training would be preferable. To this end, we propose ControllAble Semantic Parser via Exemplar Retrieval (CASPER). Given an input query, the parser retrieves related exemplars from a retrieval index, augments them to the query, and then applies a generative seq2seq model to produce an output parse. The exemplars act as a control mechanism over the generic generative model: by manipulating the retrieval index or how the augmented query is constructed, we can manipulate the behavior of the parser. On the MTOP dataset, in addition to achieving state-of-the-art on the standard setup, we show that CASPER can parse queries in a new domain, adapt the prediction toward the specified patterns, or adapt to new semantic schemas without having to further re-train the model.
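
The retrieve-then-augment step described in this abstract can be sketched as follows. The token-overlap retriever, the `@@` and `##` separators, and the example parses are all hypothetical stand-ins, and the generative seq2seq model is omitted entirely; this only illustrates how editing the retrieval index changes what the parser would be conditioned on.

```python
def tokenize(text):
    """Lowercased whitespace tokens as a set (a deliberately crude stand-in)."""
    return set(text.lower().split())


def retrieve(query, index, k=2):
    """Return the k (utterance, parse) exemplars with the highest Jaccard overlap."""
    q = tokenize(query)

    def score(item):
        t = tokenize(item[0])
        return len(q & t) / len(q | t)

    return sorted(index, key=score, reverse=True)[:k]


def augment(query, exemplars):
    """Concatenate the query with retrieved exemplars (separator format is assumed)."""
    parts = [query] + [f"{utterance} ## {parse}" for utterance, parse in exemplars]
    return " @@ ".join(parts)


# Hypothetical retrieval index with made-up utterances and parses.
index = [
    ("set an alarm for 7 am", "[IN:CREATE_ALARM [SL:DATE_TIME for 7 am ] ]"),
    ("play some jazz music", "[IN:PLAY_MUSIC [SL:MUSIC_GENRE jazz ] ]"),
    ("wake me up at 6 am", "[IN:CREATE_ALARM [SL:DATE_TIME at 6 am ] ]"),
]

query = "set an alarm for 9 am"
retrieved = retrieve(query, index)
model_input = augment(query, retrieved)

# Removing alarm exemplars from the index changes the model input without retraining.
music_only = [item for item in index if "ALARM" not in item[1]]
controlled_input = augment(query, retrieve(query, music_only))
```

In a full system `model_input` would be fed to the trained seq2seq parser; here it is just a string, which is enough to show where the control lever sits.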

preprint · 2022 · arXiv

Cross-Scale Vector Quantization for Scalable Neural Speech Coding

Bitrate scalability is a desirable feature for audio coding in real-time communications. Existing neural audio codecs usually enforce a specific bitrate during training, so different models need to be trained for each target bitrate, which increases the memory footprint at the sender and the receiver side, and transcoding is often needed to support multiple receivers. In this paper, we introduce a cross-scale scalable vector quantization scheme (CSVQ), in which multi-scale features are encoded progressively with stepwise feature fusion and refinement. In this way, a coarse-level signal is reconstructed if only a portion of the bitstream is received, and the quality improves progressively as more bits become available. The proposed CSVQ scheme can be flexibly applied to any neural audio coding network with a mirrored auto-encoder structure to achieve bitrate scalability. Subjective results show that the proposed scheme outperforms the classical residual VQ (RVQ) with scalability. Moreover, the proposed CSVQ at 3 kbps outperforms Opus at 9 kbps and Lyra at 3 kbps, and it provides a graceful quality boost as the bitrate increases.
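
As background for the comparison in this abstract, the classical residual VQ baseline can be sketched in a few lines: each stage quantizes the residual left by the previous stage, so decoding only the first few indices yields a coarser reconstruction. This illustrates RVQ only and not the CSVQ scheme; the codebooks and input vector are hypothetical toy values.

```python
def nearest(codebook, x):
    """Index of the code vector closest to x in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], x)),
    )


def rvq_encode(x, codebooks):
    """Quantize x stage by stage, each stage coding the remaining residual."""
    residual = list(x)
    indices = []
    for codebook in codebooks:
        k = nearest(codebook, residual)
        indices.append(k)
        residual = [r - c for r, c in zip(residual, codebook[k])]
    return indices


def rvq_decode(indices, codebooks):
    """Sum the code vectors of however many stages were received."""
    out = [0.0] * len(codebooks[0][0])
    for k, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[k])]
    return out


# Hypothetical two-stage codebooks: a coarse stage and a fine correction stage.
codebooks = [
    [(0.0, 0.0), (1.0, 1.0), (-1.0, -1.0)],
    [(0.25, 0.0), (0.0, 0.25), (-0.25, 0.0), (0.0, -0.25), (0.0, 0.0)],
]

x = (1.25, 1.0)
indices = rvq_encode(x, codebooks)
coarse = rvq_decode(indices[:1], codebooks)  # only the first stage received
full = rvq_decode(indices, codebooks)        # both stages received
```

Truncating the index list models a receiver that obtained only part of the bitstream, which is the scalability property both RVQ and the proposed scheme aim for.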

preprint · 2022 · arXiv

End-to-End Neural Speech Coding for Real-Time Communications

Deep-learning based methods have shown their advantages in audio coding over traditional ones, but limited attention has been paid to real-time communications (RTC). This paper proposes TFNet, an end-to-end neural speech codec with low latency for RTC. It takes an encoder-temporal filtering-decoder paradigm that has seldom been investigated in audio coding. An interleaved structure is proposed for temporal filtering to capture both short-term and long-term temporal dependencies. Furthermore, with end-to-end optimization, TFNet is jointly optimized with speech enhancement and packet loss concealment, yielding a one-for-all network for three tasks. Both subjective and objective results demonstrate the efficiency of the proposed TFNet.

preprint · 2022 · arXiv

In Defense of Kalman Filtering for Polyp Tracking from Colonoscopy Videos

Real-time and robust automatic detection of polyps from colonoscopy videos is an essential task to help improve the performance of doctors during this exam. The current focus of the field is on the development of accurate but inefficient detectors that will not enable a real-time application. We advocate that the field should instead focus on the development of simple and efficient detectors that can be combined with effective trackers to allow the implementation of real-time polyp detectors. In this paper, we propose a Kalman filtering tracker that can work together with powerful, but efficient detectors, enabling the implementation of real-time polyp detectors. In particular, we show that the combination of our Kalman filtering with the detector PP-YOLO shows state-of-the-art (SOTA) detection accuracy and real-time processing. More specifically, our approach has SOTA results on the CVC-ClinicDB dataset, with a recall of 0.740, precision of 0.869, $F_1$ score of 0.799, an average precision (AP) of 0.837, and can run in real time (i.e., 30 frames per second). We also evaluate our method on a subset of Hyper-Kvasir annotated by our clinical collaborators, resulting in SOTA results, with a recall of 0.956, precision of 0.875, $F_1$ score of 0.914, AP of 0.952, and can run in real time.
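
To make the tracking idea in this abstract concrete, here is a minimal constant-velocity Kalman filter over a single box-centre coordinate that coasts on its prediction when the detector misses a frame. It is a generic textbook filter, not the tracker from the paper; the state layout, noise variances, initial covariance, and the toy detection sequence are all assumptions.

```python
class ConstantVelocityKalman1D:
    """Kalman filter over state [position, velocity] with position-only measurements."""

    def __init__(self, pos, process_var=0.01, meas_var=1.0):
        self.pos, self.vel = float(pos), 0.0
        # Symmetric covariance [[p00, p01], [p01, p11]]; velocity starts very uncertain.
        self.p00, self.p01, self.p11 = meas_var, 0.0, 100.0
        self.q, self.r = process_var, meas_var

    def predict(self, dt=1.0):
        """Propagate the state with F = [[1, dt], [0, 1]] and P <- F P F^T + qI."""
        self.pos += dt * self.vel
        p00 = self.p00 + 2.0 * dt * self.p01 + dt * dt * self.p11 + self.q
        p01 = self.p01 + dt * self.p11
        p11 = self.p11 + self.q
        self.p00, self.p01, self.p11 = p00, p01, p11
        return self.pos

    def update(self, z):
        """Correct the state with a position measurement z (H = [1, 0])."""
        s = self.p00 + self.r                  # innovation variance
        k0, k1 = self.p00 / s, self.p01 / s    # Kalman gain
        y = z - self.pos                       # innovation
        self.pos += k0 * y
        self.vel += k1 * y
        p00 = (1.0 - k0) * self.p00
        p01 = (1.0 - k0) * self.p01
        p11 = self.p11 - k1 * self.p01
        self.p00, self.p01, self.p11 = p00, p01, p11
        return self.pos


def track(detections):
    """Filter a sequence of detections; None marks a frame with no detection."""
    kf = ConstantVelocityKalman1D(detections[0])  # first frame must be a detection
    estimates = [kf.pos]
    for z in detections[1:]:
        kf.predict()
        if z is not None:
            kf.update(z)
        estimates.append(kf.pos)
    return estimates, kf


# Hypothetical centre positions in pixels, moving 2 px per frame with two misses.
detections = [0.0, 2.0, 4.0, 6.0, None, None, 12.0, 14.0, 16.0, 18.0]
estimates, kf = track(detections)
```

On frames without a detection the estimate is simply the motion-model prediction, which is what lets a lightweight detector be paired with a tracker to keep a stable output between detections.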

preprint · 2022 · arXiv

KuaiRand: An Unbiased Sequential Recommendation Dataset with Randomly Exposed Videos

Recommender systems deployed in real-world applications can have inherent exposure bias, which leads to biased logged data that plagues researchers. A fundamental way to address this thorny problem is to collect users' interactions on randomly exposed items, i.e., the missing-at-random data. A few works have asked certain users to rate or select randomly recommended items, e.g., Yahoo!, Coat, and OpenBandit. However, these datasets are either too small in size or lack key information, such as unique user IDs or the features of users/items. In this work, we present KuaiRand, an unbiased sequential recommendation dataset containing millions of intervened interactions on randomly exposed videos, collected from the video-sharing mobile app Kuaishou. Different from existing datasets, KuaiRand records 12 kinds of user feedback signals (e.g., click, like, and view time) on randomly exposed videos inserted in the recommendation feeds over two weeks. To facilitate model learning, we further collect rich features of users and items as well as users' behavior history. By releasing this dataset, we enable research on advanced debiasing in large-scale recommendation scenarios for the first time. Also, with its distinctive features, KuaiRand can support various other research directions such as interactive recommendation, long sequential behavior modeling, and multi-task learning. The dataset and its news will be available at https://kuairand.com.

preprint · 2022 · arXiv

LBCF: A Large-Scale Budget-Constrained Causal Forest Algorithm

Offering incentives (e.g., coupons at Amazon, discounts at Uber and video bonuses at Tiktok) to users is a common strategy used by online platforms to increase user engagement and platform revenue. Despite their proven effectiveness, these marketing incentives incur an inevitable cost and might result in a low ROI (Return on Investment) if not used properly. On the other hand, different users respond differently to these incentives: for instance, some users never buy certain products without coupons, while others do anyway. Thus, how to select the right amount of incentives (i.e., treatment) for each user under budget constraints is an important research problem with great practical implications. In this paper, we call this problem the budget-constrained treatment selection (BTS) problem. The challenge is how to efficiently solve the BTS problem on a large-scale dataset and achieve improved results over the existing techniques. We propose a novel tree-based treatment selection technique under budget constraints, called the Large-Scale Budget-Constrained Causal Forest (LBCF) algorithm, which is also an efficient treatment selection algorithm suitable for modern distributed computing systems. A novel offline evaluation method is also proposed to overcome an intrinsic challenge in assessing solutions' performance for the BTS problem on randomized control trial (RCT) data. We deploy our approach in a real-world scenario on a large-scale video platform, where the platform gives away bonuses in order to increase users' campaign engagement duration. The simulation analysis and the offline and online experiments all show that our method outperforms various tree-based state-of-the-art baselines. The proposed approach currently serves hundreds of millions of users on the platform and has delivered one of the largest improvements in recent months.

preprint · 2022 · arXiv

Learning Multi-granularity User Intent Unit for Session-based Recommendation

Session-based recommendation aims to predict a user's next action based on previous actions in the current session. The major challenge is to capture authentic and complete user preferences in the entire session. Recent work utilizes a graph structure to represent the entire session and adopts Graph Neural Networks to encode session information. This modeling choice has proved effective and achieved remarkable results. However, most of the existing studies only consider each item within the session independently and do not capture session semantics from a high-level perspective. Such a limitation often leads to severe information loss and increases the difficulty of capturing long-range dependencies within a session. Intuitively, compared with individual items, a session snippet, i.e., a group of locally consecutive items, is able to provide supplemental user intents which are hardly captured by existing methods. In this work, we propose to learn multi-granularity consecutive user intent units to improve the recommendation performance. Specifically, we propose the Multi-granularity Intent Heterogeneous Session Graph, which captures the interactions between intent units of different granularity and relieves the burden of long-range dependencies. Moreover, we propose the Intent Fusion Ranking (IFR) module to compose the recommendation results from user intents of various granularity. Compared with current methods that only leverage intents from individual items, IFR benefits from user intents of different granularity to generate a more accurate and comprehensive session representation, thus eventually boosting recommendation performance. We conduct extensive experiments on five session-based recommendation datasets and the results demonstrate the effectiveness of our method.

preprint2022arXiv

MobRecon: Mobile-Friendly Hand Mesh Reconstruction from Monocular Image

In this work, we propose a framework for single-view hand mesh reconstruction, which can simultaneously achieve high reconstruction accuracy, fast inference speed, and temporal coherence. Specifically, for 2D encoding, we propose lightweight yet effective stacked structures. Regarding 3D decoding, we provide an efficient graph operator, namely depth-separable spiral convolution. Moreover, we present a novel feature lifting module for bridging the gap between 2D and 3D representations. This module begins with a map-based position regression (MapReg) block to integrate the merits of both heatmap encoding and position regression paradigms for improved 2D accuracy and temporal coherence. Furthermore, MapReg is followed by pose pooling and pose-to-vertex lifting approaches, which transform 2D pose encodings to semantic features of 3D vertices. Overall, our hand reconstruction framework, called MobRecon, has an affordable computational cost and a miniature model size, and reaches a high inference speed of 83 FPS on an Apple A14 CPU. Extensive experiments on popular datasets such as FreiHAND, RHD, and HO3Dv2 demonstrate that MobRecon achieves superior performance on reconstruction accuracy and temporal coherence. Our code is publicly available at https://github.com/SeanChenxy/HandMesh.

preprint2022arXiv

Optimal $L^p$ regularity for $\bar\partial$ on the Hartogs triangle

In this paper, we prove weighted $L^p$ estimates for the canonical solutions on product domains. As an application, we show that if $p\in [4, \infty)$, the $\bar\partial$ equation on the Hartogs triangle with $L^p$ data admits $L^p$ solutions with the desired estimates. For any $ε>0$, by constructing an example with $L^p$ data but having no $L^{p+ε}$ solutions, we verify the sharpness of the $L^p$ regularity on the Hartogs triangle.

preprint2022arXiv

Toward a Human-Centered AI-assisted Colonoscopy System

AI-assisted colonoscopy has received a lot of attention in the last decade. Several randomised clinical trials in the previous two years showed exciting improvements in the detection rate of polyps. However, current commercial AI-assisted colonoscopy systems focus on providing visual assistance for detecting polyps during colonoscopy. There is a lack of understanding of the needs of gastroenterologists and the usability issues of these systems. This paper aims to introduce the recent development and deployment of commercial AI-assisted colonoscopy systems to the HCI community, identify gaps between the expectations of clinicians and the capabilities of the commercial systems, and highlight some unique challenges in Australia.

preprint2022arXiv

Unique continuation for $\bar\partial$ with square-integrable potentials

In this paper, we investigate the unique continuation property for the inequality $|\bar\partial u| \le V|u|$, where $u$ is a vector-valued function from a domain in $\mathbb C^n$ to $\mathbb C^N$, and the potential $V\in L^2$. We show that the strong unique continuation property holds when $n=1$, and the weak unique continuation property holds when $n\ge 2$. In both cases, the $L^2$ integrability condition on the potential is optimal.

preprint2022arXiv

WSSS4LUAD: Grand Challenge on Weakly-supervised Tissue Semantic Segmentation for Lung Adenocarcinoma

Lung cancer is the leading cause of cancer death worldwide, and adenocarcinoma (LUAD) is the most common subtype. Exploiting the potential value of histopathology images can promote precision medicine in oncology. Tissue segmentation is the basic upstream task of histopathology image analysis. Existing deep learning models have achieved superior segmentation performance but require sufficient pixel-level annotations, which are time-consuming and expensive to produce. To enrich the label resources of LUAD and to alleviate the annotation effort, we organize this challenge, WSSS4LUAD, to call for outstanding weakly-supervised semantic segmentation (WSSS) techniques for histopathology images of LUAD. Participants have to design an algorithm to segment tumor epithelial, tumor-associated stroma and normal tissue with only patch-level labels. This challenge includes 10,091 patch-level annotations (the training set) and over 130 million labeled pixels (the validation and test sets), from 87 WSIs (67 from GDPH, 20 from TCGA). All the labels were generated by a pathologist-in-the-loop pipeline with the help of AI models and checked by the label review board. Among 532 registrations, 28 teams submitted results in the test phase with over 1,000 submissions. Finally, the first place team achieved mIoU of 0.8413 (tumor: 0.8389, stroma: 0.7931, normal: 0.8919). According to the technical reports of the top-tier teams, CAM is still the most popular approach in WSSS. Cutmix data augmentation has been widely adopted to generate more reliable samples. With the success of this challenge, we believe that WSSS approaches with patch-level annotations can be a complement to the traditional pixel annotations while reducing the annotation effort. The entire dataset has been released to encourage more research on computational pathology in LUAD and more novel WSSS techniques.

preprint2021arXiv

Active Frequency Measurement on Superradiant Strontium Clock Transitions

We develop a stochastic mean-field theory to describe active frequency measurements of pulsed superradiant emission, studied in recent experiments with strontium-87 atoms trapped in an optical lattice inside an optical cavity [M. Norcia, et al., Phys. Rev. X 8, 21036 (2018)]. Our theory reveals the intriguing dynamics of atomic ensembles with multiple transition frequencies, and it reproduces the superradiant beats signal, noisy power spectra, and frequency uncertainty in remarkable agreement with the experiments. Moreover, by reducing the number of atoms, elongating the superradiant pulses and shortening the experimental duty cycle, we predict a short-term frequency uncertainty $9\times10^{-16} \sqrt{τ/s}$, which makes active frequency measurements with superradiant transitions comparable with the record performance of current frequency standards [M. Schioppo, et al., Nat. Photonics, 11, 48 (2017)]. Our theory combines cavity-quantum electrodynamics and quantum measurement theory, and it can be readily applied to explore conditional quantum dynamics and describe frequency measurements for other processes such as steady-state superradiance and superradiant Raman lasing.

preprint2021arXiv

Cavity Quantum Electrodynamics Effects with Nitrogen Vacancy Center Spins in Diamond and Microwave Resonators at Room Temperature

Cavity quantum electrodynamics (C-QED) effects, such as Rabi splitting, Rabi oscillations and superradiance, have been demonstrated with nitrogen vacancy center spins in diamond in microwave resonators at cryogenic temperature. In this article we explore the possibility to realize strong collective coupling and the resulting C-QED effects with ensembles of spins at room temperature. Thermal excitation of the individual spins by the hot environment leads to population of collective Dicke states with low symmetry and a reduced collective spin-microwave field coupling. However, we show with simulations that the thermal excitation can be compensated by spin-cooling via optical pumping. The resulting population of Dicke states with higher symmetry implies strong coupling with currently available high-quality resonators and enables C-QED effects at room temperature with potential applications in quantum sensing and quantum information processing.

preprint2021arXiv

Hölder estimates for the $\bar\partial$ problem for $(p,q)$ forms on product domains

The purpose of this paper is to study Hölder estimates for the $\bar\partial$ problem for $(p,q)$ forms on products of general planar domains. As indicated by an example of Stein and Kerzman, solutions to the $\bar\partial$ problem on product domains in $\mathbb C^n (n\ge 2)$ do not gain regularity in Hölder spaces. Making use of an integral representation of Nijenhuis and Woolf, we show that given a $\bar\partial$-closed $(p,q)$ form with $C^{k,α}$ components, $0\le p\le n, 1\le q\le n$, $k\in \mathbb Z^+\cup \{0\}, 0<α\le 1$, there is a $C^{k, α'}$ solution to the $\bar\partial$ problem on product domains for any $0<α'<α$ with the desired Hölder estimate.

preprint2021arXiv

Investigating the integrate and fire model as the limit of a random discharge model: a stochastic analysis perspective

In the mean field integrate-and-fire model, the dynamics of a typical neuron within a large network is modeled as a diffusion-jump stochastic process whose jump takes place once the voltage reaches a threshold. In this work, the main goal is to establish the convergence of a regularized process to the original one, where in the regularized process the jump mechanism is replaced by a Poisson dynamic and the jump intensity within the classically forbidden domain goes to infinity as the regularization parameter vanishes. On the macroscopic level, the Fokker-Planck equation for the process with random discharges (i.e., Poisson jumps) is defined on the whole space, while the equation for the limit process is on the half space. However, with the iteration scheme, the difficulty due to the domain differences has been greatly mitigated and the convergence for the stochastic process and the firing rates can be established. Moreover, we find a polynomial-order convergence for the distribution by a re-normalization argument in probability theory. Finally, by numerical experiments, we quantitatively explore the rate and the asymptotic behavior of the convergence for both linear and nonlinear models.

preprint2021arXiv

Modelling Hierarchical Structure between Dialogue Policy and Natural Language Generator with Option Framework for Task-oriented Dialogue System

Designing task-oriented dialogue systems is a challenging research topic, since such a system needs not only to generate utterances fulfilling user requests but also to guarantee their comprehensibility. Many previous works trained end-to-end (E2E) models with supervised learning (SL); however, the bias in annotated system utterances remains a bottleneck. Reinforcement learning (RL) deals with the problem by using non-differentiable evaluation metrics (e.g., the success rate) as rewards. Nonetheless, existing works with RL showed that the comprehensibility of generated system utterances could be corrupted when improving the performance on fulfilling user requests. In our work, we (1) propose modelling the hierarchical structure between dialogue policy and natural language generator (NLG) with the option framework, called HDNO, where the latent dialogue act is applied to avoid designing specific dialogue act representations; (2) train HDNO via hierarchical reinforcement learning (HRL), and suggest asynchronous updates between dialogue policy and NLG during training to theoretically guarantee their convergence to a local maximizer; and (3) propose using a discriminator modelled with language models as an additional reward to further improve the comprehensibility. We test HDNO on MultiWoz 2.0 and MultiWoz 2.1, datasets of multi-domain dialogues, in comparison with a word-level E2E model trained with RL, LaRL and HDSA, showing improvements in performance as measured by automatic evaluation metrics and human evaluation. Finally, we demonstrate the semantic meanings of latent dialogue acts to show the explainability of HDNO.

preprint2021arXiv

Some Rigorous Results on the Phase Transition of Finitary Random Interlacement

In this paper, we show several rigorous results on the phase transition of Finitary Random Interlacement (FRI). For the high intensity regime, we show the existence of a critical fiber length, and give its exact asymptotics as the intensity goes to infinity. At the same time, our result for the low intensity regime proves the global existence of a non-trivial phase transition with respect to the system intensity.

preprint2021arXiv

Structural Controllability of Networked Relative Coupling Systems

This paper studies the controllability of networked relative coupling systems (NRCSs), in which subsystems are of fixed high-order linear dynamics and coupled through relative variables depending on their neighbors, from a structural perspective. The purpose is to explore conditions for subsystem dynamics and network topologies under which for almost all weights of the subsystem interaction links, the corresponding numerical NRCSs are controllable, which is called structurally controllable. Three types of subsystem interaction fashions are considered: 1) each subsystem is single-input-single-output (SISO), 2) each subsystem is multiple-input-multiple-output (MIMO), and the weights for all channels between two subsystems are identical, and 3) each subsystem is MIMO, but different channels between two subsystems can be weighted differently. We show that all parameter-dependent modes of the NRCSs are generically controllable under some necessary connectivity conditions. We then derive necessary and/or sufficient conditions for structural controllability depending on subsystem dynamics and network topologies' connectivity in a decoupled form for all the three interaction fashions. We also extend our results to handle certain subsystem heterogeneities and demonstrate their direct applications on some practical systems, including the mass-spring-damper system and the power network.

preprint2021arXiv

Weighted Sylvester sums on the Frobenius set in more variables

Let $a_1,a_2,\dots,a_k$ be positive integers with $\gcd(a_1,a_2,\dots,a_k)=1$. Let ${\rm NR}={\rm NR}(a_1,a_2,\dots,a_k)$ denote the set of positive integers nonrepresentable in terms of $a_1,a_2,\dots,a_k$. The largest nonrepresentable integer $\max{\rm NR}$, the number of nonrepresentable positive integers $\sum_{n\in{\rm NR}}1$ and the sum of nonrepresentable positive integers $\sum_{n\in{\rm NR}}n$ have been widely studied for a long time as related to the famous Frobenius problem. In this paper by using Eulerian numbers, we give formulas for the weighted sum $\sum_{n\in{\rm NR}}λ^{n}n^μ$, where $μ$ is a nonnegative integer and $λ$ is a complex number. We also examine power sums of nonrepresentable numbers and some formulae for three variables. Several examples illustrate and support our results.
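The quantities named in this abstract can be checked by brute force for small generators. The sketch below is only an illustrative enumeration, not the Eulerian-number formulas of the paper; the function names `nonrepresentable` and `weighted_sylvester_sum` are hypothetical. It lists ${\rm NR}(a_1,\dots,a_k)$ and evaluates $\sum_{n\in{\rm NR}}λ^{n}n^μ$ directly.

```python
from functools import reduce
from math import gcd


def nonrepresentable(gens):
    """Return the sorted list of positive integers that are not
    nonnegative integer combinations of the generators."""
    if reduce(gcd, gens) != 1:
        raise ValueError("generators must have gcd 1, otherwise NR is infinite")
    smallest = min(gens)
    representable = [True]  # index n holds whether n is representable; 0 is
    missing = []
    run = 0  # length of the current run of consecutive representable integers
    n = 0
    # Once `smallest` consecutive integers are representable, every larger
    # integer is too (add multiples of the smallest generator).
    while run < smallest:
        n += 1
        ok = any(n >= a and representable[n - a] for a in gens)
        representable.append(ok)
        if ok:
            run += 1
        else:
            run = 0
            missing.append(n)
    return missing


def weighted_sylvester_sum(gens, lam, mu):
    """Brute-force value of the sum over n in NR of lam**n * n**mu."""
    return sum(lam ** n * n ** mu for n in nonrepresentable(gens))


nr = nonrepresentable([3, 5])
count = weighted_sylvester_sum([3, 5], 1, 0)   # number of elements of NR
total = weighted_sylvester_sum([3, 5], 1, 1)   # sum of elements of NR
```

For the generators 3 and 5 this enumerates NR = {1, 2, 4, 7}, so the largest nonrepresentable integer is 7, the count is 4 and the sum is 14.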

preprint2020arXiv

Cauchy singular integral operator with parameters in Log-Hölder spaces

This paper is motivated by a claim in the classical textbook of Muskhelishvili concerning the Cauchy singular integral operator $S$ on Hölder functions with parameters. Contrary to the claim, a counterexample was constructed by Tumanov which shows that $S$ with parameters fails to maintain the same Hölder regularity with respect to the parameters. In view of the example, the behavior of the Cauchy singular integral operator with parameters between a type of Log-Hölder spaces is investigated to obtain the sharp norm estimates. At the end of the paper, we discuss its application to the $\bar\partial$ problem on product domains.

preprint2020arXiv

Characterization of complementing pairs of $({\mathbb Z}_{\geq 0})^n$

Let $A, B, C$ be subsets of an abelian group $G$. A pair $(A, B)$ is called a $C$-pair if $A, B\subset C$ and $C$ is the direct sum of $A$ and $B$. The $({\mathbb Z}_{\geq 0})$-pairs were characterized by de Bruijn in 1950 and the $({\mathbb Z}_{\geq 0})^2$-pairs were characterized by Niven in 1971. In this paper, we characterize the $({\mathbb Z}_{\geq 0})^n$-pairs for all $n\geq 1$. We show that every $({\mathbb Z}_{\geq 0})^n$-pair is characterized by a weighted tree if it is primitive, that is, it is not a Cartesian product of a $({\mathbb Z}_{\geq 0})^p$-pair and a $({\mathbb Z}_{\geq 0})^q$-pair of lower dimensions.

preprint2020arXiv

FocalMix: Semi-Supervised Learning for 3D Medical Image Detection

Applying artificial intelligence techniques in medical imaging is one of the most promising areas in medicine. However, most of the recent success in this area relies heavily on large amounts of carefully annotated data, whereas annotating medical images is a costly process. In this paper, we propose a novel method, called FocalMix, which, to the best of our knowledge, is the first to leverage recent advances in semi-supervised learning (SSL) for 3D medical image detection. We conducted extensive experiments on two widely used datasets for lung nodule detection, LUNA16 and NLST. Results show that our proposed SSL methods can achieve a substantial improvement of up to 17.3% over state-of-the-art supervised learning approaches with 400 unlabeled CT scans.

preprint2020arXiv

Generic Detectability and Isolability of Topology Failures in Networked Linear Systems

This paper studies the possibility of detecting and isolating topology failures (including link failures and node failures) of a networked system from subsystem measurements, in which subsystems are of fixed high-order linear dynamics, and the exact interaction weights among them are unknown. We prove that in such a class of networked systems with the same network topologies, the detectability and isolability of a given topology failure (set) are generic properties, indicating that it is the network topology that dominates the property of being detectable or isolable for a failure (set). We first give algebraic conditions for detectability and isolability of arbitrary parameter perturbations for a lumped plant, and then derive graph-theoretical necessary and sufficient conditions for generic detectability and isolability of topology failures for the networked systems. On the basis of these results, we consider the problems of deploying the smallest set of sensors for generic detectability and isolability. We reduce the associated sensor placement problems to hitting set problems, which can be effectively solved by greedy algorithms with guaranteed approximation performance.
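The abstract does not say which greedy rule is used, so the following is only a sketch of the textbook greedy heuristic for hitting set: repeatedly pick the element that hits the most not-yet-hit sets. The function name `greedy_hitting_set` and the sensor labels are hypothetical; each input set stands for the candidate sensor locations that would handle one failure.

```python
def greedy_hitting_set(sets):
    """Return a list of elements such that every input set contains at
    least one chosen element (textbook greedy heuristic, not optimal)."""
    remaining = [set(s) for s in sets]
    if any(not s for s in remaining):
        raise ValueError("an empty set cannot be hit")
    chosen = []
    while remaining:
        # Count how many still-unhit sets each element belongs to.
        counts = {}
        for s in remaining:
            for element in s:
                counts[element] = counts.get(element, 0) + 1
        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(counts), key=counts.get)
        chosen.append(best)
        remaining = [s for s in remaining if best not in s]
    return chosen


# Hypothetical example: each set lists the sensors able to reveal one failure.
failures = [{"s1", "s2"}, {"s2", "s3"}, {"s3", "s4"}]
sensors = greedy_hitting_set(failures)
```

On this toy input the heuristic selects two sensors, which happens to be the minimum; in general the greedy rule only guarantees a logarithmic approximation factor.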

preprint2020arXiv

Improving Monocular Depth Estimation by Leveraging Structural Awareness and Complementary Datasets

Monocular depth estimation plays a crucial role in 3D recognition and understanding. One key limitation of existing approaches lies in their lack of structural information exploitation, which leads to inaccurate spatial layout, discontinuous surface, and ambiguous boundaries. In this paper, we tackle this problem in three aspects. First, to exploit the spatial relationship of visual features, we propose a structure-aware neural network with spatial attention blocks. These blocks guide the network attention to global structures or local details across different feature layers. Second, we introduce a global focal relative loss for uniform point pairs to enhance spatial constraint in the prediction, and explicitly increase the penalty on errors in depth-wise discontinuous regions, which helps preserve the sharpness of estimation results. Finally, based on analysis of failure cases for prior methods, we collect a new Hard Case (HC) Depth dataset of challenging scenes, such as special lighting conditions, dynamic objects, and tilted camera angles. The new dataset is leveraged by an informed learning curriculum that mixes training examples incrementally to handle diverse data distributions. Experimental results show that our method outperforms state-of-the-art approaches by a large margin in terms of both prediction accuracy on NYUDv2 dataset and generalization performance on unseen datasets.

preprint2020arXiv

Lipschitz classification of Bedford-McMullen carpets with uniform horizontal fibers

Let ${\cal M}_{t,v,r}(n,m)$, $2\leq m<n$, be the collection of self-affine carpets with expanding matrix $\operatorname{diag}(n,m)$ which are totally disconnected, possessing vacant rows and with uniform horizontal fibers. In this paper, we introduce a notion of structure tree of a metric space, and thanks to this new notion, we completely characterize when two carpets in ${\cal M}_{t,v,r}(n,m)$ are Lipschitz equivalent.

preprint2020arXiv

Mapping Natural Language Instructions to Mobile UI Action Sequences

We present a new problem: grounding natural language instructions to mobile user interface actions, and create three new datasets for it. For full task evaluation, we create PIXELHELP, a corpus that pairs English instructions with actions performed by people on a mobile UI emulator. To scale training, we decouple the language and action data by (a) annotating action phrase spans in HowTo instructions and (b) synthesizing grounded descriptions of actions for mobile user interfaces. We use a Transformer to extract action phrase tuples from long-range natural language instructions. A grounding Transformer then contextually represents UI objects using both their content and screen position and connects them to object descriptions. Given a starting screen and instruction, our model achieves 70.59% accuracy on predicting complete ground-truth action sequences in PIXELHELP.

preprint2020arXiv

Neural Inheritance Relation Guided One-Shot Layer Assignment Search

Layer assignment is seldom picked out as an independent research topic in neural architecture search. In this paper, for the first time, we systematically investigate the impact of different layer assignments on network performance by building an architecture dataset of layer assignment on CIFAR-100. Through analyzing this dataset, we discover a neural inheritance relation among the networks with different layer assignments, that is, the optimal layer assignments for deeper networks always inherit from those for shallow networks. Inspired by this neural inheritance relation, we propose an efficient one-shot layer assignment search approach via inherited sampling. Specifically, the optimal layer assignment searched in the shallow network can be provided as a strong sampling prior to train and search the deeper ones in the supernet, which greatly reduces the network search space. Comprehensive experiments carried out on CIFAR-100 illustrate the efficiency of our proposed method. Our search results are strongly consistent with the optimal ones directly selected from the architecture dataset. To further confirm the generalization of our proposed method, we also conduct experiments on Tiny-ImageNet and ImageNet. Our searched results are remarkably superior to the handcrafted ones under the unchanged computational budgets. The neural inheritance relation discovered in this paper can provide insights to the universal neural architecture search.

preprint2020arXiv

On (non-)monotonicity and phase diagram of finitary random interlacement

In this paper, we study the evolution of a Finitary Random Interlacement (FRI) with respect to the expected length of each fiber. In contrast to the previously proved phase transition between sufficiently large and small fiber length, we show that for $d=3,4$, FRI is NOT stochastically monotone as the fiber length increases. At the same time, numerical evidence still strongly supports the existence of a unique and sharp phase transition on the existence of a unique infinite cluster, while the critical value for the phase transition is estimated to be inversely proportional to the system intensity.

preprint2020arXiv

On Chemical Distance and Local Uniqueness of a Sufficiently Supercritical Finitary Random Interlacement

In this paper, we study geometric properties of the unique infinite cluster $Γ$ in a sufficiently supercritical Finitary Random Interlacements $\mathcal{FI}^{u,T}$ in $\mathbb{Z}^d, \ d\ge 3$. We prove that the chemical distance in $Γ$ is, with stretched exponentially high probability, of the same order as the Euclidean distance in $\mathbb{Z}^d$. This also implies a shape theorem parallel to those for Bernoulli percolation and random interlacements. We also prove local uniqueness of $\mathcal{FI}^{u,T}$, which says any two large clusters in $\mathcal{FI}^{u,T}$ "close to each other" will with stretched exponentially high probability be connected to each other within the same order of the distance between them.

preprint2020arXiv

On some threshold-one attractive interacting particle systems on homogeneous trees

In this paper, we consider the threshold-one contact process and the threshold-one voter model with or without spontaneous death on homogeneous trees $\mathbb{T}_d$, $d\ge 2$. Mainly inspired by the corresponding arguments for ordinary contact processes, we prove that the complete convergence theorem holds for these three systems under strong survival. When the systems survive weakly, complete convergence may also hold under certain transition and/or initial conditions.

preprint2020arXiv

Optomechanical Collective Effects in Surface-Enhanced Raman Scattering from Many Molecules

The interaction between molecules is commonly ignored in surface-enhanced Raman scattering (SERS). Under this assumption, the total SERS signal is described as the sum of the individual contributions of each molecule treated independently. We adopt here an optomechanical description of SERS within a cavity quantum electrodynamics framework to study how collective effects emerge from the quantum correlations of distinct molecules. We derive analytical expressions for identical molecules and implement numerical simulations to analyze two types of collective phenomena: (i) a decrease of the laser intensity threshold to observe strong non-linearities as the number of molecules increases, under intense illumination, and (ii) identification of superradiance in the SERS signal, namely a quadratic scaling with the number of molecules. The laser intensity required to observe the latter in the anti-Stokes scattering is relatively moderate, which makes it particularly accessible to experiments. Our results also show that collective phenomena can survive in the presence of moderate homogeneous and inhomogeneous broadening.

preprint2020arXiv

Structural Controllability of Undirected Diffusive Networks with Vector-Weighted Edges

In this paper, controllability of undirected networked systems with diffusively coupled subsystems is considered, where each subsystem is of identically fixed general high-order single-input-multi-output dynamics. The underlying graph of the network topology is vector-weighted, rather than scalar-weighted. The aim is to find conditions under which the networked system is structurally controllable, i.e., for almost all vector values for interaction links of the network topology, the corresponding system is controllable. It is proven that the networked system is structurally controllable if and only if each subsystem is controllable and observable, and the network topology is globally input-reachable. These conditions are further extended to the cases with multi-input-multi-output subsystems and matrix-weighted edges, or where both directed and undirected interaction links exist.

preprint2020arXiv

Tilings of convex polyhedral cones and topological properties of self-affine tiles

Let $\textbf{a}_1,\dots, \textbf{a}_r$ be vectors in a half-space of $\mathbb{R}^n$. We call $$C=\textbf{a}_1\mathbb{R}^++\cdots+\textbf{a}_r \mathbb{R}^+$$ a convex polyhedral cone, and call $\{\textbf{a}_1,\dots, \textbf{a}_r\}$ a generator set of $C$. A generator set with the minimal cardinality is called a frame. We investigate the translation tilings of convex polyhedral cones. Let $T\subset \mathbb{R}^n$ be a compact set such that $T$ is the closure of its interior, and $\mathcal{J}\subset \mathbb{R}^n$ be a discrete set. We say $(T,\mathcal{J})$ is a translation tiling of $C$ if $T+\mathcal{J}=C$ and any two translations of $T$ in $T+\mathcal{J}$ are disjoint in Lebesgue measure. We show that if the cardinality of a frame of $C$ is larger than $\dim C$, the dimension of $C$, then $C$ does not admit any translation tiling; if the cardinality of a frame of $C$ equals $\dim C$, then the translation tilings of $C$ can be reduced to the translation tilings of $(\mathbb{Z}^+)^n$. As an application, we characterize all the self-affine tiles possessing polyhedral corners, which generalizes a result of Odlyzko [A. M. Odlyzko, \textit{Non-negative digit sets in positional number systems}, Proc. London Math. Soc., \textbf{37}(1978), 213-229.].

preprint2019arXiv

The Surprising Accuracy of Benford's Law in Mathematics

Benford's law is an empirical "law" governing the frequency of leading digits in numerical data sets. Surprisingly, for mathematical sequences the predictions derived from it can be uncannily accurate. For example, among the first billion powers of $2$, exactly $301029995$ begin with digit 1, while the Benford prediction for this count is $10^9\log_{10}2=301029995.66\dots$. Similar "perfect hits" can be observed in other instances, such as the digit $1$ and $2$ counts for the first billion powers of $3$. We prove results that explain many, but not all, of these surprising accuracies, and we relate the observed behavior to classical results in Diophantine approximation as well as recent deep conjectures in this area.
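The comparison described above can be reproduced at a much smaller scale. The sketch below counts leading digits among the first $N$ powers of a base by exact integer arithmetic and compares the count with the Benford prediction $N\log_{10}(1+1/d)$. The function names are hypothetical, and $N=1000$ is chosen here only to keep the run fast; powers are taken to start at exponent 1.

```python
import math


def leading_digit_count(base, digit, n_terms):
    """Count how many of base**1, ..., base**n_terms start with `digit`."""
    target = str(digit)
    count = 0
    power = 1
    for _ in range(n_terms):
        power *= base  # exact big-integer arithmetic, no rounding involved
        if str(power)[0] == target:
            count += 1
    return count


def benford_prediction(digit, n_terms):
    """Expected count of leading digit `digit` among n_terms values."""
    return n_terms * math.log10(1 + 1 / digit)


n_terms = 1000
observed = leading_digit_count(2, 1, n_terms)
predicted = benford_prediction(1, n_terms)
error = abs(observed - predicted)
```

For powers of 2 and digit 1 the observed count is the integer part of the prediction, because a power of 2 begins with 1 exactly when it has more digits than the previous power. That matches the billion-term figures quoted in the abstract.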