Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
38 works
0 followers
28 topics
4 close collaborators

Published work

38 published item(s)

preprint, 2026, arXiv

1-GHz VIS-to-MIR frequency combs enabled by CMOS-compatible nanophotonic waveguides

A fully stabilized frequency comb is essential for precision metrology and coherent optical synthesis. However, fully stabilized frequency combs generally require separate stages for supercontinuum generation (SCG) and self-referencing, largely limiting their compactness. Here, enabled by the low-threshold multi-octave supercontinuum generation and concurrent third-harmonic generation in low-loss silicon nitride waveguides, we present a novel approach to a self-referenced frequency comb source at 1 GHz repetition rate spanning from the full visible (VIS) to the mid-infrared (MIR). Our coherent comb is seeded by an all-polarization-maintaining ultrafast fiber laser at 1556 nm, with a pulse duration of 73 fs at 1 GHz repetition rate. With an injected energy of merely 110 pJ, the pulses propagate through dispersion-engineered Si3N4 waveguides, generating a supercontinuum spanning over three octaves from 350 to 3280 nm, i.e., 0.76 PHz of coherent bandwidth. Moreover, the on-chip third-harmonic generation provides a carrier-envelope offset beat note via f-3f with a signal-to-noise ratio of 43 dB. Fueled by the evolving photonic integration providing possibilities of on-chip filtering and photo-detectors, this approach for single-chip self-referencing of high-repetition-rate frequency combs paves the way for ultrabroadband comb sources with unprecedented compactness and field-readiness.
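
The bandwidth figures quoted in this abstract can be checked with a short calculation. The sketch below converts the stated 350 nm and 3280 nm edges to optical frequency and counts the octaves between them. The function names are illustrative and do not come from the paper; vacuum wavelengths are assumed.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def bandwidth_hz(short_nm: float, long_nm: float) -> float:
    """Frequency span between two vacuum wavelengths given in nanometres."""
    return C / (short_nm * 1e-9) - C / (long_nm * 1e-9)


def octaves(short_nm: float, long_nm: float) -> float:
    """Number of octaves (factors of two) between the two wavelengths."""
    return math.log2(long_nm / short_nm)


if __name__ == "__main__":
    # Edges taken from the abstract: 350 nm to 3280 nm.
    print(f"{bandwidth_hz(350, 3280) / 1e15:.3f} PHz")
    print(f"{octaves(350, 3280):.2f} octaves")
```

Under these assumptions the span comes out at roughly 0.765 PHz and a little over three octaves, consistent with the figures stated in the abstract.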

preprint, 2026, arXiv

Aligning by Misaligning: Boundary-aware Curriculum Learning for Multimodal Alignment

Most multimodal models treat every negative pair alike, ignoring the ambiguous negatives that differ from the positive by only a small detail. We propose Boundary-Aware Curriculum with Local Attention (BACL), a lightweight add-on that turns these borderline cases into a curriculum signal. A Boundary-aware Negative Sampler gradually raises difficulty, while a Contrastive Local Attention loss highlights where the mismatch occurs. The two modules are fully differentiable and work with any off-the-shelf dual encoder. Theory predicts a fast O(1/n) error rate; practice shows up to +32% R@1 over CLIP and new SOTA on four large-scale benchmarks, all without extra labels.

preprint, 2026, arXiv

First Submillimeter Lights from Dome A: Tracing the Carbon Cycle in the Feedback of Massive Stars

The cycling of carbon between its ionized, atomic, and molecular phases shapes the chemical compositions and physical conditions of the interstellar medium (ISM). However, ground-based studies of the full carbon cycle have been limited by atmospheric absorption. Dome A, the most promising site for submillimeter astronomy, has long resisted successful submillimeter astronomical observations. Using the 60 cm Antarctic Terahertz Explorer, we present the first successful CO ($4-3$) and [CI] ($^3P_1 - ^3P_0$) mapping observations of two archetypal triggered massive star-formation regions at Dome A. These data, together with archival [CII], provide the first complete characterization of all three carbon phases in these environments. We find elevated C$^{0}$/CO abundance ratios in high-extinction regions, plausibly driven by deep penetration of intense radiation fields from massive stars into a clumpy ISM. These findings mark a major milestone for submillimeter astronomy at Dome A and offer valuable insights into the impact of massive star feedback on the surrounding ISM.

preprint, 2026, arXiv

GR-Ben: A General Reasoning Benchmark for Evaluating Process Reward Models

Process reward models (PRMs) have exhibited remarkable potential for test-time scaling. Since large language models (LLMs) regularly generate flawed intermediate reasoning steps when tackling a broad spectrum of reasoning and decision-making tasks, PRMs are required to detect process-level errors in real-world scenarios. However, existing benchmarks primarily focus on mathematical reasoning, thereby failing to comprehensively evaluate the error-detection ability of PRMs across diverse reasoning scenarios. To close this gap, we introduce GR-Ben, a process-level benchmark specifically designed for assessing PRMs' performance across two primary reasoning domains (science and logic) and nine subdomains. We conduct extensive experiments on a diverse set of 22 models, encompassing both PRMs and LLMs, and derive two key findings: (1) In domains beyond mathematical reasoning, the error-detection ability of existing PRMs and LLMs is markedly weaker. (2) In general, PRMs are less adept at identifying knowledge-based errors, whereas LLMs exhibit poorer performance in detecting computational errors. We hope GR-Ben can foster future research on PRMs for general domains, thereby enhancing the reasoning capabilities of LLMs.

preprint, 2026, arXiv

Synergy over Discrepancy: A Partition-Based Approach to Multi-Domain LLM Fine-Tuning

Large language models (LLMs) demonstrate impressive generalization abilities, yet adapting them effectively across multiple heterogeneous domains remains challenging due to inter-domain interference. To overcome this challenge, we propose a partition-based multi-stage fine-tuning framework designed to exploit inter-domain synergies while minimizing negative transfer. Our approach strategically partitions domains into subsets (stages) by balancing domain discrepancy, synergy, and model capacity constraints. We theoretically analyze the proposed framework and derive novel generalization bounds that justify our partitioning strategy. Extensive empirical evaluations on various language understanding tasks show that our method consistently outperforms state-of-the-art baselines.

preprint, 2023, arXiv

Building Concise Logical Patterns by Constraining Tsetlin Machine Clause Size

Tsetlin machine (TM) is a logic-based machine learning approach with the crucial advantages of being transparent and hardware-friendly. While TMs match or surpass deep learning accuracy for an increasing number of applications, large clause pools tend to produce clauses with many literals (long clauses). As such, they become less interpretable. Further, longer clauses increase the switching activity of the clause logic in hardware, consuming more power. This paper introduces a novel variant of TM learning - Clause Size Constrained TMs (CSC-TMs) - where one can set a soft constraint on the clause size. As soon as a clause includes more literals than the constraint allows, it starts expelling literals. Accordingly, oversized clauses only appear transiently. To evaluate CSC-TM, we conduct classification, clustering, and regression experiments on tabular data, natural language text, images, and board games. Our results show that CSC-TM maintains accuracy with up to 80 times fewer literals. Indeed, the accuracy increases with shorter clauses for TREC, IMDb, and BBC Sports. After the accuracy peaks, it drops gracefully as the clause size approaches a single literal. We finally analyze CSC-TM power consumption and derive new convergence properties.

preprint, 2023, arXiv

Joint Hybrid Beamforming and User Scheduling for Multi-Satellite Cooperative Networks

In this paper, we consider a cooperative communication network where multiple satellites provide services for ground users (GUs) (at the same time and on the same frequency). The communication and computational resources on satellites are usually restricted and the satellite-GU link determination affects the communication performance significantly when multiple satellites provide services for multiple GUs in a collaborative manner. Therefore, considering the limitation of the on-board radio-frequency chains, we first propose a hybrid beamforming method consisting of analog beamforming for beam alignment and digital beamforming for interference mitigation. Then, to establish appropriate connections between satellites and GUs, we propose a heuristic user scheduling algorithm which determines the connections according to the total spectral efficiency (SE) increment of the multi-satellite cooperative network. Next, a joint hybrid beamforming and user scheduling scheme is proposed to dramatically improve the performance of the multi-satellite cooperative network. Moreover, simulations are conducted to compare the proposed schemes with representative baselines and analyze the key factors influencing the performance of the multi-satellite cooperative network. It is shown that the proposed joint beamforming and user scheduling approach can provide 47.2% SE improvement on average as compared with its non-joint counterpart.

preprint, 2022, arXiv

A Fair and Efficient Hybrid Federated Learning Framework based on XGBoost for Distributed Power Prediction

In a modern power system, real-time data on power generation/consumption and its relevant features are stored in various distributed parties, including household meters, transformer stations and external organizations. To fully exploit the underlying patterns of these distributed data for accurate power prediction, federated learning is needed as a collaborative but privacy-preserving training scheme. However, current federated learning frameworks are polarized towards addressing either the horizontal or vertical separation of data, and tend to overlook the case where both are present. Furthermore, in mainstream horizontal federated learning frameworks, only artificial neural networks are employed to learn the data patterns, which are considered less accurate and interpretable compared to tree-based models on tabular datasets. To this end, we propose a hybrid federated learning framework based on XGBoost, for distributed power prediction from real-time external features. In addition to introducing boosted trees to improve accuracy and interpretability, we combine horizontal and vertical federated learning, to address the scenario where features are scattered in local heterogeneous parties and samples are scattered in various local districts. Moreover, we design a dynamic task allocation scheme such that each party gets a fair share of information, and the computing power of each party can be fully leveraged to boost training efficiency. A follow-up case study is presented to justify the necessity of adopting the proposed framework. The advantages of the proposed framework in fairness, efficiency and accuracy performance are also confirmed.

preprint, 2022, arXiv

CATCH: Chasing All Transients Constellation Hunters Space Mission

In time-domain astronomy, a substantial number of transients will be discovered by multi-wavelength and multi-messenger observatories, posing a great challenge for follow-up capabilities. We have thus proposed an intelligent X-ray constellation, the Chasing All Transients Constellation Hunters (CATCH) space mission. Consisting of 126 micro-satellites in three types, CATCH will have the capability to perform follow-up observations for a large number of different types of transients simultaneously. Each satellite in the constellation will carry lightweight X-ray optics and use a deployable mast to increase the focal length. The combination of different optics and detector systems enables different types of satellites to have multiform observation capabilities, including timing, spectroscopy, imaging, and polarization. Controlled by the intelligent system, different satellites can cooperate to perform uninterrupted monitoring, all-sky follow-up observations, and scanning observations with a flexible field of view (FOV) and multi-dimensional observations. Therefore, CATCH will be a powerful mission to study the dynamic universe. Here, we present the current design of the spacecraft, optics, detector system, constellation configuration and observing modes, as well as the development plan.

preprint, 2022, arXiv

Data-driven identification of the spatio-temporal structure of turbulent flows by streaming Dynamic Mode Decomposition

Streaming Dynamic Mode Decomposition (sDMD) (Hemati et al., Phys. Fluids 26 (2014)) is a low-storage version of Dynamic Mode Decomposition (DMD) (Schmid, J. Fluid Mech. 656 (2010)), a data-driven method to extract spatio-temporal flow patterns. Streaming DMD avoids storing the entire data sequence in memory by approximating the dynamic modes through incremental updates with new available data. In this paper, we use sDMD to identify and extract dominant spatio-temporal structures of different turbulent flows, requiring the analysis of large datasets. First, the efficiency and accuracy of sDMD are compared to the classical DMD, using a publicly available test dataset that consists of velocity field snapshots obtained by direct numerical simulation of a wake flow behind a cylinder. Streaming DMD not only reliably reproduces the most important dynamical features of the flow; our calculations also highlight its advantage in terms of the required computational resources. We subsequently use sDMD to analyse three different turbulent flows that all show some degree of large-scale coherence: rapidly rotating Rayleigh-Bénard convection, horizontal convection and the asymptotic suction boundary layer. Structures of different frequencies and spatial extent can be clearly separated, and the prominent features of the dynamics are captured with just a few dynamic modes. In summary, we demonstrate that sDMD is a powerful tool for the identification of spatio-temporal structures in a wide range of turbulent flows.
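
For readers unfamiliar with DMD, the following is a minimal sketch of the classical (batch) algorithm that the abstract uses as its reference point. It is not the streaming variant and not the authors' code. It stores all snapshots in memory, which is exactly the cost sDMD is designed to avoid. The function names and the toy two-state system are assumptions made for illustration.

```python
import numpy as np


def dmd(snapshots: np.ndarray, rank: int):
    """Classical DMD on a (n_states, n_times) snapshot matrix.

    Returns the DMD eigenvalues and the corresponding modes.
    """
    X, Y = snapshots[:, :-1], snapshots[:, 1:]        # time-shifted pairs
    U, s, Vh = np.linalg.svd(X, full_matrices=False)  # X = U diag(s) Vh
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank, :]    # truncate to `rank`
    YVS = (Y @ Vh.conj().T) / s                       # Y V S^{-1}
    A_tilde = U.conj().T @ YVS                        # reduced linear operator
    eigvals, W = np.linalg.eig(A_tilde)
    modes = YVS @ W                                   # exact DMD modes
    return eigvals, modes


def toy_snapshots(n_steps: int = 30, decay: float = 0.95, theta: float = 0.3):
    """Snapshots of a decaying 2-D rotation (hypothetical test data)."""
    A = decay * np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta), np.cos(theta)]])
    x = np.array([1.0, 0.5])
    cols = [x]
    for _ in range(n_steps - 1):
        x = A @ x
        cols.append(x)
    return np.stack(cols, axis=1)


if __name__ == "__main__":
    eigvals, modes = dmd(toy_snapshots(), rank=2)
    print(np.abs(eigvals), np.angle(eigvals))
```

On the toy data the recovered eigenvalues should have magnitude close to the decay factor and phase close to plus or minus the rotation angle, which is a quick sanity check before applying the method to real flow fields.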

preprint, 2022, arXiv

Data-Driven Robust Control for Discrete Linear Time-Invariant Systems: A Descriptor System Approach

Given the recent surge of interest in data-driven control, this paper proposes a two-step method to study robust data-driven control for a parameter-unknown linear time-invariant (LTI) system that is affected by energy-bounded noise. First, two data experiments are designed and the corresponding data are collected; the investigated system is then equivalently rewritten as a data-based descriptor system with structured parametric uncertainties. Second, combined with model-based control theory for descriptor systems, state feedback controllers are designed for this data-based descriptor system, which stabilize the original LTI system and guarantee the ${H_\infty}$ performance. Finally, a simulation example is provided to illustrate the effectiveness and merits of our method.

preprint, 2022, arXiv

Domain Knowledge-Based Automated Analog Circuit Design with Deep Reinforcement Learning

The design automation of analog circuits is a longstanding challenge in the integrated circuit field. This paper presents a deep reinforcement learning method to expedite the design of analog circuits at the pre-layout stage, where the goal is to find device parameters to fulfill desired circuit specifications. Our approach is inspired by experienced human designers who rely on domain knowledge of analog circuit design (e.g., circuit topology and couplings between circuit specifications) to tackle the problem. Unlike all prior methods, our method originally incorporates such key domain knowledge into policy learning with a graph-based policy network, thereby best modeling the relations between circuit parameters and design targets. Experimental results on exemplary circuits show it achieves human-level design accuracy (~99%) with 1.5x the efficiency of existing best-performing methods. Our method also shows better generalization ability to unseen specifications and optimality in circuit performance optimization. Moreover, it applies to designing diverse analog circuits across different semiconductor technologies, breaking the limitations of prior ad-hoc methods in designing one particular type of analog circuit with conventional semiconductor technology.

preprint, 2022, arXiv

Domain Knowledge-Infused Deep Learning for Automated Analog/Radio-Frequency Circuit Parameter Optimization

The design automation of analog circuits is a longstanding challenge. This paper presents a reinforcement learning method enhanced by graph learning to automate the analog circuit parameter optimization at the pre-layout stage, i.e., finding device parameters to fulfill desired circuit specifications. Unlike all prior methods, our approach is inspired by human experts who rely on domain knowledge of analog circuit design (e.g., circuit topology and couplings between circuit specifications) to tackle the problem. By originally incorporating such key domain knowledge into policy training with a multimodal network, the method best learns the complex relations between circuit parameters and design targets, enabling optimal decisions in the optimization process. Experimental results on exemplary circuits show it achieves human-level design accuracy (99%) with 1.5x the efficiency of existing best-performing methods. Our method also shows better generalization ability to unseen specifications and optimality in circuit performance optimization. Moreover, it applies to designing radio-frequency circuits on emerging semiconductor technologies, breaking the limitations of prior learning methods in designing conventional analog circuits.

preprint, 2022, arXiv

Hercules: Heterogeneity-Aware Inference Serving for At-Scale Personalized Recommendation

Personalized recommendation is an important class of deep-learning applications that powers a large collection of internet services and consumes a considerable amount of datacenter resources. As the scale of production-grade recommendation systems continues to grow, optimizing their serving performance and efficiency in a heterogeneous datacenter is important and can translate into infrastructure capacity saving. In this paper, we propose Hercules, an optimized framework for personalized recommendation inference serving that targets diverse industry-representative models and cloud-scale heterogeneous systems. Hercules performs a two-stage optimization procedure - offline profiling and online serving. The first stage searches the large under-explored task scheduling space with a gradient-based search algorithm achieving up to 9.0x latency-bounded throughput improvement on individual servers; it also identifies the optimal heterogeneous server architecture for each recommendation workload. The second stage performs heterogeneity-aware cluster provisioning to optimize resource mapping and allocation in response to fluctuating diurnal loads. The proposed cluster scheduler in Hercules achieves 47.7% cluster capacity saving and reduces the provisioned power by 23.7% over a state-of-the-art greedy scheduler.

preprint, 2022, arXiv

Lattice Convolutional Networks for Learning Ground States of Quantum Many-Body Systems

Deep learning methods have been shown to be effective in representing ground-state wave functions of quantum many-body systems. Existing methods use convolutional neural networks (CNNs) for square lattices due to their image-like structures. For non-square lattices, existing methods use graph neural networks (GNNs), in which structure information is not precisely captured, thereby requiring additional hand-crafted sublattice encoding. In this work, we propose lattice convolutions in which a set of proposed operations are used to convert non-square lattices into grid-like augmented lattices on which regular convolution can be applied. Based on the proposed lattice convolutions, we design lattice convolutional networks (LCN) that use self-gating and attention mechanisms. Experimental results show that our method achieves performance on par with or better than existing methods on the spin 1/2 $J_1$-$J_2$ Heisenberg model over the square, honeycomb, triangular, and kagome lattices without using hand-crafted encoding.

preprint, 2022, arXiv

Music Influence Modeling Based on Directed Network Model

Studying the history of music may provide a glimpse into the development of human creativity as we examine the evolutionary and revolutionary trends in music and genres. First, a musical influence metric was created to construct a directed network of musical influence. Second, we examined the revolutions and development of musical genres, modeled the similarity, and explored similarities and influences within and between genres. Hierarchical cluster analysis and time series analysis of genres were used to explore the correlation between genres. Finally, Network Analysis, Semantic Analysis, and Random Forest Model are employed to find the revolutionaries. The above work was applied to Country music to sort out and analyze its evolution. In studying the connection between music and the social environment, time series analysis is used to determine the impact of social, political, or technological changes on music.

preprint, 2022, arXiv

Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of Peripherals

Processing-in-memory (PIM) architectures have demonstrated great potential in accelerating numerous deep learning tasks. Particularly, resistive random-access memory (RRAM) devices provide a promising hardware substrate to build PIM accelerators due to their abilities to realize efficient in-situ vector-matrix multiplications (VMMs). However, existing PIM accelerators suffer from frequent and energy-intensive analog-to-digital (A/D) conversions, severely limiting their performance. This paper presents a new PIM architecture to efficiently accelerate deep learning tasks by minimizing the required A/D conversions with analog accumulation and neural approximated peripheral circuits. We first characterize the different dataflows employed by existing PIM accelerators, based on which a new dataflow is proposed to remarkably reduce the required A/D conversions for VMMs by extending shift and add (S+A) operations into the analog domain before the final quantizations. We then leverage a neural approximation method to design both analog accumulation circuits (S+A) and quantization circuits (ADCs) with RRAM crossbar arrays in a highly efficient manner. Finally, we apply them to build an RRAM-based PIM accelerator (i.e., Neural-PIM) upon the proposed analog dataflow and evaluate its system-level performance. Evaluations on different benchmarks demonstrate that Neural-PIM can improve energy efficiency by 5.36x (1.73x) and speed up throughput by 3.43x (1.59x) without losing accuracy, compared to the state-of-the-art RRAM-based PIM accelerators, i.e., ISAAC (CASCADE).

preprint, 2022, arXiv

Robust and fair work allocation

In today's digital world, interaction with online platforms is ubiquitous, and thus content moderation is important for protecting users from content that does not comply with pre-established community guidelines. Having a robust content moderation system throughout every stage of planning is particularly important. We study the short-term planning problem of allocating human content reviewers to different harmful content categories. We use tools from fair division and study the application of competitive equilibrium and leximin allocation rules. Furthermore, we incorporate into the traditional Fisher market setup novel aspects that are of practical importance. The first aspect is the forecasted workload of different content categories. We show how a formulation that is inspired by the celebrated Eisenberg-Gale program allows us to find an allocation that not only satisfies the forecasted workload, but also fairly allocates the remaining reviewing hours among all content categories. The resulting allocation is also robust as the additional allocation provides a guardrail in cases where the actual workload deviates from the predicted workload. The second practical consideration is time-dependent allocation, motivated by the fact that partners need scheduling guidance for the reviewers across days to achieve efficiency. To address the time component, we introduce new extensions of the various fair allocation approaches for the single-time period setting, and we show that many properties extend in essence, albeit with some modifications. Related to the time component, we additionally investigate how to satisfy markets' desire for smooth allocation (e.g., partners for content reviewers prefer an allocation that does not vary much from time to time, to minimize staffing switches). We demonstrate the performance of our proposed approaches through real-world data obtained from Meta.

preprint, 2022, arXiv

Self-Supervised Graph Neural Networks for Improved Electroencephalographic Seizure Analysis

Automated seizure detection and classification from electroencephalography (EEG) can greatly improve seizure diagnosis and treatment. However, several modeling challenges remain unaddressed in prior automated seizure detection and classification studies: (1) representing non-Euclidean data structure in EEGs, (2) accurately classifying rare seizure types, and (3) lacking a quantitative interpretability approach to measure model ability to localize seizures. In this study, we address these challenges by (1) representing the spatiotemporal dependencies in EEGs using a graph neural network (GNN) and proposing two EEG graph structures that capture the electrode geometry or dynamic brain connectivity, (2) proposing a self-supervised pre-training method that predicts preprocessed signals for the next time period to further improve model performance, particularly on rare seizure types, and (3) proposing a quantitative model interpretability approach to assess a model's ability to localize seizures within EEGs. When evaluating our approach on seizure detection and classification on a large public dataset, we find that our GNN with self-supervised pre-training achieves 0.875 Area Under the Receiver Operating Characteristic Curve on seizure detection and 0.749 weighted F1-score on seizure classification, outperforming previous methods for both seizure detection and classification. Moreover, our self-supervised pre-training strategy significantly improves classification of rare seizure types. Furthermore, quantitative interpretability analysis shows that our GNN with self-supervised pre-training precisely localizes 25.4% of focal seizures, a 21.9 point improvement over existing CNNs. Finally, by superimposing the identified seizure locations on both raw EEG signals and EEG graphs, our approach could provide clinicians with an intuitive visualization of localized seizure regions.

preprint, 2022, arXiv

Simultaneous Detection of Optical Flares of the Magnetically Active M Dwarf Wolf 359

We present detections of stellar flares of Wolf 359, an M6.5 dwarf in the solar neighborhood (2.41 pc) known to be prone to flares due to surface magnetic activity. The observations were carried out from 2020 April 23 to 29 with a 1-m and a 0.5-m telescope separated by nearly 300 km in Xinjiang, China. In 27 hr of photometric monitoring, a total of 13 optical flares were detected, each with a total energy of $\gtrsim 5 \times 10^{29}$ erg. The measured event rate of about once every two hours is consistent with those reported previously in radio, X-ray and optical wavelengths for this star. One such flare, detected by both telescopes on 26 April, was an energetic event with a released energy of nearly $10^{33}$ erg. The two-telescope lightcurves of this major event sampled at different cadences and exposure timings enabled us to better estimate the intrinsic flare profile, which reached a peak of up to 1.6 times the stellar quiescent brightness, that otherwise would have been underestimated in the observed flare amplitudes of about $0.4$ and $0.8$, respectively, with single telescopes alone. The compromise between fast sampling so as to resolve a flare profile versus a longer integration time for higher photometric signal-to-noise provides useful guidance in the experimental design of future flare observations.

preprint, 2021, arXiv

Control Reconfiguration of Dynamical Systems for Improved Performance via Reverse- and Forward-engineering

This paper presents a control reconfiguration approach to improve the performance of two classes of dynamical systems. Motivated by recent research on re-engineering cyber-physical systems, we propose a three-step control retrofit procedure. First, we reverse-engineer a dynamical system to dig out an optimization problem it actually solves. Second, we forward-engineer the system by applying a corresponding faster algorithm to solve this optimization problem. Finally, by comparing the original and accelerated dynamics, we obtain the implementation of the redesigned part (the extra dynamics). As a result, the convergence rate/speed or transient behavior of the given system can be improved while the system control structure is maintained. Internet congestion control and distributed proportional-integral (PI) control, as applications in the two different classes of target systems, are used to show the effectiveness of the proposed approach.

preprint, 2021, arXiv

Data-Driven Controllability Analysis and Stabilization for Linear Descriptor Systems

For a parameter-unknown linear descriptor system, this paper proposes data-driven methods to test the system's type and controllability and then to stabilize it. First, a data-based condition is developed to identify whether this unknown system is a descriptor system or is equivalent to a normal system. Furthermore, various controllability concepts are tested by replacing the descriptor system's matrices with data. Finally, a data-based decomposition method is proposed to transform the nominal system into its slow-fast subsystem form, so that a state feedback controller for the slow subsystem can be obtained from persistently exciting input and state sequences. Meanwhile, due to the equivalent stabilizability between the nominal system and its slow subsystem, a state feedback controller which stabilizes the nominal system is also obtained. A simulation example is provided to illustrate the effectiveness of those methods.

preprint, 2021, arXiv

Disentangled Recurrent Wasserstein Autoencoder

Learning disentangled representations leads to interpretable models and facilitates data generation with style transfer, which has been extensively studied on static data such as images in an unsupervised learning framework. However, only a few works have explored unsupervised disentangled sequential representation learning due to challenges of generating sequential data. In this paper, we propose recurrent Wasserstein Autoencoder (R-WAE), a new framework for generative modeling of sequential data. R-WAE disentangles the representation of an input sequence into static and dynamic factors (i.e., time-invariant and time-varying parts). Our theoretical analysis shows that R-WAE minimizes an upper bound of a penalized form of the Wasserstein distance between model distribution and sequential data distribution, and simultaneously maximizes the mutual information between input data and each of the disentangled latent factors. This is superior to (recurrent) VAE, which does not explicitly enforce mutual information maximization between input data and disentangled latent representations. When the number of actions in sequential data is available as weak supervision information, R-WAE is extended to learn a categorical latent representation of actions to improve its disentanglement. Experiments on a variety of datasets show that our models outperform other baselines with the same settings in terms of disentanglement and unconditional video generation both quantitatively and qualitatively.

preprint2021arXiv

Millimeter-sized Dust Grains Appear Surviving the Water-sublimating Temperature in the Inner 10 au of the FU Ori Disk

Previous observations have shown that the $\lesssim$10 au, $\gtrsim$400 K hot inner disk of the archetypal accretion outburst young stellar object, FU Ori, is dominated by viscous heating. To constrain dust properties in this region, we have performed radio observations toward this disk using the Karl G. Jansky Very Large Array (JVLA) in 2020 June-July, September, and November. We also performed complementary optical photometric monitoring observations. We found that the dust thermal emission from the hot inner disk mid-plane of FU Ori has been approximately stationary and the maximum dust grain size is $\gtrsim$1.6 mm in this region. If the hot inner disk of FU Ori, which is inward of the 150-170 K water snowline, is turbulent (e.g., corresponding to a Sunyaev & Shakura viscous $\alpha_{t}\gtrsim$0.1), or if the actual maximum grain size is still larger than the lower limit we presently constrain, then, as suggested by recent analytical calculations and laboratory measurements, water-ice free dust grains may be stickier than water-ice coated dust grains in protoplanetary disks. Additionally, we find that the free-free emission and the Johnson B- and V-band magnitudes of these binary stars brightened during 2016-2020. The optical and radio variability might be related to the dynamically evolving protostellar or disk accretion activities. Our results highlight that hot inner disks of outbursting objects are important laboratories for testing models of dust grain growth. Given the active nature of such systems, to robustly diagnose the maximum dust grain sizes, it is important to carry out coordinated multi-wavelength radio observations.

preprint2021arXiv

On the Convergence of Tsetlin Machines for the XOR Operator

The Tsetlin Machine (TM) is a novel machine learning algorithm with several distinct properties, including transparent inference and learning using hardware-near building blocks. Although numerous papers explore the TM empirically, many of its properties have not yet been analyzed mathematically. In this article, we analyze the convergence of the TM when input is non-linearly related to output by the XOR-operator. Our analysis reveals that the TM, with just two conjunctive clauses, can converge almost surely to reproducing XOR, learning from training data over an infinite time horizon. Furthermore, the analysis shows how the hyper-parameter T guides clause construction so that the clauses capture the distinct sub-patterns in the data. Our analysis of convergence for XOR thus lays the foundation for analyzing other more complex logical expressions. Taken together, these analyses provide new mathematical insights into why TMs have obtained state-of-the-art performance on several pattern recognition problems.
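
To make the "two conjunctive clauses" claim concrete, the following minimal Python sketch shows the target representation only, not the learning procedure analyzed in the paper. The function names and the index-list encoding of a clause are hypothetical, and combining the clauses with a logical OR is a simplification of the TM's thresholded vote sum.

```python
def clause(x, include_pos, include_neg):
    """Conjunctive clause: AND of the chosen literals.

    include_pos lists indices whose plain literal x[i] is included;
    include_neg lists indices whose negated literal (not x[i]) is included.
    """
    return all(x[i] == 1 for i in include_pos) and all(x[i] == 0 for i in include_neg)


def xor_from_two_clauses(x1, x2):
    """Reproduce XOR with two clauses, one per sub-pattern."""
    x = (x1, x2)
    c1 = clause(x, include_pos=[0], include_neg=[1])  # x1 AND NOT x2
    c2 = clause(x, include_pos=[1], include_neg=[0])  # NOT x1 AND x2
    # Simplification (assumption): OR of the clauses stands in for
    # "sum of clause votes > 0" when both clauses vote positively.
    return int(c1 or c2)


truth_table = {(a, b): xor_from_two_clauses(a, b) for a in (0, 1) for b in (0, 1)}
```

Each clause captures one of the two sub-patterns (1, 0) and (0, 1), which is the sense in which the clauses split the XOR pattern between them.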

preprint2020arXiv

Fluctuations of ergodic sums on periodic orbits under specification

We study the fluctuations of ergodic sums using global and local specifications on periodic points. We obtain Lindeberg-type central limit theorems in both situations. As an application, when the system possesses a unique measure of maximal entropy, we show weak convergence of ergodic sums to a mixture of normal distributions. Our results suggest decomposing the variances of ergodic sums according to global and local sources.

preprint2020arXiv

Legal Assignments and fast EADAM with consent via classical theory of stable matchings

Gale and Shapley's stable assignment problem has been extensively studied, applied, and extended. In the context of school choice, mechanisms often aim at finding an assignment that is more favorable to students. We investigate two extensions introduced in this framework -- legal assignments and the EADAM algorithm -- through the lens of classical theory of stable matchings. In any instance, the set ${\cal L}$ of legal assignments is known to contain all stable assignments. We prove that ${\cal L}$ is exactly the set of stable assignments in another instance. Moreover, we show that essentially all optimization problems over ${\cal L}$ can be solved within the same time bound needed for solving them over the set of stable assignments. A key tool for this latter result is an algorithm that finds the student-optimal legal assignment. We then generalize our algorithm to obtain the assignment output of EADAM with any given set of consenting students without sacrificing the running time, hence largely improving in both theory and practice over known algorithms. Lastly, we show that the set ${\cal L}$ can be much larger than the set of stable matchings, connecting legal matchings with certain concepts and open problems in the literature.
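
As background for the classical theory the abstract builds on, here is a minimal Python sketch of Gale and Shapley's student-proposing deferred acceptance. It is not the paper's algorithm for legal assignments or EADAM. The one-seat-per-school setting, the function name, and the toy preference lists are assumptions made for illustration.

```python
from collections import deque


def deferred_acceptance(student_prefs, school_prefs):
    """Student-proposing deferred acceptance with one seat per school (assumption)."""
    # rank[school][student] = position of student in the school's priority list
    rank = {s: {st: i for i, st in enumerate(p)} for s, p in school_prefs.items()}
    next_choice = {st: 0 for st in student_prefs}  # next school each student will try
    held = {}                                      # school -> tentatively held student
    free = deque(student_prefs)
    while free:
        st = free.popleft()
        if next_choice[st] >= len(student_prefs[st]):
            continue  # preference list exhausted: student stays unassigned
        school = student_prefs[st][next_choice[st]]
        next_choice[st] += 1
        current = held.get(school)
        if current is None:
            held[school] = st
        elif rank[school][st] < rank[school][current]:
            held[school] = st      # school trades up; previous student is free again
            free.append(current)
        else:
            free.append(st)        # rejected; student will propose to the next school
    return {st: school for school, st in held.items()}


# Hypothetical toy instance.
students = {"a": ["X", "Y", "Z"], "b": ["X", "Z", "Y"], "c": ["Y", "X", "Z"]}
schools = {"X": ["b", "a", "c"], "Y": ["a", "c", "b"], "Z": ["a", "b", "c"]}
assignment = deferred_acceptance(students, schools)
```

The output is the student-optimal stable assignment of the instance; the paper's contribution concerns the larger set ${\cal L}$ of legal assignments, which this sketch does not compute.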

preprint2020arXiv

Machine Translation System Selection from Bandit Feedback

Adapting machine translation systems in the real world is a difficult problem. In contrast to offline training, users cannot provide the type of fine-grained feedback (such as correct translations) typically used for improving the system. Moreover, different users have different translation needs, and even a single user's needs may change over time. In this work we take a different approach, treating the problem of adaptation as one of selection. Instead of adapting a single system, we train many translation systems using different architectures, datasets, and optimization methods. Using bandit learning techniques on simulated user feedback, we learn a policy to choose which system to use for a particular translation task. We show that our approach can (1) quickly adapt to address domain changes in translation tasks, (2) outperform the single best system in mixed-domain translation tasks, and (3) make effective instance-specific decisions when using contextual bandit strategies.
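
The selection idea can be illustrated with a small Python sketch. The abstract does not say which bandit algorithm is used, so epsilon-greedy is an assumption here, as are the function names, the Bernoulli feedback model, and the per-system quality values. The sketch is non-contextual and does not cover the instance-specific (contextual) setting.

```python
import random


def select_system(estimates, epsilon, rng):
    """Epsilon-greedy choice among translation systems (assumed strategy)."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))                      # explore
    return max(range(len(estimates)), key=lambda i: estimates[i])  # exploit


def run_bandit(quality, rounds=5000, epsilon=0.1, seed=0):
    """Simulate user feedback: system i earns reward 1 with probability quality[i]."""
    rng = random.Random(seed)
    counts = [0] * len(quality)
    estimates = [0.0] * len(quality)
    for _ in range(rounds):
        arm = select_system(estimates, epsilon, rng)
        reward = 1.0 if rng.random() < quality[arm] else 0.0  # simulated feedback
        counts[arm] += 1
        # Incremental update of the running mean reward for the chosen system
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates


# Hypothetical quality levels for three systems.
counts, estimates = run_bandit([0.1, 0.5, 0.9])
best = max(range(len(estimates)), key=lambda i: estimates[i])
```

Only a scalar reward per translation is needed, which matches the abstract's point that users cannot supply fine-grained feedback such as corrected translations.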

preprint2020arXiv

OralCam: Enabling Self-Examination and Awareness of Oral Health Using a Smartphone Camera

Due to a lack of medical resources or oral health awareness, oral diseases are often left unexamined and untreated, affecting a large population worldwide. With the advent of low-cost, sensor-equipped smartphones, mobile apps offer a promising possibility for promoting oral health. However, to the best of our knowledge, no mobile health (mHealth) solutions can directly support a user to self-examine their oral health condition. This paper presents OralCam, the first interactive app that enables end-users' self-examination of five common oral conditions (diseases or early disease signals) by taking smartphone photos of one's oral cavity. OralCam allows a user to annotate additional information (e.g., living habits, pain, and bleeding) to augment the input image, and presents the output hierarchically, probabilistically, and with visual explanations to help a lay user understand examination results. Developed on our in-house dataset that consists of 3,182 oral photos annotated by dental experts, our deep-learning-based framework achieved an average detection sensitivity of 0.787 over five conditions with high localization accuracy. In a week-long in-the-wild user study (N=18), most participants had no trouble using OralCam and interpreting the examination results. Two expert interviews further validate the feasibility of OralCam for promoting users' awareness of oral health.

preprint2020arXiv

Quaternion Product Units for Deep Learning on 3D Rotation Groups

We propose a novel quaternion product unit (QPU) to represent data on 3D rotation groups. The QPU leverages quaternion algebra and the law of the 3D rotation group, representing 3D rotation data as quaternions and merging them via a weighted chain of Hamilton products. We prove that the representations derived by the proposed QPU can be disentangled into "rotation-invariant" features and "rotation-equivariant" features, which supports the rationality and the efficiency of the QPU in theory. We design quaternion neural networks based on our QPUs and make our models compatible with existing deep learning models. Experiments on both synthetic and real-world data show that the proposed QPU is beneficial for learning tasks requiring rotation robustness.
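
The "weighted chain of Hamilton products" can be sketched in plain Python as follows. The Hamilton product is standard quaternion algebra. Applying each weight as a quaternion power is an assumption about how the weighting enters, and the function names are hypothetical rather than the authors' API.

```python
import math


def hamilton(p, q):
    """Hamilton product of quaternions given as (scalar, x, y, z)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def qpow(q, w):
    """Power of a unit quaternion q = (cos t, sin t * u): q**w = (cos wt, sin wt * u)."""
    s, x, y, z = q
    norm_v = math.sqrt(x * x + y * y + z * z)
    if norm_v < 1e-12:
        # Axis undefined (q is +1 or -1); treated as the identity rotation (simplification).
        return (1.0, 0.0, 0.0, 0.0)
    theta = math.atan2(norm_v, s)
    scale = math.sin(w * theta) / norm_v
    return (math.cos(w * theta), x * scale, y * scale, z * scale)


def weighted_chain(quats, weights):
    """Merge unit quaternions left to right, each raised to its weight (assumed form)."""
    out = (1.0, 0.0, 0.0, 0.0)
    for q, w in zip(quats, weights):
        out = hamilton(out, qpow(q, w))
    return out


i, j = (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0)
merged = weighted_chain([i, j], [1.0, 1.0])  # approximately k = (0, 0, 0, 1)
```

Because the Hamilton product is not commutative, the order of the chain matters: reversing the two inputs above yields approximately -k instead of k.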

preprint2020arXiv

Site testing campaign for the Large Optical/infrared Telescope of China: Overview

The Large Optical/infrared Telescope (LOT) is a ground-based 12m diameter optical/infrared telescope which is proposed to be built in the western part of China in the next decade. Based on satellite remote sensing data, along with geographical, logistical and political considerations, three candidate sites were chosen for ground-based astronomical performance monitoring. These sites include: Ali in Tibet, Daocheng in Sichuan, and Muztagh Ata in Xinjiang. Up until now, all three sites have continuously collected data for two years. In this paper, we will introduce this site testing campaign, and present its monitoring results obtained during the period between March 2017 and March 2019.

preprint2020arXiv

Site-testing at Muztagh-ata site I: Ground Meteorology and Sky Brightness

Site-testing is crucial for achieving scientific research goals, and the analysis of meteorological and optical observing conditions is one of its basic tasks. As one of three potential sites to host the 12-meter Large Optical/infrared Telescope (LOT), the Muztagh-ata site, located on the Pamirs Plateau in Xinjiang in western China, began its site-testing task in the spring of 2017. In this paper, we start with an introduction to the site and then present a statistical analysis of ground-level meteorological properties such as air temperature, barometric pressure, relative humidity, and wind speed and direction, recorded over two years by an automatic weather station with standard meteorological sensors. We also show the monitoring results of sky brightness during this period.

preprint2020arXiv

Site-testing at Muztagh-ata site II: Seeing statistics

In this article, we present a detailed analysis of the statistical properties of seeing for the Muztagh-ata site, a candidate site for hosting the future Chinese Large Optical/infrared Telescope (LOT) project. The measurements were obtained with a Differential Image Motion Monitor (DIMM) from April 2017 to November 2018 at different heights during different periods. The median seeing values at 11 meters and 6 meters are very close but differ significantly from that on the ground. We mainly analyzed the seeing at 11 meters by month and by hour, finding that the best season for observing is from late autumn to early winter and that seeing tends to improve during the night only in autumn. We also analyzed the dependence of seeing on temperature inversion, wind speed, and wind direction, and give the best meteorological conditions for seeing.

preprint2020arXiv

Stable intense 1 kHz supercontinuum light generation in air

Supercontinuum (SC) light sources have advanced ultrafast laser spectroscopy in condensed matter science, biology, physics, and chemistry. Compared to the frequently used photonic crystal fibers and bulk materials, femtosecond laser filamentation in gases is damage-immune for supercontinuum generation. A bottleneck is the strong jitter caused by filament-induced self-heating at kHz repetition rates. We demonstrate stable kHz supercontinuum generation directly in air with pulse energies of multiple mJ. This is achieved by applying an external DC electric field to the air plasma filament, through the effects of plasma wave guiding and Coulomb interaction. Both the pointing and intensity jitters of the 1 kHz air-filament-induced SC light are reduced by more than twofold. This offers opportunities for stable, intense SC generation and other laser-filament-based applications in air.

preprint2020arXiv

The Architectural Implications of Facebook's DNN-based Personalized Recommendation

The widespread application of deep learning has changed the landscape of computation in the data center. In particular, personalized recommendation for content ranking is now largely accomplished leveraging deep neural networks. However, despite the importance of these models and the amount of compute cycles they consume, relatively little research attention has been devoted to systems for recommendation. To facilitate research and to advance the understanding of these workloads, this paper presents a set of real-world, production-scale DNNs for personalized recommendation coupled with relevant performance metrics for evaluation. In addition to releasing a set of open-source workloads, we conduct in-depth analysis that underpins future system design and optimization for at-scale recommendation: Inference latency varies by 60% across three Intel server generations, batching and co-location of inferences can drastically improve latency-bounded throughput, and the diverse composition of recommendation models leads to different optimization strategies.

preprint2020arXiv

The Wide-field Photometric System of the Nanshan One-meter Telescope

The Nanshan One-meter Wide-field Telescope (NOWT) is a prime focus system located at Nanshan Station of Xinjiang Astronomical Observatories (XAO). The field of view (FOV) was designed to be 1.5 degrees × 1.5 degrees, and the Johnson-Cousins UBVRI system was chosen as the main filter set. The telescope has been providing observation services for astronomers since Sept. 2013. Variable source searching and time-domain surveys are the main scientific goals. The system's test results are reported, including the linearity, dark current, bias, readout noise, and gain of the CCD camera. Accurate instrumental calibration coefficients in the UBVRI bands were derived with Landolt standard stars during photometric nights. Finally, the limiting magnitudes are given for various signal-to-noise ratios and exposure times for observers.

preprint2019arXiv

Boundary Zonal Flow in Rotating Turbulent Rayleigh-Bénard Convection

For rapidly rotating turbulent Rayleigh-Bénard convection in a slender cylindrical cell, experiments and direct numerical simulations reveal a boundary zonal flow (BZF) that replaces the classical large-scale circulation. The BZF is located near the vertical side wall and enables enhanced heat transport there. Although the azimuthal velocity of the BZF is cyclonic (in the rotating frame), the temperature is an anticyclonic traveling wave of mode one whose signature is a bimodal temperature distribution near the radial boundary. The BZF width is found to scale like $Ra^{1/4}Ek^{2/3}$, where the Ekman number $Ek$ decreases with increasing rotation rate.

preprint2019arXiv

RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing

Personalized recommendation systems leverage deep learning models and account for the majority of data center AI cycles. Their performance is dominated by memory-bound sparse embedding operations with unique irregular memory access patterns that pose a fundamental challenge to accelerate. This paper proposes a lightweight, commodity DRAM compliant, near-memory processing solution to accelerate personalized recommendation inference. The in-depth characterization of production-grade recommendation models shows that embedding operations with high model-, operator- and data-level parallelism lead to memory bandwidth saturation, limiting recommendation inference performance. We propose RecNMP which provides a scalable solution to improve system throughput, supporting a broad range of sparse embedding models. RecNMP is specifically tailored to production environments with heavy co-location of operators on a single server. Several hardware/software co-optimization techniques such as memory-side caching, table-aware packet scheduling, and hot entry profiling are studied, resulting in up to 9.8x memory latency speedup over a highly-optimized baseline. Overall, RecNMP offers 4.2x throughput improvement and 45.8% memory energy savings.