Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
41 works
0 followers
23 topics
4 close collaborators


Published work

41 published item(s)

preprint · 2026 · arXiv

First Submillimeter Lights from Dome A: Tracing the Carbon Cycle in the Feedback of Massive Stars

The cycling of carbon between its ionized, atomic, and molecular phases shapes the chemical compositions and physical conditions of the interstellar medium (ISM). However, ground-based studies of the full carbon cycle have been limited by atmospheric absorption. Dome A, the most promising site for submillimeter astronomy, has long resisted successful submillimeter astronomical observations. Using the 60 cm Antarctic Terahertz Explorer, we present the first successful CO ($4-3$) and [CI] ($^3P_1 - ^3P_0$) mapping observations of two archetypal triggered massive star-formation regions at Dome A. These data, together with archival [CII], provide the first complete characterization of all three carbon phases in these environments. We find elevated C$^{0}$/CO abundance ratios in high-extinction regions, plausibly driven by deep penetration of intense radiation fields from massive stars into a clumpy ISM. These findings mark a major milestone for submillimeter astronomy at Dome A and offer valuable insights into the impact of massive star feedback on the surrounding ISM.

preprint · 2026 · arXiv

Position-Aware Drafting for Inference Acceleration in LLM-Based Generative List-Wise Recommendation

Large language model (LLM)-based generative list-wise recommendation has advanced rapidly, but decoding remains sequential and thus latency-prone. To accelerate inference without changing the target distribution, speculative decoding (SD) uses a small draft model to propose several next tokens at once and a target LLM to verify and accept the longest prefix, skipping multiple steps per round. In generative recommendation, however, each item is represented by multiple semantic-ID tokens, often with separators, and current drafts typically treat these tokens uniformly. This overlooks two practical facts: (i) a token's semantics depend on its within-item slot, and (ii) uncertainty tends to increase with speculation depth. Without modeling these effects, SD's speedups can be limited. We introduce PAD-Rec, Position-Aware Drafting for generative Recommendation, a lightweight module that augments the draft model with two complementary signals. Item position embeddings explicitly encode the within-item slot of each token, strengthening structural awareness. Step position embeddings encode the draft step, allowing the model to adapt to depth-dependent uncertainty and improve proposal quality. To harmonize these signals with base features, we add simple gates: a learnable coefficient for item slots and a context-driven gate for draft steps. The module is trainable, easy to integrate with standard draft models, and adds negligible inference overhead. Extensive experiments on four real-world datasets show up to 3.1x wall-clock speedup and about 5% average wall-clock speedup gain over strong SD baselines, while largely preserving recommendation quality.
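
The draft-then-verify loop described in this abstract can be illustrated with a minimal greedy sketch. This is not PAD-Rec and not any library's API: the names `speculative_step`, `toy_target`, and `toy_draft` are hypothetical, the toy "models" are plain functions over integer tokens, and the target is called once per token for readability, whereas real systems verify all drafted tokens in a single parallel forward pass.

```python
from typing import Callable, List

# A "model" here is just a function mapping a token context to the next token.
NextToken = Callable[[List[int]], int]

def speculative_step(prefix: List[int], draft_next: NextToken,
                     target_next: NextToken, k: int) -> List[int]:
    """One round of greedy speculative decoding; returns the tokens to append."""
    # 1) The draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2) The target model checks the proposal and keeps the longest matching prefix.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target_next(ctx)
        if tok != expected:
            # First mismatch: fall back to the target's own token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # Every drafted token matched, so the target contributes one extra token.
    accepted.append(target_next(ctx))
    return accepted

# Toy models (assumptions for illustration): the target counts upward modulo 10;
# the draft agrees except right after token 3, where it wrongly predicts 0.
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10

def toy_draft(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10

print(speculative_step([0], toy_draft, toy_target, k=4))  # [1, 2, 3, 4]
```

In this toy run the draft's first three tokens are accepted and the fourth is corrected by the target, so one round yields four tokens that match what greedy decoding with the target alone would have produced.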

preprint · 2026 · arXiv

Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders

Recent progress in text-to-image (T2I) diffusion models (DMs) has enabled high-quality visual synthesis from diverse textual prompts. Yet, most existing T2I DMs, even those equipped with large language model (LLM)-based text encoders, remain text-pixel mappers -- they employ LLMs merely as text encoders, without leveraging their inherent reasoning capabilities to infer what should be visually depicted given the textual prompt. To move beyond such literal generation, we propose the think-then-generate (T2G) paradigm, where the LLM-based text encoder is encouraged to reason about and rewrite raw user prompts; the states of the rewritten prompts then serve as diffusion conditioning. To achieve this, we first activate the think-then-rewrite pattern of the LLM encoder with a lightweight supervised fine-tuning process. Subsequently, the LLM encoder and diffusion backbone are co-optimized to ensure faithful reasoning about the context and accurate rendering of the semantics via Dual-GRPO. In particular, the text encoder is reinforced using image-grounded rewards to infer and recall world knowledge, while the diffusion backbone is pushed to produce semantically consistent and visually coherent images. Experiments show substantial improvements in factual consistency, semantic alignment, and visual realism across reasoning-based image generation and editing benchmarks, achieving 0.79 on WISE score, nearly on par with GPT-4. Our results constitute a promising step toward next-generation unified models with reasoning, expression, and demonstration capacities.

preprint · 2025 · arXiv

Antarctic TianMu Staring Observation Project I: Overview and Implementation of the Prototype Telescope

Wide-field rapid sky surveys serve as critical observational methods for time-domain astronomical research. The Antarctic region, with several months of continuous dark nights annually, is an ideal site for time-domain astronomical observations. The Antarctic TianMu Staring Observation Project aims to deploy a fleet of small telescopes, adopting an array observation model to conduct time-domain optical observations in Antarctica, featuring wide-sky coverage, high-cadence sampling, long-period staring, and simultaneous multi-band measurements. Considering the severe challenges optical telescopes face in Antarctica, including extremely low temperatures, unattended operation, and limited power supply and network transmission, we have designed and developed the Antarctic TianMu prototype telescope based on drift-scan charge-coupled device technology. In October 2022, our prototype (with an aperture of 18 cm), named AT-Proto, was transported to Zhongshan Station in Antarctica aboard China's 39th Antarctic Research Expedition. It has since operated stably and reliably in the frigid environment for over two years, demonstrating the significant advantages of this technology in polar astronomical observations. The experimental observation results of AT-Proto provide a solid foundation for the subsequent construction of a time-domain astronomy observation array in Antarctica.

preprint · 2024 · arXiv

Two-Stage Constrained Actor-Critic for Short Video Recommendation

The wide popularity of short videos on social media poses new opportunities and challenges for optimizing recommender systems on video-sharing platforms. Users sequentially interact with the system and provide complex and multi-faceted responses, including watch time and various types of interactions with multiple videos. On the one hand, the platforms aim to optimize the users' cumulative watch time (main goal) in the long term, which can be effectively optimized by Reinforcement Learning. On the other hand, the platforms also need to satisfy the constraint of accommodating the responses of multiple user interactions (auxiliary goals) such as like, follow, and share. In this paper, we formulate the problem of short video recommendation as a Constrained Markov Decision Process (CMDP). We find that traditional constrained reinforcement learning algorithms cannot work well in this setting. We propose a novel two-stage constrained actor-critic method: At stage one, we learn individual policies to optimize each auxiliary signal. At stage two, we learn a policy to (i) optimize the main signal and (ii) stay close to policies learned at the first stage, which effectively guarantees the performance of this main policy on the auxiliaries. Through extensive offline evaluations, we demonstrate the effectiveness of our method over alternatives in both optimizing the main goal and balancing the others. We further show the advantage of our method in live experiments of short video recommendations, where it significantly outperforms other baselines in terms of both watch time and interactions. Our approach has been fully launched in the production system to optimize user experiences on the platform.

preprint · 2023 · arXiv

A SSIM Guided cGAN Architecture For Clinically Driven Generative Image Synthesis of Multiplexed Spatial Proteomics Channels

Here we present a structural similarity index measure (SSIM) guided conditional Generative Adversarial Network (cGAN) that performs image-to-image (i2i) synthesis to generate photo-accurate protein channels in multiplexed spatial proteomics images. This approach can be utilized to accurately generate missing spatial proteomics channels that were not included during experimental data collection either at the bench or the clinic. Experimental spatial proteomic data from the Human BioMolecular Atlas Program (HuBMAP) was used to generate spatial representations of missing proteins through a U-Net based image synthesis pipeline. HuBMAP channels were hierarchically clustered by SSIM as a heuristic to obtain the minimal set needed to recapitulate the underlying biology represented by the spatial landscape of proteins. We subsequently prove that our SSIM based architecture allows for scaling of generative image synthesis to slides with up to 100 channels, which is better than current state-of-the-art algorithms, which are limited to data with 11 channels. We validate these claims by generating a new experimental spatial proteomics data set from human lung adenocarcinoma tissue sections and show that a model trained on HuBMAP can accurately synthesize channels from our new data set. The ability to recapitulate experimental data from sparsely stained multiplexed histological slides containing spatial proteomic data will have tremendous impact on medical diagnostics and drug development, and also raises important questions on the medical ethics of utilizing data produced by generative image synthesis in the clinical setting. The algorithm that we present in this paper will allow researchers and clinicians to save time and costs in proteomics based histological staining while also increasing the amount of data that they can generate through their experiments.

preprint · 2022 · arXiv

Broad Emission and Absorption Line Outflows in the Quasar SDSS J163345.22+512748.4

We present a detailed study of the optical and NIR emission and absorption line spectrum of the quasar SDSS J163345.22+512748.4. We discovered on the newly acquired NIR spectrum a highly meta-stable neutral helium broad absorption line (BAL) HeI* $\lambda$10830 with a width of $\sim$ 2000 km s$^{-1}$ and a blueshift of $\sim$ 7000 km s$^{-1}$ in the velocity space. The BAL system is also significantly detected in MgII and HeI* $\lambda$3889. We estimate a column density of $(5.0 \pm 1.7) \times 10^{14}$ cm$^{-2}$ for the HeI*(2 $^3$S) level, and infer an ionization parameter of $U_{A} = 10^{-1.9\pm 0.2}$ for the BAL outflow assuming that the BAL region is thick enough for a full development of an ionization front. The total column density of the BAL outflow is constrained in the range N$\rm _{H}$ $\sim$ 10$^{21}$-10$^{21.4}$ cm$^{-2}$. We also found that the bulk of both MgII and UV FeII, as well as H$α$ broad emission lines (BELs) are blueshifted with a velocity of $\sim$ 2200 km s$^{-1}$ with respect to the quasar systemic redshift. We constrain that the blueshifted BEL region has a covering factor $C_{f}\approx 16\%$, a density n$\rm _{H}$ $\sim $ 10$^{10.6}$-10$^{11.3}$ cm$^{-3}$, a column density N$\rm _{H}\gtrsim 10^{23}$ cm$^{-2}$, and an ionization parameter $U_{E}\sim 10^{-2.1}-10^{-1.5}$. The outflow gas is located at $\sim$0.1 pc away from the central ionization source, at a scale comparable to the BLR. A toy kinetic model has been proposed to reproduce the profile of MgII BEL well if assuming a partial obscured axisymmetric geometry of the outflow with a radial velocity as observed by the BALs.

preprint · 2022 · arXiv

Constrained Reinforcement Learning for Short Video Recommendation

The wide popularity of short videos on social media poses new opportunities and challenges for optimizing recommender systems on video-sharing platforms. Users provide complex and multi-faceted responses towards recommendations, including watch time and various types of interactions with videos. As a result, established recommendation algorithms that concern a single objective are not adequate to meet this new demand of optimizing comprehensive user experiences. In this paper, we formulate the problem of short video recommendation as a constrained Markov Decision Process (MDP), where platforms want to optimize the main goal of user watch time in the long term, with the constraint of accommodating the auxiliary responses of user interactions such as sharing/downloading videos. To solve the constrained MDP, we propose a two-stage reinforcement learning approach based on the actor-critic framework. At stage one, we learn individual policies to optimize each auxiliary response. At stage two, we learn a policy to (i) optimize the main response and (ii) stay close to policies learned at the first stage, which effectively guarantees the performance of this main policy on the auxiliaries. Through extensive simulations, we demonstrate the effectiveness of our approach over alternatives in both optimizing the main goal and balancing the others. We further show the advantage of our approach in live experiments of short video recommendations, where it significantly outperforms other baselines in terms of watch time and interactions from video views. Our approach has been fully launched in the production system to optimize user experiences on the platform.

preprint · 2022 · arXiv

Contrastive Learning of Features between Images and LiDAR

Images and point clouds provide different information for robots. Finding the correspondences between data from different sensors is crucial for various tasks such as localization, mapping, and navigation. Learning-based descriptors have been developed for single sensors, but there is little work on cross-modal features. This work treats learning cross-modal features as a dense contrastive learning problem. We propose a Tuple-Circle loss function for cross-modality feature learning. Furthermore, to learn good features without losing generality, we developed a variant of the widely used PointNet++ architecture for point clouds and a U-Net CNN architecture for images. Moreover, we conduct experiments on a real-world dataset to show the effectiveness of our loss function and network structure. We show that our models indeed learn information from both images and LiDAR by visualizing the features.

preprint · 2022 · arXiv

Deep Learning based Intelligent Coin-tap Test for Defect Recognition

The coin-tap test is a convenient and primary method for non-destructive testing, but its manual on-site operation is tough and costly. With the help of the latest intelligent signal processing method, convolutional neural networks (CNN), we achieve an intelligent coin-tap test which exhibited superior performance in recognizing the defects. However, this success of CNNs relies on plenty of well-labeled data from the identical scenario, which could be difficult to get for many real industrial practices. This paper further develops transfer learning strategies for this issue, that is, to transfer the model trained on data of one scenario to another. In experiments, the result presents a notable improvement by using domain adaptation and pseudo label learning strategies. Hence, it becomes possible to apply the model to scenarios with little (less than 10%) or no labeled data by adopting the transfer learning strategies proposed herein. In addition, we used a benchmark dataset that we constructed ourselves throughout this study. This benchmark dataset for the coin-tap test, containing around 100,000 sound signals, is published at https://github.com/PPhub-hy/torch-tapnet.

preprint · 2022 · arXiv

Double-Barreled Question Detection at Momentive

Momentive offers solutions in market research, customer experience, and enterprise feedback. The technology is gleaned from the billions of real responses to questions asked on the platform. However, people may create biased questions. A double-barreled question (DBQ) is a common type of biased question that asks about two aspects in one question. For example, "Do you agree with the statement: The food is yummy, and the service is great." This DBQ confuses survey respondents because there are two parts in one question. DBQs impact both the survey respondents and the survey owners. Momentive aims to detect DBQs and recommend that survey creators make changes in order to gather high-quality unbiased survey data. Previous research has suggested detecting DBQs by checking for the existence of grammatical conjunctions. While this is a simple rule-based approach, this method is error-prone because conjunctions can also exist in properly constructed questions. We present an end-to-end machine learning approach for DBQ classification in this work. We handled the imbalanced data using active learning, and compared state-of-the-art embedding algorithms to transform text data into vectors. Furthermore, we proposed a model interpretation technique propagating the vector-level SHAP values to a SHAP value for each word in the questions. We concluded that the word2vec subword embedding with maximum pooling is the optimal word embedding representation in terms of precision and running time in the offline experiments using the survey data at Momentive. The A/B test and production metrics indicate that this model brings a positive change to the business. To the best of our knowledge, this is the first machine learning framework for DBQ detection, and it successfully differentiates Momentive from the competitors. We hope our work sheds light on machine learning approaches for biased question detection.
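
The weakness of the conjunction rule mentioned in this abstract is easy to see in a small sketch. This is only an illustration of the rule-based baseline, not Momentive's system: the function name `has_conjunction`, the two-word conjunction list, and the second example question are assumptions made here.

```python
import re

# Hypothetical conjunction list for illustration; a real rule set may differ.
CONJUNCTION_PATTERN = re.compile(r"\b(and|or)\b", re.IGNORECASE)

def has_conjunction(question: str) -> bool:
    """Return True if the question contains a coordinating conjunction."""
    return CONJUNCTION_PATTERN.search(question) is not None

# The double-barreled example quoted in the abstract.
dbq = "Do you agree with the statement: The food is yummy, and the service is great."
# An invented single-topic question that happens to contain "and".
single_topic = "How would you rate our research and development team?"

print(has_conjunction(dbq))           # flagged, and it really asks two things
print(has_conjunction(single_topic))  # also flagged, although it asks one thing
```

Both questions are flagged even though only the first is double-barreled, which is the kind of false positive that motivates a learned classifier.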

preprint · 2022 · arXiv

Estimation of Solar Observations with the Five-hundred-meter Aperture Spherical Radio Telescope (FAST)

We present an estimation of solar observations with the Five-hundred-meter Aperture Spherical radio Telescope (FAST). For both the quiet Sun and the Sun with radio bursts, when pointing directly at the Sun, the total power received by FAST would be outside the safe operational range of the signal chain, and could even damage the receiver. In conclusion, the Sun should be kept at least $\sim 2^{\circ}$ away from the main beam when observing at $\sim 1.25 {\ \rm GHz}$. The separation for lower frequencies should be larger. For simplicity, the angular separation between the FAST beam and the Sun is suggested to be $\sim 5^{\circ}$ for observations at 200 MHz or higher bands.

preprint · 2022 · arXiv

Feature-aware Diversified Re-ranking with Disentangled Representations for Relevant Recommendation

Relevant recommendation is a special recommendation scenario which provides relevant items when users express interest in one target item (e.g., click, like and purchase). Besides considering the relevance between recommendations and the trigger item, the recommendations should also be diversified to avoid information cocoons. However, existing diversified recommendation methods mainly focus on item-level diversity, which is insufficient when the recommended items are all relevant to the target item. Moreover, redundant or noisy item features might affect the performance of simple feature-aware recommendation approaches. Faced with these issues, we propose a Feature Disentanglement Self-Balancing Re-ranking framework (FDSB) to capture feature-aware diversity. The framework consists of two major modules, namely disentangled attention encoder (DAE) and self-balanced multi-aspect ranker. In DAE, we use multi-head attention to learn disentangled aspects from rich item features. In the ranker, we develop an aspect-specific ranking mechanism that is able to adaptively balance the relevance and diversity for each aspect. In experiments, we conduct offline evaluation on the collected dataset and deploy FDSB on the KuaiShou app for an online A/B test on the function of relevant recommendation. The significant improvements in both recommendation quality and user experience verify the effectiveness of our approach.

preprint · 2022 · arXiv

GALAXY: A Generative Pre-trained Model for Task-Oriented Dialog with Semi-Supervised Learning and Explicit Policy Injection

Pre-trained models have proved to be powerful in enhancing task-oriented dialog systems. However, current pre-training methods mainly focus on enhancing dialog understanding and generation tasks while neglecting the exploitation of dialog policy. In this paper, we propose GALAXY, a novel pre-trained dialog model that explicitly learns dialog policy from limited labeled dialogs and large-scale unlabeled dialog corpora via semi-supervised learning. Specifically, we introduce a dialog act prediction task for policy optimization during pre-training and employ a consistency regularization term to refine the learned representation with the help of unlabeled dialogs. We also implement a gating mechanism to weigh suitable unlabeled dialog samples. Empirical results show that GALAXY substantially improves the performance of task-oriented dialog systems, and achieves new state-of-the-art results on benchmark datasets: In-Car, MultiWOZ2.0 and MultiWOZ2.1, improving their end-to-end combined scores by 2.5, 5.3 and 5.5 points, respectively. We also show that GALAXY has a stronger few-shot ability than existing models under various low-resource settings.

preprint · 2022 · arXiv

KuaiRand: An Unbiased Sequential Recommendation Dataset with Randomly Exposed Videos

Recommender systems deployed in real-world applications can have inherent exposure bias, which leads to biased logged data that plagues researchers. A fundamental way to address this thorny problem is to collect users' interactions on randomly exposed items, i.e., the missing-at-random data. A few works have asked certain users to rate or select randomly recommended items, e.g., Yahoo!, Coat, and OpenBandit. However, these datasets are either too small in size or lack key information, such as unique user ID or the features of users/items. In this work, we present KuaiRand, an unbiased sequential recommendation dataset containing millions of intervened interactions on randomly exposed videos, collected from the video-sharing mobile App, Kuaishou. Different from existing datasets, KuaiRand records 12 kinds of user feedback signals (e.g., click, like, and view time) on randomly exposed videos inserted in the recommendation feeds over two weeks. To facilitate model learning, we further collect rich features of users and items as well as users' behavior history. By releasing this dataset, we enable research on advanced debiasing in large-scale recommendation scenarios for the first time. Also, with its distinctive features, KuaiRand can support various other research directions such as interactive recommendation, long sequential behavior modeling, and multi-task learning. The dataset and its news will be available at https://kuairand.com.

preprint · 2022 · arXiv

KuaiRec: A Fully-observed Dataset and Insights for Evaluating Recommender Systems

The progress of recommender systems is hampered mainly by evaluation as it requires real-time interactions between humans and systems, which is too laborious and expensive. This issue is usually approached by utilizing the interaction history to conduct offline evaluation. However, existing datasets of user-item interactions are partially observed, leaving it unclear how and to what extent the missing interactions will influence the evaluation. To answer this question, we collect a fully-observed dataset from Kuaishou's online environment, where almost all 1,411 users have been exposed to all 3,327 items. To the best of our knowledge, this is the first real-world fully-observed data with millions of user-item interactions. With this unique dataset, we conduct a preliminary analysis of how the two factors - data density and exposure bias - affect the evaluation results of multi-round conversational recommendation. Our main discoveries are that the performance ranking of different methods varies with the two factors, and this effect can only be alleviated in certain cases by estimating missing interactions for user simulation. This demonstrates the necessity of the fully-observed dataset. We release the dataset and the pipeline implementation for evaluation at https://kuairec.com

preprint · 2022 · arXiv

LBCF: A Large-Scale Budget-Constrained Causal Forest Algorithm

Offering incentives (e.g., coupons at Amazon, discounts at Uber and video bonuses at Tiktok) to users is a common strategy used by online platforms to increase user engagement and platform revenue. Despite their proven effectiveness, these marketing incentives incur an inevitable cost and might result in a low ROI (Return on Investment) if not used properly. On the other hand, different users respond differently to these incentives; for instance, some users never buy certain products without coupons, while others do anyway. Thus, how to select the right amount of incentives (i.e. treatment) for each user under budget constraints is an important research problem with great practical implications. In this paper, we call this the budget-constrained treatment selection (BTS) problem. The challenge is how to efficiently solve the BTS problem on a large-scale dataset and achieve improved results over the existing techniques. We propose a novel tree-based treatment selection technique under budget constraints, called Large-Scale Budget-Constrained Causal Forest (LBCF) algorithm, which is also an efficient treatment selection algorithm suitable for modern distributed computing systems. A novel offline evaluation method is also proposed to overcome an intrinsic challenge in assessing solutions' performance for the BTS problem in randomized control trial (RCT) data. We deploy our approach in a real-world scenario on a large-scale video platform, where the platform gives away bonuses in order to increase users' campaign engagement duration. The simulation analysis and the offline and online experiments all show that our method outperforms various tree-based state-of-the-art baselines. The proposed approach is currently serving hundreds of millions of users on the platform and has achieved one of the largest improvements in recent months.

preprint · 2022 · arXiv

Radio pulsations from a neutron star within the gamma-ray binary LS I +61$^{\circ}$ 303

LS I +61$^{\circ}$ 303 is one of the rare gamma-ray binaries, emitting most of their luminosity in photons with energies beyond 100 MeV. The $\sim$26.5 d orbital period is clearly detected at many wavelengths. Additional aspects of its multi-frequency behavior make it the most interesting example of the class. The morphology of high-resolution radio images changes with orbital phase displaying a cometary tail pointing away from the high-mass star. LS I +61$^{\circ}$ 303 also shows superorbital variability. A couple of energetic ($\sim 10^{37}$ erg s$^{-1}$), short, magnetar-like bursts have been plausibly ascribed to it. LS I +61$^{\circ}$ 303's phenomenology has been put under theoretical scrutiny for decades, but the lack of certainty regarding the nature of the compact object in the binary has prevented advancing our understanding of the source. Here, using observations done with the Five-hundred-meter Aperture Spherical radio Telescope (FAST), we report on the existence of transient radio pulsations from the direction of LS I +61$^{\circ}$ 303. We find a period $P=269.15508 \pm 0.00016$ ms at a significance of $> 20σ$. This is the first evidence for pulsations from this source at any frequency, and strongly argues for the existence of a rotating neutron star in LS I +61$^{\circ}$ 303.

preprint · 2022 · arXiv

RELLIS-3D Dataset: Data, Benchmarks and Analysis

Semantic scene understanding is crucial for robust and safe autonomous navigation, particularly so in off-road environments. Recent deep learning advances for 3D semantic segmentation rely heavily on large sets of training data; however, existing autonomy datasets either represent urban environments or lack multimodal off-road data. We fill this gap with RELLIS-3D, a multimodal dataset collected in an off-road environment, which contains annotations for 13,556 LiDAR scans and 6,235 images. The data was collected on the Rellis Campus of Texas A&M University and presents challenges to existing algorithms related to class imbalance and environmental topography. Additionally, we evaluate the current state-of-the-art deep learning semantic segmentation models on this dataset. Experimental results show that RELLIS-3D presents challenges for algorithms designed for segmentation in urban environments. This novel dataset provides the resources needed by researchers to continue to develop more advanced algorithms and investigate new research directions to enhance autonomous navigation in off-road environments. RELLIS-3D is available at https://github.com/unmannedlab/RELLIS-3D

preprint · 2022 · arXiv

SFE-AI at SemEval-2022 Task 11: Low-Resource Named Entity Recognition using Large Pre-trained Language Models

Large-scale pre-trained models have been widely used in named entity recognition (NER) tasks. However, model ensembling through parameter averaging or voting cannot fully exploit the complementary strengths of different models, especially in the open domain. This paper describes our NER system in SemEval-2022 Task 11: MultiCoNER. We proposed an effective system to adaptively ensemble pre-trained language models by a Transformer layer. By assigning different weights to each model for different inputs, we adopted the Transformer layer to integrate the advantages of diverse models effectively. Experimental results show that our method achieves superior performance in Farsi and Dutch.

preprint · 2022 · arXiv

The 2175 Å Bump Features in FeLoBAL Quasars: One Indicator of MW-like Dust in the Nuclear Region of Quasar

To investigate the properties of dust in the nuclear region of quasars, we explored the extinction curves of the iron low-ionization broad absorption line (FeLoBAL) quasar SDSS J163004.29+311957.6 and its two analogues. The parameterized extinction curves indicated the Milky Way-like 2175 Å bump features in underlying extinction, which are similar to those seen in the Local Group and a subset of high-redshift star-forming galaxies. Compared to the bump features in the Large Magellanic Cloud (LMC), the detections in this work are much closer to those in the Milky Way (MW). These bump features, as well as those in the high- and low-ionization broad absorption line (BAL) quasars of Zhang et al., are probably the counterpart of the 2175 Å bump features in the quasar environment. This type of dust grain is generally small, easily disrupted by high-energy photons and has difficulty surviving in the radiation field of the active galactic nucleus (AGN). However, due to the presence of absorption-line outflows, the 2175 Å bump feature in quasars, which should be rare, is seen many times in BAL quasars. The shielding effect of outflow clouds allows the MW-like dust grains to be assembled or extends their survival period in the quasar nuclear region. The process and the physical and chemical conditions deserve further observational study and investigation.

preprint2022arXiv

The Neural-Prediction based Acceleration Algorithm of Column Generation for Graph-Based Set Covering Problems

The set covering problem is an important class of combinatorial optimization problems that has been widely applied and studied in many fields. In this paper, we propose an improved column generation algorithm with neural prediction (CG-P) for solving graph-based set covering problems. We use a graph neural network based prediction model to predict, for each edge, the probability of being included in the final solution. Our CG-P algorithm constructs a reduced graph that contains only the edges with higher predicted probability, and this graph reduction significantly speeds up the solution process. We evaluate the CG-P algorithm on railway crew scheduling problems, where it outperforms the baseline column generation algorithm. We provide two solution modes for our CG-P algorithm. In the optimal mode, we obtain a solution with an optimality guarantee while reducing the time cost to 63.12%. In the fast mode, we obtain a sub-optimal solution with a 7.62% optimality gap in only 2.91% of the computation time.

preprint2021arXiv

Adaptive Periodic Averaging: A Practical Approach to Reducing Communication in Distributed Learning

Stochastic Gradient Descent (SGD) is the key learning algorithm for many machine learning tasks. Because of its computational costs, there is a growing interest in accelerating SGD on HPC resources like GPU clusters. However, the performance of parallel SGD is still bottlenecked by the high communication costs even with a fast connection among the machines. A simple approach to alleviating this problem, used in many existing efforts, is to perform communication every few iterations, using a constant averaging period. In this paper, we show that the optimal averaging period in terms of convergence and communication cost is not a constant, but instead varies over the course of the execution. Specifically, we observe that reducing the variance of model parameters among the computing nodes is critical to the convergence of periodic parameter averaging SGD. Given a fixed communication budget, we show that it is more beneficial to synchronize more frequently in early iterations to reduce the initial large variance and synchronize less frequently in the later phase of the training process. We propose a practical algorithm, named ADaptive Periodic parameter averaging SGD (ADPSGD), to achieve a smaller overall variance of model parameters, and thus better convergence compared with the Constant Periodic parameter averaging SGD (CPSGD). We evaluate our method with several image classification benchmarks and show that our ADPSGD indeed achieves smaller training losses and higher test accuracies with less communication compared with CPSGD. Compared with gradient-quantization SGD, we show that our algorithm achieves faster convergence with only half of the communication. Compared with full-communication SGD, our ADPSGD achieves 1.14x to 1.27x speedups with a 100 Gbps connection among computing nodes, and the speedups increase to 1.46x to 1.95x with a 10 Gbps connection.

preprint2021arXiv

Communication-Efficient Sampling for Distributed Training of Graph Convolutional Networks

Training Graph Convolutional Networks (GCNs) is expensive as it needs to aggregate data recursively from neighboring nodes. To reduce the computation overhead, previous works have proposed various neighbor sampling methods that estimate the aggregation result based on a small number of sampled neighbors. Although these methods have successfully accelerated the training, they mainly focus on the single-machine setting. As real-world graphs are large, training GCNs in distributed systems is desirable. However, we found that the existing neighbor sampling methods do not work well in a distributed setting. Specifically, a naive implementation may incur a huge amount of communication of feature vectors among different machines. To address this problem, we propose a communication-efficient neighbor sampling method in this work. Our main idea is to assign higher sampling probabilities to the local nodes so that remote nodes are accessed less frequently. We present an algorithm that determines the local sampling probabilities and ensures that our skewed neighbor sampling does not significantly affect the convergence of the training. Our experiments with node classification benchmarks show that our method significantly reduces the communication overhead for distributed GCN training with little accuracy loss.

preprint2021arXiv

Efficient Mining of Frequent Subgraphs with Two-Vertex Exploration

Frequent Subgraph Mining (FSM) is the key task in many graph mining and machine learning applications. Numerous systems have been proposed for FSM in the past decade. Although these systems show good performance for small patterns (with no more than four vertices), we found that they have difficulty in mining larger patterns. In this work, we propose a novel two-vertex exploration strategy to accelerate the mining process. Compared with the single-vertex exploration adopted by previous systems, our two-vertex exploration avoids the large memory consumption issue and significantly reduces the memory access overhead. We further enhance the performance through an index-based quick pattern technique that reduces the overhead of isomorphism checks, and a subgraph sampling technique that mitigates the issue of subgraph explosion. The experimental results show that our system achieves significant speedups against the state-of-the-art graph pattern mining systems and supports larger pattern mining tasks that none of the existing systems can handle.

preprint2021arXiv

Graph Attention Collaborative Similarity Embedding for Recommender System

We present Graph Attention Collaborative Similarity Embedding (GACSE), a new recommendation framework that exploits collaborative information in the user-item bipartite graph for representation learning. Our framework consists of two parts: the first learns explicit graph collaborative filtering information, such as user-item associations, through embedding propagation with an attention mechanism, and the second learns implicit graph collaborative information, such as user-user and item-item similarities, through an auxiliary loss. We design a new loss function that combines BPR loss with an adaptive margin and a similarity loss for similarity learning. Extensive experiments on three benchmarks show that our model is consistently better than the latest state-of-the-art models.

preprint2021arXiv

NGAT4Rec: Neighbor-Aware Graph Attention Network For Recommendation

Learning informative representations (a.k.a. embeddings) of users and items is the core of modern recommender systems. Previous works exploit user-item relationships of one-hop neighbors in the user-item interaction graph to improve the quality of representation. Recently, research on Graph Neural Networks (GNNs) for recommendation has considered the implicit collaborative information of multi-hop neighbors to enrich the representation. However, most GNN-based recommendation work does not explicitly consider the relational information that implies the expression differences of different neighbors in the neighborhood. The influence of each neighboring item on the representation of the user's preference can be represented by the correlation between the item and the other neighboring items of the user. Symmetrically, for a given item, the correlation between one neighboring user and the other neighboring users can reflect the strength of the signal about the item's characteristics. To model the implicit correlations of neighbors in graph embedding aggregation, we propose a Neighbor-Aware Graph Attention Network for recommendation, termed NGAT4Rec. It employs a novel neighbor-aware graph attention layer that assigns different neighbor-aware attention coefficients to different neighbors of a given node by computing the pairwise attention among these neighbors. NGAT4Rec then aggregates the embeddings of neighbors according to the corresponding neighbor-aware attention coefficients to generate the next-layer embedding for every node. Furthermore, we stack multiple neighbor-aware graph attention layers to gather the influential signals from multi-hop neighbors. We remove feature transformation and nonlinear activation, which have proved to be useless for collaborative filtering. Extensive experiments on three benchmark datasets show that our model consistently outperforms various state-of-the-art models.

preprint2021arXiv

Record high $T_{\rm c}$ and robust superconductivity in transition metal $δ$-Ti phase at megabar pressure

We report a record high superconducting transition temperature ($T_{\rm c}$) up to 23.6 K under high pressure in the elemental metal Ti, one of the top ten most abundant elements in Earth's crust. The $T_{\rm c}$ increases monotonically from 2.3 K at 40.3 GPa to 23.6 K at 144.9 GPa, which surpasses all known records from elemental metals reported so far. With further compression, a robust $T_{\rm c}$ of ~23 K is observed between 144.9 and 183 GPa in the $δ$-Ti phase. The pressure-dependent $T_{\rm c}$ can be well described by the conventional electron-phonon coupling (EPC) mechanism. Density Functional Theory calculations show the Fermi nesting and the phonon softening of optical branches at the $γ$-Ti to $δ$-Ti phase transition pressure enhance EPC, which results in the record high $T_{\rm c}$. We attribute the robust superconductivity in $δ$-Ti to the apparent robustness of its strong EPC against lattice compression. These results provide new insight into exploring new high-$T_{\rm c}$ elemental metals and Ti-based superconducting alloys.

preprint2021arXiv

Scribble-Supervised Semantic Segmentation by Uncertainty Reduction on Neural Representation and Self-Supervision on Neural Eigenspace

Scribble-supervised semantic segmentation has gained much attention recently for its promising performance without high-quality annotations. Due to the lack of supervision, confident and consistent predictions are usually hard to obtain. Typically, these problems are handled by either adopting an auxiliary task with a well-labeled dataset or incorporating a graphical model with additional requirements on scribble annotations. Instead, this work aims to achieve semantic segmentation from scribble annotations directly, without extra information or other limitations. Specifically, we propose holistic operations, including entropy minimization and a network-embedded random walk on the neural representation, to reduce uncertainty. Given the probabilistic transition matrix of a random walk, we further train the network with self-supervision on its neural eigenspace to impose consistency on predictions between related images. Comprehensive experiments and ablation studies verify the proposed approach, which demonstrates superiority over others; it is even comparable to some fully supervised methods and works well when scribbles are randomly shrunk or dropped.

preprint2020arXiv

A Fast Radio Burst discovered in FAST drift scan survey

We report the discovery of a highly dispersed fast radio burst, FRB 181123, from an analysis of $\sim$1500 hr of drift-scan survey data taken using the Five-hundred-meter Aperture Spherical radio Telescope (FAST). The pulse has three distinct emission components, which vary with frequency across our 1.0–1.5 GHz observing band. We measure the peak flux density to be $>0.065$ Jy and the corresponding fluence $>0.2$ Jy ms. Based on the observed dispersion measure of 1812 cm$^{-3}$ pc, we infer a redshift of $\sim 1.9$. From this, we estimate the peak luminosity and isotropic energy to be $\lesssim 2\times10^{43}$ erg s$^{-1}$ and $\lesssim 2\times10^{40}$ erg, respectively. With only one FRB from the survey detected so far, our constraints on the event rate are limited. We derive a 95% confidence lower limit for the event rate of 900 FRBs per day for FRBs with fluences $>0.025$ Jy ms. We performed follow-up observations of the source with FAST for four hours and have not found a repeated burst. We discuss the implications of this discovery for our understanding of the physical mechanisms of FRBs.

preprint2020arXiv

Bi-Directional Attention for Joint Instance and Semantic Segmentation in Point Clouds

Instance segmentation in point clouds is one of the most fine-grained ways to understand a 3D scene. Due to its close relationship to semantic segmentation, many works approach these two tasks simultaneously and leverage the benefits of multi-task learning. However, most of them consider only simple strategies such as element-wise feature fusion, which may not lead to mutual promotion. In this work, we build a Bi-Directional Attention module on backbone neural networks for 3D point cloud perception. The module uses a similarity matrix measured from the features of one task to help aggregate non-local information for the other task, avoiding potential feature exclusion and task conflict. Comprehensive experiments and ablation studies on the S3DIS and PartNet datasets verify the superiority of our method. Moreover, we analyze how the bi-directional attention module helps joint instance and semantic segmentation.

preprint2020arXiv

Bright Debris Disk Candidates Observed with AKARI/Far-Infrared Surveyor (FIS)

We cross-correlate the Hipparcos main-sequence star catalog with the AKARI/FIS catalog, and identify 136 stars (at $>90$% reliability) with far-infrared detections in at least one band. After rejecting 51 stars classified as young stellar objects, Be stars, other types of stars with known dust disks or with potential contamination, and 2 stars without infrared excess emission, we obtain a sample of 83 candidate stars with debris disks. Stars in our sample cover spectral types from B to K, with most being early types. This represents a unique sample of luminous debris disks derived uniformly from an all-sky survey with a spatial resolution a factor of two better than the previous such survey by IRAS. Moreover, by collecting infrared photometric data from other public archives, we find that 85% of them have infrared excesses in more than one band, allowing an estimate of the dust temperatures. We fit a blackbody model to the broad-band spectral energy distribution of these stars to derive the statistical distribution of the disk parameters. Seven stars require an additional warm component with a temperature around 200 K. Meanwhile, a substantial fraction of our sample (58 stars) has weak 12 μm excess, indicating that a warm dust component may be common among these bright debris disk systems.

preprint2020arXiv

Deep Learning Inversion of Electrical Resistivity Data

The inverse problem of electrical resistivity surveys (ERSs) is difficult because of its nonlinear and ill-posed nature. For this task, traditional linear inversion methods still face challenges such as suboptimal approximation and initial model selection. Inspired by the remarkable nonlinear mapping ability of deep learning approaches, in this article, we propose to build the mapping from apparent resistivity data (input) to resistivity model (output) directly by convolutional neural networks (CNNs). However, the vertically varying characteristic of patterns in the apparent resistivity data may cause ambiguity when using CNNs with the weight sharing and effective receptive field properties. To address this potential issue, we supply an additional tier feature map to the CNNs to help them become aware of the relationship between input and output. Based on the prevalent U-Net architecture, we design our network (ERSInvNet), which can be trained end-to-end and reaches a very fast inference speed during testing. We further introduce a depth weighting function and a smooth constraint into the loss function to improve inversion accuracy for the deep region and suppress false anomalies. Six groups of experiments are considered to demonstrate the feasibility and efficiency of the proposed methods. According to the comprehensive qualitative analysis and quantitative comparison, ERSInvNet with the tier feature map, smooth constraints, and depth weighting function together achieves the best performance.

preprint2020arXiv

Deep-Learning Inversion of Seismic Data

We propose a new method to tackle the mapping challenge from time-series data to spatial image in the field of seismic exploration, i.e., reconstructing the velocity model directly from seismic data by deep neural networks (DNNs). The conventional way of addressing this ill-posed inversion problem is through iterative algorithms, which suffer from poor nonlinear mapping and strong nonuniqueness. Other attempts may either import human intervention errors or underuse seismic data. The challenge for DNNs mainly lies in the weak spatial correspondence, the uncertain reflection-reception relationship between seismic data and velocity model, as well as the time-varying property of seismic data. To tackle these challenges, we propose end-to-end seismic inversion networks (SeisInvNets) with novel components to make the best use of all seismic data. Specifically, we start with every seismic trace and enhance it with its neighborhood information, its observation setup, and the global context of its corresponding seismic profile. From the enhanced seismic traces, the spatially aligned feature maps can be learned and further concatenated to reconstruct a velocity model. In general, we let every seismic trace contribute to the reconstruction of the whole velocity model by finding spatial correspondence. The proposed SeisInvNet consistently produces improvements over the baselines and achieves promising performance on our synthesized and proposed SeisInv data set according to various evaluation metrics. The inversion results are more consistent with the target from the aspects of velocity values, subsurface structures, and geological interfaces. Moreover, the mechanism and the generalization of the proposed method are discussed and verified. Nevertheless, the generalization of deep-learning-based inversion methods on real data is still challenging and considering physics may be one potential solution.

preprint2020arXiv

Defect segmentation: Mapping tunnel lining internal defects with ground penetrating radar data using a convolutional neural network

This research proposes a Ground Penetrating Radar (GPR) data processing method for non-destructive detection of tunnel lining internal defects, called defect segmentation. To perform this critical step of automatic tunnel lining detection, the method uses a CNN called Segnet combined with the Lovász softmax loss function to map the internal defect structure with GPR synthetic data, which improves the accuracy, automation and efficiency of defect detection. The novel method we present overcomes several difficulties of traditional GPR data interpretation, as demonstrated by an evaluation on both synthetic and real data. To verify the method on real data, a test model containing a known defect was designed and built, and GPR data were obtained and analyzed.

preprint2020arXiv

Discovery and timing of pulsars in the globular cluster M13 with FAST

We report the discovery of a binary millisecond pulsar (namely PSR J1641+3627F or M13F) in the globular cluster M13 (NGC 6205) and timing solutions of M13A to F using observations made with the Five-hundred-metre Aperture Spherical radio Telescope (FAST). PSR J1641+3627F has a spin period of 3.00 ms and an orbital period of 1.4 days. The most likely companion mass is 0.16 M$_{\odot}$. M13A to E all have short spin periods and small period derivatives. We also confirm that the binary millisecond pulsar PSR J1641+3627E (also M13E) is a black widow with a companion mass around 0.02 M$_{\odot}$. We find that all the binary systems have low eccentricities compared to those typical for globular cluster pulsars and that they decrease with distance from the cluster core. This is consistent with what is expected as this cluster has a very low encounter rate per binary.

preprint2020arXiv

First embedded cluster formation in California molecular cloud

We performed multi-wavelength observations toward the LkHa 101 embedded cluster and its adjacent 85 arcmin × 60 arcmin region. The LkHa 101 embedded cluster is the first and only significant cluster in the California molecular cloud (CMC). These observations reveal that the LkHa 101 embedded cluster is located at the projected intersection of two filaments. One filament is the highest-density section of the CMC; the other is a newly identified filament with low-density gas emission. Toward the projected intersection, we find bridging features connecting the two filaments in velocity, and identify a V-shaped gas structure. These agree with the scenario that the two filaments are colliding with each other. Using the Five-hundred-meter Aperture Spherical radio Telescope (FAST), we measured the RRL velocity of the LkHa 101 H II region to be 0.5 km/s, which is related to the velocity component of the CMC filament. Moreover, there are some YSOs distributed outside the intersection region. We suggest that the cloud-cloud collision, together with the fragmentation of the main filament, may play an important role in the YSO formation of the cluster.

preprint2020arXiv

Night-time measurements of astronomical seeing at Dome A in Antarctica

Seeing, the angular size of stellar images blurred by atmospheric turbulence, is a critical parameter used to assess the quality of astronomical sites. Median values at the best mid-latitude sites are generally in the range of 0.6–0.8 arcsec. Sites on the Antarctic plateau are characterized by comparatively weak turbulence in the free atmosphere above a strong but thin boundary layer. The median seeing at Dome C is estimated to be 0.23–0.36 arcsec above a boundary layer that has a typical height of 30 m. At Dome A and F, the only previous seeing measurements were made during daytime. Here we report the first direct measurements of night-time seeing at Dome A, using a Differential Image Motion Monitor. Located at a height of just 8 m, it recorded seeing as low as 0.13 arcsec, and provided seeing statistics that are comparable to those for a 20 m height at Dome C. It indicates that the boundary layer was below 8 m 31% of the time. At such times the median seeing was 0.31 arcsec, consistent with free-atmosphere seeing. The seeing and boundary layer thickness are found to be strongly correlated with the near-surface temperature gradient. The correlation confirms a median thickness of approximately 14 m for the boundary layer at Dome A, as found from a sonic radar. The thinner boundary layer makes it less challenging to locate a telescope above it, thereby giving greater access to the free atmosphere.

preprint2020arXiv

The Fundamental Performance of FAST with 19-beam Receiver at L Band

The Five-hundred-meter Aperture Spherical radio Telescope (FAST) has passed national acceptance and is undertaking a pilot cycle of 'Shared-Risk' observations. The 19-beam receiver covering 1.05-1.45 GHz was used for most of these observations. The electronics gain fluctuation of the system is better than 1% over 3.5 hours, providing enough stability for observations. Pointing accuracy, aperture efficiency and system temperature are three key parameters of FAST. The measured standard deviation of pointing accuracy is 7.9$''$, which satisfies the initial design of FAST. When the zenith angle is less than 26.4$^\circ$, the aperture efficiency and system temperature around 1.4 GHz are $\sim$ 0.63 and less than 24 K for the central beam, respectively. The measured values of these two parameters are better than the designed values of 0.6 and 25 K, respectively. The sensitivity and stability of the 19-beam backend are confirmed to meet expectations by spectral HI observations toward N672 and polarization observations toward 3C286. The performance allows FAST to make sensitive observations for various scientific goals, from studies of pulsars to galaxy evolution.

preprint2019arXiv

Pilot HI Survey of Planck Galactic Cold Clumps with FAST

We present a pilot HI survey of 17 Planck Galactic Cold Clumps (PGCCs) with the Five-hundred-meter Aperture Spherical radio Telescope (FAST). HI Narrow Self-Absorption (HINSA) is an effective method to detect cold HI mixed with molecular hydrogen H$_2$ and improves our understanding of the atomic to molecular transition in the interstellar medium. HINSA was found in 58% of the PGCCs that we observed. The column density of HINSA was found to have an intermediate correlation with that of $^{13}$CO, following $\log N(\mathrm{HINSA}) = (0.52 \pm 0.26)\,\log N(^{13}\mathrm{CO}) + (10 \pm 4.1)$. The HI abundance relative to total hydrogen, [HI]/[H], has an average value of $4.4\times 10^{-3}$, which is about 2.8 times the average value of previous HINSA surveys toward molecular clouds. For clouds with total column density $N_{\rm H} > 5 \times 10^{20}$ cm$^{-2}$, an inverse correlation between HINSA abundance and total hydrogen column density is found, confirming the depletion of cold HI gas during molecular gas formation in more massive clouds. The nonthermal line width of $^{13}$CO is about 0-0.5 km s$^{-1}$ larger than that of HINSA. One possible explanation of the narrower nonthermal width of HINSA is that the HINSA region is smaller than that of $^{13}$CO. Based on an analytic model of H$_2$ formation and H$_2$ dissociation by cosmic rays, we found the cloud ages to be within 10$^{6.7}$-10$^{7.0}$ yr for five sources.

preprint2018arXiv

Super Diffusion for Salient Object Detection

One major branch of salient object detection methods is diffusion-based: these methods construct a graph model on a given image and diffuse seed saliency values to the whole graph by a diffusion matrix. While their performance is sensitive to the specific feature spaces and scales used to define the diffusion matrix, little work has been published to systematically promote the robustness and accuracy of salient object detection under the generic mechanism of diffusion. In this work, we first present a novel view of the working mechanism of the diffusion process based on mathematical analysis, which reveals that the diffusion process is actually computing the similarity of nodes with respect to the seeds based on diffusion maps. Following this analysis, we propose super diffusion, a novel inclusive learning-based framework for salient object detection, which achieves optimal and robust performance by integrating a large pool of feature spaces, scales and even features originally computed for non-diffusion-based salient object detection. A closed-form solution of the optimal parameters for the integration is determined through supervised learning. At the local level, we propose to promote each individual diffusion before the integration. Our mathematical analysis reveals the close relationship between saliency diffusion and spectral clustering. Based on this, we propose to re-synthesize each individual diffusion matrix from the most discriminative eigenvectors and the constant eigenvector (for saliency normalization). The proposed framework is implemented and evaluated on widely used benchmark datasets, consistently leading to state-of-the-art performance.