Source author record

Song Wang

Song Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

76 works

30 topics

4 close collaborators

preprint · 2026 · arXiv

A geometry aware framework enhances noninvasive mapping of whole human brain dynamics

Non-invasive electrophysiology lacks methods that accurately reconstruct whole-brain spatiotemporal dynamics while incorporating individual cortical geometry, leaving current electroencephalography and magnetoencephalography source imaging limited by simplistic or biologically implausible priors. Here, we show that embedding participant-specific Geometric Basis Functions (GBFs), eigenmodes derived from each individual's cortical surface, provides a powerful anatomic constraint that resolves the inverse problem and improves reconstruction fidelity. The method reconstructs neural sources as linear combinations of geometric basis functions, thereby aligning source estimates with the geometric organization of neural dynamics. We validate GBF across the Meta-Source Benchmark, task-evoked data, resting-state networks, intracranial stimulation, and epilepsy data. The results demonstrate that GBF yields high localization accuracy and captures fast spatiotemporal dynamics consistent with anatomical pathways. These findings suggest that both spontaneous and evoked whole-brain activity can be described by hundreds of geometric modes, providing a compact yet accurate representation of neural sources. By linking cortical geometry to electrophysiological dynamics, GBF offers a versatile source imaging tool for both scientific and clinical applications.

preprint · 2026 · arXiv

AI for Auto-Research: Roadmap & User Guide

AI-assisted research is crossing a threshold: fully automated systems can now generate research papers for as little as $15, while long-horizon agents can execute experiments, draft manuscripts, and simulate critique with minimal human input. Yet this productivity frontier exposes a deeper integrity problem: under scientific pressure, even frontier LLMs still fabricate results, miss hidden errors, and fail to judge novelty reliably. Studying developments through April 2026, we present an end-to-end analysis of AI across the complete research lifecycle, organized into four epistemological phases: Creation (idea generation, literature review, coding & experiments, tables & figures), Writing (paper writing), Validation (peer review, rebuttal & revision), and Dissemination (posters, slides, videos, social media, project pages, and interactive agents). We identify a sharp, stage-dependent boundary between reliable assistance and unreliable autonomy: AI excels at structured, retrieval-grounded, and tool-mediated tasks, but remains fragile for genuinely novel ideas, research-level experiments, and scientific judgment. Generated ideas often degrade after implementation, research code lags far behind pattern-matching benchmarks, and end-to-end autonomous systems have not yet consistently reached major-venue acceptance standards. We further show that greater automation can obscure rather than eliminate failure modes, making human-governed collaboration the most credible deployment paradigm. Finally, we provide a structured taxonomy, benchmark suite, and tool inventory, cross-stage design principles, and a practitioner-oriented playbook, with resources maintained at our project page.

preprint · 2026 · arXiv

Divergence is Uncertainty: A Closed-Form Posterior Covariance for Flow Matching

Flow matching has become a leading framework for generative modeling, but quantifying the uncertainty of its samples remains an open problem. Existing approaches retrain the model with auxiliary variance heads, maintain costly ensembles, or propagate approximate covariance through many integration steps, trading off training cost, inference cost, or accuracy. We show that none of these trade-offs is necessary. We prove that, for any pre-trained flow matching velocity field, the trace of the posterior covariance over the clean data given the current state equals, in closed form, the divergence of the velocity field, up to a known time-dependent prefactor and an additive constant. We call this the divergence-uncertainty identity for flow matching. The matrix-level form of the identity is similarly closed-form, depending solely on the velocity Jacobian. Because the identity is exact and post-hoc, it is computable on any pre-trained flow matching model, with no retraining and no architectural modification. For one-step generators such as MeanFlow, the same identity yields the exact end-to-end generation uncertainty in a single forward pass, eliminating the multi-step variance propagation required by all prior methods. Experiments on MNIST confirm that the resulting per-pixel uncertainty maps are semantically meaningful, concentrating on digit boundaries where inter-sample variation is highest, and that the scalar uncertainty score tracks actual prediction error, all at roughly 10,000× less total compute than ensembling or Monte Carlo dropout.
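The relationship between divergence and posterior variance can be checked by hand in a one-dimensional Gaussian case. The sketch below is illustrative only: it assumes the linear path x_t = (1 - t) x_0 + t x_1 with x_0 ~ N(0, 1), which the abstract does not specify. Under that assumption, Tweedie's formula gives Var[x_1 | x_t] = ((1 - t)^2 / t) (1 + (1 - t) dv/dx). The prefactor and constant are derived for this assumed path rather than taken from the paper, and all function names are hypothetical.

```python
def posterior_mean(x, t, m, s2):
    """E[x1 | x_t = x] for x1 ~ N(m, s2), x_t = (1 - t) * x0 + t * x1, x0 ~ N(0, 1)."""
    gain = t * s2 / (t * t * s2 + (1.0 - t) ** 2)
    return m + gain * (x - t * m)


def velocity(x, t, m, s2):
    """Marginal velocity E[x1 - x0 | x_t = x] on the assumed linear path."""
    return (posterior_mean(x, t, m, s2) - x) / (1.0 - t)


def variance_from_divergence(x, t, m, s2, h=1e-5):
    """Posterior variance recovered from the velocity divergence (dv/dx in 1-D)."""
    # Central finite difference stands in for the exact divergence.
    div = (velocity(x + h, t, m, s2) - velocity(x - h, t, m, s2)) / (2.0 * h)
    # Assumed-path prefactor (1 - t)^2 / t and additive constant 1 (the dimension).
    return (1.0 - t) ** 2 / t * (1.0 + (1.0 - t) * div)


def exact_posterior_variance(t, s2):
    """Var[x1 | x_t] from ordinary Gaussian conditioning, used as the reference."""
    return s2 * (1.0 - t) ** 2 / (t * t * s2 + (1.0 - t) ** 2)
```

In this toy setting the two variance functions agree for any x and any t strictly between 0 and 1, since the velocity is linear in x. A trained velocity network would replace `velocity`, with the divergence obtained by automatic differentiation.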

preprint · 2026 · arXiv

Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems

The rapid advancement of autonomous systems, including self-driving vehicles and drones, has intensified the need to forge true Spatial Intelligence from multi-modal onboard sensor data. While foundation models excel in single-modal contexts, integrating their capabilities across diverse sensors like cameras and LiDAR to create a unified understanding remains a formidable challenge. This paper presents a comprehensive framework for multi-modal pre-training, identifying the core set of techniques driving progress toward this goal. We dissect the interplay between foundational sensor characteristics and learning strategies, evaluating the role of platform-specific datasets in enabling these advancements. Our central contribution is the formulation of a unified taxonomy for pre-training paradigms: ranging from single-modality baselines to sophisticated unified frameworks that learn holistic representations for advanced tasks like 3D object detection and semantic occupancy prediction. Furthermore, we investigate the integration of textual inputs and occupancy representations to facilitate open-world perception and planning. Finally, we identify critical bottlenecks, such as computational efficiency and model scalability, and propose a roadmap toward general-purpose multi-modal foundation models capable of achieving robust Spatial Intelligence for real-world deployment.

preprint · 2026 · arXiv

MemRouter: Memory-as-Embedding Routing for Long-Term Conversational Agents

Long-term conversational agents must decide which turns to store in external memory, yet recent systems rely on autoregressive LLM generation at every turn to make that decision. We present MemRouter, a write-side memory router that decouples memory admission from the downstream answer backbone and replaces per-turn memory-management decoding with an embedding-based routing policy. MemRouter encodes each turn together with recent context, projects the resulting embeddings through a frozen LLM backbone, and predicts whether the turn should be stored using lightweight classification heads while training only 12M parameters. Under a controlled matched-harness comparison on LoCoMo, where the retrieval pipeline, answer prompts, and QA backbone (Qwen2.5-7B) are held identical, MemRouter outperforms an LLM-based memory manager on every question category (overall F1 52.0 vs 45.6, non-overlapping 95% CIs) while reducing memory-management p50 latency from 970ms to 58ms. Descriptive factorial averaging further shows that learned admission improves mean F1 by +10.3 over random storage, category-specific prompting adds +5.2 over a generic prompt, and retrieval contributes +0.7. These results suggest that write-side memory admission can be learned by a small supervised router, while answer generation remains a separate downstream component in long-horizon conversational QA.

preprint · 2026 · arXiv

Metacognitive Self-Correction for Multi-Agent System via Prototype-Guided Next-Execution Reconstruction

Large Language Model based multi-agent systems (MAS) excel at collaborative problem solving but remain brittle to cascading errors: a single faulty step can propagate across agents and disrupt the trajectory. In this paper, we present MASC, a metacognitive framework that endows MAS with real-time, unsupervised, step-level error detection and self-correction. MASC rethinks detection as history-conditioned anomaly scoring via two complementary designs: (1) Next-Execution Reconstruction, which predicts the embedding of the next step from the query and interaction history to capture causal consistency, and (2) Prototype-Guided Enhancement, which learns a prototype prior over normal-step embeddings and uses it to stabilize reconstruction and anomaly scoring under sparse context (e.g., early steps). When an anomalous step is flagged, MASC triggers a correction agent to revise the acting agent's output before information flows downstream. On the Who&When benchmark, MASC consistently outperforms all baselines, improving step-level error detection by up to 8.47% AUC-ROC. When plugged into diverse MAS frameworks, it delivers consistent end-to-end gains across architectures, confirming that our metacognitive monitoring and targeted correction can mitigate error propagation with minimal overhead.

preprint · 2026 · arXiv

MuteBench: Modality Unavailability Tolerance Evaluation for Incomplete Multimodal Fusion

Multimodal physiological data powers clinical AI systems from intensive care units to wearable devices, but sensors routinely fail in practice. Two failure modes are common: modality missing, where an entire channel is absent, and within-modality missing, where a contiguous time segment is lost. No existing benchmark evaluates multiple fusion architectures under both failure modes at controlled severity levels across diverse clinical datasets. We present MuteBench, a benchmark covering 9 datasets from 7 clinical domains, 6 fusion architectures, and 2 missing-data modes over 125,000 samples. Through this benchmark, we find that architecture family is the strongest predictor of robustness, outweighing parameter count. Channel-independent models tolerate modality missing well but can be sensitive to within-modality missing, especially on short sequences. Curriculum modality dropout protects reliably only up to the maximum dropout rate used in training. We also find that channel count, sequence length, and modality alignment jointly determine which failure mode poses the greater threat. Finally, a PTB-XL case study suggests that diffusion-based imputation can improve downstream classification under within-modality missing, with the largest gains for models whose expert routing is most sensitive to corrupted inputs, though broader validation across datasets remains an open direction. MuteBench provides practitioners with concrete guidance for both selecting existing architectures and informing the design of future robust multimodal fusion methods.
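The two failure modes named in the abstract can be sketched as simple masking functions. This is a minimal illustration, assuming a sample is a list of channels, each a list of time steps, and that missing values are filled with zeros. The benchmark's actual masking protocol and severity levels are not described here, and the function names are hypothetical.

```python
def drop_modality(sample, channel, fill=0.0):
    """Modality missing: replace one entire channel with a fill value."""
    return [[fill] * len(row) if i == channel else list(row)
            for i, row in enumerate(sample)]


def drop_segment(sample, channel, start, length, fill=0.0):
    """Within-modality missing: blank a contiguous time segment of one channel."""
    out = [list(row) for row in sample]  # copy so the input is left untouched
    end = min(start + length, len(out[channel]))
    for t in range(start, end):
        out[channel][t] = fill
    return out
```

For example, `drop_segment(sample, channel=1, start=1, length=2)` blanks two consecutive time steps of the second channel and leaves every other value intact.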

preprint · 2026 · arXiv

Social Bias in LLM-Generated Code: Benchmark and Mitigation

Large Language Models (LLMs) are increasingly deployed to generate code for human-centered applications where demographic fairness is critical. However, existing evaluations focus almost exclusively on functional correctness, leaving social bias in LLM-generated code largely unexamined. Extending our prior work on Solar, we conduct a comprehensive empirical study using SocialBias-Bench, a benchmark of 343 real-world coding tasks spanning seven demographic dimensions. We evaluate four prominent LLMs and find severe bias across all models, with Code Bias Scores reaching up to 60.58%. We further show that standard prompt-level interventions, such as Chain-of-Thought reasoning and fairness persona assignment, inadvertently amplify bias rather than reduce it. We then investigate whether structured multi-agent software process frameworks can improve fairness, finding that structured pipelines reduce bias when early roles correctly scope what the code should and should not consider. However, adding explicit fairness instructions to all agent roles produces worse outcomes than providing none, suggesting that diffused responsibility goes unaddressed. To address these limitations, we propose the Fairness Monitor Agent (FMA), a modular component that plugs into any existing code generation pipeline without modifying it. FMA analyzes the task description to determine which attributes should be considered or restricted, then detects and corrects violations through an iterative review process, without requiring an executable test suite. Evaluated on all 343 tasks, FMA reduces bias by 65.1% compared to a developer agent alone and improves functional correctness from 75.80% to 83.97%, outperforming all other studied approaches.

preprint · 2026 · arXiv

Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future

Autonomous driving has long relied on modular "Perception-Decision-Action" pipelines, where hand-crafted interfaces and rule-based components often break down in complex or long-tailed scenarios. Their cascaded design further propagates perception errors, degrading downstream planning and control. Vision-Action (VA) models address some limitations by learning direct mappings from visual inputs to actions, but they remain opaque, sensitive to distribution shifts, and lack structured reasoning or instruction-following capabilities. Recent progress in Large Language Models (LLMs) and multimodal learning has motivated the emergence of Vision-Language-Action (VLA) frameworks, which integrate perception with language-grounded decision making. By unifying visual understanding, linguistic reasoning, and actionable outputs, VLAs offer a pathway toward more interpretable, generalizable, and human-aligned driving policies. This work provides a structured characterization of the emerging VLA landscape for autonomous driving. We trace the evolution from early VA approaches to modern VLA frameworks and organize existing methods into two principal paradigms: End-to-End VLA, which integrates perception, reasoning, and planning within a single model, and Dual-System VLA, which separates slow deliberation (via VLMs) from fast, safety-critical execution (via planners). Within these paradigms, we further distinguish subclasses such as textual vs. numerical action generators and explicit vs. implicit guidance mechanisms. We also summarize representative datasets and benchmarks for evaluating VLA-based driving systems and highlight key challenges and open directions, including robustness, interpretability, and instruction fidelity. Overall, this work aims to establish a coherent foundation for advancing human-compatible autonomous driving systems.

preprint · 2023 · arXiv

Few-shot Node Classification with Extremely Weak Supervision

Few-shot node classification aims at classifying nodes with limited labeled nodes as references. Recent few-shot node classification methods typically learn from classes with abundant labeled nodes (i.e., meta-training classes) and then generalize to classes with limited labeled nodes (i.e., meta-test classes). Nevertheless, on real-world graphs, it is usually difficult to obtain abundant labeled nodes for many classes. In practice, each meta-training class can only consist of several labeled nodes, known as the extremely weak supervision problem. In few-shot node classification, with extremely limited labeled nodes for meta-training, the generalization gap between meta-training and meta-test will become larger and thus lead to suboptimal performance. To tackle this issue, we study a novel problem of few-shot node classification with extremely weak supervision and propose a principled framework X-FNC under the prevalent meta-learning framework. Specifically, our goal is to accumulate meta-knowledge across different meta-training tasks with extremely weak supervision and generalize such knowledge to meta-test tasks. To address the challenges resulting from extremely scarce labeled nodes, we propose two essential modules to obtain pseudo-labeled nodes as extra references and effectively learn from extremely limited supervision information. We further conduct extensive experiments on four node classification datasets with extremely weak supervision to validate the superiority of our framework compared to the state-of-the-art baselines.

preprint · 2022 · arXiv

An End-to-End Dialogue Summarization System for Sales Calls

Summarizing sales calls is a routine task performed manually by salespeople. We present a production system that combines generative models fine-tuned for the customer-agent setting with a human-in-the-loop user experience for an interactive summary curation process. We address challenging aspects of the dialogue summarization task in a real-world setting, including long input dialogues, content validation, lack of labeled data, and quality evaluation. We show how GPT-3 can be leveraged as an offline data labeler to handle training data scarcity and accommodate privacy constraints in an industrial setting. Experiments show significant improvements by our models in tackling the summarization and content validation tasks on public datasets.

preprint · 2022 · arXiv

Automatic Comment Generation via Multi-Pass Deliberation

Deliberation is a common and natural behavior in human daily life. For example, when writing papers or articles, we usually first write drafts, and then iteratively polish them until satisfied. In light of such a human cognitive process, we propose DECOM, which is a multi-pass deliberation framework for automatic comment generation. DECOM consists of multiple Deliberation Models and one Evaluation Model. Given a code snippet, we first extract keywords from the code and retrieve a similar code fragment from a pre-defined corpus. Then, we treat the comment of the retrieved code as the initial draft and input it with the code and keywords into DECOM to start the iterative deliberation process. At each deliberation, the deliberation model polishes the draft and generates a new comment. The evaluation model measures the quality of the newly generated comment to determine whether to end the iterative process or not. When the iterative process is terminated, the best-generated comment will be selected as the target comment. Our approach is evaluated on two real-world datasets in Java (87K) and Python (108K), and experiment results show that our approach outperforms the state-of-the-art baselines. A human evaluation study also confirms the comments generated by DECOM tend to be more readable, informative, and useful.
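The iterative process described above can be sketched as a generic loop. This is an illustration under stated assumptions, not DECOM itself: `polish` stands in for a deliberation model and `score` for the evaluation model, both passed in as plain callables, and the stopping rule (a score threshold plus a pass limit) is assumed, since the abstract does not say how termination is decided.

```python
def deliberate_comment(code, keywords, draft, polish, score,
                       max_passes=5, threshold=0.9):
    """Polish a retrieved draft comment until the evaluator is satisfied.

    polish(code, keywords, draft) -> new draft   (hypothetical deliberation step)
    score(code, draft) -> float in [0, 1]        (hypothetical evaluation step)
    Returns the best-scoring comment seen and its score.
    """
    best, best_score = draft, score(code, draft)
    current_score = best_score
    for _ in range(max_passes):
        if current_score >= threshold:  # evaluator decides to stop
            break
        draft = polish(code, keywords, draft)
        current_score = score(code, draft)
        if current_score > best_score:  # keep the best comment, not the last
            best, best_score = draft, current_score
    return best, best_score
```

Returning the best-scoring draft rather than the final one mirrors the abstract's statement that the best generated comment is selected once the process terminates.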

preprint · 2022 · arXiv

Can You Spot the Chameleon? Adversarially Camouflaging Images from Co-Salient Object Detection

Co-salient object detection (CoSOD) has recently achieved significant progress and played a key role in retrieval-related tasks. However, it inevitably poses an entirely new safety and security issue, i.e., highly personal and sensitive content can potentially be extracted by powerful CoSOD methods. In this paper, we address this problem from the perspective of adversarial attacks and identify a novel task: adversarial co-saliency attack. Specifically, given an image selected from a group of images containing some common and salient objects, we aim to generate an adversarial version that can mislead CoSOD methods to predict incorrect co-salient regions. Note that, compared with general white-box adversarial attacks for classification, this new task faces two additional challenges: (1) low success rate due to the diverse appearance of images in the group; (2) low transferability across CoSOD methods due to the considerable difference between CoSOD pipelines. To address these challenges, we propose the very first black-box joint adversarial exposure and noise attack (Jadena), where we jointly and locally tune the exposure and additive perturbations of the image according to a newly designed high-feature-level contrast-sensitive loss function. Our method, without any information on the state-of-the-art CoSOD methods, leads to significant performance degradation on various co-saliency detection datasets and makes the co-salient objects undetectable. This can have strong practical benefits in properly securing the large number of personal photos currently shared on the Internet. Moreover, our method has the potential to be used as a metric for evaluating the robustness of CoSOD methods.

preprint · 2022 · arXiv

Cancellable Template Design for Privacy-Preserving EEG Biometric Authentication Systems

As a promising candidate to complement traditional biometric modalities, brain biometrics using electroencephalography (EEG) data has received widespread attention in recent years. However, compared with existing biometrics such as fingerprints and face recognition, research on EEG biometrics is still in its infancy. Most studies focus on either designing signal elicitation protocols from the perspective of neuroscience or developing feature extraction and classification algorithms from the viewpoint of machine learning. These studies have laid the groundwork for the feasibility of using EEG as a biometric authentication modality, but they have also raised security and privacy concerns, as EEG data contains sensitive information. Existing research has used hash functions and cryptographic schemes to protect EEG data, but they do not provide functions for revoking compromised templates as in cancellable template design. This paper proposes the first cancellable EEG template design for privacy-preserving EEG-based authentication systems, which can protect raw EEG signals containing sensitive privacy information (e.g., identity, health and cognitive status). A novel cancellable EEG template is developed based on EEG graph features and a non-invertible transform. The proposed transformation provides cancellable templates, while taking advantage of EEG elicitation protocol fusion to enhance biometric performance. The proposed authentication system offers authentication performance (8.58% EER on a public database) equivalent to that in the non-transformed domain, while protecting raw EEG data. Furthermore, we analyze the system's capacity for resisting multiple attacks, and discuss some overlooked but critical issues and possible pitfalls involving hill-climbing attacks, second attacks, and classification-based authentication systems.

preprint · 2022 · arXiv

Characterizing and Understanding Software Security Vulnerabilities in Machine Learning Libraries

The use of machine learning (ML) libraries has increased tremendously in many domains, including autonomous driving systems, medicine, and critical industries. Vulnerabilities in such libraries result in irreparable consequences. However, the characteristics of software security vulnerabilities have not been well studied. In this paper, to bridge this gap, we take the first step towards characterizing and understanding the security vulnerabilities of five well-known ML libraries: TensorFlow, PyTorch, Scikit-learn, Pandas, and NumPy. To do so, we collected a total of 596 security-related commits to explore five major factors: 1) vulnerability types, 2) root causes, 3) symptoms, 4) fixing patterns, and 5) fixing efforts of security vulnerabilities in ML libraries. The findings of this study can help developers better understand software security vulnerabilities across different ML libraries and gain better insight into their weaknesses. To make our findings actionable, we further developed DeepMut, an automated mutation testing tool, as a proof-of-concept application of our findings. DeepMut is designed to assess the adequacy of existing test suites of ML libraries against security-aware mutation operators extracted from the vulnerabilities studied in this work. We applied DeepMut to the TensorFlow kernel module and found more than 1k alive mutants not considered by the existing test suites. The results demonstrate the usefulness of our findings.

preprint · 2022 · arXiv

CRFormer: A Cross-Region Transformer for Shadow Removal

Aiming to restore the original intensity of shadow regions in an image and make them compatible with the remaining non-shadow regions without a trace, shadow removal is a very challenging problem that benefits many downstream image/video-related tasks. Recently, transformers have shown their strong capability in various applications by capturing global pixel interactions and this capability is highly desirable in shadow removal. However, applying transformers to promote shadow removal is non-trivial for the following two reasons: 1) The patchify operation is not suitable for shadow removal due to irregular shadow shapes; 2) shadow removal only needs one-way interaction from the non-shadow region to the shadow region instead of the common two-way interactions among all pixels in the image. In this paper, we propose a novel cross-region transformer, namely CRFormer, for shadow removal which differs from existing transformers by only considering the pixel interactions from the non-shadow region to the shadow region without splitting images into patches. This is achieved by a carefully designed region-aware cross-attention operation that can aggregate the recovered shadow region features conditioned on the non-shadow region features. Extensive experiments on ISTD, AISTD, SRD, and Video Shadow Removal datasets demonstrate the superiority of our method compared to other state-of-the-art methods.
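The one-way interaction described above can be illustrated with a bare attention step in which shadow pixels act as queries and non-shadow pixels supply the keys and values. This is a minimal sketch, not CRFormer's region-aware cross-attention: it assumes flattened per-pixel feature vectors, omits learned projections, and uses a hypothetical function name.

```python
import math


def one_way_attention(features, shadow_mask):
    """Update shadow pixels from non-shadow pixels only; no patch splitting.

    features:    list of per-pixel feature vectors (lists of floats)
    shadow_mask: list of bools, True where the pixel lies in shadow
    """
    sources = [f for f, in_shadow in zip(features, shadow_mask) if not in_shadow]
    out = []
    for f, in_shadow in zip(features, shadow_mask):
        if not in_shadow or not sources:
            out.append(list(f))  # non-shadow pixels pass through unchanged
            continue
        scale = math.sqrt(len(f))
        logits = [sum(a * b for a, b in zip(f, s)) / scale for s in sources]
        peak = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(v - peak) for v in logits]
        total = sum(weights)
        out.append([sum(w * s[d] for w, s in zip(weights, sources)) / total
                    for d in range(len(f))])
    return out
```

Because the key and value set contains only non-shadow pixels, no information flows from the shadow region back to the rest of the image, which is the asymmetry the abstract motivates.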

preprint · 2022 · arXiv

Detecting and Monitoring Tidal Dissipation of Hot Jupiters in the Era of SiTian

Transit Timing Variation (TTV) of hot Jupiters provides direct observational evidence of planet tidal dissipation. Detecting tidal dissipation through TTV requires high-precision transit timings and long timing baselines. In this work, we predict and discuss the potential scientific contribution of the SiTian Survey in detecting and analyzing exoplanet TTV. We develop a tidal dissipation detection pipeline for the SiTian Survey, which targets time-domain astronomy with 72 1-meter optical telescopes. The pipeline includes modules for light curve deblending, transit timing measurement, and TTV modeling. SiTian is capable of detecting more than 25,000 exoplanets, among which we expect ~50 sources showing evidence of tidal dissipation. We present detection and analysis of tidal dissipating targets, based on simulated SiTian light curves of XO-3b and WASP-161b. The transit light curve modeling gives results consistent within 1σ with the input values of the simulated light curves. Also, the parameter uncertainties predicted by Markov chain Monte Carlo are consistent with the distribution obtained from simulating and modeling the light curve 1000 times. The timing precision of SiTian observations is ~0.5 minutes with one transit visit. We show that differences between TTV origins, e.g., tidal dissipation, apsidal precession, multiple planets, would be significant, considering the timing precision and baseline. The detection rate of tidal dissipating hot Jupiters would answer a crucial question of whether the planet migrates at an early formation stage or at random stages due to perturbations, e.g., planet scattering, secular interaction. SiTian-identified targets would be constructive given that the sample would extend tenfold.
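To see why long baselines matter, consider the standard constant-period-derivative ephemeris, t(N) = t0 + P N + (1/2) P (dP/dt) N^2, which is commonly used to model orbital decay. The sketch below assumes that model; the abstract does not state which TTV model the pipeline uses. The numbers in the usage note are hypothetical and are not parameters of XO-3b or WASP-161b.

```python
MINUTES_PER_DAY = 1440.0


def transit_time(epoch, t0, period, pdot=0.0):
    """Mid-transit time in days at an integer epoch.

    period is in days; pdot is the dimensionless period derivative (days per day).
    """
    return t0 + period * epoch + 0.5 * period * pdot * epoch ** 2


def timing_shift_minutes(epoch, period, pdot):
    """Departure from a linear ephemeris, in minutes, accumulated by `epoch`."""
    linear = transit_time(epoch, 0.0, period)
    decaying = transit_time(epoch, 0.0, period, pdot)
    return (decaying - linear) * MINUTES_PER_DAY
```

With a hypothetical 1-day period and pdot = -1e-9, `timing_shift_minutes(3650, 1.0, -1e-9)` comes to roughly -9.6 minutes after about ten years. That is well above the ~0.5 minute single-visit precision quoted above, while the same decay over one year shifts transits by only about 0.1 minute.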

preprint · 2022 · arXiv

FAITH: Few-Shot Graph Classification with Hierarchical Task Graphs

Few-shot graph classification aims at predicting classes for graphs, given limited labeled graphs for each class. To tackle the bottleneck of label scarcity, recent works propose to incorporate few-shot learning frameworks for fast adaptations to graph classes with limited labeled graphs. Specifically, these works propose to accumulate meta-knowledge across diverse meta-training tasks, and then generalize such meta-knowledge to the target task with a disjoint label set. However, existing methods generally ignore task correlations among meta-training tasks while treating them independently. Nevertheless, such task correlations can advance the model generalization to the target task for better classification performance. On the other hand, it remains non-trivial to utilize task correlations due to the complex components in a large number of meta-training tasks. To deal with this, we propose a novel few-shot learning framework FAITH that captures task correlations via constructing a hierarchical task graph at different granularities. Then we further design a loss-based sampling strategy to select tasks with more correlated classes. Moreover, a task-specific classifier is proposed to utilize the learned task correlations for few-shot classification. Extensive experiments on four prevalent few-shot graph classification datasets demonstrate the superiority of FAITH over other state-of-the-art baselines.

preprint · 2022 · arXiv

Meta-RangeSeg: LiDAR Sequence Semantic Segmentation Using Multiple Feature Aggregation

The LiDAR sensor is essential to the perception system in autonomous vehicles and intelligent robots. To fulfill the real-time requirements of real-world applications, it is necessary to segment the LiDAR scans efficiently. Most previous approaches directly project the 3D point cloud onto a 2D spherical range image so that they can make use of efficient 2D convolutional operations for image segmentation. Although these approaches have achieved encouraging results, neighborhood information is not well preserved in the spherical projection. Moreover, temporal information is not taken into consideration in the single-scan segmentation task. To tackle these problems, we propose a novel approach to semantic segmentation for LiDAR sequences named Meta-RangeSeg, where a new range residual image representation is introduced to capture the spatial-temporal information. Specifically, Meta-Kernel is employed to extract the meta features, which reduces the inconsistency between the 2D range image coordinates input and 3D Cartesian coordinates output. An efficient U-Net backbone is used to obtain the multi-scale features. Furthermore, the Feature Aggregation Module (FAM) strengthens the role of the range channel and aggregates features at different levels. We have conducted extensive experiments for performance evaluation on SemanticKITTI and SemanticPOSS. The promising results show that our proposed Meta-RangeSeg method is more efficient and effective than the existing approaches. Our full implementation is publicly available at https://github.com/songw-zju/Meta-RangeSeg .

preprint · 2022 · arXiv

MISF: Multi-level Interactive Siamese Filtering for High-Fidelity Image Inpainting

Although they have achieved significant progress, existing deep generative inpainting methods are far from real-world applications due to their low generalization across different scenes. As a result, the generated images usually contain artifacts or the filled pixels differ greatly from the ground truth. Image-level predictive filtering is a widely used image restoration technique, predicting suitable kernels adaptively according to different input scenes. Inspired by this inherent advantage, we explore the possibility of addressing image inpainting as a filtering task. To this end, we first study the advantages and challenges of image-level predictive filtering for image inpainting: the method can preserve local structures and avoid artifacts but fails to fill large missing areas. Then, we propose semantic filtering by conducting filtering on the deep feature level, which fills the missing semantic information but fails to recover the details. To address the issues while adopting the respective advantages, we propose a novel filtering technique, i.e., Multilevel Interactive Siamese Filtering (MISF), which contains two branches: kernel prediction branch (KPB) and semantic & image filtering branch (SIFB). These two branches are interactively linked: SIFB provides multi-level features for KPB while KPB predicts dynamic kernels for SIFB. As a result, the final method takes advantage of effective semantic & image-level filling for high-fidelity inpainting. We validate our method on three challenging datasets, i.e., Dunhuang, Places2, and CelebA. Our method outperforms state-of-the-art baselines on four metrics, i.e., L1, PSNR, SSIM, and LPIPS. Please try the released code and model at https://github.com/tsingqguo/misf.

preprint2022arXiv

On Structural Explanation of Bias in Graph Neural Networks

Graph Neural Networks (GNNs) have shown satisfying performance in various graph analytical problems. Hence, they have become the de facto solution in a variety of decision-making scenarios. However, GNNs could yield biased results against certain demographic subgroups. Some recent works have empirically shown that the biased structure of the input network is a significant source of bias for GNNs. Nevertheless, no studies have systematically scrutinized which part of the input network structure leads to biased predictions for any given node. The low transparency on how the structure of the input network influences the bias in GNN outcome largely limits the safe adoption of GNNs in various decision-critical scenarios. In this paper, we study a novel research problem of structural explanation of bias in GNNs. Specifically, we propose a novel post-hoc explanation framework to identify two edge sets that can maximally account for the exhibited bias and maximally contribute to the fairness level of the GNN prediction for any given node, respectively. Such explanations not only provide a comprehensive understanding of bias/fairness of GNN predictions but also have practical significance in building an effective yet fair GNN model. Extensive experiments on real-world datasets validate the effectiveness of the proposed framework towards delivering effective structural explanations for the bias of GNNs. Open-source code can be found at https://github.com/yushundong/REFEREE.

preprint2022arXiv

Overview of the LAMOST survey in the first decade

The Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST), also known as the Guoshoujing Telescope, is a major national scientific facility for astronomical research located in Xinglong, China. Beginning with a pilot survey in 2011, LAMOST has been surveying the night sky for more than 10 years. The LAMOST survey covers various objects in the Universe, from normal stars to peculiar ones, from the Milky Way to other galaxies, and from stellar black holes and their companions to quasars that ignite ancient galaxies. Until the latest data release 8, the LAMOST survey has released spectra for more than 10 million stars, ~220,000 galaxies, and ~71,000 quasars. With this largest celestial spectra database ever constructed, LAMOST has helped astronomers to deepen their understanding of the Universe, especially for our Milky Way galaxy and the millions of stars within it. In this article, we briefly review the characteristics, observations, and scientific achievements of LAMOST. In particular, we show how astrophysical knowledge about the Milky Way has been improved by LAMOST data.

preprint2022arXiv

PC-GANs: Progressive Compensation Generative Adversarial Networks for Pan-sharpening

The fusion of multispectral and panchromatic images is always dubbed pansharpening. Most of the available deep learning-based pan-sharpening methods sharpen the multispectral images through a one-step scheme, which strongly depends on the reconstruction ability of the network. However, remote sensing images always have large variations, as a result, these one-step methods are vulnerable to the error accumulation and thus incapable of preserving spatial details as well as the spectral information. In this paper, we propose a novel two-step model for pan-sharpening that sharpens the MS image through the progressive compensation of the spatial and spectral information. Firstly, a deep multiscale guided generative adversarial network is used to preliminarily enhance the spatial resolution of the MS image. Starting from the pre-sharpened MS image in the coarse domain, our approach then progressively refines the spatial and spectral residuals over a couple of generative adversarial networks (GANs) that have reverse architectures. The whole model is composed of triple GANs, and based on the specific architecture, a joint compensation loss function is designed to enable the triple GANs to be trained simultaneously. Moreover, the spatial-spectral residual compensation structure proposed in this paper can be extended to other pan-sharpening methods to further enhance their fusion results. Extensive experiments are performed on different datasets and the results demonstrate the effectiveness and efficiency of our proposed method.

preprint2022arXiv

PLGAN: Generative Adversarial Networks for Power-Line Segmentation in Aerial Images

Accurate segmentation of power lines in various aerial images is very important for UAV flight safety. The complex background and very thin structures of power lines, however, make it an inherently difficult task in computer vision. This paper presents PLGAN, a simple yet effective method based on generative adversarial networks, to segment power lines from aerial images with different backgrounds. Instead of directly using the adversarial networks to generate the segmentation, we take their certain decoding features and embed them into another semantic segmentation network by considering more context, geometry, and appearance information of power lines. We further exploit the appropriate form of the generated images for high-quality feature embedding and define a new loss function in the Hough-transform parameter space to enhance the segmentation of very thin power lines. Extensive experiments and comprehensive analysis demonstrate that our proposed PLGAN outperforms the prior state-of-the-art methods for semantic segmentation and line detection.

preprint2022arXiv

PLMCL: Partial-Label Momentum Curriculum Learning for Multi-Label Image Classification

Multi-label image classification aims to predict all possible labels in an image. It is usually formulated as a partial-label learning problem, given the fact that it could be expensive in practice to annotate all labels in every training image. Existing works on partial-label learning focus on the case where each training image is annotated with only a subset of its labels. A special case is to annotate only one positive label in each training image. To further relieve the annotation burden and enhance the performance of the classifier, this paper proposes a new partial-label setting in which only a subset of the training images are labeled, each with only one positive label, while the rest of the training images remain unlabeled. To handle this new setting, we propose an end-to-end deep network, PLMCL (Partial Label Momentum Curriculum Learning), that can learn to produce confident pseudo labels for both partially-labeled and unlabeled training images. The novel momentum-based law updates soft pseudo labels on each training image with the consideration of the updating velocity of pseudo labels, which help avoid trapping to low-confidence local minimum, especially at the early stage of training in lack of both observed labels and confidence on pseudo labels. In addition, we present a confidence-aware scheduler to adaptively perform easy-to-hard learning for different labels. Extensive experiments demonstrate that our proposed PLMCL outperforms many state-of-the-art multi-label classification methods under various partial-label settings on three different datasets.

preprint2022arXiv

Prior Knowledge Enhances Radiology Report Generation

Radiology report generation aims to produce computer-aided diagnoses to alleviate the workload of radiologists and has drawn increasing attention recently. However, previous deep learning methods tend to neglect the mutual influences between medical findings, which can be the bottleneck that limits the quality of generated reports. In this work, we propose to mine and represent the associations among medical findings in an informative knowledge graph and incorporate this prior knowledge with radiology report generation to help improve the quality of generated reports. Experiment results demonstrate the superior performance of our proposed method on the IU X-ray dataset with a ROUGE-L of 0.384$\pm$0.007 and CIDEr of 0.340$\pm$0.011. Compared with previous works, our model achieves an average of 1.6% improvement (2.0% and 1.5% improvements in CIDEr and ROUGE-L, respectively). The experiments suggest that prior knowledge can bring performance gains to accurate radiology report generation. We will make the code publicly available at https://github.com/bionlplab/report_generation_amia2022.

preprint2022arXiv

Radiology Text Analysis System (RadText): Architecture and Evaluation

Analyzing radiology reports is a time-consuming and error-prone task, which raises the need for an efficient automated radiology report analysis system to alleviate the workloads of radiologists and encourage precise diagnosis. In this work, we present RadText, an open-source radiology text analysis system developed in Python. RadText offers an easy-to-use text analysis pipeline, including de-identification, section segmentation, sentence split and word tokenization, named entity recognition, parsing, and negation detection. RadText features a flexible modular design, provides a hybrid text processing schema, and supports raw text processing and local processing, which enables better usability and improved data privacy. RadText adopts BioC as the unified interface, and also standardizes the input / output into a structured representation compatible with Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). This allows for a more systematic approach to observational research across multiple, disparate data sources. We evaluated RadText on the MIMIC-CXR dataset, with five new disease labels we annotated for this work. RadText demonstrates highly accurate classification performance, with an average precision of, a recall of 0.94, and an F-1 score of 0.92. We have made our code, documentation, examples, and the test set available at https://github.com/bionlplab/radtext .

preprint2022arXiv

Stellar chromospheric activities revealed from the LAMOST-K2 time-domain survey

By using the LAMOST time-domain survey data, we study stellar activities based on the $\rm{H_α}$ lines for about 2000 stars in four $K$2 plates. Two indices, $R_{\rm{Hα}}^{'}$ and $R_{\rm{Hα}}^{+}$, are computed from LAMOST spectra, the former of which is derived by excluding the photospheric contributions to the $\rm{H_α}$ lines, while the latter is derived by further subtracting the non-dynamo driven chromospheric emission. Meanwhile, the periodicity and variation amplitudes are computed from $K$2 light curves. Both the $R_{\rm{Hα}}^{'}$-Ro relation and $R_{\rm{Hα}}^{+}$-Ro relation show complicated profiles in the non-saturated decay region. Hot stars show flatter slopes and higher activity level than cool stars, and the behaviour is more notable in the $R_{\rm{Hα}}^{+}$-$R_{o}$ relation. This is consistent with recent studies using other activity proxies, including $L_{\rm{x}}/L_{\rm{bol}}$, $R_{\rm{HK}}^{'}$ and amplitudes of optical light curves. Most of our targets have multiple observations, and some of them exhibit significant variability of ${\rm{Hα}}$ emissions, which may cause the large scatters shown in the decay region. We find three targets exhibiting positive correlation in rotational phase, possibly indicating that their optical light curves are dominated by hot faculae rather than cool starspots.

preprint2022arXiv

Task-Adaptive Few-shot Node Classification

Node classification is of great importance among various graph mining tasks. In practice, real-world graphs generally follow the long-tail distribution, where a large number of classes only consist of limited labeled nodes. Although Graph Neural Networks (GNNs) have achieved significant improvements in node classification, their performance decreases substantially in such a few-shot scenario. The main reason can be attributed to the vast generalization gap between meta-training and meta-test due to the task variance caused by different node/class distributions in meta-tasks (i.e., node-level and class-level variance). Therefore, to effectively alleviate the impact of task variance, we propose a task-adaptive node classification framework under the few-shot learning setting. Specifically, we first accumulate meta-knowledge across classes with abundant labeled nodes. Then we transfer such knowledge to the classes with limited labeled nodes via our proposed task-adaptive modules. In particular, to accommodate the different node/class distributions among meta-tasks, we propose three essential modules to perform node-level, class-level, and task-level adaptations in each meta-task, respectively. In this way, our framework can conduct adaptations to different meta-tasks and thus advance the model generalization performance on meta-test tasks. Extensive experiments on four prevalent node classification datasets demonstrate the superiority of our framework over the state-of-the-art baselines. Our code is provided at https://github.com/SongW-SW/TENT.

preprint2022arXiv

Uncertainty-Aware Cascaded Dilation Filtering for High-Efficiency Deraining

Deraining is a significant and fundamental computer vision task, aiming to remove the rain streaks and accumulations in an image or video captured on a rainy day. Existing deraining methods usually make heuristic assumptions of the rain model, which compels them to employ complex optimization or iterative refinement for high recovery quality. This, however, leads to time-consuming methods and affects the effectiveness for addressing rain patterns that deviate from the assumptions. In this paper, we propose a simple yet efficient deraining method by formulating deraining as a predictive filtering problem without complex rain model assumptions. Specifically, we identify spatially-variant predictive filtering (SPFilt) that adaptively predicts proper kernels via a deep network to filter different individual pixels. Since the filtering can be implemented via well-accelerated convolution, our method can be significantly efficient. We further propose the EfDeRain+ that contains three main contributions to address residual rain traces, multi-scale, and diverse rain patterns without harming the efficiency. First, we propose the uncertainty-aware cascaded predictive filtering (UC-PFilt) that can identify the difficulties of reconstructing clean pixels via predicted kernels and remove the residual rain traces effectively. Second, we design the weight-sharing multi-scale dilated filtering (WS-MS-DFilt) to handle multi-scale rain streaks without harming the efficiency. Third, to eliminate the gap across diverse rain patterns, we propose a novel data augmentation method (i.e., RainMix) to train our deep models. By combining all contributions with sophisticated analysis on different variants, our final method outperforms baseline methods on four single-image deraining datasets and one video deraining dataset in terms of both recovery quality and speed.
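
The idea of filtering each pixel with its own kernel can be illustrated with a small sketch. This is not the EfDeRain+ code: in the paper the kernels are predicted by a deep network, whereas here they are simply passed in by the caller, and the function name, the 3 x 3 kernel size and the replicate padding are assumptions made for the example.

```python
def spatially_variant_filter(image, kernels, k=3):
    """Filter a 2D grayscale image with a different k x k kernel at every pixel.

    image:   H x W nested list of floats
    kernels: H x W nested list, each entry a flat list of k*k weights
             (hypothetical stand-in for network-predicted kernels)
    """
    h, w = len(image), len(image[0])
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-pad, pad + 1):
                for dj in range(-pad, pad + 1):
                    # replicate-pad at the borders by clamping indices
                    ii = min(h - 1, max(0, i + di))
                    jj = min(w - 1, max(0, j + dj))
                    weight = kernels[i][j][(di + pad) * k + (dj + pad)]
                    acc += weight * image[ii][jj]
            out[i][j] = acc
    return out
```

With a kernel whose only non-zero weight is the centre, the output equals the input; any other choice lets each pixel draw on its neighbours differently, which is what makes the filter spatially variant.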

preprint2021arXiv

J-PLUS: Support Vector Machine Applied to STAR-GALAXY-QSO Classification

Context. In modern astronomy, machine learning has proved to be efficient and effective to mine the big data from the newest telescopes. Spectral surveys enable us to characterize millions of objects, while long exposure time observations and wide surveys constrain their strides from millions to billions. Aims. In this study, we construct a supervised machine learning algorithm, to classify the objects in the Javalambre Photometric Local Universe Survey first data release (J-PLUS DR1). Methods. The sample set is featured with 12-waveband photometry, and magnitudes are labeled with spectrum-based catalogs, including Sloan Digital Sky Survey spectroscopic data, Large Sky Area Multi-Object Fiber Spectroscopic Telescope, and VERONCAT - Veron Catalog of Quasars & AGN. The performance of the classifier is presented with applications of blind test validations based on RAdial Velocity Extension, Kepler Input Catalog, 2MASS Redshift Survey, and the UV-bright Quasar Survey. A new algorithm is applied to constrain the extrapolation that could decrease accuracies for many machine learning classifiers. Results. The accuracies of the classifier are 96.5% in blind test and 97.0% in training cross validation. The F1-scores for each class are presented to show the precision of the classifier. We also discuss different methods to constrain the po

preprint2021arXiv

LAMOST Time-Domain Survey: First Results of four $K$2 plates

From Oct. 2019 to Apr. 2020, LAMOST performs a time-domain spectroscopic survey of four $K$2 plates with both low- and med-resolution observations. The low-resolution spectroscopic survey gains 282 exposures ($\approx$46.6 hours) over 25 nights, yielding a total of about 767,000 spectra, and the med-resolution survey takes 177 exposures ($\approx$49.1 hours) over 27 nights, collecting about 478,000 spectra. More than 70%/50% of low-resolution/med-resolution spectra have signal-to-noise ratio higher than 10. We determine stellar parameters (e.g., $T_{\rm eff}$, log$g$, [Fe/H]) and radial velocity (RV) with different methods, including LASP, DD-Payne, and SLAM. In general, these parameter estimations from different methods show good agreement, and the stellar parameter values are consistent with those of APOGEE. We use the $Gaia$ DR2 RV data to calculate a median RV zero point (RVZP) for each spectrograph exposure by exposure, and the RVZP-corrected RVs agree well with the APOGEE data. The stellar evolutionary and spectroscopic masses are estimated based on the stellar parameters, multi-band magnitudes, distances and extinction values. Finally, we construct a binary catalog including about 2700 candidates by analyzing their light curves, fitting the RV data, calculating the binarity parameters from med-resolution spectra, and cross-matching the spatially resolved binary catalog from $Gaia$ EDR3. The LAMOST TD survey is expected to get breakthrough in various scientific topics, such as binary system, stellar activity, and stellar pulsation, etc.
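
The per-exposure zero-point correction described in this abstract can be sketched as a grouped median. The snippet below is an illustrative assumption, not the survey pipeline: the record layout, the function name and the sign convention (zero point = median of reference RV minus measured RV, then added to the measured RV) are all hypothetical.

```python
from collections import defaultdict
from statistics import median

def correct_rv_zero_point(rows):
    """Apply a median RV zero point per (spectrograph, exposure) group.

    rows: list of dicts with keys 'spectrograph', 'exposure', 'rv' and
          'rv_ref' (reference RV, or None when the star has no match).
    Returns (rvzp, corrected) where corrected[i] is None if the group of
    row i contains no star with a reference RV.
    """
    offsets = defaultdict(list)
    for r in rows:
        if r["rv_ref"] is not None:
            key = (r["spectrograph"], r["exposure"])
            offsets[key].append(r["rv_ref"] - r["rv"])
    # one zero point per spectrograph and exposure (assumed convention)
    rvzp = {key: median(vals) for key, vals in offsets.items()}
    corrected = []
    for r in rows:
        zp = rvzp.get((r["spectrograph"], r["exposure"]))
        corrected.append(None if zp is None else r["rv"] + zp)
    return rvzp, corrected
```

Using the median rather than the mean keeps a few discrepant stars (for example binaries with variable RVs) from shifting the zero point of a whole exposure.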

preprint2021arXiv

The Disk Veiling Effect of the Black Hole Low-Mass X-ray Binary A0620-00

The optical light curves of quiescent black hole low-mass X-ray binaries often exhibit significant non-ellipsoidal variabilities, showing the photospheric radiation of the companion star is veiled by other source of optical emission. Assessing this "veiling" effect is critical to the black hole mass measurement. Here in this work, we carry out a strictly simultaneous spectroscopic and photometric campaign on the prototype of black hole low-mass X-ray binary A0620-00. We find that for each observation epoch, the extra optical flux beyond a pure ellipsoidal modulation is positively correlated with the fraction of veiling emission, indicating the accretion disk contributes most of the non-ellipsoidal variations. Meanwhile, we also obtain a K2V spectral classification of the companion, as well as the measurements of the companion's rotational velocity $v \sin i = 83.8\pm1.9$ km s$^{-1}$ and the mass ratio between the companion and the black hole $q=0.063\pm0.004$.

preprint2020arXiv

A Catalog of Short Period Spectroscopic and Eclipsing Binaries Identified from the LAMOST & PTF Surveys

Binaries play key roles in determining stellar parameters and exploring stellar evolution models. We build a catalog of 88 eclipsing binaries with spectroscopic information, taking advantage of observations from both the Large Sky Area Multi-Object fiber Spectroscopic Telescope (LAMOST) and the Palomar Transient Factory (PTF) surveys. A software pipeline is constructed to identify binary candidates by examining their light curves. The orbital periods of binaries are derived from the Lomb-Scargle method. The key distinguishing features of eclipsing binaries are recognized by a new filter, Flat Test. We classify the eclipsing binaries by applying Fourier analysis on the light curves. Among all the binary stars, 13 binaries are identified as eclipsing binaries for the first time. The catalog contains the following information: position, primary eclipsing magnitude and time, eclipsing depth, the number of photometry and radial velocity observations, largest radial velocity difference, binary type, the effective temperature of observable star $T_{\rm eff}$, and surface gravity of observable star log $g$. The false-positive probability is calculated by using both a Monte Carlo simulation and real data from the SDSS Stripe 82 Standard Catalog. The binaries in the catalog are mostly with a period of less than one day. The period distribution shows a 0.22-day cut-off which is consistent with the low probability of an eclipsing binary rotating with such a period.
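
As an illustration of the period search mentioned in this abstract, the classical Lomb-Scargle periodogram can be written in a few lines of standard-library Python. This is a sketch under stated assumptions and not the catalog's pipeline: the function names, the unnormalized power and the simple linear frequency grid are choices made for the example.

```python
import math

def lomb_scargle_power(times, values, freq):
    """Classical (unnormalized) Lomb-Scargle power at one frequency.

    freq is in cycles per unit of `times` and must be greater than zero.
    """
    mean = sum(values) / len(values)
    y = [v - mean for v in values]
    omega = 2.0 * math.pi * freq
    # the time offset tau makes the sine and cosine terms orthogonal
    s2 = sum(math.sin(2.0 * omega * t) for t in times)
    c2 = sum(math.cos(2.0 * omega * t) for t in times)
    tau = math.atan2(s2, c2) / (2.0 * omega)
    cos_terms = [math.cos(omega * (t - tau)) for t in times]
    sin_terms = [math.sin(omega * (t - tau)) for t in times]
    yc = sum(a * b for a, b in zip(y, cos_terms))
    ys = sum(a * b for a, b in zip(y, sin_terms))
    cc = sum(c * c for c in cos_terms)
    ss = sum(s * s for s in sin_terms)
    return 0.5 * (yc * yc / cc + ys * ys / ss)

def best_period(times, values, f_min, f_max, n_freq):
    """Return the period (1 / frequency) with the highest power on a linear grid."""
    step = (f_max - f_min) / (n_freq - 1)
    freqs = [f_min + i * step for i in range(n_freq)]
    best = max(freqs, key=lambda f: lomb_scargle_power(times, values, f))
    return 1.0 / best
```

Because the method fits sinusoids at arbitrary sample times, it works directly on unevenly sampled survey light curves without interpolation.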

preprint2020arXiv

Label-guided Learning for Text Classification

Text classification is one of the most important and fundamental tasks in natural language processing. Performance of this task mainly depends on text representation learning. Currently, most existing learning frameworks mainly focus on encoding local contextual information between words. These methods always neglect to exploit global clues, such as label information, for encoding text information. In this study, we propose a label-guided learning framework LguidedLearn for text representation and classification. Our method is novel but simple: we only insert a label-guided encoding layer into the commonly used text representation learning schemas. That label-guided layer performs label-based attentive encoding to map the universal text embedding (encoded by a contextual information learner) into different label spaces, resulting in label-wise embeddings. In our proposed framework, the label-guided layer can be easily and directly applied with a contextual encoding method to perform joint learning. Text information is encoded based on both the local contextual information and the global label clues. Therefore, the obtained text embeddings are more robust and discriminative for text classification. Extensive experiments are conducted on benchmark datasets to illustrate the effectiveness of our proposed method.

preprint2020arXiv

Learning to Segment Anatomical Structures Accurately from One Exemplar

Accurate segmentation of critical anatomical structures is at the core of medical image analysis. The main bottleneck lies in gathering the requisite expert-labeled image annotations in a scalable manner. Methods that permit to produce accurate anatomical structure segmentation without using a large amount of fully annotated training images are highly desirable. In this work, we propose a novel contribution of Contour Transformer Network (CTN), a one-shot anatomy segmentor including a naturally built-in human-in-the-loop mechanism. Segmentation is formulated by learning a contour evolution behavior process based on graph convolutional networks (GCNs). Training of our CTN model requires only one labeled image exemplar and leverages additional unlabeled data through newly introduced loss functions that measure the global shape and appearance consistency of contours. We demonstrate that our one-shot learning method significantly outperforms non-learning-based methods and performs competitively to the state-of-the-art fully supervised deep learning approaches. With minimal human-in-the-loop editing feedback, the segmentation performance can be further improved and tailored towards the observer desired outcomes. This can facilitate the clinician designed imaging-based biomarker assessments (to support personalized quantitative clinical diagnosis) and outperforms fully supervised baselines.

preprint2020arXiv

Modeling Cross-view Interaction Consistency for Paired Egocentric Interaction Recognition

With the development of Augmented Reality (AR), egocentric action recognition (EAR) plays an important role in accurately understanding demands from the user. However, EAR is designed to help recognize human-machine interaction in a single egocentric view, and thus has difficulty capturing interactions between two face-to-face AR users. Paired egocentric interaction recognition (PEIR) is the task of collaboratively recognizing the interactions between two persons with the videos in their corresponding views. Unfortunately, existing PEIR methods always directly use a linear decision function to fuse the features extracted from two corresponding egocentric videos, which ignores the consistency of interaction in paired egocentric videos. The interactions in paired videos are consistent, and the features extracted from them are correlated with each other. On top of that, we propose to build the relevance between two views using bilinear pooling, which captures the consistency of two views at the feature level. Specifically, each neuron in the feature maps from one view connects to the neurons from another view, which guarantees the compact consistency between two views. Then all possible paired neurons are used for PEIR to exploit the consistent information they carry. To be efficient, we use compact bilinear pooling with Count Sketch to avoid directly computing the outer product in bilinear pooling. Experimental results on the PEV dataset show the superiority of the proposed method on the PEIR task.
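
The Count Sketch trick mentioned at the end of this abstract can be illustrated as follows. The sketch relies on a known identity: the circular convolution of the Count Sketches of two vectors equals a Count Sketch of their outer product, so the outer product never has to be formed. This is a generic illustration, not the paper's code; all function names are hypothetical, and the convolution is computed directly here, whereas practical implementations usually use the FFT.

```python
import random

def make_hash(n, d, rng):
    """Random bucket indices in [0, d) and random signs in {-1, +1} for n inputs."""
    h = [rng.randrange(d) for _ in range(n)]
    s = [rng.choice((-1, 1)) for _ in range(n)]
    return h, s

def count_sketch(x, h, s, d):
    """Project x into d buckets: each entry is added, with its sign, to one bucket."""
    out = [0.0] * d
    for xi, hi, si in zip(x, h, s):
        out[hi] += si * xi
    return out

def circular_convolution(a, b):
    """Direct O(d^2) circular convolution (an FFT would be used in practice)."""
    d = len(a)
    out = [0.0] * d
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[(i + j) % d] += ai * bj
    return out

def compact_bilinear(x, y, hx, sx, hy, sy, d):
    """d-dimensional sketch of the outer product of x and y, without forming it."""
    return circular_convolution(count_sketch(x, hx, sx, d),
                                count_sketch(y, hy, sy, d))
```

For two feature vectors of lengths n and m, the result has d entries instead of the n * m entries of the full bilinear feature, and d is a free parameter chosen by the user.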

preprint2020arXiv

MUTATT: Visual-Textual Mutual Guidance for Referring Expression Comprehension

Referring expression comprehension (REC) aims to localize a text-related region in a given image by a referring expression in natural language. Existing methods focus on how to build convincing visual and language representations independently, which may significantly isolate visual and language information. In this paper, we argue that for REC the referring expression and the target region are semantically correlated and subject, location and relationship consistency exist between vision and language. On top of this, we propose a novel approach called MutAtt to construct mutual guidance between vision and language, which treats vision and language equally and thus yields compact information matching. Specifically, for each module of subject, location and relationship, MutAtt builds two kinds of attention-based mutual guidance strategies. One strategy is to generate vision-guided language embedding for the sake of matching relevant visual feature. The other reversely generates language-guided visual feature to match relevant language embedding. This mutual guidance strategy can effectively guarantee the vision-language consistency in three modules. Experiments on three popular REC datasets demonstrate that the proposed approach outperforms the current state-of-the-art methods.

preprint2020arXiv

Phase-dependent study of near-infrared disk emission lines in LB-1

The mass, origin and evolutionary stage of the binary system LB-1 has been the subject of intense debate, following the claim that it hosts an $\sim$70$M_{\odot}$ black hole, in stark contrast with the expectations for stellar remnants in the Milky Way. We conducted a high-resolution, phase-resolved spectroscopic study of the near-infrared Paschen lines in this system, using the 3.5-m telescope at Calar Alto Observatory. We find that Pa$β$ and Pa$γ$ (after proper subtraction of the stellar absorption component) are well fitted with a standard double-peaked model, typical of disk emission. We measured the velocity shifts of the red and blue peaks at 28 orbital phases: the line center has an orbital motion in perfect antiphase with the stellar motion, and the radial velocity amplitude ranges from 8 to 13 km/s for different choices of lines and profile modelling. We interpret this curve as proof that the disk is tracing the orbital motion of the primary, ruling out the circumbinary disk and the hierarchical triple scenarios. The phase-averaged peak-to-peak half-separation (proxy for the projected rotational velocity of the outer disk) is $\sim$70 km s$^{-1}$, larger than the stellar orbital velocity and also inconsistent with a circumbinary disk. From those results, we infer a primary mass 4--8 times higher than the secondary mass. Moreover, we show that the ratio of the blue and red peaks (V/R intensity ratio) has a sinusoidal behaviour in phase with the secondary star, which can be interpreted as the effect of external irradiation by the secondary star on the outer disk. Finally, we briefly discuss our findings in the context of alternative scenarios recently proposed for LB-1. Definitive tests between alternative solutions will require further astrometric data from $Gaia$.

preprint2020arXiv

Shadow Removal by a Lightness-Guided Network with Training on Unpaired Data

Shadow removal can significantly improve the image visual quality and has many applications in computer vision. Deep learning methods based on CNNs have become the most effective approach for shadow removal by training on either paired data, where both the shadow and underlying shadow-free versions of an image are known, or unpaired data, where shadow and shadow-free training images are totally different with no correspondence. In practice, CNN training on unpaired data is more preferred given the easiness of training data collection. In this paper, we present a new Lightness-Guided Shadow Removal Network (LG-ShadowNet) for shadow removal by training on unpaired data. In this method, we first train a CNN module to compensate for the lightness and then train a second CNN module with the guidance of lightness information from the first CNN module for final shadow removal. We also introduce a loss function to further utilise the colour prior of existing data. Extensive experiments on widely used ISTD, adjusted ISTD and USR datasets demonstrate that the proposed method outperforms the state-of-the-art methods with training on unpaired data.

preprint2020arXiv

Zero-Dimensional Organic-Inorganic Hybrid Material with Ultra-Narrow-Red Emission at Room Temperature

Recently, low-dimensional organic-inorganic hybrid halide compounds have aroused great attention in the optoelectronic field, due to the unique topology and optical properties. Herein, we report an Mn4+ doped [N(CH3)4]2TiF6 zero-dimensional organic-inorganic hybrid phosphor, which could not only exhibit very narrow and pure red emission, but also maintain efficient emission intensity at room temperature. The crystal structure, photoluminescence properties and temperature sensing application are discussed. The excellent temperature dependent luminescent properties are attributed to the rigid structure and isolated MnF62- octahedra in the total crystal framework. These results will help design suitable materials and devices in both warm white light emitting diodes and optical sensors.

preprint2019arXiv

Machine Learning Regression of extinction in the second $Gaia$ Data Release

Machine learning has become a popular tool to help us make better decisions and predictions, based on experiences, observations and analysing patterns within a given data set without explicit functions. In this paper, we describe an application of the supervised machine-learning algorithm to the extinction regression for the second Gaia data release, based on the combination of Large Sky Area Multi-Object Fiber Spectroscopic Telescope, Sloan Extension for Galactic Understanding and Exploration and the Apache Point Observatory Galactic Evolution Experiment. The derived extinction in our training sample is consistent with other spectrum-based estimates, and its standard deviation of the cross validations is 0.0127 mag. A blind test is carried out using the RAdial Velocity Experiment catalog, and the standard deviation is 0.0372 mag. Such a precise training sample enables us to regress the extinction, E(BP-RP), for 133 million stars in the second Gaia data release. Of these, 106 million stars have uncertainties less than 0.1 mag, which suffer less bias from the external regression. We also find that there are high deviations between the extinctions from photometry-based methods, and between spectrum- and photometry-based methods. This implies that spectrum-based methods could bring more signal to a regressing model than multi-band photometry, and a higher signal-to-noise ratio would acquire a more reliable result.

preprint2016arXiv

An Automated CNN Recommendation System for Image Classification Tasks

Nowadays, CNNs are widely used in practical applications for image classification tasks. However, designing a CNN model is highly specialized work that is very difficult for ordinary users. Moreover, even for CNN experts, selecting an optimal model for a specific task may still take a lot of time (to train many different models). To solve this problem, we propose an automated CNN recommendation system for image classification tasks. Our system is able to evaluate precisely both the complexity of the classification task and the classification ability of the CNN model. Using the evaluation results, the system can recommend the optimal CNN model that best matches the task. The recommendation process is very fast since it does not need any model training. The experimental results show that the evaluation methods are accurate and reliable.

preprint2016arXiv

Chandra ACIS Survey of X-ray Point Sources in Nearby Galaxies. II. X-ray Luminosity Functions and Ultraluminous X-ray Sources

Based on the recently completed {\it Chandra}/ACIS survey of X-ray point sources in nearby galaxies, we study the X-ray luminosity functions (XLFs) for X-ray point sources in different types of galaxies and the statistical properties of ultraluminous X-ray sources (ULXs). Uniform procedures are developed to compute the detection threshold, to estimate the foreground/background contamination, and to calculate the XLFs for individual galaxies and groups of galaxies, resulting in an XLF library for 343 galaxies of different types. With the large number of surveyed galaxies, we have studied the XLFs and ULX properties across different host galaxy types, and confirm with good statistics that the XLF slope flattens from lenticular ($α\sim1.50\pm0.07$) to elliptical ($\sim1.21\pm0.02$), to spirals ($\sim0.80\pm0.02$), to peculiars ($\sim0.55\pm0.30$), and to irregulars ($\sim0.26\pm0.10$). The XLF break dividing the neutron star and black hole binaries is also confirmed, albeit at quite different break luminosities for different types of galaxies. A radial dependency is found for ellipticals, with a flatter XLF slope for sources located between $D_{25}$ and 2$D_{25}$, suggesting the XLF slopes in the outer region of early-type galaxies are dominated by low-mass X-ray binaries in globular clusters. This study shows that the ULX rate in early-type galaxies is $0.24\pm0.05$ ULXs per surveyed galaxy, on a $5σ$ confidence level. The XLF for ULXs in late-type galaxies extends smoothly until it drops abruptly around $4\times10^{40}$ erg s$^{-1}$, and this break may suggest a mild boundary between the stellar black hole population possibly including 30 $M_\odot$ black holes with super-Eddington radiation and intermediate mass black holes.

preprint2016arXiv

Chandra ACIS Survey of X-ray Point Sources: The Source Catalog

The $Chandra$ archival data is a valuable resource for various studies on different topics of X-ray astronomy. In this paper, we utilize this wealth and present a uniformly processed data set, which can be used to address a wide range of scientific questions. The data analysis procedures are applied to 10,029 ACIS observations, which produces 363,530 source detections, belonging to 217,828 distinct X-ray sources. This number is twice the size of the $Chandra$ Source Catalog (Version 1.1). The catalogs in this paper provide abundant estimates of the detected X-ray source properties, including source positions, counts, colors, fluxes, luminosities, variability statistics, etc. Cross-correlation of these objects with galaxies shows 17,828 sources are located within the $D_{25}$ isophotes of 1110 galaxies, and 7504 sources are located between the $D_{25}$ and 2$D_{25}$ isophotes of 910 galaxies. Contamination analysis with the log$N$--log$S$ relation indicates that 51.3% of objects within 2$D_{25}$ isophotes are truly relevant to galaxies, and the "net" source fraction increases to 58.9%, 67.3%, and 69.1% for sources with luminosities above $10^{37}$, $10^{38}$, and $10^{39}$ erg s$^{-1}$. Among the possible scientific uses of this catalog, we discuss the possibility to study intra-observation variability, inter-observation variability, and supersoft sources.

preprint2016arXiv

On Study of the Binarized Deep Neural Network for Image Classification

Recently, the deep neural network (derived from the artificial neural network) has attracted many researchers' attention for its outstanding performance. However, since this network requires high-performance GPUs and large storage, it is very hard to use on individual devices. To improve the deep neural network, many attempts have been made to refine the network structure or training strategy. Unlike those attempts, in this paper we focus on the basic propagation function of the artificial neural network and propose the binarized deep neural network. This network is a pure binary system, in which all values and calculations are binarized. As a result, our network can save a lot of computational resources and storage, which makes it possible to use on various devices. Moreover, the experimental results demonstrate the feasibility of the proposed network.

preprint2016arXiv

Sharp Chandra View of ROSAT All-Sky Survey Bright Sources: I. Improvement of Positional Accuracy

The ROSAT All-Sky Survey (RASS) represents one of the most complete and sensitive soft X-ray all-sky surveys to date. However, the deficient positional accuracy of the RASS Bright Source Catalog (BSC) and subsequent lack of firm optical identifications affect the multi-wavelength studies of X-ray sources. The widely used positional errors $σ_{pos}$ based on the Tycho Stars Catalog (Tycho-1) have previously been applied for identifying objects in the optical band. The considerably sharper Chandra view covers a fraction of RASS sources, whose $σ_{pos}$ could be improved by utilizing the sub-arcsec positional accuracy of Chandra observations. We cross-match X-ray objects between the BSC and Chandra sources extracted from the Advanced CCD Imaging Spectrometer (ACIS) archival observations. A combined counterparts list (BSCxACIS) with Chandra spatial positions weighted by the X-ray flux of multi-counterparts is employed to evaluate and improve the former identifications of BSC with the other surveys. Based on these identification evaluations, we suggest that the point-likeness of BSC sources and INS (isolated neutron stars) candidates should be carefully reconsidered.

preprint2016arXiv

Who Leads the Clothing Fashion: Style, Color, or Texture? A Computational Study

It is well known that clothing fashion is a distinctive and often habitual trend in the style in which a person dresses. Clothing fashions are usually expressed with visual stimuli such as style, color, and texture. However, it is not clear which visual stimulus places higher/lower influence on the updating of clothing fashion. In this study, computer vision and machine learning techniques are employed to analyze the influence of different visual stimuli on clothing-fashion updates. Specifically, a classification-based model is proposed to quantify the influence of different visual stimuli, in which each visual stimulus's influence is quantified by its corresponding accuracy in fashion classification. Experimental results demonstrate that, on clothing-fashion updates, the style holds a higher influence than the color, and the color holds a higher influence than the texture.

preprint2015arXiv

An Updated Ultraviolet Catalog of GALEX Nearby Galaxies

The ultraviolet catalog of nearby galaxies made by Gil de Paz et al. (2007) presents the integrated photometry and surface brightness profiles for 1034 nearby galaxies observed by the Galaxy Evolution Explorer (GALEX). We provide an updated catalog of 4138 nearby galaxies based on the latest General Release (GR6/GR7) of GALEX. These galaxies are selected from HyperLeda with apparent diameters larger than 1$'$. From the surface brightness profiles accurately measured with the deep NUV and FUV images, we have calculated asymptotic magnitudes, aperture (D25) magnitudes, colors, structural parameters (effective radii and concentration indices), luminosities, and effective surface brightness. Archival optical and infrared photometry from HyperLeda, 2MASS, and IRAS are also integrated into the catalog. Our parameter measurements and some analyses are consistent with those of Gil de Paz et al. (2007). The (FUV $- K$) color provides a good criterion to distinguish early and late-type galaxies, which can be improved further with the concentration indices. The IRX-$β$ relation is reformulated with our UV-selected nearby galaxies.

preprint2015arXiv

BATC 15 Band Photometry of the Open Cluster NGC 188

This paper presents CCD multicolour photometry for the old open cluster NGC 188. The observations were carried out as a part of the Beijing--Arizona--Taiwan--Connecticut Multicolour Sky Survey from 1995 February to 2008 March, using 15 intermediate-band filters covering 3000--10000 Å. By fitting the Padova theoretical isochrones to our data, the fundamental parameters of this cluster are derived: an age of $t=7.5\pm 0.5$ Gyr, a distance modulus of $(m-M)_0=11.17\pm0.08$, and a reddening of $E(B-V)=0.036\pm0.010$. The radial surface density profile of NGC 188 is obtained by star counts. By fitting the King model, the structural parameters of NGC 188 are derived: a core radius of $R_{c}=3.80'$, a tidal radius of $R_{t}=44.78'$, and a concentration parameter of $C_{0}=\log(R_{t}/R_{c})=1.07$. Fitting the mass function to a power-law function $ϕ(m) \propto m^α$, the slopes of mass functions for different spatial regions are derived. We find that NGC 188 presents a slope break in the mass function. The break mass is $m_{\rm break}=0.885~M_{\odot}$. In the mass range above $m_{\rm break}$, the slope of the overall region is $α=-0.76$. The slope of the core region is $α=1.09$, and the slopes of the external regions are $α=-0.86$ and $α=-2.15$, respectively. In the mass range below $m_{\rm break}$, these slopes are $α=0.12$, $α=4.91$, $α=1.33$, and $α=-1.09$, respectively. The mass segregation in NGC 188 is reflected in the obvious variation of the slopes in different spatial regions of this cluster.
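As a quick arithmetic check on the structural parameters quoted above, the concentration parameter can be recomputed from the two radii. This is a minimal sketch, assuming the logarithm in $C_{0}=\log(R_{t}/R_{c})$ is base 10; the function name `king_concentration` is illustrative and does not come from the paper.

```python
import math


def king_concentration(tidal_radius, core_radius):
    """Return log10(R_t / R_c); both radii must be in the same unit."""
    return math.log10(tidal_radius / core_radius)


# Radii quoted in the abstract, in arcminutes.
c0 = king_concentration(44.78, 3.80)
print(f"C0 = {c0:.2f}")
```

With the quoted radii this gives a value that rounds to the 1.07 stated in the abstract.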

preprint2015arXiv

Co-interest Person Detection from Multiple Wearable Camera Videos

Wearable cameras, such as Google Glass and GoPro, enable video data collection over larger areas and from different views. In this paper, we tackle a new problem of locating the co-interest person (CIP), i.e., the one who draws attention from most camera wearers, from temporally synchronized videos taken by multiple wearable cameras. Our basic idea is to exploit the motion patterns of people and use them to correlate the persons across different videos, instead of performing appearance-based matching as in traditional video co-segmentation/localization. This way, we can identify the CIP even if a group of people with similar appearance are present in the view. More specifically, we detect a set of persons on each frame as the candidates of the CIP and then build a Conditional Random Field (CRF) model to select the one with consistent motion patterns in different videos and high spatial-temporal consistency in each video. We collect three sets of wearable-camera videos for testing the proposed algorithm. All the involved people have similar appearances in the collected videos and the experiments demonstrate the effectiveness of the proposed algorithm.

preprint2015arXiv

Combining Local Appearance and Holistic View: Dual-Source Deep Neural Networks for Human Pose Estimation

We propose a new learning-based method for estimating 2D human pose from a single image, using Dual-Source Deep Convolutional Neural Networks (DS-CNN). Recently, many methods have been developed to estimate human pose by using pose priors that are estimated from physiologically inspired graphical models or learned from a holistic perspective. In this paper, we propose to integrate both the local (body) part appearance and the holistic view of each local part for more accurate human pose estimation. Specifically, the proposed DS-CNN takes a set of image patches (category-independent object proposals for training and multi-scale sliding windows for testing) as the input and then learns the appearance of each local part by considering their holistic views in the full body. Using DS-CNN, we achieve both joint detection, which determines whether an image patch contains a body joint, and joint localization, which finds the exact location of the joint in the image patch. Finally, we develop an algorithm to combine these joint detection/localization results from all the image patches for estimating the human pose. The experimental results show the effectiveness of the proposed method by comparing to the state-of-the-art human-pose estimation methods based on pose priors that are estimated from physiologically inspired graphical models or learned from a holistic perspective.

preprint2015arXiv

Feature Sampling Strategies for Action Recognition

Although dense local spatial-temporal features with bag-of-features representation achieve state-of-the-art performance for action recognition, the huge number and size of the features prevent current methods from scaling up to real-size problems. In this work, we investigate different types of feature sampling strategies for action recognition, namely dense sampling, uniformly random sampling and selective sampling. We propose two effective selective sampling methods using object proposal techniques. Experiments conducted on a large video dataset show that we are able to achieve better average recognition accuracy using 25% fewer features, through one of the proposed selective sampling methods, and even maintain comparable accuracy while discarding 70% of the features.

preprint2015arXiv

LooseCut: Interactive Image Segmentation with Loosely Bounded Boxes

One popular approach to interactively segment the foreground object of interest from an image is to annotate a bounding box that covers the foreground object. Then, a binary labeling is performed to achieve a refined segmentation. One major issue of the existing algorithms for such interactive image segmentation is their preference for an input bounding box that tightly encloses the foreground object. This increases the annotation burden, and prevents these algorithms from utilizing automatically detected bounding boxes. In this paper, we develop a new LooseCut algorithm that can handle cases where the input bounding box only loosely covers the foreground object. We propose a new Markov Random Fields (MRF) model for segmentation with loosely bounded boxes, including a global similarity constraint to better distinguish the foreground and background, and an additional energy term to encourage consistent labeling of similar-appearance pixels. This MRF model is then solved by an iterated max-flow algorithm. In the experiments, we evaluate LooseCut on three publicly available image datasets, and compare its performance against several state-of-the-art interactive image segmentation algorithms. We also show that LooseCut can be used for enhancing the performance of unsupervised video segmentation and image saliency detection.

preprint2015arXiv

Relativistic baryonic jets from an ultraluminous supersoft X-ray source

The formation of relativistic jets by an accreting compact object is one of the fundamental mysteries of astrophysics. While the theory is poorly understood, observations of relativistic jets from systems known as microquasars have led to a well-established phenomenology. Relativistic jets are not expected from sources with soft or supersoft X-ray spectra, although two such systems are known to produce relatively low-velocity bipolar outflows. Here we report optical spectra of an ultraluminous supersoft X-ray source (ULS) in the nearby galaxy M81 (M81 ULS-1) showing blueshifted broad Hα emission lines, characteristic of baryonic jets with relativistic speeds. The time-variable jets have projected velocities ~17 per cent of the speed of light, and seem similar to those in the prototype microquasar SS 433. Such relativistic jets are not expected to be launched from white dwarfs, but an origin from a black hole or neutron star in M81 ULS-1 is hard to reconcile with its constant soft X-rays. The completely unexpected presence of relativistic jets in a ULS challenges the canonical theories for jet formation, but may possibly be explained by a long-speculated super-critically accreting black hole with optically thick outflows.

preprint2015arXiv

Spectroscopic Studies of an Ultraluminous Supersoft X-Ray Source in M81

Ultraluminous supersoft X-ray sources (ULSs) exhibit supersoft X-ray spectra with blackbody temperatures below 0.1 keV and bolometric luminosities above 10$^{39}$ ergs s$^{-1}$. In this Letter, we report the first optical spectroscopic observations of a ULS in M81 using the LRIS spectrograph on the Keck I telescope. The detected Balmer emission lines show a mean intrinsic velocity dispersion of 400$\pm$80 km s$^{-1}$, which is consistent with that from an accretion disk. The spectral index of the continuum on the blue side is also consistent with the multi-color disk model. The H$_α$ emission line exhibits a velocity of $\sim$180 km s$^{-1}$ relative to the local stellar environment, suggesting that this ULS may be a halo system in M81 belonging to an old population. No significant shift is found for the H$_α$ emission line between two observations separated by four nights.

preprint2015arXiv

Unsupervised Cross-Domain Recognition by Identifying Compact Joint Subspaces

This paper introduces a new method to solve the cross-domain recognition problem. Unlike traditional domain adaptation methods, which rely on a global domain shift for all classes between the source and target domains, the proposed method is more flexible in capturing individual class variations across domains. By adopting a natural and widely used assumption -- "the data samples from the same class should lie on a low-dimensional subspace, even if they come from different domains", the proposed method circumvents the limitation of the global domain shift, and solves the cross-domain recognition by finding the compact joint subspaces of the source and target domains. Specifically, given labeled samples in the source domain, we construct subspaces for each of the classes. Then we construct subspaces in the target domain, called anchor subspaces, by collecting unlabeled samples that are close to each other and highly likely to all fall into the same class. The corresponding class label is then assigned by minimizing a cost function which reflects the overlap and topological structure consistency between subspaces across source and target domains, and within anchor subspaces, respectively. We further combine the anchor subspaces with the corresponding source subspaces to construct the compact joint subspaces. Subsequently, one-vs-rest SVM classifiers are trained in the compact joint subspaces and applied to unlabeled data in the target domain. We evaluate the proposed method on two widely used datasets: an object recognition dataset for computer vision tasks, and a sentiment classification dataset for natural language processing tasks. Comparison results demonstrate that the proposed method outperforms the comparison methods on both datasets.

preprint2014arXiv

Measuring the superconducting coherence length in thin films using a two-coil experiment

We present measurements of the superconducting coherence length ξ in thin (d < 100 Å) films of MoGe alloy and Nb using a combination of linear and nonlinear mutual inductance techniques. As the alternating current in the drive coil is increased at fixed temperature, we see a crossover from linear to nonlinear coupling to the pickup coil, consistent with the unbinding of vortex-antivortex pairs as the peak pair momentum nears $\hbar/ξ$ and the unbinding barrier vanishes. We compare measurements of ξ made by this mutual inductance technique to values determined from the films' upper critical fields, thereby confirming the applicability of a recent calculation of the upper limit on a vortex-free state in our experiment.
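For readers unfamiliar with how a coherence length follows from an upper critical field, the textbook Ginzburg-Landau relation $B_{c2} = Φ_0 / (2π ξ^2)$ can be inverted for ξ. The sketch below assumes that relation; the abstract does not state which expression or extrapolation the authors used, and the 1 T field is a hypothetical input chosen only for illustration, not a value from the paper.

```python
import math

# Exact SI values of the Planck constant (J s) and elementary charge (C).
PLANCK_H = 6.62607015e-34
ELEMENTARY_CHARGE = 1.602176634e-19

# Superconducting flux quantum h / (2e), in webers.
FLUX_QUANTUM = PLANCK_H / (2 * ELEMENTARY_CHARGE)


def coherence_length(b_c2_tesla):
    """Invert B_c2 = Phi_0 / (2 pi xi^2) and return xi in metres."""
    return math.sqrt(FLUX_QUANTUM / (2 * math.pi * b_c2_tesla))


# Hypothetical upper critical field of 1 T (illustrative only).
xi = coherence_length(1.0)
print(f"xi = {xi * 1e9:.1f} nm")
```

Because ξ scales as the inverse square root of the field, quadrupling the upper critical field halves the inferred coherence length.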

preprint2014arXiv

Multiplicity One Theorems, S-Version

We prove three theorems on the $S$-version of multiplicity one.

preprint2014arXiv

New 2MASS Near-infrared Photometry for Globular Clusters in M31

We present 2MASS $JHK_{\rm s}$ photometry for 913 star clusters and candidates in the field of M31, which are selected from the latest Revised Bologna Catalog of M31 globular clusters (GCs) and candidates. The photometric measurements in this paper supplement this catalog, and provide the most comprehensive and homogeneous photometric catalog for M31 GCs in the $JHK_{\rm s}$ bandpasses. In general, our photometry is consistent with previous measurements. The globular cluster luminosity function (GCLF) peaks for the confirmed GCs, derived by fitting a $t_5$ distribution using the maximum-likelihood method, are: $J_0 = 15.348_{-0.208}^{+0.206}$, $H_0 = 14.703_{-0.180}^{+0.176}$, and ${K_{\rm s}}_0 = 14.534_{-0.146}^{+0.142}$, all of which agree well with previous studies. The GCLFs differ between the metal-rich (MR) and metal-poor (MP) subpopulations and between the inner and outer subpopulations, in that MP clusters are fainter than their MR counterparts and the inner clusters are brighter than the outer ones, confirming previous results. The NIR colors of the GC candidates are on average redder than those of the confirmed GCs, which leads to an obscure bimodal distribution of the color indices. The relation of $(V-K_{\rm s})_0$ and metallicity shows a notable departure from linearity, with a shallower slope towards the redder end. The color-magnitude diagram (CMD) and color-color diagram show that many GC candidates are located off the evolutionary tracks, suggesting that some of them may be false M31 GC candidates. The CMD also shows that the initial mass function of M31 GCs covers a large range, and the majority of the clusters have initial masses between $10^3$ and $10^6$ $M_{\odot}$.

preprint2014arXiv

Spectral Energy Distributions and Masses of 304 M31 Old Star Clusters

This paper presents CCD multicolor photometry for 304 old star clusters in the nearby spiral galaxy M31, of which 55 have photometry obtained for the first time. The observations were carried out as a part of the Beijing--Arizona--Taiwan--Connecticut (BATC) Multicolor Sky Survey from 1995 February to 2008 March, using 15 intermediate-band filters covering 3000--10000 Å. Detailed comparisons show that our photometry is in agreement with previous measurements. Based on the ages and metallicities from Caldwell et al. and the photometric measurements here, we estimated the clusters' masses by comparing their multicolor photometry with stellar population synthesis models. The results show that the sample clusters have masses between $\sim 3\times10^4 M_\odot$ and $\sim 10^7 M_\odot$ with a peak at $\sim 4\times10^5 M_\odot$. The masses here are in good agreement with those in previous studies. Combined with the masses of young star clusters of M31 from Wang et al., we find that the peak mass of old clusters is ten times that of young clusters.

preprint2014arXiv

Video In Sentences Out

We present a system that produces sentential descriptions of video: who did what to whom, and where and how they did it. Action class is rendered as a verb, participant objects as noun phrases, properties of those objects as adjectival modifiers in those noun phrases, spatial relations between those participants as prepositional phrases, and characteristics of the event as prepositional-phrase adjuncts and adverbial modifiers. Extracting the information needed to render these linguistic entities requires an approach to event recognition that recovers object tracks, the track-to-role assignments, and changing body posture.

preprint2013arXiv

Lie bialgebras of generalized loop Virasoro algebras

The first cohomology group of a generalized loop Virasoro algebra with coefficients in the tensor product of its adjoint module is shown to be trivial. The result is applied to prove that Lie bialgebra structures on generalized loop Virasoro algebras are coboundary triangular. We then generalize the results to generalized map Virasoro algebras.

preprint2013arXiv

Structural parameters for globular clusters in M31

In this paper, we present surface brightness profiles for 79 globular clusters in M31, using images observed with the Hubble Space Telescope, some of which are from new observations. For the first time, the structural and dynamical parameters are derived by fitting the profiles to several different models. The results show that in the majority of cases, King models fit the M31 clusters as well as Wilson models, and better than Sérsic models. However, there are 11 clusters best fitted by Sérsic models with the Sérsic index $n>2$, meaning that they have cuspy central density profiles. These clusters may be the well-known core-collapsed candidates. There is a bimodality in the size distribution of M31 clusters at large radii, which is different from their Galactic counterparts. In general, the properties of clusters in M31 and the Milky Way fall in the same regions of parameter spaces. The tight correlations of cluster properties indicate a "fundamental plane" for clusters, which reflects some universal physical conditions and processes operating at the epoch of cluster formation.

preprint2012arXiv

Age and mass studies for young star clusters in M31 from SEDs-fit

In this paper, we present photometry for young star clusters in M31, which are selected from Caldwell et al. These star clusters have been observed as part of the Beijing--Arizona--Taiwan--Connecticut (BATC) Multicolor Sky Survey from 1995 February to 2008 March. The BATC images including these star clusters are taken with 15 intermediate-band filters covering 3000--10000 Å. Combined with photometry in the GALEX far- and near-ultraviolet, broad-band $UBVRI$, SDSS $ugriz$, and infrared $JHK_{\rm s}$ of the Two Micron All Sky Survey, we obtain their accurate spectral energy distributions (SEDs) from 1538--20000 Å. We derive these star clusters' ages and masses by comparing their SEDs with stellar population synthesis models. Our results are in good agreement with previous determinations. The mean age and mass of the young clusters ($<2$ Gyr) are about 385 Myr and $2\times 10^4 {M_\odot}$, respectively. There are two distinct peaks in the age distribution, the highest peak at an age of $\sim$ 60 Myr and a secondary peak around 250 Myr, while the mass distribution shows a single peak around $10^4 {M_\odot}$. A few young star clusters have two-body relaxation times greater than their ages, indicating that those clusters have not been well dynamically relaxed and therefore have not established thermal equilibrium. There are several regions showing aggregations of young star clusters around the 10 kpc ring and the outer ring, indicating that the distribution of the young star clusters is well correlated with M31's star-forming regions. The young massive star clusters (age $\leq 100$ Myr and mass $\geq 10^4 {M_\odot}$) show apparent concentration around the ring splitting region, suggesting a recent passage of a satellite galaxy (M32) through the M31 disk.

preprint2012arXiv

Derivations and automorphism groups of the original deformative Schrödinger-Virasoro algebras

In this paper, we determine the derivation algebra and the automorphism group of the original deformative Schrödinger-Virasoro algebras, each of which is the semi-direct product Lie algebra of the Witt algebra and its tensor density module ${\rm I^g}(a,b)$.

preprint2012arXiv

Large-Scale Automatic Labeling of Video Events with Verbs Based on Event-Participant Interaction

We present an approach to labeling short video clips with English verbs as event descriptions. A key distinguishing aspect of this work is that it labels videos with verbs that describe the spatiotemporal interaction between event participants, humans and objects interacting with each other, abstracting away all object-class information and fine-grained image characteristics, and relying solely on the coarse-grained motion of the event participants. We apply our approach to a large set of 22 distinct verb classes and a corpus of 2,584 videos, yielding two surprising outcomes. First, a classification accuracy of greater than 70% on a 1-out-of-22 labeling task and greater than 85% on a variety of 1-out-of-10 subsets of this labeling task is independent of the choice of which of two different time-series classifiers we employ. Second, we achieve this level of accuracy using a highly impoverished intermediate representation consisting solely of the bounding boxes of one or two event participants as a function of time. This indicates that successful event recognition depends more on the choice of appropriate features that characterize the linguistic invariants of the event classes than on the particular classifier algorithms.

preprint2012arXiv

Metal Abundance and Kinematical Properties of M81 Globular Cluster System

In this paper, we present metal abundance properties of 144 M81 globular clusters, which constitute the largest globular cluster sample in M81 to date. Our main results are: the distribution of metallicities is bimodal, with metallicity peaks at [Fe/H] $\sim -1.51$ and $-0.58$, and the metal-poor globular clusters tend to be less spatially concentrated than the metal-rich ones; the metal-rich globular clusters in M81 do not demonstrate a centrally concentrated spatial distribution as the metal-rich ones in M31 do; like our Galaxy and M31, the globular clusters in M81 have a small radial metallicity gradient. These results are consistent with those obtained based on a small sample of M81 globular clusters. In addition, we find evidence for a strong rotation of the M81 globular cluster system around the minor axis; this rotation is present in the metal-rich globular cluster subsample, while the metal-poor globular cluster subsample shows no evidence for rotation. The most significant difference between the rotation of the metal-rich and metal-poor globular clusters occurs at intermediate projected galactocentric radii. The results of this paper confirm the conclusion of Schroder et al. that M81's metal-rich globular clusters at intermediate projected radii were associated with a thick disk of M81.

preprint2012arXiv

On Local and Global Conjugacy

In this note we discuss and classify LFMO-special representations, which, under certain functoriality, give instances of the failure of multiplicity one.

preprint2012arXiv

Structural Parameters for Globular Clusters in the Outer Halo of M31

In this paper, we present internal surface brightness profiles, using images in the F606W and F814W filter bands observed with the Advanced Camera for Surveys on the Hubble Space Telescope, for ten globular clusters (GCs) in the outer halo of M31. Standard King models are fitted to the profiles to derive their structural and dynamical parameters. The results show that, in general, the properties of clusters in M31 and the Milky Way fall in the same regions of parameter spaces. The outer halo GCs of M31 have larger ellipticities than most GCs in M31 and the Milky Way. Their large ellipticities may be due to galaxy tides coming from satellite dwarf galaxies of M31 or may be related to the apparently more vigorous accretion or merger history that M31 has experienced. The tight correlation of cluster binding energy $E_b$ with mass $M_{\rm mod}$ indicates that the "fundamental plane" does exist for clusters, regardless of their host environments, which is consistent with previous studies.

preprint 2012 arXiv

Video In Sentences Out

We present a system that produces sentential descriptions of video: who did what to whom, and where and how they did it. Action class is rendered as a verb, participant objects as noun phrases, properties of those objects as adjectival modifiers in those noun phrases, spatial relations between those participants as prepositional phrases, and characteristics of the event as prepositional-phrase adjuncts and adverbial modifiers. Extracting the information needed to render these linguistic entities requires an approach to event recognition that recovers object tracks, the track-to-role assignments, and changing body posture.

preprint 2011 arXiv

Age and mass constraints for a young massive cluster in M31 based on spectral-energy-distribution fitting

VDB0-B195D is a massive, blue star cluster in M31. It was observed as part of the Beijing-Arizona-Taiwan-Connecticut (BATC) Multicolor Sky Survey using 15 intermediate-band filters covering a wavelength range of 3000--10,000 Å. Based on aperture photometry, we obtain its spectral-energy distribution (SED) as defined by the 15 BATC filters. We apply previously established relations between the BATC intermediate-band and the Johnson-Cousins $UBVRI$ broad-band systems to convert our BATC photometry to the standard system. A detailed comparison shows that our newly derived $VRI$ magnitudes are fully consistent with previous results, while our new $B$ magnitude agrees to within $2σ$. In addition, we determine the cluster's age and mass by comparing its SED (from 3000 to 20,000 Å, comprising photometric data in the 15 BATC intermediate bands, optical broad-band $BVRI$, and 2MASS near-infrared $JHK_{\rm s}$ data) with theoretical stellar population synthesis models, resulting in age and mass determinations of $60.0\pm 8.0$ Myr and $(1.1-1.6) \times 10^5 M_\odot$, respectively. These age and mass estimates confirm previous suggestions that VDB0-B195D is a young massive cluster in M31.

preprint 2011 arXiv

Age and structure parameters of a remote M31 globular cluster B514 based on HST, 2MASS, GALEX and BATC observations

B514 is a remote M31 globular cluster located at a projected distance of $R_p \sim 55$ kpc. Deep observations with the Advanced Camera for Surveys (ACS) on the Hubble Space Telescope (HST) are used to provide accurate integrated light and star counts for B514. By coupling an analysis of the integrated-light distribution with star counts, we are able to reliably follow the profile of the cluster out to ~40". Based on the combined profile, we study in detail its surface brightness distribution in the F606W and F814W filters, and determine its structural parameters by fitting a single-mass isotropic King model. The results show that the surface brightness distribution departs from the best-fit King model for r > 10". B514 is quite flattened in the inner region, and has a larger half-light radius than the majority of normal globular clusters of the same luminosity. Interestingly, in the $M_V$ versus $\log R_h$ plane, B514 lies nearly on the threshold for ordinary globular clusters as defined by Mackey & van den Bergh. In addition, B514 was observed as part of the Beijing-Arizona-Taiwan-Connecticut (BATC) Multicolor Sky Survey, using 13 intermediate-band filters covering a wavelength range of 3000--8500 Å. Based on aperture photometry, we obtain its SED as defined by the 13 BATC filters. We determine the cluster's age and mass by comparing its SED (from 2267 to 20,000 Å, comprising photometric data in the GALEX near-ultraviolet band, 5 SDSS bands, 13 BATC intermediate bands, and the 2MASS near-infrared $JHK_{\rm s}$ filters) with theoretical stellar population synthesis models, resulting in an age of $11.5\pm3.5$ Gyr. This age confirms the previous suggestion that B514 is an old GC in M31. B514 has a mass of $(0.96-1.08) \times 10^6 M_\odot$, and is a medium-mass globular cluster in M31.

preprint 2011 arXiv

Extension of vertex operator algebra $V_{\hat{H}_{4}}(\ell,0)$

We classify the irreducible restricted modules for the affine Nappi-Witten Lie algebra $\hat{H}_{4}$ under some natural conditions. It turns out that the representation theory of $\hat{H}_{4}$ is quite different from that of Heisenberg algebras. We also study the extension of the vertex operator algebra $V_{\hat{H}_{4}}(\ell,0)$ by the even lattice $L$. We give the structure of the extension $V_{\hat{H}_{4}}(\ell,0)\otimes \mathbb{C}[L]$ and its irreducible modules via irreducible representations of $V_{\hat{H}_{4}}(\ell,0)$ viewed as a vertex algebra.

preprint 2010 arXiv

Determination of fundamental properties of an M31 globular cluster from main-sequence photometry

The M31 globular cluster B379 is the first extragalactic cluster whose age was determined by main-sequence photometry. In this method, the age of a cluster is obtained by fitting its color-magnitude diagram (CMD) with stellar evolutionary models. However, different stellar evolutionary models adopt different ingredients, such as the range of stellar masses, opacities, equations of state, and other recipes, so it is worth checking whether different models give consistent results for the same cluster. Brown et al. (2004a) constrained the age of B379 by comparing its CMD with isochrones of the 2006 VandenBerg models. Using the BC03 simple stellar population (SSP) models and multi-band photometry, Ma et al. (2007) independently determined the age of B379, which is in good agreement with the determination of Brown et al. (2004a). The BC03 models are calculated based on the Padova evolutionary tracks. It is therefore necessary to check whether an age of B379 determined directly from the Padova evolutionary tracks agrees with the determination of Brown et al. (2004a). In this paper, we re-determine its age using isochrones of the Padova stellar evolutionary models. In addition, the metal abundance, the distance modulus, and the reddening value for B379 are also determined. The results are consistent with previous determinations, including the age obtained by Brown et al. (2004a). This paper thus confirms the consistency of the age scale of B379 between the Padova isochrones and the 2006 VandenBerg isochrones, i.e., the comparison of results between Brown et al. (2004a) and Ma et al. (2007) is meaningful. The results obtained in this paper are: the metallicity [M/H] = -0.325, the age $τ=11.0\pm1.5$ Gyr, the reddening value E(B-V) = 0.08, and the distance modulus $(m-M)_{0}=24.44\pm0.10$.
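
As a reading aid for the distance modulus quoted above, the following minimal sketch converts $(m-M)_{0}=24.44\pm0.10$ into a distance using the standard relation $d = 10^{((m-M)_0 + 5)/5}$ pc. The function and variable names are illustrative and are not taken from the paper.

```python
def distance_from_modulus(mu):
    """Return the distance in parsecs for a true distance modulus mu = (m - M)_0."""
    return 10 ** ((mu + 5.0) / 5.0)

# Values quoted in the abstract for B379.
MU = 24.44
MU_ERR = 0.10

# Central value and the range implied by the quoted uncertainty, in kpc.
d_kpc = distance_from_modulus(MU) / 1e3
d_lo_kpc = distance_from_modulus(MU - MU_ERR) / 1e3
d_hi_kpc = distance_from_modulus(MU + MU_ERR) / 1e3

print(f"d = {d_kpc:.0f} kpc (range {d_lo_kpc:.0f} to {d_hi_kpc:.0f} kpc)")
```

Under this relation the quoted modulus corresponds to a distance of about 773 kpc. Because the conversion is exponential, the symmetric error in magnitudes maps to a slightly asymmetric error in distance.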

preprint 2010 arXiv

Spectral energy distributions and age estimates of 104 M31 globular clusters

We present photometry of 104 M31 globular clusters (GCs) and GC candidates in 15 intermediate-band filters of the Beijing-Arizona-Taiwan-Connecticut (BATC) photometric system. The GCs and GC candidates were selected from the Revised Bologna Catalog (v.3.5). We obtain the cluster ages by comparing the photometric data with up-to-date theoretical synthesis models. The photometric data used are GALEX far- and near-ultraviolet and 2MASS near-infrared $JHK_{\rm s}$ magnitudes, combined with optical photometry. The ages of our sample clusters cover a large range, although most clusters are younger than 10 Gyr. Combined with the ages obtained in our series of previous papers focusing on the M31 GC system, we present the full M31 GC age distribution. The M31 GC system contains populations of young and intermediate-age GCs, as well as the 'usual' complement of well-known old GCs, i.e., GCs of similar age to the majority of the Galactic GCs. In addition, young GCs (and GC candidates) are distributed nearly uniformly in radial distance from the center of M31, while most old GCs (and GC candidates) are more strongly concentrated.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Source provenance

Where this author record came from

arxiv, confidence 95%

external id: arxiv:2605.15235:author:5:song-wang

Imported May 20, 2026; synced May 21, 2026

arxiv, confidence 95%

external id: arxiv:2604.25592:author:1:song-wang