Researcher profile

Jianmin Wang

Jianmin Wang contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
22 works
0 followers
12 topics
4 close collaborators

Published work

22 published items

preprint | 2026 | arXiv

Composite spectrum of Little Red Dot from a standard inner disk and an unstable outer disk

The James Webb Space Telescope (JWST) has revealed a new class of high-redshift, very red, compact broad-line sources, termed "little red dots" (LRDs). The physical mechanism driving these properties remains elusive. We construct spectral energy distributions (SEDs) with spectroscopic redshifts for 28 LRDs and find they exhibit V-shaped SEDs with a common break frequency of $\nu_{\rm b}\simeq10^{14.96\pm0.06}$ Hz. We propose that the unique SEDs can be well explained by the combination of an inner standard disk and an outer gravitationally unstable accretion disk with Toomre parameter $Q\sim1$, where the outer disk has a temperature of $\sim2000$-$4000$ K and mainly radiates in near-infrared to optical wavebands. The composite spectrum from this model naturally explains the V-shaped continuum and reproduces intrinsically luminous infrared-optical emission without requiring extreme dust extinction or unusual stellar populations. Even considering possible dense gas around the disk to account for pronounced Balmer breaks in some LRDs, the intrinsic optical-UV emission is only suppressed by factors of $\lesssim2$-$3$, which suggests that most LRDs are sub-Eddington and intrinsically weak. These results provide new insights into early-phase black hole growth and galaxy evolution.
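
For orientation, the two quantities quoted in this abstract can be written out. The block below is a sketch that assumes the standard Toomre criterion for a thin, gaseous, Keplerian disk (sound speed $c_{\rm s}$, orbital angular frequency $\Omega$, surface density $\Sigma$); the abstract does not state which form the authors use. The second line simply converts the quoted break frequency to a wavelength.

```latex
Q = \frac{c_{\rm s}\,\Omega}{\pi G \Sigma}
\qquad \text{(marginal gravitational instability at } Q \sim 1\text{)}

\lambda_{\rm b} = \frac{c}{\nu_{\rm b}}
\simeq \frac{2.998\times10^{10}\ {\rm cm\,s^{-1}}}{10^{14.96}\ {\rm Hz}}
\approx 3.3\times10^{-5}\ {\rm cm} \approx 0.33\ \mu{\rm m}
```

With the quoted uncertainty of $\pm0.06$ dex, the break wavelength falls roughly between 0.29 and 0.38 $\mu$m.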

preprint | 2025 | arXiv

Introduction to the Chinese Space Station Survey Telescope (CSST)

The Chinese Space Station Survey Telescope (CSST) is an upcoming Stage-IV sky survey telescope, distinguished by its large field of view (FoV), high image quality, and multi-band observation capabilities. It can simultaneously conduct precise measurements of the Universe by performing multi-color photometric imaging and slitless spectroscopic surveys. The CSST is equipped with five scientific instruments, i.e. Multi-band Imaging and Slitless Spectroscopy Survey Camera (SC), Multi-Channel Imager (MCI), Integral Field Spectrograph (IFS), Cool Planet Imaging Coronagraph (CPI-C), and THz Spectrometer (TS). Using these instruments, CSST is expected to make significant contributions and discoveries across various astronomical fields, including cosmology, galaxies and active galactic nuclei (AGN), the Milky Way and nearby galaxies, stars, exoplanets, Solar System objects, astrometry, and transients and variable sources. This review aims to provide a comprehensive overview of the CSST instruments, observational capabilities, data products, and scientific potential.

preprint | 2023 | arXiv

Graph Exploration with Embedding-Guided Layouts

Node-link diagrams are widely used to visualize graphs. Most graph layout algorithms only use graph topology for aesthetic goals (e.g., minimize node occlusions and edge crossings) or use node attributes for exploration goals (e.g., preserve visible communities). Existing hybrid methods that bind the two perspectives still suffer from various generation restrictions (e.g., limited input types and required manual adjustments and prior knowledge of graphs) and the imbalance between aesthetic and exploration goals. In this paper, we propose a flexible embedding-based graph exploration pipeline to enjoy the best of both graph topology and node attributes. First, we leverage embedding algorithms for attributed graphs to encode the two perspectives into latent space. Then, we present an embedding-driven graph layout algorithm, GEGraph, which can achieve aesthetic layouts with better community preservation to support an easy interpretation of the graph structure. Next, graph explorations are extended based on the generated graph layout and insights extracted from the embedding vectors. Illustrated with examples, we build a layout-preserving aggregation method with Focus+Context interaction and a related nodes searching approach with multiple proximity strategies. Finally, we conduct quantitative and qualitative evaluations, a user study, and two case studies to validate our approach.

preprint | 2022 | arXiv

Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy

Unsupervised detection of anomaly points in time series is a challenging problem, which requires the model to derive a distinguishable criterion. Previous methods tackle the problem mainly through learning pointwise representation or pairwise association; however, neither is sufficient to reason about the intricate dynamics. Recently, Transformers have shown great power in unified modeling of pointwise representation and pairwise association, and we find that the self-attention weight distribution of each time point can embody rich association with the whole series. Our key observation is that, due to the rarity of anomalies, it is extremely difficult to build nontrivial associations from abnormal points to the whole series, so the anomalies' associations mainly concentrate on their adjacent time points. This adjacent-concentration bias implies an association-based criterion inherently distinguishable between normal and abnormal points, which we highlight through the Association Discrepancy. Technically, we propose the Anomaly Transformer with a new Anomaly-Attention mechanism to compute the association discrepancy. A minimax strategy is devised to amplify the normal-abnormal distinguishability of the association discrepancy. The Anomaly Transformer achieves state-of-the-art results on six unsupervised time series anomaly detection benchmarks of three applications: service monitoring, space & earth exploration, and water treatment.
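
The idea of comparing a time point's attention row with a locality prior can be illustrated with a small numerical sketch. The Gaussian prior, the symmetric KL divergence, and all function names below are assumptions made for illustration and are not taken from the abstract; this is not the authors' implementation.

```python
import numpy as np

def gaussian_prior(n, i, sigma):
    """Row i of a locality prior: Gaussian kernel over |j - i|, normalised to sum to 1."""
    j = np.arange(n)
    w = np.exp(-((j - i) ** 2) / (2.0 * sigma ** 2))
    return w / w.sum()

def symmetric_kl(p, q, eps=1e-12):
    """KL(p || q) + KL(q || p), with a small eps to avoid log(0)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

n, i = 50, 25
prior = gaussian_prior(n, i, sigma=2.0)

# Hypothetical attention rows: one spread over the whole series,
# one concentrated on adjacent time points.
spread_attention = np.full(n, 1.0 / n)
local_attention = gaussian_prior(n, i, sigma=3.0)

d_spread = symmetric_kl(prior, spread_attention)
d_local = symmetric_kl(prior, local_attention)
print(d_spread, d_local)
```

In this toy setting the row that stays local has a much smaller discrepancy from the prior than the row that attends across the whole series, which is the kind of contrast the abstract describes between abnormal and normal points.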

preprint | 2022 | arXiv

Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting

Extending the forecasting time is a critical demand for real applications, such as extreme weather early warning and long-term energy consumption planning. This paper studies the long-term forecasting problem of time series. Prior Transformer-based models adopt various self-attention mechanisms to discover the long-range dependencies. However, intricate temporal patterns of the long-term future prohibit the model from finding reliable dependencies. Also, Transformers have to adopt the sparse versions of point-wise self-attentions for long series efficiency, resulting in the information utilization bottleneck. Going beyond Transformers, we design Autoformer as a novel decomposition architecture with an Auto-Correlation mechanism. We break with the pre-processing convention of series decomposition and renovate it as a basic inner block of deep models. This design empowers Autoformer with progressive decomposition capacities for complex time series. Further, inspired by the stochastic process theory, we design the Auto-Correlation mechanism based on the series periodicity, which conducts the dependencies discovery and representation aggregation at the sub-series level. Auto-Correlation outperforms self-attention in both efficiency and accuracy. In long-term forecasting, Autoformer yields state-of-the-art accuracy, with a 38% relative improvement on six benchmarks, covering five practical applications: energy, traffic, economics, weather and disease. Code is available at this repository: https://github.com/thuml/Autoformer.
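
Period discovery from autocorrelation, which the abstract names as the basis of Auto-Correlation, can be sketched in a few lines. The code below is a generic illustration that assumes a circular autocorrelation computed with the FFT (Wiener-Khinchin theorem); the function names are hypothetical, and the repository linked above remains the reference for the actual mechanism.

```python
import numpy as np

def autocorrelation_fft(x):
    """Circular autocorrelation via the FFT, normalised so that lag 0 equals 1."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    spectrum = np.fft.rfft(x)
    acf = np.fft.irfft(spectrum * np.conj(spectrum), n=len(x))
    return acf / acf[0]

def top_k_lags(x, k):
    """Return the k lags in [1, len(x) // 2) with the highest autocorrelation."""
    acf = autocorrelation_fft(x)
    candidates = acf[1:len(x) // 2]
    return np.argsort(candidates)[::-1][:k] + 1

# Toy series with a period of 24 steps.
t = np.arange(240)
series = np.sin(2 * np.pi * t / 24)
acf = autocorrelation_fft(series)
lags = top_k_lags(series, k=3)
print(lags)
```

For this toy series the selected lags are multiples of the 24-step period, so sub-series shifted by those lags line up with each other.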

preprint | 2022 | arXiv

Decoupled Adaptation for Cross-Domain Object Detection

Cross-domain object detection is more challenging than object classification since multiple objects exist in an image and the location of each object is unknown in the unlabeled target domain. As a result, when we adapt features of different objects to enhance the transferability of the detector, the features of the foreground and the background are easily confused, which may hurt the discriminability of the detector. Besides, previous methods focused on category adaptation but ignored another important part of object detection, i.e., the adaptation of bounding box regression. To this end, we propose D-adapt, namely Decoupled Adaptation, to decouple the adversarial adaptation and the training of the detector. We also fill the gap in regression domain adaptation for object detection by introducing a bounding box adaptor. Experiments show that D-adapt achieves state-of-the-art results on four cross-domain object detection tasks and yields 17% and 21% relative improvement on benchmark datasets Clipart1k and Comic2k in particular.

preprint | 2022 | arXiv

Flowformer: Linearizing Transformers with Conservation Flows

Transformers based on the attention mechanism have achieved impressive success in various areas. However, the attention mechanism has a quadratic complexity, significantly impeding Transformers from dealing with numerous tokens and scaling up to bigger models. Previous methods mainly utilize the similarity decomposition and the associativity of matrix multiplication to devise linear-time attention mechanisms. They avoid degeneration of attention to a trivial distribution by reintroducing inductive biases such as locality, at the expense of model generality and expressiveness. In this paper, we linearize Transformers free from specific inductive biases based on the flow network theory. We cast attention as the information flow aggregated from the sources (values) to the sinks (results) through the learned flow capacities (attentions). Within this framework, we apply the property of flow conservation to attention and propose the Flow-Attention mechanism of linear complexity. By respectively conserving the incoming flow of sinks for source competition and the outgoing flow of sources for sink allocation, Flow-Attention inherently generates informative attentions without using specific inductive biases. Empowered by the Flow-Attention, Flowformer yields strong performance in linear time for wide areas, including long sequence, time series, vision, natural language, and reinforcement learning. The code and settings are available at this repository: https://github.com/thuml/Flowformer.
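
The "similarity decomposition and associativity" trick attributed here to previous methods can be checked numerically. The sketch below shows generic kernelised linear attention, not Flow-Attention itself; the feature map `phi` and the function names are hypothetical choices made for illustration.

```python
import numpy as np

def phi(x):
    """Hypothetical positive feature map (ELU(x) + 1)."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def quadratic_attention(Q, K, V):
    """Builds the full n x n similarity matrix: O(n^2 d) time."""
    S = phi(Q) @ phi(K).T
    return (S @ V) / S.sum(axis=1, keepdims=True)

def linear_attention(Q, K, V):
    """Reorders the same product as phi(Q) (phi(K)^T V): O(n d^2) time."""
    kv = phi(K).T @ V                  # d x d summary of keys and values
    z = phi(Q) @ phi(K).sum(axis=0)    # per-query normaliser, length n
    return (phi(Q) @ kv) / z[:, None]

rng = np.random.default_rng(0)
n, d = 64, 8
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out_quadratic = quadratic_attention(Q, K, V)
out_linear = linear_attention(Q, K, V)
print(np.allclose(out_quadratic, out_linear))
```

Both functions compute the same output; only the order of the matrix products differs, which is what removes the quadratic dependence on sequence length.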

preprint | 2022 | arXiv

From Big to Small: Adaptive Learning to Partial-Set Domains

Domain adaptation targets knowledge acquisition and dissemination from a labeled source domain to an unlabeled target domain under distribution shift. Still, the common requirement of identical class space shared across domains hinders applications of domain adaptation to partial-set domains. Recent advances show that deep pre-trained models of large scale endow rich knowledge to tackle diverse downstream tasks of small scale. Thus, there is a strong incentive to adapt models from large-scale domains to small-scale domains. This paper introduces Partial Domain Adaptation (PDA), a learning paradigm that relaxes the identical class space assumption to the weaker requirement that the source class space subsumes the target class space. First, we present a theoretical analysis of partial domain adaptation, which uncovers the importance of estimating the transferable probability of each class and each instance across domains. Then, we propose Selective Adversarial Network (SAN and SAN++) with a bi-level selection strategy and an adversarial adaptation mechanism. The bi-level selection strategy up-weights each class and each instance simultaneously for source supervised training, target self-training, and source-target adversarial adaptation through the transferable probability estimated alternately by the model. Experiments on standard partial-set datasets and more challenging tasks with superclasses show that SAN++ outperforms several domain adaptation methods.

preprint | 2022 | arXiv

MetaSets: Meta-Learning on Point Sets for Generalizable Representations

Deep learning techniques for point clouds have achieved strong performance on a range of 3D vision tasks. However, it is costly to annotate large-scale point sets, making it critical to learn generalizable representations that can transfer well across different point sets. In this paper, we study a new problem of 3D Domain Generalization (3DDG) with the goal to generalize the model to other unseen domains of point clouds without any access to them in the training process. It is a challenging problem due to the substantial geometry shift from simulated to real data, such that most existing 3D models underperform due to overfitting the complete geometries in the source domain. We propose to tackle this problem via MetaSets, which meta-learns point cloud representations from a group of classification tasks on carefully-designed transformed point sets containing specific geometry priors. The learned representations are more generalizable to various unseen domains of different geometries. We design two benchmarks for Sim-to-Real transfer of 3D point clouds. Experimental results show that MetaSets outperforms existing 3D deep learning methods by large margins.

preprint | 2022 | arXiv

PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning

The predictive learning of spatiotemporal sequences aims to generate future images by learning from the historical context, where the visual dynamics are believed to have modular structures that can be learned with compositional subsystems. This paper models these structures by presenting PredRNN, a new recurrent network, in which a pair of memory cells are explicitly decoupled, operate in nearly independent transition manners, and finally form unified representations of the complex environment. Concretely, besides the original memory cell of LSTM, this network is featured by a zigzag memory flow that propagates in both bottom-up and top-down directions across all layers, enabling the learned visual dynamics at different levels of RNNs to communicate. It also leverages a memory decoupling loss to keep the memory cells from learning redundant features. We further propose a new curriculum learning strategy to force PredRNN to learn long-term dynamics from context frames, which can be generalized to most sequence-to-sequence models. We provide detailed ablation studies to verify the effectiveness of each component. Our approach is shown to obtain highly competitive results on five datasets for both action-free and action-conditioned predictive learning scenarios.

preprint | 2022 | arXiv

Quantum Interference between Photons and Single Quanta of Stored Atomic Coherence

Essential for building quantum networks over remote independent nodes, the indistinguishability of photons has been extensively studied by observing the coincidence dip in the Hong-Ou-Mandel interferometer. However, indistinguishability is not limited to the same type of bosons. For the first time, we hereby observe quantum interference between flying photons and a single quantum of stored atomic coherence (magnon) in an atom-light beam splitter interface. We demonstrate that the Hermiticity of this interface determines the type of quantum interference between photons and magnons. Consequently, not only the bunching behavior that characterizes bosons is observed, but counterintuitively, fermionlike antibunching as well. The hybrid nature of the demonstrated magnon-photon quantum interface can be applied to versatile quantum memory platforms, and can lead to fundamentally different photon distributions from those occurring in boson sampling.

preprint | 2022 | arXiv

Ranking and Tuning Pre-trained Models: A New Paradigm for Exploiting Model Hubs

Model hubs with many pre-trained models (PTMs) have become a cornerstone of deep learning. Although built at a high cost, they remain under-exploited: practitioners usually pick one PTM from the provided model hub by popularity and then fine-tune the PTM to solve the target task. This naïve but common practice poses two obstacles to full exploitation of pre-trained model hubs: first, the PTM selection by popularity has no optimality guarantee, and second, only one PTM is used while the remaining PTMs are ignored. An alternative might be to consider all possible combinations of PTMs and extensively fine-tune each combination, but this would not only be prohibitive computationally but may also lead to statistical over-fitting. In this paper, we propose a new paradigm for exploiting model hubs that is intermediate between these extremes. The paradigm is characterized by two aspects: (1) We use an evidence maximization procedure to estimate the maximum value of label evidence given features extracted by pre-trained models. This procedure can rank all the PTMs in a model hub for various types of PTMs and tasks before fine-tuning. (2) The best ranked PTM can either be fine-tuned and deployed if we have no preference for the model's architecture or the target PTM can be tuned by the top $K$ ranked PTMs via a Bayesian procedure that we propose. This procedure, which we refer to as B-Tuning, not only improves upon specialized methods designed for tuning homogeneous PTMs, but also applies to the challenging problem of tuning heterogeneous PTMs where it yields a new level of benchmark performance.

preprint | 2022 | arXiv

Towards Natural Language Interfaces for Data Visualization: A Survey

Utilizing Visualization-oriented Natural Language Interfaces (V-NLI) as a complementary input modality to direct manipulation for visual analytics can provide an engaging user experience. It enables users to focus on their tasks rather than having to worry about how to operate visualization tools on the interface. In the past two decades, leveraging advanced natural language processing technologies, numerous V-NLI systems have been developed in academic research and commercial software, especially in recent years. In this article, we conduct a comprehensive review of the existing V-NLIs. In order to classify each paper, we develop categorical dimensions based on a classic information visualization pipeline with the extension of a V-NLI layer. The following seven stages are used: query interpretation, data transformation, visual mapping, view transformation, human interaction, dialogue management, and presentation. Finally, we also shed light on several promising directions for future work in the V-NLI community.

preprint | 2022 | arXiv

Transferability in Deep Learning: A Survey

The success of deep learning algorithms generally depends on large-scale data, while humans appear to have an inherent ability to transfer knowledge, recognizing and applying relevant knowledge from previous learning experiences when encountering and solving unseen tasks. Such an ability to acquire and reuse knowledge is known as transferability in deep learning. It has formed the long-term quest towards making deep learning as data-efficient as human learning, and has been motivating fruitful design of more powerful deep learning algorithms. We present this survey to connect different isolated areas in deep learning with their relation to transferability, and to provide a unified and complete view to investigating transferability through the whole lifecycle of deep learning. The survey elaborates the fundamental goals and challenges in parallel with the core principles and methods, covering recent cornerstones in deep architectures, pre-training, task adaptation and domain adaptation. This highlights unanswered questions on the appropriate objectives for learning transferable knowledge and for adapting the knowledge to new tasks and domains, avoiding catastrophic forgetting and negative transfer. Finally, we implement a benchmark and an open-source library, enabling a fair evaluation of deep learning methods in terms of transferability.

preprint | 2022 | arXiv

Visual Data Analysis with Task-based Recommendations

General visualization recommendation systems typically make design decisions for the dataset automatically. However, most of them can only prune meaningless visualizations but fail to recommend targeted results. This paper contributes TaskVis, a task-oriented visualization recommendation system that allows users to select their tasks precisely on the interface. We first summarize a task base with 18 classical analytic tasks by a survey both in academia and industry. On this basis, we maintain a rule base, which extends empirical wisdom with our targeted modeling of the analytic tasks. Then, our rule-based approach enumerates all the candidate visualizations through answer set programming. After that, the generated charts can be ranked by four ranking schemes. Furthermore, we introduce a task-based combination recommendation strategy, leveraging a set of visualizations to give a brief view of the dataset collaboratively. Finally, we evaluate TaskVis through a series of use cases and a user study.

preprint | 2022 | arXiv

What Makes the Story Forward? Inferring Commonsense Explanations as Prompts for Future Event Generation

Prediction over event sequences is critical for many real-world applications in Information Retrieval and Natural Language Processing. Future Event Generation (FEG) is a challenging task in event sequence prediction because it requires not only fluent text generation but also commonsense reasoning to maintain the logical coherence of the entire event story. In this paper, we propose a novel explainable FEG framework, Coep. It highlights and integrates two types of event knowledge, sequential knowledge of direct event-event relations and inferential knowledge that reflects the intermediate character psychology between events, such as intents, causes, reactions, which intrinsically pushes the story forward. To alleviate the knowledge forgetting issue, we design two modules, Im and Gm, for each type of knowledge, which are combined via prompt tuning. First, Im focuses on understanding inferential knowledge to generate commonsense explanations and provide a soft prompt vector for Gm. We also design a contrastive discriminator for better generalization ability. Second, Gm generates future events by modeling direct sequential knowledge with the guidance of Im. Automatic and human evaluation demonstrate that our approach can generate more coherent, specific, and logical future events.

preprint | 2020 | arXiv

An Approach for Process Model Extraction By Multi-Grained Text Classification

Process model extraction (PME) is a recently emerged interdiscipline between natural language processing (NLP) and business process management (BPM), which aims to extract process models from textual descriptions. Previous process extractors heavily depend on manual features and ignore the potential relations between clues of different text granularities. In this paper, we formalize the PME task into the multi-grained text classification problem, and propose a hierarchical neural network to effectively model and extract multi-grained information without manually-defined procedural features. Under this structure, we accordingly propose the coarse-to-fine (grained) learning mechanism, training multi-grained tasks in coarse-to-fine grained order to share the high-level knowledge for the low-level tasks. To evaluate our approach, we construct two multi-grained datasets from two different domains and conduct extensive experiments from different dimensions. The experimental results demonstrate that our approach outperforms the state-of-the-art methods with statistical significance and further investigations demonstrate its effectiveness.

preprint | 2020 | arXiv

Learning Individual Models for Imputation (Technical Report)

Missing numerical values are prevalent, e.g., owing to unreliable sensor readings, collection and transmission among heterogeneous sources. Unlike categorized data imputation over a limited domain, the numerical values suffer from two issues: (1) sparsity problem, the incomplete tuple may not have sufficient complete neighbors sharing the same/similar values for imputation, owing to the (almost) infinite domain; (2) heterogeneity problem, different tuples may not fit the same (regression) model. In this study, enlightened by the conditional dependencies that hold conditionally over certain tuples rather than the whole relation, we propose to learn a regression model individually for each complete tuple together with its neighbors. Our IIM, Imputation via Individual Models, thus no longer relies on sharing similar values among the k complete neighbors for imputation, but utilizes their regression results by the aforesaid learned individual (not necessarily the same) models. Remarkably, we show that some existing methods are indeed special cases of our IIM, under the extreme settings of the number l of learning neighbors considered in individual learning. In this sense, a proper number l of neighbors is essential to learn the individual models (avoid over-fitting or under-fitting). We propose to adaptively learn individual models over various numbers l of neighbors for different complete tuples. By devising efficient incremental computation, the time complexity of learning a model reduces from linear to constant. Experiments on real data demonstrate that our IIM with adaptive learning achieves higher imputation accuracy than the existing approaches.
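
The two-stage idea described above (one regression model per complete tuple, then imputation from the models of the k nearest complete neighbors) can be sketched as follows. This is a minimal illustration under stated assumptions: ordinary least squares for each individual model, Euclidean distance for neighbors, a fixed l, and a plain mean to combine the k candidate values. The abstract does not specify the combination rule, the adaptive choice of l is omitted, and all names are hypothetical.

```python
import numpy as np

def fit_individual_models(X, y, l):
    """Fit one least-squares linear model per complete tuple from its l nearest tuples."""
    models = []
    for i in range(len(X)):
        dist = np.linalg.norm(X - X[i], axis=1)
        idx = np.argsort(dist)[:l]                        # includes tuple i itself
        A = np.hstack([np.ones((len(idx), 1)), X[idx]])   # intercept column + features
        coef, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
        models.append(coef)
    return np.array(models)

def impute(x_incomplete, X, models, k):
    """Apply the models of the k nearest complete tuples and average their predictions."""
    dist = np.linalg.norm(X - x_incomplete, axis=1)
    idx = np.argsort(dist)[:k]
    features = np.concatenate([[1.0], x_incomplete])
    candidates = models[idx] @ features
    return float(candidates.mean())

# Toy relation: complete attribute x, attribute to impute y = 2x + 1.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X[:, 0] + 1.0
models = fit_individual_models(X, y, l=3)
value = impute(np.array([4.5]), X, models, k=2)
print(value)
```

Because each neighbor contributes a prediction from its own model rather than its own value, the imputed value need not coincide with any value already present among the neighbors.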

preprint | 2020 | arXiv

Minimum Class Confusion for Versatile Domain Adaptation

There are a variety of Domain Adaptation (DA) scenarios subject to label sets and domain configurations, including closed-set and partial-set DA, as well as multi-source and multi-target DA. It is notable that existing DA methods are generally designed only for a specific scenario, and may underperform for scenarios they are not tailored to. To this end, this paper studies Versatile Domain Adaptation (VDA), where one method can handle several different DA scenarios without any modification. Towards this goal, a more general inductive bias other than the domain alignment should be explored. We delve into a missing piece of existing methods: class confusion, the tendency that a classifier confuses the predictions between the correct and ambiguous classes for target examples, which is common in different DA scenarios. We uncover that reducing such pairwise class confusion leads to significant transfer gains. With this insight, we propose a general loss function: Minimum Class Confusion (MCC). It can be characterized as (1) a non-adversarial DA method without explicitly deploying domain alignment, enjoying faster convergence speed; (2) a versatile approach that can handle four existing scenarios: Closed-Set, Partial-Set, Multi-Source, and Multi-Target DA, outperforming the state-of-the-art methods in these scenarios, especially on one of the largest and hardest datasets to date (7.3% on DomainNet). Its versatility is further justified by two scenarios proposed in this paper: Multi-Source Partial DA and Multi-Target Partial DA. In addition, it can also be used as a general regularizer that is orthogonal and complementary to a variety of existing DA methods, accelerating convergence and pushing these readily competitive methods to stronger ones. Code is available at https://github.com/thuml/Versatile-Domain-Adaptation.
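
A pairwise class confusion penalty of the kind described above can be sketched from batch predictions alone. The version below is a simplified illustration, not the released MCC loss: it assumes the confusion between classes is measured by the product of predicted probabilities over a batch, row-normalised, and it omits any temperature scaling or per-example weighting. The function name is hypothetical; see the repository linked above for the authors' code.

```python
import numpy as np

def class_confusion_loss(probs, eps=1e-12):
    """Mean off-diagonal mass of a row-normalised class correlation matrix."""
    probs = np.asarray(probs, dtype=float)       # shape: (batch, num_classes)
    num_classes = probs.shape[1]
    confusion = probs.T @ probs                  # (num_classes, num_classes)
    confusion = confusion / (confusion.sum(axis=1, keepdims=True) + eps)
    off_diagonal = confusion.sum() - np.trace(confusion)
    return float(off_diagonal / num_classes)

confident = np.eye(3)                  # each example assigned to a single class
ambiguous = np.full((4, 3), 1.0 / 3)   # every example split evenly across classes
loss_confident = class_confusion_loss(confident)
loss_ambiguous = class_confusion_loss(ambiguous)
print(loss_confident, loss_ambiguous)
```

The penalty is zero when every target prediction commits to one class and grows as probability mass is shared between classes, without any explicit domain alignment term.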

preprint | 2020 | arXiv

Multi-Task Learning of Generalizable Representations for Video Action Recognition

In classic video action recognition, labels may not contain enough information about the diverse video appearance and dynamics, so existing models that are trained under the standard supervised learning paradigm may extract less generalizable features. We evaluate these models under a cross-dataset experiment setting, as the above label bias problem in video analysis is even more prominent across different data sources. We find that using the optical flows as model inputs harms the generalization ability of most video recognition models. Based on these findings, we present a multi-task learning paradigm for video classification. Our key idea is to avoid label bias and improve the generalization ability by taking data as its own supervision or supervising constraints on the data. First, we use the optical flows and the RGB frames as auxiliary supervision, and accordingly name our model Reversed Two-Stream Networks (Rev2Net). Further, we couple the auxiliary flow prediction task and the frame reconstruction task by introducing a new training objective to Rev2Net, named Decoding Discrepancy Penalty (DDP), which constrains the discrepancy of the multi-task features in a self-supervised manner. Rev2Net is shown to be effective on the classic action recognition task. It specifically shows a strong generalization ability in the cross-dataset experiments.

preprint | 2020 | arXiv

On Localized Discrepancy for Domain Adaptation

We propose discrepancy-based generalization theories for unsupervised domain adaptation. Previous theories introduced distribution discrepancies defined as the supremum over the complete hypothesis space. The hypothesis space may contain hypotheses that lead to unnecessary overestimation of the risk bound. This paper studies the localized discrepancies defined on the hypothesis space after localization. First, we show that these discrepancies have desirable properties. They could be significantly smaller than the previous discrepancies. Their values will be different if we exchange the two domains, and can thus reveal asymmetric transfer difficulties. Next, we derive improved generalization bounds with these discrepancies. We show that the discrepancies could influence the rate of the sample complexity. Finally, we further extend the localized discrepancies for achieving super transfer and derive generalization bounds that could be even more sample-efficient on the source domain.

preprint | 2020 | arXiv

Time Series Data Cleaning: From Anomaly Detection to Anomaly Repairing (Technical Report)

Errors are prevalent in time series data, such as GPS trajectories or sensor readings. Existing methods focus more on anomaly detection but not on repairing the detected anomalies. By simply filtering out the dirty data via anomaly detection, applications could still be unreliable over the incomplete time series. Instead of simply discarding anomalies, we propose to (iteratively) repair them in time series data, by creatively bonding the beauty of temporal nature in anomaly detection with the widely considered minimum change principle in data repairing. Our major contributions include: (1) a novel framework of iterative minimum repairing (IMR) over time series data, (2) explicit analysis on convergence of the proposed iterative minimum repairing, and (3) efficient estimation of parameters in each iteration. Remarkably, with incremental computation, we reduce the complexity of parameter estimation from O(n) to O(1). Experiments on real datasets demonstrate the superiority of our proposal compared to the state-of-the-art approaches. In particular, we show that (the proposed) repairing indeed improves the time series classification application.