Source author record

Tao Xiang

Tao Xiang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

102 works

29 topics

4 close collaborators

preprint · 2026 · arXiv

From Ground to Sky: Architectures, Applications, and Challenges Shaping Low-Altitude Wireless Networks

In this article, we introduce a novel low-altitude wireless network (LAWN), which is a reconfigurable, three-dimensional (3D) layered architecture. In particular, the LAWN integrates connectivity, sensing, control, and computing across aerial and terrestrial nodes, enabling seamless operation in complex, dynamic, and mission-critical environments. Unlike conventional aerial communication systems, the LAWN's distinctive feature is its tight integration of functional planes, in which multiple functionalities continually reshape themselves to operate safely and efficiently in the low-altitude sky. Building on the LAWN, we discuss several enabling technologies, such as integrated sensing and communication (ISAC), semantic communication, and fully-actuated control systems. Finally, we identify potential applications and key cross-layer challenges. This article offers a comprehensive roadmap for future research and development in the low-altitude airspace.

preprint · 2024 · arXiv

Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks

Solving image-to-3D from a single view is an ill-posed problem, and current neural reconstruction methods that address it through diffusion models still rely on scene-specific optimization, which constrains their generalization capability. To overcome the limitations of existing approaches regarding generalization and consistency, we introduce a novel neural rendering technique. Our approach employs the signed distance function (SDF) as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks. Specifically, our method builds neural encoding volumes from generated multi-view inputs. Via HyperNetworks, we adjust the weights of the SDF network conditioned on an input image at test time, allowing the model to adapt to novel scenes in a feed-forward manner. To mitigate artifacts derived from the synthesized views, we propose a volume transformer module that improves the aggregation of image features instead of processing each viewpoint separately. Through our proposed method, dubbed Hyper-VolTran, we avoid the bottleneck of scene-specific optimization and maintain consistency across the images generated from multiple viewpoints. Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
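
A minimal sketch of the HyperNetwork idea, assuming a toy setup: a single linear layer (the hypothetical `make_hypernetwork`) maps an image embedding to every weight of a small 3 → hidden → 1 SDF MLP, so a new scene needs only a forward pass rather than per-scene optimization. The geometry-encoding volumes and the volume transformer are not modelled; all names, sizes and the linear hypernetwork itself are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def sdf_param_shapes(hidden):
    # W1, b1, W2, b2 of a 3 -> hidden -> 1 MLP mapping a 3D point to a signed distance
    return [(3, hidden), (hidden,), (hidden, 1), (1,)]

def make_hypernetwork(embed_dim, hidden, rng):
    """Hypothetical hypernetwork: one linear map from an image embedding to all SDF weights."""
    n_params = sum(int(np.prod(s)) for s in sdf_param_shapes(hidden))
    return rng.normal(scale=0.1, size=(embed_dim, n_params))

def sdf_params_from_embedding(z, H, hidden):
    """Generate the SDF network's weights in one forward pass (no per-scene optimization)."""
    theta = z @ H
    params, i = [], 0
    for shape in sdf_param_shapes(hidden):
        size = int(np.prod(shape))
        params.append(theta[i:i + size].reshape(shape))
        i += size
    return params

def sdf(points, params):
    """Evaluate the generated MLP at (N, 3) points; returns (N,) signed distances."""
    W1, b1, W2, b2 = params
    h = np.maximum(points @ W1 + b1, 0.0)  # ReLU hidden layer
    return (h @ W2 + b2)[:, 0]

rng = np.random.default_rng(0)
H = make_hypernetwork(embed_dim=16, hidden=32, rng=rng)
points = rng.normal(size=(5, 3))
# Two different image embeddings yield two different SDFs from the same hypernetwork.
d_a = sdf(points, sdf_params_from_embedding(rng.normal(size=16), H, hidden=32))
d_b = sdf(points, sdf_params_from_embedding(rng.normal(size=16), H, hidden=32))
```

The design point is that the only trainable parameters are in `H`; swapping the input embedding swaps the whole surface without any gradient steps.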

preprint · 2022 · arXiv

Adaptive Fine-Grained Sketch-Based Image Retrieval

The recent focus on Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) has shifted towards generalising a model to new categories without any training data from them. In real-world applications, however, a trained FG-SBIR model is often applied to both new categories and different human sketchers, i.e., different drawing styles. Although this complicates the generalisation problem, a handful of examples are fortunately typically available, enabling the model to adapt to the new category/style. In this paper, we offer a novel perspective: instead of asking for a model that generalises, we advocate for one that quickly adapts with just a few samples at test time (in a few-shot manner). To solve this new problem, we introduce a novel model-agnostic meta-learning (MAML) based framework with several key modifications: (1) As a retrieval task with a margin-based contrastive loss, we simplify the MAML training in the inner loop to make it more stable and tractable. (2) The margin in our contrastive loss is also meta-learned with the rest of the model. (3) Three additional regularisation losses are introduced in the outer loop to make the meta-learned FG-SBIR model more effective for category/style adaptation. Extensive experiments on public datasets suggest a large gain over generalisation and zero-shot based approaches, and over a few strong few-shot baselines.
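
The following is a loose sketch of the kind of MAML-style loop described above, under heavy simplifying assumptions: a hinge triplet (margin-based) loss on a linear embedding `W`, a few plain SGD inner-loop steps on a support set, and a meta-gradient for the margin taken through that inner loop by central finite differences instead of autodiff. Every function name, the linear embedding and the finite-difference shortcut are hypothetical; the paper's simplified inner loop and its three regularisation losses are not reproduced.

```python
import numpy as np

def triplet_loss_and_grad(W, a, p, n, margin):
    """Mean hinge loss max(0, margin + ||W(a-p)|| - ||W(a-n)||) and its gradient in W.

    a, p, n: (batch, in_dim) anchor, positive, negative features; W: (out_dim, in_dim).
    """
    xp, xn = a - p, a - n
    ep, en = xp @ W.T, xn @ W.T                  # embedded differences, (batch, out_dim)
    dp = np.linalg.norm(ep, axis=1) + 1e-12
    dn = np.linalg.norm(en, axis=1) + 1e-12
    slack = margin + dp - dn
    active = (slack > 0).astype(float)           # only violating triplets contribute
    loss = np.mean(active * slack)
    # d||W x|| / dW = (W x) x^T / ||W x||
    gp = (active / dp)[:, None, None] * ep[:, :, None] * xp[:, None, :]
    gn = (active / dn)[:, None, None] * en[:, :, None] * xn[:, None, :]
    return loss, (gp - gn).mean(axis=0)

def adapt(W, support, margin, lr=0.1, steps=1):
    """Inner loop: plain SGD on the few support triplets of a new category/style."""
    for _ in range(steps):
        _, grad = triplet_loss_and_grad(W, *support, margin)
        W = W - lr * grad
    return W

def meta_objective(W, support, query, margin, lr=0.1):
    """Outer objective: query loss of the model after adapting on the support set."""
    W_adapted = adapt(W, support, margin, lr)
    return triplet_loss_and_grad(W_adapted, *query, margin)[0]

def margin_meta_grad(W, support, query, margin, eps=1e-4):
    """Central finite difference through the inner loop (a stand-in for autodiff)."""
    upper = meta_objective(W, support, query, margin + eps)
    lower = meta_objective(W, support, query, margin - eps)
    return (upper - lower) / (2 * eps)
```

Here `support` and `query` are `(anchor, positive, negative)` tuples of arrays; in a real framework the margin's meta-gradient would come from differentiating through `adapt` directly.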

preprint · 2022 · arXiv

Chiral conformal field theory for topological states and the anyon eigenbasis on the torus

Model wave functions constructed from (1+1)D conformal field theory (CFT) have played a vital role in studying chiral topologically ordered systems. Such states usually have multiple degenerate ground states when placed on the torus. The common practice for dealing with this degeneracy within the CFT framework is to take a full correlator on the torus, which includes both holomorphic and antiholomorphic sectors, and decompose it into several conformal blocks. In this paper, we propose a purely chiral approach to the torus wave function construction. Using the operator formalism, the wave functions are written as chiral correlators of holomorphic fields restricted to each individual topological sector. This method is not only conceptually much simpler, but also automatically provides the anyon eigenbasis of the degenerate ground states (also known as the "minimally entangled states"). As concrete examples, we construct the full set of degenerate ground states for SO($n$)$_1$ and SU($n$)$_1$ chiral spin liquids on the torus, the former of which provides a complete wave function realization of Kitaev's sixteenfold way of anyon theories. We further characterize their topological orders by analytically computing the associated modular $S$ and $T$ matrices.
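
As a concrete reference point for the modular data mentioned above, the snippet below builds the standard $\mathbb{Z}_n$ form usually quoted for SU($n$)$_1$: $S_{ab} = e^{-2\pi i ab/n}/\sqrt{n}$ and $T_{aa} = \theta_a = e^{2\pi i h_a}$ with $h_a = a(n-a)/(2n)$. It uses these closed-form expressions rather than the paper's chiral correlators, and phase conventions may differ from the paper's. A quick consistency check is the Gauss sum $\frac{1}{\sqrt{n}}\sum_a \theta_a$, which should equal $e^{2\pi i c/8}$ with chiral central charge $c = n-1$.

```python
import numpy as np

def su_n_level1_modular_data(n):
    """Standard Z_n modular data for SU(n)_1 (phase conventions vary between references)."""
    a = np.arange(n)
    S = np.exp(-2j * np.pi * np.outer(a, a) / n) / np.sqrt(n)
    h = a * (n - a) / (2 * n)              # topological spins h_a of the n anyons
    T = np.diag(np.exp(2j * np.pi * h))    # T_aa = theta_a = exp(2 pi i h_a)
    return S, T

def gauss_sum_phase(T):
    """(1/D) * sum_a d_a^2 theta_a with all quantum dimensions d_a = 1 and D = sqrt(n)."""
    n = T.shape[0]
    return np.trace(T) / np.sqrt(n)

for n in (2, 3, 4):
    S, T = su_n_level1_modular_data(n)
    # Gauss-Milgram check: compare against exp(2 pi i c / 8) with c = n - 1
    print(n, np.round(gauss_sum_phase(T), 6), np.round(np.exp(2j * np.pi * (n - 1) / 8), 6))
```

For $n=2$ this reduces to the semion theory, with $\theta_1 = i$.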

preprint · 2022 · arXiv

Deep Learning for Free-Hand Sketch: A Survey

Free-hand sketches are highly illustrative, and have been widely used by humans to depict objects or stories from ancient times to the present. The recent prevalence of touchscreen devices has made sketch creation a much easier task than ever and consequently made sketch-oriented applications increasingly popular. The progress of deep learning has immensely benefited free-hand sketch research and applications. This paper presents a comprehensive survey of the deep learning techniques oriented at free-hand sketch data, and the applications that they enable. The main contents of this survey include: (i) A discussion of the intrinsic traits and unique challenges of free-hand sketch, to highlight the essential differences between sketch data and other data modalities, e.g., natural photos. (ii) A review of the developments of free-hand sketch research in the deep learning era, by surveying existing datasets, research topics, and the state-of-the-art methods through a detailed taxonomy and experimental evaluation. (iii) Promotion of future work via a discussion of bottlenecks, open problems, and potential research directions for the community.

preprint · 2022 · arXiv

Domain Generalization: A Survey

Generalization to out-of-distribution (OOD) data is a capability natural to humans yet challenging for machines to reproduce. This is because most learning algorithms strongly rely on the i.i.d. assumption on source/target data, which is often violated in practice due to domain shift. Domain generalization (DG) aims to achieve OOD generalization by using only source data for model learning. Over the last ten years, research in DG has made great progress, leading to a broad spectrum of methodologies, e.g., those based on domain alignment, meta-learning, data augmentation, or ensemble learning, to name a few; DG has also been studied in various application areas including computer vision, speech recognition, natural language processing, medical imaging, and reinforcement learning. In this paper, we provide, for the first time, a comprehensive literature review of DG that summarizes the developments over the past decade. Specifically, we first cover the background by formally defining DG and relating it to other relevant fields like domain adaptation and transfer learning. Then, we conduct a thorough review of existing methods and theories. Finally, we conclude this survey with insights and discussions on future research directions.

preprint · 2022 · arXiv

Doodle It Yourself: Class Incremental Learning by Drawing a Few Sketches

The human visual system is remarkable in learning new visual concepts from just a few examples. This is precisely the goal behind few-shot class incremental learning (FSCIL), where the emphasis is additionally placed on ensuring the model does not suffer from "forgetting". In this paper, we push the boundary further for FSCIL by addressing two key questions that bottleneck its ubiquitous application: (i) can the model learn from diverse modalities other than just photos (as humans do), and (ii) what if photos are not readily accessible (due to ethical and privacy constraints)? Our key innovation lies in advocating the use of sketches as a new modality for class support. The product is a "Doodle It Yourself" (DIY) FSCIL framework where users can freely sketch a few examples of a novel class for the model to learn to recognize photos of that class. For that, we present a framework that infuses (i) gradient consensus for domain-invariant learning, (ii) knowledge distillation for preserving old class information, and (iii) graph attention networks for message passing between old and novel classes. We experimentally show that sketches are better class support than text in the context of FSCIL, echoing findings elsewhere in the sketching literature.
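
The abstract lists knowledge distillation for preserving old-class information. A common form of that loss, the temperature-softened KL divergence between the frozen old model's logits and the updated model's logits (Hinton-style), restricted here to the previously learned classes, is sketched below. Whether DIY uses exactly this form is an assumption, and `n_old` and `T` are illustrative parameters.

```python
import numpy as np

def softmax(logits, T):
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def distillation_loss(new_logits, old_logits, n_old, T=2.0):
    """T^2 * KL(old || new) over the first n_old (previously learned) classes.

    new_logits may have extra columns for novel classes; they are ignored here.
    """
    p_old = softmax(old_logits[:, :n_old], T)
    p_new = softmax(new_logits[:, :n_old], T)
    kl = np.sum(p_old * (np.log(p_old) - np.log(p_new)), axis=1)
    return T * T * kl.mean()
```

The loss is zero when the updated model reproduces the old model's old-class predictions, regardless of what it outputs for the novel classes.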

preprint · 2022 · arXiv

Dynamic Instance Domain Adaptation

Most existing studies on unsupervised domain adaptation (UDA) assume that each domain's training samples come with domain labels (e.g., painting, photo). Samples from each domain are assumed to follow the same distribution and the domain labels are exploited to learn domain-invariant features via feature alignment. However, such an assumption often does not hold true -- there often exist numerous finer-grained domains (e.g., dozens of modern painting styles have been developed, each differing dramatically from those of the classic styles). Therefore, forcing feature distribution alignment across each artificially-defined and coarse-grained domain can be ineffective. In this paper, we address both single-source and multi-source UDA from a completely different perspective, which is to view each instance as a fine domain. Feature alignment across domains is thus redundant. Instead, we propose to perform dynamic instance domain adaptation (DIDA). Concretely, a dynamic neural network with adaptive convolutional kernels is developed to generate instance-adaptive residuals to adapt domain-agnostic deep features to each individual instance. This enables a shared classifier to be applied to both source and target domain data without relying on any domain annotation. Further, instead of imposing intricate feature alignment losses, we adopt a simple semi-supervised learning paradigm using only a cross-entropy loss for both labeled source and pseudo labeled target data. Our model, dubbed DIDA-Net, achieves state-of-the-art performance on several commonly used single-source and multi-source UDA datasets including Digits, Office-Home, DomainNet, Digit-Five, and PACS.
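
The abstract describes a dynamic network whose adaptive convolutional kernels produce instance-adaptive residuals on top of shared features. A minimal sketch, assuming a single 1x1 convolution whose kernel is generated from each instance's globally pooled features by a hypothetical generator matrix `G` (the tanh and the 0.1 residual scale are arbitrary choices), is shown below; DIDA-Net's actual kernel generator and backbone are not reproduced.

```python
import numpy as np

def instance_adaptive_residual(feats, G, scale=0.1):
    """feats: (batch, C, H, W) shared feature maps; G: (C, C*C) kernel-generator weights.

    Each instance gets its own 1x1 convolution kernel, generated from its own
    pooled features, and the kernel's output is added back as a residual.
    """
    b, c = feats.shape[:2]
    context = feats.mean(axis=(2, 3))                       # (batch, C) global average pool
    kernels = np.tanh(context @ G).reshape(b, c, c)         # (batch, C_out, C_in)
    residual = np.einsum("boi,bihw->bohw", kernels, feats)  # per-instance 1x1 convolution
    return feats + scale * residual
```

Because the kernel depends only on the instance itself, no domain label is needed: a source image and a target image each receive their own adaptation.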

preprint · 2022 · arXiv

FashionViL: Fashion-Focused Vision-and-Language Representation Learning

Large-scale Vision-and-Language (V+L) pre-training for representation learning has proven to be effective in boosting various downstream V+L tasks. However, when it comes to the fashion domain, existing V+L methods are inadequate as they overlook the unique characteristics of both the fashion V+L data and downstream tasks. In this work, we propose a novel fashion-focused V+L representation learning framework, dubbed FashionViL. It contains two novel fashion-specific pre-training tasks designed particularly to exploit two intrinsic attributes of fashion V+L data. First, in contrast to other domains where a V+L data point contains only a single image-text pair, there could be multiple images in the fashion domain. We thus propose a Multi-View Contrastive Learning task for pulling the visual representation of one image closer to the compositional multimodal representation of another image+text. Second, fashion text (e.g., product description) often contains rich fine-grained concepts (attributes/noun phrases). To exploit this, a Pseudo-Attributes Classification task is introduced to encourage the learned unimodal (visual/textual) representations of the same concept to be adjacent. Further, fashion V+L tasks uniquely include ones that do not conform to the common one-stream or two-stream architectures (e.g., text-guided image retrieval). We thus propose a flexible, versatile V+L model architecture consisting of a modality-agnostic Transformer so that it can be flexibly adapted to any downstream task. Extensive experiments show that our FashionViL achieves a new state of the art across five downstream tasks. Code is available at https://github.com/BrandonHanx/mmf.
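
The Multi-View Contrastive Learning task pulls one image's visual embedding toward the multimodal embedding of another image plus text. A standard InfoNCE loss is one way to express that objective; the sketch below assumes L2-normalised embeddings, a temperature of 0.07 and in-batch negatives, none of which are specified in the abstract, and the encoders producing the embeddings are left out.

```python
import numpy as np

def info_nce(queries, keys, temperature=0.07):
    """InfoNCE: row i of `queries` should match row i of `keys`; other rows act as negatives.

    queries: visual embeddings of one view; keys: fused embeddings of another view + text.
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Matched pairs that are close in embedding space give a small loss; mismatched pairings give a larger one.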

preprint · 2022 · arXiv

FS-COCO: Towards Understanding of Freehand Sketches of Common Objects in Context

We advance sketch research to scenes with the first dataset of freehand scene sketches, FS-COCO. With practical applications in mind, we collect sketches that convey scene content well but can be sketched within a few minutes by a person with any level of sketching skill. Our dataset comprises 10,000 freehand scene vector sketches with per-point space-time information by 100 non-expert individuals, offering both object- and scene-level abstraction. Each sketch is augmented with its text description. Using our dataset, we study for the first time the problem of fine-grained image retrieval from freehand scene sketches and sketch captions. We draw insights on: (i) scene salience encoded in sketches using the strokes' temporal order; (ii) performance comparison of image retrieval from a scene sketch and an image caption; (iii) complementarity of information in sketches and image captions, as well as the potential benefit of combining the two modalities. In addition, we extend a popular LSTM-based vector sketch encoder to handle sketches with larger complexity than was supported by previous work. Namely, we propose a hierarchical sketch decoder, which we leverage in a sketch-specific "pretext" task. Our dataset enables, for the first time, research on freehand scene sketch understanding and its practical applications.

preprint · 2022 · arXiv

Giant and Reversible Electronic Structure Evolution in a Magnetic Topological Material EuCd2As2

The electronic structure and the physical properties of quantum materials can be significantly altered by charge carrier doping and magnetic state transitions. Here we report the discovery of a giant and reversible electronic structure evolution with doping in a magnetic topological material. By performing high-resolution angle-resolved photoemission measurements on EuCd2As2, we found that a huge amount of hole doping can be introduced into the sample surface due to surface absorption. The electronic structure exhibits a dramatic change with hole doping that cannot be described by a rigid band shift. Prominent band splitting is observed at high doping, which corresponds to a doping-induced magnetic transition at low temperature (below ~15 K) from an antiferromagnetic state to a ferromagnetic state. These results establish a detailed electronic phase diagram of EuCd2As2 in which the electronic structure and the magnetic structure change systematically and dramatically with the doping level. They further suggest that the transport, magnetic and topological properties of EuCd2As2 can be greatly modified by doping. This work will stimulate further investigations into new phenomena and properties arising from doping this magnetic topological material.

preprint · 2022 · arXiv

Magnetic Excitations in Strained Infinite-layer Nickelate PrNiO2

Strongly correlated materials often respond sensitively to external perturbations. In the recently discovered superconducting infinite-layer nickelates, the superconducting transition temperature can be dramatically enhanced via only ~1% compressive strain, enabled by substrate design. However, the origin of this enhancement remains elusive. While the superconducting pairing mechanism is still not settled, magnetic Cooper pairing, similar to that in the cuprates, has been proposed. Using resonant inelastic x-ray scattering, we investigate the magnetic excitations in infinite-layer PrNiO2 thin films under different strain conditions. The magnon bandwidth of PrNiO2 shows only a marginal response to strain tuning, in sharp contrast to the striking enhancement of the superconducting transition temperature Tc in the doped superconducting samples. These results suggest that the enhancement of Tc is not mediated by spin excitations and thus provide important empirical constraints for the understanding of superconductivity in infinite-layer nickelates.

preprint · 2022 · arXiv

Negative Frames Matter in Egocentric Visual Query 2D Localization

The recently released Ego4D dataset and benchmark significantly scale and diversify first-person visual perception data. In Ego4D, the Visual Queries 2D Localization task aims to retrieve objects that appeared in the past from a first-person-view recording. This task requires a system to spatially and temporally localize the most recent appearance of a given object query, where the query is specified by a single tight visual crop of the object in a different scene. Our study is based on the three-stage baseline introduced in the Episodic Memory benchmark. The baseline solves the problem by detection and tracking: detect the similar objects in all the frames, then run a tracker from the most confident detection result. In the VQ2D challenge, we identified two limitations of the current baseline. (1) The training configuration has redundant computation. Although the training set has millions of instances, most of them are repetitive and the number of unique objects is only around 14.6k. The repeated gradient computation on the same objects leads to inefficient training; (2) The false positive rate is high on background frames. This is due to the distribution gap between training and evaluation. During training, the model only sees clean, stable, and labeled frames, but egocentric videos also contain noisy, blurry, or unlabeled background frames. To this end, we developed a more efficient and effective solution. Concretely, we reduce training time from ~15 days to less than 24 hours, and we achieve 0.17% spatial-temporal AP, which is 31% higher than the baseline. Our solution ranked first on the public leaderboard. Our code is publicly available at https://github.com/facebookresearch/vq2d_cvpr.

preprint · 2022 · arXiv

One Sketch for All: One-Shot Personalized Sketch Segmentation

We present the first one-shot personalized sketch segmentation method. We aim to segment all sketches belonging to the same category, given a single sketch with a part annotation, while (i) preserving the part semantics embedded in the exemplar, and (ii) being robust to input style and abstraction. We refer to this scenario as personalized. With that, we enable a much-desired personalization capability for downstream fine-grained sketch analysis tasks. To train a robust segmentation module, we deform the exemplar sketch to each of the available sketches of the same category. Our method generalizes to sketches not observed during training. Our central contribution is a sketch-specific hierarchical deformation network. Given a multi-level sketch-strokes encoding obtained via a graph convolutional network, our method estimates a rigid-body transformation from the target to the exemplar on the upper level. Finer deformation from the exemplar to the globally warped target sketch is further obtained through stroke-wise deformations on the lower level. Both levels of deformation are guided by mean squared distances between keypoints learned without supervision, ensuring that the stroke semantics are preserved. We evaluate our method against state-of-the-art segmentation and perceptual grouping baselines re-purposed for the one-shot setting and against two few-shot 3D shape segmentation methods. We show that our method outperforms all the alternatives by more than 10% on average. Ablation studies further demonstrate that our method is robust to personalization: changes in input part semantics and style differences.
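
The upper deformation level estimates a rigid-body transformation guided by mean squared keypoint distances. The paper predicts this transformation with a network; purely as a point of reference, the closed-form least-squares (Kabsch) solution for 2D keypoints is sketched below. The function names are hypothetical and this is not the paper's estimator.

```python
import numpy as np

def rigid_align_2d(src, dst):
    """Least-squares rotation R and translation t such that dst ≈ src @ R.T + t (Kabsch)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)          # 2x2 cross-covariance of centred keypoints
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # flip sign if needed to avoid a reflection
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = mu_d - mu_s @ R.T
    return R, t

def keypoint_mse(src, dst, R, t):
    """Mean squared distance between transformed source keypoints and target keypoints."""
    return np.mean(np.sum((src @ R.T + t - dst) ** 2, axis=1))
```

Given noise-free correspondences, this recovers the rotation and translation exactly; a learned estimator becomes useful when keypoints are noisy or must themselves be discovered without supervision.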

preprint · 2022 · arXiv

Partially Does It: Towards Scene-Level FG-SBIR with Partial Input

We scrutinise an important observation plaguing scene-level sketch research: a significant portion of scene sketches are "partial". A quick pilot study reveals that (i) a scene sketch does not necessarily contain all objects in the corresponding photo, due to the subjective holistic interpretation of scenes; (ii) there exist significant empty (white) regions as a result of object-level abstraction; and, as a result, (iii) existing scene-level fine-grained sketch-based image retrieval methods collapse as scene sketches become more partial. To solve this "partial" problem, we advocate for a simple set-based approach using optimal transport (OT) to model cross-modal region associativity in a partiality-aware fashion. Importantly, we improve upon OT to further account for holistic partialness by comparing intra-modal adjacency matrices. Our proposed method is not only robust to partial scene sketches but also yields state-of-the-art performance on existing datasets.
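
Optimal transport between sketch regions and photo regions is commonly solved with entropy-regularised Sinkhorn iterations. The sketch below is plain balanced Sinkhorn with given marginals and an assumed cosine-distance cost between region features; the paper's partial-aware extension and its comparison of intra-modal adjacency matrices are not reproduced, and `eps` and `iters` are illustrative values.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.1, iters=1000):
    """Entropy-regularised OT plan between histograms a (rows) and b (columns)."""
    K = np.exp(-cost / eps)          # Gibbs kernel
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(iters):
        u = a / (K @ v)              # rescale to match row marginals
        v = b / (K.T @ u)            # rescale to match column marginals
    return u[:, None] * K * v[None, :]

def cosine_cost(x, y):
    """1 - cosine similarity between region features (encoders are hypothetical)."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x @ y.T
```

The transport cost `np.sum(plan * cost)` can then serve as a set-to-set distance between a sketch and a photo; smaller `eps` gives sharper plans but needs more iterations.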

preprint · 2022 · arXiv

Proposal-Free Temporal Action Detection via Global Segmentation Mask Learning

Existing temporal action detection (TAD) methods rely on generating an overwhelmingly large number of proposals per video. This leads to complex model designs due to proposal generation and/or per-proposal action instance evaluation, and to high computational cost. In this work, for the first time, we propose a proposal-free Temporal Action detection model with Global Segmentation mask (TAGS). Our core idea is to learn a global segmentation mask of each action instance jointly at the full video length. The TAGS model differs significantly from conventional proposal-based methods by focusing on global temporal representation learning to directly detect local start and end points of action instances without proposals. Further, by modeling TAD holistically rather than locally at the individual proposal level, TAGS needs a much simpler model architecture with lower computational cost. Extensive experiments show that despite its simpler design, TAGS outperforms existing TAD methods, achieving new state-of-the-art performance on two benchmarks. Importantly, it is ~20x faster to train and ~1.6x more efficient for inference. Our PyTorch implementation of TAGS is available at https://github.com/sauradip/TAGS.
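
TAGS reads start and end points off a global segmentation mask rather than scoring proposals. The snippet below shows only a generic post-processing step, turning one instance's per-snippet mask scores into (start, end) segments by thresholding and finding contiguous runs; the 0.5 threshold and the function name are assumptions, and the paper's actual decoding may differ.

```python
import numpy as np

def mask_to_segments(mask, threshold=0.5):
    """Return (start, end) index pairs, end exclusive, of runs where mask > threshold."""
    on = (np.asarray(mask) > threshold).astype(np.int8)
    edges = np.diff(np.concatenate(([0], on, [0])))  # +1 where a run starts, -1 after it ends
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist()))
```

For example, the mask scores `[0, .9, .8, 0, 0, .7, .6, .9, 0]` decode to the two segments `(1, 3)` and `(5, 8)`.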

preprint · 2022 · arXiv

Quasi-uniaxial pressure induced superconductivity in stoichiometric compound UTe2

The recent discovery of superconductivity in the heavy-fermion compound UTe2, a candidate topological and triplet-paired superconductor, has aroused widespread interest. However, to date, there is no consensus on whether stoichiometric UTe2 is superconducting, owing to a lack of reliable evidence distinguishing the nominal and real compositions of samples. Here, we are the first to clarify that stoichiometric UTe2 is non-superconducting at ambient pressure and under hydrostatic pressure up to 6 GPa; however, we find that it can be compressed into superconductivity by applying quasi-uniaxial pressure. Measurements of resistivity, magnetoresistance and susceptibility reveal that quasi-uniaxial pressure suppresses the Kondo coherent state seen at ambient pressure and then leads to superconductivity that initially emerges in the ab-plane at 1.5 GPa. At 4.8 GPa, superconductivity develops in all three crystallographic directions. The superconducting state coexists with an exotic magnetic ordered state that develops just below the onset temperature of the superconducting transition. The discovery of quasi-uniaxial-pressure-induced superconductivity with an exotic magnetic state in stoichiometric UTe2 not only provides new understanding of this compound, but also highlights the vital role of Te deficiency in developing superconductivity at ambient pressure.

preprint · 2022 · arXiv

Semi-Supervised Temporal Action Detection with Proposal-Free Masking

Existing temporal action detection (TAD) methods rely on large amounts of training data with segment-level annotations. Collecting and annotating such a training set is thus highly expensive and unscalable. Semi-supervised TAD (SS-TAD) alleviates this problem by leveraging unlabeled videos freely available at scale. However, SS-TAD is also a much more challenging problem than supervised TAD, and consequently much under-studied. Prior SS-TAD methods directly combine an existing proposal-based TAD method and a semi-supervised learning (SSL) method. Due to their sequential localization (e.g., proposal generation) and classification design, they are prone to proposal error propagation. To overcome this limitation, in this work we propose a novel Semi-supervised Temporal action detection model based on PropOsal-free Temporal mask (SPOT) with a parallel localization (mask generation) and classification architecture. This design effectively eliminates the dependence between localization and classification by cutting off the route for error propagation between them. We further introduce an interaction mechanism between classification and localization for prediction refinement, and a new pretext task for self-supervised model pre-training. Extensive experiments on two standard benchmarks show that our SPOT outperforms state-of-the-art alternatives, often by a large margin. The PyTorch implementation of SPOT is available at https://github.com/sauradip/SPOT

preprint · 2022 · arXiv

Sketch3T: Test-Time Training for Zero-Shot SBIR

Zero-shot sketch-based image retrieval (ZS-SBIR) typically asks for a trained model to be applied as is to unseen categories. In this paper, we argue that this setup by definition is not compatible with the inherently abstract and subjective nature of sketches: the model might transfer well to new categories, but will not understand sketches drawn from a different test-time distribution. We thus extend ZS-SBIR, asking it to transfer to both categories and sketch distributions. Our key contribution is a test-time training paradigm that can adapt using just one sketch. Since there is no paired photo, we make use of a sketch raster-vector reconstruction module as a self-supervised auxiliary task. To maintain the fidelity of the trained cross-modal joint embedding during test-time updates, we design a novel meta-learning based training paradigm that learns to separate model updates incurred by this auxiliary task from those of the primary objective of discriminative learning. Extensive experiments show that our model outperforms state-of-the-art methods, thanks to the proposed test-time adaptation that not only transfers to new categories but also accommodates new sketching styles.

preprint · 2022 · arXiv

Sketching without Worrying: Noise-Tolerant Sketch-Based Image Retrieval

Sketching enables many exciting applications, notably image retrieval. The fear-to-sketch problem (i.e., "I can't sketch") has, however, proven to be fatal for its widespread adoption. This paper tackles this "fear" head on and, for the first time, proposes an auxiliary module for existing retrieval models that predominantly lets users sketch without having to worry. We first conducted a pilot study which revealed that the secret lies in the existence of noisy strokes, and not so much in "I can't sketch". We consequently design a stroke subset selector that detects noisy strokes, leaving only those which make a positive contribution towards successful retrieval. Our Reinforcement Learning based formulation quantifies the importance of each stroke present in a given subset, based on the extent to which that stroke contributes to retrieval. When combined with pre-trained retrieval models as a pre-processing module, we achieve a significant gain of 8%-10% over standard baselines and in turn report new state-of-the-art performance. Last but not least, we demonstrate that the selector, once trained, can also be used in a plug-and-play manner to empower various sketch applications in ways that were not previously possible.

preprint · 2022 · arXiv

SOFT: Softmax-free Transformer with Linear Complexity

Vision transformers (ViTs) have pushed the state of the art for various visual recognition tasks through patch-wise image tokenization followed by self-attention. However, the use of self-attention modules results in quadratic complexity in both computation and memory usage. Various attempts at approximating the self-attention computation with linear complexity have been made in Natural Language Processing. However, an in-depth analysis in this work shows that they are either theoretically flawed or empirically ineffective for visual recognition. We further identify that their limitations are rooted in keeping the softmax self-attention during approximations. Specifically, conventional self-attention is computed by normalizing the scaled dot-product between token feature vectors. Keeping this softmax operation challenges any subsequent linearization efforts. Based on this insight, for the first time, a softmax-free transformer, or SOFT, is proposed. To remove softmax in self-attention, a Gaussian kernel function is used to replace the dot-product similarity without further normalization. This enables a full self-attention matrix to be approximated via a low-rank matrix decomposition. The robustness of the approximation is achieved by calculating its Moore-Penrose inverse using a Newton-Raphson method. Extensive experiments on ImageNet show that our SOFT significantly improves the computational efficiency of existing ViT variants. Crucially, with linear complexity, much longer token sequences are permitted in SOFT, resulting in a superior trade-off between accuracy and complexity.
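
A compact sketch of the three ingredients named above: a softmax-free Gaussian kernel $\exp(-\|q_i - k_j\|^2 / (2\sqrt{d}))$, a low-rank Nyström-style decomposition through $m$ landmark tokens, and a Newton-Raphson (Newton-Schulz) iteration for the Moore-Penrose inverse. Sharing queries and keys, mean-pooled landmarks, the kernel scaling and the iteration count are all illustrative assumptions, not necessarily the paper's choices.

```python
import numpy as np

def gaussian_kernel(x, y):
    """Softmax-free similarity exp(-||x_i - y_j||^2 / (2 sqrt(d))), with no row normalization."""
    d = x.shape[1]
    sq = (x ** 2).sum(axis=1)[:, None] + (y ** 2).sum(axis=1)[None, :] - 2.0 * x @ y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * np.sqrt(d)))

def newton_pinv(A, iters=60):
    """Newton-Schulz iteration V <- V (2I - A V), converging to the Moore-Penrose inverse."""
    V = A.T / (np.abs(A).sum(axis=0).max() * np.abs(A).sum(axis=1).max())
    I = np.eye(A.shape[0])
    for _ in range(iters):
        V = V @ (2.0 * I - A @ V)
    return V

def soft_attention(x, values, n_landmarks):
    """Apply K ≈ K_nm K_mm^+ K_mn to values in time linear in the sequence length n.

    Queries and keys are both taken to be x; landmarks are means of consecutive token chunks.
    """
    n, d = x.shape
    usable = n - n % n_landmarks
    landmarks = x[:usable].reshape(n_landmarks, -1, d).mean(axis=1)
    K_nm = gaussian_kernel(x, landmarks)           # (n, m)
    K_mm = gaussian_kernel(landmarks, landmarks)   # (m, m)
    return K_nm @ (newton_pinv(K_mm) @ (K_nm.T @ values))
```

With as many landmarks as tokens the approximation reproduces the full kernel product exactly; with $m \ll n$ the full $n \times n$ matrix is never formed.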

preprint · 2022 · arXiv

Style-Based Global Appearance Flow for Virtual Try-On

Image-based virtual try-on aims to fit an in-shop garment into a clothed person image. To achieve this, a key step is garment warping, which spatially aligns the target garment with the corresponding body parts in the person image. Prior methods typically adopt a local appearance flow estimation model. They are thus intrinsically susceptible to difficult body poses/occlusions and large misalignments between person and garment images (see Fig. 1). To overcome this limitation, a novel global appearance flow estimation model is proposed in this work. For the first time, a StyleGAN-based architecture is adopted for appearance flow estimation. This enables us to take advantage of a global style vector to encode a whole-image context to cope with the aforementioned challenges. To guide the StyleGAN flow generator to pay more attention to local garment deformation, a flow refinement module is introduced to add local context. Experimental results on a popular virtual try-on benchmark show that our method achieves new state-of-the-art performance. It is particularly effective in an "in-the-wild" application scenario where the reference image is full-body, resulting in a large misalignment with the garment image (Fig. 1, top). Code is available at https://github.com/SenHe/Flow-Style-VTON.
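
Whatever network predicts it, an appearance flow is applied by backward-warping the garment image: each output pixel samples the garment at its own coordinates plus the flow offset. A minimal single-channel NumPy sketch of that sampling step, with bilinear interpolation and border clamping, is shown below; the StyleGAN-based flow generator and the refinement module are not modelled, and the `(dx, dy)` channel order is an assumption.

```python
import numpy as np

def warp_with_flow(image, flow):
    """Backward-warp an (H, W) image by an (H, W, 2) flow of (dx, dy) offsets.

    out[y, x] samples image at (x + dx, y + dy) with bilinear interpolation;
    sample positions outside the image are clamped to the border.
    """
    h, w = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sx = np.clip(xs + flow[..., 0], 0, w - 1)
    sy = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx, wy = sx - x0, sy - y0                   # fractional parts act as bilinear weights
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom
```

Because sampling is differentiable in the flow (away from integer boundaries), a flow generator can be trained end-to-end through this step; deep learning frameworks provide the same operation natively.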

preprint2022arXiv

Towards artificial general intelligence via a multimodal foundation model

The fundamental goal of artificial intelligence (AI) is to mimic the core cognitive activities of humans. Despite tremendous success in AI research, most existing methods possess only a single cognitive ability. To overcome this limitation and take a solid step towards artificial general intelligence (AGI), we develop a foundation model pre-trained on large-scale multimodal data, which can be quickly adapted to various downstream cognitive tasks. To this end, we pre-train our foundation model by self-supervised learning on weakly semantically correlated data crawled from the Internet and show that promising results can be obtained on a wide range of downstream tasks. In particular, with the developed model-interpretability tools, we demonstrate that our foundation model now possesses strong imagination ability. We believe that our work makes a transformative stride towards AGI, from our common practice of "weak or narrow AI" to that of "strong or generalized AI".

preprint2022arXiv

UIGR: Unified Interactive Garment Retrieval

Interactive garment retrieval (IGR) aims to retrieve a target garment image based on a reference garment image along with user feedback on what to change in the reference garment. Two IGR tasks have been studied extensively: text-guided garment retrieval (TGR) and visually compatible garment retrieval (VCR). For the former, the user feedback indicates which semantic attributes to change while the garment category is preserved; for the latter, the category is the only thing changed explicitly, with an implicit requirement to preserve style. Despite the similarity between these two tasks and the practical need for an efficient system tackling both, they have never been unified and modeled jointly. In this paper, we propose a Unified Interactive Garment Retrieval (UIGR) framework to unify TGR and VCR. To this end, we first contribute a large-scale benchmark suited for both problems. We further propose a strong baseline architecture to integrate TGR and VCR in one model. Extensive experiments suggest that unifying the two tasks in one framework not only is more efficient, requiring only a single model, but also leads to better performance. Code and datasets are available at https://github.com/BrandonHanx/CompFashion.

preprint2022arXiv

Zero-Shot Temporal Action Detection via Vision-Language Prompting

Existing temporal action detection (TAD) methods rely on large training sets with segment-level annotations and are limited to recognizing previously seen classes during inference. Collecting and annotating a large training set for each class of interest is costly and hence unscalable. Zero-shot TAD (ZS-TAD) removes this obstacle by enabling a pre-trained model to recognize unseen action classes. Meanwhile, ZS-TAD is also much more challenging and has received significantly less investigation. Inspired by the success of zero-shot image classification aided by vision-language (ViL) models such as CLIP, we aim to tackle the more complex TAD task. An intuitive method is to integrate an off-the-shelf proposal detector with CLIP-style classification. However, due to the sequential design of localization (e.g., proposal generation) followed by classification, this approach is prone to localization error propagation. To overcome this problem, in this paper we propose a novel zero-Shot Temporal Action detection model via Vision-LanguagE prompting (STALE). This design effectively eliminates the dependence between localization and classification by breaking the route for error propagation between them. We further introduce an interaction mechanism between classification and localization for improved optimization. Extensive experiments on standard ZS-TAD video benchmarks show that STALE significantly outperforms state-of-the-art alternatives. Besides, our model also yields superior results on supervised TAD over recent strong competitors. The PyTorch implementation of STALE is available at https://github.com/sauradip/STALE.
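The CLIP-style classification step of the intuitive two-stage baseline described above (not STALE itself, which is designed to avoid this sequential pipeline) can be sketched as cosine similarity between segment embeddings and per-class text-prompt embeddings, followed by a temperature-scaled softmax. The encoders are assumed to exist elsewhere and are represented only by their output arrays; the function name and the temperature of 0.01 are hypothetical choices.

```python
import numpy as np


def zero_shot_class_probs(segment_feats, text_feats, temperature=0.01):
    """segment_feats: (S, D) visual embeddings; text_feats: (K, D) prompt embeddings, one per class."""
    v = segment_feats / np.linalg.norm(segment_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = v @ t.T / temperature               # scaled cosine similarities, (S, K)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)
```

Because unseen classes enter only through their text embeddings, new classes can be added by encoding new prompts, with no retraining of the classifier.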

preprint2021arXiv

Contrastive Prototype Learning with Augmented Embeddings for Few-Shot Learning

Most recent few-shot learning (FSL) methods are based on meta-learning with episodic training. In each meta-training episode, a discriminative feature embedding and/or classifier is first constructed from a support set in an inner loop, and then evaluated in an outer loop using a query set for model updating. This query-set-centered learning objective is, however, intrinsically limited in addressing the lack of training data in the support set. In this paper, a novel contrastive prototype learning with augmented embeddings (CPLAE) model is proposed to overcome this limitation. First, data augmentations are applied to both the support and query sets, with each sample represented as an augmented embedding (AE) composed of the concatenated embeddings of its original and augmented versions. Second, a novel support-set class-prototype-centered contrastive loss is proposed for contrastive prototype learning (CPL). With a class prototype as an anchor, CPL aims to pull query samples of the same class closer and push those of different classes further away. This support-set-centered loss is highly complementary to the existing query-centered loss, fully exploiting the limited training data in each episode. Extensive experiments on several benchmarks demonstrate that the proposed CPLAE achieves new state-of-the-art performance.
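The two components described above, augmented embeddings and a prototype-anchored contrastive loss, can be sketched as follows. The abstract does not give the exact loss form, so this sketch assumes a supervised-contrastive formulation with cosine similarity and a temperature tau = 0.1; both function names are hypothetical.

```python
import numpy as np


def augmented_embedding(orig_emb, aug_emb):
    """AE: concatenate the embeddings of a sample's original and augmented versions."""
    return np.concatenate([orig_emb, aug_emb], axis=-1)


def prototype_contrastive_loss(support, support_labels, query, query_labels, tau=0.1):
    """Each support-set class prototype is an anchor contrasted against all query samples.

    Assumes every support class has at least one query sample in the episode.
    """
    classes = np.unique(support_labels)
    protos = np.stack([support[support_labels == c].mean(axis=0) for c in classes])
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    logits = p @ q.T / tau                       # (num_classes, num_queries)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    positives = classes[:, None] == query_labels[None, :]
    # Pull same-class queries toward the anchor: average -log p over positives, then over anchors.
    per_anchor = -(log_prob * positives).sum(axis=1) / positives.sum(axis=1)
    return float(per_anchor.mean())
```

The softmax here runs over queries for a fixed prototype, which is what makes the loss support-set-centered. A standard prototypical loss would instead run the softmax over prototypes for a fixed query.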

preprint2021arXiv

Deep Learning for Person Re-identification: A Survey and Outlook

Person re-identification (Re-ID) aims at retrieving a person of interest across multiple non-overlapping cameras. With the advancement of deep neural networks and increasing demand for intelligent video surveillance, it has attracted significantly increasing interest in the computer vision community. By dissecting the components involved in developing a person Re-ID system, we categorize it into closed-world and open-world settings. The widely studied closed-world setting is usually applied under various research-oriented assumptions, and has achieved inspiring success using deep learning techniques on a number of datasets. We first conduct a comprehensive overview with in-depth analysis of closed-world person Re-ID from three different perspectives: deep feature representation learning, deep metric learning and ranking optimization. With performance saturating under the closed-world setting, the research focus for person Re-ID has recently shifted to the open-world setting, which faces more challenging issues. This setting is closer to practical applications under specific scenarios. We summarize open-world Re-ID in terms of five different aspects. By analyzing the advantages of existing methods, we design a powerful AGW baseline, achieving state-of-the-art or at least comparable performance on twelve datasets for four different Re-ID tasks. Meanwhile, we introduce a new evaluation metric (mINP) for person Re-ID, indicating the cost of finding all the correct matches, which provides an additional criterion for evaluating Re-ID systems in real applications. Finally, some important yet under-investigated open issues are discussed.
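The mINP metric mentioned above can be sketched as follows. This assumes the inverse-negative-penalty form, in which a query's score is the number of its correct gallery matches divided by the rank of the hardest (last-retrieved) correct match, averaged over all queries. The function names are hypothetical.

```python
import numpy as np


def inverse_negative_penalty(ranked_matches):
    """ranked_matches: boolean array over one query's ranked gallery (True = correct match)."""
    positions = np.flatnonzero(ranked_matches) + 1  # 1-based ranks of the correct matches
    if positions.size == 0:
        raise ValueError("query has no correct match in the gallery")
    hardest_rank = positions[-1]                    # rank of the hardest correct match
    return positions.size / hardest_rank            # INP = |G_i| / R_i^hard


def mean_inp(all_ranked_matches):
    """mINP: average INP over all queries."""
    return float(np.mean([inverse_negative_penalty(m) for m in all_ranked_matches]))
```

For example, a query whose three correct matches appear at ranks 1, 3 and 5 scores 3 / 5 = 0.6, because finding all of them requires inspecting five gallery entries. A query whose correct matches fill the top ranks scores 1.0.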

preprint2021arXiv

Learning to Generate Novel Domains for Domain Generalization

This paper focuses on domain generalization (DG), the task of learning from multiple source domains a model that generalizes well to unseen domains. A main challenge for DG is that the available source domains often exhibit limited diversity, hampering the model's ability to learn to generalize. We therefore employ a data generator to synthesize data from pseudo-novel domains to augment the source domains. This explicitly increases the diversity of available training domains and leads to a more generalizable model. To train the generator, we model the distribution divergence between source and synthesized pseudo-novel domains using optimal transport, and maximize the divergence. To ensure that semantics are preserved in the synthesized data, we further impose cycle-consistency and classification losses on the generator. Our method, L2A-OT (Learning to Augment by Optimal Transport), outperforms current state-of-the-art DG methods on four benchmark datasets.
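The abstract does not say how the optimal-transport divergence is computed. One common estimator between two batches of features is the entropy-regularized Sinkhorn iteration, used in the sketch below purely as an assumption. In the setting above, the generator would be trained to maximize this quantity between source and synthesized batches. The regularization eps = 0.05, the iteration count and the function name are hypothetical.

```python
import numpy as np


def sinkhorn_distance(x, y, eps=0.05, iters=300):
    """Entropy-regularized OT cost between equally weighted point clouds x (n, d) and y (m, d).

    Assumes features are on a modest scale so that exp(-C / eps) does not underflow.
    """
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)  # squared Euclidean cost
    K = np.exp(-C / eps)                                       # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):
        # Alternately rescale so the plan's column and row sums match b and a.
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]                         # approximate transport plan
    return float((plan * C).sum())
```

Translating one batch by a vector s raises the returned cost by roughly ||s||^2, so a generator that pushes synthesized features away from the source features increases this divergence.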

preprint2021arXiv

Local Black-box Adversarial Attacks: A Query Efficient Approach

Adversarial attacks have threatened the application of deep neural networks in security-sensitive scenarios. Most existing black-box attacks fool the target model by interacting with it many times and producing global perturbations. However, global perturbations alter the smooth and insignificant background, which not only makes the perturbation more easily perceived but also increases the query overhead. In this paper, we propose a novel framework that perturbs only the discriminative areas of clean examples, within a limited number of queries, in black-box attacks. Our framework is built on two types of transferability. The first is the transferability of model interpretations. Based on this property, we can easily identify the discriminative areas of a given clean example for local perturbation. The second is the transferability of adversarial examples, which helps us produce a local pre-perturbation that improves query efficiency. After identifying the discriminative areas and pre-perturbing, we generate the final adversarial examples from the pre-perturbed example by querying the target model with two kinds of black-box attack techniques, i.e., gradient estimation and random search. Extensive experiments show that our framework significantly improves query efficiency during black-box perturbation while maintaining a high attack success rate. Experimental results show that our attacks outperform state-of-the-art black-box attacks under various system settings.

preprint2021arXiv

Magnetic field-tuned quantum criticality in optimally electron-doped cuprate thin films

Antiferromagnetic (AF) spin fluctuations are commonly believed to play a key role in electron pairing in cuprate superconductors. In electron-doped cuprates, the interplay among different electronic states under quantum perturbations, especially between the superconducting and magnetic states, remains puzzling. Here, we report a systematic transport study of cation-optimized La2-xCexCuO4 (x = 0.10) thin films in high magnetic fields. We find an AF quantum phase transition near 60 T, where the Hall number jumps from nH = -x to nH = 1 - x, resembling the change of nH at the AF boundary (xAF = 0.14) tuned by Ce doping. In the AF region, a spin-dependent state manifesting anomalous positive magnetoresistance is observed, which is closely related to superconductivity. Once the AF state is suppressed by the magnetic field, a polarized ferromagnetic state is predicted, reminiscent of the recently reported ferromagnetic state at the quantum endpoint of the superconducting dome tuned by Ce doping. The magnetic field, which drives phase transitions in a manner similar to but distinct from doping, thereby provides a unique perspective for understanding the quantum criticality of electron-doped cuprates.

preprint2021arXiv

Momentum-Resolved Visualization of Electronic Evolution in Doping a Mott Insulator

High-temperature superconductivity in cuprates arises from doping a parent Mott insulator with electrons or holes. A central issue is how the Mott gap evolves and how low-energy states emerge with doping. Here we report angle-resolved photoemission spectroscopy measurements on a cuprate parent compound with sequential in situ electron doping. The chemical potential jumps to the bottom of the upper Hubbard band upon slight electron doping, making it possible to directly visualize the charge transfer band and the full Mott gap region. With increasing doping, the Mott gap rapidly collapses due to spectral weight transfer from the charge transfer band to the gapped region, and the induced low-energy states emerge over a wide energy range inside the Mott gap. These results provide key information on the electronic evolution in doping a Mott insulator and establish a basis for developing microscopic theories of cuprate superconductivity.

preprint2021arXiv

Resonating valence bond realization of spin-1 non-Abelian chiral spin liquid on the torus

We propose resonating valence bond wave functions for a spin-1 system on the torus that realize a non-Abelian chiral spin liquid. The wave functions take the form of infinite dimensional matrix product states constructed from conformal blocks of the $\mathrm{SO}(3)_{1}$ Wess-Zumino-Witten model. This means that they are lattice analogues of the bosonic Moore-Read state introduced in fractional quantum Hall systems. The topological order of this system is revealed by explicit construction of three-fold degenerate ground states and analytical computation of the modular S and T matrices.

preprint2021arXiv

Universal quantum transition from superconducting to insulating states in pressurized Bi2Sr2CaCu2O8+δ superconductors

Copper oxide superconductors have continually fascinated the condensed matter physics and materials science communities because they host the highest superconducting transition temperature (Tc) at ambient pressure along with mysterious physics. Searching for universal correlations between the superconducting state and its normal state or neighboring ground state is believed to be an effective way to find clues to the underlying mechanism of superconductivity. A common picture for the copper oxide superconductors is that a well-behaved metallic phase appears after superconductivity is entirely suppressed by chemical doping or an applied magnetic field. Here, we report a different observation: a universal quantum transition from the superconducting state to an insulating-like state under pressure in under-, optimally- and over-doped Bi2212 superconductors, which have two CuO2 planes per unit cell. The same phenomenon has also been found in the Bi2201 superconductor, with one CuO2 plane, and the Bi2223 superconductor, with three CuO2 planes per unit cell. These results not only provide fresh information but also pose a new challenge for achieving a unified understanding of the underlying physics of high-Tc superconductivity.

preprint2021arXiv

Universal scaling of the critical temperature and the strange-metal scattering rate in unconventional superconductors

Dramatic evolution of properties with minute changes in the doping level is a hallmark of the complex chemistry that governs cuprate superconductivity, as manifested in the celebrated superconducting domes as well as quantum criticality taking place at precise compositions. The strange metal state, where the resistivity varies linearly with temperature, has emerged as a central feature of the normal state of cuprate superconductors. The ubiquity of this behavior signals an intimate link between the scattering mechanism and superconductivity. However, a clear quantitative picture of the correlation has been lacking. Here, we report observation of quantitative scaling laws between the superconducting transition temperature $T_{\rm c}$ and the scattering rate associated with the strange metal state in the electron-doped cuprate $\rm La_{2-x}Ce_xCuO_4$ (LCCO) as a precise function of the doping level. High-resolution characterization of epitaxial composition-spread films, which encompass the entire overdoped range of LCCO, has allowed us to systematically map its structural and transport properties with unprecedented accuracy and a doping increment of $\Delta x = 0.0015$. We have uncovered the relations $T_{\rm c}\sim(x_{\rm c}-x)^{0.5}\sim(A_1^\square)^{0.5}$, where $x_c$ is the critical doping at which superconductivity disappears on the overdoped side and $A_1^\square$ is the scattering rate of perfect $T$-linear resistivity per CuO$_2$ plane. We argue that the striking similarity of the $T_{\rm c}$ vs $A_1^\square$ relation among cuprates, iron-based and organic superconductors indicates a common mechanism of the strange metal behavior and unconventional superconductivity in these systems.

preprint2020arXiv

AdarGCN: Adaptive Aggregation GCN for Few-Shot Learning

Existing few-shot learning (FSL) methods assume that there exist sufficient training samples from source classes for knowledge transfer to target classes with few training samples. However, this assumption is often invalid, especially when it comes to fine-grained recognition. In this work, we define a new FSL setting termed few-shot few-shot learning (FSFSL), under which both the source and target classes have limited training samples. To overcome the source class data scarcity problem, a natural option is to crawl images from the web with class names as search keywords. However, the crawled images are inevitably corrupted by a large amount of noise (irrelevant images) and thus may harm performance. To address this problem, we propose a graph convolutional network (GCN)-based label denoising (LDN) method to remove the irrelevant images. Further, with the cleaned web images as well as the original clean training images, we propose a GCN-based FSL method. For both the LDN and FSL tasks, a novel adaptive aggregation GCN (AdarGCN) model is proposed, which differs from existing GCN models in that adaptive aggregation is performed based on a multi-head multi-level aggregation module. With AdarGCN, how much and how far the information carried by each graph node is propagated in the graph structure can be determined automatically, thereby alleviating the effects of both noisy and outlying training samples. Extensive experiments show the superior performance of AdarGCN under both the new FSFSL and the conventional FSL settings.

preprint2020arXiv

AFeSe2 (A=Tl, K, Rb, or Cs): Iron-based superconducting analog of the cuprates

It has long been a challenging task to find compounds with crystal and electronic structures similar to those of cuprate superconductors, with low dimensionality and strong antiferromagnetic fluctuations. The parent compounds of cuprate superconductors are Mott insulators with strong in-plane antiferromagnetic exchange interactions between Cu moments. Here we show, based on first-principles density functional calculations, that AFeSe2 (A=Tl, K, Rb, or Cs) exhibit many of the physical properties common to the cuprate parent compounds: (1) the FeSe2 layer in AFeSe2 is similar in crystalline and electronic structure to the CuO2 plane in cuprates, although the Se atoms are not coplanar with the square Fe lattice; (2) they are antiferromagnetic insulators, but with relatively small charge excitation gaps; (3) their ground states are Néel antiferromagnetically ordered, as in cuprates; and (4) the antiferromagnetic exchange interactions between Fe moments are larger than in other iron-based superconducting materials, but comparable to those in cuprates. Like cuprates, these compounds may become high-Tc superconductors upon doping of charge carriers, either by chemical substitution or intercalation or by liquid or solid gating.

preprint2020arXiv

Automatic Differentiation for Second Renormalization of Tensor Networks

Tensor renormalization group (TRG) constitutes an important methodology for accurate simulations of strongly correlated lattice models. Facilitated by the automatic differentiation technique widely used in deep learning, we propose a uniform framework of differentiable TRG ($\partial$TRG) that can be applied to improve various TRG methods automatically. Essentially, $\partial$TRG systematically extends the concept of second renormalization [PRL 103, 160601 (2009)], where the tensor environment is computed recursively in the backward iteration: given the forward process of TRG, $\partial$TRG automatically finds the gradient through backpropagation, with which one can deeply "train" the tensor networks. We benchmark $\partial$TRG on the square-lattice Ising model, and demonstrate its power by simulating one- and two-dimensional quantum systems at finite temperature. The deep optimization, together with GPU acceleration, enables $\partial$TRG to perform many-body simulations with high efficiency and accuracy.

preprint2020arXiv

BézierSketch: A generative model for scalable vector sketches

The study of neural generative models of human sketches is a fascinating contemporary modeling problem due to the links between sketch image generation and the human drawing process. The landmark SketchRNN provided a breakthrough by sequentially generating sketches as a sequence of waypoints. However, this leads to low-resolution image generation and a failure to model long sketches. In this paper we present BézierSketch, a novel generative model for fully vector sketches that are automatically scalable and high-resolution. To this end, we first introduce a novel inverse graphics approach to stroke embedding that trains an encoder to embed each stroke into its best-fit Bézier curve. This enables us to treat sketches as short sequences of parameterized strokes and thus train a recurrent sketch generator with greater capacity for longer sketches, while producing scalable high-resolution results. We report qualitative and quantitative results on the Quick, Draw! benchmark.
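What "best-fit Bézier curve" means for a single stroke can be shown with a classical, non-learned fit. This is not the paper's inverse-graphics encoder, which learns the embedding. The sketch assumes cubic curves and chord-length parameterization, and solves for the four control points by linear least squares. All function names are hypothetical.

```python
import numpy as np


def bernstein_matrix(t):
    """Rows are the four cubic Bernstein basis functions evaluated at each parameter in t."""
    t = t[:, None]
    return np.hstack([(1 - t) ** 3, 3 * t * (1 - t) ** 2, 3 * t ** 2 * (1 - t), t ** 3])


def fit_cubic_bezier(points):
    """Least-squares cubic Bezier control points (4, 2) for a stroke of (N, 2) waypoints, N >= 4.

    Assumes the stroke has nonzero length (consecutive waypoints are not all identical).
    """
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(seg)]) / seg.sum()  # chord-length parameters in [0, 1]
    ctrl, *_ = np.linalg.lstsq(bernstein_matrix(t), points, rcond=None)
    return ctrl


def eval_bezier(ctrl, num=50):
    """Sample the curve at num evenly spaced parameters; resolution is chosen at render time."""
    return bernstein_matrix(np.linspace(0.0, 1.0, num)) @ ctrl
```

A stroke of any length collapses to four control points, which is why sketches become short sequences of parameterized strokes. Rendering at a higher `num` yields a higher-resolution curve from the same parameters.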

preprint2020arXiv

Compressing deep neural networks by matrix product operators

A deep neural network is a parametrization of a multilayer mapping of signals in terms of many alternately arranged linear and nonlinear transformations. The linear transformations, which are generally used in the fully connected as well as convolutional layers, contain most of the variational parameters that are trained and stored. Compressing a deep neural network to reduce its number of variational parameters but not its prediction power is an important but challenging problem toward establishing an optimized scheme for training these parameters efficiently and lowering the risk of overfitting. Here we show that this problem can be effectively solved by representing linear transformations with matrix product operators (MPOs), a tensor network originally proposed in physics to characterize the short-range entanglement in one-dimensional quantum states. We have tested this approach on five typical neural networks, including FC2, LeNet-5, VGG, ResNet, and DenseNet, on two widely used data sets, namely MNIST and CIFAR-10, and found that this MPO representation indeed sets up a faithful and efficient mapping between input and output signals, which can keep or even improve the prediction accuracy with a dramatically reduced number of parameters. Our method greatly simplifies the representations in deep learning, and opens a possible route toward establishing a framework of modern neural networks that might be simpler and cheaper, but more efficient.
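Representing a weight matrix as an MPO can be illustrated by factorizing an existing matrix with sequential SVDs, as in the tensor-train decomposition. The abstract does not state how the cores are obtained or trained, so this procedure, the optional bond-dimension truncation `max_bond`, and the function names are assumptions for illustration. Row and column indices are split as W[(i1, ..., in), (j1, ..., jn)] in C order.

```python
import numpy as np


def matrix_to_mpo(W, in_dims, out_dims, max_bond=None):
    """Factorize W of shape (prod(in_dims), prod(out_dims)) into cores of shape (r_prev, i_k, j_k, r_k)."""
    n = len(in_dims)
    T = W.reshape(*in_dims, *out_dims)
    T = T.transpose([ax for k in range(n) for ax in (k, n + k)])  # interleave to (i1, j1, ..., in, jn)
    cores, bond = [], 1
    for k in range(n - 1):
        T = T.reshape(bond * in_dims[k] * out_dims[k], -1)
        U, S, Vt = np.linalg.svd(T, full_matrices=False)
        r = len(S) if max_bond is None else min(max_bond, len(S))  # optional truncation
        cores.append(U[:, :r].reshape(bond, in_dims[k], out_dims[k], r))
        T = S[:r, None] * Vt[:r]  # carry the remainder to the next site
        bond = r
    cores.append(T.reshape(bond, in_dims[-1], out_dims[-1], 1))
    return cores


def mpo_to_matrix(cores):
    """Contract the cores back into a dense (prod(in_dims), prod(out_dims)) matrix."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    out = out.squeeze(axis=(0, -1))  # now (i1, j1, i2, j2, ..., in, jn)
    n = len(cores)
    out = out.transpose([2 * k for k in range(n)] + [2 * k + 1 for k in range(n)])
    rows = int(np.prod([c.shape[1] for c in cores]))
    return out.reshape(rows, -1)
```

Without truncation the contraction reproduces W exactly. Capping the bond dimension trades reconstruction accuracy for far fewer stored parameters: for a 24 x 12 matrix split as (2, 3, 4) x (2, 2, 3), max_bond=2 keeps 56 numbers instead of 288.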

preprint2020arXiv

Correlation between Fermi surface reconstruction and superconductivity in pressurized FeTe0.55Se0.45

Here we report the first results of high-pressure Hall coefficient (RH) measurements, combined with high-pressure resistance measurements at different temperatures, on the putative topological superconductor FeTe0.55Se0.45. We find an intimate correlation between the sign change of RH, a fingerprint of Fermi surface reconstruction, and both the structural phase transition and superconductivity. Below the critical pressure (PC) of 2.7 GPa, our data reveal that hole and electron carriers are thermally balanced (RH = 0) at a critical temperature (T*), where RH changes sign from positive to negative and, concurrently, a tetragonal-orthorhombic phase transition takes place. Between 1 bar and PC, T* is continuously suppressed by pressure, while TC increases monotonically. At about PC, T* becomes indistinguishable and TC reaches its maximum value. Moreover, a pressure-induced sign change of RH is found at ~PC, where the orthorhombic-monoclinic phase transition occurs. With further compression, TC decreases and disappears at ~12 GPa. The correlation among electron-hole balance, crystal structure and superconductivity found in pressurized FeTe0.55Se0.45 implies that its nontrivial superconductivity is closely associated with its exotic normal state, which results from the interplay between Fermi surface reconstruction and changes in the structural lattice.

preprint2020arXiv

Cross-Modal Hierarchical Modelling for Fine-Grained Sketch Based Image Retrieval

Sketch as an image search query is an ideal alternative to text for capturing fine-grained visual details. Prior successes on fine-grained sketch-based image retrieval (FG-SBIR) have demonstrated the importance of tackling the unique traits of sketches as opposed to photos, e.g., temporal vs. static, strokes vs. pixels, and abstract vs. pixel-perfect. In this paper, we study a further trait of sketches that has been overlooked to date: they are hierarchical in terms of level of detail, since a person typically sketches an object to varying extents of detail. This hierarchical structure is often visually distinct. We design a novel network that is capable of cultivating sketch-specific hierarchies and exploiting them to match a sketch with a photo at corresponding hierarchical levels. In particular, features from a sketch and a photo are enriched using cross-modal co-attention, coupled with hierarchical node fusion at every level, to form a better embedding space for retrieval. Experiments on common benchmarks show that our method outperforms state-of-the-art methods by a significant margin.

preprint2020arXiv

Deep Domain-Adversarial Image Generation for Domain Generalisation

Machine learning models typically suffer from the domain shift problem when trained on a source dataset and evaluated on a target dataset with a different distribution. To overcome this problem, domain generalisation (DG) methods aim to leverage data from multiple source domains so that a trained model can generalise to unseen domains. In this paper, we propose a novel DG approach based on Deep Domain-Adversarial Image Generation (DDAIG). Specifically, DDAIG consists of three components: a label classifier, a domain classifier and a domain transformation network (DoTNet). The goal of DoTNet is to map the source training data to unseen domains. This is achieved with a learning objective formulated to ensure that the generated data can be correctly classified by the label classifier while fooling the domain classifier. By augmenting the source training data with the generated unseen-domain data, we can make the label classifier more robust to unknown domain changes. Extensive experiments on four DG datasets demonstrate the effectiveness of our approach.
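The DoTNet objective described above, staying correctly classifiable by the label classifier while fooling the domain classifier, can be written as a loss on the logits of the generated images. This sketch assumes the common adversarial form "label cross-entropy minus lam times domain cross-entropy", with a hypothetical weight lam = 0.3. The network that produces the images and logits is outside the sketch, and the function names are hypothetical.

```python
import numpy as np


def cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels under softmax(logits)."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def dotnet_objective(label_logits, class_labels, domain_logits, domain_labels, lam=0.3):
    """Loss the transformation network minimizes on its generated images.

    A low label cross-entropy keeps the class semantics. A high domain cross-entropy means
    the domain classifier can no longer recognize the source domain.
    """
    return cross_entropy(label_logits, class_labels) - lam * cross_entropy(domain_logits, domain_labels)
```

Minimizing this objective favors transformations that leave the class label intact but move images away from every recognizable source domain. This is the sense in which DoTNet maps training data to unseen domains.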

preprint2020arXiv

Domain-Adaptive Few-Shot Learning

Existing few-shot learning (FSL) methods make the implicit assumption that the few target class samples are from the same domain as the source class samples. However, in practice this assumption is often invalid: the target classes could come from a different domain. This poses an additional challenge of domain adaptation (DA) with few training samples. In this paper, the problem of domain-adaptive few-shot learning (DA-FSL) is tackled, which requires solving FSL and DA in a unified framework. To this end, we propose a novel domain-adversarial prototypical network (DAPN) model. It is designed to address a specific challenge in DA-FSL: the DA objective means that the source and target data distributions need to be aligned, typically through a shared domain-adaptive feature embedding space; but the FSL objective dictates that the per-class distribution in the target domain must differ from that of any source domain class, meaning that aligning the distributions across domains may harm FSL performance. How to achieve global domain distribution alignment whilst maintaining source/target per-class discriminativeness thus becomes the key question. Our solution is to explicitly enhance the source/target per-class separation before domain-adaptive feature embedding learning in the DAPN, in order to alleviate the negative effect of domain alignment on FSL. Extensive experiments show that our DAPN outperforms the state-of-the-art FSL and DA models, as well as their naïve combinations. The code is available at https://github.com/dingmyu/DAPN.

preprint2020arXiv

Egocentric Action Recognition by Video Attention and Temporal Context

We present the submission of Samsung AI Centre Cambridge to the CVPR2020 EPIC-Kitchens Action Recognition Challenge. In this challenge, action recognition is posed as the problem of simultaneously predicting a single 'verb' and 'noun' class label given an input trimmed video clip. That is, a 'verb' and a 'noun' together define a compositional 'action' class. The challenging aspects of this real-life action recognition task include small, fast-moving objects, complex hand-object interactions, and occlusions. At the core of our submission is a recently proposed spatial-temporal video attention model, called 'W3' ('What-Where-When') attention [perez2020knowing]. We further introduce a simple yet effective contextual learning mechanism to model 'action' class scores directly from long-term temporal behaviour based on the 'verb' and 'noun' prediction scores. Our solution achieves strong performance on the challenge metrics without using object-specific reasoning or extra training data. In particular, our best solution with a multimodal ensemble achieves 2nd place for 'verb', and 3rd place for 'noun' and 'action', on the Seen Kitchens test set.
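Since a 'verb' and a 'noun' together define an 'action', the simplest way to obtain action scores from the two prediction heads is to treat them as independent and take their outer product. The sketch below shows only this naive baseline. It is not the learned contextual mechanism described above, which models action scores from long-term temporal behaviour. The function names are hypothetical.

```python
import numpy as np


def action_scores(verb_probs, noun_probs):
    """Naive compositional scores p(verb, noun) ~= p(verb) * p(noun), assuming independence."""
    return verb_probs[:, None] * noun_probs[None, :]


def top_action(verb_probs, noun_probs):
    """Return the (verb index, noun index) pair with the highest composed score."""
    s = action_scores(verb_probs, noun_probs)
    v, n = np.unravel_index(np.argmax(s), s.shape)
    return int(v), int(n)
```

The independence assumption ignores which verb-noun pairs actually co-occur (for example, a verb that only makes sense with a few nouns). A learned mechanism over verb and noun scores can capture such dependencies.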

preprint2020arXiv

Few-Shot Learning as Domain Adaptation: Algorithm and Analysis

To recognize unseen classes with only a few samples, few-shot learning (FSL) uses prior knowledge learned from the seen classes. A major challenge for FSL is that the distribution of the unseen classes differs from that of the seen classes, resulting in poor generalization even when a model is meta-trained on the seen classes. This class-difference-caused distribution shift can be considered a special case of domain shift. In this paper, for the first time, we propose a domain adaptation prototypical network with attention (DAPNA) to explicitly tackle such a domain shift problem in a meta-learning framework. Specifically, armed with a set-transformer-based attention module, we construct each episode with two sub-episodes without class overlap on the seen classes to simulate the domain shift between the seen and unseen classes. To align the feature distributions of the two sub-episodes with limited training samples, a feature transfer network is employed together with a margin disparity discrepancy (MDD) loss. Importantly, a theoretical analysis is provided to give the learning bound of DAPNA. Extensive experiments show that DAPNA outperforms state-of-the-art FSL alternatives, often by significant margins.
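Two building blocks named above can be sketched directly: splitting an episode's classes into two sub-episodes with no class overlap, and prototypical-network classification within a sub-episode. The set-transformer attention, feature transfer network and MDD loss are not shown. The random half-split and the function names are assumptions for illustration.

```python
import numpy as np


def split_into_sub_episodes(class_ids, rng):
    """Randomly split an episode's classes into two sub-episodes with no class overlap."""
    perm = rng.permutation(class_ids)
    half = len(perm) // 2
    return perm[:half], perm[half:]


def prototype_logits(support, support_labels, query, classes):
    """Prototypical-network logits: negative squared distance from each query to each class mean."""
    protos = np.stack([support[support_labels == c].mean(axis=0) for c in classes])
    return -((query[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
```

Because the two sub-episodes share no classes, their feature distributions differ in the same way that seen and unseen classes do. Aligning the two sub-episodes during meta-training therefore simulates the shift the model will face at test time.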

preprint2020arXiv

Fine-Grained Instance-Level Sketch-Based Video Retrieval

Existing sketch-analysis work studies sketches depicting static objects or scenes. In this work, we propose a novel cross-modal retrieval problem of fine-grained instance-level sketch-based video retrieval (FG-SBVR), where a sketch sequence is used as a query to retrieve a specific target video instance. Compared with sketch-based still image retrieval, and coarse-grained category-level video retrieval, this is more challenging as both visual appearance and motion need to be simultaneously matched at a fine-grained level. We contribute the first FG-SBVR dataset with rich annotations. We then introduce a novel multi-stream multi-modality deep network to perform FG-SBVR under both strong and weakly supervised settings. The key component of the network is a relation module, designed to prevent model over-fitting given scarce training data. We show that this model significantly outperforms a number of existing state-of-the-art models designed for video analysis.

preprint2020arXiv

Incremental Few-Shot Object Detection

Most existing object detection methods rely on the availability of abundant labelled training samples per class and offline model training in batch mode. These requirements substantially limit their scalability to the open-ended accommodation of novel classes with limited labelled training data. We present a study that aims to go beyond these limitations by considering the Incremental Few-Shot Detection (iFSD) problem setting, where new classes must be registered incrementally (without revisiting base classes) and with few examples. To this end, we propose OpeN-ended Centre nEt (ONCE), a detector designed for incrementally learning to detect novel class objects from few examples. This is achieved by an elegant adaptation of the CentreNet detector to the few-shot learning scenario and by meta-learning a class-specific code generator model for registering novel classes. ONCE fully respects the incremental learning paradigm: novel class registration requires only a single forward pass of the few-shot training samples and no access to base classes, which makes it suitable for deployment on embedded devices. Extensive experiments on both standard object detection and fashion landmark detection tasks show the feasibility of iFSD for the first time, opening an interesting and very important line of research.

preprint2020arXiv

Knowing What, Where and When to Look: Efficient Video Action Modeling with Attention

Attentive video modeling is essential for action recognition in unconstrained videos because of their rich yet redundant information over space and time. However, introducing attention into a deep neural network for action recognition is challenging for two reasons. First, an effective attention module needs to learn what (objects and their local motion patterns), where (spatially), and when (temporally) to focus on. Second, a video attention module must be efficient, because existing action recognition models already suffer from high computational cost. To address both challenges, a novel What-Where-When (W3) video attention module is proposed. Departing from existing alternatives, our W3 module models all three facets of video attention jointly. Crucially, it is highly efficient because it factorizes the high-dimensional video feature data into low-dimensional meaningful spaces (a 1D channel vector for 'what' and 2D spatial tensors for 'where'), followed by lightweight temporal attention reasoning. Extensive experiments show that our attention model brings significant improvements to existing action recognition models, achieving new state-of-the-art performance on a number of benchmarks.

preprint2020arXiv

On Learning Semantic Representations for Million-Scale Free-Hand Sketches

In this paper, we study learning semantic representations for million-scale free-hand sketches. This is highly challenging due to the domain-unique traits of sketches: they are diverse, sparse, abstract and noisy. We propose a dual-branch CNN-RNN network architecture to represent sketches, which simultaneously encodes both the static and temporal patterns of sketch strokes. Based on this architecture, we further explore learning sketch-oriented semantic representations in two challenging yet practical settings, namely hashing retrieval and zero-shot recognition on million-scale sketches. Specifically, we use our dual-branch architecture as a universal representation framework to design two sketch-specific deep models: (i) a deep hashing model for sketch retrieval, where a novel hashing loss is specifically designed to accommodate both the abstract and messy traits of sketches; (ii) a deep embedding model for sketch zero-shot recognition, for which we collect a large-scale edge-map dataset and extract a set of semantic vectors from edge maps as the semantic knowledge for sketch zero-shot domain alignment. Both deep models are evaluated by comprehensive experiments on million-scale sketches and outperform state-of-the-art competitors.

preprint2020arXiv

Sketch Less for More: On-the-Fly Fine-Grained Sketch Based Image Retrieval

Fine-grained sketch-based image retrieval (FG-SBIR) addresses the problem of retrieving a particular photo instance given a user's query sketch. Its widespread applicability is, however, hindered by the fact that drawing a sketch takes time, and most people struggle to draw a complete and faithful sketch. In this paper, we reformulate the conventional FG-SBIR framework to tackle these challenges, with the ultimate goal of retrieving the target photo with the fewest strokes possible. We further propose an on-the-fly design that starts retrieving as soon as the user starts drawing. To accomplish this, we devise a reinforcement-learning-based cross-modal retrieval framework that directly optimizes the rank of the ground-truth photo over a complete sketch drawing episode. Additionally, we introduce a novel reward scheme that circumvents the problems caused by irrelevant sketch strokes, and thus provides a more consistent rank list during retrieval. We achieve superior early-retrieval efficiency over state-of-the-art methods and alternative baselines on two publicly available fine-grained sketch retrieval datasets.

preprint2020arXiv

Tunable giant magnetoresistance in a single-molecule junction

Controlling electronic transport through a single-molecule junction is crucial for molecular electronics or spintronics. In magnetic molecular devices, the spin degree of freedom can be used to this end, since the magnetic properties of the magnetic ion centers fundamentally affect transport through the molecules. Here we demonstrate that the electron pathway in a single-molecule device can be switched between two molecular orbitals by varying a magnetic field, giving rise to a tunable anisotropic magnetoresistance of up to 93%. This unique tunability of the electron pathways arises from the magnetic reorientation of the transition-metal center, which re-hybridizes the molecular orbitals. We identify the tunneling electron pathways through the Kondo effect, which manifests as either a peak or a dip line shape. The energy changes associated with these spin reorientations are remarkably low, below one millielectronvolt. The large tunable anisotropic magnetoresistance could be used to control electronic transport in molecular spintronics.

preprint2019arXiv

Emergent superconductivity in single crystalline $\mathrm{MgTi}_2\mathrm{O}_4$ films via structural engineering

Spinel compounds have demonstrated rich functionalities but have rarely shown superconductivity. Here, we report the emergence of superconductivity in the spinel $\mathrm{MgTi}_2\mathrm{O}_4$, which is known to be an insulator with a complicated order. The superconducting transition is achieved by engineering a superlattice of $\mathrm{MgTi}_2\mathrm{O}_4$ and $\mathrm{SrTiO}_3$. In this geometry, the onset transition temperature in the $\mathrm{MgTi}_2\mathrm{O}_4$ layer can be tuned from 0 to 5 K, concurrently with a stretched $c$-axis (from 8.51 to 8.53 Å) compared with the bulk material. Such a positive correlation without saturation suggests ample room for further enhancement. Intriguingly, the superlattice exhibits an isotropic upper critical field $H_{\mathrm{c}2}$ that breaks the Pauli limit, distinct from the highly anisotropic character of interface superconductivity. The origin of superconductivity in the $\mathrm{MgTi}_2\mathrm{O}_4$ layer is understood by combining electron energy loss spectra with first-principles electronic structure calculations, which indicate that superconductivity emerges in the $\mathrm{MgTi}_2\mathrm{O}_4$ layer because Ti-Ti dimerization is prevented. Our discovery not only provides a platform to explore the interplay between superconductivity and other exotic states, but also opens a new window to realize superconductivity in spinel compounds as well as other titanium oxides.

preprint2019arXiv

Evidence for an Additional Symmetry Breaking from Direct Observation of Band Splitting in the Nematic State of FeSe Superconductor

The iron-based superconductor FeSe has attracted much recent attention because of its simple crystal structure, distinct electronic structure and the rich physics exhibited by FeSe and its derivatives. Determining its intrinsic electronic structure is crucial for understanding its physical properties and superconductivity mechanism. Theoretical and experimental studies so far have provided a picture in which FeSe has a single hole-like Fermi surface around the Brillouin zone center in its nematic state. Here we report direct observation of two hole-like Fermi surface sheets around the Brillouin zone center, and the splitting of the associated bands, in the nematic state of FeSe, using high-resolution laser-based angle-resolved photoemission measurements. These results indicate that, in addition to nematic order and spin-orbit coupling, FeSe hosts an additional order that breaks either inversion or time-reversal symmetry. The new Fermi surface topology calls for a reexamination of the existing theoretical and experimental understanding of FeSe and should stimulate further efforts to identify the origin of the hidden order in its nematic state.

preprint2019arXiv

Mott phase in a van der Waals transition-metal halide at single layer limit

Two-dimensional materials offer opportunities for unravelling unprecedented ordered states at the single-layer limit. Among such ordered states, the Mott phase has rarely been explored. Here, we report a Mott phase in van der Waals chromium(II) iodide (CrI2) films. High-quality CrI2 films with atomically flat surfaces and macroscopic size are grown on graphitized 6H-SiC(0001) substrates by molecular beam epitaxy. Using in situ low-temperature scanning tunneling microscopy and spectroscopy (STM/STS), we reveal that the film has a band gap as large as ~3.2 eV, which is nearly thickness independent. Density functional plus dynamical mean-field theory calculations suggest that CrI2 films may be a strong Mott insulator with a ferromagnetically ordered ground state. The Mott phase is corroborated by the spectral band splitting, which is consistent with the extended Hubbard model, and by the gap reduction at charge dopants. Our study provides a platform for studying correlated electron states at the single-layer limit.

preprint2019arXiv

Non-Volatile Superconductivity in an Insulating Copper Oxide Induced via Ionic Liquid Gating

Manipulating the superconducting states of high-$T_c$ cuprate superconductors efficiently and reliably is of great importance for their applications in next-generation electronics. Traditional methods rely mostly on trial and error, which is difficult to implement and time consuming. Here, employing ionic liquid gating, selective control of volatile and non-volatile superconductivity is achieved in a pristine insulating Pr$_2$CuO$_{4\pm\delta}$ film, based on two distinct mechanisms: 1) under positive electric fields, the film can be reversibly switched between non-superconducting and superconducting states, which is attributed to the carrier doping effect; 2) the film becomes more resistive when a negative bias voltage of up to -4 V is applied, but, strikingly, non-volatile superconductivity is achieved once the gate voltage is removed. Such a persistent superconducting state represents a novel phenomenon in copper oxides, resulting from the doping healing of oxygen vacancies in the copper-oxygen planes, as unraveled by high-resolution scanning transmission electron microscopy and in situ x-ray diffraction experiments. The effective manipulation and control of volatile and non-volatile superconductivity in the same parent cuprate opens the door to more functionalities for superconducting electronics, and also supplies flexible samples for investigating the nature of quantum phase transitions in high-$T_c$ superconductors.

preprint2019arXiv

Selective Hybridization between Main Band and Superstructure Band in Bi$_2$Sr$_2$CaCu$_2$O$_{8+δ}$ Superconductor

High-resolution laser-based angle-resolved photoemission measurements have been carried out on Bi$_2$Sr$_2$CaCu$_2$O$_{8+δ}$ (Bi2212) and Bi$_2$Sr$_{2-x}$La$_x$CuO$_{6+δ}$ (Bi2201) superconductors. An unexpected hybridization between the main band and the superstructure band in Bi2212 is clearly revealed. In the momentum region where a main Fermi surface intersects a superstructure Fermi surface, four bands are observed instead of two. The hybridization exists in both the superconducting and normal states, and in Bi2212 samples with different doping levels. No such hybridization is observed in Bi2201. This phenomenon can be understood by considering the bilayer splitting in Bi2212, the selective hybridization of two bands with peculiar combinations, and the altered matrix element effects of the hybridized bands. These observations provide strong evidence that the superstructure band is intrinsic to the CuO$_2$ planes. Therefore, any understanding of the physical properties and superconductivity mechanism of Bi2212 should consider the complete Fermi surface topology, which involves the main bands, the superstructure bands and their interactions.

preprint2019arXiv

Strong coupling superconductivity in trilayer film LiB$_2$C$_2$

Coupling between $σ$-bonding electrons and phonons is generally very strong. Metallizing $σ$-electrons therefore provides a promising route to new high-T$_c$ superconductors. Based on this picture and on first-principles density functional calculations with Wannier interpolation of the electronic structure and lattice dynamics, we predict that trilayer film LiB$_2$C$_2$ is a good candidate for realizing this kind of high-T$_c$ superconductivity. By solving the anisotropic Eliashberg equations, we find that free-standing trilayer LiB$_2$C$_2$ is a phonon-mediated superconductor with T$_c$ exceeding the liquid-nitrogen temperature at ambient pressure. The transition temperature can be further raised to 125 K by applying biaxial tensile strain.

preprint2016arXiv

A conducting nano-filament (CNF) network as a precursor to the origin of superconductivity in electron-doped copper oxides

The emergence of superconductivity at the instabilities of antiferromagnetism (AFM) has been widely recognized in unconventional superconductors. In copper-oxide superconductors, spin fluctuations play a predominant role in electron pairing in electron-doped compounds, whereas composite orders veil the nature of superconductivity in the hole-doped family. However, in electron-doped copper oxide superconductors (cuprates), the location of the AFM critical end point remains controversial, with different probes giving different results and a high sensitivity to oxygen content. Here, by carefully tuning the oxygen content, a systematic study of the Hall signal and magnetoresistivity up to 58 T on La$_{2-x}$Ce$_x$CuO$_4$ (LCCO) thin films identifies two characteristic temperatures. The former is quite robust, whereas the latter becomes flexible with increasing magnetic field; they are thereby linked to two- and three-dimensional AFM, respectively, as is evident from the multidimensional phase diagram as a function of oxygen and Ce dopants. A rigorous theoretical analysis of the presented data suggests the existence of conductive nano-filamentary structures that are consistent with all previously reported field studies. The new findings provide a uniquely consistent alternative picture for understanding the interactions between AFM and superconductivity in electron-doped cuprates, and offer a consolidating interpretation of the pioneering scaling law in cuprates recently established by Bozovic et al. (Nature, 2016).

preprint2016arXiv

Deep Transfer Learning for Person Re-identification

Person re-identification (Re-ID) poses a unique challenge to deep learning: how to learn a deep model with millions of parameters on a small training set with few or no labels. In this paper, a number of deep transfer learning models are proposed to address this data sparsity problem. First, a deep network architecture is designed that differs from existing deep Re-ID models in that (a) it is more suitable for transferring representations learned from large image classification datasets, and (b) it combines a classification loss and a verification loss, each with a different dropout strategy. Second, a two-step fine-tuning strategy is developed to transfer knowledge from auxiliary datasets. Third, given an unlabelled Re-ID dataset, a novel unsupervised deep transfer learning model is developed based on co-training. The proposed models outperform state-of-the-art deep Re-ID models by large margins: we achieve Rank-1 accuracy of 85.4%, 83.7% and 56.3% on CUHK03, Market1501 and VIPeR, respectively, whilst on VIPeR our unsupervised model (45.1%) beats most supervised models.

preprint2016arXiv

Highly Efficient Regression for Scalable Person Re-Identification

Existing person re-identification models scale poorly to the large data required in real-world applications for two reasons: (1) Complexity: they employ complex models for optimal performance, resulting in high computational cost for training at a large scale; (2) Inadaptability: once trained, they are unsuitable for incremental updates that incorporate newly available data. This work proposes a truly scalable solution to re-id by addressing both problems. Specifically, a Highly Efficient Regression (HER) model is formulated by embedding Fisher's criterion into a ridge regression model, enabling very fast re-id model learning with scalable memory/storage usage. Importantly, this new HER model supports faster-than-real-time incremental model updates, making real-time active learning with a human in the loop feasible for re-id. Extensive experiments show that such a simple and fast model not only notably outperforms state-of-the-art re-id methods, but also scales better to large data, with additional benefits for active learning that reduce human labelling effort in re-id deployment.
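
The closed-form nature of ridge regression is what makes cheap incremental updates possible: the solution $W = (X^\top X + \lambda I)^{-1} X^\top Y$ depends on the data only through the running sums $X^\top X$ and $X^\top Y$. The sketch below illustrates that idea with plain ridge regression in pure Python. It is an assumption-laden illustration, not the HER model itself: how HER embeds Fisher's criterion (i.e. how the regression targets are constructed) is not modelled, and the class name `IncrementalRidge` is hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b (a square, one column of b per right-hand side)
    by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in a[i]] + [float(v) for v in b[i]] for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


class IncrementalRidge:
    """Plain ridge regression W = (X^T X + lam * I)^{-1} X^T Y kept as running sums.

    Hypothetical illustration: targets y are taken as given; HER's
    Fisher-criterion target construction is NOT reproduced here.
    """

    def __init__(self, dim_in: int, dim_out: int, lam: float = 1.0) -> None:
        self.lam = lam
        self.xtx = [[0.0] * dim_in for _ in range(dim_in)]
        self.xty = [[0.0] * dim_out for _ in range(dim_in)]

    def add(self, x: Sequence[float], y: Sequence[float]) -> None:
        """Fold in one sample: O(d^2) work, without revisiting earlier data."""
        for i, xi in enumerate(x):
            for j, xj in enumerate(x):
                self.xtx[i][j] += xi * xj
            for k, yk in enumerate(y):
                self.xty[i][k] += xi * yk

    def weights(self) -> Matrix:
        """Closed-form solution from the current sums (dim_in x dim_out)."""
        d = len(self.xtx)
        reg = [[self.xtx[i][j] + (self.lam if i == j else 0.0) for j in range(d)]
               for i in range(d)]
        return solve(reg, self.xty)


model = IncrementalRidge(dim_in=1, dim_out=1, lam=1.0)
model.add([1.0], [2.0])
model.add([2.0], [4.0])
print(model.weights())  # [[1.6666666666666667]], i.e. 10 / 6
```

Each new labelled sample only updates two small matrices, after which the weights are re-solved in time independent of how many samples have been seen.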

preprint2016arXiv

Learning a Discriminative Null Space for Person Re-identification

Most existing person re-identification (re-id) methods focus on learning the optimal distance metrics across camera views. Typically a person's appearance is represented using features of thousands of dimensions, whilst only hundreds of training samples are available due to the difficulties in collecting matched training images. With the number of training samples much smaller than the feature dimension, the existing methods thus face the classic small sample size (SSS) problem and have to resort to dimensionality reduction techniques and/or matrix regularisation, which lead to loss of discriminative power. In this work, we propose to overcome the SSS problem in re-id distance metric learning by matching people in a discriminative null space of the training data. In this null space, images of the same person are collapsed into a single point thus minimising the within-class scatter to the extreme and maximising the relative between-class separation simultaneously. Importantly, it has a fixed dimension, a closed-form solution and is very efficient to compute. Extensive experiments carried out on five person re-identification benchmarks including VIPeR, PRID2011, CUHK01, CUHK03 and Market1501 show that such a simple approach beats the state-of-the-art alternatives, often by a big margin.

preprint2016arXiv

Self-consistent spin-wave analysis of the 1/3 magnetization plateau in the kagome antiferromagnet

We propose a modified spin-wave theory to study the 1/3 magnetization plateau of the antiferromagnetic Heisenberg model on the kagome lattice. Through the self-consistent inclusion of quantum corrections, the 1/3 plateau is stabilized over a broad range of magnetic fields for all spin quantum numbers $S$. The values of the critical magnetic fields and the widths of the magnetization plateaus are fully consistent with recent numerical results from exact diagonalization and infinite projected entangled pair states.

preprint2016arXiv

Semantic Regularisation for Recurrent Image Annotation

The "CNN-RNN" design pattern is increasingly widely applied in a variety of image annotation tasks, including multi-label classification and captioning. Existing models use the weakly semantic CNN hidden layer, or a transform of it, as the image embedding that provides the interface between the CNN and RNN. This leaves the RNN overstretched with two jobs: predicting the visual concepts and modelling their correlations to generate structured annotation output. Importantly, this makes end-to-end training of the CNN and RNN slow and ineffective, owing to the difficulty of back-propagating gradients through the RNN to train the CNN. We propose a simple modification to the design pattern that makes learning more effective and efficient. Specifically, we propose to use a semantically regularised embedding layer as the interface between the CNN and RNN. Regularising the interface can partially or completely decouple the learning problems, allowing each to be trained more effectively and making joint training much more efficient. Extensive experiments show that state-of-the-art performance is achieved on multi-label classification as well as image captioning.

preprint2015arXiv

A close look at antiferromagnetism in multidimensional phase diagram of electron-doped copper oxide

The emergence of superconductivity at the instabilities of antiferromagnetism (AFM) and spin/charge density waves has been widely recognized in unconventional superconductors. In copper-oxide superconductors, spin fluctuations play a predominant role in electron pairing in electron-doped compounds, whereas composite orders veil the nature of superconductivity in the hole-doped family. However, in the electron-doped compounds the end point of AFM remains controversial, owing to discrepancies between probes or its sensitivity to oxygen content. Here, by carefully tuning the oxygen content, a systematic study of the Hall signal and magnetoresistivity up to 58 T on optimally doped La$_{2-x}$Ce$_x$CuO$_{4\pm\delta}$ ($x = 0.10$) thin films identifies two characteristic temperatures, at 62.5 ± 7.5 K and 25 ± 5 K. The former is quite robust, whereas the latter becomes flexible with increasing magnetic field; they are thereby linked to two- and three-dimensional AFM, respectively, as is evident from the multidimensional phase diagram as a function of both oxygen and Ce dopants. Consequently, the observation of an extended AFM phase, in contrast to μSR results, corroborates an elevated critical doping in field, providing an unambiguous picture of the interactions between AFM and superconductivity.

preprint2015arXiv

Evolution of electronic states in n-type copper oxide superconductor via electric double layer gating

Since the discovery of n-type copper oxide superconductors, the evolution of the electron and hole bands and its relation to superconductivity have been regarded as key to unveiling the mechanism of high-T$_c$ superconductors. So far, electrons and holes in n-type copper oxides have been introduced by chemical doping, pressure and/or deoxygenation. However, the observed electronic properties are blurred by concomitant effects such as changes in lattice structure, disorder, etc. Here, we report successful tuning of the electronic band structure of n-type Pr$_{2-x}$Ce$_x$CuO$_4$ ($x = 0.15$) ultrathin films via the electric double layer transistor technique. Abnormal transport properties, such as multiple sign reversals of the Hall resistivity in the normal and mixed states, are revealed as the gate voltage is varied from -2 V to +2 V, together with varying temperature and magnetic field. In the mixed state, the intrinsic anomalous Hall conductivity invokes contributions from both the electron and hole bands as well as the energy-dependent density of states near the Fermi level. The two-band model also describes the normal-state transport properties well, while the carrier concentrations of electrons and holes are always enhanced or depressed simultaneously by the electric field. This contrasts with the scenario of Fermi surface reconstruction by antiferromagnetism, in which an anti-correlation between electrons and holes is commonly expected. Our findings suggest a picture in which Coulomb repulsion plays an important role in the evolution of the electronic states in n-type cuprate superconductors.
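
For orientation, the textbook low-field, isotropic two-band Drude expression for the Hall coefficient is $R_H = (n_h\mu_h^2 - n_e\mu_e^2)/\left[e\,(n_h\mu_h + n_e\mu_e)^2\right]$, so its sign is set by the balance $n_h\mu_h^2 - n_e\mu_e^2$. The sketch below evaluates that standard formula; it is an assumption that this simple form corresponds to the two-band model used in the work, and it ignores the mixed-state anomalous Hall contributions mentioned above. The function names and numbers are hypothetical.

```python
E_CHARGE = 1.602176634e-19  # elementary charge in coulombs (exact SI value)


def two_band_hall(n_e: float, mu_e: float, n_h: float, mu_h: float) -> float:
    """Low-field Hall coefficient R_H (m^3/C) of an isotropic two-band Drude model.

    n_e, n_h: electron and hole densities (m^-3);
    mu_e, mu_h: mobilities (m^2 V^-1 s^-1), given as positive magnitudes.
    """
    numerator = n_h * mu_h ** 2 - n_e * mu_e ** 2
    denominator = E_CHARGE * (n_h * mu_h + n_e * mu_e) ** 2
    return numerator / denominator


def hall_sign(n_e: float, mu_e: float, n_h: float, mu_h: float) -> int:
    """+1 if hole-like, -1 if electron-like, 0 exactly at the sign reversal."""
    balance = n_h * mu_h ** 2 - n_e * mu_e ** 2
    return (balance > 0) - (balance < 0)


# With only holes present the formula reduces to the one-band result 1 / (e n_h).
print(two_band_hall(n_e=0.0, mu_e=0.01, n_h=1e27, mu_h=0.01))
# Equal densities but more mobile electrons: electron-like (negative) Hall signal.
print(hall_sign(n_e=1e27, mu_e=0.01, n_h=1e27, mu_h=0.005))  # -1
```

Within this simple formula, multiplying both densities by the same factor rescales $|R_H|$ but leaves its sign unchanged; a sign reversal requires the relative weights $n_h\mu_h^2$ and $n_e\mu_e^2$ to cross.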

preprint2015arXiv

First-principles study of FeSe epitaxial films on SrTiO3

The discovery of high-temperature superconductivity in FeSe films on SrTiO3 substrates has inspired great experimental and theoretical interest. First-principles density functional theory calculations, which have played an important role in the study of bulk iron-based superconductors, have also contributed to the investigation of this interfacial superconductivity. In this article, we review calculation results on the electronic and magnetic structures of FeSe epitaxial films, emphasizing the interplay between different degrees of freedom, such as charge, spin and lattice vibrations. Furthermore, we compare FeSe monolayer and bilayer films on SrTiO3.

preprint2015arXiv

Ground State Degeneracy of Interacting Spinless Fermions

We propose an eigen-operator scheme to study the lattice model of interacting spinless fermions at half filling and show that this model possesses a hidden form of reflection positivity in its Majorana fermion representation. Based on this observation, we rigorously prove that the ground state of this model is either unique or doubly degenerate if the lattice size $N$ is even, and is always doubly degenerate if $N$ is odd. This proof holds in all dimensions and for arbitrary lattice structures.

preprint2015arXiv

Nematic antiferromagnetic states in bulk FeSe

We revisit bulk FeSe through systematic first-principles electronic structure calculations. We find a series of staggered $n$-mer antiferromagnetic (AFM) states whose energies lie below that of the collinear AFM state, which is the ground state of the parent compounds of most iron-based superconductors. Here a staggered $n$-mer state ($n$ any integer $>1$) consists of sets of $n$ adjacent parallel spins on a line along the $b$-axis, with antiparallel spins between neighboring $n$-mers and along the $a$-axis. Among these, the lowest-energy states are the quasi-degenerate staggered dimer and staggered trimer AFM states, as well as any staggered combinations of them. Thus, to maximize entropy and minimize the free energy at low temperature, the most favorable state is a quasi-one-dimensional antiferromagnet in which a variety of $n$-mers, mostly dimers and trimers, are randomly aligned antiparallel along the $b$-axis while spins are aligned antiparallel along the $a$-axis, i.e. effectively a nematic paramagnet. This finding accounts well for the absence of long-range magnetic order in bulk FeSe and also indicates that stripe spin fluctuations are dominant and that the nematicity is spin-driven.
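
The staggered $n$-mer pattern described above can be made concrete with a small generator. This is an illustrative sketch under stated assumptions: spins are represented as collinear Ising-like values $\pm 1$ on a rectangular grid of Fe sites, columns run along the $b$-axis and rows along the $a$-axis, and the function names are hypothetical. The random mixture of dimers and trimers proposed for the nematic paramagnet is not modelled.

```python
from typing import List


def staggered_nmer(n: int, rows: int, cols: int) -> List[List[int]]:
    """+1/-1 spin pattern of a staggered n-mer AFM state (illustrative).

    Columns run along b: blocks of n parallel spins alternate in sign.
    Rows run along a: neighboring rows are flipped, so spins are antiparallel.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return [
        [(1 if (j // n) % 2 == 0 else -1) * (1 if i % 2 == 0 else -1) for j in range(cols)]
        for i in range(rows)
    ]


def render(spins: List[List[int]]) -> str:
    """Draw up spins as '+' and down spins as '-', one row per line."""
    return "\n".join("".join("+" if s > 0 else "-" for s in row) for row in spins)


print(render(staggered_nmer(2, 2, 8)))  # staggered dimer state
# ++--++--
# --++--++
```

Passing `n = 3` gives the staggered trimer state; stitching dimer and trimer blocks together along $b$ would give the "staggered combinations" mentioned in the abstract.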

preprint2015arXiv

Robust Subjective Visual Property Prediction from Crowdsourced Pairwise Labels

The problem of estimating subjective visual properties from images and videos has attracted increasing interest. A subjective visual property is useful either on its own (e.g. image and video interestingness) or as an intermediate representation for visual recognition (e.g. a relative attribute). Because of its ambiguous nature, annotating the value of a subjective visual property for learning a prediction model is challenging. To make annotation more reliable, recent studies employ crowdsourcing tools to collect pairwise comparison labels, because human annotators are much better at ranking two images/videos (e.g. which one is more interesting) than at giving an absolute value to each of them separately. However, crowdsourced data also introduce outliers. Existing methods rely on majority voting to prune annotation outliers/errors, and thus require a large number of pairwise labels to be collected. More importantly, as a local outlier detection method, majority voting is ineffective at identifying outliers that cause global ranking inconsistencies. In this paper, we propose a more principled way to identify annotation outliers by formulating subjective visual property prediction as a unified robust learning-to-rank problem, tackling outlier detection and learning to rank jointly. Unlike existing methods, the proposed method integrates local pairwise comparison labels to minimise a cost that corresponds to the global inconsistency of the ranking order. This not only leads to better detection of annotation outliers but also enables learning with extremely sparse annotations. Extensive experiments on various benchmark datasets demonstrate that our new approach significantly outperforms state-of-the-art alternatives.

preprint2015arXiv

Semantic Graph for Zero-Shot Learning

Zero-shot learning aims to classify visual objects without any training data via knowledge transfer between seen and unseen classes. This is typically achieved by exploring a semantic embedding space in which the seen and unseen classes can be related. Previous works differ in which embedding space is used and in how different classes and a test image are related. In this paper, we use the annotation-free semantic word space for the former and focus on the latter issue of modeling relatedness. Specifically, in contrast to previous work, which ignores the semantic relationships among seen classes and focuses merely on those between seen and unseen classes, we propose a novel approach based on a semantic graph that represents the relationships between all seen and unseen classes in a semantic word space. Based on this semantic graph, we design a special absorbing Markov chain process in which each unseen class is viewed as an absorbing state. After a test image is incorporated into the semantic graph, the absorbing probabilities from the test data to each unseen class can be computed efficiently, and zero-shot classification is achieved by finding the class label with the highest absorbing probability. The proposed model has a closed-form solution that is linear in the number of test images. We demonstrate the effectiveness and computational efficiency of the proposed method over state-of-the-art methods on the AwA (Animals with Attributes) dataset.
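
For an absorbing Markov chain with transient-to-transient block $Q$ and transient-to-absorbing block $R$, the standard absorption probabilities are $B = (I - Q)^{-1} R$, which is the kind of closed-form computation the abstract refers to. The sketch below computes $B$ in pure Python by solving $(I - Q)B = R$ and picks the unseen class with the highest absorbing probability. How the semantic graph and its transition weights are built from word vectors is not shown; the transition values and the class names `u0`, `u1` are made up for illustration.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in a[i]] + [float(v) for v in b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def absorbing_probabilities(q: Matrix, r: Matrix) -> Matrix:
    """B = (I - Q)^{-1} R: B[i][k] is the probability that a walk started in
    transient state i is eventually absorbed in absorbing state k."""
    t = len(q)
    i_minus_q = [[(1.0 if i == j else 0.0) - q[i][j] for j in range(t)] for i in range(t)]
    return solve(i_minus_q, r)


def predict(b_row: Sequence[float], labels: Sequence[str]) -> str:
    """Return the absorbing (unseen) class with the highest absorbing probability."""
    return labels[max(range(len(b_row)), key=lambda k: b_row[k])]


# Transient states: 0 = the test image, 1 = a seen class. Absorbing: unseen u0, u1.
q = [[0.0, 0.5], [0.5, 0.0]]
r = [[0.5, 0.0], [0.0, 0.5]]
b = absorbing_probabilities(q, r)
print(predict(b[0], ["u0", "u1"]))  # u0 (absorbing probabilities 2/3 vs 1/3)
```

Solving the linear system directly avoids forming the inverse explicitly; each row of $B$ sums to one when every transient state can eventually reach an absorbing state.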

preprint2015arXiv

Sketch-a-Net that Beats Humans

We propose a multi-scale multi-channel deep neural network framework that, for the first time, yields sketch recognition performance surpassing that of humans. Our superior performance results from explicitly embedding the unique characteristics of sketches in our model: (i) a network architecture designed for sketch rather than natural photo statistics, (ii) a multi-channel generalisation that encodes the sequential ordering of the sketching process, and (iii) a multi-scale network ensemble with joint Bayesian fusion that accounts for the different levels of abstraction exhibited in free-hand sketches. We show that state-of-the-art deep networks specifically engineered for photos of natural objects fail to perform well on sketch recognition, regardless of whether they are trained using photos or sketches. Our network, on the other hand, not only delivers the best performance on the largest human sketch dataset to date, but is also small in size, making efficient training possible using just CPUs.

preprint2015arXiv

Tensor network algorithm by coarse-graining tensor renormalization on finite periodic lattices

We develop coarse-graining tensor renormalization group algorithms to compute physical properties of two-dimensional lattice models on finite periodic lattices. Two different coarse-graining strategies, one based on the tensor renormalization group and the other based on the higher-order tensor renormalization group, are introduced. In order to optimize the tensor-network model globally, a sweeping scheme is proposed to account for the renormalization effect from the environment tensors under the framework of second renormalization group. We demonstrate the algorithms by the classical Ising model on the square lattice and the Kitaev model on the honeycomb lattice, and show that the finite-size algorithms achieve substantially more accurate results than the corresponding infinite-size ones.

preprint2015arXiv

Transductive Multi-class and Multi-label Zero-shot Learning

Recently, zero-shot learning (ZSL) has received increasing interest. The key idea underpinning existing ZSL approaches is to exploit knowledge transfer via an intermediate-level semantic representation that is assumed to be shared between the auxiliary and target datasets and is used to bridge these domains. The semantic representations used in existing approaches range from visual attributes to semantic word vectors and semantic relatedness. However, the overall pipeline is similar: a projection mapping low-level features to the semantic representation is learned from the auxiliary dataset by either classification or regression models, and is applied directly to map each instance into the same semantic space, where a zero-shot classifier recognises the unseen target class instances using a single known 'prototype' of each target class. In this paper we discuss two related lines of work that improve on the conventional approach: exploiting transductive learning for ZSL, and generalising ZSL to the multi-label case.
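
The conventional pipeline described above can be sketched as follows, assuming ridge regression for the feature-to-semantic projection and cosine similarity for nearest-prototype matching. All data here is synthetic and hypothetical: class "word vectors" are random (seen) or one-hot (unseen), and features are a noisy linear image of them.

```python
import numpy as np

def fit_projection(X, S, lam=1e-3):
    """Ridge regression W minimising ||X W - S||^2 + lam ||W||^2 (features -> semantic space)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ S)

def nearest_prototype(Z, prototypes):
    """Index of the most cosine-similar prototype for each projected row of Z."""
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    Pn = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return np.argmax(Zn @ Pn.T, axis=1)

# Synthetic demo (hypothetical data): features are a noisy linear image of class word vectors.
rng = np.random.default_rng(0)
d_sem, d_feat, n_per_class = 5, 20, 20
A = rng.normal(size=(d_sem, d_feat))          # hidden "semantic -> feature" map
seen_protos = rng.normal(size=(6, d_sem))     # word vectors of seen (auxiliary) classes
unseen_protos = np.eye(d_sem)[:3]             # word vectors of unseen (target) classes

def sample(protos):
    labels = np.repeat(np.arange(len(protos)), n_per_class)
    X = protos[labels] @ A + 0.01 * rng.normal(size=(len(labels), d_feat))
    return X, labels

X_seen, y_seen = sample(seen_protos)
W = fit_projection(X_seen, seen_protos[y_seen])   # learned on seen classes only
X_unseen, y_unseen = sample(unseen_protos)
pred = nearest_prototype(X_unseen @ W, unseen_protos)
accuracy = np.mean(pred == y_unseen)
print(f"unseen-class accuracy: {accuracy:.2f}")
```

Because the synthetic features of seen and unseen classes are generated by the same linear map, this toy setup has no projection domain shift; with real auxiliary and target datasets the learned projection can be biased on the target classes.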

preprint2015arXiv

Transductive Multi-label Zero-shot Learning

Zero-shot learning has received increasing interest as a means to alleviate the often prohibitive expense of annotating training data for large scale recognition problems. These methods have achieved great success via learning intermediate semantic representations in the form of attributes and more recently, semantic word vectors. However, they have thus far been constrained to the single-label case, in contrast to the growing popularity and importance of more realistic multi-label data. In this paper, for the first time, we investigate and formalise a general framework for multi-label zero-shot learning, addressing the unique challenge therein: how to exploit multi-label correlation at test time with no training data for those classes? In particular, we propose (1) a multi-output deep regression model to project an image into a semantic word space, which explicitly exploits the correlations in the intermediate semantic layer of word vectors; (2) a novel zero-shot learning algorithm for multi-label data that exploits the unique compositionality property of semantic word vector representations; and (3) a transductive learning strategy to enable the regression model learned from seen classes to generalise well to unseen classes. Our zero-shot learning experiments on a number of standard multi-label datasets demonstrate that our method outperforms a variety of baselines.
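
One simple way to exploit the compositionality of word vectors for multi-label prediction is to score a candidate label set by the cosine similarity between an image's projected semantic vector and the sum of the word vectors of the labels in the set. The sketch below does this by brute-force enumeration of small label sets; it is an illustrative assumption, not necessarily the algorithm proposed in the paper, and `best_label_set` and `max_labels` are hypothetical names.

```python
from itertools import combinations
import numpy as np

def best_label_set(z, word_vecs, max_labels=2):
    """Return the label subset whose summed word vector is most cosine-similar to z.

    z: projected semantic vector of one image; word_vecs: (n_labels, dim) array.
    Assumes no subset of word vectors sums to the zero vector.
    """
    word_vecs = np.asarray(word_vecs, dtype=float)
    zn = z / np.linalg.norm(z)
    best, best_score = None, -np.inf
    for k in range(1, max_labels + 1):
        for subset in combinations(range(len(word_vecs)), k):
            v = word_vecs[list(subset)].sum(axis=0)   # compositional label-set vector
            score = float(zn @ v / np.linalg.norm(v))
            if score > best_score:
                best, best_score = subset, score
    return best, best_score

# Toy usage with hypothetical orthogonal word vectors for four unseen labels.
vecs = np.eye(4)
labels, score = best_label_set(vecs[1] + vecs[3], vecs)
print(labels, score)
```

Here `labels` is `(1, 3)`. Enumerating all subsets grows combinatorially with the number of labels, so `max_labels` caps the set size in this sketch.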

preprint2015arXiv

Transductive Multi-view Zero-Shot Learning

Most existing zero-shot learning approaches exploit transfer learning via an intermediate-level semantic representation shared between an annotated auxiliary dataset and a target dataset with different classes and no annotation. A projection from a low-level feature space to the semantic representation space is learned from the auxiliary dataset and is applied without adaptation to the target dataset. In this paper we identify two inherent limitations with these approaches. First, due to having disjoint and potentially unrelated classes, the projection functions learned from the auxiliary dataset/domain are biased when applied directly to the target dataset/domain. We call this problem the projection domain shift problem and propose a novel framework, transductive multi-view embedding, to solve it. The second limitation is the prototype sparsity problem which refers to the fact that for each target class, only a single prototype is available for zero-shot learning given a semantic representation. To overcome this problem, a novel heterogeneous multi-view hypergraph label propagation method is formulated for zero-shot learning in the transductive embedding space. It effectively exploits the complementary information offered by different semantic representations and takes advantage of the manifold structures of multiple representation spaces in a coherent manner. We demonstrate through extensive experiments that the proposed approach (1) rectifies the projection shift between the auxiliary and target domains, (2) exploits the complementarity of multiple semantic representations, (3) significantly outperforms existing methods for both zero-shot and N-shot recognition on three image and video benchmark datasets, and (4) enables novel cross-view annotation tasks.

preprint2015arXiv

Weakly Supervised Learning of Objects, Attributes and their Associations

When humans describe images they tend to use combinations of nouns and adjectives, corresponding to objects and their associated attributes respectively. To generate such a description automatically, one needs to model objects, attributes and their associations. Conventional methods require strong annotation of object and attribute locations, making them less scalable. In this paper, we model object-attribute associations from weakly labelled images, such as those widely available on media sharing sites (e.g. Flickr), where only image-level labels (either objects or attributes) are given, without their locations and associations. This is achieved by introducing a novel weakly supervised non-parametric Bayesian model. Once learned, given a new image, our model can describe the image, including objects, attributes and their associations, as well as their locations and segmentation. Extensive experiments on benchmark datasets demonstrate that our weakly supervised model performs on par with strongly supervised models on tasks such as image description and retrieval based on object-attribute associations.

preprint2014arXiv

Iron based high transition temperature superconductors

In a superconductor, electrons form pairs and electric transport becomes dissipationless at low temperatures. The recently discovered iron-based superconductors have the highest superconducting transition temperatures after the copper oxides. In this article, we review the material aspects and physical properties of iron-based superconductors. We discuss the dependence of the transition temperature on the crystal structure, the interplay between antiferromagnetism and superconductivity as revealed by neutron scattering experiments, and the electronic properties of these compounds obtained by angle-resolved photoemission spectroscopy, in connection with some results from scanning tunneling microscopy/spectroscopy measurements. A possible microscopic model for this class of compounds is discussed from a strong-coupling point of view.

preprint2014arXiv

Long-time dynamics of quantum chains: transfer-matrix renormalization group and entanglement of the maximal eigenvector

By using a different quantum-to-classical mapping from the Trotter-Suzuki decomposition, we identify the entanglement structure of the maximal eigenvectors of the associated quantum transfer matrix. This observation provides deeper insight into the linear growth of the entanglement entropy in time evolution with conventional methods. Based on this observation, we propose a general method for arbitrary temperatures using the biorthonormal transfer-matrix renormalization group. Our method achieves competitive accuracy at a much lower computational cost than two recently proposed methods for long-time dynamics, one based on a folding algorithm [Phys. Rev. Lett. 102, 240603 (2009)] and the other on a modified time-dependent density-matrix renormalization group [Phys. Rev. Lett. 108, 227206 (2012)].

preprint2013arXiv

Can deeply underdoped superconducting cuprates be topological superconductors?

The nodal $d_{x^2-y^2}$ superconducting gap is a hallmark of the cuprate high-T$_c$ superconductors. Surprisingly, recent angle-resolved photoemission spectroscopy of deeply underdoped cuprates revealed a nodeless energy gap that is tied to the Fermi surface. Importantly, this phenomenon is observed for compounds across several different cuprate families. In this letter we propose an exciting possibility, namely that the fully gapped state is a topological superconductor.

preprint2013arXiv

Comparing Tensor Renormalization Group and Monte Carlo calculations for spin and gauge models

We show that the Tensor Renormalization Group (TRG) method can be applied to O(N) spin models, principal chiral models and pure gauge theories (Z2, U(1) and SU(2)) on (hyper)cubic lattices. We explain that, contrary to a common belief, it is very difficult to write compact formulas expressing the block spinning of lattice models. We show that, in contrast to other approaches, the TRG formulation allows us to write exact blocking formulas with numerically controllable truncations. The basic reason is that TRG blocking neatly separates the degrees of freedom inside the block, which are integrated over, from those kept to communicate with the neighboring blocks. We argue that the TRG can handle large volumes, which is crucial for approaching quasi-conformal systems. The method can also avoid some sign problems. We discuss recent results regarding the critical properties of the 2D O(2) nonlinear sigma model with complex beta and chemical potential. As some of these results appeared in a recently published paper (PRD 88, 056005) and two recent preprints (arXiv:1309.4963 and arXiv:1309.6623), these proceedings instead emphasize the conceptual aspects of our ongoing effort.

preprint2013arXiv

Investigation of atomic and electronic structures of MgOFeSe studied by the first-principles calculations

To assist the search for new superconductors among iron selenide materials by intercalation, we calculate the crystal and electronic structures of MgOFeSe using first-principles density functional theory. MgOFeSe is isostructural to LaOFeAs, the parent compound of the iron pnictide superconductors. In LaOFeAs, the O$^{2-}$ anion is located at the center of each LaO tetrahedron. For MgOFeSe, however, we find that the crystal structure with the Mg$^{2+}$ cation at the tetrahedral center in the MgO layer is energetically more stable. The low-energy bands of MgOFeSe near the Fermi level derive mainly from Fe 3$d$ orbitals. The ground state of MgOFeSe has collinear antiferromagnetic order. The height of the Se atoms above the Fe-Fe layer is about 1.38 Å, close to the height of As above the Fe-Fe layer in the iron pnictide superconductors with optimal superconducting transition temperatures.

preprint2013arXiv

Prediction of phonon-mediated high temperature superconductivity in stoichiometric Li$_2$B$_3$C

The discovery of superconductivity in magnesium diboride (MgB$_2$) has stimulated great interest in the search for new superconductors with similar lattice structures. Unlike the cuprate or iron-based superconductors, MgB$_2$ is indisputably a phonon-mediated high-temperature superconductor. Its high-temperature superconductivity results from the strong coupling between the boron $σ$-bonding electrons around the Fermi level and the bond-stretching optical phonon modes. Here we show, based on first-principles calculations, that Li$_2$B$_3$C is a good candidate superconductor of this kind, whose superconducting transition temperature (T$_c$) might be even higher than that of MgB$_2$. Li$_2$B$_3$C consists of alternating graphene-like boron-carbon layers and boron-boron layers with lithium atoms intercalated between them. Similar to MgB$_2$, Li$_2$B$_3$C is inherently metallic and possesses two $σ$- and two $π$-electron bands around the Fermi energy. The superconducting pairs are glued predominantly by the strong interaction between the boron $σ$-bonding electrons and various optical phonon modes.

preprint2012arXiv

(π,0) antiferromagnetic spin excitations in superconducting Rb0.82Fe1.68Se2

We use inelastic neutron scattering to show that superconducting (SC) rubidium iron selenide Rb0.82Fe1.68Se2 exhibits antiferromagnetic (AF) spin excitations near the in-plane wave vector Q = (π, 0), identical to that for iron arsenide superconductors. Moreover, we find that these excitations change from incommensurate to commensurate with increasing energy, and occur at the expense of spin waves associated with the coexisting $\sqrt{5}\times\sqrt{5}$ block AF phase. Since angle-resolved photoemission experiments reveal no evidence for a hole-like Fermi surface at Γ(0, 0), our results suggest that the Q = (π, 0) excitations in SC Rb0.82Fe1.68Se2 come from localized moments and may have a similar origin to the hourglass-like spin excitations in copper oxide superconductors.

preprint2012arXiv

Atomic and electronic structures of FeSe monolayer and bilayer thin films on SrTiO$_3$ (001): a first-principles study

Using first-principles electronic structure calculations, we have studied the electronic structures of FeSe monolayer and bilayer thin films on SrTiO$_3$ (001) with SrO or TiO$_2$ termination. We find that both the FeSe monolayer and bilayer on either termination behave like slightly doped semiconductors with collinear antiferromagnetic order on the Fe ions. There is no substantial charge transfer between the FeSe layers and the substrate. FeSe adheres to the SrTiO$_3$ surface through a dipole-dipole interaction. The Fermi surface derives mainly from Fe 3$d$ orbitals. A valence band contributed mainly by the O 2$p$ orbitals in the TiO$_2$ layer lies slightly below the Fermi level and can become conducting upon slight hole doping.

preprint2012arXiv

Effect of Li-deficiency impurities on the electron-overdoped LiFeAs superconductor

We use transport, inelastic neutron scattering, and angle-resolved photoemission experiments to demonstrate that stoichiometric LiFeAs is an intrinsically electron-overdoped superconductor, similar to the electron-overdoped NaFe1-xTxAs and BaFe2-xTxAs2 (T = Co, Ni). Furthermore, we show that although the transport properties of stoichiometric superconducting LiFeAs and Li-deficient nonsuperconducting Li1-xFeAs differ, their electronic and magnetic properties are rather similar. Therefore, the nonsuperconducting Li1-xFeAs is also in the electron-overdoped regime, where small Li deficiencies near the FeAs octahedra can dramatically suppress superconductivity through impurity scattering.

preprint2012arXiv

Efficient simulation of infinite tree tensor network states on the Bethe lattice

We show that the simple update approach proposed by Jiang et al. [H.C. Jiang, Z.Y. Weng, and T. Xiang, Phys. Rev. Lett. 101, 090603 (2008)] is an efficient and accurate method for determining infinite tree tensor network states on the Bethe lattice. Ground-state properties of the quantum transverse Ising model and the Heisenberg XXZ model on the Bethe lattice are studied. The transverse Ising model is found to undergo a second-order quantum phase transition with a diverging magnetic susceptibility but a finite correlation length, which is upper-bounded by 1/ln(q-1) even at the transition point (q is the coordination number of the Bethe lattice). An intuitive explanation of this peculiar "critical" phenomenon is given. The XXZ model on the Bethe lattice undergoes a first-order quantum phase transition at the isotropic point. Furthermore, the simple update scheme is found to be related to the Bethe approximation. Finally, by applying the simple update to various tree tensor clusters, we obtain good and scalable approximations for two-dimensional lattices.

preprint2012arXiv

Proximity effect at superconducting Sn-Bi2Se3 interface

We have investigated the conductance spectra of Sn-Bi2Se3 interface junctions down to 250 mK and in different magnetic fields. A number of conductance anomalies were observed below the superconducting transition temperature of Sn, including a small gap different from that of Sn and a zero-bias conductance peak that grows at lower temperatures. We discuss the possible origins of the smaller gap and the zero-bias conductance peak. These phenomena support the formation of a proximity-effect-induced chiral superconducting phase at the interface between the superconducting Sn and the strong spin-orbit coupling material Bi2Se3.

preprint2011arXiv

Continuous quantum phase transition between two topologically distinct valence bond solid states associated with the same spin value

We propose a simple one-dimensional spin-2 Hamiltonian that exhibits two topologically distinct valence bond solid states in different exactly solvable limits. We then construct the phase diagram and study the quantum phase transition between these states using the infinite time-evolving block decimation (iTEBD) algorithm. From the scaling relation between the entanglement entropy and the correlation length, we find that the central charge of the underlying critical conformal field theory is $c=2$.
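
The standard form of the scaling relation mentioned above, for an infinite chain cut into two halves and approximated by a matrix product state at criticality, is S = (c/6) ln ξ + const, with S the half-chain entanglement entropy and ξ the correlation length; it is assumed here to be the relation meant. A minimal least-squares fit of c, using hypothetical (ξ, S) values generated from c = 2 rather than real simulation output:

```python
import numpy as np

def fit_central_charge(xi, S):
    """Fit S = (c/6) ln(xi) + const by least squares; returns (c, const)."""
    slope, intercept = np.polyfit(np.log(xi), S, 1)
    return 6.0 * slope, intercept

# Hypothetical input: correlation lengths and half-chain entropies, e.g. from runs with
# increasing bond dimension at the critical point. Here they are synthetic, built with c = 2.
xi = np.array([10.0, 20.0, 40.0, 80.0])
S = (2.0 / 6.0) * np.log(xi) + 0.5
c, const = fit_central_charge(xi, S)
print(f"c = {c:.3f}")
```

With real data the points only approximately follow a straight line in ln ξ, so the fitted slope carries an uncertainty that depends on how many bond dimensions are sampled.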

preprint2011arXiv

Electronic structures and magnetic orders of Fe-vacancies ordered ternary iron selenides TlFe$_{1.5}$Se$_2$ and AFe$_{1.5}$Se$_2$ (A=K, Rb, or Cs)

Using first-principles electronic structure calculations, we find that the ground state of the Fe-vacancy-ordered TlFe$_{1.5}$Se$_2$ is a quasi-two-dimensional collinear antiferromagnetic semiconductor with an energy gap of 94 meV, in agreement with experimental measurements. This antiferromagnetic order is driven by Se-bridged antiferromagnetic superexchange interactions between Fe moments. Similarly, we find that AFe$_{1.5}$Se$_2$ (A=K, Rb, or Cs) crystals are also antiferromagnetic semiconductors, but with a zero-gap semiconducting or semimetallic state nearly degenerate with the ground state. Rich physical properties and phase diagrams are therefore expected.

preprint2011arXiv

First-principles study of pressure-induced magnetic phase transitions in ternary iron selenide K$_{0.8}$Fe$_{1.6}$Se$_2$

We have studied the pressure effect on the electronic structures and magnetic orders of the ternary iron selenide K$_{0.8}$Fe$_{1.6}$Se$_2$ using first-principles electronic structure calculations. At low pressure, the compound is in the blocked checkerboard antiferromagnetic (AFM) semiconducting phase, as observed in neutron scattering measurements. Applying pressure induces two phase transitions: first from the blocked checkerboard AFM semiconducting phase to a collinear AFM metallic phase around 12 GPa, and then to a nonmagnetic metallic phase around 25 GPa. Our results help to clarify recent experimental measurements under pressure.

preprint2011arXiv

Magnetic Frustration and Iron-Vacancy Ordering in Iron-Chalcogenide

We show that the magnetic and vacancy orders in the 122 $(A_{1-y}Fe_{2-x}Se_2)$ iron-chalcogenides can be naturally derived from the $J_1-J_2-J_3$ model with $J_1$ being the ferromagnetic (FM) nearest neighbor exchange coupling and $J_{2}, J_3$ being the antiferromagnetic (AFM) next and third nearest neighbor ones respectively, previously proposed to describe the magnetism in the 11(FeTe/Se) systems. In the 11 systems, the magnetic exchange couplings are extremely frustrated in the ordered bi-collinear antiferromagnetic state so that the magnetic transition temperature is low. In the 122 systems, the formation of iron vacancy order reduces the magnetic frustration and significantly increases the magnetic transition temperature and the ordered magnetic moment. The pattern of the 245 iron-vacancy order ($\sqrt{5}\times \sqrt{5}$) observed in experiments is correlated to the maximum reduction of magnetic frustration. The nature of the iron-vacancy ordering may hence be electronically driven. We explore other possible vacancy patterns and magnetic orders associated with them. We also calculate the spin wave excitations and their novel features to test our model.
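
A quick way to see how the $J_1-J_2-J_3$ couplings select among magnetic patterns is to compare classical energies of collinear configurations. The sketch assumes unit classical spins of ±1, H = Σ J_ij s_i s_j with each bond counted once, periodic boundaries, and hypothetical coupling values; it ignores vacancies and quantum fluctuations, so it only illustrates the frustration argument and is not the paper's calculation.

```python
import numpy as np

def energy_per_site(spins, J1, J2, J3):
    """Classical energy per site of H = sum_<ij> J_ij s_i s_j on a periodic square lattice.

    spins: 2D array of +1/-1 (collinear unit spins); each bond is counted once.
    J1: nearest neighbours, J2: diagonal next-nearest neighbours,
    J3: third neighbours (two lattice steps along an axis). J > 0 is antiferromagnetic.
    """
    s = np.asarray(spins, dtype=float)

    def bond_sum(di, dj):
        # Sum over all sites of s(r) * s(r - (di, dj)), with periodic wrap-around.
        return np.sum(s * np.roll(s, (di, dj), axis=(0, 1)))

    e1 = bond_sum(1, 0) + bond_sum(0, 1)
    e2 = bond_sum(1, 1) + bond_sum(1, -1)
    e3 = bond_sum(2, 0) + bond_sum(0, 2)
    return (J1 * e1 + J2 * e2 + J3 * e3) / s.size

L = 8                                    # multiple of 4 so every pattern fits the torus
i, j = np.indices((L, L))
patterns = {
    "ferromagnetic": np.ones((L, L)),
    "Neel": (-1.0) ** (i + j),
    "collinear stripe": (-1.0) ** i,
    "bi-collinear": np.array([1.0, 1.0, -1.0, -1.0])[(i + j) % 4],  # ++-- along a diagonal
}
J1, J2, J3 = -1.0, 0.8, 0.9               # hypothetical couplings: J1 < 0 (FM), J2, J3 > 0 (AFM)
energies = {name: energy_per_site(s, J1, J2, J3) for name, s in patterns.items()}
for name, e in sorted(energies.items(), key=lambda kv: kv[1]):
    print(f"{name:18s} {e:+.2f}")
```

Per site, the four patterns give $2(J_1+J_2+J_3)$ (ferromagnetic), $-2J_1+2J_2+2J_3$ (Néel), $-2J_2+2J_3$ (collinear stripe) and $-2J_3$ (bi-collinear), so with the illustrative couplings above the bi-collinear pattern is lowest. Here "bi-collinear" is modelled as the ++-- double stripe along a diagonal, which aligns spins ferromagnetically along one diagonal and antiferromagnetically along the other.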

preprint2011arXiv

Neutron Scattering Studies of spin excitations in hole-doped Ba0.67K0.33Fe2As2 superconductor

We report inelastic neutron scattering experiments on single crystals of superconducting Ba0.67K0.33Fe2As2 (Tc = 38 K). In addition to confirming the resonance previously found in powder samples, we find that spin excitations in the normal state form longitudinally elongated ellipses along the QAFM direction in momentum space, consistent with density functional theory predictions. On cooling below Tc, while the resonance preserves its momentum anisotropy as expected, spin excitations at energies below the resonance become essentially isotropic in in-plane momentum space and their correlation length increases dramatically. These results suggest that the superconducting gap structures in Ba0.67K0.33Fe2As2 are more complicated than those suggested by angle-resolved photoemission experiments.

preprint2011arXiv

Ternary iron selenide K$_{0.8}$Fe$_{1.6}$Se$_2$ is an antiferromagnetic semiconductor

We have studied the electronic and magnetic structures of K$_{0.8+x}$Fe$_{1.6}$Se$_2$ by performing first-principles electronic structure calculations. The ground state of the Fe-vacancy-ordered K$_{0.8}$Fe$_{1.6}$Se$_2$ is found to be a quasi-two-dimensional blocked checkerboard antiferromagnetic (AFM) semiconductor with an energy gap of 594 meV and a large ordered magnetic moment of 3.37 $μ_B$ per Fe atom, in excellent agreement with the neutron scattering measurement. The underlying mechanism is a chemical-bonding-driven tetramer lattice distortion. K$_{0.8+x}$Fe$_{1.6}$Se$_2$ with finite $x$ is a doped AFM semiconductor with a low conducting carrier concentration that is approximately proportional to the excess potassium content, qualitatively consistent with the infrared observation. Our study reveals the importance of the interplay between antiferromagnetism and superconductivity in these materials and suggests that K$_{0.8}$Fe$_{1.6}$Se$_2$, instead of KFe$_2$Se$_2$, should be regarded as the parent compound from which superconductivity emerges upon electron or hole doping.

preprint2011arXiv

Topological term in the non-linear $σ$ model of the SO(5) spin chains

We show that there is a topological (Berry phase) term in the non-linear $σ$ model description of the SO(5) spin chain. It distinguishes the linear and projective representations of the SO(5) symmetry group, in exact analogy to the well-known $θ$-term of the SO(3) spin chain. The presence of the topological term is due to the fact that $π_2(\frac{SO(5)}{SO(3)\times SO(2)})= \mathbb{Z}$. We discuss the implication of our results on the spectra of the SO(5) spin chain, and connect it with a recent solvable SO(5) spin model which exhibits valence bond solid ground state and edge degeneracy.

preprint2010arXiv

Anisotropic Neutron Spin Resonance in Superconducting BaFe$_{1.9}$Ni$_{0.1}$As$_2$

We use polarized inelastic neutron scattering to show that the neutron spin resonance below $T_c$ in superconducting BaFe$_{1.9}$Ni$_{0.1}$As$_2$ ($T_c=20$ K) is purely magnetic in origin. Our analysis further reveals that the resonance peak near 7~meV only occurs for the planar response. This challenges the common perception that the spin resonance in the pnictides is an isotropic triplet excited state of the singlet Cooper pairs, as our results imply that only the $S_{001}=\pm1$ components of the triplet are involved.

preprint2010arXiv

Electronic and magnetic structures of ternary iron selenides AFe$_2$Se$_2$ (A=K, Cs, or Tl)

Using first-principles electronic structure calculations, we find that the ground state of the ternary iron selenides AFe$_2$Se$_2$ (A=K, Cs, or Tl) has bi-collinear antiferromagnetic order, in which the Fe local moments ($\sim2.8μ_B$) align ferromagnetically along one diagonal direction and antiferromagnetically along the other diagonal direction of the Fe-Fe square lattice. This bi-collinear antiferromagnetic order results from the interplay among the nearest, next-nearest, and next-next-nearest neighbor superexchange interactions mediated by Se $4p$ orbitals.

preprint2010arXiv

Electronic structures of ternary iron arsenides AFe$_2$As$_2$ (A=Ba, Ca, or Sr)

We have studied the electronic and magnetic structures of the ternary iron arsenides AFe$_2$As$_2$ (A = Ba, Ca, or Sr) using first-principles density functional theory. The ground states of these compounds have collinear antiferromagnetic order, resulting from the interplay between the nearest and next-nearest neighbor antiferromagnetic superexchange interactions bridged by As $4p$ orbitals. The correction to the band structure from the spin-orbit interaction is small. Pressure can dramatically reduce the magnetic moment and suppress the collinear antiferromagnetic order. Based on these calculations, we propose that the low-energy dynamics of these materials is effectively described by a $t-J_H-J_1-J_2$-type model.

preprint2010arXiv

Normal-State Hourglass Dispersion of the Spin Excitations in FeSe$_{x}$Te$_{1-x}$

We use cold neutron spectroscopy to study the low-energy spin excitations of superconducting (SC) FeSe$_{0.4}$Te$_{0.6}$ and essentially non-superconducting (NSC) FeSe$_{0.45}$Te$_{0.55}$. In contrast to BaFe$_{2-x}$(Co,Ni)$_{x}$As$_2$, where the low-energy spin excitations are commensurate in both the SC and normal states, the normal-state spin excitations in SC FeSe$_{0.4}$Te$_{0.6}$ are incommensurate and show an hourglass dispersion near the resonance energy. Since a similar hourglass dispersion is also found in NSC FeSe$_{0.45}$Te$_{0.55}$, we argue that the observed incommensurate spin excitations in FeSe$_{x}$Te$_{1-x}$ are not directly associated with superconductivity. Instead, the results can be understood within a Fermi-surface-nesting picture that assumes extremely low Fermi velocities and spin-orbital coupling.

preprint2009arXiv

Atomic and electronic structures of ternary iron arsenides $A$Fe$_2$As$_2$(001) surfaces ($A$=Ba, Sr, or Ca)

Using first-principles electronic structure calculations, we find that the energetically most favorable cleaved $A$Fe$_2$As$_2$(001) surface ($A$=Ba, Sr, or Ca) is $A$-terminated with a $(\sqrt{2}\times \sqrt{2})R45^{\circ}$ or $(1\times 2)$ order. The $(1\times 2)$ ordered structure yields a $(1\times 2)$ dimerized STM image, in agreement with experimental observation. The $A$ atoms are found to diffuse on the surface with a small energy barrier, so the cleaving process may destroy the ordering of the $A$ atoms. At very low temperatures this may result in an As-terminated surface with randomly distributed $A$ atoms. The As-terminated BaFe$_2$As$_2$ surface in the orthorhombic phase is $(\sqrt{2}\times\sqrt{2})R45^{\circ}$ buckled, giving rise to a switchable $(\sqrt{2}\times \sqrt{2})R45^{\circ}$ STM pattern upon varying the applied bias. No reconstruction is found for the other As-terminated surfaces. There are surface states crossing or near the Fermi energy in the As-terminated and $(1\times 2)$ $A$-terminated surfaces. A unified physical picture is thus established to help understand the cleaved $A$Fe$_2$As$_2$(001) surfaces.

preprint2009arXiv

Topologically distinct classes of valence bond solid states with their parent Hamiltonians

We introduce a general method to construct one-dimensional translationally invariant valence bond solid states with a built-in Lie group $G$ and derive their matrix product representations. General strategies for finding their parent Hamiltonians are provided, such that the valence bond solid states are their unique ground states. For quantum integer spin-$S$ chains, we discuss two topologically distinct classes of valence bond solid states: one consists of two virtual SU(2) spin-$J$ variables on each site, and the other is formed from two $SO(2S+1)$ spinors. Among them, a new spin-1 fermionic valence bond solid state, its parent Hamiltonian, and its properties are discussed in detail. Moreover, two types of valence bond solid states with SO(5) symmetry are further generalized and their respective properties analyzed.
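
The best-known member of the SU(2) class for spin 1 is the AKLT state, built from two virtual spin-1/2 variables per site. The sketch below writes its matrix product representation in one common gauge (an assumption; other gauges differ by a unitary on the virtual space), checks the normalisation condition, and obtains the correlation length from the transfer-matrix spectrum, whose eigenvalues are 1 and -1/3 (three times), giving ξ = 1/ln 3.

```python
import numpy as np

sp = np.array([[0.0, 1.0], [0.0, 0.0]])   # sigma^+ on the virtual spin-1/2 space
sm = sp.T                                  # sigma^-
sz = np.diag([1.0, -1.0])                  # sigma^z

# Spin-1 AKLT matrices for m = +1, 0, -1 (one common gauge choice).
A = [np.sqrt(2 / 3) * sp, -np.sqrt(1 / 3) * sz, -np.sqrt(2 / 3) * sm]

def transfer_matrix(mats):
    """E = sum_m A^m (x) conj(A^m), acting on the doubled virtual space."""
    return sum(np.kron(M, M.conj()) for M in mats)

# Normalisation check: sum_m A^m (A^m)^dagger should be the 2x2 identity.
norm = sum(M @ M.conj().T for M in A)

E = transfer_matrix(A)
eigs = np.sort(np.abs(np.linalg.eigvals(E)))[::-1]   # magnitudes, largest first
xi = -1.0 / np.log(eigs[1] / eigs[0])                # correlation length in lattice units
print(eigs, xi)
```

The same recipe (build the site matrices, form the transfer matrix, read off the ratio of its two largest eigenvalues) applies to other matrix product states, with larger virtual dimensions for higher-spin or higher-symmetry constructions.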

preprint2008arXiv

Quantum transfer matrix method for one-dimensional disordered electronic systems

We develop a novel quantum transfer matrix method to study thermodynamic properties of one-dimensional (1D) disordered electronic systems. It is shown that the partition function can be expressed as a product of $2\times2$ local transfer matrices. We demonstrate this method by applying it to the 1D disordered Anderson model. Thermodynamic quantities of this model are calculated and discussed.

preprint2005arXiv

Midgap States in Antiferromagnetic Heisenberg Chains with A Staggered Field

We study low-energy excitations in antiferromagnetic Heisenberg chains with a staggered field, which splits the spectrum into a longitudinal and a transverse branch. Bound states are found to exist inside the field-induced gap in both branches. They originate from edge effects and are inherent to spin-chain materials. The sine-Gordon scaling $h_s^{2/3}|\log h_s|^{1/6}$ ($h_s$: the staggered field) provides an accurate description of the gap and midgap energies in the transverse branch for $S=1/2$ and of the midgap energies in both branches for $S=3/2$ over a wide range of magnetic field; however, it can fit other low-energy excitations only at much lower fields. Moreover, the integer-spin $S=1$ chain displays scaling behavior that does not fit this scaling law. These results reveal intriguing features of magnetic excitations in spin-chain materials that deserve further investigation.


102 published items

From Ground to Sky: Architectures, Applications, and Challenges Shaping Low-Altitude Wireless Networks

Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks

Adaptive Fine-Grained Sketch-Based Image Retrieval

Chiral conformal field theory for topological states and the anyon eigenbasis on the torus

Deep Learning for Free-Hand Sketch: A Survey

Domain Generalization: A Survey

Doodle It Yourself: Class Incremental Learning by Drawing a Few Sketches

Dynamic Instance Domain Adaptation

FashionViL: Fashion-Focused Vision-and-Language Representation Learning

FS-COCO: Towards Understanding of Freehand Sketches of Common Objects in Context

Giant and Reversible Electronic Structure Evolution in a Magnetic Topological Material EuCd2As2

Magnetic Excitations in Strained Infinite-layer Nickelate PrNiO2

Negative Frames Matter in Egocentric Visual Query 2D Localization

One Sketch for All: One-Shot Personalized Sketch Segmentation

Partially Does It: Towards Scene-Level FG-SBIR with Partial Input

Proposal-Free Temporal Action Detection via Global Segmentation Mask Learning

Quasi-uniaxial pressure induced superconductivity in stoichiometric compound UTe$_2$

Semi-Supervised Temporal Action Detection with Proposal-Free Masking

Sketch3T: Test-Time Training for Zero-Shot SBIR

Sketching without Worrying: Noise-Tolerant Sketch-Based Image Retrieval

SOFT: Softmax-free Transformer with Linear Complexity

Style-Based Global Appearance Flow for Virtual Try-On

Towards artificial general intelligence via a multimodal foundation model

UIGR: Unified Interactive Garment Retrieval

Zero-Shot Temporal Action Detection via Vision-Language Prompting

Contrastive Prototype Learning with Augmented Embeddings for Few-Shot Learning

Deep Learning for Person Re-identification: A Survey and Outlook

Learning to Generate Novel Domains for Domain Generalization

Local Black-box Adversarial Attacks: A Query Efficient Approach

Magnetic field-tuned quantum criticality in optimally electron-doped cuprate thin films

Momentum-Resolved Visualization of Electronic Evolution in Doping a Mott Insulator

Resonating valence bond realization of spin-1 non-Abelian chiral spin liquid on the torus

Universal quantum transition from superconducting to insulating states in pressurized Bi2Sr2CaCu2O8+δ superconductors

Universal scaling of the critical temperature and the strange-metal scattering rate in unconventional superconductors

AdarGCN: Adaptive Aggregation GCN for Few-Shot Learning

AFeSe2 (A=Tl, K, Rb, or Cs): Iron-based superconducting analog of the cuprates

Automatic Differentiation for Second Renormalization of Tensor Networks

BézierSketch: A generative model for scalable vector sketches

Compressing deep neural networks by matrix product operators

Correlation between Fermi surface reconstruction and superconductivity in pressurized FeTe0.55Se0.45

Cross-Modal Hierarchical Modelling for Fine-Grained Sketch Based Image Retrieval

Deep Domain-Adversarial Image Generation for Domain Generalisation

Domain-Adaptive Few-Shot Learning

Egocentric Action Recognition by Video Attention and Temporal Context

Few-Shot Learning as Domain Adaptation: Algorithm and Analysis

Fine-Grained Instance-Level Sketch-Based Video Retrieval

Incremental Few-Shot Object Detection

Knowing What, Where and When to Look: Efficient Video Action Modeling with Attention

On Learning Semantic Representations for Million-Scale Free-Hand Sketches

Sketch Less for More: On-the-Fly Fine-Grained Sketch Based Image Retrieval

Tunable giant magnetoresistance in a single-molecule junction

Emergent superconductivity in single crystalline MgTi2O4 films via structural engineering

Evidence for an Additional Symmetry Breaking from Direct Observation of Band Splitting in the Nematic State of FeSe Superconductor

Mott phase in a van der Waals transition-metal halide at single layer limit

Non-Volatile Superconductivity in an Insulating Copper Oxide Induced via Ionic Liquid Gating

Selective Hybridization between Main Band and Superstructure Band in Bi2Sr2CaCu2O8+δ Superconductor

Strong coupling superconductivity in trilayer film LiB2C2

A conducting nano-filament (CNF) network as a precursor to the origin of superconductivity in electron-doped copper oxides

Deep Transfer Learning for Person Re-identification

Highly Efficient Regression for Scalable Person Re-Identification

Learning a Discriminative Null Space for Person Re-identification

Self-consistent spin-wave analysis of the 1/3 magnetization plateau in the kagome antiferromagnet

Semantic Regularisation for Recurrent Image Annotation

A close look at antiferromagnetism in multidimensional phase diagram of electron-doped copper oxide

Evolution of electronic states in n-type copper oxide superconductor via electric double layer gating

First-principles study of FeSe epitaxial films on SrTiO3

Ground State Degeneracy of Interacting Spinless Fermions

Nematic antiferromagnetic states in bulk FeSe

Robust Subjective Visual Property Prediction from Crowdsourced Pairwise Labels

Semantic Graph for Zero-Shot Learning

Sketch-a-Net that Beats Humans

Tensor network algorithm by coarse-graining tensor renormalization on finite periodic lattices

Transductive Multi-class and Multi-label Zero-shot Learning

Transductive Multi-label Zero-shot Learning