Source author record

Jianfeng Wang

Jianfeng Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision; math.CO; cond-mat.mtrl-sci; Computation and Language; Machine Learning; cond-mat.mes-hall; math.AP; Applications; astro-ph.IM; cond-mat.str-el; Distributed, Parallel, and Cluster Computing; eess.SP; Information Theory; math.IT; Networking and Internet Architecture; physics.optics

Catalog footprint

What is connected

36 works

16 topics

4 close collaborators

preprint · 2024 · arXiv

Bring Metric Functions into Diffusion Models

We introduce a Cascaded Diffusion Model (Cas-DM) that improves a Denoising Diffusion Probabilistic Model (DDPM) by effectively incorporating additional metric functions in training. Metric functions such as the LPIPS loss have been proven highly effective in consistency models derived from score matching. However, for the diffusion counterparts, the methodology and efficacy of adding extra metric functions remain unclear. One major challenge is the mismatch between the noise predicted by a DDPM at each step and the desired clean image that the metric function works well on. To address this problem, we propose Cas-DM, a network architecture that cascades two network modules to effectively apply metric functions to the diffusion model training. The first module, similar to a standard DDPM, learns to predict the added noise and is unaffected by the metric function. The second cascaded module learns to predict the clean image, thereby facilitating the metric function computation. Experimental results show that the proposed diffusion model backbone enables the effective use of the LPIPS loss, leading to state-of-the-art image quality (FID, sFID, IS) on various established benchmarks.
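
The mismatch between predicted noise and the clean image can be made concrete with the standard DDPM forward-process identity, x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε. This is general DDPM background, not a description of the Cas-DM architecture. The sketch below is a minimal illustration under that assumption; the function names and the toy three-value "image" are hypothetical.

```python
import math
from typing import List


def add_noise(x0: List[float], eps: List[float], alpha_bar: float) -> List[float]:
    """Standard DDPM forward step: mix the clean signal with noise."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]


def clean_from_noise(x_t: List[float], eps_pred: List[float], alpha_bar: float) -> List[float]:
    """Invert the forward step to estimate the clean signal from predicted noise."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - b * e) / a for x, e in zip(x_t, eps_pred)]


# Toy data (hypothetical): a three-value "image" and a fixed noise draw.
x0 = [0.5, -0.25, 1.0]
eps = [0.3, -1.2, 0.7]
alpha_bar = 0.64

x_t = add_noise(x0, eps, alpha_bar)
# With a perfect noise prediction, the clean signal is recovered exactly.
x0_hat = clean_from_noise(x_t, eps, alpha_bar)
```

A perceptual metric would be evaluated on an estimate such as `x0_hat` rather than on the noise itself. Per the abstract, Cas-DM has a second cascaded module predict the clean image for this purpose.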

preprint · 2024 · arXiv

COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

In the evolution of Vision-Language Pre-training, shifting from short-text comprehension to encompassing extended textual contexts is pivotal. Recent autoregressive vision-language models such as Flamingo and PaLM-E, leveraging the long-context capability of Large Language Models, have excelled in few-shot text generation tasks but face challenges in alignment tasks. Addressing this gap, we introduce the contrastive loss into text generation models, presenting the COntrastive-Streamlined MultimOdal framework (COSMO), strategically partitioning the language model into dedicated unimodal text processing and adept multimodal data handling components. COSMO, our unified framework, merges unimodal and multimodal elements, enhancing model performance for tasks involving textual and visual data while notably reducing learnable parameters. However, these models demand extensive long-text datasets, yet the availability of high-quality long-text video datasets remains limited. To bridge this gap, this work introduces an inaugural interleaved video-text dataset featuring comprehensive captions, marking a significant step forward. Demonstrating its impact, we illustrate how this dataset enhances model performance in image-text tasks. With 34% learnable parameters and utilizing 72% of the available data, our model demonstrates significant superiority over OpenFlamingo. For instance, in the 4-shot flickr captioning task, performance notably improves from 57.2% to 65.%. The contributions of COSMO and the video-text dataset are underscored by notable performance gains across 14 diverse downstream datasets encompassing both image-text and video-text tasks.

preprint · 2022 · arXiv

A complete characterization of graphs with exactly two positive eigenvalues

In 1977 Smith characterized graphs with exactly one positive eigenvalue. Since then, many particular results related to graphs with exactly two positive eigenvalues have emerged. In this paper we conclude this investigation by giving a full characterization of these graphs.

preprint · 2022 · arXiv

An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA

Knowledge-based visual question answering (VQA) involves answering questions that require external knowledge not present in the image. Existing methods first retrieve knowledge from external resources, then reason over the selected knowledge, the input image, and question for answer prediction. However, this two-step approach could lead to mismatches that potentially limit the VQA performance. For example, the retrieved knowledge might be noisy and irrelevant to the question, and the re-embedded knowledge features during reasoning might deviate from their original meanings in the knowledge base (KB). To address this challenge, we propose PICa, a simple yet effective method that Prompts GPT3 via the use of Image Captions, for knowledge-based VQA. Inspired by GPT-3's power in knowledge retrieval and question answering, instead of using structured KBs as in previous work, we treat GPT-3 as an implicit and unstructured KB that can jointly acquire and process relevant knowledge. Specifically, we first convert the image into captions (or tags) that GPT-3 can understand, then adapt GPT-3 to solve the VQA task in a few-shot manner by just providing a few in-context VQA examples. We further boost performance by carefully investigating: (i) what text formats best describe the image content, and (ii) how in-context examples can be better selected and used. PICa unlocks the first use of GPT-3 for multimodal tasks. By using only 16 examples, PICa surpasses the supervised state of the art by an absolute +8.6 points on the OK-VQA dataset. We also benchmark PICa on VQAv2, where PICa also shows a decent few-shot performance.
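
The captions-plus-examples idea can be sketched as a plain prompt builder. This is a minimal illustration, not the prompt used by PICa: the header sentence, the `Context:`/`Q:`/`A:` layout, the `===` separator, and the function name `build_prompt` are all assumptions made for the example.

```python
from typing import List, Tuple

# Each in-context example is (image caption, question, answer).
Example = Tuple[str, str, str]


def build_prompt(examples: List[Example], caption: str, question: str) -> str:
    """Assemble a few-shot VQA prompt from captioned in-context examples."""
    blocks = ["Please answer the question according to the context."]
    for ex_caption, ex_question, ex_answer in examples:
        blocks.append(f"Context: {ex_caption}\nQ: {ex_question}\nA: {ex_answer}")
    # The final block leaves the answer slot empty for the language model to fill.
    blocks.append(f"Context: {caption}\nQ: {question}\nA:")
    return "\n===\n".join(blocks)


# Hypothetical examples; real ones would come from a VQA training set.
shots = [
    ("A man rides a wave on a surfboard.", "What sport is this?", "surfing"),
    ("A red double-decker bus on a city street.", "Which country is known for these buses?", "england"),
]
prompt = build_prompt(
    shots,
    "A plate of sushi on a wooden table.",
    "Which country does this food come from?",
)
```

The resulting string would be sent to a text-completion model, and the completion after the final `A:` read as the answer. Which examples go into `shots`, and how the image is turned into text, are the two design choices the abstract says were investigated.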

preprint · 2022 · arXiv

An Empirical Study of Training End-to-End Vision-and-Language Transformers

Vision-and-language (VL) pre-training has proven to be highly effective on various VL downstream tasks. While recent work has shown that fully transformer-based VL models can be more efficient than previous region-feature-based methods, their performance on downstream tasks often degrades significantly. In this paper, we present METER, a Multimodal End-to-end TransformER framework, through which we investigate how to design and pre-train a fully transformer-based VL model in an end-to-end manner. Specifically, we dissect the model designs along multiple dimensions: vision encoders (e.g., CLIP-ViT, Swin transformer), text encoders (e.g., RoBERTa, DeBERTa), multimodal fusion module (e.g., merged attention vs. co-attention), architectural design (e.g., encoder-only vs. encoder-decoder), and pre-training objectives (e.g., masked image modeling). We conduct comprehensive experiments and provide insights on how to train a performant VL transformer. METER achieves an accuracy of 77.64% on the VQAv2 test-std set using only 4M images for pre-training, surpassing the state-of-the-art region-feature-based model by 1.04%, and outperforming the previous best fully transformer-based model by 1.6%. Notably, when further scaled up, our best VQA model achieves an accuracy of 80.54%. Code and pre-trained models are released at https://github.com/zdou0830/METER.

preprint · 2022 · arXiv

Exploiting dynamic nonlinearity in upconversion nanoparticles for super-resolution imaging

Single-beam super-resolution microscopy, also known as superlinear microscopy, exploits the nonlinear response of fluorescent probes in confocal microscopy. The technique requires no complex purpose-built system, light field modulation, or beam shaping. Here, we present a strategy to enhance spatial resolution of superlinear microscopy by modulating excitation intensity during image acquisition. This modulation induces dynamic optical nonlinearity in upconversion nanoparticles (UCNPs), resulting in variations of higher spatial-frequency information in the obtained images. The high-order information can be extracted with a proposed weighted finite difference imaging algorithm from raw fluorescence images, to generate an image with a higher resolution than superlinear microscopy images. We apply this approach to resolve two adjacent nanoparticles within a diffraction-limited area, improving the resolution to 130 nm. This work suggests a new scope for developing dynamic nonlinear fluorescent probes in super-resolution nanoscopy.

preprint · 2022 · arXiv

Inertia and spectral symmetry of eccentricity matrices of some clique trees

The eccentricity matrix $\mathcal E(G)$ of a connected graph $G$ is obtained from the distance matrix of $G$ by leaving unchanged the largest nonzero entries in each row and each column, and replacing the remaining ones with zeros. In this paper, we consider the set $\mathcal C \mathcal T$ of clique trees whose blocks have at most two cut-vertices of the clique tree. After proving the irreducibility of the eccentricity matrix of a clique tree in $\mathcal C \mathcal T$ and finding its inertia indices, we show that every graph in $\mathcal C \mathcal T$ with more than $4$ vertices and odd diameter has two positive and two negative $\mathcal E$-eigenvalues. Positive $\mathcal E$-eigenvalues and negative $\mathcal E$-eigenvalues turn out to be equal in number even for graphs in $\mathcal C \mathcal T$ with even diameter; that shared cardinality also counts the 'diametrally distinguished' vertices. Finally, we prove that the spectrum of the eccentricity matrix of a clique tree $G$ in $\mathcal C \mathcal T$ is symmetric with respect to the origin if and only if $G$ has an odd diameter and exactly two adjacent central vertices.
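
The definition in the first sentence can be made concrete with a small sketch. It uses the usual equivalent formulation: the entry $d(u,v)$ is kept exactly when it equals $\min(e(u), e(v))$, where $e(\cdot)$ is the vertex eccentricity. The function names and the example graph (a path on four vertices) are chosen here for illustration only.

```python
from collections import deque
from typing import Dict, List


def bfs_distances(adj: Dict[int, List[int]], source: int) -> Dict[int, int]:
    """Shortest-path distances from source in an unweighted connected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def eccentricity_matrix(adj: Dict[int, List[int]]) -> List[List[int]]:
    """Keep d(u, v) when it equals min(e(u), e(v)); zero out every other entry."""
    nodes = sorted(adj)
    dist = {u: bfs_distances(adj, u) for u in nodes}
    ecc = {u: max(dist[u].values()) for u in nodes}
    return [
        [dist[u][v] if dist[u][v] == min(ecc[u], ecc[v]) else 0 for v in nodes]
        for u in nodes
    ]


# Path graph on four vertices: 0 - 1 - 2 - 3
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
E = eccentricity_matrix(path4)
# E == [[0, 0, 2, 3],
#       [0, 0, 0, 2],
#       [2, 0, 0, 0],
#       [3, 2, 0, 0]]
```

In row 0 the entry 2 survives even though 3 is the row maximum, because 2 is the largest entry of column 2. This is the "each row and each column" part of the definition.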

preprint · 2022 · arXiv

Injecting Semantic Concepts into End-to-End Image Captioning

Tremendous progress has been made in recent years in developing better image captioning models, yet most of them rely on a separate object detector to extract regional features. Recent vision-language studies are shifting towards the detector-free trend by leveraging grid representations for more flexible model training and faster inference speed. However, such development is primarily focused on image understanding tasks, and remains less investigated for the caption generation task. In this paper, we are concerned with a better-performing detector-free image captioning model, and propose a pure vision transformer-based image captioning model, dubbed ViTCAP, in which grid representations are used without extracting the regional features. For improved performance, we introduce a novel Concept Token Network (CTN) to predict the semantic concepts and then incorporate them into the end-to-end captioning. In particular, the CTN is built on a vision transformer and is designed to predict the concept tokens through a classification task; the rich semantic information they contain greatly benefits the captioning task. Compared with the previous detector-based models, ViTCAP drastically simplifies the architectures and at the same time achieves competitive performance on various challenging image captioning datasets. In particular, ViTCAP reaches a CIDEr score of 138.1 on the COCO-caption Karpathy split, and 93.8 and 108.6 on the nocaps and Google-CC captioning datasets, respectively.

preprint · 2022 · arXiv

NP-Match: When Neural Processes meet Semi-Supervised Learning

Semi-supervised learning (SSL) has been widely explored in recent years, and it is an effective way of leveraging unlabeled data to reduce the reliance on labeled data. In this work, we adjust neural processes (NPs) to the semi-supervised image classification task, resulting in a new method named NP-Match. NP-Match is suited to this task for two reasons. Firstly, NP-Match implicitly compares data points when making predictions, and as a result, the prediction of each unlabeled data point is affected by the labeled data points that are similar to it, which improves the quality of pseudo-labels. Secondly, NP-Match is able to estimate uncertainty that can be used as a tool for selecting unlabeled samples with reliable pseudo-labels. Compared with uncertainty-based SSL methods implemented with Monte Carlo (MC) dropout, NP-Match estimates uncertainty with much less computational overhead, which can save time at both the training and the testing phases. We conducted extensive experiments on four public datasets, and NP-Match outperforms state-of-the-art (SOTA) results or achieves competitive results on them, which shows the effectiveness of NP-Match and its potential for SSL.

preprint · 2022 · arXiv

NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis

In this paper, we present NUWA-Infinity, a generative model for infinite visual synthesis, which is defined as the task of generating arbitrarily-sized high-resolution images or long-duration videos. An autoregressive over autoregressive generation mechanism is proposed to deal with this variable-size generation task, where a global patch-level autoregressive model considers the dependencies between patches, and a local token-level autoregressive model considers dependencies between visual tokens within each patch. A Nearby Context Pool (NCP) is introduced to cache related patches already generated as the context for the current patch being generated, which can significantly save computation costs without sacrificing patch-level dependency modeling. An Arbitrary Direction Controller (ADC) is used to decide suitable generation orders for different visual synthesis tasks and learn order-aware positional embeddings. Compared to DALL-E, Imagen and Parti, NUWA-Infinity can generate high-resolution images with arbitrary sizes and additionally supports long-duration video generation. Compared to NUWA, which also covers images and videos, NUWA-Infinity has superior visual synthesis capabilities in terms of resolution and variable-size generation. The GitHub link is https://github.com/microsoft/NUWA. The homepage link is https://nuwa-infinity.microsoft.com.

preprint · 2022 · arXiv

On joins of a clique and a co-clique as star complements in regular graphs

In this paper we consider $r$-regular graphs $G$ that admit a vertex set partition such that one of the induced subgraphs is the join of an $s$-vertex clique and a $t$-vertex co-clique and represents a star complement for an eigenvalue $μ$ of $G$. The cases in which one of the parameters $s, t$ is less than 2 or $μ=r$ are already resolved. It is conjectured in [J. Wang, X. Yuan, L. Liu, Regular graphs with a prescribed complete multipartite graph as a star complement, Linear Algebra Appl. 579 (2019) 302-319] that if $s, t\geq 2$ and $μ\neq r$, then $μ=-2, t=2$ and $G=\overline{(s+1)K_2}$. For $μ=-t$ we verify this conjecture to be true. We further study the case in which $μ\neq-t$ and confirm the conjecture provided $t^2-4μ^2t-4μ^3=0$. For the remaining possibility we determine the structure of a putative counterexample and relate its existence to the existence of a particular 2-class block design. It turns out that the smallest counterexample would have 1265 vertices.

preprint · 2022 · arXiv

Rethinking Bayesian Deep Learning Methods for Semi-Supervised Volumetric Medical Image Segmentation

Recently, several Bayesian deep learning methods have been proposed for semi-supervised medical image segmentation. Although they have achieved promising results on medical benchmarks, some problems remain. Firstly, their overall architectures are discriminative models, and hence, in the early stage of training, they use only labeled data, which might make them overfit to the labeled data. Secondly, they are only partially based on Bayesian deep learning, as their overall architectures are not designed under the Bayesian framework. However, unifying the overall architecture under the Bayesian perspective gives the architecture a rigorous theoretical basis, so that each part of it has a clear probabilistic interpretation. Therefore, to solve these problems, we propose a new generative Bayesian deep learning (GBDL) architecture. GBDL is a generative model, whose target is to estimate the joint distribution of input medical volumes and their corresponding labels. Estimating the joint distribution implicitly involves the distribution of data, so both labeled and unlabeled data can be utilized in the early stage of training, which alleviates the potential overfitting problem. Besides, GBDL is completely designed under the Bayesian framework, and thus we give its full Bayesian formulation, which lays a theoretical probabilistic foundation for our architecture. Extensive experiments show that our GBDL outperforms previous state-of-the-art methods in terms of four commonly used evaluation indicators on three public medical datasets.

preprint · 2022 · arXiv

Scaling Up Vision-Language Pre-training for Image Captioning

In recent years, we have witnessed a significant performance boost in the image captioning task based on vision-language pre-training (VLP). Scale is believed to be an important factor for this advance. However, most existing work only focuses on pre-training transformers with moderate sizes (e.g., 12 or 24 layers) on roughly 4 million images. In this paper, we present LEMON, a LargE-scale iMage captiONer, and provide the first empirical study on the scaling behavior of VLP for image captioning. We use the state-of-the-art VinVL model as our reference model, which consists of an image feature extractor and a transformer model, and scale the transformer both up and down, with model sizes ranging from 13 to 675 million parameters. In terms of data, we conduct experiments with up to 200 million image-text pairs which are automatically collected from the web based on the alt attribute of the image (dubbed ALT200M). Extensive analysis helps to characterize the performance trend as the model size and the pre-training data size increase. We also compare different training recipes, especially for training on large-scale noisy data. As a result, LEMON achieves new state-of-the-art results on several major image captioning benchmarks, including COCO Caption, nocaps, and Conceptual Captions. We also show LEMON can generate captions with long-tail visual concepts when used in a zero-shot manner.

preprint · 2022 · arXiv

Statistical learning for train delays and influence of winter climate and atmospheric icing

This study investigated the climate effect under consecutive winters on the arrival delay of high-speed passenger trains in northern Sweden. Novel statistical learning approaches, including an inhomogeneous Markov chain model and a stratified Cox model, were adopted to account for the time-varying risks of train delays. The inhomogeneous Markov chain modelling of the arrival delays used several covariates, including weather variables, train operational direction, and findings from the primary delay analysis through the stratified Cox model. The results showed that the weather variables, such as temperature, snow depth, and ice/snow precipitation, together with train operational direction, significantly impact the arrival delay. The performance of the fitted inhomogeneous Markov chain model was evaluated by the walk-forward validation method. The mean absolute error between the expected and observed rates of arrival delay, averaged over the train line, was 0.088, which implies that approximately 9% of trains may be misclassified as having arrival delays by the fitted model at a measuring point on the train line.

preprint · 2022 · arXiv

The Overlooked Classifier in Human-Object Interaction Recognition

Human-Object Interaction (HOI) recognition is challenging due to two factors: (1) significant imbalance across classes and (2) requiring multiple labels per image. This paper shows that these two challenges can be effectively addressed by improving the classifier with the backbone architecture untouched. Firstly, we encode the semantic correlation among classes into the classification head by initializing the weights with language embeddings of HOIs. As a result, the performance is boosted significantly, especially for the few-shot subset. Secondly, we propose a new loss named LSE-Sign to enhance multi-label learning on a long-tailed dataset. Our simple yet effective method enables detection-free HOI classification, outperforming state-of-the-art methods that require object detection and human pose by a clear margin. Moreover, we transfer the classification model to instance-level HOI detection by connecting it with an off-the-shelf object detector. We achieve state-of-the-art results without additional fine-tuning.

preprint · 2022 · arXiv

UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling

We propose UniTAB, which Unifies Text And Box outputs for grounded vision-language (VL) modeling. Grounded VL tasks such as grounded captioning require the model to generate a text description and align predicted words with object regions. To achieve this, models must generate desired text and box outputs together, and meanwhile indicate the alignments between words and boxes. In contrast to existing solutions that use multiple separate modules for different outputs, UniTAB represents both text and box outputs with a shared token sequence, and introduces a special <obj> token to naturally indicate word-box alignments in the sequence. UniTAB thus could provide a more comprehensive and interpretable image description, by freely grounding generated words to object regions. On grounded captioning, UniTAB presents a simpler solution with a single output head, and significantly outperforms the state of the art in both grounding and captioning evaluations. On general VL tasks that have different desired output formats (i.e., text, box, or their combination), UniTAB with a single network achieves performance better than or comparable to the task-specific state of the art. Experiments cover 7 VL benchmarks, including grounded captioning, visual grounding, image captioning, and visual question answering. Furthermore, UniTAB's unified multi-task network and the task-agnostic output sequence design make the model parameter efficient and generalizable to new tasks.
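
One way to picture a shared text-and-box token sequence is the serializer below. It is only a sketch under stated assumptions: the abstract confirms the `<obj>` token but not the layout, so the placement of `<obj>` around each grounded phrase, the `<bin_k>` coordinate tokens, the bin count of 200, and the function names are all hypothetical.

```python
from typing import List, Optional, Tuple

# Normalized (x1, y1, x2, y2) box with coordinates in [0, 1].
Box = Tuple[float, float, float, float]
# A phrase plus its box, or None when the phrase is not grounded.
Span = Tuple[str, Optional[Box]]


def quantize(value: float, num_bins: int = 200) -> str:
    """Map a normalized coordinate to a discrete coordinate token."""
    index = min(int(value * num_bins), num_bins - 1)
    return f"<bin_{index}>"


def serialize(spans: List[Span], num_bins: int = 200) -> List[str]:
    """Flatten words and boxes into one sequence; <obj> brackets grounded phrases."""
    tokens: List[str] = []
    for phrase, box in spans:
        words = phrase.split()
        if box is None:
            tokens.extend(words)
            continue
        tokens.append("<obj>")
        tokens.extend(words)
        tokens.extend(quantize(coord, num_bins) for coord in box)
        tokens.append("<obj>")
    return tokens


# Hypothetical grounded caption.
spans = [
    ("a man", (0.125, 0.25, 0.5, 1.0)),
    ("throws", None),
    ("a frisbee", (0.5, 0.25, 0.75, 0.5)),
]
tokens = serialize(spans)
```

Because words and coordinates live in one vocabulary, a single decoder head can emit captions, boxes, or both. This matches the single-output-head property the abstract highlights.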

preprint · 2021 · arXiv

A best bound for $λ_2(G)$ to guarantee $κ(G) \geq 2$

Let $G$ be a connected $d$-regular graph with a given order and second largest eigenvalue $λ_2(G)$. Mohar and O (private communication) posed a challenging problem, also mentioned by Cioabă: what is the best upper bound for $λ_2(G)$ that guarantees $κ(G) \geq t+1$, where $1 \leq t \leq d-1$ and $κ(G)$ is the vertex-connectivity of $G$? As a starting point, we solve this problem in the case $t = 1$ and characterize all families of extremal graphs.

preprint · 2021 · arXiv

Dirac Nodal Lines and Nodal Loops in a Topological Kagome Superconductor CsV$_3$Sb$_5$

The intertwining of charge order, superconductivity and band topology has promoted the AV$_3$Sb$_5$ (A=K, Rb, Cs) family of materials to the center of attention in condensed matter physics. Underlying those mysterious macroscopic properties such as giant anomalous Hall conductivity (AHC) and chiral charge density wave is their nontrivial band topology. While there have been numerous experimental and theoretical works investigating the nontrivial band structure and especially the van Hove singularities, the exact topological phase of this family remains to be clarified. In this work, we identify CsV$_3$Sb$_5$ as a Dirac nodal line semimetal based on the observation of multiple Dirac nodal lines and loops close to the Fermi level. Combining photoemission spectroscopy and density functional theory, we identify two groups of Dirac nodal lines along $k_z$ direction and one group of Dirac nodal loops in the A-H-L plane. These nodal loops are located at the Fermi level within the instrumental resolution limit. Importantly, our first-principle analyses indicate that these nodal loops may be a crucial source of the mysterious giant AHC observed. Our results not only provide a clear picture to categorize the band structure topology of this family of materials, but also suggest the dominant role of topological nodal loops in shaping their transport behavior.

preprint · 2021 · arXiv

Galleon: Reshaping the Square Peg of NFV

Software is often used for Network Functions (NFs) -- such as firewalls, NAT, deep packet inspection, and encryption -- that are applied to traffic in the network. The community has hoped that NFV would enable rapid development of new NFs and leverage commodity computing infrastructure. However, the challenge for researchers and operators has been to align the square peg of high-speed packet processing with the round hole of cloud computing infrastructures and abstractions, all while delivering performance, scalability, and isolation. Past work has led to the belief that NFV is different enough that it requires novel, custom approaches that deviate from today's norms. To the contrary, we show that we can achieve performance, scalability, and isolation in NFV judiciously using mechanisms and abstractions of FaaS, the Linux kernel, NIC hardware, and OpenFlow switches. As such, with our system Galleon, NFV can be practically-deployable today in conventional cloud environments while delivering up to double the performance per core compared to the state of the art.

preprint · 2020 · arXiv

Anchor Box Optimization for Object Detection

In this paper, we propose a general approach to optimize anchor boxes for object detection. Nowadays, anchor boxes are widely adopted in state-of-the-art detection frameworks. However, these frameworks usually pre-define anchor box shapes in heuristic ways and fix the sizes during training. To improve the accuracy and reduce the effort of designing anchor boxes, we propose to dynamically learn the anchor shapes, which allows the anchors to automatically adapt to the data distribution and the network learning capability. The learning approach can be easily implemented with stochastic gradient descent and can be plugged into any anchor box-based detection framework. The extra training cost is almost negligible and it has no impact on the inference time or memory cost. Exhaustive experiments demonstrate that the proposed anchor optimization method consistently achieves significant improvement ($\ge 1\%$ mAP absolute gain) over the baseline methods on several benchmark datasets including Pascal VOC 07+12, MS COCO and Brainwash. Meanwhile, the robustness is also verified towards different anchor initialization methods and the number of anchor shapes, which greatly simplifies the problem of anchor box design.

preprint · 2020 · arXiv

Boosting Weakly Supervised Object Detection with Progressive Knowledge Transfer

In this paper, we propose an effective knowledge transfer framework to boost the weakly supervised object detection accuracy with the help of an external fully-annotated source dataset, whose categories may not overlap with the target domain. This setting is of great practical value due to the existence of many off-the-shelf detection datasets. To more effectively utilize the source dataset, we propose to iteratively transfer the knowledge from the source domain by a one-class universal detector and learn the target-domain detector. The box-level pseudo ground truths mined by the target-domain detector in each iteration effectively improve the one-class universal detector. Therefore, the knowledge in the source dataset is more thoroughly exploited and leveraged. Extensive experiments are conducted with Pascal VOC 2007 as the target weakly-annotated dataset and COCO/ImageNet as the source fully-annotated dataset. With the proposed solution, we achieved an mAP of 59.7% detection performance on the VOC test set and an mAP of 60.2% after retraining a fully supervised Faster RCNN with the mined pseudo ground truths. This is significantly better than any previously known results in related literature and sets a new state-of-the-art of weakly supervised object detection under the knowledge transfer setting. Code: https://github.com/mikuhatsune/wsod_transfer.

preprint · 2020 · arXiv

Hashing-based Non-Maximum Suppression for Crowded Object Detection

In this paper, we propose an algorithm, named hashing-based non-maximum suppression (HNMS), to efficiently suppress the non-maximum boxes for object detection. Non-maximum suppression (NMS) is an essential component to suppress the boxes at closely located locations with similar shapes. The time cost tends to be huge when the number of boxes becomes large, especially for crowded scenes. The basic idea of HNMS is to first map each box to a discrete code (hash cell) and then remove the boxes with lower confidences if they are in the same cell. Considering the intersection-over-union (IoU) as the metric, we propose a simple yet effective hashing algorithm, named IoUHash, which guarantees that the boxes within the same cell are close enough by a lower IoU bound. For two-stage detectors, we replace NMS in the region proposal network with HNMS, and observe significant speed-up with comparable accuracy. For one-stage detectors, HNMS is used as a pre-filter to speed up the suppression by a large margin. Extensive experiments are conducted on the CARPK, SKU-110K, and CrowdHuman datasets to demonstrate the efficiency and effectiveness of HNMS. Code is released at https://github.com/microsoft/hnms.git.
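
The "map each box to a cell, keep the most confident box per cell" idea can be sketched in a few lines. The hash below is a simplified stand-in, not the paper's IoUHash, and it carries no IoU guarantee. The log-scale size bins, the `scale` and `step` parameters, and the function names are assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

# (center x, center y, width, height, confidence); width and height must be > 0.
Box = Tuple[float, float, float, float, float]
Code = Tuple[int, int, int, int]


def cell_code(box: Box, scale: float = 1.4, step: float = 0.2) -> Code:
    """Quantize size on a log scale, then quantize the center relative to that size."""
    cx, cy, w, h, _ = box
    i = round(math.log(w) / math.log(scale))
    j = round(math.log(h) / math.log(scale))
    cell_w = step * scale ** i
    cell_h = step * scale ** j
    return (i, j, math.floor(cx / cell_w), math.floor(cy / cell_h))


def hash_suppress(boxes: List[Box]) -> List[Box]:
    """Single pass over the boxes: keep the highest-confidence box in each cell."""
    best: Dict[Code, Box] = {}
    for box in boxes:
        code = cell_code(box)
        if code not in best or box[4] > best[code][4]:
            best[code] = box
    return sorted(best.values(), key=lambda b: b[4], reverse=True)


# Hypothetical detections: the first two are near-duplicates of the same object.
boxes = [
    (100.0, 100.0, 50.0, 80.0, 0.9),
    (101.0, 100.5, 51.0, 79.0, 0.6),
    (300.0, 200.0, 50.0, 80.0, 0.8),
]
kept = hash_suppress(boxes)
```

The pass is linear in the number of boxes, unlike the pairwise IoU comparisons of greedy NMS. In this simplified version, near-duplicates that straddle a cell boundary both survive, so a follow-up suppression step would still be needed.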

preprint · 2019 · arXiv

Enhanced block sparse signal recovery based on $q$-ratio block constrained minimal singular values

In this paper we introduce the $q$-ratio block constrained minimal singular values (BCMSV) as a new measure of measurement matrix in compressive sensing of block sparse/compressive signals and present an algorithm for computing this new measure. Both the mixed $\ell_2/\ell_q$ and the mixed $\ell_2/\ell_1$ norms of the reconstruction errors for stable and robust recovery using block Basis Pursuit (BBP), the block Dantzig selector (BDS) and the group lasso in terms of the $q$-ratio BCMSV are investigated. We establish a sufficient condition based on the $q$-ratio block sparsity for the exact recovery from the noise free BBP and developed a convex-concave procedure to solve the corresponding non-convex problem in the condition. Furthermore, we prove that for sub-Gaussian random matrices, the $q$-ratio BCMSV is bounded away from zero with high probability when the number of measurements is reasonably large. Numerical experiments are implemented to illustrate the theoretical results. In addition, we demonstrate that the $q$-ratio BCMSV based error bounds are tighter than the block restricted isotropic constant based bounds.
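
The abstract does not define the $q$-ratio block sparsity. A form commonly seen in related work is $k_q(x) = \left(\|x\|_{2,1} / \|x\|_{2,q}\right)^{q/(q-1)}$ for $1 < q < \infty$, where $\|x\|_{2,q}$ is the $\ell_q$ norm of the per-block $\ell_2$ norms. That formula is taken here as an assumption, not quoted from this paper. The sketch, with hypothetical function names, shows that under this form the measure equals the number of nonzero blocks when those blocks have equal norm.

```python
import math
from typing import List


def mixed_norm(blocks: List[List[float]], q: float) -> float:
    """Mixed l_2/l_q norm: the l_q norm of the per-block l_2 norms."""
    block_norms = [math.sqrt(sum(v * v for v in block)) for block in blocks]
    return sum(n ** q for n in block_norms) ** (1.0 / q)


def q_ratio_block_sparsity(blocks: List[List[float]], q: float = 2.0) -> float:
    """Assumed form (||x||_{2,1} / ||x||_{2,q})^(q/(q-1)); needs q > 1 and x != 0."""
    if q <= 1.0:
        raise ValueError("q must be greater than 1")
    ratio = mixed_norm(blocks, 1.0) / mixed_norm(blocks, q)
    return ratio ** (q / (q - 1.0))


# Three blocks of length 2; two are nonzero and both have l_2 norm 5.
signal = [[3.0, 4.0], [0.0, 0.0], [5.0, 0.0]]
k2 = q_ratio_block_sparsity(signal, q=2.0)
```

Here `k2` comes out at approximately 2, the count of nonzero blocks. Unequal block norms would give a smaller, non-integer value, which is why such a ratio serves as a soft sparsity measure.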

preprint · 2019 · arXiv

Giant Enhancement of Solid Solubility in Monolayer BNC Alloys by Selective Orbital Coupling

Solid solubility (SS) is one of the most important features of alloys, and it is usually difficult to tune substantially over the entire range of alloy concentrations by external approaches. Some alloys that were supposed to have promising physical properties could turn out to be much less useful because of their poor SS, e.g., monolayer BNC [(BN)$_{1-x}$(C$_2$)$_x$] alloys. Until now, an effective approach to significantly enhancing the SS of (BN)$_{1-x}$(C$_2$)$_x$ over the entire range of $x$ has been lacking. In this article, a novel mechanism of selective orbital coupling between high-energy wrong-bond states and surface states, mediated by a specific substrate, is proposed to stabilize the wrong bonds and in turn significantly enhance the SS of (BN)$_{1-x}$(C$_2$)$_x$ alloys. Surprisingly, we demonstrate that five ordered alloys, exhibiting variable direct quasi-particle bandgaps from 1.35 to 3.99 eV, can form spontaneously at different $x$ when (BN)$_{1-x}$(C$_2$)$_x$ is grown on hcp-phase Cr. Interestingly, the optical transitions around the band edges in these ordered alloys, accompanied by largely tunable exciton binding energies of ~1 eV at different $x$, are significantly strong due to their unique band structures. Importantly, the disordered (BN)$_{1-x}$(C$_2$)$_x$ alloys, exhibiting fully tunable bandgaps from 0 to ~6 eV over the entire range of $x$, can be formed on a Cr substrate at a miscibility temperature of ~1200 K, which is greatly reduced compared to the 4500-5600 K required in free-standing form or on other substrates. Our discovery not only may resolve the long-standing SS problem of BNC alloys, but also could significantly extend their use in various optoelectronic applications.

preprint2019arXiv

Realization of Lieb Lattice in Covalent-organic Frameworks with Tunable Topology and Magnetism

The Lieb lattice, a two-dimensional edge-depleted square lattice, has been predicted to host various exotic electronic properties due to its unusual band structure, i.e., a Dirac cone intersected by a flat band (Dirac-flat bands). Until now, although a few artificial Lieb lattices have been discovered in experiments, the realization of a Lieb lattice in a real material has not been achieved. In this article, based on tight-binding modeling and first-principles calculations, we predict that two covalent organic frameworks (COFs), i.e., sp2C-COF and sp2N-COF, which have been synthesized in recent experiments, are actually the first two material realizations of an organic-ligand-based Lieb lattice. It is found that the lattice distortion can govern the bandwidth of the Dirac-flat bands and in turn determine their electronic instability against spontaneous spin polarization during carrier doping. Spin-orbit coupling effects could drive these Dirac-flat bands in a distorted Lieb lattice to exhibit nontrivial topological properties, which depend on the position of the Fermi level. Interestingly, as the hole doping concentration increases, the sp2C-COF can undergo phase transitions from a paramagnetic state to a ferromagnetic one and then to a Néel antiferromagnetic one. Our findings not only confirm the first material realization of the Lieb lattice in COFs, but also offer a possible way to achieve tunable topology and magnetism in d- (f-) orbital-free organic lattices.

preprint2016arXiv

Electronic properties of SnTe-class topological crystalline insulator materials

The rise of topological insulators in recent years has broken new ground both in the conceptual understanding of condensed matter physics and in the prospects for a revolution in electronic devices. It has also stimulated the exploration of more topological states of matter. The topological crystalline insulator is a new topological phase that combines electronic topology and crystal symmetry. In this article, we review the recent progress in the studies of SnTe-class topological crystalline insulator materials. Starting from the topological identifications in the aspects of the bulk topology, surface state calculations and experimental observations, we present the electronic properties of topological crystalline insulators under various perturbations, including native defects, chemical doping, strain, and thickness-dependent confinement effects, and then discuss their unique quantum transport properties, such as valley-selective filtering and helicity-resolved functionalities for Dirac fermions. The rich properties and high tunability make SnTe-class materials promising candidates for novel quantum devices.

preprint2016arXiv

Manipulated Object Proposal: A Discriminative Object Extraction and Feature Fusion Framework for First-Person Daily Activity Recognition

Detecting and recognizing objects interacting with humans lies at the center of first-person (egocentric) daily activity recognition. However, due to noisy camera motion and frequent changes in viewpoint and scale, most previous egocentric action recognition methods fail to capture and model highly discriminative object features. In this work, we propose a novel pipeline for first-person daily activity recognition, aiming at more discriminative object feature representation and object-motion feature fusion. Our object feature extraction and representation pipeline is inspired by the recent success of object hypotheses and deep convolutional neural network based detection frameworks. Our key contribution is a simple yet effective manipulated object proposal generation scheme. This scheme leverages motion cues such as motion boundary and motion magnitude (in contrast, camera motion is usually considered "noise" by most previous methods) to generate a more compact and discriminative set of object proposals, which are more closely related to the objects being manipulated. Then, we learn more discriminative object detectors from these manipulated object proposals based on the region-based convolutional neural network (R-CNN). Meanwhile, we develop a network-based feature fusion scheme that better combines object and motion features. We show in experiments that the proposed framework significantly outperforms the state-of-the-art recognition performance on a challenging first-person daily activity benchmark.

preprint2016arXiv

The Xinglong 2.16-m Telescope: Current Instruments and Scientific Projects

The Xinglong 2.16-m reflector is the first 2-meter class astronomical telescope in China. It was jointly designed and built by the Nanjing Astronomical Instruments Factory (NAIF), Beijing Astronomical Observatory (now National Astronomical Observatories, Chinese Academy of Sciences, NAOC) and Institute of Automation, Chinese Academy of Sciences in 1989. It is a Ritchey-Chrétien (R-C) reflector on an English equatorial mount, and its effective aperture is 2.16 meters. It was the largest optical telescope in China for $\sim18$ years until the Guoshoujing Telescope (also called the Large Sky Area Multi-Object Fiber Spectroscopic Telescope, LAMOST) and the Lijiang 2.4-m telescope were built. At present, three main instruments are available at the Cassegrain focus: the Beijing Faint Object Spectrograph and Camera (BFOSC) for direct imaging and low resolution ($R\sim500-2000$) spectroscopy, the spectrograph made by Optomechanics Research Inc. (OMR) for low resolution spectroscopy (the spectral resolutions are similar to those of BFOSC) and the fiber-fed High Resolution Spectrograph (HRS, $R\sim30000-65000$). The telescope is widely open to astronomers all over China as well as international astronomical observers. Each year there are more than 40 ongoing observing projects, including 6-8 key projects. Recently, some new techniques and instruments (e.g., astro-frequency comb calibration system, polarimeter and adaptive optics) have been or will be tested on the telescope to extend its observing abilities.

preprint2015arXiv

Group $K$-Means

We study how to learn multiple dictionaries from a dataset and approximate any data point by the sum of codewords, each chosen from the corresponding dictionary. Although low approximation errors can theoretically be achieved by the global solution, an effective solution has not been well studied in practice. To solve the problem, we propose a simple yet effective algorithm, Group $K$-Means. Specifically, we take each dictionary, or any two selected dictionaries, as a group of $K$-means cluster centers, and then deal with the approximation issue by minimizing the approximation errors. In addition, we propose a hierarchical initialization for this non-convex problem. Experimental results validate the effectiveness of the approach.
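The approximation objective described in this abstract can be illustrated with a small sketch. It is not the Group $K$-Means algorithm itself: it only shows, by brute-force search, what it means to approximate a point by a sum of codewords with one codeword taken from each dictionary. The function name `encode_additive` and the toy dictionaries are hypothetical and chosen for illustration.

```python
from itertools import product

def encode_additive(x, dictionaries):
    """Pick one codeword per dictionary so that their sum is closest to x.

    Brute force over all index combinations; only feasible for toy sizes.
    Returns (indices, squared_error).
    """
    best_idx, best_err = None, float("inf")
    for idx in product(*(range(len(d)) for d in dictionaries)):
        # Sum the selected codewords coordinate by coordinate.
        approx = [sum(d[k][j] for d, k in zip(dictionaries, idx))
                  for j in range(len(x))]
        err = sum((a - b) ** 2 for a, b in zip(x, approx))
        if err < best_err:
            best_idx, best_err = idx, err
    return best_idx, best_err

# Hypothetical toy dictionaries with two 2-D codewords each.
dict_a = [[0.0, 0.0], [1.0, 0.0]]
dict_b = [[0.0, 0.0], [0.0, 2.0]]
indices, error = encode_additive([1.0, 2.0], [dict_a, dict_b])
print(indices, error)
```

The exhaustive search grows exponentially with the number of dictionaries, which is one way to see why a practical learning and encoding procedure is needed.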

preprint2014arXiv

Experimental observation of Dirac-like surface states and topological phase transition in Pb$_{1-x}$Sn$_x$Te(111) films

The surface of a topological crystalline insulator (TCI) carries an even number of Dirac cones protected by crystalline symmetry. We epitaxially grew high-quality Pb$_{1-x}$Sn$_x$Te(111) films and investigated the TCI phase by in-situ angle-resolved photoemission spectroscopy. Pb$_{1-x}$Sn$_x$Te(111) films undergo a topological phase transition from trivial insulator to TCI with increasing Sn/Pb ratio, accompanied by a crossover from n-type to p-type doping. In addition, a hybridization gap opens in the surface states when the film thickness is reduced to the two-dimensional limit. The work demonstrates an approach to manipulating the topological properties of TCIs, which is important for future fundamental research and applications based on TCIs.

preprint2014arXiv

On a Problem of Harary and Schwenk on Graphs with Distinct Eigenvalues

Harary and Schwenk posed the following problem forty years ago: Which graphs have distinct adjacency eigenvalues? In this paper, we obtain a necessary and sufficient condition for a Hermitian matrix with simple spectral radius and distinct eigenvalues. As an application, we give an algebraic characterization for the Harary-Schwenk problem. As an extension of their problem, we also obtain a necessary and sufficient condition for a positive semidefinite matrix with simple least eigenvalue and distinct eigenvalues, which provides an algebraic characterization for their problem with respect to the (normalized) Laplacian matrix.

preprint2014arXiv

Optimized Cartesian $K$-Means

Product quantization-based approaches are effective for encoding high-dimensional data points for approximate nearest neighbor search. The space is decomposed into a Cartesian product of low-dimensional subspaces, each of which generates a sub codebook. Data points are encoded as compact binary codes using these sub codebooks, and the distance between two data points can be approximated efficiently from their codes using precomputed lookup tables. Traditionally, to encode a subvector of a data point in a subspace, only one sub codeword in the corresponding sub codebook is selected, which may impose strict restrictions on the search accuracy. In this paper, we propose a novel approach, named Optimized Cartesian $K$-Means (OCKM), to better encode the data points for more accurate approximate nearest neighbor search. In OCKM, multiple sub codewords are used to encode the subvector of a data point in a subspace. Each sub codeword stems from a different sub codebook in each subspace, and these sub codebooks are optimally generated with regard to the minimization of the distortion errors. The high-dimensional data point is then encoded as the concatenation of the indices of multiple sub codewords from all the subspaces. This can provide more flexibility and lower distortion errors than traditional methods. Experimental results on standard real-life datasets demonstrate the superiority over state-of-the-art approaches for approximate nearest neighbor search.
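As a rough illustration of the traditional scheme this abstract starts from (one sub codeword per subspace, distances read from lookup tables), here is a minimal sketch. It is plain product quantization, not OCKM, and it assumes the sub codebooks are already given; the helper names and toy codebooks are hypothetical.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def split(x, m):
    """Split x into m contiguous subvectors of equal length."""
    d = len(x) // m
    return [x[i * d:(i + 1) * d] for i in range(m)]

def pq_encode(x, codebooks):
    """Encode x as one index per subspace: the nearest sub codeword."""
    subs = split(x, len(codebooks))
    return [min(range(len(cb)), key=lambda k: sq_dist(s, cb[k]))
            for s, cb in zip(subs, codebooks)]

def lookup_tables(query, codebooks):
    """Per subspace, distances from the query subvector to every sub codeword."""
    subs = split(query, len(codebooks))
    return [[sq_dist(s, c) for c in cb] for s, cb in zip(subs, codebooks)]

def approx_dist(tables, code):
    """Approximate squared distance: one table lookup per subspace."""
    return sum(t[k] for t, k in zip(tables, code))

# Hypothetical toy setup: 4-D vectors, 2 subspaces, 2 sub codewords each.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
code = pq_encode([0.9, 1.1, 0.1, 0.8], codebooks)
tables = lookup_tables([1.0, 1.0, 0.0, 1.0], codebooks)
print(code, approx_dist(tables, code))
```

The approximate distance is the distance from the query to the reconstruction of the encoded point, so its accuracy is limited by how well a single sub codeword per subspace can represent each subvector. That limitation is what the multiple sub codewords in OCKM are meant to relax.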

preprint2014arXiv

Randić energy and Randić eigenvalues

Let $G$ be a graph of order $n$, and $d_i$ the degree of a vertex $v_i$ of $G$. The Randić matrix $\mathbf{R}=(r_{ij})$ of $G$ is defined by $r_{ij} = 1 / \sqrt{d_i d_j}$ if the vertices $v_i$ and $v_j$ are adjacent in $G$ and $r_{ij}=0$ otherwise. The normalized signless Laplacian matrix $\mathcal{Q}$ is defined as $\mathcal{Q} = I+\mathbf{R}$, where $I$ is the identity matrix. The Randić energy is the sum of the absolute values of the eigenvalues of $\mathbf{R}$. In this paper, we find a relation between the normalized signless Laplacian eigenvalues of $G$ and the Randić energy of its subdivided graph $S(G)$. We also give a necessary and sufficient condition for a graph to have exactly $k$ distinct Randić eigenvalues.
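The definitions above can be made concrete with a short sketch that builds the Randić matrix from an edge list (using $r_{ij} = 1/\sqrt{d_i d_j}$ for adjacent vertices) and sums the absolute values of its eigenvalues. The eigenvalue routine is a plain cyclic Jacobi iteration picked only to keep the example dependency-free; it is an assumption of this sketch, not something taken from the paper, and the function names are hypothetical.

```python
import math

def randic_matrix(n, edges):
    """Randić matrix of a simple graph on vertices 0..n-1, given as an edge list."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    r = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        r[u][v] = r[v][u] = 1.0 / math.sqrt(deg[u] * deg[v])
    return r

def symmetric_eigenvalues(a, sweeps=100, tol=1e-20):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def randic_energy(n, edges):
    """Sum of the absolute values of the Randić eigenvalues."""
    return sum(abs(x) for x in symmetric_eigenvalues(randic_matrix(n, edges)))

path3 = [(0, 1), (1, 2)]             # path on three vertices
triangle = [(0, 1), (1, 2), (0, 2)]  # complete graph on three vertices
print(randic_energy(3, path3), randic_energy(3, triangle))
```

For the three-vertex path the Randić eigenvalues are $-1, 0, 1$, and for the triangle they are $-1/2, -1/2, 1$, so both energies should come out as 2 up to floating-point error. Since $\mathcal{Q} = I + \mathbf{R}$, the normalized signless Laplacian eigenvalues are these values shifted by 1.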

preprint2011arXiv

The existence and uniqueness of the smoothing solution of the Navier-Stokes equations

This paper discusses the existence and uniqueness of the smoothing solution of the Navier-Stokes equations. First, we construct a theory of linear equations with constant coefficients whose unknowns are functions of four variables. Second, we use this theory to convert the Navier-Stokes equations into a simultaneous system of first-order linear partial differential equations with constant coefficients and quadratic equations. Third, we use the Fourier transform to convert the first-order linear partial differential equations with constant coefficients into linear equations, and we obtain their explicit general solution. Finally, we convert the quadratic equations into integral equations, or the problem of finding a fixed point of a continuous mapping. We use the theory of the Poisson equation and the heat conduction equation, the Schauder fixed-point theorem and the contraction mapping principle to prove that the fixed point exists and is unique except on a set of Lebesgue measure 0; hence the smoothing solution of the Navier-Stokes equations also exists and is unique except on a set of Lebesgue measure 0.

preprint2011arXiv

The global existence of the smoothing solution for the Navier-Stokes equations

This paper discusses the global existence of the smoothing solution for the Navier-Stokes equations. First, we construct a theory of linear equations with constant coefficients whose unknowns are functions of four variables. Second, we use this theory to convert the Navier-Stokes equations into a simultaneous system of first-order linear partial differential equations with constant coefficients and quadratic equations. Third, we use the Fourier transform to convert the first-order linear partial differential equations with constant coefficients into linear equations, and we obtain their explicit general solution. Finally, we convert the quadratic equations into integral equations, or the problem of finding a fixed point of a continuous mapping. We use the theory of Poisson's equation and the heat conduction equation, together with the Schauder fixed-point theorem, to prove that the fixed point exists; hence the smoothing solution for the Navier-Stokes equations exists globally.

36 published item(s)

Bring Metric Functions into Diffusion Models

COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

A complete characterization of graphs with exactly two positive eigenvalues

An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA

An Empirical Study of Training End-to-End Vision-and-Language Transformers

Exploiting dynamic nonlinearity in upconversion nanoparticles for super-resolution imaging

Inertia and spectral symmetry of eccentricity matrices of some clique trees

Injecting Semantic Concepts into End-to-End Image Captioning

NP-Match: When Neural Processes meet Semi-Supervised Learning

NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis

On joins of a clique and a co-clique as star complements in regular graphs

Rethinking Bayesian Deep Learning Methods for Semi-Supervised Volumetric Medical Image Segmentation

Scaling Up Vision-Language Pre-training for Image Captioning

Statistical learning for train delays and influence of winter climate and atmospheric icing

The Overlooked Classifier in Human-Object Interaction Recognition

The Overlooked Classifier in Human-Object Interaction Recognition

UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling

A best bound for $λ_2(G)$ to guarantee $κ(G) \geq 2$

Dirac Nodal Lines and Nodal Loops in a Topological Kagome Superconductor CsV$_3$Sb$_5$

Galleon: Reshaping the Square Peg of NFV

Anchor Box Optimization for Object Detection

Boosting Weakly Supervised Object Detection with Progressive Knowledge Transfer

Hashing-based Non-Maximum Suppression for Crowded Object Detection

Enhanced block sparse signal recovery based on $q$-ratio block constrained minimal singular values

Giant Enhancement of Solid Solubility in Monolayer BNC Alloys by Selective Orbital Coupling

Realization of Lieb Lattice in Covalent-organic Frameworks with Tunable Topology and Magnetism

Electronic properties of SnTe-class topological crystalline insulator materials

Manipulated Object Proposal: A Discriminative Object Extraction and Feature Fusion Framework for First-Person Daily Activity Recognition

The Xinglong 2.16-m Telescope: Current Instruments and Scientific Projects

Group $K$-Means

Experimental observation of Dirac-like surface states and topological phase transition in Pb$_{1-x}$Sn$_x$Te(111) films

On a Problem of Harary and Schwenk on Graphs with Distinct Eigenvalues

Optimized Cartesian $K$-Means

Randić energy and Randić eigenvalues

The existence and uniqueness of the smoothing solution of the Navier-Stokes equations

The global existence of the smoothing solution for the Navier-Stokes equations