Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
67 works
0 followers
39 topics
4 close collaborators

Published work

67 published item(s)

preprint · 2026 · arXiv

AnimateAnyMesh++: A Flexible 4D Foundation Model for High-Fidelity Text-Driven Mesh Animation

Recent advances in 4D content generation have attracted increasing attention, yet creating high-quality animated 3D models remains challenging due to the complexity of modeling spatio-temporal distributions and the scarcity of 4D training data. We present AnimateAnyMesh++, a feed-forward framework for text-driven animation of arbitrary 3D meshes with substantial upgrades in data, architecture, and generative capability. First, we expand the DyMesh-XL dataset by mining dynamic content from Objaverse-XL, increasing the number of unique identities from 60K to 300K and substantially broadening category and motion diversity. Second, we redesign DyMeshVAE-Flex with power-law topology-aware attention and vertex-normal enhanced features, which significantly improves trajectory reconstruction, local geometry preservation, and mitigates trajectory-sticking artifacts. Third, we introduce architectural changes to both DyMeshVAE-Flex and the rectified-flow (RF) generator to support variable-length sequence training and generation, enabling longer animations while preserving reconstruction fidelity. Extensive experiments demonstrate that AnimateAnyMesh++ generates semantically accurate and temporally coherent mesh animations within seconds, surpassing prior approaches in quality and efficiency. The enlarged DyMesh-XL, the upgraded DyMeshVAE-Flex, and variable-length RF together deliver consistent gains across benchmarks and in-the-wild meshes. We will release code, models, and the expanded DyMesh-XL upon acceptance of this manuscript to facilitate research in 4D content creation.

preprint · 2026 · arXiv

DyDiT++: Diffusion Transformers with Timestep and Spatial Dynamics for Efficient Visual Generation

Diffusion Transformer (DiT), an emerging diffusion model for visual generation, has demonstrated superior performance but suffers from substantial computational costs. Our investigations reveal that these costs primarily stem from the static inference paradigm, which inevitably introduces redundant computation in certain diffusion timesteps and spatial regions. To overcome this inefficiency, we propose Dynamic Diffusion Transformer (DyDiT), an architecture that dynamically adjusts its computation along both timestep and spatial dimensions. Building on these designs, we present an extended version, DyDiT++, with improvements in three key aspects. First, it extends the generation mechanism of DyDiT beyond diffusion to flow matching, demonstrating that our method can also accelerate flow-matching-based generation, enhancing its versatility. Furthermore, we enhance DyDiT to tackle more complex visual generation tasks, including video generation and text-to-image generation, thereby broadening its real-world applications. Finally, to address the high cost of full fine-tuning and democratize technology access, we investigate the feasibility of training DyDiT in a parameter-efficient manner and introduce timestep-based dynamic LoRA (TD-LoRA). Extensive experiments on diverse visual generation models, including DiT, SiT, Latte, and FLUX, demonstrate the effectiveness of DyDiT++. Remarkably, with <3% additional fine-tuning iterations, our approach reduces the FLOPs of DiT-XL by 51%, yielding 1.73x realistic speedup on hardware, and achieves a competitive FID score of 2.07 on ImageNet. The code is available at https://github.com/alibaba-damo-academy/DyDiT.
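The reported FLOPs reduction and hardware speedup can be related with a short calculation. The sketch below assumes, purely for illustration, that runtime would scale linearly with the remaining FLOPs in the ideal case; that assumption is ours and is not stated in the abstract. The function name `ideal_speedup` is hypothetical.

```python
# Relating the reported FLOPs reduction to the reported speedup.
# Assumption (ours): in the ideal case, runtime is proportional to FLOPs.

def ideal_speedup(flops_reduction: float) -> float:
    """Speedup if runtime were exactly proportional to the remaining FLOPs."""
    if not 0.0 <= flops_reduction < 1.0:
        raise ValueError("flops_reduction must be in [0, 1)")
    return 1.0 / (1.0 - flops_reduction)

reported_reduction = 0.51  # FLOPs reduction of DiT-XL, from the abstract
reported_speedup = 1.73    # measured hardware speedup, from the abstract

ideal = ideal_speedup(reported_reduction)
realized_fraction = reported_speedup / ideal
print(f"ideal speedup: {ideal:.2f}x, realized fraction: {realized_fraction:.0%}")
```

Under that assumption, the measured speedup recovers most, but not all, of what the FLOPs count alone would suggest, which is typical when dynamic computation adds routing or memory overhead.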

preprint · 2026 · arXiv

MedGround: Bridging the Evidence Gap in Medical Vision-Language Models with Verified Grounding Data

Vision-Language Models (VLMs) can generate convincing clinical narratives, yet frequently struggle to visually ground their statements. We posit this limitation arises from the scarcity of high-quality, large-scale clinical referring-localization pairs. To address this, we introduce MedGround, an automated pipeline that transforms segmentation resources into high-quality medical referring grounding data. Leveraging expert masks as spatial anchors, MedGround precisely derives localization targets, extracts shape and spatial cues, and guides VLMs to synthesize natural, clinically grounded queries that reflect morphology and location. To ensure data rigor, a multi-stage verification system integrates strict formatting checks, geometry- and medical-prior rules, and image-based visual judging to filter out ambiguous or visually unsupported samples. Finally, we present MedGround-35K, a novel multimodal medical dataset. Extensive experiments demonstrate that VLMs trained with MedGround-35K consistently achieve improved referring grounding performance, enhance multi-object semantic disambiguation, and exhibit strong generalization to unseen grounding settings. This work highlights MedGround as a scalable, data-driven approach to anchor medical language to verifiable visual evidence. Dataset and code will be released publicly upon acceptance.

preprint · 2023 · arXiv

Head-Free Lightweight Semantic Segmentation with Linear Transformer

Existing semantic segmentation works have mainly focused on designing effective decoders; however, the computational load introduced by the overall structure has long been ignored, which hinders their application on resource-constrained hardware. In this paper, we propose a head-free lightweight architecture specifically for semantic segmentation, named Adaptive Frequency Transformer. It adopts a parallel architecture to leverage prototype representations as specific learnable local descriptions, which replaces the decoder and preserves the rich image semantics on high-resolution features. Although removing the decoder compresses most of the computation, the accuracy of the parallel structure is still limited by low computational resources. Therefore, we employ heterogeneous operators (CNN and Vision Transformer) for pixel embedding and prototype representations to further save computational costs. Moreover, it is very difficult to linearize the complexity of the vision Transformer from the perspective of the spatial domain. Because semantic segmentation is very sensitive to frequency information, we construct a lightweight prototype learning block with an adaptive frequency filter of complexity $O(n)$ to replace standard self-attention with complexity $O(n^{2})$. Extensive experiments on widely adopted datasets demonstrate that our model achieves superior accuracy while retaining only 3M parameters. On the ADE20K dataset, our model achieves 41.8 mIoU and 4.6 GFLOPs, which is 4.4 mIoU higher than Segformer, with 45% fewer GFLOPs. On the Cityscapes dataset, our model achieves 78.7 mIoU and 34.4 GFLOPs, which is 2.5 mIoU higher than Segformer with 72.5% fewer GFLOPs. Code is available at https://github.com/dongbo811/AFFormer.

preprint · 2022 · arXiv

Active Source Free Domain Adaptation

Source free domain adaptation (SFDA) aims to transfer a trained source model to the unlabeled target domain without accessing the source data. However, the SFDA setting faces a performance bottleneck due to the absence of source data and target supervised information, as evidenced by the limited performance gains of the newest SFDA methods. In this paper, for the first time, we introduce a more practical scenario called active source free domain adaptation (ASFDA) that permits actively selecting a few target data to be labeled by experts. To achieve that, we first find that the samples satisfying the properties of neighbor-chaotic, individual-different, and target-like are the best points to select, and we define them as the minimum happy (MH) points. We then propose minimum happy points learning (MHPL) to actively explore and exploit MH points. We design three unique strategies, namely neighbor ambient uncertainty, neighbor diversity relaxation, and one-shot querying, to explore the MH points. Further, to fully exploit MH points in the learning process, we design a neighbor focal loss that assigns the weighted neighbor purity to the cross-entropy loss of MH points to make the model focus more on them. Extensive experiments verify that MHPL remarkably exceeds various types of baselines and achieves significant performance gains at a small labeling cost.

preprint · 2022 · arXiv

An empirical equilibrium model of formal and informal credit markets in developing countries

I develop and estimate a dynamic equilibrium model of risky entrepreneurs' borrowing and savings decisions incorporating both formal and local-informal credit markets. Households have access to an exogenous formal credit market and to an informal credit market in which the interest rate is endogenously determined by the local demand and supply of credit. I estimate the model via Simulated Maximum Likelihood using Thai village data during an episode of formal credit market expansion. My estimates suggest that a 49 percent reduction in fixed costs increased the proportion of households borrowing formally by 36 percent, and that a doubling of the collateralized borrowing limits lowered informal interest rates by 24 percent. I find that more productive households benefited from the policies that expanded borrowing access, but less productive households lost in terms of welfare due to diminished savings opportunities. Gains are overall smaller than would be predicted by models that do not consider the informal credit market.

preprint · 2022 · arXiv

CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation

Unsupervised domain adaptation (UDA) aims to transfer knowledge learned from a labeled source domain to a different unlabeled target domain. Most existing UDA methods focus on learning domain-invariant feature representations, at either the domain level or the category level, using convolutional neural network (CNN)-based frameworks. One fundamental problem for category-level UDA is the production of pseudo labels for samples in the target domain, which are usually too noisy for accurate domain alignment, inevitably compromising UDA performance. With the success of Transformer in various tasks, we find that the cross-attention in Transformer is robust to noisy input pairs and yields better feature alignment; in this paper, we therefore adopt Transformer for the challenging UDA task. Specifically, to generate accurate input pairs, we design a two-way center-aware labeling algorithm to produce pseudo labels for target samples. Along with the pseudo labels, a weight-sharing triple-branch transformer framework is proposed to apply self-attention and cross-attention for source/target feature learning and source-target domain alignment, respectively. This design explicitly forces the framework to learn discriminative domain-specific and domain-invariant representations simultaneously. The proposed method is dubbed CDTrans (cross-domain transformer), and it provides one of the first attempts to solve UDA tasks with a pure transformer solution. Experiments show that our proposed method achieves the best performance on public UDA datasets, e.g. VisDA-2017 and DomainNet. Code and models are available at https://github.com/CDTrans/CDTrans.

preprint · 2022 · arXiv

ChemRL-GEM: Geometry Enhanced Molecular Representation Learning for Property Prediction

Effective molecular representation learning is of great importance to facilitate molecular property prediction, which is a fundamental task for the drug and material industry. Recent advances in graph neural networks (GNNs) have shown great promise for molecular representation learning. Moreover, a few recent studies have also demonstrated successful applications of self-supervised learning methods to pre-train the GNNs to overcome the problem of insufficient labeled molecules. However, existing GNNs and pre-training strategies usually treat molecules as topological graph data without fully utilizing the molecular geometry information. Yet the three-dimensional (3D) spatial structure of a molecule, a.k.a. molecular geometry, is one of the most critical factors for determining molecular physical, chemical, and biological properties. To this end, we propose a novel Geometry Enhanced Molecular representation learning method (GEM) for Chemical Representation Learning (ChemRL). First, we design a geometry-based GNN architecture that simultaneously models atoms, bonds, and bond angles in a molecule. Specifically, we devise two graphs for a molecule: the first encodes the atom-bond relations; the second encodes the bond-angle relations. Moreover, on top of the devised GNN architecture, we propose several novel geometry-level self-supervised learning strategies to learn spatial knowledge by utilizing the local and global molecular 3D structures. We compare ChemRL-GEM with various state-of-the-art (SOTA) baselines on different molecular benchmarks and show that ChemRL-GEM can significantly outperform all baselines in both regression and classification tasks. For example, the experimental results show an overall improvement of 8.8% on average compared to SOTA baselines on the regression tasks, demonstrating the superiority of the proposed method.

preprint · 2022 · arXiv

Controllable energy angular spectrum method

A controllable energy method, which considers the undersampling issue of the transfer function and valid spectral energy of a source signal, is proposed to implement angular spectrum diffraction calculation in near and far fields. The proposed method provides an optimized frequency boundary $f_{CE}$ within which it always keeps controllable energy to be diffracted. The controllable energy angular spectrum method significantly reduces the number of samples while having the same accuracy as previous angular spectrum methods, implying a higher calculation efficiency. The new perspective of analyzing spectral energy is shown to improve the performance of relevant diffraction calculations.
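For context, the baseline that this work refines can be sketched in a few lines. The code below is a plain 1D angular spectrum propagation using the standard transfer function $H(f_x) = \exp\left(i 2\pi z \sqrt{1/\lambda^2 - f_x^2}\right)$. It does not implement the controllable-energy frequency boundary $f_{CE}$, whose definition is not given in the abstract. The function names, the naive DFT, and the choice to drop evanescent components are all illustrative assumptions.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (adequate for a small demo)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
        out.append(s / n if inverse else s)
    return out

def angular_spectrum_1d(field, dx, wavelength, z):
    """Propagate a sampled 1D complex field by a distance z (plain method)."""
    n = len(field)
    spectrum = dft(field)
    propagated = []
    for k in range(n):
        # Spatial frequency of bin k, in cycles per unit length (FFT ordering).
        fx = (k if k < n // 2 else k - n) / (n * dx)
        arg = 1.0 / wavelength ** 2 - fx ** 2
        if arg >= 0:
            h = cmath.exp(2j * math.pi * z * math.sqrt(arg))  # propagating wave
        else:
            h = 0.0  # evanescent component, dropped in this sketch
        propagated.append(spectrum[k] * h)
    return dft(propagated, inverse=True)

# Hypothetical demo values: 64 samples, 1 um pitch, 0.5 um wavelength.
n, dx, wavelength = 64, 1e-6, 0.5e-6
u0 = [math.exp(-(((i - n // 2) * dx) / 8e-6) ** 2) + 0j for i in range(n)]
u1 = angular_spectrum_1d(u0, dx, wavelength, z=100e-6)

energy_in = sum(abs(v) ** 2 for v in u0)
energy_out = sum(abs(v) ** 2 for v in u1)
print(energy_in, energy_out)
```

With a sampling pitch larger than half the wavelength, every sampled frequency is a propagating one, so the transfer function has unit magnitude and the total energy is preserved. The undersampling problem mentioned in the abstract arises because the phase of this transfer function oscillates faster as z grows.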

preprint · 2022 · arXiv

Detecting Owner-member Relationship with Graph Convolution Network in Fisheye Camera System

The owner-member relationship between wheels and vehicles contributes significantly to the 3D perception of vehicles, especially in embedded environments. However, to leverage this relationship we must face two major challenges: i) Traditional IoU-based heuristics have difficulty handling occluded traffic congestion scenarios. ii) Ensuring the effectiveness and applicability of a solution in a vehicle-mounted system is difficult. To address these issues, we propose an innovative relationship prediction method, DeepWORD, by designing a graph convolutional network (GCN). Specifically, to improve the information richness, we use feature maps with local correlation as input to the nodes. Subsequently, we introduce a graph attention network (GAT) to dynamically correct the a priori estimation bias. Finally, we design a large-scale benchmark dataset with annotated owner-member relationships, called WORD. Experiments show that the proposed method achieves state-of-the-art accuracy and real-time performance. The WORD dataset is made publicly available at https://github.com/NamespaceMain/ownermember-relationship-dataset.

preprint · 2022 · arXiv

Dynamic Gradient Reactivation for Backward Compatible Person Re-identification

We study the backward compatible problem for person re-identification (Re-ID), which aims to constrain the features of an updated new model to be comparable with the existing features from the old model in galleries. Most existing works adopt distillation-based methods, which focus on pushing new features to imitate the distribution of the old ones. However, distillation-based methods are intrinsically sub-optimal since they force the new feature space to imitate the inferior old feature space. To address this issue, we propose Ranking-based Backward Compatible Learning (RBCL), which directly optimizes the ranking metric between new features and old features. Different from previous methods, RBCL only pushes the new features to find the best-ranking positions in the old feature space instead of enforcing strict alignment, and is in line with the ultimate goal of backward retrieval. However, the sharp sigmoid function used to make the ranking metric differentiable also causes a vanishing-gradient issue, which stalls the ranking refinement during the later period of training. To address this issue, we propose Dynamic Gradient Reactivation (DGR), which can reactivate the suppressed gradients by adding a dynamically computed constant during the forward step. To further help target the best-ranking positions, we include Neighbor Context Agents (NCAs) to approximate the entire old feature space during training. Unlike previous works, which only test on in-domain settings, we make the first attempt to introduce cross-domain settings (both supervised and unsupervised), which are more meaningful and difficult. The experimental results on all five settings show that the proposed RBCL outperforms previous state-of-the-art methods by large margins.
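The vanishing-gradient problem described here can be illustrated with a small sketch. It uses a common sigmoid-smoothed rank, 1 plus the sum of sigmoid((s_j - s_pos) / tau) over the other scores, which is our assumption about the general form and not necessarily the paper's exact metric. DGR itself is not implemented, since the abstract does not specify how its constant is computed. All names and score values are hypothetical.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def soft_rank(pos_score, other_scores, tau):
    """Smooth rank of the positive: 1 + sum_j sigmoid((s_j - s_pos) / tau)."""
    return 1.0 + sum(sigmoid((s - pos_score) / tau) for s in other_scores)

def soft_rank_grad(pos_score, other_scores, tau):
    """Derivative of soft_rank w.r.t. pos_score (sigmoid' = p * (1 - p))."""
    total = 0.0
    for s in other_scores:
        p = sigmoid((s - pos_score) / tau)
        total += -p * (1.0 - p) / tau
    return total

# Hypothetical similarity scores: two distractors outrank the positive.
others = [0.9, 0.8, 0.3]
pos = 0.5
for tau in (0.2, 0.01):
    print(tau, soft_rank(pos, others, tau), soft_rank_grad(pos, others, tau))
```

With the sharp temperature, the soft rank is almost exactly the true rank of 3, but the gradient with respect to the positive score is close to zero, so the optimizer receives little signal even though the positive is still misranked. The smoother temperature gives a usable gradient at the cost of a less faithful rank.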

preprint · 2022 · arXiv

Early life height and weight production functions with endogenous energy and protein inputs

We examine effects of protein and energy intakes on height and weight growth for children between 6 and 24 months old in Guatemala and the Philippines. Using instrumental variables to control for endogeneity and estimating multiple specifications, we find that protein intake plays an important and positive role in height and weight growth in the 6-24 month period. Energy from other macronutrients, however, does not have a robust relation with these two anthropometric measures. Our estimates indicate that in contexts with substantial child undernutrition, increases in protein-rich food intake in the first 24 months can have important growth effects, which previous studies indicate are related significantly to a range of outcomes over the life cycle.

preprint · 2022 · arXiv

EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation

Locating 3D objects from a single RGB image via Perspective-n-Points (PnP) is a long-standing problem in computer vision. Driven by end-to-end deep learning, recent studies suggest interpreting PnP as a differentiable layer, so that 2D-3D point correspondences can be partly learned by backpropagating the gradient w.r.t. object pose. Yet, learning the entire set of unrestricted 2D-3D points from scratch fails to converge with existing approaches, since the deterministic pose is inherently non-differentiable. In this paper, we propose the EPro-PnP, a probabilistic PnP layer for general end-to-end pose estimation, which outputs a distribution of pose on the SE(3) manifold, essentially bringing categorical Softmax to the continuous domain. The 2D-3D coordinates and corresponding weights are treated as intermediate variables learned by minimizing the KL divergence between the predicted and target pose distribution. The underlying principle unifies the existing approaches and resembles the attention mechanism. EPro-PnP significantly outperforms competitive baselines, closing the gap between PnP-based method and the task-specific leaders on the LineMOD 6DoF pose estimation and nuScenes 3D object detection benchmarks.

preprint · 2022 · arXiv

Estimating the Effects of Educational System Consolidation: The Case of China's Rural School Closure Initiative

Global trends of fertility decline, population aging, and rural outmigration are creating pressures to consolidate school systems, with the rationale that economies of scale will enable higher quality education to be delivered in an efficient manner, despite longer travel distances for students. Yet, few studies have considered the implications of system consolidation for educational access and inequality, outside of the context of developed countries. We estimate the impact of educational infrastructure consolidation on educational attainment using the case of China's rural primary school closure policies in the early 2000s. We use data from a large household survey covering 728 villages in 7 provinces, and exploit variation in villages' year of school closure and children's ages at closure to identify the causal impact of school closure. For girls exposed to closure during their primary school ages, we find an average decrease of 0.60 years of schooling by 2011, when children's mean age was 17 years old. Negative effects strengthen with time since closure. For boys, there is no corresponding significant effect. Different effects by gender may be related to greater sensitivity of girls' enrollment to distance and greater responsiveness of boys' enrollment to quality.

preprint · 2022 · arXiv

Exploiting dynamic nonlinearity in upconversion nanoparticles for super-resolution imaging

Single-beam super-resolution microscopy, also known as superlinear microscopy, exploits the nonlinear response of fluorescent probes in confocal microscopy. The technique requires no complex purpose-built system, light field modulation, or beam shaping. Here, we present a strategy to enhance spatial resolution of superlinear microscopy by modulating excitation intensity during image acquisition. This modulation induces dynamic optical nonlinearity in upconversion nanoparticles (UCNPs), resulting in variations of higher spatial-frequency information in the obtained images. The high-order information can be extracted with a proposed weighted finite difference imaging algorithm from raw fluorescence images, to generate an image with a higher resolution than superlinear microscopy images. We apply this approach to resolve two adjacent nanoparticles within a diffraction-limited area, improving the resolution to 130 nm. This work suggests a new scope for developing dynamic nonlinear fluorescent probes in super-resolution nanoscopy.

preprint · 2022 · arXiv

FAKD: Feature Augmented Knowledge Distillation for Semantic Segmentation

In this work, we explore data augmentations for knowledge distillation on semantic segmentation. To avoid over-fitting to the noise in the teacher network, a large number of training examples is essential for knowledge distillation. Image-level augmentation techniques like flipping, translation, or rotation are widely used in previous knowledge distillation frameworks. Inspired by the recent progress on semantic directions in feature space, we propose to include augmentations in feature space for efficient distillation. Specifically, given a semantic direction, an infinite number of augmentations can be obtained for the student in the feature space. Furthermore, the analysis shows that those augmentations can be optimized simultaneously by minimizing an upper bound for the losses defined by augmentations. Based on this observation, a new algorithm is developed for knowledge distillation in semantic segmentation. Extensive experiments on four semantic segmentation benchmarks demonstrate that the proposed method can boost the performance of current knowledge distillation methods without any significant overhead. Code is available at: https://github.com/jianlong-yuan/FAKD.

preprint · 2022 · arXiv

Fewer, better pathways for all? Intersectional impacts of rural school consolidation in China's minority regions

Primary school consolidation--the closure of small community schools or their mergers into larger, better-resourced schools--is emerging as a significant policy response to changing demographics in middle income countries with large rural populations. In China, large-scale consolidation took place in the early 21st century. Because officially-recognized minority populations disproportionately reside in rural and remote areas, minority students were among those at elevated risk of experiencing school consolidation. We analyze heterogeneous effects of consolidation on educational attainment and reported national language ability in China by exploiting variations in closure timing across villages and cohorts captured in a 2011 survey of provinces and autonomous regions with substantial minority populations. We consider heterogeneous treatment effects across groups defined at the intersections of minority status, gender, and community ethnic composition and socioeconomic status. Compared to villages with schools, villages whose schools had closed reported that the schools students now attended were better resourced, less likely to offer minority language of instruction, more likely to have Han teachers, farther away, and more likely to require boarding. Much more than Han youth, ethnic minority youth were negatively affected by closure, in terms of its impact on both educational attainment and written Mandarin facility. However, significant penalties accruing to minority youth occurred only in the poorest villages. Penalties were generally heavier for girls, but in the most ethnically segregated minority villages, boys from minority families were highly vulnerable to closure effects on attainment and written Mandarin facility. Results show that intersections of minority status, gender, and community characteristics can delineate significant heterogeneities in policy impacts.

preprint · 2022 · arXiv

Graph Convolution for Re-ranking in Person Re-identification

Nowadays, deep learning is widely applied to extract features for similarity computation in person re-identification (re-ID) and has achieved great success. However, because training and testing IDs do not overlap, the difference between the training data and the testing data degrades the performance of the learned features during testing. Hence, re-ranking has been proposed to mitigate this issue, and various algorithms have been developed. However, most existing re-ranking methods focus on replacing the Euclidean distance with sophisticated distance metrics, which are not friendly to downstream tasks and are hard to use for fast retrieval of massive data in real applications. In this work, we propose a graph-based re-ranking method to improve learned features while still keeping Euclidean distance as the similarity metric. Inspired by graph convolution networks, we develop an operator to propagate features over an appropriate graph. Since the graph is key to the propagation, two important criteria are considered for designing the graph, and three different graphs are explored accordingly. Furthermore, a simple yet effective method is proposed to generate a profile vector for each tracklet in videos, which helps extend our method to video re-ID. Extensive experiments on three benchmark data sets, i.e., Market-1501, Duke, and MARS, demonstrate the effectiveness of our proposed approach.
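The core idea, propagating features over a graph and then still ranking with plain Euclidean distance, can be sketched as follows. The k-nearest-neighbor graph, the blending weight `alpha`, and the single propagation step are illustrative assumptions; the abstract does not specify the three graphs the authors actually explore.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_graph(features, k):
    """For each sample, the indices of its k nearest other samples."""
    neighbors = []
    for i, fi in enumerate(features):
        order = sorted((j for j in range(len(features)) if j != i),
                       key=lambda j: euclidean(fi, features[j]))
        neighbors.append(order[:k])
    return neighbors

def propagate(features, neighbors, alpha=0.5):
    """One propagation step: blend each feature with the mean of its neighbors."""
    out = []
    for fi, nbrs in zip(features, neighbors):
        dim = len(fi)
        mean = [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        out.append([alpha * fi[d] + (1 - alpha) * mean[d] for d in range(dim)])
    return out

# Hypothetical 2D features: two identities with three samples each.
feats = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
         [5.0, 5.0], [5.2, 5.0], [5.0, 5.2]]
refined = propagate(feats, knn_graph(feats, k=2))

print(euclidean(feats[0], feats[1]), euclidean(refined[0], refined[1]))
print(euclidean(feats[0], feats[3]), euclidean(refined[0], refined[3]))
```

Because the output is still an ordinary feature vector, the refined features can be dropped into any Euclidean nearest-neighbor index, which is the practical advantage the abstract emphasizes over custom distance metrics.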

preprint · 2022 · arXiv

HCFRec: Hash Collaborative Filtering via Normalized Flow with Structural Consensus for Efficient Recommendation

The ever-increasing scale of user-item interaction data makes it challenging to build an effective and efficient recommender system. Recently, hash-based collaborative filtering (Hash-CF) approaches employ the efficient Hamming distance between learned binary representations of users and items to accelerate recommendations. However, Hash-CF often faces two challenging problems, i.e., optimization on discrete representations and preserving semantic information in learned representations. To address these two challenges, we propose HCFRec, a novel Hash-CF approach for effective and efficient recommendations. Specifically, HCFRec not only introduces normalized flow to learn the optimal hash code by efficiently fitting a proposed approximate mixture multivariate normal distribution, a continuous but approximately discrete distribution, but also deploys a cluster consistency preserving mechanism to preserve the semantic structure in representations for more accurate recommendations. Extensive experiments conducted on six real-world datasets demonstrate the superiority of HCFRec over state-of-the-art methods in terms of effectiveness and efficiency.
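The efficiency argument for Hash-CF rests on how cheap Hamming distance is to compute. The sketch below shows only that retrieval step, with binary codes stored as integers. How the codes are learned (the normalized-flow part of HCFRec) is not shown, and the codes, item names, and `recommend` helper are hypothetical.

```python
def hamming_distance(code_a: int, code_b: int) -> int:
    """Number of differing bits between two binary codes stored as ints."""
    return bin(code_a ^ code_b).count("1")

def recommend(user_code, item_codes, top_k):
    """Rank items by Hamming distance to the user's code (smaller is closer)."""
    ranked = sorted(item_codes,
                    key=lambda item: hamming_distance(user_code, item_codes[item]))
    return ranked[:top_k]

# Hypothetical 8-bit codes for one user and three items.
user = 0b10110010
items = {
    "item_a": 0b10110011,  # differs from the user code in 1 bit
    "item_b": 0b01001101,  # differs in all 8 bits
    "item_c": 0b10100110,  # differs in 2 bits
}
print(recommend(user, items, top_k=2))  # ['item_a', 'item_c']
```

An XOR followed by a bit count replaces the floating-point dot product used with real-valued embeddings, which is where the speed and memory savings come from.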

preprint · 2022 · arXiv

HelixADMET: a robust and endpoint extensible ADMET system incorporating self-supervised knowledge transfer

Accurate ADMET (an abbreviation for "absorption, distribution, metabolism, excretion, and toxicity") predictions can efficiently screen out undesirable drug candidates in the early stage of drug discovery. In recent years, multiple comprehensive ADMET systems that adopt advanced machine learning models have been developed, providing services to estimate multiple endpoints. However, those ADMET systems usually suffer from weak extrapolation ability. First, due to the lack of labelled data for each endpoint, typical machine learning models perform poorly on molecules with unobserved scaffolds. Second, most systems only provide fixed built-in endpoints and cannot be customised to satisfy various research requirements. To this end, we develop a robust and endpoint extensible ADMET system, HelixADMET (H-ADMET). H-ADMET incorporates the concept of self-supervised learning to produce a robust pre-trained model. The model is then fine-tuned with a multi-task and multi-stage framework to transfer knowledge between ADMET endpoints, auxiliary tasks, and self-supervised tasks. Our results demonstrate that H-ADMET achieves an overall improvement of 4% compared with existing ADMET systems on comparable endpoints. Additionally, the pre-trained model provided by H-ADMET can be fine-tuned to generate new and customised ADMET endpoints, meeting the various demands of drug research and development.

preprint · 2022 · arXiv

HelixFold: An Efficient Implementation of AlphaFold2 using PaddlePaddle

Accurate protein structure prediction can significantly accelerate the development of life science. The accuracy of AlphaFold2, a frontier end-to-end structure prediction system, is already close to that of the experimental determination techniques. Due to the complex model architecture and large memory consumption, it requires lots of computational resources and time to implement the training and inference of AlphaFold2 from scratch. The cost of running the original AlphaFold2 is expensive for most individuals and institutions. Therefore, reducing this cost could accelerate the development of life science. We implement AlphaFold2 using PaddlePaddle, namely HelixFold, to improve training and inference speed and reduce memory consumption. The performance is improved by operator fusion, tensor fusion, and hybrid parallelism computation, while the memory is optimized through Recompute, BFloat16, and memory read/write in-place. Compared with the original AlphaFold2 (implemented with Jax) and OpenFold (implemented with PyTorch), HelixFold needs only 7.5 days to complete the full end-to-end training and only 5.3 days when using hybrid parallelism, while both AlphaFold2 and OpenFold take about 11 days. HelixFold saves 1x training time. We verified that HelixFold's accuracy could be on par with AlphaFold2 on the CASP14 and CAMEO datasets. HelixFold's code is available on GitHub for free download: https://github.com/PaddlePaddle/PaddleHelix/tree/dev/apps/protein_folding/helixfold, and we also provide stable web services on https://paddlehelix.baidu.com/app/drug/protein/forecast.

preprint2022arXiv

Image-to-Video Re-Identification via Mutual Discriminative Knowledge Transfer

The gap in representations between image and video makes Image-to-Video Re-identification (I2V Re-ID) challenging, and recent works formulate this problem as a knowledge distillation (KD) process. In this paper, we propose a mutual discriminative knowledge distillation framework to transfer a richer video-based representation to an image-based representation more effectively. Specifically, we propose the triplet contrast loss (TCL), a novel loss designed for KD. During the KD process, the TCL loss transfers the local structure, exploits the higher-order information, and mitigates the misalignment of the heterogeneous outputs of the teacher and student networks. Compared with other losses for KD, the proposed TCL loss selectively transfers the local discriminative features from teacher to student, making it effective for Re-ID. Besides the TCL loss, we adopt mutual learning to regularize the training of both the teacher and student networks. Extensive experiments demonstrate the effectiveness of our method on the MARS, DukeMTMC-VideoReID and VeRi-776 benchmarks.

preprint2022arXiv

Information retrieval for label noise document ranking by bag sampling and group-wise loss

Long document retrieval (DR) has always been a tremendous challenge for reading comprehension and information retrieval. In recent years, pre-trained models have achieved good results in the retrieval and ranking stages for long documents. However, some crucial problems remain in long document ranking, such as label noise, long document representation, and unbalanced negative sampling. To reduce the noise in labeled data and to sample negatives from the retrieved long documents in a reasonable way, we propose the bag sampling method and the group-wise Localized Contrastive Estimation (LCE) method. We encode each long document using its head, middle, and tail passages, and in the retrieval stage we use dense retrieval to generate the candidate data. At the ranking stage, the retrieved data is divided into multiple bags, and negative samples are selected in each bag. After sampling, two losses are combined. The first loss is LCE. To fit bag sampling well, after the query and document are encoded, the global features of each group are extracted by a convolutional layer and max-pooling to improve the model's resistance to labeling noise; finally, the group-wise LCE loss is calculated. Notably, our model shows excellent performance on the MS MARCO long document ranking leaderboard.
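The bag sampling step described in this abstract can be illustrated with a minimal sketch. The abstract does not say how bags are formed or how many negatives are drawn per bag, so the contiguous split of the ranked candidate list, the `per_bag` parameter, and the function name `bag_sample_negatives` are all assumptions made for illustration only, not the authors' implementation.

```python
import random


def bag_sample_negatives(ranked_candidates, positive_ids, num_bags, per_bag=1, seed=0):
    """Split ranked candidates into contiguous bags and draw negatives from each.

    Hypothetical helper: contiguous bags and uniform sampling inside each bag
    are assumptions, since the abstract does not specify these details.
    """
    rng = random.Random(seed)
    # Keep only documents that are not labeled as positives for the query.
    pool = [doc for doc in ranked_candidates if doc not in positive_ids]
    # Ceiling division so that at most `num_bags` bags are produced.
    bag_size = max(1, -(-len(pool) // num_bags))
    bags = [pool[i:i + bag_size] for i in range(0, len(pool), bag_size)]
    sampled = []
    for bag in bags:
        # Draw up to `per_bag` negatives from every bag, covering all rank ranges.
        sampled.extend(rng.sample(bag, min(per_bag, len(bag))))
    return bags, sampled


candidates = [f"doc{i}" for i in range(10)]  # stand-in for dense retrieval output
bags, negatives = bag_sample_negatives(candidates, {"doc0"}, num_bags=3)
```

Drawing from every bag, rather than only from the top of the ranking, spreads negatives across easy and hard rank ranges, which is one plausible reading of sampling "reasonably" under label noise.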

preprint2022arXiv

Lightweight Object-level Topological Semantic Mapping and Long-term Global Localization based on Graph Matching

Mapping and localization are two essential tasks for mobile robots in real-world applications. However, large-scale and dynamic scenes challenge the accuracy and robustness of most current mature solutions. This situation becomes even worse when computational resources are limited. In this paper, we present a novel lightweight object-level mapping and localization method with high accuracy and robustness. Different from previous methods, our method does not need a pre-constructed precise geometric map, which greatly relieves the storage burden, especially for large-scale navigation. We use object-level features with both semantic and geometric information to model landmarks in the environment. In particular, a learning topological primitive is first proposed to efficiently obtain and organize the object-level landmarks. On this basis, we use a robot-centric mapping framework to represent the environment as a semantic topology graph while relaxing the burden of maintaining global consistency. In addition, a hierarchical memory management mechanism is introduced to improve the efficiency of online mapping with limited computational resources. Based on the proposed map, robust localization is achieved by constructing a novel local semantic scene graph descriptor and performing multi-constraint graph matching to compare scene similarity. Finally, we test our method on a low-cost embedded platform to demonstrate its advantages. Experimental results on a large-scale, multi-session real-world environment show that the proposed method outperforms the state of the art in terms of being lightweight and robust.

preprint2022arXiv

Network-ELAA Beamforming and Coverage Analysis for eMBB/URLLC in Spatially Non-Stationary Rician Channels

In vehicle-to-infrastructure (V2I) networks, a cluster of multi-antenna access points (APs) can collaboratively conduct transmitter beamforming to provide data services (e.g., eMBB or URLLC). The collaboration between APs effectively forms a networked linear antenna-array with extra-large aperture (i.e., network-ELAA), where the wireless channel exhibits spatial non-stationarity. The major contribution of this work lies in the analysis of beamforming gain and radio coverage for network-ELAA non-stationary Rician channels considering the AP clustering. Assuming that 1) the total transmit power is fixed and evenly distributed over APs, and 2) the beam is formed based only on the line-of-sight (LoS) path, it is found that the beamforming gain is concave with respect to the cluster size. The optimum size of the AP cluster varies with the user's location, the channel uncertainty, and the data service. A user located farther from the ELAA requires a larger cluster size. URLLC is more sensitive to the channel uncertainty than eMBB, thus requiring a larger cluster size to mitigate the channel fading effect and extend the coverage. Finally, it is shown that the network-ELAA can offer significant coverage extension (50% or more in most cases) compared with the single-AP scenario.

preprint2022arXiv

Optimal allocations to heterogeneous agents with an application to stimulus checks

A planner allocates discrete transfers of size $D_g$ to $N$ heterogeneous groups labeled $g$ and has CES preferences over the resulting outcomes, $H_g(D_g)$. We derive a closed-form solution for optimally allocating a fixed budget subject to group-specific inequality constraints under the assumption that increments in the $H_g$ functions are non-increasing. We illustrate our method by studying allocations of "support checks" from the U.S. government to households during both the Great Recession and the COVID-19 pandemic. We compare the actual allocations to optimal ones under alternative constraints, assuming the government focused on stimulating aggregate consumption during the 2008--2009 crisis and focused on welfare during the 2020--2021 crisis. The inputs for this analysis are obtained from versions of a life-cycle model with heterogeneous households, which predicts household-type-specific consumption and welfare responses to tax rebates and cash transfers.

preprint2022arXiv

Pointwise Characterization of Besov and Triebel-Lizorkin Spaces on Spaces of Homogeneous Type

In this article, the authors establish the pointwise characterization of Besov and Triebel-Lizorkin spaces on spaces of homogeneous type via clarifying the relationship among Hajłasz-Sobolev spaces, Hajłasz-Besov and Hajłasz-Triebel-Lizorkin spaces, grand Besov and Triebel-Lizorkin spaces, and Besov and Triebel-Lizorkin spaces. A major novelty of this article is that all results presented in this article get rid of both the dependence on the reverse doubling condition of the measure and the metric condition of the quasi-metric under consideration. Moreover, the pointwise characterization of the inhomogeneous version is new even when the underlying space is an RD-space.

preprint2022arXiv

Riesz Transform Characterization of Hardy Spaces Associated with Ball Quasi-Banach Function Spaces

Let $X$ be a ball quasi-Banach function space satisfying some mild assumptions and $H_X(\mathbb{R}^n)$ the Hardy space associated with $X$. In this article, the authors introduce both the Hardy space $H_X(\mathbb{R}^{n+1}_+)$ of harmonic functions and the Hardy space $\mathbb{H}_X(\mathbb{R}^{n+1}_+)$ of harmonic vectors, associated with $X$, and then establish the isomorphisms among $H_X(\mathbb{R}^n)$, $H_{X,2}(\mathbb{R}^{n+1}_+)$, and $\mathbb{H}_{X,2}(\mathbb{R}^{n+1}_+)$, where $H_{X,2}(\mathbb{R}^{n+1}_+)$ and $\mathbb{H}_{X,2}(\mathbb{R}^{n+1}_+)$ are, respectively, certain subspaces of $H_X(\mathbb{R}^{n+1}_+)$ and $\mathbb{H}_X(\mathbb{R}^{n+1}_+)$. Using these isomorphisms, the authors establish the first order Riesz transform characterization of $H_X(\mathbb{R}^n)$. The higher order Riesz transform characterization of $H_X(\mathbb{R}^n)$ is also obtained. The results obtained in this article have a wide range of generality and can be applied to the classical Hardy space, the weighted Hardy space, the Herz-Hardy space, the Lorentz-Hardy space, the variable Hardy space, the mixed-norm Hardy space, the local generalized Herz-Hardy space, and the mixed-norm Herz-Hardy space.

preprint2022arXiv

Roadmap for Optical Tweezers

Optical tweezers are tools made of light that enable contactless pushing, trapping, and manipulation of objects ranging from atoms to space light sails. Since the pioneering work by Arthur Ashkin in the 1970s, optical tweezers have evolved into sophisticated instruments and have been employed in a broad range of applications in life sciences, physics, and engineering. These include accurate force and torque measurement at the femtonewton level, microrheology of complex fluids, single micro- and nanoparticle spectroscopy, single-cell analysis, and statistical-physics experiments. This roadmap provides insights into current investigations involving optical forces and optical tweezers from their theoretical foundations to designs and setups. It also offers perspectives for applications to a wide range of research fields, from biophysics to space exploration.

preprint2022arXiv

Same environment, stratified impacts? Air pollution, extreme temperatures, and birth weight in south China

This paper investigates whether associations between birth weight and prenatal ambient environmental conditions--pollution and extreme temperatures--differ by 1) maternal education; 2) children's innate health; and 3) interactions between these two. We link birth records from Guangzhou, China, during a period of high pollution, to ambient air pollution (PM10 and a composite measure) and extreme temperature data. We first use mean regressions to test whether, overall, maternal education is an "effect modifier" in the relationships between ambient air pollution, extreme temperature, and birth weight. We then use conditional quantile regressions to test for effect heterogeneity according to the unobserved innate vulnerability of babies after conditioning on other confounders. Results show that 1) the negative association between ambient exposures and birth weight is twice as large at lower conditional quantiles of birth weights as at the median; 2) the protection associated with college-educated mothers with respect to pollution and extreme heat is heterogeneous and potentially substantial: between 0.02 and 0.34 standard deviations of birth weights, depending on the conditional quantiles; 3) this protection is amplified under more extreme ambient conditions and for infants with greater unobserved innate vulnerabilities.

preprint2022arXiv

Scaled ReLU Matters for Training Vision Transformers

Vision transformers (ViTs) have been an alternative design paradigm to convolutional neural networks (CNNs). However, training ViTs is much harder than training CNNs, as it is sensitive to training parameters such as the learning rate, optimizer and warmup epochs. The reasons for this training difficulty are empirically analysed by Xiao et al. (2021), who conjecture that the issue lies with the patchify-stem of ViT models and propose that early convolutions help transformers see better. In this paper, we further investigate this problem and extend the above conclusion: early convolutions alone do not help stabilize training; rather, the scaled ReLU operation in the convolutional stem (conv-stem) is what matters. We verify, both theoretically and empirically, that scaled ReLU in the conv-stem not only improves training stability but also increases the diversity of patch tokens, thus boosting peak performance by a large margin while adding few parameters and FLOPs. In addition, extensive experiments are conducted to demonstrate that previous ViTs are far from being well trained, further showing that ViTs have great potential to be a better substitute for CNNs.

preprint2022arXiv

Structural randomised selection

An important problem in the analysis of high-dimensional omics data is to identify subsets of molecular variables that are associated with a phenotype of interest. This requires addressing the challenges of high dimensionality, strong multicollinearity and model uncertainty. We propose a new ensemble learning approach for improving the performance of sparse penalised regression methods, called STructural RANDomised Selection (STRANDS). The approach, which builds and improves upon the Random Lasso method, consists of two steps. In both steps, we reduce dimensionality by repeated subsampling of variables. We apply a penalised regression method to each subsampled dataset and average the results. In the first step, subsampling is informed by variable correlation structure, and in the second step, by variable importance measures from the first step. STRANDS can be used with any sparse penalised regression approach as the "base learner". Using synthetic data and real biological datasets, we demonstrate that STRANDS typically improves upon its base learner, and that taking account of the correlation structure in the first step can help to improve the efficiency with which the model space may be explored.

preprint2022arXiv

TCR: A Transformer Based Deep Network for Predicting Cancer Drugs Response

Predicting clinical outcomes of anti-cancer drugs on a personalized basis is challenging in cancer treatment due to the heterogeneity of tumors. Traditional computational efforts have been made to model drug response on individual samples depicted by their molecular profiles, yet overfitting occurs because of the high dimensionality of omics data, hindering models from clinical application. Recent research shows that deep learning is a promising approach to building drug response models by learning alignment patterns between drugs and samples. However, existing studies employed a simple feature fusion strategy and only considered the drug features as a whole representation while ignoring the substructure information that may play a vital role when aligning drugs and genes. In this paper, we propose TCR (Transformer based network for Cancer drug Response) to predict anti-cancer drug response. By utilizing an attention mechanism, TCR is able to efficiently learn the interactions between drug atoms/substructures and molecular signatures. Furthermore, a dual loss function and a cross sampling strategy were designed to improve the prediction power of TCR. We show that TCR outperformed all other methods under various data splitting strategies on all evaluation metrics (some with significant improvement). Extensive experiments demonstrate that TCR shows significantly improved generalization ability on independent in-vitro experiments and in-vivo real patient data. Our study highlights the prediction power of TCR and its potential value for cancer drug repurposing and precision oncology treatment.

preprint2022arXiv

Towards Boosting the Open-Domain Chatbot with Human Feedback

Many open-domain dialogue models pre-trained with social media comments can generate coherent replies but have difficulties producing engaging responses when interacting with real users. This phenomenon might mainly result from the deficiency of annotated human-human conversations and the misalignment with human preference. In this paper, we propose a novel and efficient approach Diamante to boost the open-domain chatbot, where two kinds of human feedback (including explicit demonstration and implicit preference) are collected and leveraged. By asking annotators to select or amend the model-generated candidate responses, Diamante efficiently collects the human demonstrated responses and constructs a Chinese chit-chat dataset. To enhance the alignment with human preference, Diamante leverages the implicit preference in the data collection process and introduces the generation-evaluation joint training. Comprehensive experiments indicate that the Diamante dataset and joint training paradigm can significantly boost the performance of Chinese pre-trained dialogue models.

preprint2022arXiv

You are what your parents expect: Height and local reference points

Recent estimates are that about 150 million children under five years of age are stunted, with substantial negative consequences for their schooling, cognitive skills, health, and economic productivity. Therefore, understanding what determines such growth retardation is significant for designing public policies that aim to address this issue. We build a model for nutritional choices and health with reference-dependent preferences. Parents care about the health of their children relative to some reference population. In our empirical model, we use height as the health outcome that parents target. Reference height is an equilibrium object determined by earlier cohorts' parents' nutritional choices in the same village. We explore the exogenous variation in reference height produced by a protein-supplementation experiment in Guatemala to estimate our model's parameters. We use our model to decompose the impact of the protein intervention on height into price and reference-point effects. We find that the changes in reference points account for 65% of the height difference between two-year-old children in experimental and control villages in the sixth annual cohort born after the initiation of the intervention.

preprint2021arXiv

1st Place Solution to ECCV-TAO-2020: Detect and Represent Any Object for Tracking

We extend the classical tracking-by-detection paradigm to the tracking-any-object task. Solid detection results are first extracted from the TAO dataset. Some state-of-the-art techniques such as BAlanced-Group Softmax (BAGS; Li et al., 2020) and DetectoRS (Qiao et al., 2020) are integrated during detection. We then learn appearance features to represent any object by training feature learning networks. We ensemble several models to improve detection and feature representation. Simple linking strategies based on the most similar appearance features and a tracklet-level post-association module are finally applied to generate the final tracking results. Our method is submitted as AOA on the challenge website. Code is available at https://github.com/feiaxyt/Winner_ECCV20_TAO.

preprint2021arXiv

Difference Characterization of Besov and Triebel-Lizorkin Spaces on Spaces of Homogeneous Type

In this article, the authors introduce the spaces of Lipschitz type on spaces of homogeneous type in the sense of Coifman and Weiss, and discuss their relations with Besov and Triebel-Lizorkin spaces. As an application, the authors establish the difference characterization of Besov and Triebel-Lizorkin spaces on spaces of homogeneous type. A major novelty of this article is that all results presented in this article get rid of the dependence on the reverse doubling assumption of the considered measure of the underlying space ${\mathcal X}$ via using the geometrical property of ${\mathcal X}$ expressed by its dyadic reference points, dyadic cubes, and the (local) lower bound. Moreover, some results when $p\le 1$ but near to $1$ are new even when ${\mathcal X}$ is an RD-space.

preprint2021arXiv

Intelligent Electric Vehicle Charging Recommendation Based on Multi-Agent Reinforcement Learning

Electric vehicles (EVs) have become a preferable choice in the modern transportation system due to their environmental and energy sustainability. However, in many large cities, EV drivers often fail to find proper spots for charging because of the limited charging infrastructure and the spatiotemporally unbalanced charging demands. The recent emergence of deep reinforcement learning provides great potential to improve the charging experience from various aspects over a long-term horizon. In this paper, we propose a framework, named Multi-Agent Spatio-Temporal Reinforcement Learning (Master), for intelligently recommending publicly accessible charging stations by jointly considering various long-term spatiotemporal factors. Specifically, by regarding each charging station as an individual agent, we formulate this problem as a multi-objective multi-agent reinforcement learning task. We first develop a multi-agent actor-critic framework with a centralized attentive critic to coordinate the recommendation between geo-distributed agents. Moreover, to quantify the influence of future potential charging competition, we introduce a delayed access strategy to exploit the knowledge of future charging competition during training. After that, to effectively optimize multiple learning objectives, we extend the centralized attentive critic to multi-critics and develop a dynamic gradient re-weighting strategy to adaptively guide the optimization direction. Finally, extensive experiments on two real-world datasets demonstrate that Master achieves the best comprehensive performance compared with nine baseline approaches.

preprint2021arXiv

Learning to Select External Knowledge with Multi-Scale Negative Sampling

The Track-1 of DSTC9 aims to effectively answer user requests or questions during task-oriented dialogues, which are out of the scope of APIs/DB. By leveraging external knowledge resources, relevant information can be retrieved and encoded into the response generation for these out-of-API-coverage queries. In this work, we have explored several advanced techniques to enhance the utilization of external knowledge and boost the quality of response generation, including schema guided knowledge decision, negatives enhanced knowledge selection, and knowledge grounded response generation. To evaluate the performance of our proposed method, comprehensive experiments have been carried out on the publicly available dataset. Our approach was ranked as the best in human evaluation of DSTC9 Track-1.

preprint2021arXiv

Multi-object Tracking with a Hierarchical Single-branch Network

Recent Multiple Object Tracking (MOT) methods have gradually attempted to integrate object detection and instance re-identification (Re-ID) into a unified network to form a one-stage solution. Typically, these methods use two separate branches within a single network to accomplish detection and Re-ID respectively without studying the inter-relationship between them, which inevitably impedes the tracking performance. In this paper, we propose an online multi-object tracking framework based on a hierarchical single-branch network to solve this problem. Specifically, the proposed single-branch network utilizes an improved Hierarchical Online Instance Matching (iHOIM) loss to explicitly model the inter-relationship between object detection and Re-ID. Our novel iHOIM loss function unifies the objectives of the two sub-tasks and encourages better detection performance and feature learning even in extremely crowded scenes. Moreover, we propose to introduce the object positions, predicted by a motion model, as region proposals for subsequent object detection, where the intuition is that detection results and motion predictions can complement each other in different scenarios. Experimental results on the MOT16 and MOT20 datasets show that we can achieve state-of-the-art tracking performance, and the ablation study verifies the effectiveness of each proposed component.

preprint2021arXiv

Multiplexed structured illumination super-resolution imaging with time-domain upconversion nanoparticles

Emerging optical multiplexing at the nanoscale shows super-capacity in encoding information by using the time-domain fingerprints from uniform nanoparticles. However, the optical diffraction limit compromises the decoding throughput and accuracy of the nanoparticles during wide-field imaging. This, in turn, challenges the ability of the nanoparticles to withstand the modulated excitation condition and to retain the multiplexed optical fingerprints for super-resolution multiplexing. Here we report a tailor-made time-domain super-resolution method with lifetime-engineered upconversion nanoparticles for multiplexing. We demonstrate that the nanoparticles are bright, uniform, and stable under structured illumination, which supports a lateral resolution of 186 nm, less than 1/4th of the excitation wavelength. We further develop a deep learning algorithm to coordinate with super-resolution images for more accurate decoding compared to a numeric algorithm. We demonstrate three-channel sub-diffraction-limit imaging-based optical multiplexing with decoding accuracies above 93% for each channel, and accuracies larger than 60% for potential seven-channel multiplexing. The improved resolution provides high throughput by resolving the particles within the optical limit, which enables higher multiplexing capacity in space. This time-domain super-resolution multiplexing opens a new horizon for handling the growing amount of information content, disease sources, and security risks in modern society.

preprint2021arXiv

The Stellar "Snake" I: Whole Structure and Properties

To complement our previous discovery of the young snake-like structure in the solar neighborhood and reveal the structure's full extent, we build two samples of stars within the Snake and its surrounding territory from Gaia EDR3. With the friends-of-friends algorithm, we identify 2694 and 9615 Snake member candidates from the two samples. Thirteen open clusters are embedded in these member candidates. By combining the spectroscopic data from multiple surveys, we investigate the comprehensive properties of the candidates and find that they are very likely to belong to one sizable structure, since most of the components are well bridged in their spatial distributions, and follow a single stellar population with an age of $30-40$ Myr and solar metallicity. This sizable structure is best explained as hierarchically primordial, and probably formed from a filamentary giant molecular cloud with unique formation history in localized regions. To analyze the dynamics of the Snake, we divide the structure into five groups according to their tangential velocities; we find that the groups are expanding at a coherent rate ($κ_X\sim3.0\,\times10^{-2}\,\rm km\,s^{-1}\,pc^{-1}$) along the length of the structure ($X$-direction). The corresponding expansion age ($τ\sim33$ Myr) is highly consistent with the age of the Snake. With over ten thousand member stars, the Snake is an ideal laboratory to study nearby coeval stellar formation, stellar physics, and environmental evolution over a large spatial extent.
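The link between the quoted expansion rate and the expansion age can be checked with a short unit conversion. This sketch assumes the common convention that the expansion age is the inverse of the linear expansion rate, $τ = 1/κ_X$; the abstract does not state the formula, and the function name is hypothetical.

```python
KM_PER_PARSEC = 3.0857e13    # kilometres in one parsec
SECONDS_PER_MYR = 3.15576e13  # seconds in 10^6 Julian years


def expansion_age_myr(kappa_km_s_pc):
    """Return 1/kappa in Myr for an expansion rate given in km/s/pc.

    Assumes tau = 1/kappa, a convention not spelled out in the abstract.
    """
    # 1/kappa has units of pc s / km; convert parsecs to km to obtain seconds.
    age_seconds = KM_PER_PARSEC / kappa_km_s_pc
    return age_seconds / SECONDS_PER_MYR


age = expansion_age_myr(3.0e-2)
print(f"expansion age: {age:.1f} Myr")
```

With the quoted rate this gives roughly 32.6 Myr, in line with the stated value of about 33 Myr and with the $30-40$ Myr stellar age.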

preprint2020arXiv

A Quantitative Analytical Model for Predicting and Optimizing the Rate Performance of Battery Cells

An important objective of designing lithium-ion rechargeable battery cells is to maximize their rate performance without compromising the energy density, which is mainly achieved through computationally expensive numerical simulations at present. Here we present a simple analytical model for predicting the rate performance of battery cells limited by electrolyte transport without any fitting parameters. It exhibits very good agreement with simulations over a wide range of discharge rate and electrode thickness and offers a speedup of >10$^5$ times. The optimal electrode properties predicted by the model are of less than 10% difference from simulation results, suggesting it as an attractive computational tool for the cell-level battery architecture design. The model also offers important insights on practical ways to improve the rate performance of thick electrodes, including avoiding electrode materials such as LiFePO$_4$ and Li$_4$Ti$_5$O$_{12}$ whose open-circuit potentials are insensitive to the state of charge and utilizing lithium metal anode to synergistically accelerate electrolyte transport within thick cathodes.

preprint2020arXiv

Anomalous bremsstrahlung and the structure of cosmic ray electron-positron fluxes at the GeV-TeV energy range

We reveal that the energy spectra of electrons and positrons in primary cosmic rays measured at the top of the atmosphere have a double structure: an excess component $Φ^s_{e^+}(E)=Φ^s_{e^-}(E)$ around $400$ GeV, which originates from a strong $e^+e^-$ source, and the distorted background $Φ^0_{e^-}(E)$. We suppose that the difference between the AMS-CALET and Fermi-LAT-DAMPE data originates from the energy loss of the fluxes due to the anomalous bremsstrahlung effect in a special window. The evolution of the spectra under the anomalous bremsstrahlung effect satisfies an improved electromagnetic cascade equation. The above spectra are parameterized and can be regarded as subjects for exploring new physics. We suggest re-examining previous applications of the Bethe-Heitler formula in the study of the propagation of high-energy electrons and photons.

preprint2020arXiv

Are $N\bar{Ω}$ bound states?

Inspired by the progress of the experimental search for the $NΩ$ dibaryon by the STAR collaboration, we study $N\bar{Ω}$ systems in the framework of the quark delocalization color screening model. Our results show that the attraction between $N$ and $\bar{Ω}$ is slightly larger than that between $N$ and $Ω$, which indicates that the $N\bar{Ω}$ system is more likely than the $NΩ$ system to form bound states. The dynamic calculations show that both the $J^{P}=1^{+}$ and $2^{+}$ $N\bar{Ω}$ systems are bound states. These two states are more deeply bound than the $NΩ$ system with $J^{P}=2^{+}$, and the $NΩ$ system with $J^{P}=1^{+}$ is unbound. The calculation of the low-energy scattering phase shifts, scattering length and effective range also supports the existence of the $N\bar{Ω}$ bound states with $J^{P}=1^{+}$ and $2^{+}$. The $N\bar{Ω}$ states are therefore better hexaquark candidates, and stronger signals are expected in experiments.

preprint2020arXiv

Efficient network immunization under limited knowledge

Targeted immunization or attacks of large-scale networks have attracted significant attention from the scientific community. However, in real-world scenarios, knowledge and observations of the network may be limited, thereby precluding a full assessment of the optimal nodes to immunize (or remove) in order to avoid epidemic spreading such as that of the current COVID-19 epidemic. Here, we study a novel immunization strategy where only $n$ nodes are observed at a time and the most central among these $n$ nodes is immunized (or attacked). This process is repeated until a $1-p$ fraction of nodes is immunized (or attacked). We develop an analytical framework for this approach and determine the critical percolation threshold $p_c$ and the size of the giant component $P_{\infty}$ for networks with arbitrary degree distributions $P(k)$. In the limit of $n\to\infty$ we recover prior work on targeted attack, whereas for $n=1$ we recover the known case of random failure. Between these two extremes, we observe that as $n$ increases, $p_c$ increases quickly towards its optimal value under targeted immunization (attack) with complete information. In particular, we find a new scaling relationship between $|p_c(\infty)-p_c(n)|$ and $n$ as $|p_c(\infty)-p_c(n)|\sim n^{-1}\exp(-αn)$. For scale-free (SF) networks, where $P(k)\sim k^{-γ}, 2<γ<3$, we find that $p_c$ has a transition from zero to non-zero when $n$ increases from $n=1$ to the order of $\log N$ ($N$ is the size of the network). Thus, for SF networks, observing on the order of $\log N$ nodes and immunizing accordingly can dramatically reduce an epidemic.
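The strategy described in this abstract can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the authors' analytical framework: node degree in the original graph stands in for "most central", the graph is a plain adjacency dictionary, and the function names `limited_knowledge_immunize` and `giant_component_size` are hypothetical. The argument `num_to_remove` plays the role of $(1-p)N$.

```python
import random
from collections import deque


def limited_knowledge_immunize(adj, n_observed, num_to_remove, seed=0):
    """Repeatedly observe `n_observed` random nodes and remove the most central one.

    Assumption: centrality is the node's degree in the original graph.
    """
    rng = random.Random(seed)
    remaining = set(adj)
    removed = []
    for _ in range(min(num_to_remove, len(adj))):
        # Observe a random subset of the nodes that are still present.
        observed = rng.sample(sorted(remaining), min(n_observed, len(remaining)))
        # Immunize (remove) the highest-degree node among those observed.
        target = max(observed, key=lambda node: len(adj[node]))
        remaining.remove(target)
        removed.append(target)
    return removed, remaining


def giant_component_size(adj, alive):
    """Size of the largest connected component among the surviving nodes."""
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search restricted to surviving nodes
            node = queue.popleft()
            size += 1
            for neighbour in adj[node]:
                if neighbour in alive and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best


# Star graph: hub 0 connected to leaves 1..5.
star = {0: [1, 2, 3, 4, 5], **{leaf: [0] for leaf in range(1, 6)}}
removed, alive = limited_knowledge_immunize(star, n_observed=6, num_to_remove=1)
print(removed, giant_component_size(star, alive))  # prints "[0] 1"
```

Observing all six nodes reproduces a fully informed targeted attack (the hub is removed and every leaf is isolated), while `n_observed=1` reduces to random removal, mirroring the two limits named in the abstract.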

preprint2020arXiv

Heterochromatic nonlinear optical responses in upconversion nanoparticles for point spread function engineering

Point spread function (PSF) engineering of the emitter can encode higher spatial frequency information of an image to break the diffraction limit, but suffers from complex optical systems. Here we present a robust strategy to simultaneously achieve diverse PSFs from upconversion nanoparticles under a single doughnut-shaped scanning beam. By saturating the four-photon state, the high-frequency information can be extracted through the doughnut emission PSF. In contrast, the complementary lower-frequency information can be carried by the Gaussian-like emission PSF, as a result of over-saturation at the two-photon state. With the Fourier domain heterochromatic fusion, we verify the capability of the synthesised PSF to cover both low- and high-frequency information, yielding overall enhanced image quality. We show a spatial resolution of 40 nm, 1/24th of the excitation wavelength. This work suggests a new scope for developing nonlinear multi-colour emitting probes to improve image quality and noise control in nanoscopy.

preprint2020arXiv

High-dimensional regression in practice: an empirical study of finite-sample prediction, variable selection and ranking

Penalized likelihood approaches are widely used for high-dimensional regression. Although many methods have been proposed and the associated theory is now well-developed, the relative efficacy of different approaches in finite-sample settings, as encountered in practice, remains incompletely understood. There is therefore a need for empirical investigations in this area that can offer practical insight and guidance to users. In this paper we present a large-scale comparison of penalized regression methods. We distinguish between three related goals: prediction, variable selection and variable ranking. Our results span more than 2,300 data-generating scenarios, including both synthetic and semi-synthetic data (real covariates and simulated responses), allowing us to systematically consider the influence of various factors (sample size, dimensionality, sparsity, signal strength and multicollinearity). We consider several widely-used approaches (Lasso, Adaptive Lasso, Elastic Net, Ridge Regression, SCAD, the Dantzig Selector and Stability Selection). We find considerable variation in performance between methods. Our results support a 'no panacea' view, with no unambiguous winner across all scenarios or goals, even in this restricted setting where all data align well with the assumptions underlying the methods. The study allows us to make some recommendations as to which approaches may be most (or least) suitable given the goal and some data characteristics. Our empirical results complement existing theory and provide a resource to compare methods across a range of scenarios and metrics.

preprint2020arXiv

Hyperspectral City V1.0 Dataset and Benchmark

This document introduces the background and the usage of the Hyperspectral City Dataset and the benchmark. The document starts with the background and motivation of the dataset. Following that, we briefly describe the method of collecting the dataset and the processing from the raw dataset to the final released dataset, specifically version 1.0. We also provide the detailed usage of the dataset and the evaluation metric for submitting results to the 2019 Hyperspectral City Challenge.

preprint2020arXiv

Looking for the possible gluon condensation signature in sub-TeV gamma-ray spectra: from active galactic nuclei to gamma ray bursts

Gluon condensation in the proton, as a dynamical model, is used to treat a series of unsolved puzzles in sub-TeV gamma-ray spectra, including the broken power law of blazar radiation, the hardening confusion of 1ES 1426+428 and Mkn 501, and the recently recorded sub-TeV gamma spectra of GRB 180720B and GRB 190114C. We find that the above anomalous phenomena in gamma-ray energy spectra can be understood with a simple broken power law based on a QCD gluon condensation effect.

preprint2020arXiv

MBCAL: Sample Efficient and Variance Reduced Reinforcement Learning for Recommender Systems

In recommender systems such as news feed streams, it is essential to optimize the long-term utilities in the continuous user-system interaction processes. Previous works have proved the capability of reinforcement learning in this problem. However, there are many practical challenges in implementing deep reinforcement learning in online systems, including low sample efficiency, uncontrollable risks, and excessive variance. To address these issues, we propose a novel reinforcement learning method, namely model-based counterfactual advantage learning (MBCAL). The proposed method takes advantage of the characteristics of recommender systems and draws ideas from model-based reinforcement learning for higher sample efficiency. It has two components: an environment model that predicts the instant user behavior one-by-one in an auto-regressive form, and a future advantage model that predicts the future utility. To alleviate the impact of excessive variance when learning the future advantage model, we employ counterfactual comparisons derived from the environment model. As a consequence, the proposed method possesses high sample efficiency and significantly lower variance; it is also able to use existing user logs to avoid the risks of starting from scratch. Relative to its capability, its implementation cost is low, which fits well with practical systems. Theoretical analysis and elaborate experiments are presented. Results show that the proposed method outperforms the other supervised learning and RL-based methods in both sample efficiency and asymptotic performance.

preprint2020arXiv

Multi-Domain Learning and Identity Mining for Vehicle Re-Identification

This paper introduces our solution for Track2 of the AI City Challenge 2020 (AICITY20). Track2 is a vehicle re-identification (ReID) task with both real-world data and synthetic data. Our solution is based on a strong baseline with bag of tricks (BoT-BS) proposed in person ReID. First, we propose a multi-domain learning method that combines the real-world and synthetic data to train the model. Then, we propose the Identity Mining method to automatically generate pseudo labels for a part of the testing data, which is better than k-means clustering. The tracklet-level re-ranking strategy with weighted features is also used to post-process the results. Finally, with a multiple-model ensemble, our method achieves an mAP score of 0.7322, which yields third place in the competition. The code is available at https://github.com/heshuting555/AICITY2020_DMT_VehicleReID.

preprint2020arXiv

Neural Video Coding using Multiscale Motion Compensation and Spatiotemporal Context Model

Over the past two decades, traditional block-based video coding has made remarkable progress and spawned a series of well-known standards such as MPEG-4, H.264/AVC and H.265/HEVC. On the other hand, deep neural networks (DNNs) have shown their powerful capacity for visual content understanding, feature extraction and compact representation. Some previous works have explored learnt video coding algorithms in an end-to-end manner, showing great potential compared with traditional methods. In this paper, we propose an end-to-end deep neural video coding framework (NVC), which uses variational autoencoders (VAEs) with joint spatial and temporal prior aggregation (PA) to exploit the correlations in intra-frame pixels, inter-frame motions and inter-frame compensation residuals, respectively. Novel features of NVC include: 1) to estimate and compensate motion over a large range of magnitudes, we propose an unsupervised multiscale motion compensation network (MS-MCN) together with a pyramid decoder in the VAE for coding motion features that generates multiscale flow fields, 2) we design a novel adaptive spatiotemporal context model for efficient entropy coding of motion information, 3) we adopt nonlocal attention modules (NLAM) at the bottlenecks of the VAEs for implicit adaptive feature extraction and activation, leveraging their high transformation capacity and unequal weighting with joint global and local information, and 4) we introduce multi-module optimization and a multi-frame training strategy to minimize the temporal error propagation among P-frames. NVC is evaluated for the low-delay causal settings and compared with H.265/HEVC, H.264/AVC and the other learnt video compression methods following the common test conditions, demonstrating consistent gains across all popular test sequences for both PSNR and MS-SSIM distortion metrics.

preprint2020arXiv

On Transfer Operators for Markovian Products of Invertible Random Matrices

In this article we consider the Markovian products of invertible (not necessarily positive) matrices chosen from a strongly irreducible, contracting, finite set of matrices. We construct Markovian transfer operators and prove the spectral property which draws a connection between the top Lyapunov exponent associated to the random matrix product problem and the spectrum of the corresponding Markovian transfer operator.

preprint2020arXiv

PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable

Pre-training models have been proved effective for a wide range of natural language processing tasks. Inspired by this, we propose a novel dialogue generation pre-training framework to support various kinds of conversations, including chit-chat, knowledge grounded dialogues, and conversational question answering. In this framework, we adopt flexible attention mechanisms to fully leverage the bi-directional context and the uni-directional characteristic of language generation. We also introduce discrete latent variables to tackle the inherent one-to-many mapping problem in response generation. Two reciprocal tasks of response generation and latent act recognition are designed and carried out simultaneously within a shared network. Comprehensive experiments on three publicly available datasets verify the effectiveness and superiority of the proposed framework.

preprint2020arXiv

PSDet: Efficient and Universal Parking Slot Detection

While real-time parking slot detection plays a critical role in valet parking systems, existing methods have limited success in real-world applications. We argue that two reasons account for the unsatisfactory performance: (i) the available datasets have limited diversity, which causes low generalization ability; (ii) expert knowledge for parking slot detection is under-estimated. Thus, we annotate a large-scale benchmark for training the network and release it for the benefit of the community. Driven by the observation of various parking lots in our benchmark, we propose the circular descriptor to regress the coordinates of parking slot vertices and accordingly localize slots accurately. To further boost the performance, we develop a two-stage deep architecture to localize vertices in a coarse-to-fine manner. On our benchmark and other datasets, it achieves state-of-the-art accuracy while being real-time in practice. The benchmark is available at: https://github.com/wuzzh/Parking-slot-dataset

preprint2020arXiv

Quantum-enhanced interferometry with asymmetric beam splitters

In this paper, we investigate the phase sensitivities in two-path optical interferometry with asymmetric beam splitters. Here, we present the optimal conditions for the transmission ratio and the phase of the beam splitter to gain the highest sensitivities for a general class of non-classical states with parity symmetry. Additionally, we address the controversial question of whether the scheme with a combination of coherent state and photon-added or photon-subtracted squeezed vacuum state is better or worse than the most celebrated one using a combination of coherent state and squeezed vacuum state.

preprint2020arXiv

Response to Comment on Widely tunable compact terahertz gas lasers

We recently demonstrated a widely tunable THz molecular laser and reported mathematical formulas and a table for comparing how various molecules would perform as such lasers (Chevalier et al., Science, 15 November 2019, p. 856-860). Here we correct the value of a single parameter used to calculate the table (see Erratum for Chevalier et al.), thereby eliminating the concerns raised by Lampin and Barbieri (Lampin et al., arXiv:2004.04422). We also show that our simplified model for the output THz power is a better approximation than the alternative one proposed in the technical comment.

preprint2020arXiv

Self-evolving ghost imaging

Ghost imaging can capture 2D images with a point detector instead of an array sensor. It therefore offers a solution to the challenge of building area format sensors in wavebands where such sensors are difficult and expensive to produce and opens up new imaging modalities due to high-performance single-pixel detectors. Traditionally, ghost imaging retrieves the image of an object offline, by correlating measured light intensities and applied illuminating patterns. Here we present a feedback-based approach for online updating of the imaging result that can bypass post-processing, termed self-evolving ghost imaging (SEGI). We introduce a genetic algorithm to optimize the illumination patterns in real-time to match the object's shape according to the measured total light intensity. We theoretically and experimentally demonstrate this concept for static and dynamic imaging. This method opens new perspectives for real-time ghost imaging in applications such as remote sensing (e.g. machine vision, LiDAR systems in autonomous vehicles) and biological imaging.
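As an illustration of the traditional offline reconstruction mentioned in this abstract (not the SEGI method itself), the sketch below correlates single-pixel "bucket" intensities with the applied patterns. The function name, the random binary patterns, and the noise-free 1D object are assumptions chosen to keep the example small.

```python
import random


def ghost_image(obj, num_patterns, rng):
    """Estimate obj from bucket signals via the covariance <I*P> - <I><P>.

    obj: list of pixel transmissions (a flattened image, hypothetical input).
    Patterns are random binary masks; the detector only records their
    total transmitted intensity, never a spatially resolved image.
    """
    num_pixels = len(obj)
    sum_signal = 0.0
    sum_pattern = [0.0] * num_pixels
    sum_product = [0.0] * num_pixels
    for _ in range(num_patterns):
        pattern = [rng.randint(0, 1) for _ in range(num_pixels)]
        # Single-pixel measurement: total light passing object and pattern.
        bucket = sum(p * o for p, o in zip(pattern, obj))
        sum_signal += bucket
        for i, p in enumerate(pattern):
            sum_pattern[i] += p
            sum_product[i] += bucket * p
    m = num_patterns
    mean_signal = sum_signal / m
    return [sum_product[i] / m - mean_signal * sum_pattern[i] / m
            for i in range(num_pixels)]
```

Pixels where the object transmits light end up with a clearly positive covariance, while dark pixels average towards zero, so thresholding the returned list recovers the object's shape once enough patterns have been accumulated.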

preprint2020arXiv

Soft-mode and Anderson-like localization in two-phase disordered media

Wave localization is ubiquitous in disordered media -- from amorphous materials, where soft-mode localization is closely related to materials failure, to semiconductors, where Anderson localization leads to a metal-insulator transition. Our main understanding, though, is based on discrete models. Here, we provide a continuum perspective on wave localization in two-phase disordered elastic media by studying the scalar wave equation with heterogeneous modulus and/or density. At low frequencies, soft modes arise as a result of disordered elastic modulus, which can also be predicted by the localization landscape. At high frequencies, Anderson-like localization occurs due to disorder either in density or modulus. For the latter case, we demonstrate how the vibrational dynamics changes from plane waves to diffusons with increasing frequency. Finally, we discuss the implications of our findings for the design of architected soft materials.

preprint2020arXiv

Super-transport of Excitons in Atomically Thin Organic Semiconductors at the 2D Quantum Limit

Long-range and fast transport of coherent excitons is important for the development of high-speed excitonic circuits and quantum computing applications. However, most of these coherent excitons have only been observed in some low-dimensional semiconductors when coupled with cavities, as there are large inhomogeneous broadening and dephasing effects on the exciton transport in the native states of the materials. Here, by confining coherent excitons at the 2D quantum limit, we observed for the first time molecular-aggregation-enabled super-transport of excitons in atomically thin two-dimensional (2D) organic semiconductors between coherent states, with a measured high effective exciton diffusion coefficient of 346.9 cm$^2$/s at room temperature. This value is one to several orders of magnitude higher than the reported values from other organic molecular aggregates and low-dimensional inorganic materials. Without coupling to any optical cavities, the monolayer pentacene sample, a very clean 2D quantum system (1.2 nm thick) with high crystallinity (J-type aggregation) and minimal interfacial states, showed superradiant emission from the Frenkel excitons, which was experimentally confirmed by the temperature-dependent photoluminescence (PL) emission, highly enhanced radiative decay rate, significantly narrowed PL peak width and strongly directional in-plane emission. The coherence in monolayer pentacene samples was observed to be delocalized over 135 molecules, which is significantly larger than the values (a few molecules) observed in other organic thin films. In addition, the super-transport of excitons in monolayer pentacene samples showed highly anisotropic behaviour. Our results pave the way for the development of future high-speed excitonic circuits, fast OLEDs, and other opto-electronic devices.

preprint2020arXiv

SUPER: A Novel Lane Detection System

AI-based lane detection algorithms have been actively studied over the last few years. Many have demonstrated superior performance compared with traditional feature-based methods. The accuracy, however, is still generally in the low 80% or high 90%, or even lower when challenging images are used. In this paper, we propose a real-time lane detection system, called the Scene Understanding Physics-Enhanced Real-time (SUPER) algorithm. The proposed method consists of two main modules: 1) a hierarchical semantic segmentation network as the scene feature extractor and 2) a physics-enhanced multi-lane parameter optimization module for lane inference. We train the proposed system using heterogeneous data from Cityscapes, Vistas and Apollo, and evaluate the performance on four completely separate datasets (that were never seen before), including Tusimple, Caltech, URBAN KITTI-ROAD, and X-3000. The proposed approach performs as well as or better than lane detection models already trained on the same dataset and performs well even on datasets it was never trained on. Real-world vehicle tests were also conducted. Preliminary test results show promising real-time lane-detection performance compared with Mobileye.

preprint2020arXiv

Thermodynamic Origin of Reaction Non-Uniformity in Battery Porous Electrodes and its Mitigation

The development of non-uniform reaction current distribution within porous electrodes is a ubiquitous phenomenon during battery charging / discharging and frequently controls the rate performance of battery cells. Reaction inhomogeneity in porous electrodes is usually attributed to the kinetic limitation of mass transport within the electrolyte and/or solid electrode phase. In this work, however, we reveal that it is also strongly influenced by the intrinsic thermodynamic behavior of electrode materials, specifically the dependence of the equilibrium potential on the state of charge: the electrode reaction becomes increasingly non-uniform when the slope of the equilibrium potential curve is reduced. We employ numerical simulation and an equivalent circuit model to elucidate this correlation and show that the degree of reaction inhomogeneity and the resultant discharge capacity can be predicted by a dimensionless reaction uniformity number. For electrode materials that have equilibrium potentials insensitive to the state of charge and exhibit significant reaction non-uniformity, we demonstrate several approaches to spatially homogenizing the reaction current inside porous electrodes, including matching the electronic and ionic resistances, introducing graded electronic conductivity and reducing the surface reaction kinetics.

preprint2020arXiv

Upconversion nonlinear structured illumination microscopy

Video-rate super-resolution imaging through biological tissue can visualize and track biomolecule interplay and transport inside cellular organisms. Structured illumination microscopy allows for wide-field super-resolution observation of biological samples but is limited by the strong absorption and scattering of light by biological tissues, which degrades its imaging resolution. Here we report a photon upconversion scheme using lanthanide-doped nanoparticles for wide-field super-resolution imaging through the biological transparent window, featuring near-infrared and low-irradiance nonlinear structured illumination. We demonstrate that the 976 nm excitation and 800 nm up-converted emission can mitigate the aberration. We found that the nonlinear response of upconversion emissions from single nanoparticles can effectively generate the required high spatial frequency components in the Fourier domain. These strategies lead to a new modality in microscopy with a resolution of 130 nm, 1/7th of the excitation wavelength, and a frame rate of 1 fps.

preprint2019arXiv

Prediction of $NΩ$-like dibaryons with heavy quarks

Possible $NΩ$-like dibaryons $NΩ_{ccc}$ and $NΩ_{bbb}$ with quantum numbers $IJ^P=\frac{1}{2}2^+$ are investigated within the framework of the quark delocalization color screening model. We find that both of these states are bound, and the binding energy increases as the quarks of the system become heavier. The attraction between $N$ and $Ω_{ccc}$ (or $Ω_{bbb}$) mainly comes from the kinetic energy term due to quark delocalization and color screening. The effect of channel coupling provides more effective attraction to the $NΩ_{ccc}$ and $NΩ_{bbb}$ systems. Besides, the scattering length, the effective range, and the binding energy, obtained from the calculation of the low-energy scattering phase shifts, also support the existence of the $NΩ_{ccc}$ and $NΩ_{bbb}$ states. All these properties can provide necessary information for the experimental search for $NΩ$-like dibaryons with heavy quarks. Experimental progress can in turn test the mechanism of the intermediate-range attraction of the baryon-baryon interaction in quark models.

preprint2007arXiv

Reentrant spin glass transition in LuFe2O4

We have carried out a comprehensive investigation of the magnetic properties of LuFe$_2$O$_4$, using AC susceptibility, DC magnetization and specific heat. A magnetic phase transition at $\sim$236 K was observed with DC magnetization and specific heat measurements, which is identified as a paramagnetic to ferrimagnetic transition based on the nonlinear susceptibility data. Upon further cooling below this temperature, we also observed highly relaxational magnetic behavior: the DC magnetization exhibits history and time dependence, and the real and imaginary parts of the AC susceptibility show large frequency dependence. Dynamic scaling of the AC susceptibility data suggests that this low temperature phase can be described as a reentrant spin glass phase. We also discuss the magnetic field dependence of the spin glass transition and the aging, memory and rejuvenation effects below the glass transition temperature around 228 K.