Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
42 works
0 followers
28 topics
4 close collaborators

Published work

42 published item(s)

preprint 2026 arXiv

DepthPilot: From Controllability to Interpretability in Colonoscopy Video Generation

Controllable medical video generation has achieved remarkable progress, but it still lacks interpretability, which requires the alignment of generated contents with physical priors and faithful clinical manifestations. To push the boundaries from mere controllability to interpretability, we propose DepthPilot, the first interpretable framework for colonoscopy video generation. This work takes a step toward trustworthy generation through two synergistic paradigms. To achieve explicit geometric grounding, DepthPilot devises a prior distribution alignment strategy, injecting depth constraints into the diffusion backbone via parameter-efficient fine-tuning to ensure anatomical fidelity. To enhance intrinsic nonlinear modeling under these geometric constraints, DepthPilot employs an adaptive spline denoising module, replacing fixed linear weights with learnable spline functions to capture complex spatio-temporal dynamics. Extensive evaluations across three public datasets and in-house clinical data confirm DepthPilot's robust ability to produce physically consistent videos. It achieves FID scores below 15 across all benchmarks and ranks first in clinician assessments, bridging the gap between "visually realistic" and "clinically interpretable". Moreover, DepthPilot-generated videos are expected to enable reliable 3D reconstruction, facilitating surgical navigation and blind region identification, and serve as a foundation toward the colorectal world model.

preprint 2023 arXiv

On the Stretch Factor of Polygonal Chains

Let $P=(p_1, p_2, \dots, p_n)$ be a polygonal chain in $\mathbb{R}^d$. The stretch factor of $P$ is the ratio between the total length of $P$ and the distance between its endpoints, $\sum_{i = 1}^{n-1} |p_i p_{i+1}|/|p_1 p_n|$. For a parameter $c \geq 1$, we call $P$ a $c$-chain if $|p_ip_j|+|p_jp_k| \leq c|p_ip_k|$, for every triple $(i,j,k)$, $1 \leq i<j<k \leq n$. The stretch factor is a global property: it measures how close $P$ is to a straight line, and it involves all the vertices of $P$; being a $c$-chain, on the other hand, is a fingerprint-property: it only depends on subsets of $O(1)$ vertices of the chain. We investigate how the $c$-chain property influences the stretch factor in the plane: (i) we show that for every $\varepsilon > 0$, there is a noncrossing $c$-chain that has stretch factor $\Omega(n^{1/2-\varepsilon})$, for sufficiently large constant $c=c(\varepsilon)$; (ii) on the other hand, the stretch factor of a $c$-chain $P$ is $O\left(n^{1/2}\right)$, for every constant $c\geq 1$, regardless of whether $P$ is crossing or noncrossing; and (iii) we give a randomized algorithm that can determine, for a polygonal chain $P$ in $\mathbb{R}^2$ with $n$ vertices, the minimum $c\geq 1$ for which $P$ is a $c$-chain in $O\left(n^{2.5}\ \mathrm{polylog}\ n\right)$ expected time and $O(n\log n)$ space. These results generalize to $\mathbb{R}^d$. For every dimension $d\geq 2$ and every $\varepsilon>0$, we construct a noncrossing $c$-chain that has stretch factor $\Omega\left(n^{(1-\varepsilon)(d-1)/d}\right)$; on the other hand, the stretch factor of any $c$-chain is $O\left((n-1)^{(d-1)/d}\right)$; for every $c>1$, we can test whether an $n$-vertex chain in $\mathbb{R}^d$ is a $c$-chain in $O\left(n^{3-1/d}\ \mathrm{polylog}\ n\right)$ expected time and $O(n\log n)$ space.
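
The two definitions in this abstract can be made concrete with a small sketch. The code below is only an illustration of the definitions: it computes the stretch factor directly and finds the minimum $c$ by brute force over all triples in $O(n^3)$ time. It is not the randomized algorithm described in the paper, and the function names `stretch_factor` and `min_c` are hypothetical.

```python
import math
from itertools import combinations


def stretch_factor(chain):
    """Total length of the chain divided by the distance between its endpoints.

    Raises ZeroDivisionError if the two endpoints coincide.
    """
    total = sum(math.dist(chain[i], chain[i + 1]) for i in range(len(chain) - 1))
    return total / math.dist(chain[0], chain[-1])


def min_c(chain):
    """Smallest c >= 1 for which the chain is a c-chain (brute force, O(n^3)).

    Checks |p_i p_j| + |p_j p_k| <= c |p_i p_k| for every triple i < j < k.
    Returns math.inf when no finite c works (p_i == p_k but p_j differs).
    """
    best = 1.0
    for i, j, k in combinations(range(len(chain)), 3):
        detour = math.dist(chain[i], chain[j]) + math.dist(chain[j], chain[k])
        direct = math.dist(chain[i], chain[k])
        if direct == 0.0:
            if detour > 0.0:
                return math.inf
            continue  # all three points coincide; the inequality holds trivially
        best = max(best, detour / direct)
    return best


if __name__ == "__main__":
    corner = [(0, 0), (1, 0), (1, 1)]  # a single right-angle turn
    print(stretch_factor(corner), min_c(corner))
```

For a chain of collinear, monotone points both quantities equal 1; for the right-angle chain in the example both equal $\sqrt{2}$, since the only triple is the whole chain.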

preprint 2023 arXiv

Ultra-thin Epitaxial MgB2 on SiC: Substrate Surface Polarity Dependent Properties

High quality, ultrathin, superconducting films are required for advanced devices such as hot-electron bolometers, superconducting nanowire single photon detectors, and quantum applications. Using Hybrid Physical-Chemical Vapor Deposition (HPCVD), we show that MgB2 films as thin as 4 nm can be fabricated on the carbon terminated 6H-SiC (0001) surface with a superconducting transition temperature above 33 K and an rms roughness of 0.7 nm. Remarkably, the film quality is a function of the SiC surface termination, with the C-terminated surface preferred to the Si-terminated surface. To understand the MgB2 thin film/SiC substrate interactions giving rise to this difference, we characterized the interfacial structures using Rutherford backscattering spectroscopy/channeling, electron energy loss spectroscopy, and x-ray photoemission spectroscopy. The MgB2/SiC interface structure is complex and different for the two terminations. Both terminations incorporate substantial unintentional oxide layers influencing MgB2 growth and morphology, but they differ in extent, intermixing, and interface chemistry. In this paper, we report measurements of transport, resistivity, and critical superconducting temperature of MgB2/SiC that are different for the two terminations, and link interfacial structure variations to observed differences. The result shows that the C face of SiC is a preferred substrate for the deposition of ultrathin superconducting MgB2 films.

preprint 2022 arXiv

BiCo-Net: Regress Globally, Match Locally for Robust 6D Pose Estimation

The challenges of learning a robust 6D pose function lie in 1) severe occlusion and 2) systematic noise in depth images. Inspired by the success of point-pair features, the goal of this paper is to recover the 6D pose of an object instance segmented from RGB-D images by locally matching pairs of oriented points between the model and camera space. To this end, we propose a novel Bi-directional Correspondence Mapping Network (BiCo-Net) to first generate point clouds guided by a typical pose regression, which can thus incorporate pose-sensitive information to optimize generation of local coordinates and their normal vectors. As pose predictions via geometric computation only rely on one single pair of local oriented points, our BiCo-Net can achieve robustness against sparse and occluded point clouds. An ensemble of redundant pose predictions from local matching and direct pose regression further refines the final pose output against noisy observations. Experimental results on three popular benchmark datasets verify that our method achieves state-of-the-art performance, especially for the more challenging severely occluded scenes. Source codes are available at https://github.com/Gorilla-Lab-SCUT/BiCo-Net.

preprint 2022 arXiv

Capturing Evolution Genes for Time Series Data

The modeling of time series is becoming increasingly critical in a wide variety of applications. Overall, data evolves by following different patterns, which are generally caused by different user behaviors. Given a time series, we define the evolution gene to capture the latent user behaviors and to describe how the behaviors lead to the generation of time series. In particular, we propose a uniform framework that recognizes different evolution genes of segments by learning a classifier, and adopt an adversarial generator to implement the evolution gene by estimating the segments' distribution. Experimental results based on a synthetic dataset and five real-world datasets show that our approach can not only achieve good prediction results (e.g., +10.56% on average in terms of F1), but is also able to provide explanations of the results.

preprint 2022 arXiv

Classification of Single-View Object Point Clouds

Object point cloud classification has drawn great research attention since the release of benchmarking datasets, such as the ModelNet and the ShapeNet. These benchmarks assume point clouds covering complete surfaces of object instances, for which plenty of high-performing methods have been developed. However, their settings deviate from those often met in practice, where, due to (self-)occlusion, a point cloud covering a partial surface of an object is captured from an arbitrary view. We show in this paper that the performance of existing point cloud classifiers drops drastically under the considered single-view, partial setting; the phenomenon is consistent with the observation that the semantic category of a partial object surface is less ambiguous only when its distribution on the whole surface is clearly specified. To this end, we argue for a single-view, partial setting where supervised learning of object pose estimation should be accompanied with classification. Technically, we propose a baseline method of Pose-Accompanied Point cloud classification Network (PAPNet); built upon SE(3)-equivariant convolutions, the PAPNet learns intermediate pose transformations for equivariant features defined on vector fields, which makes the subsequent classification easier (ideally) in the category-level, canonical pose. Experimental results on the existing ModelNet40 and ScanNet datasets, adapted to the single-view, partial setting, verify the necessity of object pose estimation and the superiority of our PAPNet over existing classifiers.

preprint 2022 arXiv

Convolutional Fine-Grained Classification with Self-Supervised Target Relation Regularization

Fine-grained visual classification can be addressed by deep representation learning under supervision of manually pre-defined targets (e.g., one-hot or the Hadamard codes). Such target coding schemes are less flexible to model inter-class correlation and are sensitive to sparse and imbalanced data distribution as well. In light of this, this paper introduces a novel target coding scheme -- dynamic target relation graphs (DTRG), which, as an auxiliary feature regularization, is a self-generated structural output to be mapped from input images. Specifically, online computation of class-level feature centers is designed to generate cross-category distance in the representation space, which can thus be depicted by a dynamic graph in a non-parametric manner. Explicitly minimizing intra-class feature variations anchored on those class-level centers can encourage learning of discriminative features. Moreover, owing to exploiting inter-class dependency, the proposed target graphs can alleviate data sparsity and imbalance in representation learning. Inspired by the recent success of mixup-style data augmentation, this paper introduces randomness into soft construction of dynamic target relation graphs to further explore relation diversity of target classes. Experimental results demonstrate the effectiveness of our method on a number of diverse benchmarks of multiple visual classification tasks, especially achieving state-of-the-art performance on popular fine-grained object benchmarks and superior robustness against sparse and imbalanced data. Source codes are made publicly available at https://github.com/AkonLau/DTRG.

preprint 2022 arXiv

Fine-Grained Object Classification via Self-Supervised Pose Alignment

Semantic patterns of fine-grained objects are determined by subtle appearance differences of local parts, which thus inspires a number of part-based methods. However, due to uncontrollable object poses in images, distinctive details carried by local regions can be spatially distributed or even self-occluded, leading to a large variation in object representation. For discounting pose variations, this paper proposes to learn a novel graph-based object representation to reveal a global configuration of local parts for self-supervised pose alignment across classes, which is employed as an auxiliary feature regularization on a deep representation learning network. Moreover, a coarse-to-fine supervision together with the proposed pose-insensitive constraint on shallow-to-deep sub-networks encourages discriminative features in a curriculum learning manner. We evaluate our method on three popular fine-grained object classification benchmarks, consistently achieving state-of-the-art performance. Source codes are available at https://github.com/yangxh11/P2P-Net.

preprint 2022 arXiv

Fisher Matrix Based Fault Detection for PMUs Data in Power Grids

Abnormal event detection is critical to the safe operation of power systems. In this paper, using the data collected from phasor measurement units (PMUs), two methods based on the Fisher random matrix are proposed to detect faults in power grids. Firstly, the fault detection matrix is constructed and the event detection problem is reformulated as a two-sample covariance matrices test problem. Secondly, the central limit theorem for the linear spectral statistic of the Fisher matrix is derived and a test statistic for testing faults is proposed. To save computing resources, the screening step of fault interval based on the test statistic is designed to check the existence of faults. Then two point-by-point methods are proposed to determine the time of the fault in the interval. One method detects faults by checking whether the largest sample eigenvalue falls outside the supporting set of the limiting spectral distribution of the standard Fisher matrix, which can detect the faults with higher accuracy. The other method tests the faults based on the statistic proposed, which has a faster detection speed. Simulation results illustrate that, compared with existing works, the two methods proposed in this paper cost less computational time and provide a higher degree of accuracy.

preprint 2022 arXiv

Global well-posedness of the 1d compressible Navier-Stokes system with rough data

In this paper, we study the global well-posedness problem for the 1d compressible Navier-Stokes system (cNSE) in gas dynamics with rough initial data. First, Liu-Yu (2022) established the global well-posedness theory for the 1d isentropic cNSE with initial velocity data in BV space. Then, it was extended to the 1d cNSE for the polytropic ideal gas with initial velocity and temperature data in BV space by Wang-Yu-Zhang (2022). We improve the global well-posedness result of Liu-Yu with initial velocity data in $W^{2\gamma,1}$ space; and of Wang-Yu-Zhang with initial velocity data in $L^2\cap W^{2\gamma,1}$ space and initial data of temperature in $\dot W^{-\frac{2}{3},\frac{6}{5}}\cap \dot W^{2\gamma-1,1}$ for any $\gamma>0$ arbitrarily small. Our essential ideas are based on establishing various "end-point" smoothing estimates for the 1d parabolic equation.

preprint 2022 arXiv

HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection

Audio classification is an important task of mapping audio samples into their corresponding labels. Recently, the transformer model with self-attention mechanisms has been adopted in this field. However, existing audio transformers require large GPU memories and long training time, meanwhile relying on pretrained vision models to achieve high performance, which limits the model's scalability in audio tasks. To combat these problems, we introduce HTS-AT: an audio transformer with a hierarchical structure to reduce the model size and training time. It is further combined with a token-semantic module to map final outputs into class feature maps, thus enabling the model to perform audio event detection (i.e. localization in time). We evaluate HTS-AT on three datasets of audio classification where it achieves new state-of-the-art (SOTA) results on AudioSet and ESC-50, and equals the SOTA on Speech Command V2. It also achieves better performance in event localization than the previous CNN-based models. Moreover, HTS-AT requires only 35% of the model parameters and 15% of the training time of the previous audio transformer. These results demonstrate the high performance and high efficiency of HTS-AT.

preprint 2022 arXiv

Improving Choral Music Separation through Expressive Synthesized Data from Sampled Instruments

Choral music separation refers to the task of extracting tracks of voice parts (e.g., soprano, alto, tenor, and bass) from mixed audio. The lack of datasets has impeded research on this topic as previous work has only been able to train and evaluate models on a few minutes of choral music data due to copyright issues and dataset collection difficulties. In this paper, we investigate the use of synthesized training data for the source separation task on real choral music. We make three contributions: first, we provide an automated pipeline for synthesizing choral music data from sampled instrument plugins with controllable options for instrument expressiveness. This produces an 8.2-hour-long choral music dataset from the JSB Chorales Dataset and one can easily synthesize additional data. Second, we conduct an experiment to evaluate multiple separation models on available choral music separation datasets from previous work. To the best of our knowledge, this is the first experiment to comprehensively evaluate choral music separation. Third, experiments demonstrate that the synthesized choral data is of sufficient quality to improve the model's performance on real choral music datasets. This provides additional experimental statistics and data support for the choral music separation study.

preprint 2022 arXiv

Local Rotational Jamming and Multi-Scale Hyperuniformities in an Active Spinner System

An active system consisting of many self-spinning dimers is simulated, and a distinct local rotational jamming transition is observed as the density increases. In the low density regime, the system stays in an absorbing state, in which each dimer rotates independently subject to the applied torque. In the high density regime, by contrast, a fraction of the dimers become rotationally jammed into local clusters, and the system exhibits spinodal-decomposition-like two-phase morphologies. For high enough densities, the system becomes completely jammed in both rotational and translational degrees of freedom. Such a simple system is found to exhibit rich and multiscale disordered hyperuniformities among the above phases: the absorbing state shows a critical hyperuniformity of the strongest class and subcritically preserves the vanishing density-fluctuation scaling up to some length scale; the locally-jammed state shows a two-phase hyperuniformity conversely beyond some length scale with respect to the phase cluster sizes; the totally jammed state appears to be a monomer crystal, but intrinsically loses large-scale hyperuniformity. These results are inspiring for designing novel phase-separation and disordered hyperuniform systems through dynamical organization.

preprint 2022 arXiv

Locality-sensitive bucketing functions for the edit distance

Many bioinformatics applications involve bucketing a set of sequences where each sequence is allowed to be assigned into multiple buckets. To achieve both high sensitivity and precision, bucketing methods are desired to assign similar sequences into the same bucket while assigning dissimilar sequences into distinct buckets. Existing $k$-mer-based bucketing methods have been efficient in processing sequencing data with low error rate, but encounter much reduced sensitivity on data with high error rate. Locality-sensitive hashing (LSH) schemes are able to mitigate this issue through tolerating the edits in similar sequences, but state-of-the-art methods still have large gaps. Here we generalize the LSH function by allowing it to hash one sequence into multiple buckets. Formally, a bucketing function, which maps a sequence (of fixed length) into a subset of buckets, is defined to be $(d_1, d_2)$-sensitive if any two sequences within an edit distance of $d_1$ are mapped into at least one shared bucket, and any two sequences with distance at least $d_2$ are mapped into disjoint subsets of buckets. We construct locality-sensitive bucketing (LSB) functions with a variety of values of $(d_1,d_2)$ and analyze their efficiency with respect to the total number of buckets needed as well as the number of buckets that a specific sequence is mapped to. We also prove lower bounds of these two parameters in different settings and show that some of our constructed LSB functions are optimal. These results provide theoretical foundations for their practical use in analyzing sequences with high error rate while also providing insights for the hardness of designing ungapped LSH functions.
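
The $(d_1, d_2)$-sensitivity definition above can be checked exhaustively on a small example. The sketch below is an illustration only: it uses a hypothetical bucketing function, `deletion_buckets`, that maps a sequence to every string obtained by deleting one character, and it is not claimed to be one of the constructions from the paper. The checker `is_sensitive` enumerates all sequences of a fixed length over a small alphabet and tests both conditions of the definition.

```python
from itertools import product


def edit_distance(s, t):
    """Levenshtein distance (insertions, deletions, substitutions) via row-wise DP."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        cur = [i]
        for j, b in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,              # delete a from s
                           cur[j - 1] + 1,           # insert b into s
                           prev[j - 1] + (a != b)))  # match or substitute
        prev = cur
    return prev[-1]


def deletion_buckets(s):
    """Hypothetical bucketing function: all strings obtained by deleting one character."""
    return {s[:i] + s[i + 1:] for i in range(len(s))}


def is_sensitive(bucketing, d1, d2, length, alphabet="01"):
    """Exhaustively test (d1, d2)-sensitivity on all sequences of the given length."""
    seqs = ["".join(p) for p in product(alphabet, repeat=length)]
    for s in seqs:
        for t in seqs:
            d = edit_distance(s, t)
            shared = bucketing(s) & bucketing(t)
            if d <= d1 and not shared:
                return False  # close sequences must share at least one bucket
            if d >= d2 and shared:
                return False  # distant sequences must have disjoint bucket sets
    return True


if __name__ == "__main__":
    print(is_sensitive(deletion_buckets, 1, 3, length=4))
    print(is_sensitive(deletion_buckets, 2, 3, length=4))
```

On binary sequences of length 4 this bucketing function passes the check for $(1, 3)$ and fails it for $(2, 3)$: two equal-length sequences at distance 1 differ by one substitution and share the bucket obtained by deleting that position, sharing a bucket implies distance at most 2, and sequences such as 0000 and 0011 are at distance 2 without sharing a bucket.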

preprint 2022 arXiv

Low-rank approximation for multiscale PDEs

Historically, analysis for multiscale PDEs is largely unified while numerical schemes tend to be equation-specific. In this paper, we propose a unified framework for computing multiscale problems through random sampling. This is achieved by incorporating randomized SVD solvers and manifold learning techniques to numerically reconstruct the low-rank features of multiscale PDEs. We use multiscale radiative transfer equation and elliptic equation with rough media to showcase the application of this framework.

preprint 2022 arXiv

Quasi-Balanced Self-Training on Noise-Aware Synthesis of Object Point Clouds for Closing Domain Gap

Semantic analyses of object point clouds are largely driven by the release of benchmarking datasets, including synthetic ones whose instances are sampled from object CAD models. However, learning from synthetic data may not generalize to practical scenarios, where point clouds are typically incomplete, non-uniformly distributed, and noisy. Such a challenge of Simulation-to-Reality (Sim2Real) domain gap could be mitigated via learning algorithms of domain adaptation; however, we argue that generation of synthetic point clouds via more physically realistic rendering is a powerful alternative, as systematic non-uniform noise patterns can be captured. To this end, we propose an integrated scheme consisting of physically realistic synthesis of object point clouds, by rendering stereo images via projection of speckle patterns onto CAD models, and a novel quasi-balanced self-training designed for more balanced data distribution by sparsity-driven selection of pseudo-labeled samples for long-tailed classes. Experimental results verify the effectiveness of our method as well as both of its modules for unsupervised domain adaptation on point cloud classification, achieving state-of-the-art performance. Source codes and the SpeckleNet synthetic dataset are available at https://github.com/Gorilla-Lab-SCUT/QS3.

preprint 2022 arXiv

TONet: Tone-Octave Network for Singing Melody Extraction from Polyphonic Music

Singing melody extraction is an important problem in the field of music information retrieval. Existing methods typically rely on frequency-domain representations to estimate the sung frequencies. However, this design does not lead to human-level performance in the perception of melody information for both tone (pitch-class) and octave. In this paper, we propose TONet, a plug-and-play model that improves both tone and octave perceptions by leveraging a novel input representation and a novel network architecture. First, we present an improved input representation, the Tone-CFP, that explicitly groups harmonics via a rearrangement of frequency-bins. Second, we introduce an encoder-decoder architecture that is designed to obtain a salience feature map, a tone feature map, and an octave feature map. Third, we propose a tone-octave fusion mechanism to improve the final salience feature map. Experiments are done to verify the capability of TONet with various baseline backbone models. Our results show that tone-octave fusion with Tone-CFP can significantly improve the singing voice extraction performance across various datasets -- with substantial gains in octave and tone accuracy.

preprint 2022 arXiv

Weak solutions of the three-dimensional hypoviscous elastodynamics with finite kinetic energy

We construct weak solutions to the 3D hypoviscous incompressible elastodynamics with finite kinetic energy, which were previously unknown in the literature. Our result holds for fractional hypoviscosity $(-\Delta)^\theta$, where $0\leq\theta<1$. The proof consists of a convex integration scheme with new building blocks of 2D intermittency and suitable temporal correctors, which are motivated by the inherent geometric structure of the viscoelastic equations.

preprint 2022 arXiv

Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data

Deep learning techniques for separating audio into different sound sources face several challenges. Standard architectures require training separate models for different types of audio sources. Although some universal separators employ a single model to target multiple sources, they have difficulty generalizing to unseen sources. In this paper, we propose a three-component pipeline to train a universal audio source separator from a large, but weakly-labeled dataset: AudioSet. First, we propose a transformer-based sound event detection system for processing weakly-labeled training data. Second, we devise a query-based audio separation model that leverages this data for model training. Third, we design a latent embedding processor to encode queries that specify audio targets for separation, allowing for zero-shot generalization. Our approach uses a single model for source separation of multiple sound types, and relies solely on weakly-labeled data for training. In addition, the proposed audio separator can be used in a zero-shot setting, learning to separate types of audio sources that were never seen in training. To evaluate the separation performance, we test our model on MUSDB18, while training on the disjoint AudioSet. We further verify the zero-shot performance by conducting another experiment on audio source types that are held-out from training. The model achieves comparable Source-to-Distortion Ratio (SDR) performance to current supervised models in both cases.

preprint 2021 arXiv

MAX Phase Zr2SeC and Its Thermal Conduction Behavior

Elemental diversity is crucial to screen out ternary MAX phases with outstanding properties via tuning of bonding types and strength between constitutive atoms. As a matter of fact, the interactions between M and A atoms largely determine the physical and chemical properties of MAX phases. Herein, the Se element was experimentally realized to occupy the A site of a MAX phase, Zr2SeC, becoming a new member within this nanolaminated ternary carbide family. Comprehensive characterizations including Rietveld refinement of X-ray diffraction and atom-resolved transmission electron microscopy techniques were employed to validate this novel MAX phase. The distinct thermal conduction behaviors that emerge are attributed to the characteristic interactions between Zr and Se atoms.

preprint 2021 arXiv

Reduced Ionic Diffusion by the Dynamic Electron-Ion Collisions in Warm Dense Hydrogen

The dynamic electron-ion collisions play an important role in determining the static and transport properties of warm dense matter (WDM). The electron force field (eFF) method is applied to study the ionic transport properties of warm dense hydrogen. Compared with the results from quantum molecular dynamics and orbital-free molecular dynamics, the ionic diffusions are largely reduced by involving the dynamic collisions of electrons and ions. This physics is verified by quantum Langevin molecular dynamics (QLMD) simulations, which include electron-ion collisions induced friction (EI-CIF) in the dynamic equation of ions. Based on these new results, we proposed a model including the correction of collisions induced friction of the ionic diffusion. The CIF model has been verified to be valid at a wide range of density and temperature. We also compare the results with the Yukawa one component plasma (YOCP) model and Effective OCP (EOCP) model. We proposed to calculate the self-diffusion coefficients using the EOCP model modified by the CIF model to introduce the dynamic electron-ion collisions effect.

preprint 2020 arXiv

A Framework for End-to-End Learning on Semantic Tree-Structured Data

While learning models are typically studied for inputs in the form of a fixed dimensional feature vector, real world data is rarely found in this form. In order to meet the basic requirement of traditional learning models, structural data generally have to be converted into fixed-length vectors in a handcrafted manner, which is tedious and may even incur information loss. A common form of structured data is what we term "semantic tree-structures", corresponding to data where rich semantic information is encoded in a compositional manner, such as those expressed in JavaScript Object Notation (JSON) and eXtensible Markup Language (XML). For tree-structured data, several learning models have been studied to allow for working directly on raw tree-structure data. However, such learning models are limited to either a specific tree-topology or a specific tree-structured data format, e.g., synthetic parse trees. In this paper, we propose a novel framework for end-to-end learning on generic semantic tree-structured data of arbitrary topology and heterogeneous data types, such as data expressed in JSON, XML and so on. Motivated by the works in recursive and recurrent neural networks, we develop exemplar neural implementations of our framework for the JSON format. We evaluate our approach on several UCI benchmark datasets, including ablation and data-efficiency studies, and on a toy reinforcement learning task. Experimental results suggest that our framework yields comparable performance to use of standard models with dedicated feature-vectors in general, and even exceeds baseline performance in cases where compositional nature of the data is particularly important. The source code for a JSON-based implementation of our framework along with experiments can be downloaded at https://github.com/EndingCredits/json2vec.

preprint 2020 arXiv

A low-rank Schwarz method for radiative transport equation with heterogeneous scattering coefficient

Random sampling has been used to find low-rank structure and to build fast direct solvers for multiscale partial differential equations of various types. In this work, we design an accelerated Schwarz method for radiative transfer equations that makes use of approximate local solution maps constructed offline via a random sampling strategy. Numerical examples demonstrate the accuracy, robustness, and efficiency of the proposed approach.

preprint 2020 arXiv

CAD-PU: A Curvature-Adaptive Deep Learning Solution for Point Set Upsampling

Point set is arguably the most direct approximation of an object or scene surface, yet its practical acquisition often suffers from the shortcoming of being noisy, sparse, and possibly incomplete, which restricts its use for a high-quality surface recovery. Point set upsampling aims to increase its density and regularity such that a better surface recovery could be achieved. The problem is severely ill-posed and challenging, considering that the upsampling target itself is only an approximation of the underlying surface. Motivated to improve the surface approximation via point set upsampling, we identify the factors that are critical to the objective, by pairing the surface approximation error bounds of the input and output point sets. It suggests that given a fixed budget of points in the upsampling result, more points should be distributed onto the surface regions where local curvatures are relatively high. To implement the motivation, we propose a novel design of Curvature-ADaptive Point set Upsampling network (CAD-PU), the core of which is a module of curvature-adaptive feature expansion. To train CAD-PU, we follow the same motivation and propose geometrically intuitive surrogates that approximate discrete notions of surface curvature for the upsampled point set. We further integrate the proposed surrogates into an adversarial learning based curvature minimization objective, which gives a practically effective learning of CAD-PU. We conduct thorough experiments that show the efficacy of our contributions and the advantages of our method over existing ones. Our implementation codes are publicly available at https://github.com/JiehongLin/CAD-PU.

preprint2020arXiv

Compositional Few-Shot Recognition with Primitive Discovery and Enhancing

Few-shot learning (FSL) aims at recognizing novel classes given only a few training samples, which still remains a great challenge for deep learning. However, humans can easily recognize novel classes with only a few samples. A key component of such ability is the compositional recognition that humans can perform, which has been well studied in cognitive science but is not well explored in FSL. Inspired by such capability of humans, to imitate humans' ability of learning visual primitives and composing primitives to recognize novel classes, we propose an approach to FSL to learn a feature representation composed of important primitives, which is jointly trained with two parts, i.e. primitive discovery and primitive enhancing. In primitive discovery, we focus on learning primitives related to object parts by self-supervision from the order of image splits, avoiding extra laborious annotations and alleviating the effect of semantic gaps. In primitive enhancing, inspired by current studies on the interpretability of deep networks, we provide our composition view for the FSL baseline model. To modify this model for effective composition, inspired by both mathematical deduction and biological studies (the Hebbian Learning rule and the Winner-Take-All mechanism), we propose a soft composition mechanism by enlarging the activation of important primitives while reducing that of others, so as to enhance the influence of important primitives and better utilize these primitives to compose novel classes. Extensive experiments on public benchmarks are conducted on both the few-shot image classification and video recognition tasks. Our method achieves the state-of-the-art performance on all these datasets and shows better interpretability.

preprint2020arXiv

Continuous Melody Generation via Disentangled Short-Term Representations and Structural Conditions

Automatic music generation is an interdisciplinary research topic that combines computational creativity and semantic analysis of music to create automatic machine improvisations. An important property of such a system is allowing the user to specify conditions and desired properties of the generated music. In this paper we designed a model for composing melodies given a user-specified symbolic scenario combined with a previous music context. We add manually labeled vectors denoting external music quality in terms of chord function that provide a low-dimensional representation of the harmonic tension and resolution. Our model is capable of generating long melodies by regarding 8-beat note sequences as basic units, and shares consistent rhythm pattern structure with another specific song. The model contains two stages and requires separate training where the first stage adopts a Conditional Variational Autoencoder (C-VAE) to build a bijection between note sequences and their latent representations, and the second stage adopts long short-term memory networks (LSTM) with structural conditions to continue writing future melodies. We further exploit the disentanglement technique via C-VAE to allow melody generation based on pitch contour information separately from conditioning on rhythm patterns. Finally, we evaluate the proposed model using quantitative analysis of rhythm and a subjective listening study. Results show that the music generated by our model tends to have salient repetition structures, rich motives, and stable rhythm patterns. The ability to generate longer and more structural phrases from disentangled representations combined with semantic scenario specification conditions shows a broad application of our model.

preprint2020arXiv

Efficient Feature-based Image Registration by Mapping Sparsified Surfaces

With advances in digital camera technology, the use of high-resolution images and videos has become widespread in modern society. In particular, image and video frame registration is frequently applied in computer graphics and film production. However, conventional registration approaches usually require long computational time for high-resolution images and video frames. This hinders the application of these registration approaches in modern industries. In this work, we first propose a new image representation method to accelerate the registration process by triangulating the images effectively. For each high-resolution image or video frame, we compute an optimal coarse triangulation which captures the important features of the image. Then, we apply a surface registration algorithm to obtain a registration map which is used to compute the registration of the high-resolution image. Experimental results suggest that our overall algorithm is efficient and capable of achieving a high compression rate while the accuracy of the registration is well retained when compared with the conventional grid-based approach. Also, the computational time of the registration is significantly reduced using our triangulation-based approach.

preprint2020arXiv

Improving Semantic Analysis on Point Clouds via Auxiliary Supervision of Local Geometric Priors

Existing deep learning algorithms for point cloud analysis mainly concern discovering semantic patterns from global configuration of local geometries in a supervised learning manner. However, very few explore geometric properties revealing local surface manifolds embedded in 3D Euclidean space to discriminate semantic classes or object parts as additional supervision signals. This paper is the first attempt to propose a unique multi-task geometric learning network to improve semantic analysis by auxiliary geometric learning with local shape properties, which can be either generated via physical computation from point clouds themselves as self-supervision signals or provided as privileged information. Owing to explicitly encoding local shape manifolds in favor of semantic analysis, the proposed geometric self-supervised and privileged learning algorithms can achieve superior performance to their backbone baselines and other state-of-the-art methods, which are verified in the experiments on the popular benchmarks.

preprint2020arXiv

Independent wavefront tailoring in full polarization channels by helicity-decoupled metasurface

Controlling the polarization and wavefront of light is essential for compact photonic systems in modern science and technology. This may be achieved by metasurfaces, a new platform that has radically changed the way people engineer wave-matter interactions. However, it still remains very challenging to generate versatile beams with arbitrary and independent wavefronts in each polarization channel by a single ultrathin metasurface. By modulating both the geometric and propagation phases of the metasurface, here we propose a method that can generate an assembly of circularly- and linearly-polarized beams while independently encoding a desired wavefront in each individual polarization channel, which we believe will greatly enhance the information capacities of the meta-devices. Two proof-of-concept designs are experimentally demonstrated in the microwave region. Upon the excitation of an arbitrary linear polarization, the first device can generate distinct vortex beams with the desired two linear and two circular orthogonal polarizations, whereas the second one can generate multi-foci containing components of full polarizations. This approach to generating versatile polarizations with tailored wavefronts may pave a way to achieve advanced, flat and multifunctional meta-devices for integrated systems.

preprint2020arXiv

Multiparty Selection

Given a sequence $A$ of $n$ numbers and an integer (target) parameter $1\leq i\leq n$, the (exact) selection problem asks to find the $i$-th smallest element in $A$. An element is said to be $(i,j)$-mediocre if it is neither among the top $i$ nor among the bottom $j$ elements of $A$. The approximate selection problem asks to find an $(i,j)$-mediocre element for some given $i,j$; as such, this variant allows the algorithm to return any element in a prescribed range. In the first part, we revisit the selection problem in the two-party model introduced by Andrew Yao (1979) and then extend our study of exact selection to the multiparty model. In the second part, we deduce some communication complexity benefits that arise in approximate selection. In particular, we present a deterministic protocol for finding an approximate median among $k$ players.
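
The two definitions in this abstract can be made concrete with a small single-machine sketch. It is not the communication protocol from the paper; it only illustrates what exact selection returns and which elements count as $(i,j)$-mediocre. The function names are hypothetical, and the sketch assumes all elements are distinct.

```python
def exact_select(values, i):
    """Return the i-th smallest element (1-indexed) of `values` by sorting."""
    if not 1 <= i <= len(values):
        raise ValueError("i must satisfy 1 <= i <= n")
    return sorted(values)[i - 1]


def mediocre_elements(values, i, j):
    """Return the elements that are neither among the top i (largest)
    nor among the bottom j (smallest). Assumes distinct elements."""
    n = len(values)
    if i < 0 or j < 0 or i + j >= n:
        raise ValueError("need i, j >= 0 and i + j < n")
    ordered = sorted(values)
    # Drop the j smallest and the i largest; what remains is mediocre.
    return ordered[j:n - i]


values = [7, 1, 5, 3, 9]
second_smallest = exact_select(values, 2)
middle = mediocre_elements(values, 1, 1)
```

Any element of `middle` is a valid answer to the approximate selection problem for the given $i, j$, which is why the approximate variant can be cheaper than exact selection.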

preprint2020arXiv

MusPy: A Toolkit for Symbolic Music Generation

In this paper, we present MusPy, an open source Python library for symbolic music generation. MusPy provides easy-to-use tools for essential components in a music generation system, including dataset management, data I/O, data preprocessing and model evaluation. In order to showcase its potential, we present statistical analysis of the eleven datasets currently supported by MusPy. Moreover, we conduct a cross-dataset generalizability experiment by training an autoregressive model on each dataset and measuring held-out likelihood on the others, a process which is made easier by MusPy's dataset management system. The results provide a map of domain overlap between various commonly used datasets and show that some datasets contain more representative cross-genre samples than others. Along with the dataset analysis, these results might serve as a guide for choosing datasets in future research. Source code and documentation are available at https://github.com/salu133445/muspy.

preprint2020arXiv

MVLidarNet: Real-Time Multi-Class Scene Understanding for Autonomous Driving Using Multiple Views

Autonomous driving requires the inference of actionable information such as detecting and classifying objects, and determining the drivable space. To this end, we present Multi-View LidarNet (MVLidarNet), a two-stage deep neural network for multi-class object detection and drivable space segmentation using multiple views of a single LiDAR point cloud. The first stage processes the point cloud projected onto a perspective view in order to semantically segment the scene. The second stage then processes the point cloud (along with semantic labels from the first stage) projected onto a bird's eye view, to detect and classify objects. Both stages use an encoder-decoder architecture. We show that our multi-view, multi-stage, multi-class approach is able to detect and classify objects while simultaneously determining the drivable space using a single LiDAR scan as input, in challenging scenes with more than one hundred vehicles and pedestrians at a time. The system operates efficiently at 150 fps on an embedded GPU designed for a self-driving car, including a postprocessing step to maintain identities over time. We show results on both KITTI and a much larger internal dataset, thus demonstrating the method's ability to scale by an order of magnitude.

preprint2020arXiv

On the Longest Spanning Tree with Neighborhoods

We study a maximization problem for geometric network design. Given a set of $n$ compact neighborhoods in $\mathbb{R}^d$, select a point in each neighborhood, so that the longest spanning tree on these points (as vertices) has maximum length. Here we give an approximation algorithm with ratio $0.511$, which represents the first, albeit small, improvement beyond $1/2$. While we suspect that the problem is NP-hard already in the plane, this issue remains open.
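
To make the objective concrete: once one point has been chosen in each neighborhood, the quantity being maximized is the length of a maximum-weight (longest) spanning tree on those points. The sketch below evaluates that length for a fixed selection using a Prim-style procedure on Euclidean distances. It does not choose the points and is not the paper's approximation algorithm; the function name is hypothetical.

```python
import math


def longest_spanning_tree_length(points):
    """Length of a maximum-weight spanning tree on `points`
    (tuples of coordinates), using Prim's algorithm with max instead of min."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    # best[v] = longest edge connecting v to the current tree.
    best = [-math.inf] * n
    best[0] = 0.0  # start from vertex 0 at no cost
    total = 0.0
    for _ in range(n):
        # Pick the outside vertex with the longest connecting edge.
        u = max((k for k in range(n) if not in_tree[k]), key=lambda k: best[k])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                best[v] = max(best[v], math.dist(points[u], points[v]))
    return total


triangle = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
triangle_length = longest_spanning_tree_length(triangle)
```

For the 3-4-5 right triangle the longest spanning tree uses the two longest sides, so the function returns 9.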

preprint2020arXiv

POP909: A Pop-song Dataset for Music Arrangement Generation

Music arrangement generation is a subtask of automatic music generation, which involves reconstructing and re-conceptualizing a piece with new compositional techniques. Such a generation process inevitably requires reference from the original melody, chord progression, or other structural information. Despite some promising models for arrangement, they lack more refined data to achieve better evaluations and more practical results. In this paper, we propose POP909, a dataset which contains multiple versions of the piano arrangements of 909 popular songs created by professional musicians. The main body of the dataset contains the vocal melody, the lead instrument melody, and the piano accompaniment for each song in MIDI format, which are aligned to the original audio files. Furthermore, we provide the annotations of tempo, beat, key, and chords, where the tempo curves are hand-labeled and others are done by MIR algorithms. Finally, we conduct several baseline experiments with this dataset using standard deep music generation algorithms.

preprint2020arXiv

Random Sampling and Efficient Algorithms for Multiscale PDEs

We describe a numerical framework that uses random sampling to efficiently capture low-rank local solution spaces of multiscale PDE problems arising in domain decomposition. In contrast to existing techniques, our method does not rely on detailed analytical understanding of specific multiscale PDEs, in particular, their asymptotic limits. We present the application of the framework on two examples: a linear kinetic equation and an elliptic equation with rough media. On these two examples, this framework achieves the asymptotic preserving property for the kinetic equations and numerical homogenization for the elliptic equations.

preprint2020arXiv

Structured random sketching for PDE inverse problems

For an overdetermined system $\mathsf{A}\mathsf{x} \approx \mathsf{b}$ with $\mathsf{A}$ and $\mathsf{b}$ given, the least-square (LS) formulation $\min_x \, \|\mathsf{A}\mathsf{x}-\mathsf{b}\|_2$ is often used to find an acceptable solution $\mathsf{x}$. The cost of solving this problem depends on the dimensions of $\mathsf{A}$, which are large in many practical instances. This cost can be reduced by the use of random sketching, in which we choose a matrix $\mathsf{S}$ with fewer rows than $\mathsf{A}$ and $\mathsf{b}$, and solve the sketched LS problem $\min_x \, \|\mathsf{S}(\mathsf{A} \mathsf{x}-\mathsf{b})\|_2$ to obtain an approximate solution to the original LS problem. Significant theoretical and practical progress has been made in the last decade in designing the appropriate structure and distribution for the sketching matrix $\mathsf{S}$. When $\mathsf{A}$ and $\mathsf{b}$ arise from discretizations of a PDE-based inverse problem, tensor structure is often present in $\mathsf{A}$ and $\mathsf{b}$. For reasons of practical efficiency, $\mathsf{S}$ should be designed to have a structure consistent with that of $\mathsf{A}$. Can we claim similar approximation properties for the solution of the sketched LS problem with structured $\mathsf{S}$ as for fully-random $\mathsf{S}$? We give estimates that relate the quality of the solution of the sketched LS problem to the size of the structured sketching matrices, for two different structures. Our results are among the first known for random sketching matrices whose structure is suitable for use in PDE inverse problems.
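
The sketched least-squares idea in this abstract can be illustrated with a minimal NumPy example. It uses a fully random Gaussian sketching matrix $\mathsf{S}$, which is the unstructured baseline the abstract compares against, not the structured sketches the paper analyzes. The problem sizes, the noise level, and the function name are assumptions chosen only for illustration.

```python
import numpy as np


def sketched_least_squares(A, b, sketch_rows, seed=0):
    """Solve min_x ||S(Ax - b)||_2 with a dense Gaussian sketch S.

    S has `sketch_rows` rows (fewer than A) and i.i.d. N(0, 1/sketch_rows)
    entries. This is an unstructured sketch, used here as an assumption.
    """
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((sketch_rows, A.shape[0])) / np.sqrt(sketch_rows)
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x


# Synthetic overdetermined system: 2000 equations, 10 unknowns.
rng = np.random.default_rng(1)
A = rng.standard_normal((2000, 10))
b = A @ rng.standard_normal(10) + 0.01 * rng.standard_normal(2000)

x_full, *_ = np.linalg.lstsq(A, b, rcond=None)
x_sketch = sketched_least_squares(A, b, sketch_rows=200)

# Compare residuals of the original problem at both solutions.
r_full = np.linalg.norm(A @ x_full - b)
r_sketch = np.linalg.norm(A @ x_sketch - b)
```

The sketched solve works with a 200-by-10 matrix instead of a 2000-by-10 one, and `r_sketch` stays close to the optimal residual `r_full`. The question the paper addresses is whether comparable guarantees hold when $\mathsf{S}$ must share the tensor structure of $\mathsf{A}$.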

preprint2020arXiv

Unsupervised Domain Adaptation via Structurally Regularized Deep Clustering

Unsupervised domain adaptation (UDA) is to make predictions for unlabeled data on a target domain, given labeled data on a source domain whose distribution shifts from the target one. Mainstream UDA methods learn aligned features between the two domains, such that a classifier trained on the source features can be readily applied to the target ones. However, such a transferring strategy has a potential risk of damaging the intrinsic discrimination of target data. To alleviate this risk, we are motivated by the assumption of structural domain similarity, and propose to directly uncover the intrinsic target discrimination via discriminative clustering of target data. We constrain the clustering solutions using structural source regularization that hinges on our assumed structural domain similarity. Technically, we use a flexible framework of deep network based discriminative clustering that minimizes the KL divergence between predictive label distribution of the network and an introduced auxiliary one; replacing the auxiliary distribution with that formed by ground-truth labels of source data implements the structural source regularization via a simple strategy of joint network training. We term our proposed method as Structurally Regularized Deep Clustering (SRDC), where we also enhance target discrimination with clustering of intermediate network features, and enhance structural regularization with soft selection of less divergent source examples. Careful ablation studies show the efficacy of our proposed SRDC. Notably, with no explicit domain alignment, SRDC outperforms all existing methods on three UDA benchmarks.

preprint2019arXiv

Multielemental single-atom-thick A layers in nanolaminated V2(Sn, A)C (A=Fe, Co, Ni, Mn) for tailoring magnetic properties

Tailoring of individual single-atom-thick layers in nanolaminated materials offers atomic-level control over material properties. Nonetheless, multielement alloying in individual atomic layers in nanolaminates is largely unexplored. Here, we report a series of inherently nanolaminated V2(A'xSn1-x)C (A'=Fe, Co, Ni and Mn, and combinations thereof, with x=1/3) synthesized by an alloy-guided reaction. With the simultaneous occupancy of the four magnetic elements and Sn, the individual single-atom-thick A layers in the compound constitute high-entropy-alloy analogues, two-dimensional in the sense that the alloying exclusively occurs in the A layers. V2(A'xSn1-x)C exhibits distinct ferromagnetic behavior that can be compositionally tailored through the multielement A-layer alloying. This two-dimensional alloying provides a structural-design route with expanded chemical space for discovering materials and exploiting properties.

preprint2019arXiv

Optical properties of cubic boron arsenide

The ultrahigh thermal conductivity of boron arsenide makes it a promising material for next-generation electronics and optoelectronics. In this work, we report measured optical properties of cubic boron arsenide crystals including the complex dielectric function, refractive index, and absorption coefficient in the ultraviolet, visible, and near-infrared wavelength range. The data were collected at room temperature using spectroscopic ellipsometry as well as transmission and reflection spectroscopy. We further calculate the optical response using density functional and many-body perturbation theory, considering quasiparticle and excitonic corrections. The computed values for the direct and indirect band gaps (4.25 eV and 2.07 eV) agree well with the measured results (4.12 eV and 2.02 eV). Our findings contribute to the effort of using boron arsenide in novel electronic and optoelectronic applications that take advantage of its demonstrated ultrahigh thermal conductivity and predicted high ambipolar carrier mobility.

preprint2019arXiv

The Effect of Explicit Structure Encoding of Deep Neural Networks for Symbolic Music Generation

With recent breakthroughs in artificial neural networks, deep generative models have become one of the leading techniques for computational creativity. Despite very promising progress on image and short sequence generation, symbolic music generation remains a challenging problem since the structure of compositions is usually complicated. In this study, we attempt to solve the melody generation problem constrained by the given chord progression. This music meta-creation problem can also be incorporated into a plan recognition system with user inputs and predictive structural outputs. In particular, we explore the effect of explicit architectural encoding of musical structure via comparing two sequential generative models: LSTM (a type of RNN) and WaveNet (dilated temporal-CNN). As far as we know, this is the first study of applying WaveNet to symbolic music generation, as well as the first systematic comparison between temporal-CNN and RNN for music generation. We conduct a survey to evaluate our generations and implement the Variable Markov Oracle for music pattern discovery. Experimental results show that encoding structure more explicitly using a stack of dilated convolution layers improves the performance significantly, and that a global encoding of the underlying chord progression into the generation procedure gains even more.

preprint2019arXiv

Tooth morphometry using quasi-conformal theory

Shape analysis is important in anthropology, bioarchaeology and forensic science for interpreting useful information from human remains. In particular, teeth are morphologically stable and hence well-suited for shape analysis. In this work, we propose a framework for tooth morphometry using quasi-conformal theory. Landmark-matching Teichmüller maps are used for establishing a 1-1 correspondence between tooth surfaces with prescribed anatomical landmarks. Then, a quasi-conformal statistical shape analysis model based on the Teichmüller mapping results is proposed for building a tooth classification scheme. We deploy our framework on a dataset of human premolars to analyze the tooth shape variation among genders and ancestries. Experimental results show that our method achieves much higher classification accuracy with respect to both gender and ancestry when compared to the existing methods. Furthermore, our model reveals the underlying tooth shape difference between different genders and ancestries in terms of the local geometric distortion and curvatures.