Source author record

Ke Chen

Ke Chen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

67 works

38 topics

4 close collaborators

preprint · 2026 · arXiv

DepthPilot: From Controllability to Interpretability in Colonoscopy Video Generation

Controllable medical video generation has achieved remarkable progress, but it still lacks interpretability, which requires the alignment of generated contents with physical priors and faithful clinical manifestations. To push the boundaries from mere controllability to interpretability, we propose DepthPilot, the first interpretable framework for colonoscopy video generation. This work takes a step toward trustworthy generation through two synergistic paradigms. To achieve explicit geometric grounding, DepthPilot devises a prior distribution alignment strategy, injecting depth constraints into the diffusion backbone via parameter-efficient fine-tuning to ensure anatomical fidelity. To enhance intrinsic nonlinear modeling under these geometric constraints, DepthPilot employs an adaptive spline denoising module, replacing fixed linear weights with learnable spline functions to capture complex spatio-temporal dynamics. Extensive evaluations across three public datasets and in-house clinical data confirm DepthPilot's robust ability to produce physically consistent videos. It achieves FID scores below 15 across all benchmarks and ranks first in clinician assessments, bridging the gap between "visually realistic" and "clinically interpretable". Moreover, DepthPilot-generated videos are expected to enable reliable 3D reconstruction, facilitating surgical navigation and blind region identification, and serve as a foundation toward the colorectal world model.

preprint · 2023 · arXiv

On the Stretch Factor of Polygonal Chains

Let $P=(p_1, p_2, \dots, p_n)$ be a polygonal chain in $\mathbb{R}^d$. The stretch factor of $P$ is the ratio between the total length of $P$ and the distance of its endpoints, $\sum_{i = 1}^{n-1} |p_i p_{i+1}|/|p_1 p_n|$. For a parameter $c \geq 1$, we call $P$ a $c$-chain if $|p_ip_j|+|p_jp_k| \leq c|p_ip_k|$, for every triple $(i,j,k)$, $1 \leq i<j<k \leq n$. The stretch factor is a global property: it measures how close $P$ is to a straight line, and it involves all the vertices of $P$; being a $c$-chain, on the other hand, is a fingerprint-property: it only depends on subsets of $O(1)$ vertices of the chain. We investigate how the $c$-chain property influences the stretch factor in the plane: (i) we show that for every $\varepsilon > 0$, there is a noncrossing $c$-chain that has stretch factor $\Omega(n^{1/2-\varepsilon})$, for sufficiently large constant $c=c(\varepsilon)$; (ii) on the other hand, the stretch factor of a $c$-chain $P$ is $O\left(n^{1/2}\right)$, for every constant $c\geq 1$, regardless of whether $P$ is crossing or noncrossing; and (iii) we give a randomized algorithm that can determine, for a polygonal chain $P$ in $\mathbb{R}^2$ with $n$ vertices, the minimum $c\geq 1$ for which $P$ is a $c$-chain in $O\left(n^{2.5}\ \mathrm{polylog}\ n\right)$ expected time and $O(n\log n)$ space. These results generalize to $\mathbb{R}^d$. For every dimension $d\geq 2$ and every $\varepsilon>0$, we construct a noncrossing $c$-chain that has stretch factor $\Omega\left(n^{(1-\varepsilon)(d-1)/d}\right)$; on the other hand, the stretch factor of any $c$-chain is $O\left((n-1)^{(d-1)/d}\right)$; for every $c>1$, we can test whether an $n$-vertex chain in $\mathbb{R}^d$ is a $c$-chain in $O\left(n^{3-1/d}\ \mathrm{polylog}\ n\right)$ expected time and $O(n\log n)$ space.
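The two definitions in this abstract can be made concrete with a short sketch. It is not the randomized algorithm the abstract describes: it is a plain brute-force check over all triples, taking cubic time, and the function names `stretch_factor` and `min_c` are illustrative choices rather than names from the paper. It assumes the chain's endpoints are distinct.

```python
import math
from itertools import combinations


def stretch_factor(chain):
    """Total length of the chain divided by the distance between its endpoints."""
    total = sum(math.dist(chain[i], chain[i + 1]) for i in range(len(chain) - 1))
    return total / math.dist(chain[0], chain[-1])


def min_c(chain):
    """Smallest c >= 1 with |p_i p_j| + |p_j p_k| <= c |p_i p_k| for all i < j < k.

    Brute force over all O(n^3) triples (an assumption for illustration,
    not the faster randomized method mentioned in the abstract).
    """
    best = 1.0
    # combinations() yields index triples already sorted as i < j < k.
    for i, j, k in combinations(range(len(chain)), 3):
        base = math.dist(chain[i], chain[k])
        if base == 0.0:
            return math.inf  # coincident p_i and p_k: no finite c works
        detour = math.dist(chain[i], chain[j]) + math.dist(chain[j], chain[k])
        best = max(best, detour / base)
    return best
```

Calling `stretch_factor` and `min_c` on a zigzag such as `[(0, 0), (1, 1), (2, 0), (3, 1), (4, 0)]` shows the contrast the abstract draws: the first value depends on the whole chain, the second only on the worst triple.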

preprint · 2023 · arXiv

Ultra-thin Epitaxial MgB2 on SiC: Substrate Surface Polarity Dependent Properties

High quality, ultrathin, superconducting films are required for advanced devices such as hot-electron bolometers, superconducting nanowire single photon detectors, and quantum applications. Using Hybrid Physical-Chemical Vapor Deposition (HPCVD), we show that MgB2 films as thin as 4 nm can be fabricated on the carbon terminated 6H-SiC (0001) surface with a superconducting transition temperature above 33 K and an rms roughness of 0.7 nm. Remarkably, the film quality is a function of the SiC surface termination, with the C-terminated surface preferred to the Si-terminated surface. To understand the MgB2 thin film/SiC substrate interactions giving rise to this difference, we characterized the interfacial structures using Rutherford backscattering spectroscopy/channeling, electron energy loss spectroscopy, and x-ray photoemission spectroscopy. The MgB2/SiC interface structure is complex and different for the two terminations. Both terminations incorporate substantial unintentional oxide layers influencing MgB2 growth and morphology, but with different extent, intermixing and interface chemistry. In this paper, we report measurements of transport, resistivity, and critical superconducting temperature of MgB2/SiC that are different for the two terminations, and link interfacial structure variations to observed differences. The results show that the C face of SiC is a preferred substrate for the deposition of ultrathin superconducting MgB2 films.

preprint · 2022 · arXiv

BiCo-Net: Regress Globally, Match Locally for Robust 6D Pose Estimation

The challenges of learning a robust 6D pose function lie in 1) severe occlusion and 2) systematic noise in depth images. Inspired by the success of point-pair features, the goal of this paper is to recover the 6D pose of an object instance segmented from RGB-D images by locally matching pairs of oriented points between the model and camera space. To this end, we propose a novel Bi-directional Correspondence Mapping Network (BiCo-Net) to first generate point clouds guided by a typical pose regression, which can thus incorporate pose-sensitive information to optimize generation of local coordinates and their normal vectors. As pose predictions via geometric computation only rely on one single pair of local oriented points, our BiCo-Net can achieve robustness against sparse and occluded point clouds. An ensemble of redundant pose predictions from local matching and direct pose regression further refines final pose output against noisy observations. Experimental results on three popular benchmark datasets verify that our method achieves state-of-the-art performance, especially for the more challenging severely occluded scenes. Source codes are available at https://github.com/Gorilla-Lab-SCUT/BiCo-Net.

preprint · 2022 · arXiv

Capturing Evolution Genes for Time Series Data

The modeling of time series is becoming increasingly critical in a wide variety of applications. Overall, data evolves by following different patterns, which are generally caused by different user behaviors. Given a time series, we define the evolution gene to capture the latent user behaviors and to describe how the behaviors lead to the generation of time series. In particular, we propose a uniform framework that recognizes different evolution genes of segments by learning a classifier, and adopt an adversarial generator to implement the evolution gene by estimating the segments' distribution. Experimental results based on a synthetic dataset and five real-world datasets show that our approach can not only achieve good prediction results (e.g., +10.56% on average in terms of F1), but is also able to provide explanations of the results.

preprint · 2022 · arXiv

Classification of Single-View Object Point Clouds

Object point cloud classification has drawn great research attention since the release of benchmarking datasets, such as the ModelNet and the ShapeNet. These benchmarks assume point clouds covering complete surfaces of object instances, for which plenty of high-performing methods have been developed. However, their settings deviate from those often met in practice, where, due to (self-)occlusion, a point cloud covering a partial surface of an object is captured from an arbitrary view. We show in this paper that the performance of existing point cloud classifiers drops drastically under the considered single-view, partial setting; the phenomenon is consistent with the observation that the semantic category of a partial object surface is less ambiguous only when its distribution on the whole surface is clearly specified. To this end, we argue for a single-view, partial setting where supervised learning of object pose estimation should be accompanied with classification. Technically, we propose a baseline method of Pose-Accompanied Point cloud classification Network (PAPNet); built upon SE(3)-equivariant convolutions, the PAPNet learns intermediate pose transformations for equivariant features defined on vector fields, which makes the subsequent classification easier (ideally) in the category-level, canonical pose. By adapting existing ModelNet40 and ScanNet datasets to the single-view, partial setting, experimental results verify the necessity of object pose estimation and the superiority of our PAPNet over existing classifiers.

preprint · 2022 · arXiv

Convolutional Fine-Grained Classification with Self-Supervised Target Relation Regularization

Fine-grained visual classification can be addressed by deep representation learning under supervision of manually pre-defined targets (e.g., one-hot or the Hadamard codes). Such target coding schemes are less flexible to model inter-class correlation and are sensitive to sparse and imbalanced data distribution as well. In light of this, this paper introduces a novel target coding scheme -- dynamic target relation graphs (DTRG), which, as an auxiliary feature regularization, is a self-generated structural output to be mapped from input images. Specifically, online computation of class-level feature centers is designed to generate cross-category distance in the representation space, which can thus be depicted by a dynamic graph in a non-parametric manner. Explicitly minimizing intra-class feature variations anchored on those class-level centers can encourage learning of discriminative features. Moreover, owing to exploiting inter-class dependency, the proposed target graphs can alleviate data sparsity and imbalance in representation learning. Inspired by recent success of the mixup style data augmentation, this paper introduces randomness into soft construction of dynamic target relation graphs to further explore relation diversity of target classes. Experimental results demonstrate the effectiveness of our method on a number of diverse benchmarks of multiple visual classification tasks, especially achieving the state-of-the-art performance on popular fine-grained object benchmarks and superior robustness against sparse and imbalanced data. Source codes are made publicly available at https://github.com/AkonLau/DTRG.

preprint · 2022 · arXiv

Fine-Grained Object Classification via Self-Supervised Pose Alignment

Semantic patterns of fine-grained objects are determined by subtle appearance differences of local parts, which thus inspires a number of part-based methods. However, due to uncontrollable object poses in images, distinctive details carried by local regions can be spatially distributed or even self-occluded, leading to a large variation in object representation. For discounting pose variations, this paper proposes to learn a novel graph based object representation to reveal a global configuration of local parts for self-supervised pose alignment across classes, which is employed as an auxiliary feature regularization on a deep representation learning network. Moreover, a coarse-to-fine supervision together with the proposed pose-insensitive constraint on shallow-to-deep sub-networks encourages discriminative features in a curriculum learning manner. We evaluate our method on three popular fine-grained object classification benchmarks, consistently achieving the state-of-the-art performance. Source codes are available at https://github.com/yangxh11/P2P-Net.

preprint · 2022 · arXiv

Fisher Matrix Based Fault Detection for PMUs Data in Power Grids

Abnormal event detection is critical to the safe operation of power systems. In this paper, using the data collected from phasor measurement units (PMUs), two methods based on the Fisher random matrix are proposed to detect faults in power grids. Firstly, the fault detection matrix is constructed and the event detection problem is reformulated as a two-sample covariance matrices test problem. Secondly, the central limit theorem for the linear spectral statistic of the Fisher matrix is derived and a test statistic for testing faults is proposed. To save computing resources, a screening step of the fault interval based on the test statistic is designed to check the existence of faults. Then two point-by-point methods are proposed to determine the time of the fault in the interval. One method detects faults by checking whether the largest sample eigenvalue falls outside the supporting set of the limiting spectral distribution of the standard Fisher matrix, which can detect the faults with higher accuracy. The other method tests the faults based on the statistic proposed, which has a faster detection speed. Compared with existing works, the simulation results illustrate that the two methods proposed in this paper cost less computational time and provide a higher degree of accuracy.

preprint · 2022 · arXiv

Global well-posedness of the 1d compressible Navier-Stokes system with rough data

In this paper, we study the global well-posedness problem for the 1d compressible Navier-Stokes system (cNSE) in gas dynamics with rough initial data. First, Liu-Yu (2022) established the global well-posedness theory for the 1d isentropic cNSE with initial velocity data in BV space. Then, it was extended to the 1d cNSE for the polytropic ideal gas with initial velocity and temperature data in BV space by Wang-Yu-Zhang (2022). We improve the global well-posedness result of Liu-Yu with initial velocity data in $W^{2\gamma,1}$ space; and of Wang-Yu-Zhang with initial velocity data in $L^2\cap W^{2\gamma,1}$ space and initial data of temperature in $\dot W^{-\frac{2}{3},\frac{6}{5}}\cap \dot W^{2\gamma-1,1}$ for any $\gamma>0$ arbitrarily small. Our essential ideas are based on establishing various "end-point" smoothing estimates for the 1d parabolic equation.

preprint · 2022 · arXiv

HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection

Audio classification is an important task of mapping audio samples into their corresponding labels. Recently, the transformer model with self-attention mechanisms has been adopted in this field. However, existing audio transformers require large GPU memories and long training time, meanwhile relying on pretrained vision models to achieve high performance, which limits the model's scalability in audio tasks. To combat these problems, we introduce HTS-AT: an audio transformer with a hierarchical structure to reduce the model size and training time. It is further combined with a token-semantic module to map final outputs into class feature maps, thus enabling the model to perform audio event detection (i.e. localization in time). We evaluate HTS-AT on three datasets of audio classification where it achieves new state-of-the-art (SOTA) results on AudioSet and ESC-50, and equals the SOTA on Speech Command V2. It also achieves better performance in event localization than the previous CNN-based models. Moreover, HTS-AT requires only 35% of the model parameters and 15% of the training time of the previous audio transformer. These results demonstrate the high performance and high efficiency of HTS-AT.

preprint · 2022 · arXiv

Improving Choral Music Separation through Expressive Synthesized Data from Sampled Instruments

Choral music separation refers to the task of extracting tracks of voice parts (e.g., soprano, alto, tenor, and bass) from mixed audio. The lack of datasets has impeded research on this topic as previous work has only been able to train and evaluate models on a few minutes of choral music data due to copyright issues and dataset collection difficulties. In this paper, we investigate the use of synthesized training data for the source separation task on real choral music. We make three contributions: first, we provide an automated pipeline for synthesizing choral music data from sampled instrument plugins within controllable options for instrument expressiveness. This produces an 8.2-hour-long choral music dataset from the JSB Chorales Dataset and one can easily synthesize additional data. Second, we conduct an experiment to evaluate multiple separation models on available choral music separation datasets from previous work. To the best of our knowledge, this is the first experiment to comprehensively evaluate choral music separation. Third, experiments demonstrate that the synthesized choral data is of sufficient quality to improve the model's performance on real choral music datasets. This provides additional experimental statistics and data support for the choral music separation study.

preprint · 2022 · arXiv

Local Rotational Jamming and Multi-Scale Hyperuniformities in an Active Spinner System

An active system consisting of many self-spinning dimers is simulated, and a distinct local rotational jamming transition is observed as the density increases. In the low density regime, the system stays in an absorbing state, in which each dimer rotates independently subject to the applied torque. In the high density regime, by contrast, a fraction of the dimers become rotationally jammed into local clusters, and the system exhibits spinodal-decomposition-like two-phase morphologies. For high enough densities, the system becomes completely jammed in both rotational and translational degrees of freedom. Such a simple system is found to exhibit rich and multiscale disordered hyperuniformities among the above phases: the absorbing state shows a critical hyperuniformity of the strongest class and subcritically preserves the vanishing density-fluctuation scaling up to some length scale; the locally-jammed state shows a two-phase hyperuniformity conversely beyond some length scale with respect to the phase cluster sizes; the totally jammed state appears to be a monomer crystal, but intrinsically loses large-scale hyperuniformity. These results are inspiring for designing novel phase-separation and disordered hyperuniform systems through dynamical organization.

preprint · 2022 · arXiv

Local well-posedness of the $1d$ compressible Navier-Stokes system with rough data

This paper presents a new approach to the local well-posedness of the $1d$ compressible Navier-Stokes systems with rough initial data. Our approach is based on establishing some smoothing and Lipschitz-type estimates for the $1d$ parabolic equation with piecewise continuous coefficients.

preprint · 2022 · arXiv

Locality-sensitive bucketing functions for the edit distance

Many bioinformatics applications involve bucketing a set of sequences where each sequence is allowed to be assigned into multiple buckets. To achieve both high sensitivity and precision, bucketing methods are desired to assign similar sequences into the same bucket while assigning dissimilar sequences into distinct buckets. Existing $k$-mer-based bucketing methods have been efficient in processing sequencing data with low error rate, but encounter much reduced sensitivity on data with high error rate. Locality-sensitive hashing (LSH) schemes are able to mitigate this issue through tolerating the edits in similar sequences, but state-of-the-art methods still have large gaps. Here we generalize the LSH function by allowing it to hash one sequence into multiple buckets. Formally, a bucketing function, which maps a sequence (of fixed length) into a subset of buckets, is defined to be $(d_1, d_2)$-sensitive if any two sequences within an edit distance of $d_1$ are mapped into at least one shared bucket, and any two sequences with distance at least $d_2$ are mapped into disjoint subsets of buckets. We construct locality-sensitive bucketing (LSB) functions with a variety of values of $(d_1,d_2)$ and analyze their efficiency with respect to the total number of buckets needed as well as the number of buckets that a specific sequence is mapped to. We also prove lower bounds of these two parameters in different settings and show that some of our constructed LSB functions are optimal. These results provide theoretical foundations for their practical use in analyzing sequences with high error rate while also providing insights for the hardness of designing ungapped LSH functions.
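The $(d_1, d_2)$-sensitivity condition defined in this abstract can be checked exhaustively for tiny alphabets and lengths. The sketch below is only an illustration of the definition, not one of the paper's constructions: `is_sensitive` is a hypothetical helper name, the bucketing function is assumed to return a Python set of bucket labels, and the enumeration is exponential in the sequence length.

```python
from itertools import product


def edit_distance(a, b):
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def is_sensitive(bucketing, alphabet, length, d1, d2):
    """Check (d1, d2)-sensitivity over all sequences of the given fixed length.

    `bucketing` maps a sequence to a set of buckets (hypothetical interface).
    Pairs within distance d1 must share a bucket; pairs at distance >= d2
    must have disjoint bucket sets. Distances strictly between are unconstrained.
    """
    seqs = ["".join(t) for t in product(alphabet, repeat=length)]
    for a in seqs:
        for b in seqs:
            d = edit_distance(a, b)
            shared = bucketing(a) & bucketing(b)
            if d <= d1 and not shared:
                return False
            if d >= d2 and shared:
                return False
    return True
```

For example, the identity bucketing `lambda s: {s}` passes `is_sensitive(..., "ab", 2, 0, 1)` but fails for `(d1, d2) = (1, 2)`, since sequences one edit apart land in different buckets.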

preprint · 2022 · arXiv

Low-rank approximation for multiscale PDEs

Historically, analysis for multiscale PDEs is largely unified while numerical schemes tend to be equation-specific. In this paper, we propose a unified framework for computing multiscale problems through random sampling. This is achieved by incorporating randomized SVD solvers and manifold learning techniques to numerically reconstruct the low-rank features of multiscale PDEs. We use the multiscale radiative transfer equation and the elliptic equation with rough media to showcase the application of this framework.

preprint · 2022 · arXiv

Quasi-Balanced Self-Training on Noise-Aware Synthesis of Object Point Clouds for Closing Domain Gap

Semantic analyses of object point clouds are largely driven by the release of benchmark datasets, including synthetic ones whose instances are sampled from object CAD models. However, learning from synthetic data may not generalize to practical scenarios, where point clouds are typically incomplete, non-uniformly distributed, and noisy. Such a challenge of Simulation-to-Reality (Sim2Real) domain gap could be mitigated via learning algorithms of domain adaptation; however, we argue that generation of synthetic point clouds via more physically realistic rendering is a powerful alternative, as systematic non-uniform noise patterns can be captured. To this end, we propose an integrated scheme consisting of physically realistic synthesis of object point clouds, by rendering stereo images with speckle patterns projected onto CAD models, and a novel quasi-balanced self-training designed for more balanced data distribution by sparsity-driven selection of pseudo labeled samples for long tailed classes. Experimental results verify the effectiveness of our method as well as both of its modules for unsupervised domain adaptation on point cloud classification, achieving the state-of-the-art performance. Source codes and the SpeckleNet synthetic dataset are available at https://github.com/Gorilla-Lab-SCUT/QS3.

preprint · 2022 · arXiv

TONet: Tone-Octave Network for Singing Melody Extraction from Polyphonic Music

Singing melody extraction is an important problem in the field of music information retrieval. Existing methods typically rely on frequency-domain representations to estimate the sung frequencies. However, this design does not lead to human-level performance in the perception of melody information for both tone (pitch-class) and octave. In this paper, we propose TONet, a plug-and-play model that improves both tone and octave perceptions by leveraging a novel input representation and a novel network architecture. First, we present an improved input representation, the Tone-CFP, that explicitly groups harmonics via a rearrangement of frequency-bins. Second, we introduce an encoder-decoder architecture that is designed to obtain a salience feature map, a tone feature map, and an octave feature map. Third, we propose a tone-octave fusion mechanism to improve the final salience feature map. Experiments are done to verify the capability of TONet with various baseline backbone models. Our results show that tone-octave fusion with Tone-CFP can significantly improve the singing voice extraction performance across various datasets -- with substantial gains in octave and tone accuracy.
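The split between tone (pitch class) and octave that this abstract relies on can be illustrated with the standard equal-temperament mapping from frequency to note. This is a generic sketch, not TONet's Tone-CFP representation or network; the function name `tone_and_octave` and the A4 = 440 Hz reference are assumptions made for the example.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def tone_and_octave(freq_hz, a4_hz=440.0):
    """Map a frequency to (pitch-class name, octave number).

    Uses the MIDI convention: note 69 is A4, 12 semitones per octave,
    and octave numbers follow scientific pitch notation (MIDI 60 is C4).
    """
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return NOTE_NAMES[midi % 12], midi // 12 - 1
```

Under this mapping, 440 Hz and 880 Hz share the tone `A` and differ only in octave, which is the kind of error (right tone, wrong octave) that a separate octave estimate is meant to catch.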

preprint · 2022 · arXiv

Weak solutions of the three-dimensional hypoviscous elastodynamics with finite kinetic energy

We construct weak solutions to the 3D hypoviscous incompressible elastodynamics with finite kinetic energy, which were previously unknown in the literature. Our result holds for fractional hypoviscosity $(-\Delta)^\theta$, where $0\leq\theta<1$. The proof consists of a convex integration scheme with new building blocks of 2D intermittency and suitable temporal correctors, which are motivated by the inherent geometric structure of the viscoelastic equations.

preprint · 2022 · arXiv

Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data

Deep learning techniques for separating audio into different sound sources face several challenges. Standard architectures require training separate models for different types of audio sources. Although some universal separators employ a single model to target multiple sources, they have difficulty generalizing to unseen sources. In this paper, we propose a three-component pipeline to train a universal audio source separator from a large, but weakly-labeled dataset: AudioSet. First, we propose a transformer-based sound event detection system for processing weakly-labeled training data. Second, we devise a query-based audio separation model that leverages this data for model training. Third, we design a latent embedding processor to encode queries that specify audio targets for separation, allowing for zero-shot generalization. Our approach uses a single model for source separation of multiple sound types, and relies solely on weakly-labeled data for training. In addition, the proposed audio separator can be used in a zero-shot setting, learning to separate types of audio sources that were never seen in training. To evaluate the separation performance, we test our model on MUSDB18, while training on the disjoint AudioSet. We further verify the zero-shot performance by conducting another experiment on audio source types that are held-out from training. The model achieves comparable Source-to-Distortion Ratio (SDR) performance to current supervised models in both cases.

preprint · 2021 · arXiv

MAX Phase Zr2SeC and Its Thermal Conduction Behavior

Elemental diversity is crucial to screen out ternary MAX phases with outstanding properties via tuning of bonding types and strength between constitutive atoms. As a matter of fact, the interactions between M and A atoms largely determine the physical and chemical properties of MAX phases. Herein, the Se element was experimentally realized to occupy the A site of a MAX phase, Zr2SeC, becoming a new member within this nanolaminated ternary carbide family. Comprehensive characterizations including Rietveld refinement of X-ray diffraction and atom-resolved transmission electron microscopy techniques were employed to validate this novel MAX phase. The distinct thermal conduction behaviors that emerged are attributed to the characteristic interactions between Zr and Se atoms.

preprint · 2021 · arXiv

Reduced Ionic Diffusion by the Dynamic Electron-Ion Collisions in Warm Dense Hydrogen

The dynamic electron-ion collisions play an important role in determining the static and transport properties of warm dense matter (WDM). Electron force field (eFF) method is applied to study the ionic transport properties of warm dense hydrogen. Compared with the results from quantum molecular dynamics and orbital-free molecular dynamics, the ionic diffusions are largely reduced by involving the dynamic collisions of electrons and ions. This physics is verified by quantum Langevin molecular dynamics (QLMD) simulations, which includes electron-ion collisions induced friction (EI-CIF) into the dynamic equation of ions. Based on these new results, we proposed a model including the correction of collisions induced friction of the ionic diffusion. The CIF model has been verified to be valid at a wide range of density and temperature. We also compare the results with the Yukawa one component plasma (YOCP) model and Effective OCP (EOCP) model. We proposed to calculate the self-diffusion coefficients using the EOCP model modified by the CIF model to introduce the dynamic electron-ion collisions effect.

preprint · 2020 · arXiv

A Framework for End-to-End Learning on Semantic Tree-Structured Data

While learning models are typically studied for inputs in the form of a fixed dimensional feature vector, real world data is rarely found in this form. In order to meet the basic requirement of traditional learning models, structural data generally have to be converted into fixed-length vectors in a handcrafted manner, which is tedious and may even incur information loss. A common form of structured data is what we term "semantic tree-structures", corresponding to data where rich semantic information is encoded in a compositional manner, such as those expressed in JavaScript Object Notation (JSON) and eXtensible Markup Language (XML). For tree-structured data, several learning models have been studied to allow for working directly on raw tree-structure data. However, such learning models are limited to either a specific tree-topology or a specific tree-structured data format, e.g., synthetic parse trees. In this paper, we propose a novel framework for end-to-end learning on generic semantic tree-structured data of arbitrary topology and heterogeneous data types, such as data expressed in JSON, XML and so on. Motivated by the works in recursive and recurrent neural networks, we develop exemplar neural implementations of our framework for the JSON format. We evaluate our approach on several UCI benchmark datasets, including ablation and data-efficiency studies, and on a toy reinforcement learning task. Experimental results suggest that our framework yields comparable performance to use of standard models with dedicated feature-vectors in general, and even exceeds baseline performance in cases where compositional nature of the data is particularly important. The source code for a JSON-based implementation of our framework along with experiments can be downloaded at https://github.com/EndingCredits/json2vec.

preprint · 2020 · arXiv

A low-rank Schwarz method for radiative transport equation with heterogeneous scattering coefficient

Random sampling has been used to find low-rank structure and to build fast direct solvers for multiscale partial differential equations of various types. In this work, we design an accelerated Schwarz method for radiative transfer equations that makes use of approximate local solution maps constructed offline via a random sampling strategy. Numerical examples demonstrate the accuracy, robustness, and efficiency of the proposed approach.

preprint · 2020 · arXiv

CAD-PU: A Curvature-Adaptive Deep Learning Solution for Point Set Upsampling

Point set is arguably the most direct approximation of an object or scene surface, yet its practical acquisition often suffers from the shortcoming of being noisy, sparse, and possibly incomplete, which restricts its use for a high-quality surface recovery. Point set upsampling aims to increase its density and regularity such that a better surface recovery could be achieved. The problem is severely ill-posed and challenging, considering that the upsampling target itself is only an approximation of the underlying surface. Motivated to improve the surface approximation via point set upsampling, we identify the factors that are critical to the objective, by pairing the surface approximation error bounds of the input and output point sets. It suggests that given a fixed budget of points in the upsampling result, more points should be distributed onto the surface regions where local curvatures are relatively high. To implement the motivation, we propose a novel design of Curvature-ADaptive Point set Upsampling network (CAD-PU), the core of which is a module of curvature-adaptive feature expansion. To train CAD-PU, we follow the same motivation and propose geometrically intuitive surrogates that approximate discrete notions of surface curvature for the upsampled point set. We further integrate the proposed surrogates into an adversarial learning based curvature minimization objective, which gives a practically effective learning of CAD-PU. We conduct thorough experiments that show the efficacy of our contributions and the advantages of our method over existing ones. Our implementation codes are publicly available at https://github.com/JiehongLin/CAD-PU.

preprint2020arXiv

Compositional Few-Shot Recognition with Primitive Discovery and Enhancing

Few-shot learning (FSL) aims at recognizing novel classes given only a few training samples, which remains a great challenge for deep learning. However, humans can easily recognize novel classes with only a few samples. A key component of this ability is the compositional recognition that humans can perform, which has been well studied in cognitive science but is not well explored in FSL. Inspired by this capability, and to imitate humans' ability to learn visual primitives and compose them to recognize novel classes, we propose an approach to FSL that learns a feature representation composed of important primitives, which is jointly trained with two parts, i.e. primitive discovery and primitive enhancing. In primitive discovery, we focus on learning primitives related to object parts by self-supervision from the order of image splits, avoiding extra laborious annotations and alleviating the effect of semantic gaps. In primitive enhancing, inspired by current studies on the interpretability of deep networks, we provide our composition view for the FSL baseline model. To modify this model for effective composition, inspired by both mathematical deduction and biological studies (the Hebbian Learning rule and the Winner-Take-All mechanism), we propose a soft composition mechanism by enlarging the activation of important primitives while reducing that of others, so as to enhance the influence of important primitives and better utilize these primitives to compose novel classes. Extensive experiments on public benchmarks are conducted on both the few-shot image classification and video recognition tasks. Our method achieves the state-of-the-art performance on all these datasets and shows better interpretability.

preprint2020arXiv

Continuous Melody Generation via Disentangled Short-Term Representations and Structural Conditions

Automatic music generation is an interdisciplinary research topic that combines computational creativity and semantic analysis of music to create automatic machine improvisations. An important property of such a system is allowing the user to specify conditions and desired properties of the generated music. In this paper we design a model for composing melodies given a user-specified symbolic scenario combined with a previous music context. We add manually labeled vectors denoting external music quality in terms of chord function, which provide a low-dimensional representation of the harmonic tension and resolution. Our model is capable of generating long melodies by regarding 8-beat note sequences as basic units, and shares consistent rhythm pattern structure with another specific song. The model contains two stages that require separate training: the first stage adopts a Conditional Variational Autoencoder (C-VAE) to build a bijection between note sequences and their latent representations, and the second stage adopts long short-term memory networks (LSTM) with structural conditions to continue writing future melodies. We further exploit the disentanglement technique via C-VAE to allow melody generation based on pitch contour information separately from conditioning on rhythm patterns. Finally, we evaluate the proposed model using quantitative analysis of rhythm and a subjective listening study. Results show that the music generated by our model tends to have salient repetition structures, rich motives, and stable rhythm patterns. The ability to generate longer and more structural phrases from disentangled representations combined with semantic scenario specification conditions shows a broad application of our model.

preprint2020arXiv

Efficient Feature-based Image Registration by Mapping Sparsified Surfaces

With advances in digital camera technology, the use of high resolution images and videos has become widespread in modern society. In particular, image and video frame registration is frequently applied in computer graphics and film production. However, conventional registration approaches usually require long computational time for high resolution images and video frames. This hinders the application of these approaches in modern industries. In this work, we first propose a new image representation method to accelerate the registration process by triangulating the images effectively. For each high resolution image or video frame, we compute an optimal coarse triangulation which captures the important features of the image. Then, we apply a surface registration algorithm to obtain a registration map which is used to compute the registration of the high resolution image. Experimental results suggest that our overall algorithm is efficient and capable of achieving a high compression rate while the accuracy of the registration is well retained when compared with the conventional grid-based approach. Also, the computational time of the registration is significantly reduced using our triangulation-based approach.

preprint2020arXiv

Improving Semantic Analysis on Point Clouds via Auxiliary Supervision of Local Geometric Priors

Existing deep learning algorithms for point cloud analysis mainly concern discovering semantic patterns from global configuration of local geometries in a supervised learning manner. However, very few explore geometric properties revealing local surface manifolds embedded in 3D Euclidean space to discriminate semantic classes or object parts as additional supervision signals. This paper is the first attempt to propose a unique multi-task geometric learning network to improve semantic analysis by auxiliary geometric learning with local shape properties, which can be either generated via physical computation from point clouds themselves as self-supervision signals or provided as privileged information. Owing to explicitly encoding local shape manifolds in favor of semantic analysis, the proposed geometric self-supervised and privileged learning algorithms can achieve superior performance to their backbone baselines and other state-of-the-art methods, which are verified in the experiments on the popular benchmarks.

preprint2020arXiv

Independent wavefront tailoring in full polarization channels by helicity-decoupled metasurface

Controlling the polarization and wavefront of light is essential for compact photonic systems in modern science and technology. This may be achieved by metasurfaces, a new platform that has radically changed the way people engineer wave-matter interactions. However, it remains very challenging to generate versatile beams with arbitrary and independent wavefronts in each polarization channel using a single ultrathin metasurface. By modulating both the geometric and propagation phases of the metasurface, here we propose a method that can generate an assembly of circularly and linearly polarized beams while independently encoding a desired wavefront in each individual polarization channel, which we believe will greatly enhance the information capacities of meta-devices. Two proof-of-concept designs are experimentally demonstrated in the microwave region. Upon excitation by an arbitrary linear polarization, the first device can generate distinct vortex beams with two linear and two circular orthogonal polarizations, as desired, whereas the second one can generate multi-foci containing components of full polarizations. This approach to generating versatile polarizations with tailored wavefronts may pave the way to advanced, flat and multifunctional meta-devices for integrated systems.

preprint2020arXiv

Multiparty Selection

Given a sequence $A$ of $n$ numbers and an integer (target) parameter $1\leq i\leq n$, the (exact) selection problem asks to find the $i$-th smallest element in $A$. An element is said to be $(i,j)$-mediocre if it is neither among the top $i$ nor among the bottom $j$ elements of $A$. The approximate selection problem asks to find an $(i,j)$-mediocre element for some given $i,j$; as such, this variant allows the algorithm to return any element in a prescribed range. In the first part, we revisit the selection problem in the two-party model introduced by Andrew Yao (1979) and then extend our study of exact selection to the multiparty model. In the second part, we deduce some communication complexity benefits that arise in approximate selection. In particular, we present a deterministic protocol for finding an approximate median among $k$ players.
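
The mediocre-element definition can be made concrete with a small single-machine sketch. It is only an illustration of the definition, not the communication protocol from the paper; the function names are hypothetical, and the code assumes that all elements are distinct and that "top" means largest and "bottom" means smallest.

```python
def is_mediocre(seq, x, i, j):
    """Return True if x is (i, j)-mediocre in seq.

    Assumption: the elements of seq are distinct and x occurs in seq.
    """
    ordered = sorted(seq)
    n = len(ordered)
    rank = ordered.index(x)       # 0-based position counted from the smallest
    in_bottom_j = rank < j        # among the j smallest elements
    in_top_i = rank >= n - i      # among the i largest elements
    return not in_bottom_j and not in_top_i


def mediocre_elements(seq, i, j):
    """All (i, j)-mediocre elements; any one of them is a valid approximate answer."""
    ordered = sorted(seq)
    return ordered[j:len(ordered) - i]


values = [7, 1, 9, 4, 3, 8, 2]
candidates = mediocre_elements(values, 2, 2)
```

Approximate selection is free to return any element of `candidates`, which is why it can be cheaper than exact selection.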

preprint2020arXiv

MusPy: A Toolkit for Symbolic Music Generation

In this paper, we present MusPy, an open source Python library for symbolic music generation. MusPy provides easy-to-use tools for essential components in a music generation system, including dataset management, data I/O, data preprocessing and model evaluation. In order to showcase its potential, we present statistical analysis of the eleven datasets currently supported by MusPy. Moreover, we conduct a cross-dataset generalizability experiment by training an autoregressive model on each dataset and measuring held-out likelihood on the others, a process which is made easier by MusPy's dataset management system. The results provide a map of domain overlap between various commonly used datasets and show that some datasets contain more representative cross-genre samples than others. Along with the dataset analysis, these results might serve as a guide for choosing datasets in future research. Source code and documentation are available at https://github.com/salu133445/muspy.

preprint2020arXiv

MVLidarNet: Real-Time Multi-Class Scene Understanding for Autonomous Driving Using Multiple Views

Autonomous driving requires the inference of actionable information such as detecting and classifying objects, and determining the drivable space. To this end, we present Multi-View LidarNet (MVLidarNet), a two-stage deep neural network for multi-class object detection and drivable space segmentation using multiple views of a single LiDAR point cloud. The first stage processes the point cloud projected onto a perspective view in order to semantically segment the scene. The second stage then processes the point cloud (along with semantic labels from the first stage) projected onto a bird's eye view, to detect and classify objects. Both stages use an encoder-decoder architecture. We show that our multi-view, multi-stage, multi-class approach is able to detect and classify objects while simultaneously determining the drivable space using a single LiDAR scan as input, in challenging scenes with more than one hundred vehicles and pedestrians at a time. The system operates efficiently at 150 fps on an embedded GPU designed for a self-driving car, including a postprocessing step to maintain identities over time. We show results on both KITTI and a much larger internal dataset, thus demonstrating the method's ability to scale by an order of magnitude.

preprint2020arXiv

On the Longest Spanning Tree with Neighborhoods

We study a maximization problem for geometric network design. Given a set of $n$ compact neighborhoods in $\mathbb{R}^d$, select a point in each neighborhood, so that the longest spanning tree on these points (as vertices) has maximum length. Here we give an approximation algorithm with ratio $0.511$, which represents the first, albeit small, improvement beyond $1/2$. While we suspect that the problem is NP-hard already in the plane, this issue remains open.
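
To make the objective concrete, the following sketch evaluates the longest spanning tree for a fixed choice of points and then brute-forces the choice over small finite neighborhoods. It is not the 0.511-approximation algorithm from the paper. Treating each neighborhood as a finite list of candidate points is an assumption made purely for illustration, and the function names are hypothetical.

```python
import itertools
import math


def longest_spanning_tree_length(points):
    """Total length of a maximum-weight spanning tree on the points (Prim-style)."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    in_tree[0] = True
    # best[v] is the longest edge from v to any vertex already in the tree.
    best = [math.dist(points[0], p) for p in points]
    total = 0.0
    for _ in range(n - 1):
        u = max((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        total += best[u]
        in_tree[u] = True
        for v in range(n):
            if not in_tree[v]:
                best[v] = max(best[v], math.dist(points[u], points[v]))
    return total


def best_selection(neighborhoods):
    """Exhaustive search: pick one point per (finite) neighborhood to maximize the tree."""
    return max(
        (longest_spanning_tree_length(choice), choice)
        for choice in itertools.product(*neighborhoods)
    )


neighborhoods = [[(0, 0), (1, 0)], [(3, 0)], [(5, 0), (4, 0)]]
length, choice = best_selection(neighborhoods)
```

The exhaustive search is exponential in the number of neighborhoods, which is the cost an approximation algorithm is meant to avoid.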

preprint2020arXiv

POP909: A Pop-song Dataset for Music Arrangement Generation

Music arrangement generation is a subtask of automatic music generation, which involves reconstructing and re-conceptualizing a piece with new compositional techniques. Such a generation process inevitably requires reference from the original melody, chord progression, or other structural information. Despite some promising models for arrangement, they lack more refined data to achieve better evaluations and more practical results. In this paper, we propose POP909, a dataset which contains multiple versions of the piano arrangements of 909 popular songs created by professional musicians. The main body of the dataset contains the vocal melody, the lead instrument melody, and the piano accompaniment for each song in MIDI format, which are aligned to the original audio files. Furthermore, we provide the annotations of tempo, beat, key, and chords, where the tempo curves are hand-labeled and others are done by MIR algorithms. Finally, we conduct several baseline experiments with this dataset using standard deep music generation algorithms.

preprint2020arXiv

Random Sampling and Efficient Algorithms for Multiscale PDEs

We describe a numerical framework that uses random sampling to efficiently capture low-rank local solution spaces of multiscale PDE problems arising in domain decomposition. In contrast to existing techniques, our method does not rely on detailed analytical understanding of specific multiscale PDEs, in particular, their asymptotic limits. We present the application of the framework on two examples: a linear kinetic equation and an elliptic equation with rough media. On these two examples, this framework achieves the asymptotic preserving property for the kinetic equations and numerical homogenization for the elliptic equations.

preprint2020arXiv

Structured random sketching for PDE inverse problems

For an overdetermined system $\mathsf{A}\mathsf{x} \approx \mathsf{b}$ with $\mathsf{A}$ and $\mathsf{b}$ given, the least-square (LS) formulation $\min_x \, \|\mathsf{A}\mathsf{x}-\mathsf{b}\|_2$ is often used to find an acceptable solution $\mathsf{x}$. The cost of solving this problem depends on the dimensions of $\mathsf{A}$, which are large in many practical instances. This cost can be reduced by the use of random sketching, in which we choose a matrix $\mathsf{S}$ with fewer rows than $\mathsf{A}$ and $\mathsf{b}$, and solve the sketched LS problem $\min_x \, \|\mathsf{S}(\mathsf{A} \mathsf{x}-\mathsf{b})\|_2$ to obtain an approximate solution to the original LS problem. Significant theoretical and practical progress has been made in the last decade in designing the appropriate structure and distribution for the sketching matrix $\mathsf{S}$. When $\mathsf{A}$ and $\mathsf{b}$ arise from discretizations of a PDE-based inverse problem, tensor structure is often present in $\mathsf{A}$ and $\mathsf{b}$. For reasons of practical efficiency, $\mathsf{S}$ should be designed to have a structure consistent with that of $\mathsf{A}$. Can we claim similar approximation properties for the solution of the sketched LS problem with structured $\mathsf{S}$ as for fully-random $\mathsf{S}$? We give estimates that relate the quality of the solution of the sketched LS problem to the size of the structured sketching matrices, for two different structures. Our results are among the first known for random sketching matrices whose structure is suitable for use in PDE inverse problems.
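
The sketched least-squares idea can be illustrated with a minimal pure-Python example. It uses a fully random Gaussian sketch, not the structured sketching matrices the paper analyzes, and it is restricted to two unknowns so that the normal equations can be solved by hand. The function names and the test problem are hypothetical.

```python
import random


def solve_ls_2col(A, b):
    """Least-squares solution for a two-column matrix A via the normal equations."""
    a11 = sum(r[0] * r[0] for r in A)
    a12 = sum(r[0] * r[1] for r in A)
    a22 = sum(r[1] * r[1] for r in A)
    g1 = sum(r[0] * bi for r, bi in zip(A, b))
    g2 = sum(r[1] * bi for r, bi in zip(A, b))
    det = a11 * a22 - a12 * a12
    return ((a22 * g1 - a12 * g2) / det, (a11 * g2 - a12 * g1) / det)


def gaussian_sketch(A, b, s, rng):
    """Apply an s-by-m Gaussian matrix S to A and b, returning (SA, Sb)."""
    m = len(A)
    SA, Sb = [], []
    for _ in range(s):
        # One row of S; the 1/sqrt(s) scaling does not change the minimizer.
        row = [rng.gauss(0.0, 1.0) / s ** 0.5 for _ in range(m)]
        SA.append([sum(w * A[k][c] for k, w in enumerate(row)) for c in range(2)])
        Sb.append(sum(w * b[k] for k, w in enumerate(row)))
    return SA, Sb


rng = random.Random(0)
m, s = 200, 20                      # 200 equations sketched down to 20
x_true = (2.0, -1.0)
A = [[1.0, k / (m - 1)] for k in range(m)]
b = [r[0] * x_true[0] + r[1] * x_true[1] + rng.gauss(0.0, 0.01) for r in A]

x_full = solve_ls_2col(A, b)        # solution of the original LS problem
SA, Sb = gaussian_sketch(A, b, s, rng)
x_sketch = solve_ls_2col(SA, Sb)    # solution of the sketched LS problem
```

Comparing `x_sketch` with `x_full` for different values of `s` gives a rough feel for how the sketch size affects solution quality, which is the relationship the estimates in the abstract quantify for structured sketches.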

preprint2020arXiv

Unsupervised Domain Adaptation via Structurally Regularized Deep Clustering

Unsupervised domain adaptation (UDA) is to make predictions for unlabeled data on a target domain, given labeled data on a source domain whose distribution shifts from the target one. Mainstream UDA methods learn aligned features between the two domains, such that a classifier trained on the source features can be readily applied to the target ones. However, such a transferring strategy has a potential risk of damaging the intrinsic discrimination of target data. To alleviate this risk, we are motivated by the assumption of structural domain similarity, and propose to directly uncover the intrinsic target discrimination via discriminative clustering of target data. We constrain the clustering solutions using structural source regularization that hinges on our assumed structural domain similarity. Technically, we use a flexible framework of deep network based discriminative clustering that minimizes the KL divergence between predictive label distribution of the network and an introduced auxiliary one; replacing the auxiliary distribution with that formed by ground-truth labels of source data implements the structural source regularization via a simple strategy of joint network training. We term our proposed method as Structurally Regularized Deep Clustering (SRDC), where we also enhance target discrimination with clustering of intermediate network features, and enhance structural regularization with soft selection of less divergent source examples. Careful ablation studies show the efficacy of our proposed SRDC. Notably, with no explicit domain alignment, SRDC outperforms all existing methods on three UDA benchmarks.
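
The effect of replacing the auxiliary distribution with ground-truth source labels can be seen in a small numerical sketch. It is not the SRDC implementation; the direction of the divergence, KL(auxiliary || predictive), is an assumption made here for illustration, and the function names are hypothetical.

```python
import math


def kl_divergence(q, p, eps=1e-12):
    """KL(q || p) for two discrete distributions given as equal-length lists."""
    return sum(qi * math.log(qi / max(pi, eps)) for qi, pi in zip(q, p) if qi > 0.0)


def one_hot(label, num_classes):
    """Distribution that puts all mass on the ground-truth label."""
    return [1.0 if c == label else 0.0 for c in range(num_classes)]


predicted = [0.7, 0.2, 0.1]               # hypothetical network output for one sample

# Target-style term: a soft auxiliary distribution over clusters.
auxiliary = [0.8, 0.15, 0.05]
target_loss = kl_divergence(auxiliary, predicted)

# Source-style term: the auxiliary distribution is the one-hot ground-truth label,
# so the divergence reduces to the usual cross-entropy, -log(predicted[label]).
source_loss = kl_divergence(one_hot(0, 3), predicted)
```

Under this assumed direction, the source term is ordinary supervised cross-entropy, so the same objective can be applied to both domains during joint training.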

preprint2019arXiv

Multielemental single-atom-thick A layers in nanolaminated V2(Sn, A)C (A=Fe, Co, Ni, Mn) for tailoring magnetic properties

Tailoring of individual single-atom-thick layers in nanolaminated materials offers atomic-level control over material properties. Nonetheless, multielement alloying in individual atomic layers in nanolaminates is largely unexplored. Here, we report a series of inherently nanolaminated V2(A'xSn1-x)C (A'=Fe, Co, Ni and Mn, and combinations thereof, with x=1/3) synthesized by an alloy-guided reaction. With the simultaneous occupancy of the four magnetic elements and Sn, the individual single-atom-thick A layers in the compound constitute high-entropy-alloy analogues, two-dimensional in the sense that the alloying exclusively occurs in the A layers. V2(A'xSn1-x)C exhibits distinct ferromagnetic behavior that can be compositionally tailored through the multielement A-layer alloying. This two-dimensional alloying provides a structural-design route with expanded chemical space for discovering materials and exploiting properties.

preprint2019arXiv

Optical properties of cubic boron arsenide

The ultrahigh thermal conductivity of boron arsenide makes it a promising material for next-generation electronics and optoelectronics. In this work, we report measured optical properties of cubic boron arsenide crystals including the complex dielectric function, refractive index, and absorption coefficient in the ultraviolet, visible, and near-infrared wavelength range. The data were collected at room temperature using spectroscopic ellipsometry as well as transmission and reflection spectroscopy. We further calculate the optical response using density functional and many-body perturbation theory, considering quasiparticle and excitonic corrections. The computed values for the direct and indirect band gaps (4.25 eV and 2.07 eV) agree well with the measured results (4.12 eV and 2.02 eV). Our findings contribute to the effort of using boron arsenide in novel electronic and optoelectronic applications that take advantage of its demonstrated ultrahigh thermal conductivity and predicted high ambipolar carrier mobility.

preprint2019arXiv

The Effect of Explicit Structure Encoding of Deep Neural Networks for Symbolic Music Generation

With recent breakthroughs in artificial neural networks, deep generative models have become one of the leading techniques for computational creativity. Despite very promising progress on image and short sequence generation, symbolic music generation remains a challenging problem since the structure of compositions is usually complicated. In this study, we attempt to solve the melody generation problem constrained by a given chord progression. This music meta-creation problem can also be incorporated into a plan recognition system with user inputs and predictive structural outputs. In particular, we explore the effect of explicit architectural encoding of musical structure by comparing two sequential generative models: LSTM (a type of RNN) and WaveNet (dilated temporal-CNN). As far as we know, this is the first study applying WaveNet to symbolic music generation, as well as the first systematic comparison between temporal-CNN and RNN for music generation. We conduct a survey to evaluate our generations and implement a Variable Markov Oracle for music pattern discovery. Experimental results show that encoding structure more explicitly using a stack of dilated convolution layers improves the performance significantly, and that a global encoding of the underlying chord progression into the generation procedure gains even more.

preprint2019arXiv

Tooth morphometry using quasi-conformal theory

Shape analysis is important in anthropology, bioarchaeology and forensic science for interpreting useful information from human remains. In particular, teeth are morphologically stable and hence well-suited for shape analysis. In this work, we propose a framework for tooth morphometry using quasi-conformal theory. Landmark-matching Teichmüller maps are used for establishing a 1-1 correspondence between tooth surfaces with prescribed anatomical landmarks. Then, a quasi-conformal statistical shape analysis model based on the Teichmüller mapping results is proposed for building a tooth classification scheme. We deploy our framework on a dataset of human premolars to analyze the tooth shape variation among genders and ancestries. Experimental results show that our method achieves much higher classification accuracy with respect to both gender and ancestry when compared to the existing methods. Furthermore, our model reveals the underlying tooth shape difference between different genders and ancestries in terms of the local geometric distortion and curvatures.

preprint2016arXiv

Coupled Leidenfrost States as a Monodisperse Granular Clock

Using an event-driven molecular dynamics simulation, we show that simple monodisperse granular beads confined in coupled columns may oscillate as a new type of granular clock. To trigger this oscillation, the system needs to be driven against gravity into a density-inverted state, with a high-density clustering phase supported from below by a gas-like low-density phase (Leidenfrost effect) in each column. Our analysis reveals that the density-inverted structure and the relaxation dynamics between the phases can amplify any small asymmetry between the columns, and lead to a giant oscillation. The oscillation occurs only for an intermediate range of the coupling strength, and the corresponding phase diagram can be universally described with a characteristic height of the density-inverted structure. A minimal two-phase model is proposed and linear stability analysis shows that the triggering mechanism of the oscillation can be explained as a switchable two-parameter Hopf bifurcation. Numerical solutions of the model also reproduce similar oscillatory dynamics to the simulation results.

preprint2016arXiv

Deep Structured-Output Regression Learning for Computational Color Constancy

Computational color constancy that requires estimation of illuminant colors of images is a fundamental yet active problem in computer vision, which can be formulated into a regression problem. To learn a robust regressor for color constancy, obtaining meaningful imagery features and capturing latent correlations across output variables play a vital role. In this work, we introduce a novel deep structured-output regression learning framework to achieve both goals simultaneously. By borrowing the power of deep convolutional neural networks (CNN) originally designed for visual recognition, the proposed framework can automatically discover strong features for white balancing over different illumination conditions and learn a multi-output regressor beyond underlying relationships between features and targets to find the complex interdependence of different dimensions of target variables. Experiments on two public benchmarks demonstrate that our method achieves competitive performance in comparison with the state-of-the-art approaches.

preprint2016arXiv

Finiteness of hyperelliptic and superelliptic curves with CM Jacobians

In this paper we study the Coleman-Oort conjecture for superelliptic curves, i.e., curves defined by affine equations $y^n=F(x)$ with $F$ a separable polynomial. We prove that up to isomorphism there are at most finitely many superelliptic curves of fixed genus $g\geq 8$ with CM Jacobians. The proof relies on the geometric structures of Shimura subvarieties in Siegel modular varieties and the stability properties of Higgs bundles associated to fibred surfaces.

preprint2016arXiv

Learning Contextualized Music Semantics from Tags via a Siamese Network

Music information retrieval faces a challenge in modeling contextualized musical concepts formulated by a set of co-occurring tags. In this paper, we investigate the suitability of our recently proposed approach based on a Siamese neural network for addressing this challenge. By means of tag features and probabilistic topic models, the network captures contextualized semantics from tags via unsupervised learning. This leads to a distributed semantics space and a potential solution to the out-of-vocabulary problem, which has yet to be sufficiently addressed. We explore the nature of the resultant music-based semantics and address computational needs. We conduct experiments on three public music tag collections (namely, CAL500, MagTag5K and Million Song Dataset) and compare our approach to a number of state-of-the-art semantics learning approaches. Comparative results suggest that this approach outperforms previous approaches in terms of semantic priming and music tag completion.

preprint2016arXiv

Multi-Label Zero-Shot Learning via Concept Embedding

Zero Shot Learning (ZSL) enables a learning model to classify instances of a class that was unseen during training. While most research in ZSL focuses on single-label classification, few studies have been done in multi-label ZSL, where an instance is associated with a set of labels simultaneously, due to the difficulty in modeling complex semantics conveyed by a set of labels. In this paper, we propose a novel approach to multi-label ZSL via concept embedding learned from collections of public users' annotations of multimedia. Thanks to concept embedding, multi-label ZSL can be done by efficiently mapping an instance's input features onto the concept embedding space in a manner similar to that used in single-label ZSL. Moreover, our semantic learning model is capable of embedding an out-of-vocabulary label by inferring its meaning from its co-occurring labels. Thus, our approach allows both seen and unseen labels during the concept embedding learning to be used in the aforementioned instance mapping, which makes multi-label ZSL more flexible and suitable for real applications. Experimental results of multi-label ZSL on images and music tracks suggest that our approach outperforms a state-of-the-art multi-label ZSL model and can deal with a scenario involving out-of-vocabulary labels without re-training the semantics learning model.

preprint2016arXiv

On Higgs bundles over Shimura varieties of ball quotient type

We prove the generic exclusion of certain Shimura varieties of unitary and orthogonal types from the Torelli locus. The proof relies on a slope inequality on surface fibration due to G. Xiao, and the main result implies that certain Shimura varieties only meet the Torelli locus in dimension zero.

preprint2015arXiv

A note on Shimura subvarieties in the hyperelliptic Torelli locus

We prove the non-existence of Shimura subvarieties of positive dimension contained generically in the hyperelliptic Torelli locus for curves of genus at least 8, which is an analogue of Oort's conjecture in the hyperelliptic case.

preprint2015arXiv

A novel variational model for image registration using Gaussian curvature

Image registration is an important task in many image processing applications. It aims to align two or more images so that useful information can be extracted through comparison, combination or superposition. This is achieved by constructing an optimal transformation which ensures that the template image becomes similar to a given reference image. Although many models exist, designing a model capable of modelling large and smooth deformation fields continues to pose a challenge. This paper proposes a novel variational model for image registration using the Gaussian curvature as a regulariser. The model is motivated by the surface restoration work in geometric processing [Elsey and Esedoglu, Multiscale Model. Simul., (2009), pp. 1549-1573]. An effective numerical solver is provided for the model using an augmented Lagrangian method. Numerical experiments show that the new model outperforms three competing models based on, respectively, a linear curvature [Fischer and Modersitzki, J. Math. Imaging Vis., (2003), pp. 81-85], the mean curvature [Chumchob, Chen and Brito, Multiscale Model. Simul., (2011), pp. 89-128] and the diffeomorphic demon model [Vercauteren et al., NeuroImage, (2009), pp. 61-72] in terms of robustness and accuracy.

preprint2015arXiv

A Total Fractional-Order Variation Model for Image Restoration with Non-homogeneous Boundary Conditions and its Numerical Solution

To overcome the weakness of a total variation based model for image restoration, various high order (typically second order) regularization models have been proposed and studied recently. In this paper we analyze and test a fractional-order derivative based total $\alpha$-order variation model, which can outperform the currently popular high order regularization models. There exist several previous works using total $\alpha$-order variations for image restoration; however, first, no analysis has yet been done and, second, all tested formulations, which differ from each other, use zero Dirichlet boundary conditions, which are not realistic (while non-zero boundary conditions violate definitions of fractional-order derivatives). This paper first reviews some results of fractional-order derivatives and then analyzes the theoretical properties of the proposed total $\alpha$-order variational model rigorously. It then develops four algorithms for solving the variational problem, one based on the variational Split-Bregman idea and three based on direct solution of the discretise-optimization problem. Numerical experiments show that, in terms of restoration quality and solution efficiency, the proposed model can produce highly competitive results, for smooth images, to two established high order models: the mean curvature and the total generalized variation.

preprint2015arXiv

Bounded equidistribution of special subvarieties in mixed Shimura varieties

In this paper we prove the equidistribution of bounded sequences of special subvarieties in general mixed Shimura varieties, a notion adapted from the pure case treated by Clozel, Ullmo, and Yafaev in the study of the André-Oort conjecture. We also discuss the relation of bounded sequences with bounded Galois orbits of special subvarieties.

preprint2015arXiv

Learning Constructive Primitives for Online Level Generation and Real-time Content Adaptation in Super Mario Bros

Procedural content generation (PCG) is of great interest to game design and development as it generates game content automatically. Motivated by the recent learning-based PCG framework and other existing PCG works, we propose an alternative approach to online content generation and adaptation in Super Mario Bros (SMB). Unlike most existing works in SMB, our approach exploits the synergy between rule-based and learning-based methods to produce constructive primitives, quality yet controllable game segments in SMB. As a result, a complete quality game level can be generated online by integrating relevant constructive primitives via controllable parameters regarding geometrical features and procedure-level properties. Adaptive content can also be generated in real time by dynamically selecting proper constructive primitives via an adaptation criterion, e.g., dynamic difficulty adjustment (DDA). Our approach has several favorable properties in terms of content quality assurance, generation efficiency and controllability. Extensive simulation results demonstrate that the proposed approach can generate controllable yet quality game levels online and adaptable content for DDA in real time.

preprint2015arXiv

Learning Contextualized Semantics from Co-occurring Terms via a Siamese Architecture

One of the biggest challenges in multimedia information retrieval and understanding is to bridge the semantic gap by properly modeling concept semantics in context. The presence of out-of-vocabulary (OOV) concepts exacerbates this difficulty. To address the semantic gap issues, we formulate the problem of learning contextualized semantics from descriptive terms and propose a novel Siamese architecture to model these semantics. By means of pattern aggregation and probabilistic topic models, our Siamese architecture captures contextualized semantics from the co-occurring descriptive terms via unsupervised learning, which leads to a concept embedding space of the terms in context. Furthermore, the co-occurring OOV concepts can be easily represented in the learnt concept embedding space. The main properties of the concept embedding space are demonstrated via visualization. Using various settings in semantic priming, we have carried out a thorough evaluation by comparing our approach to a number of state-of-the-art methods on six annotation corpora in different domains, i.e., MagTag5K, CAL500 and Million Song Dataset in the music domain as well as Corel5K, LabelMe and SUNDatabase in the image domain. Experimental results on semantic priming suggest that our approach outperforms those state-of-the-art methods considerably in various aspects.

preprint2014arXiv

Energy Gap Substructures in Conductance Measurements of MgB2-based Josephson Junctions: Beyond the 2-Gap Model

Several theoretical analyses of the two superconducting energy gaps of magnesium diboride, $Δ_π$ and $Δ_σ$, predict substructures within each energy gap, rather than two pure numbers. Recent experiments have revealed similar structures. We report tunneling conductance data providing additional experimental evidence for these features. The absence of these features in c-axis tunneling, and a sharp peak in the subgap (associated with the counterelectrode material), support the conclusion that these features are intrinsic to MgB2. By demonstrating the inadequacy of a simple two-gap model in fitting the data, we illustrate that some distinctions between theoretical models of energy gap substructures are experimentally accessible.

preprint2014arXiv

Rapid Skill Capture in a First-Person Shooter

Various aspects of computer game design, including adaptive elements of game levels, characteristics of 'bot' behavior, and player matching in multiplayer games, would ideally be sensitive to a player's skill level. Yet, while difficulty and player learning have been explored in the context of games, there has been little work analyzing skill per se, and how it pertains to a player's input. To this end, we present a data set of 476 game logs from over 40 players of a first-person shooter game (Red Eclipse) as the basis for a case study. We then analyze different metrics of skill and show that some of these can be predicted using only a few seconds of keyboard and mouse input. We argue that the techniques used here are useful for adapting games to match players' skill levels rapidly, perhaps more rapidly than solutions based on performance averaging such as TrueSkill.

preprint2013arXiv

Implementing program extraction from CL1-proofs

Computability logic (CoL) is a formal theory of interactive computation. It understands computational problems as games played by two players, a machine and its environment, and uses logical formalism to describe valid principles of computability and formulas to represent computational problems. Logic CL1 is a deductive system for a fragment of CoL. Its logical vocabulary contains all of the operators of classical logic together with choice operators, and its atoms represent elementary games, i.e., predicates of classical logic. In this paper, we present a program that takes a CL1-proof of an arbitrary formula $F$, extracts a winning strategy for $F$ from that proof, and then plays $F$ using that strategy. We hope this paper will provide a starting point for further work on program extraction for CoL-based arithmetic and other CoL-based applied systems.

preprint2013arXiv

Learning-Based Procedural Content Generation

Procedural content generation (PCG) has recently become one of the hottest topics in computational intelligence and game AI research. Among a variety of PCG techniques, search-based approaches (SBPCG) overwhelmingly dominate PCG development at present. While SBPCG leads to promising results and successful applications, it poses a number of challenges ranging from representation to evaluation of the content being generated. In this paper, we present an alternative yet generic PCG framework, named learning-based procedural content generation (LBPCG), to provide potential solutions to several challenging problems in existing PCG techniques. By exploring and exploiting information gained in game development and public beta tests via data-driven learning, our framework can generate robust content adaptable to end users or target players online with minimal interruption to their experience. Furthermore, we develop enabling techniques to implement the various models required in our framework. As a proof of concept, we have developed a prototype based on the classic open source first-person shooter game, Quake. Simulation results suggest that our framework is promising in generating quality content.

preprint2013arXiv

Phonon Dispersion and Elastic Moduli of Two-Dimensional Disordered Colloidal Packings of Soft Particles with Frictional Interactions

Particle tracking and displacement covariance matrix techniques are employed to investigate the phonon dispersion relations of two-dimensional colloidal glasses composed of soft, thermoresponsive microgel particles whose temperature-sensitive size permits in situ variation of particle packing fraction. Bulk, $B$, and shear, $G$, moduli of the colloidal glasses are extracted from the dispersion relations as a function of packing fraction, and variation of the ratio $G/B$ with packing fraction is found to agree quantitatively with predictions for jammed packings of frictional soft particles. In addition, $G$ and $B$ individually agree with numerical predictions for frictional particles. This remarkable level of agreement enabled us to extract an energy scale for the inter-particle interaction from the individual elastic constants and to derive an approximate estimate for the inter-particle friction coefficient.

preprint2013arXiv

Phonons in two-dimensional soft colloidal crystals

The vibrational modes of pristine and polycrystalline monolayer colloidal crystals composed of thermosensitive microgel particles are measured using video microscopy and covariance matrix analysis. At low frequencies, the Debye relation for two-dimensional harmonic crystals is observed in both crystal types; at higher frequencies, evidence for van Hove singularities in the phonon density of states is significantly smeared out by experimental noise and measurement statistics. The effects of these errors are analyzed using numerical simulations. We introduce methods to correct for these limitations, which can be applied to disordered systems as well as crystalline ones, and we show that application of the error correction procedure to the experimental data leads to more pronounced van Hove singularities in the pristine crystal. Finally, quasi-localized low-frequency modes in polycrystalline two-dimensional colloidal crystals are identified and demonstrated to correlate with structural defects such as dislocations, suggesting that quasi-localized low-frequency phonon modes may be used to identify local regions vulnerable to rearrangements in crystalline as well as amorphous solids.

preprint2011arXiv

Femtosecond Laser-induced Crystallization of Amorphous Sb2Te3 film and Coherent Phonon Spectroscopy Characterization and Optical Injection of Electron Spins

A femtosecond laser irradiation technique is used to convert amorphous Sb2Te3 film into crystalline film. Sensitive coherent phonon spectroscopy (CPS) is used to monitor the crystallization of amorphous Sb2Te3 film at the original irradiation site. CPS reveals that the vibration strength of two phonon modes, which correspond to the characteristic phonon modes of crystalline Sb2Te3, increases with laser irradiation fluence (LIF), showing that the degree of crystallization rises with LIF and that femtosecond laser irradiation is a good post-treatment technique. Time-resolved circularly polarized pump-probe spectroscopy is used to investigate the electron spin relaxation dynamics of the laser-induced crystallized Sb2Te3 film. A spin relaxation process is indeed observed, confirming theoretical predictions of the validity of the spin-dependent optical transition selection rule and the feasibility of a transient spin-grating-based optical detection scheme for spin-plasmon collective modes in Sb2Te3-like topological insulators.

preprint2011arXiv

Phonon Spectra, Nearest Neighbors, and Mechanical Stability of Disordered Colloidal Clusters with Attractive Interactions

We investigate the influence of morphology and size on the vibrational properties of disordered clusters of colloidal particles with attractive interactions. From measurements of displacement correlations between particles in each cluster, we extract vibrational properties of the corresponding "shadow" glassy cluster, with the same geometric configuration and interactions as the "source" cluster but without damping. Spectral features of the vibrational modes are found to depend strongly on the average number of nearest neighbors, $\bar{NN}$, but only weakly on the number of particles in each glassy cluster. In particular, the median phonon frequency, $ω_{med}$, is essentially constant for $\bar{NN} < 2$ and then grows linearly with $\bar{NN}$ for $\bar{NN} > 2$. This behavior parallels concurrent observations about local isostatic structures, which are absent in clusters with $\bar{NN} < 2$ and then grow linearly in number for $\bar{NN} > 2$. Thus, cluster vibrational properties appear to be strongly connected to cluster mechanical stability (i.e., fraction of locally isostatic regions), and the scaling of $ω_{med}$ with $\bar{NN}$ is reminiscent of the behavior of packings of spheres with repulsive interactions at the jamming transition. Simulations of random networks of springs corroborate observations and suggest that connections between phonon spectra and nearest neighbor number are generic to disordered networks.

preprint2010arXiv

Dynamical decoupling for a qubit in telegraph-like noises

Based on the stochastic theory developed by Kubo and Anderson, we present an exact result for the decoherence function of a qubit in telegraph-like noises under dynamical decoupling control. We prove that for telegraph-like noises, the decoherence can be suppressed at most to the third order of the time and that periodic Carr-Purcell-Meiboom-Gill sequences are the most efficient scheme for protecting the qubit coherence in the short-time limit.

preprint2010arXiv

Emotional State Categorization from Speech: Machine vs. Human

This paper presents our investigation of emotional state categorization from speech signals, comparing a psychologically inspired computational model against human performance under the same experimental setup. Based on psychological studies, we propose a multistage categorization strategy that allows an automatic categorization model to be established flexibly for a given emotional speech categorization task. We apply the strategy to the Serbian Emotional Speech Corpus (GEES) and the Danish Emotional Speech Corpus (DES), where human performance was reported in previous psychological studies. Our work is the first attempt to apply machine learning to the GEES corpus, for which only human recognition rates were available prior to our study. Unlike previous work on the DES corpus, our work focuses on a comparison to human performance under the same experimental settings. Our studies suggest that psychology-inspired systems yield behaviours that, to a great extent, resemble what humans perceive, and that their performance is close to that of humans under the same experimental setup. Furthermore, our work also uncovers some differences between machines and humans in terms of emotional state recognition from speech.

preprint2010arXiv

Exploring Language-Independent Emotional Acoustic Features via Feature Selection

We propose a novel feature selection strategy to discover language-independent acoustic features that tend to be responsible for emotions regardless of languages, linguistics and other factors. Experimental results suggest that the language-independent feature subset discovered yields performance comparable to that of the full feature set on various emotional speech corpora.

preprint2010arXiv

Low-frequency vibrations of soft colloidal glasses

We conduct experiments on two-dimensional packings of colloidal thermosensitive hydrogel particles whose packing fraction can be tuned above the jamming transition by varying the temperature. By measuring displacement correlations between particles, we extract the vibrational properties of a corresponding "shadow" system with the same configuration and interactions, but for which the dynamics of the particles are undamped. The vibrational spectrum and the nature of the modes are very similar to those predicted for zero-temperature idealized sphere models and found in atomic and molecular glasses; there is a boson peak at low frequency that shifts to higher frequency as the system is compressed above the jamming transition.

preprint2010arXiv

Rotational and Translational Phonon Modes in Glasses Composed of Ellipsoidal Particles

The effects of particle shape on the vibrational properties of colloidal glasses are studied experimentally. "Ellipsoidal glasses" are created by stretching polystyrene spheres to different aspect ratios and then suspending the resulting ellipsoidal particles in water at high packing fraction. By measuring displacement correlations between particles, we extract vibrational properties of the corresponding "shadow" ellipsoidal glass with the same geometric configuration and interactions as the "source" suspension but without damping. Low-frequency modes in glasses composed of ellipsoidal particles with major/minor axis aspect ratios $\sim$1.1 are observed to have predominantly rotational character. By contrast, low-frequency modes in glasses of ellipsoidal particles with larger aspect ratios ($\sim$3.0) exhibit a mix of rotational and translational character. All glass samples were characterized by a distribution of particles with different aspect ratios. Interestingly, even within the same sample it was found that small-aspect-ratio particles participate relatively more in rotational modes, while large-aspect-ratio particles tend to participate relatively more in translational modes.

67 published item(s)

DepthPilot: From Controllability to Interpretability in Colonoscopy Video Generation

On the Stretch Factor of Polygonal Chains

Ultra-thin Epitaxial MgB2 on SiC: Substrate Surface Polarity Dependent Properties

BiCo-Net: Regress Globally, Match Locally for Robust 6D Pose Estimation

Capturing Evolution Genes for Time Series Data

Classification of Single-View Object Point Clouds

Convolutional Fine-Grained Classification with Self-Supervised Target Relation Regularization

Fine-Grained Object Classification via Self-Supervised Pose Alignment

Fisher Matrix Based Fault Detection for PMUs Data in Power Grids

Global well-posedness of the 1d compressible Navier-Stokes system with rough data

HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection

Improving Choral Music Separation through Expressive Synthesized Data from Sampled Instruments

Local Rotational Jamming and Multi-Scale Hyperuniformities in an Active Spinner System

Local well-posedness of the $1d$ compressible Navier-Stokes system with rough data

Locality-sensitive bucketing functions for the edit distance

Low-rank approximation for multiscale PDEs

Quasi-Balanced Self-Training on Noise-Aware Synthesis of Object Point Clouds for Closing Domain Gap

TONet: Tone-Octave Network for Singing Melody Extraction from Polyphonic Music

Weak solutions of the three-dimensional hypoviscous elastodynamics with finite kinetic energy

Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data

MAX Phase Zr2SeC and Its Thermal Conduction Behavior

Reduced Ionic Diffusion by the Dynamic Electron-Ion Collisions in Warm Dense Hydrogen

A Framework for End-to-End Learning on Semantic Tree-Structured Data

A low-rank Schwarz method for radiative transport equation with heterogeneous scattering coefficient

CAD-PU: A Curvature-Adaptive Deep Learning Solution for Point Set Upsampling

Compositional Few-Shot Recognition with Primitive Discovery and Enhancing

Continuous Melody Generation via Disentangled Short-Term Representations and Structural Conditions

Efficient Feature-based Image Registration by Mapping Sparsified Surfaces

Improving Semantic Analysis on Point Clouds via Auxiliary Supervision of Local Geometric Priors

Independent wavefront tailoring in full polarization channels by helicity-decoupled metasurface

Multiparty Selection

MusPy: A Toolkit for Symbolic Music Generation

MVLidarNet: Real-Time Multi-Class Scene Understanding for Autonomous Driving Using Multiple Views

On the Longest Spanning Tree with Neighborhoods

POP909: A Pop-song Dataset for Music Arrangement Generation

Random Sampling and Efficient Algorithms for Multiscale PDEs

Structured random sketching for PDE inverse problems

Unsupervised Domain Adaptation via Structurally Regularized Deep Clustering

Multielemental single-atom-thick A layers in nanolaminated V2(Sn, A)C (A=Fe, Co, Ni, Mn) for tailoring magnetic properties

Optical properties of cubic boron arsenide

The Effect of Explicit Structure Encoding of Deep Neural Networks for Symbolic Music Generation

Tooth morphometry using quasi-conformal theory

Coupled Leidenfrost States as a Monodisperse Granular Clock

Deep Structured-Output Regression Learning for Computational Color Constancy

Finiteness of hyperelliptic and superelliptic curves with CM Jacobians

Learning Contextualized Music Semantics from Tags via a Siamese Network

Multi-Label Zero-Shot Learning via Concept Embedding

On Higgs bundles over Shimura varieties of ball quotient type

A note on Shimura subvarieties in the hyperelliptic Torelli locus

A novel variational model for image registration using Gaussian curvature

A Total Fractional-Order Variation Model for Image Restoration with Non-homogeneous Boundary Conditions and its Numerical Solution

Bounded equidistribution of special subvarieties in mixed Shimura varieties

Learning Constructive Primitives for Online Level Generation and Real-time Content Adaptation in Super Mario Bros

Learning Contextualized Semantics from Co-occurring Terms via a Siamese Architecture

Energy Gap Substructures in Conductance Measurements of MgB2-based Josephson Junctions: Beyond the 2-Gap Model

Rapid Skill Capture in a First-Person Shooter

Implementing program extraction from CL1-proofs

Learning-Based Procedural Content Generation

Phonon Dispersion and Elastic Moduli of Two-Dimensional Disordered Colloidal Packings of Soft Particles with Frictional Interactions

Phonons in two-dimensional soft colloidal crystals

Femtosecond Laser-induced Crystallization of Amorphous Sb2Te3 film and Coherent Phonon Spectroscopy Characterization and Optical Injection of Electron Spins

Phonon Spectra, Nearest Neighbors, and Mechanical Stability of Disordered Colloidal Clusters with Attractive Interactions

Dynamical decoupling for a qubit in telegraph-like noises

Emotional State Categorization from Speech: Machine vs. Human

Exploring Language-Independent Emotional Acoustic Features via Feature Selection

Low-frequency vibrations of soft colloidal glasses

Rotational and Translational Phonon Modes in Glasses Composed of Ellipsoidal Particles