Source author record

Qian Yu

Qian Yu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

33 works

23 topics

4 close collaborators


preprint · 2026 · arXiv

RxnCaption: Reformulating Reaction Diagram Parsing as Visual Prompt Guided Captioning

Large-scale chemical reaction datasets are crucial for AI research in chemistry. However, existing chemical reaction data often exist as images within papers, making them not machine-readable and unusable for training machine learning models. In response to this challenge, we propose the RxnCaption framework for the task of chemical Reaction Diagram Parsing (RxnDP). Our framework reformulates the traditional coordinate prediction driven parsing process into an image captioning problem, which Large Vision Language Models (LVLMs) handle naturally. We introduce a strategy termed BBox and Index as Visual Prompt (BIVP), which uses our state-of-the-art molecular detector, MolYOLO, to pre-draw molecular bounding boxes and indices directly onto the input image. This turns the downstream parsing into a natural-language description problem. Extensive experiments show that the BIVP strategy significantly improves structural extraction quality while simplifying model design. We further construct the RxnCaption-15k dataset, an order of magnitude larger than prior real-world literature benchmarks, with a balanced test subset across four layout archetypes. Experiments demonstrate that RxnCaption-VL achieves state-of-the-art performance on multiple metrics. We believe our method, dataset, and models will advance structured information extraction from chemical literature and catalyze broader AI applications in chemistry. We will release data, models, and code on GitHub.

preprint · 2026 · arXiv

VAnim: Rendering-Aware Sparse State Modeling for Structure-Preserving Vector Animation

Scalable Vector Graphics (SVG) animation generation is pivotal for professional design because of its structural editability and resolution independence. However, this task remains challenging as it requires bridging discrete code representations with continuous visual dynamics. Existing optimization-based methods often destroy topological consistency, while general-purpose LLMs rely on rigid CSS/SMIL transformations, failing to model geometry-level non-rigid deformations. To address these limitations, we present VAnim, the first LLM-based framework for open-domain text-to-SVG animation. We reconceptualize animation not as sequence generation, but as Sparse State Updates (SSU) on a persistent SVG DOM tree. This paradigm compresses sequence length by over 9.8x while preserving the SVG DOM structure and non-participating elements by construction. To enable precise control, we propose an Identification-First Motion Planning mechanism that grounds textual instructions in explicit visual entities. Furthermore, to overcome the non-differentiable nature of SVG rendering, we employ Rendering-Aware Reinforcement Learning via Group Relative Policy Optimization (GRPO). By leveraging a hybrid reward from a state-of-the-art video perception encoder, we align discrete code updates with high-fidelity visual feedback. We also introduce SVGAnim-134k, the first benchmark for vector animation. Extensive experiments demonstrate that VAnim significantly outperforms state-of-the-art baselines in semantic alignment and structural validity, with additional appendix metrics further validating motion quality and identity preservation.

preprint · 2024 · arXiv

Inversion-by-Inversion: Exemplar-based Sketch-to-Photo Synthesis via Stochastic Differential Equations without Training

Exemplar-based sketch-to-photo synthesis allows users to generate photo-realistic images based on sketches. Recently, diffusion-based methods have achieved impressive performance on image generation tasks, enabling highly flexible control through text-driven generation or energy functions. However, generating photo-realistic images with color and texture from sketch images remains challenging for diffusion models. Sketches typically consist of only a few strokes, with most regions left blank, making it difficult for diffusion-based methods to produce photo-realistic images. In this work, we propose a two-stage method named "Inversion-by-Inversion" for exemplar-based sketch-to-photo synthesis. This approach includes shape-enhancing inversion and full-control inversion. During the shape-enhancing inversion process, an uncolored photo is generated with the guidance of a shape-energy function. This step is essential to ensure control over the shape of the generated photo. In the full-control inversion process, we propose an appearance-energy function to control the color and texture of the final generated photo. Importantly, our Inversion-by-Inversion pipeline is training-free and can accept different types of exemplars for color and texture control. We conducted extensive experiments to evaluate our proposed method, and the results demonstrate its effectiveness. The code and project can be found at https://ximinng.github.io/inversion-by-inversion-project/.

preprint · 2023 · arXiv

Multi-modality Affinity Inference for Weakly Supervised 3D Semantic Segmentation

3D point cloud semantic segmentation has a wide range of applications. Recently, weakly supervised point cloud segmentation methods have been proposed, aiming to alleviate the expensive and laborious manual annotation process by leveraging scene-level labels. However, these methods have not effectively exploited the rich geometric information (such as shape and scale) and appearance information (such as color and texture) present in RGB-D scans. Furthermore, current approaches fail to fully leverage the point affinity that can be inferred from the feature extraction network, which is crucial for learning from weak scene-level labels. Additionally, previous work overlooks the detrimental effects of the long-tailed distribution of point cloud data in weakly supervised 3D semantic segmentation. To this end, this paper proposes a simple yet effective scene-level weakly supervised point cloud segmentation method with a newly introduced multi-modality point affinity inference module. The point affinity proposed in this paper is characterized by features from multiple modalities (e.g., point cloud and RGB), and is further refined by normalizing the classifier weights to alleviate the detrimental effects of the long-tailed distribution without requiring a prior on the category distribution. Extensive experiments on the ScanNet and S3DIS benchmarks verify the effectiveness of our proposed method, which outperforms the state-of-the-art by ~4% to ~6% mIoU. Code is released at https://github.com/Sunny599/AAAI24-3DWSSG-MMA.

preprint · 2022 · arXiv

3D Shape Reconstruction from Free-Hand Sketches

Sketches are the most abstract 2D representations of real-world objects. Although a sketch usually has geometrical distortion and lacks visual cues, humans can effortlessly envision a 3D object from it. This suggests that sketches encode the information necessary for reconstructing 3D shapes. Despite great progress achieved in 3D reconstruction from distortion-free line drawings, such as CAD and edge maps, little effort has been made to reconstruct 3D shapes from free-hand sketches. We study this task and aim to enhance the power of sketches in 3D-related applications such as interactive design and VR/AR games. Unlike previous works, which mostly study distortion-free line drawings, our 3D shape reconstruction is based on free-hand sketches. A major challenge for free-hand sketch 3D reconstruction comes from the insufficient training data and free-hand sketch diversity, e.g. individualized sketching styles. We thus propose data generation and standardization mechanisms. Instead of distortion-free line drawings, synthesized sketches are adopted as input training data. Additionally, we propose a sketch standardization module to handle different sketch distortions and styles. Extensive experiments demonstrate the effectiveness of our model and its strong generalizability to various free-hand sketches. Our code is publicly available at https://github.com/samaonline/3D-Shape-Reconstruction-from-Free-Hand-Sketches.

preprint · 2022 · arXiv

A Simple Test-Time Method for Out-of-Distribution Detection

Neural networks are known to produce over-confident predictions on input images, even when these images are out-of-distribution (OOD) samples. This limits the applications of neural network models in real-world scenarios, where OOD samples exist. Many existing approaches identify the OOD instances by exploiting various cues, such as finding irregular patterns in the feature space, logits space, gradient space or the raw space of images. In contrast, this paper proposes a simple Test-time Linear Training (ETLT) method for OOD detection. Empirically, we find that the probabilities of input images being out-of-distribution are surprisingly linearly correlated to the features extracted by neural networks. To be specific, many state-of-the-art OOD algorithms, although designed to measure reliability in different ways, actually lead to OOD scores mostly linearly related to their image features. Thus, by simply learning a linear regression model trained from the paired image features and inferred OOD scores at test-time, we can make a more precise OOD prediction for the test instances. We further propose an online variant of the proposed method, which achieves promising performance and is more practical in real-world applications. Remarkably, we improve FPR95 from 51.37% to 12.30% on the CIFAR-10 dataset with maximum softmax probability as the base OOD detector. Extensive experiments on several benchmark datasets show the efficacy of ETLT for the OOD detection task.
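The core idea in this abstract, regressing base OOD scores onto image features at test time and using the fitted values as refined scores, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `refine_ood_scores`, the small ridge term (added here only for numerical stability), and the synthetic data are all hypothetical.

```python
import numpy as np

def refine_ood_scores(features, base_scores, ridge=1e-3):
    """Fit base_scores ~ features @ w + b on the test batch itself and
    return the fitted values as refined OOD scores.

    The ridge term is an assumption of this sketch, not taken from the paper.
    """
    features = np.asarray(features, dtype=float)
    base_scores = np.asarray(base_scores, dtype=float)
    # Append a constant column so the linear model has a bias term.
    design = np.hstack([features, np.ones((features.shape[0], 1))])
    gram = design.T @ design + ridge * np.eye(design.shape[1])
    weights = np.linalg.solve(gram, design.T @ base_scores)
    return design @ weights

# Synthetic stand-in data: scores that are linear in the features plus noise.
rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 8))
clean = feats @ rng.normal(size=8)
noisy = clean + rng.normal(scale=0.5, size=500)
refined = refine_ood_scores(feats, noisy)
```

When the base scores really are close to linear in the features, as the abstract reports empirically, the fitted values average out per-sample noise in the base detector; on the synthetic data above the refined scores sit closer to the noise-free signal than the noisy inputs do.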

preprint · 2022 · arXiv

Alleviating Cold-start Problem in CTR Prediction with A Variational Embedding Learning Framework

We propose a general Variational Embedding Learning Framework (VELF) for alleviating the severe cold-start problem in CTR prediction. VELF addresses the cold-start problem by alleviating the overfitting caused by data sparsity in two ways: learning probabilistic embeddings, and incorporating trainable and regularized priors that utilize the rich side information of cold-start users and advertisements (Ads). The two techniques are naturally integrated into a variational inference framework, forming an end-to-end training process. Extensive empirical tests on benchmark datasets demonstrate the advantages of our proposed VELF. In addition, extended experiments confirm that our parameterized and regularized priors provide more generalization capability than traditional fixed priors.

preprint · 2022 · arXiv

Gating-adapted Wavelet Multiresolution Analysis for Exposure Sequence Modeling in CTR prediction

The exposure sequence is being actively studied for user interest modeling in Click-Through Rate (CTR) prediction. However, the existing methods for exposure sequence modeling impose an extensive computational burden and neglect noise problems, resulting in excessive latency and limited performance in online recommenders. In this paper, we propose to address the high latency and noise problems via Gating-adapted wavelet multiresolution analysis (Gama), which can effectively denoise the extremely long exposure sequence and adaptively capture the implied multi-dimension user interest with linear computational complexity. This is the first attempt to integrate a non-parametric multiresolution analysis technique into deep neural networks to model the user exposure sequence. Extensive experiments on a large-scale benchmark dataset and a real production dataset confirm the effectiveness of Gama for exposure sequence modeling, especially in cold-start scenarios. Benefiting from its low latency and high effectiveness, Gama has been deployed in our real large-scale industrial recommender, successfully serving hundreds of millions of users.

preprint · 2022 · arXiv

IAM: A Comprehensive and Large-Scale Dataset for Integrated Argument Mining Tasks

Traditionally, a debate usually requires a manual preparation process, including reading plenty of articles, selecting the claims, identifying the stances of the claims, seeking the evidence for the claims, etc. As AI debate has attracted more attention in recent years, it is worth exploring methods to automate the tedious process involved in the debating system. In this work, we introduce a comprehensive and large dataset named IAM, which can be applied to a series of argument mining tasks, including claim extraction, stance classification, evidence extraction, etc. Our dataset is collected from over 1k articles related to 123 topics. Nearly 70k sentences in the dataset are fully annotated based on their argument properties (e.g., claims, stances, evidence, etc.). We further propose two new integrated argument mining tasks associated with the debate preparation process: (1) claim extraction with stance classification (CESC) and (2) claim-evidence pair extraction (CEPE). We adopt a pipeline approach and an end-to-end method for each integrated task separately. Promising experimental results are reported to show the value and challenges of our proposed tasks, and to motivate future research on argument mining.

preprint · 2022 · arXiv

LightSecAgg: a Lightweight and Versatile Design for Secure Aggregation in Federated Learning

Secure model aggregation is a key component of federated learning (FL) that aims at protecting the privacy of each user's individual model while allowing for their global aggregation. It can be applied to any aggregation-based FL approach for training a global or personalized model. Model aggregation also needs to be resilient against likely user dropouts in FL systems, making its design substantially more complex. State-of-the-art secure aggregation protocols rely on secret sharing of the random seeds used for mask generation at the users to enable the reconstruction and cancellation of those belonging to the dropped users. The complexity of such approaches, however, grows substantially with the number of dropped users. We propose a new approach, named LightSecAgg, to overcome this bottleneck by changing the design from "random-seed reconstruction of the dropped users" to "one-shot aggregate-mask reconstruction of the active users via mask encoding/decoding". We show that LightSecAgg achieves the same privacy and dropout-resiliency guarantees as the state-of-the-art protocols while significantly reducing the overhead for resiliency against dropped users. We also demonstrate that, unlike existing schemes, LightSecAgg can be applied to secure aggregation in the asynchronous FL setting. Furthermore, we provide a modular system design and optimized on-device parallelization for scalable implementation, by enabling computational overlapping between model training and on-device encoding, as well as improving the speed of concurrent receiving and sending of chunked masks. We evaluate LightSecAgg via extensive experiments for training diverse models on various datasets in a realistic FL system with a large number of users and demonstrate that LightSecAgg significantly reduces the total training time.

preprint · 2022 · arXiv

One- and two-qubit gate infidelities due to motional errors in trapped ions and electrons

In this work, we derive analytic formulae that determine the effect of error mechanisms on one- and two-qubit gates in trapped ions and electrons. First, we analyze, and derive expressions for, the effect of driving field inhomogeneities on one-qubit gate fidelities. Second, we derive expressions for two-qubit gate errors, including static motional frequency shifts, trap anharmonicities, field inhomogeneities, heating, and motional dephasing. We show that, for small errors, each of our expressions for infidelity converges to its respective numerical simulation; this shows our formulae are sufficient for determining error budgets for high-fidelity gates, obviating numerical simulations in future projects. All of the derivations are general to any internal qubit state, and any mixed state of the ion crystal's motion that is diagonal in the Fock state basis. Our treatment of static motional frequency shifts, trap anharmonicities, heating, and motional dephasing applies to both laser-based and laser-free gates, while our treatment of field inhomogeneities applies to laser-free systems.

preprint · 2021 · arXiv

Crosslink-Net: Double-branch Encoder Segmentation Network via Fusing Vertical and Horizontal Convolutions

Accurate image segmentation plays a crucial role in medical image analysis, yet it faces great challenges from varied shapes, diverse sizes, and blurry boundaries. To address these difficulties, square kernel-based encoder-decoder architectures have been proposed and widely used, but their performance remains unsatisfactory. To further cope with these challenges, we present a novel double-branch encoder architecture. Our architecture is inspired by two observations: 1) Since the discrimination of features learned via square convolutional kernels needs to be further improved, we propose to utilize non-square vertical and horizontal convolutional kernels in the double-branch encoder, so features learned by the two branches can be expected to complement each other. 2) Considering that spatial attention can help models to better focus on the target region in a large-sized image, we develop an attention loss to further emphasize the segmentation on small-sized targets. Together, the above two schemes give rise to a novel double-branch encoder segmentation framework for medical image segmentation, namely Crosslink-Net. The experiments validate the effectiveness of our model on four datasets. The code is released at https://github.com/Qianyu1226/Crosslink-Net.

preprint · 2021 · arXiv

Deep Symmetric Adaptation Network for Cross-modality Medical Image Segmentation

Unsupervised domain adaptation (UDA) methods have shown their promising performance in the cross-modality medical image segmentation tasks. These typical methods usually utilize a translation network to transform images from the source domain to target domain or train the pixel-level classifier merely using translated source images and original target images. However, when there exists a large domain shift between source and target domains, we argue that this asymmetric structure could not fully eliminate the domain gap. In this paper, we present a novel deep symmetric architecture of UDA for medical image segmentation, which consists of a segmentation sub-network, and two symmetric source and target domain translation sub-networks. To be specific, based on two translation sub-networks, we introduce a bidirectional alignment scheme via a shared encoder and private decoders to simultaneously align features 1) from source to target domain and 2) from target to source domain, which helps effectively mitigate the discrepancy between domains. Furthermore, for the segmentation sub-network, we train a pixel-level classifier using not only original target images and translated source images, but also original source images and translated target images, which helps sufficiently leverage the semantic information from the images with different styles. Extensive experiments demonstrate that our method has remarkable advantages compared to the state-of-the-art methods in both cross-modality Cardiac and BraTS segmentation tasks.

preprint · 2021 · arXiv

Existence and Hölder continuity conditions for self-intersection local time of Rosenblatt process

We consider the existence and Hölder continuity conditions for the self-intersection local time of the Rosenblatt process. Moreover, we study the cases of intersection local time and collision local time, respectively.

preprint · 2021 · arXiv

Feasibility study of quantum computing using trapped electrons

We investigate the feasibility of using electrons in a linear Paul trap as qubits in a future quantum computer. We discuss the necessary experimental steps to realize such a device through a concrete design proposal, including trapping, cooling, electronic detection, spin readout and single and multi-qubit gate operations. Numeric simulations indicate that two-qubit Bell-state fidelities of order 99.99% can be achieved assuming reasonable experimental parameters.

preprint · 2021 · arXiv

Trapping electrons in a room-temperature microwave Paul trap

We demonstrate trapping of electrons in a millimeter-sized quadrupole Paul trap driven at 1.6 GHz in a room-temperature ultra-high vacuum setup. Cold electrons are introduced into the trap by ionization of atomic calcium via Rydberg states and stay confined by microwave and static electric fields for several tens of milliseconds. A fraction of these electrons remain trapped longer and show no measurable loss for measurement times up to a second. Electronic excitation of the motion reveals secular frequencies which can be tuned over a range of several tens to hundreds of MHz. Operating a similar electron Paul trap in a cryogenic environment may provide a platform for all-electric quantum computing with trapped electron spin qubits.

preprint · 2020 · arXiv

Building Information Modeling and Classification by Visual Learning At A City Scale

In this paper, we provide two case studies to demonstrate how artificial intelligence can empower civil engineering. In the first case, a machine learning-assisted framework, BRAILS, is proposed for city-scale building information modeling. Building information modeling (BIM) is an efficient way of describing buildings, which is essential to architecture, engineering, and construction. Our proposed framework employs deep learning techniques to extract visual information about buildings from satellite/street view images. Further, a novel machine learning (ML)-based statistical tool, SURF, is proposed to discover the spatial patterns in building metadata. The second case focuses on the task of soft-story building classification. Soft-story buildings are a type of building prone to collapse during a moderate or severe earthquake. Hence, identifying and retrofitting such buildings is vital in current earthquake preparedness efforts. For this task, we propose an automated deep learning-based procedure for identifying soft-story buildings from street view images at a regional scale. We also create a large-scale building image database and a semi-automated image labeling approach that effectively annotates new database entries. Through extensive computational experiments, we demonstrate the effectiveness of the proposed method.

preprint · 2020 · arXiv

Cosmic muon flux measurement and tunnel overburden structure imaging

We present a cosmic ray muon tomographic experiment for measuring the muon flux and imaging the tunnel overburden structures in Changshu, China. The device used in this study is a tracking detector based on plastic scintillators with SiPM technology, which can be conveniently operated in field work. The compact system, with a sensitive area of 6400 cm², can measure the angular distribution of cosmic muons. It is able to image the overburden density length from the surface of the overburden to the detector along the muon tracks. The open-sky muon flux measurement outside the tunnel is in good agreement with the modified Gaisser formula model. The distributions of muon flux at three positions inside the tunnel are very similar to that of the open sky. Assuming the average density of the overburden compact sandstone is 2.65 g/cm³, the overburden thickness can be obtained from the density length derived from the difference in muon flux inside and outside the tunnel. Moreover, for known penetrated lengths (i.e., topography of the overburden), the density anomalies of the overburden can also be obtained. This study suggests a potential application for imaging and detecting subsurface structures in civil engineering, tunnels or caverns with the cosmic ray muon telescope.
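The thickness estimate described in this abstract reduces to a single division once the density length is known. As a sketch, writing the density length along a muon track as $X$ (in g/cm²) and the assumed average density as $\bar{\rho}$ (both symbols are introduced here for illustration and are not taken from the paper), a uniform overburden gives:

```latex
\[
X = \int_{\text{track}} \rho \,\mathrm{d}\ell \;\approx\; \bar{\rho}\, L
\quad\Longrightarrow\quad
L \;\approx\; \frac{X}{\bar{\rho}},
\qquad \bar{\rho} = 2.65\ \mathrm{g/cm^{3}}.
\]
```

As a purely hypothetical illustration (not a measured value from the study), a density length of $X = 5300\ \mathrm{g/cm^2}$ would correspond to $L \approx 5300 / 2.65 = 2000\ \mathrm{cm}$, that is, 20 m of sandstone. Conversely, when $L$ is known from topography, the same relation yields the average density along the track, which is how density anomalies would show up.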

preprint · 2020 · arXiv

Crossbar-Net: A Novel Convolutional Network for Kidney Tumor Segmentation in CT Images

Because of irregular motion, similar appearance, and diverse shape, accurate segmentation of kidney tumors in CT images is a challenging task. To this end, we present a novel automatic segmentation method, termed Crossbar-Net, with the goal of accurately segmenting kidney tumors. Firstly, considering that traditional learning-based segmentation methods normally employ either whole images or square patches as the training samples, we sample orthogonal non-square patches (namely crossbar patches) to fully cover the whole kidney tumor in either the horizontal or vertical direction. These sampled crossbar patches can not only represent the detailed local information of a kidney tumor, as traditional patches do, but also describe the global appearance from either the horizontal or vertical direction using contextual information. Secondly, with the obtained crossbar patches, we train a convolutional neural network with two sub-models (i.e., a horizontal sub-model and a vertical sub-model) in a cascaded manner, to integrate the segmentation results from the two directions (i.e., horizontal and vertical). This cascaded training strategy effectively promotes consistency between the sub-models, by feeding each other with the most difficult samples, for a better segmentation. In the experiment, we evaluate our method on a real CT kidney tumor dataset comprising 3,500 images collected from 94 different patients. Compared with state-of-the-art segmentation methods, the results demonstrate the superior performance of our method on Dice ratio score, true positive fraction, centroid distance and Hausdorff distance. Moreover, we have extended Crossbar-Net to a different task, cardiac segmentation, with promising results that suggest good generalization.

preprint · 2020 · arXiv

Crossover-Net: Leveraging the Vertical-Horizontal Crossover Relation for Robust Segmentation

Robust segmentation for non-elongated tissues in medical images is hard to realize due to the large variation of the shape, size, and appearance of these tissues in different patients. In this paper, we present an end-to-end trainable deep segmentation model termed Crossover-Net for robust segmentation in medical images. Our proposed model is inspired by an insightful observation: during segmentation, the representation from the horizontal and vertical directions can provide different local appearance and orthogonality context information, which helps enhance the discrimination between different tissues by simultaneously learning from these two directions. Specifically, by converting the segmentation task to a pixel/voxel-wise prediction problem, firstly, we originally propose a cross-shaped patch, namely crossover-patch, which consists of a pair of (orthogonal and overlapped) vertical and horizontal patches, to capture the orthogonal vertical and horizontal relation. Then, we develop the Crossover-Net to learn the vertical-horizontal crossover relation captured by our crossover-patches. To achieve this goal, for learning the representation on a typical crossover-patch, we design a novel loss function to (1) impose the consistency on the overlap region of the vertical and horizontal patches and (2) preserve the diversity on their non-overlap regions. We have extensively evaluated our method on CT kidney tumor, MR cardiac, and X-ray breast mass segmentation tasks. Promising results are achieved according to our extensive evaluation and comparison with the state-of-the-art segmentation models.

preprint · 2020 · arXiv

Entangled Polynomial Codes for Secure, Private, and Batch Distributed Matrix Multiplication: Breaking the "Cubic" Barrier

In distributed matrix multiplication, a common scenario is to assign each worker a fraction of the multiplication task, by partitioning the input matrices into smaller submatrices. In particular, by dividing two input matrices into $m$-by-$p$ and $p$-by-$n$ subblocks, a single multiplication task can be viewed as computing linear combinations of $pmn$ submatrix products, which can be assigned to $pmn$ workers. Such block-partitioning based designs have been widely studied under the topics of secure, private, and batch computation, where state-of-the-art schemes all require computing at least a "cubic" ($pmn$) number of submatrix multiplications. Entangled polynomial codes, first presented for straggler mitigation, provide a powerful method for breaking the cubic barrier. They achieve a subcubic recovery threshold, meaning that the final product can be recovered from any subset of multiplication results with a size order-wise smaller than $pmn$. In this work, we show that entangled polynomial codes can be further extended to also include these three important settings, and provide a unified framework that order-wise reduces the total computational costs relative to the state of the art by achieving subcubic recovery thresholds.

preprint · 2020 · arXiv

Review-based Question Generation with Adaptive Instance Transfer and Augmentation

Online reviews provide rich information about products and services, yet it remains inefficient for potential consumers to exploit the reviews to fulfill their specific information needs. We propose to explore question generation as a new way of exploiting review information. One major challenge of this task is the lack of review-question pairs for training a neural generation model. We propose an iterative learning framework for handling this challenge via adaptive transfer and augmentation of the training instances with the help of the available user-posed question-answer data. To capture the aspect characteristics in reviews, the augmentation and generation procedures incorporate related features extracted via unsupervised learning. Experiments on data from 10 categories of a popular E-commerce site demonstrate the effectiveness of the framework, as well as the usefulness of the new task.

preprint · 2020 · arXiv

Straggler Mitigation in Distributed Matrix Multiplication: Fundamental Limits and Optimal Coding

We consider the problem of massive matrix multiplication, which underlies many data analytic applications, in a large-scale distributed system comprising a group of worker nodes. We target the stragglers' delay performance bottleneck, which is due to the unpredictable latency in waiting for the slowest nodes (or stragglers) to finish their tasks. We propose a novel coding strategy, named entangled polynomial code, for designing the intermediate computations at the worker nodes in order to minimize the recovery threshold (i.e., the number of workers that we need to wait for in order to compute the final output). We demonstrate the optimality of the entangled polynomial code in several cases, and show that it provides orderwise improvement over the conventional schemes for straggler mitigation. Furthermore, we characterize the optimal recovery threshold among all linear coding strategies within a factor of 2 using bilinear complexity, by developing an improved version of the entangled polynomial code. In particular, while evaluating bilinear complexity is a well-known challenging problem, we show that the optimal recovery threshold for linear coding strategies can be approximated within a factor of 2 of this fundamental quantity. On the other hand, the improved version of the entangled polynomial code enables further and orderwise reduction in the recovery threshold, compared to its basic version. Finally, we show that the techniques developed in this paper can also be extended to several other problems such as coded convolution and fault-tolerant computing, leading to tight characterizations.

preprint2020arXiv

Unsupervised Sketch-to-Photo Synthesis

Humans can envision a realistic photo given a free-hand sketch that is not only spatially imprecise and geometrically distorted but also without colors and visual details. We study unsupervised sketch-to-photo synthesis for the first time, learning from unpaired sketch-photo data where the target photo for a sketch is unknown during training. Existing works deal with either style change or spatial deformation alone, synthesizing photos from edge-aligned line drawings or transforming shapes within the same modality, e.g., color images. Our key insight is to decompose unsupervised sketch-to-photo synthesis into a two-stage translation task: first, shape translation from sketches to grayscale photos, and then content enrichment from grayscale to color photos. We also incorporate a self-supervised denoising objective and an attention module to handle abstraction and style variations that are inherent and specific to sketches. Our synthesis is sketch-faithful and photo-realistic, enabling sketch-based image retrieval in practice. An exciting corollary product is a universal and promising sketch generator that captures human visual perception beyond the edge map of a photo.

preprint2016arXiv

Finding the Optimal Demodulator Under Implementation Constraints

The common approach to designing a communication device is to maximize a well-defined objective function, e.g., the channel capacity or the cut-off rate. We propose easy-to-implement solutions for Gaussian channels that approximate the optimal results for these maximization problems. Three topics are addressed. First, we consider the case where the channel output is quantized, and we find the quantization thresholds that maximize the mutual information. The approximation derived from the asymptotic solution has a negligible loss over the entire range of SNR when 2-PAM modulation is used, and its quantization thresholds depend linearly on the standard deviation of the noise. We also derive a simple estimator of the relative capacity loss due to quantization, based on the high-rate limit. Then we consider the integer constraint on the decoding metric, and maximize the mismatched channel capacity. We study the asymptotic solution of the optimal metric assignment and show that the same approximation we derived in the matched decoding case still holds for the mismatched decoder. Finally, we consider the demodulation problem for 8PSK bit-interleaved coded modulation (BICM). We derive the approximated optimal demodulation metrics that maximize the general cut-off rate or the mismatched capacity using the max-log approximation. The error rate performances of the two metric assignments are compared, based on a Reed-Solomon-Viterbi (RSV) code, and the mismatched capacity metric turns out to be better. The proposed approximation can be computed using an efficient firmware algorithm, and improves the system performance of commercial chips.
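The first topic, choosing quantization thresholds that maximize mutual information, can be illustrated with a brute-force sketch. It assumes equiprobable 2-PAM symbols at -1 and +1, additive Gaussian noise with standard deviation `sigma`, and a symmetric three-level quantizer with thresholds at -t and +t. The grid search and all function names are illustrative assumptions; the closed-form approximation described in the abstract is not reproduced here.

```python
import math

def q_func(z):
    """Gaussian tail probability P(N(0, 1) > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def transition_probs(x, thresholds, sigma):
    """P(quantizer cell | transmitted x) for cells delimited by sorted thresholds."""
    # tail[k] = P(Y > k-th edge | x), with edges -inf, thresholds..., +inf
    tail = [1.0] + [q_func((t - x) / sigma) for t in thresholds] + [0.0]
    return [tail[k] - tail[k + 1] for k in range(len(tail) - 1)]

def mutual_information(thresholds, sigma):
    """I(X; Q(Y)) in bits for equiprobable inputs -1 and +1 (assumed setup)."""
    cond = [transition_probs(x, thresholds, sigma) for x in (-1.0, 1.0)]
    p_y = [0.5 * (a + b) for a, b in zip(cond[0], cond[1])]
    mi = 0.0
    for row in cond:
        for p, py in zip(row, p_y):
            if p > 0.0:
                mi += 0.5 * p * math.log2(p / py)
    return mi

def best_symmetric_threshold(sigma, steps=200, t_max=2.0):
    """Grid search for the threshold t of the quantizer with cells split at -t and +t."""
    candidates = [t_max * k / steps for k in range(steps + 1)]
    return max(candidates, key=lambda t: mutual_information([-t, t], sigma))

# Compare hard decisions (one threshold at 0) with the best 3-level quantizer.
sigma = 0.8
hard_mi = mutual_information([0.0], sigma)
t_best = best_symmetric_threshold(sigma)
soft_mi = mutual_information([-t_best, t_best], sigma)
```

Running the search for several values of `sigma` is one way to check numerically how the best threshold moves with the noise level, which is the dependence the abstract describes.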

preprint2016arXiv

Large Scale Business Discovery from Street Level Imagery

Search with local intent is becoming increasingly useful due to the popularity of the mobile device. The creation and maintenance of accurate listings of local businesses worldwide is time consuming and expensive. In this paper, we propose an approach to automatically discover businesses that are visible on street level imagery. Precise business store front detection enables accurate geo-location of businesses, and further provides input for business categorization, listing generation, etc. The large variety of business categories in different countries makes this a very challenging problem. Moreover, manual annotation is prohibitive due to the scale of this problem. We propose the use of a MultiBox based approach that takes input image pixels and directly outputs store front bounding boxes. This end-to-end learning approach preempts the need for hand modeling either the proposal generation phase or the post-processing phase, leveraging large labelled training datasets. We demonstrate that our approach outperforms state-of-the-art detection techniques by a large margin in terms of performance and run-time efficiency. In the evaluation, we show this approach achieves human accuracy in the low-recall settings. We also provide an end-to-end evaluation of business discovery in the real world.

preprint2015arXiv

An optimal approximation of Rosenblatt sheet by multiple Wiener integrals

Let $Z^{α,β}$ be the Rosenblatt sheet with the representation $$ Z^{α,β}(t,s)=\int^t_0\int^s_0\int^t_0\int^s_0Q^α(t,y_1,y_2)Q^β(s,u_1,u_2)B(dy_1,du_1)B(dy_2,du_2) $$ where $B$ is a Brownian sheet, $\frac12<α,β<1$, and $Q^α$ and $Q^β$ are given kernels. In this paper, we construct multiple Wiener integrals of the form \begin{align*} \int^t_0\int^s_0\int^t_0\int^s_0&[k_1(y_1,y_2)^{-\frac12α}(u_1,u_2)^{-\frac12β}+k_2(y_1\vee y_2)^{\frac12α}(y_1\wedge y_2)^{-\frac12α}|y_1-y_2|^{α-1}\\ &\cdot(u_1\vee u_2)^{\frac12β}(u_1\wedge u_2)^{-\frac12β}|u_1-u_2|^{β-1}]B(dy_1,du_1)B(dy_2,du_2),~~k_1,k_2\geq0, \end{align*} and obtain an optimal approximation of $Z^{α,β}(t,s)$.

preprint2015arXiv

Further Theoretical Study of Distribution Separation Method for Information Retrieval

Recently, a Distribution Separation Method (DSM) was proposed for relevance feedback in information retrieval, which aims to approximate the true relevance distribution by separating a seed irrelevance distribution from the mixture one. While DSM achieved promising empirical performance, its theoretical analysis still needs further study, as does its comparison with other related retrieval models. In this article, we first generalize DSM's theoretical property, by proving that its minimum correlation assumption is equivalent to the maximum (original and symmetrized) KL-Divergence assumption. Second, we analytically show that the EM algorithm in a well-known Mixture Model is essentially a distribution separation process and can be simplified using the linear separation algorithm in DSM. Some empirical results are also presented to support our theoretical analysis.

preprint2015arXiv

Nanoscale Origins of the Damage Tolerance of the High-Entropy Alloy CrMnFeCoNi

Damage-tolerance can be an elusive characteristic of structural materials requiring both high strength and ductility, properties that are often mutually exclusive. High-entropy alloys are of interest in this regard. Specifically, the single-phase CrMnFeCoNi alloy displays tensile strength levels of ~1 GPa, excellent ductility (~60-70%) and exceptional fracture toughness (KJIc > 200 MPa·m^1/2). Here, through the use of in-situ straining in an aberration-corrected transmission electron microscope, we report on the salient atomistic to micro-scale mechanisms underlying the origin of these properties. We identify a synergy of multiple deformation mechanisms, rarely achieved in metallic alloys, which generates high strength, work hardening and ductility, including the easy motion of Shockley partials, their interactions to form stacking-fault parallelepipeds, and arrest at planar-slip bands of undissociated dislocations. We further show that crack propagation is impeded by twinned, nano-scale bridges that form between the near-tip crack faces and delay fracture by shielding the crack tip.

preprint2015arXiv

Sketch-a-Net that Beats Humans

We propose a multi-scale multi-channel deep neural network framework that, for the first time, yields sketch recognition performance surpassing that of humans. Our superior performance is a result of explicitly embedding the unique characteristics of sketches in our model: (i) a network architecture designed for sketch rather than natural photo statistics, (ii) a multi-channel generalisation that encodes sequential ordering in the sketching process, and (iii) a multi-scale network ensemble with joint Bayesian fusion that accounts for the different levels of abstraction exhibited in free-hand sketches. We show that state-of-the-art deep networks specifically engineered for photos of natural objects fail to perform well on sketch recognition, regardless of whether they are trained using photos or sketches. Our network, on the other hand, not only delivers the best performance on the largest human sketch dataset to date, but is also small in size, making efficient training possible using just CPUs.

preprint2014arXiv

Metal oxide resistive switching: evolution of the density of states across the metal-insulator transition

We report the study of metal-STO-metal memristors where the doping concentration in STO can be fine-tuned through electric field migration of oxygen vacancies. In this tunnel junction device, the evolution of the Density Of States (DoS) can be followed continuously across the Metal-Insulator Transition (MIT). At very low dopant concentration, the junction displays characteristic signatures of discrete dopant levels. As the dopant concentration increases, the semiconductor band gap fills in but a soft Coulomb gap remains. At even higher doping, a transition to a metallic state occurs where the DoS at the Fermi level becomes finite and Altshuler-Aronov corrections to the DoS are observed. At the critical point of the MIT, the DoS scales linearly with energy, $N(\varepsilon) \sim \varepsilon$, a possible signature of multifractality.

preprint2013arXiv

Understanding Boltzmann Machine and Deep Learning via A Confident Information First Principle

Typical dimensionality reduction methods focus on directly reducing the number of random variables while retaining maximal variations in the data. In this paper, we consider the dimensionality reduction in parameter spaces of binary multivariate distributions. We propose a general Confident-Information-First (CIF) principle to maximally preserve parameters with confident estimates and rule out unreliable or noisy parameters. Formally, the confidence of a parameter can be assessed by its Fisher information, which establishes a connection with the inverse variance of any unbiased estimate for the parameter via the Cramér-Rao bound. We then revisit Boltzmann machines (BM) and theoretically show that both single-layer BM without hidden units (SBM) and restricted BM (RBM) can be solidly derived using the CIF principle. This can not only help us uncover and formalize the essential parts of the target density that SBM and RBM capture, but also suggest that the deep neural network consisting of several layers of RBM can be seen as the layer-wise application of CIF. Guided by the theoretical analysis, we develop a sample-specific CIF-based contrastive divergence (CD-CIF) algorithm for SBM and a CIF-based iterative projection procedure (IP) for RBM. Both CD-CIF and IP are studied in a series of density estimation experiments.
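The link between Fisher information and estimator variance that the CIF principle relies on can be checked on the simplest binary case. The sketch below is an illustration only: it uses a single Bernoulli parameter `p` rather than the multivariate binary distributions studied in the paper, and all function names are assumptions. It computes the Cramér-Rao bound from the Fisher information and compares it with the exact variance of the sample mean, an unbiased estimator that attains the bound.

```python
from fractions import Fraction
from math import comb

def fisher_information_bernoulli(p):
    """Fisher information of one Bernoulli(p) observation: 1 / (p * (1 - p))."""
    return 1 / (p * (1 - p))

def cramer_rao_bound(p, n):
    """Lower bound on the variance of any unbiased estimator of p from n samples."""
    return 1 / (n * fisher_information_bernoulli(p))

def variance_of_sample_mean(p, n):
    """Exact variance of the sample mean, summed over the binomial distribution."""
    return sum(
        comb(n, k) * p ** k * (1 - p) ** (n - k) * (Fraction(k, n) - p) ** 2
        for k in range(n + 1)
    )

# Exact rational arithmetic avoids any floating-point tolerance in the comparison.
p, n = Fraction(3, 10), 20
bound = cramer_rao_bound(p, n)
variance = variance_of_sample_mean(p, n)
```

Higher Fisher information means a tighter bound, which is the sense in which a parameter's estimate can be called more confident.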

preprint2011arXiv

Pattern formation in oscillatory complex networks consisting of excitable nodes

Oscillatory dynamics of complex networks has recently attracted great attention. In this paper we study pattern formation in oscillatory complex networks consisting of excitable nodes. We find that there exist a few center nodes and small skeletons for most oscillations. Complicated and seemingly random oscillatory patterns can be viewed as well-organized target waves propagating from center nodes along the shortest paths, and the shortest loops passing through both the center nodes and their driver nodes play the role of oscillation sources. Analyzing simple skeletons we are able to understand and predict various essential properties of the oscillations and effectively modulate the oscillations. These methods and results will give insights into pattern formation in complex networks, and provide suggestive ideas for studying and controlling oscillations in neural networks.

33 published item(s)

RxnCaption: Reformulating Reaction Diagram Parsing as Visual Prompt Guided Captioning

VAnim: Rendering-Aware Sparse State Modeling for Structure-Preserving Vector Animation

Inversion-by-Inversion: Exemplar-based Sketch-to-Photo Synthesis via Stochastic Differential Equations without Training

Multi-modality Affinity Inference for Weakly Supervised 3D Semantic Segmentation

3D Shape Reconstruction from Free-Hand Sketches

A Simple Test-Time Method for Out-of-Distribution Detection

Alleviating Cold-start Problem in CTR Prediction with A Variational Embedding Learning Framework

Gating-adapted Wavelet Multiresolution Analysis for Exposure Sequence Modeling in CTR prediction

IAM: A Comprehensive and Large-Scale Dataset for Integrated Argument Mining Tasks

LightSecAgg: a Lightweight and Versatile Design for Secure Aggregation in Federated Learning

One- and two-qubit gate infidelities due to motional errors in trapped ions and electrons

Crosslink-Net: Double-branch Encoder Segmentation Network via Fusing Vertical and Horizontal Convolutions

Deep Symmetric Adaptation Network for Cross-modality Medical Image Segmentation

Existence and Hölder continuity conditions for self-intersection local time of Rosenblatt process

Feasibility study of quantum computing using trapped electrons

Trapping electrons in a room-temperature microwave Paul trap

Building Information Modeling and Classification by Visual Learning At A City Scale

Cosmic muon flux measurement and tunnel overburden structure imaging

Crossbar-Net: A Novel Convolutional Network for Kidney Tumor Segmentation in CT Images

Crossover-Net: Leveraging the Vertical-Horizontal Crossover Relation for Robust Segmentation

Entangled Polynomial Codes for Secure, Private, and Batch Distributed Matrix Multiplication: Breaking the "Cubic" Barrier

Review-based Question Generation with Adaptive Instance Transfer and Augmentation

Straggler Mitigation in Distributed Matrix Multiplication: Fundamental Limits and Optimal Coding

Unsupervised Sketch-to-Photo Synthesis

Finding the Optimal Demodulator Under Implementation Constraints

Large Scale Business Discovery from Street Level Imagery

An optimal approximation of Rosenblatt sheet by multiple Wiener integrals

Further Theoretical Study of Distribution Separation Method for Information Retrieval

Nanoscale Origins of the Damage Tolerance of the High-Entropy Alloy CrMnFeCoNi

Sketch-a-Net that Beats Humans

Metal oxide resistive switching: evolution of the density of states across the metal-insulator transition

Understanding Boltzmann Machine and Deep Learning via A Confident Information First Principle

Pattern formation in oscillatory complex networks consisting of excitable nodes