Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
27works
0followers
20topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

27 published item(s)

preprint2026arXiv

RecRM-Bench: Benchmarking Multidimensional Reward Modeling for Agentic Recommender Systems

The integration of Large Language Model (LLM) agents is transforming recommender systems from simple query-item matching towards deeply personalized and interactive recommendations. Reinforcement Learning (RL) provides an essential framework for the optimization of these agents in recommendation tasks. However, current methodologies remain limited by a reliance on single dimensional outcome-based rewards that focus exclusively on final user interactions, overlooking critical intermediate capabilities, such as instruction following and complex intent understanding. Despite the necessity for designing multi-dimensional reward, the field lacks a standardized benchmark to facilitate this development. To bridge this gap, we introduce RecRM-Bench, the largest and most comprehensive benchmark to date for agentic recommender systems. It comprises over 1 million structured entries across four core evaluation dimensions: instruction following, factual consistency, query-item relevance, and fine-grained user behavior prediction. By supporting comprehensive assessment from syntactic compliance to complex intent grounding and preference modeling, RecRM-Bench provides a foundational dataset for training sophisticated reward models. Furthermore, we propose a systematic framework for the construction of multi-dimensional reward models and the integration of a hybrid reward function, establishing a robust foundation for developing reliable and highly capable agentic recommender systems. The complete RecRM-Bench dataset is publicly available at https://huggingface.co/datasets/wwzeng/RecRM-Bench.

preprint2022arXiv

AC-Feasible Power Transfer Regions of Virtual Power Plants: Characterization and Application

Distributed energy resources (DERs) in distribution networks can be aggregated as a virtual power plant (VPP) for transmission-level operations. A critical challenge for such coordination is the complexity of the AC-feasible power transfer region between a VPP and the transmission system at their point of common coupling. To overcome this challenge, this paper develops a characterization method for such regions. The proposed method constructs linear constraints to inner-approximate the AC-feasible power transfer regions. To guarantee AC-feasibility, the parameters in these constraints are determined by applying the Brouwer fixed point theorem to the second-order Taylor expansion of the nonlinear Dist-Flow equations. Based on the power transfer regions characterized with our method, a transmission-level operation problem with VPP participation is formulated and solved through big-M linearization. The proposed methods are verified by numerical experiments in the IEEE 33-bus and IEEE 136-bus test systems.

preprint2022arXiv

CGMN: A Contrastive Graph Matching Network for Self-Supervised Graph Similarity Learning

Graph similarity learning refers to calculating the similarity score between two graphs, which is required in many realistic applications, such as visual tracking, graph classification, and collaborative filtering. As most of the existing graph neural networks yield effective graph representations of a single graph, little effort has been made for jointly learning two graph representations and calculating their similarity score. In addition, existing unsupervised graph similarity learning methods are mainly clustering-based, which ignores the valuable information embodied in graph pairs. To this end, we propose a contrastive graph matching network (CGMN) for self-supervised graph similarity learning in order to calculate the similarity between any two input graph objects. Specifically, we generate two augmented views for each graph in a pair respectively. Then, we employ two strategies, namely cross-view interaction and cross-graph interaction, for effective node representation learning. The former is resorted to strengthen the consistency of node representations in two views. The latter is utilized to identify node differences between different graphs. Finally, we transform node representations into graph-level representations via pooling operations for graph similarity computation. We have evaluated CGMN on eight real-world datasets, and the experiment results show that the proposed new approach is superior to the state-of-the-art methods in graph similarity learning downstream tasks.

preprint2022arXiv

Continuity scaling: A rigorous framework for detecting and quantifying causality accurately

Data based detection and quantification of causation in complex, nonlinear dynamical systems is of paramount importance to science, engineering and beyond. Inspired by the widely used methodology in recent years, the cross-map-based techniques, we develop a general framework to advance towards a comprehensive understanding of dynamical causal mechanisms, which is consistent with the natural interpretation of causality. In particular, instead of measuring the smoothness of the cross map as conventionally implemented, we define causation through measuring the {\it scaling law} for the continuity of the investigated dynamical system directly. The uncovered scaling law enables accurate, reliable, and efficient detection of causation and assessment of its strength in general complex dynamical systems, outperforming those existing representative methods. The continuity scaling based framework is rigorously established and demonstrated using datasets from model complex systems and the real world.

preprint2022arXiv

Cross-View Cross-Scene Multi-View Crowd Counting

Multi-view crowd counting has been previously proposed to utilize multi-cameras to extend the field-of-view of a single camera, capturing more people in the scene, and improve counting performance for occluded people or those in low resolution. However, the current multi-view paradigm trains and tests on the same single scene and camera-views, which limits its practical application. In this paper, we propose a cross-view cross-scene (CVCS) multi-view crowd counting paradigm, where the training and testing occur on different scenes with arbitrary camera layouts. To dynamically handle the challenge of optimal view fusion under scene and camera layout change and non-correspondence noise due to camera calibration errors or erroneous features, we propose a CVCS model that attentively selects and fuses multiple views together using camera layout geometry, and a noise view regularization method to train the model to handle non-correspondence errors. We also generate a large synthetic multi-camera crowd counting dataset with a large number of scenes and camera views to capture many possible variations, which avoids the difficulty of collecting and annotating such a large real dataset. We then test our trained CVCS model on real multi-view counting datasets, by using unsupervised domain transfer. The proposed CVCS model trained on synthetic data outperforms the same model trained only on real data, and achieves promising performance compared to fully supervised methods that train and test on the same single scene.

preprint2022arXiv

Efficient Pipeline Planning for Expedited Distributed DNN Training

To train modern large DNN models, pipeline parallelism has recently emerged, which distributes the model across GPUs and enables different devices to process different microbatches in pipeline. Earlier pipeline designs allow multiple versions of model parameters to co-exist (similar to asynchronous training), and cannot ensure the same model convergence and accuracy performance as without pipelining. Synchronous pipelining has recently been proposed which ensures model performance by enforcing a synchronization barrier between training iterations. Nonetheless, the synchronization barrier requires waiting for gradient aggregation from all microbatches and thus delays the training progress. Optimized pipeline planning is needed to minimize such wait and hence the training time, which has not been well studied in the literature. This paper designs efficient, near-optimal algorithms for expediting synchronous pipeline-parallel training of modern large DNNs over arbitrary inter-GPU connectivity. Our algorithm framework comprises two components: a pipeline partition and device mapping algorithm, and a pipeline scheduler that decides processing order of microbatches over the partitions, which together minimize the per-iteration training time. We conduct thorough theoretical analysis, extensive testbed experiments and trace-driven simulation, and demonstrate our scheme can accelerate training up to 157% compared with state-of-the-art designs.

preprint2022arXiv

GDsmith: Detecting Bugs in Graph Database Engines

Graph database engines stand out in the era of big data for their efficiency of modeling and processing linked data. There is a strong need of testing graph database engines. However, random testing, the most practical way of automated test generation, faces the challenges of semantic validity, non-empty result, and behavior diversity to detect bugs in graph database engines. To address these challenges, in this paper, we propose GDsmith, the first black-box approach for testing graph database engines. It ensures that each randomly generated Cypher query satisfies the semantic requirements via skeleton generation and completion. GDsmith includes our technique to increase the probability of producing Cypher queries that return non-empty results by leveraging three types of structural mutation strategies. GDsmith also includes our technique to improve the behavior diversity of the generated Cypher queries by selecting property keys according to their previous frequencies when generating new queries. Our evaluation results demonstrate that GDsmith is effective and efficient for automated query generation and substantially outperforms the baseline. GDsmith successfully detects 27 previously unknown bugs on the released versions of three popular open-source graph database engines and receive positive feedback from their developers.

preprint2022arXiv

Neural Piecewise-Constant Delay Differential Equations

Continuous-depth neural networks, such as the Neural Ordinary Differential Equations (ODEs), have aroused a great deal of interest from the communities of machine learning and data science in recent years, which bridge the connection between deep neural networks and dynamical systems. In this article, we introduce a new sort of continuous-depth neural network, called the Neural Piecewise-Constant Delay Differential Equations (PCDDEs). Here, unlike the recently proposed framework of the Neural Delay Differential Equations (DDEs), we transform the single delay into the piecewise-constant delay(s). The Neural PCDDEs with such a transformation, on one hand, inherit the strength of universal approximating capability in Neural DDEs. On the other hand, the Neural PCDDEs, leveraging the contributions of the information from the multiple previous time steps, further promote the modeling capability without augmenting the network dimension. With such a promotion, we show that the Neural PCDDEs do outperform the several existing continuous-depth neural frameworks on the one-dimensional piecewise-constant delay population dynamics and real-world datasets, including MNIST, CIFAR10, and SVHN.

preprint2022arXiv

PICASSO: Unleashing the Potential of GPU-centric Training for Wide-and-deep Recommender Systems

The development of personalized recommendation has significantly improved the accuracy of information matching and the revenue of e-commerce platforms. Recently, it has 2 trends: 1) recommender systems must be trained timely to cope with ever-growing new products and ever-changing user interests from online marketing and social network; 2) SOTA recommendation models introduce DNN modules to improve prediction accuracy. Traditional CPU-based recommender systems cannot meet these two trends, and GPU- centric training has become a trending approach. However, we observe that GPU devices in training recommender systems are underutilized, and they cannot attain an expected throughput improvement as what it has achieved in CV and NLP areas. This issue can be explained by two characteristics of these recommendation models: First, they contain up to a thousand input feature fields, introducing fragmentary and memory-intensive operations; Second, the multiple constituent feature interaction submodules introduce substantial small-sized compute kernels. To remove this roadblock to the development of recommender systems, we propose a novel framework named PICASSO to accelerate the training of recommendation models on commodity hardware. Specifically, we conduct a systematic analysis to reveal the bottlenecks encountered in training recommendation models. We leverage the model structure and data distribution to unleash the potential of hardware through our packing, interleaving, and caching optimization. Experiments show that PICASSO increases the hardware utilization by an order of magnitude on the basis of SOTA baselines and brings up to 6x throughput improvement for a variety of industrial recommendation models. Using the same hardware budget in production, PICASSO on average shortens the walltime of daily training tasks by 7 hours, significantly reducing the delay of continuous delivery.

preprint2022arXiv

RAW-GNN: RAndom Walk Aggregation based Graph Neural Network

Graph-Convolution-based methods have been successfully applied to representation learning on homophily graphs where nodes with the same label or similar attributes tend to connect with one another. Due to the homophily assumption of Graph Convolutional Networks (GCNs) that these methods use, they are not suitable for heterophily graphs where nodes with different labels or dissimilar attributes tend to be adjacent. Several methods have attempted to address this heterophily problem, but they do not change the fundamental aggregation mechanism of GCNs because they rely on summation operators to aggregate information from neighboring nodes, which is implicitly subject to the homophily assumption. Here, we introduce a novel aggregation mechanism and develop a RAndom Walk Aggregation-based Graph Neural Network (called RAW-GNN) method. The proposed approach integrates the random walk strategy with graph neural networks. The new method utilizes breadth-first random walk search to capture homophily information and depth-first search to collect heterophily information. It replaces the conventional neighborhoods with path-based neighborhoods and introduces a new path-based aggregator based on Recurrent Neural Networks. These designs make RAW-GNN suitable for both homophily and heterophily graphs. Extensive experimental results showed that the new method achieved state-of-the-art performance on a variety of homophily and heterophily graphs.

preprint2022arXiv

Tie-line Security Regions in High Dimension for Renewable Accommodations

Tie-line power exchanges among regional power systems facilitate renewable accommodations. Power exchanges can be calculated via a tie-line security region that provides the feasible region of the coupling parameters among regional power systems. However, a tie-line security region is a high-dimension polytope due to multiple time periods and border buses inherently in power system operations, leading to the considerable computational burden. A fast calculation method for tie-line security regions in high dimension is studied in this paper. The high-dimension polytope across all the time periods is decomposed as a Cartesian production of lower-dimension polytopes at each time period by leveraging dispatch levels of generations. For each lower-dimension polytope, the computational burden brought by multiple border buses is alleviated by aggregating tie-line power. Also, minimum renewable curtailments are preserved by incorporating an additional dimension in the tie-line security region. For the coupling parameters located within our tie-line security region, a feasible decision of the regional power system exists. Finally, the tie-line security region is used to reduce renewable curtailments in an interconnected power system under a decentralized and non-iterative framework. The performance of the presented methods is corroborated in the IEEE 9-bus system, a 661-bus utility system and a five-region system.

preprint2022arXiv

Whale: Efficient Giant Model Training over Heterogeneous GPUs

The scaling up of deep neural networks has been demonstrated to be effective in improving model quality, but also encompasses several training challenges in terms of training efficiency, programmability, and resource adaptability. We present Whale, a general and efficient distributed training framework for giant models. To support various parallel strategies and their hybrids, Whale generalizes the programming interface by defining two new primitives in the form of model annotations, allowing for incorporating user hints. The Whale runtime utilizes those annotations and performs graph optimizations to transform a local deep learning DAG graph for distributed multi-GPU execution. Whale further introduces a novel hardware-aware parallel strategy, which improves the performance of model training on heterogeneous GPUs in a balanced manner. Deployed in a production cluster with 512 GPUs, Whale successfully trains an industry-scale multimodal model with over ten trillion model parameters, named M6, demonstrating great scalability and efficiency.

preprint2021arXiv

AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search

Large pre-trained language models such as BERT have shown their effectiveness in various natural language processing tasks. However, the huge parameter size makes them difficult to be deployed in real-time applications that require quick inference with limited resources. Existing methods compress BERT into small models while such compression is task-independent, i.e., the same compressed BERT for all different downstream tasks. Motivated by the necessity and benefits of task-oriented BERT compression, we propose a novel compression method, AdaBERT, that leverages differentiable Neural Architecture Search to automatically compress BERT into task-adaptive small models for specific tasks. We incorporate a task-oriented knowledge distillation loss to provide search hints and an efficiency-aware loss as search constraints, which enables a good trade-off between efficiency and effectiveness for task-adaptive BERT compression. We evaluate AdaBERT on several NLP tasks, and the results demonstrate that those task-adaptive compressed models are 12.7x to 29.3x faster than BERT in inference time and 11.5x to 17.0x smaller in terms of parameter size, while comparable performance is maintained.

preprint2021arXiv

Joule-Thomson expansion of the torus-like black hole

In this paper, we study Joule-Thomson effects for the torus-like black hole. The Joule-Thomson coefficients, the inversion curves and the isenthalpic curves are studied. Furthermore, we investigate similarities and differences between the Van der Waals fluid, the torus-like black hole and the charged AdS black holes for the expansion. The isenthalpic curves in the $T-P$ plane are obtained. Moreover, we determine the cooling-heating regions.

preprint2021arXiv

Neural Delay Differential Equations

Neural Ordinary Differential Equations (NODEs), a framework of continuous-depth neural networks, have been widely applied, showing exceptional efficacy in coping with some representative datasets. Recently, an augmented framework has been successfully developed for conquering some limitations emergent in application of the original framework. Here we propose a new class of continuous-depth neural networks with delay, named as Neural Delay Differential Equations (NDDEs), and, for computing the corresponding gradients, we use the adjoint sensitivity method to obtain the delayed dynamics of the adjoint. Since the differential equations with delays are usually seen as dynamical systems of infinite dimension possessing more fruitful dynamics, the NDDEs, compared to the NODEs, own a stronger capacity of nonlinear representations. Indeed, we analytically validate that the NDDEs are of universal approximators, and further articulate an extension of the NDDEs, where the initial function of the NDDEs is supposed to satisfy ODEs. More importantly, we use several illustrative examples to demonstrate the outstanding capacities of the NDDEs and the NDDEs with ODEs' initial value. Specifically, (1) we successfully model the delayed dynamics where the trajectories in the lower-dimensional phase space could be mutually intersected, while the traditional NODEs without any argumentation are not directly applicable for such modeling, and (2) we achieve lower loss and higher accuracy not only for the data produced synthetically by complex models but also for the real-world image datasets, i.e., CIFAR10, MNIST, and SVHN. Our results on the NDDEs reveal that appropriately articulating the elements of dynamical systems into the network design is truly beneficial to promoting the network performance.

preprint2020arXiv

Auto-MAP: A DQN Framework for Exploring Distributed Execution Plans for DNN Workloads

The last decade has witnessed growth in the computational requirements for training deep neural networks. Current approaches (e.g., data/model parallelism, pipeline parallelism) parallelize training tasks onto multiple devices. However, these approaches always rely on specific deep learning frameworks and requires elaborate manual design, which make it difficult to maintain and share between different type of models. In this paper, we propose Auto-MAP, a framework for exploring distributed execution plans for DNN workloads, which can automatically discovering fast parallelization strategies through reinforcement learning on IR level of deep learning models. Efficient exploration remains a major challenge for reinforcement learning. We leverage DQN with task-specific pruning strategies to help efficiently explore the search space including optimized strategies. Our evaluation shows that Auto-MAP can find the optimal solution in two hours, while achieving better throughput on several NLP and convolution models.

preprint2020arXiv

DAPPLE: A Pipelined Data Parallel Approach for Training Large Models

It is a challenging task to train large DNN models on sophisticated GPU platforms with diversified interconnect capabilities. Recently, pipelined training has been proposed as an effective approach for improving device utilization. However, there are still several tricky issues to address: improving computing efficiency while ensuring convergence, and reducing memory usage without incurring additional computing costs. We propose DAPPLE, a synchronous training framework which combines data parallelism and pipeline parallelism for large DNN models. It features a novel parallelization strategy planner to solve the partition and placement problems, and explores the optimal hybrid strategy of data and pipeline parallelism. We also propose a new runtime scheduling algorithm to reduce device memory usage, which is orthogonal to re-computation approach and does not come at the expense of training throughput. Experiments show that DAPPLE planner consistently outperforms strategies generated by PipeDream's planner by up to 3.23x under synchronous training scenarios, and DAPPLE runtime outperforms GPipe by 1.6x speedup of training throughput and reduces the memory consumption of 12% at the same time.

preprint2020arXiv

Graph Structural-topic Neural Network

Graph Convolutional Networks (GCNs) achieved tremendous success by effectively gathering local features for nodes. However, commonly do GCNs focus more on node features but less on graph structures within the neighborhood, especially higher-order structural patterns. However, such local structural patterns are shown to be indicative of node properties in numerous fields. In addition, it is not just single patterns, but the distribution over all these patterns matter, because networks are complex and the neighborhood of each node consists of a mixture of various nodes and structural patterns. Correspondingly, in this paper, we propose Graph Structural-topic Neural Network, abbreviated GraphSTONE, a GCN model that utilizes topic models of graphs, such that the structural topics capture indicative graph structures broadly from a probabilistic aspect rather than merely a few structures. Specifically, we build topic models upon graphs using anonymous walks and Graph Anchor LDA, an LDA variant that selects significant structural patterns first, so as to alleviate the complexity and generate structural topics efficiently. In addition, we design multi-view GCNs to unify node features and structural topic features and utilize structural topics to guide the aggregation. We evaluate our model through both quantitative and qualitative experiments, where our model exhibits promising performance, high efficiency, and clear interpretability.

preprint2020arXiv

Grasping Detection Network with Uncertainty Estimation for Confidence-Driven Semi-Supervised Domain Adaptation

Data-efficient domain adaptation with only a few labelled data is desired for many robotic applications, e.g., in grasping detection, the inference skill learned from a grasping dataset is not universal enough to directly apply on various other daily/industrial applications. This paper presents an approach enabling the easy domain adaptation through a novel grasping detection network with confidence-driven semi-supervised learning, where these two components deeply interact with each other. The proposed grasping detection network specially provides a prediction uncertainty estimation mechanism by leveraging on Feature Pyramid Network (FPN), and the mean-teacher semi-supervised learning utilizes such uncertainty information to emphasizing the consistency loss only for those unlabelled data with high confidence, which we referred it as the confidence-driven mean teacher. This approach largely prevents the student model to learn the incorrect/harmful information from the consistency loss, which speeds up the learning progress and improves the model accuracy. Our results show that the proposed network can achieve high success rate on the Cornell grasping dataset, and for domain adaptation with very limited data, the confidence-driven mean teacher outperforms the original mean teacher and direct training by more than 10% in evaluation loss especially for avoiding the overfitting and model diverging.

preprint2020arXiv

Impact of intra and inter-cluster coupling balance on the performance of nonlinear networked systems

The dynamical and structural aspects of cluster synchronization (CS) in complex systems have been intensively investigated in recent years. Here, we study CS of dynamical systems with intra and inter-cluster couplings. We propose new metrics that describe the performance of such systems and evaluate them as a function of the strength of the couplings within and between clusters. We obtain analytical results that indicate that spectral differences between the Laplacian matrices associated with the partition between intra and inter-couplings directly affect the proposed metrics of system performance. Our results show that the dynamics of the system might exhibit an optimal balance that optimizes its performance. Our work provides new insights into the way specific symmetry properties relate to collective behavior, and could lead to new forms to increase the controllability of complex systems and to optimize their stability.

preprint2020arXiv

NWPU-Crowd: A Large-Scale Benchmark for Crowd Counting and Localization

In the last decade, crowd counting and localization attract much attention of researchers due to its wide-spread applications, including crowd monitoring, public safety, space design, etc. Many Convolutional Neural Networks (CNN) are designed for tackling this task. However, currently released datasets are so small-scale that they can not meet the needs of the supervised CNN-based algorithms. To remedy this problem, we construct a large-scale congested crowd counting and localization dataset, NWPU-Crowd, consisting of 5,109 images, in a total of 2,133,375 annotated heads with points and boxes. Compared with other real-world datasets, it contains various illumination scenes and has the largest density range (0~20,033). Besides, a benchmark website is developed for impartially evaluating the different methods, which allows researchers to submit the results of the test set. Based on the proposed dataset, we further describe the data characteristics, evaluate the performance of some mainstream state-of-the-art (SOTA) methods, and analyze the new problems that arise on the new data. What's more, the benchmark is deployed at \url{https://www.crowdbenchmark.com/}, and the dataset/code/models/results are available at \url{https://gjy3035.github.io/NWPU-Crowd-Sample-Code/}.

preprint2020arXiv

One-shot Text Field Labeling using Attention and Belief Propagation for Structure Information Extraction

Structured information extraction from document images usually consists of three steps: text detection, text recognition, and text field labeling. While text detection and text recognition have been heavily studied and improved a lot in literature, text field labeling is less explored and still faces many challenges. Existing learning based methods for text labeling task usually require a large amount of labeled examples to train a specific model for each type of document. However, collecting large amounts of document images and labeling them is difficult and sometimes impossible due to privacy issues. Deploying separate models for each type of document also consumes a lot of resources. Facing these challenges, we explore one-shot learning for the text field labeling task. Existing one-shot learning methods for the task are mostly rule-based and have difficulty in labeling fields in crowded regions with few landmarks and fields consisting of multiple separate text regions. To alleviate these problems, we proposed a novel deep end-to-end trainable approach for one-shot text field labeling, which makes use of attention mechanism to transfer the layout information between document images. We further applied conditional random field on the transferred layout information for the refinement of field labeling. We collected and annotated a real-world one-shot field labeling dataset with a large variety of document types and conducted extensive experiments to examine the effectiveness of the proposed model. To stimulate research in this direction, the collected dataset and the one-shot model will be released1.

preprint2020arXiv

Pixel-wise Crowd Understanding via Synthetic Data

Crowd analysis via computer vision techniques is an important topic in the field of video surveillance, which has wide-spread applications including crowd monitoring, public safety, space design and so on. Pixel-wise crowd understanding is the most fundamental task in crowd analysis because of its finer results for video sequences or still images than other analysis tasks. Unfortunately, pixel-level understanding needs a large amount of labeled training data. Annotating them is an expensive work, which causes that current crowd datasets are small. As a result, most algorithms suffer from over-fitting to varying degrees. In this paper, take crowd counting and segmentation as examples from the pixel-wise crowd understanding, we attempt to remedy these problems from two aspects, namely data and methodology. Firstly, we develop a free data collector and labeler to generate synthetic and labeled crowd scenes in a computer game, Grand Theft Auto V. Then we use it to construct a large-scale, diverse synthetic crowd dataset, which is named as "GCC Dataset". Secondly, we propose two simple methods to improve the performance of crowd understanding via exploiting the synthetic data. To be specific, 1) supervised crowd understanding: pre-train a crowd analysis model on the synthetic data, then fine-tune it using the real data and labels, which makes the model perform better on the real world; 2) crowd understanding via domain adaptation: translate the synthetic data to photo-realistic images, then train the model on translated data and labels. As a result, the trained model works well in real crowd scenes.

preprint2020arXiv

RPM-Oriented Query Rewriting Framework for E-commerce Keyword-Based Sponsored Search

Sponsored search optimizes revenue and relevance, which is estimated by Revenue Per Mille (RPM). Existing sponsored search models are all based on traditional statistical models, which have poor RPM performance when queries follow a heavy-tailed distribution. Here, we propose an RPM-oriented Query Rewriting Framework (RQRF) which outputs related bid keywords that can yield high RPM. RQRF embeds both queries and bid keywords to vectors in the same implicit space, converting the rewriting probability between each query and keyword to the distance between the two vectors. For label construction, we propose an RPM-oriented sample construction method, labeling keywords based on whether or not they can lead to high RPM. Extensive experiments are conducted to evaluate performance of RQRF. In a one month large-scale real-world traffic of e-commerce sponsored search system, the proposed model significantly outperforms traditional baseline.

preprint2020arXiv

SwapText: Image Based Texts Transfer in Scenes

Swapping text in scene images while preserving original fonts, colors, sizes and background textures is a challenging task due to the complex interplay between different factors. In this work, we present SwapText, a three-stage framework to transfer texts across scene images. First, a novel text swapping network is proposed to replace text labels only in the foreground image. Second, a background completion network is learned to reconstruct background images. Finally, the generated foreground image and background image are used to generate the word image by the fusion network. Using the proposing framework, we can manipulate the texts of the input images even with severe geometric distortion. Qualitative and quantitative results are presented on several scene text datasets, including regular and irregular text datasets. We conducted extensive experiments to prove the usefulness of our method such as image based text translation, text image synthesis, etc.

preprint2019arXiv

Supercontinuum generation without residual pump peak through multiple coherent pump seeds

Residual pump peak in fiber-based supercontinuum, as a general phenomenon, limits its practical application. We report a novel supercontinuum generation (SCG) in a conventional highly nonlinear fiber (HNLF) through multiple coherent pump technique, which eliminates the residual pump peak existed in conventional SCG. The multiple coherent pump technique is realized by double bound-state solitons achieved from a homemade modelocked fiber laser. We further compare the SCGs pumped by conventional bound-state soliton and single soliton. It confirms that the effective elimination of the residual pump peak in supercontinuum owes to higher transferring efficiency of the pump energy to new generated frequencies in the multiple coherent pump scheme. The use of multiple coherent pump scheme, i.e., double bound-state solitons, provides a new, simple and promising method to obtain flat supercontinuum source.