Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
39 works
0 followers
22 topics
4 close collaborators

Published work

39 published items

preprint 2026 arXiv

OxyGent: Making Multi-Agent Systems Modular, Observable, and Evolvable via Oxy Abstraction

Deploying production-ready multi-agent systems (MAS) in complex industrial environments remains challenging due to limitations in scalability, observability, and autonomous evolution. We present OxyGent, an open-source framework driven by two core novelties: a unified Oxy abstraction and the OxyBank evolution engine. The unified abstraction encapsulates agents, tools, LLMs, and reasoning flows as pluggable atomic components, enabling Lego-like scalable system composition and non-intrusive monitoring. To enhance observability, OxyGent introduces permission-driven dynamic planning that replaces rigid workflows with execution graphs generated at runtime, providing adaptive visualizations. Furthermore, to support continuous evolution, OxyBank serves as an AI asset management platform that drives automated data backflow, annotation, and joint evolution. Empirical evaluations and real-world case studies show that OxyGent provides a robust and scalable foundation for MAS. OxyGent is fully open-sourced under the Apache License 2.0 at https://github.com/jd-opensource/OxyGent.

preprint 2024 arXiv

Re-evaluating the Memory-balanced Pipeline Parallelism: BPipe

Pipeline parallelism is an essential technique in the training of large-scale Transformer models. However, it suffers from imbalanced memory consumption, leading to insufficient memory utilization. The BPipe technique was proposed to address this issue and has proven effective in the GPT-3 model. Nevertheless, our experiments have not yielded similar benefits for LLaMA training. Additionally, BPipe only yields negligible benefits for GPT-3 training when applying flash attention. We analyze the underlying causes of the divergent performance of BPipe on GPT-3 and LLaMA. Furthermore, we introduce a novel method to estimate the performance of BPipe.

preprint 2022 arXiv

A Normalized Gaussian Wasserstein Distance for Tiny Object Detection

Detecting tiny objects is a very challenging problem since a tiny object only contains a few pixels in size. We demonstrate that state-of-the-art detectors do not produce satisfactory results on tiny objects due to the lack of appearance information. Our key observation is that Intersection over Union (IoU) based metrics such as IoU itself and its extensions are very sensitive to the location deviation of the tiny objects, and drastically deteriorate the detection performance when used in anchor-based detectors. To alleviate this, we propose a new evaluation metric using Wasserstein distance for tiny object detection. Specifically, we first model the bounding boxes as 2D Gaussian distributions and then propose a new metric dubbed Normalized Wasserstein Distance (NWD) to compute the similarity between them by their corresponding Gaussian distributions. The proposed NWD metric can be easily embedded into the assignment, non-maximum suppression, and loss function of any anchor-based detector to replace the commonly used IoU metric. We evaluate our metric on a new dataset for tiny object detection (AI-TOD) in which the average object size is much smaller than existing object detection datasets. Extensive experiments show that, when equipped with NWD metric, our approach yields performance that is 6.7 AP points higher than a standard fine-tuning baseline, and 6.0 AP points higher than state-of-the-art competitors. Codes are available at: https://github.com/jwwangchn/NWD.
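The idea of modeling boxes as 2D Gaussians can be sketched as follows. This is a minimal illustration, not the authors' released code: it assumes each box (cx, cy, w, h) maps to a Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4), that the distance is normalized as exp(-W2 / C), and that the constant `c` (12.8 here) is a placeholder chosen for illustration.

```python
import math

def nwd(box_a, box_b, c=12.8):
    """Sketch of a normalized Wasserstein distance between two boxes.

    Boxes are (cx, cy, w, h). The normalizing constant `c` is an
    assumed, dataset-dependent value, not one taken from the abstract.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # For axis-aligned Gaussians with covariance diag(w^2/4, h^2/4),
    # the squared 2-Wasserstein distance has this closed form.
    w2_sq = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    # Map the distance into (0, 1]; identical boxes score 1.
    return math.exp(-math.sqrt(w2_sq) / c)

if __name__ == "__main__":
    tiny_a = (10.0, 10.0, 4.0, 4.0)
    tiny_b = (13.0, 10.0, 4.0, 4.0)  # same size, shifted by 3 pixels
    print(nwd(tiny_a, tiny_a), nwd(tiny_a, tiny_b))
```

Unlike IoU, which drops sharply for a small shift of a tiny box and is zero once the boxes stop overlapping, this similarity decays smoothly with the offset.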

preprint 2022 arXiv

Asymptotics for Strassen's Optimal Transport Problem

In this paper, we consider Strassen's version of optimal transport (OT) problem, which concerns minimizing the excess-cost probability (i.e., the probability that the cost is larger than a given value) over all couplings of two given distributions. We derive large deviation, moderate deviation, and central limit theorems for this problem. Our proof is based on Strassen's dual formulation of the OT problem, Sanov's theorem on the large deviation principle (LDP) of empirical measures, as well as the moderate deviation principle (MDP) and central limit theorems (CLT) of empirical measures. In order to apply the LDP, MDP, and CLT to Strassen's OT problem, nested formulas for Strassen's OT problem are derived. Based on these nested formulas and using a splitting technique, we construct asymptotically optimal solutions to Strassen's OT problem and its dual formulation.
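The objective described in this abstract can be written compactly. The notation is an assumption introduced here for illustration: $\mu$ and $\nu$ are the two given distributions, $\Pi(\mu, \nu)$ is their set of couplings, $c$ is the cost function, and $\alpha$ is the given threshold.

```latex
\[
  \inf_{\pi \in \Pi(\mu, \nu)} \; \pi\bigl(\{(x, y) : c(x, y) > \alpha\}\bigr)
\]
```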

preprint 2022 arXiv

Autofocus for Event Cameras

Focus control (FC) is crucial for cameras to capture sharp images in challenging real-world scenarios. The autofocus (AF) facilitates the FC by automatically adjusting the focus settings. However, due to the lack of effective AF methods for the recently introduced event cameras, their FC still relies on naive AF like manual focus adjustments, leading to poor adaptation in challenging real-world conditions. In particular, the inherent differences between event and frame data in terms of sensing modality, noise, temporal resolutions, etc., bring many challenges in designing an effective AF method for event cameras. To address these challenges, we develop a novel event-based autofocus framework consisting of an event-specific focus measure called event rate (ER) and a robust search strategy called event-based golden search (EGS). To verify the performance of our method, we have collected an event-based autofocus dataset (EAD) containing well-synchronized frames, events, and focal positions in a wide variety of challenging scenes with severe lighting and motion conditions. The experiments on this dataset and additional real-world scenarios demonstrated the superiority of our method over state-of-the-art approaches in terms of efficiency and accuracy.
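A search of this kind can be illustrated with a textbook golden-section search over focal positions. This is a generic sketch, not the authors' EGS: the `focus_measure` callable stands in for a focus score such as an event rate, and the quadratic toy measure and the peak position 3.2 are hypothetical. It also assumes the measure has a single peak on the search interval.

```python
import math

def golden_section_max(focus_measure, lo, hi, tol=1e-3):
    """Locate the maximizer of a single-peaked function on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = focus_measure(c), focus_measure(d)
    while (b - a) > tol:
        if fc > fd:
            # The peak lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = focus_measure(c)
        else:
            # The peak lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = focus_measure(d)
    return (a + b) / 2

if __name__ == "__main__":
    # Hypothetical focus score that peaks at focal position 3.2.
    toy_measure = lambda pos: -(pos - 3.2) ** 2
    print(golden_section_max(toy_measure, 0.0, 10.0))
```

Each iteration reuses one of the two previous evaluations, so only one new focus measurement is needed per step, which matters when every measurement requires moving the lens.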

preprint 2022 arXiv

Backpropagation through Time and Space: Learning Numerical Methods with Multi-Agent Reinforcement Learning

We introduce Backpropagation Through Time and Space (BPTTS), a method for training a recurrent spatio-temporal neural network, that is used in a homogeneous multi-agent reinforcement learning (MARL) setting to learn numerical methods for hyperbolic conservation laws. We treat the numerical schemes underlying partial differential equations (PDEs) as a Partially Observable Markov Game (POMG) in Reinforcement Learning (RL). Similar to numerical solvers, our agent acts at each discrete location of a computational space for efficient and generalizable learning. To learn higher-order spatial methods by acting on local states, the agent must discern how its actions at a given spatiotemporal location affect the future evolution of the state. The manifestation of this non-stationarity is addressed by BPTTS, which allows for the flow of gradients across both space and time. The learned numerical policies are comparable to the SOTA numerics in two settings, the Burgers' Equation and the Euler Equations, and generalize well to other simulation set-ups.

preprint 2022 arXiv

BSRT: Improving Burst Super-Resolution with Swin Transformer and Flow-Guided Deformable Alignment

This work addresses the Burst Super-Resolution (BurstSR) task, which requires restoring a high-quality image from a sequence of noisy, misaligned, and low-resolution RAW bursts, using a new architecture. To overcome the challenges in BurstSR, we propose a Burst Super-Resolution Transformer (BSRT), which can significantly improve the capability of extracting inter-frame information and reconstruction. To achieve this goal, we propose a Pyramid Flow-Guided Deformable Convolution Network (Pyramid FG-DCN) and incorporate Swin Transformer Blocks and Groups as our main backbone. More specifically, we combine optical flows and deformable convolutions, so that our BSRT can handle misalignment and aggregate the potential texture information in multi-frames more efficiently. In addition, our Transformer-based structure can capture long-range dependency to further improve the performance. The evaluation on both synthetic and real-world tracks demonstrates that our approach achieves a new state-of-the-art in the BurstSR task. Further, our BSRT wins the championship in the NTIRE2022 Burst Super-Resolution Challenge.

preprint 2022 arXiv

Deep Constrained Least Squares for Blind Image Super-Resolution

In this paper, we tackle the problem of blind image super-resolution (SR) with a reformulated degradation model and two novel modules. Following the common practices of blind SR, our method proposes to improve both the kernel estimation and the kernel-based high-resolution image restoration. To be more specific, we first reformulate the degradation model such that the deblurring kernel estimation can be transferred into the low-resolution space. On top of this, we introduce a dynamic deep linear filter module. Instead of learning a fixed kernel for all images, it can adaptively generate deblurring kernel weights conditional on the input and yield a more robust kernel estimation. Subsequently, a deep constrained least square filtering module is applied to generate clean features based on the reformulation and estimated kernel. The deblurred feature and the low input image feature are then fed into a dual-path structured SR network to restore the final high-resolution result. To evaluate our method, we conduct evaluations on several benchmarks, including Gaussian8 and DIV2KRK. Our experiments demonstrate that the proposed method achieves better accuracy and visual improvements against state-of-the-art methods.

preprint 2022 arXiv

Detecting tiny objects in aerial images: A normalized Wasserstein distance and a new benchmark

Tiny object detection (TOD) in aerial images is challenging since a tiny object only contains a few pixels. State-of-the-art object detectors do not provide satisfactory results on tiny objects due to the lack of supervision from discriminative features. Our key observation is that the Intersection over Union (IoU) metric and its extensions are very sensitive to the location deviation of the tiny objects, which drastically deteriorates the quality of label assignment when used in anchor-based detectors. To tackle this problem, we propose a new evaluation metric dubbed Normalized Wasserstein Distance (NWD) and a new RanKing-based Assigning (RKA) strategy for tiny object detection. The proposed NWD-RKA strategy can be easily embedded into all kinds of anchor-based detectors to replace the standard IoU threshold-based one, significantly improving label assignment and providing sufficient supervision information for network training. Tested on four datasets, NWD-RKA can consistently improve tiny object detection performance by a large margin. Besides, observing prominent noisy labels in the Tiny Object Detection in Aerial Images (AI-TOD) dataset, we are motivated to meticulously relabel it and release AI-TOD-v2 and its corresponding benchmark. In AI-TOD-v2, the missing annotation and location error problems are considerably mitigated, facilitating more reliable training and validation processes. Embedding NWD-RKA into DetectoRS, the detection performance achieves 4.3 AP points improvement over state-of-the-art competitors on AI-TOD-v2. Datasets, codes, and more visualizations are available at: https://chasel-tsui.github.io/AI-TOD-v2/

preprint 2022 arXiv

Documentation based Semantic-Aware Log Parsing

With the recent advances of deep learning techniques, there is rapidly growing interest in applying machine learning to log data. As a fundamental part of log analytics, accurate log parsing that transforms raw logs to structured events is critical for subsequent machine learning and data mining tasks. Previous approaches either analyze the source code for parsing or are data-driven such as text clustering. They largely neglect to exploit another widely available and valuable resource, software documentation that provides detailed explanations for the messages, to improve accuracy. In this paper, we propose an approach and system framework to use documentation knowledge for log parsing. With parameter value identification, it can improve the parsing accuracy not only for documented messages but also for undocumented messages. In addition, it can discover the linkages between event templates that are established by sharing parameters and indicate the correlation of the event context.

preprint 2022 arXiv

Enabling arbitrary translation objectives with Adaptive Tree Search

We introduce an adaptive tree search algorithm that can find high-scoring outputs under translation models that make no assumptions about the form or structure of the search objective. This algorithm, a deterministic variant of Monte Carlo tree search, enables the exploration of new kinds of models that are unencumbered by constraints imposed to make decoding tractable, such as autoregressivity or conditional independence assumptions. When applied to autoregressive models, our algorithm has different biases than beam search has, which enables a new analysis of the role of decoding bias in autoregressive models. Empirically, we show that our adaptive tree search algorithm finds outputs with substantially better model scores compared to beam search in autoregressive models, and compared to reranking techniques in models whose scores do not decompose additively with respect to the words in the output. We also characterise the correlation of several translation model objectives with respect to BLEU. We find that while some standard models are poorly calibrated and benefit from the beam search bias, other often more robust models (autoregressive models tuned to maximize expected automatic metric scores, the noisy channel model and a newly proposed objective) benefit from increasing amounts of search using our proposed decoder, whereas the beam search bias limits the improvements obtained from such objectives. Thus, we argue that as models improve, the improvements may be masked by over-reliance on beam search or reranking based methods.

preprint 2022 arXiv

Fast Nearest Convolution for Real-Time Efficient Image Super-Resolution

Deep learning-based single image super-resolution (SISR) approaches have drawn much attention and achieved remarkable success on modern advanced GPUs. However, most state-of-the-art methods require a huge number of parameters, memories, and computational resources, and usually show inferior inference times when applied to current mobile device CPUs/NPUs. In this paper, we propose a simple plain convolution network with a fast nearest convolution module (NCNet), which is NPU-friendly and can perform a reliable super-resolution in real-time. The proposed nearest convolution has the same performance as the nearest upsampling but is much faster and more suitable for Android NNAPI. Our model can be easily deployed on mobile devices with 8-bit quantization and is fully compatible with all major mobile AI accelerators. Moreover, we conduct comprehensive experiments on different tensor operations on a mobile device to illustrate the efficiency of our network architecture. Our NCNet is trained and validated on the DIV2K 3x dataset, and the comparison with other efficient SR methods demonstrates that the NCNet can achieve high fidelity SR results with shorter inference times. Our codes and pretrained models are publicly available at https://github.com/Algolzw/NCNet.

preprint 2022 arXiv

First implementation of full-workflow automation in radiotherapy: the All-in-One solution on rectal cancer

The aim of this work is to describe the technical characteristics of an AI-powered radiotherapy workflow that enables full-process automation (All-in-One), evaluate its performance implemented for on-couch initial treatment of rectal cancer, and provide insight into the behavior of full-workflow automation in the specialty of radiotherapy. The All-in-One workflow was developed based on a CT-integrated linear accelerator. It incorporates routine radiotherapy procedures from simulation, autosegmentation, autoplanning, image guidance, beam delivery, and in vivo quality assurance (QA) into one scheme, with critical decision points involved, while the patient is on the treatment couch during the whole process. For the enrolled ten patients with rectal cancer, minor modifications of the autosegmented target volumes were required, and the Dice similarity coefficient and 95% Hausdorff distance before and after modifications were 0.892 ± 0.061 and 18.2 ± 13.0 mm, respectively. The autosegmented normal tissues and automatic plans were clinically acceptable without any modifications or reoptimization. The pretreatment IGRT corrections were within 2 mm in all directions, and the EPID-based in vivo QA showed a γ passing rate better than 97% (3%/3 mm/10% threshold). The duration of the whole process was 23.2 ± 3.5 minutes, depending mostly on the time required for manual modification and plan evaluation. The All-in-One workflow enables full automation of the entire radiotherapy process by seamlessly integrating multiple routine procedures. The one-stop solution shortens the time scale it takes to ready the first treatment from days to minutes, significantly improving the patient experience and the efficiency of the workflow, and shows potential to facilitate the clinical application of online adaptive replanning.

preprint 2022 arXiv

GLF-CR: SAR-Enhanced Cloud Removal with Global-Local Fusion

The challenge of the cloud removal task can be alleviated with the aid of Synthetic Aperture Radar (SAR) images that can penetrate cloud cover. However, the large domain gap between optical and SAR images as well as the severe speckle noise of SAR images may cause significant interference in SAR-based cloud removal, resulting in performance degeneration. In this paper, we propose a novel global-local fusion based cloud removal (GLF-CR) algorithm to leverage the complementary information embedded in SAR images. Exploiting the power of SAR information to promote cloud removal entails two aspects. The first, global fusion, guides the relationship among all local optical windows to keep the structure of the recovered region consistent with the remaining cloud-free regions. The second, local fusion, transfers complementary information embedded in the SAR image that corresponds to cloudy areas to generate reliable texture details of the missing regions, and uses dynamic filtering to alleviate the performance degradation caused by speckle noise. Extensive evaluation demonstrates that the proposed algorithm can yield high quality cloud-free images and outperform state-of-the-art cloud removal algorithms with a gain of about 1.7 dB in terms of PSNR on the SEN12MS-CR dataset.

preprint 2022 arXiv

Learning to Extract Building Footprints from Off-Nadir Aerial Images

Extracting building footprints from aerial images is essential for precise urban mapping with photogrammetric computer vision technologies. Existing approaches mainly assume that the roof and footprint of a building are well overlapped, which may not hold in off-nadir aerial images as there is often a big offset between them. In this paper, we propose an offset vector learning scheme, which turns the building footprint extraction problem in off-nadir images into an instance-level joint prediction problem of the building roof and its corresponding "roof to footprint" offset vector. Thus the footprint can be estimated by translating the predicted roof mask according to the predicted offset vector. We further propose a simple but effective feature-level offset augmentation module, which can significantly refine the offset vector prediction by introducing little extra cost. Moreover, a new dataset, Buildings in Off-Nadir Aerial Images (BONAI), is created and released in this paper. It contains 268,958 building instances across 3,300 aerial images with fully annotated instance-level roof, footprint, and corresponding offset vector for each building. Experiments on the BONAI dataset demonstrate that our method achieves the state-of-the-art, outperforming other competitors by 3.37 to 7.39 points in F1-score. The codes, datasets, and trained models are available at https://github.com/jwwangchn/BONAI.git.

preprint 2022 arXiv

Linear change and minutes variability of solar wind velocity revealed by FAST

Observation of Interplanetary Scintillation (IPS) provides an important and effective way to study the solar wind and the space weather. A series of IPS observations were conducted by the Five-hundred-meter Aperture Spherical radio Telescope (FAST). The extraordinary sensitivity and the wide frequency coverage make FAST an ideal platform for IPS studies. In this paper we present some first scientific results from FAST observations of IPS with the L-band receiver. Based on the solar wind velocity fitting values of FAST observations on September 26-28, 2020, we found that the velocity decreases linearly with increasing frequency, which has not yet been reported in the literature. We have also detected a variation of solar wind velocity on a timescale of 3-5 minutes, which implies a slow change of the background solar wind, a co-existence of high- and low-speed streams, or a reflection of the quasi-periodic electron-density fluctuations.

preprint 2022 arXiv

MAD for Robust Reinforcement Learning in Machine Translation

We introduce a new distributed policy gradient algorithm and show that it outperforms existing reward-aware training procedures such as REINFORCE, minimum risk training (MRT) and proximal policy optimization (PPO) in terms of training stability and generalization performance when optimizing machine translation models. Our algorithm, which we call MAD (on account of using the mean absolute deviation in the importance weighting calculation), has distributed data generators sampling multiple candidates per source sentence on worker nodes, while a central learner updates the policy. MAD depends crucially on two variance reduction strategies: (1) a conditional reward normalization method that ensures each source sentence has both positive and negative reward translation examples and (2) a new robust importance weighting scheme that acts as a conditional entropy regularizer. Experiments on a variety of translation tasks show that policies learned using the MAD algorithm perform very well when using both greedy decoding and beam search, and that the learned policies are sensitive to the specific reward used during training.

preprint 2022 arXiv

Noun2Verb: Probabilistic frame semantics for word class conversion

Humans can flexibly extend word usages across different grammatical classes, a phenomenon known as word class conversion. Noun-to-verb conversion, or denominal verb (e.g., to Google a cheap flight), is one of the most prevalent forms of word class conversion. However, existing natural language processing systems are impoverished in interpreting and generating novel denominal verb usages. Previous work has suggested that novel denominal verb usages are comprehensible if the listener can compute the intended meaning based on shared knowledge with the speaker. Here we explore a computational formalism for this proposal couched in frame semantics. We present a formal framework, Noun2Verb, that simulates the production and comprehension of novel denominal verb usages by modeling shared knowledge of speaker and listener in semantic frames. We evaluate an incremental set of probabilistic models that learn to interpret and generate novel denominal verb usages via paraphrasing. We show that a model where the speaker and listener cooperatively learn the joint distribution over semantic frame elements better explains the empirical denominal verb usages than state-of-the-art language models, evaluated against data from 1) contemporary English in both adult and child speech, 2) contemporary Mandarin Chinese, and 3) the historical development of English. Our work grounds word class conversion in probabilistic frame semantics and bridges the gap between natural language processing systems and humans in lexical creativity.

preprint 2022 arXiv

NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

This paper reviews the NTIRE 2022 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The task of the challenge was to super-resolve an input image with a magnification factor of ×4 based on pairs of low and corresponding high resolution images. The aim was to design a network for single image super-resolution that achieved improvement of efficiency measured according to several metrics including runtime, parameters, FLOPs, activations, and memory consumption while at least maintaining the PSNR of 29.00 dB on the DIV2K validation set. IMDN is set as the baseline for efficiency measurement. The challenge had 3 tracks including the main track (runtime), sub-track one (model complexity), and sub-track two (overall performance). In the main track, the practical runtime performance of the submissions was evaluated. The ranking of the teams was determined directly by the absolute value of the average runtime on the validation set and test set. In sub-track one, the number of parameters and FLOPs were considered, and the individual rankings of the two metrics were summed up to determine a final ranking in this track. In sub-track two, all of the five metrics mentioned in the description of the challenge including runtime, parameter count, FLOPs, activations, and memory consumption were considered. Similar to sub-track one, the rankings of five metrics were summed up to determine a final ranking. The challenge had 303 registered participants, and 43 teams made valid submissions. These submissions gauge the state-of-the-art in efficient single image super-resolution.

preprint 2022 arXiv

On the Rate-Distortion-Perception Function

Rate-distortion-perception theory generalizes Shannon's rate-distortion theory by introducing a constraint on the perceptual quality of the output. The perception constraint complements the conventional distortion constraint and aims to enforce distribution-level consistencies. In this new theory, the information-theoretic limit is characterized by the rate-distortion-perception function. Although a coding theorem for the rate-distortion-perception function has recently been established, the fundamental nature of the optimal coding schemes remains unclear, especially regarding the role of randomness in encoding and decoding. It is shown in the present work that except for certain extreme cases, the rate-distortion-perception function is achievable by deterministic codes. This paper also clarifies the subtle differences between two notions of perfect perceptual quality and explores some alternative formulations of the perception constraint.

preprint 2022 arXiv

QPanda: high-performance quantum computing framework for multiple application scenarios

With the birth of Noisy Intermediate Scale Quantum (NISQ) devices and the verification of "quantum supremacy" in random number sampling and boson sampling, more and more fields hope to use quantum computers to solve specific problems, such as aerodynamic design, route allocation, financial option prediction, quantum chemical simulation to find new materials, and the challenge of quantum cryptography to automotive industry security. However, these fields still need to constantly explore quantum algorithms that adapt to current NISQ machines, so a quantum programming framework that can serve multiple scenarios and application needs is required. Therefore, this paper proposes QPanda, an application scenario-oriented quantum programming framework with high-performance simulation. Example uses include designing quantum chemical simulation algorithms to explore new materials and building a quantum machine learning framework to serve finance. This framework implements high-performance simulation of quantum circuits, a configuration of the fusion processing backend of quantum computers and supercomputers, and compilation and optimization methods of quantum programs for NISQ machines. Finally, the experiments show that quantum jobs can be executed with high fidelity on the quantum processor using the quantum circuit compilation and optimization interface, and that the framework has better simulation performance.

preprint 2022 arXiv

Sequential Channel Synthesis

The channel synthesis problem has been widely investigated over the last decade. In this paper, we consider the sequential version in which the encoder and the decoder work in a sequential way. Under a mild assumption on the target joint distribution we provide a complete (single-letter) characterization of the solution for the point-to-point case, which shows that the canonical symbol-by-symbol mapping is not optimal in general, but is indeed optimal if we make some additional assumptions on the encoder and decoder. We also extend this result to the broadcast scenario and the interactive communication scenario. We provide bounds in the broadcast setting and a complete characterization of the solution under a mild condition on the target joint distribution in the interactive communication case. Our proofs are based on a Rényi entropy method.

preprint 2022 arXiv

Unifying Motion Deblurring and Frame Interpolation with Events

Slow shutter speed and long exposure time of frame-based cameras often cause visual blur and loss of inter-frame information, degenerating the overall quality of captured videos. To this end, we present a unified framework of event-based motion deblurring and frame interpolation for blurry video enhancement, where the extremely low latency of events is leveraged to alleviate motion blur and facilitate intermediate frame prediction. Specifically, the mapping relation between blurry frames and sharp latent images is first predicted by a learnable double integral network, and a fusion network is then proposed to refine the coarse results via utilizing the information from consecutive blurry inputs and the concurrent events. By exploring the mutual constraints among blurry frames, latent images, and event streams, we further propose a self-supervised learning framework to enable network training with real-world blurry videos and events. Extensive experiments demonstrate that our method compares favorably against the state-of-the-art approaches and achieves remarkable performance on both synthetic and real-world datasets.

preprint 2022 arXiv

Weak solutions to an initial-boundary value problem for a continuum equation of motion of grain boundaries

We investigate an initial-(periodic-)boundary value problem for a continuum equation, which is a model for motion of grain boundaries based on the underlying microscopic mechanisms of line defects (disconnections) and integrating the effects of a diverse range of thermodynamic driving forces. We first prove the global-in-time existence and uniqueness of a weak solution to this initial-boundary value problem in the case of a positive equilibrium disconnection density parameter B, and then investigate the asymptotic behavior of the solutions as B goes to zero. The main difficulties in the proof of the main theorems are due to the degeneracy at B=0, a non-local term with singularity, and a non-smooth coefficient of the highest derivative associated with the gradient of the unknown. The key ingredients in the proof are the energy method, an estimate for a singular integral of the Hilbert type, and a compactness lemma.

preprint2021arXiv

On Non-Interactive Simulation of Binary Random Variables

We leverage proof techniques from Fourier analysis and an existing result in coding theory to derive new bounds for the problem of non-interactive simulation of binary random variables. Previous bounds in the literature were derived by applying data processing inequalities concerning maximal correlation or hypercontractivity. We show that our bounds are sharp in some regimes. For a specific instance of problem parameters, our main result answers an open problem posed by E. Mossel in 2017. As by-products of our analyses, various new properties of the average distance and distance enumerator of binary block codes are established.

preprint2020arXiv

Better Document-Level Machine Translation with Bayes' Rule

We show that Bayes' rule provides an effective mechanism for creating document translation models that can be learned from only parallel sentences and monolingual documents, a compelling benefit as parallel documents are not always available. In our formulation, the posterior probability of a candidate translation is the product of the unconditional (prior) probability of the candidate output document and the "reverse translation probability" of translating the candidate output back into the source language. Our proposed model uses a powerful autoregressive language model as the prior on target language documents, but it assumes that each sentence is translated independently from the target to the source language. Crucially, at test time, when a source document is observed, the document language model prior induces dependencies between the translations of the source sentences in the posterior. The model's independence assumption not only enables efficient use of available data, but it additionally admits a practical left-to-right beam-search algorithm for carrying out inference. Experiments show that our model benefits from using cross-sentence context in the language model, and it outperforms existing document translation approaches.
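The factorization described in this abstract can be sketched as a scoring function. This is a minimal illustration under stated assumptions, not the authors' implementation: `doc_lm_logprob` and `reverse_logprob` are hypothetical stand-ins for a target-side document language model and a sentence-level target-to-source translation model, and the toy probabilities below are invented for the example.

```python
import math
from typing import Callable, Sequence


def posterior_score(
    source_sents: Sequence[str],
    candidate_sents: Sequence[str],
    doc_lm_logprob: Callable[[Sequence[str]], float],
    reverse_logprob: Callable[[str, str], float],
) -> float:
    """Unnormalized log posterior: log p(y) + sum_i log p(x_i | y_i)."""
    if len(source_sents) != len(candidate_sents):
        raise ValueError("expected one candidate sentence per source sentence")
    # Prior: the whole candidate document is scored at once, which is
    # where cross-sentence context enters.
    prior = doc_lm_logprob(candidate_sents)
    # Reverse translation term: each sentence pair is scored independently.
    channel = sum(
        reverse_logprob(x, y) for x, y in zip(source_sents, candidate_sents)
    )
    return prior + channel


# Toy models with made-up probabilities (hypothetical, for illustration only).
def toy_doc_lm(doc: Sequence[str]) -> float:
    # Prefers documents that render a repeated term consistently.
    return math.log(0.6) if len(set(doc)) == 1 else math.log(0.2)


def toy_reverse(x: str, y: str) -> float:
    # Both renderings explain the source sentence equally well.
    return math.log(0.5)


source = ["s1", "s1"]
consistent = ["bank", "bank"]
inconsistent = ["bank", "shore"]
best = max(
    [consistent, inconsistent],
    key=lambda cand: posterior_score(source, cand, toy_doc_lm, toy_reverse),
)
```

In this toy setting the reverse term cannot separate the two candidates, so the document-level prior alone decides the ranking. A real system would search over candidates with beam search rather than enumerate them.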

preprint2020arXiv

Capturing document context inside sentence-level neural machine translation models with self-training

Neural machine translation (NMT) has arguably achieved human-level parity when trained and evaluated at the sentence level. Document-level neural machine translation has received less attention and lags behind its sentence-level counterpart. The majority of the proposed document-level approaches investigate ways of conditioning the model on several source or target sentences to capture document context. These approaches require training a specialized NMT model from scratch on parallel document-level corpora. We propose an approach that doesn't require training a specialized model on parallel document-level corpora and is applied to a trained sentence-level NMT model at decoding time. We process the document from left to right multiple times and self-train the sentence-level model on pairs of source sentences and generated translations. Our approach reinforces the choices made by the model, thus making it more likely that the same choices will be made in other sentences in the document. We evaluate our approach on three document-level datasets: NIST Chinese-English, WMT'19 Chinese-English and OpenSubtitles English-Russian. We demonstrate that our approach achieves a higher BLEU score and higher human preference than the baseline. Qualitative analysis of our approach shows that the choices made by the model are consistent across the document.

preprint2020arXiv

Corrections to "Wyner's Common Information under Rényi Divergence Measures"

In this correspondence, we correct an erroneous result on the achievability part of the Rényi common information with order $1+s\in(1,2]$ in [1]. The new achievability result (upper bound) of the Rényi common information no longer coincides with Wyner's common information. We also provide a new converse result (lower bound) in this correspondence for the Rényi common information with order $1+s\in(1,\infty]$. Numerical results show that for doubly symmetric binary sources, the new upper and lower bounds coincide for the order $1+s\in(1,2]$ and they are both strictly larger than Wyner's common information for this case.

preprint2020arXiv

Event Enhanced High-Quality Image Recovery

With extremely high temporal resolution, event cameras have a large potential for robotics and computer vision. However, their asynchronous imaging mechanism often aggravates the measurement sensitivity to noise and makes it physically difficult to increase the spatial resolution of the image. To recover high-quality intensity images, one should address both denoising and super-resolution problems for event cameras. Since events depict brightness changes, a degeneration model enhanced by the events allows clear and sharp high-resolution latent images to be recovered from the noisy, blurry and low-resolution intensity observations. Exploiting the framework of sparse learning, the events and the low-resolution intensity observations can be jointly considered. Based on this, we propose an explainable network, an event-enhanced sparse learning network (eSL-Net), to recover high-quality images from event cameras. After training with a synthetic dataset, the proposed eSL-Net improves on the state of the art by 7-12 dB. Furthermore, without any additional training, the proposed eSL-Net can be easily extended to generate continuous frames with a frame rate as high as that of the events.

preprint2020arXiv

Exact minimum codegree thresholds for $K_4^-$-covering and $K_5^-$-covering

Given two $3$-graphs $F$ and $H$, an $F$-covering of $H$ is a collection of copies of $F$ in $H$ such that each vertex of $H$ is contained in at least one of the copies. Let $c_2(n,F)$ be the maximum integer $t$ such that every $3$-graph with minimum codegree greater than $t$ has an $F$-covering. In this note, we answer an open problem of Falgas-Ravry and Zhao (SIAM J. Discrete Math., 2016) by determining the exact values of $c_2(n, K_4^-)$ and $c_2(n, K_5^-)$, where $K_t^-$ is the complete $3$-graph on $t$ vertices with one edge removed.

preprint2020arXiv

Implicit Euler ODE Networks for Single-Image Dehazing

Deep convolutional neural networks (CNNs) have been applied to image dehazing tasks, where the residual network (ResNet) is often adopted as the basic component to avoid the vanishing gradient problem. Recently, many works have indicated that the ResNet can be considered as the explicit Euler forward approximation of an ordinary differential equation (ODE). In this paper, we extend the explicit forward approximation to the implicit backward counterpart, which can be realized via a recursive neural network, named IM-block. Given that, we propose an efficient end-to-end multi-level implicit network (MI-Net) for the single image dehazing problem. Moreover, a multi-level fusing (MLF) mechanism and a residual channel attention block (RCA-block) are adopted to boost the performance of our network. Experiments on several dehazing benchmark datasets demonstrate that our method outperforms existing methods and achieves state-of-the-art performance.
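The distinction between the explicit (forward) and implicit (backward) Euler updates mentioned above can be illustrated on a scalar ODE. The sketch below is not the paper's IM-block: it assumes a plain fixed-point iteration as the way to solve the implicit update, and the function names, step size and test dynamics are hypothetical choices for illustration.

```python
from typing import Callable


def explicit_euler_step(f: Callable[[float], float], x: float, h: float) -> float:
    """Forward Euler: x_next = x + h * f(x), the residual-block form."""
    return x + h * f(x)


def implicit_euler_step(
    f: Callable[[float], float], x: float, h: float, iters: int = 50
) -> float:
    """Backward Euler: solve z = x + h * f(z) by fixed-point iteration.

    The iteration converges when h times the Lipschitz constant of f
    is below 1, which holds for the toy example below.
    """
    z = x  # initial guess: the current state
    for _ in range(iters):
        z = x + h * f(z)  # the update refers to the unknown next state
    return z


def decay(x: float) -> float:
    """Toy dynamics dx/dt = -x."""
    return -x


x_explicit = explicit_euler_step(decay, 1.0, 0.5)  # 1 - 0.5 = 0.5
x_implicit = implicit_euler_step(decay, 1.0, 0.5)  # approaches 1 / 1.5 = 2/3
```

For this linear example the implicit update has the closed form x / (1 + h), so the iterated result can be checked directly. Repeatedly applying the same map to refine the next state is what gives the implicit scheme its recursive structure.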

preprint2020arXiv

Improving Spiking Sparse Recovery via Non-Convex Penalties

Compared with digital methods, sparse recovery based on spiking neural networks has great advantages such as high computational efficiency and low power consumption. However, current spiking algorithms cannot guarantee more accurate estimates since they are usually designed to solve the classical optimization problem with convex penalties, especially the $\ell_{1}$-norm. In fact, convex penalties are observed to underestimate the true solution in practice, while non-convex ones can avoid the underestimation. Inspired by this, we propose an adaptive version of the spiking sparse recovery algorithm to solve the non-convex regularized optimization problem, and provide an analysis of its global asymptotic convergence. Experiments show that the accuracy is greatly improved under different adaptive schemes.

preprint2020arXiv

Intra-Ensemble in Neural Networks

Improving model performance is always a key problem in machine learning, including deep learning. However, stand-alone neural networks suffer from diminishing marginal gains when more layers are stacked. At the same time, ensembling is a useful technique to further enhance model performance. Nevertheless, training several independent deep neural networks for an ensemble multiplies the resource cost. Is it then possible to use ensembling within a single neural network? In this work, we propose Intra-Ensemble, an end-to-end ensemble strategy with stochastic channel recombination operations to train several sub-networks simultaneously within one neural network. The additional parameter size is marginal since the majority of parameters are mutually shared. Meanwhile, stochastic channel recombination significantly increases the diversity of sub-networks, which in turn enhances ensemble performance. Extensive experiments and ablation studies demonstrate the applicability of intra-ensemble to various kinds of datasets and network architectures.

preprint2020arXiv

Matching Neuromorphic Events and Color Images via Adversarial Learning

The event camera has appealing properties: high dynamic range, low latency, low power consumption and low memory usage, and thus complements conventional frame-based cameras. It only captures the dynamics of a scene and is able to capture almost "continuous" motion. However, unlike a frame-based camera, which records the whole appearance of a scene, the event camera discards the detailed characteristics of objects, such as texture and color. To take advantage of both modalities, the event camera and the frame-based camera are combined for various machine vision tasks. Cross-modal matching between neuromorphic events and color images then plays a vital role. In this paper, we propose the Event-Based Image Retrieval (EBIR) problem to explore the cross-modal matching task. Given an event stream depicting a particular object as the query, the aim is to retrieve color images containing the same object. This problem is challenging because there exists a large modality gap between neuromorphic events and color images. We address the EBIR problem by proposing neuromorphic Events-Color image Feature Learning (ECFL). In particular, adversarial learning is employed to jointly model neuromorphic events and color images in a common embedding space. We also contribute the N-UKbench and EC180 datasets to the community to promote the development of the EBIR problem. Extensive experiments on our datasets show that the proposed method is superior in learning an effective modality-invariant representation to link the two modalities.

preprint2020arXiv

Mixed Noise Removal with Pareto Prior

Denoising images contaminated by a mixture of additive white Gaussian noise (AWGN) and impulse noise (IN) is an essential but challenging problem. The presence of impulsive disturbances inevitably affects the noise distribution and thus largely degrades the performance of traditional AWGN denoisers. Existing methods aim to compensate for the effects of IN by introducing a weighting matrix, which, however, lacks a proper prior and is thus hard to estimate accurately. To address this problem, we exploit the Pareto distribution as the prior of the weighting matrix, based on which an accurate and robust weight estimator is proposed for mixed noise removal. In particular, a relatively small portion of pixels is assumed to be contaminated with IN; these pixels should receive small weights and thus be penalized out. This phenomenon can be properly described by the Pareto distribution of type 1. Therefore, armed with the Pareto distribution, we formulate the problem of mixed noise removal in the Bayesian framework, where a nonlocal self-similarity prior is further exploited by adopting nonlocal low-rank approximation. Compared to existing methods, the proposed method can estimate the weighting matrix adaptively, accurately, and robustly for different noise levels, and can thus boost the denoising performance. Experimental results on widely used image datasets demonstrate the superiority of our proposed method over state-of-the-art methods.

preprint2020arXiv

On Exact and $\infty$-Rényi Common Informations

Recently, two extensions of Wyner's common information, the exact and Rényi common informations, were introduced by Kumar, Li, and El Gamal (KLE) and by the present authors, respectively. The class of common information problems involves determining the minimum rate of the common input to two independent processors needed to exactly or approximately generate a target joint distribution. For the exact common information problem, exact generation of the target distribution is required, while for Wyner's and $α$-Rényi common informations, the relative entropy and Rényi divergence with order $α$ were respectively used to quantify the discrepancy between the synthesized and target distributions. The exact common information is larger than or equal to Wyner's common information. However, it was hitherto unknown whether the former is strictly larger than the latter for some joint distributions. In this paper, we first establish the equivalence between the exact and $\infty$-Rényi common informations, and then provide single-letter upper and lower bounds for these two quantities. For doubly symmetric binary sources, we show that the upper and lower bounds coincide, which implies that for such sources, the exact and $\infty$-Rényi common informations are completely characterized. Interestingly, we observe that for such sources, these two common informations are strictly larger than Wyner's. This answers an open problem posed by KLE. Furthermore, we extend Wyner's, $\infty$-Rényi, and exact common informations to sources with countably infinite or continuous alphabets, including Gaussian sources.

preprint2020arXiv

Robust Time-Frequency Reconstruction by Learning Structured Sparsity

Time-frequency distributions (TFDs) play a vital role in providing descriptive analysis of non-stationary signals involved in realistic scenarios. It is well known that low time-frequency (TF) resolution and the emergence of cross-terms (CTs) are two main issues, which make it difficult to analyze and interpret practical signals using TFDs. In order to address these issues, we propose the U-Net aided iterative shrinkage-thresholding algorithm (U-ISTA) for reconstructing a near-ideal TFD by exploiting structured sparsity in the signal TF domain. Specifically, the signal ambiguity function is first compressed, followed by unfolding the ISTA as a recurrent neural network. To account for the continuously distributed characteristics of signals, a structured sparsity constraint is incorporated into the unfolded ISTA by regarding the U-Net as an adaptive threshold block, in which structure-aware thresholds are learned from a large amount of training data to exploit the underlying dependencies among neighboring TF coefficients. The proposed U-ISTA model is trained on both non-overlapped and overlapped synthetic signals, including closely and far located non-stationary components. Experimental results demonstrate that the robust U-ISTA achieves superior performance compared with state-of-the-art algorithms, and gains a high TF resolution with CTs greatly eliminated even in low signal-to-noise ratio (SNR) environments.
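For readers unfamiliar with the iterative shrinkage-thresholding algorithm that the abstract builds on, here is a minimal sketch of plain ISTA with a fixed scalar threshold. It is not U-ISTA: the learned, structure-aware U-Net threshold block is replaced by ordinary soft thresholding, and the matrix `A`, vector `b`, and parameters `lam` and `step` are hypothetical toy values.

```python
from typing import List


def soft_threshold(v: float, t: float) -> float:
    """Shrink v toward zero by t; values within [-t, t] become 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(
    A: List[List[float]],
    b: List[float],
    lam: float,
    step: float,
    iters: int = 200,
) -> List[float]:
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 with plain ISTA.

    `step` should not exceed 1 / L, where L is the largest eigenvalue
    of A^T A, for the iteration to converge.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # Gradient of the quadratic term: A^T (A x - b).
        residual = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by the shrinkage (thresholding) step.
        x = [soft_threshold(x[j] - step * grad[j], step * lam) for j in range(n)]
    return x


# Toy problem: with A equal to the identity, the minimizer is the
# soft-thresholded observation itself.
A = [[1.0, 0.0], [0.0, 1.0]]
b = [3.0, 0.2]
x_hat = ista(A, b, lam=0.5, step=1.0)
```

In the unfolded setting, each loop iteration becomes one network layer, and the fixed `soft_threshold` call is the place where a learned threshold block would be substituted.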

preprint2020arXiv

Single Image Deraining with Continuous Rain Density Estimation

Single image deraining (SIDR) often suffers from over- or under-deraining due to the nonuniformity of rain densities and the variety of raindrop scales. In this paper, we propose a continuous density guided network (CODE-Net) for SIDR. In particular, it is composed of a rain streak extractor and a denoiser, where convolutional sparse coding (CSC) is exploited to filter out noise from the extracted rain streaks. Inspired by the reweighted iterative soft-threshold for CSC, we address the problem of continuous rain density estimation by learning the weights with channel attention blocks from sparse codes. We further develop a multiscale strategy to depict rain streaks appearing at different scales. Experiments on synthetic and real-world data demonstrate the superiority of our method over recent state-of-the-art methods, in terms of both quantitative and qualitative results. Additionally, instead of quantizing rain density into several levels, our CODE-Net can provide continuous-valued estimates of rain densities, which is more desirable in real applications.

preprint2020arXiv

Structure-Aware Network for Lane Marker Extraction with Dynamic Vision Sensor

Lane marker extraction is a basic yet necessary task for autonomous driving. Although recent years have witnessed major advances in lane marker extraction with deep learning models, these models all target ordinary RGB images generated by frame-based cameras, which limits their performance in extreme cases, such as large illumination changes. To tackle this problem, we apply the Dynamic Vision Sensor (DVS), a type of event-based sensor, to the lane marker extraction task and build a high-resolution DVS dataset for lane marker extraction. We collect the raw event data and generate 5,424 DVS images with a resolution of 1280$\times$800 pixels, the highest among all DVS datasets currently available. All images are annotated in a multi-class semantic segmentation format. We then propose a structure-aware network for lane marker extraction in DVS images. It can capture directional information comprehensively with multidirectional slice convolution. We evaluate our proposed network against other state-of-the-art lane marker extraction models on this dataset. Experimental results demonstrate that our method outperforms its competitors. The dataset is made publicly available, including the raw event data, accumulated images and labels.