Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
61 works
0 followers
35 topics
4 close collaborators

Published work

61 published items

preprint · 2026 · arXiv

A Provably Convergent and Practical Algorithm for Gromov–Wasserstein Optimal Transport

Gromov–Wasserstein optimal transport (GWOT) aligns metric measure spaces by matching their within-domain relational structures, but large-scale GWOT remains challenging because its objective is nonconvex and projection onto the transport polytope is often solved only approximately in practice. This leads to a gap between practical projected-gradient implementations and convergence theory, which typically assumes exact projections. For squared-loss GWOT, we propose an inexact projected-gradient framework with a verifiable feasibility-residual-based inexact condition for the projection subproblem. This condition is directly computable and avoids unknown quantities such as the exact projection point. Under this implementable condition, we prove subsequential convergence to stationary points and, with a mild tolerance-decay condition, convergence of the whole sequence. The resulting method retains the simplicity and sparsity of projected-gradient schemes while providing rigorous convergence guarantees, turning projected-gradient methods into a principled and scalable approach for GWOT with provable reliability.
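To make the two computable quantities in the abstract concrete, here is a minimal Python sketch (plain lists, no solver) of the squared-loss GW objective for a coupling P and of a hypothetical marginal-violation check for an approximate projection onto the transport polytope. The abstract does not give the authors' exact inexactness condition, so `feasibility_residual`, `accept_projection` and the tolerance rule are illustrative assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def gw_squared_loss(C1: Matrix, C2: Matrix, P: Matrix) -> float:
    """Squared-loss GW objective: sum_{i,j,k,l} (C1[i][k] - C2[j][l])^2 * P[i][j] * P[k][l]."""
    n, m = len(C1), len(C2)
    total = 0.0
    for i in range(n):
        for j in range(m):
            if P[i][j] == 0.0:
                continue  # zero mass contributes nothing
            for k in range(n):
                for l in range(m):
                    total += (C1[i][k] - C2[j][l]) ** 2 * P[i][j] * P[k][l]
    return total


def feasibility_residual(P: Matrix, a: List[float], b: List[float]) -> float:
    """Hypothetical residual: ||P 1 - a||_2 + ||P^T 1 - b||_2 (marginal violations)."""
    rows = [sum(r) for r in P]
    cols = [sum(r[j] for r in P) for j in range(len(b))]
    row_err = math.sqrt(sum((x - y) ** 2 for x, y in zip(rows, a)))
    col_err = math.sqrt(sum((x - y) ** 2 for x, y in zip(cols, b)))
    return row_err + col_err


def accept_projection(P: Matrix, a: List[float], b: List[float], tol: float) -> bool:
    """Accept an approximate projection once it is nonnegative and nearly feasible."""
    nonneg = all(x >= 0.0 for r in P for x in r)
    return nonneg and feasibility_residual(P, a, b) <= tol
```

In a projected-gradient loop, `tol` would be decreased over iterations, mirroring the "mild tolerance-decay condition" mentioned in the abstract.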

preprint · 2025 · arXiv

Detection of disk-jet co-precession in a tidal disruption event

Theories and simulations predict that intense spacetime curvature near black holes bends the trajectories of light and matter, driving disk and jet precession under relativistic torques. However, direct observational evidence of disk-jet co-precession remains elusive. Here, we report the most compelling case to date: a tidal disruption event (TDE) exhibiting unprecedented 19.6-day quasi-periodic variations in both X-rays and radio, with X-ray amplitudes exceeding an order of magnitude. The nearly synchronized X-ray and radio variations suggest a shared mechanism regulating the emission regions. We demonstrate that a disk-jet Lense-Thirring precession model successfully reproduces these variations while requiring a low-spin black hole. This study uncovers previously uncharted short-term radio variability in TDEs, highlights the transformative potential of high-cadence radio monitoring, and offers profound insights into disk-jet physics.

preprint · 2025 · arXiv

Holistic Evaluation of Multimodal LLMs on Spatial Intelligence

Multimodal models have achieved remarkable progress in recent years. Nevertheless, they continue to exhibit notable limitations in spatial understanding and reasoning, the very capability that anchors artificial general intelligence in the physical world. With the recent release of GPT-5, allegedly the most powerful AI model to date, it is timely to examine where the leading models (GPT, Gemini, Grok, Seed, Qwen, and Intern) stand on the path toward spatial intelligence (SI). We thus propose EASI for holistic Evaluation of multimodAl LLMs on Spatial Intelligence. EASI conceptualizes a comprehensive taxonomy of spatial tasks that unifies existing benchmarks and a growing collection of newly curated ones, enabling systematic evaluation of state-of-the-art models. In this report, we conduct the study across eight key benchmarks, at a cost exceeding ten billion total tokens. Our empirical study reveals that (1) GPT-5 demonstrates unprecedented strength in SI, yet (2) still falls significantly short of human performance across a broad spectrum of SI tasks. Moreover, we (3) show that SI tasks expose greater model capability deficiencies than non-SI tasks, to the extent that (4) proprietary models do not exhibit a decisive advantage on the most difficult ones. In addition, we conduct a qualitative evaluation across a diverse set of scenarios that are intuitive for humans yet defeat the most advanced multimodal models. EASI is an ongoing community effort: we have open-sourced the EASI codebase, which provides a one-stop, reproducible solution with standardized interfaces, integrated protocols and prompts that significantly reduce the friction of configuring and running multiple benchmarks; we have also launched an accompanying EASI leaderboard that provides a continually updated snapshot of model performance across the full SI spectrum, accelerating collective progress toward robust SI.

preprint · 2024 · arXiv

A Physics-guided Generative AI Toolkit for Geophysical Monitoring

Full-waveform inversion (FWI) plays a vital role in geoscience for exploring the subsurface. It uses seismic waves to image the subsurface velocity map. As machine learning (ML) techniques have evolved, data-driven ML approaches to FWI have emerged, offering higher accuracy and lower computational cost than traditional physics-based methods. However, a common challenge in geoscience, unprivileged data, severely limits ML effectiveness. The issue becomes even worse during model pruning, a step essential in geoscience due to environmental complexities. To tackle this, we introduce the EdGeo toolkit, which employs a diffusion-based model guided by physics principles to generate high-fidelity velocity maps. The toolkit uses the acoustic wave equation to generate the corresponding seismic waveform data, facilitating the fine-tuning of pruned ML models. Our results demonstrate significant improvements in SSIM scores and reductions in both MAE and MSE across various pruning ratios. Notably, the ML model fine-tuned on data generated by EdGeo produces higher-quality velocity maps, especially in representing unprivileged features, outperforming other existing methods.
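The abstract says EdGeo turns velocity maps into seismic data via the acoustic wave equation but does not describe its solver. As an illustration only, here is a textbook 1D second-order finite-difference scheme for u_tt = c(x)^2 u_xx + s(t) with a Ricker-wavelet source. The 1D setting, zero-displacement boundaries, and the names `ricker` and `acoustic_1d` are assumptions, not EdGeo's implementation.

```python
import math
from typing import List


def ricker(t: float, f0: float) -> float:
    """Ricker wavelet with peak frequency f0, delayed by 1/f0 so it starts near zero."""
    a = (math.pi * f0 * (t - 1.0 / f0)) ** 2
    return (1.0 - 2.0 * a) * math.exp(-a)


def acoustic_1d(velocity: List[float], dx: float, dt: float, nt: int,
                src: int, rec: int, f0: float = 10.0) -> List[float]:
    """Leapfrog finite differences for u_tt = c^2 u_xx + s(t); returns the receiver trace."""
    n = len(velocity)
    if max(velocity) * dt / dx > 1.0:
        raise ValueError("CFL condition violated: reduce dt or increase dx")
    prev = [0.0] * n
    curr = [0.0] * n
    trace = []
    for k in range(nt):
        nxt = [0.0] * n  # end points stay zero (fixed boundaries)
        for i in range(1, n - 1):
            lap = (curr[i + 1] - 2.0 * curr[i] + curr[i - 1]) / dx ** 2
            nxt[i] = 2.0 * curr[i] - prev[i] + (velocity[i] * dt) ** 2 * lap
        nxt[src] += dt ** 2 * ricker(k * dt, f0)  # inject the source term
        prev, curr = curr, nxt
        trace.append(curr[rec])
    return trace


if __name__ == "__main__":
    # Homogeneous 1500 m/s medium, 5 m cells, 1 ms steps (c*dt/dx = 0.3).
    trace = acoustic_1d([1500.0] * 201, dx=5.0, dt=0.001, nt=600, src=50, rec=150)
    print(max(abs(v) for v in trace))
```

Because the stencil only reaches one cell per step, the trace is exactly zero until the step count equals the source-receiver distance in cells; the CFL check guards against the instability that appears when c·dt/dx exceeds 1.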

preprint · 2022 · arXiv

A Large-scale Comprehensive Dataset and Copy-overlap Aware Evaluation Protocol for Segment-level Video Copy Detection

In this paper, we introduce VCSL (Video Copy Segment Localization), a new comprehensive segment-level annotated video copy dataset. Compared with existing copy detection datasets, which are restricted to video-level annotation or are small in scale, VCSL not only has two orders of magnitude more segment-level labelled data, with 160k realistic video copy pairs containing more than 280k localized copied segment pairs, but also covers a variety of video categories and a wide range of video durations. All the copied segments inside each collected video pair are manually extracted and accompanied by precisely annotated start and end timestamps. Alongside the dataset, we also propose a novel evaluation protocol that better measures the prediction accuracy of overlapping copied segments between a video pair and shows improved adaptability in different scenarios. By benchmarking several baseline and state-of-the-art segment-level video copy detection methods with the proposed dataset and evaluation metric, we provide a comprehensive analysis that uncovers the strengths and weaknesses of current approaches, hoping to open up promising directions for future work. The VCSL dataset, metric and benchmark code are all publicly available at https://github.com/alipay/VCSL.

preprint · 2022 · arXiv

An efficient implementable inexact entropic proximal point algorithm for a class of linear programming problems

We introduce a class of specially structured linear programming (LP) problems, which have favorable modeling capability for important application problems in different areas such as optimal transport, discrete tomography and economics. To solve these generally large-scale LP problems efficiently, we design an implementable inexact entropic proximal point algorithm (iEPPA) combined with an easy-to-implement dual block coordinate descent method as a subsolver. Unlike existing entropy-type proximal point algorithms, our iEPPA employs a more practically checkable stopping condition for solving the associated subproblems while achieving provable convergence. Moreover, when solving the capacity constrained multi-marginal optimal transport (CMOT) problem (a special case of our LP problem), our iEPPA is able to bypass the numerical instability issues that often appear in the popular entropic regularization approach, since our algorithm does not require the proximal parameter to be very small in order to obtain an accurate approximate solution. Numerous numerical experiments show that our iEPPA is efficient and robust for solving large-scale CMOT problems. The experiments on the discrete tomography problem also highlight the potential modeling power of our model.

preprint · 2022 · arXiv

Automated Architecture Search for Brain-inspired Hyperdimensional Computing

This paper represents the first effort to explore automated architecture search for hyperdimensional computing (HDC), a type of brain-inspired neural network. Currently, HDC design is largely carried out in an application-specific, ad hoc manner, which significantly limits its application. Furthermore, this approach leads to inferior accuracy and efficiency, which suggests that HDC cannot perform competitively against deep neural networks. Herein, we present a thorough study to formulate an HDC architecture search space. On top of this search space, we apply reinforcement learning to automatically explore HDC architectures. The searched HDC architectures show competitive performance in case studies involving a drug discovery dataset and a language recognition task. On the Clintox dataset, where the task is to learn features from developed drugs that passed or failed clinical trials for toxicity reasons, the searched HDC architecture obtains state-of-the-art ROC-AUC scores, 0.80% higher than the manually designed HDC and 9.75% higher than conventional neural networks. Similar results are achieved on the language recognition task, with 1.27% higher performance than conventional methods.
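The abstract assumes familiarity with HDC. Below is a minimal sketch of the standard primitives that an HDC search space is typically built from: random bipolar hypervectors, binding (elementwise product), bundling (elementwise majority), and similarity-based lookup. The dimensionality, the record-based encoder and all names are illustrative assumptions; the searched architectures themselves are not reproduced.

```python
import random
from typing import Dict, List

HV = List[int]
D = 10_000  # hypervector dimensionality (a common HDC choice; assumed here)


def random_hv(rng: random.Random, d: int = D) -> HV:
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(d)]


def bind(x: HV, y: HV) -> HV:
    """Binding: elementwise product, nearly orthogonal to both inputs."""
    return [a * b for a, b in zip(x, y)]


def bundle(vectors: List[HV]) -> HV:
    """Bundling: elementwise majority (ties go to +1), similar to every input."""
    return [1 if sum(col) >= 0 else -1 for col in zip(*vectors)]


def similarity(x: HV, y: HV) -> float:
    """Normalized dot product in [-1, 1]."""
    return sum(a * b for a, b in zip(x, y)) / len(x)


def encode(features: List[int], items: List[HV], levels: Dict[int, HV]) -> HV:
    """Record-based encoding: bind each feature position to its quantized level, then bundle."""
    return bundle([bind(items[i], levels[v]) for i, v in enumerate(features)])


if __name__ == "__main__":
    rng = random.Random(0)
    items = [random_hv(rng) for _ in range(4)]          # one HV per feature position
    levels = {0: random_hv(rng), 1: random_hv(rng)}     # one HV per quantized value
    proto_a = encode([0, 1, 0, 1], items, levels)       # class prototypes
    proto_b = encode([1, 0, 1, 0], items, levels)
    query = encode([0, 1, 0, 0], items, levels)
    print("A" if similarity(query, proto_a) > similarity(query, proto_b) else "B")
```

Choices such as the dimensionality, the encoder and the similarity measure are the kind of design decisions an architecture search would explore.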

preprint · 2022 · arXiv

AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars

3D avatar creation plays a crucial role in the digital age. However, the whole production process is prohibitively time-consuming and labor-intensive. To democratize this technology for a larger audience, we propose AvatarCLIP, a zero-shot text-driven framework for 3D avatar generation and animation. Unlike professional software that requires expert knowledge, AvatarCLIP empowers lay users to customize a 3D avatar with the desired shape and texture, and to drive the avatar with the described motions, using only natural language. Our key insight is to take advantage of the powerful vision-language model CLIP for supervising neural human generation in terms of 3D geometry, texture and animation. Specifically, driven by natural language descriptions, we initialize 3D human geometry generation with a shape VAE network. Based on the generated 3D human shapes, a volume rendering model is utilized to further facilitate geometry sculpting and texture generation. Moreover, by leveraging the priors learned in the motion VAE, a CLIP-guided reference-based motion synthesis method is proposed for the animation of the generated 3D avatar. Extensive qualitative and quantitative experiments validate the effectiveness and generalizability of AvatarCLIP on a wide range of avatars. Remarkably, AvatarCLIP can generate unseen 3D avatars with novel animations, achieving superior zero-shot capability.

preprint · 2022 · arXiv

Bregman Proximal Point Algorithm Revisited: A New Inexact Version and its Inertial Variant

We study a general convex optimization problem, which covers various classic problems in different areas and particularly includes many optimal transport related problems arising in recent years. To solve this problem, we revisit the classic Bregman proximal point algorithm (BPPA) and introduce a new inexact stopping condition for solving the subproblems, which can circumvent the feasibility difficulty often appearing in existing inexact conditions when the problem has a complex feasible set. Our inexact condition also covers several existing inexact conditions as special cases and hence makes our inexact BPPA (iBPPA) more flexible to fit different scenarios in practice. Moreover, inspired by Nesterov's acceleration technique, we develop an inertial variant of our iBPPA, denoted by V-iBPPA, and establish the iteration complexity of $O(1/k^{\lambda})$, where $\lambda \geq 1$ is a quadrangle scaling exponent of the kernel function. In particular, when the proximal parameter is a constant and the kernel function is strongly convex with Lipschitz continuous gradient (hence $\lambda = 2$), our V-iBPPA achieves a faster rate of $O(1/k^2)$, matching existing accelerated inexact proximal point algorithms. Some preliminary numerical experiments for solving the standard OT problem are conducted to show the convergence behaviors of our iBPPA and V-iBPPA under different inexactness settings. The experiments also empirically verify the potential of our V-iBPPA for improving the convergence speed.
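For the standard OT problem mentioned above, choosing the KL divergence as the Bregman kernel makes each BPPA subproblem min over U(a,b) of <C,P> + (1/t) KL(P‖P_k), whose solution has the scaling form diag(u)(P_k ⊙ e^{-tC})diag(v). Below is a minimal sketch of the plain, non-inertial inexact scheme, assuming a fixed number of Sinkhorn-type scaling sweeps as the inner solver. The fixed sweep count stands in for the paper's inexact stopping condition (it is not that condition), and V-iBPPA's inertial step is omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def sinkhorn_scale(K: Matrix, a: List[float], b: List[float], sweeps: int) -> Matrix:
    """A few alternating row/column scalings of K toward marginals (a, b)."""
    n, m = len(a), len(b)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(sweeps):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def kl_proximal_point_ot(C: Matrix, a: List[float], b: List[float],
                         t: float = 1.0, outer: int = 50, inner: int = 5) -> Matrix:
    """Bregman proximal point with the KL kernel; each subproblem is solved inexactly:
    P_{k+1} ~ argmin_{P in U(a,b)} <C, P> + (1/t) KL(P || P_k)."""
    n, m = len(a), len(b)
    P = [[a[i] * b[j] for j in range(m)] for i in range(n)]  # start from the independent coupling
    for _ in range(outer):
        K = [[P[i][j] * math.exp(-t * C[i][j]) for j in range(m)] for i in range(n)]
        P = sinkhorn_scale(K, a, b, inner)
    return P


def transport_cost(C: Matrix, P: Matrix) -> float:
    return sum(c * p for crow, prow in zip(C, P) for c, p in zip(crow, prow))
```

Unlike plain entropic regularization, the proximal parameter t need not be large here: the prox-center P_k moves each outer step, so the entropic bias shrinks as iterations accumulate.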

preprint · 2022 · arXiv

CAMO-MOT: Combined Appearance-Motion Optimization for 3D Multi-Object Tracking with Camera-LiDAR Fusion

3D multi-object tracking (MOT) ensures consistency during continuous dynamic detection, which is conducive to subsequent motion planning and navigation tasks in autonomous driving. However, camera-based methods suffer under occlusion, and LiDAR-based methods can struggle to accurately track the irregular motion of objects. Some fusion methods work well but do not account for the unreliability of appearance features under occlusion. At the same time, false detections also significantly affect tracking. We therefore propose a novel camera-LiDAR fusion 3D MOT framework based on Combined Appearance-Motion Optimization (CAMO-MOT), which uses both camera and LiDAR data and significantly reduces tracking failures caused by occlusion and false detection. For occlusion problems, we are the first to propose an occlusion head that effectively selects the best object appearance features multiple times, reducing the influence of occlusions. To decrease the impact of false detections on tracking, we design a motion cost matrix based on confidence scores, which improves positioning and object prediction accuracy in 3D space. As existing multi-object tracking methods only consider a single category, we also propose a multi-category loss to implement multi-object tracking in multi-category scenes. A series of validation experiments are conducted on the KITTI and nuScenes tracking benchmarks. Our proposed method achieves state-of-the-art performance and the lowest identity switch (IDS) count (23 for Car and 137 for Pedestrian) among all multi-modal MOT methods on the KITTI test set. It also achieves state-of-the-art performance among all algorithms on the nuScenes test set, with 75.3% AMOTA.

preprint · 2022 · arXiv

Characterization of GaN-based HEMTs Down to 4.2 K for Cryogenic Applications

The cryogenic performance of GaN-based HEMTs (high-electron-mobility transistors) is systematically investigated through their direct current (DC) and low-frequency noise (LFN) characteristics over the temperature (T) range from 300 K to 4.2 K. The key electrical figures of merit of the device, including drain saturation current (IDsat), on-resistance (RON), transconductance, subthreshold swing (SS), gate leakage current, and Schottky barrier height, are comprehensively characterized, and their temperature-dependent behavior is statistically analyzed. In addition, the LFN of the device exhibits clear 1/f behavior from 10 Hz to 10 kHz across the measured temperature range, and it can be significantly reduced at cryogenic temperatures. These results are of great importance in motivating further studies of GaN-based cryogenic devices and systems.
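Low-frequency noise spectra are commonly summarized as S(f) ≈ K / f^γ, with γ ≈ 1 for 1/f noise. Below is a minimal sketch of estimating γ and K from measured power-spectral-density points (for example over 10 Hz to 10 kHz) by least squares in log-log space. The function name and the assumption that PSD samples are already available are ours, not the paper's procedure.

```python
import math
from typing import Sequence, Tuple


def fit_noise_exponent(freqs: Sequence[float], psd: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log10 S = log10 K - gamma * log10 f; returns (gamma, K)."""
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(s) for s in psd]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope, 10 ** (my - slope * mx)


if __name__ == "__main__":
    freqs = [10.0, 100.0, 1000.0, 10000.0]
    psd = [1e-12 / f for f in freqs]  # synthetic ideal 1/f spectrum
    print(fit_noise_exponent(freqs, psd))
```

Comparing the fitted K across temperatures is one simple way to quantify a noise reduction at cryogenic temperatures.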

preprint · 2022 · arXiv

Coded Transaction Broadcasting for High-throughput Blockchains

High-throughput blockchains require efficient transaction broadcast mechanisms that can deliver transactions to most network nodes with low bandwidth overhead and latency. Existing schemes coordinate transmissions across peers to avoid sending redundant data, but they either incur high latency or are not robust against adversarial network nodes. We present Strokkur, a new transaction broadcasting mechanism that provides both low bandwidth overhead and low latency. The core idea behind Strokkur is to avoid explicit coordination through randomized transaction coding. Rather than forwarding individual transactions, Strokkur nodes send out codewords: XOR sums of multiple transactions selected at random. Since almost every codeword is useful for the receiver to decode new transactions, Strokkur nodes do not require coordination, for example, to determine which transactions the receiver is missing. Strokkur's coding strategy builds on LT codes, a popular class of rateless erasure codes, and extends them to support multiple uncoordinated senders with partially overlapping continual streams of transaction data. Strokkur introduces mechanisms to cope with adversarial senders that may send corrupt codewords, and a simple rate control algorithm that enables each node to independently determine an appropriate codeword sending rate for each peer. Our implementation of Strokkur in Golang supports 647k transactions per second using only one CPU core. Our evaluation across a 19-node Internet deployment and large-scale simulation shows that Strokkur consumes 2–7.6x less bandwidth than the existing scheme in Bitcoin, and achieves 9x lower latency than Shrec when only 4% of nodes are adversarial.
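The coding idea can be illustrated with a toy sender/receiver pair: each codeword is the XOR of a random subset of (equal-length) transactions, and the receiver peels off transactions it already holds until a codeword contains a single unknown transaction. This is a simplified sketch, not Strokkur. It assumes transaction indices shared by both sides, draws degrees uniformly instead of from an LT (soliton) distribution, and omits the multi-sender, adversarial-robustness and rate-control mechanisms.

```python
import random
from typing import Dict, Iterable, List, Set, Tuple

Codeword = Tuple[Set[int], bytes]


def xor_bytes(x: bytes, y: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(x, y))


def make_codeword(txs: List[bytes], ids: Iterable[int]) -> Codeword:
    """XOR together the selected (equal-length) transactions."""
    chosen = set(ids)
    payload = bytes(len(txs[0]))
    for i in chosen:
        payload = xor_bytes(payload, txs[i])
    return chosen, payload


def random_codeword(txs: List[bytes], rng: random.Random, max_degree: int = 3) -> Codeword:
    """Uniform small degree (a simplification of LT's soliton degree distribution)."""
    degree = rng.randint(1, min(max_degree, len(txs)))
    return make_codeword(txs, rng.sample(range(len(txs)), degree))


def peel_decode(codewords: List[Codeword], known: Dict[int, bytes]) -> Dict[int, bytes]:
    """Peeling decoder: strip known transactions; a codeword left with one id reveals it."""
    known = dict(known)
    pending = [[set(ids), payload] for ids, payload in codewords]
    progress = True
    while progress:
        progress = False
        for cw in pending:
            ids, payload = cw
            for i in [i for i in ids if i in known]:
                payload = xor_bytes(payload, known[i])
                ids.discard(i)
            cw[1] = payload
            if len(ids) == 1:
                known[ids.pop()] = payload
                progress = True
        pending = [cw for cw in pending if cw[0]]  # drop fully resolved codewords
    return known
```

The receiver never tells the sender what it is missing: it keeps peeling until no codeword yields progress, which is why coordination can be avoided.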

preprint · 2022 · arXiv

Compact and variable radio emission from an active galaxy with supersoft X-ray emission

RX J1301.9+2747 is a unique active galaxy with a supersoft X-ray spectrum that lacks significant emission at energies above 2 keV. In addition, it is one of the few galaxies displaying quasi-periodic X-ray eruptions, which recur on a timescale of 13–20 ks. We present multi-epoch radio observations of RX J1301.9+2747 using GMRT, VLA and VLBA. The VLBA imaging at 1.6 GHz reveals compact radio emission, unresolved at a scale of <0.7 pc, with a brightness temperature of T_b > 5×10^7 K. Based on data from VLA monitoring campaigns, the radio emission varies by more than a factor of 2.5 over a few days. The short-term radio variability suggests that the radio-emitting region may be as small as 8×10^-4 pc, implying an even higher brightness temperature of T_b ~ 10^12 K. A similar limit on the source size is obtained if the observed flux variability is not intrinsic but is instead caused by interstellar scintillation. The overall radio spectrum is steep, with a time-averaged spectral index α = -0.78 ± 0.03 between 0.89 GHz and 14 GHz. These observational properties rule out a thermal or star-formation origin of the radio emission, and appear consistent with a scenario of episodic jet ejections driven by magnetohydrodynamic processes. Simultaneous radio and X-ray monitoring down to a cadence of hours is required to test whether the compact, variable radio emission is correlated with the quasi-periodic X-ray eruptions.
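The source-size figure follows from a light-crossing (causality) argument, R ≲ cΔt/(1+z). Below is a quick check in Python, assuming a variability timescale of one day (the abstract does not state the exact Δt used), z = 0, and no Doppler boosting; under these assumptions it reproduces the order of magnitude of the quoted 8×10^-4 pc.

```python
C_M_PER_S = 299_792_458.0  # speed of light in m/s
M_PER_PC = 3.0857e16       # metres per parsec (rounded)
SECONDS_PER_DAY = 86_400.0


def light_crossing_size_pc(dt_days: float, z: float = 0.0) -> float:
    """Causality upper bound R <~ c * dt / (1 + z), ignoring relativistic beaming."""
    return C_M_PER_S * dt_days * SECONDS_PER_DAY / (1.0 + z) / M_PER_PC


print(f"{light_crossing_size_pc(1.0):.2e} pc")  # 8.39e-04 pc
```

A smaller emitting region at fixed flux density implies a higher brightness temperature, which is why the variability-based size pushes T_b well above the VLBA lower limit.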

preprint · 2022 · arXiv

Coverage Axis: Inner Point Selection for 3D Shape Skeletonization

In this paper, we present a simple yet effective formulation called Coverage Axis for 3D shape skeletonization. Inspired by the set cover problem, our key idea is to cover all the surface points using as few inside medial balls as possible. This formulation inherently induces a compact and expressive approximation of the Medial Axis Transform (MAT) of a given shape. Different from previous methods that rely on local approximation error, our method allows a global consideration of the overall shape structure, leading to an efficient high-level abstraction and superior robustness to noise. Another appealing aspect of our method is its capability to handle more generalized input such as point clouds and poor-quality meshes. Extensive comparisons and evaluations demonstrate the remarkable effectiveness of our method for generating compact and expressive skeletal representation to approximate the MAT.
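The set-cover idea can be sketched as follows: each candidate inner ball covers the surface points within its (optionally dilated) radius, and balls are picked until every point is covered. This sketch uses the textbook greedy set-cover heuristic as a stand-in; the paper's own optimization may differ, and `dilation`, `covered_by` and `greedy_coverage` are hypothetical names.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float, float]
Ball = Tuple[Point, float]  # (centre, radius) of a candidate inner medial ball


def covered_by(ball: Ball, points: Sequence[Point], dilation: float = 0.0) -> Set[int]:
    """Indices of surface points inside the ball after enlarging its radius by `dilation`."""
    centre, radius = ball
    return {i for i, p in enumerate(points) if math.dist(centre, p) <= radius + dilation}


def greedy_coverage(points: Sequence[Point], balls: Sequence[Ball],
                    dilation: float = 0.0) -> List[int]:
    """Greedy set cover: repeatedly take the ball covering the most uncovered points."""
    sets = [covered_by(b, points, dilation) for b in balls]
    uncovered = set(range(len(points)))
    chosen: List[int] = []
    while uncovered:
        best = max(range(len(balls)), key=lambda k: len(sets[k] & uncovered))
        gain = sets[best] & uncovered
        if not gain:
            raise ValueError("some surface points are not covered by any candidate ball")
        chosen.append(best)
        uncovered -= gain
    return chosen
```

The selected ball centres would serve as skeletal points; preferring few balls that each cover many points is what yields a compact skeleton.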

preprint · 2022 · arXiv

CreatureShop: Interactive 3D Character Modeling and Texturing from a Single Color Drawing

Creating 3D shapes from 2D drawings is an important problem with applications in content creation for computer animation and virtual reality. We introduce a new sketch-based system, CreatureShop, that enables amateurs to create high-quality textured 3D character models from 2D drawings with ease and efficiency. CreatureShop takes an input bitmap drawing of a character (such as an animal or other creature), depicted from an arbitrary descriptive pose and viewpoint, and creates a 3D shape with plausible geometric details and textures from a small number of user annotations on the 2D drawing. Our key contributions are a novel oblique view modeling method, a set of systematic approaches for producing plausible textures on the invisible or occluded parts of the 3D character (as viewed from the direction of the input drawing), and a user-friendly interactive system. We validate our system and methods by creating numerous 3D characters from various drawings, and compare our results with related works to show the advantages of our method. We perform a user study to evaluate the usability of our system, which demonstrates that our system is a practical and efficient approach to create fully-textured 3D character models for novice users.

preprint · 2022 · arXiv

DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation

This paper proposes DeciWatch, a simple baseline framework for video-based 2D/3D human pose estimation that achieves a 10x efficiency improvement over existing work without any performance degradation. Unlike current solutions that estimate every frame in a video, DeciWatch introduces a simple yet effective sample-denoise-recover framework that only watches sparsely sampled frames, taking advantage of the continuity of human motion and the lightweight pose representation. Specifically, DeciWatch uniformly samples fewer than 10% of video frames for detailed estimation, denoises the estimated 2D/3D poses with an efficient Transformer architecture, and then accurately recovers the remaining frames using another Transformer-based network. Comprehensive experimental results on three video-based human pose estimation and body mesh recovery tasks with four datasets validate the efficiency and effectiveness of DeciWatch. Code is available at https://github.com/cure-lab/DeciWatch.
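The sample and recover stages can be illustrated with a deliberately simple stand-in: sample every 10th frame (about 10%) and fill the gaps by linear interpolation. DeciWatch uses Transformer networks for denoising and recovery; this sketch replaces recovery with interpolation and omits denoising, so it shows only the data flow, not the method's accuracy. The function names are hypothetical.

```python
from bisect import bisect_right
from typing import List

Pose = List[float]  # flattened joint coordinates for one frame


def sample_indices(num_frames: int, interval: int) -> List[int]:
    """Uniformly sample every `interval`-th frame, always keeping the last frame."""
    idx = list(range(0, num_frames, interval))
    if idx[-1] != num_frames - 1:
        idx.append(num_frames - 1)
    return idx


def recover(sampled_idx: List[int], sampled_poses: List[Pose], num_frames: int) -> List[Pose]:
    """Fill in unsampled frames by linear interpolation between neighbouring samples."""
    frames: List[Pose] = []
    for t in range(num_frames):
        k = bisect_right(sampled_idx, t) - 1  # last sampled index <= t
        if sampled_idx[k] == t:
            frames.append(list(sampled_poses[k]))
            continue
        t0, t1 = sampled_idx[k], sampled_idx[k + 1]
        w = (t - t0) / (t1 - t0)
        frames.append([(1 - w) * a + w * b for a, b in zip(sampled_poses[k], sampled_poses[k + 1])])
    return frames
```

Only the sampled frames would go through the expensive per-frame pose estimator, which is where the efficiency gain comes from.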

preprint · 2022 · arXiv

Federated Contrastive Learning for Dermatological Disease Diagnosis via On-device Learning

Deep learning models have been deployed in an increasing number of edge and mobile devices to provide healthcare. These models rely on training with a tremendous amount of labeled data to achieve high accuracy. However, for medical applications such as dermatological disease diagnosis, the private data collected by mobile dermatology assistants exist on distributed mobile devices of patients, and each device only has a limited amount of data. Directly learning from limited data greatly deteriorates the performance of learned models. Federated learning (FL) can train models by using data distributed on devices while keeping the data local for privacy. Existing works on FL assume all the data have ground-truth labels. However, medical data often comes without any accompanying labels since labeling requires expertise and results in prohibitively high labor costs. The recently developed self-supervised learning approach, contrastive learning (CL), can leverage the unlabeled data to pre-train a model, after which the model is fine-tuned on limited labeled data for dermatological disease diagnosis. However, simply combining CL with FL as federated contrastive learning (FCL) will result in ineffective learning since CL requires diverse data for learning but each device only has limited data. In this work, we propose an on-device FCL framework for dermatological disease diagnosis with limited labels. Features are shared in the FCL pre-training process to provide diverse and accurate contrastive information. After that, the pre-trained model is fine-tuned with local labeled data independently on each device or collaboratively with supervised federated learning on all devices. Experiments on dermatological disease datasets show that the proposed framework effectively improves the recall and precision of dermatological disease diagnosis compared with state-of-the-art methods.
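The contrastive objective behind FCL-style pre-training is commonly an InfoNCE loss. Below is a minimal sketch, assuming that features shared by other devices simply act as extra negatives for a local anchor. The abstract does not specify the loss or how shared features enter it, so this pairing is an illustration, not the paper's formulation.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchor: Sequence[float], positive: Sequence[float],
             negatives: List[Sequence[float]], temperature: float = 0.5) -> float:
    """-log( exp(s_pos / T) / sum over positive and negatives of exp(s / T) ), computed stably."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[0]


if __name__ == "__main__":
    anchor, positive = [1.0, 0.0], [1.0, 0.0]
    local_negatives = [[0.0, 1.0], [-1.0, 0.0]]
    shared_negatives = [[0.0, -1.0]]  # hypothetical features received from other devices
    print(info_nce(anchor, positive, local_negatives))
    print(info_nce(anchor, positive, local_negatives + shared_negatives))
```

With only a few local samples per device, the pool of negatives is small; adding shared features enlarges it, which is the kind of diversity the abstract says local contrastive learning lacks.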

preprint · 2022 · arXiv

Federated Self-Supervised Contrastive Learning and Masked Autoencoder for Dermatological Disease Diagnosis

In dermatological disease diagnosis, the private data collected by mobile dermatology assistants exist on distributed mobile devices of patients. Federated learning (FL) can use decentralized data to train models while keeping data local. Existing FL methods assume all the data have labels. However, medical data often comes without full labels due to high labeling costs. Self-supervised learning (SSL) methods, contrastive learning (CL) and masked autoencoders (MAE), can leverage the unlabeled data to pre-train models, followed by fine-tuning with limited labels. However, combining SSL and FL has unique challenges. For example, CL requires diverse data but each device only has limited data. For MAE, while Vision Transformer (ViT) based MAE achieves higher accuracy than CNNs in centralized learning, MAE's performance in FL with unlabeled data has not been investigated. In addition, ViT synchronization between the server and clients differs from that of traditional CNNs, so special synchronization methods need to be designed. In this work, we propose two federated self-supervised learning frameworks for dermatological disease diagnosis with limited labels. The first one features lower computation costs and is suitable for mobile devices. The second one features high accuracy and fits high-performance servers. Based on CL, we propose federated contrastive learning with feature sharing (FedCLF). Features are shared for diverse contrastive information without sharing raw data, preserving privacy. Based on MAE, we propose FedMAE. Knowledge split separates the global and local knowledge learned from each client. Only global knowledge is aggregated, for higher generalization performance. Experiments on dermatological disease datasets show superior accuracy of the proposed frameworks over state-of-the-art methods.

preprint · 2022 · arXiv

Label-Efficient Interactive Time-Series Anomaly Detection

Time-series anomaly detection is an important task and has been widely applied in industry. Since manual data annotation is expensive and inefficient, most applications adopt unsupervised anomaly detection methods, but the results are usually sub-optimal and unsatisfactory to end customers. Weak supervision is a promising paradigm for obtaining considerable labels in a low-cost way, which enables customers to label data by writing heuristic rules rather than annotating each instance individually. However, in the time-series domain, it is hard for people to write reasonable labeling functions, as time-series data is numerically continuous and difficult to interpret. In this paper, we propose a Label-Efficient Interactive Time-Series Anomaly Detection (LEIAD) system, which enables a user to improve the results of unsupervised anomaly detection through only a small number of interactions with the system. To achieve this goal, the system integrates weak supervision and active learning collaboratively while generating labeling functions automatically from only a few labeled data. All of these techniques are complementary and can promote each other in a reinforced manner. We conduct experiments on three time-series anomaly detection datasets, demonstrating that the proposed system is superior to existing solutions in both the weak supervision and active learning areas. The system has also been tested in a real industrial scenario to show its practicality.
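For readers unfamiliar with weak supervision, here is a minimal sketch of heuristic labeling functions over a time-series window, combined by majority vote with abstentions. LEIAD generates labeling functions automatically and combines them with active learning; the hand-written rules, thresholds and names below are illustrative assumptions only.

```python
import statistics
from typing import Callable, List, Sequence

ABSTAIN, NORMAL, ANOMALY = -1, 0, 1
LabelingFn = Callable[[Sequence[float]], int]


def lf_above(threshold: float) -> LabelingFn:
    """Vote ANOMALY if any value exceeds a domain threshold, otherwise abstain."""
    return lambda w: ANOMALY if max(w) > threshold else ABSTAIN


def lf_zscore(k: float) -> LabelingFn:
    """Vote on the newest point by its z-score within the window (abstain on flat windows)."""
    def lf(w: Sequence[float]) -> int:
        sd = statistics.pstdev(w)
        if sd == 0:
            return ABSTAIN
        z = (w[-1] - statistics.fmean(w)) / sd
        return ANOMALY if abs(z) > k else NORMAL
    return lf


def majority_vote(window: Sequence[float], lfs: List[LabelingFn]) -> int:
    """Combine non-abstaining votes; ties resolve to NORMAL."""
    votes = [v for v in (lf(window) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return ANOMALY if votes.count(ANOMALY) > votes.count(NORMAL) else NORMAL
```

Windows on which every function abstains are natural candidates to show the user for manual labeling, which is where active learning would enter.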

preprint · 2022 · arXiv

Learn to Predict How Humans Manipulate Large-sized Objects from Interactive Motions

Understanding human intentions during interactions has been a long-standing theme with applications in human-robot interaction, virtual reality and surveillance. In this study, we focus on full-body human interactions with large-sized daily objects and aim to predict the future states of objects and humans given a sequential observation of human-object interaction. As there is no dataset dedicated to full-body human interactions with large-sized daily objects, we collected a large-scale dataset containing thousands of interactions for training and evaluation purposes. We also observe that an object's intrinsic physical properties are useful for object motion prediction, and thus design a set of object dynamic descriptors to encode such intrinsic properties. We treat the object dynamic descriptors as a new modality and propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task. We show that the proposed network, which consumes dynamic descriptors, achieves state-of-the-art prediction results and generalizes better to unseen objects. We also demonstrate that the predicted results are useful for human-robot collaboration.

preprint · 2022 · arXiv

Longest Chain Consensus Under Bandwidth Constraint

Spamming attacks are a serious concern for consensus protocols, as witnessed by recent outages of a major blockchain, Solana. They cause congestion and excessive message delays in a real network due to its bandwidth constraints. In contrast, longest chain (LC), an important family of consensus protocols, has previously only been proven secure assuming an idealized network model in which all messages are delivered within bounded delay. This model-reality mismatch is further aggravated for Proof-of-Stake (PoS) LC where the adversary can spam the network with equivocating blocks. Hence, we extend the network model to capture bandwidth constraints, under which nodes now need to choose carefully which blocks to spend their limited download budget on. To illustrate this point, we show that 'download along the longest header chain', a natural download rule for Proof-of-Work (PoW) LC, is insecure for PoS LC. We propose a simple rule 'download towards the freshest block', formalize two common heuristics 'not downloading equivocations' and 'blocklisting', and prove in a unified framework that PoS LC with any one of these download rules is secure in bandwidth-constrained networks. In experiments, we validate our claims and showcase the behavior of these download rules under attack. By composing multiple instances of a PoS LC protocol with a suitable download rule in parallel, we obtain a PoS consensus protocol that achieves a constant fraction of the network's throughput limit even under worst-case adversarial strategies.

preprint · 2022 · arXiv

Mass Testing and Characterization of 20-inch PMTs for JUNO

The main goal of the JUNO experiment is to determine the neutrino mass ordering using a 20 kt liquid-scintillator detector. Its key feature is an excellent energy resolution of at least 3% at 1 MeV, for which its instruments need to meet a certain quality and thus have to be fully characterized. More than 20,000 20-inch PMTs have been received and assessed by JUNO in a detailed testing program that began in 2017 and lasted about four years. Based on this mass characterization and a set of specific requirements, the good quality of all accepted PMTs could be ascertained. This paper presents the testing procedure and the testing systems designed for it, as well as the statistical characteristics of all 20-inch PMTs intended for use in the JUNO experiment, covering more than fifteen performance parameters including photocathode uniformity. This constitutes the largest sample of 20-inch PMTs ever produced and studied in detail to date, namely 15,000 of the newly developed 20-inch MCP-PMTs from Northern Night Vision Technology Co. (NNVT) and 5,000 dynode PMTs from Hamamatsu Photonics K. K. (HPK).

preprint · 2022 · arXiv

Mix-Teaching: A Simple, Unified and Effective Semi-Supervised Learning Framework for Monocular 3D Object Detection

Monocular 3D object detection is an essential perception task for autonomous driving. However, its heavy reliance on large-scale labeled data makes model optimization costly and time-consuming. To reduce this over-reliance on human annotations, we propose Mix-Teaching, an effective semi-supervised learning framework that employs both labeled and unlabeled images during training. Mix-Teaching first generates pseudo-labels for unlabeled images by self-training. The student model is then trained on mixed images with much denser and more precise labels, obtained by merging instance-level image patches into empty backgrounds or labeled images. This is the first method to break the image-level limitation and put high-quality pseudo-labels from multiple frames into one image for semi-supervised training. In addition, because confidence scores are misaligned with localization quality, it is hard to discriminate high-quality pseudo-labels from noisy predictions using a confidence-based criterion alone. To that end, we further introduce an uncertainty-based filter to help select reliable pseudo boxes for the above mixing operation. To the best of our knowledge, this is the first unified SSL framework for monocular 3D object detection. Mix-Teaching consistently improves MonoFlex and GUPNet by significant margins under various labeling ratios on the KITTI dataset. For example, our method achieves an improvement of around +6.34% AP@0.7 over the GUPNet baseline on the validation set when using only 10% labeled data. Moreover, by leveraging the full training set and the additional 48K raw images of KITTI, it further improves MonoFlex by +4.65% AP@0.7 for car detection, reaching 18.54% AP@0.7, which ranks first among all monocular methods on the KITTI test leaderboard. The code and pretrained models will be released at https://github.com/yanglei18/Mix-Teaching.
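The uncertainty-based filter can be pictured with a short Python sketch. The abstract does not say how uncertainty is estimated, so this example assumes, purely for illustration, that a box's uncertainty is the spread (population standard deviation) of its predicted depth across several model snapshots; the function names and the thresholds `min_score` and `max_uncertainty` are hypothetical.

```python
import statistics


def box_uncertainty(depth_predictions):
    """Hypothetical uncertainty proxy: spread of one box's depth across snapshots."""
    return statistics.pstdev(depth_predictions)


def filter_pseudo_boxes(candidates, min_score=0.7, max_uncertainty=0.5):
    """Keep pseudo boxes that are both confident and stable.

    `candidates` is a list of (score, [depth predictions]) tuples. A box
    passes only if its confidence is high AND its predictions agree, which
    is the point of not relying on confidence alone.
    """
    kept = []
    for score, depths in candidates:
        if score >= min_score and box_uncertainty(depths) <= max_uncertainty:
            kept.append((score, depths))
    return kept
```

In this sketch a box with a high score but widely scattered depth estimates is rejected, mirroring the observation that confidence alone does not track localization quality.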

preprint · 2022 · arXiv

Modularity and uniformization of a higher genus algebraic space curve, its distinct arithmetical realizations by cohomology groups and $E_6$, $E_7$, $E_8$-singularities

We prove the modularity of an algebraic space curve $Y$ of genus $50$ in $\mathbb{P}^5$, defined by $21$ quartic polynomials in six variables, by means of an explicit modular parametrization by theta constants of order $13$. This provides an example of modularity, explicit uniformization and hyperbolic uniformization of arithmetic type for a higher genus algebraic space curve. In particular, it gives a new example for Hilbert's 22nd problem. This gives $21$ modular equations of order $13$, which greatly improve the result of Ramanujan and Evans on the construction of modular equations of order $13$. We show that $Y$ is isomorphic to the modular curve $X(13)$. The corresponding ideal $I(Y)$ is invariant under the action of $\text{SL}(2, 13)$, which leads to a $21$-dimensional reducible representation of $\text{SL}(2, 13)$. Its decomposition as the direct sum of $1$-, $7$- and $13$-dimensional representations gives two distinct arithmetical realizations of $X(13)$ by the character fields $\mathbb{Q}(\chi)=\mathbb{Q}(\zeta_7+\zeta_7^{-1})$ or $\mathbb{Q}(\chi)=\mathbb{Q}(\sqrt{13})$ of irreducible representations of $\text{SL}(2, 13)$, corresponding to the decompositions of cohomology groups of a projective or affine variety with values in a coherent algebraic sheaf on $X(13)$, as well as to the geometric construction of $Y$, the geometric realization of the degenerate principal series and the Steinberg representation of $\text{SL}(2, 13)$. The projection $Y \rightarrow Y/\text{SL}(2, 13)$ (identified with $\mathbb{CP}^1$) is a Galois covering whose generic fibre is interpreted as the Galois resolvent of the modular equation $\Phi_{13}(\cdot, j)=0$ of level $13$. The ring of invariant polynomials $(\mathbb{C}[z_1, z_2, z_3, z_4, z_5, z_6]/I(Y))^{\text{SL}(2, 13)}$ over $X(13)$ leads to a new perspective on the theory of $E_6$, $E_7$ and $E_8$-singularities.
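As a consistency check using standard facts about principal congruence subgroups (not taken from the paper), the genus formula for the modular curve $X(p)$ recovers the genus $50$ stated above, and the dimensions in the decomposition add up to $21$:

```latex
g\bigl(X(p)\bigr) = 1 + \frac{(p^2-1)(p-6)}{24} \quad (p \ge 5 \text{ prime}),
\qquad
g\bigl(X(13)\bigr) = 1 + \frac{168 \cdot 7}{24} = 1 + 49 = 50,
\\[4pt]
|\text{SL}(2,13)| = 13\,(13^2 - 1) = 2184,
\qquad
21 = 1 + 7 + 13.
```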

preprint · 2022 · arXiv

MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model

Human motion modeling is important for many modern graphics applications, which typically require professional skills. To remove these skill barriers for laypeople, recent motion generation methods can directly generate human motions conditioned on natural language. However, it remains challenging to achieve diverse and fine-grained motion generation with various text inputs. To address this problem, we propose MotionDiffuse, the first diffusion model-based text-driven motion generation framework, which demonstrates several desirable properties over existing methods. 1) Probabilistic Mapping. Instead of a deterministic language-motion mapping, MotionDiffuse generates motions through a series of denoising steps in which variations are injected. 2) Realistic Synthesis. MotionDiffuse excels at modeling complicated data distributions and generating vivid motion sequences. 3) Multi-Level Manipulation. MotionDiffuse responds to fine-grained instructions on body parts and supports arbitrary-length motion synthesis with time-varied text prompts. Our experiments show that MotionDiffuse outperforms existing SoTA methods by convincing margins on text-driven motion generation and action-conditioned motion generation. A qualitative analysis further demonstrates MotionDiffuse's controllability for comprehensive motion generation. Homepage: https://mingyuan-zhang.github.io/projects/MotionDiffuse.html
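The "series of denoising steps" rests on a forward noising process that the denoiser learns to invert. The Python sketch below shows the standard DDPM forward process with a linear noise schedule; the schedule values are common DDPM defaults and are an assumption here, since the abstract does not give MotionDiffuse's settings, and a motion sequence is flattened to a plain list of floats for simplicity.

```python
import math
import random


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (common DDPM default; MotionDiffuse's may differ)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products: alpha_bar_t = prod_{s <= t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng=random):
    """Forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]
```

Training then asks a network to predict the injected noise from `x_t`, `t` and the text embedding; that text-conditioned network is not shown.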

preprint · 2022 · arXiv

Not All Models Are Equal: Predicting Model Transferability in a Self-challenging Fisher Space

This paper addresses the important problem of ranking pre-trained deep neural networks and screening the most transferable ones for downstream tasks. It is challenging because the ground-truth model ranking for each task can only be generated by fine-tuning the pre-trained models on the target dataset, which is brute-force and computationally expensive. Recent methods have proposed several lightweight transferability metrics to predict the fine-tuning results. However, these approaches only capture static representations and neglect the fine-tuning dynamics. To this end, this paper proposes a new transferability metric, called Self-challenging Fisher Discriminant Analysis (SFDA), which has several appealing benefits that existing works lack. First, SFDA embeds the static features into a Fisher space and refines them for better separability between classes. Second, SFDA uses a self-challenging mechanism to encourage different pre-trained models to differentiate on hard examples. Third, SFDA can easily select multiple pre-trained models for a model ensemble. Extensive experiments on 33 pre-trained models across 11 downstream tasks show that SFDA is efficient, effective, and robust when measuring the transferability of pre-trained models. For instance, compared with the state-of-the-art method NLEEP, SFDA demonstrates an average gain of 59.1% while bringing a 22.5x speedup in wall-clock time. The code will be available at https://github.com/TencentARC/SFDA.
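SFDA builds on the classic Fisher criterion, which scores features by between-class scatter relative to within-class scatter. A minimal Python sketch of that criterion for scalar features follows; it is the textbook ratio only, not SFDA's Fisher-space projection or self-challenging mechanism, and `fisher_ratio` is a name chosen for illustration.

```python
from statistics import mean


def fisher_ratio(features, labels):
    """Between-class scatter divided by within-class scatter for scalar features.

    Higher values mean the classes are better separated. Raises
    ZeroDivisionError if every class has zero internal spread.
    """
    overall = mean(features)
    between = 0.0
    within = 0.0
    for c in sorted(set(labels)):
        xs = [f for f, y in zip(features, labels) if y == c]
        mu = mean(xs)
        between += len(xs) * (mu - overall) ** 2   # class mean vs global mean
        within += sum((x - mu) ** 2 for x in xs)   # spread inside the class
    return between / within
```

For example, features `[0, 1, 10, 11]` with labels `[0, 0, 1, 1]` give a ratio of 100, while the less separable `[0, 5, 6, 11]` gives 1.44.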

preprint · 2022 · arXiv

SkiM: Skipping Memory LSTM for Low-Latency Real-Time Continuous Speech Separation

Continuous speech separation (CSS) for meeting pre-processing has recently become a focused research topic. Compared to the data in utterance-level speech separation, a meeting-style audio stream lasts longer and has an unknown number of speakers. We adopt the time-domain speech separation method and the recently proposed Graph-PIT to build a super low-latency online speech separation model, which is very important for real applications. The low-latency time-domain encoder with a small stride leads to an extremely long feature sequence. We propose a simple yet efficient model named Skipping Memory (SkiM) for long sequence modeling. Experimental results show that SkiM achieves separation performance on par with or even better than DPRNN. Meanwhile, the computational cost of SkiM is reduced by 75% compared to DPRNN. Its strong long-sequence modeling capability and low computational cost make SkiM a suitable model for online CSS applications. Our fastest real-time model achieves a 17.1 dB signal-to-distortion ratio (SDR) improvement with less than 1 millisecond latency in the simulated meeting-style evaluation.

preprint · 2022 · arXiv

SmoothNet: A Plug-and-Play Network for Refining Human Poses in Videos

When analyzing human motion videos, the output jitters from existing pose estimators are highly unbalanced, with estimation errors varying across frames. Most frames in a video are relatively easy to estimate and suffer only from slight jitters. In contrast, for rarely seen or occluded actions, the estimated positions of multiple joints deviate largely from the ground truth values for a consecutive sequence of frames, causing significant jitters on them. To tackle this problem, we propose SmoothNet, a dedicated temporal-only refinement network attached to existing pose estimators for jitter mitigation. Unlike existing learning-based solutions that employ spatio-temporal models to co-optimize per-frame precision and temporal smoothness at all the joints, SmoothNet models the natural smoothness characteristics of body movements by learning the long-range temporal relations of every joint without considering the noisy correlations among joints. With a simple yet effective motion-aware fully-connected network, SmoothNet significantly improves the temporal smoothness of existing pose estimators and, as a side effect, enhances the estimation accuracy on those challenging frames. Moreover, as a temporal-only model, a unique advantage of SmoothNet is its strong transferability across various types of estimators and datasets. Comprehensive experiments on five datasets with eleven popular backbone networks across 2D and 3D pose estimation and body recovery tasks demonstrate the efficacy of the proposed solution. Code is available at https://github.com/cure-lab/SmoothNet.

preprint · 2022 · arXiv

The Larger The Fairer? Small Neural Networks Can Achieve Fairness for Edge Devices

Along with the progress of AI democratization, neural networks are increasingly deployed on edge devices for a wide range of applications. Fairness concerns are gradually emerging in many of these applications, such as face recognition and mobile medical applications. One fundamental question arises: what is the fairest neural architecture for edge devices? By examining existing neural networks, we observe that larger networks are typically fairer. However, edge devices call for smaller neural architectures to meet hardware specifications. To address this challenge, this work proposes a novel Fairness- and Hardware-aware Neural architecture search framework, namely FaHaNa. Coupled with a model freezing approach, FaHaNa can efficiently search for neural networks with balanced fairness and accuracy, while guaranteeing that hardware specifications are met. Results show that FaHaNa can identify a series of neural networks with higher fairness and accuracy on a dermatology dataset. Targeting edge devices, FaHaNa finds a neural architecture with slightly higher accuracy, 5.28x smaller size, and a 15.14% higher fairness score compared with MobileNetV2; meanwhile, on Raspberry Pi and Odroid XU-4, it achieves 5.75x and 5.79x speedups, respectively.

preprint · 2022 · arXiv

Visual-Tactile Sensing for Real-time Liquid Volume Estimation in Grasping

We propose a deep visuo-tactile model for real-time estimation of the volume of liquid inside a deformable container in a proprioceptive way. We fuse two sensory modalities, i.e., the raw visual inputs from an RGB camera and the tactile cues from our specific tactile sensor, without any extra sensor calibration. The robotic system is controlled and adjusted in real time based on the estimation model. The main contributions and novelties of our work are as follows: 1) We explore a proprioceptive approach to liquid volume estimation by developing an end-to-end predictive model with multi-modal convolutional networks, which achieves high precision with an error of around 2 ml in experimental validation. 2) We propose a multi-task learning architecture that jointly considers the losses from both classification and regression tasks, and comparatively evaluate the performance of each variant on the collected data and on an actual robotic platform. 3) We use the proprioceptive robotic system to accurately serve and control the requested volume of liquid, which flows continuously into a deformable container in real time. 4) We adaptively adjust the grasping plan to achieve more stable grasping and manipulation according to the real-time liquid volume prediction.
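The multi-task objective in contribution 2) combines a classification loss and a regression loss. A minimal Python sketch follows, assuming (hypothetically) that the classification head predicts a discrete volume bin and the regression head predicts volume in ml, with hand-set weights `w_cls` and `w_reg`; the paper's actual heads and weighting are not specified in the abstract.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample, using log-sum-exp for stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]


def multitask_loss(logits, target_bin, pred_ml, true_ml, w_cls=1.0, w_reg=1.0):
    """Weighted sum of a volume-bin classification loss and a squared-error regression loss."""
    cls = cross_entropy(logits, target_bin)   # which volume bin?
    reg = (pred_ml - true_ml) ** 2            # how many ml, exactly?
    return w_cls * cls + w_reg * reg
```

With two equal logits and a 2 ml regression error, the loss is ln 2 + 4.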

preprint · 2021 · arXiv

A census of optically dark massive galaxies in the early Universe from magnification by lensing galaxy clusters

We present ALMA 870um and JCMT SCUBA2 850um dust continuum observations of a sample of optically dark and strongly lensed galaxies in cluster fields. The ALMA and SCUBA2 observations reach a median rms of about 0.11 mJy and 0.44 mJy, respectively, with the latter close to the confusion limit of the data at 850um. This represents one of the most sensitive searches for dust emission in optically dark galaxies. We detect dust emission in 12 out of 15 galaxies at >3.8 sigma, corresponding to a detection rate of 80 per cent. Thanks to gravitational lensing, our observations reach a limiting flux deeper by a factor of 3 than previous surveys in blank fields. We estimate delensed infrared luminosities in the range log(LIR)=11.5-12.7 Lsun, which correspond to dust-obscured star formation rates (SFRs) of 30 to 520 Msun per year. Stellar population fits to the optical-to-NIR photometric data yield a median redshift z=4.26 and delensed stellar mass log(Mstar)=10.78 Msun. These galaxies contribute a lensing-corrected star-formation rate density at least an order of magnitude higher than that of equivalently massive UV-selected galaxies at z>3. The results suggest that there is a missing population of massive star-forming galaxies in the early Universe, which may dominate the SFR density at the massive end. Five optically dark galaxies are located within r<50 arcsec in one cluster field, representing a potential overdensity that has a physical origin at a confidence level >99.974% from Poisson statistics. Follow-up spectroscopic observations with ALMA and JWST are crucial to confirm whether it is associated with a protocluster at similar redshifts.

preprint · 2021 · arXiv

Birational geometry of blow-ups of projective spaces along points and lines

Consider the blow-up $X$ of $\mathbb{P}^3$ at 6 points in very general position and along the 15 lines through pairs of the 6 points. We construct an infinite-order pseudo-automorphism $\phi_X$ on $X$, induced by the complete linear system of a divisor of degree 13. The effective cone of $X$ has infinitely many extremal rays and hence $X$ is not a Mori Dream Space. The threefold $X$ has a unique anticanonical section, which is a Jacobian K3 Kummer surface $S$ of Picard number 17. The restriction of $\phi_X$ to $S$ realizes one of Keum's 192 infinite-order automorphisms of Jacobian K3 Kummer surfaces. More generally, we show that the blow-up of $\mathbb{P}^n$ ($n\geq 3$) at $(n+3)$ very general points and certain 9 lines through them is not a Mori Dream Space, with infinitely many extremal effective divisors. As an application, for $n\geq 7$, the blow-up of $\overline{M}_{0,n}$ at a very general point has infinitely many extremal effective divisors.

preprint · 2021 · arXiv

Deferrable Load Scheduling under Demand Charge: A Block Model-Predictive Control Approach

Optimal scheduling of deferrable electrical loads can reshape the aggregated load profile to achieve higher operational efficiency and reliability. This paper studies deferrable load scheduling under a demand charge that imposes a penalty on the peak consumption over a billing period. Such a terminal cost poses challenges in real-time dispatch when demand forecasts are inaccurate. A block model-predictive control approach is proposed that breaks the demand charge into a sequence of stage costs. The problem of charging electric vehicles is used to illustrate the efficacy of the proposed approach. Numerical examples show that the block model-predictive control approach outperforms benchmark methods in various settings.
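Why the demand charge acts as a terminal cost is easiest to see in a small cost function: the energy part is a sum over stages, but the demand part depends on the peak over the whole billing period. The Python sketch below uses made-up prices and loads for illustration; it is not the paper's block model-predictive control formulation.

```python
def billing_cost(load_kw, energy_price, demand_rate, dt_h=1.0):
    """Energy cost summed over stages plus a demand charge on the period peak.

    load_kw:      average load in each stage (kW)
    energy_price: price per kWh in each stage
    demand_rate:  price per kW of the billing-period peak
    dt_h:         stage length in hours
    """
    energy = sum(p * price * dt_h for p, price in zip(load_kw, energy_price))
    demand = demand_rate * max(load_kw)  # depends on the whole period, not one stage
    return energy + demand
```

For example, `billing_cost([2.0, 8.0, 2.0, 2.0], [0.1] * 4, 10.0)` is about 81.4, while deferring part of the peak to get the flat profile `[3.5] * 4` gives about 36.4: the same 14 kWh and energy cost, but a much smaller demand charge.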

preprint · 2021 · arXiv

E-Tree Learning: A Novel Decentralized Model Learning Framework for Edge AI

Traditionally, AI models are trained on the central cloud with data collected from end devices. This leads to high communication cost, long response time and privacy concerns. Recently, Edge-empowered AI, namely Edge AI, has been proposed to support AI model learning and deployment at the network edge, closer to the data sources. Existing research, including federated learning, adopts a centralized architecture for model learning in which a central server aggregates the model updates from the clients/workers. The centralized architecture has drawbacks such as a performance bottleneck, poor scalability and a single point of failure. In this paper, we propose a novel decentralized model learning approach, namely E-Tree, which makes use of a well-designed tree structure imposed on the edge devices. The tree structure and the locations and orders of aggregation on the tree are optimally designed to improve training convergence and model accuracy. In particular, we design an efficient device clustering algorithm, named KMA, for E-Tree by taking into account the data distribution on the devices as well as the network distance. Evaluation results show that E-Tree significantly outperforms benchmark approaches such as federated learning and Gossip learning under non-IID data in terms of model accuracy and convergence.
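Tree-structured aggregation of the kind E-Tree uses can be sketched as a bottom-up weighted average of model parameters. This Python example assumes FedAvg-style weighting by local sample count and treats internal nodes as pure aggregators; how E-Tree actually places aggregation points, orders them, or clusters devices with KMA is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    params: List[float] = field(default_factory=list)  # local model (leaves only)
    num_samples: int = 0                                # local data size (leaves only)
    children: List["Node"] = field(default_factory=list)


def aggregate(node: Node) -> Tuple[List[float], int]:
    """Return (averaged parameters, total samples) for the subtree rooted at `node`.

    Leaves report their own model; internal nodes average their children's
    results, weighting each child by the number of samples beneath it.
    """
    if not node.children:
        return node.params, node.num_samples
    results = [aggregate(child) for child in node.children]
    total = sum(n for _, n in results)
    dim = len(results[0][0])
    avg = [sum(p[i] * n for p, n in results) / total for i in range(dim)]
    return avg, total
```

Because each subtree reports its sample count upward, the root result here equals a flat sample-weighted average, while the intermediate averages can be computed close to the devices.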

preprint · 2021 · arXiv

Efficient Compressed Sensing Based Image Coding by Using Gray Transformation

In recent years, compressed sensing (CS) based image coding has become a hot topic in the image processing field. However, since the bit depth required for encoding each CS sample is too large, the compression performance of this paradigm is unattractive. To address this issue, a novel CS-based image coding system using a gray transformation is proposed. In the proposed system, we first use a gray transformation to preprocess the original image and then use CS to sample the transformed image. Since the gray transformation concentrates the probability distribution of the CS samples, the bit depth required for encoding each CS sample is reduced significantly. Consequently, the proposed system can considerably improve the compression performance of CS-based image coding. Simulation results show that the proposed system outperforms the traditional system without gray transformation in terms of compression performance.

preprint · 2021 · arXiv

JUNO Physics and Detector

The Jiangmen Underground Neutrino Observatory (JUNO) is a 20 kton liquid scintillator (LS) detector located 700 m underground. An excellent energy resolution and a large fiducial volume offer exciting opportunities for addressing many important topics in neutrino and astro-particle physics. With 6 years of data, the neutrino mass ordering can be determined at 3-4 sigma and three oscillation parameters can be measured to a precision of 0.6% or better by detecting reactor antineutrinos. With 10 years of data, the diffuse supernova neutrino background (DSNB) could be observed at 3 sigma; a lower limit on the proton lifetime of 8.34e33 years (90% C.L.) can be set by searching for p->nu_bar K^+; and detection of solar neutrinos would shed new light on the solar metallicity problem and probe the vacuum-matter transition region. A core-collapse supernova at 10 kpc would lead to ~5000 inverse beta decay (IBD) and ~2000 (300) all-flavor neutrino-proton (electron) scattering events. Geo-neutrinos can be detected at a rate of ~400 events/year. We also summarize the final design of the JUNO detector and the key R&D achievements. All 20-inch PMTs have been tested. The average photon detection efficiency is 28.9% for the 15,000 MCP PMTs and 28.1% for the 5,000 dynode PMTs, higher than the JUNO requirement of 27%. Together with the >20 m attenuation length of the LS, we expect a yield of 1345 p.e. per MeV and an effective energy resolution of $3.02\%/\sqrt{E\,(\mathrm{MeV})}$ in simulations. The underwater electronics are designed to have a loss rate <0.5% in 6 years. With degassing membranes and a micro-bubble system, the radon concentration in the 35-kton water pool could be lowered to <10 mBq/m^3. Acrylic panels with radiopurity <0.5 ppt U/Th are produced. The 20-kton LS will be purified onsite. Singles in the fiducial volume can be controlled to ~10 Hz. The JUNO experiment also features a double calorimeter system with 25,600 3-inch PMTs, an LS testing facility OSIRIS, and a near detector TAO.
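The quoted resolution term can be evaluated directly and compared with the photon-statistics limit implied by the expected light yield. This Python sketch assumes a pure stochastic $a/\sqrt{E}$ term with $a = 3.02\%$, as quoted above; since the quoted figure is an effective resolution, its gap to the Poisson limit presumably reflects additional contributions that are not modeled here.

```python
import math


def resolution(e_mev, a=0.0302):
    """Fractional energy resolution for a pure stochastic term a / sqrt(E[MeV])."""
    return a / math.sqrt(e_mev)


def photostat_limit(e_mev, pe_per_mev=1345):
    """Poisson limit 1 / sqrt(N_pe) from the light yield alone."""
    return 1.0 / math.sqrt(pe_per_mev * e_mev)
```

For example, `resolution(1.0)` is 0.0302 and `resolution(4.0)` is 0.0151, while `photostat_limit(1.0)` is about 0.0273.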

preprint · 2020 · arXiv

A Framework for Behavior Privacy Preserving in Radio Frequency Signal

Recent years have witnessed the rapid development of human-centered wireless sensing applications, in which human information, such as the user's identity and motions, can be retrieved by analyzing the signal distortion caused by the target person. However, the openness of wireless transmission raises increasing concerns about user privacy, since either the human identity or human motion is sensitive in certain scenarios, including personal residences, laboratories, and offices. Researchers have reported that commodity WiFi signals can be abused to identify users. To dispel this threat, in this paper we propose a privacy-preserving framework that effectively hides the information of user behaviors in wireless signals while retaining the ability to authenticate users. The core of our framework is a novel Siamese network-based deep model, namely RFBP-Net. In this way, wireless sensing reveals only limited user information. We conduct extensive experiments on both real WiFi and RFID systems and open datasets. The experimental results show that RFBP-Net significantly reduces activity recognition accuracy, i.e., a 70% reduction in the RFID system and 80% in the WiFi system, with only a slight penalty in user authentication accuracy, i.e., only 5% and 1% decreases in the RFID and WiFi systems, respectively.
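Siamese networks such as RFBP-Net are commonly trained with a pairwise objective. Below is a minimal Python sketch of the standard contrastive loss on a pair of embeddings: pairs from the same user are pulled together and pairs from different users are pushed at least `margin` apart. The abstract does not state RFBP-Net's loss, so treat this as a generic illustration; suppressing activity information would need an additional term that is not shown.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_user, margin=1.0):
    """Standard contrastive loss for one Siamese pair.

    Same-user pairs are penalized by their squared distance; different-user
    pairs are penalized only if they are closer than `margin`.
    """
    d = euclidean(emb_a, emb_b)
    if same_user:
        return d ** 2
    return max(0.0, margin - d) ** 2
```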

preprint · 2020 · arXiv

A Sterile Neutrino Search at compact materials irradiation facility

The compact material irradiation facility (CMIF) is a current project in China that will provide a compact deuteron-beryllium neutron source. The target of this facility will be an intense and compact Isotope Decay-At-Rest (IsoDAR) neutrino source. In this paper, we propose to test the sterile neutrino hypothesis using CMIF as the neutrino source. At the CMIF platform, the electron antineutrino production rate can be up to $2.0\times 10^{19}$ per day. When paired with an 80 t liquid scintillator detector to study short-baseline electron antineutrino disappearance, the inverse beta decay (IBD) event rate is large enough to probe the parameter ranges of interest for neutrino anomalies. Our sensitivity analysis shows that a short-baseline experiment at this platform would provide a very competitive sterile neutrino search, especially in the high-$\Delta m^2$ region ($\Delta m^2 >10\,\text{eV}^2$).
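Short-baseline disappearance searches are usually analyzed with the two-flavor survival probability $P = 1 - \sin^2 2\theta \, \sin^2(1.267\,\Delta m^2 L / E)$, with $\Delta m^2$ in eV$^2$, $L$ in m and $E$ in MeV. A minimal Python sketch follows; it is the textbook approximation, not the paper's full sensitivity analysis.

```python
import math


def survival_probability(l_m, e_mev, dm2_ev2, sin2_2theta):
    """Two-flavor electron-antineutrino survival probability.

    l_m: baseline in metres, e_mev: antineutrino energy in MeV,
    dm2_ev2: mass splitting in eV^2, sin2_2theta: mixing amplitude.
    """
    phase = 1.267 * dm2_ev2 * l_m / e_mev
    return 1.0 - sin2_2theta * math.sin(phase) ** 2
```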

preprint · 2020 · arXiv

Achieving Super-Linear Speedup across Multi-FPGA for Real-Time DNN Inference

Real-time Deep Neural Network (DNN) inference with low-latency requirements has become increasingly important for numerous applications in both cloud computing (e.g., Apple's Siri) and edge computing (e.g., Google/Waymo's driverless cars). FPGA-based DNN accelerators have demonstrated both superior flexibility and performance; in addition, for real-time inference with low batch size, FPGAs are expected to achieve further performance improvement. However, the performance gain of single-FPGA designs is limited by on-chip resources. In this paper, we employ multiple FPGAs to cooperatively run DNNs with the objective of achieving super-linear speedup over a single-FPGA design. In implementing such systems, we found two barriers that prevent us from achieving the design goal: (1) the lack of a clear partition scheme for each DNN layer to fully exploit parallelism, and (2) the insufficient bandwidth between the off-chip memory and the accelerator due to the growing size of DNNs. To tackle these issues, we propose a general framework, "Super-LIP", which can support different kinds of DNNs. In this paper, we take the Convolutional Neural Network (CNN) as a vehicle to illustrate Super-LIP. We first formulate an accurate system-level model to support the exploration of the best partition schemes. Then, we develop a novel design methodology to effectively alleviate the heavy load on memory bandwidth by moving traffic from the memory bus to inter-FPGA links. We implement Super-LIP on ZCU102 FPGA boards. Results demonstrate that Super-LIP with 2 FPGAs can achieve a 3.48x speedup compared to the state-of-the-art single-FPGA design. Moreover, as the number of FPGAs scales up, the system latency can be further reduced while maintaining high energy efficiency.

preprint · 2020 · arXiv

Blockchain for Future Smart Grid: A Comprehensive Survey

The concept of the smart grid has been introduced as a new vision of the conventional power grid, aimed at finding an efficient way of integrating green and renewable energy technologies. In this context, the Internet-connected smart grid, also called the energy Internet, is emerging as an innovative approach to delivering energy anywhere at any time. The ultimate goal of these developments is to build a sustainable society. However, integrating and coordinating a large and growing number of connections can be challenging for the traditional centralized grid system. Consequently, the smart grid is undergoing a transformation from its centralized form to a decentralized topology. On the other hand, blockchain has several excellent features that make it a promising technology for the smart grid paradigm. In this paper, we aim to provide a comprehensive survey on the application of blockchain in the smart grid. As such, we identify the significant security challenges of smart grid scenarios that can be addressed by blockchain. Then, we present a number of recent blockchain-based research works in the literature addressing security issues in the smart grid. We also summarize several related practical projects, trials, and products that have emerged recently. Finally, we discuss essential research challenges and future directions of applying blockchain to smart grid security issues.

preprint · 2020 · arXiv

Co-Exploration of Neural Architectures and Heterogeneous ASIC Accelerator Designs Targeting Multiple Tasks

Neural Architecture Search (NAS) has demonstrated its power on various AI accelerating platforms such as Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs). However, how to integrate NAS with Application-Specific Integrated Circuits (ASICs) remains an open problem, despite ASICs being the most powerful AI accelerating platforms. The major bottleneck comes from the large design freedom associated with ASIC designs. Moreover, considering that multiple DNNs will run in parallel for different workloads with diverse layer operations and sizes, integrating heterogeneous ASIC sub-accelerators for distinct DNNs in one design can significantly boost performance, but at the same time further complicates the design space. To address these challenges, in this paper we build an ASIC template set based on existing successful designs, described by their unique dataflows, so that the design space is significantly reduced. Based on the templates, we further propose a framework, namely NASAIC, which can simultaneously identify multiple DNN architectures and the associated heterogeneous ASIC accelerator design, such that the design specifications (specs) are satisfied while accuracy is maximized. Experimental results show that, compared with successive NAS and ASIC design optimizations, which lead to design spec violations, NASAIC guarantees that the results meet the design specs, with 17.77%, 2.49x, and 2.32x reductions in latency, energy, and area, and with 0.76% accuracy loss. To the best of the authors' knowledge, this is the first work on neural architecture and ASIC accelerator design co-exploration.

preprint · 2020 · arXiv

Decentralized Dynamic State Estimation with Bimodal Gaussian Mixture Measurement Noise

This paper proposes a decentralized dynamic state estimation (DSE) algorithm with bimodal Gaussian mixture measurement noise. The decentralized DSE is formulated using the Ensemble Kalman Filter (EnKF) and then compared with the unscented Kalman filter (UKF). The performance of the proposed framework is verified using the WSCC 9-bus system simulated in the Real Time Digital Simulator (RTDS). The phasor measurement unit (PMU) measurements are streamed in real time from the RTDS runtime environment to MATLAB for real-time visualization and estimation. To account for data corruption in the streaming process, a bimodal distribution containing two normal distributions with different weights and variances is added to the measurements as the noise component. The performances of UKF and EnKF are then compared by calculating the mean-squared errors (MSEs) between the actual and estimated states.
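A minimal Python sketch of the two ingredients follows: drawing measurement noise from a two-component Gaussian mixture, and a stochastic EnKF analysis step for a scalar state observed directly. The abstract gives different weights and variances but not the component means, so `mu1` and `mu2` are illustrative assumptions (with equal means the mixture would not be bimodal); feeding the filter a single Gaussian variance `r` is also an assumption, and the paper's multi-state power-system model and UKF comparison are not reproduced.

```python
import random
from statistics import mean, variance


def sample_bimodal(rng, w1, mu1, sd1, mu2, sd2):
    """Draw one sample from a two-component Gaussian mixture (weight w1 on component 1)."""
    if rng.random() < w1:
        return rng.gauss(mu1, sd1)
    return rng.gauss(mu2, sd2)


def enkf_update(ensemble, y, r, rng):
    """Stochastic EnKF analysis step for a scalar state observed directly (H = 1).

    ensemble: forecast members, y: measurement, r: measurement noise variance.
    Each member assimilates its own perturbed copy of the measurement.
    """
    p = variance(ensemble)   # forecast error variance estimated from the ensemble
    k = p / (p + r)          # Kalman gain
    return [x + k * (y + rng.gauss(0.0, r ** 0.5) - x) for x in ensemble]
```

The ensemble mean after the update serves as the state estimate; its error against the true state is what an MSE comparison would accumulate over time.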

preprint · 2020 · arXiv

Deep Neural Network based Wide-Area Event Classification in Power Systems

This paper presents a wide-area event classification method for transmission power grids. The deep neural network (DNN) based classifier is developed based on the availability of data from time-synchronized phasor measurement units (PMUs). Bayesian optimization is used to search for the best hyperparameters of the proposed DNN. The effectiveness of the proposed event classification is validated on a real-world dataset from U.S. transmission grids. This dataset includes line outage, transformer outage, frequency, and oscillation events. The validation process also considers different PMU outputs, such as voltage magnitude, angle, current magnitude, frequency, and rate of change of frequency (ROCOF). The simulation results show that ROCOF as an input feature gives the best classification performance. In addition, it is shown that a classifier trained with data from higher-sampling-rate PMUs and a larger dataset achieves higher accuracy.
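ROCOF is the time derivative of frequency. PMUs typically report it directly, but a simple finite-difference estimate from a frequency stream looks like the Python sketch below; the reporting rate `fs_hz` and the first-difference scheme are assumptions for illustration, and real devices usually filter this signal.

```python
def rocof(freq_hz, fs_hz):
    """Rate of change of frequency (Hz/s) by first differences.

    freq_hz: frequency samples from one PMU, fs_hz: reporting rate in samples/s.
    Returns one value per pair of consecutive samples (length n - 1).
    """
    return [(b - a) * fs_hz for a, b in zip(freq_hz, freq_hz[1:])]
```

For example, at 30 samples/s, a drop from 60.00 Hz to 59.99 Hz between consecutive reports corresponds to about -0.3 Hz/s.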

preprint · 2020 · arXiv

Device-Circuit-Architecture Co-Exploration for Computing-in-Memory Neural Accelerators

Co-exploration of neural architectures and hardware design is promising for simultaneously optimizing network accuracy and hardware efficiency. However, state-of-the-art neural architecture search algorithms for co-exploration are designed for the conventional von Neumann computing architecture, whose performance is heavily limited by the well-known memory wall. In this paper, we are the first to bring the computing-in-memory architecture, which can easily transcend the memory wall, to interplay with neural architecture search, aiming to find the most efficient neural architectures with high network accuracy and maximized hardware efficiency. Such a novel combination creates opportunities to boost performance, but also brings several challenges. The design space spans multiple layers, from device type and circuit topology to neural architecture. In addition, performance may degrade in the presence of device variation. To address these challenges, we propose a cross-layer exploration framework, namely NACIM, which jointly explores the device, circuit and architecture design space and takes device variation into consideration to find the most robust neural architectures. Experimental results demonstrate that NACIM can find a robust neural network with 0.45% accuracy loss in the presence of device variation, compared with a 76.44% loss from the state-of-the-art NAS without consideration of variation; in addition, NACIM achieves an energy efficiency of up to 16.3 TOPs/W, 3.17X higher than the state-of-the-art NAS.

preprint · 2020 · arXiv

Event Cause Analysis in Distribution Networks using Synchro Waveform Measurements

This paper presents a machine learning method for event cause analysis to enhance situational awareness in distribution networks. The data streams are captured by time-synchronized, high-sampling-rate synchro waveform measurement units (SWMUs). The proposed method is based on a convolutional neural network (CNN), which effectively captures the spatiotemporal features of the measurements and performs the event cause analysis. Several event types are considered to cover a range of possible events in real distribution networks, including capacitor bank switching, transformer energization, faults, and high impedance faults (HIFs). The dataset for our study is generated with a real-time digital simulator (RTDS) to emulate real-world events. The event cause analysis uses only one cycle of the voltage waveforms after the event is detected. The simulation results show the effectiveness of the proposed method compared with state-of-the-art classifiers.

preprint · 2020 · arXiv

Feasibility and physics potential of detecting $^8$B solar neutrinos at JUNO

The Jiangmen Underground Neutrino Observatory (JUNO) features a 20 kt multi-purpose underground liquid scintillator sphere as its main detector. Several of JUNO's features make it an excellent experiment for $^8$B solar neutrino measurements, such as its low energy threshold, its high energy resolution compared with water Cherenkov detectors, and its much larger target mass compared with previous liquid scintillator detectors. In this paper we present a comprehensive assessment of JUNO's potential for detecting $^8$B solar neutrinos via the neutrino-electron elastic scattering process. A reduced 2 MeV threshold on the recoil electron energy is found to be achievable, assuming the intrinsic radioactive backgrounds from $^{238}$U and $^{232}$Th in the liquid scintillator can be controlled to $10^{-17}$ g/g. With ten years of data taking, about 60,000 signal and 30,000 background events are expected. This large sample will enable an examination of the distortion of the recoil electron spectrum that is dominated by the neutrino flavor transformation in the dense solar matter, which will shed new light on the tension between the measured electron spectra and the predictions of the standard three-flavor neutrino oscillation framework. If $Δm^{2}_{21}=4.8\times10^{-5}~(7.5\times10^{-5})$ eV$^{2}$, JUNO can provide evidence of neutrino oscillation in the Earth at about the $3σ$ ($2σ$) level by measuring the non-zero variation of the signal rate with respect to the solar zenith angle. Moreover, JUNO can simultaneously measure $Δm^2_{21}$ to a precision of 20% or better using $^8$B solar neutrinos, depending on the central value, and to sub-percent precision using reactor antineutrinos. A comparison of these two measurements from the same detector will help elucidate the current tension between the value of $Δm^2_{21}$ reported by solar neutrino experiments and that reported by the KamLAND experiment.

preprint · 2020 · arXiv

Hardware/Software Co-Exploration of Neural Architectures

We propose a novel hardware and software co-exploration framework for efficient neural architecture search (NAS). Unlike existing hardware-aware NAS, which assumes a fixed hardware design and explores only the neural architecture search space, our framework simultaneously explores both the architecture search space and the hardware design space to identify the neural architecture and hardware pairs that maximize both test accuracy and hardware efficiency. This greatly expands design freedom and pushes forward the Pareto frontier between hardware efficiency and test accuracy, enabling better design tradeoffs. The framework iteratively performs a two-level (fast and slow) exploration. Without lengthy training, the fast exploration can effectively fine-tune hyperparameters and prune architectures that are inferior in terms of hardware specifications, which significantly accelerates the NAS process. The slow exploration then trains candidates on a validation set and updates a controller using reinforcement learning to maximize the expected accuracy together with hardware efficiency. Experiments on ImageNet show that our co-exploration NAS can find neural architectures and associated hardware designs with the same accuracy, 35.24% higher throughput, 54.05% higher energy efficiency, and 136x reduced search time, compared with the state-of-the-art hardware-aware NAS.

preprint · 2020 · arXiv

High Throughput Cryptocurrency Routing in Payment Channel Networks

Despite the growing adoption of cryptocurrencies, making fast payments at scale remains a challenge. Payment channel networks (PCNs) such as the Lightning Network have emerged as a viable scaling solution. However, completing payments on PCNs is challenging: payments must be routed on paths with sufficient funds. As payments flow over a single channel (link) in the same direction, the channel eventually becomes depleted and cannot support further payments in that direction; hence, naive routing schemes like shortest-path routing can deplete key payment channels and paralyze the system. Today's PCNs also route payments atomically, worsening the problem. In this paper, we present Spider, a routing solution that "packetizes" transactions and uses a multi-path transport protocol to achieve high-throughput routing in PCNs. Packetization allows Spider to complete even large transactions on low-capacity payment channels over time, while the multi-path congestion control protocol ensures balanced utilization of channels and fairness across flows. Extensive simulations comparing Spider with state-of-the-art approaches show that Spider requires less than 25% of the funds to successfully route over 95% of transactions on balanced traffic demands, and offloads 4x more transactions onto the PCN on imbalanced demands.
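To illustrate the packetization idea, here is a minimal Python sketch, assuming a hypothetical maximum packet size `mtu` and a simple greedy rule that sends each packet on the path with the most remaining funds. It is not Spider's multi-path congestion control protocol; packets that fit on no path are simply returned as pending, standing in for the complete-over-time behavior described above.

```python
from typing import Dict, List, Tuple


def packetize(amount: int, mtu: int) -> List[int]:
    """Split a payment into units of at most `mtu` (a hypothetical max packet size)."""
    if amount <= 0 or mtu <= 0:
        raise ValueError("amount and mtu must be positive")
    full, rest = divmod(amount, mtu)
    return [mtu] * full + ([rest] if rest else [])


def route_packets(packets: List[int],
                  path_capacity: Dict[str, int]) -> Tuple[Dict[str, List[int]], List[int]]:
    """Greedily send each packet on the path with the most remaining funds.

    Returns the packets sent per path and the packets that fit on no path
    right now (to be retried later, e.g. after channels are replenished).
    """
    remaining = dict(path_capacity)
    sent: Dict[str, List[int]] = {path: [] for path in remaining}
    pending: List[int] = []
    for pkt in packets:
        if not remaining:
            pending.append(pkt)
            continue
        best = max(remaining, key=lambda path: remaining[path])
        if remaining[best] >= pkt:
            remaining[best] -= pkt  # funds on this path are now locked by the packet
            sent[best].append(pkt)
        else:
            pending.append(pkt)  # no path can carry it now; retry later
    return sent, pending


packets = packetize(25, 10)  # [10, 10, 5]
sent, pending = route_packets(packets, {"A": 12, "B": 8})
print(sent, pending)  # {'A': [10], 'B': [5]} [10]
```

Splitting lets part of a large payment make progress even when no single path could carry it atomically, which is the property the abstract attributes to packetization.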

preprint · 2020 · arXiv

Learn to Propagate Reliably on Noisy Affinity Graphs

Recent works have shown that exploiting unlabeled data through label propagation can substantially reduce the labeling cost, which has been a critical issue in developing visual recognition models. Yet, how to propagate labels reliably, especially on a dataset with unknown outliers, remains an open question. Conventional methods such as linear diffusion lack the capability to handle complex graph structures and may perform poorly when the seeds are sparse. Recent methods based on graph neural networks suffer from performance drops as they scale out to noisy graphs. To overcome these difficulties, we propose a new framework that allows labels to be propagated reliably on large-scale real-world data. This framework incorporates (1) a local graph neural network that predicts accurately on varying local structures while maintaining high scalability, and (2) a confidence-based path scheduler that identifies outliers and advances the propagation frontier prudently. Experiments on both ImageNet and Ms-Celeb-1M show that our confidence-guided framework can significantly improve the overall accuracy of the propagated labels, especially when the graph is very noisy.

preprint · 2020 · arXiv

Learning to Cluster Faces via Confidence and Connectivity Estimation

Face clustering is an essential tool for exploiting unlabeled face data and has a wide range of applications, including face annotation and retrieval. Recent works show that supervised clustering can result in noticeable performance gains. However, they usually involve heuristic steps and require numerous overlapping subgraphs, severely restricting their accuracy and efficiency. In this paper, we propose a fully learnable clustering framework that does not require a large number of overlapping subgraphs. Instead, we transform the clustering problem into two sub-problems. Specifically, two graph convolutional networks, named GCN-V and GCN-E, are designed to estimate the confidence of vertices and the connectivity of edges, respectively. With the vertex confidence and edge connectivity, we can naturally organize more relevant vertices on the affinity graph and group them into clusters. Experiments on two large-scale benchmarks show that our method significantly improves clustering accuracy, and thus the performance of recognition models trained on top of it, while being an order of magnitude more efficient than existing supervised methods.
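One way vertex confidence and edge connectivity could be combined into clusters is sketched below in Python, under stated assumptions: each vertex links to its most strongly connected neighbor among those with higher confidence, links at or below a hypothetical threshold `tau` are dropped, and linked vertices are merged with union-find. The scores are taken as given (in the paper they come from GCN-V and GCN-E); the rule, names, and threshold are illustrative, not a transcription of the authors' procedure.

```python
from typing import Dict, List, Tuple


def group_by_confidence(conf: List[float],
                        conn: Dict[Tuple[int, int], float],
                        tau: float = 0.5) -> List[int]:
    """Assign a cluster label to each vertex.

    conf[i]: estimated confidence of vertex i (assumed given).
    conn[(i, j)]: estimated connectivity of undirected edge i-j (assumed given).
    Each vertex links to its most-connected neighbor of strictly higher
    confidence if that connectivity exceeds `tau`; linked vertices share a label.
    """
    n = len(conf)
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # For every vertex, keep the best (connectivity, neighbor) with higher confidence.
    best: Dict[int, Tuple[float, int]] = {}
    for (i, j), c in conn.items():
        for u, v in ((i, j), (j, i)):
            if conf[v] > conf[u] and c > tau and (u not in best or c > best[u][0]):
                best[u] = (c, v)
    for u, (_, v) in best.items():
        parent[find(u)] = find(v)
    # Relabel union-find roots as consecutive cluster ids.
    labels: Dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


conf = [0.9, 0.5, 0.4, 0.8, 0.3]
conn = {(0, 1): 0.9, (1, 2): 0.8, (3, 4): 0.7, (2, 3): 0.2}
print(group_by_confidence(conf, conn))  # [0, 0, 0, 1, 1]
```

Because every link points to a strictly more confident vertex, the links cannot form cycles, so each cluster is a tree rooted at a locally most confident vertex.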

preprint · 2020 · arXiv

Mapping in a cycle: Sinkhorn regularized unsupervised learning for point cloud shapes

We propose an unsupervised learning framework with the pretext task of finding dense correspondences between point cloud shapes from the same category based on a cycle-consistency formulation. To learn discriminative pointwise features from point cloud data, we incorporate into the formulation a regularization term based on Sinkhorn normalization that encourages the learned pointwise mappings to be as bijective as possible. In addition, a random rigid transform of the source shape is introduced to form a triplet cycle that improves the model's robustness against perturbations. Comprehensive experiments demonstrate that the pointwise features learned through our framework benefit various point cloud analysis tasks, e.g., partial shape registration and keypoint transfer. We also show that the learned pointwise features can be leveraged by supervised methods to improve part segmentation performance with either the full training dataset or just a small portion of it.
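Sinkhorn normalization alternately normalizes the rows and columns of a positive matrix until it is approximately doubly stochastic, the continuous relaxation of a bijective (permutation) matching. The pure-Python sketch below shows only that normalization step on an assumed square score matrix; the temperature, iteration count, and function name are illustrative, and the paper's actual regularization term built on it is not reproduced here.

```python
import math
from typing import List


def sinkhorn(scores: List[List[float]], temperature: float = 1.0,
             n_iters: int = 50) -> List[List[float]]:
    """Map a square score matrix to an approximately doubly stochastic matrix."""
    # Exponentiate (shifted by the max score for numerical stability) so all entries are positive.
    top = max(max(row) for row in scores)
    p = [[math.exp((s - top) / temperature) for s in row] for row in scores]
    for _ in range(n_iters):
        # Normalize each row to sum to 1.
        p = [[v / sum(row) for v in row] for row in p]
        # Normalize each column to sum to 1.
        col_sums = [sum(col) for col in zip(*p)]
        p = [[v / col_sums[j] for j, v in enumerate(row)] for row in p]
    return p


soft_match = sinkhorn([[2.0, 0.5, 0.1],
                       [0.3, 1.5, 0.2],
                       [0.1, 0.4, 1.8]], n_iters=100)
print([round(sum(r), 4) for r in soft_match])  # row sums close to 1
```

Lowering `temperature` pushes the result toward a hard permutation matrix, while a higher value keeps the matching softer and easier to optimize.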

preprint · 2020 · arXiv

Resource Allocation for Secure Multi-UAV Communication Systems with Multi-Eavesdropper

In this paper, we study resource allocation and trajectory design for secure unmanned aerial vehicle (UAV)-enabled communication systems, where multiple multi-purpose UAV base stations are dispatched to provide secure communications to multiple legitimate ground users (GUs) in the presence of multiple eavesdroppers (Eves). Specifically, by leveraging orthogonal frequency division multiple access (OFDMA), active UAV base stations can communicate with their intended GUs via the assigned subcarriers, while idle UAV base stations simultaneously serve as jammers to provide communication security. To achieve fairness in secure communication, we maximize the average minimum secrecy rate per user by jointly optimizing the communication/jamming subcarrier allocation policy and the UAV trajectories, while taking into account constraints on the minimum safety distance among UAVs, the maximum cruising speed, the initial/final locations, and cylindrical no-fly zones (NFZs). The design is formulated as a mixed-integer non-convex optimization problem, which is generally intractable. A computationally efficient iterative algorithm is then proposed to obtain a suboptimal solution. Simulation results show that the proposed iterative algorithm significantly improves the average minimum secrecy rate compared with various baseline schemes.
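For reference, a commonly used definition of per-user secrecy rate is the positive part of the legitimate link's rate minus the rate of the strongest eavesdropper. The Python sketch below applies that textbook definition and then takes the minimum over users of the time-averaged rate. Treating linear SNRs as given inputs (with jamming reflected only as lower Eve SNRs) and omitting the per-subcarrier sum are simplifying assumptions, not the paper's exact system model; the function names are hypothetical.

```python
import math
from typing import List


def secrecy_rate(snr_user: float, snr_eves: List[float]) -> float:
    """[log2(1 + SNR_user) - max_e log2(1 + SNR_e)]^+ in bit/s/Hz (linear SNRs)."""
    leakage = max((math.log2(1.0 + s) for s in snr_eves), default=0.0)
    return max(0.0, math.log2(1.0 + snr_user) - leakage)


def min_avg_secrecy(snr_users: List[List[float]],
                    snr_eves: List[List[List[float]]]) -> float:
    """Minimum over users of the time-averaged secrecy rate.

    snr_users[k][t]: SNR of user k in slot t.
    snr_eves[k][t]:  SNRs of all Eves overhearing user k's signal in slot t.
    """
    averages = []
    for user_snrs, eve_snrs in zip(snr_users, snr_eves):
        rates = [secrecy_rate(su, se) for su, se in zip(user_snrs, eve_snrs)]
        averages.append(sum(rates) / len(rates))
    return min(averages)


print(secrecy_rate(15.0, [3.0, 1.0]))  # 2.0  (4 - 2 bit/s/Hz)
print(min_avg_secrecy([[15.0, 3.0], [7.0, 7.0]],
                      [[[3.0], [1.0]], [[1.0], [0.0]]]))  # 1.5
```

Maximizing this min-average objective, rather than the sum, is what enforces fairness: the worst-served user determines the value.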

preprint · 2020 · arXiv

SemanticAdv: Generating Adversarial Examples via Attribute-conditional Image Editing

Deep neural networks (DNNs) have achieved great success in various applications due to their strong expressive power. However, recent studies have shown that DNNs are vulnerable to adversarial examples, which are manipulated instances designed to mislead DNNs into making incorrect predictions. Currently, most such adversarial examples try to guarantee "subtle perturbation" by limiting the $L_p$ norm of the perturbation. In this paper, we explore the impact of semantic manipulation on DNN predictions by manipulating the semantic attributes of images to generate "unrestricted adversarial examples". In particular, we propose an algorithm, SemanticAdv, which leverages disentangled semantic factors to generate adversarial perturbations by altering controlled semantic attributes, fooling the learner toward various "adversarial" targets. We conduct extensive experiments showing that the semantic-based adversarial examples can not only fool different learning tasks, such as face verification and landmark detection, but also achieve a high targeted attack success rate against real-world black-box services, such as the Azure face verification service, based on transferability. To further demonstrate the applicability of SemanticAdv beyond the face recognition domain, we also generate semantic perturbations on street-view images. Such adversarial examples with controlled semantic manipulation can shed light on the vulnerabilities of DNNs as well as on potential defensive approaches.

preprint · 2020 · arXiv

Standing on the Shoulders of Giants: Hardware and Neural Architecture Co-Search with Hot Start

Hardware and neural architecture co-search that automatically generates artificial intelligence (AI) solutions from a given dataset is promising for promoting AI democratization; however, current co-search frameworks require time on the order of hundreds of GPU hours for one target hardware. This inhibits the use of such frameworks on commodity hardware. The root cause of the low efficiency of existing co-search frameworks is that they start from a "cold" state (i.e., they search from scratch). In this paper, we propose a novel framework, named HotNAS, that starts from a "hot" state based on a set of existing pre-trained models (a.k.a. a model zoo) to avoid lengthy training time. As a result, the search time can be reduced from 200 GPU hours to less than 3 GPU hours. In HotNAS, in addition to the hardware design space and the neural architecture search space, we further integrate a compression space to perform model compression during the co-search, which creates new opportunities to reduce latency but also brings challenges. One of the key challenges is that all of these search spaces are coupled with each other; e.g., compression may not work without hardware design support. To tackle this issue, HotNAS builds a chain of tools to design hardware that supports compression, on top of which a global optimizer is developed to automatically co-search all the involved search spaces. Experiments on the ImageNet dataset and a Xilinx FPGA show that, within a timing constraint of 5 ms, neural architectures generated by HotNAS can achieve up to 5.79% Top-1 and 3.97% Top-5 accuracy gains compared with existing ones.

preprint · 2020 · arXiv

Study of Quasi-two-body $B_{(s)}\to ϕ(f_0(980)/f_2(1270)\to)ππ$ Decays in Perturbative QCD Approach

In 2017, the LHCb collaboration reported the first observation of the rare decays $B_s \to ϕ(f_0(980)/f_2(1270)\to)π^+π^-$ and evidence for $B^0 \to ϕ(f_0(980)/f_2(1270)\to)π^+π^-$. Motivated by this, we study these quasi-two-body decays in the perturbative QCD approach. The branching fractions, $CP$ asymmetries, and polarization fractions are calculated. We find that with appropriate two-meson wave functions, the calculated branching fractions are in agreement with the LHCb measurements. Based on the narrow-width approximation, we also calculate the branching fractions of the quasi-two-body decays $B_{d,s}\to ϕ(f_0(980)/f_2(1270)\to) π^0π^0$ and $B_{d,s}\to ϕ(f_2(1270)\to) K^+K^-$, and hope that these predictions will be tested in the ongoing LHCb and Belle II experiments. Moreover, the processes $B_{d,s}\to ϕf_2(1270)$ are also analyzed under this approximation. We note that the $CP$ asymmetries of these decays are very small, because these decays are either penguin-dominated or pure penguin processes.

preprint · 2020 · arXiv

TAO Conceptual Design Report: A Precision Measurement of the Reactor Antineutrino Spectrum with Sub-percent Energy Resolution

The Taishan Antineutrino Observatory (TAO, also known as JUNO-TAO) is a satellite experiment of the Jiangmen Underground Neutrino Observatory (JUNO). A ton-level liquid scintillator detector will be placed about 30 m from a core of the Taishan Nuclear Power Plant. The reactor antineutrino spectrum will be measured with sub-percent energy resolution, to provide a reference spectrum for future reactor neutrino experiments and a benchmark measurement to test nuclear databases. A spherical acrylic vessel containing 2.8 tons of gadolinium-doped liquid scintillator will be viewed by 10 m$^2$ of silicon photomultipliers (SiPMs) with >50% photon detection efficiency and almost full coverage. The photoelectron yield is about 4500 per MeV, an order of magnitude higher than that of any existing large-scale liquid scintillator detector. The detector operates at -50 °C to lower the dark noise of the SiPMs to an acceptable level. The detector will measure about 2000 reactor antineutrinos per day and is designed to be well shielded from cosmogenic backgrounds and ambient radioactivity, for a background-to-signal ratio of about 10%. The experiment is expected to start operation in 2022.

preprint · 2020 · arXiv

Topology Virtualization and Dynamics Shielding Method for LEO Satellite Networks

The virtual node (VN) method is widely adopted to handle the topological dynamics of satellite networks. However, the conventional VN method is insufficient when Earth rotation and inter-plane phase difference are considered. An improved VN method based on Celestial Sphere Division is proposed to overcome the defects of the conventional method. An optimized inter-satellite link connecting mode is derived to maximize the number of available links. The optimal VN division solution and addressing scheme are designed to generate a nearly static virtual network and to resolve the asynchronous switches caused by inter-plane phase difference. Comparison results demonstrate the advantages of the proposed method.

preprint · 2019 · arXiv

Calculation of the $B\to K_{0,2}^*(1430)f_0(980)/σ$ decays in the Perturbative QCD Approach

Motivated by the observations of the decays $B^0 \to K_0^{*}(1430)^0 f_0(980)$ and $B^0 \to K_2^{*}(1430)^0 f_0(980)$ by the BaBar collaboration, we study the $B^{0(+)} \to K_{0,2}^{*}(1430)^{0(+)} f_0(980)/σ$ decays in the perturbative QCD approach for the first time. In the absence of reliable nonperturbative wave functions, we simply assume that the scalar mesons $f_0(980)$ and $σ$ are two-quark ground states. In our calculations, these decays are all dominated by the hard-scattering emission and annihilation diagrams, while the factorizable emission diagrams are forbidden or heavily suppressed by the vector decay constants. Furthermore, the branching fractions are sensitive to the mixing between $f_0(980)$ and $σ$. Comparing our results with the experimental data, a large mixing angle $θ$ is favored. Taking $θ=145^\circ$, the branching fractions of $B \to K_0^{*}(1430)^0 σ$, $B \to K_{2}^{*}(1430)^0 σ$ and $B \to K_{0,2}^{*}(1430)^0 f_0(980)$ are predicted to be of order $10^{-4}$, $10^{-5}$ and $10^{-6}$, respectively, which can be measured in current experiments such as LHCb and Belle II. In addition, although these decays are penguin-dominated, the mixing also leads to large direct $CP$ asymmetries in these decays. With more precise data in the future, our results could shed light on the inner structure of the scalar mesons and can be used to determine the mixing angle of the $σ$-$f_0(980)$ system.

preprint · 2019 · arXiv

Toward accurate measurement of property-dependent galaxy clustering I. Comparison of the Vmax method and the "shuffled" method

Galaxy clustering provides insightful clues to our understanding of galaxy formation and evolution, as well as of the universe. The redshift assignment for the random sample is one of the key steps in measuring galaxy clustering accurately. In this paper, using mock galaxy catalogs, we investigate the effect of two redshift assignment methods, the Vmax method and the "shuffled" method, on the measurement of galaxy two-point correlation functions (hereafter 2PCFs). We find that the shuffled method significantly underestimates both the projected 2PCFs and the two-dimensional 2PCFs in redshift space, whereas the Vmax method does not show any notable bias in the 2PCFs for volume-limited samples. For flux-limited samples, the bias produced by the Vmax method is less than half that of the shuffled method on large scales. Therefore, we strongly recommend the Vmax method for assigning redshifts to random samples in future galaxy clustering analyses.
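The difference between the two redshift assignment methods can be illustrated with a simplified Python sketch. It works with distances instead of redshifts and assumes a Euclidean, low-redshift flux limit ($m = M + 5\log_{10} d + \text{const}$, with no k-corrections or evolution), so it is a toy model rather than the paper's pipeline: the shuffled method draws random-point distances from the observed galaxies, while the Vmax-style method clones each galaxy uniformly in volume out to the largest distance at which it would still pass the flux limit. All function and parameter names are hypothetical.

```python
import random
from typing import List


def shuffled_distances(gal_dist: List[float], n_random: int,
                       rng: random.Random) -> List[float]:
    """'Shuffled' method: random points reuse distances drawn from the galaxy sample."""
    return [rng.choice(gal_dist) for _ in range(n_random)]


def vmax_distances(gal_dist: List[float], gal_mag: List[float], m_lim: float,
                   d_min: float, d_survey: float, clones: int,
                   rng: random.Random) -> List[float]:
    """Vmax-style method (toy Euclidean version): clone each galaxy and place the
    clones uniformly in volume between d_min and the galaxy's own d_max."""
    out: List[float] = []
    for d, m in zip(gal_dist, gal_mag):
        # From m = M + 5 log10(d) + const: d_max = d * 10**(0.2 * (m_lim - m)),
        # capped at the survey's outer distance.
        d_max = min(d * 10 ** (0.2 * (m_lim - m)), d_survey)
        for _ in range(clones):
            u = rng.random()
            # Uniform in volume: r^3 uniform between d_min^3 and d_max^3.
            out.append((d_min ** 3 + u * (d_max ** 3 - d_min ** 3)) ** (1.0 / 3.0))
    return out


rng = random.Random(42)
randoms_shuffled = shuffled_distances([100.0, 200.0], 1000, rng)
randoms_vmax = vmax_distances([100.0, 200.0], [17.0, 17.5], 17.7, 10.0, 300.0, 500, rng)
```

The shuffled method can only reproduce distances where galaxies were actually observed, so any real clustering along the line of sight leaks into the random sample; the Vmax-style clones spread each galaxy smoothly over the volume in which it could have been seen.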

preprint · 2016 · arXiv

Equidistribution of curves in homogeneous spaces and Dirichlet's approximation theorem for matrices

In this paper, we study an analytic curve $φ: I=[a,b]\rightarrow \mathrm{M}(m\times n, \mathbb{R})$ in the space of $m$ by $n$ real matrices, and show that if $φ$ satisfies a certain geometric condition, then for almost every point on the curve, the Diophantine approximation given by Dirichlet's theorem cannot be improved. To do this, we embed the curve into a homogeneous space $G/Γ$, and prove that under the action of an expanding diagonal subgroup $A= \{a(t): t \in \mathbb{R}\}$, the translates of the curve become equidistributed in $G/Γ$ as $t \rightarrow +\infty$.
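For reference, the standard form of Dirichlet's theorem for matrices is stated below, with $\|\cdot\|$ taken as the sup norm (a common convention, assumed here). In this language, the result above says that for almost every point $Y=φ(s)$ on the curve there is no fixed $\varepsilon\in(0,1)$ for which the tightened system in the second display is solvable for all sufficiently large $N$.

```latex
\text{For every } Y \in \mathrm{M}(m\times n,\mathbb{R}) \text{ and every } N \ge 1,\
\exists\, \mathbf{p}\in\mathbb{Z}^m,\ \mathbf{q}\in\mathbb{Z}^n\setminus\{\mathbf{0}\}:
\quad \|Y\mathbf{q}-\mathbf{p}\| \le N^{-n/m}, \qquad \|\mathbf{q}\| \le N.

Y \text{ is Dirichlet-improvable if, for some } \varepsilon \in (0,1),\quad
\|Y\mathbf{q}-\mathbf{p}\| \le \varepsilon N^{-n/m}, \qquad 0<\|\mathbf{q}\| \le N
\quad \text{is solvable for all sufficiently large } N.
```

The first statement follows from Minkowski's convex body theorem applied to the box $\{\|Y\mathbf{q}-\mathbf{p}\|\le N^{-n/m},\ \|\mathbf{q}\|\le N\}$ in $\mathbb{R}^{m+n}$, whose volume is exactly $2^{m+n}$.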