Source author record

Wei Dai

Wei Dai appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

63 works

39 topics

4 close collaborators

Preprint · 2026 · arXiv

A Control Theoretic Approach to Decentralized AI Economy Stabilization via Dynamic Buyback-and-Burn Mechanisms

The democratization of artificial intelligence through decentralized networks represents a paradigm shift in computational provisioning, yet the long-term viability of these ecosystems is critically endangered by the extreme volatility of their native economic layers. Current tokenomic models, which predominantly rely on static or threshold-based buyback heuristics, are ill-equipped to handle complex system dynamics and often function pro-cyclically, exacerbating instability during market downturns. To bridge this gap, we propose the Dynamic-Control Buyback Mechanism (DCBM), a formalized control-theoretic framework that utilizes a Proportional-Integral-Derivative (PID) controller with strict solvency constraints to regulate the token economy as a dynamical system. Extensive agent-based simulations utilizing Jump-Diffusion processes demonstrate that DCBM fundamentally outperforms static baselines, reducing token price volatility by approximately 66% and lowering operator churn from 19.5% to 8.1% in high-volatility regimes. These findings establish that converting tokenomics from static rules into continuous, structurally constrained control loops is a necessary condition for secure and sustainable decentralized intelligence networks.
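
A minimal sketch of the kind of control loop the abstract describes is shown below. It is a discrete PID controller that turns the gap between a target token price and the observed price into a buyback budget. The budget is clipped so that it is never negative and never exceeds a fixed fraction of the treasury, which stands in for a solvency constraint. The gains, the target price, the `max_fraction` cap and the class name `BuybackPID` are hypothetical assumptions, and this is not the DCBM implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BuybackPID:
    """Hypothetical PID controller mapping price deviation to a buyback budget."""

    kp: float
    ki: float
    kd: float
    target_price: float
    max_fraction: float = 0.1  # assumed solvency cap: spend at most 10% of treasury per step
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, price: float, treasury: float, dt: float = 1.0) -> float:
        """Return the buyback amount for one control period."""
        # Positive error: price is below target, so buying (and burning) tokens is warranted.
        error = self.target_price - price
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        raw = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Solvency constraint: clamp negative outputs to zero and cap spending.
        return min(max(raw, 0.0), self.max_fraction * treasury)


# Illustrative usage with made-up gains and prices.
ctrl = BuybackPID(kp=50.0, ki=5.0, kd=10.0, target_price=1.0)
budgets = [ctrl.step(p, treasury=1_000.0) for p in (0.95, 0.90, 0.97, 1.02)]
```

Unlike a static threshold rule, the integral and derivative terms make the budget depend on how long the price has been below target and how fast it is moving.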

Preprint · 2026 · arXiv

CausalSynth: Generating Structurally Sound Synthetic Data

Large Language Models (LLMs) generate realistic synthetic data but offer no guarantee that their outputs respect the causal mechanisms governing the target domain. We introduce CausalSynth, a framework that decouples causal structure generation from semantic realization, yielding synthetic data that is both causally valid and linguistically rich. The framework operates in three phases. First, a Structural Causal Model (SCM), a tuple of structural equations defined over a directed acyclic graph (DAG), generates causal skeletons via ancestral sampling, i.e., variable assignments that satisfy the Global Markov Property of the governing DAG. Second, an LLM acts as a constrained realizer, a conditional translator that maps each skeleton to a high-dimensional observation such as a clinical note or a transaction log. Third, an Iterative Consistency Verification module detects structural violations through deterministic extraction and feeds targeted corrections back to the LLM, forming a closed-loop refinement process. We identify the Semantic Backdoor problem, the systematic tendency of LLMs to override imposed causal facts with pre-training priors, and prove that our iterative mechanism reduces the resulting selection bias relative to standard rejection sampling. On three causal benchmarks (ASIA, ALARM, and MIMIC-Struct), CausalSynth preserved conditional independencies with false-positive rates near the nominal $α=0.05$ level and achieved realizability rates above 96% with 70B-parameter LLM backbones. The framework additionally supports principled interventional and counterfactual generation through noise retention and graph mutilation.
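
The first phase, ancestral sampling from an SCM, and the graph mutilation used for interventions can be sketched in a few lines. Variables are visited in a topological order of the DAG, and each structural equation is evaluated on its parents' values plus a fresh exogenous noise draw. An intervention replaces a variable's equation with a constant and drops its parents. The dictionary-based SCM encoding, the Gaussian noise and the toy smoking/cancer/xray equations are illustrative assumptions. The LLM realizer and the verification loop are not shown.

```python
import random
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

# Hypothetical SCM encoding: variable -> (parent names, structural equation).
# Each equation maps the parents' values and one exogenous noise draw to a value.
Equation = Callable[[Dict[str, float], float], float]
SCM = Dict[str, Tuple[List[str], Equation]]


def ancestral_sample(scm: SCM, rng: random.Random) -> Dict[str, float]:
    """Draw one causal skeleton by evaluating equations in topological order."""
    graph = {var: set(parents) for var, (parents, _) in scm.items()}
    values: Dict[str, float] = {}
    for var in TopologicalSorter(graph).static_order():  # parents come first
        parents, equation = scm[var]
        noise = rng.gauss(0.0, 1.0)  # assumed standard-normal exogenous noise
        values[var] = equation({p: values[p] for p in parents}, noise)
    return values


def intervene(scm: SCM, var: str, value: float) -> SCM:
    """Graph mutilation for do(var = value): drop var's parents and fix its value."""
    mutilated = dict(scm)
    mutilated[var] = ([], lambda _parents, _noise: value)
    return mutilated


# Toy binary chain smoking -> cancer -> xray (illustrative equations only).
toy: SCM = {
    "xray": (["cancer"], lambda pa, u: float(pa["cancer"] + 0.3 * u > 0.5)),
    "cancer": (["smoking"], lambda pa, u: float(0.8 * pa["smoking"] + 0.5 * u > 0.6)),
    "smoking": ([], lambda pa, u: float(u > 0.0)),
}
rng = random.Random(0)
skeleton = ancestral_sample(toy, rng)
forced = ancestral_sample(intervene(toy, "smoking", 1.0), rng)
```

Because every value is computed only from its parents and independent noise, samples produced this way satisfy the Markov property of the DAG by construction.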

Preprint · 2026 · arXiv

Differentially Private Motif-Preserving Multi-modal Hashing

Cross-modal hashing enables efficient retrieval by encoding images and text into compact binary codes. State-of-the-art methods rely on semantic similarity graphs derived from user interactions for supervision, yet these graphs encode sensitive behavioral patterns vulnerable to link reconstruction attacks. Existing privacy-preserving approaches fail on graph-structured data. Differentially Private SGD destroys relational motifs by treating samples independently. Graph synthesis methods suffer from unbounded local sensitivity in scale-free networks, where hub nodes allow a single-edge modification to alter triangle counts by $\mathcal{O}(N)$, which forces prohibitive noise injection. We term this phenomenon Hubness Explosion. We propose DMP-MH, a Sanitize-then-Distill framework that decouples privacy from representation learning. Our approach first bounds sensitivity by deterministically clipping node degrees, which caps the $L_2$-sensitivity of triangle motifs independently of dataset size. A sanitized synthetic graph is then generated via Noisy Mirror Descent under $(ε,δ)$-Edge Differential Privacy. Finally, dual-stream hashing networks distill this topology using a holistic structural loss that enforces cross-modal alignment. Evaluated on MIRFlickr-25K and NUS-WIDE under a strict inductive protocol, DMP-MH outperforms private baselines by up to 11.4 mAP points while retaining up to 92.5% of non-private performance.
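
The following sketch shows why degree clipping tames hub-driven triangle counts. It assumes a simple greedy rule: edges are processed in a fixed sorted order, and an edge is kept only if both endpoints are still below `max_degree`. After clipping to maximum degree D, any single edge lies in at most D-1 triangles. Removing one edge from the clipped graph therefore changes its triangle count by at most D-1. In the unclipped graph, by contrast, a hub edge can sit in O(N) triangles. The rule, the function names and the toy hub graph are illustrative assumptions. Because this greedy rule is order-dependent, it does not by itself reproduce the paper's sensitivity bound.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]


def clip_degrees(edges: Iterable[Edge], max_degree: int) -> List[Edge]:
    """Keep an edge only if both endpoints are still below max_degree.

    Edges are normalised to (min, max), deduplicated, and processed in sorted
    order, so the output is deterministic for a given input graph.
    """
    degree: Dict[int, int] = {}
    kept: List[Edge] = []
    for u, v in sorted({tuple(sorted(e)) for e in edges}):
        if u == v:
            continue  # ignore self-loops
        if degree.get(u, 0) < max_degree and degree.get(v, 0) < max_degree:
            kept.append((u, v))
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
    return kept


def count_triangles(edges: Iterable[Edge]) -> int:
    """Count each triangle exactly once via ordered vertex triples u < v < w."""
    adj: Dict[int, Set[int]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return sum(
        1
        for u in adj
        for v, w in combinations(sorted(n for n in adj[u] if n > u), 2)
        if w in adj[v]
    )


# Toy scale-free-like graph: hub 0 linked to 10 leaves that form a path.
hub_graph = [(0, i) for i in range(1, 11)] + [(i, i + 1) for i in range(1, 10)]
clipped = clip_degrees(hub_graph, max_degree=3)
```

On this toy graph every path edge forms a triangle with the hub. After clipping, the hub keeps only three edges, so only a couple of triangles survive.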

Preprint · 2026 · arXiv

MathDoc: Benchmarking Structured Extraction and Active Refusal on Noisy Mathematics Exam Papers

The automated extraction of structured questions from paper-based mathematics exams is fundamental to intelligent education, yet it remains challenging in real-world settings due to severe visual noise. Existing benchmarks mainly focus on clean documents or generic layout analysis, overlooking both the structural integrity of mathematical problems and the ability of models to actively reject incomplete inputs. We introduce MathDoc, the first benchmark for document-level information extraction from authentic high school mathematics exam papers. MathDoc contains 3,609 carefully curated questions with real-world artifacts and explicitly includes unrecognizable samples to evaluate active refusal behavior. We propose a multi-dimensional evaluation framework covering stem accuracy, visual similarity, and refusal capability. Experiments on state-of-the-art multimodal large language models (MLLMs), including Qwen3-VL and Gemini-2.5-Pro, show that although end-to-end models achieve strong extraction performance, they consistently fail to refuse illegible inputs and instead produce confident but invalid outputs. These results highlight a critical gap in current MLLMs and establish MathDoc as a benchmark for assessing model reliability under degraded document conditions. Our project repository is available at https://github.com/winnk123/papers/tree/master.

Preprint · 2025 · arXiv

Exploiting Scale-Variant Attention for Segmenting Small Medical Objects

Early detection and accurate diagnosis can predict the risk of malignant disease transformation, thereby increasing the probability of effective treatment. Identifying mild syndromes with small pathological regions provides an early warning and is fundamental to the early diagnosis of diseases. Deep learning algorithms, particularly convolutional neural networks (CNNs), have shown promise in segmenting medical objects, but analyzing small areas in medical images remains challenging. This difficulty arises from information loss and compression artifacts caused by the convolution and pooling operations in CNNs. These problems become more pronounced as the network deepens, especially for small medical objects. To address these challenges, we propose a novel scale-variant attention-based network (SvANet) for accurately segmenting small-scale objects in medical images. SvANet combines scale-variant attention, cross-scale guidance, Monte Carlo attention, and a vision transformer. Together, these components incorporate cross-scale features and alleviate compression artifacts to enhance the discrimination of small medical objects. Quantitative experimental results demonstrate the superior performance of SvANet, which achieves mean Dice coefficients of 96.12%, 96.11%, 89.79%, 84.15%, 80.25%, 73.05%, and 72.58% for segmenting kidney tumors, skin lesions, hepatic tumors, polyps, surgical excision cells, retinal vasculatures, and sperms. These objects occupy less than 1% of the image area in the KiTS23, ISIC 2018, ATLAS, PolypGen, TissueNet, FIVES, and SpermHealth datasets, respectively.
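
The reported scores use the Dice coefficient, the standard overlap measure 2|P ∩ T| / (|P| + |T|) between a predicted mask P and a ground-truth mask T. A minimal sketch for binary masks follows. The nested-list mask format and the `eps` smoothing term are assumptions, not details taken from the paper.

```python
from typing import Sequence


def dice_coefficient(
    pred: Sequence[Sequence[int]],
    target: Sequence[Sequence[int]],
    eps: float = 1e-7,
) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for two binary masks of equal shape."""
    p = [1 if x else 0 for row in pred for x in row]
    t = [1 if x else 0 for row in target for x in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(a * b for a, b in zip(p, t))
    # eps keeps the score defined (and equal to 1) when both masks are empty.
    return (2.0 * intersection + eps) / (sum(p) + sum(t) + eps)
```

Small objects make this metric unforgiving. Take a 4-pixel lesion and a prediction that covers 3 of its pixels with no false positives. Dice already drops to 2·3/(3+4) ≈ 0.857.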

Preprint · 2022 · arXiv

A New Learning Paradigm for Stochastic Configuration Network: SCN+

The learning using privileged information (LUPI) paradigm, which pioneered a teacher-student interaction mechanism, allows learning models to use additional information during training. This paper is the first to propose an incremental learning algorithm with the LUPI paradigm for stochastic configuration networks (SCNs), named SCN+. The algorithm incorporates privileged information into SCN training, providing a new way to train SCNs. The convergence of the algorithm is also studied. Finally, experimental results indicate that SCN+ performs favorably.

Preprint · 2022 · arXiv

Blind Two-Dimensional Super-Resolution and Its Performance Guarantee (Extended Version)

We study the problem of identifying the parameters of a linear system from its response to multiple unknown waveforms. We assume that the system response is a scaled superposition of time-delayed and frequency-shifted versions of the unknown waveforms. This kind of problem is severely ill-posed and does not yield a unique solution without further constraints. To fully characterize the system, we assume that the unknown waveforms lie in a common, known low-dimensional subspace that satisfies certain properties. We then develop a blind two-dimensional (2D) super-resolution framework that applies to a large number of applications. Within this framework, we show that under a minimum separation between the time-frequency shifts, all the unknowns that characterize the system can be recovered precisely and with high probability, provided that a lower bound on the number of observed samples is satisfied. The proposed framework is based on a 2D atomic norm minimization problem, which we show can be reformulated and solved via semidefinite programming. Simulation results that confirm the theoretical findings of the paper are provided.

Preprint · 2022 · arXiv

Data-Efficient Modeling for Precise Power Consumption Estimation of Quadrotor Operations Using Ensemble Learning

Electric vertical take-off and landing (eVTOL) aircraft are considered the major aircraft type in emerging urban air mobility. Accurate power consumption estimation is crucial for eVTOL aircraft, supporting advanced power management strategies and improving the efficiency and safety of flight operations. In this study, a framework for power consumption modeling of eVTOL aircraft was established. We employed an ensemble learning method, stacking, to develop a data-driven model using flight records of three different types of quadrotors. Random forest and extreme gradient boosting, which showed advantages in prediction, were chosen as base models, and a linear regression model was used as the meta-model. The resulting stacking model can accurately estimate the power of a quadrotor. Error analysis shows that about 80% of prediction errors fall within one standard deviation. It also shows that an error below 0.5% in the prediction for an entire flight can be expected with more than 80% confidence. Our model outperforms existing models in two respects: it has better prediction performance, and it is more data-efficient, requiring a much smaller dataset. Our model provides a powerful tool for operators of eVTOL aircraft in mission management and contributes to safe and energy-efficient urban air traffic.

Preprint · 2022 · arXiv

Digging into Primary Financial Market: Challenges and Opportunities of Adopting Blockchain

Since the emergence of blockchain technology, its application in financial markets has been an area of focus and exploration for all parties. With characteristics such as anonymity, trust, and tamper resistance, blockchain technology can effectively address some problems faced by financial markets, such as trust issues and information asymmetry. To understand the application scenarios of blockchain in financial markets in depth, securities issuance and trading in the primary market must be studied closely. We conducted an empirical study to investigate the main difficulties that primary market participants face in their business practices and the potential challenges of deepening the application of blockchain technology in the primary market. We adopted a hybrid method combining interviews (qualitative) and surveys (quantitative), conducted in two stages. In the first stage, we interviewed 15 major primary market participants with different backgrounds and expertise. In the second stage, we conducted a verification survey of 54 primary market practitioners to confirm various insights from the interviews, including challenges and desired improvements. Our interview and survey results revealed several significant challenges facing blockchain applications in the primary market: complex due diligence, mismatch, and difficult monitoring. Future research can build on these findings by focusing on specific aspects of these challenges.

Preprint · 2022 · arXiv

Maximum principles and the method of moving planes for the uniformly elliptic nonlocal Bellman operator and applications

In this paper, we establish various maximum principles and develop the method of moving planes and the sliding method (on general unbounded domains) for equations involving the uniformly elliptic nonlocal Bellman operator. As a consequence, we derive multiple applications of these maximum principles and the method of moving planes. For instance, we prove symmetry, monotonicity, and uniqueness results and asymptotic properties for solutions to various equations involving the uniformly elliptic nonlocal Bellman operator in bounded domains, unbounded domains, epigraphs, or $\mathbb{R}^{n}$. In particular, the uniformly elliptic nonlocal Monge-Ampère operator introduced by Caffarelli and Charro in \cite{CC} is a typical example of the uniformly elliptic nonlocal Bellman operator.

Preprint · 2022 · arXiv

Orthogonal Stochastic Configuration Networks with Adaptive Construction Parameter for Data Analytics

As randomized learner models, SCNs are notable in that their random weights and biases are assigned under a supervisory mechanism that ensures universal approximation and fast learning. However, this randomness makes SCNs prone to generating nearly linearly correlated hidden nodes that are redundant and of low quality, resulting in a non-compact network structure. A fundamental principle of machine learning holds that a model with fewer parameters generalizes better. Following this principle, this paper proposes orthogonal SCN (OSCN), which filters out low-quality hidden nodes by incorporating Gram-Schmidt orthogonalization, thereby reducing the network structure. The universal approximation property of OSCN and an adaptive setting for its key construction parameters are presented in detail. In addition, an incremental updating scheme is developed to determine the output weights dynamically, improving computational efficiency. Finally, experimental results on two numerical examples and several real-world regression and classification datasets substantiate the effectiveness and feasibility of the proposed approach.
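
The core filtering idea can be sketched with modified Gram-Schmidt. The sketch assumes each candidate hidden node is represented by its output vector over the training samples. Each candidate's projections onto the already-accepted orthonormal directions are subtracted. If the remaining residual is tiny relative to the candidate's own norm, the node is nearly linearly dependent on accepted nodes and is discarded. The relative tolerance `tol` and the function name are hypothetical. OSCN's actual selection criterion, adaptive construction parameters and output-weight updates are not reproduced here.

```python
import math
from typing import List, Sequence


def gram_schmidt_filter(candidates: Sequence[Sequence[float]], tol: float = 1e-3) -> List[int]:
    """Return indices of candidate hidden-node output vectors worth keeping.

    A candidate is kept only if, after removing its projections onto the
    orthonormal directions of previously kept nodes, the residual norm exceeds
    `tol` times the candidate's original norm.
    """
    basis: List[List[float]] = []
    kept: List[int] = []
    for idx, h in enumerate(candidates):
        residual = [float(x) for x in h]
        for q in basis:  # modified Gram-Schmidt: project the running residual
            proj = sum(r * qi for r, qi in zip(residual, q))
            residual = [r - proj * qi for r, qi in zip(residual, q)]
        norm_h = math.sqrt(sum(float(x) * float(x) for x in h))
        norm_r = math.sqrt(sum(r * r for r in residual))
        if norm_h > 0.0 and norm_r > tol * norm_h:
            basis.append([r / norm_r for r in residual])
            kept.append(idx)
    return kept
```

Rejecting near-duplicates before they enter the network is what keeps the structure compact. Redundant nodes add parameters without adding new directions to the hidden-layer output space.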

Preprint · 2022 · arXiv

Solving DC Power Flow Problems Using Quantum and Hybrid Algorithms

Power flow calculation plays an important role in the planning, operation, and control of power systems. The quantum HHL algorithm can theoretically achieve an exponential speedup over classical algorithms for DC power flow calculation. Since qubit resources in the Noisy Intermediate-Scale Quantum (NISQ) era are limited, it is important to analyze performance under this limitation. The coefficient matrix of the linear system of equations in DC power flow problems cannot be represented perfectly by finite binary strings, which leads to imperfect phase estimation. This work is carried out under the assumption of imperfect phase estimation. The performance of the HHL algorithm is systematically investigated for different accuracies and numbers of redundant qubits. To further reduce the required qubit resources, a hybrid quantum-classical algorithm is proposed. We compare the errors of the HHL and hybrid algorithms in the DC power flow calculation of the IEEE 5-bus test system. The comparison shows that the hybrid algorithm can achieve comparable precision with fewer qubits than HHL by increasing the number of phase estimation modules, which may make the hybrid algorithm a feasible route in the NISQ era.

Preprint · 2021 · arXiv

Improved ACD-based financial trade durations prediction leveraging LSTM networks and Attention Mechanism

The liquidity risk factor of the securities market plays an important role in the formulation of trading strategies. A more liquid market means that securities can be bought or sold more easily. As a sound indicator of market liquidity, transaction duration is the focus of this study. We concentrate on estimating the probability density function $p(Δt_{i+1} \mid G_i)$. Here $Δt_{i+1}$ is the duration of the $(i+1)$-th transaction, and $G_i$ is the historical information at the time the $(i+1)$-th transaction occurs. In this paper, we propose a new ultra-high-frequency (UHF) duration modelling framework that uses long short-term memory (LSTM) networks to extend the conditional mean equation of the classic autoregressive conditional duration (ACD) model while retaining its probabilistic inference ability. An attention mechanism is then used to reveal the internal mechanism of the constructed model. To minimize the impact of manual parameter tuning, we adopt fixed hyperparameters during training. Experiments on a large-scale dataset demonstrate the superiority of the proposed hybrid models. The added attention layer efficiently highlights the temporal positions in the input sequence that are most important for predicting the next duration.
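
For context, the standard ACD(1,1) specification models durations as $x_i = ψ_i ε_i$. The conditional mean follows the recursion $ψ_i = ω + α x_{i-1} + β ψ_{i-1}$, and the $ε_i$ are i.i.d. positive with unit mean. The paper extends this conditional mean equation with LSTM networks. The sketch below shows only the classic recursion and the log-likelihood under an assumed exponential error distribution. The initialisation at the unconditional mean and the parameter values are assumptions, and the LSTM extension is not shown.

```python
import math
from typing import List, Sequence


def acd_conditional_means(
    durations: Sequence[float], omega: float, alpha: float, beta: float
) -> List[float]:
    """ACD(1,1) recursion psi_i = omega + alpha * x_{i-1} + beta * psi_{i-1}.

    psi_0 is set to the unconditional mean omega / (1 - alpha - beta),
    which requires alpha + beta < 1 (covariance stationarity).
    """
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be < 1")
    psi = [omega / (1.0 - alpha - beta)]
    for x_prev in durations[:-1]:
        psi.append(omega + alpha * x_prev + beta * psi[-1])
    return psi


def exponential_acd_loglik(durations: Sequence[float], psi: Sequence[float]) -> float:
    """Log-likelihood of x_i = psi_i * eps_i with eps_i ~ Exp(1): sum(-log psi - x/psi)."""
    return sum(-math.log(p) - x / p for x, p in zip(durations, psi))
```

Replacing the linear recursion for $ψ_i$ with a learned function of the history is what the abstract calls extending the conditional mean equation. The density $p(Δt_{i+1} \mid G_i)$ then follows from the assumed error distribution.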

Preprint · 2021 · arXiv

Liouville type theorems for fractional and higher order Hénon-Hardy type equations via the method of scaling spheres

In this paper, we are concerned with the fractional and higher order Hénon-Hardy type equations \begin{equation*} (-Δ)^{\frac{α}{2}}u(x)=f(x,u(x)) \qquad \text{in} \,\,\, \mathbb{R}^{n}, \,\,\, \mathbb{R}^{n}_{+} \,\,\, \text{or} \,\,\, Ω \end{equation*} with $n>α$, $0<α<2$ or $α=2m$ with $1\leq m<\frac{n}{2}$. We first consider the typical case $f(x,u)=|x|^{a}u^{p}$ with $a\in(-α,\infty)$ and $0<p<p_{c}(a):=\frac{n+α+2a}{n-α}$. Using the method of scaling spheres, we prove Liouville theorems for the above Hénon-Hardy equations and the equivalent integral equations in $\mathbb{R}^{n}$ and $\mathbb{R}^{n}_{+}$. Our results extend the known Liouville theorems from certain admissible subranges of $a$ and $1<p<\min\left\{\frac{n+α+a}{n-α},p_{c}(a)\right\}$ to the full range $a\in(-α,\infty)$ and $p\in(0,p_{c}(a))$. When $a>0$, we cover the gap $p\in\big[\frac{n+α+a}{n-α},p_{c}(a)\big)$. In particular, when $α=2$, our results give an affirmative answer to the conjecture posed by Phan and Souplet \cite{PS}. As a consequence, we derive a priori estimates and the existence of positive solutions to higher order Lane-Emden equations in bounded domains for all $1<p<\frac{n+2m}{n-2m}$. Our theorems extend the results in \cite{CFL,DPQ} to the maximal range of $p$. For bounded domains $Ω$, we also apply the method of scaling spheres to derive Liouville theorems for super-critical problems. Extensions to PDEs and IEs with general nonlinearities $f(x,u)$ are also included. We believe the method of scaling spheres developed here can be applied conveniently to various fractional or higher order problems with singularities or without translation invariance. It may also apply in cases where the method of moving planes combined with Kelvin transforms does not work.

Preprint · 2020 · arXiv

Demonstration of Controlled-Phase Gates between Two Error-Correctable Photonic Qubits

To realize fault-tolerant quantum computing, quantum information must be stored in logical qubits with error correction functions. These are realized either by distributing a logical state among multiple physical qubits or by encoding it in the Hilbert space of a high-dimensional system. Quantum gate operations between these error-correctable logical qubits are essential for implementing any practical quantum computational task, but they have not yet been demonstrated experimentally. Here we demonstrate a geometric method for realizing controlled-phase gates between two logical qubits encoded in photonic fields stored in cavities. The gates are realized by dispersively coupling an ancillary superconducting qubit to these cavities. The ancilla is driven through a cyclic evolution that depends on the joint photonic state of the cavities, which produces a conditional geometric phase. We first realize phase gates for photonic qubits whose logical basis states are encoded in two quasi-orthogonal coherent states, which has important implications for continuous-variable-based quantum computation. We then use this geometric method to implement a controlled-phase gate between two binomially encoded logical qubits, which are error-correctable.

Preprint · 2020 · arXiv

Dictionary Learning with BLOTLESS Update

Algorithms for learning a dictionary to sparsely represent a given dataset typically alternate between sparse coding and dictionary update stages. Dictionary update methods aim to minimise the expansion error by updating dictionary vectors and expansion coefficients, given the patterns of non-zero coefficients obtained in the sparse coding stage. We propose a block total least squares (BLOTLESS) algorithm for dictionary update. BLOTLESS updates a block of dictionary elements and the corresponding sparse coefficients simultaneously. In the error-free case, three necessary conditions for exact recovery are identified. Lower bounds on the number of training data are established so that the necessary conditions hold with high probability. Numerical simulations show that the bounds closely approximate the number of training data needed for exact dictionary recovery. Numerical experiments further demonstrate several benefits of dictionary learning with BLOTLESS update compared with state-of-the-art algorithms, especially when the amount of training data is small.

Preprint · 2020 · arXiv

Direct methods for pseudo-relativistic Schrödinger operators

In this paper, we establish various maximum principles and develop the direct method of moving planes and the sliding method for equations involving the physically interesting (nonlocal) pseudo-relativistic Schrödinger operators $(-Δ+m^{2})^{s}$ with $s\in(0,1)$ and mass $m>0$. As a consequence, we also derive multiple applications of these direct methods. For instance, we prove monotonicity, symmetry, and uniqueness results for solutions to various equations involving the operators $(-Δ+m^{2})^{s}$ in bounded domains, epigraphs, or $\mathbb{R}^{N}$. These include pseudo-relativistic Schrödinger equations, 3D boson star equations, and equations with De Giorgi type nonlinearities.

Preprint · 2020 · arXiv

EVA: An Encrypted Vector Arithmetic Language and Compiler for Efficient Homomorphic Computation

Fully Homomorphic Encryption (FHE) offers powerful capabilities by enabling secure offloading of both storage and computation, and recent innovations in schemes and implementations have made it all the more attractive. At the same time, FHE is notoriously hard to use. It has a very constrained programming model, a very unusual performance profile, and many cryptographic constraints. Existing compilers for FHE either target simpler but less efficient FHE schemes or only support specific domains where they can rely on expert-provided high-level runtimes to hide complications. This paper presents a new FHE language called Encrypted Vector Arithmetic (EVA), which includes an optimizing compiler that generates correct and secure FHE programs while hiding all the complexities of the target FHE scheme. Bolstered by our optimizing compiler, programmers can develop efficient general-purpose FHE applications directly in EVA. For example, we have developed image processing applications in EVA with very few lines of code. EVA is also designed to work as an intermediate representation that can be a target for compiling higher-level domain-specific languages. To demonstrate this, we have re-targeted CHET, an existing domain-specific compiler for neural network inference, onto EVA. Due to the novel optimizations in EVA, its programs are on average 5.3x faster than those generated by CHET. We believe that EVA would enable a wider adoption of FHE by making it easier to develop FHE applications and domain-specific FHE compilers.

Preprint · 2020 · arXiv

HEAX: An Architecture for Computing on Encrypted Data

With the rapid growth of cloud computing, concerns about data privacy, security, and confidentiality have also increased significantly. Not only are cloud providers susceptible to internal and external hacks, but in some scenarios data owners cannot outsource computation due to privacy laws such as GDPR, HIPAA, or CCPA. Fully Homomorphic Encryption (FHE) is a groundbreaking invention in cryptography that, unlike traditional cryptosystems, enables computation on encrypted data without ever decrypting it. However, the most critical obstacle to deploying FHE at large scale is its enormous computational overhead. In this paper, we present HEAX, a novel hardware architecture for FHE that achieves unprecedented performance improvements. HEAX leverages multiple levels of parallelism, ranging from the ciphertext level to the fine-grained modular arithmetic level. Our first contribution is a new, highly parallelizable architecture for the number-theoretic transform (NTT). This architecture can be of independent interest because NTT is used in many lattice-based cryptosystems. Building on top of the NTT engine, we design a novel architecture for computation on homomorphically encrypted data. We also introduce several techniques that enable an end-to-end, fully pipelined design and reduce on-chip memory consumption. Our implementation on reconfigurable hardware demonstrates a 164-268x performance improvement across a wide range of FHE parameters.
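
The number-theoretic transform is the finite-field analogue of the FFT. It replaces complex roots of unity with roots of unity modulo a prime, so polynomial products reduce to pointwise multiplications. A minimal recursive radix-2 software reference is sketched below. It says nothing about HEAX's parallel hardware design. The modulus 998244353 (= 119·2^23 + 1) with primitive root 3 is a common NTT-friendly choice used here as an assumption. FHE libraries typically use different moduli, often in RNS form, and negacyclic transforms for the ring modulo $X^N + 1$.

```python
from typing import List

MOD = 998_244_353  # assumed NTT-friendly prime: 119 * 2**23 + 1
ROOT = 3           # primitive root modulo MOD


def ntt(a: List[int], invert: bool = False) -> List[int]:
    """Recursive radix-2 Cooley-Tukey NTT (cyclic); len(a) must be a power of two."""
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [a[0] % MOD]
    even = ntt(a[0::2], invert)
    odd = ntt(a[1::2], invert)
    w_n = pow(ROOT, (MOD - 1) // n, MOD)  # primitive n-th root of unity mod MOD
    if invert:
        w_n = pow(w_n, MOD - 2, MOD)      # inverse root for the inverse transform
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % MOD              # butterfly: twiddle the odd half
        out[k] = (even[k] + t) % MOD
        out[k + n // 2] = (even[k] - t) % MOD
        w = w * w_n % MOD
    return out


def intt(a: List[int]) -> List[int]:
    """Inverse NTT: transform with inverse roots, then scale by n^{-1} mod MOD."""
    n_inv = pow(len(a), MOD - 2, MOD)
    return [x * n_inv % MOD for x in ntt(a, invert=True)]


# Multiply (1 + 2x)(3 + 4x) via pointwise products in the NTT domain.
fa, fb = ntt([1, 2, 0, 0]), ntt([3, 4, 0, 0])
product = intt([x * y % MOD for x, y in zip(fa, fb)])
```

The butterfly loop is the part hardware designs such as HEAX parallelize. Each level's n/2 butterflies are independent of one another.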

Preprint · 2020 · arXiv

Learning Optimal Tree Models Under Beam Search

Retrieving relevant targets from an extremely large target set under computational limits is a common challenge for information retrieval and recommendation systems. Tree models, which formulate targets as leaves of a tree with trainable node-wise scorers, have attracted considerable interest for tackling this challenge due to their logarithmic computational complexity in both training and testing. Tree-based deep models (TDMs) and probabilistic label trees (PLTs) are two representative kinds. Despite many practical successes, existing tree models suffer from a training-testing discrepancy: the retrieval performance deterioration caused by beam search at test time is not considered during training. This leads to an intrinsic gap between the most relevant targets and those retrieved by beam search, even with optimally trained node-wise scorers. We take a first step towards understanding and analyzing this problem theoretically. We develop the concepts of Bayes optimality under beam search and calibration under beam search as general analytical tools for this purpose. Moreover, to eliminate the discrepancy, we propose a novel algorithm for learning optimal tree models under beam search. Experiments on both synthetic and real data verify the rationality of our theoretical analysis and demonstrate the superiority of our algorithm over state-of-the-art methods.
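
A toy tree makes the training-testing discrepancy concrete. Layer-wise beam search keeps only the top-scoring nodes at each level. If node-wise scores favour the wrong subtree near the root, a narrow beam never reaches the highest-scoring leaf. The dictionary tree encoding, the scores and `beam_size` below are hypothetical. This illustrates the problem only and does not reproduce the paper's learning algorithm.

```python
import heapq
from typing import Callable, Dict, List

# Hypothetical tree encoding: node id -> child ids; leaves are absent or map to [].
Tree = Dict[str, List[str]]


def beam_search(tree: Tree, root: str, score: Callable[[str], float], beam_size: int) -> List[str]:
    """Layer-wise beam search returning the reached leaves, best first.

    Assumes all leaves sit at the same depth, as in balanced retrieval trees.
    """
    if beam_size < 1:
        raise ValueError("beam_size must be at least 1")
    beam = [root]
    while tree.get(beam[0]):  # stop once the beam consists of leaves
        children = [c for node in beam for c in tree[node]]
        beam = heapq.nlargest(beam_size, children, key=score)
    return beam


TOY_TREE: Tree = {"root": ["A", "B"], "A": ["a1", "a2"], "B": ["b1", "b2"]}
SCORES = {"root": 1.0, "A": 0.6, "B": 0.4, "a1": 0.5, "a2": 0.3, "b1": 0.9, "b2": 0.1}
narrow = beam_search(TOY_TREE, "root", SCORES.__getitem__, beam_size=1)
wide = beam_search(TOY_TREE, "root", SCORES.__getitem__, beam_size=2)
```

With `beam_size=1` the search commits to subtree A and returns `a1`, even though `b1` has the highest leaf score. Widening the beam recovers `b1`. Training scorers so that parent scores are consistent with the best leaf beneath them is the kind of calibration under beam search the abstract refers to.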

Preprint · 2020 · arXiv

Radial distribution of charm quarks in jets in high-energy heavy-ion collisions

Heavy flavor physics in high-energy heavy-ion collisions is a promising and active area for studying the mass dependence of "jet quenching" effects at both RHIC and the LHC. In this talk, we present the first theoretical study of $D^0$ meson radial distributions relative to the jet axis in both p+p and Pb+Pb collisions at $\sqrt{s_{NN}}=5.02$ TeV, and our results agree well with experimental data. In-medium parton propagation is described by a Monte Carlo transport model. The model uses the next-to-leading order (NLO) plus parton shower (PS) event generator SHERPA as input and includes elastic (collisional) and inelastic (radiative) in-medium interactions of heavy-flavor jets. We find that at low $D^0$ meson $p_T$, the radial distribution shifts significantly to larger radii, indicating a strong diffusion effect. The diffusion effect decreases quickly with $p_T$, which is consistent with recent CMS measurements. We demonstrate that the angular deviation of charm quarks is sensitive to $D_s$ but not to $\hat{q}$, which may provide new constraints on collisional and radiative heavy quark energy loss.

Preprint · 2020 · arXiv

Radial profile of heavy quarks in jets in high-energy nuclear collisions

In high-energy nuclear collisions, heavy-flavor-tagged jets are useful hard probes of the properties of the quark-gluon plasma (QGP). In this talk, we present the first theoretical prediction of $D^0$ meson radial distributions in jets relative to the jet axis in both p+p and Pb+Pb collisions at $5.02$ TeV, which agrees well with the available experimental data. In this study, in-medium jet evolution is described by a Monte Carlo transport model that takes as input initial events generated by the next-to-leading order (NLO) plus parton shower (PS) event generator SHERPA. Both elastic and inelastic parton energy loss in the hot and dense medium are taken into account in this evolution. Within the same simulation framework, we predict different modification patterns of the radial profiles of charm and bottom quarks in jets in Pb+Pb collisions. Jet quenching causes charm quarks to diffuse to larger radii, while it causes bottom quarks to be distributed closer to the jet axis.

Preprint · 2020 · arXiv

Selective Confidence Intervals for Martingale Regression Model

In this paper, we consider the problem of constructing confidence intervals for the coefficients of martingale regression models (in particular, time series models) after variable selection. Although constructing confidence intervals is common practice in statistical analysis, it is challenging in our framework. The selected model is data-dependent, and the selected and unselected variables are correlated. We first introduce estimators for the selected coefficients and show that they are consistent under the martingale regression model, in which observations can be dependent and errors can be heteroskedastic. We then combine these estimators with a resampling approach to construct confidence intervals. Our simulation results show that our approach outperforms other existing approaches across various data structures.

Preprint · 2020 · arXiv

Self-similar solutions of energy-supercritical focusing wave equations in all dimensions

In this paper, we prove the existence of a countable family of regular spherically symmetric self-similar solutions to focusing energy super-critical semi-linear wave equations \begin{equation*} \partial_{tt}u-Δu=|u|^{p-1}u \qquad \text{in} \,\, \mathbb{R}^{N}, \end{equation*} where $N\geq 3$, $1+\frac{4}{N-2}<p$, and, if $N\geq 4$, $p \leq 1+\frac{4}{N-3}$. This was previously known only in the case $N=3$, for integer $p$ (see Bizoń, Maison and Wasserman \cite{BMW}). We also study the asymptotics of these solutions.

preprint2020arXiv

Sharp reversed Hardy-Littlewood-Sobolev inequality with extended kernel

In this paper, we prove the following reversed Hardy-Littlewood-Sobolev inequality with extended kernel \begin{equation*} \int_{\mathbb{R}_+^n}\int_{\partial\mathbb{R}^n_+} \frac{x_n^β}{|x-y|^{n-α}}f(y)g(x) dydx\geq C_{n,α,β,p}\|f\|_{L^{p}(\partial\mathbb{R}_+^n)} \|g\|_{L^{q'}(\mathbb{R}_+^n)} \end{equation*} for any nonnegative functions $f\in L^{p}(\partial\mathbb{R}_+^n)$ and $g\in L^{q'}(\mathbb{R}_+^n)$, where $n\geq2$, $p,\ q'\in (0,1)$, $α>n$, $0\leq β<\frac{α-n}{n-1}$, $p>\frac{n-1}{α-1-(n-1)β}$ such that $\frac{n-1}{n}\frac{1}{p}+\frac{1}{q'}-\frac{α+β-1}{n}=1$. We prove the existence of extremal functions for the above inequality. Moreover, in the conformally invariant case, we classify all the extremal functions and hence derive the best constant via a variant of the method of moving spheres, which can be carried out \emph{without lifting the regularity of Lebesgue measurable solutions}. Finally, we derive necessary and sufficient conditions for the existence of positive solutions to the Euler-Lagrange equations by using Pohozaev identities. Our results are inspired by Hang, Wang and Yan \cite{HWY}, Dou, Guo and Zhu \cite{DGZ} for $α<n$ and $β=1$, and Gluck \cite{Gl} for $α<n$ and $β\geq0$.

preprint2020arXiv

Transverse Momentum Balance and Angular Distribution of $b\bar{b}$ Dijets in Pb+Pb collisions

The production of inclusive b-jets and $b\bar{b}$ dijets in Pb+Pb collisions is investigated by considering the in-medium evolution of heavy and light quarks simultaneously. The initial hard processes of inclusive b-jet and $b\bar{b}$ dijet production are described by the next-to-leading order (NLO) plus parton shower Monte Carlo (MC) event generator SHERPA, which matches the experimental data in p+p collisions well. The framework combines two components. A Langevin transport model describes the evolution of the bottom quark and its collisional energy loss. The higher-twist description accounts for the radiative energy loss of both bottom and light quarks. We compare the theoretical simulations of the inclusive jet and inclusive b-jet $R_{\rm AA}$ in Pb+Pb collisions at $\sqrt{s_{\rm NN}}=2.76$ TeV with the experimental data. We then present, for the first time, a theoretical simulation of the momentum balance of $b\bar{b}$ dijets in Pb+Pb collisions at $5.02$ TeV, compared with recent CMS data. A trend similar to that of inclusive dijets is observed for $b\bar{b}$ dijets: the production distribution is shifted to smaller $x_J$ due to jet quenching. Finally, we report a prediction of the normalized azimuthal angle distribution of $b\bar{b}$ dijets in Pb+Pb collisions at $5.02$ TeV. Medium-induced energy loss suppresses $b\bar{b}$ dijet production overall. However, the same side ($Δϕ\to 0$ region) suffers more energy loss than the away side ($Δϕ\to π$ region). This leads to suppression on the same side and enhancement on the away side of the normalized azimuthal angle distribution in A+A collisions.
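
The momentum balance is usually quantified by $x_J$, the ratio of the subleading to the leading jet transverse momentum. The azimuthal observable is the separation $Δϕ$ folded into $[0,π]$. These are the conventional definitions and are assumed here rather than quoted from the paper. The sketch only shows how both observables follow from a dijet's two $(p_T, ϕ)$ pairs.

```python
import math
from typing import Tuple


def dijet_observables(pt1: float, phi1: float, pt2: float, phi2: float) -> Tuple[float, float]:
    """Return (x_J, delta_phi) for a dijet.

    x_J = subleading pT / leading pT, so 0 < x_J <= 1 (x_J = 1 is perfectly balanced).
    delta_phi is folded into [0, pi]: near 0 is the "same side", near pi the "away side".
    """
    leading, subleading = max(pt1, pt2), min(pt1, pt2)
    x_j = subleading / leading
    d_phi = abs(phi1 - phi2) % (2.0 * math.pi)
    if d_phi > math.pi:
        d_phi = 2.0 * math.pi - d_phi
    return x_j, d_phi
```

Energy loss that degrades one jet of the pair more than the other pushes $x_J$ below 1. That is the shift toward smaller $x_J$ described above.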

preprint2019arXiv

Benchmarking Contemporary Deep Learning Hardware and Frameworks: A Survey of Qualitative Metrics

This paper surveys benchmarking principles, machine learning devices (including GPUs, FPGAs, and ASICs), and deep learning software frameworks. It also reviews these technologies from a benchmarking perspective, using a 6-metric approach for frameworks and an 11-metric approach for hardware platforms. MLPerf is a benchmark organization that works with industry and academia and offers deep learning benchmarks evaluating training and inference on deep learning hardware. The survey therefore also covers MLPerf benchmark results, benchmark metrics, datasets, deep learning frameworks and algorithms. We summarize seven benchmarking principles and the differentiating characteristics of mainstream AI devices, and we give a qualitative comparison of deep learning hardware and frameworks.

preprint2016arXiv

Heavy Quark and Quarkonium Transport in High Energy Nuclear Collisions

The strong interaction between heavy quarks and the quark-gluon plasma makes open and hidden charm hadrons sensitive probes of the deconfinement phase transition in high-energy nuclear collisions. Both cold and hot nuclear matter effects change with the collision energy. These effects significantly influence heavy quark and charmonium yields and their transverse momentum distributions. Both the ratio of the averaged squared transverse momentum of quarkonia and their elliptic flow reveal the nature of the QCD medium created in heavy-ion collisions at SPS, RHIC and LHC energies.

preprint2016arXiv

Learning Filter Banks Using Deep Learning For Acoustic Signals

Designing appropriate features for acoustic event recognition tasks is an active field of research. Expressive features should both improve task performance and be interpretable. Currently, heuristically designed features based on domain knowledge require tremendous hand-crafting effort, while features extracted by deep networks are difficult for humans to interpret. In this work, we explore an experience-guided learning method for designing acoustic features. It is a novel hybrid approach that combines domain knowledge with purely data-driven feature design. Based on the procedure for computing log Mel-filter banks, we design a filter bank learning layer and concatenate it with a convolutional neural network (CNN) model. After the network is trained, the weights of the filter bank learning layer are extracted to guide the design of acoustic features. We smooth the trained weights and use them to re-initialize the filter bank learning layer as an audio feature extractor. On the environmental sound recognition task based on the UrbanSound8K dataset, experience-guided learning yields a 2% accuracy improvement over the fixed feature extractor (the log Mel-filter bank). The shapes of the new filter banks are visualized and explained to demonstrate the effectiveness of the feature design process.
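
The abstract describes a filter bank learning layer built on the log Mel-filter bank procedure. Its trained weights are smoothed and then used to re-initialize the layer. A minimal NumPy sketch of those three pieces follows. It assumes the common HTK-style mel formula, triangular filters, and a moving-average smoother. The window length, filter count, and function names are hypothetical, and the CNN and its training loop are omitted.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filter_bank(n_filters: int, n_fft: int, sample_rate: float) -> np.ndarray:
    """Triangular mel filters, shape (n_filters, n_fft // 2 + 1): the layer's initial weights."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # Filter edges are equally spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def filter_bank_features(power_spec: np.ndarray, weights: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """log(power_spec @ weights.T); in a learned layer, `weights` would be trainable."""
    return np.log(power_spec @ weights.T + eps)


def smooth_rows(weights: np.ndarray, window: int = 5) -> np.ndarray:
    """Moving-average smoothing of each learned filter (hypothetical smoother)."""
    kernel = np.ones(window) / window
    return np.stack([np.convolve(row, kernel, mode="same") for row in weights])
```

In the workflow the abstract outlines, `mel_filter_bank` would supply the starting weights. Training would then adjust them, and `smooth_rows` would produce the re-initialization for the feature extractor.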

preprint2016arXiv

Low-cost high performance distributed data storage for multi-channel observations

The New Vacuum Solar Telescope (NVST) is a 1-m solar telescope that aims to observe fine structures in both the photosphere and the chromosphere of the Sun. Observational data are acquired simultaneously from one chromospheric channel and two photospheric channels, which poses great challenges for NVST data storage. The multi-channel instruments of NVST, including scientific cameras and multi-band spectrometers, generate at least 3 terabytes of data per day. They also require high access performance while storing massive numbers of short-exposure images. It is therefore worth designing and implementing a storage system for NVST that balances data availability, access performance and development cost. In this paper, we build a distributed data storage system (DDSS) for NVST and thoroughly evaluate the availability of real-time data storage in a distributed computing environment. The experimental results show that two factors are critical for data access performance in a distributed environment: the number of concurrent reads/writes and the file size. Based on these two factors, three strategies for storing FITS files are presented and implemented. They ensure the access performance of the DDSS when multiple hosts write and read simultaneously. Real applications of the DDSS show that the system meets the requirements of NVST for real-time, high-performance observational data storage. Our study of the DDSS is the first attempt to store the real-time observational data of a modern astronomical telescope system on a low-cost distributed system. The research results and corresponding techniques of the DDSS provide a new option for designing real-time massive astronomical data storage systems and can serve as a reference for future astronomical data storage.

preprint2016arXiv

NVST data archiving system based on fastbit nosql database

The New Vacuum Solar Telescope (NVST) is a 1-meter vacuum solar telescope that aims to observe the fine structures of active regions on the Sun. The main tasks of the NVST are high-resolution imaging and spectral observations, including measurements of the solar magnetic field. The NVST has collected more than 20 million FITS files since it began routine observations in 2012, and it produces up to 120 thousand files per day. Given this large number of files, effective archiving and retrieval has become a critical and urgent problem. In this study, we implement a new data archiving system for the NVST based on the Fastbit Not Only Structured Query Language (NoSQL) database. Compared with a relational database (MySQL), the Fastbit database shows distinct advantages in indexing and query performance. In a large-scale database of 40 million records, the Fastbit database answers multi-field combined queries about 15 times faster and fully meets the requirements of the NVST. Our study offers a new approach to massive astronomical data archiving and could contribute to the design of data management systems for other astronomical telescopes.
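
FastBit's query speed comes from bitmap indexes, which it stores in compressed form. The uncompressed sketch below uses Python integers as bitsets. It shows only why a multi-field combined query reduces to a bitwise AND over per-value bitmaps. It is not FastBit's API, and the field names used in the test are hypothetical stand-ins for FITS header keywords.

```python
from collections import defaultdict
from typing import Dict, List


class BitmapIndex:
    """Uncompressed bitmap index: one int bitset per (field, value) pair."""

    def __init__(self, records: List[Dict[str, str]], fields: List[str]):
        self.n = len(records)
        self.bitmaps = {f: defaultdict(int) for f in fields}
        for row_id, rec in enumerate(records):
            for f in fields:
                # Set bit `row_id` in the bitmap for this field's value.
                self.bitmaps[f][rec[f]] |= 1 << row_id

    def query(self, **conditions: str) -> List[int]:
        """Row ids matching every field=value condition (bitwise AND of bitmaps)."""
        result = (1 << self.n) - 1  # start with all rows selected
        for field, value in conditions.items():
            result &= self.bitmaps[field].get(value, 0)
        return [i for i in range(self.n) if result >> i & 1]
```

Each extra condition costs one AND over a bitmap rather than a scan of the records. Compression schemes such as the ones FastBit uses keep those bitmaps small for large tables.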

preprint2016arXiv

On $\mathbb{Z}_{2}\mathbb{Z}_{2}[u]$-$(1+u)$-additive constacyclic

In this paper, we study $\mathbb{Z}_{2}\mathbb{Z}_{2}[u]$-$(1+u)$-additive constacyclic codes of arbitrary length. First, we study the algebraic structure of this family of codes and give a set of generator polynomials for the family as $(\mathbb{Z}_{2}+u\mathbb{Z}_{2})[x]$-submodules of the ring $R_{α,β}$. Second, we give minimal generating sets for this family of codes. We also determine the relationship between the generators of $\mathbb{Z}_{2}\mathbb{Z}_{2}[u]$-$(1+u)$-additive constacyclic codes and those of their duals, and we give the code parameters in terms of the degrees of the generator polynomials. Finally, we study $\mathbb{Z}_{2}\mathbb{Z}_{2}[u]$-$(1+u)$-additive constacyclic codes through their Gray images.
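
The abstract does not restate the underlying definitions. The sketch below assumes the conventions commonly used for these codes:

- A codeword is a pair $(x \mid y)$ with $x\in\mathbb{Z}_2^{α}$ and $y\in R^{β}$, where $R=\mathbb{Z}_2+u\mathbb{Z}_2$ and $u^2=0$.
- The $(1+u)$-additive constacyclic shift cyclically shifts $x$ and multiplies the wrapped-around coordinate of $y$ by $1+u$.
- The Gray map sends $a+ub\mapsto(b,a+b)$.

The code encodes those assumed definitions so they can be checked on small examples. Verify them against the paper before relying on them.

```python
from typing import List, Tuple

Ring = Tuple[int, int]  # (a, b) represents a + u*b in Z2 + uZ2, with u^2 = 0


def times_one_plus_u(r: Ring) -> Ring:
    """(1 + u)(a + ub) = a + u(a + b), since u^2 = 0."""
    a, b = r
    return (a, (a + b) % 2)


def gray(r: Ring) -> Tuple[int, int]:
    """Assumed Gray map: a + ub -> (b, a + b), i.e. 0->00, 1->01, u->11, 1+u->10."""
    a, b = r
    return (b, (a + b) % 2)


def constacyclic_shift(x: List[int], y: List[Ring]) -> Tuple[List[int], List[Ring]]:
    """Cyclic shift on the Z2 part, (1+u)-constacyclic shift on the R part."""
    new_x = x[-1:] + x[:-1]
    new_y = [times_one_plus_u(r) for r in y[-1:]] + y[:-1]
    return new_x, new_y


def gray_image(x: List[int], y: List[Ring]) -> List[int]:
    """Binary image: keep x, apply the Gray map to each coordinate of y."""
    bits = list(x)
    for r in y:
        bits.extend(gray(r))
    return bits
```

Under these conventions, a code is $(1+u)$-additive constacyclic exactly when `constacyclic_shift` maps every codeword to another codeword.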

preprint2016arXiv

Parallel and Distributed Block-Coordinate Frank-Wolfe Algorithms

We develop parallel and distributed Frank-Wolfe algorithms; the former on shared memory machines with mini-batching, and the latter in a delayed update framework. Whenever possible, we perform computations asynchronously, which helps attain speedups on multicore machines as well as in distributed environments. Moreover, instead of worst-case bounded delays, our methods only depend (mildly) on \emph{expected} delays, allowing them to be robust to stragglers and faulty worker threads. Our algorithms assume block-separable constraints, and subsume the recent Block-Coordinate Frank-Wolfe (BCFW) method~\citep{lacoste2013block}. Our analysis reveals problem-dependent quantities that govern the speedups of our methods over BCFW. We present experiments on structural SVM and Group Fused Lasso, obtaining significant speedups over competing state-of-the-art (and synchronous) methods.
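
The methods above parallelize the Block-Coordinate Frank-Wolfe (BCFW) step. The sketch below shows what a single sequential block update does. The toy problem is hypothetical: Euclidean projection onto a product of probability simplices, $\min_x \tfrac12\|x-t\|^2$ with each block constrained to a simplex. The linear minimization oracle simply picks a vertex, and the line search is exact for this quadratic. This is not the paper's parallel or distributed algorithm, and structural SVMs are not modeled.

```python
import numpy as np


def bcfw_product_of_simplices(targets: np.ndarray, n_iters: int = 2000, seed: int = 0) -> np.ndarray:
    """Sequential BCFW for min_x 0.5*||x - t||^2 with each row x_i in the simplex.

    targets: array of shape (n_blocks, d). Returns the iterate after n_iters block steps.
    """
    rng = np.random.default_rng(seed)
    n_blocks, d = targets.shape
    x = np.full((n_blocks, d), 1.0 / d)  # feasible start: each block at the simplex center
    for _ in range(n_iters):
        i = rng.integers(n_blocks)           # pick one block uniformly at random
        grad = x[i] - targets[i]             # gradient of the objective w.r.t. block i
        s = np.zeros(d)
        s[np.argmin(grad)] = 1.0             # linear minimization oracle: best simplex vertex
        direction = s - x[i]
        denom = direction @ direction
        if denom == 0.0:
            continue                         # already at the chosen vertex
        # Exact line search for this quadratic, clipped to stay feasible.
        gamma = np.clip(-(grad @ direction) / denom, 0.0, 1.0)
        x[i] += gamma * direction
    return x
```

Each iteration touches a single block only. That locality is what makes mini-batched and asynchronous variants natural, since different workers can solve different blocks' oracle problems concurrently.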

preprint2016arXiv

Productions of $η$, $ρ^0$ and $ϕ$ at large transverse momentum in Heavy ion Collisions

The suppression of $η$ meson production in relativistic heavy-ion collisions and the $η/π^0$ ratio are computed theoretically in the framework of perturbative QCD (pQCD) and agree well with the experimental data. We explore how hadron production ratios such as $η/π^0$ can further reveal information about the production suppression caused by the energy loss of energetic jets propagating through the QGP medium. We also present further studies of the vector mesons $ρ^0$ and $ϕ$ within the same framework. These are the first pQCD predictions for these mesons, and they describe the experimental measurements well. This paves the way toward a unified understanding of the strong suppression of single-hadron production at large transverse momentum, which is convincing evidence of the jet quenching effect.

preprint2016arXiv

Structured Compressive Sensing Based Spatio-Temporal Joint Channel Estimation for FDD Massive MIMO

Massive MIMO is a promising technique for future 5G communications due to its high spectral and energy efficiency. To realize its potential performance gain, accurate channel estimation is essential. However, because of the large number of antennas at the base station (BS), the pilot overhead required by conventional channel estimation schemes becomes unaffordable, especially for frequency division duplex (FDD) massive MIMO. To overcome this problem, we propose a structured compressive sensing (SCS)-based spatio-temporal joint channel estimation scheme. It reduces the required pilot overhead by leveraging the spatio-temporal common sparsity of delay-domain MIMO channels. Specifically, we first propose non-orthogonal pilots at the BS under the framework of CS theory to reduce the required pilot overhead. Then, we propose an adaptive structured subspace pursuit (ASSP) algorithm at the user. It jointly estimates the channels associated with multiple OFDM symbols from a limited number of pilots, exploiting the spatio-temporal common sparsity of MIMO channels to improve estimation accuracy. Moreover, by exploiting temporal channel correlation, we propose a space-time adaptive pilot scheme to further reduce the pilot overhead. Additionally, we discuss the proposed channel estimation scheme in multi-cell scenarios. Simulation results demonstrate that the proposed scheme accurately estimates channels with reduced pilot overhead and approaches the performance of the optimal oracle least squares estimator.

preprint2016arXiv

Understanding Audio Pattern Using Convolutional Neural Network From Raw Waveforms

One key step in audio signal processing is to transform the raw signal into representations that encode the original information efficiently. Traditionally, audio is transformed into spectral representations that describe the signal as a function of frequency, amplitude and phase. In this work, we take a purely data-driven approach to understanding the temporal dynamics of audio at the raw signal level. We maximize the information extracted from the raw signal with a deep convolutional neural network (CNN) model trained on the UrbanSound8K dataset. We find that salient audio patterns embedded in raw waveforms can be extracted efficiently through a combination of nonlinear filters learned by the CNN model.

preprint2016arXiv

Very Deep Convolutional Neural Networks for Raw Waveforms

Learning acoustic models directly from raw waveform data with minimal processing is challenging. Current waveform-based models have generally used very few (~2) convolutional layers, which might be insufficient for building high-level discriminative features. In this work, we propose very deep convolutional neural networks (CNNs) that directly use time-domain waveforms as inputs. Our CNNs, with up to 34 weight layers, are efficient to optimize over the very long sequences (e.g., vectors of length 32000) that are necessary for processing acoustic waveforms. This is achieved through batch normalization, residual learning, and a careful design of down-sampling in the initial layers. Our networks are fully convolutional and use neither fully connected layers nor dropout, to maximize representation learning. We use a large receptive field in the first convolutional layer to mimic bandpass filters, but very small receptive fields in subsequent layers to control the model capacity. We demonstrate the performance gains of deeper models. In our evaluation, the CNN with 18 weight layers outperforms the CNN with 3 weight layers by over 15% in absolute accuracy on an environmental sound recognition task. It also matches the performance of models using log-mel features.
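
The design choice of a wide, strided first layer followed by small kernels can be made concrete by tracking sequence length and receptive field through the stack. The layer configuration in the test below is hypothetical and not taken from the paper. 'Valid' padding is assumed, and pooling layers are treated like convolutions with the same kernel and stride.

```python
from typing import List, Tuple


def conv_stack_summary(input_len: int, layers: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Output length and receptive field (in samples) of a 1-D conv/pool stack.

    layers: list of (kernel_size, stride); 'valid' padding is assumed.
    """
    length, receptive, jump = input_len, 1, 1
    for kernel, stride in layers:
        length = (length - kernel) // stride + 1
        # Each layer widens the receptive field by (kernel - 1) input-level steps.
        receptive += (kernel - 1) * jump
        jump *= stride  # distance in input samples between adjacent outputs
    return length, receptive
```

For a 32000-sample input, a strided first layer shrinks the sequence roughly by its stride. Later 3-tap layers then grow the receptive field cheaply, because each output step already spans many input samples.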

preprint2015arXiv

$η$ meson production of high-energy nuclear collisions at NLO

The transverse momentum spectrum of the $η$ meson in relativistic heavy-ion collisions is studied at next-to-leading order (NLO) in perturbative QCD. The jet quenching effect in the QGP is incorporated through effectively medium-modified $η$ fragmentation functions in the higher-twist approach. We show that the theoretical simulations describe PHENIX data on $η$ mesons well in both $\rm p+p$ and central $\rm Au+Au$ collisions at RHIC. We also provide numerical predictions of $η$ spectra in central $\rm Pb+Pb$ collisions at $\sqrt{s_{NN}}=2.76$~TeV at the LHC. The $η/π^0$ ratios in $\rm p+p$ and central $\rm Au+Au$ collisions at $200$~GeV are found to overlap over a wide $p_T$ region, in good agreement with the $η/π^0$ ratio measured by PHENIX. We demonstrate that in the asymptotic region $p_{T} \rightarrow \infty$, the $η/π^{0}$ ratios in both $\rm Au+Au$ and $\rm p+p$ are determined almost entirely by quark-jet fragmentation. They therefore approach the value in $e^{+} e^{-}$ scattering. In addition, the almost identical gluon (quark) contribution fractions to $η$ and to $π$ result in a rather moderate variation of the $η/π^{0}$ distribution at intermediate and high $p_T$ in $\rm A+A$ relative to $\rm p+p$. At small $p_T$, by contrast, a slightly higher $η/π^{0}$ can be observed in $\rm Au+Au$. This is due to the larger suppression of the gluon contribution fraction to $π^{0}$ compared with that to $η$. A theoretical prediction for $η/ π^0$ at the LHC is also presented.

preprint2015arXiv

Merge Frame Design for Video Stream Switching using Piecewise Constant Functions

The ability to efficiently switch from one pre-encoded video stream to another (e.g., for bitrate adaptation or view switching) is important for many interactive streaming applications. Recently, stream-switching mechanisms based on distributed source coding (DSC) have been proposed. To reduce the overall transmission rate, these approaches provide a "merge" mechanism. Information is sent to the decoder such that the exact same frame can be reconstructed whenever any one of a known set of side information (SI) frames is available at the decoder (e.g., each SI frame may correspond to a different stream from which we are switching). However, the use of bit-plane coding and channel coding in many DSC approaches leads to complex encoding and decoding. In this paper, we propose an alternative approach for merging multiple SI frames that uses a piecewise constant (PWC) function as the merge operator. In our approach, a set of parameters of these PWC merge functions is transmitted for each block to be reconstructed. These parameters guarantee identical reconstruction from any of the known side information blocks. We consider two scenarios. In the first, a target frame is given, and the merge parameters are chosen so that this frame can be reconstructed exactly at the decoder. In the second, the reconstructed frame and the merge parameters are jointly optimized to meet a rate-distortion criterion. Experiments show that in both scenarios, our proposed merge techniques can outperform both a recent DSC-based approach and the SP-frame approach in H.264, in terms of both compression efficiency and decoder complexity.
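
Below is a minimal sketch of one piecewise constant merge operator. It assumes integer coefficients, an encoder that knows the target value and all SI values (the first scenario above), and a floor-based staircase with step $W$ and shift $c$ as the merge function. Every SI value within $W/2$ of the target lands on the same flat step and is mapped to the target. The function names and the choice of $W$ are illustrative and are not the paper's exact parameterization or rate-optimized design.

```python
import math
from typing import Sequence, Tuple


def choose_step(target: int, side_infos: Sequence[int]) -> int:
    """Smallest integer step W with |y - target| < W / 2 for every SI value y."""
    return 2 * max(abs(y - target) for y in side_infos) + 1


def merge_params(target: int, step: int) -> Tuple[int, int]:
    """Encoder side: transmit the step W and the shift c = target mod W."""
    return step, target % step


def pwc_merge(side_info: int, step: int, shift: int) -> int:
    """Decoder side: snap an SI value to the nearest point shift + k * step."""
    k = math.floor((side_info - shift) / step + 0.5)
    return shift + k * step
```

Whichever SI value the decoder happens to hold, the same $(W, c)$ pair yields the identical reconstruction. A larger spread among the SI values forces a larger $W$, which illustrates the rate cost that the second scenario trades against distortion.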

preprint2015arXiv

Petuum: A New Platform for Distributed Machine Learning on Big Data

What is a systematic way to efficiently apply a wide spectrum of advanced ML programs to industrial scale problems, using Big Models (up to 100s of billions of parameters) on Big Data (up to terabytes or petabytes)? Modern parallelization strategies employ fine-grained operations and scheduling beyond the classic bulk-synchronous processing paradigm popularized by MapReduce, or even specialized graph-based execution that relies on graph representations of ML programs. The variety of approaches tends to pull systems and algorithms design in different directions, and it remains difficult to find a universal platform applicable to a wide range of ML programs at scale. We propose a general-purpose framework that systematically addresses data- and model-parallel challenges in large-scale ML, by observing that many ML programs are fundamentally optimization-centric and admit error-tolerant, iterative-convergent algorithmic solutions. This presents unique opportunities for an integrative system design, such as bounded-error network synchronization and dynamic scheduling based on ML program structure. We demonstrate the efficacy of these system designs versus well-known implementations of modern ML algorithms, allowing ML programs to run in much less time and at considerably larger model sizes, even on modestly-sized compute clusters.

preprint2015arXiv

Strategies and Principles of Distributed Machine Learning on Big Data

The rise of Big Data has led to new demands for Machine Learning (ML) systems to learn complex models with millions to billions of parameters, that promise adequate capacity to digest massive datasets and offer powerful predictive analytics thereupon. In order to run ML algorithms at such scales, on a distributed cluster with 10s to 1000s of machines, it is often the case that significant engineering efforts are required --- and one might fairly ask if such engineering truly falls within the domain of ML research or not. Taking the view that Big ML systems can benefit greatly from ML-rooted statistical and algorithmic insights --- and that ML researchers should therefore not shy away from such systems design --- we discuss a series of principles and strategies distilled from our recent efforts on industrial-scale ML solutions. These principles and strategies span a continuum from application, to engineering, and to theoretical research and development of Big ML systems and architectures, with the goal of understanding how to make them efficient, generally-applicable, and supported with convergence and scaling guarantees. They concern four key questions which traditionally receive little attention in ML research: How to distribute an ML program over a cluster? How to bridge ML computation with inter-machine communication? How to perform such communication? What should be communicated between machines? By exposing underlying statistical and algorithmic characteristics unique to ML programs but not typically seen in traditional computer programs, and by dissecting successful cases to reveal how we have harnessed these principles to design and develop both high-performance distributed ML software as well as general-purpose ML frameworks, we present opportunities for ML researchers and practitioners to further shape and grow the area that lies between ML and systems.

preprint2015arXiv

Structured Matching Pursuit for Reconstruction of Dynamic Sparse Channels

In this paper, we propose the structured matching pursuit (SMP) algorithm for reconstructing dynamic sparse channels. It exploits a special feature of the temporal correlation of these channels: path delays change slowly over time, while path gains evolve faster. Specifically, the SMP algorithm divides the path delays of dynamic sparse channels into two parts that are treated separately: the common channel taps and the dynamic channel taps. Based on this separation, the SMP algorithm first detects the common channel taps jointly across all time slots. It then tracks the dynamic channel taps in each time slot individually. Theoretical analysis guarantees that the common channel taps are detected with high probability and that the reconstruction distortion of dynamic sparse channels is linearly upper bounded by the noise power. Simulation results demonstrate that the SMP algorithm achieves excellent reconstruction performance at a computational complexity competitive with conventional reconstruction algorithms.
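
SMP's first stage detects the taps shared by all time slots jointly. As a stand-in for that stage, the sketch below uses simultaneous orthogonal matching pursuit (SOMP), a standard joint-sparsity greedy method, rather than the paper's SMP. Correlations are summed across slots, so a tap that is active in every slot accumulates evidence. The shared measurement matrix, the known number of common taps, and noise-free observations are assumptions. The per-slot tracking of dynamic taps is omitted.

```python
from typing import List, Tuple

import numpy as np


def common_support(Phi: np.ndarray, Y: np.ndarray, n_common: int) -> Tuple[List[int], np.ndarray]:
    """SOMP-style joint support detection across time slots.

    Phi: (m, n) measurement matrix shared by all slots.
    Y:   (m, T) observations, one column per time slot.
    Returns the selected column indices and their per-slot gains (n_common, T).
    """
    support: List[int] = []
    residual = Y.copy()
    coef = np.zeros((0, Y.shape[1]))
    for _ in range(n_common):
        # Sum correlation magnitudes over all slots: common taps score in every slot.
        scores = np.sum(np.abs(Phi.T @ residual), axis=1)
        scores[support] = -np.inf  # never reselect a tap
        support.append(int(np.argmax(scores)))
        sub = Phi[:, support]
        coef, *_ = np.linalg.lstsq(sub, Y, rcond=None)  # joint least-squares refit
        residual = Y - sub @ coef
    return support, coef
```

Pooling the evidence across slots is what lets a joint detector find common taps more reliably than running a single-slot pursuit in each slot.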

preprint2015arXiv

Tracking A Dynamic Sparse Channel Via Differential Orthogonal Matching Pursuit

This paper considers the problem of tracking a dynamic sparse channel in a broadband wireless communication system. We first propose a probabilistic signal model that describes a special feature of the temporal correlation of dynamic sparse channels: path delays change slowly over time, while path gains evolve faster. Based on this temporal correlation, we then propose the differential orthogonal matching pursuit (D-OMP) algorithm. It tracks a dynamic sparse channel sequentially by updating the small channel variation over time. Simulation results demonstrate that, compared with other channel tracking algorithms, D-OMP tracks dynamic sparse channels faster and more accurately.
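
D-OMP builds on orthogonal matching pursuit (OMP). The sketch below shows standard OMP together with a hypothetical one-line "differential" tracking step that fits only a sparse change relative to the previous estimate. This is a simplification for illustration, not the paper's probabilistic D-OMP algorithm. Noise-free measurements and a known sparsity of the change are assumed.

```python
import numpy as np


def omp(Phi: np.ndarray, y: np.ndarray, sparsity: int) -> np.ndarray:
    """Orthogonal matching pursuit: greedy atom selection plus least-squares refit."""
    support = []
    residual = y.copy()
    coef = np.zeros(0)
    for _ in range(sparsity):
        correlations = np.abs(Phi.T @ residual)
        correlations[support] = -np.inf  # do not pick the same atom twice
        support.append(int(np.argmax(correlations)))
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef  # residual is orthogonal to chosen atoms
    x = np.zeros(Phi.shape[1])
    x[support] = coef
    return x


def track_step(Phi: np.ndarray, y_t: np.ndarray, h_prev: np.ndarray, delta_sparsity: int) -> np.ndarray:
    """Hypothetical differential update: sparse-fit only the change since h_prev."""
    return h_prev + omp(Phi, y_t - Phi @ h_prev, delta_sparsity)
```

Because the change between consecutive channels is much sparser than the channel itself, each tracking step needs only a few greedy iterations.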

preprint2014arXiv

$L^{p}$ estimates for bilinear and multi-parameter Hilbert transforms

C. Muscalu, J. Pipher, T. Tao and C. Thiele proved in \cite{MPTT1} that the standard bilinear and bi-parameter Hilbert transform does not satisfy any $L^{p}$ estimates. They also asked whether a bilinear and bi-parameter multiplier operator defined by $$ T_{m}(f_{1},f_{2})(x):=\int_{\mathbb{R}^{4}}m(ξ,η)\hat{f_{1}}(ξ_{1},η_{1})\hat{f_{2}}(ξ_{2},η_{2})e^{2πix\cdot((ξ_{1},η_{1})+(ξ_{2},η_{2}))}\,dξ\,dη$$ satisfies any $L^p$ estimates, where the symbol $m$ satisfies $$ |\partial_ξ^α\partial_η^β m(ξ,η)|\lesssim\frac{1}{\operatorname{dist}(ξ,Γ_{1})^{|α|}}\cdot\frac{1}{\operatorname{dist}(η,Γ_{2})^{|β|}} $$ for sufficiently many multi-indices $α=(α_{1},α_{2})$ and $β=(β_{1},β_{2})$, and where $Γ_{i}$ ($i=1,2$) are subspaces of $\mathbb{R}^{2}$ with $\dim Γ_{1}=0, \, \dim Γ_{2}=1$. P. Silva partially answered this question in \cite{S}. He proved that $T_{m}$ maps $L^{p_1}\times L^{p_2}\rightarrow L^{p}$ boundedly when $\frac{1}{p_1}+\frac{1}{p_2}=\frac{1}{p}$ with $p_1, p_2>1$, $\frac{1}{p_1}+\frac{2}{p_2}<2$ and $\frac{1}{p_2}+\frac{2}{p_1}<2$. The admissible range of tuples $(p_1,p_2,p)$ here is a proper subset of the admissible range of the bilinear Hilbert transform (BHT). In this paper, we establish the same $L^{p}$ estimates as for BHT, in the full range, for bilinear and multi-parameter Hilbert transforms with arbitrary symbols satisfying appropriate decay assumptions (Theorem 1.3). Moreover, we establish the same $L^p$ estimates as for BHT for certain modified bilinear and bi-parameter Hilbert transforms with $\dim Γ_{1}=\dim Γ_{2}=1$, but with slightly better decay than that of the bilinear and bi-parameter Hilbert transform (Theorem 1.4).

preprint2014arXiv

$L^{p}$ estimates for the bilinear Hilbert transform for $1/2<p\leq2/3$: A counterexample and generalizations to non-smooth symbols

M. Lacey and C. Thiele proved in [27] (Annals of Math. (1997)) and [28] (Annals of Math. (1999)) that the bilinear Hilbert transform maps $L^{p_1}\times L^{p_2}\rightarrow L^{p}$ boundedly when $\frac{1}{p_1}+\frac{1}{p_2}=\frac{1}{p}$ with $1<p_{1}, \, p_{2}\leq\infty$ and $\frac{2}{3}<p<\infty$. Whether the $L^p$ estimates hold in the range $p\in (1/2,2/3]$ has remained an open problem since then. In this paper, we prove that the bilinear Hilbert transform does not map $\mathcal{F}L^{p'_{1}}\times L^{p_{2}}\rightarrow L^{p}$ for $p_1<2$ and $L^{p_{1}}\times \mathcal{F}L^{p'_{2}}\rightarrow L^{p}$ for $p_2<2$ boundedly (Theorem 1.2). In particular, this shows that the bilinear Hilbert transform neither maps $\mathcal{F}L^{p'_{1}}\times L^{p_{2}}\rightarrow L^{p}$ nor $L^{p_{1}}\times \mathcal{F}L^{p'_{2}}\rightarrow L^{p}$ for $\frac{1}{2}<p<\frac{2}{3}$. Nevertheless, we can establish $L^p$ estimates for the bilinear Fourier multipliers whose symbols are not identical to but arbitrarily close to that of the bilinear Hilbert transform in the full range $p\in(1/2,\infty)$ (Theorem 1.3).

preprint2014arXiv

High-Performance Distributed ML at Scale through Parameter Server Consistency Models

As Machine Learning (ML) applications increase in data size and model complexity, practitioners turn to distributed clusters to satisfy the increased computational and memory demands. Unfortunately, effective use of clusters for ML requires considerable expertise in writing distributed code, while highly-abstracted frameworks like Hadoop have not, in practice, approached the performance seen in specialized ML implementations. The recent Parameter Server (PS) paradigm is a middle ground between these extremes, allowing easy conversion of single-machine parallel ML applications into distributed ones, while maintaining high throughput through relaxed "consistency models" that allow inconsistent parameter reads. However, due to insufficient theoretical study, it is not clear which of these consistency models can really ensure correct ML algorithm output; at the same time, there remain many theoretically-motivated but undiscovered opportunities to maximize computational throughput. Motivated by this challenge, we study both the theoretical guarantees and empirical behavior of iterative-convergent ML algorithms in existing PS consistency models. We then use the gleaned insights to improve a consistency model using an "eager" PS communication mechanism, and implement it as a new PS system that enables ML algorithms to reach their solution more quickly.

preprint2014arXiv

LightLDA: Big Topic Models on Modest Compute Clusters

When building large-scale machine learning (ML) programs, such as big topic models or deep neural nets, one usually assumes such tasks can only be attempted with industrial-sized clusters with thousands of nodes, which are out of reach for most practitioners or academic researchers. We consider this challenge in the context of topic modeling on web-scale corpora, and show that with a modest cluster of as few as 8 machines, we can train a topic model with 1 million topics and a 1-million-word vocabulary (for a total of 1 trillion parameters), on a document collection with 200 billion tokens -- a scale not yet reported even with thousands of machines. Our major contributions include: 1) a new, highly efficient O(1) Metropolis-Hastings sampling algorithm, whose running cost is (surprisingly) agnostic of model size, and empirically converges nearly an order of magnitude faster than current state-of-the-art Gibbs samplers; 2) a structure-aware model-parallel scheme, which leverages dependencies within the topic model, yielding a sampling strategy that is frugal on machine memory and network communication; 3) a differential data-structure for model storage, which uses separate data structures for high- and low-frequency words to allow extremely large models to fit in memory, while maintaining high inference speed; and 4) a bounded asynchronous data-parallel scheme, which allows efficient distributed processing of massive data via a parameter server. Our distribution strategy is an instance of the model-and-data-parallel programming model underlying the Petuum framework for general distributed ML, and was implemented on top of the Petuum open-source system. We provide experimental evidence showing how this development puts massive models within reach on a small cluster while still enjoying proportional time cost reductions with increasing cluster size, in comparison with alternative options.
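
O(1) Metropolis-Hastings samplers of this kind typically draw proposals from alias tables. A table is built once in O(n) and then supports constant-time draws, with the Metropolis-Hastings acceptance step correcting for proposals that come from a slightly stale distribution. The sketch below shows only that building block, Vose's alias method. It is not the LightLDA sampler itself, and the topic model and acceptance step are omitted.

```python
import random
from typing import List, Sequence, Tuple


def build_alias_table(probs: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Vose's alias method: O(n) construction for a distribution summing to 1."""
    n = len(probs)
    scaled = [p * n for p in probs]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, big = small.pop(), large.pop()
        prob[s] = scaled[s]              # keep column s with probability scaled[s] ...
        alias[s] = big                   # ... otherwise fall through to `big`
        scaled[big] -= 1.0 - scaled[s]   # `big` donated the remainder of column s
        (small if scaled[big] < 1.0 else large).append(big)
    for i in small + large:              # leftovers equal 1 up to rounding error
        prob[i] = 1.0
    return prob, alias


def alias_sample(prob: List[float], alias: List[int], rng=random) -> int:
    """Draw one index in O(1): pick a column uniformly, then flip a biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

The sampling cost does not depend on the number of outcomes, which matches the abstract's point that the sampler's running cost is agnostic of model size.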

preprint2014arXiv

Power Allocation in Compressed Sensing of Non-uniformly Sparse Signals

This paper studies the problem of power allocation in compressed sensing when different components of the unknown sparse signal have different probabilities of being non-zero. Given prior information on the non-uniform sparsity and a total power budget, we ask how to optimally allocate power across the columns of a Gaussian random measurement matrix so that the mean squared error (MSE) of the reconstruction is minimized. Our analysis is based on the state evolution technique originating from the work of Donoho, Maleki, and Montanari. Using it, we revise the so-called approximate message passing (AMP) algorithm for reconstruction and quantify its MSE performance in the asymptotic regime. We then obtain the optimal power allocation in closed form. The results show that, in the presence of measurement noise, uniform power allocation is not optimal for non-uniformly sparse signals. Uniform allocation corresponds to the commonly used Gaussian random matrix with i.i.d. entries. Empirical results are presented to demonstrate the performance gain.

preprint2013arXiv

$L^{p}$ estimates for multi-linear and multi-parameter pseudo-differential operators

We establish the pseudo-differential variant of the $L^{p}$ estimates for multi-linear and multi-parameter Coifman-Meyer multiplier operators proved by C. Muscalu, J. Pipher, T. Tao and C. Thiele in \cite{MPTT1,MPTT2}.

preprint2013arXiv

Consistent Bounded-Asynchronous Parameter Servers for Distributed ML

In distributed ML applications, shared parameters are usually replicated among computing nodes to minimize network overhead. Therefore, a proper consistency model must be carefully chosen to ensure the algorithm's correctness and provide high throughput. Existing consistency models used in general-purpose databases and modern distributed ML systems are either too loose to guarantee correctness of the ML algorithms or too strict, and thus fail to fully exploit the computing power of the underlying distributed system. Many ML algorithms fall into the category of iterative convergent algorithms, which start from a randomly chosen initial point and converge to optima by iteratively repeating a set of procedures. We have found that many such algorithms are robust to a bounded amount of inconsistency and still converge correctly. This property allows distributed ML to relax strict consistency models to improve system performance while theoretically guaranteeing algorithmic correctness. In this paper, we present several relaxed consistency models for asynchronous parallel computation and theoretically prove their algorithmic correctness. The proposed consistency models are implemented in a distributed parameter server and evaluated in the context of a popular ML application: topic modeling.
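One simple way to bound inconsistency is a clock-bounded rule: each worker counts its completed iterations, and a worker may start a new iteration only if it is at most `staleness` clocks ahead of the slowest worker. The toy coordinator below sketches that rule; the class and method names are hypothetical, and the paper's exact consistency models are not reproduced.

```python
class BoundedStalenessClock:
    """Toy coordinator for a clock-bounded consistency rule (hypothetical API).

    clocks[w] is the number of iterations worker w has completed. A worker at
    clock c may start iteration c only if every worker has completed at least
    c - staleness iterations, so it never computes on parameters missing
    updates older than `staleness` clocks.
    """

    def __init__(self, num_workers, staleness):
        self.clocks = [0] * num_workers
        self.staleness = staleness

    def can_proceed(self, worker):
        return self.clocks[worker] - min(self.clocks) <= self.staleness

    def tick(self, worker):
        """Run one iteration for `worker` if allowed; return True on success."""
        if not self.can_proceed(worker):
            return False  # blocked: must wait for stragglers to catch up
        self.clocks[worker] += 1
        return True
```

With `staleness=0` this degenerates to bulk-synchronous execution; larger values let fast workers run ahead, trading freshness for throughput.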

preprint2013arXiv

Momentum imbalance of isolated photon-tagged jet production at RHIC and LHC

In collisions of ultra-relativistic nuclei, photon-tagged jets provide a unique opportunity to compare jet production and modification due to parton shower formation and propagation in strongly interacting matter at vastly different center-of-mass energies. We present the first results for the cross sections of jets tagged by an isolated photon to ${\cal O}(α_{\rm em} α_s^2)$ in central Au+Au reactions with $\sqrt{s_{NN}}=200$ GeV at RHIC and central Pb+Pb reactions with $\sqrt{s_{NN}}=2.76$ TeV at LHC. We evaluate the increase in the transverse momentum imbalance of the observed $γ$+jet state, induced by the dissipation of the parton shower energy due to strong final-state interactions. Theoretical predictions to help interpret recent and upcoming experimental data are presented.
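The abstract does not define the imbalance observable. A widely used choice in the γ+jet literature is the ratio $x_{J\gamma}=p_T^{\rm jet}/p_T^{\gamma}$, which falls when the jet loses energy in the medium while the photon does not. The sketch below assumes that definition and, purely for illustration, models quenching as a uniform fractional jet pT loss (`jet_energy_loss`, a hypothetical parameter); it is not the paper's perturbative QCD calculation.

```python
def x_jgamma(photon_pt, jet_pt):
    """Momentum-balance ratio x_Jgamma = pT(jet) / pT(photon)."""
    return jet_pt / photon_pt

def mean_imbalance(events, jet_energy_loss=0.0):
    """Average x_Jgamma over (photon_pt, jet_pt) pairs.

    jet_energy_loss is a hypothetical fractional pT loss applied to every jet,
    a crude stand-in for in-medium shower dissipation; the isolated photon
    does not interact strongly, so its pT is left unchanged.
    """
    ratios = [x_jgamma(pg, pj * (1.0 - jet_energy_loss)) for pg, pj in events]
    return sum(ratios) / len(ratios)
```

A lower mean $x_{J\gamma}$ in A+A than in p+p is the kind of shift the abstract refers to as an increased transverse momentum imbalance.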

preprint2013arXiv

Nonparametric Independence Screening in Sparse Ultra-High Dimensional Varying Coefficient Models

The varying-coefficient model is an important nonparametric statistical model that allows us to examine how the effects of covariates vary with exposure variables. When the number of covariates is large, the issue of variable selection arises. In this paper, we propose and investigate marginal nonparametric screening methods to screen variables in ultra-high dimensional sparse varying-coefficient models. The proposed nonparametric independence screening (NIS) selects variables by ranking a measure of the nonparametric marginal contribution of each covariate given the exposure variable. The sure independence screening property is established under some mild technical conditions when the dimensionality is of nonpolynomial order, and the dimensionality reduction of NIS is quantified. To enhance practical utility and finite-sample performance, two data-driven iterative NIS methods are proposed for selecting thresholding parameters and variables: conditional permutation and greedy methods, resulting in Conditional-INIS and Greedy-INIS. The effectiveness and flexibility of the proposed methods are further illustrated by simulation studies and real data applications.
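The screening idea can be sketched as: fit each covariate's marginal model $Y\approx\beta_j(U)X_j$ separately, score it by the size of the fitted contribution, and keep the top-ranked covariates. This is a simplified sketch under stated assumptions: the coefficient function is piecewise constant over bins of $U\in[0,1)$ (a degree-0 spline standing in for smoother fits), the marginal model has no intercept term, and the function names and toy data are hypothetical.

```python
import random

def marginal_utility(x, y, u, n_bins=5):
    """Mean of (beta_hat(U_i) X_i)^2 for the marginal fit Y ~ beta(U) X.

    beta(.) is estimated as piecewise constant on equal-width bins of U in [0, 1).
    """
    sxy, sxx = [0.0] * n_bins, [0.0] * n_bins
    bins = [min(int(ui * n_bins), n_bins - 1) for ui in u]
    for b, xi, yi in zip(bins, x, y):
        sxy[b] += xi * yi
        sxx[b] += xi * xi
    beta = [sxy[b] / sxx[b] if sxx[b] > 0 else 0.0 for b in range(n_bins)]
    return sum((beta[b] * xi) ** 2 for b, xi in zip(bins, x)) / len(x)

def nis_rank(X_cols, y, u, n_bins=5):
    """Covariate indices sorted by marginal utility, largest first."""
    scores = [marginal_utility(x, y, u, n_bins) for x in X_cols]
    return sorted(range(len(X_cols)), key=lambda j: scores[j], reverse=True)

def make_toy_data(n=400, p=10, seed=1):
    """Hypothetical example: only covariate 0 matters, with coefficient 2(1 + U)."""
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n)]
    X_cols = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
    y = [2.0 * (1.0 + u[i]) * X_cols[0][i] + 0.5 * rng.gauss(0.0, 1.0)
         for i in range(n)]
    return X_cols, y, u
```

Screening then keeps the first d indices of `nis_rank(...)`; the iterative variants in the abstract choose that cut-off from the data instead of fixing it.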

preprint2012arXiv

Continuous dependence for $H^{2}$ critical nonlinear Schrödinger equations in high dimensions

The global existence of solutions in $H^{2}$ is well known for $H^{2}$-critical nonlinear Schrödinger equations with small initial data in high dimensions $d\geq8$. However, even though the solution is constructed by a fixed-point technique, continuous dependence in $H^{2}$ does not follow from the contraction mapping argument. Compared with the low-dimensional cases $4<d<8$, there is an obstruction to this approach because of the sub-quadratic nature of the nonlinearity (which makes the derivative of the nonlinearity non-Lipschitz). In this paper, we resolve this difficulty by applying exotic Strichartz spaces of lower order instead, and show that the solution depends continuously on the initial value in the sense that the local flow is continuous $H^{2}\rightarrow H^{2}$.

preprint2012arXiv

Continuous Dependence of Cauchy Problem For Nonlinear Schrödinger Equation in $H^{s}$

We consider the Cauchy problem for the nonlinear Schrödinger equation $i \partial_{t}u+ Δu=λ_{0}u+λ_{1}|u|^αu$ in $\mathbb{R}^{N}$, where $λ_{0},λ_{1}\in\mathbb{C}$, in the $H^s$-subcritical and critical cases: $0<α\leq\frac{4}{N-2s}$ when $1<s<\frac{N}{2}$ and $0<α<+\infty$ when $s\geq\frac{N}{2}$. We show that the solution depends continuously on the initial value in the standard sense in $H^{s}(\mathbb{R}^{N})$ if $α$ satisfies certain assumptions.

preprint2012arXiv

Technical Report: Observability of a Linear System under Sparsity Constraints

Consider an n-dimensional linear system where it is known that there are at most k<n non-zero components in the initial state. This report studies the observability problem for such a system, that is, the recovery of the initial state. We obtain sufficient conditions on the number of available observations for the initial state to be recovered exactly. Both deterministic and stochastic setups are considered for the system dynamics. In the former setting, the system matrices are known deterministically, whereas in the latter setting, all of the matrices are picked from a randomized class of matrices. The main message is that when the initial state is known to be sparse, one does not need all n observations to uniquely identify it, even when the observations are picked randomly.
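A minimal sketch of the deterministic setting, assuming a single-output system $y_t = C A^t x_0$: stacking $m<n$ observations gives an observability matrix, and if every $2k$ of its columns are linearly independent, a $k$-sparse $x_0$ is the only $k$-sparse vector consistent with the data. The exhaustive ($\ell_0$) search below is practical only for tiny n and is not the report's method; `random_system` and the other helper names are hypothetical.

```python
import itertools
import random

def random_system(n, seed=0):
    """Hypothetical test system: Gaussian A (scaled by 1/sqrt(n)) and a single-output C."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, n ** -0.5) for _ in range(n)] for _ in range(n)]
    C = [[rng.gauss(0.0, 1.0) for _ in range(n)]]
    return A, C

def observability_rows(A, C, m):
    """Stack C, CA, ..., CA^(m-1): row t maps x0 to the observation y_t = C A^t x0."""
    n = len(A)
    rows, M = [], [row[:] for row in C]             # M = C A^t, starting at t = 0
    for _ in range(m):
        rows.extend(r[:] for r in M)
        M = [[sum(r[k] * A[k][j] for k in range(n)) for j in range(n)] for r in M]
    return rows

def lstsq_residual(O, y, support):
    """Least-squares fit of y on the columns of O in `support` via normal equations."""
    k = len(support)
    G = [[sum(r[a] * r[b] for r in O) for b in support]
         + [sum(r[a] * yi for r, yi in zip(O, y))] for a in support]
    for i in range(k):                              # Gaussian elimination, partial pivoting
        piv = max(range(i, k), key=lambda r: abs(G[r][i]))
        G[i], G[piv] = G[piv], G[i]
        for r in range(i + 1, k):
            f = G[r][i] / G[i][i]
            G[r] = [a - f * b for a, b in zip(G[r], G[i])]
    coef = [0.0] * k
    for i in reversed(range(k)):                    # back substitution
        coef[i] = (G[i][k] - sum(G[i][j] * coef[j] for j in range(i + 1, k))) / G[i][i]
    res = sum((yi - sum(r[s] * c for s, c in zip(support, coef))) ** 2
              for r, yi in zip(O, y))
    return res, coef

def recover_sparse_state(O, y, k):
    """Exhaustive search over all size-k supports; return the best-fitting x0."""
    n = len(O[0])
    best = None
    for S in itertools.combinations(range(n), k):
        res, coef = lstsq_residual(O, y, S)
        if best is None or res < best[0]:
            best = (res, coef, S)
    x0 = [0.0] * n
    for s, c in zip(best[2], best[1]):
        x0[s] = c
    return x0
```

For example, with n = 8 and k = 2, four scalar observations (2k) generically suffice here instead of eight.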

preprint2011arXiv

Some Results on the Scattering Theory for Nonlinear Schrödinger Equations in Weighted $L^{2}$ Space

We investigate the scattering theory for the nonlinear Schrödinger equation $i \partial_{t}u+ Δu+λ|u|^αu=0$ in $Σ=H^{1}(\mathbb{R}^{d})\cap L^{2}(|x|^{2};dx)$. We show that scattering states $u^{\pm}$ exist in $Σ$ when $α_{d}<α<\frac{4}{d-2}$, $d\geq3$, $λ\in \mathbb{R}$ with a certain smallness assumption on the initial data $u_{0}$, and when $α(d)\leq α< \frac{4}{d-2}$ ($α\in [α(d), \infty)$, if $d=1,2$), $λ>0$ under suitable conditions on $u_{0}$, where $α_{d}$, $α(d)$ are the positive roots of the polynomials $dx^{2}+dx-4$ and $dx^{2}+(d-2)x-4$ respectively. In particular, when $λ>0$, we obtain the existence of $u^{\pm}$ in $Σ$ for $u_{0}$ below a mass-energy threshold $M[u_{0}]^σE[u_{0}]<λ^{-2τ}M[Q]^σE[Q]$ and satisfying a mass-gradient bound $\|u_{0}\|_{L^{2}}^σ\|\nabla u_{0}\|_{L^{2}}<λ^{-τ}\|Q\|_{L^{2}}^σ\|\nabla Q\|_{L^{2}}$ with $\frac{4}{d}<α<\frac{4}{d-2}$ ($α\in (\frac{4}{d}, \infty)$, if $d=1,2$), and also for oscillating data at the critical power $α=α(d)$, where $σ=\frac{4-(d-2)α}{αd-4}$, $τ=\frac{2}{αd-4}$ and $Q$ is the ground state. We also study the convergence of $u(t)$ to the free solution $e^{itΔ}u^{\pm}$ in $Σ$, where $u^{\pm}$ is the scattering state at $\pm\infty$ respectively.
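The two threshold exponents have closed forms, $α_d=\frac{-d+\sqrt{d^2+16d}}{2d}$ and $α(d)=\frac{2-d+\sqrt{d^2+12d+4}}{2d}$, obtained by taking the positive root of each quadratic. A small helper for evaluating them, together with the exponents $σ$ and $τ$ in the mass-energy threshold, might look like this (function names are hypothetical):

```python
import math

def positive_root(a, b, c):
    """Positive root of a x^2 + b x + c; with a > 0 and c < 0 there is exactly one."""
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

def alpha_d(d):
    """alpha_d: positive root of d x^2 + d x - 4."""
    return positive_root(d, d, -4)

def alpha_of_d(d):
    """alpha(d): positive root of d x^2 + (d - 2) x - 4."""
    return positive_root(d, d - 2, -4)

def sigma_tau(alpha, d):
    """sigma = (4 - (d-2) alpha) / (alpha d - 4) and tau = 2 / (alpha d - 4)."""
    denom = alpha * d - 4.0
    if denom <= 0:
        raise ValueError("requires alpha > 4/d (mass-supercritical range)")
    return (4.0 - (d - 2) * alpha) / denom, 2.0 / denom
```

For d = 3 this gives α(3) = 1 and α_3 = (√57 - 3)/6 ≈ 0.758, both below the mass-critical power 4/3.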

preprint2011arXiv

Structured sublinear compressive sensing via belief propagation

Compressive sensing (CS) is a sampling technique designed for reducing the complexity of sparse data acquisition. One of the major obstacles for practical deployment of CS techniques is the signal reconstruction time and the high storage cost of random sensing matrices. We propose a new structured compressive sensing scheme, based on codes on graphs, that allows for a joint design of structured sensing matrices and logarithmic-complexity reconstruction algorithms. The compressive sensing matrices can be shown to offer asymptotically optimal performance when used in combination with Orthogonal Matching Pursuit (OMP) methods. For more elaborate greedy reconstruction schemes, we propose a new family of list decoding belief propagation algorithms, as well as reinforced- and multiple-basis belief propagation algorithms. Our simulation results indicate that reinforced BP CS schemes offer very good complexity-performance tradeoffs for very sparse signal vectors.
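Orthogonal Matching Pursuit, the greedy decoder mentioned above, can be sketched as follows: repeatedly pick the column most correlated with the current residual, then refit the measurements on all picked columns. This sketch uses a dense Gaussian matrix (`gaussian_columns`, a hypothetical helper) rather than the paper's codes-on-graphs construction, so it illustrates OMP itself, not the logarithmic-complexity scheme or the belief-propagation decoders.

```python
import math
import random

def gaussian_columns(m, n, seed=0):
    """Hypothetical measurement matrix: n unit-norm Gaussian columns of length m."""
    rng = random.Random(seed)
    cols = []
    for _ in range(n):
        c = [rng.gauss(0.0, 1.0) for _ in range(m)]
        s = math.sqrt(sum(v * v for v in c))
        cols.append([v / s for v in c])
    return cols

def omp(columns, y, k, tol=1e-10):
    """Greedily select up to k columns; the refit uses an incremental QR factorization."""
    dot = lambda a, b: sum(p * q for p, q in zip(a, b))
    support, Q, R = [], [], []          # column support[i] = sum_l R[i][l] * Q[l]
    residual = y[:]
    for _ in range(k):
        if math.sqrt(dot(residual, residual)) < tol:
            break
        j = max((i for i in range(len(columns)) if i not in support),
                key=lambda i: abs(dot(columns[i], residual)))
        v, r_col = columns[j][:], []
        for q in Q:                      # modified Gram-Schmidt against chosen columns
            c = dot(q, v)
            r_col.append(c)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(dot(v, v))
        q_new = [vi / norm for vi in v]
        Q.append(q_new)
        R.append(r_col + [norm])
        support.append(j)
        proj = dot(q_new, residual)      # residual is already orthogonal to older Q
        residual = [ri - proj * qi for ri, qi in zip(residual, q_new)]
    b = [dot(q, y) for q in Q]
    coef = [0.0] * len(Q)
    for i in reversed(range(len(Q))):    # back substitution with the triangular factor
        coef[i] = (b[i] - sum(R[l][i] * coef[l] for l in range(i + 1, len(Q)))) / R[i][i]
    x = [0.0] * len(columns)
    for j, c in zip(support, coef):
        x[j] = c
    return x
```

Keeping an orthonormal basis of the selected columns means each refit only projects out one new direction, rather than re-solving the whole least-squares problem.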

preprint2010arXiv

A Geometric Approach to Low-Rank Matrix Completion

The low-rank matrix completion problem can be succinctly stated as follows: given a subset of the entries of a matrix, find a low-rank matrix consistent with the observations. While several low-complexity algorithms for matrix completion have been proposed so far, it remains an open problem to devise search procedures with provable performance guarantees for a broad class of matrix models. The standard approach to the problem, which involves the minimization of an objective function defined using the Frobenius metric, has inherent difficulties: the objective function is not continuous and the solution set is not closed. To address this problem, we consider an optimization procedure that searches for a column (or row) space that is geometrically consistent with the partial observations. The geometric objective function is continuous everywhere and the solution set is the closure of the solution set of the Frobenius metric. We also preclude the existence of local minimizers, and hence establish strong performance guarantees, for special completion scenarios, which do not require matrix incoherence or large matrix size.
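To make the discontinuity concrete, here is a sketch of a Frobenius-type objective for the rank-1 case, assuming it takes the form $\sum_j \min_{c_j}\|x_{\Omega_j}-c_j u_{\Omega_j}\|^2$ over candidate column spaces span{u}. That form is our assumption (the abstract does not give the formula), and the paper's geometric objective is not reproduced. The residual depends only on the subspace, yet it jumps when u vanishes on the observed rows of a column, because that column can then no longer be fitted at all. The function name is hypothetical.

```python
def column_space_residual(u, observed):
    """Sum over columns of min_c ||x_Omega - c u_Omega||^2 for the candidate span{u}.

    `observed` maps column index -> {row index: observed value}. The optimal
    coefficient is c_j = <u_Omega, x_Omega> / <u_Omega, u_Omega>; a total of
    zero means span{u} is consistent with every observation.
    """
    total = 0.0
    for entries in observed.values():
        uu = sum(u[i] ** 2 for i in entries)
        ux = sum(u[i] * v for i, v in entries.items())
        xx = sum(v * v for v in entries.values())
        total += xx - (ux * ux / uu if uu > 0 else 0.0)
    return total
```

For instance, with one column observed only at row 0, the residual is 0 for u = (1e-9, 1) but equals the full squared entry for u = (0, 1), although the two subspaces are arbitrarily close.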

preprint2010arXiv

Commutative-like Encryption: A New Characterization of ElGamal

Commutative encryption is a useful but rather strict notion in cryptography. In this paper, we define a loose variant of commutative encryption, commutative-like encryption, and give an example: a generalization of the ElGamal scheme. Applications of the new variant are also discussed.
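The abstract does not state the formal definition, so the following is only an illustration of one property in this spirit: textbook ElGamal is not strictly commutative (fresh randomness changes the ciphertext), but if a message is encrypted under key A and then under key B, the two layers can be removed in either order. The parameters below are toy values (the Mersenne prime $2^{127}-1$ with base 3) and are not secure; all function names are hypothetical.

```python
import random

P = 2 ** 127 - 1   # Mersenne prime; toy parameters, NOT a secure ElGamal group
G = 3

def keygen(rng):
    """Return (private x, public h = g^x mod p)."""
    x = rng.randrange(2, P - 1)
    return x, pow(G, x, P)

def encrypt(m, h, rng):
    """Textbook ElGamal: (g^r, m * h^r) with fresh randomness r."""
    r = rng.randrange(2, P - 1)
    return pow(G, r, P), m * pow(h, r, P) % P

def strip_layer(c1, body, x):
    """Remove one layer: multiply by c1^(-x), computed as c1^(p-1-x) mod p."""
    return body * pow(c1, P - 1 - x, P) % P
```

Double encryption keeps both ephemeral values, giving a ciphertext $(g^r, g^s, m\,h_A^r h_B^s)$; since the masks multiply, either key holder can strip its layer first.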

preprint2010arXiv

SET: an algorithm for consistent matrix completion

A new algorithm, termed subspace evolution and transfer (SET), is proposed for solving the consistent matrix completion problem. In this setting, one is given a subset of the entries of a low-rank matrix and asked to find a low-rank matrix consistent with the given observations. We show that this problem can be solved by searching for a column space that matches the observations. The corresponding algorithm consists of two parts -- subspace evolution and subspace transfer. In the evolution part, we use a line search procedure to refine the column space. However, line search is not guaranteed to converge, as there may exist barriers along the search path that prevent the algorithm from reaching a global optimum. To address this problem, in the transfer part, we design mechanisms to detect barriers and transfer the estimated column space from one side of the barrier to the other. The SET algorithm exhibits excellent empirical performance for very low-rank matrices.

preprint2010arXiv

Subspace Evolution and Transfer (SET) for Low-Rank Matrix Completion

We describe a new algorithm, termed subspace evolution and transfer (SET), for solving low-rank matrix completion problems. The algorithm takes as its input a subset of entries of a low-rank matrix, and outputs one low-rank matrix consistent with the given observations. The completion task is accomplished by searching for a column space on the Grassmann manifold that matches the incomplete observations. The SET algorithm consists of two parts -- subspace evolution and subspace transfer. In the evolution part, we use a gradient descent method on the Grassmann manifold to refine our estimate of the column space. Since the gradient descent algorithm is not guaranteed to converge, due to the existence of barriers along the search path, we design a new mechanism for detecting barriers and transferring the estimated column space across the barriers. This mechanism constitutes the core of the transfer step of the algorithm. The SET algorithm exhibits excellent empirical performance for both high and low sampling rate regimes.
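The evolution step can be sketched, for the rank-1 case, as Riemannian gradient descent on unit vectors (a rank-1 column space is a unit vector up to sign) with a backtracking line search. The gradient follows from the envelope theorem applied to the per-column least-squares residual. The objective, step rule and names (`residual_and_grad`, `grassmann_descent`) are assumptions for illustration; barrier detection and the transfer step are not sketched, and the loop simply stops when no descent step is found.

```python
import math

def residual_and_grad(u, observed):
    """Rank-1 residual f(u) = sum_j min_c ||x_Omega_j - c u_Omega_j||^2 and its gradient.

    With c_j optimal, the envelope theorem gives df/du_i = -2 c_j (x_ij - c_j u_i),
    summed over the columns that observe row i.
    """
    f, g = 0.0, [0.0] * len(u)
    for entries in observed.values():
        uu = sum(u[i] ** 2 for i in entries)
        if uu == 0:
            f += sum(v * v for v in entries.values())
            continue
        c = sum(u[i] * v for i, v in entries.items()) / uu
        for i, v in entries.items():
            r = v - c * u[i]
            f += r * r
            g[i] += -2.0 * c * r
    return f, g

def grassmann_descent(u, observed, iters=100, step=1.0):
    """Projected-gradient steps on the unit sphere with backtracking line search."""
    nrm = math.sqrt(sum(a * a for a in u))
    u = [a / nrm for a in u]
    f, g = residual_and_grad(u, observed)
    for _ in range(iters):
        gu = sum(a * b for a, b in zip(g, u))
        d = [gi - gu * ui for gi, ui in zip(g, u)]    # tangent-space (Riemannian) gradient
        t, improved = step, False
        while t > 1e-12:
            cand = [ui - t * di for ui, di in zip(u, d)]
            cn = math.sqrt(sum(a * a for a in cand))
            cand = [a / cn for a in cand]             # retract back to the sphere
            f_new, g_new = residual_and_grad(cand, observed)
            if f_new < f:
                u, f, g, improved = cand, f_new, g_new, True
                break
            t /= 2.0
        if not improved:                              # stationary point or barrier
            break
    return u, f
```

Because a step is accepted only if it lowers the residual, the objective never increases; whether it reaches zero depends on the starting point, which is where a transfer mechanism would come in.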

preprint2010arXiv

Universal two-step crystallization of DNA-functionalized nanoparticles

We examine the crystallization dynamics of nanoparticles reversibly tethered by DNA hybridization. We show that the crystallization happens readily only in a narrow temperature "slot," and always proceeds via a two-step process, mediated by a highly-connected amorphous intermediate. For lower temperature quenches, the dynamics of unzipping strands in the amorphous state is sufficiently slow that crystallization is kinetically hindered. This accounts for the well-documented difficulty of forming crystals in these systems. The strong parallel to the crystallization behavior of proteins and colloids suggests that these disparate systems crystallize in an apparently universal manner.

preprint2007arXiv

Unequal dimensional small balls and quantization on Grassmann Manifolds

The Grassmann manifold G_{n,p}(L) is the set of all p-dimensional planes (through the origin) in the n-dimensional Euclidean space L^{n}, where L is either R or C. This paper considers unequal-dimensional quantization, in which a source in G_{n,p}(L) is quantized through a code in G_{n,q}(L), where p and q are not necessarily the same. This differs from most work in the literature, where p = q. The analysis of unequal-dimensional quantization is based on the volume of a metric ball in G_{n,p}(L) whose center is in G_{n,q}(L). Our chief result is a closed-form formula for the volume of a metric ball when the radius is sufficiently small. This volume formula holds for Grassmann manifolds with arbitrary n, p, q and L, while previous results pertained only to some special cases. Based on this volume formula, several bounds are derived for the rate distortion tradeoff assuming the quantization rate is sufficiently high. The lower and upper bounds on the distortion rate function are asymptotically identical, and so precisely quantify the asymptotic rate distortion tradeoff. We also show that random codes are asymptotically optimal in the sense that they achieve the minimum achievable distortion with probability one as n and the code rate approach infinity linearly. Finally, we discuss some applications of the derived results to communication theory. A geometric interpretation in the Grassmann manifold is developed for the capacity calculation of the additive white Gaussian noise channel. Further, the derived distortion rate function is useful for characterizing the effect of beamforming matrix selection in multi-antenna communications.
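The abstract does not name the distance used. A common choice for planes of unequal dimension, assumed here, is the chordal distance $d_c^2 = \min(p,q) - \|P^{T}Q\|_F^2 = \sum_i \sin^2\theta_i$ over the $\min(p,q)$ principal angles, where P and Q hold orthonormal bases. The sketch below evaluates it for real planes and quantizes a source plane to its nearest codeword; the function names are hypothetical.

```python
import math

def chordal_distance(P, Q):
    """Chordal distance between span(P) (p-dim) and span(Q) (q-dim) in R^n.

    P and Q are lists of orthonormal basis vectors; the squared entries of
    P^T Q sum to the squared cosines of the min(p, q) principal angles.
    """
    overlap = sum(sum(a * b for a, b in zip(pv, qv)) ** 2 for pv in P for qv in Q)
    return math.sqrt(max(min(len(P), len(Q)) - overlap, 0.0))

def quantize(source, codebook):
    """Index of the codeword (possibly of another dimension) nearest to `source`."""
    return min(range(len(codebook)), key=lambda i: chordal_distance(source, codebook[i]))
```

A line lying inside a codeword plane is at distance 0 from it, which is exactly the situation the unequal-dimensional setting allows and the equal-dimensional one does not.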

Wei Dai

63 published items

A Control Theoretic Approach to Decentralized AI Economy Stabilization via Dynamic Buyback-and-Burn Mechanisms

CasualSynth: Generating Structurally Sound Synthetic Data

Differentially Private Motif-Preserving Multi-modal Hashing

MathDoc: Benchmarking Structured Extraction and Active Refusal on Noisy Mathematics Exam Papers

Exploiting Scale-Variant Attention for Segmenting Small Medical Objects

A New Learning Paradigm for Stochastic Configuration Network: SCN+

Blind Two-Dimensional Super-Resolution and Its Performance Guarantee (Extended Version)

Data-Efficient Modeling for Precise Power Consumption Estimation of Quadrotor Operations Using Ensemble Learning

Digging into Primary Financial Market: Challenges and Opportunities of Adopting Blockchain

Maximum principles and the method of moving planes for the uniformly elliptic nonlocal Bellman operator and applications

Orthogonal Stochastic Configuration Networks with Adaptive Construction Parameter for Data Analytics

Solving DC Power Flow Problems Using Quantum and Hybrid algorithms

Improved ACD-based financial trade durations prediction leveraging LSTM networks and Attention Mechanism

Liouville type theorems for fractional and higher order Hénon-Hardy type equations via the method of scaling spheres

Demonstration of Controlled-Phase Gates between Two Error-Correctable Photonic Qubits

Dictionary Learning with BLOTLESS Update

Direct methods for pseudo-relativistic Schrödinger operators

EVA: An Encrypted Vector Arithmetic Language and Compiler for Efficient Homomorphic Computation

HEAX: An Architecture for Computing on Encrypted Data

Learning Optimal Tree Models Under Beam Search

Radial distribution of charm quarks in jets in high-energy heavy-ion collisions

Radial profile of heavy quarks in jets in high-energy nuclear collisions

Selective Confidence Intervals for Martingale Regression Model

Self-similar solutions of energy-supercritical focusing wave equations in all dimensions

Sharp reversed Hardy-Littlewood-Sobolev inequality with extended kernel

Transverse Momentum Balance and Angular Distribution of $b\bar{b}$ Dijets in Pb+Pb collisions

Benchmarking Contemporary Deep Learning Hardware and Frameworks: A Survey of Qualitative Metrics

Heavy Quark and Quarkonium Transport in High Energy Nuclear Collisions

Learning Filter Banks Using Deep Learning For Acoustic Signals

Low-cost high performance distributed data storage for multi-channel observations

NVST data archiving system based on fastbit nosql database

On $\mathbb{Z}_{2}\mathbb{Z}_{2}[u]$-$(1+u)$-additive constacyclic

Parallel and Distributed Block-Coordinate Frank-Wolfe Algorithms

Productions of $η$, $ρ^0$ and $ϕ$ at large transverse momentum in Heavy ion Collisions

Structured Compressive Sensing Based Spatio-Temporal Joint Channel Estimation for FDD Massive MIMO

Understanding Audio Pattern Using Convolutional Neural Network From Raw Waveforms

Very Deep Convolutional Neural Networks for Raw Waveforms

$η$ meson production of high-energy nuclear collisions at NLO

Merge Frame Design for Video Stream Switching using Piecewise Constant Functions

Petuum: A New Platform for Distributed Machine Learning on Big Data

Strategies and Principles of Distributed Machine Learning on Big Data

Structured Matching Pursuit for Reconstruction of Dynamic Sparse Channels

Tracking A Dynamic Sparse Channel Via Differential Orthogonal Matching Pursuit

$L^{p}$ estimates for bilinear and multi-parameter Hilbert transforms

$L^{p}$ estimates for the bilinear Hilbert transform for $1/2<p\leq2/3$: A counterexample and generalizations to non-smooth symbols

High-Performance Distributed ML at Scale through Parameter Server Consistency Models

LightLDA: Big Topic Models on Modest Compute Clusters

Power Allocation in Compressed Sensing of Non-uniformly Sparse Signals

$L^{p}$ estimates for multi-linear and multi-parameter pseudo-differential operators

Consistent Bounded-Asynchronous Parameter Servers for Distributed ML

Momentum imbalance of isolated photon-tagged jet production at RHIC and LHC

Nonparametric Independence Screening in Sparse Ultra-High Dimensional Varying Coefficient Models

Continuous dependence for $H^{2}$ critical nonlinear Schrödinger equations in high dimensions

Continuous Dependence of Cauchy Problem For Nonlinear Schrödinger Equation in $H^{s}$

Technical Report: Observability of a Linear System under Sparsity Constraints

Some Results on the Scattering Theory for Nonlinear Schrödinger Equations in Weighted $L^{2}$ Space

Structured sublinear compressive sensing via belief propagation

A Geometric Approach to Low-Rank Matrix Completion

Commutative-like Encryption: A New Characterization of ElGamal

SET: an algorithm for consistent matrix completion

Subspace Evolution and Transfer (SET) for Low-Rank Matrix Completion

Universal two-step crystallization of DNA-functionalized nanoparticles

Unequal dimensional small balls and quantization on Grassmann Manifolds