Source author record

Tao Guo

Tao Guo appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Information Theory, math.IT, hep-ph, Artificial Intelligence, Computation and Language, Machine Learning, nucl-th, Cryptography and Security, eess.AS, eess.SP, Information Retrieval, Multimedia, Sound

Catalog footprint

What is connected

16 works

13 topics

4 close collaborators

preprint · 2026 · arXiv

ChannelKAN: Multi-Scale Dual-Domain Channel Prediction via Hybrid CNN-KAN Architecture

Accurate channel state information (CSI) prediction is essential for improving the reliability and spectral efficiency of massive MIMO-OFDM systems in high-mobility scenarios. Existing deep learning methods struggle to jointly capture short-term local variations and long-range nonlinear dependencies in CSI sequences. To address this challenge, we propose ChannelKAN, a hybrid CNN-KAN channel prediction model with multi-scale frequency domain information enhancement. The key insight is that CNNs and Kolmogorov-Arnold Networks (KANs) are naturally complementary: CNNs extract intra-time-step local spatial-frequency correlations, while KANs with learnable Chebyshev polynomial activations fit inter-time-step nonlinear temporal evolution in a holistic manner. Specifically, a dual-domain expansion module first generates complementary frequency-domain and delay-domain CSI representations. A multi-scale frequency information enhancement module then retains dominant spectral components at multiple scales to strengthen key features and suppress noise. Next, a CNN-KAN feature extraction module captures local correlations via cascaded convolutions and models long-range dependencies via Chebyshev KAN layers. Finally, a dual-domain fusion module adaptively integrates features from both branches to produce the prediction. Experiments on 3GPP-compliant QuaDRiGa datasets demonstrate that ChannelKAN outperforms RNN, LSTM, GRU, CNN, and Transformer baselines in normalized mean square error (NMSE), spectral efficiency (SE), and bit error rate (BER) across various velocities and signal-to-noise ratios. Ablation studies further confirm the effectiveness of each proposed module.
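The abstract describes KAN layers with learnable Chebyshev polynomial activations. As a rough illustration of what such an activation computes, here is a minimal sketch. The function names, the `tanh` input squashing, and the coefficient layout are assumptions made for this example, not details taken from the paper.

```python
import math
from typing import List, Sequence


def chebyshev_basis(x: float, degree: int) -> List[float]:
    """Return [T_0(x), ..., T_degree(x)] via T_{k+1}(x) = 2x*T_k(x) - T_{k-1}(x)."""
    if degree < 0:
        raise ValueError("degree must be non-negative")
    values = [1.0]              # T_0(x) = 1
    if degree >= 1:
        values.append(x)        # T_1(x) = x
    for _ in range(2, degree + 1):
        values.append(2.0 * x * values[-1] - values[-2])
    return values


def chebyshev_activation(x: float, coeffs: Sequence[float]) -> float:
    """Evaluate phi(x) = sum_k coeffs[k] * T_k(tanh(x)).

    Assumption: tanh maps the input into [-1, 1], the interval on which
    Chebyshev polynomials stay bounded. In a trained layer, coeffs would be
    the learnable parameters of one edge activation.
    """
    basis = chebyshev_basis(math.tanh(x), len(coeffs) - 1)
    return sum(c * t for c, t in zip(coeffs, basis))
```

In a full layer, each input-output pair would carry its own coefficient vector and the outputs would be summed over the inputs; the sketch covers only a single scalar activation.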

preprint · 2026 · arXiv

MiMo-V2-Flash Technical Report

We present MiMo-V2-Flash, a Mixture-of-Experts (MoE) model with 309B total parameters and 15B active parameters, designed for fast, strong reasoning and agentic capabilities. MiMo-V2-Flash adopts a hybrid attention architecture that interleaves Sliding Window Attention (SWA) with global attention, with a 128-token sliding window under a 5:1 hybrid ratio. The model is pre-trained on 27 trillion tokens with Multi-Token Prediction (MTP), employing a native 32k context length and subsequently extended to 256k. To efficiently scale post-training compute, MiMo-V2-Flash introduces a novel Multi-Teacher On-Policy Distillation (MOPD) paradigm. In this framework, domain-specialized teachers (e.g., trained via large-scale reinforcement learning) provide dense and token-level reward, enabling the student model to perfectly master teacher expertise. MiMo-V2-Flash rivals top-tier open-weight models such as DeepSeek-V3.2 and Kimi-K2, despite using only 1/2 and 1/3 of their total parameters, respectively. During inference, by repurposing MTP as a draft model for speculative decoding, MiMo-V2-Flash achieves up to 3.6 acceptance length and 2.6x decoding speedup with three MTP layers. We open-source both the model weights and the three-layer MTP weights to foster open research and community collaboration.
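The hybrid attention layout in the abstract (a 128-token sliding window, interleaved 5:1 with global attention) can be pictured with a small sketch. The layer ordering (five sliding-window layers followed by one global layer), the choice to count the current token inside the window, and all function names are assumptions made for illustration; the report itself may define these differently.

```python
from typing import List, Optional


def layer_pattern(num_layers: int, swa_per_global: int = 5) -> List[str]:
    """Label layers so every group is `swa_per_global` SWA layers then one global layer."""
    period = swa_per_global + 1
    return ["global" if (i + 1) % period == 0 else "swa" for i in range(num_layers)]


def attention_mask(seq_len: int, window: Optional[int] = None) -> List[List[bool]]:
    """Causal mask: mask[i][j] is True when query i may attend to key j.

    With window=None the mask is plain causal (global) attention. With a
    window, query i sees only the most recent `window` positions, itself
    included (an assumption of this sketch).
    """
    return [
        [j <= i and (window is None or i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]
```

For example, `sum(attention_mask(256, 128)[200])` counts 128 visible keys for the query at position 200, while the global mask for the same position exposes all 201 preceding and current positions.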

preprint · 2025 · arXiv

MiMo-Audio: Audio Language Models are Few-Shot Learners

Existing audio language models typically rely on task-specific fine-tuning to accomplish particular audio tasks. In contrast, humans are able to generalize to new audio tasks with only a few examples or simple instructions. GPT-3 has shown that scaling next-token prediction pretraining enables strong generalization capabilities in text, and we believe this paradigm is equally applicable to the audio domain. By scaling MiMo-Audio's pretraining data to over one hundred million hours, we observe the emergence of few-shot learning capabilities across a diverse set of audio tasks. We develop a systematic evaluation of these capabilities and find that MiMo-Audio-7B-Base achieves SOTA performance on both speech intelligence and audio understanding benchmarks among open-source models. Beyond standard metrics, MiMo-Audio-7B-Base generalizes to tasks absent from its training data, such as voice conversion, style transfer, and speech editing. MiMo-Audio-7B-Base also demonstrates powerful speech continuation capabilities and can generate highly realistic talk shows, recitations, livestreams, and debates. At the post-training stage, we curate a diverse instruction-tuning corpus and introduce thinking mechanisms into both audio understanding and generation. MiMo-Audio-7B-Instruct achieves open-source SOTA on audio understanding benchmarks (MMSU, MMAU, MMAR, MMAU-Pro), spoken dialogue benchmarks (Big Bench Audio, MultiChallenge Audio), and instruct-TTS evaluations, approaching or surpassing closed-source models. Model checkpoints and the full evaluation suite are available at https://github.com/XiaomiMiMo/MiMo-Audio.

preprint · 2022 · arXiv

Efficient Attribute Unlearning: Towards Selective Removal of Input Attributes from Feature Representations

Recently, the enactment of privacy regulations has promoted the rise of the machine unlearning paradigm. Existing studies of machine unlearning mainly focus on sample-wise unlearning, such that a learnt model will not expose users' privacy at the sample level. Yet we argue that this ability of selective removal should also be available at the attribute level, especially for attributes irrelevant to the main task, e.g., whether a person recognized in a face recognition system wears glasses, or the age range of that person. Through a comprehensive literature review, we find that existing studies on attribute-related problems such as fairness and de-biasing learning cannot address the above concerns properly. To bridge this gap, we propose a paradigm of selectively removing input attributes from feature representations, which we name 'attribute unlearning'. In this paradigm, certain attributes are accurately captured and detached from the learned feature representations during training, according to their mutual information. These attributes are progressively eliminated as training proceeds towards convergence, while the rest of the attributes related to the main task are preserved to achieve competitive model performance. Considering the computational complexity of the training process, we not only give a theoretically approximate training method, but also propose an acceleration scheme to speed up training. We validate our method across several datasets and models and demonstrate that our design can preserve model fidelity and reach prevailing unlearning efficacy with high efficiency. The proposed unlearning paradigm builds a foundation for future machine unlearning systems and will become an essential component of the latest privacy-related legislation.

preprint · 2022 · arXiv

Lossy Computing with Side Information via Multi-Hypergraphs

We consider a problem of coding for computing, where the decoder wishes to estimate a function of its local message and the source message at the encoder within a given distortion. We show that the rate-distortion function can be characterized through a characteristic multi-hypergraph, which simplifies the evaluation of the rate-distortion function.

preprint · 2022 · arXiv

Mass spectra of doubly heavy tetraquarks in an improved chromomagnetic interaction model

Doubly heavy tetraquark states are the prime candidates for tightly bound exotic states. We present a systematic study of the mass spectra of the $S$-wave doubly heavy tetraquark states $QQ\bar{q}\bar{q}$ ($q=u, d, s$ and $Q=c, b$) with different quantum numbers $J^P=0^+$, $1^+$, and $2^+$ in the framework of the improved chromomagnetic interaction (ICMI) model. The parameters in the ICMI model are obtained by fitting the conventional hadron spectra and are used directly to predict the masses of the tetraquark states. For heavy quarks, the uncertainties of the parameters are obtained by comparing the masses of doubly and triply heavy baryons with those given by lattice QCD, QCD sum rules, and potential models. Several compact and stable bound states are found in both the doubly charmed and doubly bottomed tetraquark systems. The predicted mass of the $cc\bar u\bar d$ state is consistent with the recent measurement from the LHCb collaboration.

preprint · 2022 · arXiv

PromptFL: Let Federated Participants Cooperatively Learn Prompts Instead of Models -- Federated Learning in Age of Foundation Model

Quick global aggregation of effective distributed parameters is crucial to federated learning (FL), which requires adequate bandwidth for parameter communication and sufficient user data for local training. Otherwise, FL may require excessive training time to converge and produce inaccurate models. In this paper, we propose a brand-new FL framework, PromptFL, that replaces federated model training with federated prompt training, i.e., it lets federated participants train prompts instead of a shared model, to simultaneously achieve efficient global aggregation and local training on insufficient data by exploiting the power of foundation models (FM) in a distributed way. PromptFL ships an off-the-shelf FM, i.e., CLIP, to distributed clients, who cooperatively train shared soft prompts based on very little local data. Since PromptFL only needs to update the prompts instead of the whole model, both the local training and the global aggregation can be significantly accelerated. Moreover, an FM trained on large-scale data can provide strong adaptation capability to distributed users' tasks with the trained soft prompts. We empirically analyze PromptFL via extensive experiments and show its superiority in terms of system feasibility, user privacy, and performance.

preprint · 2022 · arXiv

Semantic Compression with Side Information: A Rate-Distortion Perspective

We consider the semantic rate-distortion problem motivated by task-oriented video compression. The semantic information corresponding to the task, which is not observable to the encoder, affects the observations through a joint probability distribution. The similarities among intra-frame segments and between frames in video compression are formulated as side information available at both the encoder and the decoder. The decoder is interested in recovering the observation and making an inference of the semantic information under certain distortion constraints. We establish the information-theoretic limits for the tradeoff between compression rates and distortions by fully characterizing the rate-distortion function. We further evaluate the rate-distortion function under specific Markov conditions for three scenarios: i) both the task and the observation are binary sources; ii) the task is a binary classification of an integer observation as even or odd; iii) the task and the observation are correlated Gaussian sources. We also illustrate through numerical results that recovering only the semantic information can reduce the coding rate compared to recovering the source observation.
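For readers unfamiliar with rate-distortion functions, the classical textbook case gives a feel for the rate-versus-distortion tradeoff that the abstract refers to. The sketch below evaluates R(D) = h(p) - h(D) for a single Bernoulli(p) source under Hamming distortion, with no side information and no hidden semantic variable. It is background only, not the rate-distortion function characterized in the paper, and the function names are chosen for this example.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def bernoulli_rate_distortion(p: float, d: float) -> float:
    """Classical R(D) for a Bernoulli(p) source with Hamming distortion.

    R(D) = h(p) - h(D) for 0 <= D < min(p, 1 - p), and 0 otherwise.
    """
    if d < 0.0:
        raise ValueError("distortion must be non-negative")
    if d >= min(p, 1.0 - p):
        return 0.0
    return binary_entropy(p) - binary_entropy(d)
```

Tolerating more distortion lowers the required rate: a fair binary source needs 1 bit per symbol at zero distortion and roughly half a bit when about 11% of symbols may be wrong.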

preprint · 2021 · arXiv

Mass spectra and decays of open-heavy tetraquark states

Open-heavy tetraquark states, especially those containing four different quarks, have drawn much attention in both theoretical and experimental fields. In the framework of the improved chromomagnetic interaction (ICMI) model, we complete a systematic study of the mass spectra and possible strong decay channels of the $S$-wave open-heavy tetraquark states, $qq\bar{q}\bar{Q}$ ($q=u,d,s$ and $Q=c,b$), with different quantum numbers $J^P=0^+$, $1^+$, and $2^+$. The parameters in the ICMI model are extracted from the conventional hadron spectra and used directly to predict the masses of tetraquark states. Several compact bound states and narrow resonances are found in both the charm-strange and bottom-strange tetraquark sectors, most of them as a product of the strong coupling between the different channels. Our results show that the recently discovered four-flavor tetraquark candidate $X_0(2900)$ is probably a compact $ud\bar{s}\bar{c}$ state with quantum numbers $J^P=0^+$. The predictions about $X_0(2900)$ and its partners are expected to be checked further against other theories and future experiments.

preprint · 2021 · arXiv

Searching for lepton flavor violating decays tau to Pl in Minimal R-symmetric Supersymmetric Standard Model

We analyze the lepton flavor violating decays $\tau\rightarrow Pl$ ($P=\pi,\eta,\eta';\;l=e,\mu$) in the scenario of the minimal R-symmetric supersymmetric standard model. The predictions for the branching ratios BR$(\tau\rightarrow P e)$ and BR$(\tau\rightarrow P \mu)$ are affected by the mass insertion parameters $\delta^{13}$ and $\delta^{23}$, respectively. These parameters are constrained by the experimental bounds on the branching ratios BR($\tau\rightarrow e (\mu) \gamma$) and BR($\tau\rightarrow 3e(\mu)$). The results show that the $Z$ penguin dominates the prediction for BR($\tau\rightarrow Pl$) in a large region of the parameter space. The branching ratios BR($\tau\rightarrow Pl$) are predicted to be at least five orders of magnitude smaller than present experimental bounds and three orders of magnitude smaller than future experimental sensitivities.

preprint · 2021 · arXiv

Structural Entropy of the Stochastic Block Models

With the rapid expansion of graphs and networks and the growing magnitude of data from all areas of science, effective treatment and compression schemes for context-dependent data are extremely desirable. A particularly interesting direction is to compress the data while keeping only the "structural information" and ignoring the concrete labelings. Under this direction, Choi and Szpankowski introduced the structures (unlabeled graphs), which allowed them to compute the structural entropy of the Erdős--Rényi random graph model. Moreover, they also provided an asymptotically optimal compression algorithm that (asymptotically) achieves this entropy limit and runs in expected linear time. In this paper, we consider the Stochastic Block Models with an arbitrary number of parts. We define a partitioned structural entropy for Stochastic Block Models, which generalizes the structural entropy for unlabeled graphs and encodes the partition information as well. We then compute the partitioned structural entropy of the Stochastic Block Models and provide a compression scheme that asymptotically achieves this entropy limit.

preprint · 2020 · arXiv

Edge-assisted Viewport Adaptive Scheme for real-time Omnidirectional Video transmission

Omnidirectional applications are immersive and highly interactive, which can improve the efficiency of remote collaborative work among factory workers. The transmission of omnidirectional video (OV) is the most important step in implementing virtual remote collaboration. Compared with ordinary video transmission, OV transmission requires more bandwidth, which is still a huge burden even under 5G networks. The tile-based scheme can reduce bandwidth consumption. However, it neither accurately obtains the field-of-view (FOV) area nor readily supports real-time OV streaming. In this paper, we propose an edge-assisted viewport adaptive scheme (EVAS-OV) to reduce bandwidth consumption during real-time OV transmission. First, EVAS-OV uses a Gated Recurrent Unit (GRU) model to predict users' viewports. Then, users are divided into multicast clusters, thereby further reducing the consumption of computing resources. EVAS-OV reprojects OV frames to accurately obtain users' FOV area at the pixel level and adopts a redundancy strategy to reduce the impact of viewport prediction errors. All computing tasks are offloaded to edge servers to reduce the transmission delay and improve bandwidth utilization. Experimental results show that EVAS-OV can save more than 60% of bandwidth compared with the non-viewport-adaptive scheme. Compared to a viewport-adaptive two-layer scheme, EVAS-OV still saves 30% of bandwidth.

preprint · 2020 · arXiv

New Results on the Storage-Retrieval Tradeoff in Private Information Retrieval Systems

In a private information retrieval (PIR) system, the user needs to retrieve one of the possible messages from a set of storage servers, but wishes to keep the identity of the requested message private from any given server. Existing efforts in this area have made it clear that the efficiency of the retrieval will be impacted significantly by the amount of storage space allowed at the servers. In this work, we consider the tradeoff between the storage cost and the retrieval cost. We first present three fundamental results: 1) a regime-wise 2-approximate characterization of the optimal tradeoff, 2) a cyclic permutation lemma that can produce more sophisticated codes from simpler ones, and 3) a relaxed entropic linear program (LP) lower bound that has polynomial complexity. Equipped with the cyclic permutation lemma, we then propose two novel code constructions and, by applying the lemma, obtain new storage-retrieval points. Furthermore, we derive more explicit lower bounds by utilizing only a subset of the constraints in the relaxed entropic LP in a systematic manner. Though the new upper bound and lower bound do not lead to a more precise approximate characterization in general, they are significantly tighter than the existing art.

preprint · 2020 · arXiv

On the Information Leakage in Private Information Retrieval Systems

We consider information leakage to the user in private information retrieval (PIR) systems. Information leakage can be measured in terms of individual message leakage or total leakage. Individual message leakage, or simply individual leakage, is defined as the amount of information that the user can obtain on any individual message that is not being requested, and the total leakage is defined as the amount of information that the user can obtain about all the other messages except the one being requested. In this work, we characterize the tradeoff between the minimum download cost and the individual leakage, and that for the total leakage, respectively. New codes are proposed to achieve these optimal tradeoffs, which are also shown to be optimal in terms of the message size. We further characterize the optimal tradeoff between the minimum amount of common randomness and the total leakage. Moreover, we show that under individual leakage, common randomness is in fact unnecessary when there are more than two messages.

preprint · 2020 · arXiv

Weakly Secure Symmetric Multilevel Diversity Coding

Multilevel diversity coding is a classical coding model where multiple mutually independent information messages are encoded, such that different reliability requirements can be afforded to different messages. It is well known that superposition coding, namely separately encoding the independent messages, is optimal for symmetric multilevel diversity coding (SMDC) (Yeung-Zhang 1999). In the current paper, we consider weakly secure SMDC where security constraints are injected on each individual message, and provide a complete characterization of the conditions under which superposition coding is sum-rate optimal. Two joint coding strategies, which lead to rate savings compared to superposition coding, are proposed, where some coding components for one message can be used as the encryption key for another. By applying different variants of Han's inequality, we show that the lack of opportunity to apply these two coding strategies directly implies the optimality of superposition coding. It is further shown that under a set of particular security constraints, one of the proposed joint coding strategies can be used to construct a code that achieves the optimal rate region.

preprint · 2011 · arXiv

The possible candidates of tetraquark : $Z_b(10610)$ and $Z_b(10650)$

Using the chromomagnetic interaction Hamiltonian with proper account of the SU(3) flavor symmetry breaking, we have performed a schematic study of the masses of $S$-wave heavy tetraquarks $bq\bar{b}\bar{q}$ ($q$ denotes a $u$, $d$, or $s$ quark). It is found that the numerical results for $bu\bar{b}\bar{d}$ or $bd\bar{b}\bar{u}$ with the $1^{+}$ quantum number are 10612 MeV and 10683 MeV respectively, which are compatible with the recently detected charged bottomonium-like $Z_b(10610)$ and $Z_b(10650)$. Theoretically, we also investigate the possible tetraquark states of $1^{++}$ and $2^{+}$ due to the charge conjugation as potential candidates for future experiments.
