Source author record

Wei Wen

Wei Wen appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Machine Learning Neurons and Cognition quant-ph Neural and Evolutionary Computing Artificial Intelligence Computation and Language eess.IV math-ph math.MP

Catalog footprint

What is connected

19works

10topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models

We introduce Youtu-LLM, a lightweight yet powerful language model that harmonizes high computational efficiency with native agentic intelligence. Unlike typical small models that rely on distillation, Youtu-LLM (1.96B) is pre-trained from scratch to systematically cultivate reasoning and planning capabilities. The key technical advancements are as follows: (1) Compact Architecture with Long-Context Support: Built on a dense Multi-Latent Attention (MLA) architecture with a novel STEM-oriented vocabulary, Youtu-LLM supports a 128k context window. This design enables robust long-context reasoning and state tracking within a minimal memory footprint, making it ideal for long-horizon agent and reasoning tasks. (2) Principled "Commonsense-STEM-Agent" Curriculum: We curated a massive corpus of approximately 11T tokens and implemented a multi-stage training strategy. By progressively shifting the pre-training data distribution from general commonsense to complex STEM and agentic tasks, we ensure the model acquires deep cognitive abilities rather than superficial alignment. (3) Scalable Agentic Mid-training: Specifically for the agentic mid-training, we employ diverse data construction schemes to synthesize rich and varied trajectories across math, coding, and tool-use domains. This high-quality data enables the model to internalize planning and reflection behaviors effectively. Extensive evaluations show that Youtu-LLM sets a new state-of-the-art for sub-2B LLMs. On general benchmarks, it achieves competitive performance against larger models, while on agent-specific tasks, it significantly surpasses existing SOTA baselines, demonstrating that lightweight models can possess strong intrinsic agentic capabilities.

preprint2022arXiv

Computer-Aided Extraction of Select MRI Markers of Cerebral Small Vessel Disease: A Systematic Review

Cerebral small vessel disease (CSVD) is a major vascular contributor to cognitive impairment in ageing, including dementias. Imaging remains the most promising method for in vivo studies of CSVD. To replace the subjective and laborious visual rating approaches, emerging studies have applied state-of-the-art artificial intelligence to extract imaging biomarkers of CSVD from MRI scans. We aimed to summarise published computer-aided methods to examine three imaging biomarkers of CSVD, namely cerebral microbleeds (CMB), dilated perivascular spaces (PVS), and lacunes of presumed vascular origin. Seventy-one classical image processing, classical machine learning, and deep learning studies were identified. CMB and PVS have been better studied, compared to lacunes. While good performance metrics have been achieved in local test datasets, there have not been generalisable pipelines validated in different research or clinical cohorts. Transfer learning and weak supervision techniques have been applied to accommodate the limitations in training data. Future studies could consider pooling data from multiple sources to increase diversity, and validating the performance of the methods using both image processing metrics and associations with clinical measures.

preprint2022arXiv

Exploiting Feature Diversity for Make-up Temporal Video Grounding

This technical report presents the 3rd winning solution for MTVG, a new task introduced in the 4-th Person in Context (PIC) Challenge at ACM MM 2022. MTVG aims at localizing the temporal boundary of the step in an untrimmed video based on a textual description. The biggest challenge of this task is the fi ne-grained video-text semantics of make-up steps. However, current methods mainly extract video features using action-based pre-trained models. As actions are more coarse-grained than make-up steps, action-based features are not sufficient to provide fi ne-grained cues. To address this issue,we propose to achieve fi ne-grained representation via exploiting feature diversities. Specifically, we proposed a series of methods from feature extraction, network optimization, to model ensemble. As a result, we achieved 3rd place in the MTVG competition.

preprint2022arXiv

Hormonal Factors Moderate the Associations Between Vascular Risk Factors and White Matter Hyperintensities

Objective: To examine the moderation effects of hormonal factors on the associations between vascular risk factors and white matter hyperintensities (WMH) in men and women, separately. Methods: WMH were automatically segmented and quantified in the UK Biobank dataset (N = 18,294). Generalised linear models were applied to examine 1) the main effects of vascular (body mass index, hip to waist ratio, pulse wave velocity, hypercholesterolemia, diabetes, hypertension, smoking status) and hormonal (testosterone levels, contraceptive pill, hormone replacement therapy, menopause) factors on WMH, and 2) the moderation effects of hormonal factors on the relationship between vascular risk factors and WMH volumes. Results: In men with testosterone levels one standard deviation (SD) higher than the mean value, increased body mass index and pulse wave velocity, and smoking were associated with higher WMH volumes. The association between body mass index and WMH was more significant in the periventricular white matter regions, whilst the relationship between pulse wave velocity and WMH was restricted to deep white matter regions. Men with low testosterone levels (one SD below the mean level) showed a significant association between hypercholesterolemia and higher deep WMH volumes. Hypertensive women showed higher WMH volumes than women without hypertension regardless of whether hormone replacement therapy was used. However, higher WMH volumes, especially in the deep white matter regions, were found in women who did not use hormone replacement therapy or use it for a shorter duration. Conclusion: These findings highlighted the importance of considering hormonal risk factors in the prevention and management of WMH.

preprint2022arXiv

Network resilience in the aging brain

Degeneration and adaptation are two competing sides of the same coin called resilience in the progressive processes of brain aging or diseases. Degeneration accumulates during brain aging and other cerebral activities, causing structural atrophy and dysfunction. At the same time, adaptation allows brain network reorganize to compensate for structural loss to maintain cognition function. Although hidden resilience mechanism is critical and fundamental to uncover the brain aging law, due to the lack of datasets and appropriate methodology, it remains essentially unknown how these two processes interact dynamically across brain networks. To quantitatively investigate this complex process, we analyze aging brains based on 6-year follow-up multimodal neuroimaging database from 63 persons. We reveal the critical mechanism of network resilience that various perturbation may cause fast brain structural atrophy, and then brain can reorganize its functional layout to lower its operational efficiency, which helps to slow down the structural atrophy and finally recover its functional efficiency equilibrium. This empirical finding could be explained by our theoretical model, suggesting one universal resilience dynamical function. This resilience is achieved in the brain functional network with evolving percolation and rich-club features. Our findings can help to understand the brain aging process and design possible mitigation methods to adjust interaction between degeneration and adaptation from resilience viewpoint.

preprint2022arXiv

See Finer, See More: Implicit Modality Alignment for Text-based Person Retrieval

Text-based person retrieval aims to find the query person based on a textual description. The key is to learn a common latent space mapping between visual-textual modalities. To achieve this goal, existing works employ segmentation to obtain explicitly cross-modal alignments or utilize attention to explore salient alignments. These methods have two shortcomings: 1) Labeling cross-modal alignments are time-consuming. 2) Attention methods can explore salient cross-modal alignments but may ignore some subtle and valuable pairs. To relieve these issues, we introduce an Implicit Visual-Textual (IVT) framework for text-based person retrieval. Different from previous models, IVT utilizes a single network to learn representation for both modalities, which contributes to the visual-textual interaction. To explore the fine-grained alignment, we further propose two implicit semantic alignment paradigms: multi-level alignment (MLA) and bidirectional mask modeling (BMM). The MLA module explores finer matching at sentence, phrase, and word levels, while the BMM module aims to mine \textbf{more} semantic alignments between visual and textual modalities. Extensive experiments are carried out to evaluate the proposed IVT on public datasets, i.e., CUHK-PEDES, RSTPReID, and ICFG-PEDES. Even without explicit body part alignment, our approach still achieves state-of-the-art performance. Code is available at: https://github.com/TencentYoutuResearch/PersonRetrieval-IVT.

preprint2020arXiv

AutoGrow: Automatic Layer Growing in Deep Convolutional Networks

Depth is a key component of Deep Neural Networks (DNNs), however, designing depth is heuristic and requires many human efforts. We propose AutoGrow to automate depth discovery in DNNs: starting from a shallow seed architecture, AutoGrow grows new layers if the growth improves the accuracy; otherwise, stops growing and thus discovers the depth. We propose robust growing and stopping policies to generalize to different network architectures and datasets. Our experiments show that by applying the same policy to different network architectures, AutoGrow can always discover near-optimal depth on various datasets of MNIST, FashionMNIST, SVHN, CIFAR10, CIFAR100 and ImageNet. For example, in terms of accuracy-computation trade-off, AutoGrow discovers a better depth combination in ResNets than human experts. Our AutoGrow is efficient. It discovers depth within similar time of training a single DNN. Our code is available at https://github.com/wenwei202/autogrow.

preprint2020arXiv

DeepHoyer: Learning Sparser Neural Network with Differentiable Scale-Invariant Sparsity Measures

In seeking for sparse and efficient neural network models, many previous works investigated on enforcing L1 or L0 regularizers to encourage weight sparsity during training. The L0 regularizer measures the parameter sparsity directly and is invariant to the scaling of parameter values, but it cannot provide useful gradients, and therefore requires complex optimization techniques. The L1 regularizer is almost everywhere differentiable and can be easily optimized with gradient descent. Yet it is not scale-invariant, causing the same shrinking rate to all parameters, which is inefficient in increasing sparsity. Inspired by the Hoyer measure (the ratio between L1 and L2 norms) used in traditional compressed sensing problems, we present DeepHoyer, a set of sparsity-inducing regularizers that are both differentiable almost everywhere and scale-invariant. Our experiments show that enforcing DeepHoyer regularizers can produce even sparser neural network models than previous works, under the same accuracy level. We also show that DeepHoyer can be applied to both element-wise and structural pruning.

preprint2020arXiv

Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification

Modern deep neural networks (DNNs) often require high memory consumption and large computational loads. In order to deploy DNN algorithms efficiently on edge or mobile devices, a series of DNN compression algorithms have been explored, including factorization methods. Factorization methods approximate the weight matrix of a DNN layer with the multiplication of two or multiple low-rank matrices. However, it is hard to measure the ranks of DNN layers during the training process. Previous works mainly induce low-rank through implicit approximations or via costly singular value decomposition (SVD) process on every training step. The former approach usually induces a high accuracy loss while the latter has a low efficiency. In this work, we propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step. SVD training first decomposes each layer into the form of its full-rank SVD, then performs training directly on the decomposed weights. We add orthogonality regularization to the singular vectors, which ensure the valid form of SVD and avoid gradient vanishing/exploding. Low-rank is encouraged by applying sparsity-inducing regularizers on the singular values of each layer. Singular value pruning is applied at the end to explicitly reach a low-rank model. We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve higher reduction on computation load under the same accuracy, comparing to not only previous factorization methods but also state-of-the-art filter pruning methods.

preprint2020arXiv

Trained Rank Pruning for Efficient Deep Neural Networks

The performance of Deep Neural Networks (DNNs) keeps elevating in recent years with increasing network depth and width. To enable DNNs on edge devices like mobile phones, researchers proposed several network compression methods including pruning, quantization and factorization. Among the factorization-based approaches, low-rank approximation has been widely adopted because of its solid theoretical rationale and efficient implementations. Several previous works attempted to directly approximate a pre-trained model by low-rank decomposition; however, small approximation errors in parameters can ripple a large prediction loss. As a result, performance usually drops significantly and a sophisticated fine-tuning is required to recover accuracy. We argue that it is not optimal to separate low-rank approximation from training. Unlike previous works, this paper integrates low rank approximation and regularization into the training. We propose Trained Rank Pruning (TRP), which iterates low rank approximation and training. TRP maintains the capacity of original network while imposes low-rank constraints during training. A stochastic sub-gradient descent optimized nuclear regularization is utilized to further encourage low rank in TRP. The TRP trained network has low-rank structure in nature, and can be approximated with negligible performance loss, eliminating fine-tuning after low rank approximation. The methods are comprehensively evaluated on CIFAR-10 and ImageNet, outperforming previous compression methods using low rank approximation. Code is available: https://github.com/yuhuixu1993/Trained-Rank-Pruning

preprint2020arXiv

Trained Rank Pruning for Efficient Deep Neural Networks

To accelerate DNNs inference, low-rank approximation has been widely adopted because of its solid theoretical rationale and efficient implementations. Several previous works attempted to directly approximate a pre-trained model by low-rank decomposition; however, small approximation errors in parameters can ripple over a large prediction loss. Apparently, it is not optimal to separate low-rank approximation from training. Unlike previous works, this paper integrates low rank approximation and regularization into the training process. We propose Trained Rank Pruning (TRP), which alternates between low rank approximation and training. TRP maintains the capacity of the original network while imposing low-rank constraints during training. A nuclear regularization optimized by stochastic sub-gradient descent is utilized to further promote low rank in TRP. Networks trained with TRP has a low-rank structure in nature, and is approximated with negligible performance loss, thus eliminating fine-tuning after low rank approximation. The proposed method is comprehensively evaluated on CIFAR-10 and ImageNet, outperforming previous compression counterparts using low rank approximation. Our code is available at: https://github.com/yuhuixu1993/Trained-Rank-Pruning.

preprint2020arXiv

TRP: Trained Rank Pruning for Efficient Deep Neural Networks

To enable DNNs on edge devices like mobile phones, low-rank approximation has been widely adopted because of its solid theoretical rationale and efficient implementations. Several previous works attempted to directly approximate a pretrained model by low-rank decomposition; however, small approximation errors in parameters can ripple over a large prediction loss. As a result, performance usually drops significantly and a sophisticated effort on fine-tuning is required to recover accuracy. Apparently, it is not optimal to separate low-rank approximation from training. Unlike previous works, this paper integrates low rank approximation and regularization into the training process. We propose Trained Rank Pruning (TRP), which alternates between low rank approximation and training. TRP maintains the capacity of the original network while imposing low-rank constraints during training. A nuclear regularization optimized by stochastic sub-gradient descent is utilized to further promote low rank in TRP. The TRP trained network inherently has a low-rank structure, and is approximated with negligible performance loss, thus eliminating the fine-tuning process after low rank decomposition. The proposed method is comprehensively evaluated on CIFAR-10 and ImageNet, outperforming previous compression methods using low rank approximation.

preprint2016arXiv

A New Learning Method for Inference Accuracy, Core Occupation, and Performance Co-optimization on TrueNorth Chip

IBM TrueNorth chip uses digital spikes to perform neuromorphic computing and achieves ultrahigh execution parallelism and power efficiency. However, in TrueNorth chip, low quantization resolution of the synaptic weights and spikes significantly limits the inference (e.g., classification) accuracy of the deployed neural network model. Existing workaround, i.e., averaging the results over multiple copies instantiated in spatial and temporal domains, rapidly exhausts the hardware resources and slows down the computation. In this work, we propose a novel learning method on TrueNorth platform that constrains the random variance of each computation copy and reduces the number of needed copies. Compared to the existing learning method, our method can achieve up to 68.8% reduction of the required neuro-synaptic cores or 6.5X speedup, with even slightly improved inference accuracy.

preprint2016arXiv

Learning Structured Sparsity in Deep Neural Networks

High demand for computation resources severely hinders deployment of large-scale Deep Neural Networks (DNN) in resource constrained devices. In this work, we propose a Structured Sparsity Learning (SSL) method to regularize the structures (i.e., filters, channels, filter shapes, and layer depth) of DNNs. SSL can: (1) learn a compact structure from a bigger DNN to reduce computation cost; (2) obtain a hardware-friendly structured sparsity of DNN to efficiently accelerate the DNNs evaluation. Experimental results show that SSL achieves on average 5.1x and 3.1x speedups of convolutional layer computation of AlexNet against CPU and GPU, respectively, with off-the-shelf libraries. These speedups are about twice speedups of non-structured sparsity; (3) regularize the DNN structure to improve classification accuracy. The results show that for CIFAR-10, regularization on layer depth can reduce 20 layers of a Deep Residual Network (ResNet) to 18 layers while improve the accuracy from 91.25% to 92.60%, which is still slightly higher than that of original ResNet with 32 layers. For AlexNet, structure regularization by SSL also reduces the error by around ~1%. Open source code is in https://github.com/wenwei202/caffe/tree/scnn

preprint2015arXiv

The Organisation of the Elderly Connectome

Investigations of the human connectome have elucidated core features of adult structural networks, particularly the crucial role of hub-regions. However, little is known regarding network organisation of the healthy elderly connectome, a crucial prelude to the systematic study of neurodegenerative disorders. Here, whole-brain probabilistic tractography was performed on high-angular diffusion-weighted images acquired from 115 healthy elderly subjects, whom were 76 to 94 years old. Structural networks were reconstructed between 512 cortical and subcortical brain regions. We sought to investigate the architectural features of hub-regions, as well as left-right asymmetries, and sexual dimorphisms. We observed that the topology of hub-regions is consistent with adult connectomic data, and more importantly, their architectural features reflect their ongoing vital role in network communication. We also found substantial sexual dimorphisms, with females exhibiting stronger inter-hemispheric connections between cingulate and prefrontal cortices. Lastly, we demonstrate intriguing left-lateralized subnetworks consistent with the neural circuitry specialised for language and executive functions, while rightward subnetworks were dominant in visual and visuospatial streams. These findings provide insights into healthy brain ageing and provide a benchmark for the study of neurodegenerative disorders such as Alzheimers disease and Frontotemporal Dementia.

preprint2014arXiv

Entanglement evolution of three-qubit mixed states in multipartite cavity-reservoir systems

We analyze the multipartite entanglement evolution of three-qubit mixed states composed of a GHZ state and a W state. For a composite system consisting of three cavities interacting with independent reservoirs, it is shown that the entanglement evolution is restricted by a set of monogamy relations. Furthermore, as quantified by the negativity, the entanglement dynamical property of the mixed entangled state of cavity photons is investigated. It is found that the three cavity photons can exhibit the phenomenon of entanglement sudden death (ESD). However, compared with the evolution of a generalized three-qubit GHZ state which has the equal initial entanglement, the ESD time of mixed states is later than that of the pure state. Finally, we discuss the entanglement distribution in the multipartite system, and point out the intrinsic relation between the ESD of cavity photons and the entanglement sudden birth of reservoirs.

preprint2012arXiv

Path integral and wave function collapse

Quantum measurement problem is still unconsensus since it has existed many years and inspired a large of literature in physics and philosophy. We show it can be subsumed into the quantum theory if we extend the Feynman path integral by considering the relativistic effect of Feynman paths. According to this extended theory, we deduce not only the Klein-Gordon equation, but also the wave-function-collapse equation. It is showing that the stochastic and instantaneous collapse of the quantum measurement is due to the "potential noise" of the apparatus or environment and "inner correlation" of wave function respectively. Therefore, the definite-status of the macroscopic matter is due to itself and this does not disobey the quantum mechanics. This work will give a new recognition for the measurement problem.

preprint2011arXiv

Entanglement evolution in multipartite cavity-reservoir systems under local unitary operations

We analyze the entanglement evolution of two cavity photons being affected by the dissipation of two individual reservoirs. Under an arbitrary local unitary operation on the initial state, it is shown that there is only one parameter which changes the entanglement dynamics. For the bipartite subsystems, we show that the entanglement of the cavity photons is correlated with that of the reservoirs, although the local operation can delay the time at which the photon entanglement disappears and advance the time at which the reservoir entanglement appears. Furthermore, via a new defined four-qubit entanglement measure and two three-qubit entanglement measures, we study the multipartite entanglement evolution in the composite system, which allows us to analyze quantitatively both bipartite and multipartite entanglement within a unified framework. In addition, we also discuss the entanglement evolution with an arbitrary initial state.

preprint2011arXiv

The Quantification of Quantum Nonlocality by Characteristic Function

Quantum nonlocal correlation (QNC) is thought to be more general than quantum entanglement correlation, but the strength of it has not been well defined. We propose a way to measure the strength of QNC basing on the characteristic function. The characteristic function of QNC in a composite system is defined as a response function under the local quantum measurement. It is explored that once a characteristic function is given, the state of a composite system, with just a local trace-preserving quantum operation uncertainty, will be determined. We show that the strength of QNC basing on the characteristic function is a half-positive-definite function and does not change under any LU operation. For a two-partite pure state, the strength of QNC is equivalent to the quantum entanglement. Generally, we give a new definition for quantum entanglement using the strength function. Furthermore, we also give a separability-criterion for 2\timesm-dimensional mixed real matrix. This letter proposes an alternate way for QNC further research.

Wei Wen

What is connected

Connect this record

See the researcher in context

Building this map preview

19 published item(s)

Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models

Computer-Aided Extraction of Select MRI Markers of Cerebral Small Vessel Disease: A Systematic Review

Exploiting Feature Diversity for Make-up Temporal Video Grounding

Hormonal Factors Moderate the Associations Between Vascular Risk Factors and White Matter Hyperintensities

Network resilience in the aging brain

See Finer, See More: Implicit Modality Alignment for Text-based Person Retrieval

AutoGrow: Automatic Layer Growing in Deep Convolutional Networks

DeepHoyer: Learning Sparser Neural Network with Differentiable Scale-Invariant Sparsity Measures

Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification

Trained Rank Pruning for Efficient Deep Neural Networks

Trained Rank Pruning for Efficient Deep Neural Networks

TRP: Trained Rank Pruning for Efficient Deep Neural Networks

A New Learning Method for Inference Accuracy, Core Occupation, and Performance Co-optimization on TrueNorth Chip

Learning Structured Sparsity in Deep Neural Networks

The Organisation of the Elderly Connectome

Entanglement evolution of three-qubit mixed states in multipartite cavity-reservoir systems

Path integral and wave function collapse

Entanglement evolution in multipartite cavity-reservoir systems under local unitary operations

The Quantification of Quantum Nonlocality by Characteristic Function