Source author record

Bei Liu

Bei Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computer Vision, math.FA, physics.atom-ph, quant-ph, Machine Learning, math.OA, Multimedia, physics.optics, Artificial Intelligence, Computation and Language, math.CA, math.KT, Social and Information Networks

Catalog footprint

What is connected

16 works

13 topics

4 close collaborators

preprint · 2026 · arXiv

EdgeFM: Efficient Edge Inference for Vision-Language Models

Vision-language models (VLMs) have demonstrated strong applicability in edge industrial applications, yet their deployment remains severely constrained by requirements for deterministic low latency and stable execution under resource limitations. Existing frameworks either rely on bloated general-purpose designs or force developers into opaque, hardware-specific closed-source ecosystems, leading to hardware lock-in and poor cross-platform adaptability. Observing that modern AI agents can efficiently search and tune configurations to generate highly optimized low-level kernels for standard LLM operators, we propose EdgeFM, a lightweight, agent-driven VLM/LLM inference framework tailored for cross-platform industrial edge deployment. EdgeFM removes non-essential features to reduce single-request latency, and encapsulates agent-tuned kernel optimizations as a modular library of reusable skills. By allowing direct invocation of these skills rather than waiting for closed-source implementations, it effectively closes the performance gap long dominated by proprietary toolchains. The framework natively supports mainstream platforms including x86 and NVIDIA Orin SoCs, and represents the first end-to-end VLA deployment on the domestic Horizon Journey platform, enhancing cross-platform portability. In most cases, it yields clearly better inference performance than conventional vendor-specific toolchains, achieving up to a 1.49× speedup over TensorRT-Edge-LLM on the NVIDIA Orin platform. Experimental results show that EdgeFM delivers favorable end-to-end inference performance, providing an open-source, production-grade solution for diverse edge industrial scenarios.

preprint · 2022 · arXiv

Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions

We study joint video and language (VL) pre-training to enable cross-modality learning and benefit plentiful downstream VL tasks. Existing works either extract low-quality video features or learn limited text embedding, while neglecting that high-resolution videos and diversified semantics can significantly improve cross-modality learning. In this paper, we propose a novel High-resolution and Diversified VIdeo-LAnguage pre-training model (HD-VILA) for many visual tasks. In particular, we collect a large dataset with two distinct properties: 1) the first high-resolution dataset including 371.5k hours of 720p videos, and 2) the most diversified dataset covering 15 popular YouTube categories. To enable VL pre-training, we jointly optimize the HD-VILA model by a hybrid Transformer that learns rich spatiotemporal features, and a multimodal Transformer that enforces interactions of the learned video features with diversified texts. Our pre-training model achieves new state-of-the-art results in 10 VL understanding tasks and 2 more novel text-to-visual generation tasks. For example, we outperform SOTA models with relative increases of 40.4% R@1 in zero-shot MSR-VTT text-to-video retrieval task and 55.4% in high-resolution dataset LSMDC. The learned VL embedding is also effective in generating visually pleasing and semantically relevant results in text-to-visual editing and super-resolution tasks.

preprint · 2022 · arXiv

AI Illustrator: Translating Raw Descriptions into Images by Prompt-based Cross-Modal Generation

AI illustrator aims to automatically design visually appealing images for books to provoke rich thoughts and emotions. To achieve this goal, we propose a framework for translating raw descriptions with complex semantics into semantically corresponding images. The main challenge lies in the complexity of the semantics of raw descriptions, which may be hard to visualize (e.g., "gloomy" or "Asian"). Existing methods usually struggle to handle such descriptions. To address this issue, we propose a Prompt-based Cross-Modal Generation Framework (PCM-Frame) to leverage two powerful pre-trained models, CLIP and StyleGAN. Our framework consists of two components: a projection module from Text Embeddings to Image Embeddings based on prompts, and an adapted image generation module built on StyleGAN which takes Image Embeddings as inputs and is trained by combined semantic consistency losses. To bridge the gap between realistic images and illustration designs, we further adopt a stylization model as post-processing in our framework for better visual effects. Benefiting from the pre-trained models, our method can handle complex descriptions and does not require external paired data for training. Furthermore, we have built a benchmark that consists of 200 raw descriptions. We conduct a user study to demonstrate our superiority over the competing methods with complicated texts. We release our code at https://github.com/researchmm/AI_Illustrator.

preprint · 2022 · arXiv

Exploring Anchor-based Detection for Ego4D Natural Language Query

In this paper we provide the technical report for the Ego4D natural language query challenge at CVPR 2022. The natural language query task is challenging because it requires a comprehensive understanding of video content. Most previous works address this task on third-person view datasets, while little research interest has so far been placed in the ego-centric view. Although great progress has been made, we notice that previous works cannot adapt well to ego-centric view datasets such as Ego4D, mainly for two reasons: 1) most queries in Ego4D have an excessively small temporal duration (e.g., less than 5 seconds); 2) queries in Ego4D require much more complex video understanding of long-term temporal order. Considering these, we propose our solution to this challenge to address the above issues.

preprint · 2022 · arXiv

Observability of the superkick effect within a quantum-field-theoretical approach

An atom placed in an optical vortex close to the axis may, upon absorbing a photon, acquire a transverse momentum much larger than the transverse momentum of any plane-wave component of the vortex light field. This surprising phenomenon, dubbed the superkick, has been clarified previously in terms of the atom wave packet evolution in the field of an optical vortex treated classically. Here, we study this effect within the quantum field theoretical (QFT) framework. We consider the collision of a Bessel twisted wave with a compact Gaussian beam focused to a small focal spot $σ$ located at distance $b$ from the twisted beam axis. Through a qualitative discussion supported by exact analytical and numerical calculations, we recover the superkick phenomenon for $σ\ll b$ and explore its limits when $σ$ becomes comparable to $b$. On the way to the final result within the QFT treatment, we encountered and resolved apparent paradoxes related to subtle issues of the formalism. These results open a way to a detailed QFT exploration of other superkick-related effects recently suggested to exist in high-energy collisions.

preprint · 2020 · arXiv

Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers

We propose Pixel-BERT to align image pixels with text by deep multi-modal transformers that jointly learn visual and language embeddings in a unified end-to-end framework. We aim to build a more accurate and thorough connection between image pixels and language semantics directly from image and sentence pairs, instead of using region-based image features as most recent vision and language methods do. Pixel-BERT, which aligns semantics at the pixel and text level, removes the limitation of task-specific visual representations for vision and language tasks. It also relieves the cost of bounding box annotations and overcomes the imbalance between the semantic labels of visual tasks and language semantics. To provide a better representation for downstream tasks, we pre-train a universal end-to-end model with image and sentence pairs from the Visual Genome and MS-COCO datasets. We propose a random pixel sampling mechanism to enhance the robustness of the visual representation, and apply Masked Language Modeling and Image-Text Matching as pre-training tasks. Extensive experiments on downstream tasks with our pre-trained model show that our approach achieves state-of-the-art results in downstream tasks, including Visual Question Answering (VQA), image-text retrieval, and Natural Language for Visual Reasoning for Real (NLVR). In particular, we boost the performance of a single model on the VQA task by 2.17 points compared with the SOTA under fair comparison.
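
The abstract names a random pixel sampling mechanism without describing it. The sketch below shows one plausible reading: keep a random subset of spatial positions from a per-pixel feature map before passing them on as visual tokens. The function name `sample_pixel_features`, the nested-list feature map, and the choice to return positions in raster order are all assumptions made for illustration, not the authors' implementation.

```python
import random


def sample_pixel_features(feature_map, num_samples, rng=None):
    """Randomly keep `num_samples` spatial positions from an H x W feature map.

    `feature_map` is a nested list indexed as [row][col], where each entry is
    that pixel's feature vector. Returns a list of (row, col, feature) tuples.
    (Hypothetical helper; the layout is an assumption.)
    """
    rng = rng or random.Random()
    height = len(feature_map)
    width = len(feature_map[0])
    positions = [(r, c) for r in range(height) for c in range(width)]
    # Never ask for more positions than the map contains.
    k = min(num_samples, len(positions))
    # Sample without replacement, then restore raster order.
    chosen = sorted(rng.sample(positions, k))
    return [(r, c, feature_map[r][c]) for r, c in chosen]


# Toy 4 x 4 map whose "feature" at each pixel is just its coordinates.
toy_map = [[[r, c] for c in range(4)] for r in range(4)]
sampled = sample_pixel_features(toy_map, 5, random.Random(0))
```

Sampling a different subset on every training step means the model never sees the same set of visual tokens twice, which is one way such a mechanism could act as a regularizer.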

preprint · 2020 · arXiv

SMP Challenge: An Overview of Social Media Prediction Challenge 2019

"SMP Challenge" aims to discover novel prediction tasks for numerous data on social multimedia and seek excellent research teams. Making predictions via social multimedia data (e.g. photos, videos or news) not only helps us to make better strategic decisions for the future, but also explores advanced predictive learning and analytic methods on various problems and scenarios, such as multimedia recommendation, advertising systems, fashion analysis, etc. In the SMP Challenge at ACM Multimedia 2019, we introduce a novel prediction task, Temporal Popularity Prediction, which focuses on predicting the future interaction or attractiveness (in terms of clicks, views, likes, etc.) of new online posts in social media feeds before uploading. We also collected and released a large-scale SMPD benchmark with over 480K posts from 69K users. In this paper, we define the challenge problem, give an overview of the dataset, present statistics of the rich information in the data and annotations, and design the accuracy and correlation evaluation metrics for temporal popularity prediction for the challenge.
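
The abstract mentions accuracy and correlation metrics without naming them. As a rough illustration of what such a pair could look like for popularity scores, the sketch below uses mean absolute error for accuracy and Spearman rank correlation for correlation. Both choices, and all function names, are assumptions made here rather than details taken from the abstract.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between true and predicted popularity scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_correlation(y_true, y_pred):
    """Pearson correlation of the rank vectors (undefined for constant input)."""
    rt, rp = average_ranks(y_true), average_ranks(y_pred)
    n = len(rt)
    mt, mp = sum(rt) / n, sum(rp) / n
    cov = sum((a - mt) * (b - mp) for a, b in zip(rt, rp))
    var_t = sum((a - mt) ** 2 for a in rt)
    var_p = sum((b - mp) ** 2 for b in rp)
    return cov / (var_t * var_p) ** 0.5
```

The two metrics answer different questions: the error term measures how far off each predicted score is, while the rank correlation only checks whether posts are ordered correctly by popularity.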

preprint · 2016 · arXiv

Amplification of nanosecond laser pulse chain via dynamic injection locking of laser diode

We report a novel optical pulse generation method for high-speed wavelength switching of amplified nanosecond (ns) laser pulses resonant to atomic transitions. Under the free-running condition, a slave laser diode is blue-detuned by tens of GHz relative to the master laser. A ns pulse chain generated by modulating the continuous-wave master laser with a fiber-pigtailed electro-optical intensity modulator is injected into the slave laser diode to fast switch the slave laser's wavelength back and forth. The output beam of the slave laser is filtered by a temperature-controlled etalon to get the amplified pulse chain. Based on our dynamic injection locking scheme, we produce a ns-scale square pulse chain with an effective ON/OFF ratio of 10^8, considering at least the 60 dB scattering suppression by tuning light-atom interactions with far off-resonance detuning and the 26.7 dB suppression ratio of the etalon. By studying the dynamic processes of injection locking, we determine the dependence of injection locking on both the injection power and the frequency detuning.
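
The quoted ON/OFF ratio can be checked with a few lines of arithmetic. The sketch below assumes the two suppression figures are power ratios that combine multiplicatively, so they add in decibels; the abstract does not state this explicitly, and the helper names are made up for illustration.

```python
import math


def db_to_ratio(db):
    """Convert a power suppression in dB to a linear power ratio."""
    return 10 ** (db / 10)


def ratio_to_db(ratio):
    """Convert a linear power ratio to dB."""
    return 10 * math.log10(ratio)


# Figures quoted in the abstract.
scattering_suppression_db = 60.0   # far off-resonance detuning
etalon_suppression_db = 26.7       # temperature-controlled etalon

# Assumption: independent suppression stages add in dB.
combined_db = scattering_suppression_db + etalon_suppression_db
combined_ratio = db_to_ratio(combined_db)

print(f"combined suppression: {combined_db:.1f} dB")
print(f"linear ON/OFF ratio:  {combined_ratio:.2e}")
```

Under that assumption the combined suppression is 86.7 dB, a linear ratio of a few times 10^8, which agrees in order of magnitude with the stated effective ON/OFF ratio of 10^8.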

preprint · 2016 · arXiv

High on/off ratio nanosecond laser pulses for a triggered single-photon source

An 852 nm nanosecond laser pulse chain with a high on/off ratio is generated by chopping a continuous-wave laser beam using a Mach-Zehnder-type electro-optic intensity modulator (MZ-EOIM). The detailed dependence of the MZ-EOIM's on/off ratio on various parameters is characterized. By optimizing the incident beam polarization and stabilizing the MZ-EOIM temperature, a static on/off ratio of 12600:1 is achieved. The dynamic on/off ratios versus the pulse repetition rate and the pulse duty cycle are measured and discussed. The high-on/off-ratio nanosecond pulsed laser system was used in a triggered single-photon source based on a trapped single cesium atom, which reveals clear antibunching.

preprint · 2016 · arXiv

Suppression of single cesium atom heating in a microscopic optical dipole trap for demonstration of an 852nm triggered single-photon source

We investigate single cesium (Cs) atom heating owing to the momentum accumulation process induced by the resonant pulsed excitation in a microscopic optical dipole trap formed by a strongly focused 1064 nm laser beam. The heating depends on the trap frequency, which restricts the maximum repetition rate of pulsed excitation. We experimentally verify the heating of a single atom and then demonstrate how to suppress it with an optimized pulsed excitation/cooling method. The typical trap lifetime of a single Cs atom is extended from 108 ± 6 μs to 2536 ± 31 ms, and the corresponding number of excitations increases from ~ 108 to ~ 360000. In applying this faster cooling method, we use the trapped single Cs atom as a triggered single-photon source at an excitation repetition rate of 10 MHz. The second-order intensity correlations of the emitted single photons are characterized by implementing a Hanbury Brown and Twiss setup, and a clear anti-bunching effect has been observed.

preprint · 2015 · arXiv

Some Structural Properties of Homomorphism Dilation Systems for Linear Maps

Inspired by recent developments in the theory of projection-valued dilations for operator-valued measures or, more generally, bounded homomorphism dilations for bounded linear maps on Banach algebras, we explore a purely algebraic version of the dilation theory for linear systems acting on unital algebras and vector spaces. By introducing two natural dilation structures, namely the canonical and the universal dilation systems, we prove that every linearly minimal dilation is equivalent to a reduced homomorphism dilation of the universal dilation, and all the linearly minimal homomorphism dilations can be classified by the associated reduced subspaces contained in the kernel of the synthesis operator for the universal dilation.

preprint · 2014 · arXiv

Dilations for Systems of Imprimitivity acting on Banach Spaces

Motivated by a general dilation theory for operator-valued measures, framings and bounded linear maps on operator algebras, we consider the dilation theory of the above objects with special structures. We show that every operator-valued system of imprimitivity has a dilation to a probability spectral system of imprimitivity acting on a Banach space. This completely generalizes a well-known result which states that every frame representation of a countable group on a Hilbert space is unitarily equivalent to a subrepresentation of the left regular representation of the group. The dilated space in general cannot be taken as a Hilbert space. However, it can be taken as a Hilbert space for positive operator-valued systems of imprimitivity. We also prove that isometric group representation induced framings on a Banach space can be dilated to unconditional bases with the same structure for a larger Banach space. This extends several known results on the dilations of frames induced by unitary group representations on Hilbert spaces.

preprint · 2014 · arXiv

Dilations of frames, operator valued measures and bounded linear maps

We will give an outline of the main results in our recent AMS Memoir, and include some new results, exposition and open problems. In that memoir we developed a general dilation theory for operator valued measures acting on Banach spaces where operator-valued measures (or maps) are not necessarily completely bounded. The main results state that any operator-valued measure, not necessarily completely bounded, always has a dilation to a projection-valued measure acting on a Banach space, and every bounded linear map, again not necessarily completely bounded, on a Banach algebra has a bounded homomorphism dilation acting on a Banach space. Here the dilation space often needs to be a Banach space even if the underlying space is a Hilbert space, and the projections are idempotents that are not necessarily self-adjoint. These results lead to some new connections between frame theory and operator algebras, and some of them can be considered as part of the investigation about "noncommutative" frame theory.

preprint · 2014 · arXiv

Twisted K-homology, Geometric cycles and T-duality

Twisted $K$-homology corresponds to $D$-branes in string theory. In this paper we compare two different models of geometric twisted $K$-homology and prove their equivalence. Moreover, we give another description of geometric twisted $K$-homology using bundle gerbes. We establish some properties of geometric twisted $K$-homology. In the last part we construct the $T$-duality isomorphism for geometric twisted $K$-homology.

preprint · 2012 · arXiv

Operator-Valued Measures, Dilations, and the Theory of Frames

We develop elements of a general dilation theory for operator-valued measures and bounded linear maps between operator algebras that are not necessarily completely-bounded. We prove our main results by extending and generalizing some known results from the theory of frames and framings.

preprint · 2012 · arXiv

Upper Beurling Density of Systems formed by Translates of finite Sets of Elements in $L^p(\R^d)$

In this paper, we prove that if a finite disjoint union of translates $\bigcup_{k=1}^n\{f_k(x-γ)\}_{γ\inΓ_k}$ in $L^p(\R^d)$ $(1<p<\infty)$ is a $p'$-Bessel sequence for some $1<p'<\infty$, then the disjoint union $Γ=\bigcup_{k=1}^nΓ_k$ has finite upper Beurling density, and that if $\bigcup_{k=1}^n\{f_k(x-γ)\}_{γ\inΓ_k}$ is a $(C_q)$-system with $1/p+1/q=1$, then $Γ$ has infinite upper Beurling density. Thus, no finite disjoint union of translates in $L^p(\R^d)$ can form a $p'$-Bessel $(C_q)$-system for any $1< p'<\infty$. Furthermore, by using techniques from the geometry of Banach spaces, we obtain that, for $1<p\le2$, no finite disjoint union of translates in $L^p(\R^d)$ can form an unconditional basis.
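
The abstract uses upper Beurling density without defining it. For orientation, the standard definition for a countable set $Γ\subset\R^d$ is given below, counting points with multiplicity. That the paper uses exactly this normalization is an assumption; the abstract does not say which convention it follows.

$$
D^+(Γ) \;=\; \limsup_{r\to\infty}\; \sup_{x\in\R^d}\; \frac{\#\bigl(Γ\cap(x+[0,r]^d)\bigr)}{r^d}
$$

Read this way, finite upper Beurling density means that no cube of side $r$ contains more than a constant multiple of $r^d$ points of $Γ$ once $r$ is large. The two conclusions in the abstract, finite density for a $p'$-Bessel sequence and infinite density for a $(C_q)$-system, are therefore incompatible, which is why no system can have both properties.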
