Source author record

Tian Wang

Tian Wang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Catalog footprint

What is connected

27 works

27 topics

4 close collaborators

preprint 2026 arXiv

3D Primitives are a Spatial Language for VLMs

Vision-language models (VLMs) exhibit a striking paradox: they can generate executable code that reconstructs a 3D scene from geometric primitives with correct object counts, classes, and approximate positions, yet the same models fail at simpler spatial questions on the same image. We show that 3D geometric primitives (cubes, spheres, cylinders, expressed in executable code) serve as a powerful intermediate representation for spatial understanding, and exploit this through three contributions. First, we introduce SpatialBabel, a benchmark evaluating fourteen VLMs on primitive-based 3D scene reconstruction across six scene-code languages (programming languages and declarative formats for 3D primitive scenes), revealing that a single model's object-detection F1 can vary by up to 5.7× across languages. Second, we propose Code-CoT (Code Chain-of-Thought), a training-free inference strategy that routes spatial reasoning through primitive-based code generation. Code-CoT lifts the SpatialBabel-QA-Score by up to +6.4% on primitive scenes and real-photo CV-Bench-3D accuracy by +5.0% for VLMs with strong coding capabilities. Third, we propose S$^{3}$-FT (Self-Supervised Spatial Fine-Tuning), which distills primitive spatial knowledge into general visual reasoning in a self-supervised manner by parsing the model's own Three.js primitive reconstructions into structured annotations and fine-tuning on the result, with no human labels and no teacher model. Training on primitive images alone, S$^{3}$-FT improves Qwen3-VL-8B by +4.6 to +8.6% on SpatialBabel-Primitive-QA, +9.7% on CV-Bench-2D, and +17% on HallusionBench; the recipe transfers across model families. These results establish geometric primitives in code as both a diagnostic and a transferable spatial vocabulary for VLMs. We will release all artifacts upon publication.

preprint 2026 arXiv

Edge-Optimized Multimodal Learning for UAV Video Understanding via BLIP-2

The demand for real-time visual understanding and interaction in complex scenarios is increasingly critical for unmanned aerial vehicles (UAVs). However, a significant challenge arises from the contradiction between the high computational cost of large vision-language models and the limited computing resources available on UAV edge devices. To address this challenge, this paper proposes a lightweight multimodal task platform based on BLIP-2, integrated with YOLO-World and YOLOv8-Seg models. This integration extends the multi-task capabilities of BLIP-2 for UAV applications with minimal adaptation and without requiring task-specific fine-tuning on drone data. Firstly, the deep integration of BLIP-2 with YOLO models enables it to leverage the precise perceptual results of YOLO for fundamental tasks like object detection and instance segmentation, thereby facilitating deeper visual-attention understanding and reasoning. Secondly, a content-aware key frame sampling mechanism based on K-Means clustering is designed, which incorporates intelligent frame selection and temporal feature concatenation. This equips the lightweight BLIP-2 architecture with the capability to handle video-level interactive tasks effectively. Thirdly, a unified prompt optimization scheme for multi-task adaptation is implemented. This scheme strategically injects structured event logs from the YOLO models as contextual information into BLIP-2's input. Combined with output constraints designed to filter out technical details, this approach effectively guides the model to generate accurate and contextually relevant outputs for various tasks.
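The K-Means-based key frame sampling described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the deterministic farthest-point seeding, and the rule of picking the frame nearest each centroid are all assumptions made for illustration, and the per-frame feature vectors are assumed to come from some upstream encoder.

```python
import math

def farthest_point_init(points, k):
    """Deterministic seeding (an assumption): start at the first point, then
    repeatedly add the point farthest from every centroid chosen so far.
    Requires k <= len(points)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def assign_clusters(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]

def kmeans(points, k, iters=20):
    """Plain Lloyd iterations on tuples of floats."""
    centroids = farthest_point_init(points, k)
    for _ in range(iters):
        labels = assign_clusters(points, centroids)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assign_clusters(points, centroids)

def select_key_frames(frame_features, k):
    """Return sorted frame indices, one per non-empty cluster: the frame
    whose feature vector lies closest to that cluster's centroid."""
    centroids, labels = kmeans(frame_features, k)
    key_frames = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            key_frames.append(
                min(members, key=lambda i: math.dist(frame_features[i], centroids[c]))
            )
    return sorted(key_frames)
```

Calling `select_key_frames(features, k)` on a list of per-frame feature tuples returns at most `k` frame indices in temporal order, which a downstream model could then consume instead of the full video.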

preprint 2026 arXiv

Hybrid Distillation with CoT Guidance for Edge-Drone Control Code Generation

With large language models demonstrating significant potential in code generation tasks, their application to onboard control of resource-constrained unmanned aerial vehicles (UAVs) has emerged as an important research direction. However, a notable contradiction exists between the high resource consumption of large models and the real-time, lightweight requirements of UAV platforms. This paper proposes an integrated approach that combines knowledge distillation, chain-of-thought guidance, and supervised fine-tuning for UAV multi-SDK control tasks, aiming to efficiently transfer complex reasoning and code generation capabilities to smaller models. Firstly, a high-quality dataset covering various mainstream UAV SDKs is constructed, featuring instruction-code-reasoning chains and incorporating counterfactual negative samples for data augmentation, guiding the model to learn the end-to-end logic from instruction parsing to code generation. Secondly, leveraging DeepSeek-Coder-V2-Lite quantized via QLoRA as the teacher model, and based on a hybrid black-box and white-box distillation strategy, high-quality chain-of-thought soft labels are generated. These are combined with a weighted cross-entropy loss using hard labels to transfer complex reasoning capabilities to the smaller student model. Finally, through prompt tuning engineering optimized for the UAV control scenario, the model performance on core tasks such as SDK type recognition and function call matching is enhanced. Experimental results indicate that the distilled lightweight model maintains high code generation accuracy while achieving significant improvements in deployment and inference efficiency, demonstrating the feasibility and superiority of our approach in achieving precise and lightweight intelligent control for UAVs.
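One common way to combine teacher soft labels with a hard-label cross-entropy term is sketched below. The abstract does not give the exact loss, so the weighting `alpha`, the `temperature`, the use of a KL term for the soft labels, and all function names here are assumptions for illustration rather than the paper's formulation.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    """Negative log-likelihood of the hard label."""
    return -math.log(probs[target_index])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def hybrid_distillation_loss(student_logits, teacher_logits, hard_label,
                             alpha=0.5, temperature=2.0):
    """Weighted sum of a hard-label term and a soft-label term.

    alpha weights the cross-entropy against the hard label; (1 - alpha)
    weights the KL divergence from the temperature-softened teacher
    distribution to the student's. Both hyperparameters are hypothetical.
    """
    hard_term = cross_entropy(softmax(student_logits), hard_label)
    soft_term = kl_divergence(
        softmax(teacher_logits, temperature),
        softmax(student_logits, temperature),
    )
    return alpha * hard_term + (1 - alpha) * soft_term
```

When the student reproduces the teacher's logits exactly, the soft term vanishes and only the weighted hard-label term remains, which makes the two contributions easy to inspect separately.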

preprint 2026 arXiv

Large Language Models to Enhance Multi-task Drone Operations in Simulated Environments

Benefiting from the rapid advancements in large language models (LLMs), human-drone interaction now faces unprecedented opportunities. In this paper, we propose a method that integrates a fine-tuned CodeT5 model with the Unreal Engine-based AirSim drone simulator to efficiently execute multi-task operations using natural language commands. This approach enables users to interact with simulated drones through prompts or command descriptions, allowing them to easily access and control the drone's status and significantly lowering the operational threshold. In the AirSim simulator, we can flexibly construct visually realistic dynamic environments to simulate drone applications in complex scenarios. By combining a large dataset of (natural language, program code) command-execution pairs generated by ChatGPT with developer-written drone code as training data, we fine-tune CodeT5 to achieve automated translation from natural language to executable code for drone tasks. Experimental results demonstrate that the proposed method exhibits superior task execution efficiency and command understanding capabilities in simulated environments. In the future, we plan to extend the model functionality in a modular manner, enhancing its adaptability to complex scenarios and driving the application of drone technologies in real-world environments.

preprint 2026 arXiv

LLM Agents Enable User-Governed Personalization Beyond Platform Boundaries

Personalization today is fundamentally platform-centric: services build user representations from the behavioral fragments they observe. Yet no platform can construct a complete picture of the user, as competitive incentives, legal constraints, user privacy concerns, and epistemic limits create persistent data barriers. This paper argues for a shift from platform-centric personalization to user-governed personalization, where only the user can integrate fragmented contexts across platforms and the offline world. The key asymmetry lies in data access: only users can aggregate their own cross-platform and offline information. Large language model (LLM) agents make such integration practically feasible for the first time by enabling reasoning over heterogeneous personal data and transforming users' cross-context information into actionable personalization capabilities. We provide proof-of-concept evidence that users equipped with cross-platform data exports and an off-the-shelf LLM agent can outperform single-platform personalization baselines. We conclude by outlining a research agenda for building scalable user-governed personalization systems.

preprint 2026 arXiv

Spectral point transformer for significant wave height estimation from sea clutter

This paper presents a method for estimating significant wave height (Hs) from sparse spectral points using a Transformer-based approach, the Spectral Point Transformer (SPT). Based on empirical observations that only a minority of spectral points with strong power contribute to wave energy, the proposed SPT effectively integrates geometric and spectral characteristics of ocean surface waves to estimate Hs through multi-dimensional feature representation. The experiment reveals an intriguing phenomenon: the learned features of SPT align well with physical dispersion relations, where the contribution-score map of selected points is concentrated along dispersion curves. Compared to conventional vision networks that process image sequences and full spectra, SPT demonstrates superior performance in Hs regression while consuming significantly fewer computational resources. On a consumer-grade GPU, SPT completes the training of a regression model for 1080 sea clutter image sequences within 4 minutes, showcasing its potential to reduce deployment costs for radar wave-measuring systems. The open-source implementation of SPT will be available at https://github.com/joeyee/spt

preprint 2023 arXiv

A Privacy Glossary for Cloud Computing

Cloud computing is an evolving paradigm that is frequently changing the way humans share, store, and access their information in digital format. While cloud computing offers tremendous benefits (e.g., efficiency, flexibility, and reduced costs), it also brings both security and privacy challenges. Although cloud security has been extensively defined and developed, privacy protections in cloud environments are often described in abstract or vague language, which makes them difficult to interpret and implement. In this study, we propose an initial approach to developing a privacy glossary for cloud computing that provides a consistent and comprehensive set of terminologies for cloud privacy. We believe that this systematic and structured privacy glossary could serve as a first step towards implementing requirements for privacy protections in cloud computing, as well as providing more effective and consistent language in cloud privacy to researchers and professionals in the future.

preprint 2022 arXiv

Bi-level Doubly Variational Learning for Energy-based Latent Variable Models

Energy-based latent variable models (EBLVMs) are more expressive than conventional energy-based models. However, their potential on visual tasks is limited by a training process based on maximum likelihood estimation that requires sampling from two intractable distributions. In this paper, we propose Bi-level doubly variational learning (BiDVL), which is based on a new bi-level optimization framework and two tractable variational distributions to facilitate learning EBLVMs. In particular, we introduce a decoupled EBLVM consisting of a marginal energy-based distribution and a structural posterior to handle the difficulties of learning deep EBLVMs on images. By choosing a symmetric KL divergence in the lower level of our framework, a compact BiDVL for visual tasks can be obtained. Our model achieves impressive image generation performance over related works. It also demonstrates significant capacity for test-image reconstruction and out-of-distribution detection.

preprint 2022 arXiv

Bounds for the distribution of the Frobenius traces associated to products of non-CM elliptic curves

Let $g \geq 1$ be an integer and let $A/\mathbb{Q}$ be an abelian variety that is isogenous over $\mathbb{Q}$ to a product of $g$ elliptic curves defined over $\mathbb{Q}$, pairwise non-isogenous over $\overline{\mathbb{Q}}$ and each without complex multiplication. For an integer $t$ and a positive real number $x$, denote by $\pi_A(x, t)$ the number of primes $p \leq x$, of good reduction for $A$, for which the Frobenius trace $a_{1, p}(A)$ associated to the reduction of $A$ modulo $p$ equals $t$. Assuming the Generalized Riemann Hypothesis for Dedekind zeta functions, we prove that $\pi_A(x, 0) \ll_A x^{1 - \frac{1}{3 g+1 }}/(\log x)^{1 - \frac{2}{3 g+1}}$ and $\pi_A(x, t) \ll_A x^{1 - \frac{1}{3 g + 2}}/(\log x)^{1 - \frac{2}{3 g + 2}}$ if $t \neq 0$. These bounds largely improve upon recent ones obtained for $g = 2$ by H. Chen, N. Jones, and V. Serban, and may be viewed as generalizations to arbitrary $g$ of the bounds obtained for $g=1$ by M.R. Murty, V.K. Murty, and N. Saradha, combined with a refinement in the power of $\log x$ by D. Zywina. Under the same assumptions, we also prove the existence of a density one set of primes $p$ satisfying $|a_{1, p}(A)|>p^{\frac{1}{3 g + 1} - \varepsilon}$ for any fixed $\varepsilon>0$.
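As a quick arithmetic check on the exponents stated in this abstract, the two bounds can be specialized to small $g$. The substitution below is only a worked illustration of the stated formulas, not a result taken from the paper beyond what the abstract already gives.

```latex
\begin{align*}
g = 1:\quad & \pi_A(x, 0) \ll_A \frac{x^{1 - \frac{1}{4}}}{(\log x)^{1 - \frac{2}{4}}}
              = \frac{x^{3/4}}{(\log x)^{1/2}},
            & \pi_A(x, t) \ll_A \frac{x^{1 - \frac{1}{5}}}{(\log x)^{1 - \frac{2}{5}}}
              = \frac{x^{4/5}}{(\log x)^{3/5}} \quad (t \neq 0), \\
g = 2:\quad & \pi_A(x, 0) \ll_A \frac{x^{1 - \frac{1}{7}}}{(\log x)^{1 - \frac{2}{7}}}
              = \frac{x^{6/7}}{(\log x)^{5/7}},
            & \pi_A(x, t) \ll_A \frac{x^{1 - \frac{1}{8}}}{(\log x)^{1 - \frac{2}{8}}}
              = \frac{x^{7/8}}{(\log x)^{3/4}} \quad (t \neq 0).
\end{align*}
```

As $g$ grows, both exponents of $x$ approach 1, so the savings over the trivial bound shrink with the number of elliptic-curve factors.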

preprint 2022 arXiv

CAIBC: Capturing All-round Information Beyond Color for Text-based Person Retrieval

Given a natural language description, text-based person retrieval aims to identify images of a target person from a large-scale person image database. Existing methods generally face a color over-reliance problem, which means that the models rely heavily on color information when matching cross-modal data. Indeed, color information is an important basis for retrieval decisions, but over-reliance on color distracts the model from other key clues (e.g., texture information and structural information) and thereby leads to sub-optimal retrieval performance. To solve this problem, in this paper, we propose to Capture All-round Information Beyond Color (CAIBC) via a jointly optimized multi-branch architecture for text-based person retrieval. CAIBC contains three branches: an RGB branch, a grayscale (GRS) branch and a color (CLR) branch. In addition, to make full use of all-round information in a balanced and effective way, a mutual learning mechanism is employed to enable the three branches, which attend to varied aspects of information, to communicate with and learn from each other. Extensive experimental analysis is carried out to evaluate our proposed CAIBC method on the CUHK-PEDES and RSTPReid datasets in both supervised and weakly supervised text-based person retrieval settings, which demonstrates that CAIBC significantly outperforms existing methods and achieves state-of-the-art performance on all three tasks.

preprint 2022 arXiv

Contrastive Learning for Interactive Recommendation in Fashion

Recommender systems and search are both indispensable in facilitating personalization and ease of browsing in online fashion platforms. However, the two tools often operate independently, failing to combine the strengths of recommender systems to accurately capture user tastes with search systems' ability to process user queries. We propose a novel remedy to this problem by automatically recommending personalized fashion items based on a user-provided text request. Our proposed model, WhisperLite, uses contrastive learning to capture user intent from natural language text and improves the recommendation quality of fashion products. WhisperLite combines the strength of CLIP embeddings with additional neural network layers for personalization, and is trained using a composite loss function based on binary cross entropy and contrastive loss. The model demonstrates a significant improvement in offline recommendation retrieval metrics when tested on a real-world dataset collected from an online retail fashion store, as well as widely used open-source datasets in different e-commerce domains, such as restaurants, movies and TV shows, clothing and shoe reviews. We additionally conduct a user study that captures user judgements on the relevance of the model's recommended items, confirming the relevancy of WhisperLite's recommendations in an online setting.

preprint 2022 arXiv

Delving into the Estimation Shift of Batch Normalization in a Network

Batch normalization (BN) is a milestone technique in deep learning. It normalizes the activation using mini-batch statistics during training but the estimated population statistics during inference. This paper focuses on investigating the estimation of population statistics. We define the estimation shift magnitude of BN to quantitatively measure the difference between its estimated population statistics and expected ones. Our primary observation is that the estimation shift can accumulate due to the stacking of BN layers in a network, which has detrimental effects on test performance. We further find that a batch-free normalization (BFN) can block such an accumulation of estimation shift. These observations motivate our design of XBNBlock, which replaces one BN with BFN in the bottleneck block of residual-style networks. Experiments on the ImageNet and COCO benchmarks show that XBNBlock consistently improves the performance of different architectures, including ResNet and ResNeXt, by a significant margin and seems to be more robust to distribution shift.

preprint 2022 arXiv

Look Before You Leap: Improving Text-based Person Retrieval by Learning A Consistent Cross-modal Common Manifold

The core problem of text-based person retrieval is how to bridge the heterogeneous gap between multi-modal data. Many previous approaches attempt to learn a latent common manifold mapping paradigm in a cross-modal distribution consensus prediction (CDCP) manner. When mapping features from the distribution of one modality into the common manifold, the feature distribution of the opposite modality is completely invisible. That is to say, how to achieve a cross-modal distribution consensus, so as to embed and align the multi-modal features in a constructed cross-modal common manifold, depends entirely on the experience of the model itself rather than on the actual situation. With such methods, it is inevitable that the multi-modal data cannot be well aligned in the common manifold, which finally leads to sub-optimal retrieval performance. To overcome this CDCP dilemma, we propose a novel algorithm termed LBUL to learn a Consistent Cross-modal Common Manifold (C$^{3}$M) for text-based person retrieval. The core idea of our method, just as the Chinese saying 'san si er hou xing' goes, is to Look Before yoU Leap (LBUL). The common manifold mapping mechanism of LBUL contains a looking step and a leaping step. Compared to CDCP-based methods, LBUL considers distribution characteristics of both the visual and textual modalities before embedding data from one modality into C$^{3}$M to achieve a more solid cross-modal distribution consensus, and hence achieves superior retrieval accuracy. We evaluate our proposed method on two text-based person retrieval datasets, CUHK-PEDES and RSTPReid. Experimental results demonstrate that the proposed LBUL outperforms previous methods and achieves state-of-the-art performance.

preprint 2022 arXiv

Observation and dynamic control of a new pathway of H$_3^+$ formation

We propose and experimentally demonstrate that the trihydrogen cation (H$_3^+$) can be produced via single photoionization of the molecular hydrogen dimer (H$_4$). Using near-infrared, femtosecond laser pulses and coincidence momentum imaging, we find that the dominant channel after single ionization of the dimer is the ejection of a hydrogen atom within a few hundred femtoseconds, leaving an H$_3^+$ cation behind. The formation mechanism is supported and well reproduced by an ab-initio molecular dynamics simulation. This is a new pathway of H$_3^+$ formation from ultracold hydrogen gas that may help explain the unexpectedly high abundance of H$_3^+$ in the interstellar medium in the universe.

preprint 2021 arXiv

Personalized Embedding-based e-Commerce Recommendations at eBay

Recommender systems are an essential component of e-commerce marketplaces, helping consumers navigate massive amounts of inventory and find what they need or love. In this paper, we present an approach for generating personalized item recommendations in an e-commerce marketplace by learning to embed items and users in the same vector space. In order to alleviate the considerable cold-start problem present in large marketplaces, item and user embeddings are computed using content features and multi-modal onsite user activity respectively. Data ablation is incorporated into the offline model training process to improve the robustness of the production system. In offline evaluation using a dataset collected from eBay traffic, our approach was able to improve the Recall@k metric over the Recently-Viewed-Item (RVI) method. This approach to generating personalized recommendations has been launched to serve production traffic, and the corresponding scalable engineering architecture is also presented. Initial A/B test results show that compared to the current personalized recommendation module in production, the proposed method increases the surface rate by ~6% to generate recommendations for 90% of listing page impressions.
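The Recall@k metric mentioned in this abstract can be sketched as follows. Definitions of Recall@k vary across papers, and the abstract does not specify which one was used, so the version below (the fraction of relevant items that appear in the top k recommendations) and the function name are assumptions for illustration.

```python
def recall_at_k(recommended, relevant, k):
    """Fraction of relevant items found among the top-k recommendations.

    recommended: ranked list of item ids, best first.
    relevant: collection of item ids the user actually engaged with.
    Returns 0.0 when there are no relevant items (a convention chosen here).
    """
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    hits = len(set(recommended[:k]) & relevant_set)
    return hits / len(relevant_set)
```

For a ranked list `["a", "b", "c", "d"]` and relevant items `{"b", "d", "e"}`, only `"b"` falls in the top 2, so one of three relevant items is recovered at k = 2, and two of three at k = 4.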

preprint 2020 arXiv

Accelerating temporal action proposal generation via high performance computing

Temporal action recognition always depends on temporal action proposal generation to hypothesize actions, and algorithms usually need to process very long video sequences and output the starting and ending times of each potential action in each video, which incurs a high computation cost. To address this, building on the boundary sensitive network, we propose a new temporal convolution network called Multipath Temporal ConvNet (MTN), which consists of two parts, i.e., Multipath DenseNet and SE-ConvNet. In this work, a novel high-performance ring parallel architecture based on the Message Passing Interface (MPI), a reliable communication protocol, is further introduced into temporal action proposal generation in order to meet the requirements of large memory occupation and a large number of videos. Remarkably, the total data transmission is reduced by adding a connection between multiple computing loads in the newly developed architecture. It is found that, compared to the traditional Parameter Server architecture, our parallel architecture has higher efficiency on the temporal action detection task with multiple GPUs, which makes it suitable for temporal action proposal generation, especially for large datasets of millions of videos. We conduct experiments on ActivityNet-1.3 and THUMOS14, where our method outperforms other state-of-the-art temporal action detection methods with high recall and high temporal precision. In addition, a time metric is further proposed here to evaluate the speed performance in the distributed training process.

preprint 2020 arXiv

Double-Layer Game Based Wireless Charging Scheduling for Electric Vehicles

Wireless charging technology provides a solution to the insufficient battery life of electric vehicles (EVs). However, the conflict of interests between wireless charging lanes (WCLs) and EVs is difficult to resolve. In the day-ahead electricity market, considering the revenue of WCLs caused by the deviation between actual electricity sales and pre-purchased electricity, as well as endurance and traveling experience of EVs, this paper proposes a charging scheduling algorithm based on a double-layer game model. In the lower layer, the potential game is used to model the multi-vehicle game of vehicle charging planning. A shortest path algorithm based on the three-way greedy strategy is designed to solve the dynamic charging sequence problem, and the improved particle swarm optimization algorithm is used to solve the variable ordered potential game. In the upper layer, the reverse Stackelberg game is adopted to harmonize the cost of wireless charging lanes and electric vehicles. As the leader, WCLs stimulate EVs to carry out reasonable charging actions by electricity price regulation. As the follower, EVs make the best charging decisions for a given electricity price. An iteration algorithm is designed to ensure the Nash equilibrium convergence of this game. The simulation results show that the double-layer game model proposed in this paper can effectively suppress the deviation between the actual electricity sales and the pre-sale of the charging lane caused by the disorderly charging behavior of the vehicle, and ensure the high endurance and traveling experience of EVs.

preprint 2018 arXiv

The free energy of biomembrane and nerve excitation and the role of anesthetics

In the electromechanical theory of nerve stimulation, the nerve impulse consists of a traveling region of solid membrane in a liquid environment. Therefore, the free energy necessary to stimulate a pulse is directly related to the free energy difference necessary to induce a phase transition in the nerve membrane. It is a function of temperature and pressure, and it is sensitively dependent on the presence of anesthetics which lower melting transitions. We investigate the free energy difference of solid and liquid membrane phases under the influence of anesthetics. We calculate stimulus-response curves of electromechanical pulses and compare them to measured stimulus-response profiles in lobster and earthworm axons. We also compare them to stimulus-response experiments on human median nerve and frog sciatic nerve published in the literature.

preprint 2016 arXiv

Non-invasive detection of animal nerve impulses with an atomic magnetometer operating near quantum limited sensitivity

Magnetic fields generated by human and animal organs, such as the heart, brain and nervous system, carry information useful for biological and medical purposes. These magnetic fields are most commonly detected using cryogenically-cooled superconducting magnetometers. Here we present the first detection of action potentials from an animal nerve using an optical atomic magnetometer. Using an optimal design we are able to achieve the sensitivity dominated by the quantum shot noise of light and quantum projection noise of atomic spins. Such sensitivity allows us to measure the nerve impulse with a miniature room-temperature sensor which is a critical advantage for biomedical applications. Positioning the sensor at a distance of a few millimeters from the nerve, corresponding to the distance between the skin and nerves in biological studies, we detect the magnetic field generated by an action potential of a frog sciatic nerve. From the magnetic field measurements we determine the activity of the nerve and the temporal shape of the nerve impulse. This work opens new ways towards implementing optical magnetometers as practical devices for medical diagnostics.

preprint 2015 arXiv

Automatic Instrument Recognition in Polyphonic Music Using Convolutional Neural Networks

Traditional methods to tackle many music information retrieval tasks typically follow a two-step architecture: feature engineering followed by a simple learning algorithm. In these "shallow" architectures, feature engineering and learning are typically disjoint and unrelated. Additionally, feature engineering is difficult, and typically depends on extensive domain expertise. In this paper, we present an application of convolutional neural networks for the task of automatic musical instrument identification. In this model, feature extraction and learning algorithms are trained together in an end-to-end fashion. We show that a convolutional neural network trained on raw audio can achieve performance surpassing traditional methods that rely on hand-crafted features.

preprint 2015 arXiv

Larger-Context Language Modelling

In this work, we propose a novel method to incorporate corpus-level discourse information into language modelling. We call this the larger-context language model. We introduce a late fusion approach to a recurrent language model based on long short-term memory units (LSTM), which helps the LSTM unit keep intra-sentence dependencies and inter-sentence dependencies separate from each other. Through the evaluation on three corpora (IMDB, BBC, and Penn Treebank), we demonstrate that the proposed model improves perplexity significantly. In the experiments, we evaluate the proposed approach while varying the number of context sentences and observe that the proposed late fusion is superior to the usual way of incorporating additional inputs to the LSTM. By analyzing the trained larger-context language model, we discover that content words, including nouns, adjectives and verbs, benefit most from an increasing number of context sentences. This analysis suggests that the larger-context language model improves the unconditional language model by capturing the theme of a document better and more easily.

preprint 2014 arXiv

Analysis and Control of Beliefs in Social Networks

In this paper, we investigate the problem of how beliefs diffuse among members of social networks. We propose an information flow model (IFM) of belief that captures how interactions among members affect the diffusion and eventual convergence of a belief. The IFM includes a generalized Markov Graph (GMG) model as a social network model, which reveals that the diffusion of beliefs depends heavily on two characteristics of the social network, namely degree centralities and clustering coefficients. We apply the IFM to both converged belief estimation and belief control strategy optimization. The model is compared with an IFM including the Barabasi-Albert model, and is evaluated via experiments with published real social network data.

preprint 2014 arXiv

Electronic Structures of Hybrid Graphene/Phosphorene Nanocomposite

Combining the electronic structures of two-dimensional monolayers in ultrathin hybrid nanocomposites is expected to yield new properties beyond those of their individual components. Here, first-principles calculations are performed to study the structural, electronic and optical properties of hybrid graphene and phosphorene nanocomposite. It turns out that weak van der Waals interactions dominate between graphene and phosphorene with their intrinsic electronic properties preserved. Hybrid graphene and phosphorene nanocomposite shows tunable band gaps at graphene's Dirac point and a transition from hole doping to electron doping for graphene as the interfacial distance decreases. Charge transfer between graphene and phosphorene induces interfacial electron-hole pairs in hybrid graphene and phosphorene nanocomposite with enhanced visible light response.

preprint 2014 arXiv

First-Principles Study of Hybrid Graphene and MoS$_2$ Nanocomposites

Two-dimensional (2D) ultrathin hybrid nanocomposites that combine the electronic properties of graphene and molybdenum disulphide (MoS$_2$) monolayers have been synthesized experimentally to create excellent electronic, electrochemical, photovoltaic, photoresponsive and memory devices. Here, first-principles calculations are performed to investigate the electronic, electrical and optical properties in hybrid G/MoS$_2$ and G/MoS$_2$/G nanocomposites. It turns out that weak van der Waals interactions dominate between graphene and MoS$_2$ with their intrinsic electronic properties preserved. Interestingly, tunable p-type doping of graphene is very easy to achieve by applying electric fields perpendicular to hybrid G/MoS$_2$ and G/MoS$_2$/G nanocomposites, because electrons can easily transfer from the Dirac point of graphene to the conduction band of MoS$_2$, as the work function of graphene is close to the electron affinity of MoS$_2$. Vertical electric fields can generate strong p-type but weak n-type doping of graphene, inducing electron-hole pairs in hybrid G/MoS$_2$/G sandwiched nanocomposites. Moreover, improved optical properties in hybrid G/MoS$_2$ and G/MoS$_2$/G nanocomposites are also expected with potential photovoltaic and photoresponsive applications.

preprint2014arXiv

Is the Preferred Basis selected by the environment?

We show that in a quantum measurement, the preferred basis is determined by the interaction between the apparatus and the quantum system, instead of by the environment. This interaction entangles three degrees of freedom: one system degree of freedom that we are interested in and that is preserved by the interaction, one system degree of freedom that carries the change due to the interaction, and the apparatus degree of freedom which is always ignored. Considering all three degrees of freedom, the composite state has only one decomposition, and this guarantees that the apparatus ends up in the expected preferred basis of our daily experience. We also point out some problems with the environment-induced super-selection (Einselection) solution to the preferred basis problem, and clarify a common misunderstanding of environmental decoherence and the preferred basis problem.

preprint2014arXiv

Proposal for the Creation and Optical Detection of Spin Cat States in Bose-Einstein Condensates

We propose a method to create "spin cat states", i.e. macroscopic superpositions of coherent spin states, in Bose-Einstein condensates using the Kerr nonlinearity due to atomic collisions. Based on a detailed study of atom loss, we conclude that cat sizes of hundreds of atoms should be realistic. The existence of the spin cat states can be demonstrated by optical readout. Our analysis also includes the effects of higher-order nonlinearities, atom number fluctuations, and limited readout efficiency.

preprint2013arXiv

Can coarse measurements reveal macroscopic quantum effects?

It has recently been conjectured that detecting quantum effects such as superposition or entanglement for macroscopic systems always requires high measurement precision. Analyzing an apparent counter-example involving macroscopic coherent states and Kerr non-linearities, we find that while measurements with coarse outcomes can be sufficient, the phase control precision of the necessary non-linear operations has to increase with the size of the system. This suggests a refined conjecture that either the {\it outcome precision} or the {\it control precision} of the measurements has to increase with system size.

27 published items

3D Primitives are a Spatial Language for VLMs

Edge-Optimized Multimodal Learning for UAV Video Understanding via BLIP-2

Hybrid Distillation with CoT Guidance for Edge-Drone Control Code Generation

Large Language Models to Enhance Multi-task Drone Operations in Simulated Environments

LLM Agents Enable User-Governed Personalization Beyond Platform Boundaries

Spectral point transformer for significant wave height estimation from sea clutter

A Privacy Glossary for Cloud Computing

Bi-level Doubly Variational Learning for Energy-based Latent Variable Models

Bounds for the distribution of the Frobenius traces associated to products of non-CM elliptic curves

CAIBC: Capturing All-round Information Beyond Color for Text-based Person Retrieval

Contrastive Learning for Interactive Recommendation in Fashion

Delving into the Estimation Shift of Batch Normalization in a Network

Look Before You Leap: Improving Text-based Person Retrieval by Learning A Consistent Cross-modal Common Manifold

Observation and dynamic control of a new pathway of H$_3^+$ formation

Personalized Embedding-based e-Commerce Recommendations at eBay

Accelerating temporal action proposal generation via high performance computing

Double-Layer Game Based Wireless Charging Scheduling for Electric Vehicles

The free energy of biomembrane and nerve excitation and the role of anesthetics

Non-invasive detection of animal nerve impulses with an atomic magnetometer operating near quantum limited sensitivity

Automatic Instrument Recognition in Polyphonic Music Using Convolutional Neural Networks

Larger-Context Language Modelling

Analysis and Control of Beliefs in Social Networks

Electronic Structures of Hybrid Graphene/Phosphorene Nanocomposite

First-Principles Study of Hybrid Graphene and MoS$_2$ Nanocomposites

Is the Preferred Basis selected by the environment?

Proposal for the Creation and Optical Detection of Spin Cat States in Bose-Einstein Condensates

Can coarse measurements reveal macroscopic quantum effects?