Researcher profile

Biao Zhang

Biao Zhang contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging)
Verification L1
Unclaimed author
12 works
0 followers
9 topics
4 close collaborators

Published work

12 published items

preprint · 2026 · arXiv

A C-band microwave rectifier without capacitors for microwave power transmission

A capacitor-free microwave rectifier operating at 5.8 GHz is presented, with a measured MW-to-DC conversion efficiency of 68.1%. A harmonic rejection filter and a DC pass filter, which replace the lumped capacitors of conventional microwave rectifiers, suppress the harmonics produced by an HSMS-286 Schottky diode during rectification. At the fundamental frequency, a microstrip impedance transformer comprising a shunt λg/8 short-ended microstrip transmission line and two short series microstrip transmission lines compensates the imaginary part of the diode impedance and matches the input impedance of the rectifier. The measured MW-to-DC conversion efficiency agrees well with the simulated results. The novel rectifier, which uses no lumped passive elements, may be applied to power transmission systems at higher microwave frequencies.
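As a rough illustration of why a shunt λg/8 short-ended line can cancel reactance, the sketch below evaluates the textbook input impedance of a lossless short-circuited line, Z_in = j·Z0·tan(βl). At a length of λg/8 the electrical length is π/4, so the stub presents a purely inductive reactance equal in magnitude to Z0. The 50 Ω characteristic impedance and the function name are assumptions for illustration only; the abstract does not state the line impedance or the diode impedance being compensated.

```python
import math

def shorted_stub_impedance(z0_ohm: float, length_in_wavelengths: float) -> complex:
    """Input impedance of a lossless short-circuited line: j * Z0 * tan(beta * l)."""
    electrical_length = 2.0 * math.pi * length_in_wavelengths  # beta * l in radians
    return 1j * z0_ohm * math.tan(electrical_length)

# Z0 = 50 ohm is an assumed value, not taken from the paper.
z_stub = shorted_stub_impedance(50.0, 1.0 / 8.0)

# A shunt element combines with the rest of the circuit as an admittance.
y_stub = 1.0 / z_stub

print("stub impedance (ohm):", z_stub)
print("stub admittance (S):", y_stub)
```

Because the stub is connected in shunt, its negative (inductive) susceptance can offset a capacitive susceptance of similar magnitude at the connection point.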

preprint · 2026 · arXiv

Experimental study on an S-band near-field microwave magnetron power transmission system on hundred-watt level

A multi-magnetron microwave source, a metamaterial transmitting antenna, and a high-power rectenna array are presented to build a near-field 2.45 GHz microwave power transmission system. The square 1 m² rectenna array consists of sixteen rectennas with 2048 Schottky diodes for high-power microwave rectification. It receives microwave power and converts it into DC power. The design, structure, and measured performance of a unit rectenna as well as the entire rectenna array are presented in detail. The multi-magnetron microwave power source switches between half and full output power levels, i.e. the half-wave and full-wave modes. The transmission antenna is formed by a double-layer metallic hole array, which combines the output power of the magnetrons. The rectenna array DC output power reaches 67.3 W on a 1.2 Ω DC load at a distance of 5.5 m from the transmission antenna. DC output power is affected by the distance, the DC load, and the mode of the microwave power source. The results show that conventional low-power Schottky diodes can be applied to a microwave power transmission system with simple magnetrons to realise high-power microwave rectification.
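The reported operating point (67.3 W into a 1.2 Ω load) implies a DC voltage and current through the usual relations P = V²/R = I²R. The short calculation below is only a worked example of that arithmetic, assuming a purely resistive load; the function name is hypothetical and the abstract itself reports only power and load resistance. The result is roughly 9 V at 7.5 A.

```python
import math

def dc_voltage_and_current(power_w: float, load_ohm: float) -> tuple[float, float]:
    """Return (voltage, current) for a resistive load dissipating power_w."""
    voltage_v = math.sqrt(power_w * load_ohm)   # from P = V^2 / R
    current_a = math.sqrt(power_w / load_ohm)   # from P = I^2 * R
    return voltage_v, current_a

v, i = dc_voltage_and_current(67.3, 1.2)
print(f"V = {v:.2f} V, I = {i:.2f} A")
```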

preprint · 2023 · arXiv

Prompting Large Language Model for Machine Translation: A Case Study

Research on prompting has shown excellent performance with little or even no supervised training across many tasks. However, prompting for machine translation is still under-explored in the literature. We fill this gap by offering a systematic study on prompting strategies for translation, examining various factors for prompt template and demonstration example selection. We further explore the use of monolingual data and the feasibility of cross-lingual, cross-domain, and sentence-to-document transfer learning in prompting. Extensive experiments with GLM-130B (Zeng et al., 2022) as the testbed show that 1) the number and the quality of prompt examples matter, where using suboptimal examples degenerates translation; 2) several features of prompt examples, such as semantic similarity, show significant Spearman correlation with their prompting performance; yet, none of the correlations are strong enough; 3) using pseudo parallel prompt examples constructed from monolingual data via zero-shot prompting could improve translation; and 4) improved performance is achievable by transferring knowledge from prompt examples selected in other settings. We finally provide an analysis on the model outputs and discuss several problems that prompting still suffers from.

preprint · 2022 · arXiv

Data Scaling Laws in NMT: The Effect of Noise and Architecture

In this work, we study the effect of varying the architecture and training data quality on the data scaling properties of Neural Machine Translation (NMT). First, we establish that the test loss of encoder-decoder transformer models scales as a power law in the number of training samples, with a dependence on the model size. Then, we systematically vary aspects of the training setup to understand how they impact the data scaling laws. In particular, we change the following: (1) architecture and task setup, where we compare to a transformer-LSTM hybrid and a decoder-only transformer with a language modeling loss; and (2) noise level in the training distribution, where we experiment with filtering and with adding iid synthetic noise. In all the above cases, we find that the data scaling exponents are minimally impacted, suggesting that marginally worse architectures or training data can be compensated for by adding more data. Lastly, we find that using back-translated data instead of parallel data can significantly degrade the scaling exponent.
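To make the notion of a data scaling exponent concrete, the sketch below fits a pure power law, loss = α · D^(−β), by linear regression in log-log space. This is a simplification: the abstract says only that test loss follows a power law in the number of training samples with a dependence on model size, so the exact functional form, the synthetic data, and the names `fit_power_law`, `alpha` and `beta` are all assumptions made for illustration.

```python
import math

def fit_power_law(num_samples, losses):
    """Least-squares fit of log(loss) = log(alpha) - beta * log(D)."""
    xs = [math.log(d) for d in num_samples]
    ys = [math.log(value) for value in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (alpha, beta)

# Synthetic, noise-free data generated from a known law (not from the paper).
true_alpha, true_beta = 20.0, 0.3
sizes = [10 ** k for k in range(4, 9)]
losses = [true_alpha * d ** (-true_beta) for d in sizes]

alpha, beta = fit_power_law(sizes, losses)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```

Under this simplified form, a setup that leaves β roughly unchanged but raises α can in principle be matched by training on more data, which is the kind of trade-off the abstract describes.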

preprint · 2022 · arXiv

Examining Scaling and Transfer of Language Model Architectures for Machine Translation

Natural language understanding and generation models follow one of the two dominant architectural paradigms: language models (LMs) that process concatenated sequences in a single stack of layers, and encoder-decoder models (EncDec) that utilize separate layer stacks for input and output processing. In machine translation, EncDec has long been the favoured approach, but with few studies investigating the performance of LMs. In this work, we thoroughly examine the role of several architectural design choices on the performance of LMs on bilingual, (massively) multilingual and zero-shot translation tasks, under systematic variations of data conditions and model sizes. Our results show that: (i) Different LMs have different scaling properties, where architectural differences often have a significant impact on model performance at small scales, but the performance gap narrows as the number of parameters increases, (ii) Several design choices, including causal masking and language-modeling objectives for the source sequence, have detrimental effects on translation quality, and (iii) When paired with full-visible masking for source sequences, LMs could perform on par with EncDec on supervised bilingual and multilingual translation tasks, and improve greatly on zero-shot directions by facilitating the reduction of off-target translations.
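The difference between causal masking and full-visible masking of the source can be shown with small attention masks over a concatenated source-plus-target sequence. The sketch below is a minimal illustration, assuming the source tokens come first and that a 1 means the query position (row) may attend to the key position (column). The function names are hypothetical and the paper's actual implementation is not described in the abstract.

```python
def causal_mask(total_len: int) -> list[list[int]]:
    """Every position attends only to itself and earlier positions."""
    return [[1 if k <= q else 0 for k in range(total_len)]
            for q in range(total_len)]

def prefix_lm_mask(src_len: int, tgt_len: int) -> list[list[int]]:
    """Source keys are visible to every position; target keys stay causal."""
    total = src_len + tgt_len
    return [[1 if (k < src_len or k <= q) else 0 for k in range(total)]
            for q in range(total)]

# Two source tokens followed by two target tokens.
for name, mask in [("causal", causal_mask(4)), ("full-visible source", prefix_lm_mask(2, 2))]:
    print(name)
    for row in mask:
        print(" ", row)
```

With full-visible source masking, the first source token can attend to the second, which a purely causal mask forbids. Target positions remain causal in both cases, so generation is still left to right.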

preprint · 2022 · arXiv

Multilingual Document-Level Translation Enables Zero-Shot Transfer From Sentences to Documents

Document-level neural machine translation (DocNMT) achieves coherent translations by incorporating cross-sentence context. However, for most language pairs there's a shortage of parallel documents, although parallel sentences are readily available. In this paper, we study whether and how contextual modeling in DocNMT is transferable via multilingual modeling. We focus on the scenario of zero-shot transfer from teacher languages with document level data to student languages with no documents but sentence level data, and for the first time treat document-level translation as a transfer learning problem. Using simple concatenation-based DocNMT, we explore the effect of 3 factors on the transfer: the number of teacher languages with document level data, the balance between document and sentence level data at training, and the data condition of parallel documents (genuine vs. backtranslated). Our experiments on Europarl-7 and IWSLT-10 show the feasibility of multilingual transfer for DocNMT, particularly on document-specific metrics. We observe that more teacher languages and adequate data balance both contribute to better transfer quality. Surprisingly, the transfer is less sensitive to the data condition, where multilingual DocNMT delivers decent performance with either backtranslated or genuine document pairs.

preprint · 2022 · arXiv

Overcoming Van der Waals Forces in reconfigurable nanostructures

Reconfigurable metamaterials require constituent nanostructures to demonstrate switching of shapes with external stimuli. For generality, such nanostructures would touch and stick to other surfaces in one of their configurations. Yet, a longstanding challenge is in overcoming this stiction caused by Van der Waals forces, which impedes shape recovery. Here, we introduce a stiff yet self-recovering material system based on acrylic acid, and test it in high-aspect-ratio structures, where recovery is weak. This designer material has a storage modulus of ~5.2 GPa at room temperature and ~90 MPa in the rubbery state at 150 °C, an order of magnitude higher than previous reports. A high-resolution resin for two-photon lithography was developed based on this polymer system, enabling 3D printing of nanopillars with diameters of ~400 nm and aspect ratios as high as ~10. Experimentally, we observed self-recovery as collapsed and touching structures overcome stiction to stand back up. We developed a theoretical model to explain the recoverability of these sub-micron structures. Reconfigurable structural colour prints and holograms were demonstrated, indicating potential applications of the material system as a shape memory polymer suitable for sub-micron reconfigurable metamaterials.

preprint · 2022 · arXiv

Revisiting End-to-End Speech-to-Text Translation From Scratch

End-to-end (E2E) speech-to-text translation (ST) often depends on pretraining its encoder and/or decoder using source transcripts via speech recognition or text translation tasks, without which translation performance drops substantially. However, transcripts are not always available, and how significant such pretraining is for E2E ST has rarely been studied in the literature. In this paper, we revisit this question and explore the extent to which the quality of E2E ST trained on speech-translation pairs alone can be improved. We reexamine several techniques proven beneficial to ST previously, and offer a set of best practices that biases a Transformer-based E2E ST system toward training from scratch. Besides, we propose parameterized distance penalty to facilitate the modeling of locality in the self-attention model for speech. On four benchmarks covering 23 languages, our experiments show that, without using any transcripts or pretraining, the proposed system reaches and even outperforms previous studies adopting pretraining, although the gap remains in (extremely) low-resource settings. Finally, we discuss neural acoustic feature modeling, where a neural model is designed to extract acoustic features from raw speech signals directly, with the goal to simplify inductive biases and add freedom to the model in describing speech. For the first time, we demonstrate its feasibility and show encouraging results on ST tasks.

preprint · 2022 · arXiv

Training Data Generating Networks: Shape Reconstruction via Bi-level Optimization

We propose a novel 3D shape representation for 3D shape reconstruction from a single image. Rather than predicting a shape directly, we train a network to generate a training set which will be fed into another learning algorithm to define the shape. The nested optimization problem can be modeled by bi-level optimization. Specifically, the algorithms for bi-level optimization are also being used in meta learning approaches for few-shot learning. Our framework establishes a link between 3D shape analysis and few-shot learning. We combine training data generating networks with bi-level optimization algorithms to obtain a complete framework for which all components can be jointly trained. We improve upon recent work on standard benchmarks for 3D shape reconstruction.

preprint · 2020 · arXiv

Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation

Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations. In this paper, we explore ways to improve them. We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics, and overcome this bottleneck via language-specific components and deepening NMT architectures. We identify the off-target translation issue (i.e. translating into a wrong target language) as the major source of the inferior zero-shot performance, and propose random online backtranslation to enforce the translation of unseen training language pairs. Experiments on OPUS-100 (a novel multilingual dataset with 100 languages) show that our approach substantially narrows the performance gap with bilingual models in both one-to-many and many-to-many settings, and improves zero-shot performance by ~10 BLEU, approaching conventional pivot-based methods.

preprint · 2020 · arXiv

On Sparsifying Encoder Outputs in Sequence-to-Sequence Models

Sequence-to-sequence models usually transfer all encoder outputs to the decoder for generation. In this work, by contrast, we hypothesize that these encoder outputs can be compressed to shorten the sequence delivered for decoding. We take Transformer as the testbed and introduce a layer of stochastic gates in-between the encoder and the decoder. The gates are regularized using the expected value of the sparsity-inducing L0 penalty, resulting in completely masking-out a subset of encoder outputs. In other words, via joint training, the L0DROP layer forces Transformer to route information through a subset of its encoder states. We investigate the effects of this sparsification on two machine translation and two summarization tasks. Experiments show that, depending on the task, around 40-70% of source encodings can be pruned without significantly compromising quality. The decrease of the output length endows L0DROP with the potential of improving decoding efficiency, where it yields a speedup of up to 1.65x on document summarization tasks against the standard Transformer. We analyze the L0DROP behaviour and observe that it exhibits systematic preferences for pruning certain word types, e.g., function words and punctuation get pruned most. Inspired by these observations, we explore the feasibility of specifying rule-based patterns that mask out encoder outputs based on information such as part-of-speech tags, word frequency and word position.
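One common way to build stochastic gates with a differentiable expected L0 penalty is the hard concrete distribution of Louizos et al. (2018). The sketch below uses that parameterisation as an assumption, since the abstract does not name the gate distribution. The constants `GAMMA`, `ZETA` and `BETA` are the defaults usually quoted for hard concrete gates, and all function names are hypothetical. Each gate has a learnable parameter `log_alpha`; the expected L0 penalty is the sum over gates of the probability that the gate is non-zero.

```python
import math
import random

# Stretch limits and temperature: commonly used defaults, assumed here.
GAMMA, ZETA, BETA = -0.1, 1.1, 2.0 / 3.0

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sample_gate(log_alpha: float, rng: random.Random) -> float:
    """Draw one hard concrete gate value in [0, 1]; exact zeros are possible."""
    u = min(max(rng.random(), 1e-6), 1.0 - 1e-6)  # keep the logs finite
    s = sigmoid((math.log(u) - math.log(1.0 - u) + log_alpha) / BETA)
    stretched = s * (ZETA - GAMMA) + GAMMA        # stretch to (GAMMA, ZETA)
    return min(1.0, max(0.0, stretched))          # clip back to [0, 1]

def prob_gate_open(log_alpha: float) -> float:
    """P(gate != 0): the per-gate term of the expected L0 penalty."""
    return sigmoid(log_alpha - BETA * math.log(-GAMMA / ZETA))

def expected_l0(log_alphas: list[float]) -> float:
    return sum(prob_gate_open(a) for a in log_alphas)

rng = random.Random(0)
log_alphas = [20.0, 0.0, -20.0]   # one gate per encoder output (toy values)
gates = [sample_gate(a, rng) for a in log_alphas]
print("sampled gates:", gates)
print("expected L0 penalty:", expected_l0(log_alphas))
```

In a model of this kind, each encoder output would be multiplied by its gate, and outputs whose gate is exactly zero could be dropped before decoding, which is where a shorter decoder input would come from.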

preprint · 2020 · arXiv

Structural Multi-Colour Invisible Inks with Submicron 4D Printing of Shape Memory Polymers

Four-dimensional (4D) printing of shape memory polymer (SMP) imparts time responsive properties to 3D structures. Here, we explore 4D printing of an SMP in the submicron length scale, extending its applications to nanophotonics. We report a new SMP photoresist based on Vero Clear achieving print features at a resolution of ~300 nm half pitch using two-photon polymerization lithography (TPL). Prints consisting of grids with size-tunable multi-colours enabled the study of shape memory effects to achieve large visual shifts through nanoscale structure deformation. As the nanostructures are flattened, the colours and printed information become invisible. Remarkably, the shape memory effect recovers the original surface morphology of the nanostructures along with its structural colour within seconds of heating above its glass transition temperature. The high-resolution printing and excellent reversibility in both microtopography and optical properties promise a platform for temperature-sensitive labels, information hiding for anti-counterfeiting, and tunable photonic devices.