Source author record

Yuntao Bai

Yuntao Bai appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

hep-th, Computation and Language, Machine Learning, math.AG, math.CO, physics.ins-det, quant-ph

Catalog footprint

What is connected

4 works

7 topics

4 close collaborators

preprint, 2022, arXiv

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

We apply preference modeling and reinforcement learning from human feedback (RLHF) to finetune language models to act as helpful and harmless assistants. We find this alignment training improves performance on almost all NLP evaluations, and is fully compatible with training for specialized skills such as python coding and summarization. We explore an iterated online mode of training, where preference models and RL policies are updated on a weekly cadence with fresh human feedback data, efficiently improving our datasets and models. Finally, we investigate the robustness of RLHF training, and identify a roughly linear relation between the RL reward and the square root of the KL divergence between the policy and its initialization. Alongside our main results, we perform peripheral analyses on calibration, competing objectives, and the use of OOD detection, compare our models with human writers, and provide samples from our models using prompts appearing in recent related work.

preprint, 2020, arXiv

A phase-sensitive optomechanical amplifier for quantum noise reduction in laser interferometers

The sensitivity of future gravitational wave interferometers is expected to be limited throughout the detection band by quantum vacuum fluctuations, which can be reduced by quantum non-demolition methods such as squeezed vacuum injection. However, optical losses in the readout chain severely limit the effectiveness of such schemes. We propose an optomechanical device to be installed at the output of the detector that mitigates the effect of readout loss, thus allowing the detector to better exploit quantum noise evasion schemes.

preprint, 2015, arXiv

The Amplituhedron and the One-loop Grassmannian Measure

All-loop planar scattering amplitudes in maximally supersymmetric Yang-Mills theory can be formulated geometrically in terms of the "amplituhedron". We study the mathematical structures of the one-loop amplituhedron, and present a new formula for its canonical measure, or the one-loop Grassmannian measure formula. Using the recently proposed momentum-twistor diagrams, we show that there is a correspondence between the cells of one-loop amplituhedron, BCFW terms or equivalently on-shell diagrams, and residues of the one-loop Grassmannian formula. In particular, for the first non-trivial case of one-loop NMHV, these structures are naturally associated with a nice geometric picture as polygons in projective space, as we discuss in various illustrative examples.

preprint, 2014, arXiv

The Amplituhedron from Momentum Twistor Diagrams

We propose a new diagrammatic formulation of the all-loop scattering amplitudes/Wilson loops in planar N=4 SYM, dubbed the "momentum-twistor diagrams". These are on-shell-diagrams obtained by gluing trivalent black and white vertices defined in momentum twistor space, which, in the reduced diagram case, are known to be related to diagrams in the original twistor space. The new diagrams are manifestly Yangian invariant, and they naturally represent factorization and forward-limit contributions in the all-loop BCFW recursion relations in momentum twistor space, in a fashion that is completely different from those in momentum space. We show how to construct and evaluate momentum-twistor diagrams, and how to use them to obtain tree-level amplitudes and loop-level integrands; in particular for the latter we identify an isolated bubble-structure for each loop variable, arising from a forward limit, or entangled removal of particles. From a given diagram one can directly read off the C, D matrices via a generalized "boundary measurement"; this in turn determines a cell in the amplituhedron associated with the amplitude, and our diagrammatic representations of the amplitude can provide triangulations of the amplituhedron with generally very intricate geometries. To demonstrate the computational power of the formalism, we give explicit results for general two-loop integrands, and the cells of the complete amplituhedron for two-loop MHV amplitudes.