Researcher profile

Cong Yu

Cong Yu contributes to research discovery and scholarly infrastructure.

Trust snapshot

Quick read

Trust 21 (Emerging), Verification L1, Unclaimed author
17 works
0 followers
11 topics
4 close collaborators

Published work

17 published items

preprint · 2026 · arXiv

Evidence for a Nonzero Eccentricity Superpuff Exoplanet WASP-107 b Using JWST Occultation Observation

WASP-107 b is an extremely low-density super-puff exoplanet whose inflated radius and evidence of strong internal heating make it a key target for understanding planetary structure and evolution. Its orbital eccentricity is a critical parameter for testing mechanisms such as tidal heating and high-eccentricity migration, yet previous measurements have remained inconclusive. Due to the large radial velocity jitter caused by stellar activity, and the presence of at least one additional planet in the system, previous radial velocity measurements could not robustly determine the eccentricity of WASP-107 b. Here we combine the new JWST secondary eclipse data with transit timing data from HST, TESS, and JWST to measure the eccentricity of WASP-107 b. Our joint analysis shows that WASP-107 b has an eccentricity of $0.09\pm0.02$, a mass of $0.096\pm0.005 \, M_J$, and an orbital period of $5.721487\pm0.000001$ days. We find the $99.7\%$ lower limit of the eccentricity is about 0.04. These new measurements are consistent with the scenario in which WASP-107 b is in the final stage of high-eccentricity migration. A preliminary estimate shows that eccentricity-driven tidal dissipation can provide a significant contribution to the energy required to sustain the observed radius inflation of WASP-107 b. Our results establish the dynamical status of one of the most intriguing low-density exoplanets known, and offer new insights into its formation and evolution history.
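
As background on why secondary eclipse timing constrains eccentricity, the standard first-order relation between the transit-to-occultation interval and the orbit is sketched below. The symbols $t_{\mathrm{tra}}$ (transit mid-time), $t_{\mathrm{occ}}$ (occultation mid-time), $P$ (orbital period), $e$ (eccentricity) and $\omega$ (argument of periastron) are notation assumed here, not taken from the abstract, and the paper's joint analysis may well use a fuller model than this small-$e$ approximation.

```latex
% First-order (small e) approximation: for a circular orbit the occultation
% falls exactly half a period after transit; eccentricity shifts it.
t_{\mathrm{occ}} - t_{\mathrm{tra}} \approx \frac{P}{2}\left(1 + \frac{4}{\pi}\, e\cos\omega\right)
```

A measured offset from $P/2$ therefore constrains the product $e\cos\omega$ rather than $e$ alone, which is one reason timing data are usually combined with other constraints.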

preprint · 2026 · arXiv

Irradiated Atmosphere V: Effects of Vertical-Mixing induced Energy Transport on the Inhomogeneity

Atmospheric variations over time and space boost planetary cooling, as outgoing internal flux responds to stellar radiation and opacity. Vertical mixing regulates this cooling. Our study examines how gravity waves or large-scale induced mixing interact with radiation transfer, affecting temperature inhomogeneity and internal flux. Through the radiative-convective-mixing equilibrium, mixing increases temperature inhomogeneity in the middle and lower atmospheres, redistributing internal flux. Stronger stellar radiation and mixing significantly reduce outgoing flux, slowing cooling. With constant infrared (IR) opacity, lower visible opacity and stronger mixing significantly reduce outgoing flux. Jensen's inequality implies that greater spatial disparities in stellar flux and opacity elevate the ratio of the average internal flux in inhomogeneous columns relative to that in homogeneous columns. This effect, particularly pronounced under high opacity contrasts, amplifies deep-layer temperature inhomogeneity and may enhance cooling. However, with mixing, overall cooling is weaker than without, as both the averaged internal flux of the inhomogeneous columns and that of the homogeneous column decline more sharply for the latter. Thus, while vertical mixing-induced inhomogeneity can enhance cooling, the overall cooling effect remains weaker than in the non-mixing case. Therefore, vertical mixing, by regulating atmospheric structure and flux, is key to understanding planetary cooling.
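
The Jensen's inequality argument in this abstract can be illustrated with a toy calculation. The sketch below assumes a convex response function; `toy_flux` (here simply `x ** 2`) is a hypothetical stand-in and not the flux relation used in the paper. It only shows that, for a convex function, averaging over unequal columns gives a larger result than evaluating the function on the averaged column, and that the ratio grows with the spread.

```python
from statistics import mean
from typing import Sequence


def toy_flux(x: float) -> float:
    """Hypothetical convex response; a stand-in for the real flux relation."""
    return x ** 2


def jensen_ratio(columns: Sequence[float]) -> float:
    """Mean of f over the columns divided by f of the mean column."""
    return mean(toy_flux(c) for c in columns) / toy_flux(mean(columns))


# Three sets of columns with the same mean but increasing spread.
uniform = [1.0, 1.0, 1.0, 1.0]
mild = [0.8, 0.9, 1.1, 1.2]
strong = [0.2, 0.6, 1.4, 1.8]

for name, columns in [("uniform", uniform), ("mild", mild), ("strong", strong)]:
    print(name, jensen_ratio(columns))
```

With identical columns the ratio is exactly one; any spread pushes it above one, which is the qualitative behaviour the abstract attributes to spatial disparities in stellar flux and opacity.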

preprint · 2025 · arXiv

Introduction to the Chinese Space Station Survey Telescope (CSST)

The Chinese Space Station Survey Telescope (CSST) is an upcoming Stage-IV sky survey telescope, distinguished by its large field of view (FoV), high image quality, and multi-band observation capabilities. It can simultaneously conduct precise measurements of the Universe by performing multi-color photometric imaging and slitless spectroscopic surveys. The CSST is equipped with five scientific instruments, i.e. Multi-band Imaging and Slitless Spectroscopy Survey Camera (SC), Multi-Channel Imager (MCI), Integral Field Spectrograph (IFS), Cool Planet Imaging Coronagraph (CPI-C), and THz Spectrometer (TS). Using these instruments, CSST is expected to make significant contributions and discoveries across various astronomical fields, including cosmology, galaxies and active galactic nuclei (AGN), the Milky Way and nearby galaxies, stars, exoplanets, Solar System objects, astrometry, and transients and variable sources. This review aims to provide a comprehensive overview of the CSST instruments, observational capabilities, data products, and scientific potential.

preprint · 2022 · arXiv

All Birds with One Stone: Multi-task Text Classification for Efficient Inference with One Forward Pass

Multi-Task Learning (MTL) models have shown their robustness, effectiveness, and efficiency for transferring learned knowledge across tasks. In real industrial applications such as web content classification, multiple classification tasks are predicted from the same input text such as a web article. However, at serving time, existing multi-task transformer models, such as prompt- or adapter-based approaches, need to conduct N forward passes for N tasks with O(N) computation cost. To tackle this problem, we propose a scalable method that can achieve stronger performance with close to O(1) computation cost via only one forward pass. To illustrate real application usage, we release a multitask dataset on news topic and style classification. Our experiments show that our proposed method outperforms strong baselines on both the GLUE benchmark and our news dataset. Our code and dataset are publicly available at https://bit.ly/mtop-code.
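
The cost argument (N forward passes versus one) can be made concrete with a minimal sketch. It assumes a shared encoder with lightweight per-task heads, which is one common way to get a single forward pass; the abstract does not describe the paper's actual architecture. `SharedEncoder`, `make_head`, `predict_all`, the toy features and the weights are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple


class SharedEncoder:
    """Hypothetical stand-in for a transformer encoder; counts forward passes."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, text: str) -> List[float]:
        self.calls += 1
        words = text.split()
        # Toy features: word count, title-cased word count, exclamation marks.
        return [
            float(len(words)),
            float(sum(word.istitle() for word in words)),
            float(text.count("!")),
        ]


def make_head(weights: List[float], labels: Tuple[str, str]) -> Callable[[List[float]], str]:
    """Build a tiny linear classifier that reads the shared features."""

    def head(features: List[float]) -> str:
        score = sum(w * f for w, f in zip(weights, features))
        return labels[1] if score > 0 else labels[0]

    return head


def predict_all(
    encoder: SharedEncoder,
    heads: Dict[str, Callable[[List[float]], str]],
    text: str,
) -> Dict[str, str]:
    """Run the expensive encoder once, then apply every cheap task head."""
    features = encoder(text)  # the only pass through the shared model
    return {task: head(features) for task, head in heads.items()}


encoder = SharedEncoder()
heads = {
    "topic": make_head([0.0, 1.0, -1.0], ("other", "politics")),
    "style": make_head([-0.1, 0.0, 1.0], ("neutral", "sensational")),
}
predictions = predict_all(encoder, heads, "Senate Passes Budget Bill!")
print(predictions, "encoder calls:", encoder.calls)
```

Adding a task in this sketch adds one cheap head rather than another encoder pass, which is the sense in which serving cost stays close to O(1) in the number of tasks.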

preprint · 2022 · arXiv

Charformer: Fast Character Transformers via Gradient-based Subword Tokenization

State-of-the-art models in natural language processing rely on separate rigid subword tokenization algorithms, which limit their generalization ability and adaptation to new settings. In this paper, we propose a new model inductive bias that learns a subword tokenization end-to-end as part of the model. To this end, we introduce a soft gradient-based subword tokenization module (GBST) that automatically learns latent subword representations from characters in a data-driven fashion. Concretely, GBST enumerates candidate subword blocks and learns to score them in a position-wise fashion using a block scoring network. We additionally introduce Charformer, a deep Transformer model that integrates GBST and operates on the byte level. Via extensive experiments on English GLUE, multilingual, and noisy text datasets, we show that Charformer outperforms a series of competitive byte-level baselines while generally performing on par and sometimes outperforming subword-based models. Additionally, Charformer is fast, improving the speed of both vanilla byte-level and subword-level Transformers by 28%-100% while maintaining competitive quality. We believe this work paves the way for highly performant token-free models that are trained completely end-to-end.
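
The idea of enumerating candidate subword blocks and scoring them position by position can be sketched in a few lines. This is a simplified illustration, not the Charformer implementation: it assumes mean-pooled non-overlapping blocks, a fixed linear scorer in place of the learned block scoring network, and no downsampling step. All function names and the `score_weights` parameter are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a list of equal-length vectors element-wise."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def block_candidates(chars: List[Vector], block_size: int) -> List[Vector]:
    """Pool non-overlapping blocks, repeating each block vector per position."""
    out: List[Vector] = []
    for start in range(0, len(chars), block_size):
        block = chars[start:start + block_size]
        out.extend([mean_pool(block)] * len(block))
    return out


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_subword_mix(
    chars: List[Vector], block_sizes: List[int], score_weights: Vector
) -> List[Vector]:
    """At each position, softly mix the candidates from every block size."""
    per_size = [block_candidates(chars, size) for size in block_sizes]
    dim = len(chars[0])
    mixed: List[Vector] = []
    for pos in range(len(chars)):
        candidates = [cands[pos] for cands in per_size]
        # Stand-in scorer: a fixed dot product instead of a learned network.
        scores = [sum(w * x for w, x in zip(score_weights, c)) for c in candidates]
        probs = softmax(scores)
        mixed.append(
            [sum(p * c[d] for p, c in zip(probs, candidates)) for d in range(dim)]
        )
    return mixed


char_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
print(soft_subword_mix(char_embeddings, [1, 2], [0.5, -0.5]))
```

Because the mix is a softmax-weighted sum rather than a hard choice of segmentation, every step is differentiable, which is what would let such a module be trained end-to-end with the rest of a model.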

preprint · 2022 · arXiv

Effects of Self-gravity on Mass-loss of the Post-impact Super-Earths

Kepler observations show that most exoplanets are super-Earths. The formation of super-Earths is generally related to atmospheric mass loss, which is crucial to planetary structure and evolution. The shock driven by a giant impact heats the planet, resulting in atmospheric escape. We focus on whether self-gravity changes the efficiency of mass loss. Without self-gravity, significant mass loss occurs if the impactor mass is comparable to the envelope mass. Self-gravity shifts the radiative-convective boundary inward. As the temperature and envelope mass increase, this effect becomes more prominent, resulting in a heavier envelope. Therefore, when self-gravity is included, a larger impactor mass is required to drive significant mass loss. Self-gravity becomes particularly important as the envelope mass increases.

preprint · 2022 · arXiv

Image Steganography based on Style Transfer

Image steganography is the art and science of using images as cover for covert communications. With the development of neural networks, traditional image steganography is more likely to be detected by deep learning-based steganalysis. To improve upon this, we propose an image steganography network based on style transfer, in which the embedding of secret messages can be disguised as image stylization. We embed secret information while transforming the content image style. In latent space, the secret information is integrated into the latent representation of the cover image to generate the stego images, which are indistinguishable from normal stylized images. It is an end-to-end unsupervised model without pre-training. Extensive experiments on the benchmark dataset demonstrate the reliability, quality and security of stego images generated by our steganographic network.

preprint · 2022 · arXiv

RFMask: A Simple Baseline for Human Silhouette Segmentation with Radio Signals

Human silhouette segmentation, which was originally defined in computer vision, has achieved promising results for understanding human activities. However, physical limitations make existing systems based on optical cameras suffer from severe performance degradation under low illumination, smoke, and/or opaque obstruction conditions. To overcome such limitations, in this paper, we propose to utilize radio signals, which can traverse obstacles and are unaffected by lighting conditions, to achieve silhouette segmentation. The proposed RFMask framework is composed of three modules. It first transforms RF signals captured by millimeter wave radar on two planes into the spatial domain and suppresses interference with the signal processing module. Then, it locates human reflections on RF frames and extracts features from surrounding signals with the human detection module. Finally, the extracted features from RF frames are aggregated with an attention-based mask generation module. To verify our proposed framework, we collect a dataset containing 804,760 radio frames and 402,380 camera frames with human activities under various scenes. Experimental results show that the proposed framework can achieve impressive human silhouette segmentation even under challenging scenarios (such as low light and occlusion) where traditional optical-camera-based methods fail. To the best of our knowledge, this is the first investigation towards segmenting human silhouettes based on millimeter wave signals. We hope that our work can serve as a baseline and inspire further research that performs vision tasks with radio signals. The dataset and code will be made public.

preprint · 2022 · arXiv

Rossby Wave Instabilities of Protoplanetary Discs with Cooling

Rossby wave instabilities (RWIs) usually lead to nonaxisymmetric vortices in protoplanetary discs and some observed sub-structures of these discs can be well explained by RWIs. We explore how cooling influences the growth rate of unstable RWI modes using linear perturbation analysis. The cooling associated with the energy equation is treated in two different ways. The first one we adopt is a simple cooling law, in which the perturbed thermal state relaxes to the initial thermal state on a prescribed cooling timescale. In the second, we treat the cooling as a thermal diffusion process. The difference in the growth rate between the adiabatic and isothermal modes becomes more pronounced for discs with smaller sound speed. For the simple cooling law, the growth rates of unstable modes decrease monotonically with shorter cooling timescale in barotropic discs. But the dependence of the growth rate on the cooling timescale becomes non-monotonic in non-barotropic discs. The RWI might even be enhanced in non-barotropic discs during the transition from the adiabatic state to the isothermal state. When the cooling is treated as thermal diffusion, even in barotropic discs, the variation of the growth rate with thermal diffusivity becomes non-monotonic. Furthermore, a maximum growth rate may appear at an appropriate value of thermal diffusivity. The angular momentum flux (AMF) is investigated to understand the angular momentum transport by RWI with cooling.

preprint · 2022 · arXiv

Training ELECTRA Augmented with Multi-word Selection

Pre-trained text encoders such as BERT and its variants have recently achieved state-of-the-art performances on many NLP tasks. While being effective, these pre-training methods typically demand massive computation resources. To accelerate pre-training, ELECTRA trains a discriminator that predicts whether each input token is replaced by a generator. However, this new task, as a binary classification, is less semantically informative. In this study, we present a new text encoder pre-training method that improves ELECTRA based on multi-task learning. Specifically, we train the discriminator to simultaneously detect replaced tokens and select original tokens from candidate sets. We further develop two techniques to effectively combine all pre-training tasks: (1) using attention-based networks for task-specific heads, and (2) sharing bottom layers of the generator and the discriminator. Extensive experiments on GLUE and SQuAD datasets demonstrate both the effectiveness and the efficiency of our proposed method.

preprint · 2021 · arXiv

Quiz-Style Question Generation for News Stories

A large majority of American adults get at least some of their news from the Internet. Even though many online news products have the goal of informing their users about the news, they lack scalable and reliable tools for measuring how well they are achieving this goal, and therefore have to resort to noisy proxy metrics (e.g., click-through rates or reading time) to track their performance. As a first step towards measuring news informedness at scale, we study the problem of quiz-style multiple-choice question generation, which may be used to survey users about their knowledge of recent news. In particular, we formulate the problem as two sequence-to-sequence tasks: question-answer generation (QAG) and distractor, or incorrect answer, generation (DG). We introduce NewsQuizQA, the first dataset intended for quiz-style question-answer generation, containing 20K human-written question-answer pairs from 5K news article summaries. Using this dataset, we propose a series of novel techniques for applying large pre-trained Transformer encoder-decoder models, namely PEGASUS and T5, to the tasks of question-answer generation and distractor generation. We show that our models outperform strong baselines using both automated metrics and human raters. We provide a case study of running weekly quizzes on real-world users via the Google Surveys platform over the course of two months. We found that users generally found the automatically generated questions to be educational and enjoyable. Finally, to serve the research community, we are releasing the NewsQuizQA dataset.

preprint · 2021 · arXiv

The Critical Core Mass of Rotating Planets

The gravitational harmonics measured by the Juno and Cassini spacecraft help us to specify the internal structure and chemical elements of Jupiter and Saturn, respectively. However, we still do not know much about the impact of rotation on planetary internal structure and formation. The centrifugal force induced by rotation deforms the planetary shape and partially counteracts the gravitational force. Thus, rotation will affect the critical core mass of the exoplanet. Once the atmospheric mass becomes comparable to the critical core mass, the planet will enter the runaway accretion phase and become a gas giant. We have confirmed that the critical core masses of rotating planets depend on the stiffness of the polytrope, the outer boundary conditions, and the thickness of the isothermal layer. The critical core mass with the Bondi boundary condition is determined by the surface properties. The critical core mass of a rotating planet will increase with the core gravity (i.e., the innermost density). For the Hill boundary condition, the soft polytrope shares the same properties as planets with the Bondi boundary condition. Since the total mass for planets with the Hill boundary condition increases with the decrease of the polytropic index, higher core gravity is required for rotating planets. As a result, the critical core mass in the stiff Hill model sharply increases. The rotation effects become more important when the radiative and convective regions coexist. Besides, the critical core mass of planets with the Hill (Bondi) boundary increases noticeably as the radiative layer becomes thinner (thicker).

preprint · 2020 · arXiv

A Generative Approach to Titling and Clustering Wikipedia Sections

We evaluate the performance of transformer encoders with various decoders for information organization through a new task: generation of section headings for Wikipedia articles. Our analysis shows that decoders containing attention mechanisms over the encoder output achieve high-scoring results by generating extractive text. In contrast, a decoder without attention better facilitates semantic encoding and can be used to generate section embeddings. We additionally introduce a new loss function, which further encourages the decoder to generate high-quality embeddings.

preprint · 2020 · arXiv

CLUENER2020: Fine-grained Named Entity Recognition Dataset and Benchmark for Chinese

In this paper, we introduce the NER dataset from the CLUE organization (CLUENER2020), a well-defined fine-grained dataset for named entity recognition in Chinese. CLUENER2020 contains 10 categories. Apart from common labels like person, organization, and location, it contains more diverse categories. It is more challenging than other current Chinese NER datasets and could better reflect real-world applications. For comparison, we implement several state-of-the-art baselines as sequence labeling tasks and report human performance, as well as its analysis. To facilitate future work on fine-grained NER for Chinese, we release our dataset, baselines, and leaderboard.

preprint · 2020 · arXiv

Generating Representative Headlines for News Stories

Millions of news articles are published online every day, which can be overwhelming for readers to follow. Grouping articles that report the same event into news stories is a common way of assisting readers in their news consumption. However, it remains a challenging research problem to efficiently and effectively generate a representative headline for each story. Automatic summarization of a document set has been studied for decades, while few studies have focused on generating representative headlines for a set of articles. Unlike summaries, which aim to capture the most information with the least redundancy, headlines aim to capture information jointly shared by the story articles in a short length, and exclude information that is too specific to each individual article. In this work, we study the problem of generating representative headlines for news stories. We develop a distant supervision approach to train large-scale generation models without any human annotation. This approach centers on two technical components. First, we propose a multi-level pre-training framework that incorporates a massive unlabeled corpus with a different quality-vs.-quantity balance at each level. We show that models trained within this framework outperform those trained with a purely human-curated corpus. Second, we propose a novel self-voting-based article attention layer to extract salient information shared by multiple articles. We show that models that incorporate this layer are robust to potential noise in news stories and outperform existing baselines with or without noise. We can further enhance our model by incorporating human labels, and we show our distant supervision approach significantly reduces the demand for labeled data.

preprint · 2020 · arXiv

Thermally Driven Angular Momentum Transport in Hot Jupiters

We study the angular momentum transport inside hot Jupiters under the influences of gravitational and thermal forcing. Due to the strong stellar irradiation, a radiative region develops on top of the convective region. Internal gravity waves are launched at the radiative-convective boundaries (RCBs). The thermal response is dynamical and plays an important role in the angular momentum transport. By separating the gravitational and thermal forcing terms, we identify the thermal effects for increasing the angular momentum transport. For the low frequency (in the co-rotating frame with planets) prograde (retrograde) tidal frequency, the angular momentum flux is positive (negative). The tidal interactions tend to drive the planet to the synchronous state. We find that the angular momentum transport associated with the internal gravity wave is very sensitive to the relative position between the RCB and the penetration depth of the thermal forcing. If the RCB is in the vicinity of the thermal forcing penetration depth, even with small amplitude thermal forcing, the thermally driven angular momentum flux could be much larger than the flux induced by gravitational forcing. The thermally enhanced torque could drive the planet to the synchronous state in as short as a few $10^4$ years.

preprint · 2020 · arXiv

Toward a Full MHD Jet Model of Spinning Black Holes--II: Kinematics and Application to the M87 Jet

In this paper, we investigate the magnetohydrodynamical structure of a jet powered by a spinning black hole, where electromagnetic fields and fluid motion are governed by the Grad-Shafranov equation and the Bernoulli equation, respectively. Assuming a steady and axisymmetric jet structure, the global solution is uniquely determined with prescribed plasma loading into the jet and the poloidal shape of the outermost magnetic field line. We apply this model to the jet at the center of the nearby radio galaxy M87, and we find it can naturally explain the slow flow acceleration and the flow velocity stratification within $10^5$ gravitational radii from the central black hole. In particular, we find that an extremal black hole spin is disfavored by the flow velocity measurements, if the plasma loading to the jet is dominated by the electron/positron pair production at the jet base.