Source author record

Ying Cao

Ying Cao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Computation and Language, Computer Vision, Cryptography and Security, Data Structures and Algorithms, Machine Learning, Artificial Intelligence, math.CO, Networking and Internet Architecture, Neural and Evolutionary Computing, physics.ed-ph, Programming Languages, Software Engineering

Catalog footprint

What is connected

11 works

12 topics

4 close collaborators


preprint, 2026, arXiv

Characteristic quasi-polynomials of truncated arrangements

Given an (affine) integral arrangement $\mathcal{A}$ in $\mathbb{R}^n$, the reduction of $\mathcal{A}$ modulo an arbitrary positive integer $q$ naturally yields an arrangement $\mathcal{A}_q$ in $\mathbb{Z}_q^n$. Our primary objective is to study the combinatorial aspects of the restriction $\mathcal{A}^{(B,\bm b)}$ to the solution space of $B\bm x=\bm b$, and its reduction $\mathcal{A}_q^{(B,\bm b)}$ modulo $q$. This work generalizes the earlier results of Kamiya, Takemura and Terao, as well as Chen and Wang. The purpose of this paper is threefold. Firstly, we derive an explicit counting formula for the cardinality of the complement $M\big(\mathcal{A}_q^{(B,\bm b)}\big)$ of $\mathcal{A}_q^{(B,\bm b)}$ and prove that for all positive integers $q>q_0$, this cardinality coincides with a quasi-polynomial $\chi^{\text{quasi}}\big(\mathcal{A}^{(B,\bm b)},q\big)$ in $q$ with a period $\rho_C$. Secondly, we weaken Chen and Wang's original hypothesis $a \mid b$ to a strictly more general condition $\gcd(a,\rho_C)\mid \gcd(b,\rho_C)$, and introduce the concept of combinatorial equivalence for positive integers. Within this framework, we establish three unified comparison relations: between the unsigned coefficients of $\chi^{\text{quasi}}\big(\mathcal{A}^{(B,\bm b)},a\big)$ and $\chi^{\text{quasi}}\big(\mathcal{A}^{(B,\bm b)},b\big)$; between the unsigned coefficients of distinct constituents of $\chi^{\text{quasi}}\big(\mathcal{A}^{(B,\bm b)},q\big)$; and between the cardinalities of $M\big(\mathcal{A}_q^{(B,\bm b)}\big)$ and $M\big(\mathcal{A}_{pq}^{(B,\bm b)}\big)$. Thirdly, using our method, we revisit the enumerative aspects of group colorings and nowhere-zero nonhomogeneous form flows from the early work of Forge, Zaslavsky and Kochol.
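The counting problem in this abstract can be illustrated by brute force on a small case. The sketch below is not the paper's method: it enumerates every point of $\mathbb{Z}_q^n$, keeps those satisfying an optional system $B\bm x=\bm b$ modulo $q$, and counts the points that avoid every hyperplane. The function name `complement_size`, the `(normal, offset)` encoding of hyperplanes, and the example arrangement `B2` are assumptions made for illustration only.

```python
from itertools import product


def complement_size(hyperplanes, q, constraints=()):
    """Count x in Z_q^n lying on none of the hyperplanes a.x = c (mod q).

    hyperplanes: list of (a, c) pairs, a an integer tuple, c an integer.
    constraints: optional list of (row, rhs) pairs; only points with
                 row.x = rhs (mod q) for every pair are considered.
    Brute force over q**n points, so only suitable for tiny examples.
    """
    n = len(hyperplanes[0][0])

    def dot(a, x):
        return sum(ai * xi for ai, xi in zip(a, x)) % q

    count = 0
    for x in product(range(q), repeat=n):
        # Stay inside the solution set of the linear system modulo q.
        if any(dot(row, x) != rhs % q for row, rhs in constraints):
            continue
        # Keep the point only if it avoids every hyperplane.
        if all(dot(a, x) != c % q for a, c in hyperplanes):
            count += 1
    return count


# Hypothetical example in R^2: x1 = 0, x2 = 0, x1 + x2 = 0, x1 - x2 = 0.
B2 = [((1, 0), 0), ((0, 1), 0), ((1, 1), 0), ((1, -1), 0)]

for q in range(2, 8):
    print(q, complement_size(B2, q))
```

For this example the counts follow $(q-1)(q-3)$ for odd $q$ and $(q-2)^2$ for even $q$, a quasi-polynomial of period 2. Passing `constraints=[((1, 1), 1)]` restricts the count to points with $x_1+x_2\equiv 1 \pmod q$.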

preprint, 2024, arXiv

Analyzing students collaboratively solving spherical unit vector problems in upper level E and M through a lens of shared resources

We are interested in better understanding ways that students collaborate to solve conceptual physics problems in the context of spherical unit vectors in upper-level E&M, especially problems that have been shown to be difficult for students to solve individually but that groups of students solve more successfully. Using think-aloud interviews with students in small groups, we ask them to solve conceptual problems from this E&M context together on a large whiteboard. The interviews were video and audio recorded, and qualitatively analyzed using an emergent coding method and the resources framework. Through this analysis, we observed one common mechanism in all three group interviews whereby students collaborated effectively: first one student activated a conceptual resource and expressed it, then another student took up that idea, and finally the whole group together used that idea to move forward with the problem. This mechanism exemplifies a newer framework: shared resources. We further analyzed students' collaboration through the lens of shared resources and identified multiple instances. We propose that the shared resources construct could be a potential tool to help understand how students collaborate in solving conceptual physics problems. In this paper, we report our methodology and the results from one group interview to illustrate one shared resource we identified and the role it played in helping students collaboratively solve the conceptual problem in this context. Future work and implications for instruction are suggested.

preprint, 2023, arXiv

Boosting Neural Networks to Decompile Optimized Binaries

Decompilation aims to transform a low-level program language (LPL) (e.g., binary file) into its functionally equivalent high-level program language (HPL) (e.g., C/C++). It is a core technology in software security, especially in vulnerability discovery and malware analysis. In recent years, with the successful application of neural machine translation (NMT) models in natural language processing (NLP), researchers have tried to build neural decompilers by borrowing the idea of NMT. They formulate the decompilation process as a translation problem between LPL and HPL, aiming to reduce the human cost required to develop decompilation tools and improve their generalizability. However, state-of-the-art learning-based decompilers do not cope well with compiler-optimized binaries. Since real-world binaries are mostly compiler-optimized, decompilers that do not consider optimized binaries have limited practical significance. In this paper, we propose a novel learning-based approach named NeurDP that targets compiler-optimized binaries. NeurDP uses a graph neural network (GNN) model to convert LPL to an intermediate representation (IR), which bridges the gap between source code and optimized binary. We also design an Optimized Translation Unit (OTU) to split functions into smaller code fragments for better translation performance. Evaluation results on datasets containing various types of statements show that NeurDP can decompile optimized binaries with 45.21% higher accuracy than state-of-the-art neural decompilation frameworks.

preprint, 2022, arXiv

Night-time Scene Parsing with a Large Real Dataset

Although huge progress has been made on scene analysis in recent years, most existing works assume the input images to be taken in the day-time with good lighting conditions. In this work, we aim to address the night-time scene parsing (NTSP) problem, which has two main challenges: 1) labeled night-time data are scarce, and 2) over- and under-exposures may co-occur in the input night-time images and are not explicitly modeled in existing pipelines. To tackle the scarcity of night-time data, we collect a novel labeled dataset, named NightCity, of 4,297 real night-time images with ground truth pixel-level semantic annotations. To our knowledge, NightCity is the largest dataset for NTSP. In addition, we propose an exposure-aware framework to address the NTSP problem by augmenting the segmentation process with explicitly learned exposure features. Extensive experiments show that training on NightCity can significantly improve NTSP performance and that our exposure-aware model outperforms the state-of-the-art methods, yielding top performance on our dataset as well as existing datasets.

preprint, 2021, arXiv

Automatic Comic Generation with Stylistic Multi-page Layouts and Emotion-driven Text Balloon Generation

In this paper, we propose a fully automatic system for generating comic books from videos without any human intervention. Given an input video along with its subtitles, our approach first extracts informative keyframes by analyzing the subtitles, and stylizes keyframes into comic-style images. Then, we propose a novel automatic multi-page layout framework, which can allocate the images across multiple pages and synthesize visually interesting layouts based on the rich semantics of the images (e.g., importance and inter-image relation). Finally, as opposed to using the same type of balloon as in previous works, we propose an emotion-aware balloon generation method to create different types of word balloons by analyzing the emotion of the subtitles and audio. Our method is able to vary balloon shapes and word sizes in balloons in response to different emotions, leading to a more enriched reading experience. Once the balloons are generated, they are placed adjacent to their corresponding speakers via speaker detection. Our results show that our method, without requiring any user inputs, can generate high-quality comic pages with visually rich layouts and balloons. Our user studies also demonstrate that users prefer our generated results over those by state-of-the-art comic generation systems.

preprint, 2021, arXiv

Online Network Utility Maximization: Algorithm, Competitive Analysis, and Applications

We consider an online version of the well-studied network utility maximization problem, where users arrive one by one and an operator makes irrevocable decisions for each user without knowing the details of future arrivals. We propose a threshold-based algorithm and analyze its worst-case performance. We prove that the competitive ratio of the proposed algorithm is linearly increasing in the number of links in a network and show this competitive analysis is tight. Extensive trace-driven simulations are conducted to demonstrate the performance of our proposed algorithm. In addition, since worst-case scenarios rarely occur in practice, we devise an adaptive implementation of our algorithm to improve its average-case performance and validate its effectiveness via simulations.

preprint, 2021, arXiv

Semantics-Recovering Decompilation through Neural Machine Translation

Decompilation transforms low-level program languages (PL) (e.g., binary code) into high-level PLs (e.g., C/C++). It has been widely used when analysts perform security analysis, such as vulnerability search and malware analysis, on software (systems) whose source code is unavailable. However, current decompilation tools usually require substantial expert effort, sometimes over years, to write the decompilation rules, which also need long-term maintenance as the syntax of the high-level or low-level PL changes. Also, an ideal decompiler should concisely generate high-level PL with similar functionality to the source low-level PL and with semantic information (e.g., meaningful variable names), just like human-written code. Unfortunately, existing manually defined rule-based decompilation techniques only functionally restore the low-level PL to a similar high-level PL and are still powerless to recover semantic information. In this paper, we propose a novel neural decompilation approach to translate low-level PL into accurate and user-friendly high-level PL, effectively improving its readability and understandability. Furthermore, we implement the proposed approach as SEAM. Evaluations on four real-world applications show that SEAM has an average accuracy of 94.41%, which is much better than prior neural machine translation (NMT) models. Finally, we evaluate the effectiveness of semantic information recovery through a questionnaire survey, and the average accuracy is 92.64%, which is comparable or superior to the state-of-the-art compilers.

preprint, 2020, arXiv

Delay-Aware Scheduling over mmWave/Sub-6 Dual Interfaces: A Reinforcement Learning Approach

We consider a transmitter with mmWave/sub-6 dual interfaces. Due to the intermittency of the mmWave channel, the transmitter must schedule packets wisely across the interfaces to minimize the average delay by observing the system state. We use well-known dynamic programming methods and Q-learning to find the optimal scheduling policy and investigate the influence of observing CSI on the optimal policy under different levels of knowledge of the environment. We find that instantaneous CSI helps reduce system delay only when the channel state transition model is not available.

preprint, 2020, arXiv

Optimal Online Algorithms for One-Way Trading and Online Knapsack Problems: A Unified Competitive Analysis

We study two canonical online optimization problems under capacity/budget constraints: the fractional one-way trading problem (OTP) and the integral online knapsack problem (OKP) under an infinitesimal assumption. Under the competitive analysis framework, it is well-known that both problems have the same optimal competitive ratio. However, these two problems are investigated by distinct approaches under separate contexts in the literature. There is a gap in understanding the connection between these two problems and the nature of their online algorithm design. This paper provides a unified framework for the online algorithm design, analysis and optimality proof for both problems. We find that the infinitesimal assumption of the OKP is the key that connects the OTP in the analysis of online algorithms and the construction of worst-case instances. With this unified understanding, our framework shows its potential for analyzing other extensions of OKP and OTP in a more systematic manner.

preprint, 2016, arXiv

Dataset and Neural Recurrent Sequence Labeling Model for Open-Domain Factoid Question Answering

While question answering (QA) with neural networks, i.e., neural QA, has achieved promising results in recent years, the lack of a large-scale real-world QA dataset is still a challenge for developing and evaluating neural QA systems. To alleviate this problem, we propose a large-scale human-annotated real-world QA dataset, WebQA, with more than 42k questions and 556k evidences. As existing neural QA methods treat QA as either a sequence generation or a classification/ranking problem, they face the challenges of expensive softmax computation, handling of unseen answers, or a separate candidate answer generation component. In this work, we cast neural QA as a sequence labeling problem and propose an end-to-end sequence labeling model, which overcomes all the above challenges. Experimental results on WebQA show that our model outperforms the baselines significantly with an F1 score of 74.69% with word-based input, and the performance drops only 3.72 F1 points with more challenging character-based input.
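To make the sequence labeling formulation concrete, the sketch below shows only the final decoding step: turning per-token labels over an evidence passage into answer strings. It does not reproduce the paper's model. The B/I/O tag scheme, the function name `extract_answers`, and the sample sentence are assumptions for illustration, and the labels are written by hand rather than predicted.

```python
def extract_answers(tokens, labels):
    """Collect answer spans from per-token labels.

    Assumed tag scheme: "B" starts an answer, "I" continues it,
    and any other label (for example "O") is outside an answer.
    """
    spans, current = [], []
    for token, label in zip(tokens, labels):
        if label == "B":
            # A new answer begins; close any span still open.
            if current:
                spans.append(current)
            current = [token]
        elif label == "I" and current:
            current.append(token)
        else:
            # Outside an answer (or a stray "I" with no open span).
            if current:
                spans.append(current)
            current = []
    if current:
        spans.append(current)
    return [" ".join(span) for span in spans]


# Hand-written labels standing in for a model's output.
evidence = "Barack Obama was born in Honolulu".split()
labels = ["B", "I", "O", "O", "O", "O"]
answers = extract_answers(evidence, labels)
print(answers)
```

Because the answer is read directly off the evidence tokens, this formulation needs neither a softmax over a full answer vocabulary nor a separate candidate generation step, which is the advantage the abstract describes.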

preprint, 2016, arXiv

Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation

Neural machine translation (NMT) aims at solving machine translation (MT) problems using neural networks and has exhibited promising results in recent years. However, most of the existing NMT models are shallow and there is still a performance gap between a single NMT model and the best conventional MT system. In this work, we introduce a new type of linear connections, named fast-forward connections, based on deep Long Short-Term Memory (LSTM) networks, and an interleaved bi-directional architecture for stacking the LSTM layers. Fast-forward connections play an essential role in propagating the gradients and building a deep topology of depth 16. On the WMT'14 English-to-French task, we achieve BLEU=37.7 with a single attention model, which outperforms the corresponding single shallow model by 6.2 BLEU points. This is the first time that a single NMT model achieves state-of-the-art performance and outperforms the best conventional model by 0.7 BLEU points. We can still achieve BLEU=36.3 even without using an attention mechanism. After special handling of unknown words and model ensembling, we obtain the best score reported to date on this task with BLEU=40.4. Our models are also validated on the more difficult WMT'14 English-to-German task.
