Source author record

Yanyan Xu

Yanyan Xu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

physics.soc-ph Social and Information Networks Computer Vision cs.CY Databases eess.AS Machine Learning math.DG

Catalog footprint

What is connected

6works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2023arXiv

Dynamically Mitigating Data Discrepancy with Balanced Focal Loss for Replay Attack Detection

It becomes urgent to design effective anti-spoofing algorithms for vulnerable automatic speaker verification systems due to the advancement of high-quality playback devices. Current studies mainly treat anti-spoofing as a binary classification problem between bonafide and spoofed utterances, while lack of indistinguishable samples makes it difficult to train a robust spoofing detector. In this paper, we argue that for anti-spoofing, it needs more attention for indistinguishable samples over easily-classified ones in the modeling process, to make correct discrimination a top priority. Therefore, to mitigate the data discrepancy between training and inference, we propose D3M, to leverage a balanced focal loss function as the training objective to dynamically scale the loss based on the traits of the sample itself. Besides, in the experiments, we select three kinds of features that contain both magnitude-based and phase-based information to form complementary and informative features. Experimental results on the ASVspoof2019 dataset demonstrate the superiority of the proposed methods by comparison between our systems and top-performing ones. Systems trained with the balanced focal loss perform significantly better than conventional cross-entropy loss. With complementary features, our fusion system with only three kinds of features outperforms other systems containing five or more complex single models by 22.5% for min-tDCF and 7% for EER, achieving a min-tDCF and an EER of 0.0124 and 0.55% respectively. Furthermore, we present and discuss the evaluation results on real replay data apart from the simulated ASVspoof2019 data, indicating that research for anti-spoofing still has a long way to go. Source code, analysis data, and other details are publicly available at https://github.com/asvspoof/D3M.

preprint2020arXiv

Deconstructing laws of accessibility and facility distribution in cities

The era of the automobile has seriously degraded the quality of urban life through costly travel and visible environmental effects. A new urban planning paradigm must be at the heart of our roadmap for the years to come. The one where, within minutes, inhabitants can access their basic living needs by bike or by foot. In this work, we present novel insights of the interplay between the distributions of facilities and population that maximize accessibility over the existing road networks. Results in six cities reveal that travel costs could be reduced in half through redistributing facilities. In the optimal scenario, the average travel distance can be modeled as a functional form of the number of facilities and the population density. As an application of this finding, it is possible to estimate the number of facilities needed for reaching a desired average travel distance given the population distribution in a city.

preprint2020arXiv

Socio-economic, built environment, and mobility conditions associated with crime: A study of multiple cities

Nowadays, 23% of the world population lives in multi-million cities. In these metropolises, criminal activity is much higher and violent than in either small cities or rural areas. Thus, understanding what factors influence urban crime in big cities is a pressing need. Mainstream studies analyse crime records through historical panel data or analysis of historical patterns combined with ecological factor and exploratory mapping. More recently, machine learning methods have provided informed crime prediction over time. However, previous studies have focused on a single city at a time, considering only a limited number of factors (such as socio-economical characteristics) and often at large spatial units. Hence, our understanding of the factors influencing crime across cultures and cities is very limited. Here we propose a Bayesian model to explore how crime is related not only to socio-economic factors but also to the built environmental (e.g. land use) and mobility characteristics of neighbourhoods. To that end, we integrate multiple open data sources with mobile phone traces and compare how the different factors correlate with crime in diverse cities, namely Boston, Bogotá, Los Angeles and Chicago. We find that the combined use of socio-economic conditions, mobility information and physical characteristics of the neighbourhood effectively explain the emergence of crime, and improve the performance of the traditional approaches. However, we show that the socio-ecological factors of neighbourhoods relate to crime very differently from one city to another. Thus there is clearly no "one fits all" model.

preprint2016arXiv

Collective benefits in traffic during mega events via the use of information technologies

Information technologies today can inform each of us about the best alternatives for shortest paths from origins to destinations, but they do not contain incentives or alternatives that manage the information efficiently to get collective benefits. To obtain such benefits, we need to have not only good estimates of how the traffic is formed but also to have target strategies to reduce enough vehicles from the best possible roads in a feasible way. The opportunity is that during large events the traffic inconveniences in large cities are unusually high, yet temporary, and the entire population may be more willing to adopt collective recommendations for social good. In this paper, we integrate for the first time big data resources to quantify the impact of events and propose target strategies for collective good at urban scale. In the context of the Olympic Games in Rio de Janeiro, we first predict the expected increase in traffic. To that end, we integrate data from: mobile phones, Airbnb, Waze, and transit information, with game schedules and information of venues. Next, we evaluate the impact of the Olympic Games to the travel of commuters, and propose different route choice scenarios during the peak hours. Moreover, we gather information on the trips that contribute the most to the global congestion and that could be redirected from vehicles to transit. Interestingly, we show that (i) following new route alternatives during the event with individual shortest path can save more collective travel time than keeping the routine routes, uncovering the positive value of information technologies during events; (ii) with only a small proportion of people selected from specific areas switching from driving to public transport, the collective travel time can be reduced to a great extent. Results are presented on-line for the evaluation of the public and policy makers.

preprint2016arXiv

The rigidity of hypersurface in Euclidean space

In the present paper, we revisit the rigidity of hypersurfaces in Euclidean space. We highlight Darboux equation and give new proof of rigidity of hypersurfaces by energy method and maximal principle.

preprint2014arXiv

Hop Doubling Label Indexing for Point-to-Point Distance Querying on Scale-Free Networks

We study the problem of point-to-point distance querying for massive scale-free graphs, which is important for numerous applications. Given a directed or undirected graph, we propose to build an index for answering such queries based on a hop-doubling labeling technique. We derive bounds on the index size, the computation costs and I/O costs based on the properties of unweighted scale-free graphs. We show that our method is much more efficient compared to the state-of-the-art technique, in terms of both querying time and indexing time. Our empirical study shows that our method can handle graphs that are orders of magnitude larger than existing methods.