Source author record

Tao Jia

Tao Jia appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

physics.soc-ph, Social and Information Networks, Molecular Networks, physics.data-an, Biological Physics, cond-mat.mtrl-sci, Digital Libraries, Machine Learning, Cell Behavior, cond-mat.dis-nn, cond-mat.soft, cond-mat.stat-mech, cond-mat.str-el, cond-mat.supr-con, Information Retrieval, Quantitative Methods

Catalog footprint

What is connected

28 works

16 topics

4 close collaborators

preprint · 2022 · arXiv

CasSeqGCN: Combining Network Structure and Temporal Sequence to Predict Information Cascades

One important task in the study of information cascades is to predict the future recipients of a message given its past spreading trajectory. While the network structure serves as the backbone of the spreading, an accurate prediction can hardly be made without knowledge of the dynamics on the network. The temporal information in the spreading sequence captures many hidden features, but predictions based on sequence alone have their limitations. Recent efforts have started to explore the possibility of combining both the network structure and the temporal feature. Here, we propose a new end-to-end prediction method, CasSeqGCN, in which the structure and temporal feature are simultaneously taken into account. A cascade is divided into multiple snapshots which record the network topology and the state of nodes. The graph convolutional network (GCN) is used to learn the representation of a snapshot. A novel aggregation method based on dynamic routing is proposed to aggregate node representation and the long short-term memory (LSTM) model is used to extract temporal information. CasSeqGCN predicts the future cascade size more accurately than other state-of-the-art baseline methods. The ablation study demonstrates that the improvement mainly comes from the design of the input and the GCN layer. We explicitly design an experiment to show that the quality of the cascade representation learned by our approach is better than that of other methods. Our work proposes a new approach to combine the structural and temporal features, which not only gives a useful baseline model for future studies of cascade prediction, but also brings new insights on a wide collection of problems related to dynamics on and of the network.

preprint · 2022 · arXiv

CoarSAS2hvec: Heterogeneous Information Network Embedding with Balanced Network Sampling

Heterogeneous information network (HIN) embedding aims to find the representations of nodes that preserve the proximity between entities of different nature. A family of approaches that are widely adopted applies random walk to generate a sequence of heterogeneous context, from which the embedding is learned. However, due to the multipartite graph structure of HIN, hub nodes tend to be over-represented in the sampled sequence, giving rise to imbalanced samples of the network. Here we propose a new embedding method, CoarSAS2hvec. The self-avoid short sequence sampling with the HIN coarsening procedure (CoarSAS) is utilized to better collect the rich information in HIN. An optimized loss function is used to improve the performance of the HIN structure embedding. CoarSAS2hvec outperforms nine other methods in two different tasks on four real-world data sets. The ablation study confirms that the samples collected by CoarSAS contain richer information about the network compared with those by other methods, which is characterized by a higher information entropy. Hence, the traditional loss function applied to samples by CoarSAS can also yield improved results. Our work addresses a limitation of the random-walk-based HIN embedding that has not been emphasized before, which can shed light on a range of problems in HIN analyses.

preprint · 2022 · arXiv

Independent Asymmetric Embedding for Information Diffusion Prediction on Social Networks

Predicting information diffusion on social networks has great practical significance in marketing and public opinion control. It aims to predict the individuals who will potentially repost the message on the social network. One type of method is based on demographics, complex networks and other prior knowledge to establish an interpretable model to simulate and predict the propagation process, while the other type of method is completely data-driven and maps the nodes to a latent space for propagation prediction. Existing latent space designs and embedding methods do not consider the interference among users. In this paper, we propose an independent asymmetric embedding method to embed each individual into one latent influence space and multiple latent susceptibility spaces. Based on the similarity between information diffusion and the heat diffusion phenomenon, the heat diffusion kernel is exploited in our model and establishes the embedding rules. Furthermore, our method captures the co-occurrence regulation of user combinations in cascades to improve the calculating effectiveness. The results of extensive experiments conducted on real-world datasets verify both the predictive accuracy and cost-effectiveness of our approach.

preprint · 2022 · arXiv

Temporal Network Epistemology: on Reaching Consensus in Real World Setting

This work develops the concept of a temporal network epistemology model that enables the simulation of the learning process in dynamic networks. The results of the research, conducted on the temporal social network generated using the CogSNet model and on the static topologies as a reference, indicate a significant influence of the network temporal dynamics on the outcome and flow of the learning process. It has been shown that not only are the dynamics of reaching consensus different compared to baseline models, but also that previously unobserved phenomena appear, such as uninformed agents or different consensus states for disconnected components. It has also been observed that sometimes only the change of the network structure can contribute to reaching consensus. The introduced approach and the experimental results can be used to better understand how human communities collectively solve complex problems at the scientific level, and to inquire into the correctness of less complex but common and equally important beliefs spreading across entire societies.

preprint · 2021 · arXiv

A novel similarity measure for mining missing links in long-path networks

Network information mining is the study of the network topology, which answers a large number of application-based questions about the structural evolution and the function of a real system. For example, the questions can be related to how the real system evolves or how individuals interact with each other in social networks. Although the evolution of the real system may appear regular, capturing patterns over the whole process of the evolution is not trivial. Link prediction is one of the most important technologies in network information mining, which can help us understand the real system's evolution law. Link prediction aims to uncover missing links or quantify the likelihood of the emergence of nonexistent links from known network structures. Currently, most existing link prediction methods focus on short-path networks that usually have a myriad of closed triangular structures. However, these algorithms perform poorly on highly sparse or long-path networks. Here, we propose a new index that is associated with the principles of Structural Equivalence and Shortest Path Length ($SESPL$) to estimate the likelihood of link existence in long-path networks. In tests on 548 real networks, we find that $SESPL$ is more effective and efficient than other similarity-based predictors in long-path networks. We also compare the performance of the $SESPL$ predictor and embedding-based approaches via machine learning techniques, and the performance of $SESPL$ can achieve a gain of 44.09% over $GraphWave$ and 7.93% over $Node2vec$. Finally, according to the matrix of Maximal Information Coefficient ($MIC$) between all the similarity-based predictors, $SESPL$ is a new independent feature in the space of traditional similarity features.

preprint · 2021 · arXiv

A paper's corresponding affiliation and first affiliation are consistent at the country level in Web of Science

The purpose of this study is to explore the relationship between the first affiliation and the corresponding affiliation at different levels via scientometric analysis. We select over 18 million papers in the core collection database of Web of Science (WoS) published from 2000 to 2015, and measure the percentage of match between the first and the corresponding affiliation at the country and institution level. We find that a paper's first affiliation and corresponding affiliation are highly consistent at the country level, with over 98% of the match on average. However, the match at the institution level is much lower, and it varies significantly with time and country. Hence, for studies at the country level, using the first or the corresponding affiliation gives almost the same result. But more caution is needed in selecting the affiliation when the institution is the focus of the investigation. Meanwhile, we find some evidence that the recorded corresponding information in the WoS database has undergone some changes since 2013, which sheds light on future studies on the comparison of different databases or the affiliation accuracy of WoS. Our finding relies on the records of WoS, which may not be entirely accurate. Given the scale of the analysis, our findings can serve as a useful reference for further studies when country allocation or institute allocation is needed. Existing studies on comparisons of straight counting methods usually cover a limited number of papers, a particular research field or a limited range of time. More importantly, the counts alone cannot sufficiently tell whether the corresponding and first affiliations are similar. This paper uses a metric similar to Jaccard similarity to measure the percentage of the match and performs a comprehensive analysis based on a large-scale bibliometric database.
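The abstract describes its match metric only as "similar to Jaccard similarity". As a rough illustration, the sketch below uses the plain Jaccard index as a stand-in; the function names, the country codes and the convention for empty sets are assumptions made here, not details from the paper.

```python
def jaccard(a, b):
    """Plain Jaccard index |a & b| / |a | b| of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention: two empty sets count as a full match
    return len(a & b) / len(a | b)


def mean_match(papers):
    """Average match over (first_affiliations, corresponding_affiliations) pairs."""
    return sum(jaccard(first, corr) for first, corr in papers) / len(papers)


# Hypothetical country-level records for three papers.
papers = [
    ({"CN"}, {"CN"}),        # full match
    ({"US"}, {"US", "DE"}),  # partial match
    ({"FR"}, {"JP"}),        # no match
]
country_level_match = mean_match(papers)
```

The same function could be applied to sets of institution names to get an institution-level figure, which the abstract reports to be much lower than the country-level one.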

preprint · 2021 · arXiv

Magic Doping and Robust Superconductivity in Monolayer FeSe on Titanates

The enhanced superconductivity in monolayer FeSe on titanates opens a fascinating pathway towards the rational design of high-temperature superconductors. Utilizing the state-of-the-art oxide plus chalcogenide molecular beam epitaxy systems in situ connected to a synchrotron angle-resolved photoemission spectroscope, epitaxial LaTiO3 layers with varied atomic thicknesses are inserted between monolayer FeSe and SrTiO3, for systematic modulation of interfacial chemical potential. With the dramatic increase of electron accumulation at the LaTiO3-SrTiO3 surface, providing a substantial surge of work function mismatch across the FeSe-oxide interface, the charge transfer and the superconducting gap in the monolayer FeSe are found to remain markedly robust. This unexpected finding indicates the existence of an intrinsically anchored magic doping within the monolayer FeSe systems.

preprint · 2021 · arXiv

The evolution of network controllability in growing networks

The study of network structural controllability focuses on the minimum number of driver nodes needed to control a whole network. Despite intensive study of this topic, most work considers static networks only. It is well known, however, that real networks are growing, with new nodes and links added to the system. Here, we analyze controllability of evolving networks and propose a general rule for the change of driver nodes. We further apply the rule to solve the problem of network augmentation subject to the controllability constraint. The findings fill a gap in our understanding of network controllability and shed light on controllability of real systems.
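For readers unfamiliar with how the minimum number of driver nodes is usually obtained for a static network, the sketch below uses the standard maximum-matching result from the structural controllability literature (number of drivers = max(N - |maximum matching|, 1)). That result is not stated in the abstract, and the code does not implement the paper's rule for growing networks; `min_driver_nodes` is a name chosen here for illustration.

```python
def min_driver_nodes(n, edges):
    """Minimum number of driver nodes of a directed network.

    n     -- number of nodes, labelled 0..n-1
    edges -- iterable of (source, target) pairs

    Uses a simple augmenting-path maximum matching on the bipartite
    representation (sources on one side, targets on the other).
    The recursion is fine for small examples; large networks would
    need an iterative or Hopcroft-Karp implementation.
    """
    successors = [[] for _ in range(n)]
    for u, v in edges:
        successors[u].append(v)

    matched_by = [None] * n  # matched_by[v] == u when link u -> v is matched

    def augment(u, seen):
        for v in successors[u]:
            if v in seen:
                continue
            seen.add(v)
            if matched_by[v] is None or augment(matched_by[v], seen):
                matched_by[v] = u
                return True
        return False

    matching_size = sum(augment(u, set()) for u in range(n))
    # Unmatched nodes must be driven directly; at least one driver is always needed.
    return max(n - matching_size, 1)


chain = min_driver_nodes(4, [(0, 1), (1, 2), (2, 3)])  # directed path
star = min_driver_nodes(4, [(0, 1), (0, 2), (0, 3)])   # hub pointing at three leaves
```

Re-running such a function after each added node or link is the brute-force baseline that a rule for the change of driver nodes would avoid.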

preprint · 2020 · arXiv

A generalized linear threshold model for an improved description of the spreading dynamics

Many spreading processes in real life can be considered as complex contagion, and the linear threshold (LT) model is often applied as a very representative model for this mechanism. Despite its intensive usage, the LT model suffers from several limitations in describing the time evolution of the spreading. First, the discrete-time step that captures the speed of the spreading is vaguely defined. Second, the synchronous updating rule makes the nodes infected in batches, which cannot take individual differences into account. Finally, the LT model is incompatible with existing models for the simple contagion. Here we consider a generalized linear threshold (GLT) model for the continuous-time stochastic complex contagion process that can be efficiently implemented by the Gillespie algorithm. The time in this model has a clear mathematical definition and the updating order is rigidly defined. We find that the traditional LT model systematically underestimates the spreading speed and the randomness in the spreading sequence order. We also show that the GLT model works seamlessly with the susceptible-infected (SI) or susceptible-infected-recovered (SIR) model. One can easily combine them to model a hybrid spreading process in which simple contagion accumulates the critical mass for the complex contagion that leads to the global cascades. Overall, the GLT model we propose can be a useful tool to study complex contagion, especially when studying the time evolution of the spreading.
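The abstract does not give the GLT transition rates, so the following is only a minimal sketch of how a Gillespie-style simulation of a threshold contagion can look. The infection rule (a node becomes eligible once the fraction of infected neighbors reaches `threshold`, and all eligible nodes share one `rate`) is an assumption made for illustration, as are all names in the code.

```python
import random


def gillespie_threshold(neighbors, seeds, threshold, rate=1.0, rng=None):
    """Simulate a continuous-time threshold contagion; return [(time, node), ...].

    neighbors -- dict mapping each node to a list of its neighbors
    Assumed rule (hypothetical, not the paper's GLT rates): a susceptible
    node is eligible once the fraction of its infected neighbors reaches
    `threshold`, and every eligible node is infected at the same `rate`.
    """
    rng = rng or random.Random()
    infected = set(seeds)
    t = 0.0
    history = [(t, s) for s in sorted(infected)]
    while True:
        eligible = sorted(
            v for v, nbrs in neighbors.items()
            if v not in infected and nbrs
            and sum(u in infected for u in nbrs) / len(nbrs) >= threshold
        )
        if not eligible:
            return history
        # Gillespie step: draw an exponential waiting time from the total rate,
        # then choose which event fires (uniformly, since all rates are equal).
        t += rng.expovariate(rate * len(eligible))
        node = rng.choice(eligible)
        infected.add(node)
        history.append((t, node))


# A four-node line 0 - 1 - 2 - 3 seeded at node 0.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
events = gillespie_threshold(line, seeds=[0], threshold=0.5, rng=random.Random(1))
```

Because nodes are infected one at a time with explicit waiting times, the sequence order and the clock are both well defined, in contrast to the batch updates of a synchronous scheme.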

preprint · 2020 · arXiv

Measuring similarity in co-occurrence data using ego-networks

The co-occurrence association is widely observed in many empirical data sets. Mining the information in co-occurrence data is essential for advancing our understanding of systems such as social networks, ecosystems, and brain networks. Measuring similarity of entities is one of the important tasks, which can usually be achieved using a network-based approach. Here we show that traditional methods based on the aggregated network can bring in unwanted indirect relationships. To cope with this issue, we propose a similarity measure based on the ego network of each entity, which effectively considers the change of an entity's centrality from one ego network to another. The index proposed is easy to calculate and has a clear physical meaning. Using two different data sets, we compare the new index with other existing ones. We find that the new index outperforms the traditional network-based similarity measures, and it can sometimes surpass the embedding method. Meanwhile, the measure by the new index is weakly correlated with those by other methods, hence providing a different dimension to quantify similarities in co-occurrence data. Altogether, our work extends the network-based similarity measure and can be potentially applied in several related tasks.

preprint · 2020 · arXiv

The dominance of big teams in China's scientific output

Modern science is dominated by scientific productions from teams. A recent finding shows that teams with both large and small sizes are essential in research, prompting us to analyze the extent to which a country's scientific work is carried out by big/small teams. Here, using over 26 million publications from Web of Science, we find that China's research output is more dominated by big teams than the rest of the world, which is particularly the case in fields of natural science. Despite the global trend that more papers are done by big teams, China's drop in small team output is much steeper. As teams in China shift from small to large size, the team diversity that is essential for innovative works does not increase as much as that in other countries. Using the national average as the baseline, we find that the National Natural Science Foundation of China (NSFC) supports fewer small team works than the U.S. National Science Foundation (NSF) does, implying that big teams are more preferred by grant agencies in China. Our finding provides new insights into the concern of originality and innovation in China, and points to a need to balance small and big teams.

preprint · 2019 · arXiv

Verification of Short-Range Order and Its Impact on the Properties of the CrCoNi Medium Entropy Alloy

Traditional metallic alloys are mixtures of elements where the atoms of minority species tend to distribute randomly if they are below their solubility limit, or lead to the formation of secondary phases if they are above it. Recently, the concept of medium/high entropy alloys (MEA/HEA) has expanded this view, as these materials are single-phase solid solutions of generally equiatomic mixtures of metallic elements that have been shown to display enhanced mechanical properties. However, the question has remained as to how random these solid solutions actually are, with the influence of chemical short-range order (SRO) suggested in computational simulations but not seen experimentally. Here we report the first direct observation of SRO in the CrCoNi MEA using high resolution and energy-filtered transmission electron microscopy. Increasing amounts of SRO give rise to both higher stacking fault energy and hardness. These discoveries suggest that the degree of chemical ordering at the nanometer scale can be tailored through thermomechanical processing, providing a new avenue for tuning the mechanical properties of MEA/HEAs.

preprint · 2019 · arXiv

Visualizing Exotic Orbital Texture in the Single-Layer Mott Insulator 1T-TaSe2

Mott insulating behavior is induced by strong electron correlation and can lead to exotic states of matter such as unconventional superconductivity and quantum spin liquids. Recent advances in van der Waals material synthesis enable the exploration of novel Mott systems in the two-dimensional limit. Here we report characterization of the local electronic properties of single- and few-layer 1T-TaSe2 via spatial- and momentum-resolved spectroscopy involving scanning tunneling microscopy and angle-resolved photoemission. Our combined experimental and theoretical study indicates that electron correlation induces a robust Mott insulator state in single-layer 1T-TaSe2 that is accompanied by novel orbital texture. Inclusion of interlayer coupling weakens the insulating phase in 1T-TaSe2, as seen by strong reduction of its energy gap and quenching of its correlation-driven orbital texture in bilayer and trilayer 1T-TaSe2. Our results establish single-layer 1T-TaSe2 as a useful new platform for investigating strong correlation physics in two dimensions.

preprint · 2015 · arXiv

An Analysis of the Matching Hypothesis in Networks

The matching hypothesis in social psychology claims that people are more likely to form a committed relationship with someone equally attractive. Previous works on stochastic models of human mate choice process indicate that patterns supporting the matching hypothesis could occur even when similarity is not the primary consideration in seeking partners. Yet, most if not all of these works concentrate on fully-connected systems. Here we extend the analysis to networks. Our results indicate that the correlation of the couple's attractiveness grows monotonically with the increased average degree and decreased degree diversity of the network. This correlation is lower in sparse networks than in fully-connected systems, because in the former less attractive individuals who find partners are likely to be coupled with ones who are more attractive than them. The chance of failing to be matched decreases exponentially with both the attractiveness and the degree. The matching hypothesis may not hold when the degree-attractiveness correlation is present, which can give rise to negative attractiveness correlation. Finally, we find that the ratio between the number of matched couples and the size of the maximum matching varies non-monotonically with the average degree of the network. Our results reveal the role of network topology in the process of human mate choice and bring insights into future investigations of different matching processes in networks.

preprint · 2015 · arXiv

Emergence of bimodality in controlling complex networks

Our ability to control complex systems is a fundamental challenge of contemporary science. Recently introduced tools to identify the driver nodes, nodes through which we can achieve full control, predict the existence of multiple control configurations, prompting us to classify each node in a network based on their role in control. Accordingly a node is critical, intermittent or redundant if it acts as a driver node in all, some or none of the control configurations. Here we develop an analytical framework to identify the category of each node, leading to the discovery of two distinct control modes in complex systems: centralized vs distributed control. We predict the control mode for an arbitrary network and show that one can alter it through small structural perturbations. The uncovered bimodality has implications from network security to organizational research and offers new insights into the dynamics and control of complex systems.

preprint · 2013 · arXiv

Scaling of Geographic Space as a Universal Rule for Map Generalization

Map generalization is a process of producing maps at different levels of detail by retaining essential properties of the underlying geographic space. In this paper, we explore how the map generalization process can be guided by the underlying scaling of geographic space. The scaling of geographic space refers to the fact that in a geographic space small things are far more common than large ones. In the corresponding rank-size distribution, this scaling property is characterized by a heavy tailed distribution such as a power law, lognormal, or exponential function. In essence, any heavy tailed distribution consists of the head of the distribution (with a low percentage of vital or large things) and the tail of the distribution (with a high percentage of trivial or small things). Importantly, the low and high percentages constitute an imbalanced contrast, e.g., 20 versus 80. We suggest that map generalization is to retain the objects in the head and to eliminate or aggregate those in the tail. We applied this selection rule or principle to three generalization experiments, and found that the scaling of geographic space indeed underlies map generalization. We further relate the universal rule to Töpfer's radical law (or trained cartographers' decision making in general), and illustrate several advantages of the universal rule. Keywords: Head/tail division rule, head/tail breaks, heavy tailed distributions, power law, and principles of selection
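The selection rule described above (keep the head, drop or aggregate the tail) can be sketched as a recursive split at the mean. This is a minimal illustration, not the authors' implementation: the 40% cap on the head share is a commonly used convention for head/tail breaks and is an assumption here, as are the function name and the sample values.

```python
def head_tail_breaks(values, head_limit=0.4):
    """Return the list of mean values used as successive head/tail break points.

    At each step the data are split at their mean; values above the mean
    form the head. The split is accepted, and repeated on the head, only
    while the head remains a minority (share <= head_limit, an assumed cap).
    """
    breaks = []
    data = list(values)
    while len(data) > 1:
        mean = sum(data) / len(data)
        head = [v for v in data if v > mean]
        if not head or len(head) / len(data) > head_limit:
            break
        breaks.append(mean)
        data = head
    return breaks


# Hypothetical object sizes: many small things, a few large ones.
sizes = [1] * 10 + [2, 2, 2, 5, 5, 30]
breaks = head_tail_breaks(sizes)

# One level of generalization: retain only the objects in the first head.
retained = [s for s in sizes if s > breaks[0]]
```

Each accepted break corresponds to one further level of generalization, so heavier-tailed data yield more levels, while evenly distributed data yield none.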

preprint · 2011 · arXiv

Connecting protein and mRNA burst distributions for stochastic models of gene expression

The intrinsic stochasticity of gene expression can lead to large variability in protein levels for genetically identical cells. Such variability in protein levels can arise from infrequent synthesis of mRNAs which in turn give rise to bursts of protein expression. Protein expression occurring in bursts has indeed been observed experimentally and recent studies have also found evidence for transcriptional bursting, i.e. production of mRNAs in bursts. Given that there are distinct experimental techniques for quantifying the noise at different stages of gene expression, it is of interest to derive analytical results connecting experimental observations at different levels. In this work, we consider stochastic models of gene expression for which mRNA and protein production occurs in independent bursts. For such models, we derive analytical expressions connecting protein and mRNA burst distributions which show how the functional form of the mRNA burst distribution can be inferred from the protein burst distribution. Additionally, if gene expression is repressed such that observed protein bursts arise only from single mRNAs, we show how observations of protein burst distributions (repressed and unrepressed) can be used to completely determine the mRNA burst distribution. Assuming independent contributions from individual bursts, we derive analytical expressions connecting means and variances for burst and steady-state protein distributions. Finally, we validate our general analytical results by considering a specific reaction scheme involving regulation of protein bursts by small RNAs. For a range of parameters, we derive analytical expressions for regulated protein distributions that are validated using stochastic simulations. The analytical results obtained in this work can thus serve as useful inputs for a broad range of studies focusing on stochasticity in gene expression.

preprint · 2011 · arXiv

Exploring Human Mobility Patterns Based on Location Information of US Flights

A range of early studies have been conducted to illustrate human mobility patterns using different tracking data, such as dollar notes, cell phones and taxicabs. Here, we explore human mobility patterns based on massive tracking data of US flights. Both topological and geometric properties are examined in detail. We found that topological properties, such as traffic volume (between airports) and degree of connectivity (of individual airports), including both in- and outdegrees, follow a power law distribution, whereas a geometric property such as travel length does not. The travel lengths exhibit an exponential distribution rather than a power law with an exponential cutoff as previous studies illustrated. We further simulated human mobility on the established topologies of airports with various moving behaviors and found that the mobility patterns are mainly attributed to the underlying binary topology of airports and have little to do with other factors, such as moving behaviors and geometric distances. Apart from the above findings, this study adopts the head/tail division rule, a regularity behind any heavy-tailed distribution, for extracting individual airports. The adoption of this rule for data processing constitutes another major contribution of this paper. Keywords: scaling of geographic space, head/tail division rule, power law, geographic information, agent-based simulations

preprint · 2011 · arXiv

Intrinsic noise in stochastic models of gene expression with molecular memory and bursting

Regulation of intrinsic noise in gene expression is essential for many cellular functions. Correspondingly, there is considerable interest in understanding how different molecular mechanisms of gene expression impact variations in protein levels across a population of cells. In this work, we analyze a stochastic model of bursty gene expression which considers general waiting-time distributions governing arrival and decay of proteins. By mapping the system to models analyzed in queueing theory, we derive analytical expressions for the noise in steady-state protein distributions. The derived results extend previous work by including the effects of arbitrary probability distributions representing the effects of molecular memory and bursting. The analytical expressions obtained provide insight into the role of transcriptional, post-transcriptional and post-translational mechanisms in controlling the noise in gene expression.

preprint · 2011 · arXiv

On the structural properties of small-world networks with finite range of shortcut links

We explore a new variant of Small-World Networks (SWNs), in which an additional parameter ($r$) sets the length scale over which shortcuts are uniformly distributed. When $r=0$ we have an ordered network, whereas $r=1$ corresponds to the original SWN model. These short-range SWNs have a degree distribution and scaling properties similar to those of the original SWN model. We observe the small-world phenomenon for $r \ll 1$, indicating that global shortcuts are not necessary for the small-world effect. For short-range SWNs, the average path length changes nonmonotonically with system size, whereas for the original SWN model it increases monotonically. We propose an expression for the average path length for short-range SWNs based on numerical simulations and analytical approximations.

preprint · 2011 · arXiv

Regulation by small RNAs via coupled degradation: mean-field and variational approaches

Regulatory genes called small RNAs (sRNAs) are known to play critical roles in cellular responses to changing environments. For several sRNAs, regulation is effected by coupled stoichiometric degradation with messenger RNAs (mRNAs). The nonlinearity inherent in this regulatory scheme indicates that exact analytical solutions for the corresponding stochastic models are intractable. Here, we present a variational approach to analyze a well-studied stochastic model for regulation by sRNAs via coupled degradation. The proposed approach is efficient and provides accurate estimates of mean mRNA levels as well as higher order terms. Results from the variational ansatz are in excellent agreement with data from stochastic simulations for a wide range of parameters, including regions of parameter space where mean-field approaches break down. The proposed approach can be applied to quantitatively model stochastic gene expression in complex regulatory networks.

preprint · 2011 · arXiv

Stochastic modeling of regulation of gene expression by multiple small RNAs

A wealth of new research has highlighted the critical roles of small RNAs (sRNAs) in diverse processes such as quorum sensing and cellular responses to stress. The pathways controlling these processes often have a central motif consisting of a master regulator protein whose expression is controlled by multiple sRNAs. However, the regulation of stochastic gene expression of a single target gene by multiple sRNAs is currently not well understood. To address this issue, we analyze a stochastic model of regulation of gene expression by multiple sRNAs. For this model, we derive exact analytic results for the regulated protein distribution including compact expressions for its mean and variance. The derived results provide novel insights into the roles of multiple sRNAs in fine-tuning the noise in gene expression. In particular, we show that, in contrast to regulation by a single sRNA, multiple sRNAs provide a mechanism for independently controlling the mean and variance of the regulated protein distribution.

preprint · 2010 · arXiv

Agent-based Simulation of Human Movement Shaped by the Underlying Street Structure

Relying on random and purposive moving agents, we simulated human movement in large street networks. We found that aggregate flow, assigned to individual streets, is mainly shaped by the underlying street structure, and that human moving behavior (either random or purposive) has little effect on the aggregate flow. This finding implies that given a street network, the movement patterns generated by purposive walkers (mostly human beings) and by random walkers are the same. Based on the simulation and correlation analysis, we further found that the closeness centrality is not a good indicator for human movement, in contrast to a long-standing view held by space syntax researchers. Instead we suggest that Google's PageRank and its modified version, weighted PageRank, as well as betweenness and degree centralities, are all better indicators for predicting aggregate flow.
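To make the PageRank indicator mentioned above concrete, here is a minimal power-iteration sketch on a small street graph. It is the textbook unweighted PageRank, not the weighted variant or the authors' simulation; the damping factor of 0.85, the iteration count and the toy graph are assumptions chosen for illustration.

```python
def pagerank(neighbors, damping=0.85, iterations=100):
    """Unweighted PageRank by power iteration.

    neighbors -- dict mapping each node to the list of nodes it links to
                 (for an undirected street graph, list each link both ways)
    """
    nodes = list(neighbors)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out = neighbors[v]
            if out:
                share = damping * rank[v] / len(out)
                for u in out:
                    new_rank[u] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                for u in nodes:
                    new_rank[u] += damping * rank[v] / n
        rank = new_rank
    return rank


# Hypothetical street graph: one main street crossing three side streets.
streets = {
    "main": ["a", "b", "c"],
    "a": ["main"],
    "b": ["main"],
    "c": ["main"],
}
scores = pagerank(streets)
```

On an undirected graph, PageRank is closely related to the stationary visiting frequency of a random walker, which is one way to read the abstract's finding that random and purposive walkers produce similar aggregate flows.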

preprint · 2010 · arXiv

Applications of Little's Law to stochastic models of gene expression

The intrinsic stochasticity of gene expression can lead to large variations in protein levels across a population of cells. To explain this variability, different sources of mRNA fluctuations ('Poisson' and 'Telegraph' processes) have been proposed in stochastic models of gene expression. Both Poisson and Telegraph scenario models explain experimental observations of noise in protein levels in terms of 'bursts' of protein expression. Correspondingly, there is considerable interest in establishing relations between burst and steady-state protein distributions for general stochastic models of gene expression. In this work, we address this issue by considering a mapping between stochastic models of gene expression and problems of interest in queueing theory. By applying a general theorem from queueing theory, Little's Law, we derive exact relations which connect burst and steady-state distribution means for models with arbitrary waiting-time distributions for arrival and degradation of mRNAs and proteins. The derived relations have implications for approaches to quantify the degree of transcriptional bursting and hence to discriminate between different sources of intrinsic noise in gene expression. To illustrate this, we consider a model for regulation of protein expression bursts by small RNAs. For a broad range of parameters, we derive analytical expressions (validated by stochastic simulations) for the mean protein levels as the levels of regulatory small RNAs are varied. The results obtained show that the degree of transcriptional bursting can, in principle, be determined from changes in mean steady-state protein levels for general stochastic models of gene expression.
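Little's Law states that the mean number of items in a system equals the arrival rate times the mean time each item spends there, regardless of the waiting-time distribution. The sketch below checks this numerically for a toy molecule population with a deliberately non-exponential lifetime. It is not the paper's gene expression model; the rate, the uniform lifetime distribution and all names are assumptions made for illustration.

```python
import random


def time_averaged_count(arrival_rate, draw_lifetime, horizon, rng):
    """Time-averaged number of molecules present during [0, horizon].

    Molecules arrive as a Poisson process and each lives for a duration
    drawn from `draw_lifetime(rng)`; the system starts empty.
    """
    t = 0.0
    occupied = 0.0  # total molecule-time accumulated inside the window
    while True:
        t += rng.expovariate(arrival_rate)  # next arrival time
        if t >= horizon:
            break
        death = t + draw_lifetime(rng)
        occupied += min(death, horizon) - t  # clip lifetimes at the window end
    return occupied / horizon


def uniform_lifetime(rng):
    """Non-exponential lifetime, uniform on [0, 4], so the mean is 2.0."""
    return rng.uniform(0.0, 4.0)


rate = 5.0
simulated = time_averaged_count(rate, uniform_lifetime, horizon=5000.0,
                                rng=random.Random(0))
predicted = rate * 2.0  # Little's Law: arrival rate times mean lifetime
```

With a long enough window the simulated average should approach the predicted value, and replacing `uniform_lifetime` with any other distribution of the same mean should leave the prediction unchanged.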

preprint 2010 arXiv

Measuring Urban Sprawl Based on Massive Street Nodes and the Novel Concept of Natural Cities

In this paper, we develop a novel approach to measuring urban sprawl based on street nodes and naturally defined urban boundaries, both extracted from massive volunteered geographic information in OpenStreetMap databases through data-intensive computing. The street nodes are defined as street intersections and ends, while the naturally defined urban boundaries constitute what we call natural cities. We find that the number of street nodes is significantly correlated with the population of cities. Based on this finding, we use street nodes as a proxy for population to measure urban sprawl. We further find that street nodes bear a significant linear relationship with city areal extents. In the plot with the x axis representing city areal extents and the y axis representing street nodes, sprawling cities are located below the regression line. We verified the approach using urban areas and population from the US census, and then applied the approach to three European countries (France, Germany, and the United Kingdom) to categorize natural cities into three classes: sprawling, compact, and normal. This categorization sets a uniform standard for cross-comparing sprawling levels across an entire country. Keywords: Street networks, OpenStreetMap, volunteered geographic information, GIS
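The regression-line idea can be sketched as follows. This is a minimal illustration, not the paper's procedure: the city data are invented, and the relative tolerance `band` that separates "normal" from the other two classes is an assumption, since the abstract does not state how the three classes are delimited.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def classify_cities(cities, band=0.1):
    """Label each city by its position relative to the regression line.

    cities: dict of name -> (areal extent, number of street nodes).
    band:   hypothetical relative tolerance around the line for 'normal'.
    """
    xs = [area for area, _ in cities.values()]
    ys = [nodes for _, nodes in cities.values()]
    slope, intercept = fit_line(xs, ys)
    labels = {}
    for name, (area, nodes) in cities.items():
        expected = slope * area + intercept
        if nodes < expected * (1 - band):
            labels[name] = "sprawling"   # fewer nodes than the area predicts
        elif nodes > expected * (1 + band):
            labels[name] = "compact"     # more nodes than the area predicts
        else:
            labels[name] = "normal"
    return labels

# Invented example data: (areal extent, street nodes) per city.
EXAMPLE_CITIES = {
    "A": (10, 1000), "B": (20, 2000), "C": (30, 3000),
    "D": (40, 4000), "E": (25, 1500), "F": (25, 3500),
}
print(classify_cities(EXAMPLE_CITIES))
```

With street nodes standing in for population, a city far below the line has few nodes for its area, which is the sense in which it is read as sprawling.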

preprint 2010 arXiv

Post-transcriptional regulation of noise in protein distributions during gene expression

The intrinsic stochasticity of gene expression can lead to large variability of protein levels across a population of cells. Variability (or noise) in protein distributions can be modulated by cellular mechanisms of gene regulation; in particular, there is considerable interest in understanding the role of post-transcriptional regulation. To address this issue, we propose and analyze a stochastic model for post-transcriptional regulation of gene expression. The analytical solution of the model provides insight into the effects of different mechanisms of post-transcriptional regulation on the noise in protein distributions. The results obtained also demonstrate how different sources of intrinsic noise in gene expression can be discriminated based on observations of regulated protein distributions.

preprint 2010 arXiv

Zipf's Law for All the Natural Cities in the United States: A Geospatial Perspective

This paper provides a new geospatial perspective on whether Zipf's law holds for all cities or only for the largest cities in the United States, using a massive dataset and data-intensive computing. A major problem around this issue is how to define cities or city boundaries. Most investigations of Zipf's law rely on the demarcations of cities imposed by census data, e.g., metropolitan areas and census-designated places. These demarcations or definitions (of cities) are criticized for being subjective or even arbitrary. Alternative solutions to defining cities have been suggested, but they still rely on census data for their definitions. In this paper we demarcate urban agglomerations by clustering street nodes (including intersections and ends), forming what we call natural cities. Based on the demarcation, we found that Zipf's law holds remarkably well for all the natural cities (over 2-4 million in total) across the United States. This result shows little sensitivity to the clustering resolution used for demarcating the natural cities. This is in sharp contrast to urban areas as defined in the census data, for which Zipf's law does not hold stably. Keywords: Natural cities, power law, data-intensive geospatial computing, scaling of geographic space
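Zipf's law says that city size is roughly inversely proportional to rank, so a log-log plot of size against rank has a slope near -1. The sketch below estimates that exponent with a plain least-squares fit on synthetic sizes. The function name and data are hypothetical, and least squares on log-log data is used only for simplicity; it is not necessarily the estimation method used in the paper.

```python
import math

def zipf_exponent(sizes):
    """Estimate the Zipf exponent as minus the slope of log(size) vs log(rank)."""
    ranked = sorted(sizes, reverse=True)               # rank 1 = largest city
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(size) for size in ranked]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx

# Synthetic sizes that follow size = C / rank exactly (an idealized Zipf distribution).
synthetic_sizes = [1_000_000 / rank for rank in range(1, 501)]
print("estimated exponent:", zipf_exponent(synthetic_sizes))
```

Applied to real city sizes, an estimate close to 1 that stays stable as the clustering resolution changes would correspond to the robustness reported above.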

preprint 2009 arXiv

Quantifying mRNA synthesis and decay rates using small RNAs

Regulation of mRNA decay is a critical component of global cellular adaptation to changing environments. The corresponding changes in mRNA lifetimes can be coordinated with changes in mRNA transcription rates to fine-tune gene expression. Current approaches for measuring mRNA lifetimes can give rise to secondary effects due to transcription inhibition and require separate experiments to estimate changes in mRNA transcription rates. Here, we propose an approach for simultaneous determination of changes in mRNA transcription rate and lifetime using regulatory small RNAs to control mRNA decay. We analyze a stochastic model for coupled degradation of mRNAs and sRNAs and derive exact results connecting RNA lifetimes and transcription rates to mean abundances. The results obtained show how steady-state measurements of RNA levels can be used to analyze factors and processes regulating changes in mRNA transcription and decay.

28 published items

CasSeqGCN: Combining Network Structure and Temporal Sequence to Predict Information Cascades

CoarSAS2hvec: Heterogeneous Information Network Embedding with Balanced Network Sampling

Independent Asymmetric Embedding for Information Diffusion Prediction on Social Networks

Temporal Network Epistemology: on Reaching Consensus in Real World Setting

A novel similarity measure for mining missing links in long-path networks

A paper's corresponding affiliation and first affiliation are consistent at the country level in Web of Science

Magic Doping and Robust Superconductivity in Monolayer FeSe on Titanates

The evolution of network controllability in growing networks

A generalized linear threshold model for an improved description of the spreading dynamics

Measuring similarity in co-occurrence data using ego-networks

The dominance of big teams in China's scientific output

Verification of Short-Range Order and Its Impact on the Properties of the CrCoNi Medium Entropy Alloy

Visualizing Exotic Orbital Texture in the Single-Layer Mott Insulator 1T-TaSe2

An Analysis of the Matching Hypothesis in Networks

Emergence of bimodality in controlling complex networks

Scaling of Geographic Space as a Universal Rule for Map Generalization

Connecting protein and mRNA burst distributions for stochastic models of gene expression

Exploring Human Mobility Patterns Based on Location Information of US Flights

Intrinsic noise in stochastic models of gene expression with molecular memory and bursting

On the structural properties of small-world networks with finite range of shortcut links

Regulation by small RNAs via coupled degradation: mean-field and variational approaches

Stochastic modeling of regulation of gene expression by multiple small RNAs

Agent-based Simulation of Human Movement Shaped by the Underlying Street Structure

Applications of Little's Law to stochastic models of gene expression

Measuring Urban Sprawl Based on Massive Street Nodes and the Novel Concept of Natural Cities

Post-transcriptional regulation of noise in protein distributions during gene expression

Zipf's Law for All the Natural Cities in the United States: A Geospatial Perspective

Quantifying mRNA synthesis and decay rates using small RNAs