Publications

2025

Haoran Nan, Senquan Wang, Chun Ouyang, Yanchen Zhou, Weiwei Gu. Assessing the robustness and reducibility of multiplex networks with embedding-aided interlayer similarities. Physical Review E.

The study of interlayer similarity of multiplex networks helps us understand the intrinsic structure of complex systems, revealing how changes in one layer can propagate and affect others, thus enabling broad implications for transportation, social, and biological systems. Existing algorithms that measure similarities between network layers typically encode only partial information, which limits their effectiveness in capturing the full complexity inherent in multiplex networks. To address this limitation, we propose an interlayer similarity measuring approach named Embedding Aided Interlayer Similarity (EATSim). EATSim concurrently incorporates intralayer structural similarity and cross-layer anchor node alignment consistency, providing a more comprehensive framework for analyzing interconnected systems. Extensive experiments on both synthetic and real-world networks demonstrate that EATSim effectively captures the underlying geometric similarities between interconnected networks, significantly improving the accuracy of interlayer similarity measurement. Moreover, EATSim achieves state-of-the-art performance in two downstream applications: predicting network robustness and network reducibility, showing its great potential in enhancing the understanding and management of complex systems.

Weiwei Gu, Chen Yang, Lei Li, Jinqiang Hou, Filippo Radicchi. Deep-learning-aided dismantling of interdependent networks. Nature Machine Intelligence.

Identifying the minimal set of nodes whose removal breaks a complex network apart, also referred to as the network dismantling problem, is a highly non-trivial task with applications in multiple domains. Whereas network dismantling has been extensively studied over the past decade, research has primarily focused on the optimization problem for single-layer networks, neglecting that many, if not all, real networks display multiple layers of interdependent interactions. In such networks, the optimization problem is fundamentally different, as the effect of removing nodes propagates within and across layers in a way that cannot be predicted using a single-layer perspective. Here, we propose a dismantling algorithm named MultiDismantler, which leverages multiplex network representation and deep reinforcement learning to optimally dismantle multi-layer interdependent networks. MultiDismantler is trained on small synthetic graphs; when applied to large networks, either real or synthetic, it displays exceptional dismantling performance, clearly outperforming all existing benchmark algorithms. We show that MultiDismantler is effective in guiding strategies for the containment of diseases in social networks characterized by multiple layers of social interactions. We also show that MultiDismantler is useful in the design of protocols aimed at delaying the onset of cascading failures in interdependent critical infrastructures.

Weiwei Gu, Linbi Lv, Gang Lu, Ruiqi Li. MWTP: A heterogeneous multiplex representation learning framework for link prediction of weak ties. Neural Networks.

Weak ties that bridge different communities are crucial for preserving global connectivity, enhancing resilience, and maintaining the functionality and dynamics of complex networks. However, making accurate link predictions for weak ties remains challenging due to the lack of common neighbors. Most complex systems, such as transportation and social networks, comprise multiple types of interactions, which can be modeled by multiplex networks with each layer representing a different type of connection. Better utilizing information from other layers can mitigate the lack of information for predicting weak ties. Here, we propose a GNN-based representation learning framework for Multiplex Weak Tie Prediction (MWTP). It leverages both an intra-layer and an inter-layer aggregator to effectively learn and fuse information across different layers. The intra-layer aggregator integrates features from multi-order neighbors, and the inter-layer aggregator exploits either logit regression or a more sophisticated semantic voting mechanism to compute nodal-level inter-layer attentions, leading to two variants of our framework, MWTP-logit and MWTP-semantic. The former is more efficient to run owing to its fewer parameters, while the latter is slower but has stronger learning capabilities. Extensive experiments demonstrate that our MWTPs outperform eleven popular baselines for predicting both weak ties and all ties across diverse real-world multiplex networks. Additionally, MWTPs achieve good prediction performance with a relatively small training size.

Gang Lu, Xiuyi Jiang, Qingyuan Yang, Zhengyang Zhao, Weiwei Gu, Ruiqi Li. Predicting methane adsorption ability of MOFs based on the crystal structural information by convolutional neural network. Fifth International Conference on Material Science and Technology (ICMST 2025).

High-throughput computational screening (HTCS) methods have been widely used to address the challenge of screening high-performance metal-organic framework (MOF) materials from large datasets. Accurately predicting the performance of MOFs with hypothetical structures, such as adsorption capacity for methane, can be a great help to HTCS. Traditional molecular dynamics simulation methods are computationally cumbersome and costly. Feature engineering-based machine learning methods rely heavily on expert experience to extract effective features, some of which can be time-consuming to calculate. Although some classical machine learning models can already achieve good performance, the heavy burden of feature computation prior to model training would eventually limit their practical applications, flexibility and generality. In this paper, an approach is proposed for predicting the methane adsorption ability of MOFs directly from their crystal structural information recorded in Crystallographic Information File (CIF)-formatted files, without the need for molecular dynamics simulation or calculating any physical or chemical properties. By expressing the structural information in the crystal cells of MOFs as matrices, a deep convolutional neural network based on a residual structure is trained end-to-end to predict their methane adsorption ability. The experimental results show that the model accomplishes the prediction task with high accuracy, giving a mean square correlation coefficient (R²) of 0.925 and a mean absolute error (MAE) of 22.28 cm³ g⁻¹, which demonstrates good performance over baselines.

2024

Weiwei Gu, Senquan Wang. An Improved Strategy for Blood Glucose Control Using Multi-Step Deep Reinforcement Learning. Proceedings of the 2024 16th International Conference on Bioinformatics and Biomedical Technology.

Blood Glucose (BG) control, which involves keeping an individual's BG within a healthy range through extracorporeal insulin injections, is an important task for people with type 1 diabetes. However, traditional patient self-management is cumbersome and risky. Recent research has been devoted to exploring individualized and automated BG control approaches, among which Deep Reinforcement Learning (DRL) shows potential as an emerging approach. In this paper, we use an exponential decay model of drug concentration to convert the formalization of the BG control problem, which takes into account the delay and prolongedness of drug effects, from a PAE-POMDP (Prolonged Action Effect-Partially Observable Markov Decision Process) to an MDP, and we propose a novel multi-step DRL-based algorithm to solve the problem. The algorithm also uses the Prioritized Experience Replay (PER) sampling method. Compared to single-step bootstrapped updates, multi-step learning is more efficient and reduces the influence of biased targets. Our proposed method converges faster and achieves higher cumulative rewards than the benchmark in the same training environment, and improves the time-in-range (TIR), the percentage of time the patient's BG is within the target range, in the evaluation phase. Our work validates the effectiveness of multi-step reinforcement learning in BG control, which may help to explore the optimal glycemic control measure and improve the survival of diabetic patients.
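
The two quantities named in this abstract, a multi-step bootstrapped target and the time-in-range metric, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names are hypothetical, and the 70-180 mg/dL target range is an assumed default, since the abstract does not state the range used.

```python
from typing import Sequence


def n_step_target(rewards: Sequence[float], bootstrap_value: float, gamma: float) -> float:
    """Return r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * bootstrap_value.

    With a single reward this reduces to the usual one-step bootstrapped target.
    """
    target = bootstrap_value
    # Fold the rewards in from the last step back to the first.
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target


def time_in_range(bg_readings: Sequence[float], low: float = 70.0, high: float = 180.0) -> float:
    """Percentage of BG readings within [low, high].

    The default bounds (mg/dL) are an assumption, not taken from the paper.
    """
    if not bg_readings:
        raise ValueError("bg_readings must not be empty")
    in_range = sum(1 for bg in bg_readings if low <= bg <= high)
    return 100.0 * in_range / len(bg_readings)
```

For example, `n_step_target([1.0, 1.0], 10.0, 0.5)` folds two observed rewards into the target before bootstrapping, so the value estimate is discounted twice and contributes less to the target than in a one-step update.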

2023

Weiwei Gu, Jinqiang Hou, Weiyi Gu. Improving Link Prediction Accuracy of Network Embedding Algorithms via Rich Node Attribute Information. Journal of Social Computing.

Complex networks are widely used to represent an abundance of real-world relations ranging from social networks to brain networks. Inferring missing links or predicting future ones based on the currently observed network is known as the link prediction task. Recent network embedding based link prediction algorithms have demonstrated ground-breaking performance on link prediction accuracy. Those algorithms usually apply node attributes as the initial feature input to accelerate the convergence speed during the training process. However, they do not take full advantage of node feature information. In this paper, besides applying feature attributes as the initial input, we make better use of node attribute information by building attributable networks and plugging them into some typical link prediction algorithms; we name this algorithm Attributive Graph Enhanced Embedding (AGEE). AGEE is able to automatically learn the weighting trade-off between the structure and the attributive networks. Numerical experiments show that AGEE can improve the link prediction accuracy by around 3% compared with SEAL, Variational Graph AutoEncoder (VGAE), and node2vec.

Lifei Wang, Rui Nie, Zhang Zhang, Weiwei Gu, Shuo Wang, Anqi Wang, Jiang Zhang, Jun Cai. A deep generative framework with embedded vector arithmetic and classifier for sample generation, label transfer, and clustering of single-cell data. Cell Reports Methods.

Multiple-source single-cell datasets have accumulated quickly and need computational methods to integrate and decompose into meaningful components. Here, we present inClust (integrated clustering), a flexible deep generative framework that enables embedding auxiliary information, latent space vector arithmetic, and clustering. All functional parts are relatively modular, independent in implementation but interrelated at runtime, resulting in an all-in general framework that could work in supervised, semi-supervised, or unsupervised mode. We show that inClust is superior to most data integration methods in benchmark datasets. Then, we demonstrate the capability of inClust in the tasks of conditional out-of-distribution generation in supervised mode, label transfer in semi-supervised mode, and spatial domain identification in unsupervised mode. In these examples, inClust could accurately express the effect of each covariate, distinguish the query-specific cell types, or segment spatial domains. The results support that inClust is an excellent general framework for multiple-task harmonization and data decomposition.

Min Gao, Zheng Li, Ruichen Li, Chenhao Cui, Xinyuan Chen, Bodian Ye, Yupeng Li, Weiwei Gu, Qingyuan Gong, Xin Wang, Yang Chen. EasyGraph: A multifunctional, cross-platform, and effective library for interdisciplinary network analysis. Patterns.

Networks are powerful tools for representing the relationships and interactions between entities in various disciplines. However, existing network analysis tools and packages either lack powerful functionality or are not scalable for large networks. In this descriptor, we present EasyGraph, an open-source network analysis library that supports several network data formats and powerful network mining algorithms. EasyGraph provides excellent operating efficiency through a hybrid Python/C++ implementation and multiprocessing optimization. It is applicable to various disciplines and can handle large-scale networks. We demonstrate the effectiveness and efficiency of EasyGraph by applying crucial metrics and algorithms to random and real-world networks in domains such as physics, chemistry, and biology. The results demonstrate that EasyGraph improves the network analysis efficiency for users and reduces the difficulty of conducting large-scale network analysis. Overall, it is a comprehensive and efficient open-source tool for interdisciplinary network analysis.

2022

Weiwei Gu, Ao Yang, Lingyun Lu, Ruiqi Li. Unveiling latent structure of venture capital syndication networks. Entropy.

Venture capital (VC) is a form of private equity financing provided by VC institutions to startups that have high growth potential, due to innovative technology or novel business models, but also high risks. To guard against uncertainties and benefit from mutual complementarity and shared resources and information, making joint investments with other VC institutions in the same startup is pervasive, which forms an ever-growing complex syndication network. Attaining objective classifications of VC institutions and revealing the latent structure of joint-investment behaviors between them can deepen our understanding of the VC industry and boost the healthy development of the market and economy. In this work, we devise an iterative Loubar method based on the Lorenz curve to classify VC institutions objectively and automatically, without requiring arbitrary thresholds or a preset number of categories. We further reveal distinct investment behaviors across categories, where the top-ranked group enters more industries and investment stages with better performance. Through network embedding of joint investment relations, we unveil the existence of possible territories of top-ranked VC institutions, and the hidden structure of relations between VC institutions.
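
A threshold-free classification based on the Lorenz curve can be sketched as follows. The sketch assumes the Loubar criterion as it is commonly described: the tangent to the Lorenz curve at F = 1 meets the horizontal axis at F* = 1 - mean/max, and items ranked above F* form the top class. Repeating the split on the remainder and stopping when nothing is left are assumptions made for illustration; the paper's exact iteration and stopping rule may differ, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def loubar_split(values: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split positive values into (rest, top) using the Loubar criterion."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    if ordered[0] <= 0:
        raise ValueError("values must be positive")
    n, vmax, total = len(ordered), ordered[-1], sum(ordered)
    # An item with 1-based rank i stays in `rest` when i/n <= F* = 1 - mean/max,
    # which is equivalent to i * vmax <= n * vmax - total (no division needed).
    cut = sum(1 for i in range(1, n + 1) if i * vmax <= n * vmax - total)
    return ordered[:cut], ordered[cut:]


def iterative_loubar(values: Sequence[float]) -> List[List[float]]:
    """Peel off the top class repeatedly (assumed iteration rule), highest class first."""
    classes: List[List[float]] = []
    rest = sorted(values)
    while rest:
        rest, top = loubar_split(rest)
        classes.append(top)  # `top` is never empty, so the loop terminates
    return classes
```

Because the cut point depends only on the mean and maximum of the data, no threshold or number of classes has to be chosen in advance, which is the property the abstract highlights.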

2021

Weiwei Gu, Aditya Tandon, Yong-Yeol Ahn, Filippo Radicchi. Principled approach to the selection of the embedding dimension of networks. Nature Communications.

Network embedding is a general-purpose machine learning technique that encodes network structure in vector spaces with tunable dimension. Choosing an appropriate embedding dimension – small enough to be efficient and large enough to be effective – is challenging but necessary to generate embeddings applicable to a multitude of tasks. Existing strategies for the selection of the embedding dimension rely on performance maximization in downstream tasks. Here, we propose a principled method such that all structural information of a network is parsimoniously encoded. The method is validated on various embedding algorithms and a large corpus of real-world networks. The embedding dimension selected by our method in real-world networks suggests that efficient encoding in low-dimensional spaces is usually possible.

Weiwei Gu, Fei Gao, Xiaodan Lou, Jiang Zhang. Discovering latent node information by graph attention network. Scientific Reports.

In this paper, we propose graph attention based network representation (GANR), which utilizes the graph attention architecture and takes graph structure as the supervised learning information. Compared with node classification based representations, GANR can be used to learn representations for any given graph. GANR is not only capable of learning high-quality node representations that achieve competitive performance on link prediction, network visualization and node classification, but it can also extract meaningful attention weights that can be applied to node centrality measurement. GANR can identify the leading venture capital investors, discover highly cited papers and find the most influential nodes in the Susceptible-Infected-Recovered model. We conclude that link structures in graphs are not limited to predicting linkage itself; they are capable of revealing latent node information in an unsupervised way once an appropriate learning algorithm, like GANR, is provided.

Ruiqi Li, Lingyun Lu, Tianyu Cui, Weiwei Gu, Shaodong Ma, Gang Xu, H Eugene Stanley. Assessing the Attraction of Cities on Venture Capital From a Scaling Law Perspective. IEEE Access.

Cities are centers for the integration of capital and incubators of inventions. Attracting venture capital (VC) is of great importance for cities to advance in innovative technology and business models towards a sustainable and prosperous future. Yet we still lack a quantitative understanding of the relationship between urban characteristics and VC activities. In this paper, we find a clear nonlinear scaling relationship between VC activities and the urban population of Chinese cities. In such nonlinear systems, the widely applied linear per capita indicators would be biased towards either larger or smaller cities, depending on whether the scaling is superlinear or sublinear, while the residual of a city relative to the prediction of the scaling law is a more objective and scale-invariant metric. Such a metric can distinguish the effects of local dynamics from the scaled growth induced by changes in population size. The spatiotemporal evolution of such metrics on VC activities reveals three distinct groups of cities, two of which stand out with increasing and decreasing trends, respectively. The taxonomy results together with spatial analysis also signify different development modes between large urban agglomeration regions. Besides, we notice that the scaling exponents of VC activities fluctuate much more over time than those of the socioeconomic output of cities, and a conceptual model that focuses on the growth dynamics of different sized cities can explain this well, which we assume would be general to other scenarios.
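
The residual metric described in this abstract can be sketched under the standard urban scaling form Y = Y0 * N^beta, fitted by ordinary least squares in log-log space. The functional form, the fitting procedure, and the function names are assumptions for illustration; the abstract does not specify how the scaling law is estimated.

```python
import math
from typing import List, Sequence, Tuple


def fit_scaling_law(population: Sequence[float], activity: Sequence[float]) -> Tuple[float, float]:
    """Fit log(Y) = log(Y0) + beta * log(N) by least squares; return (beta, log_y0)."""
    if len(population) != len(activity) or len(population) < 2:
        raise ValueError("need at least two cities with matching data")
    xs = [math.log(n) for n in population]
    ys = [math.log(y) for y in activity]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx
    return beta, y_mean - beta * x_mean


def scaling_residuals(population: Sequence[float], activity: Sequence[float]) -> List[float]:
    """Log-residual of each city relative to the fitted scaling law.

    A positive value means the city attracts more activity than its size predicts.
    """
    beta, log_y0 = fit_scaling_law(population, activity)
    return [math.log(y) - (log_y0 + beta * math.log(n)) for n, y in zip(population, activity)]
```

Unlike a per capita ratio Y/N, which implicitly assumes beta = 1, the residual compares each city with the expectation for a city of its own size, so it does not systematically favor large or small cities.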

Weiwei Gu, Fei Gao, Ruiqi Li, Jiang Zhang. Learning Universal Network Representation via Link Prediction by Graph Convolutional Neural Network. Journal of Social Computing.

Network representation learning algorithms, which aim at automatically encoding graphs into low-dimensional vector representations with a variety of node similarity definitions, have a wide range of downstream applications. Most existing methods either have low accuracies in downstream tasks or a very limited application field, such as article classification in citation networks. In this paper, we propose a novel network representation method, named Link Prediction based Network Representation (LPNR), which generalizes the latest graph neural network and optimizes a carefully designed objective function that preserves linkage structures. LPNR can not only learn meaningful node representations that achieve competitive accuracy in node centrality measurement and community detection but also achieve high accuracy in the link prediction task. Experiments prove the effectiveness of LPNR on three real-world networks. With the mini-batch and fixed sampling strategy, LPNR can learn the embedding of large graphs in a few hours.

2020

Xin Gao, Jar-Der Luo, Kunhao Yang, Xiaoming Fu, Loring Liu, Weiwei Gu. Predicting Tie Strength of Chinese Guanxi by Using Big Data of Social Networks. Journal of Social Computing.

This paper poses a question: How many types of social relations can be categorized in the Chinese context? In social networks, the calculation of tie strength can better represent the degree of intimacy of the relationship between nodes, rather than just indicating whether the link exists or not. Previous research suggests that Granovetter measures tie strength so as to distinguish strong ties from weak ties, and the Dunbar circle theory may offer a plausible approach to calculating 5 types of relations according to interaction frequency via unsupervised learning (e.g., clustering interactive data between users in Facebook and Twitter). In this paper, we differentiate the layers of an ego-centered network by measuring the different dimensions of users' online interaction data based on the Dunbar circle theory. To label the types of Chinese guanxi, we conduct a survey to collect the ground truth from the real world and link this survey data to big data collected from a widely used social network platform in China. After repeating the Dunbar experiments, we modify our computing methods and the indicators computed from big data in order to obtain the model that best fits the ground truth. At the same time, a comprehensive set of effective predictors is selected to have a dialogue with existing theories of tie strength. Eventually, by combining Guanxi theory with Dunbar circle studies, four types of guanxi are found to represent a four-layer model of a Chinese ego-centered network.

2019

Weiwei Gu, Jar-der Luo, Jifan Liu. Exploring small-world network with an elite-clique: Bringing embeddedness theory into the dynamic evolution of a venture capital network. Social Networks.

This paper uses a network dynamics model to explain the formation of a small-world network with an elite-clique. This network is a small-world network with an elite-clique at its center in which elites are also the centers of many small groups. These leaders also act as bridges between different small groups. Network dynamics are an important research topic due to their ability to explain the evolution of network structures. In this paper, a Chinese Venture Capital (VC) network was coded from joint investments between VC firms and then analyzed to uncover its network properties and factors that influence its evolution. We first built a random graph model to control for factors such as network scale, network growth, investment frequency and syndication tendency. Then we added a partner-selection mechanism and used two theories to analyze the formation of network structure: relational embeddedness and structural embeddedness. After that, we ran simulations and compared the three models with the actual Chinese VC network. To do this, we computed the elite-clique’s EI index, degree distribution, clustering coefficient distribution and motifs. Results show that adding embeddedness theories significantly improves the network dynamic model’s predictive power and helps us uncover the mechanisms that affect the formation of a small-world industrial network with an elite-clique at its center.
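
One of the comparison statistics named above, the EI index of a group, can be sketched using the commonly used External-Internal definition (E - I) / (E + I), where E counts ties between group members and outsiders and I counts ties among group members. Whether the paper uses exactly this variant is an assumption, and the function name is hypothetical.

```python
from typing import Hashable, Iterable, Set, Tuple


def ei_index(edges: Iterable[Tuple[Hashable, Hashable]], group: Set[Hashable]) -> float:
    """E-I index of `group` in an undirected edge list.

    Returns -1.0 when all of the group's ties are internal and +1.0 when all are external.
    """
    external = internal = 0
    for u, v in edges:
        u_in, v_in = u in group, v in group
        if u_in and v_in:
            internal += 1
        elif u_in or v_in:
            external += 1
        # Edges with both endpoints outside the group are ignored.
    if external + internal == 0:
        raise ValueError("the group has no ties")
    return (external - internal) / (external + internal)
```

A strongly negative value for the elite group would indicate a cohesive clique whose members mostly invest with one another, while values near zero or above indicate that the group's ties mainly bridge to outsiders.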

2017

Weiwei Gu, Li Gong, Xiaodan Lou, Jiang Zhang. The Hidden Flow Structure and Metric Space of Network Embedding Algorithms Based on Random Walks. Scientific Reports.

Network embedding, which encodes all vertices in a network as a set of numerical vectors in accordance with its local and global structures, has drawn widespread attention. Network embedding not only learns significant features of a network, such as clustering and link prediction, but also learns the latent vector representation of the nodes, which provides theoretical support for a variety of applications, such as visualization, link prediction, node classification, and recommendation. As the latest progress of the research, several algorithms based on random walks have been devised. Although those algorithms have drawn much attention for their high scores in learning efficiency and accuracy, there is still a lack of theoretical explanation, and the transparency of those algorithms has been doubted. Here, we propose an approach based on the open-flow network model to reveal the underlying flow structure and its hidden metric space of different random walk strategies on networks. We show that the essence of embedding based on random walks is the latent metric structure defined on the open-flow network. This not only deepens our understanding of random-walk-based embedding algorithms but also helps in finding new potential applications in network embedding.

2016

Xiaodan Lou, Yong Li, Weiwei Gu, Jiang Zhang. The Atlas of Chinese World Wide Web Ecosystem Shaped by the Collective Attention Flows. PLoS ONE.

The web can be regarded as an ecosystem of digital resources connected and shaped by collective successive behaviors of users. Knowing how people allocate limited attention on different resources is of great importance. To answer this, we embed the most popular Chinese web sites into a high dimensional Euclidean space based on the open flow network model of a large number of Chinese users’ collective attention flows, which considers both the connection topology of hyperlinks between the sites and the collective behaviors of the users. With these tools, we rank the web sites and compare their centralities based on flow distances with other metrics. We also study the patterns of attention flow allocation, and find that a large number of web sites concentrate in the central area of the embedding space, and only a small fraction of web sites disperse in the periphery. The entire embedding space can be separated into 3 regions (core, interim, and periphery). The sites in the core (1%) occupy a large share of the attention flows (40%), and the sites in the interim (34%) attract 40%, whereas the other sites (65%) take only 20% of the flows. What’s more, we clustered the web sites into 4 groups according to their positions in the space, and found that web sites with similar contents and topics are grouped together. In short, by incorporating the open flow network model, we can clearly see how collective attention is allocated and flows across different web sites, and how web sites are connected to each other.