Abstract:
Datasets pertaining to the network of relationships between entities have
been of interest to computer scientists ever since the data of the World Wide
Web was made available for scientific scrutiny. Today, real-world datasets are
available for several complex networks, such as Facebook, Twitter, the WWW, and
collaboration networks. In a complex network, objects are represented
by nodes, and the relationship between the objects is represented by an edge
connecting them. These nodes in a complex network can be ranked based on
their importance. Since the notion of importance is contextual, scientists have defined
various application-specific centrality measures, such as degree centrality, closeness
centrality, betweenness centrality, eigenvector centrality, Katz centrality, PageRank,
and coreness.
In real-life applications, one is mainly interested in the importance of a node
relative to the top-ranked node, and existing methods rank nodes by
computing their centrality values. The classical
ranking method computes the centrality value of every node and compares
these values to obtain the rank of a node of interest. The time complexity of the classical
method is very high for large-scale complex networks. This motivates
exploring ways to compute the rank of a node without computing the
centrality value of all the nodes.
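
To make the cost of the classical approach concrete, the following minimal sketch ranks a node by closeness centrality using one breadth-first search per node. It assumes a connected, unweighted, undirected graph with at least two nodes, stored as an adjacency dictionary; the function names are illustrative and are not taken from the thesis.

```python
from collections import deque

def bfs_distance_sum(adj, source):
    """Sum of shortest-path distances from source to all reachable nodes.

    A single breadth-first search, O(m) on a connected graph.
    """
    dist = {source: 0}
    queue = deque([source])
    total = 0
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                total += dist[v]
                queue.append(v)
    return total

def classical_closeness_rank(adj, target):
    """Rank of target by closeness, computed the classical way.

    One BFS per node gives O(n * m) overall. Assumes a connected graph
    with n >= 2, so every distance sum is positive.
    """
    n = len(adj)
    closeness = {u: (n - 1) / bfs_distance_sum(adj, u) for u in adj}
    # Rank 1 is the most central node; tied nodes share the better rank.
    return 1 + sum(1 for u in adj if closeness[u] > closeness[target])
```

Even when only one node's rank is needed, this approach visits every edge once per node, which is the overhead the methods in this work try to avoid.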
In this work, we propose fast and efficient methods to estimate the
global centrality rank of a node without computing the centrality value of all the
nodes. The proposed methods are based on structural properties of centrality
measures or on sampling techniques. The main contributions of the thesis are listed
below.
1. Methods to estimate the degree rank of a node without collecting the entire
network. The proposed methods are based on the degree-distribution
characteristics of the networks and sampling techniques.
2. Heuristic methods to estimate the closeness rank of a node in O(m) time,
compared with O(nm) for the classical ranking method,
where n represents the total number of nodes and m represents
the total number of edges in the network.
3. A method to estimate the shell-index of a node using local information. We
propose hill-climbing-based methods to identify top-ranked nodes using the
proposed estimator. We further use the proposed estimator to estimate the
rank of a node without access to the entire network.
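
As an illustration of the sampling idea behind the first contribution, the sketch below estimates a degree rank from a uniform random sample of nodes. It is a generic estimator written under stated assumptions, not the method proposed in the thesis: it assumes the node identifiers are known, that nodes can be sampled uniformly, and that the degree of any single node can be queried through a hypothetical `degree_of` callable.

```python
import random

def estimate_degree_rank(degree_of, nodes, target, sample_size, seed=None):
    """Estimate the degree rank of target from a uniform node sample.

    degree_of: hypothetical callable returning the degree of a node.
    nodes: list of all node identifiers (assumed known in advance).
    Only sample_size + 1 degree queries are made, not one per node.
    """
    rng = random.Random(seed)
    sample = rng.sample(nodes, sample_size)  # uniform, without replacement
    target_degree = degree_of(target)
    higher = sum(1 for u in sample if degree_of(u) > target_degree)
    # Scale the sampled fraction of higher-degree nodes up to the full network.
    return 1 + round(len(nodes) * higher / sample_size)
```

When the sample covers every node, the estimate equals the exact rank with tied nodes sharing the better rank; smaller samples trade accuracy for fewer degree queries.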
The proposed methods have been evaluated on synthetic networks as well
as on real-world networks. We discuss the efficiency and feasibility of these
approaches in different contexts.