Minimizing data transfers for regular reachability queries on distributed graphs - Citegraph

Paper Info

Title
Minimizing data transfers for regular reachability queries on distributed graphs

Abstract
Nowadays, there is an explosion of Internet information, which is normally distributed on different sites. Hence, efficient finding information becomes difficult. Efficient query evaluation on distributed graphs is an important research topic since it can be used in real applications such as: social network analysis, web mining, ontology matching, etc. A widely-used query on distributed graphs is the regular reachability query (RRQ). A RRQ verifies whether a node can reach another node by a path satisfying a regular expression. Traditionally RRQs are evaluated by distributed depth-first search or distributed breadth-first search methods. However, these methods are restricted by the total network traffic and the response time on large graphs. Recently, Wenfei Fan et al. proposed an approach for improving reachability queries by visiting each site only once, but it has a communication bottleneck problem when assembling all distributed partial query results. In this paper, we propose two algorithms in order to improve Wenfei Fan's algorithm for RRQs. The first algorithm filters and removes redundant nodes/edges on each local site, in parallel. The second algorithm limits the data transfers by local contraction of the partial result. We extensively evaluated our algorithms on MapReduce using YouTube and DBLP datasets. The experimental results show that our method reduces unnecessary data transfers at most 60%, this solves the communication bottleneck problem.

Year	DOI	Venue
2013	10.1145/2542050.2542092	SoICT
Keywords	Field	DocType
algorithm filter,widely-used query,communication bottleneck problem,internet information,reachability query,minimizing data transfer,wenfei fan,regular reachability query,efficient query evaluation,rrq verifies,partial query result	Bottleneck,Web mining,Computer science,Response time,Theoretical computer science,Reachability,Artificial intelligence,The Internet,Distributed computing,Ontology alignment,Regular expression,Social network analysis,Machine learning	Conference
Citations	PageRank	References
2	0.37	22
Authors
3

Authors (3 rows)

Cited by (2 rows)

References (22 rows)

Name	Order	Citations	PageRank
Quyet Nguyen-Van	1	13	1.34
Le-Duc Tung	2	14	1.98
Zhenjiang Hu	3	1341	99.25

1