Title
A Prior Knowledge Based Approach to Improving Accuracy of Web Services Clustering
Abstract
The rapid growth in both the number and diversity of Web services raises new requirements for clustering techniques that facilitate service discovery, service repository management, and related tasks. Existing Web service clustering methods primarily rely on semantic distances between service features, e.g., topic vectors, mined from WSDL documents. However, high-quality topic vectors are hard to obtain because Web service description documents lack abundant textual information. In practice, prior knowledge drawn from how people actually use Web services can help improve the accuracy of Web service clustering. From an analysis of a dataset of Web services and Mashups from ProgrammableWeb, we observe that Web services combined in the same Mashup are highly likely to belong to different clusters, while Web services annotated with identical tags tend to fall within the same cluster. Based on these observations, this paper proposes an efficient clustering approach for Web services. The approach first uses a probabilistic topic model to elicit latent topic vectors from Web service description documents. It then performs clustering based on the K-means++ algorithm, incorporating parameters that represent the above-mentioned prior knowledge. A comprehensive evaluation validates the performance of the proposed approach on a ground-truth dataset crawled from ProgrammableWeb. Experimental comparisons of the approach with and without this prior knowledge show that it significantly improves clustering accuracy.
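The two-stage idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the prior-knowledge parameters enter the K-means++ procedure, so the form used here is an assumption: a penalty `alpha` for placing a service in a cluster that already holds a service it shares a Mashup with, and a bonus `beta` for joining a service that shares a tag. The topic vectors are assumed to have been produced already by a topic model such as LDA; the names `cluster`, `kmeanspp_seeds`, `cannot`, and `should` are hypothetical and not taken from the paper.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two topic vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points, k, rng):
    """Standard k-means++ seeding: sample centers proportionally to D^2."""
    centers = [points[rng.randrange(len(points))]]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:  # every point coincides with an existing center
            centers.append(points[rng.randrange(len(points))])
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def cluster(points, k, cannot=(), should=(), alpha=0.3, beta=0.3,
            iters=20, seed=0):
    """K-means with k-means++ seeding and soft pairwise prior knowledge.

    cannot: index pairs co-occurring in a Mashup (assumed to repel).
    should: index pairs sharing a tag (assumed to attract).
    alpha, beta: assumed penalty/bonus weights, not the paper's parameters.
    """
    rng = random.Random(seed)
    n = len(points)
    centers = kmeanspp_seeds(points, k, rng)
    labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
              for p in points]

    apart = {i: set() for i in range(n)}
    together = {i: set() for i in range(n)}
    for i, j in cannot:
        apart[i].add(j)
        apart[j].add(i)
    for i, j in should:
        together[i].add(j)
        together[j].add(i)

    for _ in range(iters):
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(n) if labels[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]

        # Assignment step: distance plus prior-knowledge adjustment.
        changed = False
        for i, p in enumerate(points):
            def cost(c):
                penalty = alpha * sum(1 for j in apart[i] if labels[j] == c)
                bonus = beta * sum(1 for j in together[i] if labels[j] == c)
                return sq_dist(p, centers[c]) + penalty - bonus

            best = min(range(k), key=cost)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels


# Toy two-topic vectors standing in for LDA output (made-up values).
vectors = [[0.95, 0.05], [0.90, 0.10], [0.05, 0.95], [0.10, 0.90], [0.65, 0.35]]
plain = cluster(vectors, 2)
guided = cluster(vectors, 2, cannot=[(4, 0), (4, 1)], should=[(4, 2)])
print("without prior knowledge:", plain)
print("with prior knowledge:   ", guided)
```

On this toy input, the unconstrained run groups service 4 with services 0 and 1, which are nearest in topic space, while the guided run moves it to the cluster of services 2 and 3 because it shares a Mashup with 0 and 1 and a tag with 2. The cluster indices themselves are arbitrary; only the grouping is meaningful.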
Year
2018
DOI
10.1109/SCC.2018.00008
Venue
2018 IEEE International Conference on Services Computing (SCC)
Keywords
Web services, Clustering, Prior Knowledge, LDA, K-means++
Field
k-means clustering, Mashup, Information retrieval, Computer science, Ground truth, Topic model, Probabilistic logic, Web service, Cluster analysis, Service discovery
DocType
Conference
ISSN
2474-8137
ISBN
978-1-5386-7251-8
Citations
0
PageRank
0.34
References
19
Authors
5
Name             Order  Citations  PageRank
Min Shi          1      35         3.53
Jianxun Liu      2      640        67.12
Buqing Cao       3      200        23.96
Yiping Wen       4      25         8.59
Xiangping Zhang  5      3          3.12