Title
Utilizing Web Search Engines for Program Analysis
Abstract
Programming involves representing domain concepts using programming abstractions. In object-oriented programs, the concepts and relations of the business domain are represented as classes, attributes, and methods. However, concepts and relations that logically belong together are scattered across different modules, interleaved with technical concepts, and distorted by implementation details. In this paper, we present an automatic method to identify logically related concepts and the relations among them. To achieve this, we systematically transform program identifiers into fragments of natural language sentences and check whether these sentence fragments are meaningful to humans. To perform such checks automatically, we use the World Wide Web as a knowledge base that contains a huge number of meaningful texts, and use the Google web search engine to validate the meaningfulness of these sentence fragments. If the search engine returns a sufficient number of hits, we consider that we have discovered a piece of knowledge in the code. By systematically applying this method, we obtain a condensed form of the knowledge embodied in the program, which is an enabler for automatic analyses. We present our experience with several use cases: (1) assessing the meaningfulness of identifiers, (2) extracting complex concepts from compound identifiers, (3) extracting a meaningful taxonomy from the class hierarchy, and (4) extracting complex conceptual relations from the code. We report on our observations during the analysis of real-world Java code, discuss the limitations of our approach, and sketch possible extensions.
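The core idea of the abstract (turn an identifier into a phrase, then accept it if a search returns enough hits) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the splitting rules, the quoted-phrase query form, and the threshold of 1000 hits are all assumptions made for illustration, and the hit counter is a caller-supplied stand-in rather than a real search engine API.

```python
import re
from typing import Callable, List


def split_identifier(identifier: str) -> List[str]:
    """Split a camelCase or snake_case identifier into lowercase words."""
    words: List[str] = []
    # First break on underscores and other non-word characters.
    for part in re.split(r"[_\W]+", identifier):
        # Then break on case changes, keeping acronyms such as "XML" together.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


def to_phrase(identifier: str) -> str:
    """Turn an identifier into a candidate natural-language fragment."""
    return " ".join(split_identifier(identifier))


def is_meaningful(
    phrase: str,
    hit_count: Callable[[str], int],
    min_hits: int = 1000,  # hypothetical threshold, not taken from the paper
) -> bool:
    """Accept the phrase if the (caller-supplied) counter reports enough hits.

    The phrase is wrapped in double quotes so that a search backend would
    treat it as an exact phrase; this query form is an assumption.
    """
    return hit_count(f'"{phrase}"') >= min_hits


# Stand-in for a search engine: a fixed table of made-up hit counts.
FAKE_HITS = {'"bank account"': 5_000_000, '"account bank tmp"': 3}


def fake_hit_count(query: str) -> int:
    return FAKE_HITS.get(query, 0)
```

With the stand-in counter, `is_meaningful(to_phrase("BankAccount"), fake_hit_count)` accepts the phrase, while `is_meaningful(to_phrase("accountBankTmp"), fake_hit_count)` rejects it. Passing the counter as a parameter keeps the sketch independent of any particular search service.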
Year
2010
DOI
10.1109/ICPC.2010.26
Venue
ICPC
Keywords
real world java code,program analysis,google web search engine,knowledge base,meaningful text,meaningful taxonomy,automatic method,compound identifiers,business domain,utilizing web search engines,automatic analysis,program identifiers,shape,internet,domain knowledge,search engines,search engine,web search engine,object oriented programming,world wide web,object oriented program,natural languages,engines,natural language,java,knowledge based systems,use case
Field
Web search engine,Data mining,Programming language,Object-oriented programming,Information retrieval,Domain knowledge,Computer science,Knowledge-based systems,Class hierarchy,Business domain,Program analysis,Knowledge base
DocType
Conference
Citations
1
PageRank
0.35
References
10
Authors
2
Name             Order  Citations  PageRank
Daniel Ratiu     1      493        38.87
Lars Heinemann   2      202        11.05