FinDX: A Versatile, Low-Resource Approach to Financial Website Classification

Paper Info

Title
FinDX: A Versatile, Low-Resource Approach to Financial Website Classification

Abstract
The World Wide Web provides an excellent platform for investors to discover new partnership opportunities with a variety of companies. Analysts can categorize websites according to their business domains to retain relevant investment opportunities. Classifying websites manually is too expensive and time-consuming; thus, automatic classification tools are necessary. In this paper, we present FinDX (Financial Data Exploration), a tool for automatic website content classification for the financial technology (fintech) domain. At the core of our system is a keyword-based web crawler that extracts text from the landing page and relevant subpages, such as the About or Product pages of company websites. After cleaning the text and filtering it using part-of-speech tagging, we use a Linear Support Vector Machine (SVM) or Multilayer Perceptron (MLP) to classify a company website as fintech or non-fintech. FinDX achieves high binary classification accuracy on two different datasets of business websites, attaining a maximal F-score of 96%. In addition, our flexible tool is easily adaptable to any business domain and is not resource-expensive. This makes FinDX ideal for use in startup environments.
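The keyword-based selection of subpages described in the abstract might look roughly like the sketch below. It is only an illustration under stated assumptions, not the FinDX implementation: the keyword list, the `LinkCollector` class, and the `select_subpages` function are hypothetical names, and the matching rule (a substring test on the link target and anchor text) is a guess at what "keyword-based" could mean here. It uses only the Python standard library and does no fetching.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical keyword list; the abstract only names About and Product pages.
SUBPAGE_KEYWORDS = ("about", "product")


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Anchors without an href leave _href as None and are skipped.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def select_subpages(landing_html, base_url, keywords=SUBPAGE_KEYWORDS):
    """Return absolute URLs of links whose href or anchor text contains a keyword."""
    parser = LinkCollector()
    parser.feed(landing_html)
    parser.close()
    selected = []
    for href, text in parser.links:
        haystack = f"{href} {text}".lower()
        if any(keyword in haystack for keyword in keywords):
            url = urljoin(base_url, href)
            if url not in selected:  # keep first occurrence, preserve order
                selected.append(url)
    return selected
```

Given the HTML of a landing page and its base URL, `select_subpages` returns the deduplicated, absolute URLs a crawler of this kind could visit next. Text extraction, part-of-speech filtering, and the SVM or MLP classifier are separate steps not covered here.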

Year	DOI	Venue
2019	10.1109/BigData47090.2019.9006368	2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA)
Keywords	Field	DocType
Web Content Classification, Machine Learning, Linear Support Vector Machine, Bag-of-Words, Term Frequency Inverse-Document Frequency, Financial Technology	Bag-of-words model,Landing page,Binary classification,tf–idf,Computer science,Support vector machine,Business domain,FinTech,Finance,Web crawler	Conference
ISSN	Citations	PageRank
2639-1589	0	0.34
References	Authors
0	3

Authors (3 rows)


Name	Order	Citations	PageRank
Alissa Ostapenko	1	0	0.34
Rodica Neamtu	2	9	4.26
Frazer Anderson	3	0	0.34
