Abstract | ||
---|---|---|
With the sizes that digital collections are currently reaching, retrieving the best match for a document from a large collection by comparing hundreds of tags is a task of considerable algorithmic complexity, even more so if the number of tags in the collection is not fixed. For these cases, similarity search appears to be the best retrieval method, but there is a lack of techniques suited to these conditions. This work presents a combination of machine learning algorithms that finds the object most similar to a given one in a set of pre-processed objects, based only on their metadata tags. The algorithm represents objects as character frequency curves and is capable of finding relationships between objects without an apparent association. The search can also be parallelized using MapReduce strategies. This method can be applied to a wide variety of documents with metadata tags. The case study used in this work to demonstrate the similarity search technique is a collection of image objects in JavaScript Object Notation (JSON) containing metadata tags. Information system for image classification based on frequency curve proximity. Combination of machine learning algorithms to find the most similar object of a set. The system can be applied to a wide variety of documents with metadata tags. The system can be parallelized using MapReduce strategies. |
Year | DOI | Venue |
---|---|---|
2017 | 10.1016/j.is.2016.08.001 | Inf. Syst. |
Keywords | Field | DocType |
Information system, Similarity search, Frequent itemset mining, Metadata, Image classification | Information system, Metadata, Metadata repository, Data mining, Notation, Information retrieval, Computer science, Contextual image classification, JSON, Database, Nearest neighbor search, JavaScript | Journal |
Volume | Issue | ISSN |
64 | C | 0306-4379 |
Citations | PageRank | References |
0 | 0.34 | 20 |
Authors | ||
6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Lidia Sánchez | 1 | 40 | 7.56 |
Javier Alfonso-Cendón | 2 | 14 | 7.12 |
Tiago R. Oliveira | 3 | 36 | 10.45 |
Joaquín Ordieres-Meré | 4 | 102 | 14.39 |
Manuel Castejón Limas | 5 | 0 | 0.34 |
Paulo Novais | 6 | 883 | 171.45 |