Major result: ENTITY DISCOVERY AND ANNOTATION IN TABLES
7 January 2013
G. Quercini and C. Reynaud. International Conference on Extending Database Technology
The Web is rich in tables (e.g., HTML tables, spreadsheets, Google Fusion Tables) that host a considerable wealth of high-quality relational data. Not surprisingly, they have increasingly drawn the attention of researchers, especially in the information retrieval and extraction community: unlike unstructured text, tables usually favour the automatic extraction of data because of their regular structure and properties. Data extraction is usually complemented by the annotation of the table, which recovers its semantics by identifying a type for each column, the relations between columns, if any, and the entities that occur in each cell. In this paper, we focus on the problem of discovering and annotating entities in tables. More specifically, entity annotation refers to the task of assigning a label (e.g., "restaurant", "museum") to a phrase (e.g., "T.G.I. Friday's", "Metropolitan Museum of Art") that denotes an entity.
Compared to existing approaches, we tackle this problem in a pragmatic way, motivated by specific application needs; in particular, we focus on Google Fusion Tables, a rapidly growing collection of tables with rich, high-quality data. The main novelty of our approach is that it does not rely on a pre-compiled reference catalogue of annotated entities, typically extracted from ontologies such as Yago and DBpedia; such a catalogue limits the annotations to only the entities that belong to it. Instead, we train an algorithm to look for information on previously unseen entities on the Web so as to annotate them with the correct type.
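The general idea of annotating a cell from Web evidence rather than from a fixed catalogue can be illustrated with a toy sketch. Nothing below is taken from the paper: the bag-of-words scoring, the training snippets, and the names `fetch_snippets`, `train`, and `annotate` are all hypothetical, and the Web lookup is replaced by canned text so that the example runs offline.

```python
import re
from collections import Counter

# Hypothetical training data: text snippets about entities of known type.
TRAINING_SNIPPETS = {
    "restaurant": [
        "casual dining restaurant chain serving burgers and drinks menu",
        "restaurant with a seasonal menu, reservations and a wine list",
    ],
    "museum": [
        "art museum with collections, exhibitions and galleries",
        "museum exhibitions of paintings and sculpture open to visitors",
    ],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(snippets_by_type):
    """Build one bag-of-words profile (token counts) per type."""
    return {
        label: Counter(tok for s in snippets for tok in tokenize(s))
        for label, snippets in snippets_by_type.items()
    }


def fetch_snippets(phrase):
    """Stand-in for a Web lookup (assumption: a real system would query
    a search service here; these canned results are invented)."""
    canned = {
        "T.G.I. Friday's": [
            "American restaurant chain known for its casual dining menu"
        ],
        "Metropolitan Museum of Art": [
            "The museum houses galleries and exhibitions of art"
        ],
    }
    return canned.get(phrase, [])


def annotate(snippets, profiles):
    """Return the type whose profile best overlaps the snippets,
    or None when there is no evidence at all."""
    tokens = Counter(tok for s in snippets for tok in tokenize(s))
    if not tokens:
        return None

    def score(profile):
        total = sum(profile.values())
        # Weight each shared token by its relative frequency in the profile.
        return sum(n * profile[tok] / total for tok, n in tokens.items())

    return max(profiles, key=lambda label: score(profiles[label]))


profiles = train(TRAINING_SNIPPETS)
for cell in ["T.G.I. Friday's", "Metropolitan Museum of Art"]:
    print(cell, "->", annotate(fetch_snippets(cell), profiles))
```

Because the label is derived from retrieved text, a phrase that appears in no catalogue can still receive a type, provided the lookup returns some evidence; when it returns nothing, this sketch leaves the cell unannotated.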
Research activities
° Databases ° Information integration
Team
° Artificial Intelligence and Inference Systems
Contact
[none]