Technologies of Cross-Domain Data Exchange


Data Editing Support

CADDE also develops tools that support data creators and data users, for example by converting acquired data into other formats.


TableLinker

Data exchanged in practice come in a variety of formats, structures, and expressions, so data integration functions are needed to make use of them.

TableLinker is a tool that supports semantic search of datasets, extraction of tabular data, and annotation of tabular data. It was developed from research results of NII [1,2,3] that were evaluated in a worldwide competition.

Extraction of tabular data

This function extracts and analyzes “table” regions contained in electronic documents in PDF or image format and converts them to tabular data. First, layout analysis of each page of the document is performed to extract the table areas contained in the page. Next, the cells contained in each table area are extracted and their types are estimated. If the page is an image, character recognition is performed in addition to cell extraction.
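The cell type estimation step can be illustrated with a minimal sketch. It assumes the cells have already been extracted as strings; the type labels (`number`, `date`, `text`, `empty`), the accepted date formats, and the majority-vote rule are illustrative assumptions, not TableLinker's actual method.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed date formats and number pattern, chosen only for illustration.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d")
NUMBER_RE = re.compile(r"[+-]?\d{1,3}(,\d{3})*(\.\d+)?|[+-]?\d+(\.\d+)?")


def estimate_cell_type(text):
    """Return a hypothetical type label for one extracted cell."""
    value = text.strip()
    if not value:
        return "empty"
    if NUMBER_RE.fullmatch(value):
        return "number"
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return "date"
        except ValueError:
            continue
    return "text"


def estimate_column_types(rows):
    """Label each column with the most frequent type among its non-empty cells."""
    types = []
    for column in zip(*rows):
        counts = Counter(estimate_cell_type(cell) for cell in column)
        counts.pop("empty", None)  # empty cells do not vote
        types.append(counts.most_common(1)[0][0] if counts else "empty")
    return types
```

Voting per column rather than trusting each cell on its own makes the estimate tolerant of occasional recognition errors, which matter most when the page is an image.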

Retrieval of tabular data

Natural language processing techniques are used to assist in the retrieval and processing of datasets. Tabular data can be searched using both semantics and structure.
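As a rough illustration of combining the two signals, the sketch below ranks tables by mixing a score over the dataset description (standing in for semantics) with a score over the column headers (standing in for structure). Plain term-frequency cosine similarity is swapped in here as a simple placeholder for whatever semantic matching TableLinker uses; the record fields `id`, `description`, and `headers` and the `header_weight` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_tables(query, tables, header_weight=0.5):
    """Return ids of matching tables, best first (assumed record layout)."""
    q = Counter(tokenize(query))
    scored = []
    for table in tables:
        desc = Counter(tokenize(table["description"]))
        head = Counter(tokenize(" ".join(table["headers"])))
        score = (1 - header_weight) * cosine(q, desc) + header_weight * cosine(q, head)
        scored.append((score, table["id"]))
    scored.sort(key=lambda item: -item[0])
    return [table_id for score, table_id in scored if score > 0]
```

Replacing `cosine` over token counts with a similarity over sentence embeddings would keep the same ranking structure while matching on meaning rather than on shared words.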

Semantic annotation

This function infers what tabular data refers to. A knowledge base (knowledge graph) is used to determine what kind of data the table is about (class inference), what the relationships are between the items in the table (property relationship), and what the value of each item refers to (entity recognition).
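The three annotation tasks can be sketched against a toy in-memory knowledge graph. Everything here is an assumption made for illustration: the `ex:` identifiers are made up (they are not real knowledge base IDs), entities are matched by exact case-insensitive label, and classes and properties are chosen by majority vote. This is not the MTab algorithm described in [1].

```python
from collections import Counter

# Toy knowledge graph with made-up identifiers (hypothetical, for illustration).
KG = {
    "ex:Tokyo": {"label": "Tokyo", "class": "ex:City", "props": {"ex:country": "ex:Japan"}},
    "ex:Paris": {"label": "Paris", "class": "ex:City", "props": {"ex:country": "ex:France"}},
    "ex:Japan": {"label": "Japan", "class": "ex:Country", "props": {}},
    "ex:France": {"label": "France", "class": "ex:Country", "props": {}},
}
LABEL_INDEX = {entry["label"].lower(): eid for eid, entry in KG.items()}


def link_entity(cell):
    """Entity recognition: map a cell value to an entity id, or None."""
    return LABEL_INDEX.get(cell.strip().lower())


def infer_class(entity_ids):
    """Class inference: most frequent class among the linked entities."""
    counts = Counter(KG[e]["class"] for e in entity_ids if e is not None)
    return counts.most_common(1)[0][0] if counts else None


def infer_property(subjects, objects):
    """Property relationship: property that most often links subject to object."""
    counts = Counter()
    for s, o in zip(subjects, objects):
        if s is None or o is None:
            continue
        for prop, value in KG[s]["props"].items():
            if value == o:
                counts[prop] += 1
    return counts.most_common(1)[0][0] if counts else None


def annotate(rows):
    """Annotate a table, treating the first column as the subject column."""
    columns = [[link_entity(cell) for cell in col] for col in zip(*rows)]
    return {
        "entities": columns,
        "classes": [infer_class(col) for col in columns],
        "properties": [infer_property(columns[0], col) for col in columns[1:]],
    }
```

For a table with the rows `["Tokyo", "Japan"]` and `["Paris", "France"]`, `annotate` labels the first column `ex:City`, the second `ex:Country`, and the relationship between them `ex:country`.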

References

[1] P. Nguyen, I. Yamada, N. Kertkeidkachorn, R. Ichise and H. Takeda: MTab4Wikidata at SemTab 2020: Tabular Data Annotation with Wikidata, in E. Jiménez-Ruiz, O. Hassanzadeh, V. Efthymiou, J. Chen, K. Srinivas and V. Cutrona eds., Proceedings of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab 2020) co-located with the 19th International Semantic Web Conference (ISWC 2020), Virtual conference (originally planned to be in Athens, Greece), November 5, 2020, Vol. 2775 of CEUR Workshop Proceedings, pp. 86-95, CEUR-WS.org (2020).
[2] P. Nguyen, I. Yamada and H. Takeda: MTabES: Entity Search with Keyword Search, Fuzzy Search, and Entity Popularities, in The 35th Annual Conference of the Japanese Society for Artificial Intelligence, No. 1N4-IS-1a-02, The Japanese Society for Artificial Intelligence (2021).
[3] P. Nguyen, K. Shinoda, T. Sakamoto, D. Petrescu, H.-N. Tran, A. Takasu, A. Aizawa and H. Takeda: NII Table Linker at the NTCIR-15 Data Search Task: Re-ranking with Pre-trained Contextualized Embeddings, Data Content, Entity-centric, and Cluster-based Approaches, in Proceedings of the NTCIR-15 Conference on Evaluation of Information Access Technologies (2020).
