Abstract
In this work, we address the problem of extracting high-quality entity knowledge from natural language text,
an important task for the automatic construction of knowledge graphs from unstructured content.
More specifically, we investigate the benefit of performing a joint posterior revision, driven by
ontological background knowledge, of the annotations resulting from natural language processing
(NLP) entity analyses such as named entity recognition and classification (NERC) and entity linking (EL).
The revision is performed via a probabilistic model, called jpark, that, given the candidate annotations
independently identified by NERC and EL tools on the same textual entity mention, reconsiders the
best annotation choice made by the tools in light of the coherence of the candidate annotations
with the ontological knowledge. The model can be explicitly instructed to handle the information that
an entity can potentially be NIL (i.e., lacking a corresponding referent in the target linking knowledge
base), exploiting it for predicting the best NERC and EL annotation combination.
We present a comprehensive evaluation of jpark along various dimensions, comparing its performance
with and without exploiting NIL information, and when using three different background
knowledge resources (YAGO, DBpedia, and Wikidata) to build the model. The evaluation, conducted
using different tools (the popular Stanford NER and DBpedia Spotlight, as well as the more recent
Flair NER and End-to-End Neural EL) with three reference datasets (AIDA, MEANTIME, and TAC-KBP),
empirically confirms the capability of the model to improve the quality of the annotations of the given
tools, and thus their performance on the tasks they are designed for.
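
To make the idea of joint revision concrete, the following is a minimal toy sketch, not the actual jpark model. It assumes each tool returns confidence scores for its candidates, and it scores every (NERC type, EL entity) pair by multiplying those confidences with a coherence term. All names (`revise`, `coherence`, `entity_classes`, `class_type_score`), the multiplicative scoring rule, the neutral treatment of NIL, and all numeric values are illustrative assumptions that do not come from the paper.

```python
from itertools import product

NIL = "NIL"  # placeholder for "no referent in the target knowledge base"


def coherence(nerc_type, entity, entity_classes, class_type_score):
    """Return how well a NERC type fits an entity's ontological classes.

    Assumption: NIL carries no ontological classes, so it is treated as
    neutral (1.0) rather than penalized.
    """
    if entity == NIL:
        return 1.0
    classes = entity_classes.get(entity, ())
    # Best compatibility over all classes of the entity; 0.0 if none is known.
    return max(
        (class_type_score.get((c, nerc_type), 0.0) for c in classes),
        default=0.0,
    )


def revise(nerc_scores, el_scores, entity_classes, class_type_score):
    """Pick the (NERC type, EL entity) pair with the highest joint score."""
    def joint(pair):
        nerc_type, entity = pair
        return (
            nerc_scores[nerc_type]
            * el_scores[entity]
            * coherence(nerc_type, entity, entity_classes, class_type_score)
        )
    return max(product(nerc_scores, el_scores), key=joint)


# Hypothetical candidates for the mention "Paris" (made-up confidences).
nerc_scores = {"PER": 0.55, "LOC": 0.45}
el_scores = {"Paris_Hilton": 0.4, "Paris": 0.6}

# Hypothetical background knowledge: entity classes and class/type compatibility.
entity_classes = {"Paris": {"City"}, "Paris_Hilton": {"Person"}}
class_type_score = {
    ("City", "LOC"): 0.9,
    ("Person", "PER"): 0.9,
    ("City", "PER"): 0.01,
    ("Person", "LOC"): 0.01,
}

# Taken independently, the tools would pick PER and Paris, an incoherent pair.
revised = revise(nerc_scores, el_scores, entity_classes, class_type_score)
print(revised)  # ('LOC', 'Paris')
```

In this toy setting the independently best annotations (PER from NERC, Paris from EL) contradict each other, and the coherence term moves the choice to the consistent pair.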