Now showing items 1-3 of 3
(Association for Computational Linguistics, 2012)Developing content extraction methods for Humanities domains raises a number of chal- lenges, from the abundance of non-standard entity types to their complexity to the scarcity of data. Close collaboration with Humani- ...
(Association for Computational Linguistics, 2011)Most existing HLT pipelines assume the input is pure text or, at most, HTML and either ignore (logical) document structure or remove it. We argue that identifying the structure of documents is essential in digital library ...