Abstract
In this paper, we propose SemTree, a novel semantic index that supports information retrieval from large document collections, under the assumption that the semantics of a document can be effectively expressed by a set of 〈subject, predicate, object〉 statements, as in the RDF model. Triples are mapped into a vector space, and a distributed version of the KD-Tree is then adopted to provide a scalable solution for document indexing. We investigate the feasibility of our approach in a real case study, which considers the problem of finding inconsistencies in software requirements documents, and report some preliminary experimental results.