Data Model

Textass is a tool to help you work with text documents. The question here is what information we will have for each document:

  1. Expected info
  2. Parsing distance

1) Expected info

Text meta info

  • Title
  • Creator
  • Subject
  • Description
  • Publisher
  • Contributor
  • Date
  • Type
  • Format
  • Identifier
  • Source
  • Language
  • Relation
  • Coverage
  • Rights
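
The title plus the fields listed above match the fifteen elements of the Dublin Core element set, so one way to hold them is a flat record with one optional field per element. The following is only a minimal sketch: the class name TextMeta and the choice of plain optional strings are assumptions for illustration, not something Textass defines.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class TextMeta:
    """Hypothetical container for the text meta info of one document."""
    title: Optional[str] = None
    creator: Optional[str] = None
    subject: Optional[str] = None
    description: Optional[str] = None
    publisher: Optional[str] = None
    contributor: Optional[str] = None
    date: Optional[str] = None        # kept as a string; the date format is not specified here
    type: Optional[str] = None
    format: Optional[str] = None
    identifier: Optional[str] = None
    source: Optional[str] = None
    language: Optional[str] = None
    relation: Optional[str] = None
    coverage: Optional[str] = None
    rights: Optional[str] = None


# Fields that are unknown for a document simply stay as None.
meta = TextMeta(title="Example document", language="en")
print(asdict(meta))
```

Keeping every field optional reflects that most documents will only provide part of this information.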

Metrics info

  • Number of characters
  • Number of words
  • Number of paragraphs
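
The three metrics above could be computed roughly as follows. This is a sketch under stated assumptions: characters include whitespace, words are whitespace-separated tokens, and paragraphs are blocks separated by at least one blank line. The function name text_metrics is hypothetical.

```python
import re


def text_metrics(text: str) -> dict:
    """Return character, word and paragraph counts for a plain-text document."""
    # Assumption: a paragraph is a non-empty block delimited by blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return {
        "characters": len(text),       # whitespace is counted
        "words": len(text.split()),    # whitespace-separated tokens
        "paragraphs": len(paragraphs),
    }


sample = "Hello world.\n\nSecond paragraph here."
print(text_metrics(sample))
# {'characters': 36, 'words': 5, 'paragraphs': 2}
```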

Annotation info

  • List of entities present in the text. For each entity: name, event, place, date, language, ...

Users info

  • How users use this document, and its social stats

Internet info

  • Info coming from internet mashups, like Wikipedia, the blogosphere, ... enriching the document context

2) Parsing distance

We are going to try a simple technique to establish relationships between annotations based on distance: same sentence, same paragraph, etc.
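
As an illustration of the idea, the sketch below classifies the distance between two annotations by the smallest unit they share. Everything in it is an assumption made for the example: the Annotation class, the relation_level function, storing each annotation as a character offset, the naive sentence splitting on ".", "!" and "?", and paragraphs delimited by blank lines.

```python
import re
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical annotation: a label and its character offset in the text."""
    label: str
    start: int


def relation_level(text: str, a: Annotation, b: Annotation) -> str:
    """Return the smallest unit of text that contains both annotations."""
    # Offsets where a new paragraph starts (just after a blank-line separator).
    para_starts = [m.end() for m in re.finditer(r"\n\s*\n", text)]
    # Offsets where a new sentence starts (naive: after ., ! or ? plus whitespace).
    sent_starts = [m.end() for m in re.finditer(r"[.!?]+\s+", text)]

    # The index of a unit is the number of unit starts at or before the offset.
    if bisect_right(para_starts, a.start) != bisect_right(para_starts, b.start):
        return "same_document"
    if bisect_right(sent_starts, a.start) != bisect_right(sent_starts, b.start):
        return "same_paragraph"
    return "same_sentence"


text = "Alice met Bob in Paris. Carol stayed home.\n\nDave called later."
alice = Annotation("Alice", text.index("Alice"))
bob = Annotation("Bob", text.index("Bob"))
carol = Annotation("Carol", text.index("Carol"))
dave = Annotation("Dave", text.index("Dave"))

print(relation_level(text, alice, bob))    # same_sentence
print(relation_level(text, alice, carol))  # same_paragraph
print(relation_level(text, alice, dave))   # same_document
```

A closer level suggests a stronger relationship between the two annotations. The levels could later be replaced by numeric distances if finer ranking is needed.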