Data Model
Textass is a tool that helps you work with text documents. The question here is what information we will have for each document:
- Expected info
- Parsing distance
1) Expected info
Text meta info
- Title
- Creator
- Subject
- Description
- Publisher
- Contributor
- Date
- Type
- Format
- Identifier
- Source
- Language
- Relation
- Coverage
- Rights
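The fifteen fields above match the elements of the Dublin Core Metadata Element Set. As a minimal sketch of how such a record could be held in memory, the example below uses a Python dataclass. The class name `TextMeta` and the choice of optional string fields are assumptions made for illustration, not part of Textass itself.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class TextMeta:
    """Hypothetical container for the text meta info listed above.

    Every field is optional because a given document may not provide it.
    """
    title: Optional[str] = None
    creator: Optional[str] = None
    subject: Optional[str] = None
    description: Optional[str] = None
    publisher: Optional[str] = None
    contributor: Optional[str] = None
    date: Optional[str] = None
    type: Optional[str] = None      # only shadows the builtin inside the class body
    format: Optional[str] = None    # same remark as for `type`
    identifier: Optional[str] = None
    source: Optional[str] = None
    language: Optional[str] = None
    relation: Optional[str] = None
    coverage: Optional[str] = None
    rights: Optional[str] = None


# Example record: only the known fields are filled in.
meta = TextMeta(title="Moby-Dick", creator="Herman Melville", language="en")

# asdict() gives a plain dict that is easy to store or serialize.
record = asdict(meta)
```

Keeping every field optional lets a partially described document still produce a valid record; stricter validation could be layered on later if needed.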
Metrics info
- Number of characters
- Number of words
- Number of paragraphs
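A rough sketch of how these three metrics could be computed is shown below. It assumes that words are separated by whitespace and that paragraphs are separated by one or more blank lines; the function name `text_metrics` is hypothetical, and real documents may need a more careful tokenizer.

```python
import re


def text_metrics(text: str) -> dict:
    """Return character, word and paragraph counts for a plain-text document.

    Assumptions: words are whitespace-separated tokens, and paragraphs are
    blocks of text separated by at least one blank line.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return {
        "characters": len(text),       # counts every character, whitespace included
        "words": len(text.split()),    # split() with no argument splits on any whitespace
        "paragraphs": len(paragraphs),
    }


sample = "Hello world.\n\nSecond paragraph here."
metrics = text_metrics(sample)
# metrics["words"] is 5 and metrics["paragraphs"] is 2 for this sample
```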
Annotation info
- List of entities present in the text. For each entity: name, event, place, date, language, ...
Users info
- How users use this document, plus its social stats
Internet info
- Info from internet mashups (Wikipedia, the blogosphere, ...) that enriches the document context
2) Parsing distance
We are going to try a simple technique to establish relationships between annotations based on distance: same sentence, same paragraph, etc.
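One possible way to sketch this idea is to give each annotation a character offset and compare which sentence and paragraph each one falls in. Everything named here is an assumption for illustration: the `Annotation` class, the `distance` function, the three distance labels, and the very naive boundary rules (a sentence ends at `.`, `!` or `?` followed by whitespace; a paragraph ends at a blank line).

```python
import re
from dataclasses import dataclass

SENTENCE_END = r"[.!?]\s+"   # naive sentence boundary (assumption)
PARAGRAPH_END = r"\n\s*\n"   # blank line between paragraphs (assumption)


@dataclass
class Annotation:
    """Hypothetical annotation: an entity found at a character offset."""
    name: str
    kind: str    # e.g. "name", "place", "date"
    start: int   # character offset of the entity in the text


def unit_index(text: str, boundary: str, offset: int) -> int:
    """Count the boundaries that end at or before `offset`.

    The count is the index of the sentence or paragraph containing `offset`.
    """
    return sum(1 for m in re.finditer(boundary, text) if m.end() <= offset)


def distance(text: str, a: Annotation, b: Annotation) -> str:
    """Classify how close two annotations are to each other."""
    # Check paragraphs first: a paragraph break is not always a sentence break.
    if unit_index(text, PARAGRAPH_END, a.start) != unit_index(text, PARAGRAPH_END, b.start):
        return "same document"
    if unit_index(text, SENTENCE_END, a.start) == unit_index(text, SENTENCE_END, b.start):
        return "same sentence"
    return "same paragraph"


text = "Ada met Alan in London. They spoke in 1950.\n\nGrace visited Paris."
ada = Annotation("Ada", "name", text.index("Ada"))
london = Annotation("London", "place", text.index("London"))
year = Annotation("1950", "date", text.index("1950"))
paris = Annotation("Paris", "place", text.index("Paris"))

print(distance(text, ada, london))  # same sentence
print(distance(text, ada, year))    # same paragraph
print(distance(text, ada, paris))   # same document
```

The labels could later be replaced by numeric weights, so that closer annotations produce stronger relationships.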