FoLiA
Format for Linguistic Annotation
- Objective -
Creation of a single XML format supporting a rich variety of linguistic annotations in a generalised fashion.
Token Annotation
- Part-of-Speech
- Lemmatisation
- Corrections
- Lexical Semantic Annotation
- Morphological Analysis
Span Annotation
- Syntactic Parses
- Chunking
- Named Entities
- Dependency Relations
- Features -
- Uniform, extensible and expressive
- Not committed to any particular tagset or language
- Can encode annotators and confidence values
- Can encode alternative annotations
- Derived from the D-Coi/SoNaR format
- Unique identifiers (D-Coi compatible)
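Annotation types
INLINE (token annotation)
LAYERED STAND-OFF (span annotation)
- uniform paradigm -
Every annotation instance is associated with a class, and that class is a member of a set. An annotation instance is annotated by an annotator of a certain annotator type, with a certain confidence. All of these are expressed as attributes on the annotation element: set, class, annotator, annotatortype and confidence.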
- global structure -
<FoLiA xml:id="EXAMPLE">
  <metadata>
    <annotations>
      <token-annotation />
    </annotations>
  </metadata>
  <text xml:id="EXAMPLE.text">
    <p xml:id="EXAMPLE.p.1">
      <s xml:id="EXAMPLE.p.1.s.1">
        <w xml:id="EXAMPLE.p.1.s.1.w.1">
          <t>Hello</t>
        </w>
        <w xml:id="EXAMPLE.p.1.s.1.w.2">
          <t>world</t>
        </w>
      </s>
    </p>
  </text>
</FoLiA>
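As a rough illustration of how this hierarchy can be consumed, the following Python sketch walks the global structure example with the standard-library xml.etree.ElementTree module. It assumes the namespace-free markup shown on this poster (published FoLiA documents normally declare an XML namespace, which would then have to be included in the tag names), and the helper name sentences is hypothetical. It also checks that every xml:id in the document is unique.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml:id attribute under this expanded name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

DOC = """<FoLiA xml:id="EXAMPLE">
  <metadata>
    <annotations>
      <token-annotation />
    </annotations>
  </metadata>
  <text xml:id="EXAMPLE.text">
    <p xml:id="EXAMPLE.p.1">
      <s xml:id="EXAMPLE.p.1.s.1">
        <w xml:id="EXAMPLE.p.1.s.1.w.1"><t>Hello</t></w>
        <w xml:id="EXAMPLE.p.1.s.1.w.2"><t>world</t></w>
      </s>
    </p>
  </text>
</FoLiA>"""

root = ET.fromstring(DOC)


def sentences(root: ET.Element):
    """Yield (sentence id, [(word id, word text), ...]) for every <s> element."""
    for s in root.iter("s"):
        words = [(w.get(XML_ID), w.findtext("t")) for w in s.iter("w")]
        yield s.get(XML_ID), words


for sid, words in sentences(root):
    print(sid, " ".join(text for _, text in words))

# Collect every identifier in the document and make sure none is repeated.
ids = [e.get(XML_ID) for e in root.iter() if e.get(XML_ID) is not None]
assert len(ids) == len(set(ids)), "xml:id values must be unique"
```

The dotted identifiers already encode the position of each element, so a consumer can locate a word from its id alone, but the sketch does not rely on that convention.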
<w xml:id="EXAMPLE.p.1.s.1.w.1">
  <t>huizen</t>
  <pos set="CGN" class="N" annotator="Maarten van Gompel" annotatortype="manual" confidence="0.99" />
  <lemma class="huis" />
</w>
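Because of the uniform paradigm, every token annotation can be read in the same way, whatever its type. The Python sketch below reads the pos and lemma annotations of the word above into a record; the Annotation class and the read_annotation helper are hypothetical, and namespace-free markup is assumed as on the rest of this poster.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# The token annotation example from this poster (namespace-free markup).
SNIPPET = """<w xml:id="EXAMPLE.p.1.s.1.w.1">
  <t>huizen</t>
  <pos set="CGN" class="N" annotator="Maarten van Gompel"
       annotatortype="manual" confidence="0.99" />
  <lemma class="huis" />
</w>"""


@dataclass
class Annotation:
    """Hypothetical record for one annotation instance in the uniform paradigm."""
    kind: str                            # element name, e.g. "pos" or "lemma"
    cls: Optional[str]                   # the class, a member of the set
    set_id: Optional[str] = None         # may be left out when a default is declared
    annotator: Optional[str] = None
    annotatortype: Optional[str] = None  # "manual" or "auto" in this poster's examples
    confidence: Optional[float] = None


def read_annotation(elem: ET.Element) -> Annotation:
    """Copy the shared attributes of any annotation element into a record."""
    confidence = elem.get("confidence")
    return Annotation(
        kind=elem.tag,
        cls=elem.get("class"),
        set_id=elem.get("set"),
        annotator=elem.get("annotator"),
        annotatortype=elem.get("annotatortype"),
        confidence=float(confidence) if confidence is not None else None,
    )


word = ET.fromstring(SNIPPET)
# Every child except the text element <t> is treated as an annotation here.
annotations = [read_annotation(child) for child in word if child.tag != "t"]
for annotation in annotations:
    print(annotation)
```

The same read_annotation function handles both elements because they share one attribute vocabulary; only the element name differs.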
Maarten van Gompel, Martin Reynaert & Antal van den Bosch
ILK Research Group, Tilburg Center for Cognition and Communication, Tilburg University
2011-03
- declaration -
- All linguistic annotation types used must be declared
- Declarations may define defaults, such as a default set or annotator
- Any further metadata may be used freely; CMDI is recommended
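One way to enforce the first rule is to check that every annotation element in a document has a matching declaration. The sketch below is a simplification: DECLARATION_FOR is a hypothetical mapping that covers only the element names appearing on this poster, following the type-plus-"-annotation" naming of the poster's own declarations, and namespace-free markup is assumed. A real validator would work from the full FoLiA specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified mapping from annotation elements to the declaration
# element expected under <metadata><annotations>.
DECLARATION_FOR = {
    "w": "token-annotation",
    "pos": "pos-annotation",
    "lemma": "lemma-annotation",
    "su": "syntax-annotation",
}


def undeclared(root: ET.Element) -> set:
    """Return the declarations that the document needs but does not contain."""
    declared = {d.tag for d in root.iterfind("metadata/annotations/*")}
    needed = {DECLARATION_FOR[e.tag] for e in root.iter() if e.tag in DECLARATION_FOR}
    return needed - declared


# The lemma annotation is used below, but it is not declared.
DOC = """<FoLiA xml:id="EXAMPLE">
  <metadata>
    <annotations>
      <token-annotation />
      <pos-annotation set="CGN" />
    </annotations>
  </metadata>
  <text xml:id="EXAMPLE.text">
    <p xml:id="EXAMPLE.p.1">
      <s xml:id="EXAMPLE.p.1.s.1">
        <w xml:id="EXAMPLE.p.1.s.1.w.1">
          <t>huizen</t>
          <pos class="N" />
          <lemma class="huis" />
        </w>
      </s>
    </p>
  </text>
</FoLiA>"""

print(undeclared(ET.fromstring(DOC)))
```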
- annotation layers -
<s>
  <w xml:id="X"><t>he</t></w>
  <w xml:id="Y"><t>talks</t></w>
  <syntax>
    <su class="S">
      <su class="PRON"><wref id="X" /></su>
      <su class="VP">
        <su class="V"><wref id="Y" /></su>
      </su>
    </su>
  </syntax>
</s>
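Span annotations only point at tokens, so a consumer resolves each wref through the xml:id of the w elements. The following Python sketch turns the syntax layer above into a labelled bracketing; the bracket helper is hypothetical and namespace-free markup is assumed.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SENTENCE = """<s>
  <w xml:id="X"><t>he</t></w>
  <w xml:id="Y"><t>talks</t></w>
  <syntax>
    <su class="S">
      <su class="PRON"><wref id="X" /></su>
      <su class="VP">
        <su class="V"><wref id="Y" /></su>
      </su>
    </su>
  </syntax>
</s>"""

s = ET.fromstring(SENTENCE)
# Index the inline word tokens by xml:id so the stand-off references can be resolved.
tokens = {w.get(XML_ID): w.findtext("t") for w in s.iter("w")}


def bracket(su: ET.Element) -> str:
    """Render a syntactic unit and its children as a labelled bracketing."""
    parts = []
    for child in su:
        if child.tag == "su":
            parts.append(bracket(child))           # nested constituent
        elif child.tag == "wref":
            parts.append(tokens[child.get("id")])  # wref uses a plain id attribute
    return "(" + su.get("class") + " " + " ".join(parts) + ")"


for top in s.find("syntax").findall("su"):
    print(bracket(top))
```

Keeping the tree in a separate layer means the tokens are stored once and can be shared by several span annotation layers.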
DECLARATION IN METADATA
References (wref) to word tokens (w) from span annotation elements within annotation layers
<annotations>
  <pos-annotation set="CGN" annotator="Frog" annotatortype="auto" />
</annotations>
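Defaults declared this way can be filled in whenever an annotation element omits them. The sketch below assumes that a missing set, annotator or annotatortype is taken from the declaration of the same type, that explicit attributes on the element take precedence, and that the declaration is named after the element plus "-annotation". effective_attributes is a hypothetical helper, markup is namespace-free, and the second word with its tag is illustrative only.

```python
import xml.etree.ElementTree as ET

DOC = """<FoLiA>
  <metadata>
    <annotations>
      <pos-annotation set="CGN" annotator="Frog" annotatortype="auto" />
    </annotations>
  </metadata>
  <text>
    <w><t>huizen</t><pos class="N" /></w>
    <w><t>bouwen</t><pos class="WW" annotator="Maarten van Gompel" annotatortype="manual" /></w>
  </text>
</FoLiA>"""

# Attributes that this sketch allows a declaration to supply as defaults.
DEFAULTED = ("set", "annotator", "annotatortype")


def effective_attributes(root: ET.Element, elem: ET.Element) -> dict:
    """Return elem's attributes, with declared defaults filling the gaps."""
    decl = root.find(f"metadata/annotations/{elem.tag}-annotation")
    attrs = dict(elem.attrib)
    if decl is not None:
        for key in DEFAULTED:
            if key not in attrs and key in decl.attrib:
                attrs[key] = decl.get(key)
    return attrs


root = ET.fromstring(DOC)
for pos in root.iter("pos"):
    print(effective_attributes(root, pos))
```

The first pos element inherits everything from the declaration, while the second keeps its own manual annotator and only inherits the set.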
FoLiA development is being funded by CLARIN-NL (www.clarin.nl)