Annotation

TimeML and the legal domain

While exploring how to perform temporal annotation in the legal domain, we found that most temporal expressions were either not actually relevant or not supported by the temporal annotation standard TimeML. We therefore decided to create two different annotation sets:

  • StandardTimeML - In this Annotation Set we followed the TimeML guidelines, annotating all the Temporal Expressions we found.
  • LegalTimeML - In this Annotation Set we omit some annotations, for instance mentions of years when citing legal articles. Although such mentions are temporal expressions, they do not function as temporal references within the text. More information on the criteria and guidelines followed will be given in a future publication.
We also provide within the corpus the Annotation Sets with the output of ten state-of-the-art taggers, so comparison is straightforward when using GATE. TimeML documents with the two manually generated Annotation Sets are provided as well.
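For readers who prefer to compare annotation sets outside GATE, the following minimal sketch reads a GATE XML document and scores one set against another. It rests on several assumptions: the annotation set names match the labels used on this page, the annotation type is "TIMEX3", and exact-span matching is used only for illustration (it is not necessarily the evaluation procedure of the paper). The sample document is invented for the example.

```python
import xml.etree.ElementTree as ET

# Invented sample in GATE XML layout; real corpus files may differ in detail.
SAMPLE = """<GateDocument>
<TextWithNodes><Node id="0"/>On <Node id="3"/>5 May 2010<Node id="13"/> the Court applied the <Node id="36"/>1998<Node id="40"/> Act.<Node id="45"/></TextWithNodes>
<AnnotationSet Name="LegalTimeML">
<Annotation Id="1" Type="TIMEX3" StartNode="3" EndNode="13"/>
</AnnotationSet>
<AnnotationSet Name="HeidelTime">
<Annotation Id="2" Type="TIMEX3" StartNode="3" EndNode="13"/>
<Annotation Id="3" Type="TIMEX3" StartNode="36" EndNode="40"/>
</AnnotationSet>
</GateDocument>"""


def spans(root, set_name, ann_type="TIMEX3"):
    """Return the (start, end) offsets of one annotation type in a named set."""
    for aset in root.iter("AnnotationSet"):
        if aset.get("Name") == set_name:
            return {
                (int(a.get("StartNode")), int(a.get("EndNode")))
                for a in aset.iter("Annotation")
                if a.get("Type") == ann_type  # assumed type name
            }
    return set()


def strict_scores(gold, pred):
    """Exact-span precision and recall (an illustrative choice of metric)."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


root = ET.fromstring(SAMPLE)
legal = spans(root, "LegalTimeML")
heidel = spans(root, "HeidelTime")
print(strict_scores(legal, heidel))  # (0.5, 1.0)
```

In this invented sample the tagger also marks the year in the cited act, which the LegalTimeML set leaves out, so precision drops while recall stays at 1.0.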

Dataset

Corpus download

Annotated Corpus: The corpus contains 30 court decisions (10 per source) from the European Court of Human Rights (ECHR), the European Court of Justice (ECJ) and the United States Supreme Court (USC), annotated following the TimeML Guidelines (TIMEX3). In the corpus you will find:

  • GATE XML documents: For each court, these contain the two sets of manual annotations and the output of ten state-of-the-art temporal taggers:
    • StandardTimeML - Manual annotations following the TimeML guidelines.
    • LegalTimeML - Annotations adapted to the legal domain following the TimeML schema but with dedicated guidelines.
    • HeidelTime - the output of HeidelTime
    • SUTime - the output of SUTime
    • GUTime - the output of GUTime as part of the TARSQI toolkit
    • CAEVO - the output of CAEVO
    • ClearTK-TimeML - the output of ClearTK-TimeML as part of the ClearTK framework
    • UWTime - the output of UWTime
    • TIPSEM - the output of TIPSEM
    • TERNIP - the output of TERNIP
    • USFD2 - the output of USFD2 (slightly modified to generate TIMEX3 tags).
    • SYNTIME - the output of SYNTIME
  • StandardTimeML - TimeML documents with StandardTimeML set annotations: The .tml documents for each court in the official TimeML XML format.
  • LegalTimeML - TimeML documents with LegalTimeML set annotations: The .tml documents for each court in the official TimeML XML format.
  • PlainTimeML - The original documents in TimeML format but without any tags (intended for test purposes).
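As a rough illustration of how the .tml documents can be consumed, the sketch below extracts every TIMEX3 tag from a TimeML string using only the Python standard library. The sample document is invented, and the sketch assumes the files use un-namespaced TimeML elements with the standard tid, type and value attributes; the actual corpus files may carry additional attributes or elements.

```python
import xml.etree.ElementTree as ET

# Invented TimeML sample; not taken from the corpus.
SAMPLE_TML = """<?xml version="1.0"?>
<TimeML>
<TEXT>The application was lodged on <TIMEX3 tid="t1" type="DATE" value="2010-05-05">5 May 2010</TIMEX3> and heard <TIMEX3 tid="t2" type="DURATION" value="P2Y">two years</TIMEX3> later.</TEXT>
</TimeML>"""


def extract_timex3(tml):
    """Return one dict per TIMEX3 element, in document order."""
    root = ET.fromstring(tml.encode("utf-8"))  # bytes, so the XML declaration is accepted
    return [
        {
            "tid": t.get("tid"),
            "type": t.get("type"),
            "value": t.get("value"),
            "text": "".join(t.itertext()),  # surface form of the expression
        }
        for t in root.iter("TIMEX3")
    ]


timexes = extract_timex3(SAMPLE_TML)
for t in timexes:
    print(t["tid"], t["type"], t["value"], repr(t["text"]))
```

Running the same function over a StandardTimeML file and its LegalTimeML counterpart gives two lists that can be compared directly, for example by their text and value fields.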
 
 

Original Plain Corpus: The corpus contains 30 court decisions (10 per source, as TXT) from the European Court of Human Rights (ECHR), the European Court of Justice (ECJ) and the United States Supreme Court (USC).

Authorship

This work has been done by María Navas-Loro and Víctor Rodríguez-Doncel (Ontology Engineering Group, Universidad Politécnica de Madrid), and Erwin Filtz, Sabrina Kirrane and Axel Polleres (Institute for Information Business, WU Wien). The corpora are freely downloadable under the GNU General Public License v3.0.

Our paper "TempCourt: Evaluation of Temporal Taggers on a new Corpus of Court Decisions" has been accepted for publication in The Knowledge Engineering Review journal. If you plan to publish work using this resource, please cite this webpage in the meantime.