Stylebook for the Tübingen Treebank of Written German (TüBa-D/Z)

Heike Telljohann, Erhard W. Hinrichs, Sandra Kübler, Heike Zinsmeister, Kathrin Beck

Universität Tübingen
Seminar für Sprachwissenschaft
Wilhelmstr. 19
D-72074 Tübingen
{telljohann,eh,kuebler,zinsmeis,kbeck}@sfs.uni-tuebingen.de

November 2009

Abstract

This stylebook is an updated version of Telljohann et al. (2006). It describes the design principles and the annotation scheme for the German treebank TüBa-D/Z developed by the Division of Computational Linguistics (Lehrstuhl Prof. Hinrichs) at the Department of Linguistics (Seminar für Sprachwissenschaft – SfS) of the Eberhard Karls Universität Tübingen, Germany. The guidelines focus on the syntactic annotation of written language data taken from the German newspaper 'die tageszeitung' (taz). The unannotated taz newspaper material was taken from the Science CD (Wissenschafts-CD) of 'die tageszeitung', which can be licensed from contrapress media GmbH (http://shop.taz.de/index.php?cat=c18_taz-Archiv.html).

At present, the treebank comprises 45,200 sentences. The newspaper material is taken from the following taz editions:

1992: July 10, 11, 13, 14
1995: October 14, 16, 17
1999: April 30, May 3–7

The average sentence length is 17.6 words and the total number of tokens currently amounts to 794,079. The TüBa-D/Z treebank is still under development. Thus, the number of annotated sentences will increase over time. Periodic data updates and accompanying updates of this stylebook will be made available at:

http://www.sfs.uni-tuebingen.de/en/de_tuebadz.shtml

Please consult this website in order to ensure that you are using the most recent and most complete version of the treebank.

The annotation scheme for the TüBa-D/Z treebank is derived from the Verbmobil treebank for spoken German, developed earlier (1997–2000) by the Division of Computational Linguistics of the SfS (Hinrichs et al. 2000). The TüBa-D/Z annotation scheme has been extended along various dimensions to accommodate the characteristics of written texts. In order to ensure the reusability of the data, a surface-oriented annotation scheme has been adopted that is inspired by the notion of topological fields and is enriched by a level of predicate-argument structure. The linguistic inventory used in the treebank annotation is based on a minimal set of assumptions that are uncontroversial among major syntactic theories. In this sense it is an attempt at theory-neutrality.

Acknowledgements

The annotation of the TüBa-D/Z treebank is carried out as part of the Competence Center for Text- and Information Technology (officially: Kompetenzzentrum für Text- und Informationstechnologie – KIT). KIT is a joint project of the Institute for Natural Language Processing (IMS) in Stuttgart and the 'Seminar für Sprachwissenschaft' (SfS) in Tübingen. Funding has been provided since 2000 by the Ministry of Science, Research and the Arts Baden-Württemberg. Additional support for the creation of the TüBa-D/Z treebank was provided by the special research center Linguistic Data Structures (Sonderforschungsbereich Linguistische Datenstrukturen – SFB 441) at the University of Tübingen, funded by the German Research Council (Deutsche Forschungsgemeinschaft – DFG).

A project of this scale would not be possible without the generous support of many contributors. Our special thanks go to 'die tageszeitung' (taz), who kindly granted permission to process the newspaper data and to release the treebank.

We would like to acknowledge Rosmary Stegmann for her many contributions to the treebank of spoken German in Verbmobil. Her research laid the foundations for the annotation scheme of that treebank, which has been summarized in the 'Stylebook for the German Treebank in Verbmobil' (Stegmann et al. 2000).

We would like to thank Manfred Sailer and Frank Richter for their helpful comments and for their support in the form of encouragement and critical discussions, from which we benefited strongly in the challenging task of developing a data-oriented syntactic annotation scheme for spoken as well as written German. Furthermore, we are indebted to Tylman Ule for his assistance with part-of-speech tagging of the data and with data conversion. We would also like to acknowledge the support of Martina Liepert and Jorn Veenstra, who initiated and developed the integration of named entities into the annotation scheme.

Moreover, we would like to thank Julia Trushkina (see Trushkina 2004) and Yannick Versley, who provided the tools for morphological preprocessing. The quality of the treebank has been considerably improved by feature-oriented consistency checks developed by Ventsislav Zhechev. Further consistency tests were contributed by Tylman Ule and Frank H. Müller in the course of their research work in the SFB 441. They deserve special mention for their support.

We would like to thank Vera Möller and Karin Naumann (see Naumann and Möller 2007) for annotating anaphora and coreference relations and also for doing an excellent job in documenting the concepts. Yannick Versley and Holger Wunsch supported the project in various respects. In the course of their PhD projects in the SFB 441 they enhanced the conceptual aspects of the anaphora resolution as annotated in the treebank. They also wrote mapping and conversion tools for integrating the anaphora annotation into the export XML format.

For their diligence and dedication to the arduous task of linguistic annotation and of post-editing we thank our research assistants Janne Berlacher, Anne Brock, Armin Buch, Silke Dutz, Katrin Eichler, Emilia Ellsiepen, Steffen Froemel, Holger Gauza, Simone Hartung, Daniel Hüttl, Heike Johannsen, Miriam Käshammer, Laura Kassner, Sarah Klug, Janina Kopp, Christian Kreß, Rebecca Kreß, Michael Kossack, Anne Lohse, Wolfgang Maier, Nicole Maruschka, Kai Metzger, Vera Möller, Simone Müller, Maja Pietsch, Andreas Rudin, Maria Schmidt, Marie Schreier, Insa Starr, Melanie Störzer, and Dominikus Wetzel. They also improved the linguistic quality of the annotation through dedicated discussions of problematic and interesting examples.

The development of the TüBa-D/Z treebank was notably facilitated by a number of former Verbmobil partners whose contributions went well beyond the call of duty. Hans Uszkoreit and his colleagues at the 'Universität des Saarlandes' kindly provided us with the graphical annotation tool Annotate (Plaehn 1998), which was developed as part of the research project Nebenläufige grammatische Verarbeitung (NEGRA) (Teilprojekt C3; principal investigators: Uszkoreit/Smolka) in the Sonderforschungsbereich 378. The Annotate tool provides human annotators with a graphical, user-friendly interface for annotating and editing trees and also offers database support for maintaining large treebanks. We would like to express our special gratitude to Thorsten Brants, who has kindly and generously provided us with software support and user assistance for the Annotate tool from the very beginning of the Tübingen treebank project.

Contents

List of Tables

1 Introduction

2 Major Challenges and Design Decisions

3 The Theoretical Basis of the Annotation Scheme
  3.1 Topological Fields
    3.1.1 The Concept of Topological Fields
  3.2 Constituent Analysis and Topological Fields
  3.3 General Annotation Principles
    3.3.1 Flat Clustering Principle
    3.3.2 Longest Match Principle
    3.3.3 High Attachment Principle
  3.4 The Structure of an Annotated Tree
    3.4.1 The Levels of Annotation
    3.4.2 The Inventory of Labels
    3.4.3 What Is a Syntactic Unit?
    3.4.4 Printing and Spelling Errors
    3.4.5 Isolated Phrases
    3.4.6 Long-Distance Dependencies
    3.4.7 Empty Categories

4 The Annotation of the Internal Structure of Phrases
  4.1 Premodification and Postmodification in Phrases
  4.2 Noun Phrases
    4.2.1 Prenominal Modification
    4.2.2 Postnominal Modification
    4.2.3 Appositional Constructions
    4.2.4 Foreign Language Material
    4.2.5 Proper Nouns and Named Entities
    4.2.6 Ordinal Numbers
    4.2.7 Cardinal Numbers
    4.2.8 Letters and Non-Words
    4.2.9 Expletive and Other Uses of es
  4.3 Determiner Phrases
  4.4 Prepositional Phrases
    4.4.1 Prepositions
    4.4.2 Circumpositions and Postpositions
  4.5 Adjectival Phrases
  4.6 Adverbial Phrases
  4.7 Verb Phrases
    4.7.1 Head of a Sentence and Verb Complex
    4.7.2 Verb Complexes in Verb-second and Verb-final Clauses
    4.7.3 Ersatzinfinitiv Constructions
    4.7.4 Infinitives with zu
    4.7.5 Coherency and Incoherency of Verbal Constructions
    4.7.6 AcI Constructions
    4.7.7 Imperatives
    4.7.8 Particle Verbs
    4.7.9 Verbs with Predicate
    4.7.10 Modal Verbs

5 Attachment Principles for Phrases
  5.1 Attachment to Fields
  5.2 Attachment of Ambiguous Complements
  5.3 Modifier Attachment
    5.3.1 Modifier Attachment in the Initial Field
    5.3.2 Attachment across Punctuation Marks
    5.3.3 Ambiguous Modifiers in Isolated Phrases

6 The Annotation of Sentences
  6.1 Sentence Initial Fields
    6.1.1 The C-Field in Verb-Final Clauses
    6.1.2 The C-Field in Verb-Second Clauses
    6.1.3 The KOORD-Field in all Clause Types
    6.1.4 The PARORD-Field in Verb-Second Clauses
    6.1.5 Resumptive Constructions: The LV-Field
  6.2 Questions
    6.2.1 W-Questions
    6.2.2 Yes - No Questions
  6.3 Relative Clauses
    6.3.1 Event-modifying Relative Clauses
    6.3.2 Independent Relative Clauses
  6.4 Coordination
    6.4.1 Coordination of Phrases
    6.4.2 Asymmetric Coordination
    6.4.3 Coordinations with Complex Conjunctions
    6.4.4 Coordinations with Truncated Words
    6.4.5 Attachment Principles of Coordination within Phrases
    6.4.6 Coordination of Topological Fields
    6.4.7 Attachment of Ambiguous Modifiers in Coordination
    6.4.8 Coordination of Sentences
    6.4.9 Paratactic Constructions with denn and weil
    6.4.10 Conjunctions Occurring with Isolated Phrases
    6.4.11 Split Coordinations
  6.5 Elliptical Constructions

7 The Annotation of Specific Syntactic Phenomena
  7.1 Superlative and Comparative Forms
    7.1.1 Superlative Forms
    7.1.2 The Comparative Particles wie and als
  7.2 Verbal and Adjectival Use of Participles
  7.3 Topicalization
  7.4 Headlines
  7.5 Discourse Markers
  7.6 Parentheses
  7.7 Elliptical weil and wenn auch Constructions

8 Criteria for the Distinction of Grammatical Functions
  8.1 Subcategorization of Verbs
    8.1.1 Distinction of FOPP, OPP, and V-MOD
    8.1.2 Distinction of MOD, MOD-MOD, and V-MOD
    8.1.3 Distinction of ON, PRED, ON-MOD, and PRED-MOD

References

Appendix: The TüBa-D/Z Data Formats

Index

List of Tables

3.1 Three clause types according to Höhle (1986)
3.2 Topological fields
3.3 Levels of annotation
3.4 Morphological feature combinations for lexical tokens
3.5 Values of morphological features
3.6 Node labels
3.7 Edge labels
3.8 Labels for proper nouns and named entities
4.1 Types of es

Chapter 1 Introduction

The purpose of this report is to describe the design principles and annotation scheme for the TüBa-D/Z treebank of German. It is intended as a guide for the treebank annotators in Tübingen and for theoretical and computational linguists who want to use annotated treebank data for their own research. In addition, we hope that this report may be of some use to researchers who want to construct their own treebank for German or for some other language. We would like to emphasize that the annotation scheme is language-specific, and we advise against adopting this scheme without modification for some other language. However, we do believe that the type of design decisions reported here for German will arise for other languages as well. It is in this sense that the current report could provide a useful point of reference.

The TüBa-D/Z treebank was developed by the Division of Computational Linguistics (Lehrstuhl Prof. Hinrichs) at the Department of Linguistics (Seminar für Sprachwissenschaft – SfS) of the Eberhard Karls Universität Tübingen, Germany. The guidelines focus on the syntactic annotation of written language data taken from the German newspaper 'die tageszeitung' (taz). The unannotated taz newspaper material was taken from the Science CD (Wissenschafts-CD) of 'die tageszeitung', which can be licensed from contrapress media GmbH (http://shop.taz.de/index.php?cat=c18_taz-Archiv.html).

At present, the treebank comprises 45,200 sentences. The newspaper material is taken from the following taz editions:

1992: July 10, 11, 13, 14
1995: October 14, 16, 17
1999: April 30, May 3–7

The average sentence length is 17.6 words and the total number of tokens currently amounts to 794,079. The TüBa-D/Z treebank is still under development. Thus, the number of annotated sentences will increase over time. Periodic data updates and accompanying updates of this stylebook will be made available at:

http://www.sfs.uni-tuebingen.de/en/de_tuebadz.shtml

Please consult this website in order to ensure that you are using the most recent and most complete version of the treebank.

The annotation scheme for the TüBa-D/Z treebank is derived from the Verbmobil treebank for spoken German, developed earlier (1997–2000) by the Division of Computational Linguistics of the SfS (Hinrichs et al. 2000). The annotation scheme for the Verbmobil treebank has been summarized in the 'Stylebook for the German Treebank in Verbmobil' (Stegmann et al. 2000). The TüBa-D/Z annotation scheme has been extended along various dimensions to accommodate the characteristics of written texts.

In order to ensure the reusability of the data, the linguistic inventory used in the treebank annotation is based on a minimal set of assumptions that are uncontroversial among major syntactic theories. In this sense it is an attempt at theory-neutrality.

The TüBa-D/Z treebank is released in three different data formats: the Negra export format, the Penn treebank format, and an XML format. More information about each data format is given in the appendix, The TüBa-D/Z Data Formats.

To the best of our knowledge, the Verbmobil treebank for spoken German is still the only treebank based on German speech data. It is released as the TüBa-D/S treebank (http://www.sfs.uni-tuebingen.de/en/de_tuebads.shtml).

For written texts, TüBa-D/Z is not the only treebank available for German. Two other (semi-)manually annotated treebanks are currently available, each with its own annotation scheme: the Negra treebank (http://www.coli.uni-saarland.de/projects/sfb378/negra-corpus/) and the TIGER treebank (http://www.ims.uni-stuttgart.de/projekte/TIGER/).

The Tübingen Partially Parsed Corpus of Written German (TüPP-D/Z; http://www.sfs.uni-tuebingen.de/en/de_tuepp.shtml) is a project closely related to the TüBa-D/Z treebank. It consists of 200 million word tokens from the Science CD (Wissenschafts-CD) of 'die tageszeitung' (taz), including the sentences which are annotated in the TüBa-D/Z treebank. The texts were automatically annotated with clause structure, topological fields, and chunks, in addition to lower-level annotation including parts of speech and morphological ambiguity classes. The first release of TüBa-D/Z (12/2003) functioned as a training corpus.
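As a quick plausibility check on the corpus figures quoted in this chapter, the following Python sketch recomputes the average sentence length from the stated totals. The names are ours, and the sketch assumes that "words" in the stated average refers to tokens, which the text does not say explicitly.

```python
# Corpus figures as stated in this chapter.
SENTENCES = 45_200
TOKENS = 794_079


def average_sentence_length(tokens: int, sentences: int) -> float:
    """Return the mean number of tokens per sentence."""
    return tokens / sentences


avg = average_sentence_length(TOKENS, SENTENCES)
print(f"{avg:.1f} tokens per sentence")
```

Under that assumption the quotient rounds to the 17.6 given above, so the three figures are mutually consistent.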

Chapter 2 Major Challenges and Design Decisions

Most syntactic theories consider individual sentences as the primary domain of linguistic theorizing and of syntactic annotation. For written language, the segmentation into sentences is largely unproblematic and coincides with the domain of syntactic analysis. However, newspaper texts exhibit a number of phenomena that do not lend themselves easily to a purely sentence-based annotation. These phenomena include headlines, titles, parentheses, discourse markers, and sentence conjunction by a colon. These cases are described in more detail in sections 3.4.3 to 3.4.5 of this stylebook.

The second main question that needed to be addressed at the outset of the project was the inventory of syntactic categories and grammatical functions to be used for syntactic annotation and for the specification of predicate-argument structure. Here our choices were guided by two main considerations:

1. Linguistic adequacy and theory-neutrality: For the purposes of reusability of the treebank data, the annotation scheme should not reflect a commitment to a particular syntactic theory. Rather, the inventory of categories should reflect common assumptions that syntacticians share across different frameworks concerning questions of constituenthood, phrase attachment, and grammatical functions. In this sense, the annotations should be theory-neutral and minimal. This desideratum is of utmost importance so as to ensure the reusability of the annotated data. At the same time, the annotation scheme should reflect as much as possible those empirical generalizations that syntacticians, especially from a descriptive perspective, have identified as characteristic of the language in question.

2. Balancing the needs of potential users: Since the construction of a treebank is a labor-intensive and costly enterprise, ideally the TüBa-D/Z treebank should appeal to as many potential users as possible. Moreover, the treebank should be of interest to researchers from a wide range of fields. Considering the renewed interest in the use of corpora for both theoretical and computational linguistics, choice points in the annotation scheme should be resolved in such a way that the needs of potential users are balanced as much as possible.

To support the use of the TüBa-D/Z treebank in computational linguistics, the annotation scheme should be sensitive to processing considerations, as long as the linguistic adequacy of the choice of annotations is not compromised. Ceteris paribus, processing considerations favor annotation schemes that pay close attention to properties of syntactic surface structure, particularly to word order regularities and distributional properties of words and phrases. At the same time, the use of empty categories and of data structures with crossing dependencies among phrases is to be avoided if the annotations are to be used for parsers that rely on the context-freeness of the underlying grammar.

In order to satisfy the above aims, the annotation scheme is surface-oriented and context-free. The theoretical assumptions underlying the levels of annotation and the choice of labels themselves are based as much as possible on a rich tradition of theoretical and empirical research on German syntax. For the treatment of word order regularities of German, which is a language with relatively free word order, an inventory of topological fields is incorporated into the annotation scheme. Topological fields in the sense of Herling (1821), Erdmann (1886), Drach (1937), and Höhle (1986) are widely used in descriptive studies of German syntax. Such fields constitute an intermediate layer of analysis above the level of individual phrases and below the clause level. The concept of topological fields favors tree-based annotations, i.e. bracketings that do not rely on crossing or discontinuous dependencies. Instead, such nonlinear dependencies are to be expressed at the level of predicate-argument structure, which constitutes a second level of annotation with its own descriptive inventory of grammatical functions.

The framework of topological fields is widely used in empirical and theoretical accounts of German syntax. Thus, it is well documented in the linguistics literature. This greatly facilitates thorough training of human annotators, since they can rely on the pre-existing body of literature. One purpose of this stylebook is to add to these reference materials.

Currently, a total of 25 syntactic node labels are used for the encoding of constituent structures. These include labels for topological fields as well as labels for phrases and their constituent parts. In order to capture grammatical functions of individual phrases and syntactic dependencies between phrases, constituent structure trees are enriched by a set of edge labels between constituent structure nodes. The current inventory of edge labels comprises 41 distinct categories. In addition to these primary edge labels, four secondary edge labels are used. These labels indicate phrase-internal government of elements in the verb complex, express phrase-internal modification of noun phrases, resolve long-distance dependencies among modifiers, or relate the phrasal complements of so-called third-construction control verbs.

For certain computational applications, robust identification of named entities, e.g. person names, names of companies and institutions, and names of geographical locations, is a major concern. Therefore, such named entities are identified by a special node label, and their internal structure is sometimes identified by an additional secondary edge label that is used exclusively for named entities.

At the word level, part-of-speech labels are assigned according to the Stuttgart-Tübingen tagset, which is widely accepted for part-of-speech tagging of German and which provides an inventory of 54 distinct part-of-speech labels. In addition, information on inflectional morphology is given.

Detailed information about the complete inventory of node labels, edge labels, part-of-speech labels, and inflectional feature clusters is given in section 3.4.2 of this stylebook.

The remainder of this stylebook is organized as follows: chapter 3 offers an overview of the theoretical foundations of the annotation scheme, focusing on the concept of topological fields (3.1) and its relation to constituent structure (3.2), on general annotation principles (3.3), and on the annotation levels and the inventory of annotation labels for each level (3.4). Chapter 4 concerns the annotation of the internal structure of phrases, broken down into major word classes and their phrasal projections. Chapter 5 addresses the principles for relating individual phrases to each other, particularly for modifier and complement attachment. Chapter 6 discusses the annotation of entire sentences, focusing on the relationship between sentence types and topological fields, coordination (including phrasal conjunction), and elliptical constructions. Chapter 7 is devoted to the annotation of miscellaneous syntactic constructions such as comparatives, verbal and adjectival participles, topicalization, newspaper headlines, discourse markers, and parentheses, each of which poses special challenges for the annotation task. Chapter 8 describes the criteria used for distinguishing different grammatical functions. The stylebook concludes with a bibliography, a subject index, and an appendix which describes the three different data formats in which the TüBa-D/Z treebank is distributed.

We do not consider the annotation level of anaphora and coreference relations in this stylebook. Please consult Naumann and Möller (2007) for a detailed description of these phenomena.
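The design described in this chapter, a context-free tree whose nodes carry node labels and whose edges carry grammatical-function labels, can be pictured with a small Python sketch. Everything in it is illustrative: the class and method names are ours, the labels SIMPX, NX, VXFIN, NE, VVFIN, HD and the placeholder edge "-" are used as assumptions (section 3.4.2 gives the authoritative inventory), the clause "Peter lacht" is a made-up example rather than treebank data, and the printed bracketing is not one of the three release formats.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    label: str                   # node label, or part-of-speech tag on a leaf
    edge: str = "-"              # edge label linking this node to its parent
    word: Optional[str] = None   # surface token; set only on leaves
    children: List["Node"] = field(default_factory=list)

    def words(self) -> List[str]:
        """Return the tokens dominated by this node in surface order."""
        if self.word is not None:
            return [self.word]
        return [w for child in self.children for w in child.words()]

    def bracket(self) -> str:
        """Render the subtree as a labelled bracketing (ad hoc notation)."""
        if self.word is not None:
            body = self.word
        else:
            body = " ".join(child.bracket() for child in self.children)
        return f"({self.label}:{self.edge} {body})"

    def functions(self) -> Iterator[Tuple[str, str]]:
        """Yield (edge label, text) for every non-leaf node with a function."""
        if self.word is None and self.edge != "-":
            yield self.edge, " ".join(self.words())
        for child in self.children:
            yield from child.functions()


# Hypothetical two-word clause: clause node > field nodes > phrase nodes > tokens.
tree = Node("SIMPX", children=[
    Node("VF", children=[
        Node("NX", edge="ON", children=[Node("NE", edge="HD", word="Peter")]),
    ]),
    Node("LK", children=[
        Node("VXFIN", edge="HD",
             children=[Node("VVFIN", edge="HD", word="lacht")]),
    ]),
])

print(tree.bracket())
print(list(tree.functions()))
```

Because every node has exactly one parent and children are kept in surface order, no crossing branches can arise in this structure; relations that do not fit the tree would have to be stored separately, for instance as secondary edges.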

Chapter 3 The Theoretical Basis of the Annotation Scheme

3.1 Topological Fields

The annotation scheme for the TüBa-D/Z treebank has been developed with special regard to the characteristics of the German language: the interaction of configurational and non-configurational syntactic properties, which arise from the partially free word order. On the one hand, there exist three different clause types with respect to the fixed position of the finite verb (verb-second (V-2), verb-initial (V-1), and verb-final (V-end)). On the other hand, there is a high degree of variability in the placement of complements and adjuncts. In order to treat the relatively high degree of word order freedom in German, the treebank adopts the notion of topological fields as the primary clustering principle of a sentence.

The basic characteristics of the model of topological sequences within a German sentence were originally formulated by Herling (1821) and Erdmann (1886). Herling (1821) developed an adequate topological theory for complex sentences in which clauses form a topological unit carrying a syntactic function, and he mentioned the special position of the finite verb in verb-second and verb-final clauses. Erdmann (1886) established the basics of a theory of topological fields and pointed out that the first position in a clause is not necessarily the subject position. The so-called Herling/Erdmann scheme already covers a set of word order regularities which apply to all three clause types of German. Later, Drach (1937) introduced the notion of field. Finally, Höhle (1986) developed topological schemes for the three clause types.

3.1.1 The Concept of Topological Fields

In a German clause, the finite verb can appear in three different positions: verb-second, verb-initial, and verb-final. Only in verb-final clauses does the verb complex, consisting of the finite verb and non-finite verbal elements, form a unit. The discontinuous positioning of the verbal elements in verb-first and verb-second clauses is the traditional reason for structuring German clauses into fields. The positions of the verbal elements form the Satzklammer (sentence bracket), which divides the sentence into a Vorfeld (initial field), a Mittelfeld (middle field), and a Nachfeld (final field). The Vorfeld and the Mittelfeld are separated by the linke Satzklammer (left sentence bracket), which is the finite verb; the rechte Satzklammer (right sentence bracket) is the verb complex between the Mittelfeld and the Nachfeld.

Thus, the theory of topological fields states the fundamental regularities of German word order. It is an important basis for the topological analysis of any German sentence, since subclauses and embedded clauses are treated within the bounds of fields. Identical word order regularities within a specific field can be realized in all three clause types. But the fields themselves differ in their possible elements and grammatical rules. Therefore, the theory is a descriptive rather than an explanatory theory for a specific language.

Höhle (1986) denotes the three clause types as E-Sätze (verb-final clauses), F1-Sätze (verb-initial clauses), and F2-Sätze (verb-second clauses). The topological schemes of these types are listed in Table 3.1.

Table 3.1: Three clause types according to Höhle (1986)

E-Sätze    (KOORD) - (C) - X - VK - Y
F1-Sätze   (KOORD) - (KL) - FINIT - X - VK - Y
F2-Sätze   (KOORD or PARORD) - (KL) - K - FINIT - X - VK - Y

Abbreviations and explanations used in Table 3.1:

VK: verb complex
FINIT: element denoting categories of finiteness
KOORD: coordinating particles (e.g. und, oder)
PARORD: non-coordinating particles (e.g. denn, weil)
X, Y: sequence of any number of constituents
C: complementizer
K: one constituent
KL: nominativus pendens, resumptive construction (Linksversetzung)

These schemes topologically analyse not only atomic sentences but also complex sentence constructions which contain embedded clauses. Such embedded clauses can occur in a Linksversetzung (resumptive construction), Vorfeld, Mittelfeld, or Nachfeld. Herling's theory of the coordination and embedding of sentences covers these phenomena in detail (Herling 1821). Following Höhle (1986), we assume the existence of the topological fields listed in Table 3.2.
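To make the three clause schemes of Table 3.1 concrete, the sketch below encodes each scheme as a regular expression over slot labels and reports which schemes a given slot sequence satisfies. This is only an illustration under our own assumptions: the function name and the space-separated encoding are ours, parenthesised slots are treated as optional, X and Y as zero or more constituents, and every other slot (including VK) as occurring exactly once, which is a literal reading of the table rather than a claim about which slots may remain empty in actual clauses.

```python
import re
from typing import List, Sequence

# One regular expression per clause type; every slot label is followed by
# a single space so that labels such as "K" and "KL" cannot be confused.
SCHEMES = {
    "E":  r"(KOORD )?(C )?(X )*VK (Y )*",
    "F1": r"(KOORD )?(KL )?FINIT (X )*VK (Y )*",
    "F2": r"((KOORD|PARORD) )?(KL )?K FINIT (X )*VK (Y )*",
}


def matching_schemes(slots: Sequence[str]) -> List[str]:
    """Return the clause types whose scheme matches the slot sequence."""
    encoded = "".join(f"{label} " for label in slots)
    return [name for name, pattern in SCHEMES.items()
            if re.fullmatch(pattern, encoded)]


# A complementizer-initial sequence fits only the verb-final scheme.
print(matching_schemes(["C", "X", "X", "VK"]))
# One constituent before the finite element fits only the verb-second scheme.
print(matching_schemes(["K", "FINIT", "X", "VK"]))
# PARORD is licensed only in verb-second clauses, so nothing matches here.
print(matching_schemes(["PARORD", "FINIT", "VK"]))
```

Encoding the schemes as patterns keeps the sketch close to the table layout; a real annotation check would operate on the field labels of Table 3.2 rather than on these abstract slots.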

The following description of the topological fields does not claim completeness regarding all descriptive details but rather mentions their main characteristics.[1]

VF: The Vorfeld consists of only one constituent. Usually it is the subject.[2] But because of the high degree of non-configurationality in German, the subject can also occur in the Mittelfeld, thus allowing almost every other constituent to occupy the Vorfeld.

[1] In the following, the abbreviations for the fields listed in Table 3.2 are used.
[2] In the fifth release, 52.5% of all Vorfeld fields host the subject.


Table 3.2: Topological fields

Field     Description
VF        Vorfeld (initial field)
LK        Linke (Satz-)Klammer (left sentence bracket)
MF        Mittelfeld (middle field)
VC        Verbkomplex (verb complex)
NF        Nachfeld (final field)
LV        Linksversetzungsfeld (field for resumptive constructions)
C         C-Feld (field for complementizers, to the left of MF)
KOORD     Koordinationsfeld (field for coordinating particles);
          left-most element, optional in all clause types (e.g. und, oder)
PARORD    Koordinationsfeld (field for non-coordinating particles);
          left-most element, optional only in verb-second clauses (e.g. denn, weil)

LK: The Linke Klammer is the position of the finite verb in verb-second and verb-first clauses or a conjunction in verb-final clauses. It consists of exactly one element.

MF: Apart from those units which are optionally located in other fields, any non-verbal constituent may occur in the Mittelfeld. It consists of a sequence of any number of constituents. The linear order of the constituents depends on the specific word order principles for German and their interaction.

VC: The Verbkomplex is a sequence of verb forms. In verb-second and verb-first clauses it consists of one or more non-finite elements or, depending on the verb, of a separable prefix. In verb-final clauses it also contains the finite verb. The rule for the linear order in general is: right determines left. If there is a finite verb in the verb complex, it is usually the right-most element (exception: Ersatzinfinitiv constructions, e.g. daß er sich ein neues Konzept wird überlegen müssen; cf. 4.7.3).

NF: For some clause types (e.g. so daß-Sätze), the Nachfeld is the obligatory position. Embedded complement clauses, relative clauses, and single constituents can optionally occur in the Nachfeld. In contrast to the Vorfeld it may be occupied by any number of constituents.

LV: The Linksversetzungsfeld is a field for the left-dislocated phrase of resumptive constructions. A Linksversetzung is a pendent constituent. It can be regarded as a syntactic anticipation of a part of a sentence (cf. 6.1.5). There are many restrictions which apply to this position.

C: The C-Feld only occurs in verb-final clauses (exception: the conjunction als in subordinated sentences of comparison, e.g. als wäre es nie geschehen). It is obligatorily occupied in finite verb-final clauses if there is no conjunction in the Linke Klammer. In non-finite verb-final clauses the C-position may be empty. This field can be occupied by conjunctions of sentential objects (e.g. daß, ob) or sentence-initial conjunctions like um, obwohl, wenn and also by complex interrogative or relative phrases, e.g. ..., ’um wieviel Geld’ geht es dabei? / ..., ’an der’ Max Daniel Professor für Klavier ist (cf. 6.1.1).

KOORD: The KOORD-field is the field for coordinating particles. In contrast to the PARORD-field, it can optionally occur as the left-most element of all clause types (cf. 6.1.3).

PARORD: The PARORD-field is the field for non-coordinating particles which optionally occur as the left-most element of a verb-second clause (cf. 6.1.4).

Concerning the distribution of constituents to topological fields see also the chapter Deskriptive Generalisierungen in Grewendorf (1991). The combination of these fields in order to constitute verb-first, verb-second, or verb-final clauses is described in Höhle (1986).

The topological model, which is the basis of most traditional German grammars, only provides descriptive parameters concerning the sentence structure without making any statement about the regularities within the fields and the hierarchical constituent structure of the sentence. For more complicated phenomena, it offers only a catalog of detailed descriptions.
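The schemes of Table 3.1 can be read as ordered sequences of optional and obligatory slots. The following Python sketch illustrates this reading over the field labels of Table 3.2. It is not part of the treebank tooling: the mapping from Höhle's symbols to field labels (KL to LV, K to VF, FINIT to LK, X to MF, VK to VC, Y to NF), the choice of obligatory slots, and the names SCHEMES, fits, and clause_types are assumptions made for illustration. The sketch also ignores the case of a conjunction in LK in verb-final clauses.

```python
# Field-label versions of the three schemes in Table 3.1.
# Each entry: (ordered slots, fields treated as obligatory in this sketch).
# Both the symbol-to-field mapping and the obligatory sets are assumptions.
SCHEMES = {
    "verb-final": (["KOORD", "C", "MF", "VC", "NF"], {"VC"}),
    "verb-first": (["KOORD", "LV", "LK", "MF", "VC", "NF"], {"LK"}),
    "verb-second": (
        ["KOORD|PARORD", "LV", "VF", "LK", "MF", "VC", "NF"],
        {"VF", "LK"},
    ),
}


def fits(fields, slots, required):
    """True if `fields` follows the slot order and contains all required fields."""
    position = 0
    for field in fields:
        # Skip slots that stay empty until one accepts this field label.
        while position < len(slots) and field not in slots[position].split("|"):
            position += 1
        if position == len(slots):
            return False  # the field does not occur at this point of the scheme
        position += 1  # every slot can be filled only once
    return required <= set(fields)


def clause_types(fields):
    """Names of all schemes that a sequence of field labels is compatible with."""
    return [
        name
        for name, (slots, required) in SCHEMES.items()
        if fits(fields, slots, required)
    ]


print(clause_types(["VF", "LK", "MF", "VC"]))
print(clause_types(["C", "MF", "VC"]))
```

Under these assumptions a sequence such as VF, LK, MF, VC is compatible only with the verb-second scheme, while C, MF, VC is compatible only with the verb-final scheme.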

3.2 Constituent Analysis and Topological Fields

The main weakness of the concept of topological fields is the above-mentioned fact that the hierarchical constituent structure of a sentence cannot be described. The aim is to find a form of representation which combines the topological model with a constituent analysis in order to describe the hierarchy of the linguistic units within the fields. In our annotation scheme, the integration of a constituent analysis was achieved by a second level of annotation strictly within the bounds of topological fields: a predicate-argument structure with its own descriptive inventory of syntactic categories and grammatical functions. The constituent structure is represented by phrase structure trees (phrase markers) whose node and edge labels carry this information. In order to analyse syntactic constructions, it is necessary to define the number and types of constituents within the fields.

1. Number of constituents within the fields: In general, C, LK, KOORD, PARORD, and VF contain only one constituent. More than one constituent is allowed within MF and NF.

2. Types of constituents within the fields: Phrasal constituents occur in VF, MF, NF and C (interrogative or relative phrases). Embedded clauses either belong to NF, VF, LV, or in some cases to MF. Usually, outside the spoken language context, verb-final clauses do not occur isolated. They need to be attached if possible.
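The restriction on the number of constituents per field can be turned into a simple consistency check. The sketch below is only an illustration: the representation of a clause as a dictionary from field labels to lists of constituent labels and the name overfull_fields are hypothetical. Since the rule holds only "in general", the result is a list of candidates for manual review rather than a list of errors.

```python
# Fields that, in general, contain only one constituent (see item 1 above).
SINGLE_CONSTITUENT_FIELDS = {"C", "LK", "KOORD", "PARORD", "VF"}


def overfull_fields(clause):
    """Return the single-constituent fields that hold more than one constituent.

    `clause` maps a field label to the list of its constituents
    (a hypothetical representation chosen for this sketch).
    """
    return sorted(
        field
        for field, constituents in clause.items()
        if field in SINGLE_CONSTITUENT_FIELDS and len(constituents) > 1
    )


clause = {"VF": ["PX"], "LK": ["VXFIN"], "MF": ["NX", "ADVX"], "VC": ["VXINF"]}
print(overfull_fields(clause))
```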

3.3 General Annotation Principles

Our annotation scheme tries to find a trade-off between pragmatic requirements on the one hand and linguistic reality on the other hand. The following three common annotation principles are adopted to group the constituents within a syntactic tree: the flat clustering principle, the longest match principle, and the high attachment principle.

3.3.1 Flat Clustering Principle

The flat clustering principle keeps the number of hierarchy levels in a syntactic structure as small as possible. As a consequence, any degree of branching is allowed. Constituents which cannot be assigned a grammatical function within a syntactic construction are structured as much as possible, but are not typically connected to surrounding constituents as a whole.

3.3.2 Longest Match Principle

The longest match principle demands that as many daughter nodes as possible are combined into a single mother node, provided that the resulting construction is syntactically as well as semantically well-formed.

3.3.3 High Attachment Principle

The high attachment principle prescribes that syntactically and semantically ambiguous modifiers are attached to the highest possible level in a tree structure. Premodifiers and postmodifiers are treated in a different way. First, both kinds of modifiers are projected to their phrase level. Since the modification scope of premodifiers is unambiguous, they are directly attached to the head of the phrase which they are modifying. By contrast, postmodifiers are always attached on a higher level to preserve ambiguity. This decision was taken to avoid the problematic distinction whether a postmodifier is a free adjunct or a complement of the modified phrase.

3.4 The Structure of an Annotated Tree

3.4.1 The Levels of Annotation

A syntactic tree consists of nodes and edges. Nodes represent constituents on different levels of annotation. Edges always link daughter nodes to a mother node. The root node of a tree is assumed to be the sentence node of a construction. One level below the sentence node, the nodes of the topological fields are located. This is the reason why topological fields can be regarded as the top-level ordering principle for sentences in the treebank. The sequence of the fields in the three clause types never violates the topological schemes given by Höhle (1986). Within each sentence structure, in general at least two topological fields are occupied (exception: infinitive constructions, cf. 4.7.4). Others may be left empty (elliptical constructions, cf. 6.5). Table 3.3 lists the four levels of annotation which we distinguish within the structure of an annotated syntactic tree.[3]

[3] We do not consider the suprasentential annotation level of anaphora and coreference relations in this stylebook. Please consult Naumann and Möller (2007) for a detailed description of these phenomena.


Table 3.3: Levels of annotation

Level            Inventory
clause level     root node labels for different types of clauses
field level      node labels for topological fields (including labels for conjuncts of fields)
phrase level     node labels for syntactic categories and edge labels for grammatical functions
lexical level    lexical entries tagged with the part-of-speech (PoS) tags taken from the STTS
                 tagset (Schiller et al. 1995) and with morphological features (Trushkina 2004)

Node labels denote the syntactic category of a phrase or sentence, a topological field, or a grammatical property. Edge labels denote the grammatical function of lexical entries, phrases, topological fields, and clauses.

3.4.2 The Inventory of Labels

The part-of-speech tags used for the annotation are taken from the Stuttgart-Tübingen tagset[4] (STTS) (Schiller et al. 1995). The STTS is a guideline for the annotation of German text corpora on the lexical level. Every single part-of-speech of a text is assigned one specific tag. The tagset consists of the tags listed in Table 3.4.2 (cf. Schiller et al. 1995). The tagging of the data was performed by the TnT tagger (Brants 1998) and manually corrected with the Annotate tool (Plaehn 1998).

The morphological tags give information about inflectional morphology and include features such as case, number, person, etc. A specific combination of feature-value pairs is defined for each relevant part-of-speech category; see Table 3.4 for the list of part-of-speech categories that are annotated with morphological features and the corresponding feature combinations. The values are represented in a cluster by single-character abbreviations; see Table 3.5 for the set of features and their values. Features can uniquely be identified by their position in the cluster.

Node labels indicate the syntactic category of a phrase or sentence, but they are also used to label topological fields and sequences of topological fields within coordinations or to indicate specific grammatical properties of constituents. Table 3.6 lists all node labels which are used in the treebank. (An additional node is introduced for named entities, see Table 3.8.)

Edge labels indicate the grammatical function of lexical entries, phrases, topological fields, and clauses. Since case information is given and a distinction of different modifiers is made by these labels, the syntactic tree structures also contain semantic roles. The specific set of edge labels for the German treebank is listed in Table 3.7, including secondary edge labels. The latter are used to resolve ambiguities on a different level of description.

[4] PAV was changed into a new tag called PROP (pronominal form of a prepositional phrase) in order to justify PX as the syntactic category of its mother.
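Because the features of a morphological cluster are identified by position, decoding a cluster only requires the feature order for the part-of-speech tag (Table 3.4) and the value abbreviations (Table 3.5). The following Python sketch illustrates this for a handful of tags; the names FEATURES, VALUES, and decode are hypothetical, and the feature table is deliberately incomplete.

```python
# Feature order per PoS tag, following Table 3.4 (only a few tags shown).
FEATURES = {
    "APPR": ("case",),
    "ART": ("case", "number", "gender"),
    "NN": ("case", "number", "gender"),
    "PPER": ("case", "number", "gender", "person"),
    "VAFIN": ("person", "number", "mood", "tense"),
}

# Value abbreviations, following Table 3.5.
VALUES = {
    "case": {"n": "nominative", "g": "genitive", "d": "dative",
             "a": "accusative", "*": "underspecified"},
    "gender": {"m": "masculine", "f": "feminine", "n": "neuter",
               "*": "underspecified"},
    "number": {"s": "singular", "p": "plural", "*": "underspecified"},
    "mood": {"i": "indicative", "k": "subjunctive"},
    "person": {"1": "first", "2": "second", "3": "third",
               "*": "underspecified"},
    "tense": {"s": "present", "t": "past"},
}


def decode(pos, cluster):
    """Map a morphological cluster to feature names by position."""
    names = FEATURES[pos]
    if len(cluster) != len(names):
        raise ValueError(f"{pos} expects {len(names)} features, got {cluster!r}")
    return {name: VALUES[name][char] for name, char in zip(names, cluster)}


print(decode("ART", "dsf"))
print(decode("VAFIN", "3sit"))
```

The position matters because abbreviations are reused across features: n stands for nominative in the case slot but for neuter in the gender slot, and s stands for singular in the number slot but for present in the tense slot.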


Two specific edge labels denote whether a constituent has the function of a head (HD), e.g. a phrase (NX, PX, ADJX, ADVX, VXFIN, VXINF), or a non-head (-), e.g. a determiner or a modifier attached to a phrase. On any annotation level, there is at most one head. Within phrases, these two labels indicate the internal dependency structure of the phrase. The head of a sentence structure (e.g. SIMPX) is always the finite verb. In coordinations, each conjunct depends on the head of the whole construction and is denoted with a specific edge label (KONJ) in order to distinguish conjuncts from conjunctions and modifying elements within a coordination (see 6.4.1 and 6.4.3). Edge labels below all root node labels carry only non-head labels (cf. Kübler and Telljohann 2002).

In order to mark proper nouns and named entities within the treebank, the node label (EN-ADD) and the secondary edge label (EN) are defined (see Table 3.8). EN-ADD is inserted between two nodes to indicate that the node below represents a complex proper noun (e.g. Ute Wedemeier, The Jim Wane Swingtett), a single proper noun tagged as NN with respect to the STTS (e.g. Sögestraße), or a named entity (e.g. Auf die stürmische Art, Built to Spill) (cf. 4.2.5). EN-ADD is either directly attached to a head noun of a phrase or to a field. If it has a postmodifier, its mother node is NX, which represents the nominal status of EN-ADD. The internal syntactic structure of EN-ADD is governed by the general annotation rules. The secondary edge label EN gives information about the relation of two parts of a proper noun within a complex phrase consisting, for instance, of an article and/or an attributive adjective which do not belong to the proper noun itself, e.g. der [zweite Weltkrieg EN] (cf. 4.2.5).
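The constraint that there is at most one head on any annotation level can be checked mechanically once a tree is held in memory. The following Python sketch assumes a minimal node structure; the class Node and the function nodes_with_several_heads are hypothetical and do not correspond to any treebank export format. The example phrase die Auseinandersetzung is taken from section 4.2.1.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str                 # node label (e.g. NX) or PoS tag of a leaf
    edge: str = "-"            # edge label to the mother node: "HD", "-", "KONJ", ...
    word: str = ""             # surface form, only set for leaves
    children: List["Node"] = field(default_factory=list)


def nodes_with_several_heads(node):
    """Collect all nodes that have more than one daughter labelled HD."""
    found = []
    heads = [child for child in node.children if child.edge == "HD"]
    if len(heads) > 1:
        found.append(node)
    for child in node.children:
        found.extend(nodes_with_several_heads(child))
    return found


# NX with an unlabelled determiner and a head noun.
nx = Node("NX", children=[
    Node("ART", edge="-", word="die"),
    Node("NN", edge="HD", word="Auseinandersetzung"),
])
print([n.label for n in nodes_with_several_heads(nx)])
```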


Table 3.3.1: The STTS tagset

POS       Description                                          Examples
ADJA      attributive adjective                                [das] große [Haus]
ADJD      adverbial or predicative adjective                   [er fährt] schnell, [er ist] schnell
ADV       adverb                                               schon, bald, doch
APPR      preposition; left circumposition                     in [der Stadt], ohne [mich]
APPRART   preposition + article                                im [Haus], zur [Sache]
APPO      postposition                                         [ihm] zufolge, [der Sache] wegen
APZR      right circumposition                                 [von jetzt] an
ART       definite or indefinite article                       der, die, das, ein, eine
CARD      cardinal number                                      zwei [Männer], [im Jahre] 1994
FM        foreign language material                            [Er hat das mit “] A big fish [” übersetzt]
ITJ       interjection                                         mhm, ach, tja
KOUI      subordinating conjunction with zu + infinitive       um [zu leben], anstatt [zu fragen]
KOUS      subordinating conjunction with clause                weil, daß, damit, wenn, ob
KON       coordinative conjunction                             und, oder, aber
KOKOM     particle of comparison, no clause                    als, wie
NN        noun                                                 Tisch, Herr, [das] Reisen
NE        proper noun                                          Hans, Hamburg, HSV
PDS       substituting demonstrative pronoun                   dieser, jener
PDAT      attributive demonstrative pronoun                    jener [Mensch]
PIS       substituting indefinite pronoun                      keiner, viele, man, niemand
PIAT      attributive indefinite pronoun without determiner    kein [Mensch], irgendein [Glas]
PIDAT     attributive indefinite pronoun with determiner       [ein] wenig [Wasser], [die] beiden [Brüder]
PPER      irreflexive personal pronoun                         ich, er, ihm, mich, dir
PPOSS     substituting possessive pronoun                      meins, deiner
PPOSAT    attributive possessive pronoun                       mein [Buch], deine [Mutter]
PRELS     substituting relative pronoun                        [der Hund,] der
PRELAT    attributive relative pronoun                         [der Mann,] dessen [Hund]
PRF       reflexive personal pronoun                           sich, einander, dich, mir
PWS       substituting interrogative pronoun                   wer, was
PWAT      attributive interrogative pronoun                    welche [Farbe], wessen [Hut]
PWAV      adverbial interrogative or relative pronoun          warum, wo, wann, worüber, wobei
PROP      pronominal adverb                                    dafür, dabei, deswegen, trotzdem
PTKZU     zu + infinitive                                      zu [gehen]
PTKNEG    negation particle                                    nicht
PTKVZ     separated verb particle                              [er kommt] an, [er fährt] rad
PTKANT    answer particle                                      ja, nein, danke, bitte
PTKA      particle with adjective or adverb                    am [schönsten], zu [schnell]
TRUNC     truncated word - first part                          An- [und Abreise]
VVFIN     finite main verb                                     [du] gehst, [wir] kommen [an]
VVIMP     imperative, main verb                                komm [!]
VVINF     infinitive, main                                     gehen, ankommen
VVIZU     infinitive + zu, main                                anzukommen, loszulassen
VVPP      past participle, main                                gegangen, angekommen
VAFIN     finite verb, aux                                     [du] bist, [wir] werden
VAIMP     imperative, aux                                      sei [ruhig !]
VAINF     infinitive, aux                                      werden, sein
VAPP      past participle, aux                                 gewesen
VMFIN     finite verb, modal                                   dürfen
VMINF     infinitive, modal                                    wollen
VMPP      past participle, modal                               [er hat] gekonnt
XY        non-word containing special characters               D2XW3, letters
$,        comma                                                ,
$.        sentence-final punctuation                           . ? ! ; :
$(        other sentence internal punctuation                  - [ ] ( )

Table 3.4: Morphological feature combinations for lexical tokens

POS            Feature combination          Comments
ADJA           case number gender           underspecified for gender if plural noun is underspecified,
                                            e.g. die/np* nordhessischen/np* Grünen/np*;
                                            invariant local description, e.g. Berliner/***;
                                            cardinal numbers as abbreviation: full morphology,
                                            e.g. im 4./dsn Jahrhundert/dsn
APPR           case                         without case if a preposition takes another PP as
                                            complement, e.g. bis/ zu/d einer/dsf Woche/dsf
APPRART        case number gender
APPO           case
ART            case number gender
NE             case number gender
NN             case number gender           underspecified for gender, e.g. Abgeordnete (in plural), Leute
PDAT           case number gender
PDS            case number gender
PIAT           case number gender           plural is underspecified for gender, e.g. lauter/***,
                                            see also ’PIS or PIAT’ below
PIDAT          case number gender           solch/*** (cf. manch, welch, all),
                                            see also ’PIDAT or PIS’ below
PIS            case number gender           underspecified: man/ns*, nichts/*** (cf. nix, sowas)
PIS or PIAT                                 allerhand/*** (cf. allerlei, allzuviel, dergleichen, derlei,
                                            etwas, genausoviel, genug, genügend, keinerlei, mehr,
                                            reichlich, soviel, viel, wenig, weniger, zuviel, zuwenig)
PIDAT or PIS                                sowas/*** (cf. paar, bißchen)
PPER           case number gender person
PPOSAT         case number gender
PPOSS          case number gender
PRELAT         case number gender
PRELS          case number gender           plural is underspecified for gender
PRF            case number gender person    sich: underspecified for gender
PWAT           case number gender           plural is underspecified for gender; wessen/***
PWS            case number gender           underspecified for gender: plural forms and wer, wem, wen
VAFIN          person number mood tense
VAIMP          person number
VMFIN          person number mood tense
VVFIN          person number mood tense
VVIMP          number                       German has only second person imperative forms

Table 3.5: Values of morphological features

Feature    Values
case       n (nominative), g (genitive), d (dative), a (accusative), * (underspecified)
gender     m (masculine), f (feminine), n (neuter), * (underspecified)
number     s (singular), p (plural), * (underspecified)
mood       i (indicative), k (subjunctive; German ’Konjunktiv’)
person     1 (first), 2 (second), 3 (third), * (underspecified)
tense      s (present), t (past)

3.4.3 What Is a Syntactic Unit?

The newspaper articles of the taz have been defined as the primary segmentation domain of the data. They are preprocessed into syntactic units delimited by punctuation marks (. ? ! ; - ... /) for which specific rules demand or forbid segmentation. Each syntactic unit is assigned a specific code which identifies its origin in the newspaper data, e.g. T990507.123 (T (taz) 99 (year) 05 (month) 07 (day) 123 (article)). A syntactic unit usually consists of one complete sentence structure with a root node (SIMPX, R-SIMPX, P-SIMPX). But it may also consist of one or more sentences and/or phrases, e.g. headlines, titles, sentences with parentheses, sentences with discourse markers, or sentences conjoined by a colon. An annotated tree is a complete syntactically and semantically well-formed construction according to the longest match principle. The model of topological fields does not prescribe that all fields have to be occupied. The fact that fields can be left empty also helps us to cope with elliptical constructions (cf. 6.5). Punctuation is not annotated, i.e., punctuation marks are not attached to the tree structure. Exceptions are punctuation marks which carry a semantic meaning within a sentence, e.g. - (bis, und) in expressions like 15.30 - 17.30 Uhr. They are tagged according to the part of speech that they represent in the text (cf. 4.4.1). Constituents are not attached to a tree if they are not assigned a grammatical function within the specific syntactic construction. The following tree diagram shows two

Table 3.6: Node labels

Node label   Description

Phrase Node Labels
ADJX         adjectival phrase
ADVX         adverbial phrase
DP           determiner phrase (e.g. gar keine)
FX           foreign language phrase
NX           noun phrase
PX           prepositional phrase
VXFIN        finite verb phrase
VXINF        non-finite verb phrase

Topological Field Node Labels
LV           resumptive construction (Linksversetzung)
C            complementizer field (C-Feld)
FKOORD       coordination consisting of conjuncts of fields
KOORD        field for coordinating particles
LK           left sentence bracket (Linke (Satz-)Klammer)
MF           middle field (Mittelfeld)
MFE          middle field between VCE and VC
NF           final field (Nachfeld)
PARORD       field for non-coordinating particles
VC           verb complex (Verbkomplex)
VCE          verb complex with the split finite verb of Ersatzinfinitiv constructions
VF           initial field (Vorfeld)
FKONJ        conjunct consisting of more than one field

Root Node Labels
DM           discourse marker
P-SIMPX      paratactic construction of simplex clauses
R-SIMPX      relative clause
SIMPX        simplex clause

Table 3.7: Edge labels

Edge label   Description

Edge Labels denoting Heads and Conjuncts
HD           head
-            non-head
KONJ         conjunct

Complement Edge Labels
ON           nominative object (i.e. subject; also clausal subjects)
OD           dative object
OA           accusative object
OG           genitive object
OS           sentential object
OPP          prepositional object
OADVP        adverbial object
OADJP        adjectival object
PRED         predicate
OV           verbal object
FOPP         facultative (i.e. optional) prepositional object, passivized subject (von-phrase)
VPT          separable verb prefix
APP          apposition

Modifier Edge Labels
MOD          ambiguous modifier
ON-MOD, OA-MOD, OD-MOD, OG-MOD, OPP-MOD, OS-MOD, PRED-MOD, FOPP-MOD,
OADVP-MOD, OADJP-MOD, V-MOD, MOD-MOD
             modifiers modifying complements or modifiers,
             e.g. V-MOD = modifier of the verb

Edge Labels in Split Coordinations
ONK, ODK, OAK, PREDK, OPPK, FOPPK, OADVPK, OSK, OA-MODK, MODK, V-MODK
             second conjunct (K) in split coordinations,
             e.g. ONK = second conjunct of a nominative object

Edge Label denoting Structural Expletive
ES           Vorfeld-es

Secondary Edge Labels (dependency relation between:)
REFVC        two verbal objects in VC
REFMOD       two ambiguous modifiers
REFINT       a phrase internal part and its modifier
REFCONTR     control verb and its complement across clause boundaries

Table 3.8: Labels for proper nouns and named entities

Label      Description

Phrase Node Label
EN-ADD     proper noun or named entity (additional label)

Secondary Edge Label
EN         phrase internal relation between two parts of a proper noun

annotated trees in one syntactic unit:[5]

[Tree diagram]
Tokens (word/PoS/morphology): An/APPR/d der/ART/dsf Oder/NE/dsf wurde/VAFIN/3sit er/PPER/nsm3 dann/ADV/-- verwundet/VVPP/-- ,/$,/-- ein/ART/nsm Wadendurchschuß/NN/nsm ./$./--
Structure: [SIMPX [VF [PX:V-MOD An [NX:HD der Oder:HD]]] [LK [VXFIN:HD wurde:HD]] [MF [NX:ON er:HD] [ADVX:MOD dann:HD]] [VC [VXINF:OV verwundet:HD]]] and, unattached, [NX ein Wadendurchschuß:HD]

The leaves of the trees consist of pairs of terminal symbols and part-of-speech tags. Non-terminal symbols are represented by spherical nodes, whereas edge labels are depicted by rectangular nodes. The tree diagram consists of two trees, a SIMPX and an isolated phrase. In accordance with the four annotation levels shown in Table 3.3, the sentence is annotated top-down by the root node (SIMPX), the field nodes (VF, LK, MF, and VC), the phrase nodes (PX, VXFIN, NX, ADVX, and VXINF), and finally the tagged lexical entries. The edge labels between the field level and the phrase level indicate that the syntactic structure contains one unambiguous modifier (V-MOD), a subject (ON), one ambiguous modifier (MOD), a verbal object (OV), and the finite verb, which itself is the head (HD) of the entire syntactic construction. The noun phrase (ein Wadendurchschuß) is not attached to the sentence structure because otherwise the well-formedness of the construction would be violated. Thus, it has to be annotated as an isolated phrase lacking a verbal constituent.
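The code assigned to each syntactic unit in this section (e.g. T990507.123) can be split into its components with a regular expression. The sketch below assumes that the layout is always the letter T followed by two digits each for year, month, and day, a full stop, and an article number of arbitrary length; only the single example in the text supports this, so the pattern and the name parse_unit_code are assumptions.

```python
import re

# Assumed layout: T + YY + MM + DD + "." + article number.
UNIT_CODE = re.compile(r"^T(\d{2})(\d{2})(\d{2})\.(\d+)$")


def parse_unit_code(code):
    """Split a unit code such as 'T990507.123' into its components."""
    match = UNIT_CODE.match(code)
    if match is None:
        raise ValueError(f"not a recognised unit code: {code!r}")
    year, month, day, article = (int(group) for group in match.groups())
    # The two-digit year is kept as given; the century is not encoded.
    return {"source": "taz", "year": year, "month": month,
            "day": day, "article": article}


print(parse_unit_code("T990507.123"))
```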

3.4.4 Printing and Spelling Errors

In contrast to spoken language data such as the Verbmobil data (cf. Stegmann et al. 2000), which exhibit fragmentary utterances, false starts, repetitions, interruptions, and hesitation noises as characteristic properties, data taken from newspaper corpora do not include unintentionally formed syntactic constructions. Deviations from syntactic well-formedness are either intended by the author or are caused by printing errors. While incorrect writing of words is neglected in the syntactic analysis (the respective lexical entry is marked with the correct writing of the word in a comment line below), lexical elements which do not belong to the syntactic construction (intentional or unintentional) are structured as much as possible, but are not attached to the surrounding constituents, as the following examples show.

[5] These tree diagrams and all following tree diagrams in this report were generated with the aid of the Negra Annotate tool.

[Tree diagram]
Tokens (word/PoS/morphology): Jetz/ADV/-- wollen/VMFIN/3pis Sie/PPER/np*3 wieder/ADV/-- ein/ART/asn solches/PIDAT/asn System/NN/asn aufbauen/VVINF/-- ./$./--
Comment line: Jetzt (correct writing of Jetz)
Structure: [SIMPX [VF [ADVX:MOD Jetz:HD]] [LK [VXFIN:HD wollen:HD]] [MF [NX:ON Sie:HD] [ADVX:MOD wieder:HD] [NX:OA ein [ADJX solches:HD] System:HD]] [VC [VXINF:OV aufbauen:HD]]]

[Tree diagram, only partly recoverable; two clause nodes (SIMPX 517, SIMPX 518)]
Tokens (word/PoS/morphology): Am/APPRART/dsm Abend/NN/dsm erklärten/VVFIN/3pit ,/$,/-- sie/PPER/np*3 seien/VAFIN/3pks dabei/PROP/-- geschlagen/VVPP/-- worden/VAPP/-- (punctuation token lost)/$(/-- von/APPR/d der/ART/dsf Polizei/NN/dsf

3.4.5 Isolated Phrases

There are textual fragments in newspaper data which cannot be analysed as a SIMPX or as a constituent of a SIMPX because they are lacking a verbal constituent or they are not assigned a specific grammatical function within a well-formed sentence. These fragments are annotated as isolated phrases. The isolated elements are structured as much as possible (mostly up to the level of phrasal categories), but they are not typically connected to surrounding constituents as a whole, so that a conflict with the topological field analysis is avoided. Their root node carries the phrasal category of their lexical head (NX, PX, ADVX, etc.):

[Tree diagram; root node ADVX]
Tokens (word/PoS/morphology): Warum/PWAV/-- auch/ADV/-- nicht/PTKNEG/-- ?/$./--

[Tree diagram; root node PX]
Tokens (word/PoS/morphology): Hoffentlich/ADV/-- ohne/APPR/a Nebenwirkungen/NN/apf ./$./--
Structure: [PX [ADVX Hoffentlich:HD] [PX:HD ohne [NX:HD Nebenwirkungen:HD]]]

In accordance with the longest match principle, as many parts of the fragment as possible are projected to the phrase level and are included into a tree structure. It has to be decided which part of the whole construction is the head and which parts depend on this head. Phrases within a syntactic unit are not attached on a higher level if they do not show a dependency relation. This is often the case with syntactic elements which are separated by a colon or a dash (cf. 5.3.2):

[Tree diagram; two unattached trees, a SIMPX (ASB lädt ein) and an NX (Tag der offenen Tür)]
Tokens (word/PoS/morphology): ASB/NN/nsm lädt/VVFIN/3sis ein/PTKVZ/-- :/$./-- Tag/NN/nsm der/ART/gsf offenen/ADJA/gsf Tür/NN/gsf

[Tree diagram; root node NX with EN-ADD nodes and KONJ edges]
Tokens (word/PoS/morphology): Arlington/NE/nsn Road/NE/nsf USA/NE/npm 1999/CARD/-- ,/$,/-- R/NN/nsf :/$./-- Mark/NE/nsm Pellington/NE/nsm ,/$,/-- D/NN/npm :/$./-- Jeff/NE/nsm Bridges/NE/nsm ,/$,/-- Tim/NE/nsm Robbins/NE/nsm

[Tree diagram; isolated NX nodes followed by a SIMPX]
Tokens (word/PoS/morphology): Berlin/NE/nsn (/$(/-- taz/NE/nsf )/$(/-- (punctuation token lost)/$(/-- So/ADV/-- also/ADV/-- wird/VAFIN/3sis man/PIS/ns* zum/APPRART/dsm Problemfall/NN/dsm ./$./--

3.4.6 Long-Distance Dependencies

Our annotation scheme facilitates a surface-oriented representation of long-distance dependencies without crossing branches and traces. If a modifying constituent is not adjacent to the modified constituent, their dependency relation, which can even go beyond the border of topological fields, is encoded by special naming conventions for edge labels. We use edge labels such as OA-MOD (referring to OA) or PRED-MOD (referring to PRED) etc. expressing the non-ambiguity of the modifier. Beyond this, we make use of secondary edge labels for ambiguity resolution. These labels just serve as additional information to the grammatical functions encoded in the edge labels. These secondary edge labels indicate underspecified long-distance dependencies in the following cases:

1. If the above-mentioned edge labels need further disambiguation, e.g. if there are two OAs or V-MODs below one SIMPX node (REFMOD).

2. If the dependency relation exists between two nodes of which at least one is phrase internal and therefore carries only head or non-head information (REFINT).

3. If there is a dependency relation outside of SIMPX in control verb constructions (REFCONTR).
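The naming convention for modifier edge labels makes the modified grammatical function recoverable from the label itself. The following Python sketch illustrates this, together with a simplified test for the first case above, in which the target of a modifier label occurs more than once among the edge labels of one clause. The function names and the flat list representation are hypothetical, and the test covers only duplicated targets, not duplicated modifier labels such as two V-MODs.

```python
from collections import Counter

SUFFIX = "-MOD"


def modified_function(edge_label):
    """Return the label an X-MOD edge refers to, or None for other labels."""
    if edge_label.endswith(SUFFIX):
        return edge_label[: -len(SUFFIX)]
    return None  # e.g. plain MOD (ambiguous modifier) or a complement label


def ambiguous_modifiers(edge_labels):
    """Modifier labels whose target occurs more than once in the same clause."""
    counts = Counter(edge_labels)
    return sorted({
        label
        for label in edge_labels
        if modified_function(label) is not None
        and counts[modified_function(label)] > 1
    })


print(modified_function("OA-MOD"))
print(ambiguous_modifiers(["ON", "OA", "OA", "OA-MOD"]))
```

In this sketch a clause with two OA constituents and one OA-MOD is reported, which corresponds to a situation where a secondary edge label such as REFMOD would be needed to identify the modified constituent.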





[Tree diagram illustrating REFMOD; the secondary edge links the MOD-MOD constituent in NF (denn je) to ADJX 504]
Tokens (word/PoS/morphology): Die/PDS/np* werden/VAFIN/3pis dort/ADV/-- künftig/ADJD/-- seliger/ADJD/-- schlummern/VVINF/-- denn/KOKOM/-- je/ADV/-- ./$./--

[Tree diagram illustrating REFINT]
Tokens (word/PoS/morphology): Dieser/PDS/nsm hat/VAFIN/3sis Auswirkungen/NN/apf auf/APPR/a die/ART/asf Bereitschaft/NN/asf ,/$,/-- Therapieangebote/NN/apn anzunehmen/VVIZU/-- ./$./--

[Tree diagram illustrating REFCONTR]
Tokens (word/PoS/morphology): All/PIDAT/*** das/PDS/asn versuche/VVFIN/3sks man/PIS/ns* den/ART/dp* Angehörigen/NN/dp* zu/PTKZU/-- schicken/VVINF/-- ./$./--

3.4.7 Empty Categories

In general, an empty category analysis, e.g. for phrases without heads, is avoided in the TüBa-D/Z treebank.

Empty Edge Labels

Specifiers, prepositions,[6] complementizers, discourse markers, KOORD and PARORD constituents, conjunctions, and unambiguous modifiers (that are attached to phrases immediately rather than to topological fields) are not labeled with grammatical functions. Furthermore, the edges below the SIMPX node are empty. They are not labeled in order to speed up annotation where the information is unnecessary or self-evident. Furthermore, empty edge labels are used in elliptical phrases, e.g. noun phrases only consisting of an article and an attributive adjective (cf. 6.5).

[6] In order to facilitate the identification of dependencies between verbs and their nominal complements and adjuncts and in keeping with basic assumptions in Dependency Grammar, the annotated head of a prepositional phrase is the NX (or complement) rather than the preposition itself. Therefore, prepositions carry no edge label.


Chapter 4 The Annotation of the Internal Structure of Phrases

4.1 Premodification and Postmodification in Phrases

The annotation of phrases is also carried out following the flat clustering principle in order to keep the number of hierarchy levels in a syntactic structure as small as possible. As will be shown in the following sections, phrases may include adjectival or nominal premodifiers and/or postmodifiers of any syntactic category. Both kinds of modifiers are in principle projected to their phrase levels. Since the modification scope of premodifiers is unambiguous, they are directly attached to the head of the phrase which they modify. By contrast, postmodifiers are always attached on a higher level to preserve ambiguity. This decision, referred to in 3.3 as the high attachment principle, was made to avoid the problematic distinction whether a postmodifier is a free adjunct or a complement of the modified phrase. The attachment strategy for premodifiers and postmodifiers is applied for all categories of phrases.

4.2

Noun Phrases

A simple noun phrase (NX) consists of a head noun (noun, proper noun, or a pronoun), (optionally) a determiner and (optionally) an adjectival or a nominal premodifier of any complexity preceding the head noun. A complex noun phrase is a simple noun phrase with a postmodifier of any syntactic category and complexity.

4.2.1

Prenominal Modification

In a simple noun phrase, both the determiner and the head noun are attached directly to NX on the same level; the edge of the head noun carries the label HD, and the edge label of the determiner is empty.

[Tree diagram for: die Auseinandersetzung. PoS tags: ART NN]

[Tree diagram for: jede Spur. PoS tags: PIDAT NN]

Since prenominal modifiers are directly attached to the head noun on the same level, their edge labels are empty (whereas the edge labels of modifiers that are attached to topological fields are non-empty (cf. 8.1.2)). Prenominal modifiers are either attributive adjectives or preceding genitive phrases:

[Tree diagram for: ein externer Wirtschaftsprüfer. PoS tags: ART ADJA NN]

[Tree diagram for: die zu verhandelnden Taten. PoS tags: ART PTKZU ADJA NN]

[Tree diagram for: Bremens Gesundheitssenatorin. PoS tags: NE NN]
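The flat attachment of determiners and prenominal modifiers can be sketched in Python. This is only an illustration under assumptions: the function name `simple_noun_phrase` and the tuple encoding are hypothetical and not part of any TüBa-D/Z tool, and "-" is used here to stand for an empty edge label.

```python
from typing import List, Optional, Tuple


def simple_noun_phrase(head: Tuple[str, str],
                       determiner: Optional[Tuple[str, str]] = None,
                       premodifiers: Optional[List[Tuple[str, str, str]]] = None):
    """Return a flat NX as (category, children).

    head and determiner are (word, PoS tag) pairs. premodifiers are
    (phrase category, word, PoS tag) triples, e.g.
    ("ADJX", "externer", "ADJA") or ("NX", "Bremens", "NE").
    Leaves are encoded as (word, PoS tag, edge label).
    """
    children = []
    if determiner is not None:
        word, tag = determiner
        children.append((word, tag, "-"))          # determiner: empty edge
    for phrase_cat, word, tag in premodifiers or []:
        # Each premodifier is projected to its own phrase level. The phrase
        # gets an empty edge label; its lexical head gets HD.
        children.append((phrase_cat, "-", [(word, tag, "HD")]))
    word, tag = head
    children.append((word, tag, "HD"))             # head noun: HD
    return ("NX", children)


# ein externer Wirtschaftsprüfer
np1 = simple_noun_phrase(("Wirtschaftsprüfer", "NN"),
                         determiner=("ein", "ART"),
                         premodifiers=[("ADJX", "externer", "ADJA")])
# Bremens Gesundheitssenatorin (prenominal genitive)
np2 = simple_noun_phrase(("Gesundheitssenatorin", "NN"),
                         premodifiers=[("NX", "Bremens", "NE")])
print(np1)
print(np2)
```

All children end up on the same level below NX, which mirrors the flat noun phrase structure described in this section.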

If there is a PIDAT preceding the article, it is directly attached to the noun phrase.

[Tree diagram for: all die historischen Fehler. PoS tags: PIDAT ART ADJA NN]

If a PIDAT follows the article in adjective position, it is projected to its phrase level (ADJX), together with any premodifiers, and then attached directly to the noun phrase like an attributive adjective.

[Tree diagram for: Die meisten Benutzer. PoS tags: ART PIDAT NN]

[Tree diagram for: die in Deutschland ohnehin wenigen Gen-Food-Produzenten. PoS tags: ART APPR NE ADV PIDAT NN]

If there is more than one prenominal modifier, the modifier immediately to the left of the noun modifies the noun, the modifier to its left modifies both that modifier and the noun, and so on. All of these modifiers are attached to the head noun on the same level, which yields a rather flat noun phrase structure. This strategy is justified by the fact that such modifiers can have a scope of modification beyond the adjectival phrase: in a coordinate noun phrase like insgesamt 12.000 Studienplätze und 15.000 Lehrstellen, the adverb insgesamt modifies 12.000 Studienplätze as well as 15.000 Lehrstellen.

[Tree diagram for: lieber knieartiger Leser. PoS tags: ADJA ADJA NN]

In case of complex head nouns, e.g. complex (proper) nouns consisting of two nominal parts or coordinated head nouns (cf. 6.4.5), first the complex noun or the coordination (cf. 6.4) is annotated with its own internal dependency structure. Afterwards, the determiner and any premodifying adjectival phrases are attached on a higher level.

[Tree diagram for: Die "debis Systemhaus GmbH". PoS tags: ART $( NE NE NN $(]

[Tree diagram for: der Heinrich Böll-Stiftung. PoS tags: ART NE NN]

[Tree diagram for: "Solidarität mit Milosevic"-Parolen. PoS tags: $( NN APPR NE $( $( NN]

[Tree diagram for: ihren Sänger und Gründer. PoS tags: PPOSAT NN KON NN]

4.2.2

Postnominal Modification

Whereas prenominal modifiers are always directly attached to the head noun on the same level, postnominal modifiers are attached to the head noun on a higher level. Postnominal modifiers are also always first projected to the phrase level before they are attached to the head noun on a higher level. Phrase-internal postmodifiers can be of any phrasal category. The following tree structures show a prepositional phrase (PX) and a genitive phrase (NX) as postmodifiers. See section 6.3, page 99 for the analysis of relative clauses.

[Tree diagram for: Glück im Netz. PoS tags: NN APPRART NN]

[Tree diagram for: Die Mitteilung des Bremer Senats. PoS tags: ART NN ART ADJA NN]

If a noun has more than one postmodifier, these modifiers usually show a hierarchical structure: the first modifier modifies the head noun, the second modifier modifies the complete preceding noun phrase structure, and so on.

[Tree diagram for: die guten Beziehungen Bonns zu Moskau. PoS tags: ART ADJA NN NE APPR NE]
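The stacking of postmodifiers can be sketched in Python. This is only an illustration under assumptions: the `Node` class and the helper `attach_postmodifiers` are hypothetical names, not part of any TüBa-D/Z tool, and the sketch assumes that the structure built so far always receives HD while each postmodifier keeps an empty edge label (written "-" here).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A tree node: category, edge label to its parent, children."""
    cat: str                      # e.g. "NX", "PX", or a PoS tag
    edge: str = "-"               # "-" stands for an empty edge label
    children: List["Node"] = field(default_factory=list)
    word: str = ""                # filled for terminal nodes only

    def bracket(self) -> str:
        """Render the subtree as a labelled bracketing."""
        if self.word:
            return f"{self.word}/{self.cat}:{self.edge}"
        inner = " ".join(child.bracket() for child in self.children)
        return f"[{self.cat}:{self.edge} {inner}]"


def attach_postmodifiers(head: Node, postmodifiers: List[Node]) -> Node:
    """High attachment: each postmodifier adds one NX level above the
    structure built so far, which becomes the head (HD) of the new level."""
    current = head
    for modifier in postmodifiers:
        current.edge = "HD"
        modifier.edge = "-"
        current = Node("NX", children=[current, modifier])
    return current


# die guten Beziehungen Bonns zu Moskau
core = Node("NX", children=[
    Node("ART", word="die"),
    Node("ADJX", children=[Node("ADJA", edge="HD", word="guten")]),
    Node("NN", edge="HD", word="Beziehungen"),
])
genitive = Node("NX", children=[Node("NE", edge="HD", word="Bonns")])
pp = Node("PX", children=[
    Node("APPR", word="zu"),
    Node("NX", edge="HD", children=[Node("NE", edge="HD", word="Moskau")]),
])
tree = attach_postmodifiers(core, [genitive, pp])
print(tree.bracket())
```

The genitive Bonns is attached one level above the core noun phrase, and the prepositional phrase zu Moskau one level above that, so the second modifier takes the whole preceding structure as its head.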

Attributes of degree and quantity nouns are also defined as postnominal modifiers:

[Tree diagram for: eine Kiste Sprengstoff. PoS tags: ART NN NN]

Cardinal numbers either appear as quantity nouns or premodifying adjectival attributes, e.g. the cardinal number 1,000,000 can also be expressed by the quantity noun eine Million. Therefore, we have to distinguish the following two ways of annotation:

[Tree diagram for: Der Etat von 3,5 Millionen Mark steht. PoS tags: ART NN APPR CARD NN NN VVFIN $.]

[Tree diagram for: "Das Kilo Weißbrot kostete zuletzt 5 Mark", empört sie sich. PoS tags: $( ART NN NN VVFIN ADV CARD NN $( $, VVFIN PPER PRF $.]

For nominal postmodifiers other than genitive phrases, the same attachment rule is applied. Postmodifiers of this kind, which may also appear in brackets, e.g. Heinz Schleußer (SPD), are semantically closely related to the preceding head noun phrase. For instance, die Arbeiterwohlfahrt Bremen means die Arbeiterwohlfahrt which is located in Bremen, not die Arbeiterwohlfahrt which is called Bremen. Hence, these postmodifiers have to be distinguished from appositions (cf. 4.2.3) and complex proper nouns (cf. 4.2.5).

[Tree diagram for: Heinz Schleußer (SPD). PoS tags: NE NE $( NE $(]

[Tree diagram for: die Arbeiterwohlfahrt Bremen. PoS tags: ART NN NE]

[Tree diagram for: Zentralkrankenhaus Ost. PoS tags: NN NN]

[Tree diagram for: Kapitel VII. PoS tags: NN CARD]

[Tree diagram for: des ICE 884. PoS tags: ART NN CARD]

4.2.3

Appositional Constructions

An apposition is a specific kind of attribute to a noun, which normally agrees in case with this noun and does not change its overall meaning. There is no consensus among grammarians on what exactly is meant by the notion apposition (cf. Eisenberg 1999 2001). Eisenberg (1999 2001), for instance, claims that Ute Wedemeier die Landesvorsitzende and die Landesvorsitzende Ute Wedemeier are both appositions, but that it is not clear which part is the apposition and which part is the head noun. The Duden Grammar (1995) distinguishes between loosely constructed appositions (lockere Apposition), e.g. Ute Wedemeier, die Landesvorsitzende, which follow the head noun separated by a comma, and tightly constructed appositions (enge Apposition), e.g. (die) Landesvorsitzende Ute Wedemeier, which precede the head noun (cf. Drosdowski 1995). According to Helbig/Buscha (1998), there is case agreement between loosely constructed appositions and head nouns which are separated by a punctuation mark. By contrast, Engel (1996) thinks that only loosely constructed appositions can be regarded as appositions. He treats tightly constructed appositions as nomen varians or nomen invarians.

Because of these different definitions of the notion of apposition, we do not decide which part is the head noun and which is the apposition. We assume referential identity between the two parts. Loosely constructed appositions as well as tightly constructed appositions are treated as appositional constructions, i.e., the head noun and its apposition form a complex structure which does not give any information about head assignment. Therefore, both parts are first projected to their phrase level and then coordinated on a higher level, each of them labeled as apposition (APP), i.e. as a part of an appositional structure. What is important is the referential identity in meaning. Thus, Nummer 1 is an appositional construction, whereas Seite 1 is a noun phrase with the postmodifier 1. Forms of address for persons and titles, e.g. Herr, Frau, Doktor (Dr.), Professor (Prof.), Bundeskanzler, are also treated as appositional constructions. Here are some examples:

[Tree diagram for: Donnerstag morgen, den 13. Mai. PoS tags: NN ADV $, ART ADJA NN]

[Tree diagram for: Herr Taake. PoS tags: NN NE]

[Tree diagram for: Landesvorsitzende Ute Wedemeier. PoS tags: NN NE NE]

[Tree diagram for: Volker Tegeler, stellvertretender Geschäftsführer des Landesverbandes. PoS tags: NE NE $, ADJA NN ART NN]

[Tree diagram for: die Stadt Frankfurt. PoS tags: ART NN NE]

[Tree diagram for: Vorwurf Nummer 1. PoS tags: NN NN CARD]

[Tree diagram for: Telefon 472711. PoS tags: NN CARD]
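The treatment of appositional constructions can be sketched in Python. This is a minimal illustration under assumptions: the `Node` class and the helpers `project_noun` and `appositional_construction` are hypothetical names introduced here, and "-" stands for an empty edge label.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A tree node: category, edge label to its parent, children."""
    cat: str
    edge: str = "-"
    children: List["Node"] = field(default_factory=list)
    word: str = ""


def project_noun(word: str, tag: str) -> Node:
    """Project a single noun to its phrase level (NX) as head."""
    return Node("NX", children=[Node(tag, edge="HD", word=word)])


def appositional_construction(first: Node, second: Node) -> Node:
    """Join two phrases under a new NX. Both parts are labeled APP,
    so neither of them is marked as the head of the construction."""
    first.edge = "APP"
    second.edge = "APP"
    return Node("NX", children=[first, second])


# Herr Taake: form of address plus name
herr_taake = appositional_construction(
    project_noun("Herr", "NN"), project_noun("Taake", "NE"))
print([child.edge for child in herr_taake.children])
```

Because both daughters carry APP, the structure records referential identity without committing to which part is the head noun.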

In case of a form of address combined with one or more titles preceding a name, we annotate an embedded appositional construction:

[Tree diagram for: Die Dortmunder Psychologin Prof. Alexa Franke. PoS tags: ART ADJA NN NN NE NE]

In the same way, we treat proper nouns in brackets which are referentially identical to the preceding proper noun, for example, an actor's name and role:

[Tree diagram for: Andrea Spatzek / Gabi Zenker. PoS tags: NE NE $( NE NE]

A premodifier of the whole appositional construction is attached on an additional NX level.

[Tree diagram for: Auch Bundesumweltminister Jürgen Trittin. PoS tags: ADV NN NE NE]

There are some examples in which the appositional construction does not agree in case. These are postnominal titles of books, films, etc. and translations interspersed in the sentence. In the latter type, we extend the appositional construction also to nonnominal phrases.

[Tree diagram for: in dem Film "Das Verhör". PoS tags: APPR ART NN $( ART NN $(]

[Tree diagram for: um die wildernden Hunde außer Gesetzes ("outlaw") zu stellen. PoS tags: KOUI ART ADJA NN APPR NN $( $( FM $( $( PTKZU VVINF]

4.2.4

Foreign Language Material

Words or parts of a text written in a foreign language, except foreign language proper nouns, are tagged as foreign language material (FM), e.g. hello (FM), no (FM) longer (FM) amused (FM). All parts of foreign language proper nouns (4.2.5) are tagged as NE, e.g. Mary (NE), New (NE) York (NE), University (NE) of (NE) Illinois (NE). Single foreign words are projected to a syntactic level with the node label FX, which is a universal label for any syntactic category (phrasal and sentential) in the respective foreign language. More complex parts of a text tagged as FM are attached on the same level without any internal syntactic structure or head assignment. Their mother node is also assigned the label FX, e.g. no longer amused. For foreign language constructions containing a foreign language proper noun, the annotation strategy is the following: in a first step, all NEs are projected to the phrase level (NX); in a second step, these phrase nodes together with all FMs are projected to the next higher level with the node label FX. Again, there is no head assignment directly below the FX node, e.g. Mr. Gere himself.

[Tree diagram for: hello. PoS tags: FM]

[Tree diagram for: das no longer amused Kollegium. PoS tags: ART FM FM FM NN]

[Tree diagram for: wie Mr. Gere himself. PoS tags: KOKOM FM NE FM]
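The two-step strategy for foreign language material can be sketched in Python. This is only an illustration under assumptions: the function name `project_foreign` and the nested-tuple encoding are hypothetical, and the sketch covers just the FX node itself, not its attachment to the surrounding German structure.

```python
from typing import List, Tuple

Token = Tuple[str, str]   # (word, PoS tag)


def project_foreign(tokens: List[Token]) -> tuple:
    """Build an FX node over a foreign language sequence.

    FM tokens are attached flat. NE tokens are first projected to NX.
    No daughter of FX is marked as head.
    """
    children = []
    for word, tag in tokens:
        if tag == "NE":
            children.append(("NX", (word, tag)))   # step 1: NE -> NX
        elif tag == "FM":
            children.append((word, tag))           # FM stays flat
        else:
            raise ValueError(f"unexpected tag {tag!r} for {word!r}")
    return ("FX", *children)                       # step 2: all under FX


flat = project_foreign([("no", "FM"), ("longer", "FM"), ("amused", "FM")])
mixed = project_foreign([("Mr.", "FM"), ("Gere", "NE"), ("himself", "FM")])
print(flat)
print(mixed)
```

In the mixed case only the proper noun Gere receives its own NX level, while the FM tokens remain direct daughters of FX.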

Often, foreign language material is a part of a German syntactic construction and plays the role of a grammatical function. Therefore, the FX node is attached as a constituent to the tree structure. If it is directly attached to a field or a sentence bracket, the edge label above the FX node denotes its grammatical function within the clause, e.g. Kafka goes Kleinkunst (head of the clause).

[Tree diagram for: Kafka goes Kleinkunst. PoS tags: NE FM NN]

If a single FM is the head of a phrase which can be identified as a German phrase, e.g. by an article and/or an adjective (noun phrase), it is projected to the specific phrasal category, e.g. NX instead of FX, in a construction like in der Creme de la Kunst, die nordamerikanischen Brothers.

[Tree diagram for: in der Creme de la Kunst. PoS tags: APPR ART FM FM FM NN]

[Tree diagram for: ihrer nordamerikanischen Brothers. PoS tags: PPOSAT ADJA FM]

If FX is modified by a postmodifier, the mother node of the complex phrase is also FX, which again may be preceded by another phrase, e.g. Unter der Überschrift 'user als looser'.

[Tree diagram for: Unter der Überschrift 'user als looser'. PoS tags: APPR ART NN $( FM KOKOM FM $(]

4.2.5

Proper Nouns and Named Entities

Proper nouns denote individual living beings, objects, etc. which exist only once as entities with their own specific properties. The distinction between proper nouns and nouns is not always clear-cut. Proper nouns can also become nouns: Opel as the company is a proper noun, PoS-tagged as Opel (NE), whereas Opel as the car is a noun, PoS-tagged as Opel (NN). In addition to the categories of proper nouns listed in the STTS guidelines (first and last names of persons, names of companies, and geographical names), we also define names of streets and places, individual names of institutions (e.g. Max-Planck-Institut, Deutsches Museum), events (e.g. Zweiter Weltkrieg), and titles of books, movies, etc. as specific categories of proper nouns. Since the PoS-tagging of proper nouns follows the categories of proper nouns in the STTS, some nouns which belong to our class of proper nouns (e.g. composed forms of NE + NN like Sögestraße) are tagged as NN. Complex proper nouns forming a syntagma as well as titles, names of historical events, institutions, and so on, are PoS-tagged according to their distribution (e.g. der (ART) Potsdamer (ADJA) Platz (NN), Auf (APPR) die (ART) stürmische (ADJA) Art (NN)). We define proper nouns of this kind as named entities. In order to distinguish nouns from proper nouns and named entities, the latter are assigned an additional node label EN-ADD above their mother node. EN-ADD subsumes all single proper nouns which are not tagged as NE, complex proper nouns tagged as NE, and named entities tagged according to their distribution. The secondary edge label EN is used for complex proper nouns if EN-ADD cannot be used because of the annotation rules for the internal structure of noun phrases (e.g. den gleichgeschalteten Hamburger Reichssender) (cf. 4.2.5).

Since German and foreign language named entities differ in terms of their PoS-tagging, their internal syntactic structure differs as a consequence. Our annotation strategy for proper nouns and named entities is demonstrated in the following.

German Proper Nouns

German proper nouns denoting individual entities in the above mentioned sense consist of one or more lexical elements tagged as NE. In case of a single NE, this NE is projected to its phrase level, like nouns, carrying the node label NX. Proper nouns consisting of two or more NEs are attached on the same level. None of them carries a head label in order to indicate that there is no obvious dependency relation between them, e.g. first name and last name of a person, initials, (included) nick names, and names of institutions consisting of NEs. As mentioned above, these complex proper nouns are assigned the additional node label EN-ADD.

[Tree diagram for: Hamburg. PoS tags: NE]

[Tree diagram for: Ute Wedemeier. PoS tags: NE NE]

[Tree diagram for: K. W. PoS tags: NE NE]

[Tree diagram for: Ulrich "Tofu" Reineking. PoS tags: NE $( NE $( NE]

[Tree diagram for: Bayern München. PoS tags: NE NE]
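The projection of NE-tagged German proper nouns can be sketched in Python. This is a minimal illustration under assumptions: the function name `project_proper_noun` and the tuple encoding are hypothetical, punctuation inside names (as in Ulrich "Tofu" Reineking) is ignored, and "-" stands for an empty edge label.

```python
from typing import List


def project_proper_noun(words: List[str]):
    """Project one or more NE-tagged words.

    A single NE becomes the head (HD) of an NX. Two or more NEs are
    attached flat under NX without any head label, and the additional
    node EN-ADD is placed above that NX.
    Leaves are encoded as (word, PoS tag, edge label).
    """
    if not words:
        raise ValueError("a proper noun needs at least one word")
    if len(words) == 1:
        return ("NX", [(words[0], "NE", "HD")])
    flat_nx = ("NX", [(word, "NE", "-") for word in words])
    return ("EN-ADD", flat_nx)


single = project_proper_noun(["Hamburg"])
complex_name = project_proper_noun(["Ute", "Wedemeier"])
print(single)
print(complex_name)
```

The absence of HD among the daughters of the complex name reflects the decision not to assume a dependency relation between first name and last name.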

Proper nouns which are not tagged as NE, e.g. composed forms of NE+NN (Sögestraße) or complex phrases (Der Spiegel), also get an additional EN-ADD node. If a preceding article does not belong to the proper noun itself, EN-ADD is directly projected from the lexical level of the noun:

[Tree diagram for: CDU-Treff in der Sögestraße eröffnet. PoS tags: NN APPR ART NN VVPP]

If the proper noun is a complex syntactic structure, e.g. a phrase or a sentence, the lexical elements of this syntactic structure are tagged according to their distribution. First the whole phrase or sentence is annotated, then EN-ADD is added:

[Tree diagram for: der Spiegel. PoS tags: ART NN]

If proper nouns include other parts of speech than NEs, these are tagged according to their distribution. Therefore, proper nouns with a preposition include a prepositional phrase.

[Tree diagram for: Ole von Beust. PoS tags: NE APPR NE]

If a proper noun occurs within a more complex proper noun, EN-ADD is annotated on both levels of proper nouns:

[Tree diagram for: Die "debis Systemhaus GmbH". PoS tags: ART $( NE NE NN $(]

If the original form of a proper noun (e.g. Zweiter Weltkrieg, Hamburger Reichssender) is inflected and/or premodified by an article and/or attributive adjective, the included proper noun is indicated by the secondary edge label EN. EN always points from the dependent part of the proper noun to its head noun:

[Tree diagram for: Er ist Zeitzeuge des Zweiten Weltkrieges. PoS tags: PPER VAFIN NN ART ADJA NN $.]

[Tree diagram for: den gleichgeschalteten Hamburger Reichssender. PoS tags: ART ADJA ADJA NN]

The secondary edge label is also used to mark that a postmodifier is part of a proper noun:

[Tree diagram for: vom Hamburger Institut für Sozialforschung. PoS tags: APPRART ADJA NN APPR NN]

Proper nouns may occur as a prenominal genitive with an attributive function:

[Tree diagram for: Bremens Häfensenator. PoS tags: NE NN]

Proper nouns may be modified postnominally by another proper noun, e.g. Arbeiterwohlfahrt Bremen (cf. 4.2.2), which is not a proper noun as a whole because Bremen only specifies the location of die Arbeiterwohlfahrt. By contrast, München in Bayern München is part of the name of the soccer club. Furthermore, Bremen in a construction like das Motorschiff Bremen belongs to an appositional construction (cf. 4.2.3).

Foreign Language Proper Nouns

Since all elements of foreign language proper nouns are tagged as NE, they are annotated in the same way as German proper nouns, i.e., they are either a single proper noun or a complex proper noun with an EN-ADD node. Single foreign language proper nouns are projected to their phrase level; complex ones are attached on the same level without head assignment. Their mother node is NX with the additional node EN-ADD. Note the difference between Ole von Beust (cf. 4.2.5) and Inez van Lambsweerde, which results from the PoS-tagging rules. If the proper noun consists of more than one lexical element and has a German article, the article is attached on a higher level.

[Tree diagram for: Mary. PoS tags: NE]

[Tree diagram for: New York. PoS tags: NE NE]

[Tree diagram for: Inez van Lambsweerde. PoS tags: NE NE NE]

[Tree diagram for: University of Illinois. PoS tags: NE NE NE]

[Tree diagram for: die Tour de France. PoS tags: ART NE NE NE]

German Named Entities

As mentioned above, German named entities are always tagged according to their distribution and annotated with their internal syntactic structure as noun phrases, prepositional phrases, clauses, etc. In order to indicate their status as named entities, the additional node EN-ADD is inserted between their mother node and the next higher annotation level. If two EN-ADD nodes are coordinated, their mother node is NX, which represents the nominal status of EN-ADD:

[Tree diagram for: Seit "Schlaflos in Seattle" gelten Tom Hanks und Meg Ryan als Dream-Team des Biedersinns. PoS tags: APPR $( ADJD APPR NE $( VVFIN NE NE KON NE NE KOKOM NN ART NN]

If EN-ADD has a premodifier (which can itself be an EN-ADD) or a postmodifier, its mother node is also always NX:

[Tree diagram for: Oliver Bukowskis derbes "Bis Denver" feierte im Altonaer Theater Premiere. PoS tags: NE NE ADJA $( APPR NE $( VVFIN APPRART ADJA NN NN]

[Tree diagram for: "Sind Sie Luigi?" von Stephan Brüggenthies. PoS tags: $( VAFIN PPER NE $. $( APPR NE NE]

[Tree diagram for a sentence whose second token (an NN) is not recoverable from the source: "... jagt Oberärztin" jedenfalls wird vom ZDF als "Medicomödie" angepriesen. PoS tags: $( NN VVFIN NN $( ADV VAFIN APPRART NE KOKOM $( NN $( VVPP]

EN-ADDs which are directly attached to a field are assigned the grammatical function they have in the German sentence:

[Tree diagram for: Oder er heißt Elvis Costello. PoS tags: KON PPER VVFIN NE NE $.]

If a named entity, e.g. a title, consists of two separate phrases and/or clauses (e.g. title and subtitle), the first part is annotated as the head of the second part to express their dependency relation:

[Tree diagram for: Die Ausstellung "Der neue Mensch - Obsessionen im 20. Jahrhundert". PoS tags: ART NN $( ART ADJA NN $( NN APPRART ADJA NN $(]

Foreign Language Named Entities

The syntactic annotation of foreign language named entities differs from the annotation of German named entities in the following aspects. As mentioned above, foreign language proper nouns are tagged as NE, while all other lexical entries of a foreign language are tagged as foreign language material (FM). A foreign language named entity which consists only of a proper noun, e.g. the title of a movie (Forrest Gump), is assigned an EN-ADD label. If a foreign language named entity consists only of FM-tagged elements, these elements are directly attached on the same level without internal syntactic structure. Its mother node is marked as FX, e.g. Knockin' on Heaven's Door. If a foreign language named entity consists of NE-tagged as well as FM-tagged elements, e.g. Shakespeare (NE) in (FM) Love (FM), first the annotation strategy described in 4.2.4 is applied. Then, in a second step, the same strategy as for German named entities is applied: the insertion of the EN-ADD node.

[Tree diagram for: Forrest Gump. PoS tags: NE NE]

[Tree diagram for: Knockin' on Heaven's Door. PoS tags: FM FM FM FM]

[Tree diagram for: Shakespeare in Love. PoS tags: NE FM FM]

4.2.6

Ordinal Numbers

According to their distribution, ordinal numbers occur either as a premodifying attributive adjective (e.g. die dritte (ADJA) Partie) or as a head noun (e.g. er ist der sechste (NN)). In the first case, the premodifier is projected to an adjectival phrase; in the latter case, it is projected to a noun phrase.

[Tree diagram for: die dritte Partie. PoS tags: ART ADJA NN]

[Tree diagram for: Aber das ist ja auch schon die Vierte. PoS tags: KON PDS VAFIN ADV ADV ADV ART NN $.]

4.2.7

Cardinal Numbers

According to their syntactic function (nominal or adjectival), cardinal numbers (CARD) are projected either to NX or to ADJX. If their numerals are written separately or in groups, e.g. numbers of bank accounts, they are attached on the same level, like proper names, without internal head assignment.

[Tree diagram for: Jahr 2000. PoS tags: NN CARD]

[Tree diagram for: in allen 23 Bezirken. PoS tags: APPR PIDAT CARD NN]

[Tree diagram for: BLZ 500 901 00. PoS tags: NN CARD CARD CARD]

A premodifying cardinal number is nominal if it does not express a quantity, as in the example above, but a characteristic of the following noun, e.g. the number of a zip code:

[Tree diagram for: 13187 Berlin. PoS tags: CARD NE]

Complex time expressions or results of competitions are also treated as cardinal numbers:

[Tree diagram for: 20.15 Uhr. PoS tags: CARD NN]

[Tree diagram for: mit 3:0. PoS tags: APPR CARD]

4.2.8

Letters and Non-Words

Letters and non-words are tagged as XY. They are projected to their phrase level and assigned the syntactic category to which they belong in the construction. Signs which represent a lexical element, e.g. the sign for paragraph, are tagged with the respective part-of-speech tag:

[Tree diagram for: R: Joel Schumacher. PoS tags: XY $. NE NE]

[Tree diagram for: D-76351 Linkenheim. PoS tags: XY NE]

[Tree diagram for: § 220 a des Strafgesetzbuches. PoS tags: NN CARD XY ART NN]

4.2.9

Expletive and Other Uses of es

The pronominal form es functions as an expletive element in German. Three different expletive usages are traditionally distinguished: formal subject or object, correlate of an extraposed clausal argument, and Vorfeld-es (cf. Eisenberg 1999 2001, Pütz 1986). For the sake of completeness, the following list begins with an example of es as a referential personal pronoun.

Personal Pronoun

The pronoun functions as an argument of the verb and refers to some person, object, or event that is salient in the context. Whether es is used as a pronoun can be tested by replacing it with another noun or pronoun (such as das or er/ihn). In the example tree, es refers to the neuter noun Gästehaus in the preceding sentence: Die italienische Regierung hat die Familie im staatlichen Gästehaus Casino dell'Algardi untergebracht.

[Tree diagram for: Es wird von Scharfschützen bewacht. PoS tags: PPER VAFIN APPR NN VVPP $.]

Formal Subject or Object

The formal subject obligatorily occurs with weather verbs, e.g. Es regnet, and with impersonal or agentless constructions such as Es gibt so eine Buchung or Es geht um populäre Unterhaltung. Some verbs optionally permit an expletive subject but also occur with referential subjects, such as Max/Es klopft an der Tür. A formal object is found in constructions like jmd. legt es an auf etw. or jmd. verdirbt es mit jmdm. In all examples mentioned, es functions as a grammatical argument without semantic contribution, i.e. it does not refer to a person, object, or event. In TüBa-D/Z, formal subjects and objects are treated like referential pronouns and are labeled alike, e.g. with the edge labels ON or OA. Formal arguments are obligatory and may occur in the Mittelfeld. In case of doubt, a good test is to paraphrase the sentence such that another element occupies the Vorfeld, e.g. Natürlich gibt es so eine Buchung versus *Natürlich gibt so eine Buchung.

[Tree diagram for: "Es gibt so eine Buchung." PoS tags: $( PPER VVFIN ADV ART NN $. $(]

Correlate of an Extraposed Clausal Argument

If a clausal argument is extraposed to the Nachfeld, it is optionally doubled by an expletive es in the Vorfeld or Mittelfeld. The expletive is labeled ON-MOD or OS-MOD depending on the function of the clausal argument.

[Tree diagram for: Aber es ist übertrieben zu sagen, damit bekäme die FU erst eine Identität. PoS tags: KON PPER VAFIN ADJD PTKZU VVINF $, PROP VVFIN ART NE ADV ART NN $.]

Vorfeld-es

The last type is a purely structural dummy element. It occurs in Vorfeld position only and is not correlated with any argument of the clause. It does not agree with the verb, which becomes evident if there is a plural subject in the Mittelfeld, as illustrated in the example tree below. It is ungrammatical in the Mittelfeld, e.g. *... dass es ihn die Völker zahlen. Vorfeld-es is labeled ES to indicate its purely structural function. In the first release of TüBa-D/Z (12/2003), Vorfeld-es was integrated by means of ON-MOD.

[Tree: es/PPER zahlen/VVFIN ihn/PPER die/ART Völker/NN ,/$, deren/PRELAT Menschenrechte/NN angeblich/ADJD verteidigt/VVPP werden/VAFIN (es: ES in VF; ihn: OA; die Völker: ON in MF)]

Table 4.1 summarizes tests and labels for the different uses of es.

Table 4.1: Types of es

test                               referential    formal      correlate         Vorfeld-es
                                   pronoun        argument
substitutable by other pronouns    yes            no          no                no
optional                           no             no          yes               no
correlates with clausal argument   no             no          yes               no
ungrammatical in Mittelfeld        no             no          no                yes
edge label                         ON, OA, etc.   ON, OA      ON-MOD, OS-MOD    ES
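The tests in Table 4.1 can be read as a small decision procedure. The following Python sketch is only an illustration of that reading: the names EsTests, classify_es and EDGE_LABELS are our own and are not part of the treebank or of any annotation tool, and the three boolean inputs are assumed to come from a human annotator applying the tests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EsTests:
    """Outcome of the tests from Table 4.1 for one occurrence of 'es' (hypothetical helper)."""
    substitutable: bool           # can be replaced by das, er/ihn, ...
    correlates_with_clause: bool  # doubles an extraposed clausal argument
    bad_in_mittelfeld: bool       # ungrammatical when moved to the Mittelfeld


# Edge labels as listed in the last row of Table 4.1.
EDGE_LABELS = {
    "referential pronoun": "ON, OA, etc.",
    "formal argument": "ON, OA",
    "correlate": "ON-MOD, OS-MOD",
    "Vorfeld-es": "ES",
}


def classify_es(tests: EsTests) -> str:
    """Map test outcomes to one of the four types of es."""
    if tests.substitutable:
        return "referential pronoun"
    if tests.correlates_with_clause:
        return "correlate"
    if tests.bad_in_mittelfeld:
        return "Vorfeld-es"
    # No test is positive: es is a formal subject or object.
    return "formal argument"


if __name__ == "__main__":
    # 'es zahlen ihn die Völker': not substitutable, no correlate, bad in the Mittelfeld.
    kind = classify_es(EsTests(False, False, True))
    print(kind, "->", EDGE_LABELS[kind])
```

The order of the checks is a design choice of this sketch; since each type has exactly one positive test in the table, any order gives the same result for inputs that match a table column.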

Es sei denn

The lexicalized phrase es sei denn, meaning außer, is analyzed as a copula construction.

[Tree: "/$( Es/PPER geschieht/VVFIN hier/ADV nichts/PIS ,/$, es/PPER sei/VAFIN denn/ADV ,/$, ich/PPER tu/VVFIN es/PPER ./$. "/$( (sentence-initial Es: ES in VF)]

4.3 Determiner Phrases

Certain pronouns serving as determiners in noun phrases may be premodified, for instance by degree adverbs, as in so viele Ältere, gar kein Schutz, etc. In the case of so viele Ältere, the premodifying adverb so is attached to the indefinite pronoun viele. Together, they form a determiner phrase (DP), which is attached on the same level as the head noun Ältere:

[Tree: so/ADV viele/PIDAT Ältere/NN (NX containing DP so viele and head noun Ältere)]

[Tree: gar/ADV kein/PIAT Schutz/NN (NX containing DP gar kein and head noun Schutz)]

4.4 Prepositional Phrases

4.4.1 Prepositions

In prepositional phrases, the preposition is not annotated as the head of the phrase. Instead, the complement within the prepositional phrase is annotated as the head. This decision facilitates the identification of dependencies between verbs and their nominal complements and adjuncts. Moreover, it is in accordance with basic assumptions in Dependency Grammar.

[Tree: in/APPR Südpolen/NN (PX; the NX Südpolen carries the edge label HD)]
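The effect of this decision can be sketched in a few lines of Python. The Node class and the functions head_child and lexical_head below are hypothetical helpers written for this illustration, not part of the TüBa-D/Z distribution; the sketch only assumes that every node records the edge label (HD or -) that links it to its parent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                  # node label or POS tag, e.g. "PX", "NX", "APPR"
    edge: str = "-"             # edge label to the parent, e.g. "HD" or "-"
    word: Optional[str] = None  # surface form for terminal nodes
    children: List["Node"] = field(default_factory=list)


def head_child(node: Node) -> Optional[Node]:
    """Return the daughter marked HD, or None if there is none."""
    for child in node.children:
        if child.edge == "HD":
            return child
    return None


def lexical_head(node: Node) -> Optional[str]:
    """Follow HD edges down to a terminal and return its word form."""
    while node.children:
        nxt = head_child(node)
        if nxt is None:
            return None
        node = nxt
    return node.word


# 'in Südpolen': the preposition is a non-head daughter, the NX is the head.
px = Node("PX", children=[
    Node("APPR", edge="-", word="in"),
    Node("NX", edge="HD", children=[Node("NN", edge="HD", word="Südpolen")]),
])

if __name__ == "__main__":
    print(lexical_head(px))
```

Because the complement is the head, following HD edges from a PX leads directly to the noun, which is what makes verb-noun dependencies easy to read off.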

If the preposition is realized as a non-alphabetic sign, e.g. - (bis, gegen), this sign is tagged as APPR and annotated like a preposition:

[Tree: HSV/NE BU/NE -/APPR Bramfelder/ADJA SV/NN (the sign - is tagged APPR and opens a PX)]

Since pronominal adverbs (PROP) are pronominal forms of a prepositional phrase, they are directly projected to PX:

[Tree: Freudenthal/NE wollte/VMFIN gestern/ADV nichts/PIS dazu/PROP sagen/VVINF (dazu: PX, edge label FOPP)]

In German, there are so-called Verschmelzungsformen, i.e. merged forms of a preposition and a determiner, e.g. in dem Januar amalgamates to im Januar. The merged form is assigned the part-of-speech tag APPRART (including richer morphological annotation). In terms of syntax, it is annotated like a preposition:

[Tree: Im/APPRART Januar/NN (PX; the NX Januar carries the edge label HD)]

Prepositional phrases expressing intervals, e.g. with von/bis, von/bis zu or zwischen, are annotated in the same way as coordinate structures (cf. 6.4.1), i.e. without head assignment on the level of coordination, since the two phrases are assumed to be conjuncts. If two prepositions follow each other (e.g. bis zum), the result is an embedded structure in which a prepositional phrase is the complement of another preposition. In this case, the first preposition does not receive a morphological case feature.

[Tree: vom/APPRART 23./ADJA bis/APPR 25./ADJA Juli/NN (two PX conjuncts, each labeled KONJ)]

[Tree: zwischen/APPR 15.000/CARD und/KON 22.000/CARD (PX; the NX conjuncts are labeled KONJ)]

[Tree: bis/APPR zum/APPRART Jahr/NN 2000/CARD (PX embedding a PX; bis carries no case feature)]

In contrast to the case with two prepositions, intervals like dritter bis fünfter November are annotated as a coordinate attributive adjective phrase within a simple noun phrase (cf. 6.4.1). Premodification of prepositional phrases follows the general principle of low attachment.

[Tree: Irgendwo/ADV in/APPR den/ART Wäldern/NN Schaumburgs/NE (the ADVX Irgendwo is attached inside the PX)]

There is one exception to the low attachment principle: elliptical constructions in which a preceding adverb does not semantically modify the prepositional phrase. In this case, the adverbial phrase is attached high, to an additional level of PX.

[Tree: Nun/ADV zum/APPRART Wetter/NN (the ADVX Nun is attached to an additional PX level above the PX zum Wetter)]

4.4.2 Circumpositions and Postpositions

Circumpositions are treated as ternary branching prepositional phrases. The circumposition on the left hand side is tagged as APPR and the circumposition on the right hand side as APZR:

[Tree: von/APPR sich/PRF aus/APZR (PX with three daughters; the NX sich carries the edge label HD)]

Postpositions are tagged as APPO. The complement of the postposition occurs on the left side and constitutes the head of the prepositional phrase:

[Tree: Dem/ART Vernehmen/NN nach/APPO (PX; the NX Dem Vernehmen carries the edge label HD)]

4.5 Adjectival Phrases

We distinguish between attributive adjectives on the one hand and adverbial or predicative adjectives on the other. Attributive adjectives are tagged as ADJA (die traditionellen Elemente) or CARD (20.15 Uhr), whereas adverbial or predicative adjectives are tagged as ADJD (das Gewicht ist gut; den betriebswirtschaftlich günstigeren Standort) or PWAV (wie wirke ich). The annotation of superlative and comparative forms is explained in section 7.1 on page 119.

In general, German adjectives are inflected when they are an attribute of a noun. They are not inflected when they function as a predicative adjective or as a premodifier of an adjective or an adverb, or when they belong to a small class of non-inflecting adjectives, e.g. some archaic forms such as gut Wetter or lieb Mütterlein, or some adjectives denoting a colour (mit einer rosa Karte). All adjectives have to be projected to their phrase level before they are attached to another phrase or to a field.

[Tree: Die/ART traditionellen/ADJA Elemente/NN (ADJX traditionellen inside NX)]

[Tree: mit/APPR einer/ART rosa/ADJA Karte/NN (ADJX rosa inside NX inside PX)]

[Tree: Das/ART Gewicht/NN ist/VAFIN gut/ADJD (gut: ADJX, edge label PRED in MF)]

[Tree: den/ART betriebswirtschaftlich/ADJD günstigeren/ADJA Standort/NN (ADJX betriebswirtschaftlich inside ADJX günstigeren)]

[Tree: Der/ART männliche/ADJA Trinker/NN sei/VAFIN gut/ADJD erforscht/VVPP ,/$, (gut: ADJX, edge label V-MOD in MF)]

A nominalized adjective like Fassbares may be premodified by an adverbial adjective (ADJD) instead of an attributive adjective (ADJA). Adverbial adjectives never inflect.

[Tree: physisch/ADJD Faßbares/NN (ADJX physisch inside NX)]

Whenever an adjective is modified by another modifier, the same annotation strategy as for noun phrases is applied, i.e., the modifier is directly attached to the adjectival phrase. The adjectival phrase as a whole is the premodifier of the noun phrase. For instance:

[Tree: eine/ART sehr/ADV gute/ADJA Quelle/NN (ADVX sehr inside ADJX sehr gute)]

[Tree: aber/KON der/ART Text/NN ist/VAFIN sehr/ADV abstrakt/ADJD (ADVX sehr inside ADJX sehr abstrakt, which is PRED in MF)]

The same holds if an adjective selects an argument. Für die Weltgesellschaft is an optional argument of wesentlich. It is directly attached to the adjectival phrase.

[Tree: Die/ART für/APPR die/ART Weltgesellschaft/NN wesentliche/ADJA Unterscheidung/NN (PX für die Weltgesellschaft inside ADJX)]

Premodifying adjectives may occur in a linear order and/or as a coordination (cf. 6.4.1) of attributive adjectives:

[Tree: 28./ADJA und/KON 29./ADJA Mai/NN (coordinated ADJX conjuncts labeled KONJ)]

[Tree: Die/ART großen/ADJA ,/$, bekannten/ADJA serbischen/ADJA Oppositionsparteien/NN (coordinated ADJX conjuncts labeled KONJ)]

[Tree: ihre/PPOSAT eigene/ADJA demokratische/ADJA und/KON freiheitliche/ADJA Tradition/NN (coordinated ADJX conjuncts labeled KONJ)]

If the premodifying adjective is deverbal, the adjectival phrase can be of any complexity. In this case, the adjectival phrase has its own internal dependency structure. All elements which depend on the adjective are annotated as its premodifiers. Deverbal adjectives are either attributive or adverbial/predicative, and occur as the present participle or past participle form of a verb.

[Tree: das/ART aktuell/ADJD diskutierte/ADJA Thema/NN (ADJX aktuell inside ADJX aktuell diskutierte)]

[Tree: Die/ART teilweise/ADV in/APPR die/ART Erde/NN gebaute/ADJA Sporthalle/NN (ADVX teilweise and PX in die Erde inside ADJX)]

In the following example, postmodification of an adjectival phrase is shown:

[Tree: besser/ADJD als/KOKOM gut/ADJD (ADJX containing ADJX besser and ADJX gut)]

4.6 Adverbial Phrases

Besides adverbs, negation particles (PTKNEG) also project to an adverbial phrase. Adverbial phrases occur either as premodifiers [1] or postmodifiers, or they are directly attached to a field.

[1] bis zu and über are considered to be ADV rather than APPR because of their meaning.

[Tree: Doch/KON heute/ADV will/VMFIN Ina/NE Terre/NE (/$( Hannelore/NE Droege/NE )/$( es/PPER nicht/PTKNEG so/ADV recht/ADJD munden/VVINF]

[Tree: bis/ADV zu/ADV 300.000/CARD Leute/NN]

[Tree: über/ADV 350.000/CARD Auskünfte/NN]

[Tree: heute/ADV abend/ADV]

[Tree: Der/ART Fahrer/NN konnte/VMFIN nicht/PTKNEG mehr/ADV bremsen/VVINF ./$. (nicht mehr: ADVX, edge label V-MOD in MF)]

4.7 Verb Phrases

Whereas finite verb phrases are labeled VXFIN, non-finite verb phrases are labeled VXINF. Since infinitives and past participles share certain properties (e.g. exchangeability in Man hat nur noch das eigene Herz schlagen hören/gehört.), they are assumed to carry the same phrase label (VXINF). The finite verb in LK as well as the non-finite verbs in VC are always projected to their phrase level. All verb phrases of the verb complex are attached on the same level to form the verb complex. In order to follow the flat clustering principle, no internal hierarchy of the verb complex is annotated.

4.7.1 Head of a Sentence and Verb Complex

The finite verb, which can appear either in LK (verb-first and verb-second clauses) or in VC (verb-final clauses), is always the head of the entire sentence. Non-finite verbal elements belong to VC. If the finite verb is located in LK and there is more than one non-finite element in VC, the non-finite element which is selected by the finite verb is denoted as the head of VC. All other elements of VC are verbal objects. The head of VC selects the verbal object OV. This verbal object may select another verbal object OV, and so on. In order to denote the dependency relations between verbal objects within the verb complex, we attach a secondary edge labeled REFVC between their phrase nodes.
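Since the verb complex itself is flat, the selection order has to be recovered from the HD label and the REFVC edges. The following Python sketch shows one way to do this. It is an illustration under our own assumptions: selection_chain is a hypothetical helper, verbs are identified by their word forms, and the REFVC edges are passed as a dictionary from the selecting OV to the OV it selects.

```python
from typing import Dict, List


def selection_chain(head: str, verbal_objects: List[str], refvc: Dict[str, str]) -> List[str]:
    """Order the elements of a verb complex by selection, starting with the head.

    head           -- the element of VC labeled HD
    verbal_objects -- the elements of VC labeled OV, in any order
    refvc          -- secondary edges: selecting OV -> selected OV
    """
    if not verbal_objects:
        return [head]
    selected = set(refvc.values())
    # The OV selected by the head is the only OV that no REFVC edge points to.
    starts = [v for v in verbal_objects if v not in selected]
    if len(starts) != 1:
        raise ValueError("expected exactly one OV without an incoming REFVC edge")
    chain = [head, starts[0]]
    while chain[-1] in refvc:
        nxt = refvc[chain[-1]]
        if nxt in chain:
            raise ValueError("cyclic REFVC edges")
        chain.append(nxt)
    return chain


if __name__ == "__main__":
    # '... umgebracht worden sein könnte' (verb-final clause, finite verb is HD of VC)
    print(selection_chain(
        "könnte",
        ["umgebracht", "worden", "sein"],
        {"sein": "worden", "worden": "umgebracht"},
    ))
```

With a single OV and no REFVC edge the function returns just the head followed by that OV, which corresponds to a verb complex such as gewesen sein.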

4.7.2 Verb Complexes in Verb-second and Verb-final Clauses

The following example shows a verb-second clause with the head of the sentence in LK and a verb complex consisting of a single non-finite element.

[Tree: Der/ART überehrgeizige/ADJA Bürgermeister/NN will/VMFIN die/ART Bergidylle/NN in/APPR ein/ART Mekka/NE des/ART Massentourismus/NN verwandeln/VVINF (will: HD in LK; verwandeln: OV in VC)]

If the verb complex comprises more than one immediate daughter, the one that is selected by the finite verb is the head of VC.

[Tree: Es/PPER müsse/VMFIN ein/ART Buchungsfehler/NN gewesen/VAPP sein/VAINF ./$. (VC: gewesen OV, sein HD)]

The following trees demonstrate verb complexes with two or more verbal objects. The secondary edge REFVC points from the selecting OV to the dependent OV.

[Tree: Wenn/KOUS da/ADV was/PIS gebucht/VVPP worden/VAPP ist/VAFIN (VC: gebucht OV, worden OV, ist HD; one REFVC edge)]

[Tree: daß/KOUS auch/ADV sein/PPOSAT Vater/NN auf/APPR Anregung/NN Andreottis/NE umgebracht/VVPP worden/VAPP sein/VAINF könnte/VMFIN (VC: umgebracht OV, worden OV, sein OV, könnte HD; two REFVC edges)]

If there is no finite verb at all, the rightmost element of the verb complex (if there is more than one element) is annotated as the head of the sentence. This often occurs in headlines (cf. 5.2 and 7.4).

[Tree: Prachtwicken/NN gucken/VVINF ./$. (Prachtwicken: OA in MF; gucken: HD in VC)]

4.7.3 Ersatzinfinitiv Constructions

In order to indicate Ersatzinfinitiv constructions, two specific field node labels are introduced. VCE is the node label for the part of the verb complex consisting of the finite verb which subcategorizes for the Ersatzinfinitiv. MFE is the node label for the second part of MF between VCE and the second part of the verb complex VC (e.g. [C die] [MF uns] [VCE hätten] [MFE mißtrauisch] [VC machen müssen]).

[Tree: daß/KOUS Fischer/NE und/KON ich/PPER dazu/PROP haben/VAFIN beitragen/VVINF können/VMINF (haben: VCE; beitragen können: VC)]

[Tree: die/PRELS uns/PPER hätten/VAFIN mißtrauisch/ADJD machen/VVINF müssen/VMINF (hätten: VCE; mißtrauisch: PRED in MFE; machen müssen: VC)]
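The bracketed field notation used in the example above, [C die] [MF uns] [VCE hätten] [MFE mißtrauisch] [VC machen müssen], can be checked mechanically. The Python sketch below is our own illustration; parse_fields and has_ersatzinfinitiv_layout are hypothetical names, and the check only encodes the ordering stated in this section (VCE before VC, with at most MFE material in between).

```python
import re
from typing import List, Tuple

# One bracketed field: a label followed by one or more tokens.
FIELD_RE = re.compile(r"\[(\w+)\s+([^\]]+)\]")


def parse_fields(text: str) -> List[Tuple[str, List[str]]]:
    """Split a string like '[C die] [MF uns]' into (label, tokens) pairs."""
    return [(m.group(1), m.group(2).split()) for m in FIELD_RE.finditer(text)]


def has_ersatzinfinitiv_layout(fields: List[Tuple[str, List[str]]]) -> bool:
    """True if VCE precedes VC and only MFE fields stand between them."""
    labels = [label for label, _ in fields]
    if "VCE" not in labels or "VC" not in labels:
        return False
    i, j = labels.index("VCE"), labels.index("VC")
    if i >= j:
        return False
    return all(label == "MFE" for label in labels[i + 1:j])


if __name__ == "__main__":
    clause = "[C die] [MF uns] [VCE hätten] [MFE mißtrauisch] [VC machen müssen]"
    fields = parse_fields(clause)
    print(fields)
    print(has_ersatzinfinitiv_layout(fields))
```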

In the example below, the finite verb precedes the non-finite verbs although müssen is not an Ersatzinfinitiv. Since its position corresponds to the position of the finite verb in genuine Ersatzinfinitiv constructions, and since a second middle field is possible here as well, we follow the same annotation strategy.

[Tree: daß/KOUS die/ART Nato/NE sich/PRF doch/ADV noch/ADV ein/ART ganz/ADV neues/ADJA Konzept/NN wird/VAFIN überlegen/VVINF müssen/VMINF (wird: VCE; überlegen müssen: VC)]

4.7.4 Infinitives with zu

In infinitives with zu, the particle zu determines the non-finiteness of the verb to its right. For this reason, zu is considered the head of the VXINF, whereas the infinitive is treated as its complement. Like other infinitives, infinitives with zu occur in the verb complex:

[Tree: Erkenntnisse/NN der/ART Friedens-/TRUNC und/KON Konfliktforschung/NN scheinen/VVFIN ihm/PPER fremd/ADJD zu/PTKZU sein/VAINF ./$. (zu sein: VXINF in VC, edge label OV)]

[Tree: Über/APPR Details/NN werde/VAFIN noch/ADV zu/PTKZU verhandeln/VVINF sein/VAINF ./$. (zu verhandeln and sein: VXINF nodes in VC)]

The particle zu can also be realized as an infix of the verb. In this case, the verb is tagged as VVIZU and is projected to VXINF with the grammatical function HD:

[Tree: um/KOUI neben/APPR neuen/ADJA Baugesetzen/NN auch/ADV mehr/PIAT Mitspracherechte/NN einzufordern/VVIZU ./$. (einzufordern: VXINF, edge label HD in VC)]

Besides the examples above, the infinitive with zu occurs in optional (in most cases with um zu) and obligatory infinitive clauses.

[Tree: Wenn/KOUS Angehörige/NN es/PPER sich/PRF zur/APPRART Lebensaufgabe/NN machen/VVFIN ,/$, den/ART Kranken/NN zu/PTKZU kontrollieren/VVINF (es: OS-MOD in MF; the infinitive clause in NF: OS)]

[Tree: um/KOUI Freunden/NN zu/PTKZU sagen/VVINF ,/$, daß/KOUS ihr/PPOSAT Zug/NN Verspätung/NN hat/VAFIN (the daß-clause in NF: OS)]

Infinitive clauses can consist of only one verb complex:

[Tree: Er/PPER wendet/VVFIN den/ART Blick/NN von/APPR der/ART Wand/NN und/KON fängt/VVFIN an/PTKVZ zu/PTKZU erzählen/VVINF ./$. (an: VPT in VC; zu erzählen: SIMPX in NF, edge label OS, consisting of VC only)]

4.7.5 Coherency and Incoherency of Verbal Constructions

The notion of coherency, attributed to Bech (1955/57), covers the relation of dependency between adjacent verbal elements, i.e. the relation of subcategorization between a verb and a non-finite verbal complement. Kiss (1995) calls this relation infinite Komplementation (non-finite complementation). Bech (1955/57) distinguishes three classes of verbs with respect to obligatory and optional coherency:

1. verbs constructing coherently and incoherently, e.g. versprechen, versuchen
   a. coherent, extraposition possible: [wie er mit kritischen politischen Gegenpositionen umzugehen versteht]
   b. incoherent, extraposition: [wie er versteht,] [mit kritischen politischen Gegenpositionen umzugehen]
2. verbs constructing only coherently, e.g. wollen, möchten (coherent, no extraposition possible)
   a. [wie er mit kritischen politischen Gegenpositionen umgehen will]
   b. *[wie er will mit kritischen politischen Gegenpositionen umgehen]
3. verbs constructing only incoherently, e.g. überreden, überzeugen (incoherent, extraposition obligatory)
   a. [wie er sie überredet,] [mit kritischen politischen Gegenpositionen umzugehen]
   b. *[wie er sie [mit kritischen politischen Gegenpositionen umzugehen] überredet]

Coherent and incoherent constructions of verbs are annotated differently. In case of coherency, the verbal complement is part of the verb complex; in the clause wie er mit kritischen politischen Gegenpositionen umzugehen versteht, for instance, the infinitive with zu would be the verbal object of the finite verb. In case of incoherency, the verbal complement is annotated as a sentential complement, i.e., mit kritischen politischen Gegenpositionen umzugehen in the clause wie er sie überredet, mit kritischen politischen Gegenpositionen umzugehen is a sentential object in NF.

For annotation purposes, we define a construction as incoherent if extraposition into NF is possible. That is, whenever it is possible to shift the infinitival complement, together with a constituent of MF which it subcategorizes for, into NF, these elements are annotated as sentential objects. Therefore, the coherent example above (wie er mit kritischen politischen Gegenpositionen umzugehen versteht) is annotated with a sentential object in MF, since extraposition is possible (cf. the incoherent example 1.b.).

[Tree: wie/KOUS er/PPER mit/APPR kritischen/ADJA politischen/ADJA Gegenpositionen/NN umzugehen/VVIZU versteht/VVFIN (er: ON; mit kritischen politischen Gegenpositionen umzugehen: SIMPX in MF, edge label OS)]
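The annotation rule just stated depends only on whether the governing verb permits extraposition. The following Python sketch restates this as a lookup; it is an illustration, not a tool used in the annotation. The enum Coherence, the table VERB_CLASS and the function complement_label are hypothetical names, and the verb list contains only the verbs named in the three classes above.

```python
from enum import Enum


class Coherence(Enum):
    BOTH = 1             # class 1: coherent or incoherent, extraposition possible
    ONLY_COHERENT = 2    # class 2: extraposition impossible
    ONLY_INCOHERENT = 3  # class 3: extraposition obligatory


# Example verbs taken from the three classes listed in this section.
VERB_CLASS = {
    "versprechen": Coherence.BOTH,
    "versuchen": Coherence.BOTH,
    "wollen": Coherence.ONLY_COHERENT,
    "möchten": Coherence.ONLY_COHERENT,
    "überreden": Coherence.ONLY_INCOHERENT,
    "überzeugen": Coherence.ONLY_INCOHERENT,
}


def extraposition_possible(cls: Coherence) -> bool:
    """Extraposition into NF is excluded only for purely coherent verbs."""
    return cls is not Coherence.ONLY_COHERENT


def complement_label(verb: str) -> str:
    """Return 'OS' (sentential object) if extraposition is possible, else 'OV' (in VC)."""
    return "OS" if extraposition_possible(VERB_CLASS[verb]) else "OV"


if __name__ == "__main__":
    for verb in ("versuchen", "wollen", "überreden"):
        print(verb, complement_label(verb))
```

Note that class 1 verbs receive OS even when the complement stays in MF, which mirrors the treatment of the umzugehen versteht example.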

If a complement of the verb within the sentential object is located outside the boundaries of that clause, e.g. in the C-field, the secondary edge label REFCONTR gives additional information about the dependency relation (cf. 3.4.6).

4.7.6 AcI Constructions

AcI (accusativus cum infinitivo) verbs are a small group of verba sentiendi (e.g. sehen, hören, fühlen, spüren) which subcategorize for an accusative and an infinitive. The verbs lassen, machen, heißen have a modal-verb-like reading in which they also select an accusative and an infinitive. The infinitive itself subcategorizes for complements according to its valency, but its subject is realized by an accusative, which is the direct object of the AcI verb.

Since AcI constructions are coherent infinitive constructions in which extraposition is not possible (cf. Eisenberg 1999, 2001, p. 355), the AcI is not annotated as a sentential object (*wenn man nur noch hört das eigene Herz schlagen). The infinitive, as the verbal object of the AcI verb, is located in the verb complex, and the accusative is realized as OA in MF.

[Tree: Wenn/KOUS man/PIS nur/ADV noch/ADV das/ART eigene/ADJA Herz/NN schlagen/VVINF hört/VVFIN (das eigene Herz: OA in MF; VC: schlagen OV, hört HD)]

As a consequence of this analysis, we annotate two accusative objects (OA) if the AcI construction comprises a transitive infinitive such as beenden in the following example. Uns functions as its subject and die Diskussion as its direct object. Both are in the accusative case and both are labeled OA.

[Tree: Lassen/VVFIN Sie/PPER uns/PPER die/ART Diskussion/NN jetzt/ADV beenden/VVINF (Sie: ON; uns: OA; die Diskussion: OA; jetzt: MOD; beenden: OV in VC)]

4.7.7 Imperatives

Imperative verbs have only one singular and one plural form and are not inflected for person. Their forms correspond to the second person singular and plural, and they are tagged as VVIMP or VAIMP.

Warte mal!            (warte/VVIMP:s)
instead of
Wartest du mal?       (wartest/VVFIN:2sip)

It is important to keep imperative sentences apart from imperative verbs. An imperative sentence does not need to contain an imperative verb form, as is shown in the following examples:

Warten Sie mal bitte! (warten/VVFIN:3pip)
Bitte warten!         (warten/VVINF:–)
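The distinction between imperative sentences and imperative verb forms can be made explicit by inspecting the POS tag only. The Python sketch below parses the form/TAG:morphology notation used in the examples above; Tagged, parse_tagged and is_imperative_verb are hypothetical names introduced for this illustration, and the notation itself is merely the shorthand of this section, not a treebank file format.

```python
from typing import NamedTuple


class Tagged(NamedTuple):
    form: str   # word form, e.g. "warte"
    pos: str    # part-of-speech tag, e.g. "VVIMP"
    morph: str  # morphological tag, e.g. "s" or "2sip"


def parse_tagged(text: str) -> Tagged:
    """Parse a string like '(warte/VVIMP:s)' into its three parts."""
    body = text.strip().strip("()")
    form, rest = body.split("/", 1)
    pos, morph = rest.split(":", 1)
    return Tagged(form, pos, morph)


def is_imperative_verb(token: Tagged) -> bool:
    """Only VVIMP and VAIMP count as imperative verb forms."""
    return token.pos in {"VVIMP", "VAIMP"}


if __name__ == "__main__":
    for item in ("(warte/VVIMP:s)", "(wartest/VVFIN:2sip)", "(warten/VVFIN:3pip)"):
        token = parse_tagged(item)
        print(token.form, token.pos, is_imperative_verb(token))
```

Warten Sie mal bitte! is therefore an imperative sentence whose verb is nevertheless a finite form (VVFIN), not an imperative verb.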

[Tree: (/$( vgl./VVIMP Seite/NN 32/CARD )/$( (vgl.: HD in LK; Seite 32: OA in MF)]

[Tree: Drum/PROP prüfe/VVIMP ewig/ADJD (Drum: V-MOD in VF; prüfe: HD in LK; ewig: V-MOD in MF)]

Normally, imperative verbs lack a subject, but the addressee can also be mentioned to stress the utterance:

[Tree: Maikäfer/NN flieg/VVIMP .../$( (Maikäfer: NX in DM; flieg: HD in LK)]

[Tree: Lang/ADJD lebe/VVIMP Ned/NE Devine/NE (Lang: V-MOD in VF; Ned Devine: ON in MF)]

[Tree: Sage/VVIMP niemand/PIS ,/$, das/PDS habe/VAFIN den/ART Westen/NN nicht/PTKNEG verändert/VVPP ./$. (niemand: ON in MF; the clause in NF: OS)]

4.7.8 Particle Verbs

Separable verb particles are tagged as PTKVZ and annotated with the edge label VPT:

[Tree: Auch/ADV die/ART Vertreter/NN der/ART AfB/NE stimmten/VVFIN den/ART 86/CARD Millionen/NN zu/PTKVZ ./$. (zu: VPT in VC)]

In verb-final clauses, the particle verb occurs unseparated within the verb complex:

[Tree: Rußland/NE wollte/VMFIN bislang/ADV einer/ART UN-Resolution/NN nur/ADV zustimmen/VVINF (zustimmen: VXINF in VC, edge label OV)]

4.7.9 Verbs with Predicate

Typically, the complement type PRED (predicate) occurs with verbs like sein, haben, scheinen, aussehen, sich anhören, klingen, etc. PRED is annotated if the following conditions apply:

• if it is not possible to determine the case of the constituent in question properly (e.g. gut in Das ist gut.)
• if the constituent in question actually predicates the subject, i.e. the subject is characterized as having the property expressed by PRED (e.g. in Die Ursache war unklar, Die Ursache is characterized by the property of being unclear)
• many PRED verbs are raising verbs (subject without theta-role)
• if als-phrases are selected by the verb, they are labeled as PRED (e.g. Unter dem Motto Kino-Extrem agiert der Regisseur als Filmjockey.)

[Tree: Unter/APPR dem/ART Motto/NN Kino-Extrem/NN agiert/VVFIN der/ART Regisseur/NN als/KOKOM Filmjockey/NN (der Regisseur: ON; als Filmjockey: PRED in MF)]

Some examples of verbs that take predicates: recht sein, recht haben, leid tun, frei sein, fertig sein, sich gut/schlecht treffen, gut/schlecht finden, etc. PRED verbs have to be distinguished carefully from verbs occurring with ordinary modifiers (V-MOD), such as gut passen.

With respect to topological fields, note that PRED usually marks the border between MF and NF, i.e., whatever constituent occurs to the right of PRED belongs to NF. In general, this constituent is an adjunct which PRED does not subcategorize for:

[Tree: Das/PDS ist/VAFIN Politik/NN hier/ADV (Politik: PRED in MF; hier: V-MOD in NF)]

[Tree: es/PPER ist/VAFIN kalt/ADJD an/APPR diesem/PDAT Tag/NN (kalt: PRED in MF; an diesem Tag: V-MOD in NF)]

But there are exceptions in which PRED does not necessarily constitute the border between MF and NF:

• Another constituent may occur between PRED and VC, for instance, if an ambiguous modifier follows PRED.

[Tree: das/PRELS an/APPR diesem/PDAT Abend/NN der/ART schönste/ADJA Platz/NN im/APPRART All/NN war/VAFIN ./$. (relative clause; an diesem Abend: V-MOD; PRED in MF is followed by im All before VC)]

• PRED subcategorizes for the constituent that follows it. Complements of PREDs are always attached to a field since they are assigned a grammatical function within the sentence structure (cf. 8.1):

[Tree diagram (SIMPX): Einer meiner Freunde wurde süchtig nach Nachrichten.]

[Tree diagram (SIMPX): ich bin froh darum]


• Because of the word order rule that pronouns in MF have to precede other constituents, PRED might not be the last element in MF if it is a pronoun:

[Tree diagram (SIMPX): Bravo kann es nicht gewesen sein.]

[Tree diagram (SIMPX): er war es nicht]


4.7.10 Modal Verbs

Modal verbs are always tagged as VMFIN or VMINF regardless of their use as an auxiliary or a main verb. If a modal verb functions as an auxiliary verb, it is projected like any other auxiliary verb. If a modal verb is the main verb of a sentence, verbal modifiers refer to the modal verb in the same way as they refer to other main verbs:

[Tree diagram (SIMPX): "Die wollten die BLG schonen."]

[Tree diagram (SIMPX): Hätte sie sich das nicht alles vorher überlegen können?]

[Tree diagram (SIMPX): Warum Daewoo nach Bremen mußte]


Chapter 5 Attachment Principles for Phrases

5.1 Attachment to Fields

Phrases are attached to the topological field in which they occur. Their edge labels denote their grammatical function within the sentence structure. In LK and VC there can only occur verb forms, separable verbal prefixes, or infinitive particles. LK and VC mark the beginning and the end of MF (cf. 3.2).
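
The restriction on LK and VC can be read as a simple check on part-of-speech tags. The following is a minimal sketch, assuming STTS tags in which all verb tags start with V and in which PTKVZ marks separable verbal prefixes and PTKZU the infinitive particle zu; the function name is hypothetical and not part of any treebank tool.

```python
# Hypothetical helper: tags that may appear in the sentence brackets LK and VC.
# Assumption: STTS tagset (verb tags start with "V"; PTKVZ = separable prefix,
# PTKZU = infinitive particle "zu").
BRACKET_PARTICLES = {"PTKVZ", "PTKZU"}


def allowed_in_bracket(pos_tag: str) -> bool:
    """Return True if a token with this STTS tag may occur in LK or VC."""
    return pos_tag.startswith("V") or pos_tag in BRACKET_PARTICLES


if __name__ == "__main__":
    for tag in ["VVFIN", "VMINF", "PTKVZ", "NN", "PTKNEG"]:
        print(tag, allowed_in_bracket(tag))
```

Such a check only covers the part-of-speech side of the rule; it says nothing about which bracket a given verb form belongs to.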

5.2 Attachment of Ambiguous Complements

The partially free word order and the morphological properties of German can cause ambiguity concerning the grammatical function of a constituent. In the following example, the syntactic structure does not give any information about case assignment. Both noun phrases can be identified as ON or OA:

[Tree diagram (SIMPX): Ein Bad in der Menge verhindert das Sicherheitsgitter. Ein Bad in der Menge is annotated as OA in VF, das Sicherheitsgitter as ON in MF.]


Headlines like the following are lacking the finite verb. Therefore, in the first example it cannot be decided if it is an active or a passive construction, i.e., if the noun phrase is ON or OA. The second example is an active construction, but again the noun phrase can be both, ON or OA:

[Tree diagram (SIMPX): Kriegsverbrecher verurteilt. The noun phrase is annotated as ON.]

[Tree diagram (SIMPX): Prachtwicken gucken. The noun phrase is annotated as OA.]


Since we do not assign specific edge labels for ambiguous complements, we formulate the following preference principle for case assignment:

Preference principle for case assignment: If case assignment is ambiguous, we decide on the more plausible grammatical function or, respectively, on the more plausible sequence of grammatical functions. The main criteria for the decision are the unmarked word order and the semantic content.

Therefore, in the first example above, OA appears in VF whereas ON has its position in MF. For elliptical headlines, we assume a passive construction if the verb in VC is a past participle and an active construction if the verb in VC is an infinitive (cf. 4.7.2 and 7.4).
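
The headline rule can be sketched as a small decision function. This is an illustration only: it assumes STTS tags (past participles end in PP, infinitives in INF) and follows the two headline examples, in which the noun phrase is ON under the passive reading and OA under the active reading. The function name is hypothetical.

```python
# Hypothetical sketch of the preference rule for elliptical headlines.
# Assumption: STTS tags, where past participles are VVPP/VAPP/VMPP and
# infinitives are VVINF/VAINF/VMINF.
def headline_np_label(vc_pos_tag: str) -> str:
    """Edge label for the single noun phrase of a headline without finite verb."""
    if vc_pos_tag.endswith("PP"):
        # Past participle in VC: passive reading assumed, noun phrase is ON.
        return "ON"
    if vc_pos_tag.endswith("INF"):
        # Infinitive in VC: active reading assumed, noun phrase is OA
        # (as in the example "Prachtwicken gucken").
        return "OA"
    raise ValueError(f"rule does not cover tag {vc_pos_tag!r}")


if __name__ == "__main__":
    print(headline_np_label("VVPP"))   # Kriegsverbrecher verurteilt
    print(headline_np_label("VVINF"))  # Prachtwicken gucken
```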

5.3 Modifier Attachment

Modifiers either modify one specific constituent or more than one constituent. The scope of modification can even range over the whole sentence structure. Therefore, they are either unambiguous or ambiguous. An unambiguous constituent that modifies just one other constituent within a tree structure is either adjacent or discontinuous. In the first case, it is immediately attached to the constituent which it modifies, according to the attachment rules for phrases. In the second case, the dependency, which can even go beyond the border of topological fields, is indicated by X-MOD edge labels, which express the non-ambiguity of the modifier (e.g. OA-MOD is the modifier of OA). Thus, edge labels like OA-MOD, V-MOD, OPP-MOD, MOD-MOD, etc. express that the respective constituent modifies only one other constituent in the sentence (OA, V, OPP, a modifier, etc.) which is not adjacent:

[Tree diagram (SIMPX): Für diese Behauptung hat Beckmeyer bisher keinen Nachweis geliefert. The PX Für diese Behauptung in VF is labeled OA-MOD.]


If a modifying constituent is ambiguous (i.e. it modifies more than one constituent, the entire sentence, or a constituent that occurred in previous sentences), it is attached to its topological field and given the ambiguous edge label MOD to preserve ambiguity. In the following example an der Uni either modifies the accusative object den Entwicklungsprozeß or the verb fortsetzen:

[Tree diagram (SIMPX): Phantasievoll und energisch will er den Entwicklungsprozeß an der Uni fortsetzen]


We formulate the following definitions for MOD and X-MOD:

Definition of MOD: A constituent is called MOD if it cannot be assigned a more specific label, either because it is ambiguous or because there is no more specific label (e.g. for sentence modifiers or for constituents that refer to some sentence-external expression). Sometimes it is difficult to determine whether a modifier is definite or not. In cases of doubt, modifiers are marked as ambiguous (MOD) rather than as definite modifiers.

Definition of X-MOD: X is a variable that can be replaced by labels for syntactic categories like OA, OPP, MOD, V. X-MOD marks long-distance modification which is unambiguous, e.g. relative clauses (Aber es gäbe (intelligente Lösungen OA), (die kein Geld kosten OA-MOD)).
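
For readers who process the edge labels programmatically, the X-MOD convention can be decoded as follows. This is a minimal sketch, assuming that labels are plain strings with an ASCII hyphen (exported data may use a different dash character); the function name is hypothetical.

```python
from typing import Optional


def modified_function(edge_label: str) -> Optional[str]:
    """Return X for an X-MOD label, or None for any other label.

    Plain MOD (the ambiguous modifier) has no X part and yields None.
    Assumption: labels use an ASCII hyphen, e.g. "OA-MOD".
    """
    base, sep, suffix = edge_label.rpartition("-")
    if sep and suffix == "MOD" and base:
        return base
    return None


if __name__ == "__main__":
    for label in ["OA-MOD", "V-MOD", "MOD-MOD", "MOD", "ON"]:
        print(label, "->", modified_function(label))
```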

Typical MODs and V-MODs: Generally, modifying subclauses (e.g. Katastrophenstimmung herrscht erst, [wenn nichts mehr zu verheimlichen ist] (MOD).) are MOD because they modify the complete main clause. Modifying particles and adverbs like da, dann, auch, eigentlich, ja, vielleicht, natürlich usually show attachment ambiguity and therefore are annotated as MOD. Only if they unambiguously express the modification of the verb (e.g. Das Buch liegt da. or Er geht auch.) do they carry the edge label V-MOD. Pronominal adverbs (PROP) like dabei, dafür, trotzdem, deswegen, hierauf, etc. are either ambiguous (e.g. Dabei (MOD) erscheinen Sie in anderen Verlagen.) or unambiguous (e.g. Er achtet dabei (V-MOD) auf alles.). Non-pronominal adverbs such as vorher, später, etc. in most cases give temporal or local information. Thus, they are V-MOD rather than MOD.

5.3.1 Modifier Attachment in the Initial Field

Since only one constituent is allowed in the initial field, all elements preceding and following the head are attached as premodifiers (low attachment) or postmodifiers (high attachment) according to the attachment rules explained in 4.1.

[Tree diagram (SIMPX): Auch für Rumänien selbst ist der Papst-Besuch von großer Bedeutung.]

5.3.2 Attachment across Punctuation Marks

The punctuation marks :, - and ... separate a syntactic construction within a unit if there is no syntactic dependency relation between the two parts (cf. 3.4.5), as in the following:

[Tree diagram: ASB lädt ein : Tag der offenen Tür]

[Tree diagram (NX): Sein Zuhause : stilvolles Entertainment.]


Attachment is necessary if the part following the punctuation mark has a grammatical function within the sentence structure:

[Tree diagram (SIMPX): Er meinte : " das ist meine Geschichte ' . " The quoted clause is attached as OS in NF.]

[Tree diagram (SIMPX): Doch Zweifel blieben - sowohl bei Joergensen selbst als auch in der Redaktion der taz. The coordinated PX following the dash is attached as V-MOD in NF.]


5.3.3 Ambiguous Modifiers in Isolated Phrases

Since isolated phrases (cf. 3.4.5) do not consist of topological fields, ambiguous modifiers (MOD) have to be attached to the phrase itself. The isolated phrase is projected one level higher and the modifier is attached on this higher level. Thus, the information about ambiguity can be preserved even without topological fields or explicit MOD labelling, just by the existence of yet another projection level of the phrase.

The overall attachment strategy has been chosen in order to keep syntactic structure flat and to be able to preserve attachment ambiguity where necessary. In the following examples, so may refer to something that is implicit or has been mentioned before:

[Tree diagram (NX): so Winkler]


If there is more than one ambiguous modifier in an isolated phrase, all of them are attached on the next higher level. The mother node of this isolated phrase is marked with the node label of the modified phrase.

[Tree diagram (NX): Zunächst natürlich Durcheinander.]

[Tree diagram (NX): vielleicht mal ein Mini-Hit da]


Chapter 6 The Annotation of Sentences

The approach of topological fields supports the flat clustering principle inasmuch as MF and NF allow for more than one constituent to be attached to the same field node. The field nodes form a level of annotation between the phrase level and the sentence level. The last step to complete a sentence structure is to attach the field nodes to the highest annotation level of the whole structure: the root node. In the following sections, the annotation of sentence structures will be demonstrated.

6.1 Sentence Initial Fields

6.1.1 The C-Field in Verb-Final Clauses

The C-field (complementizer field) is the field for subordinating conjunctions KOUS (e.g. daß, wenn, da, weil, ob), KOUI (e.g. um (+zu)), relative pronouns (PRELS), interrogative (PWAV) pronouns and (complex) interrogative or relative phrases. Thus, it only occurs in verb-final clauses, except for comparison clauses with the conjunction als. In case of a conjunction, we directly project to the C-field:

[Tree diagram (SIMPX): Wenn da was gebucht worden ist]


There are conjunctions in German which consist of two elements (e.g. so daß and als ob). Both of them are also directly attached to the C-field, while none of them carries a head label.

[Tree diagram (SIMPX): so daß der Maschinenpark heute für lukrative Sonderanfertigungen unbrauchbar ist]


Since C generally does not contain more than one constituent, the adverb auch in the following example is not supposed to occur in the C-field together with the conjunction wenn. The wenn-clause is annotated as the modifier of the adverbial phrase auch, i.e., the adverbial phrase subcategorizes for the verb-final clause.

[Tree diagram (ADVX): auch, wenn man wie ARD und ZDF relativ viele Teams überall postieren kann]


If the constituent in the C-field is a pronoun or a complex phrase, it is first projected to the phrase level and then projected to the C-field. The edge label below the C-Field denotes the grammatical function of this constituent.

[Tree diagram (SIMPX): Wieviel da monatlich fällig wird]

[Tree diagram (R-SIMPX): zu deren Ablaß die tonale Ebene natürlich nicht ausreicht]


6.1.2 The C-Field in Verb-Second Clauses

Only comparison clauses with als allow for a C-field and a left sentence bracket in the same clause:

[Tree diagram (SIMPX): als sei das deren Pflicht]


6.1.3 The KOORD-Field in all Clause Types

The KOORD-field is optionally the left-most field of all clause types (V-1, V-2, V-end). Therefore, it can only occur at the beginning of a syntactic unit (cf. 3.4.3). For verb-second clauses, it can be regarded as an alternative field to the PARORD-field. The KOORD-field contains coordinative particles like und, oder, aber, etc. (cf. Höhle (1986)). Here are two examples of different clause types:

[Tree diagram (SIMPX): Und Koring war früher einmal in schiefes Licht geraten]

[Tree diagram (SIMPX): Oder ist Bremerhaven nicht günstiger?]


6.1.4 The PARORD-Field in Verb-Second Clauses

PARORD is an alternative field to KOORD for verb-second clauses only. Typical PARORD expressions are denn and weil [1]:

[Tree diagram (SIMPX): Denn auch die gehen davon aus]

[1] weil can occur in verb-second and in verb-final clauses. In the first case, it is in the PARORD-field; in the latter case, it belongs to the C-field.
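
The field assignment for weil described in the footnote depends only on the clause type. A minimal sketch, assuming the clause type is already known and is written with the abbreviations V-2 and V-end used in 6.1.3; the function name is hypothetical.

```python
# Hypothetical sketch: which field hosts "weil", given the clause type.
# Assumption: clause types are given as the strings "V-2" and "V-end".
def field_for_weil(clause_type: str) -> str:
    """Return the topological field of weil for the given clause type."""
    if clause_type == "V-2":
        return "PARORD"  # weil introducing a verb-second clause
    if clause_type == "V-end":
        return "C"       # weil as subordinating conjunction (KOUS)
    raise ValueError(f"weil is not covered for clause type {clause_type!r}")


if __name__ == "__main__":
    print(field_for_weil("V-2"))
    print(field_for_weil("V-end"))
```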

6.1.5 Resumptive Constructions: The LV-Field

Resumptive constructions are analyzed as suggested by Höhle (1986) and Kathol (1995), by using the field LV (Linksversetzung), which is located on the left side of VF. In general, the LV-field is not restricted to one constituent. The typical feature of a resumptive construction is that there is a (pronominal) constituent somewhere in the sentence, on the right-hand side of the LV-field, which refers back to the expression within the LV-field. Therefore, we use the X-MOD label to indicate this kind of long-distance dependency.

[Tree diagram (SIMPX): Vom introvertierten Einzelgänger zum stilbildenden Popstar, das muß ja auch erst einmal verkraftet werden. The coordinated PX in LV is labeled ON-MOD; das in VF is ON.]

[Tree diagram (SIMPX): Doch wie es weitergehen soll, darüber herrscht kein Konsens. The clause in LV is labeled FOPP-MOD; darüber is FOPP.]


Grammatical functions within an LV-construction are assigned according to the following principle:
• The LV-constituent is licensed by some (pronominal) constituent within the core sentence. The core sentence extends from VF to NF. Therefore, the licensing constituent is considered to be modified by the constituent within the LV-field. For instance, ON-MOD is licensed by ON, as in the first example above, which also accords well with the assumption that the original position of the subject in verb-second clauses is VF.

In constructions with wenn ... dann ..., the wenn-clause, which is semantically a precondition to the dann-clause, is in the LV-field in correlation with dann. Therefore, dann (MOD) refers back to the wenn-clause (MOD-MOD):

[Tree diagram (SIMPX): Wenn da was gebucht worden ist, dann ist das nicht in Ordnung. The wenn-clause in LV is labeled MOD-MOD.]


If dann is not present in the matrix clause, the wenn-clause occurs in VF. In this case, the wenn-clause is labeled as MOD because there is no explicit correlating constituent. Rather, it refers to the whole matrix clause, e.g. (Wenn da was gebucht worden ist (MOD), ist das nicht in Ordnung.)

6.2 Questions

6.2.1 W-Questions

In general, w-questions are verb-second clauses with interrogative pronouns in VF. The problem here is to decide on the syntactic category of the interrogative phrase. We follow the strategy of assigning PX to all PWAVs that compositionally comprise a preposition, such as wobei, wofür, wogegen, woher, womit, woran, worauf, wovon, wozu, and also to causal PWAVs such as warum, wieso, weshalb. The (non-compositional) PWAVs wann and wo are analysed as ADVX. The annotation of wie is still work in progress. In the current release it is annotated as ADJX instead of ADVX if it directly modifies an adjective or adverb itself.

[Tree diagram (SIMPX): Warum machen wir den Computer nicht einfach aus? Warum is projected to PX and labeled V-MOD in VF.]

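
The category assignment for interrogative adverbs can be summarized as a lookup. The sketch below is illustrative only: the word lists contain just the examples named in the text, the treatment of wie reflects the interim rule of the current release, and the function and parameter names are hypothetical.

```python
# Hypothetical lookup for the phrase category of PWAV tokens.
# Only the words explicitly listed in the text are covered.
PX_PWAVS = {
    "wobei", "wofür", "wogegen", "woher", "womit", "woran",
    "worauf", "wovon", "wozu",      # compositionally contain a preposition
    "warum", "wieso", "weshalb",    # causal PWAVs
}
ADVX_PWAVS = {"wann", "wo"}         # non-compositional PWAVs


def pwav_category(word: str, modifies_adj_or_adv: bool = False) -> str:
    """Return the phrase category (PX, ADVX or ADJX) projected from a PWAV."""
    w = word.lower()
    if w in PX_PWAVS:
        return "PX"
    if w in ADVX_PWAVS:
        return "ADVX"
    if w == "wie":
        # Interim rule: ADJX if wie directly modifies an adjective or adverb.
        return "ADJX" if modifies_adj_or_adv else "ADVX"
    raise KeyError(f"no rule listed for PWAV {word!r}")


if __name__ == "__main__":
    print(pwav_category("Warum"))
    print(pwav_category("wo"))
    print(pwav_category("wie", modifies_adj_or_adv=True))
```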

6.2.2 Yes - No Questions

Yes - no questions may occur in various forms, but the most typical form is the verb-first clause:

[Tree diagram (SIMPX): Veruntreute die AWO Spendengeld?]


Otherwise, a question mark at the end of a verb-second or verb-final clause indicates that it is actually meant as a question:

[Tree diagram (SIMPX): Das ist doch ganz klar?]

[Tree diagram (SIMPX): Ob Ampler auf Sieg fahre?]


6.3 Relative Clauses

In relative clauses (R-SIMPX), the relative pronoun occurs in the C-field. It is first projected to the phrase level before it is attached to the C node. The relative clause itself is located in NF, as in the following example, if no other constituent follows. Its edge label shows to which constituent of the matrix clause it is related. OA-MOD, for example, indicates that the relative clause refers to OA:

[Tree diagram (SIMPX): Aber es gäbe intelligente Lösungen, die kein Geld kosten. The R-SIMPX in NF is labeled OA-MOD.]


If the head noun phrase of the relative clause is the noun phrase of a prepositional phrase or a postmodifier within a complex phrase, the relative clause is labeled as MOD. Additionally, there is a secondary edge label named REFINT (cf. 3.4.6) from the head noun NX to the relative clause:

[Tree diagram (SIMPX): Ein Bettenrost mutiert zu einem Gefängnisgitter, hinter dem freier Himmel lockt. The R-SIMPX in NF is labeled MOD, with a secondary REFINT edge from the head NX.]

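
The two labeling cases for relative clauses in NF can be summarized in a few lines. This is a sketch under simplifying assumptions: the grammatical function of the head noun phrase (or of the phrase containing it) is already known, and a single boolean says whether the head noun phrase is embedded in a prepositional phrase or is a postmodifier within a complex phrase. All names are hypothetical.

```python
from typing import Optional, Tuple


def relative_clause_labels(
    head_function: str, head_is_embedded: bool
) -> Tuple[str, Optional[str]]:
    """Return (primary edge label, secondary edge label) for an R-SIMPX in NF.

    head_function: function label of the constituent the clause relates to.
    head_is_embedded: True if the head NX sits inside a PX or is a
    postmodifier within a complex phrase.
    """
    if head_is_embedded:
        # The head cannot be addressed by an X-MOD label, so the clause is
        # MOD and a secondary REFINT edge links it to the head NX.
        return ("MOD", "REFINT")
    return (f"{head_function}-MOD", None)


if __name__ == "__main__":
    print(relative_clause_labels("OA", head_is_embedded=False))
    print(relative_clause_labels("OPP", head_is_embedded=True))
```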

The position of the relative clause in NF is justified by the fact that it does not necessarily occur as an immediate constituent located on the right side of the noun phrase to which it refers. For example, a verb complex can occur between the noun phrase and the relative clause (Der Bettenrost ist zu einem Gefängnisgitter mutiert, hinter dem freier Himmel lockt.). In sentences like this, the complexity of the noun phrase (NP + relative clause) is important. This so-called heaviness follows Behaghel's first physical law (Behaghel 1932): complex noun phrases tend to find a position at the end of the sentence even if they deviate from their basic order. If the relative clause does not follow the noun phrase immediately, its unmarked position is in NF. Unless there is strong evidence for a position in MF, the relative clause is located in NF. If the relative clause and its head noun phrase are adjacent constituents in VF or MF, the relative clause modifies the noun phrase directly as a postmodifier.

[Tree diagram (R-SIMPX): die die AWO, wo er Kreisvorsitzender ist, prüfte.]

6.3.1 Event-modifying Relative Clauses

Relative clauses that modify an event which is not expressed by a nominal expression are annotated as SIMPX.

[Tree diagram (SIMPX): ... haben den Text ins Niederdeutsche übertragen, was mit jeder Szene mehr Sinn macht. The was-clause is a SIMPX labeled MOD in NF.]

6.3.2 Independent Relative Clauses

Independent relative clauses (also 'nominal relative clauses', in German 'Freie Relativsätze') do not modify a head word but substitute an argument or adjunct in the clause. Consequently, they are labeled SIMPX on sentential level (instead of R-SIMPX) and they function as (sentential) subject (ON) or sentential object (OS). The latter is not uncontroversial since they are distributed like non-sentential, nominal arguments with respect to subcategorization restrictions. The relative pronoun used in independent relative clauses normally belongs to the w-class of relative pronouns such as wer or was and is tagged with the STTS tag PWS.

[Tree diagram (SIMPX): Wer einfach gut unterhalten werden will, kommt auf seine Kosten. The independent relative clause is ON in VF.]

[Tree diagram (SIMPX): Manchmal muß man klar sagen, was man will. The independent relative clause is OS in NF.]


Independent relative clauses introduced by wie are currently annotated in a different manner. Wie is analyzed as a subordinating conjunction (KOUS). This type of structure is to be revised in a subsequent release.

6.4 Coordination

Coordination is a syntactic phenomenon that occurs on the following annotation levels: phrase level, field level, and sentence level. Within coordinations, the conjuncts are first projected to their phrase, field, or clause level. In a second step, they are attached to their mother node, which is n-ary branching (conjunctions between the conjuncts). This scheme is the same for all syntactic categories. The edges between the mother node and the conjuncts of the coordination are labeled KONJ. This edge label supports the distinction between conjuncts, modifiers, and conjunctions within complex conjunctions (cf. 6.4.3), as well as the distinction between coordinations and elliptical constructions (cf. 6.5).

In contrast to coordinating conjunctions in the KOORD-field, coordinating conjunctions in coordinations (und, oder, etc.) are directly attached to the mother node of the conjuncts. The class of coordinating conjunctions consists of single conjunctions, e.g. und, oder, aber, als, as well as of complex conjunctions, e.g. entweder oder, weder noch, sowohl als. Generally, coordinating conjunctions may coordinate constituents of any category. Moreover, they can form asymmetric coordinations in which the conjuncts belong to different syntactic categories (cf. 6.4.2) [2]. In order to distinguish conjunctions from conjuncts within a coordination, their edge labels are empty. In the following, coordination on all annotation levels as well as specific cases of coordination, e.g. split coordinations, will be demonstrated.

6.4.1 Coordination of Phrases

Noun Phrases

[Tree diagram (NX): Ende der Kämpfe und Verurteilung der Selbstmandatierung]

Prepositional Phrases

[Tree diagram (PX): am Arbeitsplatz oder in der Familie]

[2] If bis is used as a conjunction, as in 10.000 bis (KON) 20.000 koreanischen Daewoo PKW, it is tagged as KON. Note, however, that von ... bis ... phrases are treated differently (cf. 4.4.1).

Adjectival Phrases

[Tree diagram (NX): Heimliche und illegale Pioniertat]

[Tree diagram (SIMPX): Das klingt anmaßend, pathosschwer, elektrisch, laut. The four ADJX conjuncts form a coordinated ADJX labeled PRED.]

Adverbial Phrases

[Tree diagram (ADVX): solo oder zusammen]

6.4.2 Asymmetric Coordination

Since constituents of different syntactic categories can be coordinated, a label has to be chosen for the mother node of the coordination. As the default strategy, the syntactic category of the leftmost conjunct is chosen as the category of the entire coordination:

[Tree diagram: ADVX] heute/ADV ,/$, So./NN ,/$, Mo./NN ,/$, u./KON Di./NN

[Tree diagram: SIMPX] Die/ART Farbpalette/NN ist/VAFIN zart/ADJD und/KON geschmackvoll/ADJD ,/$, von/APPR Bordeaux/NN bis/APPR Flieder/NN ,/$, auch/ADV Orange/NN oder/KON Rosa/NN ./$.

[Tree diagram: SIMPX] Ich/PPER bin/VAFIN weder/KON ausgesprochene/ADJA Meat-Loaf-FanIn/NN noch/KON in/APPR dem/ART Konzert/NN gewesen/VAPP ./$.
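The default strategy for asymmetric coordination can be written down in a few lines. The following is only a minimal sketch: the function name `coordination_category` and the representation of conjuncts as a list of category strings are assumptions made for illustration and are not part of the treebank tooling.

```python
def coordination_category(conjunct_categories):
    """Return the node label for the mother node of a coordination.

    Default strategy described in this section: the syntactic category
    of the leftmost conjunct becomes the category of the coordination.
    """
    if not conjunct_categories:
        raise ValueError("a coordination needs at least one conjunct")
    return conjunct_categories[0]


# An ADVX conjunct followed by NX conjuncts yields an ADVX coordination.
print(coordination_category(["ADVX", "NX", "NX", "NX"]))
```

The sketch covers only the default; it does not model any exceptions the annotators may make in individual cases.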

6.4.3 Coordinations with Complex Conjunctions

In a coordination with a complex conjunction, the conjuncts and the parts of the conjunction are likewise attached on the same level, following the rules for coordination given above. Both parts of complex conjunctions such as entweder ... oder and sowohl ... als are tagged as KON. The latter usually occurs together with the adverb auch, which is tagged as ADV, projected to the phrase level, and then attached to the mother node of the coordination. The same applies to nicht in coordinations with sondern: sondern is tagged as KON, whereas nicht is always tagged as PTKNEG:

[Tree diagram: SIMPX] Immerhin/ADV wird/VAFIN es/PPER nicht/PTKNEG noch/ADV obendrein/ADV kalt/ADJD ,/$, sondern/KON bei/APPR 20/CARD Grad/NN erträglich/ADJD ./$.

[Tree diagram: SIMPX] Der/ART Papst-Besuch/NN in/APPR Bukarest/NE spielt/VVFIN sowohl/KON außenpolitisch/ADJD als/KON auch/ADV für/APPR Rumänien/NE selbst/ADV eine/ART große/ADJA Rolle/NN ./$.

[Tree diagram: SIMPX] Entweder/KON haben/VAFIN die/PDS überhaupt/ADV nicht/PTKNEG begriffen/VVPP ,/$, worum/PWAV es/PPER geht/VVFIN ,/$, oder/KON es/PPER ist/VAFIN ihnen/PPER egal/ADJD ./$.

6.4.4 Coordinations with Truncated Words

In contrast to complete lexical entries, a truncated word is directly attached to the conjunct and the conjunction. Neither the truncated word nor the second conjunct is first projected to the phrase level. Their edge label is KONJ. The conjuncts project to phrasal nodes only if both conjuncts contain a determiner or both are premodified. In this case, only the complete lexical entry carries a head label. The truncated word does not receive morphological annotation.

[Tree diagram: NX] Bau-/TRUNC und/KON Verkehrsplanungen/NN

[Tree diagram: PX] bei/APPR einer/ART SPD-/TRUNC oder/KON einer/ART CDU-Veranstaltung/NN

[Tree diagram: NX] Noch-Frauen-/TRUNC und/KON bald/ADV Fußballsender/NN

In the case of complex conjunctions, the conjuncts are annotated with a shallow structure, in the same manner as in less complex structures.

[Tree diagram: ADJX] sowohl/KON kultur-/TRUNC als/KON auch/ADV stadtentwicklungspolitisch/ADJD

[Tree diagram: NX] nicht/PTKNEG die/ART Sozial-/TRUNC ,/$, sondern/KON die/ART Bildungsbehörde/NN

Word-initial truncs differ from truncated words that include the second part of a word. The latter are treated like complete lexical heads because they contain the head morpheme of the complex word.

[Tree diagram: NX] Originaltitel/NN und/KON -fassung/NN von/APPR "/$( 8/CARD MM/NN "/$( ./$.

6.4.5 Attachment Principles of Coordination within Phrases

If two or more nominal conjuncts share a common determiner and/or adjectival phrase, the conjuncts are first projected to their phrase level; the determiner or the adjectival phrase is then attached to the coordination on a higher level, in accordance with the high attachment principle. Thus, the modification scope comprises the entire coordination. The coordinated part is assigned the head function.

[Tree diagram: NX] den/ART Angestellten/NN und/KON Beamten/NN

[Tree diagram: NX] Die/ART türkischen/ADJA Instrumente/NN und/KON Harmonien/NN

6.4.6 Coordination of Topological Fields

The conjuncts of a coordination of topological fields are either single fields (cf. 6.4.4) or combinations of fields. Possible combinations are, for instance, (MF + VC), (LK + MF), and (LK + MF + VC). The node label for these conjuncts is FKONJ (conjunct consisting of fields), and the mother node of a coordination of field conjuncts is FKOORD. A coordination of field conjuncts involves the following annotation steps:

1. The constituents are attached to the fields in which they occur (MF, VC, NF, etc.).
2. Each conjunct (a concatenation of fields or a single field) is labeled as FKONJ.
3. The conjuncts are attached to the general coordination field FKOORD.

[Tree diagram: SIMPX] Wir/PPER glauben/VVFIN an/APPR die/ART totale/ADJA Gegenwart/NN und/KON tun/VVFIN hier/ADV und/KON jetzt/ADV alles/PIS ,/$, was/PRELS wir/PPER können/VMFIN ./$.

Often, the subject of the sentence occurs only in the left field conjunct:

[Tree diagram: SIMPX] Nun/ADV sollen/VMFIN Leute/NN in/APPR zwei/CARD Wochen/NN zu/APPR Hobbypolizisten/NN werden/VAINF und/KON eine/ART Waffe/NN bekommen/VVINF ./$.

A coordination of fields may also be an embedded structure. In this case, FKOORD also functions as a conjunct label:

[Tree diagram: SIMPX] Die/ART Älteren/NN sind/VAFIN teurer/ADJD ,/$, haben/VAFIN familiäre/ADJA Verpflichtungen/NN und/KON oft/ADV ein/ART Haus/NN abzuzahlen/VVIZU ./$.
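The three annotation steps for field coordination listed in this section can be illustrated with a small data structure. This is a hedged sketch, not the treebank's own tooling: the `Node` class and the function `coordinate_fields` are hypothetical names, the KONJ edge label on each FKONJ is taken from the general coordination rules, and the sketch follows step 2 literally by wrapping every conjunct, including a single field, in an FKONJ node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                      # node label, e.g. "MF", "FKONJ", "FKOORD"
    edge: str = "-"                 # edge label to the mother node
    children: list = field(default_factory=list)


def coordinate_fields(conjuncts):
    """Build an FKOORD node from field conjuncts.

    `conjuncts` is a list of conjuncts; each conjunct is a list of field
    nodes. Step 1 is assumed to be done already: the constituents sit in
    the fields in which they occur.
    """
    fkoord = Node("FKOORD")
    for fields in conjuncts:
        # Step 2: each conjunct is labeled FKONJ.
        fkonj = Node("FKONJ", edge="KONJ", children=list(fields))
        # Step 3: the conjunct is attached to the coordination field.
        fkoord.children.append(fkonj)
    return fkoord


tree = coordinate_fields([
    [Node("LK"), Node("MF")],
    [Node("LK"), Node("MF"), Node("NF")],
])
print([(c.label, c.edge, [f.label for f in c.children]) for c in tree.children])
```

An embedded coordination could be modeled in the same way by passing an FKOORD node as one of the elements of a conjunct.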

6.4.7 Attachment of Ambiguous Modifiers in Coordination

Within phrases, the modification scope of a premodifier can be ambiguous. In such cases, high attachment is applied in order to preserve the ambiguity. In the following example, the adverb modifies the coordination of adjectives rather than only the first adjective:

[Tree diagram: ADJX] Viel/ADV größer/ADJD und/KON brutaler/ADJD

Modifying constituents are attached to a conjunct rather than to a field if their modification scope is limited to the conjunct.

[Tree diagram: SIMPX] Wir/PPER glauben/VVFIN nicht/PTKNEG an/APPR die/ART Vergangenheit/NN und/KON nicht/PTKNEG an/APPR die/ART Zukunft/NN ./$.

In coordinations with complex conjunctions, too, attachment on the phrase level is applied if possible.

[Tree diagram: SIMPX] Radunskis/NE Sprecher/NN ,/$, Axel/NE Wallrabenstein/NE ,/$, wollte/VMFIN die/ART Entscheidung/NN weder/KON bestätigen/VVINF noch/KON dementieren/VVINF

[Tree diagram: SIMPX] Die/ART Lufthansa/NE hatte/VAFIN nicht/PTKNEG etwa/ADV unsere/PPOSAT Lektüre/NN ,/$, sondern/KON gleich/ADV das/ART ganze/ADJA Flugzeug/NN rationiert/VVPP ./$.

If a conjunct contains more than one constituent, each with its own grammatical function, these constituents are first attached to the respective field node. Then the fields are coordinated:

[Tree diagram: SIMPX] Immerhin/ADV wird/VAFIN es/PPER nicht/PTKNEG noch/ADV obendrein/ADV kalt/ADJD ,/$, sondern/KON bei/APPR 20/CARD Grad/NN erträglich/ADJD ./$.

6.4.8 Coordination of Sentences

In accordance with the longest match principle, complete sentences are coordinated as paratactic constructions when they belong to the same syntactic unit (cf. 3.4.3), i.e., when they are joined by a conjunction, a comma, or a dash:

[Tree diagram: SIMPX] Nashorn-Bürgermeister/NN Henning/NE Storchbein/NE setzt/VVFIN auf/APPR die/ART Sammlung/NN der/ART positiven/ADJA Kräfte/NN und/KON prompt/ADJD wird/VAFIN da/ADV gepöbelt/VVPP ./$.

[Tree diagram: SIMPX] So/ADV aber/ADV blieb/VVFIN alles/PIS beim/APPRART alten/NN [punctuation token tagged $(, not recoverable] nur/ADV der/ART Innensenator/NN ist/VAFIN ein/ART neuer/ADJA ./$.

A coordination may also consist of two sentences in which the subject of the whole construction occurs only in the left conjunct.

[Tree diagram: SIMPX] Ohne/KOUS sie/PPER gesehen/VVPP zu/PTKZU haben/VAINF ?/$. ,/$, kontert/VVFIN der/ART Popstar/NN und/KON wirft/VVFIN einen/ART Blick/NN nach/APPR rechts/ADV ./$.

Subclauses (either in VF or in NF), with or even without a conjunction, can also be coordinated.

[Tree diagram: SIMPX] Unklar/ADJD ist/VAFIN aber/ADV noch/ADV ,/$, wie/KOUS diese/PDAT Leistung/NN beurteilt/VVPP werden/VAINF kann/VMFIN -/$( und/KON wer/PWS dafür/PROP zuständig/ADJD sein/VAINF soll/VMFIN ./$.

6.4.9 Paratactic Constructions with denn and weil

Paratactic constructions consisting of verb-second clauses conjoined by denn or weil are treated as coordinations of equal conjuncts; these conjunctions also occur in the PARORD field at the beginning of a sentence, and the weil-clause is verb-second rather than verb-final. In order to distinguish such coordinations, whose conjunction belongs to the PARORD field, from the coordinations of sentences described above, these paratactic constructions are labeled P-SIMPX instead of SIMPX.

[Tree diagram: P-SIMPX] Kilos/NN und/KON Fitneß/NN sollen/VMFIN stimmen/VVINF ,/$, denn/KON Wesemann/NE will/VMFIN die/ART Tour/NE de/NE France/NE gewinnen/VVINF ./$.
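The labeling distinction between ordinary sentence coordination and paratactic constructions with denn and weil might be sketched as follows. The function name `sentence_coordination_label` and the set `PARORD_CONJUNCTIONS` are hypothetical, and the sketch assumes the caller has already established that both conjuncts are verb-second clauses; a verb-final weil-clause is a subordinate clause and is outside its scope.

```python
# Conjunctions that this section names as PARORD conjunctions.
PARORD_CONJUNCTIONS = {"denn", "weil"}


def sentence_coordination_label(conjunction):
    """Pick the mother node label for a coordination of verb-second clauses."""
    if conjunction.lower() in PARORD_CONJUNCTIONS:
        return "P-SIMPX"
    return "SIMPX"


for word in ("und", "denn", "weil"):
    print(word, "->", sentence_coordination_label(word))
```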

6.4.10 Conjunctions Occurring with Isolated Phrases

If a conjunct occurs in isolation with a conjunction, high attachment is applied as in complete coordinations. However, an isolated conjunct is annotated as the head of the construction (HD instead of KONJ).

[Tree diagram: ADVX] und/KON jetzt/ADV

[Tree diagram: ADVX] Oder/KON eben/ADV nicht/PTKNEG ./$.

If there are modifiers that do not modify the conjunct itself, because they are ambiguous or might modify something other than the conjunct, they are attached on the same (high) level as the conjunction:

[Tree diagram: NX] Und/KON das/PDS auch/ADV noch/ADV ohne/APPR Mehrvergütung/NN ./$.

[Tree diagram: NX] und/KON damit/PROP auch/ADV die/ART Nervosität/NN im/APPRART Nato-Hauptquartier/NN ./$.

6.4.11 Split Coordinations

Closely related to isolated conjuncts are split coordinations. Generally, the left conjunct of a split coordination is located in MF, in rare cases in VF, and the right conjunct occurs in NF. In order to express the relation between them, the left conjunct carries the label of its grammatical function (ON, OA, OD, etc.), whereas the right conjunct carries a label denoting that it is the conjunct of this grammatical function (e.g. ONK, OAK, ODK). In asymmetric coordination, the syntactic category of the second split conjunct determines the syntactic category one level higher up:

[Tree diagram: SIMPX] Jedes/PIDAT Ja-Wort/NN zieht/VVFIN Applaus/NN nach/APPR sich/PRF ,/$, Unterschriften/NN ,/$, Küsse/NN ,/$, Händeschütteln/NN ./$.

[Tree diagram: SIMPX] Selbstverständlich/ADV hat/VAFIN nicht/PTKNEG Karin/NE Jöns/NE die/ART Flächen/NN gebucht/VVPP ,/$, sondern/KON die/ART SPD/NE ./$.

[Tree diagram: SIMPX] Lausbuben/NN sind/VAFIN und/KON bleiben/VVFIN sie/PPER und/KON unwiderstehlich/ADJD ./$.
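The relation between the labels of the two split conjuncts can be stated as a one-line rule. The sketch below is illustrative only: the function name `split_conjunct_label` is hypothetical, and the assumption that the trailing "K" pattern extends to function labels other than the three named in the text (ON, OA, OD) is not confirmed by this section.

```python
def split_conjunct_label(function_label):
    """Label for the right conjunct of a split coordination.

    The left conjunct keeps its grammatical function label (e.g. "OA");
    the right conjunct carries the same label with a trailing "K".
    """
    return function_label + "K"


for left in ("ON", "OA", "OD"):
    print(left, "->", split_conjunct_label(left))
```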

6.5 Elliptical Constructions

In elliptical constructions, syntactically necessary linguistic elements are missing; they can be reconstructed from the context or the speech situation. Elliptical constructions appear on the phrase level as well as on the sentence level. The model of topological fields does not make any assumptions about dependency relations, but it allows topological fields to be left empty. The scheme of topological fields is therefore an appropriate model for describing elliptical sentence constructions, because neither crossing branches nor traces are needed to annotate the surface structure of a sentence.

In elliptical phrases, the head word is missing. They are annotated like phrases without a head. Therefore, the edge labels of an elliptical phrase are empty:

[Tree diagram: SIMPX] die/ART irischen/ADJA Hinweisschilder/NN seien/VAFIN den/ART walisischen/ADJA ziemlich/ADV ähnlich/ADJD

[Tree diagram: PX] in/APPR und/KON um/APPR Berlin/NE

[Tree diagram: PX] vom/APPRART 15./ADJA 4./ADJA 99/CARD

[Tree diagram: SIMPX] Die/ART FDP/NE hat/VAFIN in/APPR Thüringen/NE knapp/ADV 4.000/CARD ,/$, in/APPR Sachsen/NE 3.300/CARD und/KON in/APPR Brandenburg/NE 2.000/CARD Mitglieder/NN

[Tree diagram: SIMPX] Ob/KOUS der/ART Senat/NN nun/ADV weniger/PIAT Straßen/NN reinigen/VVINF lassen/VVINF oder/KON flächendeckend/ADJD Parkgebühren/NN kassieren/VVINF will/VMFIN

In elliptical sentence constructions, specific topological fields are not occupied. All constituents are attached to the appropriate field. In the first example, LK is missing in the second conjunct. In the second example, the subject is in NF and the main clause lacks a verbal constituent:

[Tree diagram: SIMPX] Der/ART Fall/NN ist/VAFIN brisant/ADJD ,/$, die/ART Mischung/NN explosiv/ADJD ./$.

[Tree diagram: SIMPX] Fein/ADJD ,/$, daß/KOUS sich/PRF die/ART Achse/NN Hamburg-Köln/NE langsam/ADJD festigt/VVFIN ./$.

Chapter 7 The Annotation of Specific Syntactic Phenomena

7.1 Superlative and Comparative Forms

7.1.1 Superlative Forms

The particle am, which occurs with an adjective or an adverb in superlative constructions, is tagged as PTKA. Both the particle and the adjective/adverb are attached on the same level, forming an adjectival/adverbial phrase:

[Tree diagram: SIMPX] Den/ART vorigen/ADJA Sonntag/NN hätte/VAFIN Frank/NE Michael/NE Nehr/NE am/PTKA liebsten/ADJD aus/APPR dem/ART Kalender/NN gestrichen/VVPP ./$.

7.1.2 The Comparative Particles wie and als

The comparative particles in German are als and wie, in rare cases also denn (e.g. Die werden dort seliger schlummern denn je.). These particles are tagged as KOKOM and occur with all types of syntactic phrases (NX, ADVX, PX, etc.). They are attached directly to an adjacent comparative phrase. If the comparative phrase has a postmodifier, they are attached directly to the highest node of the complex phrase. A comparative phrase can occur as an adjacent postmodifier of the head phrase:

[Tree diagram: SIMPX] Rein/ADJD musikalisch/ADJD gesehen/VVPP ist/VAFIN das/ART Album/NN wesentlich/ADV schlanker/ADJD als/KOKOM das/ART erste/ADJA ./$.

[Tree diagram: SIMPX] daß/KOUS sie/PPER als/KOKOM ehrenamtliche/ADJA Vorsitzende/NN ein/ART dienstliches/ADJA Handy/NN hat/VAFIN

If there is a long-distance dependency between the comparative phrase and the head phrase, the dependency relation is denoted by the respective X-MOD label.

[Tree diagram: R-SIMPX] der/PRELS fünfmal/ADV mehr/PIS nach/APPR Bremerhaven/NE liefert/VVFIN als/KOKOM Daewoo/NE

In the case of a long-distance dependency between the comparative phrase and the main verb (cf. 4.7.9), the comparative phrase is either a complement (e.g. PRED) or an ambiguous or unambiguous modifier of the main verb (MOD or V-MOD).

[Tree diagram: SIMPX] Unter/APPR dem/ART Motto/NN Kino-Extrem/NN agiert/VVFIN der/ART Regisseur/NN als/KOKOM Filmjockey/NN

[Tree diagram: SIMPX] Wie/KOKOM in/APPR den/ART meisten/PIDAT Musicals/NN ist/VAFIN die/ART Handlung/NN simpel/ADJD

[Tree diagram: SIMPX] In/APPR Arsten/NE wird/VAFIN die/ART neue/ADJA Strecke/NN ,/$, wie/KOUS geplant/VVPP ,/$, nachgearbeitet/VVPP und/KON begrünt/VVPP ./$.

The high attachment principle applies when the comparative particle has scope over a coordination of phrases (cf. 6.4.5). In this case, the two conjuncts are coordinated first. Then the particle is attached on a higher level.

[Tree diagram: NX] wie/KOKOM Pete/NE Sampras/NE oder/KON Yewgeni/NE Kafelnikov/NE

7.2 Verbal and Adjectival Use of Participles

In German, verbal participles that are passive verb forms (Der Mensch wird angesehen) can be used as adjectives: they can function either as attributive adjectives (der angesehene Mensch) or, depending on the context, as predicative adjectives (der Mensch ist angesehen.). In contrast to the auxiliary werden in verbal passives, the auxiliary sein is used in constructions with adjectival passives. For the problematic distinction between verbal and adjectival passives, we adapted the criteria of the Stuttgart-Tübingen tagset (STTS) (Schiller et al. 1995; see footnote 1):

1. Can the sentence be transformed into the active form while keeping the same semantics? If yes → VVPP
2. Is there a von-PP or an equivalent PP that gives evidence for verb semantics? If yes → VVPP
3. Is it possible to substitute the word in question by a semantically similar adjective? If yes → ADJD

The following two tree structures show the annotation of the verbal and the adjectival passive of the participle angesehen. In the first example, the participle is analyzed as a VVPP in VC. In the second example, the participle has an adjectival reading and is annotated as an ADJD in MF.

[Tree diagram: SIMPX] daß/KOUS auch/ADV ein/ART anderes/ADJA Argument/NN als/KOKOM gültig/ADJD angesehen/VVPP wurde/VAFIN

[Tree diagram: SIMPX] Es/PPER ist/VAFIN schade/ADJD ,/$, daß/KOUS akademische/ADJA Leistungen/NN so/ADV wenig/ADV angesehen/ADJD sind/VAFIN

1 Concerning the differences between verbal and adjectival passives in English, cf. Bresnan (1995).
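The three criteria for distinguishing verbal from adjectival participles can be read as a small decision procedure. The following is a sketch under stated assumptions: the function name `tag_participle` and its boolean parameters are hypothetical, the answers to the three questions must come from a human annotator, and applying the criteria in the listed order with the first positive answer deciding is an assumption that the text does not state explicitly.

```python
def tag_participle(active_paraphrase_ok, has_agentive_pp, adjective_substitutable):
    """Apply the three criteria in the order given in this section.

    Returns "VVPP" (verbal reading), "ADJD" (adjectival reading), or
    None if no criterion applies; that case is not covered by the text.
    """
    if active_paraphrase_ok:        # criterion 1: active paraphrase keeps the semantics
        return "VVPP"
    if has_agentive_pp:             # criterion 2: von-PP or an equivalent PP
        return "VVPP"
    if adjective_substitutable:     # criterion 3: a similar adjective can be substituted
        return "ADJD"
    return None


# A participle that can only be replaced by a similar adjective.
print(tag_participle(False, False, True))
```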

7.3 Topicalization

Topicalization is found almost exclusively in verb-second clauses. In such constructions, the subject is not in the first position of the clause. Topicalized constructions give rise to word order phenomena that differ from those occurring in MF; e.g., non-finite parts of VC are not allowed in MF. Our annotation principles require that the topicalized verb complex and its non-finite parts be analyzed as VC in the first position of the clause. VC is then attached to VF. If a part of MF is topicalized along with VC, MF and VC are first combined to form FKONJ before they are attached to VF:

[Tree diagram: SIMPX] Geplant/VVPP war/VAFIN der/ART Papst-Besuch/NN seit/APPR langem/NN ./$.

[Tree diagram: SIMPX] Auf/APPR Pioneer/NE aufmerksam/ADJD geworden/VAPP war/VAFIN der/ART BUND/NE durch/APPR Informationen/NN französischer/ADJA Bauern/NN ./$.

7.4 Headlines

The syntax of headlines differs from other syntactic constructions in that headlines [2] often lack a finite verb or any verb at all. If a headline contains only a non-finite verb, case assignment follows the preference principle formulated in 5.2. We therefore generally assume the more plausible grammatical function in each case: a passive construction with ON in MF if the verb in VC is a past participle, and an active construction with OA in MF if the verb in VC is an infinitive.

[Tree diagram: headline "20 Dissidenten in China festgenommen". MF contains NX 20 Dissidenten (ON) and PX in China (V-MOD); VC contains VXINF festgenommen (VVPP).]

[Tree diagram: headline "WBM-Chefs ablösen". MF contains NX WBM-Chefs (OA); VC contains VXINF ablösen (VVINF).]

[2] The identifier "HEADLINE" is automatically inserted into the comment line above the sentence for each syntactic unit which is marked as a headline in the original data.
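The preference rule for headlines without a finite verb can be written down as a small lookup. The following is only a sketch: the function name and the grouping of STTS tags into participles and infinitives are assumptions made for illustration, and the actual annotation decision still rests on the preference principle in 5.2.

```python
# Hypothetical helper, not part of the TüBa-D/Z tooling.
# STTS tags for past participles and for infinitives.
PARTICIPLE_TAGS = {"VVPP", "VAPP", "VMPP"}
INFINITIVE_TAGS = {"VVINF", "VAINF", "VMINF", "VVIZU"}


def preferred_headline_function(verb_tag):
    """Return the preferred function of the NX in MF of a headline.

    Past participle in VC -> passive reading, NX is ON.
    Infinitive in VC      -> active reading, NX is OA.
    Any other tag         -> None (the rule does not apply).
    """
    if verb_tag in PARTICIPLE_TAGS:
        return "ON"
    if verb_tag in INFINITIVE_TAGS:
        return "OA"
    return None


# "20 Dissidenten in China festgenommen" -> festgenommen is VVPP
headline_1 = preferred_headline_function("VVPP")
# "WBM-Chefs ablösen" -> ablösen is VVINF
headline_2 = preferred_headline_function("VVINF")
```

For the two example headlines, the lookup yields ON for 20 Dissidenten and OA for WBM-Chefs, in line with the trees.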

If a headline contains no verb at all, it is annotated like an isolated phrase (cf. 3.4.5):

[Tree diagram: headline "Handelsorganisation vollkommen kopflos" (NN ADJD ADJD). The phrase is projected to NX, with Handelsorganisation as head and the ADJX vollkommen kopflos attached to it.]

Headlines can also consist of more than one syntactic structure, separated, for instance, by a colon or a dash (cf. 4.7.2 and 5.2):

[Tree diagram: headline "Rechtschreibreform: Gegner klagen". NX Rechtschreibreform stands on its own; after the colon, a SIMPX contains VF with NX Gegner (ON) and LK with VXFIN klagen.]

7.5 Discourse Markers

Generally, discourse markers are expressions or phrases of greeting, apologizing, thanking, short emotional utterances, and interjections. Their node label is DM. The edge label of a discourse marker is empty, i.e., it does not have a head. Typical discourse markers are: ja, nein, hallo, oh, aha, pst, nunja, gewiß, toll, nun ja, etc.


In most cases, discourse markers occur as isolated expressions. Interjections, tagged as ITJ, are projected directly to DM without internal structure. The same applies to answer particles (PTKANT):

[Tree diagrams: "Oh" (ITJ) and "ja" (PTKANT) are each attached directly to a DM node.]

Phrases which function as discourse markers are first projected to their phrase level before they are assigned the node label DM:

[Tree diagrams: "gewiß" (ADV) is projected to ADVX and then to DM; "Keine Ahnung" (PIAT NN) is projected to NX and then to DM; "Liebe tazzen" (ADJA NE) is projected to NX, with ADJX Liebe inside it, and then to DM.]

Isolated conjunctions and foreign-language discourse markers are tagged according to their part of speech (KON and FM) and are projected to DM:

[Tree diagrams: "Und" (KON) and "pardon" (FM) are each attached directly to a DM node.]
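The projection rules for single-token discourse markers described so far can be sketched as follows. This is an illustration only: the function name, the bracketed output notation, and the small tag-to-phrase table (read off the examples gewiß, Keine Ahnung, and Liebe tazzen) are assumptions, not part of the annotation tools.

```python
# Hypothetical sketch of the DM projection rules for one-word markers.
# Tags that are attached to DM directly, without a phrase level.
DIRECT_TAGS = {"ITJ", "PTKANT", "KON", "FM"}

# Phrase labels for tags that are projected to a phrase first.
# Only the tags seen in the examples are listed (assumption).
PHRASE_LABEL = {"ADV": "ADVX", "NN": "NX", "NE": "NX"}


def project_single_token(word, tag):
    """Return a bracketed DM structure for an isolated one-word marker."""
    if tag in DIRECT_TAGS:
        return f"(DM ({tag} {word}))"
    phrase = PHRASE_LABEL.get(tag)
    if phrase is None:
        raise ValueError(f"no projection rule sketched for tag {tag!r}")
    return f"(DM ({phrase} ({tag} {word})))"


direct = project_single_token("Oh", "ITJ")
via_phrase = project_single_token("gewiß", "ADV")
```

Multi-word markers such as Keine Ahnung need a full phrase analysis and are outside the scope of this sketch.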

Discourse markers may also consist of an interjection or an answer particle and a phrase:

[Tree diagram: "Nun ja." The DM node dominates ADVX Nun (ADV, HD) and the answer particle ja (PTKANT).]

In some cases, discourse markers have a grammatical function within a phrase or a clause. They are then attached to the syntactic structure:

[Tree diagram: mit den Worten "aha, aha, aha (APPR ART NN $( ITJ $, ITJ $, ITJ). PX dominates mit and an NX; this NX contains NX den Worten and DM aha, aha, aha; the edge label APP appears twice in the diagram.]

[Tree diagram: Welt Oh, no (NN ITJ $, FM). An EN-ADD node dominates an NX that contains NX Welt and a DM node.]

7.6 Parentheses

Parentheses occur as interjected utterances within a sentence. Since there is no dependency relation between the parenthesis and the rest of the construction, the parenthesis is not attached to the surrounding constituents. Parentheses often occur as SIMPX clauses. Insertions like sagte Mehmet Scholl into direct speech are also annotated as parentheses [3].

[Tree diagram: "Eine Vielzahl der betroffenen Mieter, so Bertermann, sei daher bereits ausgezogen." The parenthesis so Bertermann is not attached to the surrounding SIMPX.]

[Tree diagram: "Ein Kuratorium, das ist wohl der Gedanke, macht sich immer gut." The parenthesis das ist wohl der Gedanke forms its own SIMPX, which is not attached to the surrounding SIMPX.]

[3] On the TüBa-D/Z web page (http://www.sfs.uni-tuebingen.de/en/de_tuebadz.shtml), the treebank is also available in the Penn Treebank format. In this format, parentheses are attached to the tree structure with the edge label PAR. For further details about the Penn Treebank format, cf. Appendix: The TüBa-D/Z Data Formats.

[Tree diagram: "Schön", sagte Mehmet Scholl, "ist das nicht". The inserted clause sagte Mehmet Scholl is a separate SIMPX that is not attached to the SIMPX of the direct speech Schön ist das nicht.]

7.7 Elliptical weil and wenn auch Constructions

Both conjunctions may introduce sentential phrases as well as adjectival phrases. The sequence wenn auch is apparently in a process of grammaticalization. According to Duden (7th edition), both wenn auch and the subordinating conjunction weil can function as coordinating conjunctions between adjectival phrases. However, we decided to follow the STTS guidelines. Consequently, we analyze weil as a subordinating conjunction (KOUS) in all cases. Furthermore, the sequence wenn auch is analyzed like a conjunction which introduces an elliptical clause. This is in accordance with the distributional differences between these conjunctions and the coordinating conjunctions within elliptical constructions, cf. Karl ist ins Freibad gegangen und Max ins Hallenbad. vs. * Karl ist ins Freibad gegangen weil Max ins Hallenbad.

[Tree diagram: "der herzkranke, weil notorisch eifersüchtige Gary" (ART ADJA $, KOUS ADJD ADJA NE). The elliptical clause is a SIMPX with weil in C and ADJX notorisch eifersüchtige as PRED in MF; it is embedded in the NX headed by Gary.]

[Tree diagram: "es ist ziemlich perfekt, wenn auch relativ langweilig." The main clause has es (ON) in VF, ist in LK, and ADJX ziemlich perfekt (PRED) in MF. NF contains an elliptical SIMPX (MOD) with wenn in C and, in its MF, ADVX auch (MOD) and ADJX relativ langweilig (PRED).]

Chapter 8 Criteria for the Distinction of Grammatical Functions

8.1 Subcategorization of Verbs

The TüBa-D/Z-Verblist document [1] lists all verbs occurring in the treebank with their specific subcategorization frames. This reference list guarantees the consistent annotation of grammatical functions. For a detailed description of how the verb list was constructed, see Hinrichs and Telljohann (2009).

Subcategorization of PREDs: Since constituents that a predicate subcategorizes for have a grammatical function within the sentence, they are neither marked as PRED-MOD nor attached to the predicate itself. These constituents are attached to a field and assigned the respective grammatical function, like the constituent which is marked as FOPP in the following examples:

[Tree diagram: "Für den Erfolg des Volksbegehren sind etwa 243.000 Unterschriften erforderlich." VF contains PX Für den Erfolg des Volksbegehren (FOPP); LK contains VXFIN sind; MF contains NX etwa 243.000 Unterschriften (ON) and ADJX erforderlich (PRED).]

[1] In case of interest, please refer to the web page (http://www.sfs.uni-tuebingen.de/en/de_tuebadz.shtml) for contact information.

[Tree diagram: "Die älteren Brüder fühlen sich für ihre Schwestern sehr verantwortlich." VF contains NX Die älteren Brüder (ON); LK contains VXFIN fühlen; MF contains NX sich (OA), PX für ihre Schwestern (OPP), and ADJX sehr verantwortlich (PRED).]

8.1.1 Distinction of FOPP, OPP, and V-MOD

One of the major problems is to distinguish whether a given PP is an obligatory (OPP) or an optional (FOPP) complement of a specific verb in a specific reading, or whether it is a free adjunct (V-MOD) of that verb. The TüBa-D/Z-Verblist is intended as a reference for these problematic cases. In the following, we briefly describe the criteria that have been used to decide on the subcategorization of PP complements/modifiers:

1. A PP is called OPP if the sentence would be ungrammatical without it (or if there would at least be a very noticeable change of meaning). For instance, Sie gehen [OPP gegen die Faschisten] vor. / Das Gesetz ist [OPP in Kraft] getreten.

2. A PP is called FOPP if it can be left out of this specific sentence without causing ungrammaticality (or a very noticeable change of meaning) and if its preposition is selected by this specific verb. For instance, Insgesamt berichtet die Polizei [FOPP von 19 Festnahmen und 98 Ingewahrsamnahmen]. / Später würden wir [FOPP über Auswandern] nachdenken. Here, the verbs select these specific prepositions, and the PPs cannot be added to any arbitrary verb (which is possible for free adjuncts). In addition, in passive clauses, the subject of the original active clause, which has the form of a prepositional phrase, is marked as FOPP (Sie wurden [FOPP von Autonomen] umringt.).

3. A PP is called V-MOD if its preposition is not selected by this specific verb, i.e., it can be exchanged for any other modifying PP, and, similarly, this PP can occur with arbitrary verbs (Nur [V-MOD im griechischen Lager] gab es Probleme). Typical V-MODs are temporal or local adjuncts specifying the time and location of the action, event, or state expressed by the verb.
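The three criteria above can be read as a short decision procedure. The sketch below is a hypothetical illustration: the function and parameter names are invented, and the boolean inputs stand for linguistic judgments that an annotator (or the Verblist) has to supply; nothing here is computed from text.

```python
# Hypothetical decision procedure for PP labels; the inputs are
# annotator judgments, not automatically derived features.
def classify_pp(omissible, preposition_selected_by_verb, passive_agent=False):
    """Return "OPP", "FOPP", or "V-MOD" for a PP.

    omissible: the PP can be dropped without ungrammaticality or a
        very noticeable change of meaning.
    preposition_selected_by_verb: the verb selects this preposition.
    passive_agent: the PP expresses the subject of the corresponding
        active clause in a passive clause (e.g. "von Autonomen").
    """
    if passive_agent:
        return "FOPP"            # criterion 2, passive case
    if not omissible:
        return "OPP"             # criterion 1
    if preposition_selected_by_verb:
        return "FOPP"            # criterion 2
    return "V-MOD"               # criterion 3


# "Sie gehen [gegen die Faschisten] vor."
label_a = classify_pp(omissible=False, preposition_selected_by_verb=True)
# "Später würden wir [über Auswandern] nachdenken."
label_b = classify_pp(omissible=True, preposition_selected_by_verb=True)
# "Nur [im griechischen Lager] gab es Probleme."
label_c = classify_pp(omissible=True, preposition_selected_by_verb=False)
```

The order of the checks mirrors the order of the criteria: obligatoriness is tested before selection of the preposition.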

8.1.2 Distinction of MOD, MOD-MOD, and V-MOD

A typical case of modification of modifiers involves two temporal expressions in the same clause, one of which (MOD-MOD) further specifies the other (V-MOD):

1. [V-MOD am Samstag] finden [MOD-MOD ab 16 Uhr] Führungen statt.
   [MOD-MOD Wann] finden [V-MOD am Samstag] Führungen statt?

2. [MOD da] finden [V-MOD am Samstag] Führungen statt.
   [V-MOD wann] finden [MOD da] Führungen statt?
   [MOD dann] finden [V-MOD am Samstag] Führungen statt.

da, dann, etc. can be temporal, causal, consequential, or local expressions. Thus, one cannot be sure whether the time expression am Samstag really refers to them. The only obvious observation is that the time expression is a V-MOD in any case.

For resumptive constructions (LV), there is also a clear criterion concerning the modification relations. Within a verb-second clause, a modifier occurring in VF is MOD/X-MOD, whereas the modifier in LV is MOD-MOD, not vice versa, because the modifier in VF occurs within the core of the sentence, whereas the modifier in LV has to be licensed by some other constituent in the core sentence, e.g. Wenn da was gebucht worden ist, dann ist das nicht in Ordnung. (cf. 6.1.5).

8.1.3 Distinction of ON, PRED, ON-MOD, and PRED-MOD

It is not always trivial to decide which constituent is ON, PRED, or ON-MOD for predicative verbs. For this reason, a few criteria and examples are listed here. Some properties of ON and PRED:

1. Typically, PRED occurs in MF, whereas ON occurs in VF of verb-second clauses. This should be considered for annotation if no other criterion (as described below) applies.

2. Subject-verb agreement always has to be taken into account. For instance, if the verb is in the plural, the subject has to be plural as well.

3. If there is a suitable NP that could serve as subject, then this NP is annotated as subject rather than any other constituent with a different syntactic category (PP, ADVP, etc.).

For verb-second clauses, it is important to follow these two steps in exactly this order to stick to the distributional criterion that has been chosen for the PRED/ON distinction:

1. Have a look at the constituent in VF. If it is an NP which might serve as subject and if it agrees with the verb, annotate it as ON.

2. If it does not agree with the verb, annotate it as PRED (ADJP, ADVP, PP, etc.).

Examples:

1. [ON neue Wortschöpfungen] sind [PRED es] nur.
   [PRED es] sind nur [ON neue Wortschöpfungen].
   oder sind [PRED es] nur [ON neue Wortschöpfungen].
   [PRED das] sind ohnehin [ON die schwächsten Partner].
   [ON die schwächsten Partner] sind [PRED das] ohnehin.
   oder sind [PRED das] ohnehin [ON die schwächsten Partner].

   Subject-verb agreement suggests that neue Wortschöpfungen and die schwächsten Partner are the subjects because of their plural form, regardless of the field in which they occur.

2. [ON die Ursache] war [PRED unklar].
   [PRED unklar] war [ON die Ursache]
   [ON Candan Ercettin] ist [PRED überall].
   [PRED überall] ist [ON Candan Ercettin].

   ADJPs and ADVPs typically have PRED function when occurring together with predicative verbs and NP subjects.

3. [PRED aus den Trauernden] wird [ON ein wütender Mop].

   ein wütender Mop is considered the subject because it is a noun phrase. Therefore, the prepositional phrase is PRED.

4. [ON das] ist [PRED eine einmalige Chance].
   [ON eine einmalige Chance] ist [PRED das].
   [ON es] ist [PRED der erste Besuch eines Papstes].
   [ON der erste Besuch eines Papstes] ist [PRED es].
   [ON Hauptauftraggeber] ist [PRED die Bremer Verwaltung].
   [ON die Bremer Verwaltung] ist [PRED Hauptauftraggeber].

   The NP in VF position agrees with the verb and therefore has subject priority. As a consequence, the constituent in MF is PRED.

5. [PRED wer] bin [ON ich].
   [PRED was] ist [ON das].

   In w-questions, the interrogative pronoun is always PRED because the agreement rule applies here as well.

6. [ON-MOD es] sei [PRED wichtig], [ON daß man ... .
   [ON Aufgabe des Festspielhauses] sei [PRED-MOD es], [PRED das Haus spielfertig zu halten].

   If a sentential subject or a sentential predicate occurs with an expletive es, the expletive es is either ON-MOD or PRED-MOD (cf. 4.2.9).
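The two-step procedure for the constituent in VF of a verb-second clause can be sketched as a single check on category and agreement. The class and function names below are assumptions made for illustration, and the sketch covers neither expletive es (ON-MOD/PRED-MOD) nor clauses in which no criterion applies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Constituent:
    """Hypothetical minimal description of a constituent."""
    text: str
    category: str                  # e.g. "NP", "PP", "ADJP", "ADVP"
    number: Optional[str] = None   # "sg" or "pl"; None if not applicable
    person: int = 3


def label_vf_constituent(vf, verb_number, verb_person=3):
    """Label the VF constituent of a verb-second predicative clause.

    Step 1: an NP that agrees with the finite verb is ON.
    Step 2: anything else (no agreement, or not an NP) is PRED.
    """
    agrees = vf.number == verb_number and vf.person == verb_person
    return "ON" if vf.category == "NP" and agrees else "PRED"


# "[neue Wortschöpfungen] sind es nur."  (plural verb)
ex_1 = label_vf_constituent(Constituent("neue Wortschöpfungen", "NP", "pl"), "pl")
# "[es] sind nur neue Wortschöpfungen."  (plural verb, singular es)
ex_2 = label_vf_constituent(Constituent("es", "NP", "sg"), "pl")
# "[wer] bin ich."  (first-person verb)
ex_3 = label_vf_constituent(Constituent("wer", "NP", "sg"), "sg", verb_person=1)
```

Once the VF constituent is labeled, the remaining candidate in MF receives the other of the two labels.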

References

Bech, G. 1955–57. Studien über das deutsche Verbum infinitum. Kopenhagen. 2 Bände. 2. unveränderte Auflage 1983 mit einem Vorwort von Catharine Fabricius-Hansen. Tübingen: Max Niemeyer.

Behaghel, O. 1932. Deutsche Syntax (Eine geschichtliche Darstellung), Band 4. Heidelberg: Carl Winter.

Brants, T., and W. Skut. 1998. Automation of treebank annotation. In Proceedings of the Conference on New Methods in Language Processing (NeMLaP-3/CoNLL98), January 14-17, 1998, Sydney, Australia, 49–57.

Brants, T. 1997. The NeGra Export Format for Annotated Corpora. University of Saarbrücken, Germany.

Brants, T. 1998. TnT – A Statistical Part-of-Speech Tagger. Universität des Saarlandes, Computational Linguistics, Saarbrücken, Germany.

Bresnan, J. 1995. Lexicality and Argument Structure. Invited paper given at the Paris Syntax and Semantics Conference, October 12-14, 1995. URL: http://wwwcsli.stanford.edu/~bresnan/download.html.

Drach, E. 1937. Grundgedanken der Deutschen Satzlehre. Frankfurt/M.

Drosdowski, G. (Ed.). 1995. Duden "Die Grammatik der deutschen Gegenwartssprache". Mannheim, Leipzig, Wien, Zürich: Dudenverlag.

Eisenberg, P. 1999-2001. Grundriß der deutschen Grammatik, Band 2: Der Satz. Stuttgart, Weimar: J.B. Metzler.

Engel, U. 1996. Deutsche Grammatik. Heidelberg: Julius Groos Verlag.

Erdmann, O. 1886. Grundzüge der deutschen Syntax nach ihrer geschichtlichen Entwicklung dargestellt. Stuttgart. Erste Abteilung.

Grewendorf, G. 1991. Aspekte der deutschen Syntax, Band 33 of Studien zur deutschen Grammatik. Tübingen: Gunter Narr Verlag.

Helbig, G., and J. Buscha. 1998. Deutsche Grammatik. Ein Handbuch für den Ausländerunterricht. Leipzig, 18. Auflage.

Herling, S. H. A. 1821. Über die Topik der deutschen Sprache. In Abhandlungen des frankfurterischen Gelehrtenvereins für deutsche Sprache, 296–362, 394. Frankfurt/M. Drittes Stück.

Hinrichs, E. W., J. Bartels, Y. Kawata, V. Kordoni, and H. Telljohann. 2000. The Tübingen treebanks for spoken German, English, and Japanese. In W. Wahlster (Ed.), Verbmobil: Foundations of Speech-to-Speech Translation. Berlin: Springer.

Hinrichs, E. W., and H. Telljohann. 2009. Constructing a valence lexicon for a treebank of German. In Proceedings of the Seventh International Workshop on Treebanks and Linguistic Theories (TLT 7): January 23-24, 2009, Groningen, The Netherlands. URL: http://www.let.rug.nl/tlt/.

Höhle, T. N. 1986. Der Begriff 'Mittelfeld'. Anmerkungen über die Theorie der topologischen Felder. In A. Schöne (Ed.), Kontroversen alte und neue. Akten des 7. Internationalen Germanistenkongresses Göttingen, 329–340.

Kathol, A. 1995. Linearization-Based German Syntax. PhD thesis, Ohio State University.

Kiss, T. 1995. Infinitive Komplementation. Neue Studien zum deutschen Verbum infinitum. Tübingen: Max Niemeyer.

Kübler, S., and H. Telljohann. 2002. Towards a dependency-based evaluation for partial parsing. In Beyond PARSEVAL – Towards Improved Evaluation Measures for Parsing Systems (LREC 2002 Workshop), Las Palmas, Gran Canaria, June 2002.

Marcus, M., B. Santorini, and M. A. Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics 19(2):313–330.

Naumann, K., and V. Möller. 2007. Manual for the Annotation of in-document Referential Relations. University of Tübingen, May 2007.

Plaehn, O. 1998. Annotate - Bedienungsanleitung. Universität des Saarlandes, FR 8.7 Computerlinguistik, Projekt C3 Nebenläufige Grammatische Verarbeitung, Sonderforschungsbereich 378, Ressourcenadaptive Kognitive Prozesse, 13. April 1998.

Pütz, H. 1986. Über die Syntax der Pronominalform 'es' im modernen Deutsch. Tübingen: Stauffenburg. 2nd edition.

Schiller, A., S. Teufel, and C. Thielen. 1995. Guidelines für das Tagging deutscher Textcorpora mit STTS. Technical report, Universitäten Stuttgart und Tübingen. URL: http://www.sfs.nphil.uni-tuebingen.de/ELWIS/stts/stts.html.

Stegmann, R., H. Telljohann, and E. W. Hinrichs. 2000. Stylebook for the German Treebank in verbmobil. Technical report, Verbmobil-Report 239.

Telljohann, H., E. W. Hinrichs, S. Kübler, and H. Zinsmeister. 2006. Stylebook for the Tübingen Treebank of Written German (TüBa-D/Z). University of Tübingen, July 2006.

Trushkina, J. 2004. Morpho-syntactic annotation and dependency parsing of German. PhD thesis, University of Tübingen. (http://w210.ub.unituebingen.de/dbt/volltexte/2004/1523).

Appendix: The TüBa-D/Z Data Formats

The TüBa-D/Z treebank is released in three different data formats:

1. the NEGRA export format,
2. the Penn Treebank format,
3. an XML format (including anaphora and coreference relations).

1. The NEGRA Export Format

This format is provided by the annotation tool Annotate (Brants and Skut 1998); it is created automatically from the database underlying the annotation process in Annotate. The NEGRA export format is a line-oriented, pointer-based representation of the syntactic annotation. It is also the most complete data format since it preserves all the information available during the manual annotation. A more complete description of the NEGRA export format can be found in Brants (1997). An example of the NEGRA export format is given below, combined with the graphical representation of the syntactic annotation for the sentence "Vikare müssen sich nach dem Kandidatengetz so verhalten, wie es von einem künftigen Pfarrer erwartet werden kann".

Graphical representation (printout of the Annotate tool):

[Tree diagram: "Vikare müssen sich nach dem Kandidatengetz so verhalten, wie es von einem künftigen Pfarrer erwartet werden kann." The node and edge labels are those given in the export format of this sentence; the misspelled token Kandidatengetz carries the correction Kandidatengesetz, and a secondary edge refvc points to node 509.]

The first line of the sentence representation (marked as 'begin of sentence' (BOS)) includes the sentence id (here: 24429), the identity of the last annotator (here the one with id 27), the time of the last modification (in UNIX format, i.e. seconds since 1/1/1970), and the id of the origin of the file (1222 points to article 155 of the edition of 11/7/1992). Secondary edges (here: 'refvc' pointing from node #510 to node #509, a dependency within the verbal complex) as well as corrections of misspellings (here: 'Kandidatengesetz') are also represented.

Export format:

    #BOS 24429 27 1134150923 1222
    Vikare NN npm HD 500
    müssen VMFIN 3pis HD 501
    sich PRF ap*3 HD 502
    nach APPR d - 514
    dem ART dsn - 503
    Kandidatengetz NN dsn HD 503 %% Kandidatengesetz
    so ADV -- HD 504
    verhalten VVINF -- HD 505
    , $, -- -- 0
    wie KOUS -- - 506
    es PPER nsn3 HD 507
    von APPR d - 519
    einem ART dsm - 516
    künftigen ADJA dsm HD 508
    Pfarrer NN dsm HD 516
    erwartet VVPP -- HD 509
    werden VAINF -- HD 510
    kann VMFIN 3sis HD 511
    . $. -- -- 0
    #500 NX -- ON 512
    #501 VXFIN -- HD 513
    #502 NX -- OA 518
    #503 NX -- HD 514
    #504 ADVX -- PRED 518
    #505 VXINF -- OV 515
    #506 C -- - 521
    #507 NX -- ON 520
    #508 ADJX -- - 516
    #509 VXINF -- OV 517
    #510 VXINF -- OV 517 refvc 509
    #511 VXFIN -- HD 517
    #512 VF -- - 523
    #513 LK -- - 523
    #514 PX -- V-MOD 518
    #515 VC -- - 523
    #516 NX -- HD 519
    #517 VC -- - 521
    #518 MF -- - 523
    #519 PX -- FOPP 520
    #520 MF -- - 521
    #521 SIMPX -- PRED-MOD 522
    #522 NF -- - 523
    #523 SIMPX -- -- 0
    #EOS 24429
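To make the column layout of the export format concrete, the following sketch reads a #BOS line and individual word or node lines. It is a minimal illustration under stated assumptions: the names ExportLine, parse_bos, and parse_export_line are invented here, fields are assumed to be separated by whitespace, and word forms are assumed to contain no spaces. The authoritative description of the format is Brants (1997).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ExportLine:
    """One word line or one '#NNN' node line (hypothetical container)."""
    ident: str            # word form, or "#500" etc. for nonterminals
    tag: str              # POS tag or node label
    morph: str            # morphology column ("--" if empty)
    edge: str             # edge label ("-" or "--" if empty)
    parent: int           # number of the mother node, 0 if unattached
    secondary: list = field(default_factory=list)  # (label, target) pairs
    comment: str = ""     # text after "%%", e.g. a spelling correction


def parse_bos(line):
    """Split a '#BOS' line into its four numeric fields."""
    _, sent_id, annotator, timestamp, origin = line.split()[:5]
    return {
        "sentence": int(sent_id),
        "annotator": int(annotator),
        # UNIX time: seconds since 1970-01-01, interpreted as UTC here
        "modified": datetime.fromtimestamp(int(timestamp), tz=timezone.utc),
        "origin": int(origin),
    }


def parse_export_line(line):
    """Parse a word line or a nonterminal line of the export format."""
    body, _, comment = line.partition("%%")
    fields = body.split()
    ident, tag, morph, edge, parent = fields[:5]
    extra = fields[5:]  # optional pairs: secondary edge label, target node
    secondary = [(extra[i], int(extra[i + 1]))
                 for i in range(0, len(extra) - 1, 2)]
    return ExportLine(ident, tag, morph, edge, int(parent),
                      secondary, comment.strip())


header = parse_bos("#BOS 24429 27 1134150923 1222")
word = parse_export_line("Kandidatengetz NN dsn HD 503 %% Kandidatengesetz")
node = parse_export_line("#510 VXINF -- OV 517 refvc 509")
```

In this sketch, the spelling correction ends up in word.comment and the secondary edge in node.secondary, which keeps both kinds of extra information apart from the five fixed columns.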

The only deviation from context-freeness which the annotation scheme allows concerns the annotation of parentheses. Parentheses are annotated as separate trees with no attachment to surrounding trees. The following tree gives an example of such a phenomenon (for a more complete description of the annotation, cf. 7.6).

Graphical representation:

[Tree diagram: "So etwas, sagen die Abgeordneten, hätten sie auch noch nicht erlebt." The parenthesis sagen die Abgeordneten forms its own SIMPX (node 515), which is not attached to the surrounding SIMPX (node 517).]

The pointer-based representation of the NEGRA export format separates information about the linear precedence of words from attachment information, so that parentheses can be represented naturally without having to resort to explicitly marking non-attached nodes. Here, the SIMPX node dominating the parenthesis (node #515) is marked as not having a mother node.

    #BOS 7219 19 1121695339 839
    So ADV -- HD 500
    etwas PIS *** HD 508
    , $, -- -- 0
    sagen VVFIN 3pis HD 501
    die ART np* - 502
    Abgeordneten NN np* HD 502
    , $, -- -- 0
    hätten VAFIN 3pkt HD 503
    sie PPER np*3 HD 504
    auch ADV -- HD 505
    noch ADV -- HD 506
    nicht PTKNEG -- HD 512
    erlebt VVPP -- HD 507
    . $. -- -- 0
    #500 ADVX -- - 508
    #501 VXFIN -- HD 509
    #502 NX -- ON 510
    #503 VXFIN -- HD 511
    #504 NX -- ON 516
    #505 ADVX -- MOD 516
    #506 ADVX -- - 512
    #507 VXINF -- OV 513
    #508 NX -- OA 514
    #509 LK -- - 515
    #510 MF -- - 515
    #511 LK -- - 517
    #512 ADVX -- MOD 516
    #513 VC -- - 517
    #514 VF -- - 517
    #515 SIMPX -- -- 0
    #516 MF -- - 517
    #517 SIMPX -- -- 0
    #EOS 7219
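Because an unattached tree is simply a nonterminal whose parent is 0, such trees can be found by listing the root nodes of a sentence. The sketch below assumes whitespace-separated columns and uses an invented function name; it only inspects nonterminal lines, since punctuation tokens also carry parent 0. The sample holds the last nine node lines of sentence 7219.

```python
# Nonterminal lines #509 to #517 of sentence 7219 (see the listing above).
NODE_LINES = """\
#509 LK -- - 515
#510 MF -- - 515
#511 LK -- - 517
#512 ADVX -- MOD 516
#513 VC -- - 517
#514 VF -- - 517
#515 SIMPX -- -- 0
#516 MF -- - 517
#517 SIMPX -- -- 0
""".splitlines()


def root_nodes(lines):
    """Return (node id, label) for every nonterminal with parent 0."""
    roots = []
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue
        ident, label, parent = fields[0], fields[1], fields[4]
        # nonterminals are written as '#' followed by the node number
        if ident.startswith("#") and ident[1:].isdigit() and int(parent) == 0:
            roots.append((ident, label))
    return roots


roots = root_nodes(NODE_LINES)
# More than one root means that some tree is not attached to the rest.
has_unattached_tree = len(roots) > 1
```

For this sentence, the two roots are the SIMPX of the parenthesis (#515) and the SIMPX of the main clause (#517).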


2. The Penn Treebank Format

This format is based on the format of the Penn Treebank (Marcus et al. 1993). The attachment of constituents is shown via bracketing and indentation. Thus, all constituents which show the same level of indentation are attached on the same level. In the Penn Treebank format, grammatical functions, which are shown in the NEGRA export format in the column "edge label", are attached to the syntactic label via a colon. Thus, the label "NX:OA" means that the constituent is a noun phrase with the grammatical function accusative object. The Penn Treebank format is a representation that combines the linear representation of words with their attachment to higher constituents. For this reason, this format is restricted to completely context-free tree structures, i.e. it cannot adequately represent the annotation of parentheses in TüBa-D/Z. In order to capture the original syntactic annotation as well as the original word order in the sentence, it was decided to introduce a new edge label to mark such cases: PAR. Thus, the sentence "So etwas , sagen die Abgeordneten , hätten sie auch noch nicht erlebt .", as shown above, is represented in the Penn Treebank format by the following bracketed structure:

    %% sent. no. 7219
    ( (SIMPX
        (VF
          (NX:OA
            (ADVX
              (ADV:HD So) )
            (PIS:HD etwas) ) )
      ($, ,)
        (SIMPX:PAR          %% here starts the parenthesis!
          (LK
            (VXFIN:HD
              (VVFIN:HD sagen) ) )
          (MF
            (NX:ON
              (ART die)
              (NN:HD Abgeordneten) ) ) )
      ($, ,)
        (LK
          (VXFIN:HD
            (VAFIN:HD hätten) ) )
        (MF
          (NX:ON
            (PPER:HD sie) )
          (ADVX:MOD
            (ADV:HD auch) )
          (ADVX:MOD
            (ADVX
              (ADV:HD noch) )
            (PTKNEG:HD nicht) ) )
        (VC
          (VXINF:OV
            (VVPP:HD erlebt) ) ) )
      ($. .) )

Comments are preceded by a double '%' sign. The comment behind the structure is intended to help the reader locate the beginning of the parenthesis; it is not part of the actual data.
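The colon convention makes it easy to separate syntactic category and grammatical function when reading the bracketed data. The following sketch collects all node labels of a bracketed string and looks for the PAR function. The function names are invented for illustration, the sample is an abbreviated excerpt of sentence 7219, and the regular expression assumes that no word form itself contains a bracket.

```python
import re

# An opening bracket immediately followed by a label such as "NX:OA".
NODE_LABEL = re.compile(r"\(([^\s()]+)")


def split_label(label):
    """Split 'NX:OA' into ('NX', 'OA'); labels without a colon get None."""
    category, _, function = label.partition(":")
    return category, function or None


def node_labels(bracketed):
    """Return (category, function) for every node in a bracketed tree."""
    text = re.sub(r"%%.*", "", bracketed)  # drop comments up to end of line
    return [split_label(label) for label in NODE_LABEL.findall(text)]


# Abbreviated excerpt of sentence 7219 (not the complete tree).
SAMPLE = """\
%% sent. no. 7219
( (SIMPX
    (VF (NX:OA (ADVX (ADV:HD So) ) (PIS:HD etwas) ) )
    ($, ,)
    (SIMPX:PAR          %% here starts the parenthesis!
      (LK (VXFIN:HD (VVFIN:HD sagen) ) ) ) ) )
"""

labels = node_labels(SAMPLE)
parentheses = [pair for pair in labels if pair[1] == "PAR"]
```

Filtering on the function PAR identifies the nodes that are unattached in the original annotation, so a reader of the Penn format can restore the information that the format itself cannot express.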

Commas, which are not attached to the tree, are indented on the highest level although they are included in the bracketing of the constituent surrounding them. In the sentence below, e.g., the first comma is grouped into the noun phrase NX via word order. The indentation, however, signals that the comma cannot necessarily be attached to this node. It is also conceivable that it may be attached to one of the lower nodes, NX or R-SIMPX. In the case of the second comma, there are even more possible attachment sites.

    %% sent. no. 33
    ( (R-SIMPX
        (C
          (NX:ON
            (PRELS:HD die) ) )
        (MF
          (NX:OA
            (NX:HD
              (ART die)
              (EN-ADD:HD
                (NN AWO) ) )
      ($, ,)
            (R-SIMPX
              (C
                (PX:V-MOD
                  (PWAV:HD wo) ) )
              (MF
                (NX:ON
                  (PPER:HD er) )
                (NX:PRED
                  (NN:HD Kreisvorsitzender) ) )
              (VC
                (VXFIN:HD
                  (VAFIN:HD ist) ) ) ) ) )
      ($, ,)
        (VC
          (VXFIN:HD
            (VVFIN:HD prüfte) ) ) )
      ($. .) )

3. The XML Format

The XML format is a custom-made XML format that follows the NEGRA export file format. It is designed to accommodate all original information provided in the export format, including e.g. comments and editor/origin information, which are resolved so that it is not necessary to consult the NEGRA tables. Dominance relations between nodes are represented directly within the XML tree structure. Root nodes are marked by the attribute parent="0". Thus, it is possible to represent parentheses without the use of additional labels. The root node of a parenthesis contains the attribute parent="0", which signifies that while the tree is part of the higher constituent where linear order is concerned, it is not attached to the surrounding tree. Anaphora is expressed by a link between two related nodes. Coreference sets are therefore represented implicitly by chains of nodes that are part of a referential relation. The following example shows the XML structure for the sentence "Schillen erklärte, sie werde als Kriegsgegnerin kandidieren". The personal pronoun "sie" is anaphoric to the antecedent noun phrase "Schillen". In the XML document, an tag is added below each node that is part of a referential relation. The tag, which is a child of the tag, encodes the type of referential relation and the node ID of the antecedent node. In our example, the antecedent is the node with ID s 1723 n 500, that is, the NX dominating the named entity "Schillen". This NX in turn is in a coreferential relationship with node s 1721 n 500 (node number 500 in sentence 1721) and is thus part of a coreference chain.

Graphical representation of the tree without annotation of the referential relation:

[Tree diagram: "Schillen erklärte, sie werde als Kriegsgegnerin kandidieren." The main clause has NX Schillen (ON) in VF and erklärte in LK. NF contains a SIMPX (OS) with NX sie (ON) in VF, werde in LK, als Kriegsgegnerin (PRED) in MF, and VXINF kandidieren (OV) in VC.]

XML format including the referential relation:


Index

accusative object, double, 79
AcI, 78
adverbial adjective, 20, 65, 66
adverbial phrase, 24, 70
ambiguity, 17, 29, 31, 86, 88–91, 109
apposition, 25, 38
attributive adjective, 19, 20, 30, 32, 33, 65, 67
C-field, 15, 24, 92, 95, 99
cardinal numbers, 20, 36, 55, 56
circumposition, 20, 64
coherency, 77, 78
comparatives, 12, 119
context-freeness, 11
coordination, 12, 14, 18, 19, 24, 34, 62, 67, 101–104, 106–113, 115, 121
Dependency Grammar, 30, 61
determiner phrase, 24, 60
discourse marker, 10, 12, 23, 24, 30, 125–127
edge labels, 11, 16, 18, 25, 26, 29, 30, 87, 101
elliptical construction, 17, 101, 116, 117
elliptical constructions, 12, 23
Ersatzinfinitiv, 15, 24, 73, 74
expletive, 25, 57, 59, 134
flat clustering principle, 17, 31, 71, 92
foreign language material, 20, 42, 53
headline, 10, 12, 23, 73, 86, 87, 124
high attachment principle, 17, 31, 107, 121
imperative, 21, 79
incoherency, 78
infinitives with zu, 75, 76, 78
initial field, 13, 15, 24, 89
isolated phrase, 26, 27, 90, 91, 113, 125
KOORD-field, 15, 16, 24, 94, 102
lassen, 78
levels of annotation, 17, 18
long-distance dependency, 11
long-distance dependency, 29, 96, 120
longest match principle, 17, 23, 28, 111
modal verbs, 21, 85
named entities, 11, 19, 26, 45, 51–53
Negra export format, 137
Negra treebank, 9
node labels, 11, 18, 19, 24, 26, 42, 45, 46, 73, 91, 108, 125
nominalized adjective, 66
non-ambiguity, 29, 87
non-words, 21, 56
ordinal numbers, 54
paratactic construction, 24, 111, 113
parenthesis, 10, 12, 23, 128
PARORD-field, 15, 16, 24, 94, 95, 113
part-of-speech tags, 11, 18, 26, 56, 62
particle verb, 81
Penn Treebank format, 137
postmodification, 31, 69
postmodifier, 17, 19, 31, 35–37, 39, 44, 49, 51, 70, 89, 99, 100, 119
postnominal modifier, 35, 36
postposition, 20, 64
predicate, 82, 131, 134
predicate-argument structure, 10, 16
predicative adjective, 20, 65, 122
preference principle, 87, 124
premodification, 31
premodifier, 31, 33, 51, 54, 65, 66, 68, 70, 89, 109
prenominal modifier, 32, 33, 35
preposition, 20, 30, 44, 47, 61, 62, 132
proper noun, 19, 20, 26, 31, 34, 37, 41, 42, 45–50, 53
punctuation marks, 21, 23, 39, 89, 90
relative clause, 15, 24, 88, 99, 100
relative clause, event-modifying, 100
relative clause, independent, 100
resumptive construction, 14, 15, 24, 96, 133
reusability, 8, 10
secondary edge label, 11, 18, 19, 25, 26, 29, 45, 48, 49, 71, 72, 78, 99
split coordination, 25, 102, 114
superlative forms, 119
syntactic dependencies, 11
TüBa-D/S treebank, 9
TüBa-D/Z data formats, 9, 137
theory-neutrality, 8, 10
TIGER treebank, 9
topicalization, 12, 123
topological fields, 11–18, 29, 30, 32, 82, 86, 92, 108, 116, 117
truncated word, 21, 106
verb complex, 11, 13, 15, 24, 71–73, 75, 77–79, 81, 99, 123
verb particle, 21, 81
VERBMOBIL treebank, 8, 9, 26
XML format, 137