CROVALLEX 2.008 - Logical Structure of the Data


Contents

   1. Word Entries
   2. Lemmas
   3. Lexical homonyms
   4. Homographs
   5. Frame Entries
   6. Valence Frames
   7. Functors
   8. Morphemic Forms
   9. Explicitly Declared Forms
   10. Implicitly Declared Forms
   11. Types of Complementations
   12. Frame Attributes
   13. Class
   14. Aspect
   15. Idiomatic frames

 

 


1. Word Entries

On the topmost level, CROVALLEX 2.008 is divided into word entries. Each word entry relates to one or more headword lemmas (Sec. 2). The word entry consists of a sequence of frame entries (Sec. 5) relevant for the lemma(s) in question (where each frame entry usually corresponds to one of the lemma's meanings). Information about the aspect (Sec. 14) of the lemma(s) is assigned to each word entry as a whole.

Most of the word entries correspond to lemmas in a simple one-to-one manner, but the following two non-trivial situations appear as well in CROVALLEX 2.008:

The content of a word entry roughly corresponds to the traditional notion of a lexeme.

2. Lemmas

A verb lemma is the infinitive form of the verb. In the case of lexical homonyms (Sec. 3) and homographs (Sec. 4), the lemma is followed by a Roman numeral in superscript.

The reflexive particle se is part of the infinitive only if the verb is a derived reflexive (e.g. vratiti se) or a reflexivum tantum (e.g. penjati se).

3. Lexical homonyms

Lexical homonyms are groups of two lemmas which have the same spelling and wordform, but differ considerably in their meanings (there is no obvious semantic relation between them). They may also differ in their etymology (e.g. hȉtatiI - žuriti vs. hȉtatiII - bacati), aspect (Sec. 14) (e.g. matíratiI inf. - činiti da što ne dobije neželjen sjaj vs. matíratiII fin. - poraziti), or conjugated forms (izvezem [first person sg.] for izvestiI - okititi vezom vs. izvesti [first person sg.] for izvestiII - izvoziti).

The term 'lexical homonyms' should not be confused with the term 'synonymy'.

4. Homographs

Homographs are groups of two lemmas which have the same wordform but a different accent, and which also differ considerably in their meanings (there is no obvious semantic relation between them). They may also differ in their etymology (e.g. ìskapatiI - isteći kapljući vs. iskápatiII - iskopavati), aspect (Sec. 14) (e.g. ìsplakatiI fin. - suzama, plačem izraziti vs. isplákatiII inf. - ispirati), or conjugated forms (napadnem [first person sg.] for nàpastiI - ugroziti tjelesnu sigurnost vs. napasem [first person sg.] for nàpāstiII - pasući nahraniti).

The term 'homographs' should also not be confused with the term 'synonymy'.

5. Frame Entries

Each word entry (Sec. 1) consists of a non-empty sequence of frame entries, typically corresponding to the individual meanings (senses) of the headword lemma(s) (from this point of view, CROVALLEX 2.008 can be classified as a Sense Enumerated Lexicon).

The frame entries are numbered within each word entry; in the CROVALLEX 2.008 notation, the frame numbers are attached to the lemmas as subscripts.

The ordering of frames is not completely random, but it is not perfectly systematic either. So far it is based only on the following weak intuition: primary and/or the most frequent meanings should go first, whereas rare and/or idiomatic meanings should go last. (We do not guarantee that the ordering of meanings in this version of CROVALLEX 2.008 exactly matches their frequency of occurrence in contemporary language.)

Each frame entry contains a description of the valence frame itself (Sec. 6) and of the frame attributes (Sec. 12).

The content of a frame entry roughly corresponds to the notion of a lexical unit.
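The nesting described above (a word entry holding a numbered, non-empty sequence of frame entries) can be sketched as a small data model. This is only an illustrative sketch: the class names `WordEntry` and `FrameEntry`, the field names, and the choice to number frames from 1 are assumptions, not part of the CROVALLEX 2.008 distribution format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FrameEntry:
    """One frame entry, typically one meaning of the headword lemma(s)."""
    gloss: str    # obligatory frame attribute (Sec. 12)
    example: str  # obligatory frame attribute (Sec. 12)


@dataclass
class WordEntry:
    """One word entry: headword lemma(s), shared aspect, frame entries."""
    lemmas: List[str]
    aspect: str
    frames: List[FrameEntry] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The text requires a non-empty sequence of frame entries.
        if not self.frames:
            raise ValueError("a word entry needs at least one frame entry")

    def numbered_frames(self) -> List[Tuple[int, FrameEntry]]:
        """Pair each frame with its number (assumed to start at 1)."""
        return list(enumerate(self.frames, start=1))


# Placeholder values only; these are not real lexicon data.
entry = WordEntry(
    lemmas=["<lemma>"],
    aspect="<aspect>",
    frames=[FrameEntry("<gloss 1>", "<example 1>"),
            FrameEntry("<gloss 2>", "<example 2>")],
)
for number, frame in entry.numbered_frames():
    print(number, frame.gloss)
```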

6. Valence Frames

In CROVALLEX 2.008, a valence frame is modeled as a sequence of frame slots. Each frame slot corresponds to one (either required or specifically permitted) complementation of the given verb.

The following attributes are assigned to each slot:

  • functor (Sec. 7)
  • list of possible morphemic forms (Sec. 8)
  • type of complementation (Sec. 11)

7. Functors

In CROVALLEX 2.008, functors (labels of 'deep roles'; similar to theta-roles) are used for expressing types of relations between verbs and their complementations. Functors are divided into inner participants (actants) and free modifications (this division roughly corresponds to the argument/adjunct dichotomy).

Functors which occur in CROVALLEX 2.008 are listed in the following tables:

Inner participants:

Functor                     Example sentence
AGT (agent)                 John reads the book.
PAT (patient)               John plays the piano.
REC (recipient)             My mother sent her the money.
RESL (result)               His hard work took him to victory.
ORIG (origin)               We received the message from the dean.

Free modifications:

Functor                     Example sentence
ACMP (accompaniment)        My sister visited me with her husband.
AIM (aim)                   He left school to join the army.
BEN (benefactive)           My mother made a cake for me.
CAUS (cause)                My father got angry because I failed the exam.
CNCS (concession)           She still loves him although he lied.
COMPL (complement)          I was sailing the seas as a young researcher.
COND (condition)            I will give you my book if you promise not to lose it.
CONTR (contra)              Tomorrow he plays against the tennis player from Italy.
CPR (comparison)            You will have to study more than you did last time.
DIR1 (direction-from)       My mother just came from the theater.
DIR2 (direction-through)    She drove through the town.
DIR3 (direction-to)         My mother went to the shop.
EXT (extent)                The snow has risen over half a meter.
HER (heritage)              They named the boat after the great sailor.
LOC (locative)              My sister lives in Vienna.
MANN (manner)               She lost interest in reading very quickly.
INST (instrument)           She sent her the news by email.
DIFF (difference)           The stock prices have risen by about 30%.
OBST (obstacle)             My granny tripped over her toys.
REG (regard)                Regardless of her beauty she still has no boyfriend.
RESTR (restriction)         She will make lunch for all except John.
SUBS (substitution)         Your boy went to the playground instead of going to school.
TFRWH (temporal-from-when)  I remember her being smart from high school.
THL (temporal-how-long)     She stayed for her holidays in Italy for the whole month.
THO (temporal-how-often)    She plays the guitar every Saturday.
TOWH (temporal-to-when)     The teacher postponed the exam to June 11.
TSIN (temporal-since-when)  She hasn't studied since last semester.
TWHEN (temporal-when)       She visited us last summer.
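The division of functors into inner participants and free modifications can be captured with two sets built from the functor labels listed in this section. The names `INNER_PARTICIPANTS`, `FREE_MODIFICATIONS` and `functor_kind` are hypothetical and serve only as a sketch.

```python
# Functor labels as listed in this section.
INNER_PARTICIPANTS = frozenset({"AGT", "PAT", "REC", "RESL", "ORIG"})

FREE_MODIFICATIONS = frozenset({
    "ACMP", "AIM", "BEN", "CAUS", "CNCS", "COMPL", "COND", "CONTR", "CPR",
    "DIR1", "DIR2", "DIR3", "EXT", "HER", "LOC", "MANN", "INST", "DIFF",
    "OBST", "REG", "RESTR", "SUBS", "TFRWH", "THL", "THO", "TOWH", "TSIN",
    "TWHEN",
})


def functor_kind(functor: str) -> str:
    """Classify a functor label (hypothetical helper)."""
    if functor in INNER_PARTICIPANTS:
        return "inner participant"
    if functor in FREE_MODIFICATIONS:
        return "free modification"
    raise ValueError(f"unknown functor: {functor!r}")


print(functor_kind("PAT"))   # inner participant
print(functor_kind("DIR3"))  # free modification
```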

8. Morphemic Forms

In a sentence, each frame slot can be expressed by a limited set of morphemic means, which we call forms. In CROVALLEX 2.008, the set of possible forms is defined either explicitly (Sec. 9) or implicitly (Sec. 10). In the former case, the forms are enumerated in a list attached to the given slot. In the latter case, no such list is specified, because the set of possible forms is implied by the functor of the respective slot (in other words, all forms possibly expressing the given functor may appear).

9. Explicitly Declared Forms

The list of forms attached to a frame slot may contain values of the following types:

  • Pure (prepositionless) case. There are seven morphological cases in Croatian. In the CROVALLEX 2.008 notation, they have traditional numbering: 1 - nominative, 2 - genitive, 3 - dative, 4 - accusative, 5 - vocative, 6 - locative, and 7 - instrumental.
  • Prepositional case. Lemma of the preposition and the number of the required morphological case are specified (e.g., od+2, na+4, o+6...). The prepositions occurring in CROVALLEX 2.008 are the following: u, tijekom, na, prema, s, kod, za, prije, nad, sukladno, u_pogledu, do, pod, od, iz, izvan, kroz, iza, u_istom, o, zbog, između, preko, po, poput, kao, u_vrijeme, pred, oko, nakon, protiv, u_doba, sa, za_vrijeme, posred, ispod, u_odnosu_na, pri, uza, uz, zajedno_sa, niz, zajedno_s, bez, na_temelju, sredinom, uz_pomoć, k, za_potrebe, u_smjeru, niza, na_kraju, pod_utjecajem, početkom, za_razliku_od, u_blizini, poslije, putem_prema, blizu, područjem, kao_kroz, u_znak, u_korist, kao_od, kao_u, umjesto, ispred, krajem, uslijed, unatoč, uoči, u_suradnji_s, na_području, na_račun, na_prostoru, navrh, sve_do, iznad, kraj, pored, putem, tek_nakon, u_slučaju, u_ime, prilikom, nedaleko, osim, nego, usprkos, usred, uime, radi, među, do_pred, u_svezi_s, paralelno_s, počevši_od, pomoću, pokraj, iz_smjera, u_području, unutar, ka, na_čelu, u_čast, na_čelu_s, širom, povodom, kao_iz, u_skladu_sa, diljem, do_poslije, u_osvit, u_formi, više_od, zahvaljujući.
  • Subordinating conjunction. Lemma of the conjunction is specified. The following subordinating conjunctions occur in CROVALLEX 2.008: što, zašto, kad, kako_bi, kada, jer, kao_da, nego, nego_da, prije, dok, da, čim, kako.
  • Infinitive construction. The abbreviation 'inf' stands for infinitive verbal complementation. 'inf' can appear together with a preposition (e.g. 'nego+inf') and with a morphological case (e.g. 'inf+4').
  • Construction with adjectives. Abbreviation 'adj-number' stands for an adjective complementation in the given case, e.g. adj-7 ('Osjećam se osvježenim' - 'I feel fresh').
  • Construction with adverbs. Abbreviation 'adv-adverbword' stands for an adverb complementation in the specific form, e.g. adv-hrabro ('Osjećam se hrabro' - 'I feel brave').
  • Construction with nominative predicate. Abbreviation 'nom_pred' stands for a complementation that represents a nominative predicate, e.g. nom_pred ('Historija je postala legendom' - 'History has become legend').
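The notation for explicitly declared forms is regular enough to be recognized mechanically. The following is a minimal sketch of such a recognizer, assuming each form arrives as a single string such as 'od+2' or 'adj-7'. The function name `parse_form` and the returned dictionary layout are hypothetical; only the notation itself comes from the list above.

```python
import re

# Traditional case numbering used in the CROVALLEX 2.008 notation.
CASES = {1: "nominative", 2: "genitive", 3: "dative", 4: "accusative",
         5: "vocative", 6: "locative", 7: "instrumental"}

# Subordinating conjunctions listed above.
CONJUNCTIONS = {"što", "zašto", "kad", "kako_bi", "kada", "jer", "kao_da",
                "nego", "nego_da", "prije", "dok", "da", "čim", "kako"}


def parse_form(form: str) -> dict:
    """Classify one form string (hypothetical helper, not an official API)."""
    form = form.strip()
    if form == "nom_pred":
        return {"kind": "nominative predicate"}
    if form == "inf":
        return {"kind": "infinitive"}
    if re.fullmatch(r"[1-7]", form):
        return {"kind": "pure case", "case": CASES[int(form)]}
    m = re.fullmatch(r"adj-([1-7])", form)
    if m:
        return {"kind": "adjective", "case": CASES[int(m.group(1))]}
    m = re.fullmatch(r"adv-(.+)", form)
    if m:
        return {"kind": "adverb", "adverb": m.group(1)}
    # 'inf+4' must be tested before the general 'preposition+case' pattern.
    m = re.fullmatch(r"inf\+([1-7])", form)
    if m:
        return {"kind": "infinitive", "case": CASES[int(m.group(1))]}
    m = re.fullmatch(r"(.+)\+inf", form)
    if m:
        return {"kind": "infinitive", "preposition": m.group(1)}
    m = re.fullmatch(r"(.+)\+([1-7])", form)
    if m:
        return {"kind": "prepositional case", "preposition": m.group(1),
                "case": CASES[int(m.group(2))]}
    if form in CONJUNCTIONS:
        return {"kind": "subordinating conjunction", "lemma": form}
    raise ValueError(f"unrecognized form: {form!r}")


for f in ["4", "od+2", "inf+4", "nego+inf", "adj-7", "adv-hrabro", "da"]:
    print(f, "->", parse_form(f))
```

The sketch does not check prepositions against the list given above; a stricter version could reject any preposition lemma that does not occur in that list.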

10. Implicitly Declared Forms

If no forms are listed explicitly for a frame slot, then the list of possible forms follows implicitly from the functor of the slot according to the following (as yet incomplete) table:

Functor  Forms
LOC      blizu+2, kod+2, u+6, na+6 ...
MANN     adverb, sukladno+2, poput+2 ...
DIR3     na+4, u+4 ...
DIR1     s+2, od+2, iz+2 ...
DIR2     adverb, 7, kroz+4, oko+2, po+6 ...
TWHEN    adverb, pred+4, za+4, oko+2, tijekom+2, u+4, nakon+2 ...
THL      adverb, 7, indeclinabilia+2 ...
EXT      adverb, 4, za+4, indeclinabilia+2 ...
REG      za+4, u+6, prema+6 ...
TFRWH    s+2 ...
AIM      za+4, na+4, da bi, da ...
TOWH     na+4, za+4 ...
TSIN     od+2, adv ...
THL      adv, 7 ...
INST     7, s+indeclinabilia+2 ...
CAUS     7, od+2, zbog+2, jer ...
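The rule "explicit list if present, otherwise the forms implied by the functor" can be sketched as a simple lookup. The dictionary below copies only a few rows of the table, and since the table itself is open-ended ("..."), each list is a lower bound rather than a complete inventory. The names `IMPLICIT_FORMS` and `possible_forms` are hypothetical.

```python
from typing import List, Optional

# Excerpt of the implicit-forms table; the source lists are incomplete.
IMPLICIT_FORMS = {
    "LOC": ["blizu+2", "kod+2", "u+6", "na+6"],
    "DIR1": ["s+2", "od+2", "iz+2"],
    "DIR3": ["na+4", "u+4"],
    "CAUS": ["7", "od+2", "zbog+2", "jer"],
}


def possible_forms(functor: str,
                   explicit_forms: Optional[List[str]] = None) -> List[str]:
    """Return the explicit forms if given, else those implied by the functor."""
    if explicit_forms:
        return list(explicit_forms)
    if functor not in IMPLICIT_FORMS:
        raise KeyError(f"no implicit forms recorded for functor {functor!r}")
    return list(IMPLICIT_FORMS[functor])


print(possible_forms("PAT", ["4"]))  # explicit list wins
print(possible_forms("DIR3"))        # falls back to the functor's forms
```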

11. Types of Complementations

In CROVALLEX 2.008, valence frames consist of inner participants and free modifications, each of which is either obligatory ('obl' for short) or non-obligatory but typical ('typ' for short). Typical inner participants and free modifications are those that are typically related to some verbs (or even to whole classes of them) and not to others.

The attribute 'type' is attached to each frame slot and has one of two values, 'obl' or 'typ'; both values apply to inner participants as well as to free modifications.
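A frame slot with its functor, optional list of forms, and 'type' attribute can be sketched as follows. The class name `FrameSlot`, the helper `obligatory_functors`, and the sample frame are hypothetical illustrations, not entries taken from the lexicon.

```python
from dataclasses import dataclass, field
from typing import List

SLOT_TYPES = ("obl", "typ")  # the two values of the 'type' attribute


@dataclass
class FrameSlot:
    functor: str
    type: str
    # An empty list stands for implicitly declared forms (Sec. 10).
    forms: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.type not in SLOT_TYPES:
            raise ValueError(f"type must be 'obl' or 'typ', got {self.type!r}")


def obligatory_functors(frame: List[FrameSlot]) -> List[str]:
    """List the functors of the obligatory slots, in frame order."""
    return [slot.functor for slot in frame if slot.type == "obl"]


# A made-up frame used only to exercise the code.
frame = [
    FrameSlot("AGT", "obl", ["1"]),
    FrameSlot("PAT", "obl", ["4"]),
    FrameSlot("LOC", "typ"),
]
print(obligatory_functors(frame))
```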

12. Frame Attributes

In CROVALLEX 2.008, frame attributes (more exactly, attribute-value pairs) are either obligatory or optional. The obligatory attributes have to be filled in every frame. The optional attributes might be empty, usually because they are not applicable.

Obligatory frame attributes:

  • gloss - verb or paraphrase roughly synonymous with the given frame/meaning; this attribute is not meant to serve as a source of synonyms or as a genuine lexicographic definition - it should be used just as a clue for fast orientation within the word entry
  • example - sentence(s) or sentence fragment(s) containing the given verb used with the given valence frame.

Optional frame attributes:

13. Class

Some frames are assigned semantic classes like 'motion', 'transport', 'push', 'meet', 'manner of expression', 'eat', etc. In CROVALLEX 2.008 there are 173 syntactic-semantic classes (more accurately, 72 classes with two further levels of subdivision). These classes have been derived from VerbNet (a verb lexicon based on Levin's verb classes which also provides selectional restrictions attached to semantic roles) and have been refined and modified specifically for the Croatian language. This classification is still tentative and should not be regarded as a properly defined ontology.

The motivation for introducing such semantic classification in CROVALLEX 2.008 was the fact that it simplifies systematic checking of consistency and allows for making more general observations about the data.

14. Aspect

Croatian distinguishes between perfective verbs (marked as 'fin.' for short in CROVALLEX 2.008) and imperfective verbs (marked as 'inf.'); this characteristic is called aspect. In CROVALLEX 2.008, the value of aspect is attached to each word entry as a whole (i.e., it is the same for all its frames and is shared by the homographs, if any).

Some verbs (e.g. analizirati, bombardirati) can be used in different contexts either as perfective or as imperfective (marked as 'dual.' for short in CROVALLEX 2.008).
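The three aspect labels can be modeled as a closed enumeration, so that any other label is rejected when an entry is read. The names `Aspect` and `parse_aspect` are hypothetical; only the labels 'fin.', 'inf.' and 'dual.' come from the text.

```python
from enum import Enum


class Aspect(Enum):
    PERFECTIVE = "fin."
    IMPERFECTIVE = "inf."
    DUAL = "dual."  # usable as perfective or imperfective, by context


def parse_aspect(label: str) -> Aspect:
    """Map a label to an Aspect member; raises ValueError if unknown."""
    return Aspect(label.strip())


print(parse_aspect("dual."))
```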

15. Idiomatic frames

The focus in CROVALLEX 2.008 is mainly on the primary or usual meanings of verbs. However, many frames also correspond to peripheral usages of verbs; these are idiomatic frames, labeled 'idiom'. An idiomatic frame is tentatively characterized either by a substantial shift in meaning (with respect to the primary sense), or by a small and strictly limited set of possible lexical values in one of its complementations.