
Covarying collexeme analysis

(Stefanowitsch & Gries 2005)

What is CA?

(See note in Collostructional Analysis.txt)
  • Theoretical prerequisites
  • Methodological prerequisites

Why do we need CCA?

Collostructional analysis has so far paid little attention to possible interactions between (sets of) lexemes occurring in two (or more) slots of the same construction.

What is CCA?

Aim: to identify the association strength between pairs of lexical items occurring in two different slots of the same construction, i.e. to examine how lexical items in one slot covary with those in another slot.

Contingency table for covarying collexeme analysis*

|              | +Wslot2                      | -Wslot2                 | Row total                    |
| +Wslot1      | freq (+Wslot1, +Wslot2)      | freq (+Wslot1, -Wslot2) | freq of the lexeme in slot 1 |
| -Wslot1      | freq (-Wslot1, +Wslot2)      | freq (-Wslot1, -Wslot2) | row total                    |
| Column total | freq of the lexeme in slot 2 | column total            | grand total                  |

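*Note: freq (+Wslot1, +Wslot2), the two lexeme totals and the grand total (the number of tokens of the construction) can be obtained directly from the corpus; the remaining cells are derived by subtraction. The prerequisite for the table is that the construction must be defined and identified beforehand.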
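The table can be filled from a list of construction tokens. The sketch below assumes, hypothetically, that each token has already been reduced to a pair (slot-1 lexeme, slot-2 lexeme); the function name and the sample tokens are invented for illustration and are not data from Stefanowitsch & Gries (2005). Only the pair frequency, the two lexeme totals and the number of tokens are counted; the other cells follow by subtraction.

```python
def contingency_table(tokens, w1, w2):
    """Return the 2x2 table ((a, b), (c, d)) for the lexeme pair (w1, w2).

    tokens: one (slot1_lexeme, slot2_lexeme) tuple per construction token
    (hypothetical input format).
      a = freq(+Wslot1, +Wslot2)   b = freq(+Wslot1, -Wslot2)
      c = freq(-Wslot1, +Wslot2)   d = freq(-Wslot1, -Wslot2)
    """
    n = len(tokens)  # grand total: number of construction tokens
    a = sum(1 for s1, s2 in tokens if s1 == w1 and s2 == w2)  # observed pair frequency
    w1_total = sum(1 for s1, _ in tokens if s1 == w1)  # row total: w1 in slot 1
    w2_total = sum(1 for _, s2 in tokens if s2 == w2)  # column total: w2 in slot 2
    b = w1_total - a          # w1 in slot 1 with some other slot-2 lexeme
    c = w2_total - a          # w2 in slot 2 with some other slot-1 lexeme
    d = n - a - b - c         # tokens containing neither lexeme
    return (a, b), (c, d)


# Invented into-causative tokens, for illustration only
tokens = [
    ("trick", "thinking"), ("trick", "thinking"), ("fool", "thinking"),
    ("trick", "believing"), ("force", "accepting"), ("coerce", "accepting"),
]
print(contingency_table(tokens, "trick", "thinking"))  # ((2, 1), (1, 2))
```

Because the item-based version works only inside the construction, the grand total is simply the number of construction tokens; nothing from the rest of the corpus enters the table.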

Dual perspective: paradigmatic + syntagmatic (the set of choices available in one position of a syntagmatic structure is related to the set of choices available in another position of the same structure)

Statistics: Fisher-Yates exact test; the resulting p-value is log10-transformed to express collostruction strength
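A minimal sketch of this statistic for one 2×2 table, computing the hypergeometric tail exactly from integer binomial coefficients. Two conventions here are assumptions not stated in the notes: the test is one-tailed in the direction of the observed deviation, and repulsion is reported as a negative value (log10 p) while attraction is reported as -log10 p. The function name is hypothetical.

```python
import math


def collostruction_strength(a, b, c, d):
    """Signed log10-transformed one-tailed Fisher-Yates p for [[a, b], [c, d]].

    Attraction (a >= expected): uses P(X >= a) and returns -log10(p) > 0.
    Repulsion  (a <  expected): uses P(X <= a) and returns  log10(p) < 0.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    expected = row1 * col1 / n
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)  # possible values of cell a
    attracted = a >= expected
    ks = range(a, hi + 1) if attracted else range(lo, a + 1)
    # Hypergeometric tail with fixed margins, kept as exact integers
    numerator = sum(math.comb(col1, k) * math.comb(n - col1, row1 - k) for k in ks)
    denominator = math.comb(n, row1)
    # math.log10 accepts arbitrarily large ints, so tiny p-values do not underflow
    log10_p = math.log10(numerator) - math.log10(denominator)
    return -log10_p if attracted else log10_p


print(collostruction_strength(2, 0, 0, 2))  # ≈ 0.778 (= log10 6)
```

Keeping numerator and denominator as integers until the final logarithm is a design choice for corpus-sized counts, where the p-values of strongly attracted pairs can fall below the smallest representable float.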

Principle of Semantic Coherence: a word in any slot of a construction must be compatible with the semantics the construction provides for that slot, and there should be overall coherence among the words in all slots.

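"According to the principles of Construction Grammar, a lexeme can occur in a given slot of a construction without causing ambiguity because its meaning is compatible with the meaning of the construction. If lexemes in two or more slots can co-occur with the construction, this shows that the lexemes in these slots are semantically coherent, and that this coherent meaning is compatible with the constructional meaning." (Hu Jian 胡建 & Zhang Jiayi 张佳易 2012)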

*NOTE: this is not the Semantic Coherence Principle by Goldberg (1995: 50).

What kind of semantic coherence should be expected for any given construction?

Case study 1 (The into-causative)

SUBJ_causer V_causing-event OBJ_causee [OBL into V-ing_resulting-event]

  • Semantic constraints (Wierzbicka 1998): the causee initially does not want to perform the resulting event, but the causer overcomes this resistance, typically by persuasion or trickery.
  • Assumption: The causing-event slot should prefer verbs denoting actions that are suited to overcoming resistance; the resulting-event slot should prefer verbs denoting actions that causees are likely not to want to perform
  • Results: the covarying collexemes show a high degree of semantic coherence, and the sets of covarying collexemes are also highly systematic

Case study 2 (Possessive constructions)

a. NP_possessor's N_possessee

b. det N_whole of NP_part

  • Semantic constraints: s-genitive –> possession (ownership, kinship, body-part relations); of-construction –> partitive (part-whole, quantity relations)
  • Assumption: the semantic constraints above should produce semantic coherence effects in the collexeme pairs of these two possessive constructions
  • Results: ICE-GB data –> no clear pattern of semantic coherence; input-to-acquisition data (caretaker language from the Manchester Corpus) –> a clear semantic prototype of possession

Case study 3 (The way-construction)

SUBJ_theme V_move POSS way [OBL P NP]_path

  • Semantic constraints: (motion) verb <–> (path) preposition
  • Assumption: the motion verb and the path preposition should be semantically coherent
  • Results: verb-prep pairs in the way-construction display image-schematic coherence
    • verbs of circumvention/forcibly creating a path + OBSTACLE prepositions
    • verbs of forcibly creating a path/moving through a small opening + CONTAINER prepositions

CCA corrections

Drawback of the previous analysis: it restricts the investigation of collexeme covariance to one specific context (the construction in question), disregarding the frequencies of the construction and of the collexemes in the rest of the corpus

The version of covarying-collexeme analysis introduced above treats covarying-collexeme pairs as bigrams and investigates them only in the subcorpus made up of the tokens of the construction in question –> item-based covarying-collexeme analysis

Corrected version: system-based

  • Research question: Is the association of a given collexeme1-collexeme2-construction trigram stronger than any of the possible associations between just two of its elements in the absence of the third?
  • 2×2×2 (three-dimensional) frequency table crossing clm1/¬clm1, clm2/¬clm2 and cx/¬cx (p. 23)
  • Statistics: configural frequency analysis – binomial test + log10-transform (to identify the overall degree of attraction/repulsion of the three elements)
  • Results: similar to those of the item-based version; stricter in identifying repelled collexemes
  • Issue: Is the association of a given collexeme1-collexeme2-construction trigram (target trigram) stronger than any of the possible associations between just two of its elements in the absence of the third (elsewhere trigram)?
    • Possibility: two of the three elements are so strongly associated with each other that this association alone accounts for the significant association of the whole trigram.
    • Relevant to both the loosely and the strictly constructional view
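A minimal sketch of the trigram test, assuming first-order configural frequency analysis: the expected probability of the target cell is the product of the three marginal probabilities (total independence), and an exact binomial tail is taken in the direction of the deviation. The table encoding, function names and counts are hypothetical. Plain floats are used for the binomial terms, so this only suits small illustrative tables; real corpus counts would call for log-space arithmetic.

```python
import math
from itertools import product


def empty_table():
    """All eight cells of the 2x2x2 table, keyed by (clm1, clm2, cx) presence."""
    return {cfg: 0 for cfg in product((True, False), repeat=3)}


def cfa_target_trigram(counts):
    """Binomial CFA test for the target trigram clm1+clm2+cx.

    Expected probability = P(clm1) * P(clm2) * P(cx) (total independence).
    Returns (p, signed strength): -log10(p) for attraction, log10(p) for repulsion.
    """
    n = sum(counts.values())
    p1 = sum(v for (w1, _, _), v in counts.items() if w1) / n
    p2 = sum(v for (_, w2, _), v in counts.items() if w2) / n
    p3 = sum(v for (_, _, cx), v in counts.items() if cx) / n
    p_exp = p1 * p2 * p3
    k = counts[(True, True, True)]  # observed frequency of the target trigram

    def binom(i):
        # Binomial probability of exactly i target trigrams among n observations
        return math.comb(n, i) * p_exp**i * (1 - p_exp) ** (n - i)

    if k >= n * p_exp:  # at least as frequent as expected: upper tail
        p = sum(binom(i) for i in range(k, n + 1))
        return p, -math.log10(p)
    p = sum(binom(i) for i in range(k + 1))  # less frequent than expected: lower tail
    return p, math.log10(p)


# Invented counts, for illustration only
table = empty_table()
table[(True, True, True)] = 2
table[(False, False, False)] = 2
print(cfa_target_trigram(table))
```

This tests the whole trigram against independence of all three elements; it does not by itself rule out the possibility that one strong two-way association drives the result.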

System-based corrections under a strictly and loosely constructional view

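Compute the log10-transformed p-value for each trigram below, then subtract the value of each elsewhere trigram from that of the target trigram (clm1+clm2+cx); the differences show whether the target trigram is distinctive.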

  • Strictly constructional view:

clm1+clm2+cx - clm1+¬clm2+cx
clm1+clm2+cx - clm1+clm2+¬cx
clm1+clm2+cx - ¬clm1+clm2+cx

  • Loosely constructional view:

clm1+clm2+cx - clm1+clm2+¬cx
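The two views differ only in which elsewhere trigrams the target is compared with. A minimal sketch, assuming the per-trigram values are already available as signed strengths (larger = more strongly attracted); the dictionary layout, function name and numbers are hypothetical.

```python
# Trigram labels follow the notes' notation; "¬" marks an absent element.
TARGET = "clm1+clm2+cx"
ELSEWHERE = {
    # strictly constructional: all three elements vs. any two without the third
    "strict": ["clm1+¬clm2+cx", "clm1+clm2+¬cx", "¬clm1+clm2+cx"],
    # loosely constructional: the pair inside vs. outside the construction
    "loose": ["clm1+clm2+¬cx"],
}


def distinctiveness(strengths, view):
    """Subtract each relevant elsewhere trigram's value from the target's.

    strengths: trigram label -> signed log10-transformed p-value
    (assumed convention: larger value = stronger attraction).
    """
    target = strengths[TARGET]
    return {label: target - strengths[label] for label in ELSEWHERE[view]}


# Invented values, for illustration only
s = {"clm1+clm2+cx": 10.0, "clm1+¬clm2+cx": 4.0,
     "clm1+clm2+¬cx": 12.0, "¬clm1+clm2+cx": 1.5}
strict_diffs = distinctiveness(s, "strict")
loose_diffs = distinctiveness(s, "loose")
```

With these invented values the difference for clm1+clm2+¬cx is negative: the pair would be more strongly associated outside the construction than inside it, which matters under both views.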


  • Problem: the data are problematic, especially with respect to the repelled trigrams
  • Data retrieval by simple string search (all -ing words, because the POS tagging in the BNC is unreliable) –> maximal recall but reduced precision –> inflated frequencies of the items in question in the elsewhere context
  • The potential for applying this method (system-based CCA) is severely limited
  • Still, it is promising enough to be a valuable addition to CA


  • CCA: a method for investigating the relationship between lexical items occurring in different slots of the same construction and, more generally, the associations between triplets of linguistic signs
  • Two specific theoretical issues:
    • semantic compatibility between constructions and lexical items
    • semantic coherence between lexical items occurring in different slots of the same construction
  • Two variants of CCA:
    • item-based (pair of covarying collexemes ONLY in the construction in scrutiny)
    • system-based (consider overall single and joint frequencies of the words and the construction)
      • loosely constructional view (the co-occurrence of two lexical items within a construction VS. their co-occurrence outside of this construction)
      • strictly constructional view (the co-occurrence of all three elements VS. the co-occurrence of any two of all three elements)

    • The item-based method has considerably higher precision and recall –> preferable

  • Collostruction strength is more informative than raw frequency data

  • Future work: apply CCA to register and dialect variation; use statistical clustering techniques to identify semantic classes objectively