Code Co-occurrence Coefficient

The c-coefficient indicates the strength of the relation between two codes.

This option can be activated in the ribbon of the Code Co-occurrence Table.
The range of the c-coefficient is between 0 and 1:

  • 0 means the two codes do not co-occur
  • 1 means these two codes co-occur wherever they are used

The calculation of the c-coefficient is based on approaches borrowed from quantitative content analysis. It is calculated as follows:

c = n12 / (n1 + n2 - n12)

n12 = number of co-occurrences of code 1 and code 2
n1, n2 = number of quotations coded with code 1 and with code 2, respectively
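As a minimal sketch, the formula can be written as a small Python function. The function name c_coefficient and the sample counts are illustrative assumptions and are not part of the software.

```python
def c_coefficient(n1: int, n2: int, n12: int) -> float:
    """Return c = n12 / (n1 + n2 - n12).

    n1, n2: number of quotations coded with code 1 and code 2
    n12:    number of co-occurrences of the two codes
    """
    denominator = n1 + n2 - n12
    if denominator <= 0:
        raise ValueError("n1 + n2 - n12 must be positive")
    return n12 / denominator


# Made-up counts: 20 and 10 quotations, 5 co-occurrences.
print(c_coefficient(20, 10, 5))  # 0.2
```

If the two codes never co-occur (n12 = 0), the function returns 0. If every application of one code coincides with the other (n1 = n2 = n12), it returns 1.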

The c-coefficient is useful when working with a larger number of cases and with structured data such as open-ended questions from surveys. If you use the c-coefficient, pay attention to the additional colored hints. Because your data are qualitative, the c-coefficient is not the same as, for instance, a Pearson correlation coefficient, and therefore no p-values are provided.

Distortion due to unequal frequencies

An inherent issue with the c-coefficient and similar measures is that they are distorted when the frequencies of the two codes differ too much. In such cases the coefficient tends to be much smaller than the potential significance of the co-occurrence. For instance, if you had coded 100 quotations with code 'A' and 10 with code 'B', and you have 5 co-occurrences, then the c-coefficient is:

c = 5/(100 + 10 - 5) = 5/105 = 0.048

A coefficient of only 0.048 may easily escape your attention, although code B co-occurs with code A in 50% of all its applications (5 out of 10).
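The contrast between the two figures can be reproduced with a short Python sketch. The helper share_of_rarer_code is a hypothetical name introduced only for this illustration; the software does not report this value as such.

```python
def c_coefficient(n1: int, n2: int, n12: int) -> float:
    """c = n12 / (n1 + n2 - n12)."""
    return n12 / (n1 + n2 - n12)


def share_of_rarer_code(n1: int, n2: int, n12: int) -> float:
    """Fraction of the less frequent code's applications that co-occur."""
    return n12 / min(n1, n2)


# Counts from the example: 100 x code A, 10 x code B, 5 co-occurrences.
n_a, n_b, n_ab = 100, 10, 5

print(f"c = {c_coefficient(n_a, n_b, n_ab):.3f}")                # c = 0.048
print(f"share of B = {share_of_rarer_code(n_a, n_b, n_ab):.0%}")  # share of B = 50%
```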

If one code of a pair has been applied more than 5 times as often as the other code, a yellow dot appears in the top right of the table cell. Whenever a cell shows the yellow marker, it is worth looking into the co-occurrences of this cell despite a low coefficient.

When you hover with the mouse over a cell, a pop-up note displays the ratio of the two codes.


The c-coefficient assumes separate, non-overlapping text entities. Only then can we expect a correct range of values.

The coefficient can exceed the 0 - 1 range it is supposed to stay within if there is redundant coding. Let's take a look at the following example:

  • Quotation 1 is coded with codes A and B.
  • An overlapping quotation 2 is coded with code B.

Then quotation 1 counts for 1 co-occurrence event and the overlapping quotation 2 for another. This results in a value of twice the allowed maximum:

c = 2/(1 + 2 - 2) = 2
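The example can be reproduced with a simplified Python model. The Quotation class, the character positions, and the counting rule (one event for every pair of overlapping quotations in which one carries code A and the other carries code B, with a quotation counting as overlapping itself) are assumptions made for this sketch. They are chosen to match the description above and are not necessarily how the software counts internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quotation:
    start: int        # first character position (assumed representation)
    end: int          # position after the last character
    codes: frozenset  # names of the codes applied to this quotation


def overlaps(q1: Quotation, q2: Quotation) -> bool:
    """True if the two text spans share at least one character."""
    return q1.start < q2.end and q2.start < q1.end


def co_occurrence_events(quotations, code_a, code_b) -> int:
    """Count pairs (qa, qb) where qa has code_a, qb has code_b, and they overlap."""
    return sum(
        1
        for qa in quotations if code_a in qa.codes
        for qb in quotations if code_b in qb.codes and overlaps(qa, qb)
    )


quotations = [
    Quotation(0, 50, frozenset({"A", "B"})),  # quotation 1
    Quotation(30, 80, frozenset({"B"})),      # overlapping quotation 2
]

n_a = sum("A" in q.codes for q in quotations)        # 1
n_b = sum("B" in q.codes for q in quotations)        # 2
n_ab = co_occurrence_events(quotations, "A", "B")    # 2

print(n_ab / (n_a + n_b - n_ab))  # 2.0
```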

All cells displaying an out-of-range number (> 1) show a red dot in the top right corner.

If the c-coefficient exceeds 1, you need to clean up your coding and eliminate the redundant codings. See Finding Redundant Codings.

Summary of Color Indicators

Yellow dot: Unequal quotation frequencies - the ratio between the frequencies of the column code and the row code exceeds the threshold of 5.

Red dot: The c-coefficient exceeds the 0 - 1 range.

Orange dot: Both the red and the yellow conditions apply to the cell.
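The three indicators can be summarized in a small Python sketch. The function name indicator, its parameters, and the returned strings are illustrative assumptions. Only the threshold of 5 and the out-of-range condition (> 1) are taken from the description above.

```python
from typing import Optional


def indicator(n_row: int, n_col: int, n12: int,
              ratio_threshold: float = 5.0) -> Optional[str]:
    """Return 'yellow', 'red', 'orange', or None for one table cell.

    n_row, n_col: quotation frequencies of the row and column code (both > 0)
    n12:          number of co-occurrences of the two codes
    """
    if n_row <= 0 or n_col <= 0:
        raise ValueError("both codes must have been applied at least once")

    c = n12 / (n_row + n_col - n12)
    ratio = max(n_row, n_col) / min(n_row, n_col)

    unequal = ratio > ratio_threshold   # yellow condition
    out_of_range = c > 1                # red condition

    if unequal and out_of_range:
        return "orange"
    if out_of_range:
        return "red"
    if unequal:
        return "yellow"
    return None


print(indicator(100, 10, 5))  # yellow
print(indicator(1, 2, 2))     # red
print(indicator(1, 6, 6))     # orange
print(indicator(20, 10, 5))   # None
```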