Code Co-occurrence Coefficient

The c-coefficient indicates the strength of the relation between two codes.

To activate the c-coefficient, click on the wheel icon.

Code Co-occurrence Table coefficient

The range of the c-coefficient is between 0 and 1:

  • 0 means the two codes do not co-occur
  • 1 means the two codes co-occur wherever they are used

The calculation of the c-coefficient is based on approaches borrowed from quantitative content analysis. It is calculated as follows:

c = n12 / (n1 + n2 - n12)

n1 = number of quotations coded with code 1
n2 = number of quotations coded with code 2
n12 = number of co-occurrences of code 1 and code 2
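The formula above can be sketched as a small Python function. This is an illustrative sketch, not the program's own implementation; the function name `c_coefficient` and the choice to return 0 when the denominator is zero (for example, when neither code has been applied) are assumptions.

```python
def c_coefficient(n1: int, n2: int, n12: int) -> float:
    """Return c = n12 / (n1 + n2 - n12).

    n1, n2: number of quotations coded with code 1 and code 2.
    n12:    number of co-occurrences of the two codes.
    """
    denominator = n1 + n2 - n12
    if denominator == 0:
        # Assumption: e.g. neither code applied, so there is nothing to relate.
        return 0.0
    return n12 / denominator


if __name__ == "__main__":
    print(c_coefficient(10, 10, 10))  # codes always used together: 1.0
    print(c_coefficient(10, 10, 0))   # codes never co-occur: 0.0
```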

The c-coefficient is most useful when you work with larger numbers of cases and structured data, such as answers to open-ended survey questions. If you use the c-coefficient (also called the c-index), pay attention to the colored hints shown in the table cells. Because your data are qualitative, the c-coefficient is not a statistic like the Pearson correlation coefficient, and therefore no p-values are provided.

Distortion due to unequal frequencies

An inherent issue with the c-index and similar measures is that they are distorted when the frequencies of the two codes differ greatly. In such cases the coefficient tends to be much smaller than the relevance of the co-occurrence would suggest. For instance, if you coded 100 quotations with code 'A' and 10 with code 'B', and there are 5 co-occurrences, the c-coefficient is:

c = 5/(100 + 10 - 5) = 5/105 ≈ 0.048

A coefficient of only 0.048 is easy to overlook, even though code B co-occurs with code A in 50% of its applications (5 of 10).
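To make the distortion concrete, the sketch below computes the c-coefficient for the 100/10/5 example next to the share of the less frequent code's applications that co-occur with the other code. The helper `share_of_smaller_code` is hypothetical and only illustrates the 50% figure; it is not a measure the program reports.

```python
def c_coefficient(n1: int, n2: int, n12: int) -> float:
    denominator = n1 + n2 - n12
    return n12 / denominator if denominator else 0.0


def share_of_smaller_code(n1: int, n2: int, n12: int) -> float:
    """Fraction of the less frequent code's applications that co-occur
    with the other code (hypothetical helper for illustration)."""
    smaller = min(n1, n2)
    return n12 / smaller if smaller else 0.0


n_a, n_b, n_ab = 100, 10, 5  # figures from the example in the text
print(f"c = {c_coefficient(n_a, n_b, n_ab):.3f}")                         # c = 0.048
print(f"share of B with A = {share_of_smaller_code(n_a, n_b, n_ab):.0%}")  # 50%
```

The low c value comes from the large n1 in the denominator, while the co-occurrence covers half of all uses of code B.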

If one code of a pair has been applied more than five times as often as the other, a yellow dot appears in the top right corner of the table cell. Whenever a cell shows this yellow marker, take a closer look at its co-occurrences despite the low coefficient.
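The yellow-dot rule can be expressed as a frequency-ratio check. This is a sketch under assumptions: the name `has_unequal_frequencies` is hypothetical, "more than five times" is read as a strict ratio greater than 5, and a pair in which one code has zero applications is treated as not flagged.

```python
def has_unequal_frequencies(n1: int, n2: int, threshold: float = 5.0) -> bool:
    """True if one code was applied more than `threshold` times as often
    as the other (assumed reading of the yellow-dot condition)."""
    smaller, larger = sorted((n1, n2))
    if smaller == 0:
        # Ratio undefined; assumption: do not flag the cell.
        return False
    return larger / smaller > threshold


print(has_unequal_frequencies(100, 10))  # ratio 10 -> True
print(has_unequal_frequencies(50, 10))   # ratio exactly 5 -> False
```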

Out of Range

The c-index assumes separate, non-overlapping text entities. Only then does it stay within its expected range of values.

If there is redundant coding, the coefficient can exceed the 0 to 1 range it is supposed to stay within. Consider the following example:

  • Quotation 1 is coded with codes A and B.
  • Quotation 2 overlaps quotation 1 and is coded with code B.

Quotation 1 counts as one co-occurrence event, and quotation 2, because it overlaps quotation 1 (which carries code A), counts as another. With n1 = 1 (code A), n2 = 2 (code B) and n12 = 2, the result is twice the allowed maximum:

c = 2/(1 + 2 - 2) = 2
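The following sketch reproduces this out-of-range result with a simplified counting model. It is an assumption, not the program's documented algorithm: quotations are hypothetical character-offset spans, and every pair of an A-coded and a B-coded quotation that is the same quotation or overlaps counts as one co-occurrence event.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quotation:
    # Hypothetical representation: character offsets plus the applied codes.
    start: int
    end: int
    codes: frozenset


def overlaps(q1: Quotation, q2: Quotation) -> bool:
    """True if the two text spans share at least one character."""
    return q1.start < q2.end and q2.start < q1.end


def count_cooccurrences(quotes, code_a, code_b):
    """Return (n1, n2, n12) under the simplified model: each pair of a
    code_a quotation and a code_b quotation that is the same quotation
    or overlaps counts as one co-occurrence event."""
    with_a = [q for q in quotes if code_a in q.codes]
    with_b = [q for q in quotes if code_b in q.codes]
    n12 = sum(1 for qa in with_a for qb in with_b if qa is qb or overlaps(qa, qb))
    return len(with_a), len(with_b), n12


q1 = Quotation(0, 100, frozenset({"A", "B"}))
q2 = Quotation(50, 150, frozenset({"B"}))  # overlaps q1: redundant B coding
n1, n2, n12 = count_cooccurrences([q1, q2], "A", "B")
c = n12 / (n1 + n2 - n12)
print(n1, n2, n12, c)  # 1 2 2 2.0
```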

All cells displaying an out-of-range number (> 1) show a red circle in the top right corner.

If the c-coefficient exceeds 1, clean up your coding by eliminating the redundant codings. See Finding Redundant Codings.
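As a rough illustration of what counts as redundant coding here, the sketch below lists pairs of overlapping quotations that carry the same code. It is a hypothetical stand-in for the Finding Redundant Codings feature, not its actual logic; the `Quotation` type, names and offsets are assumptions.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Quotation:
    name: str
    start: int  # hypothetical character offsets
    end: int
    codes: frozenset


def find_redundant_codings(quotes):
    """Return (code, name1, name2) for every pair of overlapping
    quotations that carry the same code."""
    hits = []
    for q1, q2 in combinations(quotes, 2):
        if q1.start < q2.end and q2.start < q1.end:
            for code in sorted(q1.codes & q2.codes):
                hits.append((code, q1.name, q2.name))
    return hits


quotes = [
    Quotation("Q1", 0, 100, frozenset({"A", "B"})),
    Quotation("Q2", 50, 150, frozenset({"B"})),
    Quotation("Q3", 200, 250, frozenset({"B"})),
]
print(find_redundant_codings(quotes))  # [('B', 'Q1', 'Q2')]
```

In the example from the text, removing code B from one of the two overlapping quotations leaves n1 = 1, n2 = 1 and n12 = 1 under the same counting model, so c returns to 1.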

Summary of color indicators

Yellow circle: Unequal quotation frequencies. The ratio between the frequencies of the column code and the row code exceeds the threshold of 5.

Red circle: The c-index exceeds the 0 to 1 range.

Orange circle: Both the red and the yellow conditions apply.
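The three indicators can be summarized in one decision function. This is a sketch under stated assumptions: the name `cell_indicator` is hypothetical, the yellow rule is read as a frequency ratio strictly greater than 5, and "out of range" is read as c > 1, as described above.

```python
from typing import Optional


def cell_indicator(n1: int, n2: int, n12: int, threshold: float = 5.0) -> Optional[str]:
    """Return 'yellow', 'red', 'orange' or None for one table cell
    (assumed reading of the color rules)."""
    denominator = n1 + n2 - n12
    c = n12 / denominator if denominator else 0.0
    smaller, larger = sorted((n1, n2))
    unequal = smaller > 0 and larger / smaller > threshold  # yellow condition
    out_of_range = c > 1                                    # red condition
    if unequal and out_of_range:
        return "orange"
    if out_of_range:
        return "red"
    if unequal:
        return "yellow"
    return None


print(cell_indicator(100, 10, 5))  # yellow
print(cell_indicator(1, 2, 2))     # red
print(cell_indicator(1, 10, 6))    # orange (c = 1.2, ratio 10)
```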