Code-Document Table: Normalization

If documents differ in length, or document groups differ in size, absolute frequencies can be misleading for comparison. Suppose you conduct interviews where some last only half an hour and others two hours. In the longer interviews, a respondent is more likely to talk about a given issue, and to talk about it more often. Comparing absolute numbers may then merely reflect the different durations of the interviews. The same applies to reports: if one report is 25 pages long and another 100, and both are coded in full, absolute frequencies are not a valid basis for comparison. The total number of quotations in a document or document group (see the first row) can help you decide whether to normalize the data.

Data Normalization

If you normalize the data, the number of codings is adjusted. The document or document group (column or row) with the highest total serves as the benchmark, and the codings of all other documents or groups are scaled so that their totals match it. For example, if one document has 100 codings in total and another has 50, all codings of the second document are multiplied by 100/50 = 2.
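The scaling rule can be illustrated with a small Python sketch. It assumes documents are the columns of the table and represents the table as a plain dictionary mapping each document to its per-code counts; the `normalize` function and the document and code names are hypothetical and only mirror the arithmetic described above, not how the software computes it internally.

```python
from typing import Dict

# Hypothetical layout: document name -> {code name -> number of codings}.
CodeDocTable = Dict[str, Dict[str, int]]


def normalize(table: CodeDocTable) -> Dict[str, Dict[str, float]]:
    """Scale each document's codings so its total equals the highest total."""
    totals = {doc: sum(counts.values()) for doc, counts in table.items()}
    benchmark = max(totals.values())  # the document with the highest total
    normalized: Dict[str, Dict[str, float]] = {}
    for doc, counts in table.items():
        # Factor is benchmark / document total, e.g. 100 / 50 = 2.
        factor = benchmark / totals[doc] if totals[doc] else 0.0
        normalized[doc] = {code: n * factor for code, n in counts.items()}
    return normalized


if __name__ == "__main__":
    table = {
        "Interview A": {"Stress": 60, "Support": 40},  # total 100 (benchmark)
        "Interview B": {"Stress": 30, "Support": 20},  # total 50, factor 2
        "Interview C": {"Stress": 10, "Support": 20},  # total 30, factor 100/30
    }
    for doc, counts in normalize(table).items():
        print(doc, counts)
```

Interview C shows why normalized values are often fractional: its factor is 100/30, so 10 codings become roughly 33.33.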

As normalization often produces fractional (non-integer) values, it is useful to work with relative rather than absolute frequencies. See Relative Frequencies.