AI Coding

ATLAS.ti is proud to offer the first of many truly automated features, powered by the world-leading GPT models from OpenAI.

AI Coding supports you, the researcher, by "reading" your documents and conducting fully automated inductive coding. Of course, you can review and refine the codes that are generated to suit your research, but it can save valuable time by processing your data and doing the menial task of initial coding for you. The time you save can be used to refine and merge codes, conduct analysis, and produce visualizations when reporting your research.

Please note that AI Coding is a beta feature. You are welcome to use it freely, and we welcome your feedback on all aspects of it.

You can access AI Coding via the Analysis or Document ribbons, or in the Analysis section in the document or document group context menus.

Quick Overview

Start by selecting the documents or document groups you want ATLAS.ti to code for you. After clicking "Start Coding," ATLAS.ti will ask if you want to proceed and show you a rough estimate of the time it will take to process your data.

AI Coding will upload your document content to ATLAS.ti and OpenAI servers. We will never upload your data without your explicit consent. If you wish to proceed, toggle the checkbox where you acknowledge that you agree to our EULA and Privacy Policy. OpenAI will NOT use ATLAS.ti user data to train OpenAI’s models.

You can continue working while AI Coding is running.

After AI Coding has finished, ATLAS.ti will present you an overview of the results.

Click "Review Results" to see the list of coded quotations with a list of codes on the left side. This view is fully interactive. Use it to review and refine the coding.

Select codes on the left side to show only quotations coded with this code, merge codes via drag-and-drop, delete codes you don't like, rename and color codes as you see fit, et cetera. Both the quotations as well as the codes can be right-clicked to show their context menu, giving you the full power of ATLAS.ti.

You can close this window now; the coding is done. Or you can leave it open and switch between the ATLAS.ti main window and this one to further refine and review.

All the codes that AI Coding coded with will be part of the code group "AI Codes".

AI Coding Document Selection

How to get the Best Out of AI Coding

It’s best to submit documents that belong together thematically in the same round of AI Coding. This will improve coding quality and enable ATLAS.ti to ignore repeating patterns across documents, such as interview questions.

AI Coding works per paragraph. For best results, interview questions or participant names in transcripts should be on their own paragraph, and the paragraph structure of a document should be well-defined. When editing your document, use the numbers on the left margin to determine separation between paragraphs.

PDF documents do not contain paragraphs, even if they look like they do, so results from PDF documents will likely be poor.

AI Coding skips very short paragraphs.

AI Coding only looks at the plain text of your documents. Existing codes, quotations, and formatting have no bearing on its results.

AI Coding In Depth

AI Coding works with the GPT family of large language models. These models are based on vast amounts of different texts and additional training by human researchers, enabling them to be used in a general-purpose way.

We at ATLAS.ti offer easy access to these capable models with our AI Coding feature without the need to understand and navigate the technical details and limitations. ATLAS.ti automatically splits your text into chunks that are handled the AI and passes them to the GPT models for repeated analysis. The results of the analysis are algorithmically combined to offer the best mix of codes covering different topics without producing an overwhelming amount of codes.

Here is an in-depth view on these steps:

Large language models are sensitive to the surrounding text as they have a window of attention, also called the context. Choosing the right context is key in working with a large language model. A short context can mean that not a lot information can be processed, because some words or even sentences only make sense in a larger context. Choosing a context that is too large overflows the model with information and makes it harder for code-extraction algorithms to find meaningful codes. ATLAS.ti choses contexts that are no less than 100 characters long, but breaks at natural paragraphs boundaries. We found that these confines have a positive impact on coding quality while leaving out most headings and titles which are of less importance to qualitative researchers. As a further step to reduce noise in the analysed data, ATLAS.ti removes paragraphs from analysis that have more than one occurrence, as these are likely recurring phrases from interviews or speaker names.

In the first step of analysis, the AI models are tasked with finding meaningful codings for each segment of data, acting as if the model was a qualitative research assistant. As the models from the GPT family are quite creative, AI Coding may produce similar codings, while others touch on the same topic but are expressed in dissimilar ways. This procedure can produce hundreds of different codes that are not manageable in a time-saving manner.

To address this, we use a characteristic of language models called embeddings, where each word or phrase can be associated with a point in a multidimensional space. One of the useful properties of this feature is that words and phrases that are more similar to one another are located nearer to each other than words or phrases that are less similar. Leveraging this property, ATLAS.ti clusters the codes by combining the ones nearest to each other into a collection. This clustering is repeated until a satisfactory number of collections is reached.

The combined codes are associated with the generated segments and proposed in the ATLAS.ti interface for viewing quotations.

Keep in mind that, while it is generally beneficial that the GPT models were trained on vast amounts of text, this may in some situations result in incorrect output that does not accurately reflect real people, places, or facts. In some cases, GPT models may encode social biases such as stereotypes or negative sentiment towards certain groups. You should evaluate the accuracy of any output as appropriate for your use case.