Search & Code

With the recent advances in deep learning, the ability of algorithms to analyse text has improved considerably. Creative use of advanced artificial intelligence techniques can be an effective tool for doing in-depth research.

Under the Search & Code tab, ATLAS.ti offers four ways of searching for relevant information in your data that can then be automatically coded.

Search & Code tab

You find more detail on each of the four tools in the main manual:

Below only one of the tools, namely Sentiment Analysis is explained:

Sentiment Analysis

Currently supported languages are: English, German, Spanish and Portuguese.

Sentiment analysis is the interpretation and classification of emotions (positive, negative and neutral) within text data using text analysis techniques.

Application Examples

  • identifying and cataloguing a piece of text according to the tone conveyed by it.
  • understanding the social sentiment of a brand, product or service.
  • identifying respondent sentiment toward the subject matter that is discussed in online conversations and feedback.
  • analysing student evaluations of lectures, seminars, or study programs.

Sentiment analysis works best on structured data like open-ended questions in a survey, evaluations, online conversations, etc.

Carrying out a Sentiment Analysis

To open the tool, sSelect the Search & Code tab and from there Sentiment Analysis.

Select documents or document groups that you want to search and click Continue.

Select whether the base unit for the search and the later coding should be paragraphs or sentences, and which sentiment you want to search for.

ATLAS.ti proposes code label for each sentiment: Positive / Neutral / Negative. If you want to use different code names, you can change them here.

Manage Models: If you want to improve your results, you can download and install a more comprehensive model. Currently, it is available for German and English language texts. More languages will be added in the future. The size of the German model is ~ 230 MB and for the English model ~ 110 MB.

Click on Manage Models if you want to install or uninstall an extended model.

Click Continue to begin searching the selected documents. On the next screen, the search results are presented, and you can review them.

The result page shows you a Quotation Reader indicating where the quotations are when coding the data with the proposed code. If coding already exist at the quotation, it will also be shown.

By clicking on the eye icon, you can change between small. medium and large previews.

You can code all results with the proposed codes at once; or you can go through review each data segment and then code it by clicking on the plus next to the code name. You can code all results at once by selecting Accept Proposed Codings from the Drop-Down menu.

Depending on the area you have selected at the beginning, either the sentence or the paragraph is coded.

The regular Coding Dialogue is also available to add or remove codes.

The Search Engine Behind the Sentiment Analysis

We are using spaCy as our natural language processing engine. More detailed information can be found here.

Input data gets processed in a pipeline - one step after the other as to improve upon the derived knowledge of the prior step. Click here for further details.

The first step is a tokenizer to chunk a given text into meaningful parts and replace ellipses etc. For example, the sentence:

“I should’ve known(didn’t back then).” 
    will get tokenized to:
        “I should have known ( did not back then ).“

The tokenizer uses a vocabulary for each language to assign a vector to a word. This vector was pre-learned by using a corpus and represents a kind of similarity in usage in the used corpus. Click here more information.

The next component is a tagger that assigns part-of-speech tags to every token and lexemes if the token is a word. The character sequence “mine”, for instance, has quite different meanings depending on whether it is a noun of a pronoun.

Thus, it is not just a list of words that is used as benchmark. Therefore, there is also no option to add your own words to a list or to see the list of words that is used.

The sentiment analysis pipeline is trained on a variety of texts ranging from social media discussions to peoples’ opinions on different subjects and products. We are using modified pre-trained/built-from-the-ground-up models - depending on the language.

Artificial intelligence techniques have been developed for big data analysis. The data corpora usually handled by ATLAS.ti are considerably smaller. Thus, you cannot expect all results to be perfect. Reviewing the results will be a necessary component of the analysis process when using these tools. When working with the tools, you will see that the tools will add another level to your analysis. You find things that you simply do not see when coding the data manually, or would have not considered to code. We, at ATLAS.ti, consider manual and automatic coding to be complementary; each enhancing your analysis in a unique way.