
Utilizing Amazon Comprehend for Natural Language Processing applications

Authored by Logan Wille

Natural Language Processing is as broad a topic as words themselves, and as such, there is far too much information to cover in one article. This article is the second part in a series discussing the value of Natural Language Processing. Read part one here.

Across phone calls, electronic communications, and written documents, the prevalence of language is evident. Given our reliance on a digital business environment, understanding the context of words is even more important for operations to succeed. Businesses that can extract information from these types of data in a quick and systematic way will be able to fully harness customer insights and market trends when making business decisions. Leveraging natural language processing and Amazon Comprehend in your business operations can help your organization grow in the current digital landscape.

What is Natural Language Processing?

Natural Language Processing (NLP) is a category of artificial intelligence designed to listen to, read, understand, and generate natural language. The most common use case is natural language understanding, where a model is designed to read a piece of text and extract information from it, whether by categorizing the text, extracting the entities discussed in the text, or determining the sentiment of the text. NLP’s biggest value in business is its ability to provide insights into unstructured data that previously would have been too much of a burden to assess.

What is Amazon Comprehend?

Amazon Comprehend is an Amazon Web Services (AWS) product that provides pre-trained machine learning NLP models, offering both out-of-the-box natural language understanding and custom model creation, to help businesses uncover insights from their written text data. The out-of-the-box implementations are:

  • Entity extraction
  • Key phrase extraction
  • Language identification
  • Personally identifiable information (PII) detection
  • Sentiment
  • Syntax

These out-of-the-box models require zero training; the only requirements are formatting the text correctly and uploading it to AWS. This offers a simple path to analyzing unstructured and unlabeled text data.
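As a minimal sketch of what calling two of these out-of-the-box models might look like, the Python snippet below uses the boto3 Comprehend client's detect_sentiment and detect_entities calls. The helper names analyze_text and make_comprehend_client are hypothetical, English text is assumed, and AWS credentials and a region are assumed to be configured already.

```python
from typing import Any, Dict


def make_comprehend_client() -> Any:
    """Create a Comprehend client (assumes AWS credentials and region are configured)."""
    import boto3  # imported lazily so the helper below can be exercised without AWS access

    return boto3.client("comprehend")


def analyze_text(client: Any, text: str, language_code: str = "en") -> Dict[str, Any]:
    """Run sentiment and entity detection on a single piece of text.

    `client` is expected to behave like a boto3 Comprehend client.
    """
    sentiment = client.detect_sentiment(Text=text, LanguageCode=language_code)
    entities = client.detect_entities(Text=text, LanguageCode=language_code)
    return {
        # Overall label such as POSITIVE, NEGATIVE, NEUTRAL, or MIXED
        "sentiment": sentiment["Sentiment"],
        # (entity text, entity type) pairs, e.g. ("Amazon", "ORGANIZATION")
        "entities": [(e["Text"], e["Type"]) for e in entities["Entities"]],
    }
```

Calling analyze_text(make_comprehend_client(), "some customer feedback") would send the text to the live service; passing the client in as an argument also makes it easy to substitute a stub when experimenting offline.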

By leveraging Amazon Comprehend, businesses can organize and subdivide unstructured text data and begin structuring it with relevant meta-information, such as the companies mentioned or the general sentiment. It can also extract personal information from text, such as addresses or phone numbers, so that this information can be logged in a database. After extracting the data, businesses can begin digging into the valuable insights and explore areas of their business they need to re-evaluate based on current client and market trends.
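To illustrate the idea of extracting personal information and logging it in a database, here is a rough sketch that pairs the boto3 detect_pii_entities call with a SQLite table. The function names, the pii_findings table, and its columns are all hypothetical choices for illustration. The PII response reports character offsets rather than the matched text, so the sketch slices the original string to recover each value.

```python
import sqlite3
from typing import Any, List, Tuple


def extract_pii(client: Any, text: str, language_code: str = "en") -> List[Tuple[str, str, float]]:
    """Return (pii_type, matched_text, score) for each PII entity found in `text`."""
    response = client.detect_pii_entities(Text=text, LanguageCode=language_code)
    return [
        # Recover the matched value from the offsets in the response
        (e["Type"], text[e["BeginOffset"]:e["EndOffset"]], e["Score"])
        for e in response["Entities"]
    ]


def log_pii(conn: sqlite3.Connection, doc_id: str, findings: List[Tuple[str, str, float]]) -> int:
    """Store findings in a hypothetical `pii_findings` table and return the rows written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pii_findings "
        "(doc_id TEXT, pii_type TEXT, value TEXT, score REAL)"
    )
    conn.executemany(
        "INSERT INTO pii_findings VALUES (?, ?, ?, ?)",
        [(doc_id, pii_type, value, score) for pii_type, value, score in findings],
    )
    conn.commit()
    return len(findings)
```

In practice, a table holding personal information would itself need appropriate access controls; the SQLite database here only stands in for whatever store the business already uses.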

Amazon Comprehend custom models

In addition to the out-of-the-box implementations, Amazon Comprehend allows for custom NLP models to fit specific business applications. Currently, Amazon Comprehend offers either classification or entity extraction for these custom models. Custom classification can organize articles into user-defined groups, such as whether a piece of text is fiction or non-fiction. These groups can be either separate or overlapping:

  • If separate, no one text can belong to two different categories
  • If overlapping, a text can belong to one category or several categories

Both the categories and whether the grouping is separate or overlapping are decided at the time of data labeling. Amazon Comprehend limits the maximum number of categories to 100.
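The difference between separate and overlapping categories shows up directly in how the labeled data is written. The sketch below assumes the two-column CSV layout described in the AWS documentation for custom classification (label first, then the text, with overlapping labels joined by a "|" delimiter); the exact format should be checked against the current documentation, and the helper name to_training_csv is hypothetical.

```python
import csv
import io
from typing import Iterable, List, Tuple


def to_training_csv(
    examples: Iterable[Tuple[List[str], str]],
    multi_label: bool = False,
    delimiter: str = "|",
) -> str:
    """Serialize (labels, text) pairs as headerless CSV rows: labels first, then text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for labels, text in examples:
        # Separate (non-overlapping) categories allow exactly one label per text
        if not multi_label and len(labels) != 1:
            raise ValueError("separate categories require exactly one label per text")
        writer.writerow([delimiter.join(labels), text])
    return buffer.getvalue()


# Separate categories: one label per row
print(to_training_csv([(["FICTION"], "Once upon a time")]))
# Overlapping categories: several labels joined by the delimiter
print(to_training_csv([(["FICTION", "HUMOR"], "A dragon, a joke")], multi_label=True))
```

Using the csv module rather than string concatenation means that text containing commas or quotes is escaped correctly.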

Entity extraction finds, classifies, and extracts entities in a text that belong to a defined category. Utilizing Amazon Comprehend, the user can create custom categories to extract, such as any person mentioned in the text who is identified as a data scientist. Another example would be identifying devices: Amazon Comprehend would extract the devices mentioned in a text and tag them. This powerful tool can find important pieces of information contained in a text without requiring a human to read every line, reducing the time spent sifting through documents and ultimately saving a business precious hours of work.

Overall, custom models take a bit more work than the out-of-the-box implementations, and a decent amount of data is required. A key requirement for developing a custom model is labeled data, that is, text entries in which the specific pieces of information to be extracted are identified. This labeled data is used to train the custom model so that it can identify the desired outputs in unlabeled data. The volume of labeled data needed depends on a few things: the task at hand, the accuracy required, and the subtlety of the differences between texts.

Once the labeled data is obtained, the process of training Amazon Comprehend is straightforward. The properly formatted data needs to be uploaded to Amazon S3, and then a new Amazon Comprehend custom model can be created. After training is complete, metrics are provided to assess the performance of the custom model. To run the custom model on new, unseen data, either create an endpoint that is accessed through the AWS console or API, or create a batch operation.
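For readers who prefer the API to the console, the steps above can be sketched roughly as follows for a custom classifier, using the boto3 Comprehend client's create_document_classifier, describe_document_classifier, and classify_document calls. The helper names are hypothetical, the classifier name, IAM role ARN, S3 URI, and endpoint ARN are placeholders the caller must supply, and English training data is assumed.

```python
from typing import Any, Dict, Tuple


def start_training(client: Any, name: str, role_arn: str, s3_uri: str,
                   multi_label: bool = False) -> str:
    """Start training a custom classifier from labeled data already uploaded to S3."""
    response = client.create_document_classifier(
        DocumentClassifierName=name,
        DataAccessRoleArn=role_arn,  # IAM role that lets Comprehend read the S3 data
        InputDataConfig={"S3Uri": s3_uri},
        LanguageCode="en",
        Mode="MULTI_LABEL" if multi_label else "MULTI_CLASS",
    )
    return response["DocumentClassifierArn"]


def training_status(client: Any, classifier_arn: str) -> Tuple[str, Dict[str, float]]:
    """Return the training status and any evaluation metrics reported so far."""
    props = client.describe_document_classifier(
        DocumentClassifierArn=classifier_arn
    )["DocumentClassifierProperties"]
    metrics = props.get("ClassifierMetadata", {}).get("EvaluationMetrics", {})
    return props["Status"], metrics


def top_class(client: Any, endpoint_arn: str, text: str) -> str:
    """Classify one text through a real-time endpoint and return the best-scoring class.

    Assumes a model trained with separate categories, whose response lists "Classes".
    """
    response = client.classify_document(Text=text, EndpointArn=endpoint_arn)
    return max(response["Classes"], key=lambda c: c["Score"])["Name"]
```

Training runs asynchronously, so training_status would typically be polled until the status reads TRAINED before an endpoint is created (for example with the client's create_endpoint call) or a batch job is started with start_document_classification_job.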

In the realm of machine learning and AI, natural language processing is at the forefront of innovation. These techniques give computers a new level of language understanding. Analyzing the immense amounts of data produced and implementing tangible business solutions can feel like an overwhelming task. Our Baker Tilly Digital professionals are knowledgeable and prepared to assist your business through an evaluation of your current NLP processes, an ideation session, or help determining whether Amazon Comprehend applications are a good fit for your organization’s goals and objectives.

 
