Natural Language Generation (NLG)

Hana M April 21, 2023 | 10:00 AM Technology

What is Natural Language Generation?

Natural Language Generation (NLG) is a software process, driven by artificial intelligence, that produces natural written or spoken language from structured and unstructured data. It lets computers respond to users in language they can readily understand, rather than in a machine-oriented format.

For example, once customer input has been analyzed (commands to voice assistants, queries to chatbots, calls to help centers, or feedback on survey forms), NLG can be used to respond in a personalized, easily understood way. This is what makes human-seeming responses from voice assistants and chatbots possible.

NLG can also transform numerical and other complex data into reports that are easy to read. For example, it might be used to generate financial reports or weather updates automatically. [1]

Figure 1. How NLG works [2]

How does NLG work?

Figure 1 shows how NLG works. An automated text generation process involves six stages. For simplicity, we’ll explain each stage using the example of a robot journalist reporting on a football match:

1. Content Determination

First, the scope of the content is decided, because the data often contains more information than the report needs. In the football example, information about goals, cards, and penalties is what matters to readers.
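As a rough illustration of content determination, the sketch below filters a list of match events down to the types readers care about. The event records, their field names, and the `RELEVANT_TYPES` set are assumptions made up for this example, not part of any particular NLG system.

```python
# Hypothetical raw match events; the field names are assumptions for illustration.
events = [
    {"minute": 12, "type": "goal", "team": "X", "player": "A"},
    {"minute": 20, "type": "throw_in", "team": "Y", "player": "B"},
    {"minute": 44, "type": "yellow_card", "team": "Y", "player": "C"},
    {"minute": 60, "type": "substitution", "team": "X", "player": "D"},
    {"minute": 78, "type": "penalty", "team": "Y", "player": "E"},
]

# Event types assumed to matter to readers: goals, cards, and penalties.
RELEVANT_TYPES = {"goal", "yellow_card", "red_card", "penalty"}


def determine_content(events):
    """Keep only the reportable events, in chronological order."""
    selected = [e for e in events if e["type"] in RELEVANT_TYPES]
    return sorted(selected, key=lambda e: e["minute"])


selected = determine_content(events)
for event in selected:
    print(event["minute"], event["type"], event["player"])
```

A real system would likely weigh relevance rather than use a fixed set, but the idea is the same: discard data the report does not need.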

2. Data Interpretation

Next, the selected data is interpreted and put into context. Machine learning techniques can recognize patterns in the processed data. For instance, the winner of the match, the goal scorers and assists, and the minutes in which goals were scored are identified in this stage.
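The sketch below shows one way the facts named above (winner, scorers, assists, goal minutes) could be derived from goal records. It uses plain counting rules rather than machine learning, and the record layout and the `interpret` function are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical goal records; the field names are assumptions.
goals = [
    {"minute": 12, "team": "X", "scorer": "A", "assist": "B"},
    {"minute": 55, "team": "Y", "scorer": "C", "assist": None},
    {"minute": 81, "team": "X", "scorer": "A", "assist": "D"},
]


def interpret(goals, teams):
    """Derive the score, the winner, and who scored or assisted, and when."""
    tally = Counter(g["team"] for g in goals)
    # A Counter returns 0 for missing keys, so a scoreless team is handled too.
    score = {team: tally[team] for team in teams}
    first, second = teams
    if score[first] > score[second]:
        winner = first
    elif score[second] > score[first]:
        winner = second
    else:
        winner = None  # a draw
    return {
        "score": score,
        "winner": winner,
        "scorers": [(g["minute"], g["scorer"]) for g in goals],
        "assists": [(g["minute"], g["assist"]) for g in goals if g["assist"]],
    }


facts = interpret(goals, ("X", "Y"))
print(facts["winner"], facts["score"])
```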

3. Document Planning

In this stage, the interpreted data is organized into a narrative structure and a document plan.

A football report generally opens with a paragraph that gives the score, along with a comment on how intense and competitive the game was. The writer then reminds readers of the teams’ pre-game standings, describes other highlights of the game in the following paragraphs, and ends with player and coach interviews.
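The report structure described above can be sketched as an ordered list of sections, where sections with no content are dropped. The section names, the `facts` dictionary, and the `plan_document` function are hypothetical and exist only to illustrate the idea.

```python
# Hypothetical interpreted facts; keys and values are assumptions for illustration.
facts = {
    "score": {"X": 2, "Y": 1},
    "intensity": "hard-fought",
    "standings_before": {"X": 3, "Y": 5},
    "highlights": ["VAR overruled a penalty for Y before half time"],
    "interviews": [],  # no quotes available for this match
}

# Order of sections in a typical football report, as described in the text.
SECTION_ORDER = ["lead", "standings", "highlights", "interviews"]


def plan_document(facts):
    """Return (section name, content) pairs in report order, skipping empty ones."""
    content = {
        "lead": {"score": facts["score"], "intensity": facts["intensity"]},
        "standings": facts["standings_before"],
        "highlights": facts["highlights"],
        "interviews": facts["interviews"],
    }
    return [(name, content[name]) for name in SECTION_ORDER if content[name]]


plan = plan_document(facts)
for name, _ in plan:
    print(name)
```

Keeping the plan separate from the wording means the same plan could later be rendered in different styles or languages.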

4. Sentence Aggregation

This stage, also called microplanning, is about choosing the expressions and words in each sentence for the end user. In other words, this is where separate sentences are combined when the context makes them relevant to one another.

For example, the first two sentences below describe separate events. However, if the second event occurs right before half time, the two can be aggregated into the third sentence:

  1. “[X team] maintained their lead into halftime.”
  2. “VAR overruled a decision to award [Y team]’s [Football player Z] a penalty after replay showed [Football player T]’s apparent kick didn’t connect.”
  3. “[X team] maintained their lead into halftime after VAR overruled a decision to award [Y team]’s [Football player Z] a penalty after replay showed [Football player T]’s apparent kick didn’t connect.”
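A minimal sketch of this aggregation rule follows. It assumes each event carries the minute in which it happened and treats the last five minutes of the first half as "right before half time". Both the threshold and the `aggregate` function are assumptions chosen for illustration.

```python
HALF_TIME_MINUTE = 45
WINDOW = 5  # minutes; assumed threshold for "right before half time"


def aggregate(lead_sentence, event_sentence, event_minute):
    """Merge the two sentences if the event happened shortly before half time.

    Returns a list with one merged sentence, or the two original sentences.
    The event sentence is appended unchanged, so it should start with a word
    that also reads correctly mid-sentence (such as "VAR" or a team name).
    """
    if HALF_TIME_MINUTE - WINDOW <= event_minute <= HALF_TIME_MINUTE:
        main_clause = lead_sentence.rstrip(".")
        return [f"{main_clause} after {event_sentence}"]
    return [lead_sentence, event_sentence]


lead = "X maintained their lead into halftime."
event = "VAR overruled a penalty awarded to Y."

print(aggregate(lead, event, event_minute=44))
print(aggregate(lead, event, event_minute=20))
```

With `event_minute=44` the two sentences are joined by "after"; with `event_minute=20` they stay separate because the events are not closely related in time.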

5. Grammaticalization

The grammaticalization stage makes sure that the whole report is correct in grammar, spelling, and punctuation. This includes validating the text against the rules of syntax, morphology, and orthography. For instance, reports on football games are written in the past tense.
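To make this concrete, the sketch below applies three small rules: past tense for the verb, a capital letter at the start, and a full stop at the end. The tiny verb table and the naive regular-verb rule are assumptions for illustration. A production system would rely on a proper morphological component.

```python
# Small lookup of irregular past-tense forms; far from complete.
IRREGULAR_PAST = {"win": "won", "beat": "beat", "lose": "lost", "draw": "drew"}


def past_tense(verb):
    """Return the past tense of a base-form verb using a naive rule set."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    # Naive regular rule: "score" -> "scored", "concede" -> "conceded".
    return verb + "d" if verb.endswith("e") else verb + "ed"


def realize(subject, verb, rest):
    """Build a past-tense sentence with a capital first letter and a final stop."""
    sentence = f"{subject} {past_tense(verb)} {rest}".strip()
    sentence = sentence[0].upper() + sentence[1:]
    if not sentence.endswith((".", "!", "?")):
        sentence += "."
    return sentence


print(realize("team X", "win", "the match 2-1"))
print(realize("player A", "score", "twice"))
```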

6. Language Implementation

In the final stage, the data is inserted into templates, and the document is output in the right format and according to the user’s preferences. [3]
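Template filling can be sketched with Python’s standard `string.Template`. The template text, the data keys, and the `render` function with its `fmt` parameter are assumptions for illustration. The choice between plain text and HTML stands in for the user’s format preference.

```python
import html
from string import Template

# Hypothetical sentence template; the placeholder names are assumptions.
RESULT_TEMPLATE = Template("$winner beat $loser $winner_goals-$loser_goals.")


def render(data, fmt="text"):
    """Fill the template and return it as plain text or as an HTML paragraph."""
    sentence = RESULT_TEMPLATE.substitute(data)
    if fmt == "html":
        # Escape the text so that names containing < or & stay valid HTML.
        return f"<p>{html.escape(sentence)}</p>"
    return sentence


match = {"winner": "X", "loser": "Y", "winner_goals": 2, "loser_goals": 1}

print(render(match))
print(render(match, fmt="html"))
```

`Template.substitute` raises a `KeyError` if a placeholder has no value, which helps catch incomplete data before a report is published.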

These days, in an era where content is king, natural language generation appears to be the way forward. [4]

References:

  1. https://www.qualtrics.com/experience-management/customer/natural-language-generation/
  2. https://www.soapboxlabs.com/blog/the-role-of-nlg-in-conversational-ai/
  3. https://research.aimultiple.com/nlg/
  4. https://builtin.com/artificial-intelligence/what-is-natural-language-generation

Cite this article:

Hana M (2023), Natural Language Generation (NLG), AnaTechmaz, pp.195
