What Is Hybrid Natural Language Understanding?


Table of Contents

1. What is Hybrid Natural Language Understanding?

2. Different Approaches to NLU
2.1. Rule- and Knowledge Graph-based (Symbolic) Approach
2.2. Machine Learning Model Approach

3. The Hybrid Approach
3.1. Symbolic techniques in support of a machine learning model
3.2. Machine learning techniques in support of a symbolic model
3.3. Symbolic and machine learning working in parallel

Language fuels the enterprise. It’s in everything from emails to videos to business documents and beyond. However, organizations struggle to maximize its value.

There is no ignoring the importance of language to the enterprise ecosystem. Organizations are listening. Natural language processing technologies are delivering value for organizations right now, according to Forrester. An AI Journal report found that nearly 80% of companies surveyed will be spending more on NLP projects in 2023 and 2024 in order to capture NLP-driven efficiencies that reduce costs, drive growth and offer a competitive advantage.

Organizations need a method for leveraging the copious amounts of unstructured data available to them. So while this momentum towards NLP is a good start, it is just that…a start. Natural language understanding (NLU) is where the real difference is made for the enterprise.

What Is Natural Language Understanding?

Natural Language Understanding is a branch of artificial intelligence and a subset of NLP. Where NLP breaks down language into a machine-readable format, NLU goes a step further to help machines understand, interpret and emulate that language. It provides structure to unstructured data (e.g., contracts, emails, social media and other enterprise documents), which allows organizations to scale the reading, organizing and quantifying of text data for easier analysis.

NLU fills the gap between human communication and machine understanding. We can leverage it to automatically understand the meaning of words in context via disambiguation and to extract valuable information from text data.

Different Approaches to NLU

There are many different approaches for building NLU capabilities. Each of them offers its own pros and cons. The most common approaches include:

Rule- and Knowledge Graph-based (Symbolic) Approach

A symbolic approach is based upon pre-established linguistic rules. A knowledge graph provides an explicit representation of knowledge complete with rich, expressive and actionable descriptions of concepts, both general and specific to a domain. This information supports the logical explanations of reasoning outcomes.

This approach is human-driven, as it relies on linguistic rules and the knowledge embedded in the knowledge graph to examine linguistic and semantic relationships to interpret language and its parts (e.g., grammar, sentence structure, etc.). This process enables you to analyze language, extract data and categorize text.

Subject matter experts (SMEs) and/or knowledge engineers (KEs) are critical to this process as a high level of control and the ability to adjust rules as needed is often required. This approach is well-suited for task-oriented experiences, complex document analysis or search.
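As a rough illustration of the symbolic approach described above, here is a minimal sketch that assumes a toy knowledge graph and a handful of hand-written rules. The concept names, categories and the `categorize` helper are all hypothetical; the point is only that every outcome can be traced back to an explicit rule.

```python
import re

# Toy knowledge graph (hypothetical): surface term -> broader concept.
KNOWLEDGE_GRAPH = {
    "deductible": "insurance_cost",
    "premium": "insurance_cost",
    "coverage": "insurance_benefit",
    "invoice": "billing",
}

# Hand-written rules (hypothetical): concept -> document category.
RULES = [
    ("insurance_cost", "policy_pricing"),
    ("insurance_benefit", "policy_coverage"),
    ("billing", "accounts_payable"),
]


def categorize(text):
    """Return (category, explanation) pairs for every rule that fires."""
    tokens = re.findall(r"[a-z]+", text.lower())
    results = []
    for token in tokens:
        concept = KNOWLEDGE_GRAPH.get(token)
        if concept is None:
            continue  # term is not known to the knowledge graph
        for required, category in RULES:
            if concept == required:
                explanation = f"'{token}' maps to concept {concept}, so category {category}"
                if (category, explanation) not in results:
                    results.append((category, explanation))
    return results


matches = categorize("The deductible rises, but coverage stays the same.")
for category, why in matches:
    print(category, "|", why)
```

Because the explanation is produced alongside the category, a subject matter expert can see which term and which rule led to a result and adjust either one directly.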

Machine Learning Model Approach

With a machine learning approach, the algorithm learns from the data on which it is trained. Training can be supervised, where the system learns to map inputs to outputs from a labeled training dataset; unsupervised, where it uses unlabeled data; or semi-supervised, where it uses both labeled and unlabeled data.

Unlike a symbolic approach, ML is not based on explicit knowledge; its “intelligence” depends on the volume and quality of the data used to train it. In fact, its learning is done in a sort of “black box.” This means that there is little visibility into how a system learns or how it arrives at decisions. So, if your model produces a different outcome than what you wanted, or if the model contains a bias, it’s extremely difficult, if not impossible, to pinpoint what caused it. The usual recourse is to retrain the algorithm using more data. Still, this doesn’t guarantee that the issue will be resolved.

The emergence of ChatGPT has brought another form of machine learning into the picture. Large language models (LLMs) are very large neural networks, often with billions of parameters. Compared to the average machine learning model, LLMs are trained on enormous general-purpose datasets (think internet scale) that are composed of text scraped from entire websites, such as Wikipedia, Reddit forums and Scribd. Their algorithms are built to “understand” by predicting the next word in a sentence, and they can be used to generate human language, among other tasks. Notably, LLMs are also considered to be black box models that are not explainable.

The Hybrid Approach

When it comes to solving the practical, yet complex language “problems” that businesses face, a single approach is rarely the solution. Before “large language models” and “generative AI” became front-page news, machine learning and symbolic AI were long considered the only viable approaches to natural language understanding, and they were pitted against each other as mutually exclusive options. This has forced organizations to compromise one way or another. In a hybrid AI approach, organizations can combine different techniques to draw on the strengths of each.

One point to clarify is that a hybrid approach does not mandate that ML and symbolic work in parallel. A hybrid approach can take any of the following three forms:

  • Symbolic techniques in support of a machine learning model

A primary example of this hybrid relationship can be seen in the feature engineering process. This process is arguably the most important aspect of building a machine learning model, as it establishes the features (i.e., attributes) with which you train your machine learning algorithms.

In an ML-only approach, this process is typically done manually by domain experts (tedious and time-consuming) or is automated using an open-source NLP library or API (with limited language comprehension capabilities). However, a symbolic approach enables your domain experts to establish a rule-based structure to identify elements from your text data that can become features of the input data. This is the best and fastest way to scale your expertise and maintain flexibility when you need to retrain your model.
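To make the idea of rule-based feature engineering concrete, here is a minimal sketch under stated assumptions: the three patterns stand in for expert-written rules, the labeled texts are made up, and a tiny pure-Python perceptron stands in for whatever learning algorithm you would actually train. None of these names come from a specific product or library.

```python
import re

# Hypothetical expert-written rules: feature name -> pattern.
FEATURE_RULES = {
    "mentions_money": re.compile(r"\$\d[\d,]*"),
    "mentions_deadline": re.compile(r"\b(due|deadline|no later than)\b", re.I),
    "mentions_greeting": re.compile(r"\b(hello|hi|thanks)\b", re.I),
}
FEATURE_NAMES = list(FEATURE_RULES)


def extract_features(text):
    """Turn text into a 0/1 vector, one entry per symbolic rule."""
    return [1 if FEATURE_RULES[name].search(text) else 0 for name in FEATURE_NAMES]


def score(x, weights, bias):
    return sum(w * v for w, v in zip(weights, x)) + bias


def train_perceptron(samples, labels, epochs=20):
    """Train a tiny perceptron on rule-derived features (labels are 0 or 1)."""
    weights = [0.0] * len(FEATURE_NAMES)
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            predicted = 1 if score(x, weights, bias) > 0 else 0
            error = y - predicted  # 0 when correct, +1 or -1 when wrong
            weights = [w + error * v for w, v in zip(weights, x)]
            bias += error
    return weights, bias


def predict(text, weights, bias):
    return 1 if score(extract_features(text), weights, bias) > 0 else 0


# Toy labeled data (made up): 1 = invoice-like, 0 = casual message.
texts = [
    "Payment of $1,200 is due Friday.",
    "Invoice total $300, deadline is May 1.",
    "Hi team, thanks for the update!",
    "Hello, see you at lunch.",
]
labels = [1, 1, 0, 0]

weights, bias = train_perceptron([extract_features(t) for t in texts], labels)
print(predict("Please pay $950 no later than Monday.", weights, bias))
```

When the model needs retraining, the experts change or add a rule in `FEATURE_RULES` and the feature vectors are regenerated automatically, rather than re-annotating features by hand.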

  • Machine learning techniques in support of a symbolic model

A symbolic approach is ideal for efficiently classifying and extracting text from content in a highly accurate and explainable way. However, this technique can be less scalable due to the complex and time-consuming nature of rule writing, especially when subject matter experts are starting with a blank slate.

Machine learning can accelerate the process by creating an initial set of rules through automated annotation of a document set. In doing this, you transform “black box” results into an explainable rule-based framework. These rules can then be easily extended and fine-tuned via a symbolic approach for unrivaled quality control.
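One way to picture this bootstrapping step is the sketch below. It assumes the labels on the documents come from an automated annotation step (for example, a model's predictions) and uses a deliberately simple stand-in for rule induction: propose a keyword rule for any term that appears under only one label. The data, the `induce_rules` helper and the `min_support` threshold are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Documents with labels assumed to come from an automated annotation step.
# The data is made up for illustration.
annotated = [
    ("Your premium increases next year.", "pricing"),
    ("The premium is payable monthly.", "pricing"),
    ("Flood damage is excluded from this policy.", "exclusions"),
    ("Damage caused by wear is excluded.", "exclusions"),
]


def tokenize(text):
    """Return the set of lowercase words in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def induce_rules(docs, min_support=2):
    """Propose 'IF term THEN label' rules for terms seen under only one label."""
    term_counts = defaultdict(Counter)  # term -> Counter of labels
    for text, label in docs:
        for term in tokenize(text):
            term_counts[term][label] += 1
    rules = {}
    for term, counts in term_counts.items():
        if len(counts) == 1:  # term is unambiguous across labels
            label, support = next(iter(counts.items()))
            if support >= min_support:
                rules[term] = label
    return rules


rules = induce_rules(annotated)
for term in sorted(rules):
    print(f"IF text contains '{term}' THEN label = {rules[term]}")
```

The output is a set of readable candidate rules rather than opaque weights, so a subject matter expert can drop a rule that is too broad (here, perhaps "damage") or refine one before it goes into production.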

  • Symbolic and machine learning working in parallel

Though one approach often supports another in a hybrid approach, there are many instances where they work more closely together to accomplish a task. A primary example of this is the categorization of complex documents.

In many cases, a passage can appear multiple times in a document and imply something different in each instance. For example, a monetary amount (e.g., $50,000) found in an insurance policy could imply a reduction in risk for the insurer if it refers to a deductible or premium, or it could increase the risk if it refers to coverages.

In this example, a hybrid workflow that uses a symbolic approach to assign specific roles and characteristics to document segments, and then makes the machine learning side aware of this information, will benefit both models.
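The insurance example can be sketched as follows. This is a minimal illustration, assuming that a segment is a sentence and that a single cue word in the sentence determines the role of any amount it contains; the cue list and helper names are hypothetical. The symbolic step tags each amount with a role, and the resulting role-aware features are what a downstream machine learning model would receive.

```python
import re

# Hypothetical role cues: a cue word found in a segment -> role of its amounts.
ROLE_CUES = {
    "deductible": "deductible",
    "premium": "premium",
    "coverage": "coverage",
}
AMOUNT = re.compile(r"\$(\d[\d,]*)")


def tag_amounts(document):
    """Split into sentences and tag each amount with the role cue in its sentence."""
    tagged = []
    for segment in re.split(r"(?<=[.!?])\s+", document):
        lowered = segment.lower()
        role = next((r for cue, r in ROLE_CUES.items() if cue in lowered), "unknown")
        for match in AMOUNT.finditer(segment):
            value = int(match.group(1).replace(",", ""))
            tagged.append({"amount": value, "role": role})
    return tagged


def to_features(tagged):
    """Sum amounts per role so a downstream model sees role-aware inputs."""
    features = {}
    for item in tagged:
        key = f"amount_{item['role']}"
        features[key] = features.get(key, 0) + item["amount"]
    return features


policy = "The deductible is $50,000. Coverage for water damage is limited to $50,000."
tagged = tag_amounts(policy)
print(to_features(tagged))
```

Without the role tags, both occurrences of $50,000 would look identical to the model; with them, the same figure contributes to two different features that can pull a risk estimate in opposite directions.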

Conclusion

Human language is a complex challenge that underpins so many critical enterprise processes. Solving these real-world problems with the greatest accuracy is rarely done with a single approach or technique. The hybrid approach is the only way to address the intrinsic limitations of each individual technique while also realizing the benefits that each has to offer.

Leave compromise out of your vocabulary (unless you need it in your knowledge graph) and embrace the approach that will transform the present and future of your organization.


