Introduction to NLP

Natural language processing (NLP) is an area of computer science and artificial intelligence that is concerned with the interaction between computers and humans in natural language. The ultimate goal of NLP is to enable computers to understand language as well as we do. It is the driving force behind things like virtual assistants, speech recognition, sentiment analysis, automatic text summarization, machine translation and much more. In this post, you will learn the basics of natural language processing, dive into some of its techniques and also learn how NLP has benefited from recent advances in Deep Learning.


Table of Contents:

  1. Introduction
  2. Why NLP is difficult
  3. Syntactic and Semantic Analysis
  4. NLP Techniques 
    • Parsing
    • Stemming
    • Text segmentation
    • Named entity recognition
    • Relationship extraction
    • Sentiment analysis
  5. Deep Learning and NLP
  6. Summary


I. Introduction

Natural Language Processing (NLP) sits at the intersection of Computer Science, Linguistics and Machine Learning and is concerned with the communication between computers and humans in natural language. NLP is all about enabling computers to understand and generate human language. Applications of NLP techniques include voice assistants like Alexa and Siri, but also things like Machine Translation and text filtering. NLP is one of the fields that has benefited heavily from the recent advances in Machine Learning, especially from Deep Learning techniques. The field is divided into the following three parts:

Speech Recognition – The translation of spoken language into text.

Natural Language Understanding – The computer’s ability to understand what we say.

Natural Language Generation – The generation of natural language by a computer.


II. Why NLP is difficult

Human language is special for several reasons. It is specifically constructed to convey the speaker’s or writer’s meaning. It is a complex system, although little children can learn it pretty quickly. Another remarkable thing about human language is that it is all about symbols. According to Chris Manning (Machine Learning Professor at Stanford University), it is a discrete, symbolic, categorical signaling system. This means that you can convey the same meaning in different ways, such as speech, gestures or signs. The encoding of these symbols by the human brain is a continuous pattern of activation, and the symbols are transmitted via continuous signals of sound and vision.

Understanding human language is considered a difficult task due to its complexity. For example, there is a practically unlimited number of ways to arrange words in a sentence. Also, words can have several meanings, and contextual information is necessary to interpret sentences correctly. Every language is more or less unique and ambiguous. Just take a look at the following newspaper headline: “The Pope’s baby steps on gays”. This sentence clearly has two very different interpretations (the Pope taking baby steps on an issue, or the Pope’s baby stepping on people), which is a pretty good example of the challenges in NLP.

Note that a perfect understanding of language by a computer would result in an AI that can process the whole information that is available on the internet, which in turn would probably result in artificial general intelligence.


III. Syntactic & Semantic Analysis

Syntactic Analysis (Syntax) and Semantic Analysis (Semantics) are the two main techniques that lead to the understanding of natural language. Language is a set of valid sentences, but what makes a sentence valid? You can break validity down into two things: syntax and semantics. The term syntax refers to the grammatical structure of the text, whereas the term semantics refers to the meaning that is conveyed by it. However, a sentence that is syntactically correct does not have to be semantically correct. Just take a look at the following example: the sentence “cows flow supremely” is grammatically valid (subject – verb – adverb) but does not make any sense.


Syntactic Analysis:



Syntactic Analysis, also named Syntax Analysis or Parsing, is the process of analyzing natural language according to the rules of a formal grammar. Grammatical rules are applied to categories and groups of words, not to individual words. Syntactic Analysis basically assigns a syntactic structure to text.

For example, a sentence includes a subject and a predicate, where the subject is a noun phrase and the predicate is a verb phrase. Take a look at the following sentence: “The dog (noun phrase) went away (verb phrase)”. Note that we can combine every noun phrase with a verb phrase. As I already mentioned, sentences that are formed like that don’t really have to make sense, although they are syntactically correct.
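To make the "noun phrase plus verb phrase" rule concrete, here is a minimal sketch in Python. The phrase lists and the function name `generate_sentences` are my own illustrative assumptions, not part of any NLP library. It simply pairs every noun phrase with every verb phrase, so all results follow the same syntactic pattern even though several of them make no sense.

```python
from itertools import product

# Toy phrase lists; illustrative assumptions, not taken from a real grammar.
NOUN_PHRASES = ["the dog", "the cows", "my idea"]
VERB_PHRASES = ["went away", "flowed supremely", "barked loudly"]


def generate_sentences(noun_phrases, verb_phrases):
    """Combine every noun phrase with every verb phrase (rule: S -> NP VP)."""
    return [f"{np} {vp}" for np, vp in product(noun_phrases, verb_phrases)]


sentences = generate_sentences(NOUN_PHRASES, VERB_PHRASES)
for sentence in sentences:
    print(sentence)
```

Sentences such as "my idea barked loudly" come out of the same rule as "the dog went away", which is exactly the gap between syntactic and semantic validity.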


Semantic Analysis:

For us as humans, the way we understand what someone has said is an unconscious process that relies on our intuition and our knowledge about language itself. Therefore, the way we understand language is heavily based on meaning and context. Since computers cannot rely on these techniques, they need a different approach. The word “semantic” is a linguistic term and means something related to meaning or logic.



Therefore, Semantic Analysis is the process of understanding the meaning and interpretation of words, signs, and sentence structure. This enables computers to partly understand natural language the way humans do, involving meaning and context. I say partly because Semantic Analysis is one of the toughest parts of NLP and is not fully solved yet. For example, Speech Recognition has become very good and works well in many situations, but we are still lacking this kind of proficiency in Natural Language Understanding (i.e. semantics). Your phone can basically transcribe what you have said but often can’t do anything with it because it doesn’t understand the meaning behind it. Also, note that some of the technologies out there only make you think they understand the meaning of a text. An approach based on keywords or statistics or even pure machine learning may be using a matching or frequency technique for clues as to what a text is “about.” These methods are limited because they are not looking at the real underlying meaning.


IV. Techniques to understand Text

In the following, we will discuss many of the most popular techniques that are used for Natural Language Processing. Note that some of them are closely intertwined and only serve as subtasks for solving larger problems.


Parsing:

What is parsing? Let’s first of all look into the dictionary:

to parse

“resolve a sentence into its component parts and describe their syntactic roles.”

That definition nails it, but it could be a little more comprehensive. Parsing refers to the formal analysis of a sentence by a computer into its constituents. The result is a parse tree that shows their syntactic relation to each other in visual form and that can be used for further processing and understanding.

Below you can see a parse tree of the sentence “The thief robbed the apartment”, along with a description of the three different information types conveyed by it.


If we look at the letters directly above the single words, we can see that they show the part of speech of each word (noun, verb, and determiner). If we look one level higher, we see some hierarchical grouping of words into phrases. For example, “the thief” is a noun phrase, “robbed the apartment” is a verb phrase, and all together they form a sentence, which is marked one level higher.

But what is actually meant by a noun phrase or a verb phrase? Let’s explain this with the example of the noun phrase. These are phrases of one or more words that contain a noun and maybe descriptive words, verbs or adverbs. The idea is to group nouns with words that are in relation to them.

A parse tree also provides us with information about the grammatical relationships of the words due to the structure of their representation. For example, we can see in the structure that “the thief” is the subject of “robbed”.

By structure I mean that we have the verb (“robbed”), which is marked with a “V” above it and a “VP” above that, which is linked by an “S” to the subject “the thief”, which has an “NP” above it. This is like a template for a subject-verb relationship, and there are many others for other types of relationships.
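The three kinds of information described above can be read off a parse tree programmatically. The following is a rough sketch in plain Python. The nested-tuple layout and the helper names (`is_leaf`, `words`, `pos_tags`, `subject_and_verb`) are assumptions I made for illustration; the tree itself is written by hand, not produced by a parser.

```python
# A parse tree as nested tuples: (label, child, child, ...).
# Leaves are (part_of_speech, word). The labels S, NP, VP, D, N, V follow
# the description in the text; the tuple layout itself is an assumption.
TREE = (
    "S",
    ("NP", ("D", "The"), ("N", "thief")),
    ("VP",
        ("V", "robbed"),
        ("NP", ("D", "the"), ("N", "apartment"))),
)


def is_leaf(node):
    """A leaf is a (tag, word) pair whose second element is a string."""
    return len(node) == 2 and isinstance(node[1], str)


def words(node):
    """Return the words covered by a node, left to right."""
    if is_leaf(node):
        return [node[1]]
    return [w for child in node[1:] for w in words(child)]


def pos_tags(node):
    """Return (word, part_of_speech) pairs: the lowest level of the tree."""
    if is_leaf(node):
        return [(node[1], node[0])]
    return [pair for child in node[1:] for pair in pos_tags(child)]


def subject_and_verb(tree):
    """Apply the S -> NP VP template: the NP is the subject of the VP's verb."""
    label, np, vp = tree
    assert label == "S" and np[0] == "NP" and vp[0] == "VP"
    verb = next(child[1] for child in vp[1:] if child[0] == "V")
    return " ".join(words(np)), verb


print(pos_tags(TREE))
print(subject_and_verb(TREE))
```

`pos_tags` gives the part-of-speech level, the nesting gives the phrase grouping, and `subject_and_verb` is one hard-coded instance of the subject-verb template; other relationships would need their own templates.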



Stemming:

Stemming is a technique that comes from morphology and information retrieval and is used in NLP for preprocessing and efficiency purposes. But let us first look at what the dictionary says about the verb:

to stem

– “originate in or be caused by.”

That everyday sense is not quite what we need here. In NLP, stemming is the process of reducing words to their word stem. But what is actually meant by a stem? A “stem” is the part of a word that remains after the removal of all affixes. For example, the stem of the word “touched” is “touch”. “Touch” is also the stem of “touching”, and so on.

You may be asking yourself: why do we even need the stem? The stem is needed because you are going to encounter different variations of words that actually have the same stem and the same meaning. Let’s take a look at an example of two sentences:

# I was taking a ride in the car

# I was riding in the car.

These two sentences mean exactly the same thing, and the words “ride” and “riding” share the same stem.

Now, imagine all the English words in the vocabulary with all the different suffixes at the end of them. Storing them all would require a huge database that would contain many words that actually mean the same thing. This is solved by focusing only on a word’s stem, through stemming. A popular algorithm is, for example, the “Porter Stemming Algorithm”, which was written in 1979 and published in 1980 and works pretty well.
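To give a feeling for what suffix stripping looks like, here is a deliberately naive sketch. It is not the Porter algorithm; the suffix list, the function name `naive_stem` and the `min_stem_length` parameter are assumptions made up for this illustration.

```python
# Suffixes to strip, checked in this order. This short list is an
# illustrative assumption and far smaller than a real stemmer's rule set.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word, min_stem_length=3):
    """Strip the first matching suffix, keeping at least min_stem_length chars."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_length:
            return word[: -len(suffix)]
    return word


for w in ["touched", "touching", "touches", "touch"]:
    print(w, "->", naive_stem(w))
```

All four variants collapse to "touch", so only one entry has to be stored. The sketch also shows why real stemmers need more rules: it turns "riding" into "rid" while leaving "ride" untouched, so the two forms would not be matched.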


Text Segmentation:

Text Segmentation in NLP is the process of transforming text into meaningful units, which can be words, sentences, different topics, the underlying intent and much more. Mostly, the text is segmented into its component words, which can be a difficult task, depending on the language. This is again due to the complexity of human language. For example, it works relatively well in English to separate words by spaces, except for words like “ice box” that belong together but are separated by a space. The problem is that people sometimes also write it as “ice-box”.
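A simple way to try this out is with regular expressions from Python’s standard library. The sketch below is only a heuristic under my own assumptions (the function names `split_sentences` and `split_words` are invented here): sentences end at ".", "!" or "?" followed by whitespace, and words are runs of letters or digits, optionally joined by hyphens.

```python
import re


def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace (a crude heuristic)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_words(sentence):
    """Return runs of letters/digits, keeping hyphenated words like 'ice-box' whole."""
    return re.findall(r"\w+(?:-\w+)*", sentence)


text = "I put it in the ice box. She wrote ice-box instead!"
for sentence in split_sentences(text):
    print(split_words(sentence))
```

Note how "ice box" ends up as two tokens while "ice-box" stays one, which is precisely the inconsistency described above. Abbreviations such as "e.g." would also break the sentence splitter.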


Named Entity Recognition:

Named Entity Recognition (NER) concentrates on locating items in a text (“named entities”) and classifying them into pre-defined categories. These categories can range from the names of persons, organizations and locations to monetary values and percentages.

Just take a look at the following example:

Before NER:  „Martin bought 300 shares of SAP in 2016.“

After NER: „[Martin]Person bought 300 shares of [SAP]Organization in [2016]Time.“
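The example can be reproduced with a toy tagger. This is only a sketch under strong assumptions: the lookup table `GAZETTEER`, the year pattern and both function names are invented for illustration, whereas real NER systems are usually trained statistical models.

```python
import re

# Tiny hand-made lookup list (a "gazetteer"); purely illustrative assumptions.
GAZETTEER = {
    "Martin": "Person",
    "SAP": "Organization",
}
# Four-digit numbers starting with 19 or 20 are treated as years.
YEAR_PATTERN = re.compile(r"(?:19|20)\d{2}")


def tag_entities(sentence):
    """Return (token, label) pairs for tokens that look like named entities."""
    entities = []
    for token in re.findall(r"\w+", sentence):
        if token in GAZETTEER:
            entities.append((token, GAZETTEER[token]))
        elif YEAR_PATTERN.fullmatch(token):
            entities.append((token, "Time"))
    return entities


def annotate(sentence):
    """Rewrite the sentence in the bracketed style used above."""
    labels = dict(tag_entities(sentence))

    def mark(match):
        token = match.group()
        return f"[{token}]{labels[token]}" if token in labels else token

    return re.sub(r"\w+", mark, sentence)


print(annotate("Martin bought 300 shares of SAP in 2016."))
```

A lookup table like this cannot handle names it has never seen or words that are ambiguous, which is why NER is normally treated as a learning problem.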


Relationship Extraction:

Relationship Extraction takes the named entities of „Named Entity Recognition“ and tries to identify the semantic relationships between them. This could mean for example finding out who is married to whom, that a person works for a specific company and so on. This problem can also be transformed into a classification problem where you can train a Machine Learning model for every relationship type.
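As a rough illustration, the sketch below labels pairs of entities by looking at the text between them. The trigger phrases in `RELATION_PATTERNS` and the function `extract_relations` are hypothetical; in the classification setup described above, a trained model per relationship type would take the place of these hand-written patterns.

```python
import re

# Hypothetical trigger phrases per relationship type. A trained classifier,
# as described above, would replace these hand-written patterns.
RELATION_PATTERNS = {
    "works_for": re.compile(r"\b(works for|is employed by)\b"),
    "married_to": re.compile(r"\b(is married to|married)\b"),
}


def extract_relations(sentence, entities):
    """Label each ordered pair of entities by the text found between them.

    `entities` is a list of entity strings assumed to come from a prior
    Named Entity Recognition step.
    """
    relations = []
    for i, first in enumerate(entities):
        for second in entities[i + 1:]:
            start = sentence.find(first)
            if start == -1:
                continue
            end = sentence.find(second, start + len(first))
            if end == -1:
                continue
            between = sentence[start + len(first):end]
            for relation, pattern in RELATION_PATTERNS.items():
                if pattern.search(between):
                    relations.append((first, relation, second))
    return relations


print(extract_relations("Martin works for SAP.", ["Martin", "SAP"]))
print(extract_relations("Anna is married to Ben.", ["Anna", "Ben"]))
```

Each result is a (subject, relation, object) triple, which is a common way to store extracted relationships.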


Sentiment Analysis:

With Sentiment Analysis, we want to determine the attitude (i.e. the sentiment) of, for example, a speaker or writer with respect to a document, interaction, or event. Therefore, it is a natural language processing problem where text needs to be understood in order to predict the underlying intent. The sentiment is mostly categorized into positive, negative and neutral categories. With the use of Sentiment Analysis, we want to predict, for example, a customer’s opinion and attitude about a product based on a review they wrote about it. Because of that, Sentiment Analysis is widely applied to things like reviews, surveys, documents and much more.
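The three categories can be illustrated with a very small word-counting classifier. This is a sketch, not a recommended method: the word lists `POSITIVE` and `NEGATIVE` and the function `sentiment` are assumptions made up for this example.

```python
import re

# Tiny hand-picked word lists; real sentiment lexicons are far larger.
POSITIVE = {"good", "great", "love", "excellent", "clever"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "boring"}


def sentiment(text):
    """Classify text as 'positive', 'negative' or 'neutral' by counting words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("I love this phone, the camera is excellent."))
print(sentiment("Terrible battery and a boring design."))
print(sentiment("The package arrived on Tuesday."))
print(sentiment("This is not good."))
```

The last call returns "positive" because the sketch ignores the word "not". Counting words does not capture negation or context, which is why sentiment analysis needs some real understanding of the text.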

If you’re interested in using some of these techniques with Python, you can take a look at the Jupyter Notebook about Python’s Natural Language Toolkit (NLTK) that I created. You can also check out my blog post about building Neural Networks with Keras where I train a Neural Network to do Sentiment Analysis.


V. Deep Learning and NLP

Now we know a lot about Natural Language Processing, but the question that remains is: how do we use Deep Learning in NLP?

Central to Deep Learning and Natural Language is “word meaning”, where a word, and especially its meaning, is represented as a vector of real numbers. With these vectors that represent words, we are placing words in a high-dimensional space. The interesting thing about this is that the words, which are represented by vectors, will act as a semantic space. This simply means that words that are similar and have a similar meaning tend to cluster together in this high-dimensional vector space. You can see a visual representation of word meaning below:


[Figure: visual representation of word meaning as vectors]
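What "clustering together" means can be made concrete with cosine similarity, a common measure of how close two vectors point in the same direction. The three-dimensional vectors below are invented by hand for illustration; learned word vectors typically have hundreds of dimensions, and the names `VECTORS` and `most_similar` are my own.

```python
import math

# Hand-made 3-dimensional vectors, invented for illustration only.
VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word):
    """Return the other word whose vector is closest to `word`."""
    others = (w for w in VECTORS if w != word)
    return max(others, key=lambda w: cosine_similarity(VECTORS[word], VECTORS[w]))


print(most_similar("king"))
print(cosine_similarity(VECTORS["king"], VECTORS["queen"]))
print(cosine_similarity(VECTORS["king"], VECTORS["apple"]))
```

In this toy space "king" lies much closer to "queen" than to "apple". With real embeddings, the vectors are learned from large amounts of text instead of being written by hand.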

You can explore what a group of clustered words means by projecting the vectors into two or three dimensions with Principal Component Analysis (PCA) or t-SNE, but this can sometimes be misleading because these projections oversimplify and leave a lot of information aside. Therefore, this is a good way to start (like logistic or linear regression in Data Science), but it isn’t cutting edge and it is possible to do much better.

We can also think of parts of words as vectors which represent their meaning. Imagine the word “undesirability”. Using a “morphological approach”, which involves the different parts a word has, we would think of it as being made out of morphemes (word parts) like this: “un + desire + able + ity”. Every morpheme gets its own vector. This allows us to build a Neural Network out of that, which can compose the meaning of a larger unit, which in turn is made up of all of these morphemes.
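The idea of composing a word from morpheme vectors can be sketched in a few lines. Please treat this as a toy: the two-dimensional vectors are made up, and plain summation in the hypothetical function `compose` is only a stand-in for the Neural Network, which would learn how to combine the parts.

```python
# Made-up 2-dimensional morpheme vectors, for illustration only.
MORPHEME_VECTORS = {
    "un":     [-1.0, 0.0],
    "desire": [0.6, 0.8],
    "able":   [0.1, 0.2],
    "ity":    [0.0, 0.1],
}


def compose(morphemes):
    """Compose a word vector by element-wise summation of morpheme vectors.

    Summation is a deliberately simple stand-in for the neural network
    mentioned above, which would learn the composition function instead.
    """
    dims = len(next(iter(MORPHEME_VECTORS.values())))
    total = [0.0] * dims
    for morpheme in morphemes:
        for i, value in enumerate(MORPHEME_VECTORS[morpheme]):
            total[i] += value
    return total


print(compose(["desire", "able"]))
print(compose(["un", "desire", "able", "ity"]))
```

Because every morpheme has its own vector, a word that never appeared in the training data can still get a representation as long as its parts are known.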

Deep Learning can also make sense of the structure of sentences by creating syntactic parsers that can figure out that structure. Google uses dependency parsing techniques like this, although in a more complex and larger manner, in its “SyntaxNet” framework and the “Parsey McParseface” parser built with it.

By knowing the structure of sentences, we can start trying to understand the meaning of sentences. As we already discussed, we start off with the meaning of words being vectors, but we can also do this with whole phrases and sentences, where their meaning is also represented as vectors. And if we want to know the relationship between sentences, we train a Neural Network to make these decisions for us.

Deep Learning also works well for Sentiment Analysis. Just look at the following movie review: “This movie does not care about cleverness, wit or any other kind of intelligent humor.” A traditional approach could fall into the trap of thinking this is a positive review, because “cleverness, wit or any other kind of intelligent humor” sounds positive, but a Neural Network can recognize its real meaning. Other applications are chatbots, Machine Translation, Siri, Google Inbox’s suggested replies and so on.

There have also been huge advancements in Machine Translation through the rise of Recurrent Neural Networks, about which I also wrote a blog post.

In Machine Translation done by Deep Learning algorithms, language is translated by starting with a sentence and generating a vector representation of it. The model then generates words in another language that carry the same information.

To summarize, NLP in combination with Deep Learning is all about vectors that represent words, phrases etc. and also to some degree their meanings.


VI. Summary

In this post, you’ve learned a lot about Natural Language Processing. Now you know why NLP is such a difficult thing and why a perfect language understanding would probably result in artificial general intelligence. We’ve discussed the difference between syntactic and semantic analysis and learned about some NLP techniques that enable us to analyze and generate language. To summarize, the techniques we’ve discussed were parsing, stemming, text segmentation, named entity recognition, relationship extraction, and sentiment analysis. On top of that, we’ve discussed how deep learning managed to accelerate NLP by the concept of representing words, phrases, sentences and so on as numeric vectors.





