This 2013 Seminal Paper By Google Researchers Changed NLP Forever. Here’s A Deep Dive

Greg Corrado

The purpose of a natural language is to facilitate the communication of ideas among people. These ideas converge to form the meaning of an utterance or a text. This meaning is called semantics. Researchers in the fields of natural language processing (NLP) and computational linguistics try to outline theories and approaches to natural language semantics.

One of the milestones of modern NLP practice has been the invention of word embeddings. A 2013 paper titled Efficient Estimation of Word Representations in Vector Space by Tomas Mikolov, Kai Chen, Greg Corrado and Jeffrey Dean introduced techniques for learning high-quality word vectors from huge data sets containing billions of words, with vocabularies of millions of words. This was a breakthrough because the paper provided a much-needed alternative to n-gram models, which had reached their limits on many tasks. In domain-heavy tasks such as speech recognition, performance is dominated mainly by the amount of high-quality transcribed speech data available, so there were instances where simply scaling up the corpora did not enhance performance.

The ideas stated in the paper were hugely successful and were used to make advances in capturing the semantics of words and the semantic relationships between them. The paper used a distributed, continuous representation of words as opposed to a 1-of-N encoding. Mikolov and his co-authors were not the first to use continuous vector representations of words, but they substantially reduced the computational complexity of learning such representations.

Word Vectors

To put it simply, word vectors are numerical representations of words, and they can take many forms. One of the most common representations is 1-of-N (one-hot) encoding: the encoding of a given word is a vector in which the element corresponding to that word is set to one, and all other elements are zero (a minimal example follows the list below). Broadly, we can classify word embeddings into two types:

  1. Frequency based word embeddings
  2. Prediction based word embeddings
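
As a minimal sketch of the 1-of-N encoding described above (the toy vocabulary, the word_to_index mapping and the one_hot helper here are made up purely for illustration):

```python
import numpy as np

# A toy vocabulary; a real model would have a vocabulary of millions of words.
vocab = ["the", "quality", "of", "these", "representations", "is"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word, vocab_size):
    """1-of-N encoding: all zeros except a single 1 at the word's index."""
    vec = np.zeros(vocab_size)
    vec[word_to_index[word]] = 1.0
    return vec

print(one_hot("representations", len(vocab)))   # [0. 0. 0. 0. 1. 0.]
```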

Learning word vectors

Efficient Estimation of Word Representations in Vector Space proposed two new architectures: a Continuous Bag-of-Words (CBOW) model and a Continuous Skip-gram model.

  • Continuous bag-of-words (CBOW) Model

Consider this passage from the paper's abstract:

“We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks.”

Now pick any chunk from the above passage, select a focus word, and treat the words around it as context.

In this case, for example:

Context: [the quality of these]   Focus word: [representations]   Context: [is measured in a]

In the CBOW model, the context words form the input to the neural network. If the size of the vocabulary is N, each input word is represented in a 1-of-N encoding, with only one element set to 1 and all others set to zero. In addition to this input layer, the network has a hidden layer and an output layer. Refer to the following diagram:

An illustration of the CBOW model

The objective is to maximise the conditional probability of observing the actual output word (the focus word) given the input context words, with regard to the weights. In our example, given the input (“the”, “quality”, “of”, “these”, “is”, “measured”, “in”, “a”), we want to maximise the probability of getting “representations” as the output.

Remember, our inputs are one-hot encoded, so multiplying an input by the weight matrix simply selects a row of that matrix.

Therefore, after the C context word vectors are passed in, the hidden layer simply applies a linear activation and passes the (averaged) sum of the inputs on to the output layer. At the output layer, the error between the target word and the network's output is calculated and back-propagated to update the weights.
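
To make this concrete, here is a minimal NumPy sketch of one CBOW step under toy assumptions: a six-word vocabulary, random weight matrices W1 (input-to-hidden) and W2 (hidden-to-output), and a plain softmax output. It illustrates the idea only, not the paper's implementation, which also relies on efficiency tricks such as hierarchical softmax.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 6, 4                              # toy sizes
W1 = rng.normal(scale=0.1, size=(vocab_size, hidden_size))  # input -> hidden weights
W2 = rng.normal(scale=0.1, size=(hidden_size, vocab_size))  # hidden -> output weights

context_ids = [0, 1, 2, 3, 5]   # indices of the context words
focus_id = 4                    # index of the focus word ("representations")

# A one-hot input times W1 just selects a row of W1, so the hidden layer
# is the linear average of the rows for the context words.
h = W1[context_ids].mean(axis=0)

# Output layer: scores over the vocabulary, converted to probabilities.
scores = h @ W2
probs = np.exp(scores - scores.max())
probs /= probs.sum()

# Cross-entropy error for the focus word; its gradient (probs minus the
# focus word's one-hot vector) is back-propagated to update W2 and W1.
loss = -np.log(probs[focus_id])
print(loss)
```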

  • The Continuous Skip-gram Model

The skip-gram model is the mirror image of the CBOW model: the focus word is used as the input vector, and the target is to predict the context words.

An illustration of the skip-gram model

As in the CBOW approach, the activation of the hidden layer is linear: it simply takes the corresponding row from the weight matrix W1.

At the output layer, we have C multinomial distributions, one per context position, so the output consists of multiple words. In our example, the input would be “representations”, and the correct answers at the output layer would be (“the”, “quality”, “of”, “these”, “is”, “measured”, “in”, “a”). The element-wise sum over all the error vectors gives the final error vector, which is back-propagated to update the weights of the shallow network.
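
A matching sketch of one skip-gram step, under the same kind of toy setup as the CBOW sketch above (hypothetical vocabulary indices and random W1, W2): the focus word's row of W1 becomes the hidden layer, and the error vectors for all context words are summed element-wise before back-propagation.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 6, 4
W1 = rng.normal(scale=0.1, size=(vocab_size, hidden_size))  # input -> hidden
W2 = rng.normal(scale=0.1, size=(hidden_size, vocab_size))  # hidden -> output

focus_id = 4                    # "representations" is now the input
context_ids = [0, 1, 2, 3, 5]   # the surrounding words are the targets

h = W1[focus_id]                # hidden layer: the row of W1 for the focus word

scores = h @ W2                 # one score vector, reused for every context position
probs = np.exp(scores - scores.max())
probs /= probs.sum()

# One error vector (probs - target) per context word; their element-wise sum
# is the final error that is back-propagated to update W2 and W1.
error = sum(probs - np.eye(vocab_size)[c] for c in context_ids)

loss = -sum(np.log(probs[c]) for c in context_ids)
print(loss)
```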

Applications Of Word Embeddings:

  1. Words are embedded into a real vector space, which makes it easy to measure the distance between them. This helps quantify the relationships between words and sentences.
  2. Since word embeddings provide a powerful way to create vector representations, many recommendation systems are built on them: Spotify uses them to recommend music and Stitch Fix uses them to predict clothing preferences.
  3. Word vectors support addition and subtraction, which allows them to be used in machine translation and sentiment analysis, among other applications (a small sketch follows below).
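
For instance, distance and vector arithmetic on trained embeddings can be sketched with a few made-up, low-dimensional vectors (real embeddings learned by CBOW or skip-gram typically have hundreds of dimensions; the values below are purely illustrative):

```python
import numpy as np

# Hypothetical, tiny embeddings for illustration only.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.5, 0.9, 0.0]),
    "woman": np.array([0.5, 0.1, 0.9]),
}

def cosine(a, b):
    """Cosine similarity: how close two word vectors point."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Vector arithmetic: king - man + woman should land near queen.
result = vectors["king"] - vectors["man"] + vectors["woman"]
best = max(vectors, key=lambda w: cosine(vectors[w], result))
print(best)   # queen (for these made-up vectors)
```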

Beyond the examples mentioned above, many NLP and non-NLP applications have used the techniques proposed in the paper, revolutionising numerous internet services and products.

 
