How do we make computers understand our language?
John texted Matthew asking him the capital of Turkey. Matthew wasn’t sure of his answer; he thought it was Istanbul but wanted to double-check. He quickly googled ‘Capital of Turkey’ and then gave the answer to John. For those who want to know, the answer is Ankara.
When Matthew read John’s text, he processed the question in English and, after thinking for a while, realized he didn’t know the answer. But when Matthew asked the same question to his web browser, the computer interpreted it differently, because computers do not understand English or any other language we humans speak. So in order for them to understand our queries, we have to speak to them in a language they understand.
So what Google essentially did was convert the text query into a language the computer can understand, ask the question in that language, and get back the answer. All this in less than a second, which is far less than the time it takes us to even type a search query.
How do we do that? One method of translating human language is converting text to vectors. But why vectors?
Let’s try to understand this better with the help of an example. Amazon.com is a classic example of this kind of problem. Every day, millions of people use Amazon.com to purchase items. Before buying a new product, a purchaser normally visits the product page, goes to the customer reviews, and checks the reviews given for that particular product. Every review on that page falls into one of two groups: positive reviews and critical reviews. The problem Amazon has here is to classify a review given by a purchaser as either a positive review or a critical review.
To solve this problem, Amazon.com uses the power of linear algebra, and with it even we can build simple but effective models to classify, say, reviews.
Linear algebra says that if we convert anything into vectors, we can leverage concepts such as normals, dot products, planes, lines, and projections, and use them to elegantly solve our problem.
Now if we pull the data from Amazon.com for the millions of products they have, we will get a host of features like product ID, username, user ID, product description, and timestamp, along with the most important feature for us here: the user review, based on which we can classify our reviews.
Unlike us, computers cannot simply read a particular review and say, “hey, this looks like a critical review and this a positive one, so let’s put them in their respective buckets.” We have to convert these reviews into a language computers understand, and they understand numerical data beautifully.
So we convert our reviews, which are in text format, into numerical values to leverage the power of linear algebra.
Now, using a text transformer, we convert the text reviews into numerical values and store them in vector form. For simplicity, let’s assume our vectors have only two dimensions, and let’s plot these values on a 2D chart.
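As a toy illustration of such a transformer, here is a sketch that maps each review to a 2D vector by counting positive and negative words. The word lists and the two-dimensional choice are illustrative assumptions for this post, not a real production vectorizer:

```python
# Toy "text transformer": map a review to a 2-D vector of
# (positive word count, negative word count).
# The word lists below are made-up assumptions for illustration.
POSITIVE_WORDS = {"enjoyed", "delicious", "good", "great", "like", "love"}
NEGATIVE_WORDS = {"never", "mess", "bad", "didn't", "worst", "hate"}

def to_vector(review: str) -> tuple:
    # lowercase, split on whitespace, strip trailing punctuation
    words = [w.strip(".,!?") for w in review.lower().split()]
    pos = sum(w in POSITIVE_WORDS for w in words)
    neg = sum(w in NEGATIVE_WORDS for w in words)
    return (pos, neg)

print(to_vector("I enjoyed the taste of Masala Oats."))  # (1, 0)
```

A real system would of course use many more dimensions and a learned vectorizer, but the idea of “text in, vector out” is the same.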
Here we see two clusters (two groups) of points: one cluster of positive review points, and a second cluster of critical review points.
Now, using linear algebra, we can draw a line which separates these two clusters, and to that line we draw a normal w, perpendicular to it. We know from linear algebra that if we take the dot product of any review point x with the normal w, we get a value: if that value is greater than 0, the point lies (in our case) on the critical side of the line, and if it is less than 0, the point lies on the positive side. This gives us our two classes of values: positive reviews and critical reviews.
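The sign rule above can be sketched in a few lines of code. Here w is chosen by hand for illustration (a real model would learn it from data), and the review vectors are made-up examples, not real embeddings:

```python
# Classify a point x by the sign of the dot product w . x,
# where w is the normal to the separating line.
# Convention from the text: w . x > 0 -> critical, w . x < 0 -> positive.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def classify(w, x):
    return "critical" if dot(w, x) > 0 else "positive"

w = (-1.0, 1.0)               # hypothetical normal pointing toward the critical cluster
positive_review = (3.0, 1.0)  # many positive words, few negative ones
critical_review = (0.0, 2.0)  # mostly negative words

print(classify(w, positive_review))  # positive  (w . x = -2)
print(classify(w, critical_review))  # critical  (w . x =  2)
```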
So what we did here is simple: we converted our review text to a vector and used the concepts of a line and its normal to separate our points. Giants like Amazon.com work along similar lines, but with far greater depth.
So how do we convert text to vectors? There are certain rules which are very important to follow. From our Amazon.com example, let’s take three reviews, r1, r2, and r3, all for a product, say Masala Oats.
Reviewer1 : ‘I enjoyed the taste of Masala Oats.’
Reviewer2 : ‘I never thought Oats can be delicious too.’
Reviewer3 : ‘Why would someone want to mess up with something as nutritious as oats. I didn’t like it.’
We saw that the two classes, i.e. the critical reviews and the positive reviews, form clusters: every positive point will be in the proximity of the positive cluster and every critical point in the proximity of the critical cluster. This proximity is nothing but how close these reviews are to each other. Let’s see how it looks if we plot them on a graph.
review1 and review2 are closer, i.e. the distance between review1 and review2 is less than the distance between review1 and review3, because review1 and review2 belong to the same cluster, whereas review1 and review3 come from different clusters and so are farther apart. This rule of distance needs to be followed while converting text to a vector: similar reviews must end up closer to each other.
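We can check this distance rule numerically. The 2D vectors below are made-up stand-ins for the three reviews, just to show the comparison:

```python
# Distance rule: reviews from the same cluster should be closer
# than reviews from different clusters.
# The 2-D vectors are illustrative assumptions, not real embeddings.
import math

r1 = (2.0, 0.0)  # positive review
r2 = (3.0, 1.0)  # positive review
r3 = (0.0, 2.0)  # critical review

d12 = math.dist(r1, r2)  # Euclidean distance between r1 and r2
d13 = math.dist(r1, r3)  # Euclidean distance between r1 and r3

print(d12 < d13)  # True: r1 is nearer to r2 than to r3
```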
Now we know why we need to convert the text to a vector and what rules should that conversion follow.
The big question still with us is: how do we convert text to a vector? There are multiple techniques for this, such as Bag of Words, TF-IDF (term frequency-inverse document frequency), word2vec, etc. Each technique has its own uses and limitations. Over time, we will explore these techniques in our posts.
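To give a taste of the simplest of these, here is a minimal Bag of Words sketch: build a vocabulary from a small corpus and represent any review as a vector of word counts over that vocabulary. The two-review corpus is just our Masala Oats example:

```python
# Minimal Bag of Words: one dimension per vocabulary word,
# each entry = how many times that word appears in the review.
from collections import Counter

reviews = [
    "I enjoyed the taste of Masala Oats",
    "I never thought Oats can be delicious too",
]

def tokenize(text):
    return text.lower().split()

# vocabulary: every unique word across the corpus, in a fixed order
vocab = sorted({w for r in reviews for w in tokenize(r)})

def bag_of_words(text):
    counts = Counter(tokenize(text))
    return [counts[w] for w in vocab]

vec = bag_of_words("I enjoyed Oats")
print(len(vec) == len(vocab))  # True: one dimension per vocabulary word
```

Note that Bag of Words ignores word order entirely; techniques like TF-IDF and word2vec address some of its limitations, which we will cover in later posts.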