Introduction to Text Classification Using Bag of Words

The method applied here is bag-of-words: we count how often each word occurs in the text, regardless of word order.
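To make the idea concrete, here is a minimal sketch in plain Python. The two example sentences are made up for illustration and do not come from the dataset used in this guide.

```python
from collections import Counter

def bag_of_words(text):
    """Count how often each lowercase word occurs, ignoring word order."""
    return Counter(text.lower().split())

# Two made-up sentences that use the same words in a different order
a = bag_of_words("the food was good not bad")
b = bag_of_words("the food was bad not good")

print(a)
print(a == b)  # True: word order is discarded, only the counts remain
```

The two sentences mean opposite things yet produce identical counts, which is the main limitation of the approach.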

Let’s start by reading the data.
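A minimal sketch of this step with pandas. The inline CSV is a hypothetical stand-in so the snippet runs on its own, and every column name except `Score` is an assumption; with a real file, its path would be passed to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file. Only the "Score" column name
# comes from this guide; the other columns are assumptions.
raw_csv = io.StringIO(
    "Id,UserId,Score,Text\n"
    "1,u1,5,Great taste and fast delivery\n"
    "2,u2,1,Arrived stale and broken\n"
    "3,u3,4,Good value for the price\n"
)

# With a real dataset this would be pd.read_csv("path/to/file.csv")
df = pd.read_csv(raw_csv)
print(len(df), "rows loaded")
```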

Before doing any work, it is important to see what the data look like.
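A quick inspection might look like the sketch below. The small DataFrame is made up so the snippet is self-contained; only the `Score` column name comes from this guide.

```python
import pandas as pd

# Hypothetical sample; only the "Score" column name comes from this guide.
df = pd.DataFrame({
    "UserId": ["u1", "u2", "u3", "u4"],
    "Score": [5, 1, 5, 4],
    "Text": ["Great taste", "Arrived stale", "Will buy again", "Good value"],
})

print(df.head())                   # first few rows
print(df.shape)                    # (number of rows, number of columns)
print(df["Score"].value_counts())  # how the target label is distributed
```

Checking the label distribution early is useful because a heavily skewed `Score` column changes how an accuracy figure should be read.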

I saw that the data have been anonymized, so I think the identifying information is not necessary. Let’s remove it.

The result:
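As a sketch of the removal step: the column names `UserId` and `ProfileName` below are hypothetical examples of identifying fields, not necessarily the ones in the actual dataset. Dropping them with `DataFrame.drop` leaves only the label and the text.

```python
import pandas as pd

# Hypothetical sample with two assumed identifying columns
df = pd.DataFrame({
    "UserId": ["u1", "u2"],
    "ProfileName": ["anon1", "anon2"],
    "Score": [5, 1],
    "Text": ["Great taste", "Arrived stale"],
})

# Drop the (assumed) identifying columns; keep the label and the text
df = df.drop(columns=["UserId", "ProfileName"])
print(df.columns.tolist())
```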

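An important step in text mining is stemming. Words in the same family, such as go, goes, and went, are variants of the word “go”, so ideally they are grouped together as one word. Stemming approximates this by stripping suffixes, which works for regular forms (connected and connecting both become connect) but will not map an irregular form like went to go. There are many stemming algorithms; I chose the Porter stemmer to get the job done.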
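A minimal sketch of stemming with NLTK's `PorterStemmer`. The helper name `stem_text` and the whitespace tokenization are assumptions made for illustration; a real pipeline would usually also strip punctuation first.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def stem_text(text):
    """Lowercase the text, split on whitespace, and stem each token."""
    return " ".join(stemmer.stem(token) for token in text.lower().split())

# Regular suffixes are stripped down to a shared stem
print(stem_text("connected connecting connection"))

# Irregular forms are left alone: "went" is not turned into "go"
print(stem_text("went"))
```

Applying `stem_text` to every document before vectorizing shrinks the vocabulary, since several surface forms collapse into one column.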

After these final cleaning steps, the data are ready. Now we can vectorize them with CountVectorizer.
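The sketch below shows what `CountVectorizer` produces on two made-up documents (not taken from the dataset). Each row is a document, each column is a word, and each cell is that word's count.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two made-up documents for illustration
docs = [
    "good food good price",
    "bad food",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)  # sparse matrix, one row per document

# Recover the column order from the learned vocabulary (word -> column index)
columns = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

print(columns)
print(X.toarray())
```

The result is stored as a sparse matrix because most words do not occur in most documents; `toarray()` is only practical for small examples like this one.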

CountVectorizer builds a matrix of the words and their frequencies, which we can then feed to a classifier. There are many classifiers to choose from; in this guide I use KNeighborsClassifier.

Before fitting, we split the data into training and test sets. The target label is “Score”.
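A minimal end-to-end sketch of the split, fit, and scoring steps. The reviews and scores are made up, and `test_size=0.25`, `random_state=0`, and `n_neighbors=3` are assumed values rather than the settings behind the figure reported in this guide.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Made-up reviews and labels standing in for the cleaned text and "Score"
texts = [
    "great taste love it", "love the flavor great", "great value love it",
    "tasty and great", "stale and broken", "broken box stale chips",
    "stale taste awful", "awful broken mess",
]
scores = [5, 5, 5, 5, 1, 1, 1, 1]

# Vectorize first, then split, following the order described in the text
X = CountVectorizer().fit_transform(texts)
X_train, X_test, y_train, y_test = train_test_split(
    X, scores, test_size=0.25, random_state=0
)

clf = KNeighborsClassifier(n_neighbors=3)  # assumed neighbor count
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)  # fraction of correct test predictions
print(f"accuracy: {accuracy:.3f}")
```

A stricter setup would fit the vectorizer on the training texts only and then transform the test texts, so that no vocabulary information leaks from the test set.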

The accuracy of this method is 32.7%. That is quite low, but it is only a start. We need to try more data-cleaning methods, different algorithms, or even combinations of algorithms. In a later guide, I will try to improve the accuracy of the prediction.
