The method applied here is bag-of-words. Bag-of-words means we count how often each word occurs in the text, regardless of the order of the words.
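To make the idea concrete, here is a minimal sketch using only the Python standard library. The helper name `bag_of_words` and the two toy sentences are made up for illustration; they are not part of the dataset used in this guide.

```python
from collections import Counter
import re

def bag_of_words(text):
    """Count how often each lowercase word occurs, ignoring word order."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

# Two toy sentences that use the same words in a different order
a = bag_of_words("the dog chased the cat")
b = bag_of_words("the cat chased the dog")

print(a)       # word counts for the first sentence
print(a == b)  # the two bags are identical because order is discarded
```

This also shows the main limitation of the approach: two sentences with opposite meanings can end up with exactly the same representation.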
Let’s start by reading the data with the following code:
Before doing any work, it is important to see what the data look like:
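A minimal sketch of the loading step with pandas might look like this. The in-memory CSV stands in for the real file so the example runs on its own, and every column name except `Score` is an assumption made for illustration.

```python
import io
import pandas as pd

# Stand-in for the real file. With the actual data this would be a call such as
# pd.read_csv("reviews.csv"), where the file name is hypothetical.
sample_csv = io.StringIO(
    "Id,UserId,ProfileName,Score,Text\n"
    "1,U1,anon,5,Great taste and fast delivery\n"
    "2,U2,anon,1,Arrived stale and broken\n"
)

df = pd.read_csv(sample_csv)
print(df.shape)  # (rows, columns)
```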
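A quick inspection could be sketched as follows. The toy frame below replaces the real data, and its columns (other than `Score`) are hypothetical.

```python
import pandas as pd

# Toy frame standing in for the loaded data (columns other than Score are assumed)
df = pd.DataFrame({
    "UserId": ["U1", "U2", "U3"],
    "ProfileName": ["anon", "anon", "anon"],
    "Score": [5, 1, 5],
    "Text": ["Great taste", "Arrived stale", "Would buy again"],
})

print(df.head())    # first few rows
print(df.dtypes)    # type of each column

# Distribution of the target label, useful for spotting class imbalance
score_counts = df["Score"].value_counts()
print(score_counts)
```

Checking the label distribution early is worthwhile, because a heavily skewed target makes plain accuracy hard to interpret later on.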
I saw that the data have been anonymized, so I don’t think the identifying information is necessary. Let’s remove it and check the result:
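Dropping columns and checking what remains could look like the sketch below. Which columns count as identifying information is an assumption here; `UserId` and `ProfileName` are hypothetical names.

```python
import pandas as pd

# Toy frame standing in for the loaded data (columns other than Score are assumed)
df = pd.DataFrame({
    "UserId": ["U1", "U2"],
    "ProfileName": ["anon", "anon"],
    "Score": [5, 1],
    "Text": ["Great taste", "Arrived stale"],
})

# Hypothetical identifying columns that carry no signal for the prediction
id_columns = ["UserId", "ProfileName"]
df = df.drop(columns=id_columns)

print(df.columns.tolist())  # remaining columns
```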
An important step in text mining is stemming. Inflected forms of a word, such as connect, connected, and connecting, are variants of the same word, so it helps to group them together as one term. A suffix-stripping stemmer will not catch irregular forms such as go and went; those would need a lemmatizer. There are many stemming algorithms; I just chose the Porter stemmer to get the job done:
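Here is a minimal sketch of the stemming step, assuming NLTK's `PorterStemmer` is the implementation used. The helper `stem_text` and the regex tokenizer are illustrative choices, not necessarily the original preprocessing.

```python
import re
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def stem_text(text):
    """Lowercase the text, keep alphabetic tokens, and stem each one."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(stemmer.stem(token) for token in tokens)

# The three inflected forms collapse to a single stem
print(stem_text("Connected connecting connections"))
```

Applied to a whole column, this would be something like `df["Text"].apply(stem_text)`, with `Text` again being an assumed column name.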
After these final cleaning steps, we have the processed data. Now we can start vectorizing them with CountVectorizer:
CountVectorizer builds a matrix of the words and their frequencies, which we can then feed to a classification algorithm. There are many classifiers to choose from; in this guide I use KNeighborsClassifier.
Before fitting, we should split the data into training and test sets. The target label is “Score”:
The accuracy of the method is 32.7%. That is quite low, but this is just the start. We need to try more ways of cleaning the data, different algorithms, or even combinations of algorithms. In a later guide, I will try to improve the accuracy of the prediction.
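As a small sketch of the vectorizing step, the example below runs scikit-learn's `CountVectorizer` with default settings on three made-up, already-stemmed documents. The settings used on the real data are not known, so treat the defaults as an assumption.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy documents standing in for the cleaned, stemmed review text
docs = [
    "great tast fast deliveri",
    "arriv stale",
    "great great valu",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)  # sparse matrix: one row per document

print(sorted(vectorizer.vocabulary_))  # the learned vocabulary
print(X.toarray())                     # word counts per document

# Column index of a given word, e.g. to look up its count in the third document
great_col = vectorizer.vocabulary_["great"]
print(X[2, great_col])
```

Each row is the bag-of-words vector of one document, and each column corresponds to one word in the vocabulary.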
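The split-and-fit step can be sketched as follows. The toy texts and scores are invented, and the split ratio, `random_state`, and `n_neighbors` are assumptions, since the guide does not state the values used.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Toy data standing in for the processed text and its "Score" labels
texts = [
    "great tast", "love it", "great valu", "realli good",
    "arriv stale", "bad tast", "veri bad", "not good",
]
scores = [5, 5, 5, 5, 1, 1, 1, 1]

X = CountVectorizer().fit_transform(texts)

# Hold out a quarter of the rows for testing (assumed ratio)
X_train, X_test, y_train, y_test = train_test_split(
    X, scores, test_size=0.25, random_state=0
)

clf = KNeighborsClassifier(n_neighbors=3)  # assumed number of neighbors
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)  # fraction of correct test predictions
print(accuracy)
```

On data this small the printed accuracy says nothing about the method; the point is only the order of the steps: vectorize, split, fit, score.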