
Sentiment Analysis of Skincare Product Reviews using NLP


Overview

This project performs sentiment analysis, a common task in natural language processing (NLP) that involves determining the emotional tone of a piece of text, such as a product review. This is useful in e-commerce channels, where it can be used to automatically analyze customer feedback and tailor responses accordingly.

To complete the task, two popular sentiment-analysis algorithms were used: VADER and BERT. Each has its own approach and complexity:

Usually, for short and simple text, such as tweets or product reviews, VADER is a good option. For longer or more complex text, such as news articles or customer feedback surveys, BERT tends to be the better choice. Even so, this project applies both algorithms to compare their performance on a simple task.

The dataset used in this project contains data scraped from Ulta Skincare Reviews before March 27, 2023. The source is at: Skincare-Products-Dataset

1. Project Set-up

This analysis begins with the following steps:

1.1. First look
1.2. Data types and null values


Note: Two columns have null values. Because only three values are affected (less than 1% of the data), we can simply drop them and work with the remaining rows.
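As a minimal sketch (assuming the dataframe is loaded as `df`):

```python
# Only three rows contain nulls (<1% of the data), so drop them outright
df = df.dropna().reset_index(drop=True)
df.isna().sum()  # confirm no null values remain
```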



To better understand the data, let's randomly select a few review texts and see what we are working with:
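For instance, something like the following (the `Review_Text` column name is an assumption about the dataset's schema):

```python
# Print a handful of random reviews to get a feel for the writing style
for text in df["Review_Text"].sample(5, random_state=42):
    print(text)
    print("-" * 80)
```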

Then, let's look at the titles of the reviews to see if there is a pattern.

2. Feature extraction & EDA


This part contains the following steps:

1. Extract the date on which each review was made: the idea is to see whether there were peaks and valleys in product reviews across the years (and, further on, whether those periods skew more positive or negative);

2. Extract the number of characters in each review: is there a relation between how much you write and how satisfied (or dissatisfied) you are? Do people tend to leave longer reviews when they are more, or less, satisfied? These questions will be answered through the data (see the sketch after this list);

3. Brands and Products: an overview of the favorite products and brands.
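A rough sketch of steps 1 and 2. It assumes the raw `Review_Date` column stores offsets relative to the scrape date (e.g. "2 months ago") and that the text lives in `Review_Text`; both column names and the date format are assumptions:

```python
import pandas as pd

SCRAPE_DATE = pd.Timestamp("2023-03-27")  # the dataset was scraped before this date

def approx_review_date(relative: str) -> pd.Timestamp:
    """Turn a relative date like '2 months ago' into an approximate timestamp."""
    qty, unit = relative.split()[:2]
    n = 1 if qty in ("a", "an") else int(qty)
    # e.g. '2 months ago' -> SCRAPE_DATE - DateOffset(months=2)
    return SCRAPE_DATE - pd.DateOffset(**{unit.rstrip("s") + "s": n})

df["review_date"] = df["Review_Date"].apply(approx_review_date)
df["review_length"] = df["Review_Text"].str.len()  # characters per review
```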

Brands and Products

Because the review date was reconstructed from the time elapsed since scraping, it does not give the exact day of each review. Therefore, only dates within one year of the scrape can be relied on for time-series analysis:

The charts above point to a peak in reviews during October, which may be due to events and promotions, or to something deeper, which will be investigated.

3. VADER Model


- Setting up the texts using NLTK
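A minimal setup sketch with NLTK's VADER implementation; applying it row by row assumes the hypothetical `Review_Text` column again:

```python
import nltk
import pandas as pd
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time download of the VADER lexicon
sia = SentimentIntensityAnalyzer()

# polarity_scores returns the neg/neu/pos proportions plus a compound score
print(sia.polarity_scores("This moisturizer is amazing, my skin loves it!"))

# Score every review and attach the proportions as new columns
scores = df["Review_Text"].apply(sia.polarity_scores).apply(pd.Series)
df = pd.concat([df, scores], axis=1)  # adds neg, neu, pos, compound
```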

Next steps:

Taking a deeper look at negative reviews:

For a quick overview, let's define a threshold: a review counts as negative if its negative score is 35% or higher. Then let's look at those texts:
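A sketch of that filter, using the `neg` proportion produced by VADER above:

```python
# Reviews whose VADER negative proportion is at least 0.35
negative_reviews = df[df["neg"] >= 0.35].sort_values("neg", ascending=False)
negative_reviews[["Review_Text", "neg", "pos"]]
```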

Note: let's also look at the probability of each being a positive review, to make sure the model really knows what it is classifying.

The texts are too long, so they need to be read another way:

Taking a deeper look at positive reviews:



The previous chart used a probability threshold to define the sentiment of a product review. To get a better estimate, the next plot classifies each review by the predominant sentiment detected by the algorithm: negative, neutral, or positive:
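One way to sketch this is to pick whichever of the three VADER proportions is largest:

```python
# Classify each review by its largest VADER proportion
df["sentiment"] = (
    df[["neg", "neu", "pos"]]
    .idxmax(axis=1)
    .map({"neg": "Negative", "neu": "Neutral", "pos": "Positive"})
)
df["sentiment"].value_counts()
```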



Now, a new column with the predicted review class will be added to the dataframe, so we can see how it behaves month by month:
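A sketch of the monthly breakdown, reusing the `sentiment` and approximate `review_date` columns built earlier:

```python
import matplotlib.pyplot as plt

# Count each predicted class per month and plot a stacked bar chart
monthly = (
    df.groupby([df["review_date"].dt.to_period("M"), "sentiment"])
      .size()
      .unstack(fill_value=0)
)
monthly.plot(kind="bar", stacked=True, figsize=(10, 4),
             title="Predicted review sentiment by month")
plt.tight_layout()
plt.show()
```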

It can be seen that September had an increase in positive reviews relative to overall reviews, whereas in October, even though the number of reviews grew, there was a sharp increase in neutral sentiment, which may have caused sales to fall back (in other words, normalize) in the following months.

4. BERT Model


According to the Hugging Face community, the creators of the `transformers` library, it provides state-of-the-art machine learning for PyTorch, TensorFlow, and JAX. In a nutshell:

🤗 Transformers provides APIs and tools to easily download and train state-of-the-art pretrained models. Using pretrained models can reduce your compute costs, carbon footprint, and save you the time and resources required to train a model from scratch. These models support common tasks in different modalities, such as:

(1) 📝 Natural Language Processing: text classification, named entity recognition, question answering, sentiment analysis, language modeling, summarization, translation, multiple choice, and text generation.
(2) 🖼️ Computer Vision: image classification, object detection, and segmentation.
(3) 🗣️ Audio: automatic speech recognition and audio classification.
(4) 🐙 Multimodal: table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.

For this project, a **model pretrained on ~58M tweets and fine-tuned for sentiment analysis with the TweetEval benchmark** will be used. The source can be found at [Hugging-Face-index](https://huggingface.co/docs/transformers/index)
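A sketch of the download step; the Hub checkpoint matching that description is assumed here to be `cardiffnlp/twitter-roberta-base-sentiment`:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "cardiffnlp/twitter-roberta-base-sentiment"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
```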



After downloading the model, we can use it to compute the predicted sentiment logits and apply a softmax function to turn them into probabilities for each class.
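A minimal scoring sketch, reusing the tokenizer and model loaded above (the `Review_Text` column remains an assumption):

```python
import pandas as pd
import torch

def bert_scores(text: str) -> dict:
    """Return negative/neutral/positive probabilities for one review."""
    # Truncate to the model's 512-token limit so long reviews don't error out
    encoded = tokenizer(text, return_tensors="pt",
                        truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**encoded).logits[0]
    probs = torch.softmax(logits, dim=-1).tolist()
    # Label order for this checkpoint: 0 = negative, 1 = neutral, 2 = positive
    return {"bert_neg": probs[0], "bert_neu": probs[1], "bert_pos": probs[2]}

bert_results = df["Review_Text"].apply(bert_scores).apply(pd.Series)
```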


Lastly, the VADER and BERT results will be combined into a single dataframe and plotted against each other to see how much they differ. Let's also suppose that 'upvotes' highlight the comments that best reflect how the community sees the product.
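A sketch of the comparison, joining VADER's columns with the BERT probabilities computed above:

```python
import matplotlib.pyplot as plt
import seaborn as sns

combined = pd.concat([df[["neg", "neu", "pos"]], bert_results], axis=1)

# One scatter per sentiment class: VADER proportion vs. BERT probability
pairs = [("neg", "bert_neg"), ("neu", "bert_neu"), ("pos", "bert_pos")]
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, (vader_col, bert_col) in zip(axes, pairs):
    sns.scatterplot(data=combined, x=vader_col, y=bert_col, ax=ax)
    ax.set_title(f"VADER vs BERT: {vader_col}")
plt.tight_layout()
plt.show()
```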

5. Conclusion

Even though both models perform the same task, one would expect a positive correlation between the two models' predicted negative, neutral, and positive probabilities for each row. However, that is not what is seen: the points are completely scattered. It can be concluded that the VADER and BERT models interpret the data very differently.
As could be seen, the VADER model does not interpret semantics: texts containing words like "non" or "dead" are bound to be classified as negative statements, which in reality was not the case. The BERT model corrects those mistakes and is a much more reliable choice for this sentiment analysis.