Classifying Restaurant Reviews by Ethnicity

The goal of this project is to help owners of ethnic restaurants better assess the reviews they receive on platforms such as Yelp, Foursquare and Google Reviews. Because of budget constraints, an ethnic restaurant often has to decide which population to cater to. It does so by putting items on the menu that are more popular among that population. The examples below are two Middle Eastern restaurants in Boston. One receives significantly higher review scores from people with Arabic names, and the other receives significantly higher review scores from people with non-Arabic names. I scrape restaurant reviews from the web using Python's BeautifulSoup package. I then use a recurrent neural network with long short-term memory (LSTM) to predict the ethnicity of reviewers' names.
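The scraping step can be sketched with BeautifulSoup as below. This is a minimal sketch, not the project's actual scraper. The HTML structure and the class names `review`, `reviewer-name` and `rating` are hypothetical placeholders: real review sites use different markup, and that markup changes over time. The helper `parse_reviews` is also hypothetical. In practice the HTML would first be downloaded, for example with the `requests` package.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class names and the data-stars attribute are
# placeholders, not the real structure of Yelp, Foursquare or Google Reviews.
SAMPLE_HTML = """
<div class="review">
  <span class="reviewer-name">Omar K.</span>
  <span class="rating" data-stars="5"></span>
</div>
<div class="review">
  <span class="reviewer-name">Emily R.</span>
  <span class="rating" data-stars="3"></span>
</div>
"""


def parse_reviews(html):
    """Return (first_name, stars) pairs from one page of reviews (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    reviews = []
    for block in soup.find_all("div", class_="review"):
        name_tag = block.find("span", class_="reviewer-name")
        rating_tag = block.find("span", class_="rating")
        if name_tag is None or rating_tag is None:
            continue  # skip entries missing a name or a rating
        # Keep only the first name, since ethnicity is predicted from it.
        first_name = name_tag.get_text(strip=True).split()[0]
        reviews.append((first_name, int(rating_tag["data-stars"])))
    return reviews


print(parse_reviews(SAMPLE_HTML))  # [('Omar', 5), ('Emily', 3)]
```

Only the first name is kept because that is the input the name classifier uses.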

Average Review Score by Ethnicity

The histogram shows the average review score for each of the most common populations, computed from 100 randomly selected restaurants.
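Once each review is paired with a predicted group for the reviewer's name, the per-group averages behind a histogram like this one reduce to a grouped mean. The sketch below uses only the standard library. The function name `average_by_group` and the sample data are made up for illustration; they are not this project's code or results.

```python
from collections import defaultdict


def average_by_group(reviews):
    """Mean star rating per group.

    `reviews` is an iterable of (group_label, stars) pairs. Here group_label
    is assumed to be the classifier's prediction for the reviewer's first name.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, stars in reviews:
        totals[group] += stars
        counts[group] += 1
    return {g: totals[g] / counts[g] for g in totals}


# Made-up example data, not results from this project.
sample = [("Arabic", 5), ("Arabic", 4), ("non-Arabic", 3), ("non-Arabic", 4)]
print(average_by_group(sample))  # {'Arabic': 4.5, 'non-Arabic': 3.5}
```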





Al Wadi Restaurant (Boston)

This is a Lebanese restaurant. The distribution of its ratings (1 to 5 stars) from reviewers with Arabic versus non-Arabic names is shown below. A t-test confirms that reviewers with Arabic names give Al Wadi significantly higher ratings on average.
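The source does not say which t-test variant was used. The sketch below assumes Welch's two-sample t-test, which does not require equal variances. That is a reasonable default when the two groups of reviewers have different sizes. It computes the t statistic and the Welch-Satterthwaite degrees of freedom using only the standard library. To get a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test. The helper name `welch_t` and the ratings are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (hypothetical helper)."""
    # Squared standard errors of each group's mean (sample variance / n).
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up star ratings, not data from Al Wadi.
arabic_names = [5, 5, 4, 5, 4]
other_names = [3, 4, 2, 4, 3]
t, df = welch_t(arabic_names, other_names)
print(f"t = {t:.2f}, df = {df:.1f}")  # t = 3.13, df = 6.9
```

A positive t means the first group rated the restaurant higher on average. Whether the difference is significant depends on the p-value at the chosen level.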





Boston Shawarma

This is also a Middle Eastern restaurant. This time, the t-test shows the opposite pattern: reviewers with Arabic names give Boston Shawarma significantly lower ratings on average.





The Model's Performance


The accuracy of my LSTM model on the training and test sets at each epoch is shown below. After 10 epochs, accuracy exceeds 90% on both sets. Because training and test accuracy stay close to each other, there is no sign of substantial overfitting. The output variable is binary: whether a reviewer's first name is of Arabic origin or not.
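An LSTM reads a name as a sequence of characters, so each first name must first be turned into a fixed-length sequence of integer ids. The sketch below shows one common way to do this. The vocabulary (lowercase ASCII letters), the reserved padding and unknown ids, and `MAX_LEN = 12` are all assumptions; the source does not describe its preprocessing. In a framework such as Keras, these ids would typically feed an `Embedding` layer (with `mask_zero=True` so padding is ignored), then an `LSTM` layer, then a single sigmoid output unit for the Arabic vs non-Arabic prediction.

```python
import string

# Hypothetical vocabulary: id 0 is padding, id 1 is any character outside
# a-z (accents, hyphens, etc.), and letters a-z map to ids 2..27.
PAD, UNK = 0, 1
CHAR_TO_ID = {c: i + 2 for i, c in enumerate(string.ascii_lowercase)}
MAX_LEN = 12  # hypothetical maximum name length; longer names are truncated


def encode_name(name, max_len=MAX_LEN):
    """Map a first name to a fixed-length list of character ids, right-padded with PAD."""
    ids = [CHAR_TO_ID.get(c, UNK) for c in name.lower()[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


print(encode_name("Omar"))  # [16, 14, 2, 19, 0, 0, 0, 0, 0, 0, 0, 0]
```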





The losses of the model on the training and test sets at each epoch are displayed below. Both losses converge to a value of around 0.25.
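The source does not name the loss function. For a binary classifier with a sigmoid output, binary cross-entropy is the usual choice, and this sketch assumes it. Under that assumption, the loss has a direct interpretation. A mean loss of 0.25 means the geometric mean of the probabilities the model assigns to the correct class is exp(-0.25), roughly 0.78. The function below is a plain-Python illustration, not the framework's implementation.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to [eps, 1 - eps] to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Geometric-mean probability on the correct class implied by a mean loss of 0.25.
print(round(math.exp(-0.25), 2))  # 0.78
```

Because the loss is averaged over examples, it can sit around 0.25 even when more than 90% of predictions fall on the correct side of the 0.5 threshold. Correct but hesitant predictions still add to the loss.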