Goals

Develop a recommender system to help consumers choose skincare products based on reviews from people with similar skin characteristics and experience.

The US beauty industry is an 18-billion-dollar business. People, especially women, are constantly on the lookout for their “Holy Grail” products. And as with many things in life, you don’t know if a product works until you try it. People react to products differently: skin type, race, age, ingredients, and personal preference are all contributing factors. With a recommender system, we can shorten this trial-and-error process: skip the products that are likely to break us out and go straight to the more promising ones.

Approach and Data source

The two main sources of skincare product reviews that I know of are Amazon and MakeupAlley. I chose MakeupAlley because it is a popular beauty review website and has data on the reviewer attributes I wanted to look at, such as skin color and age. MakeupAlley is also less prone to fake reviews than Amazon.

The milestones that I envisioned for the project were:

  • scrape MakeupAlley website for reviews on skincare products
  • parse data into desired types and generate features
  • build recommender system
  • deploy on Heroku
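
As an illustration of the parsing milestone, here is a minimal sketch of turning one scraped review (all strings) into typed fields. The field names (`product`, `rating`, `age`, `skin`) and the `"25-29"` age-range format are assumptions made for this example, not the actual schema of the scraped data.

```python
from dataclasses import dataclass


@dataclass
class Review:
    product: str
    rating: int
    age_low: int
    age_high: int
    skin_type: str


def parse_review(raw: dict) -> Review:
    """Convert a scraped record of strings into typed fields."""
    # Hypothetical format: the age bracket arrives as "low-high".
    low, high = raw["age"].split("-")
    return Review(
        product=raw["product"].strip(),
        rating=int(raw["rating"]),
        age_low=int(low),
        age_high=int(high),
        skin_type=raw["skin"].strip().lower(),
    )


example = parse_review(
    {"product": " Example Cleanser ", "rating": "4", "age": "25-29", "skin": "Oily"}
)
```

Normalizing case and whitespace at this stage keeps categorical features such as skin type consistent before they are used to compare users.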

Deliverables

  • Recommender system for skincare products with a web app interface (link)

Additional features that would be nice to have:

  • Link my Amazon review database to the products on MakeupAlley (MUA). This requires the products’ UPC numbers, which I would have to scrape from Amazon.
  • Add other kinds of beauty products outside of skincare.

How it’s done

I scraped 2,748,608 reviews of 171,350 products. Having decided to focus on skincare only, I ended up with a subset of 371,986 reviews for 6,056 products.

stats

A few products are VERY popular, with a few thousand reviews each. However, the median number of reviews per product is 28, as seen in the box plot.

The median rating for products is 3.8, which means consumers are generally happy with the products that they bought.
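
For reference, summary figures like these can be computed roughly as follows. This is a sketch on made-up toy data, and it assumes that the "median rating for products" is the median of each product's mean rating, which the write-up does not state explicitly.

```python
from collections import defaultdict
from statistics import mean, median

# Toy (product, rating) pairs; not real MakeupAlley data.
reviews = [("A", 5), ("A", 4), ("A", 3), ("B", 2), ("C", 4), ("C", 5)]

# Group ratings by product.
ratings_by_product = defaultdict(list)
for product, rating in reviews:
    ratings_by_product[product].append(rating)

# Number of reviews per product, and each product's mean rating.
review_counts = [len(r) for r in ratings_by_product.values()]
product_means = [mean(r) for r in ratings_by_product.values()]

median_review_count = median(review_counts)
median_product_rating = median(product_means)
```

The median is a sensible summary here because a handful of products with thousands of reviews would pull the mean far above what a typical product receives.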

And finally, popular brands don’t necessarily have higher ratings. Because they have large portfolios with many products, popular brands tend to have average overall ratings.

The final dataset has a fairly good representation of users’ characteristics.

fig_age

fig_skin

fig_hair

The usual approaches to building a recommender system are content-based filtering, collaborative filtering, and a hybrid of the two. Content-based filtering recommends products that are similar to ones a user already likes. Collaborative filtering recommends products that users with similar tastes have liked. The hybrid approach combines the two.
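
To make the collaborative filtering idea concrete, here is a minimal user-based sketch. It is not the code behind the app: the users, products, and ratings are invented, and cosine similarity over co-rated products is just one common choice of similarity measure.

```python
from math import sqrt

# Toy user -> {product: rating} data; all names are hypothetical.
ratings = {
    "ann": {"cleanser": 5, "toner": 4, "serum": 1},
    "bea": {"cleanser": 5, "toner": 5, "serum": 2, "mask": 5},
    "cat": {"cleanser": 1, "toner": 2, "serum": 5, "mask": 1},
}


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity over the products both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[p] * v[p] for p in shared)
    norm_u = sqrt(sum(u[p] ** 2 for p in shared))
    norm_v = sqrt(sum(v[p] ** 2 for p in shared))
    return dot / (norm_u * norm_v)


def predict(user: str, product: str):
    """Similarity-weighted average of other users' ratings for a product."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or product not in other_ratings:
            continue
        sim = cosine(ratings[user], other_ratings)
        num += sim * other_ratings[product]
        den += sim
    return num / den if den else None


predicted = predict("ann", "mask")
```

Here `ann` has not rated the mask. Her ratings resemble `bea`'s far more than `cat`'s, so `bea`'s high mask rating carries more weight in the prediction. User attributes such as skin type or age could be folded into the similarity in the same way.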

fig_rs

The flowchart below shows the process of building the recommender system.

fig_process

The code is on GitHub. Finally, head over to https://holygrailtdi.herokuapp.com/ to see the app in action!