{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## User and product embeddings\n", "\n", "We calculate user and product embeddings from the training set and evaluate them on the unseen test set by plotting the similarity between each user and product embedding against the review score. The dataset is created in the [Get_embeddings_from_dataset Notebook](Get_embeddings_from_dataset.ipynb)." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 1. Calculate user and product embeddings\n", "\n", "We calculate these embeddings by averaging the embeddings of all reviews about the same product, or written by the same user, within the training set." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "| | ProductId | UserId | Score | Summary | Text | combined | n_tokens | embedding |\n",
"|---|---|---|---|---|---|---|---|---|\n",
"| 0 | B003XPF9BO | A3R7JR3FMEBXQB | 5 | where does one start...and stop... with a tre... | Wanted to save some to bring to my Chicago fam... | Title: where does one start...and stop... wit... | 52 | [0.03599238395690918, -0.02116263099014759, -0... |\n",
"| 297 | B003VXHGPK | A21VWSCGW7UUAR | 4 | Good, but not Wolfgang Puck good | Honestly, I have to admit that I expected a li... | Title: Good, but not Wolfgang Puck good; Conte... | 178 | [-0.07042013108730316, -0.03175969794392586, -... |\n", "