{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Load the dataset\n", "\n", "The dataset used in this example is the [fine-food reviews](https://www.kaggle.com/snap/amazon-fine-food-reviews) dataset from Amazon. It contains 568,454 food reviews that Amazon users left up to October 2012. For illustration purposes, we will use a subset consisting of the 1,000 most recent reviews. The reviews are in English and tend to be either positive or negative. Each review has a ProductId, UserId, Score, review title (Summary), and review body (Text).\n", "\n", "We will combine the review summary and review text into a single combined text of the form `Title: <Summary>; Content: <Text>`. The model will encode this combined text and output a single vector embedding." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | Time | \n", "ProductId | \n", "UserId | \n", "Score | \n", "Summary | \n", "Text | \n", "combined | \n", "
---|---|---|---|---|---|---|---|
0 | \n", "1351123200 | \n", "B003XPF9BO | \n", "A3R7JR3FMEBXQB | \n", "5 | \n", "where does one start...and stop... with a tre... | \n", "Wanted to save some to bring to my Chicago fam... | \n", "Title: where does one start...and stop... wit... | \n", "
1 | \n", "1351123200 | \n", "B003JK537S | \n", "A3JBPC3WFUT5ZP | \n", "1 | \n", "Arrived in pieces | \n", "Not pleased at all. When I opened the box, mos... | \n", "Title: Arrived in pieces; Content: Not pleased... | \n", "