mirror of
https://github.com/hwchase17/langchain
synced 2024-10-29 17:07:25 +00:00
b7ae9f715d
I have added a reddit document loader which fetches the text from the Posts of Subreddits or Reddit users, using the `praw` Python package. I have also added an example notebook reddit.ipynb in order to guide users to use this dataloader. This code was made in format similar to twiiter document loader. I have run code formating, linting and also checked the code myself for different scenarios. This is my first contribution to an open source project and I am really excited about this. If you want to suggest some improvements in my code, I will be happy to do it. :) Co-authored-by: Taaha Bajwa <taaha.s.bajwa@gmail.com>
113 lines
8.3 KiB
Plaintext
113 lines
8.3 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"attachments": {},
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Reddit\n",
|
|
"\n",
|
|
"This loader fetches the text from the Posts of Subreddits or Reddit users, using the `praw` Python package.\n",
|
|
"\n",
|
|
"Make a Reddit Application from https://www.reddit.com/prefs/apps/ and initialize the loader with with your Reddit API credentials."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 1,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"from langchain.document_loaders import RedditPostsLoader"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 2,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# !pip install praw"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 5,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# load using 'subreddit' mode\n",
|
|
"loader = RedditPostsLoader(\n",
|
|
" client_id=\"YOUR CLIENT ID\",\n",
|
|
" client_secret=\"YOUR CLIENT SECRET\",\n",
|
|
" user_agent=\"extractor by u/Master_Ocelot8179\",\n",
|
|
" categories=['new', 'hot'], # List of categories to load posts from\n",
|
|
" mode = 'subreddit',\n",
|
|
" search_queries=['investing', 'wallstreetbets'], # List of subreddits to load posts from\n",
|
|
" number_posts=20 # Default value is 10\n",
|
|
" )\n",
|
|
"\n",
|
|
"# # or load using 'username' mode\n",
|
|
"# loader = RedditPostsLoader(\n",
|
|
"# client_id=\"YOUR CLIENT ID\",\n",
|
|
"# client_secret=\"YOUR CLIENT SECRET\",\n",
|
|
"# user_agent=\"extractor by u/Master_Ocelot8179\",\n",
|
|
"# categories=['new', 'hot'], \n",
|
|
"# mode = 'username',\n",
|
|
"# search_queries=['ga3far', 'Master_Ocelot8179'], # List of usernames to load posts from\n",
|
|
"# number_posts=20\n",
|
|
"# )\n",
|
|
"\n",
|
|
"# Note: Categories can be only of following value - \"controversial\" \"hot\" \"new\" \"rising\" \"top\""
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 6,
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"[Document(page_content='Hello, I am not looking for investment advice. I will apply my own due diligence. However, I am interested if anyone knows as a UK resident how fees and exchange rate differences would impact performance?\\n\\nI am planning to create a pie of index funds (perhaps UK, US, europe) or find a fund with a good track record of long term growth at low rates. \\n\\nDoes anyone have any ideas?', metadata={'post_subreddit': 'r/investing', 'post_category': 'new', 'post_title': 'Long term retirement funds fees/exchange rate query', 'post_score': 1, 'post_id': '130pa6m', 'post_url': 'https://www.reddit.com/r/investing/comments/130pa6m/long_term_retirement_funds_feesexchange_rate_query/', 'post_author': Redditor(name='Badmanshiz')}),\n",
|
|
" Document(page_content='I much prefer the Roth IRA and would rather rollover my 401k to that every year instead of keeping it in the limited 401k options. But if I rollover, will I be able to continue contributing to my 401k? Or will that close my account? I realize that there are tax implications of doing this but I still think it is the better option.', metadata={'post_subreddit': 'r/investing', 'post_category': 'new', 'post_title': 'Is it possible to rollover my 401k every year?', 'post_score': 3, 'post_id': '130ja0h', 'post_url': 'https://www.reddit.com/r/investing/comments/130ja0h/is_it_possible_to_rollover_my_401k_every_year/', 'post_author': Redditor(name='AnCap_Catholic')}),\n",
|
|
" Document(page_content='Have a general question? Want to offer some commentary on markets? Maybe you would just like to throw out a neat fact that doesn\\'t warrant a self post? Feel free to post here! \\n\\nIf your question is \"I have $10,000, what do I do?\" or other \"advice for my personal situation\" questions, you should include relevant information, such as the following:\\n\\n* How old are you? What country do you live in? \\n* Are you employed/making income? How much? \\n* What are your objectives with this money? (Buy a house? Retirement savings?) \\n* What is your time horizon? Do you need this money next month? Next 20yrs? \\n* What is your risk tolerance? (Do you mind risking it at blackjack or do you need to know its 100% safe?) \\n* What are you current holdings? (Do you already have exposure to specific funds and sectors? Any other assets?) \\n* Any big debts (include interest rate) or expenses? \\n* And any other relevant financial information will be useful to give you a proper answer. \\n\\nPlease consider consulting our FAQ first - https://www.reddit.com/r/investing/wiki/faq\\nAnd our [side bar](https://www.reddit.com/r/investing/about/sidebar) also has useful resources. \\n\\nIf you are new to investing - please refer to Wiki - [Getting Started](https://www.reddit.com/r/investing/wiki/index/gettingstarted/)\\n\\nThe reading list in the wiki has a list of books ranging from light reading to advanced topics depending on your knowledge level. Link here - [Reading List](https://www.reddit.com/r/investing/wiki/readinglist)\\n\\nCheck the resources in the sidebar.\\n\\nBe aware that these answers are just opinions of Redditors and should be used as a starting point for your research. You should strongly consider seeing a registered investment adviser if you need professional support before making any financial decisions!', metadata={'post_subreddit': 'r/investing', 'post_category': 'new', 'post_title': 'Daily General Discussion and Advice Thread - April 27, 2023', 'post_score': 5, 'post_id': '130eszz', 'post_url': 'https://www.reddit.com/r/investing/comments/130eszz/daily_general_discussion_and_advice_thread_april/', 'post_author': Redditor(name='AutoModerator')}),\n",
|
|
" Document(page_content=\"Based on recent news about salt battery advancements and the overall issues of lithium, I was wondering what would be feasible ways to invest into non-lithium based battery technologies? CATL is of course a choice, but the selection of brokers I currently have in my disposal don't provide HK stocks at all.\", metadata={'post_subreddit': 'r/investing', 'post_category': 'new', 'post_title': 'Investing in non-lithium battery technologies?', 'post_score': 2, 'post_id': '130d6qp', 'post_url': 'https://www.reddit.com/r/investing/comments/130d6qp/investing_in_nonlithium_battery_technologies/', 'post_author': Redditor(name='-manabreak')}),\n",
|
|
" Document(page_content='Hello everyone,\\n\\nI would really like to invest in an ETF that follows spy or another big index, as I think this form of investment suits me best. \\n\\nThe problem is, that I live in Denmark where ETFs and funds are taxed annually on unrealised gains at quite a steep rate. This means that an ETF growing say 10% per year will only grow about 6%, which really ruins the long term effects of compounding interest.\\n\\nHowever stocks are only taxed on realised gains which is why they look more interesting to hold long term.\\n\\nI do not like the lack of diversification this brings, as I am looking to spend tonnes of time picking the right long term stocks.\\n\\nIt would be ideal to find a few stocks that over the long term somewhat follows the indexes. Does anyone have suggestions?\\n\\nI have looked at Nasdaq Inc. which quite closely follows Nasdaq 100. \\n\\nI really appreciate any help.', metadata={'post_subreddit': 'r/investing', 'post_category': 'new', 'post_title': 'Stocks that track an index', 'post_score': 7, 'post_id': '130auvj', 'post_url': 'https://www.reddit.com/r/investing/comments/130auvj/stocks_that_track_an_index/', 'post_author': Redditor(name='LeAlbertP')})]"
|
|
]
|
|
},
|
|
"execution_count": 6,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"documents = loader.load()\n",
|
|
"documents[:5]"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "env1",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.9.13"
|
|
},
|
|
"orig_nbformat": 4
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 2
|
|
}
|