[searx-search] add docs, improved wrapper api, registered as tool

- Improved the search wrapper API to mirror the usage of the Google Search
  wrapper.
- Registered searx-search as a loadable tool
- Added documentation and example notebook
searx-api-pre
blob42 1 year ago
parent a21e9becd4
commit a62b134e99

@ -0,0 +1,35 @@
# SearxNG Search API
This page covers how to use the SearxNG search API within LangChain.
It is broken into two parts: installation and setup, and then references to the specific SearxNG API wrapper.
## Installation and Setup
- You can find a list of public SearxNG instances [here](https://searx.space/).
- It is recommended to use a self-hosted instance to avoid abuse of the public instances. Also note that public instances often have a limit on the number of requests.
- To run a self-hosted instance see [this page](https://searxng.github.io/searxng/admin/installation.html) for more information.
- To use the tool you need to provide the searx host URL in one of two ways (a short sketch of both follows below):
1. passing the named parameter `searx_host` when creating the instance.
2. exporting the environment variable `SEARX_HOST`.
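A minimal sketch of both options (the host URL below is a placeholder for your own instance):
```python
import os
from langchain.utilities import SearxSearchWrapper

# Option 1: pass the host explicitly when creating the wrapper.
search = SearxSearchWrapper(searx_host="https://searx.example.com")

# Option 2: export the host through the environment instead.
os.environ["SEARX_HOST"] = "https://searx.example.com"
search = SearxSearchWrapper()
```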
## Wrappers
### Utility
You can use the wrapper to get results from a SearxNG instance.
```python
from langchain.utilities import SearxSearchWrapper
```
### Tool
You can also easily load this wrapper as a Tool (to use with an Agent).
You can do this with:
```python
from langchain.agents import load_tools
tools = load_tools(["searx-search"], searx_host="https://searx.example.com")
```
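As a rough sketch of plugging the loaded tool into an agent (this assumes an OpenAI API key is available; the host URL is a placeholder):
```python
from langchain.agents import initialize_agent, load_tools
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
tools = load_tools(["searx-search"], searx_host="https://searx.example.com")

# The agent decides when to call the "Search" tool while answering.
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)
agent.run("What is the capital of France?")
```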
For more information on this, see [this page](../modules/agents/tools.md)

@ -119,3 +119,11 @@ Below is a list of all supported tools and relevant information:
- Requires LLM: No
- Extra Parameters: `google_api_key`, `google_cse_id`
- For more information on this, see [this page](../../ecosystem/google_search.md)
**searx-search**
- Tool Name: Search
- Tool Description: A wrapper around SearxNG meta search engine. Input should be a search query.
- Notes: SearxNG is easy to self-host and is a good privacy-friendly alternative to Google Search. It uses the SearxNG API. A short loading sketch follows this entry.
- Requires LLM: No
- Extra Parameters: `searx_host`
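As a quick illustration of the entry above (the host URL is a placeholder and the query is arbitrary), a minimal sketch showing that the loaded tool carries the registered name and description and runs without an LLM:
```python
from langchain.agents import load_tools

tools = load_tools(["searx-search"], searx_host="https://searx.example.com")
searx_tool = tools[0]

# Registered under the name "Search"; the wrapped function queries the SearxNG instance.
print(searx_tool.name)
print(searx_tool.description)
print(searx_tool.func("deep learning frameworks"))
```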

@ -0,0 +1,197 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "40c7223e",
"metadata": {},
"source": [
"# SearxNG Search API\n",
"\n",
"This notebook goes over how to use a self-hosted SearxNG search API to search the web."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "288f2aa4",
"metadata": {},
"outputs": [],
"source": [
"from langchain.searx_search import SearxSearchWrapper"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "f4ce83fa",
"metadata": {},
"outputs": [],
"source": [
"search = SearxSearchWrapper(searx_host=\"http://127.0.0.1:8888\")"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "ff6ef4e7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'In all, 45 individuals have served 46 presidencies spanning 58 full four-year terms. Joe Biden is the 46th and current president of the United States, having assumed office on January 20, 2021.'"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"search.run(\"Who is the current president of the united states of america?\")"
]
},
{
"cell_type": "markdown",
"id": "bf728728",
"metadata": {},
"source": [
"For some engines, if a direct `answer` is available, the wrapper will return the answer instead of the full search results. You can use the `results` method of the wrapper if you want to obtain all the results."
]
},
{
"cell_type": "markdown",
"id": "cbac93d4",
"metadata": {},
"source": [
"\n",
"# Custom Parameters\n",
"\n",
"SearxNG supports up to [139 search engines](https://docs.searxng.org/admin/engines/configured_engines.html#configured-engines). You can also customize the Searx wrapper with arbitrary named parameters that will be passed to the Searx search API. In the example below we make more interesting use of the custom search parameters offered by the Searx search API."
]
},
{
"cell_type": "markdown",
"id": "7844deaa",
"metadata": {},
"source": [
"In this example we will be using the `engines` parameter to query Wikipedia."
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "1517e24b",
"metadata": {},
"outputs": [],
"source": [
"search = SearxSearchWrapper(searx_host=\"http://127.0.0.1:8888\", k=5) # k is for max number of items"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "4ded48b0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Large language models (LLMs) represent a major advancement in AI, with the promise of transforming domains through learned knowledge. LLM sizes have been increasing 10X every year for the last few years, and as these models grow in complexity and size, so do their capabilities.\\n\\nGPT-3 can translate language, write essays, generate computer code, and more — all with limited to no supervision. In July 2020, OpenAI unveiled GPT-3, a language model that was easily the largest known at the time. Put simply, GPT-3 is trained to predict the next word in a sentence, much like how a text message autocomplete feature works.\\n\\nAll of todays well-known language models—e.g., GPT-3 from OpenAI, PaLM or LaMDA from Google, Galactica or OPT from Meta, Megatron-Turing from Nvidia/Microsoft, Jurassic-1 from AI21 Labs—are...\\n\\nLarge language models are computer programs that open new possibilities of text understanding and generation in software systems. Consider this: ...\\n\\nLarge language models (LLMs) such as GPT-3are increasingly being used to generate text. These tools should be used with care, since they can generate content that is biased, non-verifiable, constitutes original research, or violates copyrights.'"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"search.run(\"large language model \", engines='wiki')"
]
},
{
"cell_type": "markdown",
"id": "259f5a5b",
"metadata": {},
"source": [
"## Obtaining results with metadata"
]
},
{
"cell_type": "markdown",
"id": "3c4cf1db",
"metadata": {},
"source": [
"In this example we will be looking for scientific papers using the `categories` parameter and limiting the results to a `time_range` (not all engines support the time range option).\n",
"\n",
"We would also like to obtain the results in a structured way, including metadata. For this we will be using the `results` method of the wrapper."
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "7cd5510b",
"metadata": {},
"outputs": [],
"source": [
"search = SearxSearchWrapper(searx_host=\"http://127.0.0.1:8888\")"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "2ff1acd5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[{'snippet': '… on natural language instructions, large language models (… the prompt used to steer the model, and most effective prompts … to prompt engineering, we propose Automatic Prompt …',\n",
" 'title': 'Large language models are human-level prompt engineers',\n",
" 'link': 'https://arxiv.org/abs/2211.01910'},\n",
" {'snippet': '… Large language models (LLMs) have introduced new possibilities for prototyping with AI [18]. Pre-trained on a large amount of text data, models … language instructions called prompts. …',\n",
" 'title': 'Promptchainer: Chaining large language model prompts through visual programming',\n",
" 'link': 'https://dl.acm.org/doi/abs/10.1145/3491101.3519729'},\n",
" {'snippet': '… can introspect the large prompt model. We derive the view ϕ0(X) and the model h0 from T01. However, instead of fully fine-tuning T0 during co-training, we focus on soft prompt tuning, …',\n",
" 'title': 'Co-training improves prompt-based learning for large language models',\n",
" 'link': 'https://proceedings.mlr.press/v162/lang22a.html'},\n",
" {'snippet': '… With the success of large language models (LLMs) of code and their use as … prompt design process become important. In this work, we propose a framework called Repo-Level Prompt …',\n",
" 'title': 'Repository-level prompt generation for large language models of code',\n",
" 'link': 'https://arxiv.org/abs/2206.12839'},\n",
" {'snippet': '… Figure 2 | The benefits of different components of a prompt for the largest language model (Gopher), as estimated from hierarchical logistic regression. Each point estimates the unique …',\n",
" 'title': 'Can language models learn from explanations in context?',\n",
" 'link': 'https://arxiv.org/abs/2204.02329'}]"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"search.results(\"Large Language Model prompt\", num_results=5, categories='science', time_range='year')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

@ -15,6 +15,8 @@ The utilities listed here are all generic utilities.
`SerpAPI <./examples/serpapi.html>`_: How to use the SerpAPI wrapper to search the web.
`SearxNG Search API <./examples/searx_search.html>`_: How to use the SearxNG meta search wrapper to search the web.
`Bing Search <./examples/bing_search.html>`_: How to use the Bing search wrapper to search the web.
`Wolfram Alpha <./examples/wolfram_alpha.html>`_: How to use the Wolfram Alpha wrapper to interact with Wolfram Alpha.

@ -36,3 +36,8 @@ This uses the official Google Search API to look up information on the web.
## SerpAPI
This uses SerpAPI, a third party search API engine, to interact with Google Search.
## Searx Search
This uses the Searx (SearxNG fork) meta search engine API to look up information
on the web. It supports 139 search engines and is easy to self-host,
which makes it a good choice for privacy-conscious users.

@ -0,0 +1,6 @@
SearxNG Search
=============================
.. automodule:: langchain.searx_search
:members:
:undoc-members:

@ -13,6 +13,7 @@ These can largely be grouped into two categories: generic utilities, and then ut
modules/python
modules/serpapi
modules/searx_search
.. toctree::

@ -14,6 +14,7 @@ from langchain.serpapi import SerpAPIWrapper
from langchain.utilities.bash import BashProcess
from langchain.utilities.google_search import GoogleSearchAPIWrapper
from langchain.utilities.wolfram_alpha import WolframAlphaAPIWrapper
from langchain.searx_search import SearxSearchWrapper
def _get_python_repl() -> Tool:
@ -139,15 +140,23 @@ def _get_serpapi(**kwargs: Any) -> Tool:
coroutine=SerpAPIWrapper(**kwargs).arun,
)
def _get_searx_search(**kwargs: Any) -> Tool:
return Tool(
"Search",
SearxSearchWrapper(**kwargs).run,
"A meta search engine. Useful for when you need to answer questions about current events. Input should be a search query."
)
_EXTRA_LLM_TOOLS = {
"news-api": (_get_news_api, ["news_api_key"]),
"tmdb-api": (_get_tmdb_api, ["tmdb_bearer_token"]),
}
_EXTRA_OPTIONAL_TOOLS = {
"wolfram-alpha": (_get_wolfram_alpha, ["wolfram_alpha_appid"]),
"google-search": (_get_google_search, ["google_api_key", "google_cse_id"]),
"serpapi": (_get_serpapi, ["serpapi_api_key", "aiosession"]),
"searx-search": (_get_searx_search, ["searx_host"]),
}
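For context, a rough, simplified sketch of what `load_tools(["searx-search"], ...)` effectively does with this registry entry: the `searx_host` kwarg is forwarded to the factory, which passes it on to the wrapper. The helper name and host URL below are illustrative only, not part of the library:
```python
from langchain.agents import Tool
from langchain.searx_search import SearxSearchWrapper

def build_searx_tool(searx_host: str) -> Tool:
    # Hypothetical helper mirroring _get_searx_search: wrap the search call in a Tool.
    return Tool(
        "Search",
        SearxSearchWrapper(searx_host=searx_host).run,
        "A meta search engine. Useful for when you need to answer questions "
        "about current events. Input should be a search query.",
    )

tool = build_searx_tool("https://searx.example.com")
print(tool.func("latest python release"))
```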

@ -1,18 +1,28 @@
"""Chain that calls SearxAPI.
"""Chain that calls Searx meta search API.
This is developed based on the SearxNG fork https://github.com/searxng/searxng
For Searx API refer to https://docs.searxng.org/index.html
SearxNG is a privacy-friendly free metasearch engine that aggregates results from multiple search engines
and databases.
For Searx search API refer to https://docs.searxng.org/dev/search_api.html
This is based on the SearxNG fork https://github.com/searxng/searxng which is
better maintained than the original Searx project and offers more features.
For a list of public SearxNG instances see https://searx.space/
NOTE: SearxNG instances often have a rate limit, so you might want to use a
self-hosted instance and disable the rate limiter, or use this PR: https://github.com/searxng/searxng/pull/2129 which adds whitelisting to the rate limiter.
"""
import requests
from pydantic import BaseModel, PrivateAttr, Extra, Field, validator, root_validator
from typing import Optional, List, Dict, Any
import json
from langchain.utils import get_from_dict_or_env
def _get_default_params() -> dict:
return {
# "engines": "google",
"lang": "en",
"format": "json"
}
@ -36,23 +46,50 @@ class SearxResults(dict):
# to silence mypy errors
@property
def results(self) -> Any:
return self.results
return self.get("results")
@property
def answers(self) -> Any:
return self.results
return self.get("answers")
class SearxSearchWrapper(BaseModel):
"""Wrapper for Searx API.
To use you need to provide the searx host by passing the named parameter
``searx_host`` or exporting the environment variable ``SEARX_HOST``.
In some situations you might want to disable SSL verification, for example
if you are running searx locally. You can do this by passing the named parameter
``unsecure``.
You can also pass the host url scheme as ``http`` to disable SSL.
Example:
.. code-block:: python
from langchain.searx_search import SearxSearchWrapper
searx = SearxSearchWrapper(searx_host="https://searx.example.com")
Example with SSL disabled:
.. code-block:: python
from langchain.searx_search import SearxSearchWrapper
# note the unsecure parameter is not needed if you pass the url scheme as http
searx = SearxSearchWrapper(searx_host="http://searx.example.com", unsecure=True)
"""
_result: SearxResults = PrivateAttr()
host: str = ""
searx_host: str = ""
unsecure: bool = False
params: dict = Field(default_factory=_get_default_params)
headers: Optional[dict] = None
k: int = 10
@validator("unsecure", pre=True)
@validator("unsecure")
def disable_ssl_warnings(cls, v: bool) -> bool:
if v:
# requests.urllib3.disable_warnings()
@ -71,16 +108,16 @@ class SearxSearchWrapper(BaseModel):
default = _get_default_params()
values["params"] = {**default, **user_params}
return values
searx_host = get_from_dict_or_env(values, "searx_host", "SEARX_HOST")
if not searx_host.startswith("http"):
print(f"Warning: `searx_host` is missing the url scheme, assuming secure https://{searx_host} ")
searx_host = "https://" + searx_host
elif searx_host.startswith("http://"):
values["unsecure"] = True
cls.disable_ssl_warnings(True)
values["searx_host"] = searx_host
@validator("host", pre=True, always=True)
def valid_host_url(cls, host: str) -> str:
if len(host) == 0:
raise ValueError("url can not be empty")
if not host.startswith("http"):
host = "http://" + host
return host
return values
class Config:
"""Configuration for this pydantic object."""
@ -88,19 +125,36 @@ class SearxSearchWrapper(BaseModel):
def _searx_api_query(self, params: dict) -> SearxResults:
"""Actual request to the searx API."""
raw_result = requests.get(self.host, headers=self.headers
, params=params,
verify=not self.unsecure).text
self._result = SearxResults(raw_result)
return self._result
raw_result = requests.get(self.searx_host, headers=self.headers,
params=params,
verify=not self.unsecure).text
res = SearxResults(raw_result)
self._result = res
return res
def run(self, query: str, **kwargs: Any) -> str:
"""Run query through Searx API and parse results.
You can pass any other params to the searx query API.
Args:
query: The query to search for.
**kwargs: any parameters to pass to the searx API.
Example:
This will make a query to the qwant engine:
.. code-block:: python
def run(self, query: str) -> str:
"""Run query through Searx API and parse results"""
_params = {
from langchain.searx_search import SearxSearchWrapper
searx = SearxSearchWrapper(searx_host="http://my.searx.host")
searx.run("what is the weather in France ?", engines="qwant")
"""
_params = {
"q": query,
}
params = {**self.params, **_params}
}
params = {**self.params, **_params, **kwargs}
res = self._searx_api_query(params)
if len(res.answers) > 0:
@ -108,13 +162,13 @@ class SearxSearchWrapper(BaseModel):
# only return the content of the results list
elif len(res.results) > 0:
toret = "\n\n".join([r['content'] for r in res.results[:self.k]])
toret = "\n\n".join([r.get('content', 'no result found') for r in res.results[:self.k]])
else:
toret = "No good search result found"
return toret
def results(self, query: str, num_results: int) -> List[Dict]:
def results(self, query: str, num_results: int, **kwargs: Any) -> List[Dict]:
"""Run query through Searx API and returns the results with metadata.
Args:
@ -131,7 +185,7 @@ class SearxSearchWrapper(BaseModel):
_params = {
"q": query,
}
params = {**self.params, **_params}
params = {**self.params, **_params, **kwargs}
results = self._searx_api_query(params).results[:num_results]
if len(results) == 0:
return [{"Result": "No good Search Result was found"}]
@ -144,8 +198,3 @@ class SearxSearchWrapper(BaseModel):
metadata_results.append(metadata_result)
return metadata_results
# if __name__ == "__main__":
# search = SearxSearchWrapper(host='search.c.gopher', unsecure=True)
# print(search.run("who is the current president of Bengladesh ?"))

@ -2,6 +2,7 @@
from langchain.python import PythonREPL
from langchain.requests import RequestsWrapper
from langchain.serpapi import SerpAPIWrapper
from langchain.searx_search import SearxSearchWrapper
from langchain.utilities.bash import BashProcess
from langchain.utilities.bing_search import BingSearchAPIWrapper
from langchain.utilities.google_search import GoogleSearchAPIWrapper
@ -14,5 +15,6 @@ __all__ = [
"GoogleSearchAPIWrapper",
"WolframAlphaAPIWrapper",
"SerpAPIWrapper",
"SearxSearchWrapper",
"BingSearchAPIWrapper",
]
