<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Search language plugin &#8212; SearXNG Documentation (2023.1.23+522ba9a1)</title>
<link rel="stylesheet" type="text/css" href="../_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="../_static/searxng.css" />
<link rel="stylesheet" type="text/css" href="../_static/tabs.css" />
<script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
<script src="../_static/jquery.js"></script>
<script src="../_static/underscore.js"></script>
<script src="../_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="../_static/doctools.js"></script>
<script src="../_static/sphinx_highlight.js"></script>
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="Limiter Plugin" href="searx.plugins.limiter.html" />
<link rel="prev" title="Locales" href="searx.locales.html" />
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="../genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="../py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="searx.plugins.limiter.html" title="Limiter Plugin"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="searx.locales.html" title="Locales"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="../index.html">SearXNG Documentation (2023.1.23+522ba9a1)</a> &#187;</li>
<li class="nav-item nav-item-1"><a href="index.html" accesskey="U">Source-Code</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">Search language plugin</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<section id="module-searx.plugins.autodetect_search_language">
<span id="search-language-plugin"></span><span id="autodetect-search-language"></span><h1>Search language plugin<a class="headerlink" href="#module-searx.plugins.autodetect_search_language" title="Permalink to this heading"></a></h1>
<p>Plugin to detect the search language from the search query.</p>
<p>Language detection uses the <a class="reference external" href="https://fasttext.cc/">fastText</a> library (<a class="reference external" href="https://pypi.org/project/fasttext/">python
fasttext</a>). <a class="reference external" href="https://fasttext.cc/">fastText</a> distributes the <a class="reference external" href="https://fasttext.cc/docs/en/language-identification.html">language identification model</a>; for
reference, see:</p>
<ul class="simple">
<li><p><a class="reference external" href="https://arxiv.org/abs/1612.03651">FastText.zip: Compressing text classification models</a></p></li>
<li><p><a class="reference external" href="https://arxiv.org/abs/1607.01759">Bag of Tricks for Efficient Text Classification</a></p></li>
</ul>
<p>The <a class="reference external" href="https://fasttext.cc/docs/en/language-identification.html">language identification model</a> supports the following language codes (ISO-639-3):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">af</span> <span class="n">als</span> <span class="n">am</span> <span class="n">an</span> <span class="n">ar</span> <span class="n">arz</span> <span class="k">as</span> <span class="n">ast</span> <span class="n">av</span> <span class="n">az</span> <span class="n">azb</span> <span class="n">ba</span> <span class="n">bar</span> <span class="n">bcl</span> <span class="n">be</span> <span class="n">bg</span> <span class="n">bh</span> <span class="n">bn</span> <span class="n">bo</span> <span class="n">bpy</span> <span class="n">br</span> <span class="n">bs</span> <span class="n">bxr</span>
<span class="n">ca</span> <span class="n">cbk</span> <span class="n">ce</span> <span class="n">ceb</span> <span class="n">ckb</span> <span class="n">co</span> <span class="n">cs</span> <span class="n">cv</span> <span class="n">cy</span> <span class="n">da</span> <span class="n">de</span> <span class="n">diq</span> <span class="n">dsb</span> <span class="n">dty</span> <span class="n">dv</span> <span class="n">el</span> <span class="n">eml</span> <span class="n">en</span> <span class="n">eo</span> <span class="n">es</span> <span class="n">et</span> <span class="n">eu</span> <span class="n">fa</span>
<span class="n">fi</span> <span class="n">fr</span> <span class="n">frr</span> <span class="n">fy</span> <span class="n">ga</span> <span class="n">gd</span> <span class="n">gl</span> <span class="n">gn</span> <span class="n">gom</span> <span class="n">gu</span> <span class="n">gv</span> <span class="n">he</span> <span class="n">hi</span> <span class="n">hif</span> <span class="n">hr</span> <span class="n">hsb</span> <span class="n">ht</span> <span class="n">hu</span> <span class="n">hy</span> <span class="n">ia</span> <span class="nb">id</span> <span class="n">ie</span> <span class="n">ilo</span> <span class="n">io</span>
<span class="ow">is</span> <span class="n">it</span> <span class="n">ja</span> <span class="n">jbo</span> <span class="n">jv</span> <span class="n">ka</span> <span class="n">kk</span> <span class="n">km</span> <span class="n">kn</span> <span class="n">ko</span> <span class="n">krc</span> <span class="n">ku</span> <span class="n">kv</span> <span class="n">kw</span> <span class="n">ky</span> <span class="n">la</span> <span class="n">lb</span> <span class="n">lez</span> <span class="n">li</span> <span class="n">lmo</span> <span class="n">lo</span> <span class="n">lrc</span> <span class="n">lt</span> <span class="n">lv</span>
<span class="n">mai</span> <span class="n">mg</span> <span class="n">mhr</span> <span class="nb">min</span> <span class="n">mk</span> <span class="n">ml</span> <span class="n">mn</span> <span class="n">mr</span> <span class="n">mrj</span> <span class="n">ms</span> <span class="n">mt</span> <span class="n">mwl</span> <span class="n">my</span> <span class="n">myv</span> <span class="n">mzn</span> <span class="n">nah</span> <span class="n">nap</span> <span class="n">nds</span> <span class="n">ne</span> <span class="n">new</span> <span class="n">nl</span> <span class="n">nn</span>
<span class="n">no</span> <span class="n">oc</span> <span class="ow">or</span> <span class="n">os</span> <span class="n">pa</span> <span class="n">pam</span> <span class="n">pfl</span> <span class="n">pl</span> <span class="n">pms</span> <span class="n">pnb</span> <span class="n">ps</span> <span class="n">pt</span> <span class="n">qu</span> <span class="n">rm</span> <span class="n">ro</span> <span class="n">ru</span> <span class="n">rue</span> <span class="n">sa</span> <span class="n">sah</span> <span class="n">sc</span> <span class="n">scn</span> <span class="n">sco</span> <span class="n">sd</span>
<span class="n">sh</span> <span class="n">si</span> <span class="n">sk</span> <span class="n">sl</span> <span class="n">so</span> <span class="n">sq</span> <span class="n">sr</span> <span class="n">su</span> <span class="n">sv</span> <span class="n">sw</span> <span class="n">ta</span> <span class="n">te</span> <span class="n">tg</span> <span class="n">th</span> <span class="n">tk</span> <span class="n">tl</span> <span class="n">tr</span> <span class="n">tt</span> <span class="n">tyv</span> <span class="n">ug</span> <span class="n">uk</span> <span class="n">ur</span> <span class="n">uz</span> <span class="n">vec</span> <span class="n">vep</span>
<span class="n">vi</span> <span class="n">vls</span> <span class="n">vo</span> <span class="n">wa</span> <span class="n">war</span> <span class="n">wuu</span> <span class="n">xal</span> <span class="n">xmf</span> <span class="n">yi</span> <span class="n">yo</span> <span class="n">yue</span> <span class="n">zh</span>
</pre></div>
</div>
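<p>fastText class labels carry a <code class="docutils literal notranslate"><span class="pre">__label__</span></code> prefix in front of the language code. The following is a minimal sketch, not the plugin's actual code, of turning a prediction into one of the codes listed above. The function names and the <code class="docutils literal notranslate"><span class="pre">min_probability</span></code> cut-off of 0.3 are assumptions made for illustration; the inputs are plain sequences shaped like the label and probability pair a fastText model returns.</p>

```python
LABEL_PREFIX = "__label__"  # prefix fastText puts in front of each class label


def label_to_lang(label):
    """Strip the fastText label prefix, e.g. '__label__en' becomes 'en'."""
    if label.startswith(LABEL_PREFIX):
        return label[len(LABEL_PREFIX):]
    return label


def best_language(labels, probabilities, min_probability=0.3):
    """Return the most probable language code, or None.

    `labels` and `probabilities` are parallel sequences. The threshold
    `min_probability` is a hypothetical value chosen for this sketch.
    """
    best = None
    best_prob = min_probability
    for label, prob in zip(labels, probabilities):
        if prob >= best_prob:
            best, best_prob = label_to_lang(label), prob
    return best


if __name__ == "__main__":
    print(best_language(("__label__de", "__label__en"), (0.7, 0.2)))
```

<p>Returning <code class="docutils literal notranslate"><span class="pre">None</span></code> for low-confidence predictions lets the caller keep whatever locale was already chosen instead of guessing.</p>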
<p>The <a class="reference external" href="https://fasttext.cc/docs/en/language-identification.html">language identification model</a> is harmonized with SearXNG's language
(locale) model. The general conditions of SearXNG's locale model are:</p>
<ol class="loweralpha simple">
<li><p>SearXNG's locale of a query is passed to
<a class="reference internal" href="searx.locales.html#searx.locales.get_engine_locale" title="searx.locales.get_engine_locale"><code class="xref py py-obj docutils literal notranslate"><span class="pre">searx.locales.get_engine_locale</span></code></a> to get the language and/or region
code that an engine uses.</p></li>
<li><p>SearXNG and most of the engines do not support all the languages of the
language model, and there might also be a discrepancy in the ISO-639-3 and
ISO-639-2 handling (<a class="reference internal" href="searx.locales.html#searx.locales.get_engine_locale" title="searx.locales.get_engine_locale"><code class="xref py py-obj docutils literal notranslate"><span class="pre">searx.locales.get_engine_locale</span></code></a>).
Furthermore, in SearXNG locales like <code class="docutils literal notranslate"><span class="pre">zh-TW</span></code> (<code class="docutils literal notranslate"><span class="pre">zh-CN</span></code>) are mapped to
<code class="docutils literal notranslate"><span class="pre">zh_Hant</span></code> (<code class="docutils literal notranslate"><span class="pre">zh_Hans</span></code>).</p></li>
</ol>
<p>Conclusion: This plugin auto-detects only the languages a user can select in
the language menu (<a class="reference internal" href="#searx.plugins.autodetect_search_language.supported_langs" title="searx.plugins.autodetect_search_language.supported_langs"><code class="xref py py-obj docutils literal notranslate"><span class="pre">supported_langs</span></code></a>).</p>
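<p>The restriction stated in this conclusion can be sketched as a simple membership test. The set below is copied from the documented value of <code class="docutils literal notranslate"><span class="pre">supported_langs</span></code> on this page; the helper name <code class="docutils literal notranslate"><span class="pre">accept_detected</span></code> is hypothetical and only illustrates the idea.</p>

```python
# Copied from the documented value of supported_langs on this page.
SUPPORTED_LANGS = {
    'af', 'ar', 'be', 'bg', 'ca', 'cs', 'da', 'de', 'el', 'en', 'es',
    'et', 'fa', 'fi', 'fil', 'fr', 'he', 'hi', 'hr', 'hu', 'id', 'is',
    'it', 'ja', 'ko', 'lt', 'lv', 'nl', 'no', 'pl', 'pt', 'ro', 'ru',
    'sk', 'sl', 'sr', 'sv', 'sw', 'th', 'tr', 'uk', 'vi', 'zh',
}


def accept_detected(lang):
    """Return `lang` if a user could select it in the language menu, else None.

    Hypothetical helper: a detected code outside SUPPORTED_LANGS is ignored.
    """
    return lang if lang in SUPPORTED_LANGS else None


if __name__ == "__main__":
    print(accept_detected("de"))   # a code from the supported set
    print(accept_detected("als"))  # known to the model, not in the supported set
```

<p>For example, <code class="docutils literal notranslate"><span class="pre">als</span></code> appears in the model's list of codes but not in <code class="docutils literal notranslate"><span class="pre">supported_langs</span></code>, so under this sketch a query detected as <code class="docutils literal notranslate"><span class="pre">als</span></code> would leave the locale unchanged.</p>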
<p>SearXNG's locale of a query comes from (<em>highest wins</em>):</p>
<ol class="arabic simple">
<li><p>The <code class="docutils literal notranslate"><span class="pre">Accept-Language</span></code> header from the user's HTTP client.</p></li>
<li><p>The user selects a locale in the preferences.</p></li>
<li><p>The user selects a locale from the menu in the query form (e.g. <code class="docutils literal notranslate"><span class="pre">:zh-TW</span></code>).</p></li>
<li><p>This plugin is activated in the preferences and the locale (only the language
code, no region code) comes from fastText's language detection.</p></li>
</ol>
<p>Conclusion: The language selected by the user can conflict with the language
detected by this plugin. For example, the user explicitly selects the German
locale via the search syntax to search for a term that is identified as an
English term (try <code class="docutils literal notranslate"><span class="pre">:de-DE</span> <span class="pre">thermomix</span></code>, for example).</p>
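<p>The precedence list and the conflict described above can be sketched as follows. This is an illustration only, assuming that "highest wins" means the highest-numbered source in the list overrides the others; the function and parameter names are hypothetical and are not part of SearXNG.</p>

```python
def resolve_locale(accept_language=None, preference=None,
                   query_syntax=None, detected_lang=None):
    """Pick the locale of a query from up to four sources.

    Assumption for this sketch: sources are listed from lowest to highest
    priority, and every source that is set overrides the ones before it.
    """
    locale = None
    for candidate in (accept_language, preference, query_syntax, detected_lang):
        if candidate:
            locale = candidate
    return locale


if __name__ == "__main__":
    # Plugin disabled: the locale from the search syntax is used.
    print(resolve_locale(accept_language="fr-FR", query_syntax="de-DE"))
    # Plugin enabled: the detected language replaces the explicit choice.
    print(resolve_locale(accept_language="fr-FR", query_syntax="de-DE",
                         detected_lang="en"))
```

<p>The second call mirrors the <code class="docutils literal notranslate"><span class="pre">:de-DE</span> <span class="pre">thermomix</span></code> case: the explicit German selection is replaced by the detected English language code.</p>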
<div class="admonition hint">
<p class="admonition-title">Hint</p>
<p>To SearXNG maintainers: please take into account that under some circumstances
the auto-detection of the language by this plugin can be detrimental to
users' expectations. It is not recommended to activate this plugin by
default. It should always be the user's decision whether to activate this
plugin or not.</p>
</div>
<dl class="py data">
<dt class="sig sig-object py" id="searx.plugins.autodetect_search_language.supported_langs">
<span class="sig-prename descclassname"><span class="pre">searx.plugins.autodetect_search_language.</span></span><span class="sig-name descname"><span class="pre">supported_langs</span></span><em class="property"><span class="w"> </span><span class="p"><span class="pre">=</span></span><span class="w"> </span><span class="pre">{'af',</span> <span class="pre">'ar',</span> <span class="pre">'be',</span> <span class="pre">'bg',</span> <span class="pre">'ca',</span> <span class="pre">'cs',</span> <span class="pre">'da',</span> <span class="pre">'de',</span> <span class="pre">'el',</span> <span class="pre">'en',</span> <span class="pre">'es',</span> <span class="pre">'et',</span> <span class="pre">'fa',</span> <span class="pre">'fi',</span> <span class="pre">'fil',</span> <span class="pre">'fr',</span> <span class="pre">'he',</span> <span class="pre">'hi',</span> <span class="pre">'hr',</span> <span class="pre">'hu',</span> <span class="pre">'id',</span> <span class="pre">'is',</span> <span class="pre">'it',</span> <span class="pre">'ja',</span> <span class="pre">'ko',</span> <span class="pre">'lt',</span> <span class="pre">'lv',</span> <span class="pre">'nl',</span> <span class="pre">'no',</span> <span class="pre">'pl',</span> <span class="pre">'pt',</span> <span class="pre">'ro',</span> <span class="pre">'ru',</span> <span class="pre">'sk',</span> <span class="pre">'sl',</span> <span class="pre">'sr',</span> <span class="pre">'sv',</span> <span class="pre">'sw',</span> <span class="pre">'th',</span> <span class="pre">'tr',</span> <span class="pre">'uk',</span> <span class="pre">'vi',</span> <span class="pre">'zh'}</span></em><a class="headerlink" href="#searx.plugins.autodetect_search_language.supported_langs" title="Permalink to this definition"></a></dt>
<dd><p>Languages supported by most SearXNG engines (<code class="xref py py-obj docutils literal notranslate"><span class="pre">searx.languages.language_codes</span></code>).</p>
</dd></dl>
</section>
<div class="clearer"></div>
</div>
</div>
</div>
<span id="sidebar-top"></span>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<p class="logo"><a href="../index.html">
<img class="logo" src="../_static/searxng-wordmark.svg" alt="Logo"/>
</a></p>
<h3><a href="../index.html">Table of Contents</a></h3>
<p class="caption" role="heading"><span class="caption-text">Contents</span></p>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="../user/index.html">User information</a></li>
<li class="toctree-l1"><a class="reference internal" href="../own-instance.html">Why use a private instance?</a></li>
<li class="toctree-l1"><a class="reference internal" href="../admin/index.html">Administrator documentation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../dev/index.html">Developer documentation</a></li>
<li class="toctree-l1"><a class="reference internal" href="../utils/index.html">DevOps tooling box</a></li>
<li class="toctree-l1 current"><a class="reference internal" href="index.html">Source-Code</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="searx.babel_extract.html">Custom message extractor (i18n)</a></li>
<li class="toctree-l2"><a class="reference internal" href="searx.engines.html">Load Engines</a></li>
<li class="toctree-l2"><a class="reference internal" href="searx.engines.demo_offline.html">Demo Offline Engine</a></li>
<li class="toctree-l2"><a class="reference internal" href="searx.engines.demo_online.html">Demo Online Engine</a></li>
<li class="toctree-l2"><a class="reference internal" href="searx.engines.google.html">Google Engines</a></li>
<li class="toctree-l2"><a class="reference internal" href="searx.engines.tineye.html">Tineye</a></li>
<li class="toctree-l2"><a class="reference internal" href="searx.engines.yahoo.html">Yahoo Engine</a></li>
<li class="toctree-l2"><a class="reference internal" href="searx.infopage.html">Online <code class="docutils literal notranslate"><span class="pre">/info</span></code></a></li>
<li class="toctree-l2"><a class="reference internal" href="searx.locales.html">Locales</a></li>
<li class="toctree-l2 current"><a class="current reference internal" href="#">Search language plugin</a></li>
<li class="toctree-l2"><a class="reference internal" href="searx.plugins.limiter.html">Limiter Plugin</a></li>
<li class="toctree-l2"><a class="reference internal" href="searx.plugins.tor_check.html">Tor check plugin</a></li>
<li class="toctree-l2"><a class="reference internal" href="searx.redisdb.html">Redis DB</a></li>
<li class="toctree-l2"><a class="reference internal" href="searx.redislib.html">Redis Library</a></li>
<li class="toctree-l2"><a class="reference internal" href="searx.search.html">Search</a></li>
<li class="toctree-l2"><a class="reference internal" href="searx.utils.html">Utility functions for the engines</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../donate.html">Donate to searxng.org</a></li>
</ul>
<h3>Project Links</h3>
<ul>
<li><a href="https://github.com/searxng/searxng/tree/master">Source</a>
<li><a href="https://github.com/searxng/searxng/wiki">Wiki</a>
<li><a href="https://searx.space">Public instances</a>
<li><a href="https://github.com/searxng/searxng/issues">Issue Tracker</a>
</ul><h3>Navigation</h3>
<ul>
<li><a href="../index.html">Overview</a>
<ul>
<li><a href="index.html">Source-Code</a>
<ul>
<li>Previous: <a href="searx.locales.html" title="previous chapter">Locales</a>
<li>Next: <a href="searx.plugins.limiter.html" title="next chapter">Limiter Plugin</a></ul>
</li>
</ul>
</li>
</ul>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="../search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false"/>
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>document.getElementById('searchbox').style.display = "block"</script>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="../_sources/src/searx.plugins.autodetect_search_language.rst.txt"
rel="nofollow">Show Source</a></li>
</ul>
</div>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2021 SearXNG team, 2015-2021 Adam Tauber, Noémi Ványi.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 5.3.0.
</div>
<script src="../_static/version_warning_offset.js"></script>
</body>
</html>