langchain/libs/community/langchain_community/llms
Cheng, Penghui cc407e8a1b
community[minor]: weight only quantization with intel-extension-for-transformers. (#14504)
Support weight only quantization with intel-extension-for-transformers.
[Intel® Extension for
Transformers](https://github.com/intel/intel-extension-for-transformers)
is a toolkit for accelerating Transformer-based models on Intel
platforms. It is particularly effective on 4th Gen Intel Xeon Scalable
processors (codenamed [Sapphire
Rapids](https://www.intel.com/content/www/us/en/products/docs/processors/xeon-accelerated/4th-gen-xeon-scalable-processors.html)).
The toolkit provides the following key features:

* A seamless user experience for model compression on Transformer-based
models, achieved by extending the [Hugging Face
transformers](https://github.com/huggingface/transformers) APIs and
leveraging [Intel® Neural
Compressor](https://github.com/intel/neural-compressor).
* Advanced software optimizations and unique compression-aware runtime.
* Optimized Transformer-based model packages.
* [NeuralChat](https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/neural_chat),
a customizable chatbot framework for creating your own chatbot within
minutes using a rich set of plugins and SOTA optimizations.
* [Inference](https://github.com/intel/intel-extension-for-transformers/blob/main/intel_extension_for_transformers/llm/runtime/graph)
of Large Language Models (LLMs) in pure C/C++ with weight-only
quantization kernels.

This PR integrates the weight-only quantization feature of
intel-extension-for-transformers into LangChain.
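
For readers unfamiliar with the term, the sketch below illustrates the general idea behind weight-only quantization: weights are stored as low-bit integers with one floating-point scale per group, and are dequantized when used, while activations stay in floating point. It is a minimal pure-Python illustration, not the algorithm or kernels used by intel-extension-for-transformers. All function names, the symmetric int4 scheme, and the group size are assumptions made for the example.

```python
from typing import List, Tuple

# One quantized group: the integer codes plus the float scale that maps them back.
Group = Tuple[List[int], float]


def quantize_group(weights: List[float], bits: int = 4) -> Group:
    """Symmetrically quantize one group of weights to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit codes
    max_abs = max(abs(w) for w in weights)
    # An all-zero group gets a dummy scale so we never divide by zero.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def quantize_row(row: List[float], group_size: int = 4, bits: int = 4) -> List[Group]:
    """Split a weight row into fixed-size groups and quantize each one separately."""
    return [
        quantize_group(row[start:start + group_size], bits)
        for start in range(0, len(row), group_size)
    ]


def dequantize_row(groups: List[Group]) -> List[float]:
    """Recover approximate float weights from the integer codes and scales."""
    return [code * scale for codes, scale in groups for code in codes]


def linear(x: List[float], quantized_rows: List[List[Group]]) -> List[float]:
    """Compute y = W x, dequantizing weights on the fly; activations stay float."""
    return [
        sum(w * xi for w, xi in zip(dequantize_row(groups), x))
        for groups in quantized_rows
    ]
```

Only the weights are compressed here. The input `x` is never quantized, which is what distinguishes weight-only quantization from schemes that also quantize activations.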

The unit test is in
lib/langchain/tests/integration_tests/llm/test_weight_only_quantization.py.
The notebook is in
docs/docs/integrations/llms/weight_only_quantization.ipynb.
The documentation is in
docs/docs/integrations/providers/weight_only_quantization.mdx.

---------

Signed-off-by: Cheng, Penghui <penghui.cheng@intel.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
..
grammars community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
__init__.py community[minor]: weight only quantization with intel-extension-for-transformers. (#14504) 5 months ago
ai21.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
aleph_alpha.py infra: add print rule to ruff (#16221) 7 months ago
amazon_api_gateway.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
anthropic.py community[patch]: move pdf text tests to integration (#18746) 6 months ago
anyscale.py community[patch]: Remove model limitation on Anyscale LLM (#17662) 7 months ago
aphrodite.py community[minor]: Add Aphrodite Engine support (#14759) 9 months ago
arcee.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
aviary.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
azureml_endpoint.py community[patch]: Support Streaming in Azure Machine Learning (#18246) 6 months ago
baichuan.py infra: add -p to mkdir in lint steps (#17013) 7 months ago
baidu_qianfan_endpoint.py community[patch]: Fix the error of Baidu Qianfan not passing the stop parameter (#18666) 6 months ago
bananadev.py community[patch]: introduce convert_to_secret() to bananadev llm (#14283) 6 months ago
baseten.py community: refactor Baseten integration with new API endpoints & docs (#15017) 9 months ago
beam.py infra: add print rule to ruff (#16221) 7 months ago
bedrock.py community[patch]: Add explicit error message to Bedrock error output. (#17328) 6 months ago
bigdl_llm.py community[minor]: migrate `bigdl-llm` to `ipex-llm` (#19518) 6 months ago
bittensor.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
cerebriumai.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
chatglm.py <langchain_community\llms\chatglm.py>: <Correcting "history"> (#16729) 8 months ago
chatglm3.py community: Add ChatGLM3 (#15265) 8 months ago
clarifai.py community[patch] : Tidy up and update Clarifai SDK functions (#18314) 6 months ago
cloudflare_workersai.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
cohere.py cohere[patch]: add cohere as a partner package (#19049) 6 months ago
ctransformers.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
ctranslate2.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
databricks.py community[major]: breaking change in some APIs to force users to opt-in for pickling (#18696) 6 months ago
deepinfra.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
deepsparse.py infra: add print rule to ruff (#16221) 7 months ago
edenai.py community: replace deprecated davinci models (#14860) 9 months ago
fake.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
fireworks.py community[patch]: invoke callback prior to yielding token (fireworks) (#19388) 6 months ago
forefrontai.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
friendli.py community[minor]: Integration for `Friendli` LLM and `ChatFriendli` ChatModel. (#17913) 6 months ago
gigachat.py community[minor]: Added GigaChat Embeddings support + updated previous GigaChat integration (#19516) 6 months ago
google_palm.py multiple[patch]: fix deprecation versions (#18349) 7 months ago
gooseai.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
gpt4all.py community[patch]: Enable streaming for GPT4all (#16392) 8 months ago
gradient_ai.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
huggingface_endpoint.py community: Fix import path for StreamingStdOutCallbackHandler example (#19170) 6 months ago
huggingface_hub.py Community: Fuse HuggingFace Endpoint-related classes into one (#17254) 7 months ago
huggingface_pipeline.py docs: Update docs for `HuggingFacePipeline` (#19306) 6 months ago
huggingface_text_gen_inference.py Community: Fuse HuggingFace Endpoint-related classes into one (#17254) 7 months ago
human.py infra: add print rule to ruff (#16221) 7 months ago
ipex_llm.py community[minor]: migrate `bigdl-llm` to `ipex-llm` (#19518) 6 months ago
javelin_ai_gateway.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
koboldai.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
konko.py community[minor]: Adding Konko Completion endpoint (#15570) 8 months ago
layerup_security.py community[minor]: add Layerup Security integration (#19787) 5 months ago
llamacpp.py community[patch]: invoke callback prior to yielding token (llama.cpp) (#19392) 6 months ago
llamafile.py community[minor]: Adds Llamafile as an LLM (#17431) 7 months ago
loading.py community[minor]: Allow passing `allow_dangerous_deserialization` when loading LLM chain (#18894) 6 months ago
manifest.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
minimax.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
mlflow.py docs: fix databricks document url (#19096) 6 months ago
mlflow_ai_gateway.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
modal.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
moonshot.py community[minor]: add support for Moonshot llm and chat model (#17100) 6 months ago
mosaicml.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
nlpcloud.py community[patch]: fix top_p type hint (#15452) 8 months ago
oci_data_science_model_deployment_endpoint.py docs, community[patch], experimental[patch], langchain[patch], cli[pa… (#15412) 9 months ago
oci_generative_ai.py community[patch]: docstrings (#16810) 7 months ago
octoai_endpoint.py community[minor]: Update OctoAI LLM, Embedding and documentation (#16710) 8 months ago
ollama.py community[patch]: Invoke callback prior to yielding token (ollama) (#18629) 6 months ago
opaqueprompts.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
openai.py community[patch]: invoke callback prior to yielding token (openai) (#19389) 6 months ago
openllm.py community[patch]: OpenLLM Client Fixes + Added Timeout Parameter (#17478) 7 months ago
openlm.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
pai_eas_endpoint.py community[patch]: Invoke callback prior to yielding token (pai_eas_endpoint) (#18627) 6 months ago
petals.py Refactor: use SecretStr for Petals llms (#15121) 9 months ago
pipelineai.py infra: add -p to mkdir in lint steps (#17013) 7 months ago
predibase.py community[minor]: fix failing Predibase integration (#19776) 6 months ago
predictionguard.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
promptlayer_openai.py Do not issue beta or deprecation warnings on internal calls (#15641) 8 months ago
replicate.py community[patch]: Invoke callback prior to yielding token (replicate) (#18626) 6 months ago
rwkv.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
sagemaker_endpoint.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
self_hosted.py community[major]: breaking change in some APIs to force users to opt-in for pickling (#18696) 6 months ago
self_hosted_hugging_face.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
solar.py community[minor]: Add solar model chat model (#18556) 6 months ago
sparkllm.py community[patch]: Invoke callback prior to yielding token (sparkllm) (#18625) 6 months ago
stochasticai.py infra: add -p to mkdir in lint steps (#17013) 7 months ago
symblai_nebula.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
textgen.py infra: add print rule to ruff (#16221) 7 months ago
titan_takeoff.py community[patch]: Invoke callback prior to yielding token (titan_takeoff) (#18560) 6 months ago
titan_takeoff_pro.py community[patch]: Invoke callback prior to yielding token (titan_takeoff_pro) (#18624) 6 months ago
together.py together[minor]: add llm (#15853) 8 months ago
tongyi.py community[patch]: Fixed bug in merging `generation_info` during chunk concatenation in Tongyi and ChatTongyi (#19014) 6 months ago
utils.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
vertexai.py community[patch]: Invoke callback prior to yielding token (#18447) 6 months ago
vllm.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
volcengine_maas.py community[patch]: Invoke callback prior to yielding token (#18288) 7 months ago
watsonxllm.py ibm: added partners package `langchain_ibm`, added llm (#16512) 7 months ago
weight_only_quantization.py community[minor]: weight only quantization with intel-extension-for-transformers. (#14504) 5 months ago
writer.py community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) 9 months ago
xinference.py docs: docstrings `langchain_community` update (#14889) 9 months ago
yandex.py community[patch]: YandexGPT Use recent yandexcloud sdk version (#19341) 6 months ago
yuan2.py community[patch]: fix yuan2 errors in LLMs (#19004) 6 months ago