"The image appears to be a graph depicting the task accuracy of two different base model sizes (big and small) as a function of different training strategies and the effort/complexity associated with them. Here's a description of the differences in training strategy between a small and a large base model as suggested by the graph:\n",
"\n",
"1. **Zero-shot prompts**: Both models start with some baseline accuracy with no additional training, which is indicative of zero-shot learning capabilities. However, the big base model shows higher accuracy out of the box compared to the small base model.\n",
"\n",
"2. **Prompt engineering**: As the complexity increases with prompt engineering, the big base model shows a significant improvement in task accuracy, indicating that it can understand and leverage well-engineered prompts more effectively than the small base model.\n",
"\n",
"3. **Few-shot prompts**: With the introduction of few-shot prompts, where the model is given a few examples to learn from, the big base model continues to show higher task accuracy in comparison to the small base model, which also improves but not to the same extent.\n",
"\n",
"4. **Retrieval-augmented few-shot prompting**: At this stage, the models are enhanced with retrieval mechanisms to assist in the few-shot learning process. The big base model maintains a lead in task accuracy, demonstrating that it can better integrate retrieval-augmented strategies.\n",
"\n",
"5. **Finetuning**: As we move towards the right side of the graph, which represents finetuning, the small base model shows a more significant increase in accuracy compared to previous steps, suggesting that finetuning has a substantial impact on smaller models. The big base model, while also benefiting from finetuning, does not show as dramatic an increase, likely because it was already performing at a higher level due to its larger size and capacity.\n",
"\n",
"6. **Model training (finetuning, RLHF) & data engine**: The final section of the graph indicates that with extensive model training techniques like finetuning and Reinforcement Learning from Human Feedback (RLHF), combined with a robust data engine, the big base model can achieve near-perfect task accuracy. The small base model also improves but does not reach the same level, indicating that the larger model's capacity enables it to better utilize advanced training methods and data resources.\n",
"\n",
"In summary, the big base model benefits more from advanced training strategies and demonstrates higher task accuracy with increased effort and complexity, while the small base model requires more significant finetuning to achieve substantial improvements in performance.\n"
]
},
{
"cell_type": "markdown",
"id": "2552b0e6-9d07-40f1-8fbc-17567bd0fdd1",
"metadata": {},
"source": [
"## QA with OSS Multi-modal LLMs\n",
"\n",
"We cam also test various open source multi-modal LLMs.\n",
"\n",
"See [here](https://github.com/langchain-ai/langchain/blob/master/cookbook/Semi_structured_and_multi_modal_RAG.ipynb) for instructions to build llama.cpp for multi-modal LLMs:\n",
"# Execute the command and save the output to the defined output file\n",
"/Users/rlm/Desktop/Code/llama.cpp/build/bin/llava -m /Users/rlm/Desktop/Code/llama.cpp/models/${MODEL_NAME}/ggml-model-q5_k.gguf --mmproj /Users/rlm/Desktop/Code/llama.cpp/models/${MODEL_NAME}/mmproj-model-f16.gguf --temp 0.1 -p \"Based on the image, what is the difference in training strategy between a small and a large base model?\" --image \"$IMG_PATH\""