How to Integrate Mistral and Meta Llama Model Families with GitHub Models
You can integrate both Mistral and Meta Llama model families using the same OpenAI-compatible client by setting the model identifier to mistralai/<model-name> or meta-llama/<model-name> and authenticating with a GitHub personal access token.
The Generative AI for Beginners curriculum by Microsoft provides dedicated lessons and working code samples that demonstrate how to integrate Mistral and Meta Llama model families through the GitHub Models marketplace. These open-source model families can be accessed using standard OpenAI API patterns, making it straightforward to switch between proprietary and open-weight models without rewriting your application logic.
Setting Up Authentication for GitHub Models
Before integrating Mistral or Llama models, you must configure authentication to access the GitHub Models inference endpoint. The repository's course setup guide specifies creating a personal access token with the read:packages scope.
Create a .env file in your project root based on the template provided in 00-course-setup/.env.copy:
GITHUB_TOKEN=ghp_your_personal_access_token_here
Load this token in your Python application using python-dotenv to securely inject credentials without hardcoding them.
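As a minimal sketch of that loading step, the helper below reads GITHUB_TOKEN from the environment and fails early with a clear message if it is missing. The function name `require_github_token` is hypothetical and not part of the repository; the sketch assumes `load_dotenv()` (or your shell) has already populated the environment, and it uses only the standard library.

```python
import os
from typing import Mapping, Optional


def require_github_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GITHUB_TOKEN from the environment or raise a clear error.

    Hypothetical helper: call it after load_dotenv() has populated os.environ.
    """
    source = os.environ if env is None else env
    token = source.get("GITHUB_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "GITHUB_TOKEN is not set; add it to your .env file or shell environment."
        )
    return token
```

Failing at startup, rather than on the first API call, tends to make a missing or misnamed variable easier to diagnose.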
Configuring the OpenAI-Compatible Client
The integration relies on an OpenAI-compatible client that points to the GitHub Models inference endpoint rather than OpenAI's servers.
Base Client Configuration
In 06-text-generation-apps/python/githubmodels-app.py, the repository demonstrates the minimal client setup required to route requests to GitHub Models:
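import os

import openai
from dotenv import load_dotenv

# Read GITHUB_TOKEN from the .env file into the process environment
load_dotenv()

# Note: this is the legacy (pre-1.0) openai SDK interface;
# openai.ChatCompletion was removed in openai 1.0 and later.
openai.api_base = "https://models.inference.ai.azure.com/v1"
openai.api_key = os.getenv("GITHUB_TOKEN")

# Test the connection
response = openai.ChatCompletion.create(
    model="mistralai/Mistral-Large-2407",
    messages=[{"role": "user", "content": "Hello, world!"}],
)
print(response.choices[0].message.content)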
This same client configuration works for both the Mistral and Meta Llama families; only the model parameter changes.
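To make the "only the model parameter changes" point concrete, here is a minimal sketch that builds the underlying HTTP request with the standard library instead of an SDK. The `/chat/completions` path and the `Authorization: Bearer` header are assumptions based on the usual OpenAI-compatible convention, and `build_chat_request` is a hypothetical helper name. The request is constructed but not sent.

```python
import json
import urllib.request

# Endpoint as given in this article; the /chat/completions path is an
# assumption based on the OpenAI-compatible convention.
BASE_URL = "https://models.inference.ai.azure.com/v1"


def build_chat_request(model: str, prompt: str, token: str,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a chat-completions POST request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Only the model string differs between the two families.
mistral_req = build_chat_request(
    "mistralai/Mistral-Large-2407", "Hello, world!", "example-token"
)
llama_req = build_chat_request(
    "meta-llama/Llama-3.1-70B-Instruct", "Hello, world!", "example-token"
)
```

Passing either request to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without network access or a real token.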
Integrating Mistral Model Family
The repository's lesson 20 provides comprehensive coverage of Mistral integration, including specific model identifiers and retrieval-augmented generation patterns.
Available Mistral Model Identifiers
According to translations/zh-TW/20-mistral/python/githubmodels-assignment.ipynb, the following model identifiers are available through GitHub Models:
- mistralai/Mistral-Large: Flagship reasoning model
- mistralai/Mistral-Large-2407: Updated Large 2 variant
- mistralai/Mistral-Small: Efficient, cost-effective option
- mistralai/Mistral-Nemo: Specialized for specific domains
RAG Implementation with Mistral
The notebook demonstrates building a retrieval-augmented generation pipeline using Mistral as the LLM backend:
import os

from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI

# Configure Mistral Large as the LLM
llm = OpenAI(
    model_name="mistralai/Mistral-Large",
    temperature=0,
    openai_api_base="https://models.inference.ai.azure.com/v1",
    openai_api_key=os.getenv("GITHUB_TOKEN"),
)

# Build vector store from documents
# (`docs` is assumed to be a list of LangChain Document objects loaded earlier)
embeddings = OpenAIEmbeddings()
docsearch = FAISS.from_documents(docs, embeddings)

# Execute RAG query: retrieve the 4 most similar chunks and pass their text as context
query = "What are the key features of Mistral Large?"
retrieved_docs = docsearch.similarity_search(query, k=4)
context = "\n\n".join(doc.page_content for doc in retrieved_docs)
response = llm.generate([query + "\n\nContext: " + context])
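The final step above, combining the query with the retrieved passages, can be sketched on its own without LangChain. The helper below is an illustration under stated assumptions rather than the notebook's code: `build_rag_prompt`, the numbered-passage layout, and the `max_chars` budget of 4000 are all hypothetical choices.

```python
from typing import Sequence


def build_rag_prompt(query: str, passages: Sequence[str],
                     max_chars: int = 4000) -> str:
    """Combine retrieved passages and a question into one prompt string."""
    context_parts = []
    used = 0
    for i, passage in enumerate(passages, start=1):
        entry = f"[{i}] {passage.strip()}"
        if used + len(entry) > max_chars:
            break  # stop before the context exceeds the character budget
        context_parts.append(entry)
        used += len(entry)
    context = "\n".join(context_parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


prompt = build_rag_prompt(
    "What are the key features of Mistral Large?",
    ["Passage one.", "  Passage two.  "],
)
```

Numbering the passages and capping the context length keeps the prompt predictable in size; with LangChain documents you would pass each document's `page_content` as a passage.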
Integrating Meta Llama Model Family
Lesson 21 covers Meta's Llama models, including the vision-capable variants introduced in Llama 3.2.
Available Llama Model Identifiers
Per translations/zh-TW/21-meta/python/githubmodels-assignment.ipynb, supported Meta Llama identifiers include:
- meta-llama/Llama-3.1-70B-Instruct: High-performance instruction-tuned model
- meta-llama/Llama-3.1-8B-Instruct: Efficient smaller variant
- meta-llama/Llama-3.2-90B-Vision-Instruct: Multi-modal vision and text
- meta-llama/Llama-3.2-11B-Vision-Instruct: Lightweight vision model
Multi-Modal Vision Capabilities
The Llama 3.2 integration supports image-plus-text prompting. The notebook provides this pattern for vision tasks:
import base64

# Encode image to base64
with open("example.png", "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode('utf-8')

# Construct multi-modal message
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what is happening in this image."},
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/png;base64,{base64_image}"
                }
            }
        ]
    }
]

# Call Llama 3.2 Vision
response = openai.ChatCompletion.create(
    model="meta-llama/Llama-3.2-90B-Vision-Instruct",
    messages=messages,
    temperature=0.5
)
print(response.choices[0].message.content)
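The pattern above hardcodes `image/png` in the data URL. As a small sketch of how the same message could be built for other image types, the hypothetical helpers below (`image_data_url` and `vision_messages`, neither taken from the notebook) guess the MIME type from the file name using the standard library.

```python
import base64
import mimetypes
from typing import Any, Dict, List


def image_data_url(filename: str, data: bytes) -> str:
    """Encode raw image bytes as a data URL, guessing the MIME type from the name."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {filename!r}")
    encoded = base64.b64encode(data).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


def vision_messages(question: str, filename: str, data: bytes) -> List[Dict[str, Any]]:
    """Build a single-turn message list pairing a text question with one image."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": image_data_url(filename, data)}},
            ],
        }
    ]
```

In practice you would read the bytes with `open(path, "rb").read()` and pass the result of `vision_messages(...)` as the `messages` argument of the chat call.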
Switching Between Model Families Programmatically
Because both families use the same OpenAI-compatible endpoint, you can implement fallback logic or A/B testing by changing only the model string:
models = [
    "mistralai/Mistral-Large-2407",
    "meta-llama/Llama-3.1-70B-Instruct"
]

for model in models:
    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": "Explain quantum computing"}],
        temperature=0.7
    )
    print(f"{model}: {response.choices[0].message.content[:100]}...")
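The loop above queries every model in turn. The fallback logic mentioned earlier could look something like the following sketch, which returns the first reply that succeeds. The names `first_successful` and `fake_call` are hypothetical, and `fake_call` is a stand-in for a real API call so the example runs offline.

```python
from typing import Callable, Sequence, Tuple


def first_successful(models: Sequence[str],
                     call_model: Callable[[str], str]) -> Tuple[str, str]:
    """Try each model in order; return (model, reply) for the first that succeeds."""
    errors = []
    for model in models:
        try:
            return model, call_model(model)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{model}: {exc}")
    raise RuntimeError("All models failed: " + "; ".join(errors))


# Stand-in for a real API call so the sketch runs offline.
def fake_call(model: str) -> str:
    if model.startswith("mistralai/"):
        raise TimeoutError("simulated timeout")
    return f"reply from {model}"


used_model, reply = first_successful(
    ["mistralai/Mistral-Large-2407", "meta-llama/Llama-3.1-70B-Instruct"],
    fake_call,
)
```

Because the call is injected as a function, the same helper works with whichever client you use; in real code you would probably catch only the client's timeout and rate-limit errors rather than every exception.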
The notebooks in 20-mistral and 21-meta also include tokenizer comparisons, allowing you to analyze token count differences between Mistral and Llama models for cost optimization.
Summary
- Authentication: Store your GitHub personal access token with the read:packages scope in a .env file as GITHUB_TOKEN.
- Client Configuration: Use the OpenAI-compatible client pointing to https://models.inference.ai.azure.com/v1 to access both model families.
- Mistral Integration: Reference models using mistralai/<model-name> (e.g., Mistral-Large-2407, Mistral-Small) for text generation and RAG pipelines.
- Meta Llama Integration: Reference models using meta-llama/<model-name> (e.g., Llama-3.1-70B-Instruct, Llama-3.2-90B-Vision-Instruct) for instruction following and multi-modal vision tasks.
- Code Resources: Refer to 06-text-generation-apps/python/githubmodels-app.py for base client setup, 20-mistral/python/githubmodels-assignment.ipynb for Mistral-specific examples, and 21-meta/python/githubmodels-assignment.ipynb for Llama vision capabilities.
Frequently Asked Questions
What authentication scope is required to access Mistral and Llama models on GitHub Models?
You need a GitHub personal access token with the read:packages scope. This token authenticates requests to the GitHub Models inference endpoint at https://models.inference.ai.azure.com/v1. Store this token in your .env file as GITHUB_TOKEN and load it via python-dotenv to keep credentials secure.
Can I use the same client code for both Mistral and Meta Llama models?
Yes. Both model families are accessed through the same OpenAI-compatible client configuration. You only need to change the model parameter in your API call. Use mistralai/<model-name> for Mistral models and meta-llama/<model-name> for Llama models. The base URL, authentication headers, and response parsing remain identical.
Which model should I choose for vision tasks involving images?
For vision tasks, use the Meta Llama 3.2 vision models available through GitHub Models. Specifically, meta-llama/Llama-3.2-90B-Vision-Instruct offers high-performance image understanding, while meta-llama/Llama-3.2-11B-Vision-Instruct provides a lighter alternative. These models accept base64-encoded images in the message payload and can answer questions about visual content.
How do I implement retrieval-augmented generation with these open-source models?
The repository's lesson notebooks demonstrate RAG implementation using LangChain with Mistral models. You configure the LLM with model_name="mistralai/Mistral-Large" and the GitHub Models endpoint, then combine it with a vector store like FAISS for document retrieval. The same pattern works for Llama models by swapping the model identifier, allowing you to compare retrieval accuracy across different open-source architectures.