How to Integrate Mistral and Meta Llama Model Families with GitHub Models

You can integrate both Mistral and Meta Llama model families using the same OpenAI-compatible client by setting the model identifier to mistralai/<model-name> or meta-llama/<model-name> and authenticating with a GitHub personal access token.

Microsoft's Generative AI for Beginners curriculum includes dedicated lessons and working code samples that show how to integrate the Mistral and Meta Llama model families through the GitHub Models marketplace. Because these open-weight models are reachable through standard OpenAI API patterns, you can switch between proprietary and open-weight models without rewriting your application logic.

Setting Up Authentication for GitHub Models

Before integrating Mistral or Llama models, you must configure authentication to access the GitHub Models inference endpoint. The repository's course setup guide specifies creating a personal access token with the read:packages scope.

Create a .env file in your project root based on the template provided in 00-course-setup/.env.copy:

GITHUB_TOKEN=ghp_your_personal_access_token_here

Load this token in your Python application using python-dotenv to securely inject credentials without hardcoding them.
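
A minimal sketch of that loading step, assuming python-dotenv is installed and the variable is named GITHUB_TOKEN as in the template above. The helper names get_github_token and load_token_from_dotenv are hypothetical, not part of the course repository. The idea is to fail early with a readable error rather than send an empty key to the endpoint.

```python
import os
from typing import Mapping, Optional


def get_github_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GITHUB_TOKEN from the given mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    token = env.get("GITHUB_TOKEN", "").strip()
    if not token:
        raise RuntimeError("GITHUB_TOKEN is not set; add it to your .env file")
    return token


def load_token_from_dotenv() -> str:
    """Load a .env file into the process environment, then return the token."""
    # Imported lazily so get_github_token works without python-dotenv installed.
    from dotenv import load_dotenv

    load_dotenv()  # looks for a .env file and loads its values into os.environ
    return get_github_token()
```

Calling load_token_from_dotenv() once at startup and passing the result to the client keeps the token out of source control, provided .env itself is excluded from commits.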

Configuring the OpenAI-Compatible Client

The integration relies on an OpenAI-compatible client that points to the GitHub Models inference endpoint rather than OpenAI's servers.

Base Client Configuration

In 06-text-generation-apps/python/githubmodels-app.py, the repository demonstrates the minimal client setup required to route requests to GitHub Models:

import os

import openai  # this snippet uses the legacy (pre-1.0) openai SDK interface
from dotenv import load_dotenv

# Read GITHUB_TOKEN from .env into the environment
load_dotenv()

# Point the client at GitHub Models instead of OpenAI's servers
openai.api_base = "https://models.inference.ai.azure.com/v1"
openai.api_key = os.getenv("GITHUB_TOKEN")

# Test the connection
response = openai.ChatCompletion.create(
    model="mistralai/Mistral-Large-2407",
    messages=[{"role": "user", "content": "Hello, world!"}]
)
print(response.choices[0].message.content)

This same client configuration works for both Mistral and Meta Llama families—only the model parameter changes.
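
To make that point concrete, the sketch below builds the request arguments separately from the client call and compares them. build_chat_request is a hypothetical helper introduced for illustration; the model identifiers are the ones used in this article.

```python
from typing import Any, Dict


def build_chat_request(model: str, prompt: str, temperature: float = 0.7) -> Dict[str, Any]:
    """Build chat-completion keyword arguments for any model identifier."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }


mistral_request = build_chat_request("mistralai/Mistral-Large-2407", "Hello, world!")
llama_request = build_chat_request("meta-llama/Llama-3.1-70B-Instruct", "Hello, world!")

# List the keys whose values differ between the two payloads.
differing_keys = [k for k in mistral_request if mistral_request[k] != llama_request[k]]
print(differing_keys)  # → ['model']
```

Either dictionary can then be unpacked into the chat-completion call, for example openai.ChatCompletion.create(**mistral_request) with the legacy client shown above.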

Integrating Mistral Model Family

The repository's lesson 20 provides comprehensive coverage of Mistral integration, including specific model identifiers and retrieval-augmented generation patterns.

Available Mistral Model Identifiers

According to translations/zh-TW/20-mistral/python/githubmodels-assignment.ipynb, the following model identifiers are available through GitHub Models:

  • mistralai/Mistral-Large – Flagship reasoning model
  • mistralai/Mistral-Large-2407 – Updated Large 2 variant
  • mistralai/Mistral-Small – Efficient, cost-effective option
  • mistralai/Mistral-Nemo – 12B general-purpose model developed with NVIDIA

RAG Implementation with Mistral

The notebook demonstrates building a retrieval-augmented generation pipeline using Mistral as the LLM backend:

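import os

from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.schema import Document
from langchain.vectorstores import FAISS

# Configure Mistral Large as the LLM
llm = OpenAI(
    model_name="mistralai/Mistral-Large",
    temperature=0,
    openai_api_base="https://models.inference.ai.azure.com/v1",
    openai_api_key=os.getenv("GITHUB_TOKEN")
)

# Placeholder corpus; replace with your own loaded and split documents
docs = [Document(page_content="Replace this text with your own content.")]

# Build vector store from documents
# (OpenAIEmbeddings() needs its own credentials, by default from OPENAI_API_KEY)
embeddings = OpenAIEmbeddings()
docsearch = FAISS.from_documents(docs, embeddings)

# Execute RAG query: retrieve the closest matches and pass their text as context
query = "What are the key features of Mistral Large?"
retrieved_docs = docsearch.similarity_search(query, k=4)
context = "\n\n".join(doc.page_content for doc in retrieved_docs)
response = llm.generate([f"{query}\n\nContext: {context}"])
print(response.generations[0][0].text)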
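
The final step of the pipeline combines the retrieved chunks with the question. Below is a small sketch of that prompt-assembly step in isolation. build_rag_prompt, its instruction wording, and its character budget are assumptions made for illustration, not code from the notebook. It takes plain strings, so with LangChain you would pass the page_content of each retrieved document.

```python
from typing import Iterable


def build_rag_prompt(question: str, chunks: Iterable[str], max_chars: int = 4000) -> str:
    """Join retrieved text chunks into a numbered context block, then append the question."""
    parts = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        entry = f"[{i}] {chunk.strip()}"
        # Stop before exceeding the (assumed) character budget for context.
        if used + len(entry) > max_chars:
            break
        parts.append(entry)
        used += len(entry)
    context = "\n\n".join(parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


prompt = build_rag_prompt(
    "What are the key features of Mistral Large?",
    ["First retrieved passage.", "Second retrieved passage."],
)
print(prompt)
```

Numbering the chunks makes it easier to ask the model to cite which passage supported its answer, and the budget keeps long retrievals from crowding out the question.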

Integrating Meta Llama Model Family

Lesson 21 covers Meta's Llama models, including the vision-capable variants introduced in Llama 3.2.

Available Llama Model Identifiers

Per translations/zh-TW/21-meta/python/githubmodels-assignment.ipynb, supported Meta Llama identifiers include:

  • meta-llama/Llama-3.1-70B-Instruct – High-performance instruction-tuned model
  • meta-llama/Llama-3.1-8B-Instruct – Efficient smaller variant
  • meta-llama/Llama-3.2-90B-Vision-Instruct – Multi-modal vision and text
  • meta-llama/Llama-3.2-11B-Vision-Instruct – Lightweight vision model

Multi-Modal Vision Capabilities

The Llama 3.2 integration supports image-plus-text prompting. The notebook provides this pattern for vision tasks:

import base64

import openai  # assumes api_base and api_key are configured as in the base client setup

# Encode image to base64
with open("example.png", "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode("utf-8")

# Construct multi-modal message: one text part and one image part
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what is happening in this image."},
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/png;base64,{base64_image}"
                }
            }
        ]
    }
]

# Call Llama 3.2 Vision
response = openai.ChatCompletion.create(
    model="meta-llama/Llama-3.2-90B-Vision-Instruct",
    messages=messages,
    temperature=0.5
)

print(response.choices[0].message.content)
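
The snippet above hardcodes image/png in the data URL. As a sketch, the helpers below derive the MIME type from the filename with the standard-library mimetypes module and wrap the result in the same message shape. Both function names are hypothetical and not taken from the notebook.

```python
import base64
import mimetypes
from typing import Any, Dict, List


def to_data_url(image_bytes: bytes, filename: str) -> str:
    """Encode raw image bytes as a data URL, guessing the MIME type from the filename."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {filename!r}")
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def build_vision_messages(prompt: str, data_url: str) -> List[Dict[str, Any]]:
    """Build a single user message holding one text part and one image part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }
    ]


# Tiny stand-in payload; in practice read the bytes from an image file.
url = to_data_url(b"abc", "example.png")
messages = build_vision_messages("Describe what is happening in this image.", url)
print(url)
```

Rejecting non-image extensions up front avoids sending a payload the vision model cannot interpret.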

Switching Between Model Families Programmatically

Because both families use the same OpenAI-compatible endpoint, you can implement fallback logic or A/B testing by changing only the model string:

models = [
    "mistralai/Mistral-Large-2407",
    "meta-llama/Llama-3.1-70B-Instruct"
]

for model in models:
    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": "Explain quantum computing"}],
        temperature=0.7
    )
    print(f"{model}: {response.choices[0].message.content[:100]}...")
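
The loop above calls every model in turn, which suits side-by-side comparison. For fallback, you would stop at the first success instead. The sketch below assumes a failed request raises an exception. complete_with_fallback is a hypothetical helper that takes the API call as a parameter, so the logic can be exercised without network access; fake_call is a stand-in for the real request.

```python
from typing import Callable, List, Sequence, Tuple


def complete_with_fallback(
    models: Sequence[str],
    call_model: Callable[[str], str],
) -> Tuple[str, str]:
    """Try each model in order; return (model, reply) from the first that succeeds."""
    errors: List[str] = []
    for model in models:
        try:
            return model, call_model(model)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{model}: {exc}")
    raise RuntimeError("All models failed: " + "; ".join(errors))


# Stand-in for a real API call: the Mistral model is simulated as unavailable.
def fake_call(model: str) -> str:
    if model.startswith("mistralai/"):
        raise ConnectionError("simulated outage")
    return f"reply from {model}"


used_model, reply = complete_with_fallback(
    ["mistralai/Mistral-Large-2407", "meta-llama/Llama-3.1-70B-Instruct"],
    fake_call,
)
print(used_model)  # → meta-llama/Llama-3.1-70B-Instruct
```

With the legacy client, call_model could be a small function that wraps openai.ChatCompletion.create and returns response.choices[0].message.content.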

The notebooks in 20-mistral and 21-meta also include tokenizer comparisons, allowing you to analyze token count differences between Mistral and Llama models for cost optimization.
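
One way to approach such a comparison without running tokenizers locally is to read the usage block that OpenAI-compatible chat completion responses typically include. The sketch below assumes each response reports prompt_tokens and completion_tokens. summarize_usage is a hypothetical helper, and the numbers are placeholders for illustration, not measurements.

```python
from typing import Dict


def summarize_usage(usage_by_model: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Return total tokens per model, ordered from fewest to most."""
    totals = {
        model: usage["prompt_tokens"] + usage["completion_tokens"]
        for model, usage in usage_by_model.items()
    }
    return dict(sorted(totals.items(), key=lambda item: item[1]))


# Placeholder values only; real numbers would come from each response's usage field.
example_usage = {
    "meta-llama/Llama-3.1-70B-Instruct": {"prompt_tokens": 10, "completion_tokens": 95},
    "mistralai/Mistral-Large-2407": {"prompt_tokens": 12, "completion_tokens": 80},
}

for model, total in summarize_usage(example_usage).items():
    print(f"{model}: {total} tokens")
```

Sending the same prompt to each model and comparing prompt_tokens isolates tokenizer differences, since completion length also depends on how verbose each model is.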

Summary

  • Authentication: Store your GitHub personal access token with read:packages scope in a .env file as GITHUB_TOKEN.
  • Client Configuration: Use the OpenAI-compatible client pointing to https://models.inference.ai.azure.com/v1 to access both model families.
  • Mistral Integration: Reference models using mistralai/<model-name> (e.g., Mistral-Large-2407, Mistral-Small) for text generation and RAG pipelines.
  • Meta Llama Integration: Reference models using meta-llama/<model-name> (e.g., Llama-3.1-70B-Instruct, Llama-3.2-90B-Vision-Instruct) for instruction following and multi-modal vision tasks.
  • Code Resources: Refer to 06-text-generation-apps/python/githubmodels-app.py for base client setup, 20-mistral/python/githubmodels-assignment.ipynb for Mistral-specific examples, and 21-meta/python/githubmodels-assignment.ipynb for Llama vision capabilities.

Frequently Asked Questions

What authentication scope is required to access Mistral and Llama models on GitHub Models?

You need a GitHub personal access token with the read:packages scope. This token authenticates requests to the GitHub Models inference endpoint at https://models.inference.ai.azure.com/v1. Store this token in your .env file as GITHUB_TOKEN and load it via python-dotenv to keep credentials secure.

Can I use the same client code for both Mistral and Meta Llama models?

Yes. Both model families are accessed through the same OpenAI-compatible client configuration. You only need to change the model parameter in your API call. Use mistralai/<model-name> for Mistral models and meta-llama/<model-name> for Llama models. The base URL, authentication headers, and response parsing remain identical.

Which model should I choose for vision tasks involving images?

For vision tasks, use the Meta Llama 3.2 vision models available through GitHub Models. Specifically, meta-llama/Llama-3.2-90B-Vision-Instruct offers high-performance image understanding, while meta-llama/Llama-3.2-11B-Vision-Instruct provides a lighter alternative. These models accept base64-encoded images in the message payload and can answer questions about visual content.

How do I implement retrieval-augmented generation with these open-source models?

The repository's lesson notebooks demonstrate RAG implementation using LangChain with Mistral models. You configure the LLM with model_name="mistralai/Mistral-Large" and the GitHub Models endpoint, then combine it with a vector store like FAISS for document retrieval. The same pattern works for Llama models by swapping the model identifier, allowing you to compare retrieval accuracy across different open-source architectures.
