# How to Integrate Mistral and Meta Llama Model Families with GitHub Models

> Integrate Mistral and Meta Llama models easily using an OpenAI-compatible client and GitHub token. Learn the simple steps in our beginner's guide.

- Repository: [Microsoft/generative-ai-for-beginners](https://github.com/microsoft/generative-ai-for-beginners)
- Tags: how-to-guide
- Published: 2026-02-26

---

**You can integrate both Mistral and Meta Llama model families using the same OpenAI-compatible client by setting the model identifier to `mistralai/<model-name>` or `meta-llama/<model-name>` and authenticating with a GitHub personal access token.**

The *Generative AI for Beginners* curriculum by Microsoft provides dedicated lessons and working code samples that show how to integrate the Mistral and Meta Llama model families through the GitHub Models marketplace. Because both families are served through the same OpenAI-compatible API, you can switch between OpenAI models and Mistral or Llama models without rewriting your application logic.

## Setting Up Authentication for GitHub Models

Before integrating Mistral or Llama models, you must configure authentication to access the GitHub Models inference endpoint. The repository's course setup guide specifies creating a personal access token with the `read:packages` scope.

Create a `.env` file in your project root based on the template provided in `00-course-setup/.env.copy`:

```bash
GITHUB_TOKEN=ghp_your_personal_access_token_here
```

Load this token in your Python application using `python-dotenv` to securely inject credentials without hardcoding them.
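
A minimal sketch of that loading step, assuming the `.env` file sits in the directory you run the script from. The helper name `get_github_token` is hypothetical; `load_dotenv()` is python-dotenv's standard entry point and, by default, does not overwrite variables already set in your shell. The import is wrapped so the helper still works when the token is exported directly and python-dotenv is not installed.

```python
import os


def get_github_token(env_var: str = "GITHUB_TOKEN") -> str:
    """Return the GitHub token, loading .env first when python-dotenv is available."""
    try:
        from dotenv import load_dotenv  # optional dependency

        load_dotenv()  # leaves variables that are already set untouched
    except ImportError:
        pass  # rely on the shell environment instead
    token = os.getenv(env_var)
    if not token:
        raise RuntimeError(
            f"{env_var} is not set; add it to .env or export it in your shell"
        )
    return token
```

Call `get_github_token()` once at startup and pass the result to the client. Failing early with the variable's name makes a missing token obvious before the first request, instead of surfacing later as an authentication error from the endpoint.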

## Configuring the OpenAI-Compatible Client

The integration relies on an OpenAI-compatible client that points to the GitHub Models inference endpoint rather than OpenAI's servers.

### Base Client Configuration

In [`06-text-generation-apps/python/githubmodels-app.py`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/06-text-generation-apps/python/githubmodels-app.py), the repository demonstrates the minimal client setup required to route requests to GitHub Models:

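```python
import os

from dotenv import load_dotenv
from openai import OpenAI  # OpenAI Python SDK v1.x; openai.ChatCompletion was removed in 1.0

load_dotenv()  # copies GITHUB_TOKEN from .env into the process environment

# Route requests to GitHub Models instead of OpenAI's servers
client = OpenAI(
    base_url="https://models.inference.ai.azure.com/v1",
    api_key=os.environ["GITHUB_TOKEN"],
)

# Test the connection
response = client.chat.completions.create(
    model="mistralai/Mistral-Large-2407",
    messages=[{"role": "user", "content": "Hello, world!"}],
)
print(response.choices[0].message.content)
```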

This same client configuration works for both Mistral and Meta Llama families—only the `model` parameter changes.
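
Because the `model` string is the only thing that changes, it can also carry routing information. A minimal sketch, assuming the `<vendor>/<model-name>` convention used throughout this guide; the helper `model_family` and the `FAMILY_PREFIXES` table are hypothetical, not part of the repository.

```python
# Hypothetical mapping from identifier prefix to model family
FAMILY_PREFIXES = {
    "mistralai": "Mistral",
    "meta-llama": "Meta Llama",
}


def model_family(model_id: str) -> str:
    """Return the family name for a '<vendor>/<model-name>' identifier."""
    vendor, sep, name = model_id.partition("/")
    if not sep or not name:
        raise ValueError(f"expected '<vendor>/<model-name>', got {model_id!r}")
    try:
        return FAMILY_PREFIXES[vendor]
    except KeyError:
        raise ValueError(f"unknown vendor prefix {vendor!r}") from None
```

You might use the returned family name to select per-family defaults, such as a different system prompt or temperature, while the request code itself stays identical.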

## Integrating Mistral Model Family

Lesson 20 of the repository covers Mistral integration, including the available model identifiers and a retrieval-augmented generation (RAG) pattern.

### Available Mistral Model Identifiers

According to `translations/zh-TW/20-mistral/python/githubmodels-assignment.ipynb`, the following model identifiers are available through GitHub Models:

- `mistralai/Mistral-Large` – Flagship reasoning model
- `mistralai/Mistral-Large-2407` – Mistral Large 2 (July 2024 release)
- `mistralai/Mistral-Small` – Efficient, cost-effective option
- `mistralai/Mistral-Nemo` – 12B model built with NVIDIA, with a 128K-token context window

### RAG Implementation with Mistral

The notebook demonstrates building a retrieval-augmented generation pipeline using Mistral as the LLM backend:

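```python
import os

from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Requires: pip install langchain-openai langchain-community faiss-cpu
GITHUB_MODELS_URL = "https://models.inference.ai.azure.com/v1"
token = os.environ["GITHUB_TOKEN"]

# Configure Mistral Large as a chat model served by GitHub Models
llm = ChatOpenAI(
    model="mistralai/Mistral-Large",
    temperature=0,
    base_url=GITHUB_MODELS_URL,
    api_key=token,
)

# Embeddings must also point at GitHub Models, otherwise they default to api.openai.com.
# The embedding model name is an assumption; check the marketplace for what is available.
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    base_url=GITHUB_MODELS_URL,
    api_key=token,
    check_embedding_ctx_length=False,  # send raw strings, not pre-tokenized input
)

# Placeholder documents; replace with your own loaded and split documents
docs = [
    Document(page_content="Mistral Large is Mistral AI's flagship reasoning model."),
    Document(page_content="Mistral Small is a lower-cost option for simpler tasks."),
]
docsearch = FAISS.from_documents(docs, embeddings)

# Execute the RAG query: retrieve, then ground the answer in the retrieved text
query = "What are the key features of Mistral Large?"
retrieved_docs = docsearch.similarity_search(query, k=4)
context = "\n\n".join(doc.page_content for doc in retrieved_docs)
response = llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {query}")
print(response.content)
```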
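
To see the retrieve-then-prompt flow without an embeddings endpoint or FAISS, the sketch below swaps dense embeddings for plain word-count cosine similarity. This is a deliberately simple stand-in, not what the notebook uses, and the function names `retrieve` and `build_rag_prompt` are hypothetical. The returned prompt string is what you would send to the Mistral (or Llama) chat model.

```python
import math
import re
from collections import Counter


def _vectorize(text: str) -> Counter:
    # Lowercased word counts stand in for embeddings in this sketch
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: list[str], k: int = 4) -> list[str]:
    """Return the k passages most similar to the query."""
    q = _vectorize(query)
    ranked = sorted(passages, key=lambda p: _cosine(q, _vectorize(p)), reverse=True)
    return ranked[:k]


def build_rag_prompt(query: str, passages: list[str]) -> str:
    """Number the passages so the model can cite them, then append the question."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


passages = [
    "Mistral Large is the flagship reasoning model.",
    "Llama 3.2 Vision models accept images alongside text.",
    "FAISS stores dense vectors for similarity search.",
]
top = retrieve("What is Mistral Large?", passages, k=1)
prompt = build_rag_prompt("What is Mistral Large?", top)
print(prompt)
```

Passing the joined text of the retrieved passages, rather than `str()` of the document objects, keeps metadata and object reprs out of the prompt; numbering them also lets you ask the model to cite which passage supports each claim.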

## Integrating Meta Llama Model Family

Lesson 21 covers Meta's Llama models, including the vision-capable variants introduced in Llama 3.2.

### Available Llama Model Identifiers

Per `translations/zh-TW/21-meta/python/githubmodels-assignment.ipynb`, supported Meta Llama identifiers include:

- `meta-llama/Llama-3.1-70B-Instruct` – High-performance instruction-tuned model
- `meta-llama/Llama-3.1-8B-Instruct` – Efficient smaller variant
- `meta-llama/Llama-3.2-90B-Vision-Instruct` – Multi-modal vision and text
- `meta-llama/Llama-3.2-11B-Vision-Instruct` – Lightweight vision model
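
The four identifiers differ along two axes: whether the model accepts images, and model size (larger models usually produce stronger output, while smaller ones respond faster and cost less). A hypothetical lookup helper makes that choice explicit; the identifiers are copied from the list above, and `pick_llama_model` is an illustrative name, not a repository function.

```python
# (needs_vision, prefer_small) -> identifier from the list above
LLAMA_MODELS = {
    (False, False): "meta-llama/Llama-3.1-70B-Instruct",
    (False, True): "meta-llama/Llama-3.1-8B-Instruct",
    (True, False): "meta-llama/Llama-3.2-90B-Vision-Instruct",
    (True, True): "meta-llama/Llama-3.2-11B-Vision-Instruct",
}


def pick_llama_model(needs_vision: bool, prefer_small: bool = False) -> str:
    """Choose a Llama identifier based on input modality and size preference."""
    return LLAMA_MODELS[(needs_vision, prefer_small)]
```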

### Multi-Modal Vision Capabilities

The Llama 3.2 integration supports image-plus-text prompting. The notebook provides this pattern for vision tasks:

```python
import base64
import os

from openai import OpenAI  # OpenAI Python SDK v1.x

client = OpenAI(
    base_url="https://models.inference.ai.azure.com/v1",
    api_key=os.environ["GITHUB_TOKEN"],
)

# Encode the image as base64 so it can travel inside the JSON payload
with open("example.png", "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode("utf-8")

# Construct a multi-modal message: one text part and one image part
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what is happening in this image."},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{base64_image}"},
            },
        ],
    }
]

# Call Llama 3.2 Vision
response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-90B-Vision-Instruct",
    messages=messages,
    temperature=0.5,
)
print(response.choices[0].message.content)
```
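
The vision example hardcodes `image/png` in the data URL, which mislabels JPEG or WebP files. The sketch below infers the MIME type from the file extension with Python's standard `mimetypes` module and builds the same `messages` structure. `image_part` and `vision_messages` are hypothetical helper names; the function accepts several image paths, but not every vision model handles multiple images per request, so a single path is the safe default.

```python
import base64
import mimetypes
from pathlib import Path


def image_part(path: str) -> dict:
    """Build an OpenAI-style image_url content part from a local image file."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}}


def vision_messages(prompt: str, *image_paths: str) -> list[dict]:
    """Return a single user message combining a text prompt with image parts."""
    content = [{"type": "text", "text": prompt}]
    content.extend(image_part(p) for p in image_paths)
    return [{"role": "user", "content": content}]
```

With a configured client, the call becomes `client.chat.completions.create(model="meta-llama/Llama-3.2-11B-Vision-Instruct", messages=vision_messages("Describe this image.", "example.jpg"))`.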

## Switching Between Model Families Programmatically

Because both families use the same OpenAI-compatible endpoint, you can compare models, run A/B tests, or implement fallback logic by changing only the model string. The following loop sends the same prompt to one model from each family:

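```python
import os

from openai import OpenAI  # OpenAI Python SDK v1.x

client = OpenAI(
    base_url="https://models.inference.ai.azure.com/v1",
    api_key=os.environ["GITHUB_TOKEN"],
)

models = [
    "mistralai/Mistral-Large-2407",
    "meta-llama/Llama-3.1-70B-Instruct",
]

# Same client, same prompt; only the model string differs per request
for model in models:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Explain quantum computing"}],
        temperature=0.7,
    )
    print(f"{model}: {response.choices[0].message.content[:100]}...")
```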
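
The loop above compares models side by side. For the fallback case, a minimal sketch that tries each model in order and returns the first successful answer. The request function is injected as a parameter so the logic can be exercised without network access; `complete_with_fallback` and `flaky_call` are hypothetical names, and `flaky_call` simply simulates a failing Mistral request.

```python
from typing import Callable


def complete_with_fallback(
    prompt: str,
    models: list[str],
    call: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try each model in order and return (model, answer) from the first success."""
    errors: list[str] = []
    for model in models:
        try:
            return model, call(model, prompt)
        except Exception as exc:  # e.g. rate limit, timeout, unknown model id
            errors.append(f"{model}: {exc!r}")
    raise RuntimeError("all models failed:\n" + "\n".join(errors))


def flaky_call(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a real chat completion request."""
    if model.startswith("mistralai/"):
        raise TimeoutError("simulated timeout")
    return f"{model} answered: {prompt}"


model, answer = complete_with_fallback(
    "Explain quantum computing",
    ["mistralai/Mistral-Large-2407", "meta-llama/Llama-3.1-70B-Instruct"],
    flaky_call,
)
print(model)  # prints meta-llama/Llama-3.1-70B-Instruct
```

In real use, `call` would wrap `client.chat.completions.create(...)` and return `response.choices[0].message.content`. You would also narrow `except Exception` to the SDK's error classes, such as `openai.RateLimitError` and `openai.APITimeoutError`, so that programming errors are not silently retried on the next model.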

The notebooks in `20-mistral` and `21-meta` also include tokenizer comparisons, allowing you to analyze token count differences between Mistral and Llama models for cost optimization.
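
A minimal sketch of such a comparison, assuming each tokenizer is exposed as a function from text to a sequence of token IDs. The stand-in tokenizers below are hypothetical placeholders so the code runs offline. With Hugging Face `transformers`, you could pass `AutoTokenizer.from_pretrained(<repo-id>).encode` instead; note that `encode` adds special tokens such as BOS by default, and the `meta-llama` repositories are gated behind Meta's license.

```python
from typing import Callable, Sequence

# A tokenizer here is any function mapping text to a sequence of token IDs
Tokenizer = Callable[[str], Sequence[int]]


def compare_token_counts(text: str, tokenizers: dict[str, Tokenizer]) -> dict[str, int]:
    """Count how many tokens each tokenizer produces for the same text."""
    return {name: len(encode(text)) for name, encode in tokenizers.items()}


def relative_to(counts: dict[str, int], baseline: str) -> dict[str, float]:
    """Express each count as a multiple of the baseline tokenizer's count."""
    base = counts[baseline]
    if base == 0:
        raise ValueError("baseline produced zero tokens")
    return {name: n / base for name, n in counts.items()}


# Hypothetical stand-ins so the sketch runs offline; swap in real tokenizers
def word_level(text: str) -> list[int]:
    return [hash(word) % 50_000 for word in text.split()]


def byte_level(text: str) -> list[int]:
    return list(text.encode("utf-8"))


sample = "Explain quantum computing in one paragraph."
counts = compare_token_counts(sample, {"word-level": word_level, "byte-level": byte_level})
print(counts)
print(relative_to(counts, "word-level"))
```

A ratio above 1.0 means a tokenizer needs more tokens than the baseline for the same text, which uses more of the context window and, under per-token pricing, costs more.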

## Summary

- **Authentication**: Store your GitHub personal access token with `read:packages` scope in a `.env` file as `GITHUB_TOKEN`.
- **Client Configuration**: Use the OpenAI-compatible client pointing to `https://models.inference.ai.azure.com/v1` to access both model families.
- **Mistral Integration**: Reference models using `mistralai/<model-name>` (e.g., `Mistral-Large-2407`, `Mistral-Small`) for text generation and RAG pipelines.
- **Meta Llama Integration**: Reference models using `meta-llama/<model-name>` (e.g., `Llama-3.1-70B-Instruct`, `Llama-3.2-90B-Vision-Instruct`) for instruction following and multi-modal vision tasks.
- **Code Resources**: Refer to [`06-text-generation-apps/python/githubmodels-app.py`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/06-text-generation-apps/python/githubmodels-app.py) for base client setup, `20-mistral/python/githubmodels-assignment.ipynb` for Mistral-specific examples, and `21-meta/python/githubmodels-assignment.ipynb` for Llama vision capabilities.

## Frequently Asked Questions

### What authentication scope is required to access Mistral and Llama models on GitHub Models?

You need a GitHub personal access token with the `read:packages` scope. This token authenticates requests to the GitHub Models inference endpoint at `https://models.inference.ai.azure.com/v1`. Store this token in your `.env` file as `GITHUB_TOKEN` and load it via `python-dotenv` to keep credentials secure.

### Can I use the same client code for both Mistral and Meta Llama models?

Yes. Both model families are accessed through the same OpenAI-compatible client configuration. You only need to change the `model` parameter in your API call. Use `mistralai/<model-name>` for Mistral models and `meta-llama/<model-name>` for Llama models. The base URL, authentication headers, and response parsing remain identical.

### Which model should I choose for vision tasks involving images?

For vision tasks, use the Meta Llama 3.2 vision models available through GitHub Models. Specifically, `meta-llama/Llama-3.2-90B-Vision-Instruct` offers high-performance image understanding, while `meta-llama/Llama-3.2-11B-Vision-Instruct` provides a lighter alternative. These models accept base64-encoded images in the message payload and can answer questions about visual content.

### How do I implement retrieval-augmented generation with these models?

The repository's lesson notebooks demonstrate RAG using LangChain with Mistral models. You configure the LLM with `model_name="mistralai/Mistral-Large"` and the GitHub Models endpoint, then combine it with a vector store such as FAISS for document retrieval. The same pattern works for Llama models by swapping the model identifier. Because retrieval itself is handled by the embeddings and the vector store, this swap lets you compare answer quality across the two model families on identical retrieved context.