# How to Use Open-Source LLMs from Hugging Face in Your Applications

> Integrate open-source LLMs from Hugging Face into your applications using the transformers library. Learn how to configure API keys and swap models easily to enhance your projects.

- Repository: [Microsoft/generative-ai-for-beginners](https://github.com/microsoft/generative-ai-for-beginners)
- Tags: how-to-guide
- Published: 2026-02-26

---

**You can integrate open-source LLMs from Hugging Face into your applications by configuring the `HUGGING_FACE_API_KEY` environment variable, using the `transformers` library's `pipeline` or `AutoModel` APIs, and swapping the model client in existing lesson code while keeping the same prompt logic.**

The *Generative AI for Beginners* curriculum by Microsoft provides a production-ready workflow for leveraging open-source large language models (LLMs) from Hugging Face. By following the repository's architecture, you can securely authenticate, load models like Llama 2 or Mistral, and integrate them into existing applications with minimal code changes.

## Configuring Hugging Face Authentication

Before calling any model, you must authenticate with the Hugging Face Hub. The repository separates credentials from code using environment variables.

### Environment Setup

Create a `.env` file in your project root based on the provided template `.env.copy`. Add your Hugging Face access token:

```bash
HUGGING_FACE_API_KEY=hf_your_token_here

```

Generate this token from your Hugging Face account settings. The [`00-course-setup/03-providers.md`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/00-course-setup/03-providers.md) file documents this requirement alongside other supported providers like OpenAI and Azure OpenAI.

### Secure Token Loading

Never hard-code secrets in your application logic. Instead, use the shared utility located at [`shared/python/env_utils.py`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/shared/python/env_utils.py). This module provides the `get_env()` function to safely load variables from your `.env` file:

```python
from shared.python.env_utils import get_env

hf_token = get_env("HUGGING_FACE_API_KEY")

```

This pattern ensures your credentials remain secure while remaining accessible to the model client.

## Installing and Loading Model Dependencies

The repository includes all necessary dependencies in [`requirements.txt`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/requirements.txt). Ensure you have `transformers` and `huggingface_hub` installed:

```bash
pip install -r requirements.txt

```

These libraries provide the core interfaces for downloading and running open-source models from the Hugging Face Hub.

## Generating Text with Hugging Face Models

The `16-open-source-models` lesson demonstrates two primary methods for interacting with LLMs: the high-level `pipeline` API and the lower-level `AutoModel` classes.

### Using the Pipeline API

The `pipeline` abstraction handles tokenization and generation in a single call. This is the fastest way to get started:

```python
from transformers import pipeline
from shared.python.env_utils import get_env

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",
    token=get_env("HUGGING_FACE_API_KEY"),
    device=0,  # Use -1 for CPU

    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
)

result = generator("Explain quantum computing in simple terms.", return_full_text=False)
print(result[0]["generated_text"])

```

### Using AutoModel Classes

For fine-grained control over the generation process, instantiate the model and tokenizer separately:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from shared.python.env_utils import get_env

model_name = "meta-llama/Llama-2-7b-chat-hf"
token = get_env("HUGGING_FACE_API_KEY")

tokenizer = AutoTokenizer.from_pretrained(model_name, token=token)
model = AutoModelForCausalLM.from_pretrained(model_name, token=token)

inputs = tokenizer("What is the capital of France?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

```

## Integrating into Existing Applications

One strength of the repository's architecture is the clean separation between the model client and application logic. You can swap the backend provider without rewriting your prompt engineering or UI code.

### Swapping the OpenAI Client

The lesson [`06-text-generation-apps/python/oai-app.py`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/06-text-generation-apps/python/oai-app.py) demonstrates a text generation application using the OpenAI client. To use a Hugging Face model instead, replace the client initialization while preserving the rest of the application structure:

```python

# Remove: from openai import OpenAI

# Remove: client = OpenAI(api_key=get_env("OPENAI_API_KEY"))

# Hugging Face replacement

from transformers import pipeline
from shared.python.env_utils import get_env

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",
    token=get_env("HUGGING_FACE_API_KEY"),
    device=0,
    max_new_tokens=256,
)

def generate_response(prompt: str) -> str:
    result = generator(prompt, return_full_text=False)
    return result[0]["generated_text"]

```

The `generate_response` function now serves as a drop-in replacement for the OpenAI client's `chat.completions.create` method, allowing the rest of the application to function unchanged.

## Deploying as a REST API

For production deployments, you can wrap the Hugging Face model in a simple Flask server. This pattern mirrors the Azure Functions examples in the repository but uses your local or remote open-source model.

```python

# flask_hf_api.py

from flask import Flask, request, jsonify
from transformers import pipeline
from shared.python.env_utils import get_env

app = Flask(__name__)

# Initialize model once at startup

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",
    token=get_env("HUGGING_FACE_API_KEY"),
    device=0,
    max_new_tokens=256,
    temperature=0.7,
)

@app.post("/generate")
def generate():
    data = request.get_json()
    prompt = data.get("prompt", "")
    result = generator(prompt, return_full_text=False)
    return jsonify({"response": result[0]["generated_text"]})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)

```

Run this server with `python flask_hf_api.py` and send POST requests to `http://localhost:8000/generate` with a JSON body containing your prompt.

## Key Files in the Repository

Understanding the repository structure helps you navigate the implementation details:

- **[`00-course-setup/03-providers.md`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/00-course-setup/03-providers.md)** – Documents the `HUGGING_FACE_API_KEY` requirement alongside other providers like OpenAI and Azure OpenAI.
- **[`shared/python/env_utils.py`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/shared/python/env_utils.py)** – Contains the `get_env()` function for secure credential loading from `.env` files.
- **[`16-open-source-models/README.md`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/16-open-source-models/README.md)** – Introduces open-source LLMs including Llama 2, Mistral, Falcon, and OLMo with specific usage guidance.
- **[`06-text-generation-apps/python/oai-app.py`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/06-text-generation-apps/python/oai-app.py)** – Example OpenAI-based application that can be adapted for Hugging Face models.
- **[`requirements.txt`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/requirements.txt)** – Declares `transformers` and `huggingface_hub` dependencies.
- **`.env.copy`** – Template file showing where to place your `HUGGING_FACE_API_KEY`.

## Summary

- **Configure authentication** by adding `HUGGING_FACE_API_KEY` to your `.env` file and loading it via [`shared/python/env_utils.py`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/shared/python/env_utils.py).
- **Install dependencies** from [`requirements.txt`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/requirements.txt) to get `transformers` and `huggingface_hub`.
- **Choose your interface**: use the `pipeline` API for simplicity or `AutoModelForCausalLM` for fine-grained control.
- **Swap providers** in existing applications by replacing the OpenAI client initialization with a Hugging Face `pipeline` while preserving prompt logic.
- **Deploy as an API** using Flask to create endpoints compatible with the repository's serverless patterns.

## Frequently Asked Questions

### How do I get a Hugging Face access token?

Visit your Hugging Face account settings and navigate to the "Access Tokens" section. Generate a new token with read permissions and copy it into your `.env` file as `HUGGING_FACE_API_KEY`. The [`00-course-setup/03-providers.md`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/00-course-setup/03-providers.md) file documents this process alongside other provider configurations.

### Can I use CPU instead of GPU for inference?

Yes. When creating the `pipeline` or loading `AutoModelForCausalLM`, set the `device` parameter to `-1` to force CPU usage, or omit the parameter entirely. The examples in `16-open-source-models` demonstrate both CPU and GPU configurations depending on your hardware availability.

### Which open-source models does the repository recommend?

The `16-open-source-models` lesson specifically mentions **Llama 2**, **Mistral**, **Falcon**, and **OLMo** as recommended open-source LLMs available on Hugging Face. These models vary in size and licensing, allowing you to choose based on your specific performance and commercial use requirements.

### How do I migrate an existing OpenAI-based app to Hugging Face?

Replace the OpenAI client initialization in files like [`06-text-generation-apps/python/oai-app.py`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/06-text-generation-apps/python/oai-app.py) with a Hugging Face `pipeline` instance. Keep the prompt engineering logic and response handling code identical—only the model client changes. The `generate_response` function serves as a drop-in replacement for `chat.completions.create`, returning text that the rest of your application can process unchanged.