# Fine-Tuning LLMs vs Prompt Engineering: Key Differences and When to Use Each

> Understand the core differences between fine-tuning LLMs and prompt engineering. Learn when to use each technique to customize your generative AI models for optimal performance.

- Repository: [Microsoft/generative-ai-for-beginners](https://github.com/microsoft/generative-ai-for-beginners)
- Tags: tutorial
- Published: 2026-02-26

---

**Prompt engineering modifies the input text sent to a base model without changing its weights, while fine-tuning updates the model's parameters through training on curated datasets to create a persistent custom model.**

The microsoft/generative-ai-for-beginners repository covers both approaches in lessons [`04-prompt-engineering-fundamentals/README.md`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/04-prompt-engineering-fundamentals/README.md) and [`18-fine-tuning/README.md`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/18-fine-tuning/README.md). Choosing between **fine-tuning** and **prompt engineering** affects your project's cost, latency, and maintenance burden. The two techniques operate on different layers of the generative AI stack: one shapes the model's input context, while the other produces a new set of weights with the desired behavior built in.

## What is Prompt Engineering?

**Prompt engineering** is the practice of designing and optimizing the text sent to a large language model to elicit desired outputs. As described in [`04-prompt-engineering-fundamentals/README.md`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/04-prompt-engineering-fundamentals/README.md), this approach involves constructing prompts with system messages, few-shot examples, and contextual cues that guide the model's behavior without modifying any underlying parameters.

The only logic added exists in your **prompt-construction code**, which runs immediately before inference. Because the base model weights remain static, you can switch between different prompt strategies instantly by changing the input text.
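
As a minimal sketch of what such prompt-construction code might look like, the snippet below assembles a chat prompt from a system message, optional few-shot pairs, and the user query. The `build_messages` helper and the `ZERO_SHOT` / `FEW_SHOT` dictionaries are hypothetical names chosen for illustration, not code from the repository.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]


def build_messages(
    system: str,
    examples: List[Tuple[str, str]],
    query: str,
) -> List[Message]:
    """Assemble a chat prompt: system message, few-shot pairs, then the query."""
    messages: List[Message] = [{"role": "system", "content": system}]
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": query})
    return messages


# Two hypothetical strategies; switching between them is a data change only.
ZERO_SHOT = {"system": "You are a helpful translation assistant.", "examples": []}
FEW_SHOT = {
    "system": "You are a helpful translation assistant.",
    "examples": [("Translate to Spanish: Hello, how are you?", "Hola, ¿cómo estás?")],
}

query = "Translate to Spanish: The sky is blue."
zero_shot_prompt = build_messages(query=query, **ZERO_SHOT)
few_shot_prompt = build_messages(query=query, **FEW_SHOT)

print(len(zero_shot_prompt), len(few_shot_prompt))  # prints: 2 4
```

Either list can be passed as the `messages` argument of a chat completion call, so trying a different strategy never touches the model.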

### Implementation Example: Few-Shot Prompting

The following example from the repository's prompt engineering lesson demonstrates how to teach a model to translate English to Spanish using in-context examples:

```python
import os

from openai import OpenAI  # requires openai>=1.0 (ChatCompletion was removed in 1.0)

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
model = "gpt-3.5-turbo"

# Few-shot prompt: one worked example teaches the expected translation format
prompt = [
    {"role": "system", "content": "You are a helpful translation assistant."},
    {"role": "user", "content": "Translate to Spanish: Hello, how are you?"},
    {"role": "assistant", "content": "Hola, ¿cómo estás?"},
    {"role": "user", "content": "Translate to Spanish: The sky is blue."},
]

response = client.chat.completions.create(
    model=model,
    messages=prompt,
    temperature=0.2,  # low temperature keeps the translation stable across runs
)

print(response.choices[0].message.content)
# Example output (may vary between runs): "El cielo es azul."
```

This approach requires no data collection beyond crafting the prompt text. The examples act as implicit guidance without changing the model itself, as detailed in the "Prompt Construction" and "Examples" sections of [`04-prompt-engineering-fundamentals/README.md`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/04-prompt-engineering-fundamentals/README.md).

## What is Fine-Tuning?

**Fine-tuning** creates a custom model by updating the base model's weights through additional training on a curated dataset of input-output pairs. As described in [`18-fine-tuning/README.md`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/18-fine-tuning/README.md), this process produces a new model artifact that persists the learned behavior across all subsequent inference calls.

The training loop lives entirely outside the request-response path. After fine-tuning, you deploy the resulting model as its own endpoint, allowing you to send much simpler prompts (often just the user query) while receiving specialized outputs.

### Implementation Example: Using a Fine-Tuned Model

After creating a fine-tuned deployment named `gpt-35-turbo-finetuned` following the Azure tutorial in the fine-tuning lesson, you interact with it as a specialized endpoint:

```python
import os

from openai import AzureOpenAI  # requires openai>=1.0

# The Azure client needs the resource endpoint and an API version in addition to the key
client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_version="2023-05-15",
)

# The fine-tuned deployment behaves as a specialized "recipe assistant"
deployment = "gpt-35-turbo-finetuned"

response = client.chat.completions.create(
    model=deployment,  # on Azure OpenAI, `model` takes the deployment name
    messages=[{"role": "user", "content": "Give me a quick vegan lasagna recipe."}],
    temperature=0.7,
)

print(response.choices[0].message.content)
```

Because the model has already learned the mapping from user query to recipe format during training, the prompt can be concise. The "How can we fine-tune a pre-trained model?" section in [`18-fine-tuning/README.md`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/18-fine-tuning/README.md) explains how this embeds domain knowledge directly into the model weights.

## Core Differences Between Fine-Tuning and Prompt Engineering

Understanding the architectural implications helps determine which approach suits your use case.

**What Changes:**
- **Prompt engineering** only alters the text sent to the model. The underlying weights stay exactly the same.
- **Fine-tuning** updates the model's parameters, producing a new custom model artifact.

**Pipeline Location:**
- **Prompt engineering** happens immediately before inference. The prompt is built, possibly enriched with examples or system messages, then sent to the inference endpoint of a base LLM.
- **Fine-tuning** occurs during a separate training step. After completion, the new model is deployed and used for inference.

**Cost Model:**
- **Prompt engineering** incurs no extra compute cost beyond normal request fees. The trade-off is higher token usage because more context and examples are added to each call.
- **Fine-tuning** requires a one-time compute cost for the training job (GPU/CPU hours) and storage for the fine-tuned model. Afterwards, inference can be cheaper because fewer tokens are needed per request, although some providers price fine-tuned models above the base model or charge for hosting, so the saving is not guaranteed.
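
The cost trade-off can be made concrete with a rough break-even estimate. The sketch below is illustrative only: the function names are hypothetical, every price and token count is made up, and the fine-tuned model is assumed to carry a higher per-token price than the base model.

```python
import math
from typing import Optional


def request_cost(prompt_tokens: int, completion_tokens: int, price_per_1k: float) -> float:
    """Cost of one request, assuming input and output share one per-1k-token price."""
    return (prompt_tokens + completion_tokens) / 1000 * price_per_1k


def break_even_requests(
    training_cost: float, base_cost: float, tuned_cost: float
) -> Optional[int]:
    """Requests needed to recover the training cost; None if there is no saving."""
    saving = base_cost - tuned_cost
    if saving <= 0:
        return None
    return math.ceil(training_cost / saving)


# Hypothetical numbers: a long few-shot prompt on the base model versus
# a short prompt on a fine-tuned model with double the per-token price.
base = request_cost(prompt_tokens=600, completion_tokens=200, price_per_1k=0.002)
tuned = request_cost(prompt_tokens=50, completion_tokens=200, price_per_1k=0.004)

print(break_even_requests(training_cost=20.0, base_cost=base, tuned_cost=tuned))
# prints: 33334
```

If the shorter prompt does not offset the higher per-token price, `break_even_requests` returns `None`, which corresponds to the case where fine-tuning never pays for itself on cost alone.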

**Data Requirements:**
- **Prompt engineering** requires no data collection; you simply craft better prompts.
- **Fine-tuning** requires a training set of input-output pairs, often hundreds to thousands of examples, plus careful data cleaning as noted in [`18-fine-tuning/README.md`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/18-fine-tuning/README.md).
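
To illustrate the data-preparation step, here is a minimal sketch that cleans raw input-output pairs and writes them as JSONL, with one `{"messages": [...]}` record per line, which is the chat-style layout used for fine-tuning chat models on OpenAI and Azure OpenAI. The helper names, the cleaning rules, and the sample recipes are assumptions made for this example; real datasets usually need more thorough review.

```python
import json
import os
import tempfile
from typing import Iterable, List, Tuple


def to_chat_records(pairs: Iterable[Tuple[str, str]], system: str) -> List[dict]:
    """Clean raw (input, output) pairs and wrap them as chat-format records."""
    seen = set()
    records = []
    for user_text, assistant_text in pairs:
        user_text, assistant_text = user_text.strip(), assistant_text.strip()
        if not user_text or not assistant_text:
            continue  # drop incomplete pairs
        if user_text in seen:
            continue  # drop duplicate inputs
        seen.add(user_text)
        records.append({"messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user_text},
            {"role": "assistant", "content": assistant_text},
        ]})
    return records


def write_jsonl(records: List[dict], path: str) -> None:
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


# Made-up sample data: one duplicate and one empty input get filtered out.
raw_pairs = [
    ("Quick vegan lasagna?", "Layer noodles, marinara, and tofu ricotta..."),
    ("Quick vegan lasagna?", "Duplicate input, dropped."),
    ("   ", "Empty input, dropped."),
    ("Easy lentil soup?", "Simmer lentils with onion, carrot, and stock..."),
]
records = to_chat_records(raw_pairs, system="You are a recipe assistant.")
out_path = os.path.join(tempfile.gettempdir(), "train.jsonl")
write_jsonl(records, out_path)

print(len(records))  # prints: 2
```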

**Skill Level:**
- **Prompt engineering** is accessible to most developers and domain experts.
- **Fine-tuning** requires ML engineering expertise, familiarity with hyperparameters, and access to a training environment (GPU, Azure OpenAI, or Hugging Face).

**Portability:**
- **Prompt engineering** code can be moved between providers (OpenAI, Azure, Hugging Face) with minor adjustments.
- **Fine-tuned** artifacts are provider-specific. An Azure OpenAI fine-tuned model differs from an OpenAI-hosted fine-tuned model.

## Architectural Pipeline Comparison

The microsoft/generative-ai-for-beginners repository illustrates two distinct architectural paths for shaping model behavior.

### Prompt Engineering Path

1. The user or application builds a **prompt** that may include system messages, few-shot examples, and cues.
2. The constructed prompt is sent to the **inference endpoint** of a base LLM.
3. The model returns a completion based on the provided context.

This path relies on [`shared/python/api_utils.py`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/shared/python/api_utils.py) and [`shared/python/env_utils.py`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/shared/python/env_utils.py) for reusable request logic and secure API key handling, as seen in the repository's code examples. Because the model itself is unchanged, the behavioral guidance must be resent with every request.

### Fine-Tuning Path

1. Curate a dataset of input-to-desired-output pairs.
2. Launch a **fine-tuning job** (via Azure OpenAI, OpenAI, or Hugging Face) that updates the base model weights.
3. Deploy the newly minted fine-tuned model as its own endpoint.
4. At inference time, send a simplified prompt (often just the user query) to the fine-tuned endpoint.

The training loop exists outside the request-response path. Because fine-tuning modifies the model itself, the resulting behavior is **persistent** across all calls.
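
Step 2 of the path above can be sketched as follows for the OpenAI-hosted case. `client.files.create` and `client.fine_tuning.jobs.create` are the upload and job-creation calls in the `openai` Python SDK (1.0 or later); the `start_fine_tune` wrapper and the `_StubClient` stand-in are hypothetical and exist only so the sketch runs offline. Azure OpenAI and Hugging Face workflows differ in their details.

```python
import os
import tempfile
from types import SimpleNamespace


def start_fine_tune(client, training_path: str, base_model: str) -> str:
    """Upload a JSONL training file and start a fine-tuning job; return the job ID."""
    with open(training_path, "rb") as f:
        uploaded = client.files.create(file=f, purpose="fine-tune")
    job = client.fine_tuning.jobs.create(training_file=uploaded.id, model=base_model)
    return job.id


class _StubClient:
    """Offline stand-in that mimics the two calls used above (not a real API client)."""

    def __init__(self):
        self.files = SimpleNamespace(
            create=lambda file, purpose: SimpleNamespace(id="file-demo")
        )
        self.fine_tuning = SimpleNamespace(
            jobs=SimpleNamespace(
                create=lambda training_file, model: SimpleNamespace(
                    id=f"ftjob-demo-{training_file}"
                )
            )
        )


# Dry run against the stub; for a real job, pass openai.OpenAI() instead.
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as tmp:
    tmp.write('{"messages": []}\n')
job_id = start_fine_tune(_StubClient(), tmp.name, base_model="gpt-3.5-turbo")
os.remove(tmp.name)

print(job_id)  # prints: ftjob-demo-file-demo
```

With a real client, the job runs asynchronously. Once it has finished, the job object retrieved through `client.fine_tuning.jobs.retrieve(job_id)` exposes the new model name in its `fine_tuned_model` field, which is the name you send requests to in step 4.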

### Practical Side-by-Side Comparison

The following snippet from the repository demonstrates latency and output differences between approaches:

```python
import os
import time

from openai import OpenAI  # requires openai>=1.0

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
model_base = "gpt-3.5-turbo"
model_finetuned = "ft:gpt-3.5-turbo-xxxx"   # placeholder ID from OpenAI fine-tuning

prompt = [{"role": "user", "content": "Write a haiku about sunrise."}]

def ask(model):
    """Send the shared prompt to `model`; return (answer, elapsed seconds)."""
    start = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
    resp = client.chat.completions.create(
        model=model,
        messages=prompt,
        temperature=0.5,
    )
    elapsed = time.perf_counter() - start
    return resp.choices[0].message.content, elapsed

base_ans, base_t = ask(model_base)
fine_ans, fine_t = ask(model_finetuned)

print(f"Base model ({base_t:.2f}s): {base_ans}")
print(f"Fine-tuned ({fine_t:.2f}s): {fine_ans}")
```

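Inference latency is typically similar for the two models, although a single timed call per model is only a rough signal. The difference shows up in the output: a model fine-tuned for a specific style tends to produce more on-target results from the same short prompt, where the base model would need extra instructions or examples (and therefore more tokens) to match it. This is the practical trade-off documented in [`18-fine-tuning/README.md`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/18-fine-tuning/README.md).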

## When to Choose Which Approach

Select **prompt engineering** when you need:
- Quick experiments and rapid iteration
- Limited training data or budget constraints
- Flexibility to test many variations without retraining
- Portability across different LLM providers

Select **fine-tuning** when you require:
- Stable, domain-specific behavior for production workloads
- Repeatedly serving the same specialized task at scale
- Reduced token consumption to lower per-request costs
- Persistent model behavior without complex prompt templates

## Summary

- **Prompt engineering** manipulates model inputs (system messages, few-shot examples) while keeping base weights static, making it ideal for rapid prototyping and provider-agnostic solutions.
- **Fine-tuning** updates model parameters through training on curated datasets, creating persistent custom models that require simpler prompts but demand ML expertise and provider-specific deployment.
- **Cost structures differ** significantly: prompt engineering increases per-request token costs, while fine-tuning incurs upfront training compute costs but can reduce long-term inference expenses.
- **Data requirements** favor prompt engineering for low-data scenarios, whereas fine-tuning requires hundreds to thousands of high-quality input-output pairs.
- **Implementation files** in the microsoft/generative-ai-for-beginners repository include [`04-prompt-engineering-fundamentals/README.md`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/04-prompt-engineering-fundamentals/README.md) for input optimization and [`18-fine-tuning/README.md`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/18-fine-tuning/README.md) for model customization workflows.

## Frequently Asked Questions

### Can I use prompt engineering and fine-tuning together?

Yes. You can apply prompt engineering techniques to a fine-tuned model. In fact, the [`18-fine-tuning/README.md`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/18-fine-tuning/README.md) lesson suggests that even after fine-tuning, you may still need to craft specific system messages or prompts to guide the custom model's behavior for edge cases.

### Does fine-tuning reduce API costs compared to prompt engineering?

Often yes, but not always. Fine-tuning requires an upfront compute investment for the training job (GPU hours and storage). Because the fine-tuned model has absorbed the domain during training, you can send shorter prompts with fewer tokens per request, reducing ongoing inference costs at scale according to the cost analysis in the repository. The saving only outweighs the training cost once request volume is high enough, so a low-traffic application may never break even.

### How much data do I need for fine-tuning versus prompt engineering?

Prompt engineering requires no training data; you only need a handful of example prompts to refine your approach. Fine-tuning typically requires hundreds to thousands of curated input-output pairs, plus validation data, as specified in the "What is fine-tuning for language models" section of [`18-fine-tuning/README.md`](https://github.com/microsoft/generative-ai-for-beginners/blob/main/18-fine-tuning/README.md).

### Is fine-tuning possible with any LLM provider?

No. Fine-tuning availability depends on the provider and the specific model. The microsoft/generative-ai-for-beginners repository covers fine-tuning workflows for Azure OpenAI, OpenAI, and Hugging Face, but not all base models support fine-tuning. The resulting artifacts are also provider-specific and cannot be transferred between platforms.