Advanced Prompt Engineering Techniques for Better Generative AI Results
Advanced prompt engineering techniques combine system messages, few-shot examples, chain-of-thought reasoning, and iterative refinement to turn vague queries into more consistent, higher-quality LLM outputs.
Advanced prompt engineering techniques are essential for extracting reliable, production-grade results from large language models. The microsoft/generative-ai-for-beginners curriculum provides a comprehensive framework that layers foundational concepts with sophisticated strategies to control model behavior. By structuring prompts with specific architectural components and applying targeted methods, developers can significantly improve response accuracy and consistency.
Foundational Prompt Architecture
According to the curriculum in 04-prompt-engineering-fundamentals/README.md, effective prompts rely on five architectural layers that work together to guide model behavior.
- System message: Defines the role, tone, and constraints for the assistant. This provides global guidance that persists throughout the interaction (lines 31-33).
- User message: Contains the actual task or question the model must answer.
- Primary content: Supplies the factual basis or data the model should operate on, such as passages to summarize or analyze (lines 31-35).
- Secondary content: Includes extra instructions, format specifications, or parameters like temperature and length constraints (lines 90-94).
- Template variables: Placeholders such as {{variable}} that enable reusable prompt libraries by allowing runtime substitution (lines 25-34 in 05-advanced-prompts/README.md).
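The five layers listed above could be assembled into a chat request roughly as follows. This is a minimal sketch rather than code from the curriculum: the `build_messages` helper, its parameter names, the "Content:" and "Instructions:" labels, and the plain `str.replace` substitution of `{{name}}` placeholders are all assumptions made for illustration.

```python
def build_messages(system, task, primary="", secondary="", variables=None):
    """Assemble prompt layers into a chat-style message list (illustrative helper)."""
    # Template variables: replace each {{name}} placeholder with its runtime value.
    def fill(text):
        for name, value in (variables or {}).items():
            text = text.replace("{{" + name + "}}", str(value))
        return text

    # User message = task, then primary content (data), then secondary content (format rules).
    parts = [fill(task)]
    if primary:
        parts.append("Content:\n" + fill(primary))
    if secondary:
        parts.append("Instructions:\n" + fill(secondary))

    return [
        {"role": "system", "content": fill(system)},
        {"role": "user", "content": "\n\n".join(parts)},
    ]


messages = build_messages(
    system="You are a concise assistant for {{audience}}.",
    task="Summarize the passage below.",
    primary="Jupiter is the fifth planet from the Sun and the largest in the Solar System.",
    secondary="Answer in one sentence.",
    variables={"audience": "middle-school students"},
)
for message in messages:
    print(message["role"], "->", message["content"])
```

Keeping the layers as separate arguments makes it easy to swap the data or the format rules without touching the system message.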
Advanced Prompt Engineering Techniques
The repository catalogs several sophisticated methods in 05-advanced-prompts/README.md to enhance model performance across diverse tasks.
Chain-of-Thought Prompting
Chain-of-thought (CoT) prompting makes the model reason step-by-step before answering. This technique is particularly effective for arithmetic, logic, and complex reasoning tasks.
In 05-advanced-prompts/README.md (lines 92-106), the curriculum demonstrates how showing intermediate calculations within examples helps the model replicate the reasoning pattern. Instead of asking for the final answer directly, prompt the model to "think step by step" or provide examples that include the work.
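One way to show the work inside an example is sketched below. The worked example text, the `build_cot_prompt` and `extract_answer` helpers, and the "Answer:" line convention are illustrative assumptions, not the curriculum's code. The reply being parsed is hand-written so the sketch runs without a model call.

```python
import re

# One worked example that shows the intermediate arithmetic, not just the result.
WORKED_EXAMPLE = (
    "Q: Lisa has 7 apples, throws 1 away, and gives 4 to Bart. Bart gives 1 back. "
    "How many apples does Lisa have?\n"
    "A: 7 - 1 = 6. 6 - 4 = 2. 2 + 1 = 3.\n"
    "Answer: 3"
)


def build_cot_prompt(question):
    """Prefix the question with a worked example and request an 'Answer:' line."""
    return (
        f"{WORKED_EXAMPLE}\n\n"
        f"Q: {question}\n"
        "A: Think step by step, then finish with a line of the form 'Answer: <number>'."
    )


def extract_answer(reply):
    """Return the number on the last 'Answer:' line, or None if there is none."""
    matches = re.findall(r"^Answer:\s*(-?\d+(?:\.\d+)?)\s*$", reply, flags=re.MULTILINE)
    return float(matches[-1]) if matches else None


prompt = build_cot_prompt(
    "Alice has 12 apples. She gives 5 to Bob, then buys 3 more. How many apples does she have?"
)
print(prompt)

# A hand-written reply standing in for model output, used only to exercise the parser.
sample_reply = "12 - 5 = 7. 7 + 3 = 10.\nAnswer: 10"
print(extract_answer(sample_reply))
```

Asking for a fixed final line keeps the reasoning visible while still giving calling code a single value to read.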
Few-Shot Prompting
Few-shot prompting supplies the model with 2-3 examples to steer style, format, or domain expertise. As documented in lines 78-85 of 05-advanced-prompts/README.md, providing sample inputs and outputs before the actual request biases the model toward the desired response structure.
This technique proves especially valuable when you need specific output formats like JSON or when adapting the model to specialized domains such as legal analysis or creative writing.
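As a rough sketch of steering toward JSON, the examples can be sent as earlier user/assistant turns, and the reply can be checked against the shape those examples established. The example data, the `few_shot_messages` and `matches_example_shape` helpers, and the system message wording are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical labelled examples; each pairs an input with the exact JSON we want back.
EXAMPLES = [
    ("The battery died after two days.", {"sentiment": "negative", "topic": "battery"}),
    ("Setup took thirty seconds, love it.", {"sentiment": "positive", "topic": "setup"}),
]


def few_shot_messages(query, examples=EXAMPLES):
    """Encode each example as a user/assistant turn pair, then append the real query."""
    messages = [{"role": "system", "content": "Reply with a single JSON object and nothing else."}]
    for text, label in examples:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": json.dumps(label)})
    messages.append({"role": "user", "content": query})
    return messages


def matches_example_shape(reply, examples=EXAMPLES):
    """Check that a reply parses as JSON and has the same keys as the examples."""
    try:
        parsed = json.loads(reply)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and set(parsed) == set(examples[0][1])


messages = few_shot_messages("The screen scratches far too easily.")
print(len(messages))
print(matches_example_shape('{"sentiment": "negative", "topic": "screen"}'))
```

Validating the reply against the example keys gives a cheap signal for when to retry or fall back.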
Self-Refine and Maieutic Prompting
Self-refine prompting implements a critique-then-revise loop. The model first generates an answer, then receives instructions to critique and improve its own output (lines 1-4 under the Self-refine heading in 05-advanced-prompts/README.md).
Maieutic prompting extends this concept by forcing the model to explain each part of its answer before finalizing. This Socratic method, described in lines 70-74, reduces hallucinations by requiring explicit justification for each claim.
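A maieutic follow-up might be built along these lines. This is a sketch under stated assumptions: the `maieutic_followup` helper, its prompt wording, and the naive sentence split are illustrative and do not come from the repository.

```python
def maieutic_followup(question, answer):
    """Build a follow-up prompt that asks for a justification of every claim."""
    # Naive split on '. ' to approximate one claim per sentence (a simplification).
    claims = [part.strip().rstrip(".") for part in answer.split(". ") if part.strip()]
    numbered = "\n".join(f"{i}. {claim}." for i, claim in enumerate(claims, start=1))
    return (
        f"Question: {question}\n\n"
        f"Your previous answer made these claims:\n{numbered}\n\n"
        "For each numbered claim, explain why it is true. "
        "If you cannot justify a claim, say so and remove it. "
        "Then give a final answer that keeps only the justified claims."
    )


followup = maieutic_followup(
    "Why does water boil at a lower temperature on a mountain?",
    "Air pressure is lower at altitude. Lower pressure reduces the boiling point.",
)
print(followup)
```

The follow-up would be sent as a new user turn after the model's first answer, so the model sees its own claims itemized.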
Generated Knowledge with Templates
Generated knowledge prompting injects external structured data into the prompt using template variables. The repository shows how to use placeholders like {{company}} and {{budget}} in lines 25-34 of 05-advanced-prompts/README.md, substituting them at runtime with domain-specific catalogs or databases.
This approach grounds the model in factual context rather than relying solely on parametric knowledge.
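Runtime substitution of `{{name}}` placeholders can be sketched with a small regular-expression renderer. The `render` helper, the template text, and the sample values below are assumptions for illustration. A templating library would be a reasonable alternative in a real project.

```python
import re

# Matches {{name}} with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, values):
    """Replace each {{name}} with values[name]; raise KeyError if a name is missing."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, template)


template = (
    "Insurance company: {{company}}\n"
    "Products:\n{{products}}\n"
    "Budget: {{budget}}\n"
    "Suggest products that fit within the budget."
)
prompt = render(
    template,
    {
        "company": "ACME",
        "products": "- Car, $500/mo\n- Home, $600/mo\n- Life, $100/mo",
        "budget": "$1000",
    },
)
print(prompt)
```

Failing loudly on a missing value avoids sending a prompt that still contains a raw placeholder.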
Least-to-Most Prompting
Least-to-most prompting breaks complex problems into sequences of increasingly detailed subtasks. As illustrated in lines 91-99 of 05-advanced-prompts/README.md, you first ask for a high-level overview (such as "How to perform data science in 5 steps?"), then expand each step with follow-up prompts.
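The overview-then-expand flow can be sketched with a model call passed in as a function. The `least_to_most` helper and its prompt wording are hypothetical, and `fake_ask` is a stand-in that returns canned text so the sketch runs offline. A real implementation would pass a function that calls a model.

```python
import re


def least_to_most(question, ask):
    """Ask for a numbered outline, then expand each step with a follow-up prompt.

    `ask` is any callable that takes a prompt string and returns the reply text.
    """
    outline = ask(f"{question}\nAnswer with a short numbered list of steps only.")
    # Pull the step text out of lines such as '1. Collect data' or '2) Clean data'.
    steps = re.findall(r"^\s*\d+[.)]\s*(.+?)\s*$", outline, flags=re.MULTILINE)
    details = {}
    for step in steps:
        details[step] = ask(
            f"Overall question: {question}\nExplain this step in more detail: {step}"
        )
    return details


# Stand-in for a model call so the sketch runs offline; replace with a real client.
def fake_ask(prompt):
    if "numbered list" in prompt:
        return "1. Collect data\n2. Clean data\n3. Analyze data\n4. Plot data\n5. Present data"
    return "(details for: " + prompt.rsplit(": ", 1)[-1] + ")"


result = least_to_most("How to perform data science in 5 steps?", fake_ask)
for step, detail in result.items():
    print(step, "->", detail)
```

Each follow-up repeats the overall question so the model expands the step in context rather than in isolation.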
Temperature and Sampling Controls
Temperature adjusts output randomness: values near 0 make output close to deterministic, and values near 1 make it noticeably more varied (the OpenAI API accepts values from 0 to 2). The curriculum in 05-advanced-prompts/README.md (lines 31-38) recommends setting temperature to 0.0 or 0.1 for repeatable code generation, and raising it to 0.9 for creative tasks that benefit from varied output.
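To see why low temperatures give repeatable output, it helps to look at the textbook softmax-with-temperature formula, in which each token score is divided by the temperature before normalization. The sketch below uses made-up scores for three candidate tokens. It illustrates the general idea only and makes no claim about how any particular provider implements sampling.

```python
import math


def token_probabilities(logits, temperature):
    """Convert raw scores to sampling probabilities at a given temperature."""
    if temperature == 0:
        # Limit case: all probability mass goes to the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0, 0.1, 0.7, 1.0):
    print(t, [round(p, 3) for p in token_probabilities(logits, t)])
```

As the temperature falls, probability concentrates on the top-scoring token, so repeated calls tend to pick the same continuation.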
Prompt Cues and Output Guards
Prompt cues are short prefixes that nudge the model toward required formats. The repository documents this pattern in 04-prompt-engineering-fundamentals/README.md (lines 70-78), showing how placing "Summarize This" before the target content improves instruction following.
Output guards provide fallback answers when the model cannot satisfy requests, reducing hallucinations. Line 40 of 04-prompt-engineering-fundamentals/README.md demonstrates adding instructions like "If you cannot generate a valid list, respond with 'No suitable items found.'"
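A cue and an output guard can be combined in one prompt, with the fallback text detected in code afterward. The `guarded_prompt` and `is_fallback` helpers and the exact guard wording below are assumptions for illustration, not the repository's implementation.

```python
FALLBACK = "No suitable items found."


def guarded_prompt(cue, content):
    """Place a short cue before the content and append an output guard after it."""
    return (
        f"{cue}:\n"
        f"{content}\n\n"
        f"If you cannot complete the request from the text above, respond with '{FALLBACK}'"
    )


def is_fallback(reply):
    """Detect the guard's fallback so callers can branch instead of parsing the reply."""
    return reply.strip().strip("'\"") == FALLBACK


prompt = guarded_prompt(
    "Summarize This",
    "Jupiter is the fifth planet from the Sun and the largest in the Solar System.",
)
print(prompt)
print(is_fallback("'No suitable items found.'"))
print(is_fallback("Jupiter is the largest planet."))
```

Using one fixed fallback string makes the "no answer" case easy to test for in application code.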
Implementation Examples
The repository includes runnable Python examples demonstrating these techniques with the OpenAI SDK.
Basic Prompt Structure
This example shows the system and user message pattern from 04-prompt-engineering-fundamentals/README.md:
import os
from openai import OpenAI

# Requires openai>=1.0; the API key is read from the OPENAI_API_KEY environment variable.
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def chat(messages, temperature=0.7):
    """Send a list of chat messages and return the assistant's reply text."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        temperature=temperature,
    )
    return response.choices[0].message.content

system = {"role": "system", "content": "You are a concise technical writer."}
user = {"role": "user", "content": "Explain chain-of-thought prompting in two sentences."}
print(chat([system, user]))
Few-Shot with JSON Output
This implementation from the curriculum demonstrates few-shot prompting for structured output:
examples = """
Input: Budget $900, restrict to Car only → Output: {"product":"Car","cost":500}
Input: Budget $1200, restrict to Home & Life → Output: {"product":"Home","cost":600,"extra":"Life","extraCost":100}
"""
instruction = "Given a budget of $1000 and allow only Car and Home, return the cheapest valid combination as JSON."
messages = [
    {"role": "system", "content": "You are a finance assistant that always returns valid JSON."},
    {"role": "user", "content": f"{examples}\n---\n{instruction}"},
]
print(chat(messages, temperature=0.0))
Chain-of-Thought Implementation
prompt = """Solve step-by-step:
Alice has 12 apples. She gives 5 to Bob, then buys 3 more.
How many apples does Alice have now?"""
messages = [
    {"role": "system", "content": "You are a logical reasoner."},
    {"role": "user", "content": prompt},
]
print(chat(messages, temperature=0.0))
Self-Refine Loop
This pattern implements the critique-then-revise cycle documented in 05-advanced-prompts/README.md:
def self_refine(initial_prompt, max_iters=2):
    """Generate an answer, then repeatedly ask the model to critique and rewrite it."""
    task = {"role": "user", "content": initial_prompt}
    critique = "Identify any factual errors or missing steps, then rewrite the answer."
    # Initial answer generation
    answer = chat([task])
    for _ in range(max_iters):
        # Resend the original task so the critique is made with the question in context
        answer = chat([
            task,
            {"role": "assistant", "content": answer},
            {"role": "user", "content": critique},
        ])
    return answer

question = "Write a short Python function that returns the Fibonacci series up to n."
print(self_refine(question))
Template Variable Substitution
Using Python's string.Template for generated knowledge prompting:
from string import Template
tpl = Template(
"""Insurance company: $company
Products:
$products
Budget: $budget
Restrict to: $allowed
Return JSON with the cheapest combination under budget."""
)
filled = tpl.substitute(
    company="ACME",
    products="- Car, $500/mo\n- Home, $600/mo\n- Life, $100/mo",
    budget="$1000",
    allowed="Car, Home",
)
messages = [
    {"role": "system", "content": "You are a finance assistant."},
    {"role": "user", "content": filled},
]
print(chat(messages, temperature=0.0))
Key Source Files
The advanced prompt engineering techniques are implemented and documented across these repository locations:
- 04-prompt-engineering-fundamentals/README.md: Core concepts including system messages, primary/secondary content, prompt cues, and output guards.
- 05-advanced-prompts/README.md: Comprehensive catalog of zero-shot, few-shot, chain-of-thought, self-refine, maieutic prompting, and temperature controls.
- 06-text-generation-apps/python/aoai-app.py: Working Azure OpenAI chat implementation.
- 06-text-generation-apps/typescript/recipe-app/src/app.ts: TypeScript implementation of message passing and temperature controls.
Summary
- Layer your prompts using system messages for global context, primary content for data, and secondary content for formatting instructions.
- Use few-shot examples to steer output format and style, particularly for structured outputs like JSON.
- Implement chain-of-thought prompting for complex reasoning tasks by requesting step-by-step explanations.
- Apply self-refine loops to iteratively improve responses through critique and revision cycles.
- Control determinism with temperature settings: use 0.0 for code generation and higher values for creative tasks.
- Utilize template variables to inject external knowledge and create reusable prompt libraries.
- Add output guards to provide fallback responses and reduce hallucinations when the model lacks sufficient information.
Frequently Asked Questions
What is the difference between zero-shot and few-shot prompting?
Zero-shot prompting relies entirely on the model's pre-trained knowledge without providing examples, suitable for straightforward questions the model likely encountered during training. Few-shot prompting, as documented in 05-advanced-prompts/README.md (lines 78-85), supplies 2-3 input-output examples before the actual query, which significantly improves performance on specialized formats or domains by showing the model exactly how to structure its response.
How does chain-of-thought prompting improve model accuracy?
Chain-of-thought prompting improves accuracy by forcing the model to generate intermediate reasoning steps before providing the final answer. According to the curriculum in 05-advanced-prompts/README.md (lines 92-106), this technique is particularly effective for arithmetic, logic puzzles, and multi-step reasoning tasks because it surfaces the model's reasoning process, making errors in logic visible and allowing the model to self-correct during generation.
When should I use self-refine prompting versus single-pass generation?
Use self-refine prompting when accuracy and completeness are critical, such as technical documentation, complex code generation, or factual summaries. As implemented in 05-advanced-prompts/README.md, this technique runs the model through multiple iterations where it critiques and revises its own output, catching errors that single-pass generation might miss. Single-pass generation remains appropriate for simple queries or low-latency applications where speed outweighs perfection.
What temperature setting should I use for deterministic outputs?
Set temperature to 0.0 for the most repeatable outputs where consistency is paramount, such as generating code, JSON structures, or factual extractions. Even at 0.0, hosted models can occasionally return slightly different outputs, so treat the result as near-deterministic rather than guaranteed. The repository's guidance in 05-advanced-prompts/README.md (lines 31-38) indicates that temperatures between 0.0 and 0.1 produce nearly identical outputs across repeated calls, while values above 0.7 introduce creative variation suitable for brainstorming, marketing copy, or artistic content.