
When building enterprise-grade LLM applications, prompt management and observability are two critical but frequently overlooked concerns. AskTable uses Langfuse to implement full prompt lifecycle management and LLM call tracing, making the AI application genuinely maintainable and optimizable.
Many teams hardcode prompts directly in application code when developing LLM applications:
# ❌ Not recommended
system_prompt = """
You are a helpful SQL assistant.
Given the following database schema:
{schema}
Generate SQL query for: {question}
"""

response = openai.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": system_prompt.format(schema=schema)},
        {"role": "user", "content": question},
    ],
)
Problems with this approach: changing the prompt requires a code change and redeploy, there is no version history or rollback, and it is impossible to tell afterwards which prompt produced a given output.
AskTable uses Langfuse to centralize prompt management and fully trace every LLM call:
# ✅ Recommended
from app.atserver.ai.llmops import get_prompt

# Fetch the prompt from Langfuse
prompt_client = get_prompt("agent/db_agent")
system_prompt = prompt_client.compile(meta=db_meta)

# Use the Langfuse-wrapped OpenAI client
from langfuse.openai import openai

response = await openai.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ],
)
Advantages: the prompt lives in Langfuse and can be edited, versioned, and rolled back without touching code, and every call is automatically traced.
The diagram above shows the complete prompt iteration flow: create a new version on the Langfuse platform, canary-test it via the label mechanism, promote it to production once validated, and roll back with one click if problems appear. The whole process requires no code changes and no service restart.
AskTable implements a prompt proxy layer that can load prompts either from Langfuse or from local files:
class PromptProxy:
    prompt_cache: dict[str, PromptClient] | None = None
    _from_local: bool

    def __init__(self, from_local: bool):
        self._from_local = from_local
        if not self._from_local:
            # Fetch prompts from Langfuse at runtime
            self.prompt_cache = None
        else:
            # Load prompts from the local assets/prompts.json
            with open("assets/prompts.json") as f:
                prompts = json.load(f)
            prompts_typed = [Prompt_Text(**prompt) for prompt in prompts]
            self.prompt_cache = {
                prompt.name: TextPromptClient(prompt) for prompt in prompts_typed
            }

    def get_prompt(self, prompt_name: str) -> PromptClient:
        if not self._from_local:
            return langfuse.get_prompt(prompt_name, label=config.observer_prompt_label)
        else:
            assert self.prompt_cache is not None, "prompt cache is not initialized"
            if prompt_name in self.prompt_cache:
                return self.prompt_cache[prompt_name]
            else:
                raise errors.NotFound

    def list_prompts(self) -> list[str]:
        if not self._from_local:
            all_prompts = get_all_prompts()
            return [prompt.name for prompt in all_prompts]
        else:
            assert self.prompt_cache is not None, "prompt cache is not initialized"
            return list(self.prompt_cache.keys())

# Global singleton
_prompt_proxy = PromptProxy(config.at_prompt_local)

def get_prompt(prompt_name: str) -> PromptClient:
    return _prompt_proxy.get_prompt(prompt_name)
Design highlights:
Online mode (production):
from_local = False
# Fetch prompts from Langfuse in real time
# Advantage: prompts can be edited online without restarting the service
Offline mode (development/testing):
from_local = True
# Load prompts from a local JSON file
# Advantage: no dependency on external services, faster development
langfuse.get_prompt(prompt_name, label=config.observer_prompt_label)
What labels do:
- production, staging, development: environment labels
- variant_a, variant_b: A/B test labels
Example scenarios:
# Production uses the stable version
config.observer_prompt_label = "production"

# Testing uses the newest version
config.observer_prompt_label = "latest"

# A/B testing
if user_id % 2 == 0:
    label = "variant_a"
else:
    label = "variant_b"
Langfuse supports Jinja2 template syntax, so variables can be injected dynamically:
# Usage inside DBAgent
system_prompt = get_prompt("agent/db_agent").compile(meta=self.db_meta)
Example prompt template:
You are a database assistant. You have access to the following database schema:
{% for schema in meta.schemas %}
Schema: {{ schema.name }}
Tables:
{% for table in schema.tables %}
- {{ table.name }}
{% endfor %}
{% endfor %}
Your task is to help users query this database using natural language.
Available tools:
- show_table: Get detailed information about specific tables
- search_metadata: Search for relevant tables and fields
- execute_sql: Execute SQL queries
Guidelines:
1. First use search_metadata to find relevant tables
2. Then use show_table to get detailed field information
3. Finally generate and execute SQL query
4. Always explain your reasoning
Compiled result:
You are a database assistant. You have access to the following database schema:
Schema: public
Tables:
- users
- orders
- products
Schema: analytics
Tables:
- sales_summary
- user_behavior
Your task is to help users query this database using natural language.
...
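The compile step can be reproduced with plain Jinja2. This is a sketch: `meta` here is a hand-built dict standing in for AskTable's `db_meta`, and the actual `PromptClient.compile` wiring may differ.

```python
from jinja2 import Template

# Hypothetical metadata shaped like the db_meta used above
meta = {
    "schemas": [
        {"name": "public", "tables": [{"name": "users"}, {"name": "orders"}]},
    ]
}

# Same loop structure as the prompt template; Jinja2 resolves
# dotted access (schema.name) against dict keys as well
template = Template(
    "{% for schema in meta.schemas %}"
    "Schema: {{ schema.name }}\n"
    "Tables:\n"
    "{% for table in schema.tables %}- {{ table.name }}\n{% endfor %}"
    "{% endfor %}"
)
rendered = template.render(meta=meta)
```

Rendering produces the same "Schema: ... / Tables: ..." layout shown in the compiled result above.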
Advantages: templates stay readable, variables are injected at runtime, and the same prompt adapts to any database schema.
AskTable uses the Langfuse-wrapped OpenAI client, which automatically traces every LLM call:
from langfuse.openai import openai

class AsyncLLMClient:
    @property
    def client(self) -> openai.AsyncOpenAI:
        client, _ = get_current_llm()
        return client

    async def create_completion(
        self,
        messages: list[dict],
        model: str = "gpt-4o-mini",
        tools: list[dict] | None = None,
        response_format: dict | None = None,
        temperature: float = 0,
        prompt_client: PromptClient | None = None,
        parallel_tool_calls: bool = False,
    ) -> ChatCompletion:
        kwargs = get_completion_kwargs(
            messages=messages,
            model=model,
            temperature=temperature,
            response_format=response_format,
            tools=tools,
            parallel_tool_calls=parallel_tool_calls,
            prompt_client=prompt_client,
        )
        _s = time.perf_counter()
        response = await self.client.chat.completions.create(**kwargs)
        log.debug(f"LLM response: {response}")
        handle_completion_response(response)
        # Accumulate total LLM call time for this request
        if at_llm_total_duration_var.get(None) is not None:
            acc = at_llm_total_duration_var.get()
            at_llm_total_duration_var.set(acc + time.perf_counter() - _s)
        return response
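The duration accounting at the end of `create_completion` can be sketched in isolation with `contextvars`. The names below are illustrative, not AskTable's actual module:

```python
import contextvars
import time

# Per-request accumulator, analogous to at_llm_total_duration_var
llm_total_duration: contextvars.ContextVar = contextvars.ContextVar("llm_total_duration")

def timed(fn, *args, **kwargs):
    """Run fn and add its wall-clock time to the current context's total."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    # Only accumulate if a request context initialized the counter
    if llm_total_duration.get(None) is not None:
        llm_total_duration.set(llm_total_duration.get() + time.perf_counter() - start)
    return result

llm_total_duration.set(0.0)
timed(time.sleep, 0.01)
timed(time.sleep, 0.01)
total = llm_total_duration.get()  # roughly 0.02 seconds
```

Because `ContextVar` values are isolated per async task, concurrent requests each accumulate their own total.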
Automatically traced information: the model, input and output messages, token usage, latency, and cost, all linked to the prompt version used.
Initial version (70% accuracy):
Generate SQL query for: {{ question }}
Database schema:
{{ schema }}
Optimized version 1 (75% accuracy):
You are an expert SQL developer. Generate a SQL query to answer the following question:
Question: {{ question }}
Database schema:
{{ schema }}
Requirements:
- Use proper JOIN syntax
- Add appropriate WHERE clauses
- Format the output clearly
Optimized version 2 (85% accuracy):
You are an expert SQL developer. Follow these steps to generate a SQL query:
1. Analyze the question: {{ question }}
2. Identify required tables and fields from the schema below
3. Determine the relationships between tables
4. Generate the SQL query
Database schema:
{{ schema }}
Requirements:
- Use explicit JOIN syntax (INNER JOIN, LEFT JOIN, etc.)
- Add appropriate WHERE clauses for filtering
- Use meaningful aliases for tables
- Add comments to explain complex logic
- Format the output for readability
Example:
Question: "Show me total sales by region"
SQL:
-- Calculate total sales grouped by region
SELECT
    r.region_name,
    SUM(s.amount) AS total_sales
FROM sales s
INNER JOIN regions r ON s.region_id = r.id
GROUP BY r.region_name
ORDER BY total_sales DESC;
Iteration flow: create the new version in Langfuse under the testing label, validate it, then move the production label to the new version. No code changes, no service restart!
Goal: compare the effectiveness of two prompt styles
Variant A (concise style):
Generate SQL for: {{ question }}
Schema: {{ schema }}
Variant B (detailed style):
You are an expert SQL developer...
[Detailed guidance and examples]
Implementation:
# Split traffic by user ID
if hash(user_id) % 2 == 0:
    label = "variant_a"
else:
    label = "variant_b"
prompt_client = langfuse.get_prompt("agent/db_agent", label=label)
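One caveat: Python's built-in hash() is randomized per process for strings, so the split above would reassign users after every restart. A stable split can use hashlib instead (a sketch; the function name is ours):

```python
import hashlib

def ab_bucket(user_id: str, variants=("variant_a", "variant_b")) -> str:
    """Deterministically map a user to a variant, stable across restarts."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return variants[digest[0] % len(variants)]

label = ab_bucket("user_42")
```

The same user always lands in the same variant, which keeps per-variant metrics in Langfuse consistent over time.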
Compare in Langfuse: accuracy, latency, and token cost for each variant.
Decision: promote the winning variant to production.
Problem: a prompt's token consumption is too high
Analysis:
Inspect the prompt's token distribution in Langfuse.
Optimization:
# Before: detailed information for every field
{% for table in schema.tables %}
Table: {{ table.name }}
Description: {{ table.description }}
Fields:
{% for field in table.fields %}
- {{ field.name }}: {{ field.type }} - {{ field.description }}
  Sample values: {{ field.sample_values }}
  Statistics: {{ field.statistics }}
{% endfor %}
{% endfor %}

# After: only the essentials
{% for table in schema.tables %}
Table: {{ table.name }}
Fields: {{ table.fields | map(attribute='name') | join(', ') }}
{% endfor %}
Use show_table tool to get detailed information when needed.
Result: prompt tokens drop sharply while accuracy holds, since field details are now fetched on demand.
Problem: a user reports that the SQL generated for a query is wrong
Debugging flow: locate the user's trace in Langfuse, then inspect the prompt version, model input, and output for that call.
Finding:
# The trace shows
User question: "上个月的销售额" (last month's sales)
Prompt version: v1.2.3
LLM output: "SELECT SUM(amount) FROM sales WHERE date > '2024-02-01'"
Error: the generated SQL ignores the end date of "last month"
Fix:
# Add time-handling guidance to the prompt
When handling time-related queries:
- "上个月" means the previous calendar month
- Use BETWEEN for date ranges
- Example: WHERE date BETWEEN '2024-02-01' AND '2024-02-29'
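Prompt guidance aside, the "previous calendar month" range the model should produce is easy to pin down in plain Python (a sketch, stdlib only):

```python
from datetime import date, timedelta

def previous_month_range(today: date) -> tuple[date, date]:
    """Return the first and last day of the calendar month before `today`."""
    first_of_this_month = today.replace(day=1)
    # The day before the 1st is always the last day of the previous month
    last_of_prev = first_of_this_month - timedelta(days=1)
    return last_of_prev.replace(day=1), last_of_prev

start, end = previous_month_range(date(2024, 3, 15))
# → 2024-02-01 .. 2024-02-29 (2024 is a leap year)
```

This is why the prompt's BETWEEN example ends on 2024-02-29 rather than a fixed day-30 or day-31.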
Verification:
Create a new version in Langfuse and compare results before and after the fix.
Performance metrics: latency, token usage, cost per call.
Quality metrics: SQL accuracy, error rate.
Business metrics: query volume, user satisfaction.
Complete call-chain tracing:
Trace: user_query_12345
├─ Span: plan_generation (2.3s, $0.02)
│ ├─ LLM Call: gpt-4o-mini (1.8s, $0.015)
│ └─ Tool Call: search_metadata (0.5s)
├─ Span: sql_generation (3.1s, $0.03)
│ ├─ Tool Call: show_table (0.3s)
│ ├─ LLM Call: gpt-4o (2.5s, $0.025)
│ └─ Tool Call: execute_sql (0.3s)
└─ Span: result_formatting (0.5s, $0.005)
└─ LLM Call: gpt-4o-mini (0.5s, $0.005)
Total: 5.9s, $0.055
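Given such a trace, the bottleneck and cost breakdown fall out of a few lines of aggregation (a sketch over the span numbers in the trace above):

```python
# Span-level durations and costs, taken from the trace above
spans = [
    {"name": "plan_generation", "seconds": 2.3, "cost": 0.02},
    {"name": "sql_generation", "seconds": 3.1, "cost": 0.03},
    {"name": "result_formatting", "seconds": 0.5, "cost": 0.005},
]

# The slowest span is the optimization target
slowest = max(spans, key=lambda s: s["seconds"])["name"]
# Total cost attributed across stages
total_cost = round(sum(s["cost"] for s in spans), 3)
```

Here sql_generation dominates the latency, so that is where prompt or model changes pay off first.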
Analysis value: the slowest span stands out at a glance, cost can be attributed to each stage, and redundant tool calls become visible.
# Alert rules configured in Langfuse (illustrative)
if avg_latency > 5.0:  # average latency above 5 seconds
    send_alert("High latency detected")
if error_rate > 0.1:  # error rate above 10%
    send_alert("High error rate detected")
if daily_cost > 1000:  # daily cost above $1000
    send_alert("High cost detected")
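The same threshold logic can be expressed as a small reusable checker (a sketch; Langfuse's own alerting configuration is separate from this):

```python
def check_alerts(metrics: dict, thresholds: dict) -> list:
    """Return an alert message for every metric that exceeds its threshold."""
    return [
        f"{name} exceeded: {metrics[name]} > {limit}"
        for name, limit in thresholds.items()
        if metrics.get(name, 0.0) > limit
    ]

alerts = check_alerts(
    {"avg_latency": 6.2, "error_rate": 0.05, "daily_cost": 400},
    {"avg_latency": 5.0, "error_rate": 0.1, "daily_cost": 1000},
)
# only avg_latency fires here
```

Keeping thresholds in a dict makes it trivial to tune them per environment without touching the checking logic.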
Naming convention:
agent/db_agent/v1.0.0  # major.minor.patch
agent/db_agent/v1.1.0  # new feature
agent/db_agent/v1.1.1  # bug fix
Label strategy:
- production: stable version for the production environment
- staging: test version for the pre-release environment
- latest: newest development version
- variant_a, variant_b: A/B test versions
A clear role definition:
You are an expert SQL developer with 10 years of experience.
Your task is to generate accurate and efficient SQL queries.
An explicit task description:
Given a natural language question and database schema,
generate a SQL query that answers the question.
Concrete requirements and constraints:
Requirements:
- Use explicit JOIN syntax
- Add appropriate indexes hints if needed
- Limit results to 1000 rows by default
- Handle NULL values properly
Examples and templates:
Example:
Question: "Show me top 10 customers by revenue"
SQL:
SELECT
    c.customer_name,
    SUM(o.amount) AS total_revenue
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
GROUP BY c.customer_name
ORDER BY total_revenue DESC
LIMIT 10;
Use the right model for the task:
# Use a small model for simple tasks
if task_complexity == "simple":
    model = "gpt-4o-mini"  # cheap
else:
    model = "gpt-4o"  # accurate

# Or use a cascade strategy
try:
    result = await llm_client.create_completion(model="gpt-4o-mini", ...)
    if not validate(result):
        result = await llm_client.create_completion(model="gpt-4o", ...)
except Exception:
    result = await llm_client.create_completion(model="gpt-4o", ...)
Optimize prompt length:
# Fetch information on demand via tool calls instead of passing everything up front
Initial context: {{ summary }}
Use show_table tool to get detailed information when needed.
Caching strategy:
# Cache results for identical questions
cache_key = hash(question + schema_version)
if cache_key in cache:
    return cache[cache_key]
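A minimal version of that cache, keyed on a stable digest of the question plus the schema version (a sketch; the class and method names are ours, not AskTable's):

```python
import hashlib

class QueryCache:
    """In-memory cache for SQL-generation results."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(question: str, schema_version: str) -> str:
        # A content digest is stable across processes, unlike built-in hash()
        return hashlib.sha256(f"{question}|{schema_version}".encode()).hexdigest()

    def get(self, question: str, schema_version: str):
        return self._store.get(self._key(question, schema_version))

    def put(self, question: str, schema_version: str, result: str) -> None:
        self._store[self._key(question, schema_version)] = result

cache = QueryCache()
cache.put("total sales", "v1", "SELECT SUM(amount) FROM sales;")
hit = cache.get("total sales", "v1")    # the cached SQL
miss = cache.get("total sales", "v2")   # None: schema changed, cache miss
```

Including the schema version in the key means a schema change naturally invalidates stale entries.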
Division of roles: prompt engineers iterate on prompts directly in Langfuse, while developers maintain the application code and infrastructure.
Workflow: new prompt versions are validated under the testing label, then promoted to the production label once verified.
Langfuse provides AskTable with complete prompt management and LLM observability.
These capabilities turn an LLM application from a black box into a glass box, and from unmaintainable into continuously optimizable. They are essential infrastructure for enterprise-grade AI applications.
If you are building an LLM application, integrate Langfuse or a similar observability platform from day one rather than retrofitting it after problems appear.