Pytest Integration
Integrate FluxLoop experiments into your pytest test suite for CI/CD pipelines.
Overview
Starting with FluxLoop 0.2.29, you can run FluxLoop experiments as part of your pytest tests using specialized fixtures. This enables:
- Familiar Testing Workflow: Use pytest commands developers already know
- CI/CD Integration: Run experiments in GitHub Actions, GitLab CI, etc.
- Assertion Support: Use pytest assertions on experiment results
- Test Discovery: Automatically discover and run FluxLoop tests (see the conftest.py sketch below)
- Parallel Execution: Run multiple experiment tests in parallel
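The fluxloop_runner fixture (and its variants) used throughout this page ships with the CLI package in fluxloop_cli.testing.pytest_plugin. If pytest does not pick the fixtures up automatically in your environment, one option is to load the plugin explicitly from a top-level conftest.py. A minimal sketch, assuming the module path shown in the generated template below:

# conftest.py (project root)
# Loads the FluxLoop pytest plugin explicitly so its fixtures are available
# to every test module. Only needed if the plugin is not auto-registered.
pytest_plugins = ["fluxloop_cli.testing.pytest_plugin"]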
Quick Start
1. Install with Dev Dependencies
pip install -e packages/cli[dev]
# or for published package
pip install fluxloop-cli[dev]
2. Generate Test Template
# Generate pytest template in default location (tests/)
fluxloop init pytest-template
# Custom location
fluxloop init pytest-template --tests-dir integration_tests
# Custom filename
fluxloop init pytest-template --filename test_agent_smoke.py
What gets created:
# tests/test_fluxloop_smoke.py
import pytest
from pathlib import Path

from fluxloop_cli.testing.pytest_plugin import fluxloop_runner

PROJECT_ROOT = Path(__file__).resolve().parents[1]


def test_fluxloop_smoke(fluxloop_runner):
    """Smoke test: verify agent runs without errors."""
    result = fluxloop_runner(
        project_root=PROJECT_ROOT,
        simulation_config=PROJECT_ROOT / "configs" / "simulation.yaml",
        overrides={"iterations": 1},
        env={"PYTHONPATH": str(PROJECT_ROOT)},
    )

    # Assert on results
    assert result.total_runs > 0
    assert result.success_rate >= 0.8

    # Or use convenience method
    result.require_success(threshold=0.8)
3. Run Tests
# Run FluxLoop tests only
pytest -k fluxloop_smoke
# Run with verbose output
pytest -k fluxloop -v
# Stop on first failure
pytest -k fluxloop --maxfail=1
# Run all tests including FluxLoop
pytest
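If you prefer marker-based selection over -k name matching, a custom pytest marker works as well. A sketch (the fluxloop marker name is an arbitrary choice, registered in conftest.py so pytest does not warn about an unknown marker):

# conftest.py
def pytest_configure(config):
    config.addinivalue_line("markers", "fluxloop: FluxLoop experiment tests")


# tests/test_fluxloop_smoke.py
import pytest


@pytest.mark.fluxloop
def test_fluxloop_smoke(fluxloop_runner):
    ...

With the marker in place, pytest -m fluxloop selects only the marked tests.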
Adapter Workflow for Existing Agents
Use the pytest bridge as an adapter layer when your AI agent already lives inside another repository (e.g., a LangGraph tutorial project) and FluxLoop needs to drive that code end-to-end.
1. Expose a FluxLoop Runner Entry Point
Create a module such as customer_support/runner.py and define a run function. The FluxLoop python-function runner imports and calls this entry point directly.
"""FluxLoop simulation runner entry point."""
from __future__ import annotations
import uuid
from typing import Any, Dict, Optional
from dotenv import load_dotenv
from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI
from customer_support import prepare_database
from customer_support.data.travel_db import get_default_storage_dir
from customer_support.graphs import build_part4_graph
from customer_support.tracing import init_tracing, trace_graph_execution
DEFAULT_PROVIDER = "anthropic"
@trace_graph_execution
def run(input_payload: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
load_dotenv()
init_tracing()
payload = input_payload or {}
user_message = payload.get("message", "Hello, I need help with my flight.")
passenger_id = payload.get("passenger_id", "3442 587242")
provider = payload.get("provider", DEFAULT_PROVIDER).lower()
thread_id = payload.get("thread_id") or str(uuid.uuid4())
data_dir = get_default_storage_dir()
data_dir.mkdir(parents=True, exist_ok=True)
db_path = prepare_database(target_dir=data_dir, overwrite=False)
if provider == "openai":
llm = ChatOpenAI(model="gpt-4o-mini", temperature=1)
else:
llm = ChatAnthropic(model="claude-haiku-4-5-20251001", temperature=1)
graph = build_part4_graph(str(db_path), llm=llm)
result = graph.invoke({"messages": [("user", user_message)]}, config={
"configurable": {"passenger_id": passenger_id, "thread_id": thread_id}
})
messages = result.get("messages", [])
response = ""
if messages and hasattr(messages[-1], "content"):
response = messages[-1].content
return {
"response": response,
"messages": [
{"role": getattr(m, "type", "unknown"), "content": getattr(m, "content", str(m))}
for m in messages
],
"thread_id": thread_id,
}
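Before pointing FluxLoop at this entry point, it can help to call it once directly to surface import or credential problems early. A quick local check you can append to runner.py (the sample message is arbitrary):

if __name__ == "__main__":
    # Run once outside FluxLoop, e.g. `python -m customer_support.runner`
    output = run({"message": "Hi, can you check my flight status?"})
    print(output["response"])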
2. Align simulation.yaml
Point the runner section at the entry point you just created. If you need additional import paths, add them to python_path and FluxLoop will extend sys.path for you.
runner:
  module_path: "customer_support.runner"
  function_name: "run"
  working_directory: /Users/you/projects/customer-support
  python_path:
    - src
  timeout_seconds: 120
  max_retries: 3
Even if the tutorial's configuration omits metrics such as task_completion, the FluxLoop CLI injects default thresholds so the evaluation report still renders.
3. Write a Pytest Adapter Test
The pytest file itself acts as the adapter that launches the FluxLoop experiment.
# tests/test_customer_support.py
from pathlib import Path

PROJECT_ROOT = Path(__file__).resolve().parents[1]


def test_customer_support_agent(fluxloop_runner):
    result = fluxloop_runner(
        project_root=PROJECT_ROOT,
        simulation_config=PROJECT_ROOT / "configs" / "simulation.yaml",
        overrides={
            "iterations": 1,
            "runner.working_directory": str(PROJECT_ROOT / "langgraph" / "customer-support"),
            "runner.python_path.0": "src",
        },
        env={
            "PYTHONPATH": str(PROJECT_ROOT / "langgraph" / "customer-support"),
            "OPENAI_API_KEY": "...",
            "ANTHROPIC_API_KEY": "...",
        },
        timeout=180,
    )

    result.require_success(message="customer support agent")
Before running the test for the first time, install the SDK locally (pip install -e packages/sdk) and verify the import with python -c "import fluxloop; print(fluxloop.__version__)". Use env={"PYTHONPATH": ...} to add extra agent source folders whenever imports would otherwise fail.
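PYTHONPATH accepts several entries separated by os.pathsep, which is useful when the agent code spans multiple folders. A minimal sketch (the folder names below are placeholders for your own layout):

import os
from pathlib import Path

PROJECT_ROOT = Path(__file__).resolve().parents[1]

# Placeholder folders; replace with your agent's actual source roots.
extra_paths = [
    PROJECT_ROOT / "langgraph" / "customer-support",
    PROJECT_ROOT / "langgraph" / "customer-support" / "src",
]

# Pass this dict as env= to fluxloop_runner so both folders are importable.
env = {"PYTHONPATH": os.pathsep.join(str(p) for p in extra_paths)}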
4. Run from Pytest or CI
pytest tests/test_customer_support.py -k customer_support_agent -v
Use the same command inside GitHub Actions or GitLab CI (pytest -k customer_support_agent --maxfail=1). Upload result.experiment_dir / "report.html" as an artifact to share the regression report with the rest of your team.
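One way to expose the report to a CI artifact upload step is to copy it to a fixed path at the end of the test. A sketch, assuming the report file name mentioned above:

import shutil
from pathlib import Path

# After result.require_success(...), copy the HTML report to a fixed
# location that the CI job is configured to upload.
artifacts_dir = Path("artifacts")
artifacts_dir.mkdir(exist_ok=True)

report = result.experiment_dir / "report.html"
if report.exists():
    shutil.copy(report, artifacts_dir / "report.html")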
Available Fixtures
fluxloop_runner
Executes experiments using FluxLoop's ExperimentRunner directly (SDK mode).
Signature:
def fluxloop_runner(
    project_root: Path,
    simulation_config: Path,
    overrides: dict | None = None,
    env: dict | None = None,
    timeout: int = 600,
) -> FluxLoopTestResult:
    ...
Parameters:
| Parameter | Type | Description | Default |
|---|---|---|---|
| project_root | Path | Project root directory | Required |
| simulation_config | Path | Path to simulation.yaml | Required |
| overrides | dict | Override config values | None |
| env | dict | Environment variables | None |
| timeout | int | Timeout in seconds | 600 |
Example:
def test_basic_run(fluxloop_runner):
    result = fluxloop_runner(
        project_root=Path.cwd(),
        simulation_config=Path("configs/simulation.yaml"),
        overrides={"iterations": 5},
        env={"PYTHONPATH": str(Path.cwd())},
    )

    assert result.total_runs == 5
    result.require_success()
fluxloop_runner_multi_turn
Convenience fixture for multi-turn experiments (auto-enables multi-turn mode).
Example:
def test_multi_turn(fluxloop_runner_multi_turn):
    result = fluxloop_runner_multi_turn(
        project_root=Path.cwd(),
        simulation_config=Path("configs/simulation.yaml"),
        overrides={
            "iterations": 2,
            "multi_turn": {
                "max_turns": 8,
                "auto_approve_tools": True,
            },
        },
    )

    assert result.total_runs == 2
    result.require_success(threshold=0.7)
fluxloop_cli (Advanced)
Executes experiments by calling fluxloop run experiment as a subprocess (CLI mode).
When to Use:
- Testing actual CLI commands
- Verifying command-line behavior
- Debugging CLI output
- Integration testing with full CLI stack
Example:
def test_cli_execution(fluxloop_cli):
    result = fluxloop_cli(
        project_root=Path.cwd(),
        simulation_config=Path("configs/simulation.yaml"),
        cli_args=["--iterations", "3", "--yes"],
    )

    # Check CLI execution
    assert result.success_rate > 0.8

    # Access CLI-specific info
    print(f"Command: {result.cli_command}")
    print(f"Stdout: {result.stdout_path}")
    print(f"Stderr: {result.stderr_path}")
FluxLoopTestResult API
All fixtures return a FluxLoopTestResult object with test metrics and paths.
Properties
class FluxLoopTestResult:
    # Metrics
    total_runs: int              # Total experiment runs
    success_rate: float          # Success rate (0.0-1.0)
    avg_duration_ms: float       # Average duration in ms

    # File Paths
    experiment_dir: Path         # Experiment output directory
    trace_summary_path: Path     # trace_summary.jsonl path
    per_trace_path: Path | None  # per_trace.jsonl (if parsed)

    # CLI-specific (fluxloop_cli fixture only)
    cli_command: str | None      # Full CLI command
    stdout_path: Path | None     # Stdout log file
    stderr_path: Path | None     # Stderr log file
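When a threshold assertion fails, printing the summary fields is often enough to diagnose the problem (visible with pytest -s). A quick sketch using the properties above:

from pathlib import Path


def test_with_summary(fluxloop_runner):
    result = fluxloop_runner(
        project_root=Path.cwd(),
        simulation_config=Path("configs/simulation.yaml"),
        overrides={"iterations": 3},
    )

    # Quick diagnostic line; shows where the experiment output landed.
    print(
        f"runs={result.total_runs} "
        f"success_rate={result.success_rate:.2f} "
        f"avg_ms={result.avg_duration_ms:.0f} "
        f"dir={result.experiment_dir}"
    )

    result.require_success(threshold=0.8)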
Methods
require_success()
Assert that success rate meets threshold.
result.require_success(threshold=0.8)
# Raises AssertionError if success_rate < 0.8
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| threshold | float | 1.0 | Minimum success rate (0.0-1.0) |
| message | str | Auto-generated | Custom error message |
Examples:
# Require 100% success
result.require_success()
# Require at least 80% success
result.require_success(threshold=0.8)
# Custom error message
result.require_success(
    threshold=0.9,
    message="Agent quality below 90%",
)
require_min_runs()
Assert minimum number of runs.
result.require_min_runs(min_runs=10)
# Raises AssertionError if total_runs < 10
require_max_duration()
Assert average duration is below threshold.
result.require_max_duration(max_ms=500)
# Raises AssertionError if avg_duration_ms > 500
Complete Examples
Basic Smoke Test
def test_agent_smoke(fluxloop_runner):
    """Quick validation that agent runs."""
    result = fluxloop_runner(
        project_root=Path.cwd(),
        simulation_config=Path("configs/simulation.yaml"),
        overrides={"iterations": 1},
    )

    # Just verify it completes
    assert result.total_runs > 0
Regression Test
def test_agent_regression(fluxloop_runner):
    """Ensure agent maintains quality standards."""
    result = fluxloop_runner(
        project_root=Path.cwd(),
        simulation_config=Path("configs/simulation.yaml"),
        overrides={"iterations": 10},
    )

    # Strict success requirements
    result.require_success(threshold=0.95)
    result.require_max_duration(max_ms=1000)
Performance Test
def test_agent_performance(fluxloop_runner):
    """Verify agent meets latency SLAs."""
    result = fluxloop_runner(
        project_root=Path.cwd(),
        simulation_config=Path("configs/simulation.yaml"),
        overrides={"iterations": 50},
    )

    # Check latency
    assert result.avg_duration_ms < 500, \
        f"Agent too slow: {result.avg_duration_ms}ms"

    # Still require reasonable quality
    result.require_success(threshold=0.85)
Multi-Turn Conversation Test
def test_multi_turn_conversation(fluxloop_runner_multi_turn):
    """Test agent handles multi-turn dialogues."""
    result = fluxloop_runner_multi_turn(
        project_root=Path.cwd(),
        simulation_config=Path("configs/simulation.yaml"),
        overrides={
            "iterations": 5,
            "multi_turn": {
                "max_turns": 10,
                "auto_approve_tools": True,
            },
        },
    )

    result.require_success(threshold=0.80)
    result.require_min_runs(min_runs=5)
Persona-Specific Test
def test_expert_persona(fluxloop_runner):
    """Test agent with expert user persona."""
    result = fluxloop_runner(
        project_root=Path.cwd(),
        simulation_config=Path("configs/simulation.yaml"),
        overrides={
            "iterations": 10,
            "personas": ["expert_user"],  # Filter to specific persona
        },
    )

    # Expert users should get faster responses
    result.require_max_duration(max_ms=300)
    result.require_success(threshold=0.90)
Custom Assertions
def test_custom_metrics(fluxloop_runner):
    """Test with custom evaluation logic."""
    result = fluxloop_runner(
        project_root=Path.cwd(),
        simulation_config=Path("configs/simulation.yaml"),
        overrides={"iterations": 20},
    )

    # Read per-trace data for custom checks
    import json

    with open(result.trace_summary_path) as f:
        traces = [json.loads(line) for line in f]

    # Custom assertions
    long_traces = [t for t in traces if t["duration_ms"] > 1000]
    assert len(long_traces) < 2, "Too many slow responses"

    failed_traces = [t for t in traces if not t.get("success")]
    assert len(failed_traces) == 0, "Found failed traces"
CI/CD Integration
GitHub Actions
Create .github/workflows/fluxloop-tests.yml:
name: FluxLoop Tests

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          python -m venv .venv
          source .venv/bin/activate
          pip install -U pip
          pip install -e packages/cli[dev]

      - name: Run FluxLoop tests
        env:
          PYTHONPATH: ${{ github.workspace }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          source .venv/bin/activate
          pytest -k fluxloop --maxfail=1 -v

      - name: Upload results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: fluxloop-results
          path: experiments/
GitLab CI
Create .gitlab-ci.yml:
test:fluxloop:
  stage: test
  image: python:3.11
  before_script:
    - python -m venv .venv
    - source .venv/bin/activate
    - pip install -U pip
    - pip install -e packages/cli[dev]
  script:
    - export PYTHONPATH=$CI_PROJECT_DIR
    - pytest -k fluxloop --maxfail=1 -v
  artifacts:
    when: always
    paths:
      - experiments/
    expire_in: 1 week
Example Workflow
Full example at examples/ci/fluxloop_pytest.yml:
name: fluxloop-pytest

on:
  workflow_dispatch:
  workflow_call:

jobs:
  smoke:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install deps
        run: |
          python -m venv .venv
          source .venv/bin/activate
          pip install -U pip
          pip install -e packages/cli[dev]

      - name: Run FluxLoop Pytest suite
        env:
          PYTHONPATH: ${{ github.workspace }}
        run: |
          source .venv/bin/activate
          pytest -k fluxloop_smoke --maxfail=1 --disable-warnings
Best Practices
1. Start with Minimal Iterations
Use low iteration counts for fast feedback:
def test_quick_smoke(fluxloop_runner):
    result = fluxloop_runner(
        project_root=Path.cwd(),
        simulation_config=Path("configs/simulation.yaml"),
        overrides={"iterations": 1},  # Fast validation
    )
    result.require_success()
2. Use Fixtures for Setup
Share common configuration:
import pytest
from pathlib import Path


@pytest.fixture
def project_root():
    return Path(__file__).parents[1]


@pytest.fixture
def simulation_config(project_root):
    return project_root / "configs" / "simulation.yaml"


def test_agent(fluxloop_runner, project_root, simulation_config):
    result = fluxloop_runner(
        project_root=project_root,
        simulation_config=simulation_config,
        overrides={"iterations": 5},
    )
    result.require_success()
3. Set PYTHONPATH
Ensure agent modules are importable:
def test_agent(fluxloop_runner):
    project_root = Path.cwd()
    result = fluxloop_runner(
        project_root=project_root,
        simulation_config=project_root / "configs/simulation.yaml",
        env={"PYTHONPATH": str(project_root)},  # Important!
    )
    result.require_success()
4. Use Timeouts
Prevent hanging tests:
def test_agent(fluxloop_runner):
    result = fluxloop_runner(
        project_root=Path.cwd(),
        simulation_config=Path("configs/simulation.yaml"),
        timeout=300,  # 5 minute timeout
    )
    result.require_success()
5. Organize Tests by Category
# tests/test_fluxloop_smoke.py
def test_smoke(fluxloop_runner):
    """Quick validation."""
    ...


# tests/test_fluxloop_regression.py
def test_regression(fluxloop_runner):
    """Quality regression tests."""
    ...


# tests/test_fluxloop_performance.py
def test_performance(fluxloop_runner):
    """Latency and throughput tests."""
    ...
Run by category:
pytest tests/test_fluxloop_smoke.py # Fast smoke tests
pytest tests/test_fluxloop_regression.py # Quality checks
pytest -k fluxloop # All FluxLoop tests
Advanced Usage
Parameterized Tests
Test multiple configurations:
import pytest


@pytest.mark.parametrize("iterations", [1, 5, 10])
def test_varying_iterations(fluxloop_runner, iterations):
    result = fluxloop_runner(
        project_root=Path.cwd(),
        simulation_config=Path("configs/simulation.yaml"),
        overrides={"iterations": iterations},
    )
    result.require_success(threshold=0.8)
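The same pattern works for any override key. For example, to run the suite once per persona, reusing the personas override shown earlier (the persona names here are placeholders for the ones defined in your configuration):

from pathlib import Path

import pytest


@pytest.mark.parametrize("persona", ["novice_user", "expert_user"])
def test_per_persona(fluxloop_runner, persona):
    result = fluxloop_runner(
        project_root=Path.cwd(),
        simulation_config=Path("configs/simulation.yaml"),
        overrides={"iterations": 5, "personas": [persona]},
    )
    result.require_success(threshold=0.8)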
Custom Result Processing
def test_with_custom_processing(fluxloop_runner):
    result = fluxloop_runner(
        project_root=Path.cwd(),
        simulation_config=Path("configs/simulation.yaml"),
        overrides={"iterations": 10},
    )

    # Parse per-trace results
    import json

    traces = []
    with open(result.trace_summary_path) as f:
        for line in f:
            traces.append(json.loads(line))

    # Custom analysis
    durations = [t["duration_ms"] for t in traces]
    p95 = sorted(durations)[int(len(durations) * 0.95)]
    assert p95 < 1000, f"P95 latency too high: {p95}ms"
Troubleshooting
Tests Hang or Timeout
Issue: Tests don't complete.
Solutions:
- Set an explicit timeout: result = fluxloop_runner(..., timeout=300)
- Use pytest-timeout (see the marker sketch below): pip install pytest-timeout, then run pytest -k fluxloop --timeout=600
- Add --maxfail=1 to stop early: pytest -k fluxloop --maxfail=1
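pytest-timeout also supports a per-test marker, which is handy when only the FluxLoop tests need a generous limit. A sketch, assuming pytest-timeout is installed:

from pathlib import Path

import pytest


@pytest.mark.timeout(600)  # per-test limit provided by pytest-timeout
def test_long_running_experiment(fluxloop_runner):
    result = fluxloop_runner(
        project_root=Path.cwd(),
        simulation_config=Path("configs/simulation.yaml"),
        overrides={"iterations": 10},
    )
    result.require_success(threshold=0.8)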
Module Not Found
Issue: ModuleNotFoundError: No module named 'my_agent'
Solution: Set PYTHONPATH:
result = fluxloop_runner(
    ...,
    env={"PYTHONPATH": str(project_root)},
)
Or in CI:
env:
  PYTHONPATH: ${{ github.workspace }}
API Key Not Found
Issue: LLM provider errors.
Solution: Set API keys in environment:
# CI configuration
env:
  OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
Or locally:
export OPENAI_API_KEY=sk-your-key
pytest -k fluxloop
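To keep the suite green on machines without credentials, the LLM-backed tests can be skipped when the key is absent. A sketch using standard pytest skipping:

import os
from pathlib import Path

import pytest


@pytest.mark.skipif(not os.getenv("OPENAI_API_KEY"), reason="OPENAI_API_KEY is not set")
def test_agent_with_llm(fluxloop_runner):
    result = fluxloop_runner(
        project_root=Path.cwd(),
        simulation_config=Path("configs/simulation.yaml"),
        overrides={"iterations": 1},
    )
    result.require_success(threshold=0.8)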
See Also
- CI/CD Integration - Full CI/CD guide
- Run Command - CLI reference
- Basic Workflow - Complete workflow
- init pytest-template - Template generation