Clawdia · Finance Advisor
A multi-asset portfolio advisor agent (equities, bonds, ETFs) that reacts proactively to every trade detected in a Schwab account, debates holdings and hypotheticals on demand, and keeps a thesis ledger that turns into a learning loop. Everything the LLM says is held to citation discipline: each response must cite keys from a context bundle built in Python, and a response that fails validation is discarded, so the model is given no room to invent a number. It runs as an openclaw skill set, on the same single-VPS personal AI stack.
Doesn't execute trades. Only tracks, opines, and remembers.
The Use Case
Why this exists
I make portfolio decisions in fits and starts — a trade now, a rebalancing in three weeks. The hard part isn't the trade, it's remembering what I was thinking at the time. Six months later, looking at a position that went south, the question "was the thesis broken or am I just impatient?" is impossible to answer without a paper trail.
So this agent watches the account, prompts me to articulate a thesis at the moment I open a position, asks for an outcome when I close it, and on every detected trade weighs in with grounded commentary so I get a second opinion in real time. The bar is high: it has to be cheap, it has to never hallucinate a number, and it has to stay strictly read-only against the brokerage.
Architecture
How the loop runs
    ┌─────────────────────────────────────────────────────────────────────┐
    │  cron (hourly): pm-cli portfolio sync                               │
    └──────────────────────────────────┬──────────────────────────────────┘
                                       │
                                       ▼
                          ┌─────────────────────────┐
                          │ Schwab API (read-only)  │
                          │ OAuth · 7-day refresh   │
                          └────────────┬────────────┘
                                       │
                                       ▼
                          ┌─────────────────────────┐
                          │ diff vs local DB        │  qty > 5% → flagged
                          │ (SQLite · SQLAlchemy)   │  cost > 2% → flagged
                          └────────────┬────────────┘
                                       │ (flagged diffs)
                                       ▼
                   ┌───────────────────┴───────────────────┐
                   ▼                   ▼                   ▼
            compute_impact        market_data        thesis ledger
             deterministic     yfinance (TTL 6h)    position_thesis
                   │                   │                   │
                   └───────────────────┼───────────────────┘
                                       ▼
                          ┌─────────────────────────┐
                          │ build_context_bundle    │  typed key/value
                          │ (Python dict)           │  pairs
                          └────────────┬────────────┘
                                       │
                                       ▼
                          ┌─────────────────────────┐
                          │ LLM (glm-5.1)           │
                          │ JSON response_format    │
                          │ + citation enforcement  │
                          └────────────┬────────────┘
                                       │
                      ┌────────────────┴────────────────┐
                      ▼                                 ▼
    citations valid → format_for_whatsapp     citations invalid →
              + JudgmentLog row              deterministic fallback
                      │
                      ▼
    openclaw message send → WhatsApp group
Two paths lead into the agent: a cron-driven proactive loop that fires on every detected position change, and a conversational loop the operator can drive on demand ("give me your opinion on X", "what happens if I buy X", "what did you say about X last week"). Both share the same context bundle, citation pipeline, and judgment log.
The Interesting Bit
Citation discipline against hallucination
The temptation with an LLM looking at a brokerage account is to let it say whatever it wants. In an advisor context that's not just sloppy — it's dangerous. So the LLM is pinned by a contract enforced in Python before any output reaches the user.
- Context bundle as ground truth. Before the LLM is called, Python builds a typed dict of facts: affected_ticker, last_price, position_delta_pct, portfolio_weight_after, thesis_summary, etc. The bundle is the only source of facts the model is allowed to reference.
- Structured output. The LLM must return JSON with exact required fields: headline, market_read, portfolio_impact_read, ask_to_giu, and, critically, a citations list.
- Citation validation. Every string in citations must be a key that exists in the context bundle. Cite an unknown key → JudgmentValidationError. Cite fewer than 2 keys → also rejected. The intent: if the model invented a number, it has to invent a citation to back it, and the invented citation will not match a real bundle key.
- Deterministic fallback. On any validation failure, the system does not retry-until-it-works. It falls back to a deterministic, citation-free formatter that summarizes the bundle directly: slower thinking, no LLM, but always correct.
The result: the LLM is allowed to opine, but not to invent. If it tries to inject "Apple's P/E is 28" and the bundle has no P/E key, there is no valid key it can cite for that claim, and a made-up citation fails validation. Failure is loud, not silent.
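The contract above can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the field names and JudgmentValidationError come from the description above, while validate_judgment, deterministic_fallback, render, and the fallback's output format are hypothetical names and choices made for the example.

```python
import json


class JudgmentValidationError(ValueError):
    """Raised when an LLM response breaks the citation contract."""


# Field names as listed in the text above.
REQUIRED_FIELDS = ("headline", "market_read", "portfolio_impact_read",
                   "ask_to_giu", "citations")
MIN_CITATIONS = 2


def validate_judgment(raw_json: str, bundle: dict) -> dict:
    """Parse the model's JSON and check every citation against the bundle keys."""
    try:
        judgment = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        raise JudgmentValidationError(f"not valid JSON: {exc}") from exc
    if not isinstance(judgment, dict):
        raise JudgmentValidationError("response must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in judgment]
    if missing:
        raise JudgmentValidationError(f"missing fields: {missing}")
    citations = judgment["citations"]
    if not isinstance(citations, list) or len(citations) < MIN_CITATIONS:
        raise JudgmentValidationError("at least 2 citations are required")
    # A citation is valid only if it is a string naming a real bundle key.
    unknown = [c for c in citations if not isinstance(c, str) or c not in bundle]
    if unknown:
        raise JudgmentValidationError(f"unknown citation keys: {unknown}")
    return judgment


def deterministic_fallback(bundle: dict) -> str:
    """Hypothetical citation-free summary built straight from the bundle."""
    return "\n".join(f"{key}: {value}" for key, value in sorted(bundle.items()))


def render(raw_json: str, bundle: dict) -> str:
    """Return the LLM headline if the contract holds, else the fallback text.

    No retry: a single validation failure goes straight to the fallback.
    """
    try:
        return validate_judgment(raw_json, bundle)["headline"]
    except JudgmentValidationError:
        return deterministic_fallback(bundle)
```

Note that the check is on citation keys, so it rejects a response whose citations name facts the bundle never contained; the single try/except in render is where the "no retry" decision lives.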
Design Choices
Decisions worth calling out
Deterministic compute, LLM opinion
compute_impact (portfolio weight changes, sector concentration,
what-if simulations) is pure Python — no LLM involvement. The LLM is only
given the result and asked to opine on it. Numbers are never produced by
the model.
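As an illustration of what "pure Python, no LLM" can mean for a what-if, here is a small sketch of a weight calculation. The function names are hypothetical, and it assumes holdings are given as market values per ticker and that a hypothetical buy is funded with new cash rather than from an existing cash position; the real compute_impact may model this differently.

```python
def portfolio_weights(holdings: dict[str, float]) -> dict[str, float]:
    """Map each ticker to its share of total market value (0.0 to 1.0)."""
    total = sum(holdings.values())
    if total <= 0:
        raise ValueError("portfolio has no market value")
    return {ticker: value / total for ticker, value in holdings.items()}


def what_if_weights(holdings: dict[str, float], ticker: str,
                    trade_value: float) -> dict[str, float]:
    """Weight of `ticker` before and after a hypothetical trade.

    trade_value is positive for a buy, negative for a sell (assumption:
    the trade adds or removes outside cash, so total value changes too).
    """
    after = dict(holdings)
    after[ticker] = after.get(ticker, 0.0) + trade_value
    if after[ticker] < 0:
        raise ValueError("cannot sell more than the current position")
    return {
        "portfolio_weight_before": portfolio_weights(holdings).get(ticker, 0.0),
        "portfolio_weight_after": portfolio_weights(after)[ticker],
    }
```

The returned keys are the kind of values that would go into the context bundle, so the model only ever sees numbers computed this way.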
Two models, two budgets
glm-5.1 for the judgment call (one or two times per detected
trade, ~$0.01-0.05 each). glm-5.1-turbo for natural-language
CLI parsing (every conversational query, <$0.001 each). Splitting the
budgets keeps the monthly bill under $5.
Thesis ledger as a learning loop
On every new position, the agent prompts for a thesis, a horizon, and what
would invalidate it. On close, it asks for an outcome:
cumplio (thesis played out) / rompio (thesis broke) / cambie_opinion (changed my mind) /
profit_taking. Over time, the ledger reveals patterns the
operator wouldn't notice in flight.
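One way to picture a ledger entry is a record with the three prompts captured at open and an outcome restricted to the four labels at close. The sketch below is an assumption about shape only: PositionThesis, its field names, and outcome_counts are hypothetical, and the actual schema behind position_thesis is not shown in this document.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Outcome(str, Enum):
    """The four close labels named in the text."""
    CUMPLIO = "cumplio"
    ROMPIO = "rompio"
    CAMBIE_OPINION = "cambie_opinion"
    PROFIT_TAKING = "profit_taking"


@dataclass
class PositionThesis:
    """Hypothetical ledger row: what was believed, for how long, and what breaks it."""
    ticker: str
    thesis: str
    horizon: str
    invalidation: str
    opened_on: date
    outcome: Optional[Outcome] = None
    closed_on: Optional[date] = None

    def close(self, outcome: str, closed_on: date) -> None:
        """Record the outcome once; unknown labels raise ValueError."""
        if self.outcome is not None:
            raise ValueError(f"{self.ticker} thesis is already closed")
        self.outcome = Outcome(outcome)
        self.closed_on = closed_on


def outcome_counts(ledger: list[PositionThesis]) -> Counter:
    """Tally closed theses by outcome label, the raw material for spotting patterns."""
    return Counter(row.outcome.value for row in ledger if row.outcome is not None)
```

A tally like outcome_counts is the simplest form of the learning loop: a ledger dominated by cambie_opinion tells a different story than one dominated by rompio.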
Diff thresholds, not every tick
The cron flags a position only when qty moves >5% or
cost basis moves >2%. Signal-to-noise tuning to keep the
proactive channel useful (not "your portfolio rounded by $0.03 today").
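The threshold rule is simple enough to sketch directly. This version assumes both percentages are relative changes against the previously stored values and that a position appearing from zero is always flagged; the function names are hypothetical.

```python
QTY_THRESHOLD = 0.05   # flag when quantity moves by more than 5%
COST_THRESHOLD = 0.02  # flag when cost basis moves by more than 2%


def relative_change(old: float, new: float) -> float:
    """Absolute relative change from old to new; a move away from zero is infinite."""
    if old == 0:
        return 0.0 if new == 0 else float("inf")
    return abs(new - old) / abs(old)


def is_flagged(old_qty: float, new_qty: float,
               old_cost: float, new_cost: float) -> bool:
    """True when either the quantity or the cost-basis threshold is exceeded."""
    return (relative_change(old_qty, new_qty) > QTY_THRESHOLD
            or relative_change(old_cost, new_cost) > COST_THRESHOLD)
```

For example, a move from 100 to 104 shares with cost basis going from 1000 to 1010 stays quiet (4% and 1%), while 100 to 106 shares is flagged.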
Network-free tests
147 tests, ~5s, no network. yfinance, the OpenAI client, schwabdev, and the openclaw subprocess are all mocked at the boundary. The suite behaves identically on a laptop, in CI, and on a plane.
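A sketch of what "mocked at the boundary" can look like with the standard library's unittest.mock. How the project actually wires its mocks (patching versus injection) is not stated here; this example assumes the network call is passed in as a callable, and last_price_fact and fetch_quote are hypothetical names.

```python
from unittest.mock import Mock


def last_price_fact(ticker: str, fetch_quote) -> dict:
    """Turn a raw quote into bundle-style facts.

    fetch_quote is the boundary: in production it would wrap the market
    data client; in tests it is replaced by a Mock, so no network is used.
    """
    quote = fetch_quote(ticker)
    return {"affected_ticker": ticker,
            "last_price": round(float(quote["price"]), 2)}


def test_last_price_fact_without_network():
    fake_fetch = Mock(return_value={"price": "189.987"})
    fact = last_price_fact("AAPL", fake_fetch)
    assert fact == {"affected_ticker": "AAPL", "last_price": 189.99}
    # The boundary was hit exactly once, with the ticker we asked for.
    fake_fetch.assert_called_once_with("AAPL")
```

Because the fake returns canned data, the test exercises the parsing and rounding logic while staying fast and deterministic.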
Out-of-scope, on purpose
No trade execution. No BUY/SELL signals. No sentiment scraping. No tax advice. The list of things this agent doesn't do is as deliberate as the list of what it does — half the failure modes of LLM finance tools are about claiming capabilities they shouldn't have.
CLI Surface
Exposed as a skill set in openclaw
The agent lives as a set of openclaw skills (pm-portfolio,
pm-thesis, pm-market, pm-judgment,
pm-prefs). Each maps to a CLI verb on pm-cli:
| Command | What it does |
|---|---|
| portfolio sync | Pull Schwab → diff → judgment → send (the proactive loop) |
| portfolio query | Read holdings, weights, P&L |
| portfolio impact | What-if a hypothetical trade, deterministic, no LLM |
| thesis set / get / list | Record / retrieve theses with horizons and invalidation criteria |
| thesis close | Mark outcome: cumplio · rompio · cambie_opinion · profit_taking |
| market fact | Pull cached yfinance facts for a ticker |
| judgment opine | On-demand LLM opinion with citation enforcement |
| judgment history | Replay past judgments by ticker / date range |
| prefs set / get | Persistent PM criteria the LLM should respect |
Stack
Technology
| Layer | Choice |
|---|---|
| Runtime | Python 3.12 · Typer (CLI) · packaged for openclaw skill discovery |
| Database | SQLite default (single user) · SQLAlchemy 2.x · Alembic migrations · Postgres-ready via DATABASE_URL |
| Brokerage | schwabdev 3.0.5 · OAuth + 7-day refresh · read-only scope |
| Market data | yfinance · in-process cache (TTL 6h) so the same ticker isn't hit twice in a window |
| LLM | OpenAI-compatible client → z.ai · glm-5.1 (judgment) · glm-5.1-turbo (NL parser) |
| Validation | JSON response_format · custom citation validator · deterministic fallback path |
| Messaging | openclaw subprocess bridge → WhatsApp group |
| Testing | pytest · 147 tests · ~5s · network-free |
| Lint | ruff |
| Cost | ~$5 / month all-in (LLM tokens dominate; everything else is free) |
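The in-process market data cache in the table above can be pictured as a small TTL map. This is a sketch under assumptions, not the project's cache: TTLCache and get_or_fetch are hypothetical names, and the injectable clock exists only so the expiry logic can be tested without waiting.

```python
import time
from typing import Any, Callable

SIX_HOURS = 6 * 60 * 60  # TTL from the table above, in seconds


class TTLCache:
    """Keep fetched values in memory and reuse them until they are ttl_seconds old."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[str], Any]) -> Any:
        """Return the cached value for key, calling fetch only on a miss or expiry."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = fetch(key)
        self._entries[key] = (now, value)
        return value
```

With `TTLCache(SIX_HOURS)`, two lookups for the same ticker inside the window trigger a single upstream call, which is the behavior the table describes.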
Engineering Notes
Decisions worth calling out
- Validate at the boundary, fall back deterministically. When the LLM violates the contract, the answer isn't "retry" — that just burns tokens and hopes the next sample is correct. It's "use the deterministic formatter and move on". Loud, cheap, always correct.
- The thesis ledger forces articulation, not prediction. The agent doesn't ask the user to predict price; it asks them to articulate what would change their mind. Inversion of the typical "AI tells you what to do" pattern.
- Out-of-scope as a feature. Half the README is a list of things this agent will not do (no signals, no execution, no tax advice, no sentiment). For an LLM in a high-stakes domain, the explicit non-features matter more than the features.
- Read-only against the brokerage. The OAuth scope is read-only. No write path exists in the code. A future maintainer can't accidentally introduce one without granting a new scope, which is loud.
- Single host, single user. Same design principle as the rest of openclaw: pick one box, run everything there. The agent lives in ~/.finance-agent, runs as a cron, ships messages through a WhatsApp client that's already authenticated for the human. No infra astronaut moves.