Project · Technical Deep Dive

Clawdia · Finance Advisor

A multi-asset portfolio advisor agent (equities, bonds, ETFs) that reacts proactively to every trade detected in a Schwab account, debates holdings and hypotheticals on demand, and keeps a thesis ledger that turns into a learning loop. Everything the LLM says is held to a citation discipline designed to stop hallucination: it cannot invent a number. It runs as an openclaw skill set, on the same single-VPS personal AI stack.

Doesn't execute trades. Only tracks, opines, and remembers.


The Use Case

Why this exists

I make portfolio decisions in fits and starts — a trade now, a rebalancing in three weeks. The hard part isn't the trade, it's remembering what I was thinking at the time. Six months later, looking at a position that went south, the question "was the thesis broken or am I just impatient?" is impossible to answer without a paper trail.

So this agent watches the account, prompts me to articulate a thesis at the moment I open a position, asks for an outcome when I close it, and on every detected trade weighs in with grounded commentary so I get a second opinion in real time. The bar is high: it has to be cheap, it has to never hallucinate a number, and it has to stay strictly read-only against the brokerage.

Architecture

How the loop runs

┌─────────────────────────────────────────────────────────────────────┐
│  cron (hourly):  pm-cli portfolio sync                              │
└──────────────────────────────┬──────────────────────────────────────┘
                               │
                               ▼
                  ┌─────────────────────────┐
                  │  Schwab API (read-only) │
                  │  OAuth · 7-day refresh  │
                  └────────────┬────────────┘
                               │
                               ▼
                  ┌─────────────────────────┐
                  │  diff vs local DB       │  qty > 5%  → flagged
                  │  (SQLite · SQLAlchemy)  │  cost > 2% → flagged
                  └────────────┬────────────┘
                               │ (flagged diffs)
                               ▼
            ┌──────────────────┴───────────────────┐
            ▼                  ▼                   ▼
       compute_impact    market_data        thesis ledger
       deterministic    yfinance (TTL 6h)   position_thesis
            │                  │                   │
            └──────────────────┼───────────────────┘
                               ▼
                  ┌─────────────────────────┐
                  │   build_context_bundle  │   typed key/value
                  │   (Python dict)         │   pairs
                  └────────────┬────────────┘
                               │
                               ▼
                  ┌─────────────────────────┐
                  │   LLM (glm-5.1)         │
                  │   JSON response_format  │
                  │   + citation enforcement│
                  └────────────┬────────────┘
                               │
              ┌────────────────┴────────────────┐
              ▼                                 ▼
     citations valid → format_for_whatsapp     citations invalid →
     + JudgmentLog row                         deterministic fallback
              │
              ▼
     openclaw message send → WhatsApp group

Two paths into the agent: a cron-driven proactive loop that fires on every detected position change, and a conversational loop the operator can drive on demand ("give me your take on X", "what happens if I buy X", "what did you say about X last week"). Both share the same context bundle, citation pipeline, and judgment log.

The Interesting Bit

Citation discipline against hallucination

The temptation with an LLM looking at a brokerage account is to let it say whatever it wants. In an advisor context that's not just sloppy — it's dangerous. So the LLM is pinned by a contract enforced in Python before any output reaches the user.

Context bundle as ground truth. Before the LLM is called, Python builds a typed dict of facts: affected_ticker, last_price, position_delta_pct, portfolio_weight_after, thesis_summary, etc. The bundle is the only source of facts the model is allowed to reference.
Structured output. The LLM must return JSON with exact required fields: headline, market_read, portfolio_impact_read, ask_to_giu, and — critically — a citations list.
Citation validation. Every string in citations must be a key that exists in the context bundle. Cite an unknown key → JudgmentValidationError. Cite fewer than 2 keys → also rejected. The intent: if the model invented a number, it has to invent a citation to back it, and the invented citation will not match a real bundle key.
Deterministic fallback. On any validation failure, the system does not retry until something passes. It falls back to a deterministic, citation-free formatter that summarizes the bundle directly: less nuanced, no LLM, but every number comes straight from the bundle.

The result: the LLM is allowed to opine, but not to invent. If it tries to inject "Apple's P/E is 28" and the bundle has no P/E key, there is no valid citation it can attach to that claim: any key it makes up fails validation, and the message is replaced by the deterministic fallback. Failure is loud, not silent.
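A minimal sketch of the contract described above, assuming the LLM response arrives as a JSON string and the bundle is a flat dict. The field names and JudgmentValidationError come from the text; validate_judgment, deterministic_summary, and format_message are hypothetical names used only for illustration, and the real formatters are likely richer.

```python
import json

# Required fields as listed in the text above.
REQUIRED_FIELDS = ("headline", "market_read", "portfolio_impact_read",
                   "ask_to_giu", "citations")
MIN_CITATIONS = 2  # "Cite fewer than 2 keys -> also rejected"


class JudgmentValidationError(ValueError):
    """Raised when the LLM response breaks the citation contract."""


def validate_judgment(raw_json: str, bundle: dict) -> dict:
    """Parse the response and check every citation against the bundle keys."""
    try:
        judgment = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        raise JudgmentValidationError(f"not valid JSON: {exc}") from exc
    if not isinstance(judgment, dict):
        raise JudgmentValidationError("response must be a JSON object")
    missing = [f for f in REQUIRED_FIELDS if f not in judgment]
    if missing:
        raise JudgmentValidationError(f"missing fields: {missing}")
    citations = judgment["citations"]
    if not isinstance(citations, list) or len(citations) < MIN_CITATIONS:
        raise JudgmentValidationError("at least 2 citations are required")
    unknown = [c for c in citations
               if not isinstance(c, str) or c not in bundle]
    if unknown:
        raise JudgmentValidationError(f"unknown citation keys: {unknown}")
    return judgment


def deterministic_summary(bundle: dict) -> str:
    """Hypothetical fallback: list the bundle facts, no LLM involved."""
    return "\n".join(f"{key}: {value}" for key, value in sorted(bundle.items()))


def format_message(raw_json: str, bundle: dict) -> str:
    """Return the LLM judgment if it validates, otherwise the fallback."""
    try:
        j = validate_judgment(raw_json, bundle)
    except JudgmentValidationError:
        return deterministic_summary(bundle)  # no retry, by design
    return "\n".join([j["headline"], j["market_read"],
                      j["portfolio_impact_read"], j["ask_to_giu"]])
```

Note what this check does and does not cover: it verifies that cited keys exist in the bundle, so a claim with no backing key has nothing valid to cite. It does not parse the prose to compare numbers against bundle values.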

Design Choices

Decisions worth calling out

Deterministic compute, LLM opinion

compute_impact (portfolio weight changes, sector concentration, what-if simulations) is pure Python — no LLM involvement. The LLM is only given the result and asked to opine on it. Numbers are never produced by the model.
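As an illustration of what "pure Python, no LLM" can look like for a what-if, here is a sketch of a weight calculation. The function names and the assumption that a hypothetical buy is funded with new cash (rather than from an existing cash position) are mine, not taken from the project; only the bundle key portfolio_weight_after appears in the text.

```python
def portfolio_weights(market_values: dict[str, float]) -> dict[str, float]:
    """Map each ticker to its share of total market value."""
    total = sum(market_values.values())
    if total <= 0:
        raise ValueError("portfolio has no positive market value")
    return {ticker: value / total for ticker, value in market_values.items()}


def what_if_buy(market_values: dict[str, float], ticker: str,
                qty: float, price: float) -> dict[str, float]:
    """Weights of `ticker` before and after a hypothetical buy.

    Assumption for this sketch: the purchase is funded with new cash,
    so the total portfolio value grows by qty * price.
    """
    before = portfolio_weights(market_values)
    after_values = dict(market_values)  # do not mutate the caller's data
    after_values[ticker] = after_values.get(ticker, 0.0) + qty * price
    after = portfolio_weights(after_values)
    return {
        "weight_before": before.get(ticker, 0.0),
        "portfolio_weight_after": after[ticker],
    }
```

The returned dict is the kind of result that would be placed in the context bundle, so the model only ever reads these numbers and never computes them.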

Two models, two budgets

glm-5.1 handles the judgment call (once or twice per detected trade, ~$0.01-0.05 each). glm-5.1-turbo handles natural-language CLI parsing (every conversational query, <$0.001 each). Splitting the budgets keeps the monthly bill at roughly $5.

Thesis ledger as a learning loop

On every new position, the agent prompts for a thesis, a horizon, and what would invalidate it. On close, it asks for an outcome: cumplio (thesis held) / rompio (thesis broke) / cambie_opinion (I changed my mind) / profit_taking. Over time, the ledger reveals patterns the operator wouldn't notice in flight.
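One way to picture a ledger entry is as a small record with a closed set of outcomes. The sketch below uses stdlib dataclasses rather than the project's SQLAlchemy models; the class and field names are hypothetical, and only the four outcome values come from the text.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Outcome(str, Enum):
    CUMPLIO = "cumplio"                # thesis held
    ROMPIO = "rompio"                  # thesis broke
    CAMBIE_OPINION = "cambie_opinion"  # I changed my mind
    PROFIT_TAKING = "profit_taking"


@dataclass
class PositionThesis:
    """Hypothetical shape of one ledger entry."""
    ticker: str
    thesis: str
    horizon: str
    invalidation: str  # what would change my mind
    opened_on: date
    outcome: Optional[Outcome] = None
    closed_on: Optional[date] = None

    def close(self, outcome: Outcome, on: date) -> None:
        """Record the outcome once; closing twice is an error."""
        if self.outcome is not None:
            raise ValueError(f"thesis for {self.ticker} is already closed")
        self.outcome = outcome
        self.closed_on = on


def outcome_counts(ledger: list[PositionThesis]) -> Counter:
    """Tally closed theses by outcome, the raw material for spotting patterns."""
    return Counter(t.outcome.value for t in ledger if t.outcome is not None)
```

A tally like outcome_counts is the simplest form of the learning loop: a ledger dominated by cambie_opinion, for example, says something different from one dominated by rompio.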

Diff thresholds, not every tick

The cron flags a position only when quantity moves more than 5% or cost basis moves more than 2%. This is signal-to-noise tuning to keep the proactive channel useful (not "your portfolio moved by $0.03 today").
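The threshold rule can be sketched as a relative-change check. The 5% and 2% values come from the text; the function names and the choice to treat a brand-new position (previous value of zero) as always flagged are assumptions of this sketch.

```python
QTY_THRESHOLD = 0.05   # quantity must move by more than 5%
COST_THRESHOLD = 0.02  # cost basis must move by more than 2%


def relative_change(old: float, new: float) -> float:
    """Absolute relative change from old to new.

    Assumption: going from zero to anything non-zero counts as an
    infinite change, so newly opened positions are always flagged.
    """
    if old == 0:
        return float("inf") if new != 0 else 0.0
    return abs(new - old) / abs(old)


def is_flagged(old_qty: float, new_qty: float,
               old_cost: float, new_cost: float) -> bool:
    """True when either threshold is exceeded."""
    return (relative_change(old_qty, new_qty) > QTY_THRESHOLD
            or relative_change(old_cost, new_cost) > COST_THRESHOLD)
```

For example, a move from 100 to 104 shares with a 1% cost-basis change stays quiet, while a move from 100 to 106 shares is flagged.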

Network-free tests

147 tests, ~5s, no network. yfinance, OpenAI client, schwabdev, and the openclaw subprocess are all mocked at the boundary. The suite behaves the same on a laptop, in CI, and on a plane.
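A sketch of what mocking at the boundary can look like, assuming the network call is passed in as a callable. last_price_fact and its fetch_price parameter are hypothetical, not the project's actual helpers; the project may patch the libraries directly instead of injecting them.

```python
from unittest.mock import Mock


def last_price_fact(ticker: str, fetch_price) -> dict:
    """Build a market fact from an injected price fetcher.

    In production fetch_price would wrap the real market-data client;
    in tests it is a Mock, so no network is touched.
    """
    return {"affected_ticker": ticker, "last_price": float(fetch_price(ticker))}


def test_last_price_fact_without_network():
    fake_fetch = Mock(return_value=190.25)  # illustrative value, not real data
    fact = last_price_fact("AAPL", fake_fetch)
    assert fact == {"affected_ticker": "AAPL", "last_price": 190.25}
    fake_fetch.assert_called_once_with("AAPL")
```

Because the only thing the test replaces is the outermost call, the logic between the boundary and the assertion runs exactly as it would in production.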

Out-of-scope, on purpose

No trade execution. No BUY/SELL signals. No sentiment scraping. No tax advice. The list of things this agent doesn't do is as deliberate as the list of what it does — half the failure modes of LLM finance tools are about claiming capabilities they shouldn't have.

CLI Surface

Exposed as a skill set in openclaw

The agent lives as a set of openclaw skills (pm-portfolio, pm-thesis, pm-market, pm-judgment, pm-prefs). Each maps to a CLI verb on pm-cli:

Command	What it does
portfolio sync	Pull Schwab → diff → judgment → send (the proactive loop)
portfolio query	Read holdings, weights, P&L
portfolio impact	What-if a hypothetical trade, deterministic, no LLM
thesis set / get / list	Record / retrieve theses with horizons and invalidation criteria
thesis close	Mark outcome: cumplio · rompio · cambie_opinion · profit_taking
market fact	Pull cached yfinance facts for a ticker
judgment opine	On-demand LLM opinion with citation enforcement
judgment history	Replay past judgments by ticker / date range
prefs set / get	Persistent PM criteria the LLM should respect

Stack

Technology

Layer	Choice
Runtime	Python 3.12 · Typer (CLI) · packaged for openclaw skill discovery
Database	SQLite default (single user) · SQLAlchemy 2.x · Alembic migrations · Postgres-ready via `DATABASE_URL`
Brokerage	schwabdev 3.0.5 · OAuth + 7-day refresh · read-only scope
Market data	yfinance · in-process cache (TTL 6h) so the same ticker isn't hit twice in a window
LLM	OpenAI-compatible client → z.ai · glm-5.1 (judgment) · glm-5.1-turbo (NL parser)
Validation	JSON response_format · custom citation validator · deterministic fallback path
Messaging	openclaw subprocess bridge → WhatsApp group
Testing	pytest · 147 tests · ~5s · network-free
Lint	ruff
Cost	~$5 / month all-in (LLM tokens dominate; everything else is free)
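The in-process cache in the market data row can be sketched as a small TTL wrapper. This is an assumption about its shape rather than the project's code: TTLCache and get_or_fetch are hypothetical names, and the clock is injectable only so the expiry logic can be tested without waiting six hours.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Hypothetical in-process cache: one entry per key, expiring after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 6 * 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict[str, tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[str], Any]) -> Any:
        """Return the cached value if still fresh, otherwise call fetch(key)."""
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = fetch(key)  # the only place the data source is hit
        self._store[key] = (now, value)
        return value
```

Within one six-hour window, repeated lookups for the same ticker return the stored value, which is what keeps the same ticker from being fetched twice.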

Engineering Notes

Decisions worth calling out

Validate at the boundary, fall back deterministically. When the LLM violates the contract, the answer isn't "retry" — that just burns tokens and hopes the next sample is correct. It's "use the deterministic formatter and move on". Loud, cheap, always correct.
The thesis ledger forces articulation, not prediction. The agent doesn't ask the user to predict price; it asks them to articulate what would change their mind. Inversion of the typical "AI tells you what to do" pattern.
Out-of-scope as a feature. Half the README is a list of things this agent will not do (no signals, no execution, no tax advice, no sentiment). For an LLM in a high-stakes domain, the explicit non-features matter more than the features.
Read-only against the brokerage. The OAuth scope is read-only. No write path exists in the code. A future maintainer can't accidentally introduce one without granting a new scope, which is loud.
Single host, single user. Same design principle as the rest of openclaw: pick one box, run everything there. The agent lives in ~/.finance-agent, runs as a cron job, and ships messages through a WhatsApp client that's already authenticated for the human. No infra astronaut moves.
