Project · Technical Deep Dive

Clawdia · Finance Advisor

A multi-asset portfolio advisor agent (equities, bonds, ETFs) that reacts proactively to every trade detected in a Schwab account, debates holdings and hypotheticals on demand, and keeps a thesis ledger that doubles as a learning loop. Everything the LLM says is held to a citation discipline: each reply must cite facts computed in Python, so it cannot back a claim with a number it invented. It runs as an openclaw skill set, on the same single-VPS personal AI stack.

Doesn't execute trades. Only tracks, opines, and remembers.

Why this exists

I make portfolio decisions in fits and starts — a trade now, a rebalancing in three weeks. The hard part isn't the trade, it's remembering what I was thinking at the time. Six months later, looking at a position that went south, the question "was the thesis broken or am I just impatient?" is impossible to answer without a paper trail.

So this agent watches the account, prompts me to articulate a thesis at the moment I open a position, asks for an outcome when I close it, and on every detected trade weighs in with grounded commentary so I get a second opinion in real time. The bar is high: it has to be cheap, it has to never hallucinate a number, and it has to stay strictly read-only against the brokerage.

How the loop runs

┌─────────────────────────────────────────────────────────────────────┐
│  cron (hourly):  pm-cli portfolio sync                              │
└──────────────────────────────┬──────────────────────────────────────┘
                               │
                               ▼
                  ┌─────────────────────────┐
                  │  Schwab API (read-only) │
                  │  OAuth · 7-day refresh  │
                  └────────────┬────────────┘
                               │
                               ▼
                  ┌─────────────────────────┐
                  │  diff vs local DB       │  qty > 5%  → flagged
                  │  (SQLite · SQLAlchemy)  │  cost > 2% → flagged
                  └────────────┬────────────┘
                               │ (flagged diffs)
                               ▼
            ┌──────────────────┴───────────────────┐
            ▼                  ▼                   ▼
       compute_impact    market_data        thesis ledger
       deterministic    yfinance (TTL 6h)   position_thesis
            │                  │                   │
            └──────────────────┼───────────────────┘
                               ▼
                  ┌─────────────────────────┐
                  │   build_context_bundle  │   typed key/value
                  │   (Python dict)         │   pairs
                  └────────────┬────────────┘
                               │
                               ▼
                  ┌─────────────────────────┐
                  │   LLM (glm-5.1)         │
                  │   JSON response_format  │
                  │   + citation enforcement│
                  └────────────┬────────────┘
                               │
              ┌────────────────┴────────────────┐
              ▼                                 ▼
     citations valid → format_for_whatsapp     citations invalid →
     + JudgmentLog row                         deterministic fallback
              │
              ▼
     openclaw message send → WhatsApp group

Two paths into the agent: a cron-driven proactive loop that fires on every flagged position change, and a conversational loop the operator can drive on demand ("give me your opinion on X", "what happens if I buy X", "what did you say about X last week"). Both share the same context bundle, citation pipeline, and judgment log.

Citation discipline against hallucination

The temptation with an LLM looking at a brokerage account is to let it say whatever it wants. In an advisor context that's not just sloppy — it's dangerous. So the LLM is pinned by a contract enforced in Python before any output reaches the user.

  1. Context bundle as ground truth. Before the LLM is called, Python builds a typed dict of facts: affected_ticker, last_price, position_delta_pct, portfolio_weight_after, thesis_summary, etc. The bundle is the only source of facts the model is allowed to reference.
  2. Structured output. The LLM must return JSON with exact required fields: headline, market_read, portfolio_impact_read, ask_to_giu, and — critically — a citations list.
  3. Citation validation. Every string in citations must be a key that exists in the context bundle. Cite an unknown key → JudgmentValidationError. Cite fewer than 2 keys → also rejected. The intent: if the model invented a number, it has to invent a citation to back it, and the invented citation will not match a real bundle key.
  4. Deterministic fallback. On any validation failure, the system does not retry-until-it-works. It falls back to a deterministic, citation-free formatter that summarizes the bundle directly: less insightful, no LLM, but always correct.

The result: the LLM is allowed to opine, but not to invent. If it tries to inject "Apple's P/E is 28" and the bundle has no P/E key, there is no valid key it can cite for that claim, and a made-up key fails validation. Failure is loud, not silent.
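A minimal sketch of steps 3 and 4, assuming the bundle is a flat dict and the LLM reply has already been parsed from JSON. The field names and JudgmentValidationError come from the description above; validate_judgment, deterministic_fallback, render, and MIN_CITATIONS are hypothetical names used only for illustration, not the project's actual code.

```python
REQUIRED_FIELDS = ("headline", "market_read", "portfolio_impact_read",
                   "ask_to_giu", "citations")
MIN_CITATIONS = 2  # replies citing fewer than 2 keys are rejected


class JudgmentValidationError(ValueError):
    """Raised when an LLM reply breaks the citation contract."""


def validate_judgment(reply: dict, bundle: dict) -> dict:
    """Return the reply unchanged if it honours the contract, else raise."""
    missing = [f for f in REQUIRED_FIELDS if f not in reply]
    if missing:
        raise JudgmentValidationError(f"missing fields: {missing}")
    citations = reply["citations"]
    if not isinstance(citations, list) or len(citations) < MIN_CITATIONS:
        raise JudgmentValidationError("need at least 2 citations")
    # Every citation must be a string key that exists in the bundle.
    unknown = [c for c in citations
               if not isinstance(c, str) or c not in bundle]
    if unknown:
        raise JudgmentValidationError(f"unknown citation keys: {unknown}")
    return reply


def deterministic_fallback(bundle: dict) -> str:
    """Summarize the bundle directly: no LLM, no opinion, only known facts."""
    return "\n".join(f"{key}: {value}" for key, value in sorted(bundle.items()))


def render(reply: dict, bundle: dict) -> str:
    """Validate once; on failure fall back instead of retrying."""
    try:
        ok = validate_judgment(reply, bundle)
    except JudgmentValidationError:
        return deterministic_fallback(bundle)
    return "\n".join(ok[f] for f in REQUIRED_FIELDS if f != "citations")
```

Note that render never loops: a single failed validation goes straight to the fallback, which matches the "no retry" stance described in step 4.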

Decisions worth calling out

Deterministic compute, LLM opinion

compute_impact (portfolio weight changes, sector concentration, what-if simulations) is pure Python — no LLM involvement. The LLM is only given the result and asked to opine on it. Numbers are never produced by the model.
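As a rough illustration of what "pure Python" means here, the sketch below computes the weight shift caused by a hypothetical purchase. The function names, the dict-of-market-values input, and the assumption that the purchase is funded with fresh cash are all illustrative choices, not the actual compute_impact signature.

```python
def portfolio_weights(values: dict[str, float]) -> dict[str, float]:
    """Map each ticker to its share of total market value (0.0 to 1.0)."""
    total = sum(values.values())
    if total <= 0:
        return {ticker: 0.0 for ticker in values}
    return {ticker: value / total for ticker, value in values.items()}


def what_if_buy(values: dict[str, float], ticker: str, amount: float) -> dict[str, float]:
    """Weight change per ticker if `amount` of new cash buys `ticker`.

    Assumption: the purchase is funded with fresh cash, so total value grows.
    """
    before = portfolio_weights(values)
    after_values = dict(values)  # never mutate the caller's holdings
    after_values[ticker] = after_values.get(ticker, 0.0) + amount
    after = portfolio_weights(after_values)
    return {t: after[t] - before.get(t, 0.0) for t in after}
```

The returned deltas are the kind of value that would land in the context bundle (for example as position_delta_pct or portfolio_weight_after), so the model only ever comments on numbers it was handed.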

Two models, two budgets

glm-5.1 for the judgment call (one or two times per detected trade, ~$0.01-0.05 each). glm-5.1-turbo for natural-language CLI parsing (every conversational query, <$0.001 each). Splitting the budgets keeps the monthly bill under $5.

Thesis ledger as a learning loop

On every new position, the agent prompts for a thesis, a horizon, and what would invalidate it. On close, it asks for an outcome: cumplio (the thesis held), rompio (the thesis broke), cambie_opinion (I changed my mind), or profit_taking. Over time, the ledger reveals patterns the operator wouldn't notice in flight.
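One way to picture a ledger entry, sketched with a dataclass rather than the SQLAlchemy model the project presumably uses. The four outcome values are the ones listed above; the class and field names (PositionThesis, invalidation, opened_on, and so on) and the use of a date for the horizon are assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Outcome(str, Enum):
    CUMPLIO = "cumplio"                # the thesis held
    ROMPIO = "rompio"                  # the thesis broke
    CAMBIE_OPINION = "cambie_opinion"  # I changed my mind
    PROFIT_TAKING = "profit_taking"


@dataclass
class PositionThesis:
    ticker: str
    thesis: str
    horizon: date        # assumed to be a target date
    invalidation: str    # what would change the operator's mind
    opened_on: date
    outcome: Optional[Outcome] = None
    closed_on: Optional[date] = None

    def close(self, outcome: str, on: date) -> None:
        """Record the outcome; unknown outcome strings raise ValueError."""
        if self.outcome is not None:
            raise ValueError(f"{self.ticker} thesis already closed")
        self.outcome = Outcome(outcome)
        self.closed_on = on


def outcome_counts(ledger: list[PositionThesis]) -> dict[str, int]:
    """Tally closed theses by outcome: the raw material for spotting patterns."""
    return dict(Counter(t.outcome.value for t in ledger if t.outcome is not None))
```

A tally like outcome_counts is the simplest form of the learning loop: a ledger dominated by cambie_opinion tells a different story than one dominated by rompio.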

Diff thresholds, not every tick

The cron flags a position only when qty moves >5% or cost basis moves >2%. Signal-to-noise tuning to keep the proactive channel useful (not "your portfolio rounded by $0.03 today").
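The threshold check is small enough to sketch in full. The 5% and 2% limits are the ones stated above; the function names and the convention that a position opened from zero counts as a 100% change (and is therefore always flagged) are assumptions for this example.

```python
QTY_THRESHOLD = 0.05   # flag when quantity moves by more than 5%
COST_THRESHOLD = 0.02  # flag when cost basis moves by more than 2%


def relative_change(old: float, new: float) -> float:
    """Absolute relative change from old to new.

    Assumption: a move away from zero (a brand-new position) counts as 100%.
    """
    if old == new:
        return 0.0
    if old == 0:
        return 1.0
    return abs(new - old) / abs(old)


def is_flagged(old_qty: float, new_qty: float,
               old_cost: float, new_cost: float) -> bool:
    """True when either threshold is exceeded between two syncs."""
    return (relative_change(old_qty, new_qty) > QTY_THRESHOLD
            or relative_change(old_cost, new_cost) > COST_THRESHOLD)
```

With these limits, a sync that sees quantity go from 100 to 104 and cost basis from 1000 to 1010 stays silent, while a move from 100 to 106 shares triggers the judgment path.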

Network-free tests

147 tests, ~5s, no network. yfinance, the OpenAI client, schwabdev, and the openclaw subprocess are all mocked at the boundary. The suite behaves identically on a laptop, in CI, and on a plane.
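A sketch of what "mocked at the boundary" can look like in practice. Here the market-data call is passed in as a parameter and replaced with a unittest.mock.Mock; last_price_fact and fetch_quote are hypothetical names, and the real suite may well patch module attributes instead of injecting the dependency.

```python
from unittest.mock import Mock


def last_price_fact(ticker: str, fetch_quote) -> dict:
    """Build bundle facts from whatever the market-data boundary returns."""
    quote = fetch_quote(ticker)
    return {"affected_ticker": ticker,
            "last_price": round(float(quote["price"]), 2)}


def test_last_price_fact_uses_stubbed_boundary():
    # The stub stands in for the network call; nothing leaves the process.
    fetch_quote = Mock(return_value={"price": "189.987"})
    fact = last_price_fact("AAPL", fetch_quote)
    assert fact == {"affected_ticker": "AAPL", "last_price": 189.99}
    fetch_quote.assert_called_once_with("AAPL")
```

Because the only thing stubbed is the outermost call, everything between the boundary and the assertion is real code running under test.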

Out-of-scope, on purpose

No trade execution. No BUY/SELL signals. No sentiment scraping. No tax advice. The list of things this agent doesn't do is as deliberate as the list of what it does — half the failure modes of LLM finance tools are about claiming capabilities they shouldn't have.

Exposed as a skill set in openclaw

The agent lives as a set of openclaw skills (pm-portfolio, pm-thesis, pm-market, pm-judgment, pm-prefs). Each maps to a CLI verb on pm-cli:

Command                    What it does
portfolio sync             Pull Schwab → diff → judgment → send (the proactive loop)
portfolio query            Read holdings, weights, P&L
portfolio impact           What-if a hypothetical trade, deterministic, no LLM
thesis set / get / list    Record / retrieve theses with horizons and invalidation criteria
thesis close               Mark outcome: cumplio · rompio · cambie_opinion · profit_taking
market fact                Pull cached yfinance facts for a ticker
judgment opine             On-demand LLM opinion with citation enforcement
judgment history           Replay past judgments by ticker / date range
prefs set / get            Persistent PM criteria the LLM should respect

Technology

Layer          Choice
Runtime        Python 3.12 · Typer (CLI) · packaged for openclaw skill discovery
Database       SQLite default (single user) · SQLAlchemy 2.x · Alembic migrations · Postgres-ready via DATABASE_URL
Brokerage      schwabdev 3.0.5 · OAuth + 7-day refresh · read-only scope
Market data    yfinance · in-process cache (TTL 6h) so the same ticker isn't hit twice in a window
LLM            OpenAI-compatible client → z.ai · glm-5.1 (judgment) · glm-5.1-turbo (NL parser)
Validation     JSON response_format · custom citation validator · deterministic fallback path
Messaging      openclaw subprocess bridge → WhatsApp group
Testing        pytest · 147 tests · ~5s · network-free
Lint           ruff
Cost           ~$5 / month all-in (LLM tokens dominate; everything else is free)
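The 6-hour in-process cache for market data could look roughly like the sketch below. TTLCache and get_or_fetch are hypothetical names; the fetch callable stands in for the yfinance lookup, and the injectable clock is there only so the expiry logic can be exercised without waiting.

```python
import time
from typing import Any, Callable

TTL_SECONDS = 6 * 60 * 60  # 6 hours, as stated for the market-data cache


class TTLCache:
    """In-process cache: each key is refetched only after its TTL expires."""

    def __init__(self, ttl: float = TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._store: dict[str, tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[str], Any]) -> Any:
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]          # still fresh: skip the upstream call
        value = fetch(key)         # stale or missing: hit upstream once
        self._store[key] = (now, value)
        return value
```

Since the cache lives in process memory, it resets on every cron invocation unless the process is long-lived; whether the project persists cached facts across runs is not stated here.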

Decisions worth calling out

  • Validate at the boundary, fall back deterministically. When the LLM violates the contract, the answer isn't "retry" — that just burns tokens and hopes the next sample is correct. It's "use the deterministic formatter and move on". Loud, cheap, always correct.
  • The thesis ledger forces articulation, not prediction. The agent doesn't ask the user to predict price; it asks them to articulate what would change their mind. Inversion of the typical "AI tells you what to do" pattern.
  • Out-of-scope as a feature. Half the README is a list of things this agent will not do (no signals, no execution, no tax advice, no sentiment). For an LLM in a high-stakes domain, the explicit non-features matter more than the features.
  • Read-only against the brokerage. The OAuth scope is read-only. No write path exists in the code. A future maintainer can't accidentally introduce one without granting a new scope, which is loud.
  • Single host, single user. Same design principle as the rest of openclaw: pick one box, run everything there. The agent lives in ~/.finance-agent, runs as a cron job, and ships messages through a WhatsApp client that's already authenticated for the human. No infra astronaut moves.