How each agent works
This page explains — in plain language — exactly what every AI agent in the pipeline does, what data it reads, and how to interpret the numbers it produces. Reading this once turns the approval screen from a wall of text into something you can scan in thirty seconds.
What changed (M3 "Agent Depth"). The agents no longer reason over raw ticks and then invent indicator values. Stage 1 now computes a deterministic feature bundle — real RSI, MACD, moving averages, ATR, support/resistance, news features, a price forecast — and hands it to the agents as evidence. The agents reason over given numbers and are told not to restate a value they weren't handed. Where an agent has no real data, it now abstains instead of guessing. The sections below describe this current behaviour.
The feature bundle (computed before any agent runs)
Before the analysts wake up, stage 1 attaches a FeatureBundle to the market snapshot. This is pure deterministic code — no AI — and it is the single source of the numbers the agents are allowed to cite:
- Technical features — RSI(14), MACD(12/26/9), SMA 20/50/200, EMA20,
ATR(14), Bollinger(20,2), swing high/low, support/resistance and distance to each, gap %, and volume-vs-average.
- News features — each headline tagged by event class
(earnings / guidance / rating / regulatory / macro), recency-weighted, de-duplicated, and given a per-item sentiment score.
- Sentiment features — news-driven sentiment, plus a flag for whether a real
positioning feed (options / FII-DII) is wired (it is not yet, so the flag is off).
- Fundamentals — P/E, P/B, EV/EBITDA, growth, margins **when a source is
wired** (none is yet, so this is marked unavailable).
- Price forecast — a baseline expected move with a band over the horizon
(see the bottom of this page). Evidence only — it never touches risk or sizing.
A composition-root wrapper guarantees the bundle is always present: if any feed forgets to attach one, a fallback bundle is computed from the recent closes and news, so the evidence layer can never silently switch off.
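The fallback guarantee described above can be sketched as follows. This is a minimal illustration, not the project's actual wrapper: `Snapshot`, `ensure_bundle`, `is_fallback`, and the two bundle fields are hypothetical names, and the real FeatureBundle carries far more than an SMA and a headline count.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FeatureBundle:
    # Hypothetical, heavily reduced stand-in for the real bundle.
    sma20: Optional[float]
    headline_count: int
    is_fallback: bool = False


@dataclass
class Snapshot:
    symbol: str
    closes: list
    headlines: list = field(default_factory=list)
    features: Optional[FeatureBundle] = None


def ensure_bundle(snapshot: Snapshot) -> Snapshot:
    """Attach a fallback bundle if the feed did not attach one."""
    if snapshot.features is not None:
        return snapshot  # the feed did its job, leave the bundle untouched
    window = snapshot.closes[-20:]
    # Only report an SMA20 when a full 20-close window is available.
    sma20 = sum(window) / len(window) if len(window) == 20 else None
    snapshot.features = FeatureBundle(
        sma20=sma20,
        headline_count=len(set(snapshot.headlines)),  # naive de-duplication
        is_fallback=True,
    )
    return snapshot
```

Marking the bundle as a fallback is an assumption of this sketch; it would let a reviewer see when the evidence came from the safety net rather than the primary feed.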
The agent landscape
Nine AI agents run across four pipeline stages. All of them share the same underlying design: they receive a structured prompt built from the feature bundle and live data, call the configured LLM (or the offline mock), and return a Pydantic-validated object whose fields feed the next stage.
| Stage | Agents | Runs |
|---|---|---|
| 2 | News · Sentiment · Technical · Fundamental | In parallel |
| 5 | Bull Researcher · Bear Researcher → Research Manager | Bull/bear, then a rebuttal round, then the manager |
| 6 | Trader | After debate |
| 12 | Reviewer | After position closes |
Stages 7, 9, and 11 contain no agents — they are deterministic code only.
Anti-hallucination ticker guard. Every agent that emits a symbol is checked, centrally, against the symbol it was asked about. If a model ever returns a note or thesis for a different instrument, the stage fails closed — it does not slip through.
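A fail-closed symbol check of this kind might look like the sketch below. The names `TickerMismatchError` and `guard_symbol` are hypothetical, and the case and whitespace normalisation is an assumption rather than documented behaviour.

```python
class TickerMismatchError(ValueError):
    """Raised when an agent answers about a different instrument."""


def guard_symbol(requested: str, returned: str) -> str:
    # Assumption: symbols are compared case-insensitively, ignoring padding.
    if returned.strip().upper() != requested.strip().upper():
        raise TickerMismatchError(
            f"asked about {requested!r}, agent returned {returned!r}"
        )
    return returned
```

Raising instead of returning a flag is what makes the guard fail closed: a caller cannot forget to check the result.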
Stage 2 — The four analyst agents
All four run in parallel over the same snapshot and its feature bundle. They cannot see each other's notes. Each returns an AnalystNote.
What every analyst note contains
| Field | Range | Meaning |
|---|---|---|
| stance | −1.00 to +1.00 | Directional lean. Positive = bullish, negative = bearish, near zero = neutral |
| confidence | 0.00 to 1.00 | How sure the agent is of its own stance |
| summary | text | One-sentence conclusion |
| key_points | list | The supporting bullets |
| subscores | map | Per-factor scores (e.g. momentum, trend) that fed the stance |
| evidence | list | The exact feature values or headlines the agent cited |
| expectation_gap | number / blank | How far reality sits from what the agent expected |
| time_horizon | text | The horizon the note is reasoning over |
Quorum rule. At least 3 of the 4 analysts must succeed (return a note) for the pipeline to continue. If only 2 or fewer succeed, the run is marked DEGRADED and halts before the debate stage. An abstaining note still counts as a successful note for quorum — see below.
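As a rough sketch of the quorum rule: the function name `quorum_status` and the use of `None` for a failed analyst are assumptions made for illustration.

```python
from typing import Optional, Sequence


def quorum_status(notes: Sequence[Optional[dict]], required: int = 3) -> str:
    """Return "OK" when enough analysts returned a note, else "DEGRADED".

    A failed analyst is represented by None. An abstaining note is still a
    note, so it counts toward the quorum.
    """
    succeeded = sum(1 for note in notes if note is not None)
    return "OK" if succeeded >= required else "DEGRADED"
```

With one opinionated note, two abstentions, and one failure the run continues; with two failures it halts before the debate stage.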
Self-critique pass. If an analyst returns a note below a confidence floor (0.40), it runs exactly one self-review pass ("what would change your stance, and is the low confidence justified?") before emitting. This is cheap and only fires when the model is genuinely unsure. Offline, the deterministic mock returns its calibrated note directly and skips the extra pass.
News Analyst
What it does. Reads the event-tagged news features for the symbol and rates how bullish or bearish the news is, by event class.
Inputs the agent sees:
- Last traded price and previous close
- The computed news features: each headline's event type
(earnings/guidance/rating/regulatory/macro), recency weight, and per-item sentiment score, plus the net recency-weighted sentiment and how many items were unique after de-duplication
- The recent raw headlines (still shown, inside the fence below)
- Up to 5 macro indicators
Security note. The news text is third-party data and could contain adversarial content. Before being placed in the prompt, every headline is wrapped in unforgeable fence markers (<UNTRUSTED_FEED_DATA>…</UNTRUSTED_FEED_DATA>) and the system prompt tells the model to treat that block as data only, not as instructions. The same neutralisation is applied to headlines echoed inside the news-features block, so the fence cannot be bypassed through the features path.
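One way to make such a fence hard to forge is to strip any fence-like marker from the untrusted text before wrapping it. The sketch below assumes that approach; `neutralise`, `fence_headlines`, and the `[removed]` placeholder are hypothetical, and the real neutralisation may work differently.

```python
import re

OPEN = "<UNTRUSTED_FEED_DATA>"
CLOSE = "</UNTRUSTED_FEED_DATA>"
# Matches an opening or closing fence marker, tolerating case and padding.
_FENCE_RE = re.compile(r"</?\s*UNTRUSTED_FEED_DATA\s*>", re.IGNORECASE)


def neutralise(text: str) -> str:
    """Remove anything in the feed text that looks like a fence marker."""
    return _FENCE_RE.sub("[removed]", text)


def fence_headlines(headlines: list) -> str:
    """Wrap neutralised headlines in exactly one open/close marker pair."""
    body = "\n".join(f"- {neutralise(h)}" for h in headlines)
    return f"{OPEN}\n{body}\n{CLOSE}"
```

A headline that tries to close the fence early ends up as inert text inside it, so the only markers the model sees are the ones the system wrote.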
How it scores:
- Rates directional impact per event class and cites the specific headline or
feature behind each point — it is told not to invent figures it wasn't given.
- A surprise dividend or contract win typically produces a stance near +0.6 to
+0.9; a fraud allegation or profit warning near −0.6 to −0.9.
- confidence is typically lower when headlines are sparse or ambiguous.
Sentiment Analyst
What it does. Estimates the "temperature" of investor positioning around the stock — leaning on the news-driven sentiment it is actually given.
Inputs the agent sees:
- Last traded price and previous close
- The computed sentiment features: news-driven sentiment and a flag for
whether a real positioning feed (options open interest, FII/DII flows) is wired
- Up to 5 macro indicators
It abstains when there is nothing real to use. In the current build there is no options / FII-DII positioning feed wired. When there is also no news to derive sentiment from, the agent returns a deterministic abstention note — no LLM call at all — with stance 0, confidence 0.15, and model_used = "deterministic-abstain". When news is present, it leans on that news sentiment and is told not to fabricate flows it wasn't given — it lowers its confidence instead.
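The abstention path can be sketched like this. The real note is a Pydantic model with more fields; a plain dataclass is used here to keep the example dependency-free, and `sentiment_note` and `call_llm` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class AnalystNote:
    stance: float
    confidence: float
    summary: str
    model_used: str


def sentiment_note(
    has_positioning_feed: bool,
    news_items: Sequence[str],
    call_llm: Callable[[Sequence[str]], AnalystNote],
) -> AnalystNote:
    # Nothing real to reason over: abstain without spending an LLM call.
    if not has_positioning_feed and not news_items:
        return AnalystNote(
            stance=0.0,
            confidence=0.15,
            summary="Abstaining: no positioning feed and no news.",
            model_used="deterministic-abstain",
        )
    return call_llm(news_items)
```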
Stance interpretation (when it does take a side):
- Positive: bullish news-driven tone.
- Negative: bearish news-driven tone.
- Near 0 / low confidence: thin or mixed signals, or abstaining on absent data.
Technical Analyst
What it does. Reads the computed indicators and the price forecast to decide whether the chart is set up for a move up or down.
Inputs the agent sees:
- Last traded price, previous close, and last-tick volume
- The full computed technical block: RSI(14), MACD(12/26/9) with signal and
histogram, SMA 20/50/200, EMA20, ATR(14), Bollinger bands, support/resistance and distance to each, gap %, and volume-vs-average
- The baseline price forecast (expected move + band) as evidence only
This is the big M3 change. The technical analyst used to be handed only "the 5 most recent tick prices" and then asked to describe RSI and MACD — which it had to invent. It now reads the real computed indicators and is explicitly told to use those values and not to restate a different number. The hallucinated-RSI problem is gone.
How it reasons:
- Reads the trend (SMA/EMA stack), momentum (RSI/MACD), volatility
(ATR/Bollinger), and proximity to support/resistance into a directional stance, citing the indicator behind each point.
- A bullish MACD crossover + price above SMA200 + RSI in the 50–65 range would
produce a stance around +0.5 to +0.7; a breakdown below support with bearish MACD produces stances near −0.5 to −0.8.
- Confidence tends to be high when multiple indicators agree, and low when they contradict each other (e.g. bullish trend but RSI diverging).
Fundamental Analyst
What it does. Evaluates the business quality and valuation of the company — when it actually has the ratios to do so.
Inputs the agent sees:
- Last traded price and previous close
- The computed fundamentals block: P/E, P/B, EV/EBITDA, revenue growth, and
net margin — when a source is wired
- Up to 5 macro indicators
It abstains when no source is wired. No NSE/BSE fundamentals source is wired in the current build, so this agent returns the deterministic abstention note (stance 0, confidence 0.15, model_used = "deterministic-abstain") rather than inventing multiples. When a source is present, it evaluates valuation, growth, and margin quality from the reported ratios and cites the figure behind each point — it is told not to fabricate multiples.
Why this still matters for your review. Even abstaining is information: a visible "abstaining — no fundamentals source wired" note tells you the trade is resting on technical and news evidence only, not on a view of the underlying business. The quorum is still reachable on the other three analysts.
Stage 5 — The debate layer (3 agents)
After the four analyst notes are collected, the debate stage takes over. In M3 it runs as two passes plus a synthesis: bull and bear each build their case, then each gets one bounded rebuttal round answering the other, and only then does the manager judge.
Why a debate (and why a rebuttal)?
One analyst panel can reach a consensus that is wrong. The debate forces the system to articulate the strongest opposing argument before committing. The rebuttal round then makes each side answer the other's best points rather than talk past them, so the manager judges the cases after they have been tested.
Bull Researcher
What it does. Builds the strongest case for buying — then rebuts the bear.
Pass 1 (build) inputs:
- Last price and previous close
- Only the bullish analyst notes (stance > 0), with stance and confidence
- All key points from the full panel (up to 2 per analyst)
Pass 2 (rebuttal): sees its own initial case and the bear's case, and returns a sharpened BullCase whose supporting points directly address the bear.
What it produces — BullCase: argument (the case for LONG), supporting_points, and risks it acknowledges even as a bull.
If no analyst is explicitly bullish, the agent argues from the available data anyway, so there is always a debate.
Bear Researcher
Mirrors the bull exactly, for the short side: a build pass over the bearish notes, then a rebuttal pass answering the bull. Produces a BearCase with argument, supporting_points, and acknowledged upside risks (earnings surprise, short-squeeze, sector re-rating).
The rebuttal round runs once and is bounded; if a rebuttal call fails, the system safely falls back to that side's initial case.
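The bounded rebuttal with its safe fallback amounts to a single guarded call. In this sketch `rebut_or_fallback` and the `rebut` callable are hypothetical, and catching every `Exception` is an assumption about what counts as a failed call.

```python
from typing import Callable, TypeVar

Case = TypeVar("Case")


def rebut_or_fallback(
    initial_case: Case,
    opposing_case: object,
    rebut: Callable[[Case, object], Case],
) -> Case:
    """Run exactly one rebuttal; keep the initial case if the call fails."""
    try:
        return rebut(initial_case, opposing_case)
    except Exception:
        # A failed rebuttal must not sink the debate.
        return initial_case
```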
Research Manager
What it does. Acts as a neutral judge. Reads the full panel and both rebutted cases, declares a winner, and assigns a conviction score.
Inputs the agent sees:
- Last price and previous close
- All analyst notes with stance numbers
- The post-rebuttal bull and bear arguments and supporting points
What it produces — DebateResult:
| Field | Range / type | Meaning |
|---|---|---|
| winner | LONG or SHORT | Which direction the debate favoured |
| conviction | 0.00 to 1.00 | How decisive the verdict was (after calibration) |
| manager_rationale | text | Explicit reasoning for why one side won |
| key_disagreements | list | Where the panel / the two sides genuinely conflict |
| falsifiers | list | What evidence would flip the winner |
| rebuttals | list | The rebutted cases the verdict was judged on |
How conviction is scored — and then calibrated. The model proposes a conviction, but code then deterministically calibrates it down when the analyst panel diverges from the chosen winner. The detail that matters:
- If no analyst opposes the winner, conviction is left unchanged.
- The denominator counts only analysts that actually took a side — abstaining
or neutral notes do not dilute the disagreement. So one analyst opposing the winner with three abstaining reads as full opposition, not a 25% minority.
- The haircut scales with the share of opposition, up to 60% of the model's proposed conviction at full opposition.
This means a split panel can no longer produce a confident-looking number — the conviction you see has already been knocked down to reflect real disagreement.
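A sketch of the calibration, under two loud assumptions: the haircut is taken to scale linearly with the share of sided analysts who oppose the winner, and a stance within ±0.05 of zero is treated as not taking a side. Neither detail is stated in this guide. The function name is hypothetical as well.

```python
MAX_HAIRCUT = 0.60   # from the guide: up to a 60% haircut
NEUTRAL_BAND = 0.05  # assumption: |stance| at or below this is "no side"


def calibrate_conviction(proposed: float, winner: str, stances: list) -> float:
    """Scale the model's conviction down by panel opposition."""
    sign = 1 if winner == "LONG" else -1
    # Only analysts that took a side enter the denominator.
    sided = [s for s in stances if abs(s) > NEUTRAL_BAND]
    opposing = [s for s in sided if s * sign < 0]
    if not opposing:
        return proposed  # nobody opposes the winner: leave it unchanged
    opposition = len(opposing) / len(sided)
    return proposed * (1 - MAX_HAIRCUT * opposition)
```

Under these assumptions, a proposed conviction of 0.80 for LONG with one bearish analyst and three abstentions becomes 0.80 × (1 − 0.60) = 0.32, while one bearish against one bullish gives 0.80 × (1 − 0.30) = 0.56.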
Reading the approval screen. The winner pill shows direction and conviction. The manager rationale tells you why this side won; the falsifiers tell you what to watch for that would prove the trade wrong.
Stage 6 — Trader agent
What it does. Synthesises everything into a concrete trade proposal — but the prices are now derived deterministically, not invented by the model.
Inputs the agent sees:
- Last price and previous close
- Debate result: winner, conviction, and the manager rationale
- The deterministic price anchors (current price, ATR, support/resistance)
- All four analyst notes with stance and confidence
What it produces — TradeThesis:
| Field | Meaning |
|---|---|
| direction | LONG or SHORT (follows the debate winner) |
| conviction | 0–1 |
| entry | Entry price in ₹ |
| target | Take-profit price in ₹ |
| stop | Stop-loss price in ₹ |
| horizon_sessions | Expected holding period in trading sessions |
| rationale | The trader's reasoning |
| invalidation_conditions | What would invalidate the thesis |
| key_risks | The main risks to the trade |
| expected_horizon | A human-readable horizon note |
How entry / target / stop are set (the M3 change). The LLM owns direction, rationale, and risks; the code owns the prices. After the model proposes a thesis, the system overwrites the levels from deterministic anchors:
- Entry = the current price.
- Stop ≈ 2 × ATR(14) from entry (on the correct side for the direction).
- Target ≈ a 2R multiple — twice the entry-to-stop distance.
The old guide said the trader "infers reasonable levels" and was warned not to "invent arbitrary numbers." That framing is now obsolete — the trader is no longer trusted to pick prices at all. If the ATR is so small that 2×ATR rounds the stop back onto entry (a sub-tick ATR), the system keeps the model's own prudent prices rather than reject a sound thesis.
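The anchoring rules can be sketched as below. The tick size of ₹0.05, the rounding helper, and the names `Levels` and `anchor_levels` are assumptions for illustration; the sketch also assumes the current price already sits on the tick grid.

```python
from dataclasses import dataclass


@dataclass
class Levels:
    entry: float
    target: float
    stop: float


def round_to_tick(price: float, tick: float) -> float:
    # Snap to the nearest tick, then clean up float noise to paise.
    return round(round(price / tick) * tick, 2)


def anchor_levels(direction: str, price: float, atr: float,
                  model_levels: Levels, tick: float = 0.05) -> Levels:
    """Overwrite the model's prices with deterministic anchors."""
    side = 1 if direction == "LONG" else -1
    entry = round_to_tick(price, tick)
    stop = round_to_tick(price - side * 2 * atr, tick)  # 2 x ATR, correct side
    if stop == entry:
        # Sub-tick ATR: the stop collapsed onto entry, keep the model's prices.
        return model_levels
    risk = abs(entry - stop)
    target = round_to_tick(entry + side * 2 * risk, tick)  # 2R target
    return Levels(entry=entry, target=target, stop=stop)
```

For a LONG at ₹100 with an ATR of 2.5, this yields a stop of ₹95 and a target of ₹110.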
A thesis validator runs after anchoring and fails the stage closed if the geometry is wrong: a stop equal to entry, a target on the wrong side of entry, or a stop further than 4×ATR away are all rejected before the trade can reach you. (This is a thesis-side check; the deterministic risk engine in stage 7 is separate and unchanged.)
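The three geometry checks named above might be expressed like this. `ThesisGeometryError` and `validate_geometry` are hypothetical names, and the sketch covers only the three rejections this guide lists.

```python
class ThesisGeometryError(ValueError):
    """Raised to fail the stage closed on bad price geometry."""


def validate_geometry(direction: str, entry: float, target: float,
                      stop: float, atr: float) -> None:
    side = 1 if direction == "LONG" else -1
    if stop == entry:
        raise ThesisGeometryError("stop equals entry")
    if (target - entry) * side <= 0:
        raise ThesisGeometryError("target is on the wrong side of entry")
    if abs(entry - stop) > 4 * atr:
        raise ThesisGeometryError("stop is further than 4 x ATR from entry")
```

Because the function raises rather than returning a boolean, a thesis with bad geometry cannot continue to the approval screen by accident.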
Conviction-based model escalation. When the manager's conviction is high (≥ 0.75), the trader escalates to the Opus model tier (the manager's tier) for that one call — the bigger the decision, the more the deeper model is worth. Normal-conviction runs stay on the default (Sonnet) tier.
Important: the trader does not size the position. Quantity is computed by the deterministic risk engine in stage 7 from the stop distance and capital. The thesis carries only the three prices (anchored by code, as described above) and the horizon.
Stage 7 — Risk engine (not an agent)
This stage is included here for completeness because the approval screen shows its output alongside the agent outputs.
It is pure deterministic code — no LLM. It takes the trader's thesis and runs its checks:
| Check | What it tests |
|---|---|
| degenerate_thesis | Target and stop must be on the correct sides of entry |
| size_nonzero | Computed share count must be at least 1 |
| daily_loss_cap | Risk amount ≤ configured daily loss cap |
| margin_sufficient | Position notional ≤ available capital |
| max_notional_pct | Single-trade notional ≤ max % of capital |
| max_positions | Open positions count < configured maximum |
| exposure_cap | Portfolio gross exposure after this trade ≤ cap |
Sizing formula:
quantity = floor( (capital × risk_per_trade_pct / 100) / stop_distance )
Where stop_distance = |entry − stop|. Because the quantity is rounded down, a stop that fills at the stop price loses at most risk_per_trade_pct percent of capital, never more.
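The sizing formula translates directly into code. This is a sketch with a hypothetical function name; the zero-distance guard is an added assumption, since the formula is undefined when the stop equals entry.

```python
import math


def position_size(capital: float, risk_per_trade_pct: float,
                  entry: float, stop: float) -> int:
    """quantity = floor((capital * risk_per_trade_pct / 100) / stop_distance)"""
    stop_distance = abs(entry - stop)
    if stop_distance == 0:
        raise ValueError("stop equals entry, stop_distance is zero")
    risk_budget = capital * risk_per_trade_pct / 100
    return math.floor(risk_budget / stop_distance)
```

With ₹10,00,000 of capital, 1% risk per trade, entry at ₹2,500 and stop at ₹2,450, the risk budget is ₹10,000 and the stop distance is ₹50, so the quantity is 200 shares. Moving the stop to ₹2,447 gives floor(10,000 / 53) = 188 shares and a worst-case loss of ₹9,964, just under the budget.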
The risk decision on the approval screen shows each check with a ✓ or ✗. A single ✗ rejects the trade. You should not approve a trade the risk engine rejected.
Stage 12 — Reviewer agent
What it does. After a position closes (stop hit, target hit, or manual close), the reviewer critiques the outcome against the thesis and writes a structured lesson to the memory store for future runs.
Inputs the agent sees:
- The original thesis: direction, entry, target, stop, conviction, rationale
- The actual outcome: win, loss, or scratch
- The realized P&L (in ₹)
What it produces — TradeReview:
| Field | Meaning |
|---|---|
| critique | An honest assessment of what the thesis got right and wrong |
| lessons | Actionable takeaways for future runs |
| signal_evolution | How the thesis fared (see below) |
| thesis_vs_outcome | Predicted vs realised deltas (e.g. predicted target/stop % vs the outcome) |
| memory_record | A structured, tagged record written back to memory |
signal_evolution is classified deterministically from the factual outcome, never from the model's wording: a win → Realized, a loss → Falsified, a scratch → Weakened. (The full set of states is Strengthened, Weakened, Falsified, Realized, and Unknown.)
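The deterministic mapping can be sketched as a lookup. The enum values and the function name are hypothetical, and returning Unknown for an unrecognised outcome label is an assumption.

```python
from enum import Enum


class SignalEvolution(str, Enum):
    STRENGTHENED = "strengthened"
    WEAKENED = "weakened"
    FALSIFIED = "falsified"
    REALIZED = "realized"
    UNKNOWN = "unknown"


_OUTCOME_TO_SIGNAL = {
    "win": SignalEvolution.REALIZED,
    "loss": SignalEvolution.FALSIFIED,
    "scratch": SignalEvolution.WEAKENED,
}


def classify_signal(outcome: str) -> SignalEvolution:
    """Classify from the factual outcome only, never from model wording."""
    return _OUTCOME_TO_SIGNAL.get(outcome, SignalEvolution.UNKNOWN)
```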
Grounding rule. The realized P&L, the outcome label, and the signal-evolution state are all factual — derived from known numbers, not the model. Only the critique and lessons are model-authored. This prevents the learning loop from recording a hallucinated outcome. The memory record is also tagged with the symbol, direction, outcome, and signal so hybrid retrieval can match a similar setup next time.
How memory enriches every agent
Every agent has optional access to a memory store. The default store is now a hybrid retriever — it combines a BM25 keyword score with a hashed dense vector and fuses the two rankings with Reciprocal Rank Fusion (RRF). It is pure standard-library code: deterministic, offline, and needs no embedding model or network. ChromaDB is still available as an opt-in vector backend (TRADING_MEMORY=chroma).
The old guide described "a vector memory store (ChromaDB when enabled; in-memory mock otherwise)." The default is now this hybrid (BM25 + dense + RRF) store, not ChromaDB.
Before each agent's prompt is assembled, the base class queries the store for the most relevant past records for the current symbol. A relevance floor means only records that share at least one query term are surfaced — so an irrelevant past note can no longer leak into the prompt. If hits are found, they are prepended:
Relevant past notes:
- [review] RELIANCE: Chased a breakout; stock reversed. Lesson: wait for volume confirmation. [LONG RELIANCE | outcome=loss | signal=falsified | ...]
- [review] RELIANCE: Bull thesis on refinery margins held; hit target in 4 sessions. [LONG RELIANCE | outcome=win | signal=realized | ...]
[rest of prompt...]
The memory grows automatically as the reviewer writes its structured record after each closed position.
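The relevance floor and the fusion step can be sketched with the standard library alone. The function names are hypothetical, the tokenizer is deliberately naive, and the constant k = 60 is the value commonly used for Reciprocal Rank Fusion, assumed here rather than taken from the project. The BM25 and dense rankings are passed in as plain ordered lists of record ids.

```python
import re


def tokens(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))


def relevance_floor(query: str, records: list) -> list:
    """Keep only records that share at least one term with the query."""
    query_terms = tokens(query)
    return [r for r in records if query_terms & tokens(r)]


def rrf_fuse(rankings: list, k: int = 60) -> list:
    """Fuse ranked id lists: score(d) = sum over rankings of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id to stay deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

RRF uses only rank positions, so the BM25 score and the dense similarity never need to be put on a common scale.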
The price forecast (evidence only)
The feature bundle includes a baseline price forecast: an expected % move with a low/high band over the thesis horizon. The current implementation is a classical drift-plus-volatility-band baseline (deterministic, offline). It is shown to the technical analyst and the trader as evidence only and is physically barred from the risk and execution path — a forecast can inform reasoning but can never size or route a trade. A more sophisticated news-aware model can be dropped in later behind the same interface without changing any agent.
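One common way to build a drift-plus-volatility-band baseline is from the mean and standard deviation of recent log returns. The sketch below assumes that formulation; the function name, the one-sigma band, and the square-root-of-time scaling are assumptions, not the project's confirmed method.

```python
import math
from statistics import mean, stdev


def baseline_forecast(closes: list, horizon: int, z: float = 1.0) -> dict:
    """Expected % move and a low/high band over `horizon` sessions."""
    # Needs at least three closes so that two returns feed stdev().
    returns = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    drift = mean(returns) * horizon                      # drift scales with time
    band = z * stdev(returns) * math.sqrt(horizon)       # volatility with sqrt(time)

    def to_pct(log_return: float) -> float:
        return (math.exp(log_return) - 1) * 100

    return {
        "expected_pct": to_pct(drift),
        "low_pct": to_pct(drift - band),
        "high_pct": to_pct(drift + band),
    }
```

The function returns percentages only, which fits the evidence-only role: there is no price level here for a sizing or routing path to consume.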
Quick interpretation guide
Use this when scanning the approval screen quickly:
| Pattern | What it likely means |
|---|---|
| All non-abstaining stances positive, high confidence | Strong consensus — rare and worth taking seriously |
| Mixed stances (some +, some −) | Genuine uncertainty; conviction will already be calibrated down — check the manager rationale and falsifiers |
| Sentiment / Fundamental abstaining | No real data source for that factor; the trade rests on the other evidence — not a red flag by itself |
| High debate conviction (≥ 0.75) + risk APPROVED | Cleanest signal; the trader will also have used the deeper Opus model |
| Low debate conviction (< 0.45) | Debate was close or the panel diverged; think twice |
| Risk check REJECTED | Hard stop — the trade violates a portfolio rule, do not override |
| Stop / target look tight or wide | They are derived from ATR (≈2×ATR stop, 2R target) — that is the volatility talking, not a guess |
| News high-confidence bearish | Event-driven risk; the thesis is fighting active headwinds |