Methodology
About these numbers
Every company page on StatDesk draws on two parallel sources: the machine-readable iXBRL filing that Companies House publishes, and the printed PDF accounts, read by a structured-output AI extraction layer. Here's how that works, where it's reliable, and where the two views can diverge.
What we extract
For every UK company we surface, the latest annual accounts (AA) filing is processed through both pipelines:
- iXBRL parser — free, deterministic, runs on the tagged XML version of the filing. Reads the structured concepts the filer chose to tag.
- AI PDF extraction (Gemini 2.5 Flash) — paid, probabilistic, runs on the human-readable PDF version of the same filing. Reads every line item and disclosure the document carries, including auditor / going-concern / subsidiary metadata that iXBRL doesn't tag.
Both views are cached. The displayed numbers come from iXBRL where available; the AI-extracted PDF view fills in qualitative metadata and serves as a cross-check. For consolidated group filers, the PDF view is preferred because the iXBRL on these filings often tags only the parent company's standalone numbers.
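The precedence rule can be sketched in a few lines. This is a minimal illustration, not StatDesk's actual code: the `Displayed` record, the `is_group_filer` flag, and the function name are hypothetical, and falling back to the other view when the preferred one is missing is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Displayed:
    value: Optional[int]
    source: Optional[str]  # "ixbrl", "pdf", or None when neither view has the field


def pick_display_value(
    ixbrl_value: Optional[int],
    pdf_value: Optional[int],
    is_group_filer: bool,
) -> Displayed:
    """Choose which cached view supplies the displayed number for one field."""
    # Consolidated group filers: prefer the PDF view (consolidated numbers).
    # Everyone else: prefer iXBRL. In both cases, fall back to the other
    # view if the preferred one is missing (an assumption of this sketch).
    if is_group_filer:
        ordered = [("pdf", pdf_value), ("ixbrl", ixbrl_value)]
    else:
        ordered = [("ixbrl", ixbrl_value), ("pdf", pdf_value)]
    for source, value in ordered:
        if value is not None:
            return Displayed(value, source)
    return Displayed(None, None)
```

Keeping the source label next to the value makes it straightforward to show where a displayed number came from.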
How reliable is the AI extraction?
We continuously run a "truth-check" that compares the two extractors filing-by-filing on the latest matched fiscal year. On a randomized n=200 cross-hub sample drawn 2026-05-09:
- Where both extractors return a comparable value (~60% of schema fields), the two views agree on the exact integer 80.2% of the time.
- A further 5.0% are disagreements caused by parent-vs-consolidated scope differences on group filers. Both views are right at their own level; the PDF view gives the consolidated group number, which is usually the operationally correct one.
- 1.8% are disagreements that reflect a UK iXBRL convention split on cash-flow outflows like dividends_paid: filers tag it as positive (magnitude) or negative (cash-flow direction) about 50/50. The magnitude is the same either way; only the sign differs.
- The remaining ~10% are taxonomy edge cases (intangibles split, fixed assets gross vs net) and a low-single-digit residual of genuine extraction error.
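One way to picture the truth-check is as a per-field classifier over the two extracted values. The sketch below is illustrative only: the category labels, the function name, and the `is_group_filer` flag are hypothetical. In particular, treating every remaining mismatch on a group filer as a possible scope difference is a simplifying assumption.

```python
from collections import Counter
from typing import Optional


def classify_pair(
    ixbrl_value: Optional[int],
    pdf_value: Optional[int],
    is_group_filer: bool = False,
) -> str:
    """Label one field comparison between the two extractors."""
    if ixbrl_value is None or pdf_value is None:
        return "not_comparable"  # at most one view has the field
    if ixbrl_value == pdf_value:
        return "exact_match"
    if abs(ixbrl_value) == abs(pdf_value):
        return "sign_convention"  # same magnitude, opposite sign
    if is_group_filer:
        # Parent-only vs consolidated numbers; a heuristic flag for review.
        return "possible_scope_difference"
    return "other_disagreement"  # taxonomy edge case or extraction error


# Hypothetical sample of (ixbrl_value, pdf_value, is_group_filer) triples.
sample = [
    (1200, 1200, False),
    (500, -500, False),
    (40, 9000, True),
    (None, 75, False),
    (310, 295, False),
]
tally = Counter(classify_pair(a, b, g) for a, b, g in sample)
```

Dividing each tally by the number of comparable pairs would give agreement and disagreement rates of the kind quoted above.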
Across all field-comparisons, the AI PDF view independently provides ~25× the unique coverage of iXBRL alone (38.5% PDF-only vs 1.5% iXBRL-only). Most companies' financial records on StatDesk would be substantially less complete without it.
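The ~25× figure is the ratio of the two single-source shares: 38.5 / 1.5 ≈ 25.7. As a rough sketch of how such shares could be tallied from per-field value pairs (the function name and input shape are assumptions, as is excluding fields that neither view returns):

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def coverage_shares(
    pairs: Iterable[Tuple[Optional[int], Optional[int]]]
) -> Dict[str, float]:
    """Share of field comparisons covered by both views or by only one."""
    counts: Counter = Counter()
    for ixbrl_value, pdf_value in pairs:
        if ixbrl_value is not None and pdf_value is not None:
            counts["both"] += 1
        elif pdf_value is not None:
            counts["pdf_only"] += 1
        elif ixbrl_value is not None:
            counts["ixbrl_only"] += 1
        # Fields missing from both views are skipped (assumption).
    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: n / total for key, n in counts.items()}


# Ratio implied by the shares quoted in the text.
pdf_only_share = 38.5
ixbrl_only_share = 1.5
unique_coverage_ratio = pdf_only_share / ixbrl_only_share  # about 25.7
```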
Why "Group / consolidated" appears on some pages
When a company is the holding company of a group, its annual accounts typically report two views:
- Parent-only standalone — the holding company on its own, often with little operating activity (because the business is run through subsidiaries). Can show small or negative net assets.
- Consolidated group — the holding company plus all subsidiaries, which is the operationally relevant view of the business.
iXBRL filings for these companies frequently tag only the parent-only numbers. The AI-extracted PDF view captures the consolidated group view, which is what we display when the badge appears in the page header. The official Companies House filing will still show both — what we're surfacing is the financially meaningful one for risk / valuation / headcount purposes.
What we do NOT extract
- Multi-year history — the latest filing typically carries the current year + 1 prior. We don't currently walk older filings to backfill 5+ years of trend data per company.
- Full footnote text — we capture short verbatim excerpts of going-concern, related-party, accounting policy changes, and other high-signal notes. The full footnote corpus is not in the structured record.
- Unaudited filings beyond the rich tier — micro-entity and abridged filings deliberately omit P&L disclosure; we cover them with metadata only.
If a number looks wrong
Click through to the Companies House profile from the page header to see the original filing. If you find a meaningful discrepancy between StatDesk and the original, it's worth knowing about — the truth-check tooling continues to validate the pipeline against every randomized sample we draw, and confirmed corrections feed back into the extraction prompt or parser logic.