- Timeline — documents and events interleaved by date, newest first, scoped to the active org or all your orgs.
- Entities — people, organisations, laws, places, event references, aggregated across every document that has been extracted.
- Citations & tags — most-cited legal references and tag histogram, to show what the corpus leans on.
At the top of Insights there are two progress bars: Embeddings and Entities. Each shows how many documents have been mined vs. the total. Click Backfill N pending to process them in batches of 5; the UI polls the endpoint until remaining hits zero. Hit Stop to interrupt.
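The backfill loop described above can be sketched roughly as follows. This is a minimal illustration, not SDI's actual client code: `run_backfill`, `make_fake_endpoint`, and the idea that each call returns the remaining count are assumptions made for the example; only the batch size of 5 and the "poll until remaining is zero, or Stop is pressed" behaviour come from the text.

```python
from typing import Callable

BATCH_SIZE = 5  # from the text: documents are processed in batches of 5


def run_backfill(
    process_batch: Callable[[int], int],
    should_stop: Callable[[], bool] = lambda: False,
) -> int:
    """Call process_batch until it reports zero remaining or a stop is requested.

    process_batch(limit) is a hypothetical stand-in for the backfill endpoint:
    it processes up to `limit` documents and returns how many are still pending.
    Returns the number of calls that were made.
    """
    calls = 0
    while not should_stop():  # the Stop button would flip this to True
        remaining = process_batch(BATCH_SIZE)
        calls += 1
        if remaining <= 0:
            break
    return calls


def make_fake_endpoint(pending: int) -> Callable[[int], int]:
    """Build an in-memory fake endpoint holding `pending` unprocessed documents."""
    state = {"pending": pending}

    def process_batch(limit: int) -> int:
        state["pending"] = max(0, state["pending"] - limit)
        return state["pending"]

    return process_batch


if __name__ == "__main__":
    # 12 pending documents need three batches of 5: 12 -> 7 -> 2 -> 0
    print(run_backfill(make_fake_endpoint(12)))
```

Passing the stop check in as a callable keeps the loop interruptible between batches, which matches how Stop is described: the batch in flight finishes, and no further batch is requested.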
Embeddings turn document chunks into vectors the agent can search by meaning. SDI uses Voyage AI (voyage-3, 1024 dimensions) — set VOYAGE_API_KEY in the environment to enable it. Without it, the Embed buttons return 503 and the agent falls back to keyword search.
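The key-gating behaviour can be pictured with a small sketch. The function names below are hypothetical and the 200 status for the enabled case is an assumption; only the `VOYAGE_API_KEY` variable, the 503 response, and the keyword-search fallback come from the text.

```python
import os
from typing import Mapping, Optional


def embeddings_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when a non-empty VOYAGE_API_KEY is present in the environment."""
    env = os.environ if env is None else env
    return bool(env.get("VOYAGE_API_KEY", "").strip())


def embed_status(env: Optional[Mapping[str, str]] = None) -> int:
    """Status an Embed request would get: 503 without a key (200 is assumed otherwise)."""
    return 200 if embeddings_enabled(env) else 503


def search_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Search strategy available to the agent given the environment."""
    return "semantic" if embeddings_enabled(env) else "keyword"


if __name__ == "__main__":
    print(embed_status({}), search_mode({}))
    print(search_mode({"VOYAGE_API_KEY": "example-key"}))
```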
Claude reads each document and returns up to 50 named entities (people, orgs, laws, places, events, other). Each entity name is normalised (lowercased, whitespace-collapsed) so that the same name appearing in several documents counts as one entity. Extraction is budget-gated: if the monthly LLM budget is exhausted, the job halts and resumes once budget is granted.
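As an illustration of the normalisation step, here is a minimal sketch of how lowercasing and whitespace-collapsing merge spelling variants of one name. The function names are hypothetical, trimming leading and trailing whitespace is an assumption, and the real pipeline may differ in detail.

```python
import re
from collections import Counter
from typing import Iterable, Sequence

MAX_ENTITIES_PER_DOC = 50  # from the text: up to 50 entities per document


def normalise(name: str) -> str:
    """Collapse runs of whitespace to one space, trim the ends, and lowercase."""
    return re.sub(r"\s+", " ", name).strip().lower()


def count_documents_per_entity(docs: Iterable[Sequence[str]]) -> Counter:
    """Count, for each normalised entity name, how many documents mention it."""
    counts: Counter = Counter()
    for names in docs:
        # A set ensures a name repeated within one document is counted once.
        unique = {normalise(n) for n in names[:MAX_ENTITIES_PER_DOC]}
        counts.update(unique)
    return counts


if __name__ == "__main__":
    docs = [
        ["European  Court of Human Rights", "Jane Doe"],
        ["european court of\thuman rights"],
    ]
    print(count_documents_per_entity(docs))
```

Both spellings of the court collapse to `european court of human rights`, so the aggregated view reports it as a single entity seen in two documents.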
Open any row in Index, expand the metadata editor, and you'll find Embed and Extract entities buttons. Use them to re-mine a specific document after editing it, without running the whole backfill.