SDI · Syntrociety Document Intelligence

Insights & Mining

Timeline, entities, citations — plus the backfill jobs that feed them.

What Insights shows

Timeline — documents and events interleaved by date, newest first, scoped to the active org or all your orgs.
Entities — people, organisations, laws, places, event references, aggregated across every document that has been extracted.
Citations & tags — most-cited legal references and tag histogram, to show what the corpus leans on.

Mining backfill

At the top of Insights there are two progress bars: Embeddings and Entities. Each shows how many documents have been mined out of the total. Click Backfill N pending to process the pending documents in batches of 5; the UI polls the endpoint until the remaining count reaches zero. Hit Stop to interrupt.
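
The batch-and-poll behaviour can be pictured roughly as the loop below. This is a minimal sketch, not SDI's actual client code: `process_batch`, `should_stop`, and `make_fake_endpoint` are hypothetical names, and the real endpoint and its response shape are not documented here. The only values taken from the text above are the batch size of 5 and the "stop when remaining is zero" rule.

```python
from typing import Callable

BATCH_SIZE = 5  # batch size stated in the text above


def run_backfill(
    process_batch: Callable[[int], int],
    should_stop: Callable[[], bool] = lambda: False,
) -> int:
    """Request batches until nothing is pending or a stop is requested.

    process_batch(limit) is a hypothetical stand-in for one call to the
    backfill endpoint; it returns how many documents are still pending.
    The return value is the last remaining count that was seen.
    """
    remaining = process_batch(BATCH_SIZE)
    while remaining > 0 and not should_stop():
        remaining = process_batch(BATCH_SIZE)
    return remaining


def make_fake_endpoint(pending: int) -> Callable[[int], int]:
    """In-memory stand-in for the endpoint, for illustration only."""
    state = {"pending": pending}

    def process_batch(limit: int) -> int:
        # Mine up to `limit` documents and report what is left.
        state["pending"] = max(0, state["pending"] - limit)
        return state["pending"]

    return process_batch
```

With 12 pending documents the loop makes three requests (12, then 7, then 2 remaining) before it sees zero. Passing a `should_stop` that returns true plays the role of the Stop button: the current batch finishes and no further batch is requested.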

Embeddings (semantic search)

Embeddings turn document chunks into vectors the agent can search by meaning. SDI uses Voyage AI (voyage-3, 1024 dimensions) — set VOYAGE_API_KEY in the environment to enable it. Without it, the Embed buttons return 503 and the agent falls back to keyword search.
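
To illustrate the difference between searching by meaning and the keyword fallback, here is a small sketch. It is an assumption-laden toy, not SDI's search code: `search`, `cosine`, and the `embed` callback are hypothetical names, and in a real setup `embed` would be backed by the embedding provider (returning 1024-dimensional voyage-3 vectors) rather than a lookup table.

```python
import math
from typing import Callable, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query: str,
    chunks: Sequence[str],
    embed: Optional[Callable[[str], Vector]] = None,
) -> list:
    """Rank chunks by similarity, or fall back to keyword matching.

    embed is None models the case where no embedding key is configured.
    """
    if embed is None:
        # Keyword fallback: keep chunks sharing at least one word with the query.
        words = set(query.lower().split())
        return [c for c in chunks if words & set(c.lower().split())]
    q = embed(query)
    # Semantic search: most similar chunk first.
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
```

With a toy `embed` that places "contract law" close to "tenancy agreement" and far from "weather report", the semantic path ranks the tenancy chunk first even though it shares no words with the query, while the keyword fallback finds nothing for the same query.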

Entity extraction

Claude reads each document and returns up to 50 named entities (people, orgs, laws, places, events, other). Each entity is normalised (lowercased, whitespace-collapsed) so the same name across documents counts as one. Extraction is budget-gated: if the monthly LLM budget is exhausted, the job halts and resumes once more budget is granted.
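
The normalisation step described above can be sketched as follows. This is a minimal illustration, assuming that "whitespace-collapsed" means runs of whitespace become a single space and that leading and trailing whitespace is trimmed (the trimming is an assumption). `normalise` and `count_entities` are hypothetical names, not SDI functions.

```python
import re
from collections import Counter
from typing import Iterable


def normalise(name: str) -> str:
    """Lowercase a name and collapse whitespace runs to single spaces."""
    # Trimming the ends is an assumption; the text only mentions collapsing.
    return re.sub(r"\s+", " ", name).strip().lower()


def count_entities(docs: Iterable[Iterable[str]]) -> Counter:
    """Count, per normalised name, how many documents mention it."""
    counts: Counter = Counter()
    for names in docs:
        # A set ensures each entity counts once per document.
        counts.update({normalise(n) for n in names})
    return counts
```

For example, "Acme Corp", "ACME  corp", and "acme corp" all normalise to the same key, so two documents that spell the name differently contribute to a single entity rather than three.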

Per-document mining

Open any row in Index, expand the metadata editor, and you'll find Embed and Extract entities buttons. Use them when you want to re-mine a specific document after editing it, without running the whole backfill.