Shiny.DocumentDb v6 — Vectors, Filters, Composite Indexes & Real Pooling
Shiny.DocumentDb v6 is out. Same one-line services.AddDocumentStore(...), same zero-schema document model, same AOT story — but the v6 release lands five features that have been on the wish list since v3:
- Vector / ANN search that translates to the native engine on each supported provider (pgvector, SQL Server 2025 VECTOR_DISTANCE, Cosmos DiskANN, Mongo Atlas $vectorSearch, DuckDB vss, sqlite-vec).
- Global query filters: register AddQueryFilter<T>(u => !u.IsDeleted) once and it applies to every read, single-doc fetch, bulk operation, and change-stream subscription. Same shape as EF Core’s HasQueryFilter.
- Composite (multi-column) JSON indexes: CreateIndexAsync<T>(ctx.User, u => u.LastName, u => u.FirstName) on every relational provider.
- Real connection pooling on server SQL: PostgreSQL, MySQL, and SQL Server stop serializing through a per-store semaphore and start using the ADO.NET driver’s pool. One DocumentStore instance can now actually serve a web app.
- Per-query change monitoring: .NotifyOnChange() on any fluent query, filtered by the query’s Where predicates.
The full changelog is on the release notes page. This post walks through the headliners.
Vector / ANN Search
Embedding search has been the most requested feature since the AI tools shipped in v4. v6 closes that gap.
Register an embedding property and query by similarity:
public class Document
{
    public Guid Id { get; set; }
    public string Content { get; set; } = "";
    public ReadOnlyMemory<float> Embedding { get; set; }
}

var store = new DocumentStore(new DocumentStoreOptions
{
    DatabaseProvider = new SqliteDatabaseProvider("Data Source=mydata.db")
    {
        EnableVectorExtension = true // loads sqlite-vec on every connection
    }
}.MapVectorProperty<Document>(
    d => d.Embedding,
    dimensions: 1536,
    metric: VectorDistance.Cosine,
    indexKind: VectorIndexKind.Hnsw));

var hits = await store.Query<Document>()
    .Where(d => d.Content.Contains("invoice")) // pre-filter where supported
    .NearestVectors(queryEmbedding, k: 10);

foreach (var hit in hits)
    Console.WriteLine($"{hit.Score:F4} {hit.Document.Content}");

The vector type is ReadOnlyMemory<float> everywhere, the same shape as Microsoft.Extensions.AI.Embedding<float>.Vector. It round-trips through System.Text.Json without a custom converter and avoids a float[] allocation on every read.
Provider matrix
| Provider | Storage | Index | Filter |
|---|---|---|---|
| PostgreSQL | pgvector sidecar | HNSW, IVF | Pre-filter via JOIN |
| SQL Server 2025 | Native VECTOR(n) sidecar | DiskANN | Pre-filter via JOIN |
| Cosmos DB | Embedded in document JSON | DiskANN, QuantizedFlat, Flat | WHERE + ORDER BY VectorDistance(...) |
| MongoDB (Atlas) | $vectorSearch aggregation | HNSW (Atlas-managed) | Filter inside $vectorSearch |
| DuckDB | vss sidecar | HNSW | Pre-filter via JOIN |
| SQLite | sqlite-vec virtual table | None (flat scan) | Post-filter join back |
| MySQL / LiteDB / IndexedDB | — | — | Throws NotSupportedException |
Cosine / Euclidean / DotProduct are available everywhere; Hamming is pgvector-only. Cosine distance is always surfaced as [0, 2] regardless of which way the underlying engine likes to count, so ORDER BY score ASC works the same way on every provider.
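To make the [0, 2] range concrete, here is a minimal, library-independent sketch of cosine distance computed as 1 minus cosine similarity. The class and method names are illustrative only; how each provider normalizes its native score internally is not shown here and is not something this sketch claims to reproduce.

```csharp
using System;

public static class CosineSketch
{
    // Cosine distance = 1 - cosine similarity. Similarity lies in [-1, 1],
    // so the distance lies in [0, 2]: 0 = same direction, 2 = opposite direction.
    public static double Distance(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        if (normA == 0 || normB == 0)
            throw new ArgumentException("Cosine distance is undefined for a zero vector.");

        return 1.0 - dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    public static void Main()
    {
        float[] query = { 1f, 0f };
        Console.WriteLine(Distance(query, new float[] { 2f, 0f }));  // same direction
        Console.WriteLine(Distance(query, new float[] { 0f, 3f }));  // orthogonal
        Console.WriteLine(Distance(query, new float[] { -1f, 0f })); // opposite direction
    }
}
```

Because smaller always means closer under this convention, sorting ascending by score puts the best match first regardless of whether an engine natively reports similarity or distance.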
Auto-embed on insert
If you don’t want to call IEmbeddingGenerator by hand on every write, Shiny.DocumentDb.Extensions.AI ships an AutoEmbedOnInsert<T> helper that hooks the new OnBeforeInsert<T> pipeline:

using Shiny.DocumentDb.Extensions.AI;

opts.MapVectorProperty<Document>(d => d.Embedding, dimensions: 1536)
    .AutoEmbedOnInsert<Document>(
        embeddingGenerator,
        sourceSelector: d => d.Content,
        targetSetter: (d, vec) => d.Embedding = vec,
        targetGetter: d => d.Embedding); // skip when already set

await store.Insert(new Document { Content = "hello world" });
// Embedding is populated automatically before the row hits the wire.

It runs on Insert, BatchInsert, and Upsert, skips when the source is null/empty, and skips when the target already holds a non-default vector so explicit writes always win.
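The two skip rules can be sketched without the library at all. Everything below is hypothetical scaffolding: Note, FakeEmbedAsync, and EmbedIfNeededAsync are made-up names, and treating an empty ReadOnlyMemory<float> as "not set" is an assumption about what "non-default" means in practice.

```csharp
using System;
using System.Threading.Tasks;

public sealed class Note
{
    public string Content { get; set; } = "";
    public ReadOnlyMemory<float> Embedding { get; set; }
}

public static class AutoEmbedSketch
{
    // Hypothetical stand-in for a real embedding generator call.
    public static Task<ReadOnlyMemory<float>> FakeEmbedAsync(string text)
        => Task.FromResult<ReadOnlyMemory<float>>(new float[] { text.Length, 1f });

    // Applies the two skip rules, then fills the target. Returns true when it embedded.
    public static async Task<bool> EmbedIfNeededAsync(
        Note doc, Func<string, Task<ReadOnlyMemory<float>>> embed)
    {
        if (string.IsNullOrEmpty(doc.Content)) return false; // nothing to embed
        if (!doc.Embedding.IsEmpty) return false;            // explicit write wins
        doc.Embedding = await embed(doc.Content);
        return true;
    }

    public static async Task Main()
    {
        var doc = new Note { Content = "hello world" };
        Console.WriteLine(await EmbedIfNeededAsync(doc, FakeEmbedAsync)); // first call embeds
        Console.WriteLine(await EmbedIfNeededAsync(doc, FakeEmbedAsync)); // second call skips
    }
}
```

Checking the target before calling the generator is what keeps re-saving an already-embedded document from triggering another embedding call.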
Tuning knobs
VectorIndexOptions gives you strongly-typed HNSW (M, EfConstruction, EfSearch) and IVF (Lists) settings plus a ProviderHints dictionary for the long tail (sqlite.postFilterMultiplier, atlas.indexName, atlas.numCandidates).
Full design notes are in the Vector docs.
Global Query Filters
If you have shipped anything on Entity Framework Core, you have written this:

modelBuilder.Entity<User>().HasQueryFilter(u => !u.IsDeleted);

Shiny.DocumentDb v6 gets the same surface:

var store = new DocumentStore(new DocumentStoreOptions
{
    DatabaseProvider = new SqliteDatabaseProvider("Data Source=mydata.db")
}
.AddQueryFilter<User>(u => !u.IsDeleted)                         // unnamed
.AddQueryFilter<Order>("tenant", o => o.TenantId == ctx.Current) // named
.AddQueryFilter<Order>("status", o => o.Status != "Archived"));

Filters AND together; the user’s Where is AND’d on top. Captured variables (ctx.Current) are re-read on every translation, so per-request tenant scopes work without rebuilding the store.
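For readers curious how predicates can be AND'd and why a captured variable stays live, here is a small sketch using plain System.Linq.Expressions. It is not the library's translator: FilterSketch, TenantContext, and the Order record are illustrative names, and compiling to a delegate stands in for translating to SQL.

```csharp
using System;
using System.Linq.Expressions;

public record Order(string TenantId, string Status);

public sealed class TenantContext
{
    public string Current { get; set; } = "";
}

public static class FilterSketch
{
    // Rewrites one lambda's parameter so two predicates can share a single parameter.
    private sealed class ParameterSwap : ExpressionVisitor
    {
        private readonly ParameterExpression _from;
        private readonly ParameterExpression _to;

        public ParameterSwap(ParameterExpression from, ParameterExpression to)
        {
            _from = from;
            _to = to;
        }

        protected override Expression VisitParameter(ParameterExpression node)
            => node == _from ? _to : base.VisitParameter(node);
    }

    // Combines two predicates into left && right over the same parameter.
    public static Expression<Func<T, bool>> And<T>(
        Expression<Func<T, bool>> left, Expression<Func<T, bool>> right)
    {
        var rightBody = new ParameterSwap(right.Parameters[0], left.Parameters[0]).Visit(right.Body);
        return Expression.Lambda<Func<T, bool>>(
            Expression.AndAlso(left.Body, rightBody), left.Parameters);
    }

    public static void Main()
    {
        var ctx = new TenantContext { Current = "acme" };
        Expression<Func<Order, bool>> tenant = o => o.TenantId == ctx.Current;
        Expression<Func<Order, bool>> status = o => o.Status != "Archived";
        Expression<Func<Order, bool>> userWhere = o => o.Status == "Pending";

        var matches = And(And(tenant, status), userWhere).Compile();
        var order = new Order("acme", "Pending");

        Console.WriteLine(matches(order)); // tenant is "acme"
        ctx.Current = "globex";            // no rebuild of the combined predicate
        Console.WriteLine(matches(order)); // the tree reads ctx.Current again
    }
}
```

The expression tree holds a reference to the closure rather than a snapshot of the value, which is why a per-request tenant change is picked up the next time the predicate is evaluated.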
What gets filtered
The interesting decision is what isn’t filtered. v6 follows EF Core: every read path, and every update or delete that targets existing documents, enforces the filter, but inserts and raw SQL stay free.
| Path | Filtered? |
|---|---|
| Query<T>() + every terminal (ToList, Count, ExecuteUpdate, …) | Yes |
| query.NotifyOnChange() | Yes, only matching documents emit |
| Get<T>(id) / GetDiff<T>(id, ...) | Yes, returns null if filter fails |
| Update<T> | Yes, throws “not found” if filter fails |
| SetProperty<T> / RemoveProperty<T> / Remove<T>(id) / Clear<T>() | Yes |
| Insert<T> / BatchInsert<T> / Upsert<T> | No, matches EF Core |
| Query<T>(rawSql) / QueryStream<T>(rawSql) | No, your SQL, your call |
Per-query opt-out matches EF Core too:
// Disable all filters
var allUsers = await store.Query<User>().IgnoreQueryFilters().ToList();

// Disable a specific named filter (others still apply)
var anyTenant = await store.Query<Order>().IgnoreQueryFilters("tenant").ToList();

This works on every provider that has a real query translator: relational SQL (DocumentStore), LiteDbDocumentStore, CosmosDbDocumentStore, MongoDbDocumentStore, and IndexedDbDocumentStore.
Full reference: Global Query Filters.
Composite JSON Indexes
CreateIndexAsync<T> has accepted a single expression since v3. v6 adds a multi-expression overload:

// Single-column (unchanged)
await store.CreateIndexAsync<User>(u => u.Name, ctx.User);

// Composite: one B-tree over multiple JSON paths
await store.CreateIndexAsync(
    ctx.User,
    u => u.LastName,
    u => u.FirstName);

The composite index name is built by joining the resolved paths with __, so ix_User_LastName__FirstName is the resulting object on disk. Drop the composite index with the matching overload:

await store.DropIndexAsync(ctx.User, u => u.LastName, u => u.FirstName);

How each provider implements it:
- SQLite / SQLCipher / PostgreSQL / MySQL / DuckDB: one composite index with one json_extract (or provider equivalent) expression per path. Single statement, single index object.
- SQL Server: JSON expression indexes need PERSISTED computed columns. v6 creates one column per path (cc_{indexName}_0, cc_{indexName}_1, …) and indexes them all. The drop path discovers the backing computed columns from sys.index_columns, so single- and multi-column indexes drop through the same code path with no special-case logic.
Existing single-path index names are preserved bit-for-bit, so v5 indexes survive an upgrade without an OBJECT_DROP_FAILED somewhere in production.
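The naming rule mentioned earlier (paths joined with a double underscore) is easy to sketch with expression trees. This is an illustration, not the library's code: IndexNameSketch is a made-up helper, it only handles direct property access, and the ix_{Type}_ prefix is inferred from the single example name in this post.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public sealed class User
{
    public string FirstName { get; set; } = "";
    public string LastName { get; set; } = "";
}

public static class IndexNameSketch
{
    // Resolves a direct member access such as u => u.LastName to "LastName".
    // Value-typed members arrive wrapped in a Convert node, so unwrap that first.
    private static string PathOf<T>(Expression<Func<T, object>> selector)
    {
        var body = selector.Body is UnaryExpression { NodeType: ExpressionType.Convert } unary
            ? unary.Operand
            : selector.Body;

        if (body is MemberExpression member)
            return member.Member.Name;

        throw new NotSupportedException("Only direct property access is handled in this sketch.");
    }

    // Joins the resolved paths with "__" behind an assumed ix_{Type}_ prefix.
    public static string Build<T>(params Expression<Func<T, object>>[] paths)
        => $"ix_{typeof(T).Name}_{string.Join("__", paths.Select(p => PathOf(p)))}";

    public static void Main()
    {
        Console.WriteLine(Build<User>(u => u.LastName, u => u.FirstName));
    }
}
```

A deterministic name derived from the paths is what lets a drop call with the same expressions find the index again without any stored metadata.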
Real Connection Pooling on Server SQL
v5 was honest about its limit: a single DocumentStore instance serialized every operation through one semaphore around one long-lived connection. Fine for a phone, miserable for a server.
v6 splits behaviour along the provider:
- PostgreSQL, MySQL, SQL Server: open a connection per operation. The ADO.NET driver’s pool multiplexes callers. One store, many concurrent calls, no in-process queueing.
- SQLite, SQLCipher, DuckDB: embedded engines that take a database-wide write lock. These keep the v5 model: one long-lived connection, one per-store semaphore. The provider declares which mode it wants via IDatabaseProvider.RequiresSingleConnection.
RunInTransaction pins one connection for the duration of the user callback regardless of provider, so every nested operation shares the transaction.
Table init is now backed by a ConcurrentDictionary<string, Lazy<Task>> — first-touch DDL runs exactly once per table even under concurrent first calls. No more “is the schema there yet?” races on cold start.
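The ConcurrentDictionary plus Lazy<Task> pattern is a standard .NET idiom, and a minimal standalone version looks roughly like this. TableInitializer and the fake DDL delegate are illustrative names, not the library's internals.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public sealed class TableInitializer
{
    private readonly ConcurrentDictionary<string, Lazy<Task>> _init = new();
    private readonly Func<string, Task> _createTable;

    public TableInitializer(Func<string, Task> createTable) => _createTable = createTable;

    // GetOrAdd may construct more than one Lazy under contention, but only one is stored,
    // and Lazy's default thread-safety mode runs the stored instance's factory once.
    public Task EnsureAsync(string table) =>
        _init.GetOrAdd(table, t => new Lazy<Task>(() => _createTable(t))).Value;
}

public static class TableInitDemo
{
    // Fires 50 concurrent first calls and reports how many times the DDL ran.
    public static async Task<int> RunAsync()
    {
        var ddlRuns = 0;
        var init = new TableInitializer(async _ =>
        {
            Interlocked.Increment(ref ddlRuns);
            await Task.Delay(20); // pretend to run CREATE TABLE
        });

        await Task.WhenAll(
            Enumerable.Range(0, 50).Select(_ => Task.Run(() => init.EnsureAsync("users"))));

        return ddlRuns;
    }

    public static async Task Main() => Console.WriteLine(await RunAsync());
}
```

One caveat of this idiom in general: a faulted task stays cached, so a real implementation would likely need to evict the entry if the DDL fails. Whether v6 does so is not stated here.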
A small but important consequence for streaming: on the pooled providers, await foreach (... in store.Query<T>().ToAsyncEnumerable()) holds one connection out of the pool for the lifetime of the iterator instead of holding the whole store. Other callers don’t block. On the embedded engines, behaviour is unchanged — finish the enumeration before issuing another store call.
Per-Query Change Monitoring
IObservableDocumentStore shipped in v5.3 with a global, type-scoped stream of DocumentChange<T>. v6 adds a query-scoped overload: every fluent query now exposes a .NotifyOnChange() that filters the change feed by the query’s own Where predicates:
var pending = store.Query<Order>().Where(o => o.Status == "Pending");

await foreach (var change in pending.NotifyOnChange(ct))
{
    // Only fires when an Order matching Status == "Pending" is inserted or updated.
    UpdateUi(change);
}

OrderBy, Paginate, and GroupBy are ignored because they change result shape, not membership. Calling Select(...) first throws, because projecting away the document body breaks the filter.
SetProperty, RemoveProperty, Remove, and Clear don’t carry the full document, so DocumentChange<T>.Document is null for those events. The per-query filter passes them through unconditionally so the consumer can re-query and decide for itself whether the document still matches.
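The pass-through rule can be illustrated with a simplified, synchronous model. Change<T> below is a hypothetical stand-in for DocumentChange<T> (the real type's shape may differ), and a plain IEnumerable replaces the async stream to keep the sketch short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record PendingOrder(string Id, string Status);

// Hypothetical stand-in for DocumentChange<T>; Document is null for body-less events.
public sealed record Change<T>(string Kind, T? Document) where T : class;

public static class ChangeFilterSketch
{
    // Keeps a change when it carries no document (the consumer must re-query)
    // or when its document satisfies the query's predicate.
    public static IEnumerable<Change<T>> Filter<T>(
        IEnumerable<Change<T>> feed, Func<T, bool> where) where T : class
        => feed.Where(c => c.Document is null || where(c.Document));

    public static void Main()
    {
        var feed = new[]
        {
            new Change<PendingOrder>("Insert", new PendingOrder("1", "Pending")),
            new Change<PendingOrder>("Update", new PendingOrder("2", "Shipped")),
            new Change<PendingOrder>("Remove", null),
        };

        foreach (var change in Filter(feed, o => o.Status == "Pending"))
            Console.WriteLine(change.Kind);
    }
}
```

Passing body-less events through errs on the side of over-notifying: a consumer may see a change for a document that never matched, but it will not miss one that did.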
Combined with the new global query filters and the existing IChangeFeedDocumentStore (cross-process change feeds backed by PostgreSQL LISTEN/NOTIFY, SQL Server Change Tracking, and Cosmos DB Change Feed), change observation is now end-to-end coherent: every read goes through the same filter; every change subscription sees only the changes that match.
Other Notable v6 Items
A few smaller things that show up in the release notes but didn’t get their own section:
- MapIdProperty<T>(...): standalone Id-property override that no longer requires MapTypeToTable. Use it when the Id is named Slug or DeviceKey but you still want the type stored in the default shared table.
- OnBeforeInsert<T>: async pre-write hook on DocumentStoreOptions. AutoEmbedOnInsert<T> is the headline consumer, but it’s a general “compute derived fields” extension point.
- SupportsVector on IDocumentStore and IDatabaseProvider, matching the existing SupportsSpatial.
- PostgreSQL optimistic concurrency fix: the version check now extracts as a typed int (::BIGINT), so no more 42883: operator does not exist.
- PostgreSQL and DuckDB multi-tenancy fix: the CAST(@data AS JSONB) / CAST(@data AS JSON) envelopes no longer break the tenant-column rewrite.
Upgrading
v6 is API-compatible with v5 in every place that matters. The semaphore on the server-SQL providers is gone; if you relied on it to serialize writes from one store instance, switch to RunInTransaction for those semantics. Everything else is purely additive.
dotnet add package Shiny.DocumentDb.PostgreSql --version 6.0.0
dotnet add package Shiny.DocumentDb.Sqlite --version 6.0.0
dotnet add package Shiny.DocumentDb.SqlServer --version 6.0.0
# etc.