Open
Labels
bug (Something isn't working)
Description
I'm trying to insert fairly large documents (a few thousand tokens each) into a zvec collection that uses sparse vectors (TF-IDF-based features plus BM25 features).
Most documents are indexed successfully, but insertion fails for a few of them.
```python
import zvec

# DB_PATH, documents, tri_pipeline and word_pipeline are defined elsewhere in my script.
schema = zvec.CollectionSchema(
    name="db",
    vectors=[
        zvec.VectorSchema("sparse_features", zvec.DataType.SPARSE_VECTOR_FP32),
        zvec.VectorSchema("sparse_bm25_features", zvec.DataType.SPARSE_VECTOR_FP32),
    ],
    fields=[zvec.FieldSchema("content", zvec.DataType.STRING)],
)
collection = zvec.create_and_open(path=DB_PATH, schema=schema)

# Insert one document at a time, each with two sparse vectors and its raw text.
for i, text in enumerate(documents):
    docs_to_insert = zvec.Doc(
        id=f"doc_{i}",
        vectors={
            "sparse_features": tri_pipeline.embed(text),
            "sparse_bm25_features": word_pipeline.embed(text),
        },
        fields={"content": text},
    )
    collection.insert(docs_to_insert)
```

For a few documents, I get this:
```text
[ERROR 2026-02-23 13:55:54 140148983252800 flat_sparse_streamer.cc:255] Add vector failed, dim size too larg, dim_size=6598, key=22887
[ERROR 2026-02-23 13:55:54 140148983252800 index.cc:573] Failed to add vector
[ERROR 2026-02-23 13:55:54 140148983252800 segment.cc:772] insert vector failed[Failed to add vector to index]
[ERROR 2026-02-23 13:55:57 140148983252800 flat_sparse_streamer.cc:255] Add vector failed, dim size too larg, dim_size=4532, key=27571
[ERROR 2026-02-23 13:55:57 140148983252800 index.cc:573] Failed to add vector
[ERROR 2026-02-23 13:55:57 140148983252800 segment.cc:772] insert vector failed[Failed to add vector to index]
[ERROR 2026-02-23 13:55:59 140148983252800 flat_sparse_streamer.cc:255] Add vector failed, dim size too larg, dim_size=5767, key=31046
[ERROR 2026-02-23 13:55:59 140148983252800 index.cc:573] Failed to add vector
[ERROR 2026-02-23 13:55:59 140148983252800 segment.cc:772] insert vector failed[Failed to add vector to index]
[ERROR 2026-02-23 13:56:06 140148983252800 flat_sparse_streamer.cc:255] Add vector failed, dim size too larg, dim_size=8921, key=41169
[ERROR 2026-02-23 13:56:06 140148983252800 index.cc:573] Failed to add vector
```
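The failing `dim_size` values (4532 to 8921) look like they may be the number of nonzero entries in a single sparse vector rather than the largest dimension index, which would explain why only long documents fail. As a minimal diagnostic sketch, assuming each embedding returned by the pipelines can be viewed as a plain `dict` mapping dimension index to weight, the hypothetical helper `oversized_vectors` below lists which vectors exceed a chosen nonzero count. The threshold is also an assumption; the issue does not state zvec's actual limit.

```python
from typing import Dict, Iterable, List, Tuple

# Assumption: a sparse vector is a dict of {dimension index: weight}.
SparseVector = Dict[int, float]


def oversized_vectors(vectors: Iterable[SparseVector], max_nnz: int) -> List[Tuple[int, int]]:
    """Return (position, nonzero count) for every vector with more than max_nnz nonzeros."""
    report = []
    for pos, vec in enumerate(vectors):
        # Count only entries that are actually nonzero.
        nnz = sum(1 for weight in vec.values() if weight != 0.0)
        if nnz > max_nnz:
            report.append((pos, nnz))
    return report


# Example with toy vectors and a hypothetical threshold of 2 nonzeros.
toy = [{1: 0.5}, {1: 0.1, 2: 0.2, 3: 0.3}, {4: 0.0, 5: 0.7}]
print(oversized_vectors(toy, max_nnz=2))
```

Running this over the embeddings before calling `collection.insert` would show whether exactly the failing documents are the ones with the most nonzero entries.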
Setting the vector dimension to a large value doesn't fix it.
With short documents there is no problem.
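Until the limit is raised or documented, one possible stopgap, sketched here under assumptions, is to keep only the highest-weight entries of each sparse vector before insertion. The helper name `truncate_sparse` and the cap value are hypothetical, and the `dict` representation of a sparse vector is assumed rather than confirmed zvec behavior. Dropping low-weight terms changes the stored vectors, so retrieval quality for long documents may degrade; this is a workaround, not a fix for the underlying error.

```python
import heapq
from typing import Dict

# Assumption: a sparse vector is a dict of {dimension index: weight}.
SparseVector = Dict[int, float]


def truncate_sparse(vec: SparseVector, max_nnz: int) -> SparseVector:
    """Keep at most max_nnz nonzero entries, choosing those with the largest absolute weight."""
    nonzero = {idx: w for idx, w in vec.items() if w != 0.0}
    if len(nonzero) <= max_nnz:
        return nonzero
    # heapq.nlargest selects the top entries without sorting the whole vector.
    top = heapq.nlargest(max_nnz, nonzero.items(), key=lambda item: abs(item[1]))
    return dict(top)


print(truncate_sparse({1: 0.1, 2: -0.9, 3: 0.5, 4: 0.0}, max_nnz=2))
```

In the insertion loop this would wrap each embedding, e.g. `truncate_sparse(tri_pipeline.embed(text), MAX_NNZ)`, where `MAX_NNZ` is a hypothetical cap chosen below the smallest failing `dim_size`.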
Steps to Reproduce
See description.
Logs / Stack Trace
Operating System
Ubuntu 22.04
Build & Runtime Environment
Python 3.11.7
Additional Context
- I've checked `git status`: no uncommitted submodule changes
- I built with `CMAKE_BUILD_TYPE=Debug`
- This occurs with or without `COVERAGE=ON`
- The issue involves Python ↔ C++ integration (pybind11)
Status
Backlog