[Bug]: Add vector failed, dim size too larg #160

@ccdv-ai

Description

I'm trying to insert fairly large documents (a few thousand tokens each) into a zvec collection using the sparse schema (TF-IDF based).
Most documents get indexed, but insertion fails for a few of them.

import zvec

schema = zvec.CollectionSchema(
    name="db",
    vectors=[
        zvec.VectorSchema("sparse_features", zvec.DataType.SPARSE_VECTOR_FP32),
        zvec.VectorSchema("sparse_bm25_features", zvec.DataType.SPARSE_VECTOR_FP32)
    ],
    fields=[zvec.FieldSchema("content", zvec.DataType.STRING)]
)

collection = zvec.create_and_open(path=DB_PATH, schema=schema)

for i, text in enumerate(documents):
    # One Doc per input text, with both sparse embeddings attached.
    docs_to_insert = zvec.Doc(
        id=f"doc_{i}",
        vectors={
            "sparse_features": tri_pipeline.embed(text),
            "sparse_bm25_features": word_pipeline.embed(text)
        },
        fields={"content": text}
    )

    collection.insert(docs_to_insert)

For a few documents, I get this:

[ERROR 2026-02-23 13:55:54 140148983252800 flat_sparse_streamer.cc:255] Add vector failed, dim size too larg, dim_size=6598, key=22887
[ERROR 2026-02-23 13:55:54 140148983252800 index.cc:573] Failed to add vector
[ERROR 2026-02-23 13:55:54 140148983252800 segment.cc:772] insert vector failed[Failed to add vector to index]
[ERROR 2026-02-23 13:55:57 140148983252800 flat_sparse_streamer.cc:255] Add vector failed, dim size too larg, dim_size=4532, key=27571
[ERROR 2026-02-23 13:55:57 140148983252800 index.cc:573] Failed to add vector
[ERROR 2026-02-23 13:55:57 140148983252800 segment.cc:772] insert vector failed[Failed to add vector to index]
[ERROR 2026-02-23 13:55:59 140148983252800 flat_sparse_streamer.cc:255] Add vector failed, dim size too larg, dim_size=5767, key=31046
[ERROR 2026-02-23 13:55:59 140148983252800 index.cc:573] Failed to add vector
[ERROR 2026-02-23 13:55:59 140148983252800 segment.cc:772] insert vector failed[Failed to add vector to index]
[ERROR 2026-02-23 13:56:06 140148983252800 flat_sparse_streamer.cc:255] Add vector failed, dim size too larg, dim_size=8921, key=41169
[ERROR 2026-02-23 13:56:06 140148983252800 index.cc:573] Failed to add vector
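
To see which sizes get rejected, the `dim_size` and `key` values can be pulled out of the log. This is a minimal sketch that assumes the log format shown above; `failed_inserts` is a hypothetical helper written for this issue, not part of zvec.

```python
import re

# Matches the "dim_size=..., key=..." tail of the flat_sparse_streamer errors.
LOG_PATTERN = re.compile(r"dim_size=(\d+), key=(\d+)")


def failed_inserts(log_text: str) -> list[tuple[int, int]]:
    """Return (key, dim_size) pairs for every rejected sparse vector."""
    return [(int(key), int(dim)) for dim, key in LOG_PATTERN.findall(log_text)]


# Two of the error lines quoted above, used as sample input.
sample_log = """\
[ERROR 2026-02-23 13:55:54 140148983252800 flat_sparse_streamer.cc:255] Add vector failed, dim size too larg, dim_size=6598, key=22887
[ERROR 2026-02-23 13:55:54 140148983252800 index.cc:573] Failed to add vector
[ERROR 2026-02-23 13:55:57 140148983252800 flat_sparse_streamer.cc:255] Add vector failed, dim size too larg, dim_size=4532, key=27571
"""

failures = failed_inserts(sample_log)
smallest_rejected = min(dim for _, dim in failures)
```

In the full log above, the smallest rejected `dim_size` is 4532, so whatever limit applies appears to sit below that value.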

Setting the dimension to a large number doesn't fix it.
Short documents insert without any problem.
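
A possible stopgap, sketched under several assumptions: that `embed()` returns a `dict` mapping term index to weight, that `dim_size` in the log is the number of non-zero entries, and that the limit is somewhere around 4096 (a guess based only on the smallest failing size in the log, not a documented zvec value). `nnz`, `keep_top_k`, and `ASSUMED_MAX_NNZ` are hypothetical names and do not come from zvec.

```python
from typing import Dict

# Guessed cap on non-zero entries; NOT a documented zvec limit.
ASSUMED_MAX_NNZ = 4096


def nnz(sparse_vec: Dict[int, float]) -> int:
    """Count the non-zero entries of a sparse vector."""
    return sum(1 for weight in sparse_vec.values() if weight != 0.0)


def keep_top_k(sparse_vec: Dict[int, float], k: int) -> Dict[int, float]:
    """Keep only the k entries with the largest absolute weight."""
    if len(sparse_vec) <= k:
        return dict(sparse_vec)
    ranked = sorted(sparse_vec.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return dict(ranked[:k])


# Toy vector standing in for the output of an embedding pipeline.
example = {i: float(i) for i in range(1, 11)}
truncated = keep_top_k(example, 3)
```

Calling `keep_top_k(vec, ASSUMED_MAX_NNZ)` before building the `zvec.Doc` would keep long documents under the guessed limit, but it drops low-weight terms, so it is lossy and would only work around the error rather than fix it.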

Steps to Reproduce

See description.

Logs / Stack Trace

Operating System

Ubuntu 22.04

Build & Runtime Environment

Python 3.11.7

Additional Context

  • I've checked git status — no uncommitted submodule changes
  • I built with CMAKE_BUILD_TYPE=Debug
  • This occurs with or without COVERAGE=ON
  • The issue involves Python ↔ C++ integration (pybind11)
