[Bug]: Mismatch in Length of Values and Index in generate_text_embeddings #1382
Comments
Hello, I'm encountering the same error when following the "Getting Started" guide. Specifically, the issue arises when using the input text downloaded from `curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt`. Any guidance on how to resolve this would be greatly appreciated. Thank you!
Same!!! Need help.
Same problem with graphrag v0.5.
This issue got resolved for me when I changed the chunk size from 1200 to 300, using the OpenAI embedding model and Llama 3 for text generation.
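For anyone who wants to try the same workaround: the chunk size is configured in the `chunks` section of `settings.yaml`. A minimal sketch below, assuming the default GraphRAG config layout (key names and defaults may differ slightly between versions):

```yaml
# settings.yaml (excerpt) -- illustrative only, verify against your GraphRAG version
chunks:
  size: 300        # default is 1200; reducing it reportedly avoided the mismatch
  overlap: 100
  group_by_columns: [id]
```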
Do you need to file an issue?
Describe the bug
Encountered an error when running the generate_text_embeddings function in the pipeline. The error log indicates a mismatch between the length of values (338) and the index (366) when attempting to set values in a DataFrame.
Error details: the traceback below shows the issue occurring inside the generate_text_embeddings function in generate_text_embeddings.py:
ValueError: Length of values (338) does not match length of index (366)
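For context, this is the generic pandas failure mode behind that message: a column assignment whose values are shorter than the DataFrame index. A minimal, hypothetical reproduction (not GraphRAG code; only the row counts are taken from the log):

```python
import pandas as pd

# Hypothetical illustration of the failure mode, not GraphRAG code.
# The frame has 366 rows (text units), but only 338 embeddings came back,
# e.g. because some embedding requests failed or were filtered out.
data = pd.DataFrame({"text": [f"chunk {i}" for i in range(366)]})
embeddings = [[0.1, 0.2, 0.3]] * 338  # 28 results missing

# pandas requires a new column to match the index length exactly, so this raises:
# ValueError: Length of values (338) does not match length of index (366)
data["embedding"] = embeddings
```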
The pipeline ran with the following steps:
create_base_text_units
create_base_entity_graph
create_final_entities
create_final_nodes
create_final_communities
create_final_relationships
create_final_text_units
create_final_community_reports
create_final_documents
generate_text_embeddings
The pipeline fails at the generate_text_embeddings step with the error shown in the log below.
LOG:
{
"type": "error",
"data": "Error executing verb "generate_text_embeddings" in generate_text_embeddings: Length of values (338) does not match length of index (366)",
"stack": "Traceback (most recent call last):\n File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\datashaper\workflow\workflow.py", line 415, in _execute_verb\n result = await result\n ^^^^^^^^^^^^\n File "D:\anaconda\Lib\site-packages\graphrag\index\workflows\v1\subflows\generate_text_embeddings.py", line 56, in generate_text_embeddings\n await generate_text_embeddings_flow(\n File "D:\anaconda\Lib\site-packages\graphrag\index\flows\generate_text_embeddings.py", line 106, in generate_text_embeddings\n await _run_and_snapshot_embeddings(\n File "D:\anaconda\Lib\site-packages\graphrag\index\flows\generate_text_embeddings.py", line 129, in _run_and_snapshot_embeddings\n data["embedding"] = await embed_text(\n ~~~~^^^^^^^^^^^^^\n File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\pandas\core\frame.py", line 4311, in setitem\n self._set_item(key, value)\n File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\pandas\core\frame.py", line 4524, in _set_item\n value, refs = self._sanitize_column(value)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\pandas\core\frame.py", line 5266, in _sanitize_column\n com.require_length_match(value, self.index)\n File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\pandas\core\common.py", line 573, in require_length_match\n raise ValueError(\nValueError: Length of values (338) does not match length of index (366)\n",
"source": "Length of values (338) does not match length of index (366)",
"details": null
}
{
"type": "error",
"data": "Error running pipeline!",
"stack": "Traceback (most recent call last):\n File "D:\anaconda\Lib\site-packages\graphrag\index\run\run.py", line 269, in run_pipeline\n result = await _process_workflow(\n ^^^^^^^^^^^^^^^^^^^^^^^^\n File "D:\anaconda\Lib\site-packages\graphrag\index\run\workflow.py", line 105, in _process_workflow\n result = await workflow.run(context, callbacks)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\datashaper\workflow\workflow.py", line 369, in run\n timing = await self._execute_verb(node, context, callbacks)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\datashaper\workflow\workflow.py", line 415, in _execute_verb\n result = await result\n ^^^^^^^^^^^^\n File "D:\anaconda\Lib\site-packages\graphrag\index\workflows\v1\subflows\generate_text_embeddings.py", line 56, in generate_text_embeddings\n await generate_text_embeddings_flow(\n File "D:\anaconda\Lib\site-packages\graphrag\index\flows\generate_text_embeddings.py", line 106, in generate_text_embeddings\n await _run_and_snapshot_embeddings(\n File "D:\anaconda\Lib\site-packages\graphrag\index\flows\generate_text_embeddings.py", line 129, in _run_and_snapshot_embeddings\n data["embedding"] = await embed_text(\n ~~~~^^^^^^^^^^^^^\n File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\pandas\core\frame.py", line 4311, in setitem\n self._set_item(key, value)\n File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\pandas\core\frame.py", line 4524, in _set_item\n value, refs = self._sanitize_column(value)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\pandas\core\frame.py", line 5266, in _sanitize_column\n com.require_length_match(value, self.index)\n File "C:\Users\DJ027103\AppData\Roaming\Python\Python311\site-packages\pandas\core\common.py", line 573, in require_length_match\n raise ValueError(\nValueError: Length of values (338) does not match length of index (366)\n",
"source": "Length of values (338) does not match length of index (366)",
"details": null
}
Using Azure OpenAI for both the LLM and embeddings, with text-embedding-ada-002 as the embedding model.
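The reporter's actual config was not shared, so purely as an illustrative sketch (not the reporter's settings): the Azure embedding setup in GraphRAG's settings.yaml typically looks roughly like the excerpt below. Key names follow the documented config and may vary by version; the placeholder values are hypothetical.

```yaml
# Illustrative sketch only -- not the reporter's config; placeholders are hypothetical
embeddings:
  llm:
    type: azure_openai_embedding
    model: text-embedding-ada-002
    api_base: https://<your-resource>.openai.azure.com
    api_version: <your-api-version>
    deployment_name: <your-embedding-deployment>
```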
Steps to reproduce
Running `graphrag index` with graphrag version 0.4.0.
Expected Behavior
No response
GraphRAG Config Used
# Paste your config here
Logs and screenshots
No response
Additional Information