Conversation
| # Avoid "ValueError: need at least one array to concatenate" on | ||
| # older versions of pandas when converting empty RecordBatch to | ||
| # DataFrame. See: https://github.com/pandas-dev/pandas/issues/41241 |
There was a problem hiding this comment.
pandas-dev/pandas#41052 has released with pandas 1.3.0 (as mentioned in above issue: pandas-dev/pandas#41241 (comment))
So if we can increase minimum pandas version to 1.3.0, we don't need this workaround
There was a problem hiding this comment.
We should really come up with a consistent policy across all Google packages:
- pandas-gbq supports 1.1.4+ (https://github.com/googleapis/python-bigquery-pandas/blob/b3a9af27960c6e2736835d6a94e116f550f94114/setup.py#L27)
- db-dtypes supports 1.2.0+ (https://github.com/googleapis/python-db-dtypes-pandas/blob/44a5aa24ef3219849e9e0bedeeaebc4b9ee53d9e/setup.py#L34)
- bigframes supports 1.5.3+ (https://github.com/googleapis/python-bigquery-dataframes/blob/7d00be67cf50fdf713c40912f207d14f0f65538f/setup.py#L53)
- Colab and BigQuery Studio Python notebooks are both in the 2.x range now.
There was a problem hiding this comment.
Indeed, moving to a monorepo might help make the dependencies more consistent, ideally we might want any overlapped dependencies among the handwritten BigQuery projects to be consistent.
But just for the purpose of this PR, I think it's reasonable to increase the minimum pandas version to 1.3.0, as long as it's not breaking anything.
| opentelemetry-instrumentation==0.20b0 | ||
| opentelemetry-sdk==1.1.0 | ||
| packaging==24.2.0 | ||
| pandas==1.1.4 |
There was a problem hiding this comment.
I confirmed that pandas==1.2.5 reproduces
FAILED tests/unit/test_table.py::TestRowIterator::test_to_dataframe_w_bqstorage_no_streams - ValueError: need at least one array to concatenate
and pandas==1.3.0 can pass
There was a problem hiding this comment.
Thanks for updating the constraint file!
tswast
left a comment
There was a problem hiding this comment.
This looks good to me, but will wait for @chalmerlowe or @Linchin to approve.
| # Avoid "ValueError: need at least one array to concatenate" on | ||
| # older versions of pandas when converting empty RecordBatch to | ||
| # DataFrame. See: https://github.com/pandas-dev/pandas/issues/41241 |
There was a problem hiding this comment.
We should really come up with a consistent policy across all Google packages:
- pandas-gbq supports 1.1.4+ (https://github.com/googleapis/python-bigquery-pandas/blob/b3a9af27960c6e2736835d6a94e116f550f94114/setup.py#L27)
- db-dtypes supports 1.2.0+ (https://github.com/googleapis/python-db-dtypes-pandas/blob/44a5aa24ef3219849e9e0bedeeaebc4b9ee53d9e/setup.py#L34)
- bigframes supports 1.5.3+ (https://github.com/googleapis/python-bigquery-dataframes/blob/7d00be67cf50fdf713c40912f207d14f0f65538f/setup.py#L53)
- Colab and BigQuery Studio Python notebooks are both in the 2.x range now.
| opentelemetry-instrumentation==0.20b0 | ||
| opentelemetry-sdk==1.1.0 | ||
| packaging==24.2.0 | ||
| pandas==1.1.4 |
There was a problem hiding this comment.
Thanks for updating the constraint file!
|
I think we need to remove some liens in the unit test so coverage test can pass, could you fix it for us? Then we should be able to merge the PR |
|
Indeed, pandas>=1.3.0 must have |
|
A different system test is failing, could you fix it too? Thanks |
| # Result is dependent upon which version of pandas is being used. | ||
| # Float64 was not introduced until pandas version 1.4. | ||
| if PANDAS_INSTALLED_VERSION >= "1.4": | ||
| assert df.dtypes["float64_col"].name == "Float64" | ||
| else: | ||
| assert df.dtypes["float64_col"].name == "string" |
There was a problem hiding this comment.
Float64 is introduced in pandas 1.2.0, not 1.4. So we can drop this handling now (pandas >= 1.3.0)
nox > python -m pip freeze
.
.
.
pandas==1.3.0
.
.
.
if PANDAS_INSTALLED_VERSION >= "1.4":
assert df.dtypes["float64_col"].name == "Float64"
else:
> assert df.dtypes["float64_col"].name == "string"
E AssertionError: assert 'Float64' == 'string'
E
E - string
E + Float64
|
@Linchin sorry to overlooking another error, I fixed it too. Can you try to run Ci test again ? I have trouble to run the system test on my local (maybe due to old numpy on apple silicon) but the tests should pass. |
Linchin
left a comment
There was a problem hiding this comment.
LGTM. Thanks for your contribution!
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #2148 🦕
Can we increase the minimum supported pandas from
1.1.4to1.3.0?