Source Mysql: syncing timeout (fetch size may be ignored) #9784
Hey @MahmoudElhalwany, have you tried changing some config on the MySQL DB side to see if you can increase the connection time limit?
I have tried that; it takes 23 hours and then gives me the same error.
@MahmoudElhalwany were you able to get around this issue? I am running into the same problem and tried a few JDBC params, but no luck.
Zendesk ticket #1737 has been linked to this issue.
Let me know if there is anything I can do to help solve this issue. We are able to replicate it consistently: we can pull over a million records from small tables, but the moment we tap a large table, the process fails randomly with the same error. We've tried updating server timeouts and JDBC parameters, and while the job takes longer to fail, it still fails with the same error. Here is the stack trace from our log...
@jorge-gt3 could you share how big your large tables are?
They vary in size. The error above occurred when I added a table with 19,272,203 rows to the connection. I just joined Slack if you want to chat there.
I took a closer look and noticed the worker chokes right after this log entry:
It looks like setting the fetch size is the last thing that happens before the PreparedStatement executes, which led me to this old blog post where someone in the comments said that setFetchSize works with Statement but "doesn't seem to work with PreparedStatement"... ¯\_(ツ)_/¯
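For context, a minimal sketch of the conditions under which MySQL Connector/J actually streams rows instead of buffering the whole ResultSet. The connection URL, credentials, and table name are placeholders, and this illustrates the driver's documented streaming requirements, not Airbyte's actual connector code; notably, streaming does work with PreparedStatement when the flags below are combined:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class StreamingFetchSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details, for illustration only.
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/mydb", "user", "pass");

        // Connector/J streams row-by-row only when the statement is
        // forward-only and read-only AND the fetch size is Integer.MIN_VALUE.
        PreparedStatement ps = conn.prepareStatement(
                "SELECT * FROM big_table",
                ResultSet.TYPE_FORWARD_ONLY,
                ResultSet.CONCUR_READ_ONLY);
        ps.setFetchSize(Integer.MIN_VALUE); // without useCursorFetch, other values do not stream

        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                // process one row at a time without buffering the full result set
            }
        }
    }
}
```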
We should verify that fetch size actually works, based on the previous comment.
I looked into this a bit more and it seems this has been a known issue in the MySQL community...
I found this post on SO and tried the suggestion there. I assume we also need to add java.sql.ResultSet.TYPE_SS_SERVER_CURSOR_FORWARD_ONLY & java.sql.ResultSet.CONCUR_READ_ONLY to the preparedStatement connection, as suggested in one of the SO posts. From the JDBC implementation notes: "By default, ResultSets are completely retrieved and stored in memory. In most cases this is the most efficient way to operate and, due to the design of the MySQL network protocol, is easier to implement. If you are working with ResultSets that have a large number of rows or large values and cannot allocate heap space in your JVM for the memory required, you can tell the driver to stream the results back one row at a time." To enable this functionality, create a Statement instance in the following manner:
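The snippet those implementation notes refer to is the standard Connector/J streaming pattern, reproduced here for convenience (`conn` is assumed to be an open `java.sql.Connection`):

```java
// From the MySQL Connector/J implementation notes: a forward-only,
// read-only statement with fetch size Integer.MIN_VALUE enables row streaming.
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
        java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
```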
Sync from MySQL to MySQL also fails for large tables.
Changing scope to GA |
@grishick any updates on this?
@danieldiamond, @grishick is out right now, but let me inquire with the engineering team about this!
There look to be some big changes in this area with this PR: https://github.com/airbytehq/airbyte/pull/17236/files
I finally got a chance to try this again with the latest version of the MySQL source and I was able to sync over 85 million records using the following JDBC parameters:
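The exact parameters were not preserved in this export. As an illustration only (useCursorFetch and defaultFetchSize are real Connector/J parameters, but the values here are assumptions, not necessarily what @jorge-gt3 used), cursor-based streaming is typically enabled with something like:

```
jdbc:mysql://<host>:3306/<db>?useCursorFetch=true&defaultFetchSize=5000
```

With useCursorFetch=true, the driver opens a server-side cursor and honors a positive fetch size instead of buffering the full result set in memory.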
Per discussion with @prateekmukhedkar - we will add this to the docs. |
@jorge-gt3 curious how it's going with the changes you have made. I tried the same... the incremental task keeps looping, like a streaming process. I think this is happening because of the useCursorFetch parameter.
Hi! Any updates here? I have this issue with a 47 million record table, and the JDBC parameters from @jorge-gt3 didn't solve the error 😞
@javiCalvo can you link your Airbyte workspace, or preferably the job history where you are seeing this error? Thank you!
As this still seems to be an issue, especially with large tables, after a lot of trial and error we seem to have found an ideal set of params that helped us. We used @jorge-gt3's recommendations with some additions, as those alone did not solve our problem. Our setup is as follows:
Our MySQL JDBC URL Params are as follows
These timeouts are in milliseconds
These timeouts are in seconds
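The actual values did not survive this export. A hedged sketch of what such a setup typically looks like (the parameter and variable names are real Connector/J options and MySQL server variables; the values are placeholders, not the poster's settings):

```
# Client-side JDBC URL params — these timeouts are in milliseconds
useCursorFetch=true&defaultFetchSize=5000&connectTimeout=60000&socketTimeout=600000

# Server-side MySQL variables — these timeouts are in seconds
SET GLOBAL wait_timeout = 28800;
SET GLOBAL net_read_timeout = 600;
SET GLOBAL net_write_timeout = 600;
```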
May seem like a bit of overkill, but it seems that somewhere when parallel writes start, a connection is opened but allowed to time out, whether that's in the connector code itself or due to our source table having 1.7B+ rows 🤷. Hopefully this helps someone else having the same issue and saves you days of debugging and trial and error on large tables 😄 I also made a post in the community Slack channel if folks are looking for a solution there.
Environment
Current Behavior
Timeout during syncing.
Logs