Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

database/sql: missing row fetch size and other optimizations for large dataset #13067

Closed
asambeka opened this issue Oct 27, 2015 · 7 comments
Closed

Comments

@asambeka
Copy link

I have been trying to work with large datasets in Oracle and mySQL DBs. There are around 20 million records in these databases. Idea was to use golang to pull and export them as JSON. While I was able to use golang concurrency for export, I realized performance for pull from DB was VERY bad. Here is an estimate how bad it is:
golang (default driver setting): 30 min for 10 million (running next/scan in loop)
I was able to find configuration options for one of the Oracle drivers (gopkg.in/rana/ora.v3) and after setting PrefetchRowCount to 50K, I was able to reduce time taken to 5 min.

Big question, why is there no standard way to provide hint to underlying driver in database/sql package about configuration parameters(Or am I missing something here)? I think this is a big implementation issue, and will restrict golang use to very premetive scenarios around databases. This will also encourage non stadard usage of drivers. PrefetchRowCount is just one example, and there are many others including language (NLS settings) etc. Both ODBC and JDBC specifications have very well articulated and provide lot of options for performance tuning, and other aspects around data retrieval. Having atleast some of those will certainly make life easier.

@ianlancetaylor ianlancetaylor changed the title database/sql missing row fetch size and other optimizations for large dataset database/sql: missing row fetch size and other optimizations for large dataset Oct 28, 2015
@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Oct 28, 2015
@bradfitz
Copy link
Contributor

Go prefers to start with simplicity instead of preemptively adding knobs. That explains why no explicit knobs yet exist.

But in this case it seems a potential knob already does exist: the database DSN string. Your driver could support a hint in there.

Or your driver could notice the result set is huge and buffer more, if it's buffering at all.

Before we do anything, I'd want to hear from the driver author first and make sure they've done the optimizations on their side.

@kritika1439
Copy link

hi,
please tell how to set PrefetchRowCount in golang by using the rana driver

@asambeka
Copy link
Author

You can use StmtCfg to set prefetch memory and row count in rana/ora.

-Ashwin

@bradfitz
Copy link
Contributor

Please take questions to the mailing list or other forums: https://golang.org/wiki/Questions

@kardianos
Copy link
Contributor

@asambeka I'm still feeling out the best practices here. But the advice I would like you to consider would be to add any driver knobs you need into the context value after the driver adds support for the driver. For instance, your PrefetchRowCount property could be set at some lower request handler as a default, and by the time it reaches your request, you just pass it in normally.

Context methods will be supported in go1.8+.

@gopherbot
Copy link

CL https://golang.org/cl/38533 mentions this issue.

gopherbot pushed a commit that referenced this issue May 18, 2017
Previously all arguments were passed through driver.IsValid.
This checked arguments against a few fundamental go types and
prevented others from being passed in as arguments.

The new interface driver.NamedValueChecker may be implemented
by both driver.Stmt and driver.Conn. This allows
this new interface to completely supersede the
driver.ColumnConverter interface as it can be used for
checking arguments known to a prepared statement and
arbitrary query arguments. The NamedValueChecker may be
skipped with driver.ErrSkip after all special cases are
exhausted to use the default argument converter.

In addition if driver.ErrRemoveArgument is returned
the argument will not be passed to the query at all,
useful for passing in driver specific per-query options.

Add a canonical Out argument wrapper to be passed
to OUTPUT parameters. This will unify checks that need to
be written in the NameValueChecker.

The statement number check is also moved to the argument
converter so the NamedValueChecker may remove arguments
passed to the query.

Fixes #13567
Fixes #18079
Updates #18417
Updates #17834
Updates #16235
Updates #13067
Updates #19797

Change-Id: I89088bd9cca4596a48bba37bfd20d987453ef237
Reviewed-on: https://go-review.googlesource.com/38533
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@kardianos
Copy link
Contributor

In go1.9 drivers will be able to implement driver.NamedValueChecker on the driver.Conn. When inspecting the arguments it can determine if the argument is some type of option value and pass back driver.ErrRemoveArgument so it will not be passed into the query argument list, but still use it on the connection to configure it in some way.

I believe this should fully resolve this issue.

@golang golang locked and limited conversation to collaborators May 30, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants