database/sql: expose database original column type #16652
Comments
I'm available to work on this. I would like to start contributing to Go. More or less, what should the API look like for this? |
Much of the work is figuring out the API. Got a proposal? |
What if we had a method on
|
@juicemia That only works for table columns; the column in a query may be computed by the database engine, in which case there is no set of arguments to pass to your QueryMeta method that will return the correct type. |
How about a new method on the Rows type:
I am not sure what type it should return though. It is certainly database-specific. E.g. there are ~150 different types in PostgreSQL: https://github.com/lib/pq/blob/master/oid/types.go and the JDBC API defines ~40 standard types: https://docs.oracle.com/javase/8/docs/api/java/sql/Types.html |
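A minimal sketch of what such a method on Rows could look like. All names here (fakeRows, DatabaseTypeNames) are invented for illustration; none of this is real database/sql API.

```go
package main

import "fmt"

// fakeRows is a hypothetical stand-in for a driver's Rows value.
type fakeRows struct {
	cols  []string
	types []string // driver-specific type names, e.g. "VARCHAR", "INT8"
}

func (r *fakeRows) Columns() []string { return r.cols }

// DatabaseTypeNames returns one driver-reported type name per column.
// This is the hypothetical new method under discussion.
func (r *fakeRows) DatabaseTypeNames() []string { return r.types }

func main() {
	rows := &fakeRows{
		cols:  []string{"id", "name"},
		types: []string{"INT8", "VARCHAR"},
	}
	for i, c := range rows.Columns() {
		fmt.Printf("%s: %s\n", c, rows.DatabaseTypeNames()[i])
	}
}
```

Returning plain strings sidesteps the question of a common type enumeration, at the cost of leaving interpretation to the caller.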
This introduces the concept of type information, useful for #12381 as well for input parameter types. The following type information would work: https://godoc.org/github.com/kardianos/rdb#Type |
@kostya-sh @kardianos defines a custom
|
That's actually not complete enough in my opinion. Also see https://godoc.org/github.com/kardianos/rdb#Schema and the Column def. There are other parameters that can be important to know: is it nullable? Is it a primary key? Field length, decimal precision. I do think it is important to allow drivers to define custom types. In my implementation there is a generic type as well as a specific type. For example: varchar, char, nvarchar, and text are all generically "Text". |
Maybe it should just be an interface that's up to drivers to define? SQLite shrinks the lowest-common-denominator set of types quite a bit. Postgres lets you define your own types. SQLite doesn't really care if you use the right type; Postgres does. Some of their types share names, but that's about it: they behave quite differently. I'm not sure what methods it should have (other than String() string) though. Stuff to work with other types and values, presumably. |
@kardianos good point. Then maybe it could just be
|
@jimmyfrasche Please don't use SQLite as an example. If you are using SQLite, today's API is perfect. However, every other RDBMS out there has a strong notion of type, internally and in its interfaces. As for variety, I've done the survey. What you have in the rdb Types covers like 99% of the types in use. There really isn't that much variety. Go type string is either varchar, char, nvarchar, varchar2, or text (and varchar2 is an Oracle-specific type, can be ignored for our use, and can be provided by the driver). That's it. Numeric types are generally all tinyint, smallint, int, bigint. There are custom types: SQL Server allows .NET types to be defined, including table types; Postgres allows arrays and other types. But I think we can let drivers define those types. So yes, SQLite doesn't care about types, and the current API is perfect for it. However, other databases care (at least for performance) whether your param is a varchar(10) or an nvarchar(-1). If it isn't meaningful for the database system in question, ignore it. But that is mostly just SQLite where you would. |
@kardianos Sqlite may not be the best example (don't get me wrong, I love it) but it's still an example of a database that needs to be supported. MySQL is another example of a database that's wishy-washy about types (this one I do not love). However, my point was that the definition of type in SQL across databases isn't something that can be set in stone. Even standardized types can differ between databases, sometimes in subtle ways.

If you define a lot of types, even if most databases support all of them, the drivers for the ones that don't will need to handle the ones they don't support. If they have types not in that set, the driver and the user have to deal with it. I don't see a lot of value in filling database/sql up with a lot of constants that may or may not apply and don't really mean a lot on their own without knowing what driver you're using.

If there's a Type interface with nice semantics, it can cover as many or as few types as a particular database provides and reflect the native semantics more accurately, even for databases that don't exist yet. This does push the implementation burden onto the provider of the driver, but it also affords them the ability to define their types to fit their underlying database better.

Ultimately, what matters is how users of the database/sql package are going to use this information. Admittedly, I've only ever used it for debugging, so strings would suit me just fine. Constants are cheaper but do not cover all the cases, so you still need a way to compare arbitrary types anyway, since TypeCustom equaling TypeCustom isn't really the whole story. So there should probably be an Equal(t Type) method on Type and an AssignableTo(t Type). These would have to be provided by the driver even for common types.
Most users will only be dealing with one database at a time so it doesn't really matter if sqlite3.TypeText != pg.TypeText: you're probably going to have to handle that specially anyway if you're writing code that cares about such things. I'm not convinced an interface is better than just strings: strings are much simpler and easier for a driver to implement. I'm pretty sure constants in database/sql aren't the way to go here, though. It makes the world seem more uniform than it is. |
@jimmyfrasche Can you please point me to documentation explaining how MySQL is wishy-washy about types? Are you disagreeing that there are only so many distinct types, or claiming there are many more than those listed at https://github.com/kardianos/rdb/blob/master/types.go#L53 ? |
MySQL does a lot of implicit conversions ( https://dev.mysql.com/doc/refman/5.7/en/type-conversion.html ), though knowing the type up front could let you work around that, if you knew it might get you in trouble and you knew you were dealing with MySQL.
Both. There are databases with fewer types. There are databases with more types. (Many with infinite types in the sense that you can define arbitrary composite types like arrays and records). I'm also trying to point out that just because two types have the same name in two different databases it doesn't mean they behave the same or that all the same rules apply. None of that is to say that the type information isn't useful: just that it's very dependent on what database you're using. I'm thinking something like
is going to be more useful since it can handle a lot of those database-specific nuances for you. An sqlite driver could just always return true for AssignableTo and a postgres driver could return false if Equal returns false (after checking a handful of special cases surrounding text types, iirc). |
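A hedged sketch of such a driver-defined Type interface with Equal and AssignableTo; sqliteType and pgType are invented stand-ins, not real driver code.

```go
package main

import "fmt"

// Type is a sketch of the driver-defined interface discussed above.
type Type interface {
	String() string
	Equal(t Type) bool
	AssignableTo(t Type) bool
}

// sqliteType: SQLite is dynamically typed, so any value is assignable.
type sqliteType struct{ name string }

func (s sqliteType) String() string           { return s.name }
func (s sqliteType) Equal(t Type) bool        { return s.name == t.String() }
func (s sqliteType) AssignableTo(t Type) bool { return true }

// pgType: a stricter engine might only allow assignment between equal types.
type pgType struct{ name string }

func (p pgType) String() string           { return p.name }
func (p pgType) Equal(t Type) bool        { return p.name == t.String() }
func (p pgType) AssignableTo(t Type) bool { return p.Equal(t) }

func main() {
	fmt.Println(sqliteType{"TEXT"}.AssignableTo(pgType{"INT"})) // true: SQLite accepts anything
	fmt.Println(pgType{"TEXT"}.AssignableTo(pgType{"INT"}))     // false: types differ
}
```

Each driver encodes its own assignability rules, so database/sql itself never has to enumerate types.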
This is how SQL works, not just MySQL. In MS SQL Server, if you send an nvarchar param to a varchar column in a where clause, performance goes down significantly due to increased conversion overhead. Silly? Yes. But I've gotta work with that without a bunch of hacks.
To a degree. The type Text in MS SQL Server has a distinct meaning from varchar(max), whereas in PostgreSQL the type Text is just an unbounded varchar without significant differences. But the interface you are suggesting can't capture that well, as I'm not just talking about assignability to another database type. I also want to specify the input parameters (#12381), where I need a database type.
Fewer types, some types aren't used, more types defined in the driver itself. Despite the many dialects of SQL, data types do have a common base, and a driver may define and reference the differences. I don't see the problem. I find value in knowing what the database thinks a type is, both for result columns and for input parameters. You are saying: for result columns, provide a string representation and an interface providing the many functions you might want to use with it. I think that falls down in the input-parameter case. There may be value in letting the driver expose a function like:
That could have value. But I think a standard type list, along with a driver type list is a first step. Do a survey among common and uncommon RDMBS types. They all share the vast majority of types, there is some variation as in the MS text vs varchar, but in those cases knowing the exact differences as a programmer is required (If a stored proc takes text, don't give it varchar for example). |
I see what you mean re #12381 though the driver could export the types for this use. That doesn't help if you're trying to remain database agnostic, though. But presumably you're passing this as a hint to the driver, so you're already not database agnostic. Maybe I don't understand the use case. If a specific driver is defining its own types—which it would have to for databases that allow user defined types at the very least—you still need an interface. Then the Type constants in database/sql would have to implement that interface, so they could be used interchangeably with the driver's types. But then they couldn't have anything db specific, like equivalence, so none of the types in database/sql could ever contain information specific to a database, so the interface couldn't have things like AssignableTo. If you needed that information you'd need to encode that in the driver or a third location, so what's the point of having the type constants in database/sql? They'd just be a list of names that could mean different things. I do not see the point. It would be nice for database/sql to have more features, but I'm wary of encoding too much information into it. What is the use case for having these specific constants in database/sql? Like I said, I've only ever used it for debugging. I imagine it would be very handy for statement mappers and ORMs but I've never written such things. |
As far as I can see there are 3 places where types could be useful:
I think it would help to discuss and understand requirements for each of these cases separately. Before jumping to the API discussions I suggest we should understand the use case for the requested feature better. @kirubasankars, can you please explain what you meant by "some level dynamic column value parsing"? Has anyone needed this functionality before? What was the use case? |
For reference, this is what JDBC provides: http://docs.oracle.com/javase/8/docs/api/java/sql/ResultSetMetaData.html I can imagine only a few database drivers support getting all this information though. For example, this is what you can get out of PostgreSQL: https://www.postgresql.org/docs/9.5/static/protocol-message-formats.html (search for |
@jimmyfrasche Yes, this would not be database agnostic. That would need a different system layer to filter it through. It would just expose what the RDBMS SQL type is. Use case for results where switching on go type doesn't work:
If you need to encode information regarding ForFullName, you can't rely on it being there in the first row; in fact it may never be represented. This isn't a problem if you ad-hoc everything, but if you push more to an app-specific framework, it can be more problematic. However, per Brad in comment #12381 (comment), I think this is moot. So while Brad also said #16652 (comment), it would need to be without specific type information being represented in the sql package. |
@kostya-sh, Trying to create entire row sets as map[string]interface{}. While executing that query, my code has no idea how many columns there are, or what their names and types are. Basic types are automatically mapped to Go types, which is good. Custom types are not (for example, PostgreSQL custom types); drivers can only give me those custom values as RawBytes. I was expecting this package to expose type and size information. In my opinion Java may handle this better; also have a look at the JDBC SQL types. |
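As an aside, the row-to-map half of this use case is already expressible with today's API. The sketch below uses only documented database/sql calls; the gap this issue asks to close is that every value comes back as the driver's default Go type, with no original column type attached.

```go
package main

import (
	"database/sql"
	"fmt"
)

// zipRow pairs column names with scanned values.
func zipRow(cols []string, vals []interface{}) map[string]interface{} {
	m := make(map[string]interface{}, len(cols))
	for i, c := range cols {
		m[c] = vals[i]
	}
	return m
}

// rowToMap scans the current row into a map keyed by column name,
// using pointers into a []interface{} so column count can be unknown.
func rowToMap(rows *sql.Rows) (map[string]interface{}, error) {
	cols, err := rows.Columns()
	if err != nil {
		return nil, err
	}
	vals := make([]interface{}, len(cols))
	ptrs := make([]interface{}, len(cols))
	for i := range vals {
		ptrs[i] = &vals[i]
	}
	if err := rows.Scan(ptrs...); err != nil {
		return nil, err
	}
	return zipRow(cols, vals), nil
}

func main() {
	// With a real driver registered, rowToMap would be called inside
	// a rows.Next() loop; here we only demonstrate the zip step.
	fmt.Println(zipRow([]string{"id", "name"}, []interface{}{int64(1), "go"}))
}
```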
Something just occurred to me. At the risk of sounding dumb: how would I go about adding a method on an interface type?
Adding it to the

EDIT: I'm only doing this to start playing around and getting familiar with the source. I don't feel like we've reached a consensus as to what should be done. |
@juicemia *sql.Rows is a struct. As such it can be added to safely. |
@kardianos wow how embarrassing. 😅 i was looking in driver and totally forgot about the rest of the package. 😶 So what's the direction we should take on the types then? I agree with @jimmyfrasche that we should just expose a |
Has discussion for this moved somewhere else, or has it just tapered off? |
@juicemia It has not moved. See #12381 (comment) Also see: #16673 (comment) for the scope of changes I hope to tackle for go1.8. |
@kardianos, no, I'm fine with making type information available. What I was opposed to in that comment was making users redundantly specify type information when inserting data, since an interface value already conveys type information. I'd like to see an API proposal for this issue. |
After a chat with @kardianos, we've converged on a design like:

type ColumnInfo struct {
// unexported fields
// driver handle some
}
func (ci ColumnInfo) Name() string { ... }
func (ci ColumnInfo) ScanType() reflect.Type { ... } // something that's suitable for Scan
func (ci ColumnInfo) Nullable() (nullable, ok bool) { ... }
func (ci ColumnInfo) DriverTypeName() string { ... } // "CHAR" without (length)
func (rs *Rows) ColumnInfo() ([]ColumnInfo, error) {
} |
Are we missing column size from the driver? |
Seconding @kirubasankars on column size. Also |
@kardianos @bradfitz @kirubasankars I hope the ship hasn't sailed on the column size decision. I was working with this very issue literally two minutes ago on my own project, and decided I'd better drop in on this ticket again. I'll give you a specific use case where knowing the column size is important (and this is only my particular case from tonight; this is a common occurrence with ETL software).

I'm working on an app that moves data between heterogeneous datasources (where the schemata are not known ahead of time, etc.). Sometimes this means that there's a scratch DB in the middle where data munging/staging goes on. Let's say I've got

Now, under the proposal above, yes, I would know that a column is a text-ish field, but without knowing the size the best one can do is create a

Again, the col size is data that is ordinarily available to the driver. If for some reason the driver can't present that information, we could use |
One more thing: should we not also include a method as follows?
This can default to |
OK, maybe just one more thing... let's not forget about:
Although, that's not ideal either? I'd be more inclined to go this path:
This would be more extensible/future-proof; implementations could choose how much (if any) detail to send back in the attributes, and would allow us to drop the |
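One possible reading of the attributes idea, sketched under assumptions: the ColumnAttrs type and its key names ("name", "db_type", "size") are invented here. The driver fills a per-column key/value bag with whatever metadata it has, so the core API never commits to a fixed field set.

```go
package main

import "fmt"

// ColumnAttrs is a hypothetical per-column attribute bag a driver
// could populate with as much or as little detail as it knows.
type ColumnAttrs map[string]interface{}

// describe renders a column from its attributes, using size only when
// the driver actually reported it.
func describe(attrs ColumnAttrs) string {
	name, _ := attrs["name"].(string)
	typ, _ := attrs["db_type"].(string)
	if size, ok := attrs["size"].(int64); ok {
		return fmt.Sprintf("%s %s(%d)", name, typ, size)
	}
	return fmt.Sprintf("%s %s", name, typ)
}

func main() {
	fmt.Println(describe(ColumnAttrs{"name": "email", "db_type": "VARCHAR", "size": int64(255)}))
	fmt.Println(describe(ColumnAttrs{"name": "created", "db_type": "TIMESTAMP"}))
}
```

The trade-off is the one noted above: extensible and future-proof, but untyped keys push validation onto callers.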
@neilotoole I agree we need length, and possibly Scale and Precision. |
CL https://golang.org/cl/29961 mentions this issue. |
@kardianos I think you're probably correct about |
@neilotoole Size and length are included in the CL. Please review if you want to. |
@kardianos thanks, I just reviewed patch set 5. |
I had some time this week to look over the proposal. I'm generally OK with the proposed

TL;DR: Replace the five new

It seems to me that having these five separate interfaces is overly elaborate.

package driver
type RowsColumnTypeScanType interface {
Rows
ColumnTypeScanType(index int) reflect.Type
}
type RowsColumnTypeDatabaseTypeName interface {
Rows
ColumnTypeDatabaseTypeName(index int) string
}
type RowsColumnTypeLength interface {
Rows
ColumnTypeLength(index int) (length int64, ok bool)
}
type RowsColumnTypeNullable interface {
Rows
ColumnTypeNullable(index int) (nullable, ok bool)
}
type RowsColumnTypePrecisionScale interface {
Rows
ColumnTypePrecisionScale(index int) (precision, scale int64, ok bool)
}

And this from the sql package:

package sql
func (rs *Rows) ColumnTypes() ([]ColumnType, error) {
if rs.isClosed() {
return nil, errors.New("sql: Rows are closed")
}
if rs.rowsi == nil {
return nil, errors.New("sql: no Rows available")
}
return rowsColumnInfoSetup(rs.rowsi), nil
}
func rowsColumnInfoSetup(rowsi driver.Rows) []ColumnType {
names := rowsi.Columns()
list := make([]ColumnType, len(names))
for i := range list {
ci := &list[i]
ci.name = names[i]
if prop, ok := rowsi.(driver.RowsColumnTypeScanType); ok {
ci.scanType = prop.ColumnTypeScanType(i)
} else {
ci.scanType = reflect.TypeOf(new(interface{})).Elem()
}
if prop, ok := rowsi.(driver.RowsColumnTypeDatabaseTypeName); ok {
ci.databaseType = prop.ColumnTypeDatabaseTypeName(i)
}
if prop, ok := rowsi.(driver.RowsColumnTypeLength); ok {
ci.length, ci.hasLength = prop.ColumnTypeLength(i)
}
if prop, ok := rowsi.(driver.RowsColumnTypeNullable); ok {
ci.nullable, ci.hasNullable = prop.ColumnTypeNullable(i)
}
if prop, ok := rowsi.(driver.RowsColumnTypePrecisionScale); ok {
ci.precision, ci.scale, ci.hasPrecisionScale = prop.ColumnTypePrecisionScale(i)
}
}
return list
}

I suggest replacing those five with:

package driver
type ColumnTyper interface {
Rows
ColumnTypes() ([]ColumnType, error)
}

The issues I see with this current proposal include:
|
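For context, the optional-interface pattern under debate can be sketched minimally. Rows and fakeRows here are simplified stand-ins, not the real database/sql/driver definitions: a driver opts in by implementing extra methods on its Rows type, and the sql side feature-detects via type assertion.

```go
package main

import "fmt"

// Rows is a pared-down stand-in for driver.Rows.
type Rows interface {
	Columns() []string
}

// RowsColumnTypeDatabaseTypeName mirrors one of the five optional
// interfaces from the proposal above.
type RowsColumnTypeDatabaseTypeName interface {
	Rows
	ColumnTypeDatabaseTypeName(index int) string
}

// fakeRows opts in to only the DatabaseTypeName extension.
type fakeRows struct{}

func (fakeRows) Columns() []string { return []string{"id", "name"} }
func (fakeRows) ColumnTypeDatabaseTypeName(i int) string {
	return []string{"INT8", "VARCHAR"}[i]
}

// typeNames feature-detects the optional interface by type assertion;
// a driver lacking it simply yields empty names.
func typeNames(r Rows) []string {
	names := make([]string, len(r.Columns()))
	if tn, ok := r.(RowsColumnTypeDatabaseTypeName); ok {
		for i := range names {
			names[i] = tn.ColumnTypeDatabaseTypeName(i)
		}
	}
	return names
}

func main() {
	fmt.Println(typeNames(fakeRows{}))
}
```

This shows why old drivers keep compiling: the new methods are discovered at runtime, never required.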
If we were to go with the

The list of drivers is from the golang wiki. Each link is to the source for the struct that implements
|
For my own private project, I took almost this exact approach when hacking the

package driver
type ColumnTyper interface {
ColumnTypes() ([]ColumnType, error)
}

EDIT FOR CLARITY: this copy-and-edit snippet below (which exposes the driver package) is a poor example:

package sql
func (rs *Rows) ColumnTypes() ([]driver.ColumnType, error) {
if rs.closed {
return nil, errors.New("sql: Rows are closed")
}
if rs.rowsi == nil {
return nil, errors.New("sql: no Rows available")
}
if colTyper, ok := rs.rowsi.(driver.ColumnTyper); ok {
return colTyper.ColumnTypes()
}
return nil, errors.New("sql: driver does not provide column type data")
}

In my own project I have already implemented the |
The reason is for backwards compatibility and extensibility. If we need to add another property, it can be done. Also the driver package cannot reference the sql package, so we would need to copy the values from each driver.ColumnType to sql.ColumnType.
Some drivers may not have all the information available. But I generally agree with this point.
I don't really buy this one. At the end of the day you are testing the same number of properties.
I don't understand this.
This is true. I'm not expecting this to be in a super hot loop. I also want to be able to work with tables with hundreds of columns ( http://www.jdetables.com/ sigh). Backwards compatibility and extensibility are the key design issues here. Let's take a step back and impose the following restrictions on the design:
Thank you for the above links and the concrete counter proposal. Could you modify the proposal to meet the above, or describe how it meets the above? I agree that it meets the last point. |
Yeah, I take your point on the extensibility. I guess my assumption was that this proposal is the fix for the
The example I gave above is from my own hack of the API for a private project where I didn't care about exposing the driver package (I'll add a comment to clarify this; I probably should have omitted that example). But yes, I was imagining the two

package sql
type Rows struct {
dc *driverConn // owned; must call releaseConn when closed to release
releaseConn func(error)
rowsi driver.Rows |
I guess ^^^ would look something like this:

package sql
type ColumnType struct {
cti driver.ColumnType
}
func (c *ColumnType) Name() string {
return c.cti.Name()
}
func (c *ColumnType) Length() (length int64, ok bool) {
return c.cti.Length()
}
// etc...
func (rs *Rows) ColumnTypes() ([]ColumnType, error) {
if rs.closed {
return nil, errors.New("sql: Rows are closed")
}
if rs.rowsi == nil {
return nil, errors.New("sql: no Rows available")
}
if colTyper, ok := rs.rowsi.(driver.ColumnTyper); ok {
drvCols, err := colTyper.ColumnTypes()
if err != nil {
return nil, err
}
sqlCols := make([]ColumnType, len(drvCols))
for i := range drvCols {
sqlCols[i] = ColumnType{drvCols[i]}
}
return sqlCols, nil
}
return nil, errors.New("sql: driver does not provide column type data")
}

Which does result in a bunch of allocations, hmmm. |
@kardianos I'm coming around to your approach due to your very valid point about future extensibility (which wasn't a design consideration for me in my own hack). Also, it's probably the case that my point about the number of calls to the driver to construct the |
Please expose the original database column type, at least as a string. This is currently very restricting in this package. It would help us do some level of dynamic column value parsing