include synonyms in `odbcListObjects()` output #773

simonpcouch · 2024-03-07T22:36:34Z

Closes #221.

This PR modifies/adds SQL Server methods for dbListTables(), dbExistsTable(), and odbcListObjects() that include synonyms in output. This also means that synonyms will show up in the Connections pane.

~~Note that this PR makes no changes to dbListTables(), which is noted in the original issue.~~

[EDITs: update PR scope]

simonpcouch · 2024-03-07T22:40:26Z

R/driver-sql-server.R

+ INNER JOIN sys.databases AS DB
+ ON Sch.principal_id = DB.database_id
+ WHERE DB.name = ? AND Sch.name = ?
+ AND OBJECTPROPERTY(Object_ID(Syn.base_object_name), 'IsTable') = 1;",


For now, I've opted to only include synonyms whose base object is a table, and to refer to each such synonym as a "table" in the output. This means that synonyms (that are tables) will be previewable through the Connections pane, but it also feels a bit hacky. From what I can tell, the drop-down in the pane, which is based on odbcListFields(), also doesn't work (but doesn't cause errors/crashes).

Instead, we could opt not to include this table filter and return the type as "synonym". This would mean that there are entries in the Connections pane but no previews.

ThomasSoeiro · 2024-03-08T22:19:05Z

@simonpcouch
Hi,
I have the same issue with an Oracle database. Should I open a new issue or can it be taken care of here?
Thanks!

simonpcouch · 2024-03-11T13:54:19Z

@ThomasSoeiro A separate issue is good! :)

hadley · 2024-03-13T13:39:06Z

Hmmm, I think I'd be inclined to include these in dbListTables(), using our principle that if you can do SELECT foo then dbExistsTable(con, "foo") should be true, and "foo" should be included in the output of dbListTables().

simonpcouch · 2024-03-26T19:20:17Z

Synonyms are now supported with dbListTables() and dbExistsTable().

With `main`

library(DBI)
library(odbc)
con <- dbConnect(odbc(), dsn = "MicrosoftSQLServer", uid = "SA",
               pwd = Sys.getenv("sqlServerPass"))

dbExecute(con, "create schema odbc")
#> [1] 0
dbExecute(con, "create table odbc.test (x int)")
#> [1] 0

# confirm that we can find the table:
odbcListObjects(con, catalog = "master", schema = "odbc")
#>   name  type
#> 1 test table
dbListTables(con, catalog = "master", schema = "odbc")
#> [1] "test"

# make a synonym and show that it can't be found:
dbExecute(con, "create synonym odbc.test2 for odbc.test")
#> [1] 0
odbcListObjects(con, catalog = "master", schema = "odbc")
#>   name  type
#> 1 test table
dbListTables(con, catalog = "master", schema = "odbc")
#> [1] "test"

With this PR:

library(DBI)
library(odbc)
con <- dbConnect(odbc(), dsn = "MicrosoftSQLServer", uid = "SA",
                 pwd = Sys.getenv("sqlServerPass"))

dbExecute(con, "create schema odbc")
#> [1] 0
dbExecute(con, "create table odbc.test (x int)")
#> [1] 0

# confirm that we can find the table:
odbcListObjects(con, catalog = "master", schema = "odbc")
#>   name  type
#> 1 test table
dbListTables(con, catalog = "master", schema = "odbc")
#> [1] "test"

# make a synonym and show that it CAN be found:
dbExecute(con, "create synonym odbc.test2 for odbc.test")
#> [1] 0
odbcListObjects(con, catalog = "master", schema = "odbc")
#>    name  type
#> 1  test table
#> 2 test2 table
dbListTables(con, catalog = "master", schema = "odbc")
#> [1] "test"  "test2"

My biggest concern at this point is performance, esp. in cases when databases have many synonyms. Should this feature be gated behind an argument?

hadley · 2024-04-08T13:39:42Z

R/driver-sql-server.R

@@ -73,11 +73,11 @@ setMethod("dbExistsTable", c("Microsoft SQL Server", "character"),
 stopifnot(length(name) == 1)
 if (isTempTable(conn, name, ...)) {
 query <- paste0("SELECT OBJECT_ID('tempdb..", name, "')")


I wonder if it's worth checking to see if this works for synonyms too (without the tempdb..)? If so, that would substantially simplify this function.

hadley · 2024-04-09

@simonpcouch Did you see this comment?

simonpcouch · 2024-04-09

I did! Just haven't found a moment to come back to this PR to address this comment and run some quick benchmarks to see how it affects performance. Will re-request review then, likely tomorrow!

hadley · 2024-04-09

Oh sorry, I thought you had re-requested a review, but I must've been misreading my notifications.

simonpcouch · 2024-04-10

Was almost able to make this work with:

dots <- list(...)
name_loc <- paste0(
  c(dots[["catalog_name"]], dots[["schema_name"]], name),
  collapse = "."
)
return(!is.na(dbGetQuery(conn, paste0("SELECT OBJECT_ID('", name_loc, "')"))[[1]]))

The only issue is that OBJECT_ID() seems to only be able to work with fully qualified names. i.e.

library(DBI)
library(odbc)
con <- dbConnect(odbc(), dsn = "MicrosoftSQLServer", uid = "SA",
                 pwd = Sys.getenv("sqlServerPass"))

dbExecute(con, "create schema odbc")
#> [1] 0
dbExecute(con, "create table odbc.test (x int)")
#> [1] 0
dbExecute(con, "create synonym odbc.test2 for odbc.test")
#> [1] 0

# when fully qualified, works fine:
dbExistsTable(con, SQL("master.odbc.test2"))
#> [1] TRUE

# but not without schema/catalog specified
"test2" %in% dbListTables(con)
#> [1] TRUE
dbExistsTable(con, "test2")
#> [1] FALSE

Created on 2024-04-10 with reprex v2.1.0

Specifically, this triggered this DBItest test.
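As a rough illustration of the name-building step in the snippet above, here is a minimal sketch in plain R. The helper names `qualify_name()` and `object_id_query()` are hypothetical and not part of odbc; only `OBJECT_ID()` and the `catalog_name`/`schema_name` argument names come from the discussion. It does no quoting or escaping of identifiers.

```r
# Hypothetical helper (not part of odbc): join whichever of catalog, schema,
# and table name are supplied into a dotted identifier.
qualify_name <- function(name, catalog_name = NULL, schema_name = NULL) {
  # c() drops NULL entries, so missing parts are simply omitted.
  paste0(c(catalog_name, schema_name, name), collapse = ".")
}

# Hypothetical helper: wrap the (possibly partial) name in an OBJECT_ID() query.
object_id_query <- function(name, ...) {
  paste0("SELECT OBJECT_ID('", qualify_name(name, ...), "')")
}

# Fully qualified name
object_id_query("test2", catalog_name = "master", schema_name = "odbc")

# Bare name: nothing is prepended
object_id_query("test2")
```

The second call produces the unqualified form, which corresponds to the case reported above where the existence check returned FALSE.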

hadley · 2024-04-10

Hmmmm, is dbListTables() correct here? i.e. SELECT * FROM test2 won't work (IIUC) because it's not in the current schema. So should it actually be listed?

simonpcouch · 2024-04-10

:thinkies: The DBI docs for the generic read:

dbListTables() returns a character vector that enumerates all tables and views in the database.

so... I think it's correct that dbListTables() would list test2.

Some statements:

DBI docs: dbListTables() returns a character vector that enumerates all tables and views in the database.

DBItest, linked above: For all tables listed by dbListTables(), dbExistsTable() should return TRUE.

Us, on slack: maybe the principle is that if SELECT * FROM foo works then dbExistsTable(con, "foo") should return TRUE?

So dbListTables() and dbExistsTable() should agree, and dbListTables() seems to be doing the right thing. The connection to SELECT * FROM foo doesn't seem to hold up, at least for SQL Server.
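The agreement rule quoted from DBItest can be sketched as a small check. This is only an illustration under stated assumptions: `listed_but_missing()` is a hypothetical name, and the two stand-in functions below replace real calls to dbListTables() and dbExistsTable() so that the sketch runs without a database.

```r
# Hypothetical check (not part of DBI or DBItest): return every name that the
# listing function reports but the existence function does not confirm.
listed_but_missing <- function(list_tables, exists_table) {
  listed <- list_tables()
  exists <- vapply(listed, exists_table, logical(1))
  listed[!exists]
}

# Stand-ins for dbListTables(con) and dbExistsTable(con, name), mimicking the
# situation where the synonym is listed but not found:
fake_list <- function() c("test", "test2")
fake_exists <- function(name) name == "test"

listed_but_missing(fake_list, fake_exists)
```

With a live connection, the same check could presumably be run as `listed_but_missing(function() dbListTables(con), function(nm) dbExistsTable(con, nm))`; an empty result would mean the two functions agree.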

FWIW, with this PR:

library(DBI)
library(odbc)
con <- dbConnect(odbc(), dsn = "MicrosoftSQLServer", uid = "SA",
                 pwd = Sys.getenv("sqlServerPass"))

"test" %in% dbListTables(con)
#> [1] TRUE
dbExistsTable(con, "test")
#> [1] TRUE
dbGetQuery(con, "SELECT * FROM test")
#> Error in eval(expr, envir, enclos): nanodbc/nanodbc.cpp:1711: 00000
#> [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid object name 'test'.

"test2" %in% dbListTables(con)
#> [1] TRUE
dbExistsTable(con, "test2")
#> [1] TRUE
dbGetQuery(con, "SELECT * FROM test2")
#> Error in eval(expr, envir, enclos): nanodbc/nanodbc.cpp:1711: 00000
#> [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid object name 'test2'.

Created on 2024-04-10 with reprex v2.1.0

With CRAN, both "test2" calls are FALSE.

hadley · 2024-04-18

I'm pretty sure that the DBI docs are underspecified here. I think they really should read: "a character vector that enumerates all tables and views in the database for the active schema/catalog."

So maybe we're blowing out the scope of this PR, but I think we should narrow down this behaviour because I think that we should be returning FALSE for the test calls too.

But might be easiest to discuss this live on a call with some examples?

hadley · 2024-04-08T13:41:07Z

R/driver-sql-server.R

+ if (!has_catalog & !has_schema) {
+ return(paste0(res, filter_is_table))
+ }


Suggested change

- if (!has_catalog & !has_schema) {
- return(paste0(res, filter_is_table))
- }

I think you can just remove this?

simonpcouch · 2024-04-11

Ah, went ahead and removed this at the time but just ran into why it was there. If !has_catalog & !has_schema then the filter has to start with WHERE rather than AND. I'll go ahead and add a test there so it's more obvious in the future. da86cc4.
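The edge case described here can be sketched in a few lines of R. This is a minimal illustration rather than the package's code: `build_filter()`, its arguments, and the example condition are hypothetical, and only the WHERE-versus-AND logic is taken from the comment.

```r
# Hypothetical helper: append a condition to a query that may or may not
# already have a WHERE clause.
build_filter <- function(res, has_catalog, has_schema,
                         condition = "type = 'table'") {
  # With no catalog or schema filter, no WHERE clause exists yet, so the
  # condition has to open one; otherwise it is chained on with AND.
  keyword <- if (!has_catalog && !has_schema) " WHERE " else " AND "
  paste0(res, keyword, condition)
}

# No earlier filter: the condition starts with WHERE
build_filter("SELECT name FROM objs", FALSE, FALSE)

# A catalog filter is already present: the condition is appended with AND
build_filter("SELECT name FROM objs WHERE catalog = 'master'", TRUE, FALSE)
```

Appending with AND when no filter precedes it would produce invalid SQL, which is the situation the restored branch guards against.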

hadley · 2024-04-10T19:50:45Z

R/driver-sql-server.R

+ catalog = DB.name,
+ [schema] = Sch.name,
+ name = Syn.name


You might be able to simplify this query by using the DB_ID() and SCHEMA_ID() functions to convert schema/database names directly to IDs.

simonpcouch · 2024-04-11

Was able to rework using SCHEMA_NAME() and respecting synonyms(catalog_name), so no more joins and a good bit simpler now.

hadley · 2024-04-10T19:51:25Z

R/driver-sql-server.R

@@ -73,11 +73,11 @@ setMethod("dbExistsTable", c("Microsoft SQL Server", "character"),
 stopifnot(length(name) == 1)
 if (isTempTable(conn, name, ...)) {
 query <- paste0("SELECT OBJECT_ID('tempdb..", name, "')")


Hmmmm, is dbListTables() correct here? i.e. SELECT * FROM test2 won't work (IIUC) because it's not in the current schema. So should it actually be listed?

simonpcouch · 2024-04-10T20:38:33Z

Here are those benchmarks with the PR as-is (EDIT: updated timings below):

[Plot: elapsed times for the 3 affected functions with 0 to 10000 synonyms present. One column of plots gives CRAN timings and the other gives dev timings, showing a drastic increase in timings for the dev package.]

...where the values to the left of x = 1 are x = 0 (with a small shift for log(0)). "Default" objects means what's available in a relatively fresh SQL Server Docker instance; length(dbListTables(con)) is 618. Timings for dbExistsTable() are for a table that doesn't exist, so the synonyms_query kicks in. Generally, doesn't look great. 🏄

Will look into your most recent comments.

Source

 library(tidyverse)
 
 # run the following, once with cran odbc and once with dev ------------------------------
 library(DBI)
 library(odbc)
 con <- dbConnect(odbc(), dsn = "MicrosoftSQLServer", uid = "SA",
                  pwd = Sys.getenv("sqlServerPass"))
 
 dbExecute(con, "create schema odbc")
 dbExecute(con, "create table odbc.test (x int)")
 
 n_synonyms <- c(0, 10^c(0:4))
 timings <- 
   data.frame(
     n = numeric(0), 
     odbcList = numeric(0), 
     dbList = numeric(0), 
     dbExists = numeric(0)
   )
 
 for (n_synonym in n_synonyms) {
   for (n in seq_len(n_synonym)) {
     dbExecute(
       con, 
       SQL(paste0("create synonym odbc.test", n, " for odbc.test"))
     )
   }
   
   for (i in 1:3) {
     timings <- rbind(
       timings,
       data.frame(
         n = n_synonym, 
         odbcList = system.time(odbcListObjects(con))[["elapsed"]], 
         dbList = system.time(dbListTables(con))[["elapsed"]], 
         dbExists = system.time(dbExistsTable(con, "test0"))[["elapsed"]],
         version = ifelse(packageVersion("odbc") == "1.4.2", "cran", "dev")
       )
     )
   }

   for (n in seq_len(n_synonym)) {
     dbExecute(
       con, 
       SQL(paste0("drop synonym odbc.test", n))
     )
   }
 }
 
 # write to the respective .csv
 # write_csv(timings, file = "dev_timings.csv")
 # write_csv(timings, file = "cran_timings.csv")


 # then, reading in and plotting --------------------------------------------------------------
 dev_timings <- read_csv("dev_timings.csv")
 cran_timings <- read_csv("cran_timings.csv")
 
 
 bind_rows(dev_timings, cran_timings) %>%
   pivot_longer(cols = c(odbcList, dbList, dbExists), names_to = "fn") %>%
   # shift n slightly so that n = 0 can be shown on the log-scaled axis
   mutate(n = n + .1) %>%
   ggplot() +
   geom_point(aes(x = n, y = value)) +
   facet_grid(rows = vars(fn), cols = vars(version)) +
   scale_x_log10() +
   labs(
     x = "Number of Synonyms", 
     y = "Time",
     caption = "Only 'default' objects + 1 0-row table, 3 calls per (n, fn) pair"
   )

since each `sys.synonyms` references only the synonyms that live inside of the database/catalog it's inside of, only query the (fully qualified, if not the one from the active database) `sys.synonyms` in the `catalog_name`. if no `catalog_name` is specified, respect the active database.

simonpcouch · 2024-04-11T19:02:01Z

Updated benchmarks:

Definitely better than before but not negligible. I haven't seen enough real-world DBs to know how close to "worst-case" this is.

simonpcouch · 2024-05-10T17:34:10Z

Let's table this PR until at least after the upcoming release. Will mark as draft for now.

include synonyms in odbcListObjects() output

356c188

note change in NEWS

98d9df5

simonpcouch requested a review from hadley March 7, 2024 23:04

support synonyms in dbListTables()

30f592b

simonpcouch added 4 commits March 27, 2024 09:17

transition query to handle NULLs

124c7b9

had issues passing these as parameters using something like `WHERE (? IS NULL OR DB.name = ?) AND (? IS NULL OR Sch.name = ?)`

support synonyms in dbExistsTable()

e268bf8

test support for dbListTables() and dbExistsTable()

f12dd24

update verbiage in NEWS

80a7b43

simonpcouch requested review from hadley and removed request for hadley March 27, 2024 14:55

ThomasSoeiro mentioned this pull request Apr 2, 2024

Include synonyms in odbcListObjects(), dbListTables(), and the Connections Pane #779

Open

hadley reviewed Apr 8, 2024

address review comments

1a8170e

edits to `dbExistsTable()` should resolve possible slowdown when the non-synonym table does exist

hadley reviewed Apr 10, 2024

simonpcouch added 5 commits April 11, 2024 08:38

restore table filter edge case

da86cc4

correct test for appropriate object type

503f45f

only return synonyms when listing tables

779f0b2

fix bug when a schema consists only of synonyms

1cbc62d

simonpcouch requested a review from hadley April 11, 2024 19:38

simonpcouch marked this pull request as draft May 10, 2024 17:34
