-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Streaming LIST data over ADBC #2066
Comments
I find installing the development ADBC Python packages rather difficult; however, I do have a build set up and ran your example (thanks!) against the postgres driver at main and it seems to fail! That error is coming from a previous implementation of ingestion that doesn't use COPY, and the COPY writer is where the feature was added in the PR you linked. It also may be that a recent PR I did resulted in the COPY path not getting picked up. import os
import pandas
import pyarrow
import adbc_driver_postgresql.dbapi
# From the apache-adbc checkout, run docker compose up postgres-test
conn = adbc_driver_postgresql.dbapi.connect("postgresql://localhost:5432/postgres?user=postgres&password=password")
with conn.cursor() as cur:
cur.execute("DROP TABLE IF EXISTS example")
dfs = [
pandas.DataFrame({"A": [[1, 2, 3], [1, 2], [1, 2, 3, 4]], "B": [10., 20., 30.]}),
pandas.DataFrame({"A": [[4, 5, 6], [4, 5, 6, 7], [4]], "B": [40., 50., 60.]}),
]
# Create a table and append to it.
with conn.cursor() as cur:
for df in dfs:
table = pyarrow.Table.from_pandas(df)
cur.adbc_ingest("example", table, mode="create_append")
cur.execute("SELECT * FROM example")
print(cur.fetchall())
conn.close()
#> Traceback
#> ...
#> NotSupportedError: NOT_IMPLEMENTED: [libpq] Field #1 ('A') has unsupported type for ingestion list Reproducer in R: library(adbcdrivermanager)
#> Warning: package 'adbcdrivermanager' was built under R version 4.3.3
con <- adbc_database_init(
adbcpostgresql::adbcpostgresql(),
uri = "postgresql://localhost:5432/postgres?user=postgres&password=password"
) |>
adbc_connection_init()
df <- tibble::tibble(
A = vctrs::list_of(1:3, 4:5, 6:10),
B = c(10.0, 20.0, 30.0)
)
con |>
execute_adbc("DROP TABLE IF EXISTS table_with_list")
df |>
write_adbc(con, "table_with_list")
#> Error in adbc_statement_execute_query(stmt): NOT_IMPLEMENTED: [libpq] Field #1 ('A') has unsupported type for ingestion list Created on 2024-08-07 with reprex v2.1.0 |
Hmm..I'm wondering if this call to arrow-adbc/c/driver/postgresql/statement.cc Line 1305 in 3168e09
|
Hello @paleolimbot, Thank you very much for your response. In addition to python, we created also a c example by following the code and the tests in the c repo. We were able to create a table and read from it when the table does not contain lists. However, the code crashes with segmentation fault if there is any list ingestion. All the queries run without throwing any error but no table is created in the postgresql database. The code crashes at the I also tried to comment the line you pointed and build the code but it did not fix the problem.
<\details> |
Thanks for this! I'll go through the C example in a moment, but I played with this a tiny bit and there is also the place where we serialize the type name (which can probably be moved to the arrow-adbc/c/driver/postgresql/statement.cc Lines 1098 to 1101 in 3168e09
I also tried shuffling some pieces around to see if we could use the "append" mode (i.e., issue the CREATE TABLE manually), but when I do this I still get an invalid COPY binary error. I'll put all this in a PR but wanted to document it while it was fresh on my mind! |
Hello @paleolimbot. Just giving this a ping to express our continued interest in this topic and willingness to contribute if we can. |
Thank you for the ping! I was unfortunately on PTO for the last week before the forthcoming release but will be back next week and will make sure a fix for this is included in the next release (~6 weeks). |
Thanks! That is helpful for our planning. I hope you had some enjoyable days off. |
…greSQL (#2153) This PR adds list ingest support in the PostgreSQL driver. The raw COPY and type support had already been added; however, there were still some hard-coded switch-on-type that implemented some necessary features (e.g., the CREATE TABLE syntax for each type). I moved these to the type/copy section and removed the previous implementation (which should also help for a future PR implementing the ability to bind nested types as parameters). I also added the infrastructure to test this in all drivers via the validation suite, which required modifying some of the built-in concepts to accommodate a nested type. Closes #2066. ``` r library(adbcdrivermanager) con <- adbc_database_init( adbcpostgresql::adbcpostgresql(), uri = "postgresql://localhost:5432/postgres?user=postgres&password=password" ) |> adbc_connection_init() df <- tibble::tibble( A = vctrs::list_of(1:3, 4:5, 6:10), B = c(10.0, 20.0, 30.0) ) con |> execute_adbc("DROP TABLE IF EXISTS table_with_list") df |> write_adbc(con, "table_with_list") con |> read_adbc("select * from table_with_list") |> tibble::as_tibble() |> dplyr::pull(1) #> <list_of<integer>[3]> #> [[1]] #> [1] 1 2 3 #> #> [[2]] #> [1] 4 5 #> #> [[3]] #> [1] 6 7 8 9 10 ``` <sup>Created on 2024-09-11 with [reprex v2.1.1](https://reprex.tidyverse.org)</sup>
What would you like help with?
Hello @paleolimbot, we briefly spoke at SciPy a month ago. @cpcloud and @gforsyth sent me your way.
My colleauge @skarakuzu and I have the following application:
In our use case, we are interested in reading partial datasets while an experiment is still ongoing, but we are not especially sensitive to real-time access; a little lag is acceptable. We would like to use Postgres for facility-scale "production" and SQLite for small deployments and dev/test.
We think it makes sense to use ADBC to get the data into Postgres and SQLite.
Most of our data is simple tabular data with basic types. This is a simple sketch that works:
In rare cases, we have detectors that produce a small "waveform" of data, a small variable-length list, alongisde other columns of scalar readings, like this:
Expand for a complete runnable example with this data:
As you would expect this fails with a clear error message:
It looks like support for lists was recently added for Postgres in #1962 but not yet released. We have tried without success to cobble together a C example inserting lists into PostgreSQL on
main
. I wonder if you could sketch a short runnable example to help us get started. (Let us know if you want to see what we've tried.) We would happy to contribute to the C documentation for ADBC once get something working.Once #1962 is released, should we expect lists to work from Python as well, as it wraps the C/C , or are more changes needed on the Python side to make that work?
Finally, we're interested in adding support for lists in SQLite. As SQLite is loose about types, perhaps one reasonable way to do this would be with a JSON column. I wonder what you think of that approach.
The text was updated successfully, but these errors were encountered: