Setup
import pandas as pd
import pyodbc
import credentials # separate file with user credentials
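The `credentials` module imported above is not shown here. The following is a minimal sketch of what such a file might contain, assuming the values come from environment variables. The function name `load_credentials` and the variable names `DREMIO_USER` and `DREMIO_PASSWORD` are hypothetical, not a Dremio convention.

```python
# Hypothetical contents for credentials.py; the original file is not shown.
import os


def load_credentials(env=os.environ):
    """Return (user, password) read from a mapping of environment variables.

    DREMIO_USER and DREMIO_PASSWORD are assumed names chosen for this sketch.
    """
    user = env.get("DREMIO_USER")
    password = env.get("DREMIO_PASSWORD")
    if not user or not password:
        raise RuntimeError("Set DREMIO_USER and DREMIO_PASSWORD before connecting")
    return user, password
```

Calling `user, password = load_credentials()` at module level in `credentials.py` would then provide the `credentials.user` and `credentials.password` attributes used in the settings, while keeping secrets out of version control.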
Pyodbc settings
host = 'localhost'
port = 31010
uid = credentials.user
pwd = credentials.password
driver = '/opt/dremio-odbc/lib64/libdrillodbc_sb64.so'  # default ODBC driver path on Ubuntu/Debian
cnxn = pyodbc.connect(
    "Driver={};ConnectionType=Direct;HOST={};PORT={};AuthenticationType=Plain;UID={};PWD={};".format(
        driver, host, port, uid, pwd
    ),
    autocommit=True,
)
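The connection string is easier to check when it is assembled from named parts. The sketch below builds the same string as the settings above. The helper name `build_connection_string` is hypothetical, the credentials are placeholders, and values containing `;` are not escaped.

```python
def build_connection_string(driver, host, port, uid, pwd):
    """Assemble the ODBC connection string from named parts (no escaping)."""
    parts = {
        "Driver": driver,
        "ConnectionType": "Direct",
        "HOST": host,
        "PORT": port,
        "AuthenticationType": "Plain",
        "UID": uid,
        "PWD": pwd,
    }
    # Dicts preserve insertion order, so keys appear in the order listed above.
    return "".join(f"{key}={value};" for key, value in parts.items())


# Placeholder credentials for illustration only.
conn_str = build_connection_string(
    "/opt/dremio-odbc/lib64/libdrillodbc_sb64.so", "localhost", 31010, "user", "secret"
)
print(conn_str)
```

The resulting string can be passed to `pyodbc.connect(conn_str, autocommit=True)` in place of the inline `.format(...)` call.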
Read a DataFrame from a SQL query
sql = 'SELECT * FROM "test"."weather" LIMIT 10'
df = pd.read_sql(sql, cnxn)
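The query above quotes each part of the dataset path with double quotes. When the space, table, or limit varies, the string can be built by a small helper. This is a sketch: `quote_identifier` and `build_select` are hypothetical names, and doubling embedded quotes follows standard SQL, which is assumed here to match Dremio's behavior.

```python
def quote_identifier(name):
    """Wrap a name in double quotes, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_select(path, limit):
    """Build a SELECT * query over a dataset path such as ("test", "weather")."""
    table = ".".join(quote_identifier(part) for part in path)
    # int() rejects non-numeric limits before they reach the SQL string.
    return f"SELECT * FROM {table} LIMIT {int(limit)}"


sql = build_select(("test", "weather"), 10)
print(sql)  # SELECT * FROM "test"."weather" LIMIT 10
```

The returned string can be passed to `pd.read_sql(sql, cnxn)` exactly as before.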
Output
df.head()
| | STATION | NAME | LATITUDE | LONGITUDE | ELEVATION | DATE | PRCP | SNOW | SNWD | TAVG | TMAX | TMIN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | USW00023272 | SAN FRANCISCO DOWNTOWN, CA US | 37.7705 | -122.4269 | 45.7 | 2018-01-01 | 0.00 | | | | 61 | 48 |
| 1 | USW00023272 | SAN FRANCISCO DOWNTOWN, CA US | 37.7705 | -122.4269 | 45.7 | 2018-01-02 | 0.00 | | | | 61 | 52 |
| 2 | USW00023272 | SAN FRANCISCO DOWNTOWN, CA US | 37.7705 | -122.4269 | 45.7 | 2018-01-03 | 0.09 | | | | 58 | 53 |
| 3 | USW00023272 | SAN FRANCISCO DOWNTOWN, CA US | 37.7705 | -122.4269 | 45.7 | 2018-01-04 | 0.06 | | | | 63 | 53 |
| 4 | USW00023272 | SAN FRANCISCO DOWNTOWN, CA US | 37.7705 | -122.4269 | 45.7 | 2018-01-05 | 0.26 | | | | 61 | 52 |
Requirements
- Docker
- Virtualenv or Conda environment
- Pip
- Dremio Data Lake Standalone Community or Enterprise Edition
- Dremio ODBC compatible driver
- Python package requirements
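Before connecting, it can help to confirm that the driver file and the Python packages from the list above are actually present. The sketch below uses only the standard library. The function name `check_requirements` is hypothetical, and the default package names are taken from the imports in the setup code.

```python
import importlib.util
import os


def check_requirements(driver_path, packages=("pandas", "pyodbc")):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if not os.path.isfile(driver_path):
        problems.append(f"ODBC driver not found at {driver_path}")
    for name in packages:
        # find_spec returns None when a top-level package cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python package not installed: {name}")
    return problems
```

For example, `check_requirements('/opt/dremio-odbc/lib64/libdrillodbc_sb64.so')` returns an empty list when the driver and both packages are installed.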
DREMIO - The Data Lake Engine docs.