This library provides an integration between the DuckDB database and the PySpark data processing library.
Stores PySpark DataFrames in DuckDB.
Note: This type handler can only store outputs. It cannot currently load inputs.
To use this type handler, pass it to build_duckdb_io_manager.
Example:
from dagster import asset, repository, with_resources
from dagster_duckdb import build_duckdb_io_manager
from dagster_duckdb_pyspark import DuckDBPySparkTypeHandler

@asset
def my_table():
    ...

# Build an I/O manager that can store PySpark DataFrames in DuckDB.
duckdb_io_manager = build_duckdb_io_manager([DuckDBPySparkTypeHandler()])

@repository
def my_repo():
    return with_resources(
        [my_table],
        {"io_manager": duckdb_io_manager.configured({"database": "my_db.duckdb"})},
    )
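
The pattern described here (a builder that accepts a list of type handlers, with this handler able to store but not load) can be illustrated with a small toy model in plain Python. This is only a sketch: StoreOnlyHandler, ToyIOManager, and build_toy_io_manager are hypothetical names invented for illustration and are not part of dagster_duckdb or dagster_duckdb_pyspark, and the real I/O manager may dispatch differently.

```python
# Illustrative sketch only: every class and function here is hypothetical and
# is NOT the dagster_duckdb implementation.
from typing import Any, Dict, List, Type


class StoreOnlyHandler:
    """Hypothetical handler that can write values of one type but not read them."""

    def __init__(self, supported_type: Type[Any]) -> None:
        self.supported_type = supported_type
        self.stored: Dict[str, Any] = {}

    def handle_output(self, table_name: str, value: Any) -> None:
        # Stand-in for writing the value to a database table.
        self.stored[table_name] = value

    def load_input(self, table_name: str) -> Any:
        # Models a handler that supports storing outputs only.
        raise NotImplementedError("this handler cannot load inputs")


class ToyIOManager:
    """Hypothetical manager that picks a handler by the exact type of the value."""

    def __init__(self, handlers: List[StoreOnlyHandler]) -> None:
        self._by_type = {h.supported_type: h for h in handlers}

    def store(self, table_name: str, value: Any) -> None:
        handler = self._by_type.get(type(value))
        if handler is None:
            raise TypeError(f"no handler registered for {type(value).__name__}")
        handler.handle_output(table_name, value)

    def load(self, table_name: str, as_type: Type[Any]) -> Any:
        return self._by_type[as_type].load_input(table_name)


def build_toy_io_manager(handlers: List[StoreOnlyHandler]) -> ToyIOManager:
    # Hypothetical analogue of a builder that takes a list of type handlers.
    return ToyIOManager(handlers)


# Usage: a list stands in for a DataFrame here.
handler = StoreOnlyHandler(list)
manager = build_toy_io_manager([handler])
manager.store("my_table", [1, 2, 3])
```

In this toy model, storing a value whose type has no registered handler raises TypeError, and any attempt to load raises NotImplementedError, which mirrors the store-only limitation noted above.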