Querying Columnstore tables outside of Postgres
You can create a Polars or Pandas DataFrame directly from a columnstore table.
1. Find the Table location from Postgres Catalog Table
Find the directory in your object store / local file style where the columnstore table was written to.
select * from mooncake.columnstore_tables;
you will get
table_name | path
-------------------+----------------------------------------------------------------------------------------
users_columnstore | /usr/local/pgsql/data/mooncake_local_tables/mooncake_postgres_users_columnstore_16446/
users_log | /usr/local/pgsql/data/mooncake_local_tables/mooncake_postgres_users_log_16455/
x | s3://mooncake-test/mooncake_postgres_x_16459/
It will show either a local path or a s3 path for each table.
2. Install Polars
pip install 'polars'
3. Create a DataFrame
import polars as pl
s3_path = 's3://mooncake-test/reddit_comments_final/'
df = pl.scan_delta(s3_path)
4. Run Python Transformations
df.filter(
(pl.col("pickup_time") >= '2021-01-01T00:00:00+00:00') &
(pl.col("pickup_time") < '2021-02-01T00:00:00+00:00')
).collect()