Querying Columnstore Tables outside of Postgres

You can directly query pg_mooncake columnstore tables with execution engines like Polars. Here is how:

1. Find the Table location from Postgres Catalog Table

Find the directory in your object store / local file style where the columnstore table was written to.

select * from mooncake.columnstore_tables;

you will get

    table_name     |                                          path                                          
-------------------+----------------------------------------------------------------------------------------
 users_columnstore | /usr/local/pgsql/data/mooncake_local_tables/mooncake_postgres_users_columnstore_16446/
 users_log         | /usr/local/pgsql/data/mooncake_local_tables/mooncake_postgres_users_log_16455/
 x                 | s3://mooncake-test/mooncake_postgres_x_16459/

It will show either a local path or a s3 path for each table.

2. Install Polars

pip install 'polars'

3. Create a DataFrame

import polars as pl
s3_path = 's3://mooncake-test/reddit_comments_final/'
df = pl.scan_delta(s3_path)

4. Run Python Transformations

df.filter(
    (pl.col("pickup_time") >= '2021-01-01T00:00:00+00:00') & 
    (pl.col("pickup_time") < '2021-02-01T00:00:00+00:00')
).collect()