Load Data into a Columnstore Table
Make sure the DDL of your columnstore table (column names, order, and types) matches the schema of the source data.
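As a minimal sketch, a columnstore table is declared with `USING columnstore`. The table name `events_columnstore` and its columns are hypothetical; replace them so they mirror your source.

```sql
-- Hypothetical target table: columns must match the source data's names, order, and types
CREATE TABLE events_columnstore (
    event_id   bigint,
    user_id    bigint,
    event_type text,
    created_at timestamp
) USING columnstore;
```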
From a PostgreSQL table
Note: columnstore tables are not suited to frequent small writes and updates.
Instead, batch your writes: collect rows in a staging table and move them into the columnstore table periodically, using a cron job or triggers. Reach out to us if you need help.
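A possible sketch of both patterns follows: a one-off bulk copy from a regular PostgreSQL table, and a batch flush from a staging table that a scheduler (for example a cron job) could run. The table names `events_heap`, `events_staging`, and `events_columnstore` are hypothetical, and the flush assumes your pg_mooncake version allows a columnstore insert inside a multi-statement transaction.

```sql
-- One-off bulk load from an existing (heap) PostgreSQL table (hypothetical names)
INSERT INTO events_columnstore
SELECT event_id, user_id, event_type, created_at
FROM events_heap;

-- Periodic batch flush from a hypothetical staging table with the same column order
BEGIN;
-- Block concurrent writes so no row lands in staging between the copy and the truncate
LOCK TABLE events_staging IN EXCLUSIVE MODE;
INSERT INTO events_columnstore SELECT * FROM events_staging;
TRUNCATE events_staging;
COMMIT;
```

Taking the lock first matters: without it, rows committed to the staging table after the `INSERT ... SELECT` snapshot but before `TRUNCATE` would be discarded without being copied.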
From Parquet / CSV / JSON files
To load files from S3, set up cloud storage credentials.
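For example, credentials can be registered with `mooncake.create_secret`. This sketch follows the form shown in the pg_mooncake README as we understand it; the secret name, keys, and region are placeholders, so check the signature against your installed version.

```sql
-- Register S3 credentials (all values are placeholders)
SELECT mooncake.create_secret(
    'my_s3_secret',            -- secret name
    'S3',                      -- storage type
    '<key_id>',                -- access key ID
    '<secret>',                -- secret access key
    '{"REGION": "us-east-1"}'  -- extra options as JSON
);
```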
1. Query a file with mooncake.read_parquet, mooncake.read_csv, or mooncake.read_json.
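A minimal sketch of previewing one Parquet file and then loading it into the columnstore table. The bucket, file path, and column list are placeholders; the `AS (...)` column list declares the file's schema, and `mooncake.read_csv` or `mooncake.read_json` would be used the same way for those formats.

```sql
-- Preview a single file; the column definition list declares its schema
SELECT *
FROM mooncake.read_parquet('s3://my-bucket/events/part-0001.parquet')
     AS (event_id bigint, user_id bigint, event_type text, created_at timestamp)
LIMIT 10;

-- Load the same file into the (hypothetical) columnstore table
INSERT INTO events_columnstore
SELECT *
FROM mooncake.read_parquet('s3://my-bucket/events/part-0001.parquet')
     AS (event_id bigint, user_id bigint, event_type text, created_at timestamp);
```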
Note: you must define the file's schema in an AS (...) clause. If your query with mooncake.read_parquet does not explicitly involve a columnstore table, you may need to run:
2. Load multiple files using a wildcard path:
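A sketch of a wildcard load, assuming DuckDB-style glob patterns in the path (`*` matches any file name under the prefix). The path, columns, and target table are placeholders; every matched file must share the declared schema.

```sql
-- Load every Parquet file under the prefix in one statement
INSERT INTO events_columnstore
SELECT *
FROM mooncake.read_parquet('s3://my-bucket/events/*.parquet')
     AS (event_id bigint, user_id bigint, event_type text, created_at timestamp);
```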
From a Hugging Face Dataset
Loading works the same way as from S3: replace the s3:// prefix with hf://.
1. Find a Hugging Face dataset, such as lmsys_chat_1m_clean.
2. Navigate to the dataset's files and find the path to the data file.
3. Add the hf:// prefix and remove /blob/main from the path.
4. Join the Hugging Face dataset with an existing PostgreSQL table.
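The steps above can be sketched as follows. The comments show how a file's page URL maps to an hf:// path; `<owner>`, `<dataset>`, and `<path>` are placeholders, and the `users` table and both column lists are hypothetical. Because this query joins a file with a regular PostgreSQL table rather than a columnstore table, the note about running an extra setting may apply.

```sql
-- Page URL:   https://huggingface.co/datasets/<owner>/<dataset>/blob/main/<path>.parquet
-- hf:// path: hf://datasets/<owner>/<dataset>/<path>.parquet
SELECT u.user_id, u.name, hf.label
FROM mooncake.read_parquet('hf://datasets/<owner>/<dataset>/<path>.parquet')
     AS hf (user_id bigint, label text)   -- alias plus declared file schema
JOIN users AS u
  ON u.user_id = hf.user_id;
```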
From existing Iceberg/Delta tables
1. Read existing tables with iceberg_scan or delta_scan:
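A sketch of reading each table format. The paths and columns are placeholders, and declaring the schema with `AS (...)` here is an assumption carried over from `mooncake.read_parquet`; check whether your version expects these functions with a `mooncake.` prefix.

```sql
-- Preview an existing Iceberg table (placeholder path and schema)
SELECT *
FROM iceberg_scan('s3://my-bucket/warehouse/db/events')
     AS (event_id bigint, user_id bigint, event_type text, created_at timestamp)
LIMIT 10;

-- Preview an existing Delta table (placeholder path and schema)
SELECT *
FROM delta_scan('s3://my-bucket/delta/events')
     AS (event_id bigint, user_id bigint, event_type text, created_at timestamp)
LIMIT 10;
```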
2. Load an Iceberg table into a columnstore table.
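Loading is then an `INSERT ... SELECT` over the scan. This minimal sketch reuses a placeholder path, schema, and the hypothetical `events_columnstore` table; the `AS (...)` column list is assumed to work as it does for `mooncake.read_parquet`.

```sql
-- Copy an Iceberg table into the columnstore table
INSERT INTO events_columnstore
SELECT *
FROM iceberg_scan('s3://my-bucket/warehouse/db/events')
     AS (event_id bigint, user_id bigint, event_type text, created_at timestamp);
```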