Replace Kafka + Flink + Spark + BI ETL with
one server and S3.
Transform and analyse data in real time using SQL.
No pipelines, no complexity.
Ideal for companies that need enterprise-grade streaming into an S3 data lake without the engineering overhead.
Ducking simple!
Eliminate traditional ETL complexity. CREATE VIEW generates child streams automatically. No intermediate storage - just real-time SQL transformations flowing to your chosen destination. No hidden costs or volume-based surprise bills!
Handle thousands of simultaneous SQL streams with linear scalability. Our architecture aggregates high-volume concurrent writes into optimized Parquet files on S3.
Run fast real-time DuckDB SQL queries through our Postgres-compatible integration. Create derived data streams with standard SQL CREATE VIEW: each view becomes a separate topic with real-time transformations. Filter, aggregate, and transform data as it flows through your BoilStream server.
-- Create base topic
CREATE TABLE boilstream.s3.events (user_id INT, event_type VARCHAR, timestamp TIMESTAMP);
-- Create real-time materialized view with filtering
CREATE VIEW boilstream.s3.events_login_events AS
SELECT * FROM boilstream.s3.events
WHERE event_type = 'login';
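Once the view exists, the derived topic can be queried like any other table. A minimal sketch, assuming derived topics are exposed through the same attached catalog as their parent:
-- Query the derived topic (sketch; assumes it is exposed through the same catalog as the base topic)
SELECT user_id, count(*) AS login_count
FROM boilstream.s3.events_login_events
GROUP BY user_id
ORDER BY login_count DESC;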
Every INSERT writes to both server-local DuckDB databases (optional) and S3 Parquet files simultaneously. You get immediate local analytics with zero latency plus durable cloud storage for long-term data lakes: the best of both worlds in a single write operation, with no separate backup step needed. S3 also enables unlimited read replicas, and built-in DuckLake support makes data lake setup easy. Runs anywhere, cloud-agnostic.
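A minimal read-replica sketch: any separate DuckDB process can attach the same DuckLake catalog and query the S3 data without touching the BoilStream server. Catalog and table names here are illustrative and assume the DuckLake secret shown in the quick-start configuration further below.
-- Read-replica sketch: a separate DuckDB process attaches the same DuckLake catalog over S3
-- (assumes the 'catalog_secret' DuckLake secret from the configuration example below)
INSTALL ducklake; LOAD ducklake;
ATTACH 'ducklake:catalog_secret' AS analytics_catalog;
SELECT count(*) AS login_count
FROM analytics_catalog.events        -- table name illustrative
WHERE event_type = 'login';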
Built on open standards and familiar technologies. All data stored as standard Parquet files on S3, queryable with pure DuckDB SQL. Easy exit strategy and incremental adoption minimize technology risk for startups.
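And because everything lands as plain Parquet, the exit strategy is literally a query. A minimal sketch with stock DuckDB (bucket, prefix, and credential setup are illustrative):
-- Exit-strategy sketch: read the Parquet files directly, no BoilStream required (paths illustrative)
INSTALL httpfs; LOAD httpfs;
CREATE SECRET s3_creds (TYPE S3, PROVIDER credential_chain);  -- or supply explicit keys for your environment
SELECT event_type, count(*) AS event_count
FROM read_parquet('s3://ingestion-data/**/*.parquet')
GROUP BY event_type
ORDER BY event_count DESC;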
Built on proven foundations with enterprise-level durability guarantees. Memory-safe Rust architecture eliminates common server vulnerabilities while S3's industry-leading durability backs your data.
"It's like Network Equipment
- a secure, ultra-high-throughput gw to S3"
"With half the infra I was able to push in like 4x w BoilStream!"
# Start BoilStream Stream Processor
./boilstream --config local-dev.yaml
-- Connect from DuckDB and ingest with high throughput
INSTALL airport FROM community;
LOAD airport;
ATTACH 'boilstream' (TYPE AIRPORT, location 'grpc://172.31.7.31:50051/');
INSERT INTO boilstream.s3.people SELECT
'boilstream_' || i::VARCHAR as name,
(i % 100) + 1 as age,
['airport', 'ducklake'] as tags
FROM generate_series(1, 20000000000000) as t(i);
# docker-compose.yml for Valkey
# Optionally: Prometheus, Grafana, and MinIO
git clone https://github.com/boilingdata/boilstream.git
cd boilstream && docker compose up -d
# Download boilstream (macOS arm64)
wget https://www.boilstream.com/binaries/darwin-aarch64/boilstream
chmod 755 boilstream
./boilstream --config local-dev.yaml
# Storage with DuckLake integration
storage:
  backends:
    - name: "primary-s3"
      backend_type: "s3"
      bucket: "ingestion-data"
      ducklake: ["analytics_catalog"] # Auto-register files
# DuckLake catalog configuration
ducklake:
  - name: analytics_catalog
    data_path: "s3://ingestion-data/"
    attach: |
      INSTALL ducklake; LOAD ducklake;
      INSTALL postgres; LOAD postgres;
      CREATE SECRET postgres (TYPE POSTGRES, HOST 'localhost', ...);
      CREATE SECRET catalog_secret (TYPE DUCKLAKE,
        DATA_PATH 's3://ingestion-data/',
        METADATA_PARAMETERS MAP {'TYPE': 'postgres', 'SECRET': 'postgres'});
      ATTACH 'ducklake:catalog_secret' AS analytics_catalog;
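Once attached, the catalog is just another database from DuckDB's point of view. For example, querying the data ingested in the quick-start above (the table name depends on the topics you ingest):
-- Query auto-registered data through the DuckLake catalog (table name depends on your topics)
SELECT age, count(*) AS people_per_age
FROM analytics_catalog.people
GROUP BY age
ORDER BY age;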
BoilStream eliminates the multi-step Kafka pipeline. While Kafka requires separate consumers, processors, and converters to produce analytics-ready data, BoilStream writes optimized Parquet files directly to S3 in one step.
Key differences:
Result: SQL developers can start immediately without learning streaming frameworks or managing complex pipelines.
BoilStream's materialized views are real-time streaming transformations created with standard CREATE VIEW syntax. Each view becomes a separate topic that automatically processes data as it arrives in the parent topic.
How it works:
Result: Complex data transformations become simple SQL statements that run continuously in your Stream Processor.
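For example, a continuous aggregation over the base events topic is just one more view. A sketch following the naming in the example above (the "timestamp" column comes from the base table):
-- Continuous hourly aggregation as a derived topic (sketch; naming follows the events example above)
CREATE VIEW boilstream.s3.events_hourly_counts AS
SELECT date_trunc('hour', "timestamp") AS hour,
       event_type,
       count(*) AS event_count
FROM boilstream.s3.events
GROUP BY 1, 2;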
BoilStream automatically manages schema changes without breaking existing analytics or requiring downtime. When your data structure evolves, new versions are stored alongside existing data.
Schema evolution features:
Result: Your analytics pipelines remain stable as your data models evolve, eliminating the need for complex migration procedures.
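On the read side this can be as simple as unioning Parquet schemas by name. A rough sketch, assuming newer files carry an additional column (here a hypothetical "country" field, with an illustrative S3 path) while older files do not:
-- Schema-evolution read sketch: old files lack 'country' (hypothetical new column) and return NULL for it
SELECT user_id, event_type, country
FROM read_parquet('s3://ingestion-data/events/*.parquet', union_by_name = true);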
Absolutely! BoilStream uses standard DuckDB SQL commands including COPY and INSERT statements. Your existing SQL knowledge transfers completely - no new APIs, SDKs, or streaming frameworks to learn.
What works out of the box:
Result: Your team can start streaming data immediately using skills they already have.
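For instance, backfilling historical data uses the same statements. A sketch with illustrative file names, assuming COPY and INSERT target attached topics like any other table:
-- Backfill with plain DuckDB statements (sketch; file names illustrative)
INSERT INTO boilstream.s3.events
SELECT user_id, event_type, "timestamp"
FROM read_parquet('historical_events.parquet');

-- Load a CSV straight into a streamed topic
COPY boilstream.s3.events FROM 'events.csv' (HEADER);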
BoilStream guarantees data durability through immediate S3 persistence with no local disk dependencies. When your SQL statement completes successfully, your data is already safely stored on S3.
Reliability features:
Result: Enterprise-grade reliability with S3's 99.999999999% durability backing your streaming data.
Yes! BoilStream runs anywhere with complete feature parity. Use any S3-compatible storage including MinIO for on-premises deployments or local development.
Deployment options:
Result: No vendor lock-in, complete deployment flexibility, and consistent experience across all environments.
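As a sketch of the local/on-prem path, plain DuckDB reads the same data from MinIO through an S3-compatible secret (endpoint and credentials are illustrative placeholders):
-- Query the same Parquet data on MinIO (endpoint and credentials are illustrative placeholders)
INSTALL httpfs; LOAD httpfs;
CREATE SECRET minio_creds (
    TYPE S3,
    KEY_ID 'minioadmin',
    SECRET 'minioadmin',
    ENDPOINT 'localhost:9000',
    URL_STYLE 'path',
    USE_SSL false
);
SELECT count(*) FROM read_parquet('s3://ingestion-data/**/*.parquet');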
Yes! BoilStream features dual persistence - every INSERT operation writes to both local DuckDB databases and S3 Parquet files in a single transaction. This provides immediate local analytics with zero latency while ensuring durable cloud storage.
Dual persistence benefits:
Result: Your data is immediately available for real-time analytics locally while being safely stored in the cloud for data lake operations.
BoilStream exposes comprehensive metrics through a Prometheus endpoint with ready-to-use Grafana dashboards provided via Docker Compose for immediate production monitoring.
Monitoring capabilities:
Result: Complete observability out-of-the-box with industry-standard monitoring tools your DevOps team already knows.