Skip the Pipeline. Stream to Gold.

Blazing Fast Streaming Ingestion and Analytics

Replace Kafka + Flink + Spark + BI ETL with
one server and S3.

Transform and analyze data in real time using SQL.
No pipelines, no complexity.

Ideal for companies that need enterprise-grade streaming into an S3 data lake without the engineering overhead.
Ducking simple!

DuckDB Terminal

Why Teams Choose BoilStream

🧩

10x Simpler
- Save 80%+

Eliminate traditional ETL complexity. CREATE VIEW generates child streams automatically. No intermediate storage - just real-time SQL transformations flowing to your chosen destination. No hidden costs or volume-based surprise bills! Quick start.

Traditional
(hours, $$$):
Source with SDK → Stream Server → Real-time Transformer → S3 Bronze → Batch Processor → S3 Silver → Analytics Engine → S3 Gold → BI ETL
BoilStream
(seconds, $):
DuckDB → BoilStream → S3 Parquet
🚀

Quick and Easy to Start
- Unmatched Rust Performance

Handle thousands of simultaneous SQL streams with linear scalability. Our architecture aggregates high-volume concurrent writes into optimized Parquet files on S3.

✅ 10,000 concurrent sessions tested
✅ 2.5 GB/s throughput achieved (16-core instance)
✅ Optimized Topic Parquet files
⚑

Real-time Analytics
- One tool for many purposes, just like SQL

Run fast real-time DuckDB SQL queries through our PostgreSQL-compatible integration. Create derived data streams with standard SQL CREATE VIEW: each view becomes a separate topic with real-time transformations. Filter, aggregate, and transform data as it flows through your BoilStream server.

-- Create base topic
CREATE TABLE boilstream.s3.events (user_id INT, event_type VARCHAR, timestamp TIMESTAMP);

-- Create real-time materialized view with filtering
CREATE VIEW boilstream.s3.login_events AS
SELECT * FROM boilstream.s3.events 
WHERE event_type = 'login';
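
Aggregations follow the same pattern. A minimal sketch, assuming a per-minute login rollup over the base topic above (the derived topic and column names are illustrative):

-- Create a real-time aggregating view over the base topic
CREATE VIEW boilstream.s3.login_counts AS
SELECT user_id,
       date_trunc('minute', "timestamp") AS minute,
       count(*) AS logins
FROM boilstream.s3.events
WHERE event_type = 'login'
GROUP BY user_id, minute;
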
🔒

Operational Ease
- Run anywhere

Every INSERT writes to both server-local DuckDB databases (optional) and S3 Parquet files simultaneously: immediate local analytics with zero latency, plus durable cloud storage for long-term data lakes. Best of both worlds in a single write operation, with no separate backups needed. S3 also enables unlimited read replicas, and built-in DuckLake support makes data lake setup easy. Run anywhere, cloud agnostic.

🏠 Local instant analytics
☁️ Cloud durable storage
⚑ Both single write
🛡️

Low-Risk Technology Choice
- No Lock-in

Built on open standards and familiar technologies. All data stored as standard Parquet files on S3, queryable with pure DuckDB SQL. Easy exit strategy and incremental adoption minimize technology risk for startups.

✅ Standard Parquet & SQL
✅ No lock-in: open formats
✅ Easy exit: switch anytime
🔐

Enterprise-Grade Reliability
- Battle-Tested

Built on proven foundations with enterprise-level durability guarantees. Memory-safe Rust architecture eliminates common server vulnerabilities while S3's industry-leading durability backs your data.

✅ 99.999999999% S3 durability
✅ Memory-safe Rust foundation
✅ No SPOF: diskless, S3-first design with on-disk caching
"It's like Network Equipment
 - a secure, ultra-high-throughput gateway to S3"
Industry Perspective On BoilStream's Architecture
"With half the infra I was able to push in like 4x w BoilStream!"
Principal Architect On BoilStream's Performance
(compared to an industry-leading analytics streaming database)

Streaming LakeHouse Architecture

Streaming LakeHouse Features

  • Real-time materialized views with CREATE VIEW syntax
  • Parent→child topic relationships for data derivation
  • Single-node diskless architecture eliminates storage failures
  • Direct S3 writes with atomic commits ensure data durability
  • Built-in backpressure prevents system overload
  • Zero-copy Arrow processing with Rust maximizes throughput
  • Supports multiple concurrent storage destinations (e.g. two S3 buckets) and multiple DuckLakes
SQL CREATE VIEW
↓
BoilStream
↓
Derived Topics → S3
Real-time Transformations

Real-World Performance

10,000
Concurrent Sessions
simultaneous writers
2.5 GB/s+
Throughput
sustained ingestion rate (capped),
excluding S3 uploads
(AWS c7gn.4xlarge, 16 vCPU)
10M+
rows/s
horizontal scale

For Teams without Dedicated Data Engineers

Strategic Advantages

  • Eliminate complex streaming frameworks and pipelines
  • Deploy in hours, not months
  • Reduce analytics infrastructure costs by 80%+
  • Scale horizontally with diskless architecture
  • BI Tool integration via the PostgreSQL protocol for real-time and non-real-time analytics
SQL Skills
↓
BoilStream
↓
Real-time Analytics
No Learning Curve

Choose Your Scale

Free Tier

$0
  • 40 GB/hour throughput limit
  • 100 concurrent sessions
  • Derived topics with SQL
  • Configurable rate limiting
  • DuckLake integration
  • ZSTD Compressed Parquet
  • BI Tool integration (pgwire)
Download Free

Enterprise

$450/month base (up to 4 cores) + $125/core/month above 4 cores. Multi-node clustering available.
  • Everything in Professional
  • Multi-node clustering for HA and horizontal scale
  • On-the-fly data encryption
  • Phone support, dedicated engineer
Contact Sales

See It In Action

Script

# Start BoilStream Stream Processor
./boilstream --config local-dev.yaml

# Connect from DuckDB and ingest with high throughput
INSTALL airport FROM community;
LOAD airport;
ATTACH 'boilstream' (TYPE AIRPORT, location 'grpc://172.31.7.31:50051/');
INSERT INTO boilstream.s3.people SELECT
    'boilstream_' || i::VARCHAR as name,
    (i % 100) + 1 as age,
    ['airport', 'ducklake'] as tags
FROM generate_series(1, 20000000000000) as t(i);
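
Once the INSERT completes, the data can be sanity-checked from the same session; a hedged example, assuming reads are served through the same Airport attachment:

-- Sanity-check the ingested rows (illustrative query against the attached catalog)
SELECT count(*) AS rows_ingested FROM boilstream.s3.people;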

Get Started Today

Download Links

Direct download links:

Linux ARM64:
https://www.boilstream.com/binaries/linux-aarch64/boilstream
macOS ARM64:
https://www.boilstream.com/binaries/darwin-aarch64/boilstream

Quick Start

# docker-compose.yml for Valkey
# Optionally: Prometheus, Grafana, and MinIO
git clone https://github.com/boilingdata/boilstream.git
cd boilstream && docker compose up -d

# Download boilstream (macOS ARM64)
wget https://www.boilstream.com/binaries/darwin-aarch64/boilstream
chmod 755 boilstream
./boilstream --config local-dev.yaml

Built-in DuckLake Integration

Automatic Catalog Management

  • Automatic Parquet file registration with DuckLake catalogs
  • Multi-catalog support (e.g. analytics vs operational)
  • Built-in reconciliation ensures storage and catalog sync
  • Topic names become DuckLake table names automatically
  • Schema inference from Parquet files for seamless evolution
BoilStream INSERT
↓
S3 Parquet + DuckLake
↓
Analytics Ready
Automatic Registration

Simple Configuration

# Storage with DuckLake integration
storage:
  backends:
    - name: "primary-s3"
      backend_type: "s3"
      bucket: "ingestion-data"
      ducklake: ["analytics_catalog"]  # Auto-register files

# DuckLake catalog configuration  
ducklake:
  - name: analytics_catalog
    data_path: "s3://ingestion-data/"
    attach: |
      INSTALL ducklake; LOAD ducklake;
      INSTALL postgres; LOAD postgres;
      CREATE SECRET postgres (TYPE POSTGRES, HOST 'localhost', ...);
      CREATE SECRET catalog_secret (TYPE DUCKLAKE, 
        DATA_PATH 's3://ingestion-data/',
        METADATA_PARAMETERS MAP {'TYPE': 'postgres', 'SECRET': 'postgres'});
      ATTACH 'ducklake:catalog_secret' AS analytics_catalog;
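
With registration in place, ingested topics can be queried straight from the DuckLake catalog in any DuckDB session. A minimal sketch, assuming the ATTACH statements above have been run and the events topic from the earlier examples has been ingested (table and schema names are illustrative):

-- Query the auto-registered topic through the DuckLake catalog
SELECT event_type, count(*) AS events
FROM analytics_catalog.main.events
GROUP BY event_type;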

Frequently Asked Questions about
BoilStream Data Processor

How does BoilStream compare to Apache Kafka for data streaming?

BoilStream eliminates the multi-step Kafka pipeline. While Kafka requires separate consumers, processors, and converters to produce analytics-ready data, BoilStream writes optimized Parquet files directly to S3 in one step.

Key differences:

  • Interface: Standard SQL vs. custom streaming APIs
  • Output: Direct Parquet vs. raw messages requiring conversion
  • Architecture: Single-node diskless vs. multi-stage clusters
  • Validation: Built-in schema validation vs. external tooling

Result: SQL developers can start immediately without learning streaming frameworks or managing complex pipelines.

How do materialized views work in BoilStream's Stream Processor?

BoilStream's materialized views are real-time streaming transformations created with standard CREATE VIEW syntax. Each view becomes a separate topic that automatically processes data as it arrives in the parent topic.

How it works:

  • CREATE VIEW: Standard SQL syntax creates a derived topic
  • Real-time processing: Data transforms as it streams through
  • Separate topics: Each view creates an independent output stream
  • S3 storage: All views write optimized Parquet files to S3

Result: Complex data transformations become simple SQL statements that run continuously in your Stream Processor.
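
For example, a derived topic can reshape records as they stream in. A minimal sketch against the events topic defined earlier (the view name and projection are illustrative):

-- Derive a transformed topic: project selected columns and add a date bucket
CREATE VIEW boilstream.s3.event_days AS
SELECT user_id,
       event_type,
       CAST("timestamp" AS DATE) AS event_date
FROM boilstream.s3.events;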

How does BoilStream handle schema evolution in data streaming?

BoilStream automatically manages schema changes without breaking existing analytics or requiring downtime. When your data structure evolves, new versions are stored alongside existing data.

Schema evolution features:

  • Automatic versioning: New schemas stored in separate S3 paths
  • Backward compatibility: Existing queries continue working
  • Transaction integrity: Sequence tracking and validation metadata
  • Zero downtime: Schema changes require no system restarts

Result: Your analytics pipelines remain stable as your data models evolve, eliminating the need for complex migration procedures.

Can I use my existing DuckDB and SQL skills with BoilStream?

Absolutely! BoilStream uses standard DuckDB SQL commands including COPY and INSERT statements. Your existing SQL knowledge transfers completely - no new APIs, SDKs, or streaming frameworks to learn.

What works out of the box:

  • Standard SQL queries and syntax
  • Existing DuckDB extensions and functions
  • Current data processing workflows
  • Team's existing SQL expertise

Result: Your team can start streaming data immediately using skills they already have.
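
For instance, existing files load with nothing more than familiar DuckDB functions. A minimal sketch, assuming a hypothetical events.csv whose header matches the events topic columns (COPY statements work analogously):

-- Ingest an existing local file into a BoilStream topic with plain DuckDB SQL
INSERT INTO boilstream.s3.events
SELECT user_id, event_type, "timestamp"
FROM read_csv('events.csv');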

How does BoilStream ensure data durability and reliability?

BoilStream guarantees data durability through immediate S3 persistence with no local disk dependencies. When your SQL statement completes successfully, your data is already safely stored on S3.

Reliability features:

  • Immediate durability: S3 multipart uploads with atomic commits
  • No single points of failure: Diskless architecture eliminates local storage risks
  • Transaction integrity: Complete transaction tracking with sequence validation
  • Automatic recovery: Failed uploads retry without data loss

Result: Enterprise-grade reliability with S3's 99.999999999% durability backing your streaming data.

Can I run BoilStream on-premises or locally for data streaming?

Yes! BoilStream runs anywhere with complete feature parity. Use any S3-compatible storage including MinIO for on-premises deployments or local development.

Deployment options:

  • Cloud: AWS, Azure, GCP with native S3 integration
  • On-premises: Your datacenter with MinIO or S3-compatible storage
  • Local development: Laptops and workstations for testing
  • Hybrid: Mix cloud and on-premises as needed

Result: No vendor lock-in, complete deployment flexibility, and consistent experience across all environments.

Does BoilStream write to both local DuckDB and S3 simultaneously?

Yes! BoilStream features dual persistence - every INSERT operation writes to both local DuckDB databases and S3 Parquet files in a single transaction. This provides immediate local analytics with zero latency while ensuring durable cloud storage.

Dual persistence benefits:

  • Instant analytics: Query local DuckDB with microsecond latency
  • Cloud durability: Automatic S3 backup for long-term storage
  • Single operation: No additional configuration or complexity
  • Best of both worlds: Local speed + cloud scale

Result: Your data is immediately available for real-time analytics locally while being safely stored in the cloud for data lake operations.
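
In practice, a write can be followed immediately by an analytical query. A minimal sketch, assuming the optional local DuckDB persistence is enabled and the events topic from the earlier examples:

-- Single write: persisted to local DuckDB (optional) and S3 Parquet together
INSERT INTO boilstream.s3.events VALUES (42, 'login', now()::TIMESTAMP);

-- Immediately available for real-time analytics
SELECT count(*) AS login_events
FROM boilstream.s3.events
WHERE user_id = 42 AND event_type = 'login';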

How do I monitor BoilStream performance in production environments?

BoilStream exposes comprehensive metrics through a Prometheus endpoint with ready-to-use Grafana dashboards provided via Docker Compose for immediate production monitoring.

Monitoring capabilities:

  • Performance metrics: Throughput, Inserts/s
  • System health: Memory usage, Queue backpressure
  • Pre-built dashboards: Production-ready Grafana visualizations

Result: Complete observability out-of-the-box with industry-standard monitoring tools your DevOps team already knows.