This page orients you to the integration surface so you can scope ingestion and consumption work. For validation and publishing, continue with Testing your integration and Documenting your integration.Documentation Index
Fetch the complete documentation index at: https://private-7c7dfe99-mintlify-44cf2c7c.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
Ingestion
Two paths bring data into ClickHouse. Choose based on whether your product should own the ingestion plane or delegate it.Path A: ClickPipes (managed, ClickHouse Cloud only)
If you prefer not to build and operate ingestion infrastructure, ClickPipes is the managed service that pulls from your customer’s sources into their ClickHouse Cloud service. ClickPipes handles scaling, parallelization, retries, and lag reporting. Supported sources today include:- Streaming: Apache Kafka (including MSK, Confluent Cloud, Redpanda, Azure Event Hubs, WarpStream), Amazon Kinesis
- Object storage: Amazon S3 (and S3-compatible stores), Google Cloud Storage, Azure Blob Storage
- CDC: PostgreSQL, MySQL, MongoDB, BigQuery
Path B: Self-driven ingestion via an official language client
If you own the pipeline, use one of the official language clients. They handle native-format serialization, batching, TLS, compression, and connection pooling. You pass runtime primitives; the client handles the wire format.- Official clients: Python, Go, Java, JavaScript, Rust, C#, C++
- Both wire protocols: HTTP (port
8123, TLS8443) and native TCP (port9000, TLS9440) - Auth: username and password over TLS by default; mTLS and SSL client-certificate auth are supported by all major clients
- Data format is usually an implementation detail. Clients convert runtime types to ClickHouse Native format. If you already produce Arrow, Parquet, JSONEachRow, or another format, most clients expose a raw-bytes API for pre-serialized data
- For throughput, batch 10K–100K rows and aim for roughly one insert per second as an upper bound for synchronous inserts. If client-side batching is impractical, use asynchronous inserts to shift batching to the server
Consumption
HTTP and native TCP both carry queries. Native is binary and lower overhead (whatclickhouse-client uses). HTTP works through load balancers and proxies. Both are first-class; pick based on infrastructure, not feature gaps.
- Application code: use the same official language clients as for ingestion
- BI and SQL tools: ClickHouse ships an official JDBC v2 driver (Java) and an ODBC driver. Tableau, Looker, Power BI, Metabase, Apache Superset, and Grafana integrate via these drivers or dedicated connectors maintained by ClickHouse and partners
- Result format: clients typically own serialization. You can request Arrow, Parquet, or other columnar formats on the wire if your product needs them
Result-set sizing
Most analytical queries return small result sets (aggregates, summaries, top-N), and the wire is rarely the bottleneck. ClickHouse tables can hold billions of rows, and an unboundedSELECT * over a large fact table can move terabytes. Shape the request in your application: use LIMIT, pagination, streaming reads, and explicit column lists. If you build user-facing analytics, treat unbounded result sets as a UX problem, not a transport problem.
ClickHouse has a rich type system: arrays, tuples, maps, JSON, nested, LowCardinality, and more. Official clients map these to idiomatic language types. If your product surfaces ClickHouse data to end users, plan a type-mapping strategy early.
Next steps
Pick a path, prototype against a ClickHouse Cloud trial, and register your integration through the partner portal when it is available.User-agent string convention
Partner clients that connect over HTTP should set aUser-Agent string that identifies the integration. ClickHouse parses this server-side to track adoption, surface usage telemetry to partners, and inform the roadmap.
Format:
clickhouse-java/0.8.0my-analytics-app/3.1.2 clickhouse-js/1.2.0 (env: staging; region: us-east-1; lv: node/20.10)
- No whitespace in client name or version
- If you include a comment, it must come first
- Standard metadata keys:
lv(language or framework version),os,arch - TCP and native protocol clients report client name and version via protocol fields, not
User-Agent