Based on Elasticsearch best practices and official documentation


Purpose of the Workbook

This workbook helps engineers, architects, and system administrators plan and estimate infrastructure requirements for deploying an Elasticsearch cluster. It calculates the recommended number of nodes for each role in the cluster—master-eligible nodes, data nodes, ingest nodes, and coordinating nodes—based on key factors like data ingestion volume, indexing rate, and retention policy.

By inputting realistic operational parameters, users can generate a sizing baseline tailored to their use case, improving the reliability and performance of their Elasticsearch deployment while ensuring efficient resource allocation.

Usage Considerations

This tool is intended to serve as a high-level estimation guide for Elasticsearch sizing. It should not be used in isolation to finalize production cluster designs.

Actual requirements may vary depending on:

  • Workload characteristics (query vs. indexing heavy)
  • Node hardware profiles (CPU, disk I/O, network bandwidth)
  • Performance tuning (caching, filters, storage tiering)
  • Security features (encryption, audit logging, etc.)
Validate all assumptions through load testing, staging environments, and benchmark trials before applying this sizing in production.

Calculation Methodology

All calculations and assumptions in this workbook are informed by official Elasticsearch documentation and community-accepted best practices. The sizing estimates rely on formulas and thresholds that reflect how Elasticsearch handles data distribution, indexing, and query performance.

Key factors taken into account include:

  • Daily ingestion volume (GB or TB)
  • Retention period (in days or weeks)
  • Ideal shard size (usually ~30–50 GB)
  • Desired number of replicas for high availability
  • Estimated JVM heap size per node
  • Usable storage per node (factoring in 10–20% overhead)
  • Shard-to-heap ratio (maximum recommended: ~20 shards per GB of heap)

Input Parameters

2. Heap & Shard Calculations

Parameter | Formula | Value
Heap Size per Node | MIN(ram / 2, 32) | -
Max Shards per Node | heapSize × shards_per_gb_heap | -
Total Raw Storage (GB) | dailyData × retention × (1 + replica) × overhead | -
Usable Storage per Node (GB) | disk × 1024 × 0.8 | -
Data-Driven Min Shards | CEIL(dailyData / targetShard) | -
Heap-Constrained Shards | CEIL(totalRawStorage / targetShard) | -
Final Daily Shards | MAX(minShards, CEIL(heapConstrainedShards / maxShardsNode)) | -
Total Cluster Shards | finalDailyShards × (1 + replica) × retention | -
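
The formula chain in this table can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the workbook's own implementation: the function name heap_and_shard_plan, its parameter names, the dictionary keys, and the default of 20 shards per GB of heap are hypothetical and chosen only to mirror the table rows.

```python
import math


def heap_and_shard_plan(ram_gb, disk_tb, daily_data_gb, retention_days,
                        replicas, overhead, target_shard_gb,
                        shards_per_gb_heap=20):
    """Evaluate the table's formulas top to bottom (hypothetical helper)."""
    heap_gb = min(ram_gb / 2, 32)                      # Heap Size per Node
    max_shards_node = heap_gb * shards_per_gb_heap     # Max Shards per Node
    total_raw_gb = (daily_data_gb * retention_days
                    * (1 + replicas) * overhead)       # Total Raw Storage (GB)
    usable_gb = disk_tb * 1024 * 0.8                   # Usable Storage per Node (GB)
    min_shards = math.ceil(daily_data_gb / target_shard_gb)
    heap_constrained = math.ceil(total_raw_gb / target_shard_gb)
    final_daily = max(min_shards,
                      math.ceil(heap_constrained / max_shards_node))
    total_cluster = final_daily * (1 + replicas) * retention_days
    return {
        "heap_gb": heap_gb,
        "max_shards_per_node": max_shards_node,
        "total_raw_storage_gb": total_raw_gb,
        "usable_storage_per_node_gb": usable_gb,
        "data_driven_min_shards": min_shards,
        "heap_constrained_shards": heap_constrained,
        "final_daily_shards": final_daily,
        "total_cluster_shards": total_cluster,
    }


# Illustrative inputs only: 64 GB RAM, 8 TB disk, 1,000 GB/day, 30 days,
# 1 replica, HDD overhead of 1.5, 50 GB target shards.
plan = heap_and_shard_plan(ram_gb=64, disk_tb=8, daily_data_gb=1000,
                           retention_days=30, replicas=1, overhead=1.5,
                           target_shard_gb=50)
for name, value in plan.items():
    print(f"{name}: {value}")
```

With these inputs the data-driven minimum (20 daily shards) is larger than the heap-driven term (3), so the MAX step keeps 20.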

Cluster Recommendations

Data Nodes: 0 (Storage and shard requirements)
Master Nodes: 0 (Cluster coordination)
Ingest Nodes: 0 (Data processing pipelines)
Coordinating Nodes: 0 (Query handling)
Total Nodes: 0 (Minimum cluster size)
Total Storage: 0 TB (Raw storage required)
Critical Rule: Never exceed 32GB heap! Split into more nodes instead of scaling RAM vertically.

Heap-Driven Shard Adjustment

Node RAM | Heap Allocation | Max Shards/Node (20/GB) | Max Daily Data Before Adding Shards
32GB | 16GB | 320 | 16TB
64GB | 30GB | 600 | 30TB
128GB | 32GB | 640 | 32TB
256GB | 32GB | 640 | 32TB

Metric Explanations

1. Daily Data Volume (GB)
The total amount of raw, uncompressed data ingested into the system per day, including source fields and indexing overhead.
  • Measure uncompressed data at ingest
  • Include all fields (_source + indexing overhead)
  • Calculate 30-day peak, not average
  • Tool: Use GET _cat/allocation?v on test cluster
2. Retention Period (Days)
The number of days data is stored before it's eligible for archival or deletion, influencing storage lifecycle tiers (hot/warm/cold).
  • Hot tier: Data actively queried (3-7 days)
  • Warm tier: Older data (SSD/HDD hybrid)
  • Cold/Frozen: Archival (object storage)
ILM Policy Example:
PUT _ilm/policy/logs
{
  "policy": {"phases": {
    "hot": {"min_age": "0d", "actions": {"rollover": {"max_size": "50gb"}}},
    "warm": {"min_age": "7d", "actions": {"forcemerge": {"max_num_segments": 1}}}
  }}
}
3. Replica Factor (Count)
1=HA, 2=Production
The number of copies of each data shard to ensure high availability and fault tolerance; each replica increases storage usage.
  • Required for high availability
  • Provides failover during node outages
  • Enables parallel query execution
  • Cost: Doubles storage (1 replica) or triples (2 replicas)
4. Overhead Multiplier
1 + (segment_merge_% + os_reserve_%)
A multiplier that accounts for additional storage usage from OS reserves and segment merging, varying by disk type.
Overhead Type | SSD | HDD
Segment Merges | 15% | 30%
OS Reserve | 15% | 20%
Total | 30% | 50%

Defaults: 1.3 (SSD), 1.5 (HDD)
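
As a small illustration of how the defaults follow from the table, the sketch below recomputes the multiplier from the percentage rows. The dictionary layout and the function name overhead_multiplier are assumptions made for this example.

```python
# Percentages copied from the overhead table (segment merges, OS reserve).
OVERHEAD_PERCENT = {
    "ssd": {"segment_merges": 15, "os_reserve": 15},
    "hdd": {"segment_merges": 30, "os_reserve": 20},
}


def overhead_multiplier(disk_type):
    """Return 1 + (segment_merge_% + os_reserve_%) for the given disk type."""
    parts = OVERHEAD_PERCENT[disk_type]
    return 1 + sum(parts.values()) / 100


for disk_type in OVERHEAD_PERCENT:
    print(disk_type, round(overhead_multiplier(disk_type), 2))
```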

5. CPU Cores per Node (Count)
vcpus × (1 - hyperthreading_discount)
Conservative: 8 cores = 8 vCPUs
The number of effective CPU cores per node, dictating processing capacity for indexing and queries.
Node Type | Min Cores | Recommended
Data | 8 | 16-32
Ingest | 4 | 8-16
Coordinating | 4 | 8-16
Master | 2 | 4
Avoid >64 vCPUs - leads to thread contention!
6. Data Node RAM (GB)
The total memory per node, with half allocated to the Java heap (≤32 GB).
  • Heap ≤ 32GB (Java compressed pointers threshold)
  • 50% RAM to heap, 50% to OS/filesystem cache
  • Minimum: 8GB RAM (test), 64GB RAM (production)
Never use swap memory for heap!
7. Disk Size per Data Node (TB)
disk_tb × 0.8
The total physical storage per node, with 80% typically allocated for usable Elasticsearch data.
Disk Type | Max Size | RAID Config
SATA SSD | 8TB | RAID 0
NVMe SSD | 4TB | None
HDD | 16TB | RAID 10
Avoid >90% disk usage; Prefer 4×2TB NVMe over 1×8TB SATA for throughput
8. Shards per GB Heap
20 + (5 × storage_type_bonus)
The maximum number of index shards that can be supported per GB of Java heap memory.
Storage Type | Bonus Value | Resulting Shards/GB | Calculation
HDD | 0 | 20 | 20 + (5×0) = 20
SATA SSD | 1 | 25 | 20 + (5×1) = 25
NVMe SSD | 2 | 30 | 20 + (5×2) = 30

Notes:

  • 1 shard = ~2MB heap metadata (indexing + search)
  • Conservative scaling: Max Shards/Node = Heap_GB × 20
  • Aggressive scaling (SSD-only): Max Shards/Node = Heap_GB × 25
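
The bonus formula and the ~2MB-per-shard note above can be combined in a short sketch. The names STORAGE_TYPE_BONUS, shards_per_gb_heap, max_shards_per_node, and heap_metadata_gb are hypothetical, and the 2MB figure is the approximation quoted in the notes, not a measured value.

```python
# Bonus values copied from the storage type table.
STORAGE_TYPE_BONUS = {"hdd": 0, "sata_ssd": 1, "nvme_ssd": 2}


def shards_per_gb_heap(storage_type):
    """20 + (5 x storage_type_bonus)."""
    return 20 + 5 * STORAGE_TYPE_BONUS[storage_type]


def max_shards_per_node(heap_gb, storage_type="hdd"):
    """Heap_GB x shards-per-GB; the HDD default is the conservative case."""
    return heap_gb * shards_per_gb_heap(storage_type)


def heap_metadata_gb(shards, mb_per_shard=2):
    """Rough heap consumed by shard metadata at ~2 MB per shard."""
    return shards * mb_per_shard / 1024


shards = max_shards_per_node(32)           # 32 GB heap, conservative ratio
print(shards, heap_metadata_gb(shards))    # 640 shards, 1.25 GB of heap
```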
9. Target Shard Size (GB)
MIN(50, MAX(30, daily_data_gb / 20))
Ideal: 30-50GB
The ideal size range for each shard (30–50 GB), balancing performance and manageability.
Scenario | Shard Size
Time-series logs | 50GB
Search-heavy | 30GB
Vector DB | 10GB
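Enforcement example (the 50 GB rollover threshold itself is set in the ILM policy's rollover action; the index only references the policy and the rollover alias):
PUT logs-000001
{
  "settings": {
    "index.lifecycle.name": "logs",
    "index.lifecycle.rollover_alias": "logs"
  },
  "aliases": { "logs": { "is_write_index": true } }
}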
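
The target shard size formula clamps daily_data_gb / 20 to the 30-50 GB range. A minimal sketch, with target_shard_size_gb as a hypothetical function name:

```python
def target_shard_size_gb(daily_data_gb):
    """MIN(50, MAX(30, daily_data_gb / 20)): clamp to the 30-50 GB range."""
    return min(50, max(30, daily_data_gb / 20))


for daily in (100, 800, 10_240):
    print(daily, target_shard_size_gb(daily))
```

Small daily volumes land on the 30 GB floor and large ones on the 50 GB ceiling. The formula never yields the 10 GB vector scenario from the table, so that value has to be entered by hand.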
10. Peak Ingestion Rate (evt/sec)
The highest event rate (per second) the system needs to handle during ingestion.
Node Type | Events/Sec/Core
Ingest-Optimized | 50,000
Data Node | 20,000
Coordinating | 30,000

Scaling Tip: 1 ingest node (8 cores) handles 400K evt/sec with no pipelines and default mapping

11. Peak Query Load (QPS)
concurrent_users × queries_per_user
The maximum number of queries the cluster must support per second.
Query Type | QPS/Core
Match_all | 15,000
Term Aggregation | 5,000
KNN Search | 500

Optimization:

  • Increase coordinating nodes for search-heavy loads
  • Use shard request cache for repeated queries
12. Heap Size per Node
MIN(node_ram_gb / 2, 32)
The amount of JVM heap allocated per node, capped at 32 GB.
  • Elasticsearch recommends ≤32GB JVM heap due to Java pointer compression
  • Allocate 50% of physical RAM to heap (e.g., 64GB RAM → 32GB heap)
Beyond 32GB, garbage collection efficiency drops sharply
13. Max Shards per Node (Heap)
heap_gb × shards_per_gb_heap
The upper limit on shards a single node can support.
  • Conservative: 20 shards/GB heap (default for HDD)
  • Aggressive: 25 shards/GB heap (SSD-optimized clusters)
  • Example: 32GB heap × 20 shards/GB = 640 shards/node max
Monitor with: GET _nodes/stats/indices?filter_path=**.shards
14. Total Raw Storage
daily_data_gb × retention_days × (1 + replica_factor) × overhead_multiplier
The total cluster storage required for all retained data.

Example: 10TB/day × 30 days × (1+1) × 1.3 = 780TB

  • Replica Factor: 1 (HA) or 2 (production)
  • Overhead Multiplier: 1.3 (SSD) or 1.5 (HDD)
15. Usable Storage per Node
(disk_tb × 1024) × 0.8
Effective disk space per node after reserving 20% for operational needs.
  • Reserve 20% disk space for segment merges, OS operations, and snapshots
Never exceed 85% disk usage (critical for cluster health)
16. Data-Driven Min Shards
CEILING(daily_data_gb / target_shard_size_gb, 1)
The minimum number of daily shards based on the target shard size.
  • Target Shard Size: 30-50GB (sweet spot for query performance)
  • If daily data = 10TB (10,240GB): 10,240GB / 50GB/shard = 205 shards

Exception: Time-series data use ILM rollover at 50GB

17. Heap-Constrained Shards
CEILING(total_raw_storage_gb / target_shard_size_gb, 1)
The total number of shards required across the cluster.
  • Represents total shards cluster must handle (not daily)
  • Validates if heap can manage shard count

Example: 780TB raw storage (798,720GB) / 50GB/shard = 15,975 shards (rounded up)

18. Final Daily Shards
MAX(min_shards, CEILING(heap_constrained_shards / max_shards_per_node, 1))
The final computed number of shards per day.
  • Takes more restrictive value between data-driven and heap-driven estimates
  • Ensures neither shard size nor heap limits are violated
19. Total Cluster Shards
final_daily_shards × (1 + replica_factor) × retention_days
The cumulative shard count for the full retention window.
  • Replica Factor: 1 replica → 2x shards (1 primary + 1 replica)
  • Absolute Limits: ≤ 1,000 shards/node, ≤ 100,000 shards/cluster

Example: 205 daily shards × 2 × 30 days = 12,300 shards
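
The worked example above (205 daily shards, 12,300 total) can be reproduced and checked against the absolute limits with a short sketch. The function names and the within_absolute_limits helper are assumptions for illustration, and the per-node check uses a simple average across data nodes.

```python
import math

# Absolute limits quoted in the bullet list above.
MAX_SHARDS_PER_NODE = 1_000
MAX_SHARDS_PER_CLUSTER = 100_000


def data_driven_min_shards(daily_data_gb, target_shard_size_gb):
    """CEILING(daily_data_gb / target_shard_size_gb, 1)."""
    return math.ceil(daily_data_gb / target_shard_size_gb)


def total_cluster_shards(final_daily_shards, replica_factor, retention_days):
    """final_daily_shards x (1 + replica_factor) x retention_days."""
    return final_daily_shards * (1 + replica_factor) * retention_days


def within_absolute_limits(total_shards, data_nodes):
    """True when both the per-cluster and average per-node limits hold."""
    per_node = total_shards / data_nodes
    return (total_shards <= MAX_SHARDS_PER_CLUSTER
            and per_node <= MAX_SHARDS_PER_NODE)


daily = data_driven_min_shards(10_240, 50)    # 10TB/day at 50GB per shard
total = total_cluster_shards(daily, 1, 30)    # 1 replica, 30 days
print(daily, total, within_absolute_limits(total, data_nodes=20))
```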

20. Data Nodes (Storage)
CEILING(total_raw_storage_gb / usable_storage_per_node, 1)
The number of data nodes required to meet total storage needs.
  • Based purely on storage capacity requirements
  • Uses total raw storage including replicas and overhead

Critical: Must be ≥ actual storage needed at retention period end

21. Data Nodes (Shards)
CEILING(total_cluster_shards / max_shards_per_node, 1)
The number of data nodes needed based on shard capacity.
  • Based on heap memory limitations for shard management
  • Prevents shard overload that causes node crashes

Rule: Always round up to whole nodes

22. Data Nodes (Final)
MAX(data_nodes_storage, data_nodes_shards)
The greater of the storage-based or shard-based node estimates.
  • Ensures both storage capacity AND heap limits are satisfied
  • Example: MAX(122, 32) = 122 nodes

Optimization: Add 10% buffer for growth
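
Steps 20 to 22 and the 10% growth buffer can be sketched together as below. The function name data_nodes and the growth_buffer_pct parameter are hypothetical, and the inputs are illustrative rather than taken from the workbook. The buffer is applied with integer percentages so that exact multiples are not rounded up by floating-point error.

```python
import math


def data_nodes(total_raw_storage_gb, usable_storage_per_node_gb,
               total_cluster_shards, max_shards_per_node,
               growth_buffer_pct=10):
    """Return (by_storage, by_shards, final, final_with_buffer)."""
    by_storage = math.ceil(total_raw_storage_gb / usable_storage_per_node_gb)
    by_shards = math.ceil(total_cluster_shards / max_shards_per_node)
    final = max(by_storage, by_shards)      # both constraints must hold
    with_buffer = math.ceil(final * (100 + growth_buffer_pct) / 100)
    return by_storage, by_shards, final, with_buffer


# Illustrative inputs: 780,000 GB raw storage, 6,400 GB usable per node,
# 12,300 shards in total, 640 shards per node.
print(data_nodes(780_000, 6_400, 12_300, 640))
```

Here storage is the binding constraint: 122 nodes by storage against 20 by shard count.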

23. Ingest Nodes
CEILING(peak_ingestion_rate / (50,000 × cores_per_node), 1)
Dedicated nodes that handle document preprocessing before indexing.
  • 50,000 events/sec/core benchmark for medium-complexity pipelines
  • Scale up for heavy Grok parsing (+30%) or enrichment lookups (+50%)

Default: 1 node minimum even if calculation <1

24. Coordinating Nodes
CEILING(peak_query_load / (5,000 × cores_per_node), 1)
Nodes that serve as query routers and aggregators during search operations.
  • 5,000 QPS/core for typical search/aggregation queries
  • Scale up for complex aggregations (+100%) or ML jobs (+200%)
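
The ingest and coordinating node formulas can be sketched side by side. The constant and function names are assumptions; the per-core rates are the benchmark figures quoted in this workbook, and the one-node floor for ingest follows the stated default.

```python
import math

INGEST_EVENTS_PER_SEC_PER_CORE = 50_000   # workbook benchmark figure
QUERIES_PER_SEC_PER_CORE = 5_000          # workbook benchmark figure


def ingest_nodes(peak_events_per_sec, cores_per_node):
    """CEILING(rate / (50,000 x cores)), with a minimum of one node."""
    capacity = INGEST_EVENTS_PER_SEC_PER_CORE * cores_per_node
    return max(1, math.ceil(peak_events_per_sec / capacity))


def coordinating_nodes(peak_qps, cores_per_node):
    """CEILING(qps / (5,000 x cores))."""
    capacity = QUERIES_PER_SEC_PER_CORE * cores_per_node
    return math.ceil(peak_qps / capacity)


print(ingest_nodes(1_000_000, 8))       # 3 nodes for 1M evt/sec on 8 cores
print(coordinating_nodes(100_000, 8))   # 3 nodes for 100K QPS on 8 cores
```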
25. Master Nodes
if(data_nodes_final ≤ 20, 3, 5)
Responsible for cluster coordination, metadata management, and node discovery.
  • Always odd number (3,5,7) to prevent split-brain
  • Never mix roles - dedicated nodes only
  • For clusters >100 nodes: 7 masters
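Critical Setting (Elasticsearch 6.x and earlier only; the setting was removed in 7.0, where the cluster manages its voting configuration automatically):
discovery.zen.minimum_master_nodes: (master_nodes / 2) + 1, using integer division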
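
A minimal sketch of the master node rule and the majority quorum follows. The function names are hypothetical, and applying the ">100 nodes" bullet to the data node count is an assumption, since the formula itself only distinguishes 3 from 5.

```python
def master_nodes(data_nodes_final):
    """3 masters up to 20 data nodes, 5 above that, 7 beyond 100 (assumed)."""
    if data_nodes_final > 100:
        return 7
    return 3 if data_nodes_final <= 20 else 5


def quorum(master_count):
    """Majority of master-eligible nodes: (master_nodes / 2) + 1, floored."""
    return master_count // 2 + 1


for nodes in (10, 50, 150):
    masters = master_nodes(nodes)
    print(nodes, masters, quorum(masters))
```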
26. Total Nodes
data_nodes_final + ingest_nodes + coordinating_nodes + master_nodes
The sum of all node types representing the minimum viable cluster size.
  • Minimum production cluster: 7 nodes (3 master + 3 data + 1 ingest/coordinating)
  • Always include 10-20% buffer for upgrades and failure recovery
27. Actual Shards per Node
total_cluster_shards / data_nodes_final
The average number of shards assigned to each data node.
Range | Status
100-500 shards/node | Optimal
>600 shards/node | Warning
>1,000 shards/node | Critical (risk of instability)
Monitor with: GET _cat/allocation?v&h=node,shards
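
The status table can be turned into a simple check, sketched below. The function name is hypothetical, and the "unclassified" result is an assumption for averages the table does not cover (below 100 and between 500 and 600).

```python
def shards_per_node_status(total_cluster_shards, data_nodes_final):
    """Classify the average shards per data node using the table's ranges."""
    per_node = total_cluster_shards / data_nodes_final
    if per_node > 1_000:
        return "critical"
    if per_node > 600:
        return "warning"
    if 100 <= per_node <= 500:
        return "optimal"
    return "unclassified"   # range not covered by the table


print(shards_per_node_status(12_300, 122))   # about 101 shards per node
print(shards_per_node_status(12_300, 20))    # 615 shards per node
```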
28. Max Shards per Node
heap_gb × shards_per_gb_heap
The maximum shard capacity a node can safely manage.
  • Conservative: 20 shards/GB heap
  • Aggressive: 25 shards/GB heap (SSD-only)
  • Adjust: Lower to 15 shards/GB if using heavy vector search

Example: 32GB heap × 20 = 640 shards/node max

29. Storage Utilization
total_raw_storage_gb / (data_nodes_final × usable_storage_per_node)
The ratio of used raw storage to available usable disk space.
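Threshold | Effect
85% (low watermark) | No new shards are allocated to the node
90% (high watermark) | Elasticsearch relocates shards away from the node
95% (flood stage) | Indices with a shard on the node are made read-only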

Target: 65-75% for headroom

Watermark override example (85% is already the default low watermark):
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "85%"
  }
}
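
The utilization ratio, and the node count needed to land inside the 65-75% target, can be sketched as follows. Both function names and the 0.75 default are assumptions for illustration. Usable storage here is the per-node figure after the 20% reserve, as defined in step 15.

```python
import math


def storage_utilization(total_raw_storage_gb, data_nodes_final,
                        usable_storage_per_node_gb):
    """total_raw_storage_gb / (data_nodes_final x usable_storage_per_node)."""
    return total_raw_storage_gb / (data_nodes_final * usable_storage_per_node_gb)


def nodes_for_target_utilization(total_raw_storage_gb,
                                 usable_storage_per_node_gb, target=0.75):
    """Smallest node count that keeps utilization at or below the target."""
    return math.ceil(total_raw_storage_gb
                     / (usable_storage_per_node_gb * target))


# Sizing purely by storage fills the usable space almost completely.
print(round(storage_utilization(780_000, 122, 6_400), 3))
print(nodes_for_target_utilization(780_000, 6_400))
```

With these illustrative inputs, 122 nodes leave almost no headroom in the usable space, while 163 nodes bring utilization down to the top of the target range.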
30. Shard Size Compliance
IF((daily_data_gb / final_daily_shards) ≥ 30, "ok", "⚠️")
Validation ensuring each shard is at least 30 GB (ideally 30–50 GB).
  • Optimal: 30-50GB/shard
  • Consequences of small shards (<10GB):
    • Metadata overhead up to 50% of heap
    • Slower query performance
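Fix oversharding (the source index must first be made read-only with a copy of every shard on one node, and the target shard count must be a factor of the source count):
POST /my_index/_shrink/my_new_index
{
  "settings": { "index.number_of_shards": 10 }
}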
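
The compliance check itself is a one-line comparison. A minimal sketch, with shard_size_compliance as a hypothetical name and the 30 GB floor exposed as a parameter:

```python
def shard_size_compliance(daily_data_gb, final_daily_shards, min_shard_gb=30):
    """Return "ok" when the average daily shard is at least min_shard_gb."""
    shard_gb = daily_data_gb / final_daily_shards
    return "ok" if shard_gb >= min_shard_gb else "⚠️"


print(shard_size_compliance(10_240, 205))   # about 50 GB per shard
print(shard_size_compliance(1_000, 50))     # 20 GB per shard, flagged
```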