Installing a Distributed Monitoring Platform: 3-VM Setup Process
Learn to set up a three-tier system with separate VMs for application, monitoring, and logging, mirroring a production layout.

System Architecture
Overview: Three-tier distributed system using separate VMs for application, monitoring, and logging - mimicking production infrastructure.

Why This Architecture:
Separation of Concerns: Each VM has a dedicated role (app/monitoring/logging)
Scalability: Easy to scale each tier independently
Observability Pillars: Covers metrics (Prometheus), logs (ELK), and visualization (Grafana/Kibana)
Setup Flow
Purpose: Build infrastructure from golden image → deploy services → integrate monitoring/logging
Phase 1: VM Foundation
├── Create Golden Image Template (base OS with common tools)
├── Clone VMs (app-vm, monitoring-vm, logging-vm)
├── Fix Hostnames & Machine IDs ⚠️ [ERROR #1] (prevent duplicate identity)
├── Configure Static IPs (stable addressing for monitoring)
└── Create Users & Permissions (security & access control)
Phase 2: Application VM Setup
├── Install Python & Dependencies (runtime environment)
├── Create Flask Application (web app with metrics/logging)
├── Install Node Exporter (system-level metrics)
└── Install & Configure Filebeat ⚠️ [ERROR #2, #3] (log shipper)
Phase 3: Monitoring VM Setup
├── Install Prometheus (time-series metrics database)
├── Configure Scrape Targets (collect from app-vm)
├── Install Grafana (visualization & dashboards)
└── Create Dashboards (display metrics)
Phase 4: Logging VM Setup
├── Install Elasticsearch (log storage & search)
├── Install Kibana (log visualization)
├── Configure Data Views (index patterns)
└── Verify Log Ingestion (confirm data flow)
Phase 5: Integration & Testing
├── Test Metrics Collection (Prometheus → Grafana)
├── Test Log Shipping (Filebeat → ELK)
└── Create Comprehensive Dashboards (unified view)
Detailed Setup Steps
PHASE 1: VM Foundation

Goal: Create reusable VM template and properly configure cloned instances with unique identities.
1.1 Create Golden Image Template
Purpose: Single source of truth - install once, clone many times. Ensures consistency across all VMs.
# Install base Ubuntu Server 22.04
sudo apt update && sudo apt upgrade -y
sudo apt install -y qemu-guest-agent curl wget vim htop net-tools
sudo systemctl enable qemu-guest-agent
# Optional: Install Docker
sudo apt install -y docker.io docker-compose
1.2 Clean VM Before Cloning
Purpose: Remove machine-specific identifiers to prevent conflicts when cloning.
sudo cloud-init clean
sudo truncate -s 0 /etc/machine-id
sudo rm -f /var/lib/dbus/machine-id
sudo poweroff
1.3 Convert to Template in Proxmox UI
Right-click VM → Convert to Template
Note: Template becomes read-only - cannot boot directly
1.4 Clone VMs
Clone from template for: app-vm, monitoring-vm, logging-vm
Use Full Clone (recommended for independent VMs)
Result: 3 identical VMs that need unique configuration
⚠️ ERROR #1 ENCOUNTERED HERE
Why This Matters: Cloned VMs have identical hostnames and machine-ids, causing:
Prometheus to see only 1 node instead of 3
Systemd service conflicts
Network confusion
1.5 Fix Hostnames & Machine IDs
Purpose: Give each VM unique identity for proper monitoring and logging.
App VM:
sudo hostnamectl set-hostname app-vm
hostnamectl # Verify
Monitoring VM:
sudo hostnamectl set-hostname monitoring-vm
hostnamectl # Verify
Logging VM:
sudo hostnamectl set-hostname logging-vm
hostnamectl # Verify
1.6 Fix /etc/hosts (Each VM)
Purpose: Ensure hostname resolves correctly locally.
sudo nano /etc/hosts
Change:
127.0.1.1 app-server
To:
127.0.1.1 app-vm # (or monitoring-vm, logging-vm respectively)
1.7 Regenerate machine-id (CRITICAL)
Purpose: Create unique systemd identifier - required for proper journaling and service management. ⚠️ Do NOT manually create IDs - let systemd generate them.
sudo rm -f /etc/machine-id
sudo rm -f /var/lib/dbus/machine-id
sudo systemd-machine-id-setup
cat /etc/machine-id # Verify unique ID
sudo reboot
After reboot, verify each VM has different machine-id
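For intuition about what systemd-machine-id-setup produces: a machine-id is a random 128-bit value stored as 32 lowercase hex characters in /etc/machine-id. A small illustrative sketch (this is just the shape of the ID, not how systemd actually seeds it):

```python
import uuid

# A machine-id is 128 random bits rendered as 32 lowercase hex characters,
# the same shape systemd-machine-id-setup writes to /etc/machine-id.
def fake_machine_id() -> str:
    return uuid.uuid4().hex  # 32 hex chars, no dashes

a, b = fake_machine_id(), fake_machine_id()
print(a)
print(b)
assert len(a) == 32 and a != b  # cloned VMs must end up with different IDs
```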
1.8 Create Common User (All VMs)
Purpose: Standard non-root user for application management and SSH access.
sudo useradd -m -s /bin/bash devops
sudo passwd devops
sudo usermod -aG sudo devops
# Verify
getent passwd devops
id devops
1.9 Set Root Password (Optional)
Purpose: Enable root access for emergency situations (homelab only - disable in production).
sudo passwd root # Set a password (this also activates the account)
sudo passwd -u root # Unlock, in case the account was previously locked
1.10 Configure Static IPs
Purpose: Fixed IPs are essential for monitoring systems - DHCP changes would break scrape targets and log shipping.
Identify Network Interface:
ip a # Note interface name (e.g., ens18)
Edit Netplan (Each VM):
sudo nano /etc/netplan/00-installer-config.yaml
App VM (192.168.8.50):
network:
  version: 2
  renderer: networkd
  ethernets:
    ens18:
      dhcp4: no
      addresses:
        - 192.168.8.50/24
      gateway4: 192.168.8.1
      nameservers:
        addresses:
          - 8.8.8.8
          - 1.1.1.1
Monitoring VM (192.168.8.60):
      addresses:
        - 192.168.8.60/24
# (Everything else same)
Logging VM (192.168.8.70):
      addresses:
        - 192.168.8.70/24
# (Everything else same)
Apply Configuration:
sudo netplan apply
ip a # Verify
ip route # Verify gateway
Test Connectivity:
ping -c 3 192.168.8.1 # Gateway
ping 192.168.8.60 # Monitoring VM
ping 192.168.8.70 # Logging VM
Expected: All pings successful = network ready
1.11 Update /etc/hosts (All VMs)
Purpose: Enable hostname-based communication between VMs (easier than remembering IPs).
sudo nano /etc/hosts
Add:
192.168.8.50 app-vm
192.168.8.60 monitoring-vm
192.168.8.70 logging-vm
Test: ping monitoring-vm should work from any VM
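Before moving on, it is worth double-checking the addressing plan on paper: all three VMs and the gateway must sit in the same /24 with no duplicate addresses. A quick stdlib sanity check (the script name and structure are my own, matching the IPs in this guide):

```python
import ipaddress

# Addressing plan from Phase 1: one /24, one gateway, three static host IPs.
subnet = ipaddress.ip_network("192.168.8.0/24")
gateway = ipaddress.ip_address("192.168.8.1")
hosts = {
    "app-vm": ipaddress.ip_address("192.168.8.50"),
    "monitoring-vm": ipaddress.ip_address("192.168.8.60"),
    "logging-vm": ipaddress.ip_address("192.168.8.70"),
}

# Every VM and the gateway must be inside the subnet, with no duplicates.
assert gateway in subnet
assert all(ip in subnet for ip in hosts.values())
assert len(set(hosts.values())) == len(hosts)
print("addressing plan OK")
```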
PHASE 2: Application VM Setup
Goal: Deploy Flask web application with Prometheus metrics export and JSON logging.

2.1 Install Python & Dependencies
Purpose: Python runtime for Flask application.
ssh devops@app-vm
sudo apt update
sudo apt install -y python3 python3-pip python3-venv
2.2 Create Application Directory
Purpose: Isolated virtual environment prevents dependency conflicts.
cd ~
mkdir myapp
cd myapp
python3 -m venv venv
source venv/bin/activate
Result: Shell prompt shows (venv) prefix
2.3 Install Python Packages
pip install flask prometheus-client
pip list # Verify
2.4 Create Flask Application
Purpose: Web app that:
Serves HTTP requests
Exposes /metrics for Prometheus
Writes JSON logs for ELK
nano app.py
Paste the following code:
from flask import Flask, Response, render_template_string
import time
import random
import logging
import json
from datetime import datetime
from prometheus_client import (
    Counter,
    Histogram,
    generate_latest,
    CONTENT_TYPE_LATEST
)
app = Flask(__name__)
# ----------------------
# JSON Logging Setup
# ----------------------
class JsonFormatter(logging.Formatter):
    def format(self, record):
        log_record = {
            "@timestamp": datetime.utcnow().isoformat(),
            "log.level": record.levelname,
            "message": record.getMessage(),
            "logger": record.name
        }
        # logging merges extra={...} keys directly onto the record as
        # attributes (there is no record.extra dict), so read the custom
        # fields back by name
        for key in ("endpoint", "method", "status", "latency_ms"):
            if hasattr(record, key):
                log_record[key] = getattr(record, key)
        return json.dumps(log_record)
handler = logging.FileHandler("/var/log/myapp/app.log")
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("myapp")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False
# ----------------------
# Prometheus Metrics
# ----------------------
REQUEST_COUNT = Counter(
    "http_requests_total",
    "Total HTTP requests",
    ["method", "endpoint", "status"]
)
REQUEST_LATENCY = Histogram(
    "http_request_latency_seconds",
    "HTTP request latency in seconds",
    ["endpoint"]
)
# ----------------------
# HTML Template
# ----------------------
HTML_TEMPLATE = """
<!DOCTYPE html>
<html>
<head>
<title>MyApp - Distributed Monitoring Platform</title>
<style>
* {
margin: 0;
padding: 0;
box-sizing: border-box;
}
html, body {
height: 100%;
overflow: hidden;
font-family: 'Segoe UI', Arial, sans-serif;
}
body {
background: linear-gradient(135deg, #0d47a1 0%, #1976d2 50%, #42a5f5 100%);
display: flex;
align-items: center;
justify-content: center;
}
.container {
width: 95vw;
height: 95vh;
background: rgba(227, 242, 253, 0.95);
border-radius: 15px;
padding: 2vh 2vw;
box-shadow: 0 15px 50px rgba(0, 0, 0, 0.3);
display: flex;
flex-direction: column;
}
header {
text-align: center;
padding-bottom: 1.5vh;
border-bottom: 3px solid #1976d2;
margin-bottom: 2vh;
}
h1 {
font-size: 2.5vw;
color: #0d47a1;
margin-bottom: 0.5vh;
}
.tagline {
font-size: 1.2vw;
color: #1565c0;
}
.main-content {
flex: 1;
display: grid;
grid-template-columns: 1fr 1fr;
grid-template-rows: auto 1fr;
gap: 2vh;
overflow: hidden;
}
.section {
background: #bbdefb;
padding: 2vh 1.5vw;
border-radius: 10px;
border-left: 5px solid #1976d2;
overflow: auto;
}
.section h2 {
color: #0d47a1;
font-size: 1.5vw;
margin-bottom: 1vh;
}
.section p, .section li {
color: #1565c0;
font-size: 1vw;
line-height: 1.5;
}
.architecture {
grid-column: 1 / -1;
background: #90caf9;
}
.vm-grid {
display: grid;
grid-template-columns: repeat(3, 1fr);
gap: 1.5vw;
margin-top: 1vh;
}
.vm-box {
background: #e3f2fd;
padding: 1.5vh 1vw;
border-radius: 8px;
border: 2px solid #1976d2;
text-align: center;
}
.vm-box h3 {
color: #0d47a1;
font-size: 1.3vw;
margin-bottom: 1vh;
}
.vm-box p {
color: #1565c0;
font-size: 0.9vw;
margin: 0.5vh 0;
}
.vm-icon {
font-size: 2.5vw;
margin-bottom: 1vh;
}
.api-list {
list-style: none;
}
.api-item {
background: #e3f2fd;
padding: 1vh 1vw;
margin: 0.8vh 0;
border-radius: 5px;
border-left: 3px solid #1976d2;
display: flex;
justify-content: space-between;
align-items: center;
}
.api-endpoint {
font-weight: bold;
color: #0d47a1;
font-size: 1vw;
}
.api-method {
background: #1976d2;
color: white;
padding: 0.3vh 0.8vw;
border-radius: 3px;
font-size: 0.8vw;
}
.sample-page {
display: flex;
flex-direction: column;
gap: 1vh;
}
.sample-card {
background: #e3f2fd;
padding: 1vh 1vw;
border-radius: 5px;
border-left: 3px solid #1976d2;
}
.sample-card h4 {
color: #0d47a1;
font-size: 1.1vw;
margin-bottom: 0.5vh;
}
.sample-card p {
font-size: 0.9vw;
}
footer {
text-align: center;
padding-top: 1vh;
border-top: 2px solid #1976d2;
color: #1565c0;
font-size: 0.9vw;
margin-top: 1.5vh;
}
</style>
</head>
<body>
<div class="container">
<header>
<h1>MyApp - Distributed Monitoring Platform</h1>
<p class="tagline">Three-Tier Architecture | Application • Monitoring • Logging</p>
</header>
<div class="main-content">
<div class="section architecture">
<h2>🏗️ System Architecture</h2>
<div class="vm-grid">
<div class="vm-box">
<div class="vm-icon">🖥️</div>
<h3>App VM</h3>
<p><strong>Role:</strong> Application Server</p>
<p><strong>Stack:</strong> Python Flask</p>
<p><strong>Port:</strong> 5000</p>
<p><strong>Features:</strong> REST APIs, Metrics Export</p>
</div>
<div class="vm-box">
<div class="vm-icon">📊</div>
<h3>Monitor VM</h3>
<p><strong>Role:</strong> Metrics & Visualization</p>
<p><strong>Stack:</strong> Prometheus + Grafana</p>
<p><strong>Ports:</strong> 9090, 3000</p>
<p><strong>Features:</strong> Time-series DB, Dashboards</p>
</div>
<div class="vm-box">
<div class="vm-icon">📝</div>
<h3>Logging VM</h3>
<p><strong>Role:</strong> Log Aggregation</p>
<p><strong>Stack:</strong> Elasticsearch + Kibana</p>
<p><strong>Ports:</strong> 9200, 5601</p>
<p><strong>Features:</strong> Log Search, Analysis</p>
</div>
</div>
</div>
<div class="section">
<h2>🔌 Available APIs</h2>
<ul class="api-list">
<li class="api-item">
<span class="api-endpoint">/</span>
<span class="api-method">GET</span>
</li>
<li class="api-item">
<span class="api-endpoint">/api</span>
<span class="api-method">GET</span>
</li>
<li class="api-item">
<span class="api-endpoint">/slow</span>
<span class="api-method">GET</span>
</li>
<li class="api-item">
<span class="api-endpoint">/error</span>
<span class="api-method">GET</span>
</li>
<li class="api-item">
<span class="api-endpoint">/metrics</span>
<span class="api-method">GET</span>
</li>
</ul>
</div>
<div class="section sample-page">
<h2>📄 Sample Page</h2>
<div class="sample-card">
<h4>Application Features</h4>
<p>Real-time monitoring with Prometheus metrics collection and Grafana visualization</p>
</div>
<div class="sample-card">
<h4>Logging System</h4>
<p>Centralized log management using Elasticsearch with Kibana dashboards</p>
</div>
<div class="sample-card">
<h4>Performance Tracking</h4>
<p>Request latency, error rates, and throughput metrics tracked across all endpoints</p>
</div>
<div class="sample-card">
<h4>Distributed Architecture</h4>
<p>Scalable three-tier setup with dedicated VMs for app, monitoring, and logging</p>
</div>
</div>
</div>
<footer>
<p>🚀 MyApp v1.0 | Powered by Flask • Prometheus • Grafana • Elasticsearch • Kibana | Status: ✅ Running</p>
</footer>
</div>
</body>
</html>
"""
# ----------------------
# Routes
# ----------------------
@app.route("/")
def home():
    start_time = time.time()
    REQUEST_COUNT.labels("GET", "/", "200").inc()
    latency = time.time() - start_time
    REQUEST_LATENCY.labels("/").observe(latency)
    logger.info(
        "request_completed",
        extra={
            "endpoint": "/",
            "method": "GET",
            "status": 200,
            "latency_ms": round(latency * 1000, 2)
        }
    )
    return render_template_string(HTML_TEMPLATE)


@app.route("/api")
def api():
    start_time = time.time()
    REQUEST_COUNT.labels("GET", "/api", "200").inc()
    latency = time.time() - start_time
    REQUEST_LATENCY.labels("/api").observe(latency)
    logger.info(
        "request_completed",
        extra={
            "endpoint": "/api",
            "method": "GET",
            "status": 200,
            "latency_ms": round(latency * 1000, 2)
        }
    )
    return "API is running\n"


@app.route("/slow")
def slow():
    delay = random.uniform(1, 4)
    time.sleep(delay)
    REQUEST_COUNT.labels("GET", "/slow", "200").inc()
    REQUEST_LATENCY.labels("/slow").observe(delay)
    logger.warning(
        "slow_request",
        extra={
            "endpoint": "/slow",
            "method": "GET",
            "status": 200,
            "latency_ms": round(delay * 1000, 2)
        }
    )
    return f"Slow response: {delay:.2f}s\n"


@app.route("/error")
def error():
    REQUEST_COUNT.labels("GET", "/error", "500").inc()
    logger.error(
        "application_error",
        extra={
            "endpoint": "/error",
            "method": "GET",
            "status": 500
        }
    )
    return "Error occurred\n", 500


@app.route("/metrics")
def metrics():
    return Response(
        generate_latest(),
        mimetype=CONTENT_TYPE_LATEST
    )


# ----------------------
# Application Entry
# ----------------------
if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
App Features:
/ - Home page with system info
/api - Simple API endpoint
/slow - Simulates slow requests (1-4s)
/error - Returns 500 error
/metrics - Prometheus metrics endpoint
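One detail of the logging setup is easy to miss: Python's logging module copies extra={...} keys onto the LogRecord as attributes, so a JSON formatter reads them back by name. A minimal, self-contained sketch of the app's log format, writing to an in-memory stream instead of /var/log/myapp/app.log so it runs anywhere:

```python
import io
import json
import logging
from datetime import datetime, timezone

# Minimal replica of the app's JsonFormatter (field names match the guide).
class JsonFormatter(logging.Formatter):
    def format(self, record):
        out = {
            "@timestamp": datetime.now(timezone.utc).isoformat(),
            "log.level": record.levelname,
            "message": record.getMessage(),
        }
        # extra={...} keys land on the record as attributes, not as a dict.
        for key in ("endpoint", "method", "status", "latency_ms"):
            if hasattr(record, key):
                out[key] = getattr(record, key)
        return json.dumps(out)

buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("myapp-demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info("request_completed", extra={"endpoint": "/api", "status": 200})
line = json.loads(buf.getvalue())
print(line)
assert line["endpoint"] == "/api" and line["log.level"] == "INFO"
```

Each log line is one JSON object per request, which is exactly what Filebeat and Elasticsearch expect to ingest later.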
2.5 Create Log Directory
Purpose: Application needs write permissions for log file.
sudo mkdir -p /var/log/myapp
sudo chown -R devops:devops /var/log/myapp
2.6 Test Application Manually
Purpose: Verify app works before creating systemd service.
python3 app.py
From your laptop:
curl http://192.168.8.50:5000
curl http://192.168.8.50:5000/metrics
On app-vm:
cat /var/log/myapp/app.log # Verify logs
Expected: HTTP responses and JSON logs being written
⚠️ ERROR #3 ENCOUNTERED HERE
2.7 Create Systemd Service
Purpose: Auto-start the Flask app on boot and keep it running - the production standard, rather than running python app.py by hand.
sudo nano /etc/systemd/system/myapp.service
Paste:
[Unit]
Description=MyApp Flask Application
After=network.target
[Service]
Type=simple
User=devops
Group=devops
WorkingDirectory=/home/devops/myapp
ExecStart=/home/devops/myapp/venv/bin/python3 /home/devops/myapp/app.py
Restart=always
RestartSec=5
StandardOutput=journal
StandardError=journal
SyslogIdentifier=myapp
NoNewPrivileges=true
[Install]
WantedBy=multi-user.target
Critical Line: ExecStart must point to venv Python, not system Python (see Error #3)
Enable and Start:
sudo systemctl daemon-reload
sudo systemctl enable myapp.service
sudo systemctl start myapp.service
sudo systemctl status myapp.service
Expected: Status shows "active (running)"
2.8 Install Node Exporter
Purpose: Export system metrics (CPU, memory, disk) to Prometheus - app metrics come from Flask, system metrics from Node Exporter.
wget https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz
tar xvf node_exporter-1.6.1.linux-amd64.tar.gz
sudo mv node_exporter-1.6.1.linux-amd64/node_exporter /usr/local/bin/
sudo useradd --no-create-home --shell /bin/false node_exporter
Create Service:
sudo nano /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
After=network.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
Start Service:
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
curl http://localhost:9100/metrics # Verify
Expected: Hundreds of metrics like node_cpu_seconds_total, node_memory_MemAvailable_bytes
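Both Node Exporter and the Flask /metrics endpoint emit the Prometheus text exposition format: one sample per line, with optional labels in braces and # HELP / # TYPE comments. A tiny illustrative parser over made-up sample lines (the real format has more edge cases; this shows the shape):

```python
import re

# Sample output in the Prometheus text exposition format (made-up values).
SAMPLE = """\
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6
node_memory_MemAvailable_bytes 2.1e+09
"""

LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?P<labels>\{[^}]*\})?\s+(?P<value>\S+)$'
)

def parse(text):
    samples = {}
    for raw in text.splitlines():
        if not raw or raw.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        m = LINE.match(raw)
        if m:
            samples[m.group("name") + (m.group("labels") or "")] = float(m.group("value"))
    return samples

metrics = parse(SAMPLE)
print(metrics)
assert metrics['node_cpu_seconds_total{cpu="0",mode="idle"}'] == 12345.6
```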
⚠️ ERROR #2 ENCOUNTERED HERE
2.9 Install Filebeat
Purpose: Lightweight log shipper - tails log files and sends to Elasticsearch. Part of the Elastic Stack.
Issue: Filebeat not in standard Ubuntu repos - requires Elastic repository.
# Fix apt repositories first
sudo apt install -y apt-transport-https curl gnupg
curl -fsSL https://artifacts.elastic.co/GPG-KEY-elasticsearch | \
sudo gpg --dearmor -o /usr/share/keyrings/elastic.gpg
echo "deb [signed-by=/usr/share/keyrings/elastic.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | \
sudo tee /etc/apt/sources.list.d/elastic-8.x.list
sudo apt update
sudo apt install filebeat -y
2.10 Configure Filebeat
Purpose: Tell Filebeat what to read (app.log) and where to send (Elasticsearch on logging-vm).
sudo nano /etc/filebeat/filebeat.yml
Minimal Config:
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/myapp/app.log
    fields:
      service: my-python-app
    fields_under_root: true

output.elasticsearch:
  hosts: ["http://192.168.8.70:9200"]

setup.kibana:
  host: "http://192.168.8.70:5601"
Key Points:
Input: Monitor /var/log/myapp/app.log
Output: Send to Elasticsearch at 192.168.8.70:9200
Start Filebeat:
sudo systemctl enable filebeat
sudo systemctl start filebeat
sudo journalctl -u filebeat -f # Monitor logs
Expected Output:
"Publishing events"
"Connection to Elasticsearch established"
No "connection refused" errors
PHASE 3: Monitoring VM Setup
Goal: Deploy Prometheus (metrics storage) and Grafana (visualization) to monitor app-vm.

3.1 Install Prometheus
Purpose: Time-series database that pulls metrics from app-vm every 15 seconds. Industry standard for metrics.
ssh devops@monitoring-vm
sudo apt update && sudo apt upgrade -y
sudo useradd --no-create-home --shell /bin/false prometheus
Download and Install:
wget https://github.com/prometheus/prometheus/releases/download/v2.47.0/prometheus-2.47.0.linux-amd64.tar.gz
tar xvf prometheus-2.47.0.linux-amd64.tar.gz
sudo mv prometheus-2.47.0.linux-amd64/prometheus /usr/local/bin/
sudo mv prometheus-2.47.0.linux-amd64/promtool /usr/local/bin/
Create Directories:
sudo mkdir /etc/prometheus
sudo mv prometheus-2.47.0.linux-amd64/consoles /etc/prometheus/
sudo mv prometheus-2.47.0.linux-amd64/console_libraries /etc/prometheus/
sudo mv prometheus-2.47.0.linux-amd64/prometheus.yml /etc/prometheus/
sudo chown -R prometheus:prometheus /etc/prometheus
sudo chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool
3.2 Configure Prometheus
Purpose: Define scrape targets - tell Prometheus where to collect metrics from.
sudo nano /etc/prometheus/prometheus.yml
Add Scrape Targets:
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'flask_app'
    static_configs:
      - targets: ['192.168.8.50:5000']

  - job_name: 'node_exporter'
    static_configs:
      - targets: ['192.168.8.50:9100']
What This Does:
Every 15s, scrape 192.168.8.50:5000/metrics (Flask app metrics)
Every 15s, scrape 192.168.8.50:9100/metrics (system metrics)
Store the samples in the time-series database
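Conceptually, PromQL's rate() turns those scraped counter samples into a per-second rate. A simplified sketch over made-up 15s-apart scrapes (real rate() also extrapolates to the window edges and handles counter resets):

```python
# Made-up scrapes of http_requests_total: (seconds, cumulative counter value).
samples = [(0, 100), (15, 130), (30, 160), (45, 190), (60, 220)]

# Simplified rate(): counter increase over the window, divided by its length.
def simple_rate(points):
    (t0, v0), (t1, v1) = points[0], points[-1]
    return (v1 - v0) / (t1 - t0)

print(simple_rate(samples))  # 2.0 requests/second
assert simple_rate(samples) == 2.0
```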
3.3 Create Prometheus Service
sudo nano /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Monitoring
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
--config.file /etc/prometheus/prometheus.yml \
--storage.tsdb.path /var/lib/prometheus/ \
--web.console.templates=/etc/prometheus/consoles \
--web.console.libraries=/etc/prometheus/console_libraries
[Install]
WantedBy=multi-user.target
Create Storage & Start:
sudo mkdir /var/lib/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus
sudo systemctl daemon-reload
sudo systemctl start prometheus
sudo systemctl enable prometheus
sudo systemctl status prometheus
Test: Open http://192.168.8.60:9090
Go to Status → Targets
Both targets should be "UP"
3.4 Install Grafana
Purpose: Visualization layer on top of Prometheus - creates beautiful dashboards from metrics.
sudo apt update
sudo apt install -y software-properties-common wget
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | sudo gpg --dearmor -o /etc/apt/keyrings/grafana.gpg
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install -y grafana
Start Grafana:
sudo systemctl start grafana-server
sudo systemctl enable grafana-server
sudo systemctl status grafana-server
Access: http://192.168.8.60:3000
Default login: admin / admin
You'll be prompted to set a new password
3.5 Configure Grafana
1. Add Prometheus Data Source:
Settings → Data Sources → Add data source
Select Prometheus
Set URL: http://localhost:9090 (Prometheus runs on the same VM)
Click Save & Test (should show green checkmark)
2. Import Node Exporter Dashboard:
Dashboards → Import
Dashboard ID: 1860 (Node Exporter Full)
Select Prometheus data source
Import
Result: System metrics dashboard (CPU, memory, disk, network) for app-vm
3. Create Custom Flask Dashboard:
Purpose: Monitor application-specific metrics not covered by Node Exporter.
Create new dashboard
Add panel with queries:
Request Rate:
rate(http_requests_total[1m])
Latency (95th percentile):
histogram_quantile(0.95, sum(rate(http_request_latency_seconds_bucket[5m])) by (le))
Error Rate:
rate(http_requests_total{status="500"}[1m])
Why Two Dashboards:
Dashboard 1860: System health (CPU, RAM, disk)
Custom dashboard: App health (requests, latency, errors)
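For intuition about the 95th-percentile query: histogram_quantile estimates the quantile by finding the cumulative bucket where the target rank falls and interpolating linearly inside it. A sketch over made-up cumulative bucket counts (the real function also merges buckets across series via the by (le) grouping):

```python
# Cumulative histogram buckets: (upper bound "le", cumulative count).
buckets = [(0.25, 10), (0.5, 50), (1.0, 90), (float("inf"), 100)]

def histogram_quantile(q, buckets):
    target = q * buckets[-1][1]          # rank of the requested quantile
    prev_le, prev_count = 0.0, 0
    for le, count in buckets:
        if count >= target:
            if le == float("inf"):
                return prev_le           # Prometheus caps at the highest finite bound
            # Linear interpolation within the bucket.
            return prev_le + (le - prev_le) * (target - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count

print(histogram_quantile(0.5, buckets))   # median falls exactly at le=0.5
print(histogram_quantile(0.95, buckets))  # lands in +Inf bucket -> 1.0
```

This is also why bucket boundaries matter: the estimate can never be more precise than the bucket the quantile lands in.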
PHASE 4: Logging VM Setup
Goal: Deploy ELK stack (Elasticsearch + Kibana) for centralized log management and analysis.

4.1 Install Elasticsearch
Purpose: Scalable search engine - stores and indexes logs for fast querying. Core of the ELK stack.
ssh devops@logging-vm
sudo apt update && sudo apt upgrade -y
# Add Elastic repo
curl -fsSL https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elastic.gpg
echo "deb [signed-by=/usr/share/keyrings/elastic.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | \
sudo tee /etc/apt/sources.list.d/elastic-8.x.list
sudo apt update
sudo apt install elasticsearch -y
4.2 Configure Elasticsearch
Purpose: Make Elasticsearch accessible from network and disable security for homelab simplicity.
sudo nano /etc/elasticsearch/elasticsearch.yml
Set:
cluster.name: my-logging-cluster
node.name: logging-node-1
network.host: 0.0.0.0
http.port: 9200
discovery.type: single-node
xpack.security.enabled: false
Configuration Explained:
network.host: 0.0.0.0 - Accept connections from any IP
discovery.type: single-node - No clustering (single VM)
xpack.security.enabled: false - Disable auth (⚠️ production should enable)
Start Elasticsearch:
sudo systemctl daemon-reload
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch
curl http://localhost:9200 # Verify
Expected: JSON response with cluster name, version info
4.3 Install Kibana
Purpose: Web UI for Elasticsearch - search, visualize, and analyze logs through dashboards.
sudo apt install kibana -y
Configure:
sudo nano /etc/kibana/kibana.yml
server.port: 5601
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://localhost:9200"]
Start Kibana:
sudo systemctl enable kibana
sudo systemctl start kibana
sudo systemctl status kibana
Access: http://192.168.8.70:5601
- Initial load may take 1-2 minutes
4.4 Create Data View in Kibana
Purpose: Tell Kibana which Elasticsearch indices to query. Filebeat creates indices like filebeat-2026.02.07.
Go to Stack Management → Data Views
Click Create data view
Fill in:
Name: filebeat-myapp
Index pattern: filebeat-* (matches all filebeat indices)
Time field: @timestamp
Click Save
Why filebeat-* Pattern:
Filebeat creates daily indices: filebeat-2026.02.07, filebeat-2026.02.08, etc.
Wildcard * matches all of them
New indices are auto-included
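The wildcard in the index pattern behaves like a shell glob, which Python's stdlib fnmatch can mimic for a quick illustration (index names below are examples):

```python
import fnmatch

# Example index names: two daily Filebeat indices, plus unrelated indices
# that the data view should not pick up.
indices = [
    "filebeat-2026.02.07",
    "filebeat-2026.02.08",
    ".kibana_1",
    "metricbeat-2026.02.07",
]

# The data view pattern is a simple wildcard match over index names.
matched = fnmatch.filter(indices, "filebeat-*")
print(matched)
assert matched == ["filebeat-2026.02.07", "filebeat-2026.02.08"]
```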
4.5 View Logs
Purpose: Verify logs are flowing from app-vm → Filebeat → Elasticsearch → Kibana.
Navigate to Discover
Select data view:
filebeat-myappYou should see JSON logs with fields:
@timestamp- When log was createdlog.level- INFO, WARNING, ERRORmessage- Log messageendpoint- Which API endpointlatency_ms- Request duration
If No Logs Appear:
Check Filebeat status on app-vm:
sudo systemctl status filebeatCheck Elasticsearch indices:
curlhttp://192.168.8.70:9200/_cat/indices?vLook for
filebeat-*indices
PHASE 5: Integration & Testing
Goal: Verify complete data flow and create comprehensive monitoring dashboards.
5.1 Test Complete Flow
Purpose: Generate realistic traffic to produce metrics and logs for visualization.
Generate Traffic:
# From your laptop
for i in {1..100}; do curl http://192.168.8.50:5000/; done
for i in {1..50}; do curl http://192.168.8.50:5000/slow; done
for i in {1..20}; do curl http://192.168.8.50:5000/error; done
What This Creates:
100 normal requests → http_requests_total metric increments
50 slow requests → http_request_latency_seconds histogram data
20 errors → ERROR-level logs in Kibana
5.2 Verify Metrics in Grafana
Purpose: Confirm Prometheus is scraping and Grafana is displaying metrics.
Check http://192.168.8.60:3000 and verify the dashboards show:
Request rate: Should spike during traffic generation
Latency percentiles: /slow endpoint shows higher latency
Error rate: Spike from /error requests
System metrics: CPU/memory usage from Node Exporter (Dashboard 1860)
Queries to Verify:
# In Grafana Explore
rate(http_requests_total[1m]) # Should show recent activity
histogram_quantile(0.95, sum(rate(http_request_latency_seconds_bucket{endpoint="/slow"}[5m])) by (le)) # Should be higher for /slow
5.3 Verify Logs in Kibana
Purpose: Confirm Filebeat → Elasticsearch → Kibana pipeline is working.
Check http://192.168.8.70:5601
Go to Discover → Select filebeat-myapp
Search Examples:
log.level: ERROR # Find all errors
endpoint: "/slow" # Find slow requests
latency_ms > 1000 # Requests over 1 second
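The three searches above can be replayed in plain Python against sample log documents, to show exactly which records each one selects (the sample records below are made up to match the app's log fields):

```python
# Sample log documents shaped like the app's JSON logs.
logs = [
    {"log.level": "INFO", "endpoint": "/", "latency_ms": 3.2},
    {"log.level": "WARNING", "endpoint": "/slow", "latency_ms": 2400.0},
    {"log.level": "ERROR", "endpoint": "/error"},
]

# Python equivalents of the Kibana searches:
errors = [d for d in logs if d["log.level"] == "ERROR"]       # log.level: ERROR
slow = [d for d in logs if d.get("endpoint") == "/slow"]      # endpoint: "/slow"
over_1s = [d for d in logs if d.get("latency_ms", 0) > 1000]  # latency_ms > 1000

print(len(errors), len(slow), len(over_1s))
assert (len(errors), len(slow), len(over_1s)) == (1, 1, 1)
```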
Create Visualizations:
Errors Over Time:
Lens → Line chart
Filter: log.level : "ERROR"
X-axis: @timestamp
Requests by Endpoint:
Lens → Bar chart
Y-axis: Count
X-axis: endpoint.keyword
Latency Distribution:
Filter: latency_ms exists
Histogram of latency values
Save to Dashboard: Combine visualizations into unified logging dashboard
⚠️ Errors & Solutions Summary
ERROR #1: Duplicate Machine IDs & Hostnames After Cloning
Symptom: All VMs report same hostname and machine-id after cloning from template.
Why This Happens: Proxmox clones EVERYTHING including /etc/machine-id, /etc/hostname, and system identifiers.
Impact:
Prometheus sees only 1 node instead of 3 (metrics collision)
Systemd services conflict
Logs from all VMs appear to come from same source
Network confusion in monitoring tools
Root Cause: Machine-specific files were copied during clone operation.
Solution:
# Fix hostname
sudo hostnamectl set-hostname <vm-name>
# Fix /etc/hosts
sudo nano /etc/hosts
# Change 127.0.1.1 to correct hostname
# Regenerate machine-id (CRITICAL - must be systemd-generated)
sudo rm -f /etc/machine-id
sudo rm -f /var/lib/dbus/machine-id
sudo systemd-machine-id-setup
sudo reboot
Verification:
# On each VM, these should be DIFFERENT:
hostnamectl
cat /etc/machine-id
Why This is Critical for Observability:
Monitoring and logging tools identify nodes by hostname and machine-id
Grafana dashboards group by hostname
ELK logs are tagged with host.name
Without unique IDs, all data collapses into a single apparent source
ERROR #2: Apt Timeout When Installing Filebeat
Symptom:
E: Failed to fetch ...
E: Unable to fetch some archives
Timeout was reached
Why This Happens:
Using slow/blocked regional mirrors (lk.archive.ubuntu.com)
Missing Elastic repository (Filebeat not in standard Ubuntu repos)
Network routing issues for HTTP/HTTPS
Impact: Cannot install Filebeat, blocking log shipping pipeline.
Root Cause: Two-part problem:
Ubuntu mirrors unreachable/slow
Elastic repo not configured
Solution:
# Step 1: Fix Ubuntu repositories
sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak
sudo nano /etc/apt/sources.list
# Replace with main Ubuntu mirrors:
deb http://archive.ubuntu.com/ubuntu/ jammy main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted universe multiverse
deb http://security.ubuntu.com/ubuntu jammy-security main restricted universe multiverse
# Step 2: Add Elastic repository
sudo apt install -y apt-transport-https curl gnupg
curl -fsSL https://artifacts.elastic.co/GPG-KEY-elasticsearch | \
sudo gpg --dearmor -o /usr/share/keyrings/elastic.gpg
echo "deb [signed-by=/usr/share/keyrings/elastic.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | \
sudo tee /etc/apt/sources.list.d/elastic-8.x.list
# Step 3: Update and install
sudo apt update
sudo apt install filebeat -y
Verification:
filebeat version
# Should show: filebeat version 8.x.x
Why This Matters:
Filebeat is log shipper - critical component of ELK pipeline
Without it, logs stay local on app-vm
No centralized logging = harder debugging in distributed systems
ERROR #3: Flask Service Failed - ModuleNotFoundError
Symptom:
ModuleNotFoundError: No module named 'flask'
systemctl status myapp.service → failed (code=exited, status=1)
Why This Happens: Systemd service points to system Python (/usr/bin/python3) instead of virtual environment Python.
How to Identify:
# Flask installed in venv:
/home/devops/myapp/venv/bin/python3 -c "import flask; print('OK')"
# Returns: OK
# System Python doesn't have Flask:
/usr/bin/python3 -c "import flask"
# Returns: ModuleNotFoundError
Root Cause: Virtual environment isolates dependencies. Systemd service must use venv Python, not system Python.
Incorrect Service File:
ExecStart=/usr/bin/python3 /home/devops/myapp/app.py
# ❌ Uses system Python → no Flask module
Correct Service File:
ExecStart=/home/devops/myapp/venv/bin/python3 /home/devops/myapp/app.py
# ✅ Uses venv Python → Flask available
Full Fix:
sudo systemctl stop myapp.service
sudo nano /etc/systemd/system/myapp.service
# Update ExecStart line to use venv/bin/python3
sudo systemctl daemon-reload
sudo systemctl start myapp.service
sudo systemctl status myapp.service
Verification:
# Service should show "active (running)"
sudo systemctl status myapp.service
# Test endpoint
curl http://localhost:5000
# Should return HTML response
Why This Matters:
Common mistake when deploying Python apps
Virtual environments prevent dependency conflicts
Production best practice: isolate app dependencies
Systemd service must match development environment
Prevention: Always specify full path to venv Python in systemd services.
🎯 Key Endpoints
| Service | VM | URL |
| Flask App | app-vm | http://192.168.8.50:5000 |
| Flask Metrics | app-vm | http://192.168.8.50:5000/metrics |
| Node Exporter | app-vm | http://192.168.8.50:9100/metrics |
| Prometheus | monitoring-vm | http://192.168.8.60:9090 |
| Grafana | monitoring-vm | http://192.168.8.60:3000 |
| Elasticsearch | logging-vm | http://192.168.8.70:9200 |
| Kibana | logging-vm | http://192.168.8.70:5601 |





