[{"data":1,"prerenderedAt":3143},["ShallowReactive",2],{"blog-es-\u002Fes\u002Fblog\u002Fstack-monitoring-prometheus-grafana-loki":3,"blog-es-surround-\u002Fes\u002Fblog\u002Fstack-monitoring-prometheus-grafana-loki":3129},{"id":4,"title":5,"author":6,"body":7,"category":3110,"cover":3111,"date":3112,"description":3113,"draft":3114,"extension":3115,"lastReviewed":3111,"meta":3116,"navigation":391,"path":3117,"readingTime":3118,"seo":3119,"sitemap":3120,"stem":3121,"tags":3122,"__hash__":3128},"blog_es\u002Fes\u002Fblog\u002Fstack-monitoring-prometheus-grafana-loki.md","Stack de monitoring completa en 2026: Prometheus + Grafana + Loki paso a paso","Equipo HeroCtl",{"type":8,"value":9,"toc":3090},"minimark",[10,26,29,34,42,53,56,59,63,66,106,122,126,129,162,165,169,175,178,181,209,224,228,233,240,250,257,661,680,683,726,750,754,759,765,1045,1055,1074,1078,1084,1087,1167,1188,1195,1234,1244,1248,1252,1258,1513,1516,1519,1707,1717,1724,1728,1733,1742,1748,1857,1864,1871,1891,1898,1902,1906,1909,1919,2257,2264,2482,2493,2507,2511,2515,2521,2545,2552,2556,2561,2564,2567,2600,2603,2636,2645,2656,2660,2663,2666,2692,2695,2699,2755,2758,2761,2765,2899,2902,2906,2909,2919,2929,2941,2947,2951,2957,2963,2969,2975,2984,2998,3004,3010,3014,3017,3045,3058,3061,3083,3086],[11,12,13,14,18,19,18,22,25],"p",{},"La primera vez que tu sitio caiga a las tres de la mañana, vas a descubrir una cosa incómoda: no tienes cómo saber qué pasó. No hay gráfico de CPU, no hay log del contenedor que murió, no hay alerta que avisó antes. Vas a abrir una terminal, conectar a los servidores uno por uno, correr ",[15,16,17],"code",{},"top",", ",[15,20,21],{},"df",[15,23,24],{},"journalctl",", e intentar reconstituir una escena de crimen que ya se enfrió.",[11,27,28],{},"Este post es el atajo para que no pases por eso. 
En cuatro horas, con R$80 a R$120 al mes de hardware, se puede montar la stack de observabilidad open-source que sustituye a Datadog, New Relic y CloudWatch en el 95% de los casos para una startup. Las herramientas son las mismas que corren dentro de empresas con decenas de miles de servidores — y caben cómodamente en una VPS pequeña para el equipo que está empezando.",[30,31,33],"h2",{"id":32},"tldr","TL;DR",[11,35,36,37,41],{},"La stack de monitoring open-source estándar en 2026 — ",[38,39,40],"strong",{},"Prometheus + Grafana + Loki + Alertmanager"," — cabe en una única VPS de 4 GB de RAM y cubre métricas, logs centralizados, dashboards y alertas. Este tutorial muestra el setup paso a paso para un cluster de 4 a 5 servidores en aproximadamente cuatro horas, usando docker-compose o job specs del orquestador.",[11,43,44,45,48,49,52],{},"Para una startup brasileña, eso significa ",[38,46,47],{},"R$80 a R$120 al mes de hardware"," contra ",[38,50,51],{},"R$1.000 a R$2.000 al mes"," de SaaS de observabilidad equivalente. El costo de tiempo es honesto: cuatro horas de setup inicial más dos a cuatro horas al mes de mantenimiento continuo.",[11,54,55],{},"Resultado entregable al final del tutorial: dashboards de CPU, RAM, disco, red y métricas HTTP; logs buscables con retención de 30 días; alertas enrutadas a Slack, Discord o e-mail. Prerrequisitos: 1 VPS Linux con 4 GB de RAM y 50 GB de SSD, Docker instalado, y un dominio con DNS controlado por ti.",[11,57,58],{},"La elección entre correr esta stack en una VPS dedicada fuera del cluster de producción o como job dentro del propio orquestador es una decisión de arquitectura — cubrimos las dos opciones en el paso 8 y en \"Cómo correr esto dentro de HeroCtl\".",[30,60,62],{"id":61},"que-hace-cada-componente-en-una-frase","Qué hace cada componente, en una frase",[11,64,65],{},"Antes de instalar cualquier cosa, vale la pena entender el papel de cada pieza. 
La stack tiene seis componentes; la confusión generalmente viene de pensar que alguno de ellos es \"el sistema de monitoring\". No lo es. Cada uno hace una cosa.",[67,68,69,76,82,88,94,100],"ul",{},[70,71,72,75],"li",{},[38,73,74],{},"Prometheus"," es una base de datos de series temporales (TSDB) que recolecta métricas vía HTTP scrape — él jala los números, nadie los empuja. Retiene 15 días por default.",[70,77,78,81],{},[38,79,80],{},"Grafana"," es la capa de visualización. Conecta a Prometheus, a Loki, a Postgres, a casi cualquier fuente estructurada, y dibuja gráficos.",[70,83,84,87],{},[38,85,86],{},"Loki"," es la pieza de logs. Sintaxis similar a la de Prometheus, indexa solo labels (no el contenido de los logs), y por eso queda cerca de diez veces más barato que ELK para correr.",[70,89,90,93],{},[38,91,92],{},"Promtail"," (o el Grafana Agent, que está sustituyendo al Promtail en 2026) es el recolector que lee los archivos de log de cada servidor y envía a Loki.",[70,95,96,99],{},[38,97,98],{},"node_exporter"," corre en cada nodo monitoreado y expone un endpoint HTTP con CPU, RAM, disco y red en formato Prometheus.",[70,101,102,105],{},[38,103,104],{},"Alertmanager"," recibe reglas de alerta de Prometheus y cuida del ruteo — Slack, e-mail, PagerDuty, webhook arbitrario.",[11,107,108,109,18,112,18,115,18,118,121],{},"Quien diseña la primera stack suele confundir Prometheus con \"monitoring\" y Grafana con \"dashboards bonitos\". La separación real es: ",[38,110,111],{},"Prometheus guarda números",[38,113,114],{},"Loki guarda texto",[38,116,117],{},"Grafana muestra ambos",[38,119,120],{},"Alertmanager grita cuando algún número queda mal",".",[30,123,125],{"id":124},"cual-es-la-arquitectura-recomendada","¿Cuál es la arquitectura recomendada?",[11,127,128],{},"Para un cluster de 3 a 5 servidores corriendo aplicaciones de producción, la topología que viene funcionando en la práctica es separar el servidor de observabilidad del resto. 
Un nodo dedicado, fuera del cluster que monitorea, con dos objetivos: no caer junto con el cluster cuando este muera, y no competir por CPU\u002FRAM con la aplicación real.",[67,130,131,137,143,153],{},[70,132,133,136],{},[38,134,135],{},"1 servidor \"observability\" dedicado",", 4 GB de RAM, 50 GB de SSD. Corre Prometheus, Grafana, Loki, Alertmanager.",[70,138,139,142],{},[38,140,141],{},"Cada servidor monitoreado"," corre solo dos procesos livianos: node_exporter (métricas de sistema) y Promtail (envío de logs).",[70,144,145,148,149,152],{},[38,146,147],{},"Tus aplicaciones"," exponen un endpoint ",[15,150,151],{},"\u002Fmetrics"," en formato Prometheus. Si usas un framework popular, existe un cliente listo. Si no, es una biblioteca de pocas decenas de líneas.",[70,154,155,157,158,161],{},[38,156,80],{}," queda accesible vía subdominio (",[15,159,160],{},"monitor.tudominio.com",") con TLS automático y autenticación básica al frente.",[11,163,164],{},"Esa separación tiene un costo: pagas por una VPS más. A cambio, cuando el cluster principal caiga, todavía puedes mirar los gráficos para entender qué pasó. Para una startup, ese trade-off compensa casi siempre — el peor escenario en monitoring es descubrir que lo único que se cayó junto con el sitio fue el sistema que iba a avisarte de que el sitio se había caído.",[30,166,168],{"id":167},"paso-1-como-provisionar-la-vps-de-observabilidad","Paso 1 — ¿Cómo provisionar la VPS de observabilidad?",[11,170,171,172,121],{},"Tiempo estimado: ",[38,173,174],{},"10 minutos",[11,176,177],{},"Cualquier proveedor barato sirve. Los dos con mejor costo-beneficio para el caso brasileño hoy son Hetzner (CPX21 a 7,99 EUR al mes con 3 vCPUs y 4 GB de RAM, datacenter en Alemania) y DigitalOcean (Basic Droplet de US$24 al mes con la misma configuración, datacenters más cercanos a Brasil). 
Para workload de monitoring, latencia de scrape en datacenter europeo no causa problema — Prometheus jala cada 15 segundos por default, entonces 200ms de RTT entre Hetzner y tus servidores no estorba.",[11,179,180],{},"Provisionando:",[182,183,184,187,190,196,203],"ol",{},[70,185,186],{},"Crea la VPS con Ubuntu 24.04 LTS o Debian 12.",[70,188,189],{},"Agrega tu clave SSH pública en la creación. Deshabilita login por contraseña.",[70,191,192,193,121],{},"Instala Docker y el plugin de compose: ",[15,194,195],{},"curl -fsSL https:\u002F\u002Fget.docker.com | sh && apt install docker-compose-plugin",[70,197,198,199,202],{},"Configura el firewall: puerto 22 (SSH) abierto, puerto 443 (HTTPS) abierto, todos los demás cerrados. Los puertos internos (3000, 9090, 3100, 9093) solo quedan accesibles por el ",[15,200,201],{},"localhost"," de la propia VPS — el reverse proxy expone Grafana vía 443.",[70,204,205,206,208],{},"Apunta el DNS: crea un registro A ",[15,207,160],{}," para la IP de la VPS.",[11,210,211,212,215,216,219,220,223],{},"Validación: ",[15,213,214],{},"docker --version"," retorna 26.x o superior; ",[15,217,218],{},"dig monitor.tudominio.com"," retorna la IP correcta; ",[15,221,222],{},"ssh root@monitor.tudominio.com"," conecta sin pedir contraseña.",[30,225,227],{"id":226},"paso-2-como-subir-la-stack-via-docker-compose","Paso 2 — ¿Cómo subir la stack vía docker-compose?",[11,229,171,230,121],{},[38,231,232],{},"45 minutos",[11,234,235,236,239],{},"Crea el directorio de trabajo en ",[15,237,238],{},"\u002Fopt\u002Fobservability\u002F"," con la siguiente estructura:",[241,242,247],"pre",{"className":243,"code":245,"language":246},[244],"language-text","\u002Fopt\u002Fobservability\u002F\n├── docker-compose.yml\n├── prometheus\u002F\n│   ├── prometheus.yml\n│   └── alerts.yml\n├── alertmanager\u002F\n│   └── alertmanager.yml\n├── loki\u002F\n│   └── loki-config.yml\n└── grafana\u002F\n    └── provisioning\u002F\n        └── datasources\u002F\n            └── 
datasources.yml\n","text",[15,248,245],{"__ignoreMap":249},"",[11,251,252,253,256],{},"El ",[15,254,255],{},"docker-compose.yml"," abreviado pero funcional:",[241,258,262],{"className":259,"code":260,"language":261,"meta":249,"style":249},"language-yaml shiki shiki-themes github-dark-default","services:\n  prometheus:\n    image: prom\u002Fprometheus:v2.55.0\n    volumes:\n      - .\u002Fprometheus:\u002Fetc\u002Fprometheus\n      - prometheus-data:\u002Fprometheus\n    command:\n      - '--config.file=\u002Fetc\u002Fprometheus\u002Fprometheus.yml'\n      - '--storage.tsdb.retention.time=30d'\n      - '--web.enable-lifecycle'  # permite reload via HTTP POST\n    ports:\n      - '127.0.0.1:9090:9090'\n    restart: unless-stopped\n\n  grafana:\n    image: grafana\u002Fgrafana:11.3.0\n    volumes:\n      - grafana-data:\u002Fvar\u002Flib\u002Fgrafana\n      - .\u002Fgrafana\u002Fprovisioning:\u002Fetc\u002Fgrafana\u002Fprovisioning\n    environment:\n      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}\n      - GF_USERS_ALLOW_SIGN_UP=false\n    ports:\n      - '127.0.0.1:3000:3000'\n    restart: unless-stopped\n\n  loki:\n    image: grafana\u002Floki:3.2.0\n    volumes:\n      - .\u002Floki\u002Floki-config.yml:\u002Fetc\u002Floki\u002Fconfig.yml\n      - loki-data:\u002Floki\n    command: -config.file=\u002Fetc\u002Floki\u002Fconfig.yml\n    ports:\n      - '127.0.0.1:3100:3100'\n    restart: unless-stopped\n\n  alertmanager:\n    image: prom\u002Falertmanager:v0.27.0\n    volumes:\n      - .\u002Falertmanager:\u002Fetc\u002Falertmanager\n    ports:\n      - '127.0.0.1:9093:9093'\n    restart: unless-stopped\n\nvolumes:\n  prometheus-data:\n  grafana-data:\n  
loki-data:\n","yaml",[15,263,264,277,285,298,306,315,323,331,339,347,359,367,375,386,393,401,411,418,426,434,442,450,458,465,473,482,487,495,505,512,520,528,538,545,553,562,567,575,585,592,600,607,615,624,629,637,645,653],{"__ignoreMap":249},[265,266,269,273],"span",{"class":267,"line":268},"line",1,[265,270,272],{"class":271},"sPWt5","services",[265,274,276],{"class":275},"sZEs4",":\n",[265,278,280,283],{"class":267,"line":279},2,[265,281,282],{"class":271},"  prometheus",[265,284,276],{"class":275},[265,286,288,291,294],{"class":267,"line":287},3,[265,289,290],{"class":271},"    image",[265,292,293],{"class":275},": ",[265,295,297],{"class":296},"s9uIt","prom\u002Fprometheus:v2.55.0\n",[265,299,301,304],{"class":267,"line":300},4,[265,302,303],{"class":271},"    volumes",[265,305,276],{"class":275},[265,307,309,312],{"class":267,"line":308},5,[265,310,311],{"class":275},"      - ",[265,313,314],{"class":296},".\u002Fprometheus:\u002Fetc\u002Fprometheus\n",[265,316,318,320],{"class":267,"line":317},6,[265,319,311],{"class":275},[265,321,322],{"class":296},"prometheus-data:\u002Fprometheus\n",[265,324,326,329],{"class":267,"line":325},7,[265,327,328],{"class":271},"    command",[265,330,276],{"class":275},[265,332,334,336],{"class":267,"line":333},8,[265,335,311],{"class":275},[265,337,338],{"class":296},"'--config.file=\u002Fetc\u002Fprometheus\u002Fprometheus.yml'\n",[265,340,342,344],{"class":267,"line":341},9,[265,343,311],{"class":275},[265,345,346],{"class":296},"'--storage.tsdb.retention.time=30d'\n",[265,348,350,352,355],{"class":267,"line":349},10,[265,351,311],{"class":275},[265,353,354],{"class":296},"'--web.enable-lifecycle'",[265,356,358],{"class":357},"sH3jZ","  # permite reload via HTTP POST\n",[265,360,362,365],{"class":267,"line":361},11,[265,363,364],{"class":271},"    
ports",[265,366,276],{"class":275},[265,368,370,372],{"class":267,"line":369},12,[265,371,311],{"class":275},[265,373,374],{"class":296},"'127.0.0.1:9090:9090'\n",[265,376,378,381,383],{"class":267,"line":377},13,[265,379,380],{"class":271},"    restart",[265,382,293],{"class":275},[265,384,385],{"class":296},"unless-stopped\n",[265,387,389],{"class":267,"line":388},14,[265,390,392],{"emptyLinePlaceholder":391},true,"\n",[265,394,396,399],{"class":267,"line":395},15,[265,397,398],{"class":271},"  grafana",[265,400,276],{"class":275},[265,402,404,406,408],{"class":267,"line":403},16,[265,405,290],{"class":271},[265,407,293],{"class":275},[265,409,410],{"class":296},"grafana\u002Fgrafana:11.3.0\n",[265,412,414,416],{"class":267,"line":413},17,[265,415,303],{"class":271},[265,417,276],{"class":275},[265,419,421,423],{"class":267,"line":420},18,[265,422,311],{"class":275},[265,424,425],{"class":296},"grafana-data:\u002Fvar\u002Flib\u002Fgrafana\n",[265,427,429,431],{"class":267,"line":428},19,[265,430,311],{"class":275},[265,432,433],{"class":296},".\u002Fgrafana\u002Fprovisioning:\u002Fetc\u002Fgrafana\u002Fprovisioning\n",[265,435,437,440],{"class":267,"line":436},20,[265,438,439],{"class":271},"    
environment",[265,441,276],{"class":275},[265,443,445,447],{"class":267,"line":444},21,[265,446,311],{"class":275},[265,448,449],{"class":296},"GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}\n",[265,451,453,455],{"class":267,"line":452},22,[265,454,311],{"class":275},[265,456,457],{"class":296},"GF_USERS_ALLOW_SIGN_UP=false\n",[265,459,461,463],{"class":267,"line":460},23,[265,462,364],{"class":271},[265,464,276],{"class":275},[265,466,468,470],{"class":267,"line":467},24,[265,469,311],{"class":275},[265,471,472],{"class":296},"'127.0.0.1:3000:3000'\n",[265,474,476,478,480],{"class":267,"line":475},25,[265,477,380],{"class":271},[265,479,293],{"class":275},[265,481,385],{"class":296},[265,483,485],{"class":267,"line":484},26,[265,486,392],{"emptyLinePlaceholder":391},[265,488,490,493],{"class":267,"line":489},27,[265,491,492],{"class":271},"  loki",[265,494,276],{"class":275},[265,496,498,500,502],{"class":267,"line":497},28,[265,499,290],{"class":271},[265,501,293],{"class":275},[265,503,504],{"class":296},"grafana\u002Floki:3.2.0\n",[265,506,508,510],{"class":267,"line":507},29,[265,509,303],{"class":271},[265,511,276],{"class":275},[265,513,515,517],{"class":267,"line":514},30,[265,516,311],{"class":275},[265,518,519],{"class":296},".\u002Floki\u002Floki-config.yml:\u002Fetc\u002Floki\u002Fconfig.yml\n",[265,521,523,525],{"class":267,"line":522},31,[265,524,311],{"class":275},[265,526,527],{"class":296},"loki-data:\u002Floki\n",[265,529,531,533,535],{"class":267,"line":530},32,[265,532,328],{"class":271},[265,534,293],{"class":275},[265,536,537],{"class":296},"-config.file=\u002Fetc\u002Floki\u002Fconfig.yml\n",[265,539,541,543],{"class":267,"line":540},33,[265,542,364],{"class":271},[265,544,276],{"class":275},[265,546,548,550],{"class":267,"line":547},34,[265,549,311],{"class":275},[265,551,552],{"class":296},"'127.0.0.1:3100:3100'\n",[265,554,556,558,560],{"class":267,"line":555},35,[265,557,380],{"class":271},[265,559,293],{"class":275},[265,561,385],{"class"
:296},[265,563,565],{"class":267,"line":564},36,[265,566,392],{"emptyLinePlaceholder":391},[265,568,570,573],{"class":267,"line":569},37,[265,571,572],{"class":271},"  alertmanager",[265,574,276],{"class":275},[265,576,578,580,582],{"class":267,"line":577},38,[265,579,290],{"class":271},[265,581,293],{"class":275},[265,583,584],{"class":296},"prom\u002Falertmanager:v0.27.0\n",[265,586,588,590],{"class":267,"line":587},39,[265,589,303],{"class":271},[265,591,276],{"class":275},[265,593,595,597],{"class":267,"line":594},40,[265,596,311],{"class":275},[265,598,599],{"class":296},".\u002Falertmanager:\u002Fetc\u002Falertmanager\n",[265,601,603,605],{"class":267,"line":602},41,[265,604,364],{"class":271},[265,606,276],{"class":275},[265,608,610,612],{"class":267,"line":609},42,[265,611,311],{"class":275},[265,613,614],{"class":296},"'127.0.0.1:9093:9093'\n",[265,616,618,620,622],{"class":267,"line":617},43,[265,619,380],{"class":271},[265,621,293],{"class":275},[265,623,385],{"class":296},[265,625,627],{"class":267,"line":626},44,[265,628,392],{"emptyLinePlaceholder":391},[265,630,632,635],{"class":267,"line":631},45,[265,633,634],{"class":271},"volumes",[265,636,276],{"class":275},[265,638,640,643],{"class":267,"line":639},46,[265,641,642],{"class":271},"  prometheus-data",[265,644,276],{"class":275},[265,646,648,651],{"class":267,"line":647},47,[265,649,650],{"class":271},"  grafana-data",[265,652,276],{"class":275},[265,654,656,659],{"class":267,"line":655},48,[265,657,658],{"class":271},"  loki-data",[265,660,276],{"class":275},[11,662,663,664,667,668,671,672,675,676,679],{},"Tres puntos importantes en ese archivo. Primero, todos los puertos están atados a ",[15,665,666],{},"127.0.0.1"," — ninguno de los servicios es accesible directamente de internet. Segundo, los volúmenes son nombrados (no bind mounts), así que sobreviven a ",[15,669,670],{},"docker-compose down",". 
Tercero, la contraseña de Grafana viene de variable de ambiente: crea un ",[15,673,674],{},".env"," al lado del compose con ",[15,677,678],{},"GRAFANA_PASSWORD=algo_largo_aleatorio"," y nunca commitees eso.",[11,681,682],{},"Sube la stack:",[241,684,688],{"className":685,"code":686,"language":687,"meta":249,"style":249},"language-bash shiki shiki-themes github-dark-default","cd \u002Fopt\u002Fobservability\ndocker compose up -d\ndocker compose ps  # todos deben estar \"Up\" \u002F healthy\n","bash",[15,689,690,699,714],{"__ignoreMap":249},[265,691,692,696],{"class":267,"line":268},[265,693,695],{"class":694},"sFSAA","cd",[265,697,698],{"class":296}," \u002Fopt\u002Fobservability\n",[265,700,701,705,708,711],{"class":267,"line":279},[265,702,704],{"class":703},"sQhOw","docker",[265,706,707],{"class":296}," compose",[265,709,710],{"class":296}," up",[265,712,713],{"class":694}," -d\n",[265,715,716,718,720,723],{"class":267,"line":287},[265,717,704],{"class":703},[265,719,707],{"class":296},[265,721,722],{"class":296}," ps",[265,724,725],{"class":357},"  # todos deben estar \"Up\" \u002F healthy\n",[11,727,728,729,732,733,736,737,732,740,736,743,746,747,121],{},"Validación rápida: ",[15,730,731],{},"curl localhost:9090\u002F-\u002Fready"," retorna ",[15,734,735],{},"Prometheus Server is Ready","; ",[15,738,739],{},"curl localhost:3100\u002Fready",[15,741,742],{},"ready",[15,744,745],{},"curl localhost:3000\u002Fapi\u002Fhealth"," retorna JSON con ",[15,748,749],{},"\"database\": \"ok\"",[30,751,753],{"id":752},"paso-3-como-configurar-los-scrapes-de-prometheus","Paso 3 — ¿Cómo configurar los scrapes de Prometheus?",[11,755,171,756,121],{},[38,757,758],{},"30 minutos",[11,760,252,761,764],{},[15,762,763],{},"prometheus\u002Fprometheus.yml"," es donde dices a Prometheus qué endpoints raspar. 
Para un cluster de 4 servidores, queda así:",[241,766,768],{"className":259,"code":767,"language":261,"meta":249,"style":249},"global:\n  scrape_interval: 15s\n  evaluation_interval: 15s\n\nalerting:\n  alertmanagers:\n    - static_configs:\n        - targets: ['alertmanager:9093']\n\nrule_files:\n  - 'alerts.yml'\n\nscrape_configs:\n  - job_name: 'prometheus'\n    static_configs:\n      - targets: ['localhost:9090']\n\n  - job_name: 'node'\n    static_configs:\n      - targets:\n          - 'server-1.tudominio.internal:9100'\n          - 'server-2.tudominio.internal:9100'\n          - 'server-3.tudominio.internal:9100'\n          - 'worker-1.tudominio.internal:9100'\n        labels:\n          environment: 'production'\n\n  - job_name: 'apps'\n    static_configs:\n      - targets:\n          - 'api.tudominio.internal:8080'\n          - 'worker.tudominio.internal:8080'\n        labels:\n          environment: 'production'\n    metrics_path: '\u002Fmetrics'\n",[15,769,770,777,787,796,800,807,814,824,841,845,852,860,864,871,883,890,903,907,918,924,932,940,947,954,961,968,978,982,993,999,1007,1014,1021,1027,1035],{"__ignoreMap":249},[265,771,772,775],{"class":267,"line":268},[265,773,774],{"class":271},"global",[265,776,276],{"class":275},[265,778,779,782,784],{"class":267,"line":279},[265,780,781],{"class":271},"  scrape_interval",[265,783,293],{"class":275},[265,785,786],{"class":296},"15s\n",[265,788,789,792,794],{"class":267,"line":287},[265,790,791],{"class":271},"  evaluation_interval",[265,793,293],{"class":275},[265,795,786],{"class":296},[265,797,798],{"class":267,"line":300},[265,799,392],{"emptyLinePlaceholder":391},[265,801,802,805],{"class":267,"line":308},[265,803,804],{"class":271},"alerting",[265,806,276],{"class":275},[265,808,809,812],{"class":267,"line":317},[265,810,811],{"class":271},"  alertmanagers",[265,813,276],{"class":275},[265,815,816,819,822],{"class":267,"line":325},[265,817,818],{"class":275},"    - 
",[265,820,821],{"class":271},"static_configs",[265,823,276],{"class":275},[265,825,826,829,832,835,838],{"class":267,"line":333},[265,827,828],{"class":275},"        - ",[265,830,831],{"class":271},"targets",[265,833,834],{"class":275},": [",[265,836,837],{"class":296},"'alertmanager:9093'",[265,839,840],{"class":275},"]\n",[265,842,843],{"class":267,"line":341},[265,844,392],{"emptyLinePlaceholder":391},[265,846,847,850],{"class":267,"line":349},[265,848,849],{"class":271},"rule_files",[265,851,276],{"class":275},[265,853,854,857],{"class":267,"line":361},[265,855,856],{"class":275},"  - ",[265,858,859],{"class":296},"'alerts.yml'\n",[265,861,862],{"class":267,"line":369},[265,863,392],{"emptyLinePlaceholder":391},[265,865,866,869],{"class":267,"line":377},[265,867,868],{"class":271},"scrape_configs",[265,870,276],{"class":275},[265,872,873,875,878,880],{"class":267,"line":388},[265,874,856],{"class":275},[265,876,877],{"class":271},"job_name",[265,879,293],{"class":275},[265,881,882],{"class":296},"'prometheus'\n",[265,884,885,888],{"class":267,"line":395},[265,886,887],{"class":271},"    static_configs",[265,889,276],{"class":275},[265,891,892,894,896,898,901],{"class":267,"line":403},[265,893,311],{"class":275},[265,895,831],{"class":271},[265,897,834],{"class":275},[265,899,900],{"class":296},"'localhost:9090'",[265,902,840],{"class":275},[265,904,905],{"class":267,"line":413},[265,906,392],{"emptyLinePlaceholder":391},[265,908,909,911,913,915],{"class":267,"line":420},[265,910,856],{"class":275},[265,912,877],{"class":271},[265,914,293],{"class":275},[265,916,917],{"class":296},"'node'\n",[265,919,920,922],{"class":267,"line":428},[265,921,887],{"class":271},[265,923,276],{"class":275},[265,925,926,928,930],{"class":267,"line":436},[265,927,311],{"class":275},[265,929,831],{"class":271},[265,931,276],{"class":275},[265,933,934,937],{"class":267,"line":444},[265,935,936],{"class":275},"          - 
",[265,938,939],{"class":296},"'server-1.tudominio.internal:9100'\n",[265,941,942,944],{"class":267,"line":452},[265,943,936],{"class":275},[265,945,946],{"class":296},"'server-2.tudominio.internal:9100'\n",[265,948,949,951],{"class":267,"line":460},[265,950,936],{"class":275},[265,952,953],{"class":296},"'server-3.tudominio.internal:9100'\n",[265,955,956,958],{"class":267,"line":467},[265,957,936],{"class":275},[265,959,960],{"class":296},"'worker-1.tudominio.internal:9100'\n",[265,962,963,966],{"class":267,"line":475},[265,964,965],{"class":271},"        labels",[265,967,276],{"class":275},[265,969,970,973,975],{"class":267,"line":484},[265,971,972],{"class":271},"          environment",[265,974,293],{"class":275},[265,976,977],{"class":296},"'production'\n",[265,979,980],{"class":267,"line":489},[265,981,392],{"emptyLinePlaceholder":391},[265,983,984,986,988,990],{"class":267,"line":497},[265,985,856],{"class":275},[265,987,877],{"class":271},[265,989,293],{"class":275},[265,991,992],{"class":296},"'apps'\n",[265,994,995,997],{"class":267,"line":507},[265,996,887],{"class":271},[265,998,276],{"class":275},[265,1000,1001,1003,1005],{"class":267,"line":514},[265,1002,311],{"class":275},[265,1004,831],{"class":271},[265,1006,276],{"class":275},[265,1008,1009,1011],{"class":267,"line":522},[265,1010,936],{"class":275},[265,1012,1013],{"class":296},"'api.tudominio.internal:8080'\n",[265,1015,1016,1018],{"class":267,"line":530},[265,1017,936],{"class":275},[265,1019,1020],{"class":296},"'worker.tudominio.internal:8080'\n",[265,1022,1023,1025],{"class":267,"line":540},[265,1024,965],{"class":271},[265,1026,276],{"class":275},[265,1028,1029,1031,1033],{"class":267,"line":547},[265,1030,972],{"class":271},[265,1032,293],{"class":275},[265,1034,977],{"class":296},[265,1036,1037,1040,1042],{"class":267,"line":555},[265,1038,1039],{"class":271},"    
metrics_path",[265,1041,293],{"class":275},[265,1043,1044],{"class":296},"'\u002Fmetrics'\n",[11,1046,1047,1048,1050,1051,1054],{},"Para clusters mayores o que cambian de composición con frecuencia, cambia ",[15,1049,821],{}," por ",[15,1052,1053],{},"file_sd_configs"," apuntando a un JSON que generas automáticamente. Para 4 servidores estáticos, el archivo de arriba resuelve.",[11,1056,1057,1058,1061,1062,1065,1066,1069,1070,1073],{},"Reload: ",[15,1059,1060],{},"curl -X POST localhost:9090\u002F-\u002Freload",". Verifica en ",[15,1063,1064],{},"localhost:9090\u002Ftargets"," si todos los jobs están ",[15,1067,1068],{},"UP",". Los que estén ",[15,1071,1072],{},"DOWN"," todavía no fueron instrumentados — ese es el paso 4.",[30,1075,1077],{"id":1076},"paso-4-como-instalar-el-node_exporter-en-cada-servidor","Paso 4 — ¿Cómo instalar el node_exporter en cada servidor?",[11,1079,171,1080,1083],{},[38,1081,1082],{},"15 minutos"," para 4 servidores.",[11,1085,1086],{},"En cada servidor monitoreado, corre el node_exporter. Hay dos formas: binario directo vía systemd, o contenedor Docker. En 2026 el consenso es container — facilita actualización y aislamiento. 
En cada nodo:",[241,1088,1090],{"className":685,"code":1089,"language":687,"meta":249,"style":249},"docker run -d \\\n  --name node-exporter \\\n  --restart unless-stopped \\\n  --net=\"host\" \\\n  --pid=\"host\" \\\n  -v \"\u002F:\u002Fhost:ro,rslave\" \\\n  prom\u002Fnode-exporter:v1.8.2 \\\n  --path.rootfs=\u002Fhost\n",[15,1091,1092,1106,1116,1126,1136,1145,1155,1162],{"__ignoreMap":249},[265,1093,1094,1096,1099,1102],{"class":267,"line":268},[265,1095,704],{"class":703},[265,1097,1098],{"class":296}," run",[265,1100,1101],{"class":694}," -d",[265,1103,1105],{"class":1104},"suJrU"," \\\n",[265,1107,1108,1111,1114],{"class":267,"line":279},[265,1109,1110],{"class":694},"  --name",[265,1112,1113],{"class":296}," node-exporter",[265,1115,1105],{"class":1104},[265,1117,1118,1121,1124],{"class":267,"line":287},[265,1119,1120],{"class":694},"  --restart",[265,1122,1123],{"class":296}," unless-stopped",[265,1125,1105],{"class":1104},[265,1127,1128,1131,1134],{"class":267,"line":300},[265,1129,1130],{"class":694},"  --net=",[265,1132,1133],{"class":296},"\"host\"",[265,1135,1105],{"class":1104},[265,1137,1138,1141,1143],{"class":267,"line":308},[265,1139,1140],{"class":694},"  --pid=",[265,1142,1133],{"class":296},[265,1144,1105],{"class":1104},[265,1146,1147,1150,1153],{"class":267,"line":317},[265,1148,1149],{"class":694},"  -v",[265,1151,1152],{"class":296}," \"\u002F:\u002Fhost:ro,rslave\"",[265,1154,1105],{"class":1104},[265,1156,1157,1160],{"class":267,"line":325},[265,1158,1159],{"class":296},"  prom\u002Fnode-exporter:v1.8.2",[265,1161,1105],{"class":1104},[265,1163,1164],{"class":267,"line":333},[265,1165,1166],{"class":694},"  --path.rootfs=\u002Fhost\n",[11,1168,252,1169,1172,1173,1176,1177,18,1180,1183,1184,1187],{},[15,1170,1171],{},"--net=host"," es necesario para que vea las interfaces de red reales. 
El bind mount en ",[15,1174,1175],{},"\u002Fhost"," permite leer ",[15,1178,1179],{},"\u002Fproc",[15,1181,1182],{},"\u002Fsys"," y ",[15,1185,1186],{},"\u002Fetc\u002Fpasswd"," del host (read-only) sin correr el contenedor con privilegios de root.",[11,1189,1190,1191,1194],{},"Firewall: abre el puerto 9100 solo para la IP del servidor de observabilidad. En Ubuntu con ",[15,1192,1193],{},"ufw",":",[241,1196,1198],{"className":685,"code":1197,"language":687,"meta":249,"style":249},"ufw allow from \u003CIP_DEL_OBSERVABILITY> to any port 9100\n",[15,1199,1200],{"__ignoreMap":249},[265,1201,1202,1204,1207,1210,1213,1216,1219,1222,1225,1228,1231],{"class":267,"line":268},[265,1203,1193],{"class":703},[265,1205,1206],{"class":296}," allow",[265,1208,1209],{"class":296}," from",[265,1211,1212],{"class":1104}," \u003C",[265,1214,1215],{"class":296},"IP_DEL_OBSERVABILIT",[265,1217,1218],{"class":275},"Y",[265,1220,1221],{"class":1104},">",[265,1223,1224],{"class":296}," to",[265,1226,1227],{"class":296}," any",[265,1229,1230],{"class":296}," port",[265,1232,1233],{"class":694}," 9100\n",[11,1235,1236,1237,1240,1241,121],{},"Validación: del servidor de observability, ",[15,1238,1239],{},"curl http:\u002F\u002Fserver-1.tudominio.internal:9100\u002Fmetrics"," debe retornar cientos de líneas empezando con ",[15,1242,1243],{},"# HELP node_cpu_seconds_total...",[30,1245,1247],{"id":1246},"paso-5-como-configurar-loki-promtail","Paso 5 — ¿Cómo configurar Loki + Promtail?",[11,1249,171,1250,121],{},[38,1251,758],{},[11,1253,1254,1255,1194],{},"Loki ya está corriendo en el compose del paso 2. 
Falta el ",[15,1256,1257],{},"loki-config.yml",[241,1259,1261],{"className":259,"code":1260,"language":261,"meta":249,"style":249},"auth_enabled: false\n\nserver:\n  http_listen_port: 3100\n\ncommon:\n  path_prefix: \u002Floki\n  storage:\n    filesystem:\n      chunks_directory: \u002Floki\u002Fchunks\n      rules_directory: \u002Floki\u002Frules\n  replication_factor: 1\n  ring:\n    kvstore:\n      store: inmemory\n\nschema_config:\n  configs:\n    - from: 2024-01-01\n      store: tsdb\n      object_store: filesystem\n      schema: v13\n      index:\n        prefix: index_\n        period: 24h\n\nlimits_config:\n  retention_period: 720h  # 30 días\n  reject_old_samples: true\n  reject_old_samples_max_age: 168h\n",[15,1262,1263,1273,1277,1284,1294,1298,1305,1315,1322,1329,1339,1349,1359,1366,1373,1383,1387,1394,1401,1413,1422,1432,1442,1449,1459,1469,1473,1480,1493,1503],{"__ignoreMap":249},[265,1264,1265,1268,1270],{"class":267,"line":268},[265,1266,1267],{"class":271},"auth_enabled",[265,1269,293],{"class":275},[265,1271,1272],{"class":694},"false\n",[265,1274,1275],{"class":267,"line":279},[265,1276,392],{"emptyLinePlaceholder":391},[265,1278,1279,1282],{"class":267,"line":287},[265,1280,1281],{"class":271},"server",[265,1283,276],{"class":275},[265,1285,1286,1289,1291],{"class":267,"line":300},[265,1287,1288],{"class":271},"  http_listen_port",[265,1290,293],{"class":275},[265,1292,1293],{"class":694},"3100\n",[265,1295,1296],{"class":267,"line":308},[265,1297,392],{"emptyLinePlaceholder":391},[265,1299,1300,1303],{"class":267,"line":317},[265,1301,1302],{"class":271},"common",[265,1304,276],{"class":275},[265,1306,1307,1310,1312],{"class":267,"line":325},[265,1308,1309],{"class":271},"  path_prefix",[265,1311,293],{"class":275},[265,1313,1314],{"class":296},"\u002Floki\n",[265,1316,1317,1320],{"class":267,"line":333},[265,1318,1319],{"class":271},"  
storage",[265,1321,276],{"class":275},[265,1323,1324,1327],{"class":267,"line":341},[265,1325,1326],{"class":271},"    filesystem",[265,1328,276],{"class":275},[265,1330,1331,1334,1336],{"class":267,"line":349},[265,1332,1333],{"class":271},"      chunks_directory",[265,1335,293],{"class":275},[265,1337,1338],{"class":296},"\u002Floki\u002Fchunks\n",[265,1340,1341,1344,1346],{"class":267,"line":361},[265,1342,1343],{"class":271},"      rules_directory",[265,1345,293],{"class":275},[265,1347,1348],{"class":296},"\u002Floki\u002Frules\n",[265,1350,1351,1354,1356],{"class":267,"line":369},[265,1352,1353],{"class":271},"  replication_factor",[265,1355,293],{"class":275},[265,1357,1358],{"class":694},"1\n",[265,1360,1361,1364],{"class":267,"line":377},[265,1362,1363],{"class":271},"  ring",[265,1365,276],{"class":275},[265,1367,1368,1371],{"class":267,"line":388},[265,1369,1370],{"class":271},"    kvstore",[265,1372,276],{"class":275},[265,1374,1375,1378,1380],{"class":267,"line":395},[265,1376,1377],{"class":271},"      store",[265,1379,293],{"class":275},[265,1381,1382],{"class":296},"inmemory\n",[265,1384,1385],{"class":267,"line":403},[265,1386,392],{"emptyLinePlaceholder":391},[265,1388,1389,1392],{"class":267,"line":413},[265,1390,1391],{"class":271},"schema_config",[265,1393,276],{"class":275},[265,1395,1396,1399],{"class":267,"line":420},[265,1397,1398],{"class":271},"  configs",[265,1400,276],{"class":275},[265,1402,1403,1405,1408,1410],{"class":267,"line":428},[265,1404,818],{"class":275},[265,1406,1407],{"class":271},"from",[265,1409,293],{"class":275},[265,1411,1412],{"class":694},"2024-01-01\n",[265,1414,1415,1417,1419],{"class":267,"line":436},[265,1416,1377],{"class":271},[265,1418,293],{"class":275},[265,1420,1421],{"class":296},"tsdb\n",[265,1423,1424,1427,1429],{"class":267,"line":444},[265,1425,1426],{"class":271},"      
object_store",[265,1428,293],{"class":275},[265,1430,1431],{"class":296},"filesystem\n",[265,1433,1434,1437,1439],{"class":267,"line":452},[265,1435,1436],{"class":271},"      schema",[265,1438,293],{"class":275},[265,1440,1441],{"class":296},"v13\n",[265,1443,1444,1447],{"class":267,"line":460},[265,1445,1446],{"class":271},"      index",[265,1448,276],{"class":275},[265,1450,1451,1454,1456],{"class":267,"line":467},[265,1452,1453],{"class":271},"        prefix",[265,1455,293],{"class":275},[265,1457,1458],{"class":296},"index_\n",[265,1460,1461,1464,1466],{"class":267,"line":475},[265,1462,1463],{"class":271},"        period",[265,1465,293],{"class":275},[265,1467,1468],{"class":296},"24h\n",[265,1470,1471],{"class":267,"line":484},[265,1472,392],{"emptyLinePlaceholder":391},[265,1474,1475,1478],{"class":267,"line":489},[265,1476,1477],{"class":271},"limits_config",[265,1479,276],{"class":275},[265,1481,1482,1485,1487,1490],{"class":267,"line":497},[265,1483,1484],{"class":271},"  retention_period",[265,1486,293],{"class":275},[265,1488,1489],{"class":296},"720h",[265,1491,1492],{"class":357},"  # 30 días\n",[265,1494,1495,1498,1500],{"class":267,"line":507},[265,1496,1497],{"class":271},"  reject_old_samples",[265,1499,293],{"class":275},[265,1501,1502],{"class":694},"true\n",[265,1504,1505,1508,1510],{"class":267,"line":514},[265,1506,1507],{"class":271},"  reject_old_samples_max_age",[265,1509,293],{"class":275},[265,1511,1512],{"class":296},"168h\n",[11,1514,1515],{},"Storage en filesystem es suficiente para empezar. Cuando pases de 50 GB de logs por día o quieras retención de 90+ días, migra a S3 (o compatible). 
No migres antes — complica la operación sin ganancia real.",[11,1517,1518],{},"En cada servidor monitoreado, instala Promtail (o Grafana Agent) también vía container:",[241,1520,1522],{"className":259,"code":1521,"language":261,"meta":249,"style":249},"# \u002Fopt\u002Fpromtail\u002Fpromtail-config.yml en cada servidor\nserver:\n  http_listen_port: 9080\n\nclients:\n  - url: http:\u002F\u002Fmonitor.tudominio.com:3100\u002Floki\u002Fapi\u002Fv1\u002Fpush\n\nscrape_configs:\n  - job_name: system\n    static_configs:\n      - targets: [localhost]\n        labels:\n          job: varlogs\n          host: ${HOSTNAME}\n          __path__: \u002Fvar\u002Flog\u002F*.log\n\n  - job_name: docker\n    docker_sd_configs:\n      - host: unix:\u002F\u002F\u002Fvar\u002Frun\u002Fdocker.sock\n    relabel_configs:\n      - source_labels: ['__meta_docker_container_name']\n        target_label: 'container'\n",[15,1523,1524,1529,1535,1544,1548,1555,1567,1571,1577,1588,1594,1606,1612,1622,1632,1642,1646,1657,1664,1676,1683,1697],{"__ignoreMap":249},[265,1525,1526],{"class":267,"line":268},[265,1527,1528],{"class":357},"# \u002Fopt\u002Fpromtail\u002Fpromtail-config.yml en cada 
servidor\n",[265,1530,1531,1533],{"class":267,"line":279},[265,1532,1281],{"class":271},[265,1534,276],{"class":275},[265,1536,1537,1539,1541],{"class":267,"line":287},[265,1538,1288],{"class":271},[265,1540,293],{"class":275},[265,1542,1543],{"class":694},"9080\n",[265,1545,1546],{"class":267,"line":300},[265,1547,392],{"emptyLinePlaceholder":391},[265,1549,1550,1553],{"class":267,"line":308},[265,1551,1552],{"class":271},"clients",[265,1554,276],{"class":275},[265,1556,1557,1559,1562,1564],{"class":267,"line":317},[265,1558,856],{"class":275},[265,1560,1561],{"class":271},"url",[265,1563,293],{"class":275},[265,1565,1566],{"class":296},"http:\u002F\u002Fmonitor.tudominio.com:3100\u002Floki\u002Fapi\u002Fv1\u002Fpush\n",[265,1568,1569],{"class":267,"line":325},[265,1570,392],{"emptyLinePlaceholder":391},[265,1572,1573,1575],{"class":267,"line":333},[265,1574,868],{"class":271},[265,1576,276],{"class":275},[265,1578,1579,1581,1583,1585],{"class":267,"line":341},[265,1580,856],{"class":275},[265,1582,877],{"class":271},[265,1584,293],{"class":275},[265,1586,1587],{"class":296},"system\n",[265,1589,1590,1592],{"class":267,"line":349},[265,1591,887],{"class":271},[265,1593,276],{"class":275},[265,1595,1596,1598,1600,1602,1604],{"class":267,"line":361},[265,1597,311],{"class":275},[265,1599,831],{"class":271},[265,1601,834],{"class":275},[265,1603,201],{"class":296},[265,1605,840],{"class":275},[265,1607,1608,1610],{"class":267,"line":369},[265,1609,965],{"class":271},[265,1611,276],{"class":275},[265,1613,1614,1617,1619],{"class":267,"line":377},[265,1615,1616],{"class":271},"          job",[265,1618,293],{"class":275},[265,1620,1621],{"class":296},"varlogs\n",[265,1623,1624,1627,1629],{"class":267,"line":388},[265,1625,1626],{"class":271},"          host",[265,1628,293],{"class":275},[265,1630,1631],{"class":296},"${HOSTNAME}\n",[265,1633,1634,1637,1639],{"class":267,"line":395},[265,1635,1636],{"class":271},"          
__path__",[265,1638,293],{"class":275},[265,1640,1641],{"class":296},"\u002Fvar\u002Flog\u002F*.log\n",[265,1643,1644],{"class":267,"line":403},[265,1645,392],{"emptyLinePlaceholder":391},[265,1647,1648,1650,1652,1654],{"class":267,"line":413},[265,1649,856],{"class":275},[265,1651,877],{"class":271},[265,1653,293],{"class":275},[265,1655,1656],{"class":296},"docker\n",[265,1658,1659,1662],{"class":267,"line":420},[265,1660,1661],{"class":271},"    docker_sd_configs",[265,1663,276],{"class":275},[265,1665,1666,1668,1671,1673],{"class":267,"line":428},[265,1667,311],{"class":275},[265,1669,1670],{"class":271},"host",[265,1672,293],{"class":275},[265,1674,1675],{"class":296},"unix:\u002F\u002F\u002Fvar\u002Frun\u002Fdocker.sock\n",[265,1677,1678,1681],{"class":267,"line":436},[265,1679,1680],{"class":271},"    relabel_configs",[265,1682,276],{"class":275},[265,1684,1685,1687,1690,1692,1695],{"class":267,"line":444},[265,1686,311],{"class":275},[265,1688,1689],{"class":271},"source_labels",[265,1691,834],{"class":275},[265,1693,1694],{"class":296},"'__meta_docker_container_name'",[265,1696,840],{"class":275},[265,1698,1699,1702,1704],{"class":267,"line":452},[265,1700,1701],{"class":271},"        target_label",[265,1703,293],{"class":275},[265,1705,1706],{"class":296},"'container'\n",[11,1708,1709,1710,1713,1714,1716],{},"Importante: el endpoint ",[15,1711,1712],{},"http:\u002F\u002Fmonitor.tudominio.com:3100\u002Floki\u002Fapi\u002Fv1\u002Fpush"," necesita estar accesible desde los servidores. Si seguiste el paso 2 y ataste Loki en ",[15,1715,666],{},", tienes dos opciones: exponer el 3100 vía reverse proxy con autenticación básica, o abrir un túnel SSH\u002FWireGuard entre los servidores. 
La segunda opción es más segura y la que recomendamos.",[11,1718,1719,1720,1723],{},"Validación: en Grafana, ve a Explore, selecciona la fuente de datos Loki, corre ",[15,1721,1722],{},"{job=\"varlogs\"}"," y ve los logs apareciendo en tiempo real.",[30,1725,1727],{"id":1726},"paso-6-como-importar-los-dashboards-de-grafana","Paso 6 — ¿Cómo importar los dashboards de Grafana?",[11,1729,171,1730,121],{},[38,1731,1732],{},"20 minutos",[11,1734,1735,1736,1739,1740,121],{},"Accede a ",[15,1737,1738],{},"https:\u002F\u002Fmonitor.tudominio.com"," (después de configurar el reverse proxy del paso 8 — puedes saltar para allá ahora si quieres). Login admin con la contraseña del ",[15,1741,674],{},[11,1743,1744,1745,1194],{},"Agrega las dos fuentes de datos vía provisioning automático. En ",[15,1746,1747],{},"grafana\u002Fprovisioning\u002Fdatasources\u002Fdatasources.yml",[241,1749,1751],{"className":259,"code":1750,"language":261,"meta":249,"style":249},"apiVersion: 1\ndatasources:\n  - name: Prometheus\n    type: prometheus\n    access: proxy\n    url: http:\u002F\u002Fprometheus:9090\n    isDefault: true\n  - name: Loki\n    type: loki\n    access: proxy\n    url: http:\u002F\u002Floki:3100\n",[15,1752,1753,1762,1769,1781,1791,1801,1811,1820,1831,1840,1848],{"__ignoreMap":249},[265,1754,1755,1758,1760],{"class":267,"line":268},[265,1756,1757],{"class":271},"apiVersion",[265,1759,293],{"class":275},[265,1761,1358],{"class":694},[265,1763,1764,1767],{"class":267,"line":279},[265,1765,1766],{"class":271},"datasources",[265,1768,276],{"class":275},[265,1770,1771,1773,1776,1778],{"class":267,"line":287},[265,1772,856],{"class":275},[265,1774,1775],{"class":271},"name",[265,1777,293],{"class":275},[265,1779,1780],{"class":296},"Prometheus\n",[265,1782,1783,1786,1788],{"class":267,"line":300},[265,1784,1785],{"class":271},"    
type",[265,1787,293],{"class":275},[265,1789,1790],{"class":296},"prometheus\n",[265,1792,1793,1796,1798],{"class":267,"line":308},[265,1794,1795],{"class":271},"    access",[265,1797,293],{"class":275},[265,1799,1800],{"class":296},"proxy\n",[265,1802,1803,1806,1808],{"class":267,"line":317},[265,1804,1805],{"class":271},"    url",[265,1807,293],{"class":275},[265,1809,1810],{"class":296},"http:\u002F\u002Fprometheus:9090\n",[265,1812,1813,1816,1818],{"class":267,"line":325},[265,1814,1815],{"class":271},"    isDefault",[265,1817,293],{"class":275},[265,1819,1502],{"class":694},[265,1821,1822,1824,1826,1828],{"class":267,"line":333},[265,1823,856],{"class":275},[265,1825,1775],{"class":271},[265,1827,293],{"class":275},[265,1829,1830],{"class":296},"Loki\n",[265,1832,1833,1835,1837],{"class":267,"line":341},[265,1834,1785],{"class":271},[265,1836,293],{"class":275},[265,1838,1839],{"class":296},"loki\n",[265,1841,1842,1844,1846],{"class":267,"line":349},[265,1843,1795],{"class":271},[265,1845,293],{"class":275},[265,1847,1800],{"class":296},[265,1849,1850,1852,1854],{"class":267,"line":361},[265,1851,1805],{"class":271},[265,1853,293],{"class":275},[265,1855,1856],{"class":296},"http:\u002F\u002Floki:3100\n",[11,1858,1859,1860,1863],{},"Reinicia Grafana con ",[15,1861,1862],{},"docker compose restart grafana"," y las fuentes aparecen automáticamente.",[11,1865,1866,1867,1870],{},"Importa los dashboards listos. En ",[38,1868,1869],{},"Dashboards → New → Import",", pega el ID del dashboard:",[67,1872,1873,1879,1885],{},[70,1874,1875,1878],{},[38,1876,1877],{},"1860"," — Node Exporter Full. CPU, RAM, disco, red, sistema de archivos. Es el dashboard más usado de la comunidad Prometheus, con razón.",[70,1880,1881,1884],{},[38,1882,1883],{},"13639"," — Logs \u002F App. Visualización básica de logs de Loki con filtros por job, container, host.",[70,1886,1887,1890],{},[38,1888,1889],{},"15172"," — Cluster overview. 
Visión consolidada por servidor, útil para cluster pequeño.",[11,1892,1893,1894,1897],{},"Customiza cada uno para usar ",[15,1895,1896],{},"environment=\"production\""," en el filtro default. Después de dos semanas usándolos, vas a querer crear dashboards propios para workloads específicos — no hay atajo ahí, es tiempo de silla.",[30,1899,1901],{"id":1900},"paso-7-como-configurar-alertas-basicas","Paso 7 — ¿Cómo configurar alertas básicas?",[11,1903,171,1904,121],{},[38,1905,232],{},[11,1907,1908],{},"Las alertas son donde el 80% de los equipos tropieza: o configuran poquísimas y se enteran de los incidentes por los clientes, o configuran decenas y desensibilizan al equipo.",[11,1910,1911,1912,1915,1916,1194],{},"Empieza con ",[38,1913,1914],{},"seis alertas esenciales",". En ",[15,1917,1918],{},"prometheus\u002Falerts.yml",[241,1920,1922],{"className":259,"code":1921,"language":261,"meta":249,"style":249},"groups:\n  - name: essentials\n    interval: 30s\n    rules:\n      - alert: ServerDown\n        expr: up{job=\"node\"} == 0\n        for: 2m\n        labels:\n          severity: critical\n        annotations:\n          summary: \"Servidor {{ $labels.instance }} está fuera del aire\"\n\n      - alert: HighCPU\n        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100) > 80\n        for: 10m\n        labels:\n          severity: warning\n\n      - alert: DiskAlmostFull\n        expr: (node_filesystem_avail_bytes{mountpoint=\"\u002F\"} \u002F node_filesystem_size_bytes{mountpoint=\"\u002F\"}) * 100 \u003C 15\n        for: 5m\n        labels:\n          severity: critical\n\n      - alert: HighMemory\n        expr: (1 - (node_memory_MemAvailable_bytes \u002F node_memory_MemTotal_bytes)) * 100 > 90\n        for: 10m\n        labels:\n          severity: warning\n\n      - alert: HighHTTPErrorRate\n        expr: sum(rate(http_requests_total{status=~\"5..\"}[5m])) \u002F sum(rate(http_requests_total[5m])) > 0.05\n        for: 5m\n        labels:\n        
  severity: critical\n\n      - alert: HighLatency\n        expr: histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 2\n        for: 10m\n        labels:\n          severity: warning\n",[15,1923,1924,1931,1942,1952,1959,1971,1981,1991,1997,2007,2014,2024,2028,2039,2048,2057,2063,2072,2076,2087,2096,2105,2111,2119,2123,2134,2143,2151,2157,2165,2169,2180,2189,2197,2203,2211,2215,2226,2235,2243,2249],{"__ignoreMap":249},[265,1925,1926,1929],{"class":267,"line":268},[265,1927,1928],{"class":271},"groups",[265,1930,276],{"class":275},[265,1932,1933,1935,1937,1939],{"class":267,"line":279},[265,1934,856],{"class":275},[265,1936,1775],{"class":271},[265,1938,293],{"class":275},[265,1940,1941],{"class":296},"essentials\n",[265,1943,1944,1947,1949],{"class":267,"line":287},[265,1945,1946],{"class":271},"    interval",[265,1948,293],{"class":275},[265,1950,1951],{"class":296},"30s\n",[265,1953,1954,1957],{"class":267,"line":300},[265,1955,1956],{"class":271},"    rules",[265,1958,276],{"class":275},[265,1960,1961,1963,1966,1968],{"class":267,"line":308},[265,1962,311],{"class":275},[265,1964,1965],{"class":271},"alert",[265,1967,293],{"class":275},[265,1969,1970],{"class":296},"ServerDown\n",[265,1972,1973,1976,1978],{"class":267,"line":317},[265,1974,1975],{"class":271},"        expr",[265,1977,293],{"class":275},[265,1979,1980],{"class":296},"up{job=\"node\"} == 0\n",[265,1982,1983,1986,1988],{"class":267,"line":325},[265,1984,1985],{"class":271},"        for",[265,1987,293],{"class":275},[265,1989,1990],{"class":296},"2m\n",[265,1992,1993,1995],{"class":267,"line":333},[265,1994,965],{"class":271},[265,1996,276],{"class":275},[265,1998,1999,2002,2004],{"class":267,"line":341},[265,2000,2001],{"class":271},"          severity",[265,2003,293],{"class":275},[265,2005,2006],{"class":296},"critical\n",[265,2008,2009,2012],{"class":267,"line":349},[265,2010,2011],{"class":271},"        
annotations",[265,2013,276],{"class":275},[265,2015,2016,2019,2021],{"class":267,"line":361},[265,2017,2018],{"class":271},"          summary",[265,2020,293],{"class":275},[265,2022,2023],{"class":296},"\"Servidor {{ $labels.instance }} está fuera del aire\"\n",[265,2025,2026],{"class":267,"line":369},[265,2027,392],{"emptyLinePlaceholder":391},[265,2029,2030,2032,2034,2036],{"class":267,"line":377},[265,2031,311],{"class":275},[265,2033,1965],{"class":271},[265,2035,293],{"class":275},[265,2037,2038],{"class":296},"HighCPU\n",[265,2040,2041,2043,2045],{"class":267,"line":388},[265,2042,1975],{"class":271},[265,2044,293],{"class":275},[265,2046,2047],{"class":296},"100 - (avg by(instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100) > 80\n",[265,2049,2050,2052,2054],{"class":267,"line":395},[265,2051,1985],{"class":271},[265,2053,293],{"class":275},[265,2055,2056],{"class":296},"10m\n",[265,2058,2059,2061],{"class":267,"line":403},[265,2060,965],{"class":271},[265,2062,276],{"class":275},[265,2064,2065,2067,2069],{"class":267,"line":413},[265,2066,2001],{"class":271},[265,2068,293],{"class":275},[265,2070,2071],{"class":296},"warning\n",[265,2073,2074],{"class":267,"line":420},[265,2075,392],{"emptyLinePlaceholder":391},[265,2077,2078,2080,2082,2084],{"class":267,"line":428},[265,2079,311],{"class":275},[265,2081,1965],{"class":271},[265,2083,293],{"class":275},[265,2085,2086],{"class":296},"DiskAlmostFull\n",[265,2088,2089,2091,2093],{"class":267,"line":436},[265,2090,1975],{"class":271},[265,2092,293],{"class":275},[265,2094,2095],{"class":296},"(node_filesystem_avail_bytes{mountpoint=\"\u002F\"} \u002F node_filesystem_size_bytes{mountpoint=\"\u002F\"}) * 100 \u003C 
15\n",[265,2097,2098,2100,2102],{"class":267,"line":444},[265,2099,1985],{"class":271},[265,2101,293],{"class":275},[265,2103,2104],{"class":296},"5m\n",[265,2106,2107,2109],{"class":267,"line":452},[265,2108,965],{"class":271},[265,2110,276],{"class":275},[265,2112,2113,2115,2117],{"class":267,"line":460},[265,2114,2001],{"class":271},[265,2116,293],{"class":275},[265,2118,2006],{"class":296},[265,2120,2121],{"class":267,"line":467},[265,2122,392],{"emptyLinePlaceholder":391},[265,2124,2125,2127,2129,2131],{"class":267,"line":475},[265,2126,311],{"class":275},[265,2128,1965],{"class":271},[265,2130,293],{"class":275},[265,2132,2133],{"class":296},"HighMemory\n",[265,2135,2136,2138,2140],{"class":267,"line":484},[265,2137,1975],{"class":271},[265,2139,293],{"class":275},[265,2141,2142],{"class":296},"(1 - (node_memory_MemAvailable_bytes \u002F node_memory_MemTotal_bytes)) * 100 > 90\n",[265,2144,2145,2147,2149],{"class":267,"line":489},[265,2146,1985],{"class":271},[265,2148,293],{"class":275},[265,2150,2056],{"class":296},[265,2152,2153,2155],{"class":267,"line":497},[265,2154,965],{"class":271},[265,2156,276],{"class":275},[265,2158,2159,2161,2163],{"class":267,"line":507},[265,2160,2001],{"class":271},[265,2162,293],{"class":275},[265,2164,2071],{"class":296},[265,2166,2167],{"class":267,"line":514},[265,2168,392],{"emptyLinePlaceholder":391},[265,2170,2171,2173,2175,2177],{"class":267,"line":522},[265,2172,311],{"class":275},[265,2174,1965],{"class":271},[265,2176,293],{"class":275},[265,2178,2179],{"class":296},"HighHTTPErrorRate\n",[265,2181,2182,2184,2186],{"class":267,"line":530},[265,2183,1975],{"class":271},[265,2185,293],{"class":275},[265,2187,2188],{"class":296},"sum(rate(http_requests_total{status=~\"5..\"}[5m])) \u002F sum(rate(http_requests_total[5m])) > 
0.05\n",[265,2190,2191,2193,2195],{"class":267,"line":540},[265,2192,1985],{"class":271},[265,2194,293],{"class":275},[265,2196,2104],{"class":296},[265,2198,2199,2201],{"class":267,"line":547},[265,2200,965],{"class":271},[265,2202,276],{"class":275},[265,2204,2205,2207,2209],{"class":267,"line":555},[265,2206,2001],{"class":271},[265,2208,293],{"class":275},[265,2210,2006],{"class":296},[265,2212,2213],{"class":267,"line":564},[265,2214,392],{"emptyLinePlaceholder":391},[265,2216,2217,2219,2221,2223],{"class":267,"line":569},[265,2218,311],{"class":275},[265,2220,1965],{"class":271},[265,2222,293],{"class":275},[265,2224,2225],{"class":296},"HighLatency\n",[265,2227,2228,2230,2232],{"class":267,"line":577},[265,2229,1975],{"class":271},[265,2231,293],{"class":275},[265,2233,2234],{"class":296},"histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 2\n",[265,2236,2237,2239,2241],{"class":267,"line":587},[265,2238,1985],{"class":271},[265,2240,293],{"class":275},[265,2242,2056],{"class":296},[265,2244,2245,2247],{"class":267,"line":594},[265,2246,965],{"class":271},[265,2248,276],{"class":275},[265,2250,2251,2253,2255],{"class":267,"line":602},[265,2252,2001],{"class":271},[265,2254,293],{"class":275},[265,2256,2071],{"class":296},[11,2258,2259,2260,2263],{},"Y el ",[15,2261,2262],{},"alertmanager\u002Falertmanager.yml"," apuntando a un webhook de Slack o Discord:",[241,2265,2267],{"className":259,"code":2266,"language":261,"meta":249,"style":249},"route:\n  group_by: ['alertname', 'severity']\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 4h\n  receiver: 'slack-default'\n  routes:\n    - match:\n        severity: critical\n      receiver: 'slack-critical'\n      repeat_interval: 1h\n\nreceivers:\n  - name: 'slack-default'\n    slack_configs:\n      - api_url: 'https:\u002F\u002Fhooks.slack.com\u002Fservices\u002FTU\u002FWEBHOOK\u002FAQUI'\n        channel: '#alerts'\n        send_resolved: true\n\n  - name: 
'slack-critical'\n    slack_configs:\n      - api_url: 'https:\u002F\u002Fhooks.slack.com\u002Fservices\u002FTU\u002FWEBHOOK\u002FAQUI'\n        channel: '#alerts-critical'\n        send_resolved: true\n",[15,2268,2269,2276,2293,2302,2311,2321,2331,2338,2347,2356,2366,2376,2380,2387,2397,2404,2416,2426,2435,2439,2449,2455,2465,2474],{"__ignoreMap":249},[265,2270,2271,2274],{"class":267,"line":268},[265,2272,2273],{"class":271},"route",[265,2275,276],{"class":275},[265,2277,2278,2281,2283,2286,2288,2291],{"class":267,"line":279},[265,2279,2280],{"class":271},"  group_by",[265,2282,834],{"class":275},[265,2284,2285],{"class":296},"'alertname'",[265,2287,18],{"class":275},[265,2289,2290],{"class":296},"'severity'",[265,2292,840],{"class":275},[265,2294,2295,2298,2300],{"class":267,"line":287},[265,2296,2297],{"class":271},"  group_wait",[265,2299,293],{"class":275},[265,2301,1951],{"class":296},[265,2303,2304,2307,2309],{"class":267,"line":300},[265,2305,2306],{"class":271},"  group_interval",[265,2308,293],{"class":275},[265,2310,2104],{"class":296},[265,2312,2313,2316,2318],{"class":267,"line":308},[265,2314,2315],{"class":271},"  repeat_interval",[265,2317,293],{"class":275},[265,2319,2320],{"class":296},"4h\n",[265,2322,2323,2326,2328],{"class":267,"line":317},[265,2324,2325],{"class":271},"  receiver",[265,2327,293],{"class":275},[265,2329,2330],{"class":296},"'slack-default'\n",[265,2332,2333,2336],{"class":267,"line":325},[265,2334,2335],{"class":271},"  routes",[265,2337,276],{"class":275},[265,2339,2340,2342,2345],{"class":267,"line":333},[265,2341,818],{"class":275},[265,2343,2344],{"class":271},"match",[265,2346,276],{"class":275},[265,2348,2349,2352,2354],{"class":267,"line":341},[265,2350,2351],{"class":271},"        severity",[265,2353,293],{"class":275},[265,2355,2006],{"class":296},[265,2357,2358,2361,2363],{"class":267,"line":349},[265,2359,2360],{"class":271},"      
receiver",[265,2362,293],{"class":275},[265,2364,2365],{"class":296},"'slack-critical'\n",[265,2367,2368,2371,2373],{"class":267,"line":361},[265,2369,2370],{"class":271},"      repeat_interval",[265,2372,293],{"class":275},[265,2374,2375],{"class":296},"1h\n",[265,2377,2378],{"class":267,"line":369},[265,2379,392],{"emptyLinePlaceholder":391},[265,2381,2382,2385],{"class":267,"line":377},[265,2383,2384],{"class":271},"receivers",[265,2386,276],{"class":275},[265,2388,2389,2391,2393,2395],{"class":267,"line":388},[265,2390,856],{"class":275},[265,2392,1775],{"class":271},[265,2394,293],{"class":275},[265,2396,2330],{"class":296},[265,2398,2399,2402],{"class":267,"line":395},[265,2400,2401],{"class":271},"    slack_configs",[265,2403,276],{"class":275},[265,2405,2406,2408,2411,2413],{"class":267,"line":403},[265,2407,311],{"class":275},[265,2409,2410],{"class":271},"api_url",[265,2412,293],{"class":275},[265,2414,2415],{"class":296},"'https:\u002F\u002Fhooks.slack.com\u002Fservices\u002FTU\u002FWEBHOOK\u002FAQUI'\n",[265,2417,2418,2421,2423],{"class":267,"line":413},[265,2419,2420],{"class":271},"        channel",[265,2422,293],{"class":275},[265,2424,2425],{"class":296},"'#alerts'\n",[265,2427,2428,2431,2433],{"class":267,"line":420},[265,2429,2430],{"class":271},"        
send_resolved",[265,2432,293],{"class":275},[265,2434,1502],{"class":694},[265,2436,2437],{"class":267,"line":428},[265,2438,392],{"emptyLinePlaceholder":391},[265,2440,2441,2443,2445,2447],{"class":267,"line":436},[265,2442,856],{"class":275},[265,2444,1775],{"class":271},[265,2446,293],{"class":275},[265,2448,2365],{"class":296},[265,2450,2451,2453],{"class":267,"line":444},[265,2452,2401],{"class":271},[265,2454,276],{"class":275},[265,2456,2457,2459,2461,2463],{"class":267,"line":452},[265,2458,311],{"class":275},[265,2460,2410],{"class":271},[265,2462,293],{"class":275},[265,2464,2415],{"class":296},[265,2466,2467,2469,2471],{"class":267,"line":460},[265,2468,2420],{"class":271},[265,2470,293],{"class":275},[265,2472,2473],{"class":296},"'#alerts-critical'\n",[265,2475,2476,2478,2480],{"class":267,"line":467},[265,2477,2430],{"class":271},[265,2479,293],{"class":275},[265,2481,1502],{"class":694},[11,2483,2484,2485,2488,2489,2492],{},"Dos detalles que ahorran noche de sueño. El ",[15,2486,2487],{},"for: 10m"," en CPU evita que picos cortos se vuelvan alertas — el servidor puede llegar a 95% por 30 segundos y eso ser normal. El ",[15,2490,2491],{},"repeat_interval: 4h"," para warnings garantiza que un warning resuelto en una hora no se vuelva 60 mensajes — el Alertmanager agrupa.",[11,2494,2495,2496,2498,2499,2502,2503,2506],{},"Recarga Prometheus (",[15,2497,1060],{},") y prueba forzando una alerta: ",[15,2500,2501],{},"stress --cpu 4 --timeout 700s"," en algún servidor debe disparar ",[15,2504,2505],{},"HighCPU"," en 10 minutos.",[30,2508,2510],{"id":2509},"paso-8-como-poner-reverse-proxy-y-tls-al-frente","Paso 8 — ¿Cómo poner reverse proxy y TLS al frente?",[11,2512,171,2513,121],{},[38,2514,1732],{},[11,2516,2517,2518,2520],{},"Para acceder a Grafana vía ",[15,2519,1738],{}," con certificado válido, necesitas algo al frente del puerto 3000. 
Dos opciones:",[182,2522,2523,2533],{},[70,2524,2525,2528,2529,2532],{},[38,2526,2527],{},"Router integrado del orquestador"," — si ya tienes el cluster HeroCtl corriendo, basta declarar Grafana como job con ",[15,2530,2531],{},"ingress: { host: monitor.tudominio.com, tls: true }",". Certificado Let's Encrypt automático, sin herramienta adicional.",[70,2534,2535,2538,2539],{},[38,2536,2537],{},"Caddy standalone"," en la propia VPS de observabilidad — también emite Let's Encrypt automáticamente. Caddyfile mínimo:",[241,2540,2543],{"className":2541,"code":2542,"language":246},[244],"monitor.tudominio.com {\n  reverse_proxy localhost:3000\n  basicauth \u002Flogin {\n    admin \u003Chash_bcrypt>\n  }\n}\n",[15,2544,2542],{"__ignoreMap":249},[11,2546,2547,2548,2551],{},"Para defensa en profundidad, mantén autenticación básica del Caddy\u002Frouter al frente del login de Grafana — dos barreras, no una. La segunda es especialmente importante porque el login default de Grafana es ",[15,2549,2550],{},"admin\u002Fadmin"," y la primera cosa que los bots hacen en un Grafana expuesto es probar esa combinación.",[30,2553,2555],{"id":2554},"paso-9-como-instrumentar-metricas-de-aplicacion","Paso 9 — ¿Cómo instrumentar métricas de aplicación?",[11,2557,171,2558,121],{},[38,2559,2560],{},"varía según número de aplicaciones",[11,2562,2563],{},"Métricas de sistema son la mitad de la historia. 
La otra mitad es lo que tu aplicación está haciendo — cuántas requests por segundo, cuál es la latencia p99, cuántos errores, cuál es el tamaño de la cola de jobs en background.",[11,2565,2566],{},"Cada lenguaje popular tiene cliente Prometheus oficial:",[67,2568,2569,2577,2585,2592],{},[70,2570,2571,293,2574],{},[38,2572,2573],{},"Node.js",[15,2575,2576],{},"prom-client",[70,2578,2579,293,2582],{},[38,2580,2581],{},"Python",[15,2583,2584],{},"prometheus-client",[70,2586,2587,293,2590],{},[38,2588,2589],{},"Ruby",[15,2591,2584],{},[70,2593,2594,293,2597],{},[38,2595,2596],{},"Go",[15,2598,2599],{},"github.com\u002Fprometheus\u002Fclient_golang",[11,2601,2602],{},"El estándar mínimo son tres métricas por endpoint HTTP:",[67,2604,2605,2620,2626],{},[70,2606,2607,2610,2611,18,2614,18,2617,121],{},[15,2608,2609],{},"http_requests_total"," — counter, con labels ",[15,2612,2613],{},"method",[15,2615,2616],{},"path",[15,2618,2619],{},"status",[70,2621,2622,2625],{},[15,2623,2624],{},"http_request_duration_seconds"," — histogram, mismo set de labels.",[70,2627,2628,2631,2632,2635],{},[15,2629,2630],{},"app_errors_total"," — counter, con label ",[15,2633,2634],{},"kind"," (\"validation\", \"db\", \"external_api\", etc).",[11,2637,2638,2639,2641,2642,2644],{},"Expone todo eso en ",[15,2640,151],{},". Agrega el endpoint en el ",[15,2643,868],{}," de Prometheus. En horas tienes dashboards por endpoint, alertas por tasa de error, y la capacidad de responder \"qué estaba ocurriendo a las 3:14 de ayer\" con un gráfico en lugar de un tiro al aire.",[11,2646,2647,2648,2651,2652,2655],{},"Cuidado con ",[38,2649,2650],{},"cardinalidad",". Cada combinación única de labels se vuelve una serie temporal separada. Si pones ",[15,2653,2654],{},"user_id"," como label, con 100k usuarios creas 100k series — y Prometheus va a consumir 8+ GB de RAM solo para indexar eso. Regla práctica: los labels toman valores de conjuntos pequeños (status code: 5 valores; método: 5 valores; path: decenas). 
Identificadores únicos van en logs, no en métricas.",[30,2657,2659],{"id":2658},"como-correr-esto-dentro-de-heroctl-en-lugar-de-vps-dedicada","¿Cómo correr esto dentro de HeroCtl en lugar de VPS dedicada?",[11,2661,2662],{},"Para clusters que ya corren el orquestador, tiene sentido considerar la stack como un job más. Trade-off: ahorras una VPS, pero pierdes aislamiento (si el cluster muere, el monitoring muere con él).",[11,2664,2665],{},"La topología queda así:",[67,2667,2668,2674,2680,2686],{},[70,2669,2670,2673],{},[38,2671,2672],{},"1 job spec único"," con 4 tasks: prometheus, grafana, loki, alertmanager.",[70,2675,2676,2679],{},[38,2677,2678],{},"Volúmenes replicados"," en el cluster — los datos sobreviven a la falla de un nodo.",[70,2681,2682,2685],{},[38,2683,2684],{},"Router integrado"," hace el TLS automático vía subdominio. No necesita Caddy adicional.",[70,2687,2688,2691],{},[38,2689,2690],{},"Métricas del propio cluster"," ya se exponen en formato Prometheus en la API administrativa, así que el scrape es directo.",[11,2693,2694],{},"Para producción crítica, recomendamos la separación física (VPS dedicada fuera del cluster). Para proyecto personal, MVP o equipo pequeño donde \"todo cae junto\" es aceptable, correr dentro es más barato y operacionalmente más simple. 
El job spec entero queda en torno de 80 líneas de manifiesto.",[30,2696,2698],{"id":2697},"cuanto-cuesta-esa-stack-al-mes-en-brasil","¿Cuánto cuesta esa stack al mes en Brasil?",[2700,2701,2702,2715],"table",{},[2703,2704,2705],"thead",{},[2706,2707,2708,2712],"tr",{},[2709,2710,2711],"th",{},"Ítem",[2709,2713,2714],{},"Costo mensual (BRL)",[2716,2717,2718,2727,2735,2743],"tbody",{},[2706,2719,2720,2724],{},[2721,2722,2723],"td",{},"VPS observability dedicada (4 GB RAM)",[2721,2725,2726],{},"R$40 a R$80",[2706,2728,2729,2732],{},[2721,2730,2731],{},"Object storage para retención larga de logs (opcional)",[2721,2733,2734],{},"R$30",[2706,2736,2737,2740],{},[2721,2738,2739],{},"Tiempo de mantenimiento (2 a 4h × valor de la hora)",[2721,2741,2742],{},"R$200 a R$400",[2706,2744,2745,2750],{},[2721,2746,2747],{},[38,2748,2749],{},"Total operacional",[2721,2751,2752],{},[38,2753,2754],{},"R$300 a R$500",[11,2756,2757],{},"Para comparación, una suscripción de Datadog o New Relic con cobertura equivalente (5 hosts, retención de logs de 30 días, alertas, dashboards) sale en torno de R$1.500 a R$2.000 al mes — sin contar el overage automático que aparece al final del mes cuando alguien olvida un log verboso encendido.",[11,2759,2760],{},"La diferencia no es pequeña: en un año, la stack open-source self-hosted ahorra entre R$12.000 y R$18.000. 
Para startup en etapa inicial, eso es medio ingeniero júnior.",[30,2762,2764],{"id":2763},"tabla-de-puertos-recursos-y-caracteristicas-por-componente","Tabla de puertos, recursos y características por componente",[2700,2766,2767,2789],{},[2703,2768,2769],{},[2706,2770,2771,2774,2777,2780,2783,2786],{},[2709,2772,2773],{},"Componente",[2709,2775,2776],{},"Puerto",[2709,2778,2779],{},"RAM mínima",[2709,2781,2782],{},"Disco",[2709,2784,2785],{},"Retención default",[2709,2787,2788],{},"Formato de los datos",[2716,2790,2791,2810,2829,2847,2866,2882],{},[2706,2792,2793,2795,2798,2801,2804,2807],{},[2721,2794,74],{},[2721,2796,2797],{},"9090",[2721,2799,2800],{},"512 MB",[2721,2802,2803],{},"10 GB",[2721,2805,2806],{},"15 días",[2721,2808,2809],{},"TSDB binario",[2706,2811,2812,2814,2817,2820,2823,2826],{},[2721,2813,80],{},[2721,2815,2816],{},"3000",[2721,2818,2819],{},"256 MB",[2721,2821,2822],{},"1 GB",[2721,2824,2825],{},"N\u002FA",[2721,2827,2828],{},"SQLite o Postgres",[2706,2830,2831,2833,2836,2838,2841,2844],{},[2721,2832,86],{},[2721,2834,2835],{},"3100",[2721,2837,2800],{},[2721,2839,2840],{},"30 GB",[2721,2842,2843],{},"30 días (configurable)",[2721,2845,2846],{},"chunks comprimidos",[2706,2848,2849,2852,2855,2858,2861,2863],{},[2721,2850,2851],{},"Promtail \u002F Agent",[2721,2853,2854],{},"9080",[2721,2856,2857],{},"128 MB",[2721,2859,2860],{},"mínimo",[2721,2862,2825],{},[2721,2864,2865],{},"pasa por valor",[2706,2867,2868,2870,2873,2875,2877,2879],{},[2721,2869,104],{},[2721,2871,2872],{},"9093",[2721,2874,2857],{},[2721,2876,2822],{},[2721,2878,2825],{},[2721,2880,2881],{},"log de notificaciones",[2706,2883,2884,2886,2889,2892,2894,2896],{},[2721,2885,98],{},[2721,2887,2888],{},"9100",[2721,2890,2891],{},"64 MB",[2721,2893,2860],{},[2721,2895,2825],{},[2721,2897,2898],{},"endpoint de scrape",[11,2900,2901],{},"Esas son las mínimas viables para cluster pequeño. 
En producción con 30 servidores y tráfico real, multiplica RAM por 3 y disco por 5.",[30,2903,2905],{"id":2904},"los-cuatro-errores-que-matan-stack-de-monitoring-nueva","Los cuatro errores que matan stack de monitoring nueva",[11,2907,2908],{},"Equipos montando observabilidad por primera vez tropiezan casi siempre en los mismos cuatro errores. Saber sobre ellos antes ahorra meses.",[11,2910,2911,2914,2915,2918],{},[38,2912,2913],{},"No monitorear el monitoring."," Prometheus paró de scrape el jueves; nadie vio. El miércoles de la semana siguiente un servidor cayó de verdad y descubrieron que no había alerta porque Prometheus estaba muerto hace 6 días. Solución: configura un cron externo simple (hasta un Pingdom gratuito sirve) que pegue en ",[15,2916,2917],{},"https:\u002F\u002Fmonitor.tudominio.com\u002Fapi\u002Fhealth"," cada 5 minutos y te avise cuando el propio Grafana caiga.",[11,2920,2921,2924,2925,2928],{},[38,2922,2923],{},"Sin estrategia de retención."," Disco se llena en tres meses, Prometheus para de grabar, alguien borra todo en la desesperación, pierde 90 días de historial. Configura ",[15,2926,2927],{},"--storage.tsdb.retention.time=30d"," desde el día uno y establece un job de housekeeping.",[11,2930,2931,2934,2935,18,2937,2940],{},[38,2932,2933],{},"Cardinalidad alta en labels."," Ya cubrimos en el paso 9, pero vale la pena repetir: cada ",[15,2936,2654],{},[15,2938,2939],{},"request_id"," o UUID que se vuelve label es un número que multiplica explosivamente el consumo de RAM de Prometheus. Identificadores únicos van a Loki, no a Prometheus.",[11,2942,2943,2946],{},[38,2944,2945],{},"Alertas ruidosas."," El equipo recibe 200 alertas por día. En dos semanas, nadie mira más. Cuando el sitio caiga de verdad, la alerta va a estar en medio de otras 199. Solución: empieza con seis alertas (las del paso 7), audita cada dos semanas, y excluye todo lo que disparó pero no exigió acción humana. 
Alerta sin acción es ruido.",[30,2948,2950],{"id":2949},"faq","FAQ",[11,2952,2953,2956],{},[38,2954,2955],{},"¿Puedo correr todo en una VPS de 2 GB?","\nTécnicamente sí, para cluster de hasta 3 servidores y pocas aplicaciones. En la práctica vas a chocar contra el techo de RAM en 2 a 3 meses, especialmente si importas dashboards densos en Grafana. Paga los 50 reales más y ve directo a la VPS de 4 GB — el tiempo que ahorras no peleando con OOM kills se paga solo.",[11,2958,2959,2962],{},[38,2960,2961],{},"¿Cuánto disco para 30 días de logs?","\nDepende totalmente del volumen de logs de tu aplicación. Regla gruesa para startup pequeña: cluster de 4 servidores con aplicaciones web normales genera 1 a 5 GB de logs por día después de compresión de Loki. Treinta días da entre 30 y 150 GB. Empieza con 50 GB de SSD, monitorea el crecimiento por dos semanas, expande si es necesario. Si vas mucho más allá de eso, es hora de ir a object storage.",[11,2964,2965,2968],{},[38,2966,2967],{},"Grafana Cloud vs self-hosted, ¿cuál elegir?","\nGrafana Cloud free tier es generoso (10k series, 50 GB de logs, retención de 14 días) y elimina el trabajo de mantener el servidor. Para proyecto solo o equipo muy pequeño, tiene sentido. A partir del momento en que pasas del free tier, los precios escalan rápido — a partir de US$50\u002Fmes — y pierdes el control sobre los datos. Self-hosted cuesta hardware + tiempo, Cloud cuesta dinero + lock-in. Para empresa que pretende crecer y tiene un dev DevOps en el equipo, self-hosted gana.",[11,2970,2971,2974],{},[38,2972,2973],{},"¿Promtail o Grafana Agent?","\nEn 2026, Grafana Agent (rebautizado a Grafana Alloy) está sustituyendo al Promtail oficialmente. Para setup nuevo, ve directo a Alloy. 
Para setup que ya corre Promtail hace tiempo, no hay urgencia en migrar — Promtail va a seguir funcionando por años.",[11,2976,2977,2980,2981,2983],{},[38,2978,2979],{},"¿OpenTelemetry encaja dónde en esa stack?","\nOTel es el estándar de instrumentación de aplicación que está consolidando. En lugar de usar ",[15,2982,2576],{}," directo, usas el SDK de OTel y él exporta a Prometheus, Loki y Tempo simultáneamente. La ventaja grande es portabilidad — si quieres cambiar Prometheus por otra cosa de aquí a 3 años, tu aplicación no cambia una línea. Para startup empezando hoy, recomendamos OTel desde el día uno.",[11,2985,2986,2989,2990,2993,2994,2997],{},[38,2987,2988],{},"¿Cómo hago backup de Prometheus?","\nPrometheus tiene snapshot vía API: ",[15,2991,2992],{},"curl -X POST localhost:9090\u002Fapi\u002Fv1\u002Fadmin\u002Ftsdb\u002Fsnapshot"," crea un snapshot en el directorio de datos. Hazlo una vez al día vía cron, haz ",[15,2995,2996],{},"tar.gz"," y envíalo a object storage. En caso de desastre, lo que pierdes son métricas — y métricas, diferente de logs, son típicamente recuperables en horas (vuelves a recolectar y los dashboards vuelven). Logs perdidos están perdidos para siempre, entonces invierte más en backup de Loki.",[11,2999,3000,3003],{},[38,3001,3002],{},"¿Tempo (traces distribuidos) vale la pena instalar ahora?","\nNo. Traces se vuelven útiles a partir del momento en que tienes 5+ servicios conversando entre sí y depurar latencia involucra seguir una request por varios hops. Para arquitectura monolítica o pocos servicios, traces dan trabajo desproporcional al valor. Agrégalos cuando la complejidad lo pida.",[11,3005,3006,3009],{},[38,3007,3008],{},"¿Loki indexa full-text como ELK?","\nNo, y esa es la feature, no bug. Loki indexa solo labels (job, host, container, severity) y el contenido del log queda comprimido sin índice. Para buscar texto, filtras por labels primero y después haces grep en los chunks resultantes. 
Eso es lo que vuelve a Loki diez veces más barato que ELK en RAM y CPU. A cambio, queries de texto libre en todo el historial son más lentas. Para 90% de los casos de debugging, filtrar por job + host + ventana de tiempo ya reduce a decenas de MB donde el grep vuela.",[30,3011,3013],{"id":3012},"proximos-pasos","Próximos pasos",[11,3015,3016],{},"¿Subiste la stack, tienes dashboard, tienes alerta, tienes log buscable? Bien. Las próximas tres cosas que valen la inversión son, en orden:",[182,3018,3019,3025,3039],{},[70,3020,3021,3024],{},[38,3022,3023],{},"Custom dashboards por aplicación"," — métricas de negocio (suscripciones creadas\u002Fhora, jobs procesados, cola de e-mails) en lugar de solo infraestructura.",[70,3026,3027,3030,3031,3034,3035,3038],{},[38,3028,3029],{},"Runbooks linkados en las alertas"," — toda regla en ",[15,3032,3033],{},"alerts.yml"," debe tener ",[15,3036,3037],{},"annotations.runbook_url"," apuntando a una página explicando qué hacer. Cuando la alerta dispare a las 3 de la mañana, el sueño no piensa.",[70,3040,3041,3044],{},[38,3042,3043],{},"Revisión mensual de alertas"," — 30 minutos una vez al mes auditando lo que disparó en el mes anterior, eliminando lo que se volvió ruido, ajustando thresholds.",[11,3046,3047,3048,3053,3054,121],{},"Para quien quiere ir más allá y entender por qué elegimos esta stack en lugar de SaaS gestionado, lee ",[3049,3050,3052],"a",{"href":3051},"\u002Fes\u002Fblog\u002Fobservabilidad-sin-datadog-stack-startup","Observabilidad sin Datadog: la stack de la startup",". 
Y para cerrar el ciclo de operación — porque no sirve saber que la base cayó si no logras restaurar — vale la pena leer ",[3049,3055,3057],{"href":3056},"\u002Fes\u002Fblog\u002Fbackup-de-base-de-datos-en-cluster","Backup de base en cluster: estrategias para las 3 de la mañana",[11,3059,3060],{},"Si quieres saltarte ese montaje todo y correr la stack como job dentro de un orquestador que ya cuida de TLS, rolling update y replicación de volumen:",[241,3062,3064],{"className":685,"code":3063,"language":687,"meta":249,"style":249},"curl -sSL get.heroctl.com\u002Finstall.sh | sh\n",[15,3065,3066],{"__ignoreMap":249},[265,3067,3068,3071,3074,3077,3080],{"class":267,"line":268},[265,3069,3070],{"class":703},"curl",[265,3072,3073],{"class":694}," -sSL",[265,3075,3076],{"class":296}," get.heroctl.com\u002Finstall.sh",[265,3078,3079],{"class":1104}," |",[265,3081,3082],{"class":703}," sh\n",[11,3084,3085],{},"Cuatro horas se vuelven cuarenta minutos. El resto es el mismo trabajo de pensar qué alertas importan — y en esa parte nadie te libra.",[3087,3088,3089],"style",{},"html pre.shiki code .sPWt5, html code.shiki .sPWt5{--shiki-default:#7EE787}html pre.shiki code .sZEs4, html code.shiki .sZEs4{--shiki-default:#E6EDF3}html pre.shiki code .s9uIt, html code.shiki .s9uIt{--shiki-default:#A5D6FF}html pre.shiki code .sH3jZ, html code.shiki .sH3jZ{--shiki-default:#8B949E}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html pre.shiki code .sFSAA, html code.shiki .sFSAA{--shiki-default:#79C0FF}html pre.shiki code .sQhOw, html code.shiki .sQhOw{--shiki-default:#FFA657}html 
pre.shiki code .suJrU, html code.shiki .suJrU{--shiki-default:#FF7B72}",{"title":249,"searchDepth":279,"depth":279,"links":3091},[3092,3093,3094,3095,3096,3097,3098,3099,3100,3101,3102,3103,3104,3105,3106,3107,3108,3109],{"id":32,"depth":279,"text":33},{"id":61,"depth":279,"text":62},{"id":124,"depth":279,"text":125},{"id":167,"depth":279,"text":168},{"id":226,"depth":279,"text":227},{"id":752,"depth":279,"text":753},{"id":1076,"depth":279,"text":1077},{"id":1246,"depth":279,"text":1247},{"id":1726,"depth":279,"text":1727},{"id":1900,"depth":279,"text":1901},{"id":2509,"depth":279,"text":2510},{"id":2554,"depth":279,"text":2555},{"id":2658,"depth":279,"text":2659},{"id":2697,"depth":279,"text":2698},{"id":2763,"depth":279,"text":2764},{"id":2904,"depth":279,"text":2905},{"id":2949,"depth":279,"text":2950},{"id":3012,"depth":279,"text":3013},"engineering",null,"2026-05-12","Tutorial honesto para subir métricas, logs y dashboards de tu cluster — en 4 horas, sin Datadog. Stack open-source que cabe en 1 VPS de R$80\u002Fmes.",false,"md",{},"\u002Fes\u002Fblog\u002Fstack-monitoring-prometheus-grafana-loki","16 min",{"title":5,"description":3113},{"loc":3117},"es\u002Fblog\u002Fstack-monitoring-prometheus-grafana-loki",[3123,3124,3125,3126,3127,3110],"prometheus","grafana","loki","monitoring","tutorial","324n4tX2U_bO1GLljCpReEqIs6qn0hm9GJVO3FvHiGM",[3130,3136],{"title":3131,"path":3132,"stem":3133,"description":3134,"date":3135,"category":3110,"children":-1},"¿Service mesh es overkill en SaaS pequeño? Cuándo vale instalar Istio\u002FLinkerd","\u002Fes\u002Fblog\u002Fservice-mesh-cuando-vale-la-pena-en-saas-pequeno","es\u002Fblog\u002Fservice-mesh-cuando-vale-la-pena-en-saas-pequeno","Service mesh resuelve problemas reales (mTLS, observabilidad entre servicios, traffic shaping). Pero añade 30-50% overhead de RAM\u002FCPU y complejidad. 
Cuándo vale la pena y cuándo es overkill.","2026-05-29",{"title":3137,"path":3138,"stem":3139,"description":3140,"date":3141,"category":3142,"children":-1},"Strapi, Directus y Ghost auto-hospedados: guía honesta para agencias e indie hackers","\u002Fes\u002Fblog\u002Fstrapi-directus-ghost-auto-hospedado-guia","es\u002Fblog\u002Fstrapi-directus-ghost-auto-hospedado-guia","Los tres CMS modernos open-source que más se auto-hospedan. Cada uno para un caso. Tabla comparativa, requisitos reales y cuándo vale la pena pagar la versión cloud.","2026-03-25","case-study",1777362217187]