[{"data":1,"prerenderedAt":3145},["ShallowReactive",2],{"blog-\u002Fblog\u002Fmonitoring-stack-completa-prometheus-grafana-loki-passo-a-passo":3,"blog-surround-\u002Fblog\u002Fmonitoring-stack-completa-prometheus-grafana-loki-passo-a-passo":3129,"blog-en-alt-\u002Fblog\u002Fmonitoring-stack-completa-prometheus-grafana-loki-passo-a-passo":3143},{"id":4,"title":5,"author":6,"body":7,"category":3110,"cover":3111,"date":3112,"description":3113,"draft":3114,"extension":3115,"lastReviewed":3111,"meta":3116,"navigation":391,"path":3117,"readingTime":3118,"seo":3119,"sitemap":3120,"stem":3121,"tags":3122,"__hash__":3128},"blog_pt\u002Fblog\u002Fmonitoring-stack-completa-prometheus-grafana-loki-passo-a-passo.md","Monitoring stack completa em 2026: Prometheus + Grafana + Loki passo a passo","Equipe HeroCtl",{"type":8,"value":9,"toc":3090},"minimark",[10,26,29,34,42,53,56,59,63,66,106,122,126,129,162,165,169,175,178,181,209,224,228,233,240,250,257,661,680,683,726,750,754,759,765,1045,1055,1074,1078,1084,1087,1167,1188,1195,1234,1244,1248,1252,1258,1513,1516,1519,1707,1717,1724,1728,1733,1742,1748,1857,1864,1871,1891,1898,1902,1906,1909,1919,2257,2264,2482,2493,2507,2511,2515,2521,2545,2552,2556,2561,2564,2567,2600,2603,2636,2645,2656,2660,2663,2666,2692,2695,2699,2755,2758,2761,2765,2899,2902,2906,2909,2919,2929,2941,2947,2951,2957,2963,2969,2975,2984,2998,3004,3010,3014,3017,3045,3058,3061,3083,3086],[11,12,13,14,18,19,18,22,25],"p",{},"A primeira vez que o seu site cair às três da manhã, você vai descobrir uma coisa desconfortável: não tem como saber o que aconteceu. Não tem gráfico de CPU, não tem log do contêiner que morreu, não tem alerta que avisou antes. Você vai abrir um terminal, conectar nos servidores um por um, rodar ",[15,16,17],"code",{},"top",", ",[15,20,21],{},"df",[15,23,24],{},"journalctl",", e tentar reconstituir uma cena de crime que já esfriou.",[11,27,28],{},"Esse post é o atalho pra você não passar por isso. Em quatro horas, com R$80 a R$120 por mês de hardware, dá pra montar a stack de observabilidade open-source que substitui Datadog, New Relic e CloudWatch em 95% dos casos pra startup. As ferramentas são as mesmas que rodam dentro de empresas com dezenas de milhares de servidores — e cabem confortavelmente num VPS pequeno pro time que está começando.",[30,31,33],"h2",{"id":32},"tldr","TL;DR",[11,35,36,37,41],{},"A stack de monitoring open-source padrão em 2026 — ",[38,39,40],"strong",{},"Prometheus + Grafana + Loki + Alertmanager"," — cabe em um único VPS de 4 GB de RAM e cobre métricas, logs centralizados, dashboards e alertas. Esse tutorial mostra setup passo a passo pra um cluster de 4 a 5 servidores em aproximadamente quatro horas, usando docker-compose ou job specs do orquestrador.",[11,43,44,45,48,49,52],{},"Pra startup brasileira, isso significa ",[38,46,47],{},"R$80 a R$120 por mês de hardware"," contra ",[38,50,51],{},"R$1.000 a R$2.000 por mês"," de SaaS de observabilidade equivalente. O custo de tempo é honesto: quatro horas de setup inicial mais duas a quatro horas por mês de manutenção contínua.",[11,54,55],{},"Resultado entregável ao final do tutorial: dashboards de CPU, RAM, disco, rede e métricas HTTP; logs pesquisáveis com retenção de 30 dias; alertas roteados pra Slack, Discord ou e-mail. Pré-requisitos: 1 VPS Linux com 4 GB de RAM e 50 GB de SSD, Docker instalado, e um domínio com DNS controlado por você.",[11,57,58],{},"A escolha entre rodar essa stack num VPS dedicado fora do cluster de produção ou como job dentro do próprio orquestrador é uma decisão arquitetural — cobrimos as duas opções no passo 8 e em \"Como rodar isso dentro do HeroCtl\".",[30,60,62],{"id":61},"o-que-cada-componente-faz-em-uma-frase","O que cada componente faz, em uma frase",[11,64,65],{},"Antes de instalar qualquer coisa, vale entender o papel de cada peça. A stack tem seis componentes; a confusão geralmente vem de pensar que algum deles é \"o sistema de monitoring\". Não é. Cada um faz uma coisa.",[67,68,69,76,82,88,94,100],"ul",{},[70,71,72,75],"li",{},[38,73,74],{},"Prometheus"," é um banco de dados de séries temporais (TSDB) que coleta métricas via HTTP scrape — ele puxa os números, ninguém empurra. Retém 15 dias por padrão.",[70,77,78,81],{},[38,79,80],{},"Grafana"," é a camada de visualização. Conecta em Prometheus, em Loki, em Postgres, em quase qualquer fonte estruturada, e desenha gráficos.",[70,83,84,87],{},[38,85,86],{},"Loki"," é a peça de logs. Sintaxe similar ao Prometheus, indexa apenas labels (não o conteúdo dos logs), e por causa disso fica cerca de dez vezes mais barato que ELK pra rodar.",[70,89,90,93],{},[38,91,92],{},"Promtail"," (ou o Grafana Agent, que está substituindo o Promtail em 2026) é o coletor que lê os arquivos de log de cada servidor e envia pro Loki.",[70,95,96,99],{},[38,97,98],{},"node_exporter"," roda em cada nó monitorado e expõe um endpoint HTTP com CPU, RAM, disco e rede em formato Prometheus.",[70,101,102,105],{},[38,103,104],{},"Alertmanager"," recebe regras de alerta do Prometheus e cuida do roteamento — Slack, e-mail, PagerDuty, webhook arbitrário.",[11,107,108,109,18,112,18,115,18,118,121],{},"Quem desenha a primeira stack costuma confundir Prometheus com \"monitoring\" e Grafana com \"dashboards bonitos\". A separação real é: ",[38,110,111],{},"Prometheus guarda números",[38,113,114],{},"Loki guarda texto",[38,116,117],{},"Grafana mostra ambos",[38,119,120],{},"Alertmanager grita quando algum número fica errado",".",[30,123,125],{"id":124},"qual-e-a-arquitetura-recomendada","Qual é a arquitetura recomendada?",[11,127,128],{},"Pra um cluster de 3 a 5 servidores rodando aplicações de produção, a topologia que tem dado certo na prática é separar o servidor de observabilidade do resto. Um nó dedicado, fora do cluster que ele monitora, com dois objetivos: não morrer junto quando o cluster morrer, e não competir por CPU\u002FRAM com a aplicação real.",[67,130,131,137,143,153],{},[70,132,133,136],{},[38,134,135],{},"1 servidor \"observability\" dedicado",", 4 GB de RAM, 50 GB de SSD. Roda Prometheus, Grafana, Loki, Alertmanager.",[70,138,139,142],{},[38,140,141],{},"Cada servidor monitorado"," roda apenas dois processos leves: node_exporter (métricas de sistema) e Promtail (envio de logs).",[70,144,145,148,149,152],{},[38,146,147],{},"As suas aplicações"," expõem um endpoint ",[15,150,151],{},"\u002Fmetrics"," em formato Prometheus. Se você usa um framework popular, existe um cliente pronto. Se não, é uma biblioteca de poucas dezenas de linhas.",[70,154,155,157,158,161],{},[38,156,80],{}," fica acessível via subdomínio (",[15,159,160],{},"monitor.seudominio.com",") com TLS automático e autenticação básica em frente.",[11,163,164],{},"Essa separação tem um custo: você paga por mais um VPS. Em troca, quando o cluster principal cair, você ainda consegue olhar os gráficos pra entender o que aconteceu. Pra startup, esse trade-off compensa quase sempre — o pior cenário em monitoring é descobrir que a única coisa que parou junto com o site foi o sistema que ia te avisar que o site parou.",[30,166,168],{"id":167},"passo-1-como-provisionar-o-vps-de-observabilidade","Passo 1 — Como provisionar o VPS de observabilidade?",[11,170,171,172,121],{},"Tempo estimado: ",[38,173,174],{},"10 minutos",[11,176,177],{},"Qualquer provedor barato serve. Os dois com melhor custo-benefício pro caso brasileiro hoje são a Hetzner (CPX21 a 7,99 EUR por mês com 3 vCPUs e 4 GB de RAM, datacenter na Alemanha) e a DigitalOcean (Basic Droplet de US$24 por mês com a mesma configuração, datacenters mais próximos do Brasil). Pra workload de monitoring, latência de scrape em datacenter europeu não causa problema — Prometheus puxa a cada 15 segundos por padrão, então 200ms de RTT entre Hetzner e seus servidores não atrapalha.",[11,179,180],{},"Provisionando:",[182,183,184,187,190,196,203],"ol",{},[70,185,186],{},"Crie o VPS com Ubuntu 24.04 LTS ou Debian 12.",[70,188,189],{},"Adicione a sua chave SSH pública na criação. Desabilite login por senha.",[70,191,192,193,121],{},"Instale Docker e o plugin de compose: ",[15,194,195],{},"curl -fsSL https:\u002F\u002Fget.docker.com | sh && apt install docker-compose-plugin",[70,197,198,199,202],{},"Configure o firewall: porta 22 (SSH) aberta, porta 443 (HTTPS) aberta, todas as outras fechadas. As portas internas (3000, 9090, 3100, 9093) só ficam acessíveis pelo ",[15,200,201],{},"localhost"," do próprio VPS — o reverse proxy expõe Grafana via 443.",[70,204,205,206,208],{},"Aponte o DNS: crie um registro A ",[15,207,160],{}," pra o IP do VPS.",[11,210,211,212,215,216,219,220,223],{},"Validação: ",[15,213,214],{},"docker --version"," retorna 26.x ou superior; ",[15,217,218],{},"dig monitor.seudominio.com"," retorna o IP correto; ",[15,221,222],{},"ssh root@monitor.seudominio.com"," conecta sem pedir senha.",[30,225,227],{"id":226},"passo-2-como-subir-a-stack-via-docker-compose","Passo 2 — Como subir a stack via docker-compose?",[11,229,171,230,121],{},[38,231,232],{},"45 minutos",[11,234,235,236,239],{},"Crie o diretório de trabalho em ",[15,237,238],{},"\u002Fopt\u002Fobservability\u002F"," com a seguinte estrutura:",[241,242,247],"pre",{"className":243,"code":245,"language":246},[244],"language-text","\u002Fopt\u002Fobservability\u002F\n├── docker-compose.yml\n├── prometheus\u002F\n│   ├── prometheus.yml\n│   └── alerts.yml\n├── alertmanager\u002F\n│   └── alertmanager.yml\n├── loki\u002F\n│   └── loki-config.yml\n└── grafana\u002F\n    └── provisioning\u002F\n        └── datasources\u002F\n            └── datasources.yml\n","text",[15,248,245],{"__ignoreMap":249},"",[11,251,252,253,256],{},"O ",[15,254,255],{},"docker-compose.yml"," abreviado mas funcional:",[241,258,262],{"className":259,"code":260,"language":261,"meta":249,"style":249},"language-yaml shiki shiki-themes github-dark-default","services:\n  prometheus:\n    image: prom\u002Fprometheus:v2.55.0\n    volumes:\n      - .\u002Fprometheus:\u002Fetc\u002Fprometheus\n      - prometheus-data:\u002Fprometheus\n    command:\n      - '--config.file=\u002Fetc\u002Fprometheus\u002Fprometheus.yml'\n      - '--storage.tsdb.retention.time=30d'\n      - '--web.enable-lifecycle'  # permite reload via HTTP POST\n    ports:\n      - '127.0.0.1:9090:9090'\n    restart: unless-stopped\n\n  grafana:\n    image: grafana\u002Fgrafana:11.3.0\n    volumes:\n      - grafana-data:\u002Fvar\u002Flib\u002Fgrafana\n      - .\u002Fgrafana\u002Fprovisioning:\u002Fetc\u002Fgrafana\u002Fprovisioning\n    environment:\n      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}\n      - GF_USERS_ALLOW_SIGN_UP=false\n    ports:\n      - '127.0.0.1:3000:3000'\n    restart: unless-stopped\n\n  loki:\n    image: grafana\u002Floki:3.2.0\n    volumes:\n      - .\u002Floki\u002Floki-config.yml:\u002Fetc\u002Floki\u002Fconfig.yml\n      - loki-data:\u002Floki\n    command: -config.file=\u002Fetc\u002Floki\u002Fconfig.yml\n    ports:\n      - '127.0.0.1:3100:3100'\n    restart: unless-stopped\n\n  alertmanager:\n    image: prom\u002Falertmanager:v0.27.0\n    volumes:\n      - .\u002Falertmanager:\u002Fetc\u002Falertmanager\n    ports:\n      - '127.0.0.1:9093:9093'\n    restart: unless-stopped\n\nvolumes:\n  prometheus-data:\n  grafana-data:\n  loki-data:\n","yaml",[15,263,264,277,285,298,306,315,323,331,339,347,359,367,375,386,393,401,411,418,426,434,442,450,458,465,473,482,487,495,505,512,520,528,538,545,553,562,567,575,585,592,600,607,615,624,629,637,645,653],{"__ignoreMap":249},[265,266,269,273],"span",{"class":267,"line":268},"line",1,[265,270,272],{"class":271},"sPWt5","services",[265,274,276],{"class":275},"sZEs4",":\n",[265,278,280,283],{"class":267,"line":279},2,[265,281,282],{"class":271},"  prometheus",[265,284,276],{"class":275},[265,286,288,291,294],{"class":267,"line":287},3,[265,289,290],{"class":271},"    image",[265,292,293],{"class":275},": ",[265,295,297],{"class":296},"s9uIt","prom\u002Fprometheus:v2.55.0\n",[265,299,301,304],{"class":267,"line":300},4,[265,302,303],{"class":271},"    volumes",[265,305,276],{"class":275},[265,307,309,312],{"class":267,"line":308},5,[265,310,311],{"class":275},"      - ",[265,313,314],{"class":296},".\u002Fprometheus:\u002Fetc\u002Fprometheus\n",[265,316,318,320],{"class":267,"line":317},6,[265,319,311],{"class":275},[265,321,322],{"class":296},"prometheus-data:\u002Fprometheus\n",[265,324,326,329],{"class":267,"line":325},7,[265,327,328],{"class":271},"    command",[265,330,276],{"class":275},[265,332,334,336],{"class":267,"line":333},8,[265,335,311],{"class":275},[265,337,338],{"class":296},"'--config.file=\u002Fetc\u002Fprometheus\u002Fprometheus.yml'\n",[265,340,342,344],{"class":267,"line":341},9,[265,343,311],{"class":275},[265,345,346],{"class":296},"'--storage.tsdb.retention.time=30d'\n",[265,348,350,352,355],{"class":267,"line":349},10,[265,351,311],{"class":275},[265,353,354],{"class":296},"'--web.enable-lifecycle'",[265,356,358],{"class":357},"sH3jZ","  # permite reload via HTTP POST\n",[265,360,362,365],{"class":267,"line":361},11,[265,363,364],{"class":271},"    ports",[265,366,276],{"class":275},[265,368,370,372],{"class":267,"line":369},12,[265,371,311],{"class":275},[265,373,374],{"class":296},"'127.0.0.1:9090:9090'\n",[265,376,378,381,383],{"class":267,"line":377},13,[265,379,380],{"class":271},"    restart",[265,382,293],{"class":275},[265,384,385],{"class":296},"unless-stopped\n",[265,387,389],{"class":267,"line":388},14,[265,390,392],{"emptyLinePlaceholder":391},true,"\n",[265,394,396,399],{"class":267,"line":395},15,[265,397,398],{"class":271},"  grafana",[265,400,276],{"class":275},[265,402,404,406,408],{"class":267,"line":403},16,[265,405,290],{"class":271},[265,407,293],{"class":275},[265,409,410],{"class":296},"grafana\u002Fgrafana:11.3.0\n",[265,412,414,416],{"class":267,"line":413},17,[265,415,303],{"class":271},[265,417,276],{"class":275},[265,419,421,423],{"class":267,"line":420},18,[265,422,311],{"class":275},[265,424,425],{"class":296},"grafana-data:\u002Fvar\u002Flib\u002Fgrafana\n",[265,427,429,431],{"class":267,"line":428},19,[265,430,311],{"class":275},[265,432,433],{"class":296},".\u002Fgrafana\u002Fprovisioning:\u002Fetc\u002Fgrafana\u002Fprovisioning\n",[265,435,437,440],{"class":267,"line":436},20,[265,438,439],{"class":271},"    environment",[265,441,276],{"class":275},[265,443,445,447],{"class":267,"line":444},21,[265,446,311],{"class":275},[265,448,449],{"class":296},"GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}\n",[265,451,453,455],{"class":267,"line":452},22,[265,454,311],{"class":275},[265,456,457],{"class":296},"GF_USERS_ALLOW_SIGN_UP=false\n",[265,459,461,463],{"class":267,"line":460},23,[265,462,364],{"class":271},[265,464,276],{"class":275},[265,466,468,470],{"class":267,"line":467},24,[265,469,311],{"class":275},[265,471,472],{"class":296},"'127.0.0.1:3000:3000'\n",[265,474,476,478,480],{"class":267,"line":475},25,[265,477,380],{"class":271},[265,479,293],{"class":275},[265,481,385],{"class":296},[265,483,485],{"class":267,"line":484},26,[265,486,392],{"emptyLinePlaceholder":391},[265,488,490,493],{"class":267,"line":489},27,[265,491,492],{"class":271},"  loki",[265,494,276],{"class":275},[265,496,498,500,502],{"class":267,"line":497},28,[265,499,290],{"class":271},[265,501,293],{"class":275},[265,503,504],{"class":296},"grafana\u002Floki:3.2.0\n",[265,506,508,510],{"class":267,"line":507},29,[265,509,303],{"class":271},[265,511,276],{"class":275},[265,513,515,517],{"class":267,"line":514},30,[265,516,311],{"class":275},[265,518,519],{"class":296},".\u002Floki\u002Floki-config.yml:\u002Fetc\u002Floki\u002Fconfig.yml\n",[265,521,523,525],{"class":267,"line":522},31,[265,524,311],{"class":275},[265,526,527],{"class":296},"loki-data:\u002Floki\n",[265,529,531,533,535],{"class":267,"line":530},32,[265,532,328],{"class":271},[265,534,293],{"class":275},[265,536,537],{"class":296},"-config.file=\u002Fetc\u002Floki\u002Fconfig.yml\n",[265,539,541,543],{"class":267,"line":540},33,[265,542,364],{"class":271},[265,544,276],{"class":275},[265,546,548,550],{"class":267,"line":547},34,[265,549,311],{"class":275},[265,551,552],{"class":296},"'127.0.0.1:3100:3100'\n",[265,554,556,558,560],{"class":267,"line":555},35,[265,557,380],{"class":271},[265,559,293],{"class":275},[265,561,385],{"class":296},[265,563,565],{"class":267,"line":564},36,[265,566,392],{"emptyLinePlaceholder":391},[265,568,570,573],{"class":267,"line":569},37,[265,571,572],{"class":271},"  alertmanager",[265,574,276],{"class":275},[265,576,578,580,582],{"class":267,"line":577},38,[265,579,290],{"class":271},[265,581,293],{"class":275},[265,583,584],{"class":296},"prom\u002Falertmanager:v0.27.0\n",[265,586,588,590],{"class":267,"line":587},39,[265,589,303],{"class":271},[265,591,276],{"class":275},[265,593,595,597],{"class":267,"line":594},40,[265,596,311],{"class":275},[265,598,599],{"class":296},".\u002Falertmanager:\u002Fetc\u002Falertmanager\n",[265,601,603,605],{"class":267,"line":602},41,[265,604,364],{"class":271},[265,606,276],{"class":275},[265,608,610,612],{"class":267,"line":609},42,[265,611,311],{"class":275},[265,613,614],{"class":296},"'127.0.0.1:9093:9093'\n",[265,616,618,620,622],{"class":267,"line":617},43,[265,619,380],{"class":271},[265,621,293],{"class":275},[265,623,385],{"class":296},[265,625,627],{"class":267,"line":626},44,[265,628,392],{"emptyLinePlaceholder":391},[265,630,632,635],{"class":267,"line":631},45,[265,633,634],{"class":271},"volumes",[265,636,276],{"class":275},[265,638,640,643],{"class":267,"line":639},46,[265,641,642],{"class":271},"  prometheus-data",[265,644,276],{"class":275},[265,646,648,651],{"class":267,"line":647},47,[265,649,650],{"class":271},"  grafana-data",[265,652,276],{"class":275},[265,654,656,659],{"class":267,"line":655},48,[265,657,658],{"class":271},"  loki-data",[265,660,276],{"class":275},[11,662,663,664,667,668,671,672,675,676,679],{},"Três pontos importantes nesse arquivo. Primeiro, todas as portas estão amarradas a ",[15,665,666],{},"127.0.0.1"," — nenhum dos serviços é acessível diretamente da internet. Segundo, os volumes são nomeados (não bind mounts), então sobrevivem a ",[15,669,670],{},"docker-compose down",". Terceiro, a senha do Grafana vem de variável de ambiente: crie um ",[15,673,674],{},".env"," ao lado do compose com ",[15,677,678],{},"GRAFANA_PASSWORD=algo_longo_aleatorio"," e nunca comite isso.",[11,681,682],{},"Suba a stack:",[241,684,688],{"className":685,"code":686,"language":687,"meta":249,"style":249},"language-bash shiki shiki-themes github-dark-default","cd \u002Fopt\u002Fobservability\ndocker compose up -d\ndocker compose ps  # todos devem estar \"Up\" \u002F healthy\n","bash",[15,689,690,699,714],{"__ignoreMap":249},[265,691,692,696],{"class":267,"line":268},[265,693,695],{"class":694},"sFSAA","cd",[265,697,698],{"class":296}," \u002Fopt\u002Fobservability\n",[265,700,701,705,708,711],{"class":267,"line":279},[265,702,704],{"class":703},"sQhOw","docker",[265,706,707],{"class":296}," compose",[265,709,710],{"class":296}," up",[265,712,713],{"class":694}," -d\n",[265,715,716,718,720,723],{"class":267,"line":287},[265,717,704],{"class":703},[265,719,707],{"class":296},[265,721,722],{"class":296}," ps",[265,724,725],{"class":357},"  # todos devem estar \"Up\" \u002F healthy\n",[11,727,728,729,732,733,736,737,732,740,736,743,746,747,121],{},"Validação rápida: ",[15,730,731],{},"curl localhost:9090\u002F-\u002Fready"," retorna ",[15,734,735],{},"Prometheus Server is Ready","; ",[15,738,739],{},"curl localhost:3100\u002Fready",[15,741,742],{},"ready",[15,744,745],{},"curl localhost:3000\u002Fapi\u002Fhealth"," retorna JSON com ",[15,748,749],{},"\"database\": \"ok\"",[30,751,753],{"id":752},"passo-3-como-configurar-os-scrapes-do-prometheus","Passo 3 — Como configurar os scrapes do Prometheus?",[11,755,171,756,121],{},[38,757,758],{},"30 minutos",[11,760,252,761,764],{},[15,762,763],{},"prometheus\u002Fprometheus.yml"," é onde você diz pro Prometheus quais endpoints raspar. Pra um cluster de 4 servidores, fica assim:",[241,766,768],{"className":259,"code":767,"language":261,"meta":249,"style":249},"global:\n  scrape_interval: 15s\n  evaluation_interval: 15s\n\nalerting:\n  alertmanagers:\n    - static_configs:\n        - targets: ['alertmanager:9093']\n\nrule_files:\n  - 'alerts.yml'\n\nscrape_configs:\n  - job_name: 'prometheus'\n    static_configs:\n      - targets: ['localhost:9090']\n\n  - job_name: 'node'\n    static_configs:\n      - targets:\n          - 'server-1.seudominio.internal:9100'\n          - 'server-2.seudominio.internal:9100'\n          - 'server-3.seudominio.internal:9100'\n          - 'worker-1.seudominio.internal:9100'\n        labels:\n          environment: 'production'\n\n  - job_name: 'apps'\n    static_configs:\n      - targets:\n          - 'api.seudominio.internal:8080'\n          - 'worker.seudominio.internal:8080'\n        labels:\n          environment: 'production'\n    metrics_path: '\u002Fmetrics'\n",[15,769,770,777,787,796,800,807,814,824,841,845,852,860,864,871,883,890,903,907,918,924,932,940,947,954,961,968,978,982,993,999,1007,1014,1021,1027,1035],{"__ignoreMap":249},[265,771,772,775],{"class":267,"line":268},[265,773,774],{"class":271},"global",[265,776,276],{"class":275},[265,778,779,782,784],{"class":267,"line":279},[265,780,781],{"class":271},"  scrape_interval",[265,783,293],{"class":275},[265,785,786],{"class":296},"15s\n",[265,788,789,792,794],{"class":267,"line":287},[265,790,791],{"class":271},"  evaluation_interval",[265,793,293],{"class":275},[265,795,786],{"class":296},[265,797,798],{"class":267,"line":300},[265,799,392],{"emptyLinePlaceholder":391},[265,801,802,805],{"class":267,"line":308},[265,803,804],{"class":271},"alerting",[265,806,276],{"class":275},[265,808,809,812],{"class":267,"line":317},[265,810,811],{"class":271},"  alertmanagers",[265,813,276],{"class":275},[265,815,816,819,822],{"class":267,"line":325},[265,817,818],{"class":275},"    - ",[265,820,821],{"class":271},"static_configs",[265,823,276],{"class":275},[265,825,826,829,832,835,838],{"class":267,"line":333},[265,827,828],{"class":275},"        - ",[265,830,831],{"class":271},"targets",[265,833,834],{"class":275},": [",[265,836,837],{"class":296},"'alertmanager:9093'",[265,839,840],{"class":275},"]\n",[265,842,843],{"class":267,"line":341},[265,844,392],{"emptyLinePlaceholder":391},[265,846,847,850],{"class":267,"line":349},[265,848,849],{"class":271},"rule_files",[265,851,276],{"class":275},[265,853,854,857],{"class":267,"line":361},[265,855,856],{"class":275},"  - ",[265,858,859],{"class":296},"'alerts.yml'\n",[265,861,862],{"class":267,"line":369},[265,863,392],{"emptyLinePlaceholder":391},[265,865,866,869],{"class":267,"line":377},[265,867,868],{"class":271},"scrape_configs",[265,870,276],{"class":275},[265,872,873,875,878,880],{"class":267,"line":388},[265,874,856],{"class":275},[265,876,877],{"class":271},"job_name",[265,879,293],{"class":275},[265,881,882],{"class":296},"'prometheus'\n",[265,884,885,888],{"class":267,"line":395},[265,886,887],{"class":271},"    static_configs",[265,889,276],{"class":275},[265,891,892,894,896,898,901],{"class":267,"line":403},[265,893,311],{"class":275},[265,895,831],{"class":271},[265,897,834],{"class":275},[265,899,900],{"class":296},"'localhost:9090'",[265,902,840],{"class":275},[265,904,905],{"class":267,"line":413},[265,906,392],{"emptyLinePlaceholder":391},[265,908,909,911,913,915],{"class":267,"line":420},[265,910,856],{"class":275},[265,912,877],{"class":271},[265,914,293],{"class":275},[265,916,917],{"class":296},"'node'\n",[265,919,920,922],{"class":267,"line":428},[265,921,887],{"class":271},[265,923,276],{"class":275},[265,925,926,928,930],{"class":267,"line":436},[265,927,311],{"class":275},[265,929,831],{"class":271},[265,931,276],{"class":275},[265,933,934,937],{"class":267,"line":444},[265,935,936],{"class":275},"          - ",[265,938,939],{"class":296},"'server-1.seudominio.internal:9100'\n",[265,941,942,944],{"class":267,"line":452},[265,943,936],{"class":275},[265,945,946],{"class":296},"'server-2.seudominio.internal:9100'\n",[265,948,949,951],{"class":267,"line":460},[265,950,936],{"class":275},[265,952,953],{"class":296},"'server-3.seudominio.internal:9100'\n",[265,955,956,958],{"class":267,"line":467},[265,957,936],{"class":275},[265,959,960],{"class":296},"'worker-1.seudominio.internal:9100'\n",[265,962,963,966],{"class":267,"line":475},[265,964,965],{"class":271},"        labels",[265,967,276],{"class":275},[265,969,970,973,975],{"class":267,"line":484},[265,971,972],{"class":271},"          environment",[265,974,293],{"class":275},[265,976,977],{"class":296},"'production'\n",[265,979,980],{"class":267,"line":489},[265,981,392],{"emptyLinePlaceholder":391},[265,983,984,986,988,990],{"class":267,"line":497},[265,985,856],{"class":275},[265,987,877],{"class":271},[265,989,293],{"class":275},[265,991,992],{"class":296},"'apps'\n",[265,994,995,997],{"class":267,"line":507},[265,996,887],{"class":271},[265,998,276],{"class":275},[265,1000,1001,1003,1005],{"class":267,"line":514},[265,1002,311],{"class":275},[265,1004,831],{"class":271},[265,1006,276],{"class":275},[265,1008,1009,1011],{"class":267,"line":522},[265,1010,936],{"class":275},[265,1012,1013],{"class":296},"'api.seudominio.internal:8080'\n",[265,1015,1016,1018],{"class":267,"line":530},[265,1017,936],{"class":275},[265,1019,1020],{"class":296},"'worker.seudominio.internal:8080'\n",[265,1022,1023,1025],{"class":267,"line":540},[265,1024,965],{"class":271},[265,1026,276],{"class":275},[265,1028,1029,1031,1033],{"class":267,"line":547},[265,1030,972],{"class":271},[265,1032,293],{"class":275},[265,1034,977],{"class":296},[265,1036,1037,1040,1042],{"class":267,"line":555},[265,1038,1039],{"class":271},"    metrics_path",[265,1041,293],{"class":275},[265,1043,1044],{"class":296},"'\u002Fmetrics'\n",[11,1046,1047,1048,1050,1051,1054],{},"Pra clusters maiores ou que mudam de composição com frequência, troque o ",[15,1049,821],{}," por ",[15,1052,1053],{},"file_sd_configs"," apontando pra um JSON que você gera automaticamente. Pra 4 servidores estáticos, o arquivo acima resolve.",[11,1056,1057,1058,1061,1062,1065,1066,1069,1070,1073],{},"Reload: ",[15,1059,1060],{},"curl -X POST localhost:9090\u002F-\u002Freload",". Confira em ",[15,1063,1064],{},"localhost:9090\u002Ftargets"," se todos os jobs estão ",[15,1067,1068],{},"UP",". Os que estiverem ",[15,1071,1072],{},"DOWN"," ainda não foram instrumentados — esse é o passo 4.",[30,1075,1077],{"id":1076},"passo-4-como-instalar-o-node_exporter-em-cada-servidor","Passo 4 — Como instalar o node_exporter em cada servidor?",[11,1079,171,1080,1083],{},[38,1081,1082],{},"15 minutos"," para 4 servidores.",[11,1085,1086],{},"Em cada servidor monitorado, rode o node_exporter. Há duas formas: binário direto via systemd, ou contêiner Docker. Em 2026 o consenso é container — facilita atualização e isolamento. Em cada nó:",[241,1088,1090],{"className":685,"code":1089,"language":687,"meta":249,"style":249},"docker run -d \\\n  --name node-exporter \\\n  --restart unless-stopped \\\n  --net=\"host\" \\\n  --pid=\"host\" \\\n  -v \"\u002F:\u002Fhost:ro,rslave\" \\\n  prom\u002Fnode-exporter:v1.8.2 \\\n  --path.rootfs=\u002Fhost\n",[15,1091,1092,1106,1116,1126,1136,1145,1155,1162],{"__ignoreMap":249},[265,1093,1094,1096,1099,1102],{"class":267,"line":268},[265,1095,704],{"class":703},[265,1097,1098],{"class":296}," run",[265,1100,1101],{"class":694}," -d",[265,1103,1105],{"class":1104},"suJrU"," \\\n",[265,1107,1108,1111,1114],{"class":267,"line":279},[265,1109,1110],{"class":694},"  --name",[265,1112,1113],{"class":296}," node-exporter",[265,1115,1105],{"class":1104},[265,1117,1118,1121,1124],{"class":267,"line":287},[265,1119,1120],{"class":694},"  --restart",[265,1122,1123],{"class":296}," unless-stopped",[265,1125,1105],{"class":1104},[265,1127,1128,1131,1134],{"class":267,"line":300},[265,1129,1130],{"class":694},"  --net=",[265,1132,1133],{"class":296},"\"host\"",[265,1135,1105],{"class":1104},[265,1137,1138,1141,1143],{"class":267,"line":308},[265,1139,1140],{"class":694},"  --pid=",[265,1142,1133],{"class":296},[265,1144,1105],{"class":1104},[265,1146,1147,1150,1153],{"class":267,"line":317},[265,1148,1149],{"class":694},"  -v",[265,1151,1152],{"class":296}," \"\u002F:\u002Fhost:ro,rslave\"",[265,1154,1105],{"class":1104},[265,1156,1157,1160],{"class":267,"line":325},[265,1158,1159],{"class":296},"  prom\u002Fnode-exporter:v1.8.2",[265,1161,1105],{"class":1104},[265,1163,1164],{"class":267,"line":333},[265,1165,1166],{"class":694},"  --path.rootfs=\u002Fhost\n",[11,1168,252,1169,1172,1173,1176,1177,18,1180,1183,1184,1187],{},[15,1170,1171],{},"--net=host"," é necessário pra ele enxergar as interfaces de rede reais. O bind mount em ",[15,1174,1175],{},"\u002Fhost"," permite ler ",[15,1178,1179],{},"\u002Fproc",[15,1181,1182],{},"\u002Fsys"," e ",[15,1185,1186],{},"\u002Fetc\u002Fpasswd"," do host (read-only) sem rodar o contêiner com privilégios de raiz.",[11,1189,1190,1191,1194],{},"Firewall: abra a porta 9100 apenas pra o IP do servidor de observabilidade. No Ubuntu com ",[15,1192,1193],{},"ufw",":",[241,1196,1198],{"className":685,"code":1197,"language":687,"meta":249,"style":249},"ufw allow from \u003CIP_DO_OBSERVABILITY> to any port 9100\n",[15,1199,1200],{"__ignoreMap":249},[265,1201,1202,1204,1207,1210,1213,1216,1219,1222,1225,1228,1231],{"class":267,"line":268},[265,1203,1193],{"class":703},[265,1205,1206],{"class":296}," allow",[265,1208,1209],{"class":296}," from",[265,1211,1212],{"class":1104}," \u003C",[265,1214,1215],{"class":296},"IP_DO_OBSERVABILIT",[265,1217,1218],{"class":275},"Y",[265,1220,1221],{"class":1104},">",[265,1223,1224],{"class":296}," to",[265,1226,1227],{"class":296}," any",[265,1229,1230],{"class":296}," port",[265,1232,1233],{"class":694}," 9100\n",[11,1235,1236,1237,1240,1241,121],{},"Validação: do servidor de observability, ",[15,1238,1239],{},"curl http:\u002F\u002Fserver-1.seudominio.internal:9100\u002Fmetrics"," deve retornar centenas de linhas começando com ",[15,1242,1243],{},"# HELP node_cpu_seconds_total...",[30,1245,1247],{"id":1246},"passo-5-como-configurar-loki-promtail","Passo 5 — Como configurar Loki + Promtail?",[11,1249,171,1250,121],{},[38,1251,758],{},[11,1253,1254,1255,1194],{},"O Loki já está rodando no compose do passo 2. Falta o ",[15,1256,1257],{},"loki-config.yml",[241,1259,1261],{"className":259,"code":1260,"language":261,"meta":249,"style":249},"auth_enabled: false\n\nserver:\n  http_listen_port: 3100\n\ncommon:\n  path_prefix: \u002Floki\n  storage:\n    filesystem:\n      chunks_directory: \u002Floki\u002Fchunks\n      rules_directory: \u002Floki\u002Frules\n  replication_factor: 1\n  ring:\n    kvstore:\n      store: inmemory\n\nschema_config:\n  configs:\n    - from: 2024-01-01\n      store: tsdb\n      object_store: filesystem\n      schema: v13\n      index:\n        prefix: index_\n        period: 24h\n\nlimits_config:\n  retention_period: 720h  # 30 dias\n  reject_old_samples: true\n  reject_old_samples_max_age: 168h\n",[15,1262,1263,1273,1277,1284,1294,1298,1305,1315,1322,1329,1339,1349,1359,1366,1373,1383,1387,1394,1401,1413,1422,1432,1442,1449,1459,1469,1473,1480,1493,1503],{"__ignoreMap":249},[265,1264,1265,1268,1270],{"class":267,"line":268},[265,1266,1267],{"class":271},"auth_enabled",[265,1269,293],{"class":275},[265,1271,1272],{"class":694},"false\n",[265,1274,1275],{"class":267,"line":279},[265,1276,392],{"emptyLinePlaceholder":391},[265,1278,1279,1282],{"class":267,"line":287},[265,1280,1281],{"class":271},"server",[265,1283,276],{"class":275},[265,1285,1286,1289,1291],{"class":267,"line":300},[265,1287,1288],{"class":271},"  http_listen_port",[265,1290,293],{"class":275},[265,1292,1293],{"class":694},"3100\n",[265,1295,1296],{"class":267,"line":308},[265,1297,392],{"emptyLinePlaceholder":391},[265,1299,1300,1303],{"class":267,"line":317},[265,1301,1302],{"class":271},"common",[265,1304,276],{"class":275},[265,1306,1307,1310,1312],{"class":267,"line":325},[265,1308,1309],{"class":271},"  path_prefix",[265,1311,293],{"class":275},[265,1313,1314],{"class":296},"\u002Floki\n",[265,1316,1317,1320],{"class":267,"line":333},[265,1318,1319],{"class":271},"  storage",[265,1321,276],{"class":275},[265,1323,1324,1327],{"class":267,"line":341},[265,1325,1326],{"class":271},"    filesystem",[265,1328,276],{"class":275},[265,1330,1331,1334,1336],{"class":267,"line":349},[265,1332,1333],{"class":271},"      chunks_directory",[265,1335,293],{"class":275},[265,1337,1338],{"class":296},"\u002Floki\u002Fchunks\n",[265,1340,1341,1344,1346],{"class":267,"line":361},[265,1342,1343],{"class":271},"      rules_directory",[265,1345,293],{"class":275},[265,1347,1348],{"class":296},"\u002Floki\u002Frules\n",[265,1350,1351,1354,1356],{"class":267,"line":369},[265,1352,1353],{"class":271},"  replication_factor",[265,1355,293],{"class":275},[265,1357,1358],{"class":694},"1\n",[265,1360,1361,1364],{"class":267,"line":377},[265,1362,1363],{"class":271},"  ring",[265,1365,276],{"class":275},[265,1367,1368,1371],{"class":267,"line":388},[265,1369,1370],{"class":271},"    kvstore",[265,1372,276],{"class":275},[265,1374,1375,1378,1380],{"class":267,"line":395},[265,1376,1377],{"class":271},"      store",[265,1379,293],{"class":275},[265,1381,1382],{"class":296},"inmemory\n",[265,1384,1385],{"class":267,"line":403},[265,1386,392],{"emptyLinePlaceholder":391},[265,1388,1389,1392],{"class":267,"line":413},[265,1390,1391],{"class":271},"schema_config",[265,1393,276],{"class":275},[265,1395,1396,1399],{"class":267,"line":420},[265,1397,1398],{"class":271},"  configs",[265,1400,276],{"class":275},[265,1402,1403,1405,1408,1410],{"class":267,"line":428},[265,1404,818],{"class":275},[265,1406,1407],{"class":271},"from",[265,1409,293],{"class":275},[265,1411,1412],{"class":694},"2024-01-01\n",[265,1414,1415,1417,1419],{"class":267,"line":436},[265,1416,1377],{"class":271},[265,1418,293],{"class":275},[265,1420,1421],{"class":296},"tsdb\n",[265,1423,1424,1427,1429],{"class":267,"line":444},[265,1425,1426],{"class":271},"      object_store",[265,1428,293],{"class":275},[265,1430,1431],{"class":296},"filesystem\n",[265,1433,1434,1437,1439],{"class":267,"line":452},[265,1435,1436],{"class":271},"      schema",[265,1438,293],{"class":275},[265,1440,1441],{"class":296},"v13\n",[265,1443,1444,1447],{"class":267,"line":460},[265,1445,1446],{"class":271},"      index",[265,1448,276],{"class":275},[265,1450,1451,1454,1456],{"class":267,"line":467},[265,1452,1453],{"class":271},"        prefix",[265,1455,293],{"class":275},[265,1457,1458],{"class":296},"index_\n",[265,1460,1461,1464,1466],{"class":267,"line":475},[265,1462,1463],{"class":271},"        period",[265,1465,293],{"class":275},[265,1467,1468],{"class":296},"24h\n",[265,1470,1471],{"class":267,"line":484},[265,1472,392],{"emptyLinePlaceholder":391},[265,1474,1475,1478],{"class":267,"line":489},[265,1476,1477],{"class":271},"limits_config",[265,1479,276],{"class":275},[265,1481,1482,1485,1487,1490],{"class":267,"line":497},[265,1483,1484],{"class":271},"  retention_period",[265,1486,293],{"class":275},[265,1488,1489],{"class":296},"720h",[265,1491,1492],{"class":357},"  # 30 dias\n",[265,1494,1495,1498,1500],{"class":267,"line":507},[265,1496,1497],{"class":271},"  reject_old_samples",[265,1499,293],{"class":275},[265,1501,1502],{"class":694},"true\n",[265,1504,1505,1508,1510],{"class":267,"line":514},[265,1506,1507],{"class":271},"  reject_old_samples_max_age",[265,1509,293],{"class":275},[265,1511,1512],{"class":296},"168h\n",[11,1514,1515],{},"Storage em filesystem é o suficiente pra começar. Quando você passar de 50 GB de logs por dia ou quiser retenção de 90+ dias, migra pra S3 (ou compatível). Não migre antes — complica a operação sem ganho real.",[11,1517,1518],{},"Em cada servidor monitorado, instale o Promtail (ou Grafana Agent) também via container:",[241,1520,1522],{"className":259,"code":1521,"language":261,"meta":249,"style":249},"# \u002Fopt\u002Fpromtail\u002Fpromtail-config.yml em cada servidor\nserver:\n  http_listen_port: 9080\n\nclients:\n  - url: http:\u002F\u002Fmonitor.seudominio.com:3100\u002Floki\u002Fapi\u002Fv1\u002Fpush\n\nscrape_configs:\n  - job_name: system\n    static_configs:\n      - targets: [localhost]\n        labels:\n          job: varlogs\n          host: ${HOSTNAME}\n          __path__: \u002Fvar\u002Flog\u002F*.log\n\n  - job_name: docker\n    docker_sd_configs:\n      - host: unix:\u002F\u002F\u002Fvar\u002Frun\u002Fdocker.sock\n    relabel_configs:\n      - source_labels: ['__meta_docker_container_name']\n        target_label: 'container'\n",[15,1523,1524,1529,1535,1544,1548,1555,1567,1571,1577,1588,1594,1606,1612,1622,1632,1642,1646,1657,1664,1676,1683,1697],{"__ignoreMap":249},[265,1525,1526],{"class":267,"line":268},[265,1527,1528],{"class":357},"# \u002Fopt\u002Fpromtail\u002Fpromtail-config.yml em cada servidor\n",[265,1530,1531,1533],{"class":267,"line":279},[265,1532,1281],{"class":271},[265,1534,276],{"class":275},[265,1536,1537,1539,1541],{"class":267,"line":287},[265,1538,1288],{"class":271},[265,1540,293],{"class":275},[265,1542,1543],{"class":694},"9080\n",[265,1545,1546],{"class":267,"line":300},[265,1547,392],{"emptyLinePlaceholder":391},[265,1549,1550,1553],{"class":267,"line":308},[265,1551,1552],{"class":271},"clients",[265,1554,276],{"class":275},[265,1556,1557,1559,1562,1564],{"class":267,"line":317},[265,1558,856],{"class":275},[265,1560,1561],{"class":271},"url",[265,1563,293],{"class":275},[265,1565,1566],{"class":296},"http:\u002F\u002Fmonitor.seudominio.com:3100\u002Floki\u002Fapi\u002Fv1\u002Fpush\n",[265,1568,1569],{"class":267,"line":325},[265,1570,392],{"emptyLinePlaceholder":391},[265,1572,1573,1575],{"class":267,"line":333},[265,1574,868],{"class":271},[265,1576,276],{"class":275},[265,1578,1579,1581,1583,1585],{"class":267,"line":341},[265,1580,856],{"class":275},[265,1582,877],{"class":271},[265,1584,293],{"class":275},[265,1586,1587],{"class":296},"system\n",[265,1589,1590,1592],{"class":267,"line":349},[265,1591,887],{"class":271},[265,1593,276],{"class":275},[265,1595,1596,1598,1600,1602,1604],{"class":267,"line":361},[265,1597,311],{"class":275},[265,1599,831],{"class":271},[265,1601,834],{"class":275},[265,1603,201],{"class":296},[265,1605,840],{"class":275},[265,1607,1608,1610],{"class":267,"line":369},[265,1609,965],{"class":271},[265,1611,276],{"class":275},[265,1613,1614,1617,1619],{"class":267,"line":377},[265,1615,1616],{"class":271},"          job",[265,1618,293],{"class":275},[265,1620,1621],{"class":296},"varlogs\n",[265,1623,1624,1627,1629],{"class":267,"line":388},[265,1625,1626],{"class":271},"          host",[265,1628,293],{"class":275},[265,1630,1631],{"class":296},"${HOSTNAME}\n",[265,1633,1634,1637,1639],{"class":267,"line":395},[265,1635,1636],{"class":271},"          __path__",[265,1638,293],{"class":275},[265,1640,1641],{"class":296},"\u002Fvar\u002Flog\u002F*.log\n",[265,1643,1644],{"class":267,"line":403},[265,1645,392],{"emptyLinePlaceholder":391},[265,1647,1648,1650,1652,1654],{"class":267,"line":413},[265,1649,856],{"class":275},[265,1651,877],{"class":271},[265,1653,293],{"class":275},[265,1655,1656],{"class":296},"docker\n",[265,1658,1659,1662],{"class":267,"line":420},[265,1660,1661],{"class":271},"    docker_sd_configs",[265,1663,276],{"class":275},[265,1665,1666,1668,1671,1673],{"class":267,"line":428},[265,1667,311],{"class":275},[265,1669,1670],{"class":271},"host",[265,1672,293],{"class":275},[265,1674,1675],{"class":296},"unix:\u002F\u002F\u002Fvar\u002Frun\u002Fdocker.sock\n",[265,1677,1678,1681],{"class":267,"line":436},[265,1679,1680],{"class":271},"    relabel_configs",[265,1682,276],{"class":275},[265,1684,1685,1687,1690,1692,1695],{"class":267,"line":444},[265,1686,311],{"class":275},[265,1688,1689],{"class":271},"source_labels",[265,1691,834],{"class":275},[265,1693,1694],{"class":296},"'__meta_docker_container_name'",[265,1696,840],{"class":275},[265,1698,1699,1702,1704],{"class":267,"line":452},[265,1700,1701],{"class":271},"        target_label",[265,1703,293],{"class":275},[265,1705,1706],{"class":296},"'container'\n",[11,1708,1709,1710,1713,1714,1716],{},"Importante: o endpoint ",[15,1711,1712],{},"http:\u002F\u002Fmonitor.seudominio.com:3100\u002Floki\u002Fapi\u002Fv1\u002Fpush"," precisa estar acessível dos servidores. Se você seguiu o passo 2 e amarrou Loki em ",[15,1715,666],{},", você tem duas opções: expor a 3100 via reverse proxy com autenticação básica, ou abrir um túnel SSH\u002FWireGuard entre os servidores. A segunda opção é mais segura e o que recomendamos.",[11,1718,1719,1720,1723],{},"Validação: no Grafana, vá em Explore, selecione a fonte de dados Loki, rode ",[15,1721,1722],{},"{job=\"varlogs\"}"," e veja os logs aparecendo em tempo real.",[30,1725,1727],{"id":1726},"passo-6-como-importar-os-dashboards-do-grafana","Passo 6 — Como importar os dashboards do Grafana?",[11,1729,171,1730,121],{},[38,1731,1732],{},"20 minutos",[11,1734,1735,1736,1739,1740,121],{},"Acesse ",[15,1737,1738],{},"https:\u002F\u002Fmonitor.seudominio.com"," (depois de configurar o reverse proxy do passo 8 — pode pular pra lá agora se quiser). Login admin com a senha do ",[15,1741,674],{},[11,1743,1744,1745,1194],{},"Adicione as duas fontes de dados via provisioning automático. Em ",[15,1746,1747],{},"grafana\u002Fprovisioning\u002Fdatasources\u002Fdatasources.yml",[241,1749,1751],{"className":259,"code":1750,"language":261,"meta":249,"style":249},"apiVersion: 1\ndatasources:\n  - name: Prometheus\n    type: prometheus\n    access: proxy\n    url: http:\u002F\u002Fprometheus:9090\n    isDefault: true\n  - name: Loki\n    type: loki\n    access: proxy\n    url: http:\u002F\u002Floki:3100\n",[15,1752,1753,1762,1769,1781,1791,1801,1811,1820,1831,1840,1848],{"__ignoreMap":249},[265,1754,1755,1758,1760],{"class":267,"line":268},[265,1756,1757],{"class":271},"apiVersion",[265,1759,293],{"class":275},[265,1761,1358],{"class":694},[265,1763,1764,1767],{"class":267,"line":279},[265,1765,1766],{"class":271},"datasources",[265,1768,276],{"class":275},[265,1770,1771,1773,1776,1778],{"class":267,"line":287},[265,1772,856],{"class":275},[265,1774,1775],{"class":271},"name",[265,1777,293],{"class":275},[265,1779,1780],{"class":296},"Prometheus\n",[265,1782,1783,1786,1788],{"class":267,"line":300},[265,1784,1785],{"class":271},"    type",[265,1787,293],{"class":275},[265,1789,1790],{"class":296},"prometheus\n",[265,1792,1793,1796,1798],{"class":267,"line":308},[265,1794,1795],{"class":271},"    access",[265,1797,293],{"class":275},[265,1799,1800],{"class":296},"proxy\n",[265,1802,1803,1806,1808],{"class":267,"line":317},[265,1804,1805],{"class":271},"    url",[265,1807,293],{"class":275},[265,1809,1810],{"class":296},"http:\u002F\u002Fprometheus:9090\n",[265,1812,1813,1816,1818],{"class":267,"line":325},[265,1814,1815],{"class":271},"    isDefault",[265,1817,293],{"class":275},[265,1819,1502],{"class":694},[265,1821,1822,1824,1826,1828],{"class":267,"line":333},[265,1823,856],{"class":275},[265,1825,1775],{"class":271},[265,1827,293],{"class":275},[265,1829,1830],{"class":296},"Loki\n",[265,1832,1833,1835,1837],{"class":267,"line":341},[265,1834,1785],{"class":271},[265,1836,293],{"class":275},[265,1838,1839],{"class":296},"loki\n",[265,1841,1842,1844,1846],{"class":267,"line":349},[265,1843,1795],{"class":271},[265,1845,293],{"class":275},[265,1847,1800],{"class":296},[265,1849,1850,1852,1854],{"class":267,"line":361},[265,1851,1805],{"class":271},[265,1853,293],{"class":275},[265,1855,1856],{"class":296},"http:\u002F\u002Floki:3100\n",[11,1858,1859,1860,1863],{},"Reinicie o Grafana com ",[15,1861,1862],{},"docker compose restart grafana"," e as fontes aparecem automaticamente.",[11,1865,1866,1867,1870],{},"Importe os dashboards prontos. Em ",[38,1868,1869],{},"Dashboards → New → Import",", cole o ID do dashboard:",[67,1872,1873,1879,1885],{},[70,1874,1875,1878],{},[38,1876,1877],{},"1860"," — Node Exporter Full. CPU, RAM, disco, rede, sistema de arquivos. É o dashboard mais usado da comunidade Prometheus, com razão.",[70,1880,1881,1884],{},[38,1882,1883],{},"13639"," — Logs \u002F App. Visualização básica de logs do Loki com filtros por job, container, host.",[70,1886,1887,1890],{},[38,1888,1889],{},"15172"," — Cluster overview. Visão consolidada por servidor, útil pra cluster pequeno.",[11,1892,1893,1894,1897],{},"Customize cada um pra usar ",[15,1895,1896],{},"environment=\"production\""," no filtro padrão. Depois de duas semanas usando, você vai querer criar dashboards próprios pra workloads específicos — não tem atalho aí, é tempo de cadeira.",[30,1899,1901],{"id":1900},"passo-7-como-configurar-alertas-basicos","Passo 7 — Como configurar alertas básicos?",[11,1903,171,1904,121],{},[38,1905,232],{},[11,1907,1908],{},"Alertas são onde 80% dos times tropeçam: ou colocam pouquíssimos e descobrem incidentes pelos clientes, ou colocam dezenas e desensibilizam o time.",[11,1910,1911,1912,1915,1916,1194],{},"Comece com ",[38,1913,1914],{},"seis alertas essenciais",". Em ",[15,1917,1918],{},"prometheus\u002Falerts.yml",[241,1920,1922],{"className":259,"code":1921,"language":261,"meta":249,"style":249},"groups:\n  - name: essentials\n    interval: 30s\n    rules:\n      - alert: ServerDown\n        expr: up{job=\"node\"} == 0\n        for: 2m\n        labels:\n          severity: critical\n        annotations:\n          summary: \"Servidor {{ $labels.instance }} está fora do ar\"\n\n      - alert: HighCPU\n        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100) > 80\n        for: 10m\n        labels:\n          severity: warning\n\n      - alert: DiskAlmostFull\n        expr: (node_filesystem_avail_bytes{mountpoint=\"\u002F\"} \u002F node_filesystem_size_bytes{mountpoint=\"\u002F\"}) * 100 \u003C 15\n        for: 5m\n        labels:\n          severity: critical\n\n      - alert: HighMemory\n        expr: (1 - (node_memory_MemAvailable_bytes \u002F node_memory_MemTotal_bytes)) * 100 > 90\n        for: 10m\n        labels:\n          severity: warning\n\n      - alert: HighHTTPErrorRate\n        expr: sum(rate(http_requests_total{status=~\"5..\"}[5m])) \u002F sum(rate(http_requests_total[5m])) > 0.05\n        for: 5m\n        labels:\n          severity: critical\n\n      - alert: HighLatency\n        expr: histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 2\n        for: 10m\n        labels:\n          severity: warning\n",[15,1923,1924,1931,1942,1952,1959,1971,1981,1991,1997,2007,2014,2024,2028,2039,2048,2057,2063,2072,2076,2087,2096,2105,2111,2119,2123,2134,2143,2151,2157,2165,2169,2180,2189,2197,2203,2211,2215,2226,2235,2243,2249],{"__ignoreMap":249},[265,1925,1926,1929],{"class":267,"line":268},[265,1927,1928],{"class":271},"groups",[265,1930,276],{"class":275},[265,1932,1933,1935,1937,1939],{"class":267,"line":279},[265,1934,856],{"class":275},[265,1936,1775],{"class":271},[265,1938,293],{"class":275},[265,1940,1941],{"class":296},"essentials\n",[265,1943,1944,1947,1949],{"class":267,"line":287},[265,1945,1946],{"class":271},"    interval",[265,1948,293],{"class":275},[265,1950,1951],{"class":296},"30s\n",[265,1953,1954,1957],{"class":267,"line":300},[265,1955,1956],{"class":271},"    rules",[265,1958,276],{"class":275},[265,1960,1961,1963,1966,1968],{"class":267,"line":308},[265,1962,311],{"class":275},[265,1964,1965],{"class":271},"alert",[265,1967,293],{"class":275},[265,1969,1970],{"class":296},"ServerDown\n",[265,1972,1973,1976,1978],{"class":267,"line":317},[265,1974,1975],{"class":271},"        expr",[265,1977,293],{"class":275},[265,1979,1980],{"class":296},"up{job=\"node\"} == 0\n",[265,1982,1983,1986,1988],{"class":267,"line":325},[265,1984,1985],{"class":271},"        for",[265,1987,293],{"class":275},[265,1989,1990],{"class":296},"2m\n",[265,1992,1993,1995],{"class":267,"line":333},[265,1994,965],{"class":271},[265,1996,276],{"class":275},[265,1998,1999,2002,2004],{"class":267,"line":341},[265,2000,2001],{"class":271},"          severity",[265,2003,293],{"class":275},[265,2005,2006],{"class":296},"critical\n",[265,2008,2009,2012],{"class":267,"line":349},[265,2010,2011],{"class":271},"        annotations",[265,2013,276],{"class":275},[265,2015,2016,2019,2021],{"class":267,"line":361},[265,2017,2018],{"class":271},"          summary",[265,2020,293],{"class":275},[265,2022,2023],{"class":296},"\"Servidor {{ $labels.instance }} está fora do ar\"\n",[265,2025,2026],{"class":267,"line":369},[265,2027,392],{"emptyLinePlaceholder":391},[265,2029,2030,2032,2034,2036],{"class":267,"line":377},[265,2031,311],{"class":275},[265,2033,1965],{"class":271},[265,2035,293],{"class":275},[265,2037,2038],{"class":296},"HighCPU\n",[265,2040,2041,2043,2045],{"class":267,"line":388},[265,2042,1975],{"class":271},[265,2044,293],{"class":275},[265,2046,2047],{"class":296},"100 - (avg by(instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100) > 80\n",[265,2049,2050,2052,2054],{"class":267,"line":395},[265,2051,1985],{"class":271},[265,2053,293],{"class":275},[265,2055,2056],{"class":296},"10m\n",[265,2058,2059,2061],{"class":267,"line":403},[265,2060,965],{"class":271},[265,2062,276],{"class":275},[265,2064,2065,2067,2069],{"class":267,"line":413},[265,2066,2001],{"class":271},[265,2068,293],{"class":275},[265,2070,2071],{"class":296},"warning\n",[265,2073,2074],{"class":267,"line":420},[265,2075,392],{"emptyLinePlaceholder":391},[265,2077,2078,2080,2082,2084],{"class":267,"line":428},[265,2079,311],{"class":275},[265,2081,1965],{"class":271},[265,2083,293],{"class":275},[265,2085,2086],{"class":296},"DiskAlmostFull\n",[265,2088,2089,2091,2093],{"class":267,"line":436},[265,2090,1975],{"class":271},[265,2092,293],{"class":275},[265,2094,2095],{"class":296},"(node_filesystem_avail_bytes{mountpoint=\"\u002F\"} \u002F node_filesystem_size_bytes{mountpoint=\"\u002F\"}) * 100 \u003C 15\n",[265,2097,2098,2100,2102],{"class":267,"line":444},[265,2099,1985],{"class":271},[265,2101,293],{"class":275},[265,2103,2104],{"class":296},"5m\n",[265,2106,2107,2109],{"class":267,"line":452},[265,2108,965],{"class":271},[265,2110,276],{"class":275},[265,2112,2113,2115,2117],{"class":267,"line":460},[265,2114,2001],{"class":271},[265,2116,293],{"class":275},[265,2118,2006],{"class":296},[265,2120,2121],{"class":267,"line":467},[265,2122,392],{"emptyLinePlaceholder":391},[265,2124,2125,2127,2129,2131],{"class":267,"line":475},[265,2126,311],{"class":275},[265,2128,1965],{"class":271},[265,2130,293],{"class":275},[265,2132,2133],{"class":296},"HighMemory\n",[265,2135,2136,2138,2140],{"class":267,"line":484},[265,2137,1975],{"class":271},[265,2139,293],{"class":275},[265,2141,2142],{"class":296},"(1 - (node_memory_MemAvailable_bytes \u002F node_memory_MemTotal_bytes)) * 100 > 90\n",[265,2144,2145,2147,2149],{"class":267,"line":489},[265,2146,1985],{"class":271},[265,2148,293],{"class":275},[265,2150,2056],{"class":296},[265,2152,2153,2155],{"class":267,"line":497},[265,2154,965],{"class":271},[265,2156,276],{"class":275},[265,2158,2159,2161,2163],{"class":267,"line":507},[265,2160,2001],{"class":271},[265,2162,293],{"class":275},[265,2164,2071],{"class":296},[265,2166,2167],{"class":267,"line":514},[265,2168,392],{"emptyLinePlaceholder":391},[265,2170,2171,2173,2175,2177],{"class":267,"line":522},[265,2172,311],{"class":275},[265,2174,1965],{"class":271},[265,2176,293],{"class":275},[265,2178,2179],{"class":296},"HighHTTPErrorRate\n",[265,2181,2182,2184,2186],{"class":267,"line":530},[265,2183,1975],{"class":271},[265,2185,293],{"class":275},[265,2187,2188],{"class":296},"sum(rate(http_requests_total{status=~\"5..\"}[5m])) \u002F sum(rate(http_requests_total[5m])) > 0.05\n",[265,2190,2191,2193,2195],{"class":267,"line":540},[265,2192,1985],{"class":271},[265,2194,293],{"class":275},[265,2196,2104],{"class":296},[265,2198,2199,2201],{"class":267,"line":547},[265,2200,965],{"class":271},[265,2202,276],{"class":275},[265,2204,2205,2207,2209],{"class":267,"line":555},[265,2206,2001],{"class":271},[265,2208,293],{"class":275},[265,2210,2006],{"class":296},[265,2212,2213],{"class":267,"line":564},[265,2214,392],{"emptyLinePlaceholder":391},[265,2216,2217,2219,2221,2223],{"class":267,"line":569},[265,2218,311],{"class":275},[265,2220,1965],{"class":271},[265,2222,293],{"class":275},[265,2224,2225],{"class":296},"HighLatency\n",[265,2227,2228,2230,2232],{"class":267,"line":577},[265,2229,1975],{"class":271},[265,2231,293],{"class":275},[265,2233,2234],{"class":296},"histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 2\n",[265,2236,2237,2239,2241],{"class":267,"line":587},[265,2238,1985],{"class":271},[265,2240,293],{"class":275},[265,2242,2056],{"class":296},[265,2244,2245,2247],{"class":267,"line":594},[265,2246,965],{"class":271},[265,2248,276],{"class":275},[265,2250,2251,2253,2255],{"class":267,"line":602},[265,2252,2001],{"class":271},[265,2254,293],{"class":275},[265,2256,2071],{"class":296},[11,2258,2259,2260,2263],{},"E o ",[15,2261,2262],{},"alertmanager\u002Falertmanager.yml"," apontando pra um webhook do Slack ou Discord:",[241,2265,2267],{"className":259,"code":2266,"language":261,"meta":249,"style":249},"route:\n  group_by: ['alertname', 'severity']\n  group_wait: 30s\n  group_interval: 5m\n  repeat_interval: 4h\n  receiver: 'slack-default'\n  routes:\n    - match:\n        severity: critical\n      receiver: 'slack-critical'\n      repeat_interval: 1h\n\nreceivers:\n  - name: 'slack-default'\n    slack_configs:\n      - api_url: 'https:\u002F\u002Fhooks.slack.com\u002Fservices\u002FSEU\u002FWEBHOOK\u002FAQUI'\n        channel: '#alerts'\n        send_resolved: true\n\n  - name: 'slack-critical'\n    slack_configs:\n      - api_url: 'https:\u002F\u002Fhooks.slack.com\u002Fservices\u002FSEU\u002FWEBHOOK\u002FAQUI'\n        channel: '#alerts-critical'\n        send_resolved: true\n",[15,2268,2269,2276,2293,2302,2311,2321,2331,2338,2347,2356,2366,2376,2380,2387,2397,2404,2416,2426,2435,2439,2449,2455,2465,2474],{"__ignoreMap":249},[265,2270,2271,2274],{"class":267,"line":268},[265,2272,2273],{"class":271},"route",[265,2275,276],{"class":275},[265,2277,2278,2281,2283,2286,2288,2291],{"class":267,"line":279},[265,2279,2280],{"class":271},"  group_by",[265,2282,834],{"class":275},[265,2284,2285],{"class":296},"'alertname'",[265,2287,18],{"class":275},[265,2289,2290],{"class":296},"'severity'",[265,2292,840],{"class":275},[265,2294,2295,2298,2300],{"class":267,"line":287},[265,2296,2297],{"class":271},"  group_wait",[265,2299,293],{"class":275},[265,2301,1951],{"class":296},[265,2303,2304,2307,2309],{"class":267,"line":300},[265,2305,2306],{"class":271},"  group_interval",[265,2308,293],{"class":275},[265,2310,2104],{"class":296},[265,2312,2313,2316,2318],{"class":267,"line":308},[265,2314,2315],{"class":271},"  repeat_interval",[265,2317,293],{"class":275},[265,2319,2320],{"class":296},"4h\n",[265,2322,2323,2326,2328],{"class":267,"line":317},[265,2324,2325],{"class":271},"  receiver",[265,2327,293],{"class":275},[265,2329,2330],{"class":296},"'slack-default'\n",[265,2332,2333,2336],{"class":267,"line":325},[265,2334,2335],{"class":271},"  routes",[265,2337,276],{"class":275},[265,2339,2340,2342,2345],{"class":267,"line":333},[265,2341,818],{"class":275},[265,2343,2344],{"class":271},"match",[265,2346,276],{"class":275},[265,2348,2349,2352,2354],{"class":267,"line":341},[265,2350,2351],{"class":271},"        severity",[265,2353,293],{"class":275},[265,2355,2006],{"class":296},[265,2357,2358,2361,2363],{"class":267,"line":349},[265,2359,2360],{"class":271},"      receiver",[265,2362,293],{"class":275},[265,2364,2365],{"class":296},"'slack-critical'\n",[265,2367,2368,2371,2373],{"class":267,"line":361},[265,2369,2370],{"class":271},"      repeat_interval",[265,2372,293],{"class":275},[265,2374,2375],{"class":296},"1h\n",[265,2377,2378],{"class":267,"line":369},[265,2379,392],{"emptyLinePlaceholder":391},[265,2381,2382,2385],{"class":267,"line":377},[265,2383,2384],{"class":271},"receivers",[265,2386,276],{"class":275},[265,2388,2389,2391,2393,2395],{"class":267,"line":388},[265,2390,856],{"class":275},[265,2392,1775],{"class":271},[265,2394,293],{"class":275},[265,2396,2330],{"class":296},[265,2398,2399,2402],{"class":267,"line":395},[265,2400,2401],{"class":271},"    slack_configs",[265,2403,276],{"class":275},[265,2405,2406,2408,2411,2413],{"class":267,"line":403},[265,2407,311],{"class":275},[265,2409,2410],{"class":271},"api_url",[265,2412,293],{"class":275},[265,2414,2415],{"class":296},"'https:\u002F\u002Fhooks.slack.com\u002Fservices\u002FSEU\u002FWEBHOOK\u002FAQUI'\n",[265,2417,2418,2421,2423],{"class":267,"line":413},[265,2419,2420],{"class":271},"        channel",[265,2422,293],{"class":275},[265,2424,2425],{"class":296},"'#alerts'\n",[265,2427,2428,2431,2433],{"class":267,"line":420},[265,2429,2430],{"class":271},"        send_resolved",[265,2432,293],{"class":275},[265,2434,1502],{"class":694},[265,2436,2437],{"class":267,"line":428},[265,2438,392],{"emptyLinePlaceholder":391},[265,2440,2441,2443,2445,2447],{"class":267,"line":436},[265,2442,856],{"class":275},[265,2444,1775],{"class":271},[265,2446,293],{"class":275},[265,2448,2365],{"class":296},[265,2450,2451,2453],{"class":267,"line":444},[265,2452,2401],{"class":271},[265,2454,276],{"class":275},[265,2456,2457,2459,2461,2463],{"class":267,"line":452},[265,2458,311],{"class":275},[265,2460,2410],{"class":271},[265,2462,293],{"class":275},[265,2464,2415],{"class":296},[265,2466,2467,2469,2471],{"class":267,"line":460},[265,2468,2420],{"class":271},[265,2470,293],{"class":275},[265,2472,2473],{"class":296},"'#alerts-critical'\n",[265,2475,2476,2478,2480],{"class":267,"line":467},[265,2477,2430],{"class":271},[265,2479,293],{"class":275},[265,2481,1502],{"class":694},[11,2483,2484,2485,2488,2489,2492],{},"Dois detalhes que economizam noite de sono. O ",[15,2486,2487],{},"for: 10m"," em CPU evita que picos curtos virem alertas — o servidor pode chegar a 95% por 30 segundos e isso ser normal. O ",[15,2490,2491],{},"repeat_interval: 4h"," pra warnings garante que um warning resolvido em uma hora não vire 60 mensagens — o Alertmanager agrupa.",[11,2494,2495,2496,2498,2499,2502,2503,2506],{},"Recarregue o Prometheus (",[15,2497,1060],{},") e teste forçando um alerta: ",[15,2500,2501],{},"stress --cpu 4 --timeout 700s"," em algum servidor deve disparar ",[15,2504,2505],{},"HighCPU"," em 10 minutos.",[30,2508,2510],{"id":2509},"passo-8-como-colocar-reverse-proxy-e-tls-na-frente","Passo 8 — Como colocar reverse proxy e TLS na frente?",[11,2512,171,2513,121],{},[38,2514,1732],{},[11,2516,2517,2518,2520],{},"Pra acessar Grafana via ",[15,2519,1738],{}," com certificado válido, você precisa de algo na frente da porta 3000. Duas opções:",[182,2522,2523,2533],{},[70,2524,2525,2528,2529,2532],{},[38,2526,2527],{},"Roteador integrado do orquestrador"," — se você já tem o cluster HeroCtl rodando, basta declarar o Grafana como job com ",[15,2530,2531],{},"ingress: { host: monitor.seudominio.com, tls: true }",". Certificado Let's Encrypt automático, sem ferramenta adicional.",[70,2534,2535,2538,2539],{},[38,2536,2537],{},"Caddy standalone"," no próprio VPS de observabilidade — também emite Let's Encrypt automaticamente. Caddyfile mínimo:",[241,2540,2543],{"className":2541,"code":2542,"language":246},[244],"monitor.seudominio.com {\n  reverse_proxy localhost:3000\n  basicauth \u002Flogin {\n    admin \u003Chash_bcrypt>\n  }\n}\n",[15,2544,2542],{"__ignoreMap":249},[11,2546,2547,2548,2551],{},"Pra defesa em profundidade, mantenha autenticação básica do Caddy\u002Froteador na frente do login do Grafana — duas barreiras, não uma. A segunda é especialmente importante porque o login default do Grafana é ",[15,2549,2550],{},"admin\u002Fadmin"," e a primeira coisa que bots fazem em um Grafana exposto é tentar essa combinação.",[30,2553,2555],{"id":2554},"passo-9-como-instrumentar-metricas-de-aplicacao","Passo 9 — Como instrumentar métricas de aplicação?",[11,2557,171,2558,121],{},[38,2559,2560],{},"varia conforme número de aplicações",[11,2562,2563],{},"Métricas de sistema são metade da história. A outra metade é o que sua aplicação está fazendo — quantas requisições por segundo, qual a latência p99, quantos erros, qual o tamanho da fila de jobs em background.",[11,2565,2566],{},"Cada linguagem popular tem cliente Prometheus oficial:",[67,2568,2569,2577,2585,2592],{},[70,2570,2571,293,2574],{},[38,2572,2573],{},"Node.js",[15,2575,2576],{},"prom-client",[70,2578,2579,293,2582],{},[38,2580,2581],{},"Python",[15,2583,2584],{},"prometheus-client",[70,2586,2587,293,2590],{},[38,2588,2589],{},"Ruby",[15,2591,2584],{},[70,2593,2594,293,2597],{},[38,2595,2596],{},"Go",[15,2598,2599],{},"github.com\u002Fprometheus\u002Fclient_golang",[11,2601,2602],{},"O padrão mínimo são três métricas por endpoint HTTP:",[67,2604,2605,2620,2626],{},[70,2606,2607,2610,2611,18,2614,18,2617,121],{},[15,2608,2609],{},"http_requests_total"," — counter, com labels ",[15,2612,2613],{},"method",[15,2615,2616],{},"path",[15,2618,2619],{},"status",[70,2621,2622,2625],{},[15,2623,2624],{},"http_request_duration_seconds"," — histogram, mesmo set de labels.",[70,2627,2628,2631,2632,2635],{},[15,2629,2630],{},"app_errors_total"," — counter, com label ",[15,2633,2634],{},"kind"," (\"validation\", \"db\", \"external_api\", etc).",[11,2637,2638,2639,2641,2642,2644],{},"Exponha tudo isso em ",[15,2640,151],{},". Adicione o endpoint no ",[15,2643,868],{}," do Prometheus. Em horas você tem dashboards por endpoint, alertas por taxa de erro, e a capacidade de responder \"o que estava acontecendo às 3:14 de ontem\" com um gráfico em vez de um chute.",[11,2646,2647,2648,2651,2652,2655],{},"Cuidado com ",[38,2649,2650],{},"cardinalidade",". Cada combinação única de labels vira uma série temporal separada. Se você botar ",[15,2653,2654],{},"user_id"," como label, com 100k usuários você cria 100k séries — e o Prometheus vai consumir 8+ GB de RAM só pra indexar isso. Regra prática: labels têm valores em conjuntos pequenos (status code: 5 valores; método: 5 valores; path: dezenas). Identificadores únicos vão em logs, não em métricas.",[30,2657,2659],{"id":2658},"como-rodar-isso-dentro-do-heroctl-em-vez-de-vps-dedicado","Como rodar isso dentro do HeroCtl em vez de VPS dedicado?",[11,2661,2662],{},"Pra clusters que já rodam o orquestrador, faz sentido considerar a stack como mais um job. Trade-off: você economiza um VPS, mas perde isolamento (se o cluster morrer, o monitoring morre junto).",[11,2664,2665],{},"A topologia fica assim:",[67,2667,2668,2674,2680,2686],{},[70,2669,2670,2673],{},[38,2671,2672],{},"1 job spec único"," com 4 tasks: prometheus, grafana, loki, alertmanager.",[70,2675,2676,2679],{},[38,2677,2678],{},"Volumes replicados"," no cluster — os dados sobrevivem a falha de um nó.",[70,2681,2682,2685],{},[38,2683,2684],{},"Roteador integrado"," faz o TLS automático via subdomínio. Não precisa de Caddy adicional.",[70,2687,2688,2691],{},[38,2689,2690],{},"Métricas do próprio cluster"," já são expostas em formato Prometheus na API administrativa, então o scrape é direto.",[11,2693,2694],{},"Pra produção crítica, recomendamos a separação física (VPS dedicado fora do cluster). Pra projeto pessoal, MVP, ou time pequeno onde \"tudo cair junto\" é aceitável, rodar dentro é mais barato e operacionalmente mais simples. O job spec inteiro fica em torno de 80 linhas de manifesto.",[30,2696,2698],{"id":2697},"quanto-custa-essa-stack-por-mes-no-brasil","Quanto custa essa stack por mês no Brasil?",[2700,2701,2702,2715],"table",{},[2703,2704,2705],"thead",{},[2706,2707,2708,2712],"tr",{},[2709,2710,2711],"th",{},"Item",[2709,2713,2714],{},"Custo mensal (BRL)",[2716,2717,2718,2727,2735,2743],"tbody",{},[2706,2719,2720,2724],{},[2721,2722,2723],"td",{},"VPS observability dedicado (4 GB RAM)",[2721,2725,2726],{},"R$40 a R$80",[2706,2728,2729,2732],{},[2721,2730,2731],{},"Object storage pra retenção longa de logs (opcional)",[2721,2733,2734],{},"R$30",[2706,2736,2737,2740],{},[2721,2738,2739],{},"Tempo de manutenção (2 a 4h × valor da hora)",[2721,2741,2742],{},"R$200 a R$400",[2706,2744,2745,2750],{},[2721,2746,2747],{},[38,2748,2749],{},"Total operacional",[2721,2751,2752],{},[38,2753,2754],{},"R$300 a R$500",[11,2756,2757],{},"Pra comparação, uma assinatura de Datadog ou New Relic com cobertura equivalente (5 hosts, retenção de logs de 30 dias, alertas, dashboards) sai em torno de R$1.500 a R$2.000 por mês — sem contar o overage automático que aparece no fim do mês quando alguém esquece um log verboso ligado.",[11,2759,2760],{},"A diferença não é pequena: em um ano, a stack open-source self-hosted economiza entre R$12.000 e R$18.000. Pra startup em estágio inicial, isso é meio engenheiro júnior.",[30,2762,2764],{"id":2763},"tabela-de-portas-recursos-e-caracteristicas-por-componente","Tabela de portas, recursos e características por componente",[2700,2766,2767,2789],{},[2703,2768,2769],{},[2706,2770,2771,2774,2777,2780,2783,2786],{},[2709,2772,2773],{},"Componente",[2709,2775,2776],{},"Porta",[2709,2778,2779],{},"RAM mínima",[2709,2781,2782],{},"Disco",[2709,2784,2785],{},"Retenção default",[2709,2787,2788],{},"Formato dos dados",[2716,2790,2791,2810,2829,2847,2866,2882],{},[2706,2792,2793,2795,2798,2801,2804,2807],{},[2721,2794,74],{},[2721,2796,2797],{},"9090",[2721,2799,2800],{},"512 MB",[2721,2802,2803],{},"10 GB",[2721,2805,2806],{},"15 dias",[2721,2808,2809],{},"TSDB binário",[2706,2811,2812,2814,2817,2820,2823,2826],{},[2721,2813,80],{},[2721,2815,2816],{},"3000",[2721,2818,2819],{},"256 MB",[2721,2821,2822],{},"1 GB",[2721,2824,2825],{},"N\u002FA",[2721,2827,2828],{},"SQLite ou Postgres",[2706,2830,2831,2833,2836,2838,2841,2844],{},[2721,2832,86],{},[2721,2834,2835],{},"3100",[2721,2837,2800],{},[2721,2839,2840],{},"30 GB",[2721,2842,2843],{},"30 dias (configurável)",[2721,2845,2846],{},"chunks comprimidos",[2706,2848,2849,2852,2855,2858,2861,2863],{},[2721,2850,2851],{},"Promtail \u002F Agent",[2721,2853,2854],{},"9080",[2721,2856,2857],{},"128 MB",[2721,2859,2860],{},"mínimo",[2721,2862,2825],{},[2721,2864,2865],{},"passa por valor",[2706,2867,2868,2870,2873,2875,2877,2879],{},[2721,2869,104],{},[2721,2871,2872],{},"9093",[2721,2874,2857],{},[2721,2876,2822],{},[2721,2878,2825],{},[2721,2880,2881],{},"log de notificações",[2706,2883,2884,2886,2889,2892,2894,2896],{},[2721,2885,98],{},[2721,2887,2888],{},"9100",[2721,2890,2891],{},"64 MB",[2721,2893,2860],{},[2721,2895,2825],{},[2721,2897,2898],{},"endpoint de scrape",[11,2900,2901],{},"Essas são as mínimas viáveis pra cluster pequeno. Em produção com 30 servidores e tráfego real, multiplique RAM por 3 e disco por 5.",[30,2903,2905],{"id":2904},"os-quatro-erros-que-matam-stack-de-monitoring-nova","Os quatro erros que matam stack de monitoring nova",[11,2907,2908],{},"Times montando observabilidade pela primeira vez tropeçam quase sempre nos mesmos quatro erros. Saber sobre eles antes economiza meses.",[11,2910,2911,2914,2915,2918],{},[38,2912,2913],{},"Não monitorar o monitoring."," O Prometheus parou de scrape na quinta-feira; ninguém viu. Na quarta-feira da semana seguinte um servidor caiu de verdade e descobriram que não tinha alerta porque o Prometheus estava morto há 6 dias. Solução: configure um cron externo simples (até um Pingdom gratuito serve) que bate em ",[15,2916,2917],{},"https:\u002F\u002Fmonitor.seudominio.com\u002Fapi\u002Fhealth"," a cada 5 minutos e te avisa quando o próprio Grafana cair.",[11,2920,2921,2924,2925,2928],{},[38,2922,2923],{},"Sem estratégia de retenção."," Disco enche em três meses, Prometheus para de gravar, alguém deleta tudo no desespero, perde 90 dias de histórico. Configure ",[15,2926,2927],{},"--storage.tsdb.retention.time=30d"," desde o dia um e estabeleça um job de housekeeping.",[11,2930,2931,2934,2935,18,2937,2940],{},[38,2932,2933],{},"Cardinalidade alta em labels."," Já cobrimos no passo 9, mas vale repetir: cada ",[15,2936,2654],{},[15,2938,2939],{},"request_id"," ou UUID que vira label é um número que multiplica explosivamente o consumo de RAM do Prometheus. Identificadores únicos vão pra Loki, não pro Prometheus.",[11,2942,2943,2946],{},[38,2944,2945],{},"Alertas barulhentos."," O time recebe 200 alertas por dia. Em duas semanas, ninguém olha mais. Quando o site cair de verdade, o alerta vai estar no meio de outros 199. Solução: comece com seis alertas (os do passo 7), audite a cada duas semanas, e exclua tudo que disparou mas não exigiu ação humana. Alerta sem ação é ruído.",[30,2948,2950],{"id":2949},"faq","FAQ",[11,2952,2953,2956],{},[38,2954,2955],{},"Posso rodar tudo num VPS de 2 GB?","\nTecnicamente sim, pra cluster de até 3 servidores e poucas aplicações. Na prática você vai bater no teto de RAM em 2 a 3 meses, especialmente se importar dashboards densos no Grafana. Pague os 50 reais a mais e vá direto pro VPS de 4 GB — o tempo que você economiza não brigando com OOM kills paga sozinho.",[11,2958,2959,2962],{},[38,2960,2961],{},"Quanto de disco pra 30 dias de logs?","\nDepende totalmente do volume de logs da sua aplicação. Regra grosseira pra startup pequena: cluster de 4 servidores com aplicações web normais gera 1 a 5 GB de logs por dia depois de compressão do Loki. Trinta dias dá entre 30 e 150 GB. Comece com 50 GB de SSD, monitore o crescimento por duas semanas, expanda se necessário. Se você for muito mais que isso, é hora de ir pra object storage.",[11,2964,2965,2968],{},[38,2966,2967],{},"Grafana Cloud vs self-hosted, qual escolher?","\nGrafana Cloud free tier é generoso (10k séries, 50 GB de logs, retenção de 14 dias) e elimina o trabalho de manter o servidor. Pra projeto solo ou time muito pequeno, faz sentido. A partir do momento que você passa do free tier, os preços escalam rápido — a partir de US$50\u002Fmês — e você perde o controle sobre os dados. Self-hosted custa hardware + tempo, Cloud custa dinheiro + lock-in. Pra empresa que pretende crescer e tem um dev DevOps no time, self-hosted ganha.",[11,2970,2971,2974],{},[38,2972,2973],{},"Promtail ou Grafana Agent?","\nEm 2026, o Grafana Agent (rebatizado pra Grafana Alloy) está substituindo o Promtail oficialmente. Pra setup novo, vá direto de Alloy. Pra setup que já roda Promtail há tempo, não tem urgência em migrar — o Promtail vai continuar funcionando por anos.",[11,2976,2977,2980,2981,2983],{},[38,2978,2979],{},"OpenTelemetry encaixa onde nessa stack?","\nOTel é o padrão de instrumentação de aplicação que está consolidando. Em vez de usar ",[15,2982,2576],{}," direto, você usa o SDK do OTel e ele exporta pra Prometheus, Loki e Tempo simultaneamente. A vantagem grande é portabilidade — se você quiser trocar Prometheus por outra coisa daqui a 3 anos, sua aplicação não muda uma linha. Pra startup começando hoje, recomendamos OTel desde o dia um.",[11,2985,2986,2989,2990,2993,2994,2997],{},[38,2987,2988],{},"Como faço backup do Prometheus?","\nPrometheus tem snapshot via API: ",[15,2991,2992],{},"curl -X POST localhost:9090\u002Fapi\u002Fv1\u002Fadmin\u002Ftsdb\u002Fsnapshot"," cria um snapshot no diretório de dados. Faça isso uma vez por dia via cron, faça ",[15,2995,2996],{},"tar.gz"," e envie pra object storage. Em caso de desastre, o que você perde é métricas — e métricas, diferente de logs, são tipicamente recuperáveis em horas (volta a coletar e os dashboards voltam). Logs perdidos são perdidos pra sempre, então invista mais em backup de Loki.",[11,2999,3000,3003],{},[38,3001,3002],{},"Tempo (traces distribuídos) vale instalar agora?","\nNão. Traces ficam úteis a partir do momento que você tem 5+ serviços conversando entre si e debugar latência envolve seguir uma requisição por vários hops. Pra arquitetura monolítica ou poucos serviços, traces dão trabalho desproporcional ao valor. Adicione quando a complexidade pedir.",[11,3005,3006,3009],{},[38,3007,3008],{},"Loki indexa full-text como ELK?","\nNão, e essa é a feature, não bug. Loki indexa apenas labels (job, host, container, severity) e o conteúdo do log fica comprimido sem índice. Pra buscar texto, você filtra por labels primeiro e depois faz grep nos chunks resultantes. Isso é o que torna o Loki dez vezes mais barato que ELK em RAM e CPU. Em troca, queries de texto livre em todo o histórico são mais lentas. Pra 90% dos casos de debugging, filtrar por job + host + janela de tempo já reduz pra dezenas de MB onde o grep voa.",[30,3011,3013],{"id":3012},"proximos-passos","Próximos passos",[11,3015,3016],{},"Subiu a stack, tem dashboard, tem alerta, tem log pesquisável? Boa. As próximas três coisas que valem o investimento são, em ordem:",[182,3018,3019,3025,3039],{},[70,3020,3021,3024],{},[38,3022,3023],{},"Custom dashboards por aplicação"," — métricas de negócio (assinaturas criadas\u002Fhora, jobs processados, fila de e-mails) em vez de só infraestrutura.",[70,3026,3027,3030,3031,3034,3035,3038],{},[38,3028,3029],{},"Runbooks linkados nos alertas"," — toda regra em ",[15,3032,3033],{},"alerts.yml"," deve ter ",[15,3036,3037],{},"annotations.runbook_url"," apontando pra uma página explicando o que fazer. Quando o alerta dispara às 3 da manhã, o sono não pensa.",[70,3040,3041,3044],{},[38,3042,3043],{},"Revisão mensal de alertas"," — 30 minutos uma vez por mês auditando o que disparou no mês anterior, deletando o que virou ruído, ajustando thresholds.",[11,3046,3047,3048,3053,3054,121],{},"Pra quem quer ir além e entender por que escolhemos essa stack em vez de SaaS gerenciado, leia ",[3049,3050,3052],"a",{"href":3051},"\u002Fblog\u002Fobservabilidade-sem-datadog-stack-startup","Observabilidade sem Datadog: a stack da startup brasileira",". E pra fechar o ciclo de operação — porque não adianta saber que o banco caiu se você não consegue restaurar — vale ler ",[3049,3055,3057],{"href":3056},"\u002Fblog\u002Fbackup-banco-em-cluster-estrategias-3-da-manha","Backup de banco em cluster: estratégias pras 3 da manhã",[11,3059,3060],{},"Se você quer pular essa montagem toda e rodar a stack como job dentro de um orquestrador que já cuida de TLS, rolling deploy e replicação de volume:",[241,3062,3064],{"className":685,"code":3063,"language":687,"meta":249,"style":249},"curl -sSL get.heroctl.com\u002Finstall.sh | sh\n",[15,3065,3066],{"__ignoreMap":249},[265,3067,3068,3071,3074,3077,3080],{"class":267,"line":268},[265,3069,3070],{"class":703},"curl",[265,3072,3073],{"class":694}," -sSL",[265,3075,3076],{"class":296}," get.heroctl.com\u002Finstall.sh",[265,3078,3079],{"class":1104}," |",[265,3081,3082],{"class":703}," sh\n",[11,3084,3085],{},"Quatro horas viram quarenta minutos. O resto é o mesmo trabalho de pensar quais alertas importam — e nessa parte ninguém te livra.",[3087,3088,3089],"style",{},"html pre.shiki code .sFSAA, html code.shiki .sFSAA{--shiki-default:#79C0FF}html pre.shiki code .s9uIt, html code.shiki .s9uIt{--shiki-default:#A5D6FF}html pre.shiki code .sQhOw, html code.shiki .sQhOw{--shiki-default:#FFA657}html pre.shiki code .sH3jZ, html code.shiki .sH3jZ{--shiki-default:#8B949E}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html pre.shiki code .suJrU, html code.shiki .suJrU{--shiki-default:#FF7B72}html pre.shiki code .sZEs4, html code.shiki .sZEs4{--shiki-default:#E6EDF3}html pre.shiki code .sPWt5, html code.shiki .sPWt5{--shiki-default:#7EE787}",{"title":249,"searchDepth":279,"depth":279,"links":3091},[3092,3093,3094,3095,3096,3097,3098,3099,3100,3101,3102,3103,3104,3105,3106,3107,3108,3109],{"id":32,"depth":279,"text":33},{"id":61,"depth":279,"text":62},{"id":124,"depth":279,"text":125},{"id":167,"depth":279,"text":168},{"id":226,"depth":279,"text":227},{"id":752,"depth":279,"text":753},{"id":1076,"depth":279,"text":1077},{"id":1246,"depth":279,"text":1247},{"id":1726,"depth":279,"text":1727},{"id":1900,"depth":279,"text":1901},{"id":2509,"depth":279,"text":2510},{"id":2554,"depth":279,"text":2555},{"id":2658,"depth":279,"text":2659},{"id":2697,"depth":279,"text":2698},{"id":2763,"depth":279,"text":2764},{"id":2904,"depth":279,"text":2905},{"id":2949,"depth":279,"text":2950},{"id":3012,"depth":279,"text":3013},"engenharia",null,"2026-05-12","Tutorial honesto pra subir métricas, logs e dashboards pro seu cluster — em 4 horas, sem Datadog. Stack open-source que cabe em 1 VPS de R$80\u002Fmês.",false,"md",{},"\u002Fblog\u002Fmonitoring-stack-completa-prometheus-grafana-loki-passo-a-passo","16 min",{"title":5,"description":3113},{"loc":3117},"blog\u002Fmonitoring-stack-completa-prometheus-grafana-loki-passo-a-passo",[3123,3124,3125,3126,3127,3110],"prometheus","grafana","loki","monitoring","tutorial","oC9lCwAyyAdpz2EHoBKkH49q8NBMUQp9-nioO314jWo",[3130,3137],{"title":3131,"path":3132,"stem":3133,"description":3134,"date":3135,"category":3136,"children":-1},"Migrar do Heroku pra cluster próprio: guia técnico em 5 passos","\u002Fblog\u002Fmigrar-do-heroku-guia-tecnico","blog\u002Fmigrar-do-heroku-guia-tecnico","O fim do plano gratuito do Heroku em novembro\u002F2022 transformou migração em prioridade pra centenas de times brasileiros. Plano detalhado com checklist, tempo estimado, e armadilhas comuns.","2026-03-11","caso-de-uso",{"title":3138,"path":3139,"stem":3140,"description":3141,"date":3142,"category":3110,"children":-1},"Multi-tenant SaaS com isolamento real: 3 padrões e quando cada um vira pesadelo","\u002Fblog\u002Fmulti-tenant-saas-isolamento-real","blog\u002Fmulti-tenant-saas-isolamento-real","Pool, schema-per-tenant, app-per-tenant. Cada padrão tem benefícios óbvios e custos invisíveis. Como decidir antes do primeiro cliente B2B sério perguntar 'meus dados estão isolados?'.","2026-04-01",{"path":3144},"\u002Fen\u002Fblog\u002Fmonitoring-stack-prometheus-grafana-loki",1777362207907]