44 Commits

Author SHA1 Message Date
e35415d3d1 Merge branch 'main' of https://github.com/KerradKerridi/prod 2026-02-01 22:31:08 +03:00
25dd64fc01 feat: add coverage test targets for Telegram bot and AnonBot in Makefile 2026-02-01 22:31:03 +03:00
ANDREY KATYKHIN
51c2a562fa Merge pull request #5 from KerradKerridi/dev-5
refactor: simplified the deploy.yml and ci.yml scripts
2026-01-25 22:26:16 +03:00
4d328444bd refactor: simplified the deploy.yml and ci.yml scripts 2026-01-25 22:24:12 +03:00
804ecd6107 remove: delete health check step from deploy workflow 2026-01-25 20:51:27 +03:00
d736688c62 fix: increase container wait time, fix status variable name, fix delays array for zsh 2026-01-25 20:43:12 +03:00
1bfe772a0d fix: use flock directly with file instead of file descriptor for zsh compatibility 2026-01-25 20:36:07 +03:00
e360e5e215 fix: replace exec 200 with flock -x 9 for zsh compatibility 2026-01-25 20:33:08 +03:00
76cb533851 fix: use exec for flock file descriptors to work with zsh 2026-01-25 20:27:55 +03:00
30465e0bea debug: add more verbose logging for secrets in deploy steps 2026-01-25 20:24:06 +03:00
0a73f9844e fix: pass secrets directly to SSH scripts instead of using env 2026-01-25 20:14:49 +03:00
2ee1977956 feat: add workflow_dispatch to deploy.yml and debug secrets 2026-01-25 20:09:18 +03:00
220b24e867 Merge branch 'dev-4' 2026-01-25 20:08:34 +03:00
fb33da172a debug: add secrets availability check in deploy workflow 2026-01-25 20:08:25 +03:00
ANDREY KATYKHIN
9baee2ceb7 Merge pull request #4 from KerradKerridi/dev-4
Merge dev-4 into main
2026-01-25 19:58:22 +03:00
60487b5488 some fix agaaain 2026-01-25 19:51:23 +03:00
07982ee0f2 some fix 3 2026-01-25 19:24:55 +03:00
6c51a82dce some fix 2 2026-01-25 19:14:07 +03:00
5e57e5214c some fix CI 2026-01-25 19:08:24 +03:00
8e595bf7f2 chore: remove outdated monitoring documentation files
- Deleted FIX_PROMLEMS.md and MONITORING_AUTH.md as they contained obsolete information regarding Prometheus and Alertmanager configurations.
- This cleanup helps streamline the documentation and focuses on current setup practices.
2026-01-25 19:02:46 +03:00
34b0345983 some fix 2026-01-25 18:50:18 +03:00
1dceab6479 chore: update Docker Compose and CI/CD pipeline
- Docker Compose now uses GitHub Secrets for bot tokens (takes priority over .env)
- Added manual rollback with an optional target commit
- Implemented health checks with exponential backoff
- Improved rollback notifications
2026-01-25 18:33:58 +03:00
0cdc40cd21 chore: enhance deployment workflow with improved health checks and manual trigger
- Updated the deployment job to allow manual triggering via workflow_dispatch.
- Implemented a retry mechanism for health checks on Prometheus and Grafana to improve reliability.
- Increased wait time for services to start before health checks are performed.
- Modified health check messages for better clarity and added logging for failed checks.
2026-01-25 16:58:16 +03:00
fde1f14708 chore: update CI/CD pipeline configuration for improved branch handling
- Renamed the CI/CD pipeline for clarity and consistency.
- Updated the branch triggers to include 'dev-*' for better integration of development branches.
- Removed the URL setting for the production environment to streamline the deployment process.
2026-01-25 15:52:02 +03:00
5a0c2d6942 chore: remove CI and deployment workflows to streamline processes
- Deleted outdated CI workflow file to simplify the continuous integration process.
- Removed deployment workflow file to eliminate redundancy and focus on a more efficient deployment strategy.
2026-01-25 15:46:58 +03:00
153a7d4807 chore: refine CI and deployment workflows with enhanced notifications and checks
- Improved CI workflow notifications for better clarity on test results.
- Added a status check job in the deployment workflow to ensure only successful builds are deployed.
- Updated deployment notification messages for improved context and clarity.
2026-01-25 15:44:21 +03:00
0944175807 chore: enhance CI and deployment workflows with status checks and notifications
- Updated CI workflow to provide clearer notifications on test results and deployment readiness.
- Added a new job in the deployment workflow to check the status of the last CI run before proceeding with deployment, ensuring that only successful builds are deployed.
2026-01-25 15:39:19 +03:00
3ee72ec48a chore: update CI and deployment workflows for improved notifications and permissions
- Upgraded the upload-artifact action from v3 to v4 in CI workflow for better performance.
- Added a notification step in the CI workflow to send test results via Telegram, including job status and repository details.
- Modified the deployment workflow to ensure correct file permissions before and after code updates.
- Renamed the deployment notification step for clarity and included a link to the action run details in the message.
2026-01-25 15:35:56 +03:00
dd8b1c02a4 chore: update Python version in Dockerfile and improve test commands in Makefile
- Upgraded Python version in Dockerfile from 3.9 to 3.11.9 for enhanced performance and security.
- Adjusted paths in Dockerfile to reflect the new Python version.
- Modified test commands in Makefile to activate the virtual environment before running tests, ensuring proper dependency management.
2026-01-25 15:27:57 +03:00
9e03c1f6f2 chore: optimize resource allocation and memory settings in Docker Compose
- Added memory and CPU limits and reservations for Prometheus, Grafana, and Uptime Kuma services to enhance performance and resource management.
- Updated Prometheus and Grafana configurations with new storage block duration settings for improved memory optimization.
- Revised README to include additional commands for running specific services and restarting containers.
2026-01-23 21:38:48 +03:00
75cd722cc4 fix: update htpasswd generation for monitoring and status page
- Modified the htpasswd command to limit the password length to 72 characters for security compliance.
- Added a new task to generate an htpasswd hash specifically for the status page.
- Updated the task that creates the htpasswd file to use the output from the new hash generation.
2026-01-22 22:38:01 +03:00
95fabdc0d1 refactor: consolidate Nginx configurations into a single file
- Merged individual Nginx configuration files for Grafana, Prometheus, and Alertmanager into a unified nginx.conf.
- Added location blocks for Grafana, Prometheus, and Alertmanager with appropriate proxy settings, authentication, and rate limiting.
- Removed obsolete configuration files to streamline the Nginx setup and improve maintainability.
2025-09-20 01:14:10 +03:00
8be219778c chore: update configuration files for improved logging and service management
- Enhanced .dockerignore to exclude bot logs, Docker volumes, and temporary files.
- Updated .gitignore to include Ansible vars files for better environment management.
- Modified docker-compose.yml health checks to use curl for service verification.
- Refined Ansible playbook by adding tasks for creating default Zsh configuration files and cleaning up temporary files.
- Improved Nginx configuration to support Uptime Kuma with specific location blocks for status and dashboard, including rate limiting and WebSocket support.
2025-09-19 16:40:40 +03:00
a075ef6772 chore: remove specific version reference for telegram-helper-bot in Ansible playbook
- Eliminated the hardcoded version 'dev-9' for the telegram-helper-bot repository in the Ansible playbook to allow for more flexible updates.
2025-09-19 13:03:25 +03:00
8595fc5886 refactor: streamline Ansible playbook and logrotate configurations
- Removed environment variable lookups for logrotate settings in logrotate configuration files, replacing them with hardcoded values.
- Updated the Ansible playbook to simplify project root, deploy user, and old server configurations by removing environment variable dependencies.
- Added tasks to copy Zsh configuration files from an old server to the new server, ensuring proper permissions and cleanup of temporary files.
- Enhanced logrotate configurations for bots and system logs to ensure consistent management of log files.
2025-09-19 13:00:19 +03:00
f7b08ae9e8 feat: enhance Ansible playbook and Nginx configuration with authentication and logrotate setup
- Added environment variables for project configuration in env.template.
- Updated Ansible playbook to use environment variables for project settings and added tasks for monitoring authentication setup.
- Enhanced Nginx configuration for Alertmanager and Prometheus with HTTP Basic Authentication.
- Introduced logrotate configuration for managing log files and set up cron for daily execution.
- Removed obsolete Uptime Kuma docker-compose file.
2025-09-19 12:09:05 +03:00
1eb11e454d chore: remove Nginx service from docker-compose and update Ansible inventory with new server IP
- Deleted the Nginx service configuration from docker-compose.yml.
- Updated the Ansible inventory file to reflect a new server IP address.
2025-09-19 02:21:57 +03:00
14b19699c5 feat: enhance Ansible playbook with project directory permissions and service checks
- Add tasks to set directory permissions for the project before and after cloning.
- Introduce a task to reload the SSH service to apply new configurations.
- Implement a check for Node Exporter metrics availability.
- Update Prometheus configuration comment for clarity on Node Exporter target.
2025-09-19 01:56:12 +03:00
1db579797d refactor: update Nginx configuration and Docker setup
- Change user directive in Nginx configuration from 'nginx' to 'www-data'.
- Update upstream server configurations in Nginx to use 'localhost' instead of service names.
- Modify Nginx server block to redirect HTTP to a status page instead of Grafana.
- Rename Alertmanager location from '/alertmanager/' to '/alerts/' for consistency.
- Remove deprecated status page configuration and related files.
- Adjust Prometheus configuration to reflect the new Docker network settings.
2025-09-18 21:21:23 +03:00
9ec3f02767 feat: integrate Uptime Kuma and Alertmanager into Docker setup
- Add Uptime Kuma service for status monitoring with health checks.
- Introduce Alertmanager service for alert management and notifications.
- Update docker-compose.yml to include new services and their configurations.
- Enhance Makefile with commands for managing Uptime Kuma and Alertmanager logs.
- Modify Ansible playbook to install necessary packages and configure SSL for new services.
- Update Nginx configuration to route traffic to Uptime Kuma and Alertmanager.
- Adjust Prometheus configuration to include alert rules and external URLs.
2025-09-16 21:50:56 +03:00
5e10204137 Merge branch 'main' of https://github.com/KerradKerridi/prod 2025-09-16 18:52:53 +03:00
5b8833a67f Merge branch 'main' of https://github.com/KerradKerridi/prod 2025-09-16 18:52:24 +03:00
2661b3865e fix: update Dockerfile reference in docker-compose and add versioning to Ansible playbook
- Change Dockerfile reference in docker-compose.yml from Dockerfile.bot to Dockerfile
- Add versioning comment for the telegram-helper-bot repository in playbook.yml
2025-09-16 18:51:05 +03:00
ANDREY KATYKHIN
539c074e9f Merge pull request #3 from KerradKerridi/dev-3
Dev 3
2025-09-16 18:32:23 +03:00
37 changed files with 5338 additions and 805 deletions


@@ -0,0 +1,409 @@
---
name: prod-project-rules
description: Rules for working with the prod project - infrastructure, bots, CI/CD
---
# Rules for working with the prod project
## 📋 Project overview
**prod** is a project for managing Telegram bots and monitoring infrastructure in production.
### Main components:
- **Monitoring infrastructure**: Prometheus, Grafana, Alertmanager, Uptime Kuma
- **Telegram bots**: telegram-helper-bot, AnonBot (in separate subdirectories)
- **CI/CD**: GitHub Actions with automated testing, PR creation, and deployment
- **Containerization**: Docker Compose for service orchestration
---
## 🌿 Branches and Git
### Branch structure:
- **`main`**: production branch; protected, changes land only via PR
- **`develop`**: development branch (optional)
- **`dev-*`**: development branches (e.g., `dev-4`)
- **`feature/**`**: branches for new features
### Development workflow:
1. **Create a development branch:**
```bash
git checkout -b dev-4  # or feature/my-feature
```
2. **Before committing, check code quality:**
```bash
make code-quality  # Checks formatting, imports, linting
# Or fix issues automatically:
make format      # Fix formatting
make import-fix  # Fix import sorting
```
3. **Commit and push:**
```bash
git add .
git commit -m "feat: description of the changes"
git push -u origin dev-4
```
4. **Automatic actions after the push:**
- ✅ Tests run (Black, isort, flake8, pytest)
- ✅ If the tests pass, a PR to `main` is created/updated automatically
- ✅ A Telegram notification is sent
5. **After the PR is merged into `main`:**
- ✅ Production deployment starts automatically (`deploy.yml`)
- ✅ Bot tokens are validated
- ✅ The code is deployed to the server
- ✅ Health checks and smoke tests run
- ✅ If the smoke tests fail, an automatic rollback is performed
---
## 🎨 Code standards
### Formatting (Black):
- **Mandatory**: All Python files must be formatted with Black
- **Check**: `make format-check` or `black --check .`
- **Fix**: `make format` or `black .`
- **Rules**:
- Double quotes `"` instead of single quotes `'`
- 2 blank lines between imports and function/class definitions
- Long lines are wrapped automatically
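A tiny before/after illustration of those Black rules (a hypothetical snippet, not project code):

```python
# Before Black (single quotes, cramped spacing, no blank lines):
#   import os
#   def greet(name):return 'hi '+name

import os


# After Black: double quotes, two blank lines before top-level
# definitions, normalized spacing around operators.
def greet(name):
    return "hi " + name


print(greet(os.path.basename("/tmp/prod")))  # hi prod
```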
### Import sorting (isort):
- **Mandatory**: Imports must be sorted
- **Check**: `make import-check` or `isort --check-only .`
- **Fix**: `make import-fix` or `isort .`
- **Order**: standard library → third-party → local
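For instance, a file following that isort order might begin like this (the third-party and local imports are illustrative and commented out so the snippet stays self-contained):

```python
# 1) Standard library
import json
import os

# 2) Third-party packages (illustrative)
# import requests

# 3) Local project modules (illustrative)
# from bots.telegram_helper_bot import config

# Groups are separated by blank lines; names are sorted within each group.
print(json.dumps({"ordered": True}))  # {"ordered": true}
```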
### Linting (flake8):
- **Critical errors** (E9, F63, F7, F82) block the pipeline
- **Warnings** (F821, F822, F824) are ignored in CI
- **Check**: `make lint-check`
- **Exclusions**: `.venv`, `venv`, `__pycache__`, `.git`
### Before committing:
```bash
make code-quality  # Runs every check at once
```
---
## 🧪 Testing
### Test layout:
- **`tests/`**: project infrastructure tests
- **`bots/*/tests/`**: bot tests (kept in their own repositories)
### Running tests:
```bash
make test           # All tests
make test-infra     # Infrastructure tests only
make test-coverage  # With a coverage report
make test-clean     # Clear caches and reports
```
### pytest configuration:
- File: `pytest.ini` in the project root
- Tests are auto-discovered in `tests/`
- Markers: `slow`, `integration`, `unit`
- Asyncio mode: auto
### Rules for writing tests:
- Use descriptive names: `test_prometheus_config_is_valid`
- Group related tests into classes
- Use fixtures for shared setup/teardown
- Tests must be independent and idempotent
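Applied to those rules, a minimal test might look like this (a sketch only; the real checks live in `tests/infra/`, and the inlined config keys are assumptions rather than the project's actual Prometheus config):

```python
# Descriptive name, independent, idempotent: the test builds its own input.
def test_prometheus_config_is_valid():
    # In the real suite this would be parsed from infra/prometheus/;
    # here a config dict is inlined so the example is self-contained.
    config = {
        "global": {"scrape_interval": "15s"},
        "scrape_configs": [{"job_name": "node", "static_configs": []}],
    }
    assert "global" in config
    assert all("job_name" in job for job in config["scrape_configs"])


test_prometheus_config_is_valid()
print("ok")
```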
---
## 🐳 Docker and containerization
### Layout:
- **`docker-compose.yml`**: main orchestration file
- **`Dockerfile`**: base image (if needed)
- **`bots/*/Dockerfile`**: Dockerfile for each bot
### Services in docker-compose:
- `prometheus`: metrics collection (port 9090)
- `grafana`: dashboards (port 3000)
- `alertmanager`: alert management (port 9093)
- `uptime-kuma`: availability monitoring (port 3001)
- `telegram-bot`: Telegram Helper Bot (port 8080)
- `anon-bot`: AnonBot (port 8081)
### Commands:
```bash
make build     # Build all containers
make up        # Start all services
make down      # Stop all services
make restart   # Restart all services
make logs      # Logs for all services
make logs-bot  # Telegram bot logs
```
### Important rules:
- **Bot tokens**: Taken from GitHub Secrets via environment variables
- **Format**: `TELEGRAM_BOT_TOKEN=${TELEGRAM_BOT_TOKEN:-${BOT_TOKEN}}` (Secrets take priority)
- **Build**: Use `docker-compose build --pull` (not `--no-cache`) to keep builds fast
- **Graceful shutdown**: `docker-compose down -t 30` for a clean stop
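The `${VAR:-fallback}` expansion in that format resolves to the first non-empty value. The same precedence can be sketched in Python (a minimal illustration, not project code — `resolve_bot_token` is a hypothetical helper):

```python
def resolve_bot_token(env):
    """Mimic ${TELEGRAM_BOT_TOKEN:-${BOT_TOKEN}}: the Secrets-provided
    variable wins; the .env-provided BOT_TOKEN is only a fallback.
    Like the shell form, an empty string counts as unset."""
    return env.get("TELEGRAM_BOT_TOKEN") or env.get("BOT_TOKEN")


# The Secrets value takes priority over the .env fallback
print(resolve_bot_token({"TELEGRAM_BOT_TOKEN": "secret", "BOT_TOKEN": "local"}))  # secret
print(resolve_bot_token({"BOT_TOKEN": "local"}))  # local
```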
---
## 🔐 Security and secrets
### GitHub Secrets (required):
- `TELEGRAM_BOT_TOKEN`: Telegram Helper Bot token
- `TELEGRAM_TEST_BOT_TOKEN`: test bot token (optional)
- `ANON_BOT_TOKEN`: AnonBot token
- `SSH_PRIVATE_KEY`: private key for SSH access to the server
- `SERVER_HOST`, `SERVER_USER`, `SSH_PORT`: server connection details
- `TELEGRAM_CHAT_ID`: chat ID for notifications
### Local development:
- Use `.env` files for local variables
- `.env` files are in `.gitignore`: never commit them!
- In production, tokens from Secrets take priority over `.env`
### Rules:
- ❌ **DO NOT commit** tokens, passwords, or secrets
- ❌ **DO NOT commit** `.env` files
- ✅ Use `env.template` as a template
- ✅ Store all secrets in GitHub Secrets
---
## 🚀 CI/CD Pipeline
### Two main workflows:
#### 1. `pipeline.yml` (CI):
- **Trigger**: Push to `main`, `develop`, `dev-*`, `feature/**`
- **Jobs**:
- `test`: code quality checks and tests
- `create-pr`: automatic PR creation/update (only for `dev-*` and `feature/**`)
- `rollback`: manual rollback via `workflow_dispatch`
#### 2. `deploy.yml` (CD):
- **Trigger**: PR merged into `main`
- **Jobs**:
- `deploy`: deployment to the server
- `smoke-tests`: verification that the bots work
- `auto-rollback`: automatic rollback when smoke tests fail
### Working with the pipeline:
1. **Before pushing**: always run `make code-quality` locally
2. **After the tests pass**: the PR is created/updated automatically
3. **After the PR is merged**: deployment starts automatically
4. **If something breaks**: use manual rollback via Actions → Run workflow
---
## 📁 Project structure
```
prod/
├── .github/workflows/        # CI/CD pipelines
│   ├── pipeline.yml          # CI: tests, PR creation
│   └── deploy.yml            # CD: deployment, smoke tests, rollback
├── bots/                     # Bot directory (submodules)
│   ├── telegram-helper-bot/  # Telegram Helper Bot
│   └── AnonBot/              # AnonBot
├── infra/                    # Infrastructure
│   ├── prometheus/           # Prometheus configuration
│   ├── grafana/              # Grafana dashboards and provisioning
│   ├── alertmanager/         # Alertmanager configuration
│   ├── nginx/                # Nginx configuration
│   └── ansible/              # Ansible playbooks
├── scripts/                  # Deployment scripts
├── tests/                    # Infrastructure tests
│   └── infra/                # Infrastructure tests
├── docker-compose.yml        # Docker Compose configuration
├── Makefile                  # Project management commands
├── pytest.ini                # pytest configuration
└── README.md                 # Project documentation
```
### Key files:
- **`docker-compose.yml`**: main service configuration
- **`Makefile`**: development and management commands
- **`pytest.ini`**: test configuration
- **`.gitignore`**: excludes `.env`, `.venv`, logs, and caches
---
## 🔧 Development
### Local setup:
1. **Clone and configure:**
```bash
git clone <repo>
cd prod
cp env.template .env
# Edit .env with your local values
```
2. **Install dependencies:**
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install black isort flake8 pytest
```
3. **Check before committing:**
```bash
make code-quality
```
### Working with the bots:
- Bots live in `bots/` as separate repositories (submodules or clones)
- Each bot has its own Dockerfile
- Bot tokens are passed via environment variables in docker-compose
---
## 📝 Commits and PRs
### Commit format:
Use clear messages:
```
feat: added feature X
fix: fixed bug Y
chore: updated dependencies
docs: updated documentation
refactor: refactored module Z
```
### Pull requests:
- **Automatic creation**: for `dev-*` and `feature/**` branches after the tests pass
- **Updates**: the PR is updated automatically on new commits to the same branch
- **Merge**: merging into `main` triggers an automatic deployment
---
## 🚨 Deployment and rollback
### Automatic deployment:
1. PR merged into `main` → `deploy.yml` starts
2. Bot tokens are validated
3. Code is deployed to the server (via SSH)
4. Containers are rebuilt with `--pull`
5. Health checks with exponential retry
6. Smoke tests (messages are sent to Telegram)
7. On success, the deploy history is updated
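The exponential retry in step 5 can be sketched as follows (an illustration only; the real logic lives in `deploy.yml`, and `check()` here stands in for an HTTP probe of Prometheus or Grafana):

```python
import time


def wait_healthy(check, attempts=5, base_delay=1.0):
    """Retry check() with exponentially growing delays: 1s, 2s, 4s, ..."""
    for attempt in range(attempts):
        if check():
            return True
        time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    return False


# Example: a probe that only succeeds on the third call
calls = {"n": 0}


def flaky_probe():
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_healthy(flaky_probe, attempts=4, base_delay=0.01))  # True
```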
### Automatic rollback:
- Triggers when the smoke tests fail
- Rolls back to the last successful commit from the deploy history
- Containers are rebuilt
- Health checks are re-run
### Manual rollback:
- Actions → CI & CD pipeline → Run workflow
- Choose `rollback` and optionally provide a commit hash
- If no commit is given, the last successful deployment is used
---
## 🛠️ Useful Makefile commands
### Code quality:
```bash
make code-quality  # All checks (Black, isort, flake8)
make format        # Auto-fix formatting
make import-fix    # Auto-fix imports
make format-diff   # Show what would change
```
### Docker:
```bash
make build     # Build containers
make up        # Start services
make down      # Stop services
make restart   # Restart
make logs      # Logs for all services
make logs-bot  # Bot logs
```
### Testing:
```bash
make test           # All tests
make test-infra     # Infrastructure tests
make test-coverage  # With coverage
make test-clean     # Clear caches
```
### Monitoring:
```bash
make monitoring  # Open Grafana
make prometheus  # Open Prometheus
make status      # Container status
make health      # Health checks
```
---
## ⚠️ Important notes
### Never:
- ❌ Commit `.env` files with secrets
- ❌ Commit bot tokens into the code
- ❌ Use `docker-compose build --no-cache` unless necessary
- ❌ Push directly to `main` (PRs only)
- ❌ Ignore formatting errors before committing
### Always:
- ✅ Run `make code-quality` before committing
- ✅ Use `dev-*` or `feature/**` branches for development
- ✅ Check that the tests pass locally
- ✅ Use GitHub Secrets for tokens in production
- ✅ Check the logs after deploying
---
## 📚 Additional resources
- **README.md**: main project documentation
- **`.cursor/rules/release-notes-template.md`**: Release Notes template
- **`pytest.ini`**: test configuration
- **`Makefile`**: all available commands (`make help`)
---
## 🔄 Workflow diagram
```
1. Create a branch (dev-* or feature/**)
2. Develop + run local checks (make code-quality)
3. Git commit + push
4. GitHub Actions: automated tests
5. On success: PR created/updated automatically
6. Manual review and merge into main
7. GitHub Actions: automatic deployment
8. Health checks + smoke tests
9. On success: ✅ deployment complete
   On failure: 🔄 automatic rollback
```
---
## 💡 Tips
1. **Use the Makefile**: every command is there, so you don't need to memorize long invocations
2. **Check locally**: run `make code-quality` before every commit
3. **Watch the notifications**: Telegram notifications show the deployment status
4. **Use the right branches**: `dev-*` branches get automatic PR creation
5. **Read the logs**: when something breaks, check the logs in GitHub Actions and on the server


@@ -0,0 +1,124 @@
# Release Notes guide
## Purpose
This document describes the structure and format for Release Notes files (e.g., `docs/RELEASE_NOTES_DEV-XX.md`).
## Document structure
### 1. Title
```markdown
# Release Notes: [branch-name]
```
### 2. Overview
A short paragraph (1-2 sentences) describing:
- The number of commits in the branch
- The main areas of change
**Format:**
```markdown
## Overview
Branch [name] contains [N] commits with key improvements: [brief list of the main changes].
```
### 3. Key changes
The main section, with numbered subsections for each significant change.
**Structure of each subsection:**
```markdown
### [Number]. [Change title]
**Commit:** `[hash]`
**What was done:**
- [Short description of change 1]
- [Short description of change 2]
- [Short description of change 3]
```
**Rules:**
- One change = one subsection
- Titles must be short and clear
- Use bulleted lists in "What was done"
- Do NOT list the affected files
- Do NOT include line-count statistics
- Focus on the substance of the change, not technical details
- Separate subsections with a horizontal rule `---`
### 4. Key achievements
A checkbox section summarizing the release.
**Format:**
```markdown
## 🎯 Key achievements
✅ [Achievement 1]
✅ [Achievement 2]
✅ [Achievement 3]
```
**Rules:**
- Use the ✅ emoji for each achievement
- One achievement per line
- Keep wording short (3-5 words)
- Focus on the key features and improvements
### 5. Development timeline
A section with information about the development timeframe.
**Format:**
```markdown
## 📅 Development timeline
**Latest changes:** [date]
**Main development:** [period]
**Previous improvements:** [context from previous branches/changes]
**Commit chronology:**
- `[hash]` - [date and time] - [short description]
- `[hash]` - [date and time] - [short description]
```
**Rules:**
- Use the real dates from the commits
- Date format: "DD Month YYYY" (e.g., "25 January 2026")
- Time format: "HH:MM"
- The chronology must run in chronological order (oldest to newest)
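Those date and time formats can be produced from a commit timestamp like this (a small illustration; in practice the timestamps would come from `git log`, and `release_notes_date` is a hypothetical helper):

```python
from datetime import datetime


def release_notes_date(ts):
    """Format an ISO-8601 commit timestamp as ("DD Month YYYY", "HH:MM")."""
    dt = datetime.fromisoformat(ts)
    return dt.strftime("%d %B %Y"), dt.strftime("%H:%M")


print(release_notes_date("2026-01-25T19:51:23+03:00"))  # ('25 January 2026', '19:51')
```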
## Writing style
### General rules:
- **Brevity**: focus on the substance; avoid excessive detail
- **Clarity**: use simple, plain wording
- **Structure**: information should be easy to read and scan
- **No implementation details**: don't list files, classes, or methods (unless one is the key feature)
- **No statistics**: don't include line or file counts
### Language:
- Use the past tense to describe changes ("Added", "Implemented", "Updated")
- Avoid technical jargon unless it is necessary
- Use the active voice
### Emoji:
- 🔥 for the "Key changes" section
- 🎯 for the "Key achievements" section
- 📅 for the "Development timeline" section
- ✅ for achievement checkboxes
## Usage example
When creating Release Notes for a new branch:
1. Get the commit list: `git log [base-branch]..[target-branch] --oneline`
2. Create a subsection in "Key changes" for each significant commit
3. Collect the highlights into the "Key achievements" section
4. Add a timeline with the real commit dates
5. Check that the document follows the structure and style
## Important notes
- **Do NOT include** commits that were already in the base branch (master/main)
- **Do NOT list** every changed file
- **Do NOT include** line-count statistics
- **Focus** on functional changes, not implementation details
- Use the **real dates** from the commits, not guessed ones


@@ -3,6 +3,10 @@
# Ignore ALL bots - they are not needed in this container
bots/
# Ignore bot logs
bots/*/logs/
bots/*/logs/**
# Ignore ALL hidden files and folders (except .gitignore)
.*
!.gitignore
@@ -45,3 +49,29 @@ data/
*.bin
*.dat
*.model
# Ignore Docker volume data
/var/lib/docker/volumes/
uptime_kuma_data/
prometheus_data/
grafana_data/
alertmanager_data/
# Ignore temporary Docker files
.docker/
# Ignore databases and data files
*.db
*.db-shm
*.db-wal
*.sqlite
*.sqlite3
# Ignore backup files
*.backup
*.bak
*.old
# Ignore migration logs and temporary scripts
migration.log
fix_dates.log

.github/workflows/ci.yml (new file, 99 lines)

@@ -0,0 +1,99 @@
name: CI pipeline
on:
  push:
    branches: [ 'dev-*', 'feature/**' ]
  workflow_dispatch:
    inputs:
      action:
        description: 'Action to perform'
        required: true
        type: choice
jobs:
  test:
    runs-on: ubuntu-latest
    name: Test & Code Quality
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Python 3.11
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          cache: 'pip'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r tests/infra/requirements-test.txt
          pip install flake8 black isort mypy || true
      - name: Code formatting check (Black)
        run: |
          echo "🔍 Checking code formatting with Black..."
          black --check . || (echo "❌ Code formatting issues found. Run 'black .' to fix." && exit 1)
      - name: Import sorting check (isort)
        run: |
          echo "🔍 Checking import sorting with isort..."
          isort --check-only . || (echo "❌ Import sorting issues found. Run 'isort .' to fix." && exit 1)
      - name: Linting (flake8) - Critical errors
        run: |
          echo "🔍 Running flake8 linter (critical errors only)..."
          flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
      - name: Linting (flake8) - Warnings
        run: |
          echo "🔍 Running flake8 linter (warnings)..."
          flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics || true
        continue-on-error: true
      - name: Run infrastructure tests
        run: |
          python -m pytest tests/infra/ -v --tb=short
      - name: Validate Prometheus config
        run: |
          python -m pytest tests/infra/test_prometheus_config.py -v
      - name: Send test success notification
        if: success()
        uses: appleboy/telegram-action@v1.0.0
        with:
          to: ${{ secrets.TELEGRAM_CHAT_ID }}
          token: ${{ secrets.TELEGRAM_BOT_TOKEN }}
          message: |
            ✅ CI Tests Passed
            📦 Repository: prod
            🌿 Branch: ${{ github.ref_name }}
            📝 Commit: ${{ github.sha }}
            👤 Author: ${{ github.actor }}
            ✅ All tests passed! Code quality checks completed successfully.
            🔗 View details: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
        continue-on-error: true
      - name: Send test failure notification
        if: failure()
        uses: appleboy/telegram-action@v1.0.0
        with:
          to: ${{ secrets.TELEGRAM_CHAT_ID }}
          token: ${{ secrets.TELEGRAM_BOT_TOKEN }}
          message: |
            ❌ CI Tests Failed
            📦 Repository: prod
            🌿 Branch: ${{ github.ref_name }}
            📝 Commit: ${{ github.sha }}
            👤 Author: ${{ github.actor }}
            ❌ Tests failed! Deployment blocked. Please fix the issues and try again.
            🔗 View details: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
        continue-on-error: true

.github/workflows/deploy.yml (new file, 276 lines)

@@ -0,0 +1,276 @@
name: Deploy to Production
on:
  push:
    branches: [ main ]
  workflow_dispatch:
    inputs:
      action:
        description: 'Action to perform'
        required: true
        type: choice
        options:
          - deploy
          - rollback
      rollback_commit:
        description: 'Commit hash to rollback to (optional, uses last successful if empty)'
        required: false
        type: string
jobs:
  deploy:
    runs-on: ubuntu-latest
    name: Deploy to Production
    if: |
      github.event_name == 'push' ||
      (github.event_name == 'workflow_dispatch' && github.event.inputs.action == 'deploy')
    concurrency:
      group: production-deploy
      cancel-in-progress: false
    environment:
      name: production
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          ref: main
      - name: Deploy to server
        uses: appleboy/ssh-action@v1.0.0
        with:
          host: ${{ vars.SERVER_HOST || secrets.SERVER_HOST }}
          username: ${{ vars.SERVER_USER || secrets.SERVER_USER }}
          key: ${{ secrets.SSH_PRIVATE_KEY }}
          port: ${{ vars.SSH_PORT || secrets.SSH_PORT || 22 }}
          script: |
            set -e
            export TELEGRAM_BOT_TOKEN="${{ secrets.TELEGRAM_BOT_TOKEN }}"
            export TELEGRAM_TEST_BOT_TOKEN="${{ secrets.TELEGRAM_TEST_BOT_TOKEN }}"
            export ANON_BOT_TOKEN="${{ secrets.ANON_BOT_TOKEN }}"
            echo "🚀 Starting deployment to production..."
            cd /home/prod
            # Save information about the current commit
            CURRENT_COMMIT=$(git rev-parse HEAD)
            COMMIT_MESSAGE=$(git log -1 --pretty=format:"%s" || echo "Unknown")
            COMMIT_AUTHOR=$(git log -1 --pretty=format:"%an" || echo "Unknown")
            TIMESTAMP=$(date +"%Y-%m-%d %H:%M:%S")
            echo "📝 Current commit: $CURRENT_COMMIT"
            echo "📝 Commit message: $COMMIT_MESSAGE"
            echo "📝 Author: $COMMIT_AUTHOR"
            # Record the deploy in the history file
            HISTORY_FILE="/home/prod/.deploy_history.txt"
            HISTORY_SIZE="${DEPLOY_HISTORY_SIZE:-10}"
            echo "${TIMESTAMP}|${CURRENT_COMMIT}|${COMMIT_MESSAGE}|${COMMIT_AUTHOR}|deploying" >> "$HISTORY_FILE"
            tail -n "$HISTORY_SIZE" "$HISTORY_FILE" > "${HISTORY_FILE}.tmp" && mv "${HISTORY_FILE}.tmp" "$HISTORY_FILE"
            # Update the code
            echo "📥 Pulling latest changes from main..."
            sudo chown -R deploy:deploy /home/prod/bots || true
            git fetch origin main
            git reset --hard origin/main
            sudo chown -R deploy:deploy /home/prod/bots || true
            NEW_COMMIT=$(git rev-parse HEAD)
            echo "✅ Code updated: $CURRENT_COMMIT → $NEW_COMMIT"
            # Validate the docker-compose configuration
            echo "🔍 Validating docker-compose configuration..."
            docker-compose config > /dev/null || exit 1
            echo "✅ docker-compose.yml is valid"
            # Check disk space
            MIN_FREE_GB=5
            AVAILABLE_SPACE=$(df -BG /home/prod 2>/dev/null | tail -1 | awk '{print $4}' | sed 's/G//' || echo "0")
            echo "💾 Available disk space: ${AVAILABLE_SPACE}GB"
            if [ "$AVAILABLE_SPACE" -lt "$MIN_FREE_GB" ]; then
              echo "⚠️ Insufficient disk space! Cleaning up Docker resources..."
              docker system prune -f --volumes || true
            fi
            # Rebuild and start containers (bots are skipped to speed up the deploy)
            echo "🔨 Rebuilding infrastructure containers (excluding bots)..."
            docker-compose stop prometheus grafana uptime-kuma alertmanager || true
            export TELEGRAM_BOT_TOKEN TELEGRAM_TEST_BOT_TOKEN ANON_BOT_TOKEN
            docker-compose build --pull prometheus grafana uptime-kuma alertmanager
            docker-compose up -d prometheus grafana uptime-kuma alertmanager
            echo "✅ Infrastructure containers rebuilt and started (bots remain running)"
      - name: Update deploy history
        if: always()
        uses: appleboy/ssh-action@v1.0.0
        with:
          host: ${{ vars.SERVER_HOST || secrets.SERVER_HOST }}
          username: ${{ vars.SERVER_USER || secrets.SERVER_USER }}
          key: ${{ secrets.SSH_PRIVATE_KEY }}
          port: ${{ vars.SSH_PORT || secrets.SSH_PORT || 22 }}
          script: |
            HISTORY_FILE="/home/prod/.deploy_history.txt"
            if [ -f "$HISTORY_FILE" ]; then
              DEPLOY_STATUS="failed"
              if [ "${{ job.status }}" = "success" ]; then
                DEPLOY_STATUS="success"
              fi
              sed -i '$s/|deploying$/|'"$DEPLOY_STATUS"'/' "$HISTORY_FILE"
              echo "✅ Deploy history updated: $DEPLOY_STATUS"
            fi
      - name: Send deployment notification
        if: always()
        uses: appleboy/telegram-action@v1.0.0
        with:
          to: ${{ secrets.TELEGRAM_CHAT_ID }}
          token: ${{ secrets.TELEGRAM_BOT_TOKEN }}
          message: |
            ${{ job.status == 'success' && '✅' || '❌' }} Deployment: ${{ job.status }}
            📦 Repository: prod
            🌿 Branch: main
            📝 Commit: ${{ github.event.pull_request.merge_commit_sha || github.sha }}
            👤 Author: ${{ github.event.pull_request.user.login || github.actor }}
${{ github.event.pull_request.number && format('🔀 PR: #{0}', github.event.pull_request.number) || '' }}
${{ job.status == 'success' && '✅ Deployment successful! Containers started.' || '❌ Deployment failed! Check logs for details.' }}
🔗 View details: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
continue-on-error: true
rollback:
runs-on: ubuntu-latest
name: Rollback to Previous Version
if: |
github.event_name == 'workflow_dispatch' &&
github.event.inputs.action == 'rollback'
environment:
name: production
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
ref: main
- name: Rollback on server
uses: appleboy/ssh-action@v1.0.0
with:
host: ${{ vars.SERVER_HOST || secrets.SERVER_HOST }}
username: ${{ vars.SERVER_USER || secrets.SERVER_USER }}
key: ${{ secrets.SSH_PRIVATE_KEY }}
port: ${{ vars.SSH_PORT || secrets.SSH_PORT || 22 }}
script: |
set -e
export TELEGRAM_BOT_TOKEN="${{ secrets.TELEGRAM_BOT_TOKEN }}"
export TELEGRAM_TEST_BOT_TOKEN="${{ secrets.TELEGRAM_TEST_BOT_TOKEN }}"
export ANON_BOT_TOKEN="${{ secrets.ANON_BOT_TOKEN }}"
echo "🔄 Starting rollback..."
cd /home/prod
# Определяем коммит для отката
ROLLBACK_COMMIT="${{ github.event.inputs.rollback_commit }}"
HISTORY_FILE="/home/prod/.deploy_history.txt"
if [ -z "$ROLLBACK_COMMIT" ]; then
echo "📝 No commit specified, finding last successful deploy..."
if [ -f "$HISTORY_FILE" ]; then
ROLLBACK_COMMIT=$(grep "|success$" "$HISTORY_FILE" | tail -1 | cut -d'|' -f2 || echo "")
fi
if [ -z "$ROLLBACK_COMMIT" ]; then
echo "❌ No successful deploy found in history!"
echo "💡 Please specify commit hash manually or check deploy history"
exit 1
fi
fi
echo "📝 Rolling back to commit: $ROLLBACK_COMMIT"
# Проверяем, что коммит существует
if ! git cat-file -e "$ROLLBACK_COMMIT" 2>/dev/null; then
echo "❌ Commit $ROLLBACK_COMMIT not found!"
exit 1
fi
# Сохраняем текущий коммит
CURRENT_COMMIT=$(git rev-parse HEAD)
COMMIT_MESSAGE=$(git log -1 --pretty=format:"%s" "$ROLLBACK_COMMIT" || echo "Rollback")
TIMESTAMP=$(date +"%Y-%m-%d %H:%M:%S")
echo "📝 Current commit: $CURRENT_COMMIT"
echo "📝 Target commit: $ROLLBACK_COMMIT"
echo "📝 Commit message: $COMMIT_MESSAGE"
# Исправляем права перед откатом
sudo chown -R deploy:deploy /home/prod/bots || true
# Откатываем код
echo "🔄 Rolling back code..."
git fetch origin main
git reset --hard "$ROLLBACK_COMMIT"
# Исправляем права после отката
sudo chown -R deploy:deploy /home/prod/bots || true
echo "✅ Code rolled back: $CURRENT_COMMIT → $ROLLBACK_COMMIT"
# Валидация docker-compose
echo "🔍 Validating docker-compose configuration..."
docker-compose config > /dev/null || exit 1
echo "✅ docker-compose.yml is valid"
# Проверка дискового пространства
MIN_FREE_GB=5
AVAILABLE_SPACE=$(df -BG /home/prod 2>/dev/null | tail -1 | awk '{print $4}' | sed 's/G//' || echo "0")
echo "💾 Available disk space: ${AVAILABLE_SPACE}GB"
if [ "$AVAILABLE_SPACE" -lt "$MIN_FREE_GB" ]; then
echo "⚠️ Insufficient disk space! Cleaning up Docker resources..."
docker system prune -f --volumes || true
fi
# Пересобираем и запускаем контейнеры (кроме ботов для ускорения отката)
echo "🔨 Rebuilding infrastructure containers (excluding bots)..."
docker-compose stop prometheus grafana uptime-kuma alertmanager || true
export TELEGRAM_BOT_TOKEN TELEGRAM_TEST_BOT_TOKEN ANON_BOT_TOKEN
docker-compose build --pull prometheus grafana uptime-kuma alertmanager
docker-compose up -d prometheus grafana uptime-kuma alertmanager
echo "✅ Infrastructure containers rebuilt and started (bots remain running)"
# Записываем в историю
echo "${TIMESTAMP}|${ROLLBACK_COMMIT}|Rollback to: ${COMMIT_MESSAGE}|github-actions|rolled_back" >> "$HISTORY_FILE"
HISTORY_SIZE="${DEPLOY_HISTORY_SIZE:-10}"
tail -n "$HISTORY_SIZE" "$HISTORY_FILE" > "${HISTORY_FILE}.tmp" && mv "${HISTORY_FILE}.tmp" "$HISTORY_FILE"
echo "✅ Rollback completed successfully"
- name: Send rollback notification
if: always()
uses: appleboy/telegram-action@v1.0.0
with:
to: ${{ secrets.TELEGRAM_CHAT_ID }}
token: ${{ secrets.TELEGRAM_BOT_TOKEN }}
message: |
${{ job.status == 'success' && '🔄' || '❌' }} Rollback: ${{ job.status }}
📦 Repository: prod
🌿 Branch: main
📝 Rolled back to: ${{ github.event.inputs.rollback_commit || 'Last successful commit' }}
👤 Triggered by: ${{ github.actor }}
${{ job.status == 'success' && '✅ Rollback completed successfully! Services restored to previous version.' || '❌ Rollback failed! Check logs for details.' }}
🔗 View details: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
continue-on-error: true
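The deploy-history file is the contract between the two jobs above: `deploy` appends `timestamp|commit|message|author|status` and then patches the trailing status, while `rollback` picks the newest `|success` entry. The workflow does this in shell (`tail`, `sed`, `grep`/`cut`); the sketch below restates the same bookkeeping in Python so the round trip is easy to see, with made-up sample hashes:

```python
# Deploy-history format used by the workflow: timestamp|commit|message|author|status
HISTORY_SIZE = 10  # mirrors the DEPLOY_HISTORY_SIZE default


def append_entry(history: list[str], entry: str, size: int = HISTORY_SIZE) -> list[str]:
    """Append a line and keep only the newest `size` lines (the tail -n step)."""
    history = history + [entry]
    return history[-size:]


def mark_last(history: list[str], status: str) -> list[str]:
    """Patch a trailing 'deploying' marker, like the sed '$s/|deploying$/...' step."""
    if history and history[-1].endswith("|deploying"):
        history = history[:-1] + [history[-1].rsplit("|", 1)[0] + "|" + status]
    return history


def last_successful_commit(history: list[str]) -> str:
    """The rollback job's lookup: newest line ending in |success, second field."""
    for line in reversed(history):
        if line.endswith("|success"):
            return line.split("|")[1]
    return ""


hist: list[str] = []
hist = append_entry(hist, "2026-01-25 19:00:00|aaa111|feat: first|alice|deploying")
hist = mark_last(hist, "success")
hist = append_entry(hist, "2026-01-25 20:00:00|bbb222|fix: second|bob|deploying")
hist = mark_last(hist, "failed")
print(last_successful_commit(hist))  # the commit a rollback would target
```

Note that a failed entry never shadows an earlier success: the lookup scans from the newest line backwards and stops at the first `|success`.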

.gitignore

@@ -69,3 +69,7 @@ build/
 # Ansible inventory files (contain sensitive server info)
 infra/ansible/inventory.ini
 infra/ansible/inventory_*.ini
+
+# Ansible vars files (contain passwords)
+infra/ansible/vars.yml
+infra/ansible/vars_*.yml


@@ -1,7 +1,7 @@
 ###########################################
 # Stage 1: Builder
 ###########################################
-FROM python:3.9-slim as builder
+FROM python:3.11.9-slim as builder

 # Install ONLY exactly what is needed for compilation
 RUN apt-get update && apt-get install --no-install-recommends -y \
@@ -20,7 +20,7 @@ RUN pip install --no-cache-dir --target /install -r requirements.txt
 ###########################################
 # Stage 2: Final image (Runtime)
 ###########################################
-FROM python:3.9-alpine as runtime
+FROM python:3.11.9-alpine as runtime

 # Alpine Linux has its own packages: apk instead of apt.
 # Install minimal runtime dependencies
@@ -33,14 +33,14 @@ RUN addgroup -g 1000 app && \
 WORKDIR /app

 # Copy dependencies from the builder (if any)
-COPY --from=builder --chown=1000:1000 /install /usr/local/lib/python3.9/site-packages
+COPY --from=builder --chown=1000:1000 /install /usr/local/lib/python3.11/site-packages

 # Copy the source code
 COPY --chown=1000:1000 . .
 USER 1000

 # Important: explicitly point Python at the copied dependencies directory
-ENV PYTHONPATH="/usr/local/lib/python3.9/site-packages:${PYTHONPATH}"
+ENV PYTHONPATH="/usr/local/lib/python3.11/site-packages:${PYTHONPATH}"

 # Keep a basic default command for compatibility
 CMD ["python", "-c", "print('Dockerfile is ready to use')"]
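The diff above has to touch three places for one version bump because the `site-packages` path embeds the interpreter's minor version: copying into `python3.9/site-packages` while running 3.11 would leave the dependencies invisible. A small sketch of why the path is version-specific (the helper function is illustrative, not part of the repo; the layout shown is the one used by the official `python:*-slim`/`*-alpine` images):

```python
# site-packages lives under a directory named after the interpreter's
# major.minor version, so a 3.9 -> 3.11 base-image bump changes the path
# that COPY and PYTHONPATH must target.
import sys


def site_packages_path(major: int, minor: int) -> str:
    # Layout used by the official Docker Python images (illustrative helper)
    return f"/usr/local/lib/python{major}.{minor}/site-packages"


# Path for the interpreter actually running this script
current = site_packages_path(*sys.version_info[:2])
print(current)

# The stale path the old Dockerfile would have kept pointing at
print(site_packages_path(3, 9))
```

Keeping the builder and runtime stages on the same minor version (here both 3.11.9) is what makes the cross-stage `COPY` of compiled packages safe.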

Makefile

@@ -1,4 +1,4 @@
-.PHONY: help build up down logs clean restart status deploy backup restore update clean-monitoring monitoring check-deps check-bot-deps check-anonBot-deps
+.PHONY: help build up down logs clean restart status deploy backup restore update clean-monitoring monitoring check-deps check-bot-deps check-anonBot-deps auth-setup auth-add-user auth-reset format-check format format-diff import-check import-fix lint-check code-quality

 help: ## Show help
 	@echo "🏗️ Production Infrastructure - Available commands:"
@@ -9,6 +9,8 @@ help: ## Show help
 	@echo "📊 Monitoring:"
 	@echo "  Prometheus: http://localhost:9090"
 	@echo "  Grafana: http://localhost:3000 (admin/admin)"
+	@echo "  Uptime Kuma: http://localhost:3001"
+	@echo "  Alertmanager: http://localhost:9093"
 	@echo "  Server Monitor: http://localhost:9091/health"
 	@echo "  Bot Health: http://localhost:8080/health"
 	@echo "  AnonBot Health: http://localhost:8081/health"
@@ -37,6 +39,12 @@ logs-bot: ## Show Telegram bot logs
 logs-anonBot: ## Show AnonBot logs
 	docker-compose logs -f anon-bot
+logs-uptime-kuma: ## Show Uptime Kuma logs
+	docker-compose logs -f uptime-kuma
+logs-alertmanager: ## Show Alertmanager logs
+	docker-compose logs -f alertmanager
 restart: ## Restart all services
 	docker-compose down
 	docker-compose build --no-cache
@@ -54,6 +62,12 @@ restart-bot: ## Restart only the Telegram bot
 restart-anonBot: ## Restart only AnonBot
 	docker-compose restart anon-bot
+restart-uptime-kuma: ## Restart only Uptime Kuma
+	docker-compose restart uptime-kuma
+restart-alertmanager: ## Restart only Alertmanager
+	docker-compose restart alertmanager
 status: ## Show container status
 	docker-compose ps
@@ -63,6 +77,8 @@ health: ## Check service health
 	@curl -f http://localhost:8081/health || echo "❌ AnonBot health check failed"
 	@curl -f http://localhost:9090/-/healthy || echo "❌ Prometheus health check failed"
 	@curl -f http://localhost:3000/api/health || echo "❌ Grafana health check failed"
+	@curl -f http://localhost:3001 || echo "❌ Uptime Kuma health check failed"
+	@curl -f http://localhost:9093/-/healthy || echo "❌ Alertmanager health check failed"
 	@curl -f http://localhost:9091/health || echo "❌ Server monitor health check failed"
 deploy: ## Full production deploy
@@ -98,7 +114,7 @@ clean: ## Remove all containers and images
 clean-monitoring: ## Clean only monitoring data
 	docker-compose down -v
-	docker volume rm prod_prometheus_data prod_grafana_data 2>/dev/null || true
+	docker volume rm prod_prometheus_data prod_grafana_data prod_uptime_kuma_data prod_alertmanager_data 2>/dev/null || true
 security-scan: ## Scan images for vulnerabilities
 	@echo "🔍 Scanning Docker images for vulnerabilities..."
@@ -120,6 +136,8 @@ start: build up ## Build and start all services
 	@echo "🏗️ Production Infrastructure is up!"
 	@echo "📊 Prometheus: http://localhost:9090"
 	@echo "📈 Grafana: http://localhost:3000 (admin/admin)"
+	@echo "📊 Uptime Kuma: http://localhost:3001"
+	@echo "🚨 Alertmanager: http://localhost:9093"
 	@echo "🤖 Bot Health: http://localhost:8080/health"
 	@echo "🔒 AnonBot Health: http://localhost:8081/health"
 	@echo "📡 Server Monitor: http://localhost:9091/health"
@@ -151,16 +169,29 @@ test-all: ## Run all tests in one process
 test-infra: check-deps ## Run infrastructure tests
 	@echo "🏗️ Running infrastructure tests..."
-	@python3 -m pytest tests/infra/ -v
+	@source .venv/bin/activate && python3 -m pytest tests/infra/ -v
 test-bot: check-bot-deps ## Run Telegram bot tests
 	@echo "🤖 Running Telegram bot tests..."
 	@cd bots/telegram-helper-bot && source .venv/bin/activate && python3 -m pytest tests/ -v
+test-bot-coverage: check-bot-deps ## Run Telegram bot tests with a coverage report
+	@echo "🤖 Running Telegram bot tests..."
+	@cd bots/telegram-helper-bot && source .venv/bin/activate && python3 -m pytest tests/ --cov=helper_bot --cov-report=term-missing --cov-report=html:htmlcov/bot
+	@echo "📊 Coverage reports saved to htmlcov/"
+	@echo "  - Telegram bot: $(shell python3 count_tests.py | head -2 | tail -1) tests"
 test-anonBot: check-anonBot-deps ## Run AnonBot tests
 	@echo "🔒 Running AnonBot tests..."
 	@cd bots/AnonBot && python3 -m pytest tests/ -v
+test-anonBot-coverage: check-anonBot-deps ## Run AnonBot tests with a coverage report
+	@echo "🔒 Running AnonBot tests..."
+	@cd bots/AnonBot && python3 -m pytest tests/ --cov=. --cov-report=term-missing --cov-report=html:htmlcov/anonbot
+	@echo "📊 Coverage reports saved to htmlcov/"
+	@echo "  - AnonBot: $(shell python3 count_tests.py | head -3 | tail -1) tests"
 test-coverage: check-deps check-bot-deps check-anonBot-deps ## Run all tests with a coverage report
 	@echo "📊 Running all tests with coverage..."
 	@echo "📈 Infrastructure coverage..."
@@ -192,6 +223,7 @@ test-clean: ## Clean up all test files and …
 	@find . -name "__pycache__" -type d -exec rm -rf {} + 2>/dev/null || true
 	@echo "✅ Test files cleaned"
+
 check-ports: ## Check occupied ports
 	@echo "🔍 Checking occupied ports..."
 	@echo "Port 3000 (Grafana):"
@@ -208,7 +240,7 @@ check-ports: ## Check occupied ports
 check-deps: ## Check infrastructure dependencies
 	@echo "🔍 Checking infrastructure dependencies..."
-	@python3 -c "import pytest" 2>/dev/null || (echo "❌ Infrastructure dependencies missing. Install them with: pip install pytest" && exit 1)
+	@source .venv/bin/activate && python3 -c "import pytest" 2>/dev/null || (echo "❌ Infrastructure dependencies missing. Install them with: source .venv/bin/activate && pip install pytest" && exit 1)
 	@echo "✅ Infrastructure dependencies installed"
 check-bot-deps: ## Check Telegram bot dependencies
@@ -242,3 +274,134 @@ reload-prometheus: ## Reload Prometheus configuration
 reload-grafana: ## Reload Grafana configuration
 	@echo "🔄 Reloading Grafana configuration..."
 	@docker-compose restart grafana
+ssl-setup: ## Set up SSL certificates (self-signed)
+	@echo "🔒 Setting up self-signed SSL certificates..."
+	@if [ -z "$(SERVER_IP)" ]; then echo "❌ Please set SERVER_IP variable in .env file"; exit 1; fi
+	@mkdir -p /etc/letsencrypt/live/$(SERVER_IP)
+	@openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
+		-keyout /etc/letsencrypt/live/$(SERVER_IP)/privkey.pem \
+		-out /etc/letsencrypt/live/$(SERVER_IP)/fullchain.pem \
+		-subj "/CN=$(SERVER_IP)"
+	@echo "✅ Self-signed certificate created for $(SERVER_IP)"
+ssl-renew: ## Renew SSL certificates
+	@echo "🔄 Renewing SSL certificates..."
+	@sudo /usr/local/bin/ssl-renewal.sh
+ssl-status: ## Check SSL certificate status
+	@echo "🔍 Checking SSL certificate status..."
+	@sudo certbot certificates
+uptime-kuma: ## Open Uptime Kuma in the browser
+	@echo "📊 Opening Uptime Kuma..."
+	@open http://localhost:3001 || xdg-open http://localhost:3001 || echo "Please open manually: http://localhost:3001"
+alertmanager: ## Open Alertmanager in the browser
+	@echo "🚨 Opening Alertmanager..."
+	@open http://localhost:9093 || xdg-open http://localhost:9093 || echo "Please open manually: http://localhost:9093"
+monitoring-all: ## Open all monitoring services
+	@echo "📊 Opening all monitoring services..."
+	@echo "  - Grafana: http://localhost:3000"
+	@echo "  - Prometheus: http://localhost:9090"
+	@echo "  - Uptime Kuma: http://localhost:3001"
+	@echo "  - Alertmanager: http://localhost:9093"
+	@open http://localhost:3000 || xdg-open http://localhost:3000 || echo "Please open manually"
+# ========================================
+# 🔐 MONITORING AUTHENTICATION
+# ========================================
+auth-setup: ## Set up authentication for monitoring
+	@echo "🔐 Setting up monitoring authentication..."
+	@sudo mkdir -p /etc/nginx/passwords
+	@sudo cp scripts/generate_auth_passwords.sh /usr/local/bin/generate_auth_passwords.sh
+	@sudo chmod +x /usr/local/bin/generate_auth_passwords.sh
+	@echo "✅ Authentication setup complete!"
+	@echo "💡 Use 'make auth-add-user' to add users"
+auth-add-user: ## Add a monitoring user (make auth-add-user USER=username)
+	@if [ -z "$(USER)" ]; then \
+		echo "❌ Please specify USER: make auth-add-user USER=username"; \
+		exit 1; \
+	fi
+	@echo "🔐 Adding user $(USER) for monitoring..."
+	@sudo /usr/local/bin/generate_auth_passwords.sh $(USER)
+	@echo "✅ User $(USER) added successfully!"
+auth-reset: ## Reset a user's password (make auth-reset USER=username)
+	@if [ -z "$(USER)" ]; then \
+		echo "❌ Please specify USER: make auth-reset USER=username"; \
+		exit 1; \
+	fi
+	@echo "🔐 Resetting password for user $(USER)..."
+	@sudo htpasswd /etc/nginx/passwords/monitoring.htpasswd $(USER)
+	@echo "✅ Password reset for user $(USER)!"
+auth-list: ## List monitoring users
+	@echo "👥 Monitoring users:"
+	@sudo cat /etc/nginx/passwords/monitoring.htpasswd 2>/dev/null | cut -d: -f1 || echo "❌ No users found"
+# ========================================
+# Code Quality & Formatting
+# ========================================
+format-check: ## Check code formatting (Black)
+	@echo "🔍 Checking code formatting with Black..."
+	@if [ -f .venv/bin/python ]; then \
+		.venv/bin/python -m black --check . || (echo "❌ Code formatting issues found. Run 'make format' to fix." && exit 1); \
+	else \
+		python3 -m black --check . || (echo "❌ Code formatting issues found. Run 'make format' to fix." && exit 1); \
+	fi
+	@echo "✅ Code formatting is correct!"
+format: ## Auto-format the code (Black)
+	@echo "🎨 Formatting code with Black..."
+	@if [ -f .venv/bin/python ]; then \
+		.venv/bin/python -m black .; \
+	else \
+		python3 -m black .; \
+	fi
+	@echo "✅ Code formatted!"
+format-diff: ## Show what Black would change (without applying)
+	@echo "📋 Showing Black diff (no changes applied)..."
+	@if [ -f .venv/bin/python ]; then \
+		.venv/bin/python -m black --diff .; \
+	else \
+		python3 -m black --diff .; \
+	fi
+import-check: ## Check import sorting (isort)
+	@echo "🔍 Checking import sorting with isort..."
+	@if [ -f .venv/bin/python ]; then \
+		.venv/bin/python -m isort --check-only . || (echo "❌ Import sorting issues found. Run 'make import-fix' to fix." && exit 1); \
+	else \
+		python3 -m isort --check-only . || (echo "❌ Import sorting issues found. Run 'make import-fix' to fix." && exit 1); \
+	fi
+	@echo "✅ Import sorting is correct!"
+import-fix: ## Auto-fix import sorting (isort)
+	@echo "📦 Fixing import sorting with isort..."
+	@if [ -f .venv/bin/python ]; then \
+		.venv/bin/python -m isort .; \
+	else \
+		python3 -m isort .; \
+	fi
+	@echo "✅ Imports sorted!"
+lint-check: ## Lint the code (flake8): critical errors only
+	@echo "🔍 Running flake8 linter (critical errors only)..."
+	@if [ -f .venv/bin/python ]; then \
+		.venv/bin/python -m flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics --exclude=".venv,venv,__pycache__,.git,*.pyc" || true; \
+	else \
+		python3 -m flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics --exclude=".venv,venv,__pycache__,.git,*.pyc" || true; \
+	fi
+	@echo "✅ Linting check completed (non-critical warnings in dependencies ignored)!"
+code-quality: format-check import-check lint-check ## Check code quality (all checks)
+	@echo ""
+	@echo "✅ All code quality checks passed!"
+	@echo ""
+	@echo "  Note: F821/F822/F824 warnings in bots/ are non-critical and ignored in CI"
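The new `format*`, `import*`, and `lint-check` targets all repeat one shell pattern: use the project virtualenv's interpreter when `.venv/bin/python` exists, otherwise fall back to the system `python3`. The same selection logic, restated in Python for clarity (the function is illustrative, not part of the repo):

```python
# Mirrors the Makefile's `if [ -f .venv/bin/python ]; then ... else python3 ...`
# fallback: prefer the project virtualenv's interpreter when it exists.
from pathlib import Path


def pick_interpreter(project_root: str) -> str:
    venv_python = Path(project_root) / ".venv" / "bin" / "python"
    return str(venv_python) if venv_python.is_file() else "python3"


# With no .venv present, the fallback wins
print(pick_interpreter("/nonexistent-project"))
```

Checking for the interpreter file (rather than `source .venv/bin/activate`) keeps each recipe line a single `/bin/sh`-compatible command, which is why the pattern recurs verbatim across targets.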


@@ -19,10 +19,6 @@ prod/
 ## 🚀 Quick start

-### ⚠️ Important note
-**Make sure you delete the `docker-compose.yml` file from the `bots/telegram-helper-bot/` folder**
-to avoid port conflicts. Use only the root `docker-compose.yml`.
-
 ### 1. Configure environment variables
 Copy the template and configure the variables:
@@ -57,12 +53,25 @@ GRAFANA_ADMIN_PASSWORD=admin
 docker-compose up -d
 ```
+
+### 2.1 Start only the main bot (with its dependencies). AnonBot can be substituted
+```bash
+docker-compose up -d prometheus telegram-bot
+```
+
 ### 3. Check the status
 ```bash
 docker-compose ps
 ```
+
+### 4. Restart a container
+```bash
+docker-compose down telegram-bot && docker-compose build --no-cache telegram-bot && docker-compose up -d telegram-bot
+```
+
 ## 📊 Services

 - **Prometheus** (port 9090) - metrics collection


@@ -12,18 +12,31 @@ services:
       - '--web.console.templates=/etc/prometheus/consoles'
       - '--storage.tsdb.retention.time=${PROMETHEUS_RETENTION_DAYS:-30}d'
       - '--web.enable-lifecycle'
+      - '--web.external-url=https://${SERVER_IP}/prometheus/'
+      # Memory optimization
+      - '--storage.tsdb.max-block-duration=2h'
+      - '--storage.tsdb.min-block-duration=2h'
     ports:
       - "9090:9090"
     volumes:
       - ./infra/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
+      - ./infra/prometheus/alert_rules.yml:/etc/prometheus/alert_rules.yml:ro
       - prometheus_data:/prometheus
     networks:
       - bots_network
     healthcheck:
-      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:9090/-/healthy"]
+      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:9090/prometheus/-/healthy"]
       interval: 30s
       timeout: 10s
       retries: 3
+    deploy:
+      resources:
+        limits:
+          memory: 128M
+          cpus: '0.5'
+        reservations:
+          memory: 64M
+          cpus: '0.25'

   # Grafana Dashboard
   grafana:
@@ -35,9 +48,12 @@ services:
       - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD:-admin}
       - GF_USERS_ALLOW_SIGN_UP=false
       - GF_INSTALL_PLUGINS=grafana-clock-panel,grafana-simple-json-datasource
-      - GF_SERVER_ROOT_URL=https://${SERVER_IP:-localhost}/grafana/
+      - GF_SERVER_ROOT_URL=https://${SERVER_IP}/grafana/
       - GF_SERVER_SERVE_FROM_SUB_PATH=true
-      - GF_SERVER_DOMAIN=${SERVER_IP:-localhost}
+      # Memory optimization
+      - GF_DATABASE_MAX_IDLE_CONN=2
+      - GF_DATABASE_MAX_OPEN_CONN=5
+      - GF_DASHBOARDS_DEFAULT_HOME_DASHBOARD_PATH=/etc/grafana/provisioning/dashboards/node-exporter-full-dashboard.json
     ports:
       - "3000:3000"
     volumes:
@@ -52,27 +68,64 @@ services:
       interval: 30s
       timeout: 10s
       retries: 3
+    deploy:
+      resources:
+        limits:
+          memory: 200M
+          cpus: '0.5'
+        reservations:
+          memory: 100M
+          cpus: '0.25'

-  # Nginx Reverse Proxy
-  nginx:
-    image: nginx:alpine
-    container_name: bots_nginx
+  # Uptime Kuma Status Page
+  uptime-kuma:
+    image: louislam/uptime-kuma:latest
+    container_name: bots_uptime_kuma
     restart: unless-stopped
-    ports:
-      - "80:80"
-      - "443:443"
     volumes:
-      - ./infra/nginx/nginx.conf:/etc/nginx/nginx.conf:ro
-      - ./infra/nginx/conf.d:/etc/nginx/conf.d:ro
-      - ./infra/nginx/ssl:/etc/nginx/ssl:ro
-      - ./infra/nginx/.htpasswd:/etc/nginx/.htpasswd:ro
+      - uptime_kuma_data:/app/data
+    ports:
+      - "3001:3001"
+    environment:
+      - UPTIME_KUMA_PORT=3001
+    networks:
+      - bots_network
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:3001"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 40s
+    deploy:
+      resources:
+        limits:
+          memory: 150M
+          cpus: '0.5'
+        reservations:
+          memory: 80M
+          cpus: '0.25'

+  # Alertmanager
+  alertmanager:
+    image: prom/alertmanager:latest
+    container_name: bots_alertmanager
+    restart: unless-stopped
+    command:
+      - '--config.file=/etc/alertmanager/alertmanager.yml'
+      - '--storage.path=/alertmanager'
+      - '--web.external-url=https://${SERVER_IP}/alertmanager/'
+      - '--web.route-prefix=/'
+    ports:
+      - "9093:9093"
+    volumes:
+      - alertmanager_data:/alertmanager
+      - ./infra/alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
     networks:
       - bots_network
     depends_on:
-      - grafana
       - prometheus
     healthcheck:
-      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost/nginx-health"]
+      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:9093/-/healthy"]
       interval: 30s
       timeout: 10s
       retries: 3
@@ -95,10 +148,10 @@ services:
       - LOG_RETENTION_DAYS=${LOG_RETENTION_DAYS:-30}
       - METRICS_HOST=${METRICS_HOST:-0.0.0.0}
       - METRICS_PORT=${METRICS_PORT:-8080}
-      # Telegram settings
-      - TELEGRAM_BOT_TOKEN=${BOT_TOKEN}
-      - TELEGRAM_LISTEN_BOT_TOKEN=${LISTEN_BOT_TOKEN}
-      - TELEGRAM_TEST_BOT_TOKEN=${TEST_BOT_TOKEN}
+      # Telegram settings (tokens from GitHub Secrets take precedence over .env)
+      - TELEGRAM_BOT_TOKEN=${TELEGRAM_BOT_TOKEN:-${BOT_TOKEN}}
+      - TELEGRAM_LISTEN_BOT_TOKEN=${TELEGRAM_LISTEN_BOT_TOKEN:-${LISTEN_BOT_TOKEN}}
+      - TELEGRAM_TEST_BOT_TOKEN=${TELEGRAM_TEST_BOT_TOKEN:-${TEST_BOT_TOKEN}}
       - TELEGRAM_PREVIEW_LINK=${PREVIEW_LINK:-false}
       - TELEGRAM_MAIN_PUBLIC=${MAIN_PUBLIC}
       - TELEGRAM_GROUP_FOR_POSTS=${GROUP_FOR_POSTS}
@@ -152,8 +205,8 @@ services:
       - PYTHONUNBUFFERED=1
       - DOCKER_CONTAINER=true
       - LOG_LEVEL=${LOG_LEVEL:-INFO}
-      # AnonBot settings
-      - ANON_BOT_TOKEN=${BOT_TOKEN}
+      # AnonBot settings (the token from GitHub Secrets takes precedence over .env)
+      - ANON_BOT_TOKEN=${ANON_BOT_TOKEN:-${BOT_TOKEN}}
       - ANON_BOT_ADMINS=${ADMINS}
       - ANON_BOT_DATABASE_PATH=/app/database/anon_qna.db
       - ANON_BOT_DEBUG=${DEBUG:-false}
@@ -194,10 +247,15 @@ volumes:
     driver: local
   grafana_data:
     driver: local
+  uptime_kuma_data:
+    driver: local
+  alertmanager_data:
+    driver: local

 networks:
   bots_network:
     driver: bridge
     ipam:
       config:
-        - subnet: 192.168.100.0/24
+        - subnet: 172.20.0.0/16
+          gateway: 172.20.0.1
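The last hunk moves `bots_network` from `192.168.100.0/24` to `172.20.0.0/16` and pins an explicit gateway. Changes like this are easy to sanity-check with the standard `ipaddress` module, e.g. that the pinned gateway actually lies inside the new subnet and how much the address pool grows:

```python
# Sanity checks for the bots_network IPAM change in the compose diff above
import ipaddress

old_net = ipaddress.ip_network("192.168.100.0/24")
new_net = ipaddress.ip_network("172.20.0.0/16")
gateway = ipaddress.ip_address("172.20.0.1")

# The pinned gateway must fall inside the subnet it serves
assert gateway in new_net
assert gateway not in old_net

# A /16 offers 65536 addresses versus 256 in a /24
print(old_net.num_addresses, new_net.num_addresses)
```

Note that Docker would pick `172.20.0.1` as the bridge gateway by default anyway; pinning it in `ipam` just makes the choice explicit and stable across recreations.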


@@ -0,0 +1,17 @@
+# Simplified Alertmanager Configuration
+global:
+  smtp_smarthost: 'localhost:587'
+  smtp_from: 'alerts@localhost'
+
+route:
+  group_by: ['alertname']
+  group_wait: 10s
+  group_interval: 10s
+  repeat_interval: 1h
+  receiver: 'web.hook'
+
+receivers:
+  - name: 'web.hook'
+    webhook_configs:
+      - url: 'http://localhost:5001/'
+        send_resolved: true
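Both Alertmanager configs route everything to `webhook_configs`, which means Alertmanager will POST a JSON payload to `http://localhost:5001/` whenever a group fires or resolves. A minimal sketch of consuming that payload (the sample below is hand-written to match the documented webhook shape, not captured from a live Alertmanager, and `summarize` is an illustrative helper):

```python
# Parse an Alertmanager-style webhook payload into one line per alert
import json

sample = json.dumps({
    "status": "firing",
    "receiver": "web.hook",
    "alerts": [
        {"status": "firing",
         "labels": {"alertname": "InstanceDown", "severity": "critical"},
         "annotations": {"summary": "instance is down"}},
        {"status": "resolved",
         "labels": {"alertname": "HighMemory", "severity": "warning"},
         "annotations": {"summary": "memory back to normal"}},
    ],
})


def summarize(payload: str) -> list[str]:
    """Return 'status: alertname (severity)' for each alert in the payload."""
    data = json.loads(payload)
    return [
        f"{a['status']}: {a['labels']['alertname']} ({a['labels'].get('severity', 'none')})"
        for a in data.get("alerts", [])
    ]


for line in summarize(sample):
    print(line)
```

With `send_resolved: true` the same endpoint also receives recovery notifications, which is why a consumer should branch on each alert's `status` rather than assume everything is firing.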


@@ -0,0 +1,119 @@
# Alertmanager Configuration
# This file configures how alerts are handled and routed
global:
# SMTP configuration for email notifications
smtp_smarthost: 'localhost:587'
smtp_from: 'alerts@{{DOMAIN}}'
smtp_auth_username: 'alerts@{{DOMAIN}}'
smtp_auth_password: '{{SMTP_PASSWORD}}'
smtp_require_tls: true
# Resolve timeout
resolve_timeout: 5m
# Templates for alert formatting
templates:
- '/etc/alertmanager/templates/*.tmpl'
# Route configuration - defines how alerts are routed
route:
group_by: ['alertname', 'cluster', 'service']
group_wait: 10s
group_interval: 10s
repeat_interval: 1h
receiver: 'web.hook'
routes:
# Critical alerts - immediate notification
- match:
severity: critical
receiver: 'critical-alerts'
group_wait: 5s
repeat_interval: 5m
# Warning alerts - grouped notification
- match:
severity: warning
receiver: 'warning-alerts'
group_wait: 30s
repeat_interval: 30m
# Bot-specific alerts
- match:
service: telegram-bot
receiver: 'bot-alerts'
group_wait: 10s
repeat_interval: 15m
- match:
service: anon-bot
receiver: 'bot-alerts'
group_wait: 10s
repeat_interval: 15m
# Infrastructure alerts
- match:
service: prometheus
receiver: 'infrastructure-alerts'
group_wait: 30s
repeat_interval: 1h
- match:
service: grafana
receiver: 'infrastructure-alerts'
group_wait: 30s
repeat_interval: 1h
- match:
service: nginx
receiver: 'infrastructure-alerts'
group_wait: 30s
repeat_interval: 1h
# Inhibition rules - suppress certain alerts when others are firing
inhibit_rules:
# Suppress warning alerts when critical alerts are firing
- source_match:
severity: 'critical'
target_match:
severity: 'warning'
equal: ['alertname', 'cluster', 'service']
# Suppress individual instance alerts when the entire service is down
- source_match:
alertname: 'ServiceDown'
target_match:
alertname: 'InstanceDown'
equal: ['service']
# Receiver configurations
receivers:
# Default webhook receiver (for testing)
- name: 'web.hook'
webhook_configs:
- url: 'http://localhost:5001/'
send_resolved: true
# Critical alerts - immediate notification via webhook
- name: 'critical-alerts'
webhook_configs:
- url: 'http://localhost:5001/critical'
send_resolved: true
# Warning alerts - less urgent notification
- name: 'warning-alerts'
webhook_configs:
- url: 'http://localhost:5001/warning'
send_resolved: true
# Bot-specific alerts
- name: 'bot-alerts'
webhook_configs:
- url: 'http://localhost:5001/bot'
send_resolved: true
# Infrastructure alerts
- name: 'infrastructure-alerts'
webhook_configs:
- url: 'http://localhost:5001/infrastructure'
send_resolved: true
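The `routes` list above is matched top to bottom: the first child route whose matchers fit an alert's labels selects the receiver, and alerts that match nothing fall back to the top-level `web.hook` receiver. A minimal sketch of that first-match dispatch (illustrative only, not Alertmanager's implementation):

```python
# First-match routing sketch mirroring the `routes` block above.
# Each rule is (label, value, receiver); order matters.
ROUTES = [
    ("severity", "critical", "critical-alerts"),
    ("severity", "warning", "warning-alerts"),
    ("service", "telegram-bot", "bot-alerts"),
    ("service", "anon-bot", "bot-alerts"),
    ("service", "prometheus", "infrastructure-alerts"),
    ("service", "grafana", "infrastructure-alerts"),
    ("service", "nginx", "infrastructure-alerts"),
]

def pick_receiver(labels: dict) -> str:
    """Return the receiver of the first matching route, else the default."""
    for label, value, receiver in ROUTES:
        if labels.get(label) == value:
            return receiver
    return "web.hook"  # top-level default receiver
```

For example, an alert labeled `severity=critical, service=nginx` goes to `critical-alerts`, because the critical route is listed before the `service: nginx` route.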

File diff suppressed because it is too large


@@ -0,0 +1,529 @@
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"id": null,
"links": [],
"panels": [
{
"datasource": "Prometheus",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"vis": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "reqps"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 0
},
"id": 1,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"expr": "rate(http_requests_total{job=~\"telegram-bot|anon-bot\"}[5m])",
"interval": "",
"legendFormat": "{{job}} - {{method}} {{status}}",
"refId": "A"
}
],
"title": "Bot Request Rate",
"type": "timeseries"
},
{
"datasource": "Prometheus",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"vis": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "s"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 0
},
"id": 2,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{job=~\"telegram-bot|anon-bot\"}[5m]))",
"interval": "",
"legendFormat": "{{job}} - 95th percentile",
"refId": "A"
},
{
"expr": "histogram_quantile(0.50, rate(http_request_duration_seconds_bucket{job=~\"telegram-bot|anon-bot\"}[5m]))",
"interval": "",
"legendFormat": "{{job}} - 50th percentile",
"refId": "B"
}
],
"title": "Bot Response Time",
"type": "timeseries"
},
{
"datasource": "Prometheus",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"vis": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 8
},
"id": 3,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"expr": "sum by (job) (rate(http_requests_total{job=~\"telegram-bot|anon-bot\",status=~\"5..\"}[5m])) / sum by (job) (rate(http_requests_total{job=~\"telegram-bot|anon-bot\"}[5m])) * 100",
"interval": "",
"legendFormat": "{{job}} - Error Rate",
"refId": "A"
}
],
"title": "Bot Error Rate",
"type": "timeseries"
},
{
"datasource": "Prometheus",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"vis": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "bytes"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 8
},
"id": 4,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"expr": "process_resident_memory_bytes{job=~\"telegram-bot|anon-bot\"}",
"interval": "",
"legendFormat": "{{job}} - Memory Usage",
"refId": "A"
}
],
"title": "Bot Memory Usage",
"type": "timeseries"
},
{
"datasource": "Prometheus",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"vis": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "short"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 16
},
"id": 5,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"expr": "up{job=~\"telegram-bot|anon-bot\"}",
"interval": "",
"legendFormat": "{{job}} - Status",
"refId": "A"
}
],
"title": "Bot Health Status",
"type": "timeseries"
},
{
"datasource": "Prometheus",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"vis": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "short"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 16
},
"id": 6,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"expr": "rate(process_cpu_seconds_total{job=~\"telegram-bot|anon-bot\"}[5m]) * 100",
"interval": "",
"legendFormat": "{{job}} - CPU Usage",
"refId": "A"
}
],
"title": "Bot CPU Usage",
"type": "timeseries"
}
],
"schemaVersion": 27,
"style": "dark",
"tags": ["bots", "monitoring"],
"templating": {
"list": []
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {},
"timezone": "",
"title": "Bot Monitoring Dashboard",
"uid": "bot-monitoring",
"version": 1
}


@@ -0,0 +1,523 @@
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"id": null,
"links": [],
"panels": [
{
"datasource": "Prometheus",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"vis": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 0
},
"id": 1,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"expr": "100 - (avg by(instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
"interval": "",
"legendFormat": "CPU Usage - {{instance}}",
"refId": "A"
}
],
"title": "System CPU Usage",
"type": "timeseries"
},
{
"datasource": "Prometheus",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"vis": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 0
},
"id": 2,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"expr": "(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100",
"interval": "",
"legendFormat": "Memory Usage - {{instance}}",
"refId": "A"
}
],
"title": "System Memory Usage",
"type": "timeseries"
},
{
"datasource": "Prometheus",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"vis": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 8
},
"id": 3,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"expr": "(1 - (node_filesystem_avail_bytes / node_filesystem_size_bytes)) * 100",
"interval": "",
"legendFormat": "Disk Usage - {{instance}} {{mountpoint}}",
"refId": "A"
}
],
"title": "Disk Usage",
"type": "timeseries"
},
{
"datasource": "Prometheus",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"vis": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "short"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 8
},
"id": 4,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"expr": "up{job=~\"prometheus|grafana|nginx|alertmanager|uptime-kuma\"}",
"interval": "",
"legendFormat": "{{job}} - Status",
"refId": "A"
}
],
"title": "Service Health Status",
"type": "timeseries"
},
{
"datasource": "Prometheus",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"vis": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "reqps"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 16
},
"id": 5,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"expr": "rate(nginx_http_requests_total[5m])",
"interval": "",
"legendFormat": "Nginx - {{status}}",
"refId": "A"
}
],
"title": "Nginx Request Rate",
"type": "timeseries"
},
{
"datasource": "Prometheus",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"vis": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "bytes"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 16
},
"id": 6,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"expr": "container_memory_usage_bytes{name=~\"bots_.*\"}",
"interval": "",
"legendFormat": "{{name}} - Memory",
"refId": "A"
}
],
"title": "Container Memory Usage",
"type": "timeseries"
}
],
"schemaVersion": 27,
"style": "dark",
"tags": ["infrastructure", "monitoring"],
"templating": {
"list": []
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {},
"timezone": "",
"title": "Infrastructure Monitoring Dashboard",
"uid": "infrastructure-monitoring",
"version": 1
}


@@ -0,0 +1,16 @@
# Grafana Dashboard Provisioning Configuration
# This file configures automatic dashboard import
apiVersion: 1
providers:
- name: 'default'
orgId: 1
folder: ''
type: file
disableDeletion: false
updateIntervalSeconds: 10
allowUiUpdates: true
options:
path: /etc/grafana/provisioning/dashboards
foldersFromFilesStructure: true


@@ -4,5 +4,13 @@ datasources:
   - name: Prometheus
     type: prometheus
     access: proxy
-    url: http://prometheus:9090
+    url: http://prometheus:9090/prometheus
     isDefault: true
+    jsonData:
+      httpMethod: POST
+      manageAlerts: true
+      prometheusType: Prometheus
+      prometheusVersion: 2.40.0
+      cacheLevel: 'High'
+      disableRecordingRules: false
+      incrementalQueryOverlapWindow: 10m
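The datasource URL now carries the `/prometheus` sub-path (matching the nginx proxy prefix). When building API URLs against a prefixed endpoint like this, the trailing slash matters: without it, relative resolution silently drops the prefix. A quick illustration with a hypothetical query path, Python stdlib only:

```python
from urllib.parse import urljoin

# With a trailing slash the sub-path is preserved...
assert urljoin("http://prometheus:9090/prometheus/", "api/v1/query") == \
    "http://prometheus:9090/prometheus/api/v1/query"

# ...without it, the /prometheus prefix is dropped during resolution.
assert urljoin("http://prometheus:9090/prometheus", "api/v1/query") == \
    "http://prometheus:9090/api/v1/query"
```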

infra/logrotate/README.md Normal file

@@ -0,0 +1,77 @@
# Logrotate Configuration
This directory contains the configuration files for automatic log rotation.

## Files

### `logrotate_bots.conf.j2`
Configuration template for bot and Docker container logs:
- Bot logs in `{{ project_root }}/bots/*/logs/*.log`
- Bot stderr logs in `{{ project_root }}/bots/*/bot_stderr.log`
- Docker container logs in `/var/lib/docker/containers/*/*.log`

### `logrotate_system.conf.j2`
Configuration template for system services:
- Nginx logs in `/var/log/nginx/*.log`
- System logs (syslog, mail, auth, cron, etc.)
- Fail2ban logs
- Docker daemon logs
- Prometheus node exporter logs

## Environment variables

The configurations use the following variables from the `.env` file:

```bash
# Logrotate settings
LOGROTATE_RETENTION_DAYS=30  # Number of days to keep logs
LOGROTATE_COMPRESS=true      # Compress rotated logs
LOGROTATE_DELAYCOMPRESS=true # Delay compression by one rotation cycle
```

## Usage

These templates are applied automatically when the Ansible playbook runs; they render configuration files into `/etc/logrotate.d/` on the server.

### Manual application

To apply the configurations by hand (after substituting the `{{ ... }}` template variables):

```bash
# Copy the configurations
sudo cp logrotate_bots.conf.j2 /etc/logrotate.d/bots
sudo cp logrotate_system.conf.j2 /etc/logrotate.d/system
# Validate the configuration (dry run)
sudo logrotate -d /etc/logrotate.conf
# Force a rotation
sudo logrotate -f /etc/logrotate.conf
```

## Defaults

- **Daily rotation**: all logs are rotated every day
- **Compression**: rotated logs are compressed with gzip
- **Retention**: 30 days (configurable via `LOGROTATE_RETENTION_DAYS`)
- **Automatic service restarts**: services are restarted after their logs rotate

## Log layout

After setup, logs are organized as follows (with `delaycompress`, the most recent rotation stays uncompressed for one cycle):

```
/var/log/
├── nginx/
│   ├── access.log
│   ├── access.log.1
│   ├── access.log.2.gz
│   ├── error.log
│   └── ...
└── ...
{{ project_root }}/bots/*/logs/
├── bot.log
├── bot.log.1
├── bot.log.2.gz
└── ...
```
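The retention behaviour driven by `LOGROTATE_RETENTION_DAYS` can be sketched as a cutoff on file modification time. This is a hypothetical illustration of the policy, not part of the deployed tooling:

```python
import os
import time

def expired_logs(directory: str, retention_days: int) -> list[str]:
    """Return rotated log files older than the retention window."""
    cutoff = time.time() - retention_days * 86400
    expired = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # Only rotated copies (bot.log.1, bot.log.2.gz, ...) are candidates;
        # the live log file is left alone.
        if os.path.isfile(path) and ".log." in name and os.path.getmtime(path) < cutoff:
            expired.append(path)
    return sorted(expired)
```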


@@ -0,0 +1,49 @@
# Logrotate configuration for bot applications
# This file manages log rotation for all bot services
{{ project_root }}/bots/*/logs/*.log {
daily
missingok
rotate 30
compress
delaycompress
notifempty
create 0644 {{ deploy_user }} {{ deploy_user }}
postrotate
# Restart bot services if they are running
if [ -f /home/{{ deploy_user }}/.docker-compose-pid ]; then
cd {{ project_root }} && docker-compose restart
fi
endscript
}
{{ project_root }}/bots/*/bot_stderr.log {
daily
missingok
rotate 30
compress
delaycompress
notifempty
create 0644 {{ deploy_user }} {{ deploy_user }}
postrotate
# Restart bot services if they are running
if [ -f /home/{{ deploy_user }}/.docker-compose-pid ]; then
cd {{ project_root }} && docker-compose restart
fi
endscript
}
# Docker container logs
/var/lib/docker/containers/*/*.log {
daily
missingok
rotate 7
compress
delaycompress
notifempty
# dockerd keeps these files open and has no log-reopen signal,
# so truncate in place instead of renaming the files away
copytruncate
}
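The `compress` plus `delaycompress` pair above means the newest rotated copy (`.1`) stays uncompressed for one cycle, while older copies are gzipped up to the `rotate` limit. A sketch of the expected file names after `n` rotations (illustration only, not part of the deployed tooling):

```python
def rotated_names(base: str, n: int, rotate: int = 30) -> list[str]:
    """File names expected after n daily rotations with delaycompress."""
    names = [base]
    for i in range(1, min(n, rotate) + 1):
        # .1 is the most recent rotation and is not yet compressed
        names.append(f"{base}.{i}" if i == 1 else f"{base}.{i}.gz")
    return names
```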


@@ -0,0 +1,100 @@
# Logrotate configuration for system services
# This file manages log rotation for system services
# Nginx logs
/var/log/nginx/*.log {
daily
missingok
rotate 30
compress
delaycompress
notifempty
create 0644 www-data adm
sharedscripts
postrotate
if [ -f /var/run/nginx.pid ]; then
kill -USR1 `cat /var/run/nginx.pid`
fi
endscript
}
# System logs
/var/log/syslog {
daily
missingok
rotate 7
compress
delaycompress
notifempty
create 0644 syslog adm
postrotate
/usr/lib/rsyslog/rsyslog-rotate
endscript
}
/var/log/mail.info
/var/log/mail.warn
/var/log/mail.err
/var/log/mail.log
/var/log/daemon.log
/var/log/kern.log
/var/log/auth.log
/var/log/user.log
/var/log/lpr.log
/var/log/cron.log
/var/log/debug
/var/log/messages {
daily
missingok
rotate 7
compress
delaycompress
notifempty
create 0644 syslog adm
sharedscripts
postrotate
/usr/lib/rsyslog/rsyslog-rotate
endscript
}
# Fail2ban logs
/var/log/fail2ban.log {
daily
missingok
rotate 7
compress
delaycompress
notifempty
create 0644 root root
postrotate
systemctl reload fail2ban
endscript
}
# Docker daemon logs
/var/log/docker.log {
daily
missingok
rotate 7
compress
delaycompress
notifempty
create 0644 root root
postrotate
systemctl reload docker
endscript
}
# Prometheus node exporter logs
/var/log/prometheus-node-exporter.log {
daily
missingok
rotate 7
compress
delaycompress
notifempty
create 0644 prometheus prometheus
postrotate
systemctl reload prometheus-node-exporter
endscript
}

infra/nginx/AUTH_SETUP.md Normal file

@@ -0,0 +1,104 @@
# Monitoring Authentication Setup

## Overview

HTTP Basic Authentication has been added for the following services:
- **Prometheus** (`/prometheus/`) - metrics and monitoring
- **Alertmanager** (`/alerts/` and `/api/v1/`) - alert management

## Password management

### Automatic setup via Ansible

When deploying with Ansible, the passwords are configured automatically:

```bash
# Use the default passwords
ansible-playbook -i inventory.ini playbook.yml
# Set your own passwords
ansible-playbook -i inventory.ini playbook.yml \
  -e monitoring_username=myuser \
  -e monitoring_password=mypassword
```

### Manual setup

1. **Create the password file:**
```bash
sudo mkdir -p /etc/nginx/passwords
sudo htpasswd -c /etc/nginx/passwords/monitoring.htpasswd admin
```
2. **Add more users:**
```bash
sudo htpasswd /etc/nginx/passwords/monitoring.htpasswd username
```
3. **Set the correct ownership and permissions:**
```bash
sudo chown root:www-data /etc/nginx/passwords/monitoring.htpasswd
sudo chmod 640 /etc/nginx/passwords/monitoring.htpasswd
```
4. **Reload nginx:**
```bash
sudo systemctl reload nginx
```

### Using the password generation script

```bash
# Generate a password for the admin user
sudo /usr/local/bin/generate_auth_passwords.sh admin
# Generate a password for another user
sudo /usr/local/bin/generate_auth_passwords.sh myuser
```

## Accessing the services

Once authentication is configured, the services are available at:
- **Prometheus**: `https://your-server/prometheus/`
- **Alertmanager**: `https://your-server/alerts/`
- **Alertmanager API**: `https://your-server/api/v1/`

On first access the browser will prompt for a username and password.
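Scripts and external health checks can pass the same credentials without a browser by sending an `Authorization` header. A short sketch of how the Basic scheme encodes them (the credentials here are placeholders):

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Build the value for an HTTP Basic `Authorization` header."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return f"Basic {token}"

# Equivalent to what `curl -u admin:secret https://your-server/prometheus/` sends
print(basic_auth_header("admin", "secret"))  # → Basic YWRtaW46c2VjcmV0
```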
## Health check endpoints

The following endpoints remain accessible without authentication so external monitoring can reach them:
- `https://your-server/prometheus/-/healthy` - Prometheus health check
- `https://your-server/nginx-health` - nginx health check

## Security

- Passwords are stored hashed in `/etc/nginx/passwords/monitoring.htpasswd`
- The file is readable only by root and the www-data group
- All connections use HTTPS
- Brute-force protection is provided by fail2ban

## Troubleshooting

### Check the nginx configuration
```bash
sudo nginx -t
```
### Inspect the password file
```bash
sudo cat /etc/nginx/passwords/monitoring.htpasswd
```
### Check the nginx logs
```bash
sudo tail -f /var/log/nginx/error.log
sudo tail -f /var/log/nginx/access.log
```
### Reset a password
```bash
sudo htpasswd /etc/nginx/passwords/monitoring.htpasswd admin
```


@@ -1,32 +0,0 @@
# Grafana reverse proxy configuration
upstream grafana_backend {
server grafana:3000;
keepalive 32;
}
# Grafana proxy configuration
location /grafana/ {
proxy_pass http://grafana_backend/;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header X-Forwarded-Host $host;
proxy_set_header X-Forwarded-Port $server_port;
# WebSocket support for Grafana
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
# Timeouts
proxy_connect_timeout 60s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;
# Buffer settings
proxy_buffering on;
proxy_buffer_size 4k;
proxy_buffers 8 4k;
proxy_busy_buffers_size 8k;
}


@@ -1,34 +0,0 @@
# Prometheus reverse proxy configuration
upstream prometheus_backend {
server prometheus:9090;
keepalive 32;
}
# Prometheus proxy configuration
location /prometheus/ {
proxy_pass http://prometheus_backend/;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header X-Forwarded-Host $host;
proxy_set_header X-Forwarded-Port $server_port;
# Timeouts
proxy_connect_timeout 30s;
proxy_send_timeout 30s;
proxy_read_timeout 30s;
# Buffer settings
proxy_buffering on;
proxy_buffer_size 4k;
proxy_buffers 8 4k;
proxy_busy_buffers_size 8k;
}
# Health check endpoint
location /prometheus/-/healthy {
proxy_pass http://prometheus_backend/-/healthy;
proxy_set_header Host $host;
access_log off;
}


@@ -1,24 +0,0 @@
# Status page configuration (for future uptime kuma integration)
# Rate limiting for status page
location /status {
# Basic authentication for status page
auth_basic "Status Page Access";
auth_basic_user_file /etc/nginx/.htpasswd;
# Placeholder for future uptime kuma integration
# For now, show nginx status
access_log off;
return 200 '{"status": "ok", "nginx": "running", "timestamp": "$time_iso8601"}';
add_header Content-Type application/json;
}
# Nginx status stub (for monitoring)
location /nginx_status {
stub_status on;
access_log off;
allow 127.0.0.1;
allow 172.16.0.0/12; # Docker networks
allow 192.168.0.0/16; # Private networks
deny all;
}


@@ -1,4 +1,4 @@
-user nginx;
+user www-data;
 worker_processes auto;
 error_log /var/log/nginx/error.log warn;
 pid /var/run/nginx.pid;
@@ -63,7 +63,29 @@ http {
     ssl_session_cache shared:SSL:10m;
     ssl_session_timeout 10m;
+    # Upstream configurations
+    upstream grafana_backend {
+        server localhost:3000;
+        keepalive 32;
+    }
+    upstream prometheus_backend {
+        server localhost:9090;
+        keepalive 32;
+    }
+    upstream uptime_kuma_backend {
+        server localhost:3001;
+        keepalive 32;
+    }
+    upstream alertmanager_backend {
+        server localhost:9093;
+        keepalive 32;
+    }
     # Main server block
+    # Redirect HTTP to HTTPS
     server {
         listen 80;
         server_name _;
@@ -74,20 +96,23 @@ http {
         listen 443 ssl http2;
         server_name _;
-        # SSL configuration
-        ssl_certificate /etc/nginx/ssl/cert.pem;
-        ssl_certificate_key /etc/nginx/ssl/key.pem;
+        # SSL configuration (self-signed certificate)
+        ssl_certificate /etc/nginx/ssl/fullchain.pem;
+        ssl_certificate_key /etc/nginx/ssl/privkey.pem;
+        ssl_protocols TLSv1.2 TLSv1.3;
+        ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES256-SHA384;
+        ssl_prefer_server_ciphers off;
+        ssl_session_cache shared:SSL:10m;
+        ssl_session_timeout 10m;
         # Security headers
         add_header X-Frame-Options "SAMEORIGIN" always;
         add_header X-Content-Type-Options "nosniff" always;
-        # Rate limiting
-        limit_req zone=api burst=20 nodelay;
-        # Redirect root to Grafana
+        # Root page - show simple status
         location = / {
-            return 301 /grafana/;
+            return 200 "Bot Infrastructure Status\n\nServices:\n- Grafana: /grafana/\n- Prometheus: /prometheus/\n- Uptime Kuma: /status/\n- Alertmanager: /alerts/\n";
+            add_header Content-Type text/plain;
         }
         # Health check endpoint
@@ -97,7 +122,282 @@ http {
             add_header Content-Type text/plain;
         }
-        # Include location configurations
-        include /etc/nginx/conf.d/*.conf;
+        # Uptime Kuma status page
+        location /status {
# Rate limiting
limit_req zone=status burst=5 nodelay;
# Proxy to Uptime Kuma
proxy_pass http://127.0.0.1:3001/;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# WebSocket support
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
# Timeouts
proxy_connect_timeout 30s;
proxy_send_timeout 30s;
proxy_read_timeout 30s;
# Buffer settings
proxy_buffering on;
proxy_buffer_size 4k;
proxy_buffers 8 4k;
}
# Uptime Kuma dashboard
location /dashboard {
# Rate limiting
limit_req zone=status burst=5 nodelay;
# Proxy to Uptime Kuma
proxy_pass http://127.0.0.1:3001/dashboard;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# WebSocket support
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
# Timeouts
proxy_connect_timeout 30s;
proxy_send_timeout 30s;
proxy_read_timeout 30s;
# Buffer settings
proxy_buffering on;
proxy_buffer_size 4k;
proxy_buffers 8 4k;
}
# Uptime Kuma static assets
location /assets/ {
# Rate limiting
limit_req zone=api burst=20 nodelay;
# Proxy to Uptime Kuma
proxy_pass http://127.0.0.1:3001;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Cache static assets
expires 1y;
add_header Cache-Control "public, immutable";
}
# Uptime Kuma icons and manifest
location ~ ^/(icon.*\.(png|svg)|apple-touch-icon.*\.png|manifest\.json)$ {
# Rate limiting
limit_req zone=api burst=20 nodelay;
# Proxy to Uptime Kuma
proxy_pass http://127.0.0.1:3001;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Cache static assets
expires 1y;
add_header Cache-Control "public, immutable";
}
# Uptime Kuma WebSocket (Socket.IO)
location /socket.io/ {
# Rate limiting
limit_req zone=api burst=20 nodelay;
# Proxy to Uptime Kuma
proxy_pass http://127.0.0.1:3001;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# WebSocket support
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
# Timeouts
proxy_connect_timeout 30s;
proxy_send_timeout 30s;
proxy_read_timeout 30s;
}
# Uptime Kuma API endpoints
location /api/ {
# Rate limiting
limit_req zone=api burst=10 nodelay;
# Proxy to Uptime Kuma
proxy_pass http://127.0.0.1:3001;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# CORS headers
add_header Access-Control-Allow-Origin "*" always;
add_header Access-Control-Allow-Methods "GET, POST, PUT, DELETE, OPTIONS" always;
add_header Access-Control-Allow-Headers "DNT,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range,Authorization" always;
# Handle preflight requests
if ($request_method = 'OPTIONS') {
add_header Access-Control-Allow-Origin "*";
add_header Access-Control-Allow-Methods "GET, POST, PUT, DELETE, OPTIONS";
add_header Access-Control-Allow-Headers "DNT,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range,Authorization";
add_header Access-Control-Max-Age 1728000;
add_header Content-Type "text/plain; charset=utf-8";
add_header Content-Length 0;
return 204;
}
}
# Grafana proxy configuration
location /grafana/ {
proxy_pass http://grafana_backend;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header X-Forwarded-Host $host;
proxy_set_header X-Forwarded-Port $server_port;
# WebSocket support for Grafana
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
# Timeouts
proxy_connect_timeout 60s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;
# Buffer settings
proxy_buffering on;
proxy_buffer_size 4k;
proxy_buffers 8 4k;
proxy_busy_buffers_size 8k;
}
# Prometheus proxy configuration with authentication
location /prometheus/ {
# HTTP Basic Authentication
auth_basic "Prometheus Monitoring";
auth_basic_user_file /etc/nginx/passwords/monitoring.htpasswd;
# Rate limiting
limit_req zone=api burst=10 nodelay;
proxy_pass http://prometheus_backend/prometheus/;
proxy_redirect /prometheus/ /prometheus/;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header X-Forwarded-Host $host;
proxy_set_header X-Forwarded-Port $server_port;
# Timeouts
proxy_connect_timeout 30s;
proxy_send_timeout 30s;
proxy_read_timeout 30s;
# Buffer settings
proxy_buffering on;
proxy_buffer_size 4k;
proxy_buffers 8 4k;
proxy_busy_buffers_size 8k;
}
# Prometheus health check endpoint
location /prometheus/-/healthy {
proxy_pass http://prometheus_backend/prometheus/-/healthy;
proxy_set_header Host $host;
access_log off;
}
# Alertmanager proxy configuration with authentication
location /alerts/ {
# HTTP Basic Authentication
auth_basic "Alertmanager Monitoring";
auth_basic_user_file /etc/nginx/passwords/monitoring.htpasswd;
# Rate limiting
limit_req zone=api burst=10 nodelay;
# Remove trailing slash for proxy
rewrite ^/alerts/(.*)$ /$1 break;
# Proxy to Alertmanager
proxy_pass http://alertmanager_backend;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header X-Forwarded-Host $host;
proxy_set_header X-Forwarded-Port $server_port;
# Timeouts
proxy_connect_timeout 30s;
proxy_send_timeout 30s;
proxy_read_timeout 30s;
# Buffer settings
proxy_buffering on;
proxy_buffer_size 4k;
proxy_buffers 8 4k;
proxy_busy_buffers_size 8k;
# Security headers
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Content-Type-Options "nosniff" always;
}
# Alertmanager API with authentication
location /api/v1/ {
# HTTP Basic Authentication
auth_basic "Alertmanager API";
auth_basic_user_file /etc/nginx/passwords/monitoring.htpasswd;
# Rate limiting
limit_req zone=api burst=20 nodelay;
# Proxy to Alertmanager
proxy_pass http://alertmanager_backend;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# CORS headers
add_header Access-Control-Allow-Origin "*" always;
add_header Access-Control-Allow-Methods "GET, POST, PUT, DELETE, OPTIONS" always;
add_header Access-Control-Allow-Headers "DNT,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range,Authorization" always;
# Handle preflight requests
if ($request_method = 'OPTIONS') {
add_header Access-Control-Allow-Origin "*";
add_header Access-Control-Allow-Methods "GET, POST, PUT, DELETE, OPTIONS";
add_header Access-Control-Allow-Headers "DNT,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range,Authorization";
add_header Access-Control-Max-Age 1728000;
add_header Content-Type "text/plain; charset=utf-8";
add_header Content-Length 0;
return 204;
}
}
# All location configurations are now integrated into this file
}
}

View File

@@ -0,0 +1,27 @@
# Let's Encrypt SSL Configuration
# This file contains the SSL configuration for Let's Encrypt certificates
# SSL certificate paths (Let's Encrypt)
ssl_certificate /etc/letsencrypt/live/{{DOMAIN}}/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/{{DOMAIN}}/privkey.pem;
# SSL Security Configuration
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-CHACHA20-POLY1305;
ssl_prefer_server_ciphers off;
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;
ssl_session_tickets off;
# OCSP Stapling
ssl_stapling on;
ssl_stapling_verify on;
ssl_trusted_certificate /etc/letsencrypt/live/{{DOMAIN}}/chain.pem;
# Security Headers
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-Content-Type-Options "nosniff" always;
add_header X-XSS-Protection "1; mode=block" always;
add_header Referrer-Policy "strict-origin-when-cross-origin" always;
add_header Content-Security-Policy "default-src 'self'; script-src 'self' 'unsafe-inline' 'unsafe-eval'; style-src 'self' 'unsafe-inline'; img-src 'self' data: https:; font-src 'self' data:; connect-src 'self' wss: https:;" always;
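One quick sanity check on a snippet like this is to confirm that `ssl_protocols` lists only TLS 1.2/1.3 and no legacy versions. A minimal sketch (the parser function is ours):

```python
import re

def allowed_tls_protocols(conf):
    """Extract the protocol list from an nginx `ssl_protocols` directive."""
    m = re.search(r"^\s*ssl_protocols\s+([^;]+);", conf, re.MULTILINE)
    return set(m.group(1).split()) if m else set()

conf = "ssl_protocols TLSv1.2 TLSv1.3;"
protocols = allowed_tls_protocols(conf)
assert protocols == {"TLSv1.2", "TLSv1.3"}  # TLSv1 / TLSv1.1 are not offered
```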

View File

@@ -0,0 +1,253 @@
# Prometheus Alert Rules
# This file defines alerting rules for monitoring the bot infrastructure
groups:
  # Bot Health Monitoring
  - name: bot_health
    rules:
      # Telegram Bot Health
      - alert: TelegramBotDown
        expr: up{job="telegram-bot"} == 0
        for: 1m
        labels:
          severity: critical
          service: telegram-bot
        annotations:
          summary: "Telegram Bot is down"
          description: "Telegram Bot has been down for more than 1 minute"
          runbook_url: "https://docs.example.com/runbooks/telegram-bot-down"
      - alert: TelegramBotHighErrorRate
        expr: rate(http_requests_total{job="telegram-bot",status=~"5.."}[5m]) > 0.1
        for: 2m
        labels:
          severity: warning
          service: telegram-bot
        annotations:
          summary: "Telegram Bot high error rate"
          description: "Telegram Bot error rate is {{ $value }} errors per second"
      - alert: TelegramBotHighResponseTime
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{job="telegram-bot"}[5m])) > 2
        for: 5m
        labels:
          severity: warning
          service: telegram-bot
        annotations:
          summary: "Telegram Bot high response time"
          description: "95th percentile response time is {{ $value }} seconds"
      # AnonBot Health
      - alert: AnonBotDown
        expr: up{job="anon-bot"} == 0
        for: 1m
        labels:
          severity: critical
          service: anon-bot
        annotations:
          summary: "AnonBot is down"
          description: "AnonBot has been down for more than 1 minute"
          runbook_url: "https://docs.example.com/runbooks/anon-bot-down"
      - alert: AnonBotHighErrorRate
        expr: rate(http_requests_total{job="anon-bot",status=~"5.."}[5m]) > 0.1
        for: 2m
        labels:
          severity: warning
          service: anon-bot
        annotations:
          summary: "AnonBot high error rate"
          description: "AnonBot error rate is {{ $value }} errors per second"
      - alert: AnonBotHighResponseTime
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{job="anon-bot"}[5m])) > 2
        for: 5m
        labels:
          severity: warning
          service: anon-bot
        annotations:
          summary: "AnonBot high response time"
          description: "95th percentile response time is {{ $value }} seconds"
  # Infrastructure Health Monitoring
  - name: infrastructure_health
    rules:
      # Prometheus Health
      - alert: PrometheusDown
        expr: up{job="prometheus"} == 0
        for: 1m
        labels:
          severity: critical
          service: prometheus
        annotations:
          summary: "Prometheus is down"
          description: "Prometheus has been down for more than 1 minute"
      - alert: PrometheusHighMemoryUsage
        expr: (prometheus_tsdb_head_series / prometheus_tsdb_head_series_limit) > 0.8
        for: 5m
        labels:
          severity: warning
          service: prometheus
        annotations:
          summary: "Prometheus high memory usage"
          description: "Prometheus memory usage is {{ $value | humanizePercentage }} of limit"
      # Grafana Health
      - alert: GrafanaDown
        expr: up{job="grafana"} == 0
        for: 1m
        labels:
          severity: critical
          service: grafana
        annotations:
          summary: "Grafana is down"
          description: "Grafana has been down for more than 1 minute"
      # Nginx Health
      - alert: NginxDown
        expr: up{job="nginx"} == 0
        for: 1m
        labels:
          severity: critical
          service: nginx
        annotations:
          summary: "Nginx is down"
          description: "Nginx has been down for more than 1 minute"
      - alert: NginxHighErrorRate
        expr: rate(nginx_http_requests_total{status=~"5.."}[5m]) > 0.1
        for: 2m
        labels:
          severity: warning
          service: nginx
        annotations:
          summary: "Nginx high error rate"
          description: "Nginx error rate is {{ $value }} errors per second"
  # System Resource Monitoring
  - name: system_resources
    rules:
      # High CPU Usage
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
          service: system
        annotations:
          summary: "High CPU usage"
          description: "CPU usage is {{ $value }}% on {{ $labels.instance }}"
      - alert: VeryHighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 95
        for: 2m
        labels:
          severity: critical
          service: system
        annotations:
          summary: "Very high CPU usage"
          description: "CPU usage is {{ $value }}% on {{ $labels.instance }}"
      # High Memory Usage
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 80
        for: 5m
        labels:
          severity: warning
          service: system
        annotations:
          summary: "High memory usage"
          description: "Memory usage is {{ $value }}% on {{ $labels.instance }}"
      - alert: VeryHighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 95
        for: 2m
        labels:
          severity: critical
          service: system
        annotations:
          summary: "Very high memory usage"
          description: "Memory usage is {{ $value }}% on {{ $labels.instance }}"
      # Disk Space
      - alert: LowDiskSpace
        expr: (1 - (node_filesystem_avail_bytes / node_filesystem_size_bytes)) * 100 > 80
        for: 5m
        labels:
          severity: warning
          service: system
        annotations:
          summary: "Low disk space"
          description: "Disk usage is {{ $value }}% on {{ $labels.instance }} ({{ $labels.mountpoint }})"
      - alert: VeryLowDiskSpace
        expr: (1 - (node_filesystem_avail_bytes / node_filesystem_size_bytes)) * 100 > 95
        for: 2m
        labels:
          severity: critical
          service: system
        annotations:
          summary: "Very low disk space"
          description: "Disk usage is {{ $value }}% on {{ $labels.instance }} ({{ $labels.mountpoint }})"
  # Docker Container Monitoring
  - name: docker_containers
    rules:
      # Container Restart
      - alert: ContainerRestarting
        expr: rate(container_start_time_seconds[10m]) > 0
        for: 0m
        labels:
          severity: warning
          service: docker
        annotations:
          summary: "Container restarting"
          description: "Container {{ $labels.name }} is restarting frequently"
      # Container High Memory Usage
      - alert: ContainerHighMemoryUsage
        expr: (container_memory_usage_bytes / container_spec_memory_limit_bytes) * 100 > 80
        for: 5m
        labels:
          severity: warning
          service: docker
        annotations:
          summary: "Container high memory usage"
          description: "Container {{ $labels.name }} memory usage is {{ $value }}%"
      # Container High CPU Usage
      - alert: ContainerHighCPUUsage
        expr: (rate(container_cpu_usage_seconds_total[5m]) / container_spec_cpu_quota * 100) > 80
        for: 5m
        labels:
          severity: warning
          service: docker
        annotations:
          summary: "Container high CPU usage"
          description: "Container {{ $labels.name }} CPU usage is {{ $value }}%"
  # Database Monitoring
  - name: database_health
    rules:
      # Database Connection Issues
      - alert: DatabaseConnectionFailed
        expr: increase(database_connection_errors_total[5m]) > 5
        for: 1m
        labels:
          severity: critical
          service: database
        annotations:
          summary: "Database connection failures"
          description: "{{ $value }} database connection failures in the last 5 minutes"
      # Database High Query Time
      - alert: DatabaseHighQueryTime
        expr: histogram_quantile(0.95, rate(database_query_duration_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
          service: database
        annotations:
          summary: "Database high query time"
          description: "95th percentile database query time is {{ $value }} seconds"
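The memory alerts above fire on `(1 - MemAvailable / MemTotal) * 100`; the warning/critical tiers can be checked numerically. A sketch (the function name is ours; the thresholds mirror the rules):

```python
def memory_alert_severity(mem_available, mem_total):
    """Apply the HighMemoryUsage (>80%) and VeryHighMemoryUsage (>95%) thresholds."""
    usage_pct = (1 - mem_available / mem_total) * 100
    if usage_pct > 95:
        return "critical"
    if usage_pct > 80:
        return "warning"
    return None

# 2 GiB available of 16 GiB total -> 87.5% used -> warning tier
assert memory_alert_severity(2 * 2**30, 16 * 2**30) == "warning"
```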

View File

@@ -3,8 +3,7 @@ global:
   evaluation_interval: 15s
 rule_files:
-  # - "first_rules.yml"
-  # - "second_rules.yml"
+  - "alert_rules.yml"
 scrape_configs:
   - job_name: 'prometheus'
@@ -14,7 +13,7 @@ scrape_configs:
   # Node Exporter monitoring job
   - job_name: 'node'
     static_configs:
-      - targets: ['172.17.0.1:9100'] # Special address for reaching the host
+      - targets: ['172.20.0.1:9100'] # Node Exporter on the host via the Docker gateway
       labels:
         instance: 'main-server'
@@ -46,4 +45,4 @@ alerting:
   alertmanagers:
     - static_configs:
         - targets:
-          # - alertmanager:9093
+          - alertmanager:9093

View File

@@ -0,0 +1,77 @@
# Uptime Kuma Configuration
Uptime Kuma is a status page for monitoring service availability.
## Access
- **Web interface**: `https://your-domain/status/`
- **Direct access**: `http://localhost:3001` (local only)
## Setup
### Initial setup
1. Start the services:
```bash
make up
```
2. Open `https://your-domain/status/`
3. Create an administrator account:
- Username: `admin`
- Password: `admin` (change it after the first login)
### Service monitoring
Uptime Kuma will automatically configure monitoring for the following services:
- **Telegram Bot**: `http://telegram-bot:8080/health`
- **AnonBot**: `http://anon-bot:8081/health`
- **Prometheus**: `http://prometheus:9090/-/healthy`
- **Grafana**: `http://grafana:3000/api/health`
- **AlertManager**: `http://alertmanager:9093/-/healthy`
- **Nginx**: `http://nginx:80/nginx-health`
### Notifications
Configure notifications in the web interface:
- Telegram Bot
- Email
- Webhook
- Discord
- Slack
## Configuration files
- `monitors.json` - export of the configured monitors
- `settings.json` - application settings
- `backup/` - configuration backups
## Management commands
```bash
# Show logs
make logs-uptime-kuma
# Restart
make restart-uptime-kuma
# Check status
make status
```
## Backup
The configuration is stored in the Docker volume `uptime_kuma_data`.
To back it up:
```bash
# Create a backup
make backup
# Restore
make restore FILE=backup.tar.gz
```

View File

@@ -0,0 +1,36 @@
# Uptime Kuma Backup
This directory contains backups of the Uptime Kuma configuration.
## Automatic backups
Create a script for automatic backups:
```bash
#!/bin/bash
# backup-uptime-kuma.sh
DATE=$(date +%Y%m%d-%H%M%S)
BACKUP_DIR="/path/to/backups"
CONTAINER_NAME="bots_uptime_kuma"
# Create the backup
docker exec $CONTAINER_NAME tar -czf /tmp/uptime-kuma-backup-$DATE.tar.gz /app/data
# Copy the backup to the host
docker cp $CONTAINER_NAME:/tmp/uptime-kuma-backup-$DATE.tar.gz $BACKUP_DIR/
# Clean up the temporary file
docker exec $CONTAINER_NAME rm /tmp/uptime-kuma-backup-$DATE.tar.gz
echo "Backup created: $BACKUP_DIR/uptime-kuma-backup-$DATE.tar.gz"
```
## Restore
```bash
# Restore from a backup
docker cp backup-file.tar.gz $CONTAINER_NAME:/tmp/
docker exec $CONTAINER_NAME tar -xzf /tmp/backup-file.tar.gz -C /
docker restart $CONTAINER_NAME
```
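The backup script derives its archive name from `date +%Y%m%d-%H%M%S`; the same naming scheme in Python form (the helper function is ours):

```python
from datetime import datetime

def backup_filename(now=None):
    """Reproduce uptime-kuma-backup-$(date +%Y%m%d-%H%M%S).tar.gz."""
    now = now or datetime.now()
    return f"uptime-kuma-backup-{now.strftime('%Y%m%d-%H%M%S')}.tar.gz"

# A fixed timestamp yields a deterministic, sortable archive name
assert backup_filename(datetime(2026, 1, 25, 20, 43, 12)) == "uptime-kuma-backup-20260125-204312.tar.gz"
```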

View File

@@ -0,0 +1,147 @@
{
"monitors": [
{
"id": 1,
"name": "Telegram Bot Health",
"url": "http://telegram-bot:8080/health",
"type": "http",
"method": "GET",
"interval": 60,
"retries": 3,
"timeout": 10,
"keyword": null,
"maxredirects": 10,
"ignoreTls": false,
"upsideDown": false,
"tags": ["bot", "telegram", "health"],
"description": "Telegram Helper Bot health monitoring",
"active": true
},
{
"id": 2,
"name": "AnonBot Health",
"url": "http://anon-bot:8081/health",
"type": "http",
"method": "GET",
"interval": 60,
"retries": 3,
"timeout": 10,
"keyword": null,
"maxredirects": 10,
"ignoreTls": false,
"upsideDown": false,
"tags": ["bot", "anon", "health"],
"description": "AnonBot health monitoring",
"active": true
},
{
"id": 3,
"name": "Prometheus Health",
"url": "http://prometheus:9090/-/healthy",
"type": "http",
"method": "GET",
"interval": 60,
"retries": 3,
"timeout": 10,
"keyword": null,
"maxredirects": 10,
"ignoreTls": false,
"upsideDown": false,
"tags": ["monitoring", "prometheus", "health"],
"description": "Prometheus health monitoring",
"active": true
},
{
"id": 4,
"name": "Grafana Health",
"url": "http://grafana:3000/api/health",
"type": "http",
"method": "GET",
"interval": 60,
"retries": 3,
"timeout": 10,
"keyword": null,
"maxredirects": 10,
"ignoreTls": false,
"upsideDown": false,
"tags": ["monitoring", "grafana", "health"],
"description": "Grafana health monitoring",
"active": true
},
{
"id": 5,
"name": "AlertManager Health",
"url": "http://alertmanager:9093/-/healthy",
"type": "http",
"method": "GET",
"interval": 60,
"retries": 3,
"timeout": 10,
"keyword": null,
"maxredirects": 10,
"ignoreTls": false,
"upsideDown": false,
"tags": ["monitoring", "alertmanager", "health"],
"description": "AlertManager health monitoring",
"active": true
},
{
"id": 6,
"name": "Nginx Health",
"url": "http://nginx:80/nginx-health",
"type": "http",
"method": "GET",
"interval": 60,
"retries": 3,
"timeout": 10,
"keyword": "healthy",
"maxredirects": 10,
"ignoreTls": false,
"upsideDown": false,
"tags": ["infrastructure", "nginx", "health"],
"description": "Nginx health monitoring",
"active": true
},
{
"id": 7,
"name": "External Bot Status",
"url": "https://your-domain/status/",
"type": "http",
"method": "GET",
"interval": 300,
"retries": 2,
"timeout": 15,
"keyword": null,
"maxredirects": 10,
"ignoreTls": false,
"upsideDown": false,
"tags": ["external", "status-page"],
"description": "External availability monitoring of the status page",
"active": false
}
],
"tags": [
{
"name": "bot",
"color": "#3498db"
},
{
"name": "monitoring",
"color": "#e74c3c"
},
{
"name": "infrastructure",
"color": "#f39c12"
},
{
"name": "health",
"color": "#27ae60"
},
{
"name": "external",
"color": "#9b59b6"
}
]
}
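An export like the one above can be sanity-checked before re-import. A minimal validator sketch (the required-field set is taken from the file above; the function name is ours):

```python
import json

# Fields every monitor entry in the export above carries
REQUIRED = {"id", "name", "url", "type", "interval", "retries", "timeout"}

def validate_monitors(raw):
    """Return the monitor names, raising ValueError if required fields are missing."""
    data = json.loads(raw)
    for mon in data["monitors"]:
        missing = REQUIRED - mon.keys()
        if missing:
            raise ValueError(f"monitor {mon.get('id')}: missing {sorted(missing)}")
    return [m["name"] for m in data["monitors"]]

sample = ('{"monitors": [{"id": 6, "name": "Nginx Health", '
          '"url": "http://nginx:80/nginx-health", "type": "http", '
          '"interval": 60, "retries": 3, "timeout": 10}]}')
assert validate_monitors(sample) == ["Nginx Health"]
```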

View File

@@ -0,0 +1,24 @@
{
"language": "ru",
"theme": "light",
"timezone": "Europe/Moscow",
"dateLocale": "ru",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"timeFormat": "24",
"weekStart": 1,
"searchEngineIndex": true,
"primaryBaseURL": "https://your-domain/status/",
"public": true,
"publicGroupList": true,
"showTags": true,
"showPoweredBy": false,
"keepDataPeriodDays": 365,
"retentionCheckInterval": 3600,
"maxmindLicenseKey": "",
"dnsCache": true,
"dnsCacheTtl": 300,
"trustProxy": true,
"disableAuth": false,
"defaultTimezone": "Europe/Moscow",
"defaultLanguage": "ru"
}

scripts/deploy-from-github.sh Executable file
View File

@@ -0,0 +1,109 @@
#!/bin/bash
# Deployment script for GitHub Actions
# Used on the server for safe updates
set -e
PROJECT_DIR="/home/prod"
BACKUP_DIR="/home/prod/backups"
LOG_FILE="/home/prod/logs/deploy.log"
# Create the directories if they do not exist
mkdir -p "$BACKUP_DIR"
mkdir -p "$(dirname "$LOG_FILE")"
# Logging helper
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}
log "🚀 Starting deployment..."
# Change to the project directory
cd "$PROJECT_DIR" || exit 1
# Record the current commit
CURRENT_COMMIT=$(git rev-parse HEAD)
log "Current commit: $CURRENT_COMMIT"
# Back up the configuration before updating
log "💾 Creating backup..."
BACKUP_FILE="$BACKUP_DIR/backup-$(date +%Y%m%d-%H%M%S).tar.gz"
tar -czf "$BACKUP_FILE" \
infra/prometheus/prometheus.yml \
infra/grafana/provisioning/ \
docker-compose.yml \
2>/dev/null || true
log "Backup created: $BACKUP_FILE"
# Pull the latest code
log "📥 Pulling latest changes..."
git fetch origin main
git reset --hard origin/main
# Check for changes
NEW_COMMIT=$(git rev-parse HEAD)
if [ "$CURRENT_COMMIT" = "$NEW_COMMIT" ]; then
log "No new changes to deploy"
exit 0
fi
log "✅ Code updated: $CURRENT_COMMIT -> $NEW_COMMIT"
# Validate docker-compose syntax
log "🔍 Validating docker-compose.yml..."
if ! docker-compose config > /dev/null 2>&1; then
log "❌ docker-compose.yml validation failed!"
log "🔄 Rolling back..."
git reset --hard "$CURRENT_COMMIT"
exit 1
fi
# Restart the services
log "🔄 Restarting services..."
if command -v make &> /dev/null; then
make restart
else
docker-compose down
docker-compose up -d --build
fi
# Wait for the services to start
log "⏳ Waiting for services to start..."
sleep 20
# Health checks
log "🏥 Running health checks..."
HEALTH_CHECK_FAILED=0
# Prometheus
if curl -f http://localhost:9090/-/healthy > /dev/null 2>&1; then
log "✅ Prometheus is healthy"
else
log "❌ Prometheus health check failed"
HEALTH_CHECK_FAILED=1
fi
# Grafana
if curl -f http://localhost:3000/api/health > /dev/null 2>&1; then
log "✅ Grafana is healthy"
else
log "❌ Grafana health check failed"
HEALTH_CHECK_FAILED=1
fi
# If a health check failed, roll back
if [ $HEALTH_CHECK_FAILED -eq 1 ]; then
log "❌ Health checks failed! Rolling back..."
git reset --hard "$CURRENT_COMMIT"
make restart || docker-compose restart
log "🔄 Rollback completed"
exit 1
fi
log "✅ Deployment completed successfully!"
log "📊 Container status:"
docker-compose ps || docker ps --filter "name=bots_"
exit 0
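The script's check-then-rollback flow boils down to: run every health check, and if any fails, restore the previous commit. A sketch of that decision over named check callables (the function and its return shape are ours):

```python
def run_health_checks(checks):
    """Run named checks; return (all_passed, failed_names), like the script's flag."""
    failed = [name for name, check in checks.items() if not check()]
    return (len(failed) == 0, failed)

# Stubs standing in for the curl probes against Prometheus and Grafana
checks = {"prometheus": lambda: True, "grafana": lambda: False}
ok, failed = run_health_checks(checks)
assert not ok and failed == ["grafana"]  # the script would roll back here
```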

View File

@@ -0,0 +1,31 @@
#!/bin/bash
# Script to generate HTTP Basic Auth passwords for monitoring services
# Usage: ./generate_auth_passwords.sh [username]
set -e
# Default username if not provided
USERNAME=${1:-"admin"}
# Create passwords directory if it doesn't exist
PASSWORDS_DIR="/etc/nginx/passwords"
mkdir -p "$PASSWORDS_DIR"
# Generate random password
PASSWORD=$(openssl rand -base64 32 | tr -d "=+/" | cut -c1-25)
# Create htpasswd file
echo "Creating password file for user: $USERNAME"
htpasswd -cb "$PASSWORDS_DIR/monitoring.htpasswd" "$USERNAME" "$PASSWORD"
# Set proper permissions
chown root:www-data "$PASSWORDS_DIR/monitoring.htpasswd"
chmod 640 "$PASSWORDS_DIR/monitoring.htpasswd"
echo "Password file created: $PASSWORDS_DIR/monitoring.htpasswd"
echo "Username: $USERNAME"
echo "Password: $PASSWORD"
echo ""
echo "Save this password securely!"
echo "You can add more users with: htpasswd $PASSWORDS_DIR/monitoring.htpasswd <username>"
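The `openssl rand -base64 32 | tr -d "=+/" | cut -c1-25` pipeline has a direct Python analogue; a sketch under the same assumptions (base64 of 32 random bytes, strip the `=+/` characters, keep 25 characters):

```python
import base64
import os

def generate_password(length=25):
    """Base64-encode 32 random bytes, drop '=+/' as tr does, keep `length` chars."""
    raw = base64.b64encode(os.urandom(32)).decode("ascii")
    return raw.translate(str.maketrans("", "", "=+/"))[:length]

pw = generate_password()
assert len(pw) == 25 and pw.isalnum()  # only A-Z, a-z, 0-9 remain
```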

scripts/setup-ssl.sh Executable file
View File

@@ -0,0 +1,163 @@
#!/bin/bash
# SSL Setup Script for Let's Encrypt
# This script sets up SSL certificates using Let's Encrypt
set -e
# Configuration
DOMAIN="${DOMAIN:-localhost}"
EMAIL="${EMAIL:-admin@${DOMAIN}}"
NGINX_CONTAINER="bots_nginx"
CERTBOT_IMAGE="certbot/certbot:latest"
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
# Logging function
log() {
echo -e "${GREEN}[$(date +'%Y-%m-%d %H:%M:%S')] $1${NC}"
}
warn() {
echo -e "${YELLOW}[$(date +'%Y-%m-%d %H:%M:%S')] WARNING: $1${NC}"
}
error() {
echo -e "${RED}[$(date +'%Y-%m-%d %H:%M:%S')] ERROR: $1${NC}"
exit 1
}
# Check if running as root
if [[ $EUID -eq 0 ]]; then
error "This script should not be run as root for security reasons"
fi
# Check if domain is localhost
if [[ "$DOMAIN" == "localhost" ]]; then
warn "Domain is set to localhost. Let's Encrypt certificates cannot be issued for localhost."
warn "Please set the DOMAIN environment variable to your actual domain name."
warn "Example: DOMAIN=example.com ./scripts/setup-ssl.sh"
exit 1
fi
# Check if Docker is running
if ! docker info > /dev/null 2>&1; then
error "Docker is not running. Please start Docker and try again."
fi
# Check if nginx container is running
if ! docker ps | grep -q "$NGINX_CONTAINER"; then
error "Nginx container ($NGINX_CONTAINER) is not running. Please start it first with 'docker-compose up -d nginx'"
fi
log "Setting up SSL certificates for domain: $DOMAIN"
log "Email for Let's Encrypt: $EMAIL"
# Create necessary directories
log "Creating Let's Encrypt directories..."
sudo mkdir -p /etc/letsencrypt/live
sudo mkdir -p /etc/letsencrypt/archive
sudo mkdir -p /etc/letsencrypt/renewal
sudo chmod 755 /etc/letsencrypt
# Stop nginx temporarily for certificate generation
log "Stopping nginx container for certificate generation..."
docker stop "$NGINX_CONTAINER" || true
# Generate certificate using certbot
log "Generating SSL certificate using Let's Encrypt..."
docker run --rm \
-v /etc/letsencrypt:/etc/letsencrypt \
-v /var/lib/letsencrypt:/var/lib/letsencrypt \
-p 80:80 \
-p 443:443 \
"$CERTBOT_IMAGE" certonly \
--standalone \
--non-interactive \
--agree-tos \
--email "$EMAIL" \
--domains "$DOMAIN" \
--expand
# Check if certificate was generated successfully
if [[ ! -f "/etc/letsencrypt/live/$DOMAIN/fullchain.pem" ]]; then
error "Failed to generate SSL certificate for $DOMAIN"
fi
log "SSL certificate generated successfully!"
# Set proper permissions
log "Setting proper permissions for SSL certificates..."
sudo chmod 755 /etc/letsencrypt/live
sudo chmod 755 /etc/letsencrypt/archive
sudo chmod 644 /etc/letsencrypt/live/"$DOMAIN"/*.pem
sudo chmod 600 /etc/letsencrypt/live/"$DOMAIN"/privkey.pem
# Update nginx configuration to use Let's Encrypt certificates
log "Updating nginx configuration..."
if [[ -f "infra/nginx/ssl/letsencrypt.conf" ]]; then
# Replace domain placeholder in letsencrypt.conf
sed "s/{{DOMAIN}}/$DOMAIN/g" infra/nginx/ssl/letsencrypt.conf > /tmp/letsencrypt.conf
sudo cp /tmp/letsencrypt.conf /etc/letsencrypt/live/"$DOMAIN"/letsencrypt.conf
rm /tmp/letsencrypt.conf
fi
# Start nginx container
log "Starting nginx container..."
docker start "$NGINX_CONTAINER"
# Wait for nginx to start
log "Waiting for nginx to start..."
sleep 10
# Test SSL certificate
log "Testing SSL certificate..."
if curl -k -s "https://$DOMAIN" > /dev/null; then
log "SSL certificate is working correctly!"
else
warn "SSL certificate test failed. Please check nginx configuration."
fi
# Set up automatic renewal
log "Setting up automatic certificate renewal..."
cat > /tmp/ssl-renewal.sh << EOF
#!/bin/bash
# SSL Certificate Renewal Script
set -e
DOMAIN="$DOMAIN"
NGINX_CONTAINER="$NGINX_CONTAINER"
CERTBOT_IMAGE="$CERTBOT_IMAGE"
# Renew certificates
docker run --rm \\
-v /etc/letsencrypt:/etc/letsencrypt \\
-v /var/lib/letsencrypt:/var/lib/letsencrypt \\
"$CERTBOT_IMAGE" renew --quiet
# Reload nginx
docker exec "\$NGINX_CONTAINER" nginx -s reload
echo "\$(date): SSL certificates renewed successfully" >> /var/log/ssl-renewal.log
EOF
sudo mv /tmp/ssl-renewal.sh /usr/local/bin/ssl-renewal.sh
sudo chmod +x /usr/local/bin/ssl-renewal.sh
# Add cron job for automatic renewal (every Monday at 2 AM)
log "Adding cron job for automatic renewal..."
(crontab -l 2>/dev/null; echo "0 2 * * 1 /usr/local/bin/ssl-renewal.sh") | crontab -
log "SSL setup completed successfully!"
log "Certificate location: /etc/letsencrypt/live/$DOMAIN/"
log "Automatic renewal is configured to run every Monday at 2 AM"
log "You can test the renewal manually with: sudo /usr/local/bin/ssl-renewal.sh"
# Display certificate information
log "Certificate information:"
openssl x509 -in "/etc/letsencrypt/live/$DOMAIN/fullchain.pem" -text -noout | grep -E "(Subject:|Not Before|Not After|DNS:)"
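The `sed "s/{{DOMAIN}}/$DOMAIN/g"` step above is a plain placeholder substitution over the `letsencrypt.conf` template; in Python terms (the helper name is ours):

```python
def render_ssl_conf(template, domain):
    """Replace every {{DOMAIN}} placeholder, as the script's sed call does."""
    return template.replace("{{DOMAIN}}", domain)

tpl = "ssl_certificate /etc/letsencrypt/live/{{DOMAIN}}/fullchain.pem;"
assert render_ssl_conf(tpl, "example.com") == (
    "ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;"
)
```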

View File

@@ -3,12 +3,12 @@
 Tests for the Prometheus configuration
 """
-import pytest
-import yaml
-import sys
 import os
+import sys
 from pathlib import Path
+
+import pytest
+import yaml
 
 class TestPrometheusConfig:
@@ -17,7 +17,12 @@ class TestPrometheusConfig:
     @pytest.fixture
     def prometheus_config_path(self):
         """Path to the Prometheus configuration file"""
-        return Path(__file__).parent.parent.parent / 'infra' / 'prometheus' / 'prometheus.yml'
+        return (
+            Path(__file__).parent.parent.parent
+            / "infra"
+            / "prometheus"
+            / "prometheus.yml"
+        )
 
     @pytest.fixture
     def prometheus_config(self, prometheus_config_path):
@@ -25,100 +30,117 @@ class TestPrometheusConfig:
         if not prometheus_config_path.exists():
             pytest.skip(f"Prometheus config file not found: {prometheus_config_path}")
-        with open(prometheus_config_path, 'r', encoding='utf-8') as f:
+        with open(prometheus_config_path, "r", encoding="utf-8") as f:
             return yaml.safe_load(f)
 
     def test_config_file_exists(self, prometheus_config_path):
         """Test that the configuration file exists"""
-        assert prometheus_config_path.exists(), f"Prometheus config file not found: {prometheus_config_path}"
+        assert (
+            prometheus_config_path.exists()
+        ), f"Prometheus config file not found: {prometheus_config_path}"
 
     def test_config_is_valid_yaml(self, prometheus_config):
         """Test that the configuration is valid YAML"""
-        assert isinstance(prometheus_config, dict), "Config should be a valid YAML dictionary"
+        assert isinstance(
+            prometheus_config, dict
+        ), "Config should be a valid YAML dictionary"
 
     def test_global_section(self, prometheus_config):
         """Test the global configuration section"""
-        assert 'global' in prometheus_config, "Config should have global section"
-        global_config = prometheus_config['global']
-        assert 'scrape_interval' in global_config, "Global section should have scrape_interval"
-        assert 'evaluation_interval' in global_config, "Global section should have evaluation_interval"
+        assert "global" in prometheus_config, "Config should have global section"
+        global_config = prometheus_config["global"]
+        assert (
+            "scrape_interval" in global_config
+        ), "Global section should have scrape_interval"
+        assert (
+            "evaluation_interval" in global_config
+        ), "Global section should have evaluation_interval"
         # Check the interval values
-        assert global_config['scrape_interval'] == '15s', "Default scrape_interval should be 15s"
-        assert global_config['evaluation_interval'] == '15s', "Default evaluation_interval should be 15s"
+        assert (
+            global_config["scrape_interval"] == "15s"
+        ), "Default scrape_interval should be 15s"
+        assert (
+            global_config["evaluation_interval"] == "15s"
+        ), "Default evaluation_interval should be 15s"
 
     def test_scrape_configs_section(self, prometheus_config):
         """Test the scrape_configs section"""
-        assert 'scrape_configs' in prometheus_config, "Config should have scrape_configs section"
-        scrape_configs = prometheus_config['scrape_configs']
+        assert (
+            "scrape_configs" in prometheus_config
+        ), "Config should have scrape_configs section"
+        scrape_configs = prometheus_config["scrape_configs"]
         assert isinstance(scrape_configs, list), "scrape_configs should be a list"
         assert len(scrape_configs) >= 1, "Should have at least one scrape config"
 
     def test_prometheus_job(self, prometheus_config):
         """Test the job for Prometheus itself"""
-        scrape_configs = prometheus_config['scrape_configs']
+        scrape_configs = prometheus_config["scrape_configs"]
         # Find the prometheus job
         prometheus_job = None
         for job in scrape_configs:
-            if job.get('job_name') == 'prometheus':
+            if job.get("job_name") == "prometheus":
                 prometheus_job = job
                 break
         assert prometheus_job is not None, "Should have prometheus job"
-        assert 'static_configs' in prometheus_job, "Prometheus job should have static_configs"
-        static_configs = prometheus_job['static_configs']
+        assert (
+            "static_configs" in prometheus_job
+        ), "Prometheus job should have static_configs"
+        static_configs = prometheus_job["static_configs"]
         assert isinstance(static_configs, list), "static_configs should be a list"
         assert len(static_configs) > 0, "Should have at least one static config"
         # Check the targets
-        targets = static_configs[0].get('targets', [])
-        assert 'localhost:9090' in targets, "Prometheus should scrape localhost:9090"
+        targets = static_configs[0].get("targets", [])
+        assert "localhost:9090" in targets, "Prometheus should scrape localhost:9090"
 
     def test_telegram_bot_job(self, prometheus_config):
         """Test the job for telegram-helper-bot"""
-        scrape_configs = prometheus_config['scrape_configs']
+        scrape_configs = prometheus_config["scrape_configs"]
         # Find the telegram-helper-bot job
         bot_job = None
         for job in scrape_configs:
-            if job.get('job_name') == 'telegram-helper-bot':
+            if job.get("job_name") == "telegram-helper-bot":
                 bot_job = job
                 break
         assert bot_job is not None, "Should have telegram-helper-bot job"
         # Check the main parameters
-        assert 'static_configs' in bot_job, "Bot job should have static_configs"
-        assert 'metrics_path' in bot_job, "Bot job should have metrics_path"
-        assert 'scrape_interval' in bot_job, "Bot job should have scrape_interval"
-        assert 'scrape_timeout' in bot_job, "Bot job should have scrape_timeout"
-        assert 'honor_labels' in bot_job, "Bot job should have honor_labels"
+        assert "static_configs" in bot_job, "Bot job should have static_configs"
+        assert "metrics_path" in bot_job, "Bot job should have metrics_path"
+        assert "scrape_interval" in bot_job, "Bot job should have scrape_interval"
+        assert "scrape_timeout" in bot_job, "Bot job should have scrape_timeout"
+        assert "honor_labels" in bot_job, "Bot job should have honor_labels"
         # Check the values
-        assert bot_job['metrics_path'] == '/metrics', "Metrics path should be /metrics"
+        assert bot_job["metrics_path"] == "/metrics", "Metrics path should be /metrics"
-        assert bot_job['scrape_interval'] == '15s', "Scrape interval should be 15s"
+        assert bot_job["scrape_interval"] == "15s", "Scrape interval should be 15s"
assert bot_job['scrape_timeout'] == '10s', "Scrape timeout should be 10s" assert bot_job["scrape_timeout"] == "10s", "Scrape timeout should be 10s"
assert bot_job['honor_labels'] is True, "honor_labels should be True" assert bot_job["honor_labels"] is True, "honor_labels should be True"
# Проверяем static_configs # Проверяем static_configs
static_configs = bot_job['static_configs'] static_configs = bot_job["static_configs"]
assert len(static_configs) > 0, "Should have at least one static config" assert len(static_configs) > 0, "Should have at least one static config"
# Проверяем targets # Проверяем targets
targets = static_configs[0].get('targets', []) targets = static_configs[0].get("targets", [])
assert 'bots_telegram_bot:8080' in targets, "Should scrape bots_telegram_bot:8080" assert (
"bots_telegram_bot:8080" in targets
), "Should scrape bots_telegram_bot:8080"
# Проверяем labels # Проверяем labels
labels = static_configs[0].get('labels', {}) labels = static_configs[0].get("labels", {})
expected_labels = { expected_labels = {
'bot_name': 'telegram-helper-bot', "bot_name": "telegram-helper-bot",
'environment': 'production', "environment": "production",
'service': 'telegram-bot' "service": "telegram-bot",
} }
for key, value in expected_labels.items(): for key, value in expected_labels.items():
@@ -127,106 +149,144 @@ class TestPrometheusConfig:
    def test_alerting_section(self, prometheus_config):
        """Test the alerting section"""
        assert "alerting" in prometheus_config, "Config should have alerting section"

        alerting_config = prometheus_config["alerting"]
        assert (
            "alertmanagers" in alerting_config
        ), "Alerting section should have alertmanagers"

        alertmanagers = alerting_config["alertmanagers"]
        assert isinstance(alertmanagers, list), "alertmanagers should be a list"

        # Check that the alertmanager is configured correctly
        if len(alertmanagers) > 0:
            for am in alertmanagers:
                if "static_configs" in am:
                    static_configs = am["static_configs"]
                    assert isinstance(
                        static_configs, list
                    ), "static_configs should be a list"
                    for sc in static_configs:
                        if "targets" in sc:
                            targets = sc["targets"]
                            # targets may be None if every line is commented out
                            if targets is not None:
                                assert isinstance(
                                    targets, list
                                ), "targets should be a list"
                                # Check that targets are non-empty and well-formed
                                for target in targets:
                                    assert isinstance(
                                        target, str
                                    ), f"Target should be a string: {target}"
                                    # If the target is not commented out, check its format
                                    if not target.startswith("#"):
                                        assert (
                                            ":" in target
                                        ), f"Target should have port: {target}"
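For reference, the parsed structure this test walks looks like the following. This is an illustrative dict, shaped as `yaml.safe_load` would produce it from the `alerting` section of `prometheus.yml`; the `alertmanager:9093` target is an assumption, not taken from the repository:

```python
alerting = {
    "alertmanagers": [
        {"static_configs": [{"targets": ["alertmanager:9093"]}]}
    ]
}

# The same walk the test performs: every non-commented target must carry a port.
for am in alerting["alertmanagers"]:
    for sc in am.get("static_configs", []):
        for target in sc.get("targets") or []:  # tolerate targets: None
            if not target.startswith("#"):
                assert ":" in target, f"Target should have port: {target}"
print("alerting structure OK")
```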
    def test_rule_files_section(self, prometheus_config):
        """Test the rule_files section"""
        assert (
            "rule_files" in prometheus_config
        ), "Config should have rule_files section"

        rule_files = prometheus_config["rule_files"]
        # rule_files may be None if every line is commented out
        if rule_files is not None:
            assert isinstance(rule_files, list), "rule_files should be a list"
            # Check that the rule files are well-formed
            for rule_file in rule_files:
                assert isinstance(
                    rule_file, str
                ), f"Rule file should be a string: {rule_file}"
                # If the rule file is not commented out, check that it is a valid path
                if not rule_file.startswith("#"):
                    assert rule_file.endswith(
                        (".yml", ".yaml")
                    ), f"Rule file should have .yml or .yaml extension: {rule_file}"
    def test_config_structure_consistency(self, prometheus_config):
        """Test the consistency of the configuration structure"""
        # Check that every job has the same structure
        scrape_configs = prometheus_config["scrape_configs"]

        required_fields = ["job_name", "static_configs"]
        optional_fields = [
            "metrics_path",
            "scrape_interval",
            "scrape_timeout",
            "honor_labels",
        ]

        for job in scrape_configs:
            # Check the required fields
            for field in required_fields:
                assert (
                    field in job
                ), f"Job {job.get('job_name', 'unknown')} missing required field: {field}"

            # Check that static_configs contains targets
            static_configs = job["static_configs"]
            assert isinstance(
                static_configs, list
            ), f"Job {job.get('job_name', 'unknown')} static_configs should be list"

            for static_config in static_configs:
                assert "targets" in static_config, "Static config should have targets"
                targets = static_config["targets"]
                assert isinstance(targets, list), "Targets should be a list"
                assert len(targets) > 0, "Targets should not be empty"
    def test_port_configurations(self, prometheus_config):
        """Test the port configuration"""
        scrape_configs = prometheus_config["scrape_configs"]

        # Check that the ports are configured correctly
        for job in scrape_configs:
            static_configs = job["static_configs"]
            for static_config in static_configs:
                targets = static_config["targets"]
                for target in targets:
                    if ":" in target:
                        host, port = target.split(":", 1)
                        # Check that the port is a number
                        try:
                            port_num = int(port)
                            assert (
                                1 <= port_num <= 65535
                            ), f"Port {port_num} out of range"
                        except ValueError:
                            # May be a Docker service name without a numeric port
                            pass
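The port check above can be factored into a standalone helper, which makes the accepted and rejected cases easier to see. A minimal sketch; the name `validate_target` is illustrative and not part of the repository:

```python
def validate_target(target: str) -> bool:
    """Return True if a scrape target is either a bare Docker service
    name or a host:port pair whose port is in the valid TCP range."""
    if ":" not in target:
        return True  # bare service name; Docker DNS resolves it
    _host, port = target.split(":", 1)
    try:
        return 1 <= int(port) <= 65535
    except ValueError:
        return False  # non-numeric suffix after the colon


print(validate_target("localhost:9090"))          # True
print(validate_target("bots_telegram_bot:8080"))  # True
print(validate_target("host:99999"))              # False: port out of range
```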
    def test_environment_labels(self, prometheus_config):
        """Test the environment labels"""
        scrape_configs = prometheus_config["scrape_configs"]

        # Check that the production environment is labelled correctly
        for job in scrape_configs:
            if job.get("job_name") == "telegram-helper-bot":
                static_configs = job["static_configs"]
                for static_config in static_configs:
                    labels = static_config.get("labels", {})
                    if "environment" in labels:
                        assert (
                            labels["environment"] == "production"
                        ), "Environment should be production"
    def test_metrics_path_consistency(self, prometheus_config):
        """Test the consistency of metrics paths"""
        scrape_configs = prometheus_config["scrape_configs"]

        # Check that every job uses /metrics
        for job in scrape_configs:
            if "metrics_path" in job:
                assert (
                    job["metrics_path"] == "/metrics"
                ), f"Job {job.get('job_name', 'unknown')} should use /metrics path"
class TestPrometheusConfigValidation:
@@ -236,73 +296,58 @@ class TestPrometheusConfigValidation:
    def sample_valid_config(self):
        """An example of a valid configuration"""
        return {
            "global": {"scrape_interval": "15s", "evaluation_interval": "15s"},
            "scrape_configs": [
                {
                    "job_name": "test",
                    "static_configs": [{"targets": ["localhost:9090"]}],
                }
            ],
        }
    def test_minimal_valid_config(self, sample_valid_config):
        """Test a minimal valid configuration"""
        # Check that the configuration contains all the required fields
        assert "global" in sample_valid_config
        assert "scrape_configs" in sample_valid_config

        global_config = sample_valid_config["global"]
        assert "scrape_interval" in global_config
        assert "evaluation_interval" in global_config

        scrape_configs = sample_valid_config["scrape_configs"]
        assert len(scrape_configs) > 0

        for job in scrape_configs:
            assert "job_name" in job
            assert "static_configs" in job

            static_configs = job["static_configs"]
            assert len(static_configs) > 0

            for static_config in static_configs:
                assert "targets" in static_config
                targets = static_config["targets"]
                assert len(targets) > 0
    def test_config_without_required_fields(self):
        """Test configurations that lack required fields"""
        # A configuration without the global section
        config_without_global = {"scrape_configs": []}

        # A configuration without scrape_configs
        config_without_scrape = {"global": {"scrape_interval": "15s"}}

        # A configuration with empty scrape_configs
        config_empty_scrape = {
            "global": {"scrape_interval": "15s"},
            "scrape_configs": [],
        }

        # All of these configurations should be invalid
        assert "global" not in config_without_global
        assert "scrape_configs" not in config_without_scrape
        assert len(config_empty_scrape["scrape_configs"]) == 0
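The invariants these validation tests assert can be collected into a single predicate. A sketch under the assumption that a config is a plain dict as parsed from `prometheus.yml`; the function `is_valid_config` is illustrative and not part of the test suite:

```python
def is_valid_config(config: dict) -> bool:
    """Check the invariants the tests assert: global and scrape_configs
    sections present, at least one job, every job named and targeted."""
    if "global" not in config or "scrape_configs" not in config:
        return False
    scrape_configs = config["scrape_configs"]
    if not isinstance(scrape_configs, list) or not scrape_configs:
        return False
    for job in scrape_configs:
        if "job_name" not in job or "static_configs" not in job:
            return False
        for sc in job["static_configs"]:
            if not sc.get("targets"):  # missing, None, or empty list
                return False
    return True


valid = {
    "global": {"scrape_interval": "15s"},
    "scrape_configs": [
        {"job_name": "test", "static_configs": [{"targets": ["localhost:9090"]}]}
    ],
}
print(is_valid_config(valid))                   # True
print(is_valid_config({"scrape_configs": []}))  # False: no global section
```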
if __name__ == "__main__":


@@ -3,27 +3,35 @@
Pytest configuration test
"""

import os
import sys

import pytest
def test_pytest_config_loaded():
    """Check that the pytest configuration is loaded"""
    # Check that we are in the project root directory
    assert os.path.exists("pytest.ini"), "pytest.ini must exist in the project root"

    # Check that the tests directory exists
    assert os.path.exists("tests"), "The tests directory must exist"
    assert os.path.exists("tests/infra"), "The tests/infra directory must exist"
    assert os.path.exists("tests/bot"), "The tests/bot directory must exist"
def test_test_structure():
    """Check the test structure"""
    # Check that the __init__.py files exist
    assert os.path.exists("tests/__init__.py"), "tests/__init__.py must exist"
    assert os.path.exists(
        "tests/infra/__init__.py"
    ), "tests/infra/__init__.py must exist"
    assert os.path.exists(
        "tests/bot/__init__.py"
    ), "tests/bot/__init__.py must exist"
if __name__ == "__main__":