Logging
All services use Pino for structured JSON logging. Logs include request IDs, job IDs, and chain identifiers for tracing.
Log Format
{
  "level": "info",
  "time": 1717585200000,
  "requestId": "abc123",
  "chain": "solana",
  "msg": "Request completed",
  "method": "GET",
  "path": "/api/v1/wallets/7xKp.../rating",
  "statusCode": 200,
  "responseTime": 45
}
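Because each entry is a single JSON object, tracing one request comes down to parsing lines and filtering on requestId. The following is a minimal sketch, assuming the logs are available as newline-delimited JSON; the `LogEntry` type, the `jobId` field name, and both helper functions are hypothetical names for illustration, not part of the services.

```typescript
// Shape of a log entry, mirroring the sample above.
// The jobId field name is an assumption; only requestId and chain appear in the sample.
interface LogEntry {
  level: string;
  time: number; // epoch milliseconds
  msg: string;
  requestId?: string;
  jobId?: string;
  chain?: string;
  [key: string]: unknown;
}

// Parse newline-delimited JSON, skipping blank, malformed, or incomplete lines.
function parseLogLines(raw: string): LogEntry[] {
  const entries: LogEntry[] = [];
  for (const line of raw.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) continue;
      const rec = parsed as Record<string, unknown>;
      if (
        typeof rec["level"] === "string" &&
        typeof rec["time"] === "number" &&
        typeof rec["msg"] === "string"
      ) {
        entries.push(rec as LogEntry);
      }
    } catch {
      // Not valid JSON: ignore the line.
    }
  }
  return entries;
}

// Return every entry for one request, oldest first.
function entriesForRequest(entries: LogEntry[], requestId: string): LogEntry[] {
  return entries
    .filter((e) => e.requestId === requestId)
    .sort((a, b) => a.time - b.time);
}
```

The same filter works for a job ID or chain identifier by swapping the field being compared.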
Log Levels
| Level | Usage |
| --- | --- |
| error | Errors requiring attention |
| warn | Recoverable issues, deprecated usage |
| info | Request/response, job completion |
| debug | Detailed debugging (dev only) |
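The "dev only" rule for debug can be enforced when the minimum log level is chosen. This is a small sketch under stated assumptions: the environment name "development" and the functions `resolveLogLevel` and `isEnabled` are hypothetical. The numeric severities match Pino's default values for these four levels.

```typescript
type Level = "debug" | "info" | "warn" | "error";

// Numeric severities matching Pino's defaults for these four levels.
const LEVEL_VALUES: Record<Level, number> = {
  debug: 20,
  info: 30,
  warn: 40,
  error: 50,
};

// Assumed policy: debug output only when running in development.
function resolveLogLevel(environment: string): Level {
  return environment === "development" ? "debug" : "info";
}

// True when a message at `level` passes the configured minimum level.
function isEnabled(level: Level, minimum: Level): boolean {
  return LEVEL_VALUES[level] >= LEVEL_VALUES[minimum];
}
```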
Key Metrics
Parser Metrics (Per Chain)
| Metric | Description |
| --- | --- |
| parser.{chain}.success_rate | % of transactions parsed successfully |
| parser.{chain}.unknown_protocol_rate | % falling back to unknown |
| parser.{chain}.latency_ms | Time to parse a transaction |
| parser.{chain}.mev_detected | MEV attacks detected |
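To show how the per-chain names and rates fit together, here is a minimal sketch that derives the two percentage metrics from raw counters. The `ParserCounters` fields and both functions are assumptions for illustration. In particular, it assumes a transaction that falls back to the unknown protocol still counts as parsed.

```typescript
// Raw per-chain counters; field names are hypothetical.
interface ParserCounters {
  parsed: number; // parsed successfully (assumed to include unknown-protocol fallbacks)
  failed: number; // could not be parsed
  unknown: number; // parsed, but fell back to the unknown protocol
}

// Expand the {chain} placeholder in a parser metric name.
function parserMetricName(chain: string, metric: string): string {
  return `parser.${chain}.${metric}`;
}

// Compute the percentage metrics for one chain.
// Returns an empty object when there was no traffic, so an idle chain
// does not report a misleading 0% success rate.
function parserRates(chain: string, c: ParserCounters): Record<string, number> {
  const total = c.parsed + c.failed;
  if (total === 0) return {};
  return {
    [parserMetricName(chain, "success_rate")]: (c.parsed * 100) / total,
    [parserMetricName(chain, "unknown_protocol_rate")]: (c.unknown * 100) / total,
  };
}
```

For example, `parserRates("solana", { parsed: 95, failed: 5, unknown: 10 })` yields a success rate of 95 and an unknown protocol rate of 10 under the names parser.solana.success_rate and parser.solana.unknown_protocol_rate.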
API Metrics
| Metric | Description |
| --- | --- |
| api.request_count | Total requests by endpoint |
| api.error_rate | 5xx error rate |
| api.p99_latency | 99th percentile response time |
| api.rate_limit_hits | Rate limit rejections |
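The error rate and p99 latency can both be derived from a window of request samples. The sketch below uses the nearest-rank percentile method, which is one common definition and not necessarily the one the metrics backend uses; the `RequestSample` type and function names are hypothetical.

```typescript
// One completed request; field names are assumptions for illustration.
interface RequestSample {
  statusCode: number;
  responseTimeMs: number;
}

// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("percentile of empty sample set");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[Math.min(rank, sorted.length) - 1]!;
}

// Summarize a window of samples into the API metrics listed above.
function apiSummary(samples: RequestSample[]): Record<string, number> {
  const total = samples.length;
  if (total === 0) {
    return { "api.request_count": 0 };
  }
  const serverErrors = samples.filter(
    (s) => s.statusCode >= 500 && s.statusCode <= 599,
  ).length;
  return {
    "api.request_count": total,
    "api.error_rate": (serverErrors * 100) / total, // percent of 5xx responses
    "api.p99_latency": percentile(samples.map((s) => s.responseTimeMs), 99),
  };
}
```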
Worker Metrics
| Metric | Description |
| --- | --- |
| worker.job_queue_depth | Pending jobs in queue |
| worker.job_success_rate | % of jobs completing successfully |
| worker.backfill_throughput | Transactions processed per minute |
| worker.webhook_delivery_rate | Successful webhook deliveries |
Intelligence Metrics
| Metric | Description |
| --- | --- |
| intelligence.wallets_flagged | New wallets flagged per hour |
| intelligence.copy_detections | Copy trading alerts generated |
| intelligence.decay_transitions | Sharp → Fading → Dead transitions |
Health Checks
Each service exposes a /health endpoint with multi-chain status:
GET /health
{
  "status": "healthy",
  "version": "1.0.0",
  "timestamp": "2026-06-05T12:00:00Z",
  "checks": {
    "database": "ok",
    "job_queue": "ok",
    "chains": {
      "solana": "ok",
      "base": "ok",
      "hyperliquid": "ok"
    }
  }
}
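One way to assemble this response is to roll the individual checks up into a single status. The sketch below is illustrative only: the sample shows just "healthy" and "ok", so the "degraded" and "unhealthy" statuses, the "error" check value, and the roll-up policy are all assumptions, as are the type and function names.

```typescript
type CheckStatus = "ok" | "error"; // "error" is an assumed failure value

interface HealthChecks {
  database: CheckStatus;
  job_queue: CheckStatus;
  chains: Record<string, CheckStatus>;
}

interface HealthResponse {
  status: "healthy" | "degraded" | "unhealthy"; // only "healthy" appears in the sample
  version: string;
  timestamp: string;
  checks: HealthChecks;
}

// Assumed policy: a database or job queue failure makes the service unhealthy;
// a failing chain on its own only degrades it.
function buildHealthResponse(
  checks: HealthChecks,
  version: string,
  now: Date,
): HealthResponse {
  const coreOk = checks.database === "ok" && checks.job_queue === "ok";
  const allChainsOk = Object.values(checks.chains).every((s) => s === "ok");
  const status: HealthResponse["status"] = !coreOk
    ? "unhealthy"
    : allChainsOk
      ? "healthy"
      : "degraded";
  // Drop milliseconds so the timestamp matches the format in the sample.
  const timestamp = now.toISOString().replace(/\.\d{3}Z$/, "Z");
  return { status, version, timestamp, checks };
}
```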
Alerting & Alarms
Critical alerts are configured using Cloudflare Workers Alarms and external monitoring.
Critical Alarms (P1)
| Alarm | Threshold | Action |
| --- | --- | --- |
| Parser failure rate | > 5% | Page on-call, investigate protocol change |
| API error rate | > 1% | Page on-call, check deployments |
| Database connections | > 80% | Scale pool, investigate leaks |
| Chain RPC down | Any chain offline | Failover to backup RPC |
Warning Alarms (P2)
| Alarm | Threshold | Action |
| --- | --- | --- |
| Job queue depth | > 1000 | Scale workers, investigate backlog |
| Webhook delivery failures | > 5% | Retry logic, notify customer |
| Unknown protocol rate | > 10% | Add new DEX normalizer |
| API p99 latency | > 5s | Investigate slow queries |
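The numeric P1 and P2 thresholds above can be expressed as data and checked against a metrics snapshot. This is a sketch, not the actual alerting configuration: the snapshot keys (such as `parser_failure_pct`) are hypothetical, "Database connections > 80%" is assumed to mean percent of pool capacity, and the non-numeric "Chain RPC down" alarm is left out.

```typescript
type Severity = "P1" | "P2";

interface AlarmRule {
  name: string;
  severity: Severity;
  metric: string; // key into the metrics snapshot (hypothetical names)
  threshold: number; // the alarm fires when the value is strictly greater
}

const RULES: AlarmRule[] = [
  { name: "Parser failure rate", severity: "P1", metric: "parser_failure_pct", threshold: 5 },
  { name: "API error rate", severity: "P1", metric: "api_error_pct", threshold: 1 },
  { name: "Database connections", severity: "P1", metric: "db_pool_used_pct", threshold: 80 },
  { name: "Job queue depth", severity: "P2", metric: "job_queue_depth", threshold: 1000 },
  { name: "Webhook delivery failures", severity: "P2", metric: "webhook_failure_pct", threshold: 5 },
  { name: "Unknown protocol rate", severity: "P2", metric: "unknown_protocol_pct", threshold: 10 },
  { name: "API p99 latency", severity: "P2", metric: "api_p99_latency_ms", threshold: 5000 },
];

// Return the rules whose metric exceeds its threshold, P1 before P2.
// Metrics missing from the snapshot are skipped rather than treated as zero.
function firingAlarms(snapshot: Record<string, number>): AlarmRule[] {
  return RULES.filter((rule) => {
    const value = snapshot[rule.metric];
    return value !== undefined && value > rule.threshold;
  }).sort((a, b) => a.severity.localeCompare(b.severity));
}
```

Keeping the thresholds in one table-like structure makes it easier to compare the running configuration against the documented values.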
Cloudflare Workers Alarms
The API uses Durable Objects for scheduled alarm handling:
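// Alarm service for scheduled tasks, implemented as a Durable Object
class AlarmService {
  constructor(private state: DurableObjectState) {}

  // Schedule the next decay check.
  // A Durable Object holds a single alarm at a time, so walletId is unused here;
  // this assumes one object instance per wallet.
  async scheduleDecayCheck(walletId: string, nextCheck: Date) {
    await this.state.storage.setAlarm(nextCheck.getTime());
  }

  // Called by the Workers runtime when the scheduled alarm fires
  async alarm() {
    // Execute scheduled decay state check
    await this.checkDecayState();
    // Reschedule next check
    await this.scheduleNextCheck();
  }

  // checkDecayState() and scheduleNextCheck() are omitted from this excerpt
}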
Scheduled Alarms
| Alarm | Frequency | Purpose |
| --- | --- | --- |
| Decay state check | Hourly per wallet | Update sharp → fading → dead |
| Cohort snapshot | Daily | Generate cohort retention data |
| Rating recalculation | Every 6 hours | Update wallet scores |
| Copy detection scan | Every 15 minutes | Detect new copy patterns |
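The frequencies above translate into intervals from which the next alarm time can be computed. The sketch below assumes runs are aligned to interval boundaries counted from the Unix epoch (so a 15-minute alarm fires on the quarter hour, UTC); the real scheduler may instead reschedule relative to the previous run. The names `ALARM_INTERVALS_MS` and `nextRunAt` are hypothetical.

```typescript
const MINUTE_MS = 60 * 1000;
const HOUR_MS = 60 * MINUTE_MS;

// Intervals taken from the Scheduled Alarms table.
const ALARM_INTERVALS_MS = {
  decayStateCheck: HOUR_MS,
  cohortSnapshot: 24 * HOUR_MS,
  ratingRecalculation: 6 * HOUR_MS,
  copyDetectionScan: 15 * MINUTE_MS,
} as const;

type AlarmName = keyof typeof ALARM_INTERVALS_MS;

// Next interval boundary strictly after nowMs, in epoch milliseconds.
function nextRunAt(alarm: AlarmName, nowMs: number): number {
  const interval = ALARM_INTERVALS_MS[alarm];
  return (Math.floor(nowMs / interval) + 1) * interval;
}
```

The result is an epoch-millisecond timestamp, the same kind of value that `nextCheck.getTime()` produces for `setAlarm` in the alarm service.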
Parser failure rate is the most critical metric. It is usually the first sign of a protocol change or a new transaction type.
Recovery Procedures
Parser Regression
- Identify failing transaction signatures in logs
- Reproduce locally with pnpm --filter cli test-tx <signature>
- Check for protocol/program ID changes
- Update normalizer or add new one
- Deploy and verify success rate recovers
Chain RPC Outage
- Alarms trigger on health check failure
- Automatic failover to backup RPC (if configured)
- Manual intervention if all RPCs fail
- Jobs are retried with exponential backoff
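The failover and retry steps above can be sketched as follows. This is an illustration under stated assumptions, not the services' actual retry logic: the function names, the 500 ms base delay, and the 30 s cap are all hypothetical, and the backoff here has no jitter.

```typescript
// Delay before retrying after a failed attempt (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Try each RPC endpoint in order and return the first successful result.
// Throws the last error if every endpoint fails.
async function callWithFailover<T>(
  endpoints: string[],
  call: (endpoint: string) => Promise<T>,
): Promise<T> {
  let lastError: unknown = new Error("no RPC endpoints configured");
  for (const endpoint of endpoints) {
    try {
      return await call(endpoint);
    } catch (err) {
      lastError = err; // fall through to the next (backup) endpoint
    }
  }
  throw lastError;
}

// Retry a task with exponential backoff between attempts.
// `sleep` is injectable so tests can run without real delays.
async function retryWithBackoff<T>(
  task: () => Promise<T>,
  maxAttempts: number,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

A job would typically wrap the failover call in the retry helper, for example `retryWithBackoff(() => callWithFailover(endpoints, fetchBlock), 5)`, where `endpoints` and `fetchBlock` are placeholders. Once all attempts fail, the error surfaces for manual intervention.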
Never manually delete jobs from the queue. Use the admin API to mark them as failed if needed.