Monitoring | posit. Developer Portal

Logging

All services use Pino for structured JSON logging. Logs include request IDs, job IDs, and chain identifiers for tracing.

Log Format

{
  "level": "info",
  "time": 1717585200000,
  "requestId": "abc123",
  "chain": "solana",
  "msg": "Request completed",
  "method": "GET",
  "path": "/api/v1/wallets/7xKp.../rating",
  "statusCode": 200,
  "responseTime": 45
}
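Pino produces this shape when a per-request child logger carries the tracing fields (`requestId`, `chain`) and the request handler adds the response fields. Pino emits numeric levels by default (30 for `info`), so the label form shown above typically comes from a `formatters.level` option. The sketch below is a dependency-free illustration of the resulting record, not Pino's API. `LogBindings`, `RequestLogFields` and `formatRequestLog` are hypothetical names, and `responseTime` is assumed to be in milliseconds.

```typescript
// Hypothetical types describing the fields shown in the log format above
interface LogBindings {
  requestId: string;
  chain: string;
}

interface RequestLogFields {
  method: string;
  path: string;
  statusCode: number;
  responseTime: number; // milliseconds (assumed unit)
}

// Build one JSON log line in the documented shape.
// `now` is injectable so the output is deterministic in tests.
function formatRequestLog(
  bindings: LogBindings,
  fields: RequestLogFields,
  now: number = Date.now(),
): string {
  return JSON.stringify({
    level: "info",
    time: now, // epoch milliseconds, like Pino's default timestamp
    ...bindings,
    msg: "Request completed",
    ...fields,
  });
}

const line = formatRequestLog(
  { requestId: "abc123", chain: "solana" },
  { method: "GET", path: "/api/v1/wallets/example/rating", statusCode: 200, responseTime: 45 },
  1717585200000,
);
console.log(line);
```

Because every line carries `requestId` and `chain`, a single request can be traced across services by filtering on those two keys.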

Log Levels

Level	Usage
`error`	Errors requiring attention
`warn`	Recoverable issues, deprecated usage
`info`	Request/response, job completion
`debug`	Detailed debugging (dev only)
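Level filtering works by comparing numeric severities. The sketch below uses Pino's default numeric values for these four levels; the `minLevelFor` helper and its environment names are hypothetical, encoding only the table's rule that `debug` is for development.

```typescript
// Numeric severities matching Pino's defaults (higher = more severe)
const LEVELS = { debug: 20, info: 30, warn: 40, error: 50 } as const;
type Level = keyof typeof LEVELS;

// Hypothetical helper: debug output only in development, info elsewhere
function minLevelFor(env: string): Level {
  return env === "development" ? "debug" : "info";
}

// A record is emitted when its severity is at or above the configured minimum
function shouldLog(level: Level, min: Level): boolean {
  return LEVELS[level] >= LEVELS[min];
}
```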

Key Metrics

Parser Metrics (Per Chain)

Metric	Description
`parser.{chain}.success_rate`	% of transactions parsed successfully
`parser.{chain}.unknown_protocol_rate`	% falling back to unknown
`parser.{chain}.latency_ms`	Time to parse a transaction
`parser.{chain}.mev_detected`	MEV attacks detected
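The two rate metrics can be derived from simple per-chain counters. The sketch below assumes hypothetical `total`, `parsed` and `unknown` counters collected over a reporting window, and assumes both rates use total attempts as the denominator.

```typescript
// Hypothetical per-chain counters accumulated over a reporting window
interface ParserCounts {
  total: number;   // transactions the parser attempted
  parsed: number;  // parsed successfully
  unknown: number; // fell back to the unknown-protocol path
}

// Convert raw counts into the percentage metrics listed above.
// Assumption: both rates use total attempts as the denominator.
function parserRates(chain: string, c: ParserCounts): Record<string, number> {
  const pct = (n: number) => (c.total === 0 ? 0 : (100 * n) / c.total);
  return {
    [`parser.${chain}.success_rate`]: pct(c.parsed),
    [`parser.${chain}.unknown_protocol_rate`]: pct(c.unknown),
  };
}

const rates = parserRates("solana", { total: 200, parsed: 190, unknown: 30 });
```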

API Metrics

Metric	Description
`api.request_count`	Total requests by endpoint
`api.error_rate`	5xx error rate
`api.p99_latency`	99th percentile response time
`api.rate_limit_hits`	Rate limit rejections
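As a sketch of how `api.p99_latency` and `api.error_rate` could be computed from a window of raw samples: the percentile below uses the nearest-rank method, which is an assumption, since monitoring backends often use interpolated or histogram-based estimates instead.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  // Integer arithmetic first to avoid floating-point error in the rank
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// api.error_rate counts only 5xx responses, as a percentage of all requests
function errorRate(statusCodes: number[]): number {
  if (statusCodes.length === 0) return 0;
  const errors = statusCodes.filter((s) => s >= 500 && s <= 599).length;
  return (100 * errors) / statusCodes.length;
}
```

Note that 4xx responses, including 429 rate-limit rejections, do not count toward the error rate; those are tracked separately by `api.rate_limit_hits`.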

Worker Metrics

Metric	Description
`worker.job_queue_depth`	Pending jobs in queue
`worker.job_success_rate`	% of jobs completing successfully
`worker.backfill_throughput`	Transactions processed per minute
`worker.webhook_delivery_rate`	Successful webhook deliveries
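The rate-style worker metrics reduce to ratios over a sampling window. The `WorkerWindow` shape below is hypothetical; the sketch only shows the arithmetic for the success rate and for converting a window's count into transactions per minute.

```typescript
// Hypothetical snapshot of worker counters over a sampling window
interface WorkerWindow {
  jobsCompleted: number;
  jobsFailed: number;
  transactionsProcessed: number; // by backfill jobs
  elapsedMs: number;             // length of the window
}

function workerMetrics(w: WorkerWindow) {
  const finished = w.jobsCompleted + w.jobsFailed;
  return {
    // % of finished jobs that succeeded; with no finished jobs, report 100
    // rather than a misleading 0 that would look like a total outage
    jobSuccessRate: finished === 0 ? 100 : (100 * w.jobsCompleted) / finished,
    // Scale the window's count to transactions per minute
    backfillThroughput: w.elapsedMs === 0 ? 0 : (w.transactionsProcessed * 60_000) / w.elapsedMs,
  };
}
```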

Intelligence Metrics

Metric	Description
`intelligence.wallets_flagged`	New wallets flagged per hour
`intelligence.copy_detections`	Copy trading alerts generated
`intelligence.decay_transitions`	Sharp → Fading → Dead transitions
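Counting decay transitions amounts to comparing each wallet's previous and current state. The sketch below assumes, hypothetically, that only one-step forward moves (Sharp → Fading, Fading → Dead) count as transitions; the real decay model may also allow skips or recoveries.

```typescript
// Decay states in order, as named in the metrics table
const DECAY_ORDER = ["Sharp", "Fading", "Dead"] as const;
type DecayState = (typeof DECAY_ORDER)[number];

// Assumption: a transition counts only when a wallet moves exactly one step forward;
// unchanged states and skipped steps are not counted.
function isDecayTransition(from: DecayState, to: DecayState): boolean {
  return DECAY_ORDER.indexOf(to) === DECAY_ORDER.indexOf(from) + 1;
}

// Count transitions in a batch of (previous, current) state observations
function countDecayTransitions(pairs: Array<[DecayState, DecayState]>): number {
  return pairs.filter(([from, to]) => isDecayTransition(from, to)).length;
}
```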

Health Checks

Each service exposes a /health endpoint with multi-chain status:

GET /health

{
  "status": "healthy",
  "version": "1.0.0",
  "timestamp": "2026-06-05T12:00:00Z",
  "checks": {
    "database": "ok",
    "job_queue": "ok",
    "chains": {
      "solana": "ok",
      "base": "ok",
      "hyperliquid": "ok"
    }
  }
}
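A sketch of how a service might assemble this response. Only the all-healthy case is documented, so the `"error"` check value and the `"degraded"` overall status are hypothetical. The timestamp is trimmed to whole seconds to match the example above.

```typescript
type CheckStatus = "ok" | "error"; // "error" is a hypothetical failure value

interface HealthChecks {
  database: CheckStatus;
  job_queue: CheckStatus;
  chains: Record<string, CheckStatus>;
}

// Overall status is "healthy" only when every individual check is ok
function buildHealth(version: string, checks: HealthChecks, now: Date = new Date()) {
  const all = [checks.database, checks.job_queue, ...Object.values(checks.chains)];
  return {
    status: all.every((s) => s === "ok") ? "healthy" : "degraded",
    version,
    // Drop milliseconds: "2026-06-05T12:00:00.000Z" -> "2026-06-05T12:00:00Z"
    timestamp: now.toISOString().replace(/\.\d{3}Z$/, "Z"),
    checks,
  };
}
```

Rolling every chain into one status keeps the endpoint simple for load balancers, while the per-chain map lets alarms identify which chain's RPC is failing.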

Alerting & Alarms

Critical alerts are configured using Cloudflare Workers Alarms and external monitoring.

Critical Alarms (P1)

Alarm	Threshold	Action
Parser failure rate	> 5%	Page on-call, investigate protocol change
API error rate	> 1%	Page on-call, check deployments
Database connections	> 80%	Scale pool, investigate leaks
Chain RPC down	Any chain offline	Failover to backup RPC

Warning Alarms (P2)

Alarm	Threshold	Action
Job queue depth	> 1000	Scale workers, investigate backlog
Webhook delivery failures	> 5%	Check retry logic, notify customer
Unknown protocol rate	> 10%	Add new DEX normalizer
API p99 latency	> 5s	Investigate slow queries
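The P1 and P2 thresholds above can be expressed as a single data-driven rule list evaluated against a metrics snapshot. This is a sketch: the `Snapshot` field names are hypothetical, percentages are assumed to be on a 0 to 100 scale, "Database connections" is assumed to mean pool utilization, and parser failure rate is taken to be 100 minus `parser.{chain}.success_rate`.

```typescript
type Priority = "P1" | "P2";

// Hypothetical metrics snapshot; percentages are 0-100, latency in ms
interface Snapshot {
  parserFailureRate: number;
  apiErrorRate: number;
  dbConnectionUtilization: number; // % of the connection pool in use
  chainsOffline: number;
  jobQueueDepth: number;
  webhookFailureRate: number;
  unknownProtocolRate: number;
  apiP99LatencyMs: number;
}

interface AlarmRule {
  name: string;
  priority: Priority;
  fires: (s: Snapshot) => boolean;
}

// Thresholds copied from the P1 and P2 tables; all comparisons are strict (>)
const RULES: AlarmRule[] = [
  { name: "Parser failure rate", priority: "P1", fires: (s) => s.parserFailureRate > 5 },
  { name: "API error rate", priority: "P1", fires: (s) => s.apiErrorRate > 1 },
  { name: "Database connections", priority: "P1", fires: (s) => s.dbConnectionUtilization > 80 },
  { name: "Chain RPC down", priority: "P1", fires: (s) => s.chainsOffline > 0 },
  { name: "Job queue depth", priority: "P2", fires: (s) => s.jobQueueDepth > 1000 },
  { name: "Webhook delivery failures", priority: "P2", fires: (s) => s.webhookFailureRate > 5 },
  { name: "Unknown protocol rate", priority: "P2", fires: (s) => s.unknownProtocolRate > 10 },
  { name: "API p99 latency", priority: "P2", fires: (s) => s.apiP99LatencyMs > 5000 },
];

// Rules are listed P1 first, so firing alarms come back in paging order
function firingAlarms(s: Snapshot): AlarmRule[] {
  return RULES.filter((r) => r.fires(s));
}
```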

Cloudflare Workers Alarms

The API uses Durable Objects for scheduled alarm handling:

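// Alarm service for scheduled tasks: one Durable Object instance per wallet.
// DurableObjectState comes from @cloudflare/workers-types.
export class AlarmService {
  constructor(private state: DurableObjectState) {}

  // Schedule the next decay check. A Durable Object holds at most one
  // pending alarm, so this replaces any previously scheduled one.
  async scheduleDecayCheck(walletId: string, nextCheck: Date) {
    // Persist the wallet ID so alarm() knows which wallet to check
    await this.state.storage.put("walletId", walletId);
    await this.state.storage.setAlarm(nextCheck.getTime());
  }

  // Invoked by the Workers runtime when the alarm fires
  async alarm() {
    const walletId = await this.state.storage.get<string>("walletId");
    if (!walletId) return;
    // Execute scheduled decay state check
    await this.checkDecayState(walletId);
    // Reschedule the next check one hour out (decay checks run hourly)
    await this.scheduleDecayCheck(walletId, new Date(Date.now() + 60 * 60 * 1000));
  }

  private async checkDecayState(walletId: string) {
    // Evaluate Sharp → Fading → Dead for walletId (implementation omitted)
  }
}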

Scheduled Alarms

Alarm	Frequency	Purpose
Decay state check	Hourly per wallet	Update Sharp → Fading → Dead state
Cohort snapshot	Daily	Generate cohort retention data
Rating recalculation	Every 6 hours	Update wallet scores
Copy detection scan	Every 15 minutes	Detect new copy patterns
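Each frequency in this table translates into a timestamp to pass to `state.storage.setAlarm`. The sketch below aligns runs to fixed UTC boundaries (for example :00, :15, :30, :45 for the copy scan) so that reschedules do not drift; the alignment choice and the `SCHEDULES` keys are assumptions, not documented behavior.

```typescript
const MINUTE = 60_000;
const HOUR = 60 * MINUTE;

// Frequencies from the Scheduled Alarms table; the keys are hypothetical identifiers
const SCHEDULES = {
  decayStateCheck: HOUR,
  cohortSnapshot: 24 * HOUR,
  ratingRecalculation: 6 * HOUR,
  copyDetectionScan: 15 * MINUTE,
} as const;

type ScheduleName = keyof typeof SCHEDULES;

// Next fire time in epoch ms: the first interval boundary strictly after `now`.
// Boundaries are multiples of the interval since the Unix epoch (UTC).
function nextAlarmTime(name: ScheduleName, now: number): number {
  const interval = SCHEDULES[name];
  return (Math.floor(now / interval) + 1) * interval;
}
```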

Parser failure rate is the most critical metric. It's the first sign of a protocol change or new transaction type.

Recovery Procedures

Parser Regression

1. Identify failing transaction signatures in logs
2. Reproduce locally with `pnpm --filter cli test-tx <signature>`
3. Check for protocol/program ID changes
4. Update the normalizer or add a new one
5. Deploy and verify that the success rate recovers
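Step 1 can be scripted against the JSON log stream. This sketch assumes, hypothetically, that parser failures are logged at level `"error"` with `chain` and `signature` fields; adjust the field names to whatever the parser actually logs. Each returned signature can then be fed to the `test-tx` command in step 2.

```typescript
// Scan newline-delimited JSON logs for parser failures on one chain.
// Assumption: failures carry level "error" plus hypothetical `chain` and
// `signature` fields. Duplicate signatures are reported once.
function failingSignatures(ndjson: string, chain: string): string[] {
  const found = new Set<string>();
  for (const line of ndjson.split("\n")) {
    if (!line.trim()) continue;
    let rec: Record<string, unknown>;
    try {
      rec = JSON.parse(line);
    } catch {
      continue; // skip non-JSON lines (e.g. pretty-printed dev output)
    }
    if (rec.level === "error" && rec.chain === chain && typeof rec.signature === "string") {
      found.add(rec.signature);
    }
  }
  return Array.from(found);
}
```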

Chain RPC Outage

1. Alarms trigger on health check failure
2. Traffic fails over automatically to the backup RPC (if configured)
3. Manual intervention is required if all RPCs fail
4. Jobs are retried with exponential backoff
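The failover and retry behavior can be sketched as two small functions. The "full jitter" backoff variant, the 1 s base and the 60 s cap are illustrative assumptions rather than the job queue's actual settings, and `selectRpc` is a hypothetical helper that assumes endpoints are listed primary first.

```typescript
// Exponential backoff with full jitter: wait a random time in
// [0, min(capMs, baseMs * 2^attempt)). `rand` is injectable for testing.
function backoffDelay(
  attempt: number,
  baseMs = 1_000,
  capMs = 60_000,
  rand: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(rand() * ceiling);
}

// Pick the first healthy RPC endpoint, so the primary is preferred when it is up.
// Returns undefined when every endpoint is down (the manual-intervention case).
function selectRpc(endpoints: string[], isHealthy: (url: string) => boolean): string | undefined {
  return endpoints.find(isHealthy);
}
```

The jitter spreads retries from many jobs across the backoff window, so workers do not all hit a recovering RPC at the same instant.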

Never manually delete jobs from the queue. Use the admin API to mark them as failed if needed.