Metro Driver Backend Platform
Distributed platform of 10 Go microservices powering shift planning, overtime calculation, IAM and HR/accounting integrations for 2k+ metro drivers.

Distributed backend platform for a metropolitan transit authority: 10 Go services organised as a multi-module workspace (go.work), in production from 2023 through 2025. Service-to-service calls use gRPC over Protocol Buffers, with shared proto definitions generated and linted by buf; external HTTP endpoints for the web dispatcher console and the mobile app are described in OpenAPI and generated with ogen. Storage is PostgreSQL 14 accessed through pgx/v5, with goose-managed migrations. Async fan-out runs on Apache Kafka in KRaft mode (no ZooKeeper) with the Sarama client; delivery is at-least-once, with idempotency enforced by entity-state checks. Each service follows Clean Architecture: DTO / Input / Model / Record layers with explicit converters, DIP-style interfaces declared by their consumers, and lazy DI in internal/app/di.go. DDD is applied to the harder corners (overtime calculation, the shift state machine): rich entities, Value Objects, and a clear split between domain and application services. Built by Simon Lapin (Principal Architect) with senior Go engineers Denis Stashkov, Ilya Karasev, Anna Yakubovskaya and Alexey Klimko.
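As an illustration of idempotency through entity-state checks under at-least-once delivery, here is a minimal sketch. The names `Shift`, `ShiftStatus`, `HandleShiftStarted` and the map-based store are hypothetical and do not come from the platform's code; the Sarama consumer wiring and the PostgreSQL repository are left out so that only the state check is visible.

```go
package main

import (
	"errors"
	"fmt"
)

// ShiftStatus is a hypothetical lifecycle state of a shift entity.
type ShiftStatus int

const (
	ShiftPlanned ShiftStatus = iota
	ShiftStarted
	ShiftCancelled
)

// ErrInvalidTransition is returned when an event cannot apply to the current state.
var ErrInvalidTransition = errors.New("invalid shift transition")

// Shift is a hypothetical entity that owns its own state transitions.
type Shift struct {
	ID     string
	Status ShiftStatus
}

// Start applies a "shift started" event. It reports applied=false when the
// shift is already started, which is how a redelivered message is recognised
// and skipped without side effects.
func (s *Shift) Start() (applied bool, err error) {
	switch s.Status {
	case ShiftPlanned:
		s.Status = ShiftStarted
		return true, nil
	case ShiftStarted:
		return false, nil // duplicate delivery: nothing to do
	default:
		return false, ErrInvalidTransition
	}
}

// HandleShiftStarted is a hypothetical consumer handler. The map stands in
// for a repository; a real handler would load and save inside a transaction.
func HandleShiftStarted(store map[string]*Shift, shiftID string) (bool, error) {
	s, ok := store[shiftID]
	if !ok {
		return false, fmt.Errorf("shift %q not found", shiftID)
	}
	return s.Start()
}

func main() {
	store := map[string]*Shift{"shift-1": {ID: "shift-1", Status: ShiftPlanned}}
	// Simulate the broker delivering the same message twice.
	for i := 1; i <= 2; i++ {
		applied, err := HandleShiftStarted(store, "shift-1")
		fmt.Printf("delivery %d: applied=%v err=%v\n", i, applied, err)
	}
}
```

The second delivery changes nothing and returns no error, so the consumer can safely commit its offset either way.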
- → 10 Go services in a multi-module workspace · gRPC + Protocol Buffers (buf) for internal calls, OpenAPI + ogen for external HTTP
- → Clean Architecture per service: DTO / Input / Model / Record layers with explicit converters and DIP-style interfaces
- → DDD for hard corners: rich entities, Value Objects, domain vs application service split for overtime and shift state
- → PostgreSQL 14 via pgx/v5; goose migrations; go-transaction-manager for atomic multi-repository writes
- → Race-safe hot paths: SELECT FOR UPDATE with ORDER BY uuid to prevent deadlocks; optimistic versioning for schedule conflicts
- → Zero-downtime NOT-NULL column rollout via three-step migration pattern (ADD COLUMN nullable → backfill → SET NOT NULL)
- → Apache Kafka in KRaft mode (no ZooKeeper), Sarama, at-least-once with idempotency through entity-state checks
- → IAM service: PostgreSQL users, Redis sessions, bcrypt; gRPC unary interceptor + HTTP middleware propagate session_uuid through the chain
- → Distributed rate limiting via Redis GCRA so the combined limit across all replicas stays correct under horizontal scaling
- → Full observability: OTel SDK → Collector → Jaeger (traces), Prometheus (metrics), Elasticsearch/Kibana (logs); trace_id wired into slog
- → GitLab CI with testcontainers (PostgreSQL / Redis / Kafka) and bufconn-based in-memory gRPC API tests for fast full-flow coverage
- → Built by Simon Lapin (Principal Architect) with senior Go engineers Denis Stashkov, Ilya Karasev, Anna Yakubovskaya and Alexey Klimko
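The optimistic versioning mentioned in the list above can be sketched roughly as follows. This is an in-memory stand-in, not the platform's repository code: `Schedule`, `ScheduleStore` and the column names in the SQL comment are hypothetical. The point is the compare-and-bump step: a write only succeeds if the version the caller read is still the current one.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrVersionConflict signals that someone else updated the row first.
var ErrVersionConflict = errors.New("schedule was modified concurrently")

// Schedule is a hypothetical record carrying a version counter.
type Schedule struct {
	ID      string
	Version int
	Driver  string
}

// ScheduleStore is an in-memory stand-in for a PostgreSQL-backed repository.
type ScheduleStore struct {
	mu   sync.Mutex
	rows map[string]Schedule
}

// Get returns a copy of the stored schedule.
func (s *ScheduleStore) Get(id string) (Schedule, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	row, ok := s.rows[id]
	return row, ok
}

// Update mirrors a statement of this shape (assumed, not taken from the source):
//
//	UPDATE schedules SET driver = $1, version = version + 1
//	WHERE uuid = $2 AND version = $3
//
// Zero affected rows there corresponds to ErrVersionConflict here.
func (s *ScheduleStore) Update(next Schedule) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	cur, ok := s.rows[next.ID]
	if !ok || cur.Version != next.Version {
		return ErrVersionConflict
	}
	next.Version++
	s.rows[next.ID] = next
	return nil
}

func main() {
	store := &ScheduleStore{rows: map[string]Schedule{
		"sch-1": {ID: "sch-1", Version: 0, Driver: "unassigned"},
	}}

	// Two dispatchers read the same version, then both try to write.
	a, _ := store.Get("sch-1")
	b, _ := store.Get("sch-1")
	a.Driver = "driver-A"
	b.Driver = "driver-B"

	fmt.Println("first write:", store.Update(a))
	fmt.Println("second write:", store.Update(b))
}
```

The losing writer gets a conflict error and can re-read the schedule and retry or surface the conflict to the dispatcher, which avoids holding row locks while a person is editing.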
