Staff Database Reliability Engineer


Scribe

San Francisco, CA (In Person)

$225,000 Salary, Full-Time

Posted 2 days ago (Updated 14 hours ago) • Actively hiring

Expires 6/8/2026


Job Description

About the role

We're hiring a Staff Database Reliability Engineer to own the strategy, architecture, and operational excellence of our data infrastructure. This is an expert-level IC role with deep influence on engineering direction, partnering closely with platform, backend, and DevOps engineers.

Why this role matters

You will own the data tier end-to-end: design schemas and access patterns that scale, tune Aurora for latency and throughput, and set the standards for how engineers interact with our databases. When a migration script seizes up mid-deploy and writes start queueing behind an ACCESS EXCLUSIVE lock, your runbooks and automation resolve the incident quickly.

Make the Django ORM a strength, not a liability:
- Review migrations for safety at scale: locks, backfills, concurrent index builds, NOT VALID constraints
- Catch N+1 patterns and missing select_related / prefetch_related in review
- Establish conventions for QuerySet usage and physical schema design (indexes, constraints, partitioning)
- Scale review through automation, not heroics: author AGENTS.md files and DNA scaffolding that encode our conventions, configure AI review bots (Claude Code, Cursor, etc.) to flag risky migrations and ORM anti-patterns, and iterate on those configs as new failure modes emerge

Lead major infrastructure initiatives:
- Capacity planning as traffic and engineering throughput grow
- Zero-downtime schema migrations and cutovers
- Multi-AZ resilience within a single region: Aurora writer/reader placement, failover behavior and RTO/RPO, ElastiCache and OpenSearch AZ topology, and RabbitMQ survivability across AZs
- Backups, PITR, failover testing, and retention

Own the CDC pipeline (Aurora → DMS → S3 Parquet → Snowflake):
- DMS task design and tuning, and replication slot hygiene on the Postgres side
- Schema evolution as Django migrations roll through, so a column rename doesn't silently break the warehouse at 6 AM
- Parquet layout and partitioning, and reliability of the Snowflake handoff
- Automated checks that flag migrations likely to break downstream consumers

Drive observability across three complementary tools:
- pganalyze: query-level performance, index advisor, and schema insights; the go-to for "why is this ORM query slow?"
- CloudWatch: infrastructure metrics and alarms for Aurora, OpenSearch, ElastiCache, SQS, and DMS
- Honeycomb: high-cardinality tracing that ties slow DB calls back to users, flags, deploys, and flows
- Shape how the three fit together, including Django-side instrumentation and trace attributes on ORM queries

Build tooling and guardrails:
- Migration review automation and CI checks for risky patterns
- Slow-query pipelines fed from pganalyze
- Self-service dashboards so teams understand their own query footprint

Support and evolve the rest of the stack:
- OpenSearch: index design, sharding, mapping changes, reindexing strategy, and Django-side indexing pipelines
- Redis: caching patterns, eviction, sizing, the Django cache framework, Celery/RQ usage, and avoiding hot keys and thundering herds
- SQS + RabbitMQ: queue design, DLQs, visibility timeouts, exchange/queue topology, AZ mirroring, consumer backpressure, and Celery behavior under load

What makes you a great fit

Core expertise:
- Deep PostgreSQL: EXPLAIN (ANALYZE, BUFFERS), MVCC, bloat, lock contention, vacuum/autovacuum. Aurora Serverless v2 / Limitless experience strongly preferred (storage model, reader/writer split, ACU scaling)
- Strong ORM fluency (Django, SQLAlchemy, ActiveRecord, or similar): you can predict the SQL a query will generate, spot N+1 problems on sight, and control eager loading (joins vs. batched IN queries), column projection, aggregations, and subqueries
- Single-region multi-AZ design: a practical understanding of what it does and doesn't protect against

Data movement and observability:
- Production CDC experience, ideally with AWS DMS; comfortable with logical replication, slot hygiene, schema evolution, and Parquet-based data lakes feeding Snowflake (or BigQuery/Redshift)
- Hands-on with pganalyze (or Datadog DBM / Performance Insights / pg_stat_statements pipelines), CloudWatch (custom metrics, composite alarms, Logs Insights), and Honeycomb (or another high-cardinality tracing tool); comfortable with OpenTelemetry and opinionated about what makes a trace useful

AI-assisted workflow:
- Real experience making AI coding and review tools useful for a team: writing AGENTS.md files, configuring review agents, and versioning and iterating on prompts and configs

The rest of the stack:
- OpenSearch at scale: sizing, sharding, JVM tuning, rolling upgrades, snapshots
- Production Redis: persistence tradeoffs, cluster mode, hot keys, thundering herds
- At least one production message broker (SQS, RabbitMQ, Kafka): delivery semantics, idempotency, failure modes

Engineering and leadership:
- Strong automation and IaC background: real code (Python, Go, or similar) and Terraform
- A track record of leading cross-team initiatives, writing design docs that hold up, and influencing without authority
- Comfortable in a high-growth environment where the right answer for 50 engineers isn't the right answer for 100
- Pragmatic during incidents and focused on preventing the next one

Full-Time US Employee Benefits Include
- Some of the nicest and smartest teammates you'll ever work with
- Competitive salaries
- Comprehensive healthcare benefits
- Exciting and motivating equity
- Flexible PTO
- 401k
- Parental leave
- Commuter benefits (SF office employees)
- WFH stipend

Compensation

$200k-$250k base + equity. We consider several factors when determining compensation, including location, experience, and other job-related factors.

At Scribe, we celebrate our differences and are committed to creating a workplace where all employees feel supported and empowered to do their best work. We believe this benefits not only our employees but also our product, customers, and community. Scribe is proud to be an Equal Opportunity Employer.
