Resilience, Testability & Scalability Lead Position Available In Lancaster, South Carolina
Job Description
Job Title: Resilience, Testability & Scalability Lead
Location: Fort Mill, SC / New York / New Jersey / Remote (Hybrid)
Project: Cloud-Native Enterprise Data Platforms Engineering Quality & Resilience Track
Role Overview:
The client is looking for a hands-on, technically strong Resilience, Testability & Scalability Lead to drive engineering excellence across its data platforms and cloud-based applications. This role is critical in ensuring system uptime, test automation maturity, performance at scale, and architectural resilience to meet stringent regulatory and service-level demands.
The ideal candidate will have a deep background in designing highly available systems, implementing robust disaster recovery, managing scalable cloud infrastructure, and building automated, testable, and observable platforms, especially within AWS and Kubernetes environments.
Key Responsibilities:
Design and implement high availability and failover strategies across multi-zone AWS deployments
Lead the development and execution of disaster recovery and business continuity plans, including RTO/RPO validation and cross-region strategies
Define testability strategies, test data management frameworks, and performance testing protocols
Enable infrastructure and application resilience by introducing circuit breakers, retry patterns, service meshes, and graceful degradation mechanisms
Establish real-time monitoring, alerting, and log aggregation frameworks using tools like CloudWatch and Prometheus
Drive test automation and quality engineering best practices, integrating with CI/CD pipelines
Optimize application and data layer performance through query tuning, caching, and indexing strategies
Scale data processing using distributed frameworks like Apache Spark, and implement event-driven stream processing with Kafka
Collaborate with platform, DevOps, and SRE teams to ensure resource efficiency, cost control, and performance SLAs
Contribute to regulatory readiness by enforcing security, encryption, and audit logging standards
Required Skills & Experience:
Infrastructure Resilience & DR:
Multi-AZ deployments, auto-scaling, load balancing, circuit breakers
Disaster recovery design: backup/restore, cross-region replication, RTO/RPO
Monitoring & Observability:
Experience with CloudWatch, Prometheus, log aggregators
Alerting setup for incident response, latency, throughput, and error rates
Application Resilience & Security:
Error handling, service degradation, exponential backoff
Security best practices: IAM policies, encryption at rest/transit
Familiarity with FINRA/SIPC compliance standards (preferred)
Test Automation & Quality:
Unit testing (e.g., pytest), integration testing, E2E automation
Test data generation, synthetic data, environment provisioning
Performance testing using JMeter, Gatling, stress and capacity testing
Code reviews, static analysis, data validation, anomaly detection
Scalability & Optimization:
Horizontal scaling using Kubernetes, Docker, service discovery
API Gateway, caching layers (Redis, Memcached), DB partitioning
Connection pooling, capacity planning, cost-aware architecture
Data & Stream Processing:
Spark cluster management, parallel processing, big data optimization
Kafka-based messaging, windowing, and aggregation for real-time data
Preferred Qualifications:
Experience in financial services or regulated environments
Familiarity with the client's enterprise data and platform modernization initiatives
AWS or Kubernetes certifications
Strong communication skills and cross-functional collaboration experience
Dice Id: 10112044
Position Id: 8682355