Hi, Data Engineer - Vanguard (brown)
Located:
Charlotte NC or Malvern PA (3 days onsite)
Rate:
55 on W2 without benefits
10+ years of experience is a must (don't send me anyone with less than that)
Required skillset: Python, SQL, AWS Services (Python/PySpark)

We are seeking a highly experienced, hands-on Senior Data Engineer to join our Data Engineering teams. You will play a key role in supplementing existing capacity, upgrading our data architecture, and ensuring the highest quality, performance, and cost-efficiency of our data platforms. The work focuses on critical deliverables for personal investment, personal wealth, and comprehensive data analytics, while preparing the platform for a larger strategic move in the future.

Key Responsibilities
- Design, build, and maintain high-performance ETL/ELT data pipelines using Python and PySpark.
- Apply expert-level coding skills to develop and manage data processing jobs leveraging PySpark for distributed computing across large-scale datasets.
- Take full ownership of the data workflow, including ingesting data from multiple sources, then scrubbing and validating it to ensure the highest quality.
- Write and optimize complex, performant SQL queries for data extraction, integrity checks, and performance tuning.
- Contribute to platform modernization by exploring and increasing the adoption of AI/ML, including using tools like Copilot and Claude for acceleration, and building models to fill data gaps or improve systems.
- Collaborate with data architects by proposing ideas and asking great questions, taking ownership as the expert on data, pipelines, and systems.
- Implement DevOps practices for the automated deployment and orchestration of Python applications and data pipelines (e.g., using Docker, Jenkins, Terraform).
- Hands-on experience with SQL and complex performance tuning.

Required Technical Skills
Programming:
Expert-level proficiency in Python, including libraries like Pandas and NumPy.
Designing:
Designing data pipelines for data coming from multiple sources.
Data Processing:
Solid hands-on experience with PySpark for building scalable data workflows.
Data Querying:
Expert-level knowledge of writing complex SQL queries (Oracle or Snowflake), with proven ability to perform performance tuning on large datasets and complex database code.
Cloud Platform:
Strong experience with AWS cloud services and associated data services, specifically:
1. AWS Glue (ETL)
2. S3
3. Lambda
4. Redshift
5. DynamoDB, Athena, ECS, EventBridge, OpenSearch, RDS
ETL & Data Management:
Strong proficiency in ETL/ELT methodologies and tools, as well as Data Quality, Data Validation, and Anomaly Detection techniques.
Scripting:
Working experience with scripting and automation using Unix and Python.

Desired Skills & Professional Attributes
- Familiarity with AI/ML and Large Language Model (LLM) approaches to data analysis and validation.
- Knowledge of data warehousing concepts and data modeling techniques.
- Experience with DevOps, Continuous Integration, and Continuous Delivery (e.g., Jenkins, GitHub).
- Experience with BI Reporting tools such as Power BI or Tableau.
- Strong preference for candidates with prior experience in the investment data domain.
- Strong analytical and problem-solving skills, with the ability to work independently through complex data challenges.