Senior Data Engineer (Chinese Mandarin Speaker)

Job

Bitus Labs

Irvine, CA (In Person)

$130,000 Salary, Full-Time

Posted 3 days ago (Updated 7 hours ago) • Actively hiring

Expires 7/19/2026

Job Description

About the Role

We are looking for a Senior Data Engineer to join our Data Platform team and take ownership of building and scaling our AWS-based data lakehouse. You will architect and deliver robust, production-grade data pipelines, work closely with data scientists, analytics engineers, and product teams, and set the technical direction for how data flows across the organization. This is a hands-on engineering role: you will write production code in Java and Python every day, while also contributing to platform design decisions, mentoring junior engineers, and driving best practices around data quality, reliability, and governance.

Responsibilities

Data Lakehouse Architecture & Development
- Design and build scalable medallion-architecture data lakehouses (Bronze / Silver / Gold) on AWS S3 using the Apache Iceberg table format.
- Develop and maintain high-throughput ETL/ELT pipelines using AWS Glue, EMR (Spark), and Lambda.
- Implement schema evolution, partitioning strategies, and compaction processes for Iceberg tables to optimize storage and query performance.
- Write production-quality pipeline code in both Java and Python, selecting the appropriate language per performance and maintainability requirements.

Real-Time & Batch Streaming
- Build and operate event-driven data pipelines using Amazon Kinesis Data Streams, Kinesis Firehose, or Apache Kafka (MSK).
- Design exactly-once and at-least-once processing semantics for streaming workloads using Apache Flink or Spark Structured Streaming on EMR.

AWS Platform Engineering
- Manage infrastructure as code using AWS CDK or Terraform for repeatable, auditable data platform deployments.
- Optimize cost and performance across AWS services including S3, Glue, Athena, Redshift Spectrum, EMR, Lambda, Step Functions, and EventBridge.
- Implement data security best practices: IAM least-privilege policies, KMS encryption, VPC networking, and Lake Formation fine-grained access control.
- Build and maintain CI/CD pipelines for data workloads using AWS CodePipeline, GitHub Actions, or equivalent.

Data Quality & Governance
- Implement data quality frameworks (e.g., Great Expectations, Deequ) and integrate validation steps into pipeline orchestration.
- Define and enforce data contracts between producing and consuming systems.
- Contribute to data cataloguing and lineage tracking using AWS Glue Data Catalog or Apache Atlas.

Collaboration & Technical Leadership
- Partner with data scientists, ML engineers, and analysts to understand data requirements and deliver performant, well-documented datasets.
- Mentor mid-level and junior engineers through code reviews, design discussions, and pair programming.
- Document architecture decisions (ADRs) and contribute to the internal engineering knowledge base.

Required Qualifications

Experience
- 5+ years of professional data engineering experience, with at least 3 years on AWS cloud platforms.
- Proven track record of delivering production data pipelines at scale (TB+ datasets, high-throughput SLAs).
- Experience with data lakehouse architectures: medallion pattern, open table formats (Iceberg preferred; Delta Lake or Hudi acceptable).

Programming Languages
- Java: Strong command of Java (8+) for Spark jobs, custom Iceberg connectors, and performance-critical pipeline components. Familiarity with Maven/Gradle build systems.
- Python: Proficient in Python 3 for AWS Glue scripts, orchestration logic, data quality checks, and automation tooling. Experience with pandas, PySpark, boto3, and packaging best practices.

AWS Core Services
- Storage & Compute: S3, Glue (jobs, crawlers, Data Catalog), EMR (Spark/Flink), Lambda, EC2.
- Streaming: Kinesis Data Streams, Kinesis Firehose, or MSK (Managed Kafka).
- Orchestration: Step Functions, MWAA (Managed Airflow), or EventBridge Scheduler.
- Querying: Athena, Redshift, or Redshift Spectrum.
- Security & Governance: IAM, KMS, Lake Formation, Secrets Manager, VPC.
- DevOps: AWS CDK or CloudFormation; CodePipeline or equivalent CI/CD tools.

Data Processing Frameworks
- Apache Spark (PySpark and/or Spark Java API): distributed transformations, performance tuning, memory management.
- Apache Iceberg: table maintenance, time travel, snapshot management, partition evolution.
- SQL: advanced SQL for data transformation, window functions, CTEs, query optimization.

Preferred / Nice to Have
- AWS Certified Data Engineer - Associate or AWS Certified Solutions Architect certification.
- Experience with dbt for SQL-based transformation layers on top of the lakehouse.
- Familiarity with ML platform integration: feature stores (SageMaker Feature Store), model serving data needs, or MLflow experiment tracking.
- Experience with real-time OLAP engines such as Apache Druid or ClickHouse.
- Contributions to open-source data tooling or internal platform libraries.
- Exposure to data mesh or data product thinking: defining domain ownership and data contracts.

Tech Stack at a Glance
Languages: Java (8+), Python 3
Cloud Platform: AWS (S3, Glue, EMR, Kinesis, Athena, Lambda, Step Functions, Lake Formation, CDK)
Processing: Apache Spark, Apache Flink, Spark Structured Streaming
Table Format: Apache Iceberg (primary), Delta Lake / Hudi (familiarity)
Streaming: Amazon Kinesis, MSK (Kafka), Kinesis Firehose
Orchestration: Apache Airflow (MWAA), AWS Step Functions
IaC & CI/CD: AWS CDK / Terraform, GitHub Actions / CodePipeline

Pay: From $130,000.00 per year

Benefits:
- 401(k)
- 401(k) matching
- Dental insurance
- Health insurance
- Life insurance
- Paid time off
- Parental leave
- Retirement plan
- Vision insurance

Language: Chinese (Required)
Ability to Commute: Irvine, CA 92618 (Required)
Work Location: In person