Performance Architect
Compunnel, Inc.
Milpitas, CA (In Person)
Full-Time
Job Description
AI/ML ASIC architecture performance through hardware/software co-optimization, post-silicon performance analysis, and strategic roadmap influence.

Conduct workload analysis and characterization of ASICs and competitive AI/datacenter solutions to identify performance improvement opportunities.
Collaborate with architecture teams to resolve performance issues and optimize datacenter technologies for efficiency and TCO.
Model and optimize components of AI/ML accelerator ASICs, including PCIe/UCIe/CXL, NoC, DMA, firmware interactions, NAND, fabrics, and xPU.
Perform performance modeling and optimization for large-scale LLM training/inference, including Dense and Mixture of Experts (MoE) architectures across multiple modalities.
Develop and optimize parallelization strategies across tensor, pipeline, context, expert, and data parallel dimensions.
Architect memory-efficient training systems using techniques such as structured pruning, quantization, continuous batching, speculative decoding, and KV cache optimization.
Incorporate and extend state-of-the-art models (e.g., GPT-4, DeepSeek-R1) and multi-modal architectures.
Collaborate with internal and external stakeholders to disseminate results and iterate rapidly.

Required Qualifications
Bachelor's, Master's, or Ph.D. in Computer/Electrical Engineering.
5+ years of experience in performance modeling, simulation, and analysis using SystemC.
Strong background in computer/graphics architecture, ML, and LLMs.
Hands-on experience with SystemC/TLM simulation, behavioral modeling, and performance analysis.

Preferred Qualifications (if any)
Experience with storage systems, protocols, and NAND flash.
Deep expertise in optimizing large-scale ML systems and GPU architectures.
Proven technical leadership in GPU performance and workload analysis.
Knowledge of transformer architectures, attention mechanisms, and model parallelism techniques.
Experience with GPU/TPU microarchitecture and distributed training systems.
Proficiency in PyTorch, CUDA, TensorRT, OpenAI Triton, ONNX, and distributed frameworks (Ray, Megatron-LM).
Familiarity with performance analysis tools (Nsight Compute, nvprof, PyTorch Profiler).
Background in IO subsystem microarchitecture and protocols (NVMe, PCIe, UCIe, CXL, NVLink).
Experience with datacenter workload analysis, multi-core systems, and multi-thread interactions.

Certifications (if any)
Relevant certifications in performance engineering, AI/ML, or hardware architecture (preferred but not required).