Senior Data Architect
Entrada AI
⚲ Kraków, Warszawa, Wrocław, Gdańsk, Poznań
240–280 PLN/h net (B2B)
Requirements
- AWS
- Spark
- API
- Databricks
- Python
Job description
Location: Remote. Must overlap with US Central and EU working hours.
Employment Type: Full-time. No part-time availability. No split focus.
Start: ASAP (client timeline: ~16 weeks for the Phase 2 MVP, with likely follow-on phases). Long-term contract with Entrada.

This is a high-rigor environment. You will work with very senior client engineers and principal architects who expect you to reason at depth about Spark/Databricks internals, orchestration semantics, failure modes, and production SDLC.

What you will own (Phase 2 deliverables)
You will lead the architecture and hands-on implementation of a Temporal-based orchestration wrapper that triggers, monitors, and classifies Databricks job runs, including:

1) Temporal infrastructure & deployment
- Help deliver a production-grade Temporal deployment aligned to the client's Hub + Spoke architecture (in coordination with Cloud Engineering)
- Demonstrate deployments/execution in a staging workspace
- AWS is the target cloud; identify Azure gaps (don't ignore cross-cloud realities)

2) Multi-environment SDLC
- Support multiple environments (dev/staging/production)
- Integrate with the client's existing internal deployment tooling and namespacing patterns
- Ensure clean promotion paths with appropriate guardrails

3) Production pilot: migrate the authentication pipeline
- Migrate the authentication token generation and secret-writing pipeline from its current orchestration into Temporal as a high-value, low-risk production pilot

4) Implement the "Sequence Pipeline" pattern in Temporal (see the sketch after this section)
- Replicate the current "Sequence Job" pattern using Temporal workflows
- Implement "pick up running child job" to prevent redundant compute costs
- Implement step-level recovery: if Task 5 of 10 fails, keep the results from 1–4 and allow resume from 5 (no "restart everything")
- Add audit logging / observability for execution history and outcomes
- Deliver an operational runbook for triage and ongoing operations in Temporal

5) Security & permissions model
- Implement a robust permissions pattern so Temporal can trigger and monitor "child" jobs across Databricks workspaces
- Maintain strict logical separation: Temporal is the control plane; Databricks remains the data/compute plane

6) Reference implementation
- Build a "dummy" reference job sequence as a blueprint for the client's engineers to extend in Phase 3

What is intentionally out of scope (so you can focus)
Phase 2 explicitly defers the deeper data-domain workstreams (DLQ enhancements, domain-specific pilots, hybrid compute guardrails, cost attribution) to Phase 3. You are not expected to become the business-domain owner of the client's graph logic; your job is to build a reliable orchestration layer that respects it.

This is not a "PowerPoint architect" role
You will:
- Write production code
- Own failure modes and recovery semantics
- Ship to dev/test/prod with a real SDLC
- Produce runbooks that on-call engineers can actually use

If you prefer advisory-only architecture, or you need someone else to "operationalize" your designs, this will not be a fit.
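To make deliverable 4 concrete, here is a minimal sketch of step-level recovery using the Temporal Python SDK (temporalio). The run_databricks_task activity is a hypothetical placeholder, not the client's actual code:

```python
# Sketch only: assumes the temporalio SDK; run_databricks_task is a
# hypothetical activity name, not the client's actual implementation.
from datetime import timedelta

from temporalio import activity, workflow
from temporalio.common import RetryPolicy


@activity.defn
async def run_databricks_task(task_name: str) -> str:
    # Hypothetical: trigger a Databricks job run via the Jobs API,
    # wait for a terminal state, and return the run ID.
    raise NotImplementedError


@workflow.defn
class SequencePipeline:
    @workflow.run
    async def run(self, tasks: list[str]) -> list[str]:
        run_ids: list[str] = []
        for task_name in tasks:
            # Each completed activity result is persisted in workflow
            # history, so a worker crash or workflow reset replays
            # tasks 1-4 from history instead of re-executing them;
            # only the failed step (e.g. Task 5 of 10) runs again.
            run_id = await workflow.execute_activity(
                run_databricks_task,
                task_name,
                start_to_close_timeout=timedelta(hours=2),
                retry_policy=RetryPolicy(maximum_attempts=3),
            )
            run_ids.append(run_id)
        return run_ids
```

Because completed activity results live in durable history, resume-from-failure falls out of Temporal's replay model; the open design work in this role is around the "pick up running child job" check and the cross-workspace permissions, which this sketch does not cover.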
Required qualifications (non-negotiable)

Hands-on architecture + delivery
- 8+ years in data engineering / platform engineering, including 3+ years as a technical lead/architect shipping production systems
- Proven ownership of a system from design → implementation → production rollout → operational handoff

Databricks + Spark depth
- Deep expertise with Databricks (Jobs/Workflows, cluster configs, execution semantics, failure patterns)
- Deep Spark fundamentals: shuffles, partitioning, skew, caching, job planning, and debugging via logs/event timelines
- (The client's engineers operate at this level.)

Durable orchestration / workflow systems
- Strong experience with orchestration frameworks beyond UI-based DAG builders: Temporal (preferred), Cadence, AWS Step Functions, Argo Workflows, Airflow at scale with custom state/recovery semantics, etc.
- You must understand: idempotency, deterministic execution, retries vs. replays, compensation patterns, state persistence, and workflow versioning

Python + API integration
- Strong production Python (packaging, testing, typing discipline, structured logging)
- Experience integrating with REST APIs / SDKs (Databricks Jobs API patterns, auth, rate limits, retries); see the sketch at the end of this posting

Cloud + security
- AWS fluency: IAM, networking boundaries, secrets management, KMS, deployment patterns
- Comfortable partnering with Cloud Engineering but able to lead technically (you can't outsource all infra thinking)

Operating model
- Able to be 100% dedicated to this workstream during critical phases (no "50% attention" model)
- Comfortable working across time zones (US Central + Europe overlap)

Preferred qualifications (strongly preferred)
- Temporal in production (or Cadence) with real incident learnings
- Experience implementing "meta-orchestrators" that coordinate other orchestrators/systems
- OpenTelemetry / structured observability patterns (logs + metrics + traces)
- Experience with large "DAG of DAGs" pipelines, long runtimes, and expensive failure restarts
- Databricks certifications (or willingness to obtain/renew quickly as part of partner commitments)

How we hire:
• Introductory call (20 min): a short conversation with our recruiter to discuss your background and expectations.
• Deep technical interview (1–1.5 h): Spark/Databricks + orchestration semantics, plus a system design exercise (walk through a durable orchestration wrapper with step-level resume).
• Client interview (45 min – 1 h): required in this case.
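For the "Databricks Jobs API patterns, auth, rate limits, retries" bullet above, here is a minimal sketch using the public Databricks REST API 2.1 endpoints. Host, token, and job_id are placeholders; in this role, equivalent logic would sit inside a Temporal activity with its own retry policy:

```python
# Sketch only: public Databricks Jobs API 2.1; env vars are placeholders.
import os
import time

import requests

HOST = os.environ["DATABRICKS_HOST"]  # e.g. https://<workspace>.cloud.databricks.com
HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}


def trigger_run(job_id: int, max_attempts: int = 5) -> int:
    """Start a job run, backing off on HTTP 429 rate-limit responses."""
    for attempt in range(max_attempts):
        resp = requests.post(
            f"{HOST}/api/2.1/jobs/run-now",
            headers=HEADERS,
            json={"job_id": job_id},
            timeout=30,
        )
        if resp.status_code == 429:
            time.sleep(2 ** attempt)  # exponential backoff, then retry
            continue
        resp.raise_for_status()
        return resp.json()["run_id"]
    raise RuntimeError("rate-limited on every attempt")


def wait_for_run(run_id: int, poll_seconds: int = 30) -> str:
    """Poll a run until it reaches a terminal state; return its result state."""
    while True:
        resp = requests.get(
            f"{HOST}/api/2.1/jobs/runs/get",
            headers=HEADERS,
            params={"run_id": run_id},
            timeout=30,
        )
        resp.raise_for_status()
        state = resp.json()["state"]
        if state["life_cycle_state"] in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
            return state.get("result_state", state["life_cycle_state"])
        time.sleep(poll_seconds)
```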