NoFluffJobs · On-site · Expert

Principal AI Data Readiness Architect

Motorola Solutions Systems Polska

Kraków

18 000 - 25 000 PLN (PERMANENT)

Requirements

  • Data engineering
  • Data architecture
  • SQL
  • Python

Job description

About the project: We are seeking a Staff/Principal AI Data Architect to modernize our enterprise data ecosystem so it is ready to support new AI and ML tools (e.g., automated classification/summarization, agentic workflows, and RAG). This role focuses on data readiness, governance, quality, and secure access. You will define the standards, contracts, and observability that make structured and unstructured data trustworthy, discoverable, and easy to consume in both batch and near-real-time contexts. Orchestration tooling decisions are still open, but we are currently focused on an Airflow-centric approach. The person in this role will help shape decisions about data infrastructure implementation and tooling.

Requirements:

  • 8+ years in data engineering/architecture/platform roles, preferably more than 1 year at Staff/Principal level.
  • Expert SQL and Python; a track record of building enterprise data governance, contracts, and quality frameworks.
  • Experience operating production data platforms in batch/near-real-time contexts with strong lineage and access control.
  • Practical unstructured data governance (metadata standards, classification, PII detection/redaction).
  • Hands-on experience with catalogs/lineage as systems of record for definitions, ownership, and policy.
  • Familiarity with vector/RAG readiness concepts (schemas, metadata, provenance) without owning embeddings/model development.
  • Experience with workflow orchestration (e.g., Airflow) and CI/CD/testing for data pipelines.

Daily tasks:

  • Define the enterprise AI data architecture vision, principles, and reference architectures.
  • Lead cross-functional reviews with IT, security, legal/privacy, and business stakeholders to align on data readiness roadmaps.
  • Establish data contracts for AI consumption (schemas, semantics, classifications, SLAs) and govern schema evolution for backward compatibility.
  • Make the data catalog the system of record for lineage, ownership, definitions, and policy labels; integrate it with intake/change management.
  • Define standard data models and semantic conventions that improve joinability and reuse across domains.
  • Implement an enterprise data quality framework and automated scorecards (freshness, completeness, accuracy, consistency).
  • Monitor for anomalies and schema drift; publish AI data readiness dashboards (catalog coverage, lineage depth, PII detection coverage, contract adherence).
  • Standardize patterns for ingestion, processing, storage, serving, and environment promotion using Airflow or other standard ETL/orchestration tools and CI/CD for data workflows.
  • Define secure, consistent access patterns/APIs for downstream analytics and AI consumers.

Vector search and RAG readiness (enablement):

  • Drive the foundational architecture and standards needed to enable advanced Retrieval-Augmented Generation (RAG) and semantic search capabilities across the enterprise.
  • Provide guidance on chunking/segmentation policies, deduplication, and hybrid search compatibility; downstream teams implement embeddings/vector stores.
  • Define safe-access patterns for AI consumption to prevent sensitive data exposure.
  • Enforce security baselines (encryption, RBAC/ABAC, masking/tokenization) and policy-as-code for access.
  • Architect for transparent cost attribution and controls (tagging, storage tiering, retention) to enable informed cost/performance choices by consumers.
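To give a concrete sense of the data contracts this role governs, here is a minimal sketch in Python; the contract shape, field names, and violation format are illustrative assumptions, not anything specified by the posting:

```python
# Hypothetical data contract: maps field names to expected Python types.
# Records that drift from the contract are flagged instead of being
# silently ingested, which is what "contract adherence" monitoring checks.
CONTRACT = {"event_id": str, "amount": float, "country": str}

def check_contract(record, contract=CONTRACT):
    """Return a list of violations (missing fields or wrong types)."""
    violations = []
    for field, expected in contract.items():
        if field not in record:
            violations.append(f"missing:{field}")
        elif not isinstance(record[field], expected):
            violations.append(f"type:{field}")
    return violations
```

In practice, a check like this would run inside the orchestration layer (e.g., as an Airflow task) and feed the contract-adherence dashboards mentioned above.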
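Similarly, the automated quality scorecards named in the task list (freshness, completeness, and so on) might start as something like the following Python sketch; the metric definitions, field names, and 24-hour freshness window are assumptions for illustration only:

```python
from datetime import datetime, timezone

def quality_scorecard(rows, required_fields, max_age_hours=24):
    """Compute simple freshness and completeness ratios for a batch.

    rows: list of dicts; each may carry a timezone-aware ISO-8601
          'updated_at' timestamp.
    required_fields: fields that must be non-null for completeness.
    """
    now = datetime.now(timezone.utc)
    fresh = 0
    complete = 0
    for row in rows:
        ts = row.get("updated_at")
        if ts is not None:
            age = now - datetime.fromisoformat(ts)
            if age.total_seconds() <= max_age_hours * 3600:
                fresh += 1
        if all(row.get(f) is not None for f in required_fields):
            complete += 1
    n = len(rows) or 1  # guard against empty batches
    return {"freshness": fresh / n, "completeness": complete / n}
```

Scores like these would typically be published per dataset to the AI data readiness dashboards, with accuracy and consistency checks layered on top.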