YOUR ROLE
BIT Capital is looking for an Associate Data Engineer to join our Data & Engineering team. This role is central to ensuring the correctness, reliability, and transparency of the data platform that powers our proprietary equity research and AI-driven analytics.
The position is well suited for someone early in their career who demonstrates exceptional analytical ability, strong technical judgment, and a high level of rigor. You will work closely with senior engineers, researchers, and AI practitioners, gaining hands-on exposure to production data systems and modern AI-supported workflows that operate under demanding correctness and reliability requirements.
What You Will Do
Initial Focus
In the first phase of the role, the focus is on developing a deep understanding of the existing data pipelines, identifying subtle failure modes, and ensuring the pipelines' correctness in production.
You will:
Design and implement robust test strategies for Python-based ETL pipelines, including validation of edge cases and failure scenarios
Define and maintain precise and unambiguous technical documentation for ETL workflows, data flows, and platform components
Write SQL-based data quality checks and assertions to enforce correctness and consistency across datasets (see the sketch after this list)
Monitor ETL pipelines in production and investigate failures, delays, anomalies, and non-obvious data quality regressions
Perform first-level root cause analysis for ETL incidents and escalate issues with clear, well-reasoned technical context
Act as a quality and reliability gate for pipeline changes, backfills, and releases
Provide technical documentation and evidence to compliance and audit teams when required
Collaborate with external data vendors to validate data deliveries and resolve data issues
Support AI-related initiatives, including data preparation, retrieval workflows, evaluations, and reliability checks for AI- and LLM-enabled systems
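To give a concrete sense of the SQL-based checks mentioned above, here is a minimal, illustrative sketch in Python. Every name in it is hypothetical (a prices table with ticker, trade_date, and close columns), and the standard-library sqlite3 module stands in for the production warehouse; it shows the assertion-style pattern, not our actual tooling.

    import sqlite3

    # Each check is a SQL query that counts violating rows; a healthy dataset
    # returns 0 for every check. Table and column names are placeholders.
    CHECKS = {
        "no_null_closes": "SELECT COUNT(*) FROM prices WHERE close IS NULL",
        "no_duplicate_keys": """
            SELECT COUNT(*) FROM (
                SELECT ticker, trade_date
                FROM prices
                GROUP BY ticker, trade_date
                HAVING COUNT(*) > 1
            )""",
        "no_future_dates": "SELECT COUNT(*) FROM prices WHERE trade_date > DATE('now')",
    }

    def run_checks(conn: sqlite3.Connection) -> dict[str, int]:
        """Run every check and return the name and violation count of each failure."""
        failures = {}
        for name, query in CHECKS.items():
            (violations,) = conn.execute(query).fetchone()
            if violations:
                failures[name] = violations
        return failures

    if __name__ == "__main__":
        # Toy dataset seeded with a duplicate key, a NULL close, and a future date.
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE prices (ticker TEXT, trade_date TEXT, close REAL)")
        conn.executemany(
            "INSERT INTO prices VALUES (?, ?, ?)",
            [("ABC", "2024-01-02", 10.0),
             ("ABC", "2024-01-02", 10.0),
             ("XYZ", "2099-01-01", None)],
        )
        for name, count in run_checks(conn).items():
            print(f"FAILED {name}: {count} violating row(s)")

In production the same pattern typically runs on a schedule against the warehouse, with failures routed to alerting rather than printed.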
Expanding Scope Over Time
As you build familiarity with the platform and demonstrate sound technical judgment, the scope of the role will expand toward more direct engineering ownership.
Over time, you will:
Implement and improve Python-based ETL workflows in AWS and Databricks, with a focus on careful design, safe operation, and long-term maintainability (a brief illustration follows this list)
Improve monitoring, alerting, and validation mechanisms across the data platform
Take ownership of defined workflows, datasets, or pipeline components with clear accountability for their correctness in production
Participate in AI experimentation and productionization as systems mature
Contribute to technical design discussions with a systems-level and AI-aware perspective
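To illustrate what careful design and safe operation can mean in practice, below is a hedged sketch of a small, idempotent ETL step: it validates every input record and writes its output atomically, so a crash or retry never leaves a partially written file behind. Local JSON-lines files and the field names are hypothetical stand-ins for the actual AWS and Databricks storage and schemas.

    import json
    from pathlib import Path

    REQUIRED_FIELDS = {"ticker", "trade_date", "close"}  # hypothetical schema

    def transform(record: dict) -> dict:
        # Fail loudly on malformed input instead of letting bad data flow downstream.
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            raise ValueError(f"record missing fields: {sorted(missing)}")
        return {
            "ticker": record["ticker"].strip().upper(),
            "trade_date": record["trade_date"],
            "close": float(record["close"]),
        }

    def run(in_path: Path, out_path: Path) -> None:
        lines = in_path.read_text().splitlines()
        records = [transform(json.loads(line)) for line in lines if line.strip()]
        # Write to a temporary file and rename it into place: the rename is atomic,
        # so downstream consumers never observe a half-written output and the job
        # can be re-run safely after any failure.
        tmp_path = out_path.with_suffix(".tmp")
        tmp_path.write_text("".join(json.dumps(r) + "\n" for r in records))
        tmp_path.replace(out_path)

    if __name__ == "__main__":
        src = Path("raw_prices.jsonl")
        src.write_text('{"ticker": "abc", "trade_date": "2024-01-02", "close": "10.0"}\n')
        run(src, Path("clean_prices.jsonl"))
        print(Path("clean_prices.jsonl").read_text())

The validate-then-write-atomically structure is the design choice being illustrated: a failed or repeated run leaves the output either absent or complete, never corrupt.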