Engineering Principles

Seven principles I bring to every project.

These are not borrowed from blog posts or framework documentation. They are positions I have taken often enough that they feel less like opinions and more like defaults. Each one comes with a justification, and where it matters, the trade-off I accept by holding it.

7 principles · 4 minute read · last reviewed May 2026

Principle 01

Schema is the source of truth, not Python.

I define warehouse tables in explicit SQL DDL files rather than generating them from pandas dtypes or ORM models. A schema written in version-controlled SQL is auditable in pull requests, readable by analysts who do not write Python, and portable across environments without a Python runtime.

The temptation to let Python infer types and create tables on the fly is real, especially early in a project. It feels faster. It is also the moment a warehouse stops being a contract and starts being whatever Python decided last Tuesday.
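
A minimal sketch of the idea, assuming SQLite as a stand-in warehouse. The table, its columns, and the helper names are hypothetical; in a real project the DDL string would live in its own version-controlled .sql file rather than inline.

```python
import sqlite3

# Hypothetical DDL. In a real project this text would live in a
# version-controlled file such as ddl/dim_customer.sql (assumed path).
DIM_CUSTOMER_DDL = """
CREATE TABLE IF NOT EXISTS dim_customer (
    customer_id   INTEGER PRIMARY KEY,
    email         TEXT    NOT NULL,
    signup_date   TEXT    NOT NULL,
    segment       TEXT
);
"""


def apply_schema(conn: sqlite3.Connection, ddl: str) -> None:
    """Create tables from explicit DDL; Python never infers a type."""
    conn.executescript(ddl)


def column_types(conn: sqlite3.Connection, table: str) -> dict:
    """Return {column_name: declared_type} as recorded by the database."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {row[1]: row[2] for row in rows}


conn = sqlite3.connect(":memory:")
apply_schema(conn, DIM_CUSTOMER_DDL)
schema = column_types(conn, "dim_customer")
```

Python only executes the DDL here. The column types come from the SQL text, so a reviewer can read the contract in a pull request without running anything.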

Trade-off: Slower to bootstrap a new table. Slightly more boilerplate. Worth every line for the durability and clarity it buys.

Applied in: Nova Retail (DDL Owned by SQL) · FibbieBanks (DDL-Owned Schema) · XTD Research Labs (Schema Owned by SQL) · PayFlow (Schema Design) · ChocoDelight (Schema Design)

Principle 02

Idempotency is a contract, not a feature.

A pipeline that produces different output on rerun is a pipeline that produces production incidents. I default to wipe-and-reload over upsert logic until scale forces otherwise, because deterministic output is worth the storage cost.

Partial-load bugs are the worst kind to debug. They look fine until someone notices a metric drifted by 3% over a week. Idempotent pipelines fail loudly when something is wrong. Non-idempotent pipelines fail quietly, in the data, where you only find them after the dashboard has been showing the wrong number for a month.
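
A minimal sketch of wipe-and-reload, assuming SQLite and a hypothetical fact_sales table. The delete and the insert share one transaction, so a rerun with the same input yields the same table and a failed load leaves the previous contents in place.

```python
import sqlite3


def wipe_and_reload(conn, rows):
    """Replace the full contents of fact_sales in a single transaction."""
    with conn:  # commits on success, rolls back on any exception
        conn.execute("DELETE FROM fact_sales")
        conn.executemany(
            "INSERT INTO fact_sales (sale_id, amount) VALUES (?, ?)", rows
        )


def snapshot(conn):
    """Read the table back in a stable order for comparison."""
    return conn.execute(
        "SELECT sale_id, amount FROM fact_sales ORDER BY sale_id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
)

batch = [(1, 19.99), (2, 5.00), (3, 42.50)]  # illustrative rows
wipe_and_reload(conn, batch)
first = snapshot(conn)
wipe_and_reload(conn, batch)  # rerun with the same input
second = snapshot(conn)
```

An upsert version would need a merge key and conflict rules. This version needs neither, which is the simplicity the storage cost pays for.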

Trade-off: Storage and compute cost on every run. Acceptable until you are processing tens of millions of rows. Past that, incremental logic becomes worth the additional complexity.

Applied in: Nova Retail (Three Overwrite Layers) · FibbieBanks (Deterministic Surrogate Keys) · XTD Research Labs (Three Idempotency Models) · PayFlow (Decisions) · ChocoDelight (Decisions)

Principle 03

Write logs assuming you will be debugging at 2am.

Every pipeline stage gets a timing decorator and a structured log line at completion. Row counts in, row counts out, elapsed time, validation summaries. The cost of writing one extra log line is microseconds. The cost of not having it during an incident is hours of guessing.

Good logs read like a flight recorder. They tell you exactly what happened, in order, with enough context that you do not need to rerun anything to reconstruct the incident. Bad logs say "something went wrong" and leave you opening the database to check the row counts yourself.
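
A minimal sketch of a timing decorator that emits one structured log line per stage, using only the standard library. The decorator name, the field names, and the example stage are assumptions for illustration, not the loggers used in the projects listed below.

```python
import functools
import json
import logging
import time

logger = logging.getLogger("pipeline")


def logged_stage(name):
    """Log one structured line per stage run: rows in, rows out, elapsed time."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(rows):
            start = time.perf_counter()
            result = func(rows)
            record = {
                "stage": name,
                "rows_in": len(rows),
                "rows_out": len(result),
                "elapsed_s": round(time.perf_counter() - start, 4),
            }
            logger.info(json.dumps(record))
            return result
        return wrapper
    return decorator


@logged_stage("clean")
def drop_missing_amounts(rows):
    """Example stage: discard rows that have no amount."""
    return [r for r in rows if r.get("amount") is not None]


logging.basicConfig(level=logging.INFO)
cleaned = drop_missing_amounts([{"amount": 10}, {"amount": None}, {"amount": 3}])
```

Because the line is JSON, the gap between rows_in and rows_out can be searched or charted later without opening the database.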

Applied in: Nova Retail (Shared Logger) · FibbieBanks (Custom Logger) · XTD Research Labs (Custom Logger) · PayFlow (Engineering Approach) · ChocoDelight (Tech Stack) · AliExpress (Tech Stack)

Principle 04

Validate before you load, not after.

Schema drift, null values in critical columns, foreign key mismatches. Checks for these belong in a validation stage that runs before any data reaches the warehouse, not in a SELECT query you remember to run after a load looks suspicious.

I add validation at extract, clean, and transform stages. Row counts logged. Null counts logged. Foreign key references checked against dimension tables before the fact table is built. The cost is a few seconds of pipeline time. The benefit is that bad data never gets the chance to corrupt a downstream table.
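
A minimal sketch of a pre-load check over plain dictionaries, covering row counts, null counts, and foreign key references. The function name, column names, and sample rows are hypothetical.

```python
def validate_before_load(fact_rows, dim_keys, required, fk_column):
    """Return a summary of problems; an empty 'errors' list means safe to load."""
    null_counts = {
        col: sum(1 for row in fact_rows if row.get(col) is None)
        for col in required
    }
    # Foreign keys present in the fact rows but absent from the dimension.
    orphans = sorted(
        {
            row[fk_column]
            for row in fact_rows
            if row.get(fk_column) is not None and row[fk_column] not in dim_keys
        }
    )
    errors = [f"{col}: {n} null value(s)" for col, n in null_counts.items() if n]
    if orphans:
        errors.append(f"{fk_column}: keys missing from dimension: {orphans}")
    return {
        "row_count": len(fact_rows),
        "null_counts": null_counts,
        "orphan_keys": orphans,
        "errors": errors,
    }


orders = [
    {"order_id": 1, "customer_id": 10, "amount": 25.0},
    {"order_id": 2, "customer_id": 99, "amount": 12.5},  # 99 is not in the dimension
    {"order_id": 3, "customer_id": 11, "amount": None},  # null in a critical column
]
summary = validate_before_load(
    orders,
    dim_keys={10, 11},
    required=["customer_id", "amount"],
    fk_column="customer_id",
)
if summary["errors"]:
    print("Load blocked:", summary["errors"])
```

The orchestrator can refuse to call the load stage whenever the errors list is non-empty, so the bad batch never reaches the fact table.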

Trade-off: Validation logic adds code surface area. Worth it. The alternative is a warehouse that contains data nobody trusts.

Applied in: Nova Retail (Row-Count Validation) · FibbieBanks (Schema Validation) · XTD Research Labs (Empty-Payload Defense) · PayFlow (Pre-load Validation) · ChocoDelight (FK Validation) · AliExpress (Health Check)

Principle 05

Configuration belongs in environment variables, not in code.

Database URLs, credentials, source paths, environment flags. None of these belong as string literals in Python files, and none belong in a central config module that gets committed to the repo. They belong in a .env file, loaded at startup, never logged, and never committed.

This is not only a security feature, although it is that too. It is a portability feature. A pipeline that reads its config from environment variables runs on my laptop, in CI, in staging, and in production with no code changes. A pipeline with hardcoded values runs in exactly one place and breaks in every other.
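
A minimal sketch of a fail-fast loader built on os.environ. The variable names and the sample values are hypothetical, and the sketch assumes the .env file has already been loaded into the process environment by whatever loader the project uses.

```python
import os

# Hypothetical setting names, chosen for illustration.
REQUIRED_KEYS = ("DATABASE_URL", "SOURCE_PATH", "APP_ENV")


def load_config(env=os.environ):
    """Read required settings from the environment; fail fast if any is missing."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        # Report names only; values are never logged or echoed.
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
    return {key: env[key] for key in REQUIRED_KEYS}


# A plain dict stands in for the real environment in this example.
config = load_config(
    {
        "DATABASE_URL": "postgresql://localhost/dev",
        "SOURCE_PATH": "/data/incoming",
        "APP_ENV": "dev",
    }
)
```

Raising at startup turns a forgotten variable into an immediate, named error instead of a connection failure halfway through a run.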

Applied in: Nova Retail (Dual-Host Detection) · FibbieBanks (Env-Var Config) · XTD Research Labs (Fail-Fast Env Loader) · PayFlow (Config-driven) · ChocoDelight (Tech Stack) · AliExpress (Tech Stack)

Principle 06

Modular ETL beats monolithic scripts, always.

Each stage of a pipeline (extract, clean, transform, load) is its own module with a single responsibility. The orchestrator composes them. This means I can run any stage independently for debugging, replace any stage without touching the others, and reason about each stage in isolation.

The opposite pattern (a single 800-line script that does everything) is faster to write and impossible to maintain. I have inherited those scripts. I will never write one.
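
A minimal sketch of the modular shape, with each stage shown as a function in one listing for brevity (in a real project each would be its own module). The sample rows and the in-memory target are stand-ins, not a real source or warehouse.

```python
def extract():
    """Stand-in source; a real stage would read a file or an API."""
    return [{"sku": " a1 ", "qty": "2"}, {"sku": "b2", "qty": None}]


def clean(rows):
    """Drop rows without a quantity and trim whitespace from the SKU."""
    return [{**r, "sku": r["sku"].strip()} for r in rows if r["qty"] is not None]


def transform(rows):
    """Cast quantity to an integer."""
    return [{**r, "qty": int(r["qty"])} for r in rows]


def load(rows, target):
    """Append to an in-memory target; a real stage would write to the warehouse."""
    target.extend(rows)
    return len(rows)


def run_pipeline(target):
    """Orchestrator: composes the stages and holds no business logic of its own."""
    return load(transform(clean(extract())), target)


warehouse = []
loaded = run_pipeline(warehouse)
```

Each stage takes rows and returns rows, so any one of them can be called alone with a handful of hand-written rows while debugging.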

Applied in: Nova Retail (5-Container Stack) · FibbieBanks (5-Module Pipeline) · XTD Research Labs (Three-Layer Medallion) · PayFlow (Architecture) · ChocoDelight (Architecture) · AliExpress (Architecture)

Principle 07

Engineer features in the warehouse, not in dashboards.

Customer segments, revenue buckets, product tiers, time-of-day categories. These belong in the analytics schema as columns on the dim or fact table, computed once during the transform stage. Not as CASE statements in BI tools, not as calculated fields in Tableau, not as SQL written fresh in every dashboard.

When the business logic for a segment changes, I want to update it in one Python file and have every dashboard reflect the change automatically, rather than chasing down 12 instances of the same CASE statement scattered across BI tools.
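
A minimal sketch of a feature defined once in the transform stage. The bucket names and thresholds are illustrative assumptions, not values from any of the projects.

```python
def revenue_bucket(amount):
    """Single definition of the bucket; thresholds here are illustrative only."""
    if amount < 50:
        return "low"
    if amount < 500:
        return "mid"
    return "high"


def add_features(orders):
    """Transform stage: attach the bucket as a column before loading."""
    return [{**o, "revenue_bucket": revenue_bucket(o["amount"])} for o in orders]


fact_orders = add_features(
    [
        {"order_id": 1, "amount": 20.0},
        {"order_id": 2, "amount": 50.0},
        {"order_id": 3, "amount": 900.0},
    ]
)
```

Dashboards then read the revenue_bucket column directly, so changing a threshold means editing one function and rerunning the pipeline.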

Trade-off: Slower to iterate on a new segment definition during exploration. Worth it once the definition is settled, because the cost of duplicated business logic compounds quickly.

Applied in: Nova Retail (3-Table Gold Layer) · FibbieBanks (Date Dimension Attributes) · XTD Research Labs (Gold Daily Aggregation) · ChocoDelight (Feature Engineering)

See how these principles show up in real pipelines.

All four case studies on this site demonstrate these principles applied to production code, with the featured FibbieBanks project showing every one of the seven in action.

View FibbieBanks · View PayFlow · View ChocoDelight · View AliExpress
