
Tax Practice AI - Design Decisions Log

Architecture decisions and their rationale


| Document | Purpose |
| --- | --- |
| backlog.md | Active priorities |
| COMPLETED.md | Completed items |
| TECH_DEBT.md | Technical debt tracking |
| ROADMAP.md | Future features |
| ARCHITECTURE.md | System design |
| index.md | Master navigation |

Decision Log

| ID | Date | Decision | Status |
| --- | --- | --- | --- |
| DD-001 | 2024-12-22 | Service centralization as core pattern | Active |
| DD-002 | 2024-12-22 | Mixed Java/Python architecture | Active |
| DD-003 | 2024-12-22 | Shared config.yaml for both languages | Active |
| DD-004 | 2024-12-22 | Aurora-only starting architecture | Active |
| DD-005 | 2024-12-23 | Self-hosted Airflow orchestration | Active |
| DD-006 | 2024-12-23 | Bookkeeping as separate requirements document | Active |
| DD-007 | 2024-12-23 | Bookkeeping phased approach | Active |
| DD-008 | 2024-12-23 | V1 integration strategy: integrate, don't replace | Active |
| DD-009 | 2024-12-23 | Dual integration pattern: Services + Skills | Active |
| DD-010 | 2024-12-23 | Multi-tenant SaaS: Separate databases per tenant | Active |
| DD-011 | 2024-12-24 | Frontend: React + Vite two-app architecture | Active |

DD-001: Service Centralization

Date: 2024-12-22 Status: Active Decision: Service centralization as core pattern

Context

Need consistent access patterns for all external services (database, S3, email, payment processors).

Decision

All external service access MUST go through centralized service modules in src/services/. No direct connector instantiation elsewhere.
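
The pattern can be illustrated with a minimal sketch. The module layout, the `ServiceRegistry` class, and the `FakeEmailService` stand-in are hypothetical names chosen for illustration and are not taken from the project; the point is only that callers ask one central place for a service instead of instantiating connectors themselves.

```python
"""Hypothetical sketch of a module such as src/services/registry.py."""
from typing import Any, Callable, Dict


class ServiceRegistry:
    """Single access point that lazily creates and caches service clients."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], Any]] = {}
        self._instances: Dict[str, Any] = {}

    def register(self, name: str, factory: Callable[[], Any]) -> None:
        """Register a factory; tests can re-register a fake under the same name."""
        self._factories[name] = factory
        self._instances.pop(name, None)  # drop any cached instance

    def get(self, name: str) -> Any:
        """Return the shared instance, creating it on first use."""
        if name not in self._instances:
            if name not in self._factories:
                raise KeyError(f"No service registered under {name!r}")
            self._instances[name] = self._factories[name]()
        return self._instances[name]


class FakeEmailService:
    """Stand-in for a real email connector (illustrative only)."""

    def __init__(self) -> None:
        self.sent = []

    def send(self, to: str, subject: str) -> None:
        self.sent.append((to, subject))


registry = ServiceRegistry()
registry.register("email", FakeEmailService)

# Callers ask the registry instead of instantiating connectors themselves.
email = registry.get("email")
email.send("client@example.com", "Documents received")
```

Because every caller goes through `registry.get(...)`, a test can register a fake under the same name before the code under test runs, which is one way to get the simplified dependency injection this decision aims for.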

Rationale

  • Single point of access for all external services
  • Easier maintenance, testing, and audit
  • Consistent error handling
  • Connection pooling managed in one place
  • Simplified dependency injection for tests

Consequences

  • All new integrations must follow the pattern
  • Existing code must be refactored if it bypasses services
  • Slightly more boilerplate for simple cases

DD-002: Mixed Java/Python Architecture

Date: 2024-12-22 Status: Active Decision: Mixed Java/Python architecture

Context

System requires both rapid API development and high-performance document processing.

Decision

Use Python for API/orchestration/AI and Java for performance-critical document processing.

Rationale

| Component | Language | Why |
| --- | --- | --- |
| API Layer | Python | FastAPI is productive, async-native, great for REST |
| AI Integration | Python | Bedrock SDK, prompt engineering, rapid iteration |
| Document Parsing | Java | Performance for OCR/PDF processing at scale |
| Tax Calculations | Java | Complex business logic, type safety, existing libraries |
| Batch Processing | Java | Memory efficiency, parallelism for large datasets |

Consequences

  • Two build systems (pip + Maven)
  • Developers need both skill sets
  • Inter-process communication required
  • Shared configuration management needed

DD-003: Shared config.yaml

Date: 2024-12-22 Status: Active Decision: Shared config.yaml for both languages

Context

With mixed Java/Python architecture, configuration could diverge.

Decision

Single config.yaml as source of truth, read by both Python (PyYAML) and Java (SnakeYAML).
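
One way environment variable substitution could work on the Python side is sketched below. The `${VAR}` and `${VAR:-default}` placeholder syntax, the `substitute_env` helper, and the config keys are assumptions for illustration; the Java side would need to apply the same rule for the two readers to agree. The sketch operates on an already-parsed dictionary so it runs without PyYAML installed.

```python
import os
import re
from typing import Any, Mapping

# Assumed placeholder syntax: ${VAR} or ${VAR:-default}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def substitute_env(value: Any, env: Mapping[str, str] = os.environ) -> Any:
    """Recursively replace placeholders in every string of a parsed config."""
    if isinstance(value, dict):
        return {k: substitute_env(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute_env(v, env) for v in value]
    if isinstance(value, str):
        def _replace(match: "re.Match[str]") -> str:
            name, default = match.group(1), match.group(2)
            if name in env:
                return env[name]
            if default is not None:
                return default
            raise KeyError(f"Environment variable {name} is not set")
        return _PLACEHOLDER.sub(_replace, value)
    return value  # numbers, booleans, None pass through unchanged


# In practice this dict would come from yaml.safe_load() on config.yaml.
parsed = {
    "database": {"host": "${DB_HOST:-localhost}", "port": 5432},
    "storage": {"bucket": "${DOCS_BUCKET}"},
}
resolved = substitute_env(parsed, env={"DOCS_BUCKET": "tax-docs-dev"})
```

Failing loudly on an unset variable without a default is a deliberate choice in this sketch: a silently empty bucket name or host is harder to diagnose than a startup error.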

Rationale

  • Single source of truth
  • Environment variable substitution can be supported in both (each parser needs it configured explicitly)
  • No configuration drift between components
  • Easier deployment configuration

Consequences

  • Both languages must parse same format
  • Config schema must be compatible with both parsers
  • All YAML parameters must be thoroughly commented

DD-004: Aurora-Only Starting Architecture

Date: 2024-12-22 Status: Active Decision: Start with Aurora PostgreSQL only, defer Snowflake/analytics tier

Context

Need database for 7+ years of tax data. Could use Aurora for OLTP + Snowflake for OLAP, or simplify.

Decision

Aurora handles all data including 7+ years retention (via table partitioning). Add analytics tier only when Aurora partitioning is insufficient.
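
As a rough illustration of partitioning by year, the sketch below generates PostgreSQL declarative-partitioning statements, one partition per calendar year. The table name `tax_returns` and the helper `yearly_partition_ddl` are hypothetical, and the sketch assumes the parent table was already created with `PARTITION BY RANGE` on a date column.

```python
from typing import List


def yearly_partition_ddl(parent: str, first_year: int, last_year: int) -> List[str]:
    """Build one CREATE TABLE ... PARTITION OF statement per year (inclusive)."""
    statements = []
    for year in range(first_year, last_year + 1):
        # Range partition bounds are inclusive at FROM and exclusive at TO.
        statements.append(
            f"CREATE TABLE IF NOT EXISTS {parent}_{year} PARTITION OF {parent} "
            f"FOR VALUES FROM ('{year}-01-01') TO ('{year + 1}-01-01');"
        )
    return statements


# Eight yearly partitions cover a 7+ year retention window.
ddl = yearly_partition_ddl("tax_returns", 2018, 2025)
for statement in ddl:
    print(statement)
```

Generating the statements from a year range makes it easy to add next year's partition ahead of time, for example from a scheduled job.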

Rationale

  • Aurora handles hundreds of thousands of records easily
  • Table partitioning by year supports 7+ year retention
  • Avoids ETL complexity, schema sync, query routing
  • One system to maintain instead of two
  • Cost overhead of Snowflake not justified at small scale

Trigger for Revisiting

  • 10,000+ returns/year
  • Query response > 5 seconds on indexed data
  • Complex analytics spanning multiple years

Consequences

  • Simpler architecture
  • Lower initial cost
  • May need future migration if scale demands it
  • S3 used for document storage only (not a query layer)

DD-005: Self-Hosted Airflow

Date: 2024-12-23 Status: Active Decision: Use self-hosted Airflow on EC2 instead of MWAA or Step Functions

Context

Need workflow orchestration for document processing, reminders, scheduled tasks.

Decision

Self-hosted Apache Airflow on EC2 t3.medium (~$23/mo reserved).

Rationale

| Factor | Self-Hosted | MWAA | Step Functions |
| --- | --- | --- | --- |
| Cost | ~$23/mo | $360+/mo | Pay per transition |
| Control | Full | Limited | Limited |
| UI | Yes | Yes | Basic |
| DAG complexity | Python (full power) | Python | JSON/YAML |

Consequences

  • Self-managed updates and maintenance
  • Full control over configuration
  • Significant cost savings (~$340/mo vs MWAA)
  • Acceptable maintenance overhead for small practice
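
The savings figure follows from the cost row in the comparison above. The short calculation below uses the lower bound quoted for MWAA, so the result is a minimum; the variable names are illustrative only.

```python
SELF_HOSTED_MONTHLY = 23  # ~$23/mo reserved t3.medium, as quoted above
MWAA_MONTHLY = 360        # $360+/mo, lower bound as quoted above

monthly_savings = MWAA_MONTHLY - SELF_HOSTED_MONTHLY
annual_savings = monthly_savings * 12

print(f"Monthly savings: ${monthly_savings} (rounded to ~$340 in the text)")
print(f"Annual savings:  ${annual_savings}")
```

These figures exclude the staff time spent on self-managed updates, which is the trade-off this decision accepts.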

DD-006: Bookkeeping Separate Document

Date: 2024-12-23 Status: Active Decision: Bookkeeping requirements in separate document from tax requirements

Context

Client interested in bookkeeping capabilities alongside tax prep.

Decision

Maintain bookkeeping_requirements.md separately from tax_practice_ai_requirements.md.

Rationale

  • Different cadence (monthly vs annual)
  • Different workflow patterns
  • Could serve non-tax clients
  • Independent prioritization and phasing
  • Shares infrastructure with tax system

Consequences

  • Two requirements documents to maintain
  • Clear separation of concerns
  • Easier to defer bookkeeping to post-MVP

DD-007: Bookkeeping Phased Approach

Date: 2024-12-23 Status: Active Decision: Implement bookkeeping in phases, not all at once

Context

Bookkeeping is large scope with varying complexity levels.

Decision

Three-phase implementation:

  1. Phase 1: Tax-ready categorization + QuickBooks export
  2. Phase 2: Reconciliation, recurring transaction detection
  3. Phase 3: Full bookkeeping (chart of accounts, P&L, Balance Sheet)

Rationale

  • Start light, design for full
  • QuickBooks is system of record initially
  • Each phase delivers value
  • Can stop at any phase if needs are met

Consequences

  • Longer total timeline for full feature
  • Clear value delivery at each phase
  • Flexibility to adjust scope based on feedback

DD-008: V1 Integration Strategy

Date: 2024-12-23 Status: Active Decision: Integrate with existing tools, don't replace them in V1

Context

Client uses SmartVault (portal), SurePrep (OCR), UltraTax (tax prep). Could replace or integrate.

Decision

Integrate with SmartVault, SurePrep, and UltraTax (via SurePrep CS Connect). Don't replace industry-standard tools in V1.

Rationale

  • Lower risk for V1 launch
  • Leverage existing investments
  • Focus on AI value-add, not reimplementing commodity features
  • Future versions may replace SurePrep to capture per-return fees
  • UltraTax lacks API (blocker for replacement)

Consequences

  • Dependency on vendor APIs and pricing
  • Integration complexity
  • Per-return fees paid to SurePrep
  • Clear upgrade path for future versions

DD-009: Dual Integration Pattern

Date: 2024-12-23 Status: Active Decision: Each integration gets both a Service (API) and Skill (AI context)

Context

Integrations need both programmatic access and AI understanding.

Decision

  • Services (src/services/) handle API calls, auth, connection management
  • Skills (src/ai/skills/integrations/) provide AI context for data interpretation

Rationale

  • Services handle "how to call"
  • Skills handle "how to understand"
  • AI can interpret vendor-specific data formats
  • Clear separation of concerns

Example

SmartVault:

  • smartvault_service.py - OAuth, folder sync, file download
  • src/ai/skills/integrations/smartvault/ - Folder structure meaning, naming conventions
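
The split between the two components might look like the sketch below. `SmartVaultServiceStub`, `IntegrationSkill`, the canned file names, and the naming convention in the guidance text are all hypothetical; a real service would perform OAuth and HTTP calls, and a real skill would load its context from the skills directory.

```python
from dataclasses import dataclass
from typing import List


class SmartVaultServiceStub:
    """Service side ("how to call"): a real one would handle OAuth and HTTP."""

    def list_files(self, folder: str) -> List[str]:
        # Canned response standing in for an API call.
        return ["2024_W2_Smith.pdf", "2024_1099INT_Smith.pdf"]


@dataclass(frozen=True)
class IntegrationSkill:
    """Skill side ("how to understand"): context text handed to the AI model."""

    vendor: str
    guidance: str

    def build_prompt(self, question: str, data: List[str]) -> str:
        """Combine vendor guidance, raw service data, and the question."""
        listing = "\n".join(f"- {item}" for item in data)
        return (
            f"[{self.vendor} context]\n{self.guidance}\n\n"
            f"Files:\n{listing}\n\nQuestion: {question}"
        )


service = SmartVaultServiceStub()
skill = IntegrationSkill(
    vendor="SmartVault",
    guidance="File names follow <tax year>_<form>_<client last name>.pdf (assumed convention).",
)
prompt = skill.build_prompt(
    "Which forms has the client uploaded?",
    service.list_files("/Smith/2024"),
)
print(prompt)
```

Keeping the guidance text out of the service class means a vendor format change only touches the skill, while an API or auth change only touches the service.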

Consequences

  • Two components per integration
  • AI becomes vendor-aware
  • Skills need updating when vendor changes formats

DD-010: Multi-Tenant SaaS Architecture

Date: 2024-12-23 Status: Active Decision: Separate database per tenant firm within one Aurora cluster

Context

Planning SaaS deployment for multiple tax practices.

Decision

Deploy with separate database per tenant within one Aurora cluster.

Rationale

  • Strongest isolation for tax compliance
  • Shared tiered pricing model
  • Easy to migrate growing tenants to dedicated clusters
  • Single Aurora cluster for cost efficiency

Consequences

  • Requires tenant routing middleware
  • Dynamic connection management
  • Separate backup/restore per tenant
  • More complex deployment but stronger security
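
A minimal sketch of the tenant routing and dynamic connection management mentioned above follows. The `TenantRouter` class, the `tenant_<id>` database naming scheme, and the host name are assumptions for illustration; the connect function is injected so the sketch runs without a database driver.

```python
import re
from typing import Callable, Dict, Generic, List, TypeVar

Conn = TypeVar("Conn")


class TenantRouter(Generic[Conn]):
    """Maps a tenant id to its own database and caches one handle per tenant."""

    def __init__(self, cluster_host: str, connect: Callable[[str], Conn]) -> None:
        self._cluster_host = cluster_host
        self._connect = connect
        self._handles: Dict[str, Conn] = {}

    def dsn_for(self, tenant_id: str) -> str:
        # Validate strictly: the tenant id becomes part of a database name.
        if not re.fullmatch(r"[a-z0-9_]{1,40}", tenant_id):
            raise ValueError(f"Invalid tenant id: {tenant_id!r}")
        return f"postgresql://{self._cluster_host}:5432/tenant_{tenant_id}"

    def handle_for(self, tenant_id: str) -> Conn:
        """Return the cached handle for a tenant, connecting on first use."""
        if tenant_id not in self._handles:
            self._handles[tenant_id] = self._connect(self.dsn_for(tenant_id))
        return self._handles[tenant_id]


# A fake connect function keeps the sketch runnable without a real database.
opened: List[str] = []


def fake_connect(dsn: str) -> str:
    opened.append(dsn)
    return f"connection<{dsn}>"


router = TenantRouter("aurora.example.internal", fake_connect)
a = router.handle_for("acme_tax")
b = router.handle_for("acme_tax")   # reuses the cached handle
c = router.handle_for("smith_cpa")  # opens a second, separate database
```

In a web application, middleware would resolve the tenant id from the authenticated request and call `handle_for` once per request. Moving a growing tenant to a dedicated cluster would then only require changing how its DSN is resolved.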

DD-011: Frontend Two-App Architecture

Date: 2024-12-24 Status: Active Decision: React + Vite with separate Client Portal and Staff App

Context

Need frontend for both tax clients and firm staff.

Decision

Two separate apps with shared component library:

  • Client Portal: Document upload, status, signing, payments
  • Staff App: Workflow queues, review UI, AI Q&A, billing
  • Shared: @tax-practice/ui component library

Technology Stack

React 18, Vite, TypeScript, Tailwind CSS, shadcn/ui, React Query, Zustand

Rationale

  • Production-grade, sellable stack
  • Separate concerns between client and staff UX
  • Shared components reduce duplication
  • HTMX considered but React chosen for richer UX and market perception

Consequences

  • Two deployment targets
  • Shared component library maintenance
  • Consistent UI patterns across apps

Last updated: 2025-12-30