Tax Practice AI - Design Decisions Log¶
Architecture decisions and their rationale
Related Documents¶
| Document | Purpose |
|---|---|
| backlog.md | Active priorities |
| COMPLETED.md | Completed items |
| TECH_DEBT.md | Technical debt tracking |
| ROADMAP.md | Future features |
| ARCHITECTURE.md | System design |
| index.md | Master navigation |
Decision Log¶
| ID | Date | Decision | Status |
|---|---|---|---|
| DD-001 | 2024-12-22 | Service centralization as core pattern | Active |
| DD-002 | 2024-12-22 | Mixed Java/Python architecture | Active |
| DD-003 | 2024-12-22 | Shared config.yaml for both languages | Active |
| DD-004 | 2024-12-22 | Aurora-only starting architecture | Active |
| DD-005 | 2024-12-23 | Self-hosted Airflow orchestration | Active |
| DD-006 | 2024-12-23 | Bookkeeping as separate requirements document | Active |
| DD-007 | 2024-12-23 | Bookkeeping phased approach | Active |
| DD-008 | 2024-12-23 | V1 integration strategy: integrate, don't replace | Active |
| DD-009 | 2024-12-23 | Dual integration pattern: Services + Skills | Active |
| DD-010 | 2024-12-23 | Multi-tenant SaaS: Separate databases per tenant | Active |
| DD-011 | 2024-12-24 | Frontend: React + Vite two-app architecture | Active |
DD-001: Service Centralization¶
Date: 2024-12-22 Status: Active Decision: Service centralization as core pattern
Context¶
Need consistent access patterns for all external services (database, S3, email, payment processors).
Decision¶
All external service access MUST go through centralized service modules in src/services/. No direct connector instantiation elsewhere.
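A minimal sketch of what a centralized access point could look like. The `ServiceRegistry` class, the `"email"` service name, and `FakeEmailService` are hypothetical illustrations, not the project's actual modules; the point is only that callers ask one registry for a service instead of instantiating connectors themselves.

```python
"""Hypothetical sketch of a centralized service module (e.g. somewhere under src/services/)."""
from typing import Callable, Dict


class ServiceRegistry:
    """Creates each external-service client once and hands out the shared instance."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], object]] = {}
        self._instances: Dict[str, object] = {}

    def register(self, name: str, factory: Callable[[], object]) -> None:
        # Re-registering (for example, swapping in a fake for a test)
        # discards any cached instance so the new factory takes effect.
        self._factories[name] = factory
        self._instances.pop(name, None)

    def get(self, name: str) -> object:
        if name not in self._factories:
            raise KeyError(f"unknown service: {name}")
        if name not in self._instances:
            self._instances[name] = self._factories[name]()
        return self._instances[name]


class FakeEmailService:
    """Stand-in for a real email connector; records messages instead of sending."""

    def __init__(self) -> None:
        self.sent = []

    def send(self, to: str, subject: str) -> None:
        self.sent.append((to, subject))


services = ServiceRegistry()
services.register("email", FakeEmailService)

email = services.get("email")
email.send("client@example.com", "Documents received")
```

Because every caller goes through `services.get(...)`, a test can register a fake under the same name without touching calling code.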
Rationale¶
- Single point of access for all external services
- Easier maintenance, testing, and audit
- Consistent error handling
- Connection pooling managed in one place
- Simplified dependency injection for tests
Consequences¶
- All new integrations must follow the pattern
- Existing code must be refactored if it bypasses services
- Slightly more boilerplate for simple cases
DD-002: Mixed Java/Python Architecture¶
Date: 2024-12-22 Status: Active Decision: Mixed Java/Python architecture
Context¶
System requires both rapid API development and high-performance document processing.
Decision¶
Use Python for API/orchestration/AI and Java for performance-critical document processing.
Rationale¶
| Component | Language | Why |
|---|---|---|
| API Layer | Python | FastAPI is productive, async-native, great for REST |
| AI Integration | Python | Bedrock SDK, prompt engineering, rapid iteration |
| Document Parsing | Java | Performance for OCR/PDF processing at scale |
| Tax Calculations | Java | Complex business logic, type safety, existing libraries |
| Batch Processing | Java | Memory efficiency, parallelism for large datasets |
Consequences¶
- Two build systems (pip + Maven)
- Developers need both skill sets
- Inter-process communication required
- Shared configuration management needed
DD-003: Shared config.yaml¶
Date: 2024-12-22 Status: Active Decision: Shared config.yaml for both languages
Context¶
With mixed Java/Python architecture, configuration could diverge.
Decision¶
Single config.yaml as source of truth, read by both Python (PyYAML) and Java (SnakeYAML).
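One way to keep environment variable handling identical on both sides is to expand placeholders in the raw text before it reaches the YAML parser. The sketch below assumes a `${NAME}` / `${NAME:-default}` placeholder syntax and a hypothetical `expand_env` helper; neither is confirmed by this log. On the Python side the expanded text would then go to PyYAML's `yaml.safe_load`, and the Java side would need an equivalent step before SnakeYAML.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} and ${NAME:-default}. This placeholder syntax is an assumption.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def expand_env(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace placeholders in raw config text before it is parsed as YAML."""
    values = os.environ if env is None else env

    def substitute(match: "re.Match[str]") -> str:
        name, default = match.group(1), match.group(2)
        if name in values:
            return values[name]
        if default is not None:
            return default
        # Fail loudly rather than silently writing an empty value.
        raise KeyError(f"missing environment variable: {name}")

    return _PLACEHOLDER.sub(substitute, text)


# Hypothetical config fragment; the keys are illustrative only.
raw = "database:\n  host: ${DB_HOST}\n  port: ${DB_PORT:-5432}\n"
expanded = expand_env(raw, {"DB_HOST": "aurora.internal"})
```

Keeping the substitution rule this small makes it practical to reimplement it exactly in Java, which is what prevents the two readers from drifting.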
Rationale¶
- Single source of truth
- Environment variable substitution can be applied the same way in both
- No configuration drift between components
- Easier deployment configuration
Consequences¶
- Both languages must parse same format
- Config schema must be compatible with both parsers
- All YAML parameters must be thoroughly commented
DD-004: Aurora-Only Starting Architecture¶
Date: 2024-12-22 Status: Active Decision: Start with Aurora PostgreSQL only, defer Snowflake/analytics tier
Context¶
Need database for 7+ years of tax data. Could use Aurora for OLTP + Snowflake for OLAP, or simplify.
Decision¶
Aurora handles all data including 7+ years retention (via table partitioning). Add analytics tier only when Aurora partitioning is insufficient.
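As a rough illustration of partitioning by year, the sketch below generates PostgreSQL declarative-partitioning DDL. The `tax_returns` table, the `tax_year` column, and the `partition_ddl` helper are hypothetical names chosen for the example; the real schema is not described in this log.

```python
from typing import List


def partition_ddl(parent: str, column: str, first_year: int, last_year: int) -> List[str]:
    """Build PostgreSQL DDL for a parent table with one range partition per year."""
    statements = [
        f"CREATE TABLE {parent} (\n"
        f"    id BIGINT NOT NULL,\n"
        f"    {column} INT NOT NULL,\n"
        # A primary key on a partitioned table must include the partition key.
        f"    PRIMARY KEY (id, {column})\n"
        f") PARTITION BY RANGE ({column});"
    ]
    for year in range(first_year, last_year + 1):
        # Range bounds are inclusive at FROM and exclusive at TO.
        statements.append(
            f"CREATE TABLE {parent}_{year} PARTITION OF {parent} "
            f"FOR VALUES FROM ({year}) TO ({year + 1});"
        )
    return statements


# Seven tax years, matching the 7+ year retention requirement.
ddl = partition_ddl("tax_returns", "tax_year", 2018, 2024)
```

Queries that filter on the partition column let the planner skip partitions for other years, which is what keeps older data cheap to retain in the same database.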
Rationale¶
- Aurora handles hundreds of thousands of records easily
- Table partitioning by year supports 7+ year retention
- Avoids ETL complexity, schema sync, query routing
- One system to maintain instead of two
- Cost overhead of Snowflake not justified at small scale
Trigger for Revisiting¶
- 10,000+ returns/year
- Query response > 5 seconds on indexed data
- Complex analytics spanning multiple years
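The triggers above could be checked mechanically, for example in a periodic review script. This is only a sketch: the `UsageSnapshot` fields and the `revisit_reasons` function are hypothetical, and the third trigger is qualitative, so it is modeled here as a flag that someone sets by hand.

```python
from dataclasses import dataclass
from typing import List

RETURNS_PER_YEAR_LIMIT = 10_000  # "10,000+ returns/year"
QUERY_SECONDS_LIMIT = 5.0        # "Query response > 5 seconds on indexed data"


@dataclass
class UsageSnapshot:
    returns_per_year: int
    slowest_indexed_query_seconds: float
    needs_multi_year_analytics: bool  # judgment call, set manually


def revisit_reasons(snapshot: UsageSnapshot) -> List[str]:
    """Return the triggers that currently apply; an empty list means stay on Aurora only."""
    reasons = []
    if snapshot.returns_per_year >= RETURNS_PER_YEAR_LIMIT:
        reasons.append("return volume")
    if snapshot.slowest_indexed_query_seconds > QUERY_SECONDS_LIMIT:
        reasons.append("query latency")
    if snapshot.needs_multi_year_analytics:
        reasons.append("multi-year analytics")
    return reasons


# Illustrative numbers only, not measurements from the system.
small_practice = UsageSnapshot(1_200, 0.4, False)
grown_practice = UsageSnapshot(10_000, 6.0, False)
```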
Consequences¶
- Simpler architecture
- Lower initial cost
- May need future migration if scale demands it
- S3 used for document storage only (not a query layer)
DD-005: Self-Hosted Airflow¶
Date: 2024-12-23 Status: Active Decision: Use self-hosted Airflow on EC2 instead of MWAA or Step Functions
Context¶
Need workflow orchestration for document processing, reminders, scheduled tasks.
Decision¶
Self-hosted Apache Airflow on EC2 t3.medium (~$23/mo reserved).
Rationale¶
| Factor | Self-Hosted | MWAA | Step Functions |
|---|---|---|---|
| Cost | ~$23/mo | $360+/mo | Pay per transition |
| Control | Full | Limited | Limited |
| UI | Yes | Yes | Basic |
| DAG complexity | Python (full power) | Python | JSON/YAML |
Consequences¶
- Self-managed updates and maintenance
- Full control over configuration
- Significant cost savings (~$340/mo vs MWAA)
- Acceptable maintenance overhead for small practice
DD-006: Bookkeeping Separate Document¶
Date: 2024-12-23 Status: Active Decision: Bookkeeping requirements in separate document from tax requirements
Context¶
Client interested in bookkeeping capabilities alongside tax prep.
Decision¶
Maintain bookkeeping_requirements.md separately from tax_practice_ai_requirements.md.
Rationale¶
- Different cadence (monthly vs annual)
- Different workflow patterns
- Could serve non-tax clients
- Independent prioritization and phasing
- Shares infrastructure with tax system
Consequences¶
- Two requirements documents to maintain
- Clear separation of concerns
- Easier to defer bookkeeping to post-MVP
DD-007: Bookkeeping Phased Approach¶
Date: 2024-12-23 Status: Active Decision: Implement bookkeeping in phases, not all at once
Context¶
Bookkeeping is large scope with varying complexity levels.
Decision¶
Three-phase implementation:
1. Phase 1: Tax-ready categorization + QuickBooks export
2. Phase 2: Reconciliation, recurring transaction detection
3. Phase 3: Full bookkeeping (chart of accounts, P&L, Balance Sheet)
Rationale¶
- Start light, design for full
- QuickBooks is system of record initially
- Each phase delivers value
- Can stop at any phase if needs are met
Consequences¶
- Longer total timeline for full feature
- Clear value delivery at each phase
- Flexibility to adjust scope based on feedback
DD-008: V1 Integration Strategy¶
Date: 2024-12-23 Status: Active Decision: Integrate with existing tools, don't replace them in V1
Context¶
Client uses SmartVault (portal), SurePrep (OCR), UltraTax (tax prep). Could replace or integrate.
Decision¶
Integrate with SmartVault, SurePrep, and UltraTax (via SurePrep CS Connect). Don't replace industry-standard tools in V1.
Rationale¶
- Lower risk for V1 launch
- Leverage existing investments
- Focus on AI value-add, not reimplementing commodity features
- Future versions may replace SurePrep to capture per-return fees
- UltraTax lacks API (blocker for replacement)
Consequences¶
- Dependency on vendor APIs and pricing
- Integration complexity
- Per-return fees paid to SurePrep
- Clear upgrade path for future versions
DD-009: Dual Integration Pattern¶
Date: 2024-12-23 Status: Active Decision: Each integration gets both a Service (API) and Skill (AI context)
Context¶
Integrations need both programmatic access and AI understanding.
Decision¶
- Services (src/services/) handle API calls, auth, and connection management
- Skills (src/ai/skills/integrations/) provide AI context for data interpretation
Rationale¶
- Services handle "how to call"
- Skills handle "how to understand"
- AI can interpret vendor-specific data formats
- Clear separation of concerns
Example¶
SmartVault:
- smartvault_service.py - OAuth, folder sync, file download
- src/ai/skills/integrations/smartvault/ - Folder structure meaning, naming conventions
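The split between the two components might look roughly like the sketch below. Everything in it is hypothetical: `FakeSmartVaultService` stands in for the real service (no OAuth or HTTP here), and the folder convention inside `IntegrationSkill` is an invented placeholder, not SmartVault's actual layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IntegrationSkill:
    """AI-facing context: how to understand a vendor's data."""

    vendor: str
    notes: List[str] = field(default_factory=list)

    def as_prompt_context(self) -> str:
        lines = "\n".join(f"- {note}" for note in self.notes)
        return f"{self.vendor} conventions:\n{lines}"


class FakeSmartVaultService:
    """Service-facing stub: how to call the vendor. Returns canned data."""

    def list_files(self, folder: str) -> List[str]:
        return [f"{folder}/W2_2024.pdf"]


def build_prompt(service: FakeSmartVaultService, skill: IntegrationSkill, folder: str) -> str:
    """Combine data fetched by the service with the skill's interpretation notes."""
    listing = "\n".join(service.list_files(folder))
    return f"{skill.as_prompt_context()}\n\nFiles:\n{listing}"


skill = IntegrationSkill(
    vendor="SmartVault",
    notes=["Each client has one top-level folder (assumed convention)"],
)
prompt = build_prompt(FakeSmartVaultService(), skill, "Clients/Doe")
```

The service never needs to know what a folder name means, and the skill never needs credentials, so either side can change without touching the other.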
Consequences¶
- Two components per integration
- AI becomes vendor-aware
- Skills need updating when vendor changes formats
DD-010: Multi-Tenant SaaS Architecture¶
Date: 2024-12-23 Status: Active Decision: Separate database per tenant firm within one Aurora cluster
Context¶
Planning SaaS deployment for multiple tax practices.
Decision¶
Deploy with separate database per tenant within one Aurora cluster.
Rationale¶
- Strongest isolation for tax compliance
- Shared tiered pricing model
- Easy to migrate growing tenants to dedicated clusters
- Single Aurora cluster for cost efficiency
Consequences¶
- Requires tenant routing middleware
- Dynamic connection management
- Separate backup/restore per tenant
- More complex deployment but stronger security
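The tenant routing mentioned above could be sketched as a lookup from tenant ID to a database on the shared cluster. The `TenantRouter` class, the `tenant_<id>` naming scheme, and the host name are assumptions made for illustration; credentials and connection pooling are deliberately left out.

```python
import re
from typing import Dict

# Restrict tenant IDs so they are safe to embed in a database name (assumed rule).
_TENANT_ID = re.compile(r"^[a-z][a-z0-9_]{0,30}$")


class TenantRouter:
    """Maps a tenant ID to its own database on one shared cluster."""

    def __init__(self, cluster_host: str, port: int = 5432) -> None:
        self._host = cluster_host
        self._port = port
        self._databases: Dict[str, str] = {}

    def register(self, tenant_id: str) -> str:
        if not _TENANT_ID.match(tenant_id):
            raise ValueError(f"invalid tenant id: {tenant_id!r}")
        database = f"tenant_{tenant_id}"
        self._databases[tenant_id] = database
        return database

    def dsn_for(self, tenant_id: str) -> str:
        # Unknown tenants are rejected; there is no fallback to a shared database.
        if tenant_id not in self._databases:
            raise LookupError(f"unknown tenant: {tenant_id}")
        return f"postgresql://{self._host}:{self._port}/{self._databases[tenant_id]}"


router = TenantRouter("aurora-cluster.internal")
router.register("acme_tax")
dsn = router.dsn_for("acme_tax")
```

Moving a growing tenant to a dedicated cluster would then only change what the router returns for that tenant, not the calling code.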
DD-011: Frontend Two-App Architecture¶
Date: 2024-12-24 Status: Active Decision: React + Vite with separate Client Portal and Staff App
Context¶
Need frontend for both tax clients and firm staff.
Decision¶
Two separate apps with a shared component library:
- Client Portal: Document upload, status, signing, payments
- Staff App: Workflow queues, review UI, AI Q&A, billing
- Shared: @tax-practice/ui component library
Technology Stack¶
React 18, Vite, TypeScript, Tailwind CSS, shadcn/ui, React Query, Zustand
Rationale¶
- Production-grade, sellable stack
- Separate concerns between client and staff UX
- Shared components reduce duplication
- HTMX considered but React chosen for richer UX and market perception
Consequences¶
- Two deployment targets
- Shared component library maintenance
- Consistent UI patterns across apps
Last updated: 2025-12-30