Tax Practice AI - Technical Debt¶
Tracking active technical debt items, code quality issues, and deferred improvements
Related Documents¶
| Document | Purpose |
|---|---|
| backlog.md | Active priorities |
| COMPLETED.md | Completed items (including TD-002, TD-003, TD-005, TD-014-017) |
| ROADMAP.md | Future features |
| DESIGN_DECISIONS.md | Architecture decisions |
| index.md | Master navigation |
Summary¶
| Status | Count |
|---|---|
| In Progress | 1 |
| Documented (with open items) | 3 |
| Not Started | 8 |
| Active Total | 12 |
9 items completed - see COMPLETED.md
TD-001: Java Build Configuration¶
Status: Not Started Priority: P2 Description: Need to establish Maven/Gradle configuration for Java components Notes: Should mirror ingestion engine patterns for consistency
TD-004: UAT Script Creation¶
Status: Not Started Priority: P1 (Pre-Launch) Description: Create User Acceptance Testing scripts for client validation Meeting Reference: 2024-12-29 call (Don's item #1)
Traceability: - [ ] Create requirements traceability matrix (RTM) linking UAT scripts to USER_STORIES.md - [ ] Each UAT test case references specific story ID (e.g., S2-001, S10-003) - [ ] Track coverage percentage against requirements - [ ] Bug tracking with requirement linkage for root cause analysis
UAT Scripts: - [ ] Define UAT scenarios for each sequence (S2-S18) - [ ] Create step-by-step testing scripts with requirement references - [ ] Define expected outcomes and acceptance criteria per story - [ ] Create UAT reporting templates with pass/fail per requirement - [ ] Document rollback procedures for failed UAT
UI Test Coverage (per 2024-12-29 meeting): - [ ] UI tests must cover every function - [ ] Determine scope through UI review, route inspection, and relevant documentation
TD-006: Placeholder and Assumption Audit¶
Status: In Progress Priority: P1 (Pre-Launch) Description: Review and address "for now", "placeholder", and "TODO" items throughout codebase Plan: See docs/plans/TECH_DEBT_CLEANUP.md
Audit Complete (2024-12-24): Found 89 items, categorized as: - 35 Production Blockers: External service stubs (Email, SMS, Persona, SmartVault, SurePrep, Google) - 12 Development Conveniences: Acceptable placeholders (PDF generation, placeholder citations) - 42 Not Issues: Documentation/expected behavior (SQL placeholders, template variables)
Non-API Fixes Complete (2024-12-24): 4 items fixed without external API credentials: - [x] Consent route client lookup (fetches real client data from ClientRepository) - [x] AI QA citation resolution (resolves document names to actual IDs via DocumentRepository) - [x] EFiling ready checks (checks Form 8879 signatures and document checklist) - [x] Placeholder PDF generation (ReportLab implementation) - [x] Fixed ChecklistRepository abstract method implementation
Production blockers organized into 6 phases: - [x] Phase 0: Audit complete - [x] Phase 0b: Non-API fixes complete (4 items) - [ ] Phase 1: EmailService + SMSService (5-7 hrs) - requires API credentials - [ ] Phase 2: PersonaService (4-5 hrs) - requires API credentials - [ ] Phase 3: SmartVaultService (6-8 hrs) - requires API credentials - [ ] Phase 4: SurePrepService (8-10 hrs) - requires API credentials - [ ] Phase 5: GoogleService (6-8 hrs) - requires API credentials - [ ] Phase 6: Webhook Security (2-3 hrs)
TD-007: Code Audit Findings (AUDIT-001 through AUDIT-008)¶
Status: Documented (action items open) Priority: P2 (Post-MVP) Audit Date: 2024-12-27 Report: docs/audits/CODE_AUDIT_2024-12-27.md
Overall Rating: 8.5/10 - Production-ready architecture in src/
Action Items: - [x] AUDIT-001: ~~Document local_api.py as dev-only~~ - N/A (local_api.py deleted in TD-018) - [x] AUDIT-002: ~~Add connection pooling to local_api.py~~ - N/A (local_api.py deleted in TD-018) - [x] AUDIT-003: ~~Split repository files exceeding 1,000 lines~~ - Cancelled (single developer, no merge conflict risk) - [ ] AUDIT-004: Add inline comments to complex workflow logic (2 hrs) - [x] AUDIT-005: ~~Move hardcoded values to environment variables~~ - Done in TD-018 - [x] AUDIT-006: ~~Consider migrating local_api.py to use src/ modules~~ - Done (deleted, using src/api directly) - [ ] AUDIT-007: Add request/response logging middleware (2 hrs) - [ ] AUDIT-008: Implement caching layer for read-heavy endpoints (4 hrs)
TD-008: Security Audit Findings (SEC-001 through SEC-016)¶
Status: Documented (P0 items open) Priority: P0 (Critical), P1 (High/Medium), P2 (Low) Audit Date: 2024-12-27 Report: docs/audits/SECURITY_AUDIT_2024-12-27.md
Overall Security Rating: 7/10 - Solid foundation, needs hardening before production
Critical (P0 - Fix Before Production): - [ ] SEC-001: XSS via template injection - src/services/template_service.py:276-301 - [ ] SEC-002: dangerouslySetInnerHTML without sanitization - TourOverlay.tsx:187-217 - [x] SEC-003: Missing security headers - Fixed (2025-12-31): SecurityHeadersMiddleware adds X-Frame-Options, X-Content-Type-Options, X-XSS-Protection, Referrer-Policy, HSTS, CSP - [x] SEC-004: Dev API weak auth - Already documented as dev-only per AUDIT-001
High (P0 - Fix Before Production): - [x] SEC-005: ORDER BY SQL injection risk - Fixed (2025-12-31): Whitelist validation in BaseRepository._validate_order_by() and AuditService._validate_order_by() - [x] SEC-006: CORS overly permissive - Fixed (2025-12-31): Centralized CORSConfig with explicit methods/headers whitelist - [ ] SEC-007: Missing role enforcement on list endpoints - clients.py:66 - [ ] SEC-008: Stripe test keys in plain text - .env:44-45
Medium (P1 - Fix in V1.1): - [ ] SEC-009: No token revocation mechanism - auth.py - [ ] SEC-010: Payment bypass flag needs compliance doc - efiling.py:31-35 - [ ] SEC-011: Search parameter unbounded - clients.py:70 - [x] SEC-012: ~~Filename not sanitized in S3 path - local_api.py:879~~ - N/A (local_api.py deleted) - [ ] SEC-013: Request size limits not enforced - main.py, nginx.conf
Low (P2 - Address When Convenient): - [ ] SEC-014: In-memory rate limiting (needs Redis for prod) - rate_limiting.py:68 - [x] SEC-015: ~~Error messages may leak info - local_api.py:1513~~ - N/A (local_api.py deleted) - [ ] SEC-016: Email query parameter not validated - registration.py:252
TD-009: Compliance Audit Findings (COMP-001 through COMP-022)¶
Status: Documented (P0 items open) Priority: P0 (Critical), P1 (High), P2 (Medium) Audit Date: 2024-12-27 Report: docs/audits/COMPLIANCE_AUDIT_2024-12-27.md
Overall Compliance Rating: 70/100 - Strong foundation with critical implementation gaps
Critical (P0 - Fix Before Production): - [ ] COMP-001: AI/cloud processing consent not implemented - Form 7216 violation risk - [ ] COMP-002: Form 7216 consent not enforced at e-filing - efiling_workflow.py - [ ] COMP-003: Authentication events not logged - audit_service.log_auth() never called - 2024-12-29 meeting: Believed complete but needs verification (item #12) - [ ] COMP-004: Document/client access not logged - only modifications tracked - 2024-12-29 meeting: Believed complete but needs verification (item #12) - [ ] COMP-005: Account lockout not implemented - SEC-005 design only - [ ] COMP-006: Field-level encryption not implemented - ENC-004 design only - [ ] COMP-007: Conflict of interest checks missing - Circular 230 requirement - [ ] COMP-008: Form 2848 (POA) workflow missing - domain model only
High (P1 - Fix in V1.1): - [ ] COMP-009: PTIN expiration not enforced in workflows - [ ] COMP-010: Competency/credential tracking missing - CIR-004 - [ ] COMP-011: MFA not implemented - framework only - [ ] COMP-012: Employee training tracking missing - WISP requirement - [ ] COMP-013: Database immutability not enforced - REVOKE statements commented - [ ] COMP-014: Incident detection/alerting not implemented - [ ] COMP-015: Persona integration in dry-run mode - [ ] COMP-016: Authorization denials not logged to audit
Medium (P2 - Address When Convenient): - [ ] COMP-017: Legal hold not implemented - RET-004 - [ ] COMP-018: Secure deletion not implemented - RET-005 - [ ] COMP-019: Vendor security assessments missing - [ ] COMP-020: Password policy not enforced - [ ] COMP-021: IP/device context not auto-captured - [ ] COMP-022: Consent table schema mismatch
Effort Estimate: 85-120 hours total (P0: 40-60 hrs, P1: 30-40 hrs, P2: 15-20 hrs)
TD-010: Remove Frontend Mock Mode Code¶
Status: Not Started Priority: P3 (Low - Technical Debt) Added: 2024-12-28
Description: Remove all DEV_MODE conditional branches and MOCK_* data arrays from frontend/packages/ui/src/lib/api.ts. Mock mode was disabled on 2024-12-28 in favor of using real API with seeded database.
Scope:
- Remove ~30 if (DEV_MODE) { ... } conditional blocks
- Remove MOCK_CLIENTS, MOCK_RETURNS, MOCK_DOCUMENTS, MOCK_ANALYSES, MOCK_CHAT_HISTORY arrays (~300 lines)
- Remove DEV_MODE constant and related comments
Effort Estimate: 1 hour
TD-011: Database Schema Consolidation - Remove Legacy Tables¶
Status: Not Started Priority: P1 (High - Pre-Production) Added: 2024-12-29
Description: The database contains duplicate tables from two parallel schemas that were never consolidated. The codebase uses the singular tables (modern schema), but plural tables still exist with stale data, causing confusion.
Tables to Remove (Legacy/Unused):
| Legacy Table | Active Table | Rows (Legacy) |
|--------------|--------------|---------------|
| clients | client | 6 |
| documents | document | 8 |
| tax_returns | tax_return | 5 |
Action Items:
- [ ] Verify no code references plural table names
- [ ] Backup legacy table data (in case needed for reference)
- [ ] Drop clients, documents, tax_returns tables
- [ ] Update DATABASE_SCHEMA.sql to remove legacy DDL
- [ ] Update any seed scripts that reference legacy tables
Risk: Low - code audit confirms no active references to plural tables Effort Estimate: 2 hours
TD-012: Deploy Tax Skills for Chat Sessions¶
Status: Not Started Priority: P1 (High - V1 Companion) Added: 2024-12-29 Meeting Reference: 2024-12-29 call item #10
Description: The tax skills (Tax Code Reference Lookup) exist but need deployment configuration so the AI chat feature can access them in both dev and cloud environments.
Action Items: - [ ] Configure skill deployment for local dev environment - [ ] Configure skill deployment for cloud (Cloudflare Pages) - [ ] Verify chat sessions can invoke tax skills - [ ] Test skill responses in both environments
Effort Estimate: 2-3 hours
TD-013: Identify Structured AI Request Types¶
Status: Not Started Priority: P1 (High - V1 Companion) Added: 2024-12-29 Meeting Reference: 2024-12-29 call item #11
Description: Define and implement structured request types for the AI assistant to handle common questions efficiently using cached MD documentation.
Structured Request Types: - [ ] "Are docs complete?" - Document completeness check against checklist - [ ] "What is missing?" - Identify missing documents for a return - [ ] "Summary" - Generate return summary from cached analysis
Implementation Approach: - [ ] Define request type patterns (keywords/intent detection) - [ ] Map request types to cached MD documents - [ ] Implement response generation from cached data (fast, no API call) - [ ] Fall back to full AI analysis when no cached data available - [ ] Leverage existing tax skills per 2024-12-29 meeting; assess if new skill needed
Dependencies: TD-012 (tax skills deployment) Effort Estimate: 4-6 hours
TD-018: Remove Hardcoded Config and Fallback Paths¶
Status: ✅ Completed (2025-12-30) Details: See COMPLETED.md
TD-020: E-Filing Schema Alignment (Mock → Production)¶
Status: Not Started Priority: P1 (High - Tests Blocked) Added: 2025-12-30 Discovered: Test suite failures when removing mock code branch
Description: The e-filing repository code was written against a mock schema structure that differs from the production DATABASE_SCHEMA.sql. Tests fail with "relation efile_submission does not exist" because the repository expects tables that don't exist in production.
Root Cause: Mock code used a simplified 2-table structure; production uses a 3-table normalized structure.
Current State (Mock Schema - what repository expects):
efile_submission (per jurisdiction)
├── id, tax_return_id, client_id, jurisdiction, state_code
├── status, sureprep_tracking_id, ultratax_confirmation, irs_confirmation
├── submitted_at, transmitted_at, accepted_at, rejected_at
└── filing_override, filing_override_by, filing_override_reason, filing_override_at
efile_rejection (references submission_id)
├── id, submission_id, tax_return_id
├── error_code, error_message, category, auto_correctable
└── resolution_notes, resolved_at, resolved_by
Production Schema (DATABASE_SCHEMA.sql):
efiling_transmission (per return, links to transmitter_account)
├── id, tax_return_id, transmitter_account_id
├── transmission_type, status, queued_at, transmitted_at
├── federal_submission_id, state_submissions (JSONB)
└── error_message, retry_count
efiling_acknowledgement (per jurisdiction acknowledgement)
├── id, transmission_id (FK to efiling_transmission)
├── jurisdiction, ack_type, ack_date
└── confirmation_number, irs_submission_id, raw_response
efiling_rejection (references acknowledgement_id)
├── id, acknowledgement_id (FK to efiling_acknowledgement)
├── error_code, error_category, error_message, field_name
├── is_auto_correctable, resolution_status, assigned_to
└── resolution_notes, resolved_at, resolved_by
Files Requiring Changes: | File | Change Type | Description | |------|-------------|-------------| | src/domain/efiling.py | Major | Refactor EFileSubmission entity to match transmission/acknowledgement structure | | src/repositories/efiling_repository.py | Major | Rewrite queries for 3-table structure | | src/workflows/filing/efiling_workflow.py | Moderate | Update workflow to use new repository methods | | src/api/routes/efiling.py | Moderate | Update API responses to match new entity structure | | src/api/schemas/efiling_schemas.py | Moderate | Update Pydantic schemas | | tests/e2e/test_efiling_api.py | Moderate | Update test assertions for new response structure |
Blocked Tests (14 tests): - tests/e2e/test_efiling_api.py - All 14 tests fail with "relation efile_submission does not exist"
Decision Points: 1. Should client_id be added to efiling_transmission or derived via tax_return FK? 2. Should filing_override fields be added to production schema or handled differently? 3. Should API response structure change or maintain backward compatibility?
Effort Estimate: 8-12 hours - Domain/Repository refactor: 4-6 hours - API/Schema updates: 2-3 hours - Test updates: 2-3 hours
TD-021: Login Attempt Repository Timezone Bug¶
Status: ✅ Completed (2025-12-30) Priority: P2 (Medium - 5 Tests Affected) Added: 2025-12-30 Discovered: Test suite failures in test_login_attempt_repository.py
Description: The login_attempt_repository.py had timezone-aware vs naive datetime comparison bugs causing 5 test failures.
Root Cause: Database uses TIMESTAMP (naive) but code passed timezone-aware datetimes to queries.
Fix Applied: Convert timezone-aware datetimes to naive using .replace(tzinfo=None) before passing to database queries. Fixed in 4 locations:
- get_recent_failed_attempts (line 125)
- count_recent_failed_attempts (line 161)
- get_lockout_status (line 227)
- get_attempts_by_ip (line 273)
Note: Long-term fix would be to change schema to use TIMESTAMPTZ instead of TIMESTAMP. See 79 occurrences of TIMESTAMP vs 19 TIMESTAMPTZ in DATABASE_SCHEMA.sql.
TD-022: Domain Enum Consolidation - Semantic Mismatches¶
Status: ✅ Complete Priority: P2 (Medium - Code Quality) Added: 2025-12-31 Completed: 2025-12-31 Discovered: Code audit Phase 2 - enum consolidation investigation
Resolution (Phase 7 - commit 4c0fd23):
- Unified VerificationStatus into single source at src/domain/identity_verification.py
- Added INITIATED status to canonical enum (values: INITIATED, PENDING, IN_PROGRESS, APPROVED, NEEDS_REVIEW, REJECTED, EXPIRED)
- Updated src/domain/client.py to import from identity_verification.py
- Renamed SurePrep's enum to DocumentVerificationStatus to avoid collision
- All 2253 tests passing
Original Description: During code audit, investigation of duplicate enum definitions revealed semantic mismatches that prevent simple consolidation. The enums have different meanings and database enum values don't match either Python definition.
VerificationStatus Semantic Mismatch (RESOLVED):
| Location | Original Values | Resolution |
|---|---|---|
| src/domain/client.py | INITIATED, PENDING, VERIFIED, FAILED, EXPIRED, MANUAL_REVIEW | Removed - now imports from identity_verification.py |
| src/domain/identity_verification.py | PENDING, IN_PROGRESS, APPROVED, REJECTED, EXPIRED | Canonical source - added INITIATED, NEEDS_REVIEW |
| src/ai/.../sureprep/mappings.py | VerificationStatus | Renamed to DocumentVerificationStatus |
ExtractionSource Value Mismatch (Deferred - intentional distinction):
| Location | AI Value | Issue |
|---|---|---|
| src/domain/extraction.py | INTERNAL_AI = "internal_ai" | Lowercase with underscore |
| src/domain/metadata.py | CLAUDE_BEDROCK = "claude_bedrock" | Different semantic meaning |
Analysis: These represent different things - one is a generic "internal AI" source, the other specifically identifies Claude/Bedrock. Intentional distinction, no action needed.
Completed Action Items: - [x] Investigate database enum usage vs Python enum usage - [x] Determine if VerificationStatus should remain separate (intentional) or be unified - [x] Unified VerificationStatus with identity_verification.py as canonical source - [x] Document ExtractionSource difference is intentional (no change needed) - [ ] Align database enum values with Python enum (future migration if needed)
Status: Consolidated 2024-12-27 Total Items: 15 (Security: 6, Compliance: 8, Code: 1 conditional) Total Effort: 45-50 hours
Implementation order optimized for dependencies and quick wins:
Phase 1: Quick Wins (2.5 hrs) - ✅ COMPLETE (2025-12-31)¶
| ID | Issue | Status | Fix |
|---|---|---|---|
| SEC-003 | Missing security headers | ✅ Done | SecurityHeadersMiddleware with configurable headers |
| SEC-005 | ORDER BY SQL injection risk | ✅ Done | Whitelist validation in BaseRepository + AuditService |
| SEC-006 | CORS overly permissive | ✅ Done | CORSConfig with explicit methods/headers whitelist |
Phase 2: XSS Fixes (3 hrs) - Parallelize¶
| ID | Issue | Effort | Fix |
|---|---|---|---|
| SEC-001 | XSS via template injection | 1 hr | Use html.escape() or Jinja2 autoescape |
| SEC-002 | dangerouslySetInnerHTML | 2 hrs | Add DOMPurify or refactor to React components |
Phase 3: Access Control (5 hrs) - Sequential¶
| ID | Issue | Effort | Fix |
|---|---|---|---|
| SEC-007 | Missing role enforcement | 2 hrs | Filter list endpoints by user role |
| COMP-005 | Account lockout missing | 3 hrs | Add lockout fields, check before auth |
Phase 4: Audit Logging (5 hrs) - Sequential after Phase 3¶
| ID | Issue | Effort | Fix |
|---|---|---|---|
| COMP-003 | Auth events not logged | 2 hrs | Call audit_service.log_auth() in auth routes |
| COMP-004 | Access not logged | 3 hrs | Add log_access() to all GET endpoints |
Phase 5: Consent System (6 hrs) - Sequential¶
| ID | Issue | Effort | Fix |
|---|---|---|---|
| COMP-001 | AI processing consent missing | 4 hrs | Add USE_AI_PROCESSING, check in BedrockService |
| COMP-002 | E-filing consent not enforced | 2 hrs | Validate Form 7216 before mark_ready_for_filing |
Phase 6: New Workflows (16 hrs) - Parallelize¶
| ID | Issue | Effort | Fix |
|---|---|---|---|
| COMP-007 | COI checks missing | 8 hrs | COI table, check workflow, API, audit log |
| COMP-008 | Form 2848 POA missing | 8 hrs | Form generation, signature, access control |
Phase 7: Data Protection (8 hrs)¶
| ID | Issue | Effort | Fix |
|---|---|---|---|
| COMP-006 | Field encryption missing | 8 hrs | encryption_service.py with pgcrypto wrapper |
Conditional¶
| ID | Issue | Effort | Condition |
|---|---|---|---|
| ~~AUDIT-002~~ | ~~local_api.py connection pooling~~ | ~~2 hrs~~ | N/A - local_api.py deleted in TD-018 |
Last updated: 2025-12-31