Tax Practice AI - Technical Debt¶

Tracking active technical debt items, code quality issues, and deferred improvements

Document	Purpose
backlog.md	Active priorities
COMPLETED.md	Completed items (including TD-002, TD-003, TD-005, TD-014-017)
ROADMAP.md	Future features
DESIGN_DECISIONS.md	Architecture decisions
index.md	Master navigation

Summary¶

Status	Count
In Progress	1
Documented (with open items)	3
Not Started	8
Active Total	12

9 items completed - see COMPLETED.md

TD-001: Java Build Configuration¶

Status: Not Started Priority: P2 Description: Need to establish Maven/Gradle configuration for Java components Notes: Should mirror ingestion engine patterns for consistency

TD-004: UAT Script Creation¶

Status: Not Started Priority: P1 (Pre-Launch) Description: Create User Acceptance Testing scripts for client validation Meeting Reference: 2024-12-29 call (Don's item #1)

Traceability: - [ ] Create requirements traceability matrix (RTM) linking UAT scripts to USER_STORIES.md - [ ] Each UAT test case references specific story ID (e.g., S2-001, S10-003) - [ ] Track coverage percentage against requirements - [ ] Bug tracking with requirement linkage for root cause analysis

UAT Scripts: - [ ] Define UAT scenarios for each sequence (S2-S18) - [ ] Create step-by-step testing scripts with requirement references - [ ] Define expected outcomes and acceptance criteria per story - [ ] Create UAT reporting templates with pass/fail per requirement - [ ] Document rollback procedures for failed UAT

UI Test Coverage (per 2024-12-29 meeting): - [ ] UI tests must cover every function - [ ] Determine scope through UI review, route inspection, and relevant documentation

TD-006: Placeholder and Assumption Audit¶

Status: In Progress Priority: P1 (Pre-Launch) Description: Review and address "for now", "placeholder", and "TODO" items throughout codebase Plan: See docs/plans/TECH_DEBT_CLEANUP.md

Audit Complete (2024-12-24): Found 89 items, categorized as: - 35 Production Blockers: External service stubs (Email, SMS, Persona, SmartVault, SurePrep, Google) - 12 Development Conveniences: Acceptable placeholders (PDF generation, placeholder citations) - 42 Not Issues: Documentation/expected behavior (SQL placeholders, template variables)

Non-API Fixes Complete (2024-12-24): 4 items fixed without external API credentials: - [x] Consent route client lookup (fetches real client data from ClientRepository) - [x] AI QA citation resolution (resolves document names to actual IDs via DocumentRepository) - [x] EFiling ready checks (checks Form 8879 signatures and document checklist) - [x] Placeholder PDF generation (ReportLab implementation) - [x] Fixed ChecklistRepository abstract method implementation

Production blockers organized into 6 phases: - [x] Phase 0: Audit complete - [x] Phase 0b: Non-API fixes complete (4 items) - [ ] Phase 1: EmailService + SMSService (5-7 hrs) - requires API credentials - [ ] Phase 2: PersonaService (4-5 hrs) - requires API credentials - [ ] Phase 3: SmartVaultService (6-8 hrs) - requires API credentials - [ ] Phase 4: SurePrepService (8-10 hrs) - requires API credentials - [ ] Phase 5: GoogleService (6-8 hrs) - requires API credentials - [ ] Phase 6: Webhook Security (2-3 hrs)

TD-007: Code Audit Findings (AUDIT-001 through AUDIT-008)¶

Status: Documented (action items open) Priority: P2 (Post-MVP) Audit Date: 2024-12-27 Report: docs/audits/CODE_AUDIT_2024-12-27.md

Overall Rating: 8.5/10 - Production-ready architecture in src/

Action Items: - [x] AUDIT-001: ~~Document local_api.py as dev-only~~ - N/A (local_api.py deleted in TD-018) - [x] AUDIT-002: ~~Add connection pooling to local_api.py~~ - N/A (local_api.py deleted in TD-018) - [x] AUDIT-003: ~~Split repository files exceeding 1,000 lines~~ - Cancelled (single developer, no merge conflict risk) - [ ] AUDIT-004: Add inline comments to complex workflow logic (2 hrs) - [x] AUDIT-005: ~~Move hardcoded values to environment variables~~ - Done in TD-018 - [x] AUDIT-006: ~~Consider migrating local_api.py to use src/ modules~~ - Done (deleted, using src/api directly) - [ ] AUDIT-007: Add request/response logging middleware (2 hrs) - [ ] AUDIT-008: Implement caching layer for read-heavy endpoints (4 hrs)

TD-008: Security Audit Findings (SEC-001 through SEC-016)¶

Status: Documented (P0 items open) Priority: P0 (Critical), P1 (High/Medium), P2 (Low) Audit Date: 2024-12-27 Report: docs/audits/SECURITY_AUDIT_2024-12-27.md

Overall Security Rating: 7/10 - Solid foundation, needs hardening before production

Critical (P0 - Fix Before Production): - [ ] SEC-001: XSS via template injection - src/services/template_service.py:276-301 - [ ] SEC-002: dangerouslySetInnerHTML without sanitization - TourOverlay.tsx:187-217 - [x] SEC-003: Missing security headers - Fixed (2025-12-31): SecurityHeadersMiddleware adds X-Frame-Options, X-Content-Type-Options, X-XSS-Protection, Referrer-Policy, HSTS, CSP - [x] SEC-004: Dev API weak auth - Already documented as dev-only per AUDIT-001

High (P0 - Fix Before Production): - [x] SEC-005: ORDER BY SQL injection risk - Fixed (2025-12-31): Whitelist validation in BaseRepository._validate_order_by() and AuditService._validate_order_by() - [x] SEC-006: CORS overly permissive - Fixed (2025-12-31): Centralized CORSConfig with explicit methods/headers whitelist - [ ] SEC-007: Missing role enforcement on list endpoints - clients.py:66 - [ ] SEC-008: Stripe test keys in plain text - .env:44-45

Medium (P1 - Fix in V1.1): - [ ] SEC-009: No token revocation mechanism - auth.py - [ ] SEC-010: Payment bypass flag needs compliance doc - efiling.py:31-35 - [ ] SEC-011: Search parameter unbounded - clients.py:70 - [x] SEC-012: ~~Filename not sanitized in S3 path - local_api.py:879~~ - N/A (local_api.py deleted) - [ ] SEC-013: Request size limits not enforced - main.py, nginx.conf

Low (P2 - Address When Convenient): - [ ] SEC-014: In-memory rate limiting (needs Redis for prod) - rate_limiting.py:68 - [x] SEC-015: ~~Error messages may leak info - local_api.py:1513~~ - N/A (local_api.py deleted) - [ ] SEC-016: Email query parameter not validated - registration.py:252

TD-009: Compliance Audit Findings (COMP-001 through COMP-022)¶

Status: Documented (P0 items open) Priority: P0 (Critical), P1 (High), P2 (Medium) Audit Date: 2024-12-27 Report: docs/audits/COMPLIANCE_AUDIT_2024-12-27.md

Overall Compliance Rating: 70/100 - Strong foundation with critical implementation gaps

Critical (P0 - Fix Before Production): - [ ] COMP-001: AI/cloud processing consent not implemented - Form 7216 violation risk - [ ] COMP-002: Form 7216 consent not enforced at e-filing - efiling_workflow.py - [ ] COMP-003: Authentication events not logged - audit_service.log_auth() never called - 2024-12-29 meeting: Believed complete but needs verification (item #12) - [ ] COMP-004: Document/client access not logged - only modifications tracked - 2024-12-29 meeting: Believed complete but needs verification (item #12) - [ ] COMP-005: Account lockout not implemented - SEC-005 design only - [ ] COMP-006: Field-level encryption not implemented - ENC-004 design only - [ ] COMP-007: Conflict of interest checks missing - Circular 230 requirement - [ ] COMP-008: Form 2848 (POA) workflow missing - domain model only

High (P1 - Fix in V1.1): - [ ] COMP-009: PTIN expiration not enforced in workflows - [ ] COMP-010: Competency/credential tracking missing - CIR-004 - [ ] COMP-011: MFA not implemented - framework only - [ ] COMP-012: Employee training tracking missing - WISP requirement - [ ] COMP-013: Database immutability not enforced - REVOKE statements commented - [ ] COMP-014: Incident detection/alerting not implemented - [ ] COMP-015: Persona integration in dry-run mode - [ ] COMP-016: Authorization denials not logged to audit

Medium (P2 - Address When Convenient): - [ ] COMP-017: Legal hold not implemented - RET-004 - [ ] COMP-018: Secure deletion not implemented - RET-005 - [ ] COMP-019: Vendor security assessments missing - [ ] COMP-020: Password policy not enforced - [ ] COMP-021: IP/device context not auto-captured - [ ] COMP-022: Consent table schema mismatch

Effort Estimate: 85-120 hours total (P0: 40-60 hrs, P1: 30-40 hrs, P2: 15-20 hrs)

TD-010: Remove Frontend Mock Mode Code¶

Status: Not Started Priority: P3 (Low - Technical Debt) Added: 2024-12-28

Description: Remove all DEV_MODE conditional branches and MOCK_* data arrays from frontend/packages/ui/src/lib/api.ts. Mock mode was disabled on 2024-12-28 in favor of using real API with seeded database.

Scope: - Remove ~30 if (DEV_MODE) { ... } conditional blocks - Remove MOCK_CLIENTS, MOCK_RETURNS, MOCK_DOCUMENTS, MOCK_ANALYSES, MOCK_CHAT_HISTORY arrays (~300 lines) - Remove DEV_MODE constant and related comments

Effort Estimate: 1 hour

TD-011: Database Schema Consolidation - Remove Legacy Tables¶

Status: Not Started Priority: P1 (High - Pre-Production) Added: 2024-12-29

Description: The database contains duplicate tables from two parallel schemas that were never consolidated. The codebase uses the singular tables (modern schema), but plural tables still exist with stale data, causing confusion.

Tables to Remove (Legacy/Unused): | Legacy Table | Active Table | Rows (Legacy) | |--------------|--------------|---------------| | clients | client | 6 | | documents | document | 8 | | tax_returns | tax_return | 5 |

Action Items: - [ ] Verify no code references plural table names - [ ] Backup legacy table data (in case needed for reference) - [ ] Drop clients, documents, tax_returns tables - [ ] Update DATABASE_SCHEMA.sql to remove legacy DDL - [ ] Update any seed scripts that reference legacy tables

Risk: Low - code audit confirms no active references to plural tables Effort Estimate: 2 hours

TD-012: Deploy Tax Skills for Chat Sessions¶

Status: Not Started Priority: P1 (High - V1 Companion) Added: 2024-12-29 Meeting Reference: 2024-12-29 call item #10

Description: The tax skills (Tax Code Reference Lookup) exist but need deployment configuration so the AI chat feature can access them in both dev and cloud environments.

Action Items: - [ ] Configure skill deployment for local dev environment - [ ] Configure skill deployment for cloud (Cloudflare Pages) - [ ] Verify chat sessions can invoke tax skills - [ ] Test skill responses in both environments

Effort Estimate: 2-3 hours

TD-013: Identify Structured AI Request Types¶

Status: Not Started Priority: P1 (High - V1 Companion) Added: 2024-12-29 Meeting Reference: 2024-12-29 call item #11

Description: Define and implement structured request types for the AI assistant to handle common questions efficiently using cached MD documentation.

Structured Request Types: - [ ] "Are docs complete?" - Document completeness check against checklist - [ ] "What is missing?" - Identify missing documents for a return - [ ] "Summary" - Generate return summary from cached analysis

Implementation Approach: - [ ] Define request type patterns (keywords/intent detection) - [ ] Map request types to cached MD documents - [ ] Implement response generation from cached data (fast, no API call) - [ ] Fall back to full AI analysis when no cached data available - [ ] Leverage existing tax skills per 2024-12-29 meeting; assess if new skill needed

Dependencies: TD-012 (tax skills deployment) Effort Estimate: 4-6 hours

TD-018: Remove Hardcoded Config and Fallback Paths¶

Status: ✅ Completed (2025-12-30) Details: See COMPLETED.md

TD-020: E-Filing Schema Alignment (Mock → Production)¶

Status: Not Started Priority: P1 (High - Tests Blocked) Added: 2025-12-30 Discovered: Test suite failures when removing mock code branch

Description: The e-filing repository code was written against a mock schema structure that differs from the production DATABASE_SCHEMA.sql. Tests fail with "relation efile_submission does not exist" because the repository expects tables that don't exist in production.

Root Cause: Mock code used a simplified 2-table structure; production uses a 3-table normalized structure.

Current State (Mock Schema - what repository expects):

efile_submission (per jurisdiction)
├── id, tax_return_id, client_id, jurisdiction, state_code
├── status, sureprep_tracking_id, ultratax_confirmation, irs_confirmation
├── submitted_at, transmitted_at, accepted_at, rejected_at
└── filing_override, filing_override_by, filing_override_reason, filing_override_at

efile_rejection (references submission_id)
├── id, submission_id, tax_return_id
├── error_code, error_message, category, auto_correctable
└── resolution_notes, resolved_at, resolved_by

Production Schema (DATABASE_SCHEMA.sql):

efiling_transmission (per return, links to transmitter_account)
├── id, tax_return_id, transmitter_account_id
├── transmission_type, status, queued_at, transmitted_at
├── federal_submission_id, state_submissions (JSONB)
└── error_message, retry_count

efiling_acknowledgement (per jurisdiction acknowledgement)
├── id, transmission_id (FK to efiling_transmission)
├── jurisdiction, ack_type, ack_date
└── confirmation_number, irs_submission_id, raw_response

efiling_rejection (references acknowledgement_id)
├── id, acknowledgement_id (FK to efiling_acknowledgement)
├── error_code, error_category, error_message, field_name
├── is_auto_correctable, resolution_status, assigned_to
└── resolution_notes, resolved_at, resolved_by

Files Requiring Changes: | File | Change Type | Description | |------|-------------|-------------| | src/domain/efiling.py | Major | Refactor EFileSubmission entity to match transmission/acknowledgement structure | | src/repositories/efiling_repository.py | Major | Rewrite queries for 3-table structure | | src/workflows/filing/efiling_workflow.py | Moderate | Update workflow to use new repository methods | | src/api/routes/efiling.py | Moderate | Update API responses to match new entity structure | | src/api/schemas/efiling_schemas.py | Moderate | Update Pydantic schemas | | tests/e2e/test_efiling_api.py | Moderate | Update test assertions for new response structure |

Blocked Tests (14 tests): - tests/e2e/test_efiling_api.py - All 14 tests fail with "relation efile_submission does not exist"

Decision Points: 1. Should client_id be added to efiling_transmission or derived via tax_return FK? 2. Should filing_override fields be added to production schema or handled differently? 3. Should API response structure change or maintain backward compatibility?

Effort Estimate: 8-12 hours - Domain/Repository refactor: 4-6 hours - API/Schema updates: 2-3 hours - Test updates: 2-3 hours

Status: ✅ Completed (2025-12-30) Priority: P2 (Medium - 5 Tests Affected) Added: 2025-12-30 Discovered: Test suite failures in test_login_attempt_repository.py

Description: The login_attempt_repository.py had timezone-aware vs naive datetime comparison bugs causing 5 test failures.

Root Cause: Database uses TIMESTAMP (naive) but code passed timezone-aware datetimes to queries.

Fix Applied: Convert timezone-aware datetimes to naive using .replace(tzinfo=None) before passing to database queries. Fixed in 4 locations: - get_recent_failed_attempts (line 125) - count_recent_failed_attempts (line 161) - get_lockout_status (line 227) - get_attempts_by_ip (line 273)

Note: Long-term fix would be to change schema to use TIMESTAMPTZ instead of TIMESTAMP. See 79 occurrences of TIMESTAMP vs 19 TIMESTAMPTZ in DATABASE_SCHEMA.sql.

TD-022: Domain Enum Consolidation - Semantic Mismatches¶

Status: ✅ Complete Priority: P2 (Medium - Code Quality) Added: 2025-12-31 Completed: 2025-12-31 Discovered: Code audit Phase 2 - enum consolidation investigation

Resolution (Phase 7 - commit 4c0fd23): - Unified VerificationStatus into single source at src/domain/identity_verification.py - Added INITIATED status to canonical enum (values: INITIATED, PENDING, IN_PROGRESS, APPROVED, NEEDS_REVIEW, REJECTED, EXPIRED) - Updated src/domain/client.py to import from identity_verification.py - Renamed SurePrep's enum to DocumentVerificationStatus to avoid collision - All 2253 tests passing

Original Description: During code audit, investigation of duplicate enum definitions revealed semantic mismatches that prevent simple consolidation. The enums have different meanings and database enum values don't match either Python definition.

VerificationStatus Semantic Mismatch (RESOLVED):

Location	Original Values	Resolution
src/domain/client.py	INITIATED, PENDING, VERIFIED, FAILED, EXPIRED, MANUAL_REVIEW	Removed - now imports from identity_verification.py
src/domain/identity_verification.py	PENDING, IN_PROGRESS, APPROVED, REJECTED, EXPIRED	Canonical source - added INITIATED, NEEDS_REVIEW
src/ai/.../sureprep/mappings.py	VerificationStatus	Renamed to DocumentVerificationStatus

ExtractionSource Value Mismatch (Deferred - intentional distinction):

Location	AI Value	Issue
src/domain/extraction.py	INTERNAL_AI = "internal_ai"	Lowercase with underscore
src/domain/metadata.py	CLAUDE_BEDROCK = "claude_bedrock"	Different semantic meaning

Analysis: These represent different things - one is a generic "internal AI" source, the other specifically identifies Claude/Bedrock. Intentional distinction, no action needed.

Completed Action Items: - [x] Investigate database enum usage vs Python enum usage - [x] Determine if VerificationStatus should remain separate (intentional) or be unified - [x] Unified VerificationStatus with identity_verification.py as canonical source - [x] Document ExtractionSource difference is intentional (no change needed) - [ ] Align database enum values with Python enum (future migration if needed)

Status: Consolidated 2024-12-27 Total Items: 15 (Security: 6, Compliance: 8, Code: 1 conditional) Total Effort: 45-50 hours

Implementation order optimized for dependencies and quick wins:

Phase 1: Quick Wins (2.5 hrs) - ✅ COMPLETE (2025-12-31)¶

ID	Issue	Status	Fix
SEC-003	Missing security headers	✅ Done	SecurityHeadersMiddleware with configurable headers
SEC-005	ORDER BY SQL injection risk	✅ Done	Whitelist validation in BaseRepository + AuditService
SEC-006	CORS overly permissive	✅ Done	CORSConfig with explicit methods/headers whitelist

Phase 2: XSS Fixes (3 hrs) - Parallelize¶

ID	Issue	Effort	Fix
SEC-001	XSS via template injection	1 hr	Use html.escape() or Jinja2 autoescape
SEC-002	dangerouslySetInnerHTML	2 hrs	Add DOMPurify or refactor to React components

Phase 3: Access Control (5 hrs) - Sequential¶

ID	Issue	Effort	Fix
SEC-007	Missing role enforcement	2 hrs	Filter list endpoints by user role
COMP-005	Account lockout missing	3 hrs	Add lockout fields, check before auth

Phase 4: Audit Logging (5 hrs) - Sequential after Phase 3¶

ID	Issue	Effort	Fix
COMP-003	Auth events not logged	2 hrs	Call audit_service.log_auth() in auth routes
COMP-004	Access not logged	3 hrs	Add log_access() to all GET endpoints

ID	Issue	Effort	Fix
COMP-001	AI processing consent missing	4 hrs	Add USE_AI_PROCESSING, check in BedrockService
COMP-002	E-filing consent not enforced	2 hrs	Validate Form 7216 before mark_ready_for_filing

Phase 6: New Workflows (16 hrs) - Parallelize¶

ID	Issue	Effort	Fix
COMP-007	COI checks missing	8 hrs	COI table, check workflow, API, audit log
COMP-008	Form 2848 POA missing	8 hrs	Form generation, signature, access control

Phase 7: Data Protection (8 hrs)¶

ID	Issue	Effort	Fix
COMP-006	Field encryption missing	8 hrs	encryption_service.py with pgcrypto wrapper

Conditional¶

ID	Issue	Effort	Condition
~~AUDIT-002~~	~~local_api.py connection pooling~~	~~2 hrs~~	N/A - local_api.py deleted in TD-018

Last updated: 2025-12-31