# Legacy Import Foundation Design

Date: 2026-05-24

Status: approved design for Phase 1 implementation.

## Goal

Build a safe legacy import foundation for ECOLE ECOIN. It reads real student and training history from the old ECOIN platform through a dedicated read-only legacy database connection, stages it inside the new platform, validates and classifies it, and prepares it for human review.

This phase does **not** commit records into active `users`, `crm_leads`, `enrollments`, `payments`, attendance, or messaging workflows.

## Scope

### In scope for this phase

- Legacy source connection configuration
- Import run tracking
- Staging tables for students, training rows, course mappings, session mappings, and commit logs
- Read-only ingestion from:
  - `stagiaires`
  - `archives`
  - `formations`
  - `session_ecoins`
- Phone normalization
- Email placeholder detection
- Duplicate hinting
- Validation state calculation
- Business classification
- Session placeholder detection
- Dry-run reporting foundation
- Admin review pages for staged data and mappings
- Commands for import, validate, classify, and dry-run
- Tests and operational documentation

### Explicitly out of scope for the original foundation phase

- Import directly into `users`
- Import directly into `enrollments`
- Payment creation
- Attendance creation
- WhatsApp or messaging side effects
- Automatic course creation
- Automatic cohort creation from waiting sessions
- Broad commit execution
- Broad rollback of committed business data

## Source of Truth and Safety Model

The old platform remains an external source through a dedicated `legacy` database connection.

Rules:

- The importer reads through a read-only connection.
- The importer never mutates the legacy database.
- The importer never overwrites current platform data.
- The importer only writes to legacy staging tables in this phase.
- Every future commit path must carry `source=legacy_platform` and `legacy_import_run_id`.
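
The rules above can be sketched as a small write guard plus a provenance stamp. This is a minimal illustration in Python, not the module's actual code (the module itself lives in the PHP application); `assert_staging_write`, `with_provenance`, and `UnsafeWriteError` are hypothetical names, and the staging table list is taken from the Data Model section.

```python
from typing import Any, Dict

# Staging tables named in the Data Model section; nothing else may be written.
STAGING_TABLES = frozenset({
    "legacy_import_runs",
    "legacy_student_imports",
    "legacy_training_imports",
    "legacy_course_mappings",
    "legacy_session_mappings",
    "legacy_import_commit_logs",
})


class UnsafeWriteError(RuntimeError):
    """Raised when the importer attempts a write the safety model forbids."""


def assert_staging_write(connection_name: str, table: str) -> None:
    # The legacy connection is read-only by rule, regardless of the table.
    if connection_name == "legacy":
        raise UnsafeWriteError("the legacy connection must never be written to")
    if table not in STAGING_TABLES:
        raise UnsafeWriteError(f"{table} is not a legacy staging table")


def with_provenance(payload: Dict[str, Any], run_id: int) -> Dict[str, Any]:
    # Every future commit path carries these two markers.
    return {**payload, "source": "legacy_platform", "legacy_import_run_id": run_id}
```

A guard of this shape fails loudly at the write site, which is easier to audit than relying on database grants alone.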

## Module Ownership

Create a new module:

```text
app/Domain/LegacyImport/
  Actions/
  Data/
  Queries/
  Services/
  Support/
```

Module ownership:

- Owns ingestion, staging, validation, classification, and dry-run preparation
- Does not own active student lifecycle writes
- Does not bypass CRM, Enrollment, Finance, Attendance, or WhatsApp ownership

Future commit work will call owned Actions in other modules or create narrowly owned compatibility records. That is intentionally deferred.

## Data Model

### `legacy_import_runs`

Tracks one import cycle.

Key fields:

- run label/reference
- source connection name
- status
- counters
- started/finished timestamps
- initiated by user
- notes/metadata

### `legacy_student_imports`

This is the central staging table: one staged record per legacy row from `stagiaires` or `archives`.

It stores:

- source table
- legacy id
- normalized person fields
- raw legacy status and session references
- raw payload
- business classification
- validation state
- duplicate hints
- future imported entity ids
- commit markers

This table is the review queue.

### `legacy_training_imports`

This is the child record for a legacy student's training/session reference.

It stores:

- linked staged student
- legacy formation id/name
- legacy session id/code/date range
- matched course/cohort hints
- session type
- tentative import action

### `legacy_course_mappings`

Stores human-reviewed mapping intent from old `formations` to current `courses`.

### `legacy_session_mappings`

Stores human-reviewed mapping intent from old `session_ecoins` to current cohorts or placeholder rules.

### `legacy_import_commit_logs`

Created in the schema now, but reserved: in this phase it is used only lightly, for skip/review preparation where needed. Full commit history becomes active in the later commit phase.

## Ingestion Strategy

### Reader contract

Use a source abstraction now, but enable only one reader:

- `LegacySourceReader` contract
- `DatabaseLegacySourceReader` implementation

This keeps the design extensible without adding dump support now.
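
As an illustration of the contract/implementation split, here is a minimal Python sketch. It is an assumption about shape only: the real contract would be a PHP interface in the module, the `read` method name is hypothetical, and `sqlite3` stands in for the legacy connection purely so the example runs.

```python
import sqlite3
from typing import Any, Dict, Iterator, Protocol

Row = Dict[str, Any]

# The four legacy tables this phase is allowed to read (from the Scope section).
ALLOWED_TABLES = ("stagiaires", "archives", "formations", "session_ecoins")


class LegacySourceReader(Protocol):
    """Contract: yield rows from a named legacy table and never write."""

    def read(self, table: str) -> Iterator[Row]: ...


class DatabaseLegacySourceReader:
    """The only reader enabled now; it issues SELECT statements only."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._connection = connection

    def read(self, table: str) -> Iterator[Row]:
        if table not in ALLOWED_TABLES:
            raise ValueError(f"table not allowed: {table}")
        # The table name was checked against the allow-list, so interpolation is safe.
        cursor = self._connection.execute(f"SELECT * FROM {table}")
        columns = [column[0] for column in cursor.description]
        for values in cursor:
            yield dict(zip(columns, values))
```

A dump-based reader could later implement the same `read` method without touching the import flow.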

### Import flow

1. Create import run
2. Read reference tables:
   - `formations`
   - `session_ecoins`
3. Stage course mappings and session mappings
4. Read student rows from:
   - `stagiaires`
   - `archives`
5. Normalize and stage student rows
6. Create linked `legacy_training_imports`
7. Mark run as ingested

The raw source payload is preserved in staging for auditability.
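
The seven steps can be sketched as a single orchestration function. This is a simplified Python illustration under stated assumptions: `run_import` and `ImportRun` are hypothetical names, the legacy columns `id` and `session_ecoin_id` are assumed, and staging is held in memory instead of the real staging tables.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


@dataclass
class ImportRun:
    status: str = "running"
    counters: Dict[str, int] = field(default_factory=dict)
    staged: Dict[str, List[Row]] = field(default_factory=dict)


def run_import(read: Callable[[str], Iterable[Row]]) -> ImportRun:
    """Walk the steps in order; `read` returns the rows of one legacy table."""
    run = ImportRun()  # step 1: create import run

    # Steps 2-3: reference tables first, so student rows can point at mappings.
    for table, target in (("formations", "course_mappings"),
                          ("session_ecoins", "session_mappings")):
        run.staged[target] = [
            {"legacy_id": row["id"], "raw_payload": json.dumps(row, ensure_ascii=False)}
            for row in read(table)
        ]

    # Steps 4-6: stage each student row plus one linked training row.
    students: List[Row] = []
    trainings: List[Row] = []
    for table in ("stagiaires", "archives"):
        for row in read(table):
            students.append({
                "source_table": table,
                "legacy_id": row["id"],
                "raw_payload": json.dumps(row, ensure_ascii=False),  # kept for audit
            })
            trainings.append({
                "student_key": (table, row["id"]),
                "legacy_session_id": row.get("session_ecoin_id"),
            })
    run.staged["students"] = students
    run.staged["trainings"] = trainings

    run.counters = {name: len(rows) for name, rows in run.staged.items()}
    run.status = "ingested"  # step 7
    return run
```

Keying staged students by `(source_table, legacy_id)` keeps a `stagiaires` row and an `archives` row with the same id distinct.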

## Classification Rules

Classification is deterministic and rule-based in this phase.

### Status mapping

- `Formée` -> `completed`
- `Archivé` / `archivé` / `Archivè` -> `archived_training`
- `Informer` / `informer` -> `lead_only`
- `Reporter` -> `registered` or `lead_only` depending on training/session quality
- `Ne répond pas` -> `lead_only` with a lost/not-responding review hint
- `Confirmer` -> `registered`
- `nouveau` -> `lead_only`
- anything else -> `unknown` or `needs_review`
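
The status mapping above can be expressed as a lookup table with two special cases. The Python sketch below is illustrative only; `classify_status` is a hypothetical name, and two choices are assumptions not fixed by this design: `Reporter` is resolved by a single `has_real_session` flag, and every unrecognized status falls back to `needs_review` rather than `unknown`.

```python
import unicodedata
from typing import Optional, Tuple


def _nfc(text: str) -> str:
    # Normalize accents so composed and decomposed forms compare equal.
    return unicodedata.normalize("NFC", text)


STATUS_MAP = {_nfc(key): value for key, value in {
    "Formée": "completed",
    "Archivé": "archived_training",
    "archivé": "archived_training",
    "Archivè": "archived_training",  # legacy spelling variant, kept on purpose
    "Informer": "lead_only",
    "informer": "lead_only",
    "Confirmer": "registered",
    "nouveau": "lead_only",
}.items()}


def classify_status(raw_status: Optional[str],
                    has_real_session: bool = False) -> Tuple[str, Optional[str]]:
    """Return (classification, review_hint) for one raw legacy status."""
    status = _nfc((raw_status or "").strip())
    if status == "Reporter":
        # Assumption: "training/session quality" reduces to having a real session.
        return ("registered" if has_real_session else "lead_only", None)
    if status == _nfc("Ne répond pas"):
        return ("lead_only", "not_responding")
    return (STATUS_MAP.get(status, "needs_review"), None)
```

Because the function is a pure lookup, the same input always yields the same classification, which is what makes the phase deterministic and easy to re-run.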

### Import action mapping

In this phase, `import_action` is staged only, not executed:

- completed history -> `create_legacy_completed_training`
- weak interest only -> `create_course_interest`
- pending registration history -> `create_legacy_enrollment`
- ambiguity -> `needs_review`
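
A minimal Python sketch of how a staged `import_action` could be derived from the business classification. The pairing is an assumption: it reads "completed history" as `completed` or `archived_training`, "weak interest only" as `lead_only`, and "pending registration history" as `registered`; `stage_import_action` is a hypothetical name.

```python
ACTION_BY_CLASSIFICATION = {
    "completed": "create_legacy_completed_training",
    "archived_training": "create_legacy_completed_training",
    "lead_only": "create_course_interest",
    "registered": "create_legacy_enrollment",
}


def stage_import_action(classification: str) -> str:
    # The action is only recorded on the staged row; nothing is executed here.
    return ACTION_BY_CLASSIFICATION.get(classification, "needs_review")
```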

## Session Rules

Certain legacy sessions must never become real cohorts automatically:

- `session_ecoin_id` equal to `1`, `2`, or `235`
- any session code containing `Session-Attente`
- any session lacking a valid `formation_id`

These are classified as:

- `waiting_session`
- `placeholder_session`
- or `unknown`

Only sessions with valid structure are considered `real_session`.
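
These rules could be sketched as one classification function. The Python below is a hedged illustration: this design does not state which rule yields which label, so the pairing used here (code match gives `waiting_session`, reserved id gives `placeholder_session`, missing or unknown formation gives `unknown`) is an assumption, as is the `classify_session` name.

```python
from typing import Collection, Optional

PLACEHOLDER_SESSION_IDS = frozenset({1, 2, 235})
WAITING_CODE_MARKER = "Session-Attente"


def classify_session(session_id: Optional[int],
                     code: Optional[str],
                     formation_id: Optional[int],
                     known_formation_ids: Collection[int]) -> str:
    """Return the session type; only `real_session` may ever map to a cohort."""
    if code and WAITING_CODE_MARKER in code:
        return "waiting_session"
    if session_id in PLACEHOLDER_SESSION_IDS:
        return "placeholder_session"
    if session_id is None or formation_id not in known_formation_ids:
        return "unknown"
    return "real_session"
```

The checks run from most specific to least specific, so a waiting session with an otherwise valid `formation_id` is still never treated as real.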

## Validation Rules

Validation remains separate from classification.

Validation checks:

- essential legacy identity present
- phone is either normalized or retained and flagged as invalid
- email is valid or marked placeholder/invalid
- source references are structurally consistent
- duplicate confidence can be computed

Validation outcomes:

- `pending`
- `valid`
- `duplicate`
- `needs_review`
- `failed`

No row is dropped silently.
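
One possible way to turn the checks into an outcome is sketched below in Python. The decision order is an assumption, not part of the approved design: missing identity fails the row, a high-confidence duplicate hint marks it `duplicate`, any other recorded issue sends it to review. The field names (`full_name`, `phone_normalized`, `email_state`, `duplicate_confidence`) and `validate_row` are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def validate_row(row: Dict[str, Any]) -> Tuple[str, List[str]]:
    """Return (validation_state, issues); every row gets a state, none is dropped."""
    # Assumption: "essential legacy identity" means a legacy id and a name.
    if not row.get("legacy_id") or not (row.get("full_name") or "").strip():
        return "failed", ["missing_identity"]

    issues: List[str] = []
    if row.get("phone_normalized") is None:
        issues.append("invalid_phone")
    if row.get("email_state") in ("placeholder", "invalid"):
        issues.append("untrusted_email")

    confidence = row.get("duplicate_confidence")
    if confidence == "high":
        return "duplicate", issues
    if issues or confidence in ("medium", "low"):
        return "needs_review", issues
    return "valid", issues
```

Rows that have not been through this function yet would simply keep the `pending` state.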

## Duplicate Strategy

Duplicate scoring is advisory in this phase.

### High confidence

- normalized phone matches existing `users` or `crm_leads`
- valid email matches existing `users` or `crm_leads`

### Medium confidence

- full name + birth date
- full name + raw phone

### Low confidence

- similar full name only

Rules:

- only high confidence is safe for future auto-linking
- medium and low always remain review-driven
- this phase stores hints only, no merges
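
The three confidence tiers can be sketched as a comparison between one staged row and one existing record. This Python example is illustrative: `duplicate_confidence` and the field names are hypothetical, the caller is assumed to pass `email` only when it has already been judged valid, and the 0.85 similarity threshold for "similar full name" is an arbitrary assumption.

```python
from difflib import SequenceMatcher
from typing import Any, Dict, Optional

NAME_SIMILARITY_THRESHOLD = 0.85  # assumption, to be tuned against real data


def _same(a: Optional[str], b: Optional[str]) -> bool:
    # Empty or missing values never count as a match.
    return bool(a) and bool(b) and a.strip().lower() == b.strip().lower()


def duplicate_confidence(staged: Dict[str, Any],
                         existing: Dict[str, Any]) -> Optional[str]:
    """Return 'high', 'medium', 'low', or None; hints only, never a merge."""
    if _same(staged.get("phone_normalized"), existing.get("phone_normalized")):
        return "high"
    if _same(staged.get("email"), existing.get("email")):
        return "high"

    same_name = _same(staged.get("full_name"), existing.get("full_name"))
    if same_name and _same(staged.get("birth_date"), existing.get("birth_date")):
        return "medium"
    if same_name and _same(staged.get("phone_raw"), existing.get("phone_raw")):
        return "medium"

    name_a = (staged.get("full_name") or "").strip().lower()
    name_b = (existing.get("full_name") or "").strip().lower()
    if name_a and name_b:
        ratio = SequenceMatcher(None, name_a, name_b).ratio()
        if ratio >= NAME_SIMILARITY_THRESHOLD:
            return "low"
    return None
```

Only a `high` result would ever be eligible for future auto-linking; `medium` and `low` feed the review queue.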

## Normalization Rules

### Phone

Reuse the existing Algeria phone normalization service as the canonical normalizer when possible.

Examples:

- `0559446168` -> `213559446168`
- `0776845160` -> `213776845160`
- too-short values, `0`, `00`, or otherwise malformed values -> invalid phone

Always retain `phone_raw`.
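
The examples above are consistent with a rule like the one sketched below in Python. This is not the existing normalization service, which remains the canonical normalizer; it is a simplified stand-in that assumes only Algerian mobile numbers (national prefix `05`, `06`, or `07`) are accepted, and `normalize_phone` is a hypothetical name.

```python
import re
from typing import Any, Dict, Optional


def normalize_phone(raw: Optional[str]) -> Dict[str, Any]:
    """Return the raw value plus a normalized form, or None when invalid."""
    digits = re.sub(r"\D", "", raw or "")
    if re.fullmatch(r"0[567]\d{8}", digits):
        # National format: drop the leading 0 and prefix the country code.
        normalized: Optional[str] = "213" + digits[1:]
    elif re.fullmatch(r"213[567]\d{8}", digits):
        # Already in international form.
        normalized = digits
    else:
        normalized = None
    return {
        "phone_raw": raw,  # always retained, even when invalid
        "phone_normalized": normalized,
        "phone_valid": normalized is not None,
    }
```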

### Email

Treat placeholders like:

- `/`
- `NULL`
- known fake placeholders such as `Ecoin.hank@gmail.com`

as invalid placeholders rather than trusted contact data.
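
Placeholder detection could look like the following Python sketch. It is an assumption-laden illustration: `classify_email` is a hypothetical name, the empty string is treated as a placeholder alongside `/` and `NULL`, the known-fake list contains only the one address named above, and the shape check is deliberately loose and not a full address validator.

```python
import re
from typing import Optional

PLACEHOLDER_VALUES = frozenset({"", "/", "null"})
KNOWN_FAKE_EMAILS = frozenset({"ecoin.hank@gmail.com"})
EMAIL_SHAPE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def classify_email(raw: Optional[str]) -> str:
    """Return 'placeholder', 'invalid', or 'valid' for one raw legacy email."""
    value = (raw or "").strip().lower()
    if value in PLACEHOLDER_VALUES or value in KNOWN_FAKE_EMAILS:
        return "placeholder"
    if not EMAIL_SHAPE.match(value):
        return "invalid"
    return "valid"
```

Comparing in lowercase catches case variants of the known fake address without listing each one.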

## Admin UX

Create review surfaces under:

- `/admin/legacy-imports`
- `/admin/legacy-imports/{run}`
- `/admin/legacy-imports/students`
- `/admin/legacy-imports/review`
- `/admin/legacy-imports/course-mapping`
- `/admin/legacy-imports/session-mapping`

The UI purpose in this phase is:

- browse staging runs
- inspect normalized/classified rows
- filter review queues
- inspect mapping confidence
- preview raw payload where authorized

This phase does not expose active commit buttons that mutate business entities.

## Commands

The old command name mentioning SQL dumps does not match the approved source strategy. Use connection-based commands now.

Foundation commands:

- `php artisan legacy:import --connection=legacy`
- `php artisan legacy:validate --run=`
- `php artisan legacy:classify --run=`
- `php artisan legacy:dry-run --run=`

Safe slice commands now active:

- `php artisan legacy:commit --run= --batch=100`
- `php artisan legacy:rollback --run=`

Current commit scope is intentionally narrow:

- `lead_only` -> CRM lead creation or safe high-confidence linking
- `completed` / `archived_training` -> compatibility `legacy_completed_trainings`
- `registered` remains deferred

## Security Model

Permissions:

- `legacy_imports.view`
- `legacy_imports.run`
- `legacy_imports.validate`
- `legacy_imports.review`
- reserved now for later use:
  - `legacy_imports.commit`
  - `legacy_imports.rollback`

Security rules:

- phone and email are masked in list views
- raw payload requires explicit authorization
- audit import runs and review actions
- avoid logging full PII in command output
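
Masking for list views and command output might be sketched as follows. The exact masking format is not specified in this design, so the choices here (keep the last two phone digits, keep the first character of the email local part) are assumptions, as are the function names.

```python
from typing import Optional


def mask_phone(phone: Optional[str]) -> str:
    """Hide all but the last two characters of a phone value."""
    value = phone or ""
    if len(value) <= 2:
        return "*" * len(value)
    return "*" * (len(value) - 2) + value[-2:]


def mask_email(email: Optional[str]) -> str:
    """Keep the first character and the domain; hide the rest of the local part."""
    local, separator, domain = (email or "").partition("@")
    if not separator or not local:
        return "***"
    return f"{local[0]}***@{domain}"
```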

## Testing Strategy

Phase 1 tests focus on:

- staging import correctness
- normalization
- validation
- classification
- placeholder session detection
- duplicate hints
- permissions

No test in this phase should assert real active business entity creation from legacy rows.

## Risks

### 1. Misclassifying trained students as simple leads

Mitigation:

- strong rule priority for `Formée` and archived training cases
- classification and validation are stored, not auto-committed

### 2. Turning waiting sessions into real cohorts

Mitigation:

- hard-coded placeholder session rules
- real cohort creation deferred entirely

### 3. Damaging current production data

Mitigation:

- no direct writes to users/enrollments/crm in this phase
- staging-only writes

### 4. Bad contact data polluting the system

Mitigation:

- placeholder email detection
- invalid phones are preserved but never trusted
- duplicate confidence separation

## Acceptance for Phase 1

This phase is done when:

- a run can ingest legacy rows into staging
- staged rows can be validated
- staged rows can be classified
- dry-run data can be reviewed safely
- mappings can be reviewed without creating active platform records
- no direct commit to active business tables occurs
