# Legacy Platform Migration

Date: 2026-05-24

See also:

- `docs/legacy_production_migration_runbook.md` for the operational execution runbook used during real migration waves

## Purpose

This document defines the safe migration strategy for historical ECOIN platform data coming from the legacy tables:

- `stagiaires`
- `archives`
- `formations`
- `session_ecoins`

The current phase is limited to:

- audit
- staging
- validation
- classification
- dry-run preparation

It does **not** commit imported rows into active platform business tables.

## Source Strategy

The old platform is connected through a dedicated external database connection:

- connection name: `legacy`
- access mode: read-only operationally

Current phase does not support SQL dump parsing. The importer reads directly from the external legacy database connection.

## Legacy DB Configuration

The application now defines a first-class `legacy` connection in `config/database.php`.

### Required `.env` keys

```dotenv
LEGACY_IMPORT_CONNECTION=legacy
LEGACY_IMPORT_FOCUS_MODE_DEFAULT=historical_completed

LEGACY_DB_DRIVER=mysql
LEGACY_DB_HOST=127.0.0.1
LEGACY_DB_PORT=3306
LEGACY_DB_DATABASE=ecoin_legacy
LEGACY_DB_USERNAME=root
LEGACY_DB_PASSWORD=secret
LEGACY_DB_SOCKET=
LEGACY_DB_CHARSET=utf8mb4
LEGACY_DB_COLLATION=utf8mb4_unicode_ci
LEGACY_DB_STRICT=true
```

### Production recommendation

- create a dedicated DB user for the old platform
- grant `SELECT` only
- do not grant:
  - `INSERT`
  - `UPDATE`
  - `DELETE`
  - `ALTER`
  - `DROP`

### Example MySQL grant

```sql
CREATE USER 'ecoin_legacy_reader'@'%' IDENTIFIED BY 'strong-password';
GRANT SELECT ON ecoin_legacy.* TO 'ecoin_legacy_reader'@'%';
FLUSH PRIVILEGES;
```

### Operational check before import

Run:

```bash
php artisan legacy:import
```

If you want to override the configured connection explicitly:

```bash
php artisan legacy:import --connection=legacy
```

The command now prints the connection name it actually used, which helps verify that the import did not accidentally read from the main application database.

## Source Analysis

Observed legacy data categories:

- `stagiaires`: mixed lead, registered, and historically trained learners
- `archives`: archived historical learner records, often training history
- `formations`: old training catalog reference
- `session_ecoins`: old session/cohort reference

Important operational observation:

- many records are not simple leads
- several rows represent real completed training history
- some session ids are waiting placeholders and must not become real cohorts automatically

Operational priority for ECOIN legacy data:

1. historical completed training
2. safe pending registration history
3. leads and follow-up interest

This priority now drives the dry-run review layer and admin UI summaries.

## Staging Schema

Phase 1 introduces:

- `legacy_import_runs`
- `legacy_student_imports`
- `legacy_training_imports`
- `legacy_course_mappings`
- `legacy_session_mappings`
- `legacy_import_commit_logs`

### Central review table

`legacy_student_imports` is the core staging table.

It stores:

- normalized student identity snapshot
- source table and legacy id
- raw status and legacy references
- raw payload
- validation status
- duplicate confidence
- classification
- reserved future imported ids

### Training child rows

`legacy_training_imports` stores:

- legacy formation reference
- legacy session reference
- session type
- tentative import action

## Classification Rules

### Status rules

- `Formée` -> `completed`
- `Archivé` / `archivé` / `Archivè` -> `archived_training`
- `Informer` / `informer` -> `lead_only`
- `Reporter` -> `registered` or `lead_only`
- `Ne répond pas` -> `lead_only`
- `Confirmer` -> `registered`
- `nouveau` -> `lead_only`
- anything else -> `unknown`

### Import action intent

In this phase, `import_action` is advisory only:

- `create_course_interest`
- `create_legacy_enrollment`
- `create_legacy_completed_training`
- `needs_review`

No action is executed against active platform entities yet.

## Session Placeholder Rules

The following sessions are treated as waiting or placeholder sessions:

- `session_ecoin_id = 1`
- `session_ecoin_id = 2`
- `session_ecoin_id = 235`
- any code containing `Session-Attente`
- any session without clear `formation_id`

These must not create real cohorts automatically.

## Course Mapping Rules

Old `formations` are staged into `legacy_course_mappings`.

Rules:

- exact course match may be suggested
- no automatic course creation
- unknown course references do not fail the row
- unknown course references become `needs_review` for historical training actions

## Duplicate Rules

### High confidence

- normalized phone matches existing user or CRM lead
- valid email matches existing user or CRM lead

### Medium confidence

- full name + birth date
- full name + phone raw

### Low confidence

- similar full name only

Only high confidence is safe for future auto-linking. This phase stores hints only and performs no merges.

## Phone Rules

Algeria normalization rules use the existing normalizer:

- `0559446168` -> `213559446168`
- `0776845160` -> `213776845160`
- malformed or short values stay invalid

`phone_raw` is always preserved.

## Email Rules

Placeholder or invalid values such as:

- `/`
- `NULL`
- `Ecoin.hank@gmail.com`

are not trusted as real contact emails and are stored as null after sanitization.

## Validation Strategy

Validation status options:

- `pending`
- `valid`
- `duplicate`
- `needs_review`
- `failed`

Validation and classification are separate:

- validation answers whether the row is structurally trustworthy
- classification answers what the row appears to represent in business terms

## Commit Strategy

The platform now supports a **narrow, safe commit slice** only.

Active commit paths:

- `CommitLegacyAsCrmLeadAction`
- `CommitLegacyAsCompletedTrainingAction`
- `CommitLegacyAsLegacyEnrollmentAction`
- `LinkLegacyRecordToExistingUserAction`
- `SkipLegacyRecordAction`

Still deferred:

- `CommitLegacyAsLegacyEnrollmentAction`
- any payment creation
- any attendance creation

### Current commit boundaries

- `lead_only` rows can create CRM leads when the row is structurally valid and has no duplicate risk.
- high-confidence duplicates can be linked to existing users or existing CRM leads without creating new business records.
- `completed` and `archived_training` rows can create `legacy_completed_trainings` compatibility records.
- `registered` rows can now create `legacy_imported_enrollments` compatibility records only when:
  - the training import action is `create_legacy_enrollment`
  - the session is a confirmed `real_session`
- waiting / placeholder session rows are still intentionally left uncommitted for manual review.

## Rollback Strategy

Rollback is now active for the supported safe slice.

Rules:

- rollback affects only records created by a specific `legacy_import_run_id`
- rollback deletes:
  - newly created CRM leads from legacy commit
  - newly created `legacy_completed_trainings`
- rollback never deletes:
  - pre-existing users
  - pre-existing CRM leads that were only linked
- rollback resets staging linkage markers and records audit history in commit logs

## Commands

Current phase commands:

```bash
php artisan legacy:import --connection=legacy
php artisan legacy:validate --run=RUN_ID
php artisan legacy:classify --run=RUN_ID
php artisan legacy:dry-run --run=RUN_ID
```

Safe slice commands now available:

- `php artisan legacy:commit --run=RUN_ID --batch=100`
- `php artisan legacy:rollback --run=RUN_ID`

## Risk Register

### Risk 1: Real trained students downgraded to simple leads

Mitigation:

- explicit status-based classification
- `Formée` and archived statuses are handled first

### Risk 2: Waiting sessions become real cohorts

Mitigation:

- placeholder session hard rules
- no cohort creation in this phase

### Risk 3: Dirty contact data pollutes active records

Mitigation:

- staging only
- phone normalization
- placeholder email sanitization

### Risk 4: Duplicate collisions overwrite current data

Mitigation:

- no overwrite
- no active commits
- duplicate hints only

## Operational Checklist

Before running:

1. confirm legacy DB connection works
2. confirm connection points to old platform only
3. confirm no write path exists to legacy DB
4. confirm admin operator understands this is staging only

Run order:

1. `legacy:import`
2. `legacy:validate`
3. `legacy:classify`
4. `legacy:dry-run`

Dry-run now surfaces:

- source table distribution
- classification distribution
- validation distribution
- duplicate confidence distribution
- priority buckets for:
  - completed history
  - registered pending
  - lead only
  - other ambiguous rows
- proposed training import actions
- session type distribution
- risk counters for:
  - needs review students
  - duplicate students
  - placeholder or waiting sessions
  - unmapped courses that block non-lead training history

Review before any future commit phase:

1. placeholder sessions
2. unknown courses
3. high-confidence duplicates
4. archived/completed training counts
5. rows still marked `needs_review`
6. rows classified as `registered` that still require manual enrollment decisions

## Current Implementation Boundary

As of this phase, the system can:

- read legacy data
- stage it safely
- validate it
- classify it
- prepare dry-run review summaries

It can now:

- create safe CRM leads from legacy lead-only rows
- link high-confidence duplicates to existing users or CRM leads
- create compatibility `legacy_completed_trainings` records
- create compatibility `legacy_imported_enrollments` records for safe registered rows
- rollback the supported safe slice

It still cannot:

- create real operating enrollments inside the modern enrollment lifecycle
- create payments
- create attendance records
- upgrade waiting sessions into real cohorts automatically

## Admin Review UI

The platform now exposes review-only screens under:

- `/admin/legacy-imports`
- `/admin/legacy-imports/{run}`
- `/admin/legacy-imports/students`
- `/admin/legacy-imports/review`
- `/admin/legacy-imports/course-mapping`
- `/admin/legacy-imports/session-mapping`

Current UI rules:

- phone and email are masked in list screens
- raw payload preview is restricted to the dedicated review queue
- the review queue requires `legacy_imports.review`
- safe commit controls now exist only for authorized operators
- run exports are available in:
  - masked CSV
  - JSON summary without raw payload dumps
- review queue focus modes now support:
  - historical completed training
  - registered pending
  - lead only

This helps operators process the highest-value historical training records first.

## Verification Note

Commands, routes, syntax, and architecture checks are verified for this phase.
The dedicated Legacy Import test harness is now stable for:

- foundation staging tests
- validation and classification tests
- admin review UI tests

Later commit-phase workflows still require their own isolated verification layer before activation.
