NCD Analytics Calculation Reference (Detailed)
==============================================

Purpose
-------
This file documents exactly how the NCD analytics results are calculated in the current Laravel + Python implementation.
Use this as a technical reference for medical stakeholders, QA, and data audits.

Code paths used
---------------
- Python pipeline: ncd_analysis.py
- SQL views: metrics_views.sql
- Laravel dashboard calculations: app/Http/Controllers/Dashboard1Controller.php
- NCD dashboard page: resources/views/ncd_dashboard.blade.php

Primary source tables
---------------------
1) ncd_pt_registers
- Key columns used:
  - Pid, FuchiaID, Clinic_code
  - Reg_Date
  - visit_Age, Current_Age
  - Gender
  - 1stBP, 2ndBP, 3rdBP
  - 1stHypertension, Diabetes_Diagnose (column: 2nd_Hypertension), staging_Hypertension
  - 1st_tot_Diabetes, 1st_RBS, 1st_RBS_date
  - 2nd_tot_Diabetes, 2nd_RBS, 2nd_RBS_date

2) ncd_followups
- Key columns used:
  - Pid, FuchiaID, Clinic_code
  - Visit_date, Next_Appointment
  - own_clinic_Bp, own_Bp_Stage
  - NCD_Diagnosis
  - FBS, FBS_test_date, Loaction_test
  - 2HPP, 2HPP_test_date, Loaction_Test2
  - HBA1C, Lab_res_Date
  - RBS result (legacy)
  - Creatinine, CRCL, Uring_AC_ratio, CVD_Risk
  - Medication changed, Patient_adhe medic, Drug_Supply
  - Foth_medi_spec
  - Out_come, Tout_mam_clinic, Ncd_Tout_icmv_location

3) patients
- Key columns used:
  - Pid
  - Date of Birth
  - Gender (used for gender correction on cumulative follow-up chart)

High-level processing flow
--------------------------
Step 1. Per clinic DB, create/update views:
- v_ncd_base_join: registration + patient join
- v_ncd_followups_clean: follow-up normalized columns + BP parse
- v_ncd_monthly_summary_seed: monthly seed counts

Step 2. Read view data into pandas:
- registers (from v_ncd_base_join)
- followups (from v_ncd_followups_clean)
- monthly seed

Step 3. Decrypt configured encrypted columns via Laravel helper:
- tools/laravel_decryptor.php
- Python sends batches to PHP and receives decrypted values.

Step 4. Data quality date limits:
- config.yaml -> data_quality.min_valid_date / max_valid_date
- Dates outside range are flagged.
- For core row date columns:
  - register row dropped if reg_date invalid range
  - follow-up row dropped if visit_date invalid range
- Other invalid date fields are nulled and tracked.

Step 5. Build core patient-level dataset:
- Latest follow-up per Pid is selected (max visit_date, tie by followup_id desc).
- This dataset is saved as patient_latest.csv and is used by many dashboard KPIs.

Step 6. Export clinic outputs:
- outputs/{DB}/registers_clean.csv
- outputs/{DB}/followups_clean.csv
- outputs/{DB}/patient_latest.csv
- plus trend and summary CSVs (listed below).

Step 7. Build combined overall output:
- Concatenate all clinic datasets (source_db retained).
- Recalculate metrics for outputs/overall.

Privacy and ID handling
-----------------------
- If privacy.mask_ids=true, stakeholder CSVs hash pid/patient_id/fuchia_id with SHA-256 + id_salt.
- Doctor list can keep full IDs using privacy.keep_full_ids_in_doctor_lists=true.
- For cumulative follow-up by gender, gender is remapped from patients table; encrypted Gender is decrypted with decrypt_light('General').

Core parsing rules
------------------
BP parsing
- Pattern: ^\\s*\\d{2,3}\\s*/\\s*\\d{2,3}\\s*$
- Valid range:
  - SBP: 50 to 300
  - DBP: 30 to 200
- Out-of-range BP is treated as invalid for control/staging.

BP stage classification
- Stage 3: SBP >= 180 OR DBP >= 110
- Stage 2: SBP >= 160 OR DBP >= 100
- Stage 1: SBP >= 140 OR DBP >= 90
- Normal: otherwise

Diabetes control hierarchy
- Per row, first available test in this order:
  1) HbA1c
  2) 2HPP
  3) FBS
  4) RBS (legacy fallback)
- Controlled thresholds (config.yaml):
  - HbA1c < 7.0
  - 2HPP < 180
  - FBS < 126
  - RBS < 200

Age derivation
- Primary: date_of_birth from patients + reference date
- Formula: floor((reference_date - dob in days) / 365.2425)
- Fallbacks when DOB unavailable:
  - current_age
  - visit_age
- Dashboard age distribution uses ncd_pt_registers.visit_Age and includes only ages 1..120.

Dashboard metrics: exact calculation steps
------------------------------------------

1) Yearly/Monthly cohort and diagnosis trend
Source:
- ncd_pt_registers (register rows within filters)

Grouping:
- Yearly: period = year(Reg_Date)
- Monthly: period = month(Reg_Date) for selected trend_year

Total cohort:
- Count of register rows per period.

Diagnosis flags:
- Hypertension diagnosis present if 1stHypertension contains New/Known/Know (case-insensitive normalized).
- Diabetes diagnosis present if Diabetes_Diagnose (column: 2nd_Hypertension) contains New/Known/Know.
- Both = both flags true.
- No diagnosis = total cohort - (HTN only + DM only + both).

Gender segmentation:
- Male/Female split for HTN only, DM only, both, and no diagnosis.

2) Cumulative follow-up visits by year
Source:
- ncd_followups.Visit_date + NCD_Diagnosis

Year bucket:
- year(Visit_date)

Diagnosis category per visit:
- hypertension if text contains hypertension/htn
- diabetes if text contains diabetes/dm
- both if both terms appear (or contains "both")
- other for unknown/non-target

Two outputs are produced:
- Yearly raw counts by category
- Cumulative counts by category across years

3) Cumulative follow-up visits by year (gender)
Source:
- follow-up visits + gender remapped from patients table (preferred over followup gender text)

Per year:
- Male/Female counts are split by diagnosis category.
- Chart displays cumulative male/female totals over years.

4) Age distribution
Source:
- ncd_pt_registers.visit_Age

Rules:
- Use floor(age)
- Include only 1 <= age <= 120
- Bin size: 5 years
- Bins: 1-5, 6-10, ..., 116-120

5) LTFU from latest appointment (+84 days)
Source:
- For each Pid, use latest follow-up by Visit_date.

Classification logic:
- If Out_come indicates died/death/dead/tout/transfer out -> Exited
- Else if Next_Appointment missing -> Missing next appointment
- Else cutoff = Next_Appointment + grace_days (default 84)
  - cutoff < observe_date -> LTFU
  - otherwise -> Active

Observe date:
- current time in dashboard calculation (Laravel side), or configured report end in pipeline summaries.

Also provided:
- gender-stacked counts for Active/LTFU/Exited/Missing.

6) Clinic visit plan status by year
Source:
- Consecutive visit pairs per Pid in ncd_followups

For each current visit i:
- Needs Next_Appointment on visit i
- Compare to next visit i+1
- delta_days = next_visit_date - next_appointment_date

Status:
- Unplan (early): delta_days < 0
- Ontime: 0 <= delta_days <= 7
- Late: 8 <= delta_days < 84

Exclusions:
- Next_Appointment year < minYear (default 2020) excluded
- delta_days >= 84 excluded
- Missing Next_Appointment tracked
- No following visit tracked

7) Return to care by year
Source:
- Consecutive Visit_date pairs per Pid

Rule:
- If gap between consecutive visits >= 84 days, later visit counted as return-to-care event.

Output:
- yearly event counts
- summary: total events, unique patients with >=1 event, excluded pairs under 84 days.

8) Control status (BP stage change matrix)
Source:
- Baseline stage from register
- Latest stage from follow-up

Baseline stage extraction priority:
- 3rdBP -> 2ndBP -> 1stBP (first valid BP parse)
- fallback to staging_Hypertension text parse

Latest stage extraction priority:
- numeric sbp/dbp columns if valid
- fallback parse own_clinic_Bp
- fallback own_Bp_Stage text parse

Then for each patient:
- Require valid baseline and valid latest stage.
- Build 4x4 matrix: baseline stage (rows) vs latest stage (columns)
- Direction:
  - Improved if latest rank < baseline rank
  - Unchanged if equal
  - Worsened if latest rank > baseline rank

Data quality tracked:
- baseline missing/invalid
- latest missing/invalid
- top invalid raw examples (baseline and latest)

Current dashboard scope note
----------------------------
The current NCD dashboard page intentionally focuses on the 8 chart groups above.
Legacy/extra metrics (for example BP control %, DM control %, sustained control, quality-of-care panels, and related KPI cards) are not included in this current dashboard layout.

Export files and what they represent
------------------------------------
Per clinic in outputs/{DB}/:
- registers_clean.csv: cleaned register-level dataset
- followups_clean.csv: cleaned follow-up-level dataset
- patient_latest.csv: one latest follow-up record per patient (main denominator table)
- monthly_summary.csv: registrations and follow-ups by month
- bp_control_trend.csv: monthly BP control trend
- dm_control_trend.csv: monthly DM control trend
- continuity_metrics.csv: active/ltfu/missed/late counts + rates
- quality_metrics.csv: lab coverage metrics
- risk_metrics.csv: risk markers and symptoms
- operations_metrics.csv: medication changed summary
- operations_distributions.csv: adherence + drug supply distributions
- other_medications.csv: normalized Foth_medi_spec distribution
- equity_bp_control.csv, equity_dm_control.csv: equity tables
- kpi_summary.csv: top KPI table
- data_quality_report.csv: quality counters
- data_quality/invalid_date_ranges.csv: row-level invalid date list
- doctor/doctor_action_lists.csv: clinical follow-up list

Overall outputs in outputs/overall/:
- Same structure computed on combined clinic data.
- Includes kpi_summary_by_clinic.csv and overall invalid_date_ranges.csv.

Note:
- Some exported CSV files are generated by the pipeline for audit/reference compatibility, even if those metrics are not currently rendered as charts on this dashboard page.

Current key default parameters (config.yaml)
--------------------------------------------
- date_range: 2015-01-01 to 2025-12-31
- data_quality.min_valid_date: 2018-01-01
- data_quality.max_valid_date: 2025-12-31
- active_days: 180
- ltfu_days: 90
- visit plan on-time window: 0-7 days
- visit plan late window: 8-83 days
- return-to-care gap: >=84 days

Audit/troubleshooting locations
-------------------------------
- Background status: storage/app/ncd_analysis_status.json
- Background log: storage/logs/ncd_analysis_background.log
- Last outputs timestamp: derived from latest CSV mtime in outputs/{clinic}/

Notes
-----
- Unknown/NA/null-like values are treated as missing in most calculations.
- Date filtering and trend filters are applied before most metrics.
- If a metric looks inconsistent, first compare:
  - selected clinic
  - timeframe / trend year
  - denominator definition (patient-level vs visit-level).