NCD Analytics Calculation Reference (Detailed) ============================================== Purpose ------- This file documents exactly how the NCD analytics results are calculated in the current Laravel + Python implementation. Use this as a technical reference for medical stakeholders, QA, and data audits. Code paths used --------------- - Python pipeline: ncd_analysis.py - SQL views: metrics_views.sql - Laravel dashboard calculations: app/Http/Controllers/Dashboard1Controller.php - NCD dashboard page: resources/views/ncd_dashboard.blade.php Primary source tables --------------------- 1) ncd_pt_registers - Key columns used: - Pid, FuchiaID, Clinic_code - Reg_Date - visit_Age, Current_Age - Gender - 1stBP, 2ndBP, 3rdBP - 1stHypertension, Diabetes_Diagnose (column: 2nd_Hypertension), staging_Hypertension - 1st_tot_Diabetes, 1st_RBS, 1st_RBS_date - 2nd_tot_Diabetes, 2nd_RBS, 2nd_RBS_date 2) ncd_followups - Key columns used: - Pid, FuchiaID, Clinic_code - Visit_date, Next_Appointment - own_clinic_Bp, own_Bp_Stage - NCD_Diagnosis - FBS, FBS_test_date, Loaction_test - 2HPP, 2HPP_test_date, Loaction_Test2 - HBA1C, Lab_res_Date - RBS result (legacy) - Creatinine, CRCL, Uring_AC_ratio, CVD_Risk - Medication changed, Patient_adhe medic, Drug_Supply - Foth_medi_spec - Out_come, Tout_mam_clinic, Ncd_Tout_icmv_location 3) patients - Key columns used: - Pid - Date of Birth - Gender (used for gender correction on cumulative follow-up chart) High-level processing flow -------------------------- Step 1. Per clinic DB, create/update views: - v_ncd_base_join: registration + patient join - v_ncd_followups_clean: follow-up normalized columns + BP parse - v_ncd_monthly_summary_seed: monthly seed counts Step 2. Read view data into pandas: - registers (from v_ncd_base_join) - followups (from v_ncd_followups_clean) - monthly seed Step 3. Decrypt configured encrypted columns via Laravel helper: - tools/laravel_decryptor.php - Python sends batches to PHP and receives decrypted values. Step 4. Data quality date limits: - config.yaml -> data_quality.min_valid_date / max_valid_date - Dates outside range are flagged. - For core row date columns: - register row dropped if reg_date invalid range - follow-up row dropped if visit_date invalid range - Other invalid date fields are nulled and tracked. Step 5. Build core patient-level dataset: - Latest follow-up per Pid is selected (max visit_date, tie by followup_id desc). - This dataset is saved as patient_latest.csv and is used by many dashboard KPIs. Step 6. Export clinic outputs: - outputs/{DB}/registers_clean.csv - outputs/{DB}/followups_clean.csv - outputs/{DB}/patient_latest.csv - plus trend and summary CSVs (listed below). Step 7. Build combined overall output: - Concatenate all clinic datasets (source_db retained). - Recalculate metrics for outputs/overall. Privacy and ID handling ----------------------- - If privacy.mask_ids=true, stakeholder CSVs hash pid/patient_id/fuchia_id with SHA-256 + id_salt. - Doctor list can keep full IDs using privacy.keep_full_ids_in_doctor_lists=true. - For cumulative follow-up by gender, gender is remapped from patients table; encrypted Gender is decrypted with decrypt_light('General'). Core parsing rules ------------------ BP parsing - Pattern: ^\\s*\\d{2,3}\\s*/\\s*\\d{2,3}\\s*$ - Valid range: - SBP: 50 to 300 - DBP: 30 to 200 - Out-of-range BP is treated as invalid for control/staging. BP stage classification - Stage 3: SBP >= 180 OR DBP >= 110 - Stage 2: SBP >= 160 OR DBP >= 100 - Stage 1: SBP >= 140 OR DBP >= 90 - Normal: otherwise Diabetes control hierarchy - Per row, first available test in this order: 1) HbA1c 2) 2HPP 3) FBS 4) RBS (legacy fallback) - Controlled thresholds (config.yaml): - HbA1c < 7.0 - 2HPP < 180 - FBS < 126 - RBS < 200 Age derivation - Primary: date_of_birth from patients + reference date - Formula: floor((reference_date - dob in days) / 365.2425) - Fallbacks when DOB unavailable: - current_age - visit_age - Dashboard age distribution uses ncd_pt_registers.visit_Age and includes only ages 1..120. Dashboard metrics: exact calculation steps ------------------------------------------ 1) Yearly/Monthly cohort and diagnosis trend Source: - ncd_pt_registers (register rows within filters) Grouping: - Yearly: period = year(Reg_Date) - Monthly: period = month(Reg_Date) for selected trend_year Total cohort: - Count of register rows per period. Diagnosis flags: - Hypertension diagnosis present if 1stHypertension contains New/Known/Know (case-insensitive normalized). - Diabetes diagnosis present if Diabetes_Diagnose (column: 2nd_Hypertension) contains New/Known/Know. - Both = both flags true. - No diagnosis = total cohort - (HTN only + DM only + both). Gender segmentation: - Male/Female split for HTN only, DM only, both, and no diagnosis. 2) Cumulative follow-up visits by year Source: - ncd_followups.Visit_date + NCD_Diagnosis Year bucket: - year(Visit_date) Diagnosis category per visit: - hypertension if text contains hypertension/htn - diabetes if text contains diabetes/dm - both if both terms appear (or contains "both") - other for unknown/non-target Two outputs are produced: - Yearly raw counts by category - Cumulative counts by category across years 3) Cumulative follow-up visits by year (gender) Source: - follow-up visits + gender remapped from patients table (preferred over followup gender text) Per year: - Male/Female counts are split by diagnosis category. - Chart displays cumulative male/female totals over years. 4) Age distribution Source: - ncd_pt_registers.visit_Age Rules: - Use floor(age) - Include only 1 <= age <= 120 - Bin size: 5 years - Bins: 1-5, 6-10, ..., 116-120 5) LTFU from latest appointment (+84 days) Source: - For each Pid, use latest follow-up by Visit_date. Classification logic: - If Out_come indicates died/death/dead/tout/transfer out -> Exited - Else if Next_Appointment missing -> Missing next appointment - Else cutoff = Next_Appointment + grace_days (default 84) - cutoff < observe_date -> LTFU - otherwise -> Active Observe date: - current time in dashboard calculation (Laravel side), or configured report end in pipeline summaries. Also provided: - gender-stacked counts for Active/LTFU/Exited/Missing. 6) Clinic visit plan status by year Source: - Consecutive visit pairs per Pid in ncd_followups For each current visit i: - Needs Next_Appointment on visit i - Compare to next visit i+1 - delta_days = next_visit_date - next_appointment_date Status: - Unplan (early): delta_days < 0 - Ontime: 0 <= delta_days <= 7 - Late: 8 <= delta_days < 84 Exclusions: - Next_Appointment year < minYear (default 2020) excluded - delta_days >= 84 excluded - Missing Next_Appointment tracked - No following visit tracked 7) Return to care by year Source: - Consecutive Visit_date pairs per Pid Rule: - If gap between consecutive visits >= 84 days, later visit counted as return-to-care event. Output: - yearly event counts - summary: total events, unique patients with >=1 event, excluded pairs under 84 days. 8) Control status (BP stage change matrix) Source: - Baseline stage from register - Latest stage from follow-up Baseline stage extraction priority: - 3rdBP -> 2ndBP -> 1stBP (first valid BP parse) - fallback to staging_Hypertension text parse Latest stage extraction priority: - numeric sbp/dbp columns if valid - fallback parse own_clinic_Bp - fallback own_Bp_Stage text parse Then for each patient: - Require valid baseline and valid latest stage. - Build 4x4 matrix: baseline stage (rows) vs latest stage (columns) - Direction: - Improved if latest rank < baseline rank - Unchanged if equal - Worsened if latest rank > baseline rank Data quality tracked: - baseline missing/invalid - latest missing/invalid - top invalid raw examples (baseline and latest) Current dashboard scope note ---------------------------- The current NCD dashboard page intentionally focuses on the 8 chart groups above. Legacy/extra metrics (for example BP control %, DM control %, sustained control, quality-of-care panels, and related KPI cards) are not included in this current dashboard layout. Export files and what they represent ------------------------------------ Per clinic in outputs/{DB}/: - registers_clean.csv: cleaned register-level dataset - followups_clean.csv: cleaned follow-up-level dataset - patient_latest.csv: one latest follow-up record per patient (main denominator table) - monthly_summary.csv: registrations and follow-ups by month - bp_control_trend.csv: monthly BP control trend - dm_control_trend.csv: monthly DM control trend - continuity_metrics.csv: active/ltfu/missed/late counts + rates - quality_metrics.csv: lab coverage metrics - risk_metrics.csv: risk markers and symptoms - operations_metrics.csv: medication changed summary - operations_distributions.csv: adherence + drug supply distributions - other_medications.csv: normalized Foth_medi_spec distribution - equity_bp_control.csv, equity_dm_control.csv: equity tables - kpi_summary.csv: top KPI table - data_quality_report.csv: quality counters - data_quality/invalid_date_ranges.csv: row-level invalid date list - doctor/doctor_action_lists.csv: clinical follow-up list Overall outputs in outputs/overall/: - Same structure computed on combined clinic data. - Includes kpi_summary_by_clinic.csv and overall invalid_date_ranges.csv. Note: - Some exported CSV files are generated by the pipeline for audit/reference compatibility, even if those metrics are not currently rendered as charts on this dashboard page. Current key default parameters (config.yaml) -------------------------------------------- - date_range: 2015-01-01 to 2025-12-31 - data_quality.min_valid_date: 2018-01-01 - data_quality.max_valid_date: 2025-12-31 - active_days: 180 - ltfu_days: 90 - visit plan on-time window: 0-7 days - visit plan late window: 8-83 days - return-to-care gap: >=84 days Audit/troubleshooting locations ------------------------------- - Background status: storage/app/ncd_analysis_status.json - Background log: storage/logs/ncd_analysis_background.log - Last outputs timestamp: derived from latest CSV mtime in outputs/{clinic}/ Notes ----- - Unknown/NA/null-like values are treated as missing in most calculations. - Date filtering and trend filters are applied before most metrics. - If a metric looks inconsistent, first compare: - selected clinic - timeframe / trend year - denominator definition (patient-level vs visit-level).