NCD Analytics Calculation Reference (Detailed) ============================================== Purpose ------- This file documents exactly how the NCD analytics results are calculated in the current Laravel + Python implementation. Use this as a technical reference for medical stakeholders, QA, and data audits. Code paths used --------------- - Python pipeline: ncd_analysis.py - SQL views: metrics_views.sql - Laravel dashboard calculations: app/Http/Controllers/Dashboard1Controller.php - NCD dashboard page: resources/views/ncd_dashboard.blade.php Primary source tables --------------------- 1) ncd_pt_registers - Key columns used: - Pid, FuchiaID, Clinic_code - Reg_Date - visit_Age, Current_Age - Gender - 1stBP, 2ndBP, 3rdBP - 1stHypertension, 2nd_Hypertension, staging_Hypertension - 1st_tot_Diabetes, 1st_RBS, 1st_RBS_date - 2nd_tot_Diabetes, 2nd_RBS, 2nd_RBS_date 2) ncd_followups - Key columns used: - Pid, FuchiaID, Clinic_code - Visit_date, Next_Appointment - own_clinic_Bp, own_Bp_Stage - NCD_Diagnosis - FBS, FBS_test_date, Loaction_test - 2HPP, 2HPP_test_date, Loaction_Test2 - HBA1C, Lab_res_Date - RBS result (legacy) - Creatinine, CRCL, Uring_AC_ratio, CVD_Risk - Medication changed, Patient_adhe medic, Drug_Supply - Foth_medi_spec - Out_come, Tout_mam_clinic, Ncd_Tout_icmv_location 3) patients - Key columns used: - Pid - Date of Birth - Gender (used for gender correction on cumulative follow-up chart) High-level processing flow -------------------------- Step 1. Per clinic DB, create/update views: - v_ncd_base_join: registration + patient join - v_ncd_followups_clean: follow-up normalized columns + BP parse - v_ncd_monthly_summary_seed: monthly seed counts Step 2. Read view data into pandas: - registers (from v_ncd_base_join) - followups (from v_ncd_followups_clean) - monthly seed Step 3. Decrypt configured encrypted columns via Laravel helper: - tools/laravel_decryptor.php - Python sends batches to PHP and receives decrypted values. Step 4. Data quality date limits: - config.yaml -> data_quality.min_valid_date / max_valid_date - Dates outside range are flagged. - For core row date columns: - register row dropped if reg_date invalid range - follow-up row dropped if visit_date invalid range - Other invalid date fields are nulled and tracked. Step 5. Build core patient-level dataset: - Latest follow-up per Pid is selected (max visit_date, tie by followup_id desc). - This dataset is saved as patient_latest.csv and is used by many dashboard KPIs. Step 6. Export clinic outputs: - outputs/{DB}/registers_clean.csv - outputs/{DB}/followups_clean.csv - outputs/{DB}/patient_latest.csv - plus trend and summary CSVs (listed below). Step 7. Build combined overall output: - Concatenate all clinic datasets (source_db retained). - Recalculate metrics for outputs/overall. Privacy and ID handling ----------------------- - If privacy.mask_ids=true, stakeholder CSVs hash pid/patient_id/fuchia_id with SHA-256 + id_salt. - Doctor list can keep full IDs using privacy.keep_full_ids_in_doctor_lists=true. - For cumulative follow-up by gender, gender is remapped from patients table; encrypted Gender is decrypted with decrypt_light('General'). Core parsing rules ------------------ BP parsing - Pattern: ^\\s*\\d{2,3}\\s*/\\s*\\d{2,3}\\s*$ - Valid range: - SBP: 50 to 300 - DBP: 30 to 200 - Out-of-range BP is treated as invalid for control/staging. BP stage classification - Stage 3: SBP >= 180 OR DBP >= 110 - Stage 2: SBP >= 160 OR DBP >= 100 - Stage 1: SBP >= 140 OR DBP >= 90 - Normal: otherwise Diabetes control hierarchy - Per row, first available test in this order: 1) HbA1c 2) 2HPP 3) FBS 4) RBS (legacy fallback) - Controlled thresholds (config.yaml): - HbA1c < 7.0 - 2HPP < 180 - FBS < 126 - RBS < 200 Age derivation - Primary: date_of_birth from patients + reference date - Formula: floor((reference_date - dob in days) / 365.2425) - Fallbacks when DOB unavailable: - current_age - visit_age - Dashboard age distribution uses ncd_pt_registers.visit_Age and includes only ages 1..120. Dashboard metrics: exact calculation steps ------------------------------------------ 1) Yearly/Monthly cohort and diagnosis trend Source: - ncd_pt_registers (register rows within filters) Grouping: - Yearly: period = year(Reg_Date) - Monthly: period = month(Reg_Date) for selected trend_year Total cohort: - Count of register rows per period. Diagnosis flags: - Hypertension diagnosis present if 1stHypertension contains New/Known/Know (case-insensitive normalized). - Diabetes diagnosis present if 2nd_Hypertension contains New/Known/Know. - Both = both flags true. - No diagnosis = total cohort - (HTN only + DM only + both). Gender segmentation: - Male/Female split for HTN only, DM only, both, and no diagnosis. 2) Cumulative follow-up visits by year Source: - ncd_followups.Visit_date + NCD_Diagnosis Year bucket: - year(Visit_date) Diagnosis category per visit: - hypertension if text contains hypertension/htn - diabetes if text contains diabetes/dm - both if both terms appear (or contains "both") - other for unknown/non-target Two outputs are produced: - Yearly raw counts by category - Cumulative counts by category across years 3) Cumulative follow-up visits by year (gender) Source: - follow-up visits + gender remapped from patients table (preferred over followup gender text) Per year: - Male/Female counts are split by diagnosis category. - Chart displays cumulative male/female totals over years. 4) Age distribution Source: - ncd_pt_registers.visit_Age Rules: - Use floor(age) - Include only 1 <= age <= 120 - Bin size: 5 years - Bins: 1-5, 6-10, ..., 116-120 5) LTFU from latest appointment (+84 days) Source: - For each Pid, use latest follow-up by Visit_date. Classification logic: - If Out_come indicates died/death/dead/tout/transfer out -> Exited - Else if Next_Appointment missing -> Missing next appointment - Else cutoff = Next_Appointment + grace_days (default 84) - cutoff < observe_date -> LTFU - otherwise -> Active Observe date: - current time in dashboard calculation (Laravel side), or configured report end in pipeline summaries. Also provided: - gender-stacked counts for Active/LTFU/Exited/Missing. 6) Clinic visit plan status by year Source: - Consecutive visit pairs per Pid in ncd_followups For each current visit i: - Needs Next_Appointment on visit i - Compare to next visit i+1 - delta_days = next_visit_date - next_appointment_date Status: - Unplan (early): delta_days < 0 - Ontime: 0 <= delta_days <= 7 - Late: 8 <= delta_days < 84 Exclusions: - Next_Appointment year < minYear (default 2020) excluded - delta_days >= 84 excluded - Missing Next_Appointment tracked - No following visit tracked 7) Return to care by year Source: - Consecutive Visit_date pairs per Pid Rule: - If gap between consecutive visits >= 84 days, later visit counted as return-to-care event. Output: - yearly event counts - summary: total events, unique patients with >=1 event, excluded pairs under 84 days. 8) Control status (BP stage change matrix) Source: - Baseline stage from register - Latest stage from follow-up Baseline stage extraction priority: - 3rdBP -> 2ndBP -> 1stBP (first valid BP parse) - fallback to staging_Hypertension text parse Latest stage extraction priority: - numeric sbp/dbp columns if valid - fallback parse own_clinic_Bp - fallback own_Bp_Stage text parse Then for each patient: - Require valid baseline and valid latest stage. - Build 4x4 matrix: baseline stage (rows) vs latest stage (columns) - Direction: - Improved if latest rank < baseline rank - Unchanged if equal - Worsened if latest rank > baseline rank Data quality tracked: - baseline missing/invalid - latest missing/invalid - top invalid raw examples (baseline and latest) 9) BP control % Patient-level metric from patient_latest.csv: - Denominator: patients with valid latest SBP and DBP (bp_with_values=true) - Numerator: denominator patients with SBP < 140 AND DBP < 90 (bp_controlled=true) - Rate = numerator / denominator * 100 Exclusion groups are tracked: - missing BP - invalid format - out-of-range - other 10) DM control % Patient-level metric from patient_latest.csv: - Denominator: patients with at least one usable DM test (dm_with_values=true) - Numerator: denominator patients controlled by hierarchy (dm_controlled=true) - Rate = numerator / denominator * 100 Hierarchy and source fields: - HbA1c -> HBA1C/Lab_res_Date - 2HPP -> 2HPP/2HPP_test_date/Loaction_Test2 - FBS -> FBS/FBS_test_date/Loaction_test - RBS fallback -> RBS result (legacy) 11) Active caseload and LTFU (patient-level flags) Reference date: - report_end (selected end date or config/date bounds fallback) Active patient: - latest visit_date >= report_end - active_days LTFU: - latest visit_date < report_end - ltfu_days (or missing visit_date) Current defaults: - active_days = 180 - ltfu_days = 90 12) Missed appointment (visit-level and patient-level forms) Visit-level continuity (displayed % in continuity section): - Only active patients included. - For each follow-up with Next_Appointment: - if no next visit exists -> missed - else if next visit date != Next_Appointment date -> missed - Missed appointment rate = missed visits / total follow-up visits considered Patient-level boolean (used in some summaries): - true if patient has at least one missed appointment event using same rule. 13) Quality-of-care coverage Source: - follow-up rows, grouped by patient Lookback windows from report_end: - HbA1c: last 6 months - Kidney (Creatinine/CRCL/Uring_AC_ratio): last 12 months - CVD risk: last 12 months For each patient: - metric is true if any visit within lookback has non-missing value in that field. Coverage rate: - covered patients / total patients with follow-up records. 14) Risk stratification From patient latest follow-up: - High CVD risk: cvd_risk >= cvd_risk_high (default 20) or text contains "high" - CKD marker: - creatinine > creatinine_high (default 1.3), OR - crcl < crcl_low (default 60), OR - uring_ac_ratio >= uring_ac_ratio_high (default 30) - Diabetic foot / neuropathy / hypoglycemia flags: - yes/y/true/1/late/on treated as positive 15) Operations - Medication changed rate: - numerator: latest rows where medication_changed is truthy - denominator: all latest rows - Adherence and Drug supply distributions: - category counts from patient_adherence and drug_supply values 16) Other medications (Foth_medi_spec) Source: - ncd_followups.Foth_medi_spec Normalization: - remove dose/schedule/time tokens (mg, od/bd/tid, wk/weeks, etc.) - tokenize and map aliases from config.yaml (e.g., ibuprofen -> ibu, multivitamin -> mv) - count mentions and share_of_mentions. 17) Equity summaries Grouped by: - sex (gender) - age band For each group: - patients count - BP control rate - DM control rate - LTFU rate 18) Control improvement Baseline status: - earliest register per patient - BP baseline from first valid of 1st/2nd/3rd BP - DM baseline from first_dm_* then second_dm_* Eligible for improvement: - baseline uncontrolled AND has latest valid measurement Rate: - improved / eligible * 100 19) Sustained control For each patient and condition: - need at least 2 valid follow-up measurements - sustained if last two are both controlled Rate: - sustained / eligible * 100 Export files and what they represent ------------------------------------ Per clinic in outputs/{DB}/: - registers_clean.csv: cleaned register-level dataset - followups_clean.csv: cleaned follow-up-level dataset - patient_latest.csv: one latest follow-up record per patient (main denominator table) - monthly_summary.csv: registrations and follow-ups by month - bp_control_trend.csv: monthly BP control trend - dm_control_trend.csv: monthly DM control trend - continuity_metrics.csv: active/ltfu/missed/late counts + rates - quality_metrics.csv: lab coverage metrics - risk_metrics.csv: risk markers and symptoms - operations_metrics.csv: medication changed summary - operations_distributions.csv: adherence + drug supply distributions - other_medications.csv: normalized Foth_medi_spec distribution - equity_bp_control.csv, equity_dm_control.csv: equity tables - kpi_summary.csv: top KPI table - data_quality_report.csv: quality counters - data_quality/invalid_date_ranges.csv: row-level invalid date list - doctor/doctor_action_lists.csv: clinical follow-up list Overall outputs in outputs/overall/: - Same structure computed on combined clinic data. - Includes kpi_summary_by_clinic.csv and overall invalid_date_ranges.csv. Important denominator notes --------------------------- 1) "Patients" in KPI cards: - count of rows in patient_latest (latest-follow-up cohort), not all raw attendance rows. 2) BP/DM control denominator: - not all patients; only those with valid measurements for that metric. 3) Visit-level missed appointment percentage: - denominator is follow-up visits assessed among active patients. - this can differ from patient-level missed count. Current key default parameters (config.yaml) -------------------------------------------- - date_range: 2015-01-01 to 2025-12-31 - data_quality.min_valid_date: 2018-01-01 - data_quality.max_valid_date: 2025-12-31 - active_days: 180 - ltfu_days: 90 - BP control: <140/<90 - DM thresholds: HbA1c<7, 2HPP<180, FBS<126, RBS<200 Audit/troubleshooting locations ------------------------------- - Background status: storage/app/ncd_analysis_status.json - Background log: storage/logs/ncd_analysis_background.log - Last outputs timestamp: derived from latest CSV mtime in outputs/{clinic}/ Notes ----- - Unknown/NA/null-like values are treated as missing in most calculations. - Date filtering and trend filters are applied before most metrics. - If a metric looks inconsistent, first compare: - selected clinic - timeframe / trend year - denominator definition (patient-level vs visit-level).