VPS Lakeshore CLINICAL DATA COMMONS
VPS Lakeshore Hospital · Nettoor, Kochi
OMOP CDM v5.4 · Pilot 2020–25
Clinical Research Data Commons · Pilot v0.1

A living repository of Kerala's clinical reality

De-identified, OMOP-standardised research data across six clinical domains — structured for real-world evidence and pharma research partnerships.

47,840 patients · Total cohort
6 domains · Clinical specialties
2.4M records · Observations
OMOP v5.4 · Data standard
87% avg · Completeness
🎗️
PILOT LIVE
Oncology
Breast, oral/H&N, cervical, haemato-oncology. mCODE-aligned with OMOP extension.
Patients: 4,560 · Fields: 52 · Types: 4
❤️
IN CURATION
Cardiology
CAD, heart failure, arrhythmias. Echo parameters, cath lab, device therapy, outcomes.
Est. patients: ~8,200 · Fields: 68 · Target: Q3 2026
🫘
PLANNED
Nephrology & Transplant
CKD, renal replacement, organ transplantation. Graft survival, immunosuppression.
Est. patients: ~3,800 · Fields: 58 · Target: Q4 2026
🧠
PLANNED
Neurology & Neurosurgery
Stroke, epilepsy, neurosurgical outcomes. mRS scores, imaging findings, treatment.
Est. patients: ~5,100 · Fields: 44 · Target: Q1 2027
🩺
PLANNED
Endocrinology & Diabetes
T1/T2 diabetes, thyroid, obesity. HbA1c trajectories, diabetes-cancer comorbidity.
Est. patients: ~14,200 · Fields: 40 · Target: Q2 2027
🏥
PLANNED
Critical Care / ICU
43 ICU beds. APACHE/SOFA scores, ventilation, time-series vitals. MIMIC-compatible.
Est. patients: ~11,980 · Fields: 120+ · Target: Q3 2027
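The headline cohort of 47,840 patients equals the sum of the six domain figures on the cards above (one curated count for Oncology plus five estimates). A minimal Python check of that arithmetic follows; the name `domain_patients` is chosen here for illustration only, and the page does not state whether patients seen in more than one domain are counted once.

```python
# Patient counts per domain as shown on the domain cards.
# Only Oncology is a curated count; the other five are estimates ("~").
domain_patients = {
    "Oncology": 4_560,
    "Cardiology": 8_200,
    "Nephrology & Transplant": 3_800,
    "Neurology & Neurosurgery": 5_100,
    "Endocrinology & Diabetes": 14_200,
    "Critical Care / ICU": 11_980,
}

# Simple sum; no de-duplication of multi-domain patients is attempted.
total = sum(domain_patients.values())
print(f"{total:,} patients across {len(domain_patients)} domains")
```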
Most Prevalent Multi-Domain Patient Pairs
Patients appearing across two or more clinical domains — unique to a multi-disciplinary commons.
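As a rough sketch of how multi-domain pairs could be tallied, the snippet below counts patients for every unordered pair of domains they appear in. The patient IDs, the short domain labels, and the `patient_domains` structure are hypothetical toy inputs, not commons data or its actual schema.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy input: patient ID -> set of domains the patient appears in.
patient_domains = {
    "P001": {"Oncology", "Endocrinology"},
    "P002": {"Cardiology", "Endocrinology", "Nephrology"},
    "P003": {"Oncology"},
    "P004": {"Endocrinology", "Oncology"},
}

def domain_pair_counts(patient_domains):
    """Count patients for every unordered pair of co-occurring domains."""
    counts = Counter()
    for domains in patient_domains.values():
        # sorted() gives each pair a canonical order, so (A, B) == (B, A).
        for pair in combinations(sorted(domains), 2):
            counts[pair] += 1
    return counts

pair_counts = domain_pair_counts(patient_domains)
for pair, n in pair_counts.most_common():
    print(pair, n)
```

Patients in a single domain (P003 here) contribute no pairs, and a patient in three domains contributes three.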
4,560 patients in cohort · 4 cancer types · Dx years: 2020–25 · Completeness: 87.4% · Standard: mCODE v3
Summary
Cases
Survival
Genomics
Data Dictionary
Breast Cancer
1,842 patients (40.4% of cohort) · Median age 48
ER/PR+ 68% · HER2+ 22% · Triple Neg 14%
Oral / H&N
1,376 patients (30.2% of cohort) · Median age 54
Tobacco 74% · Areca nut 61% · HPV+ 18%
Cervical
614 patients (13.5% of cohort) · Median age 44
FIGO I–II 42% · FIGO III–IV 58% · Squamous 82%
Haemato-oncology
728 patients (16.0% of cohort) · Median age 42
Lymphoma 38% · Leukaemia 29% · Myeloma 18%
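The percentage shown for each cancer type is its share of the 4,560-patient oncology cohort. A short Python sketch reproducing those shares from the counts above (variable names are illustrative):

```python
# Case counts per cancer type, as listed on the summary cards.
cases = {
    "Breast": 1_842,
    "Oral / H&N": 1_376,
    "Cervical": 614,
    "Haemato-oncology": 728,
}

cohort = sum(cases.values())

# Share of the oncology cohort, rounded to one decimal place.
shares = {name: round(100 * n / cohort, 1) for name, n in cases.items()}

for name, pct in shares.items():
    print(f"{name}: {cases[name]:,} ({pct}%)")
```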
Stage Distribution by Cancer Type
% at each TNM/FIGO stage at diagnosis
Age at Diagnosis
5-year bins · all cancer types
Treatment Modality
Primary treatment by cancer type
Annual Case Volume
New diagnoses 2020–2025
Patient Records — de-identified · DPDP 2023
Case ID | Type | Age | Stage | Treatment | Biomarker | Year | Vital Status | Co-domain | FU (mo)
Kaplan–Meier Overall Survival by Cancer Type
All four types · From date of diagnosis · Synthetic illustrative data · n = 4,560
5-Year OS by Stage
All cancer types combined
Breast: OS by Receptor Subtype
n = 1,842
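For readers unfamiliar with the method behind these curves, here is a minimal Kaplan–Meier sketch in plain Python. It assumes follow-up in months (as in `last_contact_mo`) and treats only "Deceased" as an event, with "Alive" and "Lost to FU" censored; that mapping, the function name, and the toy numbers are assumptions for illustration and not the commons' actual pipeline or data.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each time with >= 1 death.

    durations: follow-up time per patient, in months
    events:    True if the patient died at that time, False if censored
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: S(t) = S(t-) * (1 - d / n)
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        n_at_risk -= removed
    return curve

# Toy example (not commons data).
durations = [6, 12, 12, 24, 36, 48]
events = [True, False, True, True, False, False]
curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"{t:>3} mo  S(t) = {s:.3f}")
```

Censored patients reduce the number at risk without stepping the curve down, which is why the estimate only changes at months 6, 12, and 24 in this toy input.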
Biomarker Availability by Cancer Type
% of cases with molecular data
EGFR / KRAS Prevalence — Indian vs Western Reference
Oral/H&N cohort · % with mutation
Genomic Fields Available
Field | Type | Coverage | Cancer Types | Method
er_status | Categorical | 94% | Breast | IHC H-score
her2_status | Categorical | 91% | Breast | IHC / FISH
hpv_status | Categorical | 78% | Oral, Cervical | PCR / p16 IHC
ki67_pct | Numeric | 62% | Breast | IHC %
cytogenetics | Text / coded | 71% | Haem | FISH / karyotype
ngs_panel | JSON | 18% | All | ICD-O-3 / HGVS
mCODE-aligned Data Dictionary
6 domains · 52 fields · FHIR R4 · OMOP v5.4 compatible
👤
Patient
8 fields
case_id | De-identified unique patient identifier | string
age_at_dx | Age at diagnosis in years | integer
sex | Biological sex (Male / Female / Unknown) | enum
district | Kerala district of residence | coded
tobacco_exposure | Tobacco and areca nut use history | enum
ecog_ps | ECOG performance status at diagnosis (0–4) | 0–4
🔬
Disease
12 fields
primary_site | ICD-O-3 topography code | coded
histology | ICD-O-3 morphology and description | coded
tnm_stage | AJCC 8th edition clinical TNM stage | enum
figo_stage | FIGO stage (cervical / gynaecological) | enum
m_category | Distant metastasis (M0 / M1) | enum
dx_date | Date of pathological diagnosis (date-shifted) | date
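The dictionary marks `dx_date` as date-shifted but does not describe the method. One common approach, sketched below purely as an assumption, derives a fixed per-patient offset from a keyed hash of `case_id`, so every date for the same patient moves by the same number of days and intervals are preserved. The window `MAX_SHIFT_DAYS`, the secret, and both function names are hypothetical.

```python
import hashlib
import hmac
from datetime import date, timedelta

MAX_SHIFT_DAYS = 180  # assumed window; the page does not state one

def shift_days(case_id: str, secret: bytes) -> int:
    """Deterministic per-patient offset in [-MAX_SHIFT_DAYS, +MAX_SHIFT_DAYS]."""
    digest = hmac.new(secret, case_id.encode("utf-8"), hashlib.sha256).digest()
    span = 2 * MAX_SHIFT_DAYS + 1
    return int.from_bytes(digest[:8], "big") % span - MAX_SHIFT_DAYS

def shift_date(d: date, case_id: str, secret: bytes) -> date:
    """Apply the patient's fixed offset to one of their dates."""
    return d + timedelta(days=shift_days(case_id, secret))

# Toy example with a made-up case ID and a demo-only secret.
secret = b"demo-only-secret"
dx = date(2022, 3, 14)
last_contact = date(2023, 9, 1)
shifted_dx = shift_date(dx, "DEMO-0001", secret)
shifted_last = shift_date(last_contact, "DEMO-0001", secret)

# Intervals between a patient's own events survive the shift.
print((shifted_last - shifted_dx).days == (last_contact - dx).days)
```

Because the offset depends only on the case ID and the secret, derived fields such as follow-up duration stay valid after shifting.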
💊
Treatment
14 fields
primary_tx | Primary treatment modality | enum
surgery_type | Surgical procedure (if applicable) | coded
chemo_regimen | Chemotherapy protocol and drugs | coded
rt_technique | Radiation technique (IMRT / 3DCRT / SBRT) | enum
rt_dose_gy | Total radiation dose in Gray | decimal
tx_completion | Planned treatment completion status | enum
🧬
Genomics
10 fields
er_status | Oestrogen receptor status (Breast) | enum
her2_status | HER2/neu amplification status | enum
hpv_status | HPV status (Oral / Cervical) | enum
ki67_pct | Ki-67 proliferation index (%) | decimal
cytogenetics | FISH results (haematological) | coded
ngs_mutations | NGS panel somatic mutations (HGVS) | JSON
📊
Outcome
6 fields
vital_status | Alive / Deceased / Lost to FU | enum
last_contact_mo | Follow-up duration from dx (months) | decimal
recurrence | Disease recurrence event | boolean
recurrence_type | Local / Regional / Distant | enum
response_assess | RECIST response at end of primary treatment | enum
cause_of_death | Cancer / other / unknown | enum
🏥
Provenance
4 fields
source_site | Contributing hospital site | string
abstraction_date | Date record was curated | date
abstractor_qc | QC status (passed / flagged / pending) | enum
irb_protocol | IEC approval reference number | string
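To show how a consumer might load records against this dictionary, here is a small typed-record sketch covering a handful of the listed fields. The field names, enum values, and the 0–4 ECOG range come from the dictionary above; the class names, `parse_case`, the non-negativity checks, and the sample case ID are hypothetical and not an official client.

```python
from dataclasses import dataclass
from enum import Enum

class Sex(Enum):
    MALE = "Male"
    FEMALE = "Female"
    UNKNOWN = "Unknown"

class VitalStatus(Enum):
    ALIVE = "Alive"
    DECEASED = "Deceased"
    LOST_TO_FU = "Lost to FU"

@dataclass(frozen=True)
class CaseRecord:
    case_id: str
    age_at_dx: int
    sex: Sex
    ecog_ps: int
    vital_status: VitalStatus
    last_contact_mo: float

    def __post_init__(self):
        # Range check taken from the dictionary: ECOG is 0-4.
        if not 0 <= self.ecog_ps <= 4:
            raise ValueError(f"ecog_ps out of range: {self.ecog_ps}")
        # Assumed sanity checks (not stated in the dictionary).
        if self.age_at_dx < 0 or self.last_contact_mo < 0:
            raise ValueError("age_at_dx and last_contact_mo must be >= 0")

def parse_case(row: dict) -> CaseRecord:
    """Convert a raw row of strings/numbers into a validated CaseRecord."""
    return CaseRecord(
        case_id=str(row["case_id"]),
        age_at_dx=int(row["age_at_dx"]),
        sex=Sex(row["sex"]),  # raises ValueError on unknown labels
        ecog_ps=int(row["ecog_ps"]),
        vital_status=VitalStatus(row["vital_status"]),
        last_contact_mo=float(row["last_contact_mo"]),
    )

# Made-up row for illustration only.
record = parse_case({
    "case_id": "DEMO-0001", "age_at_dx": "48", "sex": "Female",
    "ecog_ps": 1, "vital_status": "Alive", "last_contact_mo": "26.5",
})
print(record)
```

Rejecting unknown enum labels at load time keeps free-text variants out of downstream analyses.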

Built on the OMOP Common Data Model v5.4 — used by the FDA, EMA, and 300+ hospitals globally. Disease-specific extensions (mCODE for oncology, MIMIC-compatible time-series for ICU) plug into the OMOP backbone.

🏥
Source Systems — Hospital HIS / EMR
Physician notes, lab results, radiology, pharmacy, OT notes. Structured and unstructured.
MocDoc HIS · PACS/RIS · Lab LIS
↓ ETL · NLP extraction · De-identification · ABHA linkage
🔄
OMOP CDM v5.4 — Standardised Patient Spine
Universal person, visit, condition, drug, measurement, observation tables. Enables cross-domain queries across all departments.
OMOP v5.4 · SNOMED-CT · ICD-10 · RxNorm · LOINC
↓ Domain-specific extensions mounted on OMOP backbone
🎗️
Oncology — mCODE FHIR profiles
Cancer staging, tumour histology, biomarkers, treatment protocols, real-world progression events.
mCODE v3 · ICD-O-3 · AJCC 8th
❤️
Cardiology — ACC/AHA FHIR profiles
Echo parameters (LVEF), cath lab findings, device therapy, NYHA/CCS classification.
ACC/AHA · SNOMED · DICOM SR
🏥
ICU — MIMIC-IV–compatible time-series
Hourly vitals, ventilator parameters, APACHE/SOFA scores, vasopressor timelines.
MIMIC-IV · HL7 FHIR
↓ Access control layer
🔒
Access Control & Governance
IEC-approved tiers: open aggregate, registered de-identified, controlled linked/genomic. DPDP Act 2023 compliant.
DPDP 2023 · IEC Approved · RBAC
🟢
Open Access
Aggregate statistics, domain summaries, data dictionaries, completeness reports. No registration required.
AVAILABLE NOW
🔵
Registered Access
De-identified patient-level records. Requires institutional affiliation, data use agreement, and IEC protocol reference.
Q3 2026
🔴
Controlled Access
Linked genomic and longitudinal data. Pharma RWE partnerships. Requires full data governance review and commercial agreement.
Q1 2027
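The three tiers above can be read as a simple eligibility rule. The sketch below encodes the stated requirements in Python; it assumes that Controlled access builds on the Registered requirements (the page does not say so explicitly), and every class, field, and function name is hypothetical rather than part of any real commons API.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional

class Tier(IntEnum):
    OPEN = 0        # aggregate statistics, dictionaries, completeness reports
    REGISTERED = 1  # de-identified patient-level records
    CONTROLLED = 2  # linked genomic and longitudinal data

@dataclass
class Requester:
    institutional_affiliation: bool = False
    data_use_agreement: bool = False
    iec_protocol_ref: Optional[str] = None
    governance_review_passed: bool = False
    commercial_agreement: bool = False

def highest_tier(r: Requester) -> Tier:
    """Highest tier a requester qualifies for under the stated requirements."""
    registered = (
        r.institutional_affiliation
        and r.data_use_agreement
        and bool(r.iec_protocol_ref)
    )
    if not registered:
        return Tier.OPEN
    # Assumption: Controlled requires everything Registered does, plus these.
    if r.governance_review_passed and r.commercial_agreement:
        return Tier.CONTROLLED
    return Tier.REGISTERED

def can_access(r: Requester, resource_tier: Tier) -> bool:
    return highest_tier(r) >= resource_tier

# Illustrative requesters; the protocol reference is a made-up placeholder.
anonymous = Requester()
academic = Requester(True, True, "IEC-PLACEHOLDER")
print(can_access(anonymous, Tier.OPEN), can_access(academic, Tier.CONTROLLED))
```

Using an ordered enum means a requester cleared for a higher tier automatically passes checks for the lower ones.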
Research Partner Pipeline
Institution Type | Use Case | Domain | Access Level | Status
Academic Medical Centre | Retrospective outcomes: Kerala cancer patterns | Oncology | Registered | Active
Pharmaceutical Company | Indian RWE for label expansion (CDSCO) | Oncology | Controlled | In Review
Medical Device Company | Post-market surveillance: cardiac devices | Cardiology | Registered | Pending Q3
Public Health Institute | Diabetes-cancer comorbidity, Kerala | Endo, Oncology | Registered | Pending Q4
AI / Health-tech Company | Diagnostic AI: South Indian population | Multi-domain | Controlled | Pending 2027