Biostatistics & Bioinformatics Notes & MCQs

Biostatistics & Bioinformatics

Biostatistics applies statistical principles to biological and medical research. Bioinformatics uses computational tools to analyze biological data. Both are increasingly essential in modern medicine.

Key Biostatistical Concepts

Types of Data: Nominal (categories, no order), Ordinal (ranked categories), Interval (equal intervals, no true zero), Ratio (equal intervals + true zero — most lab values)
Measures of Central Tendency: Mean (average), Median (middle value — better for skewed data), Mode (most frequent)
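Measures of Dispersion: Range (max minus min), Variance (mean of squared deviations from the mean; the sample variance divides by n-1), SD (standard deviation; square root of variance), SE (standard error of the mean; SD/√n), Coefficient of Variation (CV = SD/mean × 100%)
Normal Distribution: Bell-shaped, symmetric about the mean (mean = median = mode); Mean ±1 SD ≈ 68%, ±2 SD ≈ 95% (exactly 95% lies within ±1.96 SD), ±3 SD ≈ 99.7% of values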
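
As a rough illustration of the measures above, the following sketch computes them with Python's standard `statistics` module. The data values and the `describe` helper are hypothetical, chosen only to show how a single high outlier pulls the mean above the median. Note that `statistics.variance` and `statistics.stdev` use the sample (n-1) formulas; `statistics.pvariance` would give the population version.

```python
import math
import statistics

def describe(values):
    """Return common summary statistics for a list of numbers."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (divides by n - 1)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance
        "sd": sd,
        "se": sd / math.sqrt(n),        # standard error of the mean
        "cv_percent": sd / mean * 100,  # coefficient of variation
    }

# Hypothetical lab values with one high outlier (right-skewed).
values = [4.8, 5.1, 5.4, 5.0, 5.2, 5.1, 9.6]
summary = describe(values)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Here the median stays at 5.1 while the mean is pulled upward by the outlier, which is why the median is preferred for skewed data.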

Diagnostic Test Parameters

Sensitivity = TP/(TP+FN): Proportion of people with the disease who test positive. High sensitivity → few false negatives, so a negative result helps rule OUT disease (SnNout). Preferred for screening tests.
Specificity = TN/(TN+FP): Proportion of people without the disease who test negative. High specificity → few false positives, so a positive result helps rule IN disease (SpPin). Preferred for confirmatory tests.
PPV = TP/(TP+FP): Probability that a person with a positive test truly has the disease. Depends on PREVALENCE (↑prevalence → ↑PPV).
NPV = TN/(TN+FN): Probability that a person with a negative test truly does not have the disease. Also depends on prevalence (↑prevalence → ↓NPV).
LR+ = Sensitivity/(1-Specificity): How much a positive test increases the odds of disease (post-test odds = pre-test odds × LR+)
ROC curve: Plot of Sensitivity (true positive rate) vs 1-Specificity (false positive rate) across all cutoffs. AUC (Area Under Curve) summarizes how well the test separates diseased from non-diseased; 0.5 = no better than chance, 1.0 = perfect.
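
The formulas above can be sketched from a 2×2 table as follows. The counts are hypothetical (1,000 people, 10% prevalence), and the helper names `diagnostic_metrics` and `ppv_at_prevalence` are illustrative, not taken from any library. The second function applies Bayes' theorem to show how PPV shifts with prevalence while sensitivity and specificity stay fixed.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute test parameters from 2x2 table counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Undefined (division by zero) when specificity is exactly 1.
        "lr_positive": sensitivity / (1 - specificity),
    }

def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """PPV via Bayes' theorem for a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical counts: 100 diseased, 900 disease-free.
metrics = diagnostic_metrics(tp=90, fp=45, fn=10, tn=855)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Same test (sensitivity 0.90, specificity 0.95) at different prevalences.
for prevalence in (0.01, 0.10, 0.50):
    ppv = ppv_at_prevalence(0.90, 0.95, prevalence)
    print(f"prevalence {prevalence:.2f} -> PPV {ppv:.3f}")
```

With these assumed numbers, the same test yields a much lower PPV in a low-prevalence population, which is why a positive screening result usually needs a confirmatory test.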

Hypothesis Testing

Null hypothesis (H₀): No difference (e.g., treatment has no effect)
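p-value: Probability of observing a result at least as extreme as the one obtained, assuming H₀ is true. By convention, p < 0.05 → reject H₀ (statistically significant)
Type I error (α): Reject H₀ when it is true (false positive). Its probability is capped by the chosen α level (commonly 0.05).
Type II error (β): Fail to reject H₀ when it is false (false negative). Power = 1-β, the probability of detecting an effect that truly exists.
Common tests: t-test (compare 2 means), ANOVA (compare 3 or more means), Chi-squared (association between categorical variables), Mann-Whitney U (non-parametric alternative to the unpaired t-test)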
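
To make the meaning of a p-value concrete, here is a minimal sketch of an exact permutation test for a difference in two means. This is a swapped-in technique, not one of the tests listed above: it is used here only because it computes the p-value directly as the share of relabelings that look at least as extreme as the observed data under H₀. The function name `permutation_p_value` and the two small groups are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test for a difference in means.

    Under H0 the group labels are exchangeable, so every way of
    splitting the pooled data into two groups is equally likely.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    count = 0
    for idx in combinations(range(len(pooled)), n_a):
        a_sum = sum(pooled[i] for i in idx)
        diff = abs(a_sum / n_a - (total - a_sum) / n_b)
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count

# Hypothetical measurements in two small groups.
treated = [12, 14, 15, 16, 18]
control = [8, 9, 10, 11, 13]
p = permutation_p_value(treated, control)
print(f"p = {p:.4f}")
print("reject H0 at alpha = 0.05" if p < 0.05 else "fail to reject H0")
```

Enumerating every split is only practical for small samples; with larger groups a random subset of relabelings, or a parametric test such as the t-test, would normally be used instead.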

Bioinformatics

Sequence Alignment: BLAST (Basic Local Alignment Search Tool) — rapidly aligns query sequence to database; finds homologs. CLUSTAL — multiple sequence alignment.
Databases: GenBank/NCBI (DNA sequences), UniProt/SwissProt (proteins), PDB (3D protein structures), OMIM (Online Mendelian Inheritance in Man — genetic diseases)
Genome Browsers: UCSC, Ensembl — visualize genome, annotations, variants
Variant Annotation: ClinVar, dbSNP, COSMIC (cancer somatic mutations)
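Protein Structure: AlphaFold2 (DeepMind), a deep-learning model that predicts 3D protein structure from amino acid sequence, for many proteins with accuracy approaching that of experimental methods. It has substantially changed how structural biology is done.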
Pathway Analysis: KEGG, Reactome — map gene sets to biological pathways; used in RNA-seq data interpretation

Epidemiology Measures

Incidence: New cases arising in a population at risk over a defined time period
Prevalence: Existing cases per population at a time point
RR (Relative Risk): Risk in exposed / Risk in unexposed (cohort study)
OR (Odds Ratio): Odds of exposure in cases / Odds of exposure in controls (case-control study; approximates RR when the disease is rare)
NNT (Number Needed to Treat): 1/ARR, where ARR (Absolute Risk Reduction) = risk in control group minus risk in treated group
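
The ratios above can be worked through with a short sketch. All counts and risks below are hypothetical, and the function names are illustrative only. The last comparison shows the OR drifting toward the RR as the disease becomes rare.

```python
def relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """RR = risk in exposed / risk in unexposed (cohort data)."""
    return (events_exposed / n_exposed) / (events_unexposed / n_unexposed)

def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """OR = odds of exposure in cases / odds of exposure in controls."""
    return (exposed_cases / unexposed_cases) / (exposed_controls / unexposed_controls)

def number_needed_to_treat(control_risk, treated_risk):
    """NNT = 1 / ARR, with ARR = control risk - treated risk."""
    arr = control_risk - treated_risk
    return 1 / arr

# Common outcome: 30/100 exposed vs 10/100 unexposed develop disease.
rr_common = relative_risk(30, 100, 10, 100)
or_common = odds_ratio(30, 10, 70, 90)
print(f"common disease: RR = {rr_common:.2f}, OR = {or_common:.2f}")

# Rare outcome: 3/1000 exposed vs 1/1000 unexposed develop disease.
rr_rare = relative_risk(3, 1000, 1, 1000)
or_rare = odds_ratio(3, 1, 997, 999)
print(f"rare disease:   RR = {rr_rare:.2f}, OR = {or_rare:.2f}")

# Treatment lowers event risk from 20% to 15%.
nnt = number_needed_to_treat(0.20, 0.15)
print(f"NNT = {nnt:.1f}")
```

In practice NNT is reported as a whole number, conventionally rounded up.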
