Project: Big Data Analysis & Survival Analysis
Student: 马硕峰 | ID: 12313113
Part 2: Survival Analysis (Core Showcase)
1. Introduction
This project applies survival analysis to a telecom customer churn dataset. It uses Kaplan-Meier estimation, a Cox proportional hazards model, and an accelerated failure time (AFT) model to identify the key drivers of churn and to build a Customer Lifetime Value (CLV) evaluation tool.
2. Data & Cohort
Dataset: IBM telecom churn data (7,043 customers)
Cohort: 3,351 customers with month-to-month contracts and internet services.
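The cohort filter can be sketched roughly as below. This is a minimal illustration on a hand-made sample, not the project's actual code; the column names (Contract, InternetService, tenure, Churn) and their values are assumed to follow the public IBM Telco churn CSV.

```python
# Hypothetical four-row sample; column names and values are assumed to
# match the public IBM Telco churn CSV, not taken from the project code.
customers = [
    {"customerID": "A", "Contract": "Month-to-month", "InternetService": "Fiber optic", "tenure": 5, "Churn": "Yes"},
    {"customerID": "B", "Contract": "Two year", "InternetService": "DSL", "tenure": 60, "Churn": "No"},
    {"customerID": "C", "Contract": "Month-to-month", "InternetService": "No", "tenure": 12, "Churn": "No"},
    {"customerID": "D", "Contract": "Month-to-month", "InternetService": "DSL", "tenure": 20, "Churn": "No"},
]


def select_cohort(rows):
    """Keep month-to-month customers who subscribe to an internet service."""
    return [
        r for r in rows
        if r["Contract"] == "Month-to-month" and r["InternetService"] != "No"
    ]


def to_survival_pairs(rows):
    """Map each row to (duration in months, churn observed?)."""
    return [(r["tenure"], r["Churn"] == "Yes") for r in rows]


cohort = select_cohort(customers)
pairs = to_survival_pairs(cohort)
```

Customers who have not churned are treated as right-censored: their tenure is a lower bound on their lifetime, which is why the event flag is kept next to the duration.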
3. Kaplan-Meier Survival Analysis
- Median customer survival time: ~34 months
- OnlineSecurity, TechSupport, and InternetService show strong separation in survival curves
- Log-rank tests confirm significant differences between groups
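For reference, the product-limit (Kaplan-Meier) estimate and a two-group log-rank statistic can be written in plain Python as in the sketch below. The toy durations are invented for illustration and do not reproduce the ~34-month median; the project itself may well have used a survival-analysis library instead.

```python
import math


def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimator)."""
    event_times = sorted({t for t, e in zip(durations, events) if e})
    survival, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        churned = sum(1 for d, e in zip(durations, events) if d == t and e)
        survival *= 1.0 - churned / at_risk
        curve.append((t, survival))
    return curve


def median_survival(curve):
    """First time at which S(t) falls to 0.5 or below; None if it never does."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def logrank_chi2(d1, e1, d2, e2):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    times = sorted({t for t, e in zip(d1 + d2, e1 + e2) if e})
    obs_minus_exp, var = 0.0, 0.0
    for t in times:
        n1 = sum(1 for d in d1 if d >= t)  # at risk in group 1
        n2 = sum(1 for d in d2 if d >= t)  # at risk in group 2
        o1 = sum(1 for d, e in zip(d1, e1) if d == t and e)
        o2 = sum(1 for d, e in zip(d2, e2) if d == t and e)
        n, o = n1 + n2, o1 + o2
        obs_minus_exp += o1 - o * n1 / n
        if n > 1:  # hypergeometric variance is undefined for n == 1
            var += o * (n1 / n) * (n2 / n) * (n - o) / (n - 1)
    return obs_minus_exp ** 2 / var


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Invented toy data: tenure in months, 1 = churned, 0 = censored.
durations = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 0, 1, 0]
curve = kaplan_meier(durations, events)

# Two invented groups, e.g. without vs. with TechSupport.
chi2 = logrank_chi2([1, 2, 3], [1, 1, 1], [10, 11, 12], [1, 1, 1])
p_value = chi2_1df_pvalue(chi2)
```

A large chi-square value (small p-value) indicates that the two groups' survival curves differ by more than chance would explain.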
4. Cox Proportional Hazards Model
- Key predictors: Dependents, InternetService, OnlineBackup, TechSupport
- DSL customers have roughly 80% of the churn hazard of fiber-optic customers (hazard ratio ≈ 0.8)
- The proportional hazards assumption is violated for some variables
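As a reminder of how the DSL figure is read, the Cox model and the corresponding hazard ratio can be written as follows. The notation is generic, and treating fiber-optic as the reference category is an assumption; the coefficient shown is simply back-calculated from the ~80% figure, not a reported estimate.

```latex
% Cox proportional hazards model: baseline hazard scaled by covariates
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right)

% Hazard ratio of DSL relative to fiber-optic (assumed reference level)
\mathrm{HR}_{\text{DSL vs. Fiber}}
  = \frac{h(t \mid \text{DSL})}{h(t \mid \text{Fiber})}
  = \exp\!\left(\beta_{\text{DSL}}\right) \approx 0.8
\quad\Longrightarrow\quad
\beta_{\text{DSL}} \approx \ln 0.8 \approx -0.22
```

The ratio does not depend on t, which is exactly the proportional hazards assumption; where that assumption fails, a single hazard ratio is only an average effect over the follow-up period.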
5. Accelerated Failure Time (AFT) Model
- Distribution: Log-Logistic
- DeviceProtection, OnlineSecurity, and TechSupport significantly extend customer tenure
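To illustrate how a log-logistic AFT model turns covariates into tenure, here is a small sketch. The intercept, coefficients, and shape parameter are hypothetical placeholders, not the fitted values from this project.

```python
import math

# Hypothetical parameters for illustration only (not fitted values).
INTERCEPT = 3.0
COEFS = {"TechSupport": 0.5, "OnlineSecurity": 0.4}
SHAPE = 1.5  # log-logistic shape parameter, assumed


def scale(covariates):
    """Scale parameter alpha(x) = exp(intercept + sum of coef * value)."""
    return math.exp(INTERCEPT + sum(COEFS[k] * v for k, v in covariates.items()))


def survival(t, covariates):
    """Log-logistic survival function S(t | x) = 1 / (1 + (t / alpha)^shape)."""
    return 1.0 / (1.0 + (t / scale(covariates)) ** SHAPE)


def median_tenure(covariates):
    """For the log-logistic distribution the median equals the scale alpha(x)."""
    return scale(covariates)


without_support = {"TechSupport": 0, "OnlineSecurity": 0}
with_support = {"TechSupport": 1, "OnlineSecurity": 0}

# Acceleration factor: how much the time scale stretches with TechSupport.
acceleration = median_tenure(with_support) / median_tenure(without_support)
```

A positive coefficient stretches the time scale by exp(coefficient), so under these made-up numbers TechSupport would multiply median tenure by exp(0.5), about 1.65.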
6. CLV Dashboard & Business Insights
- Interactive dashboard for CLV calculation
- Supports CAC budgeting and customer segmentation decisions
- Helps identify high-risk churn groups for retention strategies
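One common way to compute CLV from a survival curve is to sum the expected discounted monthly margin, as sketched below. The function names, the 10% discount rate, and the CLV-to-CAC target ratio are assumptions for illustration; the dashboard's actual formula may differ.

```python
def expected_clv(survival_probs, monthly_margin, annual_discount_rate=0.10):
    """Sum monthly margins weighted by survival probability and discounting.

    survival_probs[i] is the probability the customer is still active in
    month i + 1 (for example, read off a fitted survival curve).
    """
    monthly_rate = (1.0 + annual_discount_rate) ** (1.0 / 12.0) - 1.0
    return sum(
        monthly_margin * s / (1.0 + monthly_rate) ** month
        for month, s in enumerate(survival_probs, start=1)
    )


def max_cac(clv, target_ratio=3.0):
    """Largest acquisition cost that keeps CLV / CAC at the target ratio.

    The default ratio is a hypothetical business rule, not a project result.
    """
    return clv / target_ratio


# Invented three-month survival curve and a margin of 10 per month.
clv = expected_clv([1.0, 0.5, 0.25], monthly_margin=10.0, annual_discount_rate=0.0)
budget = max_cac(clv, target_ratio=3.5)
```

Segments with steeper survival curves yield lower CLV and therefore a smaller acquisition budget, which is how the survival models feed the segmentation and retention decisions.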
Website deployed via GitHub Pages | Domain: mashuofeng.me | HTTPS enabled