Data Science · Intermediate · 740 tokens
Exploratory Data Analysis (EDA)
Comprehensive exploratory data analysis with visualizations
Tags: eda, data-analysis, visualization, statistics, python, pandas
Prompt Template
You are a data analyst conducting exploratory data analysis. Create a comprehensive EDA report for this dataset.
**Dataset:**
{dataset_name}
**Data:**
```
{data_sample}
```
**Business Context:**
{business_context}
**Key Questions:**
{research_questions}
Perform thorough EDA following this framework:
**1. Univariate Analysis:**
For each variable:
**Numerical Variables:**
- Distribution plots (histogram, box plot, violin plot)
- Central tendency (mean, median, mode)
- Spread (std, variance, IQR)
- Shape (skewness, kurtosis)
- Key insights and anomalies
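A minimal pandas sketch of the numerical summaries listed above. The helper name `numeric_summary` and the toy series are hypothetical, for illustration only. Note that pandas' `skew()` and `kurt()` return bias-corrected sample estimates, and `kurt()` reports excess kurtosis (0 for a normal distribution).

```python
import pandas as pd

def numeric_summary(s: pd.Series) -> dict:
    """Hypothetical helper: central tendency, spread, and shape for one numeric column."""
    s = s.dropna()  # the statistics below ignore missing values
    q1, q3 = s.quantile([0.25, 0.75])
    return {
        "mean": s.mean(),
        "median": s.median(),
        "mode": s.mode().tolist(),  # may contain several values when tied
        "std": s.std(),             # sample standard deviation (ddof=1)
        "variance": s.var(),
        "iqr": q3 - q1,
        "skewness": s.skew(),       # > 0 means a longer right tail
        "kurtosis": s.kurt(),       # excess kurtosis, 0 for a normal distribution
    }

# Toy right-skewed sample
sample = pd.Series([1, 2, 2, 3, 4, 10])
summary = numeric_summary(sample)
print(summary)
```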
**Categorical Variables:**
- Frequency tables
- Bar charts
- Top/bottom categories
- Cardinality
- Imbalance issues
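The categorical checks can be sketched the same way. `categorical_summary` is a hypothetical helper, and the ratio of the most to the least frequent level is just one simple imbalance measure chosen here as an assumption; entropy or the minority-level share would also work.

```python
import pandas as pd

def categorical_summary(s: pd.Series, top_n: int = 3) -> dict:
    """Hypothetical helper: frequency table, cardinality, top/bottom levels, imbalance."""
    counts = s.value_counts(dropna=False)  # sorted descending; NaN kept as its own level
    shares = counts / counts.sum()         # relative frequencies
    return {
        "frequency_table": pd.DataFrame({"count": counts, "share": shares}),
        "cardinality": s.nunique(dropna=True),
        "top": counts.head(top_n).index.tolist(),
        "bottom": counts.tail(top_n).index.tolist(),
        # most frequent / least frequent level; large values flag imbalance
        "imbalance_ratio": counts.iloc[0] / counts.iloc[-1],
    }

colors = pd.Series(["red"] * 6 + ["blue"] * 3 + ["green"])
result = categorical_summary(colors, top_n=2)
print(result["frequency_table"])
print("cardinality:", result["cardinality"], "imbalance ratio:", result["imbalance_ratio"])
```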
**2. Bivariate Analysis:**
- **Numerical vs Numerical:**
  - Scatter plots
  - Correlation coefficients (Pearson, Spearman)
  - Trend lines
- **Categorical vs Numerical:**
  - Group comparisons (box plots)
  - Statistical tests (t-test, ANOVA)
- **Categorical vs Categorical:**
  - Cross-tabulation
  - Chi-square test
  - Mosaic plots
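A sketch of the three bivariate pairings using `scipy.stats`. The DataFrame and its column names (`ad_spend`, `revenue`, `region`, `churned`) are synthetic placeholders, not part of the template. Welch's t-test (`equal_var=False`) is used as the default here because it does not assume equal group variances.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
n = 200
# Synthetic data for illustration only
df = pd.DataFrame({
    "ad_spend": rng.normal(100, 20, n),
    "region": rng.choice(["north", "south", "west"], n),
    "churned": rng.choice(["yes", "no"], n),
})
df["revenue"] = 3 * df["ad_spend"] + rng.normal(0, 30, n)

# Numerical vs numerical: linear (Pearson) and rank-based (Spearman) correlation
pearson_r, pearson_p = stats.pearsonr(df["ad_spend"], df["revenue"])
spearman_rho, spearman_p = stats.spearmanr(df["ad_spend"], df["revenue"])

# Categorical vs numerical: Welch's t-test for two groups, one-way ANOVA for three or more
yes = df.loc[df["churned"] == "yes", "revenue"]
no = df.loc[df["churned"] == "no", "revenue"]
t_stat, t_p = stats.ttest_ind(yes, no, equal_var=False)
groups = [g["revenue"].to_numpy() for _, g in df.groupby("region")]
f_stat, anova_p = stats.f_oneway(*groups)

# Categorical vs categorical: chi-square test of independence on a cross-tabulation
table = pd.crosstab(df["region"], df["churned"])
chi2, chi_p, dof, expected = stats.chi2_contingency(table)

print(f"Pearson r={pearson_r:.2f} (p={pearson_p:.3g}), Spearman rho={spearman_rho:.2f}")
print(f"Welch t={t_stat:.2f} (p={t_p:.3g}); ANOVA F={f_stat:.2f} (p={anova_p:.3g})")
print(f"Chi-square={chi2:.2f}, dof={dof}, p={chi_p:.3g}")
```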
**3. Multivariate Analysis:**
- Correlation heatmap
- Pair plots for key variables
- PCA/t-SNE if high-dimensional
- Interaction effects
- Segmentation opportunities
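For the PCA step, a dependency-light sketch using NumPy's SVD on standardized columns (scikit-learn's `PCA` is the usual alternative). The helper name and the synthetic matrix are assumptions for illustration; t-SNE is not shown.

```python
import numpy as np

def pca_explained_variance(X: np.ndarray, n_components: int = 2):
    """Hypothetical helper: PCA via SVD of the standardized data (rows = observations)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize so units don't dominate
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained_ratio = S**2 / np.sum(S**2)              # share of total variance per component
    scores = Z @ Vt[:n_components].T                   # projections onto leading components
    return explained_ratio, scores

rng = np.random.default_rng(42)
base = rng.normal(size=(300, 1))
# Synthetic data: three columns driven by one latent factor, plus one independent column
X = np.hstack(
    [base + 0.1 * rng.normal(size=(300, 1)) for _ in range(3)]
    + [rng.normal(size=(300, 1))]
)
ratios, scores = pca_explained_variance(X)
print("explained variance ratio:", np.round(ratios, 3))
```

A dominant first ratio indicates that several columns are largely redundant, which is a cue to reduce dimensionality or combine features.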
**4. Pattern Discovery:**
- Trends over time (if temporal data)
- Seasonality or cyclical patterns
- Clusters or groups
- Outlier groups
- Missing data patterns
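Missing-data patterns can be inspected with a few pandas calls: the share missing per column, counts of each row-level missingness pattern (`DataFrame.value_counts`, pandas 1.1 or later), and correlations between missingness indicators, where a high value suggests columns tend to go missing together. The toy DataFrame is hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only
df = pd.DataFrame({
    "age":    [34, np.nan, 29, np.nan, 41, 38],
    "income": [52_000, np.nan, 61_000, np.nan, np.nan, 58_000],
    "city":   ["Oslo", "Lima", None, "Pune", "Oslo", "Lima"],
})

missing_share = df.isna().mean().sort_values(ascending=False)  # fraction missing per column
patterns = df.isna().value_counts()                            # rows per missingness pattern
# Do columns go missing together? Correlate the 0/1 missingness indicators.
indicator_corr = df.isna().astype(int).corr()

print(missing_share)
print(patterns)
print(indicator_corr.round(2))
```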
**5. Statistical Insights:**
For each key finding:
- Statistical evidence (test, p-value)
- Effect size
- Confidence intervals
- Business interpretation
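Effect sizes and confidence intervals are easy to skip when only a p-value is reported. A sketch under stated assumptions: Cohen's d with a pooled standard deviation, and a Welch (unequal-variance) t interval for the difference in means. The helper names and sample values are hypothetical.

```python
import numpy as np
from scipy import stats

def cohens_d(a, b):
    """Hypothetical helper: Cohen's d using the pooled sample standard deviation."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

def welch_diff_ci(a, b, confidence=0.95):
    """Hypothetical helper: CI for mean(a) - mean(b) without assuming equal variances."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    se = np.sqrt(va + vb)
    # Welch-Satterthwaite degrees of freedom
    dof = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    t_crit = stats.t.ppf((1 + confidence) / 2, dof)
    diff = a.mean() - b.mean()
    return diff - t_crit * se, diff + t_crit * se

treatment = [12.1, 13.4, 11.8, 14.2, 13.0, 12.7]
control = [10.9, 11.5, 10.2, 11.8, 11.1, 10.7]
d = cohens_d(treatment, control)
low, high = welch_diff_ci(treatment, control)
p = stats.ttest_ind(treatment, control, equal_var=False).pvalue
print(f"Cohen's d = {d:.2f}, 95% CI for difference = ({low:.2f}, {high:.2f}), p = {p:.3g}")
```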
**6. Python Code:**
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
# Set style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 8)
# Load data
df = pd.read_csv('{file_path}')
# 1. Dataset Overview
print("Dataset Shape:", df.shape)
print("\nColumns:", df.columns.tolist())
print("\nFirst few rows:")
print(df.head())
print("\nSummary Statistics:")
print(df.describe())
# 2. Univariate Analysis
# [Plot distributions for each variable]
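# 3. Correlation Analysis (numeric columns only; pandas >= 2.0 raises on non-numeric columns)
corr_matrix = df.select_dtypes(include="number").corr()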
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', center=0)
plt.title('Correlation Matrix')
plt.show()
# 4. Bivariate Analysis
# [Key relationships]
# 5. Outlier Detection
# [Identify and visualize outliers]
# 6. Feature Importance (if target variable)
# [Statistical tests or model-based importance]
```
**7. Key Findings Summary:**
1. **Finding 1:** [Insight with statistical backing]
   - Evidence: [test result, p-value]
   - Business Impact: [what it means]
   - Recommendation: [action item]
2. **Finding 2:** [Continue pattern]
**8. Data Quality Issues:**
- Missing values: [patterns]
- Outliers: [legitimate or errors]
- Imbalanced features: [which ones]
- Data errors: [inconsistencies]
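For the outlier check, one common rule is Tukey's fences (values more than 1.5 × IQR beyond the quartiles). This sketch only flags candidates; deciding whether each is legitimate or an error still needs domain context. The helper name and values are illustrative.

```python
import pandas as pd

def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Hypothetical helper: mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

order_values = pd.Series([20, 22, 19, 25, 21, 23, 24, 400, 18, -5])
mask = iqr_outliers(order_values)
print(order_values[mask])  # candidates to review, not automatic deletions
```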
**9. Recommendations:**
**For Analysis:**
- Variables to focus on
- Transformations needed
- Segments to analyze separately
- Additional data needed
**For Business:**
- Actionable insights
- Opportunities identified
- Risks discovered
- Next steps
**10. Appendix:**
- Full statistical test results
- All visualization code
- Data dictionary
- Assumptions made
Format as: Executive Summary → Detailed Analysis → Code → Recommendations
Variables to Replace
- {dataset_name}
- {data_sample}
- {business_context}
- {research_questions}
- {file_path}
Pro Tips
Start with simple univariate analysis before diving into complex relationships. Always validate statistical assumptions (for example, normality and equal variances before a t-test or ANOVA) before trusting a p-value.
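One way to act on the assumption-checking tip, sketched with `scipy.stats`: test normality with Shapiro-Wilk and variance equality with Levene, then choose between Student's t, Welch's t, and Mann-Whitney U. The 0.05 cutoffs and the simulated groups are assumptions; with large samples Shapiro-Wilk flags trivial deviations, so a Q-Q plot is a useful companion.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
group_a = rng.normal(50, 5, 40)
group_b = rng.lognormal(mean=3.9, sigma=0.5, size=40)  # deliberately skewed

# Normality per group (Shapiro-Wilk): a small p-value suggests non-normal data
shapiro_p = {name: stats.shapiro(g).pvalue for name, g in [("a", group_a), ("b", group_b)]}
# Equality of variances (Levene, median-centered by default)
levene_p = stats.levene(group_a, group_b).pvalue

if min(shapiro_p.values()) < 0.05:
    # Normality doubtful: fall back to a rank-based test
    test_name, result = "Mann-Whitney U", stats.mannwhitneyu(group_a, group_b)
elif levene_p < 0.05:
    test_name, result = "Welch t-test", stats.ttest_ind(group_a, group_b, equal_var=False)
else:
    test_name, result = "Student t-test", stats.ttest_ind(group_a, group_b)
print(shapiro_p, f"Levene p={levene_p:.3g}", test_name, f"p={result.pvalue:.3g}")
```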