Data science | Intermediate | 740 tokens

Exploratory Data Analysis (EDA)

Comprehensive exploratory data analysis with visualizations

Tags: eda, data-analysis, visualization, statistics, python, pandas

Prompt Template

You are a data analyst conducting exploratory data analysis. Create a comprehensive EDA report for this dataset.

**Dataset:**
{dataset_name}

**Data:**
```
{data_sample}
```

**Business Context:**
{business_context}

**Key Questions:**
{research_questions}

Perform thorough EDA following this framework:

**1. Univariate Analysis:**
For each variable:

**Numerical Variables:**
- Distribution plots (histogram, box plot, violin plot)
- Central tendency (mean, median, mode)
- Spread (std, variance, IQR)
- Shape (skewness, kurtosis)
- Key insights and anomalies

**Categorical Variables:**
- Frequency tables
- Bar charts
- Top/bottom categories
- Cardinality
- Imbalance issues

**2. Bivariate Analysis:**
- **Numerical vs Numerical:**
  - Scatter plots
  - Correlation coefficients (Pearson, Spearman)
  - Trend lines
- **Categorical vs Numerical:**
  - Group comparisons (box plots)
  - Statistical tests (t-test, ANOVA)
- **Categorical vs Categorical:**
  - Cross-tabulation
  - Chi-square test
  - Mosaic plots

**3. Multivariate Analysis:**
- Correlation heatmap
- Pair plots for key variables
- PCA/t-SNE if high-dimensional
- Interaction effects
- Segmentation opportunities

**4. Pattern Discovery:**
- Trends over time (if temporal data)
- Seasonality or cyclical patterns
- Clusters or groups
- Outlier groups
- Missing data patterns

**5. Statistical Insights:**
For each key finding:
- Statistical evidence (test, p-value)
- Effect size
- Confidence intervals
- Business interpretation

**6. Python Code:**
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# Set style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 8)

# Load data
df = pd.read_csv('{file_path}')

# 1. Dataset Overview
print("Dataset Shape:", df.shape)
print("\nColumns:", df.columns.tolist())
print("\nFirst few rows:")
print(df.head())
print("\nSummary Statistics:")
print(df.describe())

# 2. Univariate Analysis
# [Plot distributions for each variable]

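# 3. Correlation Analysis
# Restrict to numeric columns; df.corr() raises on non-numeric data in pandas 2.x
corr_matrix = df.select_dtypes(include='number').corr()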
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', center=0)
plt.title('Correlation Matrix')
plt.show()

# 4. Bivariate Analysis
# [Key relationships]

# 5. Outlier Detection
# [Identify and visualize outliers]

# 6. Feature Importance (if target variable)
# [Statistical tests or model-based importance]
```

**7. Key Findings Summary:**
1. **Finding 1:** [Insight with statistical backing]
   - Evidence: [test result, p-value]
   - Business Impact: [what it means]
   - Recommendation: [action item]

2. **Finding 2:** [Continue pattern]

**8. Data Quality Issues:**
- Missing values: [patterns]
- Outliers: [legitimate or errors]
- Imbalanced features: [which ones]
- Data errors: [inconsistencies]

**9. Recommendations:**
**For Analysis:**
- Variables to focus on
- Transformations needed
- Segments to analyze separately
- Additional data needed

**For Business:**
- Actionable insights
- Opportunities identified
- Risks discovered
- Next steps

**10. Appendix:**
- Full statistical test results
- All visualization code
- Data dictionary
- Assumptions made

Format as: Executive Summary → Detailed Analysis → Code → Recommendations

Variables to Replace

{dataset_name}
{data_sample}
{business_context}
{research_questions}
{file_path}
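
The variables listed above can be filled in by hand or with a few lines of code. The following is a minimal sketch, assuming the template is held in a Python string. `TEMPLATE` is a shortened, hypothetical stand-in for the full prompt, and `fill_template`, `REQUIRED`, and the example values are illustrative names and data rather than part of the original template.

```python
# Hypothetical, shortened stand-in for the full prompt template.
TEMPLATE = (
    "Create a comprehensive EDA report for this dataset.\n"
    "Dataset: {dataset_name}\n"
    "Data:\n{data_sample}\n"
    "Business Context: {business_context}\n"
    "Key Questions: {research_questions}\n"
    "df = pd.read_csv('{file_path}')\n"
)

# The five placeholders the template expects.
REQUIRED = (
    "dataset_name",
    "data_sample",
    "business_context",
    "research_questions",
    "file_path",
)


def fill_template(template, values):
    """Replace each {name} placeholder with its value; fail if one is missing."""
    missing = [name for name in REQUIRED if name not in values]
    if missing:
        raise ValueError(f"Missing template variables: {missing}")
    filled = template
    for name in REQUIRED:
        filled = filled.replace("{" + name + "}", str(values[name]))
    return filled


# Made-up example values, for illustration only.
example_values = {
    "dataset_name": "Customer churn (example)",
    "data_sample": "customer_id,tenure,churned\n1,12,0\n2,3,1",
    "business_context": "Reduce churn among new subscribers.",
    "research_questions": "Which factors are associated with churn?",
    "file_path": "data/churn.csv",
}

prompt = fill_template(TEMPLATE, example_values)
print(prompt)
```

Plain `str.replace` is used here instead of `str.format` so that any literal braces elsewhere in the template text are left untouched.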

Pro Tips

Start with simple univariate analysis before diving into complex relationships. Always validate statistical assumptions, such as normality and equal variances for a t-test or ANOVA, before interpreting a test result.
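
As one illustration of validating assumptions, the sketch below checks two groups before comparing them. It is a heuristic under stated assumptions, not a definitive procedure: `compare_two_groups` is a hypothetical helper, the 0.05 threshold is an arbitrary default, and the demo data is synthetic. It uses the Shapiro-Wilk test for normality and Levene's test for equal variances, then falls back to the Mann-Whitney U test when normality looks doubtful.

```python
import numpy as np
from scipy import stats


def compare_two_groups(a, b, alpha=0.05):
    """Pick a two-sample test based on simple assumption checks (heuristic)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Shapiro-Wilk: the null hypothesis is that the sample is normally distributed.
    normal = all(stats.shapiro(x).pvalue > alpha for x in (a, b))
    # Levene: the null hypothesis is that the groups have equal variances.
    equal_var = bool(stats.levene(a, b).pvalue > alpha)

    if normal:
        # equal_var=False makes ttest_ind run Welch's t-test.
        result = stats.ttest_ind(a, b, equal_var=equal_var)
        test = "Student t-test" if equal_var else "Welch t-test"
    else:
        # Rank-based alternative that does not assume normality.
        result = stats.mannwhitneyu(a, b, alternative="two-sided")
        test = "Mann-Whitney U"

    return {
        "test": test,
        "statistic": float(result.statistic),
        "pvalue": float(result.pvalue),
        "normal": normal,
        "equal_var": equal_var,
    }


# Synthetic demo data, for illustration only.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=50, scale=5, size=80)
group_b = rng.normal(loc=52, scale=5, size=80)
print(compare_two_groups(group_a, group_b))
```

Choosing a test from the outcome of a pre-test is itself a simplification. With large samples, small departures from normality will be flagged even when a t-test would still be reasonable, so the checks are best read alongside histograms or Q-Q plots.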
