Snapshot of the Interactive Dashboard (Excel)
The Problem: Olist, a Brazilian e-commerce marketplace, operates in a challenging logistics environment where delivery delays directly impact customer retention. The company lacks visibility into how these logistics inefficiencies affect brand reputation (Review Scores) and needs to identify which product categories drive the majority of revenue to optimize inventory management.
My Objectives: To address these challenges, I analyzed 100,000+ order records to:
- Quantify the cost of delay: Measure exactly how much late deliveries damage customer satisfaction scores.
- Identify revenue drivers: Determine which product categories constitute the "Vital Few" (Pareto Principle) to focus inventory efforts.
- Analyze seasonal trends: Investigate sales anomalies, such as the 2018 sales flatline, to understand external market threats.
- Recommend strategic actions: Provide data-driven suggestions to improve logistics reliability and reduce churn.
After data exploration and cleaning, I visualized critical findings regarding the "Logistics Gap" and sales seasonality. I designed an interactive Dashboard on Excel using pre-attentive attributes (color/contrast) to highlight KPIs, allowing stakeholders to filter insights by State and Category dynamically.
The data was sourced from the Brazilian E-Commerce Public Dataset by Olist on Kaggle.
- Description: This is real commercial data (anonymized) comprising 100,000 orders made at multiple marketplaces in Brazil.
- Scope: It connects the order lifecycle from purchase to delivery, including customer reviews, seller location, product attributes, and geolocation data.
- Tables: 9 (Relational CSVs including Orders, Customers, Reviews, Products)
- Rows: 99,441 Orders (before filtering)
- Timeline: 2016 to 2018 (Analysis focuses on the 2017-2018 mature period).
While Python or SQL are standard for large datasets, this project was deliberately built in Excel to demonstrate:
- Stakeholder Accessibility: Creating a "low-code" tool that business executives can interact with immediately, without needing a Python environment or SQL access.
- Advanced Data Modeling: showcasing the use of Power Query and Data Models to handle 100k+ rows efficiently, bypassing standard spreadsheet limitations.
- Data Cleaning (Power Query): Merged multiple relational tables (Orders, Reviews, Customers, Geolocation, Products) and standardized data types.
- Feature Engineering: Created precision metrics including
Delivery_Time_Days(usingDuration.TotalDaysto capture fractional time vs. integer rounding),Delivery_Status(On-Time vs. Late), andCategory_Groups(reducing 74 Categories to 14 Category_groups). - Data Modeling: Utilized Pivot Tables and Data Models to aggregate millions of data points into dynamic KPIs.
- Visualization: Designed a professional dynamic Dashboard using Slicers, Linked Pictures, and Geographic Heat Maps.
An internal EDA_&_Stats sheet was created to validate assumptions:
- Distribution Analysis: Confirmed a "Long Tail" of late deliveries (20-90 days) using histograms.
- Outlier Removal: Filtered 2,500+ records with order status not being delivered and capped delivery time at 90 days to ensure metric stability.
- Sentiment Analysis: Identified a "J-Curve" in reviews. While on-time orders have high satisfaction, 46.3% of late orders receive a 1-star rating, proving that customers punish delivery delays with the lowest possible score.
The business relies heavily on a few core segments. The top three macro-categories (Health & Beauty, Watches & Gifts, and Bed, Bath & Table) generated 26% of the grand total revenue during the analyzed timeline. This indicates a strong market fit in these specific niches but exposes the business to risk if these specific categories underperform.
Figure 1: Top 3 categories drive over a quarter of total revenue.
Analysis reveals a sharp contrast in customer satisfaction based on delivery performance.
- On-time deliveries: Average Review Score of 4.21/5 (Avg delivery time: 10.8 days).
- Delayed deliveries: Average Review Score decreases to 2.55/5 (Avg delivery time: 30.3 days).
- Statistical Validation: Calculated a negative correlation coefficient of r = -0.31. While product quality remains the primary driver of satisfaction, this result confirms that delivery delays are a statistically significant drag on customer sentiment.
Figure 2: Late deliveries correlate with a massive 1.66-star drop in satisfaction.
The sales data displays a clear upward trend from Jan 2017 to Aug 2018, with a distinct peak in November, likely driven by Black Friday promotions. However, there is a notable seasonal pattern showing a sales decline of almost 14% from May to June in both 2017 and 2018, which requires further root cause analysis to mitigate future Q2 slumps.
-
Optimize "Last Mile" Logistics: Olist should Investigate carrier performance in states with the highest "Late Delivery" rates and or develop a predictive SLA breach algorithm. Reducing late deliveries by just 50% could potentially increase the overall Average Review Score, driving higher customer trust and repeat purchases.
-
Targeted Inventory Focus: Prioritizing stock availability and supplier relationships specifically for Health & Beauty, Watches & Gifts, and Bed, Bath & Table to prevent stockouts in these critical segments.
-
Seasonal Retention Strategy: Launching a dedicated "Post-May" investigation task force. If no concrete result was found, counter it with mid-year promotions or loyalty incentives.
The technical skills and concepts applied in this project include:
- Data Cleaning & ETL: Power Query (M Language), Data Type Standardization, Merging Queries.
- Note: The full M-Code logic is available in
power_query_etl.mfor technical review. - Data Modeling: Relational Schemas, Measure Creation (KPIs), Calculated Columns.
- Analysis: Statistical Correlation, Trend Analysis, Pareto Principle (80/20 Rule).
- Visualization: Dashboard Design, Slicers, Geographic Maps, Conditional Formatting, Interactive UI.
- Data Validation: compared calculation logic between Power Query (M) and SQL (BigQuery) to ensure metric consistency across platforms.
- Download Project File (Excel): The complete Excel model
- ETL Automation: The full Power Query M-Code logic is available in
power_query_etl.mfor technical review. - SQL Validation: The full BigQuery SQL code used to validate the Excel metrics is available in
olist_analysis_sql.sql.

