
FutureEval v2.2 #4272

Merged
CodexVeritas merged 13 commits into main from futureeval-v2.2
Feb 7, 2026

@CodexVeritas
Contributor

@CodexVeritas CodexVeritas commented Feb 6, 2026

Summary by CodeRabbit

  • Documentation

    • Added inline academic citations and clarified explanatory copy across methodology and performance text.
  • New Features

    • Partner tab/page and a four-card Tournament overview added.
  • UI / Content Updates

    • Renamed headings (e.g., "Pros vs. Bots"), added anchored graph sections for direct linking, updated chart Y-axis to "Pro Lead Over Bots", expanded legend, significance markers, binary-season messaging, and interactive tooltips; replaced a static tournament block with a clickable "Join" card.
  • Data / Content

    • Updated showcased bot win/loss examples, quote attributions, and summary metadata.

@coderabbitai
Contributor

coderabbitai bot commented Feb 6, 2026

Warning

Rate limit exceeded

@CodexVeritas has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 18 minutes and 58 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📝 Walkthrough

Multiple FutureEval frontend updates: copy/link target rewrites and added section anchors; new Partner tab and partner page (server-side leaderboard fetch); Tournament overview UI; Biggest Bot Wins component API simplified; Pros-vs-Bots chart relabeled and enriched with per-quarter significance and legend; dataset and small metadata edits.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Methodology & copy | front_end/src/app/(futureeval)/futureeval/components/futureeval-methodology-sections.tsx, front_end/src/app/(futureeval)/futureeval/components/futureeval-methodology-content.tsx | Added inline Links (including arXiv citation), rewrote paragraph/link text and section references; updated Bot Tournaments card href. |
| Benchmark headers & subtitle layout | front_end/src/app/(futureeval)/futureeval/components/benchmark/futureeval-benchmark-headers.tsx, front_end/src/app/(futureeval)/futureeval/components/benchmark/futureeval-model-benchmark.tsx | Renamed header to "Pros vs. Bots", adjusted subtitle copy; converted subtitle layout from grid to responsive flex and repositioned banner. |
| Section anchors & tabs | front_end/src/app/(futureeval)/futureeval/components/benchmark/futureeval-benchmark-tab.tsx, front_end/src/app/(futureeval)/futureeval/components/futureeval-tabs-shell.tsx, front_end/src/app/(futureeval)/futureeval/components/futureeval-tabs.tsx | Added explicit container IDs (performance-over-time-graph, biggest-bot-wins-graph, pros-vs-bots-graph, bot-tournaments-graph); added partner to Section union and registered Partner tab. |
| Partner page & tab | front_end/src/app/(futureeval)/futureeval/components/futureeval-partner-tab.tsx, front_end/src/app/(futureeval)/futureeval/partner/page.tsx | New Partner tab component and server-side partner page that fetches global leaderboard; exports metadata and default async page. |
| Biggest Bot Wins — API & data | front_end/src/app/(futureeval)/futureeval/components/benchmark/futureeval-biggest-bot-wins.tsx, front_end/src/app/(futureeval)/futureeval/data/biggest-bot-wins.ts | Removed navigation props (url, commentUrl) from ForecastDial/QuoteBlock and simplified wrappers; updated dataset entries (author swaps, added/removed loss entries). |
| Pros vs. Bots chart & config | front_end/src/app/(futureeval)/futureeval/components/benchmark/pros-vs-bots/futureeval-pros-vs-bots-chart.tsx, front_end/src/app/(futureeval)/futureeval/components/benchmark/pros-vs-bots/config.ts | Y-axis relabeled to "Pro Lead Over Bots"; added significance types, color fills, labels, ordering, significance icons, QuarterSummaryTable, binary-only season handling, responsive two-column layout, tooltip/legend/CI UI; new local components added. |
| Performance-over-time copy | front_end/src/app/(futureeval)/futureeval/components/benchmark/performance-over-time/futureeval-benchmark-forecasting-performance.tsx | Tweaked messaging text and added styling classes to the info block. |
| Participate / tournaments UI | front_end/src/app/(futureeval)/futureeval/components/futureeval-participate-tab.tsx, front_end/src/app/(futureeval)/futureeval/components/futureeval-tournaments.tsx | Added FutureEvalTournamentOverview (four-card grid) and integrated it; replaced static block with clickable "Ready to compete?" card; small resource copy changes. |
| Orbit metadata tweaks | front_end/src/app/(futureeval)/futureeval/components/orbit/metaculus-hub.tsx | Small copy edits to displayed metadata lines (counts/years). |

Sequence Diagram(s)

sequenceDiagram
    participant Browser
    participant NextServer as Next.js Server (app/partner/page)
    participant LeaderboardAPI as ServerLeaderboardApi
    participant LeaderboardService as Leaderboard data source

    Browser->>NextServer: GET /futureeval/partner (SSR)
    NextServer->>LeaderboardAPI: getGlobalLeaderboard(null, null, "manual", "Global Bot Leaderboard")
    LeaderboardAPI->>LeaderboardService: fetch leaderboard data
    LeaderboardService-->>LeaderboardAPI: leaderboard payload
    LeaderboardAPI-->>NextServer: mapped leaderboard
    NextServer-->>Browser: Rendered FutureEvalPartnerPage (HTML with safeLeaderboard)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • elisescu
  • ncarazon
  • lsabor

Poem

🐰
I hopped through links and stitched a seam,
I renamed charts and tuned the theme.
Tabs and cards in tidy rows,
Significance blooms where data flows.
A tiny hop for FutureEval dreams ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 1
❌ Failed checks (1 inconclusive)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ❓ Inconclusive | The title 'FutureEval v2.2' is vague and generic, lacking specific information about the actual changes in the pull request. | Consider using a more descriptive title that highlights the main changes, such as 'Update FutureEval UI with improved navigation links and chart enhancements' or 'Refactor FutureEval benchmark visualizations and add partner tab'. |
✅ Passed checks (1 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Contributor

github-actions bot commented Feb 6, 2026

🧹 Preview Environment Cleaned Up

The preview environment for this PR has been destroyed.

| Resource | Status |
| --- | --- |
| 🌐 Preview App | ✅ Deleted |
| 🗄️ PostgreSQL Branch | ✅ Deleted |
| ⚡ Redis Database | ✅ Deleted |
| 🔧 GitHub Deployments | ✅ Removed |
| 📦 Docker Image | ⚠️ Retained (auto-cleanup via GHCR policies) |

Cleanup triggered by PR close at 2026-02-07T08:09:01Z

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@front_end/src/app/(futureeval)/futureeval/components/benchmark/pros-vs-bots/futureeval-pros-vs-bots-chart.tsx`:
- Around line 897-900: SignificanceLegendIcon is pulling hardcoded colors from
LEGEND_ICON_COLORS and thus ignores dark mode; update it to use the theme-aware
color resolution used by SignificanceIconPoint by either 1) accepting a resolved
color prop and using getThemeColor(SIGNIFICANCE_FILL[type]) at the callsite, or
2) calling the theme helper inside the component (e.g., import/getThemeColor and
compute getThemeColor(SIGNIFICANCE_FILL[type])) and replace
LEGEND_ICON_COLORS[type] with that value; adjust any callers of
SignificanceLegendIcon accordingly so legend icons match SignificanceIconPoint
in dark/light themes.
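
A minimal sketch of the second option in the prompt above, assuming each `SIGNIFICANCE_FILL` entry carries a light (`DEFAULT`) and a dark value. The `dark` field name, the dark hex values, and the `resolveThemeColor` helper are hypothetical stand-ins for the repo's own `getThemeColor`, whose signature is not shown in this thread; only the three `DEFAULT` hex values come from the review.

```typescript
// Sketch only. The real config and theme hook live in the repo;
// everything marked "assumed" below is illustrative.
type SignificanceStatus = "significant_win" | "non_significant_win" | "loss";
type Theme = "light" | "dark";

// Assumed shape: one colour per theme.
interface ThemeColor {
  DEFAULT: string;
  dark: string;
}

const SIGNIFICANCE_FILL: Record<SignificanceStatus, ThemeColor> = {
  // DEFAULT values are quoted in the review; dark values are placeholders.
  significant_win: { DEFAULT: "#16a34a", dark: "#4ade80" },
  non_significant_win: { DEFAULT: "#ca8a04", dark: "#facc15" },
  loss: { DEFAULT: "#dc2626", dark: "#f87171" },
};

// Hypothetical stand-in for getThemeColor: picks the entry for the active theme.
function resolveThemeColor(color: ThemeColor, theme: Theme): string {
  return theme === "dark" ? color.dark : color.DEFAULT;
}

// The legend icon derives its colour from the shared config instead of
// a separate hardcoded LEGEND_ICON_COLORS map.
function legendIconColor(type: SignificanceStatus, theme: Theme): string {
  return resolveThemeColor(SIGNIFICANCE_FILL[type], theme);
}
```

With this shape the legend and the chart points read from one source, so a colour change in the config reaches both.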
🧹 Nitpick comments (3)
front_end/src/app/(futureeval)/futureeval/components/futureeval-methodology-sections.tsx (1)

644-645: Minor: section id no longer matches the new title.

The section id is pros-beat-bots but the visible heading is now "Pros vs. Bots." Consider updating the id to pros-vs-bots for consistency (and updating the #pros-beat-bots link on Line 632 accordingly). Not urgent since the anchor still works, but the semantic mismatch could confuse future maintainers.

front_end/src/app/(futureeval)/futureeval/components/benchmark/pros-vs-bots/futureeval-pros-vs-bots-chart.tsx (2)

891-895: LEGEND_ICON_COLORS duplicates the DEFAULT values from SIGNIFICANCE_FILL in config.ts.

These hex values are identical to SIGNIFICANCE_FILL[status].DEFAULT. If the config colors change, the legend will fall out of sync. Consider importing and reusing the config values instead.

♻️ Suggested fix
-const LEGEND_ICON_COLORS: Record<SignificanceStatus, string> = {
-  significant_win: "#16a34a",
-  non_significant_win: "#ca8a04",
-  loss: "#dc2626",
-};
+const LEGEND_ICON_COLORS: Record<SignificanceStatus, string> = {
+  significant_win: SIGNIFICANCE_FILL.significant_win.DEFAULT,
+  non_significant_win: SIGNIFICANCE_FILL.non_significant_win.DEFAULT,
+  loss: SIGNIFICANCE_FILL.loss.DEFAULT,
+};

797-952: SignificanceIconPoint and SignificanceLegendIcon share nearly identical SVG drawing logic.

Both components render the same three icon shapes (plus, checkmark, X inside a circle) with the same proportions. If the icon design changes, both must be updated in lockstep. Consider extracting a shared helper that accepts cx, cy, r, color, and significance to render the inner glyph.
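
One way the shared helper could look, as a rough sketch: a pure function that returns SVG path data for the inner glyph, which both components would then wrap in their own circle. The name `significanceGlyph`, the 0.5 scale factor, and the mapping of statuses to shapes (checkmark, plus, X) are assumptions; the review does not say which shape belongs to which status.

```typescript
type SignificanceStatus = "significant_win" | "non_significant_win" | "loss";

interface GlyphArgs {
  cx: number; // circle centre x
  cy: number; // circle centre y
  r: number; // circle radius
  color: string; // already-resolved stroke colour
  significance: SignificanceStatus;
}

interface Glyph {
  d: string; // SVG path data for the inner glyph
  stroke: string;
}

// Hypothetical shared helper: one place that knows the glyph geometry.
function significanceGlyph({ cx, cy, r, color, significance }: GlyphArgs): Glyph {
  const h = r * 0.5; // half-extent of the glyph inside the circle (assumed ratio)
  let d: string;
  switch (significance) {
    case "significant_win": // checkmark (assumed mapping)
      d = `M${cx - h} ${cy} L${cx - h / 3} ${cy + h * 0.6} L${cx + h} ${cy - h * 0.6}`;
      break;
    case "non_significant_win": // plus (assumed mapping)
      d = `M${cx - h} ${cy} L${cx + h} ${cy} M${cx} ${cy - h} L${cx} ${cy + h}`;
      break;
    default: // "loss": X (assumed mapping)
      d = `M${cx - h} ${cy - h} L${cx + h} ${cy + h} M${cx + h} ${cy - h} L${cx - h} ${cy + h}`;
      break;
  }
  return { d, stroke: color };
}
```

Each component would render the result as `<path d={glyph.d} stroke={glyph.stroke} />`, so a change to the icon design touches only this function.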

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
front_end/src/app/(futureeval)/futureeval/components/benchmark/pros-vs-bots/futureeval-pros-vs-bots-chart.tsx (1)

386-404: ⚠️ Potential issue | 🟡 Minor

Hardcoded whisker SVG stroke color won't adapt to dark mode.

The #91999E stroke on the 95% CI whisker icon is hardcoded. In dark mode this may have poor contrast against a dark background. Consider using getThemeColor with a theme-aware color token (e.g., METAC_COLORS.gray[500]) like the rest of the chart does.

Since this is an inline SVG, one approach is to set stroke="currentColor" and let the parent <span>'s text color (which already uses text-gray-700 dark:text-gray-700-dark) cascade:

Proposed fix
              <line
                y1="1.83398"
                x2="13"
                y2="1.83398"
-               stroke="#91999E"
+               stroke="currentColor"
                strokeWidth="2"
              />
              <path
                d="M6.5 15.0391L6.5 2.03906"
-               stroke="#91999E"
+               stroke="currentColor"
                strokeWidth="2"
              />
              <line
                y1="14.959"
                x2="13"
                y2="14.959"
-               stroke="#91999E"
+               stroke="currentColor"
                strokeWidth="2"
              />
🧹 Nitpick comments (2)
front_end/src/app/(futureeval)/futureeval/components/futureeval-partner-tab.tsx (1)

42-62: Semantic heading level doesn't match visual typography.

<h3> is styled with FE_TYPOGRAPHY.h1. Since this component is embedded in a page that likely has its own <h1>, the lower semantic level is understandable — but the mismatch between the tag and the typography token name could confuse future maintainers. Consider using <h2> here (since it's the top-level heading within this tab), which would also give a cleaner h2 → h3 → h4 hierarchy across the subcomponents.

front_end/src/app/(futureeval)/futureeval/components/benchmark/pros-vs-bots/futureeval-pros-vs-bots-chart.tsx (1)

798-873: QuarterSummaryTable — theme-aware colors used correctly, but getStatus uses linear search.

The table correctly resolves theme colors via getThemeColor(SIGNIFICANCE_FILL[status]), which is the right approach (and avoids the LEGEND_ICON_COLORS dark-mode issue). The getStatus helper uses .find() per category, which is fine given the small dataset sizes expected here.
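
For comparison only, here is a sketch of the linear lookup next to an indexed alternative, in case the per-quarter data ever grows. The `CategoryResult` shape and both function names are hypothetical; the real `getStatus` signature is not shown in this thread.

```typescript
type SignificanceStatus = "significant_win" | "non_significant_win" | "loss";

// Hypothetical row shape for one category in a quarter.
interface CategoryResult {
  category: string;
  status: SignificanceStatus;
}

// Linear lookup, the approach the review describes: O(n) per call.
function getStatusLinear(
  results: CategoryResult[],
  category: string
): SignificanceStatus | undefined {
  const match = results.find((r) => r.category === category);
  return match ? match.status : undefined;
}

// Indexed alternative: build the Map once, then each lookup is O(1).
function buildStatusIndex(
  results: CategoryResult[]
): Map<string, SignificanceStatus> {
  const index = new Map<string, SignificanceStatus>();
  for (const r of results) {
    index.set(r.category, r.status);
  }
  return index;
}
```

With only a handful of categories per quarter the two are indistinguishable in practice, which is why the review treats `.find()` as acceptable.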

One small note: SignificanceLegendIcon (called on Line 856) still uses the hardcoded LEGEND_ICON_COLORS rather than the theme-resolved col.color that is already computed. You could pass the resolved color as a prop to keep the icons consistent with the label color in dark mode (this ties into the existing review comment on LEGEND_ICON_COLORS).

Proposed approach — pass resolved color to icon
-const SignificanceLegendIcon: React.FC<{ type: SignificanceStatus }> = ({
-  type,
+const SignificanceLegendIcon: React.FC<{ type: SignificanceStatus; color?: string }> = ({
+  type,
+  color: colorProp,
 }) => {
-  const color = LEGEND_ICON_COLORS[type];
+  const color = colorProp ?? LEGEND_ICON_COLORS[type];

Then at the call site in QuarterSummaryTable:

-                   <SignificanceLegendIcon type={col.status} />
+                   <SignificanceLegendIcon type={col.status} color={col.color} />

This lets the icon inherit the already-resolved theme color without needing its own hook.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@front_end/src/app/(futureeval)/futureeval/components/benchmark/pros-vs-bots/futureeval-pros-vs-bots-chart.tsx`:
- Around line 399-428: The SVG legend uses hardcoded stroke="#91999E" which
doesn't adapt to dark mode; update the SVG in the FutureevalProsVsBotsChart
component to use the theme-aware color already computed as gridStroke (from the
variable on line ~269) instead of the literal "#91999E" — pass gridStroke into
the SVG elements via stroke={gridStroke} or inline style={{ stroke: gridStroke
}} for the <line> and <path> elements so the legend color follows the current
theme.

In `@front_end/src/app/(futureeval)/futureeval/components/futureeval-methodology-sections.tsx`:
- Around line 447-463: The Link pointing to the model leaderboard uses the wrong
path; update the href on the Link in the FutureEvalMethodologySections JSX (the
Link used in the "continuously update our" sentence) from
"/futureeval#model-leaderboard" to "/futureeval/methodology#model-leaderboard"
so it points to the anchor rendered by the FutureEvalMethodologySections
component; also scan for any other occurrences of
"/futureeval#model-leaderboard" and replace them with
"/futureeval/methodology#model-leaderboard".

In `@front_end/src/app/(futureeval)/futureeval/data/biggest-bot-wins.ts`:
- Around line 117-121: The botQuote currently uses a human forecaster ("MWG")
which violates the bot naming pattern; update the data in biggest-bot-wins.ts by
moving the MWG quote from the botQuote object into the proQuote field (or
replace botQuote with a genuine bot's quote), ensuring the botQuote property
contains a quote whose author matches a bot identifier pattern used in this file
(e.g., starts with "mf-bot-", "metac-", "jlbot", or "bestworldbot"); modify the
objects named botQuote and proQuote accordingly so botQuote holds a real bot
author and proQuote holds the MWG human author.
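
The naming rule in the last item could be checked mechanically. The sketch below assumes each dataset entry has optional `botQuote` and `proQuote` objects with an `author` string; that shape, the `text` field, and both function names are hypothetical. The four prefixes are the ones listed in the comment above.

```typescript
// Bot identifier prefixes quoted in the review comment.
const BOT_AUTHOR_PREFIXES: string[] = ["mf-bot-", "metac-", "jlbot", "bestworldbot"];

// Assumed minimal shapes; the real types in biggest-bot-wins.ts may differ.
interface Quote {
  author: string;
  text: string;
}

interface BotWinEntry {
  botQuote?: Quote;
  proQuote?: Quote;
}

// True when the author name starts with one of the known bot prefixes.
function isBotAuthor(author: string): boolean {
  const normalized = author.trim().toLowerCase();
  return BOT_AUTHOR_PREFIXES.some((prefix) => normalized.indexOf(prefix) === 0);
}

// Returns the indices of entries whose botQuote is attributed to a non-bot author.
function findMisattributedBotQuotes(entries: BotWinEntry[]): number[] {
  const offenders: number[] = [];
  entries.forEach((entry, i) => {
    if (entry.botQuote && !isBotAuthor(entry.botQuote.author)) {
      offenders.push(i);
    }
  });
  return offenders;
}
```

A check like this could run in a unit test over the dataset so that a human author such as "MWG" in `botQuote` is flagged before review.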
🧹 Nitpick comments (2)
front_end/src/app/(futureeval)/futureeval/data/biggest-bot-wins.ts (1)

192-212: Consider removing the commented-out entry instead of leaving it as dead code.

If this entry is no longer needed, removing it entirely keeps the data file clean. Version control already preserves the history if it ever needs to be restored.

front_end/src/app/(futureeval)/futureeval/components/benchmark/pros-vs-bots/futureeval-pros-vs-bots-chart.tsx (1)

121-155: Imperative SVG <defs> injection is fragile inside a React-managed tree.

This useLayoutEffect manually creates and inserts SVG elements into a DOM node owned by Victory. If Victory ever re-creates or replaces the <svg> element (e.g. on certain prop changes), the injected <defs> will silently vanish until the next effect trigger. Additionally, the width dependency causes the pattern to be torn down and re-inserted on every resize even when the colors haven't changed.

A more resilient approach: render the <defs> declaratively outside the chart (e.g. a hidden 0×0 <svg> with the pattern) so the definition is always present and React-managed, and reference it via url(#binary-stripe) as you already do.

Example: declarative pattern defs
+ {/* Declarative SVG defs — always present, React-managed */}
+ <svg width={0} height={0} style={{ position: "absolute" }}>
+   <defs>
+     <pattern
+       id="binary-stripe"
+       width="8"
+       height="8"
+       patternUnits="userSpaceOnUse"
+       patternTransform="rotate(45)"
+     >
+       <rect width="4" height="8" fill={s1Color} />
+       <rect x="4" width="4" height="8" fill={s2Color} />
+     </pattern>
+   </defs>
+ </svg>

Then remove the useLayoutEffect on lines 121-155 entirely.

@CodexVeritas CodexVeritas merged commit 0004811 into main Feb 7, 2026
14 checks passed
@CodexVeritas CodexVeritas deleted the futureeval-v2.2 branch February 7, 2026 08:09