FutureEval v2.2 by CodexVeritas · Pull Request #4272 · Metaculus/metaculus

CodexVeritas · 2026-02-06T17:59:59Z

Summary by CodeRabbit

Documentation
- Added inline academic citations and clarified explanatory copy across methodology and performance text.
New Features
- Added a Partner tab/page and a four-card Tournament overview.
UI / Content Updates
- Renamed headings (e.g., "Pros vs. Bots"); added anchored graph sections for direct linking; updated the chart Y-axis to "Pro Lead Over Bots"; expanded the legend and added significance markers, binary-season messaging, and interactive tooltips; replaced a static tournament block with a clickable "Join" card.
Data / Content
- Updated showcased bot win/loss examples, quote attributions, and summary metadata.

coderabbitai · 2026-02-06T18:00:21Z

Warning

Rate limit exceeded

@CodexVeritas has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 18 minutes and 58 seconds before requesting another review.

📝 Walkthrough

Multiple FutureEval frontend updates: copy/link target rewrites and added section anchors; new Partner tab and partner page (server-side leaderboard fetch); Tournament overview UI; Biggest Bot Wins component API simplified; Pros-vs-Bots chart relabeled and enriched with per-quarter significance and legend; dataset and small metadata edits.

Changes

Cohort / File(s)	Summary
Methodology & copy `front_end/src/app/(futureeval)/futureeval/components/futureeval-methodology-sections.tsx`, `front_end/src/app/(futureeval)/futureeval/components/futureeval-methodology-content.tsx`	Added inline Links (including arXiv citation), rewrote paragraph/link text and section references; updated Bot Tournaments card href.
Benchmark headers & subtitle layout `front_end/src/app/(futureeval)/futureeval/components/benchmark/futureeval-benchmark-headers.tsx`, `front_end/src/app/(futureeval)/futureeval/components/benchmark/futureeval-model-benchmark.tsx`	Renamed header to "Pros vs. Bots", adjusted subtitle copy; converted subtitle layout from grid to responsive flex and repositioned banner.
Section anchors & tabs `front_end/src/app/(futureeval)/futureeval/components/benchmark/futureeval-benchmark-tab.tsx`, `front_end/src/app/(futureeval)/futureeval/components/futureeval-tabs-shell.tsx`, `front_end/src/app/(futureeval)/futureeval/components/futureeval-tabs.tsx`	Added explicit container IDs (`performance-over-time-graph`, `biggest-bot-wins-graph`, `pros-vs-bots-graph`, `bot-tournaments-graph`); added `partner` to Section union and registered Partner tab.
Partner page & tab `front_end/src/app/(futureeval)/futureeval/components/futureeval-partner-tab.tsx`, `front_end/src/app/(futureeval)/futureeval/partner/page.tsx`	New Partner tab component and server-side partner page that fetches global leaderboard; exports `metadata` and default async page.
Biggest Bot Wins — API & data `front_end/src/app/(futureeval)/futureeval/components/benchmark/futureeval-biggest-bot-wins.tsx`, `front_end/src/app/(futureeval)/futureeval/data/biggest-bot-wins.ts`	Removed navigation props (`url`, `commentUrl`) from ForecastDial/QuoteBlock and simplified wrappers; updated dataset entries (author swaps, added/removed loss entries).
Pros vs. Bots chart & config `front_end/src/app/(futureeval)/futureeval/components/benchmark/pros-vs-bots/futureeval-pros-vs-bots-chart.tsx`, `front_end/src/app/(futureeval)/futureeval/components/benchmark/pros-vs-bots/config.ts`	Y-axis relabeled to "Pro Lead Over Bots"; added significance types, color fills, labels, ordering, significance icons, QuarterSummaryTable, binary-only season handling, responsive two-column layout, tooltip/legend/CI UI; new local components added.
Performance-over-time copy `front_end/src/app/(futureeval)/futureeval/components/benchmark/performance-over-time/futureeval-benchmark-forecasting-performance.tsx`	Tweaked messaging text and added styling classes to the info block.
Participate / tournaments UI `front_end/src/app/(futureeval)/futureeval/components/futureeval-participate-tab.tsx`, `front_end/src/app/(futureeval)/futureeval/components/futureeval-tournaments.tsx`	Added FutureEvalTournamentOverview (four-card grid) and integrated it; replaced static block with clickable "Ready to compete?" card; small resource copy changes.
Orbit metadata tweaks `front_end/src/app/(futureeval)/futureeval/components/orbit/metaculus-hub.tsx`	Small copy edits to displayed metadata lines (counts/years).

Sequence Diagram(s)

sequenceDiagram
    participant Browser
    participant NextServer as Next.js Server (app/partner/page)
    participant LeaderboardAPI as ServerLeaderboardApi
    participant LeaderboardService as Leaderboard data source

    Browser->>NextServer: GET /futureeval/partner (SSR)
    NextServer->>LeaderboardAPI: getGlobalLeaderboard(null, null, "manual", "Global Bot Leaderboard")
    LeaderboardAPI->>LeaderboardService: fetch leaderboard data
    LeaderboardService-->>LeaderboardAPI: leaderboard payload
    LeaderboardAPI-->>NextServer: mapped leaderboard
    NextServer-->>Browser: Rendered FutureEvalPartnerPage (HTML with safeLeaderboard)
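The diagram covers only the success path. If the page renders safeLeaderboard as a guarded value when the API call fails (an assumption based on the name; the fallback logic is not shown in this PR), the pattern can be sketched as below. fetchSafeLeaderboard is a hypothetical helper, and the fetcher is passed in as a thunk so the sketch makes no claim about ServerLeaderboardApi's real types.

```typescript
// Hypothetical helper (not from the PR): run a leaderboard fetch and fall back
// to a default value instead of letting the server render fail.
async function fetchSafeLeaderboard<T>(
  fetchLeaderboard: () => Promise<T>,
  fallback: T
): Promise<T> {
  try {
    return await fetchLeaderboard();
  } catch (error) {
    // Log and degrade gracefully; the real page may handle failures differently.
    console.error("Leaderboard fetch failed:", error);
    return fallback;
  }
}

// Possible call site in the partner page (assumed, not taken from the PR):
// const safeLeaderboard = await fetchSafeLeaderboard(
//   () => ServerLeaderboardApi.getGlobalLeaderboard(null, null, "manual", "Global Bot Leaderboard"),
//   null
// );
```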

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Futureeval leaderboard entries fix #4263: Overlapping leaderboard utilities and mapped aggregates/bots that align with server leaderboard fetching and data-shape usage.
FutureEval Branding & Page Updates #4001: Related edits to FutureEval components (layout, charts, anchors, props) that intersect with this PR.

Suggested reviewers

elisescu
ncarazon
lsabor

Poem

🐰
I hopped through links and stitched a seam,
I renamed charts and tuned the theme.
Tabs and cards in tidy rows,
Significance blooms where data flows.
A tiny hop for FutureEval dreams ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title 'FutureEval v2.2' is vague and generic, lacking specific information about the actual changes in the pull request.	Consider using a more descriptive title that highlights the main changes, such as 'Update FutureEval UI with improved navigation links and chart enhancements' or 'Refactor FutureEval benchmark visualizations and add partner tab'.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

github-actions · 2026-02-06T22:08:45Z

🧹 Preview Environment Cleaned Up

The preview environment for this PR has been destroyed.

Resource	Status
🌐 Preview App	✅ Deleted
🗄️ PostgreSQL Branch	✅ Deleted
⚡ Redis Database	✅ Deleted
🔧 GitHub Deployments	✅ Removed
📦 Docker Image	⚠️ Retained (auto-cleanup via GHCR policies)

Cleanup triggered by PR close at 2026-02-07T08:09:01Z

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `front_end/src/app/(futureeval)/futureeval/components/benchmark/pros-vs-bots/futureeval-pros-vs-bots-chart.tsx`:
- Around lines 897-900: SignificanceLegendIcon pulls hardcoded colors from
LEGEND_ICON_COLORS and therefore ignores dark mode. Update it to use the same
theme-aware color resolution as SignificanceIconPoint, either by 1) accepting a
resolved color prop and passing getThemeColor(SIGNIFICANCE_FILL[type]) at the
call site, or 2) calling the theme helper inside the component (import
getThemeColor and compute getThemeColor(SIGNIFICANCE_FILL[type])) in place of
LEGEND_ICON_COLORS[type]. Adjust any callers of SignificanceLegendIcon
accordingly so the legend icons match SignificanceIconPoint in both dark and
light themes.
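A minimal sketch of the shared resolution step, to show why routing both icons through one resolver keeps them in sync. resolveFill and legendIconColor are hypothetical stand-ins for getThemeColor, and the `{ DEFAULT, dark }` shape is an assumption: only the DEFAULT hex values appear in this PR, so the dark values below are placeholders.

```typescript
type SignificanceStatus = "significant_win" | "non_significant_win" | "loss";
type Theme = "light" | "dark";

// Assumed shape: DEFAULT is the light-theme color (as in config.ts);
// `dark` is a hypothetical dark-theme variant added for illustration.
type ThemedColor = { DEFAULT: string; dark: string };

const SIGNIFICANCE_FILL: Record<SignificanceStatus, ThemedColor> = {
  significant_win: { DEFAULT: "#16a34a", dark: "#4ade80" }, // dark value is a placeholder
  non_significant_win: { DEFAULT: "#ca8a04", dark: "#facc15" }, // dark value is a placeholder
  loss: { DEFAULT: "#dc2626", dark: "#f87171" }, // dark value is a placeholder
};

// Hypothetical stand-in for getThemeColor: pick the variant for the active theme.
function resolveFill(color: ThemedColor, theme: Theme): string {
  return theme === "dark" ? color.dark : color.DEFAULT;
}

// Legend icons and chart points call the same resolver, so they cannot drift apart.
function legendIconColor(type: SignificanceStatus, theme: Theme): string {
  return resolveFill(SIGNIFICANCE_FILL[type], theme);
}
```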

🧹 Nitpick comments (3)

front_end/src/app/(futureeval)/futureeval/components/futureeval-methodology-sections.tsx (1)

644-645: Minor: section id no longer matches the new title.

The section id is pros-beat-bots but the visible heading is now "Pros vs. Bots." Consider updating the id to pros-vs-bots for consistency (and updating the #pros-beat-bots link on Line 632 accordingly). Not urgent since the anchor still works, but the semantic mismatch could confuse future maintainers.
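If section ids were derived from their titles, this kind of drift could not happen. The slugify helper below is hypothetical (nothing in the PR says ids are generated this way); it only shows that the title "Pros vs. Bots" yields pros-vs-bots, the id suggested above.

```typescript
// Hypothetical helper: derive a URL-fragment id from a visible section title.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9\s-]/g, "") // drop punctuation such as "."
    .trim()
    .replace(/[\s-]+/g, "-"); // collapse runs of whitespace/hyphens into one "-"
}

const sectionId = slugify("Pros vs. Bots"); // "pros-vs-bots"
const href = `#${sectionId}`; // would replace the "#pros-beat-bots" link target
```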
front_end/src/app/(futureeval)/futureeval/components/benchmark/pros-vs-bots/futureeval-pros-vs-bots-chart.tsx (2)
891-895: LEGEND_ICON_COLORS duplicates the DEFAULT values from SIGNIFICANCE_FILL in config.ts.

These hex values are identical to SIGNIFICANCE_FILL[status].DEFAULT. If the config colors change, the legend will fall out of sync. Consider importing and reusing the config values instead.
♻️ Suggested fix
-const LEGEND_ICON_COLORS: Record<SignificanceStatus, string> = {
-  significant_win: "#16a34a",
-  non_significant_win: "#ca8a04",
-  loss: "#dc2626",
-};
+const LEGEND_ICON_COLORS: Record<SignificanceStatus, string> = {
+  significant_win: SIGNIFICANCE_FILL.significant_win.DEFAULT,
+  non_significant_win: SIGNIFICANCE_FILL.non_significant_win.DEFAULT,
+  loss: SIGNIFICANCE_FILL.loss.DEFAULT,
+};
797-952: SignificanceIconPoint and SignificanceLegendIcon share nearly identical SVG drawing logic.

Both components render the same three icon shapes (plus, checkmark, X inside a circle) with the same proportions. If the icon design changes, both must be updated in lockstep. Consider extracting a shared helper that accepts cx, cy, r, color, and significance to render the inner glyph.
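A minimal sketch of such a helper, written as a pure function that returns SVG path data so both components could render it in a `<path d=...>` element. glyphPath and the Glyph names are hypothetical, the proportions are illustrative, and the mapping from significance status to glyph is left to the callers because the review does not specify it.

```typescript
type Glyph = "plus" | "check" | "cross";

// Hypothetical shared helper: builds the SVG path data ("d" attribute) for the
// glyph drawn inside a circle of radius r centred at (cx, cy).
// Proportions are illustrative, not taken from the chart.
function glyphPath(glyph: Glyph, cx: number, cy: number, r: number): string {
  const s = r * 0.5; // half-length of each stroke relative to the circle radius
  const builders: Record<Glyph, () => string> = {
    // Horizontal bar, then vertical bar.
    plus: () => `M${cx - s} ${cy} H${cx + s} M${cx} ${cy - s} V${cy + s}`,
    // Short down-stroke, then long up-stroke (SVG y grows downward).
    check: () =>
      `M${cx - s} ${cy} L${cx - s / 3} ${cy + s * 0.6} L${cx + s} ${cy - s * 0.6}`,
    // Two diagonals.
    cross: () =>
      `M${cx - s} ${cy - s} L${cx + s} ${cy + s} M${cx + s} ${cy - s} L${cx - s} ${cy + s}`,
  };
  return builders[glyph]();
}
```

Each component would keep drawing its own circle and pick the glyph for its status, so a change to a shape's geometry happens in one place.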

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

front_end/src/app/(futureeval)/futureeval/components/benchmark/pros-vs-bots/futureeval-pros-vs-bots-chart.tsx (1)
386-404: ⚠️ Potential issue | 🟡 Minor

Hardcoded whisker SVG stroke color won't adapt to dark mode.

The #91999E stroke on the 95% CI whisker icon is hardcoded. In dark mode this may have poor contrast against a dark background. Consider using getThemeColor with a theme-aware color token (e.g., METAC_COLORS.gray[500]) like the rest of the chart does.

Since this is an inline SVG, one approach is to set stroke="currentColor" and let the parent <span>'s text color (which already uses text-gray-700 dark:text-gray-700-dark) cascade:
Proposed fix
              <line
                y1="1.83398"
                x2="13"
                y2="1.83398"
-               stroke="#91999E"
+               stroke="currentColor"
                strokeWidth="2"
              />
              <path
                d="M6.5 15.0391L6.5 2.03906"
-               stroke="#91999E"
+               stroke="currentColor"
                strokeWidth="2"
              />
              <line
                y1="14.959"
                x2="13"
                y2="14.959"
-               stroke="#91999E"
+               stroke="currentColor"
                strokeWidth="2"
              />

🧹 Nitpick comments (2)

front_end/src/app/(futureeval)/futureeval/components/futureeval-partner-tab.tsx (1)

42-62: Semantic heading level doesn't match visual typography.

<h3> is styled with FE_TYPOGRAPHY.h1. Since this component is embedded in a page that likely has its own <h1>, the lower semantic level is understandable — but the mismatch between the tag and the typography token name could confuse future maintainers. Consider using <h2> here (since it's the top-level heading within this tab), which would also give a cleaner h2 → h3 → h4 hierarchy across the subcomponents.
front_end/src/app/(futureeval)/futureeval/components/benchmark/pros-vs-bots/futureeval-pros-vs-bots-chart.tsx (1)
798-873: QuarterSummaryTable — theme-aware colors used correctly, but getStatus uses linear search.

The table correctly resolves theme colors via getThemeColor(SIGNIFICANCE_FILL[status]), which is the right approach (and avoids the LEGEND_ICON_COLORS dark-mode issue). The getStatus helper uses .find() per category, which is fine given the small dataset sizes expected here.

One small note: SignificanceLegendIcon (called on Line 856) still uses the hardcoded LEGEND_ICON_COLORS rather than the theme-resolved col.color that is already computed. You could pass the resolved color as a prop to keep the icons consistent with the label color in dark mode (this ties into the existing review comment on LEGEND_ICON_COLORS).
Proposed approach — pass resolved color to icon
-const SignificanceLegendIcon: React.FC<{ type: SignificanceStatus }> = ({
-  type,
+const SignificanceLegendIcon: React.FC<{ type: SignificanceStatus; color?: string }> = ({
+  type,
+  color: colorProp,
 }) => {
-  const color = LEGEND_ICON_COLORS[type];
+  const color = colorProp ?? LEGEND_ICON_COLORS[type];
Then at the call site in QuarterSummaryTable:
-                   <SignificanceLegendIcon type={col.status} />
+                   <SignificanceLegendIcon type={col.status} color={col.color} />
This lets the icon inherit the already-resolved theme color without needing its own hook.

coderabbitai

Actionable comments posted: 3

🤖 Fix all issues with AI agents

In `front_end/src/app/(futureeval)/futureeval/components/benchmark/pros-vs-bots/futureeval-pros-vs-bots-chart.tsx`:
- Around lines 399-428: The SVG legend uses a hardcoded stroke="#91999E", which
does not adapt to dark mode. In the FutureevalProsVsBotsChart component, replace
the literal "#91999E" with the theme-aware color already computed as gridStroke
(the variable defined around line 269) by passing it to the <line> and <path>
elements as stroke={gridStroke} or style={{ stroke: gridStroke }}, so the legend
color follows the current theme.

In `front_end/src/app/(futureeval)/futureeval/components/futureeval-methodology-sections.tsx`:
- Around lines 447-463: The Link to the model leaderboard uses the wrong path.
In the FutureEvalMethodologySections JSX (the Link in the "continuously update
our" sentence), change the href from "/futureeval#model-leaderboard" to
"/futureeval/methodology#model-leaderboard" so it points to the anchor that
FutureEvalMethodologySections renders. Also search for any other occurrences of
"/futureeval#model-leaderboard" and replace them with
"/futureeval/methodology#model-leaderboard".

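The follow-up search-and-replace can be done with any editor; as an illustration, here is a pure function that rewrites the stale href in a file's text and counts the replacements. fixModelLeaderboardLinks is hypothetical and handles only this one literal string. Because the new path does not contain the old one, running it twice is harmless.

```typescript
const OLD_HREF = "/futureeval#model-leaderboard";
const NEW_HREF = "/futureeval/methodology#model-leaderboard";

// Hypothetical helper: replace every occurrence of the stale href in a source file's text.
function fixModelLeaderboardLinks(source: string): { text: string; replaced: number } {
  const parts = source.split(OLD_HREF);
  return { text: parts.join(NEW_HREF), replaced: parts.length - 1 };
}
```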
In `front_end/src/app/(futureeval)/futureeval/data/biggest-bot-wins.ts`:
- Around lines 117-121: botQuote currently quotes a human forecaster ("MWG"),
which violates the bot naming pattern. In biggest-bot-wins.ts, either move the
MWG quote from the botQuote object into the proQuote field, or replace botQuote
with a genuine bot's quote. Either way, botQuote's author must match a bot
identifier pattern used in this file (e.g., starts with "mf-bot-", "metac-",
"jlbot", or "bestworldbot"), and the MWG quote, if kept, belongs in proQuote.
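A lightweight data check could catch this class of mistake automatically. The sketch uses the prefixes listed in the comment; the Quote/BotWinEntry shapes and the isBotAuthor and findMislabeledBotQuotes names are hypothetical, since the real entry type in biggest-bot-wins.ts is not shown here.

```typescript
// Prefixes taken from the review comment; the real list may be longer.
const BOT_AUTHOR_PREFIXES = ["mf-bot-", "metac-", "jlbot", "bestworldbot"];

// Hypothetical, simplified entry shape (the real data file may differ).
interface Quote {
  author: string;
  text: string;
}
interface BotWinEntry {
  title: string;
  botQuote?: Quote;
  proQuote?: Quote;
}

// Case-insensitive prefix match against the known bot naming patterns.
function isBotAuthor(author: string): boolean {
  const name = author.toLowerCase();
  return BOT_AUTHOR_PREFIXES.some((prefix) => name.startsWith(prefix));
}

// Returns the titles of entries whose botQuote is attributed to a non-bot author.
function findMislabeledBotQuotes(entries: BotWinEntry[]): string[] {
  return entries
    .filter((e) => e.botQuote !== undefined && !isBotAuthor(e.botQuote.author))
    .map((e) => e.title);
}
```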

🧹 Nitpick comments (2)

front_end/src/app/(futureeval)/futureeval/data/biggest-bot-wins.ts (1)

192-212: Consider removing the commented-out entry instead of leaving it as dead code.

If this entry is no longer needed, removing it entirely keeps the data file clean. Version control already preserves the history if it ever needs to be restored.
front_end/src/app/(futureeval)/futureeval/components/benchmark/pros-vs-bots/futureeval-pros-vs-bots-chart.tsx (1)
121-155: Imperative SVG <defs> injection is fragile inside a React-managed tree.

This useLayoutEffect manually creates and inserts SVG elements into a DOM node owned by Victory. If Victory ever re-creates or replaces the <svg> element (e.g. on certain prop changes), the injected <defs> will silently vanish until the next effect trigger. Additionally, the width dependency causes the pattern to be torn down and re-inserted on every resize even when the colors haven't changed.

A more resilient approach: render the <defs> declaratively outside the chart (e.g. a hidden 0×0 <svg> with the pattern) so the definition is always present and React-managed, and reference it via url(#binary-stripe) as you already do.
Example: declarative pattern defs
+ {/* Declarative SVG defs — always present, React-managed */}
+ <svg width={0} height={0} style={{ position: "absolute" }}>
+   <defs>
+     <pattern
+       id="binary-stripe"
+       width="8"
+       height="8"
+       patternUnits="userSpaceOnUse"
+       patternTransform="rotate(45)"
+     >
+       <rect width="4" height="8" fill={s1Color} />
+       <rect x="4" width="4" height="8" fill={s2Color} />
+     </pattern>
+   </defs>
+ </svg>
Then remove the useLayoutEffect on lines 121-155 entirely.

Updated minor copy

e3e6a2a

CodexVeritas temporarily deployed to testing_env February 6, 2026 18:00 — with GitHub Actions Inactive

CodexVeritas added 5 commits February 6, 2026 13:11

various copy updates

4be6b48

Skill score doc update

0dc1dc4

Initial update for participate page

899a48f

Continued to update participants page

b50087b

Added fixes to the bot win table

23c9ad6

CodexVeritas temporarily deployed to testing_env February 6, 2026 21:55 — with GitHub Actions Inactive

Added partner page and updated pros vs bots graph

b8ab0d5

CodexVeritas temporarily deployed to testing_env February 6, 2026 23:34 — with GitHub Actions Inactive

coderabbitai bot reviewed Feb 6, 2026

.../(futureeval)/futureeval/components/benchmark/pros-vs-bots/futureeval-pros-vs-bots-chart.tsx

Updated pros vs bots graph

f0101b8

CodexVeritas temporarily deployed to testing_env February 7, 2026 00:01 — with GitHub Actions Inactive

coderabbitai bot reviewed Feb 7, 2026

CodexVeritas added 2 commits February 6, 2026 23:47

Updated pros vs bots graph.

6a3479d

Updated pros vs bots graph

3e69c58

CodexVeritas had a problem deploying to testing_env February 7, 2026 07:37 — with GitHub Actions Error

Formatting improvements

0ee865c

CodexVeritas temporarily deployed to testing_env February 7, 2026 07:39 — with GitHub Actions Inactive

coderabbitai bot reviewed Feb 7, 2026

Updated

78ddbcb

CodexVeritas had a problem deploying to testing_env February 7, 2026 07:48 — with GitHub Actions Error

Merge branch 'main' into futureeval-v2.2

2f90318

CodexVeritas temporarily deployed to testing_env February 7, 2026 07:48 — with GitHub Actions Inactive

CodexVeritas merged commit 0004811 into main Feb 7, 2026
14 checks passed

CodexVeritas deleted the futureeval-v2.2 branch February 7, 2026 08:09

CodexVeritas merged 13 commits into main from futureeval-v2.2

1 participant
