Canary 2.0 – Project Conclusion & Final Report

What Canary 2.0 was

Canary 2.0 was a personal research project built and operated by Roger Smolski between February and June 2026. It was not a commercial product and was never intended to be one. The project had a single purpose: to test whether quantitative analysis of AI sector news sentiment could produce statistically meaningful signals about how that sector behaves.

The system ran for approximately 100 days. Each morning it collected between 800 and 1,000 news articles from three commercial APIs, scored each article for sentiment and relevance using Anthropic’s Claude Haiku, computed a daily sentiment score for six AI industry categories, and published the results to this website. In parallel it computed the Semantic Volatility Index — a composite indicator designed to detect shifts in the character of AI sector news coverage before those shifts appeared in the headline sentiment numbers.

The six categories tracked were AI Chips and Hardware, AI Platform Hyperscalers, AI Enterprise Software, Data Centre Infrastructure, AI GPU Cloud, and Pure Play AI. Across the full run, approximately 55,000 articles were scored.

The pre-registered hypothesis

Before collecting any price data and before conducting any analysis, we formally registered the following hypothesis:

Does a 3-day rolling change in category sentiment score predict the 1-day-forward equal-weighted log return of that category’s stock basket?

Pre-registration means the hypothesis, the statistical test, the significance threshold, and the minimum data requirements were all written down and fixed before we looked at whether the answer was yes or no. This discipline exists to prevent the well-documented problem of researchers unconsciously adjusting their methods after seeing the data. The hypothesis was registered on 28 March 2026.

The test used Pearson correlation with a one-tailed p-value, testing specifically whether positive sentiment change predicted positive returns. The significance threshold was p < 0.0083, which is the conventional 0.05 level divided by six under a Bonferroni correction for one test per category.
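As a rough illustration of how the paired observations for this test might be assembled, the sketch below computes a 3-day sentiment change, a 1-day-forward equal-weighted basket log return, and their Pearson correlation. The function names, the toy inputs, and the reading of "3-day rolling change" as today's score minus the score three days earlier are assumptions made for illustration, not the project's actual code. It also assumes the sentiment and price series are already aligned on the same trading days.

```python
import math

ALPHA_FAMILY = 0.05                              # assumed family-wise error rate
N_CATEGORIES = 6
ALPHA_PER_TEST = ALPHA_FAMILY / N_CATEGORIES     # Bonferroni threshold, about 0.0083


def sentiment_change(scores, lag=3):
    """Score on day t minus score on day t-lag (None where undefined)."""
    return [None if t < lag else scores[t] - scores[t - lag]
            for t in range(len(scores))]


def forward_basket_log_return(prices_by_ticker):
    """Equal-weighted mean of each stock's log return from day t to day t+1."""
    tickers = list(prices_by_ticker)
    n_days = len(prices_by_ticker[tickers[0]])
    out = []
    for t in range(n_days - 1):
        rets = [math.log(prices_by_ticker[k][t + 1] / prices_by_ticker[k][t])
                for k in tickers]
        out.append(sum(rets) / len(rets))
    out.append(None)  # the last day has no forward return
    return out


def paired_observations(scores, prices_by_ticker, lag=3):
    """Keep only the days where both the signal and the outcome are defined."""
    dx = sentiment_change(scores, lag)
    fr = forward_basket_log_return(prices_by_ticker)
    pairs = [(x, y) for x, y in zip(dx, fr) if x is not None and y is not None]
    return [x for x, _ in pairs], [y for _, y in pairs]


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy data for one hypothetical category with a two-stock basket.
toy_scores = [0.10, 0.20, 0.15, 0.30, 0.25, 0.40, 0.35]
toy_prices = {"AAA": [100, 101, 100, 102, 103, 102, 104],
              "BBB": [50, 50.5, 51, 50, 51.5, 52, 51]}
xs, ys = paired_observations(toy_scores, toy_prices)
r = pearson_r(xs, ys)
```

Losing the first three days to the lag and the last day to the forward return is one reason the number of usable observations is smaller than the number of days the system ran.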

The result

The hypothesis was not supported.

Category	n	r	p (one-tailed)
AI Chips and Hardware	86	+0.035	0.374
AI Platform Hyperscalers	86	−0.171	0.942
AI Enterprise Software	86	−0.023	0.584
Data Centre Infrastructure	86	−0.100	0.821
AI GPU Cloud	86	+0.164	0.066
Pure Play AI	86	−0.162	0.932

No category reached the significance threshold of p < 0.0083. Four of the six correlations were slightly negative. AI GPU Cloud produced the closest result, a positive correlation of r = +0.164 at p = 0.066, but this falls short of even an uncorrected 0.05 level, let alone the corrected threshold, and should not be treated as a finding. It is noted only as a directional indication that would warrant a more focused follow-up study with a longer observation period.
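For readers who want to check the table, a one-tailed p-value can be recovered from r and n through the standard t-transform of a Pearson correlation, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The sketch below does this with the standard library only, integrating the Student's t density numerically. The helper names are hypothetical, and because the reported r values are rounded to three decimals, the output should agree with the table only approximately.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_sf(t, df, steps=2000):
    """Upper-tail probability P(T > t), using Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                # P(0 < T < |t|)
    upper = 0.5 - area                  # P(T > |t|)
    return upper if t >= 0 else 1.0 - upper


def one_tailed_p(r, n):
    """P-value for H1: correlation > 0, given sample correlation r and size n."""
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    return t_sf(t, df)


# Reported correlations from the table above (n = 86 in every category).
reported_r = {
    "AI Chips and Hardware": 0.035,
    "AI Platform Hyperscalers": -0.171,
    "AI Enterprise Software": -0.023,
    "Data Centre Infrastructure": -0.100,
    "AI GPU Cloud": 0.164,
    "Pure Play AI": -0.162,
}
THRESHOLD = 0.05 / 6                    # Bonferroni-corrected, about 0.0083
p_values = {name: one_tailed_p(r, 86) for name, r in reported_r.items()}
significant = [name for name, p in p_values.items() if p < THRESHOLD]
```

Because the test is one-tailed in the positive direction, negative correlations map to p-values above 0.5, which is why the four negative categories show values such as 0.942 rather than small ones.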

The null result is not a surprise in retrospect. The academic literature on news sentiment and market returns consistently finds that news predicts the direction of volatility movements more reliably than it predicts price direction. The price of large, heavily covered AI sector stocks reflects available information quickly — faster than a daily sentiment scoring run can capture. The hypothesis was a reasonable starting point but was testing the wrong outcome variable at the wrong timescale.

What the system did well

The null result on the price prediction hypothesis does not mean the system failed analytically. The Semantic Volatility Index performed as designed throughout the project.

The SVI detected two complete narrative rotation events in real time. The first was a deterioration rotation in late March 2026, in which Market Correction and Regulatory Risk framing grew while Technical Breakthrough framing declined. During the active rotation phase the Jensen-Shannon Divergence sub-component (SC5) ranged from 0.015 to 0.027 and correctly flagged the shift several days before it became apparent in the headline sentiment numbers. The second was a recovery rotation in mid-April, a mirror image of the first, in which Technical Breakthrough and Financial Results framing recovered while risk framing declined. Both were documented and characterised in real time using the SVI framework.
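To make the Jensen-Shannon Divergence concrete, here is a minimal sketch of the divergence between two frame distributions. How SC5 actually chooses its baseline window, its logarithm base, and its frame set is not described here, so the base-2 logarithm, the function names, and the example shares below are assumptions for illustration only.

```python
import math


def kl_divergence(p, q, base=2.0):
    """Kullback-Leibler divergence KL(p || q); terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q, base=2.0):
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m, base) + 0.5 * kl_divergence(q, m, base)


def frame_jsd(baseline, current):
    """JSD between two {frame: share} mappings, normalised over shared keys."""
    frames = sorted(set(baseline) | set(current))

    def to_probs(shares):
        total = sum(shares.get(f, 0.0) for f in frames)
        return [shares.get(f, 0.0) / total for f in frames]

    return js_divergence(to_probs(baseline), to_probs(current))


# Hypothetical frame shares, not taken from the project's data.
baseline = {"Technical Breakthrough": 0.14, "Financial Results": 0.20,
            "Market Correction": 0.08, "Regulatory Risk": 0.08, "Other": 0.50}
current = {"Technical Breakthrough": 0.09, "Financial Results": 0.19,
           "Market Correction": 0.13, "Regulatory Risk": 0.12, "Other": 0.47}
sc5_like = frame_jsd(baseline, current)
```

With base-2 logarithms the divergence is bounded between 0 (identical distributions) and 1 (no overlap), so small readings can still mark a meaningful change when only a few percentage points of coverage move between frames.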

The most analytically interesting episode was the April-May Financial Results rotation. Between early April and mid-May, Financial Results framing in the AI sector corpus grew from approximately 20% of the frame distribution to a peak of 31.8%, while Technical Breakthrough framing fell from approximately 12% to 8%. We initially interpreted this as a structural reclassification — a permanent shift in how the AI sector was being discussed, from capability demonstration toward commercial deployment and earnings. Subsequent data falsified that interpretation. By late May, both frames had returned to their pre-rotation levels, confirming the rotation was cyclical — driven by the earnings season cycle — rather than structural. The system correctly detected the onset, tracked the peak, identified the decompression, and recorded the full return to baseline across approximately seven weeks.

The honest assessment is this: the SVI is a genuine detector of narrative regime shifts in sector news coverage. What it does not do — and what it was never designed to do — is predict whether those shifts will translate into price movements within a one-day window. That distinction was not fully appreciated at the start of the project. It is now.

The calibration lessons

Running a live system for 100 days produced two methodological lessons that will inform the next study.

The first concerns how to interpret frame rotations. A sustained movement in narrative framing that coincides with a known cyclical driver — an earnings season, a product announcement cluster, a regulatory news cycle — should be characterised as cyclical by default. A structural interpretation requires evidence spanning at least one full cycle. Seven weeks of data covering a single earnings period is not sufficient. This lesson was learned the hard way through the reclassification episode described above.

The second concerns hypothesis design. The right outcome variable for a news sentiment signal is realised volatility, not return direction. A better-designed hypothesis would test whether elevated SVI readings predict above-median realised volatility in constituent stocks over a 5 to 10 trading day window — a timescale consistent with how SVI signals actually develop and resolve, and an outcome variable that the academic literature supports connecting to news flow.
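A sketch of how the outcome variable for such a hypothesis might be built is shown below. It assumes realised volatility is defined as the square root of the sum of squared daily log returns over the forward window, which is one common definition. The follow-up study has not fixed its definition here, and the function names and the 5-day default are illustrative.

```python
import math
import statistics


def realised_volatility(log_returns):
    """Square root of the sum of squared daily log returns in the window."""
    return math.sqrt(sum(r * r for r in log_returns))


def forward_realised_vol(log_returns, window=5):
    """Realised volatility over days t+1..t+window for each day t.

    Days too close to the end of the series to fill the window get None.
    """
    n = len(log_returns)
    return [realised_volatility(log_returns[t + 1:t + 1 + window])
            if t + window < n else None
            for t in range(n)]


def above_median_flags(values):
    """True where a value exceeds the median of the defined values."""
    known = [v for v in values if v is not None]
    med = statistics.median(known)
    return [None if v is None else v > med for v in values]


# Toy daily log returns for one hypothetical stock.
toy_returns = [0.00, 0.03, 0.04, 0.00, -0.01, 0.02, -0.03, 0.01]
vol = forward_realised_vol(toy_returns, window=5)
high_vol = above_median_flags(vol)
```

Each day's flag could then be paired with that day's SVI reading, turning the question into whether elevated readings are followed by a higher share of above-median windows. Note that overlapping forward windows make consecutive observations dependent, which any significance test on this outcome would need to account for.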

Why the project is closing

The project closes at this point because the pre-registered hypothesis has been tested and answered, and because enough has been learned to design a substantially better study from scratch. Continuing to run the current system would accumulate more data but would not address the fundamental design issue — the hypothesis was mis-specified. The right response is to stop, document what was learned, and build the next study correctly from the beginning.

A follow-up study is in the planning stage. It will test a volatility prediction hypothesis rather than a return direction hypothesis, use a longer observation period, and apply the calibration lessons from this project to its analytical framework from day one.

The dashboard and daily data

The Canary 2.0 dashboard on this site shows the final state of the system as of 4 June 2026, the date the hypothesis test was run. Daily updates have concluded. The dashboard and all methodology pages remain live as a record of the project.
