At the core of the Canary dashboard is a daily sentiment score for each of the six AI sector categories. Understanding what these scores represent, and what they do not, is the foundation for reading the dashboard intelligently.
The Sentiment Score
Each article processed by Canary receives a sentiment score on a continuous scale from −1 to +1. A score near +1 indicates strongly positive coverage; a score near −1 indicates strongly negative coverage; and a score near 0 indicates neutral or balanced reporting. These are not simple positive or negative flags. They are continuous values that reflect the degree and direction of sentiment expressed in the article’s language.
The scoring is performed by Anthropic’s large language model (LLM), which reads each article in full context rather than simply counting positive or negative words. This matters because financial and technology journalism frequently uses language that is positive in form but negative in implication, or vice versa. A language model understands these distinctions; a keyword counter does not.
Relevance Weighting
Not all articles are equally relevant to the AI sector. A brief mention of a semiconductor company in a general technology roundup carries less information about AI sector dynamics than a dedicated analysis of that company’s latest chip architecture. Canary accounts for this by assigning each article a relevance score between four and ten, based on how directly and substantively it addresses the category in question.
The daily sentiment score for each category is calculated as a relevance-weighted average. Articles with higher relevance scores contribute proportionally more to the overall figure. This prevents peripheral news from diluting the signal from the stories that genuinely matter, and ensures that a single highly relevant investigative piece is not drowned out by a dozen brief mentions.
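The weighting described above can be sketched in plain Python. The tuple format and the function name here are illustrative, not Canary’s actual implementation; only the relevance range (4–10) and the weighted-average logic follow the description above.

```python
def weighted_daily_score(articles):
    """Relevance-weighted average of a day's article sentiment scores.

    `articles` is a list of (sentiment, relevance) pairs, where sentiment
    is in [-1, +1] and relevance is in [4, 10] as described above.
    """
    total_weight = sum(relevance for _, relevance in articles)
    if total_weight == 0:
        return 0.0
    return sum(sentiment * relevance for sentiment, relevance in articles) / total_weight

# One highly relevant piece (relevance 10) outweighs several brief mentions (relevance 4):
articles = [(0.8, 10), (-0.1, 4), (0.0, 4), (-0.2, 4)]
print(round(weighted_daily_score(articles), 3))  # → 0.309
```

Note how the single relevance-10 article pulls the daily score well above zero even though three of the four articles are neutral or negative, which is exactly the behaviour the weighting is designed to produce.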
The 95% Confidence Interval
Each category’s daily score is shown alongside a 95% confidence interval (CI). This range represents the statistical uncertainty around the day’s estimate. On days with high article volume and consistent sentiment across sources, the interval is narrow. On days with few articles or widely divergent views, the interval is wide.
The confidence interval is important for interpretation. A score of +0.3 with a narrow interval is a much cleaner signal than a score of +0.3 with a wide interval that spans from −0.2 to +0.8. In the latter case, different sources are telling substantially different stories, and the aggregate score conceals as much as it reveals.
Formally, a 95% confidence interval means that if the day’s score were re-estimated many times from comparable samples of articles, about 95 out of 100 of the resulting intervals would contain the true underlying sentiment. In practice, a narrow band reflects strong agreement across the day’s sources; a wide band reflects an estimate built on fewer or more divergent articles.
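The relationship between source agreement and interval width can be illustrated with a simple normal-approximation sketch. This is not Canary’s exact method, which is unspecified here and may involve relevance weighting or bootstrapping; the function name and sample data are hypothetical.

```python
import math
import statistics

def sentiment_ci(scores, z=1.96):
    """Approximate 95% confidence interval for the mean of a day's
    sentiment scores, using a normal approximation (z = 1.96).
    A sketch only; the dashboard's exact method may differ.
    """
    n = len(scores)
    mean = statistics.fmean(scores)
    se = statistics.stdev(scores) / math.sqrt(n)  # standard error of the mean
    return mean - z * se, mean + z * se

# Many consistent scores yield a narrow band; few divergent scores yield a wide one.
consistent = [0.30, 0.28, 0.33, 0.31, 0.29, 0.32, 0.30, 0.27]
divergent = [0.8, -0.2, 0.6, -0.1]
lo_c, hi_c = sentiment_ci(consistent)
lo_d, hi_d = sentiment_ci(divergent)
print(hi_c - lo_c < hi_d - lo_d)  # → True
```

Both samples have a positive mean, but only the first produces a band narrow enough to read as a clean signal, mirroring the +0.3 example above.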
Statistical Significance versus History
Perhaps the most practically useful column in the category breakdown table is the comparison against historical norms. This compares today’s sentiment score against the historical distribution for that category using Welch’s t-test, a standard two-sample test that compares means without assuming the two samples share the same variance. Here it asks whether the mean of today’s article scores is genuinely unusual relative to the category’s own history.
A category flagged as significantly above normal is not simply positive today; it is positive by a margin that is statistically unlikely to have occurred by chance, given its own history. This distinction matters because different categories have different typical ranges. A category that routinely scores between +0.1 and +0.4 is not behaving unusually when it reads +0.35. But a category that typically sits close to zero is behaving unusually at +0.35, and the statistical test captures precisely that difference.
The significance flags on the dashboard, such as “GPU Cloud significantly above normal” or “Chips below normal”, are the output of this test. They direct attention to where something genuinely unusual is happening, rather than leaving the reader to compare numbers manually. Welch’s t-test produces a p-value: the probability of observing a reading at least as extreme as today’s if the category were behaving entirely normally. A p-value below 0.05 means that today’s score would arise less than five per cent of the time under ordinary day-to-day variation.
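The test itself can be implemented directly from its two defining formulas: the Welch t statistic and the Welch–Satterthwaite degrees of freedom. The sketch below computes both; converting t and df into an exact p-value requires the t-distribution CDF (e.g. from scipy), so here a large |t| simply stands in for a small p-value. All data values are invented for illustration.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two samples with possibly unequal variances and sizes."""
    n1, n2 = len(sample_a), len(sample_b)
    m1, m2 = statistics.fmean(sample_a), statistics.fmean(sample_b)
    v1, v2 = statistics.variance(sample_a), statistics.variance(sample_b)
    se2 = v1 / n1 + v2 / n2                  # squared standard error of the difference
    t = (m1 - m2) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

# Hypothetical data: today's article scores vs. the category's historical scores.
today = [0.45, 0.52, 0.38, 0.41, 0.50, 0.47]
history = [0.05, -0.02, 0.10, 0.00, 0.08, -0.05, 0.03, 0.06, -0.01, 0.04]
t, df = welch_t(today, history)
# A |t| far beyond ~2 corresponds to a p-value well below 0.05,
# so a day like this would be flagged as significantly above normal.
print(abs(t) > 2.0)  # → True
```

Because the test is run against each category’s own history, a category with a naturally high baseline is not flagged merely for being positive, which is the behaviour described above.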
A Note on Article Volume
On days with low publication volume — Sundays and public holidays in particular — confidence intervals widen and statistical significance thresholds become harder to reach. The dashboard notes the article count each day; a reading based on 80 articles carries somewhat less weight than one based on 300. Both are valid signals; the former should simply be interpreted with a little more caution.
What the Scores Do Not Tell You
Sentiment scores measure the tone of news coverage, not the quality of the underlying businesses or the direction of their share prices. A strong positive sentiment score means the news about a category has been broadly positive; it does not mean those companies’ shares will rise, or that their fundamentals have improved. News sentiment and price are related but distinct signals, and Canary makes no predictions about market movements.
What the scores do tell you, consistently and daily, is how the AI sector conversation is developing. That is valuable information in its own right, and a critical input to the deeper signals described on the Narrative Frames and Semantic Volatility Index pages.