Talk:Scoring rule

Untitled

The binary-decision scoring rule notation U(x,q) does not lend itself to multiclass scoring. I would like to better integrate the notation for binary and multiclass scoring rules, as the division does not need to be so stark. PurpleMage (talk) 03:13, 16 November 2010 (UTC)


The introduction should explain the usage not just in terms of human forecasting, but also in terms of pattern classifier calibration. This article is tricky, since p, our optimal probability, is called the 'forecasters personal probability belief' for forecasting, which does not make sense for a machine algorithm from which we still desire honesty. PurpleMage (talk) 05:04, 16 November 2010 (UTC)

Yes! If so, the article should make comparisons with estimation theory, for example maximum likelihood. It should also include some proofs. Kjetil Halvorsen 05:43, 2 August 2011 (UTC) — Preceding unsigned comment added by Kjetil1001 (talk • contribs)

I agree. Also, a proof showing specifically that a given rule is proper would be a good addition. 199.46.199.232 (talk) 01:21, 5 March 2012 (UTC)
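As a stand-in for the kind of proof requested above, here is a minimal numerical sketch (not a proof, and not taken from the article) of what "proper" means for a binary event. It assumes the negatively oriented convention (lower loss is better) and uses illustrative names such as `expected_loss` and `best_report` that are not from any source. For a true probability of 0.7 it searches a grid of possible reports for the one with the lowest expected loss. The Brier and logarithmic losses are proper; the absolute-error loss is included as an improper counterexample.

```python
import math


def expected_loss(loss, p, q):
    """Expected loss of reporting q when the event occurs with probability p."""
    return p * loss(1, q) + (1 - p) * loss(0, q)


def brier(y, q):
    """Quadratic (Brier) loss for a binary outcome y in {0, 1}."""
    return (y - q) ** 2


def log_loss(y, q):
    """Logarithmic loss: negative log of the probability given to what happened."""
    return -math.log(q if y == 1 else 1 - q)


def absolute(y, q):
    """Absolute-error loss, which is not a proper scoring rule."""
    return abs(y - q)


def best_report(loss, p, grid):
    """Report on the grid that minimizes the expected loss under probability p."""
    return min(grid, key=lambda q: expected_loss(loss, p, q))


# Candidate reports 0.01, 0.02, ..., 0.99 (endpoints excluded to keep log finite).
GRID = [i / 100 for i in range(1, 100)]
P_TRUE = 0.7

for name, loss in [("brier", brier), ("log", log_loss), ("absolute", absolute)]:
    print(name, best_report(loss, P_TRUE, GRID))
```

On this grid the Brier and logarithmic losses are minimized by reporting the true probability, while the absolute-error loss is minimized by the most extreme report available. The analytic argument for the Brier case is short: the expected loss p(1-q)^2 + (1-p)q^2 has derivative 2(q-p) in q, which vanishes only at q = p.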

Would it be possible to write the lead section of the article in a way that lets it be understood by common human beings (as opposed to mathematicians)? The third sentence alone contains at least three non-trivial concepts with which the reader needs to be familiar in order to understand just that one sentence, not to speak of the rest of the lead. The same sentence in addition (!) mentions that the probabilities of all possible outcomes need to sum to one. Given that one knows what a probability is, how does mentioning that the sum is 1 add anything useful to the reader's understanding of the topic? And if one does not know what probabilities are, then again, how does that help? See what a lead should be. I assert that the lead is already impenetrable to common human beings, and after that the reader is drowned in math without further ado. As with many other science-related articles, this article's target audience seems to be mathematicians, AFAICS. I assert that that's not the purpose of Wikipedia. Mathematicians have their own publishing universe that serves as their reference. Wikipedia's main target is the general public, and therefore the aim should as far as possible (!) be to allow the general public to understand the writing. I am aware that I am criticizing without improving the article. I guess I would if I felt competent to do so. Thanks TomasPospisek (talk) 21:56, 24 May 2020 (UTC)

Over three years later, and TomasPospisek's statement still applies. This article is not comprehensible to people who do not have a deep understanding of statistics, and it need not be that way, nor is it useful to keep it so.
Further, weird residual text remains: "A poorly calibrated forecaster might be encouraged to do better by a bonus system. A bonus system designed around a proper scoring rule will incentivize the forecaster to report probabilities equal to his personal beliefs." This is a statement about the psychology of motivation for (weather) forecasters, which is likely quite wrong, and to which the article (Bickel, E.J. (2007)) cited as support is actually irrelevant. Given the topic of the article, these two sentences have nothing useful to say about forecasting or scoring rules and should be deleted. This text comes from long ago, when the surrounding text was different, and although it wasn't useful then either, it made a bit more sense. 38.147.235.238 (talk) 22:07, 19 August 2023 (UTC)

What is a forecast scheme?

This term is used in the definition section without explanation. — Charles Stewart (talk) 13:31, 24 February 2017 (UTC)

I can't see a video under the link for "Video comparing spherical, quadratic and logarithmic scoring rules". MathieuPutz (talk) 22:12, 2 January 2023 (UTC)

add a proper paragraph "Comparison of scoring rules"

This paragraph should discuss the GIF in depth and explain the graphs that are visible in it. Biggerj1 (talk) 12:44, 1 September 2023 (UTC)

Also, when to use which scoring function is interesting; see the discussion in https://doi.org/10.1287/deca.1070.0089 Biggerj1 (talk) 21:51, 1 September 2023 (UTC)

Discuss Problem of extremely imbalanced dataset

Biggerj1 (talk) 06:38, 24 September 2023 (UTC)

https://stats.stackexchange.com/questions/489106/brier-score-and-extreme-class-imbalance Biggerj1 (talk) 06:39, 24 September 2023 (UTC)

Discussion on possible merging of this page

The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
To not merge; capable of expansion and scope different. Klbrain (talk) 10:51, 10 November 2024 (UTC)

At the top of this page, it has been suggested that this page be merged with Loss functions for classification, so I want to open a discussion about that.

I personally do not agree with this, as plenty of interesting research has been done on continuous scoring rules (both univariate and multivariate). Until recently, only CRPS was briefly mentioned as a continuous scoring rule.

In the past week, I added a variety of material on continuous scoring rules to this page, and I plan to summarize a number of comparison papers in order to create a full-fledged "comparison of scoring rules" section, as well as to expand the applications section, since scoring rules are often applied in machine learning. In my opinion, this is enough to warrant a separate page. CuriousDataScientist (talk) 13:31, 11 May 2024 (UTC)

I agree. In my POV, scoring rules are first and foremost about forecast verification, and should be treated separately from loss functions. It is clear that the topics overlap, but it would be misleading to merge them because 1) they stem from different branches of science, and it is good to acknowledge the contributions of different fields, at least for the sake of history; 2) losses and metrics are, IMHO, distinct notions. You might want to minimize a loss (meaning you would study the way it behaves in a minimization algorithm), while you expect a metric to give you information about a phenomenon/system. Scoring rules can be both; they are not "only" losses. 90.55.188.103 (talk) 12:51, 16 July 2024 (UTC)
Losses can be both as well. RMS loss is a common example of an easily-interpretable rule. Closed Limelike Curves (talk) 15:23, 18 July 2024 (UTC)
I agree with you that there's a lot of research on continuous scoring rules, so I think that any merge should go in the opposite direction (from loss functions for classification into this page). Loss functions for classification are a specific kind of scoring rule; specifically, they are an application of scoring to classification tasks (usually with binary/categorical predictions rather than probabilistic ones). Closed Limelike Curves (talk) 15:28, 18 July 2024 (UTC)
I oppose the merge. Loss functions aren't the same as scoring problems; for instance, the loss can have a regularization term. The emphasis of scoring functions should be on the goodness of fit; for loss functions, on training. Earlsofsandwich (talk) 16:50, 4 November 2024 (UTC)
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Unclear citation

The section "Interpretation of proper scoring rules" starts with the claim "All proper scoring rules are equal to weighted sums (integral with a non-negative weighting functional) of the losses in a set of simple two-alternative decision problems that use the probabilistic prediction, each such decision problem having a particular combination of associated cost parameters for false positive and false negative decisions. A strictly proper scoring rule corresponds to having a nonzero weighting for all possible decision thresholds. Any given proper scoring rule is equal to the expected losses with respect to a particular probability distribution over the decision thresholds; thus the choice of a scoring rule corresponds to an assumption about the probability distribution of decision problems for which the predicted probabilities will ultimately be employed, with for example the quadratic loss (or Brier) scoring rule corresponding to a uniform probability of the decision threshold being anywhere between zero and one."

I have been unable to verify this claim using the citations. I think a more explicit pointer to where this claim is made needs to be added. Niklas V Lehmann (talk) 13:17, 22 November 2024 (UTC)
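This does not settle the sourcing question, but the Brier-score part of the quoted claim can at least be checked numerically. The sketch below assumes one common parametrization of the two-alternative decision problem, which may differ from the one the cited sources use: act as if the event will occur when the forecast q exceeds the threshold c, with a false positive costing c and a false negative costing 1 - c. The function names are illustrative only. Under that assumption, integrating the decision loss over c with uniform weight on (0, 1) gives half the Brier loss, so the two agree up to a constant factor of 2.

```python
def cost_weighted_loss(y, q, c):
    """Loss of acting on forecast q in the decision problem with threshold c.

    Assumed setup: predict "positive" iff q > c; a false positive costs c,
    a false negative costs 1 - c, and correct decisions cost nothing.
    """
    if q > c:
        return c * (1 - y)      # acted, so a loss only if the event did not occur
    return (1 - c) * y          # did not act, so a loss only if the event occurred


def integrated_loss(y, q, n=100_000):
    """Midpoint-rule integral of the decision loss over thresholds c in (0, 1)."""
    h = 1.0 / n
    return sum(cost_weighted_loss(y, q, (k + 0.5) * h) for k in range(n)) * h


for y in (0, 1):
    for q in (0.1, 0.5, 0.9):
        brier = (y - q) ** 2
        approx = 2 * integrated_loss(y, q)   # factor 2 is only a normalization
        print(f"y={y} q={q}: brier={brier:.4f}  2*integral={approx:.4f}")
```

The closed form behind this is that for y = 1 the integral is (1 - q)^2 / 2 and for y = 0 it is q^2 / 2. A non-uniform weight over c would give a different proper scoring rule, which is the general statement that still needs an explicit citation.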