Talk:Percentile

From Wikipedia, the free encyclopedia

Clear inconsistencies in definition

The definition given for the percentile is "the value below which a given percentage of observations in a group of observations fall", yet in all of the worked examples the 100th percentile is assumed to be the largest element in the set, despite at least one of the observations (that element itself) not falling below that value. This inconsistency is not up to the standards of the rest of Wikipedia's mathematics entries. — Preceding unsigned comment added by 2601:1C1:C100:453C:E5FE:36B9:4D03:5E07 (talk) 21:06, 6 September 2017 (UTC)

Weighted Median

I changed the formula for weighted median as I believe it was incorrect. The individual weight of the sample should not be a factor in its percentile. I believe it should be (100/S_N)(S_n - 1/2), not (100/S_N)(S_n - w_n/2). — Preceding unsigned comment added by Howeman (talk · contribs) 02:08, 23 August 2013 (UTC)

Okay. Apparently my change has been reverted. The current formula is wrong. Let's say I have 1000 samples at 1, 2 samples at 2, and 1 sample at 3:

data = [1 2 3] weights = [1000 2 1]

s_n = [1000 1002 1003] S_N = 1003

Using the formula, p_n = [49.85 99.8 99.95]

Now, let's say we want to find the median (P = 50 on the percent scale used by p_n). The integer k for which p_k < P < p_{k+1} is 1. Applying the formula, we get that the median is greater than 1. Given that the value 1 overwhelmingly dominates the dataset, the median should be 1.
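The figures in this example can be checked with a short Python sketch. It assumes the article's rank formula p_n = (100/S_N)(S_n - w_n/2) as quoted in this thread, and it assumes that the value is then found by linear interpolation between the two neighbouring samples; the function names are illustrative only and do not come from the article.

```python
from itertools import accumulate


def percent_ranks(weights):
    """Rank formula quoted in this thread: p_n = 100 / S_N * (S_n - w_n / 2)."""
    cumulative = list(accumulate(weights))  # partial sums S_n
    total = cumulative[-1]                  # S_N
    return [100.0 / total * (s - w / 2.0) for s, w in zip(cumulative, weights)]


def interpolated_percentile(values, ranks, p):
    """Assumed interpolation step: linear between the ranks that bracket p."""
    if p <= ranks[0]:
        return values[0]
    if p >= ranks[-1]:
        return values[-1]
    for k in range(len(ranks) - 1):
        if ranks[k] <= p <= ranks[k + 1]:
            frac = (p - ranks[k]) / (ranks[k + 1] - ranks[k])
            return values[k] + frac * (values[k + 1] - values[k])


data = [1, 2, 3]
weights = [1000, 2, 1]
ranks = percent_ranks(weights)
median = interpolated_percentile(data, ranks, 50)

print([round(r, 2) for r in ranks])  # [49.85, 99.8, 99.95]
print(round(median, 4))              # 1.003
```

Under these assumptions the median comes out slightly above 1 because p_1 is just under 50, which is the behaviour objected to here.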

Could someone provide a source for the current formula? The R documentation gives a different algorithm: http://rss.acs.unt.edu/Rdoc/library/aroma.light/html/weightedMedian.html — Preceding unsigned comment added by Howeman (talk · contribs) 19:46, 23 August 2013 (UTC)

Reverted your edit because it is incorrect. You have correctly identified a problem with the formula for v, not a problem with the formula for p_n. Rework your example with weights of [1 0.002 0.001] and see that your change to 1/2 doesn't fix the problem. The w_n factor is simply necessary; please leave it in. 18.18.14.50 (talk) 13:47, 25 September 2013 (UTC)

Agreed. The change to 1/2 doesn't work either. I think the article should show the algorithm I linked above. It is both correct and sourced. Is there a source for the current algorithm? Howeman (talk) 20:08, 1 October 2013 (UTC)
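The rescaling argument can be illustrated with a small Python sketch. It compares the two offsets discussed in this thread (w_n/2 and a fixed 1/2) on the original weights and on the same weights divided by 1000. The linear interpolation step and all function names are assumptions made for the illustration, not taken from the article.

```python
from itertools import accumulate


def percent_ranks(weights, offset):
    """p_n = 100 / S_N * (S_n - offset(w_n)) for a chosen offset rule."""
    cumulative = list(accumulate(weights))
    total = cumulative[-1]
    return [100.0 / total * (s - offset(w)) for s, w in zip(cumulative, weights)]


def interpolate(values, ranks, p):
    """Assumed step: linear interpolation between the ranks that bracket p."""
    if p <= ranks[0]:
        return values[0]
    if p >= ranks[-1]:
        return values[-1]
    for k in range(len(ranks) - 1):
        if ranks[k] <= p <= ranks[k + 1]:
            frac = (p - ranks[k]) / (ranks[k + 1] - ranks[k])
            return values[k] + frac * (values[k + 1] - values[k])


def half_weight(w):
    return w / 2.0   # offset in the formula that was restored


def fixed_half(w):
    return 0.5       # offset in the proposed change


def weighted_median(values, weights, offset):
    return interpolate(values, percent_ranks(weights, offset), 50)


data = [1, 2, 3]
big = [1000, 2, 1]
small = [1, 0.002, 0.001]  # same proportions, divided by 1000

fixed_big = weighted_median(data, big, fixed_half)
fixed_small = weighted_median(data, small, fixed_half)
half_big = weighted_median(data, big, half_weight)
half_small = weighted_median(data, small, half_weight)

print(fixed_big, round(fixed_small, 4))           # 1 1.75
print(round(half_big, 4), round(half_small, 4))   # 1.003 1.003
```

In this sketch the fixed 1/2 offset gives different medians for proportional weights, while the w_n/2 offset gives the same (slightly too large) median for both, which matches the point that the remaining problem lies in how v is obtained.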

Random chatter

This page is currently slated to be merged with "Quantile", but I really think "Percentile" deserves its own page. The concept is important enough that I wanted to look it up, anyway. --Unregistered Wikipedia-reader, 20 Nov 2004

What is the purpose of the note about Persian translation? --Eve Teschlemacher 19:53, 24 February 2007 (UTC)

Which English reader is interested in the Persian script!? I cannot see the relevance and therefore would delete this information or move it to a trivia section. -- 89.48.108.176 13:47, 11 March 2007 (UTC)

Both this page and Percentile rank have the same Chinese interwiki; the Chinese site only has an interwiki to Percentile rank, not to this page. Which is the right one, or are they both right? 129.125.155.228 07:43, 3 April 2007 (UTC)

Why are percentiles defined as proportions in [0, 1], only to be discussed as whole numbers in the range [0, 100]?

A common use of the term nth percentile seems to be (based on a recent NYTimes article) the portion with scores in the extremal n%. The article was talking about a gifted program taking those in the top fifth or tenth percentile, i.e. the top 1/20 or 1/10. The definition in the article doesn't seem to admit this usage.

There seems to be a mistake in "Alternative methods": How can k ever be 0, since n is at least 1?

Percentiles and percentile ranks are different terms, and need their own pages. "Percentile" is the score needed to achieve a certain "percentile rank." The two are not interchangeable. —Preceding unsigned comment added by 67.134.205.79 (talk) 19:31, 12 October 2009 (UTC)

I agree with the above, but the definitions of "percentile" and "percentile rank" surely need to be consistent, so the two pages should be harmonised. The "percentile" page states that there is no clear definition: a statement which applies to a sample percentile but not, I think, to a population percentile. The discussion could be improved: a reference to a student report identifying an error in a piece of software is not helpful. User:mjuckes —Preceding unsigned comment added by Mnjuckes (talk · contribs) 11:15, 14 April 2010 (UTC)

Confusion

Not only should the Percentile and Percentile rank articles be "harmonized", we've got to be careful not to mix up the two concepts in the same article! For example, before I changed it just now the article said:

"A percentile (or centile) is the value of a variable below which a certain percent of observations fall."

then later, in the "linear interpolation" section:

"we define the percentile corresponding to the n-th value as . In this way, for example, if N = 5 the percentile corresponding to the third value is... 50."

No, no, no. That's the percent rank, not the percentile. For example: Given the numbers 15, 20, 35, 40, and 50, the percent rank of 20 is 30 (by the above formula), and so the 30th percentile is 20 (i.e., the "percentile" is not 30). - dcljr (talk) 04:51, 12 January 2011 (UTC)
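The distinction can be made concrete with a few lines of Python. The rank formula used here, 100/N * (n - 1/2), is an assumption: it is not quoted in full above, but it reproduces both numbers mentioned in this comment (50 for the third of five values and 30 for the value 20). The names are illustrative.

```python
def percent_rank(n, total):
    """Assumed rank formula: 100 / N * (n - 1/2) for the n-th ordered value."""
    return 100.0 / total * (n - 0.5)


values = [15, 20, 35, 40, 50]

# Map each value to its percent rank (the rank is a property of a value).
rank_of = {v: percent_rank(i, len(values)) for i, v in enumerate(values, start=1)}

# Invert the mapping: the percentile is the value found at a given rank.
percentile_at = {rank: v for v, rank in rank_of.items()}

print(rank_of[20])          # 30.0  (percent rank of the value 20)
print(percentile_at[30.0])  # 20    (the 30th percentile is the value 20)
```

The two dictionaries are inverses of each other, which is why using one word for both directions is confusing.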

pronounce

Hello, how should "97.72th percentile" be pronounced? "Ninety-seventh point seven two percentile" or "ninety-seven point seven twoth percentile"? --Diwas (talk) 11:10, 9 June 2011 (UTC)

Ninety-seven point seventy-second percentile. — Preceding unsigned comment added by 67.0.106.196 (talk) 18:46, 15 April 2012 (UTC)

Yes, it should be merged

It would limit confusion among other Wikipedia users if both of these articles were merged. They both talk about the exact same concept. --Ctyonahl (talk) 20:43, 20 December 2011 (UTC)

No, they should not be merged

Percentile and quantile are as different as percentages and fractions. Unless you propose to merge percentage with fractions, these should also remain separate.

Quantile should include a short blurb about percentile, and should link to the percentile article. Percentile should mention quantile in the intro and provide a link (currently where quartiles are mentioned). That makes it easy enough to understand, and falls in line with most other sections of the wiki.

(Percentiles are used in much the same way that percentages are... and quantiles are often used in much the same way that small fractions are. While there is insubstantial mathematical separation between the two, there is definitely a usage difference. There is value to making math articles a little more approachable to the lay person, and keeping them separate will help.)

Gd2shoe (talk) 23:14, 17 May 2013 (UTC)

Not merged, at least not until a proper merge discussion is initiated. --jjron (talk) 15:08, 20 June 2013 (UTC) The suggestions given here have been implemented, however. --jjron (talk) 15:22, 20 June 2013 (UTC)

Estimation methods, NIST & Excel

Recent versions of Excel (e.g. 2013) support both of the described methods of estimating percentiles without breaking compatibility. By deprecating the PERCENTILE() function and adding two new functions, PERCENTILE.EXC() and PERCENTILE.INC(), Excel provides for explicit selection between the two methods described. PERCENTILE.EXC is equivalent to the first approach described in the NIST report; PERCENTILE and PERCENTILE.INC are equivalent to the second method.

It should be noted that for common textbook cases (e.g. the 50th percentile of a list with an odd number of entries) the two estimation methods generate exactly the same solution. The differences between estimation methods shouldn't be overemphasized; they are simply different ways of estimating a parameter from a sample. Burt Harris 15:31, 15 May 2014 (UTC) — Preceding unsigned comment added by Burt Harris (talk · contribs)
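As a rough illustration, the two variants can be sketched in Python. The rank formulas used here, p(N + 1) for the exclusive method and p(N - 1) + 1 for the inclusive method, are my understanding of the two approaches and are not stated in the comment above; the function names are hypothetical and this is not Excel's implementation.

```python
import math


def value_at_rank(sorted_values, rank):
    """Interpolate linearly at a 1-based, possibly fractional rank."""
    lower = math.floor(rank)
    frac = rank - lower
    if frac == 0:
        return float(sorted_values[lower - 1])
    below, above = sorted_values[lower - 1], sorted_values[lower]
    return below + frac * (above - below)


def percentile_exc(values, p):
    """Exclusive variant (assumed): rank = p * (N + 1), with 0 < p < 1."""
    ordered = sorted(values)
    rank = p * (len(ordered) + 1)
    if rank < 1 or rank > len(ordered):
        raise ValueError("p is outside the range the exclusive method covers")
    return value_at_rank(ordered, rank)


def percentile_inc(values, p):
    """Inclusive variant (assumed): rank = p * (N - 1) + 1, with 0 <= p <= 1."""
    ordered = sorted(values)
    return value_at_rank(ordered, p * (len(ordered) - 1) + 1)


scores = [15, 20, 35, 40, 50]

median_exc = percentile_exc(scores, 0.5)
median_inc = percentile_inc(scores, 0.5)
q1_exc = percentile_exc(scores, 0.25)
q1_inc = percentile_inc(scores, 0.25)

print(median_exc, median_inc)  # 35.0 35.0
print(q1_exc, q1_inc)          # 17.5 20.0
```

With an odd number of entries both variants place the median on the middle value, while the 25th percentile already differs, which is the kind of difference between estimators described above.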

Inconsistency - Can a score be "in" a percentile, or not?

WP entry "Quantile" states:

Standardized test results are commonly misinterpreted as a student scoring "in the 80th percentile," for example, as if the 80th percentile is an interval to score "in," which it is not; one can score "at" some percentile, or between two percentiles, but not "in" some percentile. Perhaps by this example it is meant that the student scores between the 80th and 81st percentiles, or "in" the group of students whose score placed them at the 80th percentile.

WP entry "Percentile" states:

For example, if a score is in the 86th percentile, it is higher than 86% of the other scores.

This inconsistency ought to be resolved. 106.37.84.22 (talk) 06:48, 25 September 2014 (UTC)

I had the same issue. It took me quite some time researching dictionaries etc. until I found out that the word "percentile" has at least two completely distinct, although related, meanings. It can either mean
(1) a specific scalar value, as currently defined in this article, or
(2) one of a hundred equally sized intervals into which you can divide the samples of an ordered distribution (i.e. an interval between two of the scalar values in the first definition).
So, "being in the 80th percentile" (interval meaning) means to have a score between the 80th percentile (scalar meaning) and the 81st percentile (scalar meaning). https://www.oxfordreference.com/view/10.1093/oi/authority.20110803100316401 has a proper definition where it mentions both meanings.
This article should definitely and straight away mention these two separate and equally valid meanings. They are both right (words can have many equally valid meanings depending on context). But of course the bulk of the article should then talk about the scalar definition, as that is probably the one mostly used in statistics. Note, though, that the interval definition ("in the 80th percentile") is also used in peer-reviewed scientific research papers, for instance by the psychologists Dunning and Kruger in their research paper about the Dunning-Kruger effect (https://www.researchgate.net/publication/12688660_Unskilled_and_Unaware_of_It_How_Difficulties_in_Recognizing_One's_Own_Incompetence_Lead_to_Inflated_Self-Assessments).
I'm a bit unsure how to write this in a good way, but the two different meanings should definitely be mentioned straight away in the first sentence. I believe not being aware of the two different meanings is a cause of quite a bit of confusion, but here on Wikipedia we have a chance of setting things straight and clearing up the confusion. --Jhertel (talk) 12:30, 17 August 2020 (UTC)
I also edited the Quantile article preliminarily now to explain this. The percentile article (this article) still needs to be edited to be consistent with this, with the proper references. --Jhertel (talk) 12:34, 17 August 2020 (UTC)
Also, see the Wiktionary article on percentile which already has this distinction. --Jhertel (talk) 13:13, 17 August 2020 (UTC)

Wouldn't "the 80th percentile (interval)" be the one between the 79th and 80th scalars? The 1st interval is below the 1st scalar, so the 80th should be below the 80th. 2A02:C7F:981C:AF00:CC85:B8A8:B348:2679 (talk) 12:34, 18 August 2020 (UTC)

It's a very good point to be unsure about this, and your reasoning makes good sense. I was actually unsure about it when I wrote it and only made a quick calculation in my head without a definitive conclusion; I should have stated that uncertainty. I guess it might depend on whether the first interval percentile is labeled the "0th" or the "1st". That would depend on a mathematical definition, one that I still haven't found for the interval percentile.
What I have found is https://www.dictionary.com/browse/percentile#science-section, where they state this definition of the interval meaning of percentile: "Any of the 100 equal parts into which the range of the values of a set of data can be divided in order to show the distribution of those values. The percentile of a given value is determined by the percentage of the values that are smaller than that value. For example, a test score that is higher than 95 percent of the other scores is in the 95th percentile." The source is given as "The American Heritage Science Dictionary". (Edit: That source is in fact correctly stated; here is a snippet from the actual book: https://books.google.dk/books?redir_esc=y&hl=en&id=yKUagx8PB_EC&q=percentile#v=snippet&q=percentile&f=false)
With that definition, the 95th interval percentile would be between the 95th and the 96th scalar percentiles, i.e., the nth interval percentile would be between the nth and the (n+1)th scalar percentiles. Somehow I believe that is how it is in fact used, and it seems intuitive: A value above the 95th scalar percentile (but not above the 96th; this is implicitly assumed) puts you in the 95th interval percentile.
In that case, the first interval percentile (number 1 out of the 100) would have to be defined as the "0th", and the last (number 100 out of the 100) would be the "99th". And yes, that specific labelling is not intuitive. So with this specific definition we get some intuitiveness by sacrificing some other intuitiveness, and it seems like we can't have both intuitivenesses at the same time. 🙂 The question is of course which definition is actually used. I only found one definition here. It would be interesting to find more and see if they agree. --Jhertel (talk) 14:53, 18 August 2020 (UTC)
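The labelling consequence can be seen in a small Python sketch of the interval reading. It assumes that the label is the whole-number percentage of scores strictly smaller than the given score, counted over all scores in the group; that counting rule is one reading of the dictionary definition quoted above, and the function name is made up for the illustration.

```python
def interval_percentile(score, scores):
    """Assumed labelling: whole-number percentage of scores strictly below `score`."""
    smaller = sum(1 for s in scores if s < score)
    return (100 * smaller) // len(scores)


scores = list(range(1, 201))  # 200 distinct scores: 1, 2, ..., 200

lowest_label = interval_percentile(1, scores)
label_191 = interval_percentile(191, scores)
highest_label = interval_percentile(200, scores)

print(lowest_label)   # 0
print(label_191)      # 95
print(highest_label)  # 99
```

Under this reading the labels run from 0 to 99, because even the highest score has fewer than 100 percent of the scores below it.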

Radical Re-Write Suggestion

The Langford article cited in the references of the current article discusses more than 14 possible definitions and shows that the CDF method is the ONLY one with certain key properties: For instance, DOUBLING the data leaves it unchanged. Many other intuitive properties also hold for CDF and not for the others. Furthermore, CDF is the NATURAL definition from a statistical point of view.
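The doubling property can be tried out with a short Python sketch. It uses a plain inverse empirical CDF (the smallest value whose cumulative share of the data reaches P percent), which is an assumption and may handle ties differently from the method Langford calls CDF. For contrast it includes one interpolating rule, rank = P/100 * (N - 1) + 1, chosen only as an example; the function names are illustrative.

```python
def percentile_inverse_cdf(values, p):
    """Smallest value whose cumulative share of the data reaches p percent."""
    ordered = sorted(values)
    n = len(ordered)
    for count, x in enumerate(ordered, start=1):
        if count * 100 >= p * n:
            return x
    return ordered[-1]


def percentile_interpolated(values, p):
    """Example interpolating rule: linear at rank p/100 * (N - 1) + 1."""
    ordered = sorted(values)
    index = p / 100 * (len(ordered) - 1)  # 0-based fractional index
    lower = int(index)
    frac = index - lower
    if lower + 1 >= len(ordered):
        return float(ordered[-1])
    return ordered[lower] + frac * (ordered[lower + 1] - ordered[lower])


data = [15, 20, 35, 40, 50]
doubled = data + data  # every observation appears twice

cdf_single = percentile_inverse_cdf(data, 30)
cdf_doubled = percentile_inverse_cdf(doubled, 30)
lin_single = percentile_interpolated(data, 30)
lin_doubled = percentile_interpolated(doubled, 30)

print(cdf_single, cdf_doubled)                      # 20 20
print(round(lin_single, 6), round(lin_doubled, 6))  # 23.0 20.0
```

In this example the inverse-CDF value is the same for the original and the doubled data, because the empirical CDF does not change when every observation is repeated, whereas the interpolating rule moves from 23 to 20.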

My view is that this article can be RE-WRITTEN DRASTICALLY to eliminate confusion. FIRST it should give PROMINENCE to the CDF definition, SECOND it should display all other definitions as ERRORS. I would like to DISCUSS this with relevant people, to explain further why this is the case (it is already explained fairly well in Langford, but there are more arguments available.).

But I don't know enough about Wikipedia to know how this is to be done --

Asaduzaman (talk) 02:30, 4 October 2016 (UTC)

The article should reflect what is written in reputable and reliable published sources. While there is inconsistency among sources, this should be reflected and explained in the article. Maproom (talk) 08:03, 22 October 2016 (UTC)

k as abbreviation

It seems Kennzahl is the German term, used in abbreviated form as k in some papers. Can anyone who knows the etymology add this? See e.g. DOI 10.1111/1469-7610.00023 — Preceding unsigned comment added by 212.95.7.80 (talk) 22:46, 7 February 2017 (UTC)


Could someone link the German page version

https://de.wikipedia.org/wiki/Empirisches_Quantil#Perzentile seems to match, though I am not sure.

Side Note: As a software developer, I still could not figure out how to edit the links to other languages. Wiki told me the link was already in use, so I guess I tried to create a reference when I only wanted to use an existing link, whose ID I did not find out.

134.93.113.63 (talk) 14:11, 30 October 2018 (UTC)

Done. This is done with Wikidata (see percentile (Q2913954)). For this situation, two Wikidata entities needed to be merged. +mt 20:43, 31 October 2018 (UTC)

Either the text summary is wrong or the image is wrong.

should be 5 2 4 2 3 3 2 1

instead of 5 2 4 3 3 2 2 1

This image, https://commons.wikimedia.org/wiki/File:PercentileRankFor10Scores.png, which is used in https://en.wikipedia.org/wiki/Percentile_rank, is correct.

Done. I checked the calculations and the graph was correct, so I have corrected the frequency data in the png file. --Nerlost (talk) 10:47, 12 November 2021 (UTC)

Bias

The sample percentiles are reported to be biased; could someone please add this? See https://www.sciencedirect.com/science/article/abs/pii/S016771520000242X and https://stats.stackexchange.com/questions/76259/demonstration-of-sample-quantile-bias Biggerj1 (talk) 09:47, 6 November 2021 (UTC)