Talk:Self-balancing binary search tree

Missing citation on overview

The overview claims that "Maintaining the height always at its minimum value is not always viable; it can be proven that any insertion algorithm which did so would have an excessive overhead." with a missing citation. I believe this statement is incorrect. I sketched a proof of a data structure, backed by a sorted list, that produces a perfectly height-balanced binary search tree with O(log2 n) worst-case insertion time. This is as good as the insertion time for an AVL tree, so "excessive overhead", if it is meant to imply a slow asymptotic runtime, isn't right.
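A minimal sketch of the sorted-list view described above (my own illustration, not the original poster's actual proof): a sorted Python list can be read as an implicit, perfectly height-balanced BST whose root is the middle element, giving O(log2 n) search via ordinary binary search. Note that list insertion itself shifts elements in O(n); the claimed O(log2 n) worst-case insertion would need a more elaborate backing structure than a plain array.

    import bisect

    # A sorted list read as an implicit, perfectly height-balanced BST:
    # the "root" is the middle element and each half is a subtree.

    def search(sorted_list, key):
        # O(log2 n) membership test via binary search.
        i = bisect.bisect_left(sorted_list, key)
        return i < len(sorted_list) and sorted_list[i] == key

    def insert(sorted_list, key):
        # Keeps the list sorted; O(n) here because elements shift.
        bisect.insort(sorted_list, key)

    t = []
    for k in [5, 2, 8, 1, 9]:
        insert(t, k)
    print(search(t, 8), search(t, 3))  # True False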

I followed one of the provided citations to The Art of Computer Programming Volume 3 (Donald Knuth), and on page 476 there is the following quote: "However, it appears that the bookkeeping required for maintaining the weight balance takes more time than Algorithm A [which is similar to an AVL tree]". I think the intended meaning is NOT that the asymptotic runtime is worse; rather, the constant factor for maintaining a perfectly balanced BST is likely higher, and the code complexity increases. Based on the data structure I sketched, there is additional per-node overhead for maintaining the weight balance (left and right weights) plus the neighbor nodes (two pointers; this can be simplified to a single RANK field plus a lookup, as in Knuth's book). This contrasts with the AVL tree's two bits per node for the left/right balance factor.
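To make the bookkeeping contrast concrete, here is a hypothetical pair of node layouts (field names are illustrative, not Knuth's): an AVL node needs only a balance factor in {-1, 0, +1}, i.e. two bits, while a node supporting perfect weight balancing carries a subtree-size count (or Knuth's RANK field), which is O(log n) bits per node.

    class AVLNode:
        # Two bits of balance information per node.
        def __init__(self, key):
            self.key, self.left, self.right = key, None, None
            self.balance = 0  # -1, 0, or +1

    class WeightBalancedNode:
        # A full subtree-size count per node: O(log n) bits.
        def __init__(self, key):
            self.key, self.left, self.right = key, None, None
            self.size = 1  # number of nodes in the subtree rooted here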

I propose that the line in question be changed so that it does not claim it is impossible to maintain such a structure without sacrificing performance. Instead, it should say something like: "While it is possible to maintain a BST of minimum height with the expected O(log n) operations (insertion/deletion/search), in practice the additional space required to maintain such a structure outweighs the decrease in search time. An AVL tree is guaranteed to be within a factor of 1.44 of the optimal height [cite Knuth here] while requiring only two additional bits of storage in a naive implementation." An additional note could be made that AVL trees tend to come even closer to the optimal height on average: Knuth, on page 468 of the same book, reports empirical results showing that the multiplicative factor is as low as 1.01 for large n.
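For concreteness, a quick numeric comparison of the minimum possible height, floor(log2 n), against the AVL worst-case bound (assumed here in the commonly quoted approximate form 1.4405 * log2(n + 2), dropping the small additive constant):

    import math

    # Minimum possible BST height vs. approximate AVL worst-case height.
    for n in [10, 1000, 10**6]:
        optimal = math.floor(math.log2(n))      # perfectly balanced BST
        avl_worst = 1.4405 * math.log2(n + 2)   # approximate AVL bound
        print(n, optimal, round(avl_worst, 2))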

Rushingseas8 (talk) 17:36, 8 March 2019 (UTC)

tree

Is a B-tree a binary tree? Doesn't a binary tree allow only two children per node, while a B-tree can have many more?

A B-tree is certainly not a binary tree. I have no idea when that link snuck in. Away with it! Deco 02:41, 6 Apr 2005 (UTC)
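For anyone comparing the two structures, a minimal sketch of the difference (hypothetical field names, not a working B-tree implementation):

    class BinaryNode:
        # A binary tree node: exactly two child links.
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None

    class BTreeNode:
        # A B-tree node: many sorted keys and many children per node.
        def __init__(self):
            self.keys = []       # up to 2t - 1 keys for minimum degree t
            self.children = []   # up to 2t children, far more than two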


overview

The overview states that

log2(n+1) - 1 >= log2(n)

How can this be? Am I missing something? It seems to me it should be less than or equal, not greater than or equal. E.g., if n = 1 (a single node with no children), then log2(1+1) - 1 = 0 = log2(1), so in that case the two sides are equal. But in any other case, the left-hand side is smaller. E.g., say n = 7:

log2(7+1) - 1 = 2 < log2(7)

Right?? Mike Williamson 2013/6/3 —Preceding undated comment added 22:46, 3 June 2013 (UTC)

the formula is wrong

For example, with n := 8, the left-hand side is 2 while the right-hand side is 3. --User:Newmangling 04:37, 22 November 2013 (UTC)
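A quick numeric check of the quoted inequality for the examples above (a sketch; both sides are evaluated without any floor or ceiling, which the original formula in the article may have had):

    import math

    # Evaluate log2(n+1) - 1 against log2(n).
    for n in [1, 7, 8]:
        lhs = math.log2(n + 1) - 1
        rhs = math.log2(n)
        relation = ">=" if lhs >= rhs else "<"
        print(f"n={n}: {lhs:.3f} {relation} {rhs:.3f}")
    # n=1: 0.000 >= 0.000
    # n=7: 2.000 < 2.807
    # n=8: 2.170 < 3.000

So, as both comments observe, the ">=" only holds at n = 1.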

Ordered lists

The link to ordered lists goes to a page about HTML markup!

Presumably there's a better destination for it... does anyone know of one? 84.9.75.24 (talk) 11:00, 27 November 2007 (UTC)

Implementations

That section should mention the differences between the various algorithms: which ones are good, which ones aren't, and which perform better in certain situations. Shinobu (talk) 03:14, 7 December 2007 (UTC)

Good idea, this is an appropriate article for comparison. Dcoetzee 03:32, 7 December 2007 (UTC)

A 2-3 tree is of course a self-balancing search tree, but it is not "binary". Aleksey Midenkov (talk) 06:09, 18 August 2016 (UTC)

Proof of min size

I appreciate the work that went into typing the proof of the minimum tree height. That proof would be fine in a textbook, but unfortunately Wikipedia is not a textbook; its articles are not supposed to include proofs, over-justify statements, or over-explain examples. Sorry, and all the best, --Jorge Stolfi (talk) 02:27, 11 August 2009 (UTC)

There are a lot of proofs on Wikipedia. Are they really not allowed? What if the proof were only shown when you clicked a link saying 'show proof' or something? Strict rule-following should be weighed carefully against usefulness. Blackspotw (talk) 16:46, 20 May 2010 (UTC)
I think proofs are fine. They are certainly encyclopedic if they come from a reliable source. There's nothing in WP:NOT that would suggest otherwise as far as I can tell. -- intgr [talk] 18:44, 20 May 2010 (UTC)
See Category:Articles containing proofs (there are 379 of them). There is the question of how useful this proof is here - is it too verbose? Depends on the reader. Dcoetzee 18:57, 20 May 2010 (UTC)

Is a splay tree height-balanced?

In the first part of this article it reads: "a self-balancing (or height-balanced) binary search tree is any binary search tree data structure that automatically keeps its height". A splay tree is listed in this article as a tree that fits this description; however, I do not believe that a splay tree is height-balanced. I'm new to this, so maybe someone who's better with this topic could fix the article if it is incorrect. I'm just studying for a test, so I am not certain enough to edit this. DriverChief (talk) 21:30, 16 December 2009 (UTC)
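A minimal sketch supporting the doubt above (not a full splay tree, just the shape that splaying produces for one specific input): when keys are inserted in strictly increasing order, splaying brings each new key to the root with the entire previous tree as its left child, so the tree degenerates into a path of linear height. Splay trees guarantee amortized O(log n) access, but they are not height-balanced at every moment.

    class Node:
        def __init__(self, key, left=None):
            self.key, self.left, self.right = key, left, None

    def splay_insert_increasing(root, key):
        # For a key larger than everything in the tree, the post-splay
        # shape is simply: new key at the root, old tree on the left.
        return Node(key, left=root)

    def height(node):
        if node is None:
            return -1
        return 1 + max(height(node.left), height(node.right))

    root = None
    for k in range(100):
        root = splay_insert_increasing(root, k)
    print(height(root))  # 99 -- a path, nowhere near log2(100)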

Hash tables are not O(1)

This article includes the often-repeated but wrong statement that searches in hash tables perform in O(1) while searches in binary trees perform in O(log n). This error is due to two possible confusions:

  • a misuse of the O() notation, in which one implicitly assumes that n is smaller than some large but finite number, while O() has no such restriction;
  • the assumption that the cost of calculating a hash and the cost of performing a comparison are asymptotically comparable, which is not the case as n goes to infinity.

Indeed, if you have a set of n elements, you need at least log(n) bits to represent each of them, so a hash function takes at least log(n) bit operations just to read its input and produce a hash value. On the other hand, a comparison can be performed in O(1) on average, because you only need to compare the first few bits until a mismatch occurs, not all of the bits.

Thus, search is O(log n) in both cases. 2A01:E35:2F45:9A0:BA76:3FFF:FEF7:E4D5 (talk) 22:18, 20 November 2014 (UTC)
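A small illustration of the counting argument above (just the arithmetic, not a benchmark): giving n keys distinct representations requires at least ceil(log2 n) bits per key, so a hash function that depends on the whole key must read Omega(log n) bits of input.

    import math

    # Bits needed so that n keys can all be distinct.
    for n in [2, 1000, 10**9]:
        bits = math.ceil(math.log2(n))
        print(f"{n} distinct keys need at least {bits} bits each")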

Er, what?
> the assumption that the cost of calculating a hash and performing a comparison is comparable asymptotically, which is not the case when n goes to infinity
The complexity of calculating the hash and comparing values is assumed to be constant, just like greater/less comparisons in a tree lookup aren't factored into the O() complexity. That's a fair assumption given constant-length keys.
> Indeed, if you have a set of n elements, you need at least log(n) bits to represent them
This applies just as well to a binary tree structure that needs to store left/right pointers; the pointer size needs to be at least log(n). So by your argument its complexity should be O(log(log n))?
-- intgr [talk] 23:21, 20 November 2014 (UTC)[reply]