Talk:Tagged union

From Wikipedia, the free encyclopedia

Pascal

Uh, I suggest you reconsider your tagged union example for Pascal, not so much because of the Big-Brother implications of an Employee record tracking how many lovers someone has, which is merely bad from a legal and social point of view, but more because it has a serious logical flaw: it assumes that marriage and multiple lovers are mutually exclusive. This, of course, is fundamentally in error, making this a bad example of a tagged union, whose alternatives must be exclusive. Using code examples that demonstrate a significant misapprehension of one's problem domain only encourages buggy programming.

---

This is entirely true - I'll try and think of a better example. I was operating under the false assumption of 100% fidelity. It's a silly example anyway. Derrick Coetzee 13:54, 4 Feb 2004 (UTC)


Merge on tagged union

I'd like your justification on the merge header you added to tagged union. I'm inclined to remove it, since there's a great deal more to be said about tagged unions than about unions in general. That said, I have considered merging it into a term which does not imply a particular implementation, such as disjoint union or discriminated union. Derrick Coetzee 00:55, 19 Sep 2004 (UTC)

I agree that there is a great deal to say about tagged unions. But, I ask, is there much to say about untagged unions? Although I don't know which is more popular, tagged or untagged unions, I believe the theoretical importance lies with tagged unions, and it seems to me that an untagged union is really just a lousy datatype, or something that is not really a union. And as you see, the current union article is really about untagged unions and is actually not very good. If we don't merge, another option would be to make tagged union the new union article and move the current union article to untagged union. I guess I don't believe there is a great deal to say about unions in general or about untagged unions. Do you think otherwise? -- Taku 01:15, Sep 19, 2004 (UTC)
Put another way, I don't believe the union in C/C++ is a good example of a union. It is too primitive and is usually not used in the way unions are supposed to be used. -- Taku 01:18, Sep 19, 2004 (UTC)
This is not so much a question of importance as of word usage. The word union by itself almost exclusively refers to untagged unions, both in the literature and in popular culture (where it's used almost entirely in the C/C++ community). Tagged unions are not called tagged unions quite so often, however. In literature they're generally called discriminated or disjoint unions, but these terms are often considered to refer to the pure mathematical set theory concept. As a programming concept about the closest thing is ML's datatype construction, which is obviously an ambiguous term. The term tagged union is fairly widely used and understood, and avoids the problems with other terms, but to many it seems to imply a particular implementation (the use of an explicit tag field), which isn't quite what I intended. Derrick Coetzee 02:22, 19 Sep 2004 (UTC)

I don't think the term tagged union has any problem; it is well understood and unambiguous. But the question is whether we need a separate article about untagged unions. I doubt the current union (computer science) article will grow in a way that leaves little overlap with tagged union. I know that the name union in particular gives the impression of the union type in C. But the article should be about the concept of a union datatype in general, and I don't think there is a lot to say about unions that lack a means of type checking, or tagging. I think we use terms like discriminated union because we want to emphasize that they are tagged, just as we say static type checking instead of just type checking. But when discussing the general concept, we don't have to be so specific. If you really want to see the word tagged in the name and to avoid a false impression, we can name the combined article "tagged and untagged union", "union datatype" or something similar, though I think it is unnecessary. -- Taku 05:50, Sep 19, 2004 (UTC)

I think the links to tagged union in the union (computer science) article, which I made repeated and prominent (self-advertising after all ;) serve this purpose well enough. Another reason I avoid use of the word union for a discriminated union is because as mathematical concepts these are entirely distinct, and the Wikipedia already professes this viewpoint in the articles union (set theory) and discriminated union. You'll never hear the word untagged or undiscriminated union in mathematical discourse, to my knowledge. Derrick Coetzee 23:46, 20 Sep 2004 (UTC)

Is a set of C++ classes similar to a tagged union?

Can a set of C++ classes that inherit from a common base class, using a virtual method table to determine which (virtual) methods to call, be considered an example of a tagged union where the vtable pointer acts as the tag? --Damian Yerrick (talk | stalk) 18:54, 31 July 2007 (UTC)

In short, yes, but with the limitation that you can't modify which subclass is used; this is fixed at creation time. Also, C++ classes can be implemented by means other than vtables; depending on the implementation the tag may be implicit. Dcoetzee 00:25, 1 August 2007 (UTC)
You are saying that the tags are not mutable. This is the same behavior as in ML and other languages with algebraic data types. --Spoon! (talk) 05:26, 5 December 2007 (UTC)
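
As a rough sketch of the comparison in this thread, here is the same "one of several alternatives" value written two ways. Rust is used only for illustration (the question is about C++), and the names `Shape`, `Circle`, `Square`, `ShapeEnum`, and `area` are hypothetical. In the first form the vtable pointer inside the trait object plays the role of the tag and dispatch is implicit; in the second the tag is explicit and dispatch is a `match`.

```rust
use std::f64::consts::PI;

// Class-hierarchy style: the vtable pointer in `Box<dyn Shape>` acts as an implicit tag.
trait Shape {
    fn area(&self) -> f64;
}

struct Circle { radius: f64 }
struct Square { side: f64 }

impl Shape for Circle {
    fn area(&self) -> f64 { PI * self.radius * self.radius }
}

impl Shape for Square {
    fn area(&self) -> f64 { self.side * self.side }
}

// Tagged-union style: the variant name is the tag, and `match` inspects it.
enum ShapeEnum {
    Circle { radius: f64 },
    Square { side: f64 },
}

fn area(shape: &ShapeEnum) -> f64 {
    match shape {
        ShapeEnum::Circle { radius } => PI * radius * radius,
        ShapeEnum::Square { side } => side * side,
    }
}
```

One difference worth noting: the enum is closed, so the compiler can check that every alternative is handled, whereas new implementations of the trait can be added later without touching existing code.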

The tree example is irrelevant

As far as I can see, the tree example is about structs, not unions. And, at the very least, it doesn't bring any new insights about unions to the reader. Gwrede 15:45 21 February 2009 (UTC)

A tree is a tagged union which can be either a Leaf or a Node. Leaf is a unit type and Node is a structure containing an integer and two trees. Oktal (talk) 16:55, 21 August 2014 (UTC)
I agree with Oktal: the tree example is not just relevant but paradigmatic of the use of, and the need for, disjoint unions.
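
The description above (a tree is either a Leaf or a Node holding an integer and two trees) can be written down almost word for word. This is a minimal sketch in Rust, not the article's own example; the `sum` function is a hypothetical addition that shows the tag being inspected.

```rust
// A binary tree as a tagged union: every value is either a `Leaf` or a `Node`.
enum Tree {
    // Unit-like alternative: carries no data.
    Leaf,
    // An integer plus two subtrees; `Box` gives the recursive type a finite size.
    Node(i32, Box<Tree>, Box<Tree>),
}

// Sum of all integers stored in the tree; the match must cover both alternatives.
fn sum(tree: &Tree) -> i32 {
    match tree {
        Tree::Leaf => 0,
        Tree::Node(value, left, right) => value + sum(left) + sum(right),
    }
}
```

A plain struct could not express this, because a struct always has all of its fields, while a tree value has either none (Leaf) or all three (Node).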

Algebraic data type

Merge with Algebraic data type! --Jonah.ru (talk) 19:29, 3 April 2011 (UTC)

Algebraic data types already cover coproducts, so there is no need to merge. Perhaps keep two different flavors: one theoretical, and the other covering tagged unions together with structures and higher-order functions, as examples of how algebraic data types are implemented in several programming languages.

"An enumerated type can be seen as a degenerate case"

Possibly change this section of the introduction to the following? "An enumerated type can be seen as a degenerate case: a tagged union of unit types. It corresponds to a set of nullary constructors A^0 + B^0 and may be implemented as a simple tag variable, since it holds no additional data besides the value of the tag" --199.119.232.2 (talk) 00:44, 21 February 2012 (UTC)

It is in fact equivalent to a simple union, which is used to build the disjoint (tagged) union, providing the labels.
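
To make the "degenerate case" reading concrete, here is a small sketch in Rust. The types `Suit` and `Card` and the function `tag_of` are hypothetical names chosen for illustration. `Suit` has only nullary constructors, so the tag is the entire value; `Card` reuses the same mechanism but attaches a payload to one alternative.

```rust
// An enumeration: a tagged union whose alternatives all carry no data.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Suit {
    Clubs,
    Diamonds,
    Hearts,
    Spades,
}

// The same construct, but now one alternative carries a payload.
#[derive(Debug, PartialEq)]
enum Card {
    Joker,
    Ranked(Suit, u8),
}

// With no payloads, the tag is the whole value and can be read as an integer.
fn tag_of(suit: Suit) -> u8 {
    suit as u8
}
```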

Merge from variant type

The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
No merge, following no consensus over almost 3 years. Klbrain (talk) 09:59, 10 June 2017 (UTC)

I propose that variant type be merged into tagged union. The variant type article is of a low quality, covering too narrow a context and just being confusing. It seems to be describing a concept which is identical to the one described by tagged union, which does a much better job of describing it. Variant type could even be renamed Variant type (Visual Basic). -- Oktal (talk) 21:49, 19 August 2014 (UTC)


No. These two describe entirely different datatypes. Granted, the variant type article is of a low quality as you say, and certainly could be improved to cover other languages that feature variant typing. But variant typing and tagged unions are far too different to be merged. One is a compile-time check on how much memory a particular symbol represents, while the other is a run-time feature for dynamic typing. In Visual Basic, a variant type is essentially a reference that is not strongly typed until runtime.

Variants and tagged unions are the same thing: types that represent a dynamic type. The tagged union article says exactly this, and variant type says "the Variant type is a tagged union". What the heck is "a compile-time check on how much memory a particular symbol represents"? Any argument that relies on the meaning of "variant type" in one specific programming language (VB in this case) should be balanced by its meaning in other programming languages and programming in general. Oktalist (talk) 23:56, 16 December 2016 (UTC)
There are (at least) two different concepts here, but as with most things, different people call them by different names. The idea of “variants” in VB is totally different from “variants” in OCaml, and the variant type article does a huge disservice to readers by conflating the two concepts.
  • VB's idea of a `Variant` is a single universal type that can hold a value of any arbitrary type through the use of dynamic reflection/run-time type information (sometimes with a few implementation-dependent restrictions). In Haskell this is called the Dynamic type, whereas Rust calls it Any. In Java, this would be Object, since everything is a subtype of object, and in C# it's called dynamic. I would say “Any” makes a good unambiguous name here, since such a term also exists in TypeScript. An Any value stores two pieces of information: what type is stored within, and its value. That's it. The compiler is unable to help very much, since there is no static information regarding what could be contained inside.
  • In contrast, OCaml's idea of a variant appears to be identical to the type theoretic notion of a *sum type*, which is also more informally known as a *tagged union*. Unlike Any, a sum type is “closed”: you have to declare all the possibilities up front for each sum type, and the compiler is therefore capable of detecting whether you have considered all the possibilities or not (this also implies that you can have multiple sum types, whereas there is usually only one Any type in any given language). Moreover, unlike Any types, each choice is distinguished by a user-defined tag, not by type. This means you can have a sum type `Err Integer | Ok Integer` that holds either an `Err` value with some associated integer value, or an `Ok` value with some associated integer value. This is in fact how one can use sum types to handle errors: despite both `Ok` and `Err` holding an integer value, their meaning could be totally different (one could be the error code, the other could be the result, which just happens to be an integer by coincidence). This is not possible with the Any type, since they are both integers and therefore there's no way to tell whether the integer is an error code or the result you wanted.
I propose editing the articles to emphasize this difference and remove the conflation of the two concepts, with some warning for the reader to indicate that the terminology in this area is not very standardized. --Fylwind (Fylwind) 23:55, 19 January 2017 (UTC)
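
The contrast drawn in the two bullets above can be sketched in Rust, which has both a closed sum type (`enum`) and an open run-time-typed value (`std::any::Any`). The names `Outcome`, `describe`, and `describe_any` are hypothetical and exist only for this illustration.

```rust
use std::any::Any;

// Closed sum type: both alternatives hold an integer, but the tag tells them apart.
#[derive(Debug, PartialEq)]
enum Outcome {
    Err(i64), // an error code
    Ok(i64),  // a result that merely happens to be an integer
}

fn describe(outcome: &Outcome) -> String {
    // The compiler rejects this match if an alternative is left out.
    match outcome {
        Outcome::Err(code) => format!("error code {}", code),
        Outcome::Ok(value) => format!("result {}", value),
    }
}

// "Any"-style value: only the stored type is known at run time.
fn describe_any(value: &dyn Any) -> String {
    if let Some(n) = value.downcast_ref::<i64>() {
        // An i64 is just an i64: error code or result, there is no way to tell.
        format!("some integer {}", n)
    } else if let Some(s) = value.downcast_ref::<String>() {
        format!("some string {}", s)
    } else {
        String::from("something else")
    }
}
```

Calling `describe` on `Outcome::Err(7)` and `Outcome::Ok(7)` gives different answers, while `describe_any` on the bare integer `7` cannot distinguish the two meanings, and its final `else` branch shows that the set of possible contents is open.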
I don't have a big problem conflating these two concepts. In a type-theoretical sense the "any" type is a special case of the tagged union. Variant type already says this in the second paragraph. The set of representable types of "any" is indeed not closed, nevertheless it does have such a set: the set of all types. I don't see this as a strong distinction. I would not oppose making this distinction, but I think the uncertainties of terminology would make it difficult. I think the more pressing problem is the emphasis on one particular programming language, to the exclusion of a treatment of the general type theoretic notion. Any edit that solves that problem is fine by me. Oktalist (talk) 04:10, 31 January 2017 (UTC)
There's like at least 5 closely related and commonly confused concepts here: plain old sum types (a special case of algebraic data types) A + B (A could be the same as B), existential / dependent sum types (open sum types) ∑[A] F(A), type-tagged unions (for the lack of a better name) A | B (A must be different from B), Any / dynamic types (requires runtime reflection) A | B | … (over all types), and extensible sum types (the dual of extensible records / row types). This is not even considering the C unions (which are untyped) and enums (which are degenerate tagged unions with only the tags). A type-theoretic approach would be to start with plain old sum types, and then see how they can be generalized and/or specialized to get all the other flavors of ‘unions’. --Fylwind (talk) 22:01, 1 February 2017 (UTC)
Which concepts should be described by the article tagged union? Which concepts should be described by the article variant type? If the answers to the previous two questions share some or all of the same concepts, should the articles be merged? Oktalist (talk) 19:27, 12 February 2017 (UTC)
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

ALGOL 68/Typed Racket/Ceylon union types

Union type currently talks about overlapped memory accessed through different types (untagged variants). This article says tagged unions are also called "variant, variant record, discriminated union, disjoint union, or sum type" but the ALGOL 68, Typed Racket, and Ceylon unions are tagged unions, but not sum types or any of these other terms. They're unions, where T ∪ T ≡ T, A ∪ B ≡ B ∪ A, and A ∪ B ∪ C ≡ (A ∪ B) ∪ C ≡ A ∪ (B ∪ C). Specsluce (talk) 23:08, 26 November 2014 (UTC)
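
The first of these laws is the one that separates a set-style union from a sum type. A short worked comparison, assuming finite types so that elements can be counted (the symbol \sqcup for the disjoint union is an assumed notation, not taken from the article):

```latex
% Set-style union: shared elements are counted once.
T \cup T = T, \qquad |A \cup B| = |A| + |B| - |A \cap B|
% Sum type (disjoint union): every element keeps a tag recording its origin.
|A \sqcup B| = |A| + |B|, \qquad |T \sqcup T| = 2\,|T| \neq |T| \quad \text{for finite nonempty } T
```

For example, with T the Booleans, T ∪ T still has 2 values, while the sum of T with itself has 4.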

There are many misconceptions in this article

The article says:

Tagged unions are most important in functional languages such as ML and Haskell,

No. Haskell and other typed functional languages have algebraic data types, but those types are not exclusive to functional languages; they are a basis for data types in general. It is not possible to build important data types, like trees, without tagged unions. Even in languages like Fortran IV, with no pointers, one had to emulate records and variants (or tagged products and tagged unions, for those who prefer those names) to implement tree data structures, using arrays or a combination of character arrays and variable formats.
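
The kind of emulation described above can be sketched as follows. This is an assumption-laden illustration in Rust rather than Fortran IV: the tag constants, the `Forest` layout, and the `sum` function are all hypothetical. It only shows the general idea of parallel arrays with an integer tag per node and array indices in place of pointers.

```rust
// Tags stored as plain integers, as one would without variant records.
const EMPTY: u8 = 0;
const LEAF: u8 = 1;
const FORK: u8 = 2;

// Parallel arrays: entry i describes tree node i; "pointers" are array indices.
struct Forest {
    tag: Vec<u8>,
    value: Vec<i32>,   // meaningful only when tag[i] == LEAF
    left: Vec<usize>,  // meaningful only when tag[i] == FORK
    right: Vec<usize>, // meaningful only when tag[i] == FORK
}

// Sum the leaf values below node i, dispatching on the tag by hand.
fn sum(forest: &Forest, i: usize) -> i32 {
    match forest.tag[i] {
        EMPTY => 0,
        LEAF => forest.value[i],
        FORK => sum(forest, forest.left[i]) + sum(forest, forest.right[i]),
        other => panic!("unknown tag {}", other),
    }
}
```

Nothing here stops code from reading `value[i]` for a fork or inventing a tag that no branch handles; the discipline is entirely manual, which is what a language-level tagged union takes over.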

where it says:

Mathematically, tagged unions correspond to disjoint or discriminated unions, usually written using +

That is partially wrong, because that notation is used when disjoint unions are taken as sums or coproducts. In other mathematical contexts another symbol is preferred.

where it says:

An enumerated type can be seen as a degenerate case:

I commented on that above; here I can only add that it shows a misunderstanding of the abstract notion of a disjoint union. An enumerated type is the set of the labels. A disjoint union can be reversed, because the tags mark which set each element was taken from.

where it says:

== Advantages and disadvantages ==
The primary advantage of a tagged union over an untagged union is that all accesses are safe, and the compiler can even check that all cases are handled.

Talking about advantages and disadvantages is totally wrong, because unions and disjoint unions are used for different purposes. Unions are used in low-level programming as two ways to represent or observe the same thing. For example, a 16-bit word can also be read as 2 bytes. The disjoint (or tagged) union is used to represent different things, and the tag allows them to be told apart. For example, different kinds of trees: an empty tree, a tree that is a leaf, and a tree that is a fork with two trees (recursively defined). The set of all trees is the disjoint union of those classes (in the mathematical sense) of trees.
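
The 16-bit word example can be sketched with Rust's untagged `union`. The names `Word` and `split` are hypothetical. Both fields occupy the same memory, so writing one and reading the other reinterprets the bits; no tag records which field was written.

```rust
// Untagged union: two views of the same 16 bits.
#[repr(C)]
union Word {
    whole: u16,
    bytes: [u8; 2],
}

// Reinterpret a 16-bit word as its two bytes, in the machine's native byte order.
fn split(word: u16) -> [u8; 2] {
    let w = Word { whole: word };
    // Reading a union field is `unsafe` in Rust: the compiler cannot check which field is live.
    unsafe { w.bytes }
}
```

The order of the two bytes depends on the machine's endianness, which is one reason this construct belongs to low-level code rather than to data modelling.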

This article is redundant, because there are others for each alias, and there is no need to separate it from other kinds of constructors like structures and functions (in higher-order languages).

Scala 3 union types

I think Scala 3 union types should be mentioned somewhere, but if I understand correctly, they are neither union types nor tagged unions... Any ideas? -- Chrisahn (talk) 18:46, 21 August 2022 (UTC)

Compiler Tokens

Compiler tokens are the quintessential tagged union. I think using tokens as an example of tagged union would make this article more understandable. 50.206.176.154 (talk) 02:32, 5 March 2023 (UTC)
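
A sketch of what such an example might look like, in Rust. The token set and the `tokenize` function are hypothetical and deliberately tiny (unsigned integers, ASCII identifiers, `+`, and parentheses); the point is only that a token is exactly one of several alternatives, some with a payload and some without.

```rust
// A token is exactly one of these alternatives; only some carry a payload.
#[derive(Debug, PartialEq)]
enum Token {
    Number(i64),
    Ident(String),
    Plus,
    LParen,
    RParen,
}

// A minimal lexer. Returns None on any character it does not recognise,
// or on a number too large for i64.
fn tokenize(input: &str) -> Option<Vec<Token>> {
    let chars: Vec<char> = input.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c.is_ascii_digit() {
            // Consume a run of digits and store its value as the payload.
            let start = i;
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            let text: String = chars[start..i].iter().collect();
            tokens.push(Token::Number(text.parse::<i64>().ok()?));
        } else if c.is_ascii_alphabetic() {
            // Consume a letter followed by letters or digits.
            let start = i;
            while i < chars.len() && chars[i].is_ascii_alphanumeric() {
                i += 1;
            }
            tokens.push(Token::Ident(chars[start..i].iter().collect()));
        } else {
            // Single-character tokens carry no payload: the tag is the whole value.
            tokens.push(match c {
                '+' => Token::Plus,
                '(' => Token::LParen,
                ')' => Token::RParen,
                _ => return None,
            });
            i += 1;
        }
    }
    Some(tokens)
}
```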

Different variants != different types

The quote "could take on several different, but fixed, types" seems to indicate that values created via different variants have to have different types. That is not true in all languages, since some treat variants more like constructors of the same type. The "different types" requirement probably comes from managed OO languages that use inheritance to implement variants. 50.46.240.130 (talk) 06:28, 7 January 2024 (UTC)

Full quote: "A tagged union is a data structure used to hold a value that could take on several different, but fixed, types." The part "could take on several different types" refers to the value held in the union, not to the union type itself. The union type can remain the same, but the type of the value held in the union can vary. -- Chrisahn (talk) 11:58, 7 January 2024 (UTC)
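
Both readings in this thread can be seen in one small sketch, written in Rust with hypothetical names (`Value`, `payload_type`). Every value below has the single type `Value`; what varies is the type of the payload inside. Two alternatives may also share a payload type and remain distinct, which is the "constructors of the same type" view.

```rust
// One union type, `Value`; what varies is the type of the payload held inside it.
#[derive(Debug, PartialEq)]
enum Value {
    Int(i64),
    Text(String),
    // Two alternatives may share a payload type and still be distinct.
    Celsius(f64),
    Fahrenheit(f64),
}

// Name of the payload type currently held; the argument type is always `Value`.
fn payload_type(value: &Value) -> &'static str {
    match value {
        Value::Int(_) => "i64",
        Value::Text(_) => "String",
        Value::Celsius(_) | Value::Fahrenheit(_) => "f64",
    }
}
```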