# Subset sum problem

(Redirected from Subset sum)

The subset sum problem is a decision problem in computer science. There are several equivalent formulations of the problem. One of them is: given a multiset of integers, is there a non-empty subset whose sum is zero? For example, given the set ${\displaystyle \{-7,-3,-2,9000,5,8\}}$, the answer is yes because the subset ${\displaystyle \{-3,-2,5\}}$ sums to zero. Another equivalent formulation is: given a multiset of positive integers and a target sum T, does any subset of the numbers sum to precisely T?[1] Subset sum can also be regarded as an optimization problem: find a subset whose sum is as close as possible to T.

Subset sum is related to several other problems:

• The partition problem is a special case of subset-sum, in which the target sum T is exactly half the sum of all input numbers (i.e., ${\displaystyle T={\frac {1}{2}}(a_{1}+\dots +a_{n})}$ ).
• The knapsack problem is a generalization of subset-sum.[2]

The subset sum problem is NP-complete, but there are several algorithms that can solve it reasonably fast in practice.

## Complexity

The complexity of the subset sum problem depends on two parameters: n - the number of input integers, and L - the precision of the problem, stated as the number of binary place values that it takes to state the problem.

• If n (the number of integers) is a small fixed number, then an exhaustive search for the solution is practical.
• If L (the number of binary digits) is a small fixed number, then there are dynamic programming algorithms that can solve it exactly.

The problem becomes hard when both n and L are large. The complexity of the best known algorithms is exponential in the smaller of the two parameters n and L.

## Exponential time algorithms

There are several ways to solve subset sum in time exponential in n.[3]

### Inclusion-Exclusion

The most naïve algorithm would be to cycle through all subsets of n numbers and, for every one of them, check if the subset sums to the right number. The running time is of order ${\displaystyle O(2^{n}\cdot n)}$, since there are ${\displaystyle 2^{n}}$ subsets and, to check each subset, we need to sum at most n elements.

The algorithm can be implemented by depth-first search of a binary tree: each level in the tree corresponds to an input number; the left branch corresponds to excluding the number from the set, and the right branch corresponds to including the number (hence the name Inclusion-Exclusion). The memory required is ${\displaystyle O(n)}$. The run-time can be improved by several heuristics:[3]

• Process the input numbers in descending order.
• If the integers included in a given node exceed the sum of the best subset found so far, the node is pruned.
• If the integers included in a given node, plus all remaining integers, are less than the sum of the best subset found so far, the node is pruned.

### Horowitz and Sanhi

Horowitz and Sahni[4] published a faster exponential-time algorithm, which runs in time ${\displaystyle O(2^{n/2}\cdot (n/2))}$, but requires much more space - ${\displaystyle O(2^{n/2})}$. The algorithm splits arbitrarily the n elements into two sets of ${\displaystyle n/2}$ each. For each of these two sets, it stores a list of the sums of all ${\displaystyle 2^{n/2}}$ possible subsets of its elements. Each of these two lists is then sorted. Using a standard comparison sorting algorithm for this step would take time ${\displaystyle O(2^{n/2}n)}$. However, given a sorted list of sums for ${\displaystyle k}$ elements, the list can be expanded to two sorted lists with the introduction of a (${\displaystyle k+1}$)th element, and these two sorted lists can be merged in time ${\displaystyle O(2^{k})}$. Thus, each list can be generated in sorted form in time ${\displaystyle O(2^{n/2})}$. Given the two sorted lists, the algorithm can check if an element of the first array and an element of the second array sum up to T in time ${\displaystyle O(2^{n/2})}$. To do that, the algorithm passes through the first array in decreasing order (starting at the largest element) and the second array in increasing order (starting at the smallest element). Whenever the sum of the current element in the first array and the current element in the second array is more than T, the algorithm moves to the next element in the first array. If it is less than T, the algorithm moves to the next element in the second array. If two elements that sum to T are found, it stops.

### Schroeppel and Shamir

Schroeppel and Shamir[5] presented an algorithm based on Horowitz and Sanhi, that requires similar runtime - ${\displaystyle O(2^{n/2}\cdot (n/4))}$, much less space - ${\displaystyle O(2^{n/4})}$. Rather than generating all subsets of n/2 elements in advance, they partition the elements into 4 sets of n/4 elements each, and generate subsets of n/2 elements dynamically using a min heap.

Due to space requirements, the HS algorithm is practical for up to about 50 integers, and the SS algorithm is practical for up to 100 integers.[3]

### Howgrave-Graham and Joux

Howgrave-Graham and Joux[6] presented a probabilistic algorithm that runs faster than all previous ones - in time ${\displaystyle O(2^{n/3})}$. It solves only the decision problem, cannot prove there is no solution for a given sum, and does not return the subset sum closest to T.

## Pseudo-polynomial time dynamic programming solution

The problem can be solved in pseudo-polynomial time using dynamic programming. Suppose the sequence is

${\displaystyle x_{1},\ldots ,x_{N}}$

sorted in the increasing order and we wish to determine if there is a nonempty subset which sums to zero. Define the boolean-valued function ${\displaystyle Q(i,s)}$ to be the value (${\displaystyle true}$ or ${\displaystyle false}$) of

"there is a nonempty subset of ${\displaystyle x_{1},\ldots ,x_{i}}$ which sums to ${\displaystyle s}$."

Thus, the solution to the problem "Given a set of integers, is there a non-empty subset whose sum is zero?" is the value of ${\displaystyle Q(N,0)}$.

Let ${\displaystyle A}$ be the sum of the negative values and ${\displaystyle B}$ the sum of the positive values. Clearly, ${\displaystyle Q(i,s)=false}$, if ${\displaystyle s or ${\displaystyle s>B}$. So these values do not need to be stored or computed.

Create an array to hold the values ${\displaystyle Q(i,s)}$ for ${\displaystyle 1\leq i\leq N}$ and ${\displaystyle A\leq s\leq B}$.

The array can now be filled in using a simple recursion. Initially, for ${\displaystyle A\leq s\leq B}$, set

${\displaystyle Q(1,s):=(x_{1}==s)}$

where ${\displaystyle ==}$ is a boolean function that returns true if ${\displaystyle x_{1}}$ is equal to ${\displaystyle s}$, false otherwise.

Then, for ${\displaystyle i=2,\ldots ,N}$, set

${\displaystyle Q(i,s):=Q(i-1,s)}$ or ${\displaystyle (x_{i}==s)}$ or ${\displaystyle Q(i-1,s-x_{i}),forA\leq s\leq B}$.

For each assignment, the values of ${\displaystyle Q}$ on the right side are already known, either because they were stored in the table for the previous value of ${\displaystyle i}$ or because ${\displaystyle Q(i-1,s-x_{i})=false}$ if ${\displaystyle s-x_{i} or ${\displaystyle s-x_{i}>B}$. Therefore, the total number of arithmetic operations is ${\displaystyle O(N(B-A))}$. For example, if all the values are ${\displaystyle O(N^{k})}$ for some ${\displaystyle k}$, then the time required is ${\displaystyle O(N^{k+2})}$.

This algorithm is easily modified to return the subset with sum 0 if there is one.

The dynamic programming solution has runtime of ${\displaystyle O(sN)}$ where ${\displaystyle s}$ is the sum we want to find in set of ${\displaystyle N}$ numbers. This solution does not count as polynomial time in complexity theory because ${\displaystyle B-A}$ is not polynomial in the size of the problem, which is the number of bits used to represent it. This algorithm is polynomial in the values of ${\displaystyle A}$ and ${\displaystyle B}$, which are exponential in their numbers of bits.

For the case that each ${\displaystyle x_{i}}$ is positive and bounded by a fixed constant ${\displaystyle C}$, Pisinger found a linear time algorithm having time complexity ${\displaystyle O(NC)}$ (note that this is for the version of the problem where the target sum is not necessarily zero, otherwise the problem would be trivial).[7][8] In 2015, Koiliaris and Xu found a deterministic ${\displaystyle {\tilde {O}}(s{\sqrt {N}})}$ algorithm for the subset sum problem where ${\displaystyle s}$ is the sum we need to find.[9] In 2017, Bringmann found a randomized ${\displaystyle {\tilde {O}}(s)}$ time algorithm [10].

## Polynomial time approximate algorithm

An approximate version of the subset sum would be: given a set of ${\displaystyle N}$ numbers ${\displaystyle x_{i},\ldots ,x_{N}}$ and a number ${\displaystyle s}$, output:

• Yes, if there is a subset that sums up to ${\displaystyle s}$.
• No, if there is no subset summing up to a number between ${\displaystyle (1-c)s}$ and ${\displaystyle s}$ for some small ${\displaystyle c>0}$.
• Any answer, if there is a subset summing up to a number between ${\displaystyle (1-c)s}$ and ${\displaystyle s}$ but no subset summing up to ${\displaystyle s}$.

If all numbers are non-negative, the approximate subset sum is solvable in time polynomial in ${\displaystyle N}$ and ${\displaystyle 1/c}$.

The solution for subset sum also provides the solution for the original subset sum problem in the case where the numbers are small (again, for non-negative numbers). If any sum of the numbers can be specified with at most ${\displaystyle P}$ bits, then solving the problem approximately with ${\displaystyle c=2^{-P}}$ is equivalent to solving it exactly. Then, the polynomial time algorithm for approximate subset sum becomes an exact algorithm with running time polynomial in ${\displaystyle N}$ and ${\displaystyle 2^{P}}$ (i.e., exponential in ${\displaystyle P}$).

The algorithm for the approximate subset sum problem is as follows:

initialize a list S to contain one element 0.

for each i from 1 to N do
let T be a list consisting of xi + y, for all y in S
let U be the union of T and S
sort U
make S empty
let y be the smallest element of U
for each element z of U in increasing order do
// Trim the list by eliminating numbers close to one another
// and throw out elements greater than s.
if y + cs/N < z ≤ s then
y = z

if S contains a number between (1 − c)s and s then
return yes
else
return no


The algorithm is polynomial time because the lists ${\displaystyle S}$, ${\displaystyle T}$ and ${\displaystyle U}$ always remain of size polynomial in ${\displaystyle N}$ and ${\displaystyle 1/c}$ and, as long as they are of polynomial size, all operations on them can be done in polynomial time. The size of lists is kept polynomial by the trimming step, in which we only include a number ${\displaystyle z}$ into ${\displaystyle S}$ if it is greater than the previous one by ${\displaystyle cs/N}$ and not greater than ${\displaystyle s}$.

This step ensures that each element in ${\displaystyle S}$ is smaller than the next one by at least ${\displaystyle cs/N}$ and do not contain elements greater than ${\displaystyle s}$. Any list with that property consists of no more than ${\displaystyle N/c}$ elements.

The algorithm is correct because each step introduces an additive error of at most ${\displaystyle cs/N}$ and ${\displaystyle N}$ steps together introduce the error of at most ${\displaystyle cs}$.

## References

1. ^ Kleinberg, Jon; Tardos, Éva (2006). Algorithm Design (2nd ed.). p. 491. ISBN 0-321-37291-3.
2. ^ Martello, Silvano; Toth, Paolo (1990). "4 Subset-sum problem". Knapsack problems: Algorithms and computer interpretations. Wiley-Interscience. pp. 105–136. ISBN 0-471-92420-2. MR 1086874.CS1 maint: ref=harv (link)
3. ^ a b c Richard E. Korf, Ethan L. Schreiber, and Michael D. Moffitt (2014). "Optimal Sequential Multi-Way Number Partitioning" (PDF).CS1 maint: multiple names: authors list (link)
4. ^ Horowitz, Ellis; Sahni, Sartaj (1974), "Computing partitions with applications to the knapsack problem" (PDF), Journal of the Association for Computing Machinery, 21 (2): 277–292, doi:10.1145/321812.321823, hdl:1813/5989, MR 0354006
5. ^ Schroeppel, Richard; Shamir, Adi (1981-08-01). "A $T = O(2^{n/2} )$, $S = O(2^{n/4} )$ Algorithm for Certain NP-Complete Problems". SIAM Journal on Computing. 10 (3): 456–464. doi:10.1137/0210033. ISSN 0097-5397.
6. ^ Howgrave-Graham, Nick; Joux, Antoine (2010). Gilbert, Henri (ed.). "New Generic Algorithms for Hard Knapsacks". Advances in Cryptology – EUROCRYPT 2010. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer: 235–256. doi:10.1007/978-3-642-13190-5_12. ISBN 978-3-642-13190-5.
7. ^ http://hjemmesider.diku.dk/~pisinger/codes.html
8. ^ Pisinger D (1999). "Linear Time Algorithms for Knapsack Problems with Bounded Weights". Journal of Algorithms, Volume 33, Number 1, October 1999, pp. 1–14
9. ^ Koiliaris, Konstantinos; Xu, Chao (2015-07-08). "A Faster Pseudopolynomial Time Algorithm for Subset Sum". arXiv:1507.02318 [cs.DS].
10. ^ Bringmann K. A near-linear pseudopolynomial time algorithm for subset sum[C]//Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, 2017: 1073-1084