. Efficiently Finding Small Representations for LTFs

Written by: Nimitt

.. Introduction

This post discusses a recent work by Johan Håstad: Efficiently finding small representations for LTFs. Linear Threshold Functions (LTFs) are a simple yet interesting class of Boolean functions. Informally, an LTF outputs the sign of a linear polynomial over boolean variables; for example, $f(x_1,x_2) = \operatorname{sign}(2x_1+3x_2+5)$. Formally, given coefficients $w = (w_0,w_1,\dots,w_n) \in \mathbb{Z}^{n+1}$, an LTF is a function $f_w : \{-1,1\}^n \to \{-1,1\}$ defined by:

$$f_w(x)=\operatorname{sign}\!\left (w_0 + \sum_{i=1}^{n}w_i x_i \right)$$

The vector $w$ is the representation of the LTF. Therefore, the size of the representation is the number of bits required to encode $w$. It is well known that every LTF has an equivalent representation whose coefficients require only $O(n\log n)$ bits each. However, given an arbitrary representation, finding such a small one efficiently is not known. Håstad's work studies a polynomial-time algorithm that, starting from any representation, produces an equivalent in which each coefficient can be encoded using only $O(n^2)$ bits.

.. Existence of smaller representations

Note that many different weight vectors represent exactly the same LTF. Since only the sign of the linear form matters, multiplying all coefficients by a positive constant does not change the function. Thus, large coefficients are often redundant. This naturally suggests the following idea: starting from a representation with large weights, can we reduce the coefficients (by a factor, dropping the least significant bit) to obtain smaller ones while preserving the sign? Unfortunately, reducing coefficients naively does not always work. Some inputs may lie very close to the threshold where the linear form changes sign, and we must account for these bad inputs. In achieving that, a main tool is finding short vectors in an appropriately constructed lattice.

.. Lattices and the LLL algorithm

Intuitively, a lattice is a discrete analogue of a vector space. A lattice spanned by basis vectors $b_1,b_2,\dots,b_m$ is the "grid" of vectors/points:

$$ \Lambda = \left\{\sum_{i=1}^m\alpha_ib_i \mid \alpha_i \in \mathbb{Z} \right\} $$

Many interesting problems can be reduced to finding the shortest vector in an appropriately constructed lattice. While finding the absolute shortest vector is NP-hard, the classical algorithm by Lenstra, Lenstra, and Lovász (LLL) finds a vector in polynomial time whose length is within a factor of $2^{(n-1)/2}$ of the shortest vector. LLL is an iterative algorithm which starts from initial basis of lattice and outputs a "reduced" basis, where each iteration involves some swapping or Gram--Schmidt operation.

Lemma (LLL works): If $(b_1,\dots,b_n)$ is a $n$-dimensional LLL reduced basis of lattice $\Lambda$, then $\|b_1\| \le 2^{(n-1)/2} s(\Lambda)$, where $s(\Lambda)$ is the length of shortest vector in $\Lambda$.

.. Main Algorithm

We start with learge initial integer weights $w_i$, where $\|w\| = 2^l \ge 2^{4n^2}$. We construct a lattice $\Lambda_m$ of dimension $n+1$ residing in an $n+2$ dimensional space.

\[ \Lambda_m = \begin{bmatrix} \mathbb{I}_n && 2^{-m}w \end{bmatrix} \]

A typical vector $v$ in this lattice takes the form $(u, 2^{-m}(u,w))$ for an integer vector $u$. The main algorithm is an iterative. In each iteration, we run the LLL algorithm on $\Lambda_m$ for increasing values of the parameter $m$ ($m = 0, 1, \dots, l$), to search for short vectors.

... Freezing the "Dangerous" directions

As $m$ increases, the algorithm updates $\Lambda_m$ to $\Lambda_{m+1}$ by simply dividing the last coordinate of all vectors by 2. During this process, we monitor the basis vectors $v_i$.

A vector $v_i$ is classified as "short" if its orthogonal component $\|v_i^*\|$ satisfies $\|v_i^*\| \le 2^{n-i/2}$. When LLL discovers a short vector, the algorithm essentially "freezes" it: keeping it aside and only modifying the non-short vectors in subsequent LLL reductions. We project the remaining vectors onto the space orthogonal to this newly found short vector and continue running LLL on the smaller space.

Let $m_i$ represent the specific value of $m$ at which the $i$-th short vector appears. We can bound the properties of these vectors using $\alpha_i = \log_2\|u_i^*\|$. Lemma 3.4 guarantees that for any $j$, $\alpha_j \le n$, and Lemma 3.5 further establishes that $0 \le \sum_{i=1}^s \alpha_j \le sn$.

To decide when to split our space, we introduce a tracking parameter $t_j = 3nj - \sum_{i=1}^{j-1}\alpha_i$. The algorithm finds the smallest index $k$ such that $m_{k+1} \ge t_{k+1}$. If $k=0$, the weights are generally massive and safely far from the threshold, meaning we can simply halve all weights ($w_i' = \lfloor w_i/2 \rfloor$) without changing the function (Lemma 3.8).

... Subspace split

When $k \ge 1$, the first $k$ short vectors define a linear span called $W^1$, while its orthogonal complement is $W^2$. We can perfectly decompose our original weight vector as $w = w^1 + w^2$, where $w^1 \in W^1$ and $w^2 \in W^2$.

Size Reduction:

$w^1$ is already short: Corollary 3.10 proves that $\|w^1\| \le 2^{t_{k+1}+1-2n}$. Because it is bounded, we don't need to reduce it.
$w^2$ is safe to shrink: Because $w^2$ lies in the orthogonal complement $W^2$, it represents directions where the inner product evaluated by the LTF is safely large.

We want to shrink $w^2$ by a factor of 4, aiming for $w' = w^1 + w^2/4$. However, $w^2/4$ might not have integer coordinates. To fix this, Lemma 3.11 guarantees that we can efficiently find an integer vector $\tilde{w}^2$ in the lattice of $W^2$ (denoted $L^2$) that is very close to $w^2/4$, bounded by an error of $\|\tilde{w}^2 - 3w^2/4\| \le n^2 2^{(k+1)n}$.

We set our new, reduced weights as $w' = w^1 + \tilde{w}^2$.

... $f_{w'} = f_w$

We must guarantee that the sign of $(w,x)$ equals the sign of $(w',x)$ for all inputs $x \in \{-1,1\}^{n+1}$.

If the input $x$ lies entirely in $W^1$, then $(x,w) = (x,w')$ exactly, because $\tilde{w}^2$ is orthogonal to $W^1$. If $x$ does not belong to $W^1$, Lemma 3.12 proves that the original inner product is strictly large: $|(x,w)| \ge 2^{m_{k+1}-1}$. Because we bounded the lengths of $w^1$ and the rounding error of $\tilde{w}^2$, the massive size of the $W^2$ component dominates the equation. Therefore, the sign is preserved perfectly, and $f_w = f_{w'}$.

... Conclusion

Lemma 3.13 proves that if our initial weights are large ($\|w\| \ge 2^{4n^2+4n}$), this single iterative step guarantees that $\|w'\| \le \|w\|/2$. This leads directly to the main result of the paper:

Theorem 3.14: Given a set of weights $w$ defining a threshold function $f_w$ in $n$ variables, we can find a set of weights $w'$ such that $\|w'\| \le 2^{4n^2+4n}$ and $f_w = f_{w'}$ in polynomial time.

By simply applying this algorithm repeatedly, we slash the weight size by at least a factor of 2 in every step until they drop below the $O(n^2)$ bit threshold.