LDNN

The Logistic Disjunctive Normal Network (LDNN) is a novel method for approximating Boolean functions. It provides state-of-the-art performance on general classification tasks. We propose a training algorithm for this structure that provides fast convergence rates [1]. Here we briefly introduce the structure and explain the motivation and principles behind it:

An artificial neural network (ANN) consisting of one hidden layer of squashing functions is a universal approximator for continuous functions defined on the unit hypercube. However, until the introduction of the backpropagation algorithm, training such networks was not possible in practice. The backpropagation algorithm propelled ANNs to become the method of choice for many classification and regression applications. Eventually, however, ANNs were overtaken by more recent techniques such as support vector machines (SVMs) and random forests (RFs). In addition to being surpassed in accuracy by these techniques, an important drawback of ANNs has been the high computational cost of training, which is emphasized by growing dataset sizes and dimensionality. An underlying reason for the limited accuracy and high computational cost of training is the herd-effect problem: during backpropagation, each hidden unit tries to evolve into a useful feature detector from a random initialization, but this task is complicated by the fact that all units are changing at the same time without any direct communication between them. Consequently, the hidden units cannot effectively subdivide the necessary computational tasks among themselves, leading to a complex dance that can take a long time to settle down. Here we propose a network architecture that overcomes these difficulties associated with ANNs and backpropagation for supervised learning.

An LDNN consists of one adaptive layer of feature detectors implemented by logistic sigmoid functions, followed by two fixed layers of logical units that compute conjunctions and disjunctions, respectively. The LDNN efficiently approximates the classification function as a union of convex polytopes. Unlike MLPs, LDNNs allow a simple and intuitive initialization of the network weights that avoids the herd effect. Assume we have N conjunctions, each taking M logistic sigmoid functions (discriminants) as inputs, so the first layer contains N×M linear discriminants in total. We partition the positive training data into N clusters and the negative training data into M clusters, and we initialize the weights of the discriminants of each conjunction as the vectors that connect the centroid of one positive cluster to the centroids of all the negative clusters. Hence, each conjunction aims to separate one cluster of positive data from all of the negative data by forming a convex polytope. All the weights in this structure are then fine-tuned using backpropagation to obtain the best decision function. This process is illustrated in the figure below using the two-moons dataset.

A binary classification problem: (a) positive and negative training examples partitioned into three clusters each; linear discriminants from each negative cluster to (b) the first positive cluster, (c) the second positive cluster, and (d) the third positive cluster; the conjunction of the discriminants for (g) the first positive cluster, (f) the second positive cluster, and (e) the third positive cluster; (h) the disjunction of the conjunctions before (blue) and after (red) gradient descent. The 1/0 pairs on the sides of the discriminants indicate the direction of each discriminant.
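To make the structure and its initialization concrete, below is a minimal sketch in Python (NumPy/scikit-learn). It is illustrative rather than the original implementation: the class name LDNNSketch, the product-based soft conjunctions and disjunction, and the choice of bias that places each hyperplane at the midpoint between a pair of centroids are assumptions of this sketch. It clusters the positive data into N groups and the negative data into M groups with k-means, points each discriminant from a negative centroid toward a positive centroid, and evaluates a differentiable relaxation of OR_i AND_j sigma(w_ij . x + b_ij).

    import numpy as np
    from sklearn.cluster import KMeans

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    class LDNNSketch:
        """Sketch of an LDNN: N conjunctions, each fed by M sigmoid discriminants."""

        def __init__(self, n_conjunctions, m_discriminants, dim):
            self.N, self.M = n_conjunctions, m_discriminants
            # W[i, j] is the weight vector of discriminant j in conjunction i.
            self.W = np.zeros((self.N, self.M, dim))
            self.b = np.zeros((self.N, self.M))

        def init_from_clusters(self, X_pos, X_neg):
            # Cluster positives into N groups and negatives into M groups.
            c_pos = KMeans(n_clusters=self.N, n_init=10).fit(X_pos).cluster_centers_
            c_neg = KMeans(n_clusters=self.M, n_init=10).fit(X_neg).cluster_centers_
            for i in range(self.N):
                for j in range(self.M):
                    # Discriminant (i, j): vector from negative centroid j toward
                    # positive centroid i; placing the hyperplane at the midpoint
                    # of the two centroids is an assumption of this sketch.
                    w = c_pos[i] - c_neg[j]
                    self.W[i, j] = w
                    self.b[i, j] = -w @ (0.5 * (c_pos[i] + c_neg[j]))

        def forward(self, X):
            # Discriminant responses sigma(w_ij . x + b_ij), shape (n_samples, N, M).
            s = sigmoid(np.einsum('nd,ijd->nij', X, self.W) + self.b)
            # Soft AND within each conjunction, soft OR (De Morgan) across conjunctions.
            g = s.prod(axis=2)                     # (n_samples, N)
            return 1.0 - np.prod(1.0 - g, axis=1)  # (n_samples,)

    # Example usage on a two-moons style dataset (hypothetical settings):
    from sklearn.datasets import make_moons
    X, y = make_moons(n_samples=400, noise=0.1)
    net = LDNNSketch(n_conjunctions=3, m_discriminants=3, dim=2)
    net.init_from_clusters(X[y == 1], X[y == 0])
    pred = net.forward(X) > 0.5

After this initialization, all of the weights and biases would be refined jointly by gradient descent on a standard loss, corresponding to the backpropagation fine-tuning step described above.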

References:


[1] Mehdi Sajjadi, Mojtaba Seyedhosseini, and Tolga Tasdizen. "Disjunctive Normal Networks." Neurocomputing 2016.

