Strongly Possible Functional Dependencies for SQL
Abstract
Missing data is a widespread challenge in research. It reduces statistical power and may introduce selection bias into the data. Many approaches to handling this problem have been proposed; the main ones either ignore (remove) missing values or impute (fill in) new values. This paper uses the second approach. Possible worlds and possible and certain keys were introduced by Köhler et al. and by Levene et al. Köhler and Link introduced certain functional dependencies (c-FDs) as a natural complement to Lien's class of possible functional dependencies (p-FDs). Weak and strong functional dependencies were studied by Levene and Loizou. In a preceding paper, we introduced the intermediate concept of strongly possible worlds, which are obtained by imputing values that already exist in the table. This results in strongly possible keys (spKeys) and strongly possible functional dependencies (spFDs). We give a polynomial algorithm to verify a single spKey and show that, in general, it is NP-complete to verify an arbitrary collection of spKeys. We give a graph-theoretical characterization of the validity of a given spFD X →sp Y.
We show that verifying a single strongly possible functional dependency is NP-complete in general, and then identify some cases in which a single spFD can be verified in polynomial time.
As a step toward an axiomatization of spFDs, we check the rules given for weak and strong functional dependencies. For those rules that are not sound for spFDs, we list appropriate weakenings.
We study the interaction between spFDs, spKeys, and certain keys. Furthermore, we give a graph-theoretical characterization of implication between singular-attribute spFDs.
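To make the notions of strongly possible world, spKey, and spFD concrete, the following is a minimal brute-force sketch in Python. It assumes that a strongly possible world is obtained by replacing each missing value with a value already occurring in the same column, and that an spKey (respectively spFD) holds when at least one such world satisfies the key (respectively the functional dependency). The names `NULL`, `strongly_possible_worlds`, `holds_sp_key`, and `holds_sp_fd` are hypothetical and chosen for illustration only. The sketch enumerates all worlds and is therefore exponential; it is not the polynomial spKey algorithm referred to above.

```python
from itertools import product

NULL = None  # assumed marker for a missing value


def strongly_possible_worlds(table):
    """Yield every total table obtained by replacing each NULL with a value
    that already occurs in the same column (assumed definition).
    Assumes a non-empty table whose columns hold mutually comparable values."""
    n_cols = len(table[0])
    # Values visible in each column, in a stable order.
    domains = [
        sorted({row[c] for row in table if row[c] is not NULL})
        for c in range(n_cols)
    ]
    # Positions (row, column) of the missing values.
    holes = [
        (r, c)
        for r, row in enumerate(table)
        for c, v in enumerate(row)
        if v is NULL
    ]
    for choice in product(*(domains[c] for _, c in holes)):
        world = [list(row) for row in table]
        for (r, c), v in zip(holes, choice):
            world[r][c] = v
        yield [tuple(row) for row in world]


def is_key(world, key):
    """True if all rows differ on the columns in `key`."""
    projections = {tuple(row[c] for c in key) for row in world}
    return len(projections) == len(world)


def satisfies_fd(world, lhs, rhs):
    """True if rows that agree on `lhs` also agree on `rhs`."""
    image = {}
    for row in world:
        x = tuple(row[c] for c in lhs)
        y = tuple(row[c] for c in rhs)
        if image.setdefault(x, y) != y:
            return False
    return True


def holds_sp_key(table, key):
    """Brute-force check: some strongly possible world has `key` as a key."""
    return any(is_key(w, key) for w in strongly_possible_worlds(table))


def holds_sp_fd(table, lhs, rhs):
    """Brute-force check: some strongly possible world satisfies lhs -> rhs."""
    return any(satisfies_fd(w, lhs, rhs) for w in strongly_possible_worlds(table))


# Columns: 0 = A, 1 = B. The first row has a missing B value.
table = [(1, NULL), (1, 2), (2, 3)]
print(holds_sp_key(table, [0, 1]))   # impute B = 3 in the first row
print(holds_sp_key(table, [0]))      # A = 1 occurs twice in every world
print(holds_sp_fd(table, [0], [1]))  # impute B = 2 in the first row
```

In this toy table the pair (A, B) can be made a key by imputing 3, and A → B can be satisfied by imputing 2, whereas A alone cannot be a key in any world because two rows already share A = 1.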