Abstract
Many biological propositions can be supported
by a variety of different types of
evidence. It is often useful to collect
large numbers of such propositions,
along with the evidence supporting them,
into databases to be used in other analyses.
Methods that automatically make preliminary
choices about which propositions to
include can be helpful, if they are accurate
enough. This can involve weighing evidence
of varying strength.
We describe a method for learning a scoring
function to weigh evidence of different types.
The algorithm evaluates each source of evidence
by the extent to which other sources
tend to support it. The details are guided
by a probabilistic formulation of the problem,
building on previous theoretical work.
We evaluate our method by applying it to
predict protein-protein interactions in yeast,
and by using synthetic data.
After publishing this paper, we discovered a bug in the code
that evaluates the EM algorithm on the protein-protein
interaction data. The software above includes a patch.
Here is a revision of Table 1 from the paper: