Learning Bayesian Networks with the Complete Minimum Description Length
Learn_CMDL is a Java implementation of a score-based learning algorithm for Bayesian networks. It implements a newly proposed scoring function for Bayesian networks, the complete minimum description length (CMDL). The program receives a data set of multivariate categorical observations and outputs the best-scoring structure found by the greedy hill climbing (GHC) algorithm.
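Besides CMDL, the program offers the MDL and LL scores. As a rough illustration of what a score-based learner evaluates, the sketch below computes the standard log-likelihood and MDL terms for a single node given a candidate parent set. It is not taken from the Learn_CMDL source and it does not implement CMDL, whose definition is not given here. The class name MdlSketch, the integer encoding of categories, and the arity array are assumptions made for this example; logarithms are natural, so scores are in nats.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, not part of Learn_CMDL.
final class MdlSketch {

    /** Log-likelihood term of column `child` given the parent columns (maximum-likelihood estimates). */
    static double localLogLikelihood(int[][] data, int child, int[] parents) {
        // counts: parent configuration -> (child value -> number of rows)
        Map<String, Map<Integer, Integer>> counts = new HashMap<>();
        for (int[] row : data) {
            StringBuilder key = new StringBuilder();
            for (int p : parents) {
                key.append(row[p]).append(',');
            }
            counts.computeIfAbsent(key.toString(), k -> new HashMap<>())
                  .merge(row[child], 1, Integer::sum);
        }
        double ll = 0.0;
        for (Map<Integer, Integer> byValue : counts.values()) {
            int total = 0;
            for (int c : byValue.values()) {
                total += c;
            }
            for (int c : byValue.values()) {
                ll += c * Math.log((double) c / total);
            }
        }
        return ll;
    }

    /** Standard local MDL score: LL - (ln N / 2) * (r_child - 1) * product of parent arities. */
    static double localMdl(int[][] data, int[] arity, int child, int[] parents) {
        long params = arity[child] - 1;
        for (int p : parents) {
            params *= arity[p];
        }
        return localLogLikelihood(data, child, parents) - 0.5 * Math.log(data.length) * params;
    }

    public static void main(String[] args) {
        // Toy data: column 1 always copies column 0.
        int[][] data = { {0, 0}, {0, 0}, {1, 1}, {1, 1} };
        int[] arity = {2, 2};
        System.out.println("no parents:  " + localMdl(data, arity, 1, new int[] {}));
        System.out.println("parent = X0: " + localMdl(data, arity, 1, new int[] {0}));
    }
}
```

On this toy data the score with the parent is higher than the score without it, since the perfect fit outweighs the penalty for the extra parameter. A structure search would prefer the edge for that reason.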
The program comes packaged as an executable JAR file that already includes the required external libraries; it can be downloaded here. The source code can be downloaded here.
To visualize the output graph in DOT format, download Graphviz.
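For readers unfamiliar with DOT, the following sketch shows how a directed graph can be written in that format. The exact layout of the file produced by Learn_CMDL is not documented here, so the class DotSketch, the method toDot, and the node names are hypothetical. Node names are quoted but not escaped, so the sketch assumes they contain no double quotes.

```java
// Hypothetical helper class, not part of Learn_CMDL.
final class DotSketch {

    /** Renders a directed graph as Graphviz DOT text; adj[i][j] == true means an edge i -> j. */
    static String toDot(String[] names, boolean[][] adj) {
        StringBuilder sb = new StringBuilder("digraph G {\n");
        // Declare every node first so that isolated nodes are drawn too.
        for (String name : names) {
            sb.append("  \"").append(name).append("\";\n");
        }
        for (int i = 0; i < names.length; i++) {
            for (int j = 0; j < names.length; j++) {
                if (adj[i][j]) {
                    sb.append("  \"").append(names[i]).append("\" -> \"")
                      .append(names[j]).append("\";\n");
                }
            }
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        String[] names = {"A", "B", "C"};
        boolean[][] adj = new boolean[3][3];
        adj[0][1] = true; // A -> B
        adj[0][2] = true; // A -> C
        System.out.print(toDot(names, adj));
    }
}
```

A file in this format can be rendered with Graphviz, for example with `dot -Tpng graph.dot -o graph.png`, where both file names are placeholders.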
The algorithm receives a .csv file such that:
Run the program by executing the .jar file:
$ java -jar Learn_CMDL.jar
The command-line options are the following:
--inputFile <file>        Input CSV file to be used for network learning.
--scoringFunction <arg>   Scoring function to be used: CMDL, MDL, LL and K2. CMDL is used by default.
--numRestarts <int>       Number of random restarts for the greedy hill climber (GHC).
--outputFile <file>       Writes output to <file>.
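To make the role of the random restarts concrete, here is a minimal sketch of greedy hill climbing over directed acyclic graphs with random restarts. It is not the Learn_CMDL implementation. The move set (adding or deleting a single edge, with no edge reversal), the random starting graphs, the seed parameter, and all names are assumptions chosen to keep the example short. The score is passed in as a function, so any of the scoring functions could be plugged in.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

// Hypothetical helper class, not part of Learn_CMDL.
final class HillClimbSketch {

    /** Returns true if the graph contains a directed cycle (depth-first search with node states). */
    static boolean hasCycle(boolean[][] adj) {
        int[] state = new int[adj.length]; // 0 = unvisited, 1 = on the DFS stack, 2 = finished
        for (int v = 0; v < adj.length; v++) {
            if (state[v] == 0 && dfs(adj, v, state)) return true;
        }
        return false;
    }

    private static boolean dfs(boolean[][] adj, int v, int[] state) {
        state[v] = 1;
        for (int w = 0; w < adj.length; w++) {
            if (!adj[v][w]) continue;
            if (state[w] == 1 || (state[w] == 0 && dfs(adj, w, state))) return true;
        }
        state[v] = 2;
        return false;
    }

    private static boolean[][] copy(boolean[][] g) {
        boolean[][] c = new boolean[g.length][];
        for (int i = 0; i < g.length; i++) c[i] = g[i].clone();
        return c;
    }

    /** Repeatedly applies the best single-edge toggle until no acyclic toggle improves the score. */
    static boolean[][] climb(boolean[][] start, ToDoubleFunction<boolean[][]> score) {
        int n = start.length;
        boolean[][] g = copy(start);
        double best = score.applyAsDouble(g);
        while (true) {
            int bi = -1, bj = -1;
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    if (i == j) continue;
                    g[i][j] = !g[i][j]; // try adding or deleting edge i -> j
                    if (!hasCycle(g)) {
                        double s = score.applyAsDouble(g);
                        if (s > best) { best = s; bi = i; bj = j; }
                    }
                    g[i][j] = !g[i][j]; // undo the trial move
                }
            }
            if (bi < 0) return g;   // local optimum reached
            g[bi][bj] = !g[bi][bj]; // commit the best improving move
        }
    }

    /** Climbs from the empty graph and from `restarts` random DAGs; returns the best result seen. */
    static boolean[][] search(int n, int restarts, long seed, ToDoubleFunction<boolean[][]> score) {
        Random rnd = new Random(seed);
        boolean[][] best = climb(new boolean[n][n], score);
        double bestScore = score.applyAsDouble(best);
        for (int r = 0; r < restarts; r++) {
            boolean[][] start = new boolean[n][n];
            // Edges only from lower to higher index, so the start graph is acyclic by construction.
            for (int i = 0; i < n; i++) {
                for (int j = i + 1; j < n; j++) start[i][j] = rnd.nextInt(4) == 0;
            }
            boolean[][] g = climb(start, score);
            double s = score.applyAsDouble(g);
            if (s > bestScore) { bestScore = s; best = g; }
        }
        return best;
    }
}
```

Each restart can end in a different local optimum, so a larger restart count trades running time for a better chance of finding a high-scoring structure. It does not guarantee a globally optimal one.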
Consider the benchmark LED data set led_500.csv with 500 instances. Taking the following options:
The command to learn the optimal network is:
java -jar Learn_CMDL.jar --inputFile led_500.csv --scoringFunction CMDL --numRestarts 1000 --outputFile out_cmdl
It outputs the following structure: