RapidMiner

From Robin

Current revision as of 11:35, 21 October 2010

Installation

Download RapidMiner from the rapid-i website.


Using SVM Classifier

The SVM classifier in RapidMiner is based on LIBSVM.


Optimising SVM Parameters

To find the best parameters for the classifier, the Python script grid.py in the LIBSVM distribution is handy. Use it as follows:

  • Make sure the data is formatted properly. All attributes must be scaled to the range -1 to 1. The data must be in a text file in which each line represents one instance of the classification set. Each line starts with the correct class, followed by space-separated index:value pairs (1:first-attribute 2:second-attribute, and so forth). The example below shows the proper format for two instances of class 1 and two instances of class 2, each with four attributes.
1 1:1.000 2:-0.543 3:-0.767 4:-0.253
1 1:-0.184 2:0.144 3:-0.647 4:-0.271 
2 1:-0.684 2:-0.542 3:0.723 4:-0.244 
2 1:-0.964 2:-1.000 3:0.111 4:-0.472 
  • Run grid.py with the name of your data file as an argument. You will get a plot showing the best parameters (C and gamma) for the LIBSVM classifier.
  • If the Python script reports that it cannot find the gnuplot executable, locate gnuplot on your computer and change the corresponding line in the script accordingly. Using the instructions on this page, gnuplot was found to install in the /opt/local/bin directory rather than /usr/local/bin, which is the default in the grid.py script.
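
The data preparation and the gnuplot lookup described above can be sketched in Python. This is only an illustration, not part of LIBSVM or RapidMiner: the helper names scale_columns, to_libsvm_lines and find_gnuplot are hypothetical, the sample values are made up, and the scaling is a simple per-attribute min-max rescaling assumed to be adequate for your data.

```python
import shutil


def scale_columns(rows, lower=-1.0, upper=1.0):
    """Linearly rescale each attribute (column) to the range [lower, upper]."""
    columns = list(zip(*rows))
    scaled_columns = []
    for col in columns:
        lo, hi = min(col), max(col)
        if hi == lo:
            # A constant attribute cannot be rescaled; map it to the midpoint.
            scaled_columns.append([(lower + upper) / 2.0] * len(col))
        else:
            scaled_columns.append(
                [lower + (upper - lower) * (v - lo) / (hi - lo) for v in col]
            )
    return [list(row) for row in zip(*scaled_columns)]


def to_libsvm_lines(labels, rows):
    """Format instances as 'class 1:value 2:value ...' lines (indices start at 1)."""
    lines = []
    for label, row in zip(labels, rows):
        features = " ".join(f"{i}:{v:.3f}" for i, v in enumerate(row, start=1))
        lines.append(f"{label} {features}")
    return lines


def find_gnuplot():
    """Return the full path of gnuplot if it is on PATH, otherwise None."""
    return shutil.which("gnuplot")


if __name__ == "__main__":
    # Made-up sample data: four instances, two attributes, two classes.
    labels = [1, 1, 2, 2]
    raw = [[5.0, 10.0], [3.0, 20.0], [1.0, 30.0], [2.0, 40.0]]
    for line in to_libsvm_lines(labels, scale_columns(raw)):
        print(line)
    # The printed path (if any) is what the gnuplot line in grid.py should point to.
    print("gnuplot:", find_gnuplot())
```

Writing the printed lines to a text file gives a data file in the format shown above, which can then be passed to grid.py.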