data mining - Guaranteeing the same subset for several techniques in RapidMiner's X-Validation


I am in the feature selection stage of a class data mining project whose main objective is to compare several data mining techniques (Naive Bayes, SVM, etc.). In this stage I am using a wrapper with X-Validation, as in the example below:

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<process version="5.3.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="process">
    <process expanded="true">
      <operator activated="true" class="optimize_selection" compatibility="5.3.008" expanded="true" height="94" name="optimize selection (3)" width="90" x="179" y="120">
        <parameter key="generations_without_improval" value="100"/>
        <parameter key="limit_number_of_generations" value="true"/>
        <parameter key="maximum_number_of_generations" value="-1"/>
        <process expanded="true">
          <operator activated="true" class="x_validation" compatibility="5.3.008" expanded="true" height="112" name="validation" width="90" x="179" y="75">
            <process expanded="true">
              <operator activated="true" class="naive_bayes" compatibility="5.3.008" expanded="true" height="76" name="naive bayes (4)" width="90" x="119" y="30"/>
              <connect from_port="training" to_op="naive bayes (4)" to_port="training set"/>
              <connect from_op="naive bayes (4)" from_port="model" to_port="model"/>
              <portspacing port="source_training" spacing="0"/>
              <portspacing port="sink_model" spacing="0"/>
              <portspacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="5.3.008" expanded="true" height="76" name="apply model (8)" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance" compatibility="5.3.008" expanded="true" height="76" name="performance (8)" width="90" x="209" y="30"/>
              <connect from_port="model" to_op="apply model (8)" to_port="model"/>
              <connect from_port="test set" to_op="apply model (8)" to_port="unlabelled data"/>
              <connect from_op="apply model (8)" from_port="labelled data" to_op="performance (8)" to_port="labelled data"/>
              <connect from_op="performance (8)" from_port="performance" to_port="averagable 1"/>
              <portspacing port="source_model" spacing="0"/>
              <portspacing port="source_test set" spacing="0"/>
              <portspacing port="source_through 1" spacing="0"/>
              <portspacing port="sink_averagable 1" spacing="0"/>
              <portspacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <connect from_port="example set" to_op="validation" to_port="training"/>
          <connect from_op="validation" from_port="averagable 1" to_port="performance"/>
          <portspacing port="source_example set" spacing="0"/>
          <portspacing port="source_through 1" spacing="0"/>
          <portspacing port="sink_performance" spacing="0"/>
        </process>
      </operator>
      <portspacing port="source_input 1" spacing="0"/>
      <portspacing port="sink_result 1" spacing="0"/>
    </process>
  </operator>
</process>

The issue is that if I want to compare several techniques, I must guarantee that the sets generated in the cross-validation phase are identical for all techniques, so that I know the accuracy results were produced under exactly the same conditions. Inside the X-Validation operator I can't put more than one model-creating operator, so I don't know how to guarantee that.

The Optimize Selection operator uses the performance of its inner operators to determine which attributes to retain or remove during forward or backward selection. This means the attribute order is determined by the performance returned by the inner learner, and a different inner learner will, in general, yield a different ordering. If you want, it is possible to take a copy of the example set inside the Optimize Selection operator using the Multiply operator and pass it to a second validation block containing the other learner. You can then use the Log operator to record the performance values for this learner and for the original one that is driving the attribute ordering. The Optimize Selection operator can also have its progress logged, and it is possible to record the feature names being considered.
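
The Multiply-based setup described above could be sketched roughly as the inner process of Optimize Selection. This is an illustrative fragment, not a complete importable process: the operator names "Multiply" and "validation svm", the seed value 1992, and the abbreviated sub-processes are all assumptions. The `use_local_random_seed` and `local_random_seed` parameters are also an addition to the answer; as far as I know X-Validation exposes them in RapidMiner 5.x, and giving both validations the same seed on the same example set is intended to make them draw the same folds. Check the parameter names against your own version.

```xml
<!-- Sketch of the inner process of Optimize Selection (fragment, not importable as-is). -->
<process expanded="true">
  <!-- Copy the current attribute subset so both learners receive the same example set. -->
  <operator activated="true" class="multiply" compatibility="5.3.008" expanded="true" name="Multiply"/>

  <!-- Original validation: its performance is what drives the attribute selection. -->
  <operator activated="true" class="x_validation" compatibility="5.3.008" expanded="true" name="validation">
    <!-- Assumed parameters: a fixed local seed so the fold split is reproducible. -->
    <parameter key="use_local_random_seed" value="true"/>
    <parameter key="local_random_seed" value="1992"/>
    <!-- Training and testing sub-processes with Naive Bayes, as in the question. -->
  </operator>

  <!-- Second validation (hypothetical name) holding the other learner, e.g. an SVM. -->
  <operator activated="true" class="x_validation" compatibility="5.3.008" expanded="true" name="validation svm">
    <!-- Same seed as above, so the same example set should be split identically. -->
    <parameter key="use_local_random_seed" value="true"/>
    <parameter key="local_random_seed" value="1992"/>
    <!-- Same sub-processes as above, with only the learner swapped. -->
  </operator>

  <connect from_port="example set" to_op="Multiply" to_port="input"/>
  <connect from_op="Multiply" from_port="output 1" to_op="validation" to_port="training"/>
  <connect from_op="Multiply" from_port="output 2" to_op="validation svm" to_port="training"/>
  <!-- Only the original validation feeds the performance port of Optimize Selection. -->
  <connect from_op="validation" from_port="averagable 1" to_port="performance"/>
</process>
```

Only one performance vector can steer the selection, so the second validation here is purely an observer: its result would be picked up by a Log operator (configuration omitted) rather than returned to Optimize Selection. To let the second learner drive the selection instead, swap which validation is connected to the `performance` port.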

