Martijho-PathNet-Experiments


Evolved paths through a Modular Super Neural Network

Research question

How would different evolutionary algorithms influence outcomes in training a PathNet structure on multiple tasks? Which evolutionary strategies make the most sense in the scheme of training an SNN? Can evolutionary strategies easily be implemented as a search technique for a PathNet structure?

Hypothesis

Different evolutionary algorithms would probably not change the PathNet results significantly for a limited number of tasks, but they might prove fruitful when searching for an optimal path in a saturated PathNet. Here, the search domain consists of pre-trained modules, hopefully with a memetic separation for each layer/module. This would ensure good transferability between tasks and, in the end, simplify the search for and training of the task-specific softmax layer given the new task.
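
As a baseline for comparing strategies, here is a minimal Python sketch of the binary tournament selection used in the original PathNet paper. The layout constants and the evaluate function (train the modules on a path briefly and return a fitness score) are assumptions for illustration, not part of these notes.

    import random

    N_LAYERS, N_MODULES, MAX_ACTIVE = 3, 10, 4  # assumed PathNet layout

    def random_path():
        # A path genotype: up to MAX_ACTIVE distinct module indices per layer.
        return [random.sample(range(N_MODULES), MAX_ACTIVE)
                for _ in range(N_LAYERS)]

    def mutate(path, rate=0.1):
        # Re-draw each module index independently with probability `rate`.
        return [[random.randrange(N_MODULES) if random.random() < rate else m
                 for m in layer] for layer in path]

    def tournament_search(evaluate, population_size=64, generations=100):
        # Binary tournament: the loser's genotype is overwritten by a
        # mutated copy of the winner's, as in the PathNet paper.
        population = [random_path() for _ in range(population_size)]
        for _ in range(generations):
            a, b = random.sample(range(population_size), 2)
            if evaluate(population[a]) >= evaluate(population[b]):
                population[b] = mutate(population[a])
            else:
                population[a] = mutate(population[b])
        return max(population, key=evaluate)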

Gradual Learning in Super Neural Networks

Research question

Can the modularity of the SNN help show what level of transferability there is between the modules used in the different tasks of the curriculum?

Hypothesis

By testing which modules are used in which optimal paths, this study might show reuse of some modules across multiple tasks, which would indicate the value of curriculum design. A high level of reuse might even point towards the possibility of one-shot learning in a saturated SNN.
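
One way to quantify reuse: the fraction of (layer, module) slots on a new task's optimal path that already occur on earlier tasks' paths. A minimal sketch, assuming a path is represented as a list of module-index lists, one per layer:

    def module_reuse(new_path, earlier_paths):
        # Collect (layer, module) pairs claimed by earlier optimal paths.
        earlier = {(i, m) for path in earlier_paths
                   for i, layer in enumerate(path) for m in layer}
        new = {(i, m) for i, layer in enumerate(new_path) for m in layer}
        # 1.0 means every module was reused; 0.0 means none were.
        return len(new & earlier) / len(new)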

Suggested Experiment

Train an RL agent on a simple toy environment such as LunarLander from the OpenAI Gym. This requires some rework of the environment's reward signal to fake rewards for sub-tasks in the curriculum. Rewards in early sub-tasks might be clear-cut values (1 if the sub-goal is reached, 0 if not).
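
A hedged sketch of how that reward rework might look as a Gym wrapper (using the pre-0.26 Gym step API). The reached_subgoal predicate is a hypothetical placeholder for whatever the current curriculum stage demands, e.g. hovering near the pad:

    import gym

    class SubTaskReward(gym.Wrapper):
        def __init__(self, env, reached_subgoal):
            super().__init__(env)
            self.reached_subgoal = reached_subgoal

        def step(self, action):
            obs, _, done, info = self.env.step(action)
            # Clear-cut sub-task reward: 1 if the sub-goal is reached, 0 if not.
            reward = 1.0 if self.reached_subgoal(obs) else 0.0
            return obs, reward, done, info

    # Example sub-task: stay horizontally centred over the landing pad.
    env = SubTaskReward(gym.make("LunarLander-v2"),
                        reached_subgoal=lambda obs: abs(obs[0]) < 0.1)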

Read up on curriculum design techniques

Then create a sequence of sub-tasks that gradually increase in complexity, and search for an optimal path through the PathNet for each sub-task. This implementation would use some version of temporal-difference learning (Q-learning), and each path would represent some approximation of a value function.
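
For reference, the one-step Q-learning (TD) update that each path's value approximation would be trained against, sketched in tabular form; the deep variant replaces the table with the network built from the active path:

    import numpy as np

    def q_update(q_table, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
        # One-step target: r + gamma * max_a' Q(s', a'), zero beyond terminal.
        target = r + (0.0 if done else gamma * np.max(q_table[s_next]))
        q_table[s, a] += alpha * (target - q_table[s, a])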

Capacity Increase

Research question

Can we estimate the decline in needed capacity for each new sub-task learned from the curriculum? How "much" capacity is needed to learn a new meme?

Hypothesis

Previous studies show a decline in the capacity needed for each new sub-task (cf. the Progressive Neural Networks paper). If a metric can be defined for measuring the capacity change, we expect the results to confirm this.
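
One candidate metric, assuming capacity is counted as the number of not-yet-frozen modules a task's optimal path claims; frozen_modules, a set of (layer, module) pairs fixed by earlier sub-tasks, is hypothetical bookkeeping:

    def new_capacity_used(optimal_path, frozen_modules):
        # Modules on the path that had to be trained from scratch.
        used = {(i, m) for i, layer in enumerate(optimal_path) for m in layer}
        return len(used - frozen_modules)

Plotting this value per sub-task would show whether the expected decline appears.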


Search for the first path?

Research question

Is there anything to gain from performing a proper search for the first path versus just picking a random path and training its weights? In a two-task system, what is the difference between picking a first path and a PNN?

Hypothesis

I think performance will reach the same asymptote, but it will be reached in fewer training iterations. The only thing that might be influenced by this path selection is that the modules in the PathNet might have more interconnected dependencies. Maybe the layers are more "independent" when the weights are updated as part of multiple paths? This might be important for transferability when learning future tasks.

Suggested experiment

Perform multiple small multi-task learning scenarios. Two tasks should be enough, but it is necessary to show that modules are reused in each scenario. Test both picking a random path and the full search for a path, and compare convergence times for the second task.
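
A sketch of the comparison loop; every PathNet-specific helper is injected as a hypothetical argument, since the experiment code does not exist yet:

    def compare(make_pathnet, choose_path, train_and_freeze, converge_iters,
                n_repeats=20):
        # choose_path: random pick or full evolutionary search (task 1).
        # train_and_freeze: trains task 1 on the path, then locks its modules.
        # converge_iters: trains task 2 and returns iterations to convergence.
        totals = []
        for _ in range(n_repeats):
            net = make_pathnet()          # fresh, untrained modules
            path1 = choose_path(net)
            train_and_freeze(net, path1)
            totals.append(converge_iters(net))
        return sum(totals) / len(totals)  # mean task-2 convergence time

Running compare once with a random choose_path and once with the search-based one gives the two convergence times to contrast.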

Implemented experiment

  • Information about the execution of the experiment

Results

  • Plots and table contents showing experiment results

Conclusion

  • What do the results tell me



