Martijho-PathNet-Experiments

From Robin

Revision as of 29 Nov 2017, 15:59

Evolved paths through a Modular Super Neural Network

Research question

How would different evolutionary algorithms influence the outcome of training a PathNet structure on multiple tasks? Which evolutionary strategies make the most sense for training an SNN? Can evolutionary strategies easily be implemented as a search technique for a PathNet structure?

Hypothesis

Different evolutionary algorithms would probably not change the PathNet results significantly for a limited number of tasks, but they might prove fruitful when searching for an optimal path in a saturated PathNet. Here, the search domain consists of pre-trained modules, hopefully with a memetic separation for each layer/module. This would ensure good transferability between tasks and, in the end, simplify the search and the training of the task-specific softmax layer for the new task.

Gradual Learning in Super Neural Networks

Research question

Can the modularity of the SNN help show what level of transferability there is between modules used in the different tasks in the curriculum? How large is the reduction in training needed to learn a new task when a saturated PathNet is provided, compared to learning de novo?

Hypothesis

By testing which modules are used in which optimal paths, this study might show reuse of some modules across multiple tasks, which would indicate the value of curriculum design. A high level of reuse might even point towards the possibility of one-shot learning in a saturated SNN.

Suggested Experiment

Train an RL agent on a simple toy environment such as LunarLander from OpenAI Gym. This requires some rework of the reward signal from the environment to fake rewards for the sub-tasks in the curriculum. Rewards in early sub-tasks might be clear-cut values (1 if the sub-goal is reached, 0 on failure).

Read up on curriculum design techniques

Then create a sequence of sub-tasks of gradually increasing complexity, and search for an optimal path through the PathNet for each sub-task. This implementation would use some version of temporal-difference learning (Q-learning), and each path would represent an approximation of a value function.
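As a rough illustration of the temporal-difference update mentioned above, here is a minimal tabular Q-learning sketch with a binary sub-goal reward. The state labels, the `subgoal_reward` helper and the hyperparameters `alpha` and `gamma` are assumptions made for this example. In the proposed experiment the value function would be approximated by a path through the PathNet rather than stored in a table.

```python
from collections import defaultdict


def subgoal_reward(reached_subgoal: bool) -> float:
    """Clear-cut curriculum reward: 1 if the sub-goal was reached, else 0."""
    return 1.0 if reached_subgoal else 0.0


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.99):
    """One Q-learning step:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    Returns the TD error of the step."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


# Hypothetical usage: states are opaque labels, four discrete actions.
q_table = defaultdict(float)
actions = [0, 1, 2, 3]
err = q_update(q_table, "s0", 1, subgoal_reward(True), "s1", actions)
```

A table is used here only to keep the update rule visible; swapping it for a function approximator changes how `q[(state, action)]` is read and written, not the TD target itself.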

Capacity Increase

Research question

Can we estimate the decline in needed capacity for each new sub-task learned from the curriculum? How "much" capacity is needed to learn a new meme?

Hypothesis

Previous studies show a decline in needed capacity for each new sub-task (cite: Progressive Neural Networks-paper). If a metric can be defined for measuring the capacity change, we expect the results to confirm this.


Search for the first path?

Research question

Is there anything to gain from performing a proper search for the first path versus just picking a random path and training the weights? In a two-task system, what's the difference between picking a first path and a PNN?

Hypothesis

I think performance will have the same asymptote, but it will be reached in fewer training iterations. The only thing that might be influenced by this path selection is that the modules in PathNet might have more interconnected dependencies. Maybe the layers are more "independent" when the weights are updated as part of multiple paths? This might be important for transferability when learning future tasks.

Suggested experiment

Perform multiple small multi-task learning scenarios. Two tasks should be enough, but it is necessary to show that modules are reused in each scenario. Test both picking a path and the full search for a path, and compare convergence time for the second task.

Run multiple executions of a first-task path search on a binary MNIST classification problem, up to 99% classification accuracy on a test set (as in the original PathNet paper). Log the training counter for each optimal path and look at the average number of training iterations each path has (so far: around 12?).

12 x 50 = 600, i.e. 600 backpropagation steps with batch size 16, or 600 x 16 = 9600 training examples shown.
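The training budget above can be reproduced with a few lines of Python. The constant names are made up for this sketch; the values (about 12 training units per path, 50 minibatches per unit, batch size 16) are the ones quoted in the text.

```python
AVG_TRAINING_UNITS = 12    # observed average per optimal path ("around 12")
MINIBATCHES_PER_UNIT = 50  # one training unit corresponds to 50 minibatches
BATCH_SIZE = 16

# Total backpropagation steps and total training examples shown.
backprop_steps = AVG_TRAINING_UNITS * MINIBATCHES_PER_UNIT
examples_shown = backprop_steps * BATCH_SIZE

print(backprop_steps, examples_shown)
```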

Then run multiple iterations where random paths of the same average size as in the original experiment are trained for 600 iterations. Compare the classification accuracy of each path.

Metrics
  • Training counter for each module and average of each path
  • Path size (number of modules in path): connected to capacity?
  • Reuse of modules (transferability)
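The metrics listed above could be computed along the following lines. This is a sketch under assumptions: a path is represented as one set of active module indices per layer, and the training counter is a dictionary keyed by `(layer, module)`. Neither representation is taken from the actual experiment code, and the example values are invented.

```python
from typing import Dict, List, Set, Tuple

Path = List[Set[int]]  # one set of active module indices per layer


def path_size(path: Path) -> int:
    """Number of modules in the path."""
    return sum(len(layer) for layer in path)


def reuse_by_layer(path1: Path, path2: Path) -> List[int]:
    """Modules shared between the two paths, counted layer by layer."""
    return [len(a & b) for a, b in zip(path1, path2)]


def module_reuse(path1: Path, path2: Path) -> int:
    """Total number of modules from path1 that reappear in path2."""
    return sum(reuse_by_layer(path1, path2))


def average_training(path: Path, counter: Dict[Tuple[int, int], int]) -> float:
    """Mean training counter over the modules in a path."""
    counts = [counter.get((layer_idx, module), 0)
              for layer_idx, layer in enumerate(path)
              for module in layer]
    return sum(counts) / len(counts) if counts else 0.0


# Hypothetical example: three layers, made-up training counters.
path1 = [{0, 3}, {1}, {2, 5, 7}]
path2 = [{3, 4}, {1, 6}, {8}]
counter = {(0, 0): 10, (0, 3): 14, (1, 1): 12,
           (2, 2): 12, (2, 5): 11, (2, 7): 13}
```

Keeping the per-layer overlap separate from the total makes it possible to report both the overall reuse and the reuse for each layer from the same data.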

Implemented experiment

Problem: Binary MNIST classification (same as DeepMind's experiments, without the salt-and-pepper noise)

500 Search + Search:

  • Search for path 1 and evaluate optimal path found
  • Search for path 2 and evaluate optimal path found
  • Each found path is evaluated on test set
  • For each path, save: the path itself, its evaluated fitness, the number of generations used to reach it, and the average training each module received within the path (1 unit = 50 minibatches)
  • Also store the number of reused modules from task 1 to task 2
  • Generate the training plots for path 1 and path 2 and write them to PDF

500 Pick + Search:

  • Generate a random path
  • Train it the same number of times as the average training time for the first path from "search + search" with the same iteration index
  • Evaluate the random path on the test set
  • Search for path 2 and evaluate the optimal path found
  • Store same metrics as in search + search
  • Generate the training plots for path 1 and path 2 and write them to PDF

Finally, write the log to file.
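The "generate a random path" step of the pick + search procedure might look like the sketch below. The function name and the default values (3 layers, 10 modules per layer, at most 3 active modules per layer) are assumptions for illustration; the experiment's actual configuration is not stated here.

```python
import random
from typing import List, Optional


def random_path(num_layers: int = 3,
                modules_per_layer: int = 10,
                max_active: int = 3,
                rng: Optional[random.Random] = None) -> List[List[int]]:
    """Pick between 1 and max_active distinct modules in every layer."""
    rng = rng or random.Random()
    path = []
    for _ in range(num_layers):
        n_active = rng.randint(1, max_active)  # inclusive on both ends
        path.append(sorted(rng.sample(range(modules_per_layer), n_active)))
    return path


# Seeded generator so that a run can be repeated.
path = random_path(rng=random.Random(0))
```

Passing an explicit `random.Random` instance keeps each of the 500 runs reproducible without touching the global random state.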

Results

First run:

Iterations: 600

Population size: 64

Accuracy threshold: 98%

Tasks: [3, 4] then [1, 2]

Plots

Distribution of module reuse in s+s and p+s searches alongside the distribution for a random module selection.
Amount of module reuse for each layer in s+s and p+s searches alongside the amount of reuse when randomly selecting modules.
Average training each module in a path undergoes for s+s and p+s searches, plotted over the amount of module reuse.
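How the random-selection baseline in these plots was produced is not stated. One plausible way to obtain it analytically, sketched below under that caveat, is to treat the overlap in a single layer as hypergeometric: if the first path uses `k1` of `m` modules and the second path picks `k2` modules uniformly at random, the number of shared modules follows a hypergeometric distribution with mean `k1 * k2 / m`. The function names and the example values are assumptions.

```python
from math import comb
from typing import Dict


def random_reuse_distribution(m: int, k1: int, k2: int) -> Dict[int, float]:
    """P(overlap = x) when k2 of m modules are drawn at random
    and k1 of the m modules were used by the first path."""
    lowest = max(0, k2 - (m - k1))  # overlap forced when few unused modules remain
    highest = min(k1, k2)
    total = comb(m, k2)
    return {x: comb(k1, x) * comb(m - k1, k2 - x) / total
            for x in range(lowest, highest + 1)}


def expected_random_reuse(m: int, k1: int, k2: int) -> float:
    """Mean of the distribution above."""
    return sum(x * p for x, p in random_reuse_distribution(m, k1, k2).items())


# Hypothetical layer: 10 modules, 3 active in each path.
dist = random_reuse_distribution(10, 3, 3)
mean_reuse = expected_random_reuse(10, 3, 3)
```

Summing the per-layer expectations gives an expected total reuse for a whole random path, which can be compared with the reuse measured in the s+s and p+s runs.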