Martijho-PathNet-thesis
Revision as of 13:51, 21 March 2018
Opening
Abstract
- What is all this about?
- Why should I read this thesis?
- Is it any good?
- What's new?
Acknowledgements
- Who is your advisor?
- Did anyone help you?
- Who funded this work?
- What's the name of your favorite pet?
Introduction
From essay. More on multi-task learning. More on transfer learning.
Raise problem: catastrophic forgetting.
Multiple solutions (PNN, PN, EWC)
- Large structures (PNN, PN)
- Limited in the number of tasks it can retain (EWC)
Optimize reuse of knowledge while still providing valid solutions to tasks. More reuse and limited capacity use will increase the number of tasks a structure can learn.
- where do i start?
A question DeepMind left unanswered is how different GAs influence task learning and module reuse. Exploration vs exploitation \ref{theoretic background on topic}
- why this?
Broad answers first, specify later. We know PN works. Would it work better with different algorithms? Logical next step from the original paper's "unit of evolution".
Problem/hypothesis
- What does modular PN training do with the knowledge?
- More/less accuracy?
- More/less transferability?
Test by learning end-to-end first, then PN search. Difference in performance or reuse?
- Can we make reuse easier by shifting the focus of the search algorithm?
- PN original: naive search. Would higher exploitation improve module selection?
How to answer?
- Set up simple multitask scenarios and try.
- 2 tasks where the first is learned end-to-end vs. with PN
- List algorithms with different selection pressure and try on multiple tasks.
Theoretical Background
Machine Learning
Intro about ML from the thesis \subsection{MLP and NN modeling as function approx} Inspired by the structure of the brain, the Neural Network (NN) consists of one or more layers, where each layer is made up of perceptrons
- What is a perceptron? How is it connected to input, output?
- How is training done? Input against target
- Multiple layer perceptron (MLP) as an artificial Neural Network (ANN).
- Ref binary MNIST classification in exp 1
- Backpropagation and optimizers (SGD and Adam)
- ref binary MNIST/Quinary MNIST/exp2
- Regression/function approximation (ReLU activation)
- Classification (Softmax and probability approximation)
- ref experiments
- Image classification
- ref experiments
- Convolutional Neural Networks (CNN)
- ref transition binary-quinary exp1 and exp2
- Deep Learning and Deep neural networks (DNN)
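The chain above, from perceptron to MLP to softmax classification, can be illustrated with a minimal forward pass. This is a NumPy sketch with random, untrained weights (shapes chosen to match flattened 28x28 MNIST images); it is not the thesis code:

```python
import numpy as np

def relu(x):
    # rectified linear unit: max(0, x), used for hidden-layer features
    return np.maximum(0.0, x)

def softmax(z):
    # subtract the row max for numerical stability before exponentiating
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mlp_forward(x, W1, b1, W2, b2):
    """One hidden layer: ReLU feature extraction, softmax class probabilities."""
    h = relu(x @ W1 + b1)
    return softmax(h @ W2 + b2)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 784))                     # batch of 4 flattened 28x28 images
W1, b1 = 0.01 * rng.normal(size=(784, 32)), np.zeros(32)
W2, b2 = 0.01 * rng.normal(size=(32, 10)), np.zeros(10)
probs = mlp_forward(x, W1, b1, W2, b2)            # (4, 10) rows summing to 1
```

Training would adjust the weights by backpropagating the gradient of a loss (e.g. cross-entropy) with an optimizer such as SGD or Adam.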
Deep Learning
- Feature extraction
- Bigger black box
- Network designs
- Transfer learning
- What is it?
- Why do it?
- How do it?
- TL in CNNs
- Who have done it?
- Results?
- Gabor approximation
- Multi-task Learning
- Curriculum Learning
- ref to motivation behind task ordering in exp2
- Curriculum Learning
- Catastrophic forgetting
- EWC
- PNN
- PathNet
- Super Neural Networks
- What are they?
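The core mechanic behind "how do it" for transfer learning can be shown with a toy SGD step: the pretrained feature extractor is frozen while only the new task head is updated. A minimal sketch with assumed shapes and names (nothing here is the thesis code):

```python
import numpy as np

def sgd_step(params, grads, lr, trainable):
    # frozen parameters keep their pretrained values; only trainable ones move
    return [p - lr * g if t else p for p, g, t in zip(params, grads, trainable)]

W1 = np.ones((4, 3))          # pretrained feature extractor (frozen)
W2 = np.zeros((3, 2))         # fresh head for the new task (trained)
gW1 = np.full((4, 3), 0.5)    # gradients from the new task's loss
gW2 = np.full((3, 2), 0.5)

W1_new, W2_new = sgd_step([W1, W2], [gW1, gW2], lr=0.1,
                          trainable=[False, True])
# W1_new is unchanged; W2_new moved to -0.05 everywhere
```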
Evolutionary algorithms
- What is it? Where does it come from?
- Exploration vs Exploitation
- ref experiments (formulated in the context of this trade-off)
- Terms used in the evolutionary programming context
- Population
- Genotype and genome
- Fitness-function
- selection
- recombination
- generation
- mutation
- population diversity and convergence
- Some types
- GA
- Evolutionary searches
- short. Straight into tournament search
- Tournament search
- How it works, what are the steps?
- Selection pressure (in larger context of EAs and then tournament search)
- ref to search
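The tournament-search steps can be sketched in a few lines of Python. This is an illustrative version (the names are mine, not from any library); the tournament size k sets the selection pressure: k = 1 degenerates to random selection, k = population size to pure exploitation:

```python
import random

def tournament_select(population, fitness, k, rng):
    """Draw k distinct contenders at random; the fittest one wins."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: fitness[i])]

rng = random.Random(42)
pop = ["a", "b", "c", "d", "e"]
fit = [0.1, 0.9, 0.4, 0.7, 0.2]

winner = tournament_select(pop, fit, k=3, rng=rng)       # fittest of a random trio
best = tournament_select(pop, fit, k=len(pop), rng=rng)  # always "b", the global best
```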
Implementation
EDIT NOTE: Limit overlap in implementation details between this chapter and experimentation implementation. Build up a base that can be built on in chapter 4 and 5.
Python implementation
- why python?
- Problems:
- Not quick to run
- Pros:
- Quick to prototype in
- Generally good to debug
- Multiple good tools for machine learning
- \cite{tensorflow}
- \cite{keras}
- Why are these good?
- Other packages
- Matplotlib (visualization)
- NumPy (numerical math)
- Pickle (data logging)
- code structure
- Object oriented
- Easily parameterized for quick prototyping of PathNet structures
- Class structure:
- Modules
- Layers
- PathNet
- Functionality for
- Building random paths
- Creating Keras models
- Static methods for creating PathNet structures
- Reset backend session
- Tasks
- Search
- Plot generating
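The class structure above could be skeletonized like this; the class and method names follow the outline, but the bodies, parameters, and defaults are illustrative guesses (the Keras model building and backend-session reset are omitted):

```python
import random

class Module:
    """Smallest unit of evolution: a small trainable sub-network."""
    def __init__(self, name):
        self.name = name

class Layer:
    """Holds several parallel modules; a path activates a subset of them."""
    def __init__(self, width, index):
        self.modules = [Module(f"L{index}M{m}") for m in range(width)]

class PathNet:
    """A grid of layers x modules; a path lists active module indices per layer."""
    def __init__(self, depth=3, width=10, max_active=3, seed=None):
        self.layers = [Layer(width, i) for i in range(depth)]
        self.max_active = max_active
        self._rng = random.Random(seed)

    def random_path(self):
        # sample between 1 and max_active distinct modules in every layer
        return [sorted(self._rng.sample(range(len(layer.modules)),
                                        self._rng.randint(1, self.max_active)))
                for layer in self.layers]

pn = PathNet(depth=3, width=10, max_active=3, seed=1)
path = pn.random_path()   # three lists of module indices, one per layer
```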
- Training on GPU
- Quicker in general for ML
- This implementation does a lot on the CPU
- Other implementations could take advantage of customizing layers and models in keras.
- Notable differences in implementation
- Keras implementation
- Path fitness is accuracy rather than negative error
- Exp 2: fitness calculated before evaluation (not in the same step)
- No noise added to the training data
- Implementation problems
- TensorFlow sessions are not made for using multiple graphs
- Resetting the backend session after a number of models are made
- TensorFlow-GPU's default is to use all the GPU memory it can
- Limiting memory allocation to grow only when needed
- A TensorFlow session does not free allocated memory before the Python thread is done
- Run all experiments through threads
- Code available on GitHub
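The threading workaround from the implementation problems above can be sketched as follows; `run_experiment` is a hypothetical stand-in for one training run (in the thesis code it would build its Keras models inside a fresh TensorFlow session, which can be torn down when the worker thread finishes):

```python
import threading
import queue

def run_experiment(task_id, results):
    # placeholder for one full training run; real code would create the
    # TensorFlow session and Keras models here, inside the worker thread
    results.put((task_id, task_id ** 2))

results = queue.Queue()
for i in range(3):
    t = threading.Thread(target=run_experiment, args=(i, results))
    t.start()
    t.join()   # one experiment at a time; backend state is released between runs
collected = dict(results.get() for _ in range(3))   # {0: 0, 1: 1, 2: 4}
```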
Datasets
- Data type
- Use cases and citations
- How does the data look?
- set sizes and class distributions
- state of the art and human level performance
\cite{MNIST} \cite{SVHN}
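Set sizes and class distributions can be reported with a short NumPy helper. The labels below are synthetic stand-ins; the real MNIST labels could come from, e.g., `keras.datasets.mnist.load_data()`:

```python
import numpy as np

def class_distribution(labels, num_classes):
    """Fraction of examples per class -- a quick check of class balance."""
    counts = np.bincount(labels, minlength=num_classes)
    return counts / counts.sum()

# synthetic stand-in for MNIST-style labels: 60,000 examples, 10 classes
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=60_000)
dist = class_distribution(labels, 10)   # 10 fractions summing to 1
```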
Search implementation
- Functions; callback to theoretical background and GA terminology
- parameterization
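Putting the pieces together, a parameterized PathNet-style search could look like the sketch below: two candidate paths are evaluated each generation, and the loser is overwritten by a mutated copy of the winner. All names, parameters, and the toy fitness function are illustrative assumptions, not the thesis implementation:

```python
import random

def mutate(path, width, rate, rng):
    # each module index is replaced by a random one with probability `rate`
    return [[rng.randrange(width) if rng.random() < rate else m
             for m in layer] for layer in path]

def binary_tournament_search(evaluate, width=10, depth=3, modules_per_layer=3,
                             generations=50, mutation_rate=0.1, seed=None):
    """Two paths compete each generation; the loser becomes a mutated winner."""
    rng = random.Random(seed)
    paths = [[[rng.randrange(width) for _ in range(modules_per_layer)]
              for _ in range(depth)] for _ in range(2)]
    for _ in range(generations):
        scores = [evaluate(p) for p in paths]
        winner, loser = (0, 1) if scores[0] >= scores[1] else (1, 0)
        paths[loser] = mutate(paths[winner], width, mutation_rate, rng)
    return max(paths, key=evaluate)

def fitness(path):
    # toy stand-in for validation accuracy: prefer low module indices
    return -sum(sum(layer) for layer in path)

best = binary_tournament_search(fitness, generations=200, seed=1)
```

In the real search, `evaluate` would train the path's modules for a few epochs and return validation accuracy; the toy fitness just makes the loop self-contained.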