# Martijho-PathNet-thesis


Notes
• Are the experiments replicable? What is needed to reproduce the same results?
• Conclusion / end-of-thesis / "what could be better?" section: simplify experiment 2 with fewer algorithms and harder problems
• Find all changes made to the original implementation

# Opening

## Abstract

• What is all this about?
• Why should I read this thesis?
• Is it any good?
• What's new?

## Acknowledgements

• Who funded this work?
• What's the name of your favorite pet?

# Introduction

From the essay. More on multi-task learning. More on transfer learning.

## Raise problem: catastrophic forgetting.

Multiple solutions (PNN, PN, EWC)

• Large structures (PNN, PN)
• Limited in the number of tasks it can retain (EWC)

Optimize reuse of knowledge while still providing valid solutions to tasks. More reuse and limited use of capacity will increase the number of tasks a structure can learn.

Where do I start?

A question DeepMind left unanswered is how different GAs influence task learning and module reuse. Exploration vs. exploitation \ref{theoretic background on topic}

Why this?

Broad answers first, specify later. We know PN works. Would it work better with different algorithms? This is the logical next step from the original paper's "unit of evolution".

## Problem/hypothesis

• What does modular PN training do with the knowledge?
• More/less accuracy?
• More/less transferability?

Test by learning end-to-end first, then with a PN search. Is there a difference in performance or reuse?

• Can we make reuse easier by shifting the focus of the search algorithm?
• Original PN: naive search. Does higher exploitation improve module selection?

• Set up simple multi-task scenarios and try them.
• Two tasks, where the first is learned end-to-end vs. with PN
• List algorithms with different selection pressure and try them on multiple tasks.

# Theoretical Background

## Machine Learning

Intro about ML from the thesis.

### MLP and NN modeling as function approximation

Inspired by the structure of the brain, the Neural Network (NN) consists of one or more layers, where each layer is made up of perceptrons.

• What is a perceptron? How is it connected to input, output?
• How is training done? Input against target
• Multilayer perceptron (MLP) as an artificial neural network (ANN)
• Ref binary MNIST classification in exp 1
• Backpropagation and optimizers (SGD and Adam); see the sketch after this list
• ref binary MNIST/Quinary MNIST/exp2
• Regression/function approximation (ReLU activation)
• Classification (Softmax and probability approximation)
• ref experiments
• Image classification
• ref experiments
• Convolutional Neural Networks (CNN)
• ref transition binary-quinary exp1 and exp2
• Deep Learning and Deep neural networks (DNN)
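
As a minimal illustration of the points above, the sketch below builds a small Keras MLP of the kind referenced by the binary MNIST experiment: a ReLU hidden layer as function approximator, a softmax output approximating class probabilities, and SGD-driven backpropagation against one-hot targets. The layer sizes and training settings are illustrative assumptions, not the parameters used in the experiments.

```python
# Minimal Keras MLP sketch for a binary MNIST-style classification task.
# Layer sizes, optimizer choice and epoch count are illustrative assumptions.
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical

def build_mlp(input_dim=784, hidden_units=20, classes=2):
    model = Sequential()
    # Hidden layer with ReLU activation (function approximation)
    model.add(Dense(hidden_units, activation='relu', input_shape=(input_dim,)))
    # Softmax output approximates class probabilities
    model.add(Dense(classes, activation='softmax'))
    model.compile(optimizer='sgd',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Training against one-hot targets via backpropagation:
# model = build_mlp()
# model.fit(x_train, to_categorical(y_train, 2), epochs=5, batch_size=32)
```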

## Deep Learning

• Feature extraction
• Bigger black box
• Network designs
• Transfer learning
• What is it?
• Why do it?
• How is it done? (see the sketch after this list)
• TL in CNNs
• Who has done it?
• Results?
• Gabor approximation
• Curriculum Learning
• ref to motivation behind task ordering in exp2
• Catastrophic forgetting
• EWC
• PNN
• PathNet
• Super Neural Networks
• What are they?
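
A minimal sketch of transfer learning in a Keras model, assuming a trained source network is available: all layers except the task-specific output are frozen and reused, and a fresh softmax head is attached for the new task. The function and variable names are illustrative, not taken from the implementation.

```python
# Sketch of layer freezing for transfer learning in Keras.
# `source_model` is assumed to be an already-trained network.
from keras.models import Model
from keras.layers import Dense

def transfer_to_new_task(source_model, new_classes):
    # Reuse every layer except the task-specific output layer
    for layer in source_model.layers[:-1]:
        layer.trainable = False  # frozen: weights are reused, not retrained
    # Attach a fresh softmax head for the new task
    features = source_model.layers[-2].output
    new_output = Dense(new_classes, activation='softmax')(features)
    new_model = Model(inputs=source_model.input, outputs=new_output)
    new_model.compile(optimizer='adam', loss='categorical_crossentropy',
                      metrics=['accuracy'])
    return new_model
```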

## Evolutionary algorithms

• What is it? Where does it come from?
• Exploration vs Exploitation
• ref experiments (formulated in the context of this trade-off)
• Terms used in the evolutionary programming context
• Population
• Genotype and genome
• Fitness-function
• selection
• recombination
• generation
• mutation
• population diversity and convergence
• Some types
• GA
• Evolutionary searches
• Short; go straight into tournament search
• Tournament search
• How it works, what are the steps? (see the sketch after this list)
• Selection pressure (in larger context of EAs and then tournament search)
• ref to search
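
A minimal sketch of tournament selection, assuming genotypes are plain Python objects and a fitness function is supplied. The tournament size controls selection pressure, and the binary-tournament step (a mutated copy of the winner overwrites the loser) mirrors the style of search used in PathNet.

```python
import random

def tournament_select(population, fitness_fn, tournament_size=2):
    """Sample `tournament_size` genotypes and return the fittest one.
    A larger tournament raises selection pressure (more exploitation);
    a tournament of size 1 is random selection (pure exploration)."""
    contestants = random.sample(population, tournament_size)
    return max(contestants, key=fitness_fn)

def tournament_step(population, fitness_fn, mutate):
    """One binary-tournament step: two genotypes compete, and a mutated
    copy of the winner overwrites the loser."""
    i, j = random.sample(range(len(population)), 2)
    if fitness_fn(population[i]) >= fitness_fn(population[j]):
        winner, loser = i, j
    else:
        winner, loser = j, i
    population[loser] = mutate(population[winner])
```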

# Implementation

EDIT NOTE: Limit overlap in implementation details between this chapter and the experiment implementation. Build up a base that chapters 4 and 5 can build on.

## Python implementation

• why python?
• Problems:
• Not quick to run
• Pros:
• Quick to prototype in
• Generally good to debug
• Multiple good tools for machine learning
• \cite{tensorflow}
• \cite{keras}
• Why are these good?
• Other packages
• Matplotlib (visualization)
• Numpy (numerical computation)
• Pickle (data logging)
• code structure
• Object oriented
• Easily parameterized for quick prototyping of pathnet structures
• Class structure:
• Modules
• Layers
• PathNet
• Functionality for
• Building random paths
• Creating keras models
• static methods for creating pathnet structures
• reset backend session
• Tasks
• Search
• Plot generating
• Training on gpu
• Generally quicker for ML
• This implementation does a lot on the CPU
• Other implementations could take advantage of customizing layers and models in keras.
• Noteable differences in implementation
• Keras implementation
• Path fitness is accuracy, not negative error
• Exp 2: fitness is calculated before evaluation (not in the same step)
• No noise added to the training data
• Implementation problems (see the workaround sketch after this list)
• Tensorflow sessions are not made for working with multiple graphs
• Reset the backend session after a number of models have been made
• Tensorflow-gpu's default is to use all the GPU memory it can
• Limit memory allocation so it grows only when needed
• A Tensorflow session does not free allocated memory before the Python thread is done
• Run all experiments through threads
• Code available on github
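
A sketch of the two workarounds listed under implementation problems, assuming the TF 1.x / standalone Keras APIs of the time: clearing the Keras backend session after a number of models have been built, and switching TensorFlow to on-demand GPU memory growth instead of allocating everything up front. The reset interval is an illustrative assumption.

```python
import tensorflow as tf
from keras import backend as K

def limit_gpu_memory():
    # Let TensorFlow grow its GPU allocation on demand instead of
    # grabbing all available memory up front.
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    K.set_session(tf.Session(config=config))

models_built = 0

def maybe_reset_session():
    # Clearing the backend session after a number of Keras models have been
    # built keeps the TensorFlow graph from growing without bound.
    global models_built
    models_built += 1
    if models_built % 10 == 0:  # threshold is an assumption for the sketch
        K.clear_session()
        limit_gpu_memory()
```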

## Datasets

### MNIST

\begin{table}[h]
\centering
\caption{Distribution of samples on each class in both training-set and validation-set, along with the portion of the whole set each class constitutes.}
\label{table:MNIST class distribution}
\begin{tabular}{ccccc}
                     & \multicolumn{2}{c}{Training data}    & \multicolumn{2}{c}{Validation data} \\
Class number (Digit) & Number of samples & \% of whole set & Number of samples & \% of whole set \\
0 & 5923 & 9.9\%  & 980  & 9.8\%  \\
1 & 6742 & 11.3\% & 1135 & 11.3\% \\
2 & 5958 & 9.9\%  & 1032 & 10.3\% \\
3 & 6131 & 10.2\% & 1010 & 10.1\% \\
4 & 5842 & 9.7\%  & 982  & 9.8\%  \\
5 & 5421 & 9.0\%  & 892  & 8.9\%  \\
6 & 5918 & 9.9\%  & 958  & 9.6\%  \\
7 & 6265 & 10.4\% & 1028 & 10.3\% \\
8 & 5951 & 9.9\%  & 974  & 9.7\%  \\
9 & 5949 & 9.9\%  & 1009 & 10.1\%
\end{tabular}
\end{table}
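
The class counts in the table above can be reproduced from the Keras MNIST loader; the sketch below is a minimal way of doing so (the output formatting is illustrative).

```python
# Sketch: reproduce the per-class counts from the Keras MNIST loader.
from collections import Counter
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

for name, labels in [('Training data', y_train), ('Validation data', y_test)]:
    counts = Counter(labels)
    total = len(labels)
    for digit in range(10):
        print('%s  class %d: %5d samples (%.1f%%)'
              % (name, digit, counts[digit], 100.0 * counts[digit] / total))
```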

### SVHN

The sample distribution over the classes follows Benford's law, which is to be expected from a natural dataset such as this.

• Data type
• Use cases and citations
• How does the data look?
• set sizes and class distributions (see the loading sketch after this list)
• state of the art and human level performance
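
SVHN is distributed as MATLAB .mat files, so a minimal loading sketch with SciPy might look like the following. The file name assumes the cropped-digit training set has been downloaded from http://ufldl.stanford.edu/housenumbers/, and the label convention (the value 10 encodes the digit 0) is part of the dataset.

```python
# Sketch of loading the cropped-digit SVHN training set with SciPy.
import numpy as np
from scipy.io import loadmat

data = loadmat('train_32x32.mat')
# Images come as (32, 32, 3, N); move the sample axis to the front.
x_train = np.transpose(data['X'], (3, 0, 1, 2))
y_train = data['y'].flatten()
y_train[y_train == 10] = 0  # the label 10 encodes the digit 0

counts = np.bincount(y_train, minlength=10)
print('Class counts:', dict(enumerate(counts)))
```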

## Search implementation

• Functions; callback to the theoretical background and GA terminology
• Parameterization (see the sketch after this list)
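
An illustrative skeleton of how the parameterization could tie back to the GA terminology from the theoretical background: population size, number of generations, tournament size (selection pressure) and mutation rate all appear as arguments. The callables and default values are assumptions for the sketch, not the repository's actual API.

```python
import random

def path_search(random_path, evaluate, mutate,
                population_size=64, generations=100,
                tournament_size=2, mutation_rate=0.1):
    """Illustrative search loop over paths. `random_path`, `evaluate` and
    `mutate` are assumed callables supplied by the caller."""
    population = [random_path() for _ in range(population_size)]
    for _ in range(generations):
        contestants = random.sample(range(population_size), tournament_size)
        ranked = sorted(contestants, key=lambda i: evaluate(population[i]))
        winner, loser = ranked[-1], ranked[0]
        # Selection pressure grows with tournament_size; the loser's slot is
        # replaced by a mutated copy of the winner.
        population[loser] = mutate(population[winner], mutation_rate)
    return max(population, key=evaluate)
```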

# Discussion

Are your results satisfactory? Can they be improved? Is there a need for improvement? Are other approaches worth trying out? Will some restriction be lifted? Will you save the world with your Nifty Gadget?

## Discussion

Discussion of the accuracy and relevance of the results; comparison with other researchers' results.

### Common errors

Too far-reaching conclusions; guesswork not supported by the data; introduction of a new problem and a discussion around it.

## Conclusion

Consequences of the achieved results, for example for new research, theory and applications.

### Common errors

The conclusions are too far-reaching with respect to the achieved results; the conclusions do not correspond with the purpose.