Martijho-PathNet-thesis

Contents

Opening

Abstract

  • What is all this about?
  • Why should I read this thesis?
  • Is it any good?
  • What's new?

Acknowledgements

  • Who is your advisor?
  • Did anyone help you?
  • Who funded this work?
  • What's the name of your favorite pet?

Introduction

From the essay. More on multi-task learning; more on transfer learning.

Raise problem: catastrophic forgetting.

Multiple solutions: Progressive Neural Networks (PNN), PathNet (PN), Elastic Weight Consolidation (EWC)

  • Large structures (PNN, PN)
  • Limited in the number of tasks it can retain (EWC)

Optimize reuse of knowledge while still providing valid solutions to each task. More reuse and lower capacity consumption will increase the number of tasks a structure can learn.

Where do I start?

A question DeepMind left unanswered is how different GAs influence task learning and module reuse. Exploration vs. exploitation \ref{theoretic background on topic}

Why this?

Broad answers first, specifics later. We know PathNet works. Would it work better with different algorithms? This is the logical next step from the original paper's "unit of evolution".

Problem/hypothesis

  • What does modular PathNet training do with the knowledge?
    • More/less accuracy?
    • More/less transferability?

Test by learning end-to-end first, then running a PathNet search. Is there a difference in performance or reuse?

  • Can we make reuse easier by shifting the focus of the search algorithm?
    • Original PathNet: naive search. Would higher exploitation improve module selection?

How to answer?

  • Set up simple multi-task scenarios and try them.
    • Two tasks, where the first is learned end-to-end vs. with PathNet
    • List algorithms with different selection pressures and try them on multiple tasks.


Theoretical Background

Machine Learning

Intro about ML from the thesis. \subsection{MLP and NN modeling as function approximation} Inspired by the structure of the brain, the Neural Network (NN) consists of one or more layers, where each layer is made up of perceptrons.

  • What is a perceptron? How is it connected to the input and output?
  • How is training done? Input against target
  • Multilayer perceptron (MLP) as an artificial Neural Network (ANN) (a minimal sketch follows this list).
    • Ref binary MNIST classification in exp 1
  • Backpropagation and optimizers (SGD and Adam)
    • ref binary MNIST/Quinary MNIST/exp2
  • Regression/function approximation (ReLU activation)
  • Classification (Softmax and probability approximation)
    • ref experiments
  • Image classification
    • ref experiments
  • Convolutional Neural Networks (CNN)
    • ref transition binary-quinary exp1 and exp2
  • Deep Learning and Deep neural networks (DNN)
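
A minimal Keras sketch of such an MLP classifier, here on a two-class ("binary") subset of MNIST, trained with backpropagation and the Adam optimizer. This is only illustrative and is not the thesis implementation; it uses the tf.keras API and an arbitrary hidden-layer size.

    # Illustrative sketch, not the thesis code: an MLP on a binary MNIST task.
    import numpy as np
    from tensorflow import keras

    # Load MNIST and keep only the digits 0 and 1 (a "binary MNIST" task).
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    train_mask, test_mask = y_train < 2, y_test < 2
    x_train, y_train = x_train[train_mask] / 255.0, y_train[train_mask]
    x_test, y_test = x_test[test_mask] / 255.0, y_test[test_mask]

    # One hidden layer of perceptrons with ReLU, softmax output for classification.
    model = keras.Sequential([
        keras.Input(shape=(28, 28)),
        keras.layers.Flatten(),
        keras.layers.Dense(20, activation="relu"),
        keras.layers.Dense(2, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=1, batch_size=32)
    print(model.evaluate(x_test, y_test))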


Deep Learning

  • Feature extraction
    • Bigger black box
  • Network designs
  • Transfer learning
    • What is it?
    • Why do it?
    • How to do it? (a minimal sketch follows this list)
    • TL in CNNs
      • Who has done it?
      • Results?
      • Gabor approximation
  • Multi-task Learning
    • Curriculum Learning
      • ref to motivation behind task ordering in exp2
  • Catastrophic forgetting
      • EWC
      • PNN
      • PathNet
  • Super Neural Networks
    • What are they?
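
A minimal transfer-learning sketch in Keras (illustrative only, with arbitrary layer sizes; not the thesis code): convolutional features trained on a first task are frozen and reused, and only a new classifier head is trained on the second task.

    # Illustrative transfer-learning sketch, not the thesis implementation.
    from tensorflow import keras

    # Shared feature extractor (would be trained on the first task).
    feature_extractor = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        keras.layers.Conv2D(8, 3, activation="relu"),
        keras.layers.MaxPooling2D(),
        keras.layers.Flatten(),
    ])
    source_model = keras.Sequential([feature_extractor,
                                     keras.layers.Dense(2, activation="softmax")])
    # ... fit source_model on task 1 here ...

    # Transfer: freeze the learned features, train only a new output layer.
    feature_extractor.trainable = False
    target_model = keras.Sequential([feature_extractor,
                                     keras.layers.Dense(5, activation="softmax")])
    target_model.compile(optimizer="adam",
                         loss="sparse_categorical_crossentropy",
                         metrics=["accuracy"])
    # ... fit target_model on task 2; only the new head's weights are updated ...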


Evolutionary algorithms

  • What is it? Where does it come from?
  • Exploration vs Exploitation
    • ref experiments (formulated in the context of this trade-off)
  • Terms used in the evolutionary programming context
    • Population
    • Genotype and genome
    • Fitness-function
    • Selection
    • Recombination
    • Generation
    • Mutation
    • Population diversity and convergence
  • Some types
    • GA
    • Evolutionary searches
    • Keep it short; go straight into tournament search
  • Tournament search
    • How it works, what are the steps? (a minimal sketch follows this list)
    • Selection pressure (in larger context of EAs and then tournament search)
    • ref to search
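
A minimal sketch of tournament selection (illustrative assumptions: genotypes are lists of module indices and higher fitness is better; this is not the thesis implementation):

    # Illustrative sketch of tournament selection in a genetic algorithm.
    import random

    def tournament_select(population, fitness, k=2):
        """Pick k random genotypes and return the one with the highest fitness.

        Larger k means higher selection pressure (more exploitation);
        k = 1 degenerates into random selection (pure exploration).
        """
        contestants = random.sample(range(len(population)), k)
        winner = max(contestants, key=lambda i: fitness[i])
        return population[winner]

    # Example: a population of 8 random "paths" with pre-computed fitnesses.
    population = [[random.randrange(10) for _ in range(3)] for _ in range(8)]
    fitness = [random.random() for _ in population]
    parent = tournament_select(population, fitness, k=3)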

Implementation

EDIT NOTE: Limit the overlap in implementation details between this chapter and the experiment implementations. Build up a base that chapters 4 and 5 can build on.

Python implementation

  • why python?
    • Problems:
      • Not quick to run
    • Pros:
      • Quick to prototype in
      • Generally good to debug
      • Multiple good tools for machine learning
        • \cite{tensorflow}
        • \cite{keras}
        • Why are these good?
      • Other packages
        • Matplotlib (visualization)
        • NumPy (numerical computation)
        • Pickle (data logging)
  • code structure
    • Object oriented
      • Easily parameterizable for quick prototyping of PathNet structures
    • Class structure:
      • Modules
      • Layers
      • PathNet
        • Functionality for
          • Building random paths
          • Creating Keras models
          • Static methods for creating PathNet structures
          • Resetting the backend session
      • Task
      • Search
      • Plot generating
  • Training on GPU
    • Quicker in general for ML
    • This implementation does a lot on the CPU
      • Other implementations could take advantage of customizing layers and models in Keras.
  • Notable differences in implementation
    • Keras implementation
    • Path fitness is accuracy, not negative error
    • Exp 2: fitness is calculated before evaluation (not in the same step)
    • No noise added to the training data
  • Implementation problems (a sketch of the session workarounds follows this list)
    • TensorFlow sessions are not made for using multiple graphs
      • Reset the backend session after a number of models have been made
    • TensorFlow-GPU's default is to use all the GPU memory it can
      • Limit memory allocation so it grows only when needed
    • A TensorFlow session does not free allocated memory before the Python thread is done.
      • Run all experiments through threads.
  • Code available on GitHub
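
A sketch of the two session workarounds mentioned above, written against the 2018-era TensorFlow 1.x / standalone Keras API assumed here (illustrative, not necessarily identical to the thesis code):

    # Illustrative, TF 1.x-era API: limit GPU memory use and reset the session.
    import tensorflow as tf
    from keras import backend as K

    def make_session():
        # Allocate GPU memory on demand instead of grabbing all of it up front.
        config = tf.ConfigProto()
        config.gpu_options.allow_growth = True
        return tf.Session(config=config)

    K.set_session(make_session())

    def reset_backend():
        # Clear the accumulated graph after many Keras models have been built,
        # then install a fresh session with the same GPU options.
        K.clear_session()
        K.set_session(make_session())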


Datasets

  • Data type
  • Use cases and citations
  • How does the data look?
  • Set sizes and class distributions (see the sketch after the citations below)
  • State-of-the-art and human-level performance

\cite{MNIST} \cite{SVHN}
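
A quick, illustrative way to inspect set sizes and class distributions for MNIST (SVHN is distributed as .mat files and can be read with scipy.io.loadmat); this is only a sketch, not part of the thesis code:

    # Illustrative: set sizes and class distribution for MNIST.
    import numpy as np
    from tensorflow import keras

    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    print(x_train.shape, x_test.shape)   # (60000, 28, 28) (10000, 28, 28)
    print(np.bincount(y_train))          # per-class counts; roughly balanced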

Search implementation

  • Functions; call back to the theoretical background and the GA terminology
  • Parameterization (a hypothetical sketch follows this list)
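
A hypothetical sketch of a parameterized PathNet-style tournament search loop, in the spirit of the original paper. The callables train_and_evaluate and mutate are placeholders for the thesis' actual path training/evaluation and mutation routines, and details such as re-evaluating contestants each generation are assumptions.

    # Hypothetical sketch; not the thesis implementation.
    import random

    def pathnet_search(init_paths, train_and_evaluate, mutate,
                       generations=100, tournament_size=2):
        population = list(init_paths)
        fitness = [train_and_evaluate(p) for p in population]
        for _ in range(generations):
            # Draw contestants, train/evaluate their paths, and let the winner
            # overwrite the losers with mutated copies of its genotype (path).
            idx = random.sample(range(len(population)), tournament_size)
            for i in idx:
                fitness[i] = train_and_evaluate(population[i])
            winner = max(idx, key=lambda i: fitness[i])
            for i in idx:
                if i != winner:
                    population[i] = mutate(list(population[winner]))
        best = max(range(len(population)), key=lambda i: fitness[i])
        return population[best]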

Experiment 1: Search versus Selection

Experiment 2: Selection Pressure

Discussion

Ending
