[Series 16] GoogLeNet Inception V1

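GoogLeNet was proposed by Christian Szegedy et al. at Google in the 2014 paper "Going Deeper with Convolutions". Its main contribution is a structure called Inception, which serves as the building block of GoogLeNet. The network took first place in both the classification and detection tasks of that year's ImageNet challenge. P.S.: the name GoogLeNet is a tribute to Yann LeCun's LeNet series.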

Some Thoughts

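The simplest and most direct way to improve the performance of a deep neural network is to increase its depth and width, but this approach has two obvious drawbacks:
  • A deeper and wider network has more parameters, which greatly increases the risk of overfitting, especially when the training set is small or some labels have too few training examples;
  • It consumes more computing resources. In practice, whether because the data are sparse or because the enlarged network is underused (for example, many weights end up close to 0), a large share of the computation is wasted.
The basic remedy for both problems is to replace fully connected or convolutional connections with sparse ones. Sparsity performs well from both a biological and a machine-learning point of view: Dropout and the ReLU activation function, for example, essentially exploit sparsity to improve generalization (although the number of parameters to be computed does not decrease).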
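A brief explanation of sparsity. When the overall feature space is nonlinear or even discontinuous:
  • Learning good feature sets for local subspaces improves performance more, much like the idea in Maxout networks of fitting a nonlinear function with a combination of several locally linear functions;
  • Suppose the whole feature space is composed of N discontinuous local feature spaces. Any sample is mapped into these N spaces and activates or does not activate the corresponding feature dimensions. Let C1 denote the set of feature dimensions activated by one class of samples and C2 the set activated by another class. When data are limited, separating the two classes well requires reducing the overlap between C1 and C2 (which can be measured with the Jaccard distance, for example), that is, shrinking C1 and C2, which means the corresponding feature dimension sets become sparser.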
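The overlap argument above can be made concrete with a small sketch. The activation sets below are hypothetical and chosen only for illustration; the point is that smaller (sparser) sets with little intersection yield a larger Jaccard distance.

```python
def jaccard_distance(c1, c2):
    """Return 1 - |c1 & c2| / |c1 | c2| for two sets of active feature indices."""
    c1, c2 = set(c1), set(c2)
    union = c1 | c2
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(c1 & c2) / len(union)


# Hypothetical activation patterns over 10 feature dimensions.
dense_c1 = {0, 1, 2, 3, 4, 5}
dense_c2 = {2, 3, 4, 5, 6, 7}
sparse_c1 = {0, 1, 2}
sparse_c2 = {2, 6, 7}

dense_distance = jaccard_distance(dense_c1, dense_c2)     # 4 shared out of 8
sparse_distance = jaccard_distance(sparse_c1, sparse_c2)  # 1 shared out of 5
print(dense_distance, sparse_distance)
```

In this toy case the sparser sets are further apart, which is the sense in which sparsity helps separate the two classes.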
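The awkward part is that current computer architectures are much better at dense computation and are very inefficient on non-uniformly distributed sparse data; sparsity, for example, leads to very high cache miss rates. A method is therefore needed that keeps the advantages of sparse networks while remaining computationally efficient. Fortunately, a large body of earlier experiments (such as "On Two-Dimensional Sparse Matrix Partitioning: Models, Methods, and a Recipe") found that clustering a sparse matrix into relatively dense submatrices can greatly speed up sparse matrix multiplication. Borrowing this idea, the authors proposed the Inception structure:
  • The feature spaces extracted by convolution kernels of different sizes are treated as sub-feature spaces, each of which is sparse. Fusing these features from different scales yields a relatively dense space;
  • Convolution kernels of size 1×1, 3×3 and 5×5 are used (not mandatory; other sizes would also work) with stride 1, so that padding makes it easy to align the spatial size of the output features;
  • Plenty of evidence shows that pooling layers improve the results of convolutional networks, so a max pooling path is added;
  • The structure matches intuition: visual information is transformed at different scales and then aggregated as the features for the next stage. For example, a person's height, build and age are aggregated before the next judgment is made.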
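A minimal sketch of why stride 1 plus padding aligns the branch outputs, using the standard convolution output-size formula. The helper names are illustrative and not taken from any library, and the 28×28 input size is borrowed from the example later in this article.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def same_padding(kernel):
    """Padding that preserves the size for an odd kernel at stride 1."""
    return (kernel - 1) // 2


# Each Inception branch sees the same 28x28 input.
branch_sizes = {
    kernel: conv_output_size(28, kernel, stride=1, padding=same_padding(kernel))
    for kernel in (1, 3, 5)
}
# The 3x3 max pooling path also uses stride 1 and padding 1.
pool_size = conv_output_size(28, 3, stride=1, padding=1)
print(branch_sizes, pool_size)
```

Because every branch keeps the 28×28 spatial size, the outputs can be stacked along the channel dimension without cropping or resizing.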
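The biggest problem with this design is the heavy computational burden of the 5×5 convolution. For example, suppose the input from the previous layer is 28×28×192:
  • Passing it directly through a convolution layer of 96 5×5 kernels (stride=1, padding=2) gives an output of 28×28×96, with 192×5×5×96=460800 convolution weights;
  • Following the NIN network, first use 32 1×1 kernels to reduce the dimension to 28×28×32, then apply 96 5×5 kernels (stride=1, padding=2). The output is still 28×28×96, but the total number of convolution weights is 192×1×1×32+32×5×5×96=82944, about 1/5.5 of the original, with little loss in accuracy.
    The new structure therefore places a 1×1 reduction convolution before the 3×3 and 5×5 convolutions and a 1×1 convolution after the max pooling path.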
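The weight counts in this example can be checked with a few lines of arithmetic. This is a sketch that ignores bias terms, as the figures in the text do; the multiply-accumulate estimate (weights times the 28×28 output positions) is an added illustration rather than a number from the article.

```python
def conv_weights(in_channels, kernel, out_channels):
    """Number of weights in a kernel x kernel convolution, biases ignored."""
    return in_channels * kernel * kernel * out_channels


# 28x28x192 input, 96 output channels from the 5x5 path.
direct = conv_weights(192, 5, 96)
reduced = conv_weights(192, 1, 32) + conv_weights(32, 5, 96)
ratio = direct / reduced

# Each weight is applied once per output position (stride 1, same padding).
positions = 28 * 28
direct_macs = direct * positions
reduced_macs = reduced * positions

print(direct, reduced, round(ratio, 2))
```

Since the output resolution is the same in both cases, the reduction in multiply-accumulates follows the same ratio as the reduction in weights.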

GoogLeNet Architecture

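GoogLeNet is built from the Inception module described above. Experiments show that Inception modules are more effective at higher levels of feature abstraction (my understanding is that the structure is better suited to extracting high-order features, and using it for low-order features would lose feature information), so the lower layers still use conventional convolution layers. The overall network structure is described below.
Notes on the network:
  • All convolution layers use the ReLU activation function, including the 1×1 reduction convolutions;
  • The fully connected layers are removed and, as in NIN, Global Average Pooling is used instead, which improves Top-1 accuracy by 0.6%. Because the GAP output is tied to the number of classes, a final fully connected layer is added to make fine-tuning on other label sets easier;
  • As with ResNet discussed earlier in this series, experiments showed that relatively shallow layers contribute substantially to model quality. During training, two extra classifiers are attached to Inception (4a) and (4d) to strengthen the gradient signal during backpropagation. Their most important role, however, is regularization; this was confirmed experimentally in GoogLeNet v3, which also indirectly confirmed the regularizing effect of BN in GoogLeNet V2. The losses of the two classifiers are added to the total loss with a weight of 0.3, and both classifiers are removed at inference time;
  • Each auxiliary classifier uses 128 1×1 kernels for dimension reduction;
  • Its fully connected layer has 1024 units;
  • It uses a Dropout layer with a drop probability of 0.7.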
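The loss weighting described above can be sketched as follows. The function name and the sample loss values are hypothetical; only the 0.3 weight comes from the text.

```python
def total_training_loss(main_loss, aux_losses, aux_weight=0.3):
    """Main classifier loss plus down-weighted auxiliary classifier losses."""
    return main_loss + aux_weight * sum(aux_losses)


# Hypothetical per-batch losses: main classifier, then the 4a and 4d branches.
total = total_training_loss(1.2, [1.5, 1.4])
print(total)

# At inference time the auxiliary branches are dropped, so only the main
# classifier's prediction is used.
```

In Keras, the same weighting can be expressed through the `loss_weights` argument of `Model.compile`, for example `loss_weights=[0.3, 0.3, 1.0]` when the model outputs are ordered as the two auxiliary classifiers followed by the main one.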
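Notes on the network structure:
The input is a 224×224×3 RGB image. In the figure, "S" denotes same-padding and "V" denotes no padding (valid).
  • C1 convolution layer: 64 7×7 kernels (stride=2, padding=3), output 112×112×64;
  • P1 pooling layer: 3×3 max pooling (stride=2) over the 64 channels, output 56×56×64, where 56=(112-3+1)/2+1;
  • C2 convolution layer: 192 3×3 kernels (stride=1, padding=1), output 56×56×192;
  • P2 pooling layer: 3×3 max pooling (stride=2) over the 192 channels, output 28×28×192, where 28=(56-3+1)/2+1. The data are then split into 4 branches and enter Inception (3a);
  • Inception (3a): consists of 4 parts
    • 64 1×1 kernels, output 28×28×64;
    • 96 1×1 kernels for dimension reduction, output 28×28×96, followed by 128 3×3 kernels (stride=1, padding=1), output 28×28×128;
    • 16 1×1 kernels for dimension reduction, output 28×28×16, followed by 32 5×5 kernels (stride=1, padding=2), output 28×28×32;
    • 3×3 max pooling (stride=1, padding=1), output 28×28×192, followed by 32 1×1 kernels, output 28×28×32.
      Finally the outputs of the 4 branches are concatenated along the depth (channel) dimension, giving 28×28×256 (64+128+32+32). The data are then split into 4 branches again and enter Inception (3b);
  • Inception (3b): consists of 4 parts
    • 128 1×1 kernels, output 28×28×128;
    • 128 1×1 kernels for dimension reduction, output 28×28×128, followed by 192 3×3 kernels (stride=1, padding=1), output 28×28×192;
    • 32 1×1 kernels for dimension reduction, output 28×28×32, followed by 96 5×5 kernels (stride=1, padding=2), output 28×28×96;
    • 3×3 max pooling (stride=1, padding=1), output 28×28×256, followed by 64 1×1 kernels, output 28×28×64.
      Finally the outputs of the 4 branches are concatenated along the depth (channel) dimension, giving 28×28×480 (128+192+96+64);
      The remaining modules follow the same pattern.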
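The spatial sizes of the first four layers can be reproduced with a short calculation. This sketch assumes the pooling layers round up (ceil mode), which is one way to read the formulas 56=(112-3+1)/2+1 and 28=(56-3+1)/2+1 quoted above; the helper names are illustrative.

```python
import math


def conv_out(size, kernel, stride, padding):
    """Output size of a convolution (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out_ceil(size, kernel, stride):
    """Output size of an unpadded pooling window with ceil rounding (assumed)."""
    return math.ceil((size - kernel) / stride) + 1


c1 = conv_out(224, kernel=7, stride=2, padding=3)  # C1
p1 = pool_out_ceil(c1, kernel=3, stride=2)         # P1
c2 = conv_out(p1, kernel=3, stride=1, padding=1)   # C2
p2 = pool_out_ceil(c2, kernel=3, stride=2)         # P2, input to Inception (3a)

print(c1, p1, c2, p2)
```

The result, 224 → 112 → 56 → 56 → 28, matches the sizes listed for C1, P1, C2 and P2.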

Code Practice

  • googlenet_inception_v1.py
    # -*- coding: utf-8 -*-
    from keras.layers import Input, Conv2D, Dense, MaxPooling2D, AveragePooling2D
    from keras.layers import Dropout, Flatten, ZeroPadding2D, Reshape, Activation
    from keras.layers import concatenate
    from keras.models import Model
    from keras.regularizers import l1_l2
    import tensorflow as tf
    import googlenet_custom_layers


    def inception_module(name,
                         input_layer,
                         num_c_1x1,
                         num_c_1x1_3x3_reduce,
                         num_c_3x3,
                         num_c_1x1_5x5_reduce,
                         num_p_5x5,
                         num_c_1x1_reduce):
        # Branch 1: 1x1 convolution
        inception_1x1 = Conv2D(name=name+"/inception_1x1",
                               filters=num_c_1x1,
                               kernel_size=(1, 1),
                               strides=(1, 1),
                               padding='same',
                               kernel_initializer='he_normal',
                               activation='relu',
                               kernel_regularizer=l1_l2(0.0001))(input_layer)
        # Branch 2: 1x1 reduction followed by 3x3 convolution
        inception_3x3_reduce = Conv2D(name=name+"/inception_3x3_reduce",
                                      filters=num_c_1x1_3x3_reduce,
                                      kernel_size=(1, 1),
                                      strides=(1, 1),
                                      padding='same',
                                      kernel_initializer='he_normal',
                                      activation='relu',
                                      kernel_regularizer=l1_l2(0.0001))(input_layer)
        inception_3x3 = Conv2D(name=name+"/inception_3x3",
                               filters=num_c_3x3,
                               kernel_size=(3, 3),
                               strides=(1, 1),
                               padding='same',
                               kernel_initializer='he_normal',
                               activation='relu',
                               kernel_regularizer=l1_l2(0.0001))(inception_3x3_reduce)
        # Branch 3: 1x1 reduction followed by 5x5 convolution
        inception_5x5_reduce = Conv2D(name=name+"/inception_5x5_reduce",
                                      filters=num_c_1x1_5x5_reduce,
                                      kernel_size=(1, 1),
                                      strides=(1, 1),
                                      padding='same',
                                      kernel_initializer='he_normal',
                                      activation='relu',
                                      kernel_regularizer=l1_l2(0.0001))(input_layer)
        inception_5x5 = Conv2D(name=name+"/inception_5x5",
                               filters=num_p_5x5,
                               kernel_size=(5, 5),
                               strides=(1, 1),
                               padding='same',
                               kernel_initializer='he_normal',
                               activation='relu',
                               kernel_regularizer=l1_l2(0.0001))(inception_5x5_reduce)
        # Branch 4: 3x3 max pooling followed by 1x1 projection
        inception_max_pool = MaxPooling2D(name=name+"/inception_max_pool",
                                          pool_size=(3, 3),
                                          strides=(1, 1),
                                          padding="same")(input_layer)
        inception_max_pool_proj = Conv2D(name=name+"/inception_max_pool_project",
                                         filters=num_c_1x1_reduce,
                                         kernel_size=(1, 1),
                                         strides=(1, 1),
                                         padding='same',
                                         kernel_initializer='he_normal',
                                         activation='relu',
                                         kernel_regularizer=l1_l2(0.0001))(inception_max_pool)
        print(inception_1x1.get_shape(), inception_3x3.get_shape(),
              inception_5x5.get_shape(), inception_max_pool_proj.get_shape())
        # inception_output = tf.concat(3, [inception_1x1, inception_3x3, inception_5x5, inception_max_pool_proj])
        # Note: TensorFlow changed the argument order of tf.concat, so check which
        # TensorFlow and Keras versions you have installed. If they disagree, change
        # line 1554 of /usr/lib/python×××/site-packages/keras/backend/tensorflow_backend.py from
        # return tf.concat([to_dense(x) for x in tensors], axis) to:
        # return tf.concat(axis, [to_dense(x) for x in tensors])
        # Concatenate the four branches along the channel axis (the last axis by default).
        inception_output = concatenate([inception_1x1, inception_3x3, inception_5x5, inception_max_pool_proj])
        return inception_output


    def googLeNet_inception_v1_building(input_shape, output_num, fine_tune=None):
        input_layer = Input(shape=input_shape)
        # Layer 1: convolution
        conv1_7x7 = Conv2D(name="conv1_7x7/2",
                           filters=64,
                           kernel_size=(7, 7),
                           strides=(2, 2),
                           padding='same',
                           kernel_initializer='he_normal',
                           activation='relu',
                           kernel_regularizer=l1_l2(0.0001))(input_layer)
        conv1_zero_pad = ZeroPadding2D(padding=(1, 1))(conv1_7x7)
        # Layer 2: max pooling
        pool1_3x3 = MaxPooling2D(name="max_pool1_3x3/2",
                                 pool_size=(3, 3),
                                 strides=(2, 2),
                                 padding='valid')(conv1_zero_pad)
        # Layer 3: LRN normalization
        #pool1_norm1 = tf.nn.lrn(pool1_3x3, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name='ax_pool1_3x3/norm1')
        pool1_norm1 = googlenet_custom_layers.LRN2D(name='max_pool1_3x3/norm1')(pool1_3x3)
        # Layer 4: 1x1 convolution for dimension reduction
        conv2_3x3_reduce = Conv2D(name="conv2_3x3_reduce/1",
                                  filters=64,
                                  kernel_size=(1, 1),
                                  padding='same',
                                  kernel_initializer='he_normal',
                                  activation='relu',
                                  kernel_regularizer=l1_l2(0.0001))(pool1_norm1)
        # Layer 5: convolution
        conv2_3x3 = Conv2D(name="conv2_3x3/1",
                           filters=192,
                           kernel_size=(3, 3),
                           padding='same',
                           kernel_initializer='he_normal',
                           activation='relu',
                           kernel_regularizer=l1_l2(0.0001))(conv2_3x3_reduce)
        # Layer 6: LRN normalization
        #conv2_norm2 = tf.nn.lrn(conv2_3x3, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name='conv2_3x3/norm2')
        conv2_norm2 = googlenet_custom_layers.LRN2D(name='conv2_3x3/norm2')(conv2_3x3)
        conv2_zero_pad = ZeroPadding2D(padding=(1, 1))(conv2_norm2)
        # Layer 7: max pooling
        pool2_3x3 = MaxPooling2D(name="max_pool2_3x3",
                                 pool_size=(3, 3),
                                 strides=(2, 2),
                                 padding='valid')(conv2_zero_pad)
        # Layer 8: inception 3a
        inception_3a = inception_module("inception_3a", pool2_3x3, 64, 96, 128, 16, 32, 32)
        # Layer 9: inception 3b
        inception_3b = inception_module("inception_3b", inception_3a, 128, 128, 192, 32, 96, 64)
        inception_3b_zero_pad = ZeroPadding2D(padding=(1, 1))(inception_3b)
        # Layer 10: max pooling
        pool3_3x3 = MaxPooling2D(name="max_pool3_3x3/2",
                                 pool_size=(3, 3),
                                 strides=(2, 2),
                                 padding='valid')(inception_3b_zero_pad)
        # Layer 11: inception 4a
        inception_4a = inception_module("inception_4a", pool3_3x3, 192, 96, 208, 16, 48, 64)
        # Auxiliary classifier loss1, branching off inception 4a
        loss1_ave_pool = AveragePooling2D(name="loss1/ave_pool",
                                          pool_size=(5, 5),
                                          strides=(3, 3))(inception_4a)
        loss1_conv = Conv2D(name="loss1/conv",
                            filters=128,
                            kernel_size=(1, 1),
                            padding='same',
                            kernel_initializer='he_normal',
                            activation='relu',
                            kernel_regularizer=l1_l2(0.0001))(loss1_ave_pool)
        loss1_flat = Flatten()(loss1_conv)
        loss1_fc = Dense(1024,
                         activation='relu',
                         name="loss1/fc",
                         kernel_regularizer=l1_l2(0.0001))(loss1_flat)
        loss1_drop_fc = Dropout(0.7)(loss1_fc)
        loss1_classifier = Dense(output_num,
                                 name="loss1/classifier",
                                 kernel_regularizer=l1_l2(0.0001))(loss1_drop_fc)
        loss1_classifier_act = Activation('softmax')(loss1_classifier)
        # Layer 12: inception 4b
        inception_4b = inception_module("inception_4b", inception_4a, 160, 112, 224, 24, 64, 64)
        # Layer 13: inception 4c
        inception_4c = inception_module("inception_4c", inception_4b, 128, 128, 256, 24, 64, 64)
        # Layer 14: inception 4d
        inception_4d = inception_module("inception_4d", inception_4c, 112, 144, 288, 32, 64, 64)
        # Auxiliary classifier loss2, branching off inception 4d
        loss2_ave_pool = AveragePooling2D(pool_size=(5, 5),
                                          strides=(3, 3),
                                          name='loss2/ave_pool')(inception_4d)
        loss2_conv = Conv2D(name="loss2/conv",
                            filters=128,
                            kernel_size=(1, 1),
                            padding='same',
                            kernel_initializer='he_normal',
                            activation='relu',
                            kernel_regularizer=l1_l2(0.0001))(loss2_ave_pool)
        loss2_flat = Flatten()(loss2_conv)
        loss2_fc = Dense(1024,
                         activation='relu',
                         name="loss2/fc",
                         kernel_regularizer=l1_l2(0.0001))(loss2_flat)
        loss2_drop_fc = Dropout(0.7)(loss2_fc)
        loss2_classifier = Dense(output_num,
                                 name="loss2/classifier",
                                 kernel_regularizer=l1_l2(0.0001))(loss2_drop_fc)
        loss2_classifier_act = Activation('softmax')(loss2_classifier)
        # Layer 15: inception 4e
        inception_4e = inception_module("inception_4e", inception_4d, 256, 160, 320, 32, 128, 128)
        inception_4e_zero_pad = ZeroPadding2D(padding=(1, 1))(inception_4e)
        # Layer 16: max pooling
        pool4_3x3 = MaxPooling2D(name="max_pool4_3x3",
                                 pool_size=(3, 3),
                                 strides=(2, 2),
                                 padding='valid')(inception_4e_zero_pad)
        # Layer 17: inception 5a
        inception_5a = inception_module("inception_5a", pool4_3x3, 256, 160, 320, 32, 128, 128)
        # Layer 18: inception 5b
        inception_5b = inception_module("inception_5b", inception_5a, 384, 192, 384, 48, 128, 128)
        # Layer 19: average pooling
        pool5_7x7 = AveragePooling2D(name="ave_pool5_7x7",
                                     pool_size=(7, 7),
                                     strides=(1, 1))(inception_5b)
        loss3_flat = Flatten()(pool5_7x7)
        pool5_drop_7x7 = Dropout(0.4)(loss3_flat)
        # Layer 20: fully connected layer
        loss3_classifier = Dense(output_num,
                                 name="loss3/classifier",
                                 kernel_regularizer=l1_l2(0.0001))(pool5_drop_7x7)
        loss3_classifier_act = Activation('softmax')(loss3_classifier)
        # Keras 2 expects the keyword arguments `inputs` and `outputs`.
        googlenet_inception_v1 = Model(name="googlenet_inception_v1",
                                       inputs=input_layer,
                                       outputs=[loss1_classifier_act, loss2_classifier_act, loss3_classifier_act])
        if fine_tune:
            googlenet_inception_v1.load_weights(fine_tune)
        return googlenet_inception_v1
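As a sanity check on the filter counts passed to `inception_module` above, the output width of each module is the sum of its four concatenated branches. The sketch below copies the argument tuples from the listing; the dictionary and helper names are illustrative.

```python
# (1x1, 3x3 reduce, 3x3, 5x5 reduce, 5x5, pool projection), as in the listing.
INCEPTION_ARGS = {
    "inception_3a": (64, 96, 128, 16, 32, 32),
    "inception_3b": (128, 128, 192, 32, 96, 64),
    "inception_4a": (192, 96, 208, 16, 48, 64),
    "inception_4b": (160, 112, 224, 24, 64, 64),
    "inception_4c": (128, 128, 256, 24, 64, 64),
    "inception_4d": (112, 144, 288, 32, 64, 64),
    "inception_4e": (256, 160, 320, 32, 128, 128),
    "inception_5a": (256, 160, 320, 32, 128, 128),
    "inception_5b": (384, 192, 384, 48, 128, 128),
}


def output_channels(args):
    """Channels after concatenation: the reduce layers do not reach the output."""
    c1x1, _reduce_3x3, c3x3, _reduce_5x5, c5x5, pool_proj = args
    return c1x1 + c3x3 + c5x5 + pool_proj


widths = {name: output_channels(args) for name, args in INCEPTION_ARGS.items()}
for name in sorted(widths):
    print(name, widths[name])
```

The widths for 3a and 3b agree with the 256 and 480 channels derived in the structure notes, and the final module produces the 1024 channels that feed the last average pooling layer.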
  • googlenet_custom_layers.py
    import tensorflow as tf
    from keras.layers import Layer


    class LRN2D(Layer):
        """
        This code is adapted from pylearn2.
        License at: https://github.com/lisa-lab/pylearn2/blob/master/LICENSE.txt
        """

        def __init__(self, alpha=1e-4, k=2, beta=0.75, n=5, **kwargs):
            if n % 2 == 0:
                raise NotImplementedError("LRN2D only works with odd n. n provided: " + str(n))
            super(LRN2D, self).__init__(**kwargs)
            self.alpha = alpha
            self.k = k
            self.beta = beta
            self.n = n

        def call(self, x, mask=None):
            # Keras 2 invokes call(); the get_output()/get_input() hooks of the
            # pre-1.0 API are never called, which would leave the layer a no-op.
            # tf.nn.lrn normalizes across the last (channel) axis, matching the
            # channels-last tensors built in googlenet_inception_v1.py:
            # x / (k + alpha * sum of squares over n neighbouring channels) ** beta
            return tf.nn.lrn(x,
                             depth_radius=self.n // 2,
                             bias=float(self.k),
                             alpha=self.alpha,
                             beta=self.beta)

        def get_config(self):
            # The base config already carries the layer name.
            config = {"alpha": self.alpha,
                      "k": self.k,
                      "beta": self.beta,
                      "n": self.n}
            base_config = super(LRN2D, self).get_config()
            return dict(list(base_config.items()) + list(config.items()))


    class PoolHelper(Layer):
        def __init__(self, **kwargs):
            super(PoolHelper, self).__init__(**kwargs)

        def call(self, x, mask=None):
            # Drops the first row and column; the indexing assumes
            # channels-first tensors (batch, channels, rows, cols).
            return x[:, :, 1:, 1:]

        def get_config(self):
            config = {}
            base_config = super(PoolHelper, self).get_config()
            return dict(list(base_config.items()) + list(config.items()))
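To make the normalization in `LRN2D` easier to follow, here is a pure-Python sketch of the same arithmetic applied to the channel values at a single pixel. It uses the layer's default hyperparameters; the function name and the sample values are made up for illustration.

```python
def lrn_across_channels(values, alpha=1e-4, k=2.0, beta=0.75, n=5):
    """Divide each channel by (k + alpha * sum of squares of n neighbours) ** beta."""
    half_n = n // 2
    normalized = []
    for i, value in enumerate(values):
        # Neighbouring channels, clipped at the borders (same as zero padding).
        lo = max(0, i - half_n)
        hi = min(len(values), i + half_n + 1)
        sum_sq = sum(v * v for v in values[lo:hi])
        scale = (k + alpha * sum_sq) ** beta
        normalized.append(value / scale)
    return normalized


# Hypothetical activations of six channels at one spatial position.
pixel = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
out = lrn_across_channels(pixel)
print(out)
```

With k=2 the divisor is always greater than 1, so every activation is scaled down, and channels surrounded by strong neighbours are scaled down slightly more.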
  • googlenet_inception_v1-cifar10.py
    # -*- coding: utf-8 -*-
    import os
    import numpy as np
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt
    from keras.datasets import cifar10
    from keras.utils import np_utils
    from keras.preprocessing.image import ImageDataGenerator
    from keras.callbacks import ModelCheckpoint
    from keras.callbacks import ReduceLROnPlateau, CSVLogger, EarlyStopping
    from keras import backend as K
    import tensorflow as tf
    import googlenet_inception_v1

    # These callbacks are meant for model.fit(...); the manual train_on_batch
    # loop below does not invoke them.
    lr_reducer = ReduceLROnPlateau(monitor='val_loss', factor=np.sqrt(0.5), cooldown=0, patience=3, min_lr=1e-6)
    early_stopper = EarlyStopping(monitor='val_acc', min_delta=0.0005, patience=15)
    csv_logger = CSVLogger('googlenet_inception_v1_cifar10.csv')

    if __name__ == "__main__":
        from keras.utils.vis_utils import plot_model

        # Select the GPU before the TensorFlow session is created, then hand
        # the configured session to Keras.
        os.environ["CUDA_VISIBLE_DEVICES"] = "4"
        gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=1, allow_growth=True)
        K.set_session(tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                                       log_device_placement=True,
                                                       gpu_options=gpu_options)))
        (X_train, y_train), (X_test, y_test) = cifar10.load_data()
        # Define the input data and normalize it to [0, 1]
        dim = 32
        channel = 3
        class_num = 10
        X_train = X_train.reshape(X_train.shape[0], dim, dim, channel).astype('float32') / 255
        X_test = X_test.reshape(X_test.shape[0], dim, dim, channel).astype('float32') / 255
        Y_train = np_utils.to_categorical(y_train, class_num)
        Y_test = np_utils.to_categorical(y_test, class_num)
        # this will do preprocessing and realtime data augmentation
        datagen = ImageDataGenerator(
            featurewise_center=False,  # set input mean to 0 over the dataset
            samplewise_center=False,  # set each sample mean to 0
            featurewise_std_normalization=False,  # divide inputs by std of the dataset
            samplewise_std_normalization=False,  # divide each input by its std
            zca_whitening=False,  # apply ZCA whitening
            rotation_range=25,  # randomly rotate images in the range (degrees, 0 to 180)
            width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
            height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
            horizontal_flip=True,  # randomly flip images horizontally
            vertical_flip=False)  # do not flip images vertically
        datagen.fit(X_train)
        s = X_train.shape[1:]
        print(s)
        # NOTE: the builder's 5x5 and 7x7 average pooling windows follow the
        # 224x224 ImageNet layout; 32x32 CIFAR-10 images will likely need to
        # be resized before they fit.
        model = googlenet_inception_v1.googLeNet_inception_v1_building(s, class_num)
        model.summary()
        plot_model(model, to_file="GoogLeNet-Inception-V1.jpg", show_shapes=True)
        # The two auxiliary classifiers are weighted 0.3, the main classifier 1.0.
        model.compile(loss='categorical_crossentropy',
                      loss_weights=[0.3, 0.3, 1.0],
                      optimizer='adadelta',
                      metrics=['accuracy'])
        batch_size = 64
        nb_epoch = 100
        checkpoint = ModelCheckpoint("weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5", monitor='val_loss', verbose=0,
                                     save_best_only=False, save_weights_only=False, mode='auto')
        for e in range(nb_epoch):
            batches = 0
            for X_batch, Y_batch in datagen.flow(X_train, Y_train, batch_size=batch_size):
                loss = model.train_on_batch(X_batch, [Y_batch, Y_batch, Y_batch])  # note the three outputs
                print(loss)
                #loss_and_metrics = model.evaluate(X_test, [Y_test,Y_test,Y_test], batch_size=128)
                #model.fit(X_test, [Y_test,Y_test,Y_test], batch_size=64)
                batches += 1
                if batches >= len(X_train) / batch_size:
                    # we need to break the loop by hand because
                    # the generator loops indefinitely
                    break
        # The model has three outputs, so evaluate() needs three label arrays
        # and returns one value per entry in model.metrics_names.
        score = model.evaluate(X_test, [Y_test, Y_test, Y_test], verbose=0)
        for metric_name, value in zip(model.metrics_names, score):
            print('Test %s:' % metric_name, value)
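One thing worth checking before running the CIFAR-10 script is whether a 32×32 input survives the pooling stack of `googLeNet_inception_v1_building`. The sketch below traces only the spatial size, assuming the usual Keras rules: a `'same'` convolution with stride s gives ceil(size / s), and a `'valid'` pooling window gives floor((size - kernel) / stride) + 1. The helper names are illustrative.

```python
def same_conv(size, stride):
    """Output size of a padding='same' convolution: ceil(size / stride)."""
    return -(-size // stride)


def valid_pool(size, kernel, stride):
    """Output size of a padding='valid' pooling window."""
    return (size - kernel) // stride + 1


def trace(size):
    """Follow the spatial size through the stem and pooling layers."""
    sizes = {}
    size = same_conv(size, 2)                       # conv1_7x7/2
    size = valid_pool(size + 2, 3, 2)               # ZeroPadding2D(1) + max_pool1
    sizes["conv2"] = size
    size = valid_pool(size + 2, 3, 2)               # ZeroPadding2D(1) + max_pool2
    sizes["inception_3"] = size
    size = valid_pool(size + 2, 3, 2)               # ZeroPadding2D(1) + max_pool3
    sizes["inception_4"] = size
    sizes["aux_avg_pool"] = valid_pool(size, 5, 3)  # loss1/loss2 average pooling
    size = valid_pool(size + 2, 3, 2)               # ZeroPadding2D(1) + max_pool4
    sizes["inception_5"] = size
    sizes["final_avg_pool"] = valid_pool(size, 7, 1)
    return sizes


imagenet_sizes = trace(224)
cifar_sizes = trace(32)
print(imagenet_sizes)
print(cifar_sizes)
```

For a 224×224 input every stage has a positive size. For a 32×32 input the feature map is already 2×2 at the Inception 4 stage, so the 5×5 and 7×7 average pooling windows come out non-positive, which suggests the model cannot be built on raw CIFAR-10 images as written. Resizing the inputs, or shrinking those pooling windows, are two possible workarounds.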
