NVIDIA AI Course: Getting Started with AI on Jetson Nano - Class Notes (3)
Notice
The original text comes from the NVIDIA AI Course.
Image Classification
In this course, you'll build AI projects that can answer simple visual questions:
- Is my hand showing thumbs-up or thumbs-down?
- Does my face appear happy or sad?
- How many fingers am I holding up?
- Where's my nose?
Although these questions are easy for any human child to answer, interpreting images with computer vision requires a complex computer model that can be tuned to find the answer in a number of scenarios. For example, a thumbs-up hand signal may be at various angles and distances from the camera, it may be held before a variety of backgrounds, it could be from a variety of different hands, and so on, but it is still a thumbs-up hand signal. An effective AI model must be able to generalize across these scenarios, and even predict the correct answer with new data.
AI And Deep Learning
As humans, we generalize what we see based on our experience. In a similar way, we can use a branch of AI called Machine Learning to generalize and classify images based on experience in the form of lots of example data. In particular, we will use deep neural network models, or Deep Learning to recognize relevant patterns in an image dataset, and ultimately match new images to correct answers.
If you want to know more, you can check out this article about the differences between Artificial Intelligence, Machine Learning, and Deep Learning.
Deep Learning Models
A Deep Learning model consists of a neural network with internal parameters, or weights, configured to map inputs to outputs. In Image Classification, the inputs are the pixels from a camera image and the outputs are the possible categories, or classes that the model is trained to recognize. The choices might be 1000 different objects, or only two. Multiple labeled examples must be provided to the model over and over to train it to recognize the images. Once the model is trained, it can be run on live data and provide results in real time. This is called inference.
Before training, the model cannot accurately determine the correct class from an image input, because the weights are wrong. Labeled examples of images are iteratively submitted to the network with a learning algorithm. If the network gets the "wrong" answer (the label doesn't match), the learning algorithm adjusts the weights a little bit. Over many computationally intensive iterations, the accuracy improves to the point that the model can reliably determine the class for an input image.
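To make that concrete, here is a minimal sketch of one learning step in PyTorch. It is an illustration only, not the course's training code; the toy one-layer model, the random stand-in image, and the 0.01 learning rate are all made up for the example.

import torch
import torch.nn.functional as F

# A toy "model": one linear layer mapping a 784-pixel image to 10 classes.
weights = torch.randn(784, 10, requires_grad=True)

image = torch.randn(1, 784)   # one example image (random stand-in)
label = torch.tensor([3])     # its "correct" class label

# Forward pass: compute the model's answer with the current weights.
logits = image @ weights
loss = F.cross_entropy(logits, label)  # how "wrong" the answer is

# Backward pass: the learning algorithm adjusts the weights a little bit.
loss.backward()
with torch.no_grad():
    weights -= 0.01 * weights.grad  # small step against the gradient
    weights.grad.zero_()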
As you will discover, the data that is input is one of the keys to a good model, i.e. one that generalizes well regardless of the background, angle, or other "noisy" aspects of the image presented. Additional passes through the data set, or epochs, can also improve the model's performance.
Convolutional Neural Networks (CNNs)
Deep learning relies on Convolutional Neural Network (CNN) models to transform images into predicted classifications. A CNN is a class of artificial neural network that uses convolutional layers to filter inputs for useful information, and is the preferred network for image applications.
Artificial Neural Network
An artificial neural network is a biologically inspired computational model that is patterned after the network of neurons present in the human brain. At each layer, the network transforms input data by applying a nonlinear function to a weighted sum of the inputs. The intermediate outputs of one layer, called features, are used as the input into the next layer. The neural network, through repeated transformations, learns multiple layers of nonlinear features (like edges and shapes), which it then combines in a final layer to create a prediction (of more complex objects).
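As a minimal sketch of that idea (the layer sizes here are arbitrary, chosen only for illustration), a single layer in PyTorch computes a nonlinear function of a weighted sum of its inputs:

import torch

x = torch.randn(64)      # inputs from the previous layer
W = torch.randn(32, 64)  # weights
b = torch.randn(32)      # biases

features = torch.relu(W @ x + b)  # nonlinear function of the weighted sum
# 'features' are the intermediate outputs that feed the next layer.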
Convolutions
The convolution operation specific to CNNs combines the input data (feature map) from one layer with a convolution kernel (filter) to form a transformed feature map for the next layer. CNNs for image classification are generally composed of an input layer (the image), a series of hidden layers for feature extraction (the convolutions), and a fully connected output layer (the classification).
Figure 1: An input image of a traffic sign is filtered by 4 5x5 convolutional kernels, which create 4 feature maps; these feature maps are subsampled by max pooling. The next layer applies 10 5x5 convolutional kernels to these subsampled images, and again we pool the feature maps. The final layer is a fully connected layer where all generated features are combined and used in the classifier (essentially logistic regression). Image by Maurice Peemen.
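The network in Figure 1 can be sketched roughly as follows in PyTorch. The 32x32 single-channel input and the 10-class output are assumptions made for the example, since the caption doesn't give exact sizes:

import torch
import torch.nn as nn

# Rough sketch of the Figure 1 network, assuming a 32x32 single-channel input.
net = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=5),   # 4 feature maps via 5x5 kernels -> 28x28
    nn.ReLU(),
    nn.MaxPool2d(2),                  # subsample by max pooling -> 14x14
    nn.Conv2d(4, 10, kernel_size=5),  # 10 5x5 kernels -> 10x10
    nn.ReLU(),
    nn.MaxPool2d(2),                  # pool again -> 5x5
    nn.Flatten(),
    nn.Linear(10 * 5 * 5, 10),        # fully connected classifier (10 classes assumed)
)

print(net(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])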
As it is trained, the CNN adjusts automatically to find the most relevant features based on its classification requirements. For example, a CNN would filter information about the shape of an object when confronted with a general object recognition task but would extract the color of the bird when faced with a bird recognition task. This is based on the CNN's recognition through training that different classes of objects have different shapes but that different types of birds are more likely to differ in color than in shape.
Accelerating CNNs Using GPUs
The extensive calculations required for training CNN models, and for running inference through trained CNN models, demand intensive compute resources and time. Deep learning frameworks such as Caffe, TensorFlow, and PyTorch are optimized to run faster on GPUs. The frameworks take advantage of the parallel processing capabilities of a GPU if one is present, speeding up training and inference tasks.
The Jetson Nano includes a 128-core NVIDIA Maxwell GPU. Since it can run the full training frameworks, it is also able to re-train networks with transfer learning, a capability you will use in the projects for this course. Jetson Nano enables you to experiment with deep learning and AI on a low-cost platform. See this article for more details on Jetson Nano performance.
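The frameworks use the GPU once the model and data are placed on it. A typical PyTorch idiom (generic, not taken from the course notebooks) looks like this:

import torch
import torchvision

# Use the Jetson Nano's GPU if CUDA is available, otherwise fall back to CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = torchvision.models.resnet18(pretrained=True).to(device)
image = torch.randn(1, 3, 224, 224).to(device)  # dummy input on the same device

with torch.no_grad():
    output = model(image)  # inference runs on the GPU when device is 'cuda'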
ResNet-18
There are a number of world-class CNN architectures available to application developers for image classification and image regression. PyTorch and other frameworks include access to pretrained models from past winners of the famous ImageNet Large Scale Visual Recognition Challenge (ILSVRC), where researchers compete to correctly classify and detect objects and scenes with computer vision algorithms. In 2015, ResNet swept the awards in image classification, detection, and localization. We'll be using the smallest version of ResNet in our projects: ResNet-18.
Residual Networks
The Deep Residual Learning for Image Recognition research paper provides insight into why this architecture is effective. ResNet is a residual network, made with building blocks that incorporate "shortcut connections" that skip one or more layers.
The shortcut output is added to the outputs of the skipped layers. The authors demonstrate that this technique makes the network easier to optimize and gains accuracy at greatly increased depths. The ResNet architectures presented range from 18 layers deep all the way to 152 layers deep! For our purposes, the smallest network, ResNet-18, provides a good balance of performance and efficiency, sized well for the Jetson Nano.
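A simplified residual building block might be sketched like this; real ResNet blocks also include batch normalization and downsampling shortcuts, which are omitted here for clarity:

import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicResidualBlock(nn.Module):
    """Simplified ResNet-style block: the shortcut skips two conv layers."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = self.conv2(out)
        out = out + x  # shortcut output added to the skipped layers' output
        return F.relu(out)

block = BasicResidualBlock(16)
print(block(torch.randn(1, 16, 32, 32)).shape)  # spatial shape is preserved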
Transfer Learning
PyTorch includes a pre-trained ResNet-18 model that was trained on the ImageNet 2012 classification dataset, which consists of 1000 classes. In other words, the model can recognize 1000 different objects already!
Within the trained neural network are layers that find outlines, curves, lines, and other identifying features of an image. Important image features that were already learned in the original training of the model are now re-usable for our own classification task.
We will adapt it for our projects, which all include less than 10 different classes, by modifying the last neural network layer of the 18 that make up the ResNet-18 model. The last layer for ResNet-18 is a fully connected (fc) layer, pooled and flattened to 512 inputs, each connected to the 1000 possible output classes. We will replace the (512,1000) layer with one matching our classes. If we only need three classes, for example, this final layer will become (512, 3), where each of the 512 inputs is fully connected to each one of the 3 output classes.
You will still need to train the network to recognize those three classes using images you collect, but since the network has already learned to recognize features common to most objects, training is already part-way done. The previous training can be reused, or "transferred" to your new projects.
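Putting that together, the transfer learning setup described above looks roughly like this in PyTorch (a sketch; the three-class example is hypothetical):

import torch
import torchvision

# Load ResNet-18 with weights learned on ImageNet (1000 classes).
model = torchvision.models.resnet18(pretrained=True)
print(model.fc)  # Linear(in_features=512, out_features=1000)

# Replace the (512, 1000) final layer with one matching our own classes.
num_classes = 3  # hypothetical: e.g. three hand signals
model.fc = torch.nn.Linear(512, num_classes)
# Earlier layers keep their learned features; training now fine-tunes
# the network to map those features onto the new classes.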
Thumbs Project
The goal of this exercise is to build an Image Classification project that can determine the meaning of hand signals (thumbs-up or thumbs-down) that are held in front of a live camera.
Interactive Tool Startup Steps
You will implement the project by collecting your own data, training a model to classify your data, and then testing and updating your model as needed until it correctly classifies thumbs-up or thumbs-down images before the live camera.
Step 1: Open The Notebook
To get started, navigate to the classification folder in your JupyterLab interface and double-click the classification_interactive.ipynb notebook to open it.
Step 2: Execute All Of The Code Blocks
The notebook is designed to be reusable for any classification task you wish to build. Step through the code blocks and execute them one at a time. If you have trouble with this step, review the information on JupyterLab.
- Camera
This block sets the size of the images and starts the camera. If your camera is already active in this notebook or in another notebook, first shut down the kernel in the active notebook before running this code cell. Make sure that the correct camera type is selected for execution (USB or CSI). This cell may take several seconds to execute.
- Task
You get to define your TASK and CATEGORIES (the classes) parameters here, as well as how many datasets you want to track. For the Thumbs Project, this has already been defined for you, so go ahead and execute the cell. Subdirectories for each class are created to store the example images you collect. The subdirectory names serve as the labels needed for the model. This cell should only take a few seconds to execute.
- Data Collection
You'll collect images for your categories with your camera using an iPython widget. This cell sets up the collection mechanism to count your images and produce the user interface. The widget built here is the data_collection_widget. If you want to learn more about these powerful tools, visit the ipywidgets documentation. This cell should only take a few seconds to execute.
- Model
This block is where the neural network is defined. First, the GPU device is chosen with the statement:
device = torch.device('cuda')
The model is set to the ResNet-18 model for this project. Note that the pretrained=True parameter indicates we are loading all the parameter weights for the trained ResNet-18 model, not just the neural network alone:
model = torchvision.models.resnet18(pretrained=True)
There are a few more models listed in comments that you can try out later if you wish. For more information on available PyTorch pre-trained models, see the PyTorch documentation. In addition to choosing the model, the last layer of the model is modified to accept only the number of classes that we are training for. In the case of the Thumbs Project, it is only 2 (i.e. thumbs-up and thumbs-down):
model.fc = torch.nn.Linear(512, len(dataset.categories))
This code cell may take several seconds to execute.
- Live Execution
This code block sets up threading to run the model in the background so that you can view the live camera feed and visualize the model performance in real time. It also includes the code that defines how the outputs from the neural network are categorized. The network produces some value for each of the possible categories. The softmax function takes this vector of K real numbers and normalizes it into a probability distribution consisting of K probabilities. The values now add up to 1 and can be interpreted as probabilities:
output = F.softmax(output, dim=1).detach().cpu().numpy().flatten()
This cell should only take a few seconds to execute.
- Training and Evaluation
The training code cell sets the hyper-parameters for the model training (number of epochs, batch size, learning rate, momentum) and loads the images for training or evaluation. The model determines a predicted output from the loaded input image. The difference between the predicted output and the actual label is used to calculate the "loss". If the model is in training mode, the loss is backpropagated into the network to improve the model. The widgets created by this code cell include the option for setting the number of epochs to run. One epoch is a complete cycle of all images through the trainer. This code cell may take several seconds to execute. A generic sketch of this kind of training loop is shown after this list.
- Display the Interactive Tool!
This is the last code cell. All that's left to do is pack all the widgets into one comprehensive tool and display it. This cell may take several seconds to run and should display the full tool for you to work with. This tool will look essentially the same, no matter how you set up the classification problem with this notebook.
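For reference, the training-and-evaluation step follows the standard PyTorch training pattern. The sketch below is illustrative rather than the notebook's exact code: it assumes the model defined earlier, a hypothetical train_loader that yields batches of labeled images, and placeholder hyper-parameter values.

import torch
import torch.nn.functional as F

# Placeholder hyper-parameters like those exposed by the tool's widgets.
epochs, learning_rate, momentum = 10, 0.001, 0.9

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum)

model.train()
for epoch in range(epochs):              # one epoch = a full pass over all images
    for images, labels in train_loader:  # hypothetical DataLoader of labeled images
        optimizer.zero_grad()
        outputs = model(images)                  # predicted outputs
        loss = F.cross_entropy(outputs, labels)  # difference from the actual labels
        loss.backward()                          # backpropagate the loss
        optimizer.step()                         # adjust the weights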
Step 3: Collect Your Initial Data
The tool is designed for live interaction, so you can collect some data, train it, check the results, and then improve the model with more data and training. We'll try this in pieces to learn what effect the data you gather has on the performance of the model. At each step, you'll vary the data in a new way, building your dataset as you go.
Collect 30 thumbs-up images. Move your thumb through an arc of generally upward angles in front of the camera as you click the "add" button to save the data images.
Next, select the thumbs-down category on the tool and collect 30 images of your thumb in the down position, again varying the angle a bit as you click. The goal is to provide the model with lots of different examples from each category, so that the prediction can be generalized.
Step 4: Train Your Initial Data
Set the epoch number to 10, and click the train button. There will be a delay of about 30 seconds as the trainer loads the data. After that, the progress bar will indicate training status for each epoch. You'll see the calculated loss and accuracy displayed as well. With each epoch, the model improves, at least based on the data it has to work with! The accuracy should generally increase. Keep in mind that the accuracy is based on tests against the data the model already has access to, not truly unseen data.
Step 5: Test Your Data In Real Time
Once training is done, hold your thumb up or down in front of the camera and observe the prediction and sliders. The sliders indicate the probability the model gives for the prediction made. How was the result? Are you satisfied that the model is robust? Try moving the camera to a new background to see if it still works the same.
Note: If at any time your camera seems to "freeze", it will be necessary to shut down the kernel from the menu bar, then restart the kernel and run all cells. Your data is saved, but the model training will need to be run again.
Step 6: Improve Your Model
- Using a different background, gather an additional 30 images for thumbs-up and thumbs-down, again varying the angle. Train an additional 5 epochs.
- Did your model become more reliable? What happens when you move the thumb to corners and edges of the camera view, or move your thumb very far away or very close to the camera?
- Using a variety of distances from the camera, gather an additional 30 images for thumbs-up and thumbs-down. Train an additional 5 epochs.
- Keep testing and training in this way until you are satisfied with the performance of your first project!
Step 7: Save Your Model
When you are satisfied with your model, save it by entering a name in the "model path" box and clicking "save model".
Emotions Project
The goal of this exercise is to build an Image Classification project that can determine the meaning of four different facial expressions ("happy", "sad", "angry", "none"), that you provide in front of a live camera.
Interactive Tool Startup Steps
The setup for the Emotions Project is almost the same as for the Thumbs Project.
Step 1: Open The Notebook
You'll use the same classification_interactive.ipynb notebook. If it's already open, restart the notebook and clear all the outputs using the Kernel menu with Kernel->Restart Kernel and Clear All Outputs. If your camera is active in any other notebook, shut down the kernel in that active notebook as well.
Step 2: Modify The Task Code Cell
Before you execute all of the code blocks in the notebook, you'll need to change the TASK and CATEGORIES parameters in the Task code cell block to define the new project. Comment out the "thumbs" project parameters, and uncomment the "emotions" parameters:
# TASK = 'thumbs'
TASK = 'emotions'
# TASK = 'fingers'
# TASK = 'diy'

# CATEGORIES = ['thumbs_up', 'thumbs_down']
CATEGORIES = ['none', 'happy', 'sad', 'angry']
# CATEGORIES = ['1', '2', '3', '4', '5']
# CATEGORIES = [ 'diy_1', 'diy_2', 'diy_3']
Step 3: Execute All Of The Code Blocks
The rest of the blocks remain the same. You'll still use the ResNet-18 pre-trained model as a base. This time, since there are four items in the CATEGORIES parameter, there will be four different class subdirectories for data and four output probability sliders on the Interactive tool.
Step 4: Collect Data, Train, Test
Position the camera in front of your face and collect initial data. As you collect each emotion, vary your head position and pose. Try leaning your head left and right, up and down, side to side. As you create your emotion faces, think about the difference between sad and angry. Exaggerate your expressions to make them distinctive for the initial training, then refine them with more subtlety as your model improves:
- Add 20 images of a "happy" face with the happy category selected
- Add 20 images of a "sad" face with the sad category selected
- Add 20 images of an "angry" face with the angry category selected
- Add 20 images of a face with no expression with the none category selected
- Set the number of epochs to 10 and click the train button
- Once the training is complete, try different expressions live and observe the prediction
Step 5: Improve Your Model
As you did in the Thumbs Project, you can improve your model by adding data for scenarios that don't work as well as you like, then retraining. For example:
- Move the camera so that the face is closer. Is the performance of the predictor still good? If not, try adding some data for each category (10 each) and retrain (5 epochs). Does this help? You can experiment with more data and more training.
- Move the camera for a different background. Is the performance of the predictor still good? If not, try adding some data for each category (10 each) and retrain (5 epochs). Does this help? You can experiment with more data and more training.
- Can you get a friend to try your model? Does it work the same? You know the drill… more data and training!
Step 6: Save Your Model
When you are satisfied with your model, save it by entering a name in the "model path" box and clicking "save model".
More Classification Projects
To build another project, follow the pattern you used with the Emotions Project. The Fingers project is provided as an example, but don't let that limit you! To start a new project, save your previous work, modify the TASK and CATEGORIES values, shutdown and restart the notebook, and run all the cells. Then collect, train, and test!
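For instance, to start a do-it-yourself project with three classes of your own, the Task code cell would be edited along these lines, using the diy options already present in its comments:

# TASK = 'thumbs'
# TASK = 'emotions'
# TASK = 'fingers'
TASK = 'diy'

# CATEGORIES = ['thumbs_up', 'thumbs_down']
# CATEGORIES = ['none', 'happy', 'sad', 'angry']
# CATEGORIES = ['1', '2', '3', '4', '5']
CATEGORIES = ['diy_1', 'diy_2', 'diy_3']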