NVIDIA AI Course: Getting Started with AI on Jetson Nano, Class Notes (4)

Notice
The original text comes from the NVIDIA AI course. These notes were prepared as a translation of that text.

Image Regression

Classification vs. Regression

Unlike Image Classification applications, which map image inputs to discrete outputs (classes), the Image Regression task maps the image input pixels to continuous outputs.

Continuous Outputs

In the course regression project, those continuous outputs happen to define the X and Y coordinates of various features on a face, such as a nose. Mapping an image stream to a location for tracking can be used in other applications, such as following a line in mobile robotics. Tracking isn't the only thing a Regression model can do though. The output values could be something quite different such as steering values, or camera movement parameters.

Changing the Final Layer

The final layer of the pre-trained ResNet-18 network is a fully connected (fc) layer that maps 512 inputs to 1000 output classes, written (512, 1000). In the Image Classification projects, transfer learning replaced that last layer with one sized for only a few classes, depending on the application. For example, to train 3 classes, we change the fc layer to (512, 3), so that the network ends in a fully connected layer mapping 512 inputs to 3 classes.

In a Regression project that predicts coordinates, we want two values for each category: the X and Y values. That means the fc layer needs twice as many outputs as there are categories. For example, if there are 3 facial features (nose, left_eye, right_eye), each with both an X and a Y output, then 6 outputs are required, or (512, 6) for the fc layer.
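As a rough illustration of that output layout, the sketch below computes the fc output size from a category list and splits a flat output vector into per-category (X, Y) pairs. The helper name `split_xy`, the sample values, and the interleaved ordering (X then Y for each category) are assumptions made for this example; the course text does not specify them.

```python
# Hypothetical helper: the interleaved (x, y) ordering is an assumption.
CATEGORIES = ["nose", "left_eye", "right_eye"]

# Two outputs (X and Y) per category, so 3 categories need 6 outputs.
output_dim = 2 * len(CATEGORIES)


def split_xy(outputs, categories):
    """Group a flat list of 2*N values into {category: (x, y)}."""
    if len(outputs) != 2 * len(categories):
        raise ValueError("expected two values per category")
    return {
        name: (outputs[2 * i], outputs[2 * i + 1])
        for i, name in enumerate(categories)
    }


# Example with made-up output values.
example_outputs = [0.1, -0.2, -0.3, 0.4, 0.3, 0.4]
xy_by_category = split_xy(example_outputs, CATEGORIES)
print(output_dim)
print(xy_by_category["nose"])
```

The length check is there because a mismatch between the category list and the layer size is an easy mistake to make when the categories change.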

In classification, recall that the softmax function was used to build a probability distribution of the output values. For regression, we want to keep the actual values, because we didn't train for probabilities, but for actual X and Y output values.
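To make the difference concrete, here is a small framework-free sketch. It applies softmax to some made-up class scores, then shows that regression outputs are simply read as they are. All numbers are invented for illustration.

```python
import math


def softmax(values):
    """Turn raw scores into a probability distribution that sums to 1."""
    shifted = [v - max(values) for v in values]  # subtract max for stability
    exps = [math.exp(v) for v in shifted]
    total = sum(exps)
    return [e / total for e in exps]


# Classification: raw scores for 3 classes become probabilities.
class_scores = [2.0, 1.0, 0.1]
probabilities = softmax(class_scores)
predicted_class = probabilities.index(max(probabilities))

# Regression: the raw outputs are already the answer (made-up X, Y values).
raw_xy = [0.25, -0.40]
predicted_x, predicted_y = raw_xy

print(predicted_class, round(sum(probabilities), 6))
print(predicted_x, predicted_y)
```

Passing the regression outputs through softmax would squash them into values that sum to 1, which would destroy the coordinate information.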

Evaluation

Classification and Regression also differ in the way they are evaluated. The discrete values of classification can be evaluated based on accuracy, i.e. a calculation of the percentage of "right" answers. In the case of regression, we are interested in getting as close as possible to a correct answer. Therefore, the root mean squared error can be used.
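The two evaluation styles can be compared side by side in a short sketch. The function names and sample data below are made up for illustration and are not taken from the course notebook.

```python
import math


def accuracy(predicted_labels, true_labels):
    """Fraction of predictions that exactly match the true class."""
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)


def rmse(predicted_values, true_values):
    """Root mean squared error: how far off the predictions are on average."""
    squared = [(p - t) ** 2 for p, t in zip(predicted_values, true_values)]
    return math.sqrt(sum(squared) / len(squared))


# Classification: 3 of 4 labels are right.
acc = accuracy(["cat", "dog", "cat", "dog"], ["cat", "dog", "dog", "dog"])

# Regression: flattened (x, y) predictions versus targets (made-up numbers).
err = rmse([3.0, 4.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0])

print(acc)
print(err)
```

Accuracy only counts exact matches, while RMSE rewards predictions for being close, which is what matters when the target is a coordinate.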

Face XY Project

The goal of this project is to build an Image Regression project that can predict the X and Y coordinates of a facial feature in a live image.

Interactive Tool Startup Steps

You will implement the project by collecting your own data using a clickable image display tool, training a model to find the XY coordinates of the feature, and then testing and updating your model as needed using images from the live camera. Since you are collecting two values for each category, the model may require more training and data to get a satisfactory result. 

Be patient! Building your model is an iterative process.

Step 1: Open the Notebook

To get started, navigate to the regression folder in your JupyterLab interface and double-click the regression_interactive.ipynb notebook to open it.

Step 2: Execute All of the Code Blocks

The notebook is designed to be reusable for any XY regression task you wish to build. Step through the code blocks and execute them one at a time.

  1. Camera

    This block sets the size of the images and starts the camera. If your camera is already active in this notebook or in another notebook, first shut down the kernel in the active notebook before running this code cell. Make sure that the correct camera type is selected for execution (USB or CSI). This cell may take several seconds to execute.

  2. Task

    You get to define your TASK and CATEGORIES parameters here, as well as how many datasets you want to track. For the Face XY Project, this has already been defined for you as the face task with categories of nose, left_eye, and right_eye. Each category for the XY regression tool requires both an X and a Y value. Go ahead and execute the cell. Subdirectories for each category are created to store the example images you collect. The file names of the images will contain the XY coordinates that you tag the images with during the data collection step. This cell should only take a few seconds to execute.

  3. Data Collection

    You’ll collect images for your categories with a special clickable image widget set up in this cell. As you click the “nose” or “eye” in the live feed image, the data image filename is automatically annotated and saved using the X and Y coordinates from the click.
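The storage scheme described in the Task and Data Collection cells can be sketched as follows: one subdirectory per category, and a file name that carries the clicked coordinates. The `x_y_id.jpg` naming pattern and the helper names are hypothetical, chosen only for this example; the notebook's actual file name format may differ.

```python
import os
import tempfile
import uuid

CATEGORIES = ["nose", "left_eye", "right_eye"]


def make_category_dirs(root, categories):
    """Create one subdirectory per category under root."""
    for name in categories:
        os.makedirs(os.path.join(root, name), exist_ok=True)


def encode_filename(x, y):
    """Build a file name that carries the clicked pixel coordinates."""
    # Hypothetical pattern: <x>_<y>_<unique id>.jpg
    return "%d_%d_%s.jpg" % (x, y, uuid.uuid4().hex)


def decode_filename(filename):
    """Recover the (x, y) click position from a file name."""
    x_text, y_text, _rest = filename.split("_", 2)
    return int(x_text), int(y_text)


root = tempfile.mkdtemp()  # throwaway directory for the demonstration
make_category_dirs(root, CATEGORIES)
name = encode_filename(112, 87)
created = sorted(os.listdir(root))
print(created)
print(decode_filename(name))
```

Keeping the label in the file name means each image is self-describing, so no separate annotation file has to stay in sync with the images.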

  4. Model

    The model is set to the same pre-trained ResNet18 model for this project:

    model = torchvision.models.resnet18(pretrained=True)

    For more information on available PyTorch pre-trained models, see the PyTorch documentation. In addition to choosing the model, the last layer of the model is modified to produce only the number of outputs that we are training for. In the case of the Face XY Project, that is twice the number of categories, since each requires both X and Y coordinates (i.e. nose X, nose Y, left_eye X, left_eye Y, right_eye X, and right_eye Y).

    output_dim = 2 * len(dataset.categories)
    model.fc = torch.nn.Linear(512, output_dim)

    This code cell may take several seconds to execute.

  5. Live Execution

    This code block sets up threading to run the model in the background so that you can view the live camera feed and visualize the model performance in real time. This cell should only take a few seconds to execute. For this project, a blue circle will overlay the live image at the model's predicted location for the selected feature.
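The background-thread pattern described above might look roughly like the sketch below. It is not the notebook's code: `fake_predict` is a stand-in for real model inference, and the shared dictionary, lock, and stop event are assumptions chosen to keep the example self-contained.

```python
import threading
import time

latest = {"xy": None}           # most recent prediction, shared with the UI
lock = threading.Lock()
stop_event = threading.Event()


def fake_predict(frame_index):
    """Stand-in for model inference; returns a made-up (x, y) pair."""
    return (frame_index % 224, (frame_index * 2) % 224)


def live_loop():
    """Run predictions in the background until asked to stop."""
    frame_index = 0
    while not stop_event.is_set():
        xy = fake_predict(frame_index)
        with lock:
            latest["xy"] = xy
        frame_index += 1
        time.sleep(0.001)       # pretend each frame takes some time


worker = threading.Thread(target=live_loop, daemon=True)
worker.start()
time.sleep(0.05)                # the main thread stays free meanwhile
stop_event.set()
worker.join(timeout=1.0)

with lock:
    print(latest["xy"])
```

Running inference on a separate thread keeps the interface responsive, and the stop event gives a clean way to end the loop before the camera is released.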

  6. Training and Evaluation

    The training code cell sets the hyper-parameters for the model training (number of epochs, batch size, learning rate, momentum) and loads the images for training or evaluation. The regression version is very similar to the simple classification training, though the loss is calculated differently. The mean square error over the X and Y value errors is calculated and used as the loss for backpropagation in training to improve the model. This code cell may take several seconds to execute.
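To show how a mean squared error loss drives training, here is a toy, framework-free sketch. The "model" is just two learnable numbers, the targets are made up, and the hyper-parameter values are arbitrary. The course notebook uses PyTorch for this step, so treat this only as an illustration of the idea.

```python
# Made-up targets: (x, y) clicks for one feature across four images.
targets = [(0.2, 0.4), (0.4, 0.6), (0.2, 0.6), (0.4, 0.4)]

# Toy "model": two learnable numbers predicting the same (x, y) every time.
params = [0.0, 0.0]
velocity = [0.0, 0.0]
learning_rate = 0.1
momentum = 0.9
epochs = 200


def mse_loss(params, targets):
    """Mean squared error over all X and Y errors."""
    errors = [(params[0] - tx) ** 2 + (params[1] - ty) ** 2 for tx, ty in targets]
    return sum(errors) / (2 * len(targets))


def mse_gradient(params, targets):
    """Gradient of mse_loss with respect to the two parameters."""
    n = 2 * len(targets)
    grad_x = sum(2 * (params[0] - tx) for tx, _ in targets) / n
    grad_y = sum(2 * (params[1] - ty) for _, ty in targets) / n
    return [grad_x, grad_y]


initial_loss = mse_loss(params, targets)
for _ in range(epochs):
    grad = mse_gradient(params, targets)
    for i in range(2):
        # SGD with momentum: blend the previous step into the new one.
        velocity[i] = momentum * velocity[i] - learning_rate * grad[i]
        params[i] += velocity[i]
final_loss = mse_loss(params, targets)

print(round(initial_loss, 4), round(final_loss, 4))
print([round(p, 3) for p in params])
```

With a constant prediction, the best the toy model can do is settle near the average of the targets, so the loss drops but does not reach zero.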

  7. Display the Interactive Tool!

    This is the last code cell. All that's left to do is pack all the widgets into one comprehensive tool and display it. This cell may take several seconds to run and should display the full tool for you to work with. There are three image windows. Initially, only the left camera feed is populated. The middle window will display the most recent annotated snapshot image once you start collecting data. The right-most window will display the live prediction view once the model has been trained.

Step 3: Collect Data, Train, Test

Position the camera in front of your face and collect initial data. With the mouse cursor, point to the target feature that matches the category you've selected (such as the nose). Click to collect data. The annotated snapshot you just collected will appear in the middle display box. As you collect each image, vary your head position and pose:

  1. Add 20 images of your nose with the nose category selected
  2. Add 20 images of your left eye with the left_eye category selected
  3. Add 20 images of your right eye with the right_eye category selected
  4. Set the number of epochs to 10 and click the train button
  5. Once the training is complete, try the live view and observe the prediction. A blue circle should appear on the feature selected.

Step 4: Improve Your Model

Use the live inference as a guide to improve your model! The live feed shows the model's prediction. As you move your head, does the target circle correctly follow your nose (or left_eye, right_eye)? If not, then click the correct location and add data. After you've added some data for a new scenario, train the model some more. For example:

  • Move the camera so that the face is closer. Is the performance of the predictor still good? If not, try adding some data for each category (10 each) and retrain (5 epochs). Does this help? You can experiment with more data and more training.
  • Move the camera to provide a different background. Is the performance of the predictor still good? If not, try adding some data for each category (10 each) and retrain (5 epochs). Does this help? You can experiment with more data and more training.
  • Are there any other scenarios in which you think the model might not perform well? Try them out!
  • Can you get a friend to try your model? Does it work the same? You know the drill: more data and training!

Step 5: Save Your Model

When you are satisfied with your model, save it by entering a name in the "model path" box and clicking "save model".

More Regression Projects

To build another project, follow the same pattern you used for the Face XY Project. Save your previous work, modify the TASK and CATEGORIES values, shut down and restart the notebook, and run all the cells. Then collect, train, and test!
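As an illustration of what changing those values might look like, here is a sketch for a line-following project. The task name, category name, dataset labels, and directory layout are all hypothetical and only serve as an example.

```python
# Hypothetical values for a new project; pick names that fit your own task.
TASK = "line_following"
CATEGORIES = ["line_center"]
DATASETS = ["A", "B"]           # how many datasets to keep track of

# Each category still needs an X and a Y output from the final layer.
output_dim = 2 * len(CATEGORIES)

# One possible directory layout: <task>_<dataset>/<category>
dataset_dirs = [
    "%s_%s/%s" % (TASK, dataset, category)
    for dataset in DATASETS
    for category in CATEGORIES
]
print(output_dim)
print(dataset_dirs)
```

Because the output size is derived from the category list, changing CATEGORIES is enough to resize the final layer the next time the cells are run.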
