From Robin

Revision as of 17:12, 18 March 2017

Master thesis

Possible topics

Visualization

Goals:

Understand the robot's algorithms better by visualizing them at runtime.
See the effects of parameter tuning more easily

Things:

Compare different models
Use the actual camera stream and overlay generated 3D models

I consider visualization using VR somewhat unnecessary, since VR will most likely not improve understanding much; 2D works perfectly well here. To find out whether the goal has been reached, the system has to be user-tested to see whether users find the algorithms easier to understand. This can be a challenge, as it requires both enough users (20?) and a well-designed study.

Remote control

Goal: Solve a task with the robot more easily with the help of VR. Show that it is easier to look at obstacles or to control a robot arm with VR.

The challenge here is building a robot arm. Evaluating the effectiveness of this system is easier than for visualization, since one can measure the time it takes to solve the task, or the accuracy with which it was performed. Here it might be an idea to use the Vive's hand controllers to control the robot arm.

Visualization through AR

Use AR, ideally through glasses, to see the robot's algorithms visualized. This requires both good AR glasses (Microsoft HoloLens or the HTC Vive camera) and a lot of computing power. The user's exact pose (position and orientation relative to the robot) must be kept up to date. I imagine this project can be solved in 2D on a screen first, then in VR, and then ported to AR.

This idea gets me really hyped, and it is the one I want to go for.

Goals:

Build a system that makes it easier to understand the robot's algorithms by visualizing them at runtime
See the effects of parameter tuning more easily

Challenges:

I don't know how well AR works with the Vive. Its camera is not a stereo camera, which is a bit of a bummer. On the other hand, an iPad, which is also an alternative, would be mono as well.
Find metrics for how good the result is.

Visualization of sensory data and algorithms through augmented reality

Abstract

The main goal of the thesis is to figure out whether augmented reality can help users understand the robot's sensory information/world model. The classic way of examining the sensor data is to view the point clouds in a 3D program, where the clouds float in a black void but can be rotated and zoomed. It can be hard to understand the point cloud in relation to the physical robot. Therefore, the point clouds are projected on top of the user's view, with the aim of improving his/her understanding of the situation.

Metrics

Can augmented reality improve the understanding of robotic sensory data?
Does presenting a robot's world model through augmented reality result in a more intuitive understanding for a human observer?
Does augmented reality help humans understand the robot's sensory information / world model in a more intuitive format?
How much better does the user understand how the robot works?

Things

Low latency on head mounted displays is very important for user experience.

Questionnaire

Rate traditional visualization (Gazebo) vs. the AR system on a scale of 1-5:

The difficulty of understanding the robot's position in its environment (easy to hard)
The difficulty of understanding the robot's orientation in its environment (easy to hard)

Ideas

Jørgen's idea: The traditional setup of a multi-robot/multi-sensor system in rviz is a huge grid, inefficient and hard to get an overview of. A VR system makes this easier. Do not present the 3D world in rviz as an option; compare the raw sensory data with the VR environment instead.

Must find a specific task that is hard for the user, but becomes solvable in a mixed reality environment.

Find an algorithm that uses sensory data from one or two sensors. Make it fail. Find an example that is easy to see in VR, but hard to see on a screen.

Meeting with Tønnes, 31 January

What does AR do? AR makes it possible to connect the virtual objects with the real objects. This makes the virtual objects easy, or even possible at all, for a human to understand.

Use a segmentation algorithm that fails to segment two overlapping objects in the scene. Watching the point cloud in rviz while adjusting the parameters/angle and the algorithm output is hard. Watching it in AR makes it intuitive. The algorithm output can be visualized by coloring the different segments and showing the parameters as an overlay.

Meeting with Kyrre and Tønnes, 2 February

Timing a task gives something that is easier to measure.

Find an object in a point cloud. Measure the time. Look at different point clouds.

Idea: Measure the time it takes to identify an object in a point cloud with a limited number of points. Ask the user to find the sweet spot with respect to the number of points, where fewer is better.

Show that VR is x% more efficient than 3D through rviz.

VR as the first experiment and AR as the second?

Compare VR against AR?

Compare the different implementations with each other?

Schedule

File:Master gantt mathiact.png (update: File:Mathiact_Gantt_2.PNG)

Implementation

Setup

ROS computer running Ubuntu 16.04:

Rosbridge server running on port 9090
realsense_camera publishing raw depth images at 640x240, 30 fps. The frame rate can be hacked down to 10 fps by setting the dynamic parameter motion_range to 100, but this degrades the images. My fork of realsense_camera here drops frames and only sends every third frame.
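The frame dropping in the fork can be sketched as a counter that lets every Nth frame through. This is a minimal, ROS-independent sketch, not the fork's actual code, and the class name FrameDecimator is hypothetical. In a node, keep() would be checked at the top of the image callback, and the message published only when it returns true.

```cpp
#include <cstddef>

// Hypothetical helper: forwards one frame out of every `n`
// (n = 3 turns a 30 fps stream into roughly 10 fps).
class FrameDecimator {
public:
    explicit FrameDecimator(std::size_t n) : n_(n == 0 ? 1 : n), count_(0) {}

    // Returns true if the current frame should be published.
    bool keep() {
        bool forward = (count_ % n_) == 0;
        ++count_;
        return forward;
    }

private:
    std::size_t n_;      // keep every n-th frame
    std::size_t count_;  // frames seen so far
};
```

Dropping frames on the ROS side keeps the sensor at its native 30 fps exposure, which avoids the image degradation caused by the motion_range hack.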

Visualization computer running Windows 10

HTC Vive connected
SteamVR installed
Unity project UnityRosSensorVisualizer, listening through ROSBridgeLib on the local network (UDP)

Notes:

UDP was blocked on the local network, so I tried a third-party service called ngrok, which sends the data over HTTP on port 80 instead. When testing this, no frame rate greater than 1 fps was achieved. The cause might have been latency introduced by the third-party server; however, this was not the case, as switching to a wired connection between the two computers did not reduce the delay.

Using a wired connection:

Linux PC: Use ifconfig to check that it has an IPv4 address on the wired connection. If not, set one up, e.g. IP address 10.0.0.100 with subnet mask 255.255.255.0.

Windows PC: Do exactly the same, just pick another IP address, e.g. 10.0.0.200.

Now the two computers can reach each other on the local network using these IP addresses.

ROS -> Unity

Data from ROS can be sent over WebSockets with RosBridge.

Similar projects

ARSEA

ARSEA - Augmented Reality Subsea Exploration Assistant

Github project

This project includes code for writing point clouds to file for testing (folder: /PointCloudManager). It does not get the points from the socket, but the infrastructure is there.

Turtlesim example

Example of getting turtlesim data into Unity

Github projects

C# point cloud library

Sensor

RealSense

The ROS package realsense_camera supports the F200 camera (Creative).

Installation

Clone librealsense and follow the installation guide. Steps 4 and 5 are not needed, as we put a symlink to librealsense in our catkin_ws/src directory and build it with catkin.

Then clone realsense and put a symlink to the inner folder /realsense_camera in the catkin_ws/src dir. Build everything.

How to run

roslaunch realsense_camera f200_nodelet_default.launch

Example modified launch file: File:F200 nodelet modify params.xml

Into Unity

There are multiple ways to get data from this sensor. The frame rate of the depth sensor is stuck at 30 fps, which makes the point clouds too heavy (300 MB/s). See the table below.

Bandwidth @ 640x240 resolution
Message type	Bandwidth (MB/s) @ 30 fps	Bandwidth (MB/s) @ 10 fps
PointCloud2	148	46
Image	9.3	2.85
CompressedImage	0.45	0.2

Setting the resolution to 640x240 (default 640x480) helps, but PointCloud2 still lags too much: 1 fps and an increasing delay. The raw Image gives 2 fps and a stable, short delay, which is acceptable.
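To sanity-check numbers like those in the table above, the raw payload can be estimated as width x height x bytes per pixel x fps. This is a back-of-the-envelope sketch, assuming 2 bytes per pixel for a 16UC1 depth Image and 16 bytes per point for PointCloud2 (the point_step in the packet example further down), and ignoring message headers and the overhead of rosbridge's JSON encoding. With those assumptions, a 640x240 Image at 30 fps carries about 9.2 MB/s, and a 640x480 PointCloud2 at 30 fps about 147 MB/s. The function name is illustrative.

```cpp
#include <cstddef>

// Raw payload bandwidth in MB/s (1 MB = 10^6 bytes), ignoring message
// headers and any rosbridge JSON/Base64 encoding overhead.
double rawBandwidthMBps(std::size_t width, std::size_t height,
                        std::size_t bytesPerPixel, double fps) {
    double bytesPerFrame = static_cast<double>(width) * height * bytesPerPixel;
    return bytesPerFrame * fps / 1e6;
}
```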

When depth images are used, the point clouds have to be generated manually; see below.

The frame rate can be hacked down to 10 fps by setting the dynamic parameter f200_motion_range to 100 (default is 0). This forces longer exposure.

Measured Unity frame rates (fps) @ 640x480 resolution
Condition	Message type	30 fps	10 fps
Just network	PointCloud2	1.7	1.7
Just network	Image	25	8.6
With point cloud rendering	PointCloud2	1.6	1.7
With point cloud rendering	Image	8	8

We observe that the network is the main bottleneck here. The maximum frame rate from the PointCloud2 topic is 1.7 fps. The Image topic is much lighter, and even with the required post-processing (deprojection) in Unity, it is more than four times faster.

The depth images are in the 16UC1 format, which comes from OpenCV: uint16, one channel.

CompressedImage

The first option is the official one, which uses the CompressedImage message for sending depth data.

The data is published on the topic /camera/depth/image_raw/compressedDepth. I have not been able to decompress it in Unity using Texture2D, as the image is read as RGBA24. ROS says it is supposed to be 16UC1, and judging by the settings, the ROS package hints that it is PNG.

Update: After two weeks, I finally managed to get this topic into Unity. The first 12 bytes of the data packet are junk (not part of the PNG) and have to be removed for the PNG to be decoded. However, if a compressed image is used, the point cloud has to be generated from it. This involves using the camera matrix to map each pixel to its corresponding ray through the lens into 3D space.
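A sketch of the header stripping described above, with a check for the 8-byte PNG signature so that a wrong offset fails loudly instead of producing a broken texture. As an assumption worth verifying: in the compressed_depth_image_transport package these 12 bytes appear to be a configuration header (a compression format field plus two float depth parameters) rather than random junk. The function name extractPng is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Length of the header in front of the PNG stream (12 bytes, as observed).
constexpr std::size_t kCompressedDepthHeaderBytes = 12;

// Returns the PNG part of a compressedDepth payload, or an empty vector if
// the bytes after the header do not start with the PNG signature.
std::vector<std::uint8_t> extractPng(const std::vector<std::uint8_t>& payload) {
    static const std::uint8_t kPngSignature[8] = {0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    if (payload.size() < kCompressedDepthHeaderBytes + 8) return {};
    for (std::size_t i = 0; i < 8; ++i) {
        if (payload[kCompressedDepthHeaderBytes + i] != kPngSignature[i]) return {};
    }
    return std::vector<std::uint8_t>(
        payload.begin() + static_cast<std::ptrdiff_t>(kCompressedDepthHeaderBytes),
        payload.end());
}
```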

Update, 8 December 2016: I have not managed to get the PNG images into Unity with more than five gray levels: 0.000, 0.004, 0.008, 0.012 and 0.016.

The function System.Convert.FromBase64String(string) is used to convert the data from the JSONNode to a byte array. As far as I know, this is the only way. This conversion might be the reason for the loss of gray levels.

Another interesting detail is that the gray levels are exactly 0/256 = 0, 1/256 = 0.004, 2/256 = 0.008, 3/256 = 0.012 and 4/256 = 0.016.

After the conversion, which is most likely the issue here, the byte array can be loaded into a texture using new Texture2D(640, 480).LoadImage(array);. The parameters to the texture's constructor (such as format, mipmap and linear) do not matter, since LoadImage decides the format itself.
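As an alternative hypothesis, offered only as an assumption to test: Base64 decoding is lossless, so FromBase64String by itself should not remove gray levels, whereas a decoder that reduces each 16-bit PNG sample to 8 bits would. If only the most significant byte of each sample survives, and the depth values are in millimeters (also an assumption), every depth below 1280 mm lands on one of exactly five levels, 0/256 to 4/256, which matches the observation above. The sketch below only models this hypothesis. It says nothing about what Texture2D.LoadImage actually does internally, and the function name is hypothetical.

```cpp
#include <cstdint>

// Hypothetical model of a lossy decoder: keep only the most significant
// byte of a 16-bit depth sample and express it as a fraction of 256.
double highByteGrayLevel(std::uint16_t depth) {
    return static_cast<double>(depth >> 8) / 256.0;
}
```

If this model holds, decoding the PNG with a library that preserves 16 bits per sample should recover the full depth range.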

Mapping the depth image to 3D points

Camera parameters:


   D: [0.1387016326189041, 0.0786430761218071, 0.003642927622422576, 0.008337009698152542, 0.09094849228858948]
   K: [478.3880920410156, 0.0, 314.0879211425781, 0.0, 478.38800048828125, 246.01443481445312, 0.0, 0.0, 1.0]
   R: [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
   P: [478.3880920410156, 0.0, 314.0879211425781, 0.025700006633996964, 0.0, 478.38800048828125, 246.01443481445312, 0.0012667370028793812, 0.0, 0.0, 1.0, 0.003814664203673601]

K is the calibration matrix. It maps points on the normalized image plane [x, y, 1] to pixel coordinates [u, v, 1]: u = K * x. The matrix is 3x3 because it is a homogeneous transformation of 2D points. This transformation is the intrinsic part of the full perspective camera model.

To calculate the 3D point in space, one must solve for x: u = K * x => x = K^-1 * u.

Since coordinates on the normalized image plane are homogeneous, we can multiply them by the measured depth (Z) to extend the vector to the corresponding point in 3D space.

The normalized point is x = [X/Z, Y/Z, 1], where [X, Y, Z] is the 3D point, which gives the following result: [X, Y, Z] = K^-1 * u * Z

K^-1:

0.00209035	0	-0.656555
0	0.00209035	-0.514257
0	0	1
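Because K has zero skew (K[1] = 0 in the parameters above), multiplying by K^-1 reduces to one subtraction and one division per coordinate: 1/fx is about 0.00209035 and -cx/fx about -0.656555, matching K^-1 above. This is a minimal sketch, assuming the depth value has already been converted to the desired unit (e.g. meters) and ignoring lens distortion. The Intrinsics struct and the deproject function are illustrative names.

```cpp
#include <array>

struct Intrinsics {
    double fx, fy;  // focal lengths in pixels (K[0], K[4])
    double cx, cy;  // principal point in pixels (K[2], K[5])
};

// Deprojects pixel (u, v) with depth z: [X, Y, Z] = z * K^-1 * [u, v, 1].
// Lens distortion is ignored here.
std::array<double, 3> deproject(const Intrinsics& k, double u, double v, double z) {
    double x = (u - k.cx) / k.fx;  // first row of K^-1 times [u, v, 1]
    double y = (v - k.cy) / k.fy;  // second row of K^-1 times [u, v, 1]
    return {x * z, y * z, z};
}
```

Looping this over all 640x480 pixels of a depth image (skipping zero depths) yields the point cloud that PointCloud2 would otherwise carry over the network.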

Distortion

The images need to be undistorted using the Brown Conrady method. The distortion coefficients are:

k1	k2	p1	p2	k3
0.1387016326189041	0.0786430761218071	0.003642927622422576	0.008337009698152542	0.09094849228858948

Source: librealsense projection

Source: OpenCV, distortion coefficients
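The Brown-Conrady model maps an undistorted point (x, y) on the normalized image plane to a distorted one. Undistorting means inverting that map, which has no closed form and is usually done iteratively. This sketch assumes OpenCV's coefficient order (k1, k2, p1, p2, k3, as in the table) and normalized coordinates, i.e. after applying K^-1. The iteration follows the same fixed-point idea as OpenCV's undistortPoints, but it is not OpenCV's code, and the function names are illustrative.

```cpp
#include <array>

// Distortion coefficients in OpenCV order: k1, k2, p1, p2, k3.
using Coeffs = std::array<double, 5>;

// Brown-Conrady: maps an undistorted point (x, y) on the normalized image
// plane to its distorted position.
std::array<double, 2> distort(const Coeffs& d, double x, double y) {
    const double k1 = d[0], k2 = d[1], p1 = d[2], p2 = d[3], k3 = d[4];
    const double r2 = x * x + y * y;
    const double radial = 1 + ((k3 * r2 + k2) * r2 + k1) * r2;
    const double xd = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x);
    const double yd = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y;
    return {xd, yd};
}

// Inverts distort() by fixed-point iteration: subtract the tangential part
// estimated at the current guess, then divide out the radial factor.
std::array<double, 2> undistort(const Coeffs& d, double xd, double yd, int iterations = 20) {
    const double k1 = d[0], k2 = d[1], p1 = d[2], p2 = d[3], k3 = d[4];
    double x = xd, y = yd;  // initial guess: the distorted point itself
    for (int i = 0; i < iterations; ++i) {
        const double r2 = x * x + y * y;
        const double radial = 1 + ((k3 * r2 + k2) * r2 + k1) * r2;
        const double dx = 2 * p1 * x * y + p2 * (r2 + 2 * x * x);
        const double dy = p1 * (r2 + 2 * y * y) + 2 * p2 * x * y;
        x = (xd - dx) / radial;
        y = (yd - dy) / radial;
    }
    return {x, y};
}
```

For a depth pixel, one would apply K^-1 to get (xd, yd), undistort, and then multiply by the measured depth.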

PointCloud2

This one is an older (2015) implementation, which instead uses the PointCloud2 message for sending depth data.

PointCloud2 live data packet example:


 header:
   seq: 537
   stamp: 
     secs: 1478272396
     nsecs:  68496564
   frame_id: camera_depth_optical_frame
 height: 480
 width: 640
 fields: 
   - 
     name: x
     offset: 0
     datatype: 7
     count: 1
   - 
     name: y
     offset: 4
     datatype: 7
     count: 1
   - 
     name: z
     offset: 8
     datatype: 7
     count: 1
 is_bigendian: False
 point_step: 16
 row_step: 10240
 data: [0, 0, 192, 127, 0, 0, 192, ...]
 is_dense: False

CompressedImage live data packet example:

The message is documented here

Datatype 7 means float32

The binary data array is of type uint8 and can be read like so:


 [      x      ][       y      ][      z       ][   garbage?  ][      x       ] ...
 0, 0, 192, 127, 0, 0, 192, 127, 0, 0, 192, 127, 0, 0, 128, 63, 0, 0, 192, 127,      ...

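Each point uses 16 bytes (point_step), i.e. 16 places in the array, but only the x, y and z fields (offsets 0, 4 and 8) are declared. The last 4 bytes, labeled "garbage?", are therefore not described by any field and could possibly be removed to save bandwidth.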

Floats can be extracted like this:

float myFloat = System.BitConverter.ToSingle(mybyteArray, startIndex);
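The same extraction in C++, as a sketch: memcpy reinterprets 4 bytes as a float32, which is the C++ counterpart of BitConverter.ToSingle. Assumptions: a little-endian host (matching is_bigendian: False), the x, y, z offsets from the packet above, and that NaN marks points without a valid measurement, which is consistent with is_dense: False. Decoded this way, the bytes 0, 0, 192, 127 are a float NaN and 0, 0, 128, 63 are 1.0f. The names PointXYZ and parsePoints are illustrative.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct PointXYZ { float x, y, z; };

// Reads x, y, z (float32 at offsets 0, 4, 8) from each point_step-sized record.
// Assumes a little-endian host, matching is_bigendian: False.
std::vector<PointXYZ> parsePoints(const std::vector<std::uint8_t>& data,
                                  std::size_t pointStep, bool skipInvalid) {
    std::vector<PointXYZ> points;
    if (pointStep < 3 * sizeof(float)) return points;  // x, y, z need 12 bytes
    for (std::size_t offset = 0; offset + pointStep <= data.size(); offset += pointStep) {
        PointXYZ p;
        std::memcpy(&p.x, &data[offset + 0], sizeof(float));
        std::memcpy(&p.y, &data[offset + 4], sizeof(float));
        std::memcpy(&p.z, &data[offset + 8], sizeof(float));
        // NaN coordinates are assumed to mark pixels without a valid depth
        if (skipInvalid && (std::isnan(p.x) || std::isnan(p.y) || std::isnan(p.z))) continue;
        points.push_back(p);
    }
    return points;
}
```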

Augmented reality with the HTC Vive

Calibrating the camera

The camera was calibrated in MATLAB with a mean reprojection error of 0.1905 pixels

**Intrinsic parameters**
280.829665857757	0	303.305043312279
0	280.230119462474	233.171543364662
0	0	1

**Radial distortion**
-0.280758751984260	0.0739899523264349

The images need to be undistorted using the Brown Conrady method. The distortion coefficients are:

k1	k2	p1	p2	k3
-0.224250482220782	0.0432676652414734	0.000310839414039509	0.000696641409984896	-0.00329409354500417

where k are the radial distortion coefficients and p the tangential distortion coefficients.

The ArUco marker

ArUco is an OpenCV component for augmented reality. It can detect markers in an image and calculate each marker's rotation and position in the camera coordinate frame. This makes it possible to draw objects into the image at a location relative to the marker's pose.

The marker has a different coordinate system than the one documented in OpenCV (blue = z, red = x, green = y). The axes drawn by drawDetectedMarkers do not match the OpenCV documentation: here the red axis is Z, blue is X and green is Y.

As seen below, projecting a point cloud whose coordinate system equals the camera coordinate frame into the image using the marker's pose gives the expected result.

File:Faulty_aruco_coordinate_system.PNG

Projecting the point cloud correctly into the image

The output from estimatePoseSingleMarkers gives translation and rotation of the marker in the camera coordinate frame.

The following projection function projects 3D points into an image. The distortion coefficients must be given; otherwise the points end up in the wrong position near the image border.


   Calib3d.projectPoints(objectPoints, rvecs, tvecs, camMatrix, distCoeffs, projectedPoints);
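To see why the distortion coefficients matter near the border, the projection can be written out for a single point already expressed in the camera frame (projectPoints also applies rvecs/tvecs first, which this sketch leaves out). It assumes OpenCV's pinhole model with the coefficient order k1, k2, p1, p2, k3, and projectPoint is an illustrative name, not an OpenCV function. With the Vive camera calibration above, a point on the optical axis lands on the principal point with or without distortion, while a point at x = 1 on the normalized plane moves by roughly 50 pixels.

```cpp
#include <array>

// Pinhole projection with Brown-Conrady distortion for a point (X, Y, Z)
// already in the camera frame. d = {k1, k2, p1, p2, k3}; all zeros = no distortion.
std::array<double, 2> projectPoint(double X, double Y, double Z,
                                   double fx, double fy, double cx, double cy,
                                   const std::array<double, 5>& d) {
    const double x = X / Z, y = Y / Z;  // normalized image plane
    const double r2 = x * x + y * y;
    const double radial = 1 + d[0] * r2 + d[1] * r2 * r2 + d[4] * r2 * r2 * r2;
    const double xd = x * radial + 2 * d[2] * x * y + d[3] * (r2 + 2 * x * x);
    const double yd = y * radial + d[2] * (r2 + 2 * y * y) + 2 * d[3] * x * y;
    return {fx * xd + cx, fy * yd + cy};  // apply K
}
```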

The most efficient way of aligning the point cloud with the image is to adjust the rotation and translation vectors returned by the ArUco function estimatePoseSingleMarkers, so that the transformation accounts for the offset (both rotation and translation) from the ArUco marker to the sensor.

In this project, for simplicity, a less efficient method is used: each point in the point cloud is transformed into the ArUco coordinate system. This is a computationally heavy operation with O(n) complexity, where n is the number of points in the cloud. A typical scene has about twenty to seventy thousand points, and the transformation is computed for every frame of the video feed.

Below is code for rotating and translating in all directions.

The easiest way is to use Unity's built-in function Matrix4x4.TRS (translation, rotation, scaling) like this:


 Matrix4x4 transMat = Matrix4x4.TRS(new Vector3(sensorTranslationX /1000, sensorTranslationY /1000, sensorTranslationZ /1000), Quaternion.Euler(sensorRotationX, sensorRotationY, sensorRotationZ), new Vector3(scaleFactor, scaleFactor, scaleFactor));
 
 // Vector to be transformed
 Vector3 vector = new Vector3(1, 2, 3);
 
 // Transform it
 Vector3 transformedPoint = transMat.MultiplyPoint3x4(vector);

Code for manually creating an OpenCV transformation matrix, translating and then rotating around z, y and x, respectively. (Alias, or passive rotation where coordinate system follows during rotation.)


 // The rotation angles in radians
 float rhoRot = Mathf.Deg2Rad * sensorRotationX;
 float thetaRot = Mathf.Deg2Rad * sensorRotationY;
 float phiRot = Mathf.Deg2Rad * sensorRotationZ;
 
 // Start from the identity so the bottom row is [0 0 0 1] (new Mat does not initialize its data)
 Mat sensor2ArucoMat = Mat.eye(4, 4, CvType.CV_64FC1);
 
 // Rotation around z, y, x (in that order)
 sensor2ArucoMat.put(0, 0, Mathf.Cos(phiRot) * Mathf.Cos(thetaRot));
 sensor2ArucoMat.put(0, 1, -Mathf.Sin(phiRot) * Mathf.Cos(rhoRot) + Mathf.Cos(phiRot) * Mathf.Sin(thetaRot) * Mathf.Sin(rhoRot));
 sensor2ArucoMat.put(0, 2, Mathf.Sin(phiRot) * Mathf.Sin(rhoRot) + Mathf.Cos(phiRot) * Mathf.Sin(thetaRot) * Mathf.Cos(rhoRot));
 sensor2ArucoMat.put(1, 0, Mathf.Sin(phiRot) * Mathf.Cos(thetaRot));
 sensor2ArucoMat.put(1, 1, Mathf.Cos(phiRot) * Mathf.Cos(rhoRot) + Mathf.Sin(phiRot) * Mathf.Sin(thetaRot) * Mathf.Sin(rhoRot));
 sensor2ArucoMat.put(1, 2, -Mathf.Cos(phiRot) * Mathf.Sin(rhoRot) + Mathf.Sin(phiRot) * Mathf.Sin(thetaRot) * Mathf.Cos(rhoRot));
 sensor2ArucoMat.put(2, 0, -Mathf.Sin(thetaRot));
 sensor2ArucoMat.put(2, 1, Mathf.Cos(thetaRot) * Mathf.Sin(rhoRot));
 sensor2ArucoMat.put(2, 2, Mathf.Cos(thetaRot) * Mathf.Cos(rhoRot));
 
 // Add translation
 sensor2ArucoMat.put(0, 3, sensorTranslationX);
 sensor2ArucoMat.put(1, 3, sensorTranslationY);
 sensor2ArucoMat.put(2, 3, sensorTranslationZ);
 
 /*
 Debug.Log("Sensor2ArucoMat:");
 Debug.Log(sensor2ArucoMat.get(0, 0)[0] + " " + sensor2ArucoMat.get(0, 1)[0] + " " + sensor2ArucoMat.get(0, 2)[0] + " " + sensor2ArucoMat.get(0, 3)[0]);
 Debug.Log(sensor2ArucoMat.get(1, 0)[0] + " " + sensor2ArucoMat.get(1, 1)[0] + " " + sensor2ArucoMat.get(1, 2)[0] + " " + sensor2ArucoMat.get(1, 3)[0]);
 Debug.Log(sensor2ArucoMat.get(2, 0)[0] + " " + sensor2ArucoMat.get(2, 1)[0] + " " + sensor2ArucoMat.get(2, 2)[0] + " " + sensor2ArucoMat.get(2, 3)[0]);
 Debug.Log(sensor2ArucoMat.get(3, 0)[0] + " " + sensor2ArucoMat.get(3, 1)[0] + " " + sensor2ArucoMat.get(3, 2)[0] + " " + sensor2ArucoMat.get(3, 3)[0]);
 */
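The closed-form matrix above should equal the product Rz(phi) * Ry(theta) * Rx(rho), i.e. rotations about the fixed x, y and z axes in that order, or equivalently z, y, x about the moving axes. A small self-check in C++ that compares the hand-written entries against the explicit product can catch sign typos before they show up as a skewed point cloud. The helper names are illustrative.

```cpp
#include <array>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

Mat3 multiply(const Mat3& a, const Mat3& b) {
    Mat3 c{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

// Elementary rotations about x, y and z (angles in radians).
Mat3 rotX(double a) { return {{{1, 0, 0}, {0, std::cos(a), -std::sin(a)}, {0, std::sin(a), std::cos(a)}}}; }
Mat3 rotY(double a) { return {{{std::cos(a), 0, std::sin(a)}, {0, 1, 0}, {-std::sin(a), 0, std::cos(a)}}}; }
Mat3 rotZ(double a) { return {{{std::cos(a), -std::sin(a), 0}, {std::sin(a), std::cos(a), 0}, {0, 0, 1}}}; }

// The closed-form entries from the code above (rho, theta, phi about x, y, z).
Mat3 closedFormZYX(double rho, double theta, double phi) {
    using std::cos;
    using std::sin;
    return {{{cos(phi) * cos(theta), -sin(phi) * cos(rho) + cos(phi) * sin(theta) * sin(rho), sin(phi) * sin(rho) + cos(phi) * sin(theta) * cos(rho)},
             {sin(phi) * cos(theta), cos(phi) * cos(rho) + sin(phi) * sin(theta) * sin(rho), -cos(phi) * sin(rho) + sin(phi) * sin(theta) * cos(rho)},
             {-sin(theta), cos(theta) * sin(rho), cos(theta) * cos(rho)}}};
}
```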

Manipulating rvecs and tvecs from aruco

To be efficient, i.e. to find the correct transformation for the camera once instead of transforming each and every point, one has to adjust rvecs and tvecs. rvecs is a rotation vector (axis-angle) stored as a 1x3 matrix. Calib3d.Rodrigues can be used to convert between the 1x3 vector and the 3x3 rotation matrix.


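 Mat arucoRotMat = new Mat(3, 3, CvType.CV_64FC1);
 Mat rvecsAdjusted = new Mat(1, 3, CvType.CV_64FC1);
 
 // Convert the 1x3 rotation vector to a 3x3 rotation matrix
 Calib3d.Rodrigues(rvecs, arucoRotMat);
 
 // Do the rotation: result = arucoRotMat * aruco2SensorMat
 // aruco2SensorMat is 3x3 and transforms from the ArUco coordinate system to the sensor's
 // (this is what has to be implemented for increased efficiency).
 // Core.gemm computes the matrix product (alpha = 1, no added term).
 Mat result = new Mat();
 Core.gemm(arucoRotMat, aruco2SensorMat, 1, new Mat(), 0, result);
 
 // Convert back to a 1x3 rotation vector
 Calib3d.Rodrigues(result, rvecsAdjusted);
 
 // The translation must be adjusted as well (tvecs + arucoRotMat * the sensor's offset
 // in ArUco coordinates), otherwise only the orientation of the point cloud is corrected.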
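The code above only adjusts the rotation. If the sensor is also offset from the marker, the translation has to be updated too. Writing the marker pose as (R_m, t_m) and the fixed offset as (R_o, t_o), where (R_o, t_o) is assumed to map sensor-frame points into the marker frame, a point p maps to the camera frame as R_m (R_o p + t_o) + t_m. The adjusted pose is therefore R = R_m R_o and t = R_m t_o + t_m. Below is a dependency-free C++ sketch of that composition, including the Rodrigues conversion that Calib3d.Rodrigues performs. The function names are illustrative.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Rodrigues: rotation vector (axis * angle, radians) -> 3x3 rotation matrix.
Mat3 rodrigues(const Vec3& r) {
    const double theta = std::sqrt(r[0] * r[0] + r[1] * r[1] + r[2] * r[2]);
    Mat3 R{{{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}};
    if (theta < 1e-12) return R;  // no rotation
    const double kx = r[0] / theta, ky = r[1] / theta, kz = r[2] / theta;
    const double c = std::cos(theta), s = std::sin(theta), t = 1 - c;
    R = {{{t * kx * kx + c, t * kx * ky - s * kz, t * kx * kz + s * ky},
          {t * kx * ky + s * kz, t * ky * ky + c, t * ky * kz - s * kx},
          {t * kx * kz - s * ky, t * ky * kz + s * kx, t * kz * kz + c}}};
    return R;
}

// Composes the marker pose (Rm, tm) with the offset (Ro, to):
// R = Rm * Ro, t = Rm * to + tm. Outputs must not alias the inputs.
void composePose(const Mat3& Rm, const Vec3& tm, const Mat3& Ro, const Vec3& to,
                 Mat3& R, Vec3& t) {
    for (int i = 0; i < 3; ++i) {
        t[i] = tm[i];
        for (int j = 0; j < 3; ++j) {
            R[i][j] = 0;
            for (int k = 0; k < 3; ++k) R[i][j] += Rm[i][k] * Ro[k][j];
            t[i] += Rm[i][j] * to[j];
        }
    }
}
```

Because this runs once per frame instead of once per point, it removes the O(n) transformation from the per-frame cost.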

Stabilization

The pose from the marker should be stabilized using a weighted average that favors the newest poses:

µ = Sum(wi * xi) / Sum(wi), where wi are the weights and xi are the samples
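A sketch of the formula above for one pose component, assuming a fixed window and linearly increasing weights. Both the window length and the linear weighting are choices for illustration, not values from this project, and the class name is hypothetical. The filter applies directly to the translation. Averaging rotation vectors component by component is only a reasonable approximation when consecutive rotations are close to each other and far from the 180 degree wrap-around.

```cpp
#include <cstddef>
#include <deque>

// Weighted moving average over the last `window` samples of one pose
// component; weights grow linearly so the newest sample counts the most.
class WeightedPoseFilter {
public:
    explicit WeightedPoseFilter(std::size_t window) : window_(window == 0 ? 1 : window) {}

    // Adds a sample and returns mu = sum(w_i * x_i) / sum(w_i).
    double add(double x) {
        samples_.push_back(x);
        if (samples_.size() > window_) samples_.pop_front();
        double weightedSum = 0.0, weightSum = 0.0;
        for (std::size_t i = 0; i < samples_.size(); ++i) {
            double w = static_cast<double>(i + 1);  // oldest gets 1, newest gets size()
            weightedSum += w * samples_[i];
            weightSum += w;
        }
        return weightedSum / weightSum;
    }

private:
    std::size_t window_;
    std::deque<double> samples_;
};
```

A longer window gives a steadier overlay at the cost of more lag when the camera moves.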

Plan

Visualizing sensor data in Unity

The point cloud is extracted from ROS through rosbridge. The data format is JSON and the transport is WebSockets. Since the point clouds are data intensive, Unity will need an asynchronous thread to handle the incoming data, to avoid frame drops due to latency.
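The asynchronous hand-off can be sketched as a single-slot "latest frame" mailbox: the thread receiving rosbridge messages writes into it, and the render loop reads from it without ever waiting on the network. This is a C++ sketch of the pattern, assuming that only the newest cloud matters. In Unity the same idea would be written in C# (e.g. with a lock statement), and the LatestFrame name is hypothetical.

```cpp
#include <mutex>
#include <utility>
#include <vector>

// Single-slot mailbox: the network thread overwrites the slot with the newest
// frame, and the render loop takes it at most once per rendered frame. Unread
// frames are replaced rather than queued, so latency cannot build up.
template <typename Frame>
class LatestFrame {
public:
    // Called from the network (receiving) thread.
    void put(Frame frame) {
        std::lock_guard<std::mutex> lock(mutex_);
        slot_ = std::move(frame);
        hasFrame_ = true;
    }

    // Called from the render loop; returns false if nothing new has arrived.
    bool take(Frame& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!hasFrame_) return false;
        out = std::move(slot_);
        hasFrame_ = false;
        return true;
    }

private:
    std::mutex mutex_;
    Frame slot_{};
    bool hasFrame_ = false;
};

// Example frame type: a flat array of x, y, z coordinates.
using PointCloudFrame = std::vector<float>;
```

Dropping stale frames instead of queuing them is what keeps the delay stable. A growing queue would reproduce the increasing delay observed with PointCloud2.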

Articles

http://gbib.siggraph.org/ - Library from SIGGRAPH - Computer graphics conference

Virtual environments, 17 articles - frontiers [1]

Challenges in virtual environments (2014) [2]

Virtual reality

Multi-sensory feedback techniques, user studies [3]

Telepresence: Immersion with the iCub Humanoid Robot and the Oculus Rift [4]

Virtual reality simulator for robotics learning [5]

Augmented reality

User testing on an AR system: Instruction for Object Assembly based on Markerless Tracking

Evaluating human-computer interfaces in general: Evaluation of human-computer interface for optical see-through augmented reality system

Sensor Data Visualization in Outdoor Augmented Reality

Discusses the challenges with displays in lit environments [6]

Projected AR onto a dashboard vs. a normal dashboard: User study on augmented reality control panel

Technical

Integration between Unity and ROS [7]

Unity: ROSBridgeLib [8]

Theses

Teleoperation + visualization with Oculus [9]

User studies on HRI
Bandwidth considerations

Remote Operations of IRB140 with Oculus Rift [10]

Controlling a robot arm with a stereo camera and the Oculus

Books

Computer vision [11]

Internet

Camera matrix

User study

Useful links:

Designing the study

In order to test whether AR improves user understanding, a test comparing the two alternative ways of gaining knowledge has to be designed:

The old, current way: The user switches between looking at the robot in its surroundings and at the monitor to examine the sensor data.
The AR way: The user looks at the robot in its surroundings, with the data overlaid.

In the AR system, the point cloud overlay typically does not give the user more information about the scene itself. The points are simply placed on top of objects the user could see even without the glasses. It does, however, give information about what the robot sees and what its decisions are based on.

Setups that ask the user what he/she thinks the robot will do in a given situation are hard, since we would have to "say" what the algorithm would do; as of now, there is no time to implement an algorithm on the system. If this can be done without turning the experiment into a quasi-experiment, it might be a good approach.

AR only experiments

In these types of experiments the AR system is not tested in comparison with the old system.

In experiments where only combinations of visual quality and point cloud resolution are compared, the user might always favor the video feed without the point cloud, because the points can cluster together into a pink smudge that completely obscures the object behind.

Possible setup: Impaired vision, with objects hidden from the user behind other objects, but visible to the robot.

The problem with these types of experiments is that we are evaluating the AR system and finding which combination of settings works best, but we do not show that AR is better than the old way.

Experiments where the classic method is compared with the AR method

This type of experiment might be a better option: if we can find a test where the users get a score (how correctly they solve the task and/or the time used), we can measure how well the new system performs relative to the old one.

Key things:

The experiments should test whether the users gain perspective with the AR system.
The experiments should test the users' ability to understand what each point in the cloud corresponds to in the real world, i.e. where it belongs in the scene.
The tests should not be solvable just by looking at the scene in front of the robot.

Possible experiments:

Use an algorithm to point out an object; for simplicity, this can be a random object.
Make the user identify objects that are made invisible in the point cloud (mask out areas in the point cloud, preferably objects found by object detection).
- Option 1: Use the mocap lab to get the positions of the objects in the scene, and send the coordinates to Unity for the masking.
- Option 2: Mask out rectangles. This option is easier.

[12]


 #include <ros/ros.h>
 #include <image_transport/image_transport.h>
 #include <opencv2/highgui/highgui.hpp>
 #include <opencv2/imgproc/imgproc.hpp>
 #include <cv_bridge/cv_bridge.h>
 #include <ros/console.h>
 
 
 image_transport::Publisher pub;
 
 void imageCallback(const sensor_msgs::ImageConstPtr& msg)
 {
   try
   {
     //cv::imshow("view", cv_bridge::toCvShare(msg, "bgr8")->image);
     //cv::imshow("view", cv_bridge::toCvShare(msg, "mono16")->image);
 
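     // Copy the image; toCvCopy keeps the source encoding (16UC1 for raw depth)
     cv_bridge::CvImagePtr cv_ptr = cv_bridge::toCvCopy(msg);
 
     // Mask out a filled rectangle by setting its depth to 0, which depth consumers typically treat as "no measurement"
     cv::rectangle(cv_ptr->image, cv::Point(300, 100), cv::Point(350, 150), cv::Scalar(0, 0, 0), -1);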
     
     sensor_msgs::ImagePtr newMsg = cv_ptr->toImageMsg();
 
     pub.publish(newMsg);
 
     // cv::waitKey(30);
   }
   catch (cv_bridge::Exception& e)
   {
     ROS_ERROR("cv_bridge exception for encoding '%s': %s", msg->encoding.c_str(), e.what());
   }
 }
 
 int main(int argc, char **argv)
 {
   ros::init(argc, argv, "image_listener");
   ros::NodeHandle nh;
  
   //cv::namedWindow("view");
   //cv::startWindowThread();
   image_transport::ImageTransport it(nh);
 
   // Set up publish
   pub = it.advertise("camera/hacked_image_raw", 1);
 
   image_transport::Subscriber sub = it.subscribe("camera/depth/image_raw", 1, imageCallback);
 
   ros::spin();
   //cv::destroyWindow("view");
 }

NOTE: Glass objects are a somewhat special case: the user will simply point them out in the scene, since they are identifiable without the point cloud.

Similar user studies

User interfaces in VR (Unity tutorial)

Building overlays on archaeological sites

Short subjective study on how it was to use the system

AR for understanding 3D geometry in engineering class Better link?

Nice user study with multiple choice test.
Both objective automatic data collection and a subjective questionnaire
Nice sources, lots on education and 3D models

Virtual and Augmented Reality as Spatial Ability Training Tools (https://pdfs.semanticscholar.org/86f7/e2f92b9b9c42d59c125a324bd0970a00897b.pdf)

First experiment - How effective is Augmented Reality compared to traditional methods?

Motivation/abstract:

Being a robot engineer can be a tough job. When our algorithms behave in unexpected ways, it can be difficult to understand why. Why did the robot crash into the wall? Why did it fail to pick up a ball, or choose the longest path to the goal? To find the problem, the engineer has to go through every part of the system, from the data input to the output. Sometimes the robot gets insufficient, or even false, information about its environment. In those cases the problem does not come from a bug in the code, but from the sensors. It is therefore important to verify the sensor data input. Unfortunately, this is not always a simple task, as examining point clouds in a black void on a screen can be quite confusing.

Augmented reality makes it possible to connect virtual objects, such as sensor data, with the real objects they represent.

This thesis presents a way of visualizing sensor data with augmented reality. It does so by augmenting reality with the sensor data, overlaying the real world with the point cloud from the sensor. This makes it easy to verify that the points are in fact valid.

Goal: Create a color graph displaying visualization effectiveness as a function of video stream quality and point cloud resolution.

Some object setups and materials are hard for the different types of sensors to detect, e.g. for the RealSense. We wish to examine to what degree AR helps the developer understand what the sensor is sensing, as well as the robot's size and maneuverability in relation to the scene.

Procedure: We will test different combinations of video quality and pointcloud resolution and determine where the lower boundary of useful visualization lies.

Present a hypothesis prior to the experiments.

The setups will range from sparse point cloud and low visual quality to full AR, testing 16 combinations to create a 4x4 color graph.

Common for all experiments:

Points on the table surface and in the background can make it harder for the user to identify objects in the scene. Cropping the point clouds to remove these points could make it easier for the user to distinguish the objects from the background.

Record all subjects on video. Make a consent form like this. Ditch the end date.

Even though there is a time limit for completing the task in each scene, the time used should be recorded. This makes it possible to distinguish between correct answers, since several combinations of image/point cloud quality can result in correct classifications/counts.

Sensor position/angle relative to the scene: tilted? Make sure it is consistent.

User position: Limited to sitting in a chair, but the user can lean around to see better.

Measure user movement in the chair (low, medium, high), classified from the videos. This might be an interesting measure, as this is a dynamic system meant to be used while moving around the room.

Evaluating the subject's answers: A binary measure is not good, as it labels a 99% correct answer as a failure. It is therefore probably better to give scores.

Experiment 1 - Identifying missing objects in the point cloud

CHANGED TO IDENTIFY THE OBJECTS PRESENT IN THE CLOUD, as this is more intuitive and easy for the user.

The user identifies objects in front of the robot that are missing from the point cloud.

The scenes consist of black plastic plates with different numbers of objects on them. Each plate is placed on a raised bar stool. Both the plates and the stool are invisible to the sensor because of their black finish.

The sensor is placed on a tripod 20 cm above the scene, 40 cm to the side, angled 20 degrees towards the center of the scene.

The objects are white cubes of size 4x4x4 cm. On the side facing the user, an identifying number is written.

The test:

The tests are split into two difficulty groups: Easy (1-3 cubes) and hard (4-6 cubes).
The cubes are scattered around in the scenes
The user should see all scenes in both AR and the traditional view, but in a randomized order (randomizer)
Each scene has a random number of boxes missing from the cloud. Identify the missing boxes.

Before the experiment:

Let the user try AR on the test scene
Let the user navigate the point cloud in Unity

Before each test:

The user should not be able to see the scene before each test starts. Use a blindfold.
AR: The user is shown a waiting image before the test starts

Metrics:

Score. Measured by the number of correctly identified missing objects in the cloud.
Time spent

Rules:

The user can not leave the chair, but is free to lean around in the chair.
Should the user be allowed to interact with the sensor?

Randomization in JavaScript (Fisher-Yates shuffle):


 function shuffle(array) {
   var currentIndex = array.length
     , temporaryValue
     , randomIndex
     ;
 
   // While there remain elements to shuffle...
   while (0 !== currentIndex) {
 
     // Pick a remaining element...
     randomIndex = Math.floor(Math.random() * currentIndex);
     currentIndex -= 1;
 
     // And swap it with the current element.
     temporaryValue = array[currentIndex];
     array[currentIndex] = array[randomIndex];
     array[randomIndex] = temporaryValue;
   }
 
   return array;
 }
 
 var list = ["AR - 1", "AR - 2", "AR - 3", "AR - 4", "AR - 5", "AR - 6", "Trad - 1", "Trad - 2", "Trad - 3", "Trad - 4", "Trad - 5", "Trad - 6"]
 for (var i = 0; i < 50; i++) console.log(shuffle(list.slice()))

Cubes are placed in the scene. The user needs to be able to indicate which cube he/she means, so colors and numbers were applied. Numbers work best (must not face the camera; visible in the point cloud).

Discussion: In the design process of this experiment there were a lot of questions. How many scenes to use? What kind of objects are best? Should all objects be the same? User difficulty as a factor, where the difficulty scales with the number of objects?

In the analysis, we went for two groups, easy and hard: easy includes scenes with 1-3 cubes, and hard consists of scenes with 4-6 cubes.

In the case where all cubes are visible, the user can simply count the cubes (counting the cubes is likely the first thing the user does to solve the tasks in the traditional way). In the case where no cubes are visible, the user knows immediately. We do not want tasks that are too easy to solve.

Edge cases: We define edge cases as scenes where all or no cubes are visible. The number of cubes to mask out was initially random, which produced edge cases. Our hypothesis was that the users' traditional way of solving an edge case was simply to count the cubes in the scene and on the screen, while in AR they can simply look at the scene. We wish to test the user's understanding of the relationship between the point cloud and the real world; however, this is not tested efficiently when the users simply count the objects.

By excluding the edge cases, we believe the experiments become more equal. However, it should be noted that this might affect the results of the analysis, since edge cases are most likely much easier to solve in the traditional way than non-edge cases.

Another consequence of removing edge cases is that scene 1 had to be excluded, since it will always be an edge case.
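The exclusion rule can be sketched as drawing the number of masked cubes uniformly from 1 to n - 1 instead of 0 to n. This also shows why a scene with a single cube is always an edge case: the range is empty. The uniform draw and the function name are assumptions for illustration.

```cpp
#include <random>

// Draws how many of the n cubes in a scene to mask out of the point cloud,
// excluding the edge cases 0 (all visible) and n (none visible).
// Returns -1 when no non-edge choice exists (n < 2), e.g. a one-cube scene.
int drawMaskedCount(int n, std::mt19937& rng) {
    if (n < 2) return -1;
    std::uniform_int_distribution<int> dist(1, n - 1);
    return dist(rng);
}
```

Seeding the generator once per participant (and logging the seed) would make each session's masking reproducible for the analysis.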

Pilot study: What did we learn from the pilot study?

Edge cases. Plan, prepare participants for what will happen. Explain the view in rviz.

Experiment 2 - Counting objects

5-30 objects in each scene (number chosen to avoid using too much time counting)
The boxes containing the highest number of objects are designed to be nearly impossible for the subject to count correctly within the time limit.

The objects should be put in labeled boxes, each containing a set number of objects.

Option 1: Different numbers of objects are in labeled boxes. Advantages: Switching between the scenes is easy, as each scene is a box. The same boxes can be reused; six boxes might be enough.

Option 2: The objects are spread on the table. Advantage: No background from the sides of the box is visible. Disadvantage: Switching between the scenes takes longer.

Both options require labeled boxes with pre-counted objects to make the experiments more efficient.

Use only one kind of object, or different ones?
Possible objects: Match boxes, dice, nuts, bolts
Put objects on top of each other, so the point cloud will look like 1 object, but AR will reveal the true number of objects. By doing this the scenes

Experiment 3 - Identification of different objects

Option 1: Binary experiment with only one object in each scene. This gives a binary measure.

Option 2: Multiple objects, ranging from 4-10. This requires more different objects. This does not give a binary measure.

Other experiments

Experiment 4: Find the shortest valid path for the robot through an obstacle track

Experiment 5: Classify track as maneuverable for the robot. Scene with glass pane obstacle. Scene with too tight gap for the robot.

Experiment 6: Time spent on finding the sensor position and rotation in the ArUco coordinate system. Measure the position and rotation error (Euclidean distance and rotation about x, y and z).

This does not answer the main question of the thesis very well, but could maybe be used as a bonus experiment.

It can be hard to verify that the sensor actually has the right position and rotation in the robot setup through rviz.

Show picture of how it is supposed to be, from three angles.
Traditional method: Measure with ruler and guess angle.
AR method: Just try until it aligns.
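A sketch of the two error measures for Experiment 6, assuming the positions share one unit and the rotations are given as x, y, z angles in degrees from the same Euler convention. The wrap-around handling is needed so that, for example, 359 and 1 degrees count as 2 degrees apart rather than 358. Per-axis Euler differences are easy to report but depend on the chosen convention; the single angle of the relative rotation would be a convention-independent alternative. The function names are illustrative.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;

// Euclidean distance between the estimated and the true sensor position.
double positionError(const Vec3& estimated, const Vec3& truth) {
    double sum = 0.0;
    for (int i = 0; i < 3; ++i) sum += (estimated[i] - truth[i]) * (estimated[i] - truth[i]);
    return std::sqrt(sum);
}

// Absolute angle difference in degrees, wrapped into [0, 180].
double angleErrorDeg(double estimated, double truth) {
    double d = std::fmod(std::fabs(estimated - truth), 360.0);
    return d > 180.0 ? 360.0 - d : d;
}

// Per-axis rotation error (x, y, z), in degrees.
Vec3 rotationErrorDeg(const Vec3& estimatedDeg, const Vec3& truthDeg) {
    return {angleErrorDeg(estimatedDeg[0], truthDeg[0]),
            angleErrorDeg(estimatedDeg[1], truthDeg[1]),
            angleErrorDeg(estimatedDeg[2], truthDeg[2])};
}
```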

Technical

Make the screen black between each experiment so the user is unable to see the scene as it is being set up.

Maybe use the mocap lab to set up all scenes, to improve consistency. Manually setting up each scene introduces room for error and takes more time.

Ideas

Use a scene with objects that are difficult for the RealSense to detect (glass), as this represents a typical problem for a robot developer. However, this might not prove anything very useful, as the test only verifies that the AR system is effective in this edge case.

User:Mathiact