User:Mathiact
From Robin
Revision as of 18:26, 13 February 2017
Master thesis
Possible projects
Visualization
Goals:
- Understand the robot's algorithms better by visualizing them while they run.
- Make it easier to see the effects of parameter tuning
Things:
- Compare different models
- Use the actual camera stream and overlay generated 3D models
I consider visualization through VR somewhat unnecessary, since VR will most likely not improve understanding much. 2D works perfectly well here. To find out whether the goal has been reached, the system must be user tested to see whether users find the algorithms easier to understand. This can be a challenge, since it requires both enough users (20?) and a well-designed survey.
Remote control
Goal: Solve a task with the robot more easily with the help of VR. Show that it is easier to spot obstacles or to control a robot arm with VR.
The challenge here is building a robot arm. Testing how effective the system is is easier than for visualization, since one can measure the time it takes to solve the task, or the accuracy with which it was performed. It may be an idea to use the Vive's hand controllers to control the robot arm.
Visualization through AR
Use AR, ideally through glasses, to see the robot's algorithms visualized. This requires both good AR glasses (Microsoft HoloLens or the HTC Vive camera) and a lot of computing power. The user's exact pose (position and orientation relative to the robot) must be kept up to date. I imagine this project can be solved in 2D on a screen first, then in VR, and then ported to AR.
This idea gets me very hyped and is the one I want to go for.
Goals:
- Build a system that makes it easier to understand the robot's algorithms by visualizing them while they run
- Make it easier to see the effects of parameter tuning
Challenges:
- I do not know how well AR works with the Vive. It is not a stereo camera, which is a bit of a shame. On the other hand, an iPad, which is also an alternative, would be mono as well.
- Finding metrics for how good the result is.
Visualization of sensory data and algorithms through augmented reality
Metrics
- How much better does the user understand how the robot works?
Things
- Low latency on head mounted displays is very important for user experience.
Questionnaire
Rate traditional visualization (gazebo) 1-5 vs. AR system
- The difficulty of understanding the robot's position in its environment (easy to hard)
- The difficulty of understanding the robot's orientation in its environment (easy to hard)
Ideas
Jørgen's idea: The traditional setup of a multi-robot/sensor system in rviz is a huge grid, which is inefficient and makes it hard to get an overview. A VR system makes this easier. Do not present the 3D world in rviz as an option; compare raw sensory data and the VR environment instead.
Must find a specific task that is hard for the user, but becomes solvable in a mixed reality environment.
Find an algorithm that uses sensory data from one or two sensors. Make it fail. Find an example that is easy to see in VR, but hard on a screen.
Meeting with Tønnes, 31 January
What does AR do? AR makes it possible to connect virtual objects with real objects. This makes the virtual objects easier, or even possible at all, for a human to understand.
Use a segmentation algorithm which fails to segment two overlapping objects in the scene. Watching the point cloud in rviz while adjusting the parameters/angle and following the algorithm output is hard. Watching it in AR makes it intuitive. The algorithm output can be visualized by coloring the different segments and showing the parameters overlaid.
Meeting with Kyrre and Tønnes, 2 February
Timing a task is easier to measure.
Find an object in a point cloud. Measure the time. Look at different point clouds.
Idea: Measure the time it takes to identify an object in a point cloud with a limited number of points. Ask the user to find the sweet spot with respect to the number of points, where lower is better.
Show that VR is x% more effective than 3D through rviz.
VR as the first experiment and AR as the second experiment?
Compare VR against AR?
Compare the different implementations with each other?
First experiment - How effective is Augmented Reality compared to traditional methods?
Goal: Create a color graph displaying visualization effectiveness as a function of video stream quality and point cloud resolution.
Some object setups and materials are hard to detect for the realsense. We wish to examine to what degree AR helps the developer understand what the sensor is sensing.
Procedure: We will test different combinations of video quality and point cloud resolution and determine where the lower boundary of useful visualization lies.
Quality measure: The user counts the number of objects in a scene.
The setups will range from no AR (point cloud only) to full AR.
Present a hypothesis prior to the experiment.
Ideas
- Use a scene with objects difficult for the realsense to detect (glass), as this represents a typical problem for a robot developer.
Schedule
Implementation
Setup
Ros computer running Ubuntu 16.04:
- Rosbridge server running on port 9090
- realsense_camera publishing raw depth images at 640x240, 30 fps. The frame rate is hacked down to 10 fps by setting the dynamic parameter f200_motion_range to 100
Visualization computer running Windows 10
- HTC Vive connected
- SteamVR installed
- Unity project UnityRosSensorVisualizer, listening through rosbridgelib on local network (UDP)
Notes:
UDP was blocked on the local network, so a third-party service called ngrok was tried, which sends the data over HTTP on port 80 instead. In testing, no frame rate greater than 1 fps was achieved. Latency introduced by the third-party server was a possible cause, but that turned out not to be the case, since switching to a wired connection between the two computers did not reduce the delay.
Using a wired connection:
Linux PC: Use ifconfig to make sure it has an IPv4 address on the wired connection. If not, set one up, e.g. IP address 10.0.0.100 with subnet mask 255.255.255.0
Windows PC: Do the same, but pick another IP address, like 10.0.0.200
Now the two computers can reach each other on the local network using these IP addresses.
Ros -> Unity
Data from ROS can be sent over websockets with RosBridge
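As a rough illustration of what travels over that websocket, the sketch below builds a rosbridge `subscribe` request as JSON. The topic name `/camera/depth/image_raw` and the throttle value are assumptions chosen for the example (the notes only name the `compressedDepth` variant of the topic), and the helper `make_subscribe_message` is hypothetical, not part of ROSBridgeLib.

```python
import json


def make_subscribe_message(topic, msg_type, throttle_ms=0, queue_length=1):
    """Build a rosbridge 'subscribe' request as a JSON string.

    throttle_ms is the minimum time between messages in milliseconds,
    queue_length is how many messages the server may buffer for this client.
    """
    return json.dumps({
        "op": "subscribe",
        "topic": topic,
        "type": msg_type,
        "throttle_rate": throttle_ms,
        "queue_length": queue_length,
    })


if __name__ == "__main__":
    # Assumed topic name; 100 ms throttle corresponds to at most 10 messages/s.
    print(make_subscribe_message("/camera/depth/image_raw",
                                 "sensor_msgs/Image",
                                 throttle_ms=100))
```

The resulting string would be sent as a text frame to the rosbridge server (port 9090 in the setup above). A queue length of 1 asks the server to keep only the newest message instead of building up a backlog.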
Similar projects
ARSEA
ARSEA - Augmented Reality Subsea Exploration Assistant
This project includes code for writing point clouds to file for testing, in the folder /PointCloudManager. It does not read the points from the socket, but the infrastructure for doing so is there.
Turtlesim example
Example of getting turtlesim data into Unity
Github projects
Sensor
RealSense
The ROS package realsense_camera features support for the F200 camera (Creative).
Installation
Clone librealsense and follow the installation guide. No need for point 4 or 5, as we put a symlink to librealsense in our catkin_ws/src dir and build with catkin.
Then clone realsense and put a symlink to the inner folder /realsense_camera in the catkin_ws/src dir. Build everything.
How to run
roslaunch realsense_camera f200_nodelet_default.launch
Example modified launch file: File:F200 nodelet modify params.xml
Into Unity
There are multiple ways to get data from this sensor. The frame rate of the depth sensor is stuck at 30 fps, making the point clouds too heavy (300 MB/s). See table below.
Bandwidth @ 640x240 resolution
Message type | Bandwidth (MB/s) @ 30 fps | Bandwidth (MB/s) @ 10 fps |
---|---|---|
PointCloud2 | 148 | 46 |
Image | 9.3 | 2.85 |
CompressedImage | 0.45 | 0.2 |
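For a rough sanity check of these magnitudes, the raw payload size can be estimated from resolution, bytes per element and frame rate. The sketch below is payload-only arithmetic under stated assumptions: 16 bytes per point (the point_step of the 640x480 PointCloud2 packet shown further down) and 2 bytes per pixel for a 16-bit depth image. It ignores message headers and any JSON or base64 overhead, so it is not meant to reproduce the measured values exactly.

```python
def payload_bandwidth_mb_s(width, height, bytes_per_element, fps):
    """Raw payload bandwidth in MB/s (1 MB = 1e6 bytes), headers ignored."""
    return width * height * bytes_per_element * fps / 1e6


# Point cloud with 16 bytes per point at 640x480 and 30 fps.
pointcloud_mb_s = payload_bandwidth_mb_s(640, 480, 16, 30)

# 16-bit depth image (2 bytes per pixel) at 640x240 and 30 fps.
image_mb_s = payload_bandwidth_mb_s(640, 240, 2, 30)

if __name__ == "__main__":
    print(f"PointCloud2 payload: {pointcloud_mb_s:.1f} MB/s")
    print(f"Image payload:       {image_mb_s:.1f} MB/s")
```

The estimate makes the main point visible: a point cloud carries 16 bytes per sample where a depth image carries 2, so sending depth images and deprojecting on the receiving side cuts the payload by roughly a factor of eight before any compression.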
Setting the resolution to 640x240 helps (the default is 640x480). PointCloud2 still lags too much: 1 fps with an increasing delay. The raw image gives 2 fps with a short, stable delay, which is acceptable.
When depth images are used, the point cloud has to be generated manually, see below.
The frame rate can be hacked down to 10 fps by setting the dynamic parameter f200_motion_range to 100 (default is 0). This forces longer exposure.
Measured Unity frame rates (fps) @ 640x480 resolution
Setup | Message type | 30 fps | 10 fps |
---|---|---|---|
Just network | PointCloud2 | 1.7 | 1.7 |
Just network | Image | 25 | 8.6 |
With point cloud rendering | PointCloud2 | 1.6 | 1.7 |
With point cloud rendering | Image | 8 | 8 |
We observe that the network is the main bottleneck here. The maximum frame rate from the PointCloud2 topic is 1.7 fps. The Image topic is much lighter, and even with the required post-processing (deprojection) in Unity, the Image topic is more than four times faster.
The format of the depth images is 16UC1, which comes from OpenCV: uint16, one channel.
CompressedImage
The first approach is the official one, which uses the CompressedImage message for sending depth data.
The data is published on the topic /camera/depth/image_raw/compressedDepth. I have not been able to decompress this in Unity using Texture2D, as the image is read as RGBA24. ROS says it is supposed to be 16UC1, and the ROS package hints, judging by its settings, that it is PNG.
Update: After two weeks, I finally managed to get this topic into Unity. The first 12 bytes of the data packet are junk and have to be removed before the PNG can be decoded. However, if a compressed image is used, a point cloud has to be generated from it. This involves using the camera matrix to map each pixel to its corresponding ray through the lens into 3D space.
Update 8 December 2016: I have not managed to get the PNG images into Unity with more than five gray levels: 0.000, 0.004, 0.008, 0.012 and 0.016.
The function System.Convert.FromBase64String(string) is used for converting the data from JSONNode to a byte array. This is the only way as far as I know. This conversion might be the reason for the loss of gray levels.
Another interesting detail is that the gray levels are exactly 0/256 = 0, 1/256 = 0.004, 2/256 = 0.008, 3/256 = 0.012 and 4/256 = 0.016.
After the conversion, which most likely is the issue here, the byte array can be loaded into a texture by using new Texture2D(640, 480).LoadImage(array); The parameters to the texture's constructor (like format, mipmap and linear) do not matter, since LoadImage decides for itself.
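The decoding steps described above (base64 to bytes, drop the 12 leading bytes, hand the rest to a PNG decoder) can be sketched outside Unity as follows. This is a minimal sketch, not the Unity code: the helper name `extract_png` is hypothetical, and the signature check is an added safety measure that only relies on the standard 8-byte PNG file signature.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # standard 8-byte PNG file signature
HEADER_BYTES = 12  # junk prefix reported in the notes above


def extract_png(b64_data, header_bytes=HEADER_BYTES):
    """Decode a base64 payload and strip the leading header bytes.

    Raises ValueError if what remains does not start with a PNG signature,
    which would indicate a different header length or a non-PNG payload.
    """
    raw = base64.b64decode(b64_data)
    png = raw[header_bytes:]
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("payload does not start with a PNG signature")
    return png


if __name__ == "__main__":
    # Fake payload for illustration: 12 junk bytes followed by a PNG signature.
    fake = base64.b64encode(b"\x00" * 12 + PNG_SIGNATURE + b"...").decode()
    print(extract_png(fake)[:8])
```

Checking the signature after stripping is a cheap way to confirm that the header length is right before blaming the image decoder for a failed load.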
Mapping the depth image to 3D points
Camera parameters:
D: [0.1387016326189041, 0.0786430761218071, 0.003642927622422576, 0.008337009698152542, 0.09094849228858948]
K: [478.3880920410156, 0.0, 314.0879211425781, 0.0, 478.38800048828125, 246.01443481445312, 0.0, 0.0, 1.0]
R: [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
P: [478.3880920410156, 0.0, 314.0879211425781, 0.025700006633996964, 0.0, 478.38800048828125, 246.01443481445312, 0.0012667370028793812, 0.0, 0.0, 1.0, 0.003814664203673601]
K is the calibration matrix. The calibration matrix maps points on the normalized image plane [x, y] to the pixel coordinates [u, v]: u = K * x. The matrix is 3x3 because it is a homogeneous transformation. This transformation is the intrinsic part of the whole perspective camera model.
To calculate the 3D point in space, one must solve for x: u = K * x => x = K^-1 * u.
Since coordinates on the normalized image plane are homogeneous, we can multiply them by our depth measure (z) to extend the length of the vector to the appropriate point in 3D space.
The normalized point is [x/z, y/z, 1], which gives the following result: [x, y, z] = K^-1 * u * z
K^-1:
0.00209035 | 0 | -0.656555 |
0 | 0.00209035 | -0.514257 |
0 | 0 | 1 |
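Because K only contains the focal lengths and the principal point, multiplying by K^-1 reduces to two subtractions and two divisions per pixel. The sketch below applies [x, y, z] = K^-1 * u * z using the values from the K array above. The function name `deproject` is hypothetical, and the depth is assumed to already be in the desired unit (the notes do not state the unit of the raw depth values).

```python
# Intrinsics taken from the K array above: fx, fy on the diagonal,
# cx, cy in the last column.
FX, FY = 478.3880920410156, 478.38800048828125
CX, CY = 314.0879211425781, 246.01443481445312


def deproject(u, v, z):
    """Map pixel (u, v) with depth z to a 3D point in the camera frame.

    Equivalent to [x, y, z] = K^-1 * [u, v, 1] * z for a K with zero skew.
    """
    x = (u - CX) / FX * z
    y = (v - CY) / FY * z
    return (x, y, z)


if __name__ == "__main__":
    # The principal point lies on the optical axis, so x and y are zero.
    print(deproject(CX, CY, 1.5))
    # The top-left pixel at the same depth.
    print(deproject(0, 0, 1.5))
```

The entries of the K^-1 table follow directly: 1/fx is about 0.00209035, -cx/fx is about -0.656555 and -cy/fy is about -0.514257.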
Distortion
The images need to be undistorted using the Brown-Conrady method. The distortion coefficients are:
k1 | k2 | p1 | p2 | k3 |
---|---|---|---|---|
0.1387016326189041 | 0.0786430761218071 | 0.003642927622422576 | 0.008337009698152542 | 0.09094849228858948 |
Source Librealsense projection
Source OpenCV, distortion coefficients
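To make the role of the five coefficients concrete, here is a minimal sketch of the Brown-Conrady model in the form used by the OpenCV documentation, operating on normalized image coordinates. `distort` applies the model and `undistort` inverts it by fixed-point iteration. Both function names are hypothetical, and whether the model should be applied in the forward or the inverse direction for this particular depth stream is an open assumption that the notes do not settle.

```python
# Coefficients in OpenCV order (k1, k2, p1, p2, k3), from the table above.
COEFFS = (0.1387016326189041, 0.0786430761218071, 0.003642927622422576,
          0.008337009698152542, 0.09094849228858948)


def _terms(x, y, coeffs):
    """Return the radial factor and the tangential offsets at (x, y)."""
    k1, k2, p1, p2, k3 = coeffs
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    dx = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    dy = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return radial, dx, dy


def distort(x, y, coeffs=COEFFS):
    """Apply the distortion model to a normalized image point."""
    radial, dx, dy = _terms(x, y, coeffs)
    return x * radial + dx, y * radial + dy


def undistort(xd, yd, coeffs=COEFFS, iterations=20):
    """Invert the model by fixed-point iteration, starting at (xd, yd)."""
    x, y = xd, yd
    for _ in range(iterations):
        radial, dx, dy = _terms(x, y, coeffs)
        x = (xd - dx) / radial
        y = (yd - dy) / radial
    return x, y


if __name__ == "__main__":
    xd, yd = distort(0.3, -0.2)
    print(xd, yd)
    print(undistort(xd, yd))
```

Pixel coordinates have to be converted to normalized coordinates with K^-1 before these functions are applied, and back with K afterwards. The iteration converges quickly for moderate distortion; for points far outside the image it may not.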
PointCloud2
This is an older (2015) implementation, which instead uses the PointCloud2 message for sending depth data.
PointCloud2 live data packet example:
header:
  seq: 537
  stamp:
    secs: 1478272396
    nsecs: 68496564
  frame_id: camera_depth_optical_frame
height: 480
width: 640
fields:
  - name: x
    offset: 0
    datatype: 7
    count: 1
  - name: y
    offset: 4
    datatype: 7
    count: 1
  - name: z
    offset: 8
    datatype: 7
    count: 1
is_bigendian: False
point_step: 16
row_step: 10240
data: [0, 0, 192, 127, 0, 0, 192, ...]
is_dense: False
CompressedImage live data packet example:
The message is documented here
Datatype 7 means float32
The binary data array is of type uint8 and can be read like so:
[      x      ][      y      ][      z      ][  garbage?   ][      x      ] ...
 0, 0, 192, 127, 0, 0, 192, 127, 0, 0, 192, 127, 0, 0, 128, 63, 0, 0, 192, 127, ...
Each point uses 16 bytes, or places in the array. This means that the last 4 bytes, labeled "garbage?", can possibly be redundant and removed to save bandwidth.
Floats can be extracted like this:
float myFloat = System.BitConverter.ToSingle(mybyteArray, startIndex);
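The same extraction can be sketched outside Unity with Python's `struct` module, which makes it easy to check what the sample bytes actually contain. The layout (little-endian float32 x, y, z at offsets 0, 4 and 8, with a point_step of 16) is taken from the packet example above; the function name `read_points` is hypothetical.

```python
import math
import struct

POINT_STEP = 16  # bytes per point, from the packet example above


def read_points(data, point_step=POINT_STEP):
    """Read (x, y, z) float32 triples from a PointCloud2 data array.

    "<fff" means three little-endian float32 values (is_bigendian: False).
    The remaining 4 bytes of each 16-byte point are skipped.
    """
    points = []
    for offset in range(0, len(data) - point_step + 1, point_step):
        points.append(struct.unpack_from("<fff", data, offset))
    return points


# First point of the sample array above: three NaN values and a fourth word.
SAMPLE = bytes([0, 0, 192, 127] * 3 + [0, 0, 128, 63])

if __name__ == "__main__":
    x, y, z = read_points(SAMPLE)[0]
    print(math.isnan(x), math.isnan(y), math.isnan(z))
    # The fourth word of the sample decodes to 1.0.
    print(struct.unpack_from("<f", SAMPLE, 12)[0])
```

In the sample, the bytes 0, 0, 192, 127 decode to NaN, which fits is_dense being False: invalid points are kept in the array as NaN. Any consumer of the array therefore has to filter out NaN points before rendering.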
Augmented reality with the HTC Vive
Calibrating the camera
The camera was calibrated in MATLAB with a mean reprojection error of 0.1905
280.829665857757 | 0 | 303.305043312279 |
0 | 280.230119462474 | 233.171543364662 |
0 | 0 | 1 |
-0.280758751984260 | 0.0739899523264349 |
The images need to be undistorted using the Brown-Conrady method. The distortion coefficients are:
k1 | k2 | p1 | p2 | k3 |
---|---|---|---|---|
-0.224250482220782 | 0.0432676652414734 | 0.000310839414039509 | 0.000696641409984896 | -0.00329409354500417 |
Here the k coefficients describe radial distortion and the p coefficients describe tangential distortion.
The Aruco marker
The marker's coordinate system differs from the one documented in OpenCV (blue = Z, red = X, green = Y).
The drawDetectedMarkers function does not follow the OpenCV documentation: here the red axis is Z, blue is X and green is Y.
The output from estimatePoseSingleMarkers gives the translation and rotation of the marker in the camera coordinate frame.
The following projection function projects 3D points into an image. Distortion coefficients must be supplied, otherwise points near the image border end up in the wrong position.
Calib3d.projectPoints(objectPoints, rvecs, tvecs, camMatrix, distCoeffs, projectedPoints);
Adjusting the rotation from Aruco to match the point cloud from the RealSense
rvecs needs a 45 degree rotation around the z axis to show the point cloud properly. Below is code for rotating around all three axes.
// Adjust rotation
Mat rotMat = new Mat(3, 3, CvType.CV_64FC1);
Calib3d.Rodrigues(rvecs, rotMat);

// The rotation angles in radians
float angleRadsX = Mathf.PI * 0.5f;
float angleRadsY = Mathf.PI * 0f;
float angleRadsZ = Mathf.PI * 0f;

// Three rotation matrices
Mat rotX = new Mat(3, 3, CvType.CV_64FC1);
Mat rotY = new Mat(3, 3, CvType.CV_64FC1);
Mat rotZ = new Mat(3, 3, CvType.CV_64FC1);

// Rotation around the x axis
rotX.put(0, 0, 1); rotX.put(0, 1, 0); rotX.put(0, 2, 0);
rotX.put(1, 0, 0); rotX.put(1, 1, Mathf.Cos(angleRadsX)); rotX.put(1, 2, -Mathf.Sin(angleRadsX));
rotX.put(2, 0, 0); rotX.put(2, 1, Mathf.Sin(angleRadsX)); rotX.put(2, 2, Mathf.Cos(angleRadsX));

// Rotation around the y axis
rotY.put(0, 0, Mathf.Cos(angleRadsY)); rotY.put(0, 1, 0); rotY.put(0, 2, Mathf.Sin(angleRadsY));
rotY.put(1, 0, 0); rotY.put(1, 1, 1); rotY.put(1, 2, 0);
rotY.put(2, 0, -Mathf.Sin(angleRadsY)); rotY.put(2, 1, 0); rotY.put(2, 2, Mathf.Cos(angleRadsY));

// Rotation around the z axis
rotZ.put(0, 0, Mathf.Cos(angleRadsZ)); rotZ.put(0, 1, -Mathf.Sin(angleRadsZ)); rotZ.put(0, 2, 0);
rotZ.put(1, 0, Mathf.Sin(angleRadsZ)); rotZ.put(1, 1, Mathf.Cos(angleRadsZ)); rotZ.put(1, 2, 0);
rotZ.put(2, 0, 0); rotZ.put(2, 1, 0); rotZ.put(2, 2, 1);

// Here only the z rotation is used
Mat result = rotMat * rotZ;
Mat rvecsAjusted = new Mat(1, 3, CvType.CV_32FC1);
Calib3d.Rodrigues(result, rvecsAjusted);
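The composition in the code above (convert the rotation vector to a matrix, then post-multiply by a rotation around z) can be checked numerically without Unity. The sketch below is a plain Python version under stated assumptions: it implements the Rodrigues formula by hand, it assumes the product in the C# code is meant as a true matrix product, and all function names are hypothetical. Post-multiplying rotates around the marker's own z axis rather than the camera's.

```python
import math


def rodrigues(rvec):
    """Convert a rotation vector (axis * angle in radians) to a 3x3 matrix."""
    rx, ry, rz = rvec
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = rx / theta, ry / theta, rz / theta
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]


def rot_z(angle):
    """Rotation matrix for a rotation of `angle` radians around the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """3x3 matrix product a * b."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def adjust_rotation(rvec, angle_z):
    """Marker rotation followed by a rotation around the marker's z axis."""
    return matmul(rodrigues(rvec), rot_z(angle_z))


if __name__ == "__main__":
    # Example rotation vector (made up) adjusted by 45 degrees around z.
    for row in adjust_rotation((0.1, -0.2, 0.3), math.pi / 4):
        print(row)
```

A quick consistency check is that a marker already rotated 45 degrees around z, adjusted by another 45 degrees, equals a single 90 degree rotation around z.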
Plan
Visualizing sensor data in Unity
The point cloud is extracted from ROS through rosbridge. The data format is JSON and the transport is web sockets. Unity will need a separate asynchronous thread to handle the data, so that network latency does not cause frame drops. The point clouds are data intensive.
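One common way to decouple a slow network thread from a render loop is a single-slot buffer that always holds the newest frame and silently drops older ones. The sketch below shows the idea in Python with hypothetical names (`LatestFrame`, `receiver`); it is an illustration of the pattern under these assumptions, not the Unity implementation, where the same structure would live in a C# worker thread.

```python
import threading


class LatestFrame:
    """Thread-safe single-slot buffer that keeps only the newest frame."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None
        self._dropped = 0

    def put(self, frame):
        """Called by the network thread; overwrites any unread frame."""
        with self._lock:
            if self._frame is not None:
                self._dropped += 1
            self._frame = frame

    def take(self):
        """Called by the render loop; returns the newest frame or None."""
        with self._lock:
            frame, self._frame = self._frame, None
            return frame

    @property
    def dropped(self):
        with self._lock:
            return self._dropped


def receiver(buffer, frames):
    """Stand-in for the websocket thread: pushes frames as they arrive."""
    for frame in frames:
        buffer.put(frame)


if __name__ == "__main__":
    buf = LatestFrame()
    worker = threading.Thread(target=receiver, args=(buf, range(5)))
    worker.start()
    worker.join()
    # The render loop only ever sees the newest frame.
    print(buf.take(), buf.dropped)
```

Dropping stale frames keeps the delay bounded: the render loop never works through a backlog, which matters when the sender produces point clouds faster than they can be parsed.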
Articles
http://gbib.siggraph.org/ - Library from SIGGRAPH - Computer graphics conference
Virtual environments, 17 articles - frontiers [1]
Challenges in virtual environments (2014) [2]
Virtual reality
Multi-sensory feedback techniques, user studies [3]
Telepresence: Immersion with the iCub Humanoid Robot and the Oculus Rift [4]
Virtual reality simulator for robotics learning [5]
Augmented reality
User testing on AR system: Instruction for Object Assembly based on Markerless Tracking
Evaluating human-computer interface in general: Evaluation of human-computer interface for optical see-through augmented reality system
Sensor Data Visualization in Outdoor Augmented Reality
- Discusses the challenges with displays in lit environments[6]
Technical
Integration between Unity and ROS [7]
Unity: ROSBridgeLib [8]
Theses
Teleoperation + visualization with oculus [9]
- User studies on HRI
- Bandwidth considerations
Remote Operations of IRB140 with Oculus Rift [10]
- Controlling robot arm with stereo camera with oculus
Books
Computer vision [11]