Robot Learning and Robot Behavior Control Software
Skilligent Robot Vision System - Download Demo Software
Introduction

Skilligent Robot Vision System is a software component which implements powerful object recognition and object tracking algorithms. The system is specifically designed for robotics applications including visual object recognition and tracking, image stabilization, visual-based servoing, human-to-machine interaction and visual localization.

Skilligent Robot Vision System keeps digital object representations in an indexed structure optimized for fast searches. The software scans the video stream coming from a camera and searches for occurrences of the objects. The computer vision software is based on algorithms that are robust to

  • major changes in lighting,
  • partial object occlusions (up to 30-80% depending on the object and lighting conditions),
  • changes in viewing angle (up to 30-45 degrees),
  • and camera lens distortions.

The software is based on a modified Harris Corner Detector algorithm. The software scans a video stream and extracts multiple image features. Those image features are matched against a database of known objects.

The video demonstrates how a robot recognizes various objects in a room. The robot is equipped with a low-cost video camera.

The robot shown in the video had the following objects stored in its database:

  • The upper part of the entrance door (a landmark),
  • A PC on a table (a landmark),
  • A picture on the wall (a landmark),
  • An "Ubuntu" book,
  • A "Robotics and Control" book,
  • A white robot on the floor.

Skilligent Robot Vision System recognizes the objects from different angles, even when the objects are partially occluded.

Image Database System

The vision software comes with an image database system, a searchable store of visual information about objects, optimized for object identification and content-based image retrieval applications. The database stores visual information about rigid physical objects. Every object is described by one or more images of the object's facets. Given an image of a facet, the system builds a unique "fingerprint" of the image, called a model. There is a one-to-many relationship between objects and their models ("fingerprints").

Skilligent Image Database

The objects must have enough texture or labeling to be identifiable. Examples of suitable objects are books, boxes, magazines, furniture items, buildings, rooms, home appliances, toys, landmarks for aerial navigation, pictures on the walls, and so on. The image database does not store raw images; instead, the system creates a unique "fingerprint" of an image and stores it in a tree-like indexed data structure optimized for fast searches. When recognizing objects in a given image, the system uses the indexed "fingerprint" information about all known objects to identify which of them appear in the image.

To enable the system to recognize an object from various angles, it might be necessary to take a picture of every side/facet and load those images into the database. For example, if an object is a book, it might be required to take pictures of the book's front cover as well as back cover in order to help the vision system recognize the book from various angles of view. The system assigns a unique ID to every object loaded into the database (Object ID). An object can have one or more models associated with it. A model represents a particular facet/side of an object. The image database stores unique "fingerprints" of every model. The image database assigns a unique ID to every model of every object.
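The object/model relationship described above can be pictured with a small in-memory sketch. This is only an illustration of the one-to-many structure, not the product's storage format: the class names, the `facet` label, the opaque `fingerprint` bytes, and the choice of globally unique model IDs are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Model:
    model_id: int        # unique ID assigned to this facet's model
    facet: str           # hypothetical human-readable label, e.g. "front cover"
    fingerprint: bytes   # opaque stand-in for the stored "fingerprint"


@dataclass
class DbObject:
    object_id: int       # unique Object ID assigned on insertion
    name: str
    models: list = field(default_factory=list)  # one object, many models


class ImageDatabase:
    """Toy registry that hands out Object IDs and Model IDs."""

    def __init__(self):
        self._object_ids = count(1)
        self._model_ids = count(1)   # assumption: model IDs are unique database-wide
        self.objects = {}

    def add_object(self, name):
        obj = DbObject(next(self._object_ids), name)
        self.objects[obj.object_id] = obj
        return obj

    def add_model(self, object_id, facet, fingerprint):
        model = Model(next(self._model_ids), facet, fingerprint)
        self.objects[object_id].models.append(model)
        return model


db = ImageDatabase()
book = db.add_object("Ubuntu book")
front = db.add_model(book.object_id, "front cover", b"placeholder")
back = db.add_model(book.object_id, "back cover", b"placeholder")
```

Here the book is a single object with two models, one per cover, which mirrors the front-cover/back-cover example in the text.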

The object recognition algorithm used by the image database system has logarithmic complexity in the number of stored models. This means that recognition time does not increase much when additional objects are added to the database. For example, if the number of models stored in the database is doubled, the average recognition time would only increase by about 30%. This rule holds for relatively large databases (e.g. hundreds or thousands of objects). If a database contains only a few objects, doubling their number might not noticeably affect performance at all.

Visual Object Recognition and Tracking

Recognized partially occluded object

The vision system receives a video stream directly from a video camera. Through a network-based programming interface, a user process provides the vision system with a list of objects the application wants to track. Those objects could be visual landmarks used for localization, items on a conveyor belt, objects which a mobile robot needs to follow, and so on. Given the list of objects to be tracked, the vision system creates a special indexed data structure which helps identify those objects in a realtime video stream.

A video stream coming from a video camera is a sequence of individual video frames. After processing a frame, the vision system sends a message to the user process with information about the objects that have been recognized in the frame. If an object is recognized, the message carries the current position of the object in the image frame [Object ID, Model ID, X, Y, Scale, Angle]. For every object in the list, the system returns either the object's current position or a flag indicating that the object has not been recognized in the current frame.

Because the system continuously processes image frames coming from a video camera, a continuous low-latency stream of information about current object positions is delivered to a user process (or processes). This enables the user process to exercise timely control logic based on the realtime visual input.

Recognition time is generally proportional to the number of pixels in the input image (that is, to the resolution of the camera). For example, if each dimension of an image is doubled, the number of pixels grows fourfold, and the time required to recognize all objects in the image also increases about fourfold.
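The scaling rule above is simple arithmetic, sketched here for choosing a camera resolution. The function name and the example resolutions are illustrative, and the estimate assumes the proportionality to pixel count holds exactly.

```python
def relative_recognition_time(width, height, base_width, base_height):
    """Estimated recognition time relative to a baseline resolution,
    assuming cost is proportional to the number of pixels."""
    return (width * height) / (base_width * base_height)


# Doubling both dimensions (320x240 -> 640x480) quadruples the pixel count.
ratio = relative_recognition_time(640, 480, 320, 240)
print(ratio)  # 4.0
```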

Interpretation of Recognition Results

If an object is identified in a frame/input image, the system returns the unique ID of the object as well as additional information detailing the position of the object in the image: [Object-ID, Model-ID, X, Y, Scale, Angle]

  • Object-ID is a unique integer identifier which was assigned to the object by the system when the object was added into the database
  • Model-ID identifies what facet/side of the object is visible on the input image. If several facets are visible, the system returns information about every individual facet.
  • X defines a sub-pixel horizontal position of the center of the object’s facet relative to the input image’s frame coordinates.
  • Y defines a sub-pixel vertical position of the center of the object’s facet relative to the input image’s frame coordinates.
  • Scale is a floating-point number which tells how much bigger or smaller the object looks compared to the model of the object stored in the database. This parameter is useful for determining the distance to objects. The further away from the camera an object is, the smaller its visible scale. A scale of 1.0 means that the object is at the same distance from the camera as it was when its image was taken for the database.
  • Angle tells how much the object is rotated as compared to a model of the object stored in the database.
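
As a rough illustration of how a client might use these fields, the sketch below wraps one result in a small record and derives two quantities from it. The `Detection` class and both helper functions are hypothetical names, not part of the product's API. The distance estimate assumes a simple pinhole camera model, in which apparent size is inversely proportional to distance, and requires that the distance at which the database image was taken is known. The unit of Angle is not stated in the text, so it is stored as received.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    object_id: int   # Object-ID assigned by the database
    model_id: int    # Model-ID of the visible facet
    x: float         # sub-pixel horizontal position of the facet centre
    y: float         # sub-pixel vertical position of the facet centre
    scale: float     # apparent size relative to the stored model
    angle: float     # rotation relative to the stored model (unit not specified here)


def estimate_distance(detection, reference_distance):
    """Estimate the distance to the object under a pinhole camera assumption.

    reference_distance is the camera-to-object distance at which the
    database image was taken (scale == 1.0 at that distance).
    """
    return reference_distance / detection.scale


def horizontal_offset(detection, image_width):
    """Signed offset of the facet centre from the image centre,
    normalised so that the image edges are at -1.0 and +1.0."""
    half_width = image_width / 2
    return (detection.x - half_width) / half_width


seen = Detection(object_id=4, model_id=1, x=480.0, y=120.0, scale=0.5, angle=10.0)
distance = estimate_distance(seen, reference_distance=1.0)   # twice the reference distance
offset = horizontal_offset(seen, image_width=640)            # object is right of centre
```

A value such as `offset` could feed a steering controller for visual servoing, while `distance` gives a coarse range estimate that is only as good as the pinhole assumption.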

Software Architecture

On Windows platforms, the system installs as a Windows Service and can be started or stopped through the Control Panel/Administrative Tools/Services program. On Linux platforms, the vision system's initialization/shutdown procedures are controlled by a set of init.d scripts. The system's main process is called skilligent.exe. The process runs in parallel with user processes.

A single vision process hosts the following subsystems:

  • Visual Object Recognition and Tracking System
  • Image Database System

Robot Vision Architecture

Although both subsystems are packaged into the same executable file, they have their own distinct applications and can be used independently. In fact, one of the subsystems can be disabled in order to conserve computer resources; this is useful when a particular application does not need both features. In other words, the process can be configured to act as an image database server, as a realtime visual object recognition and tracking server, or both. Each of the functions comes with an application programming interface provided via a network protocol.

Physical System Architecture

The design of the vision system follows a client-server architecture, with the vision system being the server and a multitude of user processes acting as clients.

A single computer running a vision process and a number of user processes

The vision system can be installed on the same computer that hosts a user process, or on a standalone computer dedicated to vision processing tasks; in the latter case, an Ethernet network or a serial cable can be used to interconnect the computers.

A system with a computer dedicated to vision processing tasks

Depending on the chosen system architecture, a system designer might prefer one networking interface over another:

  • An Ethernet interface is suitable for connecting the vision system to programmable realtime controllers/PLC equipped with an Ethernet port. In this case, the vision system can be installed on a dedicated computer connected to a controller via a LAN or a direct Ethernet connection.
  • A serial (RS232) interface would be an interface of choice for connecting the vision system to a programmable controller/PLC which doesn't have an Ethernet port. Not all programmable controllers come with an Ethernet port, and not all of them have CPUs powerful enough to run the vision processing software onboard concurrently with application logic. In this case, the vision software might be installed on a dedicated computer interfaced to the realtime programmable controller via a serial cable.
  • Loopback IP interface (127.0.0.1): If both the vision system and a user process run concurrently on the same computer, a local loopback IP interface is used for communication between the processes. Basically, a user process communicates with a local vision system in the same way as it would communicate with a vision system running on a remote computer. Running vision processing algorithms in a separate process helps utilize the resources of modern multi-core or multi-CPU computers.

Application Programming Interfaces (APIs)

In order to give application developers the broadest choice of programming languages, the vision system adopts a network-oriented programming approach. All user processes communicate with the vision system by sending and receiving UDP, TCP/IP or serial messages. Those messages are OS-agnostic and programming-language-agnostic. Thus, an application can be written in any programming language that supports networking libraries or primitives (Visual C++, C#, Visual Basic, GNU C/C++, Python, Java and so on). The vision system comes with a set of code samples which help kick-start a systems integration project.

If both a user process and the vision system run on the same computer, a local IP loopback interface (127.0.0.1) is used for exchanging UDP or TCP/IP messages. A serial (RS232) networking option is also available for interfacing robotic controllers which do not have an Ethernet port.

Network-style vision APIs

A vision process can be configured to be an image database system, a realtime object recognition and tracking system, or both. Each of the subsystems has its own interface due to specific functional requirements.

Image Database API: A text-based ASCII command-line protocol is used for adding images to the image database as well as for querying the image database system. An application designer can choose TCP/IP or serial (RS232) as the transport protocol for interfacing with the image database system. The command line interface is usable by both humans and client software. Human users can access the command line interface through Telnet (TCP/IP) or HyperTerminal (serial) programs.
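
Because the image database speaks a line-oriented ASCII protocol, a client needs little more than a socket and a way to exchange text lines. The helper below is a minimal sketch under stated assumptions: the function name is hypothetical, the CRLF line terminator is assumed because Telnet clients send it, and a single-line reply is assumed. The actual command vocabulary, reply format, and port are not described here and would have to come from the product documentation.

```python
import socket


def send_command(sock, command, encoding="ascii"):
    """Send one command line and return one reply line (without the line ending).

    Assumes CRLF-terminated commands and a single-line reply; both are
    assumptions, not documented behaviour of the image database.
    """
    sock.sendall(command.encode(encoding) + b"\r\n")
    reply = bytearray()
    # Read byte by byte until a newline arrives; simple rather than fast.
    while not reply.endswith(b"\n"):
        chunk = sock.recv(1)
        if not chunk:          # peer closed the connection
            break
        reply += chunk
    return reply.decode(encoding).rstrip("\r\n")
```

With a running server, the socket would typically come from `socket.create_connection((host, port))`, using `127.0.0.1` as the host when the vision process runs on the same machine.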

Command line interface (CLI) of Skilligent Image Database

Visual Object Recognition and Tracking API: A low-latency binary protocol is used for realtime object recognition and tracking. The protocol's design ensures that position updates for tracked objects are delivered to a user process in the shortest time possible. An application designer can choose UDP or serial (RS232) as the transport protocol for communicating with the realtime tracking system.

Visual Object Recognition and Tracking API

The binary protocol is optimized for use with the connectionless UDP protocol, known for its low latency; the serial (RS232) interface is a fallback for networking with controllers which do not have an Ethernet port.
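
To make the idea of a binary tracking update concrete, the sketch below decodes UDP datagrams into [Object ID, Model ID, X, Y, Scale, Angle] tuples. The wire format is not documented on this page, so the record layout used here (little-endian, two 32-bit unsigned integers followed by four 32-bit floats) is purely hypothetical, as are the function names and the assumption that the client simply binds a port and listens.

```python
import socket
import struct

# HYPOTHETICAL record layout, chosen only for illustration:
# little-endian uint32 object_id, uint32 model_id, float32 x, y, scale, angle.
RECORD = struct.Struct("<IIffff")


def decode_records(datagram):
    """Split a datagram into (object_id, model_id, x, y, scale, angle) tuples.

    Trailing bytes that do not form a whole record are ignored.
    """
    usable = len(datagram) - len(datagram) % RECORD.size
    return [RECORD.unpack_from(datagram, offset)
            for offset in range(0, usable, RECORD.size)]


def receive_updates(port, host="127.0.0.1"):
    """Yield the decoded records of each incoming datagram, one list per datagram.

    Assumes the vision process sends updates to this host and port; how a
    real client subscribes to updates is not covered here.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            datagram, _sender = sock.recvfrom(65535)
            yield decode_records(datagram)
```

Fixed-size records keep per-frame parsing trivial, which suits a low-latency stream; a lost datagram only costs one update, because the next frame carries fresh positions.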
