Skilligent Robot Vision System - Download Demo Software
Skilligent Robot Vision System is a software component which implements powerful object recognition and object tracking algorithms. The system is specifically designed for robotics applications, including visual object recognition and tracking, image stabilization, vision-based servoing, human-machine interaction, and visual localization.
Skilligent Robot Vision System keeps digital object representations in an indexed structure optimized for fast searches. The software scans the video stream coming from a camera and searches for occurrences of the objects. The computer vision software is based on algorithms resistant to changes in scale, rotation, and viewing angle, as well as to partial occlusion.
The software is based on a modified Harris Corner Detector algorithm: it scans a video stream, extracts multiple image features, and matches those features against a database of known objects.
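To give a feel for the corner-detection step, here is a minimal NumPy sketch of the classic (unmodified) Harris corner response. The window radius and the constant k below are conventional textbook choices, not Skilligent's parameters, and Skilligent's modified detector will differ in its details.

```python
import numpy as np

def harris_response(img, k=0.04, r=2):
    """Per-pixel Harris corner response R = det(M) - k * trace(M)**2,
    where M is the structure tensor summed over a (2r+1)x(2r+1) window."""
    Iy, Ix = np.gradient(img.astype(float))

    def wsum(a):
        # Window sum via shifted copies (a Gaussian weighting is more common).
        out = np.zeros_like(a)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += np.roll(np.roll(a, dy, axis=0), dx, axis=1)
        return out

    Sxx, Syy, Sxy = wsum(Ix * Ix), wsum(Iy * Iy), wsum(Ix * Iy)
    # Corners: both eigenvalues of M large -> R > 0.
    # Edges: one large eigenvalue -> R < 0.  Flat regions: R ~ 0.
    return (Sxx * Syy - Sxy ** 2) - k * (Sxx + Syy) ** 2
```

On a synthetic image containing a bright square, the response is positive at the square's corners, negative along its straight edges, and near zero in flat regions, which is exactly the property that makes such features useful as matchable keypoints.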
The video demonstrates how a robot recognizes various objects in a room. The robot is equipped with a low-cost video camera.
The robot shown in the video had the following objects stored in its database:
Skilligent Robot Vision System recognizes the objects from different angles - even if the objects are partially occluded.
Image Database System
The vision software comes with an image database system: a searchable storage of visual information about objects, optimized for object identification and content-based image retrieval applications. The database stores visual information about physical rigid objects. Every object is described by one or more images of the object's facets. Given an image of a facet, the system builds a unique "fingerprint" of the image, called a model. There is a one-to-many relationship between objects and their models ("fingerprints").
The objects must have enough texture or labels to allow identification. Examples of suitable objects include books, boxes, magazines, furniture items, buildings, rooms, home appliances, toys, landmarks for aerial navigation, pictures on walls, and so on. The image database does not store raw images; instead, the system creates a unique "fingerprint" of an image and stores it in a tree-like indexed data structure optimized for fast searches. When recognizing objects in a given image, the system uses the indexed "fingerprint" information about all known objects to identify which objects appear in the image.
To enable the system to recognize an object from various angles, it might be necessary to take a picture of every side/facet and load those images into the database. For example, if an object is a book, pictures of both the front and back covers may be needed to help the vision system recognize the book from various viewing angles. The system assigns a unique ID to every object loaded into the database (Object ID). An object can have one or more models associated with it. A model represents a particular facet/side of an object. The image database stores unique "fingerprints" of every model and assigns a unique ID to every model of every object.
The object recognition algorithm used by the image database system has logarithmic complexity. This means that recognition time does not increase much when additional objects are added to the database. For example, if the number of models stored in the database is doubled, the average recognition time increases by only about 30%. This rule holds for relatively large databases (e.g. hundreds or thousands of objects). If a database contains just a few objects, performance might not be noticeably affected at all after the number of objects has been doubled.
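The consequence of logarithmic scaling can be sketched with a toy model. The model below is illustrative only: the quoted ~30% figure reflects Skilligent's measured constants, which this model does not attempt to reproduce.

```python
import math

def search_time(n, c=1.0):
    """Toy model: recognition time c * log2(n) for n stored models.
    The constant c is arbitrary here, not a measured Skilligent figure."""
    return c * math.log2(n)

# In a logarithmic-time model, each doubling of the database adds the same
# constant amount of work, so the *relative* cost of a doubling shrinks as
# the database grows.
cost_small = search_time(200) / search_time(100)    # doubling 100 -> 200
cost_large = search_time(2000) / search_time(1000)  # doubling 1000 -> 2000
```

This is why the recognition time "does not increase much" per doubling: in absolute terms each doubling costs the same small increment, so the larger the database already is, the less noticeable that increment becomes.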
Visual Object Recognition and Tracking
The vision system receives a video stream directly from a video camera. Through a network-based programming interface, a user process provides the vision system with a list of objects the application wants to track. Those objects could be visual landmarks used for localization, items on a conveyor belt, objects which a mobile robot needs to follow, and so on. Given the list of objects to be tracked, the vision system creates a special indexed data structure which helps identify those objects in a realtime video stream.
A video stream coming from a video camera is a sequence of individual video frames. After processing a frame, the vision system sends a message to the user process with information about objects that have been recognized in the frame. If an object is recognized, the message carries information detailing the current position of the object in the image frame [Object ID, Model ID, X, Y, Scale, Angle]. For every object in the list, the system either returns the object's current position or a flag indicating that the object has not been recognized in the current frame.
Because the system continuously processes image frames coming from a video camera, a continuous low-latency stream of information about current object positions is delivered to a user process (or processes). This enables the user process to exercise timely control logic based on the realtime visual input.
Recognition speed is generally proportional to the number of pixels in an input image (the resolution of the camera). For example, if every dimension of an image is doubled (x2), the number of pixels in the image grows fourfold, and the time required to recognize all objects in a given image also grows fourfold.
Interpretation of Recognition Results
If an object is identified in a frame/input image, the system returns the unique ID of the object as well as additional information detailing the position of the object in the image: [Object-ID, Model-ID, X, Y, Scale, Angle]
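A client has to unpack each such record from the binary tracking messages. The field order below follows the text above, but the byte layout (little-endian, 32-bit integers and floats) is an assumption made for illustration; the actual wire format is defined by Skilligent's protocol documentation.

```python
import struct

# Assumed layout: ObjectID, ModelID as int32; X, Y, Scale, Angle as float32,
# little-endian.  Illustrative only - not Skilligent's documented format.
RECORD = struct.Struct("<iiffff")

def parse_result(payload: bytes) -> dict:
    """Decode one recognition result record into named fields."""
    object_id, model_id, x, y, scale, angle = RECORD.unpack(payload)
    return {"object_id": object_id, "model_id": model_id,
            "x": x, "y": y, "scale": scale, "angle": angle}
```

With such a decoder, the user process can route each update by Object ID, e.g. feeding the (X, Y) of a tracked landmark into a localization filter.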
On Windows platforms, the system installs as a Windows Service and can be started or stopped through the Control Panel/Administrative Tools/Services program. On Linux platforms, the vision system's initialization/shutdown procedures are controlled by a set of init.d scripts. The system's main process is called skilligent.exe. The process runs in parallel with user processes.
A single vision process hosts the following subsystems:
- an image database subsystem
- a realtime visual object recognition and tracking subsystem
Although both subsystems are packaged into the same executable file, they have their own distinct applications and can be used independently. In fact, one of the subsystems can be disabled in order to conserve computer resources; this is useful when a particular application does not need both features. In other words, the process can be configured to act as an image database server, as a realtime visual object recognition and tracking server, or as both. Each of the functions comes with an application programming interface provided via a network protocol.
Physical System Architecture
The design of the vision system follows a client-server architecture, with the vision system being the server and a multitude of user processes acting as clients.
The vision system can be installed on the same computer that hosts a user process, or on a standalone computer dedicated to vision processing tasks; in this case, an Ethernet network or a Serial cable can be used to interconnect the computers.
Depending on the chosen system architecture, a system designer might prefer one networking interface over another:
- the local IP loopback interface (127.0.0.1), when the vision system and user processes run on the same computer;
- an Ethernet network, when the vision system runs on a dedicated computer;
- a Serial (RS232) cable, for robotic controllers which do not have an Ethernet port.
Application Programming Interfaces (APIs)
In order to provide the broadest choice of programming languages to an application developer, the vision system adopts a network-oriented programming approach. All user processes communicate with the vision system by sending and receiving UDP, TCP/IP or Serial messages. Those messages are OS-agnostic and programming-language-agnostic. Thus, an application can be written in any programming language as long as the language supports networking libraries or primitives (Visual C++, C#, Visual Basic, GNU C/C++, Python, Java and so on). The vision system comes with a set of code samples which help kick-start a systems integration project.
If both a user process and the vision system run on the same computer, a local IP loopback interface (127.0.0.1) is used for exchanging UDP or TCP/IP messages. A serial (RS232) networking option is also available for interfacing robotic controllers which do not have an Ethernet port.
A vision process can be configured to be an image database system, a realtime object recognition and tracking system, or both. Each of the subsystems has its own interface due to specific functional requirements.
Image Database API: A text command line ASCII protocol is used for adding images into the image database as well as for querying the image database system. An application designer can choose TCP/IP or Serial (RS232) as the transport protocol for interfacing with the image database system. The command line interface is usable by both humans and client software. Users (human beings) can access the command line interface through Telnet (TCP/IP) or HyperTerminal (Serial) programs.
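A line-oriented ASCII protocol like this is easy to drive from code: send one command line, read one reply line, exactly as a Telnet user would. The sketch below assumes CRLF-terminated lines and uses a placeholder command verb ("QUERY"); the real command set, line endings, and reply format come from the Image Database API documentation.

```python
import socket

def make_command(verb, *args):
    """Build one CRLF-terminated ASCII command line.
    The verb names used by callers here are placeholders, not the
    documented Skilligent command set."""
    return (" ".join([verb, *map(str, args)]) + "\r\n").encode("ascii")

def send_command(sock, verb, *args):
    """Send a command over a connected TCP socket and return the
    single-line ASCII reply, stripped of its line terminator."""
    sock.sendall(make_command(verb, *args))
    reply = sock.makefile("rb").readline()
    return reply.decode("ascii").strip()
```

Because the same byte sequences work over Telnet, a developer can prototype a command interactively first and then paste it into `send_command` unchanged.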
Visual Object Recognition and Tracking API: A low-latency binary protocol is used for realtime object recognition and tracking. The protocol's design ensures that updates of the positions of tracked objects are delivered to a user process in the shortest time possible. An application designer can choose UDP or Serial (RS232) as the transport protocol for communicating with the real-time tracking system.
The binary protocol is optimized for use with the connectionless UDP protocol, which is known for low latency; a serial (RS232) interface is a fallback option for networking with controllers which do not have an Ethernet port.
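One practical pattern on the client side of such a low-latency UDP feed: when the control loop falls behind, it is usually better to drop stale position updates than to process a backlog. A minimal sketch of that idea (a common UDP consumer pattern, not something mandated by Skilligent's protocol):

```python
import socket

def latest_datagram(sock):
    """Drain a UDP socket without blocking and return only the newest
    datagram, or None if nothing is pending.  Older queued position
    updates are discarded, keeping the control loop working on fresh data."""
    sock.setblocking(False)
    latest = None
    while True:
        try:
            latest, _ = sock.recvfrom(2048)
        except BlockingIOError:
            return latest
```

A control loop would call this once per iteration, decode the records in the returned datagram, and act on the most recent object positions only.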
Download Demo Software
Skilligent created demonstration software which shows several features of Skilligent Robot Vision System.
The demo requires:
Robot Vision Demo Installer [ZIP]. Unzip it before launching the installer.
© Skilligent Inc