M. S. Aksoy
Of the five senses - vision, hearing, smell, taste and touch - vision is undoubtedly the one that man has come to depend upon above all others and indeed the one that provides most of the data he receives. Not only do the input pathways from the eyes provide megabits of information at each glance, but also the data rates for continuous viewing probably exceed 10 megabits per second.
Another feature of the human visual system is the ease with which interpretation is carried out. We see a scene as it is - trees in a landscape, books on a desk, products in a factory. No obvious deductions are needed and no overt effort is required to interpret each scene. In addition, answers are immediate, normally available within a tenth of a second. The important point is that we are for the most part unaware of the complexities of vision. Seeing is not a simple process.
We are still largely ignorant of the process of human vision. However, man is inventive and he is now trying to get machines to do much of his work for him. For the simplest tasks there should be no particular difficulty in mechanization, but for more complex tasks the machine must be given man’s prime sense, i.e. that of vision. Efforts have been made to achieve this, sometimes in modest ways, for well over 30 years. At first, such tasks seemed trivial and schemes were devised for reading, for interpreting chromosome images and so on. But when such schemes were confronted with rigorous practical tests, the problems often turned out to be far more difficult than expected.
Computer vision blends optical processing and sensing, computer architecture, mechanics and a deep knowledge of process control. Despite some success, in many fields of application it really is still in its infancy.
With the current state of computing technology, only digital images can be processed by our machines. Because our computers currently work with numerical rather than pictorial data, an image must be converted into numerical form before processing.
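As a concrete illustration of this point (it is not from the original text), the short sketch below shows how a digitised image is simply a grid of numbers once loaded into a computer. It assumes the Pillow and NumPy libraries and a hypothetical file name "scene.png".

```python
# A minimal sketch of turning a picture into numerical form.
# "scene.png" is a hypothetical image file used purely for illustration.
import numpy as np
from PIL import Image

image = Image.open("scene.png").convert("L")  # load and convert to greyscale
pixels = np.asarray(image)                    # 2-D array of brightness values

print(pixels.shape)    # (rows, columns) of the sampling grid
print(pixels.dtype)    # typically uint8: 0 (black) to 255 (white)
print(pixels[0, 0])    # one pixel = one number in the array
```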
The field of machine or computer vision may be sub-divided into six principal areas: (1) sensing, (2) pre-processing, (3) segmentation, (4) description, (5) recognition and (6) interpretation. Sensing is the process that yields a visual image. The sensor, most commonly a TV camera, acquires an image of the object that is to be recognised or inspected. The digitizer converts this image into an array of numbers, representing the brightness values of the image at a grid of points; the numbers in the array are called pixels. Pre-processing deals with techniques such as noise reduction and the enhancement of details. The pixel array is fed into the processor, a general-purpose or custom-built computer that analyses the data and makes the necessary decisions. Segmentation is the process that partitions an image into objects of interest. The segmentation of images should result in regions which correspond to objects, parts of objects or groups of objects which appear in the image. The features of these entities, along with their positions relative to the entire image, help us to make a meaningful interpretation. Description deals with the computation of features such as size, shape, texture, etc. suitable for differentiating one type of object from another. Recognition is the process that identifies these objects. Finally, interpretation assigns meaning to an ensemble of recognised objects.
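To make these stages concrete, the sketch below runs a toy image through pre-processing, segmentation and description. It is only an illustration under assumed choices (a synthetic test image, a simple averaging filter and a fixed threshold of 100), not a method described in this article, and it uses the NumPy and SciPy libraries.

```python
# A highly simplified pipeline sketch: sensing -> pre-processing ->
# segmentation -> description. All parameter values are illustrative.
import numpy as np
from scipy import ndimage

# "Sensing": a synthetic 100x100 greyscale image with two bright objects.
image = np.zeros((100, 100))
image[20:40, 20:40] = 200                         # object 1
image[60:90, 55:75] = 180                         # object 2
image += np.random.normal(0, 10, image.shape)     # simulated sensor noise

# Pre-processing: reduce noise with a small averaging filter.
smoothed = ndimage.uniform_filter(image, size=3)

# Segmentation: partition the image into object vs. background by threshold.
mask = smoothed > 100
labels, count = ndimage.label(mask)               # one label per connected region

# Description: simple features (area, centroid) for each segmented region.
for region in range(1, count + 1):
    area = np.sum(labels == region)
    centroid = ndimage.center_of_mass(labels == region)
    print(f"region {region}: area={area}, centroid={centroid}")

# Recognition and interpretation would then classify these regions and
# assign meaning to the ensemble, e.g. "two rectangular parts on a belt".
```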
Any description of the human visual system only serves to illustrate how far computer vision has to go before it approaches human ability.
In terms of image acquisition, the eye is totally superior to any camera system yet developed. The retina, on which the upside-down image is projected, contains two classes of discrete light receptors - cones and rods. There are between 6 and 7 million cones in the eye, most of them located in the central part of the retina called the fovea. These cones are highly sensitive to colour and the eye muscles rotate the eye so that the image is focused primarily on the fovea. The cones are also sensitive to bright light and do not operate in dim light. Each cone is connected by its own nerve to the brain.
There are at least 75 million rods in the eye distributed across the surface of the retina. They are sensitive to light intensity but not to colour.
The range of intensities to which the eye can adapt is of the order of 10¹⁰, from the lowest visible light to the highest bearable glare. In practice the eye performs this amazing task by altering its own sensitivity depending on the ambient level of brightness.
Of course, one of the ways in which the human visual system gains over the machine is that the brain possesses some 10¹⁰ cells (or neurons), some of which have well over 10,000 contacts (or synapses) with other neurons. If each neuron acts as a type of microprocessor, then we have an immense computer in which all the processing elements can operate concurrently. Probably the largest man-made computer still contains fewer than a million processing elements, so the majority of the visual and mental processing tasks that the eye-brain system can perform in a flash have no chance of being performed by present-day man-made systems.
Added to these problems of scale is the problem of how to organize such a large processing system, and also how to program it. Clearly, the eye-brain system is partly hard-wired but there is also an interesting capability to program it dynamically by training during active use. This need for a large parallel processing system with the attendant complex control problems illustrates clearly that machine vision must indeed be one of the most difficult intellectual problems to tackle.
Part of the problem lies in the fact that the sophistication of the human visual system makes robot vision systems pale by comparison.
Developing general-purpose computer vision systems has proved surprisingly difficult and complex. This has been particularly frustrating for vision researchers, who experience daily the apparent ease and spontaneity of human perception.
As can be seen from the information given above, developing a computer-vision system requires knowledge. Since the eye performs far better than even the best computer-vision system, the very complex eye-brain system requires even more comprehensive knowledge to build. When man understands and discovers the functioning of the eye-brain system, better computer-vision systems will be developed. That means the eye-brain system (like other systems in the human body) is a very deep and rich source of knowledge. As this system has links with other systems in the body, it clearly shows that the maker of these systems is One who knows everything about every single part of the whole body, for not only the eye-brain system but the entire body develops accordingly.
So, who can be the maker of this fantastic eye-brain system? If it is said that it is self-creating, this has no meaning, because everybody knows that such an important system cannot create itself, just as a computer-vision system cannot give itself existence. As for chance, is it possible for such a system, which is full of knowledge for human beings to imitate in order to develop computer-vision systems, to be made by chance at all? Of course not. So who is the maker of the eye-brain system?
The maker or the creator of this system can only be One who has supernatural power. He says in His Holy Book:
‘Have we not made for him (human being) a pair of eyes?’ (The Holy Qur’an 90:8).
Yes, indeed, He made them just as He made the whole of the rest of the cosmos with His unlimited knowledge.