Psychovisual Masks and Intelligent Streaming RTP Techniques for the MPEG-4 Standard - Citegraph

Paper Info

Title
Psychovisual Masks and Intelligent Streaming RTP Techniques for the MPEG-4 Standard

Abstract
In today's multimedia audio-video communication systems, data compression plays a fundamental role by reducing bandwidth waste and the cost of infrastructure and equipment. Among the different compression standards, MPEG-4 is becoming more and more widely accepted. One of the fundamental aspects of this standard is the possibility of coding video objects separately (i.e. separating moving objects from the background and adapting the coding strategy to the video content); nevertheless, currently implemented codecs work only at the full-frame level. As a result, many advantages of the flexible MPEG-4 syntax are missed. This gap is due both to the difficulty of properly segmenting moving objects in real scenes (featuring arbitrary motion of the objects and of the acquisition sensor) and to the current use of these codecs, which are mainly oriented towards the DVD backup market (a full-frame approach is enough for these applications).
In this paper we propose a codec for MPEG-4 real-time object streaming that codes the moving objects and the scene background separately. The proposed codec is capable of changing its strategy during transmission by analysing the video and setting the parameters and modalities accordingly. For example, the background can be processed as a whole or divided into "slightly-detailed" and "highly-detailed" zones that are coded in different ways to reduce the bit-rate while preserving the perceived quality. Psychovisual masks and other video-content-based measurements are used as inputs for a Self Learning Intelligent Controller (SLIC) that changes the parameters and the transmission modalities.
The current implementation is based on the ISO 14496 standard code, which allows Video Object (VO) transmission (other open-source codecs such as DivX, Xvid, and Cisco's MPEG4IP have been analyzed but, as of today, they do not support VOs).
The original code has been heavily modified to integrate the SLIC and to adapt it for real-time streaming. A custom RTP (Real Time Protocol) has been defined and a client-server application has been developed. The viewer can decode and demultiplex the stream in real time, while adapting to the changing modalities adopted by the server according to the current video content.
The proposed codec works as follows: the image background is separated by a segmentation module and transmitted using a wavelet compression scheme similar to that used in JPEG2000. The VOs are coded separately and multiplexed with the background stream. At the receiver the stream is demultiplexed to obtain both the background and the VOs, which are subsequently pasted together.
The final quality depends on many factors, in particular: the quantization parameters, the Group Of Video Object (GOV) length, the GOV structure (i.e. the number of I-, P- and B-VOPs), and the search area for motion compensation. These factors are strongly related to the following measurement parameters (defined during development): the Objects Apparent Size (OAS) in the scene, the Video Object Incidence factor (VOI), and the temporal coherence (indirectly measured through the factor TC). The SLIC module analyzes the currently transmitted video and selects the most appropriate settings by choosing from a predefined set of transmission modalities. For example, for a highly temporally correlated sequence, the number of B-VOPs is increased to improve the compression ratio. The strategy for selecting the number of B-VOPs turns out to be very different from those reported in the literature for MPEG-1 and MPEG-2 coding; this is due to the different behaviour of the temporal correlation when it is limited to moving objects only. The SLIC module also decides how to transmit the background. In our implementation we adopted the Visual Brain theory, i.e. the study of what the "psychic eye" can get from a scene. According to this theory, a Psychomask Image Analysis (PIA) module has been developed to extract the visually homogeneous regions of the background. The PIA module produces two complementary masks, one for the visually "smooth" zones and one for the "rough" zones; these zones are compressed with different strategies and encoded into two multiplexed streams. Practical experiments showed that separate coding is advantageous only if the low-variance zones exceed 50% of the whole background area (due to the overhead of transmitting the zone masks). The SLIC module decides the appropriate transmission modality by analyzing the results produced by the PIA module.
The main features of this codec are low bit rate, good image quality, and coding speed. The current implementation runs in real time on standard PC platforms; the major limitation is the fixed position of the acquisition sensor. This limitation is due to the difficulty of separating moving objects from the background when the acquisition sensor moves: our current real-time segmentation module does not produce suitable results if the sensor moves (only slight oscillatory movements are tolerated). In any case, the system is particularly suitable for tele-surveillance applications at low bit-rates, where the camera is usually fixed or alternates among some predetermined positions (our segmentation module is capable of accurately separating moving objects from the static background when the acquisition sensor stops, even if different scenes are seen as a result of sensor displacements). Moreover, the proposed architecture is general, in the sense that when robust real-time segmentation systems (capable of separating objects from the background while the sensor itself is moving) become available, they can be easily integrated while leaving the rest of the system unchanged.
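The background decision described in the abstract (two complementary masks, with separate coding only when the low-variance zones exceed 50% of the background) could be sketched roughly as follows. This is an illustrative sketch, not the authors' PIA or SLIC code: the block size, the variance threshold, the per-block variance test, and all function names are assumptions introduced here; only the smooth/rough split and the 50% rule come from the abstract.

```python
from statistics import pvariance

BLOCK = 4                   # assumed block size in pixels (not from the paper)
VAR_THRESHOLD = 25.0        # assumed cut-off between "smooth" and "rough"
MIN_SMOOTH_FRACTION = 0.5   # the 50% rule stated in the abstract


def smooth_mask(image, block=BLOCK, var_threshold=VAR_THRESHOLD):
    """Return a per-block mask: True where the block is low-variance ("smooth").

    `image` is a list of rows of grey levels whose sides are assumed to be
    multiples of `block`. The "rough" mask is the complement of this one.
    """
    mask = []
    for top in range(0, len(image), block):
        row = []
        for left in range(0, len(image[0]), block):
            pixels = [image[y][x]
                      for y in range(top, top + block)
                      for x in range(left, left + block)]
            row.append(pvariance(pixels) < var_threshold)
        mask.append(row)
    return mask


def smooth_fraction(mask):
    """Fraction of background blocks classified as smooth."""
    cells = [cell for row in mask for cell in row]
    return sum(cells) / len(cells)


def choose_background_mode(image):
    """Pick "separate" (two streams plus masks) or "whole" (single stream)."""
    if smooth_fraction(smooth_mask(image)) > MIN_SMOOTH_FRACTION:
        return "separate"
    return "whole"
```

Calling `choose_background_mode` on a grey-level background frame returns the hypothetical mode label; a real controller would presumably also weigh the cost of transmitting the masks, which the abstract cites as the reason for the 50% threshold.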

Year	DOI	Venue
2003	10.1117/12.502557	VISUAL COMMUNICATIONS AND IMAGE PROCESSING 2003, PTS 1-3
Keywords	Field	DocType
communication system,data compression,video,image quality,image analysis,it strategy,quantization,real time,client server,multiplexing,operating systems,operating system,sensors,telecommunications,wavelets,jpeg2000,multimedia,switches,compression ratio	Computer vision,Microsoft Windows,Source code,Computer science,Motion compensation,Communications system,Artificial intelligence,JPEG 2000,Data compression,MPEG-4,Codec	Conference
Volume	ISSN	Citations
5150	0277-786X	0
PageRank	References	Authors
0.34	1	2

Authors (2 rows)

Name	Order	Citations	PageRank
Alessandro Mecocci	1	60	14.38
Francesco Falconi	2	0	0.34
