Side_1_360

of the telephony system are used. Hence, no echo cancellation is performed by the telephony system even though long distance calls are made. That forces VoIP to include echo cancellation among the services supplied.

Voice Activity Detection

In the ordinary telephone network a two-way simultaneous link is set up between sender and receiver. This link carries voice at a rate of 64 kb/s in both directions. Usually only one of the parties is active at any one time, and even the active party has breaks and pauses in a normal speech pattern. Hence, the utilization of this two-way link is usually below 40 %. VoIP can exploit this fact through voice activity detection (VAD), which improves transmission efficiency by reducing the bandwidth needed for a call. A generic outline of the VAD algorithm is depicted in Figure 5: the algorithm measures the signal magnitude (dB), decides when the voice is inactive, and then stops the transmission of packets in that direction for the moment. To be on the safe side, the algorithm waits a fixed amount of time, the hang-over time, after it detects a drop in the voice magnitude before it stops the voice sample packet transmission completely. The hang-over time is on the order of hundreds of ms (typically 150–250 ms). Another problem is to distinguish between voice and background noise; so that the VAD can calibrate itself, it is disabled at the beginning of each new call. Even after calibration, however, it can be difficult to detect when a new voice spurt begins. The algorithm cuts off the beginning of each new voice spurt, waiting until it is sure that it is a new voice spurt and not, for example, a noise peak. This phenomenon is called front-end speech clipping, and is usually not noticeable to the listener.
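The decision logic described above can be sketched in a few lines of Python. This is a minimal illustration and not the algorithm of any particular product: the function names, the fixed threshold of -40 dB, the 20 ms frame length and the two-frame onset rule are all assumptions chosen for the example, and the initial calibration phase is left out. Only the 200 ms hang-over value is taken from the typical range quoted in the text.

```python
import math


def frame_level_db(samples):
    """RMS level of one frame in dB relative to full scale (samples in [-1, 1])."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(max(rms, 1e-10))  # floor avoids log10(0) on silence


def vad_decisions(levels_db, threshold_db=-40.0, frame_ms=20,
                  hangover_ms=200, onset_frames=2):
    """Return one decision per frame: True = transmit, False = suppress.

    All parameter values are illustrative assumptions.
    """
    hangover_frames = hangover_ms // frame_ms
    transmitting = False
    loud_run = 0   # consecutive frames above the threshold
    quiet_run = 0  # consecutive frames at or below the threshold
    decisions = []
    for level in levels_db:
        if level > threshold_db:
            loud_run += 1
            quiet_run = 0
            # Wait for several loud frames before starting to send, so a
            # single noise peak does not open the channel. The frames
            # skipped here are the front-end speech clipping.
            if not transmitting and loud_run >= onset_frames:
                transmitting = True
        else:
            quiet_run += 1
            loud_run = 0
            # Keep sending during the hang-over time after the level drops.
            if transmitting and quiet_run > hangover_frames:
                transmitting = False
        decisions.append(transmitting)
    return decisions


# 3 quiet frames, 5 frames of speech, 15 quiet frames (levels in dB)
levels = [-60.0] * 3 + [-20.0] * 5 + [-60.0] * 15
decisions = vad_decisions(levels)
```

With these assumed settings the first speech frame is clipped, transmission continues for ten frames (200 ms) after the speech ends, and an isolated loud frame never starts transmission.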

Standards

The lack of interoperability among VoIP products has been a major stumbling block to widespread acceptance of the technology. The ITU’s H.323 umbrella standard, shown in Figure 6, was the first to be proposed for VoIP interoperability, but it proved complex and difficult to implement. As a result, other less unwieldy standards were proposed in its place, and until recently there has been little consensus on which VoIP standards would be the most widely implemented. Even though H.323 is the dominant standard at present, most vendors foresee several standards coexisting for quite some time. The most widely supported version is H.323 version 2, but versions 3 and 4 are rapidly catching up. (It should be pointed out that H.323 version 1 is not forward compatible with the later versions of H.323.) Other supported standards are SIP (Session Initiation Protocol) by the IETF, the Media Gateway Control Protocol (MGCP) and H.248. SIP is an application layer signalling protocol that specifies call control for multiparty sessions, IP phones or multimedia distribution. Unlike H.323, which is based on binary encoding, SIP is a text-based protocol that is usually easier to implement. Further information regarding SIP can be found in [6,7].
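To illustrate what "text-based" means in practice, the following Python sketch builds and parses a simplified SIP-style request using nothing but string operations. It is an illustration under stated assumptions, not a conforming implementation: the helper names `build_invite` and `parse_message` and the example addresses are hypothetical, and a real SIP request carries further mandatory header fields (for example Via and Max-Forwards) that are omitted here.

```python
def build_invite(caller, callee, call_id):
    """Build a simplified, SIP-style INVITE as plain text (not a complete message)."""
    lines = [
        f"INVITE sip:{callee} SIP/2.0",  # request line: method, URI, version
        f"From: <sip:{caller}>",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        "Content-Length: 0",
    ]
    # Header lines end with CRLF; an empty line separates headers from the body.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_message(text):
    """Split a message into method, request URI and a dict of header fields."""
    head = text.split("\r\n\r\n", 1)[0]
    start_line, *header_lines = head.split("\r\n")
    method, uri, _version = start_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, value = line.split(":", 1)  # split on the first colon only
        headers[name.strip()] = value.strip()
    return method, uri, headers


# Hypothetical addresses and call identifier, used only for the example
message = build_invite("alice@example.com", "bob@example.com", "abc123")
method, uri, headers = parse_message(message)
```

Because the message is ordinary text, it can be produced, inspected and debugged with standard string handling, which is one reason a text-based protocol tends to be easier to implement than a binary-encoded one.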

MGCP is designed as a simple mechanism whose main function is to control the gateways, while relying on external call control intelligence for more complex functions. With the MGCP model, the gateway focuses on the audio signal translation function while a call agent, external to the gateway, handles the signalling and call processing functions. By separating the internal gateway functions from the external signalling function, the effort needed to implement, upgrade and maintain the gateway is reduced to a minimum. This increases the likelihood of widespread use of this technol-

Figure 5 The voice activity detection (VAD) algorithm, used to decrease the required bandwidth for VoIP calls

[Figure: signal magnitude (dB) plotted against time, with the noise floor, the hang-over period and front-end speech clipping marked]
