In general terms, VoIP (Voice over IP) refers to a family of protocols that let users send and receive voice over an IP network. It was originally conceived as a cheaper alternative to dedicated circuit-switched lines between calling and called parties, since IP provides a packet-based infrastructure that many hosts can share at the same time. Today, the 'V' in VoIP is a misnomer: the technology has evolved to the point where users can transmit and receive voice, video and rich data over the IP network, collectively referred to as 'Multimedia'.
The following functionality is needed for establishing an end-to-end call between two parties (the simplest case):
A bit of history: H.323 was conceived by the ITU as a protocol that aimed to emulate existing telephony services over a packet network, while SIP was conceived by the IETF. Both are well-known standardization bodies, but the ITU approached the problem as 'how to fit telephony services into a distributed packet network model', while the IETF approached it as 'the Internet is the architecture; if traditional telephony services must be mapped to the Internet, then the services need to be re-architected to fit the Internet and not the other way around'. After several years of contention, technical differences, and a lot of marketing and hype, SIP has become the dominant protocol, even though H.323 is still ahead in terms of deployed base. All new deployments, however, are SIP based.
In this section, we describe four of the most common challenges faced in large-scale VoIP deployments.
From a deployment perspective, most users are behind a NAT and/or a firewall of some sort. Simply put, this creates unique 'reachability' challenges for users behind such boxes. As an example, many enterprise networks assign 'private IP addresses' to computers. These addresses are only meaningful within the domain in which they are allotted and are not globally reachable. In such a scenario, the NAT box translates between private and public IP addresses in both directions. NAT is a common way around the address space limits of IPv4, where there may not be enough public IP addresses for all computers.
Another example is the broadband connection at home. We usually buy a DSL/Cable connection from our provider and pay for a single public IP address, yet most of us need to connect more than one PC to it. A common solution is to allocate private IP addresses to each internal PC and have the NAT box multiplex and de-multiplex the connections using a combination of IP+port for the internal PCs. Technically, this is called NAPT (Network Address Port Translation): IP+port is used to de-multiplex multiple simultaneous connections over a single public IPv4 address.
While all of this works well for most traffic, NAT poses a unique challenge for protocols that embed IP addressing information in the packet payload. Consider, for example, a standard SIP message like so:
INVITE sip:firstname.lastname@example.org SIP/2.0
(rest of details follow)
In this case, 10.4.16.12 is a private IP address belonging to the client in question, and it appears inside the SIP message itself. When the message is sent in an IP packet with a source IP of 10.4.16.12, the NAT box knows that the source address in the IP header needs to be rewritten to a public IP+port, so that a response to this request can be routed back to the right client. However, a traditional NAT does not look at the contents of the payload, and hence it does not change the IP address inside the SIP message.
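The mismatch described above can be illustrated with a small Python sketch. It is only a model, not a real NAT: the `Packet` class, the `Via` and `Contact` header lines, the callee and caller names, and the public address 203.0.113.7:40001 are all assumptions made for the example (the private address 10.4.16.12 is the one used in the text).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    """A toy packet: transport-level source address plus an opaque payload."""
    src_ip: str
    src_port: int
    payload: str


def nat_translate(packet: Packet, public_ip: str, public_port: int) -> Packet:
    """Rewrite only the source IP+port, as a payload-unaware NAT would."""
    return replace(packet, src_ip=public_ip, src_port=public_port)


# Hypothetical SIP request whose headers carry the client's private address.
sip_invite = (
    "INVITE sip:callee@example.org SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 10.4.16.12:5060\r\n"
    "Contact: <sip:caller@10.4.16.12:5060>\r\n"
    "\r\n"
)

outbound = Packet(src_ip="10.4.16.12", src_port=5060, payload=sip_invite)
translated = nat_translate(outbound, public_ip="203.0.113.7", public_port=40001)

print("IP-level source after NAT:", translated.src_ip, translated.src_port)
print("Private address still in payload:", "10.4.16.12" in translated.payload)
```

The far end sees the packet arrive from the public address, but any reply addressed using the payload's `Via` or `Contact` values is sent to 10.4.16.12, which is not routable outside the private network.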
In essence, this results in a situation where the response never reaches the originator. This is a classic case for application-aware gateways, which perform application-level NATing in addition to IP-level NATing. One possible solution is to deploy such a gateway, called an Application Layer Gateway (ALG), in the network so that it performs SIP NATing as well. The problem with this solution is that it defeats end-to-end encryption. If the party behind the NAT wished to set up a secure signaling path with the called party, the ALG would have to obtain the encryption keys to be able to decode the payload for its operations. Therefore, the most desirable solution is for the endpoint to 'discover' its public IP address and insert the correct address in the SIP/SDP payload itself.
The IETF has proposed several solutions for NAT traversal. The most common and simplest of them is a protocol specification called STUN. In essence, STUN is a client/server protocol that requires a "STUN client" to be installed with the UA behind the NAT and a STUN server to be available on the public Internet, which responds to the client's requests with its "view" of the IP address and port the request was received from. STUN works for both TCP and UDP, and it is expected that before a UA makes a call, it executes the STUN procedures to discover its public IP address+port and encodes its SIP message accordingly.
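As a rough illustration of the STUN exchange, the following Python sketch builds a Binding Request and decodes the public address from a Binding Success Response. It assumes the message layout of the later STUN revision (RFC 5389: 20-byte header, magic cookie 0x2112A442, XOR-MAPPED-ADDRESS attribute), handles IPv4 only, and omits retransmission, authentication and error handling. The helper names and the `server_host` parameter are hypothetical; no particular server is implied.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id: bytes) -> bytes:
    """20-byte STUN header: type, body length (0), magic cookie, 12-byte ID."""
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id


def parse_xor_mapped_address(message: bytes):
    """Return (ip, port) from a Binding Success Response, or None."""
    if len(message) < 20:
        return None
    msg_type, length, cookie = struct.unpack("!HHI", message[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[offset:offset + 4])
        value = message[offset + 4:offset + 4 + attr_len]
        # value = reserved byte, family byte (0x01 = IPv4), X-Port, X-Address
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        offset += 4 + attr_len + (-attr_len % 4)  # attributes are 32-bit aligned
    return None


def discover_public_address(server_host: str, server_port: int = 3478,
                            timeout: float = 2.0):
    """Ask a STUN server (supplied by the caller) how it sees this socket."""
    tid = os.urandom(12)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_binding_request(tid), (server_host, server_port))
        data, _ = sock.recvfrom(2048)
    if data[8:20] != tid:  # response must echo our transaction ID
        return None
    return parse_xor_mapped_address(data)


def build_fake_success(transaction_id: bytes, ip: str, port: int) -> bytes:
    """Offline stand-in for a server reply, used only to exercise the parser."""
    xport = port ^ (MAGIC_COOKIE >> 16)
    xaddr = struct.unpack("!I", socket.inet_aton(ip))[0] ^ MAGIC_COOKIE
    attr = struct.pack("!HHBBHI", XOR_MAPPED_ADDRESS, 8, 0, 0x01, xport, xaddr)
    header = struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE)
    return header + transaction_id + attr
```

A UA would call something like `discover_public_address(...)` before placing a call and write the returned IP+port into its SIP/SDP payload in place of the private address.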
One of the drawbacks of STUN is that it does not work with symmetric NAT configurations. A symmetric NAT creates its mapping based on both the source IP+port and the destination IP+port. Simply put, this means that the public IP+port the UA discovered by contacting the STUN server will be different from the public IP+port the NAT assigns when the UA contacts another external UA.
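The difference can be seen in a toy model of the NAT's mapping table. This is a minimal sketch: the `Nat` class, the starting port 40000 and all addresses are assumptions chosen for the example, and real NATs add filtering rules and timeouts that are ignored here.

```python
import itertools


class Nat:
    """Toy NAT that hands out public ports from a counter."""

    def __init__(self, public_ip: str, symmetric: bool):
        self.public_ip = public_ip
        self.symmetric = symmetric
        self._ports = itertools.count(40000)
        self._table = {}

    def map_outbound(self, src, dst):
        """Return the public (ip, port) used for traffic from src to dst."""
        # A symmetric NAT keys the mapping on source AND destination;
        # other NAT types key it on the internal source only.
        key = (src, dst) if self.symmetric else src
        if key not in self._table:
            self._table[key] = next(self._ports)
        return (self.public_ip, self._table[key])


ua = ("10.4.16.12", 5060)            # UA behind the NAT
stun_server = ("198.51.100.1", 3478)  # hypothetical STUN server
remote_ua = ("192.0.2.50", 5060)      # hypothetical far-end UA

for symmetric in (False, True):
    nat = Nat("203.0.113.7", symmetric=symmetric)
    learned = nat.map_outbound(ua, stun_server)  # what STUN reports
    actual = nat.map_outbound(ua, remote_ua)     # what the far end sees
    print("symmetric" if symmetric else "non-symmetric",
          "| learned via STUN:", learned, "| used for call:", actual)
```

With the non-symmetric NAT the two mappings are identical, so the address learned through STUN is usable. With the symmetric NAT they differ, so the address advertised in the SIP message is not the one the far end must reply to.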
To solve this issue, the IETF proposed TURN, in which a 'TURN server' acts as a relay for UAs behind symmetric NATs. In other words, if a symmetric NAT is discovered, an IP address and port on the TURN server can safely be used as the public IP+port for the UA behind the NAT, irrespective of which outside UA it is contacting. The TURN server, in turn, ensures that this relaying is only allowed for the selected calling/called parties, providing additional security against general abuse.
Finally, even though TURN works for all NAT configurations, it is a middle-box relay architecture, so STUN is preferable wherever it works. Therefore, there was a need to specify a 'framework of discovery' that would automatically try the various options and select the best method of NAT traversal. Thus ICE was born. ICE (Interactive Connectivity Establishment) is a procedure that describes how UAs can try options such as STUN, TURN and other means to discover the best traversal method for a given scenario. Unfortunately, there are very few deployments that use ICE because of its complexity. (The IETF draft has been considerably simplified since it was first authored, so we hope to see more deployments soon.)
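The idea of preferring direct paths over relayed ones can be sketched with the candidate priority formula from the published ICE specification (RFC 8445, section 5.1.2.1), using its recommended type preferences. The selection loop below is a deliberate simplification: real ICE forms candidate pairs and runs STUN connectivity checks on each pair, whereas here `is_reachable` is a hypothetical callback standing in for those checks, and the candidate addresses are made up.

```python
# Recommended type preferences from RFC 8445: direct (host) candidates rank
# highest, STUN-derived (server-reflexive) next, TURN relays last.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_pref: int = 65535,
                       component_id: int = 1) -> int:
    """priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id)."""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component_id)


def choose_candidate(candidates, is_reachable):
    """Return the highest-priority candidate that passes the (stubbed) check."""
    ordered = sorted(candidates, key=lambda c: candidate_priority(c[0]), reverse=True)
    for cand in ordered:
        if is_reachable(cand):
            return cand
    return None


# Hypothetical candidates gathered by a UA: (type, ip, port)
candidates = [
    ("relay", "198.51.100.9", 50000),  # allocated on a TURN server
    ("host", "10.4.16.12", 6789),      # local interface
    ("srflx", "203.0.113.7", 40001),   # learned via STUN
]

# Pretend the UA sits behind a symmetric NAT: only the relay path works.
working = {"relay"}
chosen = choose_candidate(candidates, lambda c: c[0] in working)
print("selected:", chosen)
```

Because the relay candidate has the lowest priority, it is used only when the host and STUN-derived candidates both fail, which matches the preference for STUN over TURN described above.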
NATing is not the only problem plaguing this space. Many firewalls apply additional security measures, such as opening ports only symmetrically: if a client behind the firewall opens port 6789 for media streaming, the UA on the far side can send media only to that port and not to any other. In addition, several firewalls close a connection if there is no traffic for a while in the tunnel opened between a UA inside the firewall and one outside. To avoid this problem, several Session Border Controllers (SBC) send fake pings simply to keep the connection open. Firewall and NAT traversal is a detailed topic in its own right, and readers are urged to refer to the links provided below for detailed analysis. Suffice it to say that one of the most critical requirements for VoIP phones is their ability to 'auto-configure' and 'auto-discover' so that users do not need to worry about these complexities. Skype is an example of a successful auto-discovery solution that makes NAT traversal transparent to the user. Columbia University researchers earlier dissected Skype's NAT traversal and found that Skype uses a protocol very similar to STUN and TURN.
Quality of Service is a much talked about problem space for Internet Telephony. The basic problem statement is "If the Internet is a massively distributed bunch of routers with no guarantee of which voice packet gets priority over another, how does one ensure that quality of service is consistent?"
There are different ways of looking at this problem. From one perspective, there are several reports from MCI which state that the Internet core backbone is 98% unutilized and that QoS is less of an issue than vendors make it out to be. Another view holds that instead of focusing on QoS priorities, optimized codecs can be used to reduce bandwidth utilization. While these arguments are fine for the core backbone and for non-critical applications, QoS becomes a serious issue when one considers over-the-air transmission for VoIP. Radio resources are scarce, and in mission-critical applications, consumers will want an SLA that guarantees Quality of Service.
There are several challenges to QoS, from both an implementation and a deployment perspective. With WiFi-related technologies being deployed for the first-hop access for VoIP phones, noise and interference become significant issues that can degrade the overall user experience, and proper QoS allocation schemes can help reduce these problems. QoS mechanisms address several real-life problems with VoIP, including traffic shaping (high-bandwidth applications that hog shared bandwidth can be controlled or limited) and congestion avoidance (using algorithms such as Weighted Random Early Detection (WRED), where packets are dropped probabilistically as the queue fills and the drop thresholds depend on traffic priority, which in effect leaves more queue space for higher-priority packets). Proper QoS management is critical for deploying VoIP in large-scale networks where first-hop bandwidth is restricted, such as modern deployments of 3GPP cellular networks.
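To make the WRED idea concrete, here is a minimal Python sketch of the drop decision. It uses the standard RED drop curve (no drops below a minimum threshold, a linear ramp up to `max_p`, certain drop above the maximum threshold) and gives each traffic class its own thresholds. The class names and the numeric profile values are assumptions for illustration, not recommended settings.

```python
def red_drop_probability(avg_queue: float, min_th: float, max_th: float,
                         max_p: float) -> float:
    """Drop probability for a given average queue depth (in packets)."""
    if avg_queue < min_th:
        return 0.0                       # queue is short: never drop
    if avg_queue >= max_th:
        return 1.0                       # queue is too long: always drop
    # Linear ramp between the two thresholds.
    return max_p * (avg_queue - min_th) / (max_th - min_th)


# Hypothetical per-class profiles: (min_th, max_th, max_p).
# Voice only starts seeing drops when the queue is much fuller.
WRED_PROFILES = {
    "voice": (30, 40, 0.05),
    "best_effort": (10, 30, 0.20),
}


def wred_drop_probability(traffic_class: str, avg_queue: float) -> float:
    min_th, max_th, max_p = WRED_PROFILES[traffic_class]
    return red_drop_probability(avg_queue, min_th, max_th, max_p)


for depth in (5, 20, 35):
    print(depth, {cls: round(wred_drop_probability(cls, depth), 3)
                  for cls in WRED_PROFILES})
```

At a queue depth of 20 packets, best-effort traffic is already being dropped with some probability while voice is untouched, which is how the higher-priority class ends up with more usable queue space.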
One of the biggest challenges of using the Internet as a medium for communication is that any mode of communication over it inherits the security concerns of the way the Internet is deployed today. One of the most common issues is that of Identity and Trust. In the context of the discussion here, we are talking about subscriber and network identity and trust, not lower-level security issues such as IP spoofing. (IP spoofing is a technique where a rogue client uses a raw socket interface to forge its source IP address so that it appears to belong to another node. There are several ways to address IP-level spoofing, including strong sequencing algorithms, packet filtering, etc.)
At the application level, the basic questions are: "Can the Network trust the subscriber?"
"Can the subscriber trust the network?"
Since the Internet is, as described earlier, a distributed environment with no central control, both of the questions above are important to assess. As an example, a rogue client may easily send a "From:email@example.com" in a SIP INVITE message, but the network needs to assess whether the advertised identity is really the correct identity. Common mechanisms include authenticating users with schemes such as Digest, IMS-AKA and other secure mechanisms. Yet another concept that has gained acceptance is "Network Asserted Identity". In this scheme, even if the user has been authenticated, the network has the right to override or supplement this identity with a "Network Asserted Identity (NAI)" which is used by other nodes in the network. NAI is very useful for applying additional security policies (for example: Bob has authenticated correctly, but he is using an IP domain that was never used before, so the network adds an NAI that signals network services to possibly de-activate certain sensitive services provisioned to Bob). The second question is equally important: can the subscriber trust the network? To address this issue, deployments typically use a mutual authentication mechanism where both sides can verify each other. One example is establishing an IPSec tunnel between the client and the network.
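As an illustration of the Digest mechanism mentioned above, the sketch below computes the response hash the way HTTP Digest (RFC 2617, the scheme SIP borrows) defines it with MD5. The function names are our own, the SIP user, realm, password and nonce in the demo are made-up values, and a real server would also manage nonce lifetimes and replay protection, which are left out here.

```python
import hashlib
import hmac


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce,
                    nc=None, cnonce=None, qop=None) -> str:
    """Compute the Digest 'response' value a client puts in its request."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # secret known to both sides
    ha2 = md5_hex(f"{method}:{uri}")                 # ties the hash to this request
    if qop:
        return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


def server_verifies(received_response: str, stored_password: str,
                    username, realm, method, uri, nonce) -> bool:
    """The server recomputes the hash from its own copy of the credentials."""
    expected = digest_response(username, realm, stored_password, method, uri, nonce)
    return hmac.compare_digest(expected, received_response)


# Hypothetical SIP registration: the proxy challenged with this nonce.
nonce = "f84f1cec41e6cbe5aea9c8e88d359"
resp = digest_response("bob", "example.org", "s3cret",
                       "REGISTER", "sip:example.org", nonce)
print("client response:", resp)
print("accepted:", server_verifies(resp, "s3cret", "bob", "example.org",
                                   "REGISTER", "sip:example.org", nonce))
```

The password itself never crosses the network; a rogue client that merely claims Bob's identity in the From header cannot produce a matching response without it.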
There has been a lot of discussion in the past on how serious a problem Spam over Internet Telephony (SPIT) and Spam over Instant Messaging (SPIM) will become as VoIP deployments increase their market share.
Spam itself, as we all know, is all-pervasive in the email world. I was reading an interesting report from Symantec which states that 67% of email is spam these days. While the percentage is staggering, there is some comfort in knowing that this percentage has stabilized, which might mean that spam filters/gateways are maturing at a rate that is able to cope with the mutation of spam tricks.
The interesting thing is that even though pundits scream about the problem of spam over Internet Telephony, not too many carriers are biting, at least for now. It's not that they don't think it's important; it just seems that they have other problems to solve (like it or not, security is the hardest and the last solved problem in real life).
In short, SPIT is a problem that will eventually surface. The logic: the infrastructure needed for SPIT is very similar to the infrastructure needed for spam. The cost is very low, and there are no Do-Not-Call registries on the Internet (at least for now). In addition, it is harder to trace a call back to a valid source IP address than to an originating phone number. All in all, the Internet offers SPITers better anonymity and lower costs, so why not? In fact, there are companies who already think it is a big problem and have products out. Not to mention that this space will soon be chock-a-block with patents.
However, we really don't think SPIT is a huge problem for "walled-garden" networks. It is a much bigger problem in peer-to-peer networks.
There are a few reasons for this:
One of the strongest and most effective ways to avoid SPIT is good authentication. In an ideal world, if the identity of every user could be authenticated, the chances that a spam bot gets to you would reduce greatly. Walled-garden networks usually operate in "circles of trust". Let us suppose firstname.lastname@example.org were to call email@example.com via SIP. For the att.net proxy to accept the call from bob, it would most likely expect a digital authentication certificate from Verizon's proxy telling the AT&T proxy that this call actually came from Verizon. In other words, AT&T trusts the Verizon network and expects the Verizon network to have authenticated its users. Such 'trust circles' between service providers are common. AT&T cannot go ahead and authenticate each and every user on a foreign domain, so instead it decides to trust the network, based on strong authentication. It is Verizon's responsibility to make sure that bob is a valid user. For a spammer to break such a network would require that:
The third point works only if the UA itself accepts calls from any source. A secure UA should be configured to reject any call that is not passed on to it from its inbound proxy via TLS. In other words, if firstname.lastname@example.org were to call email@example.com, Sue's UA should check whether the call came along with credentials from its authorized inbound SIP proxy, and if not, the call should be rejected, prompting the caller to go through the proxy. If all UAs were indeed configured this way, the problem of SPIT would be reduced a great deal (not eliminated, but significantly reduced). However, not all UAs are configured this way. In addition, while this can be enforced within a walled-garden network, problems arise when:
One way around it is for UAs to allow caller authentication. In other words, even if a call comes via unknown channels, instead of rejecting it, challenge the caller for identity, while at the same time not making it cumbersome for the called user (for example, if every SPIT call rang your phone and only then challenged the caller, you'd switch back to PSTN within a day! At least in PSTN, the FCC has done an effective job with the Do Not Call registry). For example, Vonage, which is an example of a walled-garden implementation, requires authentication at the very least to let calls in.
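The screening policy described in the last two paragraphs could look roughly like the following sketch. Everything here is hypothetical: the `IncomingCall` fields, the proxy host name and the three-way decision are our own simplification, and a real UA would verify the proxy's TLS certificate rather than compare host names.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"        # ring the phone
    CHALLENGE = "challenge"  # ask the caller to authenticate before ringing
    REJECT = "reject"        # refuse; caller must go through the proxy


@dataclass(frozen=True)
class IncomingCall:
    caller: str      # identity claimed by the caller
    via_host: str    # last hop that delivered the request
    transport: str   # "TLS", "TCP" or "UDP"


def screen_call(call: IncomingCall, trusted_proxy: str,
                allow_challenge: bool = True) -> Decision:
    """Accept calls from the inbound proxy over TLS; challenge or reject the rest."""
    if call.via_host == trusted_proxy and call.transport == "TLS":
        return Decision.ACCEPT
    # Challenging happens before the phone rings, so the called user
    # is not disturbed by unauthenticated callers.
    return Decision.CHALLENGE if allow_challenge else Decision.REJECT


proxy = "proxy.example.net"  # hypothetical inbound proxy of the called user
calls = [
    IncomingCall("sip:bob@example.org", "proxy.example.net", "TLS"),
    IncomingCall("sip:unknown@example.com", "198.51.100.23", "UDP"),
]
for call in calls:
    print(call.caller, "->", screen_call(call, proxy).value)
```

Setting `allow_challenge=False` gives the stricter behaviour of rejecting anything that bypasses the proxy, while the default gives direct callers a chance to prove their identity.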
SPIT, just like its older sibling spam, is a prime example of "many partial solutions add up to a stronger overall solution". In other words, there is no one mechanism that can effectively eliminate SPIT. The solution is to deploy various levels of defense in the hope that one of them catches the call before it reaches you. Here are some of them (bearing in mind, as mentioned, that the content filtering which works for spam is mostly useless with voice):