Performance and Capacity Testing for V3.0

Summary

As part of a research project into the energy efficiency of VoIP systems, we performed a number of performance tests on a SIP-Router based SIP server. Our goal was to determine the maximum number of User Agents (UAs) that can be handled by a single SIP server with a fully-featured configuration and with signaling traffic similar to what ITSPs see in the public Internet.

To generate realistic traffic patterns, we surveyed three major European ITSPs. We created a model of signaling traffic based on the data obtained from those ITSPs and set up a testbed to generate traffic with similar patterns for a variable number of user agents.

We found that a single full-featured SIP server can sustain the signaling traffic generated by 0.5 million subscribers. The aggregate volume of signaling traffic generated by the load generators during the test was 210 Mbit/s. We also learned that the SIP server consumes 4.5 kB of memory per TCP connection and that OpenSSL consumes an additional 61 kB of memory for each TLS connection. That makes OpenSSL memory consumption a major bottleneck for TLS-based setups.

We ran all the tests described below in February 2010 with a SIP-Router snapshot very similar to the source code released as version 3.0.0. The results are applicable to both SER and Kamailio SIP server series v3.0.x.

Jan Janak and Salman Baset created the testing scenarios, designed the test bed, administered the tests, and compiled this report.

Daniel-Constantin Mierla contributed section "Enhancements in v3.1.x" which describes possible performance improvements implemented in later releases.

Goals

We wanted to estimate the size of the subscriber population that a single SIP server with a fully-featured configuration file can handle. We were not interested in performing aggressive optimizations or simplifying the configuration file. Our goal was to measure the performance of our SIP server in a default, out-of-the-box configuration. In particular, we wanted to use all the features a modern Internet Telephony Service Provider (ITSP) would need in the public Internet.

Our goal was to determine the maximum number of User Agents (UAs) the SIP server can support on a single machine. All the tests were performed as part of an effort to estimate the power consumption of VoIP based services. Although we primarily focused on UDP as the transport protocol for signaling, we also performed a number of simpler tests for TCP and TLS with the goal of determining the bottlenecks in those scenarios.

Testbed Overview

The testbed consisted of 8 PCs connected with gigabit Ethernet. One machine ran the SIP server; the remaining machines were used as SIP load generators. All of the load generator machines were connected to the SIP server machine via gigabit Ethernet. The SIP server machine had two gigabit Ethernet cards and the traffic generated by the load generators was split evenly between the two Ethernet segments.

The SIP server machine contained two Intel Xeon CPUs clocked at 2.33 GHz. Each CPU contained 4 cores. The machine had 4 GB of memory and two Intel 82545GM Gigabit Ethernet controllers. The operating system was Debian Squeeze with Linux kernel 2.6.32. The Linux kernel had been compiled with Physical Address Extensions (PAE) enabled.

All load generators ran Debian Squeeze booted off a live CD. We used sipp version 3.1.r590-1, installed from a Debian package, to generate SIP traffic. Two load generators were 1U HP servers with single-core Intel Xeon CPUs running at 3.06 GHz. The rest of the load generators were off-the-shelf desktop PCs.

To make sure that the testbed itself would not become a bottleneck, we first tried to establish the maximum number of calls per second (CPS) sipp can generate on our load generators, i.e., without a proxy server in between. We were able to generate and process about 5000 CPS with sipp in our testbed, which corresponds to about 97 Mbit/s of network traffic. A single sipp process cannot handle this amount of traffic, so we started 5 sipp processes (5x UAS and 5x UAC) and let each process generate/process only 1000 CPS. At a higher number of CPS per process, sipp started dropping calls.
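
As a rough back-of-the-envelope check (illustrative only, not part of the original measurements), the figures above imply about 2.4 kB of signaling per call and a split into 5 UAC/UAS process pairs:

  # Back-of-the-envelope check of the sipp calibration figures above
  # (illustrative only; uses just the numbers quoted in the text).
  import math

  max_cps = 5000          # maximum call rate the testbed could generate
  traffic_mbps = 97       # aggregate network traffic at that rate
  per_process_cps = 1000  # rate at which a single sipp process stayed reliable

  # Approximate signaling volume per call (all messages of one call combined).
  bytes_per_call = traffic_mbps * 1_000_000 / max_cps / 8
  print(f"~{bytes_per_call / 1000:.1f} kB of signaling per call")      # ~2.4 kB

  # Number of UAC/UAS process pairs needed to stay under the per-process limit.
  pairs = math.ceil(max_cps / per_process_cps)
  print(f"{pairs} sipp UAC/UAS pairs at {per_process_cps} CPS each")   # 5 pairs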

SIP Server Configuration

The SIP server was based on the publicly available source code from the repository of the sip-router project. We retrieved a snapshot of the source code on 3rd February 2010.

Our SIP server implementation is multi-process based and we configured the server to create 16 processes to handle SIP traffic. We configured the SIP server to use at most 2 GB of memory (this memory is mainly used for SIP transactions and a cache of user location database records).

All subscriber-related data was stored in a MySQL database. The MySQL database ran on the same machine as the SIP server. We used MySQL version 5.1.41 from a Debian package and configured the MySQL server to use 2 GB of memory for the query cache. In a previous performance test we learned that the query cache can have a profound effect on the SIP server's performance because the server repeatedly emits simple, read-only, easy-to-cache queries.

We provisioned the system with data for one million subscribers. The data includes user names and passwords for digest authentication, the SIP URIs users can use in SIP messages, and various other configuration-related data. The size of all the data on the local hard disk was about 1.7 GB and, because that is less than the size of the query cache, all the data could fit into the cache.

We also installed the RTP relay known as rtpproxy on the same server. Although we did not generate RTP traffic in our tests, we wanted to account for the communication overhead of creating and destroying RTP relaying sessions between the SIP server and rtpproxy. We used rtpproxy 1.2.1 installed from a Debian package.

For our tests we configured the SIP server with the most advanced configuration file available from the source code repository. We believe that this configuration file implements all the features typically deployed by ITSPs operating in the public Internet. In particular, the following features were important for our tests:

  • Digest authentication of all REGISTER and INVITE messages.
  • User location database look-ups for incoming INVITEs.
  • NAT traversal detection.
  • Support for NAT-binding keep-alives.
  • The possibility to relay calls through an RTP relay (rtpproxy).

With this configuration file, NOTIFY messages generated by user agents to keep NAT bindings open were replied to statelessly. REGISTER messages were authenticated and also replied to statelessly. The processing of INVITE, ACK, and BYE requests was always transaction stateful and all INVITE requests originating from one of the local subscribers were subject to digest authentication.

The SIP server performed NAT detection on incoming SIP messages by inspecting the source IP address of UDP datagrams, the IP addresses in Via and Contact headers, and the IP address in SDP bodies. The IP address in the SDP was rewritten if the server detected that an RTP relay was needed.

The SIP server inserted Record-Route headers into all SIP messages. This made all in-dialog requests, such as end-to-end ACKs and BYEs, traverse the SIP server.

SIP Traffic

We wanted to generate SIP traffic patterns similar to those found in existing real-world ITSP setups. Therefore, we collected traffic statistics from three European ITSPs and created our traffic model based on that data. The sizes of their subscriber populations ranged from 100k to a few million. We designed our test scenarios based on the information about the actual signaling traffic from one of the ITSPs. Our load generators emulated a UA population of a given size and, for each UA, generated:

  • One NOTIFY request every 15 seconds. The server sends a 200 OK response back. The purpose of the request is to keep UDP bindings in NATs open. We observed that the ITSP used this technique because it was the most reliable method, even though it generates a lot of network traffic.
  • One registration refresh per 50 minutes. This included two REGISTER messages and two responses because the SIP server would challenge the first request with digest authentication.
  • Call setups with the following SIP message sequence:
INVITE-407-ACK-INVITE-100-180-200-ACK-BYE-200. 

During the test we kept increasing the rate of call setups as long as the system remained stable, with only occasional retransmissions and no dropped calls.

We tried to generate the traffic for 0.5 million users. This resulted in 33k NOTIFY requests per second, 166 registration refreshes per second, and 75 call setups per second.
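
The NOTIFY and registration rates follow directly from the per-UA model; the short sketch below (illustrative only, using the figures quoted above) reproduces the arithmetic:

  # Derive the aggregate signaling rates from the per-UA traffic model
  # (illustrative sketch; all inputs are taken from the text above).

  subscribers = 500_000

  notify_interval_s = 15          # one NAT keep-alive NOTIFY per UA every 15 s
  register_interval_s = 50 * 60   # one registration refresh per UA every 50 min

  notify_rate = subscribers / notify_interval_s        # ~33,333 NOTIFY/s
  refresh_rate = subscribers / register_interval_s     # ~166 refreshes/s

  # Each refresh consists of two REGISTER requests (challenge + authenticated).
  register_rate = 2 * refresh_rate                     # ~333 REGISTER messages/s

  print(f"{notify_rate:,.0f} NOTIFY/s, {refresh_rate:.1f} refreshes/s, "
        f"{register_rate:.1f} REGISTER messages/s")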

Test Scenarios

Our first goal was to see if the server could support 0.5 million users. Before running the main test, we populated the user location database with contacts for all 0.5 million users. We did that by sending REGISTER requests for each of the users at a high rate. After that, we slowed the registration rate down to 166 updates per second (each update generated two REGISTER messages, one with authentication and one without). In addition to populating the user location database with contacts, this initial test also fetched data into MySQL's in-memory query cache.

UAC                 Registrar
 |    REGISTER          |
 |--------------------->|
 |     401              |
 |<---------------------|
 |  REGISTER w/ digest  |
 |--------------------->|
 |     200 OK           |
 |<---------------------|

Next, we added NOTIFY requests for NAT keep-alives. A population of 0.5 million users, where each user sends a NOTIFY request every 15 seconds, generates 33k NOTIFY requests per second. We needed to start a number of sipp processes on multiple machines to achieve that rate.

UAC              Proxy
 |    NOTIFY       |
 |---------------->|
 |    200 OK       |
 |<----------------|

Finally, we added INVITE-ACK-BYE transactions to the mix. From the ITSP survey we know that 100k subscribers generate about 20 INVITE transactions per second during busy hours. Therefore, we started additional sipp instances and configured them to generate 100 calls per second. We also configured sipp to wait for 4 seconds before sending the final 200 OK for an INVITE. The following diagram shows the call flow of this scenario.

UAC                   Proxy                  UAS
 |      INVITE          |                     |
 |--------------------->|                     |
 |      407             |                     |
 |<---------------------|                     |
 |                      |                     |
 |  INVITE w/ digest    |                     |
 |--------------------->|                     |
 |      100 Trying      |                     |                    
 |<---------------------|      INVITE         |
 |                      |-------------------->|
 |                      |      180 Ringing    |
 |      180 Ringing     |<--------------------|
 |<---------------------|                     |

                 (Pause for 4s)

 |                      |       200 OK        |
 |      200 OK          |<--------------------|
 |<---------------------|                     |
 |      ACK             |                     |
 |--------------------->|       ACK           |
 |                      |-------------------->|
 |      BYE             |                     |
 |--------------------->|       BYE           |
 |                      |-------------------->|
 |                      |       200 OK        |
 |      200 OK          |<--------------------|
 |<---------------------|                     |

Results

We found that the SIP server could handle the signaling traffic generated by 0.5 million user agents without any problems. During the test all CPU cores were utilized to less than 10%. The signaling traffic generated by all sipp instances on both network interfaces of the server was about 210 Mbit/s (as measured by iftop). The MySQL server consumed the most CPU time, and under this load the whole server consumed about 210 W. Because the load on the SIP server was low, we repeated the test and tried to simulate 1 million subscribers.

For 1 million subscribers we needed to generate 66k NOTIFY requests per second, 332 registration refreshes per second, and 200 calls per second. Unfortunately, our sipp instances could not generate 66k NOTIFY requests per second and we had no other machines we could use as additional load generators.

To determine how much CPU load NOTIFY requests alone would generate, we stopped all other sipp instances and kept only those that were generating NOTIFY requests. The CPU load in this scenario was about 1-2% per CPU core. The amount of network traffic generated by the keep-alives alone was about 200 Mbit/s.
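
Dividing that 200 Mbit/s by the 33k NOTIFY requests per second gives a rough consistency check on the keep-alive traffic (illustrative only, not part of the original measurements):

  # Rough consistency check of the NOTIFY-only measurement (illustrative only).

  notify_rate = 500_000 / 15   # ~33,333 NOTIFY/s for 0.5 million user agents
  traffic_mbps = 200           # keep-alive traffic measured by iftop

  bytes_per_transaction = traffic_mbps * 1_000_000 / notify_rate / 8
  # One transaction = NOTIFY request + 200 OK response.
  print(f"~{bytes_per_transaction:.0f} bytes per NOTIFY/200 OK pair")   # ~750 B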

Finally, we stopped all the sipp instances generating NOTIFYs and repeated the registration and call setup tests at a rate that would be generated by 1 million users, i.e., 332 registration refreshes per second and 200 calls per second. This amount of traffic generated between 10% and 20% load per CPU core. The busiest processes were again those of MySQL. During this test (without NOTIFYs) the server consumed 190 W. The traffic generators produced 25.5 Mbit/s of signaling traffic. During the test there were 42,000 active transactions on the SIP server.

Even during this test the system was stable, with no dropped calls and a negligible number of retransmissions.

Conclusions

We were able to simulate all the signaling traffic generated by a population of 0.5 million user agents and verified that a single SIP server can handle that amount of traffic.

Because of the low load on the SIP server, we repeated the test for 1 million user agents. Although we didn't have enough load generators to generate NOTIFY messages for 1 million users, we learned that the load generated by NOTIFY requests alone is very low. Furthermore, we verified that a single SIP server can handle the user location updates and call setups generated by 1 million users.

Short-Lived TLS Connections

The goal of this test was to stress-test the TLS connection establishment phase and see the impact of TLS handshakes on CPU utilization. Therefore, we configured sipp to create a new TLS connection for each registration and for each new call. Below are some preliminary numbers. The testing scenario was the same as in the previous tests (333 registration refreshes per second, 150 CPS, 1 million users), except for the following:

  • There were no NOTIFY keep-alives (they are not needed in this scenario).
  • All traffic was encrypted with TLS.

All averages were calculated from 5 consecutive measurements. With the following numbers the system was stable and could run for hours:

  • Maximum CPU utilization: 800% (2 CPUs, 4 cores per CPU)
  • Average CPU utilization by MySQL: 27% (as reported by top)
  • Average CPU utilization by SIP server processes: 176% (as reported by top)
  • Average number of established TLS connections: 617 (as reported by SIP server and verified with netstat)
  • Average number of SIP transactions: 3605 (as reported by sercmd tm.stats)
  • Call rate: 150 CPS (inv-407-ack-inv-180-4_sec_pause-200-ack-bye-200)
  • Maximum call rate: 200 CPS
  • Registered contacts: 1 million
  • OpenSSL compression: disabled
  • TLS version: TLSv1
  • Certificate verification: disabled everywhere
  • SIP server processes: 16
  • Maximum power consumption under load: 209W
  • Traffic volume with encryption: 27 Mbit/s (TX+RX)
  • SIP server shared memory: 2048MB
  • Average TLS connection setup rate: 478 new TLS connections per second

We could increase the call rate all the way up to 200 CPS. Above that, the sipp instances could not keep up and the whole system became unstable: sipp started generating traffic spikes (as a result of a variable call rate) and the spikes would eventually overload the server.

We also determined that the SIP server needs 61 kB of memory per TLS connection. On a 32-bit machine with 4 GB of memory and with 2.5 GB reserved for the SIP server, the server could support no more than 43k simultaneous TLS connections.
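
The 43k limit follows directly from those two numbers; the sketch below reproduces the calculation (illustrative only, ignoring allocator and bookkeeping overhead not mentioned in the text):

  # Upper bound on simultaneous TLS connections implied by the memory figures
  # above (illustrative sketch; allocator overhead is ignored).

  memory_reserved_gb = 2.5      # memory reserved for the SIP server (32-bit box)
  per_tls_connection_kb = 61    # TLS/OpenSSL memory measured per connection

  max_tls_connections = memory_reserved_gb * 1024 * 1024 / per_tls_connection_kb
  print(f"~{max_tls_connections:,.0f} simultaneous TLS connections")  # ~42,974, i.e. ~43k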

TCP Connections

To stress-test SIP over TCP, we configured sipp to send all signaling traffic over TCP and ran a series of very simple TCP tests. The load generators generated 332 registration refreshes per second and all messages were sent over TCP. The CPU load generated by the SIP server was between 6% and 8%. With 80k permanent TCP connections, the SIP server could still handle at least 1000 requests per second and a connection arrival rate of 1000 new connections per second (20k new connections in total).

The SIP server consumed about 4.5 kB of memory per TCP connection. That is in addition to any memory consumed by the kernel and the OS for those connections.
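
For comparison with the TLS numbers above, here is a quick estimate (illustrative only) of the SIP server memory consumed by the 80k permanent TCP connections used in this test:

  # Estimate the SIP server memory used by the permanent TCP connections
  # (illustrative only; memory used by the kernel for the sockets is excluded).

  tcp_connections = 80_000
  per_tcp_connection_kb = 4.5

  total_mb = tcp_connections * per_tcp_connection_kb / 1024
  print(f"~{total_mb:.0f} MB for {tcp_connections:,} TCP connections")  # ~352 MB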

Enhancements in v3.1.x

We used a source code snapshot made on 3rd February 2010 for all the tests described on this page. The source code in that snapshot is practically identical to the source code included in release 3.0.0, the first release that integrated both the SER and Kamailio SIP servers in the same application.

A new major version was released in October 2010. That release includes improvements that could potentially increase the performance of the SIP server even further, such as:

  • Support for raw UDP sockets in Linux. In some scenarios, a 30% increase in the SIP server's UDP performance was reported on the mailing lists.
  • New options for memory tuning of OpenSSL. The TLS tests revealed that OpenSSL's memory consumption can be a major bottleneck. New options implemented in the tls module may be used to bring the memory consumption down somewhat.
  • Use of asynchronous API for TLS connections. This feature could potentially increase the overall processing capacity of the SIP server over TLS/TCP connections in scenarios where memory consumption is not the bottleneck.

For a full list of features and improvements in v3.1.x, see:

