The Telekom community for business customers

SIP-Trunk: SRTP audio issues after SIP session refresh at 15 min or 30 min

Solved

Hello,

I am testing SIP-Trunk with a Cisco CUBE as the SBC. Unencrypted calls work fine, with no interruptions.

When I switch to SIP-TLS/SRTP, calls (at least outbound calls to different providers) fail with one-way or no audio at 15 min or 30 min.

With the help of the vendor (Cisco), we have narrowed the issue down to an SRTP ROC (Roll-Over Counter) synchronization problem: when a re-INVITE is sent from the provider side, the gateway at Telekom seems to reset the ROC to 0, so our SBC drops incoming packets with the error "unprotect failure / auth check".

The initial (randomly chosen) RTP sequence number determines which stream (inbound or outbound) is affected at the first 15 min refresh: an initial RTP sequence number above roughly 21,000 results in a sequence number roll-over (and an increment of the ROC to 1) before the 15 min refresh / re-INVITE.
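As a rough illustration of the arithmetic behind that threshold, here is a minimal Python sketch. It assumes a constant 20 ms packetization interval (50 packets per second), which is not stated in the post; the function names are hypothetical. Under that assumption the threshold comes out at 20,536, close to the roughly 21,000 mentioned above, and every stream has rolled over at least once by the 30 min mark.

```python
RTP_SEQ_MODULUS = 2 ** 16  # RTP sequence numbers are 16 bits wide


def rollover_threshold(refresh_seconds: int, packet_ms: int = 20) -> int:
    """Smallest initial sequence number that wraps before the refresh.

    packet_ms=20 is an assumption (typical for G.711), not a value from the post.
    """
    packets_sent = refresh_seconds * 1000 // packet_ms
    return max(RTP_SEQ_MODULUS - packets_sent, 0)


def roc_at(initial_seq: int, elapsed_seconds: int, packet_ms: int = 20) -> int:
    """Roll-over counter after elapsed_seconds, starting from ROC = 0."""
    packets_sent = elapsed_seconds * 1000 // packet_ms
    return (initial_seq + packets_sent) // RTP_SEQ_MODULUS


if __name__ == "__main__":
    print(rollover_threshold(15 * 60))  # 20536
    print(roc_at(30000, 15 * 60))       # 1: already rolled over at 15 min
    print(roc_at(100, 15 * 60))         # 0: still in the first cycle at 15 min
    print(roc_at(100, 30 * 60))         # 1: rolled over by 30 min
```

This would explain why a ROC reset to 0 hurts one direction at 15 min and both directions at 30 min.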

 

There is also another, minor issue. When the negotiated crypto-suite is AES_CM_128_HMAC_SHA1_32, the SRTCP messages from the above-mentioned gateways use 32-bit authentication tags instead of the 80-bit tags required by RFC 4568, section 6.2.
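For reference, the tag lengths of the crypto-suites defined in RFC 4568 section 6.2 can be written down as a small lookup. This is only a sketch for checking a captured SRTCP packet by hand; the helper names are hypothetical and nothing here is Cisco or Telekom tooling.

```python
# (SRTP tag bits, SRTCP tag bits) for the crypto-suites in RFC 4568, section 6.2
AUTH_TAG_BITS = {
    "AES_CM_128_HMAC_SHA1_80": (80, 80),
    "AES_CM_128_HMAC_SHA1_32": (32, 80),  # only the SRTP tag is shortened
    "F8_128_HMAC_SHA1_80": (80, 80),
}


def expected_srtcp_tag_bytes(suite: str) -> int:
    """Length in bytes of the SRTCP authentication tag for a crypto-suite."""
    return AUTH_TAG_BITS[suite][1] // 8


def srtcp_tag_length_ok(suite: str, observed_tag_bytes: int) -> bool:
    """True if an observed SRTCP tag length matches the negotiated suite."""
    return observed_tag_bytes == expected_srtcp_tag_bytes(suite)


if __name__ == "__main__":
    # A 4-byte (32-bit) SRTCP tag, as observed in the post, does not match.
    print(srtcp_tag_length_ok("AES_CM_128_HMAC_SHA1_32", 4))   # False
    print(srtcp_tag_length_ok("AES_CM_128_HMAC_SHA1_32", 10))  # True
```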

 

Is anyone else using CUBE with SIP-TLS/SRTP? Does it work without issues in your case?

 

Kind regards,

John

1 ACCEPTED SOLUTION

@jkougoulos 

 

This should be fixed as of today, so please retest and confirm whether the problem is gone.

Did you check whether the re-INVITE starts a new SDP offer/answer cycle? The version number in the o-line is particularly worth a look.
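One way to run that check on captured traces is to compare the `o=` lines of the initial INVITE and the re-INVITE. The following is a minimal Python sketch; the function names and the sample SDP bodies (including the 192.0.2.x addresses) are made up for illustration and are not taken from the traces discussed here.

```python
from typing import NamedTuple


class Origin(NamedTuple):
    username: str
    session_id: str
    session_version: int


def parse_origin(sdp: str) -> Origin:
    """Extract username, session id and session version from the SDP o= line."""
    for line in sdp.splitlines():
        if line.startswith("o="):
            fields = line[2:].split()
            if len(fields) != 6:
                raise ValueError(f"malformed o= line: {line!r}")
            return Origin(fields[0], fields[1], int(fields[2]))
    raise ValueError("no o= line found")


def version_changed(previous_sdp: str, current_sdp: str) -> bool:
    """True if the session id differs or the session version has increased."""
    prev, cur = parse_origin(previous_sdp), parse_origin(current_sdp)
    return cur.session_id != prev.session_id or cur.session_version > prev.session_version


if __name__ == "__main__":
    # Hypothetical SDP bodies, shortened to the lines that matter here.
    initial = "v=0\r\no=- 12345 1 IN IP4 192.0.2.10\r\ns=-\r\n"
    refresh = "v=0\r\no=- 12345 2 IN IP4 192.0.2.10\r\ns=-\r\n"
    print(version_changed(initial, refresh))  # True
    print(version_changed(initial, initial))  # False
```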

 

Does your installation not support UPDATE as the session refresh method?

UPDATE as a session refresh method does not include an SDP exchange, so those problems should not occur.

Telekom hilft Team
Good morning @jkougoulos,

thanks for your message.

Did the answer from @Meester Proper help you?

@Meester Proper, thank you very much for your help.

Kind regards
Jamil-Ali G.

Hello @Meester Proper, thank you for your suggestions.

We saw that the version number increments, but Cisco said that this does not trigger a ROC reset on their device. From a quick read of RFC 3711 (sections 3.2.3 and 3.3.1) this seems to make sense, but I am not an expert on the subject, and RFCs are often open to multiple interpretations.
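To make the failure mode concrete, here is a small Python sketch of the receiver-side packet index estimation from RFC 3711 (section 3.3.1 and Appendix A). The function name and the example values are illustrative assumptions. In SRTP the ROC is part of the data covered by the authentication tag, so if the sender restarts at ROC = 0 while the receiver still holds ROC = 1, the two sides work with different indices and the authentication check fails.

```python
SEQ_MOD = 1 << 16   # 16-bit RTP sequence number space
ROC_MOD = 1 << 32   # the ROC is a 32-bit counter
HALF = 1 << 15      # 32768


def estimate_index(roc: int, s_l: int, seq: int) -> int:
    """Receiver-side packet index estimate, following RFC 3711 Appendix A.

    roc: receiver's current roll-over counter
    s_l: highest sequence number the receiver has seen so far
    seq: sequence number of the packet that just arrived
    """
    if s_l < HALF:
        v = (roc - 1) % ROC_MOD if seq - s_l > HALF else roc
    else:
        v = (roc + 1) % ROC_MOD if s_l - HALF > seq else roc
    return seq + v * SEQ_MOD


if __name__ == "__main__":
    # Normal wrap: the receiver follows the sender from 65535 to 0.
    print(estimate_index(roc=0, s_l=65535, seq=0))    # 65536

    # Illustrative ROC reset: the receiver holds ROC = 1, the sender restarts
    # at ROC = 0 and protects packet 1001 with index 1001.
    sender_index = 1001 + 0 * SEQ_MOD
    receiver_index = estimate_index(roc=1, s_l=1000, seq=1001)
    print(sender_index, receiver_index)               # 1001 66537
```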

Have you perhaps seen an indication anywhere (a document, a standard, anything) that a change of the SDP version without any other change in the content should trigger a ROC reset or a change of the cryptographic context? That would help me push Cisco, if this is the actual reason for going back to ROC=0.

Syncing the ROC between transmitter and receiver over signalling without losing packets on the running RTP stream sounds challenging, though.

 

The installation supports UPDATE, and some calls (mostly inbound) are refreshed using this method. However, the calls that have the problem (outbound, at least to some destinations) are refreshed with a re-INVITE from the UAS side (Telekom). I don't know exactly why: the headers we send list UPDATE as an allowed method, and the refresher is set to UAS in the "200 OK" of the initial INVITE. It is quite difficult to guess the reason without knowing what happens on the other side.

 

Thanks again for your interest.

 

John

Telekom hilft Team
Good morning @jkougoulos,

thanks for your reply.

@Meester Proper, do you have any other ideas?

Kind regards
Jamil-Ali G.

@jkougoulos 

 

To sum up: your error occurs only on outbound calls to specific destinations. Do you know what kind of technology and/or provider these destinations are using? (ISDN / VoIP / MSAN-POTS, DT customer / other provider)

 

Are you able to set the session refresher to UAC on those outbound calls, so that you refresh these calls with UPDATE?

@Meester Proper @Jamil-Ali G.

 

Forcing the refresher to UAC for outbound calls helps in simple cases. But if, for example, the called party performs a hold-resume, this triggers re-INVITEs and the refresher role switches to Telekom, which again sends re-INVITEs to refresh, which resets the ROC and brings back the problem. This was an outbound call to a Vodafone mobile number.

When I use the same workaround with an ISDN DT customer as the destination (which had issues before I forced the refresher), the problem does not appear, even with hold-resume. I guess that at least this ISDN connection does not trigger a re-INVITE on hold-resume.

 

So things are a bit better with this workaround, but I cannot rely on it in production.

 

I mention the outbound calls because they are easier for me to reproduce and to test for long durations, and the behaviour seems to be more consistent.

Example of a strange inbound case:

The originator is an ISDN user of Orange Belgium. The destination is our SIP trunk. Half of the calls arrive with "Session-Expires: 3600;refresher=uac" and are refreshed successfully with UPDATE every 30 min.

The other half arrive with "Session-Expires: 1800;refresher=uac". At 15 min I get an UPDATE, at 30 min an INVITE, and at 45 min an UPDATE again. Obviously, at the 30 min INVITE, when SRTP is used, we have the same issue with the ROC reset.
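The timing of those refreshes follows from the session timer rules in RFC 4028, which recommend that the refresher sends the refresh once half of the session interval has elapsed. A minimal Python sketch, with hypothetical function names, that parses the two header values quoted above and lists the expected refresh times:

```python
from typing import List, Optional, Tuple


def parse_session_expires(value: str) -> Tuple[int, Optional[str]]:
    """Parse a Session-Expires header value such as '1800;refresher=uac'."""
    parts = [p.strip() for p in value.split(";")]
    interval = int(parts[0])
    refresher = None
    for param in parts[1:]:
        name, _, val = param.partition("=")
        if name.strip().lower() == "refresher":
            refresher = val.strip().lower()
    return interval, refresher


def refresh_times(interval: int, call_seconds: int) -> List[int]:
    """Seconds into the call at which a refresh is expected.

    Assumes a refresh at half the session interval, repeated after each
    successful refresh, as recommended by RFC 4028.
    """
    step = interval // 2
    return list(range(step, call_seconds + 1, step))


if __name__ == "__main__":
    interval, refresher = parse_session_expires("1800;refresher=uac")
    print(refresher, refresh_times(interval, 45 * 60))  # uac [900, 1800, 2700]
    interval, refresher = parse_session_expires("3600;refresher=uac")
    print(refresher, refresh_times(interval, 60 * 60))  # uac [1800, 3600]
```

This matches the pattern described: refreshes at 15, 30 and 45 min for an interval of 1800, and every 30 min for an interval of 3600. It does not explain why the 30 min refresh arrives as an INVITE rather than an UPDATE.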

 

Other inbound calls arrive with no timer support, and some arrive with timer support but no Session-Expires value. In those cases we again handle the refresh with UPDATE and it seems to work, although I am not really sure what happens in more complicated scenarios with call transfers etc.

 

There seems to be a wide variety of signalling patterns, which suggests to me that a lot of the functionality has moved to the edges and simply passes through the core. That makes troubleshooting, especially for international calls, a bit of a nightmare, if it is possible at all. The result is just angry end users.

 

Also, I noticed a few months ago that Telekom Mobile uses a different signalling pattern (e.g. no SDP in the 200 OK of the INVITE), but unfortunately it is not easy for me to test this case for the refresh issues. If there are any test numbers, e.g. echo/callback, that would be really helpful.

 

I see that there are many signalling patterns, and my test cases are actually quite limited. Deploying while knowing that re-INVITEs do not work properly is, IMHO, asking for trouble.

 

So, is there any chance that the system will be fixed so that re-INVITEs do not trigger a ROC reset, or should I abandon TLS/SRTP for the next few years?

Other options?

 

Thanks again for your input.

Good morning @jkougoulos,

I can offer you support from my colleagues in the technical department.

So that we can help you quickly, please fill in the fields "Customer number" and/or "Phone number" in your user data. The following link takes you directly to the right place in your profile: http://bit.ly/Customerinfos. I would appreciate a short note once you have done so.

Best regards
Marita W.

Good morning @Marita W. ,

 

thank you for your interest. The link did not work, but I believe I found the place in my profile for the requested data, and I will wait to be contacted by your colleagues.

 

Have a nice day,

John

Hello @jkougoulos,

thank you for adding the data to your profile. Unfortunately, I cannot find you under this customer number. Could you please check it again?
Thank you very much.

Marita W.

Hello @Marita W. ,

 

I have changed it to a different customer number. We may have different numbers depending on the service we receive, and I am not really sure which one is correct. This one, though, looks like it is associated with the VDSL line for the SIP-Trunk.

 

Kind regards,

John

Good morning @jkougoulos,

Thank you for providing your customer number. I can see that your connection is handled by a dedicated team. I will get in touch with your contact person right away, and he will then contact you.

Best regards
Marita W.

@jkougoulos 

 

Good news: I have heard that the error was found and will be corrected in one of the next platform updates. I do not have an ETA, but there will be a fix for this.

@Meester Proper 

 

That is great news, thank you very much for the feedback and your effort!

 

@jkougoulos 

 

This should be fixed as of today, so please retest and confirm whether the problem is gone.

@Meester Proper,

 

thank you for the update!

My initial tests show that the issue has indeed been fixed. I will continue testing and let you know if I run into any other issues.

 

Kind regards,

John