Data synchronization error when processing VRT packets
- This topic has 5 replies, 2 voices, and was last updated 12 months ago by
OLanterman.
OLanterman (Participant)
Using libsm_api.so.2.3.3 for RHEL8, we are getting “-11: Data synchronization error” when calling smGetVrtPackets(). This happens fairly consistently, although sometimes the call does not fail at all, or fails only after a significant amount of time.
The setup is an SM200C. It is connected either through a Sonnettech transceiver (fp10+) attached to a laptop, or through a 10G switch connected to a server. Both are running RHEL 8.
This also happens when using Spike v3.9.0, though most of the time with Spike we’re getting a -6 error. Given the closed nature of libsm_api.so, with no debug artifacts or sources, I’m stumped. Any help would be appreciated. Thanks.
Andrew (Moderator)
Both -6 and -11 are related to network data loss. -11 indicates UDP packet loss; if the loss continues or is severe enough, you will eventually see a -6 (device connection loss). A -11 is generally bad enough that you can’t fully trust the I/Q data anymore.
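Since -11 is described here as UDP packet loss, one way to see whether the receiving Linux host itself is dropping datagrams is to read the kernel's UDP counters in /proc/net/snmp before and after a capture. The following is only a sketch: the helper name `udp_counter` is made up for this example, and it says nothing about how libsm_api detects loss internally.

```shell
#!/bin/sh
# Print one counter from the "Udp:" rows of /proc/net/snmp.
# The file holds a header row ("Udp: InDatagrams ... RcvbufErrors ...")
# followed by a value row, so columns are looked up by name.
# $1 = counter name, $2 = file to read (optional, defaults to /proc/net/snmp)
udp_counter() {
    awk -v name="$1" '
        $1 == "Udp:" && !seen { for (i = 2; i <= NF; i++) col[$i] = i; seen = 1; next }
        $1 == "Udp:" && seen  { if (name in col) print $(col[name]); exit }
    ' "${2:-/proc/net/snmp}"
}

# Read-only snapshot; run once before and once after a capture and compare.
if [ -r /proc/net/snmp ]; then
    echo "InErrors=$(udp_counter InErrors) RcvbufErrors=$(udp_counter RcvbufErrors)"
fi
```

If RcvbufErrors climbs during a capture, that would suggest the socket receive buffer on this host is overflowing. If the counters stay flat while -11 still occurs, the loss is more likely happening before the datagrams reach this kernel (NIC, hypervisor, switch, or transceiver). Inside a VM these counters only reflect the guest kernel.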
Here are some troubleshooting ideas and suggestions:
– Ensure you have configured all the network settings we recommend in our network config manual. Ring buffer sizes, socket buffer sizes, jumbo packets, etc.
– Simplify your network to just SM200C directly connected to the PC. If stable, start adding additional equipment like your switch and determine when the issue starts.
– We recommend SFP+ for the full run. If you convert to RJ45 or SFP at any point, this is likely the source of your issue.
– Reduce the sample rate. You didn’t mention what sample rate you were using, but if it’s 200MS/s, try going down to 100/50/25. Below 25MS/s the network rate doesn’t drop any further, since on-device decimation only goes down to 25MS/s. A lower rate should put significantly less pressure on your network and hopefully contribute to a more stable system.
– If none of the above work, try swapping out fiber/transceivers to determine if there are any flaky ones. This is definitely something we have seen in the past.
I look forward to your results.
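To get a feel for how much the sample-rate suggestion relieves the link, the I/Q payload rate can be estimated with simple arithmetic. This sketch ASSUMES 4 bytes per complex sample (16-bit I plus 16-bit Q), which is not stated anywhere in this thread, and it ignores VRT/UDP/IP header overhead. `payload_mbps` is a hypothetical helper name.

```shell
#!/bin/sh
# ASSUMPTION (not from the thread): each complex sample is 16-bit I + 16-bit Q.
BYTES_PER_SAMPLE=4

# $1 = sample rate in MS/s; prints the I/Q payload rate in Mbit/s.
# MS/s * bytes/sample * 8 bits/byte = Mbit/s (headers not included).
payload_mbps() {
    echo $(( $1 * BYTES_PER_SAMPLE * 8 ))
}

# The four sample rates mentioned in the list above.
for rate in 200 100 50 25; do
    echo "$rate MS/s -> $(payload_mbps "$rate") Mbit/s of I/Q payload"
done
```

Under that assumption, 200MS/s comes to 6400 Mbit/s of payload on a 10G link, while 25MS/s is 800 Mbit/s, so each halving of the sample rate halves the load on the network.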
OLanterman (Participant)
Thanks for the quick response. It seems it’s always the network with most things. Before I posted, we had pretty much done what you suggested. One wrinkle I forgot to mention: the system running the packet capture code is a VM.
On my laptop and in our local lab the hypervisor is KVM. At our customer site it will be ESXi. My laptop is connected to a Sonnettech 10G SFP+ (Thunderbolt to SFP+) and our lab system is connected via SFP+ to a 10G/40G switch to the SH (all SFP+) – so we’re good there. On my laptop I used the suggested changes on my VM’s NIC as well as on my laptop NIC. Also note that the VM had to be changed to use an e1000e NIC rather than the virtio NIC – virtio does not take the NIC settings.
Here’s where it gets interesting I think: I spoke with another team who had similar issues and they suggested changing/adding some of the NIC settings to:
sudo sysctl -w net.core.rmem_default=67108864
sudo sysctl -w net.core.rmem_max=67108864
sudo sysctl -w net.core.wmem_default=67108864
sudo sysctl -w net.core.wmem_max=67108864
sudo sysctl -w net.ipv4.udp_rmem_min=4096
sudo sysctl -w net.ipv4.udp_wmem_min=4096
This seems to work (again, this must be set on all NICs), especially if we keep the sample rate at 25MS/s. I was able to run for over an hour. 50MS/s seems to be less error prone than before (it runs for 5-10 min before a network error), but at anything above that we get the -6 error from smGetVrtContextPkt() almost immediately. Better, but still troublesome. I don’t know whether raising the max read/write buffers to 64M or adding the UDP settings is what helped. We’d like to get to 50MS/s if possible. Thanks again for the help.
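Because these values have to match on every machine involved (host and VM), a read-only check can help confirm that they actually took effect. The sketch below compares the live values under /proc/sys against the six values quoted in this post. The helper names `sysctl_path` and `check_sysctl` are invented for this example, and the optional third argument (a root prefix) exists only so the check can be pointed at a test directory.

```shell
#!/bin/sh
# Map a sysctl key such as net.core.rmem_max to its path under /proc/sys.
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# $1 = key, $2 = wanted value, $3 = optional root prefix (for testing).
# Prints OK, DIFF or MISSING for the key; never changes any setting.
check_sysctl() {
    key=$1 want=$2 root=${3:-}
    file="$root$(sysctl_path "$key")"
    if [ ! -r "$file" ]; then
        echo "MISSING $key"
        return 0
    fi
    have=$(cat "$file")
    if [ "$have" = "$want" ]; then
        echo "OK $key=$have"
    else
        echo "DIFF $key=$have (wanted $want)"
    fi
}

# Values exactly as quoted in this post.
for kv in \
    net.core.rmem_default=67108864 net.core.rmem_max=67108864 \
    net.core.wmem_default=67108864 net.core.wmem_max=67108864 \
    net.ipv4.udp_rmem_min=4096 net.ipv4.udp_wmem_min=4096
do
    check_sysctl "${kv%%=*}" "${kv#*=}"
done
```

Note that values set with `sysctl -w` do not survive a reboot. To make them persistent they would normally go into a file under /etc/sysctl.d/, so a DIFF after a restart may simply mean the settings were never persisted.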
Andrew (Moderator)
We’re going to do a little bit of research on our end to see if we can learn more about how our device performs in a VM.
If it’s possible on your end, I would be curious if running on bare metal makes these issues go away. Running bare metal may not be an option for you, but I think there is value in knowing if the VM configuration is the problem, or if there is still something else limiting the data rates.
Andrew (Moderator)
If you are willing to start a direct correspondence with us, please email me at aj@signalhound.com.
OLanterman (Participant)
We have another project that uses a bare metal server. In our lab, the server is directly connected to a different SM200C (which, BTW, we borrowed to test to see if the SH was the issue – it wasn’t). In that case we could get 50MS/s easily, but 100MS/s doesn’t work so well and 200MS/s was never an option. Even Spike has issues. I have not tried the updated NIC settings there, so we’ll have to see about that. That project is on hold until we’re ready for the next phase, so it might be a while before I can play with it.
I will contact you directly. Thanks for the help.