High latency can pose a problem for certain MediaGuard users. If you encounter any issues while using MediaGuard, a HUMAN representative will be happy to help you resolve these issues as soon as possible.
However, to help us diagnose the root cause of your problem, we ask that you follow these troubleshooting steps to provide us with as much information about the state of your integration as possible.
One possible source of high latency is compute time. Long compute time occurs when the MediaGuard cluster is underprovisioned (i.e., does not have enough nodes to support current traffic levels); otherwise, compute time is generally negligible compared to other network factors.
Please let us know the following:
- Has your advertising traffic increased beyond the MediaGuard cluster’s current traffic processing capacity? For example, has your QPS (queries per second) volume increased by more than 20% from your standard levels?
- Have you seen (or do you anticipate) any planned increases in traffic? Have you seen any unexpected increases in traffic?
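As a quick way to answer the first question, the following minimal sketch compares a current QPS reading against a baseline. The function names and the sample figures are illustrative assumptions; only the 20% threshold comes from the text above.

```python
def qps_increase_pct(baseline_qps: float, current_qps: float) -> float:
    """Return the percentage change of current QPS relative to the baseline."""
    if baseline_qps <= 0:
        raise ValueError("baseline_qps must be positive")
    return (current_qps - baseline_qps) * 100.0 / baseline_qps


def exceeds_threshold(baseline_qps: float, current_qps: float,
                      threshold_pct: float = 20.0) -> bool:
    """True when QPS has grown by more than threshold_pct over the baseline."""
    return qps_increase_pct(baseline_qps, current_qps) > threshold_pct


if __name__ == "__main__":
    # Hypothetical figures: a 1,000 QPS baseline and a 1,250 QPS current reading.
    print(qps_increase_pct(1000, 1250))   # 25.0
    print(exceeds_threshold(1000, 1250))  # True
```

Including the baseline, the current value, and the time window you measured them over in your support request makes the comparison easier to verify.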
High latency may also be caused by network issues. To determine whether this is the primary cause of your increased latency or timeouts, please select a server that you use to process bids, then run diagnostics from that server to the MediaGuard cluster and include all results in your support request.
Run the following diagnostic commands from the bidding servers making the API call to MediaGuard:
- traceroute <whiteops_endpoint> (at least three times)
- traceroute --tcp --port=443 <whiteops_endpoint> (at least three times)
- mtr --report-wide --report-cycles=200 <whiteops_endpoint>
- mtr --report-wide --tcp --report-cycles=200 --port=443 <whiteops_endpoint>
You may receive different results between different traceroutes. This sometimes occurs when ICMP requests and similar diagnostic traffic are deprioritized along the route.
Ideally, you should run the above tests for multiple servers within your bidding infrastructure in the affected region. You can obtain the current IP address of any MediaGuard server by running the dig command on the cluster's DNS endpoint.
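If you want to collect all of the diagnostic output in one file, the runs listed above could be scripted roughly as follows. This is a sketch under assumptions: the function names and the output file name are hypothetical, the endpoint must be replaced with your own cluster endpoint, and `traceroute` and `mtr` must already be installed on the bidding server. TCP traceroutes may require elevated privileges on some systems.

```python
import shlex
import subprocess


def build_diagnostic_commands(endpoint: str, repeats: int = 3, cycles: int = 200):
    """Build the traceroute and mtr command lines listed in this article."""
    traceroutes = [
        ["traceroute", endpoint],
        ["traceroute", "--tcp", "--port=443", endpoint],
    ]
    # Each traceroute variant is repeated (at least three times is requested).
    commands = [list(cmd) for cmd in traceroutes for _ in range(repeats)]
    commands.append(
        ["mtr", "--report-wide", f"--report-cycles={cycles}", endpoint]
    )
    commands.append(
        ["mtr", "--report-wide", "--tcp", f"--report-cycles={cycles}",
         "--port=443", endpoint]
    )
    return commands


def run_diagnostics(endpoint: str, output_path: str = "mediaguard_diagnostics.txt"):
    """Run every command in order and write its output to output_path."""
    with open(output_path, "w") as out:
        for cmd in build_diagnostic_commands(endpoint):
            out.write(f"$ {shlex.join(cmd)}\n")
            # Raises FileNotFoundError if the tool is not installed.
            result = subprocess.run(cmd, capture_output=True, text=True)
            out.write(result.stdout)
            out.write(result.stderr)
            out.write("\n")
    return output_path
```

Calling `run_diagnostics("<whiteops_endpoint>")` with your real endpoint on each bidding server in the affected region produces one file per server that you can attach to the support request.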
HUMAN will also run independent traceroutes from our servers to your system, which allows us to compare results from both approaches. Please provide us with an externally-accessible endpoint that allows us to send traceroute requests to your bidding server.
Another common source of high latency is non-ideal connection management. The number of connections, QPS (queries per second) per connection, and connection lifetime are the primary factors that may impact your MediaGuard latency.
Under ideal network conditions, your MediaGuard cluster can support up to 100 QPS per connection. However, in a typical implementation, we recommend utilizing between 65% and 80% of the available connection capacity to send and receive MediaGuard lookup requests. You can calculate the ideal number of connections for a given target utilization by using the following formula:
(mean latency × QPS) ÷ (1000 × utilization) = number of connections
In this formula, mean latency is expressed in milliseconds (the factor of 1000 converts it to seconds) and utilization is expressed as a fraction (for example, 0.65 to 0.80).
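As a worked example of the formula above, the following sketch computes the connection count for a sample workload. The function name and the sample numbers (10 ms mean latency, 5,000 QPS) are illustrative assumptions, as is the choice to round up to a whole connection; latency is assumed to be in milliseconds, which the factor of 1000 implies.

```python
import math


def ideal_connections(mean_latency_ms: float, qps: float, utilization: float) -> int:
    """Apply (mean latency x QPS) / (1000 x utilization), rounded up."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be a fraction in (0, 1]")
    if mean_latency_ms < 0 or qps < 0:
        raise ValueError("mean_latency_ms and qps must be non-negative")
    return math.ceil((mean_latency_ms * qps) / (1000 * utilization))


if __name__ == "__main__":
    # Hypothetical workload: 10 ms mean latency at 5,000 QPS.
    print(ideal_connections(10, 5000, 0.80))  # 63
    print(ideal_connections(10, 5000, 0.65))  # 77
```

For this sample workload, the recommended utilization range of 65% to 80% corresponds to roughly 63 to 77 connections.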
If a given connection is only established for a minimal number of requests before disconnecting, this inefficiency will negatively impact performance due to connection overhead. To counteract this inefficiency, we recommend using fewer connections but keeping each connection open longer and sending a larger number of lookup requests (rather than using short-lived connections for a smaller number of requests).
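The long-lived connection pattern described above might look like the following sketch. It assumes, purely for illustration, that lookups are sent over HTTPS; the class name, host, request path, and payload are all hypothetical and are not taken from the MediaGuard API. The point is only that one connection is opened once and reused for many requests.

```python
import http.client


class PersistentLookupClient:
    """Reuse a single connection for many lookup requests (illustrative only)."""

    def __init__(self, host, connection_factory=http.client.HTTPSConnection):
        self._host = host
        self._factory = connection_factory
        self._conn = None
        self.connections_opened = 0

    def _connection(self):
        # Open a connection only when none is currently held.
        if self._conn is None:
            self._conn = self._factory(self._host)
            self.connections_opened += 1
        return self._conn

    def lookup(self, path, body):
        conn = self._connection()
        try:
            conn.request("POST", path, body=body)
            response = conn.getresponse()
            # Reading the full body lets the connection be reused.
            return response.status, response.read()
        except (http.client.HTTPException, OSError):
            # Drop the broken connection; the next call opens a new one.
            conn.close()
            self._conn = None
            raise
```

A caller would create one `PersistentLookupClient` per connection slot and send many lookups through it, rather than constructing a new client (and a new connection) for every request.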
Note: Since MediaGuard is optimized to perform more efficiently with higher traffic volumes, tests run at low QPS will not accurately reflect MediaGuard's performance. After you've established a successful connection, we recommend running tests at slightly elevated QPS levels (above 200 QPS) to obtain more accurate metrics.
If your servers are measuring the response time for each of your MediaGuard requests, please provide these statistics in your support request. HUMAN can compare this data to our own measurements for more thorough troubleshooting.
Ideally, your metrics should consist of the following:
- Metrics by datacenter and/or region
- Timeout percentage
- Latency histogram (for example):
  - 50% < 15ms
  - 30% >= 15ms and < 20ms
  - 15% >= 20ms and < 30ms
  - 5% >= 30ms
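If you record per-request latencies, a histogram and timeout percentage in the shape shown above could be produced with a sketch like the following. The function name, the input format (a list of latencies in milliseconds for completed requests plus a separate timeout count), and the choice to compute bucket percentages over completed requests only are assumptions for illustration.

```python
def summarize_latencies(latencies_ms, timeouts=0):
    """Return bucket percentages for completed requests and the timeout percentage."""
    buckets = {
        "< 15ms": 0,
        ">= 15ms and < 20ms": 0,
        ">= 20ms and < 30ms": 0,
        ">= 30ms": 0,
    }
    completed = 0
    for latency in latencies_ms:
        completed += 1
        if latency < 15:
            buckets["< 15ms"] += 1
        elif latency < 20:
            buckets[">= 15ms and < 20ms"] += 1
        elif latency < 30:
            buckets[">= 20ms and < 30ms"] += 1
        else:
            buckets[">= 30ms"] += 1

    total = completed + timeouts
    histogram = {
        name: (count * 100.0 / completed if completed else 0.0)
        for name, count in buckets.items()
    }
    timeout_pct = timeouts * 100.0 / total if total else 0.0
    return {"histogram_pct": histogram, "timeout_pct": timeout_pct}


if __name__ == "__main__":
    # Hypothetical sample: eight completed requests and two timeouts.
    print(summarize_latencies([5, 10, 15, 19.9, 20, 29, 30, 100], timeouts=2))
```

Running the same summary separately for each datacenter or region gives the per-region breakdown requested above.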
To give us a broader sense of your system and MediaGuard implementation, we also ask that you fill out the following questionnaire and include the answers in your support request:
- Were there any recent changes to your ad server?
- Were there any recent changes to the volume of requests to your ad server?
- Were there any recent changes to the network/service provider?
- Were there any recent changes to the routing?
- Did the issue persist for at least half an hour?
- Is the distance between your ad server and the MediaGuard cluster greater than 48 kilometers (30 miles)?
- Are there more than fifteen hops between your ad server and the MediaGuard cluster?
- Have you performed a traceroute and MTR using the ICMP protocol?
- Have you performed a traceroute and MTR using the TCP protocol?
- Is the latency originating from a hop close to HUMAN's cluster?
- Have you confirmed the results with your NOC (Network Operations Center) team?
- What is the externally-accessible endpoint for HUMAN to send traceroute requests to your bidding server? (And is the ICMP protocol allowed through your firewalls?)
- If you are collecting any metrics, can you provide any graphs that illustrate the timeline of your issues?