Services Director 19.1 introduced a new communications channel to Traffic Managers (vTM), which lets Services Director access the REST API of vTM over a secure, authenticated link. This is particularly useful in networks behind NATs or firewalls. However, because this link is mutually authenticated, care is needed when updating Services Director's server certificate. For information on the Services Director Communications Channel, and on how to update the certificates manually, see Introducing the Services Director Communications Channel.

Updating Services Director certificates with a script

Where direct access to the vTM GUI is not available, or in estates where breaks in communication cannot be tolerated, the Services Director certificate can be updated with a script. The script uses the Services Director's vTM REST API proxy to configure each vTM that uses the comms channel to hold a pair of server certificates: both the old and the new Services Director server certificate. This means that the vTMs can be primed before the Services Director server certificate is updated, and will be able to reconnect immediately after it has changed.

The script is called update_all_vtm_certificates.py. It is written for Python 3, has been tested on Ubuntu 18.04, and is available for download from https://github.com/pulse-vadc/sd-update-vtm-certificates-script

Customers are strongly recommended to follow the instructions in the README.md, which describes how to use Python's virtualenv functionality to isolate the Python dependencies used by update_all_vtm_certificates.py from the native Python libraries on the machine it runs on (and vice versa).

The script is intended to be used as follows:

1. The user generates a new key and server certificate for the Services Director, as described in the Advanced User Guide.
2. The script is called, passing in:
   - (A file path to) the new server certificate
   - (A file path to) the current server certificate
   - A URL for the Services Director's REST API (the Services Endpoint Address plus port, for Services Director virtual appliances)
   - The administrator username for Services Director
   - The administrator password for Services Director (if the password is not provided on the command line, the script prompts for it interactively)
   - (A file path to) the new key corresponding to the new server certificate (used only to check it against the new server certificate)
   With this invocation, the script will:
   - Validate the pairing of the new server certificate and key
   - Disable monitoring (because monitoring could otherwise interfere with the update process)
   - For each comms channel enabled vTM in the Services Director's estate:
     - Use the Services Director's vTM REST API proxy to install the new server certificate, with the old server certificate as a secondary server certificate
     - Wait for the comms channel to be re-established (the change above causes an immediate disconnection), then check that the certificate change has taken effect
   - Re-enable monitoring
   - Record any comms channel enabled vTMs that were not updated successfully, and write their identities to the file vTM-update-failed.txt
3. If vTM-update-failed.txt shows that any vTMs failed to be updated, step 2 can be repeated, this time passing the additional parameter --vtms-to-update set to vTM-update-failed.txt. This retries the update process for just the instances listed in the file.
This step can be repeated as needed to ensure that all vTMs are updated. Any vTMs that still fail to update may require manual intervention (for more information on manually updating the certificate, see the Manual update section in Introducing the Services Director Communications Channel).
4. At this point, the Services Director's server certificate can be updated in Services Director itself. In the Services Director's GUI, select System > Service SSL Certificate, then click the link to update the certificate. Comms channel enabled vTMs in the Services Director's estate should reconnect to Services Director once the certificate has been updated.
5. Once the comms channels for the affected vTMs have been re-established, run the script a second time to remove the old Services Director server certificate from the comms channel enabled vTMs in the estate. For this run, add the --remove-old-certificates flag to the script's parameters (and omit --vtms-to-update).

Example usage session

In this example, we are updating a Services Director with three registered vTM instances that use the comms channel, one of which is temporarily unavailable. As a reminder, the script is called update_all_vtm_certificates.py, and is available for download from https://github.com/pulse-vadc/sd-update-vtm-certificates-script.

$ python3 update_all_vtm_certificates.py --new-sd-service-certificate ../../mim/test_certs/cert1.pem --current-sd-service-certificate ../../mim/test_certs/cert.pem --sd-url https://10.62.164.81:8100 --sd-username admin --sd-password mypassword --new-private-key ../../mim/test_certs/key1.pem
INFO - Disabled monitoring on Services Director
Query vTMs : 100%|####################################################################| 3/3 [00:00<00:00, 23.13it/s]
Update vTMs : 0%| | 0/3 [00:00<?, ?it/s]WARN - Failed to update vtm Instance-P4FJ-CHM8-VEIT-41NG: 'properties'
Update vTMs : 100%|####################################################################| 3/3 [00:06<00:00, 2.09s/it]
The following vTMs were successfully updated:
There were errors updating the following vTMs.
A list of vTMs which were not updated has been saved in vTM-update-failed.txt to retry these vTMs re-run this script with the paramater --vtms-to-update vTM-update-failed.txt
not using comms channel: 0
successfully updated : 2
failed to update : 1
Some vTMs have not had their server_certificate_secondary updated. Please
ensure that the server certificate is updated for these vTMs, either by
retrying this script, or manually updating the server_certificate for each vTM.
Once the server_certificate is updated on all vTMs you can change the SSL
Service Certificate in services director.
INFO - Reset monitoring back to all on Services Director

The script runs and successfully updates two of the vTMs, but the third vTM is not updated. The session writes a file, vTM-update-failed.txt, which contains the identifiers of the instances still to be updated as a JSON list; here it lists the third vTM:

["Instance-P4FJ-CHM8-VEIT-41NG"]

We run the script a second time, adding this file to the parameter list with --vtms-to-update vTM-update-failed.txt:

$ python3 update_all_vtm_certificates.py --new-sd-service-certificate ../../mim/test_certs/cert1.pem --current-sd-service-certificate ../../mim/test_certs/cert.pem --sd-url https://10.62.164.81:8100 --sd-username admin --sd-password password --new-private-key ../../mim/test_certs/key1.pem --vtms-to-update vTM-update-failed.txt
INFO - Disabled monitoring on Services Director
Update vTMs : 100%|####################################################################| 1/1 [00:04<00:00, 4.86s/it]
The following vTMs were successfully updated:
vtms to update : 1
successfully updated : 1
failed to update : 0
INFO - Reset monitoring back to all on Services Director

At this point, all three vTMs have been updated, and are shown as being successfully monitored in the Services Director GUI.

Once the Services Director's server certificate has been updated in the Services Director GUI and the vTMs have reconnected, a final invocation of the script, with the parameter --remove-old-certificates, removes the old server certificate from the vTM estate:

$ python3 update_all_vtm_certificates.py --new-sd-service-certificate ../../mim/test_certs/cert1.pem --current-sd-service-certificate ../../mim/test_certs/cert.pem --sd-url https://10.62.164.81:8100 --sd-username admin --sd-password password --new-private-key ../../mim/test_certs/key1.pem --remove-old-certificates
INFO - Disabled monitoring on Services Director
Query vTMs : 100%|####################################################################| 3/3 [00:00<00:00, 23.21it/s]
Update vTMs : 100%|####################################################################| 3/3 [00:00<00:00, 3.59it/s]
The following vTMs have had their server_certificate_secondary removed:
vtms to update : 3
successfully updated : 3
failed to update : 0
INFO - Reset monitoring back to all on Services Director
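The reason the two-pass procedure avoids a break in communication can be illustrated with a small conceptual model. This is a sketch under stated assumptions, not the code of update_all_vtm_certificates.py or of vTM: the ModelVTM class, the prime and remove_old functions, and the certificate labels are all hypothetical. It assumes that a vTM authenticates Services Director if the certificate presented matches either its primary (server_certificate) or its secondary (server_certificate_secondary) server certificate, and it models the failed-instance list in the JSON list format described for vTM-update-failed.txt.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelVTM:
    """Hypothetical stand-in for a comms channel enabled vTM."""
    name: str
    primary: str                     # models server_certificate
    secondary: Optional[str] = None  # models server_certificate_secondary
    reachable: bool = True

    def accepts(self, sd_cert: str) -> bool:
        # Assumed rule: SD is authenticated if its certificate matches either slot.
        return sd_cert in (self.primary, self.secondary)


def prime(vtms: List[ModelVTM], new_cert: str, old_cert: str) -> str:
    """First pass: install the new cert as primary and keep the old one as secondary.

    Returns the names of vTMs that could not be updated, as a JSON list.
    """
    failed = []
    for vtm in vtms:
        if not vtm.reachable:
            failed.append(vtm.name)
            continue
        vtm.primary, vtm.secondary = new_cert, old_cert
    return json.dumps(failed)


def remove_old(vtms: List[ModelVTM]) -> None:
    """Final pass (like --remove-old-certificates): drop the secondary cert."""
    for vtm in vtms:
        vtm.secondary = None


if __name__ == "__main__":
    estate = [ModelVTM("vtm-1", "old-cert"), ModelVTM("vtm-2", "old-cert"),
              ModelVTM("vtm-3", "old-cert", reachable=False)]
    # Unprimed vTMs would reject SD as soon as SD switches to the new cert.
    print(any(v.accepts("new-cert") for v in estate))  # False
    failed = prime(estate, "new-cert", "old-cert")
    print(failed)  # ["vtm-3"]
    # Retry only the listed vTMs once they are reachable again.
    estate[2].reachable = True
    retry = set(json.loads(failed))
    prime([v for v in estate if v.name in retry], "new-cert", "old-cert")
    # Every vTM now trusts both certs, so SD can change its certificate safely.
    print(all(v.accepts("old-cert") and v.accepts("new-cert") for v in estate))  # True
    remove_old(estate)
    print(all(v.accepts("new-cert") and not v.accepts("old-cert") for v in estate))  # True
```

The key design point the model captures is ordering: the vTMs must trust the new certificate before Services Director starts presenting it, and the old certificate can only be removed after that switch.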
Introduction

Services Director 19.1 introduced a new communications channel to connect to Traffic Managers (vTM). This lets Services Director access the REST API of vTM via a secure authenticated link that is established by the vTM rather than by the Services Director. This allows Services Director to manage vTMs deployed into networks that are not addressable by Services Director (for example, networks behind NATs or firewalls). This new secure communications channel is useful in a range of use cases, not just in NAT/firewall scenarios.

However, the new mutually authenticated link introduces complexity when updating Services Director's server certificate. This document describes the Communications Channel, and shows how to update the certificates manually, using the GUI. It is also possible to use a script to update the certificates; for information on this, see Automating Certificate Updates for the Services Director Communications Channel.

The Communication Channel mechanism

Configuration for the vTM/Services Director communications channel ("comms channel") is initially set up during the self-registration process that enrols a vTM into the Services Director's estate. This process requires the vTM to have been seeded with the Services Director's server certificate, so that it can authenticate the Services Director when connecting. The vTM establishes its own unique client key/certificate pair, and provides the client certificate to Services Director when making the self-registration request. Because each party now holds the other's certificate, both parties have a means of authenticating future comms channel links.

Updating the Services Director certificate

All comms channel links are mutually authenticated using the certificates described above. The vTM attempts to keep a connection to the Services Director alive constantly, reconnecting (and re-authenticating) if the link is lost or dropped. As a result, updating the Services Director's server certificate needs to be done carefully to ensure continued communication.
Previously, before the introduction of the comms channel, the Services Director server certificate held by the vTM was used only during the self-registration process, so the server certificate in use by Services Director could be updated without any implications for vTMs already in the estate.

Now, for vTMs using the comms channel, updating Services Director to use a new server certificate can cause problems. Because the vTMs still hold the old server certificate, they will not be able to authenticate the Services Director when they attempt to reconnect the comms channel, and establishment of the comms channel will fail. While this failure does not affect Universal FLA licensing, Services Director will be unable to perform monitoring, metering, backup/restore operations or any other activity requiring the vTM's REST API. (NB. Although legacy FLA licenses do not use the comms channel, they do authenticate the Services Director using its server certificate, so they would also be affected by an update.)

The way to avoid this problem is to "prime" each vTM with the new server certificate, so that it can reconnect once the Services Director's server certificate is updated. The next section shows how to update the certificates manually, using the GUI. It is also possible to use a script to update the certificates; for information on this, see Automating Certificate Updates for the Services Director Communications Channel.

Manual update

For very small estates, where the administrator has direct access to the GUI of each managed vTM, and the risks of a short break in communications between Services Director and the vTM are small, it is possible to update the certificates manually. The break in communication caused by a manual update will cause Services Director features such as monitoring, metering and vTM backups to fail for the affected vTMs (but not Universal FLA licensing).
This is because Services Director will be unable to access the vTMs' REST APIs. In addition, if the "Auto Cleanup vTMs" feature is enabled, a monitoring failure would result in the vTM being deleted from the Services Director's estate. As noted above, vTM instances using the Universal FLA will continue to be licensed, unaffected by a change in the server certificate, while vTM instances using legacy FLAs will be impacted by this change; legacy FLA customers planning to update their server certificate must obtain an updated legacy FLA.

Setting the new server certificate

First, update the Services Director's server certificate. In the Services Director's GUI, select System > Service SSL Certificate, then click the link to update the certificate.

For all vTM instances using a comms channel link to this Services Director, this will cause a comms channel disconnection; the affected vTMs will start reporting connection errors, and will continue to do so until their certificates have been updated.

Given access to their GUIs, each vTM instance can easily be updated to use the new server certificate. Under System > Licenses > Services Director Registration, paste the new Services Director server certificate into the box for remote_licensing!server_certificate, and click the "Save and register" button. Once this is done, the vTM event log should show a new "Self registration successful" message, and no further "SD Communications Channel Aborted" messages should appear. Repeat this procedure for each vTM in the estate.
This article is the last in the series, beginning with Analytics Application - Concepts and Metrics Explained.

The Analytics Application included in Services Director can apply a Sample Filter to reduce query times. When navigating interactively through very large analytics datasets, it is sometimes desirable to trade accuracy for processing time, and the analytics application provides a Sampling Selector control to allow users to do this. Sampling can be applied to any graph except the Dataset View.

A sampling ratio is the probability of any single event being included in the total result set. For example, if the sampling ratio is 1:100, each event has a 1 in 100 chance of being included in the results. The selection of each event is independent. As a result, it is possible that many of the first 100 events will be included, or that none of them will be.

NOTE - If you re-run a sampling search, it is virtually certain that different specific results will be returned.

Services Director supports sampling ratios from 1:10 to 1:10000:
- A 1:10 sampling ratio retrieves the most data for a given dataset, and is the most representative of the source data.
- A 1:10000 sampling ratio retrieves the least data for a given dataset, and is therefore the least representative.
A sampling ratio of 1:1 is also supported, which indicates that all data is included; that is, there is no sampling.

If sampling is required, your search should always retrieve as much data as practical. That is, if a 1:10 sampling ratio produces acceptable results, do not move on to a 1:100 sampling ratio.

NOTE - Where analytics events are used to calculate totals (such as throughput and requests per second), sampling should be used with caution. All displayed totals are approximations for the entire dataset, based on the sample. As the sampling ratio increases (for example, from 1:10 to 1:1000), the accuracy of this approximation decreases.
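To make the independent selection described above concrete, here is a minimal Python sketch of 1:N sampling and of approximating a total from the sample. It is not the Services Director implementation: the function names are hypothetical, and it assumes a total is approximated by scaling the sampled sum by N, which is a common approach but is not confirmed by this article.

```python
import random
from typing import List, Optional


def sample_events(events: List[float], n: int, seed: Optional[int] = None) -> List[float]:
    """Model a 1:n sampling ratio: keep each event independently with probability 1/n."""
    if n < 1:
        raise ValueError("n must be >= 1")
    if n == 1:
        return list(events)  # 1:1 means no sampling at all
    rng = random.Random(seed)
    # Each event gets its own independent draw, so a run of events may be
    # heavily included or skipped entirely.
    return [e for e in events if rng.random() < 1.0 / n]


def estimate_total(sample: List[float], n: int) -> float:
    """Approximate a whole-dataset total by scaling the sampled sum by n (assumed approach)."""
    return sum(sample) * n


if __name__ == "__main__":
    events = [1.0] * 100_000  # hypothetical dataset: one unit per event
    for n in (1, 10, 100, 1000, 10000):
        sample = sample_events(events, n, seed=42)
        print(f"1:{n}: kept {len(sample)} events, estimated total {estimate_total(sample, n):.0f}")
```

Running this with different seeds (the equivalent of re-running a sampling search) returns different specific events, and the spread of the estimated total widens as N grows, which is why the smallest practical N is recommended.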
All of the standard controls/filters are applied when using sampling. Note that:
- The sampling ratio also affects the query performed for the Component Filter. This can result in the Component Filter 'missing' some values that are present in the dataset, particularly when using sampling on smaller datasets, or when using higher sampling ratios.
- Where a sampled set of results does not include a selected value for a specific Component Filter category, that value is de-selected in the filter.

This article is the last in the series, beginning with Analytics Application - Concepts and Metrics Explained
Prev: Using the Component Filter to Refine Queries
This article is part of a series, beginning with Analytics Application - Concepts and Metrics Explained.

The Analytics Application included in Services Director can apply a Component Filter to refine queries. The values shown in the Component Filter are themselves calculated by a query. The intention of the Component Filter is to show the user all the permutations of Locations, Clusters, vTMs, vServers, Pools, and Nodes that appear in the selected part of the dataset.

The values are derived as follows:
1. Filter out all records outside the time range.
2. Apply any further filters (from the Extended Filter).
3. Count records, splitting by Location, Cluster, vTM, vServer, Pool and Node.
   NOTE - The count itself is not needed, but it is the simplest way to achieve the split.
4. Ignore the counts, while retaining the combinations of Location, Cluster, vTM, vServer, Pool and Node revealed by the split.

The result of this is an internal table with columns for Locations, Clusters, vTMs, vServers, Pools and Nodes. Each row of this table represents a unique end-to-end 'path' that one (or more) connections/requests have taken through the estate, within the time period and with any filters applied. Any given category value will likely appear more than once, in combination with different values from other columns.

Finally, the drop-down boxes for each category (Locations, Clusters, vTMs, vServers, Pools, and Nodes) are filled with all the unique values from the corresponding column in the internal table.

The query is re-run whenever the user navigates in a way that invalidates the table of results, such as changing the time range or applying an Extended Filter clause. Note that some local optimisation allows values to be selected in the Component Filter without re-running the query.

This article is part of a series, beginning with Analytics Application - Concepts and Metrics Explained
Prev: Reporting on Top Events Next: Using Sampling to Reduce Query Times
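The Component Filter derivation steps described above can be sketched in Python as follows. This is illustrative only: the record layout (a dict with a "time" field plus one field per category), the half-open time range, and the function name are assumptions, and a set of tuples stands in for the "count, then ignore the counts" split.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical field names for the six Component Filter categories.
CATEGORIES = ("location", "cluster", "vtm", "vserver", "pool", "node")


def component_filter_values(
    records: Iterable[dict],
    start: float,
    end: float,
    extended_filter: Callable[[dict], bool] = lambda r: True,
) -> Dict[str, List[str]]:
    # Steps 1-2: drop records outside [start, end), then apply any Extended Filter.
    selected = (r for r in records if start <= r["time"] < end and extended_filter(r))
    # Steps 3-4: keep only the unique combinations (the internal "path" table).
    paths: Set[Tuple[str, ...]] = {tuple(r[c] for c in CATEGORIES) for r in selected}
    # Final step: each drop-down lists the unique values of its column.
    return {c: sorted({p[i] for p in paths}) for i, c in enumerate(CATEGORIES)}


if __name__ == "__main__":
    rows = [
        ("L1", "C1", "vtm1", "web", "pool-a", "10.0.0.1"),
        ("L1", "C1", "vtm1", "web", "pool-a", "10.0.0.2"),
        ("L2", "C2", "vtm2", "api", "pool-b", "10.0.1.1"),
    ]
    records = [dict(zip(CATEGORIES, row), time=t) for t, row in enumerate(rows)]
    print(component_filter_values(records, start=0, end=2))
```

Because only unique paths are kept, a value that appears in many paths (such as "L1" above) still appears once in its drop-down.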
This article is part of a series, beginning with Analytics Application - Concepts and Metrics Explained.

The Analytics Application included in Services Director can report on the most common events tracked through the application, using the Top Events tab.

Top Events: Top 5 URLs

The Top 5 URLs chart on the Top Events tab shows:
- The five vServer URLs most often requested in the selected part of the dataset. For the purposes of this chart, a vServer URL is defined as a combination of the URL and the vServer that handles it, in the form <vServer>:<URL>.
- The number of requests directed towards these vServer URLs.
- The average request duration for requests for each vServer URL.

Each bar on the graph represents a different vServer URL. The length of each bar represents the number of requests directed to the vServer URL, while the color of the bar represents the average duration of these requests.

The metrics for this graph are derived as follows:
1. Combine the vServer name and request URL into a vServer URL (<vServer>:<URL>).
2. For all request-based records, calculate the count and the average request duration, both split by vServer URL.
3. Sort the result in descending order of request count, and remove any results below the fifth row.

Top Events: Top 5 Traffic IPs

The Top 5 TIPs chart on the Top Events tab shows:
- The five front-end Traffic IPs that handled the most connections and requests in the selected part of the dataset.
- The number of requests and connections directed towards each of them.
- The average combined duration of the requests and connections.

Each bar on the graph represents a different front-end Traffic IP. The length of each bar represents the number of requests and connections handled by the front-end Traffic IP, while the color of the bar represents the average combined duration of these requests and connections.

The metrics for the graph are derived as follows:
1. For all request and connection records, calculate the count and average duration, both split by Traffic IP.
   The duration for request-based transaction records is calculated as (timeline.crse - timeline.crqs), while for connection-based transaction records it is the value of the 'duration' field.
2. Sort the result in descending order of request/connection count, and remove any results below the fifth row.

Top Events: Top 5 Referrers

The Top 5 Referrers chart on the Top Events tab shows:
- The five HTTP referrers that originated the most requests in the selected part of the dataset.
- The number of requests with each of those referrers.
- The average duration of those requests.

Each bar on the graph represents a different HTTP referrer. The length of each bar represents the number of requests originated by that referrer, while the color of the bar represents the average request duration.

The metrics for the graph are derived as follows:
1. For all request-based records, calculate the count and the average request duration, both split by HTTP referrer.
2. Sort the result in descending order of request count, and remove any results below the fifth row.

Top Events: Top 5 Pools

The Top 5 Pools chart on the Top Events tab shows:
- The five pools that handled the most requests and connections in the selected part of the dataset.
- The number of requests and connections handled by those pools.
- The combined average duration of the requests and connections.

Each bar on the graph represents a different pool. The length of each bar represents the number of requests and connections handled by the pool, while the color of the bar represents the average combined duration of the requests and connections.

The metrics for the graph are derived as follows:
1. For all request and connection records, calculate the count and average duration, both split by pool.
2. Sort the result in descending order of request/connection count, and remove any results below the fifth row.
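All four Top 5 charts described above follow the same pattern: split records by a key, count them, average their durations, sort by count, and keep five rows. The following Python sketch shows that pattern under stated assumptions: the record fields ("kind", "timeline", "duration", "vserver", "url") and the helper names are hypothetical, while the duration rule is the one given in this article.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def record_duration(rec: dict) -> float:
    """Requests use timeline.crse - timeline.crqs; connections use the 'duration' field."""
    if rec["kind"] == "request":
        return rec["timeline"]["crse"] - rec["timeline"]["crqs"]
    return rec["duration"]


def top5(records: List[dict], key: Callable[[dict], str]) -> List[Tuple[str, int, float]]:
    """Count and average duration split by key, sorted by count (descending), top five rows."""
    counts: Dict[str, int] = defaultdict(int)
    totals: Dict[str, float] = defaultdict(float)
    for rec in records:
        k = key(rec)
        counts[k] += 1
        totals[k] += record_duration(rec)
    rows = [(k, counts[k], totals[k] / counts[k]) for k in counts]
    rows.sort(key=lambda row: row[1], reverse=True)  # stable, so ties keep first-seen order
    return rows[:5]


def vserver_url(rec: dict) -> str:
    """Hypothetical key for the Top 5 URLs chart: <vServer>:<URL>."""
    return f"{rec['vserver']}:{rec['url']}"


def requests_only(records: List[dict]) -> List[dict]:
    """Top 5 URLs and Top 5 Referrers use request-based records only."""
    return [r for r in records if r["kind"] == "request"]


if __name__ == "__main__":
    records = [
        {"kind": "request", "vserver": "web", "url": "/", "timeline": {"crqs": 0.0, "crse": 12.0}},
        {"kind": "request", "vserver": "web", "url": "/", "timeline": {"crqs": 1.0, "crse": 9.0}},
        {"kind": "request", "vserver": "web", "url": "/login", "timeline": {"crqs": 0.0, "crse": 40.0}},
    ]
    for name, count, avg in top5(requests_only(records), vserver_url):
        print(name, count, avg)
```

The Traffic IP and pool charts would pass all request and connection records with a different key function (for example, a hypothetical "pool" field).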
HTTP Response Code Charts

The Analytics Application also offers a way to chart HTTP response codes, which is presented as a Comparative Analysis view in the article Interpreting Horseshoe and Timeline Charts.

The HTTP Response Code chart shows the HTTP response code distribution for the selected part of the dataset. The distribution is broken down by pool (by default), and then by response code group (so that HTTP response codes 200 and 201 belong to the response code group "2XX", while 400, 403 and 404 belong to the group "4XX", and so on). Note that if a split has been selected for the primary chart, then that same split is used for the HTTP Response Code chart in place of the pool split.

Each column in the graph represents a different pool (or value from the primary chart split), with the overall height of the column representing the total number of requests handled, and the heights of the subdivisions within that column showing the distribution of the response code groups.

The metrics for this graph are derived as follows:
1. Convert the HTTP response code into an HTTP response code group.
2. Count all request-based records, split by HTTP response code group and pool name.

This article is part of a series, beginning with Analytics Application - Concepts and Metrics Explained
Prev: Interpreting Horseshoe and Timeline Charts Next: Using the Component Filter to Refine Queries
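The two derivation steps for the HTTP Response Code chart can be sketched in Python as below. This is a minimal illustration, not the Services Director query: the record fields ("kind", "status", "pool") and the function names are hypothetical, and split_field stands in for the primary chart split that replaces the pool split when one is selected.

```python
from collections import Counter
from typing import Iterable


def response_code_group(code: int) -> str:
    """Step 1: 200 and 201 -> "2XX"; 400, 403 and 404 -> "4XX"; and so on."""
    return f"{code // 100}XX"


def response_code_chart(records: Iterable[dict], split_field: str = "pool") -> Counter:
    """Step 2: count request-based records by (split value, response code group)."""
    return Counter(
        (r[split_field], response_code_group(r["status"]))
        for r in records
        if r["kind"] == "request"  # connection records have no HTTP status
    )


if __name__ == "__main__":
    records = [
        {"kind": "request", "pool": "web", "status": 200},
        {"kind": "request", "pool": "web", "status": 404},
        {"kind": "request", "pool": "api", "status": 201},
        {"kind": "connection", "pool": "db"},
    ]
    for (pool, group), count in sorted(response_code_chart(records).items()):
        print(pool, group, count)
```

Each (pool, group) count corresponds to one subdivision of a column; summing the counts for a pool gives that column's overall height.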
This article is part of a series, beginning with Analytics Application - Concepts and Metrics Explained.

Comparative Analysis

The Analytics Application included with Services Director offers a Comparative Analysis view, which provides the means to plot up to two further metrics (or a single metric with a split applied) against the same timeline represented on the primary timechart. Metrics are derived in the same way as for the primary timechart.

The Alternative Views tab displayed under the line view provides a means to visualise the average timeline of requests passing through a single vServer, including "Horseshoe" and "Timeline" chart options.

Deriving Timestamp Metrics

The visualisation is based on timestamps relating to the start and end of the phases of processing that vTM undertakes to handle a transaction. These timestamps are measured in seconds relative to the start of the connection (or, in the case of a request protocol on a keepalive connection, the end of the previous request on the connection). The timestamps are defined as follows:

Request Handling Timestamps
Client Timestamps
- "crqs" Client Request Start - first read from the client after connection (or the end of the previous request on a keepalive connection)
- "crqe" Client Request End - last read from the client; that is, the end of the request body for HTTP
vTM Timestamps
- "trqs" Traffic Manager Request Start - the point at which vTM has enough information to make a load balancing decision (that is, the end of headers for HTTP)
- "rrqs" Rules Request Start - about to run the first TrafficScript request rule
- "rrqe" Rules Request End - finished running TrafficScript request rules
- "trqe" Traffic Manager Request End - the vTM has initiated a connection to the server, in preparation for transmitting the request and receiving a response
Server Timestamps
- "srqs" Server Request Start - first write from vTM to the back-end server node
- "srqe" Server Request End - last write from vTM to the back-end server node

Response Handling Timestamps
Server Timestamps
- "srss" Server Response Start - first read from the back-end server node by the vTM
- "srse" Server Response End - last read from the back-end server node by the vTM
vTM Timestamps
- "trss" Traffic Manager Response Start - the point at which vTM has enough information to process the response; that is, the end of HTTP headers
- "rrss" Rules Response Start - about to run the first TrafficScript response rule
- "rrse" Rules Response End - finished running TrafficScript response rules
- "trse" Traffic Manager Response End - the point at which vTM has completed processing the response and is simply forwarding it to the client
Client Timestamps
- "crss" Client Response Start - first write to the client
- "crse" Client Response End - last write to the client

From the timestamps, we can derive the durations of a number of processing tasks within the transaction handling:

Metric   Calculated As       Description
"crq"    ("crqe"-"crqs")     Duration of vTM reception of client request
"trq"    ("trqe"-"trqs")     Duration of vTM processing of client request
"srq"    ("srqe"-"srqs")     Duration of vTM transmission of processed client request to server
"spr"    ("srss"-"srqe")     Duration of server processing
"srs"    ("srse"-"srss")     Duration of vTM reception of server response
"trs"    ("trse"-"trss")     Duration of vTM processing of server response
"crs"    ("crse"-"crss")     Duration of vTM transmission of server response to the client

These timestamps are illustrated on the following timeline of a typical vTM transaction, showing how the segments of the transaction map to the metrics calculated above.

The "Horseshoe" and "Timeline" charts in the analytics application are based on averages of the timings and derived durations from the transaction records that fall within the selected time range and filters. The color of each segment reflects the duration it represents: values close to zero are shown in green, values of 1000ms or more are shown in red, with a gradient of colors in between (as shown in the key/legend). This can provide a handy visual clue as to where processing is taking human-discernible periods of time, and which phases of processing are taking longest to complete.

Timeline Charts

The timeline chart represents the same durations as the horseshoe chart, but combines these durations with the average start points for each phase of processing to present an aggregate timeline.

Note - The start time for each phase is defined as the start time of the phase minus crqs. This is because crqs represents the wait time from client connection establishment to the first request byte (or the wait time between the end of one request and the first byte of the next request). These wait times can be very variable in length, and can distort the timeline chart without adding much information about how the request processing time breaks down between the vTM and the back-end server. Hence, the left-hand edge of the timeline view can be considered to represent the point in time where the first request byte is received by vTM (in other words, crqs).

The timeline will often require careful interpretation, for a number of reasons:

1. Phases of processing can be too small to visualize
The phases may be so fast that they are almost unnoticeable when plotted on the timeline. For example, see the Request from Client, vTM Req Processing and Request To Server phases on the chart above. This indicates that the client request handling on vTM is trivial for this vServer.

2. Phases of processing can and often do overlap
For example, note from the timeline of a typical vTM transaction diagram that a vTM can commence sending a request to the server before having fully received the client request. Also note in the example timeline chart above how the start of the Response from Server bar is almost immediately followed by a 0ms vTM Response Processing phase and a Response to Client bar. This is a reasonable indicator that, for this request, the vTM is simply forwarding the response to the client.

Where a later phase of processing commences before the end of an earlier phase, the bar representing the earlier phase is split into two differently-colored sections:
- A non-overlapping section, where only the earlier phase is in progress. This may represent a 'critical path' of processing before which the next phase cannot commence. However, it may also indicate that the next phase cannot commence for some other reason, for example while waiting for a server connection. The color of this section of the bar reflects the duration it represents, using the same scale as the horseshoe segments. That is, values close to zero are shown in green, values of 1000ms or more are shown in red, with a gradient of colors in between.
- An overlapping section, where both phases are in progress. The color of this section is a darker shade of the color used for the non-overlapping section.

Note that where two processing phases share long overlapping sections, it will be the bar of the 'later' phase whose colors reflect the duration of the processing (for example, showing red for phases of 1000ms or more). While this is helpful for identifying processing phases that are taking longer than desirable, it should not be taken to automatically mean that the later phase is the cause of the delay.
For example, when a server slowly streams a response to a vTM and the vTM streams this immediately back to the client, the Response to Client bar will be shown red, while the Response from Server bar will be shown green. 3. Similar traffic patterns may be processed differently The traffic that passes through a vServer does not always follow a single homogenous pattern of processing. The configuration of a vServer may result in a number of traffic processing patterns that lead to very different timing patterns, some of which will entirely skip some phases of processing. For example, a vServer that has caching enabled is sitting in front of a relatively slow server. The traffic passing through that vServer will likely fall into two categories: Traffic requiring server interaction. There will be definite Request to Server and Response from Server / Response to Client sections in the timing patterns, potentially separated by a Server Processing bar, during which the vServer is waiting for the server to start responding. For example, the following is a timeline showing averages over a five minute period for a vServer. The vServer is fronting a server that returns a large, static payload with caching enabled. In this case, there is an Extended Filter clause set to HTTP Response Cache Hit IS FALSE. Traffic served from cache. The Request from Client can be very short (possibly registering as 0ms if the request itself is trivial). Also, Response to Client will start without any delay for Server Processing and is potentially much shorter in duration than the equivalent non-cached case. This is because the response can be served by vTM from memory without making any connection/request to the back end server. As with Server Processing, the Response from Server and vTM Response Processing phases are not required in the cached case, and will show as 0ms in duration. The following is an example timeline from the same vServer. 
Caching is still enabled, and the same large static payload is being delivered over the same time period as above. However, the Extended Filter clause is now set to HTTP Response Cache Hit IS TRUE:
The combination of heterogeneous traffic patterns and averaging can lead to timing charts that appear to depict "impossible" timelines, for example a Response to Client occurring before Server Processing. This can be seen by combining the graphs above (removing any filtering based on cache hits) to produce the following timeline. Note that the average Response to Client begins before the average Response from Server:
As a result, when viewing such charts, it is important to remember that the chart depicts average start times and durations, and that apparent timeline anomalies are likely a sign that two or more traffic patterns are combined in the same dataset. These traffic patterns can differ, potentially radically, in their average timings. The Dataset View can be used to investigate potential reasons for these differences, such as responses from cache, responses from TrafficScript, connection failures, and so on. Further filtering can then be applied to separate out these different timing pattern groups, to produce a more standard "waterfall" timeline for each group.
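The two effects described in this article, split bars for overlapping phases and averaging over mixed traffic patterns, can be illustrated with a short Python sketch. It is not the Analytics Application's implementation: the linear RGB gradient, the 0.6 darkening factor and all phase timings are made-up assumptions. The averaging part also assumes, for illustration only, that a phase skipped by a transaction (such as Response from Server for a cache hit) contributes no start time to that phase's average; the application's actual averaging rules may differ.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional, Tuple

RGB = Tuple[int, int, int]


@dataclass
class Phase:
    name: str
    start_ms: float
    end_ms: float


def split_earlier_bar(earlier: Phase, later: Phase) -> Tuple[float, float]:
    """Return (non_overlapping_ms, overlapping_ms) for the earlier phase's bar."""
    # Only the earlier phase runs until the later phase starts (or the earlier one ends).
    non_overlapping = max(0.0, min(later.start_ms, earlier.end_ms) - earlier.start_ms)
    # Both phases run from the later start until whichever phase ends first.
    overlapping = max(0.0, min(earlier.end_ms, later.end_ms) - max(earlier.start_ms, later.start_ms))
    return non_overlapping, overlapping


def duration_color(duration_ms: float, red_at_ms: float = 1000.0) -> RGB:
    """Green at 0 ms, red at red_at_ms or more; a linear RGB gradient is assumed."""
    t = min(max(duration_ms / red_at_ms, 0.0), 1.0)
    return (round(255 * t), round(255 * (1 - t)), 0)


def darker(color: RGB, factor: float = 0.6) -> RGB:
    """Darker shade for an overlapping section; the 0.6 factor is an assumption."""
    red, green, blue = color
    return (round(red * factor), round(green * factor), round(blue * factor))


# Hypothetical phase start times in ms; None marks a phase the transaction skipped.
Starts = Dict[str, Optional[float]]
UNCACHED: Starts = {"Response from Server": 500.0, "Response to Client": 501.0}
CACHED: Starts = {"Response from Server": None, "Response to Client": 1.0}


def average_start_times(transactions: List[Starts]) -> Dict[str, Optional[float]]:
    """Average each phase's start time over the transactions that include that phase."""
    names = sorted({name for t in transactions for name in t})
    averages: Dict[str, Optional[float]] = {}
    for name in names:
        starts = [t[name] for t in transactions if t.get(name) is not None]
        averages[name] = mean(starts) if starts else None
    return averages


if __name__ == "__main__":
    solo_ms, shared_ms = split_earlier_bar(
        Phase("Request from Client", 0.0, 40.0), Phase("Request to Server", 30.0, 60.0)
    )
    print(solo_ms, shared_ms)  # 30.0 10.0
    print(duration_color(solo_ms), darker(duration_color(solo_ms)))

    # Mixing the two patterns: Response to Client (251 ms) appears to start
    # before Response from Server (500 ms).
    print(average_start_times([UNCACHED] * 3 + [CACHED] * 3))
    # Filtering to a single pattern restores the expected ordering.
    print(average_start_times([UNCACHED] * 3))
```

Filtering the input list down to one pattern, as the last line does, mirrors the advice above: separating the timing pattern groups restores a conventional "waterfall" ordering for each group.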
This article is part of a series, beginning with Analytics Application - Concepts and Metrics Explained
The Analytics Application included with Pulse Services Director offers a unique Table View in the Explore functions, which combines request metrics, connection metrics, and sparkline graphs in a single page. It is formed from the results of two separate queries:
A query for the overall metrics
A query for the sparkline metrics
Overall metrics
The Table View shows five metrics in total. These are split by vServer, grouped together by cluster.
Avg. Connection Duration (ms)
Avg. Request Duration (ms)
Throughput (Mbps)
Connections / second
Requests / second
As described in Terminology: Transactions, Connections and Requests, each vServer supports a specific protocol which determines whether it produces request-based metrics or connection-based metrics. As a result, each vServer line shows values either in the columns for request-based metrics or in the columns for connection-based metrics, but never both. Unused columns show a '–' symbol. The Throughput metric is an exception, as it applies sensibly to both connections and requests, and so appears for every vServer.
The overall metrics are calculated using the following method:
Filter out records outside the time range.
Apply any further filters (from the Component Filter and/or the Extended Filter).
For each transaction, determine:
Whether it is connection-based or request-based
The size of the transaction in Megabits (Mb):
- Add together "vserver bytes in" and "vserver bytes out"
- Multiply the result by 0.000008 to convert into Megabits.
Aggregate the following results by cluster and vServer:
- The count of transactions
- The sum of Megabits (Mb) transferred
- The average request duration (where applicable)
- The average connection duration (where applicable)
Take the results from the aggregation stage and:
For request-based vServers:
- Calculate requests per second as (transaction count) / (seconds in full time range)
- Round the average request duration to 0 decimal places
- Rename to average request duration
For connection-based vServers:
- Calculate connections per second as (transaction count) / (seconds in full time range)
- Round the average connection duration to 0 decimal places
- Rename to average connection duration
For all vServers:
- Calculate throughput in Mbps as (sum of Mb transferred) / (seconds in full time range)
Sparkline metrics
The Table View also shows sparklines in one of its columns: the column for the currently selected data metric. The data for all the sparklines is generated in one query. The sparkline metrics are calculated in a similar way to timecharts:
Generate a request duration field, in milliseconds, for each analytics record:
- Subtract "timeline.crqs" from "timeline.crse"
- Multiply the result by 1000 to convert into milliseconds
Filter out all records outside the time range.
Apply any further filters (from the Component Filter and/or the Extended Filter).
For each transaction, determine whether the transaction is connection-based or request-based, and set fields as described below:
For connection-based transactions:
- Record a connection count of 1
- Make the connection duration equal to the transaction duration
- Set the request count to 'null'
- Set the request duration to 'null'
For request-based transactions:
- Set the connection count to 'null'
- Set the connection duration to 'null'
- Record a request count of 1
- Make the request duration equal to the pre-calculated request duration field
Calculate the size of the transaction in Megabits (Mb):
- Add together "vserver bytes in" and "vserver bytes out"
- Multiply the result by 0.000008 to convert into Megabits.
For each time bucket, with an additional split by cluster and vServer, calculate:
- Connections per second as (sum of connection counts) / (seconds in the full time range)
- The average of all connection durations
- Requests per second as (sum of request counts) / (seconds in the full time range)
- The average of all request durations
- Throughput in Mbps as (sum of transaction sizes in Mb) / (seconds in the full time range)
Sparkline Scale
It is important to remember that each sparkline graph is intended to show changes in the selected metric over time on a per-vServer basis, and is not intended for direct vServer-to-vServer comparison. To make best use of the available vertical browser space, each vServer-specific graph is scaled individually.
The following example shows sparklines for vServers in the Avg. Request Duration (ms) column, with values of 5ms, 137ms and 1587ms respectively. The tallest lines on each graph are the same height; however, if you hover the cursor over each sparkline you can see the exact value.
For example, the tallest line on the top graph (which has an average of 5ms) represents a request duration of 12ms:
Similarly, the tallest line on the bottom graph (which has an average of 1587ms) represents a request duration of ~4000ms.
For direct vServer-to-vServer comparisons, it is better to use the Line View with a vServer split.
This article is part of a series, beginning with Analytics Application - Concepts and Metrics Explained The Analytics Application included with Pulse Services Director operates on a dataset which is made from individual records, each of which describes a single transaction performed by Pulse Traffic Manager (vTM). This data includes (but is not limited to):
The time at which the transaction occurred.
The effective 'path' along which the transaction passed. That is:
The vTM's cluster ID.
The front end IP on which the transaction data was received.
The vServer that processed the data.
The pool through which the vServer routed the data (if any).
The node selected by the pool to handle the data (if any).
The size of the data received/transmitted.
The duration of the transaction.
Timings for various parts of the processing that the vTM had to undertake to process the transaction.
(For HTTP requests) Details of the request and response including request types, headers and URIs, response codes, response headers, and so on.
The outcome of the transaction. That is, how the transaction ended, both in terms of how the connection ended, and (where applicable) the HTTP response code.
The full transaction record is described by a JSON schema, which can be downloaded from any Enterprise Management licensed vTM by navigating to System > Analytics Export > Transaction Metadata and selecting "View JSON Schema for transaction metadata".
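As a rough illustration of the fields listed above, the following Python sketch models a heavily simplified transaction record and its effective 'path'. All field names here are hypothetical and chosen for readability; the authoritative field names and structure are those in the JSON schema downloadable from the vTM.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TransactionRecord:
    """Hypothetical, simplified view of one analytics transaction record."""
    timestamp: float                   # when the transaction occurred (epoch seconds, assumed)
    cluster_id: str                    # the vTM's cluster ID
    frontend_ip: str                   # front-end IP on which the data was received
    vserver: str                       # vServer that processed the data
    pool: Optional[str] = None         # pool through which the data was routed, if any
    node: Optional[str] = None         # node selected by the pool, if any
    bytes_in: int = 0                  # size of the data received
    bytes_out: int = 0                 # size of the data transmitted
    duration_ms: float = 0.0           # duration of the transaction
    http_status: Optional[int] = None  # HTTP response code, where applicable

    def path(self) -> Tuple[str, ...]:
        """Return the effective path through the system, omitting absent hops."""
        hops = (self.cluster_id, self.frontend_ip, self.vserver, self.pool, self.node)
        return tuple(hop for hop in hops if hop is not None)


if __name__ == "__main__":
    record = TransactionRecord(
        timestamp=0.0, cluster_id="cluster-1", frontend_ip="192.0.2.10",
        vserver="web-app", pool="web-pool", node="198.51.100.5:80", http_status=200,
    )
    print(record.path())
```

Making the pool and node optional reflects the "(if any)" qualifiers in the list: a transaction that is rejected or answered by the vServer itself may never reach a pool or node.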
The Analytics Application provides the user with tailored visualisations, and the tools to examine and drill into these potentially enormous datasets. These features enable the user to diagnose system problems, assess system performance, and so on.
Example of Transaction Records
The following example shows two requests making their way through a vTM cluster to back-end pool nodes, along with excerpts from the analytics transaction records that would be generated as a result.
The vTM cluster hosts two virtual servers: first, a simple web application; and second, an Ecommerce API gateway. Each application has been assigned resource pools and a set of dedicated server nodes with backup pools.
From the diagram, you can see that Request/Response #1 is a simple HTTP GET transaction. In addition to the path through the system, we also record the URL which is the subject of the GET request.
On the other hand, Request/Response #2 is a POST request to an Ecommerce gateway managed by the same cluster of vTMs.
Terminology: Transactions, Connections and Requests
Looking at the diagram above, we need to agree on our terminology for Transactions, Connections and Requests.
Transactions: A transaction can be considered as the processing that a vTM undertakes to handle an incoming client connection or request. It may involve running TrafficScript, making a load balancing decision, creating a back-end connection to a server or any of a number of other traffic management activities. While a transaction is a meaningful unit for a vTM internally, it is not necessarily a useful abstraction for end users. This is because a transaction represents two distinct concepts: a connection and a request.
Connections: A connection represents a communication channel established over a network, over which data can be transferred. For example, a TCP connection.
Requests: A request is a higher level concept, usually capturing a protocol exchange carried over a connection. For example, an HTTP request-response carried over a TCP connection. Note that multiple requests can be carried over the same connection, for example HTTP keepalive requests.
The type of transaction record emitted by a particular vServer depends on the protocol of the vServer concerned; some protocols emit Request Records, while others emit Connection Records.
Protocols that Emit Request Records
Some protocols are able to emit request records. All vServers that are configured to use one of the following protocols will emit request records:
NOTE - for all other protocols, refer to Protocols that Emit Connection Records. When recording analytics for vServers with a request-based protocol, the vTM can look within the data it is transferring; for example, vTM can:
Inspect HTTP headers in order to optimise load balancing decisions at the vTM.
Update the requests/responses as they pass through the vTM.
Each analytics record for those vServers corresponds to a single request/response. Each request/response pair is carried over a virtual connection established from the client to the server via a vTM. The transaction record shows protocol-level information, such as bytes sent and received, duration, HTTP request type, HTTP response code, headers, and so on. Multiple requests/responses can be carried over the same underlying connection.
For all request-based transaction records, the request duration is determined by subtracting the time at which vTM receives the first byte of the request from the client from the time at which vTM sends the final response byte to the client.
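The Table View's sparkline calculation derives this value by subtracting "timeline.crqs" from "timeline.crse" and multiplying the result by 1000. A minimal Python sketch of that step follows. It assumes the record is parsed JSON in which "timeline" is a nested object, and that both timestamps are in seconds (implied by the multiplication by 1000 to obtain milliseconds); matching "crqs" to the first request byte and "crse" to the final response byte is inferred from the two descriptions, not taken from the schema.

```python
from functools import reduce


def get_field(record: dict, dotted_path: str):
    """Look up a dotted field name such as "timeline.crqs" in nested dicts."""
    return reduce(lambda obj, key: obj[key], dotted_path.split("."), record)


def request_duration_ms(record: dict) -> float:
    """Request duration: final response byte sent minus first request byte received."""
    first_byte_received = get_field(record, "timeline.crqs")  # assumed: seconds
    last_byte_sent = get_field(record, "timeline.crse")       # assumed: seconds
    return (last_byte_sent - first_byte_received) * 1000


if __name__ == "__main__":
    example = {"timeline": {"crqs": 1.25, "crse": 1.5}}  # made-up timestamps
    print(request_duration_ms(example))  # 250.0
```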
Protocols that Emit Connection Records
This section addresses all protocols that are not listed in Protocols that Emit Request Records. For example, UDP, LDAP, POP3, and others. All vServers that are configured to use non-request protocols will emit connection records.
When recording analytics for vServers with a non-request based protocol, each analytics record corresponds to a virtual connection established from the client to the server via a vTM. The record includes connection-level information such as:
Bytes sent and received.
How the connection was closed.
However, it does not include information about any higher-level protocols carried over the connection. The connection duration is the length of the complete elapsed time period between initial establishment of the connection and its end.
NOTE - It is theoretically possible that a vServer that emits request records could also emit connection records, but this is not currently the case. The Analytics Application treats connection-based and request-based vServers separately. As a result, when a connection-based metric is selected, the connection data will not include data for vServers that emit request records, and vice-versa. The "Throughput (Mbps)" metric is an exception to this, as it applies equally to request-based and connection-based transaction records.
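The following Python sketch shows one way to express this separation when selecting the records that feed a metric. The metric names are those shown in the Table View; the "kind" key is a hypothetical field marking each record as request-based or connection-based.

```python
from typing import Dict, Iterable, List, Optional

# Which record kind each Table View metric draws on; None means both kinds.
METRIC_RECORD_KIND: Dict[str, Optional[str]] = {
    "Avg. Connection Duration (ms)": "connection",
    "Connections / second": "connection",
    "Avg. Request Duration (ms)": "request",
    "Requests / second": "request",
    "Throughput (Mbps)": None,
}


def records_for_metric(records: Iterable[dict], metric: str) -> List[dict]:
    """Select the records that contribute to the given metric."""
    kind = METRIC_RECORD_KIND[metric]
    return [r for r in records if kind is None or r["kind"] == kind]


if __name__ == "__main__":
    sample = [{"kind": "request"}, {"kind": "connection"}, {"kind": "request"}]
    for name in METRIC_RECORD_KIND:
        print(name, len(records_for_metric(sample, name)))
```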