Application performance management
Recognising and avoiding project risks at an early stage
Checking the performance of software only at a very late stage always represents a high project risk. Specifying the expected performance and starting application performance management (APM) early minimises cost and deadline risks.
Performance tests are typically scheduled three to four weeks before software acceptance. This is usually too late to rectify any performance problems detected during the tests in good time. Extending or even postponing the test period, however, would increase costs, which hardly any stakeholder is likely to agree to.
One possible solution could be to move application performance management, which was previously part of the software operation phase, to the development phase. Any performance problems can then be identified at the start of software development and corrective measures can be initiated promptly. Result: The deadline risk is eliminated, the project risk is reduced and the software quality increases considerably.
Successful early application performance management must fulfil several conditions: first, a technical quantity structure with defined response times must be agreed with the customer and become part of the requirements package. Following a standardised process, test scenarios are then defined on the basis of this quantity structure and measurements are carried out on an ongoing basis. Finally, the measurement results are evaluated and optimisation measures derived from them.
The goals of application performance management overlap with the requirements placed on a modern application. For example, fast response times should create a positive user experience (UX). The faster the software responds to the user's input, the more productive and satisfied the user will be.
Performance can only be retrofitted into software with a great deal of effort and within certain limits. Either the software was designed for high performance from the outset, i.e. during the design phase, and this was also taken into account in the underlying architecture, or the lack of performance must be compensated for during operation through compromises - possibly even through the massive use of hardware. In most cases, only a makeshift solution can be found afterwards.
The earlier in the development process that application performance management is started, the sooner the differences between actual and target become apparent and the sooner deviations can be counteracted. This requires continuous performance measurements that are maintained throughout all phases of software development and then provide detailed information about the performance quality and its development.
Performance is usually implicitly expected by the customer, but rarely specified in detail. Such a vague expectation, however, can neither be measured nor compared. It must therefore be translated into concrete quality characteristics and measurement criteria, from which the performance requirement is derived; this requirement serves as the target from then on. It must be verifiable, agreed with the customer and, above all, explicitly described in a specification. This avoids later discussions about quality criteria.
A written quantity structure of functional units specifies, for example, the number of enquiries to be processed per hour or the number of new customers per year. This is supplemented by an exact response time per user action. From this, a test scenario with clearly defined acceptance criteria can be created.
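To make the requirement directly verifiable, it can also be recorded in machine-readable form. The following Java sketch is only an illustration of what an agreed quantity structure might look like; the user actions and figures (searchCustomer, 2,000 requests per hour, and so on) are invented examples, not recommendations.

import java.time.Duration;
import java.util.List;

/** One entry of the agreed quantity structure: a user action, its expected
 *  load and the response time accepted by the customer. */
public record PerformanceRequirement(String userAction,
                                     int requestsPerHour,
                                     Duration maxResponseTime) {

    /** Example figures - purely illustrative, to be agreed per project. */
    public static List<PerformanceRequirement> exampleQuantityStructure() {
        return List.of(
            new PerformanceRequirement("searchCustomer", 2_000, Duration.ofMillis(1_500)),
            new PerformanceRequirement("createInvoice",    500, Duration.ofMillis(3_000)),
            new PerformanceRequirement("openDashboard",  4_000, Duration.ofMillis(1_000))
        );
    }
}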
For applications that are expected to grow rapidly, it is crucial to know about and specify the functional growth at an early stage. Otherwise, later growth will cause problems that will probably only become apparent a few months after the start of productive operation.
The load profile - i.e. the number of simultaneous sessions or the number of requests within a certain period of time, both in relation to normal operation and peak load times - is also interesting. Is the application used more in the morning? Or is there a peak load at the end of each quarter because the application is used to generate invoices? In which country are the users located? Are there latencies in data transfer because the application is used from South Africa and the servers are located in Europe? In these cases, scalability must be taken into account in the design of the software due to the strong increase in data volume and the number of enquiries, and secured with a suitable test scenario.
In addition to a test scenario, a large volume of test data is also required. The database of the integration environment should contain the amount of data that would accumulate after one to two years of regular operation. Scripts, which can be written in any scripting language, generate a consistent and reproducible test database fully automatically.
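Such a generator script can be kept very simple. The following Java sketch writes a reproducible SQL script for a fictitious customer table; the table and column names as well as the data volume are assumptions, and the fixed random seed ensures that every run produces an identical data set.

import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

/** Generates a reproducible SQL script with bulk test data.
 *  Table and column names are only examples. */
public class TestDataGenerator {

    public static void main(String[] args) throws IOException {
        int customers = 500_000;               // assumed volume after one to two years of operation
        Random random = new Random(42L);       // fixed seed: identical data on every run

        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(Path.of("testdata.sql")))) {
            for (int i = 1; i <= customers; i++) {
                String name = "Customer_" + i;
                int branchId = random.nextInt(100);   // spread customers over 100 fictitious branches
                out.printf("INSERT INTO customer (id, name, branch_id) VALUES (%d, '%s', %d);%n",
                           i, name, branchId);
            }
        }
    }
}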
As soon as the scripts for generating the test data are available, the performance tests can and should be repeated at any time and on an ongoing basis. The results then reveal the development trend of the response time behaviour over a longer period of time.
For very performance-critical business processes, a separate test scenario that is repeated daily has also proven to be useful. Which measurements are carried out depends on the individual application. Solutions such as JMeter, which can be easily integrated into a continuous integration system such as Jenkins, are suitable for these daily tests. The test scripts are triggered fully automatically in the nightly build via Jenkins, which in turn analyses the results of the triggered scripts graphically after each run. This makes the trend development of the response time behaviour visible.
This approach makes it possible to intervene quickly and in good time in the event of a negative performance development. Another advantage is that the performance behaviour of the application under load can be better understood and a more precise estimate of its future development can be obtained. The next performance measures are then developed step by step with each further iteration.
Together with the test scenario, the point at which an optimisation is to be carried out is also defined. Optimisations are only carried out when the response time of a test case chain or an individual request exceeds the defined maximum deviation from the target value.
In addition to a test case scenario, it is also advisable to carry out performance monitoring of the running system in an integration environment. The performance monitoring data supplements the picture of the application gained from the test case scenario with the general status and an overview of the system's resources, such as utilisation, memory and threads. These values are all relevant for the performance of a system.
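In a Java environment, a very rough first impression of these resource values can already be obtained with the JDK's own MXBeans; the following sketch merely prints heap usage and thread counts and would in practice be replaced or supplemented by a proper monitoring tool.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;

/** Minimal resource snapshot using the JDK's built-in MXBeans. */
public class ResourceSnapshot {

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        long usedHeapMb = memory.getHeapMemoryUsage().getUsed() / (1024 * 1024);
        long maxHeapMb  = memory.getHeapMemoryUsage().getMax() / (1024 * 1024);

        System.out.printf("Heap: %d of %d MB used, threads: %d (peak %d)%n",
                usedHeapMb, maxHeapMb,
                threads.getThreadCount(), threads.getPeakThreadCount());
    }
}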
The test case scenarios are of course not comparable with normal load test scenarios, as the number of users is negligibly small. However, certain problems can be seen even under low loads. These small problems would possibly grow into huge ones under real load, for example in a later productive operation. The elimination of this particular type of problem must therefore be given particularly high priority.
Experience shows that it is sufficient to monitor the application on two to three levels for performance monitoring. The first level that should be monitored is the external service interface. The next interesting level for measurement is the business layer. In a Java EE application, this would ideally be the business facade. The third level of a measurement should take place at the level of the data layer and record all database queries and data manipulations in order to identify those that generate a significant load.
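In a Java EE application, the measurement at the business facade can be realised, for example, with a simple interceptor. The sketch below only logs the execution time of each intercepted method; binding it to the facade beans (via an interceptor binding or the deployment descriptor) is omitted.

import javax.interceptor.AroundInvoke;
import javax.interceptor.Interceptor;
import javax.interceptor.InvocationContext;
import java.util.logging.Logger;

/** Measures the execution time of business facade methods. */
@Interceptor
public class PerformanceInterceptor {

    private static final Logger LOG = Logger.getLogger(PerformanceInterceptor.class.getName());

    @AroundInvoke
    public Object measure(InvocationContext ctx) throws Exception {
        long start = System.nanoTime();
        try {
            return ctx.proceed();                                // execute the actual facade method
        } finally {
            long millis = (System.nanoTime() - start) / 1_000_000;
            LOG.info(() -> ctx.getMethod().getName() + " took " + millis + " ms");
        }
    }
}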
Important: Too many simultaneously configured measurement points are counterproductive, as the usable information content usually deteriorates rather than improves. It could even happen that the measurement tool itself becomes a performance problem because it cannot cope with the number of measurement points. This must not happen under any circumstances, which is why it is necessary to monitor any exceptions that occur in the measurement tool.
With a highly distributed application, not every single server in a cluster needs to be monitored or instrumented. As a rule, it is sufficient to equip one server per network segment or network zone with measuring points, provided the same functionality is deployed on all infrastructure resources. Otherwise, so much data is collected that analysing it becomes confusing, unwieldy or impossible. Deviations from this rule are possible if, for example, a problem can only be understood through a more detailed analysis. As a general rule, the smaller the footprint of the measurement tool, the better.
The separate measurement levels of performance monitoring are also combined to form an overall picture that also shows correlations between the levels. For example, if there are individual, slow or extremely frequent database queries, then this is certainly also recognisable at the level of the external service interface.
The measured values collected can be supplemented by analysing the log files. Centralised monitoring of the log output, for example with the ELK stack (Elasticsearch, Logstash, Kibana), is well suited for this. It can be configured so that the logs are managed in the background without unnecessarily burdening the system with additional monitoring. This is particularly advantageous in highly distributed environments.
The performance measurements, i.e. the results of the load tests, are the starting point for the optimisation measures. Their order is determined by the total cost metric: the product of the number of calls and their response time, sorted in descending order so that the most expensive calls appear at the top.
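Expressed as a small Java sketch, with the call statistics assumed to come from the monitoring or load test results, the total cost metric might look like this:

import java.util.Comparator;
import java.util.List;

/** Total cost of a call type: number of calls multiplied by average response time. */
public record CallCost(String call, long invocations, double avgResponseMillis) {

    public double totalCost() {
        return invocations * avgResponseMillis;
    }

    /** Sorts the measured calls by total cost, most expensive first - the hotspots. */
    public static List<CallCost> hotspots(List<CallCost> measured) {
        return measured.stream()
                .sorted(Comparator.comparingDouble(CallCost::totalCost).reversed())
                .toList();
    }
}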
The first two to five calls in the list, the so-called hotspots, are optimised first. Experience has shown that the hotspots generate up to 80 per cent of the total load of a system. Once the hotspots have been optimised, a further load test is carried out. This usually shows a significant, sometimes higher than expected, increase in application performance, because optimising individual calls usually also has a performance-enhancing effect on the other calls.
Instead of simply continuing to work through the total cost list in descending order after the hotspots, the call costs are recalculated on the basis of the new load test. It can happen that calls that were originally rated as rather inexpensive suddenly move to the top of the list after the hotspot optimisation and form new hotspots, for example because they can now be called much more frequently than was previously possible. For this reason, a further iteration of analysis, evaluation and hotspot optimisation follows.
The following best practice has proven useful for optimisation: never optimise in advance, just in case! This essentially corresponds to the paradigm 'Do not design for future use!'. Problems usually occur in places where nobody previously suspected them, which is why assumptions about potential problem areas are not always correct. Ultimately, this is also a question of efficiency in the development process: for economic reasons alone, the available resources should be used specifically to solve an actual problem and not for optimisation on suspicion or in advance.
The optimisation strategy depends on various factors: it is not always possible to follow a scale-up or scale-out strategy, i.e. to use more powerful servers or more of them, as many customers work with a high level of standardisation in their data centres. Optimising with more or faster resources usually has to be justified very precisely, as it is associated with higher operating costs.
Before scaling up or scaling out, all optimisation options at application level must first be examined. For example, is the application's data access optimised and performant? There can be a considerable difference in performance between loading the data in many individual queries and loading the same amount of data in a single query with the right join-fetch strategy.
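With JPA, for example, the difference often comes down to a single query. The following sketch contrasts the two variants; Order and its items collection are fictitious entities used only to illustrate the join-fetch idea.

import javax.persistence.EntityManager;
import javax.persistence.TypedQuery;
import java.util.List;

/** Illustrates the N+1 problem versus a single query with JOIN FETCH.
 *  Order is an assumed JPA entity with a lazily loaded 'items' collection. */
public class OrderRepository {

    private final EntityManager em;

    public OrderRepository(EntityManager em) {
        this.em = em;
    }

    /** Lazy loading: one query for the orders, plus one additional query per order for its items. */
    public List<Order> findAllLazy() {
        return em.createQuery("SELECT o FROM Order o", Order.class).getResultList();
    }

    /** Join fetch: orders and their items are loaded in a single query. */
    public List<Order> findAllWithItems() {
        TypedQuery<Order> query = em.createQuery(
                "SELECT DISTINCT o FROM Order o JOIN FETCH o.items", Order.class);
        return query.getResultList();
    }
}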
If the data in memory does not have to be up to date to the second, a caching strategy can be an effective solution. Distributed environments often force you to relax strict temporal data consistency and settle for eventual consistency. This approach is very popular in the microservices world today as a way of increasing throughput; in such cases, availability and scalability are more important than immediate consistency.
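Such a caching strategy can start out very simply, for example by keeping already loaded values in memory for a limited time. The following sketch shows the idea with a time-to-live per entry; it is deliberately simplified and has no size limit and no distributed invalidation.

import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Very small in-memory cache with a time-to-live per entry - a sketch, not a product. */
public class TtlCache<K, V> {

    private record Entry<V>(V value, Instant loadedAt) { }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Duration ttl;

    public TtlCache(Duration ttl) {
        this.ttl = ttl;
    }

    /** Returns the cached value or reloads it if it is missing or older than the TTL. */
    public V get(K key, Function<K, V> loader) {
        Entry<V> entry = entries.get(key);
        if (entry == null || entry.loadedAt().plus(ttl).isBefore(Instant.now())) {
            entry = new Entry<>(loader.apply(key), Instant.now());
            entries.put(key, entry);
        }
        return entry.value();
    }
}

A call such as cache.get(customerId, this::loadCustomer) - with loadCustomer standing in for the actual data access - then hits the database at most once per time-to-live window for each key.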
Long-running business transactions should generally be avoided, as important resources such as threads or data are blocked for too long. There is a great risk of accesses blocking each other, up to the extreme case of deadlocks. If this type of processing is nevertheless necessary, for example in batch processing, it should be shifted to off-peak times, when the load on the system is low. If there is no off-peak time, as in online shops, then moving batch processing to a separate server can also be a solution.
All optimisation approaches aim to reduce the processing time of a request so that resources are not blocked. According to Little's Law, halving the average response time allows a system to handle twice as many requests in the same period with the same resources. Overall, fewer system resources are required if the application performs well across all calls and thus achieves a high throughput of requests and data.
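A brief worked example with assumed numbers makes the relationship concrete. Little's Law states N = X × R, where N is the number of requests in the system at the same time, X the throughput and R the average response time:

N = X × R
With N = 100 concurrent requests and R = 0.5 s: X = N / R = 200 requests per second.
With the same N = 100 and R = 0.25 s: X = 400 requests per second.

With the same concurrency limit, for example a pool of 100 worker threads, halving the response time therefore doubles the achievable throughput.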
For this reason, it is worth relying increasingly on asynchronous processing. The extreme case is the reactive programming style, in which everything is processed asynchronously via messages (events) or futures and promises, and no resource is blocked for longer than necessary.
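In Java, this style can be approximated with CompletableFuture. The following sketch is only an illustration; the loaded data and the thread pool size are placeholders, and the important point is that the calling thread is not blocked while the work is composed.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Asynchronous composition with futures - the service calls are placeholders. */
public class AsyncExample {

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);

        CompletableFuture<String> result =
                CompletableFuture.supplyAsync(AsyncExample::loadCustomerData, pool)   // runs in the pool
                        .thenApply(data -> data + " enriched")                        // non-blocking continuation
                        .whenComplete((value, error) -> pool.shutdown());

        System.out.println(result.join());   // only the demo blocks here; real callers would register callbacks
    }

    private static String loadCustomerData() {
        return "customer data";              // placeholder for a slow remote or database call
    }
}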
From an architectural point of view, applications should be as stateless as possible. Only then can a highly scalable application be created. This can be realised, for example, with the REST architectural style.
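A stateless service interface in the REST style could look like the following JAX-RS sketch; the path and the returned JSON are invented, and the essential point is that no conversational state is held between requests.

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

/** Stateless REST resource: every request carries all the information it needs. */
@Path("/customers")
public class CustomerResource {

    @GET
    @Path("/{id}")
    @Produces(MediaType.APPLICATION_JSON)
    public String getCustomer(@PathParam("id") long id) {
        // no session, no fields holding per-user state - the id arrives with the request
        return "{\"id\": " + id + "}";   // placeholder; a real resource would load and serialise an entity
    }
}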
Conclusion
Application performance management should be applied as early as the design and development phase of an application in order to keep an eye on performance requirements at all times, meet the target and minimise project risks. Performance tests should be carried out continuously. Supplementary runtime monitoring improves the analysis options.
Analysis and optimisation should always be carried out iteratively in order to avoid surprises later on in an acceptance test. Code optimisations or architecture adjustments are only carried out if there are actual deviations from the requirements. Optimisation is not carried out in advance or on suspicion.
A high-performance application conserves system resources, reduces operating costs and improves the end-user experience.
Performance under control right from the start
Application performance management doesn't just start with the go-live. Paying attention to performance as early as the design and development phase saves costs, minimises risks and ensures satisfied users. With continuous testing, targeted monitoring and iterative optimisation, you stay in control at all times - and only optimise when it is really necessary.