A business may be only as successful as its team, but it is only as functional as its IT infrastructure. Receiving uninterrupted timely responses from an application is a critical factor in determining its success. This is generally referred to as the performance of the application.
Now, many factors affect the overall performance of an application, such as hardware at server-side, hardware at the client-side, application architecture, programming languages, database connections, database table and index design, application server connections, network latency, proxy cache, browser cache, application cache, load balancer, server nodes, and more. While all of the aforementioned criteria may help boost an application performance, the opposite is also quite true — there are so many more areas of improvement.
Identifying an application’s pain areas for performance optimization has always been a tricky affair. Firstly, you should have an in-depth understanding of the entire application architecture, as well as the strengths and limitations of the underlying technologies. You must also consider the current load, be able to predict the future burden of the application and continuously evaluate and enhance the application architecture to meet performance needs. In this blog series, we will uncover performance aspects for different technical components of enterprise applications.
This article highlights how you can optimize the database element to achieve the desired performance levels. We will also discuss different ways to measure and improve the overall application performance.
Creating and Maintaining a Strong and Scalable Database
An application without data storage and manipulation is a redundant one. Having a database is an important asset, but if you don’t treat your database wisely, this will become an overhead expense for you rather than a facility. Let’s take a look at the different aspects of a database that need to be taken care of to avoid any performance-related issues.
Table designs are fundamental steps with a large impact, so here’s a list of items you need to pay attention to:
- Select the data type and size of the columns carefully; factor in future needs.
- Choosing a higher size datatype will impact the index, searching, insertion, and overall database, so choose with care.
- Use numeric data types to store numbers that may require calculation or sorting. Keeping number values in the string form of datatype will require datatype conversation/casting, adding additional processing complexity and slowing down the performance.
- Avoid storing more extensive data in database columns, such as binary files, images, and text files as these will slow down your query performance. They require heavy physical IO processing and ample memory to hold. Instead, if you need to store such extensive data into a column, exclude this column from the query when performing an operation on multiple rows or retrieving more rows. You may opt to process or select an individual record to manipulate or select large data rows.
- Identify the nature of the table. Do you need it to store configuration-related data or transactional data? Tables designed for configuration will not increase rapidly and have a defined set of records. Having more columns and having an extensive data type for columns won’t impact such a table. On the other hand, tables designed to store transactional data will keep on growing periodically and may hold billions of records. You should keep selective columns with a low data size as they will cause table and database sizes to increase, triggering a problem of physical data storage and space.
- Rethink before introducing many columns with foreign key constraints as that will impact your insert. Update statements especially for your transactional tables with cascade operation if your foreign key table has a large set of records.
Sizing and purging
- Define an active set of data and a passive set of data to allow your application to work correctly. The active sets of data are frequently used by many users, consisting of configurations, user settings, and recent transaction-related data. The passive sets of data are used occasionally by a minimal group of users, which generally consist of old transactional data.
- If you envision a high volume of passive data generation in your database, it is advisable to archive or purge passive data periodically, like monthly, quarterly, or yearly, based on the active data retention you need. This will keep a defined set of data in your active table and allow you to retain your benchmarked performance against the table size.
- An insert query may get impacted from the trigger, index. Adding more indexes and executing more triggers will slow down insertion queries.
- Keep the trigger simple and quickly executed so that the insert query execution won’t be impacted.
- After inserting data in the table, the database must update all respective table indexes for the new values. Having more indexes on the table adds additional tasks for the database, which will impact the insert query. Hence, try to keep the index on the table low.
- Update and delete query will also have the same impact as insert query from trigger and index. The search condition in the update and delete query will also add complexities in query execution. As per conditions, the database needs to find the record from the table to update or delete data. A full table scan to find a required record for an update or delete will slow down the query. Remember to write a query in such manner that the database doesn’t require a full table scan and locate records quickly using an index or primary key reference.
- Processing extensive records in a single database transaction will impact the same transaction and other transactions executed in parallel on the same tables. While performing insert, update, delete operations on tables in the transaction, the database acquires a lock on all the tables on which process is getting execution. Until the transaction is fully committed or rolled back, such lock won’t be released from the tables. Other transactions won’t access those tables for that lock duration, and their query execution will slow down.
- Some available tools help trace all queries, which take more time for execution or use high resources. These include Myrafor Oracle, SQL server profiler, and Query Store for MS-SQL.
- As discussed earlier, to optimize the query, you first need to understand how exactly the database will execute the query. For that, you can check the query execution plan.
- The execution plan provides details about your query. Carefully review that part which takes more CPU or higher row scans.
- The index is a widespread solution to optimize query performance. However, you must understand the internal mechanism of the index before using it. The index comes with an additional overhead to the table, which needs to be considered.
- Indexes are designed based on binary tree data structure, which helps to search data by sorting them. It includes default and common data structured, followed by all the databases. However, some databases provide support from the different data structures to generate an index, like bitmap, partitioned, function-based and domain.
- There are two types of indexes: 1) A non-clustered index and 2) A clustered index
- A non-clustered index is stored separately from the table; hence the database will need more memory. You can define 999 non-clustered indexes on particular tables. As discussed earlier, the database has to update all the indexes whenever any change happens in rows; thus, it may impact the performance of insert, edit and delete queries.
- These indexes do not match the actual physical record location of the table and are kept separately on memory. Hence, the frequent change in table records with insert update will create high fragmentation in indexes, which will again slow down index traversal.
- To reduce fragmentation, you can reorganize or rebuild the index. However, rebuilding the database index may be time-consuming, depending on the number of records and the datatype length of the column selected for indexes. It can occupy the whole processing bandwidth of the database and may impact ongoing activities. Hence, it is advisable to rebuild indexes during low user activity time like late nights or on weekends. However, you can reorganize the index in low fragmentation, which can be performed online or opt for the rebuild offline option supported by some databases.
- It is advisable to keep the number index for the particular table as low as possible if your table data is manipulated frequently, such as when using transactional data.
- Only select those columns which need to be indexed.
- Sometimes you may create many indexes, but very few are in use. You can use tools to check query execution plans and ensure created indexes are used by query or not. All modern databases are now available with such tools. Take the EXPLAIN PLAN – this tool will showcase the algorithm of how the database will execute the query internally. Here you will notice which data is scanned inside your tables and how much overhead it causes in the entire query execution. If you don’t find the index you created in that plan, your index is unsuitable for a particular query.
- Only one clustered index can be created on each table. Adding a cluster index column will help store sorted data in the table itself rather than creating a separate storage space for the index.
- A clustered index is comparatively faster than a non-cluster index. It is a part of the table and does not require independent table row mapping with sorted keys in different memory locations.
- You can only introduce a single clustered index on the table; apart from that, all other features are similar to the non-cluster index.
- Be conscious when using a large data type for a column in the table. Most databases don’t allow creating indexes using such a large datatype column as a key column for the index. CLOB, BLOB, TEXT and XML cannot be part of a key column of the index. You can include all such large data types in the index, but an index search still goes with another key column. If you would like to improve searching using multiple rows, you can go for a function-based or domain-based index.
- The view is considered a pseudo table, similar to a virtual table.
- View generally consists of select queries that extract data from one or more tables. Hence, the performance of view ultimately relies on the underlying SELECT question.
- Many people think view is faster than a standard query or table, but they forget that view is finally a SELECT query. If you want to improve the performance of the view, optimize the underlying SELECT query.
- A materialized view is your best choice is you need to fetch data from multiple tables using complex queries where the query needs more time to produce a result, though data in the table are not updated so frequently.
- It helps to keep the underlying SELECT query data result in memory, ready to serve rather than fetching and recalculating from all tables on demand every single time.
- As it retains a snapshot of data from underlying query tables required to refresh over the period, this view cannot be a choice to serve real-time data.
Database redo/event/trace logs
- You can replicate or execute the scenario during development and trace the issue, but this is not the case in production. Sometimes the problem may only occur on production, not locally. In such instances, trace logs prove helpful.
- You can enable the logging and use all such logs to trace the query and database performance.
The above mentioned factors are critical elements in the maintenance and appropriate handling of the database to optimize it and achieve high-performing application performance. While continuing this series, we will discuss more performance aspects regarding database connection pulling, java applications, caching, and more.
Stay tuned to this space.
GS Lab’s expertise across end-to-end Performance Engineering, Testing, and Open-source
GS Lab has proven expertise in performance automation testing, capacity planning, competition benchmarking, and performance monitoring and consulting complex enterprise solutions across multiple domains. We build solutions to address a particular business problem as well as keep your business future-ready, giving you a competitive edge. We have also contributed significantly to Open Source. Our Open Source Contribution “Pepper-Box” is a free Kafka plugin for JMeter. It is scalable, customizable, and secure with an easy-to-use user interface. The Pepper-Box is an apt choice to generate load against streaming data and analytics pipeline implementation leveraging Kafka.
>> Stay tuned for more blogs in this series on application performance and ways to enhance it.
>> Know more about our expertise across Performance Engineering And Testing and how we can help you with your performance requirements across various stages of the product lifecycle.
Maulik Bamania | Software Architect
Maulik Bamania is a lead software engineer at GS Lab and has over 14 years of experience in enterprise application design and development. He is an avid technology reader and likes to learn about the latest technology trends to deliver quality solutions for customers. He has experience across all the facets of application management including, development, management, deployment, and application support.