Indexed Views and how it can help improving query performance


A considerable method to speed up query running time
A considerable method to speed up query running time

Written by Do Minh Tuan


Posted at 15 Feb 2018, 12:43 PM



Read my article on codeproject: https://www.codeproject.com/Articles/1280651/Indexed-Views-and-How-It-Can-Help-in-Improving-Que

INTRODUCTION

SQL Server Views can be a greate helpful tool for database developer, allows us to encapsulate many complicated query into one single query and help the user in working with our database easier. But if we just create a regular view, it's mission is just help to make the use of a complex SELECT query become faster and more readable and understandable. There is no performance improvement applied on that kind of view. However, now we can create an indexed view, which will help us to increase query performance, and save the cost. There are many benefit if we understand it well and manipulate it in the right way. But there is also no free-lunch, so what should we concern? Where and when to use indexed views?

WHAT IS A VIEW?

View is an object in SQL Server that allows us to group a complex SELECT clause, with many sub clauses, like JOIN operator or aggregate function like SUM COUNT AVG,... into a single query. For a very common database for managing customers invoices, assuming we have a query with many JOIN clause to list all order detail from customers who live in Danang city.

SELECT a.Id, a.FirstName, a.LastName, d.ProductId, d.ProductName
FROM Customers a JOIN Orders b
ON a.CustomerId = b.CustomerId
JOIN OrderDetails c 
ON b.OrderId = c.OrderId
JOIN Products d
ON c.ProductId = d.ProductId
WHERE a.City = 'Danang'

So wherever we want a list of order detail from customer who live in Danang city, we have to run the query above. It's sometime not a short but very very long and complex query, and if the database engineer gives the complex query to a backend developer, he would feel confuse. In this case, the database engineer will create a view name vOrderDetail_DanangCity as bellow:

CREATE VIEW dbo.vOrderDetail_DanangCity AS
SELECT a.Id, a.FirstName, a.LastName, d.ProductId, d.ProductName
FROM Customers a JOIN Orders b
ON a.CustomerId = b.CustomerId
JOIN OrderDetails c
ON b.OrderId = c.OrderId
JOIN Products d
ON c.ProductId = d.ProductId
WHERE a.City = 'Danang'

So now the database engineer provides a view to backend developer, name vOrderDetail_DanangCity. The developer only needs to know what does the view do, and doesn't care about how it does. In anywhere he needs to use the view, he just:

SELECT * FROM dbo.vOrderDetail_DanangCity

But what happen inside the view? In this case, creating a view is just a shortcut, and when the view is called, the Database management system (DBMS) still has to re-run the query above to get the result. It means many join clauses would be executed again. 

INDEXED VIEWS

Indexed Views (or Materialized Views in some books), is a view that has Unique, clustered index on it. Let's create an indexed view to understand it clearly.

Assuming I have a database with the following relationship:

 

One student can join many courses. One course can have many students joined. A student when join a course will have an examination and have a score. In the database, there are 2000 rows in Courses, 5000 rows in Students, and 10,000,000 rows in Student_Course. You can download this demo database backup file (bak) at the end of this article.

Assuming we have to list all courses that have at least 1 student joined, and the average score of that course. 

SELECT crs.CourseName, AVG(std_crs.Score) AS Average
FROM dbo.Courses crs JOIN Student_Course std_crs
ON crs.Id = std_crs.CourseId
GROUP BY crs.CourseName
 

It takes 4 seconds to execute this query. We will use this query to create a view:

CREATE VIEW dbo.AverageScrores
WITH SCHEMABINDING
AS
SELECT crs.CourseName, SUM(std_crs.Score) AS TotalScore, COUNT_BIG(*) AS NumberOfRecords
FROM dbo.Courses crs JOIN dbo.Student_Course std_crs
ON crs.Id = std_crs.CourseId
GROUP BY crs.CourseName

Note that we will use SUM and COUNT_BIG(*) for this case. Creating indexed views in SQL Server has some restrictions and requirements, and we have to deal with it, in this case by turning AVG to SUM and COUNT, and in the future we will calculate AVG = SUM / COUNT. We will not discuss these requirements here.

The WITH SCHEMABINDING is a mandatory requirement to create indexed view. It helps to protect the column appeare in the view, you can not alter or drop column from the base table. If you want, first you have to drop the view, alter or delete column, then re-create another view.

Finally, create an Unique, Clustered index on that view. We can create index on one or more column. About what is Index, Unique and clustered index, just take a fast look on Google.

CREATE UNIQUE CLUSTERED INDEX CIX_AverageScrores
ON dbo.AverageScrores(CourseName);

And now we have an indexed view. When we create Unique index on a view, we are "materializing" it.

  • If the view doesn't have any unique index, it is just a shortcut. When the view is called, DBMS will re-run the query to get the result. It means DBMS will run many join clauses, many complicated aggregate function,... all over again.
  • If the view has an unique index, it is "materialized". One copy of this view is created. This materialized view is used to store the result of the query we used to create the view. When the view is called, instead of re-run the query with many complex clause, DBMS will read from the materialized view.

Let's run the query bellow:

SELECT CourseName, TotalScore / NumberOfRecords AS Average FROM dbo.AverageScrores  WITH (NOEXPAND)
 

We are now using the indexed view, by pointing out that the query should you WITH (NOEXPAND). WITH (NOEXPAND) will let the DBMS know that it is an indexed view and we want to use the result stored before. If you forget to add WITH (NOEXPAND), the DBMS would re-run the query. 

The result appear after ... 00:00:00 seconds. Immediatly.

COMPARE INDEXED VIEWS AND VIEWS

Run these 2 query: The first query is the query we used to create the view. Second query is the index views.

SELECT crs.CourseName, SUM(std_crs.Score) / COUNT_BIG(*) AS Average
FROM dbo.Courses crs JOIN dbo.Student_Course std_crs
ON crs.Id = std_crs.CourseId
GROUP BY crs.CourseName
 
SELECT CourseName, TotalScore / NumberOfRecords AS Average FROM dbo.AverageScrores  WITH (NOEXPAND)

We can see the result is the same, there is no different. 

But the time of execution is totally different. When we use indexed view, it's 00:00:00. When we use the based query, it's 4 seconds. This difference because indexed views WITH NOEXPAND doesn't have to calculate the result again. It didn't join table and calculate aggregate function. It just reads the result stored in the view and show to us.

WHEN TO USE

Using indexed view seems to be a good choice. But is that a free-lunch?

No, it's not  In fact, it's an expensive lunch if we use them ... abusively. Whenever the based tables (the tables join in FROM clause when we create the view) has changed, DBMS not only have to update the table, re-indexed the view but also have to re-calculate the value stored. In this case, when some student join any course, the DBMS has to re-calculate SUM and COUNT of that course, which is stored in materialized view. It also means the more complex the query is, the more complicated the view maintenance takes. 

View maintenance is a big problem. What is the most effective way to maintain the result stored fastest without re-run the original query? Because of view maintenance problem, there are many restrictions and requirements for creating indexed view, for example, query with OUTER JOIN is not allowed. 

So we know Indexed view will:

Make SELECT clause run faster

Make INSERT, UPDATE, DELETE or some manipulating event become slower

So use them wisely. Just apply indexed view on a query that we use very often, but the based tables are not updated frequently. In this case, the performance and effectiveness reach the maximum. Otherwise, be careful. You can make the system run slower than you expected if the view is not used very often, but the based tables have been updating time by time.

 You can download the demo database at the link below (I use SQL Server 2014):

DOWNLOAD

 



Comments


Search