SQL Server healthcheck
Published on: Author: Pierre van der Ven Category: IT development and operationsOver the years I have performed healthchecks with several customers. We define a healthcheck as ‘how healthy is your system’, and off course we give advice. In this blog I want to share my experience with the results we found and the issues we encountered. Please keep in mind a healthcheck is advice. You are not urged to change anything, although it is recommended.
I will list the items we check, and what we advise depending on the output.
1. Version
Check SQLserverupdates.com to check the latest version for all base-releases. If you are at a supported release, be sure you also installed the latest updates. If you are at a unsupported release, be sure you installed the latest service pack and when needed the latest updates. There is no valid reason to use unsupported software. Be sure you are up-to-date!
2. SQL Server agent
The SQL Server agent is a Window-service. This service executes jobs in SQL Server itself, it monitors the instance, and it will send alerts when needed. When this service is not running you probably notice missing jobs, alerts, etc. This service must be up-and-running always, also after a restart of the server.
3. Database status
A database in an instance should always be online. Unless there is a very good reason to keep it offline. Most of the times a offline database is not in use anymore and can be archived or deleted. Keep the SQL Server instance clean!
4. SQL Server job status / all jobs
Prevent offline jobs. Except for some products from Microsoft itself, there is no reason to keep offline jobs in your database. If a job never runs, remove it. If a job fails constantly, investigate if the job is needed and if yes: fix it.
5. Database backup last 24 hours / databases not back upped recently
Be sure you have database-backups, and monitor them daily. A good guideline for SQL Server is to have a full backup every day for all databases. If the size is too big to perform daily backups, make a plan to perform a full backup during the weekend and (daily) differentials during the week. Also make a plan to test the backup, perform a restore every few months. Doing this you have a better feeling with the backups, and when a restore is needed and everybody is panicking just open your restore-runbook and perform it.
6. SQL Server error log
Do not forget to check the error log. Sometimes there are errors in there that are useful and must be handled. When there are no errors at all please check the older logs. This can happen in instances where the activity is very low.
7. CPU Info
It is quite hard to give exact guidelines for CPU. In general: if your CPU is too busy, give it more.
Check how busy the CPU is. If the average over a day or a week is above 15%, consider to add more CPU-power to the host. But when there are no performance related issues with the instance, consider to leave it as it is. Use this info as an indicator what instances qualify for more CPU-power.
8. SQL Server memory info
Check if the allocated memory is sufficient for the instance. If the allocated memory is (almost) equal to the total memory, consider to extent it. Also check if there is any memory-pressure.
9. Top 10 long running queries
Looking at the top 10 of long running queries is another indicator to find out if there are problems in your instance. But first of all: if nobody complains about long running queries, skip this. It is a waste of time to investigate this if nobody has an advantage.
Try to find the bottleneck in the long running queries. Is it memory? Is it IO? It is too detailed to dive into the deep for this one, but be sure to have a look at it when checking the health for your instance.
10. All files
Check if the files that are related to SQL server are distributed across several disks. It is quite hard to make a default for this, since every company and even person has his own standard. Overall a good default is:
- no related files on the C-drive
- tempfiles on their own drive
- separate datafiles and logfiles
- separate data from indexes, so they can be distributed to different drives
11. Service accounts
Check if the SQL server services (for instance and agent) are running under a domain-account. If not: make it so.
12. Sysadmin
Be sure you have an overview of the sysadmins in your instances. Best practice is that only DBA’s have this privilege. But unfortunately some applications need it otherwise they won’t work. Document those kind of exceptions, and keep asking your supplier when they will fix this.
13. Recovery model
I always prefer full recovery, unless there is a good reason to not do so. When using full recovery you can restore point in time. When using simple mode, be aware: point in time recovery is not possible. You can restore to the moment from the backup, and that’s it. Sometimes you see simple mode when running a datawarehouse with a lot of mutations. Be aware there is a special ‘bulk logged’-mode, which may be better to use instead of simple mode. When you don’t use full recovery document why you decided to use another model.
14. LinkedServers
Check if the linked servers connect to databases you know. Also check what exactly has been configured: can someone ‘on the other side’ retrieve all your data? Also document the linked servers, document everything you know about them. If you miss information: ask around for it.
15. Uptime
If uptime is above one month, the most recent Windows-updates are very likely not installed. Find out why the most recent Windows-updates are not installed. If there is no policy for Windows-updates: force your executives to make one and comply with it.
16. Default accounts
Check the status for all the default accounts in the database. It depends on the configuration for SQL Server what accounts are in there exactly, but at least check:
- sa
- NT Service\MSSQL$<instance>
- NT SERVICE\SQLAgent$<instance>
- NT AUTHORITY\SYSTEM
- NT SERVICE\MSSQLSERVER
- NT SERVICE\SQLSERVERAGENT
If they are enabled: disable them, and when needed replace them by a domain-account.
My conclusion:
- be aware
- dare to fix, update and upgrade
- practice your skills
- document when needed
Good luck with your healthchecks and thanks for reading this blog!