# SaaS infrastructure metric thresholds

### 1. Application Instances

#### 1.1 CPU thresholds

To measure CPU usage, we recommend tracking CPU user time and CPU steal time. The `sar` tool can also be used to collect CPU statistics.

Note: `processor_vcpus` is the number of virtual CPUs (vCPUs) available on the instance.

~~~bash
  cpu_warning = 85
  cpu_critical = 90

  cpu_warning_stealtime = 10
  cpu_critical_stealtime = 25
~~~
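As a rough illustration of how these two values could be derived, the sketch below computes the user and steal percentages from two samples of the aggregate `cpu` line of `/proc/stat` and compares them with the thresholds above. It assumes the thresholds are percentages and that a value equal to a threshold already triggers it; the function names are hypothetical and are not part of any existing plugin.

```python
# Thresholds from this section, read as percentages (assumption).
CPU_WARNING, CPU_CRITICAL = 85, 90
STEAL_WARNING, STEAL_CRITICAL = 10, 25


def parse_cpu_line(line):
    """Return the first eight counters of the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Counter order: user, nice, system, idle, iowait, irq, softirq, steal
    return [int(value) for value in fields[1:9]]


def cpu_percentages(before, after):
    """Compute (%user, %steal) between two samples of the 'cpu' line."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total == 0:
        return 0.0, 0.0
    return 100.0 * deltas[0] / total, 100.0 * deltas[7] / total


def classify(value, warning, critical):
    """Map a value to OK / WARNING / CRITICAL (higher is worse)."""
    if value >= critical:
        return "CRITICAL"
    if value >= warning:
        return "WARNING"
    return "OK"


# Two made-up samples, taken some seconds apart.
before = "cpu 1000 0 500 8000 100 0 0 400 0 0"
after = "cpu 1900 0 520 8050 110 0 0 420 0 0"
user_pct, steal_pct = cpu_percentages(before, after)
print(classify(user_pct, CPU_WARNING, CPU_CRITICAL))       # CRITICAL (90% user)
print(classify(steal_pct, STEAL_WARNING, STEAL_CRITICAL))  # OK (2% steal)
```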

#### 1.2 Load-average

To read the system load average, we use an icinga2 plugin and monitor the 1-, 5-, and 15-minute load averages. The thresholds are scaled by the number of vCPUs (`processor_vcpus`).


~~~ bash
  load_wload1  = processor_vcpus * 1.5
  load_cload1  = processor_vcpus * 2.5

  load_wload5  = processor_vcpus * 1.25
  load_cload5  = processor_vcpus * 2.25

  load_wload15 = processor_vcpus
  load_cload15 = processor_vcpus * 2
~~~
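The scaling above can be sketched as follows. This is only an illustration of the arithmetic, not the icinga2 plugin itself; the function names are hypothetical, and treating a load equal to a threshold as a breach is an assumption.

```python
def load_thresholds(processor_vcpus):
    """Return {period: (warning, critical)} scaled by the number of vCPUs."""
    return {
        "load1": (processor_vcpus * 1.5, processor_vcpus * 2.5),
        "load5": (processor_vcpus * 1.25, processor_vcpus * 2.25),
        "load15": (processor_vcpus * 1.0, processor_vcpus * 2.0),
    }


def check_load(loads, processor_vcpus):
    """Classify a (load1, load5, load15) tuple against the scaled thresholds."""
    thresholds = load_thresholds(processor_vcpus)
    result = {}
    for name, value in zip(("load1", "load5", "load15"), loads):
        warning, critical = thresholds[name]
        if value >= critical:
            result[name] = "CRITICAL"
        elif value >= warning:
            result[name] = "WARNING"
        else:
            result[name] = "OK"
    return result


# Example with made-up load values on a 4-vCPU instance.
print(check_load((7.0, 4.0, 1.0), 4))
# {'load1': 'WARNING', 'load5': 'OK', 'load15': 'OK'}
```

On a live Unix host, the inputs could come from `os.getloadavg()` and `os.cpu_count()`.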


#### 1.3 Free RAM
To read freeable memory, we use the file `/proc/meminfo` and apply these thresholds.

~~~ bash
memory_warning_threshold = 10
memory_critical_threshold = 5
~~~
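A minimal sketch of such a check is shown below. It assumes that the thresholds are percentages of total memory and that "freeable" memory corresponds to the `MemAvailable` field; both are assumptions, and the function names are hypothetical.

```python
# Thresholds from this section, read as percent of total memory (assumption).
MEMORY_WARNING, MEMORY_CRITICAL = 10, 5


def free_memory_percent(meminfo_text):
    """Return MemAvailable as a percentage of MemTotal from /proc/meminfo content."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])  # first token is the numeric value
    return 100.0 * values["MemAvailable"] / values["MemTotal"]


def memory_status(percent_free):
    """Lower is worse: at or below a threshold triggers it."""
    if percent_free <= MEMORY_CRITICAL:
        return "CRITICAL"
    if percent_free <= MEMORY_WARNING:
        return "WARNING"
    return "OK"


# Made-up excerpt in the format of /proc/meminfo.
sample = "MemTotal:       16000000 kB\nMemFree:          500000 kB\nMemAvailable:    1200000 kB\n"
percent = free_memory_percent(sample)
print(percent, memory_status(percent))  # 7.5 WARNING
```

On a Linux host, the input would be the content of `/proc/meminfo`, for example `open("/proc/meminfo").read()`.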

#### 1.4 Free space in HD
To read the free disk space, we use the icinga2 plugin and apply these thresholds. The zequenze system is installed in `/opt/zequenze`, so this directory must be monitored.

~~~ bash
memory_warning_threshold = 10
memory_critical_threshold = 5
~~~
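For illustration only, the free-space percentage of the filesystem that holds a directory can be computed with Python's `shutil.disk_usage`. The sketch assumes the thresholds are percentages of free space; `DISK_WARNING`, `DISK_CRITICAL`, and `free_space_status` are hypothetical names, not the plugin's own.

```python
import shutil

# Thresholds from this section, read as percent of free space (assumption).
DISK_WARNING, DISK_CRITICAL = 10, 5


def free_space_status(path):
    """Return (percent_free, status) for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    percent_free = 100.0 * usage.free / usage.total if usage.total else 0.0
    if percent_free <= DISK_CRITICAL:
        status = "CRITICAL"
    elif percent_free <= DISK_WARNING:
        status = "WARNING"
    else:
        status = "OK"
    return percent_free, status
```

Calling `free_space_status("/opt/zequenze")` on an application instance would cover the installation directory.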
### 2. Cache Instances

#### 2.1 CPU thresholds for Redis instances

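Redis (version 6.0.6) executes commands on a single thread, so it effectively uses only one CPU. The CPU thresholds therefore have to be scaled down by the number of vCPUs to reflect the CPU behaviour correctly. For example, on a 4-vCPU instance one fully busy core shows up as only 25% overall CPU utilization.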

~~~ bash
cpu_warning = (80 / processor_vcpus)
cpu_critical = (90 / processor_vcpus)
~~~
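The arithmetic is simple enough to sketch directly. The helper name below is hypothetical; the formula is the one given above.

```python
def redis_cpu_thresholds(processor_vcpus):
    """Return (warning, critical) overall CPU percentages for a Redis instance."""
    if processor_vcpus < 1:
        raise ValueError("processor_vcpus must be at least 1")
    return 80 / processor_vcpus, 90 / processor_vcpus


print(redis_cpu_thresholds(4))  # (20.0, 22.5)
```

On a single-vCPU instance the thresholds remain 80 and 90.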


### 3. Application Status Monitoring

Application status can be monitored with the following HTTP request, which returns information about ping, the caches (`default` and `locmem`), and the databases.

~~~ bash
curl 127.0.0.1:8000/status/
~~~

~~~ json
{
   "ping":{
      "code":200,
      "status":"Ok",
      "time":0
   },
   "caches":{
      "default":{
         "code":200,
         "status":"Ok",
         "time":0.003637,
         "time_set":0.001012,
         "time_get":0.00225,
         "time_del":0.000375
      },
      "locmem":{
         "code":200,
         "status":"Ok",
         "time":0.000135,
         "time_set":5.5e-05,
         "time_get":7e-05,
         "time_del":1e-05
      }
   },
   "database":{
      "default":{
         "code":200,
         "status":"Ok",
         "time":0.003241
      },
      "replica1":{
         "code":200,
         "status":"Ok",
         "time":0.003045
      }
   }
}
~~~
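A monitoring script could walk this response and report every component whose `code` is not 200. The sketch below assumes the response always has the shape shown above (a check is any object that contains a `code` key); the function name and the failing sample payload are hypothetical.

```python
import json


def failing_checks(node, prefix=""):
    """Return dotted names of all checks whose 'code' is not 200."""
    failures = []
    for name, value in node.items():
        path = f"{prefix}.{name}" if prefix else name
        if not isinstance(value, dict):
            continue
        if "code" in value:
            # Leaf check such as "ping" or "caches.default".
            if value["code"] != 200:
                failures.append(path)
        else:
            # Grouping object such as "caches" or "database".
            failures.extend(failing_checks(value, path))
    return failures


# Made-up payload in which one cache check fails.
payload = json.loads(
    '{"ping": {"code": 200}, '
    '"caches": {"default": {"code": 500}, "locmem": {"code": 200}}, '
    '"database": {"default": {"code": 200}, "replica1": {"code": 200}}}'
)
print(failing_checks(payload))  # ['caches.default']
```

An empty list means that every reported component answered with code 200.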

### 4. Database

The metrics and default thresholds for each database instance are as follows.

#### 4.1 CPU Utilization
~~~ bash
warning = "75"
critical = "85"
~~~
#### 4.2 Freeable Memory

~~~ bash
warning = "15%"
critical = "7%"
~~~