Release: 1.5 - This document was generated on 2015-05-27
Copyright © 2015 init.at informationstechnologie GmbH
NOCTUA® is a registered trademark of init.at informationstechnologie GmbH.
Linux® is a registered trademark of Linus Torvalds in the United States and other countries.
MySQL® is a registered trademark of MySQL AB in the United States, the European Union and other countries.
Red Hat®, Red Hat Enterprise Linux®, Fedora® and RHCE® are trademarks of Red Hat, Inc., registered in the United States and other countries.
Ubuntu® and Canonical® are registered trademarks of Canonical Ltd.
Debian® is a registered trademark of Software in the Public Interest, Inc.
SUSE® is a registered trademark of Novell, Inc.
openSUSE® is a registered trademark of the openSUSE community.
All trademarks are the property of their respective owners.
Please write to <noctua@init.at>
to contact our developer and support team.
Abstract
This document is the official documentation for NOCTUA®. It explains general concepts of the software, gives an overview of its components and walks you through various installation and administration tasks; the appendix documents and explains the stable part of the API.
Before you start to dive deep into the documentation, we think it is fair to let you know whether you will get the right information out of this document or not. This documentation is intended for system administrators who want to get into monitoring and cluster management. It is also intended for users who only operate the software but do not do any configuration.
The handbook also provides some background information about LINUX® commands in general, e.g. in the context of package installation.
It does not deal with general monitoring topics or the basic principles of monitoring or cluster management. If you wish to learn something about those, please save your time and look for a more suitable document on the world wide web or in your local specialised bookstore.
The list below shows which prerequisites users and administrators should fulfil to get into monitoring or cluster management with software by init.at ltd.:
Experience with LINUX® in general
Experience with the LINUX® command line, e.g. bash, zsh or others
Experience with a standard HTML browser
Experience with network settings
For easier document handling you will notice small icons in some places with the following meanings:
Table 1.1. Symbol table

Link inside the documentation
Mailto link
Internet HTTP link
Link to the glossary
Marks a very important statement
Moving the mouse to the menu with the given name
Work on this content is in progress
Left mouse click
Right mouse click
Double mouse click
NOCTUA® is an extensive software package for monitoring devices. Monitoring means to observe, record, collect and display different aspects of hardware and software. These monitored aspects can be close to hardware, like CPU temperature, CPU voltage, fan speed or present and missing network devices, but also close to services running on the monitored machines, like SSH daemons, POSTFIX daemons, HTTP services or simply the availability of devices via ping.
Not only is monitoring of devices possible, but also different methods of reaction when predefined limits are reached. And best of all, NOCTUA® is open source software, licensed under the GNU GPL 2.0 License. You can find more information about open source software and its benefits at http://www.fsf.org
Even if the main task of the monitoring part is to ease configuration and administration of icinga, there are some special and unique features that icinga and other software solutions do not provide. The following list contains the exclusive components that make our software unique on the software market:
Central database - one central storage for all data such as configs, settings, logging data, user data and much more.
MD-Config-Server - the md-config-server is responsible for the proper conversion of the configuration into the icinga config format.
KPI [coming soon] - Key Performance Indicators, integration of user-defined KPIs. Flexible preselection by device and monitoring categories.
Database snapshot - displays database snapshots at any point in time, comparison of snapshots. Displays the historical status of users, devices, networks and scripts.
Reporting - clear and extensive visual display of availability for different hosts and services in a defined time range. Displays all status messages for hosts, services and even device groups. Simple handling of time spans.
In this section you will find out about what technical requirements NOCTUA® has.
Like every other software, NOCTUA® has certain system requirements. Because NOCTUA® is open source software, anybody with enough programming knowledge could port it to other open source systems.
The good news: You don't have to port anything if you already use one of the following LINUX distributions:
Debian
Ubuntu
CentOS
openSUSE
SLES
For exact versions please take a look at the installation chapter.
Monitoring configurations are stored in databases for faster access, therefore faster reaction times, and furthermore more flexible administration of data.
NOCTUA® uses Django as its database interface, so every database which is compatible with Django can be used. The recommended database is PostgreSQL (mostly due to license issues). The MySQL database is not supported any more.
Software packages are available for following operating systems:
Debian Squeeze (6.x)
Debian Wheezy (7.x)
Ubuntu 12.04
CentOS 6.5
openSuSE (12.1, 12.3, 13.1)
SLES 11 (SP1, SP2, SP3)
It may well be possible to get the software up and running on other platforms, but this type of operation is neither tested nor supported.
There is no public access to our repository directories, therefore you first have to contact us to get a valid LOGINNAME and PASSWORD. After receiving your access data you are able to use the repositories mentioned below.
To install the software on your operating system there are two ways to go.
You can either install the software automatically by downloading and running our install_icsw.py install script, or manually by adding the repositories listed below with your individual access data and using your package manager as usual. The automatic installation via script is recommended because it is very convenient and handles most installation scenarios.
In order to use the install script, first you have to contact us to get your individual access data.
Download the script named install_icsw.py
from our download portal.
As user root run the script with your repository access data as follows:
install_icsw.py [ -u USERNAME ] [ -p PASSWORD ] [ -n CLUSTERNAME ]
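For example, a hypothetical invocation could look like this (USERNAME, PASSWORD and the cluster name my_cluster are placeholders, not values taken from this handbook):

install_icsw.py -u USERNAME -p PASSWORD -n my_cluster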
The script does the following:
Determine which operating system is running
Add the necessary repositories with the required access data in your operating systems repository directory
Refresh your package cache
Install the software
Automatic integration of valid license files into the system
After receiving your individual access data, do the following steps to install the software manually:
Add suitable repositories for your operating system
Refresh your package cache with your os package manager
Install the software (for details see section installation)
Integrate your received or downloaded license files
There are two main repositories you have to deal with to install the software.
Repositories are available for the stable (2.5) and the master (devel) versions of the above mentioned operating systems. The operating system running on your hardware and the version of the software you want determine which repository configuration you must use for your package manager.
Repository for the latest releases of the init cluster software. The current developer version, containing the newest functions and modules. Very fast update and change cycles due to active development. Sometimes a bug could slip in, but usually it works fine. From time to time it will be merged into stable.
Repository for stable releases of the init cluster software. The current stable version for productive environments. Most features and functions are included and there are no known bugs.
Based on the above mentioned operating system, repository and desired software version, resulting repositories can be added.
There are two different ways to add new repositories for the monitoring software by init.at ltd. to the operating system. They can be added all at once in one central file or in a repository directory. For each operating system there are specific repository directories.
Table 3.1. For Debian based systems

Distribution | Repository directory
---|---
Debian wheezy | /etc/apt/sources.list.d/
Debian squeeze | /etc/apt/sources.list.d/
Ubuntu 12.04 | /etc/apt/sources.list.d/
Below you can see some examples of sources.list content. These are the lines you must add to your /etc/apt/sources.list. The relevant part for the deb based package manager looks like this for the devel version on wheezy:
deb http://LOGINNAME:PASSWORD@www.initat.org/cluster/DEBs/debian_wheezy/icsw-devel wheezy main
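For the stable branch (2.5), the corresponding line would presumably follow the same path scheme used by the rpm repositories shown below; treat this as an assumption rather than a line taken from this handbook:

deb http://LOGINNAME:PASSWORD@www.initat.org/cluster/DEBs/debian_wheezy/icsw-2.5 wheezy main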
For Ubuntu 12.04 you can also add repositories either in one file, /etc/apt/sources.list, or in the repository directory /etc/apt/sources.list.d.
Debian and Ubuntu use a different package manager than CentOS, openSUSE or SLES. For that reason, sources.list does not exist on rpm based operating systems. Instead there are several files for repository management, not only one. All relevant repository files stay in the directory /etc/zypp/repos.d/.
[cluster_devel_remote]
name=cluster_devel_remote
enabled=1
autorefresh=0
baseurl=http://LOGINNAME:PASSWORD@www.initat.org/cluster/RPMs/suse_13.1/icsw-devel
type=rpm-md
[cluster_devel_remote]
name=cluster_devel_remote
enabled=1
autorefresh=0
baseurl=http://LOGINNAME:PASSWORD@www.initat.org/cluster/RPMs/suse_13.1/icsw-2.5
type=rpm-md
Alternatively, it is possible to download the repository files directly from the internet instead of editing files manually. There are two URLs you can get the repository files from. Don't forget to request access data for the repository directories, otherwise you cannot access them.
The repository directory is /etc/yum.repos.d/. Place your desired *.repo files inside this directory, do a yum check-update and you are ready to install the software.
[initat_cluster]
autorefresh=1
enabled=1
type=rpm-md
name=initat_cluster
baseurl=http://LOGINNAME:PASSWORD@www.initat.org/cluster/RPMs/rhel_6.2/icsw-devel
Before continuing the server installation, it is worth saying something about the database, because it is one of the most important parts of the software. It contains all settings, configurations, users and much more in one single database. This is the reason why it is easy to migrate or back up data, and why it lowers the monitoring effort compared to a plain icinga installation.
After a basic installation of the server, normally only an SQLite database exists. To start with monitoring this is completely sufficient, but it must be mentioned that due to the limitations of SQLite we recommend switching to a more capable database solution, e.g. PostgreSQL.
So before installing the software, two scenarios are possible.
SQL Database already exists
SQL database does not exist
The server can handle both states: in the case of an existing database we want the server to do some migrations for us, in the other case we want to do an initial database setup.
icsw setup runs in an interactive mode and is responsible for a couple of basic settings:
Creates suitable database schemata
Creates an administrator account and an initial password for first login into the web front-end
Creates the database config file /etc/sysconfig/cluster/db.cf
Before running the setup, install the database server and the Python database adapter:
Install postgresql-server
Install python-modules-psycopg2
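On openSUSE an installation command for these prerequisites might look like the following; the package names are taken from the list above but can differ between distributions, so treat this as an assumption:

zypper install postgresql-server python-modules-psycopg2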
Either run the following command to migrate a previously existing database:
icsw setup --ignore-existing
or run the following command to create a completely new database:
icsw setup
After running icsw setup, the script awaits some input from the admin.
possible choices for DB engine: psql, sqlite
DB engine (psql) :
DB host (localhost) :
DB user (cdbuser) :
DB name (cdbase) :
DB passwd (bTMZPUYMiR) :
DB port (5432) :
Accept the suggested defaults with the ENTER key or insert your own data. At this point you must have a working PostgreSQL installation to allow the software to connect to the database.
In case something goes wrong, the script displays the possible steps that have to be done. Most conflicts at this point are caused by wrong permissions on the database or generally by a wrong database installation and setup. If the steps below cannot solve the problem, please take a look into your database manual or ask the database administrator to find out how to set up the database with the correct permissions.
Log in to your database (commonly done with su postgres followed by a simple psql) and type the following commands to create the right user and a new, empty database:
CREATE USER cdbuser LOGIN NOCREATEDB UNENCRYPTED PASSWORD 'my_password_123';
CREATE DATABASE cdbase OWNER cdbuser;
This will create a new database user with the name cdbuser, the desired password my_password_123 and a new, empty database called cdbase
After successful creation of the database user and the database, we have to edit /var/lib/pgsql/data/pg_hba.conf (openSUSE) to set up the correct permissions for the database. Comment out other lines so that only the three lines below remain.
local cdbase cdbuser md5
host cdbase cdbuser 127.0.0.1/32 md5
host cdbase cdbuser ::1/128 md5
To be on the safe side it is recommended to try to log in to the database manually. If you are able to connect manually, the script will most likely be able to as well.
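A manual connection test, using the user and database names from above, could look like this (you will be prompted for the password):

psql -h localhost -U cdbuser cdbase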
If everything goes well, we get a successful connection message:
dsn is 'dbname=cdbase user=cdbuser host=localhost password=bTMZPUYMiR port=5432'
connection successful
Once connected successfully to the database, the script runs the migration for you. Finally you have an installation with a PostgreSQL database.
The database access data for the server is stored in /etc/sysconfig/cluster/db.cf, created by icsw setup; a sample file is provided under /etc/sysconfig/cluster/db.cf.sample. If you want to connect via a local socket, leave DB_HOST empty. Either fill in the user and database information manually or run icsw setup for an assisted config file creation.
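The following is a purely illustrative sketch of what such a file could contain; apart from DB_HOST, which is mentioned above, the key names are assumptions mirroring the icsw setup prompts, so consult /etc/sysconfig/cluster/db.cf.sample for the authoritative format:

# illustrative sketch only - key names other than DB_HOST are assumptions
DB_ENGINE=psql
DB_HOST=localhost
DB_PORT=5432
DB_NAME=cdbase
DB_USER=cdbuser
DB_PASSWD=bTMZPUYMiR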
Every daemon and process from the server uses this file to gain access to the database. The file has to be readable for the following system entities:
The user of the uwsgi-processes (wwwrun on SUSE systems)
The system group idg
A typical set of permissions would look like:
-rw-r----- 1 wwwrun idg 156 May 7 2013 /etc/sysconfig/cluster/db.cf
Although the software does periodic backups, it could be necessary to do a database backup by hand. For PostgreSQL there is a special dump command:
pg_dump -Fc -U cdbuser cdbase > DATABASE_BACKUP_NAME
This single line is enough to copy your whole database to a file.
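For example, a date-stamped backup file could be created like this (the file name is just an illustration):

pg_dump -Fc -U cdbuser cdbase > cdbase_backup_$(date +%Y%m%d).dump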
A few more actions are needed to restore a database backup. First of all, before we are able to restore our database backup cleanly, we have to drop (delete) all database contents. After the database contents are dropped, we are able to import the data into the existing, empty database.
Delete database contents:
su postgres -c "psql -c \"DROP DATABASE cdbase; \""
Restore database:
pg_restore -c -C -F c DATABASE_BACKUP_NAME | psql -U postgres
The web front-end for your server can be accessed via http://SERVERNAME/cluster or http://IP_ADDRESS:80/cluster/
In case you run the setup_noctua.sh script manually, the port number will be rewritten to 18080. You can then access the web front-end by this URL: http://SERVERNAME:18080/cluster or http://IP_ADDRESS:18080/cluster. You can also use your server's localhost alias for accessing the front-end: http://localhost:18080/cluster
After adding the desired repositories for your operating system it is time to install the software packages themselves and configure them. There are three main packages you have to install to get a basic server running:
icsw-server
icsw-client
icsw-dependencies
These packages contain all necessary services, binaries, libraries and dependencies for a clean and proper installation.
If you also want to access the server via the web GUI, you additionally need to install the nginx-init package and run the nginx-init HTTP server.
For SUSE operating systems an installation command looks like the following:
zypper ref; zypper install icsw-server icsw-client icsw-dependencies
While accessing the repositories you will be prompted for a valid username and password. Type in your received access data on the terminal and continue the installation.
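On Debian based systems the equivalent installation would presumably be done with apt-get; the following is an assumption based on the package names above, not a command quoted from this handbook:

apt-get update; apt-get install icsw-server icsw-client icsw-dependencies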
To guarantee maximum flexibility, we decided to involve the system administrator in the installation procedure. After installation you will get a note on stdout explaining how to create a new database configuration with the icsw setup command.
As an alternative to the usual installation of binary packages via repositories and the operating system package manager like zypper, apt-get or yum, you can use a virtual machine with a ready-to-go installation. We distribute two popular VM image file formats running with libvirt/qemu and vmware. For information on how to set up your VM environment, please take a look at the corresponding documentation of your VM vendor.
The following steps have to be done to run a KVM libvirt/qemu virtual machine with preinstalled NESTOR®:
Download the KVM/libvirt image and move it into the right image directory, e.g. /usr/local/share/images/.
Copy an existing *.xml or create a new one
Edit your new *.xml file
Define your new virtual machine
Finally, if your machine is set up correctly, all you have to do is start the virtual machine and have fun with monitoring.
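Assuming the domain definition is stored in a file named nestor_vm.xml (a hypothetical name), defining and starting the machine with libvirt could look like this:

virsh define nestor_vm.xml    # define the new virtual machine from the XML file
virsh start nestor_vm         # start the defined virtual machine (name as set in the XML)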
From time to time new software packages are built and can be downloaded. Especially for the master development branch there are frequent updates which can be applied to get new functions or features or simply to fix some bugs. The update period for master is about every second day.
The stable branch gets less frequent updates than the master version. Because it is the stable branch, most updates for stable address security issues and bugfixes. Really big updates are only done once the master is stable enough for productive environments. The update period is about 4-6 months.
The update procedure is very convenient; it is based on the system's integrated package manager, for example zypper in openSUSE or apt-get in Debian. Commands for updating/upgrading all installed software via the package manager are:
Refresh the repositories and do a whole system upgrade in openSUSE or in Debian, as sketched below.
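Typical commands for a full upgrade are the standard zypper and apt-get invocations (shown here as common examples, not lines quoted from this handbook):

zypper refresh && zypper update      # openSUSE: refresh repositories, upgrade all packages
apt-get update && apt-get upgrade    # Debian: refresh package cache, upgrade all packages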
Of course, you are also able to update only single packages, for example the package handbook-init. The command looks similar to the command used to update all packages:
Refresh the repositories and do a single package upgrade, in this case an upgrade of the package handbook-init, in openSUSE or in Debian, as sketched below.
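Common package manager invocations for upgrading a single package look like this (again standard zypper/apt-get usage, assumed rather than quoted from this handbook):

zypper refresh && zypper update handbook-init       # openSUSE: upgrade only handbook-init
apt-get update && apt-get install handbook-init     # Debian: installs or upgrades only handbook-init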
For other distributions please look into your distribution's package management documentation.
After a successful installation of NESTOR®, first of all you have to check whether all necessary services are running. Type the following command into the terminal:
icsw service status
Get more information about possible flags with icsw --help.
One of the most common flags is -v, which additionally shows the version number of each package, as shown below.
Another common flag is -a. With this flag, the script shows additional information:
Thread info
pids
runlevels
Memory
Take a look into our command reference to learn more about the icsw command.
The main command, which can be used to manage the cluster components, is the icsw command. The following table presents frequently used examples of the icsw tool:
Table 3.4. icsw - command overview

icsw-command | Functionality
---|---
icsw service | Show status or control icsw services
icsw service state / enable / disable | Show the state overview or enable/disable services. These are the service states which are managed by the meta-server.
icsw logwatch | The logwatch command is intended to show logging messages on stdout. The -f option is used to append data while the file grows. The --system-filter option limits the logging output to a specific service. If used without any arguments it displays logging messages for all running services.
icsw license | With the icsw license command administrators are able to lock, unlock or show locked licenses and devices from the license system. You can also show your cluster ID or register a cluster. For more information about the lock or unlock command take a look at the section called "Lock command to fall below the parameter limitation".
icsw setup | Create the database and perform the initial setup. There are many options and arguments for this command; please take a look into the command reference.
Short overview about icsw commands
Table 3.5. icsw - service

icsw-command | Functionality
---|---
icsw service status [service-name] | Displays the status of the server and all of its services. With "service-name" it displays only the status of the given service.
icsw service start [service-name] | Starts the service with "service-name". Without the service-name option the command initiates the start of all cluster components. Warning: if the service is disabled in the meta-server, the meta-server will stop the service again within a few minutes.
icsw service stop [service-name] | Stops the service with the given "service-name" or all services of the cluster instance.
icsw service restart [service-name] | Restarts the service with the given "service-name" or, if no service-name is given, restarts all services of the cluster instance.
icsw service debug service-name | Using the debug option is like starting a service in the foreground. In contrast to services started as a daemon (background), the service typically displays some stdout messages.
icsw service state | Provides the state/status info about all services, or about a specific service if a "service-name" is given.
icsw service enable service-name | Enables the service in the meta-server settings (database). That means the meta-server is responsible for keeping the service running; if the service is not active, the meta-server will start it again within a few minutes.
icsw service disable service-name | Disables the service in the meta-server settings (database). Henceforward the meta-server makes sure the service is not running.
The icsw service command
All core features, also called basic features, of the software are licensed under an open source license and are free of charge, but some enterprise features of NESTOR® are not. There are licenses for these enterprise features and you have to buy licenses in order to get these features working.
This section guides you through the process of getting licenses, understanding the concept of license limitation and managing licenses.
If you have never applied licenses to your server, you are running an unlicensed software version. To remind you of this fact you will see notification messages on some pages, for example on the login page and also on the dashboard after a successful login.
You can also check your license state by navigating to Session → License.
There you will see three different drop down windows:
Your licenses for this cluster
License packages
Upload license file
Shipped licenses are keyfiles containing information about licensed features, the license period and license parameters. Each license is associated with one specific cluster ID. A keyfile can contain one or more license packages assigned to one or more cluster IDs.
The license package drop-down window shows you a content overview of the uploaded keyfiles.
Keyfiles containing more than one single license package will be displayed in separate tabs inside the License package drop-down window.
You can't expand any drop-down menu by left-clicking on the arrow beside it, except the Upload license file one. Use the Choose File button to select your valid license keyfile. After selecting your keyfile, its name will be displayed on the right side of the button.
Push the upload button to integrate your valid license keyfile into the server and activate the acquired enterprise features.
After uploading your valid license keyfile to the server, your license overview will immediately be updated and show your purchased licenses.
Generally, if you want to see your license overview, navigate to Session → License to display the license status.
License name
Description of the specific license
Parameter value is the limitation of licenses in context of
Device
Service
User
External license
Used licenses and amounts will be displayed as an info window
Valid in future, license will be valid from point of time in future
Valid, license is active and valid until displayed date
Grace, license is in grace time. It is still active until the grace time period of 2 weeks is over.
Expired, license is out of grace time period.
Name of used license package
There are four different periods or states of licenses. Depending on the period or state, a licensed feature is working or not.
The second factor which decides whether a license is valid or not is the parameter limitation. Depending on the purchased parameter amount and on the used parameters, the license can be valid, in grace time or expired.
Licenses can be violated by exceeding the license time period or by exceeding one of the license parameter limitations. In case of exceeding a license parameter limitation, a grace time period also starts, which is totally independent of the license time period grace time.
For a small violation of a parameter limitation there is a lock command to get back within the parameter limit.
In Figure 3.10, " License states ", marked with (1), you can see a license violation caused by exceeding the parameter limitation. Because of a violated license the grace time period starts immediately. In your license overview you will notice a warning message for this violated license.
Now, you have two options.
Purchase an extended license for that feature to increase the parameter limitation
Lock the license parameter to get back below parameter limitation
There is a special command for the second option:
icsw license lock { -d DEVICE } { -l LICENSE }
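A hypothetical invocation, with a made-up device name and license name purely for illustration, might look like:

icsw license lock -d node01 -l some_license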
After locking licenses [illustrated in (2) of Figure 3.10, " License states "], the used license amount falls below the parameter limit and the license is valid again.
If you want to know whether any licenses are locked, use the following command:
icsw license show_locks
It displays all locked licenses and corresponding devices.
This section explains the core concepts behind NOCTUA®. It gives a top level overview of its components and capabilities.
Below, the separate components are listed together with the part of the software they belong to.
As relatively complex software, NOCTUA® uses some well-known frameworks and technologies we absolutely have to mention.
Used Frameworks
Web application framework written in python.
Open source framework developed by Google™.
CSS framework developed by Twitter™.
Used software solutions
Industry standard software for monitoring devices.
RRDtool is the OpenSource industry standard, high performance data logging and graphing system for time series data. RRDtool can be easily integrated in shell scripts, perl, python, ruby, lua or tcl applications.
Powerful, small and fast web server.
Son of grid engine.
Parts which both software packages need to work properly.
Components
This daemon is responsible for restarting services that might have crashed or were otherwise killed. This functionality should be taken over by systemd.
/var/lib/meta_server contains the relevant information about which services should be running.
Creates the structure needed for receiving logs via rsyslog or syslog-ng.
Monitoring in the context of hardware and software is performed to obtain information about specified systems.
Components
Responsible for configuration of icinga. Interacts with database.
Responsible for writing config files and general coordination of the cluster. Listens on port TCP/8004 . cluster-server.py is a general purpose server that handles various tasks like writing /etc/hosts, generating a valid DHCP configuration, configuration of the BIND nameserver, feeding LDAP and / or YP servers, ...
Client part of the host-monitoring collserver.py.
Frontend program to talk to collrelay.
NOCTUA® consists of many different parts and services. Each of these services performs a specific set of tasks in the cluster. Most of these services are network enabled and listen on a specific port for commands. The following list tries to give an overview of the most important parts.
The general idea of NOCTUA® is simple. Create an image, a kernel and a set of configuration files to be used by your nodes and distribute them to the nodes.
The distribution is done via PXE. NOCTUA® enables you to describe the node specific configurations in Python.
Components
Generates the files for the Clusternodes (based on the config stored in the Database) to make the nodes distinguishable.
Creates the tftpboot/ethernet structure and monitors the installation progress of the nodes. Listens on ports TCP/8000 and TCP/8001. Mother provides access to IPMI as well.
Provides repositories available for installation by the package-client. Listens on port TCP/8007.
Installs required software by using the locally available package management commands: zypper, yum or apt-get.
A small program written in C that transmits node status messages to the cluster-server. hoststatus is written in C so it can easily be included in the initial ramdisk. It listens on port TCP/2002. hoststatus is in the package child.
Does log rotation and deletes logs older than a specified time range.
Provides integration of NOCTUA® with SGE. The commands sns and sjs rely on it.
Daemon for automatic configuration of devices.
The database where all the configuration data of a NOCTUA® installation is stored is generally referred to as the "clusterdatabase".
Throughout this document we might refer to:
Server side scripts are generally services or scripts that run on the cluster server. Most of them need database connectivity to function properly.
Node side scripts, on the other hand, are daemons or scripts that generally run on a node of the cluster. Node side scripts don't require access to the cluster database.
Because NOCTUA® consists of many different parts working together, it is not obligatory to run every service at once. Services like package-install or discovery-server are not essential to operate monitoring or cluster management.
For that reason, the default installation of NOCTUA® is rudimentary. Special services and functions are not activated by default. Activating certain services requires the user to push some buttons or move some levers.
Two spots where you can activate services are:
cluster server information
Device Config
Most of the configuration administrators have to do in NOCTUA® is accessible through a standard HTML-compatible browser like Mozilla Firefox™ or Google Chrome™. Once NOCTUA® is installed and all required services are running, all you have to do is connect to the server via browser.
Type http://SERVER-IP-ADDRESS:80/cluster/ or http://SERVERNAME/cluster/ into your browser address bar to connect to the server. If you connect to the server for the first time, you will be redirected to the account info page.
NOCTUA® webfrontend offers you a very clear view. There are three areas you will work with:
Menu area (1)
Sidebar device tree (2)
Main area (3)
In the menu area you'll find submenus, buttons, date, time and user section.
Submenus
Base
Users
Monitoring
Session
NOCTUA® offers some additional menus:
RMS - Resource management System
Cluster
Buttons
cluster server information
show cluster handbook as pdf
show index
number of background jobs
In the tree area you can find your device group tree and the associated devices. Located on top, there is a search field and two buttons.
Searchfield
use selection Button (green with arrow)
clear selection Button (red with circle)
Group
FQDN (Full Qualified Domain Name)
Category
Alternatively, there is a one-button selection method.
All configuration and input takes place in the main area. According to the selected or preselected devices and settings, the corresponding page appears.
Figure 5.8. Possible main area
One possible view of the main area after selecting some devices in "device network"
The cluster server information button shows three overview tabs: one tab with information about defined cluster roles, one tab with information about server results and one with information about the server itself.
Inside this upper tab, there is a table showing the name, reachable IP and defined cost of each of them. This tab is only for displaying information.
Each of the defined roles provides special functionality to the server.
Also a tab only for displaying general information.
valid
name
Result
max Memory
total Memory
This is the only tab inside the server information which allows you to control something. You are able to control services just as you would on the command line.
Following information will be displayed:
Server information
Name of service
Type of service {node, server,system}
Kind of Check
Status if service is installed or not
Version number of the installed service
Number of processes started
Displays memory usage as number and as statusbar
Button to apply action to the services
The workflow inside the web front-end follows a special pattern. This workflow repeats for specific actions and is therefore worth mentioning and learning. We divide this section into four subsections, listed below, to show the difference between them. Of course it is possible to get similar results in different ways, but each way has advantages and disadvantages and is more or less efficient.
There are also software regions, like Nodeboot, which are only accessible in one single way.
All preselections are done in the sidebar device tree, Section 5.2.2, " Sidebar device tree (2) ". Select groups or devices and click on the desired submenu in the top menu to access it.
The submenu method is recommended for working with multiple devices.
The following areas can be accessed by the submenu method:
Device Tree
Device Variable
Device Network
Device Configuration
Nodeboot
Package Install
Device Settings
Monitoring Overview
Livestatus
The home button method is more general. Again, the preselection of devices or groups takes place in the sidebar device tree, but afterwards, instead of choosing a submenu, we click on the home button. As a result we get some useful tabs for each device.
The home button method is recommended for working with multiple devices.
The following areas can be accessed by the home button method:
General
Category
Location
Network
Config
Disk
Vars
Status History (Detailed information about this tab, see in the chapter Status History)
Livestatus
MonConfig/hint
Graph
By clicking directly on the device name in the sidebar device tree we get similar overview tabs as with the home button method, but only for one single device.
The direct method is recommended for working with single devices.
For graphing, the requirement is to first select devices, then select the wanted graphs and finally draw these graphs for the selected devices. The graph preselection remains unaffected if you change the device preselection. After changing the device preselection you must push the Apply button. The same is true for drawing graphs.
We implemented three little helper functions to ease the handling of large and complex tables inside the web front-end. These helper functions are located in several places/pages, e.g. the sidebar device tree, the device tree or the device configurations, mostly at the top of the specific place/page.
The most common auxiliary function is the filter input field on top of device trees or configuration tables. The simplest usage of this filter field is to insert a text string or number to filter for. If you do so, for example in the sidebar device tree, then only matching devices will be selected.
Table 6.1. Examples for the filter function in the sidebar device tree

Regular expression match character | Matching description | Example | Result
---|---|---|---
^ | Starting position of line | ^node | Select all devices whose device name begins with the string node.
[0-9] | Range of numbers | node[0-5] | Select all devices called node immediately followed by a number between 0 and 5.
$ | End of line | [0-9]$ | Select all devices whose name ends with a numeral between 0 and 9.
\d | Digit | \d$ | Select all devices whose name ends with a digit.
Regular expressions for input fields.
Further information on regular expression filters can be found on the world wide web by searching for javascript regex.
Figure 6.4. Input field filter for device configurations

Displays all device configuration entries beginning with base, even if they are not selected.
Another auxiliary function for handling tables is the show and hide buttons on top of tables. With these buttons you can easily show or hide specific table columns.
To ease the display of longer lists and to avoid too much page scrolling, there is also a simple pagination function built into the software. With pagination we are able to limit the output on a page to a specific number of entries. Only the chosen number of entries will be displayed; the other entries, if there are any, will be divided onto separate pages which can be accessed via the page buttons.
Last but not least, we'd like to mention the column sort function. It can also be very useful to display only the desired data. Not all columns provide this sort function, but most of them do. The function is toggled by clicking on the column name. If the function is activated, there is a small triangle beside the column name pointing with its tip either upwards for ascending or downwards for descending sorting. If no triangle is visible, the sorting function is deactivated.
Sorting method is:
First numerical
Second alphabetical
Figure 6.7. Sort column

Activated ascending sorting marked with a small black triangle pointing upwards.
Sometimes it can be very necessary to undo previously applied changes, for example if you have a typo in a script, variable or wherever, or if someone else has applied some changes and you want to see the state before and after these changes.
The newest data and changes will be attached on top.
Our developers created a reversion function not only to display what changes were made, but also to go back in the change history to a desired state and drop the changes which were made afterwards. History reversion can be found in the top menu at Cluster → History.
The reversion function is based upon the central database (default is PostgreSQL), so in principle every change written into the database can be reversed. Normally there is a lot of different data stored in the database to ensure every component works fine, but it makes no sense to provide the reversion feature for all of this collected data. For normal users and administrators it is completely sufficient to revert changes which were done via the web front-end.
For example, three new users were added in User Management. As shown below, the system history lists all relevant database entries for each of them.
Now let's suppose there was a typo in one of these names, e.g. Lucy was changed to Luci. If we take a look into the system history under User, we get exactly this change displayed in a diff-like style.
If you are not satisfied with only displaying changes but really want to go back to an earlier version, there is the revert to this version button.
For example, someone changed directory paths in a script located at Base → Configurations and you would like to display those changes; simply navigate to the History tab of the Modification window to get a list containing all changes applied up to now.
The next step is to mark your desired version in the list. Now you can either apply the reversion by clicking on the Modify button or just switch to the script editor to check how the script looks after reverting.
Table 6.2. Color code for reversion

green | inserted character
red | deleted character
black | unchanged character
We can see all changes at a glance in the figure shown above.
After the installation of NOCTUA®, the user admin and the group admingrp already exist. This is the user you have to change the password for after the first login into your freshly installed system.
The user admin has all possible rights and permissions to add, modify and delete devices/groups etc. The user admin is also able to reconfigure the database and of course able to add or delete users.
If you want to set restrictions for some users or groups, for example for external staff, you have to create these new restricted users/groups with the following buttons:
To add a new group in user management, click the "create group" button, fill out the form and confirm your input by clicking the "Create" button.
The form is self-explanatory, but some input should be mentioned anyway:
Internal group ID
Set basic permissions to get access to selected devicegroup
Another, extended form can be shown by clicking the newly created group in the user/group tree:
A more complex permission system appears.
A similar structure and procedure applies to creating new users.
Here too, we must mention some fields:
Internal user ID
Is the superior group
Operating system group
Owns all rights and permissions like the admin does
The permission system is divided into several parts which cover certain functions. Some permissions depend on other permissions, in other words, chained permissions. The more permissions users get, the more powerfully they can act. The user "admin" or "superuser" is the most powerful user. Admin has all possible rights and permissions.
Below is a list with permissions and what their functions are.
background_job
Shows additional menu button:
Session Background Job Info
config
Shows additional menu button:
Base
Configurations
device
Shows the graphs tab for selected devices. Depends on the possibility to choose devices (access all devices)
Shows the disk tab for selected devices. Depends on the possibility to choose devices (access all devices)
Change basic settings (General) for selected devices. Depends on the possibility to choose devices (access all devices)
Shows new top-menu named Cluster
Shows the Config tab for selected devices. Depends on the possibility to choose devices (access all devices)
Shows the Category tab for selected devices. Depends on the possibility to choose devices (access all devices)
Shows new top-menu:
Base
Device connections
Shows the Location tab for selected devices. Depends on the possibility to choose devices (access all devices)
Shows 3 new tabs for selected devices:
Livestatus
Monconfig
MonHint
Shows new top menu content:
Base
device network. Depends on the possibility to choose devices (access all devices)
Shows the vars tab for selected devices and a new top menu:
Base
Device variables.
Depends on the possibility to choose devices (access all devices)
The main permission to show devices. Most of the above permissions depend on it. Shows existing devices in the device tree on the left.
group
...
image
...
kernel
...
mon_check_command
Shows new top menu content under:
Monitoring
Basic Setup / Build Info
network
...
...
package
Shows new top menu under:
Cluster
Package install. Additional software packages can be chosen and installed via this menu button.
partition_fs
...
user
Shows new top menu content under:
Session
Admin
...
Shows new top menu content under:
Base
Category tree
Shows 2 new top menu contents under:
Base
Create new device / Device tree
Shows new top menu content under
Base
Domain name tree
...
The permission level defines what users can do. In combination with the permission itself, administrators are more flexible in assigning rights and permissions to users or groups.
Below are the 4 main permission levels which can be assigned.
Permits the user to read data. User can't change, create or delete data.
Permits the user to change existing data. Includes read-only level.
Permits user to change and create new data. Deletion is not possible.
All Permissions are granted.
In this chapter we would like to explain how to add devices, also called hosts, into the system. This works in the same way in the context of monitoring as well as in the context of HPC management.
We already know the first one [5.2.2: " Sidebar device tree (2) "], placed in the sidebar on the left side of the web front-end. This device tree is mainly used for preselection and direct access to the info area of a single device.
The second and much more powerful device tree can be found at Base → Device tree. It provides an extended filter function, the possibility to modify and delete multiple devices at once and of course a function to add new devices and device groups. Besides this, there is also a function to enable or disable single devices or whole device groups.
This is the main tool for managing devices.
Our filter function allows you to find and select devices even if your choices are very picky.
Possible filters
all, displays both devices and groups
Devices, display only devices
Group, display only groups
all, displays both selected and unselected groups or devices
selected, displays only selected groups or devices
unselected, displays only unselected groups or devices
all, displays both enabled and disabled groups or devices
enabled, displays only enabled groups or devices
disabled, displays only disabled groups or devices
ignore, ignore filter option
SERVERNAME, displays only groups or devices linked to this Monitor Server
ignore, ignore filter option
SERVERNAME, displays only groups or devices linked to this Monitor Server
This input field allows you to filter devices and groups by name and by the special regular expressions described in Section 6.2.1, " Input field filter/regex ".
There are two selection buttons at the end of the filter input field: select shown and deselect all. These selection buttons are only related to this device tree and have no influence on the sidebar preselection.
On top of the device tree overview, right below the filter options, there are four buttons. The green buttons on the left side create new devices or new groups. The blue and red buttons on the right side handle deletion and modification of multiple groups or devices at once.
Below our creation, modification and deletion buttons there is a pagination row and some column filter buttons. Next there is a table displaying overall information for groups (highlighted with a light green color) and devices:
Device or groupname
Selection button
Group or device description text
Displays if device or group is enabled (yes) or disabled (---)
Device or group type
Top level node
Shows if sensor data is stored on disk
Shows if password is set or not
Shows monitoring master server for this device
Shows the boot master server if it exist
modify, modification of single device or group
delete, deletion of single device or group
create device, creation of single device
With the help of these four top buttons and their equivalents on the right side of the device table overview you are able to comfortably manage devices and groups.
Use these buttons to create new devices or new groups.
Table 8.1. Settings for device
Table 8.2. Settings for devicegroup

Section | Field | Description
---|---|---
Basic settings | Name | Name of the device group
Basic settings | Description | Device group description field
Additional settings | Domain tree node | Dropdown for domain tree node selection
Flags | Enabled | Enable or disable the whole group
There are two different ways to modify devices in the device tree. Either you can modify a single device by pushing the modify button beside it [see Figure 8.2, " Device tree overview "], or you can modify several devices at once by selecting all desired devices and left-clicking on modify selected on top.
Multiple modification of devices provides the possibility to change almost all device settings, just like single modification does. Only the name and the comment field of devices cannot be changed in multiple modify mode, because it would not make any sense.
Typical usage of multiple modification could be changing device groups, changing the monitoring server or just enabling or disabling a couple of devices at once.
Due to the fact that all configuration data is stored in tables of our central database, and that data objects of one single device or group can be referenced by one or more other objects, we have to be very careful when deleting objects.
A safe delete function is implemented to prevent database damage from deleting database objects which are referenced by other objects. This safe delete function works for two scenarios and provides additional options.
The first scenario deals with objects without any hard references. Lean back and enjoy deleting such devices or groups, because it's completely safe. This is the usual case.
The second scenario deals with objects with hard references. In this case you are also able to delete, but you have to specify actions for handling the references.
Delete dialog
If the checkbox is disabled, deletion will be done in the foreground, blocking the web front-end until the delete process is finished.
Delete asynchronously: if the checkbox is enabled, deletion will be done in the background and the web front-end stays unblocked. This option is recommended for the deletion of many devices.
Deletion of devices listed here is safe, i.e. deleting these objects does not affect any other objects in the database. Using the checkboxes, you can restrict the devices you actually would like to delete.
Each device with hard references will be displayed in a separate tab.
Table, shows the table name in which the reference occurs
Field, shows the table field name which references the object we want to delete
First level references and Second level references, displays the number of references. An additional show/hide button opens a detailed content list of the table.
In order to prevent unwanted delete cascades, user interaction is needed to select one of the following actions in the Action drop-down menu:
set reference to null, replace the reference by null, causing no deletions of referenced objects.
delete referenced objects, delete the referenced objects (only for first level references)
delete cascade on referenced objects, delete the referenced objects recursively, i.e. delete all objects referenced by these objects and all objects referenced by any reference.
Sometimes administrators would like to add devices without the need for complex configuration. For this case there is a simple and fast device adding dialog, focused only on the essential settings listed below.
The dialog for fast device creation can be found under Base → Create new device.
Input fields for easy and fast device addition
Fully qualified domain name. For example host.domain.com
Pick an existing group or create a new one
Field for the IP address; if an FQDN exists and the nameserver works, the IP address will be inserted automatically. A Resolve button appears beside the input field.
List of existing icons. Selected icon will be displayed inside of icinga.
Device is really routing capable, which means it is able to forward IP traffic
Connection to network topology central node, also called peer connection
Field for additional information
There are two basic network settings inside NOCTUA®. The first one is more general and relates to the monitoring system itself. It is placed in Base → Networks and is subdivided into three areas.
The second one is more specific and is directly related to devices. It resides in Base → Device networks and is also subdivided into three areas.
As mentioned above, the second network settings are related to devices.
This is the main place for device network settings and the place where devices, netdevices, IP addresses and peers (network topology central nodes) are assigned.
Basically, inside Device networks our structure is also a tree. For each device one or more netdevices can be assigned. The same is true for IP addresses: for each netdevice one or more IP addresses can be assigned. The only exception here is the peer connection. The following picture illustrates the tree structure inside the device network.
Figure 9.1. Typical tree structure

A single device with one or more netdevices, each assigned one or more IPs
Devices, also often called hosts, are real hardware components or VM devices integrated into the software either through [8: " Device Management "] or through [8.4: " Create new device "]. Each device is uniquely bound into the system with a unique name.
Name of the device like listed in [Chapter 8, Device Management ]
Number of assigned netdevices
Number of assigned IPs
Number of assigned peers
Name of used snmp schemes (detected and used by [Chapter 13, SNMP discovery ])
Infobutton for extended information
Create new button to create new netdevices, IPs or network topology connections (peers).
update network button to use the discovery server with SNMP or the host-monitor for automatic scanning and integration of found netdevices. Take a look into [Chapter 13, SNMP discovery ] for more information.
After using the Create new netdevice button, a modal window appears.
Table 9.1. Basic settings

Section | Field | Description
---|---|---
Basic settings | Devname* | Name of the netdevice
Basic settings | Description | Netdevice description text
Basic settings | Mtu* | Maximum Transmission Unit
Basic settings | Netdevice speed* | Speed settings for the netdevice, with or without check
Basic settings | Enabled | Enable or disable the netdevice
Basic settings | Is bridge | Defines the netdevice as a bridge. A disabled flag results in additional VLAN settings.
Routing settings | Cost | Prioritize network connections. The higher the value, the less priority a network connection has.
Routing settings | Network topology central node (peer) | Define the netdevice as network topology central node (important for proper monitoring)
Routing settings | Inter device routing |
VLAN settings (only available if "Is bridge" flag is disabled) | Master device |
VLAN settings | Vlan id |
Fields marked with * are required
Table 9.2. Hardware

Section | Field | Description
---|---|---
Basic settings | Driver |
Basic settings | Driver options |
ethtool settings (for cluster boot) | Ethtool autoneg* | Dropdown for on, off and default
ethtool settings | Ethtool duplex* | Dropdown for on, off and default
ethtool settings | Ethtool speed* | Dropdown for speed definition
MAC address settings | Macaddr | Input field for the MAC address
MAC address settings | Fake macaddr | Input field for the fake MAC address
MAC address settings | Force write DHCP address |
Fields marked with * are required
One or more netdevices can be linked to each device. As usual for a normal computer, there is one loopback device and one ethernet device, but in real networks there can be many more netdevices. Servers, switches or routers, for example, could have a large number of netdevices assigned.
On top of the netdevice overview there are show and hide buttons for some of the columns described below. Use these buttons to show or hide the columns which are or are not important for you.
Devicename
Like a primary key in the database context, idx is a unique id
Netdevice name
Number of assigned IP addresses
Number of assigned peers
Name of bridge device if one exists
The hardware MAC address.
Detected netdevice type
Maximum Transmission Unit
Netdevice link speed
Value to prioritise network connections; the higher the value, the less privileged the network connection is.
Displays enabled flags for netdevices
Shows the netdevice status if scanned via SNMP. Up means the device is administratively up and ready to work; up/down means the device is administratively up but not operational, e.g. because of a missing link.
Info button for extended information
modify button to modify netdevices
delete button to delete netdevices
Modification of an existing netdevice is as easy as creating a new one. Simply push the modify button and the modification window appears.
IP addresses can be assigned to netdevices by pushing the Create new button and selecting the dropdown IP Address. Below you can see a description table for the window that appears.
Table 9.3. IP Address

Section | Field | Description
---|---|---
Basic settings | Netdevice* | The IP address will be assigned to this netdevice
Basic settings | Ip* | Valid IP address
Basic settings | Network* | Dropdown with available networks
Basic settings | Domain tree node* | Dropdown with available domain tree nodes
Alias settings (will be written without node postfixes) | Alias |
Alias settings | Alias excl |
Fields marked with * are required
As shown in [Figure 9.1, " Typical tree structure "], the network topology connection (peer connection) is a very important network setting to get your system, and especially your monitoring, working.
Please look into [Section 9.2.2, " Network topology "] for detailed information.
Table 9.4. Create new network topology connection

Field | Description
---|---
Cost* | Prioritize network connections. The higher the value, the less priority a network connection has.
Source | The source netdevice
Network topology central node | This is the central node other netdevices can be connected to. Usually this will be a monitoring server or a cluster server.
Source spec | Additional field for source information or comments
Dest spec | Additional field for destination information or comments
Info | General info or comment field
Fields marked with * are required
The third list in the device network settings is reserved for the IP overview.
What we call peering is the possibility to link single hosts and devices in the network to central nodes. It is not only a possibility but also a precondition for connecting and linking devices with a monitoring server. As a result you get a structural network map called the network topology.
Select some devices in the sidebar, navigate to Base Device network and then into the network topology tab.
Control: The selected devices should be displayed by default. Use the left mouse button to move the topology view, use your mouse scroll wheel to zoom in or out. Pinning of devices is also possible with drag and drop. Pinned devices are marked red.
The following settings are available:
With this dropdown you are able to display more or less of your network facility. Selecting + 1 (next ring) displays your selected devices plus all devices which are one level above the selected ones. The same applies analogously to +2 and +3.
Redraws the topology view.
Toggles between livestatus view and pure view.
Displays only selected segments of livestatus. For details please look into the section livestatus filter options.
The pure view displays only hosts without any additional information. Because of the smaller data transfer it is faster than a view with livestatus, especially for wide networks.
In contrast to the pure view, the livestatus view provides much more useful information. It overlays the real livestatus of every device onto the network topology view.
Table of Contents
The primary purpose of NOCTUA® is to monitor network devices and devicegroups. Nearly every measurable value, such as space, speed, temperature, rpm, availability and much more, can be monitored, observed, recorded and evaluated.
Even applications and their stdout or stderr data can be integrated into the monitoring workflow by self-designed commands.
There are almost no limits as to which devices can be monitored. Typical devices are: file servers, clusters, web servers, switches, printers, routers, thin clients and even telephone systems.
"Basic monitoring" is what we call monitoring which does not require any additional software on the client side. This kind of monitoring can be done out of the box. The only thing administrators have to know is the network IP address of the client machine which should be monitored.
Typical checks for basic monitoring are:
http (check port 80)
https (check port 443)
ldap (check port 389)
ldaps (check port 636)
ping
ssh (check port 22)
All these checks can also be performed via the command line of a Linux box, as the examples below show.
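Because these basic checks only talk to well known ports, you can reproduce them manually with standard tools; the IP address below is just a placeholder:
# ping check
ping -c 4 192.168.1.50
# test whether the ssh (22) and https (443) ports accept connections
nc -zv 192.168.1.50 22
nc -zv 192.168.1.50 443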
A bunch of checks is available over the SNMP protocol. In contrast to the basic monitoring checks mentioned above, this kind of check needs a running SNMP daemon on the target to work. Much of the hardware out there, like servers, switches or routers, already provides SNMP information by default. For other machines which don't provide SNMP by default, it is usually possible to install an SNMP daemon afterwards.
Use one of the following commands to install the SNMP daemon package on the machine:
apt-get install snmpd
zypper install snmpd
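To verify that the SNMP daemon answers before adding the device to the monitoring, a quick manual query can be run from the server; the IP address and the community string public are only example values:
# walk the system subtree of the target device via SNMP v2c
snmpwalk -v 2c -c public 192.168.1.50 system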
The other way to get active monitoring data, and therefore check results, is the IPMI interface integrated into devices. NOCTUA® is also able to evaluate this kind of data.
Last but not least, devices can be monitored with our self-developed host-monitoring service, which is part of the icsw-client package. It provides not only common checks like ping or http checks but also special system monitoring checks like mailq, quota, free and much more.
If you would like to monitor system values like that, you have to install the icsw-client package on each client. To do so, use the following commands:
apt-get install icsw-client
zypper install icsw-client
Then start the host-monitoring service on the client side.
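Depending on your distribution and installation, the service can for example be started with one of the following commands (the exact service name is an assumption and may differ on your system):
# classic rc wrapper, analogous to rchost-monitoring restart
rchost-monitoring start
# or via the icsw service wrapper
icsw service start host-monitoring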
To be able to install the package you must already have set up the right repositories on your operating system. Please look into the Installation chapter for details.
After a successful installation of the host-monitoring package, you have to edit its config file:
/etc/sysconfig/host-monitoring.d/machvector.xml
The content of machvector.xml should look like this:
<mv_target target="192.168.1.232" send_every="30" enabled="1" port="8002" send_name="" full_info_every="10" immediate="1" send_id="1" sent="0"/>
Of course there can be more than one target line, each with a unique send_id.
The relevant parts are the target and the enabled parameter. The other parameters are just for fine tuning. Finally you have to restart the host-monitoring service with rchost-monitoring restart.
Parameter description
target: IP address of the monitoring server. Possible value is any valid IP address.
send_every: period of sent data
enabled: enables or disables the sending function
port: monitoring server port on which the server waits for the data stream
full_info_every: time period for sending the full information data
immediate: option to send data immediately or to cache data and send it later
send_id: ID number of the sent data
Monitoring configuration is built as a hierarchical tree structure. The root of every configuration is the catalog. The catalog contains configurations, which in turn contain variables, monitoring checks or scripts, all together or only some of them.
Catalogs can contain any number of configurations.
Variables let you override NOCTUA® specific settings or pass information into NOCTUA®. Monitoring configs are used to describe which checks should be performed against the devices that are associated with the config. The most powerful part of the configuration system are the scripts. They allow you to execute arbitrary Python code to generate files and directories on the fly. Several utility functions are already accessible:
do_fstab() do_etc_hosts() do_nets() do_routes() do_uuid()
The Python dictionary conf_dict is available as well. It contains configuration information like the node IP and other data.
To include an already existing file in the node config, use show_config_script.py to render the file content as Python code ready for inclusion:
show_config_script.py [FILENAME]
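As a purely illustrative example, a call to embed the local file /etc/exports (a hypothetical file name) could look like this; the generated Python code can then be pasted into the script part of your configuration:
show_config_script.py /etc/exports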
The highest layer in the configuration hierarchy is the catalog. By default there is only one catalog containing some basic configurations. Take a look into Base Configurations and choose the catalog tab. You will see a table with at least one entry and a modify button. The column #configs shows you how many configurations are stored in this catalog.
If you ever plan to collect your own configurations in one central place, this is your first-class choice. A download function for catalogs will be available in future versions. The create new catalog button opens a window with a few basic settings:
The catalog name
The name of the author
Place where the catalog will be saved
The second layer below the catalog layer is the configuration layer. All check commands, variables and scripts are stored in it. Typically a configuration consists of check commands, scripts and variables.
The configurations tab resides right beside the catalog tab. It displays an overview table with the configuration entries, filter tools and action buttons.
Configuration table
Allows filtering for configuration name, variables, scripts or check commands.
Creates a new empty configuration entry
Switches to other pages or defines how many entries are displayed
Buttons to create, modify or delete variables, scripts or check commands
Either download selected configurations or upload your own configuration files (in *.xml format)
Buttons like this one show you the number of entries and allow you to open the specific section for a more detailed view.
To bind checks or configurations to devices and groups we use Base Device configurations. We can select single configurations for devices or for whole device groups. The icons displayed for enabled configurations differ between devices and groups.
Configurations which cannot be assigned to groups due to server and system flags are marked with a cross on a red background. Assigned configurations are displayed as a check mark on a green background for devices and as a circle with a check mark inside it for groups.
The following list describes only server and system configurations:
Server and system configurations
If enabled on a server, the command cluster-server.py -c write_etc_hosts is able to write into /etc/hosts.
...
Enables the server to act as discovery_server (required for SNMP scan or disk scan)
Enables the server to act as image server, providing images to nodes.
Enables the server to act as kernel server, providing kernels to nodes
...
Enables the server to act as monitoring master server
Enables the server to act as monitoring worker server (For distributed monitoring)
...
Enables the server to act as package server (required for package install)
...
Enables the server to act as rms server (required for RMS)
...
...
...
...
...
...
Enables the virtual desktop feature on server side
Enables the virtual desktop feature on client side
To begin slowly, let's first do a basic example configuration. In this basic example we want to check the simple ping response of a host in the local network. With this monitoring information we can make assumptions about the network device itself or its surrounding network area.
Create a new device (connected to your monitoring server) and configure at least one network device for it, one IP address and one peer/network topology connection.
Select the new device from the device tree and navigate to the config tab.
Alternatively you can preselect one device in the sidebar device tree and go to Base Device configurations.
Enable the check_ping config to activate the check.
To make sure your configuration is applied you must rebuild your config database. Go to the top menu and click Monitoring rebuild config (cached, RC).
Now that we know how to create simple checks for single devices, let's do a more complex configuration with more than one device and more than one check.
For this we have to use devicegroups with defined check configs:
In the top menu navigate to Base Device tree.
Create a new devicegroup by pushing the create devicegroup button.
Create some new devices by navigating to Base Create new device and entering some domains into the Fully qualified device name field. The IP address should be resolved automatically; if not, try to push the Resolve button.
Choose your monitoring server as "Connect to" device.
Select the checkbox of the new devicegroup in the devicetree menu on the left side and push the home button or the green arrow use selection button.
In the Config tab click the minus sign below the check_ping topic to activate ping checks for the whole group. Now every device assigned to the group has the check_ping check activated.
Configurations with a red background and a visible cross inside a circle are locked and cannot be selected. Mostly this affects configs which it makes no sense to select.
One of the most helpful features in NOCTUA® is the livestatus. The livestatus view shows you the current status of devicegroups, devices and services in different output formats.
The classic view of livestatus is the graphical burst. The burst consists of multi-layer circles and their circle segments. Each segment stands for a specific group, device or service and displays its current state almost in realtime. The segment colour can be either green (okay), yellow (warning) or red (critical).
As in other places within NOCTUA®, there are handy filter options in livestatus as well. You can show or hide the different check states and livestatus views.
The filter option map only appears if location or image maps are defined, otherwise it is hidden.
The layers of the livestatus view follow this convention:
The further out the layer, the more exact the indicator. The innermost circle layer represents the whole system with all connected and configured checks.
The mouseover function additionally provides an information table beside the burst view. For each segment of the burst a corresponding information table is faded in.
Table 10.1. Description for mouseover information table

Field | Description
---|---
Device | Device name
Description | Description text for the device
Output | Output of the command or check
State | State of the check
State type | Either hard or soft state type
Check type | Type of check, either active or passive
Attempts | Number of check attempts, i.e. 1 of 5
Depending on the chosen segment there will be more or less detail in the table.
Figure 10.13. Mouseover information table
Information table displayed while mouseover on burst segment
With a left mouse click on a device/host segment, only the selected device and its services are displayed in the livestatus burst; all other devices are hidden.
To go back to the livestatus overview of all selected devices, left click on one of the inner burst layers.
If you don't like the graphical burst view, there is also a table or list view integrated into NOCTUA®. To hide the burst view, use the show/hide buttons within the filter options. With an active table filter button you should now see only the table view.
On top of the table view there are hide/unhide buttons for every column of the table. Use these buttons to show or hide the columns you would like to see.
There is also a simple pagination function to limit the table length and a filter input field to filter by strings in the node name, description and result columns.
And of course the main filter options mentioned in Livestatus filter can be applied too.
Besides the livestatus burst and table view, livestatus categories provide another powerful and handy arrangement function. Administrators can group configurations or check commands into self-created categories in the category tree.
This way it is very simple to group thematically related configurations or check commands. Grouped configurations/check commands can easily be displayed or hidden with the livestatus monitoring category tree.
Figure 10.17. Livestatus monitoring categories
Only the selected category and its related checks are displayed in the livestatus burst and table view.
Please look into Configurations categories section to find out how to set categories.
The livestatus burst is also part of the monitoring overview placed in the menu Monitoring
Monitoring Overview. The monitoring overview is the best choice if you want to get a summary of your host and service status.
There is a filter input field and an only selected button on top of the monitoring overview. Below that filter there is a simple pagination function to limit the number of devices displayed per page. The main area contains availability pie charts for hosts and services and additionally the livestatus view. It shows only the status for the last week, for yesterday and for now.
Pay attention to the different colour meanings for hosts and services, especially the orange and red colours.
With the mouseover function you are able to see the related values of the pie chart.
Figure 10.18. Monitoring overview
Monitoring overview for seven devices with displayed mouseover values for one of them
In contrast to the livestatus function, the status history provides output data for hosts and checks for past time periods. This can be very useful to generate availability reports and to check the status of hosts and services for a specific time period.
The "Status History" tab shows the device status information, which can be divided into three parts:
The summary about the reachability of the device for the specific period
The summary about the check status for the predetermined time range
Extensive information about the events and delivered messages of each check for the selected time cycle
To determine the time range, first select the time unit. There are 4 different values to define the time unit: day, week, month and year.
Once the time unit is set, you can select the specific period in the field on the right side: a day if the unit is set to day or week, a month for the monthly unit, a year for the unit "year".
If the time range is defined correctly, the message "Showing data from ... to ..." appears above the timeline.
Below the timeline the refreshed status burst of the device reachability for the specific period shows up.
For example, the weekly overview can look like this:
or the yearly overview:
Below the status burst there is the check status panel. The panel shows the percentage of events with the status "Ok", "Warning", "Critical", "Unknown", "Undetermined" and "Flapping".
A graphical representation of the check status for the specific timeline is presented in the last column of this table.
To get detailed information about the events, click the "arrow" symbol on the left side of the desired check. For the selection criteria there are 3 options:
"all events" - shows all events for the selected check during the predefined time period
"events with new messages" - shows the events with different messages (works similar to a "group by" option)
"state changing events" - shows only the events which inform about a state change of the check during the selected timeline
Select the appropriate option to get the requested information. Additionally, there is a "Filter" field for precise selection.
The NOCTUA® notification system is made up of seven parts. Each of them is responsible for a specific task and is located in a separate tab.
Periods
Notifications
Contacts
Service templates
Device templates
Host check commands
Contactgroups
The first tab in the notification view is the period tab. By default there is only one period defined; its name is always, it has no alias and it cannot be deleted. This period is valid for every day and every hour, which means it is always true.
Periods can be helpful to set exact notification times, for example only on weekends, holidays, day and night stages or working hours. It is also possible to notify different people at different times, for example during day or night shift operation where different administrators are responsible for monitoring.
Notifications are basically message templates.
By default there are four monitoring notification templates: SMS and MAIL notifications for services and for hosts.
New templates can be created with the green create new button on top. In the window that appears, most of the form fields are self-explanatory, like Name, Channel (either E-Mail or SMS) and Notification type. To get exact information about the hostname, host address, host output, host state, date and time in the received notification, you are able to insert a couple of variables into the forms. These variables will be replaced by the related values of the related hosts. The table below explains all possible variables.
Configuration table
Notification name
E-Mail or SMS
Host or service
Mail subject
Message content. The following variables can be used to generate the message content: $HOSTSTATE$, $NOTIFICATIONTYPE$, $INIT_CLUSTER_NAME$, $HOSTNAME$, $HOSTADDRESS$, $HOSTOUTPUT$, $LONGDATETIME$
Table 10.4. Description of available variables

Variable | Description
---|---
$HOSTSTATE$ | State of the host, can be UP, DOWN or UNREACHABLE
$HOSTNAME$ | Name of the host that notifies
$INIT_MONITOR_INFO$ | Icinga version that notifies
$NOTIFICATIONTYPE$ | Either PROBLEM, ...
$INIT_CLUSTER_NAME$ | Name of the cluster that notifies
$HOSTADDRESS$ | IP address of the host that notifies
$HOSTOUTPUT$ | Information about what went wrong
$LONGDATETIME$ | Date and time of the notification
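As a purely illustrative example, a host notification message body built from these variables could look like this; the variables are replaced by the values of the affected host when the notification is sent:
$NOTIFICATIONTYPE$: host $HOSTNAME$ ($HOSTADDRESS$) is $HOSTSTATE$
Output: $HOSTOUTPUT$
Cluster: $INIT_CLUSTER_NAME$
Time: $LONGDATETIME$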
The next part of the notification setup is the Contact area. Using contacts makes it easy to forward specific notifications to specific users. With contacts you are able to distribute notifications to different users.
Sometimes it's not a good idea to send all notifications to one single person. Instead you can create a second contact with a second user which should only get notifications about special services or only at specific times. This is why notifications in NOCTUA® are so powerful: the administrator or user gets a very powerful, easy to control and highly configurable tool.
The combination of periods, notification templates and contacts gives you a couple of options to set up your notification system.
To add other users to your notification contacts, and in case admin is the only existing user, you first have to create new users in Users
Overview.
Use the create new button in order to add new notification contacts.
Monitoring contacts window consists of three areas:
Three monitoring contact sections
Base data
Service settings
Host settings
Base data section
User who will get notifications
Notification template which will be used
Alias for notification
Service settings section
Period which will be used
Notifies user if monitored service is in recovery state.
Notifies user if monitored service is in critical state.
Notifies user if monitored service is in warning state.
Notifies user if monitored service is in unknown state.
Notifies user if monitored service is in flapping state.
Notifies user if monitored service is in planned downtime state.
Host settings section
Period which will be used for host
Notifies user if monitored host is in recovery state.
Notifies user if monitored host is down.
Notifies user if monitored host is unreachable.
Notifies user if monitored host is in flapping state.
Notifies user if monitored host is in planned downtime state.
All of the above mentioned states are icinga states. Please consult the icinga manual for further information about icinga states.
Service templates ease the use of notifications in conjunction with contact groups enormously. Once a service template is created, it can be used as a template inside the contactgroups tab or in other places (e.g. in configurations).
Possible settings for service templates are:
Base data
Name of the template
check
Notification period
Interval of checks in minutes
Number of checks until the state changes from SOFT to HARD
Notification
Notification period
Notification interval
Send notification in case of recovery state
Send notification in case of critical state
Send notification in case of warning state
Send notification in case of unknown state
Send notification in case of flapping state. For details please look into the flapping section.
Send notification in case of planned downtime.
Freshness settings
Limit for freshness (only if "Check freshness" is enabled)
Flap settings
Enables the flap detection mode
If state in percent of the last 20 checks falls below it, flapping stops (only if "Flap detection" is enabled)
If state in percent of the last 20 checks exceeds it, flapping starts (only if "Flap detection" is enabled)
Enables flap detection for state ok (only if "Flap detection" is enabled)
Enables flap detection for state warn (only if "Flap detection" is enabled)
Enables flap detection for state critical (only if "Flap detection" is enabled)
Enables flap detection for state unknown (only if "Flap detection" is enabled)
Like service templates, device templates ease the use of notifications.
Possible settings are:
Base data
Device template name
Use specific mon service template
check
Choose a given check command
Time period which will be used
Interval of checks in minutes
Number of checks until the state changes from SOFT to HARD
maximum number of attempts til
Notification
Notification period
Notification interval
Send notification in case of recovery state
Send notification in case the host is down
Send notification in case the host is unreachable
Send notification in case the host is flapping
Send notification if the host is in planned downtime state
Freshness settings
Limit for freshness (only if "Check freshness" is enabled)
Flap settings
Enables the flap detection mode
If state in percent of the last 20 checks falls below it, flapping stops (only if "Flap detection" is enabled)
If state in percent of the last 20 checks exceeds it, flapping starts (only if "Flap detection" is enabled)
Enables flap detection for host state up (only if "Flap detection" is enabled)
Enables flap detection for host state down (only if "Flap detection" is enabled)
Enables flap detection for host state unreachable (only if "Flap detection" is enabled)
These are the commands which check if hosts are in an up or a down state. By default four check commands are included.
check-host-alive
$USER2$ -m localhost ping $HOSTADDRESS$ 5 5.0
Check the host with ping command
check-host-alive2
$USER2$ -m $HOSTADDRESS$ version
Check the host with version command
check-host-down
$USER1$/check_dummy 2 down
Check if the host is down with a check_dummy command
check-host-ok
$USER1$/check_dummy 0 up
Check if the host is ok with a check_dummy command
Notifications also enable you to send messages to a group of contacts. Every existing contact (not an existing user!) can be marked as a member of a contact group. This way it is possible to notify not only one single person, but several persons at once.
For each member of the group, the individual settings are stored in the contacts tab itself. That means you can create a contact group consisting of many different contacts, each with its own individual settings.
Contact group name
Alias of contact group
Member of contact group
Specify device group to use in contact group
Specify service templates to use in contact group
Flapping is the repeated change of a host or service state during the last 21 checks. It is calculated with special algorithms; the unit of flapping is percent.
Not every state change has to be notified to an administrator. Often there are regular processes which include temporary service downtimes, timeouts and similar states. All this can happen over a smaller or bigger period of time. To make it possible to distinguish between normal flapping and flapping due to a host or service issue, there are built-in threshold values for flapping.
There are two threshold values which allow you to set up when flapping detection should start and when it should stop. As a rule of thumb you can use the following description:
Rule of thumb to set flapping thresholds
This is the lower value in percent, related to the last 21 checks (20 possible state changes). If the value falls below it, flapping stops.
This is the higher value in percent, related to the last 21 checks (20 possible state changes). If the measured value exceeds it, flapping starts.
For example: if the state changes 15 times during the last 20 possible state changes, you get about 75% state change (15/20*100%). Of course this is not the absolute truth, because the real calculation algorithm weights recent state changes more than older ones. But it is enough to get a feeling for how flapping in NOCTUA® works.
To find out how exactly flap detection works, please take a look into the well written icinga manual at
Figure 10.27. Smooth flapping notification settings
Bigger difference between high and low threshold to get smoother flapping notifications
In the figure above you can clearly see the function of the high and low threshold. To control your flapping notifications you have to choose a proper percent value and a proper difference between the high and low threshold. The smaller the difference between high and low threshold, the more accurate the detected flapping notifications are. Or in other words: the bigger the difference between high and low threshold, the smoother the detected flapping notifications are.
Figure 10.28. More accurate flapping notifications
Smaller difference between high and low threshold to get more accurate flapping notifications
To explain parameterized checks, first of all we have to understand checks themselves. Usually a check is a command, created in the monitoring web interface and executed by icinga. Some possible icinga commands are:
check_apt check_breeze check_by_ssh check_clamd check_cluster check_dhcp check_dig check_disk check_disk_smb check_dns check_dummy check_file_age check_flexlm check_ping ...
For every single command there are some special options. Below are some options of the check_ping command:
Options:
 -h, --help
    Print detailed help screen
 -V, --version
    Print version information
 --extra-opts=[section][@file]
    Read options from an ini file. See https://www.monitoring-plugins.org/doc/extra-opts.html for usage and examples.
 -4, --use-ipv4
    Use IPv4 connection
 -6, --use-ipv6
    Use IPv6 connection
 -H, --hostname=HOST
    host to ping
 -w, --warning=THRESHOLD
    warning threshold pair
 -c, --critical=THRESHOLD
    critical threshold pair
 -p, --packets=INTEGER
    number of ICMP ECHO packets to send (Default: 5)
 -L, --link
    show HTML in the plugin output (obsoleted by urlize)
 -t, --timeout=INTEGER
    Seconds before connection times out (default: 10)
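A typical manual invocation of check_ping could therefore look like the following; the host address and the threshold pairs (round trip time in milliseconds, packet loss in percent) are only example values:
# warn at 100 ms / 20% loss, go critical at 500 ms / 60% loss, send 5 packets
check_ping -H 192.168.1.50 -w 100.0,20% -c 500.0,60% -p 5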
Now that we know what checks really are, we can go ahead and explain parameterized checks.
In NOCTUA® there are two different methods to create checks (icinga commands).
Checks can be defined individually with fixed options and bound to specific devices. These checks are always specific; changing one option of the check means changing the whole check.
Checks can be defined globally as parameterized checks and bound to devices. These checks are not specific; to change one option of the check it is enough to change its parameter.
Example: there are 10 devices, 7 of them should be checked on port 80 and 3 of them on port 8080.
Solution with the fixed method:
You have to set up two different checks, one check with the port option set to 80 (-p 80) and one check with the port option set to 8080 (-p 8080).
Solution with the parameterized method:
You have to set up only one check with a parameterized option (-p $PORT_NUMBER). Now you are able to set the port parameter to any desired value without changing the check itself.
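As a sketch, the difference between the two methods could look like this; check_http is used here only as an example plugin, and the exact placeholder syntax for parameters in NOCTUA® may differ:
# fixed method: two separate check commands with hard coded ports
check_http -H $HOSTADDRESS$ -p 80
check_http -H $HOSTADDRESS$ -p 8080
# parameterized method: one check command, the port is set per device as a parameter
check_http -H $HOSTADDRESS$ -p $PORT_NUMBER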
Another example: for some reason we want to create checks with 5 different warning values.
Solution with the fixed method:
You have to set up five different checks with five different warning option values. If there are even 10 different values you have a lot to do, because you need to create 10 different checks.
Solution with the parameterized method:
You have to set up only one check with a parameterized warning option value and change the parameter for each of the five different warning values. If there are 10 different warning option values, you only have to change the warning option parameter for each device instead of recreating the check.
The main advantage of parameterized checks in contrast to fixed checks is a more flexible way of handling checks. Direct influence on the check options is also a benefit.
With parameterization it is possible to change check option values after creating the check. Additionally, it is faster to set option values than to set up whole checks, so your administration effort decreases.
The bigger and more complex a network is, the more efficient it is to use parameterized checks.
Another advantage of parameterization is the possibility to react faster in case of alterations to the established setup.
There are three types of configuration data that can be associated with a configuration.
Variables
Monitoring Config
Scripts
Another special feature of NOCTUA® is the ability to get partition data without any need to configure it. To get this feature running, the only thing you have to do is to install the discovery-server on the machine you want and activate it.
Once it is installed you must activate it in the device config like you do for RMS or Package-Install. (Click on device Config and on the blue arrow, then select "discovery_server".)
Now you can easily get partition data by pushing the fetch partition info button.
If you are familiar with icinga, it may be pleasant to read that NOCTUA® completely supports icinga. All devices/hosts and services inside NOCTUA® are available via icinga. As the monitoring backend and a very important part of our monitoring solution, we distribute icinga as part of every installation.
Table of Contents
Graphs are one of the most important tools for monitoring devices. They allow you to easily create graphs of collected data for different time ranges. You do not have to write a couple of config files or modify existing ones. All the configuration is done via the web front-end; lean back and keep an eye on your automatically generated graphs.
Below the graph itself there is a legend and a table with numeric values. It contains the following parts:
RRD-graph legend
Describes the color of lines or areas and corresponding data.
Physical unit of displayed values.
Minimum value of displayed graph
Average value of displayed graph
Maximum value of displayed graph
Last value in timelime of displayed graph
Total amount of displayed graph
RRD stands for Round Robin Database and is a specially designed database structure to collect data in a circular way. That means the database must be set up for the right amount of data that should be collected.
For that reason there are the following advantages and disadvantages:
No danger of overfilling the database
After some time data will be overwritten and can no longer be displayed at a higher resolution.
But the monitoring software takes care of these details, so you do not have to agonize over them.
Several services are involved in collecting data and drawing graphs in NOCTUA®. The figure below illustrates how they work together and how the data flows between them.
Dotted parts are still in progress but will be implemented into NOCTUA® very soon, for a better data flow distribution, less read/write access and therefore less load on the server.
Data transfer should only take place if rrd-graphs are requested.
To make rrd-graphing work, the rrd-grapher service and the collectd-init service must already be running.
Select one or more devices you want RRD graphs for and click either on the "house" button in the top menu or on the green "use selection" button below the top menu.
If you only need RRD graphs for one device, just click on the device name in the device tree view.
In both cases some new tabs will be displayed, one of them named Graphs.
RRD data is not necessarily collected for every device. To find out whether there are rrd-graphs for a device, look for a pencil logo beside the name of the device in the devicetree.
The rrd front-end follows the same structure as other parts of NOCTUA®. There are buttons, list selections, input fields and, if drawn, of course the graphs themselves.
There is also a tree on the left side, but this time not for devices but for monitored or collected data.
The size in pixels of the output graph. This size relates only to the graph, not to the legend. Keep this in mind if you want to insert graphs somewhere else.
Output graph size
420x200, 640x300, 800x350, 1024x400, 1280x450
Selection of which time range should be displayed. There are "last" and "current" selections.
Timerange
draw graphs of the last 24 hours from now ((now-24h) - now)
draw the whole last day (00:00 -23:59)
draw the whole current week (sunday -saturday)
draw the whole last week (sunday -saturday)
draw the whole current month
draw the whole last month
draw the whole current year (Jan - Dec)
draw the whole last year (Jan - Dec)
With the timeshift option you get a tool to overlay the current graph with the same graph shifted in time. For example, this is handy to compare current graphs with graphs drawn 1 week ago.
Timeshift
do not draw extra comparing graphs
draw one normal graph plus the same graph 1 hour later (dotted)
draw one normal graph plus the same graph 1 day later (dotted)
draw one normal graph plus the same graph 1 week later (dotted)
draw one normal graph plus the same graph 1 month (31 days) later (dotted)
draw one normal graph plus the same graph 1 year (365 days) later (dotted)
Show specific jobs
Set the current timeframe to now
Set the endtime of the graph
Hide empty graphs
Always include y-axis = 0 into graph
Merge RRD from controlling devices
harmonize ordinate, for direct comparison with other graphs
Draw one graph for all devices
By default, no IPMI input data of the controlling device is shown in the RRD tree. To display IPMI sources in the RRD tree, select the Merge RRD button and reload the page or push the green arrow button.
Ping data belongs to IPMI and will also be shown in the RRD tree after selecting the Merge RRD button.
Apart from typing the start and end time of the graph into the input field or picking them from the calendar, you can also select the time area directly in the graph itself. To zoom into the desired time area, simply move your mouse over the graph; the mouse arrow changes to a cross hair and you are able to draw a rectangle over the graph. At the same time the area outside the selection gets darker. After releasing the mouse button the area can be moved around or resized.
Push the apply button to zoom into the selected area or use the Esc key to abort the selection.
The rrd tree is, similar to the device tree, an overview. The difference is that a parent object can consist of one or more child objects. For instance, the parent object mem contains 4 child objects: avail, free, icsw and used.
For parent groups it makes sense to summarize some graphs. This kind of summarization is called aggregation in NOCTUA®. The best way to get group information or an overview of a cluster is to use aggregation. Aggregation is more complex than a simple addition of data. It manages interferences and calculates values in the most effective way to display sums as realistically as possible.
The above graph shows the 15 minute single load value of 3 devices (pink, purple and green) and a combined sum graph (brown) of all device graphs.
The compound view can be found on top of the monitoring data tree. It combines several data series (for example load, cpu, processes, memory and io) in one multigraph, so there is no need to select n graphs.
Another advantage of compound graphs is stacking. Some graphs are more significant if the values are displayed stacked.
The above figure explains stacking in the context of memory graphing.
In this example you can see straight away the parts of memory usage in relation to the available memory.
Table of Contents
One of the most interesting questions admins wonder about is where monitored devices are located. On the one hand, location means the real physical position of devices.
On the other hand, location can be a structural location representing the network infrastructure in the context of functionality, not in the context of realistic physical locations or network connections.
No matter whether structural or physical, both kinds of locations have to be configured the same way.
To add new device locations, first of all we must create a new entry in the category tree. For this step you can, but do not have to, select any device beforehand.
Navigate to Base Category tree and choose the Categories tab.
Left click on the create new button; a new window appears below. Enter a new category name and choose location as the parent category.
For advanced settings of the newly created category entry, left click the category in the category tree or push the modify button beside it.
Advanced location settings
Name of category tree entry and its parent category
Coordinates for defined google map points
Checkbox to lock google map points in place
Checkbox to define location as physical one
If we go back and choose the Google maps tab, we notice a red flag on the google map; also two new buttons, an icon and the category name have appeared beside the map.
The blue locate button zooms the map in. With the green add location gfx button you are able to upload user image maps in two steps:
Define a location graphic name
Once you have named your new location graphic, a new modify button appears. Use the button to upload user images.
Modify the added graphic entry to upload a user image
Of course you can add more than just one user image, so you can create a stepwise zoom from the google map down to detailed server room photographs.
Figure 12.5. Concepts of zoom levels with multiple image maps
Zoom levels with according user image maps
The web front-end also allows you to edit uploaded images with the preview and enhance button.
The following self-explanatory buttons are accessible if you want to edit your uploaded image for quality reasons.
Following editing buttons are integrated:
left/right rotation (rotates image 90° clockwise or counter clockwise)
increase/decrease image brightness
sharpen/unsharpen image
Filter (includes a bunch of predefined filters)
undo (undo last editing action)
restore original image
With localisation it is not only possible to display and locate the exact position of devices at different zoom levels, but also the status of monitored devices. That way you can get the best possible overview of your server room, for example.
Once you have created new location categories and added some photos or images, you can easily add the device livestatus to them.
Select all devices you wish to add and click either the home button or the use selection button.
Navigate to the Location tab, select the checkbox and left click on the location category. A show location map button appears on the right side with some information about the image and a small preview of it. Push the button to show the image map.
Now you can place your livestatus burst at the right place on the image by clicking on the set button.
After placing the livestatus burst, left click on the lock button to prevent it from moving.
Use the remove button to remove the livestatus burst from the image.
Table of Contents
The Simple Network Management Protocol is an official RFC internet standard protocol which is designed to handle variables of devices like switches, routers, servers, workstations, printers, bridges, hubs and more.
The variables contain hardware information and the configuration of devices and can be picked up manually by special SNMP commands like snmpwalk. NOCTUA® implements SNMP as an "autodiscovery" service, capable of scanning network devices and getting as much information about them as possible.
In the context of monitoring, SNMP can deliver a huge amount of information about devices. Unfortunately there are some differences between the implementations of the various hardware vendors; as a result it is very difficult to extract useful and realistic data out of the SNMP stack.
For this reason NOCTUA® uses some intelligent algorithms and filters to avoid the insertion of faulty data into the database.
To get SNMP data from devices, the target devices are first of all required to provide such SNMP data. Most hardware in the network segment like switches, routers, servers, printers, etc. provides SNMP by default.
For operating systems like Windows or SUSE/RedHat machines, there are SNMP daemons which first have to be started before they provide SNMP data.
Please read your operating system documentation or contact your administrator to find out how to activate the SNMP daemon on your machines.
To activate SNMP discovery for one device, simply select the checkbox Enable perfdata, check IPMI and SNMP. To get this checkbox, either select your device and left click the home icon on top, or double click the device.
To reach the SNMP scan, go to Base Device network.
There are no SNMP schemes in the settings window yet. Now perform an SNMP scan with a left click on the orange update network button.
An SNMP settings window appears, where you are able to adjust some basic settings.
Settings
The IP address of the device, a valid domain name or a valid host name.
SNMP community setting, either public or private
Number of the SNMP version, either 1 or 2
If this flag is set, previously created configuration which the SNMP auto discovery scan cannot read out will be deleted.
Depending on your network size and structure, it takes some time to get the complete SNMP data tree, apply the filters and algorithms to it and write the extracted data into the database.
After performing the SNMP scan, you will get some new network config entries for the scanned device.
That way you automatically get a couple of netdevices with their names, values, MAC addresses, MTU values, speed, etc. without having to invest much time or manpower. A very handy and timesaving tool for administrators.
As an alternative to the SNMP scan there is a tab for scanning with host-monitoring. This scan only works if the host-monitoring service is installed and running on the host machine. You can find more information about host-monitoring in [Section 10.1.3, “ Advanced Monitoring with host-monitoring ”].
Figure 13.6. host-monitoring scan settings
host-monitoring scan settings
The flag all netdevices must be recognizable and all existing peers must be conserved ensures that existing peers (network topology connections) are not deleted after the scan.
Table of Contents
To obtain information about the general status of your server, use icsw service status.
To show the last errors from the logfile you can use lse.
lse [-l <error number>]
For more information type lse --help.
Example 14.2. Using lse to display the last error
clusterserver:~ #
lse -l 1
Found 40 error records Error 40 occured yesterday, 17:12:47, pid 11507, uid/gid is (30/8 [wwwrun/www]), source init.at.cluster.srv_routing, 72 lines: 0 (err) : IOS_type : error 1 (err) : args : None 2 (err) : created : 1409152367.94 3 (err) : exc_info : None 4 (err) : exc_text : None 5 (err) : filename : routing.py 6 (err) : funcName : _build_resolv_dict 7 (err) : gid : 8 8 (err) : levelname : err 9 (err) : levelno : 40 10 (err) : lineno : 179 11 (err) : message : device 'METADEV_server_group' (srv_type grapher) has an illegal device_type MD
Retrieving node information in an automated fashion is often useful when hunting down errors and bugs. To retrieve information about the nodes use collclient.py.
collclient.py [--host <nodename>] [command]
For more information execute collclient.py --help.
The server provides its own logging service. As usual in *NIX environments there are special directories the logfiles are written into. Access to these log files is given by the command lse. Of course it is also possible to read the logfiles directly with your favorite editor.
In case something goes wrong, the logging-server writes its logs under /var/log/cluster/logging-server/[HOSTNAME]/.
The naming of the log files and subdirectories is related to the service which writes the log. For example, if the meta-server cannot start some service, it will write its log into the corresponding directory.
If you want to see background information about the package installation on some nodes, the file you have to check is package-client. The same is true for the server side; this time the filename is package-server.
Files called *.bz2 are compressed logging backup files.
Critical error logs will also be delivered by mail. So you do not have to check your logs permanently; you will be notified by mail about critical errors.
The setting for the recipient of error log mails is stored in /etc/sysconfig/logging-server.
Another configuration file for mail notification is /etc/sysconfig/meta-server.
Replace the given mail address in the line containing TO_ADDR= with your desired mail address.
By uncommenting and editing the line beginning with #FROM_ADDR= you are able to set the sender "From" name of the received emails.
# from name and addr
FROM_NAME=pythonerror
#FROM_ADDR=localhost.localdomain
# to addr
TO_ADDR=mymail@gmail.com
# mailserver
MAILSERVER=localhost
After editing the logging-server configuration file, the logging-server daemon must be restarted:
icsw service restart logging-server
The new configuration takes effect after the logging-server daemon has been restarted.
A very handy command to read out logfiles is icsw logwatch. Logwatch makes it possible to display logs for different services and daemons at once. Even if you don't know which file the logs are written to, you are able to watch them. That's the reason why logwatch is an all-round tool for logging.
As usual for log files they have a typical output format.
Table 14.1. Logwatch columns
Column number | Column name | Example |
---|---|---|
1 | Date and time | Thu Apr 09 17:58:41 2015 |
2 | Device | 2_5_branch |
3 | System (logging daemon) | /collectd-init |
4 | Node | /--- |
5 | Loglevel | warn |
6 | Processname,processid | MainThread.19137 |
7 | Logmessage | sending 733 bytes to vector_socket |
Without any parameter, icsw logwatch displays the last 400 lines of logging messages for all services writing logfiles. With icsw logwatch -n 20 you can limit the output to the last 20 lines.
A very useful parameter for icsw logwatch is --system-filter. This flag restricts the output to one single daemon (service), e.g.
icsw logwatch --system-filter rrd
displays only log messages related to an rrd (daemon) service.
With the -f flag it is possible to view logs in realtime. Use
icsw logwatch [-f] [--machine MACHINE] [-n N] [--system-filter rrd]
to output appended data as the file grows. Try icsw logwatch --help to list all possible options.
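Combining these options, a typical call to follow only the rrd related logs of one machine in realtime could look like this; node01 is only a placeholder for a machine name:
icsw logwatch -f -n 50 --system-filter rrd --machine node01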
In case of a malfunction it is very likely that a port number is written into a logfile or appears in the web front-end. To find out which service or process causes the error, we have to know which service communicates on which port. The following table shows a little summary of common services and their communication ports.
Table 14.2. Portnumber and services
Service | Port |
---|---|
md-config-server | 8010 |
rrd-grapher | 8003, 8003 |
logging-server | 8011 |
meta-server | 8012 |
discovery-server | 8006 |
cluster-server | 8004 |
Table of Contents
One of the main advantages in contrast to proprietary software is the ability to extend or adapt the functionality to user-defined targets.
There are some documented APIs which allow you to customise and optimise the workflow and the integration into your company's facility.
Table of Contents
This is a collection of frequently asked questions.
16.1.1. Bad looking font in RRD graph
16.1.1.1. Why do my fonts look so ugly?
Ugly looking fonts are caused by a wrong font setup.
If you get something like in the picture above, you have to install the fetchmsttfonts package (openSUSE) or the ... package (Debian).
16.1.2. Server Error (500)
16.1.2.1. Why do I get a Server Error (500)?
This is an internal server error; most likely the server can't find some files. Take a look into /var/log/nginx/error.log for a detailed error message. The lse command can also be helpful.
16.1.3. Unable to connect to the web front end
16.1.3.1. Why can I not connect to the web front end?
For some reason the webserver nginx is not running. Start it manually, for example with "icsw service nginx start".
16.1.4. An error occurred
16.1.4.1. Why do I get the message "An error occurred"?
Please wait a moment until the database connection is active and reload the page. If you still get this message after waiting for a while, you have to start uwsgi-init, for example with "service uwsgi-init start". With top you can display a job list. If there is something like yuglify in the top rows, wait until it disappears. After that, and after reloading the page, the error message should disappear.
16.1.5. Configurations seem to be ignored
16.1.5.1. I changed my configuration but it seems to be ignored.
For some changes in your configuration you have to rebuild config (cached, RC) first. If your config is stored in the cache, you even have to rebuild config (refresh).
16.1.6. An error occurred
16.1.6.1. Why does my discovery not work?
Most likely the discovery-server service is not running. Make sure the discovery-server is installed and running. Run the icsw service status command and look for "discovery-server". If it is not running, start it either on the command line with icsw service start discovery-server or via the web front-end in the top menu under server information.
Another possible reason for this malfunction could be a disabled discovery server config on your monitoring server. To enable it, select your monitoring server device, navigate to the config tab and select the discovery server config. After that you have to wait some time or refresh the memcached by ...
16.1.7. Slow network topology graph
16.1.7.1. Why is my network topology graph so slow?
Sometimes a complex network topology slows down the display output in Firefox. This issue affects Firefox up to version 31.0. The reason is likely bad javascript interpretation on the Firefox side. If you get bad graphic display performance, try to use another browser, e.g. Chromium or Google Chrome™.
16.1.8. Lost password
16.1.8.1. I lost my password, how can I get a new one?
A short guide how to reset a login password by direct access to the database via clustershell follows:
From then on you are able to login with your new password.
16.1.9. "Please wait..." after add location gfx
16.1.9.1. What if the "Please wait..." message is shown for a longer time?
If you have to wait a long time for a pending upload and the info label "Please wait..." is still shown after uploading an image with the add location gfx button, reload the page to resolve this issue.
16.1.10. Weird mouse events on virtual desktop
16.1.10.1. The mouse pointer position is wrong, what can I do to resolve this?
Some vnc servers tend to break correct mouse pointer handling in the virtual desktop. To get the correct mouse pointer back, log out of your session and log in again.
16.1.11. Asynchronous graphs
16.1.11.1. Why are my rrd graphs asynchronous?
If you get wrong graphs, for example shifted 1 hour into the past or 1 hour into the future like the pink graph line below, make sure the correct timezone and time are set on the affected machines.
16.1.12. I have no permissions for icinga
16.1.12.1. How can I get the right permissions to access the icinga view?
To get the right permissions for icinga, you have to have at least one contact in Monitoring. Rebuild your icinga config to apply the new contact entry.
Figure 16.4. No permission to icinga
Without at least one contact you are not able to use the icinga view.
16.1.13. Can not reach any network devices
16.1.13.1. Why can I not reach any network device?
16.1.14. Unable to delete group from device tree
16.1.14.1. How can I delete groups?
There is no delete button visible for preselected device groups in the device tree. The reason for this behavior is that there are disabled devices in this group. First delete these disabled devices; after that you are also able to delete the group.
16.1.15. No "IP address" dropdown in device network
16.1.15.1. Why is there no "IP address" dropdown in device network?
There is no "IP address" dropdown button visible in device network until at least one network is defined in Base
16.1.16. Could not connect to server: Connection refused
16.1.16.1. I get a server error on port 5432, how can I resolve the problem?
After running the icsw service status script, a python error mentioning port 5432 occurs. Generally, if you get error messages with the port number 5432, the reason is most likely that your postgres server (which listens on port 5432 by default) is down.
[...] django.db.utils.OperationalError: could not connect to server: Connection refused Is the server running on host "localhost" (::1) and accepting TCP/IP connections on port 5432?
To check whether your postgres database server is running, type one of the following commands, depending on your OS:
rcpostgres status
service postgresql status
systemctl status postgresql.service
Replace status with start to start the database server. Make sure to start the postgres server at boot time by enabling your operating system start scripts, so the service also starts after rebooting the system.
16.1.17. Internal Server Error
16.1.17.1. I get an internal server error in the web front end, how can I resolve the problem?
After server installation and database setup it can happen that you get an Internal Server Error. Try restarting your uwsgi-init service with this command: rcuwsgi-init restart. The restart of the uwsgi service results in a running yuglify process which generates all static files. After all static files have been generated, you should get access to your web front-end.
16.1.18. License warning appears on every page
16.1.18.1. I get a license warning message on every single page, how can I resolve this?
A license warning message appears on the top right side each time a new page is loaded. This warning means that one of your licenses is either out of date and therefore in grace time, or that the number of devices/services/users used with this license exceeds its limitation. In both cases the license is violated and from that moment the grace period begins to run. The grace period for licenses is 2 weeks. During the grace period the functionality of the software stays as usual, but you will see this license violation warning on each newly loaded page.
You have to get a new license or expand your existing one to avoid the license violation message. Please contact us by mail. Another method to get back within a valid license limitation is to lock licenses for some devices.
16.1.19. Reverse domain tree node order
16.1.19.1. How can I reverse the domain tree node order?
Suppose you have a couple of devices with the following domain tree node order: The domain tree node structure for the above device domains looks like this: Now, for example, you want to reverse the order of the device domains.
The first step is to change the domain name tree itself. Navigate to Base. Do the same with the other entries until your tree looks like this:
All you have to do now is to change the domain tree node of your devices. Select the desired devices in the device tree sidebar on the left side and navigate to Base. The result should look like this:
Django is a free and open source web application framework, written in Python
Domain Tree Node, the tree structure of fully qualified domain names.
Fully qualified domain name, for example noctua.init.at
Free software license named GNU General Public License, more Information at [http://www.gnu.org/licenses/]
Industry standard software for monitoring devices.
Intelligent Platform Management Interface
A computer network or data network is a telecommunications network which allows computers to exchange data.
Is the central node other devices are connected to
Small and fast http server similar to apache
Often titled OSS; software with available source code. A popular open source software license is the GPL
Some option values of this check (command) can be accessed by parameter.
Preboot Execution Environment
RRDtool is the OpenSource industry standard, high performance data logging and graphing system for time series data.
The "Son of Grid Engine", community project of Sun Grid Engine. [https://arc.liv.ac.uk/trac/SGE]
Top Level Node, is the top level domain the node belongs to.
Virtual Machine, general name for virtual systems running completely in a host environment like KVM or similar.