[comment]: # ({new-30385a91})
# 2 Preprocessing details

[comment]: # ({/new-30385a91})

[comment]: # ({67b39eda-67b39eda})
#### Overview

This section provides item value preprocessing details. Item value
preprocessing allows you to define and execute [transformation
rules](/manual/config/items/preprocessing) for the received item values.

Preprocessing is managed by a preprocessing manager process, which was
added in Zabbix 3.4, along with preprocessing workers that perform the
preprocessing steps. All values (with or without preprocessing) from
different data gatherers pass through the preprocessing manager before
being added to the history cache. Socket-based IPC is used between data
gatherers (pollers, trappers, etc.) and the preprocessing manager.
Preprocessing steps are performed either by Zabbix server or by Zabbix
proxy (for items monitored by the proxy).

[comment]: # ({/67b39eda-67b39eda})

[comment]: # ({7aff5db5-7aff5db5})
#### Item value processing

To visualize the data flow from data source to the Zabbix database, we
can use the following simplified diagram:

![](../../../../../assets/en/manual/appendix/items/overall_pic.png)

The diagram above shows only processes, objects and actions related to
item value processing in a **simplified** form. The diagram does not
show conditional direction changes, error handling or loops. The local
data cache of the preprocessing manager is not shown either, because it
does not affect the data flow directly. The aim of this diagram is to
show the processes involved in item value processing and the way they
interact.

-   Data gathering starts with raw data from a data source. At this
    point, the data contains only an ID, a timestamp and a value (or
    multiple values).
-   The idea is the same regardless of the type of data gatherer
    (active or passive checks, trapper items, etc.); only the data
    format and the side that starts the communication differ (the data
    gatherer either waits for a connection and data, or initiates the
    communication and requests the data). Raw data is validated and the
    item configuration is retrieved from the configuration cache (the
    data is enriched with the configuration data).
-   A socket-based IPC mechanism is used to pass data from data
    gatherers to the preprocessing manager. At this point the data
    gatherer continues to gather data without waiting for a response
    from the preprocessing manager.
-   Data preprocessing is performed. This includes execution of
    preprocessing steps and dependent item processing.

::: noteclassic
An item can change its state to NOT SUPPORTED while
preprocessing is performed if any of the preprocessing steps
fails.
:::

-   History data from the local data cache of the preprocessing manager
    is flushed into the history cache.
-   At this point the data flow stops until the next synchronization of
    the history cache (when a history syncer process performs data
    synchronization).
-   The synchronization process starts with data normalization, after
    which the data is stored in the Zabbix database. Data normalization
    converts the value to the desired item type (the type defined in
    item configuration), including truncation of textual data based on
    the predefined sizes allowed for those types
    (HISTORY\_STR\_VALUE\_LEN for string, HISTORY\_TEXT\_VALUE\_LEN for
    text and HISTORY\_LOG\_VALUE\_LEN for log values). The data is sent
    to the Zabbix database after normalization is done.

::: noteclassic
An item can change its state to NOT SUPPORTED if data
normalization fails (for example, when a textual value cannot be
converted to a number).
:::

-   Gathered data is processed: triggers are checked, item
    configuration is updated if the item becomes NOT SUPPORTED, etc.
-   This is considered the end of the data flow from the point of view
    of item value processing.
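
The normalization step described above can be illustrated with a minimal Python sketch. It is not the Zabbix implementation: the `normalize` function, the lowercase type names and the numeric limits are assumptions made for illustration only (the actual limits are compile-time constants in the Zabbix sources).

```python
# Assumed limits for illustration; check the Zabbix sources for real values.
HISTORY_STR_VALUE_LEN = 255
HISTORY_TEXT_VALUE_LEN = 65535
HISTORY_LOG_VALUE_LEN = 65535

LIMITS = {
    "str": HISTORY_STR_VALUE_LEN,
    "text": HISTORY_TEXT_VALUE_LEN,
    "log": HISTORY_LOG_VALUE_LEN,
}


def normalize(raw, value_type):
    """Convert a raw value to the item's value type (hypothetical helper).

    Returns (value, error). A non-empty error means the item would
    become NOT SUPPORTED in this sketch.
    """
    if value_type in LIMITS:
        # Textual types never fail here; they are truncated instead.
        return str(raw)[:LIMITS[value_type]], None
    try:
        if value_type == "uint":
            number = int(str(raw).strip())
            if number < 0:
                raise ValueError("negative value")
            return number, None
        if value_type == "float":
            return float(raw), None
    except ValueError:
        return None, f"cannot convert {raw!r} to {value_type}"
    raise ValueError(f"unknown value type: {value_type}")


value, error = normalize("abc", "uint")
print(value, error)  # the conversion fails, so an error is reported
```

Note that textual types are only truncated, while numeric conversion failures are the case that leads to the NOT SUPPORTED state.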

[comment]: # ({/7aff5db5-7aff5db5})

[comment]: # ({44da8577-44da8577})
#### Item value preprocessing

To visualize the data preprocessing process, we can use the following
simplified diagram:

![](../../../../../assets/en/manual/appendix/items/preprocessing_simplified.png)

The diagram above shows only processes, objects and main actions related
to item value preprocessing in a **simplified** form. The diagram does
not show conditional direction changes, error handling or loops. Only
one preprocessing worker is shown on this diagram (multiple
preprocessing workers can be used in real-life scenarios), only one item
value is being processed, and we assume that this item requires at least
one preprocessing step to be executed. The aim of this diagram is to
show the idea behind the item value preprocessing pipeline.

-   Item data and the item value are passed to the preprocessing manager
    using a socket-based IPC mechanism.
-   The item is placed in the preprocessing queue.

::: noteclassic
An item can be placed at the end or at the beginning of the
preprocessing queue. Zabbix internal items are always placed at the
beginning of the preprocessing queue, while other item types are
enqueued at the end.
:::

-   At this point the data flow stops until there is at least one
    unoccupied preprocessing worker (one that is not executing any
    task).
-   When a preprocessing worker is available, the preprocessing task is
    sent to it.
-   After preprocessing is done (whether the preprocessing steps failed
    or succeeded), the preprocessed value is passed back to the
    preprocessing manager.
-   The preprocessing manager converts the result to the desired format
    (defined by the item value type) and places the result in the
    preprocessing queue. If the current item has dependent items, they
    are added to the preprocessing queue as well. Dependent items are
    enqueued right after the master item, but only if the master item
    has a value set and is not in the NOT SUPPORTED state.
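
The queue placement rules above can be sketched in Python as follows. This is an illustrative model only: the function names and the use of plain strings as queue entries are assumptions, not the data structures used by the preprocessing manager.

```python
from collections import deque


def enqueue(queue, item, internal=False):
    """Internal items go to the front of the queue, all others to the end."""
    if internal:
        queue.appendleft(item)
    else:
        queue.append(item)


def enqueue_dependents(queue, master, dependents, value, not_supported):
    """Place dependent items right after their master item.

    Nothing is enqueued if the master has no value or is NOT SUPPORTED.
    """
    if value is None or not_supported:
        return
    position = queue.index(master) + 1
    for offset, dependent in enumerate(dependents):
        queue.insert(position + offset, dependent)


queue = deque()
enqueue(queue, "item A")
enqueue(queue, "master")
enqueue(queue, "item B")
enqueue(queue, "internal item", internal=True)
enqueue_dependents(queue, "master", ["dep 1", "dep 2"], "42", False)
print(list(queue))
```

In this model the dependent items end up between `master` and `item B`, and the internal item is ahead of everything else.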

[comment]: # ({/44da8577-44da8577})

[comment]: # ({4b6fc468-4b6fc468})
#### Preprocessing queue

The preprocessing queue is a FIFO data structure that stores values,
preserving the order in which they are received by the preprocessing
manager. There are two exceptions to the FIFO logic:

-   Internal items are enqueued at the beginning of the queue
-   Dependent items are always enqueued after the master item

To visualize the logic of preprocessing queue, we can use the following
diagram:

![](../../../../../assets/en/manual/appendix/items/queue_processing.gif)

Values from the preprocessing queue are flushed from the beginning of
the queue to the first unprocessed value. So, for example, preprocessing
manager will flush values 1, 2 and 3, but will not flush value 5 as
value 4 is not processed yet:

![](../../../../../assets/en/manual/appendix/items/queue_flush.png)

Only two values (4 and 5) will be left in the queue after flushing. The
flushed values are added to the local data cache of the preprocessing
manager and are then transferred from the local cache into the history
cache. The preprocessing manager can flush values from the local data
cache in single item mode or in bulk mode (used for dependent items and
values received in bulk).
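
The flushing rule (flush from the beginning of the queue up to the first unprocessed value) can be expressed as a short Python sketch. The `QueuedValue` class and the `flush` function are hypothetical names used only to model the behavior described above.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class QueuedValue:
    value_id: int
    processed: bool


def flush(queue):
    """Remove and return values from the head of the queue until the
    first value that has not been processed yet."""
    flushed = []
    while queue and queue[0].processed:
        flushed.append(queue.popleft())
    return flushed


# Values 1, 2, 3 and 5 are processed; value 4 is still being preprocessed.
queue = deque([
    QueuedValue(1, True),
    QueuedValue(2, True),
    QueuedValue(3, True),
    QueuedValue(4, False),
    QueuedValue(5, True),
])
flushed = flush(queue)
print([v.value_id for v in flushed])  # [1, 2, 3]
print([v.value_id for v in queue])    # [4, 5]
```

Value 5 stays in the queue even though it is already processed, which preserves the order in which values reach the history cache.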

[comment]: # ({/4b6fc468-4b6fc468})

[comment]: # ({new-c9d6405f})

#### Preprocessing caching

Preprocessing caching was introduced to improve the preprocessing performance for multiple dependent items having similar preprocessing steps (which is a common LLD outcome).

Caching is done by preprocessing one dependent item and reusing some of the internal preprocessing data for the rest of the dependent items. The preprocessing cache is supported only for the first preprocessing step of the following types:

*   Prometheus pattern (indexes input by metrics)
*   JSONPath (parses the data into an object tree and indexes the first expression `[?(@.path == "value")]`)
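
The idea behind the cache can be sketched in Python for the JSONPath case: the master value is parsed and indexed once, and each dependent item then performs a cheap lookup instead of parsing and scanning the whole document again. This is a conceptual illustration under the assumption that the master value is a JSON array of objects; `build_index`, the field name and the sample data are hypothetical and do not reflect the internal Zabbix implementation.

```python
import json
from collections import defaultdict


def build_index(raw_json, field):
    """Parse the master value once and index its objects by `field`.

    Roughly models what a first step like [?(@.field == "value")]
    needs: all objects whose `field` equals a given value.
    """
    index = defaultdict(list)
    for obj in json.loads(raw_json):
        if isinstance(obj, dict) and field in obj:
            index[obj[field]].append(obj)
    return dict(index)


master_value = json.dumps([
    {"name": "eth0", "rx": 10},
    {"name": "eth1", "rx": 20},
    {"name": "lo", "rx": 0},
])

# Built once for the first dependent item, reused for the others.
cache = build_index(master_value, "name")

for interface in ("eth0", "eth1", "lo"):
    print(interface, cache.get(interface, []))
```

Without such an index, every dependent item would repeat the parsing and filtering work on the same master value.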

[comment]: # ({/new-c9d6405f})

[comment]: # ({98589fcd-98589fcd})
#### Preprocessing workers

The Zabbix server configuration file allows users to set the number of
preprocessing worker processes. The StartPreprocessors configuration
parameter should be used to set the number of pre-forked instances of
preprocessing workers. The optimal number of preprocessing workers
depends on many factors, including the number of "preprocessable" items
(items that require at least one preprocessing step to be executed), the
number of data gathering processes, the average number of steps per
preprocessed item, etc.

However, assuming that there are no heavy preprocessing operations such
as parsing of large XML/JSON chunks, the number of preprocessing workers
can match the total number of data gatherers. This way, there will
mostly (except for the cases when data from a gatherer comes in bulk) be
at least one unoccupied preprocessing worker for collected data.
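
As a rough illustration of this rule of thumb, the following Python sketch sums the data gatherer counts that are explicitly set in a server configuration file. The `count_gatherers` helper and the sample configuration are hypothetical; parameters left at their defaults are not counted, so the result is only a starting point for choosing a StartPreprocessors value.

```python
# Data gatherer parameters considered by this sketch.
GATHERER_PARAMS = (
    "StartPollers",
    "StartPollersUnreachable",
    "StartODBCPollers",
    "StartHTTPPollers",
    "StartJavaPollers",
    "StartPingers",
    "StartTrappers",
    "StartProxyPollers",
)


def count_gatherers(config_text):
    """Sum the explicitly configured data gatherer process counts."""
    total = 0
    for line in config_text.splitlines():
        line = line.strip()
        # Skip empty lines, comments and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() in GATHERER_PARAMS:
            total += int(value.strip())
    return total


sample_config = """
# Example fragment, values are illustrative
StartPollers=10
StartTrappers=5
# StartPingers=3
StartPreprocessors=3
"""

print(count_gatherers(sample_config))  # 15
```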

::: notewarning
Too many data gathering processes (pollers,
unreachable pollers, ODBC pollers, HTTP pollers, Java pollers, pingers, trappers,
proxy pollers) together with the IPMI manager, SNMP trapper and preprocessing
workers can exhaust the per-process file descriptor limit for the
preprocessing manager. This will cause Zabbix server to stop (usually
shortly after the start, but sometimes it can take longer). To avoid
this situation, either revise the configuration file or raise the
limit.
:::

[comment]: # ({/98589fcd-98589fcd})

[comment]: # ({ad691011-ad691011})
##### Value processing pipeline

Item value processing is executed in multiple steps (or phases) by
multiple processes. This can cause the following situations:

-   A dependent item can receive values while the master item cannot.
    This can be reproduced with the following use case:
    -   The master item has value type `UINT` (a trapper item can be
        used); the dependent item has value type `TEXT`.
    -   Neither the master nor the dependent item has preprocessing
        steps.
    -   A textual value (for example, "abc") is passed to the master
        item.
    -   As there are no preprocessing steps to execute, the
        preprocessing manager checks that the master item is not in the
        NOT SUPPORTED state and that a value is set (both are true) and
        enqueues the dependent item with the same value as the master
        item.
    -   When both master and dependent items reach the history
        synchronization phase, the master item becomes NOT SUPPORTED
        because of the value conversion error (textual data cannot be
        converted to an unsigned integer).

As a result, dependent item receives a value, while master item changes
its state to NOT SUPPORTED.

-   A dependent item receives a value that is not present in the master
    item history. The use case is very similar to the previous one,
    except for the master item type. For example, if the `CHAR` type is
    used for the master item, the master item value will be truncated at
    the history synchronization phase, while dependent items will
    receive their value from the initial (not truncated) value of the
    master item.
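
Both situations follow from the fact that dependent items are enqueued before the master value is converted to its final type. The Python sketch below models the two phases under simplified assumptions: `preprocess`, `sync`, `run` and the 255-character limit for `CHAR` are illustrative names and values, not the actual Zabbix code.

```python
CHAR_LIMIT = 255  # assumed limit for illustration


def preprocess(master_value, master_not_supported, dependents):
    """Preprocessing phase with no steps configured: dependent items get
    a copy of the master value if it is set and the master is supported."""
    queue = [("master", master_value)]
    if master_value is not None and not master_not_supported:
        queue += [(name, master_value) for name in dependents]
    return queue


def sync(value, value_type):
    """History synchronization phase: convert to the item value type."""
    if value_type == "UINT":
        if value.isascii() and value.isdigit():
            return int(value), "NORMAL"
        return None, "NOT SUPPORTED"
    if value_type == "CHAR":
        return value[:CHAR_LIMIT], "NORMAL"
    return value, "NORMAL"  # TEXT is stored as is in this sketch


def run(master_value, types):
    dependents = [name for name in types if name != "master"]
    return {
        name: sync(value, types[name])
        for name, value in preprocess(master_value, False, dependents)
    }


# Case 1: the master fails conversion, the dependent item keeps "abc".
print(run("abc", {"master": "UINT", "dependent": "TEXT"}))

# Case 2: the master is truncated, the dependent item gets the full value.
result = run("x" * 300, {"master": "CHAR", "dependent": "TEXT"})
print(len(result["master"][0]), len(result["dependent"][0]))
```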

[comment]: # ({/ad691011-ad691011})
