IntelMQ

User guide

Introduction

About

IntelMQ is a solution for IT security teams (CERTs & CSIRTs, SOCs, abuse departments, etc.) for collecting and processing security feeds (such as log files) using a message queuing protocol. It’s a community-driven initiative called IHAP (Incident Handling Automation Project) which was conceptually designed by European CERTs/CSIRTs during several InfoSec events. Its main goal is to give incident responders an easy way to collect & process threat intelligence, thus improving the incident handling processes of CERTs.

Incident Handling Automation Project

Several pieces of software have evolved around IntelMQ. For an overview, look at the IntelMQ Ecosystem.

IntelMQ can be used for:

  • automated incident handling

  • situational awareness

  • automated notifications

  • as a data collector for other tools

  • etc.

IntelMQ’s design was influenced by AbuseHelper; however, it was rewritten from scratch and aims at:

  • Reducing the complexity of system administration

  • Reducing the complexity of writing new bots for new data feeds

  • Reducing the probability of losing events anywhere in the process, with persistence functionality (even in case of a system crash)

  • Use and improve the existing Data Harmonization Ontology

  • Use JSON format for all messages

  • Provide an easy way to store data into log collectors like ElasticSearch, Splunk or databases (such as PostgreSQL)

  • Provide an easy way to create your own black-lists

  • Provide easy communication with other systems via HTTP RESTful API

IntelMQ follows these basic meta-guidelines:

  • Don’t break simplicity - KISS

  • Keep it open source - forever

  • Strive for perfection while keeping a deadline

  • Reduce complexity/avoid feature bloat

  • Embrace unit testing

  • Code readability: test with inexperienced programmers

  • Communicate clearly

Usage

Various approaches to installing IntelMQ are described in Installation.

The Configuration and Management section gives an overview of how an IntelMQ installation is set up and how to configure and maintain it. There is also a list of available Feeds as well as a detailed description of the different Bots IntelMQ brings with it.

If you know additional feeds and how to parse them, please contribute your code or your configuration (by issues or the mailing lists).

For support questions please use the IntelMQ Users Mailinglist.

IntelMQ Manager

Check out this graphical tool to easily manage an IntelMQ system.

Contribute

Hardware Requirements

Do you ask yourself how much RAM you need to give your new IntelMQ virtual machine?

The honest answer is simple and pointless: It depends ;)

IntelMQ and the messaging queue (broker)

IntelMQ uses a messaging queue to move the messages between the bots. All bot instances can only process one message at a time, therefore all other messages need to wait in the queue. As not all bots are equally fast, the messages will naturally “queue up” before the slower ones. Further, parsers produce many events with just one message (the report) as input.

The following estimations assume Redis as messaging broker which is the default for IntelMQ. When RabbitMQ is used, the required resources will differ, and RabbitMQ can handle system overload and therefore a shortage of memory.

As Redis stores all data in memory, the data which is processed at any point in time must fit there, including overheads. Please note that IntelMQ neither stores nor caches any input data. These estimates therefore only relate to the processing step, not the storage.

For a minimal system, these requirements suffice:

  • 4 GB of RAM

  • 2 CPUs

  • 10 GB disk size

Depending on your data input, you will need about twenty times the input data size as memory for processing.

When using Redis persistence, you will additionally need twice as much memory for Redis.
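
As a rough worked example with illustrative numbers: if up to 200 MB of input data are processed at any point in time, plan for about 20 × 200 MB = 4 GB of memory, and roughly twice the Redis share on top of that if Redis persistence is enabled.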

Disk space

Disk space is only relevant if you save your data to a file, which is not recommended for production setups, and only useful for testing and evaluation.

Do not forget to rotate your logs or use syslog, especially if you use the logging level “DEBUG”. logrotate is in use by default for all installations with deb/rpm packages. When other means of installation are used (pip, manual), configure log rotation manually. See Logging.

Background on memory

For experimentation, we used multiple Shadowserver Poodle reports for demonstration purposes, totaling 120 MB of data. All numbers are estimates and rounded. In memory, the report data requires 160 MB. After parsing, the memory usage increases to 850 MB in total, as every data line is stored as JSON, with additional information plus the original data encoded in Base64. The further processing steps depend on the configuration, but you can estimate that caches (for lookups and deduplication) and other added information cause an additional size increase of about 2x. Once a dataset has finished processing in IntelMQ, it is no longer stored in memory. Therefore, the memory is only needed to catch high load.

The above numbers result in a factor of 14 for input data size vs. memory required by Redis. Assuming some overhead and memory for the bots’ processes, a factor of 20 seems sensible.

To reduce the amount of required memory and disk size, you can optionally remove the raw data field, see Removing raw data for higher performance and less space usage in the FAQ.

Additional components

If some of the optional components of the IntelMQ Ecosystem are in use, they can add additional hardware requirements.

Those components do not add relevant requirements:

  • IntelMQ API: It is just an API for intelmqctl.

  • IntelMQ Manager: Only contains static files served by the webserver.

  • IntelMQ Webinput CSV: Just a web interface to insert data. Requires the amount of processed data to fit in memory, see above.

  • Stats Portal: The aggregation step and Grafana require some resources, but no exact numbers are known.

  • Malware Name Mapping

  • Docker: The docker layer adds only minimal hardware requirements.

EventDB

When storing data in databases (such as MongoDB, PostgreSQL, ElasticSearch), it is recommended to do this on separate machines for operational reasons. Using a different machine separates stream processing from data storage and allows for a specialized system optimization for both use cases.

IntelMQ cb mailgen

While the Fody backend and frontend do not have significant requirements, the RIPE import tool of the certbund-contact requires about 8 GB of memory as of March 2021.

Installation

Please report any errors you encounter at https://github.com/certtools/intelmq/issues

For upgrade instructions, see Upgrade instructions. For setting up a development environment see the Developers Guide section Development Environment. For testing pre-releases see also the Developers Guide section Testing Pre-releases.

Requirements

The following instructions assume these requirements are met. Python versions >= 3.6 are supported.

Supported and recommended operating systems are:

  • CentOS 7 and 8

  • Debian 10

  • openSUSE Leap 15.2, 15.3

  • Ubuntu: 18.04, 20.04

  • Docker Engine: 18.x and higher

Other distributions which are (most probably) supported include RHEL, Fedora and openSUSE Tumbleweed.

A short guide on hardware requirements can be found on the page Hardware Requirements.

Install Dependencies

If you are using native packages, you can skip this section as all dependencies are installed automatically.

Ubuntu / Debian
apt install python3-pip python3-dnspython python3-psutil python3-redis python3-requests python3-termstyle python3-tz python3-dateutil
apt install redis-server

Optional dependencies:

apt install bash-completion jq
apt install python3-sleekxmpp python3-pymongo python3-psycopg2
CentOS 7 / RHEL 7
yum install epel-release
yum install python36 python36-devel python36-requests
yum install gcc gcc-c++
yum install redis
CentOS 8
dnf install epel-release
dnf install python3-dateutil python3-dns python3-pip python3-psutil python3-pytz python3-redis python3-requests redis

Optional dependencies:

dnf install bash-completion jq
dnf install python3-psycopg2 python3-pymongo
openSUSE 15.1 / 15.2
zypper install python3-dateutil python3-dnspython python3-psutil python3-pytz python3-redis python3-requests python3-python-termstyle
zypper install redis

Optional dependencies:

zypper in bash-completion jq
zypper in python3-psycopg2 python3-pymongo python3-sleekxmpp
Docker (beta)

ATTENTION: Currently you can’t manage your botnet via intelmqctl (see its documentation). You need to use the IntelMQ Manager for now!

Follow Docker Install and Docker-Compose Install instructions.

The latest image is hosted on Docker Hub.

Installation

Installation methods available:

  • native packages (.deb, .rpm)

  • PyPi (latest releases as python package)

Note: installation for development purposes must follow the instructions available on Development Environment.

Native Packages

These are the operating systems which are currently supported by packages:

  • CentOS 7 (run yum install epel-release first)

  • CentOS 8 (run dnf install epel-release first)

  • Debian 10

  • Fedora 30

  • Fedora 31

  • Fedora 32

  • openSUSE Leap 15.2

  • openSUSE Leap 15.3

  • openSUSE Tumbleweed

  • Ubuntu 18.04 (enable the universe repositories by appending universe in /etc/apt/sources.list to deb http://[…].archive.ubuntu.com/ubuntu/ bionic main first)

  • Ubuntu 20.04 (enable the universe repositories by appending universe in /etc/apt/sources.list to deb http://[…].archive.ubuntu.com/ubuntu/ focal main first)

Get the installation instructions for your operating system here: Installation Native Packages. The instructions show how to add the repository and install the intelmq package. You can also install the intelmq-manager package to get the Web-Frontend IntelMQ Manager.

Please report any errors or improvements at IntelMQ Issues. Thanks!

PyPi
sudo -i

pip3 install intelmq

useradd -d /opt/intelmq -U -s /bin/bash intelmq
sudo intelmqsetup

intelmqsetup creates all necessary directories and provides a default configuration for new setups. See the Configuration section for more information on them and how to influence them.

Docker without docker-compose

Navigate to your preferred installation directory and run git clone https://github.com/certat/intelmq-docker.git --recursive.

You need to prepare some volumes & configs. To change the host paths, edit the left-hand side (before the colon) of the -v arguments.

Change redis_host to a running Redis instance. Docker will resolve it automatically. All containers are connected using Docker networks.

In order to work with your current infrastructure, you need to specify some environment variables:

sudo docker pull redis:latest

sudo docker pull certat/intelmq-full:latest

sudo docker pull certat/intelmq-nginx:latest

sudo docker network create intelmq-internal

sudo docker run -v ~/intelmq/example_config/redis/redis.conf:/redis.conf \
                --network intelmq-internal \
                --name redis \
                redis:latest

sudo docker run --network intelmq-internal \
                --name nginx \
                certat/intelmq-nginx:latest

sudo docker run -e INTELMQ_IS_DOCKER="true" \
                -e INTELMQ_PIPELINE_DRIVER="redis" \
                -e INTELMQ_PIPELINE_HOST=redis_host \
                -e INTELMQ_REDIS_CACHE_HOST=redis_host \
                -v ~/intelmq/example_config/intelmq/etc/:/opt/intelmq/etc/ \
                -v ~/intelmq/example_config/intelmq-api:/opt/intelmq-api/config \
                -v /var/log/intelmq:/opt/intelmq/var/log \
                -v ~/intelmq/lib:/opt/intelmq/var/lib \
                --network intelmq-internal \
                --name intelmq \
                certat/intelmq-full:1.0
Additional Information

Following any one of the installation methods mentioned before will set up the IntelMQ base. However, some bots may have additional dependencies which are mentioned in their own documentation.

Upgrade instructions

For installation instructions, see Installation.

Read NEWS.md

Read the NEWS.md file to look for things you need to act on.

Stop IntelMQ and create a Backup

  • Make sure that your IntelMQ system is completely stopped: intelmqctl stop

  • Create a backup of the IntelMQ home directory, which includes all configurations. They are not overwritten, but backups are always nice to have!

sudo cp -R /opt/intelmq /opt/intelmq-backup

Upgrade IntelMQ

Before upgrading, check that your setup is clean and there are no events in the queues:

intelmqctl check
intelmqctl list queues -q

The upgrade depends on how you installed IntelMQ.

Packages

Use your system’s package manager.
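
For example, on systems using the deb packages (assuming the IntelMQ repository is already configured):

apt update
apt install --only-upgrade intelmq

On rpm-based systems, dnf upgrade intelmq (CentOS 8, Fedora), zypper update intelmq (openSUSE) or yum update intelmq (CentOS 7) achieves the same.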

Docker (beta)

You can check out all current versions on our DockerHub.

docker pull certat/intelmq-full:latest

docker pull certat/intelmq-nginx:latest

Alternatively you can use docker-compose:

docker-compose pull

You can check the current versions of intelmq, intelmq-manager and intelmq-api via the git commit ref.

The version format for each included item is key=value, and they are separated by commas, e.g. IntelMQ=ab12cd34f, IntelMQ-API=xy65z23.

docker inspect --format '{{ index .Config.Labels "org.opencontainers.image.version" }}' intelmq-full:latest

Now restart your container. If you’re using docker-compose, you simply write:

docker-compose down

If you don’t use docker-compose, you can restart a single container using:

docker ps | grep certat

docker stop CONTAINER_ID
PyPi
pip install -U --no-deps intelmq
sudo intelmqsetup

Using --no-deps will not upgrade dependencies, which would probably overwrite the system’s libraries. Remove this option to also upgrade dependencies.

Local repository

If you have an editable installation, refer to the instructions in the Developers Guide.

Update the repository depending on your setup (e.g. git pull origin master).

And run the installation again:

pip install .
sudo intelmqsetup

For editable installations (development only), run pip install -e . instead.

Upgrade configuration and check the installation

Go through NEWS.md and apply necessary adaptations to your setup. If you have adapted IntelMQ’s code, also read the CHANGELOG.md.

Check your installation and configuration to detect any problems:

intelmqctl upgrade-config
intelmqctl check

Start IntelMQ

intelmqctl start

Configuration and Management

For installation instructions, see Installation. For upgrade instructions, see Upgrade instructions.

Where to get help?

In case you are lost or something is not discussed in this guide, you might want to subscribe to the IntelMQ Users Mailinglist and ask your questions there.

With that clarified, let’s dig into the details…

Configure services

You need to enable and start Redis if not already done. Using systemd it can be done with:

systemctl enable redis.service
systemctl start redis.service

Configuration

/opt and LSB paths

If you installed the packages, standard Linux paths (LSB paths) are used: /var/log/intelmq/, /etc/intelmq/, /var/lib/intelmq/, /var/run/intelmq/. Otherwise, the configuration directory is /opt/intelmq/etc/. Using the environment variable INTELMQ_ROOT_DIR allows setting any arbitrary root directory.

You can switch this by setting the environment variables INTELMQ_PATHS_NO_OPT and INTELMQ_PATHS_OPT, respectively.

  • When installing the Python packages, you can set INTELMQ_PATHS_NO_OPT to something non-empty to use LSB-paths.

  • When installing the deb/rpm packages, you can set INTELMQ_PATHS_OPT to something non-empty to use /opt/intelmq/ paths, or a path set with INTELMQ_ROOT_DIR.

The environment variable ROOT_DIR is meant to set an alternative root directory instead of /. This is primarily meant for package build environments and is analogous to setuptools’ --root parameter. Thus it is only used in LSB-mode.
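
A minimal sketch of using these variables (the values are examples):

# Python/pip installation, but use LSB paths:
export INTELMQ_PATHS_NO_OPT=1
# deb/rpm packages, but use a custom root directory:
export INTELMQ_PATHS_OPT=1
export INTELMQ_ROOT_DIR=/srv/intelmq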

Overview

All configuration files are in the JSON format. For new installations a default setup with some examples is provided by the intelmqsetup tool. If this is not the case, make sure the program was run (see installation instructions).

  • defaults.conf: default values for all bots and their behavior, e.g. error handling, log options and pipeline configuration. Will be removed in the future.

  • runtime.conf: Configuration for the individual bots. See Bots for more details.

  • pipeline.conf: Defines source and destination queues per bot (i.e. where does a bot get its data from, where does it send it to?).

  • BOTS: Includes configuration hints for all bots. E.g. feed URLs or database connection parameters. Use this as a template for runtime.conf. This is also read by the intelmq-manager.

To configure a new bot, you need to define and configure it in runtime.conf using the template from BOTS. Configure source and destination queues in pipeline.conf. Use the IntelMQ Manager mentioned above to generate the configuration files if unsure.

In the shipped examples 4 collectors and parsers, 6 common experts and one output are configured. The default collector and the parser handle data from Malware Domain List; the file output bot writes all data to /opt/intelmq/var/lib/bots/file-output/events.txt or /var/lib/intelmq/bots/file-output/events.txt, respectively.

System Configuration (defaults)

All bots inherit the default configuration parameters and they can overwrite them using the same parameters in their respective configuration in the runtime.conf file. You can set the parameters from defaults.conf per bot as well. The settings will take effect for running bots after the bot re-reads the configuration (restart or reload).

Logging

The logging can be configured with the following parameters:

  • logging_handler: Can be one of "file" or "syslog".

  • logging_level: Defines the system-wide log level that will be used by all bots and the intelmqctl tool. Possible values are: "CRITICAL", "ERROR", "WARNING", "INFO" and "DEBUG".

  • logging_path: Used if logging_handler is file. Defines the system-wide log folder that will be used by all bots and the intelmqctl tool. Default value: /opt/intelmq/var/log/ or /var/log/intelmq/ respectively.

  • logging_syslog: Used if logging_handler is syslog. Either a list with hostname and UDP port of the syslog service, e.g. ["localhost", 514], or a device name/path, e.g. the default "/dev/log".

We recommend logging_level WARNING for production environments and INFO if you want more details. In any case, watch your free disk space!
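
For illustration, the logging options might look like this in defaults.conf (the values are examples):

"logging_handler": "file",
"logging_level": "WARNING",
"logging_path": "/var/log/intelmq/",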

Log rotation

To rotate the logs, you can use the standard Linux tool logrotate. An example logrotate configuration is given in contrib/logrotate/ and delivered with all deb/rpm packages. When not using logrotate, IntelMQ can rotate the logs itself; this is not enabled by default! You need to set both of the following values (see the example below):

  • logging_max_size: Maximum number of bytes to be stored in one logfile before the file is rotated (default: 0, equivalent to unset).

  • logging_max_copies: Maximum number of logfiles to keep (default: unset). Compression is not supported.

Some information can as well be found in Python’s documentation on the used RotatingFileHandler.
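
A minimal sketch of enabling IntelMQ’s own log rotation in defaults.conf (the sizes are examples): this would keep up to 10 logfiles of at most 200 MB each:

"logging_max_size": 209715200,
"logging_max_copies": 10,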

Error Handling
  • error_log_message - in case of an error, this option will allow the bot to write the message (report or event) to the log file. Use the following values:
    • true/false - write or not write message to the log file

  • error_log_exception - in case of an error, this option will allow the bot to write the error exception to the log file. Use the following values:
    • true/false - write or not write exception to the log file

  • error_procedure - in case of an error, this option defines the procedure that the bot will adopt. Use the following values:

    • stop - stop bot after retrying X times (as defined in error_max_retries) with a delay between retries (as defined in error_retry_delay). If the bot reaches the error_max_retries value, it will remove the message from the pipeline and stop. If the option error_dump_message is also enabled, the bot will dump the removed message to its dump file (to be found in var/log).

    • pass - will skip this message and will process the next message after retrying X times, removing the current message from the pipeline. If the option error_dump_message is also enabled, then the bot will dump the removed message to its dump file. After max retries are reached, the rate limit is applied (e.g. a collector bot fetching an unavailable resource does not retry forever).

  • error_max_retries - in case of an error, the bot will try to re-start processing the current message X times as defined by this option. int value.

  • error_retry_delay - defines the number of seconds to wait between subsequent re-tries in case of an error. int value.

  • error_dump_message - specifies if the bot will write queued up messages to its dump file (use intelmqdump to re-insert the message).
    • true/false - write or not write message to the dump file

If the path _on_error exists for a bot, the message is also sent to this queue, instead of (only) being dumped to a file, if configured to do so.
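
As an illustration, these options might be combined in the defaults configuration like this (the values are examples):

"error_log_message": false,
"error_log_exception": true,
"error_procedure": "pass",
"error_max_retries": 3,
"error_retry_delay": 15,
"error_dump_message": true,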

Miscellaneous
  • load_balance - this option allows you to choose the behavior of the queue. Use the following values:
    • true - splits the messages into several queues without duplication

    • false - duplicates the messages into each queue

    • When using AMQP as message broker, take a look at the Multithreading (Beta) section and the instances_threads parameter.

  • broker - select which broker intelmq can use. Use the following values:
    • redis - Redis allows some persistence but is not as fast as ZeroMQ (in development). Note that persistence has to be manually activated. See http://redis.io/topics/persistence

  • rate_limit - time interval (in seconds) between message processing. int value.

  • ssl_ca_certificate - trusted CA certificate for IMAP connections (supported by some bots).

  • source_pipeline_host - broker IP, FQDN or Unix socket that the bot will use to connect and receive messages.

  • source_pipeline_port - broker port that the bot will use to connect and receive messages. Can be empty for Unix socket.

  • source_pipeline_password - broker password that the bot will use to connect and receive messages. Can be null for unprotected broker.

  • source_pipeline_db - broker database that the bot will use to connect and receive messages (requirement from redis broker).

  • destination_pipeline_host - broker IP, FQDN or Unix socket that the bot will use to connect and send messages.

  • destination_pipeline_port - broker port that the bot will use to connect and send messages. Can be empty for Unix socket.

  • destination_pipeline_password - broker password that the bot will use to connect and send messages. Can be null for unprotected broker.

  • destination_pipeline_db - broker database that the bot will use to connect and send messages (requirement from redis broker).

  • http_proxy - HTTP proxy that the bot will use when performing HTTP requests (e.g. bots/collectors/collector_http.py). The value must follow RFC 1738.

  • https_proxy - HTTPS proxy that the bot will use when performing secure HTTPS requests (e.g. bots/collectors/collector_http.py).

  • http_user_agent - user-agent string that the bot will use when performing HTTP/HTTPS requests (e.g. bots/collectors/collector_http.py).

  • http_verify_cert - defines if the bot will verify SSL certificates when performing HTTPS requests (e.g. bots/collectors/collector_http.py).
    • true/false - verify or not verify SSL certificates
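
For illustration, the connection to the default Redis broker might be configured like this in the defaults configuration (the values are examples):

"source_pipeline_host": "127.0.0.1",
"source_pipeline_port": 6379,
"source_pipeline_password": null,
"source_pipeline_db": 2,
"destination_pipeline_host": "127.0.0.1",
"destination_pipeline_port": 6379,
"destination_pipeline_password": null,
"destination_pipeline_db": 2,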

Using supervisor as process manager (Beta)

First of all: Do not use it in production environments yet! It has not been tested thoroughly yet.

Supervisor is a process manager written in Python. Its main advantage is that it takes care of processes: if a bot process exits with a failure (exit code other than 0), supervisor tries to run it again. Another advantage is that it does not require writing PID files.

This was tested on Ubuntu 18.04.

Install supervisor. supervisor_twiddler is an extension for supervisor that makes it possible to create processes dynamically. (The Ubuntu supervisor package is currently based on Python 2, so supervisor_twiddler must be installed with the Python 2 pip.)

apt install supervisor python-pip
pip install supervisor_twiddler

Create default config /etc/supervisor/conf.d/intelmq.conf and restart supervisor service:

[rpcinterface:twiddler]
supervisor.rpcinterface_factory=supervisor_twiddler.rpcinterface:make_twiddler_rpcinterface

[group:intelmq]

Change IntelMQ process manager in the defaults configuration:

"process_manager": "supervisor",

After this, it is possible to manage bots as before with the intelmqctl command.

Pipeline Configuration

The pipeline configuration defines how the data is exchanged between the bots. For each bot, it defines the source queue (there is always only one) and one or multiple destination queues. This section shows the possibilities and definitions as well as examples. The configuration of the pipeline can be done by the IntelMQ Manager with no need to intervene manually. It is recommended to use this tool as it guarantees that the configuration is correct. The location of the file is etc/pipeline.conf in your IntelMQ directory, for example /opt/intelmq/etc/pipeline.conf or /etc/intelmq/pipeline.conf.

Structure

The pipeline configuration has the same structure on the first level as the runtime configuration, i.e. it’s a dictionary with the bot IDs as keys. Each item holds again a dictionary with one entry each for the source and destination queues. A full example can be found later in this section.

{
    "example-bot": {
        "source-queue": <source queue data>,
        "destination-queues": <destination queue data>
    }
}
Source queue

The source queue is only a string, by convention the bot ID plus “-queue” appended. For example, if the bot ID is example-bot, the source queue name is example-bot-queue.

"source-queue": "example-bot-queue"

For collectors, this field does not exist, as they fetch the data from outside the IntelMQ system by definition.

Destination queues

There are multiple possibilities for the destination queues:

  • No value, i.e. the field does not exist. This is the case for outputs, as they push the data outside the IntelMQ system by default.

  • A single string (deprecated) with the name of the source queue of the next bot.

  • A list of strings, each with the name of the source queue of the next bot.

  • Named queues: a dictionary of either strings or lists.

Before going into the details of named paths, let’s first dive into some simpler cases. A typical configuration may look like this:

"deduplicator-expert": {
    "source-queue": "deduplicator-expert-queue",
    "destination-queues": [
        "taxonomy-expert-queue"
    ]
}

And a bot with two destination queues:

"cymru-whois-expert": {
    "source-queue": "cymru-whois-expert-queue",
    "destination-queues": [
        "file-output-queue",
        "misp-output-queue"
    ]
}

These are the usual configurations you mostly see.

Named queues / paths

Beginning with version 1.1.0, queues can be “named”, these are the so-called paths. The following two configurations are equivalent:

"destination-queues": ["taxonomy-expert-queue"]
"destination-queues": {"_default": ["taxonomy-expert-queue"]}

As we can see, the default path name is obviously _default. Let’s have a look at a more complex and complete example:

"destination-queues": {
    "_default": "<first destination pipeline name>",
    "_on_error": "<optional destination pipeline name in case of errors>",
    "other-path": [
        "<second destination pipeline name>",
        "<third destination pipeline name>",
        ...
        ],
    ...
    }

In that case, the bot will be able to send the message to one of the defined paths. The path "_default" is used if none is specified. If the optional path "_on_error" is specified, messages that fail during processing will be sent to the pipelines given as _on_error. Other destination queues can be explicitly addressed by the bots, e.g. bots with filtering capabilities. Some expert bots are capable of sending messages to paths; this feature is explained in their documentation, e.g. the filter expert and the Sieve expert. The named queues need to be explicitly addressed by the bot (e.g. filtering) or the core (_on_error) to be used. Setting arbitrary paths has no effect.
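
To make this more concrete, here is a sketch of a filter bot using both a named path and _on_error (the bot ID and queue names are hypothetical):

"filter-expert": {
    "source-queue": "filter-expert-queue",
    "destination-queues": {
        "_default": ["taxonomy-expert-queue"],
        "other-path": ["file-output-queue"],
        "_on_error": ["errors-file-output-queue"]
    }
}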

AMQP (Beta)

Starting with IntelMQ 1.2 the AMQP protocol is supported as message queue. To use it, install a broker, for example RabbitMQ. The configuration and the differences are outlined here. Keep in mind that it is slower, but has better monitoring capabilities and is more stable. The AMQP support is considered beta, so small problems might occur. So far, only RabbitMQ as broker has been tested.

You can change the broker for single bots (set the parameters in the runtime configuration per bot) or for the whole botnet (in defaults configuration).

You need to set the parameter source_pipeline_broker/destination_pipeline_broker to amqp. There are more parameters available:

  • destination_pipeline_broker: "amqp"

  • destination_pipeline_host (default: '127.0.0.1')

  • destination_pipeline_port (default: 5672)

  • destination_pipeline_username

  • destination_pipeline_password

  • destination_pipeline_socket_timeout (default: no timeout)

  • destination_pipeline_amqp_exchange: Only change/set this if you know what you do. If set, the destination queues are not declared as queues, but used as routing key. (default: '').

  • destination_pipeline_amqp_virtual_host (default: '/')

  • source_pipeline_host (default: '127.0.0.1')

  • source_pipeline_port (default: 5672)

  • source_pipeline_username

  • source_pipeline_password

  • source_pipeline_socket_timeout (default: no timeout)

  • source_pipeline_amqp_exchange: Only change/set this if you know what you do. If set, the destination queues are not declared as queues, but used as routing key. (default: '').

  • source_pipeline_amqp_virtual_host (default: '/')

  • intelmqctl_rabbitmq_monitoring_url string, see below (default: "http://{host}:15672")
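
A hedged sketch of switching the whole botnet to AMQP in the defaults configuration (host and credentials are placeholders):

"source_pipeline_broker": "amqp",
"destination_pipeline_broker": "amqp",
"source_pipeline_host": "127.0.0.1",
"source_pipeline_port": 5672,
"source_pipeline_username": "intelmq",
"source_pipeline_password": "changeme",
"destination_pipeline_host": "127.0.0.1",
"destination_pipeline_port": 5672,
"destination_pipeline_username": "intelmq",
"destination_pipeline_password": "changeme",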

For getting the queue sizes, intelmqctl needs to connect to the monitoring interface of RabbitMQ. If the monitoring interface is not available under http://{host}:15672, you can set it manually using the parameter intelmqctl_rabbitmq_monitoring_url. In RabbitMQ’s default configuration you might not need to provide a user account, as by default the administrator (guest:guest) allows full access from localhost. If you create a separate user account, make sure to add the tag “monitoring” to it, otherwise IntelMQ can’t fetch the queue sizes.

[Screenshot: RabbitMQ user account with the “monitoring” tag]

Setting the statistics (and cache) parameters is necessary when the local redis is running under a non-default host/port. If this is the case, you can set them explicitly:

  • statistics_database: 3

  • statistics_host: "127.0.0.1"

  • statistics_password: null

  • statistics_port: 6379

Runtime Configuration

This configuration is used by each bot to load its specific (runtime) parameters. Usually, the BOTS file is used to generate runtime.conf. Also, the IntelMQ Manager generates this configuration. You may edit it manually as well. Be sure to re-load the bot (see the intelmqctl documentation).

Template:

{
    "<bot ID>": {
        "group": "<bot type (Collector, Parser, Expert, Output)>",
        "name": "<human-readable bot name>",
        "module": "<bot code (python module)>",
        "description": "<generic description of the bot>",
        "parameters": {
            "<parameter 1>": "<value 1>",
            "<parameter 2>": "<value 2>",
            "<parameter 3>": "<value 3>"
        }
    }
}

Example:

{
    "malware-domain-list-collector": {
        "group": "Collector",
        "name": "Malware Domain List",
        "module": "intelmq.bots.collectors.http.collector_http",
        "description": "Malware Domain List Collector is the bot responsible to get the report from source of information.",
        "parameters": {
            "http_url": "http://www.malwaredomainlist.com/updatescsv.php",
            "feed": "Malware Domain List",
            "rate_limit": 3600
        }
    }
}

More examples can be found in the file intelmq/etc/runtime.conf. See Bots for more details.

By default, all of the bots are started when you start the whole botnet, however there is a possibility to disable a bot. This means that the bot will not start every time you start the botnet, but you can start and stop the bot if you specify the bot explicitly. To disable a bot, add the following to your runtime.conf: "enabled": false. For example:

{
    "malware-domain-list-collector": {
        "group": "Collector",
        "name": "Malware Domain List",
        "module": "intelmq.bots.collectors.http.collector_http",
        "description": "Malware Domain List Collector is the bot responsible to get the report from source of information.",
        "enabled": false,
        "parameters": {
            "http_url": "http://www.malwaredomainlist.com/updatescsv.php",
            "feed": "Malware Domain List",
            "rate_limit": 3600
        }
    }
}
Multithreading (Beta)

First of all: Do not use it in production environments yet! There are a few bugs, see below

Since IntelMQ 2.0 it is possible to provide the following parameter:

  • instances_threads

Set it to a non-zero integer, and this number of worker threads will be spawned. This is useful if bots often wait for system resources or if network-based lookups are a bottleneck.
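
A minimal runtime.conf sketch of a bot’s parameters (the value is an example):

"parameters": {
    "instances_threads": 4
}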

However, there are currently a few caveats:

  • This is not possible for all bots, there are some exceptions (collectors and some outputs), see the Frequently asked questions for some reasons.

  • Only use it with the AMQP pipeline, as with Redis, messages may get duplicated because there’s only one internal queue

  • In the logs, you can see the main thread initializing first, then all of the threads which log with the name [bot-id].[thread-id].

Harmonization Configuration

This configuration is used to specify the fields for all message types. The harmonization library will load this configuration to check, during message processing, if the values are compliant with the “harmonization” format. Usually, this configuration doesn’t need any change. It is mostly maintained by the IntelMQ maintainers.

Template:

{
    "<message type>": {
        "<field 1>": {
            "description": "<field 1 description>",
            "type": "<field value type>"
        },
        "<field 2>": {
            "description": "<field 2 description>",
            "type": "<field value type>"
        }
    },
}

Example:

{
    "event": {
        "destination.asn": {
            "description": "The autonomous system number from which originated the connection.",
            "type": "Integer"
        },
        "destination.geolocation.cc": {
            "description": "Country-Code according to ISO3166-1 alpha-2 for the destination IP.",
            "regex": "^[a-zA-Z0-9]{2}$",
            "type": "String"
        },
    },
}

More examples can be found in the file intelmq/etc/harmonization.conf.

Utilities

Management

IntelMQ has a modular structure consisting of bots. There are four types of bots:

  • Collector Bots retrieve data from internal or external sources, the output are reports consisting of many individual data sets / log lines.

  • Parser Bots parse the (report) data by splitting it into individual events (log lines) and giving them a defined structure, see also Data Harmonization for the list of fields an event may be split up into.

  • Expert Bots enrich the existing events by e.g. looking up information such as DNS reverse records, geographic location information (country code) or abuse contacts for an IP address or domain name.

  • Output Bots write events to files, databases, (REST)-APIs or any other data sink that you might want to write to.

Each bot has one source queue (except collectors) and can have multiple destination queues (except outputs). But multiple bots can write to the same pipeline (queue), resulting in multiple inputs for the next bot.

Every bot runs in a separate process. A bot is identifiable by a bot id.

Currently only one instance (i.e. with the same bot id) of a bot can run at the same time. Concepts for multiprocessing are being discussed, see this issue: Multiprocessing per queue is not supported #186. Currently you can run multiple processes of the same bot (with different bot ids) in parallel.

Example: multiple gethostbyname bots (with different bot ids) may run in parallel, with the same input queue and sending to the same output queue. Note that the bot providing the input queue must have the load_balance option set to true.
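
A pipeline.conf sketch of such a setup (the bot IDs are hypothetical); the feeding bot additionally needs "load_balance": true in its runtime configuration:

"gethostbyname-1-expert": {
    "source-queue": "gethostbyname-expert-queue",
    "destination-queues": ["file-output-queue"]
},
"gethostbyname-2-expert": {
    "source-queue": "gethostbyname-expert-queue",
    "destination-queues": ["file-output-queue"]
}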

Web interface: IntelMQ Manager

IntelMQ has a tool called IntelMQ Manager that gives users an easy way to configure all pipelines with bots that your team needs. For beginners, it’s recommended to use the IntelMQ Manager to become acquainted with the functionalities and concepts. The IntelMQ Manager offers some of the possibilities of the intelmqctl tool and has a graphical interface for runtime and pipeline configurations.

See the IntelMQ Manager repository.

Command-line interface: intelmqctl

For the syntax, see intelmqctl -h

  • Starting a bot: intelmqctl start bot-id

  • Stopping a bot: intelmqctl stop bot-id

  • Reloading a bot: intelmqctl reload bot-id

  • Restarting a bot: intelmqctl restart bot-id

  • Get status of a bot: intelmqctl status bot-id

  • Run a bot directly for debugging purposes and temporarily raise the logging level to DEBUG: intelmqctl run bot-id

  • Get a pdb (or ipdb if installed) live console. intelmqctl run bot-id console

  • See the message that waits in the input queue. intelmqctl run bot-id message get

  • See additional help for further explanation. intelmqctl run bot-id --help

  • Starting the botnet (all bots): intelmqctl start

  • Starting a group of bots: intelmqctl start --group experts

  • Get a list of all configured bots: intelmqctl list bots

  • Get a list of all queues: intelmqctl list queues If -q is given, only queues with more than one item are listed.

  • Get a list of all queues and status of the bots: intelmqctl list queues-and-status

  • Clear a queue: intelmqctl clear queue-id

  • Get logs of a bot: intelmqctl log bot-id number-of-lines log-level Reads the last lines from bot log. Log level should be one of DEBUG, INFO, ERROR or CRITICAL. Default is INFO. Number of lines defaults to 10, -1 gives all. Result can be longer due to our logging format!

  • Upgrade from a previous version: intelmqctl upgrade-config Make a backup of your configuration first, also including bot’s configuration files.

Botnet Concept

The “botnet” represents all currently configured bots which are explicitly enabled. It is, in essence, the graph (pipeline.conf) of the bots which are connected together via their input source queues and destination queues.

To get an overview which bots are running, use intelmqctl status or use the IntelMQ Manager. Set "enabled": true in the runtime configuration to add a bot to the botnet. By default, bots will be configured as "enabled": true. See Bots for more details on configuration.

Disabled bots can still be started explicitly using intelmqctl start <bot_id>, but will remain in the state disabled if stopped (and not be implicitly enabled by the start command). They are not started by intelmqctl start in analogy to the behavior of widely used initialization systems.

Scheduled Run Mode

In many cases, it is useful to schedule a bot at a specific time (i.e. via cron(1)), for example to collect information from a website every day at midnight. To do this, set run_mode to scheduled in the runtime.conf for the bot. Check out the following example:

"blocklistde-apache-collector": {
    "name": "Generic URL Fetcher",
    "group": "Collector",
    "module": "intelmq.bots.collectors.http.collector_http",
    "description": "All IP addresses which have been reported within the last 48 hours as having run attacks on the service Apache, Apache-DDOS, RFI-Attacks.",
    "enabled": false,
    "run_mode": "scheduled",
    "parameters": {
        "feed": "Blocklist.de Apache",
        "provider": "Blocklist.de",
        "http_url": "https://lists.blocklist.de/lists/apache.txt",
        "ssl_client_certificate": null
    },
}

You can schedule the bot with a crontab-entry like this:

0 0 * * * intelmqctl start blocklistde-apache-collector

Bots configured as scheduled will exit after the first successful run. Setting enabled to false will cause the bot to not start with intelmqctl start, but only with an explicit start, in this example intelmqctl start blocklistde-apache-collector.

Continuous Run Mode

In most cases, bots will need to be configured in continuous run mode (the default) in order to have them always running and processing events. Usually, the types of bots that require continuous mode are Parsers, Experts and Outputs. To do this, set run_mode to continuous in the runtime.conf for the bot. Check the following example:

"blocklistde-apache-parser": {
    "name": "Blocklist.de Parser",
    "group": "Parser",
    "module": "intelmq.bots.parsers.blocklistde.parser",
    "description": "Blocklist.DE Parser is the bot responsible to parse the report and sanitize the information.",
    "enabled": false,
    "run_mode": "continuous",
    "parameters": {
    },
}

You can now start the bot using the following command:

intelmqctl start blocklistde-apache-parser

Bots configured as continuous will never exit except if there is an error and the error handling configuration requires the bot to exit. See the Error Handling section for more details.

Reloading

Whilst restart is a mere stop & start, performing intelmqctl reload <bot_id> will not stop the bot, permitting it to keep the state: the same common behavior as for (Linux) daemons. It will initialize again (including reading all configuration again) after the current action is finished. Also, the rate limit/sleep is continued (with the new time) and not interrupted like with the restart command. So if you have a collector with a rate limit of 24 h, the reload does not trigger a new fetching of the source at the time of the reload, but just 24 h after the last run – with the new configuration. Which state the bots are keeping depends on the bots of course.

Forcing reset pipeline and cache (be careful)

If you are using the default broker (Redis), in some test situations you may need to quickly clear all pipelines and caches. Use the following procedure:

redis-cli FLUSHDB
redis-cli FLUSHALL

Error Handling

Tool: intelmqdump

When bots are failing due to bad input data or programming errors, they can dump the problematic message to a file along with a traceback, if configured accordingly. These dumps are saved in the logging directory as [botid].dump as JSON files. IntelMQ comes with an inspection and reinjection tool called intelmqdump. It is an interactive tool to show all dumped files and the number of dumps per file. Choose a file by bot-id or listed numeric id. You can then choose to delete single entries from the file with e 1,3,4, show a message in a more readable format with s 1 (prints the raw message, can be long!), recover some messages and put them back in the pipeline for the bot with a or r 0,4,5, or delete the file with all dumped messages using d.

intelmqdump -h
usage:
    intelmqdump [botid]
    intelmqdump [-h|--help]

intelmqdump can inspect dumped messages, show, delete or reinject them into
the pipeline. It's an interactive tool, directly start it to get a list of
available dumps or call it with a known bot id as parameter.

positional arguments:
  botid       botid to inspect dumps of

optional arguments:
  -h, --help  show this help message and exit
  --truncate TRUNCATE, -t TRUNCATE
                        Truncate raw-data with more characters than given. 0 for no truncating. Default: 1000.

Interactive actions after a file has been selected:
- r, Recover by IDs
  > r id{,id} [queue name]
  > r 3,4,6
  > r 3,7,90 modify-expert-queue
  The messages identified by a consecutive numbering will be stored in the
  original queue or the given one and removed from the file.
- a, Recover all
  > a [queue name]
  > a
  > a modify-expert-queue
  All messages in the opened file will be recovered to the stored or given
  queue and removed from the file.
- e, Delete entries by IDs
  > e id{,id}
  > e 3,5
  The entries will be deleted from the dump file.
- d, Delete file
  > d
  Delete the opened file as a whole.
- s, Show by IDs
  > s id{,id}
  > s 0,4,5
  Show the selected IP in a readable format. It's still a raw format from
  repr, but with newlines for message and traceback.
- v, Edit by ID
  > v id
  > v 0
  > v 1,2
  Opens an editor (by calling `sensible-editor`) on the message. The modified message is then saved in the dump.
- q, Quit
  > q

$ intelmqdump
 id: name (bot id)                    content
  0: alienvault-otx-parser            1 dumps
  1: cymru-whois-expert               8 dumps
  2: deduplicator-expert              2 dumps
  3: dragon-research-group-ssh-parser 2 dumps
  4: file-output2                     1 dumps
  5: fraunhofer-dga-parser            1 dumps
  6: spamhaus-cert-parser             4 dumps
  7: test-bot                         2 dumps
Which dump file to process (id or name)? 3
Processing dragon-research-group-ssh-parser: 2 dumps
  0: 2015-09-03T13:13:22.159014 InvalidValue: invalid value u'NA' (<type 'unicode'>) for key u'source.asn'
  1: 2015-09-01T14:40:20.973743 InvalidValue: invalid value u'NA' (<type 'unicode'>) for key u'source.asn'
recover (a)ll, delete (e)ntries, (d)elete file, (q)uit, (s)how by ids, (r)ecover by ids? d
Deleted file /opt/intelmq/var/log/dragon-research-group-ssh-parser.dump

Bots and the intelmqdump tool use file locks to prevent writing to already opened files. Bots try to lock the file for up to 60 seconds if the dump file is already locked by another process (intelmqdump) and then give up. intelmqdump does not wait and instead only shows an error message.

By default, the show command truncates the raw field of messages at 1000 characters. To change this limit or disable truncating entirely (value 0), use the --truncate parameter.

Monitoring Logs

All bots and intelmqctl log to /opt/intelmq/var/log/ or /var/log/intelmq/ (depending on your installation). In case of failures, messages are dumped to the same directory with the file ending .dump.

tail -f /opt/intelmq/var/log/*.log
tail -f /var/log/intelmq/*.log

Uninstall

If you installed intelmq with native packages: Use the package management tool to remove the package intelmq. These tools do not remove configuration by default.

If you installed manually via pip (note that this also deletes all configuration and possibly data):

pip3 uninstall intelmq
rm -r /opt/intelmq

Integration with ticket systems, etc.

First of all, IntelMQ is a message (event) processing system: it collects feeds, processes them, enriches them, filters them and then stores them somewhere or sends them to another system. It does this in a composable, data-flow-oriented fashion, based on single events. There are no aggregation or grouping features. Now, if you want to integrate IntelMQ with your ticket system or some other system, you need to send its output somewhere your ticket system or other services can pick up IntelMQ’s data. This could be a database, Splunk, or you could send your events directly via email to a ticket system.

Different users came up with different solutions for this, each of them fitting their own organisation. Hence these solutions are not part of the core IntelMQ repository.
  • CERT.at uses a PostgreSQL DB (SQL output bot) and has a small tool, intelmqcli, which fetches the events marked as “new” in the PostgreSQL DB, groups them and sends them out via the RT ticket system.

  • Others, including BSI, use a tool called intelmq-mailgen. It sends e-mails to the recipients, optionally PGP-signed, with defined text templates, CSV-formatted attachments with grouped events, and generated ticket numbers.

The following lists external github repositories which you might consult for examples on how to integrate IntelMQ into your workflow:

If you came up with another solution for integration, we’d like to hear from you! Please reach out to us on the IntelMQ Users Mailinglist.

Frequently Asked Questions

Consult the Frequently asked questions if you encountered any problems.

Additional Information

Bash Completion

To enable bash completion on intelmqctl and intelmqdump in order to help you run the commands in an easy manner, follow the installation process here.

Bots

General remarks

By default all of the bots are started when you start the whole botnet, however there is a possibility to disable a bot. This means that the bot will not start every time you start the botnet, but you can start and stop the bot if you specify the bot explicitly. To disable a bot, add the following to your runtime.conf: "enabled": false. Be aware that this is not a normal parameter (like the others described in this file). It is set outside of the parameters object in runtime.conf. Check out Configuration and Management for an example.

There are two different types of parameters: The initialization parameters are needed to start the bot. The runtime parameters are needed by the bot itself during runtime.

The initialization parameters are in the first level, the runtime parameters live in the parameters sub-dictionary:

{
    "bot-id": {
        "parameters": {
            runtime parameters...
        },
        initialization parameters...
    }
}

For example:

{
    "abusech-feodo-domains-collector": {
        "parameters": {
            "provider": "Abuse.ch",
            "name": "Abuse.ch Feodo Domains",
            "http_url": "http://example.org/feodo-domains.txt"
        },
        "name": "Generic URL Fetcher",
        "group": "Collector",
        "module": "intelmq.bots.collectors.http.collector_http",
        "description": "collect report messages from remote hosts using http protocol",
        "enabled": true,
        "run_mode": "scheduled"
    }
}

This configuration resides in the file runtime.conf in your IntelMQ’s configuration directory for each configured bot.

Initialization parameters

  • name and description: The name and description of the bot as found in the BOTS file; not used by the bot itself.

  • group: Can be “Collector”, “Parser”, “Expert” or “Output”. Only used for visualization by other tools.

  • module: The executable (should be in $PATH) which will be started.

  • enabled: If the parameter is set to true, the bot will start when the botnet is started (intelmqctl start); as a protection, true is NOT assumed when the parameter is missing. If the parameter is set to false, the bot will not be started by intelmqctl start, however you can run the bot independently using intelmqctl start <bot_id>. Check Configuration and Management for more details.

  • run_mode: There are two run modes, “continuous” (default run mode) or “scheduled”. In the first case, the bot will be running forever until stopped or exits because of errors (depending on configuration). In the latter case, the bot will stop after one successful run. This is especially useful when scheduling bots via cron or systemd. Default is continuous. Check Configuration and Management for more details.

Common parameters

Feed parameters: Common configuration options for all collectors.

  • name: Name for the feed (feed.name). In IntelMQ versions smaller than 2.2 the parameter name feed is also supported.

  • accuracy: Accuracy for the data of the feed (feed.accuracy).

  • code: Code for the feed (feed.code).

  • documentation: Link to documentation for the feed (feed.documentation).

  • provider: Name of the provider of the feed (feed.provider).

  • rate_limit: time interval (in seconds) between fetching data if applicable.

HTTP parameters: Common URL fetching parameters used in multiple bots.

  • http_timeout_sec: A tuple of floats or only one float describing the timeout of the HTTP connection. Can be a tuple of two floats (read and connect timeout) or just one float (applies for both timeouts). The default is 30 seconds in defaults.conf; if not given, no timeout is used. See also https://requests.readthedocs.io/en/master/user/advanced/#timeouts

  • http_timeout_max_tries: An integer depicting how often a connection is retried when a timeout occurred. Defaults to 3 in defaults.conf.

  • http_username: username for basic authentication.

  • http_password: password for basic authentication.

  • http_proxy: proxy to use for HTTP

  • https_proxy: proxy to use for HTTPS

  • http_user_agent: user agent to use for the request.

  • http_verify_cert: path to trusted CA bundle or directory, false to ignore verifying SSL certificates, or true (default) to verify SSL certificates

  • ssl_client_certificate: SSL client certificate to use.

  • ssl_ca_certificate: Optional string of path to trusted CA certificate. Only used by some bots.

  • http_header: HTTP request headers
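
As an illustration, the HTTP parameters of a collector might be set like this (the values are examples):

"http_timeout_sec": 30,
"http_timeout_max_tries": 3,
"http_username": "feeduser",
"http_password": "changeme",
"http_verify_cert": true,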

Cache parameters: Common Redis cache parameters used in multiple bots (mainly lookup experts):

  • redis_cache_host: Hostname of the Redis database.

  • redis_cache_port: Port of the Redis database.

  • redis_cache_db: Database number.

  • redis_cache_ttl: TTL used for caching.

  • redis_cache_password: Optional password for the Redis database (default: none).

Collector Bots

Multithreading is disabled for all Collectors, as this would lead to duplicated data.

AMQP

Requires the pika python library, minimum version 1.0.0.

Information

  • name: intelmq.bots.collectors.amqp.collector_amqp

  • lookup: yes

  • public: yes

  • cache (redis db): none

  • description: collect data from (remote) AMQP servers, for both IntelMQ as well as external data

Configuration Parameters

  • Feed parameters (see above)

  • connection_attempts: The number of connection attempts to defined server, defaults to 3

  • connection_heartbeat: Heartbeat to server, in seconds, defaults to 3600

  • connection_host: Name/IP for the AMQP server, defaults to 127.0.0.1

  • connection_port: Port for the AMQP server, defaults to 5672

  • connection_vhost: Virtual host to connect to; on an HTTP(S) connection this would be http://IP/<your virtual host>

  • expect_intelmq_message: Boolean, if the data is from IntelMQ or not. Default: false. If true, then the data can be any Report or Event and will be passed to the next bot as is. Otherwise a new report is created with the raw data.

  • password: Password for authentication on your AMQP server

  • queue_name: The name of the queue to fetch data from

  • username: Username for authentication on your AMQP server

  • use_ssl: Use ssl for the connection, make sure to also set the correct port, usually 5671 (true/false)

Currently only fetching from a queue is supported; this can be extended in the future. Messages will be acknowledged at the AMQP server after they have been sent to the pipeline.

API

Information

  • name: intelmq.bots.collectors.api.collector

  • lookup: yes

  • public: yes

  • cache (redis db): none

  • description: collect report messages from an HTTP REST API

Configuration Parameters

  • Feed parameters (see above)

  • port: Optional, integer. Default: 5000. The local port, the API will be available at.

The API is available at /intelmq/push. The tornado library is required.
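
A hedged usage sketch: pushing a report to the collector with curl (the exact payload format is an assumption here; consult the bot’s documentation):

# assumes the default port 5000 and a collector running on the local host
curl -X POST http://localhost:5000/intelmq/push \
     --data-binary @report.txt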

Generic URL Fetcher

Information

  • name: intelmq.bots.collectors.http.collector_http

  • lookup: yes

  • public: yes

  • cache (redis db): none

  • description: collect report messages from remote hosts using HTTP protocol

Configuration Parameters

  • Feed parameters (see above)

  • HTTP parameters (see above)

  • extract_files: Optional, boolean or list of strings. If it is true, the retrieved (compressed) file or archive will be uncompressed/unpacked and the files are extracted. If the parameter is a list of strings, only the files matching the filenames are extracted. Extraction handles gzipped files and both compressed and uncompressed tar-archives as well as zip archives.

  • http_url: location of information resource (e.g. https://feodotracker.abuse.ch/blocklist/?download=domainblocklist)

  • http_url_formatting: (bool|JSON, default: false) If true, {time[format]} will be replaced by the current time in the local timezone, formatted by the given format. E.g. if the URL is http://localhost/{time[%Y]}, then the resulting URL is http://localhost/2019 for the year 2019. (Python’s Format Specification Mini-Language is used for this.) You may use a JSON specifying time-delta parameters to shift the current time accordingly. For example, use {“days”: -1} for yesterday’s date; the URL http://localhost/{time[%Y-%m-%d]} will get translated to “http://localhost/2018-12-31” on the 1st of January 2019.

  • verify_pgp_signatures: bool, defaults to false. If true, signature file is downloaded and report file is checked. On error (missing signature, mismatch, …), the error is logged and the report is not processed. Public key has to be imported in local keyring. This requires the python-gnupg library.

  • signature_url: Location of signature file for downloaded content. For path http://localhost/data/latest.json this may be for example http://localhost/data/latest.asc.

  • signature_url_formatting: (bool|JSON, default: false) The same as http_url_formatting, only for the signature file.

  • gpg_keyring: string or none (default). If specified, the path to the keyring file; otherwise the PGP keyring of the current intelmq user is used.

Zipped files are automatically extracted if detected.

For extracted files, every extracted file is sent in its own report. Every report has a field named extra.file_name with the file name in the archive the content was extracted from.
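As a sketch, a configuration fetching a daily CSV whose URL contains yesterday's date, using http_url_formatting as described above (the URL is illustrative):

"parameters": {
    "http_url": "http://localhost/data/{time[%Y-%m-%d]}.csv",
    "http_url_formatting": {"days": -1},
    "extract_files": false
}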

HTTP Response status code checks

If the HTTP response’s status code is not 2xx, this is treated as an error.

At Debug logging level, the request’s and response’s headers and body are logged for further inspection.

Generic URL Stream Fetcher

Information

  • name: intelmq.bots.collectors.http.collector_http_stream

  • lookup: yes

  • public: yes

  • cache (redis db): none

  • description: Opens a streaming connection to the URL and sends the received lines.

Configuration Parameters

  • Feed parameters (see above)

  • HTTP parameters (see above)

  • strip_lines: boolean, if single lines should be stripped (removing whitespace from the beginning and the end of the line)

If the stream is interrupted, the connection will be aborted using the timeout parameter. No error will be logged if the number of consecutive connection failures does not reach the parameter error_max_retries; instead of errors, an INFO message is logged. This is a countermeasure against too-frequent ERROR log messages. The counter of consecutive connection failures is reset once a data line has been transferred successfully. If the consecutive connection failures reach the parameter error_max_retries, an exception will be thrown and rate_limit applies, if not null.

The parameter http_timeout_max_tries is of no use in this collector.

Generic Mail URL Fetcher

Information

  • name: intelmq.bots.collectors.mail.collector_mail_url

  • lookup: yes

  • public: yes

  • cache (redis db): none

  • description: collect messages from mailboxes, extract URLs from those messages and download the report messages from the URLs.

Configuration Parameters

  • Feed parameters (see above)

  • HTTP parameters (see above)

  • mail_host: FQDN or IP of mail server

  • mail_user: user account of the email account

  • mail_password: password associated with the user account

  • mail_port: IMAP server port, optional (default: 143 without SSL, 993 for SSL)

  • mail_ssl: whether the mail account uses SSL (default: true)

  • folder: folder in which to look for mails (default: INBOX)

  • subject_regex: regular expression to look for a subject

  • url_regex: regular expression of the feed URL to search for in the mail body

  • sent_from: filter messages by sender

  • sent_to: filter messages by recipient

  • ssl_ca_certificate: Optional string of path to trusted CA certificate. Applies only to IMAP connections, not HTTP. If the provided certificate is not found, the IMAP connection will fail on handshake. By default, no certificate is used.
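A minimal sketch of the parameters, assuming an IMAPS account and a feed whose mails contain an http(s) link to a CSV file (all values are illustrative):

"parameters": {
    "mail_host": "imap.example.com",
    "mail_user": "feeds@example.com",
    "mail_password": "secret",
    "mail_ssl": true,
    "folder": "INBOX",
    "subject_regex": "^Daily report",
    "url_regex": "https?://.*\\.csv"
}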

The resulting reports contain the following special fields:

  • feed.url: The URL the data was downloaded from

  • extra.email_date: The content of the email’s Date header

  • extra.email_subject: The subject of the email

  • extra.email_from: The email’s from address

  • extra.email_message_id: The email’s message ID

  • extra.file_name: The file name of the downloaded file (extracted from the HTTP Response Headers if possible).

Chunking

For line-based inputs the bot can split up large reports into smaller chunks.

This is particularly important for setups that use Redis as a message queue which has a per-message size limitation of 512 MB.

To configure chunking, set chunk_size to a value in bytes. chunk_replicate_header determines whether the header line should be repeated for each chunk that is passed on to a parser bot.

Specifically, to configure a large file input to work around Redis’ size limitation set chunk_size to something like 384000000, i.e., ~384 MB.
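For example, the chunking parameters could look like this (values are illustrative):

"parameters": {
    "chunk_size": 384000000,
    "chunk_replicate_header": true
}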

Generic Mail Attachment Fetcher

Information

  • name: intelmq.bots.collectors.mail.collector_mail_attach

  • lookup: yes

  • public: yes

  • cache (redis db): none

  • description: collect messages from mailboxes, download the report messages from the attachments.

Configuration Parameters

  • Feed parameters (see above)

  • extract_files: Optional, boolean or list of strings. See documentation of the Generic URL Fetcher for more details.

  • mail_host: FQDN or IP of mail server

  • mail_user: user account of the email account

  • mail_password: password associated with the user account

  • mail_port: IMAP server port, optional (default: 143 without SSL, 993 for SSL)

  • mail_ssl: whether the mail account uses SSL (default: true)

  • folder: folder in which to look for mails (default: INBOX)

  • subject_regex: regular expression to look for a subject

  • attach_regex: regular expression of the name of the attachment

  • attach_unzip: whether to unzip the attachment. Only extracts the first file. Deprecated, use extract_files instead.

  • sent_from: filter messages by sender

  • sent_to: filter messages by recipient

  • ssl_ca_certificate: Optional string of path to trusted CA certificate. Applies only to IMAP connections, not HTTP. If the provided certificate is not found, the IMAP connection will fail on handshake. By default, no certificate is used.

The resulting reports contain the following special fields:

  • extra.email_date: The content of the email’s Date header

  • extra.email_subject: The subject of the email

  • extra.email_from: The email’s from address

  • extra.email_message_id: The email’s message ID

  • extra.file_name: The file name of the attachment, or the file name within the attached archive if the attachment was uncompressed.

Generic Mail Body Fetcher

Information

  • name: intelmq.bots.collectors.mail.collector_mail_body

  • lookup: yes

  • public: yes

  • cache (redis db): none

  • description: collect messages from mailboxes and forward the bodies as reports. Each non-empty body with a matching content type is sent as an individual report.

Configuration Parameters

  • Feed parameters (see above)

  • mail_host: FQDN or IP of mail server

  • mail_user: user account of the email account

  • mail_password: password associated with the user account

  • mail_port: IMAP server port, optional (default: 143 without SSL, 993 for SSL)

  • mail_ssl: whether the mail account uses SSL (default: true)

  • folder: folder in which to look for mails (default: INBOX)

  • subject_regex: regular expression to look for a subject

  • sent_from: filter messages by sender

  • sent_to: filter messages by recipient

  • ssl_ca_certificate: Optional string of path to trusted CA certificate. Applies only to IMAP connections, not HTTP. If the provided certificate is not found, the IMAP connection will fail on handshake. By default, no certificate is used.

  • content_types: Which bodies to use based on the content type. Default: true (all, equivalent to [‘html’, ‘plain’]). Possible values:

    • a list of strings or a string of comma-separated values, e.g. [‘html’, ‘plain’]

    • true, false, null: same as the default value

    • a single string, e.g. ‘plain’

The resulting reports contain the following special fields:

  • extra.email_date: The content of the email’s Date header

  • extra.email_subject: The subject of the email

  • extra.email_from: The email’s from address

  • extra.email_message_id: The email’s message ID

Github API

Information

  • name: intelmq.bots.collectors.github_api.collector_github_contents_api

  • lookup: yes

  • public: yes

  • cache (redis db): none

  • description: Collects files matched by regular expression from GitHub repository via the GitHub API. Optionally with GitHub credentials, which are used as the Basic HTTP authentication.

Configuration Parameters

  • Feed parameters (see above)

  • basic_auth_username: GitHub account username (optional)

  • basic_auth_password: GitHub account password (optional)

  • repository: GitHub target repository (<USER>/<REPOSITORY>)

  • regex: Valid regular expression of target files within the repository (defaults to .*.json)

  • extra_fields: Comma-separated list of extra fields from GitHub contents API.

Workflow

The optional authentication parameters provide a higher limit for GitHub API requests: with GitHub user authentication, requests are rate-limited to 5000 per hour, otherwise to 60 requests per hour.

The collector recursively searches for regex-defined files in the provided repository. Additionally, it adds the extra file metadata defined by extra_fields.

The bot always sets the URL from which the file was downloaded as feed.url.

Fileinput

Information

  • name: intelmq.bots.collectors.file.collector_file

  • lookup: yes

  • public: yes

  • cache (redis db): none

  • description: This bot is capable of reading files from the local file-system. This is handy for testing purposes, or when you need to react to spontaneous events. In combination with the Generic CSV Parser this should work great.

Configuration Parameters

  • Feed parameters (see above)

  • path: path to file

  • postfix: The postfix (file ending) of the files to look for. For example .csv.

  • delete_file: whether to delete the file after reading (default: false)

The resulting reports contain the following special fields:

  • feed.url: The URI using the file:// scheme and localhost, with the full path to the processed file.

  • extra.file_name: The file name (without path) of the processed file.

Chunking

Additionally, for line-based inputs the bot can split up large reports into smaller chunks.

This is particularly important for setups that use Redis as a message queue which has a per-message size limitation of 512 MB.

To configure chunking, set chunk_size to a value in bytes. chunk_replicate_header determines whether the header line should be repeated for each chunk that is passed on to a parser bot.

Specifically, to configure a large file input to work around Redis’ size limitation set chunk_size to something like 384000000, i.e., ~384 MB.

Workflow

The bot loops over all files in path and tests if their file name matches postfix, e.g. `.csv`. If yes, the file will be read and inserted into the queue.

If delete_file is set, the file will be deleted after processing. If deletion is not possible, the bot will stop.

To prevent data loss, the bot also stops when no postfix is set and delete_file was set. This cannot be overridden.

The bot always sets the file name as feed.url.

Kafka

Requires the kafka python library.

Information

  • name: intelmq.bots.collectors.kafka.collector

Configuration parameters

  • topic: the kafka topic the collector should get messages from

  • bootstrap_servers: the kafka server(s) the collector should connect to. Defaults to localhost:9092

  • ssl_check_hostname: false to ignore verifying SSL certificates, or true (default) to verify SSL certificates

  • ssl_client_certificate: SSL client certificate to use.

  • ssl_ca_certificate: Optional string of path to trusted CA certificate. Only used by some bots.

Rsync

Requires the rsync executable

Information

  • name: intelmq.bots.collectors.rsync.collector_rsync

  • lookup: yes

  • public: yes

  • cache (redis db): none

  • description: Downloads a file via rsync and then loads data from the downloaded file. The downloaded file is located in var/lib/bots/rsync_collector.

Configuration Parameters

  • Feed parameters (see above)

  • file: Name of downloaded file.

  • rsync_path: Path to file. It can be “/home/username/directory” or “username@remote_host:/home/username/directory”

  • temp_directory: Path of a temporary state directory to use for rsync’d files. Optional. Default: /opt/intelmq/var/run/rsync_collector/.

MISP Generic

Information

  • name: intelmq.bots.collectors.misp.collector

  • lookup: yes

  • public: yes

  • cache (redis db): none

  • description: collect messages from MISP, a malware information sharing platform server.

Configuration Parameters

  • Feed parameters (see above)

  • misp_url: URL of MISP server (with trailing ‘/’)

  • misp_key: MISP Authkey

  • misp_tag_to_process: MISP tag for events to be processed

  • misp_tag_processed: MISP tag for processed events, optional

Generic parameters used in this bot:

  • http_verify_cert: Verify the TLS certificate of the server, boolean (default: true)

Workflow

This collector will search for events on a MISP server that have a to_process tag attached to them (see the misp_tag_to_process parameter) and collect them for processing by IntelMQ. Once the MISP event has been processed, the to_process tag is removed from the MISP event and a processed tag is attached instead (see the misp_tag_processed parameter).

NB. The MISP tags must be configured to be ‘exportable’ otherwise they will not be retrieved by the collector.

Request Tracker

Information

  • name: intelmq.bots.collectors.rt.collector_rt

  • lookup: yes

  • public: yes

  • cache (redis db): none

  • description: Request Tracker Collector fetches attachments from an RTIR instance.

You need the rt library >= 1.9 from nic.cz, available via PyPI: pip3 install rt

This bot will connect to RT and inspect the given search_queue for tickets matching all criteria in search_*. Any matches will be inspected: for each match, all (RT) attachments of the matching RT tickets are iterated over and, within this loop, the first attachment with a matching filename is processed. If none of the filename matches apply, the contents of the first (RT) “history” item are matched against the regular expression for the URL (url_regex).

Configuration Parameters

  • Feed parameters (see above)

  • HTTP parameters (see above)

  • extract_attachment: Optional, boolean or list of strings. See documentation of the Generic URL Fetcher parameter extract_files for more details.

  • extract_download: Optional, boolean or list of strings. See documentation of the Generic URL Fetcher parameter extract_files for more details.

  • uri: URL of the REST interface of the RT

  • user: RT username

  • password: RT password

  • search_not_older_than: Absolute time (use ISO format) or relative time, e.g. 3 days.

  • search_owner: owner of the ticket to search for (default: nobody)

  • search_queue: queue of the ticket to search for (default: Incident Reports)

  • search_status: status of the ticket to search for (default: new)

  • search_subject_like: part of the subject of the ticket to search for (default: Report)

  • set_status: status to set the ticket to after processing (default: open). false or null to not set a different status.

  • take_ticket: whether to take the ticket (default: true)

  • url_regex: regular expression of a URL to search for in the ticket

  • attachment_regex: regular expression of an attachment in the ticket

  • unzip_attachment: whether to unzip a found attachment. Only the first file in the archive is used. Deprecated in favor of extract_attachment.
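A hedged example combining the search and processing parameters described above (the URL, credentials and the attachment pattern are placeholders):

"parameters": {
    "uri": "http://localhost/rt/REST/1.0",
    "user": "intelmq",
    "password": "secret",
    "search_queue": "Incident Reports",
    "search_status": "new",
    "search_subject_like": "Report",
    "set_status": "open",
    "take_ticket": true,
    "attachment_regex": ".*\\.csv\\.zip$",
    "extract_attachment": true
}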

The parameter http_timeout_max_tries is of no use in this collector.

The resulting reports contain the following special fields:

  • rtir_id: The ticket ID

  • extra.email_subject and extra.ticket_subject: The subject of the ticket

  • extra.email_from and extra.ticket_requestors: Comma-separated list of the requestors’ email addresses.

  • extra.ticket_owner: The ticket’s owner name

  • extra.ticket_status: The ticket’s status

  • extra.ticket_queue: The ticket’s queue

  • extra.file_name: The name of the extracted file, the name of the downloaded file or the attachments’ filename without .gz postfix.

  • time.observation: The creation time of the ticket or attachment.

Search

The parameters prefixed with search_ allow configuring the ticket search.

Empty strings and null as value for search parameters are ignored.

File downloads

Attachments can be optionally unzipped, remote files are downloaded with the http_* settings applied (see defaults.conf).

If url_regex or attachment_regex are empty strings, false or null, they are ignored.

Ticket processing

Optionally, the RT bot can “take” RT tickets (i.e. the user is assigned this ticket now) and/or the status can be changed (leave set_status empty in case you don’t want to change the status). Please note however that you MUST do one of the following: either “take” the ticket or set the status (set_status). Otherwise, the search will find the ticket again every time, generating an endless loop.

In case a resource needs to be fetched and this resource is permanently not available (status code is 4xx), the ticket status will be set according to the configuration to avoid processing the ticket over and over. For temporary failures the status is not modified, instead the ticket will be skipped in this run.

Time search

To find only tickets newer than a given absolute or relative time, you can use the search_not_older_than parameter. Absolute time specification can be anything parseable by dateutil; best use an ISO format.

Relative must be in this format: [number] [timespan]s, e.g. 3 days. timespan can be hour, day, week, month, year. Trailing ‘s’ is supported for all timespans. Relative times are subtracted from the current time directly before the search is performed.

Rsync

Information

  • name: intelmq.bots.collectors.rsync.collector_rsync

  • lookup: yes

  • public: yes

  • cache (redis db): none

  • description: Syncs a file via rsync and reads the file.

Configuration Parameters

  • Feed parameters (see above)

  • file: The filename to process, combine with rsync_path.

  • temp_directory: The temporary directory for rsync, by default $VAR_STATE_PATH/rsync_collector. $VAR_STATE_PATH is /var/run/intelmq/ or /opt/intelmq/var/run/.

  • rsync_path: The path of the file to process

Shadowserver Reports API

The Cache is required to memorize which files have already been processed (TTL needs to be high enough to cover the oldest files available!).

Information

  • name: intelmq.bots.collectors.shadowserver.collector_reports_api

  • description: Connects to the Shadowserver API, requests a list of all the reports for a specific country and processes the ones that are new.

Configuration Parameters

  • country: The country you want to download the reports for

  • apikey: Your Shadowserver API key

  • secret: Your Shadowserver API secret

  • types: A list of strings or a string of comma-separated values with the names of report types you want to process. If you leave this empty, all the available reports will be downloaded and processed (i.e. ‘scan’, ‘drones’, ‘intel’, ‘sandbox_connection’, ‘sinkhole_combined’). The possible report types are equivalent to the file names given in the section Supported Reports of the Shadowserver parser.

  • Cache parameters (see in section Common parameters, the default TTL is set to 10 days)
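For illustration, a sketch of the parameters (the API credentials are placeholders; the cache database number is illustrative and the TTL corresponds to the 10-day default):

"parameters": {
    "country": "<your country>",
    "apikey": "<your API key>",
    "secret": "<your API secret>",
    "types": "scan,drones",
    "redis_cache_host": "127.0.0.1",
    "redis_cache_port": 6379,
    "redis_cache_db": 12,
    "redis_cache_ttl": 864000
}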

The resulting reports contain the following special field:

  • extra.file_name: The name of the downloaded file, with fixed filename extension. The API returns file names with the extension .csv, although the files are JSON, not CSV. Therefore, for clarity and better error detection in the parser, the file name in extra.file_name uses .json as extension.

Shodan Stream

Requires the shodan library to be installed.

Information

  • name: intelmq.bots.collectors.shodan.collector_stream

  • lookup: yes

  • public: yes

  • cache (redis db): none

  • description: Queries the Shodan Streaming API

Configuration Parameters

  • Feed parameters (see above)

  • HTTP parameters (see above). Only the proxy is used (requires shodan-python > 1.8.1). Certificate is always verified.

  • countries: A list of countries to query for. If it is a string, it will be split by ,.

If the stream is interrupted, the connection will be aborted using the timeout parameter. No error will be logged if the number of consecutive connection failures does not reach the parameter error_max_retries; instead of errors, an INFO message is logged. This is a countermeasure against too-frequent ERROR log messages. The counter of consecutive connection failures is reset once a data line has been transferred successfully. If the consecutive connection failures reach the parameter error_max_retries, an exception will be thrown and rate_limit applies, if not null.

TCP

Information

  • name: intelmq.bots.collectors.tcp.collector

  • lookup: no

  • public: yes

  • cache (redis db): none

  • description: TCP is the bot responsible for receiving events on a TCP port (e.g. from the TCP Output of another IntelMQ instance). Might not be working on Python 3.4.6.

Configuration Parameters

  • ip: IP of destination server

  • port: port of destination server

Response

The TCP collector just sends an “Ok” message after every received message; this should not pose a problem for arbitrary input. If you intend to link two IntelMQ instances via TCP, have a look at the TCP output bot documentation.

XMPP collector

Warning: This bot is deprecated and will be removed in version 3.0 of IntelMQ.

Warning: This bot is currently unmaintained. The used XMPP library sleekxmpp is deprecated. For more information see Issue #1614.

Information

  • name: intelmq.bots.collectors.xmpp.collector

  • lookup: yes

  • public: yes

  • cache (redis db): none

  • description: This bot can connect to an XMPP Server and one room, in order to receive reports from it. TLS is used by default. rate_limit is ineffective here. Bot can either pass the body or the whole event.

Requirements

The sleekxmpp library needs to be installed on your system:

pip3 install -r intelmq/bots/collectors/xmpp/REQUIREMENTS.txt

Configuration Parameters

  • Feed parameters (see above)

  • xmpp_server: The domain name of the server of the XMPP-Account (part after the @ sign)

  • xmpp_user: The username of the XMPP-Account the collector shall use (part before the @ sign)

  • xmpp_password: The password of the XMPP-Account

  • xmpp_room: The room which has to be joined by the XMPP-Collector (full address room@conference.server.tld)

  • xmpp_room_nick: The username / nickname the collector shall use within the room

  • xmpp_room_password: The password which might be required to join a room

  • use_muc: If this parameter is true, the bot will join the room xmpp_room.

  • xmpp_userlist: An array of usernames whose messages will (not) be processed.

  • xmpp_whitelist_mode: If true the list provided in xmpp_userlist is a whitelist. Else it is a blacklist.

    In case of a whitelist, only messages from the configured users will be processed, else their messages are not processed. Default is false / blacklist.

  • ca_certs: A path to a file containing the CA’s which should be used (default: /etc/ssl/certs/ca-certificates.crt)

  • strip_message: If true, trailing white space will be removed from the message. This does not happen if pass_full_xml is set to true (default: true)

  • pass_full_xml: If this parameter is set to true the collector will read the full-xmpp-xml message and add it to the pipeline.

    This is useful if messages from other systems like AbuseHelper should be processed. (default: false)

Alien Vault OTX

Information

  • name: intelmq.bots.collectors.alienvault_otx.collector

  • lookup: yes

  • public: yes

  • cache (redis db): none

  • description: collect report messages from Alien Vault OTX API

Requirements

Install the library from GitHub, as there is no package on PyPI:

pip3 install -r intelmq/bots/collectors/alienvault_otx/REQUIREMENTS.txt

Configuration Parameters

  • Feed parameters (see above)

  • api_key: API Key

  • modified_pulses_only: get only modified pulses instead of all; set it to true or false, default: false

  • interval: if modified_pulses_only is set, defines the time window in hours (integer value) to fetch pulses modified since then; default: 24 hours

Blueliv Crimeserver

Information

  • name: intelmq.bots.collectors.blueliv.collector_crimeserver

  • lookup: yes

  • public: no

  • cache (redis db): none

  • description: collect report messages from Blueliv API

For more information visit https://github.com/Blueliv/api-python-sdk

Requirements

Install the required library:

pip3 install -r intelmq/bots/collectors/blueliv/REQUIREMENTS.txt

Configuration Parameters

Calidog Certstream

A bot to collect data from the Certificate Transparency Log (CTL). It works based on the certstream library (https://github.com/CaliDog/certstream-python).

Information

  • name: intelmq.bots.collectors.calidog.collector_certstream

  • lookup: yes

  • public: no

  • cache (redis db): none

  • description: collect data from Certificate Transparency Log

Configuration Parameters

  • Feed parameters (see above)

ESET ETI

Information

  • name: intelmq.bots.collectors.eset.collector

  • lookup: yes

  • public: no

  • cache (redis db): none

  • description: collect data from ESET ETI TAXII server

For more information visit https://www.eset.com/int/business/services/threat-intelligence/

Requirements

Install the required cabby library:

pip3 install -r intelmq/bots/collectors/eset/REQUIREMENTS.txt

Configuration Parameters

  • Feed parameters (see above)

  • username: Your username

  • password: Your password

  • endpoint: eti.eset.com

  • time_delta: The time span to look back, in seconds. Default 3600.

  • collection: The collection to fetch.

McAfee openDXL

Information

  • name: intelmq.bots.collectors.opendxl.collector

  • lookup: yes

  • public: no

  • cache (redis db): none

  • description: collect messages via openDXL

Configuration Parameters

  • Feed parameters (see above)

  • dxl_config_file: location of the configuration file containing the required information to connect

  • dxl_topic: the name of the DXL topic to subscribe to

Microsoft Azure

Iterates over all blobs in all containers in an Azure storage. The Cache is required to memorize which files have already been processed (TTL needs to be high enough to cover the oldest files available!).

This bot significantly changed in a backwards-incompatible way in IntelMQ Version 2.2.0 to support current versions of the Microsoft Azure Python libraries.

Information

  • name: intelmq.bots.collectors.microsoft.collector_azure

  • lookup: yes

  • public: no

  • cache (redis db): 5

  • description: collect blobs from Microsoft Azure using their library

Configuration Parameters

  • Cache parameters (see above)

  • Feed parameters (see above)

  • connection_string: connection string as given by Microsoft

  • container_name: name of the container to connect to

Microsoft Interflow

Iterates over all files available via this API. Make sure to limit the files to be downloaded with the parameters, otherwise you will get a lot of data! The cache is used to remember which files have already been downloaded. Make sure the TTL is high enough, higher than not_older_than.

Information

  • name: intelmq.bots.collectors.microsoft.collector_interflow

  • lookup: yes

  • public: no

  • cache (redis db): 5

  • description: collect files from Microsoft Interflow using their API

Configuration Parameters

  • Feed parameters (see above)

  • api_key: API key generated in their portal

  • file_match: an optional regular expression to match file names

  • not_older_than: an optional relative (minutes) or absolute time (UTC is assumed) expression to determine the oldest time of a file to be downloaded

  • redis_cache_* and especially redis_cache_ttl: Settings for the cache where file names of downloaded files are saved. The cache’s TTL must always be bigger than not_older_than.

Additional functionalities

  • Files are automatically ungzipped if the filename ends with .gz.

Stomp

Information

  • name: intelmq.bots.collectors.stomp.collector

  • lookup: yes

  • public: no

  • cache (redis db): none

  • description: collect messages from a stomp server

Requirements

Install the stomp.py library from PyPI:

pip3 install -r intelmq/bots/collectors/stomp/REQUIREMENTS.txt

Configuration Parameters

  • Feed parameters (see above)

  • exchange: exchange point

  • port: 61614

  • server: hostname e.g. “n6stream.cert.pl”

  • ssl_ca_certificate: path to CA file

  • ssl_client_certificate: path to client cert file

  • ssl_client_certificate_key: path to client cert key file

Twitter

Collects tweets from target_timelines: up to tweet_count tweets from each user and up to timelimit back in time. The tweet text is sent separately and, if allowed, links to pastebin are followed and the text is sent in a separate report.

Information

  • name: intelmq.bots.collectors.twitter.collector_twitter

  • lookup: yes

  • public: yes

  • cache (redis db): none

  • description: Collects tweets

Configuration Parameters

  • Feed parameters (see above)

  • target_timelines: screen_names of twitter accounts to be followed

  • tweet_count: number of tweets to be taken from each account

  • timelimit: maximum age of the tweets collected in seconds

  • follow_urls: list of screen_names for which URLs will be followed

  • exclude_replies: exclude replies of the followed screen_names

  • include_rts: whether to include retweets by given screen_name

  • consumer_key: Twitter API login data

  • consumer_secret: Twitter API login data

  • access_token_key: Twitter API login data

  • access_token_secret: Twitter API login data

API collector bot

Information

  • name: intelmq.bots.collectors.api.collector_api

  • lookup: no

  • public: no

  • cache (redis db): none

  • description: Bot for collecting data via an API; you need to POST JSON to the /intelmq/push endpoint

example usage:

curl -X POST http://localhost:5000/intelmq/push -H 'Content-Type: application/json' --data '{"source.ip": "127.0.0.101", "classification.type": "backdoor"}'

Configuration Parameters

  • Feed parameters (see above)

  • port: 5000

Parser Bots

Not complete

This list is not complete. Look at intelmq/bots/BOTS or the list of parsers shown in the manager. But most parsers do not need configuration parameters.

TODO

AnubisNetworks Cyberfeed Stream

Information

  • name: intelmq.bots.parsers.anubisnetworks.parser

  • lookup: no

  • public: yes

  • cache (redis db): none

  • description: parses data from the AnubisNetworks Cyberfeed Stream

Description

The feed format changes over time. The parser supports at least data from 2016 and 2020.

Events with the Malware “TestSinkholingLoss” are ignored, as they are for the feed provider’s internal purpose only and should not be processed at all.

Configuration parameters

  • use_malware_family_as_classification_identifier: default: true. Use the malw.family field as classification.identifier. If false, check if it is the same as malw.variant: if it is the same, it is ignored, otherwise it is saved as extra.malware.family.

Generic CSV Parser

Lines starting with ‘#’ will be ignored. Headers won’t be interpreted.

Configuration parameters

  • “columns”: A list of strings or a string of comma-separated values with field names. The names must match the harmonization’s field names. Empty column specifications and columns named “__IGNORE__” are ignored. E.g.

    "columns": [
         "",
         "source.fqdn",
         "extra.http_host_header",
         "__IGNORE__"
    ],
    

    is equivalent to:

    "columns": ",source.fqdn,extra.http_host_header,"
    

    The first and the last column are not used in this example.

    It is possible to specify multiple columns using the | character. E.g.

    "columns": "source.url|source.fqdn|source.ip"
    

    First, the bot will try to parse the value as a URL; if that fails, it will try to parse it as an FQDN; if that fails, as an IP; if that also fails, an error will be raised. Some use cases:

    • mixed data set, e.g. URL/FQDN/IP/NETMASK “columns”: “source.url|source.fqdn|source.ip|source.network”

    • parse a value and ignore if it fails “columns”: “source.url|__IGNORE__”

  • “column_regex_search”: Optional. A dictionary mapping field names (as given per the columns parameter) to regular expression. The field is evaluated using re.search. Eg. to get the ASN out of AS1234 use: {“source.asn”: “[0-9]*”}. Make sure to properly escape any backslashes in your regular expression (See also #1579).

  • “compose_fields”: Optional, dictionary. Create fields from columns, e.g. with data like this:

    # Host,Path
    example.com,/foo/
    example.net,/bar/
    

    using this compose_fields parameter:

    {"source.url": "http://{0}{1}"}
    

    You get:

    http://example.com/foo/
    http://example.net/bar/
    

    in the respective source.url fields. The value in the dictionary mapping is formatted with Python string formatting, where the columns are available by their index.

  • “default_url_protocol”: For URLs you can give a default protocol which will be prepended to the data.

  • “delimiter”: separation character of the CSV, e.g. “,”

  • “skip_header”: Boolean, skip the first line of the file, optional. Lines starting with # will be skipped additionally; make sure you do not skip more lines than needed!

  • time_format: Optional. If “timestamp”, “windows_nt” or “epoch_millis” the time will be converted first. With the default null, fuzzy time parsing will be used.

  • “type”: set the classification.type statically, optional

  • “data_type”: sets the data type of specific columns; currently “json” is the only supported value. An example

    {
        "columns": [ "source.ip", "source.url", "extra.tags"],
        "data_type": "{\"extra.tags\":\"json\"}"
    }
    

    It will ensure extra.tags is treated as json.

  • “filter_text”: only process the lines containing or not containing specified text, to be used in conjunction with filter_type

  • “filter_type”: value can be whitelist or blacklist. If whitelist, only lines containing the text in filter_text will be processed, if blacklist, only lines NOT containing the text will be processed.

    To process ipset format files use

    {
         "filter_text": "ipset add ",
         "filter_type": "whitelist",
         "columns": [ "__IGNORE__", "__IGNORE__", "__IGNORE__", "source.ip"]
    }
    
  • “type_translation”: If the source does have a field with information for classification.type, but it does not correspond to IntelMQ’s types, you can map them to the correct ones. The type_translation field can hold a dictionary, or a string with a JSON dictionary which maps the feed’s values to IntelMQ’s. Example:

    {"malware_download": "malware-distribution"}
    
  • “columns_required”: A list of true/false for each column. By default, it is true for every column.
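Putting several of these parameters together, a hedged example configuration for a simple three-column feed (the column layout and the classification type are illustrative):

"parameters": {
    "columns": ["time.source", "source.ip", "malware.name"],
    "delimiter": ",",
    "skip_header": true,
    "default_url_protocol": "http://",
    "type": "infected-system"
}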

Calidog Certstream

Information

  • name: intelmq.bots.parsers.calidog.parser_certstream

  • lookup: no

  • public: yes

  • cache (redis db): none

  • description: parsers data from Certificate Transparency Log

Description

For each domain in the leaf_cert.all_domains object one event with the domain in source.fqdn (and source.ip as fallback) is produced. The seen-date is saved in time.source and the classification type is other.

  • Feed parameters (see above)

ESET

Information

  • name: intelmq.bots.parsers.eset.parser

  • lookup: no

  • public: yes

  • cache (redis db): none

  • description: Parses data from ESET ETI TAXII server

Description

Supported collections:

  • “ei.urls (json)”

  • “ei.domains v2 (json)”

Cymru CAP Program

Information

  • name: intelmq.bots.parsers.cymru.parser_cap_program

  • public: no

  • cache (redis db): none

  • description: Parses data from Cymru’s CAP program feed.

Description

There are two different feeds available:

  • infected_$date.txt (“old”)

  • $certname_$date.txt (“new”)

The new feed will replace the old one at some point in time; currently you need to fetch both. The parser handles both formats.

Old feed

As little information on the format is available, the mappings might not be correct in all cases. Some reports are not implemented at all as there is no data available to check if the parsing is correct at all. If you do get errors like Report … not implement or similar please open an issue and report the (anonymized) example data. Thanks.

The information about the event could be better in many cases but as Cymru does not want to be associated with the report, we can’t add comments to the events in the parser, because then the source would be easily identifiable for the recipient.

Cymru Full Bogons

http://www.team-cymru.com/bogon-reference.html

Information

  • name: intelmq.bots.parsers.cymru.parser_full_bogons

  • public: no

  • cache (redis db): none

  • description: Parses data from full bogons feed.

Github Feed

Information

  • name: intelmq.bots.parsers.github_feed.parser

  • description: Parses Feeds available publicly on GitHub (should receive from github_api collector)

Have I Been Pwned Callback Parser

Information

  • name: intelmq.bots.parsers.hibp.parser_callback

  • public: no

  • cache (redis db): none

  • description: Parses data from Have I Been Pwned feed.

Description

Parses the data from a Callback of a Have I Been Pwned Enterprise Subscription.

Parses breaches and pastes and creates one event per e-mail address. The e-mail address is stored in source.account. classification.type is leak and classification.identifier is breach or paste.

HTML Table Parser

Configuration parameters

  • “columns”: A list of strings or a string of comma-separated values with field names. The names must match the harmonization’s field names. Empty column specifications and columns named “__IGNORE__” are ignored. E.g.

    "columns": [
         "",
         "source.fqdn",
         "extra.http_host_header",
         "__IGNORE__"
    ],
    

    is equivalent to:

    "columns": ",source.fqdn,extra.http_host_header,"
    

    The first and the last column are not used in this example. It is possible to specify multiple columns using the | character. E.g.

    "columns": "source.url|source.fqdn|source.ip"
    

    First, the bot will try to parse the value as a URL; if that fails, it will try to parse it as an FQDN; if that fails, as an IP; if that also fails, an error will be raised. Some use cases:

    • mixed data set, e.g. URL/FQDN/IP/NETMASK “columns”: “source.url|source.fqdn|source.ip|source.network”

    • parse a value and ignore if it fails “columns”: “source.url|__IGNORE__”

  • “ignore_values”: A list of strings or a string of comma-separated values which will not be considered while assigning to the corresponding fields given in columns. E.g.

    "ignore_values": [
         "",
         "unknown",
         "Not listed",
     ],
    

    is equivalent to:

    "ignore_values": ",unknown,Not listed,"
    

    The following configuration will lead to assigning all values to malware.name and extra.SBL except unknown and Not listed respectively.

    "columns": [
         "source.url",
         "malware.name",
         "extra.SBL",
    ],
    "ignore_values": [
         "",
         "unknown",
         "Not listed",
    ],
    

    The parameters columns and ignore_values must have the same length.

  • “attribute_name”: Filtering table with table attributes, to be used in conjunction with attribute_value, optional. E.g. class, id, style.

  • “attribute_value”: String. To filter all tables with attribute class=’details’ use

    "attribute_name": "class",
    "attribute_value": "details"
    
  • “table_index”: Index of the table if multiple tables present. If attribute_name and attribute_value given, index according to tables remaining after filtering with table attribute. Default: 0.

  • “split_column”: Padded column to be split to get values, to be used in conjunction with split_separator and split_index, optional.

  • “split_separator”: Delimiter string for padded column.

  • “split_index”: Index of unpadded string in returned list from splitting split_column with split_separator as delimiter string. Default: 0.

    E.g.

    "split_column": "source.fqdn",
    "split_separator": " ",
    "split_index": 1,
    

    With above configuration, column corresponding to source.fqdn with value [D] lingvaworld.ru will be assigned as “source.fqdn”: “lingvaworld.ru”.

  • “skip_table_head”: Boolean, skip the first row of the table, optional. Default: true.

  • “default_url_protocol”: For URLs you can give a default protocol which will be prepended to the data. Default: “http://”.

  • “time_format”: Optional. If “timestamp”, “windows_nt” or “epoch_millis” the time will be converted first. With the default null, fuzzy time parsing will be used.

  • “type”: set the classification.type statically, optional

  • “html_parser”: The HTML parser to use, by default “html.parser”, can also be e.g. “lxml”, have a look at https://www.crummy.com/software/BeautifulSoup/bs4/doc/

Key-Value Parser

Information

  • name: intelmq.bots.parsers.key_value.parser

  • lookup: no

  • public: no

  • cache (redis db): none

  • description: Parses text lines in key=value format, for example FortiGate firewall logs.

Configuration Parameters

  • pair_separator: String separating key=value pairs, default: " " (space).

  • kv_separator: String separating key and value, default =.

  • keys: Array of string->string, names of keys to propagate mapped to IntelMQ event fields. Example:

    "keys": {
        "srcip": "source.ip",
        "dstip": "destination.ip"
    }
    

    The value mapped to time.source is parsed. If the value is numeric, it is interpreted. Otherwise, or if it fails, it is parsed fuzzy with dateutil. If the value cannot be parsed, a warning is logged per line.

  • strip_quotes: Boolean, remove opening and closing quotes from values, default true.
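As a sketch, a configuration for FortiGate-style logs (the source key names srcip, dstip and date are illustrative and depend on the actual log format):

"parameters": {
    "pair_separator": " ",
    "kv_separator": "=",
    "strip_quotes": true,
    "keys": {
        "srcip": "source.ip",
        "dstip": "destination.ip",
        "date": "time.source"
    }
}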

Parsing limitations

The input must not have (quoted) occurrences of the separator in the values. For example, this is not parsable (with space as separator):

key="long value" key2="other value"

In firewall logs like FortiGate, this does not occur. These logs usually look like:

srcip=192.0.2.1 srcmac="00:00:5e:00:17:17"

McAfee Advanced Threat Defense File

Information

  • name: intelmq.bots.parsers.mcafee.parser_atd_file

  • lookup: yes

  • public: no

  • cache (redis db): none

  • description: parses file hash information from ATD reports

Configuration Parameters

  • Feed parameters (see above)

  • verdict_severity: min report severity to parse

McAfee Advanced Threat Defense IP

Information

  • name: intelmq.bots.parsers.mcafee.parser_atd_file

  • lookup: yes

  • public: no

  • cache (redis db): none

  • description: parses IP addresses from ATD reports

Configuration Parameters

  • Feed parameters (see above)

  • verdict_severity: min report severity to parse

McAfee Advanced Threat Defense URL

Information

  • name: intelmq.bots.parsers.mcafee.parser_atd_file

  • lookup: yes

  • public: no

  • cache (redis db): none

  • description: parses URLs from ATD reports

Configuration Parameters

  • Feed parameters (see above)

  • verdict_severity: min report severity to parse

Microsoft CTIP Parser

  • name: intelmq.bots.parsers.microsoft.parser_ctip

  • public: no

  • cache (redis db): none

  • description: Parses data from the Microsoft CTIP Feed

Description

Can parse the JSON format provided by the Interflow interface (lists of dictionaries) as well as the format provided by the Azure interface (one dictionary per line). The provided data differs between the two formats/providers.

The parser is capable of parsing both feeds:

  • ctip-c2

  • ctip-infected-summary

The feeds only differ by a few fields, not in the format.

The feeds contain a field called Payload which is nearly always a base64 encoded JSON structure. If decoding works, the contained fields are saved as extra.payload.*, otherwise the field is saved as extra.payload.text.

MISP

  • name: intelmq.bots.parsers.misp.parser

  • public: no

  • cache (redis db): none

  • description: Parses MISP events

Description

MISP events collected by the MISPCollectorBot are passed to this parser for processing. Supported MISP event categories and attribute types are defined in the SUPPORTED_MISP_CATEGORIES and MISP_TYPE_MAPPING class constants.

n6

Information

  • name: intelmq.bots.parsers.n6.parser_n6stomp

  • public: no

  • cache (redis db): none

  • description: Convert n6 data into IntelMQ format.

Configuration Parameters

None

Description

Test messages are ignored, and this is logged at debug logging level. The bot also contains a mapping for the classification (resulting in taxonomy, type and identifier). The name field is normally used as malware.name. If that fails due to disallowed characters, these characters are removed and the original value is saved as event_description.text. This can happen for names like “further iocs: text with invalid ’ char”.

If an n6 message contains multiple IP addresses, multiple events are generated, resulting in events only differing in the address information.

Twitter

Information

  • name: intelmq.bots.parsers.twitter.parser

  • public: no

  • cache (redis db): none

  • description: Extracts URLs from text, fuzzy, aimed at parsing tweets

Configuration Parameters

  • domain_whitelist: domains to be filtered out

  • substitutions: semicolon delimited list of even length of pairs of substitutions (for example: ‘[.];.;,;.’ substitutes ‘[.]’ for ‘.’ and ‘,’ for ‘.’)

  • classification_type: string with a valid classification type as defined in data harmonization

  • default_scheme: Default scheme for URLs if not given. See also the next section.
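A hedged example of the parameters described above (all values are illustrative):

"parameters": {
    "domain_whitelist": "twitter.com",
    "substitutions": "[.];.;,;.",
    "classification_type": "blacklist",
    "default_scheme": "http"
}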

Default scheme

The dependency url-normalize changed its behavior in version 1.4.0 from using http:// as the default scheme to https://. Version 1.4.1 added the possibility to specify it. Thus you can only use the default_scheme parameter with a current version of this library >= 1.4.1; with 1.4.0 you will always get https:// as the default scheme, and for older versions < 1.4.0, http:// is used.

This does not affect URLs which already include the scheme.

Shadowserver

There are two Shadowserver parsers, one for data in CSV format (intelmq.bots.parsers.shadowserver.parser) and one for data in JSON format (intelmq.bots.parsers.shadowserver.parser_json). The latter was added in IntelMQ 2.3 and is meant to be used together with the Shadowserver API collector.

Information

  • name: intelmq.bots.parsers.shadowserver.parser (for CSV data) or intelmq.bots.parsers.shadowserver.parser_json (for JSON data)

  • public: yes

  • description: Parses different reports from Shadowserver.

Configuration Parameters

  • feedname: Optional, the name of the feed; see the list below for possible values.

  • overwrite: If an existing feed.name should be overwritten.

How this bot works

There are two possibilities for the bot to determine which feed the data belongs to in order to determine the correct mapping of the columns:

Automatic feed detection

Since IntelMQ version 2.1 the parser can detect the feed based on metadata provided by the collector.

When processing a report, this bot takes extra.file_name from the report and looks up in _config.py how the report should be parsed.

If this lookup is not possible, and the feed name is not given as parameter, the feed cannot be parsed.

The field extra.file_name has the following structure: %Y-%m-%d-${report_name}[-suffix].csv where suffix can be something like country-geo. For example, some possible filenames are 2019-01-01-scan_http-country-geo.csv or 2019-01-01-scan_tftp.csv. The important part is ${report_name}, between the date and the suffix. Since version 2.1.2 the date in the filename is optional, so filenames like scan_tftp.csv are also detected.

Fixed feed name

If the method above is not possible and for upgraded instances, the feed can be set with the feedname parameter. Feed-names are derived from the subjects of the Shadowserver E-Mails. A list of possible feeds can be found in the table below in the column “feed name”.
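For example, to parse data from the Accessible-ADB feed with a fixed feed name, a minimal sketch could be:

"parameters": {
    "feedname": "Accessible-ADB",
    "overwrite": true
}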

Supported reports

These are the supported feed names and their corresponding file names for automatic detection:

feed name | file name
--- | ---
Accessible-ADB | scan_adb
Accessible-AFP | scan_afp
Accessible-ARD | scan_ard
Accessible-Cisco-Smart-Install | cisco_smart_install
Accessible-CoAP | scan_coap
Accessible-CWMP | scan_cwmp
Accessible-MS-RDPEUDP | scan_msrdpeudp
Accessible-FTP | scan_ftp
Accessible-Hadoop | scan_hadoop
Accessible-HTTP | scan_http
Accessible-Radmin | scan_radmin
Accessible-RDP | scan_rdp
Accessible-Rsync | scan_rsync
Accessible-SMB | scan_smb
Accessible-Telnet | scan_telnet
Accessible-Ubiquiti-Discovery-Service | scan_ubiquiti
Accessible-VNC | scan_vnc
Blacklisted-IP (deprecated) | blacklist
Blocklist | blocklist
Compromised-Website | compromised_website
DNS-Open-Resolvers | scan_dns
Honeypot-Amplification-DDoS-Events | event4_honeypot_ddos_amp
Honeypot-Brute-Force-Events | event4_honeypot_brute_force
Honeypot-Darknet | event4_honeypot_darknet
HTTP-Scanners | hp_http_scan
ICS-Scanners | hp_ics_scan
IP-Spoofer-Events | event4_ip_spoofer
NTP-Monitor | scan_ntpmonitor
NTP-Version | scan_ntp
Open-Chargen | scan_chargen
Open-DB2-Discovery-Service | scan_db2
Open-Elasticsearch | scan_elasticsearch
Open-IPMI | scan_ipmi
Open-IPP | scan_ipp
Open-LDAP | scan_ldap
Open-LDAP-TCP | scan_ldap_tcp
Open-mDNS | scan_mdns
Open-Memcached | scan_memcached
Open-MongoDB | scan_mongodb
Open-MQTT | scan_mqtt
Open-MSSQL | scan_mssql
Open-NATPMP | scan_nat_pmp
Open-NetBIOS-Nameservice | scan_netbios
Open-Netis | netis_router
Open-Portmapper | scan_portmapper
Open-QOTD | scan_qotd
Open-Redis | scan_redis
Open-SNMP | scan_snmp
Open-SSDP | scan_ssdp
Open-TFTP | scan_tftp
Open-XDMCP | scan_xdmcp
Outdated-DNSSEC-Key | outdated_dnssec_key
Outdated-DNSSEC-Key-IPv6 | outdated_dnssec_key_v6
Sandbox-URL | cwsandbox_url
Sinkhole-DNS | sinkhole_dns
Sinkhole-Events | event4_sinkhole/event6_sinkhole
Sinkhole-HTTP-Events | event4_sinkhole_http/event6_sinkhole_http
Sinkhole-Events-HTTP-Referer | event4_sinkhole_http_referer/event6_sinkhole_http_referer
Spam-URL | spam_url
SSL-FREAK-Vulnerable-Servers | scan_ssl_freak
SSL-POODLE-Vulnerable-Servers | scan_ssl_poodle
Vulnerable-Exchange-Server * | scan_exchange
Vulnerable-ISAKMP | scan_isakmp
Vulnerable-HTTP | scan_http

* This report can also contain data on active webshells (column tag is exchange;webshell); such hosts are therefore not only vulnerable but also actively infected.

In addition, the following legacy reports are supported:

feed name | successor feed name | file name
--- | --- | ---
Amplification-DDoS-Victim | Honeypot-Amplification-DDoS-Events | ddos_amplification
CAIDA-IP-Spoofer | IP-Spoofer-Events | caida_ip_spoofer
Darknet | Honeypot-Darknet | darknet
Drone | Sinkhole-Events | botnet_drone
Drone-Brute-Force | Honeypot-Brute-Force-Events, Sinkhole-HTTP-Events | drone_brute_force
Microsoft-Sinkhole | Sinkhole-HTTP-Events | microsoft_sinkhole
Sinkhole-HTTP-Drone | Sinkhole-HTTP-Events | sinkhole_http_drone
IPv6-Sinkhole-HTTP-Drone | Sinkhole-HTTP-Events | sinkhole6_http

More information on these legacy reports can be found in Changes in Sinkhole and Honeypot Report Types and Formats.

Development

Structure of this Parser Bot

The parser consists of two files:

  • _config.py

  • parser.py or parser_json.py

Both files are required for the parser to work properly.

Add new Feedformats

Add a new feed format and conversions if required to the file _config.py. Don’t forget to update the mapping dict. It is required to look up the correct configuration.

Look at the documentation in the bot’s _config.py file for more information.

Shodan

Information

  • name: intelmq.bots.parsers.shodan.parser

  • public: yes

  • description: Parses data from Shodan (search, stream etc).

The parser is by far not complete, as there are a lot of fields in a big nested structure. There is a minimal mode available which only parses the important/most useful fields and also saves everything in extra.shodan, keeping the original structure. When not using the minimal mode, it may be useful to ignore errors as many parsing errors can happen with the incomplete mapping.

Configuration Parameters

  • ignore_errors: Boolean (default true)

  • minimal_mode: Boolean (default false)

ZoneH

Information

  • name: intelmq.bots.parsers.zoneh.parser

  • public: yes

  • description: Parses data from ZoneH.

Description

This bot is designed to consume defacement reports from zone-h.org. It expects fields normally present in CSV files distributed by email.

Expert Bots

Abusix

Information

  • name: abusix

  • lookup: dns

  • public: yes

  • cache (redis db): 5

  • description: RIPE abuse contacts resolving through DNS TXT queries

  • notes: https://abusix.com/contactdb.html

Configuration Parameters

Requirements

This bot can optionally use the python module querycontacts by Abusix itself: https://pypi.org/project/querycontacts/

pip3 install querycontacts

If the package is not installed, our own routines are used.

ASN Lookup

Information

  • name: ASN lookup

  • lookup: local database

  • public: yes

  • cache (redis db): none

  • description: IP to ASN

Configuration Parameters

  • database: Path to the downloaded database.

Requirements

Install pyasn module

pip3 install pyasn

Database

Use this command to create/update the database and reload the bot:

intelmq.bots.experts.asn_lookup.expert --update-database

The database is fetched from routeviews.org (http://www.routeviews.org/routeviews/) and licensed under the Creative Commons Attribution 4.0 International license (see the FAQ: http://www.routeviews.org/routeviews/index.php/faq/#faq-6666).

CSV Converter

Information

  • name: intelmq.bots.experts.csv_converter.expert

  • lookup: no

  • public: yes

  • cache (redis db): none

  • description: Converts an event to CSV format, saved in the output field.

Configuration Parameters

  • delimiter: String, default “,”

  • fieldnames: Comma-separated list of field names, e.g. “time.source,classification.type,source.ip”
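A minimal sketch of the parameters (values are illustrative):

"parameters": {
    "delimiter": ";",
    "fieldnames": "time.source,classification.type,source.ip"
}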

Usage

To use the CSV-converted data in an output bot, for example a file output, set the output bot’s configuration parameter single_key to output.

Cymru Whois

Information

  • name: cymru-whois

  • lookup: Cymru DNS

  • public: yes

  • cache (redis db): 5

  • description: IP to geolocation, ASN, BGP prefix

Public documentation: https://www.team-cymru.com/IP-ASN-mapping.html#dns

Configuration Parameters

  • Cache parameters (see in section Common parameters)

  • overwrite: Overwrite existing fields. Default: True if not given (for backwards compatibility, will change in version 3.0.0)

Domain Suffix

This bot adds the public suffix to the event, derived from a domain. See the public suffix list for more information: https://publicsuffix.org/list/ Only rules for ICANN domains are processed. The list can (and should) contain Unicode data; punycode conversion is done during reading.

Note that the public suffix is not the same as the top level domain (TLD). E.g. co.uk is a public suffix, but the TLD is uk. Privately registered suffixes (such as blogspot.co.at) which are part of the public suffix list too, are ignored.

Information

  • name: domain suffix

  • lookup: no

  • public: yes

  • cache (redis db): -

  • description: extracts the domain suffix from the FQDN

Configuration Parameters

  • field: either “fqdn” or “reverse_dns”

  • suffix_file: path to the suffix file
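For example (the suffix file path is illustrative and depends on where you store the downloaded public suffix list):

"parameters": {
    "field": "fqdn",
    "suffix_file": "/opt/intelmq/var/lib/bots/domain_suffix/public_suffix_list.dat"
}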

Rule processing

A short summary how the rules are processed:

The simple ones:

com
at
gv.at

example.com leads to com, example.gv.at leads to gv.at.

Wildcards:

*.example.com

www.example.com leads to www.example.com.

And additionally the exceptions, together with the above wildcard rule:

!www.example.com

www.example.com does now not lead to www.example.com, but to example.com.

Deduplicator

Information

  • name: deduplicator

  • lookup: redis cache

  • public: yes

  • cache (redis db): 6

  • description: Bot responsible for ignoring duplicated messages. The bot can be configured to perform deduplication looking only at specific fields of the message.

Configuration Parameters

  • Cache parameters (see in section Common parameters)

  • bypass: true or false value to bypass the deduplicator. When set to true, messages will not be deduplicated. Default: false

Parameters for “fine-grained” deduplication

  • filter_type: type of the filtering which can be “blacklist” or “whitelist”. The filter type will be used to define how the Deduplicator bot will interpret the parameter filter_keys in order to decide whether an event has already been seen or not, i.e., whether it is a duplicated event or a completely new event.

    • “whitelist” configuration: only the keys listed in filter_keys will be considered to verify if an event is duplicated or not.

    • “blacklist” configuration: all keys except those in filter_keys will be considered to verify if an event is duplicated or not.

  • filter_keys: string with multiple keys separated by comma. Please note that the time.observation key will not be considered even if defined, because the system always ignores that key.

Parameters Configuration Example

Example 1

The bot with this configuration will detect duplication only based on source.ip and destination.ip keys.

"parameters": {
    "redis_cache_db": 6,
    "redis_cache_host": "127.0.0.1",
    "redis_cache_password": null,
    "redis_cache_port": 6379,
    "redis_cache_ttl": 86400,
    "filter_type": "whitelist",
    "filter_keys": "source.ip,destination.ip",
}

Example 2

The bot with this configuration will detect duplication based on all keys, except source.ip and destination.ip keys.

"parameters": {
    "redis_cache_db": 6,
    "redis_cache_host": "127.0.0.1",
    "redis_cache_password": null,
    "redis_cache_port": 6379,
    "redis_cache_ttl": 86400,
    "filter_type": "blacklist",
    "filter_keys": "source.ip,destination.ip",
}

Flushing the cache

To flush the deduplicator’s cache, you can use the redis-cli tool. Enter the database used by the bot and submit the flushdb command:

redis-cli -n 6
flushdb

DO Portal Expert Bot

Information

  • name: do_portal

  • lookup: yes

  • public: no

  • cache (redis db): none

  • description: The DO portal retrieves the contact information from a DO portal instance: http://github.com/certat/do-portal/

Configuration Parameters

  • mode - Either replace or append the new abuse contacts in case there are existing ones.

  • portal_url - The URL to the portal, without the API-path. The used URL is $portal_url + ‘/api/1.0/ripe/contact?cidr=%s’.

  • portal_api_key - The API key of the user to be used. Must have sufficient privileges.
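
A minimal parameters sketch (the portal URL and the API key are placeholders):

"parameters": {
    "mode": "append",
    "portal_url": "https://do-portal.example.com",
    "portal_api_key": "<API key>"
}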

Field Reducer Bot

Information

  • name: reducer

  • lookup: none

  • public: yes

  • cache (redis db): none

  • description: The field reducer bot is capable of removing fields from events.

Configuration Parameters

  • type - either “whitelist” or “blacklist”

  • keys - Can be a JSON-list of field names (["raw", "source.account"]) or a string with a comma-separated list of field names ("raw,source.account").

Whitelist

Only the fields in keys will be passed along.

Blacklist

The fields in keys will be removed from events.
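
For illustration, a whitelist parameters sketch that keeps only a few fields (the field selection is an example only):

"parameters": {
    "type": "whitelist",
    "keys": "time.source,source.ip,classification.type"
}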

Filter

The filter bot is capable of filtering specific events.

Information

  • name: filter

  • lookup: none

  • public: yes

  • cache (redis db): none

  • description: filter messages (drop or pass them)

Configuration Parameters

Parameters for filtering with key/value attributes

  • filter_key - key from data harmonization

  • filter_value - value for the key

  • filter_action - action when a message matches the criteria (possible actions: keep/drop)

  • filter_regex - attribute determines whether the filter_value shall be treated as a regular expression or not.

    If this attribute is not empty, the bot uses Python’s re.search function to evaluate the filter.

Parameters for time based filtering

  • not_before - events before this time will be dropped

  • not_after - events after this time will be dropped

Both parameters accept string values describing absolute or relative time:

  • absolute

  • basically anything parseable by the datetime parser, e.g. “2015-09-12T06:22:11+00:00”

  • time.source taken from the event will be compared to this value to decide the filter behavior

  • relative

  • accepted string formatted like this “<integer> <epoch>”, where epoch could be any of following strings (could optionally end with trailing ‘s’): hour, day, week, month, year

  • time.source taken from the event will be compared to the value (now - relative) to decide the filter behavior

Examples of time filter definition

  • `"not_before" : "2015-09-012T06:22:11+00:00"` events older than the specified time will be dropped

  • `"not_after" : "6 months"` just events older than 6 months will be passed through the pipeline

Possible paths

  • _default: default path, according to the configuration

  • action_other: Negation of the default path

  • filter_match: For all events the filter matched on

  • filter_no_match: For all events the filter does not match

action   match   _default   action_other   filter_match   filter_no_match
keep     true    ✓          ✗              ✓              ✗
keep     false   ✗          ✓              ✗              ✓
drop     true    ✗          ✓              ✓              ✗
drop     false   ✓          ✗              ✗              ✓

In DEBUG logging level, one can see that the message is sent to both matching paths, even if one of the paths is not configured. Of course the message is only delivered to the configured paths.

Format Field

Information

  • name: Format Field

  • lookup: none

  • cache (redis db): none

  • description: String method operations on column values

Configuration Parameters

Parameters for stripping chars

  • strip_columns - A list of strings or a string of comma-separated values with field names. The names must match the harmonization’s field names. E.g.

    "strip_columns": [
         "malware.name",
         "extra.tags"
    ],

    is equivalent to:

    "strip_columns": "malware.name,extra.tags"

  • strip_chars - a set of characters to remove as leading/trailing characters (default: whitespace)

Parameters for replacing chars

  • replace_column - key from data harmonization

  • old_value - the string to search for

  • new_value - the string to replace the old value with

  • replace_count - number specifying how many occurrences of the old value you want to replace (default: 1)

Parameters for splitting string to list of string

  • split_column - key from data harmonization

  • split_separator - specifies the separator to use when splitting the string (default: ,)

Order of operation: strip -> replace -> split. These three methods can be combined, e.g. first strip and then split; a combined configuration is sketched below.
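
For example, a minimal parameters sketch (field names and values are illustrative only) that first strips whitespace from extra.tags and then splits it into a list:

"parameters": {
    "strip_columns": "extra.tags",
    "strip_chars": " ",
    "split_column": "extra.tags",
    "split_separator": ","
}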

Generic DB Lookup

This bot is capable of enriching intelmq events by lookups to a database. Currently only PostgreSQL and SQLite are supported.

If more than one result is returned, a ValueError is raised.

Information

  • name: intelmq.bots.experts.generic_db_lookup.expert

  • lookup: database

  • public: yes

  • cache (redis db): none

  • description: This bot is capable of enriching intelmq events by lookups to a database.

Configuration Parameters

Connection

  • engine: postgresql or sqlite

  • database: string, defaults to “intelmq”, database name or the SQLite filename

  • table: defaults to “contacts”

PostgreSQL specific

  • host: string, defaults to “localhost”

  • password: string

  • port: integer, defaults to 5432

  • sslmode: string, defaults to “require”

  • user: defaults to “intelmq”

Lookup

  • match_fields: defaults to {“source.asn”: “asn”}

The value is a key-value mapping of an arbitrary number of intelmq field names to table column names. The values are compared with = only.

Replace fields

  • overwrite: defaults to false. Is applied per field

  • replace_fields: defaults to {“contact”: “source.abuse_contact”}

replace_fields is again a key-value mapping of an arbitrary number of table column names to intelmq field names.
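
Putting the above together, a parameters sketch using the documented defaults (the password is a placeholder):

"parameters": {
    "engine": "postgresql",
    "host": "localhost",
    "port": 5432,
    "database": "intelmq",
    "user": "intelmq",
    "password": "<password>",
    "sslmode": "require",
    "table": "contacts",
    "match_fields": {"source.asn": "asn"},
    "overwrite": false,
    "replace_fields": {"contact": "source.abuse_contact"}
}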

Gethostbyname

Information

  • name: gethostbyname

  • lookup: DNS

  • public: yes

  • cache (redis db): none

  • description: DNS name (FQDN) to IP

Configuration Parameters

  • fallback_to_url: If true and no source.fqdn is present, use source.url instead while producing source.ip

  • gaierrors_to_ignore: Optional, list (comma-separated) of gaierror codes to ignore, e.g. -3 for EAI_AGAIN (Temporary failure in name resolution). Only accepts the integer values, not the names.

  • overwrite: Boolean. If true, overwrite existing IP addresses. Default: False.

Description

Resolves the source/destination.fqdn hostname using the gethostbyname syscall and saves the resulting IP address as source/destination.ip. The following gaierror resolution errors are ignored and treated as if the hostname cannot be resolved:

  • -2/EAI_NONAME: NAME or SERVICE is unknown

  • -4/EAI_FAIL: Non-recoverable failure in name res.

  • -5/EAI_NODATA: No address associated with NAME.

  • -8/EAI_SERVICE: SERVICE not supported for `ai_socktype’.

  • -11/EAI_SYSTEM: System error returned in `errno’.

Other errors result in an exception if not ignored by the parameter gaierrors_to_ignore (see above). All gaierrors can be found here: http://www.castaglia.org/proftpd/doc/devel-guide/src/lib/glibc-gai_strerror.c.html

IDEA Converter

Converts the event to IDEA format and saves it as JSON in the field output. All other fields are not modified.

Documentation about IDEA: https://idea.cesnet.cz/en/index

Information

  • name: intelmq.bots.experts.idea.expert

  • lookup: no

  • public: yes

  • cache (redis db): none

  • description: The bot does a best effort translation of events into the IDEA format.

Configuration Parameters

  • test_mode: add Test category to mark all outgoing IDEA events as informal (meant to simplify setting up and debugging new IDEA producers) (default: true)

MaxMind GeoIP

Information

  • name: intelmq.bots.experts.maxmind_geoip.expert

  • lookup: local database

  • public: yes

  • cache (redis db): none

  • description: IP to geolocation

Setup

The bot requires MaxMind’s geoip2 Python library; version 2.2.0 has been tested.

To download the database a free license key is required. More information can be found at https://blog.maxmind.com/2019/12/18/significant-changes-to-accessing-and-using-geolite2-databases/

Configuration Parameters

  • database: Path to the local database, e.g. “/opt/intelmq/var/lib/bots/maxmind_geoip/GeoLite2-City.mmdb”

  • overwrite: boolean

  • use_registered: boolean. MaxMind has two country ISO codes: One for the physical location of the address and one for the registered location. Default is false (backwards-compatibility). See also https://github.com/certtools/intelmq/pull/1344 for a short explanation.

  • license_key: License key is necessary for downloading the GeoLite2 database.

Database

Use this command to create/update the database and reload the bot:

intelmq.bots.experts.maxmind_geoip.expert --update-database

MISP

Queries a MISP instance for the source.ip and adds the MISP Attribute UUID and MISP Event ID of the newest attribute found.

Information

  • name: intelmq.bots.experts.misp.expert

  • lookup: yes

  • public: no

  • cache (redis db): none

  • description: IP address to MISP attribute and event

Configuration Parameters

  • misp_key: MISP Authkey

  • misp_url: URL of MISP server (with trailing ‘/’)

Generic parameters used in this bot:

  • http_verify_cert: Verify the TLS certificate of the server, boolean (default: true)

McAfee Active Response Hash lookup

Information

  • name: intelmq.bots.experts.mcafee.expert_mar

  • lookup: yes

  • public: no

  • cache (redis db): none

  • description: Queries occurrences of hashes within local environment

Configuration Parameters

  • Feed parameters (see above)

  • dxl_config_file: location of file containing required information to connect to DXL bus

  • lookup_type: One of: - Hash: looks up malware.hash.md5, malware.hash.sha1 and malware.hash.sha256 - DestSocket: looks up destination.ip and destination.port - DestIP: looks up destination.ip - DestFQDN: looks up in destination.fqdn

McAfee Active Response IP lookup

Information

  • name: intelmq.bots.experts.mcafee.expert_mar_ip

  • lookup: yes

  • public: no

  • cache (redis db): none

  • description: Queries occurrences of connection attempts to destination ip/port within local environment

Configuration Parameters

  • Feed parameters (see above)

  • dxl_config_file: location of file containing required information to connect to DXL bus

McAfee Active Response URL lookup

Information

  • name: intelmq.bots.experts.mcafee.expert_mar_url

  • lookup: yes

  • public: no

  • cache (redis db): none

  • description: Queries occurrences of FQDN lookups within local environment

Configuration Parameters

  • Feed parameters (see above)

  • dxl_config_file: location of file containing required information to connect to DXL bus

Modify

Information

  • name: modify

  • lookup: local config

  • public: yes

  • cache (redis db): none

  • description: modify expert bot allows you to change arbitrary field values of events just using a configuration file

Configuration Parameters

  • configuration_path: filename

  • case_sensitive: boolean, default: true

  • maximum_matches: Maximum number of matches. Processing stops after the limit is reached. Default: no limit (null, 0).

  • overwrite: Overwrite any existing fields by matching rules. Default if the parameter is given: true, for backwards compatibility. Default will change to false in version 3.0.0.

Configuration File

The modify expert bot allows you to change arbitrary field values of events just using a configuration file. Thus it is possible to adapt certain values or to add new ones only by changing JSON files, without touching the code of many other bots.

The configuration is called modify.conf and looks like this:

[
    {
        "rulename": "Standard Protocols http",
        "if": {
            "source.port": "^(80|443)$"
        },
        "then": {
            "protocol.application": "http"
        }
    },
    {
        "rulename": "Spamhaus Cert conficker",
        "if": {
            "malware.name": "^conficker(ab)?$"
        },
        "then": {
            "classification.identifier": "conficker"
        }
    },
    {
        "rulename": "bitdefender",
        "if": {
            "malware.name": "bitdefender-(.*)$"
        },
        "then": {
            "malware.name": "{matches[malware.name][1]}"
        }
    },
    {
        "rulename": "urlzone",
        "if": {
            "malware.name": "^urlzone2?$"
        },
        "then": {
            "classification.identifier": "urlzone"
        }
    },
    {
        "rulename": "default",
        "if": {
            "feed.name": "^Spamhaus Cert$"
        },
        "then": {
            "classification.identifier": "{msg[malware.name]}"
        }
    }
]

In our example above we have five rules labeled Standard Protocols http, Spamhaus Cert conficker, bitdefender, urlzone and default. All rules will be considered, in the given order (from top to bottom).

Each rule consists of conditions and actions. Conditions and actions are dictionaries holding the field names of events and regular expressions to match values (selection) or set values (action). All matching rules will be applied in the given order. The actions are only performed if all selections apply.

If the value for a condition is an empty string, the bot checks if the field does not exist. This is useful to apply default values for empty fields.

Actions

You can set the value of the field to a string literal or number.

In addition you can use the standard Python string format syntax to access the values from the processed event as msg and the match groups of the conditions as matches, see the bitdefender example above. Group 0 ([0]) contains the full matching string. See also the documentation on re.Match.group.

Note that matches will also contain the match groups from the default conditions if there were any.

Examples

We have an event with feed.name = Spamhaus Cert and malware.name = confickerab. The expert loops over all sections in the file and eventually enters section Spamhaus Cert. First, the default condition is checked, it matches! OK, going on. Otherwise the expert would have selected a different section that has not yet been considered. Now, go through the rules, until we hit the rule conficker. We combine the conditions of this rule with the default conditions, and both rules match! So we can apply the action: classification.identifier is set to conficker, the trivial name.

Assume we have an event with feed.name = Spamhaus Cert and malware.name = feodo. The default condition matches, but no others. So the default action is applied. The value for classification.identifier will be set to feodo by {msg[malware.name]}.

Types

If the rule is a string, a regular expression search is performed, also for numeric values (str() is called on them). If both the rule and the value are numeric, a simple comparison is done. If other types are mixed, a warning will be thrown.

For boolean values, the comparison value needs to be true or false as in JSON they are written all-lowercase.

National CERT contact lookup by CERT.AT

Information

Configuration Parameters

  • filter: (true/false) act as a filter for AT.

  • overwrite_cc: set to true if you want to overwrite any potentially existing cc fields in the event.

RecordedFuture IP risk

This bot tags events with the score found in Recorded Future’s large IP risklist.

Information

  • name: recordedfuture_iprisk

  • lookup: local database

  • public: no

  • cache (redis db): none

  • description: Records the risk score associated with the source and destination IP if they are present. Assigns 0 to IP addresses not in the RF list.

Configuration Parameters

  • database: Location of the CSV file obtained from the Recorded Future API (a script is provided to download the large IP set)

  • overwrite: set to true if you want to overwrite any potentially existing risk score fields in the event.

  • api_token: This needs to contain a valid API token to download the latest database data.

Description

For both source.ip and destination.ip the corresponding risk score is fetched from a local database created from Recorded Future’s API. The score is recorded in extra.rf_iprisk.source and extra.rf_iprisk.destination. If a lookup for an IP fails a score of 0 is recorded.

See https://www.recordedfuture.com/products/api/ and speak with your recorded future representative for more information.

The list is obtained from the Recorded Future API and requires a valid API token. The large list contains all IPs with a risk score of 25 or more. If an IP is not present in the database, a risk score of 0 is given.

A script is supplied that may be run as the intelmq user to update the database.

Database

Use this command to create/update the database and reload the bot:

intelmq.bots.experts.recordedfuture_iprisk.expert --update-database

Reverse DNS

For both source.ip and destination.ip the PTR record is fetched and the first valid result is used for source.reverse_dns/destination.reverse_dns.

Information

  • name: reverse-dns

  • lookup: DNS

  • public: yes

  • cache (redis db): 8

  • description: IP to domain

Configuration Parameters

  • Cache parameters (see in section Common parameters)

  • cache_ttl_invalid_response: The TTL for cached invalid responses.

  • overwrite: Overwrite existing fields. Default: True if not given (for backwards compatibility, will change in version 3.0.0)

RFC1918

Several RFCs define ASNs, IP addresses and hostnames (and TLDs) reserved for documentation. Events or fields of events can be dropped if they match the criteria of either being reserved for documentation (e.g. AS 64496, domain example.com) or belonging to a local area network (e.g. 192.168.0.0/24). These checks can be applied to URLs, IP addresses, FQDNs and ASNs.

It is configurable if the whole event should be dropped (“policies”) or just the field removed, as well as which fields should be checked.

Sources: RFC 1918, RFC 2606, RFC 3849, RFC 5398 and RFC 5737.

Information

  • name: rfc1918

  • lookup: none

  • public: yes

  • cache (redis db): none

  • description: removes events or single fields with invalid data

Configuration Parameters

  • fields: string, comma-separated list of fields e.g. destination.ip,source.asn,source.url. Supported fields are: * destination.asn & source.asn * destination.fqdn & source.fqdn * destination.ip & source.ip * destination.url & source.url

  • policy: string, comma-separated list of policies, e.g. del,drop,drop. drop will cause the entire event to be removed if the field matches, del causes the field to be removed.

With the example parameter values given above, this means that:

  • If a destination.ip value is part of a reserved network block, the field will be removed (policy “del”).

  • If a source.asn value is in the range of reserved AS numbers, the event will be removed altogether (policy “drop”).

  • If a source.url value contains a host with either an IP address part of a reserved network block, or a reserved domain name (or with a reserved TLD), the event will be dropped (policy “drop”).
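
The example described above corresponds to this parameters sketch:

"parameters": {
    "fields": "destination.ip,source.asn,source.url",
    "policy": "del,drop,drop"
}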

Ripe

Online RIPE Abuse Contact and Geolocation Finder for IP addresses and Autonomous Systems.

Information

  • name: ripencc-abuse-contact

  • lookup: HTTPS API

  • public: yes

  • cache (redis db): 10

  • description: IP to abuse contact

Configuration Parameters

  • Cache parameters (see section Common parameters)

  • mode: either append (default) or replace

  • query_ripe_db_asn: Query for IPs at http://rest.db.ripe.net/abuse-contact/%s.json, default true

  • query_ripe_db_ip: Query for ASNs at http://rest.db.ripe.net/abuse-contact/as%s.json, default true

  • query_ripe_stat_asn: Query for ASNs at https://stat.ripe.net/data/abuse-contact-finder/data.json?resource=%s, default true

  • query_ripe_stat_ip: Query for IPs at https://stat.ripe.net/data/abuse-contact-finder/data.json?resource=%s, default true

  • query_ripe_stat_geolocation: Query for IPs at https://stat.ripe.net/data/maxmind-geo-lite/data.json?resource=%s, default true

Sieve

Information

  • name: sieve

  • lookup: none

  • public: yes

  • cache (redis db): none

  • description: Filtering with a sieve-based configuration language

Configuration Parameters

  • file: Path to sieve file. Syntax can be validated with intelmq_sieve_expert_validator.

Description

The sieve bot is used to filter and/or modify events based on a set of rules. The rules are specified in an external configuration file and with a syntax similar to the Sieve language used for mail filtering.

Each rule defines a set of matching conditions on received events. Events can be matched based on keys and values in the event. Conditions can be combined using parentheses and the boolean operators && and ||. If the processed event matches a rule’s conditions, the corresponding actions are performed. Actions can specify whether the event should be kept or dropped in the pipeline (filtering actions) or if keys and values should be changed (modification actions).

Requirements

To use this bot, you need to install the required dependencies:

pip3 install -r intelmq/bots/experts/sieve/REQUIREMENTS.txt

Examples

The following excerpts illustrate some of the basic features of the sieve file format:

if :exists source.fqdn {
  keep  // aborts processing of subsequent rules and forwards the event.
}


if :notexists source.abuse_contact || source.abuse_contact =~ '.*@example.com' {
  drop  // aborts processing of subsequent rules and drops the event.
}

if source.ip << '192.0.0.0/24' {
    add! comment = 'bogon' // sets the field comment to this value and overwrites existing values
    path 'other-path' // the message is sent to the given path
}

if classification.type == ['phishing', 'malware'] && source.fqdn =~ '.*\.(ch|li)$' {
  add! comment = 'domainabuse'
  keep
} elif classification.type == 'scanner' {
  add! comment = 'ignore'
  drop
} else {
  remove comment
}

Reference

Sieve File Structure

The sieve file contains an arbitrary number of rules of the form:

if EXPRESSION {
    ACTIONS
} elif EXPRESSION {
    ACTIONS
} else {
    ACTIONS
}

Please note that nesting if-statements is currently not possible. ACTIONS must contain one or more actions of the actions listed below.

Expressions

Each rule specifies one or more expressions to match an event based on its keys and values. Event keys are specified as strings without quotes. String values must be enclosed in single quotes. Numeric values can be specified as integers or floats and are unquoted. IP addresses and network ranges (IPv4 and IPv6) are specified with quotes. Expression statements can be combined and chained using parentheses and the boolean operators && and ||. The following operators may be used to match events:

  • :exists and :notexists match if a given key exists, for example:

    if :exists source.fqdn { ... }

  • == and != match for equality of strings and numbers, for example:

    if feed.name != 'acme-security' || feed.accuracy == 100 { ... }

  • :contains matches on substrings.

  • =~ matches strings based on the given regular expression. !~ is the inverse regular expression match.

  • Numerical comparisons are evaluated with <, <=, >, >=.

  • << matches if an IP address is contained in the specified network range:

    if source.ip << '10.0.0.0/8' { ... }

  • Values to match against can also be specified as list, in which case any one of the values will result in a match:

    if source.ip == ['8.8.8.8', '8.8.4.4'] { ... }

In this case, the event will match if it contains a key source.ip with either value 8.8.8.8 or 8.8.4.4.

With inequality operators, the behavior is the same, so it matches if any expression does not match:

if source.ip != ['8.8.8.8', '8.8.4.4'] { ... }

Events with values like 8.8.8.8 or 8.8.4.4 will match, as they are always unequal to the other value. Attention: The result is not that the field must be unequal to all given values.

  • The combination of multiple expressions can be done using parentheses and boolean operators:

if (source.ip == '127.0.0.1') && (comment == 'add field' || classification.taxonomy == 'vulnerable') { ... }

Actions

If part of a rule matches the given conditions, the actions enclosed in { and } are applied. By default, all events that are matched or not matched by rules in the sieve file will be forwarded to the next bot in the pipeline, unless the drop action is applied.

  • add adds a key value pair to the event. This action only applies if the key is not yet defined in the event. If the key is already defined, the action is ignored. Example:

    add comment = 'hello, world'

    Some basic mathematical expressions are possible, but currently only relative time specifications are supported. For example: `add time.observation += '1 hour'` `add time.observation -= '10 hours'`

  • add! same as above, but will force overwrite the key in the event.

  • update modifies an existing value for a key. Only applies if the key is already defined. If the key is not defined in the event, this action is ignored. This supports mathematical expressions like above. Example:

    update feed.accuracy = 50

    Some basic mathematical expressions are possible, but currently only relative time specifications are supported. For example: `update time.observation += '1 hour'` `update time.observation -= '10 hours'`

  • remove removes a key/value from the event. Action is ignored if the key is not defined in the event. Example:

    remove extra.comments

  • keep sends the message to the next bot in the pipeline (same as the default behaviour), and stops sieve file processing.

    keep

  • path sets the path (named queue) the message should be sent to (implicitly or with the command keep). The named queue needs to be configured in the pipeline, see the User Guide for more information.

    path 'named-queue'

    You can as well set multiple destination paths with the same syntax as for value lists:

    path ['one', 'two']

    This will result in two identical messages, one sent to the path one and the other sent to the path two.

    If the path is not configured, the error looks like:

    File "/path/to/intelmq/intelmq/lib/pipeline.py", line 353, in send
        for destination_queue in self.destination_queues[path]:
    KeyError: 'one'

  • drop marks the event to be dropped. The event will not be forwarded to the next bot in the pipeline. The sieve file processing is interrupted upon reaching this action. No other actions may be specified besides the drop action within { and }.

Comments

Comments may be used in the sieve file: all characters after // and until the end of the line will be ignored.

Validating a sieve file

Use the following command to validate your sieve files:

$ intelmq.bots.experts.sieve.validator
usage: intelmq.bots.experts.sieve.validator [-h] sievefile

Validates the syntax of sievebot files.

positional arguments:
  sievefile   Sieve file

optional arguments:
  -h, --help  show this help message and exit

Taxonomy

Information

  • name: taxonomy

  • lookup: no

  • public: yes

  • cache (redis db): none

  • description: Adds the classification.taxonomy field according to the RSIT taxonomy.

Please note that there is a slight mismatch of IntelMQ’s taxonomy to the upstream taxonomy, but it should not matter here much.

Configuration Parameters

None.

Description

Information on the “Reference Security Incident Taxonomy” can be found here: https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force

For brevity, “type” means classification.type and “taxonomy” means classification.taxonomy.

  • If taxonomy is missing, and type is given, the according taxonomy is set.

  • If neither taxonomy nor type is given, taxonomy is set to “other” and type to “unknown”.

  • If taxonomy is given, but type is not, type is set to “unknown”.

Threshold

Information

  • Cache parameters (see section Common parameters)

  • name: threshold

  • lookup: redis cache

  • public: no

  • cache (redis db): 11

  • description: Check if the number of similar messages during a specified time interval exceeds a set value.

Configuration Parameters

  • filter_keys: String, comma-separated list of field names to consider or ignore when determining which messages are similar.

  • filter_type: String, whitelist (consider only the fields in filter_keys) or blacklist (consider everything but the fields in filter_keys).

  • timeout: Integer, number of seconds before threshold counter is reset.

  • threshold: Integer, number of messages required before propagating one. In forwarded messages, the threshold is saved in the message as extra.count.

  • add_keys: Array of string->string, optional, fields and values to add (or update) to propagated messages. Example:

    "add_keys": {
        "classification.type": "spam",
        "comment": "Started more than 10 SMTP connections"
    }
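
Putting the parameters together, a possible configuration sketch (field selection, timeout and threshold values are examples only):

"parameters": {
    "redis_cache_db": 11,
    "redis_cache_host": "127.0.0.1",
    "redis_cache_password": null,
    "redis_cache_port": 6379,
    "filter_type": "whitelist",
    "filter_keys": "source.ip,classification.type",
    "timeout": 3600,
    "threshold": 10,
    "add_keys": {
        "classification.type": "spam",
        "comment": "Started more than 10 SMTP connections"
    }
}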
    

Limitations

This bot has certain limitations and is not a true threshold filter (yet). It works like this:

 1. Every incoming message is hashed according to the filter_* parameters.

 2. The hash is looked up in the cache, the count is incremented by 1, and the TTL of the key is (re-)set to the timeout.

 3. If the new count matches the threshold exactly, the message is forwarded. Otherwise it is dropped.

Please note: Even if a message is sent, any further identical messages are dropped, if the time difference to the last message is less than the timeout! The counter is not reset if the threshold is reached.

Tor Nodes

Information

  • name: tor-nodes

  • lookup: local database

  • public: yes

  • cache (redis db): none

  • description: check if IP is tor node

Configuration Parameters

  • database: Path to the database

Database

Use this command to create/update the database and reload the bot:

intelmq.bots.experts.tor_nodes.expert --update-database

Url2FQDN

This bot extracts the Host from the source.url and destination.url fields and writes it to source.fqdn or destination.fqdn if it is a hostname, or source.ip or destination.ip if it is an IP address.

Information

  • name: url2fqdn

  • lookup: none

  • public: yes

  • cache (redis db): none

  • description: writes domain name from URL to FQDN or IP address

Configuration Parameters

  • overwrite: boolean, replace existing FQDN / IP address?

Wait

Information

  • name: wait

  • lookup: none

  • public: yes

  • cache (redis db): none

  • description: Waits for some time or until a queue size is lower than a given number.

Configuration Parameters

  • queue_db: Database number of the database, default 2. Converted to integer.

  • queue_host: Host of the database, default localhost.

  • queue_name: Name of the queue to be watched, default null. This is not the name of a bot but the queue’s name.

  • queue_password: Password for the database, default None.

  • queue_polling_interval: Interval to poll the list length in seconds. Converted to float.

  • queue_port: Port of the database, default 6379. Converted to integer.

  • queue_size: Maximum size of the queue, default 0. Compared by <=. Converted to integer.

  • sleep_time: Time to sleep before sending the event.

Only one of the two modes is possible. If a queue name is given, the queue mode is active. If sleep_time is a number, sleep mode is active. Otherwise the dummy mode is active and events are just passed on without an additional delay.

Note that SIGHUPs and reloads interrupt the sleeping.
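
As an illustration, a queue-mode parameters sketch that holds events back until the watched queue has drained below the given size (the queue name is an example only):

"parameters": {
    "queue_db": 2,
    "queue_host": "localhost",
    "queue_name": "file-output-queue",
    "queue_polling_interval": 0.5,
    "queue_port": 6379,
    "queue_size": 10000
}

For sleep mode, one would instead set only sleep_time, e.g. "sleep_time": 0.5.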

Output Bots

AMQP Topic

Sends data to an AMQP server. See https://www.rabbitmq.com/tutorials/amqp-concepts.html for more details on the AMQP topic exchange.

Requires the pika python library.

Information

  • name: intelmq.bots.outputs.amqptopic.output

  • lookup: to the amqp server

  • public: yes

  • cache: no

  • description: Sends the event to a specified topic of an AMQP server

Configuration parameters

  • connection_attempts : The number of connection attempts to defined server, defaults to 3

  • connection_heartbeat : Heartbeat to server, in seconds, defaults to 3600

  • connection_host : Name/IP for the AMQP server, defaults to 127.0.0.1

  • connection_port : Port for the AMQP server, defaults to 5672

  • connection_vhost : Virtual host to connect to; on an http(s) connection this would be http://IP/<your virtual host>

  • content_type : Content type to deliver to AMQP server, currently only supports “application/json”

  • delivery_mode : 1 - Non-persistent, 2 - Persistent. On persistent mode, messages are delivered to ‘durable’ queues and will be saved to disk.

  • exchange_durable : If set to True, the exchange will survive broker restart, otherwise will be a transient exchange.

  • exchange_name : The name of the exchange to use

  • exchange_type : Type of the exchange, e.g. topic, fanout etc.

  • keep_raw_field : If set to True, the message ‘raw’ field will be sent

  • password : Password for authentication on your AMQP server

  • require_confirmation : If set to True, an exception will be raised if a confirmation error is received

  • routing_key : The routing key for your amqptopic

  • single_key : Only send the field instead of the full event (expecting a field name as string)

  • username : Username for authentication on your AMQP server

  • use_ssl : Use ssl for the connection, make sure to also set the correct port, usually 5671 (true/false)

  • message_hierarchical_output: Convert the message to hierarchical JSON, default: false

  • message_with_type : Include the type in the sent message, default: false

  • message_jsondict_as_string: Convert fields of type JSONDict (extra) as string, default: false

If no authentication should be used, leave username or password empty or null.
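
A parameters sketch for a typical setup (exchange name, routing key and credentials are examples or placeholders):

"parameters": {
    "connection_host": "127.0.0.1",
    "connection_port": 5672,
    "connection_vhost": "/",
    "username": "intelmq",
    "password": "<password>",
    "exchange_name": "intelmq",
    "exchange_type": "topic",
    "exchange_durable": true,
    "routing_key": "intelmq.events",
    "content_type": "application/json",
    "delivery_mode": 2,
    "require_confirmation": true,
    "keep_raw_field": false
}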

Examples of usage

  • Useful to send events to a RabbitMQ exchange topic to be further processed in other platforms.

Confirmation

If the routing key or exchange name are invalid or non-existent, the message is accepted by the server but we receive no confirmation. If the parameter require_confirmation is true and no confirmation is received, an error is raised.

Common errors

Unroutable messages / Undefined destination queue

The destination exchange and queue need to exist beforehand, with your preferred settings (e.g. durable, lazy queue). If the error message says that the message is “unroutable”, the queue doesn’t exist.

Blackhole

This output bot discards all incoming messages.

Information

  • name: blackhole

  • lookup: no

  • public: yes

  • cache: no

  • description: discards messages

Elasticsearch Output Bot

Information

  • name: intelmq.bots.outputs.elasticsearch.output

  • lookup: yes

  • public: yes

  • cache: no

  • description: Output Bot that sends events to Elasticsearch

Only ElasticSearch version 7 is supported.

It is also possible to feed data into ElasticSearch using the ELK stack via Redis and Logstash, see ELK Stack for more information. This method supports various versions of ElasticSearch.

Configuration parameters

  • elastic_host: Name/IP for the Elasticsearch server, defaults to 127.0.0.1

  • elastic_port: Port for the Elasticsearch server, defaults to 9200

  • elastic_index: Index for the Elasticsearch output, defaults to intelmq

  • rotate_index: If set, will index events using the date information associated with the event.

    Options: ‘never’, ‘daily’, ‘weekly’, ‘monthly’, ‘yearly’. Using ‘intelmq’ as the elastic_index, the following are examples of the generated index names:

    'never' --> intelmq
    'daily' --> intelmq-2018-02-02
    'weekly' --> intelmq-2018-42
    'monthly' --> intelmq-2018-02
    'yearly' --> intelmq-2018
    
  • http_username: HTTP basic authentication username

  • http_password: HTTP basic authentication password

  • use_ssl: Whether to use SSL/TLS when connecting to Elasticsearch. Default: False

  • http_verify_cert: Whether to require verification of the server’s certificate. Default: False

  • ssl_ca_certificate: An optional path to a certificate bundle to use for verifying the server

  • ssl_show_warnings: Whether to show warnings if the server’s certificate cannot be verified. Default: True

  • replacement_char: If set, dots (‘.’) in field names will be replaced with this character prior to indexing. This is for backward compatibility with ES 2.X. Default: null. Recommended for ES2.X: ‘_’

  • flatten_fields: In ES, some query and aggregations work better if the fields are flat and not JSON. Here you can provide a list of fields to convert.

    Can be a list of strings (fieldnames) or a string with field names separated by a comma (,). eg extra,field2 or [‘extra’, ‘field2’] Default: [‘extra’]

See contrib/elasticsearch/elasticmapper for a utility for creating Elasticsearch mappings and templates.

If using rotate_index, the resulting index name will be of the form [elastic_index]-[event date]. To query all intelmq indices at once, use an alias (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html), or a multi-index query.

The data in ES can be retrieved with the HTTP-Interface:

> curl -XGET 'http://localhost:9200/intelmq/events/_search?pretty=True'

File

Information

  • name: file

  • lookup: no

  • public: yes

  • cache (redis db): none

  • description: output messages (reports or events) to file

Multithreading is disabled for this bot, as this would lead to corrupted files.

Configuration Parameters

  • encoding_errors_mode: By default ‘strict’, see for more details and options: https://docs.python.org/3/library/functions.html#open For example with ‘backslashreplace’ all characters which cannot be properly encoded will be written escaped with backslashes.

  • file: file path of output file. Missing directories will be created if possible with the mode 755.

  • format_filename: Boolean if the filename should be formatted (default: false).

  • hierarchical_output: If true, the resulting dictionary will be hierarchical (field names split by dot).

  • single_key: if none, the whole event is saved (default); otherwise the bot saves only contents of the specified key. In case of raw the data is base64 decoded.

Filename formatting

The filename can be formatted using pythons string formatting functions if format_filename is set. See https://docs.python.org/3/library/string.html#formatstrings

For example:
  • The filename …/{event[source.abuse_contact]}.txt will be (for example) …/abuse@example.com.txt.

  • …/{event[time.source]:%Y-%m-%d} results in the date of the event used as filename.

If the field used in the format string is not defined, None will be used as fallback.
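
For example, a parameters sketch that writes one file per source-event day (the path is illustrative):

"parameters": {
    "file": "/opt/intelmq/var/lib/bots/file-output/{event[time.source]:%Y-%m-%d}.txt",
    "format_filename": true
}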

Files

Information

  • name: files

  • lookup: no

  • public: yes

  • cache (redis db): none

  • description: saving of messages as separate files

Configuration Parameters

  • dir: output directory (default /opt/intelmq/var/lib/bots/files-output/incoming)

  • tmp: temporary directory (must reside on the same filesystem as dir) (default: /opt/intelmq/var/lib/bots/files-output/tmp)

  • suffix: extension of created files (default .json)

  • hierarchical_output: if true, use nested dictionaries; if false, use flat structure with dot separated keys (default)

  • single_key: if none, the whole event is saved (default); otherwise the bot saves only contents of the specified key

McAfee Enterprise Security Manager

Information

  • name: intelmq.bots.outputs.mcafee.output_esm_ip

  • lookup: yes

  • public: no

  • cache (redis db): none

  • description: Writes information out to McAfee ESM watchlist

Configuration Parameters

  • Feed parameters (see above)

  • esm_ip: IP address of ESM instance

  • esm_user: username of user entitled to write to watchlist

  • esm_pw: password of user

  • esm_watchlist: name of the watchlist to write to

  • field: name of the IntelMQ field to be written to ESM

MISP Feed

Information

  • name: intelmq.bots.outputs.misp.output_feed

  • lookup: no

  • public: no

  • cache (redis db): none

  • description: Create a directory layout in the MISP Feed format

The PyMISP library >= 2.4.119.1 is required, see REQUIREMENTS.txt.

Configuration Parameters

  • Feed parameters (see above)

  • misp_org_name: Org name which creates the event, string

  • misp_org_uuid: Org UUID which creates the event, string

  • output_dir: Output directory path, e.g. /opt/intelmq/var/lib/bots/mispfeed-output. Will be created if it does not exist, if possible.

  • interval_event: The output bot creates one event for each interval; all data in this time frame is part of this event. Default “1 hour”, string.

Usage in MISP

Configure the destination directory of this feed as a feed in MISP, either as a local location, or served via a web server. See the MISP documentation on Feeds for more information.

MISP API

Information

  • name: intelmq.bots.outputs.misp.output_api

  • lookup: no

  • public: no

  • cache (redis db): none

  • description: Connect to a MISP instance and add event as MISPObject if not there already.

The PyMISP library >= 2.4.120 is required, see REQUIREMENTS.txt.

Configuration Parameters

  • Feed parameters (see above)

  • add_feed_provider_as_tag: boolean (use true when in doubt)

  • add_feed_name_as_tag: boolean (use true when in doubt)

  • misp_additional_correlation_fields: list of fields for which the correlation flags will be enabled (in addition to those which are in significant_fields)

  • misp_additional_tags: list of additional tags to set, which are not searched for when looking for duplicates

  • misp_key: string, API key for accessing MISP

  • misp_publish: boolean, if a new MISP event should be set to “publish”.

    Expert setting as MISP may really make it “public”! (Use false when in doubt.)

  • misp_tag_for_bot: string, used to mark MISP events

  • misp_to_ids_fields: list of fields for which the to_ids flags will be set

  • misp_url: string, URL of the MISP server

  • significant_fields: list of intelmq field names

The significant_fields values will be searched for in all MISP attribute values and if all values are found in the same MISP event, no new MISP event will be created. Instead if the existing MISP events have the same feed.provider and match closely, their timestamp will be updated.

If a new MISP event is inserted the significant_fields and the misp_additional_correlation_fields will be the attributes where correlation is enabled.

Make sure to build the IntelMQ botnet in a way that the rate of incoming events is one MISP can handle, as IntelMQ can process events much faster than MISP (which is by design, as MISP is meant for manual handling). Also remove the fields of the IntelMQ events with an expert bot that you do not want to be inserted into MISP.
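
A hedged parameters sketch (the URL, key and all field selections are placeholders or examples, not recommendations):

"parameters": {
    "misp_url": "https://misp.example.com/",
    "misp_key": "<API key>",
    "add_feed_provider_as_tag": true,
    "add_feed_name_as_tag": true,
    "misp_publish": false,
    "misp_tag_for_bot": "IntelMQ",
    "significant_fields": ["source.ip", "source.fqdn"],
    "misp_to_ids_fields": ["source.ip"],
    "misp_additional_correlation_fields": ["source.url"]
}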

(More details can be found in the docstring of output_api.py.)

MongoDB

Saves events in a MongoDB either as hierarchical structure or flat with full key names. time.observation and time.source are saved as datetime objects, not as ISO formatted string.

Information

  • name: mongodb

  • lookup: no

  • public: yes

  • cache (redis db): none

  • description: MongoDB is the bot responsible for sending events to a MongoDB database

Configuration Parameters

  • collection: MongoDB collection

  • database: MongoDB database

  • db_user : Database user that should be used if you enabled authentication

  • db_pass : Password associated to db_user

  • host: MongoDB host (FQDN or IP)

  • port: MongoDB port, default: 27017

  • hierarchical_output: Boolean (default true). As MongoDB does not allow saving keys with dots, we split the dictionary into sub-dictionaries.

  • replacement_char: String (default ‘_’) used as replacement character for the dots in key names if hierarchical output is not used.

Installation Requirements

pip3 install 'pymongo>=2.7.1'

The bot has been tested with pymongo versions 2.7.1, 3.4 and 3.10.1 (server versions 2.6.10 and 3.6.8).

Redis

Information

  • name: intelmq.bots.outputs.redis.output

  • lookup: to the Redis server

  • public: yes

  • cache (redis db): none

  • description: Output Bot that sends events to a remote Redis server/queue.

Configuration Parameters

  • redis_db: remote server database, e.g.: 2

  • redis_password: remote server password

  • redis_queue: remote server list (queue), e.g.: “remote-server-queue”

  • redis_server_ip: remote server IP address, e.g.: 127.0.0.1

  • redis_server_port: remote server Port, e.g.: 6379

  • redis_timeout: Connection timeout, in milliseconds, e.g.: 50000

  • hierarchical_output: whether output should be sent in hierarchical JSON format (default: false)

  • with_type: Send the __type field (default: true)

Examples of usage

  • Can be used to send events to be processed in another system. E.g.: send events to Logstash.

  • In a multi tenant installation can be used to send events to external/remote IntelMQ instance. Any expert bot queue can receive the events.

  • In a complex configuration can be used to create logical sets in IntelMQ-Manager.

Request Tracker

Information

  • name: intelmq.bots.outputs.rt.output

  • lookup: to the Request Tracker instance

  • public: yes

  • cache (redis db): none

  • description: Output Bot that creates Request Tracker tickets from events.

Description

The bot creates tickets in Request Tracker and uses event fields for the ticket body text. The bot follows the workflow of the RTIR:

  • create ticket in Incidents queue (or any other queue)

    • all event fields are included in the ticket body,

    • event attributes are assigned to tickets’ CFs according to the attribute mapping,

    • ticket taxonomy can be assigned according to the CF mapping. If you use taxonomy different from ENISA RSIT, consider using some extra attribute field and do value mapping with modify or sieve bot,

  • create linked ticket in Investigations queue, if these conditions are met

    • if first ticket destination was Incidents queue,

    • if source.abuse_contact is specified,

    • if description text is specified in the field appointed by configuration,

  • RT/RTIR is supposed to do the relevant notifications via a scrip working on the condition “On Create”,

  • the configuration option investigation_fields specifies which event fields have to be included in the investigation,

  • Resolve Incident ticket, according to configuration (Investigation ticket status should depend on RT scrip configuration),

Take extra caution not to flood your ticketing system with an enormous amount of tickets. Add extra filtering so that only critical events are passed to RT, and/or deduplicate events.

Configuration Parameters

  • rt_uri, rt_user, rt_password, verify_cert: RT API endpoint connection details, string.

  • queue: ticket destination queue. If set to ‘Incidents’, an ‘Investigations’ ticket will be created if create_investigation is set to true, string.

  • CF_mapping: mapping of attributes to ticket CFs, dictionary. E.g. {"event_description.text": "Description", "source.ip": "IP", "extra.classification.type": "Incident Type", "classification.taxonomy": "Classification"}

  • final_status: the final status for the created ticket, string. E.g. resolved if you want to resolve the created ticket. The linked Investigation ticket will be resolved automatically by RTIR scripts.

  • create_investigation: if an Investigation ticket should be created (in case of RTIR workflow). true or false, boolean.

  • investigation_fields: attributes to include into investigation ticket, comma-separated string. E.g. time.source,source.ip,source.port,source.fqdn,source.url,classification.taxonomy,classification.type,classification.identifier,event_description.url,event_description.text,malware.name,protocol.application,protocol.transport.

  • description_attr: which event attribute contains text message being sent to the recipient, string. If it is not specified or not found in the event, the Investigation ticket is not going to be created. Example: extra.message.text.

REST API

Information

  • name: restapi

  • lookup: no

  • public: yes

  • cache (redis db): none

  • description: REST API is the bot responsible for sending events to a REST API listener through POST

Configuration Parameters

  • auth_token: the user name / HTTP header key

  • auth_token_name: the password / HTTP header value

  • auth_type: one of: “http_basic_auth”, “http_header”

  • hierarchical_output: boolean

  • host: destination URL

  • use_json: boolean

SMTP Output Bot

Sends a MIME Multipart message containing the text and the event as CSV for every single event.

Information

  • name: smtp

  • lookup: no

  • public: yes

  • cache (redis db): none

  • description: Sends events via SMTP

Configuration Parameters

  • fieldnames: a list of field names to be included in the email, comma separated string or list of strings. If empty, no attachment is sent - this can be useful if the actual data is already in the body (parameter text) or the subject.

  • mail_from: string. Supports formatting, see below

  • mail_to: string of email addresses, comma separated. Supports formatting, see below

  • smtp_host: string

  • smtp_password: string or null, Password for authentication on your SMTP server

  • smtp_port: port

  • smtp_username: string or null, Username for authentication on your SMTP server

  • ssl: boolean

  • starttls: boolean

  • subject: string. Supports formatting, see below

  • text: string or null. Supports formatting, see below

For several string parameters you can use values from the event using the standard Python string format syntax. Access the event’s values with {ev[source.ip]} and similar. Any non-existing fields will result in None. For example, to set the recipient(s) to the value given in the event’s source.abuse_contact field, use this as mail_to parameter: {ev[source.abuse_contact]}

Authentication is optional. If both username and password are given, these mechanisms are tried: CRAM-MD5, PLAIN, and LOGIN.

Client certificates are not supported. If http_verify_cert is true, TLS certificates are checked.

SQL

Information

Configuration Parameters

The parameters marked with ‘PostgreSQL’ will be sent to libpq via psycopg2. Check the libpq parameter documentation for the versions you are using.

  • autocommit: psycopg’s autocommit mode, optional, default True

  • connect_timeout: Database connect_timeout, optional, default 5 seconds

  • engine: ‘postgresql’ or ‘sqlite’

  • database: PostgreSQL database or SQLite file

  • host: PostgreSQL host

  • jsondict_as_string: save JSONDict fields as JSON string, boolean. Default: true (like in versions before 1.1)

  • port: PostgreSQL port

  • user: PostgreSQL user

  • password: PostgreSQL password

  • sslmode: PostgreSQL sslmode, can be ‘disable’, ‘allow’, ‘prefer’ (default), ‘require’, ‘verify-ca’ or ‘verify-full’. See postgresql docs: https://www.postgresql.org/docs/current/static/libpq-connect.html#libpq-connect-sslmode

  • table: name of the database table into which events are to be inserted

PostgreSQL

You have two basic choices to run PostgreSQL:

 1. on the same machine as intelmq; then you could use Unix sockets if available on your platform,

 2. on a different machine; in which case you would need to use a TCP connection and make sure you give the right connection parameters to each psql or client call.

Make sure to consult your PostgreSQL documentation about how to allow network connections and authentication in case 2.

PostgreSQL Version

Any supported version of PostgreSQL should work (v>=9.2 as of Oct 2016) [1].

If you use PostgreSQL server v >= 9.4, it gives you the possibility to use the time-zone formatting string “OF” for date-times and the GiST index for the CIDR type. This may be useful depending on how you plan to use the events that this bot writes into the database.

How to install

Use intelmq_psql_initdb to create initial SQL statements from harmonization.conf. The script will create the required table layout and save it as /tmp/initdb.sql

You need a PostgreSQL database-user to own the result database. The recommendation is to use the name intelmq. There may already be such a user for the PostgreSQL database-cluster to be used by other bots. (For example from setting up the expert/certbund_contact bot.)

Therefore if still necessary: create the database-user as postgresql superuser, which usually is done via the system user postgres:

createuser --no-superuser --no-createrole --no-createdb --encrypted --pwprompt intelmq

Create the new database:

createdb --encoding='utf-8' --owner=intelmq intelmq-events

(The encoding parameter should ensure the right encoding on platform where this is not the default.)

Now initialize it as database-user intelmq (in this example a network connection to localhost is used, so you would get to test if the user intelmq can authenticate):

psql -h localhost intelmq-events intelmq </tmp/initdb.sql

SQLite

Similarly to PostgreSQL, you can use intelmq_psql_initdb to create initial SQL statements from harmonization.conf. The script will create the required table layout and save it as /tmp/initdb.sql.

Create the new database (you can ignore all errors since SQLite doesn’t know all SQL features generated for PostgreSQL):

sqlite3 your-db.db
sqlite> .read /tmp/initdb.sql

Then, set the database parameter to the your-db.db file path.

STOMP

Information

Requirements

Install the stomp.py library, e.g. apt install python3-stomp.py or pip install stomp.py.

You need a CA certificate, client certificate and key file from the organization / server you are connecting to. Also you will need a so called “exchange point”.

Configuration Parameters

  • exchange: The exchange to push at

  • heartbeat: default: 60000

  • message_hierarchical_output: Boolean, default: false

  • message_jsondict_as_string: Boolean, default: false

  • message_with_type: Boolean, default: false

  • port: Integer, default: 61614

  • server: Host or IP address of the STOMP server

  • single_key: Boolean or string (field name), default: false

  • ssl_ca_certificate: path to CA file

  • ssl_client_certificate: path to client cert file

  • ssl_client_certificate_key: path to client cert key file

TCP

Information

  • name: intelmq.bots.outputs.tcp.output

  • lookup: no

  • public: yes

  • cache (redis db): none

  • description: TCP is the bot responsible to send events to a TCP port (Splunk, another IntelMQ, etc..).

Multithreading is disabled for this bot.

Configuration Parameters

  • counterpart_is_intelmq: Boolean. If you are sending to an IntelMQ TCP collector, set this to True, otherwise e.g. with filebeat, set it to false.

  • ip: IP of destination server

  • hierarchical_output: true for a nested JSON, false for a flat JSON (when sending to a TCP collector).

  • port: port of destination server

  • separator: separator of messages, e.g. “\n”, optional. When sending to a TCP collector, this parameter shouldn’t be present; in that case, the output waits until every message is acknowledged by the “Ok” message which the TCP collector bot implements.

Sending to an IntelMQ TCP collector

If you intend to link two IntelMQ instances via TCP, set the parameter counterpart_is_intelmq to true. The bot then awaits an “Ok” message to be received after each message is sent. The TCP collector just sends “Ok” after every message it gets.
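A sketch of the sending bot's parameters for this setup (IP and port are placeholders; the receiving TCP collector bot must listen on the same address):

{
    "ip": "192.0.2.20",
    "port": 5000,
    "counterpart_is_intelmq": true,
    "hierarchical_output": false
}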

Touch

Information

  • name: intelmq.bots.outputs.touch.output

  • lookup: no

  • public: yes

  • cache (redis db): none

  • description: Touches a file for every event received.

Configuration Parameters

  • path: Path to the file to touch.

UDP

Information

  • name: intelmq.bots.outputs.udp.output

  • lookup: no

  • public: yes

  • cache (redis db): none

  • description: Output Bot that sends events to a remote UDP server.

Multithreading is disabled for this bot.

Configuration Parameters

  • field_delimiter: If the format is ‘delimited’ this will be added between fields. String, default: “|”

  • format: Can be ‘json’ or ‘delimited’. The JSON format outputs the event ‘as-is’. Delimited will deconstruct the event and print each field:value pair separated by the field delimiter. See examples below.

  • header: Header text to be sent in the UDP datagram, string.

  • keep_raw_field: boolean, default: false

  • udp_host: Destination server’s hostname or IP address

  • udp_port: Destination port

Examples of usage

Consider the following event:

{"raw": "MjAxNi8wNC8yNV8xMTozOSxzY2hpenppbm8ub21hcmF0aG9uLmNvbS9na0NDSnVUSE0vRFBlQ1pFay9XdFZOSERLbC1tWFllRk5Iai8sODUuMjUuMTYwLjExNCxzdGF0aWMtaXAtODUtMjUtMTYwLTExNC5pbmFkZHIuaXAtcG9vbC5jb20uLEFuZ2xlciBFSywtLDg5NzI=", "source": {"asn": 8972, "ip": "85.25.160.114", "url": "http://schizzino.omarathon.com/gkCCJuTHM/DPeCZEk/WtVNHDKl-mXYeFNHj/", "reverse_dns": "static-ip-85-25-160-114.inaddr.ip-pool.com"}, "classification": {"type": "malware"}, "event_description": {"text": "Angler EK"}, "feed": {"url": "http://www.malwaredomainlist.com/updatescsv.php", "name": "Malware Domain List", "accuracy": 100.0}, "time": {"observation": "2016-04-29T10:59:34+00:00", "source": "2016-04-25T11:39:00+00:00"}}

With the following Parameters:

  • field_delimiter : |

  • format : json

  • Header : header example

  • keep_raw_field : true

  • ip : 127.0.0.1

  • port : 514

Resulting line in syslog:

Apr 29 11:01:29 header example {"raw": "MjAxNi8wNC8yNV8xMTozOSxzY2hpenppbm8ub21hcmF0aG9uLmNvbS9na0NDSnVUSE0vRFBlQ1pFay9XdFZOSERLbC1tWFllRk5Iai8sODUuMjUuMTYwLjExNCxzdGF0aWMtaXAtODUtMjUtMTYwLTExNC5pbmFkZHIuaXAtcG9vbC5jb20uLEFuZ2xlciBFSywtLDg5NzI=", "source": {"asn": 8972, "ip": "85.25.160.114", "url": "http://schizzino.omarathon.com/gkCCJuTHM/DPeCZEk/WtVNHDKl-mXYeFNHj/", "reverse_dns": "static-ip-85-25-160-114.inaddr.ip-pool.com"}, "classification": {"type": "malware"}, "event_description": {"text": "Angler EK"}, "feed": {"url": "http://www.malwaredomainlist.com/updatescsv.php", "name": "Malware Domain List", "accuracy": 100.0}, "time": {"observation": "2016-04-29T10:59:34+00:00", "source": "2016-04-25T11:39:00+00:00"}}

With the following Parameters:

  • field_delimiter : |

  • format : delimited

  • Header : IntelMQ-event

  • keep_raw_field : false

  • ip : 127.0.0.1

  • port : 514

Resulting line in syslog:

Apr 29 11:17:47 localhost IntelMQ-event|source.ip: 85.25.160.114|time.source:2016-04-25T11:39:00+00:00|feed.url:http://www.malwaredomainlist.com/updatescsv.php|time.observation:2016-04-29T11:17:44+00:00|source.reverse_dns:static-ip-85-25-160-114.inaddr.ip-pool.com|feed.name:Malware Domain List|event_description.text:Angler EK|source.url:http://schizzino.omarathon.com/gkCCJuTHM/DPeCZEk/WtVNHDKl-mXYeFNHj/|source.asn:8972|classification.type:malware|feed.accuracy:100.0
XMPP

Warning: This bot is deprecated and will be removed in version 3.0 of IntelMQ. It is currently unmaintained, and the XMPP library it uses, sleekxmpp, is deprecated. For more information see Issue #1614.

Information

  • name: intelmq.bots.outputs.xmpp.output

  • lookup: yes

  • public: yes

  • cache (redis db): none

  • description: The XMPP output is capable of sending messages to XMPP rooms and as direct messages.

Requirements

The sleekxmpp library needs to be installed on your system:

pip3 install -r intelmq/bots/collectors/xmpp/REQUIREMENTS.txt

Configuration Parameters

  • xmpp_user : The username of the XMPP-Account the output shall use (part before the @ sign)

  • xmpp_server : The domain name of the server of the XMPP-Account (part after the @ sign)

  • xmpp_password : The password of the XMPP-Account

  • xmpp_to_user : The username of the receiver

  • xmpp_to_server : The domain name of the receiver

  • xmpp_room : The room which has to be joined by the output (full address a@conference.b.com)

  • xmpp_room_nick : The username / nickname the output shall use within the room.

  • xmpp_room_password : The password which might be required to join a room

  • use_muc : If this parameter is true, the bot will join the room xmpp_room.

  • ca_certs : A path to a file containing the CA’s which should be used

intelmqctl documentation

Introduction

intelmqctl is the main tool to handle an intelmq installation. It handles the bots themselves and provides some tools to handle the installation.

Output type

intelmqctl can be used as a command line tool, as a library and as a tool by other programs. If called directly, it will print all output to the console (stderr). If used as a Python library, the Python types themselves are returned. The third option is machine-readable JSON output (used by other managing tools).
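For example, assuming the --type flag of current intelmqctl versions (consult intelmqctl --help if in doubt), the machine-readable JSON output can be requested on the command line:

> intelmqctl --type json list queues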

Manage individual bots

Like all init systems, intelmqctl has the methods start, stop, restart, reload and status.

start

This will start the bot with the ID file-output. A file with its PID will be created in /opt/intelmq/var/run/[bot-id].pid.

> intelmqctl start file-output
Starting file-output...
file-output is running.

If the bot is already running, it won’t be started again:

> intelmqctl start file-output
file-output is running.
stop

If the PID file does exist, a SIGINT will be sent to the process. After 0.25s we check if the process is running. If not, the PID file will be removed.

> intelmqctl stop file-output
Stopping file-output...
file-output is stopped.

If there’s no running bot, there’s nothing to do.

> intelmqctl stop file-output
file-output was NOT RUNNING.

If the bot did not stop in 0.25s, intelmqctl will say it’s still running:

> intelmqctl stop file-output
file-output is still running
status

Checks for the PID file and if the process with the given PID is alive. If the PID file exists, but the process does not exist, it will be removed.

> intelmqctl status file-output
file-output is stopped.
> intelmqctl start file-output
Starting file-output...
file-output is running.
> intelmqctl status file-output
file-output is running.
restart

The same as stop and start consecutively.

> intelmqctl restart file-output
Stopping file-output...
file-output is stopped.
Starting file-output...
file-output is running.
reload

Sends a SIGHUP to the bot, which will then reload the configuration.

> intelmqctl reload file-output
Reloading file-output ...
file-output is running.

If the bot is not running, we can’t reload it:

> intelmqctl reload file-output
file-output was NOT RUNNING.
run

Run a bot directly for debugging purposes.

If launched with no arguments, the bot will call its init method and start processing messages as usual, but you see everything that happens.

> intelmqctl run file-output
file-output: RestAPIOutputBot initialized with id file-output and version 3.5.2 as process 12345.
file-output: Bot is starting.
file-output: Loading source pipeline and queue 'file-output-queue'.
file-output: Connected to source queue.
file-output: No destination queues to load.
file-output: Bot initialization completed.
file-output: Waiting for incoming message.

Should you get lost at any time, just add --help after any argument for further explanation.

> intelmqctl run file-output --help

Note that if another instance of the bot is running, only a warning will be displayed.

> intelmqctl run file-output
Main instance of the bot is running in the background. You may want to launch: intelmqctl stop file-output

You can set the log level with the -l flag, e.g. -l DEBUG. For the ‘console’ subcommand, ‘DEBUG’ is the default.

console

If launched with the console argument, you get a `pdb` live console, or `ipdb` or `pudb` consoles if they were previously installed (e.g. `pip3 install ipdb --user`).

> intelmqctl run file-output console
*** Using console ipdb. Please use 'self' to access to the bot instance properties. ***
ipdb> self. ...

You may specify the desired console in the next argument.

> intelmqctl run file-output console pudb
message

Operate directly with the input / output pipelines.

If get is the parameter, you see the message that waits in the input (source or internal) queue. If the argument is pop, the message gets popped as well.

> intelmqctl run file-output message get
file-output: Waiting for a message to get...
{
    "classification.type": "c&c",
    "feed.url": "https://example.com",
    "raw": "1233",
    "source.ip": "1.2.3.4",
    "time.observation": "2017-05-17T22:00:33+00:00",
    "time.source": "2017-05-17T22:00:32+00:00"
}

To send a message directly to the bot’s output queue, just as if it were sent by `self.send_message()` in the bot’s `process()` method, use the send argument. In our case of `file-output`, it has no destination queue, so nothing happens.

> intelmqctl run file-output message send '{"time.observation": "2017-05-17T22:00:33+00:00", "time.source": "2017-05-17T22:00:32+00:00"}'
file-output: Bot has no destination queues.

Note: if you would like to know the possible parameters of a message, supply a wrong one; you will then be prompted whether you want to list all of the current bot’s harmonization.

process

With no other arguments, the bot's `process()` method will be run once.

> intelmqctl run file-output process
file-output: Bot is starting.
file-output: Bot initialization completed.
file-output: Processing...
file-output: Waiting for incoming message.
file-output: Received message {'raw': '1234'}.

If run with the --dryrun|-d flag, the message is never really popped from the source or internal pipeline, nor sent to the output pipeline. In addition, you receive a note about the exact moment the message would get sent or acknowledged. If the message would be sent to a non-default path, the name of this path is printed on the console.

> intelmqctl run file-output process -d
file-output:  * Dryrun only, no message will be really sent through.
...
file-output: DRYRUN: Message would be acknowledged now!

You may trick the bot into processing a given JSON instead of the message in its pipeline with the --msg|-m flag.

> intelmqctl run file-output process -m '{"source.ip":"1.2.3.4"}'
file-output:  * Message from cli will be used when processing.
...

If you wish to display the processed message as well, use the --show-sent|-s flag. Then, if sent through (either with --dryrun or without), the message gets displayed as well.

disable

Sets the enabled flag in the runtime configuration of the bot to false. By default, all bots are enabled.

Example output:

> intelmqctl status file-output
file-output is stopped.
> intelmqctl disable file-output
> intelmqctl status file-output
file-output is disabled.
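The flag is stored in the bot's entry in the runtime configuration; a sketch of the resulting entry (module name and remaining fields are illustrative and abridged):

"file-output": {
    "enabled": false,
    "module": "intelmq.bots.outputs.file.output",
    "parameters": { ... }
},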
enable

Sets the enabled flag in the runtime configuration of the bot to true.

Example output:

> intelmqctl status file-output
file-output is disabled.
> intelmqctl enable file-output
> intelmqctl status file-output
file-output is stopped.

Manage the botnet

In IntelMQ, the botnet is the set of all currently configured and enabled bots. All configured bots have their configuration in runtime.conf and their queues in pipeline.conf. By default, all bots are enabled. To disable a bot set enabled to false. Also see Bots and Runtime Configuration.

If no bot ID is given, the command applies to all bots, i.e. the botnet. All commands except the start action are applied to all bots; only enabled bots are started.

In the examples below, a very minimal botnet is used.

start

The start action applies to all bots which are enabled.

> intelmqctl start
Starting abusech-domain-parser...
abusech-domain-parser is running.
Starting abusech-feodo-domains-collector...
abusech-feodo-domains-collector is running.
Starting deduplicator-expert...
deduplicator-expert is running.
file-output is disabled.
Botnet is running.

As we can see, file-output is disabled and thus has not been started. You can always start disabled bots explicitly.

stop

The stop action applies to all bots. Assume that all bots have been running:

> intelmqctl stop
Stopping Botnet...
Stopping abusech-domain-parser...
abusech-domain-parser is stopped.
Stopping abusech-feodo-domains-collector...
abusech-feodo-domains-collector is stopped.
Stopping deduplicator-expert...
deduplicator-expert is stopped.
Stopping file-output...
file-output is stopped.
Botnet is stopped.
status

With this command we can see the status of all configured bots. Here, the botnet was started beforehand:

> intelmqctl status
abusech-domain-parser is running.
abusech-feodo-domains-collector is running.
deduplicator-expert is running.
file-output is disabled.

And if the disabled bot has also been started:

> intelmqctl status
abusech-domain-parser is running.
abusech-feodo-domains-collector is running.
deduplicator-expert is running.
file-output is running.

If the botnet is stopped, the output looks like this:

> intelmqctl status
abusech-domain-parser is stopped.
abusech-feodo-domains-collector is stopped.
deduplicator-expert is stopped.
file-output is disabled.
restart

The same as start and stop consecutively.

reload

The same as reload of every bot.

enable / disable

The sub commands enable and disable set the corresponding flags in runtime.conf.

> intelmqctl status
file-output is stopped.
malware-domain-list-collector is stopped.
malware-domain-list-parser is stopped.
> intelmqctl disable file-output
> intelmqctl status
file-output is disabled.
malware-domain-list-collector is stopped.
malware-domain-list-parser is stopped.
> intelmqctl enable file-output
> intelmqctl status
file-output is stopped.
malware-domain-list-collector is stopped.
malware-domain-list-parser is stopped.

List bots

intelmqctl list bots lists all configured bots and their descriptions.

List queues

intelmqctl list queues shows all queues which are currently in use according to the configuration, and how many events are in them:

> intelmqctl list queues
abusech-domain-parser-queue - 0
abusech-domain-parser-queue-internal - 0
deduplicator-expert-queue - 0
deduplicator-expert-queue-internal - 0
file-output-queue - 234
file-output-queue-internal - 0

Use the -q or --quiet flag to only show non-empty queues:

> intelmqctl list queues -q
file-output-queue - 234

The --sum or --count flag will show the sum of events on all queues:

> intelmqctl list queues --sum
42

Log

intelmqctl can show the last log lines for a bot, filtered by the log level.

See the help page for more information.

Check

This command will do various sanity checks on the installation and especially the configuration.

Orphaned Queues

The intelmqctl check tool can search for orphaned queues. “Orphaned queues” are queues that have been used in the past and are no longer in use. For example, you had a bot which you removed or renamed afterwards, but there were still messages in its source queue. The source queue won’t be renamed automatically and is now disconnected. As this queue is no longer configured, it won’t show up in the list of IntelMQ’s queues either. In case you are using Redis as message broker, you can use the redis-cli tool to examine or remove these queues:

redis-cli -n 2
keys * # lists all existing non-empty queues
llen [queue-name] # shows the length of the queue [queue-name]
lindex [queue-name] [index] # show the [index]'s message of the queue [queue-name]
del [queue-name] # remove the queue [queue-name]

To ignore certain queues in this check, you can set the parameter intelmqctl_check_orphaned_queues_ignore in the defaults configuration file. For example:

"intelmqctl_check_orphaned_queues_ignore": ["Taichung-Parser"],

Configuration upgrade

The intelmqctl upgrade-config command upgrades the configuration from previous versions to the current one. It keeps track of previously installed versions and the results of all “upgrade functions” in the “state file”, located at $var_state_path/state.json (/opt/intelmq/var/lib/state.json or /var/lib/intelmq/state.json).

This function has been introduced in version 2.0.1.

It makes backups of all changed files before every run. Backups are overwritten if they already exist, so make sure to always have a backup of your configuration, just in case.
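The upgrade itself is a single command, typically run after every IntelMQ update:

> intelmqctl upgrade-config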

Exit code

In case of errors or unsuccessful operations, the exit code is higher than 0. For example, when running intelmqctl start and one enabled bot is not running, the exit code is 1. The same is valid for e.g. intelmqctl status, which can be used for monitoring, and all other operations.

Known issues

The currently implemented process management using PID files is error-prone.

Feeds

The available feeds are grouped by the provider of the feeds. For each feed, the collector and parser that can be used are documented, as well as any feed-specific parameters. To add feeds to this file, add them to intelmq/etc/feeds.yaml and then rebuild the documentation.

Contents

Abuse.ch

Feodo Tracker Browse

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://feodotracker.abuse.ch/browse

    • name: Feodo Tracker Browse

    • provider: Abuse.ch

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.html_table.parser

  • Configuration Parameters:
    • columns: time.source,source.ip,malware.name,status,source.as_name,source.geolocation.cc

    • ignore_values: ,,,,,

    • skip_table_head: True

    • type: c2server

Feodo Tracker IPs
  • Public: yes

  • Revision: 2019-03-25

  • Documentation: https://feodotracker.abuse.ch/

  • Description: List of botnet Command&Control servers (C&Cs) tracked by Feodo Tracker, associated with Dridex and Emotet (aka Heodo).

  • Additional Information: The data in the column Last Online is used for time.source if available, with 00:00 as the time. Otherwise, First Seen is used as time.source.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://feodotracker.abuse.ch/downloads/ipblocklist.csv

    • name: Feodo Tracker IPs

    • provider: Abuse.ch

    • rate_limit: 3600

Parser

  • Module: intelmq.bots.parsers.abusech.parser_ip

  • Configuration Parameters:

URLhaus
  • Public: yes

  • Revision: 2020-07-07

  • Documentation: https://urlhaus.abuse.ch/feeds/

  • Description: URLhaus is a project from abuse.ch with the goal of sharing malicious URLs that are being used for malware distribution. URLhaus offers a country, ASN (AS number) and Top Level Domain (TLD) feed for network operators / Internet Service Providers (ISPs), Computer Emergency Response Teams (CERTs) and domain registries.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://urlhaus.abuse.ch/feeds/tld/<TLD>/, https://urlhaus.abuse.ch/feeds/country/<CC>/, or https://urlhaus.abuse.ch/feeds/asn/<ASN>/

    • name: URLhaus

    • provider: Abuse.ch

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.generic.parser_csv

  • Configuration Parameters:
    • columns: [“time.source”, “source.url”, “status”, “classification.type|__IGNORE__”, “source.fqdn|__IGNORE__”, “source.ip”, “source.asn”, “source.geolocation.cc”]

    • default_url_protocol: http://

    • delimiter: ,

    • skip_header: False

    • type_translation: {“malware_download”: “malware-distribution”}

AlienVault

OTX
  • Public: no

  • Revision: 2018-01-20

  • Documentation: https://otx.alienvault.com/

  • Description: The AlienVault OTX Collector is the bot responsible for getting the report through the API. Reports may vary according to subscriptions.

Collector

  • Module: intelmq.bots.collectors.alienvault_otx.collector

  • Configuration Parameters:
    • api_key: {{ your API key }}

    • name: OTX

    • provider: AlienVault

Parser

  • Module: intelmq.bots.parsers.alienvault.parser_otx

  • Configuration Parameters:

Reputation List
  • Public: yes

  • Revision: 2018-01-20

  • Description: List of malicious IPs.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://reputation.alienvault.com/reputation.data

    • name: Reputation List

    • provider: AlienVault

    • rate_limit: 3600

Parser

  • Module: intelmq.bots.parsers.alienvault.parser

  • Configuration Parameters:

AnubisNetworks

Cyberfeed Stream

Collector

  • Module: intelmq.bots.collectors.http.collector_http_stream

  • Configuration Parameters:
    • http_url: https://prod.cyberfeed.net/stream?key={{ your API key }}

    • name: Cyberfeed Stream

    • provider: AnubisNetworks

    • strip_lines: true

Parser

  • Module: intelmq.bots.parsers.anubisnetworks.parser

  • Configuration Parameters:
    • use_malware_familiy_as_classification_identifier: True

Autoshun

Shunlist
  • Public: no

  • Revision: 2018-01-20

  • Documentation: https://www.autoshun.org/

  • Description: You need to register in order to use the list.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://www.autoshun.org/download/?api_key=__APIKEY__&format=html

    • name: Shunlist

    • provider: Autoshun

    • rate_limit: 3600

Parser

  • Module: intelmq.bots.parsers.autoshun.parser

  • Configuration Parameters:

Bambenek

C2 Domains

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_password: __PASSWORD__

    • http_url: https://faf.bambenekconsulting.com/feeds/c2-dommasterlist.txt

    • http_username: __USERNAME__

    • name: C2 Domains

    • provider: Bambenek

    • rate_limit: 3600

Parser

  • Module: intelmq.bots.parsers.bambenek.parser

  • Configuration Parameters:

C2 IPs

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_password: __PASSWORD__

    • http_url: https://faf.bambenekconsulting.com/feeds/c2-ipmasterlist.txt

    • http_username: __USERNAME__

    • name: C2 IPs

    • provider: Bambenek

    • rate_limit: 3600

Parser

  • Module: intelmq.bots.parsers.bambenek.parser

  • Configuration Parameters:

DGA Domains

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://faf.bambenekconsulting.com/feeds/dga-feed.txt

    • name: DGA Domains

    • provider: Bambenek

    • rate_limit: 3600

Parser

  • Module: intelmq.bots.parsers.bambenek.parser

  • Configuration Parameters:

Blocklist.de

Apache
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: http://www.blocklist.de/en/export.html

  • Description: Blocklist.DE Apache Collector is the bot responsible for getting the report from the source of information. It contains all IP addresses which have been reported within the last 48 hours as having run attacks on the services Apache, Apache-DDOS and RFI-Attacks.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://lists.blocklist.de/lists/apache.txt

    • name: Apache

    • provider: Blocklist.de

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.blocklistde.parser

  • Configuration Parameters:

Bots
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: http://www.blocklist.de/en/export.html

  • Description: Blocklist.DE Bots Collector is the bot responsible for getting the report from the source of information. It contains all IP addresses which have been reported within the last 48 hours as having run attacks as RFI-Attacks, REG-Bots, IRC-Bots or BadBots (BadBots = hosts that have posted a spam comment on an open forum or wiki).

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://lists.blocklist.de/lists/bots.txt

    • name: Bots

    • provider: Blocklist.de

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.blocklistde.parser

  • Configuration Parameters:

Brute-force Logins
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: http://www.blocklist.de/en/export.html

  • Description: Blocklist.DE Brute-force Login Collector is the bot responsible for getting the report from the source of information. It contains all IPs which attack Joomla, WordPress and other web logins with brute-force attempts.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://lists.blocklist.de/lists/bruteforcelogin.txt

    • name: Brute-force Logins

    • provider: Blocklist.de

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.blocklistde.parser

  • Configuration Parameters:

FTP
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: http://www.blocklist.de/en/export.html

  • Description: Blocklist.DE FTP Collector is the bot responsible for getting the report from the source of information. It contains all IP addresses which have been reported within the last 48 hours for attacks on the service FTP.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://lists.blocklist.de/lists/ftp.txt

    • name: FTP

    • provider: Blocklist.de

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.blocklistde.parser

  • Configuration Parameters:

IMAP
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: http://www.blocklist.de/en/export.html

  • Description: Blocklist.DE IMAP Collector is the bot responsible for getting the report from the source of information. It contains all IP addresses which have been reported within the last 48 hours for attacks on services like IMAP, SASL, POP3, etc.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://lists.blocklist.de/lists/imap.txt

    • name: IMAP

    • provider: Blocklist.de

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.blocklistde.parser

  • Configuration Parameters:

IRC Bots

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://lists.blocklist.de/lists/ircbot.txt

    • name: IRC Bots

    • provider: Blocklist.de

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.blocklistde.parser

  • Configuration Parameters:

Mail
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: http://www.blocklist.de/en/export.html

  • Description: Blocklist.DE Mail Collector is the bot responsible for getting the report from the source of information. It contains all IP addresses which have been reported within the last 48 hours as having run attacks on the services Mail and Postfix.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://lists.blocklist.de/lists/mail.txt

    • name: Mail

    • provider: Blocklist.de

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.blocklistde.parser

  • Configuration Parameters:

SIP
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: http://www.blocklist.de/en/export.html

  • Description: Blocklist.DE SIP Collector is the bot responsible for getting the report from the source of information. It contains all IP addresses that tried to log in to a SIP, VoIP or Asterisk server and are included in the IP list from http://www.infiltrated.net/ (Twitter).

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://lists.blocklist.de/lists/sip.txt

    • name: SIP

    • provider: Blocklist.de

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.blocklistde.parser

  • Configuration Parameters:

SSH
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: http://www.blocklist.de/en/export.html

  • Description: Blocklist.DE SSH Collector is the bot responsible for getting the report from the source of information. It contains all IP addresses which have been reported within the last 48 hours as having run attacks on the service SSH.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://lists.blocklist.de/lists/ssh.txt

    • name: SSH

    • provider: Blocklist.de

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.blocklistde.parser

  • Configuration Parameters:

Strong IPs
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: http://www.blocklist.de/en/export.html

  • Description: Blocklist.DE Strong IPs Collector is the bot responsible for getting the report from the source of information. It contains all IPs which are older than 2 months and have had more than 5,000 attacks.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://lists.blocklist.de/lists/strongips.txt

    • name: Strong IPs

    • provider: Blocklist.de

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.blocklistde.parser

  • Configuration Parameters:

Blueliv

CrimeServer
  • Public: no

  • Revision: 2018-01-20

  • Documentation: https://www.blueliv.com/

  • Description: Blueliv Crimeserver Collector is the bot responsible for getting the report through the API.

  • Additional Information: The service uses a different API for free users and paying subscribers. In the ‘CrimeServer’ feed the difference lies in the data points present in the feed. The non-free API available from Blueliv contains, for this specific feed, the following extra fields not present in the free API: “_id” (internal unique ID); “subType” (subtype of the Crime Server); “countryName” (country name where the Crime Server is located, in English); “city” (city where the Crime Server is located); “domain” (domain of the Crime Server); “host” (host of the Crime Server); “createdAt” (date when the Crime Server was added to the Blueliv CrimeServer database); “asnCidr” (range of IPs that belong to an ISP, registered via Autonomous System Number (ASN)); “asnId” (identifier of an ISP registered via ASN); “asnDesc” (description of the ISP registered via ASN).

Collector

  • Module: intelmq.bots.collectors.blueliv.collector_crimeserver

  • Configuration Parameters:
    • api_key: __APIKEY__

    • name: CrimeServer

    • provider: Blueliv

    • rate_limit: 3600

Parser

  • Module: intelmq.bots.parsers.blueliv.parser_crimeserver

  • Configuration Parameters:

CERT-Bund

CB-Report Malware infections via IMAP
  • Public: no

  • Revision: 2020-08-20

  • Description: CERT-Bund sends reports for the malware-infected hosts.

  • Additional Information: Traffic from malware-related hosts contacting command-and-control servers is caught and sent to national CERT teams. There are two e-mail feeds with identical CSV structure: one reports on general malware infections, the other on the Avalanche botnet.

Collector

  • Module: intelmq.bots.collectors.mail.collector_mail_attach

  • Configuration Parameters:
    • attach_regex: events.csv

    • extract_files: False

    • folder: INBOX

    • mail_host: __HOST__

    • mail_password: __PASSWORD__

    • mail_ssl: True

    • mail_user: __USERNAME__

    • name: CB-Report Malware infections via IMAP

    • provider: CERT-Bund

    • rate_limit: 86400

    • subject_regex: ^\[CB-Report#.* Malware infections (\(Avalanche\) )?in country

Parser

  • Module: intelmq.bots.parsers.generic.parser_csv

  • Configuration Parameters:
    • columns: [“source.asn”, “source.ip”, “time.source”, “classification.type”, “malware.name”, “source.port”, “destination.ip”, “destination.port”, “destination.fqdn”, “protocol.transport”]

    • default_url_protocol: http://

    • delimiter: ,

    • skip_header: True

    • time_format: from_format|%Y-%m-%d %H:%M:%S

    • type: infected-system

CERT.PL

N6 Stomp Stream
  • Public: no

  • Revision: 2018-01-20

  • Documentation: https://n6.cert.pl/en/

  • Description: CERT.pl’s N6 feed via the STOMP interface. Note that rate_limit does not apply to this bot, as it waits for messages on a stream.

  • Additional Information: Contact cert.pl to get access to the feed.

Collector

  • Module: intelmq.bots.collectors.stomp.collector

  • Configuration Parameters:
    • exchange: {insert your exchange point as given by CERT.pl}

    • name: N6 Stomp Stream

    • port: 61614

    • provider: CERT.PL

    • server: n6stream.cert.pl

    • ssl_ca_certificate: {insert path to CA file for CERT.pl’s n6}

    • ssl_client_certificate: {insert path to client cert file for CERT.pl’s n6}

    • ssl_client_certificate_key: {insert path to client cert key file for CERT.pl’s n6}

Parser

  • Module: intelmq.bots.parsers.n6.parser_n6stomp

  • Configuration Parameters:

CINSscore

Army List
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: https://cinsscore.com/#list

  • Description: The CINS Army list is a subset of the CINS Active Threat Intelligence ruleset, and consists of IP addresses that meet one of two basic criteria: 1) The IP’s recent Rogue Packet score factor is very poor, or 2) The IP has tripped a designated number of ‘trusted’ alerts across a given number of our Sentinels deployed around the world.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: http://cinsscore.com/list/ci-badguys.txt

    • name: Army List

    • provider: CINSscore

    • rate_limit: 3600

Parser

  • Module: intelmq.bots.parsers.ci_army.parser

  • Configuration Parameters:

CZ.NIC

HaaS
  • Public: yes

  • Revision: 2020-07-22

  • Documentation: https://haas.nic.cz/

  • Description: SSH attackers against HaaS (Honeypot as a Service) provided by CZ.NIC, z.s.p.o. The dump is published once a day.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • extract_files: True

    • http_url: https://haas.nic.cz/stats/export/{time[%Y/%m/%Y-%m-%d]}.json.gz

    • http_url_formatting: {‘days’: -1}

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.cznic.parser_haas

  • Configuration Parameters:

Proki
  • Public: no

  • Revision: 2020-08-17

  • Documentation: https://csirt.cz/en/proki/

  • Description: Aggregation of various sources on malicious IP addresses (malware spreaders or C&C servers).

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://proki.csirt.cz/api/1/__APIKEY__/data/day/{time[%Y/%m/%d]}

    • http_url_formatting: {‘days’: -1}

    • name: Proki

    • provider: CZ.NIC

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.cznic.parser_proki

  • Configuration Parameters:

Calidog

CertStream

Collector

  • Module: intelmq.bots.collectors.calidog.collector_certstream

  • Configuration Parameters:
    • name: CertStream

    • provider: Calidog

Parser

  • Module: intelmq.bots.parsers.calidog.parser_certstream

  • Configuration Parameters:

CleanMX

Phishing
  • Public: no

  • Revision: 2018-01-20

  • Documentation: http://clean-mx.de/

  • Description: In order to download the CleanMX feed you need to use a custom user agent and register that user agent.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_timeout_sec: 120

    • http_url: http://support.clean-mx.de/clean-mx/xmlphishing?response=alive&domain=

    • http_user_agent: {{ your user agent }}

    • name: Phishing

    • provider: CleanMX

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.cleanmx.parser

  • Configuration Parameters:

Virus
  • Public: no

  • Revision: 2018-01-20

  • Documentation: http://clean-mx.de/

  • Description: In order to download the CleanMX feed you need to use a custom user agent and register that user agent.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_timeout_sec: 120

    • http_url: http://support.clean-mx.de/clean-mx/xmlviruses?response=alive&domain=

    • http_user_agent: {{ your user agent }}

    • name: Virus

    • provider: CleanMX

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.cleanmx.parser

  • Configuration Parameters:

CyberCrime Tracker

Latest

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://cybercrime-tracker.net/index.php

    • name: Latest

    • provider: CyberCrime Tracker

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.html_table.parser

  • Configuration Parameters:
    • columns: [“time.source”, “source.url”, “source.ip”, “malware.name”, “__IGNORE__”]

    • default_url_protocol: http://

    • skip_table_head: True

    • type: c2server

DShield

AS Details

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://dshield.org/asdetailsascii.html?as={{ AS Number }}

    • name: AS Details

    • provider: DShield

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.dshield.parser_asn

  • Configuration Parameters:

Block
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: https://www.dshield.org/reports.html

  • Description: This list summarizes the top 20 attacking class C (/24) subnets over the last three days. The number of ‘attacks’ indicates the number of targets reporting scans from this subnet.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://www.dshield.org/block.txt

    • name: Block

    • provider: DShield

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.dshield.parser_block

  • Configuration Parameters:

Suspicious Domains
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: https://www.dshield.org/reports.html

  • Description: There are many suspicious domains on the internet. In an effort to identify them, as well as false positives, we have assembled weighted lists based on tracking and malware lists from different sources. ISC is collecting and categorizing various lists associated with a certain level of sensitivity.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://www.dshield.org/feeds/suspiciousdomains_High.txt

    • name: Suspicious Domains

    • provider: DShield

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.dshield.parser_domain

  • Configuration Parameters:

Danger Rulez

Bruteforce Blocker

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: http://danger.rulez.sk/projects/bruteforceblocker/blist.php

    • name: Bruteforce Blocker

    • provider: Danger Rulez

    • rate_limit: 3600

Parser

  • Module: intelmq.bots.parsers.danger_rulez.parser

  • Configuration Parameters:

Dataplane

SIP Query
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: http://dataplane.org/

  • Description: Entries consist of fields with identifying characteristics of a source IP address that has been seen initiating a SIP OPTIONS query to a remote host. This report lists hosts that are suspicious of more than just port scanning. The hosts may be SIP server cataloging or conducting various forms of telephony abuse. Report is updated hourly.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: http://dataplane.org/sipquery.txt

    • name: SIP Query

    • provider: Dataplane

    • rate_limit: 3600

Parser

  • Module: intelmq.bots.parsers.dataplane.parser

  • Configuration Parameters:

SIP Registration
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: http://dataplane.org/

  • Description: Entries consist of fields with identifying characteristics of a source IP address that has been seen initiating a SIP REGISTER operation to a remote host. This report lists hosts that are suspicious of more than just port scanning. The hosts may be SIP client cataloging or conducting various forms of telephony abuse. Report is updated hourly.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: http://dataplane.org/sipregistration.txt

    • name: SIP Registration

    • provider: Dataplane

    • rate_limit: 3600

Parser

  • Module: intelmq.bots.parsers.dataplane.parser

  • Configuration Parameters:

SSH Client Connection
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: http://dataplane.org/

  • Description: Entries below consist of fields with identifying characteristics of a source IP address that has been seen initiating an SSH connection to a remote host. This report lists hosts that are suspicious of more than just port scanning. The hosts may be SSH server cataloging or conducting authentication attack attempts. Report is updated hourly.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: http://dataplane.org/sshclient.txt

    • name: SSH Client Connection

    • provider: Dataplane

    • rate_limit: 3600

Parser

  • Module: intelmq.bots.parsers.dataplane.parser

  • Configuration Parameters:

SSH Password Authentication
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: http://dataplane.org/

  • Description: Entries below consist of fields with identifying characteristics of a source IP address that has been seen attempting to remotely login to a host using SSH password authentication. The report lists hosts that are highly suspicious and are likely conducting malicious SSH password authentication attacks. Report is updated hourly.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: http://dataplane.org/sshpwauth.txt

    • name: SSH Password Authentication

    • provider: Dataplane

    • rate_limit: 3600

Parser

  • Module: intelmq.bots.parsers.dataplane.parser

  • Configuration Parameters:

ESET

ETI Domains

Collector

  • Module: intelmq.bots.collectors.eset.collector

  • Configuration Parameters:
    • collection: ei.domains v2 (json)

    • endpoint: eti.eset.com

    • password: <password>

    • time_delta: 3600

    • username: <username>

Parser

  • Module: intelmq.bots.parsers.eset.parser

  • Configuration Parameters:

ETI URLs

Collector

  • Module: intelmq.bots.collectors.eset.collector

  • Configuration Parameters:
    • collection: ei.urls (json)

    • endpoint: eti.eset.com

    • password: <password>

    • time_delta: 3600

    • username: <username>

Parser

  • Module: intelmq.bots.parsers.eset.parser

  • Configuration Parameters:

Fraunhofer

DGA Archive

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_password: {{ your password}}

    • http_url: https://dgarchive.caad.fkie.fraunhofer.de/today

    • http_username: {{ your username}}

    • name: DGA Archive

    • provider: Fraunhofer

    • rate_limit: 10800

Parser

  • Module: intelmq.bots.parsers.fraunhofer.parser_dga

  • Configuration Parameters:

Have I Been Pwned

Enterprise Callback
  • Public: no

  • Revision: 2019-09-11

  • Documentation: https://haveibeenpwned.com/EnterpriseSubscriber/

  • Description: With the Enterprise Subscription of ‘Have I Been Pwned’ you are able to provide a callback URL and any new leak data is submitted to it. It is recommended to put a webserver with Authorization check, TLS etc. in front of the API collector.

  • Additional Information: A minimal nginx configuration could look like:

server {
    listen 443 ssl http2;
    server_name [your host name];
    client_max_body_size 50M;

    ssl_certificate [path to your certificate];
    ssl_certificate_key [path to your key];

    location /[your private url] {
        if ($http_authorization != '[your private password]') {
            return 403;
        }
        proxy_pass http://localhost:5001/intelmq/push;
        proxy_read_timeout 30;
        proxy_connect_timeout 30;
    }
}

Collector

  • Module: intelmq.bots.collectors.api.collector_api

  • Configuration Parameters:
    • name: Enterprise Callback

    • port: 5001

    • provider: Have I Been Pwned

Parser

  • Module: intelmq.bots.parsers.hibp.parser_callback

  • Configuration Parameters:

Malc0de

Bind Format
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: http://malc0de.com/dashboard/

  • Description: This feed includes FQDNs of malicious hosts; the file is in BIND zone file format.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://malc0de.com/bl/ZONES

    • name: Bind Format

    • provider: Malc0de

    • rate_limit: 10800

Parser

  • Module: intelmq.bots.parsers.malc0de.parser

  • Configuration Parameters:

IP Blacklist

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://malc0de.com/bl/IP_Blacklist.txt

    • name: IP Blacklist

    • provider: Malc0de

    • rate_limit: 10800

Parser

  • Module: intelmq.bots.parsers.malc0de.parser

  • Configuration Parameters:

Windows Format
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: http://malc0de.com/dashboard/

  • Description: This feed includes FQDNs of malicious hosts; the file is in Windows hosts file format.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://malc0de.com/bl/BOOT

    • name: Windows Format

    • provider: Malc0de

    • rate_limit: 10800

Parser

  • Module: intelmq.bots.parsers.malc0de.parser

  • Configuration Parameters:

Malware Domains

Malicious
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: http://www.malwaredomains.com/

  • Description: Malware Prevention through Domain Blocking (Black Hole DNS Sinkhole)

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: http://mirror1.malwaredomains.com/files/domains.txt

    • name: Malicious

    • provider: Malware Domains

    • rate_limit: 172800

Parser

  • Module: intelmq.bots.parsers.malwaredomains.parser

  • Configuration Parameters:

MalwarePatrol

DansGuardian

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://lists.malwarepatrol.net/cgi/getfile?receipt={{ your API key }}&product=8&list=dansguardian

    • name: DansGuardian

    • provider: MalwarePatrol

    • rate_limit: 180000

Parser

  • Module: intelmq.bots.parsers.malwarepatrol.parser_dansguardian

  • Configuration Parameters:

MalwareURL

Latest malicious activity

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://www.malwareurl.com/

    • name: Latest malicious activity

    • provider: MalwareURL

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.malwareurl.parser

  • Configuration Parameters:

McAfee Advanced Threat Defense

Sandbox Reports

Collector

  • Module: intelmq.bots.collectors.opendxl.collector

  • Configuration Parameters:
    • dxl_config_file: {{location of dxl configuration file}}

    • dxl_topic: /mcafee/event/atd/file/report

Parser

  • Module: intelmq.bots.parsers.mcafee.parser_atd

  • Configuration Parameters:
    • verdict_severity: 4

Microsoft

BingMURLs via Interflow
  • Public: no

  • Revision: 2018-05-29

  • Documentation: https://docs.microsoft.com/en-us/security/gsp/informationsharingandexchange

  • Description: Collects Malicious URLs detected by Bing from the Interflow API. The feed is available via Microsoft’s Government Security Program (GSP).

  • Additional Information: Depending on the file sizes you may need to increase the parameter ‘http_timeout_sec’ of the collector.

Collector

  • Module: intelmq.bots.collectors.microsoft.collector_interflow

  • Configuration Parameters:
    • api_key: {{your API key}}

    • file_match: ^bingmurls_

    • http_timeout_sec: 300

    • name: BingMURLs via Interflow

    • not_older_than: 2 days

    • provider: Microsoft

    • rate_limit: 3600

Parser

  • Module: intelmq.bots.parsers.microsoft.parser_bingmurls

  • Configuration Parameters:

CTIP C2 via Azure
  • Public: no

  • Revision: 2020-05-29

  • Documentation: https://docs.microsoft.com/en-us/security/gsp/informationsharingandexchange

  • Description: Collects the CTIP C2 feed from a shared Azure Storage. The feed is available via Microsoft’s Government Security Program (GSP).

  • Additional Information: The cache is needed for memorizing which files have already been processed, the TTL should be higher than the oldest file available in the storage (currently the last three days are available). The connection string contains endpoint as well as authentication information.

Collector

  • Module: intelmq.bots.collectors.microsoft.collector_azure

  • Configuration Parameters:
    • connection_string: {{your connection string}}

    • container_name: ctip-c2

    • name: CTIP C2 via Azure

    • provider: Microsoft

    • rate_limit: 3600

    • redis_cache_db: 5

    • redis_cache_host: 127.0.0.1

    • redis_cache_port: 6379

    • redis_cache_ttl: 864000

Parser

  • Module: intelmq.bots.parsers.microsoft.parser_ctip

  • Configuration Parameters:

CTIP Infected via Azure
  • Public: no

  • Revision: 2020-05-29

  • Documentation: https://docs.microsoft.com/en-us/security/gsp/informationsharingandexchange

  • Description: Collects the CTIP (Sinkhole data) from a shared Azure Storage. The feed is available via Microsoft’s Government Security Program (GSP).

  • Additional Information: The cache is needed for memorizing which files have already been processed, the TTL should be higher than the oldest file available in the storage (currently the last three days are available). The connection string contains endpoint as well as authentication information.

Collector

  • Module: intelmq.bots.collectors.microsoft.collector_azure

  • Configuration Parameters:
    • connection_string: {{your connection string}}

    • container_name: ctip-infected-summary

    • name: CTIP Infected via Azure

    • provider: Microsoft

    • rate_limit: 3600

    • redis_cache_db: 5

    • redis_cache_host: 127.0.0.1

    • redis_cache_port: 6379

    • redis_cache_ttl: 864000

Parser

  • Module: intelmq.bots.parsers.microsoft.parser_ctip

  • Configuration Parameters:

CTIP via Interflow
  • Public: no

  • Revision: 2018-03-06

  • Documentation: https://docs.microsoft.com/en-us/security/gsp/informationsharingandexchange

  • Description: Collects the CTIP Infected feed (Sinkhole data for your country) files from the Interflow API. The feed is available via Microsoft’s Government Security Program (GSP).

  • Additional Information: Depending on the file sizes you may need to increase the parameter ‘http_timeout_sec’ of the collector. As many IPs occur very often in the data, you may want to use a deduplicator specifically for the feed.

Collector

  • Module: intelmq.bots.collectors.microsoft.collector_interflow

  • Configuration Parameters:
    • api_key: {{your API key}}

    • file_match: ^ctip_

    • http_timeout_sec: 300

    • name: CTIP via Interflow

    • not_older_than: 2 days

    • provider: Microsoft

    • rate_limit: 3600

Parser

  • Module: intelmq.bots.parsers.microsoft.parser_ctip

  • Configuration Parameters:

Netlab 360

DGA
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: http://data.netlab.360.com/dga

  • Description: This feed lists the DGA family, the domain, and the start and end of the valid time (UTC) for a number of DGA families.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: http://data.netlab.360.com/feeds/dga/dga.txt

    • name: DGA

    • provider: Netlab 360

    • rate_limit: 3600

Parser

  • Module: intelmq.bots.parsers.netlab_360.parser

  • Configuration Parameters:

Hajime Scanner
  • Public: yes

  • Revision: 2019-08-01

  • Documentation: https://data.netlab.360.com/hajime/

  • Description: This feed lists IP address for know Hajime bots network. These IPs data are obtained by joining the DHT network and interacting with the Hajime node

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://data.netlab.360.com/feeds/hajime-scanner/bot.list

    • name: Hajime Scanner

    • provider: Netlab 360

    • rate_limit: 3600

Parser

  • Module: intelmq.bots.parsers.netlab_360.parser

  • Configuration Parameters:

Magnitude EK
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: http://data.netlab.360.com/ek

  • Description: This feed lists FQDN and possibly the URL used by Magnitude Exploit Kit. Information also includes the IP address used for the domain and last time seen.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: http://data.netlab.360.com/feeds/ek/magnitude.txt

    • name: Magnitude EK

    • provider: Netlab 360

    • rate_limit: 3600

Parser

  • Module: intelmq.bots.parsers.netlab_360.parser

  • Configuration Parameters:

Mirai Scanner
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: http://data.netlab.360.com/mirai-scanner/

  • Description: This feed provides IP addresses which actively scan for vulnerable IoT devices and install the Mirai botnet.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: http://data.netlab.360.com/feeds/mirai-scanner/scanner.list

    • name: Mirai Scanner

    • provider: Netlab 360

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.netlab_360.parser

  • Configuration Parameters:

OpenPhish

Premium Feed
  • Public: no

  • Revision: 2018-02-06

  • Documentation: https://www.openphish.com/phishing_feeds.html

  • Description: OpenPhish is a fully automated self-contained platform for phishing intelligence. It identifies phishing sites and performs intelligence analysis in real time without human intervention and without using any external resources, such as blacklists.

  • Additional Information: Discounts are available for Government and National CERTs as well as for Nonprofit and Not-for-Profit organizations.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_password: {{ your password}}

    • http_url: https://openphish.com/prvt-intell/

    • http_username: {{ your username}}

    • name: Premium Feed

    • provider: OpenPhish

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.openphish.parser_commercial

  • Configuration Parameters:

Public feed
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: https://www.openphish.com/

  • Description: OpenPhish is a fully automated self-contained platform for phishing intelligence. It identifies phishing sites and performs intelligence analysis in real time without human intervention and without using any external resources, such as blacklists.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://www.openphish.com/feed.txt

    • name: Public feed

    • provider: OpenPhish

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.openphish.parser

  • Configuration Parameters:

PhishTank

Online

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://data.phishtank.com/data/{{ your API key }}/online-valid.csv

    • name: Online

    • provider: PhishTank

    • rate_limit: 28800

Parser

  • Module: intelmq.bots.parsers.phishtank.parser

  • Configuration Parameters:

PrecisionSec

Agent Tesla

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://precisionsec.com/threat-intelligence-feeds/agent-tesla/

    • name: Agent Tesla

    • provider: PrecisionSec

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.html_table.parser

  • Configuration Parameters:
    • columns: ["source.ip|source.url", "time.source"]

    • default_url_protocol: http://

    • skip_table_head: True

    • type: malware

Shadowserver

Via API
  • Public: no

  • Revision: 2020-01-08

  • Documentation: https://www.shadowserver.org/what-we-do/network-reporting/api-documentation/

  • Description: Shadowserver sends out a variety of reports to subscribers, see documentation.

  • Additional Information: This configuration fetches user-configurable reports from the Shadowserver Reports API. For a list of reports, have a look at the Shadowserver collector and parser documentation.

Collector

  • Module: intelmq.bots.collectors.shadowserver.collector_reports_api

  • Configuration Parameters:
    • api_key: <API key>

    • country: <CC>

    • rate_limit: 86400

    • redis_cache_db: 12

    • redis_cache_host: 127.0.0.1

    • redis_cache_port: 6379

    • redis_cache_ttl: 864000

    • secret: <API secret>

    • types: <single report or list of reports>

Parser

  • Module: intelmq.bots.parsers.shadowserver.parser_json

  • Configuration Parameters:

Via IMAP

Collector

  • Module: intelmq.bots.collectors.mail.collector_mail_attach

  • Configuration Parameters:
    • attach_regex: csv.zip

    • extract_files: True

    • folder: INBOX

    • mail_host: __HOST__

    • mail_password: __PASSWORD__

    • mail_ssl: True

    • mail_user: __USERNAME__

    • name: Via IMAP

    • provider: Shadowserver

    • rate_limit: 86400

    • subject_regex: __REGEX__

Parser

  • Module: intelmq.bots.parsers.shadowserver.parser

  • Configuration Parameters:

Via Request Tracker

Collector

  • Module: intelmq.bots.collectors.rt.collector_rt

  • Configuration Parameters:
    • attachment_regex: \.csv\.zip$

    • extract_attachment: True

    • extract_download: False

    • http_password: {{ your HTTP Authentication password or null }}

    • http_username: {{ your HTTP Authentication username or null }}

    • password: __PASSWORD__

    • provider: Shadowserver

    • rate_limit: 3600

    • search_not_older_than: {{ relative time or null }}

    • search_owner: nobody

    • search_queue: Incident Reports

    • search_requestor: autoreports@shadowserver.org

    • search_status: new

    • search_subject_like: [__COUNTRY__] Shadowserver __COUNTRY__

    • set_status: open

    • take_ticket: True

    • uri: http://localhost/rt/REST/1.0

    • url_regex: https://dl.shadowserver.org/[a-zA-Z0-9?_-]*

    • user: __USERNAME__

Parser

  • Module: intelmq.bots.parsers.shadowserver.parser

  • Configuration Parameters:

Shodan

Country Stream
  • Public: no

  • Revision: 2021-03-22

  • Documentation: https://developer.shodan.io/api/stream

  • Description: Collects the Shodan stream for one or multiple countries from the Shodan API.

  • Additional Information: A Shodan account with streaming permissions is needed.

Collector

  • Module: intelmq.bots.collectors.shodan.collector_stream

  • Configuration Parameters:
    • api_key: <API key>

    • countries: <comma-separated list of country codes>

    • error_retry_delay: 0

    • name: Country Stream

    • provider: Shodan

Parser

  • Module: intelmq.bots.parsers.shodan.parser

  • Configuration Parameters:
    • error_retry_delay: 0

    • ignore_errors: False

    • minimal_mode: False

Spamhaus

ASN Drop
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: https://www.spamhaus.org/drop/

  • Description: ASN-DROP contains a list of Autonomous System Numbers controlled by spammers or cyber criminals, as well as “hijacked” ASNs. ASN-DROP can be used to filter BGP routes which are being used for malicious purposes.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://www.spamhaus.org/drop/asndrop.txt

    • name: ASN Drop

    • provider: Spamhaus

    • rate_limit: 3600

Parser

  • Module: intelmq.bots.parsers.spamhaus.parser_drop

  • Configuration Parameters:

CERT

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: {{ your CERT portal URL }}

    • name: CERT

    • provider: Spamhaus

    • rate_limit: 3600

Parser

  • Module: intelmq.bots.parsers.spamhaus.parser_cert

  • Configuration Parameters:

Drop
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: https://www.spamhaus.org/drop/

  • Description: The DROP list will not include any IP address space under the control of any legitimate network - even if being used by “the spammers from hell”. DROP will only include netblocks allocated directly by an established Regional Internet Registry (RIR) or National Internet Registry (NIR) such as ARIN, RIPE, AFRINIC, APNIC, LACNIC or KRNIC or direct RIR allocations.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://www.spamhaus.org/drop/drop.txt

    • name: Drop

    • provider: Spamhaus

    • rate_limit: 3600

Parser

  • Module: intelmq.bots.parsers.spamhaus.parser_drop

  • Configuration Parameters:

Dropv6
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: https://www.spamhaus.org/drop/

  • Description: The DROPv6 list includes IPv6 ranges allocated to spammers or cyber criminals. DROPv6 will only include IPv6 netblocks allocated directly by an established Regional Internet Registry (RIR) or National Internet Registry (NIR) such as ARIN, RIPE, AFRINIC, APNIC, LACNIC or KRNIC or direct RIR allocations.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://www.spamhaus.org/drop/dropv6.txt

    • name: Dropv6

    • provider: Spamhaus

    • rate_limit: 3600

Parser

  • Module: intelmq.bots.parsers.spamhaus.parser_drop

  • Configuration Parameters:

EDrop
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: https://www.spamhaus.org/drop/

  • Description: EDROP is an extension of the DROP list that includes sub-allocated netblocks controlled by spammers or cyber criminals. EDROP is meant to be used in addition to the direct allocations on the DROP list.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://www.spamhaus.org/drop/edrop.txt

    • name: EDrop

    • provider: Spamhaus

    • rate_limit: 3600

Parser

  • Module: intelmq.bots.parsers.spamhaus.parser_drop

  • Configuration Parameters:

Strangereal Intel

DailyIOC

Collector

  • Module: intelmq.bots.collectors.github_api.collector_github_contents_api

  • Configuration Parameters:
    • basic_auth_password: PASSWORD

    • basic_auth_username: USERNAME

    • regex: .*.json

    • repository: StrangerealIntel/DailyIOC

Parser

  • Module: intelmq.bots.parsers.github_feed

  • Configuration Parameters:

Sucuri

Hidden IFrames
  • Public: yes

  • Revision: 2018-01-28

  • Documentation: http://labs.sucuri.net/?malware

  • Description: Latest hidden iframes identified on compromised web sites.

  • Additional Information: Please note that the parser only extracts the hidden iframes and the conditional redirects, not the encoded JavaScript.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: http://labs.sucuri.net/?malware

    • name: Hidden IFrames

    • provider: Sucuri

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.sucuri.parser

  • Configuration Parameters:

Surbl

Malicious Domains
  • Public: no

  • Revision: 2018-09-04

  • Description: Detected malicious domains. Note that you have to request Sponsored Datafeed Service (SDS) access to the SURBL data via rsync for your IP address.

Collector

  • Module: intelmq.bots.collectors.rsync.collector_rsync

  • Configuration Parameters:
    • file: wild.surbl.org.rbldnsd

    • rsync_path: blacksync.prolocation.net::surbl-wild/

Parser

  • Module: intelmq.bots.parsers.surbl.parser

  • Configuration Parameters:

Taichung

Netflow Recent

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://www.tc.edu.tw/net/netflow/lkout/recent/

    • name: Netflow Recent

    • provider: Taichung

    • rate_limit: 3600

Parser

  • Module: intelmq.bots.parsers.taichung.parser

  • Configuration Parameters:

Team Cymru

CAP
  • Public: no

  • Revision: 2018-01-20

  • Documentation: https://www.team-cymru.com/CSIRT-AP.html https://www.cymru.com/$certname/report_info.txt

  • Description: Team Cymru provides daily lists of compromised or abused devices for the ASNs and/or netblocks within a CSIRT’s jurisdiction. This includes information such as bot-infected hosts, command and control systems, open resolvers, malware URLs, phishing URLs, and brute force attacks.

  • Additional Information: “Two feed types are offered:

Both formats are supported by the parser and the new one is recommended. As of 2019-09-12, the old format is to be retired soon.”

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_password: {{your password}}

    • http_url: https://www.cymru.com/$certname/$certname_{time[%Y%m%d]}.txt

    • http_url_formatting: True

    • http_username: {{your login}}

    • name: CAP

    • provider: Team Cymru

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.cymru.parser_cap_program

  • Configuration Parameters:

Full Bogons IPv4
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: https://www.team-cymru.com/bogon-reference-http.html

  • Description: Fullbogons are a larger set which also includes IP space that has been allocated to an RIR, but not assigned by that RIR to an actual ISP or other end-user. IANA maintains a convenient IPv4 summary page listing allocated and reserved netblocks, and each RIR maintains a list of all prefixes that they have assigned to end-users. Our bogon reference pages include additional links and resources to assist those who wish to properly filter bogon prefixes within their networks.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://www.team-cymru.org/Services/Bogons/fullbogons-ipv4.txt

    • name: Full Bogons IPv4

    • provider: Team Cymru

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.cymru.parser_full_bogons

  • Configuration Parameters:

Full Bogons IPv6
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: https://www.team-cymru.com/bogon-reference-http.html

  • Description: Fullbogons are a larger set which also includes IP space that has been allocated to an RIR, but not assigned by that RIR to an actual ISP or other end-user. IANA maintains a convenient IPv4 summary page listing allocated and reserved netblocks, and each RIR maintains a list of all prefixes that they have assigned to end-users. Our bogon reference pages include additional links and resources to assist those who wish to properly filter bogon prefixes within their networks.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://www.team-cymru.org/Services/Bogons/fullbogons-ipv6.txt

    • name: Full Bogons IPv6

    • provider: Team Cymru

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.cymru.parser_full_bogons

  • Configuration Parameters:

Threatminer

Recent domains

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://www.threatminer.org/

    • name: Recent domains

    • provider: Threatminer

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.threatminer.parser

  • Configuration Parameters:

Turris

Greylist
  • Public: yes

  • Revision: 2018-01-20

  • Documentation: https://project.turris.cz/en/greylist

  • Description: The data are processed and classified every week and behaviour of IP addresses that accessed a larger number of Turris routers is evaluated. The result is a list of addresses that have tried to obtain information about services on the router or tried to gain access to them. The list also contains a list of tags for each address which indicate what behaviour of the address was observed.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://www.turris.cz/greylist-data/greylist-latest.csv

    • name: Greylist

    • provider: Turris

    • rate_limit: 43200

Parser

  • Module: intelmq.bots.parsers.turris.parser

  • Configuration Parameters:

Greylist with PGP signature verification

As with the Greylist feed, the behaviour of IP addresses that accessed a larger number of Turris routers is evaluated. The result is a list of addresses that have tried to obtain information about services on the router or tried to gain access to them. The list also contains a list of tags for each address which indicate what behaviour of the address was observed.

The Turris Greylist feed provides PGP signatures for the provided files. You will need to import the public PGP key from the linked documentation page, currently available at https://pgp.mit.edu/pks/lookup?op=vindex&search=0x10876666 or from below. See the URL Fetcher Collector documentation for more information on PGP signature verification.

PGP Public key:

-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: SKS 1.1.6
Comment: Hostname: pgp.mit.edu

mQINBFRl7D8BEADaRFoDa/+r27Gtqrdn8sZL4aSYTU4Q3gDr3TfigK8H26Un/Y79a/DUL1o0
o8SRae3uwVcjJDHZ6KDnxThbqF7URfpuCcCYxOs8p/eu3dSueqEGTODHWF4ChIh2japJDc4t
3FQHbIh2e3GHotVqJGhvxMmWqBFoZ/mlWvhjs99FFBZ87qbUNk7l1UAGEXeWeECgz9nGox40
3YpCgEsnJJsKC53y5LD/wBf4z+z0GsLg2GMRejmPRgrkSE/d9VjF/+niifAj2ZVFoINSVjjI
8wQFc8qLiExdzwLdgc+ggdzk5scY3ugI5IBt1zflxMIOG4BxKj/5IWsnhKMG2NLVGUYOODoG
pKhcY0gCHypw1bmkp2m+BDVyg4KM2fFPgQ554DAX3xdukMCzzZyBxR3UdT4dN7xRVhpph3Y2
Amh1E/dpde9uwKFk1oRHkRZ3UT1XtpbXtFNY0wCiGXPt6KznJAJcomYFkeLHjJo3nMK0hISV
GSNetVLfNWlTkeo93E1innbSaDEN70H4jPivjdVjSrLtIGfr2IudUJI84dGmvMxssWuM2qdg
FSzoTHw9UE9KT3SltKPS+F7u9x3h1J492YaVDncATRjPZUBDhbvo6Pcezhup7XTnI3gbRQc2
oEUDb933nwuobHm3VsUcf9686v6j8TYehsbjk+zdA4BoS/IdCwARAQABtC5UdXJyaXMgR3Jl
eWxpc3QgR2VuZXJhdG9yIDxncmV5bGlzdEB0dXJyaXMuY3o+iQI4BBMBAgAiBQJUZew/AhsD
BgsJCAcDAgYVCAIJCgsEFgIDAQIeAQIXgAAKCRDAQrU3EIdmZoH4D/9Jo6j9RZxCAPTaQ9WZ
WOdb1Eqd/206bObEX+xJAago+8vuy+waatHYBM9/+yxh0SIg2g5whd6J7A++7ePpt5XzX6hq
bzdG8qGtsCRu+CpDJ40UwHep79Ck6O/A9KbZcZW1z/DhbYT3z/ZVWALy4RtgmyC67Vr+j/C7
KNQ529bs3kP9AzvEIeBC4wdKl8dUSuZIPFbgf565zRNKLtHVgVhiuDPcxKmBEl4/PLYF30a9
5Tgp8/PNa2qp1DV/EZjcsxvSRIZB3InGBvdKdSzvs4N/wLnKWedj1GGm7tJhSkJa4MLBSOIx
yamhTS/3A5Cd1qoDhLkp7DGVXSdgEtpoZDC0jR7nTS6pXojcgQaF7SfJ3cjZaLI5rjsx0YLk
G4PzonQKCAAQG1G9haCDniD8NrrkZ3eFiafoKEECRFETIG0BJHjPdSWcK9jtNCupBYb7JCiz
Q0hwLh2wrw/wCutQezD8XfsBFFIQC18TsJAVgdHLZnGYkd5dIbV/1scOcm52w6EGIeMBBYlB
J2+JNukH5sJDA6zAXNl2I1H1eZsP4+FSNIfB6LdovHVPAjn7qXCw3+IonnQK8+g8YJkbbhKJ
sPejfg+ndpe5u0zX+GvQCFBFu03muANA0Y/OOeGIQwU93d/akN0P1SRfq+bDXnkRIJQOD6XV
0ZPKVXlNOjy/z2iN2A==
=wjkM
-----END PGP PUBLIC KEY BLOCK-----

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://www.turris.cz/greylist-data/greylist-latest.csv

    • name: Greylist

    • provider: Turris

    • rate_limit: 43200

    • signature_url: https://www.turris.cz/greylist-data/greylist-latest.csv.asc

    • verify_pgp_signatures: True

Parser

  • Module: intelmq.bots.parsers.turris.parser

  • Configuration Parameters:

University of Toulouse

Blacklist

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • extract_files: true

    • http_url: https://dsi.ut-capitole.fr/blacklists/download/{collection name}.tar.gz

    • name: Blacklist

    • provider: University of Toulouse

    • rate_limit: 43200

Parser

  • Module: intelmq.bots.parsers.generic.parser_csv

  • Configuration Parameters:
    • columns: {depends on a collection}

    • delimiter: false

    • type: {depends on a collection}

VXVault

URLs

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: http://vxvault.net/URL_List.php

    • name: URLs

    • provider: VXVault

    • rate_limit: 3600

Parser

  • Module: intelmq.bots.parsers.vxvault.parser

  • Configuration Parameters:

ViriBack

Unsafe sites
  • Public: yes

  • Revision: 2018-06-27

  • Documentation: https://viriback.com/

  • Description: Latest detected unsafe sites.

  • Additional Information: You need to install the lxml library in order to parse this feed.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: http://tracker.viriback.com/

    • name: Unsafe sites

    • provider: ViriBack

    • rate_limit: 86400

Parser

  • Module: intelmq.bots.parsers.html_table.parser

  • Configuration Parameters:
    • columns: ["malware.name", "source.url", "source.ip", "time.source"]

    • html_parser: lxml

    • time_format: from_format_midnight|%d-%m-%Y

    • type: malware

WebInspektor

Unsafe sites
  • Public: yes

  • Revision: 2018-03-09

  • Description: Latest detected unsafe sites.

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: https://app.webinspector.com/public/recent_detections/

    • name: Unsafe sites

    • provider: WebInspektor

    • rate_limit: 60

Parser

  • Module: intelmq.bots.parsers.webinspektor.parser

  • Configuration Parameters:

ZoneH

Defacements
  • Public: no

  • Revision: 2018-01-20

  • Documentation: https://zone-h.org/

  • Description: All the information contained in Zone-H’s cybercrime archive was either collected online from public sources or directly notified anonymously to Zone-H.

Collector

  • Module: intelmq.bots.collectors.mail.collector_mail_attach

  • Configuration Parameters:
    • attach_regex: csv

    • extract_files: False

    • folder: INBOX

    • mail_host: __HOST__

    • mail_password: __PASSWORD__

    • mail_ssl: True

    • mail_user: __USERNAME__

    • name: Defacements

    • provider: ZoneH

    • rate_limit: 3600

    • sent_from: datazh@zone-h.org

    • subject_regex: Report

Parser

  • Module: intelmq.bots.parsers.zoneh.parser

  • Configuration Parameters:

cAPTure

Ponmocup Domains CIF Format

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: http://security-research.dyndns.org/pub/malware-feeds/ponmocup-infected-domains-CIF-latest.txt

    • name: Infected Domains

    • provider: cAPTure

    • rate_limit: 10800

Parser

  • Module: intelmq.bots.parsers.dyn.parser

  • Configuration Parameters:

Ponmocup Domains Shadowserver Format

Collector

  • Module: intelmq.bots.collectors.http.collector_http

  • Configuration Parameters:
    • http_url: http://security-research.dyndns.org/pub/malware-feeds/ponmocup-infected-domains-shadowserver.csv

    • name: Infected Domains

    • provider: cAPTure

    • rate_limit: 10800

Parser

  • Module: intelmq.bots.parsers.generic.parser_csv

  • Configuration Parameters:
    • columns: ["time.source", "source.ip", "source.fqdn", "source.urlpath", "source.port", "protocol.application", "extra.tag", "extra.redirect_target", "extra.category"]

    • compose_fields: {'source.url': 'http://{0}{1}'}

    • delimiter: ,

    • skip_header: True

    • type: malware-distribution

Frequently asked questions

For questions about the API, have a look at the API documentation page

Send IntelMQ events to Splunk

  1. Configure Splunk so that it can receive logs (IntelMQ events) on a TCP port.

  2. Use the TCP Output bot and configure it according to the Splunk configuration you applied; see the configuration sketch below.
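
As a minimal sketch, the runtime configuration entry for the TCP Output bot could look like the following. The bot id splunk-output and the address 10.10.10.20:5140 are hypothetical; use the values of your Splunk TCP input:

"splunk-output": {
    "group": "Output",
    "module": "intelmq.bots.outputs.tcp.output",
    "parameters": {
        "ip": "10.10.10.20",
        "port": 5140
    }
}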

Permission denied when using Redis Unix socket

If you get an error like this:

intelmq.lib.exceptions.PipelineError: pipeline failed - ConnectionError('Error 13 connecting to unix socket: /var/run/redis/redis.sock. Permission denied.',)

Make sure the intelmq user has sufficient permissions for the socket.

In /etc/redis/redis.conf (or wherever your configuration is), check the permissions and set it for example to group-writeable:

unixsocketperm 770

And add the user intelmq to the redis-group:

usermod -aG redis intelmq

Why is the time invalid?

If you wonder why you are getting errors like this:

intelmq.lib.exceptions.InvalidValue: invalid value '2017-03-06T07:36:29' () for key 'time.source'

IntelMQ requires time zone information for all timestamps. Without a time zone, the time is ambiguous and therefore rejected.
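
For illustration, here is a minimal Python sketch showing why the value is rejected and how an explicit offset fixes it (assuming the feed’s times are in UTC):

from datetime import datetime, timezone

naive = "2017-03-06T07:36:29"  # rejected by IntelMQ: no time zone information
# Attach an explicit UTC offset (only correct if the source really uses UTC):
aware = datetime.fromisoformat(naive).replace(tzinfo=timezone.utc)
print(aware.isoformat())  # 2017-03-06T07:36:29+00:00 - accepted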

How can I improve the speed?

In most cases the bottlenecks are look-up experts. In these cases you can easily use the integrated load balancing features.

Multithreading

When using the AMQP broker, you can make use of Multi-threading. See the Multithreading (Beta) section.

“Classic” load-balancing (Multiprocessing)

Before multithreading was available in IntelMQ, and in case you use Redis as the broker, the only way to do load balancing involves more work. Create multiple instances of the same bot and connect them all to the same source and destination bots. Then set the parameter load_balance to true for the bot which sends the messages to the duplicated bots. That bot then sends each message to only one of the destination queues and not to all of them; see the sketch below.
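
A sketch of such a setup in pipeline.conf, with hypothetical bot ids (my-parser additionally gets "load_balance": true in its runtime.conf parameters; the lookup expert exists twice with identical configuration):

{
    "my-parser": {
        "source-queue": "my-parser-queue",
        "destination-queues": ["lookup-expert-1-queue", "lookup-expert-2-queue"]
    },
    "lookup-expert-1": {
        "source-queue": "lookup-expert-1-queue",
        "destination-queues": ["my-output-queue"]
    },
    "lookup-expert-2": {
        "source-queue": "lookup-expert-2-queue",
        "destination-queues": ["my-output-queue"]
    }
}

With load_balance enabled, my-parser alternates between the two expert queues instead of duplicating each message to both.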

True multiprocessing is not available in IntelMQ. See also this discussion on a possible enhanced load balancing.

Other options

For any bottleneck based on (online) lookups, optimize the lookup itself and if possible use local databases.

It is also possible to use multiple servers to spread the workload. To get the messages from one system to the other you can either directly connect to the other’s pipeline or use a fast exchange mechanism such as the TCP Collector/Output (make sure to secure the network by other means).

Removing raw data for higher performance and less space usage

If you do not need the raw data, you can safely remove it. For events (after parsers), the raw field keeps the original data, e.g. a line of a CSV file. In reports, it keeps the actual data to be parsed, so do not delete the raw field in reports, i.e. between collectors and parsers.

The raw data accounts for roughly 30% - 50% of a message’s size. The exact share of course depends on how much additional data you add to the message and how much data the report includes. Dropping it improves the speed, as less data needs to be transferred and processed at each step.

In a bot

You can do this for example by using the Field Reducer Expert. The configuration could be:

  • type: blacklist

  • keys: raw

Other solutions are the Modify bot and the Sieve bot. The last one is a good choice if you already use it and you only need to add the command:

remove raw

In the database

In case you store data in the database and you want to keep its size small, you can (periodically) delete the raw data there.

To remove the raw data for a events table of a PostgreSQL database, you can use something like:

UPDATE events SET raw = NULL WHERE "time.source" < '2018-07-01';

If the database is big, make sure to only update small parts of the database by using an appropriate WHERE clause. If you do not see any negative performance impact, you can increase the size of the chunks; otherwise the events in the output bot may queue up. The id column can also be used instead of the source’s time; see the example below.
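
For example, a chunked clean-up via the id column could look like this (hypothetical range; repeat with the next range until the whole table is processed):

UPDATE events SET raw = NULL WHERE id >= 1000000 AND id < 1100000;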

My bot(s) died on startup with no errors logged

Rather than starting your bot(s) with intelmqctl start, try intelmqctl run [bot]. This will provide valuable debug output you might not otherwise see, pointing to issues like system configuration errors.

Orphaned Queues

This section has been moved to the section Orphaned Queues.

Multithreading is not available for this bot

Multithreading is not available for some bots, and the AMQP broker is required. Possible reasons why a certain bot or setup does not support multithreading include:

  • Multithreading is only available when using the AMQP broker.

  • For most collectors, Multithreading is disabled. Otherwise this would lead to duplicated data, as the data retrieval is not atomic.

  • Some bots use libraries which are not thread safe. Look at the bot’s documentation for more information.

  • Some bots’ operations are not thread safe. Look at the bot’s documentation for more information.

If you think this mapping is wrong, please report a bug.

IntelMQ API

intelmq-api is a hug-based API for the IntelMQ project.

Installing and running intelmq-api

intelmq-api requires the IntelMQ package to be installed on the system (it uses intelmqctl to control the botnet).

You can install the intelmq-api package using your preferred system package installation mechanism or using the pip Python package installer. We provide packages for the intelmq-api for the same operating systems as we do for the intelmq package itself. For the list of supported distributions, please see the intelmq Installation page.

Our repository page gives installation instructions for various operating systems. No additional set-up steps are needed if you use these packages.

The intelmq-api provides the route /api for managing the IntelMQ installation.

For development purposes and testing you can also run intelmq-api directly using hug:

hug -m intelmq_api.serve
Installation using pip

The intelmq-api packages ship a configuration file in ${PREFIX}/etc/intelmq/api-config.json, a positions configuration for the manager in ${PREFIX}/etc/intelmq/manager/positions.conf, a virtualhost configuration file for Apache 2 in ${PREFIX}/etc/intelmq/api-apache.conf and a sudoers configuration file in ${PREFIX}/etc/intelmq/api-sudoers.conf. The value of ${PREFIX} depends on your environment and is something like /usr/local/lib/pythonX.Y/dist-packages/ (where X.Y is your Python version).

The file ${PREFIX}/etc/intelmq/api-apache.conf needs to be placed in the correct place for your Apache 2 installation.
  • On Debian and Ubuntu, move the file to /etc/apache2/conf-available/api-apache.conf and then execute a2enconf api-apache.

  • On CentOS, RHEL and Fedora, move the file to /etc/httpd/conf.d/.

  • On openSUSE, move the file to /etc/apache2/conf.d/.

Don’t forget to reload your webserver afterwards.

  • The file ${PREFIX}/etc/intelmq/api-config.json needs to be moved to /etc/intelmq/api-config.json.

  • The file ${PREFIX}/etc/intelmq/manager/positions.conf needs to be moved to /etc/intelmq/manager/positions.conf.

  • Last but not least move the file ${PREFIX}/etc/intelmq/api-sudoers.conf to /etc/sudoers.d/01_intelmq-api and adapt the webserver user name in this file. Set the file permissions to 0o440.

Afterwards continue with the section Permissions below.

IntelMQ 2.3.1 comes with a tool intelmqsetup which performs these set-up steps automatically. Please note that the tool is very new and may not detect all situations correctly. Please report any bugs you observe. The tool is idempotent; you can execute it multiple times.

Configuring intelmq-api

Depending on your setup you might have to install sudo to make it possible for the intelmq-api to run the intelmq command as the user-account usually used to run intelmq (which is also often called intelmq).

intelmq-api is configured using a configuration file in JSON format. intelmq-api tries to load the configuration file from /etc/intelmq/api-config.json and ${PREFIX}/etc/intelmq/api-config.json, but you can override the path by setting the environment variable INTELMQ_API_CONFIG. (When using Apache, you can do this by modifying the Apache configuration file shipped with intelmq-api; the file contains an example.)

When running the API using hug, you can set the environment variable like this:

INTELMQ_API_CONFIG=/etc/intelmq/api-config.json hug -m intelmq_api.serve

The default configuration which is shipped with the packages is also listed here for reference:

{
    "intelmq_ctl_cmd": ["sudo", "-u", "intelmq", "intelmqctl"],
    "allowed_path": "/opt/intelmq/var/lib/bots/",
    "session_store": "/etc/intelmq/api-session.sqlite",
    "session_duration": 86400,
    "allow_origins": ["*"]
}

On Debian based systems, the default path for the session_store is /var/lib/dbconfig-common/sqlite3/intelmq-api/intelmqapi, because the Debian package uses the Debian packaging tools to manage the database file.

The following configuration options are available:

  • intelmq_ctl_cmd: Your intelmqctl command. If this is not set in a configuration file, the default is used, which is ["sudo", "-u", "intelmq", "/usr/local/bin/intelmqctl"]. The option intelmq_ctl_cmd is a list of strings so that we can avoid shell-injection vulnerabilities, because no shell is involved when running the command. This means that if the command you want to use needs parameters, they have to be separate strings.

  • allowed_path: intelmq-api can grant read-only access to specific files - this setting defines the path those files can reside in.

  • session_store: this is an optional path to a sqlite database, which is used for session storage and authentication. If it is not set (which is the default), no authentication is used!

  • session_duration: the maximal duration of a session; it is 86400 seconds by default

  • allow_origins: a list of origins the responses of the API can be shared with. Allows every origin by default.

Permissions

intelmq-api tries to write a couple of configuration files in the ${PREFIX}/etc/intelmq directory - this is only possible if you set the permissions accordingly, given that intelmq-api runs under a different user. The user the API runs as also needs write access to the folder the session_store is located in, otherwise there will be an error accessing the session data. If you’re using the default Apache 2 setup, you might want to set the group of the files to www-data and give it write permissions (chmod -R g+w <directoryname>). In addition to that, intelmq-manager tries to store the bot positions via the API in the file ${PREFIX}/etc/intelmq/manager/positions.conf. You should therefore create the folder ${PREFIX}/etc/intelmq/manager and the file positions.conf in it.
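
A sketch of these steps, assuming the default Apache 2 setup on Debian/Ubuntu with the webserver group www-data:

mkdir -p ${PREFIX}/etc/intelmq/manager
touch ${PREFIX}/etc/intelmq/manager/positions.conf
chgrp -R www-data ${PREFIX}/etc/intelmq
chmod -R g+w ${PREFIX}/etc/intelmq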

Adding a user

If you enable the session_store you will have to create user accounts to be able to access the API functionality. You can do this using intelmq-api-adduser:

intelmq-api-adduser --user <username> --password <password>

A note on SELinux

On systems with SELinux enabled, the API will fail to call intelmqctl. Therefore, SELinux needs to be disabled:

setenforce 0

We welcome contributions to provide SELinux policies.

Frequent operational problems

IntelMQCtlError

If the command is not configured correctly, you’ll see exceptions on startup like this:

intelmq_manager.runctl.IntelMQCtlError: <ERROR_MESSAGE>

This means the intelmqctl command could not be executed as a subprocess. The <ERROR_MESSAGE> should indicate why.

Access Denied / Authentication Required “Please provide valid Token verification credentials”

If you see the IntelMQ Manager interface and menu, but the API calls to the back-end querying configuration and status of IntelMQ fail with “Access Denied” or “Authentication Required: Please provide valid Token verification credentials” errors, you might not be logged in while the API requires authentication.

By default, the API requires authentication. Create user accounts and login with them or - if you have other protection means in place - deactivate the authentication requirement by removing or renaming the session_store parameter in the configuration.

Internal Server Error

There can be various reasons for internal server errors. You need to look at the error log of your web server, for example /var/log/apache2/error.log or /var/log/httpd/error_log for Apache 2. It could be that the sudo setup is not functional, the configuration file or session database file cannot be read or written, or there are other errors in regard to the execution of the API program.

Can I just install it from the deb/rpm packages while installing IntelMQ from a different source?

Yes, you can install the API and the Manager from the deb/rpm repositories and install IntelMQ from somewhere else, e.g. a local repository. However, some Python knowledge and system administration experience is recommended if you do so.

The packages install IntelMQ to /usr/lib/python3*/site-packages/intelmq/. Installing with pip results in /usr/local/lib/python3*/site-packages/intelmq/ (and some other accompanying resources), which overrides the installation in /usr/lib/. You probably need to adapt the configuration parameter intelmq_ctl_cmd to the /usr/local/bin/intelmqctl executable and apply some other tweaks.

sqlite3.OperationalError: attempt to write a readonly database

SQLite does not only need write access to the database itself, but also the folder the database file is located in. Please check that the webserver has write permissions to the folder the session file is located in.
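
For example, assuming Apache 2 on Debian/Ubuntu (webserver user/group www-data) and the session database in /etc/intelmq/, something like the following could fix the permissions:

chgrp www-data /etc/intelmq /etc/intelmq/api-session.sqlite
chmod g+w /etc/intelmq /etc/intelmq/api-session.sqlite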

Getting help

You can use the IntelMQ users mailing lists and GitHub issues for getting help and getting in touch with other users and developers. See also the Introduction page.

IntelMQ Manager

IntelMQ Manager is a graphical interface to manage configurations for IntelMQ. Its goal is to provide an intuitive tool to allow non-programmers to specify the data flow in IntelMQ.

Installation

For the intelmq-manager webinterface any operating system that can serve HTML pages is supported. intelmq-manager can be installed via Python pip or via the operating systems package manager. We provide packages for the intelmq-manager for the same operating systems as we do for the intelmq package itself. For the list of supported distributions, please see the IntelMQ Installation page.

Our repository page gives installation instructions for various operating systems. No additional set-up steps are needed if you use these packages.

To use the intelmq-manager webinterface, you have to have a working intelmq installation which provides access to the IntelMQ API.

When using distribution packages, the webserver configuration (which is also shown below) for Apache will be automatically installed and the HTML files are stored under /usr/share/intelmq-manager/html. The webinterface is then available at http://localhost/intelmq-manager.

Installation using pip

For installation via pip, the situation is more complex. The packages install the HTML files in ${PREFIX}/usr/share/intelmq-manager/html. The value of ${PREFIX} depends on your environment and is something like /usr/local/lib/pythonX.Y/dist-packages/ (where X.Y is your Python version). You can either move the files to /usr/share/intelmq-manager/html or adapt the path in the webserver configuration, see below.

intelmq-manager ships with a default configuration for the Apache webserver (in ${PREFIX}/etc/intelmq/manager-apache.conf):

Alias /intelmq-manager /usr/share/intelmq_manager/html/

<Directory /usr/share/intelmq_manager/html>
    <IfModule mod_headers.c>
    Header set Content-Security-Policy "script-src 'self'"
    Header set X-Content-Security-Policy "script-src 'self'"
    </IfModule>
</Directory>
This file needs to be placed in the correct place for your Apache 2 installation.
  • On Debian and Ubuntu, move the file to /etc/apache2/conf-available/manager-apache.conf and then execute a2enconf manager-apache.

  • On CentOS, RHEL and Fedora, move the file to /etc/httpd/conf.d/.

  • On openSUSE, move the file to /etc/apache2/conf.d/.

Don’t forget to reload your webserver afterwards.

IntelMQ 2.3.1 comes with a tool intelmqsetup which performs these set-up steps automatically. Please note that the tool is very new and may not detect all situations correctly. Please report any bugs you observe. The tool is idempotent; you can execute it multiple times.

Security considerations

Never ever run intelmq-manager on a public webserver without SSL and proper authentication!

The way the current version is written, anyone can change IntelMQ’s configuration files by sending HTTP POST requests. intelmq-manager will reject non-JSON data, but nevertheless we don’t want anyone to be able to reconfigure an IntelMQ installation.

Therefore you will need authentication and SSL. Authentication can be handled by the intelmq-api. Please refer to its documentation on how to enable authentication and setup accounts.

Never ever allow unencrypted, unauthenticated access to intelmq-manager!

Configuration

In the file /usr/share/intelmq-manager/html/js/vars.js, set ROOT to the URL of your intelmq-api installation - by default that’s on the same host as intelmq-manager.

It is recommended to set these two headers for all requests:

Content-Security-Policy: script-src 'self'
X-Content-Security-Policy: script-src 'self'

Screenshots

This interface lets you visually configure the whole IntelMQ pipeline and the parameters of every single bot. You will be able to see the pipeline in a graph-like visualisation similar to the following screenshot (click to enlarge):

Main Interface

When you add a node or edit one you’ll be presented with a form with the available parameters for a bot. There you can easily change the parameters as shown in the screenshot:

Parameter editing

After editing the bots’ configuration and pipeline, simply click “Save Configuration” to automatically write the changes to the correct files. The configurations are now ready to be deployed.

Note well: if you do not press “Save Configuration” your changes will be lost whenever you reload the web page or move between different tabs within the IntelMQ manager page.

When you save a configuration you can go to the ‘Management’ section to see what bots are running and start/stop the entire botnet, or a single bot.

Botnet Management

You can also monitor the logs of individual bots or see the status of the queues for the entire system or for single bots.

In this next example we can see the number of queued messages for all the queues in the system.

Botnet Monitor

In the following example we can see the status information of a single bot, namely the number of queued messages in the queues related to that bot, and also the last 20 log lines of that single bot.

Bot Monitor

Usage

Any underlined letter denotes an access key shortcut. The required key combination differs per browser:

  • Firefox: Alt + Shift + letter

  • Chrome & Chromium: Alt + letter

The IntelMQ Manager queries the configuration file paths and directory names from intelmqctl and therefore any global environment variables (if set) are effective in the Manager too. The interface for this query is intelmqctl debug --get-paths, the result is also shown in the /about.html page of your IntelMQ Manager installation.

For more information on the ability to adapt paths, have a look at the Configuration section.

Named queues / paths

With IntelMQ Manager you can set the name of certain paths by double-clicking on the line which connects two bots:

Enter path

The name is then displayed along the edge:

Show path name

Connecting with other systems

IntelMQ Ecosystem

IntelMQ is more than the core library itself; many programs are developed around it in the IntelMQ initiative. This document provides an overview of the ecosystem and all related tools. If you think something is missing, please let us know!

IntelMQ “Core”

This is IntelMQ itself, as it is available on github.

It includes all the bots, the harmonization, etc.

IntelMQ Manager

The Manager is the best-known software component and can be seen as the face of IntelMQ. It provides a graphical user interface for the management tool intelmqctl.

Repository: IntelMQ Manager

EventDB

This is not a software product itself, but it is listed here because the term is often mentioned.

The EventDB is a (usually PostgreSQL) database with data from intelmq.

For some related scripts see the contrib/eventdb directory and the eventdb-stats repository for simple statistics generation.

intelmq-webinput-csv

A web-based interface to inject CSV data into IntelMQ with on-line validation and live feedback.

Repository: intelmq-webinput-csv

intelmq-cb-mailgen

A solution allowing an IntelMQ setup with a complex contact database, managed by a web interface and sending out aggregated email reports. (In different words: To send grouped notifications to network owners using SMTP.)

Repository: intelmq-cb-mailgen

IntelMQ Fody + Backend

Fody is a web based interface for intelmq-mailgen’s contact database and the EventDB. It can also be used to just query the EventDB.

The certbund-contact expert fetches the information from this contact database and provides scripts to import RIPE data into the contact database.

Repository: intelmq-fody

Repository: intelmq-fody-backend

Repository: intelmq-certbund-contact

intelmq-mailgen

The email sending part:

Repository: intelmq-mailgen

“Constituency Portal” do-portal (not developed any further)

Note: A new version is being developed from scratch, see do-portal#133 for more information.

A contact portal with organizational hierarchies, role functionality and network objects based on RIPE, allows self-administration by the contacts. Can be queried from IntelMQ and integrates the stats-portal.

Repository: do-portal

stats-portal

A Grafana-based statistics portal for the EventDB. Integrated in do-portal.

Repository: stats-portal

Malware Name Mapping

A mapping for malware names of different feeds with different names to a common family name.

Repository: malware_name_mapping

IntelMQ-Docker

A repository with tools for running IntelMQ as a Docker instance.

Repository: intelmq-docker

ELK Stack

If you wish to run IntelMQ with ELK (Elasticsearch, Logstash, Kibana), it is entirely possible. This guide assumes the reader is familiar with the basic configuration of ELK and does not aim to cover using ELK in general. It is based on version 6.8.0 (ELK is a fast moving train, therefore things might change). Assuming you have an IntelMQ (and Redis) installation in place, let’s dive in.

Configuring IntelMQ for Logstash

In order to pass IntelMQ events to Logstash, we will utilize the already installed Redis. Add a new Redis Output Bot to your pipeline. At a minimum, fill in the following parameters: bot-id, redis_server_ip (can be a hostname), redis_server_port, redis_password (if required, else set it to empty!) and redis_queue (name of the queue). Redis IP, port and password can be taken from defaults.conf. It is recommended to use a different redis_db parameter than those used by IntelMQ (specified in defaults.conf as source_pipeline_db, destination_pipeline_db and statistics_database).

Example values:

bot-id: logstash-output
redis_server_ip: 10.10.10.10
redis_server_port: 6379
redis_db: 4
redis_queue: logstash-queue
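
Assembled into an (abridged) runtime configuration entry, this could look like the following sketch; the values are the hypothetical ones from above:

"logstash-output": {
    "module": "intelmq.bots.outputs.redis.output",
    "parameters": {
        "redis_server_ip": "10.10.10.10",
        "redis_server_port": 6379,
        "redis_db": 4,
        "redis_password": "",
        "redis_queue": "logstash-queue"
    }
}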

Notes

  • Unfortunately you will not be able to monitor this redis queue via IntelMQ Manager.

Configuring Logstash

Logstash defines pipelines as well. In the Logstash pipeline configuration you need to specify where it should look for IntelMQ events, what to do with them and where to pass them.

Input

This part describes how to receive data from Redis queue. See the example configuration and comments below:

input {
  redis {
    host => "10.10.10.10"
    port => 6379
    db => 4
    data_type => "list"
    key => "logstash-queue"
  }
}
  • host - same as redis_server_ip from the Redis Output Bot

  • port - the redis_server_port from the Redis Output Bot

  • db - the redis_db parameter from the Redis Output Bot

  • data_type - set to list

  • key - same as redis_queue from the Redis Output Bot

Notes

  • You can also use syntax like this: host => "${REDIS_HOST:10.10.10.10}". The value will be taken from the environment variable $REDIS_HOST. If the environment variable is not defined, the default value of 10.10.10.10 will be used instead.

Filter (optional)

Before passing the data to the database you can apply certain changes. This is done with filters. See an example:

filter {
  mutate {
    lowercase => ["source.geolocation.city", "classification.identifier"]
    remove_field => ["__type", "@version"]
  }
  date {
    match => ["time.observation", "ISO8601"]
  }
}

Notes

  • It is not recommended to apply any modifications to the data (within the mutate key) outside of IntelMQ. All necessary modifications should be done only by the appropriate IntelMQ bots. This example only demonstrates the possibility.

  • It is recommended to use the date filter: generally we have two timestamp fields - time.source (provided by the feed source; this can be understood as when the event happened, however it is not always present) and time.observation (when IntelMQ collected the event). Logstash also adds another field, @timestamp, with the time of processing by Logstash. While that can be useful for debugging, it is recommended to set @timestamp to the same value as time.observation - which the date filter in the example above already does, as its default target is @timestamp.

Output

The pipeline also needs output, where we define our database (Elasticsearch). The simplest way of doing so is defining an output like this:

output {
  elasticsearch {
    hosts => ["http://10.10.10.11:9200", "http://10.10.10.12:9200"]
    index => "intelmq-%{+YYYY.MM}"
  }
}
  • hosts - Elasticsearch host (or more) with the correct port (9200 by default)

  • index - name of the index where to insert data

Notes

  • The authors’ experience, hardware equipment and the amount of collected events led to having a separate index for each month. This might not necessarily suit your needs, but it is a suggested option.

  • By default the ELK stack uses insecure HTTP. It is possible to set up Security for secure connections and basic user management. This is possible with the Basic (free) licence since versions 6.8.0 and 7.1.0.

Configuring Elasticsearch

Configuring Elasticsearch is entirely up to you and should be consulted with the official documentation. What you will most likely need is something called index template mappings. IntelMQ provides a tool for generating such mappings. See ElasticMapper Tool.

Notes

  • The default installation of the Elasticsearch database allows anyone with cURL and connection capability administrative access to the database. Make sure you secure your toys!

MISP integrations in IntelMQ

MISP API Collector

The MISP API Collector fetches data from MISP via the MISP API.

Look at the Bots’ documentation for more information.

MISP Expert

The MISP Expert searches MISP by API for attributes/events matching the source.ip of the event. The MISP Attribute UUID and MISP Event ID of the newest attribute are added to the event.

Look at the Bots’ documentation for more information.

MISP Feed Output

This bot creates a complete “MISP feed” ready to be configured in MISP as incoming data source.

Look at the Bots’ documentation for more information.

MISP API Output

Can be used to directly create MISP events in a MISP instance.

Look at the Bots’ documentation for more information.

IntelMQ - n6 Integration

n6 is an Open Source Tool with very similar aims to IntelMQ: processing and distributing IoC data. It is developed by CERT.pl. The covered use-cases differ and both tools have non-overlapping strengths.

Information about n6 can be found here:

  • Website: https://n6.cert.pl/en/

  • Development: https://github.com/CERT-Polska/n6/

n6 schema

Data format

The internal data representations differ between the systems, so any data exchanged between them needs to be converted. As n6 can save multiple IP addresses per event, which IntelMQ is unable to do, one n6 event results in one or more IntelMQ events. For this and other reasons, the conversion is not bidirectional.

Data exchange interface

n6 offers a STOMP interface via the RabbitMQ broker, which can be used for both sending and receiving data. IntelMQ has both a STOMP collector bot as well as a STOMP output bot.
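
A sketch of a STOMP collector entry for the n6 stream (placeholder values; the exchange point and the certificate files are provided by CERT.pl, the parameter names follow the STOMP collector bot):

"n6-collector": {
    "module": "intelmq.bots.collectors.stomp.collector",
    "parameters": {
        "server": "n6stream.cert.pl",
        "port": 61614,
        "exchange": "{insert your exchange point as given by CERT.pl}",
        "ssl_ca_certificate": "ca.pem",
        "ssl_client_certificate": "client.pem",
        "ssl_client_certificate_key": "client.key"
    }
}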

Data conversion

IntelMQ can parse n6 data using the n6 parser and n6 can parse IntelMQ data using the Intelmq2n6 parser.

Webinput CSV

The IntelMQ Webinput CSV software can also be used together with n6. The documentation can be found in the software’s repository: https://github.com/certat/intelmq-webinput-csv/blob/master/docs/webinput-n6.md

Getting involved

Developers Guide

Intended Audience

This guide is for developers of IntelMQ. It explains the code architecture, coding guidelines as well as ways you can contribute code or documentation. If you have not done so, please read the Introduction first. Once you feel comfortable running IntelMQ with open source bots and you feel adventurous enough to contribute to the project, this guide is for you. It does not matter if you are an experienced Python programmer or just a beginner. There are a lot of samples to help you out.

However, before we go into the details, it is important to observe and internalize some overall project goals.

Goals

It is important that all developers agree on and stick to these meta-guidelines. IntelMQ tries to:

  • Be well tested. For developers this means we expect you to write unit tests for bots. Every time.

  • Reduce the complexity of system administration

  • Reduce the complexity of writing new bots for new data feeds

  • Make your code easily and pleasantly readable

  • Reduce the probability of events lost in all process with persistence functionality (even system crash)

  • Strictly adhere to the existing Data Harmonization for key-values in events

  • Always use JSON format for all messages internally

  • Help and support the interconnection between IntelMQ and existing tools like AbuseHelper, CIF, etc. or new tools (in other words: we will not accept data-silos!)

  • Provide an easy way to store data into Log Collectors like ElasticSearch, Splunk

  • Provide an easy way to create your own black-lists

  • Provide easy to understand interfaces with other systems via HTTP RESTFUL API

The main takeaway point from the list above is: things MUST stay __intuitive__ and __easy__. How do you ultimately test if things are still easy? Let new programmers test-drive your features; if they are not understandable in 15 minutes, go back to the drawing board.

Similarly, if code does not get accepted upstream by the main developers, it is usually only because of the ease-of-use argument. Do not give up, go back to the drawing board, and re-submit again.

Development Environment

Installation

Developers can create a fork of the IntelMQ repository in order to commit new code to it and then be able to open pull requests to the main repository. Otherwise you can just use ‘certtools’ as the username below.

The following instructions will use pip3 install -e, which gives you a so-called editable installation. No code is copied into the library directories, there’s just a link to your code. However, configuration files still need to be moved to /opt/intelmq as the instructions show.

In this guide we use /opt/dev_intelmq as local repository copy. You can also use other directories as long as they are readable by other unprivileged users (e.g. home directories on Fedora can’t be read by other users by default). /opt/intelmq is used as root location for IntelMQ installations, this is IntelMQ’s default for this installation method. This directory is used for configurations (/opt/intelmq/etc), local states (/opt/intelmq/var/lib) and logs (/opt/intelmq/var/log).

sudo -s

git clone https://github.com/<your username>/intelmq.git /opt/dev_intelmq
cd /opt/dev_intelmq

pip3 install -e .

useradd -d /opt/intelmq -U -s /bin/bash intelmq

intelmqsetup

Note: please do not forget that configuration files and log files will be available in /opt/intelmq. However, if your development is somehow related to any shipped configuration file, you need to apply the changes in your repository /opt/dev_intelmq/intelmq/etc/.

How to develop

After you have successfully set up your IntelMQ development environment, you can perform any development on any .py file in /opt/dev_intelmq. After a change, you can use the normal procedure to run the bots:

su - intelmq

intelmqctl start spamhaus-drop-collector

tail -f /opt/intelmq/var/log/spamhaus-drop-collector.log

You can also add new bots by creating the new .py file in the proper directory inside /opt/dev_intelmq/intelmq. However, your IntelMQ installation with pip3 needs to be updated. Please check the following section.

Update

In case you developed a new bot, you need to update your current development installation. In order to do that, please follow this procedure:

  1. Add the new bot information to /opt/dev_intelmq/intelmq/bots/BOTS, not /opt/intelmq/etc/BOTS.

  2. Make sure that you have your new bot in the right place and the information on BOTS file is correct.

  3. Execute the following commands:

sudo -s

cd /opt/dev_intelmq
## necessary for pip metadata update and new executables:
pip3 install -e .
## only necessary if it's not a link yet
cp -fs /opt/dev_intelmq/intelmq/bots/BOTS /opt/intelmq/etc/BOTS

find /opt/intelmq/ -type d -exec chmod 0770 {} \+
find /opt/intelmq/ -type f -exec chmod 0660 {} \+
chown -R intelmq.intelmq /opt/intelmq
## if you use the intelmq manager (adapt the webservers' group if needed):
chown intelmq.www-data /opt/intelmq/etc/*.conf

Now you can test run your new bot following this procedure:

su - intelmq

intelmqctl start <bot_id>
Testing
Additional optional requirements

For the documentation tests two additional libraries are required: Cerberus and PyYAML. You can install them with pip:

pip3 install Cerberus PyYAML

or the package management of your operating system.

Run the tests

All changes have to be tested and new contributions should be accompanied by according unit tests. For security reasons, please do not run the tests as root, just like any other IntelMQ component. Any other unprivileged user is possible.

You can run the tests by changing to the directory with IntelMQ repository and running either unittest or nosetests:

cd /opt/dev_intelmq
sudo -u intelmq python3 -m unittest {discover|filename}  # or
sudo -u intelmq nosetests3 [filename]  # alternatively nosetests or nosetests-3.8 depending on your installation, or
sudo -u intelmq python3 setup.py test  # uses a build environment (no external dependencies)

Some bots need local databases to succeed. If you only want to test one explicit test file, give the file path as argument.

There are multiple GitHub Action Workflows setup for automatic testing, which are triggered on pull requests. You can also easily activate them for your forks.

Environment variables

There are a bunch of environment variables which switch on/off some tests:

  • INTELMQ_TEST_DATABASES: databases such as postgres, elasticsearch, mongodb are not tested by default. Set this environment variable to 1 to test those bots. These tests need preparation, e.g. running databases with users and certain passwords etc. Have a look at the .github/workflows/nosetests.yml and the corresponding .github/workflows/scripts/setup-full.sh in IntelMQ’s repository for steps to set databases up.

  • INTELMQ_SKIP_INTERNET: tests requiring internet connection will be skipped if this is set to 1.

  • INTELMQ_SKIP_REDIS: redis-related tests are run by default; set this to 1 to skip those.

  • INTELMQ_TEST_EXOTIC: some bots and tests require libraries which may not be available, those are skipped by default. To run them, set this to 1.

  • INTELMQ_TEST_REDIS_PASSWORD: Set this value to the password for the local redis database if needed.

For example, to run all tests you can use:

INTELMQ_TEST_DATABASES=1 INTELMQ_TEST_EXOTIC=1 nosetests3
Configuration test files

The tests use the configuration files in your working directory, not those installed in /opt/intelmq/etc/ or /etc/. You can run the tests for a locally changed intelmq without affecting an installation or requiring root to run them.

Development Guidelines

Coding-Rules

Most important: KEEP IT SIMPLE!! This cannot be overestimated. Feature creep can destroy any good software project. If new folks cannot understand what you wrote in 10-15 minutes, it is not good. It’s not about the performance, etc. It’s about readability.

In general, we follow PEP 0008. We recommend reading it before committing code.

There are some exceptions: sometimes it does not make sense to check for every PEP8 error (such as whitespace indentation when you want to make a dict() assignment look pretty). Therefore, we do have some exceptions defined in the setup.cfg file.

We support Python 3 only.

Unicode
  • Each internal object in IntelMQ (Event, Report, etc.) that contains strings MUST have those strings in UTF-8 Unicode format.

  • Any data received from external sources MUST be transformed into UTF-8 Unicode format before adding it to IntelMQ objects.

Back-end independence and Compatibility

Any component of IntelMQ MUST be independent of the message queue technology (Redis, RabbitMQ, etc.).

Layout Rules
intelmq/
  lib/
    bot.py
    cache.py
    message.py
    pipeline.py
    utils.py
  bots/
    collector/
      <bot name>/
            collector.py
    parser/
      <bot name>/
            parser.py
    expert/
      <bot name>/
            expert.py
    output/
      <bot name>/
            output.py
    BOTS
  conf/
    pipeline.conf
    runtime.conf
    defaults.conf

Assume you want to create a bot for a new ‘Abuse.ch’ feed. It turns out that it is necessary to create different parsers for the respective kinds of events (e.g. malicious URLs). Therefore, the usual hierarchy ‘intelmq/bots/parser/<FEED>/parser.py’ would not be suitable, because more than one parser is needed for the Abuse.ch feeds. The solution is to use the same hierarchy with an additional “description” in the file name, separated by an underscore. Also see the section Directories and Files naming.

Example (including the current ones):

/intelmq/bots/parser/abusech/parser_domain.py
/intelmq/bots/parser/abusech/parser_ip.py
/intelmq/bots/parser/abusech/parser_ransomware.py

/intelmq/bots/parser/abusech/parser_malicious_url.py
Documentation

Please document your added/modified code.

For doc strings, we are using the Sphinx Napoleon Google style with type annotations.

Additionally, Python’s type hints/annotations are used, see PEP 484.

Directories Hierarchy on Default Installation
  • Configuration Files Path: /opt/intelmq/etc/

  • PID Files Path: /opt/intelmq/var/run/

  • Logs Files and dumps Path: /opt/intelmq/var/log/

  • Additional Bot Files Path, e.g. templates or databases: /opt/intelmq/var/lib/bots/[bot-name]/

Directories and Files naming

Any directory and file of IntelMQ has to follow the Directories and Files naming. Any file name or folder name has to:

  • be lowercase; if the name has multiple words, the spaces between them must be removed or replaced by underscores;

  • be self-explanatory about what it contains.

The bot directory name must correspond to the feed provider. If necessary and applicable, the feed name can and should be used as a postfix of the file name.

Examples:

intelmq/bots/parser/taichung/parser.py
intelmq/bots/parser/cymru/parser_full_bogons.py
intelmq/bots/parser/abusech/parser_ransomware.py
Class Names

The class name of the bot must combine its name and type: e.g. the PhishTank parser (type: Parser) becomes PhishTankParserBot.

Data Harmonization Rules

Any component of IntelMQ MUST respect the “Data Harmonization Ontology”.

Reference: IntelMQ Data Harmonization - Data Harmonization

Code Submission Rules
Releases, Repositories and Branches
  • The main repository is in github.com/certtools/intelmq.

  • There are a couple of forks which might be regularly merged into the main repository. They are independent and can have incompatible changes and can deviate from the upstream repository.

  • We use semantic versioning. A short summary:

      • a.x are stable releases

      • a.b.x are bugfix/patch releases

      • a.x must be compatible with version a.0 (i.e. API/config compatibility)

  • If you contribute something, please fork the repository, create a separate branch and use this for pull requests, see section below.

Branching model
  • “master” is the stable branch. It holds the latest stable release. Non-developers should only work on this branch. The recommended log level is WARNING. Code is only added by merges from the maintenance branches.

  • “maintenance/a.b.x” branches accumulate (cherry-picked) patches for a maintenance release (a.b.x). Recommended for experienced users who deploy IntelMQ themselves. No new features will be added to these branches.

  • “develop” is the development branch for the next stable release (a.x). New features must go there. Developers may want to work on this branch. This branch also holds all patches from maintenance releases if applicable. The recommended log level is DEBUG.

  • Separate branches to develop features or bug fixes may be used by any contributor.

How to Contribute
  • Make separate pull requests / branches on GitHub for changes. This allows us to discuss things via GitHub.

  • We prefer one pull request per feature or change. If you have a bunch of small fixes, please don’t create one PR per fix :)

  • Only very small changes (docs, …) might be committed directly to development branches without a pull request by the core team.

  • Keep the balance between atomic commits and keeping the amount of commits per PR small. You can use interactive rebasing to squash multiple small commits into one (rebase -i [base-branch]). Only rebase if the code you are rebasing is not yet used by others and not already merged, because otherwise others may run into conflicts.

  • Make sure your PR is mergeable into the develop branch and all tests are successful.

  • If possible, sign your commits with GPG.

Workflow

We assume here, that origin is your own fork. We first add the upstream repository:

> git remote add upstream https://github.com/certtools/intelmq.git

Syncing develop:

> git checkout develop
> git pull upstream develop
> git push origin develop

You can do the same with the branches master and maintenance.

Create a separate branch to work on. For features, sync develop with upstream and create the working branch from develop:

> git checkout develop
> git checkout -b new-feature
# your work
> git commit

For bug fixes, sync maintenance with upstream and create the working branch from maintenance:

> git checkout maintenance
> git checkout -b bugfix
# your work
> git commit

Getting upstream’s changes for develop or any other branch:

> git checkout develop
> git pull upstream develop
> git push origin develop

There are two possibilities to get upstream’s commits into your branch: rebasing and merging. With rebasing, your history is rewritten, putting your changes on top of all other commits. You can use this if your changes are not published yet (or only in your fork).

> git checkout new-feature
> git rebase develop

Using the -i flag for rebase enables interactive rebasing. You can then remove, reorder and squash commits, and rewrite commit messages, beginning with the given branch, e.g. develop.

Alternatively, use merging. This doesn’t rewrite the history and is thus considered safer, but it pollutes the history with merge commits.

> git checkout new-feature
> git merge develop

You can then create a PR with your branch new-feature to our upstream repository, using GitHub’s web interface.

Commit Messages

If it fixes an existing issue, please use GitHub syntax, e.g.: fixes certtools/intelmq#<IssueID>

Prepare for Discussion in GitHub

If we don’t discuss it, it’s probably not tested.

License and Author files

License and Authors files can be found at the root of repository.

  • The license file MUST NOT be modified except with the explicit written permission of CNCS/CERT.PT or CERT.at.

  • Credit to the authors file must always be retained. When a new contributor (person and/or organization) improves the repository content (code or documentation) in some way, they may add their name to the list of contributors.

License and author information must only be listed in these external files, not inside the code files.

System Overview

In the intelmq/lib/ directory you can find some libraries:

  • Bots: Defines base structure for bots and handling of startup, stop, messages etc.

  • Cache: For some expert bots it does make sense to cache external lookup results. Redis is used here.

  • Harmonization: For defined types, checks and sanitation methods are implemented.

  • Message: Defines Events and Reports classes, uses harmonization to check validity of keys and values according to config.

  • Pipeline: Writes messages to message queues. Only Redis is implemented for production use; AMQP support is in beta.

  • Test: Base class for bot tests with predefined test and assert methods.

  • Utils: Utility functions used by system components.

Code Architecture
Pipeline
  • collector bot

  • TBD

Bot Developer Guide

There’s a dummy bot including tests at intelmq/tests/lib/test_parser_bot.py.

You can always start any bot directly from the command line by calling the executable. The executable is created during the installation in a directory for binaries. After adding new bots to the code, install IntelMQ to get the files created. Don’t forget to give a bot id as the first argument. Also, running bots as a user other than intelmq will raise permission errors.

$ sudo -i -u intelmq
$ intelmqctl run file-output  # if configured
$ intelmq.bots.outputs.file.output file-output

You will get all logging outputs directly on stderr as well as in the log file.

Template

Please adjust the doc strings accordingly and remove the in-line comments (#).

"""Parse data from example.com, be a nice ExampleParserBot.

Document possible necessary configurations.
"""

# imports for additional libraries and intelmq
from intelmq.lib.bot import Bot


class ExampleParserBot(Bot):
    def process(self):
        report = self.receive_message()

        event = self.new_event(report)  # copies feed.name, time.observation
        ... # implement the logic here
        event.add('source.ip', '127.0.0.1')
        event.add('extra', {"os.name": "Linux"})

        self.send_message(event)
        self.acknowledge_message()


BOT = ExampleParserBot

There are some names with a special meaning. These can be used, i.e. called:

  • stop: Shuts the bot down.

  • receive_message, send_message, acknowledge_message: see next section

  • parameters: the bot’s configuration as an object

  • start: internal method to run the bot

These can be defined:

  • init: called at startup, use it to set up the bot (initializing classes, loading files etc)

  • process: processes the messages

  • shutdown: called to gracefully stop the bot, e.g. to terminate connections

All other names can be used freely.

Pipeline interactions

We can call three methods related to the pipeline:

  • self.receive_message(): The pipeline handler pops one message from the internal queue if possible. Otherwise, one message from the source queue is popped and added to an internal queue. In case of errors during processing, the message can still be found in the internal queue and is not lost. The bot class unravels the message and creates an instance of the Event or Report class.

  • self.send_message(event, path="_default"): The processed message is sent to the destination queues. It is possible to change the destination queues with the optional path parameter; see the sketch after this list.

  • self.acknowledge_message(): The message formerly received by receive_message is removed from the internal queue. This should always be done after processing and after sending the new message. In case of errors, this function is not called and the message will stay in the internal queue waiting to be processed again.
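For illustration, a minimal sketch of these calls inside a bot’s process method; the path name "suspicious" is made up and would have to exist in your pipeline configuration:

def process(self):
    report = self.receive_message()
    event = self.new_event(report)
    # ... fill the event here ...
    self.send_message(event)                     # goes to the "_default" path
    self.send_message(event, path="suspicious")  # hypothetical additional path
    self.acknowledge_message()                   # only after sending succeeded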

Logging
Log Messages Format

Log messages have to be clear and well formatted. The format is the following:

Format:

<timestamp> - <bot id> - <log level> - <log message>
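For instance, assuming the default configuration, a line might look like this (the exact timestamp format may differ between versions):

2021-06-08 10:00:01,123 - example-bot - INFO - Bot initialization completed.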

Rules:

  • the log message MUST follow the common rules of a sentence, beginning with uppercase and ending with a period;

  • the sentence MUST describe the problem or give useful information that provides context to an inexperienced user. Pure stack traces without any further explanation are not helpful.

When the logger instance is created, the bot id must be given as parameter anyway. The function call defines the log level, see below.

Log Levels
  • debug: Debugging information includes retrieved and sent messages and detailed status information. May include sensitive information like passwords, and the amount can be huge.

  • info: Logs include loaded databases, fetched reports or waiting messages.

  • warning: Unexpected, but handled behavior.

  • error: Errors and Exceptions.

  • critical: Program is failing.

What to Log
  • Try to keep a balance between obscuring the source code file with hundreds of log messages and having too few log messages.

  • In general, a bot MUST report error conditions.

How to Log

The Bot class creates a logger that should be used by bots. Other components won’t log anyway currently. Examples:
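A minimal sketch of typical calls, assuming self.logger inside a running bot; the messages are made up:

self.logger.info('Started processing.')
self.logger.error('Request to the server failed.')
self.logger.exception('Request to the server failed.')  # also logs the traceback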

The exception method automatically appends an exception traceback. The logger instance writes by default to the file /opt/intelmq/var/log/[bot-id].log and to stderr.

String formatting in Logs

Parameters for string formatting are better passed as arguments to the log function, see https://docs.python.org/3/library/logging.html#logging.Logger.debug. In case of formatting problems, the error messages will be better. For example:
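A hedged illustration; url is a hypothetical local variable:

self.logger.debug('Fetching %r.', url)   # preferred: the logging module does the formatting
self.logger.debug('Fetching %r.' % url)  # avoid: formatting errors are raised here instead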

Error handling

The bot class itself has error handling implemented. The bot itself is allowed to throw exceptions and is intended to fail! The bot should fail in case of malicious messages and in case of unavailable but necessary resources. The bot class handles the exception and will restart until the maximum number of tries is reached, and only then fail. Additionally, the message in question is dumped to the file /opt/intelmq/var/log/[bot-id].dump and removed from the queue.

Initialization

It may be necessary to set up a Cache instance or load a file into memory. Use the init function for this purpose:
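A minimal sketch, assuming a hypothetical database parameter holding a file path:

def init(self):
    # load the file once at startup instead of in every process() call
    with open(self.parameters.database) as handle:
        self.database = handle.read().splitlines()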

Custom configuration checks

Every bot can define a static method check(parameters), which will be called by intelmqctl check. For example, the check function of the ASNLookupExpert:
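The original function is not reproduced here; the following is a minimal sketch of the interface only, assuming the convention that check returns None if everything is fine and a list of [log level, message] pairs otherwise (the 'database' parameter is an assumption):

import os.path

# excerpt from a bot class
@staticmethod
def check(parameters):
    if 'database' not in parameters:
        return [['error', 'Parameter "database" is missing.']]
    if not os.path.exists(parameters['database']):
        return [['error', 'File given as parameter "database" does not exist.']]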

Examples
Parsers

Parsers can use a different, specialized Bot class. It allows working on individual elements of a report, splitting the functionality of the parser into multiple functions:

  • process: getting and sending data, handling of failures etc.

  • parse: Parses the report and splits it into single elements (e.g. lines). Can be overridden.

  • parse_line: Parses elements, returns an Event. Can be overridden.

  • recover_line: In case of failures and for the field raw, this function recovers a fully functional report containing only one element. Can be overridden.

For common cases, like CSV, existing functions can be used, reducing the amount of code to implement. In the best case, only parse_line needs to be coded, as only this part interprets the data.

You can have a look at the implementation in intelmq/lib/bot.py or at examples, e.g. the DummyBot in intelmq/tests/lib/test_parser_bot.py. This is a stub for creating a new parser, showing the parameters and possible code:
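A hedged sketch of such a stub, assuming a simple line-based feed; the feed, field and classification values are just examples:

from intelmq.lib.bot import ParserBot


class ExampleParserBot(ParserBot):
    """Parse data from example.com (hypothetical feed)."""

    def parse_line(self, line, report):
        if line.startswith('#'):  # skip comment lines in the feed
            return
        event = self.new_event(report)
        event.add('classification.type', 'c2server')
        event.add('source.fqdn', line.strip())
        event.add('raw', self.recover_line(line))
        yield event


BOT = ExampleParserBot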

parse_line

One line can lead to multiple events, so parse_line can’t just return one Event. Therefore, this function is a generator, which allows easily returning multiple values. Use yield event for valid Events and return in case of a void result (not parseable line, invalid data, etc.).

Tests

In order to do automated tests on the bot, it is necessary to write tests including sample data. Have a look at some existing tests:

  • The DummyParserBot in intelmq/tests/lib/test_parser_bot.py. This test has the example data (report and event) inside the file, defined as dictionary.

  • The parser for malwaregroup at intelmq/tests/bots/parsers/malwaregroup/test_parser_*.py. The latter loads a sample HTML file from the same directory, which is the raw report.

  • The test for ASNLookupExpertBot has two event tests, one is an expected fail (IPv6).

Ideally an example contains not only the ideal case which should succeed, but also a case which should fail instead. (TODO: Implement assertEventNotEqual or assertEventNotcontainsSubset or similar) Most existing bots are only tested with one message. For newly written tests it is appreciated to have tests including more than one message, e.g. a parser fed with a report consisting of multiple events.
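A minimal sketch of such a test, following the conventions of the existing tests; the module path and sample data are made up:

import unittest

import intelmq.lib.test as test
from intelmq.bots.parsers.example.parser import ExampleParserBot  # hypothetical module

EXAMPLE_REPORT = {'feed.name': 'Example Feed',
                  'raw': 'ZXhhbXBsZS5jb20=',  # base64 of the original line
                  'time.observation': '2015-01-01T00:00:00+00:00'}
EXAMPLE_EVENT = {'feed.name': 'Example Feed',
                 'classification.type': 'c2server',
                 'source.fqdn': 'example.com',
                 'raw': 'ZXhhbXBsZS5jb20=',
                 'time.observation': '2015-01-01T00:00:00+00:00'}


class TestExampleParserBot(test.BotTestCase, unittest.TestCase):
    @classmethod
    def set_bot(cls):
        cls.bot_reference = ExampleParserBot
        cls.default_input_message = EXAMPLE_REPORT

    def test_event(self):
        self.run_bot()
        self.assertMessageEqual(0, EXAMPLE_EVENT)


if __name__ == '__main__':
    unittest.main()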

When calling the file directly, only the tests in this file for the bot will be executed. Some default tests are always executed (via the test.BotTestCase class), such as pipeline and message checks, logging, bot naming or empty message handling.

See the Testing Pre-releases section about how to run the tests.

Configuration

In the end, the information about the new bot should be added to the BOTS file located at intelmq/bots. Note that the file is sorted!

Cache

Bots can use a Redis database as a cache instance. Use the intelmq.lib.cache.Cache class to set this up and/or look at existing bots, such as the cymru_whois expert, to see how the cache can be used; a usage sketch follows after the list below. Bots must set a TTL for all keys that are cached to avoid caches growing endlessly over time. Bots must use Redis databases >= 10, but not those already used by other bots. See bots/BOTS for which databases are already in use.

The databases < 10 are reserved for the IntelMQ core:
  • 2: pipeline

  • 3: statistics

  • 4: tests
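A hedged usage sketch, following the pattern of the cymru_whois expert; the redis_cache_* parameter names are the usual convention, while the key and lookup function are made up:

from intelmq.lib.cache import Cache

# excerpt from a bot class
def init(self):
    self.cache = Cache(self.parameters.redis_cache_host,
                       self.parameters.redis_cache_port,
                       self.parameters.redis_cache_db,   # must be >= 10
                       self.parameters.redis_cache_ttl)

def process(self):
    ...
    result = self.cache.get(key)     # hypothetical cache key
    if result is None:
        result = lookup(key)         # hypothetical expensive external lookup
        self.cache.set(key, result)  # stored with the configured TTL
    ...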

Documentation

The documentation is automatically published to https://intelmq.readthedocs.io/ at every push to the repository.

To build the documentation you need three packages:

  • Sphinx

  • ReCommonMark

  • sphinx-markdown-tables

To install them, you can use pip:

pip3 install -r docs/requirements.txt

Then use the Makefile to build the documentation using Sphinx:

cd docs
make html
Feeds documentation

The feeds which are known to be working with IntelMQ are documented in the machine-readable file intelmq/etc/feeds.yaml. The human-readable documentation is generated with the Sphinx build as described in the previous section.

Testing Pre-releases

Installation

The installation procedures need to be adapted only a little bit.

For native packages, you can find the unstable packages of the next version here: Installation Unstable Native Packages.

For the installation with pip, use the --pre parameter, as shown in the following command:

pip3 install --pre intelmq

All other steps are not different. Please report any issues you find in our Issue Tracker.

Data Harmonization

Overview

All messages (reports and events) are Python/JSON dictionaries. The key names and according types are defined by the so-called harmonization.

The purpose of this document is to list and clearly define known fields in Abusehelper as well as IntelMQ or similar systems. A field is a `key=value` pair. For a clear and unique definition of a field, we must define the key (field-name) as well as the possible values. A field belongs to an event. An event is basically a structured log record in the form `key=value, key=value, key=value, …`. In the List of known fields, each field is grouped by a section. We describe these sections briefly below. Every event MUST contain a timestamp field.

An IOC (Indicator of compromise) is a single observation like a log line.

Rules for keys

The keys can be grouped together in sub-fields, e.g. source.ip or source.geolocation.latitude. Thus, keys must match ^[a-z_]+(\.[a-z0-9_]+)*$.
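A quick self-check of this rule; the pattern below is reconstructed from the sentence above:

import re

KEY_PATTERN = re.compile(r'^[a-z_]+(\.[a-z0-9_]+)*$')

assert KEY_PATTERN.match('source.geolocation.latitude')
assert not KEY_PATTERN.match('Source.IP')  # uppercase is not allowed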

Sections

As stated above, every field is organized under some section. The following is a description of the sections and what they imply.

Feed

Fields listed under this grouping list details about the source feed where information came from.

Time

The time section lists all fields related to time information. This document requires that all the timestamps MUST be normalized to UTC. If the source reports only a date, do not attempt to invent timestamps.

Source Identity

This section lists all fields related to the identification of the source. The source is the identity the IOC is about, as opposed to the destination identity.

For examples see the table below.

The abuse type of an event defines the way these events need to be interpreted. For example, for a botnet drone they refer to the compromised machine, whereas for a command and control server they refer to the server itself.

Source Geolocation Identity

We recognize that IP geolocation is not an exact science, and analysis of the abuse data has shown that different attribution sources have different opinions about the geolocation of an IP address. This is why we recommend enriching the data with as many sources as you have available and making the decision which value to use for the cc IOC based on those answers.

Source Local Identity

Some sources report an internal (NATed) IP address.

Destination Identity

The abuse type of an event defines the way these IOCs need to be interpreted. For a botnet drone they refer to the compromised machine, whereas for a command and control server they refer to the server itself.

Destination Geolocation Identity

We recognize that IP geolocation is not an exact science, and analysis of the abuse data has shown that different attribution sources have different opinions about the geolocation of an IP address. This is why we recommend enriching the data with as many sources as you have available and making the decision which value to use for the cc IOC based on those answers.

Destination Local Identity

Some sources report an internal (NATed) IP address.

Extra values

Data which does not fit into the harmonization can be saved in the ‘extra’ namespace. All keys must begin with extra.; there are no other rules on key names and values. The values can be read and set like all other fields.

Fields List and data types

A list of allowed fields and data types can be found in Harmonization field names.

Classification

IntelMQ classifies events using three labels: taxonomy, type and identifier. This tuple of three values can be used for deduplication of events and describes what happened. TODO: examples from chat
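As a hedged illustration (values taken from the mapping tables below), an event about a Zeus-infected machine could carry:

event.add('classification.taxonomy', 'malicious code')  # can be added automatically by the taxonomy expert
event.add('classification.type', 'infected-system')
event.add('classification.identifier', 'zeus')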

The taxonomy can be automatically added by the taxonomy expert bot based on the given type. The following taxonomy-type mapping is based on eCSIRT II Taxonomy:

Taxonomy | Type | Description
---------|------|------------
abusive content | spam | Or ‘Unsolicited Bulk Email’, this means that the recipient has not granted verifiable permission for the message to be sent and that the message is sent as part of a larger collection of messages, all having a functionally comparable content.
abusive-content | harmful-speech | Discreditation or discrimination of somebody, e.g. cyber stalking, racism or threats against one or more individuals.
abusive-content | violence | Child pornography, glorification of violence, etc.
availability | ddos | Distributed Denial of Service attack, e.g. SYN-Flood or UDP-based reflection/amplification attacks.
availability | dos | Denial of Service attack, e.g. sending specially crafted requests to a web application which causes the application to crash or slow down.
availability | outage | Outage caused e.g. by air condition failure or natural disaster.
availability | sabotage | Physical sabotage, e.g. cutting wires or malicious arson.
fraud | copyright | Offering or installing copies of unlicensed commercial software or other copyright protected materials (warez).
fraud | masquerade | Type of attack in which one entity illegitimately impersonates the identity of another in order to benefit from it.
fraud | phishing | Masquerading as another entity in order to persuade the user to reveal private credentials.
fraud | unauthorized-use-of-resources | Using resources for unauthorized purposes including profit-making ventures, e.g. the use of e-mail to participate in illegal profit chain letters or pyramid schemes.
information content security | Unauthorised-information-access | Unauthorized access to information, e.g. by abusing stolen login credentials for a system or application, intercepting traffic or gaining access to physical documents.
information content security | Unauthorised-information-modification | Unauthorised modification of information, e.g. by an attacker abusing stolen login credentials for a system or application or a ransomware encrypting data.
information content security | data-loss | Loss of data, e.g. caused by harddisk failure or physical theft.
information content security | dropzone | This IOC refers to a place where compromised machines store the stolen user data. Not in ENISA eCSIRT-II taxonomy.
information content security | leak | IOCs relating to leaked credentials or personal data. Not in ENISA eCSIRT-II taxonomy.
information-gathering | scanner | Attacks that send requests to a system to discover weaknesses. This also includes testing processes to gather information on hosts, services and accounts. Examples: fingerd, DNS querying, ICMP, SMTP (EXPN, RCPT, …), port scanning.
information-gathering | sniffing | Observing and recording of network traffic (wiretapping).
information-gathering | social-engineering | Gathering information from a human being in a non-technical way (e.g. lies, tricks, bribes, or threats).
intrusion attempts | brute-force | Multiple login attempts (guessing/cracking of passwords, brute force). This IOC refers to a resource which has been observed to perform brute-force attacks over a given application protocol.
intrusion attempts | exploit | An attack using an unknown exploit.
intrusion attempts | ids-alert | IOCs based on a sensor network. This is a generic IOC denomination, should it be difficult to reliably denote the exact type of activity involved, for example due to an anecdotal nature of the rule that triggered the alert.
intrusions | application-compromise | Compromise of an application by exploiting (un)known software vulnerabilities, e.g. SQL injection.
intrusions | backdoor | This refers to hosts which have been compromised and backdoored with a remote administration software or Trojan in the traditional sense. Not in ENISA eCSIRT-II taxonomy.
intrusions | burglary | Physical intrusion, e.g. into a corporate building or data center.
intrusions | compromised | This IOC refers to a compromised system. Not in ENISA eCSIRT-II taxonomy.
intrusions | defacement | This IOC refers to hacktivism-related activity. Not in ENISA eCSIRT-II taxonomy.
intrusions | privileged-account-compromise | Compromise of a system where the attacker gained administrative privileges.
intrusions | unauthorized-command | The possibly infected device sent unauthorized commands to a remote device with malicious intent. Not in ENISA eCSIRT-II taxonomy.
intrusions | unauthorized-login | A possibly infected device logged in to a remote device without authorization. Not in ENISA eCSIRT-II taxonomy.
intrusions | unprivileged-account-compromise | Compromise of a system using an unprivileged (user/service) account.
malicious code | c2server | This is a command and control server in charge of a given number of botnet drones.
malicious code | dga domain | DGA domains are seen in various families of malware that periodically generate a large number of domain names which can be used as rendezvous points with their command and control servers. Not in ENISA eCSIRT-II taxonomy.
malicious code | infected-system | This is a compromised machine, which has been observed to make a connection to a command and control server.
malicious code | malware | A URL is the most common resource with reference to malware binary distribution. Not in ENISA eCSIRT-II taxonomy.
malicious code | malware-configuration | This is a resource which updates botnet drones with a new configuration.
malicious code | malware-distribution | URI used for malware distribution, e.g. a download URL included in fake invoice malware spam.
malicious code | ransomware | This IOC refers to a specific type of compromised machine, where the computer has been hijacked for ransom by the criminals. Not in ENISA eCSIRT-II taxonomy; deprecated, use ‘infected-system’ instead.
other | blacklist | Some sources provide blacklists, which clearly refer to abusive behavior, such as spamming, but fail to denote the exact reason why a given identity has been blacklisted. The reason may be that the justification is anecdotal or missing entirely. This type should only be used if the typing fits the definition of a blacklist, but an event-specific denomination is not possible for one reason or another.
other | other | All incidents which don’t fit in one of the given categories should be put into this class.
other | proxy | This refers to the use of proxies from inside your network. Not in ENISA eCSIRT-II taxonomy.
other | tor | This IOC refers to incidents related to TOR network infrastructure. Not in ENISA eCSIRT-II taxonomy.
other | unknown | Unknown classification. Not in ENISA eCSIRT-II taxonomy.
test | test | Meant for testing.
vulnerable | ddos-amplifier | Publicly accessible services that can be abused for conducting DDoS reflection/amplification attacks, e.g. DNS open-resolvers or NTP servers with monlist enabled.
vulnerable | information-disclosure | Publicly accessible services potentially disclosing sensitive information, e.g. SNMP or Redis.
vulnerable | potentially-unwanted-accessible | Potentially unwanted publicly accessible services, e.g. Telnet, RDP or VNC.
vulnerable | vulnerable client | This attribute refers to badly configured or vulnerable clients which may be compromised by a third party, e.g. not-up-to-date or misconfigured clients, such as clients querying public domains for WPAD configurations. In addition, to specify the vulnerability and its potential abuse, one should use the classification.identifier, description and other attributes. Not in ENISA eCSIRT-II taxonomy.
vulnerable | vulnerable service | This attribute refers to a badly configured or vulnerable network service which may be abused by a third party, e.g. open proxies, open DNS resolvers, network time servers (NTP) or character generation (chargen) and simple network management (SNMP) services. In addition, to specify the network service and its potential abuse, one should use the protocol, destination port and description attributes. Not in ENISA eCSIRT-II taxonomy.
vulnerable | vulnerable-system | A system which is vulnerable to certain attacks, e.g. misconfigured client proxy settings (such as WPAD) or an outdated operating system version.
vulnerable | weak-crypto | Publicly accessible services offering weak crypto, e.g. web servers susceptible to POODLE/FREAK attacks.

Meaning of source, destination and local values for each classification type and possible identifiers. The identifier is often a normalized malware name, grouping many variants.

Type | Source | Destination | Local | Possible identifiers
-----|--------|-------------|-------|---------------------
backdoor | backdoored device | | |
blacklist | blacklisted device | | |
brute-force | attacker | target | |
c2server | (sinkholed) c&c server | | | zeus, palevo, feodo
compromised | server | | |
ddos | attacker | target | |
defacement | defaced website | | |
dga domain | infected device | | |
dropzone | server hosting stolen data | | |
exploit | hosting server | | |
ids-alert | triggering device | | |
infected system | infected device | contacted c2c server | |
malware | infected device | | internal at source | zeus, palevo, feodo
malware configuration | infected device | | |
other | | | |
phishing | phishing website | | |
proxy | server allowing policy and security bypass | | |
ransomware | infected device | | |
scanner | scanning device | scanned device | | http, modbus, wordpress
spam | infected device | targeted server | internal at source |
test | | | |
unknown | | | |
vulnerable service | vulnerable device | | | heartbleed, openresolver, snmp
vulnerable client | vulnerable device | | | wpad

Field in italics is the interesting one for CERTs.

Example:

If you know of an IP address that connects to a zeus c&c server, it’s about the infected device, thus type malware and identifier zeus. If you want to complain about the c&c server, it’s type c2server and identifier zeus. The malware.name can have the full name, e.g. ‘zeus_p2p’.

Harmonization field names

Section | Name | Type | Description
--------|------|------|------------
Classification | classification.identifier | String | The lowercase identifier defines the actual software or service (e.g. heartbleed or ntp_version) or standardized malware name (e.g. zeus). Note that you MAY overwrite this field during processing for your individual setup. This field is not standardized across IntelMQ setups/users.
Classification | classification.taxonomy | LowercaseString | We recognize the need for the CSIRT teams to apply a static (incident) taxonomy to abuse data. With this goal in mind the type IOC will serve as a basis for this activity. Each value of the dynamic type mapping translates to an element in the static taxonomy. The European CSIRT teams for example have decided to apply the eCSIRT.net incident classification. The value of the taxonomy key is thus a derivative of the dynamic type above. For more information, check the ENISA taxonomies.
Classification | classification.type | ClassificationType | The abuse type IOC is one of the most crucial pieces of information for any given abuse event. The main idea of dynamic typing is to keep our ontology flexible, since we need to evolve with the evolving threatscape of abuse data. In contrast with the static taxonomy above, the dynamic typing is used to perform business decisions in the abuse handling pipeline. Furthermore, the value data set should be kept as minimal as possible to avoid type explosion, which in turn dilutes the business value of the dynamic typing. In general, we normally have two types of abuse type IOC: ones referring to a compromised resource or ones referring to pieces of the criminal infrastructure, such as command and control servers.
 | comment | String | Free text commentary about the abuse event inserted by an analyst.
Destination | destination.abuse_contact | LowercaseString | Abuse contact for destination address. A comma separated list.
Destination | destination.account | String | An account name or email address, which has been identified to relate to the destination of an abuse event.
Destination | destination.allocated | DateTime | Allocation date corresponding to BGP prefix.
Destination | destination.as_name | String | The autonomous system name to which the connection headed.
Destination | destination.asn | ASN | The autonomous system number to which the connection headed.
Destination | destination.domain_suffix | FQDN | The suffix of the domain from the public suffix list.
Destination | destination.fqdn | FQDN | A DNS name related to the host to which the connection headed. DNS allows even binary data in DNS, so we have to allow everything. A final point is stripped, string is converted to lower case characters.
Destination Geolocation | destination.geolocation.cc | UppercaseString | Country-Code according to ISO3166-1 alpha-2 for the destination IP.
Destination Geolocation | destination.geolocation.city | String | Some geolocation services refer to city-level geolocation.
Destination Geolocation | destination.geolocation.country | String | The country name derived from the ISO3166 country code (assigned to cc field).
Destination Geolocation | destination.geolocation.latitude | Float | Latitude coordinates derived from a geolocation service, such as MaxMind geoip db.
Destination Geolocation | destination.geolocation.longitude | Float | Longitude coordinates derived from a geolocation service, such as MaxMind geoip db.
Destination Geolocation | destination.geolocation.region | String | Some geolocation services refer to region-level geolocation.
Destination Geolocation | destination.geolocation.state | String | Some geolocation services refer to state-level geolocation.
Destination | destination.ip | IPAddress | The IP which is the target of the observed connections.
Destination | destination.local_hostname | String | Some sources report an internal hostname within a NAT related to the name configured for a compromised system.
Destination | destination.local_ip | IPAddress | Some sources report an internal (NATed) IP address related to a compromised system. N.B. RFC1918 IPs are OK here.
Destination | destination.network | IPNetwork | CIDR for an autonomous system. Also known as BGP prefix. If multiple values are possible, select the most specific.
Destination | destination.port | Integer | The port to which the connection headed.
Destination | destination.registry | Registry | The IP registry a given ip address is allocated by.
Destination | destination.reverse_dns | FQDN | Reverse DNS name acquired through a reverse DNS query on an IP address. N.B. Record types other than PTR records may also appear in the reverse DNS tree. Furthermore, unfortunately, there is no rule prohibiting people from writing anything in a PTR record. Even JavaScript will work. A final point is stripped, string is converted to lower case characters.
Destination | destination.tor_node | Boolean | If the destination IP was a known tor node.
Destination | destination.url | URL | A URL denotes an IOC, which refers to a malicious resource, whose interpretation is defined by the abuse type. A URL with the abuse type phishing refers to a phishing resource.
Destination | destination.urlpath | String | The path portion of an HTTP or related network request.
Event_Description | event_description.target | String | Some sources denominate the target (organization) of an attack.
Event_Description | event_description.text | String | A free-form textual description of an abuse event.
Event_Description | event_description.url | URL | A description URL is a link to a further description of the abuse event in question.
 | event_hash | UppercaseString | Computed event hash with specific keys and values that identify a unique event. At present, the hash should default to using the SHA1 function. Please note that for an event hash to be able to match more than one event (deduplication) the receiver of an event should calculate it based on a minimal set of keys and values present in the event. Using for example the observation time in the calculation will most likely render the checksum useless for deduplication purposes.
 | extra | JSONDict | All anecdotal information, which cannot be parsed into the data harmonization elements. E.g. os.name, os.version, etc. Note: this is only intended for mapping any fields which can not map naturally into the data harmonization. It is not intended for extending the data harmonization with your own fields.
Feed | feed.accuracy | Accuracy | A float between 0 and 100 that represents how accurate the data in the feed is.
Feed | feed.code | String | Code name for the feed, e.g. DFGS, HSDAG etc.
Feed | feed.documentation | String | A URL or hint where to find the documentation of this feed.
Feed | feed.name | String | Name for the feed, usually found in collector bot configuration.
Feed | feed.provider | String | Name for the provider of the feed, usually found in collector bot configuration.
Feed | feed.url | URL | The URL of a given abuse feed, where applicable.
Malware Hash | malware.hash.md5 | String | A string depicting an MD5 checksum for a file, be it a malware sample for example.
Malware Hash | malware.hash.sha1 | String | A string depicting a SHA1 checksum for a file, be it a malware sample for example.
Malware Hash | malware.hash.sha256 | String | A string depicting a SHA256 checksum for a file, be it a malware sample for example.
Malware | malware.name | LowercaseString | The malware name in lower case.
Malware | malware.version | String | A version string for an identified artifact generation, e.g. a crime-ware kit.
Misp | misp.attribute_uuid | LowercaseString | MISP - Malware Information Sharing Platform & Threat Sharing UUID of an attribute.
Misp | misp.event_uuid | LowercaseString | MISP - Malware Information Sharing Platform & Threat Sharing UUID.
 | output | JSON | Event data converted into foreign format, intended to be exported by output plugin.
Protocol | protocol.application | LowercaseString | e.g. vnc, ssh, sip, irc, http or smtp.
Protocol | protocol.transport | LowercaseString | e.g. tcp, udp, icmp.
 | raw | Base64 | The original line of the event from the feed, encoded in base64.
 | rtir_id | Integer | Request Tracker Incident Response ticket id.
 | screenshot_url | URL | Some sources may report URLs related to an image generated of a resource without any metadata, or a URL pointing to a resource which has been rendered into a webshot, e.g. a PNG image, and the relevant metadata related to its retrieval/generation.
Source | source.abuse_contact | LowercaseString | Abuse contact for source address. A comma separated list.
Source | source.account | String | An account name or email address, which has been identified to relate to the source of an abuse event.
Source | source.allocated | DateTime | Allocation date corresponding to BGP prefix.
Source | source.as_name | String | The autonomous system name from which the connection originated.
Source | source.asn | ASN | The autonomous system number from which the connection originated.
Source | source.domain_suffix | FQDN | The suffix of the domain from the public suffix list.
Source | source.fqdn | FQDN | A DNS name related to the host from which the connection originated. DNS allows even binary data in DNS, so we have to allow everything. A final point is stripped, string is converted to lower case characters.
Source Geolocation | source.geolocation.cc | UppercaseString | Country-Code according to ISO3166-1 alpha-2 for the source IP.
Source Geolocation | source.geolocation.city | String | Some geolocation services refer to city-level geolocation.
Source Geolocation | source.geolocation.country | String | The country name derived from the ISO3166 country code (assigned to cc field).
Source Geolocation | source.geolocation.cymru_cc | UppercaseString | The country code denoted for the ip by the Team Cymru asn-to-ip mapping service.
Source Geolocation | source.geolocation.geoip_cc | UppercaseString | MaxMind Country Code (ISO3166-1 alpha-2).
Source Geolocation | source.geolocation.latitude | Float | Latitude coordinates derived from a geolocation service, such as MaxMind geoip db.
Source Geolocation | source.geolocation.longitude | Float | Longitude coordinates derived from a geolocation service, such as MaxMind geoip db.
Source Geolocation | source.geolocation.region | String | Some geolocation services refer to region-level geolocation.
Source Geolocation | source.geolocation.state | String | Some geolocation services refer to state-level geolocation.
Source | source.ip | IPAddress | The IP observed to initiate the connection.
Source | source.local_hostname | String | Some sources report an internal hostname within a NAT related to the name configured for a compromised system.
Source | source.local_ip | IPAddress | Some sources report an internal (NATed) IP address related to a compromised system. N.B. RFC1918 IPs are OK here.
Source | source.network | IPNetwork | CIDR for an autonomous system. Also known as BGP prefix. If multiple values are possible, select the most specific.
Source | source.port | Integer | The port from which the connection originated.
Source | source.registry | Registry | The IP registry a given ip address is allocated by.
Source | source.reverse_dns | FQDN | Reverse DNS name acquired through a reverse DNS query on an IP address. N.B. Record types other than PTR records may also appear in the reverse DNS tree. Furthermore, unfortunately, there is no rule prohibiting people from writing anything in a PTR record. Even JavaScript will work. A final point is stripped, string is converted to lower case characters.
Source | source.tor_node | Boolean | If the source IP was a known tor node.
Source | source.url | URL | A URL denotes an IOC, which refers to a malicious resource, whose interpretation is defined by the abuse type. A URL with the abuse type phishing refers to a phishing resource.
Source | source.urlpath | String | The path portion of an HTTP or related network request.
 | status | String | Status of the malicious resource (phishing, dropzone, etc), e.g. online, offline.
Time | time.observation | DateTime | The time the collector of the local instance processed (observed) the event.
Time | time.source | DateTime | The time of occurrence of the event as reported by the feed (source).
 | tlp | TLP | Traffic Light Protocol level of the event.

Harmonization types

ASN

ASN type. Derived from Integer with forbidden values.

Only valid values: 0 < asn <= 4294967295. See https://en.wikipedia.org/wiki/Autonomous_system_(Internet): “The first and last ASNs of the original 16-bit integers, namely 0 and 65,535, and the last ASN of the 32-bit numbers, namely 4,294,967,295 are reserved and should not be used by operators.”

Accuracy

Accuracy type. A Float between 0 and 100.

Base64

Base64 type. Always gives unicode strings.

Sanitation encodes to base64 and accepts binary and unicode strings.

Boolean

Boolean type. Without sanitation only python bool is accepted.

Sanitation accepts string ‘true’ and ‘false’ and integers 0 and 1.

ClassificationType

classification.type type.

The mapping follows Reference Security Incident Taxonomy Working Group – RSIT WG https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force/ with extensions.

These old values are automatically mapped to the new ones:

  • ‘botnet drone’ -> ‘infected-system’

  • ‘ids alert’ -> ‘ids-alert’

  • ‘c&c’ -> ‘c2server’

  • ‘infected system’ -> ‘infected-system’

  • ‘malware configuration’ -> ‘malware-configuration’

Allowed values are:
  • application-compromise

  • backdoor

  • blacklist

  • brute-force

  • burglary

  • c2server

  • compromised

  • copyright

  • data-loss

  • ddos

  • ddos-amplifier

  • defacement

  • dga domain

  • dos

  • dropzone

  • exploit

  • harmful-speech

  • ids-alert

  • infected-system

  • information-disclosure

  • leak

  • malware

  • malware-configuration

  • malware-distribution

  • masquerade

  • other

  • outage

  • phishing

  • potentially-unwanted-accessible

  • privileged-account-compromise

  • proxy

  • ransomware

  • sabotage

  • scanner

  • sniffing

  • social-engineering

  • spam

  • test

  • tor

  • Unauthorised-information-access

  • Unauthorised-information-modification

  • unauthorized-command

  • unauthorized-login

  • unauthorized-use-of-resources

  • unknown

  • unprivileged-account-compromise

  • violence

  • vulnerable client

  • vulnerable service

  • vulnerable-system

  • weak-crypto

DateTime

Date and time type for timestamps.

Valid values are timestamps with time zone in the format ‘%Y-%m-%dT%H:%M:%S+00:00’. Missing times and missing timezone information (UTC) are invalid. Microseconds are also allowed.

Sanitation normalizes the timezone to UTC, which is the only allowed timezone.

The following additional conversions are available with the convert function:

  • timestamp

  • windows_nt: From Windows NT / AD / LDAP

  • epoch_millis: From Milliseconds since Epoch

  • from_format: From a given format, e.g. ‘from_format|%H %M %S %m %d %Y %Z’

  • from_format_midnight: Date from a given format and assume midnight, e.g. ‘from_format_midnight|%d-%m-%Y’

  • utc_isoformat: Parse date generated by datetime.isoformat()

  • fuzzy (or None): Use dateutil’s fuzzy parser, the default if no specific parser is given

FQDN

Fully qualified domain name type.

All valid lowercase domains are accepted, no IP addresses or URLs. Trailing dot is not allowed.

To prevent values like ‘10.0.0.1:8080’ (#1235), we check for the non-existence of ‘:’.

Float

Float type. Without sanitation only python float/integer/long is accepted. Boolean is explicitly denied.

Sanitation accepts strings and everything float() accepts.

IPAddress

Type for IP addresses, all families. Uses the ipaddress module.

Sanitation accepts integers, strings and objects of ipaddress.IPv4Address and ipaddress.IPv6Address.

Valid values are only strings. 0.0.0.0 is explicitly not allowed.

IPNetwork

Type for IP networks, all families. Uses the ipaddress module.

Sanitation accepts strings and objects of ipaddress.IPv4Network and ipaddress.IPv6Network. If host bits in strings are set, they will be ignored (e.g. 127.0.0.1/32).

Valid values are only strings.

Integer

Integer type. Without sanitation only python integer/long is accepted. Bool is explicitly denied.

Sanitation accepts strings and everything int() accepts.

JSON

JSON type.

Sanitation accepts any valid JSON objects.

Valid values are only unicode strings with JSON objects.

JSONDict

JSONDict type.

Sanitation accepts pythons dictionaries and JSON strings.

Valid values are only unicode strings with JSON dictionaries.

LowercaseString

Like string, but only allows lower case characters.

Sanitation lowers all characters.

Registry

Registry type. Derived from UppercaseString.

Only valid values: AFRINIC, APNIC, ARIN, LACNIC, RIPE. RIPE-NCC and RIPENCC are normalized to RIPE.

String

Any non-empty string without leading or trailing whitespace.

TLP

TLP level type. Derived from UppercaseString.

Only valid values: WHITE, GREEN, AMBER, RED.

Accepted for sanitation are different cases and the prefix ‘tlp:’.

URL

URI type. Local and remote.

Sanitation converts hxxp and hxxps to http and https. For local URIs (file) a missing host is replaced by localhost.

Valid values must have the host (network location part).

UppercaseString

Like string, but only allows upper case characters.

Sanitation uppers all characters.

Release procedure

General assumption: You are working on branch maintenance, the next version is a bug fix release. For feature releases it is slightly different.

Check before

  • Make sure the current state is really final ;) You can test most of the steps described here locally before doing them for real.

  • Check the upgrade functions in intelmq/lib/upgrades.py.

  • Close the milestone on GitHub and move any open issues to the next one.

  • docs/user/installation.rst: Update supported operating systems.

Documentation

  • CHANGELOG.md and

  • NEWS.md: Update the latest header, fix the order, remove empty sections and (re)group the entries if necessary.

  • intelmq/version.py: Update the version.

  • debian/changelog: Insert a new section for the new version with the tool dch.

If necessary, adapt the default log levels; they should be INFO for stable releases. See older releases.

Commit, push, review and merge

Commit your changes in a separate branch, the final commit’s message should start with REL:. Push and create a pull request to maintenance, and after that from maintenance to master. Someone else should review the changes. Apply fixes if needed and make sure the REL: commit stays the last one; you can also push it last, after the reviews.

Why a separate branch? Because if problems show up, you can still force-push to that one, keeping the release commit the latest one.

Tag and release

Tag the commit with git tag -s version HEAD, merge it into master, push the branches and the tag. The tag is just a.b.c, not prefixed with v (that was necessary only with SVN a long time ago…).

Go to https://github.com/certtools/intelmq/tags and enter the release notes (from the CHANGELOG) for the new tag, then it’s considered a release by GitHub.

Tarballs and PyPI

  • Build the source and binary (wheel) distribution: python3 setup.py sdist bdist_wheel

  • Upload the files including signatures to PyPI with e.g. twine: twine upload -s dist/intelmq…

Packages

We are currently using the public Open Build Service instance of openSUSE: http://build.opensuse.org/project/show/home:sebix:intelmq

First, test all the steps with the unstable repository and check that at least installations succeed.

  • Create the tarballs with the script create-archives.sh.

  • Update the dsc and spec files for new filenames and versions.

  • Update the .changes file

  • Build locally for all distributions.

  • Commit.

Docker Image

Releasing a new Docker image is very easy.

  • Clone IntelMQ Docker Repository with git clone https://github.com/certat/intelmq-docker.git --recursive as this repository contains submodules

  • Run ./build.sh, check your console if the build was successful.

  • Run ./test.sh - it will run nosetests3 with the exotic flag. All errors/warnings will be displayed.

  • Change the build_version in publish.sh to the new version you want to release.

  • Change the namespace variable in publish.sh.

  • If no error/warning was shown, you can release with ./publish.sh.

Announcements

Announce the new version at the mailinglists intelmq-users, intelmq-dev. For bigger releases, probably also at IHAP, Twitter, etc. Ask your favorite social media consultant.

Prepare new version

Increase the version in intelmq/version.py and declare it as alpha version. Add the new version in intelmq/lib/upgrades.py. Add a new entry in debian/changelog with dch -v [version] -c debian/changelog.

Add new entries to CHANGELOG.md and NEWS.md. For CHANGELOG.md:

### Configuration

### Core

### Development

### Harmonization

### Bots
#### Collectors

#### Parsers

#### Experts

#### Outputs

### Documentation

### Packaging

### Tests

### Tools

### Contrib

### Known issues

And for NEWS.md:

### Requirements

### Tools

### Harmonization

### Configuration

### Libraries

### Postgres databases

Feeds wishlist

This is a list of various feeds which are either currently not supported or whose usage is not clearly documented in IntelMQ.

If you want to contribute documenting how to configure existing bots in order to collect new feeds or by creating new parsers, here is a list of potentially interesting feeds. See Feeds documentation for more information on this.

This list evolved from the issue Contribute: Feeds List (#384).

Licence

This software is licensed under the GNU Affero General Public License version 3.

Funded by

This project was partially funded by the CEF framework

Co-financed by the Connecting Europe Facility of the European Union
